When used with Spark, Aerospike offers massively parallel storage that lets you build high-throughput, low-latency ETL pipelines. Aerospike is a highly scalable NoSQL database, and its hybrid memory architecture makes it an ideal database for Spark applications. It is typically deployed in real-time environments managing terabyte-to-petabyte data volumes and supports millisecond read and write latencies. It leverages bleeding-edge storage innovations such as persistent memory (PMem) from best-of-breed hardware companies such as HPE and Intel. By storing indexes in DRAM and data on persistent storage (SSD), and by reading data directly from SSD, Aerospike provides unparalleled speed and cost-efficiency.
Accelerate Spark queries with Predicate Pushdown using Aerospike