
2 posts tagged with "large dataset"


14 min read
Neel Phadnis

(Source: Photo by Clem Onojeghuo on Unsplash)

While it is possible to process a data set using a large number of parallel streams, a higher degree of parallelism is not necessarily optimal, or even possible. This article explores how to think about parallelism and discusses the many bottlenecks that limit it. It also highlights the need to take measurements in the target setup, because many of the relevant factors cannot be easily quantified.

12 min read
Neel Phadnis

(Source: Photo by Dan Gold on Unsplash)

Aerospike provides several mechanisms for accessing large data sets over parallel streams, so that data delivery can keep pace with worker throughput in parallel computations. This article explains the key mechanisms and describes specific schemes for defining data splits, along with a framework for testing them.
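The idea of defining data splits can be illustrated with a minimal sketch. It assumes the splits are drawn from the 4096 partitions into which Aerospike distributes each namespace, and it divides them into contiguous, nearly equal ranges, one per parallel worker. The function `partition_splits` is hypothetical; it is not necessarily the scheme the article itself describes.

```python
# Aerospike distributes the records of a namespace across 4096 partitions.
N_PARTITIONS = 4096


def partition_splits(n_workers, n_partitions=N_PARTITIONS):
    """Return one (begin, count) partition range per worker (hypothetical helper).

    The ranges are contiguous and together cover every partition exactly once.
    The first `n_partitions % n_workers` workers receive one extra partition,
    so range sizes differ by at most one.
    """
    if n_workers < 1 or n_workers > n_partitions:
        raise ValueError("n_workers must be between 1 and n_partitions")
    base, extra = divmod(n_partitions, n_workers)
    splits = []
    begin = 0
    for i in range(n_workers):
        count = base + (1 if i < extra else 0)
        splits.append((begin, count))
        begin += count
    return splits


if __name__ == "__main__":
    # Print the range assigned to each of three workers.
    for begin, count in partition_splits(3):
        print(begin, count)
```

Each `(begin, count)` pair could then be handed to one worker's scan or query, for example through a partition-range filter such as `PartitionFilter.range(begin, count)` in the Aerospike Java client. Even-sized ranges are only a starting point. Whether more workers actually help depends on the bottlenecks in the target setup, so measure before settling on a split count.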