Passive-container sieve: computing prime coordinates on a mod-60 grid rather than walking segments

5 days ago 4
ARTICLE AD BOX

I've implemented a prime sieve where segments are passive containers rather than active search spaces. Instead of walking through each segment to mark multiples, a function (ricerca_ciclo) computes directly the coordinates of each prime's multiples within the current window — the segment is written by address, never traversed.

The grid is mod-60 with 16 active residues. Each window is bit-packed into a uint64_t array (~32 KB). The parallel structure:

#pragma omp parallel reduction(+:contatore_totale) { std::vector<uint64_t> maschera_bit(array_size, 0); #pragma omp for for (int cicli = 1; cicli < quanti_cicli; cicli++) { std::fill(maschera_bit.begin(), maschera_bit.end(), 0); // ricerca_ciclo deposits coordinates directly — no segment walk for (uint64_t val : maschera_bit) ones_nel_segmento += std::popcount(val); contatore_totale += total_bits - ones_nel_segmento; } }

Questions:

Is std::fill the most cache-friendly reset for a ~32 KB uint64_t vector, or would memset / SIMD be preferable?

False sharing risk with per-thread heap-allocated vectors?

Would std::array with compile-time size outperform std::vector here?

Full implementation: https://github.com/Claugo/GC-60-sieve-wheel

Looking for collaborators to test on high-core-count hardware — issues and PRs welcome.

Read Entire Article