- maintain load balance and low synchronization
- exploration of architectural features, e.g. vectorization, synchronization (mutexes, compare-and-swap, transactional memory, privatization), and management of high-bandwidth memory (MCDRAM).
- Platform: One KNL processor
- Speedup: 1.8x over a dual-socket, 44-core Intel Xeon system.
- HPC systems are increasingly used for data-intensive computations that exhibit irregular memory accesses, non-uniform work distributions, large memory footprints, and high memory bandwidth demands.
- sparse, unstructured tensors
- Challenges of optimizing algorithms on many-core processors: exposing a high degree of parallelism, load-balancing tens to hundreds of parallel threads, and effectively utilizing the high-bandwidth memory.
- FROSTT [Link]
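
To make two of the synchronization options listed above concrete (mutexes vs. privatization), here is a minimal sketch of a scatter-add, the kind of conflicting update that arises when many threads accumulate sparse nonzeros into shared output rows. It is illustrative only and not the paper's implementation: the function names, the `(row, value)` input layout, and the lock-pool size are assumptions, and Python threads show the structure of each strategy rather than real many-core speedup.

```python
import threading


def _split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    step = (len(items) + parts - 1) // parts
    return [items[i * step:(i + 1) * step] for i in range(parts)]


def scatter_add_locked(nonzeros, num_rows, num_threads=4, num_locks=8):
    """Mutex strategy (hypothetical helper): one shared output, guarded by a lock pool."""
    out = [0.0] * num_rows
    locks = [threading.Lock() for _ in range(num_locks)]

    def worker(chunk):
        for row, val in chunk:
            # Hash the row to one lock in the pool; rows sharing a lock serialize.
            with locks[row % num_locks]:
                out[row] += val

    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in _split(nonzeros, num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def scatter_add_privatized(nonzeros, num_rows, num_threads=4):
    """Privatization strategy (hypothetical helper): per-thread buffers, then a reduction."""
    private = [[0.0] * num_rows for _ in range(num_threads)]

    def worker(chunk, tid):
        buf = private[tid]  # only this thread writes to buf, so no locking is needed
        for row, val in chunk:
            buf[row] += val

    threads = [threading.Thread(target=worker, args=(chunk, tid))
               for tid, chunk in enumerate(_split(nonzeros, num_threads))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reduction step: sum the private copies element-wise into the final output.
    return [sum(col) for col in zip(*private)]
```

The trade-off the sketch exposes: the lock pool keeps memory use at one output buffer but serializes conflicting updates, while privatization removes all synchronization from the inner loop at the cost of `num_threads` copies of the output plus a final reduction, which matters when the memory footprint is already large.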