A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
Updated Apr 20, 2024 - C++
An Efficient Pipelined Data Parallel Approach for Training Large Models
Hydrodynamic Cytoskeleton Simulator
HPC-optimised C++ CFD solver (D2Q9 LBM) featuring hybrid MPI/OpenMP parallelism, explicit AVX2 SIMD vectorisation, and flow validation against the von Kármán vortex street.
Multiple Sequence Aligner using hybrid parallel computing
Gaussian blur implementation comparing serial, MPI, and hybrid MPI+OpenMP performance.
High-performance n-body solver achieving 128× speedup via checkerboard domain decomposition, toroidal neighbor communication, and sweep-and-prune for scalable uniform and Gaussian workloads.