A Multi-GPU Implementation of a D2Q37 Lattice Boltzmann Code

Abstract
We describe a parallel implementation of a compressible Lattice Boltzmann code on a multi-GPU cluster based on Nvidia Fermi processors. We analyze how to optimize the algorithm for GP-GPU architectures, describe the implementation choices that we have adopted, and compare our performance results with an implementation optimized for latest-generation multi-core CPUs. Our program runs at approximately 30% of the double-precision peak performance of one GPU and shows almost linear scaling when run on the multi-GPU cluster.
Year
2012
Publication type
Other authors
Biferale, Luca and Mantovani, Filippo and Pivanti, Marcello and Pozzati, Fabio and Sbragaglia, Mauro and Scagliarini, Andrea and Schifano, Sebastiano Fabio and Toschi, Federico and Tripiccione, Raffaele
Volume title
PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I