AMG based on compatible weighted matching for GPUs

We describe the main issues found in the design of an efficient implementation, tailored to GPGPUs, of an Algebraic MultiGrid (AMG) preconditioner recently proposed by one of the authors and already available for CPU in the open-source BootCMatch code. The AMG method relies on a new approach for coarsening sparse symmetric positive definite matrices, which we refer as coarsening based on compatible weighted matching. It exploits maximum weight matching in the adjacency graph of the sparse matrix and the principle of compatible relaxation to define a pairwise aggregation of unknowns. The aggregates are unknown pairs coupled in a maximum product matching of the original sparse matrix graph with suitable edge weights. The final aim is to enhance the diagonal dominance of a matrix that is the hierarchical complement of the resulting coarse matrix with respect to a given scalar product, thereby improving the convergence properties of a corresponding compatible relaxation scheme. The matched nodes are aggregated to form coarse set of unknowns, and piecewise constant or smoothed interpolation operators are applied for the construction of a multigrid hierarchy. No reference is made to any a priori knowledge on the matrix origin and possible information about smooth errors are used to define edge weights assigned to the original matrix graph. More large aggregates can be obtained by combining multiple steps of the basic pairwise aggregation. The most demanding kernel in this type of coarsening is the computation of an efficient maximum product matching. Accurate solutions for computation of maximum product matching in a graph are based on the Hungarian algorithm to search optimal augmenting paths in the matrix between unmatched vertices. This algorithm is a sequential process representing a roadblock in the search for an efficient parallel computation of a maximum weight matching. In our attempt to exploit high computing power of GPUs in the design of a parallel coarsening based on compatible weighted matching, we adopt an approximate solution widely used in related fields, such as in coarsening strategies for multilevel data partitioning as well as in scaling and permuting sparse matrices for efficient parallel direct solvers. We show that approximate solutions based on a recently proposed multithreaded matching algorithm, referred as the suitor algorithm, allow us to obtain good quality coarse matrices for our AMG on GPUs, largely reducing run times with respect to the original sequential algorithm implemented in BootCMatch. We will show results on a large set of sparse matrices arising from discretization of partial differential equations as well as from Laplacian operators of general complex graphs.
Tipo pubblicazione
Altri Autori
Massimo Bernaschi
Pasqua D;Ambra
Dario Pasquini