Abstract
The Louvain community detection algorithm is a hierarchal clustering method categorized in the NP-hard problem. Its execution time to find communities in large graphs is, therefore, a challenge. Parallelization is an effective solution for amortizing Louvain's execution time. In this paper, we propose an adaptive CUDA Louvain method (ACLM) algorithm that benefits from the graphic processing unit (GPU). ACLM uses the shared memory in GPU, as well as the optimal number of threads in the GPU blocks. These features minimize parallelization overhead and accelerate the calculation of modularity parameters. The proposed algorithm allocates threads to each block based on the number of required streaming multiprocessors (SMs) and warps on GPU. The implementation results show that ACLM can effectively accelerate the execution time by 77% compared to the competitive method in the large graph benchmarks.
Original language | English |
---|---|
Pages (from-to) | 6056-6077 |
Number of pages | 22 |
Journal | Journal of Supercomputing |
Volume | 77 |
Issue number | 6 |
DOIs | |
Publication status | Published - Jun 2021 |
Keywords
- CUDA
- Louvain community detection
- Modularity
- Parallel processing