GPU-based Parallel Technique for Solving the N-Similarity Problem in Textual Data Mining

Mahmood Fazlali, Mina Mirhosseini, Mahyar Shahsavari, Alex Shafarenko, Mashaallah Mashinchi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An important problem in data mining and information retrieval is multiple similarity, or n-similarity: finding the group of n data points with the highest mutual similarity within a large dataset. Exact methods for this problem exist but have high time and space complexity. Various metaheuristic algorithms have also been proposed, including genetic algorithms, gravitational search algorithms, particle swarm optimization, imperialist competitive algorithms, and fuzzy imperialist competitive algorithms. These metaheuristics can find near-optimal solutions within a reasonable time, but they do not guarantee exact results. In this paper, we use CUDA to parallelize the exact method. We conduct experiments on textual datasets to identify the group of n textual documents with the highest similarity to each other. The experimental results show that the proposed parallel exact method significantly reduces execution time compared to both the best sequential approach and a multi-core CPU implementation. The results also indicate that the proposed method requires less memory than the exact method.
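
To make the problem statement concrete, the following is a minimal sequential sketch of an exhaustive (exact) n-similarity search. It is not the authors' implementation. It assumes, purely for illustration, that documents are given as term-count vectors, that pairwise similarity is cosine similarity, and that the similarity of a group is the sum of its pairwise similarities; the names `cosine` and `most_similar_group` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_group(vectors, n):
    """Exhaustively find the n vectors with the largest summed pairwise similarity.

    Assumption: group similarity = sum of pairwise cosine similarities.
    Returns (tuple of indices, score).
    """
    m = len(vectors)
    if not 2 <= n <= m:
        raise ValueError("n must satisfy 2 <= n <= number of vectors")

    # Precompute the symmetric pairwise similarity matrix once.
    sim = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])

    # Score every n-subset; there are C(m, n) of them, which is what
    # makes the exact approach expensive for large datasets.
    best_group, best_score = None, float("-inf")
    for group in combinations(range(m), n):
        score = sum(sim[i][j] for i, j in combinations(group, 2))
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score


# Toy term-count vectors for five hypothetical documents.
docs = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
group, score = most_similar_group(docs, 3)
print(group, round(score, 3))
```

In this sketch each candidate group is scored independently of the others, which is the kind of workload that could in principle be distributed across GPU threads; how the paper actually partitions the work in CUDA is not described in the abstract.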
Original language: English
Title of host publication: 2024 Third International Conference on Distributed Computing and High Performance Computing (DCHPC)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Chapter: 4
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 979-8-3503-8158-0
Publication status: Published - 4 Mar 2024
Event: 2024 Third International Conference on Distributed Computing and High Performance Computing (DCHPC) - Tehran, Iran, Islamic Republic of
Duration: 14 Apr 2024 to 15 Apr 2024
Conference number: 3
https://www.iahpc.ir/

Conference

Conference: 2024 Third International Conference on Distributed Computing and High Performance Computing (DCHPC)
Abbreviated title: DCHPC 2024
Country/Territory: Iran, Islamic Republic of
City: Tehran
Period: 14/04/24 to 15/04/24
Other: The DCHPC2024 biannual conference is organized jointly by the Institute for Research in Fundamental Sciences (IPM) and the Informatic Society of Iran (DCS scientific group), following the successful conferences held in 2018 and 2022 (www.iahpc.ir).

Keywords

  • multiple similarity
  • n-similarity
  • parallel programming
  • text document similarity
