TY - JOUR
T1 - Swarm Intelligence-based Hierarchical Clustering for Identification of ncRNA using Covariance Search Model
AU - Pratiwi, Lustiana
AU - Choo, Yun Huoy
AU - Muda, Azah Kamilah
AU - Pratama, Satrya Fajri
N1 - © 2022,International Journal of Advanced Computer Science and Applications. All Rights Reserved.
PY - 2022
Y1 - 2022
N2 - Covariance Model (CM) has been quite effective in finding potential members of existing families of non-coding Ribonucleic Acid (ncRNA) identification and has provided excellent accuracy in genome sequence database. However, it has significant drawbacks with family-specific search. An existing Hierarchical Agglomerative Clustering (HAC) technique merged overlapping sequences which is known as combined CM (CCM). However, the structural information will be discarded, and the sequence features of each family will be significantly diluted as the number of original structures increases. Additionally, it can only find members of the existing families and is not useful in finding potential members of novel ncRNA families. Furthermore, it is also important to construct generic sequence models which can be used to recognize new potential members of novel ncRNA families and define unknown ncRNA sequence as the potential members for known families. To achieve these objectives, this study proposes to implement Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) to ensure the CCMs have the best quality for every level of dendrogram hierarchy. This study will also apply distance matrix as the criteria to measure the compatibility between two CMs. The proposed techniques will be using five gene families with fifty sequences from each family from Rfam database which will be divided into training and testing dataset to test CMs combination method. The proposed techniques will be compared to the existing HAC in terms of identification accuracy, sum of bit-scores, and processing time, where each of these performance measurements will be statistically validated.
AB - Covariance Model (CM) has been quite effective in finding potential members of existing families of non-coding Ribonucleic Acid (ncRNA) identification and has provided excellent accuracy in genome sequence database. However, it has significant drawbacks with family-specific search. An existing Hierarchical Agglomerative Clustering (HAC) technique merged overlapping sequences which is known as combined CM (CCM). However, the structural information will be discarded, and the sequence features of each family will be significantly diluted as the number of original structures increases. Additionally, it can only find members of the existing families and is not useful in finding potential members of novel ncRNA families. Furthermore, it is also important to construct generic sequence models which can be used to recognize new potential members of novel ncRNA families and define unknown ncRNA sequence as the potential members for known families. To achieve these objectives, this study proposes to implement Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) to ensure the CCMs have the best quality for every level of dendrogram hierarchy. This study will also apply distance matrix as the criteria to measure the compatibility between two CMs. The proposed techniques will be using five gene families with fifty sequences from each family from Rfam database which will be divided into training and testing dataset to test CMs combination method. The proposed techniques will be compared to the existing HAC in terms of identification accuracy, sum of bit-scores, and processing time, where each of these performance measurements will be statistically validated.
KW - Covariance model
KW - Hierarchical clustering
KW - Ncrna identification
KW - Swarm intelligence
UR - http://www.scopus.com/inward/record.url?scp=85143853952&partnerID=8YFLogxK
U2 - 10.14569/IJACSA.2022.0131195
DO - 10.14569/IJACSA.2022.0131195
M3 - Article
AN - SCOPUS:85143853952
SN - 2158-107X
VL - 13
SP - 822
EP - 831
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 11
ER -