TY - JOUR
T1 - Clustering Persian viseme using phoneme subspace for developing visual speech application
AU - Aghaahmadi, Mohammad
AU - Dehshibi, Mohammad Mahdi
AU - Bastanfard, Azam
PY - 2013/8
Y1 - 2013/8
N2 - There are numerous multimedia applications such as talking head, lip reading, lip synchronization, and computer assisted pronunciation training, which entices researchers to bring clustering and analyzing viseme into focus. With respect to the fact that clustering and analyzing visemes are language dependent process, we concentrated our research on Persian language, which indeed has suffered from the lack of such study. To this end, we proposed a novel adopting image-based approach which consists of four main steps including (a) extracting the lip region, (b) obtaining Eigenviseme of each phoneme considering coarticulation effect, (c) mapping each viseme into its subspace and other phonemes' subspaces in order to create the distance matrix so as to calculate the distance between viseme's cluster, and finally (d) comparing similarity of each viseme based on the weight value of reconstructed one. In order to indicate the robustness of the proposed algorithm, three sets of experiments were conducted on Persian and English databases in which Consonant/Vowel and Consonant/Vowel/Consonant syllables were examined. The results indicated that the proposed method outperformed the observed state-of-the-art algorithms in feature extraction, and it had a comparable efficiency in generating adequate clusters. Moreover, obtained results reached a milestone in grouping Persian visemes with respect to the perceptual test given by volunteers.
AB - There are numerous multimedia applications such as talking head, lip reading, lip synchronization, and computer assisted pronunciation training, which entices researchers to bring clustering and analyzing viseme into focus. With respect to the fact that clustering and analyzing visemes are language dependent process, we concentrated our research on Persian language, which indeed has suffered from the lack of such study. To this end, we proposed a novel adopting image-based approach which consists of four main steps including (a) extracting the lip region, (b) obtaining Eigenviseme of each phoneme considering coarticulation effect, (c) mapping each viseme into its subspace and other phonemes' subspaces in order to create the distance matrix so as to calculate the distance between viseme's cluster, and finally (d) comparing similarity of each viseme based on the weight value of reconstructed one. In order to indicate the robustness of the proposed algorithm, three sets of experiments were conducted on Persian and English databases in which Consonant/Vowel and Consonant/Vowel/Consonant syllables were examined. The results indicated that the proposed method outperformed the observed state-of-the-art algorithms in feature extraction, and it had a comparable efficiency in generating adequate clusters. Moreover, obtained results reached a milestone in grouping Persian visemes with respect to the perceptual test given by volunteers.
KW - Audio/visual processing
KW - Computer assisted pronunciation training
KW - Eigen space
KW - Multimedia systems
KW - Persian viseme clustering
UR - http://www.scopus.com/inward/record.url?scp=84878341066&partnerID=8YFLogxK
U2 - 10.1007/s11042-012-1128-7
DO - 10.1007/s11042-012-1128-7
M3 - Article
AN - SCOPUS:84878341066
SN - 1380-7501
VL - 65
SP - 521
EP - 541
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 3
ER -