Spoken Term Detection (STD) approaches can be divided into two main groups: Hidden Markov Model (HMM)-based approaches and Discriminative STD (DSTD) approaches. An important advantage of HMM-based methods is that they can use context-dependent (diphone or triphone) information to improve overall STD system performance. The lack of triphone information, by contrast, is a significant drawback of DSTD methods. In this paper, we propose a solution to this drawback. To this end, we modify the feature extraction part of an Evolutionary DSTD (EDSTD) system so that it takes triphone information into account. First, we propose a monophone-based feature extraction part for the EDSTD system. Then, we propose an approach for exploiting triphone information in the EDSTD system. Results on the TIMIT database indicate that, at more than two false alarms per keyword per hour, the true detection rate of the triphone-based EDSTD (Tph-EDSTD) system is about 3% higher than that of the monophone-based EDSTD (Mph-EDSTD) system. This improvement comes at the cost of about a 36% degradation in system response speed, which we treat as negligible.
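To illustrate what context-dependent (triphone) information means here, the following minimal Python sketch expands a monophone sequence into triphone labels. It is not the feature extraction used in the EDSTD system. The function name `to_triphones`, the `left-center+right` label format, and the `sil` boundary symbol are assumptions chosen for illustration.

```python
from typing import List


def to_triphones(phones: List[str], boundary: str = "sil") -> List[str]:
    """Expand a monophone sequence into left-center+right triphone labels.

    Hypothetical helper: the label format and the boundary symbol are
    illustrative assumptions, not taken from the paper.
    """
    # Pad both ends so the first and last phones also get a context.
    padded = [boundary] + list(phones) + [boundary]
    # One label per original phone: left neighbor, center, right neighbor.
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    # Monophone sequence for an example keyword.
    print(to_triphones(["k", "ae", "t"]))
```

Each phone keeps its identity but is labeled together with its neighbors, so the same phone in different contexts maps to different units. This is the kind of information a monophone-only representation discards.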