A great deal of research is devoted to designing speech recognition systems that are robust to speech variability. One source of variability is the difference between telephony speech and direct speech (recorded in noise-free conditions). In this paper, a set of experiments shows that LHCB parameters are superior to traditional MFCCs when used in a neural-network-based speech recognition system, for both direct and telephony speech. LHCBs are then extracted from direct and telephony speech and an MLP-based recognition model is trained on them, yielding a recognition system for both direct and telephony speech. Using neural network inversion based on gradient descent, the telephony speech feature vectors are modified toward the direct speech feature vectors; training a second network on the modified telephony and direct speech feature vectors improved the recognition rate by 1.4%. Next, using a general neural network inversion method, both telephony and direct speech feature vectors are modified so that they mainly contain phonetic information rather than other speech variations. Training the second neural network on this dataset raised the recognition rate by 2.98% for direct speech and 1.68% for telephony speech.
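The gradient-descent inversion mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the network size, the random weights, the squared-error objective, the learning rate, and the names `forward`, `invert_input` and `loss` are all assumptions chosen for illustration. The idea shown is only the general one: the trained weights stay fixed and the input feature vector is adjusted along the negative gradient of the output error.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP (assumed architecture): tanh hidden units, linear outputs."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b
         for row, b in zip(W2, b2)]
    return h, y

def loss(x, target, W1, b1, W2, b2):
    """Squared error between the network output for x and the target output."""
    _, y = forward(x, W1, b1, W2, b2)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

def invert_input(x0, target, W1, b1, W2, b2, lr=0.02, steps=500):
    """Gradient descent on the INPUT vector; the weights are never changed."""
    x = list(x0)
    for _ in range(steps):
        h, y = forward(x, W1, b1, W2, b2)
        err = [yi - ti for yi, ti in zip(y, target)]            # dE/dy
        # Back through the linear output layer: dE/dh_j = sum_k err_k * W2[k][j]
        dh = [sum(err[k] * W2[k][j] for k in range(len(err)))
              for j in range(len(h))]
        # Back through tanh: dE/da_j = dE/dh_j * (1 - h_j^2)
        da = [d * (1.0 - hj * hj) for d, hj in zip(dh, h)]
        # Back to the input: dE/dx_i = sum_j da_j * W1[j][i]
        dx = [sum(da[j] * W1[j][i] for j in range(len(da)))
              for i in range(len(x))]
        x = [xi - lr * g for xi, g in zip(x, dx)]
    return x

# Toy demonstration with random (untrained) weights; sizes are arbitrary.
rng = random.Random(0)
n_in, n_hid, n_out = 4, 5, 3
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [0.0] * n_out

x_ref = [rng.uniform(-1, 1) for _ in range(n_in)]    # stands in for a "direct" vector
_, target = forward(x_ref, W1, b1, W2, b2)           # desired network output
x_start = [rng.uniform(-1, 1) for _ in range(n_in)]  # stands in for a "telephony" vector

initial = loss(x_start, target, W1, b1, W2, b2)
x_mod = invert_input(x_start, target, W1, b1, W2, b2)
final = loss(x_mod, target, W1, b1, W2, b2)
```

In this sketch the modified vector `x_mod` produces an output closer to the target than the starting vector did. How the target is chosen and how the modified vectors are used to train the second network are specific to the paper and are not reproduced here.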
This article presents a bidirectional neural network, inspired by a functional model of the neocortex, to improve neural network models for face recognition from a single image per person. In the proposed model, recognition is not performed in a single stage but in two phases, bottom-up and top-down, and the recognition results of the first stage are used to adapt the model. We applied this adaptive model in combination with the clustering person and pose information technique to separate person and pose information and to estimate the corresponding manifolds. To increase the number of training samples for the classifier neural network, virtual views of the frontal images in the test dataset are synthesized using the estimated manifolds. Training the classifier network on the virtual images obtained from the bidirectional network gives an accuracy of 85.45% on the test dataset, a 1.82% improvement in face recognition accuracy over training the classifier on virtual images obtained from the clustering person and pose information network.
In this work, in order to increase the capacity of a recurrent neural network, we present a model for extracting common features and sharing them across data. With this model, the extracted principal components of the data are invariant to unwanted variations. The recurrent connection of the network removes noise using a continuous attractor formed during the training phase. The defined speaker codes are transformed into the information needed to switch the continuous attractor in the input space. As a result, speaker variations can be compensated, and recognition is performed once a clean signal is available. We compared the performance of this method with a reference network described in the paper. The results show that the proposed model performs better at removing noise and unwanted variations: it increased phoneme recognition accuracy by about 5% at a signal-to-noise ratio of 0 dB.
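For readers unfamiliar with the 0 dB condition, the following sketch shows what a signal-to-noise ratio of 0 dB means numerically: signal and noise have equal average power. The helper names `power` and `mix_at_snr`, the sine-wave "signal" and the Gaussian noise are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(signal, scaled)]
    return noisy, scaled

rng = random.Random(1)
signal = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]  # toy signal
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]                    # toy noise

noisy, scaled_noise = mix_at_snr(signal, noise, snr_db=0.0)
measured_snr_db = 10 * math.log10(power(signal) / power(scaled_noise))
```

At 0 dB the scaled noise carries as much power as the signal itself, which is why this is generally regarded as a demanding test condition.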