Speech recognition-based technologies are considerably more developed in Western countries than in countries of the South Asian subcontinent such as India. India is a multilingual country (22 scheduled languages) with a population of over 1.3 billion, a large share of whom have difficulty with the user interfaces of modern technology; speech recognition tools are therefore very useful. In this paper, we propose LIFA (Language Identification From Audio), a fully automated tool that identifies the spoken language from phrases or words and invokes the language-specific recognition engine. Experiments were performed on more than 2200 hours of data from the 11 most widely spoken languages in India. The clips were parameterized with a novel linear predictive cepstral coefficient (LPCC)-based feature, which we call LPCC-Grade (LPCC-G). The proposed feature captures the distribution of energy across different frequency ranges in an audio clip for better classification while avoiding high-dimensionality issues. Using a random forest-based classifier, we achieved a highest accuracy of 99.01%. Further, we tested the robustness of the system under different noisy scenarios on multiple datasets, obtaining accuracies in the range of 79%-98%. We also compared against other popular existing features: accuracies of 96.37% and 92.48% were obtained for line spectral frequency (LSF)- and Mel-frequency cepstral coefficient (MFCC)-based features, respectively.
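For readers unfamiliar with LPCC, the following is a minimal sketch of how standard LPCC values can be computed for a single audio frame: autocorrelation, then the Levinson-Durbin recursion for the linear prediction coefficients, then the usual LPC-to-cepstrum recursion. It is not the LPCC-G feature proposed here, whose grading step is not described in this abstract. The function names, the prediction order, and the number of cepstral coefficients are illustrative assumptions.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_order from autocorrelation r.

    Uses the convention s[n] ~ sum_k a_k * s[n - k].
    Returns (coefficients, final prediction error).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients to cepstral coefficients c_1..c_n_ceps."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=10, n_ceps=13):
    """LPCC vector for one frame (order and n_ceps are illustrative defaults)."""
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)


if __name__ == "__main__":
    # Synthetic frame: a decaying sinusoid standing in for real speech samples.
    frame = [math.exp(-0.01 * t) * math.sin(0.3 * t) for t in range(256)]
    print(lpcc(frame))
```

Per-frame vectors of this kind would typically be aggregated over a clip and passed to a classifier; the abstract reports a random forest for that stage, but the aggregation and classifier settings are not specified here.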