Nepali Speech Recognition Project at Fusemachines

The Project

A team of young engineers from the Fusemachines AI Fellowship has been working on the “Nepali Automatic Speech Recognition (Nepali-ASR)” project. The system is expected to fluently transcribe Nepali speech into Devanagari lipi (the standard script for written Nepali) by recognizing the speaker’s dialect. The project, still in its initial phase, is already showing great potential. So far, the system has achieved early success in recognizing basic sets of Nepali words and suggesting accurate transcriptions for them.

One of the goals of the project is to allow more Nepali students to do research on speech recognition by building an open-source resource for it. Another goal is to provide speech recognition services that are optimized for specific business domains. For example, the end product will be used in customer care, allowing customers to simply speak voice commands that are transcribed into text. This will be especially helpful for those who aren’t fluent in English. People with physical impairments will also benefit from the product. Any task on a computer or other digital device that currently needs a keyboard could instead be done with this speech recognition system.

The Team

The team is composed of Miran Ghimire, Manish Chandra, and Digya Acharya, who have been working on the project for about a month now. Steven J Rennie, former team lead at IBM Watson (Multimodal Group), has been actively mentoring the team of recent graduates. Steven has more than twenty years of experience in the field of robust automatic speech recognition (ASR) and multi-task ASR. Miran and Manish joined through the AI Fellowship, whereas Digya has been working at Fusemachines for a year now as a Software Engineer Associate.

Digya recently joined the Automatic Nepali Speech Recognition Project and has been devoting her full attention to it. When asked “What led you to pursue a career path in AI?”, Digya says that she was always astonished by the way intelligent systems mimic human cognition. “For my degree, I selected AI from the elective courses, which gave me my first exposure to AI. I got to know about how various programs and robots can be taught to assist humans in so many tasks, which I thought was a very cool thing,” Digya adds.

Miran and Manish are among the few candidates from the AI Fellowship chosen to work on different AI-related projects. Manish used to wonder how machines could display intelligence. “I had an opportunity to work on an AI-related project during my engineering studies. While working on it, I got to know different aspects of AI, which really fascinated me and increased my enthusiasm for AI.” Miran also elaborated on his first encounter with AI: “Once in school, one of the teachers talked about how fifth-generation computers were going to have artificial intelligence. Since that moment, AI has always been a topic of fascination to me.” He later pursued AI in college, got to implement it in a project, and fell in love with the field soon after.

The Progress

The team has been training models for the project with Kaldi, an open-source automatic speech recognition toolkit. Step by step, they are collecting, processing, and cleaning data and preparing the phonetic data required for speech recognition and speech synthesis.

People from other departments at Fusemachines have also contributed to this project by recording the necessary training data for the ASR system, which has so far been restricted to political news. The team is developing the recognizer using Kaldi and training it on various audiobooks and news recitations to gather enough data to move on to the open domain. They are also working on preprocessing and noise reduction of the training data. The training of the model in Kaldi has been fully automated as well. The benchmark of success for the project is performing as well as or better than the recently released Google Cloud API. If completed successfully, this Nepali Speech Recognition system would spare users from having to share their data with Google.
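A comparison like this is usually made with word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The metric is an assumption here, since the project has not stated which one it will use. The following Python sketch shows how such a score could be computed; the function name and the sample sentences are illustrative only and are not taken from the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace, which holds for
    ordinary Devanagari text as well as for Latin-script text.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one wrong word out of three.
reference = "म घर जान्छु"
hypothesis = "म घर जान्छ"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Scoring two recognizers against the same set of reference transcripts with a function like this gives directly comparable numbers, where a lower WER means a more accurate system.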

The project is making progress each day, although at this point we cannot predict when it will be completed. Given the qualifications and dedication of the engineers, however, we are confident that the end product will push innovation in speech recognition technology forward.
