India-made app turns impaired speech into clear speech in near real time

A whisper. A few words of small talk. For those living with dysarthria, a motor speech disorder, basic communication is a challenge that profoundly affects both their professional and personal lives. But a new innovation based on artificial intelligence (AI), developed in India, could be life-changing.
A team led by associate professor Vineet Gandhi from the International Institute of Information Technology (IIIT), Hyderabad, has developed a simple app that helps people speak by transforming their voice in near real time. The app can convert slurred speech into clear, natural-sounding speech, or use a camera to analyze lip movements and subtle throat vibrations to generate intelligible speech.
While the current work is in English, the team’s next goal is to bring these technologies to regional languages including Hindi, Telugu and Tamil, so that more people across the country can benefit from accessibility-oriented artificial intelligence models. For this work, Mr. Gandhi won the Anusandhan National Research Foundation (ANRF) award in 2026.
Excerpts from an interview:
What inspired you to start working on this human AI project?
My research has always been guided by a simple question: What real problem can technology help solve?
Although my academic training was predominantly in computer vision, about four years ago I began to see the exciting possibilities emerging in speech research and decided to explore the field in more depth. I became increasingly aware of the challenges faced by many people who lose the ability to speak due to medical conditions. The impact of this loss extends far beyond communication, affecting independence, identity, and connection.
Recognizing this need inspired me to focus my work on accessibility-focused technologies designed to restore or enable speech, with the goal of helping people regain their voices.
Can you explain how the app works for people with speech disabilities?
The app is designed to convert slurred or garbled speech into clear, natural-sounding speech with a delay of just a few hundred milliseconds. The user simply speaks in their own voice, and the system processes this to produce intelligible speech for the listener.
We are also developing a complementary lip-to-speech capability, where the person can silently move their lips and the system generates the corresponding speech.
A key aspect we focus on is personalization: users can calibrate the app to their own voice, and improve its accuracy, by reading a few minutes of text within the app.
We aim to integrate these technologies into commonly used communication platforms, including web-based applications, to facilitate daily communication for people with speech disabilities.
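To make the idea of near-real-time conversion concrete, here is a minimal sketch of the kind of low-latency loop such an app could be built around: audio is captured in short chunks, passed through a conversion model, and played back within a fraction of a second. The convert_chunk function is a hypothetical placeholder, not the team’s actual model, and the sounddevice-based audio handling is only one possible way to wire it up.

    import numpy as np
    import sounddevice as sd

    SAMPLE_RATE = 16000   # speech models commonly work at 16 kHz
    CHUNK_SECONDS = 0.5   # process speech in half-second windows

    def convert_chunk(chunk: np.ndarray) -> np.ndarray:
        # Hypothetical stand-in for a trained dysarthric-to-clear speech model.
        # A real system would run its conversion network on this window.
        return chunk

    def run():
        frames = int(SAMPLE_RATE * CHUNK_SECONDS)
        with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as mic, \
             sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as out:
            while True:
                chunk, _ = mic.read(frames)    # capture a short window of speech
                clear = convert_chunk(chunk)   # convert it to intelligible speech
                out.write(clear)               # play it back with sub-second delay

    if __name__ == "__main__":
        run()

In practice, the chunk length is a trade-off: shorter windows mean lower delay, while longer windows give the model more context to work with.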
You also aim to expand this technology to regional Indian languages. How do you hope to achieve this?
Currently, much of the global speech technology ecosystem is designed predominantly for English, and our early experiments naturally followed suit. But one of the main goals of our research is to extend these capabilities to regional Indian languages, where accessible speech technologies are equally important.
To achieve this, we plan to collect speech data across Indian languages and develop data-efficient models suitable for low-resource scenarios. Our approach involves data augmentation and efficient fine-tuning of pre-trained models.
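As an illustration of what data-efficient fine-tuning can look like, one common pattern is to freeze a large pretrained speech encoder and train only a small output layer on the limited regional-language data. The PyTorch sketch below follows that generic pattern under assumed names; it is not the team’s actual architecture or code.

    import torch
    import torch.nn as nn

    class LowResourceHead(nn.Module):
        # Wrap a pretrained speech encoder and train only a small output layer,
        # so a few hours of regional-language data can go a long way.
        def __init__(self, encoder: nn.Module, hidden_dim: int, vocab_size: int):
            super().__init__()
            self.encoder = encoder
            for p in self.encoder.parameters():
                p.requires_grad = False                     # keep pretrained weights fixed
            self.head = nn.Linear(hidden_dim, vocab_size)   # the only trainable part

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            with torch.no_grad():
                h = self.encoder(features)   # reuse the multilingual representations
            return self.head(h)              # map them to target-language tokens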
We have already conducted preliminary experiments in Hindi with promising results, and with support from the Anusandhan National Research Foundation, we aim to further develop this work and extend it to other Indian languages.
You believe that “accessibility and linguistic diversity” are crucial for AI research in India. Can you elaborate?
Accessibility and linguistic diversity are key considerations for AI research in India. Having spent several years in Europe, I observed that accessibility is integrated far more systematically there into public infrastructure and digital services.
In India, by contrast, there are still significant gaps even in public spaces such as railway stations, where basic accessibility is often limited. This underscores a broader need to design technologies that are consciously inclusive of people with disabilities.
At the same time, India’s linguistic diversity reveals another important dimension. In many parts of the country, especially in rural areas, conversation remains the most natural and basic form of interaction. Text-heavy or typing-based interfaces may not always be practical or inclusive in such contexts. Therefore, AI systems designed for India need to prioritize conversational interaction and support multiple regional languages.
Taken together, meaningful accessibility and strong support for linguistic diversity are essential for digital technologies to be truly inclusive and widely used across the country.
The WHO has said that “the future of health is digital”…
The World Health Organization has emphasized that healthcare will become increasingly digital. In a country like India, telemedicine can play a transformative role, especially when supported by basic diagnostic infrastructure that allows for more accurate remote consultations at the local level.
Another important aspect is AI-assisted diagnosis, where machine learning systems analyze medical images, speech or health records to support early disease detection and prediction.
Practical solutions are already emerging. For example, ‘Shishu Maapan’, developed by Wadhwani AI, helps measure a newborn’s weight and size from mobile phone photos alone, and is being adopted by frontline healthcare workers such as ASHAs.
Digital tools also enable assistive health technologies, including speech restoration systems for people who have lost the ability to speak, and wearable devices that constantly monitor health parameters and alert doctors to possible abnormalities. These developments show how digital innovation can make healthcare more accessible and scalable.
A common criticism of AI-generated speech is that, although it is understandable, it often fails to capture the unique rhythm of the speaker. When restoring voice to someone with dysarthria, how do you balance the need for clear communication with the need to preserve the user’s own vocal identity?
This is an important concern. If recordings of the speaker’s original voice before the onset of dysarthria are available, modern voice cloning techniques can recreate that voice from as little as 10 seconds of speech. So preserving an individual’s vocal identity is technically possible today, and there is a significant body of research demonstrating this. However, our current implementation focuses primarily on restoring the intelligibility of content and ensuring that what the user wants to say is clearly communicated. For now, the generated speech uses a common voice rather than a personalized one.
However, text-to-speech systems are becoming increasingly natural and are now being integrated into conversational bots that are replacing many traditional customer service applications. As we discussed in our previous work on empathetic speech production, emotional nuance remains more challenging, but progress is rapid.
How does the model distinguish between slurred speech and a noisy background when navigating, say, a busy street in India?
This is indeed a significant challenge in India, where real-world environments can be extremely chaotic. Anyone who has thought about putting self-driving cars on our roads quickly realizes how unpredictable they can be: traffic patterns, honking horns, pedestrians and vehicles all interact in highly dynamic ways. Speech technology faces a similar level of complexity.
In our experiments, we improve robustness through noise augmentation, where we simulate different noisy environments during training so that the model learns to handle background noise. Ultimately, the most effective solution is to collect more real-world data from noisy environments and train on it. Even then, some performance degradation is inevitable, because distinguishing impaired speech from heavy background noise is a fundamentally difficult problem.
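For illustration, this style of noise augmentation typically amounts to mixing recorded background noise into clean training speech at a controlled signal-to-noise ratio. The NumPy sketch below shows the basic arithmetic; it is a generic example rather than the team’s training code.

    import numpy as np

    def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        # Mix background noise into clean speech at a chosen signal-to-noise ratio,
        # so the model sees street-like acoustic conditions during training.
        noise = np.resize(noise, speech.shape)   # loop or trim the noise to match length
        speech_power = np.mean(speech ** 2) + 1e-12
        noise_power = np.mean(noise ** 2) + 1e-12
        target_noise_power = speech_power / (10 ** (snr_db / 10))
        return speech + noise * np.sqrt(target_noise_power / noise_power)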
divya.gandhi@thehindu.co.in



