Inside the making of India's default internet interface
Bengaluru: Geeta Nikam, 38, speaks into her smartphone in Marathi as she makes her way through a bustling vegetable market in Hiware Bazar, a village in Maharashtra, looking for seeds for her farm. A first-time internet and mobile phone user, Nikam has never typed a word. Keyboards, especially in Indic scripts, feel alien.
In Ludhiana, a large textile manufacturer with crores of rupees in revenue spends his entire working day talking to people on his phone to get tasks done. A computer loaded with the world's business software is useless to him.
Nikam and the textile manufacturer are part of the 'non-typing majority' among India's 900 million internet users. These are primarily people from Tier II and Tier III cities and villages, where English is uncommon and digital literacy is just emerging. Their preference for voice communication highlights a fundamental need for new interaction methods.
"Nobody's typing in Gujarati or Marathi," says Abhishek Upperwal, founder of Soket AI Labs. Founded in 2019, Soket AI Labs is an AI research company developing multilingual large language models such as Pragna-1B for Indian languages. It is among the four startups selected by the government under its IndiaAI initiative to co-develop indigenous AI systems.
Voice remains the 'primary interface' for most Indians, says Upperwal. These users, including rural entrepreneurs, gig workers and homemakers, are reshaping India's internet, demanding tools that listen and respond in their native languages.
Those demands are slowly being addressed by AI startups. In the drought-prone villages of Maharashtra, for instance, farmers can now learn about crop insurance, credit eligibility and weather-resilient agriculture without reading a single word. They receive three-minute voice calls from bots deployed by a local non-banking financial company (NBFC), in partnership with a Bengaluru-based conversational AI firm.
The bot speaks in their own dialect, poses simple questions and delivers tailored advice. In one pilot, more than 15,000 farmers across 120 villages received weekly updates and 38% of them adopted new crop diversification strategies using this information, according to the company.
Voice AI is improving accessibility and also reshaping how information is delivered at scale. In Tamil Nadu, the same technology was used to deliver financial literacy and health education to over 12,000 women in 85 villages. Following the initiative, 59% of the participants opened their first savings accounts and 41% reported improved medical savings behaviour, the company states.
These pilots, in some ways, demonstrate how a voice-native internet might function. "India's voice-first internet will likely be a dynamic blend of multilingual, context-aware, and highly personalized experiences," says Ganesh Gopalan, CEO and cofounder of Gnani.ai.
From GUIs to voice
In 2019, a Google report said that Hindi had become the second-most used language globally on Google Assistant, just behind English, indicating how voice was gaining ground in multilingual markets such as India. Around 60% of Indian users interact with voice assistants on their smartphones, making voice a core part of everyday digital life.
A subsequent report by WATConsult found that 76% of Indian users were familiar with speech and voice-recognition technology, which reflects a natural shift in a country where smartphone access is high, but digital and linguistic literacy is not evenly distributed. A growing reliance on voice has emerged as a workaround to the limitations of Graphical User Interface (GUI)-based systems, which require users to be comfortable navigating English-language menus.
A GUI lets you interact with computers and devices using visuals such as buttons, icons and menus. This method of using computers started evolving in the 1970s with Xerox PARC's Alto. GUIs gained popularity with Apple's Macintosh in 1984 and then expanded through Microsoft Windows, transforming computing from complex text-based command systems into a more visual experience.
"GUIs ruled for decades because Apple and Microsoft made screens and clicks the global standard, but they're a poor fit for India's chaotic and voice-driven markets," says Tushar Shinde, founder of Vaani Research, an enterprise voice AI startup. "With millions of non-typing users juggling dialects and high-volume businesses, voice is the natural interface. It's how we've always connected," he adds.
"Many entrepreneurs are earning crores in revenue but barely use any software, because they spend most of their day just talking to people. The systems built for them were never in a format they found acceptable," says Shinde, referring to enterprise software built specifically for small and medium businesses. "That's where voice comes in." His startup builds voice agents for insurance, banking and healthcare clients.
Voice as an interface is now gaining prominence with advanced AI, as it offers hands-free convenience and challenges the long-standing dominance of GUIs in many everyday interactions.
This is especially relevant in markets such as India, where GUI-based systems that come designed with dropdown menus, buttons, toggles and input fields require a level of literacy and linguistic ease that many users simply don't have. Though Indic keyboards were introduced as an alternative for users like Nikam, they remain clunky and unreliable. Autocorrect often yields incorrect results and filling out forms in Hindi or Tamil becomes a frustrating ordeal.
Even targeted solutions such as Indus OS, founded by an IIT Bombay alumnus to create a multilingual app ecosystem, struggled to take off due to its heavy reliance on text navigation. In contrast, voice, which is rooted in India's oral culture, is now emerging as a more intuitive bridge to digital access.
Unlike GUI-based apps, voice systems deliver content naturally in the user's own words. A 2025 study cited by Gnani suggests that educational content delivered in local dialects results in 47% higher retention than standardized language formats.
Government push
India's voice AI sector is experiencing massive growth, driven by the country's linguistic diversity and increasing demand for voice-first digital interactions. Under the government's IndiaAI mission, launched in 2024 with a five-year budget of ₹10,372 crore, four startups, including Sarvam AI and Soket AI Labs, have been selected to build foundational AI models in India.
Sarvam AI has developed Sarvam-M, a 24-billion-parameter multilingual large language model trained on 10 Indian languages, aiming to enhance reasoning tasks such as mathematics, coding and multilingual comprehension. Despite early criticism, the model is recognized for its technical achievements in building AI infrastructure within India.
Gnani.ai specializes in voice-first agentic AI solutions and supports over 40 languages, including 12 Indian languages. The platform handles more than 30 million voice interactions daily, serving over 150 enterprises across India and the US.
In Gurugram, Soket AI Labs is commercializing its Realtime Speech API, which lets AI agents augment call centres with support for Hindi, Tamil and Marathi, addressing India's non-typing users. Last year, in one of its more ambitious endeavours, Soket AI Labs developed its foundational AI model, Pragna-1B (like OpenAI's ChatGPT), with a focus on Indian languages. But in the absence of venture capital funding for research efforts, Soket pivoted to building monetizable voice APIs for customer support, marketing and sales, which proved to be a more immediate route to revenue, one that still serves India's non-typing majority.
Startups building voice-based applications in India lack the kind of institutional support and infrastructure readily available to their western counterparts. As a result, most are building voice-first applications for business use-cases to continue to fund their foundational research journey and attract investments.
Tech chops and challenges
Building voice AI for India is as much a linguistic challenge as a technological one. "If there are five tokens for English, there will be almost 15 to 20 for Hindi," says Upperwal, pointing to the extra compute required for Indic languages.
A token is a unit of text, such as a word, subword or character, which language models use to process and generate language. The more tokens a sentence requires, the more memory and compute power the model needs to handle it, making Indian languages more resource-intensive to train.
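One contributor to this gap can be illustrated with a minimal Python sketch. It assumes a byte-level tokenizer, whose base units are the UTF-8 bytes of the text before any learned merges are applied; this is an illustration only, not Soket's method or any specific model's tokenizer, and the function name and sample words are chosen for the example. Devanagari characters each occupy three bytes in UTF-8, while basic English letters occupy one.

```python
def utf8_byte_tokens(text: str) -> list[int]:
    """Return the UTF-8 bytes of `text`.

    These are the base units a byte-level tokenizer starts from
    before any learned merges (an assumption of this sketch).
    """
    return list(text.encode("utf-8"))


english = "water"
hindi = "पानी"  # "water" in Hindi: 4 Unicode code points

for label, word in [("English", english), ("Hindi", hindi)]:
    units = utf8_byte_tokens(word)
    # Compare the number of characters with the number of base units.
    print(f"{label}: {len(word)} code points -> {len(units)} byte-level units")
```

Real tokenizers merge frequent byte sequences into larger tokens, so actual counts depend on how much text in each language the tokenizer was trained on; the sketch shows only the starting disadvantage for scripts that need several bytes per character.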
Soket introduced a novel tokenization method that significantly reduces this burden, making AI systems faster and more cost-effective for Indian languages.
Upperwal says that apart from core infrastructure, India's voice-first internet also relies on domain-specific intelligence tailored to Indian needs. "Sectors like education and law are especially underserved. Western LLM models like ChatGPT don't work well with Indian legal systems. It often mixes up Indian and US laws," he says, recalling how a legal AI startup ran into challenges while fine-tuning an open-source model after it began hallucinating hybrid jurisprudence.
Soket is now exploring collaborations with domain experts to build models from scratch, rooted in Indian context and vernacular data. "We're not domain experts, we build systems. But if startups come in with their expertise, we can train models together and open-source them for the ecosystem," he explains.
Indian startups still rely heavily on models from providers such as OpenAI and Deepgram, trained on Western datasets. These models often misinterpret names, accents or local nuances, especially in sectors such as healthcare or banking, where clarity is critical.
To close this gap, the IndiaAI mission has allocated subsidies and compute access to startups such as Soket and Vaani, encouraging them to build speech systems trained on Indian datasets. "These infrastructure breakthroughs don't just make AI cheaper. They make it accessible to speakers of Hindi, Gujarati or Marathi," says Shinde. "When an AI-enabled system can understand a manufacturer in a remote village asking for a loan, that's inclusion at the infrastructure level."
For Global South
Startups, meanwhile, are also rethinking how voice might reshape user behaviour. "Imagine booking a cab or shopping online with a single voice command," says Shinde. In low-bandwidth towns where GUIs fail to load, voice interfaces will allow people to interact with the web through language and not literacy. "We could go back to how people actually used to shop conversationally, through discovery, without filters and dropdowns," he adds.
According to Prashanth Prakash, founding partner at Accel, voice will become the default interface for India's most critical sectors and redefine how people interact with information. "In healthcare and education, voice won't replace doctors or teachers, but it will radically change explainability and access," he says. From appointment bookings to post-discharge summaries and classroom companions that personalize learning, voice interfaces offer a lower-friction and highly intuitive alternative. "Every doctor will have a co-pilot. Every workflow will start with voice," he adds.
The government thinks voice tools will play a key role in frontline service delivery. From crop advisories to citizen helplines, voice is being positioned as the cornerstone of public digital infrastructure, says Abhishek Singh, CEO of the IndiaAI mission.
"If a voice-enabled advisory tool trained on Indian datasets can help a farmer here, it'll likely work anywhere in the Global South," Singh adds.
Limitations with voice
Voice-based applications offer a natural bridge to India's non-typing internet users, but the path is riddled with complexity. India's diverse linguistic landscape acts as a hurdle in accurate speech recognition.
While voice bots are changing habits, making digital interaction easier for those unfamiliar with typing, accuracy drops sharply in noisy environments and with dialectal variations.
"Replacing GUIs with voice at scale in India faces major design and infrastructure challenges, particularly for first-time digital users," says Gopalan of Gnani.ai, pointing out that accurate voice recognition across diverse accents, dialects and languages remains a challenge.
Farmer Geeta Nikam says she uses voice commands to browse ecommerce websites but has yet to make a purchase on her device. India needs a system that gives someone like Nikam the confidence, ease and convenience not just to browse but to place an order successfully using voice commands. When a billion Indians like her are able to navigate the web effortlessly by speaking, just as others do by typing, India's internet will look and sound different.