The Two-Rupee Voice
Here is a fact about Indian AI in 2026:
A full voice conversation in Hindi — speech-to-text, a language model that thinks for a second, text-to-speech that talks back, and a translation step in the middle — now costs roughly ₹1.85 to ₹2.00 end-to-end. Call it two rupees. That is about two and a half minutes of MGNREGA wage under the FY 2025–26 wage notification. It is also one-fifteenth of what the same conversation would have cost on a frontier-vendor API in mid-2024, when GPT-4-class tokenizers were charging Hindi speakers four to six tokens per word — versus 1.4 for English — for the exact same meaning. The “token tax.”
The token tax is dead. An Indian-built 2-billion-parameter model called Sarvam-1, trained from scratch on 4,096 H100s on Yotta’s Shakti cloud, dropped Hindi token fertility to 1.4–2.1. Saaras V3, an Indian-built ASR system, reaches 19.31% word error rate on the IndicVoices 10-language test set at ₹30 per hour of audio. Bulbul, an Indian-built TTS system, costs ₹15 per 10,000 characters. The compute underneath all this is, courtesy of the IndiaAI Mission, ₹65–92 per H100-hour with a 40% subsidy that drops the effective rate to about ₹40/hour for empanelled users. A leading commercial Indian neocloud charges around ₹249/hour for the same chip. AWS Mumbai charges roughly ₹330. The Indian government, when it wants a Hindi sentence to come out of a speaker, is paying somewhere between zero and twelve percent of what AWS would charge.
This is, in industrial-policy terms, a triumph. It is also — and this is the part I want to spend the next several thousand words on — the moment the economics of giving every poor Indian a personal AI advisor structurally crossed below the cost of any single welfare program in the country. And almost nobody is currently buying.
Let me explain.
1. The basic problem
The basic problem is that there are, depending on which number you trust, somewhere between 129 million and 234 million Indians living in poverty, and they have always had an information problem.
NITI Aayog’s January 2024 Discussion Paper on Multidimensional Poverty puts the headcount ratio at 11.28% in 2022–23, down from 24.85% in 2015–16, with 24.82 crore people exiting in eight years. UP alone moved 5.94 crore out of MPI poverty; Bihar 3.77 crore; MP 2.30 crore. The UNDP’s Global MPI 2024, which uses the older NFHS-5 round, still classifies 234 million Indians as multidimensionally poor — the largest national cohort in the world. The Household Consumption Expenditure Survey 2022–23, the first such round since 2011–12, implies a Tendulkar-equivalent monetary poverty rate of 5–10%. The Government of India, asked under RTI in December 2024 by Down To Earth for its current official poverty count, replied — and I am compressing this — we don’t actually maintain one.
So: a number, somewhere between 130 and 240 million, mostly concentrated in Bihar, UP, Jharkhand, MP, Odisha, Rajasthan, Assam, Chhattisgarh; speaking, mostly, not the languages OpenAI’s tokenizer was optimized for; relying on a frontline-worker network — about a million ASHAs, 1.3 million Anganwadi workers, 50,000 agricultural extension officers — that is structurally undersupplied; and, since the JAM trinity finished its job, fully addressable via Aadhaar and UPI and increasingly via WhatsApp.
What this person needs from an AI is not particularly mysterious. Is my crop disease bacterial or fungal? When does my PM-KISAN installment land? Is my pregnancy at risk? What does my MGNREGA wage statement actually say? My Aadhaar got delinked from my ration card — what do I do? The answers to these questions are, mostly, sitting in some government database or Krishi Vigyan Kendra advisory that the user cannot read in any language they speak. The agent-to-farmer ratio in the Indian extension system is 1:650. The ASHA-to-population ratio is roughly 1:1,000. The information layer is broken not because the information doesn’t exist but because the last-mile translation of that information into the user’s spoken Bhojpuri or Maithili or Santhali is, structurally, missing.
The thing AI is genuinely good at is the spoken-Bhojpuri-to-government-database translation. The thing AI was, until eighteen months ago, far too expensive to do at population scale. That second thing has changed. The first thing has not.
2. The token tax, and what its death means
I want to dwell on the token tax for a moment, because it is the cleanest example of how the global frontier was, for several years, charging a structural surcharge to people who could not afford it.
A tokenizer is the thing that breaks an input string into the units a language model actually thinks in. Roughly, OpenAI’s cl100k_base tokenizer — the GPT-4 / GPT-3.5 era — assigns about 1.4 tokens per word for English. For Hindi, the same tokenizer assigns roughly 4 to 6 tokens per word. For Tamil, 7 to 8. For Malayalam, in some samples, double-digit tokens per word. The reason is mundane: the tokenizer was trained on a corpus that was overwhelmingly English, so English words got single-token “shortcuts” while Devanagari/Tamil/Malayalam scripts had to be decomposed into characters or sub-character UTF-8 fragments.
What this looks like in practice: if you and I are asking the same question, and you ask in English and I ask in Tamil, I am paying five times your bill. For the same meaning. On the same model. At the same per-token price. When OpenAI released GPT-4o’s o200k_base in 2024, Microsoft’s announcement noted that the new tokenizer cut Tamil tokens by about 74% and Malayalam by roughly 4×. A tacit admission, if you read it the right way, that the previous billing was structurally weird.
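The surcharge arithmetic above can be sketched in a few lines. This is illustrative only: the fertility figures are the ones cited in this section (Hindi and Tamil taken at the midpoints of their ranges), and the per-token price is a placeholder, since it cancels out of the ratio.

```python
PRICE_PER_TOKEN = 1.0  # arbitrary unit; cancels out in the ratio below

fertility = {        # tokens per word under a cl100k_base-era tokenizer
    "English": 1.4,
    "Hindi": 5.0,    # midpoint of the 4-6 range cited above
    "Tamil": 7.5,    # midpoint of 7-8
}

def cost_per_word(lang: str) -> float:
    """Billing cost of one word of meaning in the given language."""
    return fertility[lang] * PRICE_PER_TOKEN

# Surcharge relative to English, for identical meaning on the same model
surcharge = {lang: cost_per_word(lang) / cost_per_word("English")
             for lang in fertility}
# Hindi pays ~3.6x the English bill; Tamil ~5.4x
```

The ratio is the whole story: same model, same per-token price, different bill purely because of how the tokenizer splits the script.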
The Indian fix arrived in October 2024. Sarvam-1, a 2-billion-parameter dense decoder trained from scratch by Sarvam AI on 2 trillion tokens via NVIDIA NeMo on 4,096 H100 SXMs in Yotta’s Shakti cloud, dropped Hindi token fertility to 1.4–2.1. Telugu, Kannada, Tamil, Bengali, Marathi, Gujarati, Punjabi, Malayalam, Odia, English: ten Indic-plus-English languages, each tokenized at roughly the same fertility as English. The model is openly available; the API is rupee-priced; the inference is reportedly four to six times faster than Gemma-2-9B and Llama-3.1-8B on Indic tasks.
The practical consequence is that the price of one unit of meaning in Hindi is now structurally similar to the price of one unit of meaning in English. The poor Indian’s linguistic surcharge — let’s call it what it was — has been refunded.
The price-per-meaning collapse compounds with two other things:
The first is ASR cost. Sarvam’s published Saaras V3 rate is ₹30 per hour of audio (about $0.36), at 19.31% WER on the 10-language IndicVoices subset. Whisper-large-v3, OpenAI’s flagship Indic-enabled ASR, would cost five to ten times more per hour at retail and would, on rural Bhojpuri women’s voices specifically, be measurably worse — there is an entire AI4Bharat dataset called “Bhojpuri and Hindi Rural Women ASR” because this gap had to be addressed by hand.
The second is compute cost. The IndiaAI Mission has onboarded 34,000+ GPUs across 14 empanelled providers — Yotta, E2E Networks, Tata Communications, Jio Platforms, CtrlS, AWS MSPs and others — at a tendered floor of ₹65/GPU-hour, with H100s at ₹92/hour and a 40% subsidy for approved researchers, MSMEs, startups, and government users. Sarvam alone received 4,096 H100s and a ₹98.68 crore subsidy against a ₹246.71 crore project award. The mission’s total approved sanction across 12 foundation-model awardees is over ₹2,000 crore.
Stack the three: tokenizer fertility down ~3–4×; ASR cost down ~5–10×; compute cost down ~5–9× against AWS retail. The unit cost of one Indian-language voice round-trip has fallen by something like fifteen to forty times in eighteen months, depending on the language and the workload.
This is what people mean when they say “the price has crossed the threshold.” It is a real number; it is published; it is invoiced.
3. The two-rupee voice
Here is the actual unit-economics arithmetic, which I want to walk through carefully because it is the central fact of this essay.
A full voice round-trip — a poor Hindi-speaking farmer asks his phone whether his cotton crop has bollworm, the system transcribes, retrieves, generates, translates, speaks back — has roughly four cost layers:
| Layer | Provider/system | Cost per query |
|---|---|---|
| ASR (≤60 sec audio) | Sarvam Saaras V3 at ₹30/hr | ₹0.50 |
| LLM inference (~500 in / 200 out tokens) | Sarvam-1 / Llama 3.2 3B on subsidized IndiaAI compute | ₹0.05–0.20 |
| TTS (~200 chars output) | Bulbul v2 at ₹15/10K chars | ₹0.30 |
| Translation (~500 chars) | IndicTrans2 / Bhashini at ₹20/10K chars | ₹1.00 |
| Total per voice query | | ₹1.85–2.00 |
Round it: ₹2 per full voice conversation. Two rupees. About 2.4 US cents.
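The table above is just four published rates applied to one query's worth of audio, tokens, and characters. A minimal sketch, using the per-unit prices cited in this section and the same per-query volumes as the table:

```python
# Published retail rates cited above
ASR_PER_HOUR = 30.0        # Saaras V3, rupees per hour of audio
TTS_PER_10K_CHARS = 15.0   # Bulbul v2
MT_PER_10K_CHARS = 20.0    # IndicTrans2 / Bhashini

def query_cost(audio_sec=60, llm_range=(0.05, 0.20),
               tts_chars=200, mt_chars=500):
    """Per-query cost range in rupees for one voice round-trip."""
    asr = ASR_PER_HOUR * audio_sec / 3600          # 0.50
    tts = TTS_PER_10K_CHARS * tts_chars / 10_000   # 0.30
    mt = MT_PER_10K_CHARS * mt_chars / 10_000      # 1.00
    base = asr + tts + mt
    return tuple(base + llm for llm in llm_range)

low, high = query_cost()  # ~(1.85, 2.00)
```

Note where the money actually goes: translation at ₹1.00 dominates the stack, and the LLM itself is the cheapest layer.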
Compare to:
- MGNREGA wage, FY 2025–26, national average roughly ₹370/day for an 8-hour workday. Per-minute wage: ~₹0.77. Per second: about 1.3 paise. One AI voice query equals about two and a half minutes of MGNREGA wage.
- Mobile data at ₹8–10/GB retail; a 60-second voice round-trip pushes maybe 100 KB up and 50 KB of compressed audio back. Data cost per query: <₹0.001. Negligible. Already paid for.
- A Jio Bharat ₹125/month plan is unlimited voice and minimal data; the connectivity cost per AI query, amortized, is a fraction of a paisa.
- A two-minute call to a doctor on a paid telemedicine app: typically ₹50–200. The AI alternative is 25–100× cheaper per interaction. (It is also, of course, not a doctor. We will get to this.)
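The wage-equivalence claim in the first bullet is worth making explicit, since it recurs throughout this essay. A back-of-envelope check using the FY 2025–26 figures above:

```python
DAILY_WAGE = 370.0   # MGNREGA national average, FY 2025-26, rupees
HOURS_PER_DAY = 8

per_minute = DAILY_WAGE / (HOURS_PER_DAY * 60)   # ~0.77 rupees/minute
per_second_paise = per_minute / 60 * 100         # ~1.3 paise/second

QUERY_COST = 2.0                                 # the two-rupee voice query
minutes_of_work = QUERY_COST / per_minute        # ~2.6 minutes
```

One voice query costs the user the equivalent of about two and a half minutes of wage labor.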
Now run it at population scale.
A daily AI advisory — say, 5 voice queries per day per poor person — for India’s 234 million MPI-poor population, at retail ₹2/query, costs:
234,000,000 × 5 × ₹2 × 365 = ₹85,410 crore per year.
That is, by retail accounting, roughly equivalent to one entire MGNREGA budget (~₹86,000 crore for FY 2025–26). It is the upper bound — the bill if every Indian below the poverty line consumed a doctor-grade conversational AI five times a day, at full sticker. It is, in welfare-state terms, not actually that big.
Now run it at subsidized IndiaAI prices. Compute is the dominant variable cost; everything else (ASR, TTS) scales with the same subsidized GPU pool. Apply a conservative 60% all-in cost reduction on the LLM layer and 40% on the speech layers — a fair mid-range estimate of what empanelled IndiaAI access actually unlocks:
Effective per-query cost ≈ ₹0.80–1.20 → ₹1/query as a working number.
234M × 5 × ₹1 × 365 = ₹42,705 crore/year.
Now narrow the scope. Say the goal is not five queries per BPL Indian per day but one structured voice query per BPL household per day, which is a more realistic adoption curve. India has roughly 50 million MPI-poor households. One query per household per day, at ₹1 effective cost:
50M × 1 × ₹1 × 365 = ₹1,825 crore/year.
That is about 3% of the PM-KISAN budget (₹60,000 crore/year). It is about 2% of MGNREGA. It is one-tenth of what the Ministry of Rural Development spends on the Pradhan Mantri Awaas Yojana – Gramin alone in a busy year. If you wanted to give every poor household in India a daily, voice-mediated conversation with a personalized scheme/agriculture/health advisor — at scale, in their own language — the upper bound on the bill is smaller than the line items on the existing welfare receipts that the same household already receives.
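The three scaling scenarios above, collected in one place. The figures are the essay's own; the only convention to hold in your head is that one crore is ten million rupees.

```python
CRORE = 10_000_000  # 1 crore = 10 million

def annual_cost_crore(people, queries_per_day, rupees_per_query):
    """Annual inference bill in crore rupees for a given population."""
    return people * queries_per_day * rupees_per_query * 365 / CRORE

upper  = annual_cost_crore(234_000_000, 5, 2.0)  # retail, all MPI-poor
mid    = annual_cost_crore(234_000_000, 5, 1.0)  # subsidized rates
narrow = annual_cost_crore(50_000_000, 1, 1.0)   # one query/household/day
# upper  -> 85,410 crore (~ one MGNREGA budget)
# mid    -> 42,705 crore
# narrow ->  1,825 crore (~ 3% of PM-KISAN)
```

The three-orders-of-magnitude spread between "upper bound" and "realistic adoption" is the entire fiscal argument: even the maximalist scenario fits inside one existing program.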
The economics, for the first time in the history of post-Independence India, are not the binding constraint.
4. The phone is the binding constraint, but only just
It would be very Timlig of me to stop here and say “the economics work, end of story.” The economics work. The phone is the next problem.
Here is what the BPL household actually owns, roughly:
The Jio Bharat 4G feature phone, ₹799–1,199, 512 MB RAM, 4 GB storage. It cannot run an SLM on-device. It can consume cloud-mediated voice over a UPI 123Pay-style flow — IVR plus cloud LLM plus TTS streamed back. Tens of millions of these.
The Redmi A series, Realme C series, Itel/Infinix budget Android, ₹6,000–9,000, 4 GB RAM, MediaTek Helio G35–G88 or Snapdragon 4 Gen 2. A 1B-parameter SLM at 4-bit quantization, ~600 MB on disk, ~1 GB RAM at runtime, runs — at maybe 3–8 tokens per second for short Hindi voice replies. Borderline usable. Not pleasant.
The mid-tier Xiaomi/Realme/Samsung A-series, ₹12,000–18,000, Snapdragon 6 Gen 1 / Dimensity 7000-class, 6–8 GB RAM. A 3B model at 4-bit (~2 GB on disk) is plausible. This is the male earner’s phone in many BPL households, often the household’s only smartphone.
ASER 2024 finds that 90% of 14–16-year-olds in rural households have a smartphone at home, but only 31% own one personally, and the share that uses it for educational purposes is 57%. NFHS-5 / Data For India: phone usage among the poorest women rose from 39% to 67% in three years; among the poorest men, from 74% to 84%. A persistent 10+ point gender gap. The household has access to a smartphone; the woman often does not control it. The model that fits the poorest household’s phone is too small for nuanced dialect understanding, and the model that handles the dialect doesn’t fit the phone.
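The memory figures in the phone tiers above come from simple quantization arithmetic. A back-of-envelope sizing function — the 1.2× overhead factor is my assumption, a rough allowance for embeddings and metadata kept at higher precision, not a published figure:

```python
def model_size_gb(params_billions: float, bits: int = 4,
                  overhead: float = 1.2) -> float:
    """Approximate disk footprint of a quantized model in GB.

    `overhead` (assumption) covers embeddings/metadata stored at
    higher precision than the quantized weights.
    """
    bytes_per_param = bits / 8
    return params_billions * 1e9 * bytes_per_param * overhead / 1e9

size_1b = model_size_gb(1)  # ~0.6 GB: the 4 GB-RAM budget Android, tightly
size_3b = model_size_gb(3)  # ~1.8 GB: needs the 6-8 GB mid-tier phone
```

Runtime RAM is higher still, since the KV cache and activations sit on top of the weights — which is why a ~600 MB model needs roughly a gigabyte of RAM in practice.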
Which means, today, the credible deployment topology is not on-device autonomous agent for the poor — that is a 2027–2028 story, when the median ₹8,000 budget Android crosses 6 GB RAM and Snapdragon 6 Gen 1 — but cloud-mediated voice, accessed via:
- WhatsApp (78% of Indian smartphones, including BPL), the way Jugalbandi, Farmer.CHAT, Kisan e-Mitra, and ASHABot already deliver. The phone records audio and plays audio. The intelligence is in the cloud, on subsidized H100s.
- IVR over feature phones, the way Kilkari (3M+ active maternal mHealth subscribers) already works — upgraded from one-way pre-recorded audio to two-way voice agents.
- Frontline workers’ phones, the way ASHABot routes through 869 ASHAs in Udaipur. The worker has a Snapdragon 6, the citizen has a feature phone, the worker mediates.
- Common Service Centres, kirana stores, Bank Mitras, the way the JAM trinity already mediates. About 400,000 CSCs, 1.2 million kirana stores, and PM-WANI’s 409,111 Wi-Fi hotspots. The infra is there.
The cloud-mediated voice topology, on subsidized IndiaAI compute, already passes the price test for any meaningful BPL deployment. The on-device topology will pass it in 2027–28. Either way, the phone stops being the binding constraint by 2028.
5. The boring-but-essential second-order thing: the missing payer
This is the part of the essay that earns its keep. It is also the part that makes everything above slightly depressing.
The unit economics work. The model is built. The compute is subsidized. The data exists — IndicVoices is 7,348 hours across 22 languages, 16,237 speakers, 145 districts, funded for ~₹30 crore by Bhashini, EkStep, and Nilekani Philanthropies. The DPI rails — Aadhaar, UPI, DigiLocker, Account Aggregator, ONDC, Bhashini, BharatNet’s 2.15 lakh gram panchayats with 4G and 5G coverage in 99.9% of districts — are unusually good. The frontline worker network — a million ASHAs, 1.3 million Anganwadis, 50,000 extension officers — is in place, structurally undersupplied, and ready to be augmented.
What is missing is the buyer of inference at the back end.
In every functioning AI economy, somebody pays per query. ChatGPT users pay $20/month. Enterprise customers pay per-token through API contracts. Advertisers pay through Google. The economic loop closes because the consumer of the inference is also the funder of it, or because an advertiser triangulates between them.
For BPL Indians, neither holds. The user cannot pay $20/month for ChatGPT — that is roughly two days of MGNREGA wage. The user is not an advertising target attractive enough to sustain a venture-funded direct-to-consumer model — Karya is a real exception precisely because it pays into BPL households rather than monetizing them, but Karya is a labor platform, not an inference platform. The user is, in the language of welfare economics, somebody whose AI consumption produces positive externalities (better health, higher agricultural yield, fewer wrongful scheme exclusions) that are not captured by the user’s willingness-to-pay.
In every other welfare program, India has solved this with an explicit payer rail. PM-KISAN: the central government pays ₹6,000/year directly to 11 crore farmers from a ₹60,000 crore budget. MGNREGA: ~₹86,000 crore/year, paid as wages. PMAY-G: subsidies for housing. NHM: ASHA honoraria. Each of these has a ministry, a budget head, a Direct Benefit Transfer rail, an audit. AI inference for BPL Indians has none of these.
The IndiaAI Mission’s compute-pricing miracle exists, but, per a MediaNama report from April 2026 citing a Rajya Sabha reply by MeitY on February 9, 2026, only ~₹400 crore of the ₹10,372 crore Mission outlay has actually been released over two years — ₹21.79 crore in 2024–25 against revised estimates of ₹173 crore, and ₹379.15 crore in 2025–26 against revised estimates of ₹800 crore. Roughly 4% of the five-year corpus, in two years. The rails are getting built; the budget velocity is glacial; and even within that 4%, almost none has been earmarked for the specific job of paying for inference consumed by poor citizens.
The absence of the payer is not a market failure in the textbook sense. It is a categorical failure. The inference is currently priced as a B2B sale (Sarvam to Reliance, Sarvam to Tata, Sarvam to a fintech), or as a B2G sale (a foundation-model awardee to a ministry), or as a charity-grant deployment (Digital Green’s Farmer.CHAT funded by Gates, Walmart, and Google.org). What it is not yet priced as is a citizen entitlement, the way an MGNREGA day is, or a ration is. Until it is, every individual deployment will hit unit economics it cannot justify and will retreat into grant cycles.
This matters because grant-funded deployments end. ASHABot is one Rajasthan district. Farmer.CHAT — across India, Kenya, Ethiopia, Nigeria — reports about 830,000 users and 6.2 million queries, real numbers for a grant-funded NGO and very small numbers for India’s BPL population of 234 million. Kisan e-Mitra at 20,000 queries per day scales to about 7 million queries a year — one query per BPL Indian every 33 years. The slope of these projects is real but not on the right vector. Without a population-scale payer, none of them will reach the population.
The natural question is: who should pay? The candidates, in declining order of structural fit:
One, the Government of India, via a “PM-AI-Sahayak” line item in IndiaAI Mission Phase 2, paying empanelled providers per validated query for BPL Aadhaar holders. ₹1,000–2,000 crore/year buys roughly one query/day per poor household. This is small money.
Two, sectoral ministries — DA&FW for agriculture, MoHFW for ASHA support, MoRD for MGNREGA navigation — each absorbing inference into their existing schemes. The advantage is ownership; the disadvantage is fragmentation.
Three, state governments, the way Rajasthan funded Kisan e-Mitra. Tamil Nadu, Kerala, and Karnataka are credible early adopters; Bihar, UP, and Jharkhand are the actual targets and the laggards.
Four, multilateral and philanthropic — Gates, Wadhwani, Walmart, EkStep, J-PAL — bridging to Stage 1 deployments while a public payer is configured. The current default. Inadequate at population scale.
Five, the user, via small co-pay (₹5–10/month). Plausible for the upper BPL segments. Implausible for the bottom three deciles.
The cleanest policy move is option one — but India has never, in its post-Independence welfare history, paid per information transaction. It has paid for grain, for housing, for school meals, for cash transfers, for hospital admissions. Information has always been a public good produced by the state and consumed by anyone who walks in. AI inference is the first welfare-relevant information good with a real per-unit cost. Treating it as a citizen entitlement requires a categorical update — the way Aadhaar required the categorical update of treating identity as something the state issues at unit cost.
I think this update will happen. I do not think it has happened yet.
6. The error rate is a liability, and the citizen is currently carrying it
A thing I want to mention briefly, because the previous Timlig post on GPU economics was about depreciation, and there is a structurally similar issue here.
Every BPL-facing AI deployment has an error rate. ASHABot, the most carefully evaluated of these, has doctor-graded accuracy of about 85% on 163 evaluated responses — meaning roughly 1 in 7 ASHA queries gets a response that a doctor would not endorse. Farmer.CHAT reports a 75% successful-answer rate in the Singh et al. arXiv paper — meaning 1 in 4 farmer queries goes unanswered or mis-answered. Saaras V3 ASR has a 19.31% WER on the IndicVoices subset and worse on rural dialects — so 1 in 5 spoken words is misrecognized at the front end, before the LLM has even thought.
These are state-of-the-art numbers in the SLM-for-low-resource-languages frontier. They are also unacceptable for a doctor, an agriculture officer, or a scheme grievance system. If the AI tells a pregnant woman that her symptoms are normal when they are actually pre-eclampsia, the household pays the cost. If the AI tells a smallholder that his crop disease is fungal when it is bacterial, the household pays the cost. The error rate is a real liability, and currently the citizen carries it.
The depreciation analogy: a commercial neocloud books an H100 over 5 years instead of 3 to make the gross margin look like SaaS, even though the lender amortizes the same chip over 3.5–5 years because that is what the lender actually believes. A BPL-AI deployment books its accuracy as “85%” because that is what the published study says, even though the citizen experiences the misclassification rate as “1 in 7 of my actual real-life questions came back wrong.” The burden of the error rate, in the absence of a recourse mechanism, falls on the user. This is the exact opposite of how welfare-state liability normally works. You cannot refuse food in your ration; you can refuse a wrong AI advisory, but only if you knew it was wrong, which by definition you do not.
The credible architecture is SLM-augmented frontline worker, not direct-to-citizen autonomous agent. The ASHA reads the chatbot’s answer, applies clinical judgment, communicates with the patient. The error rate is absorbed by the worker’s training and the hierarchical referral system — exactly the way a junior doctor’s errors are absorbed by a senior. ASHABot’s deployment topology assumes this. Direct-to-citizen agents — Kisan e-Mitra at 20,000 queries/day with no human in the loop — assume the citizen can self-evaluate. For an English-literate user with good context, fine. For a rural Bhojpuri-speaking smallholder asking about cotton bollworm, not fine.
This is the second-order constraint that caps how aggressively any of this can be scaled. The unit economics work; the compute is subsidized; the model fits; the language fits. What is not yet built is the liability and recourse rail — the AI equivalent of the social audit that MGNREGA has, of the grievance redressal that PM-KISAN has, of the appeals process that ration card disputes have. Without it, scale is reckless.
7. What does it actually mean
So what does it mean, if you are a builder, a philanthropist, a state government, or an Anganwadi supervisor, that the price of meaning in Bhojpuri has fallen by an order of magnitude in eighteen months?
A few things.
One: pick the human cadre, not the citizen. The unit economics that work today work for SLM-augmented ASHAs, Anganwadi workers, agriculture extension officers, kirana-store-based CSC operators. They do not yet work, with appropriate safety, for direct-to-poorest-citizen autonomous agents. The right question for any deployment is which existing frontline worker am I making 3× more productive, and how do I measure it. ASHABot is the right pattern. Scale that, not standalone direct-consumer chatbots.
Two: assume cloud-mediated voice over WhatsApp/IVR through 2027. The on-device thesis is real but premature at the median budget Android. Every rupee spent on quantizing a model to run on a Redmi A2 today is a rupee that was not spent on the dialect coverage that would actually serve the user. Optimize for cloud cost (subsidized ₹0.05–0.20 per LLM call), bandwidth efficiency (audio compression, partial offloading of frequent intents), and dialect-tuned ASR. The on-device step is a 2027–28 unlock, not a 2026 one.
Three: the dialect coverage is the moat. Sarvam-1 covers ten Indic languages well. Bhojpuri, Maithili, Magahi, Awadhi, Marwari, Santhali, Mundari, Bhili, Gondi, Kui — these are the languages spoken by the actual BPL population, and they remain at fertility and accuracy levels that materially degrade outcomes. Adi Vaani (IIT Delhi consortium with the Ministry of Tribal Affairs, 2025 beta) is the seed. The next ₹100–200 crore of philanthropic money in this space should buy IndicVoices-quality datasets for the eight underserved Indo-Aryan and Munda languages, sourced via Karya-style ethical data labor. This is the highest-leverage rupee in the entire stack right now.
Four: build the payer rail before the application. The single most under-built piece of this stack is the per-query reimbursement mechanism for BPL citizens. The Aadhaar-authenticated, consent-mediated voucher for inference does not exist; it should. ₹2,000 crore/year, payable to empanelled providers against verified BPL Aadhaar usage, would cover a query a day for every poor household in India. This is one-thirtieth of PM-KISAN. It is well within the IndiaAI Mission’s authorization. Somebody has to write the GR.
Five: instrument outcomes, not adoption. The next round of BPL-AI evaluation should be J-PAL-quality RCTs comparing SLM-augmented frontline workers against business-as-usual, on hard outcome metrics: institutional delivery rates, ANC4+ completion, agricultural yield per acre, scheme-take-up rates, MGNREGA wage receipt accuracy. The Pratham/J-PAL EdTech evidence base — overall mixed, sometimes negative, often inferior to good human pedagogy — is the relevant reference class. This is the field where good intentions have failed expensively before. The right discipline is to measure outcomes, not query counts.
8. What this is, and isn’t
The previous Timlig post on sovereign AI argued that India’s stack is uneven across layers — a direction of travel, sometimes propaganda, often substantive. AI-for-BPL is similar. The slogan — “AI for India’s poor” — papers over a research, distribution, and funding stack with seven distinct layers, of which roughly four are in good shape (model, tokenizer, ASR/TTS, DPI rails), two are in motion (application, distribution), and one is essentially missing (payer).
The honest version is that, for the first time, you could give every poor person in India a personal, multilingual, voice-first AI advisor — agriculture, health, scheme navigation — for less than the cost of a single existing welfare program. The model is built. The compute is subsidized. The languages mostly work. The phone, mostly, fits. The frontline workers are in place. What is missing is somebody to write the line item.
That somebody is, almost certainly, the Government of India, because no other actor in this stack has both the rail (Aadhaar, DBT, IndiaAI compute) and the mandate. The fact that this hasn’t happened yet is not a failure of technology or of economics. It is a failure of categorical imagination — the same imagination that, twenty years ago, declined to think of identity as a thing the state could provision at unit cost, and ten years ago declined to think of payment rails as a public utility. Both of those happened, eventually, because somebody in Delhi wrote a note.
The two-rupee voice is real. The 234-million-person addressable population is real. The ₹2,000-crore-a-year program that would actually move SDG 1, 2, 3, and 5 needles for a meaningful slice of that population is, today, not real. It is a memo away.
Sources
National Institution for Transforming India (NITI Aayog), Multidimensional Poverty in India since 2005-06, Discussion Paper, January 2024 · NITI Aayog, SDG India Index 2023–24 (4th edition) · UNDP / OPHI, Global Multidimensional Poverty Index 2024 · Ministry of Statistics and Programme Implementation, Household Consumption Expenditure Survey 2022–23 · Down To Earth, RTI reply on the official poverty count (December 2024) · The South First, “India has 23.4 crore people living in poverty — highest in the world” · Sarvam AI, Sarvam-1 model card and announcement (October 2024) · Sarvam AI, Saaras V3 ASR and Bulbul TTS API documentation · IndiaAI Mission, Press Information Bureau release on Cabinet approval (PRID 2012355, March 2024) · PIB on IndiaAI compute capacity crossing 34,000 GPUs (PRID 2132817) · MediaNama, “IndiaAI Mission: Only Rs 400 Crore Released in Two Years” (April 2026), citing Rajya Sabha reply by MeitY (February 9, 2026) · SME Futures, “Rs 65 per GPU per hour: Subsidy rate under India AI Mission from 14 service providers” · Outlook Business, “Over 17,000 GPUs successfully installed Under Govt’s IndiaAI Mission” · AI4Bharat (IIT Madras): IndicTrans2 (arXiv 2305.16307), IndicVoices (arXiv 2403.01926), Airavata, IndicConformer, “Bhojpuri and Hindi Rural Women ASR” Hugging Face collection · OpenAI, GPT-4o tokenizer announcement; Microsoft Azure technical commentary on o200k_base for Indic languages · Microsoft Research / Khushi Baby, ASHABot evaluation, CHI 2025 (Ramjee et al., “ASHABot: An LLM-Powered Chatbot to Support the Informational Needs of Community Health Workers”) · Digital Green, Farmer.CHAT public deployment metrics; Singh et al., “Farmer.Chat: Scaling AI-Powered Agricultural Services for Smallholder Farmers” (arXiv 2409.08916) · Wadhwani AI, CottonAce program data and Google.org / Welspun / Deshpande Foundation assessments · BBC Media Action / ARMMAN / MoHFW, Kilkari maternal mHealth program · Verma et al., “Leveraging AI to improve health 
information access in the World’s largest maternal mobile health program,” AI Magazine (Wiley 2024) · Microsoft Research / EkStep / AI4Bharat / OpenNyAI / Bhashini, Jugalbandi · PIB, “KISAN E-MITRA and IoT enabled systems to improve crop productivity” (PRID 2117392) · IndiaAI, “Exploring Pradhan Mantri-KISAN AI Chatbot” · Ministry of Rural Development, MGNREGA wage notification FY 2025–26 (effective April 1, 2025) · Ministry of Health & Family Welfare, ASHA honorarium notification (Lok Sabha unstarred Q4764, March 28, 2025) · Bhashini documentation; PIB, “BHASHINI: Transforming Maha Kumbh through Multilingual Innovation” (PRID 2093333) · BharatNet, 2.15 lakh gram panchayats coverage (Department of Telecommunications, December 2025) · ASER Centre, Annual Status of Education Report 2024 · National Family Health Survey-5 (2019–21) · Data For India on phone access and internet penetration · Ookla, India connectivity report H1 2025 · J-PAL, EdTech Sector Review · Banerjee et al., Mindspark CAL evaluation · Pratham, Teaching at the Right Level program reports · Karya / DRK Foundation public profiles · Rajya Sabha, IndiaAI Mission foundation-model awardee replies (MoS Jitin Prasada, February 13, 2026) · MeitY, India AI Governance Guidelines (November 2025) · Press releases from Sarvam AI, BharatGen consortium, Hanooman / SML / BharatGPT, Krutrim, Soket AI, Gnani AI, Gan AI · Adi Vaani consortium press materials (Ministry of Tribal Affairs, 2025) · Tribal language demographics, 2011 Census · arXiv 2512.06490, “Optimizing LLMs Using Quantization For Mobile Execution” · arXiv 2410.03613, “Large Language Model Performance Benchmarking on Mobile Platforms” · arXiv 2506.09653, “Recognizing Every Voice: Towards Inclusive ASR for Rural Bhojpuri Women” · IBM Granite, Microsoft Phi-3, Google Gemma, Meta Llama 3.2 documentation · Local AI Master, on-device SLM benchmarks 2026 · Kettani & Moulin, “Rethinking the Role of Technology for Development in the AI Era: From AI4D to Smart 
ICT4D” (IJETT 2025) · The George Institute for Global Health, “AI for Community Health Workers in India” series · Frontiers in Global Women’s Health, on Kilkari deployment in Assam (2025) · Inc42, From LLMs to Verticalisation: India’s Sovereign AI Models Take Shape.