Google DeepMind’s India unit has embarked on an ambitious mission to bridge the digital divide in one of the world’s most linguistically diverse countries. Under a new initiative called ‘Project Morni’ (Multimodal Representation for India), DeepMind aims to develop an artificial intelligence (AI) model capable of understanding and representing 125 Indian languages and dialects. The project seeks to ensure that India’s vast array of languages is included in the digital realm, providing an equal opportunity for every voice to be heard.
India is home to a staggering number of languages: the Constitution officially recognises 22, but over 100 are spoken daily across the country, reflecting a rich cultural tapestry. Google DeepMind’s research highlights that approximately 60 Indian languages are together spoken by over a billion people, and more than 125 languages have at least 100,000 speakers each. Yet many of these languages, particularly the lesser-known ones, lack a digital footprint. For instance, while Hindi is spoken by nearly 10 per cent of the global population, it accounts for just 0.1 per cent of the content available on the internet. Moreover, 73 of these 125 languages have no digital data available at all.
To address this digital exclusion, Google DeepMind has launched Project Vaani in collaboration with the Indian Institute of Science (IISc) and ARTPARK (Artificial Intelligence & Robotics Technology Park). Project Vaani focuses on collecting and making speech data from across India available as open-source material. The project, first announced in December 2022, aims to collect and transcribe 154,000 hours of speech data from all 773 districts in India. In its initial phase, Project Vaani gathered over 14,000 hours of speech data across 58 languages, involving contributions from 80,000 individuals in 80 districts.
The project has now entered its second phase, which aims to expand its reach to 160 districts across all states. This massive data collection effort is a critical step toward developing AI models that genuinely reflect India’s linguistic diversity. By capturing the nuances and unique characteristics of these languages, Google DeepMind is ensuring that each one, regardless of the number of speakers, is represented in the digital age.
Google DeepMind’s efforts through Project Morni and Project Vaani are not merely technological endeavours; they are initiatives rooted in social inclusion. By prioritising such a wide range of languages, these projects aim to preserve India’s linguistic heritage and make technology more accessible to people who speak these languages daily. “This is a significant step toward creating a more inclusive digital world where everyone’s voice can be heard,” said a Google DeepMind spokesperson.
The success of Project Morni will also depend on its ability to gather data from remote and underserved regions, where many languages have remained largely undocumented in digital formats. As the project progresses, the aim is to democratise access to information and services for speakers of all Indian languages, from the most prominent to the most obscure.
As Project Vaani continues to collect data and expand its reach, the hope is that more Indian languages will find a place in the digital landscape, empowering their speakers and preserving their cultural legacy for generations to come.