Although Africa is home to a huge proportion of the world's languages – well over a quarter according to some estimates – many are missing when it comes to the development of artificial intelligence (AI).

This is an issue of both a lack of investment and a lack of readily available data. Most AI tools in use today, such as ChatGPT, are trained on English as well as other European and Chinese languages, which have vast quantities of online text to draw from. But many African languages are mostly spoken rather than written, so there is little text with which to train AI and make it useful for speakers of those languages.

For millions across the continent, this means exclusion from technological advancements and information access.

Researchers addressing this gap have unveiled what is thought to be the largest known dataset of African languages, a step forward in this largely unexplored territory.

"We think in our own languages, dream in them and interpret the world through them. If technology doesn't reflect that, a whole group risks being left behind," said Prof Vukosi Marivate from the University of Pretoria.

The African Next Voices project aims to create AI-ready datasets in 18 African languages. Although this is a small fraction of the more than 2,000 languages spoken across the continent, the project hopes to expand to cover more of them in future.

Over two years of voice recording across Kenya, Nigeria and South Africa, the initiative collected 9,000 hours of speech covering everyday scenarios, from farming to health and education.

The audio corpus includes languages like Kikuyu, Hausa, Yoruba, isiZulu, and Tshivenda, representing millions of speakers. Prof Marivate emphasized the essential nature of this groundwork, stating that it will enable further innovation and development for African languages in the AI domain.

Farmer Kelebogile Mosime from South Africa shows the practical benefits this can bring: she uses an AI-powered app to troubleshoot agricultural problems in her native Setswana, significantly improving her farming practice.

The push for inclusive AI is about more than business convenience; it carries broader cultural significance. Prof Marivate cautions that neglecting African languages forfeits not just data but also the history and knowledge embedded within them. Language remains a gateway to imagination, and its vitality is key to shaping a cohesive future for all.