The Capacity Accelerator Network (CAN) is building a workforce of purpose-driven data and AI practitioners to unlock the power of data for social impact. Oluwaseun Nifemi advances purpose-driven AI solutions across sectors and domains through her roles as a senior data scientist at EqualyzAI and a team lead at Data Science Nigeria. Oluwaseun is an Africa Low-Resource Language CAN Fellow.
In this rapidly evolving AI landscape, what was the “aha moment” when you realized the opportunity and the necessity to train AI on low-resource languages to unlock and accelerate Africa’s AI potential?
I realized how often low-resource African languages are left out of global natural language processing (NLP) advancements: most machine translation models underperform for these languages, not because they are less important, but because the data, infrastructure, and high-performance computing resources are not readily available. This divide doesn’t just limit innovation; it marginalizes millions of people, hindering access to critical services in primary health care, education, and agriculture, the very sectors where AI is needed most.
The “aha moment” for me was realizing that if we are serious about AI being a force for inclusive growth, we can no longer treat the languages Africans speak every day as an afterthought; supporting them is a developmental imperative. Imagine AI-driven conversational agents that can offer basic medical advice in the Hausa language for a rural village in Northern Nigeria, bridging the gap created by the shortage of health professionals. We can democratize access to technology by enabling localized solutions that empower communities across the continent.
Projections suggest AI can contribute over $1.2 trillion to Africa’s GDP by 2030, which underscores both a massive opportunity and an urgent responsibility. The necessity is clear: without AI models trained on Africa’s linguistic diversity, the continent risks being left behind in the global AI revolution. Training AI on low-resource languages is not just about catching up but about creating truly inclusive and scalable solutions. That vision of AI that genuinely reflects the continent’s contexts drives my work to help accelerate Africa’s AI future.
“Without AI models trained on Africa’s linguistic diversity, the continent risks being left behind in the global AI revolution.”
Oluwaseun Nifemi, Lead, Technical Delivery (Consulting & Services), Data Science Nigeria (DSN)
How does your work with low-resource languages move the needle for data and AI for social impact work? What are some of the biggest challenges you have faced in doing so?
Nigeria has over 500 languages, making it one of the most linguistically diverse countries in the world. However, over 90 percent of these languages are considered low-resource in NLP, meaning they lack the digital corpora, datasets, and computational infrastructure needed to build effective language models. And that’s a problem because, without language inclusion, we’re building technology that doesn’t serve everyone. My work focuses on closing that gap by training AI in local African languages and building localized AI solutions to unlock access to critical services in education, health care, agriculture, and finance for communities that have historically been left out. When a student in a rural area can learn in their mother tongue or a patient can describe symptoms to a chatbot that understands them, that’s impact.
But it has not been easy. One of the biggest challenges we faced was acquiring locally nuanced datasets. Community-driven data collection, such as crowdsourcing, is promising but slow and resource-intensive. Additionally, limited access to computational infrastructure hinders model training. These barriers slow progress and prevent low-resource communities from accessing effectively trained AI models in their local languages. Despite the hurdles, we’re seeing progress. Our homegrown platform, Equalyz Crowd, lets contributors collect multimodal datasets and earn incentives for doing so. Through our startup, EqualyzAI, we have built a language-inclusive product that drives health, education, and financial inclusion. We move the needle by making inclusion the foundation, not an afterthought, fostering equitable development, preserving cultural heritage, and driving socioeconomic progress.
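To make that data challenge concrete, here is a minimal sketch of what one crowdsourced contribution record might look like, with consent captured alongside the sample itself. The schema and every field name here are hypothetical illustrations, not the actual Equalyz Crowd format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CrowdContribution:
    """One crowdsourced sample (audio, text, or image) plus metadata.

    Hypothetical schema for illustration only; not the Equalyz Crowd format.
    """
    contributor_id: str      # pseudonymous ID, never a real name
    language: str            # e.g. "hau" (ISO 639-3 code for Hausa)
    dialect: str             # finer-grained variety, e.g. "Kano"
    modality: str            # "audio" | "text" | "image"
    payload_uri: str         # where the raw file is stored
    consent_version: str     # which consent form the contributor accepted
    consent_revocable: bool  # whether the sample can be withdrawn later
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

sample = CrowdContribution(
    contributor_id="c-104a",
    language="hau",
    dialect="Kano",
    modality="audio",
    payload_uri="s3://example-bucket/clips/0001.wav",  # placeholder URI
    consent_version="v2.1",
    consent_revocable=True,
)
print(sample.language, sample.modality, sample.consent_version)
```

Storing the consent version and revocability with every sample is what makes the ethical stewardship discussed later in this interview auditable rather than aspirational.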
What are the diverse, interdisciplinary skills that are required to do this work effectively? Which one surprised you the most?
Developing effective low-resource language models that authentically reflect Indigenous communities’ natural conversational style, cultural nuances, and religious contexts requires an interdisciplinary blend of skills. Of course, you need strong technical skills in machine learning, speech recognition, and model optimization, especially for real-time applications like speech-to-text systems. But what often gets overlooked is just how crucial linguistic expertise is, particularly from native speakers who are also trained linguists. Their ability to capture subtle tonal shifts, idiomatic expressions, and grammatical structures is non-negotiable for accuracy in low-resource language processing.
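As one illustration of that speech-to-text starting point, the sketch below loads a pretrained multilingual model and transcribes a hypothetical local recording. It assumes the Hugging Face transformers library, the openai/whisper-small checkpoint (which lists Hausa among its supported languages), and a placeholder file clinic_visit.wav; in practice the model would first be fine-tuned on locally collected speech, which is exactly where the linguists’ expertise comes in.

```python
# Minimal sketch, assuming `pip install transformers torch` and a local
# audio file; "clinic_visit.wav" is a hypothetical placeholder.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # small multilingual checkpoint
)

# Whisper supports Hausa ("ha") out of the box, though accuracy on
# low-resource languages improves greatly after fine-tuning on locally
# collected, dialect-aware speech data.
result = asr(
    "clinic_visit.wav",
    generate_kwargs={"language": "ha", "task": "transcribe"},
)
print(result["text"])
```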
Beyond linguistics and engineering, we also needed cultural and anthropological insight, alongside ethical data governance, because we’re representing people’s identities, histories, and worldviews. That’s why community engagement is at the center of the process. We’ve had to co-design data collection methods with local communities to build trust and ensure the outputs are validated in context, so they are both meaningful and respectful.
What surprised me most was this identity element: it challenged me to think beyond the algorithm and focus on inclusive, ethical AI development that reflects the people it serves.
What key responsible practices should AI practitioners prioritize when developing and training AI systems in African or other low-resource languages?
Developing AI for African and other low-resource languages demands responsible practices to ensure ethical and inclusive outcomes. First, I strongly recommend privacy-by-design principles and robust consent protocols; prioritizing participant sovereignty and the culturally sensitive handling of data is the foundation of responsible AI development. Interdisciplinary teams, including data governance experts and legal compliance specialists, must enforce these guardrails to align with local regulations.
Second, it is important to address linguistic biases in training data. These biases can distort cultural representation and reduce model accuracy. Data collectors should curate diverse datasets and account for dialectal variations to preserve meaning across contexts.
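One simple, practical way to act on this is to audit dialect coverage before training. The sketch below, with hypothetical records and an arbitrary 25 percent threshold, counts how each dialect is represented and flags gaps that a model would otherwise silently learn as bias.

```python
# Hypothetical pre-training audit: count dialect representation and
# flag varieties that fall below an arbitrary coverage threshold.
from collections import Counter

# Placeholder records; real corpora would hold thousands of entries.
records = [
    {"text": "...", "language": "yor", "dialect": "Oyo"},
    {"text": "...", "language": "yor", "dialect": "Oyo"},
    {"text": "...", "language": "yor", "dialect": "Oyo"},
    {"text": "...", "language": "yor", "dialect": "Ijebu"},
    {"text": "...", "language": "yor", "dialect": "Ekiti"},
]

counts = Counter(r["dialect"] for r in records)
total = sum(counts.values())

for dialect, n in counts.most_common():
    share = n / total
    flag = "  <- underrepresented" if share < 0.25 else ""
    print(f"{dialect}: {n} samples ({share:.0%}){flag}")
```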
Above all, community trust is foundational. Engaging local communities fosters linguistic authenticity, improves data quality, and builds confidence in AI systems. Transparent collaboration, including co-designing data collection with Indigenous stakeholders, ensures models reflect cultural nuances and meet community needs. Without trust, communities may resist participation, undermining data integrity and model effectiveness. By prioritizing ethical stewardship and community trust, AI products can deliver equitable impact that preserves cultural heritage and advances social progress in low-resource settings.
“Beyond linguistics and engineering, we also needed cultural and anthropological insight, alongside ethical data governance, because we’re representing people’s identities, histories, and worldviews.”
Oluwaseun Nifemi, Lead, Technical Delivery (Consulting & Services), Data Science Nigeria (DSN)
What is the importance of cross-sector collaborations in building inclusive AI? What advice would you offer to people interested in this work?
I advocate for partnerships among AI startups, tech companies, academic institutions, governments, and local communities. Pooling expertise, resources, and perspectives helps address the linguistic and cultural gaps in AI systems.
These partnerships ease challenges like scarce datasets and limited infrastructure by leveraging shared resources, such as community-driven data collection or government-funded computing facilities. They also promote ethical practices, balancing technological advancement with cultural preservation.
I advise those interested in AI language equity to prioritize interdisciplinary learning and community engagement. Build skills in NLP, linguistics, and ethics, and develop the cultural competence to collaborate effectively with diverse stakeholders. Seek mentorship from experts in low-resource language AI and contribute to open-source projects to build practical experience. Finally, engage communities actively; their insights are critical for creating relevant, trustworthy AI systems.
“5 Minutes with” series
These articles share the stories of people around the world leveraging data and AI to drive impact.