A Nigerian Student Has Built an AI Text-to-Speech Model That Responds with a Nigerian Accent

A student of UNILAG has created an AI text-to-speech model with a Nigerian accent.

Feb 6, 2025

Saheed Azeez gained fame when he created 2 million GPT tokens, and his determination to innovate has only grown stronger. His new project, YarnGPT, is a text-to-speech AI model that reads text aloud in a Nigerian accent. It joins a long line of AI tools built to generate lifelike voices in seconds. There were hurdles along the way, and two stood out.

First, Azeez is a Nigerian university student with limited resources. Second, developing an AI model that accurately captures the nuances of a Nigerian accent is technically challenging. It took a great deal of effort to create the model. As Azeez himself put it, "it was quite tasking, especially gathering the data needed to make this happen."

The Inspiration behind YarnGPT 

After his success with Naijaweb, Azeez set out to build something new. He cited "the amount of conversations and interest people had in Naijaweb. Imagine getting featured on Techpoint Africa; it motivated me to do this." Failure also played a part. Before starting YarnGPT, Azeez had applied for a job at a Nigerian AI company, but he didn't perform well enough in the interview.

YarnGPT was his way of bouncing back from that disappointment. It became the project that would help him improve his skills and increase his chances of landing similar roles in the future.

Voices for AI 

Azeez needed a large number of Nigerian voices to build the AI model. "I used some movies that were available online. I extracted their audio and subtitles," he said. Nollywood was a natural source of speech, thanks to the thousands of movies uploaded to YouTube. According to him, "The problem with building in Nigeria is data. Replicating what has been built overseas isn’t that hard, but data always gets in the way."
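To illustrate how subtitles can be paired with audio, here is a minimal sketch that parses subtitle text into timed segments, which could then be used to cut matching audio clips. It assumes the subtitles are in the common SRT format, which the article does not confirm; the sample captions and the helper names `to_seconds` and `parse_srt` are hypothetical and are not taken from Azeez's pipeline.

```python
import re

# Hypothetical sample subtitles in SRT format (not from the source).
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Good morning, everyone.

2
00:00:04,000 --> 00:00:06,250
Welcome to Lagos.
"""

TIMESTAMP = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds as a float."""
    h, m, s, ms = (int(g) for g in TIMESTAMP.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples."""
    segments = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        caption = " ".join(lines[2:])
        segments.append((to_seconds(start), to_seconds(end), caption))
    return segments


segments = parse_srt(SAMPLE_SRT)
for start, end, caption in segments:
    print(f"{start:.2f}s to {end:.2f}s: {caption}")
```

Each tuple gives the time window in which a caption is spoken, so the same window of the extracted audio track can serve as the matching speech sample for that text.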

However, Nollywood audio alone proved inadequate, so he turned to Hugging Face, an open-source platform for machine learning and data science. He combined the audio from Nigerian movies with high-quality datasets from Hugging Face to train his model. The next hurdle was compute: he didn't have access to his own GPU. He relied on cloud computing services like Google Colab, which set him back $50 (₦80,000), a significant amount for a university student. Unfortunately, the money went to waste.

Thankfully, he then discovered Oute AI, a platform that had developed an autoregressive text-to-speech model. He explained: "The way the model works is, you give it a piece of text, and it predicts one word at a time. It takes that word, adds it back to the text, and then predicts the next one, kind of like how ChatGPT completes sentences. That’s what makes it autoregressive."
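The loop Azeez describes can be sketched in a few lines. This is a toy illustration only: the lookup table `NEXT_WORD` stands in for a trained model, and the names `predict_next` and `generate` are hypothetical, not part of YarnGPT or Oute AI's code.

```python
# Toy next-word table standing in for a trained model (hypothetical data).
NEXT_WORD = {
    "good": "morning",
    "morning": "everyone",
    "everyone": "<end>",
}


def predict_next(tokens):
    """Pick the next word given the text so far (here: only the last word)."""
    return NEXT_WORD.get(tokens[-1], "<end>")


def generate(prompt, max_new_tokens=10):
    """Autoregressive loop: predict one token, append it, repeat."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the prediction becomes part of the input
    return " ".join(tokens)


result = generate("good")
print(result)
```

A real model replaces the lookup table with a neural network that scores every possible next token, but the outer loop has the same shape: each prediction is fed back in as input for the next step.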

Getting it Right 

Even with help from Oute, Azeez still had to build his own model. He took a language model called SmolLM2-360M from Hugging Face and added speech functionality to it, a process that involved major algorithmic changes. He also needed another $50 to train the model, and training took over three days.

Tokenization was another challenge, because audio has to be handled differently from text. It involves "breaking down continuous sound waves into smaller, manageable pieces that a model can understand and process," Azeez said. "The model needs to convert the sound into a sequence of discrete values, kind of like turning a long speech into tiny puzzle pieces. These smaller audio tokens can then be used to train the AI, and later, the model can reassemble them to generate speech that sounds natural."
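The idea of turning a continuous wave into discrete values and back can be shown with a deliberately simple stand-in. The sketch below uses plain uniform quantization, not the learned tokenizer YarnGPT relies on; the vocabulary size of 256, the test tone, and the function names are all assumptions made for illustration.

```python
import math

LEVELS = 256  # size of the toy "audio vocabulary" (an assumption)


def make_wave(freq_hz=440.0, sample_rate=8000, n_samples=16):
    """Generate a short sine wave with amplitudes in [-1.0, 1.0]."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def encode(samples, levels=LEVELS):
    """Map each amplitude in [-1, 1] to an integer token in [0, levels - 1]."""
    return [round((s + 1.0) / 2.0 * (levels - 1)) for s in samples]


def decode(tokens, levels=LEVELS):
    """Map integer tokens back to approximate amplitudes in [-1, 1]."""
    return [t / (levels - 1) * 2.0 - 1.0 for t in tokens]


wave = make_wave()
tokens = encode(wave)     # the discrete "puzzle pieces"
restored = decode(tokens) # reassembled, slightly lossy waveform
max_error = max(abs(a - b) for a, b in zip(wave, restored))
print(tokens)
print(max_error)
```

The round trip loses a small amount of detail, bounded by half a quantization step. Production speech tokenizers are neural codecs that compress far more aggressively, but the principle is the same: a language model only ever sees the integer tokens.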

YarnGPT came together with a wave tokenizer, thanks to resources from Hugging Face, Oute AI, and other Nigerian repositories. Azeez went public with YarnGPT and explained how it works, catching the attention of more than 100,000 people on X (formerly Twitter), including tech big shots. 

What does this mean for Nigeria? 

Most of the East and West have left Nigeria and Africa trailing in the AI race, but with innovators like Azeez developing exciting AI models, Nigeria has made a good start. He has also reacted to Nigeria’s position in AI: "Honestly, we’re way off. We’re not even in the race. The big AI models today — like OpenAI’s or the ones from China — are trained on massive datasets with huge computational resources, things we don’t have here."

But Azeez remains optimistic. "Instead of trying to build from scratch, we can focus on localizing AI for our own needs. We can take what’s already been built and adapt it for Nigerian languages and accents. That’s how we can start catching up," he stated.