Open-source ML training and inference framework for African AI.
Train multi-task NLP, Speech, and Vision models with a single YAML config — no code required. Built in Kenya, built for Africa.
JengaAI is a framework that lets researchers, engineers, and non-technical teams train production-grade machine learning models on African language data — and deploy them without vendor lock-in, without API dependencies, and without sending sensitive data to foreign servers.
Your model. Your data. Your task.
| Model | Task | Language | Base |
|---|---|---|---|
| Rogendo/afribert-kenya-adapted | Masked Language Modeling (DAPT) | Swahili · Sheng · English | castorini/afriberta_large |
| Rogendo/cpims-nlp-intent-urgency | Intent + Urgency Classification | Swahili · Sheng · English | afribert-kenya-adapted |
Rogendo/afribert-kenya-adapted is the result of domain-adaptive pre-training (DAPT) of AfriBERTa on ~39M tokens of Kenyan language data: Swahili Wikipedia, East African journalism, a synthetic Sheng/code-switch corpus, and real CPIMS field worker WhatsApp data. It achieves a 30.4% average perplexity improvement over the base model on Kenyan domain text, with a 66% improvement on Sheng and 41% on English-Swahili code-switching.
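For readers who want to run a similar comparison on their own data, the sketch below shows how a relative perplexity improvement can be computed from mean evaluation losses. The function names and loss values are illustrative assumptions, not JengaAI APIs or the measurements reported above. For a masked language model, the quantity is more precisely a pseudo-perplexity over masked tokens.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean token-level cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


def relative_improvement(base_ppl: float, adapted_ppl: float) -> float:
    """Fractional perplexity reduction of the adapted model relative to the base."""
    if base_ppl <= 0:
        raise ValueError("base perplexity must be positive")
    return (base_ppl - adapted_ppl) / base_ppl


# Hypothetical evaluation losses, chosen only to illustrate the arithmetic.
base_loss = 2.0
adapted_loss = 1.5

base_ppl = perplexity(base_loss)
adapted_ppl = perplexity(adapted_loss)
improvement = relative_improvement(base_ppl, adapted_ppl)
print(f"base={base_ppl:.2f} adapted={adapted_ppl:.2f} improvement={improvement:.1%}")
```

An average across domains, as in the figure quoted above, would presumably be the mean of such per-domain improvements, though the exact averaging scheme is not stated here.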
Rogendo/cpims-nlp-intent-urgency is a multi-task classifier trained on CPIMS child protection support messages. From a single encoder pass, it predicts both an intent (63 classes) and an urgency level (high / medium / low). Intent F1 is 74.5%, up from 46% with a generic English base model. It handles English, Swahili, and Kenyan code-switching.
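The idea of one encoder pass feeding two prediction heads can be sketched as follows. This is a toy illustration in plain Python with random weights so that it runs without a deep learning library; `encode`, `make_linear`, and the hidden size are hypothetical stand-ins, and the actual model would use a transformer encoder with trained heads.

```python
import random

NUM_INTENTS = 63                          # from the model description above
URGENCY_LEVELS = ["high", "medium", "low"]
HIDDEN_SIZE = 8                           # toy size; real encoders are far larger


def make_linear(rng, in_dim, out_dim):
    """Random weight matrix (out_dim x in_dim) standing in for a trained head."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vector):
    """Matrix-vector product: one logit per output class."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def encode(text, rng):
    """Placeholder for the shared encoder: returns one pooled vector per message.

    The text is ignored here; a real encoder would tokenize and embed it.
    """
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]


rng = random.Random(0)
intent_head = make_linear(rng, HIDDEN_SIZE, NUM_INTENTS)
urgency_head = make_linear(rng, HIDDEN_SIZE, len(URGENCY_LEVELS))

# The encoder runs once; both heads read the same pooled representation.
pooled = encode("example support message", rng)
intent_logits = apply_linear(intent_head, pooled)
urgency_logits = apply_linear(urgency_head, pooled)

prediction = {
    "intent_id": argmax(intent_logits),
    "urgency": URGENCY_LEVELS[argmax(urgency_logits)],
}
print(prediction)
```

In a setup like this, training would typically combine one cross-entropy loss per head; how the two losses are weighted in this model is not stated here.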
```bash
pip install jenga-ai
```
Train any model with a single YAML config:
```yaml
project_name: swahili-hate-speech
model:
  base_model: castorini/afriberta_large
  max_seq_len: 128
tasks:
  - name: classification
    type: single_label_classification
    data_path: data/hate_speech.csv
    text_column: text
    label_column: label
training:
  epochs: 5
  batch_size: 16
  learning_rate: 3.0e-5
```
```bash
python -m jenga_ai train --config swahili-hate-speech.yaml
```
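Before launching a run, it can help to confirm that the CSV referenced by `data_path` actually contains the columns named in `text_column` and `label_column`. The sketch below is a standalone pre-flight check using only the Python standard library. The `check_columns` helper is a hypothetical name and not part of JengaAI, which may well perform its own validation.

```python
import csv
import io


def check_columns(csv_file, text_column, label_column):
    """Return (row_count, sorted labels); raise ValueError on missing columns or empty cells."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in (text_column, label_column) if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    labels = set()
    rows = 0
    for row in reader:
        # Short rows yield None for absent fields, so normalise to "".
        text = (row[text_column] or "").strip()
        label = (row[label_column] or "").strip()
        if not text or not label:
            raise ValueError(f"empty text or label near line {reader.line_num}")
        labels.add(label)
        rows += 1
    return rows, sorted(labels)


# Tiny in-memory stand-in for data/hate_speech.csv; the rows are made up.
sample = io.StringIO("text,label\nhabari yako,neutral\nmfano wa pili,offensive\n")
rows, labels = check_columns(sample, text_column="text", label_column="label")
print(rows, labels)
```

To check a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`.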
| Modality | Status | Notes |
|---|---|---|
| NLP — classification, NER, multi-task | ✅ Production | Multi-task with shared encoder + dual heads |
| Speech — Whisper fine-tuning, transcription | ⚙️ Active development | ASR for Swahili and African languages |
| Vision — classification, OCR, object detection | ⚙️ Active development | Document verification, image classification |
| LLM — LoRA fine-tuning, Ollama integration | ⚙️ Active development | Swahili instruction tuning |
Africa's AI ecosystem is being built on API wrappers — products that call GPT-4 or Claude and rebrand the output as "African AI." These products are expensive at scale, dependent on foreign infrastructure, unable to handle African languages properly, and unable to keep sensitive data on the continent.
JengaAI exists to make the alternative practical.
A locally trained, domain-adapted model:
- Child protection systems: intent classification and urgency triage for CPIMS support messages in English, Swahili, and Sheng
- Community health: symptom extraction and referral urgency from CHW voice notes and field reports
- Financial services: M-PESA dispute classification, fraud signal detection, transaction intent analysis
- Government services: citizen complaint routing, document OCR, service request classification
- Education: student question routing, learner sentiment analysis, multilingual content classification
- Media monitoring: hate speech detection, misinformation flagging, topic classification in Swahili and code-switched text
JengaAI is built with responsible AI development as a core principle, not an afterthought:
JengaAI is developed in the spirit of African AI communities doing the work right — Data Science Africa, Masakhane, Deep Learning Indaba, and AIMS.
We believe that building AI for Africa means building it on African data, in African languages, with African institutional contexts — not wrapping foreign models in local branding.
`pip install jenga-ai`

Built in Kenya 🇰🇪, for Africa and beyond.