M1A1 - Data Curation and Listing
Inventory of available open-source datasets for Wolof and Bambara across four categories: (1) English/French-Bambara translation pairs, (2) English/French-Wolof translation pairs, (3) audio-transcription pairs, and (4) raw audio datasets without annotation. Establish data annot…
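As an illustration only, the inventory could be kept as a small typed catalog so that each dataset is assigned to exactly one of the four categories. The sketch below is an assumption about how such a catalog might look, not the team's actual schema: the `DatasetEntry` fields (`languages`, `license`, `source`) and the example entry names are hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four inventory categories listed in the activity description."""
    BAMBARA_TRANSLATION = "English/French-Bambara translation pairs"
    WOLOF_TRANSLATION = "English/French-Wolof translation pairs"
    AUDIO_TRANSCRIPTION = "Audio-transcription pairs"
    RAW_AUDIO = "Raw audio without annotation"


@dataclass(frozen=True)
class DatasetEntry:
    """One row of the inventory. All fields besides `category` are assumed."""
    name: str
    category: Category
    languages: tuple  # e.g. ("en", "wo")
    license: str      # license string as published by the dataset owner
    source: str       # where the dataset was found (placeholder text here)


def summarize(entries):
    """Count how many datasets fall into each category."""
    return Counter(entry.category for entry in entries)


# Hypothetical entries, used only to show the shape of the catalog.
inventory = [
    DatasetEntry("example-wolof-pairs", Category.WOLOF_TRANSLATION,
                 ("en", "wo"), "unknown", "placeholder"),
    DatasetEntry("example-bambara-pairs", Category.BAMBARA_TRANSLATION,
                 ("fr", "bm"), "unknown", "placeholder"),
    DatasetEntry("example-raw-audio", Category.RAW_AUDIO,
                 ("wo",), "unknown", "placeholder"),
]
counts = summarize(inventory)
```

Recording the license next to each entry is a design choice that would make it easier to decide later which datasets can be redistributed.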
M1A2 - Synthetic Data Generation for Translation
We will leverage our existing Wolof and Bambara LLM Translator to generate synthetic data at scale from open-source English corpora. The aim is to create small translator LLMs distilled from the larger models. Note: Distillation is when a large mod…
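A minimal sketch of what such a generation loop might look like, under stated assumptions: `translate` is a hypothetical callable standing in for the existing translator model, and the deduplication step and length-ratio filter (with its thresholds) are illustrative choices, not part of the plan above.

```python
def generate_synthetic_pairs(sentences, translate, min_ratio=0.5, max_ratio=2.0):
    """Build (source, target) pairs from English sentences.

    `translate` is any callable str -> str; here it stands in for the
    larger teacher model. Thresholds are assumed values for illustration.
    """
    seen = set()
    pairs = []
    for sentence in sentences:
        source = " ".join(sentence.split())  # normalize whitespace
        if not source or source in seen:
            continue  # skip empty and duplicate inputs
        seen.add(source)
        target = " ".join(translate(source).split())
        if not target:
            continue  # the teacher produced nothing usable
        ratio = len(target) / len(source)
        if min_ratio <= ratio <= max_ratio:
            pairs.append({"source": source, "target": target})
    return pairs


# Stub teacher with placeholder outputs, so the sketch runs without a model.
_fake_outputs = {"Hello world": "XX YY ZZ", "Good morning": ""}
pairs = generate_synthetic_pairs(
    ["Hello  world", "Hello world", "", "Good morning"],
    translate=lambda s: _fake_outputs.get(s, ""),
)
```

The resulting pairs would then serve as training data for the smaller student model; passing the teacher in as a callable keeps the loop independent of any particular inference library.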
M1A3 - Create Text & speech embedding eval dataset
A critical gap we address is the lack of evaluation benchmarks and high-quality datasets for African languages. Text and speech embedding models are used in RAG systems to find relevant documents based on user queries. To properly evaluate these models, we need benchmark datasets…
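To make the evaluation idea concrete, here is a hedged sketch of how a retrieval benchmark could score an embedding model: rank documents by cosine similarity to each query and report recall@k and mean reciprocal rank (MRR). The choice of these two metrics, the function names, and the toy vectors are assumptions for illustration; the source does not specify which metrics the benchmark will use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by decreasing similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(queries, doc_vecs, k=1):
    """`queries` is a list of (query_vector, set_of_relevant_doc_ids)."""
    recalls, ranks = [], []
    for query_vec, relevant in queries:
        ranked = rank_documents(query_vec, doc_vecs)
        recalls.append(recall_at_k(ranked, relevant, k))
        ranks.append(reciprocal_rank(ranked, relevant))
    return {"recall@k": sum(recalls) / len(queries), "mrr": sum(ranks) / len(queries)}


# Toy embeddings standing in for real model outputs.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
queries = [([1.0, 0.1], {"d1"}), ([0.1, 1.0], {"d3"})]
scores = evaluate(queries, docs, k=1)
```

In a real benchmark the vectors would come from the text or speech embedding model under test, and the relevance labels from the curated evaluation dataset.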
M2A1 - LLM Optimization for On-device/Offline Use
The focus here is on small, frugal models, which are the most useful in the African context because of internet connectivity limitations. Our goal is to optimize models (like Oolel-Small) for size and efficiency in on-device deployment scenarios by distilling a small mo…
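For a rough sense of what "size" means on a device, the weight footprint of a model can be estimated as parameter count times bits per weight, divided by eight. The sketch below is only that arithmetic; the one-billion-parameter figure is a hypothetical round number, not the size of Oolel-Small, and the estimate ignores activations, caches, and any storage overhead.

```python
def weight_footprint_mb(n_params: int, bits_per_weight: int) -> float:
    """Approximate storage for model weights, in megabytes (10**6 bytes)."""
    return n_params * bits_per_weight / 8 / 1_000_000


# Hypothetical 1B-parameter model at several weight precisions.
HYPOTHETICAL_PARAMS = 1_000_000_000
footprints = {bits: weight_footprint_mb(HYPOTHETICAL_PARAMS, bits) for bits in (32, 16, 8, 4)}
```

Under these assumptions, halving the precision halves the storage, which is why a smaller distilled model can matter so much for offline use.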
M2A2 - HuggingFace Space Deployment
Live interactive demo where users can test the Wolof translation model and provide feedback, enabling community engagement and informal validation.
M3A1 - Define the visual identity for Soynade
Complete brand identity established for Soynade including logo, color palette, typography, and illustration style for consistent communication and marketing materials.
M3A2 - Design a website for Soynade
Fully designed and functional Soynade website outlining company mission, products, and contact information to establish online presence.
M4A1 - Market Research and Customer Discovery
Since we already have models performing well in translation, transcription, and analyzing English/French documents in Wolof or Bambara, we will develop a viable business plan around these capabilities, initially focusing on a streamlined SaaS offering. This activity will help val…
M4A2 - SaaS Concept Development
Defined SaaS product concept with specific features, target market, and value proposition.
M5A1 - Attend technical product dev session
Members of the Soynade team will join a UNICEF Ventures workshop to meet fellow portfolio companies and attend technical work sessions.
M5A2 - Revision of Q1 work plan
Revise the current work plan by recording what has been achieved and what needs to be adapted.
M5A3 - Creation of company Karma profile
Create the Soynade profile and project file on the Grantee Accountability Protocol (Karma).
M6A1 - Communications branding on Social media
We plan to post one image with a caption on LinkedIn and our other social media channels to announce each new dataset release, model release, or project.
M6A2 - Communications and Branding (BlogPost)
We plan to write blog posts to share our vision of AI and open source.
M7A1 - Establish licensing strategy
Clear open source licensing strategy established and OSI-approved licenses applied to all public repositories, ensuring legal compliance and community clarity.
M7A2 - Create a Soynade project charter
Establish project charter defining Soynade's open source project vision, mission, community guidelines, and intellectual property strategy.
M7A3 - Ensure repository contains clear README
Create READMEs (in English) for all public repositories. READMEs should include: overview of specific repo, developer environment instructions (i.e. how to set software up), note about how repo connects into overall product, list of any Open Source software used to create product…
M7A4 - Public documentation
Create a public Open Source documentation web page on GitHub, ensuring that up-to-date technical documentation for all project-specific repositories is accessible to the community.
M7A5 - Establish an Open Source QA process
Established QA processes appropriate to project types (data validation, model evaluation, etc.) ensuring code and model quality before public release.
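As one possible shape for the data-validation part of such a process, the sketch below checks a list of translation-pair records before release. It is an assumption for illustration, not the team's QA procedure: the record keys (`source`, `target`, `target_lang`) and the three checks are hypothetical, while `wo` and `bm` are the ISO 639-1 codes for Wolof and Bambara.

```python
def validate_pairs(records, allowed_langs=("wo", "bm")):
    """Return a list of (index, issue) tuples; an empty list means all checks passed."""
    issues = []
    seen = set()
    for index, record in enumerate(records):
        source = (record.get("source") or "").strip()
        target = (record.get("target") or "").strip()
        lang = record.get("target_lang")
        if not source or not target:
            issues.append((index, "empty source or target"))
        if lang not in allowed_langs:
            issues.append((index, f"unexpected target_lang: {lang!r}"))
        key = (source, target, lang)
        if key in seen:
            issues.append((index, "duplicate record"))
        seen.add(key)
    return issues


# Placeholder records (target strings are dummies, not real translations).
sample = [
    {"source": "Hello", "target": "X", "target_lang": "wo"},
    {"source": "Hello", "target": "X", "target_lang": "wo"},
    {"source": "", "target": "Y", "target_lang": "bm"},
    {"source": "Hi", "target": "Z", "target_lang": "fr"},
]
problems = validate_pairs(sample)
```

A check like this could run automatically before each public release, with the release blocked whenever the returned list is non-empty.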
M7A6 - Establish Code of Conduct
Identify a Code of Conduct for all public Open Source repositories and upload it to those repositories. Create internal documentation on how to respond to a Code of Conduct report, should one be made.
M7A7 - Standardize Pull Request Workflow
Standardized pull request workflow adopted across all repositories, ensuring code review, quality control, and collaborative development practices.
M8A1 - Data privacy and security awareness session
Team members trained on data privacy and security best practices.
M8A2 - Complete the Privacy Impact Assessment
Privacy Impact Assessment completed using the UNICEF template, documenting data processing practices and identifying any privacy risks for Soynade products.
2025 - Data and Trust Cohort
Soynade Research develops open-source AI technologies including LLMs, translation tools, and speech systems for underserved African languages. Selected for the UNICEF Venture Fund program, we're building language technologies that serve millions of speakers across the continent.…