Project Overview
This project develops a custom-designed artificial intelligence system for translation between Kalderash Romani and English, addressing the severe underrepresentation of Romani languages in existing AI and language technology. The project focuses on building task-specific translation workflows suitable for research, educational use, and controlled experimentation rather than mass deployment.
Project lead and system designer: Huseyin Oylupinar (Institute for Knowledge, Research, and Society)
Status: In development
Research Theme: AI, Knowledge Systems, and Research Methodology
Secondary Theme: Minorities, Pluralism, and Intercommunal Relations
Focus Region: Transnational / Romani communities
Problem and Significance
Most existing machine translation systems perform poorly on Romani languages, or do not support them at all, because of limited data availability, dialectal diversity, and historical marginalization. Kalderash Romani, despite its wide geographic spread, remains largely absent from computational language resources.
The project addresses this gap by developing a custom AI translation system grounded in carefully curated materials and linguistic expertise. Beyond translation accuracy, it examines how AI systems can be designed responsibly for minority languages without reinforcing simplification, erasure, or misrepresentation.
Research Questions
- How can custom AI systems be designed for under-resourced languages with limited and heterogeneous data?
- What linguistic features of Kalderash Romani pose specific challenges for AI-assisted translation?
- How can translation systems for minority languages be developed without abstracting language from its social and cultural contexts?
Sources and Materials
- Curated Kalderash Romani texts
- Parallel Romani–English materials where available
- Linguistically annotated examples
- Manually aligned sentence pairs and glosses
All materials are curated and structured manually prior to computational processing.
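As an illustration of what a manually aligned, glossed sentence pair could look like once it is structured for processing, here is a minimal sketch. The record layout, the field names (`source_tokens`, `glosses`, `translation`, `source_id`, `notes`), and the `validate` helper are assumptions made for this example, not the project's actual schema, and the tokens are placeholders rather than real Kalderash Romani data.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """One manually aligned sentence pair with token-level glosses (hypothetical layout)."""
    source_tokens: list[str]   # Romani sentence, split into tokens by the annotator
    glosses: list[str]         # one gloss per source token
    translation: str           # human reference translation in English
    source_id: str = ""        # where the sentence comes from (assumed field)
    notes: str = ""            # free-text contextual commentary (assumed field)

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; an empty list means the record looks consistent."""
        problems = []
        if not self.source_tokens:
            problems.append("source sentence is empty")
        if len(self.glosses) != len(self.source_tokens):
            problems.append(
                f"{len(self.source_tokens)} tokens but {len(self.glosses)} glosses"
            )
        if not self.translation.strip():
            problems.append("translation is empty")
        return problems


# Placeholder content only: no real language data is shown here.
pair = AlignedPair(
    source_tokens=["tok1", "tok2", "tok3"],
    glosses=["GLOSS1", "GLOSS2", "GLOSS3"],
    translation="placeholder English translation",
)
print(pair.validate())
```

A check of this kind only flags structural inconsistencies for a human curator to review; it does not judge the linguistic quality of the glosses or the translation.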
Methods and Approach
Human-led research
The project is grounded in linguistic analysis, close reading, and comparative translation practices. Human expertise guides corpus selection, annotation, and evaluation of translation outputs.
Custom AI systems (methodological infrastructure)
The project develops custom AI workflows adapted to the constraints of under-resourced languages, including controlled training, evaluation against human reference translations, and explicit documentation of system behavior and limitations. AI outputs are treated as provisional analytical tools, not as authoritative translations.
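Evaluation against human reference translations can be supported by a simple overlap score. The sketch below computes a simplified character n-gram F-score, loosely in the spirit of chrF; the project description does not name a metric, so the choice of metric, the function names, and the defaults (`max_n=3`, `beta=2.0`) are assumptions for illustration only. A score like this can help rank outputs for human review, but it is not a substitute for expert judgment.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_f_score(hypothesis: str, reference: str,
                  max_n: int = 3, beta: float = 2.0) -> float:
    """Average character n-gram F-beta over orders 1..max_n (simplified, chrF-like)."""
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # a string too short for this order contributes nothing
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        b2 = beta * beta
        scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0


# English-side comparison of a system output with a human reference.
score = ngram_f_score("the house is large", "the house is big")
print(round(score, 3))
```

Character-level overlap is used here because it tends to be more forgiving of spelling and morphological variation than word-level matching, which may matter for a language with heterogeneous orthography. That design choice is itself an assumption to be tested against human evaluation.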
Ethics, Integrity, and Safeguards
- Respect for the cultural and social contexts of Romani language use
- Avoidance of extractive or uncontextualized data practices
- Transparent documentation of limitations and uncertainty
- No automation of final translation decisions
- Human oversight at all stages of development and evaluation
Outputs
Research outputs
- Methodological notes on AI development for under-resourced languages
- Comparative evaluation materials
Educational outputs
- Teaching materials on minority languages and AI
- Demonstration cases for workshops and training
Public-facing outputs
- Selected translated examples with contextual commentary
Updates
- 2025–2026 — Dataset preparation and workflow design
- 2026 — Initial evaluation and methodological documentation