Institute for Knowledge, Research, and Society

Custom AI for Kalderash Romani–English Translation

Project Overview

This project develops a custom artificial intelligence system for translation between Kalderash Romani and English, addressing the severe underrepresentation of Romani languages in existing AI and language technology. It focuses on task-specific translation workflows suited to research, education, and controlled experimentation rather than mass deployment.

Project lead and system designer: Huseyin Oylupinar (Institute for Knowledge, Research, and Society)

Status: In development
Research Theme: AI, Knowledge Systems, and Research Methodology
Secondary Theme: Minorities, Pluralism, and Intercommunal Relations
Focus Region: Transnational / Romani communities

Problem and Significance

Most existing machine translation systems either handle Romani languages poorly or do not support them at all, owing to limited data availability, dialectal diversity, and historical marginalization. Kalderash Romani, despite its wide geographic spread, remains largely absent from computational language resources.

The project addresses that gap by developing a custom AI translation system grounded in carefully curated materials and linguistic expertise. Beyond translation accuracy, it examines how AI systems can be designed responsibly for minority languages without reinforcing simplification, erasure, or misrepresentation.

Research Questions

How can custom AI systems be designed for under-resourced languages with limited and heterogeneous data?
What linguistic features of Kalderash Romani pose specific challenges for AI-assisted translation?
How can translation systems for minority languages be developed without abstracting language from its social and cultural contexts?

Sources and Materials

Curated Kalderash Romani texts
Parallel Romani–English materials where available
Linguistically annotated examples
Manually aligned sentence pairs and glosses

All materials are curated and structured manually prior to computational processing.
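
As a rough illustration of what one manually structured record might look like, the sketch below models an aligned sentence pair with an optional word-by-word gloss. The class name `AlignedPair`, its fields, and the `validate` helper are assumptions made for illustration, not the project's actual schema, and the angle-bracket strings are placeholders rather than real corpus text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One manually aligned sentence pair (hypothetical schema)."""
    romani: str                    # Kalderash Romani sentence, as curated
    english: str                   # human reference translation
    gloss: tuple[str, ...] = ()    # optional gloss, one entry per whitespace token
    source: str = "unspecified"    # provenance note kept with every record

    def gloss_is_aligned(self) -> bool:
        """True when there is no gloss, or it has one entry per token."""
        return not self.gloss or len(self.gloss) == len(self.romani.split())


def validate(pairs):
    """Return the records that fail basic checks, so a person can review them."""
    return [
        p for p in pairs
        if not p.romani.strip() or not p.english.strip() or not p.gloss_is_aligned()
    ]


# Placeholder records only; no real corpus text is shown here.
pairs = [
    AlignedPair("<token-1> <token-2>", "<English reference>", gloss=("<gloss-1>", "<gloss-2>")),
    AlignedPair("<token-1> <token-2>", "<English reference>", gloss=("<gloss-1>",)),
]
flagged = validate(pairs)  # the second record has a gloss of the wrong length
```

A check like this only flags records for a human curator; it does not correct or discard anything on its own.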

Methods and Approach

Human-led research
The project is grounded in linguistic analysis, close reading, and comparative translation practices. Human expertise guides corpus selection, annotation, and evaluation of translation outputs.

Custom AI systems (methodological infrastructure)
The project develops custom AI workflows adapted to the constraints of under-resourced languages, including controlled training, evaluation against human reference translations, and explicit documentation of system behavior and limitations. AI outputs are treated as provisional analytical tools, not as authoritative translations.
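
To make "evaluation against human reference translations" concrete, the sketch below computes a simplified character n-gram F-score, loosely in the spirit of chrF. The project does not state which metric it uses, so the choice of metric, the function names, and the default parameters (`max_n=3`, `beta=2.0`) are all assumptions; this is not the reference chrF implementation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def char_fscore(hypothesis: str, reference: str,
                max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1].

    Precision and recall are averaged over n-gram orders 1..max_n,
    then combined as an F-beta score (beta > 1 weights recall more).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

Character-level overlap tends to be more forgiving of rich morphology and spelling variation than word-level matching, which is one reason such a metric might suit this setting. Consistent with the project's stance, a score of this kind would serve only to flag outputs for human review, not to judge a translation as correct.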

Ethics, Integrity, and Safeguards

Respect for the cultural and social contexts of Romani language use
Avoidance of extractive or uncontextualized data practices
Transparent documentation of limitations and uncertainty
No automation of final translation decisions
Human oversight in all stages of development and evaluation

Outputs

Research outputs

Methodological notes on AI development for under-resourced languages
Comparative evaluation materials

Educational outputs

Teaching materials on minority languages and AI
Demonstration cases for workshops and training

Public-facing outputs

Selected translated examples with contextual commentary

Updates

2025–2026 — Dataset preparation and workflow design
2026 — Initial evaluation and methodological documentation