Maieutic Lab

Hackerman Hall

Johns Hopkins University

Baltimore, MD 12345

Welcome to the Multilingual Artificial Intelligence for Eliciting Understanding Through Intermodal Content (MAIEUTIC) Lab website. We are a team of researchers at Johns Hopkins University who work on making AI systems perform better in languages other than English.

We work on a wide variety of multilingual aspects of artificial intelligence. This includes Machine Translation, Cross-Lingual Retrieval (and Retrieval Augmented Generation), Preference Optimization, Dataset Creation, Hyperparameter Optimization, Robust Evaluation, and core Machine Learning. Within each of these areas, we work across modalities. For instance, in Machine Translation, we work not only on textual translation but also on Optical Character Recognition (OCR) Translation and Speech (Audio) Translation. Similarly, in retrieval, we work on finding events in large collections of videos in multiple languages rather than working only with text. Naturally, this is only a small subset of our ongoing projects and interests. You can find out more in the repositories tab. Also check out the websites of our individual researchers for even more depth.

We are always looking to work on fun problems and collaborate with many other centers and labs both within JHU and beyond. If you are a student currently affiliated with Hopkins and are interested in joining the lab, please fill out this form.

news

Nov 27, 2025	Founded New ACL SIG on Image and Language
Oct 20, 2025	CFP ACL Tutorials
Sep 17, 2025	1st WMDQS to take place at COLM 2025
Aug 08, 2025	1st MAGMaR Workshop takes place at ACL 2025
Aug 06, 2025	Welcome to the new MAIEUTIC Lab Website.

selected publications

ICML 2024

Contrastive preference optimization: pushing the boundaries of LLM performance in machine translation

Haoran Xu, Amr Sharaf, Yunmo Chen, and 5 more authors

In Proceedings of the 41st International Conference on Machine Learning, 2024

Moderate-sized large language models (LLMs) – those with 7B or 13B parameters – exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to SFT which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on WMT’21, WMT’22 and WMT’23 test datasets.

NAACL 2024

Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages

Nathaniel Robinson, Raj Dabre, Ammon Shurtz, and 14 more authors

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 2024

A majority of language technologies are tailored for a small number of high-resource languages, while relatively many low-resource languages are neglected. One such group, Creole languages, has long been marginalized in academic study, though their speakers could benefit from machine translation (MT). These languages are predominantly used in much of Latin America, Africa and the Caribbean. We present the largest cumulative dataset to date for Creole language MT, including 14.5M unique Creole sentences with parallel translations – 11.6M of which we release publicly, and the largest bitexts gathered to date for 41 languages – the first ever for 21. In addition, we provide MT models supporting all 41 Creole languages in 172 translation directions. Given our diverse dataset, we produce a model for Creole language MT exposed to more genre diversity than ever before, which outperforms a genre-specific Creole MT model on its own benchmark for 26 of 34 translation directions.

LREC-COLING 2024

Exploring Geometric Representational Disparities between Multilingual and Bilingual Translation Models

Neha Verma, Kenton Murray, and Kevin Duh

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

Multilingual machine translation has proven immensely useful for both parameter efficiency and overall performance across many language pairs via complete multilingual parameter sharing. However, some language pairs in multilingual models can see worse performance than in bilingual models, especially in the one-to-many translation setting. Motivated by their empirical differences, we examine the geometric differences in representations from bilingual models versus those from one-to-many multilingual models. Specifically, we compute the isotropy of these representations using intrinsic dimensionality and IsoScore, in order to measure how the representations utilize the dimensions in their underlying vector space. Using the same evaluation data in both models, we find that for a given language pair, its multilingual model decoder representations are consistently less isotropic and occupy fewer dimensions than comparable bilingual model decoder representations. Additionally, we show that much of the anisotropy in multilingual decoder representations can be attributed to modeling language-specific information, therefore limiting remaining representational capacity.

ICLR 2024

Error norm truncation: Robust training in the presence of data noise for text generation models

Tianjian Li, Haoran Xu, Philipp Koehn, and 2 more authors

In Proceedings of the 12th International Conference on Learning Representations, May 2024

Text generation models are notoriously vulnerable to errors in the training data. With the widespread availability of massive amounts of web-crawled data becoming more commonplace, how can we enhance the robustness of models trained on a massive amount of noisy web-crawled text? In our work, we propose Error Norm Truncation (ENT), a robust enhancement method to the standard training objective that truncates noisy data. Compared to methods that only use the negative log-likelihood loss to estimate data quality, our method provides a more accurate estimation by considering the distribution of non-target tokens, which is often overlooked by previous work. Through comprehensive experiments across language modeling, machine translation, and text summarization, we show that equipping text generation models with ENT improves generation quality over standard training and previous soft and hard truncation methods. Furthermore, we show that our method improves the robustness of models against two of the most detrimental types of noise in machine translation, resulting in an increase of more than 2 BLEU points over the MLE baseline when up to 50% of noise is added to the data.