CMU researchers are presenting 50 papers at the Thirtieth Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), held from November 4 – 9 in Suzhou, China. This includes 27 papers in the main conference, 19 papers in the Findings track, 2 system demonstrations papers, and 2 industry track papers. This blog post provides aggregated information about EMNLP 2025 papers published by CMU researchers.
Key areas addressed are visualized below (representing 30 of the 50 total papers), illustrating the breadth of NLP and machine learning research being conducted at CMU :
Note: All information in this post has been obtained through the ACL Anthology API and the EMNLP 2025 Presentation Information spreadsheet. Please contact CMU ML Blog editors if you would like any information added or changed.
Table of Contents
Main Conference Papers
Findings Papers
System Demonstrations
Industry Track Papers
Main Conference Papers
Special Theme: Interdisciplinary Recontextualization of NLP
MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and Correction
Yuyang Wu, Jinhui Ye, Shuhao Zhang, Lu Dai, Yonatan Bisk, Olexandr Isayev
Spontaneous Giving and Calculated Greed in Language Models
Yuxuan Li, Hirokazu Shirado
Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
Jiarui Liu, Yueqi Song, Yunze Xiao, Mingqian Zheng, Lindia Tjuatja, Jana Schaich Borg, Mona T. Diab, Maarten Sap
Multimodality and Language Grounding to Vision, Robotics and Beyond
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
Leena Mathur, Marian Qian, Paul Pu Liang, Louis-Philippe Morency
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia, Jiachen Li, Xiang Yue, Bo Li, Ping Nie, Kai Zou, Wenhu Chen
Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation
Mert Inan, Anthony Sicilia, Alex Xie, Saujas Vaduguru, Daniel Fried, Malihe Alikhani
Resources and Evaluation
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Kimberly Truong, Riccardo Fogliato, Hoda Heidari, Steven Wu
Human-AI Interaction/Cooperation
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Xiaoyuan Wu, Weiran Lin, Omer Akgul, Lujo Bauer
Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design
Yunze Xiao, Lynnette Hui Xian Ng, Jiarui Liu, Mona T. Diab
Interpretability, Model Editing, Transparency, and Explainability
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
Terrance Liu, Shuyi Wang, Daniel Preotiuc-Pietro, Yash Chandarana, Chirag Gupta
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Emmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, Lara Marinov, Michael Chen, Shreya Singhal, Carolin Lawrence, Aditi Raghunathan, Kiril Gashteovski, Graham Neubig
Mathematical, Symbolic, and Logical Reasoning in NLP
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
Andre Wang He, Daniel Fried, Sean Welleck
Agentic-R1: Distilled Dual-Strategy Reasoning
Weihua Du, Pranjal Aggarwal, Sean Welleck, Yiming Yang
Generalizability and Transfer
SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding Tasks
Ritam Dutt, Carolyn Rose, Maarten Sap
Searching for the Most Human-like Emergent Language
Brendon Boldt, David R. Mortensen
NLP Applications
PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs
Sana Kang, Myeongseok Gwon, Su Young Kwon, Jaewook Lee, Andrew Lan, Bhiksha Raj, Rita Singh
Safety and Alignment in LLMs
Anecdoctoring: Automated Red-Teaming Across Language and Place
Alejandro Cuevas, Saloni Dash, Bharat Kumar Nayak, Dan Vann, Madeleine I. G. Daepp
Natural Language Generation
CIE: Controlling Language Model Text Generations Using Continuous Signals
Vinay Samuel, Harshita Diddee, Yiming Zhang, Daphne Ippolito
Question Answering
Table-R1: Inference-Time Scaling for Table Reasoning Tasks
Zheyuan Yang, Lyuhao Chen, Arman Cohan, Yilun Zhao
Multilinguality and Language Diversity
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
Jean De Dieu Nyandwi, Yueqi Song, Simran Khanuja, Graham Neubig
Computational Social Science, Cultural Analytics, and NLP for Social Good
Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication
Jocelyn J Shen, Akhila Yerukola, Xuhui Zhou, Cynthia Breazeal, Maarten Sap, Hae Won Park
AI/LLM Agents
On the Fine-Grained Planning Abilities of VLM Web Agents
Surgan Jandial, Yinong Oliver Wang, Andrea Bajcsy, Fernando De la Torre
Code Models
An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation
Shubham Gandhi, Atharva Naik, Yiqing Xie, Carolyn Rose
Summarization
Summarizing Speech: A Comprehensive Survey
Fabian Retkowski, Maike Züfle, Andreas Sudmann, Dinah Pfau, Shinji Watanabe, Jan Niehues, Alexander Waibel
Retrieval-Augmented Language Models
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz, Tongshuang Wu
Phonology, Morphology and Word Segmentation
Morpheme Induction for Emergent Language
Brendon Boldt, David R. Mortensen
Low-resource Methods for NLP
Language Models Can be Efficiently Steered via Minimal Embedding Layer Transformations
Diogo Tavares, David Semedo, Alexander Rudnicky, Joao Magalhaes
Findings Papers
Special Theme: Interdisciplinary Recontextualization of NLP
FicSim: A Dataset for Multi-Faceted Semantic Similarity in Long-Form Fiction
Natasha Johnson, Amanda Bertsch, Maria-Emil Deal, Emma Strubell
Resources and Evaluation
ResearchArena: Benchmarking Large Language Models’ Ability to Collect and Organize Information as Research Agents
Hao Kang, Chenyan Xiong
SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone
Nishant Subramani, Alfredo Gomez, Mona T. Diab
mrCAD: Multimodal Communication to Refine Computer-aided Designs
William P McCarthy, Saujas Vaduguru, Karl D.d. Willis, Justin Matejka, Judith E Fan, Daniel Fried, Yewen Pu
Human-AI Interaction/Cooperation
Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, Maarten Sap
Interpretability, Model Editing, Transparency, and Explainability
Linear Steerability in Language Models: When It Emerges and How It Evolves
Jianshu She, Xinyue Li, Eric P. Xing, Zhengzhong Liu, Qirong Ho
Predicting Language Models’ Success at Zero-Shot Probabilistic Prediction
Kevin Ren, Santiago Cortes-Gomez, Carlos Miguel Patiño, Ananya Joshi, Ruiqi Lyu, Jingjing Tang, Alistair Turcan, Khurram Yamin, Steven Wu, Bryan Wilder
Multilinguality and Language Diversity
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang, Wenhao Zhu, Hanxu Hu, Conghui He, Lei Li, Shujian Huang, Fei Yuan
AI/LLM Agents
FLAIRR-TS – Forecasting LLM-Agents with Iterative Refinement and Retrieval for Time Series
Gunjan Jalori, Preetika Verma, Sercan O Arik
Large Language Model Agents in Finance: A Survey Bridging Research, Practice, and Real-World Deployment
Yifei Dong, Fengyi Wu, Kunlin Zhang, Yilong Dai, Sanjian Zhang, Wanghao Ye, Sihan Chen, Zhi-Qi Cheng
Code Models
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
Retrieval-Augmented Language Models
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
Yilin Zhang, Xinran Zhao, Zora Zhiruo Wang, Chenyang Yang, Jiayi Wei, Tongshuang Wu
GAMIC: Graph-Aligned Molecular In-context Learning for Molecule Analysis via LLMs
Ali Al Lawati, Jason S Lucas, Zhiwei Zhang, Prasenjit Mitra, Suhang Wang
Speech Processing and Spoken Language Understanding
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
Massa Baali, Sarthak Bisht, Francisco Teixeira, Kateryna Shapovalenko, Rita Singh, Bhiksha Raj
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Massa Baali, Xiang Li, Hao Chen, Syed Abdul Hannan, Rita Singh, Bhiksha Raj
Semantics: Lexical, Sentence-Level Semantics, Textual Inference, and Other Areas
Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications
Yiming Zeng, Wanhao Yu, Zexin Li, Tao Ren, Yu Ma, Jinghan Cao, Xiyan Chen, Tingting Yu
Ethics, Bias, and Fairness
Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models
Pushkar Shukla, Aditya Chinchure, Emily Diana, Alexander Tolbert, Kartik Hosanagar, Vineeth N. Balasubramanian, Leonid Sigal, Matthew A. Turk
Dialogue and Interactive Systems
Aligning Dialogue Agents with Global Feedback via Large Language Model Multimodal Reward Decomposition
Dong Won Lee, Hae Won Park, Cynthia Breazeal, Louis-Philippe Morency
LLM Efficiency
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Chenhao Zhu, Xinzhe Juan, Ling Yang, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang
System Demonstrations
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories
Tianyue Ou, Wanyao Guo, Apurva Gandhi, Graham Neubig, Xiang Yue
BioGraphia: A LLM-Assisted Biological Pathway Graph Annotation Platform
Xi Xu, Sumin Jo, Adam Officer, Angela Chen, Yufei Huang, Lei Li
Industry Track Papers
Leveraging LLMs to Streamline the Review of Public Funding Applications
João DS Marques, Andre Vicente Duarte, André Mendes Marques de Carvalho, Gil Rocha, Bruno Martins, Arlindo L. Oliveira
Semantic Agreement Enables Efficient Open-Ended LLM Cascades
Duncan Soiffer, Steven Kolawole, Virginia Smith