Machine Learning Blog | ML@CMU | Carnegie Mellon University

computer vision, machine learning, reinforcement learning, Research

Introducing ARFBench: A time series question-answering benchmark based on real incidents

by Stephan Xie / April 27, 2026

More than a trillion dollars are lost every year due to system failures. To resolve them, engineers must troubleshoot outages quickly. An important task in incident response is analyzing observability metrics: time series data that capture snapshots of the health of software systems. For example, an engineer for a service may use Datadog to answer questions like “When did latency start increasing?” and “What metrics outside of latency are also behaving abnormally?” to localize the root cause of the anomalous behavior.…

machine learning

Carnegie Mellon at ICLR 2026

April 20, 2026

CMU researchers are presenting 194 papers at the Fourteenth International Conference on Learning Representations (ICLR 2026), held from April 23rd to April 27th at the Riocentro Convention and Event Center in Rio de Janeiro, Brazil. Here is a quick overview of the…

machine learning

When Should AI Step Aside?: Teaching Agents When Humans Want to Intervene

April 13, 2026

Recent advances in large language models (LLMs) have enabled AI agents to perform increasingly complex tasks in web navigation. Despite this progress, effective use of such agents continues to rely on human involvement to correct misinterpretations or adjust outputs that…

artificial intelligence, computer science, machine learning, natural language processing, Research

LumberChunker: Long-Form Narrative Document Segmentation

March 17, 2026

LumberChunker lets an LLM decide where a long story should be split, creating more natural chunks that help Retrieval-Augmented Generation (RAG) systems retrieve the right information. Introduction: Long-form narrative documents usually have an explicit…

machine learning

Carnegie Mellon at NeurIPS 2025

February 11, 2026

CMU researchers are presenting 156 papers at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), held from December 2nd to December 7th at the San Diego Convention Center. Here is a quick overview of the areas our researchers are working…

artificial intelligence, machine learning, natural language processing

Yes, AI, There is a Santa Claus

December 23, 2025

People use LLMs to ask for insight on a variety of important questions: future planning, emotional problems, scientific research. But in late December, one can expect some LLM users to be asking another, perhaps more pressing question: Is Santa Claus…

machine learning

Validating LLM-as-a-Judge Systems under Rating Indeterminacy

December 9, 2025

Figure 1: Our framework for validating LLM-as-a-judge systems under rating indeterminacy, where items in a subjective rating task can have multiple “correct” ratings. Our framework provides guidance on (i) how to structure rating tasks to capture rater disagreement, (ii) how…

machine learning, reinforcement learning, Research

How to Explore to Scale RL Training of LLMs on Hard Problems?

November 26, 2025

LLM RL typically operates in one of three exploration regimes: sharpening, chaining, or guided exploration; standard RL stays in the first two and plateaus on hard problems, even those in the training set. Mixing easy and hard data triggers interference,…

Educational, machine learning

Carnegie Mellon University at EMNLP 2025

November 8, 2025

CMU researchers are presenting 50 papers at the Thirtieth Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), held from November 4 – 9 in Suzhou, China. This includes 27 papers in the main conference, 19 papers in the…

artificial intelligence, machine learning, natural language processing, Research

Learning from Failure to Tackle Extremely Hard Problems

October 27, 2025

This blog post is based on the work BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards. Tackling Very Hard Problems: The ultimate aim of machine learning research is to push machines beyond human limits in critical applications, including…
