Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. In this post, we will discuss our work (which appeared at ICLR 2025) demonstrating that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of benign relearning attacks: With access to only a small and potentially loosely related set of data, we find that we can “jog” the memory of unlearned models to reverse the effects of unlearning. For example,…