Figure 1: Overview of RLPrompt for discrete prompt optimization. All language models (LMs) are frozen. We build our policy network by training a task-specific multi-layer perceptron (MLP) inserted into a frozen pre-trained LM. The figure illustrates 1) generation of a prompt (left), 2) example usages of the prompt in a masked LM for classification (top right) and in a left-to-right LM for generation (bottom right), and 3) the update of the MLP using RL reward signals (red arrows).

TL;DR: Prompting enables large…
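To make the architecture in the caption concrete, below is a minimal sketch (not the authors' implementation) of a policy network built by inserting a trainable MLP into a frozen pre-trained LM and updating only the MLP with a policy-gradient signal. The model name (`distilgpt2`), MLP sizes, prompt seed, and the constant `reward` are all illustrative placeholders, not values from the paper.

```python
# Sketch, assuming a Hugging Face causal LM: freeze the LM, insert a
# task-specific MLP before the (frozen) output head, train only the MLP.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class PromptPolicy(nn.Module):
    def __init__(self, model_name="distilgpt2"):  # illustrative backbone
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(model_name)
        for p in self.lm.parameters():
            p.requires_grad = False  # the LM stays frozen throughout
        h = self.lm.config.hidden_size
        # The only trainable component: a small task-specific MLP inserted
        # between the LM body and its output head.
        self.mlp = nn.Sequential(nn.Linear(h, h), nn.ReLU(), nn.Linear(h, h))

    def forward(self, input_ids):
        hidden = self.lm(input_ids, output_hidden_states=True).hidden_states[-1]
        adapted = self.mlp(hidden[:, -1])  # adapt the last-position state
        # Reuse the frozen output embeddings to get logits over the vocabulary.
        return self.lm.get_output_embeddings()(adapted)

# One hypothetical REINFORCE-style step: sample the next prompt token, score
# it with a task reward, and update only the MLP parameters (red arrows).
policy = PromptPolicy()
opt = torch.optim.Adam(policy.mlp.parameters(), lr=1e-4)
tok = AutoTokenizer.from_pretrained("distilgpt2")
ids = tok("classify:", return_tensors="pt").input_ids  # placeholder seed

logits = policy(ids)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()            # sampled prompt token
reward = 1.0                      # placeholder for a task reward signal
loss = (-dist.log_prob(action) * reward).mean()

opt.zero_grad()
loss.backward()                   # gradients reach only the MLP
opt.step()
```

In this setup the frozen LM supplies contextual representations and the vocabulary head, while the reward-driven gradient flows only into the inserted MLP, matching the caption's division between frozen components and the trained policy module.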