Adversarial Loop Prompting
Generalizing my LLM-assisted coding workflow
My last post described a code review workflow. The technique generalizes to other domains; the ones that interest me most right now are writing prose and learning. Here’s the formula a prompter can apply across those domains: I’m calling it “Adversarial Loop Prompting.” I picture it as a kind of courtroom, with a defense lawyer, a prosecutor, and a judge. A rough sketch of the loop in code follows the role descriptions below.
The generator (defense) produces the output: code, prose, analysis, or an answer to a problem.
The critic (prosecutor) attacks the output, finding problems, some of which may be hallucinated; that’s why the critic must provide convincing, falsifiable evidence for each one.
The generator either fixes or defends the output for each problem found: it corrects what is genuinely wrong and explains why the rest is correct.
The human judge arbitrates the process at each step, making decisions, steering the generator and the critic, and guiding the loop to completion.
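Here’s a minimal sketch of the loop in Python. It assumes a hypothetical call_llm(prompt) helper wired to whatever model you use, and the prompts, round limit, and stopping rule are illustrative rather than a finished implementation.

```python
# Minimal sketch of the adversarial loop: the generator and critic are both
# LLM calls, and a human judge reviews each round before it continues.
# call_llm(prompt) is a hypothetical helper for whatever LLM API you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM API of choice")

def adversarial_loop(task: str, max_rounds: int = 5) -> str:
    draft = call_llm(f"Produce a first attempt at this task:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(
            "You are the critic (prosecutor). List concrete problems with the "
            "output below and give convincing, falsifiable evidence for each.\n\n"
            f"Task: {task}\n\nOutput:\n{draft}"
        )
        # The human judge sees the critique and decides whether to keep going.
        print(critique)
        if input("Judge: another round? [y/n] ").strip().lower() != "y":
            break
        draft = call_llm(
            "You are the generator (defense). For each problem listed, either "
            "fix the output or explain why it is already correct, then return "
            f"the revised output.\n\nOutput:\n{draft}\n\nProblems:\n{critique}"
        )
    return draft
```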
Let’s apply adversarial loop prompting to prose. I have a strong bias towards human-written prose, because AI-generated text seems to have a taint. Readers are very sensitive to prose that seems written by an LLM, and I recoil at the idea of being perceived as inauthentic. In this case, the generator is a human.
The human generator crafts a first draft of prose. It’s easier for humans to write first, then edit.
The LLM (or a human editor) reviews the first draft, pointing out logical errors, inconsistencies in the argument, weak sections, grammatical mistakes, and stylistic choices that could be improved; a sample review prompt follows this list.
The writer fixes or defends each problem that the editor has found, iterating through multiple edit cycles.
The human judge evaluates the outcome of the editing cycles and decides when the prose is ready for publication or sharing.
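To make the editor step concrete, here’s the kind of review prompt I might hand the LLM critic. This is only a sketch of my own wording, not a fixed template; the falsifiability requirement shows up as the demand to quote the exact passage behind every reported problem.

```python
# Sketch of a review prompt for the LLM editor (critic) role. The wording is
# illustrative; the key constraint is that every reported problem must quote
# the exact passage and explain why it fails.

REVIEW_PROMPT = """You are an editor reviewing the draft below.
List every problem you find: logical errors, inconsistencies in the
argument, weak sections, grammatical mistakes, and stylistic issues.
For each problem, quote the exact sentence or phrase, explain why it is
a problem, and suggest a concrete fix. Do not report a problem you
cannot support with a quotation from the draft.

Draft:
{draft}
"""

def build_review_prompt(draft: str) -> str:
    """Fill the review template with the draft to be edited."""
    return REVIEW_PROMPT.format(draft=draft)
```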
Now let’s try it for learning; a sketch of the exchange follows the steps below. The question comes either from the source material (e.g., a textbook) or from an LLM acting as a tutor.
The human learner answers the question.
The LLM critiques the learner’s answer, suggests corrections or improvements, and explains why each issue is a problem.
The learner either corrects each issue or defends the original answer where the critique misses the mark. The learner and the LLM keep iterating until all issues are resolved and agreement is reached.
The human judge decides when equilibrium has been reached and the learner can move on to the next question or finish the assignment.
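Here’s what that exchange might look like at the command line, with the human in the generator seat. It reuses the hypothetical call_llm helper from the earlier sketch, and the prompts and control flow are only illustrative.

```python
# Sketch of the learning loop: the human answers, the LLM tutor critiques,
# and the human revises or defends until the judge (also the human here)
# declares the question settled. call_llm is the same hypothetical helper
# as in the earlier sketch.

def study_question(question: str, call_llm) -> str:
    answer = input(f"Question: {question}\nYour answer: ")
    while True:
        critique = call_llm(
            "You are a tutor. Point out issues with the answer below, explain "
            "why each issue is a problem, and suggest improvements.\n\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        print(f"\nTutor: {critique}\n")
        choice = input("Revise (r), defend (d), or accept and move on (a)? ")
        if choice.strip().lower() == "a":
            return answer  # the judge declares equilibrium
        # Both a revision and a defense go back to the critic for another pass.
        answer = input("Revised answer or defense: ")
```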
The same pattern could also be applied to analysis, research, or decision-making. As with coding, it takes some experimentation to work out the details, but the technique should carry over to many domains.
Copying and pasting text from one LLM to another is wasted effort; in the future, it might be useful to have tools that automate the adversarial loop. The whole exchange could happen in a shared human/LLM forum where each participant knows its current role in the process. Another innovation might be a kind of jury, where critics evaluate each other’s reviews, reducing the need for human vetting. I plan to start by applying adversarial loop prompting to prose and learning, then move on to other domains.
The technique I’m calling Adversarial Loop Prompting has academic precedent: it’s known as Multi-Agent Debate (MAD), which has a nice ring to it. A colleague of mine suggested a key innovation: code reviewers should write tests for their assertions. For the LLM critic, this is the falsifiability requirement: if you claim there’s a problem, you need to prove it.
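In code, that requirement has a natural form: the critic backs every claimed bug with a test that fails against the code as written. A toy illustration, with a made-up function and test of my own rather than anything from the original review workflow:

```python
# Toy illustration of a falsifiable critique: the critic claims word_count()
# miscounts text containing repeated spaces, and proves it with a test that
# fails against the generator's current implementation.

def word_count(text: str) -> int:
    # Generator's implementation under review.
    return len(text.split(" "))

def test_word_count_handles_repeated_spaces():
    # Critic's evidence: "a  b" contains two words, but split(" ") yields
    # ['a', '', 'b'], so word_count returns 3 and this assertion fails.
    assert word_count("a  b") == 2

if __name__ == "__main__":
    test_word_count_handles_repeated_spaces()
```

If the generator disputes the claim, the test settles it: either it fails and the bug is real, or it passes and the critique was hallucinated.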

