AMEL: Accumulated Message Effects on LLM Judgments
How past reviews secretly shape an AI's next judgment
Large language models used to evaluate work—like reviewing code or moderating content—shift their judgments based on what they've just evaluated. When fed a stream of mostly positive or negative reviews, models become biased toward that same tone on identical test items, with the effect strongest when the model was genuinely uncertain. Negative history creates 1.62 times more bias than positive, and the problem persists even in the largest models, though starting fresh for each evaluation eliminates it entirely.
Companies and platforms increasingly use AI to automate high-stakes judgments: grading student work, reviewing job applications, moderating content at scale. If these systems systematically skew their verdicts based on what came before—showing extra leniency after positive reviews or extra harshness after negative ones—they'll rate identical submissions unfairly depending on order. The fix is simple: evaluating each item in a fresh context rather than batch-processing many items in one conversation. Without it, the outcome for any given submission risks being determined partly by luck.