Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most

Computer Science · AI · May 18, 2026

Why AI tutors spot perfect answers but miss the learning opportunities

Tahreem Yasir, Wenbo Li, Sam Gilson et al.
arXiv:2605.16207

Summary

Large language models used as tutoring agents excel at recognizing correct student solutions but systematically fail to distinguish wrong answers from right answers reached through flawed reasoning, which is exactly where feedback helps students improve. Across seven AI models tested on 10,836 logic problems, the models over-accepted incorrect reasoning and over-rejected valid but inefficient approaches, suggesting these failures stem from how the models are built rather than from missing information.
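
To make the two error types concrete, here is a minimal sketch of how over-acceptance and over-rejection rates could be computed from a tutor's verdicts. The category names, the toy records, and the `error_rates` helper are all hypothetical illustrations. They are not the paper's taxonomy, data, or evaluation code.

```python
from collections import Counter

# Hypothetical toy records (NOT data from the paper): each pairs the
# ground-truth category of a student solution with the tutor's verdict.
records = [
    ("correct", "accept"),
    ("correct", "accept"),
    ("valid_but_inefficient", "reject"),
    ("valid_but_inefficient", "accept"),
    ("flawed_reasoning", "accept"),
    ("flawed_reasoning", "reject"),
    ("wrong", "reject"),
]

# Assumed convention: a sound tutor accepts these categories and rejects the rest.
SHOULD_ACCEPT = {"correct", "valid_but_inefficient"}


def error_rates(records):
    """Return (over_acceptance_rate, over_rejection_rate).

    over-acceptance: share of should-reject solutions that were accepted.
    over-rejection: share of should-accept solutions that were rejected.
    """
    counts = Counter()
    for category, verdict in records:
        counts[(category in SHOULD_ACCEPT, verdict)] += 1
    n_bad = counts[(False, "accept")] + counts[(False, "reject")]
    n_good = counts[(True, "accept")] + counts[(True, "reject")]
    over_accept = counts[(False, "accept")] / n_bad if n_bad else 0.0
    over_reject = counts[(True, "reject")] / n_good if n_good else 0.0
    return over_accept, over_reject


over_accept, over_reject = error_rates(records)
print(f"over-acceptance: {over_accept:.2f}, over-rejection: {over_reject:.2f}")
```

Reporting the two rates separately matters because a tutor can look accurate overall, thanks to the many clearly correct solutions, while still erring in both directions on the harder cases.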

Why it matters

As schools and tutoring platforms increasingly deploy AI as learning tools, this gap could undermine their effectiveness. Students might receive approval for sloppy reasoning or harsh rejection for approaches that actually work, and neither outcome promotes real understanding. The research suggests that AI tutors work best not as standalone replacements for human judgment, but as part of a hybrid system in which traditional logic-based components diagnose student reasoning while the AI handles open-ended conversation and encouragement.

Posted on arXiv · May 15, 2026