From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Quantitative Finance May 29, 2026

Testing whether AI traders are actually skilled or just remembering stock prices

Taojie Zhu, Wentao Zhao, Rui Sun et al.
arXiv:2605.28359

Summary

When researchers tested advanced AI language models on simulated stock trading, the models appeared to make money, but the gains came almost entirely from broad market movements rather than genuine investment skill. A new benchmark called KTD-Fin revealed this in two ways: it hides stock names and dates so the AI cannot rely on memorized information, and it breaks down returns to show how much came from real decision-making and how much from passive market exposure.
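
The idea of splitting returns into a market part and a decision-making part can be illustrated with a minimal sketch. It assumes a simple single-factor regression of the agent's per-period returns on a market index, which is a common attribution technique and not necessarily the decomposition KTD-Fin uses. The function name `decompose_returns` and the sample numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean


def decompose_returns(agent, market):
    """Split the agent's mean per-period return into a market-exposure
    part (beta * mean market return) and a residual part.

    Assumption: a one-factor least-squares fit; this is an illustrative
    stand-in, not the benchmark's actual method.
    """
    if len(agent) != len(market) or len(agent) < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    a_bar, m_bar = mean(agent), mean(market)

    # Least-squares slope of agent returns on market returns (beta).
    cov = sum((a - a_bar) * (m - m_bar) for a, m in zip(agent, market))
    var = sum((m - m_bar) ** 2 for m in market)
    if var == 0:
        raise ValueError("market returns are constant; beta is undefined")
    beta = cov / var

    market_part = beta * m_bar        # return explained by market exposure
    residual = a_bar - market_part    # return left over after that exposure
    return {"beta": beta, "market_part": market_part, "residual": residual}


if __name__ == "__main__":
    # Hypothetical per-period returns, not data from the paper.
    market = [0.01, -0.02, 0.03, 0.00]
    agent = [1.2 * m for m in market]  # agent that only amplifies the market
    print(decompose_returns(agent, market))
```

In this toy case the agent's returns are just a scaled copy of the market's, so the residual comes out close to zero: the agent looks profitable whenever the market rises, yet none of the gain reflects its own choices.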

Why it matters

Companies and investors are pouring money into AI trading systems on the strength of impressive backtest results. If those results come from the AI simply remembering what happened rather than learning to pick winning stocks, the systems will fail in live markets. This benchmark makes it possible to tell the two apart, separating genuine trading skill from performance numbers inflated by data leakage.

Posted on arXiv · May 27, 2026