PAPER PLAINE

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

Testing whether AI traders are actually skilled or just remembering stock prices

When researchers tested advanced large language models (LLMs) on simulated stock trading, the models appeared to make money, but the gains came almost entirely from broad market movements rather than genuine investment skill. A new benchmark called KTD-Fin exposed this in two ways: it hides stock names and dates so that a model cannot rely on memorized information, and it breaks returns down to show how much came from the model's own decisions and how much from passive market exposure.
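To make the two ideas concrete, here is a minimal Python sketch. It does not reproduce KTD-Fin's actual procedure, which this summary does not spell out. The function names `anonymize` and `decompose_returns`, the `ASSET_n` and `day_n` labels, and all numbers are hypothetical. The return split assumes a simple least-squares fit of agent returns against market returns, a common textbook approach that may differ from the paper's method.

```python
from statistics import mean


def anonymize(rows):
    """Mask tickers and calendar dates in (ticker, iso_date, price) rows.

    Tickers become ASSET_1, ASSET_2, ... in order of first appearance and
    dates become day_0, day_1, ... in chronological order. Prices are left
    untouched here; a real benchmark might need to rescale them too.
    """
    aliases = {}
    day_index = {d: i for i, d in enumerate(sorted({d for _, d, _ in rows}))}
    masked = []
    for ticker, date, price in rows:
        alias = aliases.setdefault(ticker, f"ASSET_{len(aliases) + 1}")
        masked.append((alias, f"day_{day_index[date]}", price))
    return masked


def decompose_returns(agent_returns, market_returns):
    """Split mean agent return into a market-driven part and a residual.

    Fits r_agent = alpha + beta * r_market by ordinary least squares.
    beta * mean(r_market) is the passive market part; alpha is what is left.
    This is an assumed, generic decomposition, not the paper's own.
    """
    if len(agent_returns) != len(market_returns) or len(agent_returns) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_a, mean_m = mean(agent_returns), mean(market_returns)
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(agent_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    beta = cov / var
    return {
        "beta": beta,
        "alpha": mean_a - beta * mean_m,   # residual, decision-driven part
        "market_part": beta * mean_m,      # passive exposure part
        "mean_return": mean_a,
    }


# Hypothetical data: made-up tickers, dates, and returns.
rows = [("AAA", "2024-01-03", 10.0),
        ("BBB", "2024-01-02", 20.0),
        ("AAA", "2024-01-02", 9.0)]
masked_rows = anonymize(rows)

# An agent that simply holds 1.2x the market shows no residual skill.
market = [0.010, -0.005, 0.020, 0.003, -0.012]
agent = [1.2 * r for r in market]
result = decompose_returns(agent, market)
```

In this toy case the agent's whole average return is explained by its market exposure, so `alpha` comes out at essentially zero. That is the pattern the summary describes: apparent profit with no decision-driven component.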

Companies and investors are pouring money into AI trading systems on the strength of impressive backtest results. If those results come from the AI simply remembering what happened rather than learning to pick winning stocks, the systems will fail in live markets. This benchmark makes it possible to spot the difference by separating genuine trading skill from performance numbers inflated by data leakage.