Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

Computer Science · AI Jun 12, 2026

Teaching delivery systems to balance speed and efficiency using real marketplace outcomes

Haochen Wu, Yi Hou, Shiguang Xie
arXiv:2606.13604

Summary

DoorDash researchers built a multi-agent reinforcement learning system that learns how its delivery dispatch algorithm should weight speed against batching efficiency, using delayed outcome signals from thousands of real deliveries. The system increased batching and cut courier time costs without slowing customer delivery times, and it learned from historical marketplace data rather than requiring live experimentation.

Why it matters

Delivery platforms constantly balance competing pressures: faster delivery satisfies customers but wastes courier time, while efficient batching saves money but frustrates hungry customers. This system automates that tradeoff using real operational data, letting platforms cut costs without degrading service. The approach also demonstrates how to learn safely from messy, delayed real-world feedback without destabilizing live operations.

Posted on arXiv · Jun 11, 2026