FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection
Spotting exactly which log line signals a server problem, not just that something went wrong
Most systems that catch server problems flag entire groups of log lines, forcing engineers to dig through dozens of routine entries per alert. FAME uses an AI model to understand log patterns offline, then deploys lightweight detectors that pinpoint the exact problematic line in real time—catching 86% of problems even from never-before-seen error types, while requiring humans to label fewer than 100 examples per log type.
Server outages cost thousands of dollars per minute, and every minute spent investigating false alerts or irrelevant log lines is a minute closer to serious impact. By identifying the single line responsible for a failure instead of grouping entire sessions, FAME lets operators act faster and more confidently. The approach also cuts the labeling work required to deploy such systems by 76x, making it practical for teams managing millions of daily log lines across heterogeneous infrastructure.