On the (In-)Security of the Shuffling Defense in Transformer Secure Inference
Why shuffling an AI model's internal activations doesn't actually hide them from attackers
A security technique meant to protect AI models during remote computation (shuffling the model's internal activations before revealing them) can be defeated with roughly $1 worth of queries. Researchers show how to match the shuffled values back to their original positions, then use them to recover the model's actual weights, and demonstrate the attack on real models like GPT-2.
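To see why a fixed shuffle offers so little protection, consider a toy linear layer. The sketch below is a minimal illustration under simplifying assumptions (a single linear layer whose outputs are shuffled by one fixed secret permutation, and a public reference copy of the weights available for alignment); it is not the paper's exact attack, and names like `reference_W` are hypothetical.

```python
# Toy demonstration: un-shuffling a permuted linear layer, then recovering
# its weights. This is an illustrative sketch, not the paper's algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
d_in, d_out, n_queries = 64, 32, 512

# Secret server-side layer and the fixed shuffling permutation it applies.
W_secret = rng.normal(size=(d_out, d_in))
perm = rng.permutation(d_out)

# Attacker sends queries and observes the shuffled activations.
X = rng.normal(size=(n_queries, d_in))
Y_shuffled = (X @ W_secret.T)[:, perm]

# Step 1: each shuffled output coordinate is still a fixed linear function
# of the input, so least squares recovers the rows up to the unknown order.
B, *_ = np.linalg.lstsq(X, Y_shuffled, rcond=None)
W_permuted = B.T  # shape (d_out, d_in), rows in shuffled order

# Step 2: align rows against a reference model (e.g., a public checkpoint
# of the same architecture) via maximum-similarity bipartite matching.
# Here we use the secret weights as the reference purely to verify the idea.
reference_W = W_secret
similarity = reference_W @ W_permuted.T       # pairwise row dot products
_, shuf_idx = linear_sum_assignment(-similarity)  # maximize total similarity
W_recovered = W_permuted[shuf_idx]

print("max abs error:", np.abs(W_recovered - W_secret).max())  # ~1e-13
```

Because least squares pins down each row exactly (up to order) in this noiseless toy, the Hungarian matching step fully undoes the permutation; the real attack must align noisy, nonlinear activations, but the same match-then-recover structure applies.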
As AI systems move to the cloud, companies rely on cryptographic defenses to keep model weights secret while still computing results. This attack shows that a widely used shuffling defense provides only a false sense of security: companies relying on it may believe their models are protected when they are in fact vulnerable to cheap theft. Developers need stronger defenses before deploying sensitive models to untrusted servers.