RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution
Making machine learning explanations reliable across different data splits
Machine learning models often rank features differently depending on random choices in training, making it hard to trust which factors actually matter. This paper introduces RoSHAP, a new method that accounts for this natural variation by treating feature importance as a distribution rather than a single number, and shows it identifies truly influential features more reliably than standard approaches.
When doctors, banks, or regulators rely on machine learning to make decisions, they need to know which factors the model actually used, not just a ranking that changes every time the model is retrained. RoSHAP makes those explanations more stable and trustworthy. The method also lets companies use fewer data inputs without losing prediction accuracy, which reduces model complexity.
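The idea of treating feature importance as a distribution rather than a single number can be illustrated with a small sketch. This is not the RoSHAP metric itself, whose definition is not given in this summary. The attribution values are made up, and the helper names (`rank_single_run`, `summarize`, `rank_by_stability`) and the mean-over-spread score are assumptions chosen only to show why a ranking from one training run can mislead.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Hypothetical attribution scores (for example, mean absolute SHAP value per
# feature) from three retrainings of the same model with different seeds.
# These numbers are invented for illustration only.
RUNS: List[Dict[str, float]] = [
    {"age": 0.50, "income": 0.10, "noise": 0.45},
    {"age": 0.52, "income": 0.12, "noise": 0.05},
    {"age": 0.48, "income": 0.11, "noise": 0.02},
]


def rank_single_run(run: Dict[str, float]) -> List[str]:
    """Rank features by their score in one training run (the usual approach)."""
    return sorted(run, key=lambda f: run[f], reverse=True)


def summarize(runs: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Describe each feature's scores across runs by (mean, sample std dev)."""
    features = list(runs[0].keys())
    return {
        f: (mean([r[f] for r in runs]), stdev([r[f] for r in runs]))
        for f in features
    }


def rank_by_stability(
    summary: Dict[str, Tuple[float, float]], eps: float = 1e-9
) -> List[str]:
    """Rank features by mean / (std + eps).

    Assumed illustrative score, not the paper's metric: it rewards features
    whose importance is both large and consistent across runs.
    """
    return sorted(
        summary,
        key=lambda f: summary[f][0] / (summary[f][1] + eps),
        reverse=True,
    )


if __name__ == "__main__":
    print("Ranking from the first run only:", rank_single_run(RUNS[0]))
    print("Ranking across all runs:", rank_by_stability(summarize(RUNS)))
```

With these invented numbers, the first run alone places `noise` second because of one unusually high score, while the across-run ranking places it last because its score swings widely between retrainings.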