The ability to predict failures before they occur is a fundamental enabler for reducing field failures and improving the reliability of complex software systems. Recent research proposes many techniques for detecting anomalous values of system metrics and demonstrates that collective anomalies are a good symptom of failure-prone states. In this paper, we (i) observe the analogy between complex software systems and multi-particle and network systems, (ii) propose using energy-based models, commonly exploited in physics and statistical mechanics, to precisely reveal failure-prone behaviors without training with seeded errors, and (iii) present preliminary experimental results that show the feasibility of our approach.
Cristina Monni, Università della Svizzera italiana; Mauro Pezzè, Università della Svizzera italiana (USI) (Switzerland) and Università degli Studi di Milano-Bicocca (Italy)