Explainable and Privacy-Preserving Machine Learning via Domain-Aware Symbolic Regression

Kei Sen Fong, Mehul Motani

Abstract: Explainability and privacy are the top concerns in machine learning (ML) for medical applications. In this paper, we propose a novel method, Domain-Aware Symbolic Regression with Homomorphic Encryption (DASR-HE), that addresses both concerns simultaneously by (i) producing domain-aware, intuitive and explainable models that do not require the end-user to possess ML expertise, and (ii) training only on securely encrypted data, without access to actual data values or model parameters. DASR-HE is based on Symbolic Regression (SR), a first-class ML approach that produces simple, concise equations for regression that require no ML expertise to interpret. In our work, we improve the performance of SR algorithms by using existing domain-specific medical equations to augment the search space of equations, which decreases the search complexity and yields equations similar in structure to those used in practice. To preserve the privacy of the medical data, we enable our algorithm to learn on data encrypted with homomorphic encryption (HE), under which arithmetic operations can be performed directly in the encrypted space. This property makes HE suitable for ML algorithms that must learn models without access to the actual data values or model parameters. We evaluate DASR-HE on three medical tasks, namely predicting glomerular filtration rate, endotracheal tube (ETT) internal diameter, and ETT depth, and find that DASR-HE outperforms existing medical equations, other SR algorithms, and other explainable ML algorithms.
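To make the phrase "arithmetic operations can be done in the encrypted space" concrete, here is a minimal, insecure toy sketch in Python. The abstract does not say which HE scheme DASR-HE uses; this sketch swaps in the textbook Paillier cryptosystem, which is only additively homomorphic (multiplying ciphertexts adds plaintexts, and raising a ciphertext to a plaintext power scales it). The tiny hard-coded primes, the helper names (`encrypt`, `decrypt`, `add_encrypted`, `scale_encrypted`, `encrypted_linear_score`) and the use of plaintext integer weights are all illustrative assumptions, not the paper's method; in particular, DASR-HE also keeps model parameters hidden, which this toy does not.

```python
import math
import random

# Toy Paillier keypair with tiny hard-coded primes (INSECURE, illustration only).
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1                          # standard simple choice of generator
LAM = math.lcm(P - 1, Q - 1)       # Carmichael function lambda(N)
# mu = L(G^LAM mod N^2)^(-1) mod N, where L(u) = (u - 1) // N
MU = pow((pow(G, LAM, N2) - 1) // N, -1, N)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover the plaintext (mod N) from a ciphertext."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Enc(a) * Enc(b) mod N^2 decrypts to (a + b) mod N."""
    return (c1 * c2) % N2


def scale_encrypted(c: int, k: int) -> int:
    """Enc(a) ** k mod N^2 decrypts to (k * a) mod N, for a plaintext k >= 0."""
    return pow(c, k, N2)


def encrypted_linear_score(enc_features, weights, bias):
    """Compute Enc(bias + sum_i w_i * x_i) without ever decrypting any x_i."""
    acc = encrypt(bias)
    for c, w in zip(enc_features, weights):
        acc = add_encrypted(acc, scale_encrypted(c, w))
    return acc


# Usage with hypothetical integer-encoded features and weights:
enc_x = [encrypt(70), encrypt(80)]
score = encrypted_linear_score(enc_x, weights=[2, 3], bias=5)
print(decrypt(score))  # 5 + 2*70 + 3*80 = 385
```

The party evaluating `encrypted_linear_score` only ever handles ciphertexts; only the key holder can decrypt the result. Real deployments would use a vetted library and, for learning real-valued equations with multiplications between encrypted values, a scheme that supports them (for example an approximate-arithmetic scheme such as CKKS), with real numbers encoded as fixed-point values.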