Tutorials from ACM CHIL 2021

Causal Inference in Clinical Research: From Theory to Practice

Linbo Wang

Abstract: Causal inference is an important topic in healthcare because a causal relationship between an exposure and a health outcome may suggest an intervention to improve the health outcome. In this tutorial, we provide an introduction to the field of causal inference. We will cover several fundamental topics in causal inference, including the potential outcome framework, structural equation modeling, propensity score modeling, and instrumental variable analysis. Methods will be illustrated using real clinical examples.

Bio: Linbo Wang is an Assistant Professor in the Department of Statistical Sciences, University of Toronto. He is also an Affiliate Assistant Professor in the Department of Statistics, University of Washington, and a faculty affiliate at the Vector Institute. His research interests center on causality and its interaction with statistics and machine learning. Prior to these roles, he was a postdoc at the Harvard T.H. Chan School of Public Health. He obtained his Ph.D. from the University of Washington.

Experimental Design and Causal Inference Methods For Micro-Randomized Trials: A Framework for Developing Mobile Health Interventions

Tianchen Qian

Abstract: Mobile health (mHealth) technologies are providing promising new ways to deliver interventions in both clinical and non-clinical settings. Wearable sensors and smartphones collect real-time data streams that provide information about an individual’s current health, including both internal (e.g., mood, blood sugar level) and external (e.g., social, location) contexts. Both wearables and smartphones can be used to deliver interventions. mHealth interventions are in current use across a vast number of health-related fields, including medication adherence, physical activity, weight loss, mental illness, and addictions. This tutorial discusses the micro-randomized trial (MRT), an experimental trial design for optimizing the real-time delivery of sequences of treatments, with an emphasis on mHealth. We introduce the MRT design using HeartSteps, a physical activity study, as an example. We define the causal excursion effect and discuss why this effect is often considered the primary causal effect of interest in MRT analysis. We introduce statistical methods for primary and secondary analyses of MRTs with continuous and binary outcomes. We also discuss sample size considerations for designing MRTs.

Bio: Tianchen Qian is an Assistant Professor in the Department of Statistics at the University of California, Irvine. He completed his PhD at Johns Hopkins University and was a postdoctoral fellow at Harvard University. His research focuses on experimental design and statistical analysis methods for developing mobile health interventions. In particular, he has developed causal inference methods for analyzing micro-randomized trial data and sample size calculation approaches for designing micro-randomized trials.

Tianchen Qian, Ph.D., Assistant Professor, Department of Statistics, Donald Bren School of Information and Computer Sciences, UC Irvine | Email: t.qian@uci.edu | Website: https://sites.google.com/view/tianchen-qian

Offline Reinforcement Learning

Guy Tennenholtz

Abstract: Offline reinforcement learning (offline RL), a.k.a. batch-mode reinforcement learning, involves learning a policy from potentially suboptimal data. In contrast to imitation learning, offline RL does not rely on expert demonstrations, but rather seeks to surpass the average performance of the agents that generated the data. Methodologies such as the gathering of new experience fall short in offline settings, requiring a reassessment of fundamental learning paradigms. In this tutorial I aim to present the necessary background and the challenges of this exciting area of research, from off-policy evaluation through bandits to deep reinforcement learning.

Bio: Guy Tennenholtz is a fourth-year Ph.D. student at the Technion, advised by Prof. Shie Mannor. His research interests lie in the field of reinforcement learning, and specifically in how offline data can be leveraged to build better agents. Large action spaces, partial observability, confounding bias, and uncertainty are only some of the problems he is actively researching. In his spare time Guy also enjoys creating mobile games, with the vision of incorporating AI into both the game development process and gameplay.

Explainable ML: Understanding the Limits and Pushing the Boundaries

Hima Lakkaraju

Abstract: As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in a post hoc manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. However, recent research has shed light on the vulnerabilities of popular post hoc explanation techniques. In this tutorial, I will provide a brief overview of post hoc explanation methods with special emphasis on feature attribution methods such as LIME and SHAP. I will then discuss recent research which demonstrates that these methods are brittle, unstable, and vulnerable to a variety of adversarial attacks. Lastly, I will present two solutions that address some of the vulnerabilities of these methods: (i) a generic framework based on adversarial training that is designed to make post hoc explanations more stable and robust to shifts in the underlying data, and (ii) a Bayesian framework that captures the uncertainty associated with post hoc explanations and in turn allows us to generate reliable explanations that satisfy user-specified levels of confidence. Overall, this tutorial will provide a bird’s eye view of the state of the art in the burgeoning field of explainable machine learning.

Bio: Hima Lakkaraju is an Assistant Professor at Harvard University focusing on explainability, fairness, and robustness of machine learning models. She has also been working with various domain experts in criminal justice and healthcare to understand the real world implications of explainable and fair ML. Hima has recently been named one of the 35 innovators under 35 by MIT Tech Review, and has received best paper awards at SIAM International Conference on Data Mining (SDM) and INFORMS. She has given invited workshop talks at ICML, NeurIPS, AAAI, and CVPR, and her research has also been covered by various popular media outlets including the New York Times, MIT Tech Review, TIME, and Forbes. For more information, please visit: https://himalakkaraju.github.io

Semi-supervised Phenotyping with Electronic Health Records

Jesse Gronsbell, Chuan Hong, Molei Liu, Clara-Lea Bonzel, Aaron Sonabend

Abstract: Phenotyping is the process of identifying a patient’s health state based on the information in their electronic health records. In this tutorial, we will discuss why phenotyping is a challenging problem from both a practical and a methodological perspective. We will focus primarily on the challenges in obtaining annotated phenotype information from patient records and present statistical learning methods that leverage unlabeled examples to improve model estimation and evaluation, thereby reducing the annotation burden.

Bio: Jesse Gronsbell is an Assistant Professor in the Department of Statistical Sciences at the University of Toronto. Prior to joining U of T, Jesse spent a couple of years as a data scientist in the Mental Health Research and Development Group at Alphabet's Verily Life Sciences. Her primary interest is in the development of statistical methods for modern digital data sources such as electronic health records and mobile health data.

Chuan Hong is an instructor in biomedical informatics in the Department of Biomedical Informatics (DBMI) at Harvard Medical School. She received her PhD in Biostatistics from the University of Texas Health Science Center at Houston. Her doctoral research focused on meta-analysis and DNA methylation detection. At DBMI, Chuan's research interests lie in developing statistical and computational methods for biomarker evaluation, predictive modeling, and precision medicine with biomedical data. In particular, she is interested in combining electronic medical records with biorepositories and relevant resources to improve phenotyping accuracy, detect novel biomarkers, and monitor disease progression in clinical research.

Molei Liu is a fourth-year PhD candidate in the Biostatistics department at Harvard T.H. Chan School of Public Health. He received a Bachelor's degree in Statistics from Peking University. Molei has been working in areas including high-dimensional statistics, distributed learning, semi-supervised learning, semi-parametric inference, and model-X inference. He has also been working on methods for phenome-wide association studies (PheWAS) using electronic health records data.

Clara-Lea Bonzel is a research assistant in the Department of Biomedical Informatics at Harvard Medical School. She is mainly interested in personalized medicine using phenomic and genomic data, as well as in model selection and evaluation. Clara-Lea received her master's degree in Applied Mathematics and Financial Engineering from the Swiss Federal Institute of Technology (EPFL).

Aaron Sonabend is a PhD candidate in the Biostatistics department at Harvard T.H. Chan School of Public Health. He is primarily focused on developing robust reinforcement learning and natural language processing methods for contexts with sampling bias, partially observed rewards, or strong distribution shifts. He is interested in healthcare and biomedical applications, such as finding optimal sequential treatment regimes for complex diseases, and phenotyping using electronic health records. Aaron holds a Bachelor's degree in Applied Mathematics and in Economics from the National Autonomous Technological Institute of Mexico (ITAM).