Research


I have a background in statistics, survey data science, and ML. My main research area is calibrating big data, treated as large-scale non-probability samples, for finite population inference. Because existing approaches rely heavily on flexible prediction modeling, I have developed my theoretical knowledge and skills in ML, with a focus on Bayesian learning methods, e.g. BART and GP, which are not only robust but also permit quantifying prediction uncertainty. Since survey data come with additional complexities in the sample design and data structure, I’d like to expand my studies on how to properly account for sampling weights and sampling clusters when training ML models.

My research interests also extend to a wide range of methods for causal inference using observational data. I am particularly interested in the use of ML methods for estimating heterogeneous treatment effects in complex scenarios, e.g. when non-compliance or pre-exposure bias is present. I am also experienced in the design and analysis of complex sample surveys, especially in health domains. While working at CDC, I have had the opportunity to contribute to multiple health surveys, from designing the sampling scheme to analyzing the data and reporting the results. My other interests include Bayesian reinforcement learning and building automated anomaly detection systems, especially for the early detection of infectious disease epidemics.