First Advisor
Yu, Han
Document Type
Dissertation
Date Created
5-2022
Department
College of Education and Behavioral Sciences, Applied Statistics and Research Methods, ASRM Student Work
Abstract
High-dimensional data are increasingly common in the physical and social sciences. This study proposed a new, computationally efficient sample-splitting method, Neighborhood-Based Cross Fitting (NBCF), for double machine learning in causal inference on high-dimensional data. Repeated data splitting is a common existing approach to the overfitting problem in high-dimensional statistics, but it is computationally expensive. The proposed method deals well with post-selection bias in causal inference in the presence of high-dimensional confounders. It also matches the unbiased-estimation performance of repeated data splitting, which has been suggested as a way to expand the admissible function class beyond the Donsker class. Simulation studies demonstrated that NBCF is more computationally efficient than existing sample-splitting methods and also reduces bias more than other existing methods. Under certain conditions, the simulation results further showed that the proposed estimators are consistent, asymptotically unbiased, and normally distributed, which allows the construction of valid confidence intervals. The practical application of NBCF was illustrated with a real dataset.
Extent
149 pages
Local Identifiers
Agboola_unco_0161D_11005.pdf
Rights Statement
Copyright is held by the author.
Recommended Citation
Agboola, Oluwagbenga David, "An Efficient Computational Method for Causal Inference in High-Dimensional Data: Neighborhood-Based Cross Fitting" (2022). Dissertations. 845.
https://digscholarship.unco.edu/dissertations/845