First Advisor

Yu, Han

Document Type


Date Created



High-dimensional data are increasingly common in the physical and social sciences. This study proposed a new, computationally efficient sample-splitting method, Neighborhood-Based Cross Fitting (NBCF), for double machine learning in causal inference on high-dimensional data. Repeated data splitting, a common existing approach, has been suggested as a remedy for overfitting in high-dimensional statistics, but it is computationally expensive. The proposed method handles post-selection bias in causal inference in the presence of high-dimensional confounders. It also matches the performance of repeated data splitting in unbiased estimation; repeated splitting has been suggested as a way to expand the admissible function class beyond the Donsker class. Simulation studies demonstrated that the proposed NBCF approach is more computationally efficient than existing sample-splitting methods and also reduces bias more than other existing methods. Simulation results further showed that, under certain conditions, the proposed estimators are consistent, asymptotically unbiased, and normally distributed, which allows construction of valid confidence intervals. The practical application of NBCF was illustrated with a real dataset.


149 pages

Local Identifiers


Rights Statement

Copyright is held by the author.