College of Educational and Behavioral Sciences
University of Northern Colorado
Place of Publication
University of Northern Colorado
Density estimation and nonparametric methods have become increasingly important over the last three decades. These methods are useful for analyzing modern data, which often include many variables and numerous observations; datasets with hundreds of variables and millions of observations are common. Examples proliferate in fields such as geological exploration, speech recognition, and biological and medical research. There is therefore an increasing demand for tools that can detect and summarize the multivariate structure in complex data. Only nonparametric methods are able to do so when little is known about the data-generating process. The term nonparametric does not necessarily mean that a model lacks parameters, but rather that the number of parameters is not fixed and may be infinite. In contrast, the functional forms of parametric counterparts are known up to a finite number of parameters. The kernel method is one of the most prominent and useful nonparametric approaches in statistics, machine learning, and econometrics. In fact, virtually all nonparametric algorithms are asymptotically kernel methods. Kernel analysis allows high-dimensional data to be transformed into low-dimensional statistical problems via kernel functions. Density estimation is now recognized as a useful tool for univariate and bivariate data. The goal of this study was to demonstrate that it is also a powerful tool for the analysis of high-dimensional data. This research focused on the asymptotic aspects and the application of nonparametric methods to multivariate data, which eventually leads to research on Gaussian processes (GP) or, more generally, random fields (RF). In this dissertation, a novel multivariate nonparametric approach was proposed to more strongly smooth raw data, reducing the dimension of the solution to a handful of interesting parameters.
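As a rough illustration of the kernel density estimation idea referred to above, the following minimal sketch evaluates a multivariate density estimate with a Gaussian kernel and a single scalar bandwidth. The function name `gaussian_kde_eval`, the shared bandwidth `h`, and the simulated data are assumptions made for illustration only; they are not taken from the dissertation.

```python
import numpy as np

def gaussian_kde_eval(data, points, bandwidth):
    """Evaluate a Gaussian kernel density estimate (illustrative sketch).

    data:      (n, d) array of observations
    points:    (m, d) array of locations at which to evaluate the estimate
    bandwidth: scalar h > 0 shared by all coordinates (an assumption)
    """
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    n, d = data.shape
    # Scaled differences between every evaluation point and every observation.
    diff = (points[:, None, :] - data[None, :, :]) / bandwidth  # shape (m, n, d)
    sq_dist = np.sum(diff ** 2, axis=2)                         # shape (m, n)
    # Normalizing constant of the d-dimensional Gaussian kernel with bandwidth h.
    norm = (2.0 * np.pi) ** (d / 2.0) * bandwidth ** d
    return np.exp(-0.5 * sq_dist).sum(axis=1) / (n * norm)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal((200, 3))   # hypothetical 3-dimensional sample
    query = np.zeros((1, 3))                 # evaluate the estimate at the origin
    print(gaussian_kde_eval(sample, query, bandwidth=0.5))
```

In practice the bandwidth governs the amount of smoothing, and a full bandwidth matrix or a data-driven selector may be preferable to the single scalar assumed here.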
The proposed approach employed kernel density estimation (KDE), which can be applied to testing the equality of location parameters in the one-way layout as well as the main effects and the interaction effect in the two-way layout. First, multivariate kernel-based tests were developed to conduct multivariate analysis of variance (MANOVA), testing hypotheses about differences among group means in various settings, including the one-way layout and the two-way layout with interaction. Then, the asymptotic properties of the proposed methods were investigated and their asymptotic distributions were derived. Next, simulations were conducted to investigate the small-sample behavior of the proposed nonparametric kernel-based test statistics for the one-way and two-way layouts. Comparisons were then made between the proposed nonparametric kernel-based methods and their traditional parametric counterparts for the one-way and two-way layouts. Finally, the proposed nonparametric kernel-based methods were applied to a real image dataset. The results of this dissertation showed that the proposed nonparametric kernel-based methods have greater power than their parametric counterpart, i.e., MANOVA, in detecting differences between groups in multivariate settings when the underlying distribution of the data is not normal.
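The abstract does not give the form of the proposed test statistics, so the sketch below is not the dissertation's method. It substitutes a generic stand-in, a two-group permutation test based on the squared maximum mean discrepancy (MMD) with a Gaussian kernel, simply to show how a kernel-based comparison of multivariate groups can be carried out without a normality assumption. Note that this stand-in tests equality of distributions rather than equality of location parameters alone. The function names, the bandwidth, and the number of permutations are all illustrative assumptions.

```python
import numpy as np

def gaussian_gram(x, y, bandwidth):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (gaussian_gram(x, x, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean())

def permutation_test(x, y, bandwidth=1.0, n_perm=500, seed=0):
    """Permutation p-value for the hypothesis that x and y share a distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        perm = rng.permutation(len(pooled))
        if mmd2(pooled[perm[:n]], pooled[perm[n:]], bandwidth) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    group_a = rng.standard_exponential((40, 3))        # hypothetical skewed data
    group_b = rng.standard_exponential((40, 3)) + 0.5  # same shape, shifted location
    stat, p_value = permutation_test(group_a, group_b)
    print(stat, p_value)
```

A permutation reference distribution is used here because it requires no asymptotic theory; the dissertation instead derives asymptotic distributions for its own statistics.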
Copyright is held by the author.