Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications

PDF Version Also Available for Download.

Description

With advances in computer technology, organizations are able to store large amounts of data in data warehouses. There are two fundamental issues researchers must address: the dimensionality of data and the interpretation of multiple statistical tests. The first issue addressed by this research is the determination of the number of components to retain in principal components analysis. This research establishes regression, asymptotic theory, and neural network approaches for estimating mean and 95th percentile eigenvalues for implementing Horn's parallel analysis procedure for retaining components. Certain methods perform better for specific combinations of sample size and numbers of variables. The adjusted normal ... continued below

Creation Information

Keeling, Kellie Bliss August 1999.

Context

This dissertation is part of the collection entitled: UNT Theses and Dissertations and was provided by UNT Libraries to Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 81 times . More information about this dissertation can be viewed below.

Who

People and organizations associated with either the creation of this dissertation or its content.

Chair

Committee Members

Publisher

Rights Holder

For guidance see Citations, Rights, Re-Use.

  • Keeling, Kellie Bliss

Audiences

We've identified this thesis or dissertation as a primary source within our collections. Researchers, educators, and students may find this dissertation useful in their work.

Provided By

UNT Libraries

Library facilities at the University of North Texas function as the nerve center for teaching and academic research. In addition to a major collection of electronic journals, books and databases, five campus facilities house just under six million cataloged holdings, including books, periodicals, maps, documents, microforms, audiovisual materials, music scores, full-text journals and books. A branch library is located at the University of North Texas Dallas Campus.

Contact Us

What

Descriptive information to help identify this dissertation. Follow the links below to find similar items on the Digital Library.

Description

With advances in computer technology, organizations are able to store large amounts of data in data warehouses. There are two fundamental issues researchers must address: the dimensionality of data and the interpretation of multiple statistical tests. The first issue addressed by this research is the determination of the number of components to retain in principal components analysis. This research establishes regression, asymptotic theory, and neural network approaches for estimating mean and 95th percentile eigenvalues for implementing Horn's parallel analysis procedure for retaining components. Certain methods perform better for specific combinations of sample size and numbers of variables. The adjusted normal order statistic estimator (ANOSE), an asymptotic procedure, performs the best overall. Future research is warranted on combining methods to increase accuracy. The second issue involves interpreting multiple statistical tests. This study uses simulation to show that Parker and Rothenberg's technique using a density function with a mixture of betas to model p-values is viable for p-values from central and non-central t distributions. The simulation study shows that final estimates obtained in the proposed mixture approach reliably estimate the true proportion of the distributions associated with the null and nonnull hypotheses. Modeling the density of p-values allows for better control of the true experimentwise error rate and is used to provide insight into grouping hypothesis tests for clustering purposes. Future research will expand the simulation to include p-values generated from additional distributions. The techniques presented are applied to data from Lake Texoma where the size of the database and the number of hypotheses of interest call for nontraditional data mining techniques. The issue is to determine if information technology can be used to monitor the chlorophyll levels in the lake as chloride is removed upstream. A relationship established between chlorophyll and the energy reflectance, which can be measured by satellites, enables more comprehensive and frequent monitoring. The results have both economic and political ramifications.

Language

Identifier

Unique identifying numbers for this dissertation in the Digital Library or other systems.

Collections

This dissertation is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

What responsibilities do I have when using this dissertation?

When

Dates and time periods associated with this dissertation.

Creation Date

  • August 1999

Added to The UNT Digital Library

  • Sept. 20, 2007, 8:49 p.m.

Description Last Updated

  • April 8, 2016, 6:59 p.m.

Usage Statistics

When was this dissertation last used?

Yesterday: 0
Past 30 days: 0
Total Uses: 81

Interact With This Dissertation

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

Citations, Rights, Re-Use

Keeling, Kellie Bliss. Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications, dissertation, August 1999; Denton, Texas. (digital.library.unt.edu/ark:/67531/metadc2231/: accessed April 30, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; .