Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications

Keeling, Kellie Bliss

Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications

Primary view of object titled 'Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications'.

PDF Version Also Available for Download.

Description

With advances in computer technology, organizations are able to store large amounts of data in data warehouses. There are two fundamental issues researchers must address: the dimensionality of data and the interpretation of multiple statistical tests. The first issue addressed by this research is the determination of the number of components to retain in principal components analysis. This research establishes regression, asymptotic theory, and neural network approaches for estimating mean and 95th percentile eigenvalues for implementing Horn's parallel analysis procedure for retaining components. Certain methods perform better for specific combinations of sample size and numbers of variables. The adjusted normal … continued below

Creation Information

Keeling, Kellie Bliss August 1999.

Context

This dissertation is part of the collection entitled: UNT Theses and Dissertations and was provided by the UNT Libraries to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 128 times. More information about this dissertation can be viewed below.

Publisher

University of North Texas
Place of Publication: Denton, Texas

Rights Holder

For guidance see Citations, Rights, Re-Use.

Keeling, Kellie Bliss

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

Degree Information

Name: Doctor of Philosophy
Level: Doctoral
Discipline: Management Science
Department: Department of Information Technology and Decision Sciences
Grantor: University of North Texas

Description

With advances in computer technology, organizations are able to store large amounts of data in data warehouses. There are two fundamental issues researchers must address: the dimensionality of data and the interpretation of multiple statistical tests. The first issue addressed by this research is the determination of the number of components to retain in principal components analysis. This research establishes regression, asymptotic theory, and neural network approaches for estimating mean and 95th percentile eigenvalues for implementing Horn's parallel analysis procedure for retaining components. Certain methods perform better for specific combinations of sample size and numbers of variables. The adjusted normal order statistic estimator (ANOSE), an asymptotic procedure, performs the best overall. Future research is warranted on combining methods to increase accuracy. The second issue involves interpreting multiple statistical tests. This study uses simulation to show that Parker and Rothenberg's technique using a density function with a mixture of betas to model p-values is viable for p-values from central and non-central t distributions. The simulation study shows that final estimates obtained in the proposed mixture approach reliably estimate the true proportion of the distributions associated with the null and nonnull hypotheses. Modeling the density of p-values allows for better control of the true experimentwise error rate and is used to provide insight into grouping hypothesis tests for clustering purposes. Future research will expand the simulation to include p-values generated from additional distributions. The techniques presented are applied to data from Lake Texoma where the size of the database and the number of hypotheses of interest call for nontraditional data mining techniques. The issue is to determine if information technology can be used to monitor the chlorophyll levels in the lake as chloride is removed upstream. A relationship established between chlorophyll and the energy reflectance, which can be measured by satellites, enables more comprehensive and frequent monitoring. The results have both economic and political ramifications.

Subjects

Keywords

Library of Congress Subject Headings

Language

English

Item Type

Thesis or Dissertation

Identifier

Unique identifying numbers for this dissertation in the Digital Library or other systems.

OCLC: 45056324
UNT Catalog No.: b2228421 | View in Discover
Archival Resource Key: ark:/67531/metadc2231

Collections

This dissertation is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

What responsibilities do I have when using this dissertation?

Creation Date

August 1999

Added to The UNT Digital Library

Sept. 20, 2007, 8:49 p.m.

Description Last Updated

March 21, 2019, 3:43 p.m.

Usage Statistics

When was this dissertation last used?

Yesterday: 0

Past 30 days: 0

Total Uses: 128

Keeling, Kellie Bliss. Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications, dissertation, August 1999; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc2231/: accessed April 19, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; .

Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications

Description

Creation Information

Context

Who

Author

Chair

Committee Members

Publisher

Rights Holder

Provided By

UNT Libraries

Contact Us

What

Degree Information

Description

Subjects

Keywords

Library of Congress Subject Headings

Language

Item Type

Identifier

Collections

UNT Theses and Dissertations

Digital Files

When

Creation Date

Added to The UNT Digital Library

Description Last Updated

Usage Statistics

Interact With This Dissertation

Search Inside

Start Reading

Citations, Rights, Re-Use

International Image Interoperability Framework

Print / Share

Links for Robots

Archival Resource Key (ARK)

International Image Interoperability Framework (IIIF)

Metadata Formats

Images

URLs

Stats