UNT Theses and Dissertations - Browse


Classification by Neural Network and Statistical Models in Tandem: Does Integration Enhance Performance?

Description: The major purposes of the current research are twofold. The first purpose is to present a composite approach to the general classification problem by using outputs from various parametric statistical procedures and neural networks. The second purpose is to compare several parametric and neural network models on a transportation planning related classification problem and five simulated classification problems.
Date: December 1998
Creator: Mitchell, David
Partner: UNT Libraries

Comparing the Powers of Several Proposed Tests for Testing the Equality of the Means of Two Populations When Some Data Are Missing

Description: In comparing the means of two normally distributed populations with unknown variance, two tests are used very often: the two independent sample t test and the paired sample t test. There is a possible gain in the power of the significance test by using the paired sample design instead of the two independent samples design.
Date: May 1994
Creator: Dunu, Emeka Samuel
Partner: UNT Libraries
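
The possible gain from pairing mentioned in the description above can be illustrated with a small worked example. This is only a sketch on made-up numbers (the `before` and `after` lists and both function names are hypothetical, and no data are missing here); it is not the procedure studied in the dissertation. When the two measurements in each pair are positively correlated, the paired statistic tends to be larger than the pooled two-sample statistic computed from the same values.

```python
import math
import statistics


def paired_t(x, y):
    """Paired-sample t statistic computed from the differences y - x."""
    d = [b - a for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def pooled_t(x, y):
    """Two-independent-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(y) - statistics.mean(x)) / std_err


# Hypothetical illustrative data: five subjects measured twice.
before = [10, 12, 9, 11, 13]
after = [11, 14, 10, 12, 15]

t_paired = paired_t(before, after)
t_pooled = pooled_t(before, after)
print(f"paired t = {t_paired:.3f}, pooled two-sample t = {t_pooled:.3f}")
```

The paired statistic uses n - 1 = 4 degrees of freedom and the pooled statistic uses 8, so the two values should be referred to different t distributions before drawing any conclusion about power.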

Developing Criteria for Extracting Principal Components and Assessing Multiple Significance Tests in Knowledge Discovery Applications

Description: With advances in computer technology, organizations are able to store large amounts of data in data warehouses. There are two fundamental issues researchers must address: the dimensionality of data and the interpretation of multiple statistical tests. The first issue addressed by this research is the determination of the number of components to retain in principal components analysis. This research establishes regression, asymptotic theory, and neural network approaches for estimating mean and 95th percentile eigenvalues for implementing Horn's parallel analysis procedure for retaining components. Certain methods perform better for specific combinations of sample size and numbers of variables. The adjusted normal order statistic estimator (ANOSE), an asymptotic procedure, performs the best overall. Future research is warranted on combining methods to increase accuracy. The second issue involves interpreting multiple statistical tests. This study uses simulation to show that Parker and Rothenberg's technique using a density function with a mixture of betas to model p-values is viable for p-values from central and non-central t distributions. The simulation study shows that final estimates obtained in the proposed mixture approach reliably estimate the true proportion of the distributions associated with the null and nonnull hypotheses. Modeling the density of p-values allows for better control of the true experimentwise error rate and is used to provide insight into grouping hypothesis tests for clustering purposes. Future research will expand the simulation to include p-values generated from additional distributions. The techniques presented are applied to data from Lake Texoma where the size of the database and the number of hypotheses of interest call for nontraditional data mining techniques. The issue is to determine if information technology can be used to monitor the chlorophyll levels in the lake as chloride is removed upstream. 
A relationship established between chlorophyll and the energy reflectance, which can be measured by satellites, enables ...
Date: August 1999
Creator: Keeling, Kellie Bliss
Partner: UNT Libraries

Economic Statistical Design of Inverse Gaussian Distribution Control Charts

Description: Statistical quality control (SQC) is one technique companies are using in the development of a Total Quality Management (TQM) culture. Shewhart control charts, a widely used SQC tool, rely on an underlying normal distribution of the data. Often data are skewed. The inverse Gaussian distribution is a probability distribution that is well suited to handling skewed data. This analysis develops models and a set of tools usable by practitioners for the constrained economic statistical design of control charts for inverse Gaussian distribution process centrality and process dispersion. The use of this methodology is illustrated by the design of an x-bar chart and a V chart for an inverse Gaussian distributed process.
Date: August 1990
Creator: Grayson, James M. (James Morris)
Partner: UNT Libraries

The Effect of Certain Modifications to Mathematical Programming Models for the Two-Group Classification Problem

Description: This research examines certain modifications of the mathematical programming models to improve their classificatory performance. These modifications involve the inclusion of second-order terms and secondary goals in mathematical programming models. A Monte Carlo simulation study is conducted to investigate the performance of two standard parametric models and various mathematical programming models, including the MSD (minimize sum of deviations) model, the MIP (mixed integer programming) model and the hybrid linear programming model.
Date: May 1994
Creator: Wanarat, Pradit
Partner: UNT Libraries

The Fixed v. Variable Sampling Interval Shewhart X-Bar Control Chart in the Presence of Positively Autocorrelated Data

Description: This study uses simulation to examine differences between fixed sampling interval (FSI) and variable sampling interval (VSI) Shewhart X-bar control charts for processes that produce positively autocorrelated data. The influence of sample size (1 and 5), autocorrelation parameter, shift in process mean, and length of time between samples is investigated by comparing average time (ATS) and average number of samples (ANSS) to produce an out of control signal for FSI and VSI Shewhart X-bar charts. These comparisons are conducted in two ways: control chart limits pre-set at ±3σ_x / √n and limits computed from the sampling process. Proper interpretation of the Shewhart X-bar chart requires the assumption that observations are statistically independent; however, process data are often autocorrelated over time. Results of this study indicate that increasing the time between samples decreases the effect of positive autocorrelation between samples. Thus, with sufficient time between samples the assumption of independence is essentially not violated. Samples of size 5 produce a faster signal than samples of size 1 with both the FSI and VSI Shewhart X-bar chart when positive autocorrelation is present. However, samples of size 5 require the same time when the data are independent, indicating that this effect is a result of autocorrelation. This research determined that the VSI Shewhart X-bar chart signals increasingly faster than the corresponding FSI chart as the shift in the process mean increases. If the process is likely to exhibit a large shift in the mean, then the VSI technique is recommended. But the faster signaling time of the VSI chart is undesirable when the process is operating on target. However, if the control limits are estimated from process samples, results show that when the process is in control the ARL for the FSI and the ANSS for the VSI are approximately the same, and ...
Date: May 1993
Creator: Harvey, Martha M. (Martha Mattern)
Partner: UNT Libraries
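
The FSI versus VSI comparison described above can be sketched as a short simulation. Everything specific here is an assumption made for illustration and is not taken from the dissertation: samples of size 1, a standardized AR(1) process with parameter `phi`, control limits at ±3, a warning limit of 0.6745, a short interval of 0.1 and a long interval of 1.9 (chosen so the in-control average interval is approximately the fixed interval of 1.0), and a first sample taken after one fixed interval.

```python
import math
import random


def time_to_signal(rng, shift, phi, d_short=0.1, d_long=1.9,
                   warn=0.6745, limit=3.0, d_fixed=1.0):
    """Simulate one run; return (FSI time, VSI time, samples) until a signal."""
    z = rng.gauss(0.0, 1.0)            # standardized AR(1) state, variance 1
    fsi_time = vsi_time = d_fixed      # assumption: first sample after d_fixed
    n_samples = 1
    while abs(z + shift) <= limit:     # no signal yet
        # VSI rule: wait longer after a point near the center line.
        vsi_time += d_long if abs(z + shift) <= warn else d_short
        fsi_time += d_fixed
        z = phi * z + math.sqrt(1.0 - phi * phi) * rng.gauss(0.0, 1.0)
        n_samples += 1
    return fsi_time, vsi_time, n_samples


def compare(shift, phi=0.0, reps=2000, seed=1):
    """Average time to signal for FSI and VSI, and average number of samples."""
    rng = random.Random(seed)
    runs = [time_to_signal(rng, shift, phi) for _ in range(reps)]
    ats_fsi = sum(r[0] for r in runs) / reps
    ats_vsi = sum(r[1] for r in runs) / reps
    anss = sum(r[2] for r in runs) / reps
    return ats_fsi, ats_vsi, anss


for phi in (0.0, 0.5):
    ats_fsi, ats_vsi, anss = compare(shift=1.0, phi=phi)
    print(f"phi={phi}: ATS(FSI)={ats_fsi:.1f}, ATS(VSI)={ats_vsi:.1f}, "
          f"ANSS={anss:.1f}")
```

Both charts see the same simulated observations, so they use the same number of samples in each run and differ only in elapsed time.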

A Heuristic Procedure for Specifying Parameters in Neural Network Models for Shewhart X-bar Control Chart Applications

Description: This study develops a heuristic procedure for specifying parameters for a neural network configuration (learning rate, momentum, and the number of neurons in a single hidden layer) in Shewhart X-bar control chart applications. Also, this study examines the replicability of the neural network solution when the neural network is retrained several times with different initial weights.
Date: December 1993
Creator: Nam, Kyungdoo T.
Partner: UNT Libraries

Mathematical Programming Approaches to the Three-Group Classification Problem

Description: In the last twelve years there has been considerable research interest in mathematical programming approaches to the statistical classification problem, primarily because they are not based on the assumptions of the parametric methods (Fisher's linear discriminant function, Smith's quadratic discriminant function) for optimality. This dissertation focuses on the development of mathematical programming models for the three-group classification problem and examines the computational efficiency and classificatory performance of proposed and existing models. The classificatory performance of these models is compared with that of Fisher's linear discriminant function and Smith's quadratic discriminant function. Additionally, this dissertation investigates theoretical characteristics of mathematical programming models for the classification problem with three or more groups. A computationally efficient model for the three-group classification problem is developed. This model minimizes directly the number of misclassifications in the training sample. Furthermore, the classificatory performance of the proposed model is enhanced by the introduction of a two-phase algorithm. The same algorithm can be used to improve the classificatory performance of any interval-based mathematical programming model for the classification problem with three or more groups. A modification to improve the computational efficiency of an existing model is also proposed. In addition, a multiple-group extension of a mathematical programming model for the two-group classification problem is introduced. A simulation study on classificatory performance reveals that the proposed models yield lower misclassification rates than Fisher's linear discriminant function and Smith's quadratic discriminant function under certain data configurations. Data configurations where the parametric methods outperform the proposed models are also identified.
A number of theoretical characteristics of mathematical programming models for the classification problem are identified. These include conditions for the existence of feasible solutions, as well as conditions for the avoidance of degenerate solutions. Additionally, conditions are identified that guarantee the classificatory non-inferiority of one model over another in the training ...
Date: August 1993
Creator: Loucopoulos, Constantine
Partner: UNT Libraries

Robustness of Parametric and Nonparametric Tests When Distances between Points Change on an Ordinal Measurement Scale

Description: The purpose of this research was to evaluate the effect on parametric and nonparametric tests using ordinal data when the distances between points changed on the measurement scale. The research examined the Type I and Type II error rates of selected parametric and nonparametric tests.
Date: August 1994
Creator: Chen, Andrew H. (Andrew Hwa-Fen)
Partner: UNT Libraries
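
A small example may help show why the spacing of scale points matters for the entry above. This sketch uses made-up ordinal responses and a hypothetical `rescore` mapping that stretches the upper end of a five-point scale. It compares single test statistics only, not the error rates examined in the research. A rank-based statistic such as the Mann-Whitney U depends only on the ordering of the responses, so it is unchanged by the rescoring, while the pooled t statistic is not.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def mann_whitney_u(a, b):
    """U statistic for group a: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


# Hypothetical responses on a 1-5 ordinal scale.
group_a = [1, 2, 2, 3, 4]
group_b = [2, 3, 4, 5, 5]

# Hypothetical rescoring: same order, unequal distances between points.
rescore = {1: 1, 2: 2, 3: 3, 4: 6, 5: 10}
stretched_a = [rescore[v] for v in group_a]
stretched_b = [rescore[v] for v in group_b]

t_equal = pooled_t(group_a, group_b)
t_stretched = pooled_t(stretched_a, stretched_b)
u_equal = mann_whitney_u(group_a, group_b)
u_stretched = mann_whitney_u(stretched_a, stretched_b)

print(f"t: {t_equal:.4f} (equal spacing) vs {t_stretched:.4f} (stretched)")
print(f"U: {u_equal} (equal spacing) vs {u_stretched} (stretched)")
```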

Robustness of the One-Sample Kolmogorov Test to Sampling from a Finite Discrete Population

Description: One of the most useful and best-known goodness-of-fit tests is the Kolmogorov one-sample test. The assumptions for the Kolmogorov one-sample test are: 1. A random sample; 2. A continuous random variable; 3. F(x) is a completely specified hypothesized cumulative distribution function. The Kolmogorov one-sample test has a wide range of applications. Knowing the effect of using the test when an assumption is not met is of practical importance. The purpose of this research is to analyze the robustness of the Kolmogorov one-sample test to sampling from a finite discrete distribution. The standard tables for the Kolmogorov test are derived based on sampling from a theoretical continuous distribution. As such, the theoretical distribution is infinite. The standard tables do not include a method or adjustment factor to estimate the effect on table values for statistical experiments where the sample stems from a finite discrete distribution without replacement. This research provides an extension of the Kolmogorov test when the hypothesized distribution function is finite and discrete, and the sampling distribution is based on sampling without replacement. An investigative study has been conducted to explore possible tendencies and relationships in the distribution of Dn when sampling with and without replacement for various parameter settings. In all, 96 sampling distributions were derived. Results show the standard Kolmogorov table values are conservative, particularly when the sample sizes are small or the sample represents 10% or more of the population.
Date: December 1996
Creator: Tucker, Joanne M. (Joanne Morris)
Partner: UNT Libraries
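
The statistic Dn discussed above is the largest absolute gap between the empirical distribution function of the sample and the hypothesized distribution function. The sketch below is a minimal illustration under stated assumptions, not the derivation used in the dissertation: the population is a hypothetical list of the integers 1 through 10, the hypothesized F(x) is that population's own distribution function, and the sampling distributions are approximated by simulation rather than derived exactly. The function names are also hypothetical.

```python
import random


def kolmogorov_d(sample, population):
    """Largest |F_n(x) - F(x)| over the distinct population values.

    Both step functions jump only at population values, so checking
    those points is enough when the sample is drawn from the population.
    """
    n, pop_size = len(sample), len(population)
    d = 0.0
    for x in sorted(set(population)):
        f_n = sum(1 for s in sample if s <= x) / n
        f_0 = sum(1 for p in population if p <= x) / pop_size
        d = max(d, abs(f_n - f_0))
    return d


def mean_d(population, n, replace, reps, rng):
    """Simulated mean of Dn when sampling with or without replacement."""
    total = 0.0
    for _ in range(reps):
        if replace:
            sample = rng.choices(population, k=n)
        else:
            sample = rng.sample(population, n)
        total += kolmogorov_d(sample, population)
    return total / reps


population = list(range(1, 11))   # hypothetical finite discrete population
rng = random.Random(0)
with_repl = mean_d(population, 5, True, 3000, rng)
without_repl = mean_d(population, 5, False, 3000, rng)
print(f"mean Dn with replacement:    {with_repl:.3f}")
print(f"mean Dn without replacement: {without_repl:.3f}")
```

Here the sample is half of the population, so sampling without replacement keeps the empirical distribution function closer to F(x) and Dn tends to be smaller.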

A Simulation Study Comparing Various Confidence Intervals for the Mean of Voucher Populations in Accounting

Description: This research examined the performance of three parametric methods for confidence intervals: the classical, the Bonferroni, and the bootstrap-t method, as applied to estimating the mean of voucher populations in accounting. Usually auditing populations do not follow standard models. The population for accounting audits generally is a nonstandard mixture distribution in which the audit data set contains a large number of zero values and a comparatively small number of nonzero errors. This study assumed a situation in which only overstatement errors exist. The nonzero errors were assumed to be normally, exponentially, and uniformly distributed. Five indicators of performance were used. The classical method was found to be unreliable. The Bonferroni method was conservative for all population conditions. The bootstrap-t method was excellent in terms of reliability, but the lower limit of the confidence intervals produced by this method was unstable for all population conditions. The classical method provided the shortest average width of the confidence intervals among the three methods. This study provided initial evidence as to how the parametric bootstrap-t method performs when applied to the nonstandard distribution of audit populations of line items. Further research should provide a reliable confidence interval for a wider variety of accounting populations.
Date: December 1992
Creator: Lee, Ihn Shik
Partner: UNT Libraries