## You limited your search to:

**Partner:**UNT Libraries

**Degree Discipline:**Educational Research

**Degree Level:**Doctoral

**Collection:**UNT Theses and Dissertations

### Ability Estimation Under Different Item Parameterization and Scoring Models

**Date:**May 2002

**Creator:**Si, Ching-Fung B.

**Description:**A Monte Carlo simulation study investigated the effect of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation given various IRT models. Item response data on 30 items from 1,000 examinees was simulated using known item parameters and ability estimates. The item response data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterization to estimate examinee ability. The accuracy of the ability estimation for a given IRT model was assessed by the recovery rate and the root mean square errors. The results indicated that polytomous models produced more accurate ability estimates than the dichotomous models, under all combinations of research conditions, as indicated by higher recovery rates and lower root mean square errors. For the item parameterization models, the one-parameter model out-performed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model had more accurate ability estimation than the other three polytomous models. The nominal categories model performed better than the general partial credit model and the multiple-choice model with the multiple-choice model the least accurate. The results further indicated that certain prior ability distributions had an effect on the accuracy ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc3116/

### The Analysis of the Accumulation of Type II Error in Multiple Comparisons for Specified Levels of Power to Violation of Normality with the Dunn-Bonferroni Procedure: a Monte Carlo Study

**Date:**August 1989

**Creator:**Powers-Prather, Bonnie Ann

**Description:**The study seeks to determine the degree of accumulation of Type II error rates, while violating the assumptions of normality, for different specified levels of power among sample means. The study employs a Monte Carlo simulation procedure with three different specified levels of power, methodologies, and population distributions. On the basis of the comparisons of actual and observed error rates, the following conclusions appear to be appropriate. 1. Under the strict criteria for evaluation of the hypotheses, Type II experimentwise error does accumulate at a rate that the probability of accepting at least one null hypothesis in a family of tests, when in theory all of the alternate hypotheses are true, is high, precluding valid tests at the beginning of the study. 2. The Dunn-Bonferroni procedure of setting the critical value based on the beta value per contrast did not significantly reduce the probability of committing a Type II error in a family of tests. 3. The use of an adequate sample size and orthogonal contrasts, or limiting the number of pairwise comparisons to the number of means, is the best method to control for the accumulation of Type II errors. 4. The accumulation of Type II error is irrespective ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc331385/

### Attenuation of the Squared Canonical Correlation Coefficient Under Varying Estimates of Score Reliability

**Date:**August 2010

**Creator:**Wilson, Celia M.

**Description:**Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability. Monte Carlo simulation methodology was used to fulfill the purpose of this study. Initially, data populations with various manipulated conditions were generated (N = 100,000). Subsequently, 500 random samples were drawn with replacement from each population, and data was subjected to canonical correlation analyses. The canonical correlation results were then analyzed using descriptive statistics and an ANOVA design to determine under which condition(s) the squared canonical correlation coefficient was most attenuated when compared to population Rc2 values. This information was analyzed and used to determine what effect, if any, the different conditions considered in this study had on Rc2. The results from this Monte Carlo investigation clearly illustrated the importance of score reliability when interpreting study results. As evidenced by the outcomes presented, the more measurement error (lower reliability) present in the variables included in an analysis, the more attenuation experienced by the effect size(s) produced in the analysis, in this ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc30528/

### Bias and Precision of the Squared Canonical Correlation Coefficient under Nonnormal Data Conditions

**Date:**August 2006

**Creator:**Leach, Lesley Ann Freeny

**Description:**This dissertation: (a) investigated the degree to which the squared canonical correlation coefficient is biased in multivariate nonnormal distributions and (b) identified formulae that adjust the squared canonical correlation coefficient (Rc2) such that it most closely approximates the true population effect under normal and nonnormal data conditions. Five conditions were manipulated in a fully-crossed design to determine the degree of bias associated with Rc2: distribution shape, variable sets, sample size to variable ratios, and within- and between-set correlations. Very few of the condition combinations produced acceptable amounts of bias in Rc2, but those that did were all found with first function results. The sample size to variable ratio (n:v)was determined to have the greatest impact on the bias associated with the Rc2 for the first, second, and third functions. The variable set condition also affected the accuracy of Rc2, but for the second and third functions only. The kurtosis levels of the marginal distributions (b2), and the between- and within-set correlations demonstrated little or no impact on the bias associated with Rc2. Therefore, it is recommended that researchers use n:v ratios of at least 10:1 in canonical analyses, although greater n:v ratios have the potential to produce even less bias. ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc5361/

### The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices

**Date:**May 1988

**Creator:**Dutschke, Cynthia F. (Cynthia Fleming)

**Description:**Educators who use criterion-referenced measurement to ascertain the current level of performance of an examinee in order that the examinee may be classified as either a master or a nonmaster need to know the accuracy and consistency of their decisions regarding assignment of mastery states. This study examined the sampling distribution characteristics of two reliability indices that use the squared-error agreement function: Livingston's k^2(X,Tx) and Brennan and Kane's M(C). The sampling distribution characteristics of five indices that use the threshold agreement function were also examined: Subkoviak's Pc. Huynh's p and k. and Swaminathan's p and k. These seven methods of calculating reliability were also compared under varying conditions of sample size, test length, and criterion or cutoff score. Computer-generated data provided randomly parallel test forms for N = 2000 cases. From this, 1000 samples were drawn, with replacement, and each of the seven reliability indices was calculated. Descriptive statistics were collected for each sample set and examined for distribution characteristics. In addition, the mean value for each index was compared to the population parameter value of consistent mastery/nonmastery classifications. The results indicated that the sampling distribution characteristics of all seven reliability indices approach normal characteristics with increased sample size. The ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc331415/

### A Comparison of IRT and Rasch Procedures in a Mixed-Item Format Test

**Date:**August 2003

**Creator:**Kinsey, Tari L.

**Description:**This study investigated the effects of test length (10, 20 and 30 items), scoring schema (proportion of dichotomous ad polytomous scoring) and item analysis model (IRT and Rasch) on the ability estimates, test information levels and optimization criteria of mixed item format tests. Polytomous item responses to 30 items for 1000 examinees were simulated using the generalized partial-credit model and SAS software. Portions of the data were re-coded dichotomously over 11 structured proportions to create 33 sets of test responses including mixed item format tests. MULTILOG software was used to calculate the examinee ability estimates, standard errors, item and test information, reliability and fit indices. A comparison of IRT and Rasch item analysis procedures was made using SPSS software across ability estimates and standard errors of ability estimates using a 3 x 11 x 2 fixed factorial ANOVA. Effect sizes and power were reported for each procedure. Scheffe post hoc procedures were conducted on significant factos. Test information was analyzed and compared across the range of ability levels for all 66-design combinations. The results indicated that both test length and the proportion of items scored polytomously had a significant impact on the amount of test information produced by mixed item ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc4316/

### Comparison of Methods for Computation and Cumulation of Effect Sizes in Meta-Analysis

**Date:**December 1987

**Creator:**Ronco, Sharron L. (Sharron Lee)

**Description:**This study examined the statistical consequences of employing various methods of computing and cumulating effect sizes in meta-analysis. Six methods of computing effect size, and three techniques for combining study outcomes, were compared. Effect size metrics were calculated with one-group and pooled standardizing denominators, corrected for bias and for unreliability of measurement, and weighted by sample size and by sample variance. Cumulating techniques employed as units of analysis the effect size, the study, and an average study effect. In order to determine whether outcomes might vary with the size of the meta-analysis, mean effect sizes were also compared for two smaller subsets of studies. An existing meta-analysis of 60 studies examining the effectiveness of computer-based instruction was used as a data base for this investigation. Recomputation of the original study data under the six different effect size formulas showed no significant difference among the metrics. Maintaining the independence of the data by using only one effect size per study, whether a single or averaged effect, produced a higher mean effect size than averaging all effect sizes together, although the difference did not reach statistical significance. The sampling distribution of effect size means approached that of the population of 60 studies ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc331655/

### A Comparison of Some Continuity Corrections for the Chi-Squared Test in 3 x 3, 3 x 4, and 3 x 5 Tables

**Date:**May 1987

**Creator:**Mullen, Jerry D. (Jerry Davis)

**Description:**This study was designed to determine whether chis-quared based tests for independence give reliable estimates (as compared to the exact values provided by Fisher's exact probabilities test) of the probability of a relationship between the variables in 3 X 3, 3 X 4 , and 3 X 5 contingency tables when the sample size is 10, 20, or 30. In addition to the classical (uncorrected) chi-squared test, four methods for continuity correction were compared to Fisher's exact probabilities test. The four methods were Yates' correction, two corrections attributed to Cochran, and Mantel's correction. The study was modeled after a similar comparison conducted on 2 X 2 contingency tables and published by Michael Haber.

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc331001/

### A comparison of the Effects of Different Sizes of Ceiling Rules on the Estimates of Reliability of a Mathematics Achievement Test

**Date:**May 1987

**Creator:**Somboon Suriyawongse

**Description:**This study compared the estimates of reliability made using one, two, three, four, five, and unlimited consecutive failures as ceiling rules in scoring a mathematics achievement test which is part of the Iowa Tests of Basic Skill (ITBS), Form 8. There were 700 students randomly selected from a population (N=2640) of students enrolled in the eight grades in a large urban school district in the southwestern United States. These 700 students were randomly divided into seven subgroups so that each subgroup had 100 students. The responses of all those students to three subtests of the mathematics achievement battery, which included mathematical concepts (44 items), problem solving (32 items), and computation (45 items), were analyzed to obtain the item difficulties and a total score for each student. The items in each subtest then were rearranged based on the item difficulties from the highest to the lowest value. In each subgroup, the method using one, two, three, four, five, and unlimited consecutive failures as the ceiling rules were applied to score the individual responses. The total score for each individual was the sum of the correct responses prior to the point described by the ceiling rule. The correct responses after the ceiling ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc331183/

### A Comparison of Three Criteria Employed in the Selection of Regression Models Using Simulated and Real Data

**Date:**December 1994

**Creator:**Graham, D. Scott

**Description:**Researchers who make predictions from educational data are interested in choosing the best regression model possible. Many criteria have been devised for choosing a full or restricted model, and also for selecting the best subset from an all-possible-subsets regression. The relative practical usefulness of three of the criteria used in selecting a regression model was compared in this study: (a) Mallows' C_p, (b) Amemiya's prediction criterion, and (c) Hagerty and Srinivasan's method involving predictive power. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of effect sizes. The amount of power for each matrix was calculated after one or two predictors was dropped from the full regression model, for sample sizes ranging from n = 25 to n = 150. Also, the null case, when one predictor was uncorrelated with the other predictors, was considered. In addition, comparisons for regression models selected using C_p and prediction criterion were performed using data from the National Educational Longitudinal Study of 1988.

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc278764/