## You limited your search to:

**Partner:**UNT Libraries

**Department:**Department of Technology and Cognition

**Degree Discipline:**Educational Research

### Establishing the utility of a classroom effectiveness index as a teacher accountability system.

**Date:**May 2002

**Creator:**Bembry, Karen L.

**Description:**How to identify effective teachers who improve student achievement despite diverse student populations and school contexts is an ongoing discussion in public education. The need to show communities and parents how well teachers and schools improve student learning has led districts and states to seek a fair, equitable and valid measure of student growth using student achievement. This study investigated a two stage hierarchical model for estimating teacher effect on student achievement. This measure was entitled a Classroom Effectiveness Index (CEI). Consistency of this model over time, outlier influences in individual CEIs, variance among CEIs across four years, and correlations of second stage student residuals with first stage student residuals were analyzed. The statistical analysis used four years of student residual data from a state-mandated mathematics assessment (n=7086) and a state-mandated reading assessment (n=7572) aggregated by teacher. The study identified the following results. Four years of district grand slopes and grand intercepts were analyzed to show consistent results over time. Repeated measures analyses of grand slopes and intercepts in mathematics were statistically significant at the .01 level. Repeated measures analyses of grand slopes and intercepts in reading were not statistically significant. The analyses indicated consistent results over time for reading ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc3161/

### Ability Estimation Under Different Item Parameterization and Scoring Models

**Date:**May 2002

**Creator:**Si, Ching-Fung B.

**Description:**A Monte Carlo simulation study investigated the effect of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation given various IRT models. Item response data on 30 items from 1,000 examinees was simulated using known item parameters and ability estimates. The item response data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterization to estimate examinee ability. The accuracy of the ability estimation for a given IRT model was assessed by the recovery rate and the root mean square errors. The results indicated that polytomous models produced more accurate ability estimates than the dichotomous models, under all combinations of research conditions, as indicated by higher recovery rates and lower root mean square errors. For the item parameterization models, the one-parameter model out-performed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model had more accurate ability estimation than the other three polytomous models. The nominal categories model performed better than the general partial credit model and the multiple-choice model with the multiple-choice model the least accurate. The results further indicated that certain prior ability distributions had an effect on the accuracy ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc3116/

### Measurement Disturbance Effects on Rasch Fit Statistics and the Logit Residual Index

**Date:**August 1997

**Creator:**Mount, Robert E. (Robert Earl)

**Description:**The effects of random guessing as a measurement disturbance on Rasch fit statistics (unweighted total, weighted total, and unweighted ability between) and the Logit Residual Index (LRI) were examined through simulated data sets of varying sample sizes, test lengths, and distribution types. Three test lengths (25, 50, and 100), three sample sizes (25, 50, and 100), two item difficulty distributions (normal and uniform), and three levels of guessing (no guessing (0%), 25%, and 50%) were used in the simulations, resulting in 54 experimental conditions. The mean logit person ability for each experiment was +1. Each experimental condition was simulated once in an effort to approximate what could happen on the single administration of a four option per item multiple choice test to a group of relatively high ability persons. Previous research has shown that varying item and person parameters have no effect on Rasch fit statistics. Consequently, these parameters were used in the present study to establish realistic test conditions, but were not interpreted as effect factors in determining the results of this study.

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc279376/

### A comparison of traditional and IRT factor analysis.

**Date:**December 2004

**Creator:**Kay, Cheryl Ann

**Description:**This study investigated the item parameter recovery of two methods of factor analysis. The methods researched were a traditional factor analysis of tetrachoric correlation coefficients and an IRT approach to factor analysis which utilizes marginal maximum likelihood estimation using an EM algorithm (MMLE-EM). Dichotomous item response data was generated under the 2-parameter normal ogive model (2PNOM) using PARDSIM software. Examinee abilities were sampled from both the standard normal and uniform distributions. True item discrimination, a, was normal with a mean of .75 and a standard deviation of .10. True b, item difficulty, was specified as uniform [-2, 2]. The two distributions of abilities were completely crossed with three test lengths (n= 30, 60, and 100) and three sample sizes (N = 50, 500, and 1000). Each of the 18 conditions was replicated 5 times, resulting in 90 datasets. PRELIS software was used to conduct a traditional factor analysis on the tetrachoric correlations. The IRT approach to factor analysis was conducted using BILOG 3 software. Parameter recovery was evaluated in terms of root mean square error, average signed bias, and Pearson correlations between estimated and true item parameters. ANOVAs were conducted to identify systematic differences in error indices. Based on many ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc4695/

### Bias and Precision of the Squared Canonical Correlation Coefficient under Nonnormal Data Conditions

**Date:**August 2006

**Creator:**Leach, Lesley Ann Freeny

**Description:**This dissertation: (a) investigated the degree to which the squared canonical correlation coefficient is biased in multivariate nonnormal distributions and (b) identified formulae that adjust the squared canonical correlation coefficient (Rc2) such that it most closely approximates the true population effect under normal and nonnormal data conditions. Five conditions were manipulated in a fully-crossed design to determine the degree of bias associated with Rc2: distribution shape, variable sets, sample size to variable ratios, and within- and between-set correlations. Very few of the condition combinations produced acceptable amounts of bias in Rc2, but those that did were all found with first function results. The sample size to variable ratio (n:v)was determined to have the greatest impact on the bias associated with the Rc2 for the first, second, and third functions. The variable set condition also affected the accuracy of Rc2, but for the second and third functions only. The kurtosis levels of the marginal distributions (b2), and the between- and within-set correlations demonstrated little or no impact on the bias associated with Rc2. Therefore, it is recommended that researchers use n:v ratios of at least 10:1 in canonical analyses, although greater n:v ratios have the potential to produce even less bias. ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc5361/

### A Comparison of Three Criteria Employed in the Selection of Regression Models Using Simulated and Real Data

**Date:**December 1994

**Creator:**Graham, D. Scott

**Description:**Researchers who make predictions from educational data are interested in choosing the best regression model possible. Many criteria have been devised for choosing a full or restricted model, and also for selecting the best subset from an all-possible-subsets regression. The relative practical usefulness of three of the criteria used in selecting a regression model was compared in this study: (a) Mallows' C_p, (b) Amemiya's prediction criterion, and (c) Hagerty and Srinivasan's method involving predictive power. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of effect sizes. The amount of power for each matrix was calculated after one or two predictors was dropped from the full regression model, for sample sizes ranging from n = 25 to n = 150. Also, the null case, when one predictor was uncorrelated with the other predictors, was considered. In addition, comparisons for regression models selected using C_p and prediction criterion were performed using data from the National Educational Longitudinal Study of 1988.

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc278764/

### A Comparison of Two Differential Item Functioning Detection Methods: Logistic Regression and an Analysis of Variance Approach Using Rasch Estimation

**Date:**August 1995

**Creator:**Whitmore, Marjorie Lee Threet

**Description:**Differential item functioning (DIF) detection rates were examined for the logistic regression and analysis of variance (ANOVA) DIF detection methods. The methods were applied to simulated data sets of varying test length (20, 40, and 60 items) and sample size (200, 400, and 600 examinees) for both equal and unequal underlying ability between groups as well as for both fixed and varying item discrimination parameters. Each test contained 5% uniform DIF items, 5% non-uniform DIF items, and 5% combination DIF (simultaneous uniform and non-uniform DIF) items. The factors were completely crossed, and each experiment was replicated 100 times. For both methods and all DIF types, a test length of 20 was sufficient for satisfactory DIF detection. The detection rate increased significantly with sample size for each method. With the ANOVA DIF method and uniform DIF, there was a difference in detection rates between discrimination parameter types, which favored varying discrimination and decreased with increased sample size. The detection rate of non-uniform DIF using the ANOVA DIF method was higher with fixed discrimination parameters than with varying discrimination parameters when relative underlying ability was unequal. In the combination DIF case, there was a three-way interaction among the experimental factors discrimination type, ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc278366/

### An Empirical Comparison of Random Number Generators: Period, Structure, Correlation, Density, and Efficiency

**Date:**August 1995

**Creator:**Bang, Jung Woong

**Description:**Random number generators (RNGs) are widely used in conducting Monte Carlo simulation studies, which are important in the field of statistics for comparing power, mean differences, or distribution shapes between statistical approaches. Statistical results, however, may differ when different random number generators are used. Often older methods have been blindly used with no understanding of their limitations. Many random functions supplied with computers today have been found to be comparatively unsatisfactory. In this study, five multiplicative linear congruential generators (MLCGs) were chosen which are provided in the following statistical packages: RANDU (IBM), RNUN (IMSL), RANUNI (SAS), UNIFORM(SPSS), and RANDOM (BMDP). Using a personal computer (PC), an empirical investigation was performed using five criteria: period length before repeating random numbers, distribution shape, correlation between adjacent numbers, density of distributions and normal approach of random number generator (RNG) in a normal function. All RNG FORTRAN programs were rewritten into Pascal which is more efficient language for the PC. Sets of random numbers were generated using different starting values. A good RNG should have the following properties: a long enough period; a well-structured pattern in distribution; independence between random number sequences; random and uniform distribution; and a good normal approach in the normal ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc277807/

### The Effect of Psychometric Parallelism among Predictors on the Efficiency of Equal Weights and Least Squares Weights in Multiple Regression

**Date:**May 1996

**Creator:**Zhang, Desheng

**Description:**There are several conditions for applying equal weights as an alternative to least squares weights. Psychometric parallelism, one of the conditions, has been suggested as a necessary and sufficient condition for equal-weights aggregation. The purpose of this study is to investigate the effect of psychometric parallelism among predictors on the efficiency of equal weights and least squares weights. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of psychometric parallelism. Five hundred samples with six ratios of observation to predictor = 5/1, 10/1, 20/1, 30/1, 40/1, and 50/1 were drawn from each population. The efficiency is interpreted as the accuracy and the predictive power estimated by the weighting methods. The accuracy is defined by the deviation between the population R² and the sample R² . The predictive power is referred to as the population cross-validated R² and the population mean square error of prediction. The findings indicate there is no statistically significant relationship between the level of psychometric parallelism and the accuracy of least squares weights. In contrast, the correlation between the level of psychometric parallelism and the accuracy of equal weights is significantly negative. Under different conditions, the minimum p value of χ² ...

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc278996/

### The Generalization of the Logistic Discriminant Function Analysis and Mantel Score Test Procedures to Detection of Differential Testlet Functioning

**Date:**August 1994

**Creator:**Kinard, Mary E.

**Description:**Two procedures for detection of differential item functioning (DIF) for polytomous items were generalized to detection of differential testlet functioning (DTLF). The methods compared were the logistic discriminant function analysis procedure for uniform and non-uniform DTLF (LDFA-U and LDFA-N), and the Mantel score test procedure. Further analysis included comparison of results of DTLF analysis using the Mantel procedure with DIF analysis of individual testlet items using the Mantel-Haenszel (MH) procedure. Over 600 chi-squares were analyzed and compared for rejection of null hypotheses. Samples of 500, 1,000, and 2,000 were drawn by gender subgroups from the NELS:88 data set, which contains demographic and test data from over 25,000 eighth graders. Three types of testlets (totalling 29) from the NELS:88 test were analyzed for DTLF. The first type, the common passage testlet, followed the conventional testlet definition: items grouped together by a common reading passage, figure, or graph. The other two types were based upon common content and common process. as outlined in the NELS test specification.

**Contributing Partner:**UNT Libraries

**Permallink:**digital.library.unt.edu/ark:/67531/metadc279043/