UNT Theses and Dissertations - 42 Matching Results

Search Results

Establishing the utility of a classroom effectiveness index as a teacher accountability system.

Description: How to identify effective teachers who improve student achievement despite diverse student populations and school contexts is an ongoing discussion in public education. The need to show communities and parents how well teachers and schools improve student learning has led districts and states to seek a fair, equitable and valid measure of student growth using student achievement. This study investigated a two stage hierarchical model for estimating teacher effect on student achievement. This measure was entitled a Classroom Effectiveness Index (CEI). Consistency of this model over time, outlier influences in individual CEIs, variance among CEIs across four years, and correlations of second stage student residuals with first stage student residuals were analyzed. The statistical analysis used four years of student residual data from a state-mandated mathematics assessment (n=7086) and a state-mandated reading assessment (n=7572) aggregated by teacher. The study identified the following results. Four years of district grand slopes and grand intercepts were analyzed to show consistent results over time. Repeated measures analyses of grand slopes and intercepts in mathematics were statistically significant at the .01 level. Repeated measures analyses of grand slopes and intercepts in reading were not statistically significant. The analyses indicated consistent results over time for reading but not for mathematics. Data were analyzed to assess outlier effects. Nineteen statistically significant outliers in 15,378 student residuals were identified. However, the impact on individual teachers was extreme in eight of the 19 cases. Further study is indicated. Subsets of teachers in the same assignment at the same school for four consecutive years and for three consecutive years indicated CEIs were stable over time. There were no statistically significant differences in either mathematics or reading. 
Correlations between Level One student residuals and HLM residuals were statistically significant in reading and in mathematics. This implied that the second stage of ...
Date: May 2002
Creator: Bembry, Karen L.
Partner: UNT Libraries

Ability Estimation Under Different Item Parameterization and Scoring Models

Description: A Monte Carlo simulation study investigated the effect of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation given various IRT models. Item response data on 30 items from 1,000 examinees was simulated using known item parameters and ability estimates. The item response data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterization to estimate examinee ability. The accuracy of the ability estimation for a given IRT model was assessed by the recovery rate and the root mean square errors. The results indicated that polytomous models produced more accurate ability estimates than the dichotomous models, under all combinations of research conditions, as indicated by higher recovery rates and lower root mean square errors. For the item parameterization models, the one-parameter model out-performed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model had more accurate ability estimation than the other three polytomous models. The nominal categories model performed better than the general partial credit model and the multiple-choice model with the multiple-choice model the least accurate. The results further indicated that certain prior ability distributions had an effect on the accuracy of ability estimation; however, no clear order of accuracy among the four prior distribution groups was identified due to an interaction between prior ability distribution and threshold configuration. The recovery rate was lower when the test items had categories with unequal threshold distances, were close at one end of the ability/difficulty continuum, and were administered to a sample of examinees whose population ability distribution was skewed to the same end of the ability continuum.
Date: May 2002
Creator: Si, Ching-Fung B.
Partner: UNT Libraries

The Supply and Demand of Physician Assistants in the United States: A Trend Analysis

Description: The supply of non-physician clinicians (NPCs), such as physician assistants (PAs), could significantly influence demand requirements in medical workforce projections. This study predicts supply of and demand for PAs from 2006 to 2020. The PA supply model utilized the number of certified PAs, the educational capacity (at 10% and 25% expansion) with assumed attrition rates, and retirement assumptions. Gross domestic product (GDP) chained in 2000 dollars and US population were utilized in a transfer function trend analysis with the number of PAs as the dependent variable for the PA demand model. Historical analyses revealed strong correlations between GDP and US population with the number of PAs. The number of currently certified PAs represents approximately 75% of the projected demand. At 10% growth, the supply and demand equilibrium for PAs will be reached in 2012. A 25% increase in new entrants causes equilibrium to be met one year earlier. Robust application trends in PA education enrollment (2.2 applicants per seat for PAs is the same as for allopathic medical school applicants) support predicted increases. However, other implications for the PA educational institutions include recruitment and retention of qualified faculty, clinical site maintenance and diversity of matriculates. Further research on factors affecting the supply and demand for PAs is needed in the areas of retirement age rates, gender, and lifestyle influences. Specialization trends and visit intensity levels are potential variables.
Date: May 2007
Creator: Orcutt, Venetia L.
Partner: UNT Libraries

Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities

Description: The distributional properties of improvement-over-chance, I, effect sizes derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA) for the two-group univariate classification were examined. Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect for all the conditions examined. There were only a small number of conditions under which both accuracy and precision were acceptable. The results indicate that the decision of which method to choose is primarily determined by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a recommended method is more complex. In many cases, more than one method could be used appropriately.
Date: May 2003
Creator: Alexander, Erika D.
Partner: UNT Libraries

A Comparison of IRT and Rasch Procedures in a Mixed-Item Format Test

Description: This study investigated the effects of test length (10, 20 and 30 items), scoring schema (proportion of dichotomous and polytomous scoring) and item analysis model (IRT and Rasch) on the ability estimates, test information levels and optimization criteria of mixed item format tests. Polytomous item responses to 30 items for 1000 examinees were simulated using the generalized partial-credit model and SAS software. Portions of the data were re-coded dichotomously over 11 structured proportions to create 33 sets of test responses including mixed item format tests. MULTILOG software was used to calculate the examinee ability estimates, standard errors, item and test information, reliability and fit indices. A comparison of IRT and Rasch item analysis procedures was made using SPSS software across ability estimates and standard errors of ability estimates using a 3 x 11 x 2 fixed factorial ANOVA. Effect sizes and power were reported for each procedure. Scheffe post hoc procedures were conducted on significant factors. Test information was analyzed and compared across the range of ability levels for all 66 design combinations. The results indicated that both test length and the proportion of items scored polytomously had a significant impact on the amount of test information produced by mixed item format tests. Generally, tests with 100% of the items scored polytomously produced the highest overall information. This seemed to be especially true for examinees with lower ability estimates. Optimality comparisons were made between IRT and Rasch procedures based on standard error rates for the ability estimates, marginal reliabilities and fit indices (-2LL). The only significant differences reported involved the standard error rates for both the IRT and Rasch procedures. This result must be viewed in light of the fact that the effect size reported was negligible. Optimality was found to be highest when longer tests and higher proportions of polytomous ...
Date: August 2003
Creator: Kinsey, Tari L.
Partner: UNT Libraries

A comparison of traditional and IRT factor analysis.

Description: This study investigated the item parameter recovery of two methods of factor analysis. The methods researched were a traditional factor analysis of tetrachoric correlation coefficients and an IRT approach to factor analysis which utilizes marginal maximum likelihood estimation using an EM algorithm (MMLE-EM). Dichotomous item response data was generated under the 2-parameter normal ogive model (2PNOM) using PARDSIM software. Examinee abilities were sampled from both the standard normal and uniform distributions. True item discrimination, a, was normal with a mean of .75 and a standard deviation of .10. True b, item difficulty, was specified as uniform [-2, 2]. The two distributions of abilities were completely crossed with three test lengths (n= 30, 60, and 100) and three sample sizes (N = 50, 500, and 1000). Each of the 18 conditions was replicated 5 times, resulting in 90 datasets. PRELIS software was used to conduct a traditional factor analysis on the tetrachoric correlations. The IRT approach to factor analysis was conducted using BILOG 3 software. Parameter recovery was evaluated in terms of root mean square error, average signed bias, and Pearson correlations between estimated and true item parameters. ANOVAs were conducted to identify systematic differences in error indices. Based on many of the indices, it appears the IRT approach to factor analysis recovers item parameters better than the traditional approach studied. Future research should compare other methods of factor analysis to MMLE-EM under various non-normal distributions of abilities.
Date: December 2004
Creator: Kay, Cheryl Ann
Partner: UNT Libraries
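
The data-generation design described in the abstract above (2PNOM responses, normal discriminations, uniform difficulties) can be sketched in a few lines. This is only an illustration under stated assumptions, not the PARDSIM procedure used in the study: the function names, the seed handling, and the bounds of the uniform ability distribution (here -3 to 3) are hypothetical, since the abstract does not give them. The two error indices follow the usual definitions of root mean square error and average signed bias.

```python
import random
from statistics import NormalDist


def simulate_2pno(n_items, n_examinees, ability="normal", seed=0):
    """Generate dichotomous responses under a 2-parameter normal ogive model.

    P(correct) = Phi(a * (theta - b)), where Phi is the standard normal CDF.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    # True discriminations: normal with mean .75 and SD .10 (from the abstract).
    a = [rng.gauss(0.75, 0.10) for _ in range(n_items)]
    # True difficulties: uniform on [-2, 2] (from the abstract).
    b = [rng.uniform(-2.0, 2.0) for _ in range(n_items)]
    if ability == "normal":
        theta = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    else:
        # ASSUMPTION: the abstract does not state the uniform bounds.
        theta = [rng.uniform(-3.0, 3.0) for _ in range(n_examinees)]
    responses = [
        [1 if rng.random() < phi(a[j] * (t - b[j])) else 0 for j in range(n_items)]
        for t in theta
    ]
    return a, b, theta, responses


def rmse(estimated, true):
    """Root mean square error between estimated and true parameters."""
    n = len(true)
    return (sum((e - t) ** 2 for e, t in zip(estimated, true)) / n) ** 0.5


def signed_bias(estimated, true):
    """Average signed bias: mean of (estimate - true value)."""
    return sum(e - t for e, t in zip(estimated, true)) / len(true)
```

Calling `simulate_2pno(30, 500)` would produce one dataset for the 30-item, N = 500, normal-ability cell; the estimation step (PRELIS or BILOG 3 in the study) is outside the scope of this sketch.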

A Quantitative Modeling Approach to Examining High School, Pre-Admission, Program, Certification and Career Choice Variables in Undergraduate Teacher Preparation Programs

Description: The purpose of this study was to examine if there is an association between effective supervision and communication competence in divisions of student affairs at Christian higher education institutions. The investigation examined chief student affairs officers (CSAOs) and their direct reports at 45 institutions across the United States using the Synergistic Supervision Scale and the Communication Competence Questionnaire. A positive significant association was found between the direct report's evaluation of the CSAO's level of synergistic supervision and the direct report's evaluation of the CSAO's level of communication competence. The findings of this study will advance the supervision and communication competence literature while informing practice for student affairs professionals. This study provides a foundation of research in the context specific field of student affairs where there has been a dearth of literature regarding effective supervision. This study can be used as a platform for future research to further the understanding of characteristics that define effective supervision.
Date: December 2007
Creator: Williams, Cynthia Savage
Partner: UNT Libraries

Bias and Precision of the Squared Canonical Correlation Coefficient under Nonnormal Data Conditions

Description: This dissertation: (a) investigated the degree to which the squared canonical correlation coefficient is biased in multivariate nonnormal distributions and (b) identified formulae that adjust the squared canonical correlation coefficient (Rc2) such that it most closely approximates the true population effect under normal and nonnormal data conditions. Five conditions were manipulated in a fully-crossed design to determine the degree of bias associated with Rc2: distribution shape, variable sets, sample size to variable ratios, and within- and between-set correlations. Very few of the condition combinations produced acceptable amounts of bias in Rc2, but those that did were all found with first function results. The sample size to variable ratio (n:v) was determined to have the greatest impact on the bias associated with the Rc2 for the first, second, and third functions. The variable set condition also affected the accuracy of Rc2, but for the second and third functions only. The kurtosis levels of the marginal distributions (b2), and the between- and within-set correlations demonstrated little or no impact on the bias associated with Rc2. Therefore, it is recommended that researchers use n:v ratios of at least 10:1 in canonical analyses, although greater n:v ratios have the potential to produce even less bias. Furthermore, because it was determined that b2 did not impact the accuracy of Rc2, one can be somewhat confident that, with marginal distributions possessing homogeneous kurtosis levels ranging anywhere from -1 to 8, Rc2 will likely be as accurate as that resulting from a normal distribution. Because the majority of Rc2 estimates were extremely biased, it is recommended that all Rc2 effects, regardless of the function from which they result, be adjusted using an appropriate adjustment formula. If no rationale exists for the use of another formula, the Rozeboom-2 would likely be a safe choice given that it produced the greatest ...
Date: August 2006
Creator: Leach, Lesley Ann Freeny
Partner: UNT Libraries

Investigating the hypothesized factor structure of the Noel-Levitz Student Satisfaction Inventory: A study of the student satisfaction construct.

Description: College student satisfaction is a concept that has become more prevalent in higher education research journals. Little attention has been given to the psychometric properties of previous instrumentation, and few studies have investigated the structure of current satisfaction instrumentation. This dissertation: (a) investigated the tenability of the theoretical dimensional structure of the Noel-Levitz Student Satisfaction Inventory™ (SSI), (b) investigated an alternative factor structure using exploratory factor analyses (EFA), and (c) used multiple-group CFA procedures to determine whether an alternative SSI factor structure would be invariant for three demographic variables: gender (men/women), race/ethnicity (Caucasian/Other), and undergraduate classification level (lower level/upper level). For this study, there was little evidence for the multidimensional structure of the SSI. A single factor, termed General Satisfaction with College, was the lone unidimensional construct that emerged from the iterative CFA and EFA procedures. A revised 20-item model was developed, and a series of multigroup CFAs were used to detect measurement invariance for three variables: student gender, race/ethnicity, and class level. No measurement invariance was noted for the revised 20-item model. Results for the invariance tests indicated equivalence across the comparison groups for (a) the number of factors, (b) the pattern of indicator-factor loadings, (c) the factor loadings, and (d) the item error variances. Because little attention has been given to the psychometric properties of the satisfaction instrumentation, it is recommended that further research continue on the SSI and any additional instrumentation developed to measure student satisfaction. It is possible that invariance issues may explain a portion of the inconsistent findings noted in the review of literature.
Although measurement analyses are a time-consuming process, they are essential for understanding the psychometrics characterized by a set of scores obtained from a survey, or any other form of assessment instrument.
Date: December 2008
Creator: Odom, Leslie R.
Partner: UNT Libraries

Stratified item selection and exposure control in unidimensional adaptive testing in the presence of two-dimensional data.

Description: It is not uncommon to use unidimensional item response theory (IRT) models to estimate ability in multidimensional data. Therefore it is important to understand the implications of summarizing multiple dimensions of ability into a single parameter estimate, especially if effects are confounded when applied to computerized adaptive testing (CAT). Previous studies have investigated the effects of different IRT models and ability estimators by manipulating the relationships between item and person parameters. However, in all cases, the maximum information criterion was used as the item selection method. Because maximum information is heavily influenced by the item discrimination parameter, investigating a-stratified item selection methods is tenable. The current Monte Carlo study compared maximum information, a-stratification, and a-stratification with b blocking item selection methods, alone, as well as in combination with the Sympson-Hetter exposure control strategy. The six testing conditions were conditioned on three levels of interdimensional item difficulty correlations and four levels of interdimensional examinee ability correlations. Measures of fidelity, estimation bias, error, and item usage were used to evaluate the effectiveness of the methods. Results showed either stratified item selection strategy is warranted if the goal is to obtain precise estimates of ability when using unidimensional CAT in the presence of two-dimensional data. If the goal also includes limiting bias of the estimate, Sympson-Hetter exposure control should be included. Results also confirmed that Sympson-Hetter is effective in optimizing item pool usage. Given these results, existing unidimensional CAT implementations might consider employing a stratified item selection routine plus Sympson-Hetter exposure control, rather than recalibrate the item pool under a multidimensional model.
Date: August 2009
Creator: Kalinowski, Kevin E.
Partner: UNT Libraries

Determination of the Optimal Number of Strata for Bias Reduction in Propensity Score Matching.

Description: Previous research implementing stratification on the propensity score has generally relied on using five strata, based on prior theoretical groundwork and minimal empirical evidence as to the suitability of quintiles to adequately reduce bias in all cases and across all sample sizes. This study investigates bias reduction across varying numbers of strata and sample sizes via a large-scale simulation to determine the adequacy of quintiles for bias reduction under all conditions. Sample sizes ranged from 100 to 50,000 and strata from 3 to 20. Both the percentage of bias reduction and the standardized selection bias were examined. The results show that, while the particular covariates in the simulation met certain criteria with five strata, greater bias reduction could be achieved by increasing the number of strata, especially with larger sample sizes. Simulation code written in R is included.
Date: May 2010
Creator: Akers, Allen
Partner: UNT Libraries
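
The idea of stratifying on the propensity score and measuring percentage bias reduction, as summarized in the abstract above, can be illustrated with a small sketch. This is not the R simulation code included in the thesis: the function names are hypothetical, strata are formed as equal-sized groups of the sorted scores, strata lacking either treated or control cases are skipped (an assumption), and percent bias reduction is taken as 100 × (1 − |bias after| / |bias before|), a common definition.

```python
def mean(values):
    return sum(values) / len(values)


def stratified_bias(scores, treated, x, n_strata):
    """Size-weighted mean of within-stratum (treated - control) differences in x.

    Strata are equal-sized groups of cases sorted by propensity score.
    ASSUMPTION: strata with no treated or no control cases are skipped.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        x_treated = [x[i] for i in idx if treated[i]]
        x_control = [x[i] for i in idx if not treated[i]]
        if not x_treated or not x_control:
            continue
        total += len(idx) * (mean(x_treated) - mean(x_control))
        used += len(idx)
    return total / used if used else float("nan")


def percent_bias_reduction(bias_before, bias_after):
    """Percentage of the initial covariate bias removed by stratification."""
    return 100.0 * (1.0 - abs(bias_after) / abs(bias_before))


# Toy data (hypothetical): one covariate, with the propensity score rising in x.
x = [0, 1, 2, 3, 4, 5, 6, 7]
scores = [v / 10 for v in x]
treated = [0, 0, 0, 1, 0, 1, 1, 1]

raw = stratified_bias(scores, treated, x, 1)   # 3.5 (no stratification)
two = stratified_bias(scores, treated, x, 2)   # 2.0
four = stratified_bias(scores, treated, x, 4)  # 1.0
```

In this toy case the remaining bias shrinks as the number of strata grows, which mirrors the direction of the finding reported in the abstract; the toy data say nothing about the magnitudes observed in the study.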

Attenuation of the Squared Canonical Correlation Coefficient Under Varying Estimates of Score Reliability

Description: Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability. Monte Carlo simulation methodology was used to fulfill the purpose of this study. Initially, data populations with various manipulated conditions were generated (N = 100,000). Subsequently, 500 random samples were drawn with replacement from each population, and the data were subjected to canonical correlation analyses. The canonical correlation results were then analyzed using descriptive statistics and an ANOVA design to determine under which condition(s) the squared canonical correlation coefficient was most attenuated when compared to population Rc2 values. This information was analyzed and used to determine what effect, if any, the different conditions considered in this study had on Rc2. The results from this Monte Carlo investigation clearly illustrated the importance of score reliability when interpreting study results. As evidenced by the outcomes presented, the more measurement error (lower reliability) present in the variables included in an analysis, the more attenuation experienced by the effect size(s) produced in the analysis, in this case Rc2. These results also demonstrated the role that between- and within-set correlation, variable set size, and sample size played in the attenuation levels of the squared canonical correlation coefficient.
Date: August 2010
Creator: Wilson, Celia M.
Partner: UNT Libraries

A Hierarchical Regression Analysis of the Relationship Between Blog Reading, Online Political Activity, and Voting During the 2008 Presidential Campaign

Description: The advent of the Internet has increased access to information and impacted many aspects of life, including politics. The present study utilized Pew Internet & American Life survey data from the November 2008 presidential election time period to investigate the degree to which political blog reading predicted online political discussion, online political participation, whether or not a person voted, and voting choice, over and above the prediction that could be explained by demographic measures of age, education level, gender, income, marital status, race/ethnicity, and region. Ordinary least squares hierarchical regression revealed that political blog reading was positively and statistically significantly related to online political discussion and online political participation. Hierarchical logistic regression analysis indicated that the odds of a political blog reader voting were 1.98 times the odds of a nonreader voting, but vote choice was not predicted by reading political blogs. These results are interpreted within the uses and gratifications framework and the understanding that blogs add an interpersonal communication aspect to a mass medium. As more people use blogs and the nature of the blog-reading audience shifts, continuing to track and describe the blog audience with valid measures will be important for researchers and practitioners alike. Subsequent potential effects of political blog reading on engagement, discussion, and participation will be important to understand as these effects could impact the political landscape of this country and, therefore, the world.
Date: December 2010
Creator: Lewis, Mitzi
Partner: UNT Libraries
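
The odds ratio of 1.98 reported in the abstract above is a statement about odds, not probabilities, and its practical meaning depends on the baseline. The sketch below converts a baseline probability into the probability implied by a given odds ratio. The function name is hypothetical, and any baseline probability passed to it is an illustrative assumption; the abstract does not report nonreaders' voting rate.

```python
def probability_from_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability implied by applying an odds ratio to a baseline.

    odds = p / (1 - p); the new odds are odds_ratio * baseline odds.
    """
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds_ratio * baseline_odds
    return new_odds / (1.0 + new_odds)


# HYPOTHETICAL baseline: if nonreaders voted with probability 0.50 (odds of 1),
# an odds ratio of 1.98 implies odds of 1.98 for readers, i.e. 1.98 / 2.98.
reader_probability = probability_from_odds_ratio(0.50, 1.98)
```

With that hypothetical 0.50 baseline, the implied probability for readers is about 0.66, which shows that odds 1.98 times as large do not translate into a probability 1.98 times as large.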

Structural Validity and Item Functioning of the LoTi Digital-Age Survey.

Description: The present study examined the structural construct validity of the LoTi Digital-Age Survey, a measure of teacher instructional practices with technology in the classroom. Teacher responses (N = 2840) from across the United States were used to assess factor structure of the instrument using both exploratory and confirmatory analyses. Parallel analysis suggests retaining a five-factor solution compared to the MAP test that suggests retaining a three-factor solution. Both analyses (EFA and CFA) indicate that changes need to be made to the current factor structure of the survey. The last two factors were composed of items that did not cover or accurately measure the content of the latent trait. Problematic items, such as items with crossloadings, were discussed. Suggestions were provided to improve the factor structure, items, and scale of the survey.
Date: May 2011
Creator: Mehta, Vandhana
Partner: UNT Libraries

Spatial Ability, Motivation, and Attitude of Students as Related to Science Achievement

Description: Understanding student achievement in science is important as there is an increasing reliance of the U.S. economy on math, science, and technology-related fields despite the declining number of youth seeking college degrees and careers in math and science. A series of structural equation models were tested using the scores from a statewide science exam for 276 students from a suburban north Texas public school district at the end of their 5th grade year and the latent variables of spatial ability, motivation to learn science and science-related attitude. Spatial ability was tested as a mediating variable on motivation and attitude; however, while spatial ability had statistically significant regression coefficients with motivation and attitude, spatial ability was found to be the sole statistically significant predictor of science achievement for these students explaining 23.1% of the variance in science scores.
Date: May 2011
Creator: Bolen, Judy Ann
Partner: UNT Libraries

Parent Involvement and Science Achievement: A Latent Growth Curve Analysis

Description: This study examined science achievement growth across elementary and middle school and parent school involvement using the Early Childhood Longitudinal Study – Kindergarten Class of 1998 – 1999 (ECLS-K). The ECLS-K is a nationally representative kindergarten cohort of students from public and private schools who attended full-day or half-day kindergarten class in 1998 – 1999. The present study’s sample (N = 8,070) was based on students that had a sampling weight available from the public-use data file. Students were assessed in science achievement at third, fifth, and eighth grades and parents of the students were surveyed at the same time points. Analyses using latent growth curve modeling with time invariant and varying covariates in an SEM framework revealed a positive relationship between science achievement and parent involvement at eighth grade. Furthermore, there were gender and racial/ethnic differences in parents’ school involvement as a predictor of science achievement. Findings indicated that students with lower initial science achievement scores had a faster rate of growth across time. The achievement gap between low and high achievers in earth, space and life sciences lessened from elementary to middle school. Parents’ involvement with school usually tapers off after elementary school, but due to parent school involvement being a significant predictor of eighth grade science achievement, later school involvement may need to be supported and better implemented in secondary schooling.
Date: August 2011
Creator: Johnson, Ursula Yvette
Partner: UNT Libraries

Missing Data Treatments at the Second Level of Hierarchical Linear Models

Description: The current study evaluated the performance of traditional versus modern MDTs in the estimation of fixed-effects and variance components for data missing at the second level of a hierarchical linear model (HLM) across 24 different study conditions. Variables manipulated in the analysis included (a) number of Level-2 variables with missing data, (b) percentage of missing data, and (c) Level-2 sample size. Listwise deletion outperformed all other methods across all study conditions in the estimation of both fixed-effects and variance components. The model-based procedures evaluated, EM and MI, outperformed the other traditional MDTs, mean and group mean substitution, in the estimation of the variance components, outperforming mean substitution in the estimation of the fixed-effects as well. Group mean substitution performed well in the estimation of the fixed-effects, but poorly in the estimation of the variance components. Data in the current study were modeled as missing completely at random (MCAR). Further research is suggested to compare the performance of model-based versus traditional MDTs, specifically listwise deletion, when data are missing at random (MAR), a condition that is more likely to occur in practical research settings.
Date: August 2011
Creator: St. Clair, Suzanne W.
Partner: UNT Libraries

The Use Of Effect Size Estimates To Evaluate Covariate Selection, Group Separation, And Sensitivity To Hidden Bias In Propensity Score Matching.

Description: Covariate quality has been primarily theory driven in propensity score matching with a general aversion to the interpretation of group prediction. However, effect sizes are well supported in the literature and may help to inform the method. Specifically, the I index can be used as a measure of effect size in logistic regression to evaluate group prediction. As such, simulation was used to create 35 conditions of I, initial bias and sample size to examine statistical differences in (a) post-matching bias reduction and (b) treatment effect sensitivity. The results of this study suggest these conditions do not explain statistical differences in percent bias reduction of treatment likelihood after matching. However, I and sample size do explain statistical differences in treatment effect sensitivity. Treatment effect sensitivity was lower when sample sizes and I increased. However, this relationship was mitigated within smaller sample sizes as I increased above I = .50.
Date: December 2011
Creator: Lane, Forrest C.
Partner: UNT Libraries

An Investigation of the Effect of Violating the Assumption of Homogeneity of Regression Slopes in the Analysis of Covariance Model upon the F-Statistic

Description: The study seeks to determine the effect upon the F-statistic of violating the assumption of homogeneity of regression slopes in the one-way, fixed-effects analysis of covariance model. The study employs a Monte Carlo simulation technique to vary the degree of heterogeneity of regression slopes with varied sample sizes within experiments to determine the effect of such conditions. One hundred and eighty-three simulations were used.
Date: August 1972
Creator: McClaran, Virgil Rutledge
Partner: UNT Libraries

Convergent Validity of Variables Residualized By a Single Covariate: the Role of Correlated Error in Populations and Samples

Description: This study examined the bias and precision of four residualized variable validity estimates (C0, C1, C2, C3) across a number of study conditions. Validity estimates that considered measurement error, correlations among error scores, and correlations between error scores and true scores (C3) performed the best, yielding no estimates that were practically significantly different than their respective population parameters, across study conditions. Validity estimates that considered measurement error and correlations among error scores (C2) did a good job in yielding unbiased, valid, and precise results. Only in a select number of study conditions were C2 estimates unable to be computed or produced results that had sufficient variance to affect interpretation of results. Validity estimates based on observed scores (C0) fared well in producing valid, precise, and unbiased results. Validity estimates based on observed scores that were only corrected for measurement error (C1) performed the worst. Not only did they not reliably produce estimates even when the level of modeled correlated error was low, C1 produced values higher than the theoretical limit of 1.0 across a number of study conditions. Estimates based on C1 also produced the greatest number of conditions that were practically significantly different than their population parameters.
Date: May 2013
Creator: Nimon, Kim
Partner: UNT Libraries

A Comparison of Traditional Norming and Rasch Quick Norming Methods

Description: The simplicity and ease of use of the Rasch procedure are decided advantages. The test user needs only two numbers: the frequency of persons who answered each item correctly and the Rasch-calibrated item difficulty, usually a part of an existing item bank. Norms can be computed quickly for any specific group of interest. In addition, once the selected items from the calibrated bank are normed, any test built from the item bank is automatically norm-referenced. Thus, it was concluded that the Rasch quick norm procedure is a meaningful alternative to traditional classical true score norming for test users who desire normative data.
Date: August 1993
Creator: Bush, Joan Spooner
Partner: UNT Libraries

A Comparison of Two Differential Item Functioning Detection Methods: Logistic Regression and an Analysis of Variance Approach Using Rasch Estimation

Description: Differential item functioning (DIF) detection rates were examined for the logistic regression and analysis of variance (ANOVA) DIF detection methods. The methods were applied to simulated data sets of varying test length (20, 40, and 60 items) and sample size (200, 400, and 600 examinees) for both equal and unequal underlying ability between groups as well as for both fixed and varying item discrimination parameters. Each test contained 5% uniform DIF items, 5% non-uniform DIF items, and 5% combination DIF (simultaneous uniform and non-uniform DIF) items. The factors were completely crossed, and each experiment was replicated 100 times. For both methods and all DIF types, a test length of 20 was sufficient for satisfactory DIF detection. The detection rate increased significantly with sample size for each method. With the ANOVA DIF method and uniform DIF, there was a difference in detection rates between discrimination parameter types, which favored varying discrimination and decreased with increased sample size. The detection rate of non-uniform DIF using the ANOVA DIF method was higher with fixed discrimination parameters than with varying discrimination parameters when relative underlying ability was unequal. In the combination DIF case, there was a three-way interaction among the experimental factors discrimination type, relative ability, and sample size for both detection methods. The error rate for the ANOVA DIF detection method decreased as test length increased and increased as sample size increased. For both methods, the error rate was slightly higher with varying discrimination parameters than with fixed. For logistic regression, the error rate increased with sample size when relative underlying ability was unequal between groups. The logistic regression method detected uniform and non-uniform DIF at a higher rate than the ANOVA DIF method. 
Because the type of DIF present in real data is rarely known, the logistic regression method is recommended for ...
Date: August 1995
Creator: Whitmore, Marjorie Lee Threet
Partner: UNT Libraries

An Empirical Comparison of Random Number Generators: Period, Structure, Correlation, Density, and Efficiency

Description: Random number generators (RNGs) are widely used in Monte Carlo simulation studies, which are important in statistics for comparing power, mean differences, or distribution shapes between statistical approaches. Statistical results, however, may differ when different random number generators are used. Older methods have often been used blindly, with no understanding of their limitations, and many random functions supplied with computers today have been found to be comparatively unsatisfactory. In this study, five multiplicative linear congruential generators (MLCGs) provided in the following packages were chosen: RANDU (IBM), RNUN (IMSL), RANUNI (SAS), UNIFORM (SPSS), and RANDOM (BMDP). Using a personal computer (PC), an empirical investigation was performed using five criteria: period length before the random numbers repeat, distribution shape, correlation between adjacent numbers, density of distributions, and the normal approach of the random number generator in a normal function. All RNG FORTRAN programs were rewritten in Pascal, which is a more efficient language for the PC. Sets of random numbers were generated using different starting values. A good RNG should have the following properties: a long enough period; a well-structured pattern in distribution; independence between random number sequences; random and uniform distribution; and a good normal approach in the normal distribution. The findings suggested that these five criteria need to be examined when conducting a simulation study with large enough sample sizes and various starting values, because the RNG selected can affect the statistical results. Furthermore, to support reproducibility and validity, a study should report the source of the RNG, the type of RNG used, evaluation results of the RNG, and any pertinent information about the computer used in the study.
Recommendations for future research are suggested in the area of other RNGs and methods not used in this study, such as additive, combined, ...
Date: August 1995
Creator: Bang, Jung Woong
Partner: UNT Libraries
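
Illustrative note on the record above: a multiplicative linear congruential generator computes each value as x_{n+1} = (a * x_n) mod m. The sketch below is not taken from the thesis. It assumes the commonly documented RANDU constants (a = 65539, m = 2^31), and the function names `mlcg`, `uniforms`, and `lag1_correlation` are hypothetical. It shows two of the checks the abstract names, adjacent-value correlation and structure, in a minimal form.

```python
from itertools import islice


def mlcg(seed, multiplier=65539, modulus=2**31):
    """Yield successive states of x_{n+1} = (multiplier * x_n) mod modulus.

    The defaults are the commonly documented RANDU constants (an assumption
    here, not a detail taken from the thesis). RANDU expects an odd seed.
    """
    x = seed
    while True:
        x = (multiplier * x) % modulus
        yield x


def uniforms(seed, count, multiplier=65539, modulus=2**31):
    """Return `count` values scaled into [0, 1)."""
    states = islice(mlcg(seed, multiplier, modulus), count)
    return [s / modulus for s in states]


def lag1_correlation(values):
    """Sample correlation between adjacent values in the sequence."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


if __name__ == "__main__":
    states = list(islice(mlcg(1), 1000))
    # Structural defect of RANDU: every state is a fixed linear combination
    # of the two before it, so consecutive triples are not independent.
    dependent = all(
        states[k + 2] == (6 * states[k + 1] - 9 * states[k]) % 2**31
        for k in range(len(states) - 2)
    )
    print("triples linearly dependent:", dependent)
    print("lag-1 correlation:", lag1_correlation(uniforms(1, 1000)))
```

A low adjacent-value correlation does not by itself indicate a good generator: the triple relation holds for RANDU regardless, which is one reason several criteria are examined together.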

A Comparison of Three Criteria Employed in the Selection of Regression Models Using Simulated and Real Data

Description: Researchers who make predictions from educational data are interested in choosing the best regression model possible. Many criteria have been devised for choosing a full or restricted model, and also for selecting the best subset from an all-possible-subsets regression. The relative practical usefulness of three of the criteria used in selecting a regression model was compared in this study: (a) Mallows' C_p, (b) Amemiya's prediction criterion, and (c) Hagerty and Srinivasan's method involving predictive power. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying effect sizes. The amount of power for each matrix was calculated after one or two predictors were dropped from the full regression model, for sample sizes ranging from n = 25 to n = 150. The null case, in which one predictor was uncorrelated with the other predictors, was also considered. In addition, regression models selected using C_p and the prediction criterion were compared using data from the National Educational Longitudinal Study of 1988.
Date: December 1994
Creator: Graham, D. Scott
Partner: UNT Libraries