Search Results

The Cluster Hypothesis: A Visual/Statistical Analysis
By allowing judgments based on a small number of exemplar documents to be applied to a larger number of unexamined documents, clustered presentation of search results represents an intuitively attractive possibility for reducing the cognitive resource demands on human users of information retrieval systems. However, clustered presentation of search results is sensible only to the extent that naturally occurring similarity relationships among documents correspond to topically coherent clusters. The Cluster Hypothesis posits just such a systematic relationship between document similarity and topical relevance. To date, experimental validation of the Cluster Hypothesis has proved problematic, with collection-specific results both supporting and failing to support this fundamental theoretical postulate. The present study consists of two computational information visualization experiments, representing a two-tiered test of the Cluster Hypothesis under adverse conditions. Both experiments rely on multidimensionally scaled representations of interdocument similarity matrices. Experiment 1 is a term-reduction condition, in which descriptive titles are extracted from Associated Press news stories drawn from the TREC information retrieval test collection. The clustering behavior of these titles is compared to the behavior of the corresponding full text via statistical analysis of the visual characteristics of a two-dimensional similarity map. Experiment 2 is a dimensionality reduction condition, in which inter-item similarity coefficients for full text documents are scaled into a single dimension and then rendered as a two-dimensional visualization; the clustering behavior of relevant documents within these unidimensionally scaled representations is examined via visual and statistical methods. Taken as a whole, results of both experiments lend strong though not unqualified support to the Cluster Hypothesis. In Experiment 1, semantically meaningful 6.6-word document surrogates systematically conform to the predictions of the Cluster Hypothesis. In Experiment 2, the majority of the unidimensionally scaled datasets exhibit a marked nonuniformity of distribution of relevant documents, further supporting the Cluster Hypothesis. Results of …
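As a rough illustration of the scaling step this abstract describes, the following Python sketch scales a small similarity matrix into one dimension with multidimensional scaling and inspects where the relevant documents land. The matrix values, the document count, and the scikit-learn tooling are all assumptions for illustration, not the study's actual data or software.

    import numpy as np
    from sklearn.manifold import MDS

    # Hypothetical interdocument similarity matrix (cosine-style values in
    # [0, 1]) for six documents; documents 0-2 are the relevant ones.
    sim = np.array([
        [1.0, 0.8, 0.7, 0.2, 0.1, 0.2],
        [0.8, 1.0, 0.6, 0.3, 0.2, 0.1],
        [0.7, 0.6, 1.0, 0.2, 0.3, 0.2],
        [0.2, 0.3, 0.2, 1.0, 0.9, 0.8],
        [0.1, 0.2, 0.3, 0.9, 1.0, 0.7],
        [0.2, 0.1, 0.2, 0.8, 0.7, 1.0],
    ])

    # MDS consumes dissimilarities, so invert the similarities first.
    mds = MDS(n_components=1, dissimilarity="precomputed", random_state=0)
    positions = mds.fit_transform(1.0 - sim).ravel()

    # Under the Cluster Hypothesis, relevant documents should bunch
    # together rather than spread uniformly along the scaled dimension.
    print("relevant:    ", positions[:3])
    print("non-relevant:", positions[3:])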
Creating a Criterion-Based Information Agent Through Data Mining for Automated Identification of Scholarly Research on the World Wide Web
This dissertation creates an information agent that correctly identifies Web pages containing scholarly research approximately 96% of the time. It does this by analyzing the Web page against a set of criteria and then using a classification tree to arrive at a decision. The criteria were gathered from the literature on selecting print and electronic materials for academic libraries. A Delphi study was conducted with an international panel of librarians to expand and refine the criteria until a list of 41 operationalizable criteria was agreed upon. A Perl program was then designed to analyze a Web page and determine a numerical value for each criterion. A large collection of Web pages was gathered, comprising 5,000 pages that contain the full work of scholarly research and 5,000 random pages, representative of user searches, that do not contain scholarly research. Datasets were built by running the Perl program on these Web pages. The datasets were split into model-building and testing sets. Data mining was then used to create different classification models. Four techniques were used: logistic regression, nonparametric discriminant analysis, classification trees, and neural networks. The models were created with the model datasets and then tested against the test dataset. Precision and recall were used to judge the effectiveness of each model. In addition, a set of pages that were difficult to classify because of their similarity to scholarly research was gathered and classified with the models. The classification tree created the most effective classification model, with a precision ratio of 96% and a recall ratio of 95.6%. However, logistic regression created a model that was able to correctly classify more of the problematic pages. This agent can be used to create a database of scholarly research published on the Web. In addition, the technique can be used to create a …
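The classification-tree step might look like the following sketch. The feature values, the labelling rule, and the scikit-learn tooling are hypothetical stand-ins for the study's Perl-generated criterion datasets.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import precision_score, recall_score

    # Hypothetical feature matrix: one row per Web page, one column per
    # operationalized criterion score; labels mark scholarly (1) vs. not (0).
    rng = np.random.default_rng(0)
    X = rng.random((1000, 41))
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # invented stand-in labels

    # Split into model-building and testing sets, as in the study design.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    pred = tree.predict(X_test)

    print("precision:", precision_score(y_test, pred))
    print("recall:   ", recall_score(y_test, pred))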
Empowering Agent for Oklahoma School Learning Communities: An Examination of the Oklahoma Library Improvement Program
The purposes of this study were to determine the initial impact of the Oklahoma Library Media Improvement Grants on Oklahoma school library media programs; assess whether the Oklahoma Library Media Improvement Grants continue to contribute to Oklahoma school learning communities; and examine possible relationships between school library media programs and student academic success. It also sought to document the history of the Oklahoma Library Media Improvement Program 1978-1994 and increase awareness of its influence upon Oklahoma school library media programs. Methods of data collection included examining Oklahoma Library Media Improvement Program archival materials, sending a survey to 1,703 school principals in Oklahoma, and interviewing Oklahoma Library Media Improvement Program participants. Data collection took place over a one-year period. Data analyses were conducted in three primary phases. In the first, descriptive statistics and frequencies were disaggregated to examine mean scores as they related to money spent on school library media programs, opinions of school library media programs, and possible relationships between school library media programs and student academic achievement. Analysis of variance was used in the second phase to determine whether any variation between means was significant as related to Oklahoma Library Improvement Grants, time spent in the library media center by library media specialists, principal gender, opinions of library media programs, student achievement indicators, and the region of the state in which the respondent was located. The third phase compared longitudinal data collected in the 2000 survey with past data. The primary results indicated that students in Oklahoma from schools with a centralized library media center, served by a full-time library media specialist, where the school had received one or more Library Media Improvement Grants, scored significantly higher academically than students in schools not having a centralized library media center, not served by a …
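A minimal sketch of the kind of analysis-of-variance comparison described, with invented achievement scores; the grouping and the numbers are illustrative only, not the study's data.

    from scipy.stats import f_oneway

    # Hypothetical mean achievement scores for schools grouped by whether
    # they received a Library Media Improvement Grant (illustrative only).
    grant_schools    = [78, 82, 85, 80, 88, 84]
    no_grant_schools = [70, 74, 72, 77, 69, 73]

    f_stat, p_value = f_oneway(grant_schools, no_grant_schools)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")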
An Examination Of The Variation In Information Systems Project Cost Estimates: The Case Of Year 2000 Compliance Projects
The year 2000 (Y2K) problem presented a fortuitous opportunity to explore the relationship between estimated costs of software projects and five cost influence dimensions described by the Year 2000 Enterprise Cost Model (Kappelman et al., 1998): organization, problem, solution, resources, and stage of completion. This research was a field study survey of Y2K project managers in industry, government, and education, and part of a joint project begun in 1996 between the University of North Texas and the Y2K Working Group of the Society for Information Management (SIM). Evidence was found to support relationships between estimated costs and organization, problem, resources, and project stage, but not for the solution dimension. Project stage appears to moderate the relationships for organization, particularly IS practices, and resources. A history of superior IS practices appears to mean lower estimated costs, especially for projects in larger IS organizations. Acquiring resources, especially external skills, appears to increase costs. Moreover, projects apparently have many individual differences, many related to size and to project stage, and their influences on costs appear to operate at the sub-dimension or even the individual-variable level. A Revised Year 2000 Enterprise Model is presented incorporating this granularity. Two primary conclusions can be drawn from this research: (1) large software projects are very complex, and thus so is cost estimating; and (2) the devil of cost estimating is in the details of knowing which of the many possible variables are the important ones for each particular enterprise and project. This points to the importance of organizations keeping software project metrics and historically calibrating their cost-estimating practices. Project managers must understand the relevant details and their interaction and importance in order to successfully develop a cost estimate for a particular project, even when rational cost models are used. This research also indicates …
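One conventional way to test the kind of moderation the abstract reports is a regression with an interaction term. This sketch uses invented project data and statsmodels; neither the variables' values nor the model specification is from the study itself.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical project-level data: estimated cost, an IS-practices
    # score, and stage of completion (0..1). Illustrative values only.
    df = pd.DataFrame({
        "cost":      [120, 95, 200, 310, 150, 80, 260, 175],
        "practices": [3, 4, 2, 1, 3, 5, 2, 4],
        "stage":     [0.2, 0.5, 0.3, 0.8, 0.6, 0.9, 0.4, 0.7],
    })

    # The practices:stage interaction term captures the moderating role
    # of project stage on the practices-cost relationship.
    model = smf.ols("cost ~ practices * stage", data=df).fit()
    print(model.summary())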
An Experimental Study of Teachers' Verbal and Nonverbal Immediacy, Student Motivation, and Cognitive Learning in Video Instruction
This study used an experimental design and a direct test of recall to provide data about teacher immediacy and student cognitive learning. Four hypotheses and a research question addressed two research problems: first, how verbal and nonverbal immediacy function together and/or separately to enhance learning; and second, how immediacy affects cognitive learning in relation to student motivation. These questions were examined in the context of video instruction to provide insight into distance learning processes and to ensure maximum control over experimental manipulations. Participants (N = 347) were drawn from university students in an undergraduate communication course. Students were randomly assigned to groups, completed a measure of state motivation, and viewed a 15-minute video lecture containing part of the usual course content delivered by a guest instructor. Participants were unaware that the video instructor was actually performing one of four scripted manipulations reflecting higher and lower combinations of specific verbal and nonverbal cues, representing the four cells of the 2x2 research design. Immediately after the lecture, students completed a recall measure, consisting of portions of the video text with blanks in the place of key words. Participants were to fill in the blanks with exact words they recalled from the videotape. Findings strengthened previous research associating teacher nonverbal immediacy with enhanced cognitive learning outcomes. However, higher verbal immediacy, in the presence of higher and lower nonverbal immediacy, was not shown to produce greater learning among participants in this experiment. No interaction effects were found between higher and lower levels of verbal and nonverbal immediacy. Recall scores were comparatively low in the presence of higher verbal and lower nonverbal immediacy, suggesting that nonverbal expectancy violations may have hindered cognitive learning. Student motivation was not found to be a significant source of error in measuring immediacy's effects, and no interaction effects were detected …
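The 2x2 design lends itself to a two-way analysis of variance with an interaction term. Here is a sketch with invented recall scores standing in for the study's data; the cell values are illustrative only.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical recall scores for the four cells of the 2x2 design
    # (higher/lower verbal x higher/lower nonverbal immediacy).
    df = pd.DataFrame({
        "recall":    [14, 15, 13, 18, 19, 17, 9, 8, 10, 12, 11, 13],
        "verbal":    ["hi", "hi", "hi", "hi", "hi", "hi",
                      "lo", "lo", "lo", "lo", "lo", "lo"],
        "nonverbal": ["hi", "hi", "hi", "lo", "lo", "lo",
                      "hi", "hi", "hi", "lo", "lo", "lo"],
    })

    # Main effects plus the verbal x nonverbal interaction.
    model = smf.ols("recall ~ C(verbal) * C(nonverbal)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))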
Identifying At-Risk Students: An Assessment Instrument for Distributed Learning Courses in Higher Education
The current period of rapid technological change, particularly in the area of mediated communication, has combined with new philosophies of education and market forces to bring upheaval to the realm of higher education. Technical capabilities exceed our knowledge of whether expenditures on hardware and software lead to corresponding gains in student learning. Educators do not yet possess sophisticated assessments of what we may be gaining or losing as we widen the scope of distributed learning. The purpose of this study was not to draw sweeping conclusions with respect to the costs or benefits of technology in education. The researcher focused on a single issue involved in educational quality: assessing the ability of a student to complete a course. Previous research in this area indicates that attrition rates are often higher in distributed learning environments. Educators and students may benefit from a reliable instrument to identify those students who may encounter difficulty in these learning situations. This study is aligned with research focused on the individual engaged in seeking information, assisted or hindered by the capabilities of the computer information systems that create and provide access to information. Specifically, the study focused on the indicators of completion for students enrolled in video conferencing and Web-based courses. In the final version, the Distributed Learning Survey encompassed thirteen indicators of completion. The results of this study of 396 students indicated that the Distributed Learning Survey represented a reliable and valid instrument for identifying at-risk students in video conferencing and Web-based courses where the student population is similar to the study participants. Educational level, GPA, credit hours taken in the semester, study environment, motivation, computer confidence, and the number of previous distributed learning courses accounted for most of the predictive power in the discriminant function based on student scores from the survey.
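A discriminant function of the kind described can be sketched as follows. The choice of three indicators, their values, and the scikit-learn implementation are all assumptions for illustration, not the Distributed Learning Survey's actual scoring.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Hypothetical scores on three completion indicators (GPA, motivation,
    # computer confidence) with completion outcomes (1 = completed).
    X = np.array([[3.8, 4, 5], [2.1, 2, 2], [3.2, 5, 4], [1.9, 1, 3],
                  [3.5, 4, 4], [2.4, 2, 1], [3.9, 5, 5], [2.0, 3, 2]])
    y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

    lda = LinearDiscriminantAnalysis().fit(X, y)

    # Flag a new student as at risk if the function predicts non-completion.
    new_student = [[2.3, 2, 3]]
    print("predicted completion:  ", lda.predict(new_student)[0])
    print("completion probability:", lda.predict_proba(new_student)[0, 1])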
MEDLINE Metric: A method to assess medical students' MEDLINE search effectiveness
Medical educators advocate the need for medical students to acquire information management skills, including the ability to search the MEDLINE database. There has been no published validated method available to use for assessing medical students' MEDLINE information retrieval skills. This research proposes and evaluates a method, designed as the MEDLINE Metric, for assessing medical students' search skills. MEDLINE Metric consists of: (a) the development, by experts, of realistic clinical scenarios that include highly constructed search questions designed to test defined search skills; (b) timed tasks (searches) completed by subjects; (c) the evaluation of search results; and (d) instructive feedback. A goal is to offer medical educators a valid, reliable, and feasible way to judge mastery of information searching skill by measuring results (search retrieval) rather than process (search behavior) or cognition (knowledge about searching). Following a documented procedure for test development, search specialists and medical content experts formulated six clinical search scenarios and questions. One hundred and forty-five subjects completed the six-item test under timed conditions. Subjects represented a wide range of MEDLINE search expertise. One hundred twenty complete cases were used, representing 53 second-year medical students (44%), 47 fourth-year medical students (39%), and 20 medical librarians (17%). Data related to educational level, search training, search experience, confidence in retrieval, difficulty of search, and score were analyzed. Evidence supporting the validity of the method includes the agreement by experts about the skills and knowledge necessary to successfully retrieve information relevant to a clinical question from the MEDLINE database. Also, the test discriminated among different performance levels. There were statistically significant, positive relationships between test score and level of education, self-reported previous MEDLINE training, and self-reported previous search experience. The findings from this study suggest that MEDLINE Metric is a valid method for constructing and administering a performance-based test to identify …
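The reported score-education relationship is the sort of ordinal association a rank correlation captures. This sketch uses invented codings and scores, not the study's data or its actual analysis.

    from scipy.stats import spearmanr

    # Hypothetical test scores paired with education level coded ordinally
    # (1 = second-year student, 2 = fourth-year student, 3 = librarian).
    education = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3]
    score     = [3, 2, 4, 4, 5, 3, 6, 5, 6, 2, 4, 5]

    rho, p = spearmanr(education, score)
    print(f"rho = {rho:.2f}, p = {p:.4f}")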
Modeling Utilization of Planned Information Technology
Implementations of information technology solutions to address specific information problems are successful only when the technology is utilized. The antecedents of technology use involve user, system, task, and organization characteristics, as well as externalities that can affect all of these entities. However, measurement of the interaction effects between these entities can act as a proxy for individual attribute values. A model is proposed which, based upon evaluation of these interaction effects, can predict technology utilization. This model was tested with systems being implemented at a pediatric health care facility. Results from this study provide insight into the relationships among the antecedents of technology utilization. Specifically, task time had a significant direct causal effect on utilization. Indirect causal effects were identified in the task value and perceived utility constructs. Perceived utility, along with organizational support, also had direct causal effects on user satisfaction. Task value also impacted user satisfaction in an indirect fashion. Finally, the results provide a predictive model and taxonomy of variables that can be applied to predict or manipulate the likelihood of utilization for planned technology.
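Indirect (mediated) effects of the kind reported are commonly estimated as a product of regression paths. The following sketch, with invented data and a simple two-regression approach, shows one such estimate; it is not the study's actual causal model.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: task value, perceived utility, and utilization.
    df = pd.DataFrame({
        "task_value": [1, 2, 3, 4, 5, 2, 3, 4, 5, 1],
        "utility":    [2, 3, 3, 5, 6, 2, 4, 5, 6, 1],
        "use":        [1, 2, 3, 5, 6, 2, 3, 5, 6, 1],
    })

    # Path a: task value -> perceived utility.
    a = smf.ols("utility ~ task_value", data=df).fit().params["task_value"]
    # Path b: perceived utility -> utilization, controlling for task value.
    b = smf.ols("use ~ utility + task_value", data=df).fit().params["utility"]

    # The product of the two paths estimates the indirect effect.
    print("indirect effect of task value via perceived utility:", a * b)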
Public School Educators' Use of Computer-Mediated Communication
This study examined the uses of computer-mediated communication (CMC) by educators in selected public schools. It used Rogers' Diffusion of Innovations Theory as the underpinning of the study. CMC refers to any exchange of information that involves the use of computers for communication between individuals or between individuals and a machine. This study was an exploration of the difficulties users confront, the services they access, and the tasks they accomplish when using CMC, and it investigated the factors that affect the use of CMC. The sample population was drawn from registered users on TENET, the Texas Education Network, as of December 1997. The educators were described with frequencies and percentages analyzing the demographic data. For the research, eight indices were selected to test how strongly these user and environmental attributes were associated with the use of CMC. These variables were (1) education, (2) position, (3) place of employment, (4) geographic location, (5) district size, (6) organization vitality, (7) adopter resources, and (8) instrumentality. Two dependent variables were used to test for usage: (1) depth, or frequency of CMC usage and amount of time spent online, and (2) breadth, or variety of Internet utilities used. Additionally, the users' perception of network benefits was measured. Network benefits were correlated with social interaction and perception of CMC to investigate what tasks educators were accomplishing with CMC. Correlations, crosstabulations, and ANOVAs were used to analyze the data for testing the four hypotheses. The major findings of the study, based on the hypotheses tested, were that the socioeconomic variables of education and position influenced the use of CMC. A significant finding is that teachers used e-mail and Internet resources less frequently than those in other positions. An interesting finding was that frequency of use was more significant for usage than amount of …
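Crosstabulation with a chi-square test is one standard way to relate position to usage, as the abstract's analysis plan suggests. The table below is invented for illustration and does not reflect the study's data.

    import pandas as pd
    from scipy.stats import chi2_contingency

    # Hypothetical crosstabulation of position against CMC usage level.
    df = pd.DataFrame({
        "position": ["teacher", "teacher", "librarian", "admin", "teacher",
                     "librarian", "admin", "teacher", "admin", "librarian"],
        "usage":    ["low", "low", "high", "high", "low",
                     "high", "low", "high", "high", "low"],
    })

    table = pd.crosstab(df["position"], df["usage"])
    chi2, p, dof, expected = chi2_contingency(table)
    print(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")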
Relevance Thresholds: A Conjunctive/Disjunctive Model of End-User Cognition as an Evaluative Process
This investigation identifies end-user cognitive heuristics that facilitate judgment and evaluation during information retrieval (IR) system interactions. The study extends previous research surrounding relevance as a key construct for representing the value end-users ascribe to items retrieved from IR systems and the perceived effectiveness of such systems. The Lens Model of user cognition serves as the foundation for the design and interpretation of the study; earlier research in problem solving, decision making, and attitude formation also contributes to the model and analysis. A self-reporting instrument collected evaluative responses from 32 end-users related to 1,432 retrieved items in relation to five characteristics of each item: topical, pertinence, utility, systematic, and motivational levels of relevance. The nominal nature of the data collected led to non-parametric statistical analyses, which indicated that end-user evaluation of retrieved items to resolve an information problem at hand is most likely a multi-stage process. That process appears to be a cognitive progression from topic to meaning (pertinence) to functionality (use). Each step in end-user evaluative processing engages a cognitive hierarchy of heuristics that includes consideration (of appropriate cues), differentiation (the positive or negative aspects of those cues considered), and aggregation (the combination of differentiated cue aspects needed to render an evaluative label of the item in relation to the information problem at hand). While individuals may differ in their judgments and evaluations of retrieved items, they appear to make those decisions by using consistent heuristic approaches.
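The staged, conjunctive evaluation the model proposes can be caricatured in a few lines of Python. The stage names follow the abstract's topic-to-meaning-to-functionality progression, but the threshold and scoring scheme are invented for illustration.

    # Stage order follows the progression the study reports: topic first,
    # then meaning (pertinence), then functionality (utility).
    STAGES = ("topical", "pertinence", "utility")

    def evaluate(item, threshold=0.5):
        # Conjunctive screening: the item must clear every stage in order;
        # failing any stage ends the evaluation (invented threshold scheme).
        for stage in STAGES:
            if item[stage] < threshold:
                return f"rejected at {stage}"
        return "accepted"

    print(evaluate({"topical": 0.9, "pertinence": 0.7, "utility": 0.3}))
    # -> rejected at utility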
The Second Vatican Council and American Catholic Theological Research: A Bibliometric Analysis of Theological Studies: 1940-1995
A descriptive analysis was made of the characteristics of the authors and citations of the articles in the journal Theological Studies from 1940-1995. Data were gathered on the institutional affiliation, geographic location, occupation, gender, and personal characteristics of each author. The citation characteristics examined were the cited authors, date and age of the citations, format, language, place of publication, and journal titles. These characteristics were compared across the time periods before and after the Second Vatican Council in order to detect any changes that might have occurred after certain recommendations were made by the council to theologians. Subject dispersion of the literature was also analyzed, and analyses based on Lotka's Law of author productivity and Bradford's Law of title dispersion were performed for this literature. The profile of the characteristics of the authors showed that the proportion of articles published by women and laypersons has increased since the recommendations of the council. The data had a good fit to Lotka's Law for the pre-Vatican II time period but not for the period after Vatican II. The data were a good fit to Bradford's Law for the predicted number of journals in the nucleus and Zone 2, but the observed number of journals in Zone 3 was higher than predicted for all time periods. Subject dispersion of research from disciplines other than theology is low, but citation to works from the fields of education, psychology, the social sciences, and science has increased since Vatican II. The analysis of the citation characteristics showed no significant change in the age, format, languages used, or geographic location of the publisher of the cited works after Vatican II. Citation characteristics showed that authors prefer research from monographs published in English and in U.S. locations for all time periods. Research …
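For reference, Lotka's Law in its classic inverse-square form and Bradford's zone structure can be sketched as follows; the base counts and the Bradford multiplier are hypothetical, chosen only to show the shape of each law.

    # Lotka's Law: the number of authors producing x papers is proportional
    # to x**-n, with n classically equal to 2, so counts fall off as 1/x^2.
    authors_with_one_paper = 100  # hypothetical base count

    for x in range(1, 6):
        expected = authors_with_one_paper / x ** 2
        print(f"authors expected to publish {x} paper(s): {expected:.1f}")

    # Bradford's Law partitions journals into zones yielding roughly equal
    # numbers of articles; zone sizes grow approximately as 1 : n : n**2.
    bradford_multiplier = 3  # hypothetical
    nucleus = 4              # hypothetical journal count in the nucleus
    zones = [nucleus * bradford_multiplier ** k for k in range(3)]
    print("journals per zone (nucleus, zone 2, zone 3):", zones)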
A Study of Graphically Chosen Features for Representation of TREC Topic-Document Sets
Document representation is important for computer-based text processing. Good document representations must include at least the most salient concepts of the document. Documents exist in a multidimensional space, which makes it difficult to identify which concepts to include. A current problem is to measure the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution towards this goal, this dissertation studied visual inter-document relationships in a dimensionally reduced space. The same treatment was applied to full text and to three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi distribution across the whole document set. The third document representation identified features through a novel method: a Coefficient of Variability was calculated by normalizing the Cartesian distance of the discriminating value in the relevant and the non-relevant document subsets. The local dictionary method was also used. Cosine similarity values measured the inter-document distance in the information space and formed a matrix to serve as input to the Multidimensional Scaling (MDS) procedure. A precision-recall procedure was averaged across all treatments to compare them statistically. The treatments were not found to be statistically the same, and the null hypotheses were rejected.
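The cosine-similarity-to-MDS pipeline the abstract names can be sketched end to end. The toy documents and the TF-IDF weighting are assumptions for illustration; the study's actual term weighting and corpus are not specified here.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.manifold import MDS

    # Hypothetical miniature document set standing in for TREC texts.
    docs = ["oil prices rise on supply fears",
            "crude oil supply tightens, prices climb",
            "new library opens downtown",
            "city library expands reading program"]

    # Cosine similarities between term vectors form the input matrix.
    vecs = TfidfVectorizer().fit_transform(docs)
    sim = cosine_similarity(vecs)

    # MDS expects dissimilarities; invert and clip tiny negative rounding.
    dissim = np.clip(1.0 - sim, 0.0, None)
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dissim)
    print(coords)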
The Validity of Health Claims on the World Wide Web: A Case Study of the Herbal Remedy Opuntia
The World Wide Web has become a significant source of medical information for the public, but there is concern that much of the information is inaccurate, misleading, and unsupported by scientific evidence. This study analyzes the validity of health claims on the World Wide Web for the herbal remedy Opuntia using an evidence-based approach, and supports the observation that individuals must critically assess health information in this relatively new medium of communication. A systematic search of Web sites relating to herbal remedies was conducted by means of nine search engines and online resources, and specific sites providing information on the cactus herbal remedy from the genus Opuntia were retrieved. The validity of therapeutic health claims on the Web sites was checked by comparison with reports in the scientific literature subjected to two established quality-assessment rating instruments. A total of 184 Web sites from a variety of sources were retrieved and evaluated, and 98 distinct health claims were identified. Fifty-three scientific reports were retrieved to validate the claims: 25 involved human subjects, and 28 involved animal or laboratory models. Only 33 (34%) of the claims were addressed in the scientific literature. For 3% of the claims, evidence from the scientific reports was conflicting or contradictory. Of the scientific reports involving human subjects, none met the predefined criteria for high quality as determined by the quality-assessment rating instruments. Two-thirds of the claims were unsupported by scientific evidence and were based on folklore or indirect evidence from related sources. Information on herbal remedies such as Opuntia is well represented on the World Wide Web. Health claims on Web sites were numerous and varied widely in subject matter. The determination of the validity of information about claims made for herbals on the Web would help individuals assess their value in medical treatment. However, the Web is conducive to dubious …