You limited your search to: Resource Type: Paper; Language: English; Collection: UNT Scholarly Works
The Semantic Wildcard
Date: May 2002
Creator: Mihalcea, Rada
Description: The IRSLO (Information Retrieval using Semantic and Lexical Operators) project aims at integrating semantic and lexical information into the retrieval process, in order to overcome some of the impediments encountered with today's information retrieval systems. This paper introduces the semantic wildcard, one of the most powerful operators implemented in IRSLO, which allows for searches along general-specific lines. The semantic wildcard, denoted with #, acts in a manner similar to the lexical wildcard, but at the semantic level, enabling the retrieval of subsumed concepts. For instance, a search for animal# will match any concept that is of type animal, including dog, goat, and so forth, thereby going beyond the explicit knowledge stated in texts. This operator and a lexical locality operator, which enables the retrieval of paragraphs rather than entire documents, have both been implemented in the IRSLO system and tested on information requests run against an index of 130,000 documents. Significant improvement was observed over classic keyword-based retrieval systems in terms of precision, recall and success rate.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc83308/
Building a Sense Tagged Corpus with Open Mind Word Expert
Date: July 2002
Creator: Chklovski, Timothy & Mihalcea, Rada
Description: This paper discusses Open Mind Word Expert, an implemented active learning system for collecting word sense tagging from the general public over the Web. It is available at http://teach-computers.org. The authors expect the system to yield a large volume of high-quality training data at a much lower cost than the traditional method of hiring lexicographers. The authors thus propose a Senseval-3 lexical sample activity where the training data is collected via Open Mind Word Expert. If successful, the collection process can be extended to create the definitive corpus of word sense information.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc81389/
Dynamic Agent Population in Agent-Based Distance Vector Routing
Date: August 2002
Creator: Amin, Kaizar A. & Mikler, Armin R.
Description: This paper discusses dynamic agent population in agent-based distance vector routing. Abstract: The intelligent mobile agent paradigm can be applied to a wide variety of intrinsically parallel and distributed applications. Network routing is one such application that can be mapped to an agent-based approach. The performance of any agent-based system will depend on its agent population. Although a lot of research has been conducted on agent-based systems, little consideration has been given to the importance of agent population in dynamic networks. A large number of constituent agents can increase the resource overhead of the system, thereby impeding the overall performance of the network. Hence, it is imperative to find the optimal number of agents in the system that would maximize the efficiency of the agent-based mechanism in the network. This optimal value cannot be determined manually, thereby emphasizing the need for an adaptive approach that manipulates the number of agents in the system based on its resource availability. This paper discusses an agent-based approach to Distance Vector Routing, referred to as Agent-based Distance Vector Routing. It also describes an adaptive approach to controlling the number of agents in the network using pheromones and discusses their limitations.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc132968/
Instance Based Learning with Automatic Feature Selection Applied to Word Sense Disambiguation
Date: August 2002
Creator: Mihalcea, Rada, 1974-
Description: This paper discusses instance based learning with automatic feature selection applied to word sense disambiguation. Abstract: We describe an algorithm for Word Sense Disambiguation (WSD) that relies on a lazy learner improved with automatic feature selection. The algorithm was implemented in a system that achieves excellent performance on the set of data released during the SENSEVAL-2 competition. We present the results obtained and discuss the performance of various features in the context of supervised learning algorithms for WSD.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30943/
Letter Level Learning for Language Independent Diacritics Restoration
Date: September 2002
Creator: Mihalcea, Rada, 1974- & Nastase, Vivi
Description: This paper discusses letter level learning for language independent diacritics restoration. Abstract: This paper presents a method for diacritics restoration based on learning mechanisms that act at letter level. The method requires no additional tagging tools or resources other than raw text, which makes it independent of the language, and particularly appealing for languages for which there are few resources available. The algorithm was evaluated on four different languages, namely Czech, Hungarian, Polish, and Romanian, and an average accuracy of over 98% was observed.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30944/
Classifier Stacking and Voting for Text Filtering
Date: November 2002
Creator: Mihalcea, Rada, 1974-
Description: Abstract: This paper summarizes the approach and the results of the TextCat system participating in the Filtering track in the Text Retrieval Conference 2002. The system relies primarily on statistical methods, and was designed with the main purpose of having a backbone system in which we can further integrate semantic components, and evaluate their relative performance as compared to traditional statistical approaches. The system is therefore simple, and is based on techniques for keyword extraction and various classifier combinations, including stacking and voting. TextCat participated in the Batch and Routing tasks. In the Batch task, it achieved scores of 39.02% normalized utility and 26.37% F-measure, averaged over all topics. The averaged uninterpolated precision for our best routing submission was 14.16%.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30942/
Assessing Metadata Utilization: An Analysis of MARC Content Designation Use
Date: 2003
Creator: Moen, William E. & Benardino, Penelope
Description: This paper discusses metadata utilization. Abstract: Metadata schemes emerge to meet community and user requirements, and they evolve over time to meet changing requirements. This paper reports results of an analysis of a large sample of MARC 21 bibliographic records. MARC 21 is an encoding scheme related closely to metadata elements occurring in library bibliographic records. The records were analyzed for the utilization of content designation available in MARC 21. Results indicate that less than 5% of available content designation accounts for over 80% of occurrences. These findings have implications for indexing policies and system design, and can inform the setting of requirements for extending a metadata scheme based on a threshold of community requirements.
Contributing Partner: UNT College of Information
Permalink: digital.library.unt.edu/ark:/67531/metadc36303/
Targeted Access for Varied Audiences to Integrated, Heterogeneous Digital Information Resources
Date: 2003
Creator: Alemneh, Daniel Gelaw; Hartman, Cathy Nelson & Hastings, Samantha Kelly
Description: This poster presents an overview of the University of North Texas (UNT) Libraries' "Portal to Texas History" project, which aims to integrate and ensure long-term access to large quantities of heterogeneous digital resources from many different institutions. Portals have emerged as an important tool for facilitating single-point-access to digital resources. The UNT Library is undertaking the leadership role by creating the application framework, setting project standards and guidelines, and facilitating collaborative efforts for content building. Also discussed are expanded services for targeted audiences, project approaches to preservation challenges, collaboration benefits, and other issues that emerged in the process of building a platform for the portal system.
Contributing Partner: UNT Libraries
Permalink: digital.library.unt.edu/ark:/67531/metadc29309/
Performance Analysis of a Part of Speech Tagging Task
Date: February 2003
Creator: Mihalcea, Rada
Description: In this paper, the author attempts a formal analysis of performance in automatic part of speech tagging. Lower and upper bounds on tagging precision using existing taggers or their combination are provided. Since the author shows that perfect automatic tagging is not possible with existing taggers, two solutions for applications requiring very high precision are presented: (1) a solution involving minimum human intervention for a precision of over 98.7%, and (2) a combination of taggers using a memory based learning algorithm that succeeds in reducing the error rate by 11.6% with respect to the best tagger involved.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30950/
Creating Large Annotated Data Collections with Web Users' Help
Date: April 2003
Creator: Mihalcea, Rada, 1974- & Chklovski, Timothy A. (Timothy Anatolievich), 1977
Description: This paper discusses creating annotated data collections. Abstract: Open Mind Word Expert is an implemented active learning system that aims to create large annotated corpora by tapping into the world's vast pool of knowledge. It does this by relying on the large number of Web users who contribute their knowledge to data annotation. Open Mind Word Expert focuses on building semantically annotated corpora by collecting word sense tagging from the general public over the Web. During the first nine months of activity, the system yielded 90,000 high-quality tagged items at a much lower cost than the traditional method of hiring lexicographers.
Contributing Partner: UNT College of Engineering
Permalink: digital.library.unt.edu/ark:/67531/metadc30949/