Department: Computer Science and Engineering
Collection: UNT Scholarly Works
- Performance Analysis of a Part of Speech Tagging Task
- In this paper, the author attempts a formal analysis of performance in automatic part of speech tagging. Lower and upper bounds on tagging precision using existing taggers or their combination are provided. Since the author shows that with existing taggers, automatic perfect tagging is not possible, two solutions for applications requiring very high precision are presented: (1) a solution involving minimum human intervention for a precision of over 98.7%, and (2) a combination of taggers using a memory based learning algorithm that succeeds in reducing the error rate by 11.6% with respect to the best tagger involved. digital.library.unt.edu/ark:/67531/metadc30950/
- Word Sense Disambiguation with Pattern Learning and Automatic Feature Selection
- This paper presents a novel approach for word sense disambiguation. The underlying algorithm has two main components: (1) pattern learning from available sense-tagged corpora (SemCor), from dictionary definitions (WordNet) and from a generated corpus (GenCor), and (2) instance based learning with automatic feature selection, when training data is available for a particular word. The ideas described in this paper were implemented in a system that achieved the best score during the SENSEVAL-2 evaluation exercise, for both English all words and English lexical sample tasks. digital.library.unt.edu/ark:/67531/metadc30945/
- Classifier Stacking and Voting for Text Filtering
- Abstract: This paper summarizes the approach and the results of the TextCat system participating in the Filtering track in the Text Retrieval Conference 2002. The system relies primarily on statistical methods, and was designed with the main purpose of having a backbone system in which we can further integrate semantic components, and evaluate their relative performance as compared to traditional statistical approaches. The system is therefore simple, and is based on techniques for keyword extraction, and various classifier combinations including stacking and voting. TextCat participated in the Batch and Routing tasks. In the Batch task, it achieved a score of 39.02% normalized utility, and 26.37% F-measure respectively, averaged over all topics. The averaged uninterpolated precision for our best routing submission was 14.16%. digital.library.unt.edu/ark:/67531/metadc30942/
- Letter Level Learning for Language Independent Diacritics Restoration
- This paper discusses letter level learning for language independent diacritics restoration. Abstract: This paper presents a method for diacritics restoration based on learning mechanisms that act at letter level. The method requires no additional tagging tools or resources other than raw text, which makes it independent of the language, and particularly appealing for languages for which there are few resources available. The algorithm was evaluated on four different languages, namely Czech, Hungarian, Polish, and Romanian, and an average accuracy of over 98% was observed. digital.library.unt.edu/ark:/67531/metadc30944/
- Dynamic Agent Population in Agent-Based Distance Vector Routing
- This paper discusses dynamic agent population in agent-based distance vector routing. Abstract: The intelligent mobile agent paradigm can be applied to a wide variety of intrinsically parallel and distributed applications. Network routing is one such application that can be mapped to an agent-based approach. The performance of any agent-based system will depend on its agent population. Although a lot of research has been conducted on agent-based systems, little consideration has been given to the importance of agent population in dynamic networks. A large number of constituent agents can increase the resource overhead of the system, thereby impeding the overall performance of the network. Hence, it is imperative to find the optimal number of agents in the system that would maximize the efficiency of the agent-based mechanism in the network. This optimal value cannot be determined manually, thereby emphasizing the need for an adaptive approach that manipulates the number of agents in the system based on its resource availability. This paper discusses an agent-based approach to Distance Vector Routing, referred to as Agent-based Distance Vector Routing, and also describes adaptive approaches that control the number of agents in the network using pheromones and discusses their limitations. digital.library.unt.edu/ark:/67531/metadc132968/
- Instance Based Learning with Automatic Feature Selection Applied to Word Sense Disambiguation
- This paper discusses instance based learning with automatic feature selection applied to word sense disambiguation. Abstract: We describe an algorithm for Word Sense Disambiguation (WSD) that relies on a lazy learner improved with automatic feature selection. The algorithm was implemented in a system that achieves excellent performance on the set of data released during the SENSEVAL-2 competition. We present the results obtained and discuss the performance of various features in the context of supervised learning algorithms for WSD. digital.library.unt.edu/ark:/67531/metadc30943/
- Building a Sense Tagged Corpus with Open Mind Word Expert
- This paper discusses building a sense tagged corpus with Open Mind Word Expert. Abstract: Open Mind Word Expert is an implemented active learning system for collecting word sense tagging from the general public over the Web. It is available at http://teach-computers.org. The authors expect the system to yield a large volume of high-quality training data at a much lower cost than the traditional method of hiring lexicographers. The authors thus propose a Senseval-3 lexical sample activity where the training data is collected via Open Mind Word Expert. If successful, the collection process can be extended to create the definitive corpus of word sense information. digital.library.unt.edu/ark:/67531/metadc81389/
- Open Mind Word Expert: Creating Large Data Collections with Web Users' Help
- This article discusses Open Mind Word Expert (OMWE). The World Wide Web has both exacerbated the need and provided an opportunity for creating automatic tools for language processing. OMWE is a system that aims to tap people's ability to disambiguate words and to give computers the benefit of people's knowledge. Any Web user can visit the OMWE site and contribute some knowledge about the meanings of given words in given sentences. As a result, OMWE creates large sense-tagged corpora that can be used to build automatic WSD systems. digital.library.unt.edu/ark:/67531/metadc83294/
- CDMA Network Design
- This presentation gives an overview of code-division multiple access (CDMA) and inter-cell effects, network capacities, sensitivity analysis of base station locations, pilot-signal power, and transmission power of the mobiles, and concludes with numerical results. digital.library.unt.edu/ark:/67531/metadc30928/
- The Semantic Wildcard
- This paper introduces the semantic wildcard. The IRSLO (Information Retrieval using Semantic and Lexical Operators) project aims at integrating semantic and lexical information into the retrieval process, in order to overcome some of the impediments currently encountered with today's information retrieval systems. This paper introduces the semantic wildcard, one of the most powerful operators implemented in IRSLO, which allows for searches along general-specific lines. The semantic wildcard, denoted with #, acts in a manner similar to the lexical wildcard, but at semantic levels, enabling the retrieval of subsumed concepts. For instance, a search for animal# will match any concept that is of type animal, including dog, goat, and so forth, thereby going beyond the explicit knowledge stated in texts. This operator and a lexical locality operator that enables the retrieval of paragraphs rather than entire documents have both been implemented in the IRSLO system and tested on requests of information run against an index of 130,000 documents. Significant improvement was observed over classic keyword-based retrieval systems in terms of precision, recall and success rate. digital.library.unt.edu/ark:/67531/metadc83308/
- Cell Design to Maximize Capacity in CDMA Networks
- This presentation discusses the code division multiple access (CDMA) inter-cell effects, capacity regions, maximizing network capacity, mobility, a call admission control algorithm, and network performance. digital.library.unt.edu/ark:/67531/metadc30929/
- Agent-based Distance Vector Routing: A Resource Efficient and Scalable approach to Routing in Large Communication Networks
- This article discusses agent-based distance vector routing. Abstract: In spite of the ever-increasing availability of computation and communication resources in modern networks, the overhead associated with network management protocols, such as traffic control and routing, continues to be an important aspect in the design of new methodologies. Resource efficiency of such protocols has become even more prominent with the recent developments of wireless and ad-hoc networks, which are marked by much more severe resource constraints in terms of bandwidth, memory, and computational capabilities. This paper presents an Agent-Based approach to Distance Vector Routing that addresses these resource constraints. Agent-Based Distance Vector Routing (ADVR) is a resource efficient implementation of Distance Vector Routing that is fault tolerant and scales well for large networks. ADVR draws upon some basic biologically inspired principles to facilitate coordination among the mobile agents that implement the routing task. Specifically, simulated pheromones are used to control the movement of agents within the network and to dynamically adjust the number of agents in the population. The behavior of ADVR is analyzed and compared to that of traditional Distance Vector Routing. digital.library.unt.edu/ark:/67531/metadc111275/
- Efficient Energy Saving Scheme for On-Chip Caches
- This paper discusses an efficient energy saving scheme for on-chip caches. Abstract: With the reduction in feature size, the static power component, such as the leakage power, dominates the dynamic power consumption in on-chip caches. It has been observed that all cache lines need not be kept alive at all times. Only the few lines in the footprint of a given window of time, i.e., those accessed during that window, need to be actively powered. Earlier research has addressed the issue of how to determine the set of active lines and how long to keep them active (powered). Circuit techniques have also been developed to keep a cache line in a low leakage state, i.e., a drowsy state, when the line is not being accessed or used. Such a cache is called a drowsy cache. These circuit techniques try to achieve maximum reduction in the leakage power without losing the information content and with minimal performance penalty associated with power transitions. These techniques, when used with an optimal switching scheme that decides when and which lines to drowse, result in maximum reduction in energy consumed. In this paper, the authors study the cache access pattern to evaluate these schemes and arrive at an optimal scheme to implement the drowsy cache. The authors achieve an average energy reduction of 88% of the maximum gain achievable through the underlying circuit technique. The authors also compare the performance of their scheme with the earlier proposed schemes and show that they can achieve up to 6% higher saving in cache energy for the benchmarks studied (with an average of 4% across all benchmarks, equally weighted) without any additional performance penalty. digital.library.unt.edu/ark:/67531/metadc94293/
- Answering complex, list and context questions with LCC's Question-Answering Server
- Abstract: This paper presents the architecture of the Question-Answering server (QAS) developed at the Language Computer Corporation (LCC) and used in the TREC-10 evaluations. LCC's QAS™ extracts answers for (a) factual questions of variable degree of difficulty; (b) questions that expect lists of answers; and (c) questions posed in the context of previous questions and answers. One of the major novelties is the implementation of bridging inference mechanisms that guide the search for answers to complex questions. Additionally, LCC's QAS™ encodes an efficient way of modeling context via reference resolution. In TREC-10, this system generated an RAR of 0.58 on the main task and 0.78 on the context task. digital.library.unt.edu/ark:/67531/metadc83297/
- Automatic generation of a coarse grained WordNet
- This paper discusses automatic generation of a coarse grained WordNet. Abstract: Several principles for the automatic transformation of WordNet into a coarser grained dictionary are proposed. A new version of WordNet is derived, leading to a reduction of 26% in the average polysemy of words, while introducing a small error rate of 2.1%, as measured on a sense tagged corpus. digital.library.unt.edu/ark:/67531/metadc83310/
- eXtended WordNet: progress report
- This paper discusses eXtended WordNet. Abstract: eXtended WordNet (XWN), a morphologically and semantically enhanced version of the WordNet dictionary, is currently being built at SMU. There are several phases in the XWN project. This paper focuses on the semantic disambiguation stage of this project, and the preprocessing required by this stage. digital.library.unt.edu/ark:/67531/metadc83309/
- Multicell CDMA Network Design
- This article discusses multicell CDMA network design. Abstract: Traditional design rules for cellular networks are not directly applicable to code division multiple access (CDMA) networks where intercell interference is not mitigated by cell placement and careful frequency planning. For transmission quality requirements, a minimum signal-to-interference ratio (SIR) must be achieved. The base-station location, its pilot-signal power (which determines the size of the cell), and the transmission power of the mobiles all affect the received SIR. In addition, because of the need for power control in CDMA networks, large cells can cause a lot of interference to adjacent small cells, posing another constraint to design. In order to maximize the network capacity associated with a design, we develop a methodology to calculate the sensitivity of capacity to base-station location, pilot-signal power, and transmission power of each mobile. To alleviate the problem caused by different cell sizes, we introduce the power compensation factor, by which the nominal power of the mobiles in every cell is adjusted. We then use the calculated sensitivities in an iterative algorithm to determine the optimal locations of the base stations, pilot-signal powers, and power compensation factors in order to maximize capacity. We show examples of how networks using these design techniques provide higher capacity than those designed using traditional techniques. digital.library.unt.edu/ark:/67531/metadc30815/
- Document Indexing using Named Entities
- This article discusses document indexing using named entities. Abstract: Current text indexing and retrieval techniques have their roots in the field of Information Retrieval where the task is to extract documents that best match a query. With an ever increasing number of documents available due to the easy access through the Internet, the challenge is to provide users with concise and relevant information. The authors propose here a novel, yet simple approach, which indexes the named entities in the documents so as to improve the relevance of the documents retrieved. Experiments performed in finding information related to a set of 75 input questions, from a large collection of 125,000 documents, show that this new technique reduces the number of retrieved documents by a factor of 2, while still retrieving the relevant documents. digital.library.unt.edu/ark:/67531/metadc83311/
- The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering
- This paper presents an open-domain textual Question-Answering system that uses several feedback loops to enhance its performance. These feedback loops combine in a new way statistical results with syntactic, semantic or pragmatic information derived from texts and lexical databases. The paper presents the contribution of each feedback loop to the overall performance of 76% human-assessed precise answers. digital.library.unt.edu/ark:/67531/metadc81390/
- FALCON: Boosting Knowledge for Answer Engines
- This paper discusses FALCON. Abstract: This paper presents the features of FALCON, an answer engine that integrates different forms of syntactic, semantic and pragmatic knowledge for the goal of achieving better performance. The answer engine handles question reformulations, finds the expected answer type from a large hierarchy that incorporates the WordNet semantic net and extracts answers after performing unifications on the semantic forms of the question and its candidate answers. To rule out erroneous answers, it provides a justification option, implemented as an abductive proof. In TREC-9, FALCON generated a score of 58% for short answers and 76% for long answers. digital.library.unt.edu/ark:/67531/metadc83296/
- Semantic Indexing using WordNet Senses
- In this article, the authors describe a Boolean Information Retrieval system that adds word semantics to the classic word based indexing. Two of the main tasks of our system, namely the indexing and retrieval components, use a combined word-based and sense-based approach. The key to our system is a methodology for building semantic representations of open text, at word and collocation level. This new technique, called semantic indexing, shows improved effectiveness over the classic word based indexing techniques. digital.library.unt.edu/ark:/67531/metadc83301/
- The Structure and Performance of an Open-Domain Question Answering System
- This paper presents the architecture, operation and results obtained with the LASSO Question Answering system developed in the Natural Language Processing Laboratory at SMU. To find answers, the system relies on a combination of syntactic and semantic techniques. The search for the answer is based on a novel form of indexing called paragraph indexing. A score of 55.5% for short answers and 64.5% for long answers was achieved at the TREC-8 competition. digital.library.unt.edu/ark:/67531/metadc83312/
- Call Admission Control Scheme for Arbitrary Traffic Distribution in CDMA Cellular Systems
- This presentation discusses call admission control (CAC). The authors define a set of feasible call configurations that results in a CAC algorithm that captures the effect of having an arbitrary traffic distribution and whose complexity scales linearly with the number of cells. digital.library.unt.edu/ark:/67531/metadc81374/
- An Iterative Approach to Word Sense Disambiguation
- This paper discusses an iterative approach to Word Sense Disambiguation. Abstract: In this paper, we present an iterative algorithm for Word Sense Disambiguation. It combines two sources of information: WordNet and a semantic tagged corpus, for the purpose of identifying the correct sense of the words in a given text. It differs from other standard approaches in that the disambiguation process is performed in an iterative manner: starting from free text, a set of disambiguated words is built, using various methods; new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This iterative process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%. digital.library.unt.edu/ark:/67531/metadc83304/
- A Semi-Complete Disambiguation Algorithm for Open Text
- This paper discusses a semi-complete disambiguation algorithm for open text. Word Sense Disambiguation (WSD) is one of the most difficult areas of Natural Language Processing (NLP); the semantic comprehension of a text, and the possibility of expanding a text with semantically related information, depend heavily on the availability of a highly accurate WSD algorithm. Solutions considered so far by researchers for the WSD problem make use of machine readable dictionaries (Leacock, Chodorow and Miller 1998), or of information gathered from raw or semantically disambiguated corpora (Yarowsky 1995). These methods are designed either to work with a few pre-selected words, in which case a high accuracy is obtained, or they are general methods which disambiguate, with lower precision, all the words in a text. With the present work, the authors are trying to achieve a compromise between these two different directions. There are fields in NLP, like Information Retrieval and others, which could benefit from a method which performs a semi-complete disambiguation (i.e. it disambiguates only a certain percentage of the words in a text), but which is highly accurate. digital.library.unt.edu/ark:/67531/metadc83293/
- LASSO: A Tool for Surfing the Answer Net
- This paper discusses LASSO, a tool for surfing the answer net. Abstract: This paper presents the architecture, operation and results obtained with the LASSO system developed in the Natural Language Processing Laboratory at SMU. The system relies on a combination of syntactic and semantic techniques, and lightweight abductive inference to find answers. The search for the answer is based on a novel form of indexing called paragraph indexing. A score of 55.5% for short answers and 64.5% for long answers was achieved. digital.library.unt.edu/ark:/67531/metadc83331/
- Cell Placement in a CDMA Network
- This presentation discusses research on cell placement in a CDMA network. In order to enable iterative cell placement the authors use a computationally efficient iterative process to calculate the inter-cell and intra-cell interferences as a function of pilot-signal power and base station location. digital.library.unt.edu/ark:/67531/metadc81375/
- Improving the search on the Internet by using WordNet and lexical operators
- This article discusses improving the search on the internet by using WordNet and lexical operators. Abstract: This paper presents a natural language interface system to an Internet search engine that provides the following improvements: (1) accepts natural language (English) questions, (2) expands the query, based on a word sense disambiguation method, and (3) uses a new lexical operator to post-process the documents retrieved for extracting only the part of a document that is relevant to a query. The system was tested on 100 queries of which 50 were adopted from the TIPSTER topics collection, provided at the 6th Text Retrieval Conference (TREC-6) and 50 were selected from among the queries submitted by users to an existing Web search engine. The results obtained demonstrate a substantial increase in both the precision and the percentage of queries answered correctly, while the amount of text presented to the user is reduced in comparison with the current Internet search engine technology. digital.library.unt.edu/ark:/67531/metadc83306/
- Flexible Allocation of Capacity in Multi-Cell CDMA Networks
- This presentation discusses flexible allocation of capacity in multi-cell CDMA networks. The effect of reverse power levels on the capacity of a code-division multiple-access (CDMA) cellular network is evaluated. The inter-cell and intra-cell interferences of every cell on every other cell are first calculated for a given network topology. Based on this, the nominal power of users is increased by a factor the authors call the Power Compensation Factor (PCF) which enables small cells to overcome the excessive interference from adjacent large cells. digital.library.unt.edu/ark:/67531/metadc81377/
- Effects of Call Arrival Rate and Mobility on Network Throughput in Multi-Cell CDMA
- This presentation discusses call arrival rate and mobility. The effect of call arrival rate on the capacity of a code-division multiple-access (CDMA) cellular network is evaluated. First the inter-cell and intra-cell interferences of every cell on every other cell are calculated for a given network topology. Then the capacity region for the number of simultaneous calls in every cell is defined for specified system parameters. This region is used to evaluate the new call blocking and handoff call blocking probabilities. digital.library.unt.edu/ark:/67531/metadc81376/
- A Method for Word Sense Disambiguation of Unrestricted Text
- This paper discusses a method for word sense disambiguation of unrestricted text. Abstract: Selecting the most appropriate sense for an ambiguous word in a sentence is a central problem in Natural Language Processing. In this paper, the authors present a method that attempts to disambiguate all the nouns, verbs, adverbs and adjectives in a text, using the senses provided in WordNet. The senses are ranked using two sources of information: (1) the Internet for gathering statistics for word-word co-occurrences and (2) WordNet for measuring the semantic density for a pair of words. The authors report an average accuracy of 80% for the first ranked sense, and 91% for the first two ranked senses. Extensions of this method for larger windows of more than two words are considered. digital.library.unt.edu/ark:/67531/metadc83302/
- An Automatic Method for Generating Sense Tagged Corpora
- This paper discusses an automatic method for generating sense tagged corpora. Abstract: The unavailability of very large corpora with semantically disambiguated words is a major limitation in text processing research. For example, statistical methods for word sense disambiguation of free text are known to achieve high accuracy results when large corpora are available to develop context rules, to train and test them. This article presents a novel approach to automatically generate arbitrarily large corpora for word senses. The method is based on (1) the information provided in WordNet, used to formulate queries consisting of synonyms or definitions of word senses, and (2) the information gathered from the Internet using existing search engines. The method was tested on 120 word senses and a precision of 91% was observed. digital.library.unt.edu/ark:/67531/metadc83300/
- Word Sense Disambiguation based on Semantic Density
- This paper presents a Word Sense Disambiguation method based on the idea of semantic density between words. The disambiguation is done in the context of WordNet. The Internet is used as a raw corpus to provide statistical information for word associations. A metric is introduced and used to measure the semantic density and to rank all possible combinations of the senses of two words. This method provides a precision of 58% in indicating the correct sense for both words at the same time. The precision increases as we consider more choices: 70% for top two ranked and 73% for top three ranked. digital.library.unt.edu/ark:/67531/metadc83303/
- A WordNet-Based Interface to Internet Search Engines
- This paper discusses a WordNet-based interface to Internet search engines. A vast amount of information is available on the Internet, and naturally, many information gathering tools have been developed. Several search engines with different characteristics, such as Alta Vista, Lycos, Infoseek, and others are available. However, web information retrieval technology is still in its infancy, and there is a need for considerable improvement. Some inherent difficulties are: (1) the web information is diverse and highly unstructured, (2) the size of information is large and it grows at an exponential rate, and (3) the current search engine technology is still rudimentary. While the first two issues are more profound and require long term solutions, it may be possible to develop software around the search engines to improve the quality of the information retrieved. In this paper the authors present a natural language interface system to a search engine and discuss some of the results obtained. digital.library.unt.edu/ark:/67531/metadc83305/
- CCAP: A Strategic Tool for Managing Capacity of CDMA Networks
- This presentation discusses CCAP, a strategic tool for managing capacity of CDMA networks. CCAP is a graphical interactive tool for CDMA that calculates the coverage area, call capacity of a CDMA network, and subscriber network performance to optimize capacity. digital.library.unt.edu/ark:/67531/metadc81373/