A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources

Date: May 2008
Creator: Banea, Carmen; Wiebe, Janyce M. & Mihalcea, Rada, 1974-
Description: This article discusses a bootstrapping method for building subjectivity lexicons for languages with scarce resources.
Contributing Partner: UNT College of Engineering
Networks and Natural Language Processing

Date: September 2008
Creator: Radev, Dragomir R. & Mihalcea, Rada, 1974-
Description: Article discussing networks and natural language processing. The authors present some of the most successful graph-based representations and algorithms used in language processing and try to explain how and why they work.
Contributing Partner: UNT College of Engineering
[Review] The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data

Date: March 2008
Creator: Mihalcea, Rada, 1974-
Description: This book review discusses 'The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data' by Ronen Feldman and James Sanger. The book is an introduction to text mining, covering the general architecture of text mining systems, along with the main techniques used by such systems.
Contributing Partner: UNT College of Engineering
Learning to Identify Emotions in Text

Date: March 2008
Creator: Strapparava, Carlo, 1962- & Mihalcea, Rada, 1974-
Description: This paper discusses learning to identify emotions in text.
Contributing Partner: UNT College of Engineering
Linguistic Ethnography: Identifying Dominant Word Classes in Text

Date: March 2009
Creator: Pulman, Stephen & Mihalcea, Rada, 1974-
Description: This paper discusses linguistic ethnography.
Contributing Partner: UNT College of Engineering
Machine Language Techniques for Conversational Agents

Date: December 2003
Creator: Sule, Manisha D.
Description: Machine learning is the ability of a machine to perform better at a given task by using its previous experience. Algorithms such as decision trees, Bayesian learning, artificial neural networks, and instance-based learning are widely used in machine learning systems. Current applications of machine learning include credit card fraud detection, customer service based on the history of purchased products, games, and many more. The application of machine learning techniques to natural language processing (NLP) has increased tremendously in recent years; examples are handwriting recognition and speech recognition. The problem tackled in this Problem in Lieu of Thesis is applying machine learning techniques to improve the performance of a conversational agent. The OpenMind repository of common sense, in the form of question-answer pairs, is treated as the training data for the machine learning system. The system interfaces with WordNet to capture important semantic and syntactic information about the words in the sentences. Further, the k-closest neighbors algorithm, an instance-based learning algorithm, is used to simulate a case-based learning system. The resulting system is expected to be able to answer new queries with knowledge gained from its training data.
Contributing Partner: UNT Libraries
UNT 2005 TREC QA Participation: Using Lemur as IR Search Engine

Date: 2005
Creator: Chen, Jiangping; Yu, Ping & Ge, He
Description: This paper reports the authors' TREC 2005 QA participation. The authors' QA system, Eagle QA, developed last year, was expanded and modified for this year's QA experiments. In particular, the authors used Lemur 4.1 as the Information Retrieval (IR) engine this year to find documents in the collection that may contain answers to the test questions. The authors' results show that Lemur did a reasonable job of finding relevant documents, but there is room for further improvement.
Contributing Partner: UNT College of Information
UNT at TREC 2004: Question Answering Combining Multiple Evidences

Date: 2004
Creator: Chen, Jiangping; Ge, He; Wu, Yan & Jiang, Shikun
Description: This paper discusses Question Answering (QA) combining multiple evidences.
Contributing Partner: UNT College of Information
Texas Newspapers Natural Language Processing

Date: April 7, 2013
Creator: Torget, Andrew J., 1978-
Description: This dataset includes data on natural language processing from the Texas Newspapers Project. The dataset includes word counts, named entity recognition results, and topic models.
Contributing Partner: UNT Libraries
Automatic Tagging of Communication Data

Date: August 2012
Creator: Hoyt, Matthew Ray
Description: Globally distributed software teams are widespread throughout industry, but finding reliable methods that can properly assess a team's activities is a real challenge. Methods such as surveys and manual coding of activities are too time-consuming and are often unreliable. Recent advances in information retrieval and linguistics, however, suggest that automated and/or semi-automated text classification algorithms could be an effective way of finding differences in the communication patterns among individuals and groups. Communication among group members is frequent and generates a significant amount of data. Thus, a web-based tool that can automatically analyze the communication patterns among global software teams could lead to a better understanding of group performance. The goal of this thesis, therefore, is to compare automatic and semi-automatic measures of communication and evaluate their effectiveness in classifying different types of group activities that occur within a global software development project. To achieve this goal, we developed a web-based component that can be used to help clean and classify communication activities. The component was then used to compare different automated text classification techniques on various group activities to determine their effectiveness in correctly classifying data from a global software development team project.
Contributing Partner: UNT Libraries