UNT Theses and Dissertations - Browse

ABOUT BROWSE FEED

Bayesian Probabilistic Reasoning Applied to Mathematical Epidemiology for Predictive Spatiotemporal Analysis of Infectious Diseases

Description: Abstract Probabilistic reasoning under uncertainty suits well to analysis of disease dynamics. The stochastic nature of disease progression is modeled by applying the principles of Bayesian learning. Bayesian learning predicts the disease progression, including prevalence and incidence, for a geographic region and demographic composition. Public health resources, prioritized by the order of risk levels of the population, will efficiently minimize the disease spread and curtail the epidemic at the earliest. A Bayesian network representing the outbreak of influenza and pneumonia in a geographic region is ported to a newer region with different demographic composition. Upon analysis for the newer region, the corresponding prevalence of influenza and pneumonia among the different demographic subgroups is inferred for the newer region. Bayesian reasoning coupled with disease timeline is used to reverse engineer an influenza outbreak for a given geographic and demographic setting. The temporal flow of the epidemic among the different sections of the population is analyzed to identify the corresponding risk levels. In comparison to spread vaccination, prioritizing the limited vaccination resources to the higher risk groups results in relatively lower influenza prevalence. HIV incidence in Texas from 1989-2002 is analyzed using demographic based epidemic curves. Dynamic Bayesian networks are integrated with probability distributions of HIV surveillance data coupled with the census population data to estimate the proportion of HIV incidence among the different demographic subgroups. Demographic based risk analysis lends to observation of varied spectrum of HIV risk among the different demographic subgroups. A methodology using hidden Markov models is introduced that enables to investigate the impact of social behavioral interactions in the incidence and prevalence of infectious diseases. The methodology is presented in the context of simulated disease outbreak data for influenza. Probabilistic reasoning analysis enhances the understanding of disease progression in order to identify the critical points of surveillance, ...
Date: May 2006
Creator: Abbas, Kaja Moinudeen
Partner: UNT Libraries

Boosting for Learning From Imbalanced, Multiclass Data Sets

Description: In many real-world applications, it is common to have uneven number of examples among multiple classes. The data imbalance, however, usually complicates the learning process, especially for the minority classes, and results in deteriorated performance. Boosting methods were proposed to handle the imbalance problem. These methods need elongated training time and require diversity among the classifiers of the ensemble to achieve improved performance. Additionally, extending the boosting method to handle multi-class data sets is not straightforward. Examples of applications that suffer from imbalanced multi-class data can be found in face recognition, where tens of classes exist, and in capsule endoscopy, which suffers massive imbalance between the classes. This dissertation introduces RegBoost, a new boosting framework to address the imbalanced, multi-class problems. This method applies a weighted stratified sampling technique and incorporates a regularization term that accommodates multi-class data sets and automatically determines the error bound of each base classifier. The regularization parameter penalizes the classifier when it misclassifies instances that were correctly classified in the previous iteration. The parameter additionally reduces the bias towards majority classes. Experiments are conducted using 12 diverse data sets with moderate to high imbalance ratios. The results demonstrate superior performance of the proposed method compared to several state-of-the-art algorithms for imbalanced, multi-class classification problems. More importantly, the sensitivity improvement of the minority classes using RegBoost is accompanied with the improvement of the overall accuracy for all classes. With unpredictability regularization, a diverse group of classifiers are created and the maximum accuracy improvement reaches above 24%. Using stratified undersampling, RegBoost exhibits the best efficiency. The reduction in computational cost is significant reaching above 50%. As the volume of training data increase, the gain of efficiency with the proposed method becomes more significant.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2013
Creator: Abouelenien, Mohamed
Partner: UNT Libraries

Qos Aware Service Oriented Architecture

Description: Service-oriented architecture enables web services to operate in a loosely-coupled setting and provides an environment for dynamic discovery and use of services over a network using standards such as WSDL, SOAP, and UDDI. Web service has both functional and non-functional characteristics. This thesis work proposes to add QoS descriptions (non-functional properties) to WSDL and compose various services to form a business process. This composition of web services also considers QoS properties along with functional properties and the composed services can again be published as a new Web Service and can be part of any other composition using Composed WSDL.
Date: August 2013
Creator: Adepu, Sagarika
Partner: UNT Libraries

Computer Realization of Human Music Cognition

Description: This study models the human process of music cognition on the digital computer. The definition of music cognition is derived from the work in music cognition done by the researchers Carol Krumhansl and Edward Kessler, and by Mari Jones, as well as from the music theories of Heinrich Schenker. The computer implementation functions in three stages. First, it translates a musical "performance" in the form of MIDI (Musical Instrument Digital Interface) messages into LISP structures. Second, the various parameters of the performance are examined separately a la Jones's joint accent structure, quantified according to psychological findings, and adjusted to a common scale. The findings of Krumhansl and Kessler are used to evaluate the consonance of each note with respect to the key of the piece and with respect to the immediately sounding harmony. This process yields a multidimensional set of points, each of which is a cognitive evaluation of a single musical event within the context of the piece of music within which it occurred. This set of points forms a metric space in multi-dimensional Euclidean space. The third phase of the analysis maps the set of points into a topology-preserving data structure for a Schenkerian-like middleground structural analysis. This process yields a hierarchical stratification of all the musical events (notes) in a piece of music. It has been applied to several pieces of music with surprising results. In each case, the analysis obtained very closely resembles a structural analysis which would be supplied by a human theorist. The results obtained invite us to take another look at the representation of knowledge and perception from another perspective, that of a set of points in a topological space, and to ask if such a representation might not be useful in other domains. It also leads us to ask if such a ...
Date: August 1988
Creator: Albright, Larry E. (Larry Eugene)
Partner: UNT Libraries

3GPP Long Term Evolution LTE Scheduling

Description: Future generation cellular networks are expected to deliver an omnipresent broadband access network for an endlessly increasing number of subscribers. Long term Evolution (LTE) represents a significant milestone towards wireless networks known as 4G cellular networks. A key feature of LTE is the implementation of enhanced Radio Resource Management (RRM) mechanism to improve the system performance. The structure of LTE networks was simplified by diminishing the number of the nodes of the core network. Also, the design of the radio protocol architecture is quite unique. In order to achieve high data rate in LTE, 3rd Generation Partnership Project (3GPP) has selected Orthogonal Frequency Division Multiplexing (OFDM) as an appropriate scheme in terms of downlinks. However, the proper scheme for an uplink is the Single-Carrier Frequency Domain Multiple Access due to the peak-to-average-power-ratio (PAPR) constraint. LTE packet scheduling plays a primary role as part of RRM to improve the system’s data rate as well as supporting various QoS requirements of mobile services. The major function of the LTE packet scheduler is to assign Physical Resource Blocks (PRBs) to mobile User Equipment (UE). In our work, we formed a proposed packet scheduler algorithm. The proposed scheduler algorithm acts based on the number of UEs attached to the eNodeB. To evaluate the proposed scheduler algorithm, we assumed two different scenarios based on a number of UEs. When the number of UE is lower than the number of PRBs, the UEs with highest Channel Quality Indicator (CQI) will be assigned PRBs. Otherwise, the scheduler will assign PRBs based on a given proportional fairness metric. The eNodeB’s throughput is increased when the proposed algorithm was implemented.
Date: December 2013
Creator: Alotaibi, Sultan
Partner: UNT Libraries

Real-time Rendering of Burning Objects in Video Games

Description: In recent years there has been growing interest in limitless realism in computer graphics applications. Among those, my foremost concentration falls into the complex physical simulations and modeling with diverse applications for the gaming industry. Different simulations have been virtually successful by replicating the details of physical process. As a result, some were strong enough to lure the user into believable virtual worlds that could destroy any sense of attendance. In this research, I focus on fire simulations and its deformation process towards various virtual objects. In most game engines model loading takes place at the beginning of the game or when the game is transitioning between levels. Game models are stored in large data structures. Since changing or adjusting a large data structure while the game is proceeding may adversely affect the performance of the game. Therefore, developers may choose to avoid procedural simulations to save resources and avoid interruptions on performance. I introduce a process to implement a real-time model deformation while maintaining performance. It is a challenging task to achieve high quality simulation while utilizing minimum resources to represent multiple events in timely manner. Especially in video games, this overwhelming criterion would be robust enough to sustain the engaging player's willing suspension of disbelief. I have implemented and tested my method on a relatively modest GPU using CUDA. My experiments conclude this method gives a believable visual effect while using small fraction of CPU and GPU resources.
Date: August 2013
Creator: Amarasinghe, Dhanyu Eshaka
Partner: UNT Libraries

An Integrated Architecture for Ad Hoc Grids

Description: Extensive research has been conducted by the grid community to enable large-scale collaborations in pre-configured environments. grid collaborations can vary in scale and motivation resulting in a coarse classification of grids: national grid, project grid, enterprise grid, and volunteer grid. Despite the differences in scope and scale, all the traditional grids in practice share some common assumptions. They support mutually collaborative communities, adopt a centralized control for membership, and assume a well-defined non-changing collaboration. To support grid applications that do not confirm to these assumptions, we propose the concept of ad hoc grids. In the context of this research, we propose a novel architecture for ad hoc grids that integrates a suite of component frameworks. Specifically, our architecture combines the community management framework, security framework, abstraction framework, quality of service framework, and reputation framework. The overarching objective of our integrated architecture is to support a variety of grid applications in a self-controlled fashion with the help of a self-organizing ad hoc community. We introduce mechanisms in our architecture that successfully isolates malicious elements from the community, inherently improving the quality of grid services and extracting deterministic quality assurances from the underlying infrastructure. We also emphasize on the technology-independence of our architecture, thereby offering the requisite platform for technology interoperability. The feasibility of the proposed architecture is verified with a high-quality ad hoc grid implementation. Additionally, we have analyzed the performance and behavior of ad hoc grids with respect to several control parameters.
Date: May 2006
Creator: Amin, Kaizar Abdul Husain
Partner: UNT Libraries

Resource Efficient and Scalable Routing using Intelligent Mobile Agents

Description: Many of the contemporary routing algorithms use simple mechanisms such as flooding or broadcasting to disseminate the routing information available to them. Such routing algorithms cause significant network resource overhead due to the large number of messages generated at each host/router throughout the route update process. Many of these messages are wasteful since they do not contribute to the route discovery process. Reducing the resource overhead may allow for several algorithms to be deployed in a wide range of networks (wireless and ad-hoc) which require a simple routing protocol due to limited availability of resources (memory and bandwidth). Motivated by the need to reduce the resource overhead associated with routing algorithms a new implementation of distance vector routing algorithm using an agent-based paradigm known as Agent-based Distance Vector Routing (ADVR) has been proposed. In ADVR, the ability of route discovery and message passing shifts from the nodes to individual agents that traverse the network, co-ordinate with each other and successively update the routing tables of the nodes they visit.
Date: May 2003
Creator: Amin, Kaizar Abdul Husain
Partner: UNT Libraries

Independent Quadtrees

Description: This dissertation deals with the problem of manipulating and storing an image using quadtrees. A quadtree is a tree in which each node has four ordered children or is a leaf. It can be used to represent an image via hierarchical decomposition. The image is broken into four regions. A region can be a solid color (homogeneous) or a mixture of colors (heterogeneous). If a region is heterogeneous it is broken into four subregions, and the process continues recursively until all subregions are homogeneous. The traditional quadtree suffers from dependence on the underlying grid. The grid coordinate system is implicit, and therefore fixed. The fixed coordinate system implies a rigid tree. A rigid tree cannot be translated, scaled, or rotated. Instead, a new tree must be built which is the result of one of these transformations. This dissertation introduces the independent quadtree. The independent quadtree is free of any underlying coordinate system. The tree is no longer rigid and can be easily translated, scaled, or rotated. Algorithms to perform these operations axe presented. The translation and rotation algorithms take constant time. The scaling algorithm has linear time in the number nodes in the tree. The disadvantage of independent quadtrees is the longer generation and display time. This dissertation also introduces an alternate method of hierarchical decomposition. This new method finds the largest homogeneous block with respect to the corners of the image. This block defines the division point for the decomposition. If the size of the block is below some cutoff point, it is deemed to be to small to make the overhead worthwhile and the traditional method is used instead. This new method is compared to the traditional method on randomly generated rectangles, triangles, and circles. The new method is shown to use significantly less space for all three ...
Date: December 1986
Creator: Atwood, Larry D. (Larry Dale)
Partner: UNT Libraries

Inheritance Problems in Object-Oriented Database

Description: This research is concerned with inheritance as used in object-oriented database. More specifically, partial bi-directional inheritance among classes is examined. In partial inheritance, a class can inherit a proper subset of instance variables from another class. Two subclasses of the same superclass do not need to inherit the same proper subset of instance variables from their superclass. Bi-directional partial inheritance allows a class to inherit instance variables from its subclass. The prototype of an object-oriented database that supports both full and partial bi-directional inheritance among classes was developed on top of an existing relational database management system. The prototype was tested with two database applications. One database application needs full and partial inheritance. The second database application required bi-directional inheritance. The result of this testing suggests both advantages and disadvantages of partial bi-directional inheritance. Future areas of research are also suggested.
Date: May 1989
Creator: Auepanwiriyakul, Raweewan
Partner: UNT Libraries

Privacy Management for Online Social Networks

Description: One in seven people in the world use online social networking for a variety of purposes -- to keep in touch with friends and family, to share special occasions, to broadcast announcements, and more. The majority of society has been bought into this new era of communication technology, which allows everyone on the internet to share information with friends. Since social networking has rapidly become a main form of communication, holes in privacy have become apparent. It has come to the point that the whole concept of sharing information requires restructuring. No longer are online social networks simply technology available for a niche market; they are in use by all of society. Thus it is important to not forget that a sense of privacy is inherent as an evolutionary by-product of social intelligence. In any context of society, privacy needs to be a part of the system in order to help users protect themselves from others. This dissertation attempts to address the lack of privacy management in online social networks by designing models which understand the social science behind how we form social groups and share information with each other. Social relationship strength was modeled using activity patterns, vocabulary usage, and behavioral patterns. In addition, automatic configuration for default privacy settings was proposed to help prevent new users from leaking personal information. This dissertation aims to mobilize a new era of social networking that understands social aspects of human network, and uses that knowledge to honor users' privacy.
Date: August 2013
Creator: Baatarjav, Enkh-Amgalan
Partner: UNT Libraries

Unique Channel Email System

Description: Email connects 85% of the world. This paper explores the pattern of information overload encountered by majority of email users and examine what steps key email providers are taking to combat the problem. Besides fighting spam, popular email providers offer very limited tools to reduce the amount of unwanted incoming email. Rather, there has been a trend to expand storage space and aid the organization of email. Storing email is very costly and harmful to the environment. Additionally, information overload can be detrimental to productivity. We propose a simple solution that results in drastic reduction of unwanted mail, also known as graymail.
Date: August 2015
Creator: Balakchiev, Milko
Partner: UNT Libraries

The Role of Intelligent Mobile Agents in Network Management and Routing

Description: In this research, the application of intelligent mobile agents to the management of distributed network environments is investigated. Intelligent mobile agents are programs which can move about network systems in a deterministic manner in carrying their execution state. These agents can be considered an application of distributed artificial intelligence where the (usually small) agent code is moved to the data and executed locally. The mobile agent paradigm offers potential advantages over many conventional mechanisms which move (often large) data to the code, thereby wasting available network bandwidth. The performance of agents in network routing and knowledge acquisition has been investigated and simulated. A working mobile agent system has also been designed and implemented in JDK 1.2.
Date: December 2000
Creator: Balamuru, Vinay Gopal
Partner: UNT Libraries

FORTRAN Optimizations at the Source Code Level

Description: This paper discusses FORTRAN optimizations that the user can perform manually at the source code level to improve object code performance. It makes use of descriptive examples within the text of the paper for explanatory purposes. The paper defines key areas in writing a FORTRAN program and recommends ways to improve efficiency in these areas.
Date: August 1977
Creator: Barber, Willie D.
Partner: UNT Libraries

Multi-perspective, Multi-modal Image Registration and Fusion

Description: Multi-modal image fusion is an active research area with many civilian and military applications. Fusion is defined as strategic combination of information collected by various sensors from different locations or different types in order to obtain a better understanding of an observed scene or situation. Fusion of multi-modal images cannot be completed unless these two modalities are spatially aligned. In this research, I consider two important problems. Multi-modal, multi-perspective image registration and decision level fusion of multi-modal images. In particular, LiDAR and visual imagery. Multi-modal image registration is a difficult task due to the different semantic interpretation of features extracted from each modality. This problem is decoupled into three sub-problems. The first step is identification and extraction of common features. The second step is the determination of corresponding points. The third step consists of determining the registration transformation parameters. Traditional registration methods use low level features such as lines and corners. Using these features require an extensive optimization search in order to determine the corresponding points. Many methods use global positioning systems (GPS), and a calibrated camera in order to obtain an initial estimate of the camera parameters. The advantages of our work over the previous works are the following. First, I used high level-features, which significantly reduce the search space for the optimization process. Second, the determination of corresponding points is modeled as an assignment problem between a small numbers of objects. On the other side, fusing LiDAR and visual images is beneficial, due to the different and rich characteristics of both modalities. LiDAR data contain 3D information, while images contain visual information. Developing a fusion technique that uses the characteristics of both modalities is very important. I establish a decision-level fusion technique using manifold models.
Date: August 2012
Creator: Belkhouche, Mohammed Yassine
Partner: UNT Libraries

Hopfield Networks as an Error Correcting Technique for Speech Recognition

Description: I experimented with Hopfield networks in the context of a voice-based, query-answering system. Hopfield networks are used to store and retrieve patterns. I used this technique to store queries represented as natural language sentences and I evaluated the accuracy of the technique for error correction in a spoken question-answering dialog between a computer and a user. I show that the use of an auto-associative Hopfield network helps make the speech recognition system more fault tolerant. I also looked at the available encoding schemes to convert a natural language sentence into a pattern of zeroes and ones that can be stored in the Hopfield network reliably, and I suggest scalable data representations which allow storing a large number of queries.
Access: This item is restricted to the UNT Community Members at a UNT Libraries Location.
Date: May 2004
Creator: Bireddy, Chakradhar
Partner: UNT Libraries

Modeling and Simulation of the Vector-Borne Dengue Disease and the Effects of Regional Variation of Temperature in the Disease Prevalence in Homogenous and Heterogeneous Human Populations

Description: The history of mitigation programs to contain vector-borne diseases is a story of successes and failures. Due to the complex interplay among multiple factors that determine disease dynamics, the general principles for timely and specific intervention for incidence reduction or eradication of life-threatening diseases has yet to be determined. This research discusses computational methods developed to assist in the understanding of complex relationships affecting vector-borne disease dynamics. A computational framework to assist public health practitioners with exploring the dynamics of vector-borne diseases, such as malaria and dengue in homogenous and heterogeneous populations, has been conceived, designed, and implemented. The framework integrates a stochastic computational model of interactions to simulate horizontal disease transmission. The intent of the computational modeling has been the integration of stochasticity during simulation of the disease progression while reducing the number of necessary interactions to simulate a disease outbreak. While there are improvements in the computational time reducing the number of interactions needed for simulating disease dynamics, the realization of interactions can remain computationally expensive. Using multi-threading technology to improve performance upon the original computational model, multi-threading experimental results have been tested and reported. In addition, to the contact model, the modeling of biological processes specific to the corresponding pathogen-carrier vector to increase the specificity of the vector-borne disease has been integrated. Last, automation for requesting, retrieving, parsing, and storing specific weather data and geospatial information from federal agencies to study the differences between homogenous and heterogeneous populations has been implemented.
Date: August 2016
Creator: Bravo-Salgado, Angel D
Partner: UNT Libraries

Freeform Cursive Handwriting Recognition Using a Clustered Neural Network

Description: Optical character recognition (OCR) software has advanced greatly in recent years. Machine-printed text can be scanned and converted to searchable text with word accuracy rates around 98%. Reasonably neat hand-printed text can be recognized with about 85% word accuracy. However, cursive handwriting still remains a challenge, with state-of-the-art performance still around 75%. Algorithms based on hidden Markov models have been only moderately successful, while recurrent neural networks have delivered the best results to date. This thesis explored the feasibility of using a special type of feedforward neural network to convert freeform cursive handwriting to searchable text. The hidden nodes in this network were grouped into clusters, with each cluster being trained to recognize a unique character bigram. The network was trained on writing samples that were pre-segmented and annotated. Post-processing was facilitated in part by using the network to identify overlapping bigrams that were then linked together to form words and sentences. With dictionary assisted post-processing, the network achieved word accuracy of 66.5% on a small, proprietary corpus. The contributions in this thesis are threefold: 1) the novel clustered architecture of the feed-forward neural network, 2) the development of an expanded set of observers combining image masks, modifiers, and feature characterizations, and 3) the use of overlapping bigrams as the textual working unit to assist in context analysis and reconstruction.
Date: August 2015
Creator: Bristow, Kelly H.
Partner: UNT Libraries

SEM Predicting Success of Student Global Software Development Teams

Description: The extensive use of global teams to develop software has prompted researchers to investigate various factors that can enhance a team’s performance. While a significant body of research exists on global software teams, previous research has not fully explored the interrelationships and collective impact of various factors on team performance. This study explored a model that added the characteristics of a team’s culture, ability, communication frequencies, response rates, and linguistic categories to a central framework of team performance. Data was collected from two student software development projects that occurred between teams located in the United States, Panama, and Turkey. The data was obtained through online surveys and recorded postings of team activities that occurred throughout the global software development projects. Partial least squares path modeling (PLS-PM) was chosen as the analytic technique to test the model and identify the most influential factors. Individual factors associated with response rates and linguistic characteristics proved to significantly affect a team’s activity related to grade on the project, group cohesion, and the number of messages received and sent. Moreover, an examination of possible latent homogeneous segments in the model supported the existence of differences among groups based on leadership style. Teams with assigned leaders tended to have stronger relationships between linguistic characteristics and team performance factors, while teams with emergent leaders had stronger. Relationships between response rates and team performance factors. The contributions in this dissertation are three fold. 1) Novel analysis techniques using PLS-PM and clustering, 2) Use of new, quantifiable variables in analyzing team activity, 3) Identification of plausible causal indicators for team performance and analysis of the same.
Date: May 2015
Creator: Brooks, Ian Robert
Partner: UNT Libraries

Computerized Analysis of Radiograph Images of Embedded Objects as Applied to Bone Location and Mineral Content Measurement

Description: This investigation dealt with locating and measuring x-ray absorption of radiographic images. The methods developed provide a fast, accurate, minicomputer control, for analysis of embedded objects. A PDP/8 computer system was interfaced with a Joyce Loebl 3CS Microdensitometer and a Leeds & Northrup Recorder. Proposed algorithms for bone location and data smoothing work on a twelve-bit minicomputer. Designs of a software control program and operational procedure are presented. The filter made wedge and limb scans monotonic from minima to maxima. It was tested for various convoluted intervals. Ability to resmooth the same data in multiple passes was tested. An interval size of fifteen works well in one pass.
Date: August 1976
Creator: Buckner, Richard L.
Partner: UNT Libraries

A Design Approach for Digital Computer Peripheral Controllers, Case Study Design and Construction

Description: The purpose of this project was to describe a novel design approach for a digital computer peripheral controller, then design and construct a case study controller. This document consists of three chapters and an appendix. Chapter II presents the design approach chosen; a variation to a design presented by Charles R. Richards in an article published in Electronics magazine. Richards' approach consists of a finite state machine circuitry controlling all the functions of a controller. The variation to Richards' approach consists of considering the various logically independent processes which a controller carries out and assigning control of each process to a separate finite state machine. The appendix contains the documentation of the design and construction of the controller.
Date: May 1976
Creator: Cabrera, A. L.
Partner: UNT Libraries

Practical Cursive Script Recognition

Description: This research focused on the off-line cursive script recognition application. The problem is very large and difficult and there is much room for improvement in every aspect of the problem. Many different aspects of this problem were explored in pursuit of solutions to create a more practical and usable off-line cursive script recognizer than is currently available.
Date: August 1995
Creator: Carroll, Johnny Glen, 1953-
Partner: UNT Libraries

Automated Testing of Interactive Systems

Description: Computer systems which interact with human users to collect, update or provide information are growing more complex. Additionally, users are demanding more thorough testing of all computer systems. Because of the complexity and thoroughness required, automation of interactive systems testing is desirable, especially for functional testing. Many currently available testing tools, like program proving, are impractical for testing large systems. The solution presented here is the development of an automated test system which simulates human users. This system incorporates a high-level programming language, ATLIS. ATLIS programs are compiled and interpretively executed. Programs are selected for execution by operator command, and failures are reported to the operator's console. An audit trail of all activity is provided. This solution provides improved efficiency and effectiveness over conventional testing methods.
Date: May 1977
Creator: Cartwright, Stephen C.
Partner: UNT Libraries

Investigating the Extractive Summarization of Literary Novels

Description: Abstract Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are witnessing however a change: an increasingly larger number of books become available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While there is a significant body of research that has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels are different in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents, and outperform the traditional extractive summarization systems typically addressing news genre.
Date: December 2011
Creator: Ceylan, Hakan
Partner: UNT Libraries