UNT Theses and Dissertations - 195 Matching Results

Search Results

Mediation on XQuery Views

Description: The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I use XML and XQuery as the standard technologies and develop an extended projection algorithm to provide a solution to the information integration problem. To demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML-related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) into a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm to decompose an outsourced query into atomic queries, each of which is executed on a single data source. I also developed an algorithm to generate integrating queries, which the system uses to compose the answers from the atomic queries into a single answer to the original user query. I present a proof of both the extended XML document projection algorithm and the query integration algorithm. An analysis of the efficiency of the new extended algorithm is also presented. Finally, I describe a collaborative schema-matching tool that was implemented to facilitate maintaining metadata.
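As a rough illustration of the mediation flow described above (global query, per-source atomic queries, integrating query), the following Python sketch mimics the idea with plain dictionaries standing in for XML sources; the names SOURCES, atomic_query, integrate, and mediate are hypothetical and not part of the Omphalos system.

```python
# Illustrative sketch only: a toy mediator that mimics the flow described above
# (global query -> per-source atomic queries -> integrated answer).
# The real system works on XML/XQuery; plain Python dicts stand in for sources here.

SOURCES = {
    "library_a": [{"title": "XML Basics", "year": 2004},
                  {"title": "Query Rewriting", "year": 2006}],
    "library_b": [{"title": "Data Integration", "year": 2005}],
}

def atomic_query(source_rows, min_year):
    """Atomic query executed against a single data source (projection + selection)."""
    return [{"title": r["title"]} for r in source_rows if r["year"] >= min_year]

def integrate(partial_answers):
    """Integrating query: compose per-source answers into one answer set."""
    merged = []
    for rows in partial_answers:
        merged.extend(rows)
    return merged

def mediate(min_year):
    """User query on the virtual global view, rewritten into atomic queries."""
    partials = [atomic_query(rows, min_year) for rows in SOURCES.values()]
    return integrate(partials)

if __name__ == "__main__":
    print(mediate(2005))   # titles published in or after 2005, drawn from both sources
```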
Date: December 2006
Creator: Peng, Xiaobo
Partner: UNT Libraries

Energy-Aware Time Synchronization in Wireless Sensor Networks

Description: I present a time synchronization algorithm for wireless sensor networks that aims to conserve sensor battery power. The proposed method creates a hierarchical tree by flooding the sensor network from a designated source point. It then uses a hybrid algorithm derived from the timing-sync protocol for sensor networks (TPSN) and the reference broadcast synchronization method (RBS) to periodically synchronize sensor clocks while minimizing energy consumption. In multi-hop ad-hoc networks, a depleted sensor will drop information from all other sensors that route data through it, decreasing the physical area being monitored by the network. The proposed method uses several techniques and thresholds to maintain network connectivity. A new root sensor is chosen when the current one's battery power decreases to a designated value. I implement this new synchronization technique using Matlab and show that it can provide significant power savings over both TPSN and RBS.
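The root re-election threshold mentioned above could look roughly like the following sketch; Node, ROOT_THRESHOLD, and elect_new_root are illustrative names, not the thesis implementation.

```python
# Minimal sketch (not the thesis code): choosing a new root/reference node once the
# current root's battery falls below a threshold, as described above.

ROOT_THRESHOLD = 0.2   # fraction of full battery at which a new root is elected

class Node:
    def __init__(self, node_id, energy):
        self.node_id = node_id
        self.energy = energy   # remaining battery, 0.0 .. 1.0

def elect_new_root(current_root, candidates):
    """Keep the current root if it still has enough energy, else pick the
    candidate with the most remaining energy."""
    if current_root.energy > ROOT_THRESHOLD:
        return current_root
    return max(candidates, key=lambda n: n.energy)

nodes = [Node("A", 0.15), Node("B", 0.80), Node("C", 0.55)]
root = nodes[0]
root = elect_new_root(root, nodes[1:])
print(root.node_id)   # -> "B": the healthiest candidate takes over as root
```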
Date: December 2006
Creator: Saravanos, Yanos
Partner: UNT Libraries

Grid-based Coordinated Routing in Wireless Sensor Networks

Description: Wireless sensor networks are battery-powered ad-hoc networks in which sensor nodes that are scattered over a region connect to each other and form multi-hop networks. These nodes are equipped with sensors such as temperature sensors, pressure sensors, and light sensors and can be queried to get the corresponding values for analysis. However, since they are battery operated, care has to be taken so that these nodes use energy efficiently. One of the areas in sensor networks where an energy analysis can be done is routing. This work explores grid-based coordinated routing in wireless sensor networks and compares the energy available in the network over time for different grid sizes.
Date: December 2006
Creator: Sawant, Uttara
Partner: UNT Libraries

Power-benefit analysis of erasure encoding with redundant routing in sensor networks.

Description: One of the problems sensor networks face is adversaries corrupting nodes along the path to the base station. One way to reduce the effect of these attacks is multipath routing. This introduces some intrusion tolerance into the network by way of redundancy, but at the cost of higher power consumption by the sensor nodes. Erasure coding can be applied to this scenario, in which the base station can receive a subset of the total data sent and reconstruct the entire message packet at its end. This thesis uses two commonly used encodings and compares the power they consume in multipath routing against that of unencoded data. It is found that using encoding with multipath routing reduces the power consumption and at the same time enables the user to send reasonably large data sizes. The experiments in this thesis were performed on the TinyOS platform, with simulations done in TOSSIM and power measurements taken in PowerTOSSIM. They were performed on the simple radio model and the lossy radio model provided by TinyOS. The lossy radio model was simulated with distances of 10 feet, 15 feet, and 20 feet between nodes. It was found that by using erasure encoding, double or triple the data size can be sent at the same power consumption rate as unencoded data. All the experiments were performed with the radio set at a normal transmit power, and later at a high transmit power.
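To make the erasure-coding idea concrete, here is a hedged toy sketch using a single XOR parity fragment: the base station can rebuild the packet even if any one fragment is lost on one of the paths. Real deployments would use stronger codes (e.g., Reed-Solomon) that tolerate more loss; the fragment sizes and function names are invented.

```python
# Toy illustration of the erasure-coding idea above: split a packet into k data
# fragments plus one XOR parity fragment, so the receiver can rebuild the message
# even if any single fragment is lost along one of the redundant paths.

from functools import reduce

def encode(data: bytes, k: int):
    """Split data into k equal fragments and append an XOR parity fragment."""
    size = len(data) // k          # assumes len(data) is divisible by k
    frags = [data[i*size:(i+1)*size] for i in range(k)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), frags)
    return frags + [parity]

def decode(frags, k: int):
    """Reconstruct the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can only repair one lost fragment")
    if missing:
        present = [f for f in frags if f is not None]
        frags[missing[0]] = reduce(
            lambda a, b: bytes(x ^ y for x, y in zip(a, b)), present)
    return b"".join(frags[:k])

packet = b"sensorreading012"          # 16 bytes, split into k = 4 fragments
frags = encode(packet, k=4)
frags[2] = None                        # one fragment dropped or corrupted en route
assert decode(frags, k=4) == packet
```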
Date: December 2006
Creator: Vishwanathan, Roopa
Partner: UNT Libraries

Timing and Congestion Driven Algorithms for FPGA Placement

Description: Placement is one of the most important steps in physical design for VLSI circuits. For field programmable gate arrays (FPGAs), the placement step determines the location of each logic block. I present novel timing- and congestion-driven placement algorithms for FPGAs with minimal runtime overhead. By predicting the post-routing timing-critical edges and estimating congestion accurately, this algorithm is able to simultaneously reduce the critical path delay and the minimum number of routing tracks. The core of the algorithm consists of a criticality-history record of connection edges and a congestion map. This approach is applied to the 20 largest Microelectronics Center of North Carolina (MCNC) benchmark circuits. Experimental results show that compared with the state-of-the-art FPGA place and route package, the Versatile Place and Route (VPR) suite, this algorithm yields an average reduction of 8.1% (maximum 30.5%) in critical path delay and a 5% reduction in channel width. Meanwhile, the average runtime of the algorithm is only 2.3X that of VPR.
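The following is a hedged sketch of the kind of per-edge cost a timing- and congestion-driven placer might evaluate, combining delay, an accumulated criticality history, and a congestion estimate; the weights and the update rule are illustrative, not the thesis's exact formulation.

```python
# Hedged sketch of a placement cost that rewards edges with low delay, penalizes
# edges that keep showing up on the critical path (criticality history), and adds a
# congestion term. All weights are illustrative placeholders.

def edge_cost(delay, criticality_history, congestion, alpha=0.5, beta=0.3):
    """Combine estimated delay, accumulated timing criticality, and local congestion."""
    timing_term = delay * (1.0 + alpha * criticality_history)
    congestion_term = beta * congestion
    return timing_term + congestion_term

def update_history(history, is_critical, decay=0.9):
    """Edges repeatedly found on the critical path accumulate a higher weight."""
    return decay * history + (1.0 if is_critical else 0.0)

hist = 0.0
for critical in [True, True, False, True]:   # outcomes of successive timing analyses
    hist = update_history(hist, critical)
print(round(edge_cost(delay=2.4, criticality_history=hist, congestion=0.7), 3))
```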
Date: December 2006
Creator: Zhuo, Yue
Partner: UNT Libraries

Comparison and Evaluation of Existing Analog Circuit Simulator using Sigma-Delta Modulator

Description: In the world of VLSI (very large scale integration) technology, there are many different types of circuit simulators that are used to design and predict circuit behavior before actual fabrication. In this thesis, I compared and evaluated existing circuit simulators using standard benchmark circuits. The circuit simulators that I evaluated and explored are Ngspice, Tclspice, and Winspice (open source) and Spectre® (commercial). I tested standard benchmarks with these circuit simulators and compared their outputs. The simulators are evaluated using design metrics in order to quantify their performance and identify efficient circuit simulators. In addition, I designed a sigma-delta modulator and its individual components using the analog behavioral language Verilog-A. Initially, I performed simulations of the individual components of the sigma-delta modulator and later of the whole system. Finally, CMOS (complementary metal-oxide semiconductor) transistor-level circuits were designed for the differential amplifier, operational amplifier, and comparator of the modulator.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2006
Creator: Ale, Anil Kumar
Partner: UNT Libraries

A Language and Visual Interface to Specify Complex Spatial Pattern Mining

Description: The emerging interest in spatial pattern mining leads to demand for a flexible spatial pattern mining language, on which an easy-to-use and easy-to-understand visual pattern language can be built. It is worthwhile to define a pattern mining language, called LCSPM, that allows users to specify complex spatial patterns. I describe the proposed pattern mining language in this paper. A visual interface that allows users to specify the patterns visually is also developed. Visual pattern queries are translated into the LCSPM language by a parser, and the data mining process can be triggered afterwards. The visual language is based on, and goes beyond, the visual language proposed in the literature. I implemented a prototype system based on the open source JUMP framework.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2006
Creator: Li, Xiaohui
Partner: UNT Libraries

A Multi-Variate Analysis of SMTP Paths and Relays to Restrict Spam and Phishing Attacks in Emails

Description: The classifier discussed in this thesis considers the path traversed by an email (instead of its content) and the reputation of the relays, features that are inaccessible to spammers. Groups of spammers and the individual behaviors of a spammer in a given domain were analyzed to yield association patterns, which were then used to identify similar spammers. Unsolicited and phishing emails were successfully isolated from legitimate emails using the analysis results. Spammers and phishers are also categorized into serial spammers/phishers, recent spammers/phishers, prospective spammers/phishers, and suspects. Legitimate emails and trusted domains are classified into socially close (family members, friends), socially distinct (strangers, etc.), and opt-outs (resolved false positives and false negatives). Overall, this classifier resulted in far fewer false positives when compared to current filters like SpamAssassin, achieving a 98.65% precision, which is comparable to the precision achieved by SPF and DNSRBL blacklists.
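As a hedged illustration of path/relay-based scoring (rather than content-based filtering), the sketch below averages made-up reputation scores over the relays listed in an email's delivery path; the reputation table, threshold, and function names are hypothetical.

```python
# Illustrative only: scoring an email by the reputation of the relays on its
# delivery path (e.g. parsed from Received: headers), in the spirit of the
# path/relay features described above. The reputation table and threshold are made up.

RELAY_REPUTATION = {          # hypothetical prior scores, 0 (bad) .. 1 (trusted)
    "mail.trusted-univ.edu": 0.95,
    "relay.known-spam-host.net": 0.05,
    "smtp.isp.example.com": 0.60,
}

def path_score(relay_path, default=0.5):
    """Average reputation over the relays an email traversed."""
    scores = [RELAY_REPUTATION.get(r, default) for r in relay_path]
    return sum(scores) / len(scores)

def classify(relay_path, threshold=0.4):
    return "suspect" if path_score(relay_path) < threshold else "legitimate"

print(classify(["relay.known-spam-host.net", "smtp.isp.example.com"]))  # -> suspect
```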
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2006
Creator: Palla, Srikanth
Partner: UNT Libraries

Design and Optimization of Components in a 45nm CMOS Phase Locked Loop

Description: A novel scheme for optimizing the individual components of a phase locked loop (PLL), which is used for stable clock generation and synchronization of signals, is considered in this work. Verilog-A is used for the high-level system design of the main components of the PLL, followed by component-wise optimization. The design of experiments (DOE) approach to optimizing the analog, 45nm voltage controlled oscillator (VCO) is presented. Also, a mixed-signal analysis using the analog and digital Verilog behavior of components is studied. Overall, a high-level system design of a PLL, a systematic optimization of each of its components, and an analog and mixed-signal behavioral design approach have been implemented using Cadence custom IC design tools.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2006
Creator: Sarivisetti, Gayathri
Partner: UNT Libraries

Modeling and reduction of gate leakage during behavioral synthesis of nanoscale CMOS circuits.

Description: The major sources of power dissipation in a nanometer CMOS circuit are capacitive switching, short-circuit current, static leakage, and gate oxide tunneling. However, with the aggressive scaling of technology, the gate oxide direct tunneling current (gate leakage) is emerging as a prominent component of power dissipation. For sub-65 nm CMOS technology, where the gate oxide (SiO2) thickness is very low, direct tunneling is the major form of tunneling. This thesis makes two contributions: analytical modeling of behavioral-level components for direct tunneling current and propagation delay, and the reduction of tunneling current during behavioral synthesis. Gate oxides of multiple thicknesses are useful in reducing gate leakage dissipation. Analytical models from first principles for calculating the tunneling current and the propagation delay of behavioral-level components are presented, backed by BSIM4/5 models and SPICE simulations. These components are characterized for 45 nm technology, and an algorithm is provided for scheduling datapath operations such that the overall tunneling current dissipation of a datapath circuit under design is minimal. Because the oxide thickness being considered is very low, it may not remain constant during the course of fabrication; hence the algorithm takes process variation into consideration. Extensive experiments conducted on various behavioral-level benchmarks under various constraints show significant reductions, as high as 75.3% (with an average of 64.3%).
Access: This item is restricted to the UNT Community Members at a UNT Libraries Location.
Date: May 2006
Creator: Velagapudi, Ramakrishna
Partner: UNT Libraries

A Netcentric Scientific Research Repository

Description: The Internet and networks in general have become essential tools for disseminating information. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric scientific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Furthermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis operations into a repository. This technique allows the system to leverage the large number of existing tools by wrapping them with a Web service and registering the service with the repository. Metadata associated with Web services was enhanced to allow this feature to be included. In addition, an improved binary to text encoding scheme was developed to reduce the size overhead for sending large scientific data files via XML messages used in Web services. (3) Integrated image analysis operations with SQL. This technique allows for images to be stored and managed conveniently in a relational database. SQL supplemented with map algebra operations is used to select and perform operations on sets of images.
Access: This item is restricted to the UNT Community Members at a UNT Libraries Location.
Date: December 2006
Creator: Harrington, Brian
Partner: UNT Libraries

Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.

Description: This research addresses the problem of automatic keyphrase extraction from large documents and back-of-the-book indexing. The potential benefits of automating this process are far reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back-of-the-book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system and the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back-of-the-book indexing system based on the keyphrase extraction system was shown to produce back-of-the-book indexes closely resembling those created by human experts.
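The chi-statistics-based informativeness feature can be illustrated with a small sketch: compare how often a candidate phrase occurs in the document against what a background corpus would predict. The counts below are toy values, and the function is an assumption about the general idea, not the dissertation's exact feature.

```python
# Rough sketch of a chi-square style informativeness score for a candidate
# keyphrase: compare the observed in-document frequency against the frequency a
# background corpus would predict. Counts are toy values.

def chi_square_informativeness(doc_count, doc_total, bg_count, bg_total):
    """Pearson chi-square statistic for observed vs. expected phrase frequency."""
    expected = doc_total * (bg_count / bg_total)   # expected occurrences in the document
    if expected == 0:
        return 0.0
    return (doc_count - expected) ** 2 / expected

# "markov chain" appears 14 times in a 5,000-word document, but only
# 30 times in a 2,000,000-word background corpus:
score = chi_square_informativeness(doc_count=14, doc_total=5000,
                                   bg_count=30, bg_total=2_000_000)
print(round(score, 1))   # a large value marks the phrase as unusually informative
```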
Date: May 2008
Creator: Csomai, Andras
Partner: UNT Libraries

General Purpose Programming on Modern Graphics Hardware

Description: I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming.
Date: May 2008
Creator: Fleming, Robert
Partner: UNT Libraries

Exploring Trusted Platform Module Capabilities: A Theoretical and Experimental Study

Description: Trusted platform modules (TPMs) are hardware modules bound to a computer's motherboard that are being included in many desktops and laptops. Augmenting computers with these hardware modules adds powerful functionality in distributed settings, allowing us to reason about the security of these systems in new ways. In this dissertation, I study the functionality of TPMs from a theoretical as well as an experimental perspective. On the theoretical front, I leverage various features of TPMs to construct applications, like random oracles, that are impossible to implement in a standard model of computation. Apart from random oracles, I construct a new cryptographic primitive that is essentially a non-interactive form of the standard cryptographic primitive of oblivious transfer. I apply this new primitive to secure mobile agent computations, where interaction between various entities is typically required to ensure security. I prove these constructions are secure using standard cryptographic techniques and assumptions. To test the practicality of these constructions and their applications, I performed an experimental study, both on an actual TPM and on a software TPM simulator that has been enhanced to reflect timings from a real TPM. This allowed me to benchmark the performance of the applications and test the feasibility of the proposed extensions to standard TPMs. My tests also show that these constructions are practical.
Date: May 2008
Creator: Gunupudi, Vandana
Partner: UNT Libraries

Models to Combat Email Spam Botnets and Unwanted Phone Calls

Description: With the amount of email spam received these days, it is hard to imagine that spammers act individually. Nowadays, most spam emails are sent from a collection of compromised machines controlled by spammers. These compromised computers are often called bots, using which the spammers can send a massive volume of spam within a short period of time. The motivation of this work is to understand and analyze the behavior of spammers through a large collection of spam mails. My research examined a data set collected over a 2.5-year period and developed an algorithm that extracts botnet features and then classifies the botnets into various groups. Principal component analysis was used to study the association patterns of groups of spammers and the individual behavior of a spammer in a given domain; the clustering is based on the features that capture the maximum variance in the information. Presence information is a growing tool for more efficient communication and for providing new services and features within a business setting and beyond. The main contribution in my thesis is to propose a willingness estimator that can estimate a callee's willingness to take a call without his/her involvement; the model estimates the willingness level based on call history. Finally, the accuracy of the proposed willingness estimator is validated against actual call logs.
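A minimal sketch of the principal-component step, using plain NumPy: project a (senders x features) matrix onto its top components so that senders with similar behavior fall close together. The feature matrix and column meanings are fabricated for illustration.

```python
# Sketch of a PCA projection over per-sender features, in the spirit of the
# analysis described above. The feature matrix is fabricated for illustration.

import numpy as np

def pca(X, n_components=2):
    """Return the data projected onto its first n_components principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Each row is one sender; columns might be message volume, burstiness,
# number of distinct relays, and URLs per message.
features = np.array([
    [950, 0.9, 40, 7.2],
    [900, 0.8, 35, 6.8],
    [ 12, 0.1,  2, 0.3],
    [ 10, 0.2,  3, 0.1],
], dtype=float)

print(pca(features))    # two groups emerge along the first component: heavy vs. light senders
```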
Date: May 2008
Creator: Husna, Husain
Partner: UNT Libraries

General Nathan Twining and the Fifteenth Air Force in World War II

Description: General Nathan F. Twining distinguished himself in leading the American Fifteenth Air Force during the last full year of World War II in the European Theatre. Drawing on the leadership qualities he had already shown in combat in the Pacific Theatre, he was the only USAAF leader who commanded three separate air forces during World War II. His command of the Fifteenth Air Force gave him his biggest, longest lasting, and most challenging experience of the war, which would be the foundation for the reputation that eventually would win him appointment to the nation's highest military post as Chairman of the Joint Chiefs of Staff during the Cold War.
Date: May 2008
Creator: Hutchins, Brian
Partner: UNT Libraries

A CAM-Based, High-Performance Classifier-Scheduler for a Video Network Processor.

Description: Classification and scheduling are key functionalities of a network processor. Network processors are equipped with application specific integrated circuits (ASICs), so that as IP (Internet Protocol) packets arrive, they can be processed directly without using the central processing unit. A new network processor, called the video network processor (VNP), is proposed for real-time broadcasting of video streams for IP television (IPTV). This thesis explores the challenge in designing a combined classification and scheduling module for a VNP. I propose and design the classifier-scheduler module which will classify and schedule data for the VNP. The proposed module discriminates between IP packets and video packets. The video packets are further processed for digital rights management (DRM). IP packets which carry regular traffic traverse without any modification. The basic architecture of the VNP and the architecture of the classifier-scheduler module, based on content addressable memory (CAM) and random access memory (RAM), have been proposed. The module has been designed and simulated in the Xilinx ISE 9.1i simulator, achieving a throughput of 1.79 Mbps and a maximum working frequency of 111.89 MHz at a power dissipation of 33.6 mW. The code has been translated and mapped for the Spartan and Virtex families of devices.
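Although the module itself is hardware (CAM plus RAM), the classification decision it makes can be sketched in software with a lookup table keyed on header fields, as below; the port numbers and the DRM flag are illustrative assumptions.

```python
# Conceptual sketch only: the hardware module uses a content-addressable memory,
# but the classification step can be mimicked with a dictionary keyed on header
# fields. Port numbers and the DRM hook are illustrative.

CAM_TABLE = {
    ("UDP", 5004): "video",     # e.g. an RTP video stream for IPTV
    ("UDP", 5005): "video",
}

def classify_packet(protocol, dst_port):
    """Look up the packet's header fields; unmatched packets pass through as regular IP."""
    return CAM_TABLE.get((protocol, dst_port), "ip")

def process(packet):
    kind = classify_packet(packet["protocol"], packet["dst_port"])
    if kind == "video":
        packet["drm"] = True        # video packets get further DRM processing
    return packet                   # regular IP packets traverse unmodified

print(process({"protocol": "UDP", "dst_port": 5004, "payload": b"..."}))
print(process({"protocol": "TCP", "dst_port": 80,   "payload": b"..."}))
```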
Date: May 2008
Creator: Tarigopula, Srivamsi
Partner: UNT Libraries

Non-Uniform Grid-Based Coordinated Routing in Wireless Sensor Networks

Description: Wireless sensor networks are ad hoc networks of tiny battery-powered sensor nodes that can organize themselves into networks and collect information regarding temperature, light, and pressure in an area. Though the applications of sensor networks are very promising, sensor nodes are limited in their capability due to many factors. The main limitation of these battery-powered nodes is energy. Sensor networks are expected to work for long periods of time once deployed, and it becomes important to conserve the battery life of the nodes to extend network lifetime. This work examines a non-uniform grid-based routing protocol as an effort to minimize energy consumption in the network and extend network lifetime. The entire test area is divided into non-uniformly shaped grids. Fixed source and sink nodes with unlimited energy are placed in the network. Sensor nodes with full battery life are deployed uniformly and randomly in the field. The source node floods the network with only the coordinator node active in each grid and the other nodes sleeping. The sink node traces the same route back to the source node through the same coordinators. This process continues until a coordinator node runs out of energy, at which point new coordinator nodes are elected to participate in routing. Thus the network stays alive until the link between the source and sink nodes is lost, i.e., the network is partitioned. This work explores the efficiency of the non-uniform grid-based routing protocol for different node densities and the non-uniform grid structure that best extends network lifetime.
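The coordinator-rotation idea can be sketched as follows: each grid keeps one active coordinator while its other nodes sleep, and a depleted coordinator is replaced by the highest-energy node in the same grid. The grid layout, energy values, and threshold are invented for the example.

```python
# Illustrative sketch of the coordinator rotation described above: one active
# coordinator per grid, replaced from within the same grid once its energy is spent.

DEPLETED = 0.05   # residual-energy fraction below which a coordinator is replaced

grids = {
    "G1": {"n1": 0.04, "n2": 0.70, "n3": 0.40},   # node -> remaining energy
    "G2": {"n4": 0.90, "n5": 0.10},
}
coordinators = {"G1": "n1", "G2": "n4"}

def refresh_coordinators(grids, coordinators):
    """Replace any depleted coordinator with the highest-energy node in its grid."""
    for grid, nodes in grids.items():
        if nodes[coordinators[grid]] <= DEPLETED:
            coordinators[grid] = max(nodes, key=nodes.get)
    return coordinators

print(refresh_coordinators(grids, coordinators))   # G1 hands over from n1 to n2
```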
Date: August 2008
Creator: Kadiyala, Priyanka
Partner: UNT Libraries

Region aware DCT domain invisible robust blind watermarking for color images.

Description: The multimedia revolution has made a strong impact on our society. The explosive growth of the Internet and the access it provides to digital information generate new opportunities and challenges. The ease of editing and duplication in the digital domain has created copyright-protection concerns for content providers. Various schemes to embed secondary data in digital media are investigated to preserve copyright and discourage unauthorized duplication; digital watermarking is a viable solution. This thesis proposes a novel invisible watermarking scheme: a discrete cosine transform (DCT) domain based watermark embedding and blind extraction algorithm for copyright protection of color images. Testing of the proposed watermarking scheme's robustness and security via different benchmarks proves its resilience to digital attacks. The detector's response, PSNR, and RMSE results show that our algorithm has better security performance than most existing algorithms.
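One common DCT-domain blind-watermarking trick, sketched below for a single 8x8 block, embeds a bit by ordering a pair of mid-band coefficients and extracts it without the original image; this is a generic illustration under assumed parameters (coefficient pair, strength), not the exact algorithm proposed in the thesis.

```python
# Hedged sketch of a generic DCT-domain blind-watermarking step (not the thesis
# algorithm): embed a bit in an 8x8 block by ordering two mid-band DCT
# coefficients, then extract it later without the original image.

import numpy as np
from scipy.fft import dctn, idctn

C1, C2 = (3, 4), (4, 3)          # an assumed mid-frequency coefficient pair
STRENGTH = 8.0                   # separation of the pair (robustness vs. visibility)

def embed_bit(block, bit):
    d = dctn(block.astype(float), norm="ortho")
    mid = (d[C1] + d[C2]) / 2.0
    sign = 1.0 if bit == 1 else -1.0         # bit 1 -> d[C1] > d[C2], bit 0 -> d[C1] < d[C2]
    d[C1] = mid + sign * STRENGTH / 2.0
    d[C2] = mid - sign * STRENGTH / 2.0
    return idctn(d, norm="ortho")

def extract_bit(block):
    d = dctn(block.astype(float), norm="ortho")
    return 1 if d[C1] > d[C2] else 0

block = np.random.default_rng(0).integers(0, 256, (8, 8))
for bit in (0, 1):
    assert extract_bit(embed_bit(block, bit)) == bit
```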
Date: December 2008
Creator: Naraharisetti, Sahasan
Partner: UNT Libraries

Graph-based Centrality Algorithms for Unsupervised Word Sense Disambiguation

Description: This thesis introduces an innovative methodology of combining some traditional dictionary-based approaches to word sense disambiguation (semantic similarity measures and overlap of word glosses, both based on WordNet) with some graph-based centrality methods, namely the degree of the vertices, PageRank, closeness, and betweenness. The approach is completely unsupervised, and is based on creating graphs for the words to be disambiguated. We experiment with several possible combinations of the semantic similarity measures as the first stage in our experiments. The next stage attempts to score individual vertices in the graphs previously created based on several graph connectivity measures. During the final stage, several voting schemes are applied to the results obtained from the different centrality algorithms. The most important contributions of this work are not only that it is a novel approach and it works well, but also that it has great potential in overcoming the new-knowledge-acquisition bottleneck which has apparently brought research in supervised WSD to a plateau. The type of research reported in this thesis, which does not require manually annotated data, holds promise of many new and interesting developments, and our work is one of the first steps, albeit a small one, in this direction. The complete system is built and tested on standard benchmarks, and is comparable with work done on graph-based word sense disambiguation as well as lexical chains. The evaluation indicates that the right combination of the above-mentioned metrics can be used to develop an unsupervised disambiguation engine as powerful as the state of the art in WSD.
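A minimal sketch of the centrality stage: build a graph over candidate senses, score vertices with several centrality measures, and let the measures vote. The tiny graph and similarity weights below are invented; the dissertation derives them from WordNet-based measures.

```python
# Minimal sketch of the graph-centrality stage described above: vertices are
# candidate senses, edges are weighted by a similarity measure, and several
# centrality metrics vote on the preferred sense. The graph is invented.

import networkx as nx

G = nx.Graph()
# (sense of one word, sense of another word, similarity weight)
G.add_weighted_edges_from([
    ("bank#1", "money#1", 0.9),
    ("bank#1", "loan#1",  0.8),
    ("bank#2", "river#1", 0.7),
    ("money#1", "loan#1", 0.6),
])

scores = {
    "degree":      nx.degree_centrality(G),
    "pagerank":    nx.pagerank(G, weight="weight"),
    "closeness":   nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
}

# Simple voting scheme: each metric votes for its top-ranked sense of "bank".
votes = [max(("bank#1", "bank#2"), key=s.get) for s in scores.values()]
print(max(set(votes), key=votes.count))   # -> "bank#1", the financial sense
```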
Date: December 2008
Creator: Sinha, Ravi Som
Partner: UNT Libraries

Direct Online/Offline Digital Signature Schemes.

Description: Online/offline signature schemes are useful in many situations, and two such scenarios are considered in this dissertation: bursty server authentication and embedded device authentication. In this dissertation, new techniques for online/offline signing are introduced and applied in a variety of ways to create online/offline signature schemes; five different online/offline signature schemes are proposed and proved secure under a variety of models and assumptions. Two of the proposed five schemes have the best offline or best online performance of any currently known technique, and are particularly well suited for the scenarios considered in this dissertation. To determine whether the proposed schemes provide the expected practical improvements, a series of experiments was conducted comparing the proposed schemes with each other and with other state-of-the-art schemes in this area, both on a desktop-class computer and under AVR Studio, a simulation platform for an 8-bit processor that is popular in embedded systems. Under AVR Studio, the proposed SGE scheme, using a typical key size for the embedded device authentication scenario, can complete the offline phase in about 24 seconds and then produce a signature (the online phase) in 15 milliseconds, which is the best offline performance of any known signature scheme that has been proven secure in the standard model. In the tests on a desktop-class computer, the proposed SGS scheme, which has the best online performance and is designed for the bursty server authentication scenario, generated 469,109 signatures per second, while the Schnorr scheme (the next best scheme in terms of online performance) generated only 223,548 signatures per second. The experimental results demonstrate that the SGE and SGS schemes are the most efficient techniques for embedded device authentication and bursty server authentication, respectively.
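The offline/online split can be illustrated with a simplified Schnorr-style signature in which the expensive exponentiation is precomputed offline and the online phase is just a hash and a multiply-add. This is only a conceptual sketch with insecure toy parameters; it is not the SGE or SGS construction from the dissertation.

```python
# Conceptual sketch of the online/offline split using a simplified Schnorr-style
# signature (no reduction modulo a subgroup order, so signatures are larger).
# Toy parameters only; NOT secure and NOT the dissertation's SGE/SGS schemes.

import hashlib, secrets

p = (1 << 127) - 1                    # a Mersenne prime, used here just as a toy modulus
g = 3

x = secrets.randbelow(p - 2) + 1      # private key
y = pow(g, x, p)                      # public key

def offline_phase():
    """Done ahead of time, before the message is known."""
    k = secrets.randbelow(p - 2) + 1
    r = pow(g, k, p)                  # the costly exponentiation, precomputed
    return k, r

def online_phase(msg: bytes, k: int, r: int):
    """Cheap: one hash plus a multiply-add."""
    e = int.from_bytes(hashlib.sha256(str(r).encode() + msg).digest(), "big")
    s = k + x * e
    return r, s

def verify(msg: bytes, r: int, s: int):
    e = int.from_bytes(hashlib.sha256(str(r).encode() + msg).digest(), "big")
    return pow(g, s, p) == (r * pow(y, e, p)) % p

k, r = offline_phase()
sig = online_phase(b"server response #1", k, r)
print(verify(b"server response #1", *sig))   # -> True
```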
Date: December 2008
Creator: Yu, Ping
Partner: UNT Libraries

Variability-aware low-power techniques for nanoscale mixed-signal circuits.

Description: New circuit design techniques that accommodate lower supply voltages necessary for portable systems need to be integrated into the semiconductor intellectual property (IP) core. Systems that once worked at 3.3 V or 2.5 V now need to work at 1.8 V or lower, without causing any performance degradation. Also, the fluctuation of device characteristics caused by process variation in nanometer technologies is seen as design yield loss. The numerous parasitic effects induced by layouts, especially for high-performance and high-speed circuits, pose a problem for IC design. Lack of exact layout information during circuit sizing leads to long design iterations involving time-consuming runs of complex tools. There is a strong need for low-power, high-performance, parasitic-aware and process-variation-tolerant circuit design. This dissertation proposes methodologies and techniques to achieve variability, power, performance, and parasitic-aware circuit designs. Three approaches are proposed: the single iteration automatic approach, the hybrid Monte Carlo and design of experiments (DOE) approach, and the corner-based approach. Widely used mixed-signal circuits such as analog-to-digital converter (ADC), voltage controlled oscillator (VCO), voltage level converter and active pixel sensor (APS) have been designed at nanoscale complementary metal oxide semiconductor (CMOS) and subjected to the proposed methodologies. The effectiveness of the proposed methodologies has been demonstrated through exhaustive simulations. Apart from these methodologies, the application of dual-oxide and dual-threshold techniques at circuit level in order to minimize power and leakage is also explored.
Date: May 2009
Creator: Ghai, Dhruva V.
Partner: UNT Libraries

Development, Implementation, and Analysis of a Contact Model for an Infectious Disease

Description: With growing concern about infectious diseases spreading through populations, epidemiology is becoming more important to the future of public health. In the past, epidemiologists used existing data from an outbreak to help determine how an infectious disease might spread in the future. Now, with computational models, they are able to analyze the data produced by these models to help with prevention and intervention plans. This paper looks at the design, implementation, and analysis of a computational model based on interactions between individuals in a population. The design of the working contact model looks closely at the SEIR (susceptible-exposed-infectious-recovered) model used as the foundation and at the two timelines of a disease. The implementation of the contact model is reviewed with close attention to data structures. The analysis of the experiments provides evidence that this contact model can be used to help epidemiologists study the spread of an infectious disease based on the contact rate of individuals.
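A minimal discrete-time SEIR sketch makes the compartmental foundation concrete (susceptible, exposed, infectious, recovered); the rates and population below are placeholders, and the thesis builds an individual-level contact model on top of this kind of foundation.

```python
# Minimal discrete-time SEIR sketch: Susceptible -> Exposed -> Infectious -> Recovered.
# Rates and population size are placeholder values for illustration.

def seir_step(S, E, I, R, beta=0.3, sigma=0.2, gamma=0.1):
    """Advance the epidemic by one day."""
    N = S + E + I + R
    new_exposed    = beta * S * I / N     # contacts between susceptible and infectious
    new_infectious = sigma * E            # end of the latent (incubation) period
    new_recovered  = gamma * I            # end of the infectious period
    return (S - new_exposed,
            E + new_exposed - new_infectious,
            I + new_infectious - new_recovered,
            R + new_recovered)

state = (999.0, 0.0, 1.0, 0.0)            # one infectious individual in a population of 1000
for day in range(365):
    state = seir_step(*state)
print([round(x) for x in state])          # most of the population ends up in R (recovered)
```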
Date: May 2009
Creator: Thompson, Brett Morinaga
Partner: UNT Libraries

Social Network Simulation and Mining Social Media to Advance Epidemiology

Description: Traditional public health decision support can benefit from the Web and social media revolution. This dissertation presents approaches to mining social media that benefit public health epidemiology. Through discovery and analysis of trends in influenza-related blogs, a correlation to Centers for Disease Control and Prevention (CDC) influenza-like-illness patient reporting at sentinel health-care providers is verified. A second approach considers personal beliefs about vaccination in social media. A vaccine for human papillomavirus (HPV) was approved by the Food and Drug Administration in May 2006. The virus is present in nearly all cervical cancers and implicated in many throat and oral cancers. Results from automatic sentiment classification of HPV vaccination beliefs are presented, which will enable more accurate prediction of the vaccine's population-level impact. Two epidemic models are introduced that embody the intimate social networks related to HPV transmission. Ultimately, aggregating these methodologies with epidemic and social network modeling facilitates the effective development of strategies for targeted interventions.
Date: August 2009
Creator: Corley, Courtney David
Partner: UNT Libraries