Search limited to: Partner: UNT Libraries; Degree Discipline: Computer Science; Collection: UNT Theses and Dissertations
Temporally Correct Algorithms for Transaction Concurrency Control in Distributed Databases
Many activities are composed of temporally dependent events that must be executed in a specific chronological order. Supportive software applications must preserve these temporal dependencies. Whenever the processing of such an application includes transactions submitted to a database that is shared with other such applications, the transaction concurrency control mechanisms within the database must also preserve the temporal dependencies. A basis for preserving temporal dependencies is established by using (within the applications and databases) real-time timestamps to identify and order events and transactions. The use of optimistic approaches to transaction concurrency control can be undesirable in such situations, as they allow incorrect results for database read operations. Although the incorrectness is detected prior to transaction commit and the corresponding transaction(s) restarted, the impact on the application or entity that submitted the transaction can be too costly. Three transaction concurrency control algorithms are proposed in this dissertation. These algorithms are based on timestamp ordering and are designed to preserve temporal dependencies existing among data-dependent transactions. The algorithms produce execution schedules that are equivalent to temporally ordered serial schedules, where the temporal order is established by the transactions' start times. The algorithms provide this equivalence while supporting concurrency to the extent of out-of-order commits and reads. With respect to the stated concern with optimistic approaches, two of the proposed algorithms are risk-free and return only committed data-item values to read operations. Risk with the third algorithm is greatly reduced by its conservative bias. All three algorithms avoid deadlock while providing risk-free or reduced-risk operation. The performance of the algorithms is determined analytically and through experimentation. Experiments are performed using functional database management system models that implement the proposed algorithms and the well-known Conservative Multiversion Timestamp Ordering algorithm.
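To make the baseline concrete, here is a minimal sketch of the classic timestamp-ordering read/write checks that this family of algorithms builds on; it is illustrative Python, not the dissertation's algorithms, and the `Abort` handling is an assumed simplification.

```python
# Illustrative timestamp-ordering (T/O) checks; a sketch for intuition only.
class Abort(Exception):
    """Raised when an operation would violate timestamp order."""

class DataItem:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0    # largest timestamp that has read this item
        self.write_ts = 0   # largest timestamp that has written this item

def read(item: DataItem, txn_ts: int):
    # A transaction may not read a value written by a younger transaction.
    if txn_ts < item.write_ts:
        raise Abort("read too late: a younger transaction already wrote")
    item.read_ts = max(item.read_ts, txn_ts)
    return item.value

def write(item: DataItem, txn_ts: int, value):
    # A write is rejected if a younger transaction has read or written the item.
    if txn_ts < item.read_ts or txn_ts < item.write_ts:
        raise Abort("write too late: would invalidate a younger access")
    item.write_ts = txn_ts
    item.value = value

x = DataItem(0)
write(x, txn_ts=5, value=42)
print(read(x, txn_ts=7))         # ok: 42
try:
    write(x, txn_ts=3, value=1)  # rejected: timestamp 3 < read_ts 7
except Abort as e:
    print("aborted:", e)
```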
Computational Complexity of Hopfield Networks
There are three main results in this dissertation: the PLS-completeness of discrete Hopfield network convergence under eight different restrictions (degree 3; bipartite and degree 3; 8-neighbor mesh; dual of the knight's graph; hypercube; butterfly; cube-connected cycles; and shuffle-exchange), the exponential convergence behavior of discrete Hopfield networks, and the simulation of Turing machines by discrete Hopfield networks.
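For readers unfamiliar with the model, the following toy Python sketch shows discrete Hopfield dynamics: with symmetric weights and zero diagonal, asynchronous updates never increase the energy, so the network converges to a stable state. Finding that state is the local-search problem whose PLS-completeness is at issue; the random weights here are purely illustrative.

```python
import numpy as np

# Discrete Hopfield network: with symmetric weights and zero diagonal,
# each asynchronous update lowers (never raises) the energy
# E(s) = -1/2 * s^T W s, so the dynamics must reach a stable state.
rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)     # no self-connections
s = rng.choice([-1, 1], size=n)

def energy(W, s):
    return -0.5 * s @ W @ s

changed = True
while changed:               # asynchronous updates until a fixed point
    changed = False
    for i in range(n):
        new = 1 if W[i] @ s >= 0 else -1
        if new != s[i]:
            s[i] = new
            changed = True

print("stable state:", s)
print("energy:", energy(W, s))
```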
Modeling Alcohol Consumption Using Blog Data
How do the content and writing style of people who drink alcoholic beverages differ from those of non-drinkers? How much information can we learn about a person's alcohol consumption behavior by reading text that they have authored? This thesis attempts to extend the methods deployed in authorship attribution and authorship profiling research into the domain of automatically identifying the human action of drinking alcoholic beverages. I examine how a psycholinguistics dictionary (the Linguistic Inquiry and Word Count lexicon, developed by James Pennebaker), together with Kenneth Burke's concept of words as symbols of human action and James Wertsch's concept of mediated action, provides a framework for analyzing meaningful data patterns from the content of blogs written by consumers of alcoholic beverages. The contributions of this thesis to the research field are twofold. First, I show that it is possible to automatically identify blog posts that have content related to the consumption of alcoholic beverages. And second, I provide a framework and tools to model human behavior through text analysis of blog data.
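As a hedged illustration of the feature-extraction step, the sketch below counts word hits against hand-made categories in the spirit of LIWC-style features. The mini-lexicon is invented for the example; the real LIWC lexicon is a licensed resource.

```python
import re
from collections import Counter

CATEGORIES = {   # invented mini-lexicon, standing in for LIWC categories
    "affect": {"happy", "sad", "love", "hate"},
    "social": {"friend", "party", "family", "talk"},
    "ingest": {"drink", "drank", "beer", "wine", "eat"},
}

def category_features(text: str) -> dict:
    """Fraction of tokens falling in each category (LIWC-style features)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for tok in tokens:
        for cat, words in CATEGORIES.items():
            if tok in words:
                hits[cat] += 1
    total = max(len(tokens), 1)
    return {cat: hits[cat] / total for cat in CATEGORIES}

print(category_features("We drank beer with a friend at the party."))
# {'affect': 0.0, 'social': 0.222..., 'ingest': 0.222...}
```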
Optimizing Non-pharmaceutical Interventions Using Multi-coaffiliation Networks
Computational modeling is of fundamental significance in mapping possible disease spread and in designing strategies for its mitigation. Conventional contact networks implement the simulation of interactions as random occurrences, presenting public health bodies with a difficult trade-off between realistic model granularity and robust design of intervention strategies. Recently, researchers have been investigating the use of agent-based models (ABMs) to embrace the complexity of real-world interactions. At the same time, theoretical approaches provide epidemiologists with general optimization models in which demographics are intrinsically simplified. The emerging study of affiliation networks and co-affiliation networks provides an alternative to this trade-off. Co-affiliation networks maintain the realism innate to ABMs while reducing the complexity of contact networks into distinctively smaller k-partite graphs, where each partition represents a dimension of the social model. This dissertation studies the optimization of intervention strategies for infectious diseases, mainly within school systems. First, concepts of synthetic populations and affiliation networks are extended to propose a modified algorithm for the synthetic reconstruction of populations. Second, the definition of multi-coaffiliation networks is presented as the main social model in which risk is quantified and evaluated, thereby obtaining vulnerability indications for each school in the system. Finally, maximization of the mitigation coverage and minimization of the overall cost of intervention strategies are proposed and compared, based on centrality measures.
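The following sketch illustrates the underlying reduction that affiliation models rely on: projecting a bipartite person-by-location affiliation network into a weighted person-to-person contact network. Names and venues are invented, and the dissertation's multi-coaffiliation construction is richer than this two-mode example.

```python
from itertools import combinations
from collections import defaultdict

# person -> venues they attend (two modes: people and locations)
affiliations = {
    "ana":  {"room101", "gym"},
    "ben":  {"room101"},
    "carl": {"gym", "cafeteria"},
    "dana": {"cafeteria"},
}

# Project onto a one-mode contact network: edge weight = shared venues.
contacts = defaultdict(int)
for a, b in combinations(sorted(affiliations), 2):
    shared = affiliations[a] & affiliations[b]
    if shared:
        contacts[(a, b)] = len(shared)

print(dict(contacts))
# {('ana', 'ben'): 1, ('ana', 'carl'): 1, ('carl', 'dana'): 1}
```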
Visualization of Surfaces and 3D Vector Fields
Visualization of trivariate functions and vector fields with three components remains a hard problem in scientific computation and computer graphics. People build their own visualization packages for their specific purposes, and some general-purpose packages exist (MATLAB, Vis5D), but they all require extensive user experience in setting parameters in order to generate images. We present a simple package that produces simplified but productive images of 3-D vector fields. We used this method to render the magnetic field and current as solutions of the Ginzburg-Landau equations on a 3-D domain.
Intrinsic and Extrinsic Adaptation in a Simulated Combat Environment
Genetic algorithm and artificial life techniques are applied to the development of challenging and interesting opponents in a combat-based computer game. Computer simulations are carried out against an idealized human player to gather data on the effectiveness of the computer generated opponents.
Exon/Intron Discrimination Using the Finite Induction Pattern Matching Technique
DNA sequence analysis involves precise discrimination of two of the sequence's most important components: exons and introns. Exons encode the proteins that are responsible for almost all the functions in a living organism. Introns interrupt the sequence coding for a protein and must be removed from primary RNA transcripts before translation to protein can occur. A pattern recognition technique called Finite Induction (FI) is utilized to study the language of exons and introns. FI is especially suited for analyzing and classifying large amounts of data representing sequences of interest. It requires no biological information and employs no statistical functions. Finite Induction is applied to the exon and intron components of DNA by building a collection of rules based upon what it finds in the sequences it examines. It then attempts to match the known rule patterns with new rules formed as a result of analyzing a new sequence. A high number of matches predicts a probable close relationship between the two sequences; a low number of matches signifies a large amount of difference between the two. This research demonstrates FI to be a viable tool for measurement when known patterns are available for the formation of rule sets.
Intelligent Memory Management Heuristics
Automatic memory management is crucial in the implementation of runtime systems even though it induces a significant computational overhead. In this thesis I explore the use of statistical properties of the directed graph describing the set of live data to decide between garbage collection and heap expansion in a memory management algorithm that combines dynamic-array-represented heaps with a mark-and-sweep garbage collector to enhance its performance. The sampling method, which predicts the density and the distribution of useful data, is implemented as a partial marking algorithm. The algorithm randomly marks the nodes of the directed graph representing the live data at different depths with a variable probability factor p. Using the information gathered by the partial marking algorithm in the current step and the knowledge gathered in previous iterations, the proposed empirical formula predicts with reasonable accuracy the density of live nodes on the heap, in order to decide between garbage collection and heap expansion. The resulting heuristics are tested empirically and shown to improve overall execution performance significantly in the context of the Jinni Prolog compiler's runtime system.
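A toy version of the sampling idea, assuming an explicit object graph and invented thresholds: partially mark the graph with probability p, extrapolate the live density, and choose between collecting and expanding. The scaling step is a rough estimate, not the thesis's empirical formula.

```python
import random

random.seed(42)

def partial_mark(roots, children, p, max_depth=6):
    """Mark reachable nodes with probability p, pruning unmarked subtrees."""
    marked, stack = set(), [(r, 0) for r in roots]
    while stack:
        node, depth = stack.pop()
        if node in marked or depth > max_depth:
            continue
        if random.random() < p:
            marked.add(node)
            stack.extend((c, depth + 1) for c in children.get(node, ()))
    return marked

def decide(heap_size, roots, children, p=0.5, threshold=0.6):
    est_live = len(partial_mark(roots, children, p)) / p  # rough extrapolation
    return "expand heap" if est_live / heap_size > threshold else "collect"

graph = {0: [1, 2], 1: [3], 2: [4], 3: [], 4: [5], 5: []}
print(decide(heap_size=16, roots=[0], children=graph))
```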
Toward a Data-Type-Based Real Time Geospatial Data Stream Management System
The advent of sensor and communication technologies enables the generation and consumption of large volumes of streaming data. Many of these data streams are geo-referenced. Existing spatio-temporal databases and data stream management systems are not capable of handling real-time queries on spatial extents. In this thesis, I investigate several fundamental research issues toward building a data-type-based real-time geospatial data stream management system. The thesis makes contributions in the following areas: geo-stream data models, aggregation, window-based nearest neighbor operators, and query optimization strategies. The proposed geo-stream data model is based on second-order logic and multi-typed algebra. Both abstract and discrete data models are proposed and exemplified. I further propose two useful geo-stream operators, namely Region By and WNN, which abstract common aggregation and nearest neighbor queries as generalized data model constructs. Finally, I propose three query optimization algorithms based on spatial, temporal, and spatio-temporal constraints of geo-streams. I show the effectiveness of the data model through many query examples. The effectiveness and efficiency of the algorithms are validated through extensive experiments on both synthetic and real data sets. This work establishes the fundamental building blocks of a full-fledged geo-stream database management system and has potential impact on many applications, such as hazardous weather alerting and monitoring, traffic analysis, and environmental modeling.
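As an illustration of what a window-based nearest-neighbor (WNN) operator computes, here is a naive Python sketch that answers k-NN queries against a sliding window of geo-stream updates; the class and method names are assumptions, not the thesis's interface.

```python
import heapq
import math
from collections import deque

class WNN:
    """Naive window-based nearest-neighbor operator over a geo-stream."""
    def __init__(self, window=5):
        self.buf = deque(maxlen=window)   # keeps only the last `window` updates

    def push(self, obj_id, x, y):
        self.buf.append((obj_id, x, y))

    def query(self, qx, qy, k=1):
        # k nearest tracked positions within the current window
        return heapq.nsmallest(
            k, self.buf, key=lambda o: math.hypot(o[1] - qx, o[2] - qy))

w = WNN(window=3)
for update in [("bus1", 0, 0), ("bus2", 5, 5), ("bus1", 1, 1), ("bus3", 9, 0)]:
    w.push(*update)
print(w.query(0, 0, k=2))   # [('bus1', 1, 1), ('bus2', 5, 5)]
```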
A Wireless Traffic Surveillance System Using Video Analytics
Video surveillance systems have been commonly used in transportation systems to support traffic monitoring, speed estimation, and incident detection. However, there are several challenges in developing and deploying such systems, including high development and maintenance costs, the bandwidth bottleneck of long-range links, and a lack of advanced analytics. In this thesis, I leverage current wireless, video camera, and analytics technologies, and present a wireless traffic monitoring system. I first present an overview of the system. Then I describe the site investigation and several test links with different hardware/software configurations to demonstrate the effectiveness of the system. The system development process was documented to provide guidelines for future development. Furthermore, I propose a novel speed-estimation analytics algorithm that takes into consideration roads with slope angles. I prove the correctness of the algorithm theoretically, and validate its effectiveness experimentally. The experimental results on both synthetic and real datasets show that the algorithm is more accurate than the baseline algorithm 80% of the time. On average, the accuracy improvement of speed estimation is over 3.7%, even for very small slope angles.
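The geometric intuition behind slope-aware speed estimation can be sketched in a few lines: if a camera's estimate captures only the horizontal component of motion, travel on a slope of angle theta is longer by a factor of 1/cos(theta). The numbers below are illustrative, and the thesis's algorithm is more involved than this simple correction.

```python
import math

def speed_with_slope(horizontal_dist_m, dt_s, slope_deg):
    """Correct a horizontal displacement estimate for a sloped road."""
    true_dist = horizontal_dist_m / math.cos(math.radians(slope_deg))
    return true_dist / dt_s

flat = speed_with_slope(20.0, 1.0, slope_deg=0.0)
hill = speed_with_slope(20.0, 1.0, slope_deg=6.0)
print(f"flat: {flat:.2f} m/s, 6-degree slope: {hill:.2f} m/s")
# flat: 20.00 m/s, 6-degree slope: 20.11 m/s (about 0.55% higher)
```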
Performance Evaluation of Data Integrity Mechanisms for Mobile Agents
With the growing popularity of e-commerce applications that use software agents, the protection of mobile agent data has become imperative. To that end, the performance of four methods that protect the data integrity of mobile agents is evaluated. The methods investigated include existing approaches known as the Partial Result Authentication Codes, Hash Chaining, and Set Authentication Code methods, and a technique of our own design, called the Modified Set Authentication Code method, which addresses the limitations of the Set Authentication Code method. The experiments were run using the DADS agent system (developed at the Network Research Laboratory at UNT), for which a Data Integrity Module was designed. The experimental results show that our Modified Set Authentication Code technique performed comparably to the Set Authentication Code method.
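For intuition, here is a minimal hash-chaining sketch (one of the generic ideas behind the evaluated methods, not the specific implementations): each host folds its partial result into a running digest, so tampering with any earlier result invalidates the final digest.

```python
import hashlib

def extend_chain(prev_digest: bytes, partial_result: bytes) -> bytes:
    # Each hop's digest commits to the entire history before it.
    return hashlib.sha256(prev_digest + partial_result).digest()

def verify(seed: bytes, results: list, final_digest: bytes) -> bool:
    d = seed
    for r in results:
        d = extend_chain(d, r)
    return d == final_digest

seed = b"origin-nonce"
results = [b"host1:price=10", b"host2:price=8"]
digest = seed
for r in results:
    digest = extend_chain(digest, r)

print(verify(seed, results, digest))                          # True
print(verify(seed, [b"host1:price=1", results[1]], digest))   # False: tampered
```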
Modeling Complex Forest Ecology in a Parallel Computing Infrastructure
Effective stewardship of forest ecosystems makes it imperative to measure, monitor, and predict the dynamic changes of forest ecology. Measuring and monitoring provide a picture of a forest's current state and the necessary data to formulate models for prediction. However, societal and natural events alter the course of a forest's development. A simulation environment that takes these events into account will facilitate forest management. In this thesis, we describe an efficient parallel implementation of a land cover use model, Mosaic, and discuss the development efforts to incorporate spatial interaction and succession dynamics into the model. To evaluate the performance of our implementation, an extensive set of simulation experiments was carried out using a dataset representing the H.J. Andrews Forest in the Oregon Cascades. Results indicate that a significant reduction in simulation execution time can be achieved with our parallel model as compared to uni-processor simulations.
Improved Approximation Algorithms for Geometric Packing Problems With Experimental Evaluation
Geometric packing problems are NP-complete problems that arise in VLSI design. In this thesis, we present two novel algorithms that use dynamic programming to compute exactly the maximum number of k x k squares of unit size that can be packed without overlap into a given n x m grid. The first algorithm was implemented and ran successfully on problems with large inputs of up to 1,000,000 nodes for different parameter values. A heuristic based on the second algorithm was also implemented. This heuristic is fast in practice, but may not always produce optimal solutions in theory. However, over a wide range of random data this version of the algorithm gives very good solutions very quickly and runs on problems of up to 100,000,000 nodes in a grid over different ranges of the variables. This version is also clearly superior to the first algorithm and has been shown to be very efficient in practice.
Performance Analysis of Wireless Networks with QoS Adaptations
The explosive demand for multimedia and fast transmission of continuous media on wireless networks means the simultaneous existence of traffic requiring different qualities of service (QoS). In this thesis, several efficient algorithms have been developed to offer several QoS levels to the end user. We first look at a request TDMA/CDMA protocol for supporting wireless multimedia traffic, where CDMA is laid over TDMA. Then we look at a hybrid push-pull algorithm for wireless networks and present a generalized performance analysis of the proposed protocol. The QoS factors considered include customer retrial rates due to user impatience and system timeouts, as well as different levels of priority and weights for mobile hosts. We have also looked at how customer impatience and system timeouts affect the QoS provided by several queuing and scheduling schemes, such as FIFO, priority, weighted fair queuing, and the application of the stretch-optimal algorithm to scheduling.
Higher Compression from the Burrows-Wheeler Transform with New Algorithms for the List Update Problem
Burrows-Wheeler compression is a three-stage process in which the data is transformed with the Burrows-Wheeler Transform, then transformed with Move-To-Front, and finally encoded with an entropy coder. Move-To-Front, Transpose, and Frequency Count are some of the many algorithms used on the List Update problem. In 1985, competitive analysis first showed the superiority of Move-To-Front over Transpose and Frequency Count for the List Update problem with arbitrary data. Earlier studies due to Bitner assumed independent identically distributed data, and showed that while Move-To-Front adapts to a distribution faster, incurring less overwork, the asymptotic costs of Frequency Count and Transpose are less. The improvements to Burrows-Wheeler compression this work covers are increases in the amount, not speed, of compression. Best x of 2x-1 is a new family of algorithms created to improve on Move-To-Front's processing of the output of the Burrows-Wheeler Transform, which resembles piecewise independent identically distributed data. Other algorithms analyzed here for both the middle stage of Burrows-Wheeler compression and the List Update problem, with respect to overwork, asymptotic cost, and competitive ratio, are several variations of Move One From Front and part of the randomized algorithm Timestamp. The Best x of 2x-1 family includes Move-To-Front, the part of Timestamp of interest, and Frequency Count. Lastly, a greedy choosing scheme, Snake, switches back and forth between two List Update algorithms as the amount of compression each achieves fluctuates, to increase overall compression. The Burrows-Wheeler Transform is based on sorting of contexts. The other improvements are better sorting orders, such as "aeioubcdf..." instead of standard alphabetical "abcdefghi..." on English text data, an algorithm for computing orders for any data, and Gray code sorting instead of standard sorting. Both techniques lessen the overwork incurred by whatever List Update algorithms are used by reducing the difference between adjacent sorted contexts.
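Move-To-Front itself, the middle stage that the Best x of 2x-1 family generalizes, is short enough to show in full; this is the textbook algorithm, with a standard 256-symbol table:

```python
def mtf_encode(data: bytes) -> list:
    table = list(range(256))
    out = []
    for b in data:
        i = table.index(b)
        out.append(i)
        table.pop(i)
        table.insert(0, b)    # move the just-seen symbol to the front
    return out

def mtf_decode(indices: list) -> bytes:
    table = list(range(256))
    out = bytearray()
    for i in indices:
        b = table.pop(i)
        out.append(b)
        table.insert(0, b)
    return bytes(out)

codes = mtf_encode(b"aaabbbbccc")   # runs collapse to zeros after the first hit
print(codes)                        # [97, 0, 0, 98, 0, 0, 0, 99, 0, 0]
print(mtf_decode(codes))            # b'aaabbbbccc'
```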
An Annotated Bibliography of Mobile Agents in Networks
The purpose of this thesis is to present a comprehensive colligation of applications of mobile agents in networks, and provide a baseline association of these systems. This work has been motivated by the fact that mobile agent systems have been deemed proficuous alternatives in system applications. Several mobile agent systems have been developed to provide scalable and cogent solutions in network-centric applications. This thesis examines some existing mobile agent systems in core networking areas, in particular, those of network and resource management, routing, and the provision of fault tolerance and security. The inherent features of these systems are discussed with respect to their specific functionalities. The applicability and efficacy of mobile agents are further considered in the specific areas mentioned above. Although an initial foray into a collation of this nature, the goal of this annotated bibliography is to provide a generic referential view of mobile agent systems in network applications.
Anchor Nodes Placement for Effective Passive Localization
Wireless sensor networks are composed of sensor nodes, which can monitor an environment and observe events of interest. These networks are applied in various fields including, but not limited to, environmental, industrial, and habitat monitoring. In many applications, the exact location of the sensor nodes is unknown after deployment. Localization is a process used to find a sensor node's positional coordinates, which is vital information. Localization is generally assisted by anchor nodes, which are also sensor nodes but with known locations. Anchor nodes are generally expensive and need to be optimally placed for effective localization. Passive localization is a localization technique in which the sensor nodes silently listen to global events like thunder sounds, seismic waves, lightning, etc. According to previous studies, the ideal location to place anchor nodes is on the perimeter of the sensor network. This may not be the case in passive localization, since the function of anchor nodes here differs from that of the anchor nodes used in other localization systems. I conduct extensive studies on positioning anchor nodes for effective localization. Several simulations are run in dense and sparse networks to find the proper positioning of anchor nodes. I show that, for effective passive localization, the optimal placement of the anchor nodes is at the center of the network, in such a way that no three anchor nodes are collinear. The greater the non-linearity, the better the localization. Localization for our network design proves better when anchor nodes are placed at right angles.
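The non-collinearity constraint can be checked directly: no anchor triple may form a (near-)degenerate triangle. A small sketch with illustrative coordinates:

```python
from itertools import combinations

def no_three_collinear(anchors, tol=1e-9):
    """True if no anchor triple is (near-)collinear."""
    for (x1, y1), (x2, y2), (x3, y3) in combinations(anchors, 3):
        twice_area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
        if twice_area <= tol:
            return False
    return True

print(no_three_collinear([(0, 0), (1, 0), (2, 0), (0, 1)]))  # False: a line
print(no_three_collinear([(0, 0), (1, 0), (0, 1), (1, 1)]))  # True
```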
Measuring Vital Signs Using Smart Phones
Smart phones today have become increasingly popular with the general public for their diverse abilities, such as navigation, social networking, and multimedia facilities, to name a few. These phones are equipped with high-end processors, high-resolution cameras, and built-in sensors such as an accelerometer, orientation sensor, and light sensor. According to a comScore survey, 25.3% of US adults use smart phones in their daily lives. Motivated by the capability of smart phones and their extensive usage, I focus on utilizing them for bio-medical applications. In this thesis, I present a new smart phone application to quantify vital signs such as heart rate, respiratory rate, and blood pressure with the help of built-in sensors. Using the camera and a microphone, I show how blood pressure and heart rate can be determined for a subject. People sometimes encounter minor situations like fainting, or fatal accidents like a car crash, at unexpected times and places. It would be useful to have a device that can measure all vital signs in such an event. The second part of this thesis demonstrates a new mode of communication for next-generation 9-1-1 calls. In this new architecture, the call-taker is able to control the multimedia elements in the phone from a remote location. This helps the call-taker or first responder gain better control over the situation. Transmission of the vital signs measured using the smart phone can be a life saver in critical situations. In today's voice-oriented 9-1-1 calls, the dispatcher first collects critical information (e.g., location, call-back number) from the caller and assesses the situation. Meanwhile, dispatchers constantly face a "60-second dilemma": within 60 seconds, they need to make a complicated but important decision on whether to dispatch and, if so, what to dispatch. Dispatchers often feel that they lack sufficient information to make a confident dispatch decision. The remote media control described in this system facilitates information acquisition and decision-making within the 60-second response window of 9-1-1 calls using new multimedia technologies.
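A toy illustration of the camera-based pulse idea: a fingertip held over the camera modulates frame brightness with each heartbeat, so counting brightness peaks per unit time approximates heart rate. The signal below is synthetic, and real measurement requires filtering that this sketch omits.

```python
import math

fps = 30.0
seconds = 10
pulse_hz = 1.2   # 72 bpm ground truth for the synthetic brightness signal
signal = [math.sin(2 * math.pi * pulse_hz * t / fps)
          for t in range(int(fps * seconds))]

# Count local maxima above a threshold as heartbeats.
peaks = sum(
    1 for i in range(1, len(signal) - 1)
    if signal[i - 1] < signal[i] >= signal[i + 1] and signal[i] > 0.5)

print("estimated heart rate:", peaks * 60 / seconds, "bpm")   # 72.0 bpm
```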
The Multipath Fault-Tolerant Protocol for Routing in Packet-Switched Communication Network
In order to provide improved service quality to applications, networks need to address the need for reliability of data delivery. Reliability can be improved by incorporating fault tolerance into network routing, wherein a set of multiple routes is used for routing between a given source and destination. This thesis proposes a new fault-tolerant protocol, called the Multipath Fault Tolerant Protocol for Routing (MFTPR), to improve the reliability of network routing services. The protocol is based on a multipath discovery algorithm, the Quasi-Shortest Multipath (QSMP), and is designed to work in conjunction with the routing protocol employed by the network. MFTPR improves upon the QSMP algorithm by finding more routes than QSMP, and also provides for maintenance of these routes in the event of failure of network components. In order to evaluate the resilience of a pair of paths to failure, this thesis proposes metrics that evaluate the non-disjointness of a pair of paths and measure the probability of simultaneous failure of these paths. The performance of MFTPR in finding alternate routes based on these metrics is analyzed through simulation.
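One simple way to quantify non-disjointness of a path pair, sketched here with an invented metric (the thesis's metrics may differ): the fraction of links the two paths share, which drives the probability that a single failure breaks both.

```python
def shared_link_fraction(path_a, path_b):
    """Fraction of links shared by two paths (0 = fully link-disjoint)."""
    def links(p):
        return {frozenset(e) for e in zip(p, p[1:])}
    la, lb = links(path_a), links(path_b)
    return len(la & lb) / min(len(la), len(lb))

primary = ["s", "a", "b", "t"]
backup  = ["s", "c", "b", "t"]
print(shared_link_fraction(primary, backup))  # 0.333...: they share link (b, t)
```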
Automated Classification of Emotions Using Song Lyrics
This thesis explores the classification of emotions in song lyrics, using automatic approaches applied to a novel corpus of 100 popular songs. I use crowdsourcing via Amazon Mechanical Turk to collect line-level emotion annotations for this collection of song lyrics. I then build classifiers that rely on textual features to automatically identify the presence of one or more of the following six Ekman emotions: anger, disgust, fear, joy, sadness, and surprise. I compare different classification systems and evaluate the performance of the automatic systems against the manual annotations. I also introduce a system that uses data collected from the social network Twitter. I use the Twitter API to collect a large corpus of tweets manually labeled by their authors for one of the six emotions of interest. I then compare the classification of emotions obtained when training on data automatically collected from Twitter versus data obtained through crowdsourced annotations.
A Programming Language For Concurrent Processing
This thesis is a proposed solution to the problem of including an effective interrupt mechanism in the set of concurrent-processing primitives of a block-structured programming language or system. The proposed solution is presented in the form of a programming language definition and model. The language is called TRIPLE.
DirectShow Approach to Low-Cost Multimedia Security Surveillance System
In response to recent intensive needs for civilian security surveillance, both full and compact versions of a Multimedia Security Surveillance (MSS) system have been built. The Microsoft DirectShow technology was applied in implementing the multimedia stream-processing module. Through the Microsoft Windows Driver Model interface, IEEE 1394-enabled Fire-i cameras, chosen as external sensors, are integrated with a PC-based continuous storage unit. The MSS application also allows multimedia broadcasting and remote control. A cost analysis is included.
Efficient Algorithms and Framework for Bandwidth Allocation, Quality-of-Service Provisioning and Location Management in Mobile Wireless Computing
The fusion of computers and communications has promised to herald the age of the information superhighway over high-speed communication networks, where the ultimate goal is to enable a multitude of users at any place to access information from anywhere and at any time. This, in a nutshell, is the goal envisioned by Personal Communication Services (PCS) and Xerox's ubiquitous computing. In view of the remarkable growth of mobile communication users in the last few years, the radio frequency spectrum allocated by the FCC (Federal Communications Commission) to this service is still very limited, and the usable bandwidth is far less than the expected demand, particularly in view of the emergence of next-generation wireless multimedia applications like video-on-demand, WWW browsing, and traveler information systems. Proper management of the available spectrum is necessary not only to accommodate these high-bandwidth applications, but also to alleviate problems due to the sudden explosion of traffic in so-called hot cells. In this dissertation, we have developed simple load balancing techniques to cope with the problem of tele-traffic overloads in one or more hot cells in the system. The objective is to ease the high channel demand in hot cells by borrowing channels from suitable cold cells and by proper assignment (or re-assignment) of the channels among the users. We also investigate possible ways of improving system capacity by rescheduling bandwidth in the case of wireless multimedia traffic. In our proposed scheme, traffic using multiple channels releases one or more channels to increase the carried traffic or throughput in the system. Two orthogonal QoS parameters, called carried traffic and bandwidth degradation, are identified, and a cost function describing the total revenue earned by the system from a bandwidth degradation and call admission policy is formulated. A channel sharing scheme is proposed for co-existing real-time and non-real-time traffic and analyzed using a Markov modulated Poisson process (MMPP) based queueing model. The location management problem in mobile computing deals with the combined management of location updates and paging in the network, both of which consume scarce network resources like bandwidth and CPU cycles. An easily implementable location update scheme is developed which considers per-user mobility patterns on top of the conventional location area based approach and computes an update strategy for each user by minimizing the average location management cost. The cost optimization problem is elegantly solved using a genetic algorithm.
Extensions to Jinni Mobile Agent Architecture
We extend the Jinni mobile agent architecture with a multicast network transport layer, an agent-to-agent delegation mechanism, and a reflection-based Prolog-to-Java interface. To ensure that our agent infrastructure runs efficiently, independently of router-level multicast support, we describe a blackboard-based algorithm for locating a randomly roaming agent. As part of the agent-to-agent delegation mechanism, we describe an alternative to the code-fetching mechanism that provides stronger mobility for mobile agents with less network overhead. In the context of direct and reflection-based extension mechanisms for Jinni, we describe the design and implementation of a reflection-based Prolog-to-Java interface. The presence of subtyping and method overloading makes finding the most specific method corresponding to a Prolog call pattern fairly difficult. We describe a run-time algorithm which provides accurate handling of overloaded methods beyond the limitations of Java's reflection package.
A Unifying Version Model for Objects and Schema in Object-Oriented Database System
A number of different versioning models have been proposed. The research in this area can be divided into two categories: object versioning and schema versioning. In this dissertation, both problem domains are considered as a single unit. This dissertation describes a unifying version model (UVM) for maintaining changes to both objects and schema. UVM handles schema versioning operations by using object versioning techniques. The result is that the UVM allows the OODBMS to be much smaller than previous systems. Also, programmers need to know only one set of versioning operations, thus reducing the learning time by half. This dissertation shows that UVM is a simple but semantically sound and powerful version model for both objects and schema.
Symplectic Integration of Nonseparable Hamiltonian Systems
Numerical methods are usually necessary in solving Hamiltonian systems since there is often no closed-form solution. By utilizing a general property of Hamiltonians, namely the symplectic property, all of the qualities of the system may be preserved for indefinitely long integration times because all of the integral (Poincaré) invariants are conserved. This allows for more reliable results and frequently leads to significantly shorter execution times as compared to conventional methods. The resonant triad Hamiltonian with one degree of freedom is focused upon for most of the numerical tests because of its difficult nature and, moreover, because analytical results exist with which useful comparisons can be made.
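A standard example of such a method is the implicit midpoint rule, which is symplectic and, unlike leapfrog, applies to nonseparable Hamiltonians. The sketch below integrates a pendulum (a stand-in test case, not the resonant triad system) and checks that the energy error stays bounded over many steps.

```python
import numpy as np

def rhs(y):
    """Hamiltonian vector field for a pendulum, H(q, p) = p**2 / 2 - cos(q)."""
    q, p = y
    return np.array([p, -np.sin(q)])

def implicit_midpoint_step(y, h, iters=8):
    # Solve y1 = y + h * f((y + y1) / 2) by fixed-point iteration.
    y1 = y + h * rhs(y)               # explicit Euler as the initial guess
    for _ in range(iters):
        y1 = y + h * rhs(0.5 * (y + y1))
    return y1

y = np.array([1.0, 0.0])
H0 = 0.5 * y[1] ** 2 - np.cos(y[0])
for _ in range(10_000):
    y = implicit_midpoint_step(y, h=0.05)
H = 0.5 * y[1] ** 2 - np.cos(y[0])
print(f"energy error after 10,000 steps: {abs(H - H0):.2e}")  # stays bounded
```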
Optimal Access Point Selection and Channel Assignment in IEEE 802.11 Networks
Designing 802.11 wireless networks includes two major components: selection of access points (APs) in the demand areas and assignment of radio frequencies to each AP. Coverage and capacity are key issues when placing APs in a demand area. APs need to cover all users. A user is considered covered if the power received from its corresponding AP is greater than a given threshold. Moreover, from a capacity standpoint, APs need to provide a certain minimum bandwidth to users located in the coverage area. A major challenge in designing wireless networks is the frequency assignment problem. The 802.11 wireless LANs operate in the unlicensed ISM frequency band, and all APs share the same frequencies. As a result, as 802.11 APs become widely deployed, they start to interfere with each other and degrade network throughput. In consequence, efficient assignment of channels becomes necessary to avoid and minimize interference. In this work, an optimal AP selection method was developed by balancing traffic load. An optimization problem was formulated that minimizes heavy congestion. As a result, APs in wireless LANs will have well-distributed traffic loads, which maximizes the throughput of the network. The channel assignment algorithm was designed by minimizing channel interference between APs. The optimization algorithm assigns channels in such a way that co-channel and adjacent-channel interference are minimized, resulting in higher throughput.
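A greedy sketch of the channel-assignment objective, assuming the three non-overlapping 2.4 GHz channels and an invented interference graph; the thesis formulates this as an optimization problem rather than a one-pass heuristic:

```python
CHANNELS = [1, 6, 11]   # the non-overlapping 2.4 GHz channels

def assign_channels(neighbors):
    """Greedy pass: give each AP the channel least used by its neighbors."""
    channel = {}
    for ap in sorted(neighbors, key=lambda a: -len(neighbors[a])):
        used = [channel.get(n) for n in neighbors[ap]]
        channel[ap] = min(CHANNELS, key=used.count)
    return channel

interference = {   # invented AP interference graph
    "AP1": ["AP2", "AP3"],
    "AP2": ["AP1", "AP3"],
    "AP3": ["AP1", "AP2", "AP4"],
    "AP4": ["AP3"],
}
print(assign_channels(interference))
# {'AP3': 1, 'AP1': 6, 'AP2': 11, 'AP4': 6}: neighbors get distinct channels
```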
Exploring Trusted Platform Module Capabilities: A Theoretical and Experimental Study
Trusted platform modules (TPMs) are hardware modules, bound to a computer's motherboard, that are being included in many desktops and laptops. Augmenting computers with these hardware modules adds powerful functionality in distributed settings, allowing us to reason about the security of these systems in new ways. In this dissertation, I study the functionality of TPMs from a theoretical as well as an experimental perspective. On the theoretical front, I leverage various features of TPMs to construct applications like random oracles that are impossible to implement in a standard model of computation. Apart from random oracles, I construct a new cryptographic primitive which is essentially a non-interactive form of the standard cryptographic primitive of oblivious transfer. I apply this new primitive to secure mobile agent computations, where interaction between various entities is typically required to ensure security. I prove these constructions are secure using standard cryptographic techniques and assumptions. To test the practicability of these constructions and their applications, I performed an experimental study, both on an actual TPM and on a software TPM simulator which has been enhanced to reflect the timings of a real TPM. This allowed me to benchmark the performance of the applications and test the feasibility of the proposed extensions to standard TPMs. My tests also show that these constructions are practical.
General Purpose Programming on Modern Graphics Hardware
I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming.
Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.
This research addresses the problem of automatic keyphrase extraction from large documents and back-of-the-book indexing. The potential benefits of automating this process are far-reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back-of-the-book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system against the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back-of-the-book indexing system based on the keyphrase extraction system was shown to produce back-of-the-book indexes closely resembling those created by human experts.
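The informativeness idea can be illustrated with a chi-square-style score: a phrase is informative when its document frequency far exceeds what its background corpus frequency predicts. The counts below are invented for the example:

```python
def chi_square_score(observed, doc_terms, background_rate):
    """How far does the in-document count exceed the background expectation?"""
    expected = doc_terms * background_rate
    return (observed - expected) ** 2 / expected

# A specific phrase: 14 hits in a 5,000-term document vs ~2 expected.
print(chi_square_score(14, 5000, 2 / 5000))    # 72.0: likely a keyphrase
# A common word: 15 hits where ~14 were expected anyway.
print(chi_square_score(15, 5000, 14 / 5000))   # ~0.07: uninformative
```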
Arithmetic Computations and Memory Management Using a Binary Tree Encoding of Natural Numbers
Two applications of a binary tree data type based on a simple pairing function (a bijection between natural numbers and pairs of natural numbers) are explored. First, the tree is used to encode natural numbers, and algorithms that perform basic arithmetic computations are presented along with formal proofs of their correctness. Second, using this "canonical" representation as a base type, algorithms for encoding and decoding additional isomorphic data types for other mathematical constructs (sets, sequences, etc.) are also developed. An experimental application to a memory management system is constructed and explored using these isomorphic types. A practical analysis of this system's runtime complexity and space savings is provided, along with a proof-of-concept framework for both applications of the binary tree type, in the Java programming language.
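One classic pairing bijection makes the encoding concrete; the thesis may use a different pairing function, but the shape of the construction is the same: 0 is the leaf, and every n > 0 splits into the pair encoded by its binary representation.

```python
def pair(x, y):
    """Bijection N x N -> positive naturals: n = 2**x * (2*y + 1)."""
    return 2 ** x * (2 * y + 1)

def unpair(n):
    x = 0
    while n % 2 == 0:        # x = number of trailing zero bits
        n //= 2
        x += 1
    return x, (n - 1) // 2

def to_tree(n):
    # 0 encodes the leaf; any n > 0 splits into two strictly smaller numbers.
    return () if n == 0 else tuple(to_tree(m) for m in unpair(n))

def from_tree(t):
    return 0 if t == () else pair(from_tree(t[0]), from_tree(t[1]))

assert all(from_tree(to_tree(n)) == n for n in range(1000))
print(to_tree(42))   # a unique binary tree for every natural number
```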
The Design Of A Benchmark For Geo-stream Management Systems
The recent growth in sensor technology allows easier information gathering in real time as sensors have grown smaller, more accurate, and less expensive. The resulting data is often in a geo-stream format: continuously changing input with a spatial extent. Researchers developing geo-stream management systems (GSMS) require a benchmark system for evaluation, which is currently lacking. This thesis presents GSMark, a benchmark for evaluating GSMSs. GSMark provides a data generator that creates a combination of synthetic and real geo-streaming data, a workload simulator to present the data to the GSMS as a data stream, and a set of benchmark queries that evaluate typical GSMS functionality and query performance. In particular, GSMark generates both moving points and evolving spatial regions, two fundamental data types for a broad range of geo-stream applications, along with geo-streaming queries on this data.
A Netcentric Scientific Research Repository
The Internet and networks in general have become essential tools for disseminating information. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric scientific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Furthermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis operations into a repository. This technique allows the system to leverage the large number of existing tools by wrapping them with a Web service and registering the service with the repository. Metadata associated with Web services was enhanced to allow this feature to be included. In addition, an improved binary to text encoding scheme was developed to reduce the size overhead for sending large scientific data files via XML messages used in Web services. (3) Integrated image analysis operations with SQL. This technique allows for images to be stored and managed conveniently in a relational database. SQL supplemented with map algebra operations is used to select and perform operations on sets of images.
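The motivation for point (2) is easy to demonstrate: standard Base64, the usual way to carry binary data in XML, inflates payloads by about a third, which is the overhead an improved binary-to-text encoding attacks.

```python
import base64
import os

payload = os.urandom(3_000_000)      # stand-in for a large scientific image
encoded = base64.b64encode(payload)
overhead = (len(encoded) / len(payload) - 1) * 100
print(f"raw: {len(payload):,} bytes, base64: {len(encoded):,} bytes "
      f"({overhead:.0f}% overhead)")
# raw: 3,000,000 bytes, base64: 4,000,000 bytes (33% overhead)
```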
Mobile agent security through multi-agent cryptographic protocols.
An increasingly promising and widespread topic of research in distributed computing is the mobile agent paradigm: code travelling and performing computations on remote hosts in an autonomous manner. One of the biggest challenges faced by this new paradigm is security. The issue of protecting sensitive code and data carried by a mobile agent against tampering from a malicious host is particularly hard but important. Based on secure multi-party computation, a recent research direction shows the feasibility of a software-only solution to this problem, which some researchers had previously deemed impossible. The best result prior to this dissertation is a single-agent protocol which requires the participation of a trusted third party. Our research employs multi-agent protocols to eliminate the trusted third party, resulting in a protocol with minimal trust assumptions. This dissertation presents one of the first formal definitions of secure mobile agent computation, in which the privacy and integrity of the agent code and data, as well as the data provided by the host, are all protected. We present secure protocols for mobile agent computation against static, semi-honest, or malicious adversaries without relying on any third party or trusting any specific participant in the system. The security of our protocols is formally proven through standard proof techniques and according to our formal definition of security. Our second result is a more practical agent protocol with strong security against most real-world host attacks. The security features are carefully analyzed, and the practicality is demonstrated through implementation and experimental study on a real-world mobile agent platform. All these protocols rely heavily on well-established cryptographic primitives, such as encrypted circuits, threshold decryption, and oblivious transfer. Our study of these tools yields new contributions to the general field of cryptography. In particular, we correct a well-known construction of the encrypted circuit and give one of the first provably secure implementations of the encrypted circuit.
A general purpose semantic parser using FrameNet and WordNet®.
Syntactic parsing is one of the best understood language processing applications. Since language and grammar have been formally defined, it is easy for computers to parse the syntactic structure of natural language text. Does meaning have structure as well? If it does, how can we analyze that structure? Previous systems rely on a one-to-one correspondence between syntactic rules and semantic rules, but such systems can only be applied to limited fragments of English. In this thesis, we propose a general-purpose shallow semantic parser which utilizes a semantic network (WordNet) and a frame dataset (FrameNet). Semantic relations recognized by the parser are based on how human beings represent knowledge of the world. Parsing semantic structure allows semantic units and constituents to be accessed and processed in a more meaningful way than syntactic parsing, moving the automation of understanding natural language text to a higher level.
Evaluating the Scalability of SDF Single-chip Multiprocessor Architecture Using Automatically Parallelizing Code
Advances in integrated circuit technology continue to provide more and more transistors on a chip. Computer architects are faced with the challenge of finding the best way to translate these resources into high performance. The challenge in the design of next-generation CPUs (central processing units) lies not in trying to use up the silicon area, but in finding smart ways to make use of the wealth of transistors now available. In addition, the next-generation architecture should offer high throughput performance, scalability, modularity, and low energy consumption, instead of an architecture that is suitable for only one class of applications or users, or that only emphasizes faster clock rates. A program exhibits different types of parallelism: instruction-level parallelism (ILP), thread-level parallelism (TLP), or data-level parallelism (DLP). Likewise, architectures can be designed to exploit one or more of these types of parallelism. It is generally not possible to design architectures that can take advantage of all three types of parallelism without using very complex hardware structures and complex compiler optimizations. We present the state-of-the-art SDF (scheduled dataflow) architecture, which exploits as much TLP as the application supplies. We implement an SDF single-chip multiprocessor constructed from simpler processors and execute automatically parallelized applications on the single-chip multiprocessor. SDF has many desirable features, such as high throughput, scalability, and low power consumption, which meet the requirements of next-generation CPU design. Compared with superscalar, VLIW (very long instruction word), and SMT (simultaneous multithreading) architectures, the experimental results show that for applications with very little parallelism SDF is comparable to the other architectures, while for applications with large amounts of parallelism SDF outperforms them.
A Comparison of Agent-Oriented Software Engineering Frameworks and Methodologies
Agent-oriented software engineering (AOSE) covers issues in developing systems with software agents. There are many techniques, mostly agent-oriented and object-oriented, ready to be chosen as building blocks for creating agent-based systems. Several AOSE methodologies have been proposed to give engineers guidelines on how these elements are combined so that agents achieve the overall system goals. Although these solutions are promising, most of them are designed in an ad hoc manner without fully following the software development life cycle, and they lack examination of agent-oriented features. To address these issues, we investigated state-of-the-art techniques and AOSE methodologies. By examining them in different respects, we comment on their strengths and weaknesses. Toward a formal study, a comparison framework has been set up covering four aspects: concepts and properties, notations and modeling techniques, process, and pragmatics. Under these criteria, we conducted the comparison at both an overview and a detailed level. The comparison helped us, through empirical and analytical study, to inspect the issues of how an ideal agent-based system should be formed.
Intelligent Memory Manager: Towards improving the locality behavior of allocation-intensive applications.
Dynamic memory management required by allocation-intensive (i.e., object-oriented and linked-data-structured) applications has led to a large number of research trends. Memory performance due to cache misses in these applications continues to lag in terms of execution cycles as the ever-increasing CPU-memory speed gap continues to grow. Sophisticated prefetching techniques, data relocation, and multithreaded architectures have tried to address memory latency. These techniques are not completely successful since they require either extra hardware/software in the system or special properties in the applications. Software needed for prefetching and data relocation strategies, aimed at improving cache performance, pollutes the cache so that the technique itself becomes counter-productive. On the other hand, the extra hardware complexity needed in multithreaded architectures decelerates the CPU's clock, since "simpler is faster." This dissertation, directed at finding the cause of the poor locality behavior of allocation-intensive applications, studies allocators and their impact on the cache performance of these applications. Our study concludes that service functions in general, and memory management functions in particular, entangle with the application's code and become the major cause of cache pollution. In this dissertation, we present a novel technique that transfers the allocation and de-allocation functions entirely to a separate processor residing on-chip with DRAM (the Intelligent Memory Manager). Our empirical results show that, on average, 60% of the cache misses caused by allocation and de-allocation service functions are eliminated using our technique.
Embedded monitors for detecting and preventing intrusions in cryptographic and application protocols.
There are two main approaches to intrusion detection: signature-based and anomaly-based. Signature-based detection employs pattern matching to match attack signatures with observed data, making it ideal for detecting known attacks. However, it cannot detect unknown attacks for which no signature is available. Anomaly-based detection builds a profile of normal system behavior to detect known and unknown attacks as behavioral deviations. However, it has the drawback of a high false-alarm rate. In this thesis, we describe our anomaly-based IDS designed for detecting intrusions in cryptographic and application-level protocols. Our system has several unique characteristics, such as the ability to monitor cryptographic protocols and application-level protocols embedded in encrypted sessions, a very lightweight monitoring process, and the ability to react to protocol misuse by modifying protocol responses directly.
Graph-Based Keyphrase Extraction Using Wikipedia
Keyphrases describe a document in a coherent and simple way, giving the prospective reader a way to quickly determine whether the document satisfies their information needs. With the pervasion of huge amounts of information on the Web, and only a small fraction of documents having keyphrases assigned, there is a definite need for automatic keyphrase extraction systems. Typically, a document written by a human develops around one or more general concepts or sub-concepts. These concepts or sub-concepts should be structured and semantically related to each other, so that they can form a meaningful representation of the document. Considering the fact that the phrases or concepts in a document are related to each other, a new approach for keyphrase extraction is introduced that exploits the semantic relations in the document. For measuring the semantic relations between concepts or sub-concepts in the document, I present a comprehensive study aimed at using collaboratively constructed semantic resources like Wikipedia and its link structure. In particular, I introduce a graph-based keyphrase extraction system that exploits the semantic relations in the document together with features such as term frequency. I evaluated the proposed system using novel measures, and the results obtained compare favorably with previously published results on established benchmarks.
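A toy version of graph-based ranking in this spirit: score candidate phrases by power iteration over a relatedness graph. The thesis derives edge weights from Wikipedia's link structure; the weights and phrases below are invented.

```python
# candidate phrase -> {related phrase: relatedness weight}, invented numbers
edges = {
    "machine learning": {"neural network": 0.8, "training data": 0.6},
    "neural network":   {"machine learning": 0.8, "training data": 0.5},
    "training data":    {"machine learning": 0.6, "neural network": 0.5},
    "page layout":      {"training data": 0.1},
}

def rank(edges, d=0.85, iters=50):
    """Weighted PageRank-style scoring by power iteration."""
    score = {v: 1.0 for v in edges}
    for _ in range(iters):
        for v in edges:
            incoming = sum(w[v] / sum(w.values()) * score[u]
                           for u, w in edges.items() if v in w)
            score[v] = (1 - d) + d * incoming
    return sorted(score.items(), key=lambda kv: -kv[1])

for phrase, s in rank(edges):
    print(f"{s:.3f}  {phrase}")   # well-connected phrases rank highest
```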
Group-EDF: A New Approach and an Efficient Non-Preemptive Algorithm for Soft Real-Time Systems
Hard real-time systems in robotics, space and military missions, and control devices are specified with stringent and critical time constraints. On the other hand, soft real-time applications arising from multimedia, telecommunications, Internet web services, and games are specified with more lenient constraints. Real-time systems can also be distinguished in terms of their implementation into preemptive and non-preemptive systems. In preemptive systems, tasks are often preempted by higher-priority tasks. Non-preemptive systems are gaining interest for implementing soft real-time applications on multithreaded platforms. In this dissertation, I propose a new algorithm that uses a two-level scheduling strategy for scheduling non-preemptive soft real-time tasks. Our goal is to improve the success ratios of the well-known earliest deadline first (EDF) approach when the load on the system is very high, and to improve the overall performance in both underloaded and overloaded conditions. Our approach, known as group-EDF (gEDF), is based on dynamic grouping of tasks with deadlines that are very close to each other, and on using a shortest job first (SJF) technique to schedule tasks within the group. I believe that grouping tasks dynamically with similar deadlines and utilizing secondary criteria, such as minimizing the total execution time, can lead to new and more efficient real-time scheduling algorithms. I present results comparing gEDF with other real-time algorithms, including EDF, best-effort, and guarantee schemes, using randomly generated tasks with varying execution times, release times, deadlines, and tolerances to missing deadlines, under varying workloads. Furthermore, I implemented the gEDF algorithm in the Linux kernel and evaluated it for scheduling real applications.
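The scheduling decision itself is compact enough to sketch; this follows the description in the abstract (EDF between groups, SJF within a group), with the group window parameterization being my assumption:

```python
def gedf_pick(ready, now, group_ratio=0.1):
    """ready: list of (name, exec_time, deadline); pick the next task to run."""
    earliest = min(t[2] for t in ready)
    window = earliest + group_ratio * (earliest - now)  # deadline band
    group = [t for t in ready if t[2] <= window]        # near-equal deadlines
    return min(group, key=lambda t: t[1])               # SJF inside the group

tasks = [("a", 9, 100), ("b", 2, 104), ("c", 1, 300)]
print(gedf_pick(tasks, now=0))
# ('b', 2, 104): plain EDF would run 'a', but 'b' has a near-equal
# deadline and a much shorter execution time.
```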
Modeling Infectious Disease Spread Using Global Stochastic Field Simulation
Susceptibles-infectives-removals (SIR) and its derivatives are the classic mathematical models for the study of infectious diseases in epidemiology. In order to model and simulate epidemics of an infectious disease, a global stochastic field simulation paradigm (GSFS) is proposed, which incorporates geographically and demographically based interactions. The interaction measure between regions is a function of population density and geographical distance, and has been extended to include demographic and migratory constraints. The progression of diseases using GSFS is analyzed, and GSFS exhibits behavior similar to the SIR model when using the geographic information systems (GIS) gravity model for interactions. The limitations of the SIR and similar models, which assume a homogeneous population with uniform mixing, are addressed by the GSFS model. The GSFS model is oriented to heterogeneous populations, and can incorporate interactions based on geography, demography, environment, and migration patterns. The progression of diseases can be modeled at higher levels of fidelity using the GSFS model, which facilitates optimal deployment of public health resources for prevention, control, and surveillance of infectious diseases.
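For reference, the homogeneous-mixing SIR baseline that GSFS is compared against can be written in a few lines; GSFS effectively replaces the uniform-mixing term below with geographically and demographically weighted interactions.

```python
def sir(S, I, R, beta, gamma, days, dt=0.1):
    N = S + I + R
    for _ in range(int(days / dt)):
        new_inf = beta * S * I / N * dt   # uniform-mixing infection term
        new_rec = gamma * I * dt
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    return S, I, R

S, I, R = sir(S=9990, I=10, R=0, beta=0.4, gamma=0.1, days=120)
print(f"susceptible {S:.0f}, infective {I:.0f}, removed {R:.0f}")
```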
Impact of actual interference on capacity and call admission control in a CDMA network.
An overwhelming number of models in the literature use average inter-cell interference for the calculation of the capacity of a Code Division Multiple Access (CDMA) network. The advantage gained in terms of simplicity by using such models comes at the cost of rendering the exact location of a user within a cell irrelevant. We calculate the actual per-user interference and analyze the effect of user distribution within a cell on the capacity of a CDMA network. We show that even though the capacity obtained using average interference is a good approximation to the capacity calculated using actual interference for a uniform user distribution, the deviation can be tremendously large for non-uniform user distributions. Call admission control (CAC) algorithms are responsible for efficient management of a network's resources while guaranteeing quality of service and grade of service, i.e., accepting the maximum number of calls without affecting the quality of service of calls already present in the network. We design and implement global and local CAC algorithms, and through simulations compare their network throughput and blocking probabilities for varying mobility scenarios. We show that even though our global CAC is better at resource management, the lack of substantial gain in network throughput and the exponential increase in complexity make our optimized local CAC algorithm a much better choice for a given traffic distribution profile.
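A toy computation shows why user distribution matters: under a power-law path-loss model, the interference a base station sees from a fixed number of users changes dramatically when users cluster near the cell edge instead of spreading uniformly. The geometry and exponent here are illustrative.

```python
import math
import random

def total_interference(users, bs=(0.0, 0.0), alpha=4):
    """Sum of received powers at a base station under d**-alpha path loss."""
    return sum(1 / math.dist(u, bs) ** alpha for u in users)

random.seed(1)
uniform = [(random.uniform(1.0, 3.0), random.uniform(-1, 1))
           for _ in range(200)]
edge_clustered = [(random.uniform(1.0, 1.4), random.uniform(-1, 1))
                  for _ in range(200)]

print(f"uniform users:        {total_interference(uniform):8.1f}")
print(f"edge-clustered users: {total_interference(edge_clustered):8.1f}")
# Same user count, very different actual interference at the neighboring cell.
```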
Bayesian Probabilistic Reasoning Applied to Mathematical Epidemiology for Predictive Spatiotemporal Analysis of Infectious Diseases
Probabilistic reasoning under uncertainty is well suited to the analysis of disease dynamics. The stochastic nature of disease progression is modeled by applying the principles of Bayesian learning, which predicts the disease progression, including prevalence and incidence, for a given geographic region and demographic composition. Public health resources, prioritized by the risk levels of the population, can then efficiently minimize the disease spread and curtail the epidemic as early as possible. A Bayesian network representing an outbreak of influenza and pneumonia in one geographic region is ported to a new region with a different demographic composition, and upon analysis the corresponding prevalence of influenza and pneumonia among the different demographic subgroups is inferred for the new region. Bayesian reasoning coupled with a disease timeline is used to reverse engineer an influenza outbreak for a given geographic and demographic setting, and the temporal flow of the epidemic among the different sections of the population is analyzed to identify the corresponding risk levels. In comparison to uniformly spread vaccination, prioritizing the limited vaccination resources to the higher-risk groups results in relatively lower influenza prevalence. HIV incidence in Texas from 1989 to 2002 is analyzed using demography-based epidemic curves. Dynamic Bayesian networks are integrated with probability distributions of HIV surveillance data, coupled with census population data, to estimate the proportion of HIV incidence among the different demographic subgroups; this demography-based risk analysis reveals a varied spectrum of HIV risk across the subgroups. Finally, a methodology using hidden Markov models is introduced that enables investigation of the impact of social behavioral interactions on the incidence and prevalence of infectious diseases; it is presented in the context of simulated influenza outbreak data. Probabilistic reasoning analysis enhances the understanding of disease progression and identifies the critical points of surveillance, control, and prevention.
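The core inference step can be illustrated with an elementary Bayesian update in Python; the subgroups and numbers below are purely illustrative, and the dissertation's actual networks are far richer.

```python
def bayes_posterior(priors, likelihoods):
    """Elementary Bayesian update of the kind underlying the analysis
    above: priors maps each demographic subgroup to P(subgroup) and
    likelihoods to P(case | subgroup); the result is P(subgroup | case),
    i.e. how observed cases redistribute risk across subgroups."""
    joint = {g: priors[g] * likelihoods[g] for g in priors}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}

# Two hypothetical subgroups with equal priors but unequal case
# likelihoods end up with unequal posterior risk:
print(bayes_posterior({'A': 0.5, 'B': 0.5}, {'A': 0.02, 'B': 0.06}))
# {'A': 0.25, 'B': 0.75}
```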
An Integrated Architecture for Ad Hoc Grids
Extensive research has been conducted by the grid community to enable large-scale collaborations in pre-configured environments. Grid collaborations can vary in scale and motivation, resulting in a coarse classification of grids: national grid, project grid, enterprise grid, and volunteer grid. Despite the differences in scope and scale, all the traditional grids in practice share some common assumptions: they support mutually collaborative communities, adopt centralized control for membership, and assume a well-defined, non-changing collaboration. To support grid applications that do not conform to these assumptions, we propose the concept of ad hoc grids. In the context of this research, we propose a novel architecture for ad hoc grids that integrates a suite of component frameworks. Specifically, our architecture combines a community management framework, security framework, abstraction framework, quality of service framework, and reputation framework. The overarching objective of our integrated architecture is to support a variety of grid applications in a self-controlled fashion with the help of a self-organizing ad hoc community. We introduce mechanisms in our architecture that successfully isolate malicious elements from the community, inherently improving the quality of grid services and extracting deterministic quality assurances from the underlying infrastructure. We also emphasize the technology independence of our architecture, thereby offering the requisite platform for technology interoperability. The feasibility of the proposed architecture is verified with a high-quality ad hoc grid implementation. Additionally, we have analyzed the performance and behavior of ad hoc grids with respect to several control parameters.
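As one illustration of how a reputation framework might isolate malicious elements, here is a hypothetical Python sketch of a feedback-driven reputation update; the rule, the parameter values, and the threshold are all assumptions for illustration, not the architecture's actual mechanism.

```python
def update_reputation(rep, member, satisfied, weight=0.1, threshold=0.3):
    """One plausible reputation-update rule (hypothetical): an
    exponential moving average of per-transaction feedback. Members
    whose score falls below `threshold` would be excluded from the
    self-organizing community. All parameter values are illustrative."""
    score = rep.get(member, 0.5)  # neutral starting reputation
    rep[member] = (1 - weight) * score + weight * (1.0 if satisfied else 0.0)
    return rep[member] >= threshold  # False -> isolate this member
```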
Measuring Semantic Relatedness Using Salient Encyclopedic Concepts
While pragmatics, through its integration of situational awareness and real-world knowledge, offers a high level of analysis that is suitable for true interpretation of natural dialogue, semantics, on the other hand, represents a lower yet more tractable and affordable linguistic level of analysis given current technologies. Generally, the understanding of semantic meaning in the literature has revolved around J. R. Firth's famous dictum, "You shall know a word by the company it keeps." In this thesis we investigate the role of context constituents in decoding the semantic meaning of the surrounding context; specifically, we probe the role of salient concepts, defined as content-bearing expressions that afford encyclopedic definitions, as a suitable source of semantic clues for an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model, and apply it to measure semantic relatedness between textual pairs, whether they are words, sentences, or paragraphs. Moreover, we explore the abstraction of semantics across languages and use our findings to build a novel multilingual semantic relatedness model that exploits information acquired from various languages. We demonstrate the effectiveness and superiority of our monolingual and multilingual models through a comprehensive set of evaluations on specialized synthetic datasets for semantic relatedness, as well as real-world applications such as paraphrase detection and short answer grading. Our work represents a novel approach to integrating world knowledge into current semantic models, and a means to cross the language boundary for a better and more robust semantic relatedness representation, thus opening the door for an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines.
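A drastically simplified Python sketch of the general idea follows: texts are mapped into vectors over encyclopedic concepts and compared by cosine similarity. The `concept_index` mapping is a hypothetical stand-in for the thesis's salient-concept extraction.

```python
import math
from collections import Counter

def concept_vector(text, concept_index):
    """Map a text to a weighted bag of encyclopedic concepts.
    concept_index is a hypothetical word -> concepts mapping (e.g.
    mined from an encyclopedia); a simplification of the full model."""
    vec = Counter()
    for word in text.lower().split():
        for concept in concept_index.get(word, ()):
            vec[concept] += 1
    return vec

def relatedness(v1, v2):
    """Cosine similarity between two concept vectors."""
    dot = sum(v1[c] * v2[c] for c in v1 if c in v2)
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```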
Performance Comparison of Data Distribution Management Strategies in Large-Scale Distributed Simulation
Data distribution management (DDM) is a High Level Architecture/Run-time Infrastructure (HLA/RTI) service that manages the distribution of state updates and interaction information in large-scale distributed simulations. The key to efficient DDM is to limit and control the volume of data exchanged during the simulation, relaying data only to those hosts that require it. This thesis focuses on different DDM implementations and strategies, analyzing three DDM methods: fixed grid-based, dynamic grid-based, and region-based. It also examines the use of multi-resolution modeling with various DDM strategies and the performance effects of aggregation/disaggregation under these strategies. Running numerous federation executions, I simulate four different scenarios on a cluster of workstations with a mini-RTI Kit framework and propose a set of benchmarks for comparing the DDM schemes. The goals of this work are to determine the most efficient model for applying each DDM scheme, discover the limitations of the scalability of the various DDM methods, evaluate the effects of aggregation/disaggregation on performance and resource usage, and present accepted benchmarks for use in future research.
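The fixed grid-based method, the simplest of the three, can be sketched in a few lines of Python: publisher and subscriber regions are mapped onto grid cells, and data is relayed only when the cell sets intersect. The rectangular region representation below is an assumption for illustration.

```python
def overlapped_cells(region, cell_size):
    """Fixed grid-based DDM matching, sketched: map an axis-aligned
    region (x0, y0, x1, y1) onto the set of grid cells it overlaps."""
    x0, y0, x1, y1 = region
    return {(cx, cy)
            for cx in range(int(x0 // cell_size), int(x1 // cell_size) + 1)
            for cy in range(int(y0 // cell_size), int(y1 // cell_size) + 1)}

def should_relay(update_region, subscription_region, cell_size):
    """An update is relayed only to federates whose subscription
    region shares at least one grid cell with the update region."""
    return bool(overlapped_cells(update_region, cell_size) &
                overlapped_cells(subscription_region, cell_size))
```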
An Empirical Evaluation of Communication and Coordination Effectiveness in Autonomous Reactive Multiagent Systems
This thesis describes experiments designed to measure the effect of collaborative communication on the task performance of a multiagent system. A discrete event simulation was developed to model a multiagent system completing a task of finding and collecting food resources, with the ability to substitute various communication and coordination methods. Experiments were conducted to determine the effects of the various communication methods on completion of the task. Results show that communication decreases the time required to complete the task; however, not all communication methods fare equally well. In particular, results indicate that the communication model of the bee is a particularly effective method of agent communication and collaboration. Furthermore, results indicate that direct communication with additional information content yields better completion results. Cost-benefit models show some conflicting information, indicating that the increased performance may not offset the additional cost of achieving it.
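A toy version of the bee-style recruitment step might look like the following Python sketch; the data structures and the `recruitment_rate` parameter are hypothetical, not taken from the thesis's simulation.

```python
import random

def recruit(finder, idle_agents, recruitment_rate=0.5):
    """Bee-style direct communication, sketched: an agent that found
    food shares the food location with a fraction of the idle agents
    at the nest, mimicking the waggle dance's content-rich message."""
    for agent in idle_agents:
        if random.random() < recruitment_rate:
            agent['target'] = finder['food_location']

idle = [{'target': None} for _ in range(10)]
recruit({'food_location': (42, 17)}, idle)
print(sum(a['target'] is not None for a in idle), "agents recruited")
```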
Investigating the Extractive Summarization of Literary Novels
Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, and scientific reports, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. However, we are witnessing a change: an increasingly large number of books are becoming available in electronic format, which means that language processing techniques able to handle very long documents such as books are becoming increasingly important. This thesis addresses the problem of summarizing novels, which are long and complex literary narratives. While there is a significant body of research on automatic text summarization, most of it has been concerned with the summarization of short documents, with a particular focus on news stories. Novels, however, differ in both length and genre, and consequently require different summarization techniques. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents and outperform traditional extractive summarization systems typically targeting the news genre.
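For contrast with the systems developed in the thesis, here is a minimal frequency-based extractive baseline in Python of the kind such work typically compares against; it is not the thesis's actual system.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=10):
    """Frequency-based extractive baseline: score each sentence by the
    average document frequency of its words, then return the top-scoring
    sentences in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text)
    freq = Counter(re.findall(r'[a-z]+', text.lower()))
    def score(s):
        tokens = re.findall(r'[a-z]+', s.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                 reverse=True)[:n_sentences]
    return ' '.join(sentences[i] for i in sorted(top))
```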
The Enhancement of Machine Translation for Low-Density Languages Using Web-Gathered Parallel Texts
The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for speakers of these languages to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts (texts in two languages that are translations of one another), which most languages lack. This thesis describes the Babylon project, which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web, specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be effectively used as a source of training data for machine translation and can significantly improve translation quality for text in a similar domain. However, the small quantity of high-quality, low-density-language parallel texts on the Web remains a significant obstacle.
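One common heuristic for locating candidate parallel pages, in the spirit of URL-matching approaches to Web mining (and not necessarily the Babylon project's exact method), is sketched below in Python; the language markers are examples only.

```python
import re

# Two URLs that differ only in a language marker often point to
# translations of the same page; such pairs are candidates for
# parallel-text extraction. The marker list is illustrative.
LANG_MARKERS = [('en', 'fr'), ('eng', 'fra'), ('english', 'french')]

def candidate_pair(url_a, url_b):
    """Return True if the URLs differ only by a known language marker."""
    for a, b in LANG_MARKERS:
        if re.sub(rf'\b{a}\b', b, url_a) == url_b:
            return True
    return False

print(candidate_pair("example.org/en/news.html", "example.org/fr/news.html"))
```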