- The Design and Implementation of an Intelligent Agent-Based File System
- As bandwidth constraints on LAN/WAN environments decrease, the demand for distributed services will continue to increase. In particular, the proliferation of user-level applications requiring high-capacity distributed file storage systems will demand that such services be universally available. At the same time, the advent of high-speed networks has made the deployment of application and communication solutions based upon an Intelligent Mobile Agent (IMA) framework practical. Agents have proven to be an ideal development paradigm for the creation of autonomous large-scale distributed systems, and an agent-based communication scheme would facilitate the creation of independently administered distributed file services. This thesis therefore outlines an architecture for such a distributed file system based upon an IMA communication framework.
- Design and Implementation of Large-Scale Wireless Sensor Networks for Environmental Monitoring Applications
- Environmental monitoring represents a major application domain for wireless sensor networks (WSN). However, despite significant advances in recent years, there are still many challenging issues to be addressed to exploit the full potential of the emerging WSN technology. In this dissertation, we introduce the design and implementation of low-power wireless sensor networks for long-term, autonomous, and near-real-time environmental monitoring applications. We have developed an out-of-the-box solution consisting of a suite of software, protocols and algorithms to provide reliable data collection with extremely low power consumption. Two wireless sensor networks based on the proposed solution have been deployed in remote field stations to monitor soil moisture along with other environmental parameters. As part of the ever-growing environmental monitoring cyberinfrastructure, these networks have been integrated into the Texas Environmental Observatory system for long-term operation. Environmental measurement and network performance results are presented to demonstrate the capability, reliability and energy-efficiency of the network.
- The Design Of A Benchmark For Geo-stream Management Systems
- The recent growth in sensor technology allows easier information gathering in real-time as sensors have grown smaller, more accurate, and less expensive. The resulting data is often in a geo-stream format: continuously changing input with a spatial extent. Researchers developing geo-stream management systems (GSMS) require a benchmark system for evaluation, which is currently lacking. This thesis presents GSMark, a benchmark for evaluating GSMSs. GSMark provides a data generator that creates a combination of synthetic and real geo-streaming data, a workload simulator to present the data to the GSMS as a data stream, and a set of benchmark queries that evaluate typical GSMS functionality and query performance. In particular, GSMark generates both moving points and evolving spatial regions, two fundamental data types for a broad range of geo-stream applications, and the geo-streaming queries on this data.
- Developing a Test Bed for Interactive Narrative in Virtual Environments
Access: Use of this item is restricted to the UNT Community.
As Virtual Environments (VE) become a more commonly used method of interaction and presentation, supporting users as they navigate and interact with scenarios presented in VE will be a significant issue. A key step in understanding the needs of users in these situations will be observing them perform representative tasks in a fully developed environment. In this paper, we describe the development of a test bed for interactive narrative in a virtual environment. The test bed was specifically developed to present multiple, simultaneous sequences of events (scenarios or narratives) and to support user navigation through these scenarios. These capabilities will support the development of multi-user testing scenarios, allowing us to study and better understand the needs of users of narrative VEs.
- Development, Implementation, and Analysis of a Contact Model for an Infectious Disease
- With growing concern about infectious diseases spreading in a population, epidemiology is becoming more important for the future of public health. In the past, epidemiologists used existing data of an outbreak to help them determine how an infectious disease might spread in the future. Now, with computational models, they are able to analyze data produced by these models to help with prevention and intervention plans. This paper looks at the design, implementation, and analysis of a computational model based on the interactions between individuals in the population. The design of the working contact model looks closely at the SEIR model used as the foundation and the two timelines of a disease. The implementation of the contact model is reviewed while looking closely at data structures. The analysis of the experiments provides evidence that this contact model can be used to help epidemiologists study the spread of an infectious disease based on the contact rate of individuals.
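As an illustration of the kind of contact-driven model described above, the following is a minimal SEIR-style simulation sketch in Python. It is offered for orientation only: the state names, parameters (contact_rate, p_transmit, incubation_days, infectious_days), and update rules are assumptions, not the thesis's actual design or implementation.

```python
# Minimal SEIR-style contact model sketch (illustrative only; parameters and
# update rules are assumptions, not the thesis's actual implementation).
import random

S, E, I, R = "S", "E", "I", "R"

def step(population, contact_rate, p_transmit, incubation_days, infectious_days, rng):
    """Advance every individual one day, spreading infection through random contacts."""
    for person in population:
        if person["state"] == S:
            # Each susceptible meets `contact_rate` random people per day.
            contacts = rng.sample(population, min(contact_rate, len(population)))
            if any(c["state"] == I and rng.random() < p_transmit for c in contacts):
                person["state"], person["days"] = E, 0
        elif person["state"] == E:
            person["days"] += 1
            if person["days"] >= incubation_days:
                person["state"], person["days"] = I, 0
        elif person["state"] == I:
            person["days"] += 1
            if person["days"] >= infectious_days:
                person["state"] = R
    return population

rng = random.Random(42)
pop = [{"state": S, "days": 0} for _ in range(999)] + [{"state": I, "days": 0}]
for day in range(60):
    step(pop, contact_rate=5, p_transmit=0.05, incubation_days=3, infectious_days=7, rng=rng)
print({s: sum(p["state"] == s for p in pop) for s in (S, E, I, R)})
```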
- DICOM Image Scrubbing Software Library/Utility
- This software is aimed at providing a user-friendly, easy-to-use environment for the user to scrub (de-identify/modify) the DICOM header information. Some tools either anonymize or default the values without user interaction, so the user does not have the flexibility to edit the header information, and one cannot scrub a set of images simultaneously (batch scrubbing). This motivated the development of a tool/utility that can scrub a set of images in a single step more efficiently. This document also addresses security issues of patient confidentiality to achieve protection of patient-identifying information and some technical requirements.
- Direct Online/Offline Digital Signature Schemes.
- Online/offline signature schemes are useful in many situations, and two such scenarios are considered in this dissertation: bursty server authentication and embedded device authentication. In this dissertation, new techniques for online/offline signing are introduced, applied in a variety of ways to create online/offline signature schemes, and five different online/offline signature schemes that are proved secure under a variety of models and assumptions are proposed. Two of the proposed five schemes have the best offline or best online performance of any currently known technique, and are particularly well-suited for the scenarios that are considered in this dissertation. To determine if the proposed schemes provide the expected practical improvements, a series of experiments were conducted comparing the proposed schemes with each other and with other state-of-the-art schemes in this area, both on a desktop class computer, and under AVR Studio, a simulation platform for an 8-bit processor that is popular for embedded systems. Under AVR Studio, the proposed SGE scheme, using a typical key size for the embedded device authentication scenario, can complete the offline phase in about 24 seconds and then produce a signature (the online phase) in 15 milliseconds, which is the best offline performance of any known signature scheme that has been proven secure in the standard model. In the tests on a desktop class computer, the proposed SGS scheme, which has the best online performance and is designed for the bursty server authentication scenario, generated 469,109 signatures per second, while the Schnorr scheme (the next best scheme in terms of online performance) generated only 223,548 signatures per second. The experimental results demonstrate that the SGE and SGS schemes are the most efficient techniques for embedded device authentication and bursty server authentication, respectively.
- DirectShow Approach to Low-Cost Multimedia Security Surveillance System
- In response to recent intensive needs for civilian security surveillance, both full and compact versions of a Multimedia Security Surveillance (MSS) system have been built. The new Microsoft DirectShow technology was applied in implementing the multimedia stream-processing module. Through the Microsoft Windows Driver Model interface, IEEE 1394-enabled Fire-i cameras serving as external sensors are integrated with a PC-based continuous storage unit. The MSS application also allows multimedia broadcasting and remote controls. Cost analysis is included.
- Dynamic Grid-Based Data Distribution Management in Large Scale Distributed Simulations
- Distributed simulation is an enabling concept to support the networked interaction of models and real world elements that are geographically distributed. This technology has brought a new set of challenging problems to solve, such as Data Distribution Management (DDM). The aim of DDM is to limit and control the volume of the data exchanged during a distributed simulation, and reduce the processing requirements of the simulation hosts by relaying events and state information only to those applications that require them. In this thesis, we propose a new DDM scheme, which we refer to as dynamic grid-based DDM. A lightweight UNT-RTI has been developed and implemented to investigate the performance of our DDM scheme. Our results clearly indicate that our scheme is scalable and it significantly reduces both the number of multicast groups used, and the message overhead, when compared to previous grid-based allocation schemes using large-scale and real-world scenarios.
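The core DDM idea of relaying updates only to interested subscribers can be illustrated with a small grid sketch. This is a hedged illustration: the cell size, the region representation, and the mapping of cells to subscriber groups are placeholders, not the dynamic grid-based scheme or the UNT-RTI internals described in the thesis.

```python
# Sketch of grid-based data distribution management: an update is relayed only to
# subscribers whose interest region overlaps the publisher's cells. Illustrative
# only; the thesis's dynamic scheme and RTI details are not reproduced here.
from collections import defaultdict

CELL = 10.0  # grid cell size in simulation units (assumed)

def cells_for(region):
    """Return the set of grid cells (i, j) covered by a rectangular region."""
    (x1, y1), (x2, y2) = region
    return {(i, j)
            for i in range(int(x1 // CELL), int(x2 // CELL) + 1)
            for j in range(int(y1 // CELL), int(y2 // CELL) + 1)}

subscriptions = defaultdict(set)  # cell -> set of subscriber ids ("multicast group")

def subscribe(sub_id, interest_region):
    for cell in cells_for(interest_region):
        subscriptions[cell].add(sub_id)

def publish(update_region):
    """Return the subscribers that should receive an update for this region."""
    receivers = set()
    for cell in cells_for(update_region):
        receivers |= subscriptions[cell]
    return receivers

subscribe("radar-1", ((0, 0), (25, 25)))
subscribe("radar-2", ((50, 50), (80, 80)))
print(publish(((12, 12), (18, 18))))   # only radar-1 is interested
```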
- Dynamic Resource Management in RSVP-Controlled Unicast Networks
- Resources are said to be fragmented in the network when they are available in non-contiguous blocks, and calls are dropped as they may not find sufficient resources. Hence, available resources may remain unutilized. In this thesis, the effect of resource fragmentation (RF) on RSVP-controlled networks was studied and new algorithms were proposed to reduce the effect of RF. In order to minimize the effect of RF, resources in the network are dynamically redistributed on different paths to make them available in contiguous blocks. Extra protocol messages are introduced to facilitate resource redistribution in the network. The Dynamic Resource Redistribution (DRR) algorithm, when used in conjunction with RSVP, not only increased the number of calls accommodated into the network but also increased the overall resource utilization of the network. Issues such as how many resources need to be redistributed and of which call(s), and how these choices affect the redistribution process, were investigated. Further, various simulation experiments were conducted to study the performance of the DRR algorithm on different network topologies with varying traffic characteristics.
- Elicitation of Protein-Protein Interactions from Biomedical Literature Using Association Rule Discovery
- Extracting information from a stack of data is a tedious task and the scenario is no different in proteomics. Volumes of research papers are published about the study of various proteins in several species, their interactions with other proteins, and the identification of protein(s) as possible biomarkers in causing diseases. It is a challenging task for biologists to keep track of these developments manually by reading through the literature. Several tools have been developed by computational linguists to assist identification, extraction and hypothesis generation of proteins and protein-protein interactions from biomedical publications and protein databases. However, they are confronted with the challenges of term variation, term ambiguity, access only to abstracts and inconsistencies in time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to attenuate these challenges by extracting protein-protein interactions in humans and eliciting possible interactions using association rule mining on full text, abstracts and captions from figures available from publicly available biomedical literature databases. Two such databases are used in our study: Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built using articles based on search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) is cross-referenced to validate discovered interacting pairs. A set of an optimal size of possible binary protein-protein interactions is generated to be made available for clinical or biological validation. A significant change in the number of new associations was found by altering the thresholds for the support and confidence metrics. This study narrows down the limitations biologists face in keeping pace with discovery of protein-protein interactions via manually reading the literature and their need to validate each and every possible interaction.
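For orientation, a toy version of the support/confidence scoring behind association rule discovery is sketched below, with sentence-level protein co-occurrences standing in for transactions. The protein names, thresholds, and the simple pairwise rules are illustrative assumptions; the thesis's corpus construction, full-text mining, and HPRD cross-referencing are not reproduced.

```python
# Toy illustration of scoring candidate protein-protein pairs by support and
# confidence over sentence-level co-occurrences (hypothetical data).
from itertools import combinations

sentences = [  # hypothetical "transactions": proteins mentioned together
    {"TP53", "MDM2"}, {"TP53", "MDM2", "BRCA1"}, {"BRCA1", "RAD51"},
    {"TP53", "BRCA1"}, {"TP53", "MDM2"},
]

def support(itemset):
    return sum(itemset <= s for s in sentences) / len(sentences)

min_support, min_confidence = 0.3, 0.6
proteins = set().union(*sentences)
for a, b in combinations(sorted(proteins), 2):
    sup = support({a, b})
    if sup >= min_support:
        conf = sup / support({a})          # confidence of rule a -> b
        if conf >= min_confidence:
            print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}")
```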
- Embedded monitors for detecting and preventing intrusions in cryptographic and application protocols.
- There are two main approaches for intrusion detection: signature-based and anomaly-based. Signature-based detection employs pattern matching to match attack signatures with observed data making it ideal for detecting known attacks. However, it cannot detect unknown attacks for which there is no signature available. Anomaly-based detection builds a profile of normal system behavior to detect known and unknown attacks as behavioral deviations. However, it has a drawback of a high false alarm rate. In this thesis, we describe our anomaly-based IDS designed for detecting intrusions in cryptographic and application-level protocols. Our system has several unique characteristics, such as the ability to monitor cryptographic protocols and application-level protocols embedded in encrypted sessions, a very lightweight monitoring process, and the ability to react to protocol misuse by modifying protocol response directly.
- An Empirical Evaluation of Communication and Coordination Effectiveness in Autonomous Reactive Multiagent Systems
- This thesis describes experiments designed to measure the effect of collaborative communication on task performance of a multiagent system. A discrete event simulation was developed to model a multi-agent system completing a task to find and collect food resources, with the ability to substitute various communication and coordination methods. Experiments were conducted to find the effects of the various communication methods on completion of the task to find and harvest the food resources. Results show that communication decreases the time required to complete the task. However, not all communication methods fare equally well. In particular, results indicate that the communication model of the bee is a particularly effective method of agent communication and collaboration. Furthermore, results indicate that direct communication with additional information content provides better completion results. Cost-benefit models show some conflicting information, indicating that the increased performance may not offset the additional cost of achieving that performance.
- End of Insertion Detection in Colonoscopy Videos
- Colorectal cancer is the second leading cause of cancer-related deaths behind lung cancer in the United States. Colonoscopy is the preferred screening method for detection of diseases like colorectal cancer. In 2006, the American Society for Gastrointestinal Endoscopy (ASGE) and the American College of Gastroenterology (ACG) issued guidelines for quality colonoscopy. The guidelines suggest that on average the withdrawal phase during a screening colonoscopy should last a minimum of 6 minutes. My aim is to classify the colonoscopy video into insertion and withdrawal phases. The problem is that currently existing shot detection techniques cannot be applied because a colonoscopy is a single camera shot from start to end. An algorithm to detect the phase boundary has already been developed by the MIGLAB team. The existing method has acceptable levels of accuracy, but the main issue is its dependency on MPEG (Moving Pictures Expert Group) 1/2. I implemented exhaustive search for motion estimation to reduce the execution time and improve the accuracy. I took advantage of the C/C++ programming languages with multithreading, which helped achieve even better performance in terms of execution time. I propose a method for improving the current approach to colonoscopy video analysis, and also an extension that makes it usable for real-time videos. The real-time version we implemented is capable of handling streams coming directly from the camera in the form of uncompressed bitmap frames. The existing implementation could not be applied to the real-time scenario because of its dependency on MPEG 1/2. Future directions of this research include improved motion search and GPU parallel computing techniques.
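The exhaustive (full-search) motion estimation mentioned above can be sketched as block matching between consecutive frames. Block size, search range, and the SAD cost below are common defaults assumed for illustration, not taken from the thesis (which works in C/C++ with multithreading).

```python
# Exhaustive (full-search) block matching between two frames; minimal NumPy sketch.
import numpy as np

def full_search(prev, curr, y, x, block=16, search=8):
    """Return the (dy, dx) displacement minimizing SAD for the block at (y, x)."""
    ref = curr[y:y + block, x:x + block].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            cand = prev[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = np.abs(ref - cand).sum()   # sum of absolute differences
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, shift=(2, -3), axis=(0, 1))   # simulate camera motion
print(full_search(prev, curr, 16, 16))             # expect (-2, 3) for this synthetic shift
```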
- The enhancement of machine translation for low-density languages using Web-gathered parallel texts.
- The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for speakers of these languages to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts - texts in two languages that are translations of one another - which most languages lack. This thesis describes the Babylon project which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web -- specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be effectively used as a source of training data for machine translation and can significantly improve the translation quality for text in a similar domain. However, the small quantity of high-quality low-density language parallel texts on the Web remains a significant obstacle.
- Ensuring Authenticity and Integrity of Critical Information Using XML Digital Signatures
Access: Use of this item is restricted to the UNT Community.
It has been noticed in the past five years that Internet use has been hampered by the lack of sufficient security and a legal framework to enable electronic commerce to flourish. Despite these shortcomings, governments, businesses and individuals are using the Internet more often as an inexpensive and ubiquitous means to disseminate and obtain information, goods and services. The Internet is insecure -- potentially millions of people have access, and "hackers" can intercept anything traveling over the wire. There is no way to make it a secure environment; it is, after all, a public network, hence its availability and affordability. In order for it to serve our purposes as a vehicle for legally binding transactions, efforts must be directed at securing the message itself, as opposed to the transport mechanism. Digital signatures have evolved in recent years as the best tool for ensuring the authenticity and integrity of critical information in the so-called "paperless office". A model using XML digital signatures is developed, and the level of security provided by this model in real-world scenarios is outlined.
- Evaluating the Scalability of SDF Single-chip Multiprocessor Architecture Using Automatically Parallelizing Code
- Advances in integrated circuit technology continue to provide more and more transistors on a chip. Computer architects are faced with the challenge of finding the best way to translate these resources into high performance. The challenge in the design of the next generation CPU (central processing unit) lies not in trying to use up the silicon area, but in finding smart ways to make use of the wealth of transistors now available. In addition, the next generation architecture should offer high throughput, scalability, modularity, and low energy consumption, instead of being an architecture that is suitable for only one class of applications or users, or that only emphasizes a faster clock rate. A program exhibits different types of parallelism: instruction level parallelism (ILP), thread level parallelism (TLP), or data level parallelism (DLP). Likewise, architectures can be designed to exploit one or more of these types of parallelism. It is generally not possible to design architectures that can take advantage of all three types of parallelism without using very complex hardware structures and complex compiler optimizations. We present the state-of-the-art SDF (scheduled dataflow) architecture, which exploits as much TLP as the application supplies. We implement an SDF single-chip multiprocessor constructed from simpler processors and execute automatically parallelized applications on the single-chip multiprocessor. SDF has many desirable features such as high throughput, scalability, and low power consumption, which meet the requirements of the next generation of CPU design. Compared with superscalar, VLIW (very long instruction word), and SMT (simultaneous multithreading) architectures, the experimental results show that for applications with very little parallelism SDF is comparable to the other architectures, while for applications with large amounts of parallelism SDF outperforms them.
- Evaluation of MPLS Enabled Networks
Access: Use of this item is restricted to the UNT Community.
Recent developments in the Internet have inspired a wide range of business and consumer applications. The deployment of multimedia-based services has driven the demand for increased and guaranteed bandwidth requirements over the network. The diverse requirements of the wide range of users demand differentiated classes of service and quality assurance. The new technology of Multi-protocol Label Switching (MPLS) has emerged as a high-performance and reliable option to address these challenges, in addition to offering features not addressed before. This problem in lieu of thesis describes how the new paradigm of MPLS is advantageous over the conventional architecture. The motivation for this paradigm is discussed in the first part, followed by a detailed description of the new architecture. The information flow, the underlying protocols, and the MPLS extensions to some of the traditional protocols are then discussed, followed by a description of the simulation. The simulation results are used to show the advantages of the proposed technology.
- Exploring Trusted Platform Module Capabilities: A Theoretical and Experimental Study
- Trusted platform modules (TPMs) are hardware modules bound to a computer's motherboard that are being included in many desktops and laptops. Augmenting computers with these hardware modules adds powerful functionality in distributed settings, allowing us to reason about the security of these systems in new ways. In this dissertation, I study the functionality of TPMs from a theoretical as well as an experimental perspective. On the theoretical front, I leverage various features of TPMs to construct applications like random oracles that are impossible to implement in a standard model of computation. Apart from random oracles, I construct a new cryptographic primitive which is basically a non-interactive form of the standard cryptographic primitive of oblivious transfer. I apply this new primitive to secure mobile agent computations, where interaction between various entities is typically required to ensure security. I prove these constructions are secure using standard cryptographic techniques and assumptions. To test the practicability of these constructions and their applications, I performed an experimental study, both on an actual TPM and on a software TPM simulator which has been enhanced to make it reflect timings from a real TPM. This allowed me to benchmark the performance of the applications and test the feasibility of the proposed extensions to standard TPMs. My tests also show that these constructions are practical.
- Extensions to Jinni Mobile Agent Architecture
- We extend the Jinni mobile agent architecture with a multicast network transport layer, an agent-to-agent delegation mechanism and a reflection based Prolog-to-Java interface. To ensure that our agent infrastructure runs efficiently, independently of router-level multicast support, we describe a blackboard based algorithm for locating a randomly roaming agent. As part of the agent-to-agent delegation mechanism, we describe an alternative to the code-fetching mechanism for stronger mobility of mobile agents with less network overhead. In the context of direct and reflection based extension mechanisms for Jinni, we describe the design and the implementation of a reflection based Prolog-to-Java interface. The presence of subtyping and method overloading makes finding the most specific method corresponding to a Prolog call pattern fairly difficult. We describe a run-time algorithm which provides accurate handling of overloaded methods beyond the limitations of Java's reflection package.
- The Feasibility of Multicasting in RMI
- With the growing reach of the Internet and networking technologies, simple, powerful, easily maintained distributed applications need to be developed. These kinds of applications can benefit greatly from distributed computing concepts. Despite its powerful mechanisms, Jini has yet to be accepted in mainstream Java development. Until that happens, we need to find better Remote Method Invocation (RMI) solutions. This paper examines the feasibility of implementing multicasting in RMI. Multicasting capability can be added to RMI using a Jini-like technique. Support for multicast over the unicast reference layer is also studied, and a piece of code explaining how it can be done is included.
- Flexible Digital Authentication Techniques
- This dissertation investigates authentication techniques in some emerging areas. Specifically, authentication schemes have been proposed that are well-suited for embedded systems, and privacy-respecting pay Web sites. With embedded systems, a person could own several devices which are capable of communication and interaction, but these devices use embedded processors whose computational capabilities are limited as compared to desktop computers. Examples of this scenario include entertainment devices or appliances owned by a consumer, multiple control and sensor systems in an automobile or airplane, and environmental controls in a building. An efficient public key cryptosystem has been devised, which provides a complete solution to an embedded system, including protocols for authentication, authenticated key exchange, encryption, and revocation. The new construction is especially suitable for devices with constrained computing capabilities and resources. Compared with other available authentication schemes, such as X.509, identity-based encryption, etc., the new construction provides unique features such as simplicity, efficiency, forward secrecy, and an efficient re-keying mechanism. In the application scenario for a pay Web site, users may be sensitive about their privacy, and do not wish their behaviors to be tracked by Web sites. Thus, an anonymous authentication scheme is desirable in this case. That is, a user can prove his/her authenticity without revealing his/her identity. On the other hand, the Web site owner would like to prevent a group of users from sharing a single subscription while hiding behind user anonymity. The Web site should be able to detect these possible malicious behaviors, and exclude corrupted users from future service. This dissertation extensively discusses anonymous authentication techniques, such as group signature, direct anonymous attestation, and traceable signature. Three anonymous authentication schemes have been proposed, which include a group signature scheme with signature claiming and variable linkability, a scheme for direct anonymous attestation in trusted computing platforms with sign and verify protocols nearly seven times more efficient than the current solution, and a state-of-the-art traceable signature scheme with support for variable anonymity. These three schemes greatly advance research in the area of anonymous authentication. The authentication techniques presented in this dissertation are based on common mathematical and cryptographic foundations, sharing similar security assumptions. We call them flexible digital authentication schemes.
- Force-Directed Graph Drawing and Aesthetics Measurement in a Non-Strict Pure Functional Programming Language
- Non-strict pure functional programming often requires redesigning algorithms and data structures to work more effectively under new constraints of non-strict evaluation and immutable state. Graph drawing algorithms, while numerous and broadly studied, have no presence in the non-strict pure functional programming model. Additionally, there is currently no freely licensed standalone toolkit used to quantitatively analyze aesthetics of graph drawings. This thesis addresses two previously unexplored questions. Can a force-directed graph drawing algorithm be implemented in a non-strict functional language, such as Haskell, and still be practically usable? Can an easily extensible aesthetic measuring tool be implemented in a language such as Haskell and still be practically usable? The focus of the thesis is on implementing one of the simplest force-directed algorithms, that of Fruchterman and Reingold, and comparing its resulting aesthetics to those of a well-known C++ implementation of the same algorithm.
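The thesis implements the Fruchterman and Reingold algorithm in Haskell; the following plain Python sketch only restates the force model itself (repulsion k^2/d, attraction d^2/k, with a simple cooling schedule) for readers unfamiliar with it. The constants and cooling factor are assumptions.

```python
# Fruchterman-Reingold force-directed layout, minimal sketch of the force model.
import math, random

def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50):
    k = math.sqrt(width * height / len(nodes))              # ideal edge length
    pos = {v: (random.random() * width, random.random() * height) for v in nodes}
    temp = width / 10
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for v in nodes:                                      # repulsive forces
            for u in nodes:
                if u == v:
                    continue
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
        for u, v in edges:                                   # attractive forces
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[v][0] -= dx / d * f; disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f; disp[u][1] += dy / d * f
        for v in nodes:                                      # limit step by temperature
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            pos[v] = (pos[v][0] + dx / d * min(d, temp),
                      pos[v][1] + dy / d * min(d, temp))
        temp *= 0.95                                         # cool down
    return pos

print(fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]))
```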
- FP-tree Based Spatial Co-location Pattern Mining
- A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, the paradigm of frequent pattern mining has undergone a shift from candidate generation-and-test based approaches to projection based approaches. Co-location patterns resemble frequent patterns in many aspects. However, the lack of a transaction concept, which is crucial in frequent pattern mining, makes a similar shift of paradigm in co-location pattern mining very difficult. This thesis investigates a projection based co-location pattern mining paradigm. In particular, an FP-tree based co-location mining framework and an algorithm called FP-CM, for FP-tree based co-location miner, are proposed. It is proved that FP-CM is complete, correct, and only requires a small constant number of database scans. The experimental results show that FP-CM outperforms candidate generation-and-test based co-location miners by an order of magnitude.
- A Framework for Analyzing and Optimizing Regional Bio-Emergency Response Plans
- The presence of naturally occurring and man-made public health threats necessitates the design and implementation of mitigation strategies, such that adequate response is provided in a timely manner. Since multiple variables, such as geographic properties, resource constraints, and government mandated time-frames must be accounted for, computational methods provide the necessary tools to develop contingency response plans while respecting underlying data and assumptions. A typical response scenario involves the placement of points of dispensing (PODs) in the affected geographic region to supply vaccines or medications to the general public. Computational tools aid in the analysis of such response plans, as well as in the strategic placement of PODs, such that feasible response scenarios can be developed. Due to the sensitivity of bio-emergency response plans, geographic information, such as POD locations, must be kept confidential. The generation of synthetic geographic regions allows for the development of emergency response plans on non-sensitive data, as well as for the study of the effects of single geographic parameters. Further, synthetic representations of geographic regions allow for results to be published and evaluated by the scientific community. This dissertation presents methodology for the analysis of bio-emergency response plans, methods for plan optimization, as well as methodology for the generation of synthetic geographic regions.
- General Purpose Programming on Modern Graphics Hardware
- I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming.
- A general purpose semantic parser using FrameNet and WordNet®.
- Syntactic parsing is one of the best understood language processing applications. Since language and grammar have been formally defined, it is easy for computers to parse the syntactic structure of natural language text. Does meaning have structure as well? If so, how can we analyze that structure? Previous systems rely on a one-to-one correspondence between syntactic rules and semantic rules. But such systems can only be applied to limited fragments of English. In this thesis, we propose a general-purpose shallow semantic parser which utilizes a semantic network (WordNet), and a frame dataset (FrameNet). Semantic relations recognized by the parser are based on how human beings represent knowledge of the world. Parsing semantic structure allows semantic units and constituents to be accessed and processed in a more meaningful way than syntactic parsing, moving the automation of understanding natural language text to a higher level.
- A Global Stochastic Modeling Framework to Simulate and Visualize Epidemics
- Epidemics have caused major human and monetary losses through the course of human civilization. It is very important that epidemiologists and public health personnel are prepared to handle an impending infectious disease outbreak. The ever-changing demographics, evolving infrastructural resources of geographic regions, and emerging and re-emerging diseases compel the use of simulation to predict disease dynamics. By means of simulation, public health personnel and epidemiologists can predict the disease dynamics, population groups at risk and their geographic locations beforehand, so that they are prepared to respond in case of an epidemic outbreak. As a consequence of the large numbers of individuals and inter-personal interactions involved in simulating infectious disease spread in a region such as a county, sizeable amounts of data may be produced that have to be analyzed. Methods to visualize this data would be effective in helping people from diverse disciplines understand and analyze the simulation. This thesis proposes a framework to simulate and visualize the spread of an infectious disease in the population of a region such as a county. As real-world populations have a non-homogeneous demographic and spatial distribution, this framework models the spread of an infectious disease based on the population of, and geographic distance between, census blocks, as well as social behavioral parameters for demographic groups. The population is stratified into demographic groups in individual census blocks using census data. Infection spread is modeled by means of local and global contacts generated between groups of population in census blocks. The strength and likelihood of the contacts are based on the population, geographic distance and social behavioral parameters of the groups involved. The disease dynamics are represented on a geographic map of the region using a heat map representation, where the intensity of infection is mapped to a color scale. This framework provides a tool for public health personnel and epidemiologists to run what-if analyses on disease spread in specific populations and plan for epidemic response. By means of demographic stratification of the population and incorporation of geographic distance and social behavioral parameters into the modeling of the outbreak, this framework takes into account non-homogeneity in the demographic mix and spatial distribution of the population. Generation of contacts per population group instead of per individual contributes to lowering computational overhead. The heat map representation of the intensity of infection provides an intuitive way to visualize the disease dynamics.
- GPS CaPPture: a System for GPS Trajectory Collection, Processing, and Destination Prediction
- In the United States, smartphone ownership surpassed 69.5 million in February 2011, with a large portion of those users (20%) downloading applications (apps) that enhance the usability of a device by adding additional functionality. A large percentage of apps are written specifically to utilize the geographical position of a mobile device. One of the prime factors in developing location prediction models is the use of historical data to train such a model. With larger sets of training data, prediction algorithms become more accurate; however, the use of historical data can quickly become a downfall if the GPS stream is not collected or processed correctly. Inaccurate, incomplete, or even improperly interpreted historical data can lead to the inability to develop accurately performing prediction algorithms. As GPS chipsets become the standard in the ever-increasing number of mobile devices, the opportunity for the collection of GPS data increases remarkably. The goal of this study is to build a comprehensive system that addresses the following challenges: (1) collection of GPS data streams in a manner such that the data is highly usable and has a reduction in errors; (2) processing and reduction of the collected data in order to prepare it and make it highly usable for the creation of prediction algorithms; (3) creation of prediction/labeling algorithms at such a level that they are viable for commercial use. This study identifies the key research problems toward building the CaPPture (collection, processing, prediction) system.
- Graph-based Centrality Algorithms for Unsupervised Word Sense Disambiguation
- This thesis introduces an innovative methodology of combining some traditional dictionary based approaches to word sense disambiguation (semantic similarity measures and overlap of word glosses, both based on WordNet) with some graph-based centrality methods, namely the degree of the vertices, Pagerank, closeness, and betweenness. The approach is completely unsupervised, and is based on creating graphs for the words to be disambiguated. We experiment with several possible combinations of the semantic similarity measures as the first stage in our experiments. The next stage attempts to score individual vertices in the graphs previously created based on several graph connectivity measures. During the final stage, several voting schemes are applied on the results obtained from the different centrality algorithms. The most important contributions of this work are not only that it is a novel approach and it works well, but also that it has great potential in overcoming the new-knowledge-acquisition bottleneck which has apparently brought research in supervised WSD as an explicit application to a plateau. The type of research reported in this thesis, which does not require manually annotated data, holds promise of a lot of new and interesting things, and our work is one of the first steps, despite being a small one, in this direction. The complete system is built and tested on standard benchmarks, and is comparable with work done on graph-based word sense disambiguation as well as lexical chains. The evaluation indicates that the right combination of the above mentioned metrics can be used to develop an unsupervised disambiguation engine as powerful as the state-of-the-art in WSD.
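A compressed sketch of the centrality stage is shown below: candidate senses become vertices, edges carry a sense-to-sense similarity, and each word keeps its highest-ranked sense. The similarity function and the single PageRank scorer here are stand-ins; the thesis combines several WordNet-based similarity measures and several centrality metrics with voting.

```python
# Sketch of graph-based sense ranking; the similarity function is a placeholder
# for the WordNet-based measures used in the thesis.
import networkx as nx

def disambiguate(candidates, similarity):
    """candidates: {word: [sense, ...]}; similarity: f(sense_a, sense_b) -> float."""
    g = nx.Graph()
    words = list(candidates)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:                      # no edges between senses of the same word
            for s1 in candidates[w1]:
                for s2 in candidates[w2]:
                    sim = similarity(s1, s2)
                    if sim > 0:
                        g.add_edge((w1, s1), (w2, s2), weight=sim)
    scores = nx.pagerank(g, weight="weight")          # any centrality measure fits here
    return {w: max(candidates[w], key=lambda s: scores.get((w, s), 0.0))
            for w in candidates}

# Tiny hypothetical example: similarities given by a lookup table.
table = {("bank#1", "money#1"): 0.9, ("bank#2", "river#1"): 0.8, ("money#1", "river#1"): 0.1}
sim = lambda a, b: table.get((a, b), table.get((b, a), 0.0))
print(disambiguate({"bank": ["bank#1", "bank#2"], "money": ["money#1"], "river": ["river#1"]}, sim))
```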
- Graph-Based Keyphrase Extraction Using Wikipedia
- Keyphrases describe a document in a coherent and simple way, giving the prospective reader a way to quickly determine whether the document satisfies their information needs. With the huge amount of information pervading the Web and only a small fraction of documents having keyphrases extracted, there is a definite need for automatic keyphrase extraction systems. Typically, a document written by a human develops around one or more general concepts or sub-concepts. These concepts or sub-concepts should be structured and semantically related with each other, so that they can form a meaningful representation of the document. Considering the fact that the phrases or concepts in a document are related to each other, a new approach for keyphrase extraction is introduced that exploits the semantic relations in the document. For measuring the semantic relations between concepts or sub-concepts in the document, I present a comprehensive study aimed at using collaboratively constructed semantic resources like Wikipedia and its link structure. In particular, I introduce a graph-based keyphrase extraction system that exploits the semantic relations in the document and features such as term frequency. I evaluated the proposed system using novel measures, and the results obtained compare favorably with previously published results on established benchmarks.
- Grid-based Coordinated Routing in Wireless Sensor Networks
- Wireless sensor networks are battery-powered ad-hoc networks in which sensor nodes that are scattered over a region connect to each other and form multi-hop networks. These nodes are equipped with sensors such as temperature sensors, pressure sensors, and light sensors and can be queried to get the corresponding values for analysis. However, since they are battery operated, care has to be taken so that these nodes use energy efficiently. One of the areas in sensor networks where an energy analysis can be done is routing. This work explores grid-based coordinated routing in wireless sensor networks and compares the energy available in the network over time for different grid sizes.
- Group-EDF: A New Approach and an Efficient Non-Preemptive Algorithm for Soft Real-Time Systems
- Hard real-time systems in robotics, space and military missions, and control devices are specified with stringent and critical time constraints. On the other hand, soft real-time applications arising from multimedia, telecommunications, Internet web services, and games are specified with more lenient constraints. Real-time systems can also be distinguished in terms of their implementation into preemptive and non-preemptive systems. In preemptive systems, tasks are often preempted by higher priority tasks. Non-preemptive systems are gaining interest for implementing soft real-time applications on multithreaded platforms. In this dissertation, I propose a new algorithm that uses a two-level scheduling strategy for scheduling non-preemptive soft real-time tasks. Our goal is to improve the success ratios of the well-known earliest deadline first (EDF) approach when the load on the system is very high and to improve the overall performance in both underloaded and overloaded conditions. Our approach, known as group-EDF (gEDF), is based on dynamic grouping of tasks with deadlines that are very close to each other, and using a shortest job first (SJF) technique to schedule tasks within the group. I believe that grouping tasks dynamically with similar deadlines and utilizing secondary criteria, such as minimizing the total execution time, can lead to new and more efficient real-time scheduling algorithms. I present results comparing gEDF with other real-time algorithms, including EDF, best-effort, and guarantee schemes, by using randomly generated tasks with varying execution times, release times, deadlines and tolerances to missing deadlines, under varying workloads. Furthermore, I implemented the gEDF algorithm in the Linux kernel and evaluated gEDF for scheduling real applications.
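A minimal sketch of the gEDF selection rule is given below: form a group of tasks whose deadlines are close to the earliest deadline, then pick the shortest job within that group. The grouping window based on the parameter Gr and the tie-breaking are simplified assumptions relative to the dissertation.

```python
# Minimal sketch of the gEDF idea: group tasks with deadlines close to the earliest
# deadline, then run the shortest job in that group first (grouping rule simplified).
def gedf_pick(ready, now, Gr=0.1):
    """ready: list of dicts with 'deadline' and 'exec_time'; returns the task to run."""
    if not ready:
        return None
    earliest = min(t["deadline"] for t in ready)
    # Group: tasks whose deadline lies within Gr * (earliest - now) of the earliest deadline.
    window = earliest + Gr * max(earliest - now, 0)
    group = [t for t in ready if t["deadline"] <= window]
    return min(group, key=lambda t: t["exec_time"])   # SJF inside the group

ready = [
    {"name": "A", "deadline": 10.0, "exec_time": 4.0},
    {"name": "B", "deadline": 10.5, "exec_time": 1.0},   # close deadline, shorter job
    {"name": "C", "deadline": 30.0, "exec_time": 0.5},
]
print(gedf_pick(ready, now=0.0)["name"])   # B: grouped with A, wins on execution time
```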
- High Performance Architecture using Speculative Threads and Dynamic Memory Management Hardware
- With the advances in very large scale integration (VLSI) technology, hundreds of billions of transistors can be packed into a single chip. With the increased hardware budget, how to take advantage of available hardware resources becomes an important research area. Some researchers have shifted from the control flow Von Neumann architecture back to dataflow architecture in order to explore scalable architectures leading to multi-core systems with several hundreds of processing elements. In this dissertation, I address how the performance of modern processing systems can be improved, while attempting to reduce hardware complexity and energy consumption. My research described here tackles both central processing unit (CPU) performance and memory subsystem performance. More specifically, I describe my research related to the design of an innovative decoupled multithreaded architecture that can be used in multi-core processor implementations. I also address how memory management functions can be off-loaded from processing pipelines to further improve system performance and eliminate cache pollution caused by runtime management functions.
- Higher Compression from the Burrows-Wheeler Transform with New Algorithms for the List Update Problem
- Burrows-Wheeler compression is a three stage process in which the data is transformed with the Burrows-Wheeler Transform, then transformed with Move-To-Front, and finally encoded with an entropy coder. Move-To-Front, Transpose, and Frequency Count are some of the many algorithms used on the List Update problem. In 1985, Competitive Analysis first showed the superiority of Move-To-Front over Transpose and Frequency Count for the List Update problem with arbitrary data. Earlier studies due to Bitner assumed independent identically distributed data, and showed that while Move-To-Front adapts to a distribution faster, incurring less overwork, the asymptotic costs of Frequency Count and Transpose are less. The improvements to Burrows-Wheeler compression this work covers are increases in the amount, not speed, of compression. Best x of 2x-1 is a new family of algorithms created to improve on Move-To-Front's processing of the output of the Burrows-Wheeler Transform, which is like piecewise independent identically distributed data. Other algorithms for both the middle stage of Burrows-Wheeler compression and the List Update problem, for which overwork, asymptotic cost, and competitive ratios are also analyzed, are several variations of Move One From Front and part of the randomized algorithm Timestamp. The Best x of 2x-1 family includes Move-To-Front, the part of Timestamp of interest, and Frequency Count. Lastly, a greedy choosing scheme, Snake, switches back and forth as the amount of compression that two List Update algorithms achieve fluctuates, to increase overall compression. The Burrows-Wheeler Transform is based on sorting of contexts. The other improvements are better sorting orders, such as “aeioubcdf...” instead of the standard alphabetical “abcdefghi...” on English text data, an algorithm for computing orders for any data, and Gray code sorting instead of standard sorting. These techniques lessen the overwork incurred by whatever List Update algorithms are used by reducing the difference between adjacent sorted contexts.
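For readers unfamiliar with the middle stage being improved, here is the classic Move-To-Front transform that the Best x of 2x-1 family generalizes (a sketch for orientation only; the new algorithms from the thesis are not reproduced here).

```python
# Classic Move-To-Front stage of Burrows-Wheeler compression: each symbol is
# encoded as its current position in the list, then moved to the front.
def mtf_encode(data, alphabet):
    table = list(alphabet)
    out = []
    for symbol in data:
        idx = table.index(symbol)          # position in the current list
        out.append(idx)
        table.insert(0, table.pop(idx))    # move the symbol to the front
    return out

# BWT output tends to contain long runs of repeated symbols, which MTF turns into
# runs of small integers that the final entropy coder compresses well.
print(mtf_encode("bbbaaacccaaa", "abc"))   # [1, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 0]
```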
- Hopfield Networks as an Error Correcting Technique for Speech Recognition
Access: Use of this item is restricted to the UNT Community.
I experimented with Hopfield networks in the context of a voice-based, query-answering system. Hopfield networks are used to store and retrieve patterns. I used this technique to store queries represented as natural language sentences and I evaluated the accuracy of the technique for error correction in a spoken question-answering dialog between a computer and a user. I show that the use of an auto-associative Hopfield network helps make the speech recognition system more fault tolerant. I also looked at the available encoding schemes to convert a natural language sentence into a pattern of zeroes and ones that can be stored in the Hopfield network reliably, and I suggest scalable data representations which allow storing a large number of queries.
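A minimal Hopfield network sketch, showing Hebbian storage of bipolar patterns and asynchronous recall from a corrupted input, is given below. The patterns and parameters are illustrative assumptions; the sentence encodings evaluated in the thesis are not shown.

```python
# Minimal Hopfield network: Hebbian storage of +1/-1 patterns and asynchronous recall.
import numpy as np

def train(patterns):
    """patterns: array of shape (p, n) with entries +1/-1; returns the weight matrix."""
    p, n = patterns.shape
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0)             # no self-connections
    return w

def recall(w, state, steps=5):
    state = state.copy()
    for _ in range(steps):
        for i in np.random.permutation(len(state)):   # asynchronous updates
            state[i] = 1 if w[i] @ state >= 0 else -1
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
w = train(patterns)
noisy = patterns[0].copy()
noisy[0] *= -1                          # flip one bit to simulate a recognition error
print(recall(w, noisy))                 # converges back to the first stored pattern
```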
- Impact of actual interference on capacity and call admission control in a CDMA network.
- An overwhelming number of models in the literature use average inter-cell interference for the calculation of capacity of a Code Division Multiple Access (CDMA) network. The advantage gained in terms of simplicity by using such models comes at the cost of rendering the exact location of a user within a cell irrelevant. We calculate the actual per-user interference and analyze the effect of user-distribution within a cell on the capacity of a CDMA network. We show that even though the capacity obtained using average interference is a good approximation to the capacity calculated using actual interference for a uniform user distribution, the deviation can be tremendously large for non-uniform user distributions. Call admission control (CAC) algorithms are responsible for efficient management of a network's resources while guaranteeing the quality of service and grade of service, i.e., accepting the maximum number of calls without affecting the quality of service of calls already present in the network. We design and implement global and local CAC algorithms, and through simulations compare their network throughput and blocking probabilities for varying mobility scenarios. We show that even though our global CAC is better at resource management, the lack of substantial gain in network throughput and exponential increase in complexity makes our optimized local CAC algorithm a much better choice for a given traffic distribution profile.
- Implementation of Back Up Host in TCP/IP
Access: Use of this item is restricted to the UNT Community.
This problem in lieu of thesis considers a TCP client H1 making a connection to a distant server S and downloading a file. In the midst of the download, if H1 crashes, the TCP connection from H1 to S is lost. If H1 later restarts, the TCP connection from H1 to S will be reestablished and the file will be downloaded again; this cannot happen until host H1 restarts. Now consider a situation where there is a standby host H2 for the host H1. H1 and H2 monitor the health of each other by heartbeat messages (like SCTP). If H2 detects the failure of H1, then H2 takes over. This implies that all resources assigned to H1 are now reassigned or taken over by H2. Hosts H1 and H2 transfer data between each other when either one crashes, and heartbeat chunks are exchanged between the hosts throughout the data transmission process to detect when one of them fails. In particular, the IP addresses that were originally assigned to H1 are assigned to H2. In this scenario, movement of the TCP connection between H1 and S to a connection between H2 and S without disrupting the TCP connection is achieved. The distant server S is not aware of any changes going on at the clients.
- Implementation of Scalable Secure Multicasting
Access: Use of this item is restricted to the UNT Community.
A large number of applications like multi-player games, video conferencing, chat groups and network management are presently based on multicast communication. As the group communication model is being deployed for mainstream use, it is critical to provide security mechanisms that facilitate confidentiality, authenticity and integrity in group communications. Providing security in multicast communication requires addressing the problem of scalability in group key distribution. Scalability is a concern in group communication due to group membership dynamics. Joining and leaving of members requires the distribution of a new session key to all the existing members of the group. The two approaches to key management, namely centralized and distributed approaches, are reviewed. A hybrid solution is then provided, which represents an improved scalable and robust approach for a secure multicast framework. This framework is then implemented in an example application of a multicast news service.
- Improved Approximation Algorithms for Geometric Packing Problems With Experimental Evaluation
Access: Use of this item is restricted to the UNT Community.
Geometric packing problems are NP-complete problems that arise in VLSI design. In this thesis, we present two novel algorithms using dynamic programming to compute exactly the maximum number of k x k squares of unit size that can be packed without overlap into a given n x m grid. The first algorithm was implemented and ran successfully on problems with large inputs of up to 1,000,000 nodes for different values. A heuristic based on the second algorithm is implemented. This heuristic is fast in practice, but may not always give optimal results in theory. However, over a wide range of random data this version of the algorithm gives very good solutions very fast and runs on problems of up to 100,000,000 nodes in a grid and different ranges for the variables. It is also shown that this version of the algorithm is clearly superior to the first algorithm and has been shown to be very efficient in practice.
- An Integrated Architecture for Ad Hoc Grids
- Extensive research has been conducted by the grid community to enable large-scale collaborations in pre-configured environments. Grid collaborations can vary in scale and motivation, resulting in a coarse classification of grids: national grid, project grid, enterprise grid, and volunteer grid. Despite the differences in scope and scale, all the traditional grids in practice share some common assumptions. They support mutually collaborative communities, adopt a centralized control for membership, and assume a well-defined non-changing collaboration. To support grid applications that do not conform to these assumptions, we propose the concept of ad hoc grids. In the context of this research, we propose a novel architecture for ad hoc grids that integrates a suite of component frameworks. Specifically, our architecture combines the community management framework, security framework, abstraction framework, quality of service framework, and reputation framework. The overarching objective of our integrated architecture is to support a variety of grid applications in a self-controlled fashion with the help of a self-organizing ad hoc community. We introduce mechanisms in our architecture that successfully isolate malicious elements from the community, inherently improving the quality of grid services and extracting deterministic quality assurances from the underlying infrastructure. We also emphasize the technology independence of our architecture, thereby offering the requisite platform for technology interoperability. The feasibility of the proposed architecture is verified with a high-quality ad hoc grid implementation. Additionally, we have analyzed the performance and behavior of ad hoc grids with respect to several control parameters.
- Intelligent Memory Management Heuristics
- Automatic memory management is crucial in the implementation of runtime systems even though it induces a significant computational overhead. In this thesis I explore the use of statistical properties of the directed graph describing the set of live data to decide between garbage collection and heap expansion in a memory management algorithm that combines dynamic-array-represented heaps with a mark-and-sweep garbage collector to enhance its performance. The sampling method predicting the density and the distribution of useful data is implemented as a partial marking algorithm. The algorithm randomly marks the nodes of the directed graph representing the live data at different depths with a variable probability factor p. Using the information gathered by the partial marking algorithm in the current step and the knowledge gathered in the previous iterations, the proposed empirical formula predicts with reasonable accuracy the density of live nodes on the heap, to decide between garbage collection and heap expansion. The resulting heuristics are tested empirically and shown to improve overall execution performance significantly in the context of the Jinni Prolog compiler's runtime system.
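The partial-marking idea can be sketched as a randomized traversal that descends into each reference with probability p and extrapolates a live-data estimate from the sampled marks. The extrapolation formula and the collection threshold below are placeholders, not the empirical formula derived in the thesis.

```python
# Sketch of partial marking: follow references from the roots, but only descend into
# each child with probability p, then extrapolate a live-data estimate. The formula
# and threshold are placeholders, not the heuristics derived in the thesis.
import random

def partial_mark(roots, children, p, rng):
    """Randomly mark part of the reachable graph; returns the set of marked nodes."""
    marked, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        for child in children.get(node, []):
            if rng.random() < p:            # only sample a fraction of the edges
                stack.append(child)
    return marked

def should_collect(heap_size, roots, children, p=0.3, threshold=0.5, rng=random):
    sampled = len(partial_mark(roots, children, p, rng))
    estimated_live = min(sampled / p, heap_size)   # crude extrapolation
    return estimated_live / heap_size < threshold  # collect only if enough garbage expected

heap = {i: [i + 1] for i in range(99)}            # a simple chain of live cells
print("collect" if should_collect(1000, roots=[0], children=heap) else "expand heap")
```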
- Intelligent Memory Manager: Towards improving the locality behavior of allocation-intensive applications.
- Dynamic memory management required by allocation-intensive (i.e., object-oriented and linked data structured) applications has led to a large number of research trends. Memory performance due to cache misses in these applications continues to lag in terms of execution cycles as the ever-increasing CPU-memory speed gap continues to grow. Sophisticated prefetching techniques, data relocations, and multithreaded architectures have tried to address memory latency. These techniques are not completely successful since they require either extra hardware/software in the system or special properties in the applications. Software needed for prefetching and data relocation strategies, aimed to improve cache performance, pollutes the cache so that the technique itself becomes counter-productive. On the other hand, the extra hardware complexity needed in multithreaded architectures decelerates the CPU's clock, since "Simpler is Faster." This dissertation, directed to seek the cause of the poor locality behavior of allocation-intensive applications, studies allocators and their impact on the cache performance of these applications. Our study concludes that service functions, in general, and memory management functions, in particular, entangle with the application's code and become the major cause of cache pollution. In this dissertation, we present a novel technique that transfers the allocation and de-allocation functions entirely to a separate processor residing in a chip with DRAM (Intelligent Memory Manager). Our empirical results show that, on average, 60% of the cache misses caused by allocation and de-allocation service functions are eliminated using our technique.
- Intrinsic and Extrinsic Adaptation in a Simulated Combat Environment
- Genetic algorithm and artificial life techniques are applied to the development of challenging and interesting opponents in a combat-based computer game. Computer simulations are carried out against an idealized human player to gather data on the effectiveness of the computer-generated opponents.
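A minimal genetic-algorithm loop for evolving opponent parameters might look like the sketch below. The genome (three behavioral traits), the fitness function, and the rates shown are assumptions for illustration only, not the simulation described above.

```python
import random

# Toy genetic-algorithm loop for evolving combat-opponent parameters.
# The three-trait genome and the fitness function are illustrative assumptions.

def fitness(genome):
    # Pretend "challenge" peaks when all traits are balanced around 0.7.
    return -sum((g - 0.7) ** 2 for g in genome)

def evolve(pop_size=20, generations=50, mutation_rate=0.1):
    population = [[random.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]            # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [min(1.0, max(0.0, g + random.gauss(0, mutation_rate))) for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```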
- Investigating the Extractive Summarization of Literary Novels
- Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are witnessing a change, however: an increasingly large number of books are becoming available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While a significant body of research has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels are different in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents and outperform traditional extractive summarization systems, which typically address the news genre.
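A minimal unsupervised extractive baseline of the kind such systems are compared against can be sketched as frequency-based sentence scoring. The tokenization, scoring, and toy text below are simplifying assumptions, not the systems built in the thesis.

```python
import re
from collections import Counter

# Minimal frequency-based extractive summarizer: score each sentence by the
# average document frequency of its words and keep the top-scoring sentences.

def summarize(text, num_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return [s for s in sentences if s in top]   # preserve original order

print(summarize("The captain sailed at dawn. The sea was calm. "
                "The captain feared the storm that the sea had promised."))
```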
- Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.
- This research addresses the problem of automatic keyphrase extraction from large documents and back-of-the-book indexing. The potential benefits of automating this process are far-reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back-of-the-book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system against the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back-of-the-book indexing system based on the keyphrase extraction system is shown to lead to back-of-the-book indexes closely resembling those created by human experts.
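The informativeness feature can be illustrated with a chi-squared style association score between a candidate phrase and the document, computed from a 2x2 contingency table of phrase occurrences inside the document versus a background corpus. The helper function and the counts in the example are illustrative assumptions, not the feature as implemented in the dissertation.

```python
# Sketch of a chi-squared informativeness score for a candidate keyphrase,
# contrasting how often it occurs in the target document versus a background
# corpus. The counts used in the example are made up.

def chi_squared(phrase_in_doc, doc_tokens, phrase_in_corpus, corpus_tokens):
    """2x2 chi-squared statistic: phrase vs. other tokens, document vs. corpus."""
    observed = [
        [phrase_in_doc, doc_tokens - phrase_in_doc],
        [phrase_in_corpus, corpus_tokens - phrase_in_corpus],
    ]
    total = doc_tokens + corpus_tokens
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            row = sum(observed[i])
            col = observed[0][j] + observed[1][j]
            expected = row * col / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# A phrase that is unusually frequent in the document scores high.
print(chi_squared(phrase_in_doc=12, doc_tokens=5000,
                  phrase_in_corpus=40, corpus_tokens=2_000_000))
```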
- A Language and Visual Interface to Specify Complex Spatial Pattern Mining
Access: Use of this item is restricted to the UNT Community.
The emerging interest in spatial pattern mining leads to a demand for a flexible spatial pattern mining language, on which an easy-to-use and easy-to-understand visual pattern language could be built. To this end, this paper defines a pattern mining language called LCSPM that allows users to specify complex spatial patterns. A visual interface that allows users to specify the patterns visually has been developed. Visual pattern queries are translated into the LCSPM language by a parser, and the data mining process can then be triggered. The visual language is based on, and goes beyond, the visual language proposed in the literature. I implemented a prototype system based on the open source JUMP framework.
- Logic Programming Tools for Dynamic Content Generation and Internet Data Mining
Access: Use of this item is restricted to the UNT Community.
The phenomenal growth of information technology requires us to elicit, store, and maintain huge volumes of data. Analyzing this data for various purposes is becoming increasingly important. Data mining consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data. We present two techniques based on using Logic Programming tools for data mining. Data mining analyzes data by extracting patterns that describe its structure and discovers correlations in the form of rules. We distinguish analysis methods as visual and non-visual and present one application of each. We explain how our focus on the field of Logic Programming makes some of the very complex tasks related to Web-based data mining and dynamic content generation simple and easy to implement in a uniform framework.
- Machine Language Techniques for Conversational Agents
- Machine learning is the ability of a machine to perform better at a given task using its previous experience. Various algorithms, such as decision trees, Bayesian learning, artificial neural networks, and instance-based learning, are widely used in machine learning systems. Current applications of machine learning include credit card fraud detection, customer service based on purchase history, games, and many more. The application of machine learning techniques to natural language processing (NLP) has increased tremendously in recent years; examples are handwriting recognition and speech recognition. The problem we tackle in this Problem in Lieu of Thesis is applying machine learning techniques to improve the performance of a conversational agent. The OpenMind repository of common-sense knowledge, in the form of question-answer pairs, is treated as the training data for the machine learning system. The system interfaces with WordNet to capture important semantic and syntactic information about the words in the sentences. Further, the k-nearest neighbors algorithm, an instance-based learning algorithm, is used to simulate a case-based learning system. The resulting system is expected to be able to answer new queries with knowledge gained from the training data it was given.
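Case-based answering over question-answer pairs with a nearest-neighbor lookup can be sketched as below. The toy pairs and the bag-of-words overlap similarity stand in for the OpenMind data and the WordNet-derived features, and are illustrative assumptions only.

```python
from collections import Counter

# Sketch of instance-based (k-nearest-neighbors) answering over question-answer
# pairs: find the stored questions most similar to the query and return the
# answer of the closest one.

TRAINING_PAIRS = [
    ("what do you use an umbrella for", "to stay dry in the rain"),
    ("what is an apple", "an apple is a fruit"),
    ("why do people sleep", "people sleep to rest"),
]

def similarity(q1, q2):
    """Simple bag-of-words overlap between two questions."""
    a, b = Counter(q1.lower().split()), Counter(q2.lower().split())
    return sum((a & b).values())

def answer(query, k=1):
    ranked = sorted(TRAINING_PAIRS, key=lambda pair: similarity(query, pair[0]), reverse=True)
    neighbors = ranked[:k]
    # With k == 1 this returns the closest case; a larger k could vote over answers.
    return neighbors[0][1]

print(answer("what is an umbrella used for"))
```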
- Measuring Semantic Relatedness Using Salient Encyclopedic Concepts
- While pragmatics, through its integration of situational awareness and real-world knowledge, offers a high level of analysis suitable for genuine interpretation of natural dialogue, semantics, at the other end, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in the literature has revolved around the famous quote "You shall know a word by the company it keeps." In this thesis we investigate the role of context constituents in decoding the semantic meaning of the surrounding context; specifically, we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues for an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model, and apply it to infer semantic relatedness between textual pairs, whether they are words, sentences, or paragraphs. Moreover, we explore the abstraction of semantics across languages and utilize our findings in building a novel multi-lingual semantic relatedness model that exploits information acquired from various languages. We demonstrate the effectiveness and superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized synthetic datasets for semantic relatedness, as well as on real-world applications such as paraphrase detection and short answer grading. Our work represents a novel approach to integrating world knowledge into current semantic models, and a means to cross the language boundary for a better and more robust semantic relatedness representation, thus opening the door to an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines.
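The salient-concept idea can be sketched as representing each text by a weighted vector over encyclopedic concepts and comparing texts by the cosine of their vectors. The toy concept profiles below are made-up assumptions, not the model built in the thesis.

```python
import math

# Sketch of concept-vector semantic relatedness: each text is a weighted vector
# over encyclopedic concepts, and two texts are compared by cosine similarity.

def cosine(u, v):
    dot = sum(u.get(c, 0.0) * w for c, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical concept profiles for two short texts.
text_a = {"Jaguar": 0.9, "Felidae": 0.7, "Rainforest": 0.4}
text_b = {"Leopard": 0.8, "Felidae": 0.6, "Savanna": 0.5}

print(round(cosine(text_a, text_b), 3))   # overlap on "Felidae" drives relatedness
```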