The widely used vector model maintains its popularity because of its simplicity, fast speed, and the appeal of using spatial proximity for semantic proximity. However, this model faces a disadvantage that is associated with the vagueness from keywords overlapping. Efforts have been made to improve the vector model. The research on improving document representation has been focused on four areas, namely, statistical co-occurrence of related items, forming term phrases, grouping of related words, and representing the content of documents. In this thesis, we propose the idea-indexing model to improve document representation for the filtering task in IR. The idea-indexing model matches document terms with the ideas they express and indexes the document with these ideas. This indexing scheme represents the document with its semantics instead of sets of independent terms. We show in this thesis that indexing with ideas leads to better performance.
The C Navigational System (CNS) is a proposed programming environment for the C programming language. The introduction covers the major influences of programming environments and the components of a programming environment. The system is designed to support the design, coding and maintenance phases of software development. CNS provides multiple views to both the source and documentation for a programming project. User-defined and system-defined links allow the source and documentation to be hierarchically searched. CNS also creates a history list and function interface for each function in a module. The final chapter compares CNS and several other programming environments (Microscope, Rn, Cedar, PECAN, and Marvel).
This paper describes the general architecture and function of a Case-Based Reasoning (CBR) system implemented with ASP.NET and C#. Microsoft Visual Studio .NET and XML Web Services provide a flexible, standards-based model that allows clients to access data. Web Form Pages offer a powerful programming model for Web-enabled user interface. The system provides a variety of mechanisms and services related to story retrieval and adaptation. Users may browse and search a library of text stories. More advanced CBR capabilities were also implemented, including a multi-factor distance-calculation for matching user interests with stories in the library, recommendations on optimizing search, and adaptation of stories to match user interests.
We have a huge amount of video data from extensively available surveillance cameras and increasingly growing technology to record the motion of a moving object in the form of trajectory data. With proliferation of location-enabled devices and ongoing growth in smartphone penetration as well as advancements in exploiting image processing techniques, tracking moving objects is more flawlessly achievable. In this work, we explore some domain-independent qualitative and quantitative features in raw trajectory (spatio-temporal) data in videos captured by a fixed single wide-angle view camera sensor in outdoor areas. We study the efficacy of those features in classifying four basic high level actions by employing two supervised learning algorithms and show how each of the features affect the learning algorithms’ overall accuracy as a single factor or confounded with others.
Modern high performance computing is dependent on parallel processing systems. Most current benchmarks reveal only the high level computational throughput metrics, which may be sufficient for single processor systems, but can lead to a misrepresentation of true system capability for parallel systems. A new benchmark is therefore proposed. CLUE (Cluster Evaluator) uses a cellular automata algorithm to evaluate the scalability of parallel processing machines. The benchmark also uses algorithmic variations to evaluate individual system components' impact on the overall serial fraction and efficiency. CLUE is not a replacement for other performance-centric benchmarks, but rather shows the scalability of a system and provides metrics to reveal where one can improve overall performance. CLUE is a new benchmark which demonstrates a better comparison among different parallel systems than existing benchmarks and can diagnose where a particular parallel system can be optimized.
One of the greatest problems facing researchers in the sub field of Artificial Intelligence known as Intelligent Tutoring Systems (ITS) is the selection of a knowledge base designs that will facilitate the modification of the knowledge base. The Class-Entity-Relationship-Attribute (CERA), proposed by R. P. Brazile, holds certain promise as a more generic knowledge base design framework upon which can be built robust and efficient ITS. This study has a twofold purpose. The first is to demonstrate that a CERA knowledge base can be constructed for an ITS on a subset of the domain of Cretaceous paleontology and function as the "expert module" of the ITS. The second is to test the validity of the ideas that students guided through a lesson learn more factual knowledge, while those who explore the knowledge base that underlies the lesson through query at their own pace will be able to formulate their own integrative knowledge from the knowledge gained in their explorations and spend more time on the system. This study concludes that a CERA-based system can be constructed as an effective teaching tool. However, while an ITS - treatment provides for statistically significant gains in achievement test scores, the type of treatment seems not to matter as much as time spent on task. This would seem to indicate that a query-based system which allows the user to progress at their own pace would be a better type of system for the presentation of material due to the greater amount of on-line computer time exhibited by the users.
Agent-oriented software engineering (AOSE) covers issues on developing systems with software agents. There are many techniques, mostly agent-oriented and object-oriented, ready to be chosen as building blocks to create agent-based systems. There have been several AOSE methodologies proposed intending to show engineers guidelines on how these elements are constituted in having agents achieve the overall system goals. Although these solutions are promising, most of them are designed in ad-hoc manner without truly obeying software developing life-cycle fully, as well as lacking of examinations on agent-oriented features. To address these issues, we investigated state-of-the-art techniques and AOSE methodologies. By examining them in different respects, we commented on the strength and weakness of them. Toward a formal study, a comparison framework has been set up regarding four aspects, including concepts and properties, notations and modeling techniques, process, and pragmatics. Under these criteria, we conducted the comparison in both overview and detailed level. The comparison helped us with empirical and analytical study, to inspect the issues on how an ideal agent-based system will be formed.
This thesis compares the file organization techniques that are implemented on two different types of computer systems, the large-scale and the small-scale. File organizations from representative computers in each class are examined in detail: the IBM System/370 (OS/370) and the Harris 1600 Distributed Processing System with the Extended Communications Operating System (ECOS). In order to establish the basic framework for comparison, an introduction to file organizations is presented. Additionally, the functional requirements for file organizations are described by their characteristics and user demands. Concluding remarks compare file organization techniques and discuss likely future developments of file systems.
There are three main results in this dissertation. They are PLS-completeness of discrete Hopfield network convergence with eight different restrictions, (degree 3, bipartite and degree 3, 8-neighbor mesh, dual of the knight's graph, hypercube, butterfly, cube-connected cycles and shuffle-exchange), exponential convergence behavior of discrete Hopfield network, and simulation of Turing machines by discrete Hopfield Network.
Many infectious diseases are spread through interactions between susceptible and infectious individuals. Keeping track of where each exposure to the disease took place, when it took place, and which individuals were involved in the exposure can give public health officials important information that they may use to formulate their interventions. Further, knowing which individuals in the population are at the highest risk of becoming infected with the disease may prove to be a useful tool for public health officials trying to curtail the spread of the disease. Epidemiological models are needed to allow epidemiologists to study the population dynamics of transmission of infectious agents and the potential impact of infectious disease control programs. While many agent-based computational epidemiological models exist in the literature, they focus on the spread of disease rather than exposure risk. These models are designed to simulate very large populations, representing individuals as agents, and using random experiments and probabilities in an attempt to more realistically guide the course of the modeled disease outbreak. The work presented in this thesis focuses on tracking exposure risk to chickenpox in an elementary school setting. This setting is chosen due to the high level of detailed information realistically available to school administrators regarding individuals' schedules and movements. Using an agent-based approach, contacts between individuals are tracked and analyzed with respect to both individuals and locations. The results are then analyzed using a combination of tools from computer science and geographic information science.
Publicly available datasets in health science are often large and observational, in contrast to experimental datasets where a small number of data are collected in controlled experiments. Variables' causal relationships in the observational dataset are yet to be determined. However, there is a significant interest in health science to discover and analyze causal relationships from health data since identified causal relationships will greatly facilitate medical professionals to prevent diseases or to mitigate the negative effects of the disease. Recent advances in Computer Science, particularly in Bayesian networks, has initiated a renewed interest for causality research. Causal relationships can be possibly discovered through learning the network structures from data. However, the number of candidate graphs grows in a more than exponential rate with the increase of variables. Exact learning for obtaining the optimal structure is thus computationally infeasible in practice. As a result, heuristic approaches are imperative to alleviate the difficulty of computations. This research provides effective and efficient learning tools for local causal discoveries and novel methods of learning causal structures with a combination of background knowledge. Specifically in the direction of constraint based structural learning, polynomial-time algorithms for constructing causal structures are designed with first-order conditional independence. Algorithms of efficiently discovering non-causal factors are developed and proved. In addition, when the background knowledge is partially known, methods of graph decomposition are provided so as to reduce the number of conditioned variables. Experiments on both synthetic data and real epidemiological data indicate the provided methods are applicable to large-scale datasets and scalable for causal analysis in health data. Followed by the research methods and experiments, this dissertation gives thoughtful discussions on the reliability of causal discoveries computational health science research, complexity, and implications in health science research.
POD (Point of Dispensing)-based emergency response plans involving mass prophylaxis may seem feasible when considering the choice of dispensing points within a region, overall population density, and estimated traffic demands. However, the plan may fail to serve particular vulnerable sub-populations, resulting in access disparities during emergency response. Federal authorities emphasize on the need to identify sub-populations that cannot avail regular services during an emergency due to their special needs to ensure effective response. Vulnerable individuals require the targeted allocation of appropriate resources to serve their special needs. Devising schemes to address the needs of vulnerable sub-populations is essential for the effectiveness of response plans. This research focuses on data-driven computational methods to quantify and address vulnerabilities in response plans that require the allocation of targeted resources. Data-driven methods to identify and quantify vulnerabilities in response plans are developed as part of this research. Addressing vulnerabilities requires the targeted allocation of appropriate resources to PODs. The problem of resource allocation to PODs during public health emergencies is introduced and the variants of the resource allocation problem such as the spatial allocation, spatio-temporal allocation and optimal resource subset variants are formulated. Generating optimal resource allocation and scheduling solutions can be computationally hard problems. The application of metaheuristic techniques to find near-optimal solutions to the resource allocation problem in response plans is investigated. A vulnerability analysis and resource allocation framework that facilitates the demographic analysis of population data in the context of response plans, and the optimal allocation of resources with respect to the analysis are described.
Synthetic seismograms are a computer-generated aid in the search for hydrocarbons. Heretofore the solution has been done by z-transforms. This thesis presents a solution based on the method of finite differences. The resulting algorithm is fast and compact. The method is applied to three variations of the problem, all three are reduced to the same approximating equation, which is shown to be optimal, in that grid refinement does not change it. Two types of algorithms are derived from the equation. The number of obvious multiplications, additions and subtractions of each is analyzed. Critical section of each requires one multiplication, two additions and two subtractions. Four sample synthetic seismograms are shown. Implementation of the new algorithm runs twice as fast as previous computer program.
The problem with which this research was done was that of applying the IBM360 computer to the analysis of waveforms from a Beckman model 120C liquid chromatograph. Software to interpret these waveforms was written in the PLl language. For a control run, input to the computer consisted of a digital tape containing the raw results of the chromatograph run. Output consisted of several graphs and charts giving the results of the analysis. In addition, punched output was provided which gave the name of each amino acid, its elution time and color constant. These punched cards were then input to the computer as input to the experimental run, along with the raw data on the digital tape. From the known amounts of amino acids in the control run and the ratio of control to experimental peak area, the amino acids of the unknown were quantified. The resulting programs provided a complete and easy to use solution to the problem of chromatographic data analysis.
This paper presents the scan-line algorithm which has been implemented on the Lisp Machine. The scan-line algorithm resides beneath a library of primitive software routines which draw more fundamental objects: lines, triangles and rectangles. This routine, implemented in microcode, applies the A(BC)*D approach to word boundary alignments in order to create an extremely fast, efficient, and general purpose drawing primitive. The scan-line algorithm improves on previous methodologies by limiting the number of CPU intensive instructions and by minimizing the number of words referenced. This paper will describe how to draw scan-lines and the constraints imposed upon the scan-line algorithm by the Lisp Machine's hardware and software.
This study models the human process of music cognition on the digital computer. The definition of music cognition is derived from the work in music cognition done by the researchers Carol Krumhansl and Edward Kessler, and by Mari Jones, as well as from the music theories of Heinrich Schenker. The computer implementation functions in three stages. First, it translates a musical "performance" in the form of MIDI (Musical Instrument Digital Interface) messages into LISP structures. Second, the various parameters of the performance are examined separately a la Jones's joint accent structure, quantified according to psychological findings, and adjusted to a common scale. The findings of Krumhansl and Kessler are used to evaluate the consonance of each note with respect to the key of the piece and with respect to the immediately sounding harmony. This process yields a multidimensional set of points, each of which is a cognitive evaluation of a single musical event within the context of the piece of music within which it occurred. This set of points forms a metric space in multi-dimensional Euclidean space. The third phase of the analysis maps the set of points into a topology-preserving data structure for a Schenkerian-like middleground structural analysis. This process yields a hierarchical stratification of all the musical events (notes) in a piece of music. It has been applied to several pieces of music with surprising results. In each case, the analysis obtained very closely resembles a structural analysis which would be supplied by a human theorist. The results obtained invite us to take another look at the representation of knowledge and perception from another perspective, that of a set of points in a topological space, and to ask if such a representation might not be useful in other domains. It also leads us to ask if such a ...
This investigation dealt with locating and measuring x-ray absorption of radiographic images. The methods developed provide a fast, accurate, minicomputer control, for analysis of embedded objects. A PDP/8 computer system was interfaced with a Joyce Loebl 3CS Microdensitometer and a Leeds & Northrup Recorder. Proposed algorithms for bone location and data smoothing work on a twelve-bit minicomputer. Designs of a software control program and operational procedure are presented. The filter made wedge and limb scans monotonic from minima to maxima. It was tested for various convoluted intervals. Ability to resmooth the same data in multiple passes was tested. An interval size of fifteen works well in one pass.
Content-based image retrieval (CBIR) is the retrieval of images from a collection by means of internal feature measures of the information content of the images. In CBIR systems, text media is usually used only to retrieve exemplar images for further searching by image feature content. This research work describes a new method for integrating multimedia text and image content features to increase the retrieval performance of the system. I am exploring the content-based features of an image extracted from a video to build a storyboard for search retrieval of images. Metadata encoded multimedia features include extracting primitive features like color, shape and text from an image. Histograms are built for all the features extracted and stored in a database. Images are searched based on comparing these histogram values of the extracted image with the stored values. These histogram values are used for extraction of keyframes from a collection of images parsed from a video file. Individual shots of images are extracted from a video clip and run through processes that extract the features and build the histogram values. A keyframe extraction algorithm is run to get the keyframes from the collection of images to build a storyboard of images. In video retrieval, speech recognition and other multimedia encoding could help improve the CBIR indexing technique and makes keyframe extraction and searching effective. Research in area of embedding sound and other multimedia could enhance effective video retrieval.
Streaming multimedia content with UDP has become popular over distributed systems such as an Internet. This may encounter many losses due to dropped packets or late arrivals at destination since UDP can only provide best effort delivery. Even UDP doesn't have any self-recovery mechanism from congestion collapse or bursty loss to inform sender of the data to adjust future transmission rate of data like in TCP. So there is a need to incorporate various control schemes like forward error control, interleaving, and congestion control and error concealment into real-time transmission to prevent from effect of losses. Loss can be repaired by retransmission if roundtrip delay is allowed, otherwise error concealment techniques will be used based on the type and amount of loss. This paper implements the interleaving technique with packet spacing of varying interleaver block size for protecting real-time data from loss and its effect during transformation across the Internet. The packets are interleaved and maintain some time gap between two consecutive packets before being transmitted into the Internet. Thus loss of packets can be reduced from congestion and preventing loss of consecutive packets of information when a burst of several packets are lost. Several experiments have been conducted with video data for analysis of proposed model.
Surface fitting methods play an important role in many scientific fields as well as in computer aided geometric design. The problem treated here is that of constructing a smooth surface that interpolates data values associated with scattered nodes in the plane. The data is said to be convex if there exists a convex interpolant. The problem of convexity-preserving interpolation is to determine if the data is convex, and construct a convex interpolant if it exists.
Our generation has experienced one of the most dramatic changes in how society communicates. Today, we have online information on almost any imaginable topic. However, most of this information is available in only a few dozen languages. In this thesis, I explore the use of parallel texts to enable cross-language information retrieval (CLIR) for languages with scarce resources. To build the parallel text I use the Bible. I evaluate different variables and their impact on the resulting CLIR system, specifically: (1) the CLIR results when using different amounts of parallel text; (2) the role of paraphrasing on the quality of the CLIR output; (3) the impact on accuracy when translating the query versus translating the collection of documents; and finally (4) how the results are affected by the use of different dialects. The results show that all these variables have a direct impact on the quality of the CLIR system.
Blood pressure is vital sign information that physicians often need as preliminary data for immediate intervention during emergency situations or for regular monitoring of people with cardiovascular diseases. Despite the availability of portable blood pressure meters in the market, they are not regularly carried by people, creating a need for an ultra-portable measurement platform or device that can be easily carried and used at all times. One such device is the smartphone which, according to comScore survey is used by 26.2% of the US adult population. the mass production of these phones with built-in sensors and high computation power has created numerous possibilities for application development in different domains including biomedical. Motivated by this capability and their extensive usage, this thesis focuses on developing a blood pressure measurement platform on smartphones. Specifically, I developed a blood pressure measurement system on a smart phone using the built-in camera and a customized external microphone. the system consists of first obtaining heart beats using the microphone and finger pulse with the camera, and finally calculating the blood pressure using the recorded data. I developed techniques for finding the best location for obtaining the data, making the system usable by all categories of people. the proposed system resulted in accuracies between 90-100%, when compared to traditional blood pressure meters. the second part of this thesis presents a new system for remote heart beat monitoring using the smart phone. with the proposed system, heart beats can be transferred live by patients and monitored by physicians remotely for diagnosis. the proposed blood pressure measurement and remote monitoring systems will be able to facilitate information acquisition and decision making by the 9-1-1 operators.
Mobile agents require an appropriate platform that can facilitate their migration and execution. In particular, the design and implementation of such a system must balance several factors that will ensure that its constituent agents are executed without problems. Besides the basic requirements of migration and execution, an agent system must also provide mechanisms to ensure the security and survivability of an agent when it migrates between hosts. In addition, the system should be simple enough to facilitate its widespread use across large scale networks (i.e Internet). To address these issues, this thesis discusses the design and implementation of the Distributed Agent Delivery System (DADS). The DADS provides a de-coupled design that separates agent acceptance from agent execution. Using functional modules, the DADS provides services ranging from language execution and security to fault-tolerance and compression. Modules allow the administrator(s) of hosts to declare, at run-time, the services that they want to provide. Since each administrative domain is different, the DADS provides a platform that can be adapted to exchange heterogeneous blends of agents across large scale networks.
Managing large-scale dynamical systems (e.g., transportation systems, complex information systems, and power networks, etc.) in real-time is very challenging considering their complicated system dynamics, intricate network interactions, large scale, and especially the existence of various uncertainties. To address this issue, intelligent techniques which can quickly design decision-making strategies that are robust to uncertainties are needed. This dissertation aims to conquer these challenges by exploring a data-driven decision-making framework, which leverages big-data techniques and scalable uncertainty evaluation approaches to quickly solve optimal control problems. In particular, following techniques have been developed along this direction: 1) system modeling approaches to simplify the system analysis and design procedures for multiple applications; 2) effective simulation and analytical based approaches to efficiently evaluate system performance and design control strategies under uncertainty; and 3) big-data techniques that allow some computations of control strategies to be completed offline. These techniques and tools for analysis, design and control contribute to a wide range of applications including air traffic flow management, complex information systems, and airborne networks.
Mobile phone advancements and ubiquitous internet connectivity are resulting in ever expanding possibilities in the application of smart phones. Users of mobile phones are now capable of hosting server applications from their personal devices. Whether providing services individually or in an ad hoc network setting the devices are currently not configured for defending against distributed denial of service (DDoS) attacks. These attacks, often launched from a botnet, have existed in the space of personal computing for decades but recently have begun showing up on mobile devices. Research is done first into the required steps to develop a potential botnet on the Android platform. This includes testing for the amount of malicious traffic an Android phone would be capable of generating for a DDoS attack. On the other end of the spectrum is the need of mobile devices running networked applications to develop security against DDoS attacks. For this mobile, phones are setup, with web servers running Apache to simulate users running internet connected applications for either local ad hoc networks or serving to the internet. Testing is done for the viability of using commonly available modules developed for Apache and intended for servers as well as finding baseline capabilities of mobiles to handle higher traffic volumes. Given the unique challenge of the limited resources a mobile phone can dedicate to Apache when compared to a dedicated hosting server a new method was needed. A proposed defense algorithm is developed for mitigating DDoS attacks against the mobile server that takes into account the limited resources available on the mobile device. The algorithm is tested against TCP socket flooding for effectiveness and shown to perform better than the common Apache module installations on a mobile device.
This research explores the concepts of defensive programming as currently defined in the literature. Then these concepts are extended and more explicitly defined. The relationship between defensive programming, as presented in this research, and current programming practices is discussed and several benefits are observed. Defensive programming appears to benefit the entire software life cycle. Four identifiable phases of the software development process are defined, and the relationship between these four phases and defensive programming is shown. In this research, defensive programming is defined as writing programs in such a way that during execution the program itself produces communication allowing the programmer and the user to observe its dynamic states accurately and critically. To accomplish this end, the use of defensive programming snap shots is presented as a software development tool.
Free and fair elections are the basis for democracy, but conducting elections is not an easy task. Different groups of people are trying to influence the outcome of the election in their favor using the range of methods, from campaigning for a particular candidate to well-financed lobbying. Often the stakes are too high, and the methods are illegal. Two main properties of any voting scheme are the privacy of a voter’s choice and the integrity of the tally. Unfortunately, they are mutually exclusive. Integrity requires making elections transparent and auditable, but at the same time, we must preserve a voter’s privacy. It is always a trade-off between these two requirements. Current voting schemes favor privacy over auditability, and thus, they are vulnerable to voting fraud. I propose two novel voting systems that can achieve both privacy and verifiability. The first protocol is based on cryptographical primitives to ensure the integrity of the final tally and privacy of the voter. The second protocol is a simple paper-based voting scheme that achieves almost the same level of security without usage of cryptography.
Operatorless Prolog text is LL(1) in nature and any standard LL parser generator tool can be used to parse it. However, the Prolog text that conforms to the ISO Prolog standard allows the definition of dynamic operators. Since Prolog operators can be defined at run-time, operator symbols are not present in the grammar rules of the language. Unless the parser generator allows for some flexibility in the specification of the grammar rules, it is very difficult to generate a parser for such text. In this thesis we discuss the existing parsing methods and their modified versions to parse languages with dynamic operator capabilities. Implementation details of a parser using Javacc as a parser generator tool to parse standard Prolog text is provided. The output of the parser is an “Abstract Syntax Tree” that reflects the correct precedence and associativity rules among the various operators (static and dynamic) of the language. Empirical results are provided that show that a Prolog parser that is generated by the parser generator like Javacc is comparable in efficiency to a hand-coded parser.
As bandwidth constraints on LAN/WAN environments decrease, the demand for distributed services will continue to increase. In particular, the proliferation of user-level applications requiring high-capacity distributed file storage systems will demand that such services be universally available. At the same time, the advent of high-speed networks have made the deployment of application and communication solutions based upon an Intelligent Mobile Agent (IMA) framework practical. Agents have proven to present an ideal development paradigm for the creation of autonomous large-scale distributed systems, and an agent-based communication scheme would facilitate the creation of independently administered distributed file services. This thesis thus outlines an architecture for such a distributed file system based upon an IMA communication framework.
Environmental monitoring represents a major application domain for wireless sensor networks (WSN). However, despite significant advances in recent years, there are still many challenging issues to be addressed to exploit the full potential of the emerging WSN technology. In this dissertation, we introduce the design and implementation of low-power wireless sensor networks for long-term, autonomous, and near-real-time environmental monitoring applications. We have developed an out-of-box solution consisting of a suite of software, protocols and algorithms to provide reliable data collection with extremely low power consumption. Two wireless sensor networks based on the proposed solution have been deployed in remote field stations to monitor soil moisture along with other environmental parameters. As parts of the ever-growing environmental monitoring cyberinfrastructure, these networks have been integrated into the Texas Environmental Observatory system for long-term operation. Environmental measurement and network performance results are presented to demonstrate the capability, reliability and energy-efficiency of the network.
The purpose of this project was to describe a novel design approach for a digital computer peripheral controller, then design and construct a case study controller. This document consists of three chapters and an appendix. Chapter II presents the design approach chosen; a variation to a design presented by Charles R. Richards in an article published in Electronics magazine. Richards' approach consists of a finite state machine circuitry controlling all the functions of a controller. The variation to Richards' approach consists of considering the various logically independent processes which a controller carries out and assigning control of each process to a separate finite state machine. The appendix contains the documentation of the design and construction of the controller.
The recent growth in sensor technology allows easier information gathering in real-time as sensors have grown smaller, more accurate, and less expensive. The resulting data is often in a geo-stream format continuously changing input with a spatial extent. Researchers developing geo-streaming management systems (GSMS) require a benchmark system for evaluation, which is currently lacking. This thesis presents GSMark, a benchmark for evaluating GSMSs. GSMark provides a data generator that creates a combination of synthetic and real geo-streaming data, a workload simulator to present the data to the GSMS as a data stream, and a set of benchmark queries that evaluate typical GSMS functionality and query performance. In particular, GSMark generates both moving points and evolving spatial regions, two fundamental data types for a broad range of geo-stream applications, and the geo-streaming queries on this data.
There are several types of disorders that affect our colon’s ability to function properly such as colorectal cancer, ulcerative colitis, diverticulitis, irritable bowel syndrome and colonic polyps. Automatic detection of these diseases would inform the endoscopist of possible sub-optimal inspection during the colonoscopy procedure as well as save time during post-procedure evaluation. But existing systems only detects few of those disorders like colonic polyps. In this dissertation, we address the automatic detection of another important disorder called ulcerative colitis. We propose a novel texture feature extraction technique to detect the severity of ulcerative colitis in block, image, and video levels. We also enhance the current informative frame filtering methods by detecting water and bubble frames using our proposed technique. Our feature extraction algorithm based on accumulation of pixel value difference provides better accuracy at faster speed than the existing methods making it highly suitable for real-time systems. We also propose a hybrid approach in which our feature method is combined with existing feature method(s) to provide even better accuracy. We extend the block and image level detection method to video level severity score calculation and shot segmentation. Also, the proposed novel feature extraction method can detect water and bubble frames in colonoscopy videos with very high accuracy in significantly less processing time even when clustering is used to reduce the training size by 10 times.
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past, present and events that may occur in future. We define event participant as people directly involved in an event regardless of whether they are the agent, recipient or play another role. We present an annotation effort, guidelines and quality analysis with 1,096 event mentions. We discuss the label distributions and event behavior in the annotated corpus. We also explain several features used and a standard supervised machine learning approach to automatically determine if and when the author is a participant of the event in the tweet. We discuss trends in the results obtained and devise important conclusions.
As Virtual Environments (VE) become a more commonly used method of interaction and presentation, supporting users as they navigate and interact with scenarios presented in VE will be a significant issue. A key step in understanding the needs of users in these situations will be observing them perform representative tasks in a fully developed environment. In this paper, we describe the development of a test bed for interactive narrative in a virtual environment. The test bed was specifically developed to present multiple, simultaneous sequences of events (scenarios or narratives) and to support user navigation through these scenarios. These capabilities will support the development of multiple users testing scenarios, allowing us to study and better understand the needs of users of narrative VEs.
With a growing concern of an infectious diseases spreading in a population, epidemiology is becoming more important for the future of public health. In the past epidemiologist used existing data of an outbreak to help them determine how an infectious disease might spread in the future. Now with computational models, they able to analysis data produced by these models to help with prevention and intervention plans. This paper looks at the design, implementation, and analysis of a computational model based on the interactions of the population between individuals. The design of the working contact model looks closely at the SEIR model used as the foundation and the two timelines of a disease. The implementation of the contact model is reviewed while looking closely at data structures. The analysis of the experiments provide evidence this contact model can be used to help epidemiologist study the spread of an infectious disease based on the contact rate of individuals.
This software is aimed at providing user-friendly, easy-to-use environment for the user to scrub (de-identify/modify) the DICOM header information. Some tools either anonymize or default the values without the user interaction. The user doesn't have the flexibility to edit the header information. One cannot scrub set of images simultaneously (batch scrubbing). This motivated to develop a tool/utility that can scrub a set of images in a single step more efficiently. This document also addresses security issues of the patient confidentiality to achieve protection of patient identifying information and some technical requirements
Online/offline signature schemes are useful in many situations, and two such scenarios are considered in this dissertation: bursty server authentication and embedded device authentication. In this dissertation, new techniques for online/offline signing are introduced, those are applied in a variety of ways for creating online/offline signature schemes, and five different online/offline signature schemes that are proved secure under a variety of models and assumptions are proposed. Two of the proposed five schemes have the best offline or best online performance of any currently known technique, and are particularly well-suited for the scenarios that are considered in this dissertation. To determine if the proposed schemes provide the expected practical improvements, a series of experiments were conducted comparing the proposed schemes with each other and with other state-of-the-art schemes in this area, both on a desktop class computer, and under AVR Studio, a simulation platform for an 8-bit processor that is popular for embedded systems. Under AVR Studio, the proposed SGE scheme using a typical key size for the embedded device authentication scenario, can complete the offline phase in about 24 seconds and then produce a signature (the online phase) in 15 milliseconds, which is the best offline performance of any known signature scheme that has been proven secure in the standard model. In the tests on a desktop class computer, the proposed SGS scheme, which has the best online performance and is designed for the bursty server authentication scenario, generated 469,109 signatures per second, and the Schnorr scheme (the next best scheme in terms of online performance) generated only 223,548 signatures. The experimental results demonstrate that the SGE and SGS schemes are the most efficient techniques for embedded device authentication and bursty server authentication, respectively.
In response to the recent intensive needs for civilian security surveillance, both full and compact versions of a Multimedia Security Surveillance (MSS) system have been built up. The new Microsoft DirectShow technology was applied in implementing the multimedia stream-processing module. Through Microsoft Windows Driver Model interface, the chosen IEEE1394 enabled Fire-i cameras as external sensors are integrated with PC based continuous storage unit. The MSS application also allows multimedia broadcasting and remote controls. Cost analysis is included.
Data is everywhere. The current Technological advancements in Digital, Social media and the ease at which the availability of different application services to interact with variety of systems are causing to generate tremendous volumes of data. Due to such varied services, Data format is now not restricted to only structure type like text but can generate unstructured content like social media data, videos and images etc. The generated Data is of no use unless been stored and analyzed to derive some Value. Traditional Database systems comes with limitations on the type of data format schema, access rates and storage sizes etc. Hadoop is an Apache open source distributed framework that support storing huge datasets of different formatted data reliably on its file system named Hadoop File System (HDFS) and to process the data stored on HDFS using MapReduce programming model. This thesis study is about building a Data Architecture using Hadoop and its related open source distributed frameworks to support a Data flow pipeline on a low commodity hardware. The Data flow components are, sourcing data, storage management on HDFS and data access layer. This study also discuss about a use case to utilize the architecture components. Sqoop, a framework to ingest the structured data from database onto Hadoop and Flume is used to ingest the semi-structured Twitter streaming json data on to HDFS for analysis. The data sourced using Sqoop and Flume have been analyzed using Hive for SQL like analytics and at a higher level of data access layer, Hadoop has been compared with an in memory computing system using Spark. Significant differences in query execution performances have been analyzed when working with Hadoop and Spark frameworks. This integration helps for ingesting huge Volumes of streaming json Variety data to derive better Value based analytics using Hive and ...
New peripheral devices are being developed at an ever increasing rate. Before such accessories can be used in the UNIX environment (UNIX is a trademark of Bell Laboratories), they must be able to communicate with the operating system. This involves writing a device driver for each device. In order to do this, very detailed knowledge is required of both the device to be integrated and the version of UNIX to which it will be attached. The process is long, detailed and prone to subtle problems and errors. This paper presents a menu-driven utility designed to simplify and accelerate the design and implementation of UNIX device drivers by freeing developers from many of the implementation specific low-level details.
Distributed simulation is an enabling concept to support the networked interaction of models and real world elements that are geographically distributed. This technology has brought a new set of challenging problems to solve, such as Data Distribution Management (DDM). The aim of DDM is to limit and control the volume of the data exchanged during a distributed simulation, and reduce the processing requirements of the simulation hosts by relaying events and state information only to those applications that require them. In this thesis, we propose a new DDM scheme, which we refer to as dynamic grid-based DDM. A lightweight UNT-RTI has been developed and implemented to investigate the performance of our DDM scheme. Our results clearly indicate that our scheme is scalable and it significantly reduces both the number of multicast groups used, and the message overhead, when compared to previous grid-based allocation schemes using large-scale and real-world scenarios.
Resources are said to be fragmented in the network when they are available in non-contiguous blocks, and calls are dropped as they may not end sufficient resources. Hence, available resources may remain unutilized. In this thesis, the effect of resource fragmentation (RF) on RSVP-controlled networks was studied and new algorithms were proposed to reduce the effect of RF. In order to minimize the effect of RF, resources in the network are dynamically redistributed on different paths to make them available in contiguous blocks. Extra protocol messages are introduced to facilitate resource redistribution in the network. The Dynamic Resource Redistribution (DRR) algorithm when used in conjunction with RSVP, not only increased the number of calls accommodated into the network but also increased the overall resource utilization of the network. Issues such as how many resources need to be redistributed and of which call(s), and how these choices affect the redistribution process were investigated. Further, various simulation experiments were conducted to study the performance of the DRR algorithm on different network topologies with varying traffic characteristics.
The fusion of computers and communications has promised to herald the age of information super-highway over high speed communication networks where the ultimate goal is to enable a multitude of users at any place, access information from anywhere and at any time. This, in a nutshell, is the goal envisioned by the Personal Communication Services (PCS) and Xerox's ubiquitous computing. In view of the remarkable growth of the mobile communication users in the last few years, the radio frequency spectrum allocated by the FCC (Federal Communications Commission) to this service is still very limited and the usable bandwidth is by far much less than the expected demand, particularly in view of the emergence of the next generation wireless multimedia applications like video-on-demand, WWW browsing, traveler information systems etc. Proper management of available spectrum is necessary not only to accommodate these high bandwidth applications, but also to alleviate problems due to sudden explosion of traffic in so called hot cells. In this dissertation, we have developed simple load balancing techniques to cope with the problem of tele-traffic overloads in one or more hot cells in the system. The objective is to ease out the high channel demand in hot cells by borrowing channels from suitable cold cells and by proper assignment (or, re-assignment) of the channels among the users. We also investigate possible ways of improving system capacity by rescheduling bandwidth in case of wireless multimedia traffic. In our proposed scheme, traffic using multiple channels releases one or more channels to increase the carried traffic or throughput in the system. Two orthogonal QoS parameters, called carried traffic and bandwidth degradation, are identified and a cost function describing the total revenue earned by the system from a bandwidth degradation and call admission policy, is formulated. A channel sharing scheme is proposed for ...
In this thesis, the gate matrix layout problem in VLSI design is considered where the goal is to minimize the number of tracks required to layout a given circuit and a taxonomy of approaches to its solution is presented. An efficient hybrid heuristic is also proposed for this combinatorial optimization problem, which is based on the combination of probabilistic hill-climbing technique and greedy method. This heuristic is tested experimentally with respect to four existing algorithms. As test cases, five benchmark problems from the literature as well as randomly generated problem instances are considered. The experimental results show that the proposed hybrid algorithm, on the average, performs better than other heuristics in terms of the required computation time and/or the quality of solution. Due to the computation-intensive nature of the problem, an exact solution within reasonable time limits is impossible. So, it is difficult to judge the effectiveness of any heuristic in terms of the quality of solution (number of tracks required). A probabilistic model of the gate matrix layout problem that computes the expected number of tracks from the given input parameters, is useful to this respect. Such a probabilistic model is proposed in this thesis, and its performance is experimentally evaluated.
The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching.
Extracting information from a stack of data is a tedious task and the scenario is no different in proteomics. Volumes of research papers are published about study of various proteins in several species, their interactions with other proteins and identification of protein(s) as possible biomarker in causing diseases. It is a challenging task for biologists to keep track of these developments manually by reading through the literatures. Several tools have been developed by computer linguists to assist identification, extraction and hypotheses generation of proteins and protein-protein interactions from biomedical publications and protein databases. However, they are confronted with the challenges of term variation, term ambiguity, access only to abstracts and inconsistencies in time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to attenuate the challenges by extracting protein-protein interactions in humans and elicit possible interactions using associative rule mining on full text, abstracts and captions from figures available from publicly available biomedical literature databases. Two such databases are used in our study: Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built using articles based on search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) is cross-referenced to validate discovered interactive pairs. A set of an optimal size of possible binary protein-protein interactions is generated to be made available for clinician or biological validation. A significant change in the number of new associations was found by altering the thresholds for support and confidence metrics. This study narrows down the limitations for biologists in keeping pace with discovery of protein-protein interactions via manually reading the literature and their needs to validate each and every possible interaction.
There are two main approaches for intrusion detection: signature-based and anomaly-based. Signature-based detection employs pattern matching to match attack signatures with observed data making it ideal for detecting known attacks. However, it cannot detect unknown attacks for which there is no signature available. Anomaly-based detection builds a profile of normal system behavior to detect known and unknown attacks as behavioral deviations. However, it has a drawback of a high false alarm rate. In this thesis, we describe our anomaly-based IDS designed for detecting intrusions in cryptographic and application-level protocols. Our system has several unique characteristics, such as the ability to monitor cryptographic protocols and application-level protocols embedded in encrypted sessions, a very lightweight monitoring process, and the ability to react to protocol misuse by modifying protocol response directly.
This thesis describes experiments designed to measure the effect of collaborative communication on task performance of a multiagent system. A discrete event simulation was developed to model a multi-agent system completing a task to find and collect food resources, with the ability to substitute various communication and coordination methods. Experiments were conducted to find the effects of the various communication methods on completion of the task to find and harvest the food resources. Results show that communication decreases the time required to complete the task. However, all communication methods do not fare equally well. In particular, results indicate that the communication model of the bee is a particularly effective method of agent communication and collaboration. Furthermore, results indicate that direct communication with additional information content provides better completion results. Cost-benefit models show some conflicting information, indicating that the increased performance may not offset the additional cost of achieving that performance.
Students often use the web as a source of help for problems that they encounter on programming assignments.In this work, we seek to understand how students use the web to search for help on their assignments.We used a mixed methods approach with 344 students who complete a survey and 41 students who participate in a focus group meetings and helped in recording data about their search habits.The survey reveals data about student reported search habits while the focus group uses a web browser plug-in to record actual search patterns.We examine the results collectively and as broken down by class year.Survey results show that at least 2/3 of the students from each class year rely on search engines to locate resources for help with their programming bugs in at least half of their assignments;search habits vary by class year;and the value of different types of resources such as tutorials and forums varies by class year.Focus group results exposes the high frequency web sites used by the students in solving their programming assignments.
This dialog allows you to filter your current search.
Each of the Years listed note their name and the number of records that will be limited down to if you choose that option.
The list can be sorted by name or the count.
Having trouble finding an option within the list of Years? Start typing and we'll update the list to show only those items that match your needs.
This dialog allows you to filter your current search.
Each of the Department listed note their name and the number of records that will be limited down to if you choose that option.
The list can be sorted by name or the count.