Search Results

A Language and Visual Interface to Specify Complex Spatial Pattern Mining
The emerging interests in spatial pattern mining leads to the demand for a flexible spatial pattern mining language, on which easy to use and understand visual pattern language could be built. It is worthwhile to define a pattern mining language called LCSPM to allow users to specify complex spatial patterns. I describe a proposed pattern mining language in this paper. A visual interface which allows users to specify the patterns visually is developed. Visual pattern queries are translated into the LCSPM language by a parser and data mining process can be triggered afterwards. The visual language is based on and goes beyond the visual language proposed in literature. I implemented a prototype system based on the open source JUMP framework.
Deep Learning Approaches to Radio Map Estimation
Radio map estimation (RME) is the task of predicting radio power at all locations in a two-dimensional area and at all frequencies in a given band. This thesis explores four deep learning approaches to RME: dual path autoencoders, skip connection autoencoders, diffusion, and joint learning with transmitter localization.
Distributed Frameworks Towards Building an Open Data Architecture
Data is everywhere. The current Technological advancements in Digital, Social media and the ease at which the availability of different application services to interact with variety of systems are causing to generate tremendous volumes of data. Due to such varied services, Data format is now not restricted to only structure type like text but can generate unstructured content like social media data, videos and images etc. The generated Data is of no use unless been stored and analyzed to derive some Value. Traditional Database systems comes with limitations on the type of data format schema, access rates and storage sizes etc. Hadoop is an Apache open source distributed framework that support storing huge datasets of different formatted data reliably on its file system named Hadoop File System (HDFS) and to process the data stored on HDFS using MapReduce programming model. This thesis study is about building a Data Architecture using Hadoop and its related open source distributed frameworks to support a Data flow pipeline on a low commodity hardware. The Data flow components are, sourcing data, storage management on HDFS and data access layer. This study also discuss about a use case to utilize the architecture components. Sqoop, a framework to ingest the structured data from database onto Hadoop and Flume is used to ingest the semi-structured Twitter streaming json data on to HDFS for analysis. The data sourced using Sqoop and Flume have been analyzed using Hive for SQL like analytics and at a higher level of data access layer, Hadoop has been compared with an in memory computing system using Spark. Significant differences in query execution performances have been analyzed when working with Hadoop and Spark frameworks. This integration helps for ingesting huge Volumes of streaming json Variety data to derive better Value based analytics using Hive and …
Survey of Approximation Algorithms for Set Cover Problem
In this thesis, I survey 11 approximation algorithms for unweighted set cover problem. I have also implemented the three algorithms and created a software library that stores the code I have written. The algorithms I survey are: 1. Johnson's standard greedy; 2. f-frequency greedy; 3. Goldsmidt, Hochbaum and Yu's modified greedy; 4. Halldorsson's local optimization; 5. Dur and Furer semi local optimization; 6. Asaf Levin's improvement to Dur and Furer; 7. Simple rounding; 8. Randomized rounding; 9. LP duality; 10. Primal-dual schema; and 11. Network flow technique. Most of the algorithms surveyed are refinements of standard greedy algorithm.
A Smooth-turn Mobility Model for Airborne Networks
In this article, I introduce a novel airborne network mobility model, called the Smooth Turn Mobility Model, that captures the correlation of acceleration for airborne vehicles across time and spatial coordinates. E?ective routing in airborne networks (ANs) relies on suitable mobility models that capture the random movement pattern of airborne vehicles. As airborne vehicles cannot make sharp turns as easily as ground vehicles do, the widely used mobility models for Mobile Ad Hoc Networks such as Random Waypoint and Random Direction models fail. Our model is realistic in capturing the tendency of airborne vehicles toward making straight trajectory and smooth turns with large radius, and whereas is simple enough for tractable connectivity analysis and routing design.
FP-tree Based Spatial Co-location Pattern Mining
A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, the paradigm of frequent pattern mining has undergone a shift from candidate generation-and-test based approaches to projection based approaches. Co-location patterns resemble frequent patterns in many aspects. However, the lack of transaction concept, which is crucial in frequent pattern mining, makes the similar shift of paradigm in co-location pattern mining very difficult. This thesis investigates a projection based co-location pattern mining paradigm. In particular, a FP-tree based co-location mining framework and an algorithm called FP-CM, for FP-tree based co-location miner, are proposed. It is proved that FP-CM is complete, correct, and only requires a small constant number of database scans. The experimental results show that FP-CM outperforms candidate generation-and-test based co-location miner by an order of magnitude.
Adaptive Planning and Prediction in Agent-Supported Distributed Collaboration.
Agents that act as user assistants will become invaluable as the number of information sources continue to proliferate. Such agents can support the work of users by learning to automate time-consuming tasks and filter information to manageable levels. Although considerable advances have been made in this area, it remains a fertile area for further development. One application of agents under careful scrutiny is the automated negotiation of conflicts between different user's needs and desires. Many techniques require explicit user models in order to function. This dissertation explores a technique for dynamically constructing user models and the impact of using them to anticipate the need for negotiation. Negotiation is reduced by including an advising aspect to the agent that can use this anticipation of conflict to adjust user behavior.
A Netcentric Scientific Research Repository
The Internet and networks in general have become essential tools for disseminating in-formation. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric sci-entific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Fur-thermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis op-erations into a repository. This technique allows the system to leverage the large num-ber of existing tools by wrapping them with a Web service and registering the service with the repository. Metadata associated with Web services was enhanced to allow this feature to be included. In addition, an improved binary to text encoding scheme was de-veloped to reduce the size overhead for sending large scientific data files via XML mes-sages used in Web services. (3) Integrated image analysis operations with SQL. This technique allows for images to be stored and managed conveniently in a relational da-tabase. SQL supplemented with map algebra operations is used to select and perform operations on sets of images.
Natural Language Interfaces to Databases
Natural language interfaces to databases (NLIDB) are systems that aim to bridge the gap between the languages used by humans and computers, and automatically translate natural language sentences to database queries. This thesis proposes a novel approach to NLIDB, using graph-based models. The system starts by collecting as much information as possible from existing databases and sentences, and transforms this information into a knowledge base for the system. Given a new question, the system will use this knowledge to analyze and translate the sentence into its corresponding database query statement. The graph-based NLIDB system uses English as the natural language, a relational database model, and SQL as the formal query language. In experiments performed with natural language questions ran against a large database containing information about U.S. geography, the system showed good performance compared to the state-of-the-art in the field.
Mediation on XQuery Views
The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and have developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm to decompose an outsourced query into atomic queries which are each executed on a single data source. I also developed an algorithm to generate integrating queries, which the system uses to compose the answers from the atomic queries into a single answer to the original user query. I present a proof of both the extended XML document projection algorithm and the query integration algorithm. An analysis of the efficiency of the new extended algorithm is also presented. Finally I describe a collaborative schema-matching tool that was implemented to facilitate maintaining metadata.
Extracting Temporally-Anchored Knowledge from Tweets
Twitter has quickly become one of the most popular social media sites. It has 313 million monthly active users, and 500 million tweets are published daily. With the massive number of tweets, Twitter users share information about a location along with the temporal awareness. In this work, I focus on tweets where author of the tweets exclusively mentions a location in the tweet. Natural language processing systems can leverage wide range of information from the tweets to build applications like recommender systems that predict the location of the author. This kind of system can be used to increase the visibility of the targeted audience and can also provide recommendations interesting places to visit, hotels to stay, restaurants to eat, targeted on-line advertising, and co-traveler matching based on the temporal information extracted from a tweet. In this work I determine if the author of the tweet is present in the mentioned location of the tweet. I also determine if the author is present in the location before tweeting, while tweeting, or after tweeting. I introduce 5 temporal tags (before the tweet but > 24 hours; before the tweet but < 24 hours; during the tweet is posted; after the tweet is posted but < 24 hours; and after the tweet is posted but > 24 hours). The major contributions of this paper are: (1) creation of a corpus of 1062 tweets containing 1200 location named entities, containing annotations whether author of a tweet is or is not located in the location he tweets about with respect to 5 temporal tags; (2) detailed corpus analysis including real annotation examples and label distributions per temporal tag; (3) detailed inter-annotator agreements, including Cohen's kappa, Krippendorff's alpha and confusion matrices per temporal tag; (4) label distributions and analysis; and (5) supervised learning experiments, along with …
Developing a Wildlife Tracking Extension for ArcGIS
Wildlife tracking is an essential task to gain better understanding of the migration pattern and use of space of the wildlife. Advances in computer technology and global positioning systems (GPS) have lowered costs, reduced processing time, and improved accuracy for tracking wild animals. In this thesis, a wildlife tracking extension is developed for ArcGIS 9.x, which allows biologists and ecologists to effectively track, visualize and analyze the movement patterns of wild animals. The extension has four major components: (1) data import; (2) tracking; (3) spatial and temporal analysis; and (4) data export. Compared with existing software tools for wildlife tracking, the major features of the extension include: (1) wildlife tracking capabilities using a dynamic data layer supported by a file geodatabase with 1 TB storage limit; (2) spatial clustering of wildlife locations; (3) lacunarity analysis of one-dimensional individual animal trajectories and two-dimensional animal locations for better understanding of animal movement patterns; and (4) herds evolvement modeling and graphic representation. The application of the extension is demonstrated using simulated data, test data collected by a GPS collar, and a real dataset collected by ARGOS satellite telemetry for albatrosses in the Pacific Ocean.
A Wireless Traffic Surveillance System Using Video Analytics
Video surveillance systems have been commonly used in transportation systems to support traffic monitoring, speed estimation, and incident detection. However, there are several challenges in developing and deploying such systems, including high development and maintenance costs, bandwidth bottleneck for long range link, and lack of advanced analytics. In this thesis, I leverage current wireless, video camera, and analytics technologies, and present a wireless traffic monitoring system. I first present an overview of the system. Then I describe the site investigation and several test links with different hardware/software configurations to demonstrate the effectiveness of the system. The system development process was documented to provide guidelines for future development. Furthermore, I propose a novel speed-estimation analytics algorithm that takes into consideration roads with slope angles. I prove the correctness of the algorithm theoretically, and validate the effectiveness of the algorithm experimentally. The experimental results on both synthetic and real dataset show that the algorithm is more accurate than the baseline algorithm 80% of the time. On average the accuracy improvement of speed estimation is over 3.7% even for very small slope angles.
Determining Whether and When People Participate in the Events They Tweet About
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past, present and events that may occur in future. We define event participant as people directly involved in an event regardless of whether they are the agent, recipient or play another role. We present an annotation effort, guidelines and quality analysis with 1,096 event mentions. We discuss the label distributions and event behavior in the annotated corpus. We also explain several features used and a standard supervised machine learning approach to automatically determine if and when the author is a participant of the event in the tweet. We discuss trends in the results obtained and devise important conclusions.
Location Estimation and Geo-Correlated Information Trends
A tremendous amount of information is being shared every day on social media sites such as Facebook, Twitter or Google+. However, only a small portion of users provide their location information, which can be helpful in targeted advertising and many other services. Current methods in location estimation using social relationships consider social friendship as a simple binary relationship. However, social closeness between users and structure of friends have strong implications on geographic distances. In the first task, we introduce new measures to evaluate the social closeness between users and structure of friends. Then we propose models that use them for location estimation. Compared with the models which take the friend relation as a binary feature, social closeness can help identify which friend of a user is more important and friend structure can help to determine significance level of locations, thus improving the accuracy of the location estimation models. A confidence iteration method is further introduced to improve estimation accuracy and overcome the problem of scarce location information. We evaluate our methods on two different datasets, Twitter and Gowalla. The results show that our model can improve the estimation accuracy by 5% - 20% compared with state-of-the-art friend-based models. In the second task, we also propose a Local Event Discovery and Summarization (LEDS) framework to detect local events from Twitter. Many existing algorithms for event detection focus on larger-scale events and are not sensitive to smaller-scale local events. Most of the local events detected by these methods are major events like important sports, shows, or big natural disasters. In this work, we propose the LEDS framework to detect both bigger and smaller events. LEDS contains three key steps: 1) Detecting possible event related terms by monitoring abnormal distribution in different locations and times; 2) Clustering tweets based on their key terms, …
Automated Real-time Objects Detection in Colonoscopy Videos for Quality Measurements
The effectiveness of colonoscopy depends on the quality of the inspection of the colon. There was no automated measurement method to evaluate the quality of the inspection. This thesis addresses this issue by investigating an automated post-procedure quality measurement technique and proposing a novel approach automatically deciding a percentage of stool areas in images of digitized colonoscopy video files. It involves the classification of image pixels based on their color features using a new method of planes on RGB (red, green and blue) color space. The limitation of post-procedure quality measurement is that quality measurements are available long after the procedure was done and the patient was released. A better approach is to inform any sub-optimal inspection immediately so that the endoscopist can improve the quality in real-time during the procedure. This thesis also proposes an extension to post-procedure method to detect stool, bite-block, and blood regions in real-time using color features in HSV color space. These three objects play a major role in quality measurements in colonoscopy. The proposed method partitions very large positive examples of each of these objects into a number of groups. These groups are formed by taking intersection of positive examples with a hyper plane. This hyper plane is named as 'positive plane'. 'Convex hulls' are used to model positive planes. Comparisons with traditional classifiers such as K-nearest neighbor (K-NN) and support vector machines (SVM) proves the soundness of the proposed method in terms of accuracy and speed that are critical in the targeted real-time quality measurement system.
Detection of Temporal Events and Abnormal Images for Quality Analysis in Endoscopy Videos
Recent reports suggest that measuring the objective quality is very essential towards the success of colonoscopy. Several quality indicators (i.e. metrics) proposed in recent studies are implemented in software systems that compute real-time quality scores for routine screening colonoscopy. Most quality metrics are derived based on various temporal events occurred during the colonoscopy procedure. The location of the phase boundary between the insertion and the withdrawal phases and the amount of circumferential inspection are two such important temporal events. These two temporal events can be determined by analyzing various camera motions of the colonoscope. This dissertation put forward a novel method to estimate X, Y and Z directional motions of the colonoscope using motion vector templates. Since abnormalities of a WCE or a colonoscopy video can be found in a small number of frames (around 5% out of total frames), it is very helpful if a computer system can decide whether a frame has any mucosal abnormalities. Also, the number of detected abnormal lesions during a procedure is used as a quality indicator. Majority of the existing abnormal detection methods focus on detecting only one type of abnormality or the overall accuracies are somewhat low if the method tries to detect multiple abnormalities. Most abnormalities in endoscopy images have unique textures which are clearly distinguishable from normal textures. In this dissertation a new method is proposed that achieves the objective of detecting multiple abnormalities with a higher accuracy using a multi-texture analysis technique. The multi-texture analysis method is designed by representing WCE and colonoscopy image textures as textons.
Enhancing Storage Dependability and Computing Energy Efficiency for Large-Scale High Performance Computing Systems
With the advent of information explosion age, larger capacity disk drives are used to store data and powerful devices are used to process big data. As the scale and complexity of computer systems increase, we expect these systems to provide dependable and energy-efficient services and computation. Although hard drives are reliable in general, they are the most commonly replaced hardware components. Disk failures cause data corruption and even data loss, which can significantly affect system performance and financial losses. In this dissertation research, I analyze different manifestations of disk failures in production data centers and explore data mining techniques combined with statistical analysis methods to discover categories of disk failures and their distinctive properties. I use similarity measures to quantify the degradation process of each failure type and derive the degradation signature. The derived degradation signatures are further leveraged to forecast when future disk failures may happen. Meanwhile, this dissertation also studies energy efficiency of high performance computers. Specifically, I characterize the power and energy consumption of Haswell processors which are used in multiple supercomputers, and analyze the power and energy consumption of Legion, a data-centric programming model and runtime system, and Legion applications. We find that power and energy efficiency can be improved significantly by optimizing the settings and runtime scheduling of processors, and Legion runtime performs well for larger-scale computation in terms of power and energy consumption.
Hybrid Approaches in Test Suite Prioritization
The rapid advancement of web and mobile application technologies has recently posed numerous challenges to the Software Engineering community, including how to cost-effectively test applications that have complex event spaces. Many software testing techniques attempt to cost-effectively improve the quality of such software. This dissertation primarily focuses on that of hybrid test suite prioritization. The techniques utilize two or more criteria to perform test suite prioritization as it is often insufficient to use only a single criterion. The dissertation consists of the following contributions: (1) a weighted test suite prioritization technique that employs the distance between criteria as a weighting factor, (2) a coarse-to-fine grained test suite prioritization technique that uses a multilevel approach to increase the granularity of the criteria at each subsequent iteration, (3) the Caret-HM tool for Android user session-based testing that allows testers to record, replay, and create heat maps from user interactions with Android applications via a web browser, and (4) Android user session-based test suite prioritization techniques that utilize heuristics developed from user sessions created by Caret-HM. Each of the chapters empirically evaluate the respective techniques. The proposed techniques generally show improved or equally good performance when compared to the baselines, depending on an application under test. Further, this dissertation provides guidance to testers as it relates to the use of the proposed hybrid techniques.
Classifying Pairwise Object Interactions: A Trajectory Analytics Approach
We have a huge amount of video data from extensively available surveillance cameras and increasingly growing technology to record the motion of a moving object in the form of trajectory data. With proliferation of location-enabled devices and ongoing growth in smartphone penetration as well as advancements in exploiting image processing techniques, tracking moving objects is more flawlessly achievable. In this work, we explore some domain-independent qualitative and quantitative features in raw trajectory (spatio-temporal) data in videos captured by a fixed single wide-angle view camera sensor in outdoor areas. We study the efficacy of those features in classifying four basic high level actions by employing two supervised learning algorithms and show how each of the features affect the learning algorithms’ overall accuracy as a single factor or confounded with others.
The Design Of A Benchmark For Geo-stream Management Systems
The recent growth in sensor technology allows easier information gathering in real-time as sensors have grown smaller, more accurate, and less expensive. The resulting data is often in a geo-stream format continuously changing input with a spatial extent. Researchers developing geo-streaming management systems (GSMS) require a benchmark system for evaluation, which is currently lacking. This thesis presents GSMark, a benchmark for evaluating GSMSs. GSMark provides a data generator that creates a combination of synthetic and real geo-streaming data, a workload simulator to present the data to the GSMS as a data stream, and a set of benchmark queries that evaluate typical GSMS functionality and query performance. In particular, GSMark generates both moving points and evolving spatial regions, two fundamental data types for a broad range of geo-stream applications, and the geo-streaming queries on this data.
Multi-perspective, Multi-modal Image Registration and Fusion
Multi-modal image fusion is an active research area with many civilian and military applications. Fusion is defined as strategic combination of information collected by various sensors from different locations or different types in order to obtain a better understanding of an observed scene or situation. Fusion of multi-modal images cannot be completed unless these two modalities are spatially aligned. In this research, I consider two important problems. Multi-modal, multi-perspective image registration and decision level fusion of multi-modal images. In particular, LiDAR and visual imagery. Multi-modal image registration is a difficult task due to the different semantic interpretation of features extracted from each modality. This problem is decoupled into three sub-problems. The first step is identification and extraction of common features. The second step is the determination of corresponding points. The third step consists of determining the registration transformation parameters. Traditional registration methods use low level features such as lines and corners. Using these features require an extensive optimization search in order to determine the corresponding points. Many methods use global positioning systems (GPS), and a calibrated camera in order to obtain an initial estimate of the camera parameters. The advantages of our work over the previous works are the following. First, I used high level-features, which significantly reduce the search space for the optimization process. Second, the determination of corresponding points is modeled as an assignment problem between a small numbers of objects. On the other side, fusing LiDAR and visual images is beneficial, due to the different and rich characteristics of both modalities. LiDAR data contain 3D information, while images contain visual information. Developing a fusion technique that uses the characteristics of both modalities is very important. I establish a decision-level fusion technique using manifold models.
Event Sequence Identification and Deep Learning Classification for Anomaly Detection and Predication on High-Performance Computing Systems
High-performance computing (HPC) systems continue growing in both scale and complexity. These large-scale, heterogeneous systems generate tens of millions of log messages every day. Effective log analysis for understanding system behaviors and identifying system anomalies and failures is highly challenging. Existing log analysis approaches use line-by-line message processing. They are not effective for discovering subtle behavior patterns and their transitions, and thus may overlook some critical anomalies. In this dissertation research, I propose a system log event block detection (SLEBD) method which can extract the log messages that belong to a component or system event into an event block (EB) accurately and automatically. At the event level, we can discover new event patterns, the evolution of system behavior, and the interaction among different system components. To find critical event sequences, existing sequence mining methods are mostly based on the a priori algorithm which is compute-intensive and runs for a long time. I develop a novel, topology-aware sequence mining (TSM) algorithm which is efficient to generate sequence patterns from the extracted event block lists. I also train a long short-term memory (LSTM) model to cluster sequences before specific events. With the generated sequence pattern and trained LSTM model, we can predict whether an event is going to occur normally or not. To accelerate such predictions, I propose a design flow by which we can convert recurrent neural network (RNN) designs into register-transfer level (RTL) implementations which are deployed on FPGAs. Due to its high parallelism and low power, FPGA achieves a greater speedup and better energy efficiency compared to CPU and GPU according to our experimental results.
Integrating Multiple Deep Learning Models to Classify Disaster Scene Videos
Recently, disaster scene description and indexing challenges attract the attention of researchers. In this dissertation, we solve a disaster-related multi-labeling task using a newly developed Low Altitude Disaster Imagery dataset. In the first task, we realize video content by selecting a set of summary key frames to represent the video sequence. Through inter-frame differences, the key frames are generated. The key frame extraction of disaster-related video clips is a powerful tool that can efficiently convert video data into image-level data, reduce the requirements for the extraction environment and improve the applicable environment. In the second, we propose a novel application of using deep learning methods on low altitude disaster video feature recognition. Supervised learning-based deep-learning approaches are effective in disaster-related features recognition via foreground object detection and background classification. Performed dataset validation, our model generalized well and improved performance by optimizing the YOLOv3 model and combining it with Resnet50. The comprehensive models showed more efficient and effective than those in prior published works. In the third task, we optimize the whole scene labeling classification by pruning the lightweight model MobileNetV3, which shows superior generalizability and can disaster features recognition from a disaster-related dataset be accomplished efficiently to assist disaster recovery.
A New Wireless Sensor Node Design for Program Isolation and Power Flexibility
Over-the-air programming systems for wireless sensor networks have drawbacks that stem from fundamental limitations in the hardware used in current sensor nodes. Also, advances in technology make it feasible to use capacitors as the sole energy storage mechanism for sensor nodes using energy harvesting, but most current designs require additional electronics. These two considerations led to the design of a new sensor node. A microcontroller was chosen that meets the Popek and Goldberg virtualization requirements. The hardware design for this new sensor node is presented, as well as a preliminary operating system. The prototypes are tested, and demonstrated to be sustainable with a capacitor and solar panel. The issue of capacitor leakage is considered and measured.
Toward Supporting Fine-Grained, Structured, Meaningful and Engaging Feedback in Educational Applications
Recent advancements in machine learning have started to put their mark on educational technology. Technology is evolving fast and, as people adopt it, schools and universities must also keep up (nearly 70% of primary and secondary schools in the UK are now using tablets for various purposes). As these numbers are likely going to follow the same increasing trend, it is imperative for schools to adapt and benefit from the advantages offered by technology: real-time processing of data, availability of different resources through connectivity, efficiency, and many others. To this end, this work contributes to the growth of educational technology by developing several algorithms and models that are meant to ease several tasks for the instructors, engage students in deep discussions and ultimately, increase their learning gains. First, a novel, fine-grained knowledge representation is introduced that splits phrases into their constituent propositions that are both meaningful and minimal. An automated extraction algorithm of the propositions is also introduced. Compared with other fine-grained representations, the extraction model does not require any human labor after it is trained, while the results show considerable improvement over two meaningful baselines. Second, a proposition alignment model is created that relies on even finer-grained units of text while also outperforming several alternative systems. Third, a detailed machine learning based analysis of students' unrestricted natural language responses to questions asked in classrooms is made by leveraging the proposition extraction algorithm to make computational predictions of textual assessment. Two computational approaches are introduced that use and compare manually engineered machine learning features with word embeddings input into a two-hidden layers neural network. Both methods achieve notable improvements over two alternative approaches, a recent short answer grading system and DiSAN – a recent, pre-trained, light-weight neural network that obtained state-of-the-art performance on multiple NLP tasks and corpora. Fourth, a …
Multi-Source Large Scale Bike Demand Prediction
Current works of bike demand prediction mainly focus on cluster level and perform poorly on predicting demands of a single station. In the first task, we introduce a contextual based bike demand prediction model, which predicts bike demands for per station by combining spatio-temporal network and environment contexts synergistically. Furthermore, since people's movement information is an important factor, which influences the bike demands of each station. To have a better understanding of people's movements, we need to analyze the relationship between different places. In the second task, we propose an origin-destination model to learn place representations by using large scale movement data. Then based on the people's movement information, we incorporate the place embedding into our bike demand prediction model, which is built by using multi-source large scale datasets: New York Citi bike data, New York taxi trip records, and New York POI data. Finally, as deep learning methods have been successfully applied to many fields such as image recognition and natural language processing, it inspires us to incorporate the complex deep learning method into the bike demand prediction problem. So in this task, we propose a deep spatial-temporal (DST) model, which contains three major components: spatial dependencies, temporal dependencies, and external influence. Experiments on the NYC Citi Bike system show the effectiveness and efficiency of our model when compared with the state-of-the-art methods.
GPS CaPPture: a System for GPS Trajectory Collection, Processing, and Destination Prediction
In the United States, smartphone ownership surpassed 69.5 million in February 2011 with a large portion of those users (20%) downloading applications (apps) that enhance the usability of a device by adding additional functionality. a large percentage of apps are written specifically to utilize the geographical position of a mobile device. One of the prime factors in developing location prediction models is the use of historical data to train such a model. with larger sets of training data, prediction algorithms become more accurate; however, the use of historical data can quickly become a downfall if the GPS stream is not collected or processed correctly. Inaccurate or incomplete or even improperly interpreted historical data can lead to the inability to develop accurately performing prediction algorithms. As GPS chipsets become the standard in the ever increasing number of mobile devices, the opportunity for the collection of GPS data increases remarkably. the goal of this study is to build a comprehensive system that addresses the following challenges: (1) collection of GPS data streams in a manner such that the data is highly usable and has a reduction in errors; (2) processing and reduction of the collected data in order to prepare it and make it highly usable for the creation of prediction algorithms; (3) creation of prediction/labeling algorithms at such a level that they are viable for commercial use. This study identifies the key research problems toward building the CaPPture (collection, processing, prediction) system.
Toward a Data-Type-Based Real Time Geospatial Data Stream Management System
The advent of sensory and communication technologies enables the generation and consumption of large volumes of streaming data. Many of these data streams are geo-referenced. Existing spatio-temporal databases and data stream management systems are not capable of handling real time queries on spatial extents. In this thesis, we investigated several fundamental research issues toward building a data-type-based real time geospatial data stream management system. The thesis makes contributions in the following areas: geo-stream data models, aggregation, window-based nearest neighbor operators, and query optimization strategies. The proposed geo-stream data model is based on second-order logic and multi-typed algebra. Both abstract and discrete data models are proposed and exemplified. I further propose two useful geo-stream operators, namely Region By and WNN, which abstract common aggregation and nearest neighbor queries as generalized data model constructs. Finally, I propose three query optimization algorithms based on spatial, temporal, and spatio-temporal constraints of geo-streams. I show the effectiveness of the data model through many query examples. The effectiveness and the efficiency of the algorithms are validated through extensive experiments on both synthetic and real data sets. This work established the fundamental building blocks toward a full-fledged geo-stream database management system and has potential impact in many applications such as hazard weather alerting and monitoring, traffic analysis, and environmental modeling.
Autonomic Failure Identification and Diagnosis for Building Dependable Cloud Computing Systems
The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage with the appearance of unlimited resources. Users are given access to a variety of data and software utilities to manage their work. Users rent virtual resources and pay for only what they use. In spite of the many benefits that cloud computing promises, the lack of dependability in shared virtualized infrastructures is a major obstacle for its wider adoption, especially for mission-critical applications. Virtualization and multi-tenancy increase system complexity and dynamicity. They introduce new sources of failure degrading the dependability of cloud computing systems. To assure cloud dependability, in my dissertation research, I develop autonomic failure identification and diagnosis techniques that are crucial for understanding emergent, cloud-wide phenomena and self-managing resource burdens for cloud availability and productivity enhancement. We study the runtime cloud performance data collected from a cloud test-bed and by using traces from production cloud systems. We define cloud signatures including those metrics that are most relevant to failure instances. We exploit profiled cloud performance data in both time and frequency domain to identify anomalous cloud behaviors and leverage cloud metric subspace analysis to automate the diagnosis of observed failures. We implement a prototype of the anomaly identification system and conduct the experiments in an on-campus cloud computing test-bed and by using the Google datacenter traces. Our experimental results show that our proposed anomaly detection mechanism can achieve 93% detection sensitivity while keeping the false positive rate as low as 6.1% and outperform other tested anomaly detection schemes. In addition, the anomaly detector adapts itself by recursively learning from these newly verified detection results to refine future detection.
Detection of Ulcerative Colitis Severity and Enhancement of Informative Frame Filtering Using Texture Analysis in Colonoscopy Videos
There are several types of disorders that affect our colon’s ability to function properly such as colorectal cancer, ulcerative colitis, diverticulitis, irritable bowel syndrome and colonic polyps. Automatic detection of these diseases would inform the endoscopist of possible sub-optimal inspection during the colonoscopy procedure as well as save time during post-procedure evaluation. But existing systems only detects few of those disorders like colonic polyps. In this dissertation, we address the automatic detection of another important disorder called ulcerative colitis. We propose a novel texture feature extraction technique to detect the severity of ulcerative colitis in block, image, and video levels. We also enhance the current informative frame filtering methods by detecting water and bubble frames using our proposed technique. Our feature extraction algorithm based on accumulation of pixel value difference provides better accuracy at faster speed than the existing methods making it highly suitable for real-time systems. We also propose a hybrid approach in which our feature method is combined with existing feature method(s) to provide even better accuracy. We extend the block and image level detection method to video level severity score calculation and shot segmentation. Also, the proposed novel feature extraction method can detect water and bubble frames in colonoscopy videos with very high accuracy in significantly less processing time even when clustering is used to reduce the training size by 10 times.
Adaptive Power Management for Autonomic Resource Configuration in Large-scale Computer Systems
In order to run and manage resource-intensive high-performance applications, large-scale computing and storage platforms have been evolving rapidly in various domains in both academia and industry. The energy expenditure consumed to operate and maintain these cloud computing infrastructures is a major factor to influence the overall profit and efficiency for most cloud service providers. Moreover, considering the mitigation of environmental damage from excessive carbon dioxide emission, the amount of power consumed by enterprise-scale data centers should be constrained for protection of the environment.Generally speaking, there exists a trade-off between power consumption and application performance in large-scale computing systems and how to balance these two factors has become an important topic for researchers and engineers in cloud and HPC communities. Therefore, minimizing the power usage while satisfying the Service Level Agreements have become one of the most desirable objectives in cloud computing research and implementation. Since the fundamental feature of the cloud computing platform is hosting workloads with a variety of characteristics in a consolidated and on-demand manner, it is demanding to explore the inherent relationship between power usage and machine configurations. Subsequently, with an understanding of these inherent relationships, researchers are able to develop effective power management policies to optimize productivity by balancing power usage and system performance. In this dissertation, we develop an autonomic power-aware system management framework for large-scale computer systems. We propose a series of techniques including coarse-grain power profiling, VM power modelling, power-aware resource auto-configuration and full-system power usage simulator. These techniques help us to understand the characteristics of power consumption of various system components. Based on these techniques, we are able to test various job scheduling strategies and develop resource management approaches to enhance the systems' power efficiency.
Data-Driven Decision-Making Framework for Large-Scale Dynamical Systems under Uncertainty
Managing large-scale dynamical systems (e.g., transportation systems, complex information systems, and power networks, etc.) in real-time is very challenging considering their complicated system dynamics, intricate network interactions, large scale, and especially the existence of various uncertainties. To address this issue, intelligent techniques which can quickly design decision-making strategies that are robust to uncertainties are needed. This dissertation aims to conquer these challenges by exploring a data-driven decision-making framework, which leverages big-data techniques and scalable uncertainty evaluation approaches to quickly solve optimal control problems. In particular, following techniques have been developed along this direction: 1) system modeling approaches to simplify the system analysis and design procedures for multiple applications; 2) effective simulation and analytical based approaches to efficiently evaluate system performance and design control strategies under uncertainty; and 3) big-data techniques that allow some computations of control strategies to be completed offline. These techniques and tools for analysis, design and control contribute to a wide range of applications including air traffic flow management, complex information systems, and airborne networks.
Extracting Useful Information from Social Media during Disaster Events
In recent years, social media platforms such as Twitter and Facebook have emerged as effective tools for broadcasting messages worldwide during disaster events. With millions of messages posted through these services during such events, it has become imperative to identify valuable information that can help the emergency responders to develop effective relief efforts and aid victims. Many studies implied that the role of social media during disasters is invaluable and can be incorporated into emergency decision-making process. However, due to the "big data" nature of social media, it is very labor-intensive to employ human resources to sift through social media posts and categorize/classify them as useful information. Hence, there is a growing need for machine intelligence to automate the process of extracting useful information from the social media data during disaster events. This dissertation addresses the following questions: In a social media stream of messages, what is the useful information to be extracted that can help emergency response organizations to become more situationally aware during and following a disaster? What are the features (or patterns) that can contribute to automatically identifying messages that are useful during disasters? We explored a wide variety of features in conjunction with supervised learning algorithms to automatically identify messages that are useful during disaster events. The feature design includes sentiment features to extract the geo-mapped sentiment expressed in tweets, as well as tweet-content and user detail features to predict the likelihood of the information contained in a tweet to be quickly spread in the network. Further experimentation is carried out to see how these features help in identifying the informative tweets and filter out those tweets that are conversational in nature.
SEM Predicting Success of Student Global Software Development Teams
The extensive use of global teams to develop software has prompted researchers to investigate various factors that can enhance a team’s performance. While a significant body of research exists on global software teams, previous research has not fully explored the interrelationships and collective impact of various factors on team performance. This study explored a model that added the characteristics of a team’s culture, ability, communication frequencies, response rates, and linguistic categories to a central framework of team performance. Data was collected from two student software development projects that occurred between teams located in the United States, Panama, and Turkey. The data was obtained through online surveys and recorded postings of team activities that occurred throughout the global software development projects. Partial least squares path modeling (PLS-PM) was chosen as the analytic technique to test the model and identify the most influential factors. Individual factors associated with response rates and linguistic characteristics proved to significantly affect a team’s activity related to grade on the project, group cohesion, and the number of messages received and sent. Moreover, an examination of possible latent homogeneous segments in the model supported the existence of differences among groups based on leadership style. Teams with assigned leaders tended to have stronger relationships between linguistic characteristics and team performance factors, while teams with emergent leaders had stronger. Relationships between response rates and team performance factors. The contributions in this dissertation are three fold. 1) Novel analysis techniques using PLS-PM and clustering, 2) Use of new, quantifiable variables in analyzing team activity, 3) Identification of plausible causal indicators for team performance and analysis of the same.
Video Analytics with Spatio-Temporal Characteristics of Activities
As video capturing devices become more ubiquitous from surveillance cameras to smart phones, the demand of automated video analysis is increasing as never before. One obstacle in this process is to efficiently locate where a human operator’s attention should be, and another is to determine the specific types of activities or actions without ambiguity. It is the special interest of this dissertation to locate spatial and temporal regions of interest in videos and to develop a better action representation for video-based activity analysis. This dissertation follows the scheme of “locating then recognizing” activities of interest in videos, i.e., locations of potentially interesting activities are estimated before performing in-depth analysis. Theoretical properties of regions of interest in videos are first exploited, based on which a unifying framework is proposed to locate both spatial and temporal regions of interest with the same settings of parameters. The approach estimates the distribution of motion based on 3D structure tensors, and locates regions of interest according to persistent occurrences of low probability. Two contributions are further made to better represent the actions. The first is to construct a unifying model of spatio-temporal relationships between reusable mid-level actions which bridge low-level pixels and high-level activities. Dense trajectories are clustered to construct mid-level actionlets, and the temporal relationships between actionlets are modeled as Action Graphs based on Allen interval predicates. The second is an effort for a novel and efficient representation of action graphs based on a sparse coding framework. Action graphs are first represented using Laplacian matrices and then decomposed as a linear combination of primitive dictionary items following sparse coding scheme. The optimization is eventually formulated and solved as a determinant maximization problem, and 1-nearest neighbor is used for action classification. The experiments have shown better results than existing approaches for regions-of-interest detection and action …
SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction
Current unsupervised approaches for keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns which are not captured by the graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure in order to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
Trajectories As a Unifying Cross Domain Feature for Surveillance Systems
Manual video analysis is apparently a tedious task. An efficient solution is of highly importance to automate the process and to assist operators. A major goal of video analysis is understanding and recognizing human activities captured by surveillance cameras, a very challenging problem; the activities can be either individual or interactional among multiple objects. It involves extraction of relevant spatial and temporal information from visual images. Most video analytics systems are constrained by specific environmental situations. Different domains may require different specific knowledge to express characteristics of interesting events. Spatial-temporal trajectories have been utilized to capture motion characteristics of activities. The focus of this dissertation is on how trajectories are utilized in assist in developing video analytic system in the context of surveillance. The research as reported in this dissertation begins real-time highway traffic monitoring and dynamic traffic pattern analysis and in the end generalize the knowledge to event and activity analysis in a broader context. The main contributions are: the use of the graph-theoretic dominant set approach to the classification of traffic trajectories; the ability to first partition the trajectory clusters using entry and exit point awareness to significantly improve the clustering effectiveness and to reduce the computational time and complexity in the on-line processing of new trajectories; A novel tracking method that uses the extended 3-D Hungarian algorithm with a Kalman filter to preserve the smoothness of motion; a novel camera calibration method to determine the second vanishing point with no operator assistance; and a logic reasoning framework together with a new set of context free LLEs which could be utilized across different domains. Additional efforts have been made for three comprehensive surveillance systems together with main contributions mentioned above.
Computational Epidemiology - Analyzing Exposure Risk: A Deterministic, Agent-Based Approach
Many infectious diseases are spread through interactions between susceptible and infectious individuals. Keeping track of where each exposure to the disease took place, when it took place, and which individuals were involved in the exposure can give public health officials important information that they may use to formulate their interventions. Further, knowing which individuals in the population are at the highest risk of becoming infected with the disease may prove to be a useful tool for public health officials trying to curtail the spread of the disease. Epidemiological models are needed to allow epidemiologists to study the population dynamics of transmission of infectious agents and the potential impact of infectious disease control programs. While many agent-based computational epidemiological models exist in the literature, they focus on the spread of disease rather than exposure risk. These models are designed to simulate very large populations, representing individuals as agents, and using random experiments and probabilities in an attempt to more realistically guide the course of the modeled disease outbreak. The work presented in this thesis focuses on tracking exposure risk to chickenpox in an elementary school setting. This setting is chosen due to the high level of detailed information realistically available to school administrators regarding individuals' schedules and movements. Using an agent-based approach, contacts between individuals are tracked and analyzed with respect to both individuals and locations. The results are then analyzed using a combination of tools from computer science and geographic information science.
Analyzing Microwave Spectra Collected by the Solar Radio Burst Locator
Modern communication systems rely heavily upon microwave, radio, and other electromagnetic frequency bands as a means of providing wireless communication links. Although convenient, wireless communication is susceptible to electromagnetic interference. Solar activity causes both direct interference through electromagnetic radiation as well as indirect interference caused by charged particles interacting with Earth's magnetic field. The Solar Radio Burst Locator (SRBL) is a United States Air Force radio telescope designed to detect and locate solar microwave bursts as they occur on the Sun. By analyzing these events, the Air Force hopes to gain a better understanding of the root causes of solar interference and improve interference forecasts. This thesis presents methods of searching and analyzing events found in the previously unstudied SRBL data archive. A new web-based application aids in the searching and visualization of the data. Comparative analysis is performed amongst data collected by SRBL and several other instruments. This thesis also analyzes events across the time, intensity, and frequency domains. These analysis methods can be used to aid in the detection and understanding of solar events so as to provide improved forecasts of solar-induced electromagnetic interference.
Design and Implementation of Large-Scale Wireless Sensor Networks for Environmental Monitoring Applications
Environmental monitoring represents a major application domain for wireless sensor networks (WSN). However, despite significant advances in recent years, there are still many challenging issues to be addressed to exploit the full potential of the emerging WSN technology. In this dissertation, we introduce the design and implementation of low-power wireless sensor networks for long-term, autonomous, and near-real-time environmental monitoring applications. We have developed an out-of-box solution consisting of a suite of software, protocols and algorithms to provide reliable data collection with extremely low power consumption. Two wireless sensor networks based on the proposed solution have been deployed in remote field stations to monitor soil moisture along with other environmental parameters. As parts of the ever-growing environmental monitoring cyberinfrastructure, these networks have been integrated into the Texas Environmental Observatory system for long-term operation. Environmental measurement and network performance results are presented to demonstrate the capability, reliability and energy-efficiency of the network.
Trajectory Analytics
The numerous surveillance videos recorded by a single stationary wide-angle-view camera persuade the use of a moving point as the representation of each small-size object in wide video scene. The sequence of the positions of each moving point can be used to generate a trajectory containing both spatial and temporal information of object's movement. In this study, we investigate how the relationship between two trajectories can be used to recognize multi-agent interactions. For this purpose, we present a simple set of qualitative atomic disjoint trajectory-segment relations which can be utilized to represent the relationships between two trajectories. Given a pair of adjacent concurrent trajectories, we segment the trajectory pair to get the ordered sequence of related trajectory-segments. Each pair of corresponding trajectory-segments then is assigned a token associated with the trajectory-segment relation, which leads to the generation of a string called a pairwise trajectory-segment relationship sequence. From a group of pairwise trajectory-segment relationship sequences, we utilize an unsupervised learning algorithm, particularly the k-medians clustering, to detect interesting patterns that can be used to classify lower-level multi-agent activities. We evaluate the effectiveness of the proposed approach by comparing the activity classes predicted by our method to the actual classes from the ground-truth set obtained using the crowdsourcing technique. The results show that the relationships between a pair of trajectories can signify the low-level multi-agent activities.
Content and Temporal Analysis of Communications to Predict Task Cohesion in Software Development Global Teams
Virtual teams in industry are increasingly being used to develop software, create products, and accomplish tasks. However, analyzing those collaborations under same-time/different-place conditions is well-known to be difficult. In order to overcome some of these challenges, this research was concerned with the study of collaboration-based, content-based and temporal measures and their ability to predict cohesion within global software development projects. Messages were collected from three software development projects that involved students from two different countries. The similarities and quantities of these interactions were computed and analyzed at individual and group levels. Results of interaction-based metrics showed that the collaboration variables most related to Task Cohesion were Linguistic Style Matching and Information Exchange. The study also found that Information Exchange rate and Reply rate have a significant and positive correlation to Task Cohesion, a factor used to describe participants' engagement in the global software development process. This relation was also found at the Group level. All these results suggest that metrics based on rate can be very useful for predicting cohesion in virtual groups. Similarly, content features based on communication categories were used to improve the identification of Task Cohesion levels. This model showed mixed results, since only Work similarity and Social rate were found to be correlated with Task Cohesion. This result can be explained by how a group's cohesiveness is often associated with fairness and trust, and that these two factors are often achieved by increased social and work communications. Also, at a group-level, all models were found correlated to Task Cohesion, specifically, Similarity+Rate, which suggests that models that include social and work communication categories are also good predictors of team cohesiveness. Finally, temporal interaction similarity measures were calculated to assess their prediction capabilities in a global setting. Results showed a significant negative correlation between the Pacing Rate and …
Smartphone-based Household Travel Survey - a Literature Review, an App, and a Pilot Survey
High precision data from household travel survey (HTS) is extremely important for the transportation research, traffic models and policy formulation. Traditional methods of data collection were imprecise because they relied on people’s memories of trip information, such as date and location, and the remainder data had to be obtained by certain supplemental tools. The traditional methods suffered from intensive labor, large time consumption, and unsatisfactory data precision. Recent research trends to employ smartphone apps to collect HTS data. In this study, there are two goals to be addressed. First, a smartphone app is developed to realize a smartphone-based method only for data collection. Second, the researcher evaluates whether this method can supply or replace the traditional tools of HTS. Based on this premise, the smartphone app, TravelSurvey, is specially developed and used for this study. TravelSurvey is currently compatible with iPhone 4 or higher and iPhone Operating System (iOS) 6 or higher, except iPhone 6 or iPhone 6 plus and iOS 8. To evaluate the feasibility, eight individuals are recruited to participate in a pilot HTS. Afterwards, seven of them are involved in a semi-structured interview. The interview is designed to collect interviewees’ feedback directly, so the interview mainly concerns the users’ experience of TravelSurvey. Generally, the feedback is positive. In this study, the pilot HTS data is successfully uploaded to the server by the participants, and the interviewees prefer this smartphone-based method. Therefore, as a new tool, the smartphone-based method feasibly supports a typical HTS for data collection.
A Framework for Analyzing and Optimizing Regional Bio-Emergency Response Plans
The presence of naturally occurring and man-made public health threats necessitate the design and implementation of mitigation strategies, such that adequate response is provided in a timely manner. Since multiple variables, such as geographic properties, resource constraints, and government mandated time-frames must be accounted for, computational methods provide the necessary tools to develop contingency response plans while respecting underlying data and assumptions. A typical response scenario involves the placement of points of dispensing (PODs) in the affected geographic region to supply vaccines or medications to the general public. Computational tools aid in the analysis of such response plans, as well as in the strategic placement of PODs, such that feasible response scenarios can be developed. Due to the sensitivity of bio-emergency response plans, geographic information, such as POD locations, must be kept confidential. The generation of synthetic geographic regions allows for the development of emergency response plans on non-sensitive data, as well as for the study of the effects of single geographic parameters. Further, synthetic representations of geographic regions allow for results to be published and evaluated by the scientific community. This dissertation presents methodology for the analysis of bio-emergency response plans, methods for plan optimization, as well as methodology for the generation of synthetic geographic regions.
Monitoring Dengue Outbreaks Using Online Data
Internet technology has affected humans' lives in many disciplines. The search engine is one of the most important Internet tools in that it allows people to search for what they want. Search queries entered in a web search engine can be used to predict dengue incidence. This vector borne disease causes severe illness and kills a large number of people every year. This dissertation utilizes the capabilities of search queries related to dengue and climate to forecast the number of dengue cases. Several machine learning techniques are applied for data analysis, including Multiple Linear Regression, Artificial Neural Networks, and the Seasonal Autoregressive Integrated Moving Average. Predictive models produced from these machine learning methods are measured for their performance to find which technique generates the best model for dengue prediction. The results of experiments presented in this dissertation indicate that search query data related to dengue and climate can be used to forecast the number of dengue cases. The performance measurement of predictive models shows that Artificial Neural Networks outperform the others. These results will help public health officials in planning to deal with the outbreaks.
Comparative Study of RSS-Based Collaborative Localization Methods in Wireless Sensor Networks
In this thesis two collaborative localization techniques are studied: multidimensional scaling (MDS) and maximum likelihood estimator (MLE). A synthesis of a new location estimation method through a serial integration of these two techniques, such that an estimate is first obtained using MDS and then MLE is employed to fine-tune the MDS solution, was the subject of this research using various simulation and experimental studies. In the simulations, important issues including the effects of sensor node density, reference node density and different deployment strategies of reference nodes were addressed. In the experimental study, the path loss model of indoor environments is developed by determining the environment-specific parameters from the experimental measurement data. Then, the empirical path loss model is employed in the analysis and simulation study of the performance of collaborative localization techniques.
Physical-Layer Network Coding for MIMO Systems
The future wireless communication systems are required to meet the growing demands of reliability, bandwidth capacity, and mobility. However, as corruptions such as fading effects, thermal noise, are present in the channel, the occurrence of errors is unavoidable. Motivated by this, the work in this dissertation attempts to improve the system performance by way of exploiting schemes which statistically reduce the error rate, and in turn boost the system throughput. The network can be studied using a simplified model, the two-way relay channel, where two parties exchange messages via the assistance of a relay in between. In such scenarios, this dissertation performs theoretical analysis of the system, and derives closed-form and upper bound expressions of the error probability. These theoretical measurements are potentially helpful references for the practical system design. Additionally, several novel transmission methods including block relaying, permutation modulations for the physical-layer network coding, are proposed and discussed. Numerical simulation results are presented to support the validity of the conclusions.
Exploring Privacy in Location-based Services Using Cryptographic Protocols
Location-based services (LBS) are available on a variety of mobile platforms like cell phones, PDA's, etc. and an increasing number of users subscribe to and use these services. Two of the popular models of information flow in LBS are the client-server model and the peer-to-peer model, in both of which, existing approaches do not always provide privacy for all parties concerned. In this work, I study the feasibility of applying cryptographic protocols to design privacy-preserving solutions for LBS from an experimental and theoretical standpoint. In the client-server model, I construct a two-phase framework for processing nearest neighbor queries using combinations of cryptographic protocols such as oblivious transfer and private information retrieval. In the peer-to-peer model, I present privacy preserving solutions for processing group nearest neighbor queries in the semi-honest and dishonest adversarial models. I apply concepts from secure multi-party computation to realize our constructions and also leverage the capabilities of trusted computing technology, specifically TPM chips. My solution for the dishonest adversarial model is also of independent cryptographic interest. I prove my constructions secure under standard cryptographic assumptions and design experiments for testing the feasibility or practicability of our constructions and benchmark key operations. My experiments show that the proposed constructions are practical to implement and have reasonable costs, while providing strong privacy assurances.
Reliability Characterization and Performance Analysis of Solid State Drives in Data Centers
NAND flash-based solid state drives (SSDs) have been widely adopted in data centers and high performance computing (HPC) systems due to their better performance compared with hard disk drives. However, little is known about the reliability characteristics of SSDs in production systems. Existing works that study the statistical distributions of SSD failures in the field lack insights into distinct characteristics of SSDs. In this dissertation, I explore the SSD-specific SMART (Self-Monitoring, Analysis, and Reporting Technology) attributes and conduct in-depth analysis of SSD reliability in a production environment with a focus on the unique error types and health dynamics. QLC SSD delivers better performance in a cost-effective way. I study QLC SSDs in terms of their architecture and performance. In addition, I apply thermal stress tests to QLC SSDs and quantify their performance degradation processes. Various types of big data and machine learning workloads have been executed on SSDs under varying temperatures. The SSD throughput and application performance are analyzed and characterized.
Back to Top of Screen