Search Results

Online Construction of Android Application Test Suites
Mobile applications play an important role in the dissemination of computing and information resources. They are often used in domains such as mobile banking, e-commerce, and health monitoring. Cost-effective testing techniques in these domains are critical. This dissertation contributes novel techniques for automatic construction of mobile application test suites. In particular, this work provides solutions that focus on the prohibitively large number of possible event sequences that must be sampled in GUI-based mobile applications. This work makes three major contributions: (1) an automated GUI testing tool, Autodroid, that implements a novel online approach to automatic construction of Android application test suites; (2) probabilistic and combinatorial-based algorithms that systematically sample the input space of Android applications to generate test suites with GUI/context events; and (3) empirical studies to evaluate the cost-effectiveness of our techniques on real-world Android applications. Our experiments show that our techniques achieve better code coverage and event coverage than random test generation. We demonstrate that our techniques are useful for automatic construction of Android application test suites in the absence of source code and preexisting abstract models of an Application Under Test (AUT). The insights derived from our empirical studies provide guidance to researchers and practitioners involved in the development of automated GUI testing tools for Android applications.
Joint Schemes for Physical Layer Security and Error Correction
The major challenges facing resource-constrained wireless devices are error resilience, security, and speed. Three joint schemes are presented in this research, which can be broadly divided into error correction based and cipher based. The error correction based ciphers take advantage of the properties of LDPC codes and the Nordstrom-Robinson code. A cipher-based cryptosystem is also presented in this research. The complexity of this scheme is reduced compared to conventional schemes. The security of the ciphers is analyzed against known-plaintext and chosen-plaintext attacks, and the ciphers are found to be secure. Randomization tests were also conducted on these schemes and the results are presented. As a proof of concept, the schemes were implemented in software and hardware, and these implementations show a reduction in hardware usage compared to conventional schemes. As a result, joint schemes for error correction and security provide security to the physical layer of wireless communication systems, a layer in the protocol stack where currently little or no security is implemented. In this physical layer security approach, the properties of powerful error correcting codes are exploited to deliver reliability to the intended parties, high security against eavesdroppers, and efficiency in the communication system. The notion of a highly secure and reliable physical layer has the potential to significantly change how communication system designers and users think of the physical layer, since the error control codes employed in this work will have the dual roles of both reliability and security.
Improving Memory Performance for Both High Performance Computing and Embedded/Edge Computing Systems
The CPU-memory bottleneck is a widely recognized problem. The majority of high performance computing (HPC) database systems are configured with large memories and dedicated to processing specific workloads such as weather prediction and molecular dynamics simulations. My research on optimal address mapping improves memory performance by increasing channel- and bank-level parallelism. In another research direction, I proposed and evaluated adaptive page migration techniques that obviate the need for offline analysis of an application to determine page migration strategies. Furthermore, I explored different migration strategies, such as reverse migration and sub-page migration, that I found to be beneficial depending on application behavior. Ideally, page migration strategies redirect the demand memory traffic to faster memory to improve memory performance. In my third contribution, I worked on and evaluated a memory-side accelerator that assists the main computational core in locating the non-zero elements of a sparse matrix, as typically used in scientific and machine learning workloads, on a low-power embedded system configuration. Thus, my contributions narrow the speed gap by improving the latency and/or bandwidth between CPU and memory.
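To illustrate what an address mapping that spreads traffic across channels and banks can look like, here is a minimal Python sketch. It is not the mapping proposed in the dissertation: the bit widths, the field order, and the XOR folding of row bits into the bank index are all assumptions chosen for illustration, and `map_address` is a hypothetical name.

```python
def map_address(addr, line_bits=6, channel_bits=2, bank_bits=3, column_bits=7):
    """Split a physical address into (channel, bank, row, column).

    Assumed layout (illustrative only), from least significant bit upward:
    cache-line offset, channel, bank, column, row.
    """
    addr >>= line_bits                      # drop the offset within a cache line
    channel = addr & ((1 << channel_bits) - 1)
    addr >>= channel_bits
    bank = addr & ((1 << bank_bits) - 1)
    addr >>= bank_bits
    column = addr & ((1 << column_bits) - 1)
    addr >>= column_bits
    row = addr
    # Assumption: XOR the low row bits into the bank index so that
    # accesses to different rows tend to land in different banks.
    bank ^= row & ((1 << bank_bits) - 1)
    return channel, bank, row, column


if __name__ == "__main__":
    # Consecutive 64-byte lines rotate through the channels first.
    for line in range(6):
        print(line, map_address(line * 64))
```

Placing the channel bits just above the line offset means sequential cache lines go to different channels, which is one common way to expose channel-level parallelism.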
Evaluating Appropriateness of EMG and Flex Sensors for Classifying Hand Gestures
Hand and arm gestures are a great way of communicating when you don't want to be heard: they are quieter and often more reliable than whispering into a radio mike. In recent years, hand gesture identification has become a major active area of research due to its use in various applications. The objective of my work is to develop an integrated sensor system that will enable tactical squads and SWAT teams to communicate when there is no line of sight or when obstacles are present. The gesture set involved in this work is the standardized hand signals for close range engagement operations used by military and SWAT teams. The gestures are broadly divided into finger movements and arm movements. The core components of the integrated sensor system are surface EMG sensors, flex sensors, and accelerometers. Surface EMG is the electrical activity produced by muscle contractions and measured by sensors directly attached to the skin. Bend sensors use a piezoresistive material to detect the bend. The sensor output is determined by both the angle between the ends of the sensor and the flex radius. Accelerometers sense the dynamic acceleration and inclination in three directions simultaneously. EMG sensors are placed on the upper and lower forearm and assist in the classification of the finger and wrist movements. Bend sensors are mounted on a glove that is worn on the hand. The sensors are located over the first knuckle of each finger and can determine whether the finger is bent. An accelerometer is attached to the glove at the base of the wrist and determines the speed and direction of the arm movement. A support vector machine (SVM) classifier is used to classify the gestures.
Hybrid Optimization Models for Depot Location-Allocation and Real-Time Routing of Emergency Deliveries
Prompt and efficient intervention is vital in reducing casualty figures during epidemic outbreaks, disasters, sudden civil strife, or terrorist attacks. This can only be achieved if there is a fit-for-purpose and location-specific emergency response plan in place, incorporating geographical, time, and vehicular capacity constraints. In this research, a comprehensive emergency response model is designed for situations of uncertainty (in locations' demand and available resources) that are typical of low-resource countries. It involves the development of algorithms for optimizing pre- and post-disaster activities. The studies result in the development of four models: (1) an adaptation of a machine learning clustering algorithm for pre-positioning depots and emergency operation centers, which optimizes the placement of these depots such that the largest geographical area is covered and the maximum number of individuals is reached with minimal facility cost; (2) an optimization algorithm for routing relief distribution using heterogeneous fleets of vehicles, with consideration for uncertainties in humanitarian supplies; (3) a genetic algorithm-based route improvement model; and (4) a model for integrating possible new locations into the routing network in real time, using emergency severity ranking, with a high priority on the most vulnerable population. The clustering approach to solving the depot location-allocation problem produces a better time complexity, and benchmarking the routing algorithm against existing approaches yields competitive outcomes.
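The abstract does not say which clustering algorithm is adapted for depot pre-positioning. As a rough illustration of the general idea only, the sketch below uses plain k-means (an assumption, not the dissertation's model) to place depots at the centroids of demand locations. The function name, the 2-D coordinates, and the fixed starting centers are hypothetical, and facility cost and capacity constraints are ignored.

```python
def kmeans_depots(points, centers, iterations=10):
    """Place depots at centroids of demand points using plain k-means.

    points  : list of (x, y) demand locations
    centers : list of (x, y) initial depot guesses (assumed given)
    """
    for _ in range(iterations):
        # Assignment step: attach each demand point to its nearest depot.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each depot to the mean of its assigned points;
        # a depot with no points keeps its previous position.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


if __name__ == "__main__":
    demand = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans_depots(demand, [(0, 0), (10, 10)]))
```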
Traffic Forecasting Applications Using Crowdsourced Traffic Reports and Deep Learning
Intelligent transportation systems (ITS) are essential tools for traffic planning, analysis, and forecasting that can utilize the huge amount of traffic data available nowadays. In this work, we aggregated detailed traffic flow sensor data, Waze reports, OpenStreetMap (OSM) features, and weather data from the California Bay Area for 6 months. Using that data, we studied three novel ITS applications using convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The first experiment is an analysis of the relation between roadway shapes and accident occurrence, where results show that the speed limit and number of lanes are significant predictors for major accidents on highways. The second experiment presents a novel method for forecasting congestion severity using crowdsourced data only (Waze, OSM, and weather), without the need for traffic sensor data. The third experiment studies the improvement of traffic flow forecasting using accidents, number of lanes, weather, and time-related features, where results show significant performance improvements when the additional features were used.
New Frameworks for Secure Image Communication in the Internet of Things (IoT)
The continuous expansion of technology, broadband connectivity, and the wide range of new devices in the IoT cause serious concerns regarding privacy and security. In addition, a key challenge in the IoT is the storage and management of massive data streams. For example, there is always demand for images of acceptable size with the highest quality possible to meet the rapidly increasing number of multimedia applications. This dissertation contributes to resolving concerns related to the security and compression functions in image communications in the Internet of Things (IoT), which arise from the fast evolution of the IoT. This dissertation proposes frameworks for a secure digital camera in the IoT. The objectives of this dissertation are twofold. On the one hand, the proposed framework architecture offers a double layer of protection, encryption and watermarking, that will address all issues related to security, privacy, and digital rights management (DRM) by applying a hardware architecture of the state-of-the-art image compression technique Better Portable Graphics (BPG), which achieves a high compression ratio with small size. On the other hand, the proposed framework of SBPG is integrated with the Digital Camera. Thus, the proposed framework of SBPG integrated with SDC is suitable for high performance imaging in the IoT, such as Intelligent Traffic Surveillance (ITS) and Telemedicine. Because power consumption has become a major concern in any portable application, a low-power design of SBPG is proposed to achieve an energy-efficient SBPG design. As the visual quality of the watermarked and compressed images improves with larger values of PSNR, the results show that the proposed SBPG substantially increases the quality of the watermarked compressed images. A higher value of PSNR also shows how robust the algorithm is to different types of attack. From the results obtained for the energy-efficient SBPG …
Deep Learning Methods to Investigate Online Hate Speech and Counterhate Replies to Mitigate Hateful Content
Hateful content and offensive language are commonplace on social media platforms. Many surveys prove that high percentages of social media users experience online harassment. Previous efforts have been made to detect and remove online hate content automatically. However, removing users' content restricts free speech. A complementary strategy to address hateful content that does not interfere with free speech is to counter the hate with new content to divert the discourse away from the hate. In this dissertation, we complement the lack of previous work on counterhate arguments by analyzing and detecting them. Firstly, we study the relationships between hateful tweets and replies. Specifically, we analyze their fine-grained relationships by indicating whether the reply counters the hate, provides a justification, attacks the author of the tweet, or adds additional hate. The most obvious finding is that most replies generally agree with the hateful tweets; only 20% of them counter the hate. Secondly, we focus on the hate directed toward individuals and detect authentic counterhate arguments from online articles. We propose a methodology that assures the authenticity of the argument and its specificity to the individual of interest. We show that finding arguments in online articles is an efficient alternative compared to counterhate generation approaches that may hallucinate unsupported arguments. Thirdly, we investigate the replies to counterhate tweets beyond whether the reply agrees or disagrees with the counterhate tweet. We analyze the language of the counterhate tweet that leads to certain types of replies and predict which counterhate tweets may elicit more hate instead of stopping it. We find that counterhate tweets with profanity content elicit replies that agree with the counterhate tweet. This dissertation presents several corpora, detailed corpus analyses, and deep learning-based approaches for the three tasks mentioned above.
Toward Leveraging Artificial Intelligence to Support the Identification of Accessibility Challenges
The goal of this thesis is to support the automated identification of accessibility issues in user reviews or bug reports, to help technology professionals prioritize their handling and, thus, to create more inclusive apps. In particular, we propose a model that takes as input accessibility user reviews or bug reports and learns their keyword-based features to make a classification decision, for a given review, on whether it is about accessibility or not. Our empirically driven study follows a mixture of qualitative and quantitative methods. We introduce models that can accurately identify accessibility reviews and bug reports and automate their detection. Our models can automatically classify app reviews and bug reports as accessibility-related or not, so developers can easily detect accessibility issues with their products and use the users' input to make their apps more accessible and inclusive. Our goal is to create a sustainable change by including a model in the developer's software maintenance pipeline and raising awareness of existing errors that hinder the accessibility of mobile apps, which is a pressing need. Our findings from the Blackboard case study show that Blackboard and the course material are not easily accessible to deaf and hard-of-hearing students. Thus, deaf students find that learning is extremely stressful during the pandemic.
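To make the idea of keyword-based features concrete, here is a minimal Python sketch of a review classifier. It is a simplification under stated assumptions: the thesis describes a model that learns its features, whereas this example uses a small hand-picked keyword list and a fixed threshold, both hypothetical.

```python
# Hypothetical keyword list; a learned model would derive these from data.
ACCESSIBILITY_KEYWORDS = (
    "screen reader", "talkback", "voiceover", "font size",
    "contrast", "captions", "color blind", "accessibility",
)


def keyword_features(review):
    """Count how often each keyword occurs in the lower-cased review text."""
    text = review.lower()
    return {kw: text.count(kw) for kw in ACCESSIBILITY_KEYWORDS}


def is_accessibility_review(review, threshold=1):
    """Label a review as accessibility-related when enough keywords match."""
    return sum(keyword_features(review).values()) >= threshold


if __name__ == "__main__":
    for r in ("TalkBack cannot read the buttons", "App crashes on login"):
        print(r, "->", is_accessibility_review(r))
```

In a learned setting, the feature dictionary returned by `keyword_features` would be passed to a trained classifier instead of being compared with a fixed threshold.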
Scalable Next Generation Blockchains for Large Scale Complex Cyber-Physical Systems and Their Embedded Systems in Smart Cities
The original FlexiChain and its descendants are a revolutionary distributed ledger technology (DLT) for cyber-physical systems (CPS) and their embedded systems (ES). FlexiChain, a DLT implementation, uses cryptography, distributed ledgers, peer-to-peer communications, scalable networks, and consensus. FlexiChain facilitates data structure agreements. This thesis offers a Block Directed Acyclic Graph (BDAG) architecture to link blocks to their forerunners to speed up validation. These data blocks are securely linked. This dissertation introduces Proof of Rapid Authentication, a novel consensus algorithm. This innovative method uses a distributed file to safely store a unique identifier (UID) based on node attributes to verify two blocks faster. This study also addresses CPS hardware security. A system of interconnected, user-unique identifiers allows each block's history to be monitored. This maintains each transaction and the validators who checked the block to ensure trustworthiness and honesty. We constructed a digital version that stays in sync with the distributed ledger as all nodes are linked by a NodeChain. The ledger is distributed without compromising node autonomy. Moreover, FlexiChain Layer 0 distributed ledger is also introduced and can connect and validate Layer 1 blockchains. This project produced a DAG-based blockchain integration platform with hardware security. The results illustrate a practical technique for creating a system depending on diverse applications' needs. This research's design and execution showed faster authentication, less cost, less complexity, greater scalability, higher interoperability, and reduced power consumption.
Frameworks for Attribute-Based Access Control (ABAC) Policy Engineering
In this dissertation, we propose semi-automated top-down policy engineering approaches for attribute-based access control (ABAC) development. Further, we propose a hybrid ABAC policy engineering approach to combine the benefits and address the shortcomings of both top-down and bottom-up approaches. In particular, we propose three frameworks: (i) ABAC attributes extraction, (ii) ABAC constraints extraction, and (iii) hybrid ABAC policy engineering. The attributes extraction framework comprises five modules that operate together to extract attribute values from natural language access control policies (NLACPs), map the extracted values to attribute keys, and assign each key-value pair to an appropriate entity. For the ABAC constraints extraction framework, we design a two-phase process to extract ABAC constraints from NLACPs. The process begins with the identification phase, which focuses on identifying the right boundary of constraint expressions. Next is the normalization phase, which aims at extracting the actual elements that pose a constraint. Finally, our hybrid ABAC policy engineering framework consists of five modules. This framework combines top-down and bottom-up policy engineering techniques to overcome the shortcomings of both approaches and to generate policies that are more intuitive and relevant to actual organization policies. With this, we believe that our work takes essential steps towards a semi-automated ABAC policy development experience.
Radio Resource Control Approaches for LTE-Advanced Femtocell Networks
The architecture of mobile networks has evolved dramatically in order to fulfill the growing demands on wireless services and data. The radio resources used by current mobile networks are limited, while users' demands are increasing substantially. In the future, a tremendous number of Internet applications are expected to be served by mobile networks. Therefore, increasing the capacity of mobile networks has become a vital issue. Heterogeneous networks (HetNets) have been considered a promising paradigm for future mobile networks. Accordingly, the concept of the small cell has been introduced in order to increase the capacity of mobile networks. A femtocell network is a kind of small cell network. Femtocells are deployed within macrocell coverage. Femtocells cover small areas and operate with low transmission power while providing high capacity. Also, UEs can be offloaded from macrocells to femtocells. Thus, the capacity can be increased. However, this introduces different technical challenges. Interference has become one of the key challenges for deploying femtocells within a macrocell's coverage. The undesirable impact of interference can degrade the performance of mobile networks. Therefore, radio resource management mechanisms are needed in order to address the key challenges of deploying femtocells. The objective of this work is to introduce radio resource control approaches that increase mobile networks' capacity and alleviate the undesirable impact of interference. In addition, the proposed radio resource control approaches ensure the coexistence of macrocells and femtocells in an LTE-Advanced environment. Firstly, a novel mechanism is proposed in order to address the interference challenge. The proposed approach mitigates the impact of interference by controlling the assignment of radio sub-channels and dynamically adjusting the transmission power. Secondly, a dynamic strategy is proposed for the FFR mechanism.
In the FFR mechanism, the whole spectrum is divided into four fixed sub-channels and each …
A Data-Driven Computational Framework to Assess the Risk of Epidemics at Global Mass Gatherings
This dissertation presents a data-driven computational epidemic framework to simulate disease epidemics at global mass gatherings. The annual Muslim pilgrimage to Makkah, Saudi Arabia is used to demonstrate the simulation and analysis of various disease transmission scenarios throughout the different stages of the event from the arrival to the departure of international participants. The proposed agent-based epidemic model efficiently captures the demographic, spatial, and temporal heterogeneity at each stage of the global event of Hajj. Experimental results indicate the substantial impact of the demographic and mobility patterns of the heterogeneous population of pilgrims on the progression of the disease spread in the different stages of Hajj. In addition, these simulations suggest that the differences in the spatial and temporal settings in each stage can significantly affect the dynamic of the disease. Finally, the epidemic simulations conducted at the different stages in this dissertation illustrate the impact of the differences between the duration of each stage in the event and the length of the infectious and latent periods. This research contributes to a better understanding of epidemic modeling in the context of global mass gatherings to predict the risk of disease pandemics caused by associated international travel. The computational modeling and disease spread simulations in global mass gatherings provide public health authorities with powerful tools to assess the implication of these events at a different scale and to evaluate the efficacy of control strategies to reduce their potential impacts.
Resource Management in Wireless Networks
A local call admission control (CAC) algorithm for third generation wireless networks was designed and implemented, which allows for the simulation of network throughput for different spreading factors and various mobility scenarios. A global CAC algorithm is also implemented and used as a benchmark since it is inherently optimized; it yields the best possible performance but has high computational complexity. The optimized local CAC algorithm achieves performance similar to that of the global CAC algorithm at a fraction of the computational cost. The design of a dynamic channel assignment algorithm for IEEE 802.11 wireless systems is also presented. Channels are assigned dynamically depending on the minimal interference generated by the neighboring access points on a reference access point. Analysis of the dynamic channel assignment algorithm shows an improvement by a factor of 4 over the default setting of having all access points use the same channel, resulting in significantly higher network throughput.
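The channel selection rule described above, picking the channel on which neighboring access points generate the least interference at a reference access point, can be sketched as follows. This is an illustrative simplification, not the algorithm from the work: the candidate channel set (1, 6, 11), the linear power units, and the assumption that only co-channel neighbors interfere are all hypothetical choices.

```python
def pick_channel(neighbors, candidate_channels=(1, 6, 11)):
    """Return the candidate channel with the least summed neighbor power.

    neighbors : list of (channel, received_power_mw) pairs measured at the
                reference access point (assumed co-channel interference only).
    """
    interference = {ch: 0.0 for ch in candidate_channels}
    for channel, power_mw in neighbors:
        if channel in interference:
            interference[channel] += power_mw
    # Ties are resolved in favor of the earlier candidate channel.
    return min(candidate_channels, key=lambda ch: interference[ch])


if __name__ == "__main__":
    measured = [(1, 5.0), (6, 1.0), (11, 3.0), (1, 2.0)]
    print(pick_channel(measured))
```

Re-running the selection whenever the neighbor measurements change gives a simple form of dynamic assignment.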
Space and Spectrum Engineered High Frequency Components and Circuits
With the increasing demand for wireless and portable devices, radio frequency front end blocks are required to feature properties such as wide bandwidth, high frequency, multiple operating frequencies, low cost, and compact size. However, current radio frequency system blocks are designed by combining several individual frequency band blocks into one functional block, which increases the cost and size of devices. To address these issues, it is important to develop novel approaches to further advance the current design methodologies in both the space and spectrum domains. In recent years, the concept of artificial materials has been proposed and studied intensively in the RF/microwave, terahertz, and optical frequency ranges. An artificial material is a combination of conventional materials such as air, wood, metal, and plastic. It can achieve material properties that have not been found in nature. Therefore, artificial materials (i.e., metamaterials) provide design freedom to control both the spectrum performance and geometrical structures of radio frequency front end blocks and other high frequency systems. In this dissertation, several artificial materials are proposed and designed by different methods, and their applications to different high frequency components and circuits are studied. First, the quasi-conformal mapping (QCM) method is applied to design plasmonic wave-adapters and couplers working in the optical frequency range. Second, an inverse QCM method is proposed to implement flattened Luneburg lens antennas and parabolic antennas in the microwave range. Third, a dual-band compact directional coupler is realized by applying artificial transmission lines. In addition, a fully symmetrical coupler with an artificial lumped element structure is also implemented. Finally, a tunable on-chip inductor, compact CMOS transmission lines, and metamaterial-based interconnects are proposed using artificial metal structures.
All the proposed designs are simulated in full-wave 3D electromagnetic solvers, and the measurement results agree well with the simulation results. These artificial material-based novel design methodologies pave the way …
Statistical Strategies for Efficient Signal Detection and Parameter Estimation in Wireless Sensor Networks
This dissertation investigates data reduction strategies from a signal processing perspective in centralized detection and estimation applications. First, it considers a deterministic source observed by a network of sensors and develops an analytical strategy for ranking sensor transmissions based on the magnitude of their test statistics. The benefit of the proposed strategy is that the decision to transmit or not to transmit observations to the fusion center can be made at the sensor level resulting in significant savings in transmission costs. A sensor network based on target tracking application is simulated to demonstrate the benefits of the proposed strategy over the unconstrained energy approach. Second, it considers the detection of random signals in noisy measurements and evaluates the performance of eigenvalue-based signal detectors. Due to their computational simplicity, robustness and performance, these detectors have recently received a lot of attention. When the observed random signal is correlated, several researchers claim that the performance of eigenvalue-based detectors exceeds that of the classical energy detector. However, such claims fail to consider the fact that when the signal is correlated, the optimal detector is the estimator-correlator and not the energy detector. In this dissertation, through theoretical analyses and Monte Carlo simulations, eigenvalue-based detectors are shown to be suboptimal when compared to the energy detector and the estimator-correlator.
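Two ideas mentioned above, the classical energy detector and ranking sensor transmissions by the magnitude of their test statistics, can be illustrated with a short Python sketch. It is only a toy version: the threshold value, the function names, and the use of a plain sum of squares as the statistic are assumptions, and the dissertation's analytical ranking strategy is not reproduced here.

```python
def energy_statistic(samples):
    """Energy detector statistic: the sum of squared samples."""
    return sum(x * x for x in samples)


def energy_detect(samples, threshold):
    """Declare 'signal present' when the energy exceeds a chosen threshold."""
    return energy_statistic(samples) > threshold


def rank_sensors(statistics):
    """Return sensor indices ordered from largest to smallest |statistic|.

    A fusion center could request transmissions in this order, so that the
    most informative sensors report first and the rest can stay silent.
    """
    return sorted(range(len(statistics)), key=lambda i: abs(statistics[i]), reverse=True)


if __name__ == "__main__":
    print(energy_statistic([1, -2, 2]))
    print(rank_sensors([0.5, -3.0, 1.2]))
```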
Application of Adaptive Techniques in Regression Testing for Modern Software Development
In this dissertation, we investigate the applicability of different adaptive techniques to improve the effectiveness and efficiency of regression testing. Initially, we introduce the concept of regression testing. We then perform a literature review of current practices and state-of-the-art regression testing techniques. Finally, we advance regression testing techniques by performing four empirical studies in which we use different types of information (e.g., user sessions, source code, and code commits) to investigate the effectiveness of each software metric on fault detection capability for different software environments. In our first empirical study, we show the effectiveness of applying user session information for test case prioritization. In our next study, we apply what we learned from the previous study and implement a collaborative filtering recommender system for test case prioritization, which uses user sessions and change history information as input parameters and returns the risk score associated with each component. Results of this study show that our recommender system improves the effectiveness of test prioritization; the performance of our approach was particularly noteworthy under time constraints. We then investigate the merits of multi-objective testing over single-objective techniques with a graph-based testing framework. Results of this study indicate that the use of the graph-based technique reduces the algorithm execution time considerably, while being just as effective as the greedy algorithms in terms of fault detection rate. Finally, we apply the knowledge from the previous studies and implement a query answering framework for regression test selection. This framework is built on a graph database and uses fault history information and test diversity in an attempt to select the most effective set of test cases in terms of fault detection capability.
Our empirical evaluation of this study with four open source programs shows that our approach can be effective and efficient by …
Blockchain for AI: Smarter Contracts to Secure Artificial Intelligence Algorithms
In this dissertation, I investigate the existing smart contract problems that limit cognitive abilities. I use Taylor series expansion, polynomial equations, and fraction-based computations to overcome the limitations of calculations in smart contracts. To prove the hypothesis, I use these mathematical models to compute complex operations of naive Bayes, linear regression, decision trees, and neural network algorithms on Ethereum public test networks. The smart contracts achieve 95% prediction accuracy compared to traditional programming language models, proving the soundness of the numerical derivations. Many non-real-time applications can use our solution for trusted and secure prediction services.
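As a rough illustration of how a Taylor series can stand in for a missing floating-point function when only integer arithmetic is available, the Python sketch below approximates e^x with scaled integers. It is not the dissertation's contract code: the scale factor of 10^18, the number of terms, and the restriction to non-negative inputs are assumptions made for this example.

```python
SCALE = 10 ** 18  # assumed fixed-point scale: the integer SCALE represents 1.0


def exp_fixed(x_scaled, terms=20):
    """Approximate e**x using integer-only arithmetic.

    x_scaled : non-negative integer representing x * SCALE
    returns  : integer representing e**x * SCALE (truncated Taylor series)
    """
    if x_scaled < 0:
        raise ValueError("this sketch only handles non-negative inputs")
    result = SCALE          # term 0 of the series: 1
    term = SCALE
    for n in range(1, terms + 1):
        # term_n = term_(n-1) * x / n, kept in the same fixed-point scale
        term = term * x_scaled // (SCALE * n)
        result += term
    return result


if __name__ == "__main__":
    print(exp_fixed(SCALE) / SCALE)       # approximation of e
    print(exp_fixed(SCALE // 2) / SCALE)  # approximation of e**0.5
```

Each integer division discards a small remainder, so the result is slightly below the true value; more terms or a larger scale trade computation for accuracy.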
Sensing and Decoding Brain States for Predicting and Enhancing Human Behavior, Health, and Security
The human brain acts as an intelligent sensor by helping in effective signal communication and execution of logical functions and instructions, thus coordinating all functions of the human body. More importantly, it shows the potential to combine prior knowledge with adaptive learning, thus ensuring constant improvement. These qualities help the brain to interact efficiently with both the body (brain-body) and the environment (brain-environment). This dissertation attempts to apply brain-body-environment interactions (BBEI) to elevate human existence and enhance our day-to-day experiences. For instance, when one stepped out of the house in the past, one had to carry keys (for unlocking), money (for purchasing), and a phone (for communication). With the advent of smartphones, this scenario changed completely, and today it is often enough to carry just one's smartphone because all the above activities can be performed with a single device. In the future, with advanced research and progress in BBEI, one will be able to perform many activities by dictating them in one's mind without any physical involvement. This dissertation aims to shift the paradigm of existing brain-computer interfaces from just 'control' to 'monitor, control, enhance, and restore' in three main areas: healthcare, transportation safety, and cryptography. In healthcare, measures were developed for understanding brain-body interactions by correlating cerebral autoregulation with brain signals. The variation in estimated blood flow of the brain (obtained through EEG) was detected with evoked change in blood pressure, thus enabling EEG metrics to be used as a first-hand screening tool to check for impaired cerebral autoregulation. To enhance road safety, distracted drivers' behavior in various multitasking scenarios while driving was identified by significant changes in the time-frequency spectrum of the EEG signals.
A distraction metric was calculated to rank the severity of a distraction task that can be used as an intuitive measure …
Extrapolating Subjectivity Research to Other Languages
Socrates articulated it best, "Speak, so I may see you." Indeed, language represents an invisible probe into the mind. It is the medium through which we express our deepest thoughts, our aspirations, our views, our feelings, our inner reality. From the beginning of artificial intelligence, researchers have sought to impart human-like understanding to machines. As much of our language represents a form of self-expression, capturing thoughts, beliefs, evaluations, opinions, and emotions which are not available for scrutiny by an outside observer, in the field of natural language, research involving these aspects has crystallized under the name of subjectivity and sentiment analysis. While subjectivity classification labels text as either subjective or objective, sentiment classification further divides subjective text into positive, negative, or neutral. In this thesis, I investigate techniques of generating tools and resources for subjectivity analysis that do not rely on an existing natural language processing infrastructure in a given language. This constraint is motivated by the fact that the vast majority of human languages are scarce from an electronic point of view: they lack basic tools such as part-of-speech taggers, parsers, or basic resources such as electronic text, annotated corpora or lexica. This severely limits the implementation of techniques on par with those developed for English, and by applying methods that are lighter in the usage of text processing infrastructure, we are able to conduct multilingual subjectivity research in these languages as well. Since my aim is also to minimize the amount of manual work required to develop lexica or corpora in these languages, the techniques proposed employ a lever approach, where English often acts as the donor language (the fulcrum in a lever) and allows preliminary subjectivity research to be established in a target language with a relatively minimal amount of effort.
Toward Supporting Fine-Grained, Structured, Meaningful and Engaging Feedback in Educational Applications
Recent advancements in machine learning have started to put their mark on educational technology. Technology is evolving fast and, as people adopt it, schools and universities must also keep up (nearly 70% of primary and secondary schools in the UK are now using tablets for various purposes). As these numbers are likely to follow the same increasing trend, it is imperative for schools to adapt and benefit from the advantages offered by technology: real-time processing of data, availability of different resources through connectivity, efficiency, and many others. To this end, this work contributes to the growth of educational technology by developing several algorithms and models that are meant to ease several tasks for instructors, engage students in deep discussions and, ultimately, increase their learning gains. First, a novel, fine-grained knowledge representation is introduced that splits phrases into their constituent propositions that are both meaningful and minimal. An automated extraction algorithm for the propositions is also introduced. Compared with other fine-grained representations, the extraction model does not require any human labor after it is trained, while the results show considerable improvement over two meaningful baselines. Second, a proposition alignment model is created that relies on even finer-grained units of text while also outperforming several alternative systems. Third, a detailed machine learning based analysis of students' unrestricted natural language responses to questions asked in classrooms is made by leveraging the proposition extraction algorithm to make computational predictions of textual assessment. Two computational approaches are introduced that use and compare manually engineered machine learning features with word embeddings input into a neural network with two hidden layers.
Both methods achieve notable improvements over two alternative approaches, a recent short answer grading system and DiSAN – a recent, pre-trained, light-weight neural network that obtained state-of-the-art performance on multiple NLP tasks and corpora. Fourth, a …
A New Look at Retargetable Compilers
Consumers demand new and innovative personal computing devices every 2 years when their cellular phone service contracts are renewed. Yet, a 2-year development cycle for the concurrent development of both hardware and software is nearly impossible. As more components and features are added to the devices, maintaining this 2-year cycle with current tools will become commensurately harder. This dissertation delves into the feasibility of simplifying the development of such systems by employing heterogeneous systems on a chip in conjunction with a retargetable compiler such as the hybrid computer retargetable compiler (Hy-C). An example of a simple architecture description of sufficient detail for use with a retargetable compiler like Hy-C is provided. As a software engineer with 30 years of experience, I have witnessed numerous system failures. A plethora of software development paradigms and tools have been employed to prevent software errors, but none have been completely successful. Much discussion centers on software development in the military contracting market, as that is my background. The dissertation reviews those tools, as well as some existing retargetable compilers, in an attempt to determine how those errors occurred and how a system like Hy-C could assist in reducing future software errors. In the end, a retargetable solution like Hy-C is shown to be very simple, yet powerful enough to provide a very capable product in a very fast-growing market.
Content and Temporal Analysis of Communications to Predict Task Cohesion in Software Development Global Teams
Virtual teams in industry are increasingly being used to develop software, create products, and accomplish tasks. However, analyzing those collaborations under same-time/different-place conditions is well known to be difficult. In order to overcome some of these challenges, this research was concerned with the study of collaboration-based, content-based and temporal measures and their ability to predict cohesion within global software development projects. Messages were collected from three software development projects that involved students from two different countries. The similarities and quantities of these interactions were computed and analyzed at individual and group levels. Results of interaction-based metrics showed that the collaboration variables most related to Task Cohesion were Linguistic Style Matching and Information Exchange. The study also found that Information Exchange rate and Reply rate have a significant and positive correlation with Task Cohesion, a factor used to describe participants' engagement in the global software development process. This relation was also found at the group level. All these results suggest that metrics based on rate can be very useful for predicting cohesion in virtual groups. Similarly, content features based on communication categories were used to improve the identification of Task Cohesion levels. This model showed mixed results, since only Work similarity and Social rate were found to be correlated with Task Cohesion. This result can be explained by how a group's cohesiveness is often associated with fairness and trust, and that these two factors are often achieved by increased social and work communications. Also, at the group level, all models were found to be correlated with Task Cohesion, specifically Similarity+Rate, which suggests that models that include social and work communication categories are also good predictors of team cohesiveness.
Finally, temporal interaction similarity measures were calculated to assess their prediction capabilities in a global setting. Results showed a significant negative correlation between the Pacing Rate and …
Monitoring Dengue Outbreaks Using Online Data
Internet technology has affected human lives in many disciplines. The search engine is one of the most important Internet tools in that it allows people to search for what they want. Search queries entered in a web search engine can be used to predict dengue incidence. This vector-borne disease causes severe illness and kills a large number of people every year. This dissertation utilizes search queries related to dengue and climate to forecast the number of dengue cases. Several machine learning techniques are applied for data analysis, including Multiple Linear Regression, Artificial Neural Networks, and the Seasonal Autoregressive Integrated Moving Average. The performance of the predictive models produced by these methods is measured to find which technique generates the best model for dengue prediction. The results of experiments presented in this dissertation indicate that search query data related to dengue and climate can be used to forecast the number of dengue cases. The performance measurement of predictive models shows that Artificial Neural Networks outperform the others. These results will help public health officials in planning to deal with the outbreaks.
Optimization of Massive MIMO Systems for 5G Networks
In the first part of the dissertation, we provide an extensive overview of the sub-6 GHz wireless access technology known as massive multiple-input multiple-output (MIMO) systems, highlighting its benefits, deployment challenges, and the key enabling technologies envisaged for 5G networks. We investigate the fundamental issues that degrade the performance of massive MIMO systems such as pilot contamination, precoding, user scheduling, and signal detection. In the second part, we optimize the performance of the massive MIMO system by proposing several algorithms, system designs, and hardware architectures. To mitigate the effect of pilot contamination, we propose a pilot reuse factor scheme based on the user environment and the number of active users. The simulation results show that the proposed scheme ensures the system always operates at maximal spectral efficiency and achieves higher throughput. To address the user scheduling problem, we propose two user scheduling algorithms based upon the measured channel gain. The simulation results show that our proposed user scheduling algorithms achieve better error performance, improve sum capacity and throughput, and guarantee fairness among the users. To address the uplink signal detection challenge in massive MIMO systems, we propose four algorithms and their system designs. We show through simulations that the proposed algorithms are computationally efficient and can achieve near-optimal bit error rate performance. Additionally, we propose hardware architectures for all the proposed algorithms to identify the required physical components and their interrelationships.
Cooperative Perception for Connected Autonomous Vehicle Edge Computing System
This dissertation first conducts a study on raw-data level cooperative perception for enhancing the detection ability of self-driving systems for connected autonomous vehicles (CAVs). A LiDAR (Light Detection and Ranging sensor) point cloud-based 3D object detection method is deployed to enhance detection performance by expanding the effective sensing area, capturing critical information in multiple scenarios and improving detection accuracy. In addition, a point cloud feature-based cooperative perception framework is proposed on an edge computing system for CAVs. This dissertation also uses the features' intrinsically small size to achieve real-time edge computing, without running the risk of congesting the network. In order to distinguish small objects such as pedestrians and cyclists in 3D data, an end-to-end multi-sensor fusion model is developed to implement 3D object detection from multi-sensor data. Experiments show that by solving multiple perception tasks on camera and LiDAR jointly, the detection model can leverage the advantages of high-resolution images and physical-world LiDAR mapping data, which leads the KITTI benchmark on 3D object detection. Finally, an application of cooperative perception is deployed on the edge to heal the live map for autonomous vehicles. Through 3D reconstruction and multi-sensor fusion detection, experiments on a real-world dataset demonstrate that a high-definition (HD) map on the edge can provide well-sensed local data for navigation to CAVs.
Extracting Possessions and Their Attributes
Possession is an asymmetric semantic relation between two entities, where one entity (the possessee) belongs to the other entity (the possessor). Automatically extracting possessions is useful in identifying skills, in recommender systems, and in natural language understanding. Possessions can be found in different communication modalities including text, images, video, and audio. In this dissertation, I elaborate on the techniques I used to extract possessions. I begin with extracting possessions at the sentence level, including the type and temporal anchors. Then, I extract the duration of possession and co-possessions (if multiple possessors possess the same entity). Next, I extract possessions from an entire Wikipedia article, capturing the change of possessors over time. I extract possessions from social media including both text and images. Finally, I also present dense annotations generating possession timelines. I present separate datasets, detailed corpus analysis, and machine learning models for each task described above.
Social Network Simulation and Mining Social Media to Advance Epidemiology
Traditional public health decision support can benefit from the Web and social media revolution. This dissertation presents approaches to mining social media that benefit public health epidemiology. Through discovery and analysis of trends in influenza-related blogs, a correlation to Centers for Disease Control and Prevention (CDC) influenza-like-illness patient reporting at sentinel health-care providers is verified. A second approach considers personal beliefs about vaccination in social media. A vaccine for human papillomavirus (HPV) was approved by the Food and Drug Administration in May 2006. The virus is present in nearly all cervical cancers and implicated in many throat and oral cancers. Results from automatic sentiment classification of HPV vaccination beliefs are presented, which will enable more accurate prediction of the vaccine's population-level impact. Two epidemic models are introduced that embody the intimate social networks related to HPV transmission. Ultimately, aggregating these methodologies with epidemic and social network modeling facilitates effective development of strategies for targeted interventions.
The Value of Everything: Ranking and Association with Encyclopedic Knowledge
This dissertation describes WikiRank, an unsupervised method of assigning relative values to elements of a broad-coverage encyclopedic information source in order to identify those entries that may be relevant to a given piece of text. The valuation given to an entry is based not on textual similarity but instead on the links that associate entries, and an estimation of the expected frequency of visitation that would be given to each entry based on those associations in context. This estimation of relative frequency of visitation is embodied in modifications to the random walk interpretation of the PageRank algorithm. WikiRank is an effective algorithm to support natural language processing applications. First, it is shown to exceed the performance of previous machine learning algorithms for the task of automatic topic identification, providing results comparable to those of human annotators. Second, WikiRank is found useful for the task of recognizing text-based paraphrases on a semantic level, by comparing the distribution of attention generated by two pieces of text using the encyclopedic resource as a common reference. Finally, WikiRank is shown to have the ability to use its base of encyclopedic knowledge to recognize terms from different ontologies as describing the same thing, thus allowing for the automatic generation of mapping links between ontologies. The conclusion of this thesis is that the "knowledge access heuristic" is valuable and that a ranking process based on a large encyclopedic resource can form the basis for an extendable general-purpose mechanism capable of identifying relevant concepts by association, which in turn can be effectively utilized for enumeration and comparison at a semantic level.
Reliability and Throughput Improvement in Vehicular Communication by Using 5G Technologies
The vehicular community is moving towards a whole new paradigm with the advancement of new technology. Vehicular communication not only supports safety services but also provides non-safety services like navigation support, toll collection, web browsing, media streaming, etc. The existing communication frameworks like Dedicated Short Range Communication (DSRC) and Cellular V2X (C-V2X) might not meet the required capacity in the coming days. So, the vehicular community needs to adopt new technologies and upgrade the existing communication frameworks so that they can fulfill the desired expectations. In particular, an increase in reliability and data rate is required. Multiple Input Multiple Output (MIMO), 5G New Radio, Low Density Parity Check (LDPC) codes, and Massive MIMO signal detection and equalization algorithms are the latest additions to the 5G wireless communication domain. These technologies have the potential to make the existing V2X communication framework more robust. As a result, more reliability and throughput can be achieved. This work demonstrates these technologies' compatibility with, and positive impact on, the existing V2X communication standard.
An Efficient Approach for Dengue Mitigation: A Computational Framework
Dengue mitigation is a major research area among scientists who are working towards effective management of the dengue epidemic. Effective dengue mitigation requires several important components. These components include accurate epidemic modeling, efficient epidemic prediction, and efficient resource allocation for controlling the spread of the dengue disease. Past studies assumed a homogeneous response pattern of the dengue epidemic to climate conditions throughout the regions. The dengue epidemic is both climate dependent and geographically dependent. A global model is not sufficient to capture the local variations of the epidemic. We propose a novel method of epidemic modeling that considers local variation and uses a micro ensemble of regressors for each region. Three regressors are used in the construction of the ensemble: support vector regression, ordinary least squares regression, and k-nearest neighbor regression. The best performing regressors get selected into the ensemble. The proposed ensemble determines the risk of dengue epidemic in each region in advance. The risk is then used in risk-based resource allocation. The proposed resource allocation is built on the genetic algorithm, with major modifications to its main components, mutation and crossover. The proposed resource allocation converges faster than the standard genetic algorithm and also produces a better allocation compared to the standard algorithm.
New Computational Methods for Literature-Based Discovery
In this work, we leverage recent developments in computer science to address several of the challenges in current literature-based discovery (LBD) solutions. First, LBD solutions cannot use semantics or are too computationally complex. To solve these problems, we propose a generative model, OverlapLDA, based on topic modeling, which has been shown to be both effective and efficient in extracting semantics from a corpus. We also introduce an inference method for OverlapLDA. We conduct extensive experiments to show the effectiveness and efficiency of OverlapLDA in LBD. Second, we expand LBD to a more complex and realistic setting, in which there can be more than one concept connecting the input concepts, and the connectivity pattern between concepts can also be more complex than a chain. Current LBD solutions can hardly complete the LBD task in the new setting. We simplify the hypotheses as concept sets and propose LBDSetNet, based on graph neural networks, to solve this problem. We also introduce different training schemes based on self-supervised learning to train LBDSetNet without relying on comprehensive labeled hypotheses that are extremely costly to get. Our comprehensive experiments show that LBDSetNet outperforms strong baselines on simple hypotheses and addresses complex hypotheses.
Procedural Generation of Content for Online Role Playing Games
Video game players demand a volume of content far in excess of the ability of game designers to create it. For example, a single quest might take a week to develop and test, which means that companies such as Blizzard are spending millions of dollars each month on new content for their games. As a result, both players and developers are frustrated with the inability to meet the demand for new content. By generating content on-demand, it is possible to create custom content for each player based on player preferences. It is also possible to make use of the current world state during generation, something which cannot be done with current techniques. Using developers to create rules and assets for a content generator instead of creating content directly will lower development costs as well as reduce the development time for new game content to seconds rather than days. This work is part of the field of computational creativity, and involves the use of computers to create aesthetically pleasing game content, such as terrain, characters, and quests. I demonstrate agent-based terrain generation, and economic modeling of game spaces. I also demonstrate the autonomous generation of quests for online role playing games, and the ability to play these quests using an emulated Everquest server.
Modeling and Analysis of Intentional And Unintentional Security Vulnerabilities in a Mobile Platform
Mobile phones are one of the essential parts of modern life. Making a phone call is not the main purpose of a smart phone anymore, but merely one of many other features. Online social networking, chatting, short messaging, web browsing, navigating, and photography are some of the other features users enjoy in modern smartphones, most of which are provided by mobile apps. However, with this advancement, many security vulnerabilities have opened up in these devices. Malicious apps are a major threat for modern smartphones. According to Symantec Corp., by the middle of 2013, about 273,000 Android malware apps were identified. It is a complex issue to protect everyday users of mobile devices from the attacks of technologically competent hackers, illegitimate users, trolls, and eavesdroppers. This dissertation emphasizes the concept of intention identification. Then it looks into ways to utilize this intention identification concept to enforce security in a mobile phone platform. For instance, a battery monitoring app requiring SMS permissions indicates suspicious intention as battery monitoring usually does not need SMS permissions. Intention could be either the user's intention or the intention of an app. These intentions can be identified using their behavior or by using their source code. Regardless of the intention type, identifying it, evaluating it, and taking actions by using it to prevent any malicious intentions are the main goals of this research. The following four different security vulnerabilities are identified in this research: Malicious apps, spammers and lurkers in social networks, eavesdroppers in phone conversations, and compromised authentication. These four vulnerabilities are solved by detecting malware applications, identifying malicious users in a social network, enhancing the encryption system of a phone communication, and identifying user activities using electroencephalogram (EEG) for authentication. 
Each of these solutions is constructed using the idea of intention identification. Furthermore, many of …
SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction
Current unsupervised approaches for keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns which are not captured by the graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure in order to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
Improving Software Quality through Syntax and Semantics Verification of Requirements Models
Software defects can frequently be traced to poorly-specified requirements. Many software teams manage their requirements using tools such as checklists and databases, which lack a formal semantic mapping to system behavior. Such a mapping can be especially helpful for safety-critical systems. Another limitation of many requirements analysis methods is that much of the analysis must still be done manually. We propose techniques that automate portions of the requirements analysis process, as well as clarify the syntax and semantics of requirements models using a variety of methods, including machine learning tools and our own tool, VeriCCM. The machine learning tools used help us identify potential model elements and verify their correctness. VeriCCM, a formalized extension of the causal component model (CCM), uses formal methods to ensure that requirements are well-formed, as well as providing the beginnings of a full formal semantics. We also explore the use of statecharts to identify potential abnormal behaviors from a given set of requirements. At each stage, we perform empirical studies to evaluate the effectiveness of our proposed approaches.
Metamodeling-based Fast Optimization of Nanoscale AMS-SoCs
Modern consumer electronic systems are mostly based on analog and digital circuits and are designed as analog/mixed-signal systems on chip (AMS-SoCs). The integration of analog and digital circuits on the same die makes the system cost effective. In AMS-SoCs, analog and mixed-signal portions have not traditionally received much attention due to their complexity. As the fabrication technology advances, the simulations for AMS-SoC circuits become more complex and take significant amounts of time. The time allocated for the circuit design and optimization creates a need to reduce the simulation time. The time constraints placed on designers are imposed by the ever-shortening time to market and non-recurrent cost of the chip. This dissertation proposes the use of a novel method, called metamodeling, and intelligent optimization algorithms to reduce the design time. Metamodel-based ultra-fast design flows are proposed and investigated. Metamodel creation is a one-time process and relies on fast sampling through accurate parasitic-aware simulations. One of the targets of this dissertation is to minimize the sample size while retaining the accuracy of the model. In order to achieve this goal, different statistical sampling techniques are explored and applied to various AMS-SoC circuits. Also, different metamodel functions are explored for their accuracy and application to AMS-SoCs. Several different optimization algorithms are compared for global optimization accuracy and convergence. Three different AMS circuits that are present in many AMS-SoCs, a ring oscillator, an inductor-capacitor voltage-controlled oscillator (LC-VCO), and a phase locked loop (PLL), are used in this study for design flow application. Metamodels created in this dissertation provide accuracy with an error of less than 2% from the physical layout simulations. After optimal sampling investigation, metamodel functions and optimization algorithms are ranked in terms of speed and accuracy.
Experimental results show that the proposed design flow provides roughly 5,000x speedup over conventional design flows. Thus, …
Variability-Aware Low-Power Techniques for Nanoscale Mixed-Signal Circuits
New circuit design techniques that accommodate lower supply voltages necessary for portable systems need to be integrated into the semiconductor intellectual property (IP) core. Systems that once worked at 3.3 V or 2.5 V now need to work at 1.8 V or lower, without causing any performance degradation. Also, the fluctuation of device characteristics caused by process variation in nanometer technologies is seen as design yield loss. The numerous parasitic effects induced by layouts, especially for high-performance and high-speed circuits, pose a problem for IC design. Lack of exact layout information during circuit sizing leads to long design iterations involving time-consuming runs of complex tools. There is a strong need for low-power, high-performance, parasitic-aware and process-variation-tolerant circuit design. This dissertation proposes methodologies and techniques to achieve variability, power, performance, and parasitic-aware circuit designs. Three approaches are proposed: the single iteration automatic approach, the hybrid Monte Carlo and design of experiments (DOE) approach, and the corner-based approach. Widely used mixed-signal circuits such as analog-to-digital converter (ADC), voltage controlled oscillator (VCO), voltage level converter and active pixel sensor (APS) have been designed at nanoscale complementary metal oxide semiconductor (CMOS) and subjected to the proposed methodologies. The effectiveness of the proposed methodologies has been demonstrated through exhaustive simulations. Apart from these methodologies, the application of dual-oxide and dual-threshold techniques at circuit level in order to minimize power and leakage is also explored.
Incremental Learning with Large Datasets
This dissertation focuses on a novel learning strategy based on geometric support vector machines to address the difficulties of processing immense data sets. Because support vector machines find the hyper-plane that maximizes the margin between two classes and represent the decision boundary with a few training samples, they are a favorable choice for incremental learning. The dissertation presents a novel method, Geometric Incremental Support Vector Machines (GISVMs), to address both efficiency and accuracy issues in handling massive data sets. In GISVM, the skin of convex hulls is defined and an efficient method is designed to find the best skin approximation given available examples. The set of extreme points is found by recursively searching along the direction defined by a pair of known extreme points. By identifying the skin of the convex hulls, the incremental learning employs a much smaller number of samples with comparable or even better accuracy. When additional samples are provided, they are used together with the skin of the convex hull constructed from the previous dataset. This results in a small number of instances used in incremental steps of the training process. Based on the experimental results with synthetic data sets, public benchmark data sets from UCI, and endoscopy videos, it is evident that GISVM achieves satisfactory classifiers that closely model the underlying data distribution. GISVM improves sensitivity in the incremental steps, significantly reduces the demand for memory space, and demonstrates the ability to recover from temporary performance degradation.
Probabilistic Analysis of Contracting Ebola Virus Using Contextual Intelligence
The outbreak of the Ebola virus was declared a Public Health Emergency of International Concern by the World Health Organisation (WHO). Due to the complex nature of the outbreak, the Centers for Disease Control and Prevention (CDC) had created interim guidance for monitoring people potentially exposed to Ebola and for evaluating their intended travel and restricting the movements of carriers when needed. Tools to evaluate the risk of individuals and groups of individuals contracting the disease could mitigate the growing anxiety and fear. The goal is to understand and analyze the nature of risk an individual would face when he/she comes in contact with a carrier. This thesis presents a tool that makes use of contextual data intelligence to predict the risk factor of individuals who come in contact with the carrier.
Machine-Learning-Enabled Cooperative Perception on Connected Autonomous Vehicles
The main research objective of this dissertation is to understand the sensing and communication challenges to achieving cooperative perception among autonomous vehicles and then, using the insights gained, to guide the design of a suitable format for the data to be exchanged and of reliable and efficient data fusion algorithms on vehicles. By understanding what data are exchanged among autonomous vehicles and how, from a machine learning perspective, it is possible to realize precise cooperative perception on autonomous vehicles, enabling massive amounts of sensor information to be shared among vehicles. I first discuss trustworthy sharing of perception information on connected and autonomous vehicles. I then discuss how to achieve effective cooperative perception on autonomous vehicles by exchanging feature maps among vehicles. In the last methodology part, I propose a set of mechanisms that improve this solution by reducing the amount of data transmitted in the network, to achieve efficient cooperative perception. The effectiveness and efficiency of these mechanisms are analyzed and discussed.
Spatial Partitioning Algorithms for Solving Location-Allocation Problems
This dissertation presents spatial partitioning algorithms to solve location-allocation problems. Location-allocation problems pertain to both the selection of facilities to serve demand at demand points and the assignment of demand points to the selected or known facilities. In the first part of this dissertation, we focus on the well-known and well-researched location-allocation problem, the "p-median problem", which is a distance-based location-allocation problem that involves selection and allocation of p facilities for n demand points. We evaluate the performance of existing p-median heuristic algorithms and investigate the impact of the scale of the problem and the spatial distribution of demand points on the performance of these algorithms. Based on the results from this comparative study, we present guidelines for location analysts to aid them in selecting the best heuristic and corresponding parameters depending on the problem at hand. Additionally, we found that existing heuristic algorithms are not suitable for solving large-scale p-median problems in a reasonable amount of time. We present a density-based decomposition methodology to solve large-scale p-median problems efficiently. This algorithm identifies dense clusters in the region and uses a MapReduce procedure to select facilities in the clustered regions independently and combine the solutions from the subproblems. Lastly, we present a novel greedy heuristic algorithm to solve the contiguity-constrained fixed facility demand distribution problem. The objective of this problem is to create contiguous service areas for the facilities such that the demand at all facilities is uniform or proportional to the available resources, while the distance between demand points and facilities is minimized. The results in this research are shown in the context of creating emergency response plans for bio-emergencies.
The algorithms are used to select Point of Dispensing (POD) locations (if not known) and map them to population regions to ensure that all affected individuals are …
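To make the p-median problem named in this abstract concrete, here is a small sketch of a textbook greedy-add heuristic: open one facility at a time, always choosing the candidate that lowers the total demand-to-nearest-facility distance the most. This is an illustrative assumption, not the density-based decomposition or any specific heuristic evaluated in the dissertation; the function names and the Euclidean distance are likewise hypothetical choices.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def total_cost(demand: Sequence[Point], facilities: Sequence[Point]) -> float:
    """Sum over demand points of the distance to the nearest open facility."""
    return sum(min(dist(d, f) for f in facilities) for d in demand)


def greedy_p_median(demand: Sequence[Point],
                    candidates: Sequence[Point],
                    p: int) -> List[Point]:
    """Select p facilities by repeatedly adding the candidate with the lowest resulting cost."""
    if not 0 < p <= len(candidates):
        raise ValueError("p must be between 1 and the number of candidates")
    chosen: List[Point] = []
    remaining = list(candidates)
    for _ in range(p):
        best = min(remaining, key=lambda c: total_cost(demand, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def allocate(demand: Sequence[Point], facilities: Sequence[Point]) -> List[int]:
    """Assign each demand point the index of its nearest facility."""
    return [min(range(len(facilities)), key=lambda j: dist(d, facilities[j]))
            for d in demand]


if __name__ == "__main__":
    demand_points = [(0, 0), (1, 0), (10, 0), (11, 0)]
    sites = greedy_p_median(demand_points, demand_points, p=2)
    print(sites, allocate(demand_points, sites), total_cost(demand_points, sites))
```

Each greedy step rescans every remaining candidate against every demand point, which hints at why plain heuristics of this kind become slow on large instances.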
Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction
Automatic text summarization and keyphrase extraction are two interesting areas of research which extend along natural language processing and information retrieval. They have recently become very popular because of their wide applicability. Devising generic techniques for these tasks is challenging due to several issues. Yet we have a good number of intelligent systems performing the tasks. As different systems are designed with different perspectives, evaluating their performances with a generic strategy is crucial. It has also become immensely important to evaluate the performances with minimal human effort. In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution which challenges the traditional approach of working with an absolute scale. We consider the impact of some of the environment variables (length of the document, references, and system-generated outputs) on the performance. Instead of defining some rigid lengths, we show how to adjust to their variations. We prove a mathematically sound baseline that should work for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of the structures (sentences). We also propose defining an equivalence class for each unit (e.g. word) instead of the exact string matching strategy. We show an evaluation approach that considers the weighted relatedness of multiple references to adjust to the degree of disagreements between the gold standards. We publish the proposed approach as a free tool so that other systems can use it. We have also accumulated a dataset (scientific articles) with a reference summary and keyphrases for each document. Our approach is applicable not only for evaluating single-document based tasks but also for evaluating multiple-document based tasks. 
We have tested our evaluation method for three intrinsic tasks (taken from DUC 2004 conference), and in all three cases, it correlates positively with ROUGE. Based on our experiments …
A Multi-Modal Insider Threat Detection and Prevention based on Users' Behaviors
Insider threat is one of the greatest concerns for information security and could cause more significant financial losses and damages than any other attack. However, implementing an efficient detection system is a very challenging task. It has long been recognized that solutions to insider threats are mainly user-centric, and several psychological and psychosocial models have been proposed. A user's psychophysiological behavior measures can provide an excellent source of information for detecting a user's malicious behaviors and mitigating insider threats. In this dissertation, we propose a multi-modal framework based on the user's psychophysiological measures and computer-based behaviors to distinguish between a user's behaviors during regular activities and during malicious activities. We utilize several psychophysiological measures such as electroencephalogram (EEG), electrocardiogram (ECG), and eye movement and pupil behaviors, along with computer-based behaviors such as mouse movement dynamics and keystroke dynamics, to build our framework for detecting malicious insiders. We conduct human subject experiments to capture the psychophysiological measures and the computer-based behaviors for a group of participants while they perform several computer-based activities in different scenarios. We analyze the behavioral measures, extract useful features, and evaluate their capability in detecting insider threats. We investigate each measure separately, then we use data fusion techniques to build two modules and a comprehensive multi-modal framework. The first module combines the synchronized EEG and ECG psychophysiological measures, and the second module combines the eye movement and pupil behaviors with the computer-based behaviors to detect the malicious insiders. The multi-modal framework utilizes all the measures and behaviors in one model to achieve better detection accuracy.
Our findings demonstrate that psychophysiological measures can reveal valuable knowledge about a user's malicious intent and can be used as an effective indicator in designing insider threat monitoring and detection frameworks. Our work lays out the necessary foundation to establish a new generation …
Validation and Evaluation of Emergency Response Plans through Agent-Based Modeling and Simulation
Biological emergency response planning plays a critical role in protecting the public from possible devastating results of sudden disease outbreaks. These plans describe the distribution of medical countermeasures across a region using limited resources within a restricted time window. Thus, the ability to determine that such a plan will be feasible, i.e. successfully provide service to affected populations within the time limit, is crucial. Many of the current efforts to validate plans are in the form of live drills and training, but those may not test plan activation at the appropriate scale or with sufficient numbers of participants. This necessitates the use of computational resources to aid emergency managers and planners in developing and evaluating plans before they must be used. Current emergency response plan generation software packages, such as RE-PLAN or RealOpt, provide rate-based validation analyses. However, these types of analysis may neglect details of real-world traffic dynamics. Therefore, this dissertation presents Validating Emergency Response Plan Execution Through Simulation (VERPETS), a novel computational system for the agent-based simulation of biological emergency response plan activation. This system converts raw road network, population distribution, and emergency response plan data into a format suitable for simulation, and then performs these simulations using SUMO (Simulation of Urban MObility) to simulate realistic traffic dynamics. Additionally, high performance computing methodologies were utilized to decrease agent load on simulations and improve performance. Further strategies, such as the use of agent scaling and a time limit on simulation execution, were also examined. Experimental results indicate that the time to plan completion, i.e. the time when all individuals of the population have received medication, determined by VERPETS aligned well with current alternate methodologies.
It was determined that the dynamic of traffic congestion at the POD itself was one of the major factors affecting the completion time of …
Privacy Preserving Machine Learning as a Service
Machine learning algorithms based on neural networks have achieved remarkable results and are being extensively used in different domains. However, machine learning algorithms require access to raw data, which is often privacy sensitive. To address this issue, we develop new techniques to provide solutions for running deep neural networks over encrypted data. In this paper, we develop new techniques to adopt deep neural networks within the practical limitations of current homomorphic encryption schemes. We focus on training and classification of the well-known neural networks and convolutional neural networks. First, we design methods for approximating the activation functions commonly used in CNNs (i.e. ReLU, Sigmoid, and Tanh) with low-degree polynomials, which is essential for efficient homomorphic encryption schemes. Then, we train neural networks with the approximation polynomials instead of the original activation functions and analyze the performance of the models. Finally, we implement neural networks and convolutional neural networks over encrypted data and measure the performance of the models.
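The polynomial approximation step can be illustrated with a small sketch. It assumes an ordinary least-squares fit of a cubic to the sigmoid on the interval [-4, 4]; the abstract does not say which approximation method, degree, or interval the author uses, so these choices and all function names below are hypothetical. The polynomial matters because widely used homomorphic encryption schemes evaluate additions and multiplications on ciphertexts, which a polynomial needs and nothing more.

```python
import math
from typing import Callable, List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(f: Callable[[float], float], degree: int,
                   lo: float, hi: float, samples: int = 201) -> List[float]:
    """Least-squares coefficients c[0] + c[1] x + ... fitted on an even grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    k = degree + 1
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    vtv = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    vty = [sum(f(x) * x ** i for x in xs) for i in range(k)]
    return solve(vtv, vty)


def evaluate(coeffs: List[float], x: float) -> float:
    """Horner evaluation, using only additions and multiplications."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def max_error(f: Callable[[float], float], coeffs: List[float],
              lo: float, hi: float, samples: int = 401) -> float:
    """Largest absolute gap between f and the polynomial on an even grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - evaluate(coeffs, x)) for x in xs)


if __name__ == "__main__":
    cubic = fit_polynomial(sigmoid, 3, -4.0, 4.0)
    print("coefficients:", cubic)
    print("max error on [-4, 4]:", max_error(sigmoid, cubic, -4.0, 4.0))
```

The fit is only trustworthy inside the chosen interval; outside it a low-degree polynomial diverges from the sigmoid, which is one reason such networks are typically retrained with the polynomial in place, as the abstract describes.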
An Extensible Computing Architecture Design for Connected Autonomous Vehicle System
Autonomous vehicles have made milestone strides within the past decade. Advances up the autonomy ladder have come in lock-step with advances in machine learning, namely deep-learning algorithms and huge, open training sets. And while advances in CPUs have slowed, GPUs have edged into the previous decade's TOP 500 supercomputer territory. This new class of GPUs includes novel deep-learning hardware that has essentially side-stepped Moore's law, outpacing the doubling observation by a factor of ten. While GPUs have made record progress, networks do not follow Moore's law and are restricted by several bottlenecks, from protocol-based latency lower bounds to the very laws of physics. In a way, the bottlenecks that plague modern networks gave rise to Edge computing, a key component of the Connected Autonomous Vehicle system, as the need for low latency in some domains eclipsed the need for massive processing farms. The Connected Autonomous Vehicle ecosystem is one of the most complicated environments in all of computing. Not only is the hardware scaled all the way from 16- and 32-bit microcontrollers to multi-CPU Edge nodes and multi-GPU Cloud servers, but the networking also encompasses the gamut of modern communication transports. I propose a framework for negotiating, encapsulating, and transferring data between vehicles, ensuring efficient bandwidth utilization and respecting real-time privacy levels.
Understanding and Reasoning with Negation
In this dissertation, I start with an analysis of negation in eleven benchmark corpora covering six Natural Language Understanding (NLU) tasks. With a thorough investigation, I first show that (a) these benchmarks contain fewer negations compared to general-purpose English and (b) the few negations they contain are often unimportant. Further, my empirical studies demonstrate that state-of-the-art transformers trained using these corpora obtain substantially worse results with the instances that contain negation, especially if the negations are important. Second, I investigate whether translating negation is also an issue for modern machine translation (MT) systems. My studies find that indeed the presence of negation can significantly impact translation quality, in some cases resulting in reductions of over 60%. In light of these findings, I investigate strategies to better understand the semantics of negation. I start with identifying the focus of negation. I develop a neural model that takes into account the scope of negation, context from neighboring sentences, or both. My best proposed system obtains an accuracy improvement of 7.4% over prior work. Further, I analyze the main error categories of the systems through a detailed error analysis. Next, I explore more practical ways to understand the semantics of negation. I consider revealing the meaning of negation by revealing their affirmative interpretations. First, I propose a question-answer driven approach to create AFIN, a collection of 3,001 sentences with verbal negations and their affirmative interpretations. Then, I present an automated procedure to collect pairs of sentences with negation and their affirmative interpretations, resulting in over 150,000 pairs. 
Experimental results demonstrate that leveraging these pairs helps (a) a T5 system generate affirmative interpretations from negations in AFIN and (b) state-of-the-art transformers solve natural language understanding tasks, including natural language inference and sentiment analysis. Furthermore, I develop a plug-and-play affirmative interpretation generator that is potentially …
Enhancing Storage Dependability and Computing Energy Efficiency for Large-Scale High Performance Computing Systems
With the advent of the information explosion age, larger-capacity disk drives are used to store data and powerful devices are used to process big data. As the scale and complexity of computer systems increase, we expect these systems to provide dependable and energy-efficient services and computation. Although hard drives are reliable in general, they are the most commonly replaced hardware components. Disk failures cause data corruption and even data loss, which can significantly degrade system performance and cause financial losses. In this dissertation research, I analyze different manifestations of disk failures in production data centers and explore data mining techniques combined with statistical analysis methods to discover categories of disk failures and their distinctive properties. I use similarity measures to quantify the degradation process of each failure type and derive the degradation signature. The derived degradation signatures are further leveraged to forecast when future disk failures may happen. Meanwhile, this dissertation also studies the energy efficiency of high performance computers. Specifically, I characterize the power and energy consumption of Haswell processors, which are used in multiple supercomputers, and analyze the power and energy consumption of Legion, a data-centric programming model and runtime system, and of Legion applications. We find that power and energy efficiency can be improved significantly by optimizing the settings and runtime scheduling of processors, and that the Legion runtime performs well for larger-scale computation in terms of power and energy consumption.
Models to Combat Email Spam Botnets and Unwanted Phone Calls
With the amount of email spam received these days, it is hard to imagine that spammers act individually. Nowadays, most spam emails are sent from a collection of compromised machines controlled by spammers. These compromised computers are often called bots, and spammers use them to send massive volumes of spam within a short period of time. The motivation of this work is to understand and analyze the behavior of spammers through a large collection of spam mails. My research examined a data set collected over a 2.5-year period and developed an algorithm that extracts the botnet features and then classifies them into various groups. Principal component analysis was used to study the association patterns of groups of spammers and the individual behavior of a spammer in a given domain. This is based on the features which capture maximum variance of information we have clustered. Presence information is a growing tool towards more efficient communication and providing new services and features within a business setting and much more. The main contribution of my thesis is a willingness estimator that can estimate the callee's willingness without his/her involvement; the model estimates the willingness level based on call history. Finally, the accuracy of the proposed willingness estimator is validated with actual call logs.