In this research, multi-perspective image registration using LiDAR and visual images was considered. 2D-3D image registration is a difficult task because it requires the extraction of different semantic features from each modality. This problem is solved in three parts. The first step involves detection and extraction of common features from each of the data sets. The second step consists of associating the common features between two different modalities. Traditional methods use lines or orthogonal corners as common features. The third step consists of building the projection matrix. Many existing methods use global positing system (GPS) or inertial navigation system (INS) for an initial estimate of the camera pose. However, the approach discussed herein does not use GPS, INS, or any such devices for initial estimate; hence the model can be used in places like the lunar surface or Mars where GPS or INS are not available. A variation of the method is also described, which does not require strong features from both images but rather uses intensity gradients in the image. This can be useful when one image does not have strong features (such as lines) or there are too many extraneous features.
Future generation cellular networks are expected to deliver an omnipresent broadband access network for an endlessly increasing number of subscribers. Long term Evolution (LTE) represents a significant milestone towards wireless networks known as 4G cellular networks. A key feature of LTE is the implementation of enhanced Radio Resource Management (RRM) mechanism to improve the system performance. The structure of LTE networks was simplified by diminishing the number of the nodes of the core network. Also, the design of the radio protocol architecture is quite unique. In order to achieve high data rate in LTE, 3rd Generation Partnership Project (3GPP) has selected Orthogonal Frequency Division Multiplexing (OFDM) as an appropriate scheme in terms of downlinks. However, the proper scheme for an uplink is the Single-Carrier Frequency Domain Multiple Access due to the peak-to-average-power-ratio (PAPR) constraint. LTE packet scheduling plays a primary role as part of RRM to improve the system’s data rate as well as supporting various QoS requirements of mobile services. The major function of the LTE packet scheduler is to assign Physical Resource Blocks (PRBs) to mobile User Equipment (UE). In our work, we formed a proposed packet scheduler algorithm. The proposed scheduler algorithm acts based on the number of UEs attached to the eNodeB. To evaluate the proposed scheduler algorithm, we assumed two different scenarios based on a number of UEs. When the number of UE is lower than the number of PRBs, the UEs with highest Channel Quality Indicator (CQI) will be assigned PRBs. Otherwise, the scheduler will assign PRBs based on a given proportional fairness metric. The eNodeB’s throughput is increased when the proposed algorithm was implemented.
In recent years, the collection of genomic data has skyrocketed and databases of genomic data are growing at a faster rate than ever before. Although many computational methods have been developed to interpret these data, they tend to struggle to process the ever increasing file sizes that are being produced and fail to take advantage of the advances in multi-core processors by using parallel processing. In some instances, loss of accuracy has been a necessary trade off to allow faster computation of the data. This thesis discusses one such algorithm that has been developed and how changes were made to allow larger input file sizes and reduce the time required to achieve a result without sacrificing accuracy. An information entropy based algorithm was used as a basis to demonstrate these techniques. The algorithm dissects the distinctive patterns underlying genomic data efficiently requiring no a priori knowledge, and thus is applicable in a variety of biological research applications. This research describes how parallel processing and object-oriented programming techniques were used to process larger files in less time and achieve a more accurate result from the algorithm. Through object oriented techniques, the maximum allowable input file size was significantly increased from 200 mb to 2000 mb. Using parallel processing techniques allowed the program to finish processing data in less than half the time of the sequential version. The accuracy of the algorithm was improved by reducing data loss throughout the algorithm. Finally, adding user-friendly options enabled the program to use requests more effectively and further customize the logic used within the algorithm.
Wireless sensor networks are composed of sensor nodes, which can monitor an environment and observe events of interest. These networks are applied in various fields including but not limited to environmental, industrial and habitat monitoring. In many applications, the exact location of the sensor nodes is unknown after deployment. Localization is a process used to find sensor node's positional coordinates, which is vital information. The localization is generally assisted by anchor nodes that are also sensor nodes but with known locations. Anchor nodes generally are expensive and need to be optimally placed for effective localization. Passive localization is one of the localization techniques where the sensor nodes silently listen to the global events like thunder sounds, seismic waves, lighting, etc. According to previous studies, the ideal location to place anchor nodes was on the perimeter of the sensor network. This may not be the case in passive localization, since the function of anchor nodes here is different than the anchor nodes used in other localization systems. I do extensive studies on positioning anchor nodes for effective localization. Several simulations are run in dense and sparse networks for proper positioning of anchor nodes. I show that, for effective passive localization, the optimal placement of the anchor nodes is at the center of the network in such a way that no three anchor nodes share linearity. The more the non-linearity, the better the localization. The localization for our network design proves better when I place anchor nodes at right angles.
Two applications of a binary tree data type based on a simple pairing function (a bijection between natural numbers and pairs of natural numbers) are explored. First, the tree is used to encode natural numbers, and algorithms that perform basic arithmetic computations are presented along with formal proofs of their correctness. Second, using this "canonical" representation as a base type, algorithms for encoding and decoding additional isomorphic data types of other mathematical constructs (sets, sequences, etc.) are also developed. An experimental application to a memory management system is constructed and explored using these isomorphic types. A practical analysis of this system's runtime complexity and space savings are provided, along with a proof of concept framework for both applications of the binary tree type, in the Java programming language.
This thesis explores the classification of emotions in song lyrics, using automatic approaches applied to a novel corpus of 100 popular songs. I use crowd sourcing via Amazon Mechanical Turk to collect line-level emotions annotations for this collection of song lyrics. I then build classifiers that rely on textual features to automatically identify the presence of one or more of the following six Ekman emotions: anger, disgust, fear, joy, sadness and surprise. I compare different classification systems and evaluate the performance of the automatic systems against the manual annotations. I also introduce a system that uses data collected from the social network Twitter. I use the Twitter API to collect a large corpus of tweets manually labeled by their authors for one of the six emotions of interest. I then compare the classification of emotions obtained when training on data automatically collected from Twitter versus data obtained through crowd sourced annotations.
Mobile applications are growing in popularity and pose new problems in the area of software testing. In particular, mobile applications heavily depend upon user interactions and a dynamically changing environment of system events. In this thesis, we focus on user-driven events and use Q-learning, a reinforcement machine learning algorithm, to generate tests for Android applications under test (AUT). We implement a framework that automates the generation of GUI test cases by using our Q-learning approach and compare it to a uniform random (UR) implementation. A novel feature of our approach is that we generate user-driven event sequences through the GUI, without the source code or the model of the AUT. Hence, considerable amount of cost and time are saved by avoiding the need for model generation for generating the tests. Our results show that the systematic path exploration used by Q-learning results in higher average code coverage in comparison to the uniform random approach.
Shadows in indoor scenarios are usually characterized with multiple light sources that produce complex shadow patterns of a single object. Without removing shadow, the foreground object tends to be erroneously segmented. The inconsistent hue and intensity of shadows make automatic removal a challenging task. In this thesis, a dynamic thresholding and transfer learning-based method for removing shadows is proposed. The method suppresses light shadows with a dynamically computed threshold and removes dark shadows using an online learning strategy that is built upon a base classifier trained with manually annotated examples and refined with the automatically identified examples in the new videos. Experimental results demonstrate that despite variation of lighting conditions in videos our proposed method is able to adapt to the videos and remove shadows effectively. The sensitivity of shadow detection changes slightly with different confidence levels used in example selection for classifier retraining and high confidence level usually yields better performance with less retraining iterations.
Globally distributed software teams are widespread throughout industry. But finding reliable methods that can properly assess a team's activities is a real challenge. Methods such as surveys and manual coding of activities are too time consuming and are often unreliable. Recent advances in information retrieval and linguistics, however, suggest that automated and/or semi-automated text classification algorithms could be an effective way of finding differences in the communication patterns among individuals and groups. Communication among group members is frequent and generates a significant amount of data. Thus having a web-based tool that can automatically analyze the communication patterns among global software teams could lead to a better understanding of group performance. The goal of this thesis, therefore, is to compare automatic and semi-automatic measures of communication and evaluate their effectiveness in classifying different types of group activities that occur within a global software development project. In order to achieve this goal, we developed a web-based component that can be used to help clean and classify communication activities. The component was then used to compare different automated text classification techniques on various group activities to determine their effectiveness in correctly classifying data from a global software development team project.
The increasingly popular cloud-computing paradigm provides on-demand access to computing and storage with the appearance of unlimited resources. Users are given access to a variety of data and software utilities to manage their work. Users rent virtual resources and pay for only what they use. In spite of the many benefits that cloud computing promises, the lack of dependability in shared virtualized infrastructures is a major obstacle for its wider adoption, especially for mission-critical applications. Virtualization and multi-tenancy increase system complexity and dynamicity. They introduce new sources of failure degrading the dependability of cloud computing systems. To assure cloud dependability, in my dissertation research, I develop autonomic failure identification and diagnosis techniques that are crucial for understanding emergent, cloud-wide phenomena and self-managing resource burdens for cloud availability and productivity enhancement. We study the runtime cloud performance data collected from a cloud test-bed and by using traces from production cloud systems. We define cloud signatures including those metrics that are most relevant to failure instances. We exploit profiled cloud performance data in both time and frequency domain to identify anomalous cloud behaviors and leverage cloud metric subspace analysis to automate the diagnosis of observed failures. We implement a prototype of the anomaly identification system and conduct the experiments in an on-campus cloud computing test-bed and by using the Google datacenter traces. Our experimental results show that our proposed anomaly detection mechanism can achieve 93% detection sensitivity while keeping the false positive rate as low as 6.1% and outperform other tested anomaly detection schemes. In addition, the anomaly detector adapts itself by recursively learning from these newly verified detection results to refine future detection.
In many real-world applications, it is common to have uneven number of examples among multiple classes. The data imbalance, however, usually complicates the learning process, especially for the minority classes, and results in deteriorated performance. Boosting methods were proposed to handle the imbalance problem. These methods need elongated training time and require diversity among the classifiers of the ensemble to achieve improved performance. Additionally, extending the boosting method to handle multi-class data sets is not straightforward. Examples of applications that suffer from imbalanced multi-class data can be found in face recognition, where tens of classes exist, and in capsule endoscopy, which suffers massive imbalance between the classes. This dissertation introduces RegBoost, a new boosting framework to address the imbalanced, multi-class problems. This method applies a weighted stratified sampling technique and incorporates a regularization term that accommodates multi-class data sets and automatically determines the error bound of each base classifier. The regularization parameter penalizes the classifier when it misclassifies instances that were correctly classified in the previous iteration. The parameter additionally reduces the bias towards majority classes. Experiments are conducted using 12 diverse data sets with moderate to high imbalance ratios. The results demonstrate superior performance of the proposed method compared to several state-of-the-art algorithms for imbalanced, multi-class classification problems. More importantly, the sensitivity improvement of the minority classes using RegBoost is accompanied with the improvement of the overall accuracy for all classes. With unpredictability regularization, a diverse group of classifiers are created and the maximum accuracy improvement reaches above 24%. Using stratified undersampling, RegBoost exhibits the best efficiency. The reduction in computational cost is significant reaching above 50%. As the volume of training data increase, the gain of efficiency with the proposed method becomes more significant.
In recent years, brain computer interfaces (BCIs) have gained popularity in non-medical domains such as the gaming, entertainment, personal health, and marketing industries. A growing number of companies offer various inexpensive consumer grade BCIs and some of these companies have recently introduced the concept of BCI "App stores" in order to facilitate the expansion of BCI applications and provide software development kits (SDKs) for other developers to create new applications for their devices. The BCI applications access to users' unique brainwave signals, which consequently allows them to make inferences about users' thoughts and mental processes. Since there are no specific standards that govern the development of BCI applications, its users are at the risk of privacy breaches. In this work, we perform first comprehensive analysis of BCI App stores including software development kits (SDKs), application programming interfaces (APIs), and BCI applications w.r.t privacy issues. The goal is to understand the way brainwave signals are handled by BCI applications and what threats to the privacy of users exist. Our findings show that most applications have unrestricted access to users' brainwave signals and can easily extract private information about their users without them even noticing. We discuss potential privacy threats posed by current practices used in BCI App stores and then describe some countermeasures that could be used to mitigate the privacy threats. Also, develop a prototype which gives the BCI app users a choice to restrict their brain signal dynamically.
We have a huge amount of video data from extensively available surveillance cameras and increasingly growing technology to record the motion of a moving object in the form of trajectory data. With proliferation of location-enabled devices and ongoing growth in smartphone penetration as well as advancements in exploiting image processing techniques, tracking moving objects is more flawlessly achievable. In this work, we explore some domain-independent qualitative and quantitative features in raw trajectory (spatio-temporal) data in videos captured by a fixed single wide-angle view camera sensor in outdoor areas. We study the efficacy of those features in classifying four basic high level actions by employing two supervised learning algorithms and show how each of the features affect the learning algorithms’ overall accuracy as a single factor or confounded with others.
Publicly available datasets in health science are often large and observational, in contrast to experimental datasets where a small number of data are collected in controlled experiments. Variables' causal relationships in the observational dataset are yet to be determined. However, there is a significant interest in health science to discover and analyze causal relationships from health data since identified causal relationships will greatly facilitate medical professionals to prevent diseases or to mitigate the negative effects of the disease. Recent advances in Computer Science, particularly in Bayesian networks, has initiated a renewed interest for causality research. Causal relationships can be possibly discovered through learning the network structures from data. However, the number of candidate graphs grows in a more than exponential rate with the increase of variables. Exact learning for obtaining the optimal structure is thus computationally infeasible in practice. As a result, heuristic approaches are imperative to alleviate the difficulty of computations. This research provides effective and efficient learning tools for local causal discoveries and novel methods of learning causal structures with a combination of background knowledge. Specifically in the direction of constraint based structural learning, polynomial-time algorithms for constructing causal structures are designed with first-order conditional independence. Algorithms of efficiently discovering non-causal factors are developed and proved. In addition, when the background knowledge is partially known, methods of graph decomposition are provided so as to reduce the number of conditioned variables. Experiments on both synthetic data and real epidemiological data indicate the provided methods are applicable to large-scale datasets and scalable for causal analysis in health data. Followed by the research methods and experiments, this dissertation gives thoughtful discussions on the reliability of causal discoveries computational health science research, complexity, and implications in health science research.
POD (Point of Dispensing)-based emergency response plans involving mass prophylaxis may seem feasible when considering the choice of dispensing points within a region, overall population density, and estimated traffic demands. However, the plan may fail to serve particular vulnerable sub-populations, resulting in access disparities during emergency response. Federal authorities emphasize on the need to identify sub-populations that cannot avail regular services during an emergency due to their special needs to ensure effective response. Vulnerable individuals require the targeted allocation of appropriate resources to serve their special needs. Devising schemes to address the needs of vulnerable sub-populations is essential for the effectiveness of response plans. This research focuses on data-driven computational methods to quantify and address vulnerabilities in response plans that require the allocation of targeted resources. Data-driven methods to identify and quantify vulnerabilities in response plans are developed as part of this research. Addressing vulnerabilities requires the targeted allocation of appropriate resources to PODs. The problem of resource allocation to PODs during public health emergencies is introduced and the variants of the resource allocation problem such as the spatial allocation, spatio-temporal allocation and optimal resource subset variants are formulated. Generating optimal resource allocation and scheduling solutions can be computationally hard problems. The application of metaheuristic techniques to find near-optimal solutions to the resource allocation problem in response plans is investigated. A vulnerability analysis and resource allocation framework that facilitates the demographic analysis of population data in the context of response plans, and the optimal allocation of resources with respect to the analysis are described.
Blood pressure is vital sign information that physicians often need as preliminary data for immediate intervention during emergency situations or for regular monitoring of people with cardiovascular diseases. Despite the availability of portable blood pressure meters in the market, they are not regularly carried by people, creating a need for an ultra-portable measurement platform or device that can be easily carried and used at all times. One such device is the smartphone which, according to comScore survey is used by 26.2% of the US adult population. the mass production of these phones with built-in sensors and high computation power has created numerous possibilities for application development in different domains including biomedical. Motivated by this capability and their extensive usage, this thesis focuses on developing a blood pressure measurement platform on smartphones. Specifically, I developed a blood pressure measurement system on a smart phone using the built-in camera and a customized external microphone. the system consists of first obtaining heart beats using the microphone and finger pulse with the camera, and finally calculating the blood pressure using the recorded data. I developed techniques for finding the best location for obtaining the data, making the system usable by all categories of people. the proposed system resulted in accuracies between 90-100%, when compared to traditional blood pressure meters. the second part of this thesis presents a new system for remote heart beat monitoring using the smart phone. with the proposed system, heart beats can be transferred live by patients and monitored by physicians remotely for diagnosis. the proposed blood pressure measurement and remote monitoring systems will be able to facilitate information acquisition and decision making by the 9-1-1 operators.
Managing large-scale dynamical systems (e.g., transportation systems, complex information systems, and power networks, etc.) in real-time is very challenging considering their complicated system dynamics, intricate network interactions, large scale, and especially the existence of various uncertainties. To address this issue, intelligent techniques which can quickly design decision-making strategies that are robust to uncertainties are needed. This dissertation aims to conquer these challenges by exploring a data-driven decision-making framework, which leverages big-data techniques and scalable uncertainty evaluation approaches to quickly solve optimal control problems. In particular, following techniques have been developed along this direction: 1) system modeling approaches to simplify the system analysis and design procedures for multiple applications; 2) effective simulation and analytical based approaches to efficiently evaluate system performance and design control strategies under uncertainty; and 3) big-data techniques that allow some computations of control strategies to be completed offline. These techniques and tools for analysis, design and control contribute to a wide range of applications including air traffic flow management, complex information systems, and airborne networks.
Mobile phone advancements and ubiquitous internet connectivity are resulting in ever expanding possibilities in the application of smart phones. Users of mobile phones are now capable of hosting server applications from their personal devices. Whether providing services individually or in an ad hoc network setting the devices are currently not configured for defending against distributed denial of service (DDoS) attacks. These attacks, often launched from a botnet, have existed in the space of personal computing for decades but recently have begun showing up on mobile devices. Research is done first into the required steps to develop a potential botnet on the Android platform. This includes testing for the amount of malicious traffic an Android phone would be capable of generating for a DDoS attack. On the other end of the spectrum is the need of mobile devices running networked applications to develop security against DDoS attacks. For this mobile, phones are setup, with web servers running Apache to simulate users running internet connected applications for either local ad hoc networks or serving to the internet. Testing is done for the viability of using commonly available modules developed for Apache and intended for servers as well as finding baseline capabilities of mobiles to handle higher traffic volumes. Given the unique challenge of the limited resources a mobile phone can dedicate to Apache when compared to a dedicated hosting server a new method was needed. A proposed defense algorithm is developed for mitigating DDoS attacks against the mobile server that takes into account the limited resources available on the mobile device. The algorithm is tested against TCP socket flooding for effectiveness and shown to perform better than the common Apache module installations on a mobile device.
Free and fair elections are the basis for democracy, but conducting elections is not an easy task. Different groups of people are trying to influence the outcome of the election in their favor using the range of methods, from campaigning for a particular candidate to well-financed lobbying. Often the stakes are too high, and the methods are illegal. Two main properties of any voting scheme are the privacy of a voter’s choice and the integrity of the tally. Unfortunately, they are mutually exclusive. Integrity requires making elections transparent and auditable, but at the same time, we must preserve a voter’s privacy. It is always a trade-off between these two requirements. Current voting schemes favor privacy over auditability, and thus, they are vulnerable to voting fraud. I propose two novel voting systems that can achieve both privacy and verifiability. The first protocol is based on cryptographical primitives to ensure the integrity of the final tally and privacy of the voter. The second protocol is a simple paper-based voting scheme that achieves almost the same level of security without usage of cryptography.
Environmental monitoring represents a major application domain for wireless sensor networks (WSN). However, despite significant advances in recent years, there are still many challenging issues to be addressed to exploit the full potential of the emerging WSN technology. In this dissertation, we introduce the design and implementation of low-power wireless sensor networks for long-term, autonomous, and near-real-time environmental monitoring applications. We have developed an out-of-box solution consisting of a suite of software, protocols and algorithms to provide reliable data collection with extremely low power consumption. Two wireless sensor networks based on the proposed solution have been deployed in remote field stations to monitor soil moisture along with other environmental parameters. As parts of the ever-growing environmental monitoring cyberinfrastructure, these networks have been integrated into the Texas Environmental Observatory system for long-term operation. Environmental measurement and network performance results are presented to demonstrate the capability, reliability and energy-efficiency of the network.
The recent growth in sensor technology allows easier information gathering in real-time as sensors have grown smaller, more accurate, and less expensive. The resulting data is often in a geo-stream format continuously changing input with a spatial extent. Researchers developing geo-streaming management systems (GSMS) require a benchmark system for evaluation, which is currently lacking. This thesis presents GSMark, a benchmark for evaluating GSMSs. GSMark provides a data generator that creates a combination of synthetic and real geo-streaming data, a workload simulator to present the data to the GSMS as a data stream, and a set of benchmark queries that evaluate typical GSMS functionality and query performance. In particular, GSMark generates both moving points and evolving spatial regions, two fundamental data types for a broad range of geo-stream applications, and the geo-streaming queries on this data.
There are several types of disorders that affect our colon’s ability to function properly such as colorectal cancer, ulcerative colitis, diverticulitis, irritable bowel syndrome and colonic polyps. Automatic detection of these diseases would inform the endoscopist of possible sub-optimal inspection during the colonoscopy procedure as well as save time during post-procedure evaluation. But existing systems only detects few of those disorders like colonic polyps. In this dissertation, we address the automatic detection of another important disorder called ulcerative colitis. We propose a novel texture feature extraction technique to detect the severity of ulcerative colitis in block, image, and video levels. We also enhance the current informative frame filtering methods by detecting water and bubble frames using our proposed technique. Our feature extraction algorithm based on accumulation of pixel value difference provides better accuracy at faster speed than the existing methods making it highly suitable for real-time systems. We also propose a hybrid approach in which our feature method is combined with existing feature method(s) to provide even better accuracy. We extend the block and image level detection method to video level severity score calculation and shot segmentation. Also, the proposed novel feature extraction method can detect water and bubble frames in colonoscopy videos with very high accuracy in significantly less processing time even when clustering is used to reduce the training size by 10 times.
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past, present and events that may occur in future. We define event participant as people directly involved in an event regardless of whether they are the agent, recipient or play another role. We present an annotation effort, guidelines and quality analysis with 1,096 event mentions. We discuss the label distributions and event behavior in the annotated corpus. We also explain several features used and a standard supervised machine learning approach to automatically determine if and when the author is a participant of the event in the tweet. We discuss trends in the results obtained and devise important conclusions.
Data is everywhere. The current Technological advancements in Digital, Social media and the ease at which the availability of different application services to interact with variety of systems are causing to generate tremendous volumes of data. Due to such varied services, Data format is now not restricted to only structure type like text but can generate unstructured content like social media data, videos and images etc. The generated Data is of no use unless been stored and analyzed to derive some Value. Traditional Database systems comes with limitations on the type of data format schema, access rates and storage sizes etc. Hadoop is an Apache open source distributed framework that support storing huge datasets of different formatted data reliably on its file system named Hadoop File System (HDFS) and to process the data stored on HDFS using MapReduce programming model. This thesis study is about building a Data Architecture using Hadoop and its related open source distributed frameworks to support a Data flow pipeline on a low commodity hardware. The Data flow components are, sourcing data, storage management on HDFS and data access layer. This study also discuss about a use case to utilize the architecture components. Sqoop, a framework to ingest the structured data from database onto Hadoop and Flume is used to ingest the semi-structured Twitter streaming json data on to HDFS for analysis. The data sourced using Sqoop and Flume have been analyzed using Hive for SQL like analytics and at a higher level of data access layer, Hadoop has been compared with an in memory computing system using Spark. Significant differences in query execution performances have been analyzed when working with Hadoop and Spark frameworks. This integration helps for ingesting huge Volumes of streaming json Variety data to derive better Value based analytics using Hive and ...
Extracting information from a stack of data is a tedious task and the scenario is no different in proteomics. Volumes of research papers are published about study of various proteins in several species, their interactions with other proteins and identification of protein(s) as possible biomarker in causing diseases. It is a challenging task for biologists to keep track of these developments manually by reading through the literatures. Several tools have been developed by computer linguists to assist identification, extraction and hypotheses generation of proteins and protein-protein interactions from biomedical publications and protein databases. However, they are confronted with the challenges of term variation, term ambiguity, access only to abstracts and inconsistencies in time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to attenuate the challenges by extracting protein-protein interactions in humans and elicit possible interactions using associative rule mining on full text, abstracts and captions from figures available from publicly available biomedical literature databases. Two such databases are used in our study: Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built using articles based on search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) is cross-referenced to validate discovered interactive pairs. A set of an optimal size of possible binary protein-protein interactions is generated to be made available for clinician or biological validation. A significant change in the number of new associations was found by altering the thresholds for support and confidence metrics. This study narrows down the limitations for biologists in keeping pace with discovery of protein-protein interactions via manually reading the literature and their needs to validate each and every possible interaction.
Students often use the web as a source of help for problems that they encounter on programming assignments.In this work, we seek to understand how students use the web to search for help on their assignments.We used a mixed methods approach with 344 students who complete a survey and 41 students who participate in a focus group meetings and helped in recording data about their search habits.The survey reveals data about student reported search habits while the focus group uses a web browser plug-in to record actual search patterns.We examine the results collectively and as broken down by class year.Survey results show that at least 2/3 of the students from each class year rely on search engines to locate resources for help with their programming bugs in at least half of their assignments;search habits vary by class year;and the value of different types of resources such as tutorials and forums varies by class year.Focus group results exposes the high frequency web sites used by the students in solving their programming assignments.
This work presents an investigation of a piezoelectric sensor based energy harvesting system, which collects energy from the surrounding environment. Increasing costs and scarcity of fossil fuels is a great concern today for supplying power to electronic devices. Furthermore, generating electricity by ordinary methods is a complicated process. Disposal of chemical batteries and cables is polluting the nature every day. Due to these reasons, research on energy harvesting from renewable resources has become mandatory in order to achieve improved methods and strategies of generating and storing electricity. Many low power devices being used in everyday life can be powered by harvesting energy from natural energy resources. Power overhead and power energy efficiency is of prime concern in electronic circuits. In this work, an energy harvester is modeled and simulated in Simscape™ for the functional analysis and comparison of achieved outcomes with previous work. Results demonstrate that the harvester produces power in the 0 μW to 100 μW range, which is an adequate amount to provide supply to low power devices. Power efficiency calculations also demonstrate that the implemented harvester is capable of generating and storing power for low power pervasive applications.
In recent years, social media platforms such as Twitter and Facebook have emerged as effective tools for broadcasting messages worldwide during disaster events. With millions of messages posted through these services during such events, it has become imperative to identify valuable information that can help the emergency responders to develop effective relief efforts and aid victims. Many studies implied that the role of social media during disasters is invaluable and can be incorporated into emergency decision-making process. However, due to the "big data" nature of social media, it is very labor-intensive to employ human resources to sift through social media posts and categorize/classify them as useful information. Hence, there is a growing need for machine intelligence to automate the process of extracting useful information from the social media data during disaster events. This dissertation addresses the following questions: In a social media stream of messages, what is the useful information to be extracted that can help emergency response organizations to become more situationally aware during and following a disaster? What are the features (or patterns) that can contribute to automatically identifying messages that are useful during disasters? We explored a wide variety of features in conjunction with supervised learning algorithms to automatically identify messages that are useful during disaster events. The feature design includes sentiment features to extract the geo-mapped sentiment expressed in tweets, as well as tweet-content and user detail features to predict the likelihood of the information contained in a tweet to be quickly spread in the network. Further experimentation is carried out to see how these features help in identifying the informative tweets and filter out those tweets that are conversational in nature.
The presence of naturally occurring and man-made public health threats necessitate the design and implementation of mitigation strategies, such that adequate response is provided in a timely manner. Since multiple variables, such as geographic properties, resource constraints, and government mandated time-frames must be accounted for, computational methods provide the necessary tools to develop contingency response plans while respecting underlying data and assumptions. A typical response scenario involves the placement of points of dispensing (PODs) in the affected geographic region to supply vaccines or medications to the general public. Computational tools aid in the analysis of such response plans, as well as in the strategic placement of PODs, such that feasible response scenarios can be developed. Due to the sensitivity of bio-emergency response plans, geographic information, such as POD locations, must be kept confidential. The generation of synthetic geographic regions allows for the development of emergency response plans on non-sensitive data, as well as for the study of the effects of single geographic parameters. Further, synthetic representations of geographic regions allow for results to be published and evaluated by the scientific community. This dissertation presents methodology for the analysis of bio-emergency response plans, methods for plan optimization, as well as methodology for the generation of synthetic geographic regions.
Software applications’ performance is hindered by a variety of factors, but most notably by the well-known CPU-memory speed gap (often known as the memory wall). This results in the CPU sitting idle waiting for data to be brought from memory to processor caches. The addressing used by caches cause non-uniform accesses to various cache sets. The non-uniformity is due to several reasons, including how different objects are accessed by the code and how the data objects are located in memory. Memory allocators determine where dynamically created objects are placed, thus defining addresses and their mapping to cache locations. It is important to evaluate how different allocators behave with respect to the localities of the created objects. Most allocators use a single attribute, the size, of an object in making allocation decisions. Additional attributes such as the placement with respect to other objects, or specific cache area may lead to better use of cache memories. In this dissertation, we proposed and implemented a framework that allows for the development and evaluation of new memory allocation techniques. At the root of the framework is a memory tracing tool called Gleipnir, which provides very detailed information about every memory access, and relates it back to source level objects. Using the traces from Gleipnir, we extended a commonly used cache simulator for generating detailed cache statistics: per function, per data object, per cache line, and identify specific data objects that are conflicting with each other. The utility of the framework is demonstrated with a new memory allocator known as equivalence class allocator. The new allocator allows users to specify cache sets, in addition to object size, where the objects should be placed. We compare this new allocator with two well-known allocators, viz., Doug Lea and Pool allocators.
Optical character recognition (OCR) software has advanced greatly in recent years. Machine-printed text can be scanned and converted to searchable text with word accuracy rates around 98%. Reasonably neat hand-printed text can be recognized with about 85% word accuracy. However, cursive handwriting still remains a challenge, with state-of-the-art performance still around 75%. Algorithms based on hidden Markov models have been only moderately successful, while recurrent neural networks have delivered the best results to date. This thesis explored the feasibility of using a special type of feedforward neural network to convert freeform cursive handwriting to searchable text. The hidden nodes in this network were grouped into clusters, with each cluster being trained to recognize a unique character bigram. The network was trained on writing samples that were pre-segmented and annotated. Post-processing was facilitated in part by using the network to identify overlapping bigrams that were then linked together to form words and sentences. With dictionary assisted post-processing, the network achieved word accuracy of 66.5% on a small, proprietary corpus. The contributions in this thesis are threefold: 1) the novel clustered architecture of the feed-forward neural network, 2) the development of an expanded set of observers combining image masks, modifiers, and feature characterizations, and 3) the use of overlapping bigrams as the textual working unit to assist in context analysis and reconstruction.
The purpose of this project is to explore the GPU for general purpose computing. The GPU is a massively parallel computing device that has a high-throughput, exhibits high arithmetic intensity, has a large market presence, and with the increasing computation power being added to it each year through innovations, the GPU is a perfect candidate to complement the CPU in performing computations. The GPU follows the single instruction multiple data (SIMD) model for applying operations on its data. This model allows the GPU to be very useful for assisting the CPU in performing computations on data that is highly parallel in nature. The compute unified device architecture (CUDA) is a parallel computing and programming platform for NVIDIA GPUs. The main focus of this project is to show the power, speed, and performance of a CUDA-enabled GPU for digital video watermark insertion in the H.264 video compression domain. Digital video watermarking in general is a highly computationally intensive process that is strongly dependent on the video compression format in place. The H.264/MPEG-4 AVC video compression format has high compression efficiency at the expense of having high computational complexity and leaving little room for an imperceptible watermark to be inserted. Employing a human visual model to limit distortion and degradation of visual quality introduced by the watermark is a good choice for designing a video watermarking algorithm though this does introduce more computational complexity to the algorithm. Research is being conducted into how the CPU-GPU execution of the digital watermark application can boost the speed of the applications several times compared to running the application on a standalone CPU using NVIDIA visual profiler to optimize the application.
Epidemics have caused major human and monetary losses through the course of human civilization. It is very important that epidemiologists and public health personnel are prepared to handle an impending infectious disease outbreak. the ever-changing demographics, evolving infrastructural resources of geographic regions, emerging and re-emerging diseases, compel the use of simulation to predict disease dynamics. By the means of simulation, public health personnel and epidemiologists can predict the disease dynamics, population groups at risk and their geographic locations beforehand, so that they are prepared to respond in case of an epidemic outbreak. As a consequence of the large numbers of individuals and inter-personal interactions involved in simulating infectious disease spread in a region such as a county, sizeable amounts of data may be produced that have to be analyzed. Methods to visualize this data would be effective in facilitating people from diverse disciplines understand and analyze the simulation. This thesis proposes a framework to simulate and visualize the spread of an infectious disease in a population of a region such as a county. As real-world populations have a non-homogeneous demographic and spatial distribution, this framework models the spread of an infectious disease based on population of and geographic distance between census blocks; social behavioral parameters for demographic groups. the population is stratified into demographic groups in individual census blocks using census data. Infection spread is modeled by means of local and global contacts generated between groups of population in census blocks. the strength and likelihood of the contacts are based on population, geographic distance and social behavioral parameters of the groups involved. the disease dynamics are represented on a geographic map of the region using a heat map representation, where the intensity of infection is mapped to a color scale. This framework provides a tool for public health personnel and ...
In the United States, smartphone ownership surpassed 69.5 million in February 2011 with a large portion of those users (20%) downloading applications (apps) that enhance the usability of a device by adding additional functionality. a large percentage of apps are written specifically to utilize the geographical position of a mobile device. One of the prime factors in developing location prediction models is the use of historical data to train such a model. with larger sets of training data, prediction algorithms become more accurate; however, the use of historical data can quickly become a downfall if the GPS stream is not collected or processed correctly. Inaccurate or incomplete or even improperly interpreted historical data can lead to the inability to develop accurately performing prediction algorithms. As GPS chipsets become the standard in the ever increasing number of mobile devices, the opportunity for the collection of GPS data increases remarkably. the goal of this study is to build a comprehensive system that addresses the following challenges: (1) collection of GPS data streams in a manner such that the data is highly usable and has a reduction in errors; (2) processing and reduction of the collected data in order to prepare it and make it highly usable for the creation of prediction algorithms; (3) creation of prediction/labeling algorithms at such a level that they are viable for commercial use. This study identifies the key research problems toward building the CaPPture (collection, processing, prediction) system.
Keyphrases describe a document in a coherent and simple way, giving the prospective reader a way to quickly determine whether the document satisfies their information needs. The pervasion of huge amount of information on Web, with only a small amount of documents have keyphrases extracted, there is a definite need to discover automatic keyphrase extraction systems. Typically, a document written by human develops around one or more general concepts or sub-concepts. These concepts or sub-concepts should be structured and semantically related with each other, so that they can form the meaningful representation of a document. Considering the fact, the phrases or concepts in a document are related to each other, a new approach for keyphrase extraction is introduced that exploits the semantic relations in the document. For measuring the semantic relations between concepts or sub-concepts in the document, I present a comprehensive study aimed at using collaboratively constructed semantic resources like Wikipedia and its link structure. In particular, I introduce a graph-based keyphrase extraction system that exploits the semantic relations in the document and features such as term frequency. I evaluated the proposed system using novel measures and the results obtained compare favorably with previously published results on established benchmarks.
Trusted Computing capability has become ubiquitous these days, and it is being widely deployed into consumer devices as well as enterprise platforms. As the number of threats is increasing at an exponential rate, it is becoming a daunting task to secure the systems against them. In this context, the software integrity measurement at runtime with the support of trusted platforms can be a better security strategy. Trusted Computing devices like TPM secure the evidence of a breach or an attack. These devices remain tamper proof if the hardware platform is physically secured. This type of trusted security is crucial for forensic analysis in the aftermath of a breach. The advantages of trusted platforms can be further leveraged if they can be used wisely. RADIUM (Race-free on-demand Integrity Measurement Architecture) is one such architecture, which is built on the strength of TPM. RADIUM provides an asynchronous root of trust to overcome the TOC condition of DRTM. Even though the underlying architecture is trusted, attacks can still compromise applications during runtime by exploiting their vulnerabilities. I propose an application-level integrity measurement solution that fits into RADIUM, to expand the trusted computing capability to the application layer. This is based on the concept of program invariants that can be used to learn the correct behavior of an application. I used Daikon, a tool to obtain dynamic likely invariants, and developed a method of observing these properties at runtime to verify the integrity. The integrity measurement component was implemented as a Python module on top of Volatility, a virtual machine introspection tool. My approach is a first step towards integrity attestation, using hypervisor-based introspection on RADIUM and a proof of concept of application-level measurement capability.
Abstract Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are witnessing however a change: an increasingly larger number of books become available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While there is a significant body of research that has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels are different in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents, and outperform the traditional extractive summarization systems typically addressing news genre.
Segmentation, recognition and 3D reconstruction of objects have been cutting-edge research topics, which have many applications ranging from environmental and medical to geographical applications as well as intelligent transportation. In this dissertation, I focus on the study of segmentation, recognition and 3D reconstruction of objects using LiDAR data/MRI. Three main works are that (I). Feature extraction algorithm based on sparse LiDAR data. A novel method has been proposed for feature extraction from sparse LiDAR data. The algorithm and the related principles have been described. Also, I have tested and discussed the choices and roles of parameters. By using correlation of neighboring points directly, statistic distribution of normal vectors at each point has been effectively used to determine the category of the selected point. (II). Segmentation and 3D reconstruction of objects based on LiDAR/MRI. The proposed method includes that the 3D LiDAR data are layered, that different categories are segmented, and that 3D canopy surfaces of individual tree crowns and clusters of trees are reconstructed from LiDAR point data based on a region active contour model. The proposed method allows for delineations of 3D forest canopy naturally from the contours of raw LiDAR point clouds. The proposed model is suitable not only for a series of ideal cone shapes, but also for other kinds of 3D shapes as well as other kinds dataset such as MRI. (III). Novel algorithms for recognition of objects based on LiDAR/MRI. Aimed to the sparse LiDAR data, the feature extraction algorithm has been proposed and applied to classify the building and trees. More importantly, the novel algorithms based on level set methods have been provided and employed to recognize not only the buildings and trees, the different trees (e.g. Oak trees and Douglas firs), but also the subthalamus nuclei (STNs). By using the novel algorithms based ...
Did you stand at a door with a bunch of keys and tried to find the right one to unlock the door? Did you hold a flower and wonder the name of it? A need of object recognition could rise anytime and any where in our daily lives. With the development of mobile devices object recognition applications become possible to provide immediate assistance. However, performing complex tasks in even the most advanced mobile platforms still faces great challenges due to the limited computing resources and computing power. In this thesis, we present an object recognition system that resides and executes within a mobile device, which can efficiently extract image features and perform learning and classification. To account for the computing constraint, a novel feature extraction method that minimizes the data size and maintains data consistency is proposed. This system leverages principal component analysis method and is able to update the trained classifier when new examples become available . Our system relieves users from creating a lot of examples and makes it user friendly. The experimental results demonstrate that a learning method trained with a very small number of examples can achieve recognition accuracy above 90% in various acquisition conditions. In addition, the system is able to perform learning efficiently.
Computer security attacks take place due to the presence of vulnerabilities and bugs in software applications. Bugs and vulnerabilities are the result of weak software architecture and lack of standard software development practices. Despite the fact that software companies are investing millions of dollars in the research and development of software designs security risks are still at large. In some cases software applications are found to carry vulnerabilities for many years before being identified. A recent such example is the popular Heart Bleed Bug in the Open SSL/TSL. In today’s world, where new software application are continuously being developed for a varied community of users; it’s highly unlikely to have software applications running without flaws. Attackers on computer system securities exploit these vulnerabilities and bugs and cause threat to privacy without leaving any trace. The most critical vulnerabilities are those which are related to the integrity of the software applications. Because integrity is directly linked to the credibility of software application and data it contains. Here I am giving solution of maintaining web applications integrity running on RADIUM by using daikon. Daikon generates invariants, these invariants are used to maintain the integrity of the web application and also check the correct behavior of web application at run time on RADIUM architecture in case of any attack or malware. I used data invariants and program flow invariants in my solution to maintain the integrity of web-application against such attack or malware. I check the behavior of my proposed invariants at run-time using Lib-VMI/Volatility memory introspection tool. This is a novel approach and proof of concept toward maintaining web application integrity on RADIUM.
While pragmatics, through its integration of situational awareness and real world relevant knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other end, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in literature has revolved around the famous quote ``You shall know a word by the company it keeps''. In this thesis we investigate the role of context constituents in decoding the semantic meaning of the engulfing context; specifically we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues to an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to entail semantic relatedness between textual pairs, whether they are words, sentences or paragraphs. Moreover, we explore the abstraction of semantics across languages and utilize our findings into building a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized synthetic datasets for semantic relatedness as well as real world applications such as paraphrase detection and short answer grading. Our work represents a novel approach to integrate world-knowledge into current semantic models and a means to cross the language boundary for a better and more robust semantic relatedness representation, thus opening the door for an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines.
Smart phones today have become increasingly popular with the general public for its diverse abilities like navigation, social networking, and multimedia facilities to name a few. These phones are equipped with high end processors, high resolution cameras, built-in sensors like accelerometer, orientation-sensor, light-sensor, and much more. According to comScore survey, 25.3% of US adults use smart phones in their daily lives. Motivated by the capability of smart phones and their extensive usage, I focused on utilizing them for bio-medical applications. In this thesis, I present a new application for a smart phone to quantify the vital signs such as heart rate, respiratory rate and blood pressure with the help of its built-in sensors. Using the camera and a microphone, I have shown how the blood pressure and heart rate can be determined for a subject. People sometimes encounter minor situations like fainting or fatal accidents like car crash at unexpected times and places. It would be useful to have a device which can measure all vital signs in such an event. The second part of this thesis demonstrates a new mode of communication for next generation 9-1-1 calls. In this new architecture, the call-taker will be able to control the multimedia elements in the phone from a remote location. This would help the call-taker or first responder to have a better control over the situation. Transmission of the vital signs measured using the smart phone can be a life saver in critical situations. In today's voice oriented 9-1-1 calls, the dispatcher first collects critical information (e.g., location, call-back number) from caller, and assesses the situation. Meanwhile, the dispatchers constantly face a "60-second dilemma"; i.e., within 60 seconds, they need to make a complicated but important decision, whether to dispatch and, if so, what to dispatch. The dispatchers often feel that ...
In developing countries, acute respiratory infections (ARIs) are responsible for two million deaths per year. Most victims are children who are less than 5 years old. Pneumonia kills 5000 children per day. The statistics for cardiovascular diseases (CVDs) are even more alarming. According to a 2009 report from the World Health Organization (WHO), CVDs kill 17 million people per year. In many resource-poor parts of the world such as India and China, many people are unable to access cardiologists, pulmonologists, and other specialists. Hence, low skilled health professionals are responsible for screening people for ARIs and CVDs in these areas. For example, in the rural areas of the Philippines, there is only one doctor for every 10,000 people. By contrast, the United States has one doctor for every 500 Americans. Due to advances in technology, it is now possible to use a smartphone for audio recording, signal processing, and machine learning. In my thesis, I have developed an Android application named Smart Auscultation. Auscultation is a process in which physicians listen to heart and lung sounds to diagnose disorders. Cardiologists spend years mastering this skill. The Smart Auscultation application is capable of recording and classifying heart sounds, and can be used by public or clinical health workers. This application can detect abnormal heart sounds with up to 92-98% accuracy. In addition, the application can record, but not yet classify, lung sounds. This application will be able to help save thousands of lives by allowing anyone to identify abnormal heart and lung sounds.
How do the content and writing style of people who drink alcohol beverages stand out from non-drinkers? How much information can we learn about a person's alcohol consumption behavior by reading text that they have authored? This thesis attempts to extend the methods deployed in authorship attribution and authorship profiling research into the domain of automatically identifying the human action of drinking alcohol beverages. I examine how a psycholinguistics dictionary (the Linguistics Inquiry and Word Count lexicon, developed by James Pennebaker), together with Kenneth Burke's concept of words as symbols of human action, and James Wertsch's concept of mediated action provide a framework for analyzing meaningful data patterns from the content of blogs written by consumers of alcohol beverages. The contributions of this thesis to the research field are twofold. First, I show that it is possible to automatically identify blog posts that have content related to the consumption of alcohol beverages. And second, I provide a framework and tools to model human behavior through text analysis of blog data.
Emergency Medical Dispatch Protocols are guidelines that a 9-1-1 dispatcher uses to evaluate the nature of emergency, resources to send and the nature of help provided to the 9-1-1 caller. The current Dispatch Protocols are based on voice only call. But the Next Generation 9-1-1 (NG9-1-1) architecture will allow multimedia emergency calls. In this thesis I analyze and model the Emergency Medical Dispatch Protocols for NG9-1-1 architecture. I have identified various technical aspects to improve the NG9-1-1 Dispatch Protocols. The devices (smartphone) at the caller end have advanced to a point where they can be used to send and receive video, pictures and text. There are sensors embedded in them that can be used for initial diagnosis of the injured person. There is a need to improve the human computer (smartphone) interface to take advantage of technology so that callers can easily make use of various features available to them. The dispatchers at the 9-1-1 call center can make use of these new protocols to improve the quality and the response time. They will have capability of multiple media streams to interact with the caller and the first responders.The specific contributions in this thesis include developing applications that use smartphone sensors. The CPR application uses the smartphone to help administer effective CPR even if the person is not trained. The application makes the CPR process closed loop, i.e., the person who administers the CPR as well as the 9-1-1 operator receive feedback and prompt from the application about the correctness of the CPR. The breathing application analyzes the quality of breathing of the affected person and automatically sends the information to the 9-1-1 operator. In order to improve the Human Computer Interface at the caller and the operator end, I have analyzed Fitts law and extended it so that it ...
The history of mitigation programs to contain vector-borne diseases is a story of successes and failures. Due to the complex interplay among multiple factors that determine disease dynamics, the general principles for timely and specific intervention for incidence reduction or eradication of life-threatening diseases has yet to be determined. This research discusses computational methods developed to assist in the understanding of complex relationships affecting vector-borne disease dynamics. A computational framework to assist public health practitioners with exploring the dynamics of vector-borne diseases, such as malaria and dengue in homogenous and heterogeneous populations, has been conceived, designed, and implemented. The framework integrates a stochastic computational model of interactions to simulate horizontal disease transmission. The intent of the computational modeling has been the integration of stochasticity during simulation of the disease progression while reducing the number of necessary interactions to simulate a disease outbreak. While there are improvements in the computational time reducing the number of interactions needed for simulating disease dynamics, the realization of interactions can remain computationally expensive. Using multi-threading technology to improve performance upon the original computational model, multi-threading experimental results have been tested and reported. In addition, to the contact model, the modeling of biological processes specific to the corresponding pathogen-carrier vector to increase the specificity of the vector-borne disease has been integrated. Last, automation for requesting, retrieving, parsing, and storing specific weather data and geospatial information from federal agencies to study the differences between homogenous and heterogeneous populations has been implemented.
Multi-modal image fusion is an active research area with many civilian and military applications. Fusion is defined as strategic combination of information collected by various sensors from different locations or different types in order to obtain a better understanding of an observed scene or situation. Fusion of multi-modal images cannot be completed unless these two modalities are spatially aligned. In this research, I consider two important problems. Multi-modal, multi-perspective image registration and decision level fusion of multi-modal images. In particular, LiDAR and visual imagery. Multi-modal image registration is a difficult task due to the different semantic interpretation of features extracted from each modality. This problem is decoupled into three sub-problems. The first step is identification and extraction of common features. The second step is the determination of corresponding points. The third step consists of determining the registration transformation parameters. Traditional registration methods use low level features such as lines and corners. Using these features require an extensive optimization search in order to determine the corresponding points. Many methods use global positioning systems (GPS), and a calibrated camera in order to obtain an initial estimate of the camera parameters. The advantages of our work over the previous works are the following. First, I used high level-features, which significantly reduce the search space for the optimization process. Second, the determination of corresponding points is modeled as an assignment problem between a small numbers of objects. On the other side, fusing LiDAR and visual images is beneficial, due to the different and rich characteristics of both modalities. LiDAR data contain 3D information, while images contain visual information. Developing a fusion technique that uses the characteristics of both modalities is very important. I establish a decision-level fusion technique using manifold models.
Ambiguity is inherent to human language. In particular, word sense ambiguity is prevalent in all natural languages, with a large number of the words in any given language carrying more than one meaning. Word sense disambiguation is the task of automatically assigning the most appropriate meaning to a polysemous word within a given context. Generally the problem of resolving ambiguity in literature has revolved around the famous quote “you shall know the meaning of the word by the company it keeps.” In this thesis, we investigate the role of context for resolving ambiguity through three different approaches. Instead of using a predefined monolingual sense inventory such as WordNet, we use a language-independent framework where the word senses and sense-tagged data are derived automatically from Wikipedia. Using Wikipedia as a source of sense-annotations provides the much needed solution for knowledge acquisition bottleneck. In order to evaluate the viability of Wikipedia based sense-annotations, we cast the task of disambiguating polysemous nouns as a monolingual classification task and experimented on lexical samples from four different languages (viz. English, German, Italian and Spanish). The experiments confirm that the Wikipedia based sense annotations are reliable and can be used to construct accurate monolingual sense classifiers. It is a long belief that exploiting multiple languages helps in building accurate word sense disambiguation systems. Subsequently, we developed two approaches that recast the task of disambiguating polysemous nouns as a multilingual classification task. The first approach for multilingual word sense disambiguation attempts to effectively use a machine translation system to leverage two relevant multilingual aspects of the semantics of text. First, the various senses of a target word may be translated into different words, which constitute unique, yet highly salient signal that effectively expand the target word’s feature space. Second, the translated context words themselves embed co-occurrence information ...
Network security is a complex field that is handled by security professionals who need certain expertise and experience to configure security systems. With the ever increasing size of the networks, managing them is going to be a daunting task. What kind of solution can be used to generate effective security configurations by both security professionals and nonprofessionals alike? In this thesis, a web tool is developed to simplify the process of configuring security systems by translating direct human language input into meaningful, working security rules. These human language inputs yield the security rules that the individual wants to implement in their network. The human language input can be as simple as, "Block Facebook to my son's PC". This tool will translate these inputs into specific security rules and install the translated rules into security equipment such as virtualized Cisco FWSM network firewall, Netfilter host-based firewall, and Snort Network Intrusion Detection. This tool is implemented and tested in both a traditional network and a cloud environment. One thousand input policies were collected from various users such as staff from UNT departments' and health science, including individuals with network security background as well as students with a non-computer science background to analyze the tool's performance. The tool is tested for its accuracy (91%) in generating a security rule. It is also tested for accuracy of the translated rule (86%) compared to a standard rule written by security professionals. Nevertheless, the network security tool built has shown promise to both experienced and inexperienced people in network security field by simplifying the provisioning process to result in accurate and effective network security rules.
This thesis describes an approach for object recognition using the chordiogram shape-based descriptor. Global shape representations are highly susceptible to clutter generated due to the background or other irrelevant objects in real-world images. To overcome the problem, we aim to extract precise object shape using superpixel segmentation, perceptual grouping, and connected components. The employed shape descriptor chordiogram is based on geometric relationships of chords generated from the pairs of boundary points of an object. The chordiogram descriptor applies holistic properties of the shape and also proven suitable for object detection and digit recognition mechanisms. Additionally, it is translation invariant and robust to shape deformations. In spite of such excellent properties, chordiogram is not scale-invariant. To this end, we propose scale invariant chordiogram descriptors and intend to achieve a similar performance before and after applying scale invariance. Our experiments show that we achieve similar performance with and without scale invariance for silhouettes and real world object images. We also show experiments at different scales to confirm that we obtain scale invariance for chordiogram.
This dialog allows you to filter your current search.
Each of the Years listed note their name and the number of records that will be limited down to if you choose that option.
The list can be sorted by name or the count.