Search Results

Enhanced Approach for the Classification of Ulcerative Colitis Severity in Colonoscopy Videos Using CNN
Ulcerative colitis (UC) is a chronic inflammatory disease characterized by periods of relapse and remission, affecting more than 500,000 people in the United States. To achieve the therapeutic goals of UC treatment, which are first to induce and then to maintain disease remission, doctors need to evaluate the severity of a patient's UC. However, evaluating UC severity objectively is very difficult because of the non-uniform nature of the symptoms and the large variations in their patterns. To address this, in our previous work we developed two approaches, one using image textures and the other using a convolutional neural network (CNN), to objectively measure and classify the severity of UC presented in optical colonoscopy video frames. However, we found that the image-texture-based approach could not handle the large variation in symptom patterns, and the CNN-based approach could not achieve very high accuracy. In this paper, we improve our CNN-based approach in two ways to provide better classification accuracy: we add more thorough and essential preprocessing, and we generate more classes to accommodate the large variations in symptom patterns. The experimental results show that the proposed preprocessing can improve the overall accuracy of evaluating the severity of UC.
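To make the classification pipeline concrete, here is a minimal sketch of a CNN frame classifier in PyTorch, assuming preprocessed 224x224 RGB frames; the layer sizes and six-class output are illustrative assumptions, not the authors' actual architecture:

    # Minimal sketch: CNN severity classifier for colonoscopy frames.
    # Assumes frames are already preprocessed (cropped, resized, normalized).
    import torch
    import torch.nn as nn

    class SeverityCNN(nn.Module):
        def __init__(self, num_classes=6):   # hypothetical class count
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                       # 224 -> 112
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                       # 112 -> 56
            )
            self.classifier = nn.Linear(32 * 56 * 56, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))

    frames = torch.rand(4, 3, 224, 224)  # dummy batch of video frames
    logits = SeverityCNN()(frames)       # one score per severity class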
Enhancing Storage Dependability and Computing Energy Efficiency for Large-Scale High Performance Computing Systems
In the age of information explosion, larger-capacity disk drives are used to store data, and powerful devices are used to process big data. As the scale and complexity of computer systems increase, we expect these systems to provide dependable and energy-efficient services and computation. Although hard drives are reliable in general, they are the most commonly replaced hardware components. Disk failures cause data corruption and even data loss, which can significantly degrade system performance and lead to financial losses. In this dissertation research, I analyze different manifestations of disk failures in production data centers and explore data mining techniques combined with statistical analysis to discover categories of disk failures and their distinctive properties. I use similarity measures to quantify the degradation process of each failure type and derive its degradation signature. The derived degradation signatures are further leveraged to forecast when future disk failures may happen. This dissertation also studies the energy efficiency of high-performance computers. Specifically, I characterize the power and energy consumption of the Haswell processors used in multiple supercomputers, and analyze the power and energy consumption of Legion, a data-centric programming model and runtime system, and of Legion applications. We find that power and energy efficiency can be improved significantly by optimizing processor settings and runtime scheduling, and that the Legion runtime performs well for larger-scale computation in terms of power and energy consumption.
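A rough sketch of how degradation signatures might be derived, assuming fixed-length health-metric windows per failed disk and a k-means grouping; the synthetic data and cluster count are stand-ins for the dissertation's actual failure analysis:

    # Sketch: cluster failed-disk health time series into failure types,
    # take each cluster's mean series as that type's degradation signature.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    series = rng.normal(size=(200, 30))   # 200 failed disks x 30-day window
    series = np.cumsum(series, axis=1)    # crude drift toward failure

    k = 3                                 # assumed number of failure types
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(series)
    signatures = np.stack([series[labels == i].mean(axis=0) for i in range(k)])

    # A new disk's recent window can be matched to the nearest signature
    # to forecast its likely failure type.
    new_disk = series[0]
    nearest = np.argmin(np.linalg.norm(signatures - new_disk, axis=1))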
Mining Biomedical Data for Hidden Relationship Discovery
With an ever-growing number of publications in the biomedical domain, it becomes increasingly likely that important implicit connections between individual concepts of biomedical knowledge are overlooked. Literature-based discovery (LBD) has been practiced for many years to identify plausible associations between previously unrelated concepts. In this paper, we present a new, fully automatic and interactive system that creates a graph-based knowledge base to capture multifaceted, complex associations among biomedical concepts. For a given pair of input concepts, our system auto-generates a list of ranked subgraphs uncovering possible previously unnoticed associations based on context information. To rank these subgraphs, we implement a novel ranking method that uses context information obtained by performing random walks on the graph. In addition, we enhance the system by training a neural network classifier to output the likelihood that the two concepts are related, which provides better insight to the end user.
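The random-walk ranking idea can be sketched as follows, using personalized PageRank as a stand-in for the paper's walk-based context scores; the toy concept graph and the path-scoring heuristic are illustrative assumptions:

    # Sketch: score nodes by random walks with restart from each input
    # concept, then rank connecting paths (a proxy for subgraph ranking).
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("aspirin", "inflammation"), ("inflammation", "prostaglandin"),
        ("aspirin", "platelets"), ("platelets", "thrombosis"),
        ("prostaglandin", "thrombosis"),
    ])

    def walk_scores(graph, seed, restart=0.15):
        # Personalized PageRank == random walk with restart from the seed.
        return nx.pagerank(graph, alpha=1 - restart, personalization={seed: 1.0})

    a, b = "aspirin", "thrombosis"
    pa, pb = walk_scores(G, a), walk_scores(G, b)
    ctx = {n: pa[n] * pb[n] for n in G}   # context relevance to both seeds

    paths = nx.all_simple_paths(G, a, b, cutoff=3)
    ranked = sorted(paths, key=lambda p: -sum(ctx[n] for n in p[1:-1])
                                          / max(len(p) - 2, 1))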
Parallel Analysis of Aspect-Based Sentiment Summarization from Online Big-Data
Consumers' opinions and sentiments on products can reflect the performance of those products in general or in specific aspects. Analyzing these data is becoming feasible given the availability of immense amounts of data and the power of natural language processing; however, retailers have not taken full advantage of online comments. This work is dedicated to a solution for automatically analyzing and summarizing these valuable data at both the product and the category level. In this research, a system was developed to retrieve and analyze extensive data from public online resources. A parallel framework was created to make this system extensible and efficient. In this framework, a star network topology was adopted in which each computing unit was assigned to retrieve a fraction of the data and to assess its sentiment. Finally, the preprocessed data were collected and summarized by the central machine, which generates the final result that can be rendered through a web interface. The system was designed for sound performance, robustness, manageability, extensibility, and accuracy.
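The star topology could be approximated as below, with local worker processes standing in for the computing units and a counter merge standing in for the central summarizer; the lexicon and reviews are toy assumptions, not the system's actual NLP pipeline:

    # Sketch: workers each score a shard of reviews; the central process
    # merges per-aspect sentiment totals.
    from collections import Counter
    from multiprocessing import Pool

    LEXICON = {"great": 1, "poor": -1, "fast": 1, "slow": -1}

    def score_shard(reviews):
        # Worker node: crude aspect detection + lexicon sentiment.
        totals = Counter()
        for text in reviews:
            words = text.lower().split()
            aspect = "battery" if "battery" in words else "overall"
            totals[aspect] += sum(LEXICON.get(w, 0) for w in words)
        return totals

    if __name__ == "__main__":
        shards = [["great battery life", "slow battery drain"],
                  ["fast shipping", "poor build quality"]]
        with Pool(2) as pool:             # the points of the star
            summary = Counter()           # the central machine
            for part in pool.map(score_shard, shards):
                summary.update(part)      # update() sums counts per aspect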
A Performance and Security Analysis of Elliptic Curve Cryptography Based Real-Time Media Encryption
This dissertation emphasizes the security aspects of real-time media. It identifies problems with existing real-time media protections and proposes viable solutions. First, the security of real-time media depends on the Secure Real-time Transport Protocol (SRTP). We identified drawbacks of existing SRTP systems, whose symmetric-key encryption schemes can be exploited by attackers, and we propose Elliptic Curve Cryptography (ECC), an asymmetric-key cryptography scheme, to resolve these problems. Second, the ECC encryption scheme is based on elliptic curves; this dissertation explores the security weaknesses of a widely used elliptic curve and describes a more secure elliptic curve suitable for real-time media protection. Eighteen elliptic curves were tested in a real-time video transmission system, and fifteen elliptic curves were tested in a real-time audio transmission system. Based on performance, the X9.62 standard 256-bit prime curve, the NIST-recommended 256-bit prime curves, and the Brainpool 256-bit prime curves were found suitable for real-time audio encryption, while the X9.62 standard 256-bit prime and 272-bit binary curves and the NIST-recommended 256-bit prime curves were found suitable for real-time video encryption. The weaknesses of the NIST-recommended elliptic curves are discussed, and a new, more secure elliptic curve is proposed for real-time media encryption. The proposed curve fulfills all relevant security criteria, whereas the corresponding NIST curve fails two of them. The research is applicable to strengthening the security of Internet of Things (IoT) devices, especially VoIP cameras. IoT devices have resource constraints and thus need lightweight encryption schemes, and ECC can be a better option for these devices. Because VoIP cameras use a methodology similar to traditional real-time video transmission, this research could help provide a better security solution for them.
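As an illustration of ECC-based protection for a media packet (not the dissertation's exact SRTP integration), here is an ECDH key agreement on a 256-bit prime curve followed by symmetric encryption of a dummy RTP payload, using the Python cryptography package:

    # Sketch: ECDH on SECP256R1 derives a shared symmetric key, which
    # then encrypts a (dummy) RTP frame with AES-GCM.
    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    sender = ec.generate_private_key(ec.SECP256R1())
    receiver = ec.generate_private_key(ec.SECP256R1())

    shared = sender.exchange(ec.ECDH(), receiver.public_key())
    key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"srtp-demo").derive(shared)

    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, b"rtp-frame-bytes", None)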
Revealing the Positive Meaning of a Negation
Negation is a complex phenomenon present in all human languages, allowing for the uniquely human capacities of denial, contradiction, misrepresentation, lying, and irony. It is first and foremost a phenomenon of semantic opposition. Sentences containing negation are generally (a) less informative than affirmative ones, (b) morphosyntactically more marked (all languages have negative markers, while only a few have affirmative markers), and (c) psychologically more complex and harder to process. Yet negation often conveys positive meaning, ranging from implicatures to entailments. In this dissertation, I develop a system to reveal the underlying positive interpretation of negation. I first identify which words are intended to be negated (i.e., the focus of negation), and second, I rewrite those tokens to generate an actual positive interpretation. I identify the focus of negation by scoring probable foci along a continuous scale. One obstacle to exploring foci scoring is that no public datasets exist for this task, so I create new corpora to study the problem. The corpora contain verbal, nominal, and adjectival negations and their potential positive interpretations, with scores ranging from 1 to 5. I then use supervised learning models to score the focus of negation. To rewrite the focus of negation with its positive interpretation, I work with negations from Simple Wikipedia, automatically generate potential positive interpretations, and then collect manual annotations that effectively rewrite the negation in positive terms. This procedure yields positive interpretations for approximately 77% of negations, and the final corpus includes over 5,700 negations and over 5,900 positive interpretations. Finally, I apply sequence-to-sequence neural models and provide baseline results.
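Scoring probable foci on the 1-to-5 scale might look like the following sketch, where a supervised regressor ranks candidate tokens; the features and the tiny training set are hypothetical, not the dissertation's models:

    # Sketch: a regressor scores candidate tokens as the focus of negation.
    import numpy as np
    from sklearn.linear_model import Ridge

    # Hypothetical features per candidate: [distance from the negation
    # marker, is_verb, is_adjective]; targets are annotated 1-5 scores.
    X = np.array([[1, 1, 0], [2, 0, 1], [4, 0, 0], [1, 0, 1], [3, 1, 0]])
    y = np.array([4.5, 3.0, 1.5, 4.0, 2.0])

    model = Ridge(alpha=1.0).fit(X, y)

    # "The food was not good": score each candidate as the likely focus.
    candidates = {"good": [1, 0, 1], "food": [3, 0, 0]}
    scores = {w: float(model.predict([f])[0]) for w, f in candidates.items()}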
Spatial Partitioning Algorithms for Solving Location-Allocation Problems
This dissertation presents spatial partitioning algorithms for solving location-allocation problems, which pertain both to the selection of facilities to serve demand at demand points and to the assignment of demand points to the selected or known facilities. The first part of this dissertation focuses on the well-known and well-researched "p-median problem", a distance-based location-allocation problem that involves selecting and allocating p facilities for n demand points. We evaluate the performance of existing p-median heuristic algorithms and investigate how the scale of the problem and the spatial distribution of demand points affect their performance. Based on the results of this comparative study, we present guidelines to help location analysts select the best heuristic and corresponding parameters for the problem at hand. We also found that existing heuristic algorithms cannot solve large-scale p-median problems in a reasonable amount of time, so we present a density-based decomposition methodology that solves them efficiently: it identifies dense clusters in the region, uses a MapReduce procedure to select facilities in the clustered regions independently, and combines the solutions of the subproblems. Lastly, we present a novel greedy heuristic algorithm for the contiguity-constrained fixed-facility demand distribution problem, whose objective is to create contiguous service areas for the facilities such that the demand at all facilities is uniform or proportional to the available resources while the distance between demand points and facilities is minimized. The results of this research are shown in the context of creating emergency response plans for bio-emergencies. The algorithms are used to select Point of Dispensing (POD) locations (if not known) and map them to population regions to ensure that all affected individuals are …
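For intuition, a minimal greedy construction heuristic for the p-median problem is sketched below; the synthetic points are assumptions, and the heuristics studied in the dissertation (and its density-based decomposition) go well beyond this:

    # Sketch: greedily add the facility that most reduces total distance
    # from demand points to their nearest selected facility.
    import numpy as np

    rng = np.random.default_rng(1)
    pts = rng.random((100, 2))            # demand points
    cand = rng.random((20, 2))            # candidate facility sites
    dist = np.linalg.norm(pts[:, None] - cand[None, :], axis=2)

    p, chosen = 3, []
    best = np.full(len(pts), np.inf)      # distance to nearest chosen site
    for _ in range(p):
        gains = [np.minimum(best, dist[:, j]).sum() for j in range(len(cand))]
        j = int(np.argmin(gains))
        chosen.append(j)
        best = np.minimum(best, dist[:, j])

    # Allocation: each demand point goes to its nearest chosen facility.
    alloc = np.argmin(dist[:, chosen], axis=1)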
SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction
Current unsupervised approaches to keyphrase extraction compute a single importance score for each candidate word from the number and quality of its associated words in the graph, and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns that are not captured by graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable performance improvements over previous supervised and unsupervised keyphrase extraction systems.
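The idea of combining graph structure with supervised learning can be sketched as follows; the window size, node features, and toy labels are assumptions for illustration, not SurfKE's learned representations:

    # Sketch: build a word co-occurrence graph, attach structural features
    # to each node, and train a classifier to score keyphrase likelihood.
    import networkx as nx
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    words = ("graph based feature learning for "
             "keyphrase extraction on word graph").split()
    G = nx.Graph()
    for i, w in enumerate(words):         # window-2 co-occurrence edges
        for v in words[i + 1:i + 3]:
            if v != w:
                G.add_edge(w, v)

    pr = nx.pagerank(G)
    feats = {w: [G.degree(w), pr[w], nx.clustering(G, w)] for w in G}

    labels = {w: int(w in {"keyphrase", "extraction", "graph"}) for w in G}
    X = np.array([feats[w] for w in G])
    y = np.array([labels[w] for w in G])
    probs = GaussianNB().fit(X, y).predict_proba(X)[:, 1]  # per-word score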