Search Results

Automated Defense Against Worm Propagation
Worms have caused significant destruction over the last few years. Network security elements such as firewalls and intrusion detection systems (IDS) have been ineffective against worms. Some worms spread so fast that manual intervention is not possible. This creates the need for a stronger security architecture that can react automatically to stop worm propagation. The method has to be signature independent so that it can stop new worms. In this thesis, an automated defense system (ADS) is developed to automate the defense against worms and to contain a worm to a level where manual intervention is possible. This is accomplished with a two-level architecture with feedback at each level. The inner loop is based on control system theory and uses the properties of a PID (proportional-integral-derivative) controller. The outer loop works at the network level and stops the worm from reaching its spread saturation point. In our lab setup, we verified that with only the inner loop active the worm was delayed, and with both loops active we were able to restrict propagation to 10% of the targeted hosts. One concern for the deployment of a worm containment mechanism was the degradation of throughput for legitimate traffic. We found that with a suitably intelligent algorithm we can keep that degradation to an acceptable level.
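As an illustration of the control-theoretic inner loop, the following is a minimal sketch of a PID controller used to throttle outbound connections. The class name, gains, target rate, and the interpretation of the output as a number of connections to delay are all illustrative assumptions, not details from the thesis.

```python
# Minimal sketch of a PID-based rate limiter in the spirit of the inner loop.
# All names and constants are illustrative, not from the thesis.

class PIDThrottle:
    """Adjusts an allowed outbound-connection rate toward a target rate."""

    def __init__(self, kp, ki, kd, target_rate):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = target_rate
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, observed_rate, dt):
        # Error is positive when the host exceeds its allowed rate,
        # e.g. when a worm is scanning aggressively.
        error = observed_rate - self.target
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        # Output is interpreted as the number of connections to delay.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)

throttle = PIDThrottle(kp=0.6, ki=0.1, kd=0.05, target_rate=5.0)
for rate in [4, 6, 30, 80, 60, 20]:          # observed connections/sec
    print(f"rate={rate:3d} -> delay {throttle.update(rate, dt=1.0):6.1f} connections")
```

The integral term is what makes a sustained scan increasingly expensive for the worm while brief legitimate bursts pass with little delay.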
Comparison and Evaluation of Existing Analog Circuit Simulators Using a Sigma-Delta Modulator
In the world of VLSI (very large scale integration) technology, there are many different types of circuit simulators that are used to design and predict circuit behavior before actual fabrication. In this thesis, I compared and evaluated existing circuit simulators using standard benchmark circuits. The simulators I evaluated and explored are Ngspice, Tclspice, and Winspice (open source), and Spectre® (commercial). I tested standard benchmarks with these circuit simulators and compared their outputs. The simulators are evaluated using design metrics in order to quantify their performance and identify efficient circuit simulators. In addition, I designed a sigma-delta modulator and its individual components using the analog behavioral language Verilog-A. Initially, I simulated the individual components of the sigma-delta modulator and later the whole system. Finally, CMOS (complementary metal-oxide-semiconductor) transistor-level circuits were designed for the differential amplifier, operational amplifier, and comparator of the modulator.
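The thesis models the modulator in Verilog-A; as a language-neutral illustration, here is a minimal discrete-time simulation of a first-order sigma-delta modulator in Python. The loop structure (integrator, 1-bit quantizer, feedback DAC) is the textbook form; the input signal is illustrative.

```python
import math

def sigma_delta(samples):
    """First-order discrete-time sigma-delta modulator."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:                      # input assumed in [-1, 1]
        integrator += x - feedback         # accumulate quantization error
        bit = 1 if integrator >= 0 else 0  # 1-bit quantizer (comparator)
        feedback = 1.0 if bit else -1.0    # 1-bit DAC in the feedback path
        bits.append(bit)
    return bits

# Oversampled sine input; the density of 1s tracks the input amplitude.
N = 64
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
print("".join(str(b) for b in sigma_delta(sine)))
```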
Design and Optimization of Components in a 45nm CMOS Phase Locked Loop
A novel scheme for optimizing the individual components of a phase-locked loop (PLL), which is used for stable clock generation and signal synchronization, is considered in this work. Verilog-A is used for the high-level system design of the main PLL components, followed by component-wise optimization. A design of experiments (DOE) approach to optimizing the analog, 45nm voltage-controlled oscillator (VCO) is presented. A mixed-signal analysis using the analog and digital Verilog behavior of the components is also studied. Overall, a high-level system design of a PLL, a systematic optimization of each of its components, and an analog and mixed-signal behavioral design approach have been implemented using Cadence custom IC design tools.
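By way of illustration, a behavioral PLL of the kind described (phase detector, loop filter, VCO) can be sketched as a discrete-time simulation. All frequencies, gains, and the PI loop-filter form below are assumed for the example, not taken from the thesis.

```python
import math

# Hedged behavioral sketch of a PLL locking onto a reference, mirroring the
# kind of high-level model one might write in Verilog-A before transistor-
# level design. Every parameter here is illustrative.

f_ref, f0 = 1000.0, 900.0        # reference and VCO free-running freq (Hz)
kvco, kp, ki = 500.0, 0.8, 40.0  # VCO gain (Hz/V) and PI loop-filter gains
dt = 1e-5                        # simulation time step (s)

phase_ref = phase_vco = 0.0
integ = 0.0
for step in range(20000):
    phase_ref += 2 * math.pi * f_ref * dt
    err = math.sin(phase_ref - phase_vco)   # phase detector output
    integ += err * dt
    vctrl = kp * err + ki * integ           # PI loop filter
    f_vco = f0 + kvco * vctrl               # VCO tuning law
    phase_vco += 2 * math.pi * f_vco * dt
    if step % 5000 == 0:
        print(f"t={step * dt:.3f}s  f_vco={f_vco:7.1f} Hz")
```

Running this shows f_vco pulled from 900 Hz toward the 1000 Hz reference as the loop locks, which is exactly the behavior a component-wise optimization pass would then tune.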
Enhanced Approach for the Classification of Ulcerative Colitis Severity in Colonoscopy Videos Using CNN
Ulcerative colitis (UC) is a chronic inflammatory disease characterized by periods of relapse and remission, affecting more than 500,000 people in the United States. To achieve the therapeutic goals of UC, which are to first induce and then maintain disease remission, doctors need to evaluate the severity of a patient's UC. However, it is very difficult to evaluate UC severity objectively because of the non-uniform nature of the symptoms and the large variations in their patterns. To address this, in our previous work we developed two approaches, one using image textures and the other using a CNN (convolutional neural network), to objectively measure and classify the severity of UC presented in optical colonoscopy video frames. We found that the image-texture-based approach could not handle the large number of variations in symptom patterns, and the CNN-based approach could not achieve very high accuracy. In this paper, we improve our CNN-based approach in two ways to provide better classification accuracy: we add more thorough and essential preprocessing, and we generate more classes to accommodate the large variations in symptom patterns. The experimental results show that the proposed preprocessing can improve the overall accuracy of evaluating UC severity.
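As a rough illustration of a CNN frame classifier of this kind, here is a minimal PyTorch sketch. The layer sizes, the class count of 4, and the 224x224 input resolution are assumptions for the example; the paper's actual architecture and preprocessing are not given in the abstract.

```python
import torch
import torch.nn as nn

# Hedged sketch of a small severity classifier over colonoscopy frames.
NUM_CLASSES = 4  # illustrative, e.g. finer-grained severity classes

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, NUM_CLASSES),   # 224 -> 112 -> 56 after pooling
)

# A batch standing in for frames that have already been preprocessed
# (the step the paper argues is essential for accuracy).
frames = torch.randn(8, 3, 224, 224)
logits = model(frames)
print(logits.shape)                          # torch.Size([8, 4])
```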
Enhancing Storage Dependability and Computing Energy Efficiency for Large-Scale High Performance Computing Systems
In the age of information explosion, larger-capacity disk drives are used to store data and powerful devices are used to process big data. As the scale and complexity of computer systems increase, we expect these systems to provide dependable and energy-efficient services and computation. Although hard drives are reliable in general, they are the most commonly replaced hardware components. Disk failures cause data corruption and even data loss, which can significantly degrade system performance and incur financial losses. In this dissertation research, I analyze different manifestations of disk failures in production data centers and explore data mining techniques combined with statistical analysis methods to discover categories of disk failures and their distinctive properties. I use similarity measures to quantify the degradation process of each failure type and derive its degradation signature. The derived degradation signatures are further leveraged to forecast when future disk failures may happen. This dissertation also studies the energy efficiency of high performance computers. Specifically, I characterize the power and energy consumption of the Haswell processors used in multiple supercomputers, and analyze the power and energy consumption of Legion, a data-centric programming model and runtime system, and of Legion applications. We find that power and energy efficiency can be improved significantly by optimizing processor settings and runtime scheduling, and that the Legion runtime performs well for larger-scale computation in terms of power and energy consumption.
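To illustrate the idea of matching a disk's observed degradation against per-failure-type signatures, here is a minimal sketch using cosine similarity. The feature set, the two failure categories, and all numbers are invented for the example; the dissertation's actual similarity measures and signatures are derived from production data.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative signatures: mean daily deltas of (reallocated sectors,
# pending sectors, read-error rate) for two hypothetical failure categories.
signatures = {
    "media-wear": np.array([5.0, 3.0, 0.5]),
    "sudden-electrical": np.array([0.2, 0.1, 8.0]),
}

disk_trajectory = np.array([4.1, 2.7, 0.3])   # observed daily deltas
scores = {name: cosine(disk_trajectory, sig) for name, sig in signatures.items()}
likely = max(scores, key=scores.get)
print(scores, "->", likely)                   # closest degradation signature
```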
A Language and Visual Interface to Specify Complex Spatial Pattern Mining
The emerging interest in spatial pattern mining leads to demand for a flexible spatial pattern mining language, on which an easy-to-use and easy-to-understand visual pattern language can be built. It is worthwhile to define a pattern mining language, called LCSPM, that allows users to specify complex spatial patterns. I describe the proposed pattern mining language in this paper. A visual interface that allows users to specify patterns visually is also developed. Visual pattern queries are translated into the LCSPM language by a parser, after which the data mining process can be triggered. The visual language is based on, and goes beyond, the visual language proposed in the literature. I implemented a prototype system based on the open-source JUMP framework.
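As a toy illustration of the parsing step, the sketch below maps a textual pattern expression to a structured query that a mining backend could execute. The syntax shown is hypothetical, since LCSPM's actual grammar is not given in the abstract.

```python
import re

# Hedged sketch: a stand-in pattern grammar with three spatial relations.
PATTERN = re.compile(r"(\w+)\s+(close_to|inside|overlaps)\s+(\w+)")

def parse(query):
    """Parse a toy spatial-pattern expression into a query structure."""
    m = PATTERN.fullmatch(query.strip())
    if not m:
        raise ValueError(f"unrecognized pattern: {query!r}")
    subject, relation, obj = m.groups()
    return {"subject": subject, "relation": relation, "object": obj}

# A visual query would be translated to the same structure before mining.
print(parse("school close_to park"))
# {'subject': 'school', 'relation': 'close_to', 'object': 'park'}
```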
Logic Programming Tools for Dynamic Content Generation and Internet Data Mining
The phenomenal growth of information technology requires us to elicit, store, and maintain huge volumes of data. Analyzing this data for various purposes is becoming increasingly important. Data mining consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data. We present two techniques based on logic programming tools for data mining. Data mining analyzes data by extracting patterns which describe its structure and discovers correlations in the form of rules. We distinguish analysis methods as visual and non-visual and present one application of each. We show that our focus on logic programming makes some of the very complex tasks related to Web-based data mining and dynamic content generation simple and easy to implement in a uniform framework.
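As a small illustration of discovering "correlations in the form of rules", the sketch below computes the confidence of association rules over toy transactions. The thesis implements such analyses with logic programming tools, so Python here is an explicit stand-in for the technique rather than the paper's framework.

```python
from itertools import combinations

# Toy transaction set; each set is one observation (e.g. pages visited).
transactions = [{"html", "css"}, {"html", "css", "js"},
                {"html", "js"}, {"css", "js"}]

def confidence(a, b):
    """Confidence of the rule a -> b: P(b | a) over the transactions."""
    with_a = [t for t in transactions if a in t]
    return sum(b in t for t in with_a) / len(with_a)

for a, b in combinations(sorted({"html", "css", "js"}), 2):
    print(f"{a} -> {b}: {confidence(a, b):.2f}")
    print(f"{b} -> {a}: {confidence(b, a):.2f}")
```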
Memory Management and Garbage Collection Algorithms for Java-Based Prolog
Implementing a Prolog runtime system in a language like Java, which provides its own automatic memory management and safety features such as built-in index checking and array initialization, requires a consistent approach to memory management based on a simple ultimate goal: minimizing total memory management time and the extra space involved. The total memory management time for Jinni is made up of garbage collection time both for Java and for Jinni itself. Extra space is usually requested during Jinni's garbage collection. This goal motivates us to find a simple and practical garbage collection algorithm and implementation for our Prolog engine. In this thesis we survey various algorithms already proposed and offer our own contribution to the study of garbage collection through improvements and optimizations of some classic algorithms. We implemented these algorithms based on the dynamic array algorithm for an all-dynamic Prolog engine (JINNI 2000). Comparisons of our implementations with the originally proposed algorithms allow us to draw informative conclusions about their theoretical complexity models and their empirical effectiveness.
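For concreteness, here is a minimal mark-and-sweep collector over a toy heap of cells, one of the classic algorithms such a survey covers. The heap layout and root set are invented for the example and do not reflect Jinni's actual internals.

```python
# Hedged sketch of mark-and-sweep over a cell heap; a Prolog engine's heap
# would hold term cells, with roots in registers and the trail/stacks.

heap = {  # cell id -> list of cell ids it references
    0: [1, 2], 1: [3], 2: [], 3: [0],
    4: [5], 5: [4],                      # a cycle unreachable from the roots
}
roots = [0]

def mark(roots):
    """Trace from the roots, collecting every reachable cell id."""
    marked, stack = set(), list(roots)
    while stack:
        cell = stack.pop()
        if cell not in marked:
            marked.add(cell)
            stack.extend(heap[cell])     # follow references
    return marked

def sweep(marked):
    """Reclaim every cell the mark phase did not reach."""
    for cell in list(heap):
        if cell not in marked:
            del heap[cell]

sweep(mark(roots))
print(sorted(heap))                      # [0, 1, 2, 3]; cells 4 and 5 freed
```

Note that, unlike reference counting, the trace correctly reclaims the 4-5 cycle, which is one reason tracing collectors suit Prolog's heavily shared term structures.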
Mining Biomedical Data for Hidden Relationship Discovery
With an ever-growing number of publications in the biomedical domain, it becomes likely that important implicit connections between individual concepts of biomedical knowledge are overlooked. Literature-based discovery (LBD) has been in practice for many years to identify plausible associations between previously unrelated concepts. In this paper, we present a new, completely automatic and interactive system that creates a graph-based knowledge base to capture multifaceted, complex associations among biomedical concepts. For a given pair of input concepts, our system auto-generates a list of ranked subgraphs uncovering possible previously unnoticed associations based on context information. To rank these subgraphs, we implement a novel ranking method using context information obtained by performing random walks on the graph. In addition, we enhance the system by training a neural network classifier to output the likelihood that the two concepts are related, which provides better insight to the end user.
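To illustrate random-walk scoring on a concept graph, the sketch below runs restarting walks from two input concepts and favors nodes visited heavily from both sides. The graph uses the classic fish-oil/Raynaud's example from the LBD literature; the product-of-counts scoring rule is a simplification for illustration, not the paper's ranking method.

```python
import random

# Toy undirected concept graph, stored as adjacency lists.
graph = {
    "fish oil": ["blood viscosity", "platelet aggregation"],
    "blood viscosity": ["fish oil", "Raynaud's disease"],
    "platelet aggregation": ["fish oil", "Raynaud's disease"],
    "Raynaud's disease": ["blood viscosity", "platelet aggregation"],
}

def visit_counts(start, steps=10000, restart=0.15):
    """Random walk with restart; returns how often each node is visited."""
    counts, node = {}, start
    for _ in range(steps):
        if random.random() < restart:
            node = start                 # restart keeps the walk local
        node = random.choice(graph[node])
        counts[node] = counts.get(node, 0) + 1
    return counts

a = visit_counts("fish oil")
b = visit_counts("Raynaud's disease")
endpoints = {"fish oil", "Raynaud's disease"}
bridges = {n: a.get(n, 0) * b.get(n, 0) for n in graph if n not in endpoints}
print(max(bridges, key=bridges.get))     # a likely bridging concept
```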
Modeling and Reduction of Gate Leakage During Behavioral Synthesis of Nanoscale CMOS Circuits
The major sources of power dissipation in a nanometer CMOS circuit are capacitive switching, short-circuit current, static leakage, and gate oxide tunneling. With the aggressive scaling of technology, the gate oxide direct tunneling current (gate leakage) is emerging as a prominent component of power dissipation. For sub-65 nm CMOS technology, where the gate oxide (SiO2) thickness is very low, direct tunneling is the major form of tunneling. This thesis makes two contributions: analytical modeling of behavioral-level components for direct tunneling current and propagation delay, and the reduction of tunneling current during behavioral synthesis. Gate oxides of multiple thicknesses are useful in reducing gate leakage dissipation. Analytical models, derived from first principles, for calculating the tunneling current and propagation delay of behavioral-level components are presented and backed by BSIM4/5 models and SPICE simulations. These components are characterized for 45 nm technology, and an algorithm is provided for scheduling datapath operations such that the overall tunneling current dissipation of the datapath circuit under design is minimal. Because the oxide thicknesses considered are very low, they may not remain constant during the course of fabrication; hence the algorithm takes process variation into consideration. Extensive experiments conducted on various behavioral-level benchmarks under various constraints show significant reductions, as high as 75.3% (with an average of 64.3%).
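As a sketch of the scheduling idea, the following assigns the thicker-oxide (slower but lower-leakage) variant to operations whose timing slack can absorb the delay penalty. The slack values, delay penalty, and savings figure are invented for illustration, and the thesis's actual algorithm also models process variation, which this sketch omits.

```python
# Hedged sketch of slack-driven dual oxide-thickness assignment.
# Numbers are illustrative, not 45 nm characterization data.

ops = [  # (operation, slack in ns) from an assumed earlier scheduling pass
    ("mul1", 0.0), ("add1", 1.2), ("add2", 0.4), ("mul2", 0.9),
]
DELAY_PENALTY = 0.5   # extra delay of a thick-oxide unit (ns), assumed

assignment, leakage_saved = {}, 0.0
for name, slack in ops:
    if slack >= DELAY_PENALTY:           # thick oxide fits inside the slack
        assignment[name] = "thick-oxide"
        leakage_saved += 1.0             # illustrative per-operation saving
    else:
        assignment[name] = "thin-oxide"  # on or near the critical path
print(assignment, f"relative leakage saved: {leakage_saved}")
```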
A Multi-Variate Analysis of SMTP Paths and Relays to Restrict Spam and Phishing Attacks in Emails
The classifier discussed in this thesis considers the path traversed by an email (instead of its content) and the reputation of the relays, features inaccessible to spammers. Groups of spammers and the individual behaviors of a spammer in a given domain were analyzed to yield association patterns, which were then used to identify similar spammers. Using the analysis results, unsolicited and phishing emails were successfully isolated from legitimate emails. Spammers and phishers are further categorized into serial spammers/phishers, recent spammers/phishers, prospective spammers/phishers, and suspects. Legitimate emails and trusted domains are classified into socially close (family members, friends), socially distant (strangers, etc.), and opt-outs (resolved false positives and false negatives). Overall, this classifier produced far fewer false positives than current filters like SpamAssassin, achieving a precision of 98.65%, which is comparable to the precision achieved by SPF and DNSRBL blacklists.
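For illustration, the sketch below recovers the relay path from an email's Received headers and scores it against a relay reputation table. The message, the reputation values, and the weakest-link scoring are toy assumptions, not the thesis's classifier.

```python
import email
import re

# Toy message with two Received headers (most recent relay appears first).
raw = """Received: from relay2.example.net (relay2.example.net [203.0.113.9])
Received: from relay1.example.org (relay1.example.org [198.51.100.7])
From: someone@example.org
Subject: hello

body
"""

msg = email.message_from_string(raw)
# Received headers are prepended by each relay, so reversing them yields
# origin-to-destination order.
path = [re.search(r"from\s+(\S+)", h).group(1)
        for h in reversed(msg.get_all("Received", []))]
print(path)   # ['relay1.example.org', 'relay2.example.net']

# An assumed reputation table over relays, learned from past labeled mail;
# the path is scored by its least reputable relay.
reputation = {"relay1.example.org": 0.9, "relay2.example.net": 0.4}
print(min(reputation.get(relay, 0.5) for relay in path))
```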
A Netcentric Scientific Research Repository
The Internet and networks in general have become essential tools for disseminating information. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric scientific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Furthermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis operations into a repository. This technique allows the system to leverage the large number of existing tools by wrapping them with a Web service and registering the service with the repository. Metadata associated with Web services was enhanced to allow this feature to be included. In addition, an improved binary to text encoding scheme was developed to reduce the size overhead for sending large scientific data files via XML messages used in Web services. (3) Integrated image analysis operations with SQL. This technique allows for images to be stored and managed conveniently in a relational database. SQL supplemented with map algebra operations is used to select and perform operations on sets of images.
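As a baseline illustration of the encoding concern in contribution (2), the sketch below embeds binary data in an XML message using standard base64, which carries roughly 33% size overhead; the thesis's improved encoding scheme is not reproduced here.

```python
import base64

# Stand-in for a binary scientific image file.
payload = bytes(range(256))

# base64 is the conventional binary-to-text baseline for XML payloads.
encoded = base64.b64encode(payload).decode("ascii")
xml = f"<imageData encoding='base64'>{encoded}</imageData>"

print(len(payload), "->", len(encoded), "encoded bytes")  # ~4/3 expansion
assert base64.b64decode(encoded) == payload               # lossless round trip
```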
Parallel Analysis of Aspect-Based Sentiment Summarization from Online Big-Data
Consumers' opinions and sentiments about products can reflect the performance of the products in general or in various aspects. Analyzing these data is becoming feasible, considering the availability of immense data and the power of natural language processing. However, retailers have not taken full advantage of online comments. This work is dedicated to a solution for automatically analyzing and summarizing these valuable data at both the product and the category level. In this research, a system was developed to retrieve and analyze extensive data from public online resources. A parallel framework was created to make this system extensible and efficient. In this framework, a star topological network was adopted in which each computing unit was assigned to retrieve a fraction of the data and to assess sentiment. Finally, the preprocessed data were collected and summarized by the central machine, which generates the final result that can be rendered through a web interface. The system was designed for sound performance, robustness, manageability, extensibility, and accuracy.
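A minimal sketch of the scatter/gather pattern follows, using a multiprocessing pool as the spokes of the star and the parent process as the hub. The lexicon-based scorer is a toy stand-in for the system's sentiment analysis.

```python
from multiprocessing import Pool

# Toy sentiment lexicon; the real system's NLP is far richer.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def score(comment):
    """Score one comment: positive hits minus negative hits."""
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

if __name__ == "__main__":
    comments = ["Great battery life", "Poor screen, bad colors",
                "I love the camera", "Good value"]
    with Pool(processes=2) as pool:          # the "spokes" of the star
        scores = pool.map(score, comments)   # scatter work, gather results
    print(sum(scores) / len(scores))         # central summary at the "hub"
```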
A Performance and Security Analysis of Elliptic Curve Cryptography Based Real-Time Media Encryption
This dissertation emphasizes the security aspects of real-time media. The problems of existing real-time media protections are identified in this research, and viable solutions are proposed. First, the security of real-time media depends on the Secure Real-time Transport Protocol (SRTP) mechanism. We identified drawbacks of existing SRTP systems, whose symmetric-key encryption schemes can be exploited by attackers. Elliptic Curve Cryptography (ECC), an asymmetric-key cryptography scheme, is proposed to resolve these problems. Second, the ECC encryption scheme is based on elliptic curves. This dissertation explores the weaknesses of a widely used elliptic curve in terms of security and describes a more secure elliptic curve suitable for real-time media protection. Eighteen elliptic curves were tested in a real-time video transmission system, and fifteen elliptic curves were tested in a real-time audio transmission system. Based on the performance, the X9.62 standard 256-bit prime curve, the NIST-recommended 256-bit prime curves, and the Brainpool 256-bit prime curves were found to be suitable for real-time audio encryption. Likewise, the X9.62 standard 256-bit prime and 272-bit binary curves and the NIST-recommended 256-bit prime curves were found to be suitable for real-time video encryption. The weaknesses of the NIST-recommended elliptic curves are discussed, and a more secure new elliptic curve is proposed which can be used for real-time media encryption. The proposed curve fulfills all relevant security criteria, whereas the corresponding NIST curve fails two such criteria. This research is applicable to strengthening the security of Internet of Things (IoT) devices, especially VoIP cameras. IoT devices have resource constraints and thus need lightweight encryption schemes; ECC can be a better option for these devices. VoIP cameras use a methodology similar to traditional real-time video transmission, so this research could be useful in finding a better security solution for these devices.
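For concreteness, here is a sketch of deriving a session key with ECDH on a 256-bit prime curve using the Python cryptography package. The curve choice (SECP256R1, a NIST P-256 curve), the HKDF parameters, and the key length are illustrative assumptions, not the dissertation's exact construction or its proposed curve.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Each endpoint generates an ephemeral key pair on a 256-bit prime curve.
alice = ec.generate_private_key(ec.SECP256R1())
bob = ec.generate_private_key(ec.SECP256R1())

# ECDH: each side combines its private key with the peer's public key.
shared_a = alice.exchange(ec.ECDH(), bob.public_key())
shared_b = bob.exchange(ec.ECDH(), alice.public_key())
assert shared_a == shared_b           # both ends derive the same secret

# Derive a 256-bit session key (e.g. for SRTP-style media encryption).
session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                   salt=None, info=b"srtp-session").derive(shared_a)
print(session_key.hex())
```

The point of the asymmetric exchange is that no long-lived symmetric key ever crosses the network, which addresses the SRTP key-distribution drawback the dissertation identifies.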
Procedural Content Creation and Technologies for 3D Graphics Applications and Games
The recent transformation of consumer graphics (CG) cards into powerful 3D rendering processors is due in large measure to the success of game developers in delivering mass-market entertainment software featuring highly immersive and captivating virtual environments. Despite this success, 3D CG application development is becoming increasingly handicapped by the inability of traditional content creation methods to keep up with the demand for content. The term content is used here to refer to any data operated on by application code that is meant for viewing, including 3D models, textures, animation sequences, and maps or other data-intensive descriptions of virtual environments. Traditionally, content has been handcrafted by humans. A serious problem facing the interactive graphics software development community is how to increase the rate at which content can be produced to keep up with the increasingly rapid pace at which software for interactive applications can now be developed. Research addressing this problem centers on procedural content creation systems. By moving away from purely human content creation toward systems in which humans play a substantially less time-intensive but no less creative part in the process, procedural content creation opens new doors. Qualitatively, such systems do not eliminate human intervention; rather, they depend heavily on direction from a human in order to synthesize the desired content. This research draws heavily on the entertainment software domain, but it is broadly relevant to 3D graphics applications in general.
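As a small example of procedural synthesis under human direction, here is 1-D midpoint-displacement terrain generation, a classic technique in this area. The endpoints, roughness, and recursion depth are exactly the kind of creative "direction" a human supplies while the algorithm synthesizes the detail.

```python
import random

def midpoint_displace(left, right, roughness, depth):
    """Recursively displace midpoints to synthesize a height profile."""
    if depth == 0:
        return [left, right]
    mid = (left + right) / 2 + random.uniform(-roughness, roughness)
    a = midpoint_displace(left, mid, roughness / 2, depth - 1)
    b = midpoint_displace(mid, right, roughness / 2, depth - 1)
    return a + b[1:]                    # avoid duplicating the shared midpoint

# Human direction: flat endpoints, moderate roughness, five subdivisions.
heights = midpoint_displace(0.0, 0.0, roughness=8.0, depth=5)
print(" ".join(f"{h:5.1f}" for h in heights))   # 33 terrain height samples
```

Halving the roughness at each level is what gives the result its natural, fractal character; a 2-D version of the same idea (the diamond-square algorithm) produces full height maps.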
Revealing the Positive Meaning of a Negation
Negation is a complex phenomenon present in all human languages, allowing for the uniquely human capacities of denial, contradiction, misrepresentation, lying, and irony. It is in the first place a phenomenon of semantic opposition. Sentences containing negation are generally (a) less informative than affirmative ones, (b) morphosyntactically more marked (all languages have negative markers while only a few have affirmative markers), and (c) psychologically more complex and harder to process. Negation often conveys positive meaning, ranging from implicatures to entailments. In this dissertation, I develop a system to reveal the underlying positive interpretation of negation. I first identify which words are intended to be negated (i.e., the focus of negation), and second, I rewrite those tokens to generate an actual positive interpretation. I identify the focus of negation by scoring probable foci along a continuous scale. One of the obstacles to exploring focus scoring is that no public datasets exist for this task; thus, to study this problem I create new corpora. The corpora contain verbal, nominal, and adjectival negations and their potential positive interpretations, along with scores ranging from 1 to 5. I then use supervised learning models to score the focus of negation. To rewrite the focus of negation with its positive interpretation, I work with negations from Simple Wikipedia, automatically generate potential positive interpretations, and then collect manual annotations that effectively rewrite the negation in positive terms. This procedure yields positive interpretations for approximately 77% of negations, and the final corpus includes over 5,700 negations and over 5,900 positive interpretations. I then train sequence-to-sequence neural models and provide baseline results.
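To make the focus-scoring step concrete, here is a toy sketch of a supervised regressor over candidate-token features, producing scores on a continuous scale. The features, training pairs, and example sentence are all invented; the dissertation's actual feature set and models are not given in the abstract.

```python
from sklearn.linear_model import Ridge

# Toy features per candidate token:
# (is_verb, is_adjective_or_object, distance_from_negation_marker)
X = [[1, 0, 1], [0, 1, 2], [0, 0, 3],
     [1, 0, 2], [0, 1, 1]]
y = [2.0, 4.5, 1.0, 2.5, 4.0]   # annotated focus scores on the 1-5 scale

model = Ridge().fit(X, y)        # supervised scoring model

# "The car is not red": score each candidate focus token.
candidates = {"is": [1, 0, 1], "red": [0, 1, 1], "car": [0, 0, 2]}
scores = {tok: float(model.predict([feats])[0])
          for tok, feats in candidates.items()}
print(scores)
print(max(scores, key=scores.get))   # the most probable focus ("red")
```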
Spatial Partitioning Algorithms for Solving Location-Allocation Problems
This dissertation presents spatial partitioning algorithms for solving location-allocation problems. Location-allocation problems pertain to both the selection of facilities to serve demand at demand points and the assignment of demand points to the selected or known facilities. In the first part of this dissertation, we focus on the well-known and well-researched p-median problem, a distance-based location-allocation problem that involves the selection and allocation of p facilities for n demand points. We evaluate the performance of existing p-median heuristic algorithms and investigate the impact of the scale of the problem and of the spatial distribution of demand points on the performance of these algorithms. Based on the results of this comparative study, we present guidelines to aid location analysts in selecting the best heuristic and corresponding parameters for the problem at hand. Additionally, we found that existing heuristic algorithms are not suitable for solving large-scale p-median problems in a reasonable amount of time. We present a density-based decomposition methodology to solve large-scale p-median problems efficiently. This algorithm identifies dense clusters in the region and uses a MapReduce procedure to select facilities in the clustered regions independently and combine the solutions from the subproblems. Lastly, we present a novel greedy heuristic algorithm to solve the contiguity-constrained fixed-facility demand distribution problem. The objective of this problem is to create contiguous service areas for the facilities such that the demand at all facilities is uniform or proportional to the available resources, while the distance between demand points and facilities is minimized. The results in this research are shown in the context of creating emergency response plans for bio-emergencies. The algorithms are used to select Point of Dispensing (POD) locations (if not known) and map them to population regions to ensure that all affected individuals are …
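For illustration, here is a simple greedy heuristic for the p-median problem that repeatedly adds the candidate facility yielding the largest drop in total demand-to-facility distance. The dissertation evaluates established heuristics and a decomposition method; this sketch only shows the problem structure on toy coordinates.

```python
def dist(a, b):
    """Manhattan distance between two points; a stand-in metric."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_p_median(points, candidates, p):
    """Greedily pick p facilities minimizing total assignment distance."""
    chosen = []
    for _ in range(p):
        best = min(
            (c for c in candidates if c not in chosen),
            key=lambda c: sum(min(dist(pt, f) for f in chosen + [c])
                              for pt in points),
        )
        chosen.append(best)
    return chosen

# Toy demand points; candidate facility sites colocated with demand.
demand = [(0, 0), (1, 2), (8, 8), (9, 7), (5, 5)]
print(greedy_p_median(demand, candidates=demand, p=2))
```

Each demand point is implicitly allocated to its nearest chosen facility, which is the "allocation" half of the location-allocation problem.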
SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction
Current unsupervised approaches to keyphrase extraction compute a single importance score for each candidate word by considering the number and quality of its associated words in the graph, and they are not flexible enough to incorporate multiple types of information. For instance, nodes in a network may exhibit diverse connectivity patterns that are not captured by graph-based ranking methods. To address this, we present a new approach to keyphrase extraction that represents the document as a word graph and exploits its structure to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that our model, which uses phrase graph representations in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
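The graph-based ranking baseline the first sentence refers to can be sketched as PageRank-style iteration over a word co-occurrence graph. The toy text, window size of 2, and damping constant of 0.85 below are illustrative; this is the baseline being contrasted with, not the SurfKE model itself.

```python
# Build a word co-occurrence graph and rank words by iterative scoring.
text = ("graph based feature learning for keyphrase extraction builds a "
        "word graph and learns feature representations of graph nodes")
words = text.split()

# Undirected co-occurrence edges within a sliding window of 2.
edges = {}
for i, w in enumerate(words):
    for v in words[max(0, i - 2):i]:
        if v != w:
            edges.setdefault(w, set()).add(v)
            edges.setdefault(v, set()).add(w)

# PageRank-style power iteration: a word is important if its neighbors are.
scores = {w: 1.0 for w in edges}
for _ in range(30):
    scores = {w: 0.15 + 0.85 * sum(scores[v] / len(edges[v]) for v in nbrs)
              for w, nbrs in edges.items()}
print(sorted(scores, key=scores.get, reverse=True)[:3])   # top candidates
```

This baseline collapses each node to one scalar score, which is exactly the inflexibility the abstract argues against.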
Using Reinforcement Learning in Partial Order Plan Space
Partial order planning is an important approach that solves planning problems without completely specifying the orderings between the actions in the plan. This property provides greater flexibility in executing plans, making partial order planners a preferred choice over other planning methodologies. However, in order to find partially ordered plans, partial order planners search in plan space rather than in the space of world states, and an uninformed search in plan space leads to poor efficiency. In this thesis, I discuss applying a reinforcement learning method, the first-visit Monte Carlo method, to partial order planning in order to design agents that need no training data or heuristics yet are still able to make informed decisions in plan space based on experience. Communicating effectively with the agent is crucial in reinforcement learning. I describe how this was accomplished in plan space and present results from an evaluation on a blocks-world test bed.
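A minimal sketch of first-visit Monte Carlo value estimation follows, with a toy chain task standing in for decisions in plan space; the episode generator and the reward of -1 per step are invented for the example, not taken from the thesis.

```python
import random
from collections import defaultdict

def run_episode(policy):
    """One toy episode: refine from state 0 until the goal state 5."""
    episode, state = [], 0
    while state < 5:
        step = 1 if policy(state) == "advance" else random.choice([0, 1])
        episode.append((state, -1.0))   # each refinement step costs 1
        state += step
    return episode

values, counts = defaultdict(float), defaultdict(int)
policy = lambda s: random.choice(["advance", "stall"])  # uninformed agent

for _ in range(5000):
    episode = run_episode(policy)
    seen = set()
    for i, (state, _) in enumerate(episode):
        if state not in seen:                     # first-visit rule
            seen.add(state)
            ret = sum(r for _, r in episode[i:])  # undiscounted return
            counts[state] += 1
            values[state] += (ret - values[state]) / counts[state]

print({s: round(v, 2) for s, v in sorted(values.items())})
```

The learned values (more negative for states farther from the goal) are exactly the experience-derived signal that lets an agent prefer promising refinements without hand-built heuristics.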