Privacy Management for Online Social Networks
One in seven people in the world use online social networking for a variety of purposes: to keep in touch with friends and family, to share special occasions, to broadcast announcements, and more. The majority of society has bought into this new era of communication technology, which allows everyone on the internet to share information with friends. Since social networking has rapidly become a main form of communication, holes in privacy have become apparent. It has come to the point that the whole concept of sharing information requires restructuring. No longer are online social networks simply technology available for a niche market; they are in use by all of society. Thus it is important not to forget that a sense of privacy is inherent as an evolutionary by-product of social intelligence. In any context of society, privacy needs to be a part of the system in order to help users protect themselves from others. This dissertation attempts to address the lack of privacy management in online social networks by designing models which understand the social science behind how we form social groups and share information with each other. Social relationship strength was modeled using activity patterns, vocabulary usage, and behavioral patterns. In addition, automatic configuration for default privacy settings was proposed to help prevent new users from leaking personal information. This dissertation aims to mobilize a new era of social networking that understands the social aspects of human networks and uses that knowledge to honor users' privacy. digital.library.unt.edu/ark:/67531/metadc283816/
Efficient Algorithms and Framework for Bandwidth Allocation, Quality-of-Service Provisioning and Location Management in Mobile Wireless Computing
The fusion of computers and communications has promised to herald the age of the information superhighway over high-speed communication networks, where the ultimate goal is to enable a multitude of users at any place to access information from anywhere and at any time. This, in a nutshell, is the goal envisioned by the Personal Communication Services (PCS) and Xerox's ubiquitous computing. Despite the remarkable growth in the number of mobile communication users in the last few years, the radio frequency spectrum allocated by the FCC (Federal Communications Commission) to this service is still very limited, and the usable bandwidth is far less than the expected demand, particularly in view of the emergence of next-generation wireless multimedia applications such as video-on-demand, WWW browsing, and traveler information systems. Proper management of the available spectrum is necessary not only to accommodate these high-bandwidth applications, but also to alleviate problems due to sudden explosions of traffic in so-called hot cells. In this dissertation, we have developed simple load balancing techniques to cope with the problem of tele-traffic overloads in one or more hot cells in the system. The objective is to ease the high channel demand in hot cells by borrowing channels from suitable cold cells and by proper assignment (or re-assignment) of the channels among the users. We also investigate possible ways of improving system capacity by rescheduling bandwidth in the case of wireless multimedia traffic. In our proposed scheme, traffic using multiple channels releases one or more channels to increase the carried traffic or throughput in the system. Two orthogonal QoS parameters, called carried traffic and bandwidth degradation, are identified, and a cost function describing the total revenue earned by the system from a bandwidth degradation and call admission policy is formulated.
A channel sharing scheme is proposed for co-existing real-time and non-real-time traffic and analyzed using a Markov modulated Poisson process (MMPP) based queueing model. The location management problem in mobile computing deals with the problem of a combined management of location updates and paging in the network, both of which consume scarce network resources like bandwidth, CPU cycles etc. An easily implementable location update scheme is developed which considers per-user mobility pattern on top of the conventional location area based approach and computes an update strategy for each user by minimizing the average location management cost. The cost optimization problem is elegantly solved using a genetic algorithm. digital.library.unt.edu/ark:/67531/metadc278885/
A Unifying Version Model for Objects and Schema in Object-Oriented Database System
A number of different versioning models have been proposed. The research in this area can be divided into two categories: object versioning and schema versioning. In this dissertation, both problem domains are considered as a single unit. This dissertation describes a unifying version model (UVM) for maintaining changes to both objects and schema. UVM handles schema versioning operations by using object versioning techniques. The result is that the UVM allows the OODBMS to be much smaller than previous systems. Also, programmers need to know only one set of versioning operations, thus reducing the learning time by half. This dissertation shows that UVM is a simple but semantically sound and powerful version model for both objects and schema. digital.library.unt.edu/ark:/67531/metadc279222/
Symplectic Integration of Nonseparable Hamiltonian Systems
Numerical methods are usually necessary in solving Hamiltonian systems since there is often no closed-form solution. By utilizing a general property of Hamiltonians, namely the symplectic property, all of the qualities of the system may be preserved for indefinitely long integration times because all of the integral (Poincaré) invariants are conserved. This allows for more reliable results and frequently leads to significantly shorter execution times as compared to conventional methods. Most of the numerical tests focus on the resonant triad Hamiltonian with one degree of freedom because of its difficult nature and because analytical results exist against which useful comparisons can be made. digital.library.unt.edu/ark:/67531/metadc278485/
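To illustrate the idea of a symplectic integrator for a nonseparable Hamiltonian, the following is a minimal sketch using the implicit midpoint rule, a standard symplectic method. It is not taken from the dissertation: the test Hamiltonian H(q, p) = (q^2 + 1)(p^2 + 1)/2, the step size, the initial state, and all function names are assumptions chosen for illustration, and the explicit Euler method is included only as a non-symplectic point of comparison.

```python
def hamiltonian(q, p):
    # Hypothetical nonseparable test Hamiltonian (not the resonant triad).
    return 0.5 * (q * q + 1.0) * (p * p + 1.0)


def grad_hamiltonian(q, p):
    # Returns (dH/dq, dH/dp) for the Hamiltonian above.
    return q * (p * p + 1.0), p * (q * q + 1.0)


def implicit_midpoint_step(grad_h, q, p, h, tol=1e-13, max_iter=100):
    """One implicit midpoint step for dq/dt = dH/dp, dp/dt = -dH/dq.

    The implicit equation is solved by fixed-point iteration.
    """
    q_new, p_new = q, p  # initial guess: the previous state
    for _ in range(max_iter):
        q_mid, p_mid = 0.5 * (q + q_new), 0.5 * (p + p_new)
        dh_dq, dh_dp = grad_h(q_mid, p_mid)
        q_next, p_next = q + h * dh_dp, p - h * dh_dq
        converged = abs(q_next - q_new) + abs(p_next - p_new) < tol
        q_new, p_new = q_next, p_next
        if converged:
            break
    return q_new, p_new


def explicit_euler_step(grad_h, q, p, h):
    """One explicit Euler step (not symplectic), for comparison only."""
    dh_dq, dh_dp = grad_h(q, p)
    return q + h * dh_dp, p - h * dh_dq


def energy_drift(step, q, p, h, n_steps):
    """Absolute change in H after n_steps of the given one-step method."""
    e0 = hamiltonian(q, p)
    for _ in range(n_steps):
        q, p = step(grad_hamiltonian, q, p, h)
    return abs(hamiltonian(q, p) - e0)


midpoint_drift = energy_drift(implicit_midpoint_step, 0.5, 0.5, 0.01, 5000)
euler_drift = energy_drift(explicit_euler_step, 0.5, 0.5, 0.01, 5000)
print("implicit midpoint drift:", midpoint_drift)
print("explicit Euler drift:   ", euler_drift)
```

In this sketch the energy error of the midpoint rule stays small and bounded, while the explicit Euler error accumulates over the run, which is the qualitative behavior the abstract attributes to symplectic methods.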
Intrinsic and Extrinsic Adaptation in a Simulated Combat Environment
Genetic algorithm and artificial life techniques are applied to the development of challenging and interesting opponents in a combat-based computer game. Computer simulations are carried out against an idealized human player to gather data on the effectiveness of the computer generated opponents. digital.library.unt.edu/ark:/67531/metadc278231/
Computational Complexity of Hopfield Networks
There are three main results in this dissertation: PLS-completeness of discrete Hopfield network convergence under eight different restrictions (degree 3, bipartite and degree 3, 8-neighbor mesh, dual of the knight's graph, hypercube, butterfly, cube-connected cycles, and shuffle-exchange); exponential convergence behavior of discrete Hopfield networks; and simulation of Turing machines by discrete Hopfield networks. digital.library.unt.edu/ark:/67531/metadc278272/
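For readers unfamiliar with what "convergence" means here, the following is a minimal sketch of a discrete Hopfield network updated one node at a time until no node wants to change state. It assumes symmetric weights with a zero diagonal and states in {-1, +1}; the function name, the tie-breaking rule, and the four-node example are illustrative assumptions, not material from the dissertation. The returned flip count is the quantity that can grow exponentially in the worst case.

```python
def hopfield_converge(weights, state, thresholds=None):
    """Run asynchronous updates until a stable state is reached.

    weights: symmetric matrix (list of lists) with zero diagonal.
    state: list of +1/-1 values.
    Returns (stable_state, number_of_flips).
    """
    n = len(state)
    if thresholds is None:
        thresholds = [0] * n
    state = list(state)
    flips = 0
    changed = True
    while changed:
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n)) - thresholds[i]
            new_value = 1 if field >= 0 else -1  # ties resolve to +1
            if new_value != state[i]:
                state[i] = new_value
                flips += 1
                changed = True
    return state, flips


# Hypothetical example: store one pattern with a Hebbian outer product.
pattern = [1, -1, 1, -1]
weights = [[0 if i == j else pattern[i] * pattern[j] for j in range(4)]
           for i in range(4)]

# Start from the pattern with its last bit corrupted.
stable, flips = hopfield_converge(weights, [1, -1, 1, 1])
print(stable, flips)  # prints [1, -1, 1, -1] 1
```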
Exon/Intron Discrimination Using the Finite Induction Pattern Matching Technique
DNA sequence analysis involves precise discrimination of two of the sequence's most important components: exons and introns. Exons encode the proteins that are responsible for almost all the functions in a living organism. Introns interrupt the sequence coding for a protein and must be removed from primary RNA transcripts before translation to protein can occur. A pattern recognition technique called Finite Induction (FI) is utilized to study the language of exons and introns. FI is especially suited for analyzing and classifying large amounts of data representing sequences of interest. It requires no biological information and employs no statistical functions. Finite Induction is applied to the exon and intron components of DNA by building a collection of rules based upon what it finds in the sequences it examines. It then attempts to match the known rule patterns with new rules formed as a result of analyzing a new sequence. A high number of matches predicts a probable close relationship between the two sequences; a low number of matches signifies a large difference between the two. This research demonstrates FI to be a viable tool for measurement when known patterns are available for the formation of rule sets. digital.library.unt.edu/ark:/67531/metadc277629/
Modeling Alcohol Consumption Using Blog Data
How do the content and writing style of people who drink alcoholic beverages stand out from those of non-drinkers? How much information can we learn about a person's alcohol consumption behavior by reading text that they have authored? This thesis attempts to extend the methods deployed in authorship attribution and authorship profiling research into the domain of automatically identifying the human action of drinking alcoholic beverages. I examine how a psycholinguistics dictionary (the Linguistic Inquiry and Word Count lexicon, developed by James Pennebaker), together with Kenneth Burke's concept of words as symbols of human action and James Wertsch's concept of mediated action, provides a framework for analyzing meaningful data patterns from the content of blogs written by consumers of alcoholic beverages. The contributions of this thesis to the research field are twofold. First, I show that it is possible to automatically identify blog posts that have content related to the consumption of alcoholic beverages. Second, I provide a framework and tools to model human behavior through text analysis of blog data. digital.library.unt.edu/ark:/67531/metadc271843/
Optimizing Non-pharmaceutical Interventions Using Multi-coaffiliation Networks
Computational modeling is of fundamental significance in mapping possible disease spread and designing strategies for its mitigation. Conventional contact networks implement the simulation of interactions as random occurrences, presenting public health bodies with a difficult trade-off between realistic model granularity and robust design of intervention strategies. Recently, researchers have been investigating the use of agent-based models (ABMs) to embrace the complexity of real-world interactions. At the same time, theoretical approaches provide epidemiologists with general optimization models in which demographics are intrinsically simplified. The emerging study of affiliation networks and co-affiliation networks provides an alternative to this trade-off. Co-affiliation networks maintain the realism innate to ABMs while reducing the complexity of contact networks into distinctively smaller k-partite graphs, where each partition represents a dimension of the social model. This dissertation studies the optimization of intervention strategies for infectious diseases, mainly distributed in school systems. First, concepts of synthetic populations and affiliation networks are extended to propose a modified algorithm for the synthetic reconstruction of populations. Second, the definition of multi-coaffiliation networks is presented as the main social model in which risk is quantified and evaluated, thereby obtaining vulnerability indications for each school in the system. Finally, maximization of the mitigation coverage and minimization of the overall cost of intervention strategies are proposed and compared, based on centrality measures. digital.library.unt.edu/ark:/67531/metadc271860/
3D Reconstruction Using Lidar and Visual Images
In this research, multi-perspective image registration using LiDAR and visual images was considered. 2D-3D image registration is a difficult task because it requires the extraction of different semantic features from each modality. This problem is solved in three parts. The first step involves detection and extraction of common features from each of the data sets. The second step consists of associating the common features between the two modalities. Traditional methods use lines or orthogonal corners as common features. The third step consists of building the projection matrix. Many existing methods use a global positioning system (GPS) or inertial navigation system (INS) for an initial estimate of the camera pose. However, the approach discussed herein does not use GPS, INS, or any such devices for the initial estimate; hence the model can be used in places like the lunar surface or Mars, where GPS or INS are not available. A variation of the method is also described, which does not require strong features from both images but rather uses intensity gradients in the image. This can be useful when one image does not have strong features (such as lines) or when there are too many extraneous features. digital.library.unt.edu/ark:/67531/metadc177193/
Automated Classification of Emotions Using Song Lyrics
This thesis explores the classification of emotions in song lyrics, using automatic approaches applied to a novel corpus of 100 popular songs. I use crowdsourcing via Amazon Mechanical Turk to collect line-level emotion annotations for this collection of song lyrics. I then build classifiers that rely on textual features to automatically identify the presence of one or more of the following six Ekman emotions: anger, disgust, fear, joy, sadness, and surprise. I compare different classification systems and evaluate the performance of the automatic systems against the manual annotations. I also introduce a system that uses data collected from the social network Twitter. I use the Twitter API to collect a large corpus of tweets manually labeled by their authors for one of the six emotions of interest. I then compare the classification of emotions obtained when training on data automatically collected from Twitter versus data obtained through crowdsourced annotations. digital.library.unt.edu/ark:/67531/metadc177253/
A Programming Language For Concurrent Processing
This thesis is a proposed solution to the problem of including an effective interrupt mechanism in the set of concurrent-processing primitives of a block-structured programming language or system. The proposed solution is presented in the form of a programming language definition and model. The language is called TRIPLE. digital.library.unt.edu/ark:/67531/metadc164005/
Multi-perspective, Multi-modal Image Registration and Fusion
Multi-modal image fusion is an active research area with many civilian and military applications. Fusion is defined as the strategic combination of information collected by various sensors from different locations or of different types in order to obtain a better understanding of an observed scene or situation. Fusion of multi-modal images cannot be completed unless the two modalities are spatially aligned. In this research, I consider two important problems: multi-modal, multi-perspective image registration and decision-level fusion of multi-modal images, in particular LiDAR and visual imagery. Multi-modal image registration is a difficult task due to the different semantic interpretation of features extracted from each modality. This problem is decoupled into three sub-problems. The first step is identification and extraction of common features. The second step is the determination of corresponding points. The third step consists of determining the registration transformation parameters. Traditional registration methods use low-level features such as lines and corners. Using these features requires an extensive optimization search in order to determine the corresponding points. Many methods use global positioning systems (GPS) and a calibrated camera in order to obtain an initial estimate of the camera parameters. The advantages of this work over previous work are the following. First, I use high-level features, which significantly reduce the search space for the optimization process. Second, the determination of corresponding points is modeled as an assignment problem between a small number of objects. In addition, fusing LiDAR and visual images is beneficial because of the different and rich characteristics of both modalities. LiDAR data contain 3D information, while images contain visual information. Developing a fusion technique that uses the characteristics of both modalities is very important.
I establish a decision-level fusion technique using manifold models. digital.library.unt.edu/ark:/67531/metadc149562/
A Smooth-turn Mobility Model for Airborne Networks
In this article, I introduce a novel airborne network mobility model, called the Smooth Turn Mobility Model, that captures the correlation of acceleration for airborne vehicles across time and spatial coordinates. Effective routing in airborne networks (ANs) relies on suitable mobility models that capture the random movement pattern of airborne vehicles. As airborne vehicles cannot make sharp turns as easily as ground vehicles do, the widely used mobility models for Mobile Ad Hoc Networks, such as the Random Waypoint and Random Direction models, fail. Our model is realistic in capturing the tendency of airborne vehicles toward making straight trajectories and smooth turns with large radii, while remaining simple enough for tractable connectivity analysis and routing design. digital.library.unt.edu/ark:/67531/metadc149603/
Automatic Tagging of Communication Data
Globally distributed software teams are widespread throughout industry. But finding reliable methods that can properly assess a team's activities is a real challenge. Methods such as surveys and manual coding of activities are too time consuming and are often unreliable. Recent advances in information retrieval and linguistics, however, suggest that automated and/or semi-automated text classification algorithms could be an effective way of finding differences in the communication patterns among individuals and groups. Communication among group members is frequent and generates a significant amount of data. Thus having a web-based tool that can automatically analyze the communication patterns among global software teams could lead to a better understanding of group performance. The goal of this thesis, therefore, is to compare automatic and semi-automatic measures of communication and evaluate their effectiveness in classifying different types of group activities that occur within a global software development project. In order to achieve this goal, we developed a web-based component that can be used to help clean and classify communication activities. The component was then used to compare different automated text classification techniques on various group activities to determine their effectiveness in correctly classifying data from a global software development team project. digital.library.unt.edu/ark:/67531/metadc149611/
Rapid Prototyping and Design of a Fast Random Number Generator
Information in the form of online multimedia, bank accounts, or password usage for diverse applications needs some form of security. The core feature of many security systems is the generation of true random or pseudorandom numbers. Hence reliable generators of such numbers are indispensable. The fundamental hurdle is that digital computers cannot generate truly random numbers because the states and transitions of digital systems are well understood and predictable. Nothing in a digital computer happens truly randomly. Digital computers are sequential machines that hold a current state and move to the next state in a deterministic fashion. To generate any secure hash or encrypted word, a random number is needed. But since computers are not random, pseudorandom sequences are commonly used. These sequences are produced by algorithms that generate a pattern of values that appear to be random but after some time start repeating. This thesis implements a digital random number generator using MATLAB, FPGA prototyping, and custom silicon design. This random number generator is able to use a truly random CMOS source to generate the random number. Statistical benchmarks are used to test the results and to show that the design works. Thus the proposed random number generator will be useful for online encryption and security. digital.library.unt.edu/ark:/67531/metadc115040/
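The claim that algorithmic sequences eventually repeat can be illustrated with a small sketch. It uses a linear congruential generator, which is a textbook pseudorandom generator and is not the design described in this thesis; the function name and the parameter values are assumptions picked so that the cycle is short enough to see.

```python
def lcg_period(a, c, m, seed):
    """Cycle length of the sequence x -> (a * x + c) % m starting at seed."""
    first_seen = {}
    x, step = seed % m, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    # The value x was first produced at first_seen[x] and again at step.
    return step - first_seen[x]


# Well-chosen parameters visit all 16 states before repeating.
print(lcg_period(a=5, c=3, m=16, seed=7))  # prints 16
# Poorly chosen parameters repeat after only 4 values.
print(lcg_period(a=3, c=0, m=16, seed=1))  # prints 4
```

Because the generator has only finitely many states and each state determines the next, a repeat is unavoidable; a hardware entropy source such as the one the thesis describes is one way around that limitation.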
Rapid Prototyping and Design of a Fast Random Number Generator
Information in the form of online multimedia, bank accounts, or password usage for diverse applications needs some form of security. The core feature of many security systems is the generation of true random or pseudorandom numbers. Hence reliable generators of such numbers are indispensable. The fundamental hurdle is that digital computers cannot generate truly random numbers because the states and transitions of digital systems are well understood and predictable. Nothing in a digital computer happens truly randomly. Digital computers are sequential machines that hold a current state and move to the next state in a deterministic fashion. To generate any secure hash or encrypted word, a random number is needed. But since computers are not random, pseudorandom sequences are commonly used. These sequences are produced by algorithms that generate a pattern of values that appear to be random but after some time start repeating. This thesis implements a digital random number generator using MATLAB, FPGA prototyping, and custom silicon design. This random number generator is able to use a truly random CMOS source to generate the random number. Statistical benchmarks are used to test the results and to show that the design works. Thus the proposed random number generator will be useful for online encryption and security. digital.library.unt.edu/ark:/67531/metadc115036/
A Global Stochastic Modeling Framework to Simulate and Visualize Epidemics
Epidemics have caused major human and monetary losses through the course of human civilization. It is very important that epidemiologists and public health personnel are prepared to handle an impending infectious disease outbreak. Ever-changing demographics, evolving infrastructural resources of geographic regions, and emerging and re-emerging diseases compel the use of simulation to predict disease dynamics. By means of simulation, public health personnel and epidemiologists can predict the disease dynamics, the population groups at risk, and their geographic locations beforehand, so that they are prepared to respond in case of an epidemic outbreak. As a consequence of the large numbers of individuals and inter-personal interactions involved in simulating infectious disease spread in a region such as a county, sizeable amounts of data may be produced that have to be analyzed. Methods to visualize this data would help people from diverse disciplines understand and analyze the simulation. This thesis proposes a framework to simulate and visualize the spread of an infectious disease in the population of a region such as a county. As real-world populations have a non-homogeneous demographic and spatial distribution, this framework models the spread of an infectious disease based on the population of, and geographic distance between, census blocks and on social behavioral parameters for demographic groups. The population is stratified into demographic groups in individual census blocks using census data. Infection spread is modeled by means of local and global contacts generated between groups of population in census blocks. The strength and likelihood of the contacts are based on the population, geographic distance, and social behavioral parameters of the groups involved. The disease dynamics are represented on a geographic map of the region using a heat map representation, where the intensity of infection is mapped to a color scale.
This framework provides a tool for public health personnel and epidemiologists to run what-if analyses on disease spread in specific populations and plan for epidemic response. By means of demographic stratification of the population and incorporation of geographic distance and social behavioral parameters into the modeling of the outbreak, this framework takes into account non-homogeneity in the demographic mix and spatial distribution of the population. Generation of contacts per population group instead of per individual contributes to lowering computational overhead. Heat map representation of the intensity of infection provides an intuitive way to visualize the disease dynamics. digital.library.unt.edu/ark:/67531/metadc115099/
Cuff-less Blood Pressure Measurement Using a Smart Phone
Access: Use of this item is restricted to the UNT Community.
Blood pressure is vital sign information that physicians often need as preliminary data for immediate intervention during emergency situations or for regular monitoring of people with cardiovascular diseases. Despite the availability of portable blood pressure meters on the market, they are not regularly carried by people, creating a need for an ultra-portable measurement platform or device that can be easily carried and used at all times. One such device is the smartphone, which, according to a comScore survey, is used by 26.2% of the US adult population. The mass production of these phones with built-in sensors and high computation power has created numerous possibilities for application development in different domains, including the biomedical domain. Motivated by this capability and their extensive usage, this thesis focuses on developing a blood pressure measurement platform on smartphones. Specifically, I developed a blood pressure measurement system on a smartphone using the built-in camera and a customized external microphone. The system first obtains heartbeats using the microphone and the finger pulse using the camera, and then calculates the blood pressure from the recorded data. I developed techniques for finding the best location for obtaining the data, making the system usable by all categories of people. The proposed system achieved accuracies between 90% and 100% when compared to traditional blood pressure meters. The second part of this thesis presents a new system for remote heartbeat monitoring using the smartphone. With the proposed system, heartbeats can be transferred live by patients and monitored by physicians remotely for diagnosis. The proposed blood pressure measurement and remote monitoring systems will be able to facilitate information acquisition and decision making by 9-1-1 operators. digital.library.unt.edu/ark:/67531/metadc115102/
GPS CaPPture: a System for GPS Trajectory Collection, Processing, and Destination Prediction
In the United States, smartphone ownership surpassed 69.5 million in February 2011, with a large portion of those users (20%) downloading applications (apps) that enhance the usability of a device by adding functionality. A large percentage of apps are written specifically to utilize the geographical position of a mobile device. One of the prime factors in developing location prediction models is the use of historical data to train such a model. With larger sets of training data, prediction algorithms become more accurate; however, the use of historical data can quickly become a downfall if the GPS stream is not collected or processed correctly. Inaccurate, incomplete, or even improperly interpreted historical data can make it impossible to develop accurately performing prediction algorithms. As GPS chipsets become standard in the ever-increasing number of mobile devices, the opportunity for the collection of GPS data increases remarkably. The goal of this study is to build a comprehensive system that addresses the following challenges: (1) collection of GPS data streams in a manner such that the data is highly usable and contains fewer errors; (2) processing and reduction of the collected data in order to prepare it and make it highly usable for the creation of prediction algorithms; (3) creation of prediction/labeling algorithms at such a level that they are viable for commercial use. This study identifies the key research problems toward building the CaPPture (collection, processing, prediction) system. digital.library.unt.edu/ark:/67531/metadc115089/
The Design Of A Benchmark For Geo-stream Management Systems
The recent growth in sensor technology allows easier information gathering in real time as sensors have grown smaller, more accurate, and less expensive. The resulting data is often in a geo-stream format: continuously changing input with a spatial extent. Researchers developing geo-streaming management systems (GSMS) require a benchmark system for evaluation, which is currently lacking. This thesis presents GSMark, a benchmark for evaluating GSMSs. GSMark provides a data generator that creates a combination of synthetic and real geo-streaming data, a workload simulator to present the data to the GSMS as a data stream, and a set of benchmark queries that evaluate typical GSMS functionality and query performance. In particular, GSMark generates both moving points and evolving spatial regions, two fundamental data types for a broad range of geo-stream applications, and the geo-streaming queries on this data. digital.library.unt.edu/ark:/67531/metadc103392/
Arithmetic Computations and Memory Management Using a Binary Tree Encoding of Natural Numbers
Two applications of a binary tree data type based on a simple pairing function (a bijection between natural numbers and pairs of natural numbers) are explored. First, the tree is used to encode natural numbers, and algorithms that perform basic arithmetic computations are presented along with formal proofs of their correctness. Second, using this "canonical" representation as a base type, algorithms for encoding and decoding additional isomorphic data types of other mathematical constructs (sets, sequences, etc.) are also developed. An experimental application to a memory management system is constructed and explored using these isomorphic types. A practical analysis of this system's runtime complexity and space savings is provided, along with a proof-of-concept framework for both applications of the binary tree type, in the Java programming language. digital.library.unt.edu/ark:/67531/metadc103323/
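As a rough illustration of how a pairing function can induce a binary tree encoding of natural numbers, the sketch below uses one well-known bijection: every n >= 1 can be written uniquely as 2^x (2y + 1). The abstract does not say which pairing function the thesis uses, so this choice, the tuple representation of trees, and the function names are all assumptions. The sketch is in Python rather than the Java of the thesis's own framework.

```python
def unpair(n):
    """Split n >= 1 into (x, y) such that n == 2**x * (2*y + 1)."""
    x = 0
    while n % 2 == 0:
        n //= 2
        x += 1
    return x, (n - 1) // 2


def pair(x, y):
    """Inverse of unpair: combine x, y >= 0 into a number n >= 1."""
    return 2 ** x * (2 * y + 1)


def to_tree(n):
    """Encode a natural number as a binary tree of nested tuples.

    0 is the empty tree (); n >= 1 is a node (left, right) built
    from the two components of unpair(n).
    """
    if n == 0:
        return ()
    x, y = unpair(n)
    return (to_tree(x), to_tree(y))


def from_tree(tree):
    """Decode a nested-tuple binary tree back into its natural number."""
    if tree == ():
        return 0
    left, right = tree
    return pair(from_tree(left), from_tree(right))


print(to_tree(1))  # prints ((), ())
print(to_tree(2))  # prints (((), ()), ())
print(from_tree(to_tree(2012)))  # prints 2012
```

Because both components of unpair(n) are strictly smaller than n, the recursion terminates, and each number corresponds to exactly one tree under this assumed encoding.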
Investigating the Extractive Summarization of Literary Novels
Due to the vast amount of information we are faced with, summarization has become a critical necessity of everyday human life. Given that a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others, the focus of natural language processing techniques to date has been on the automation of methods targeting short documents. We are witnessing a change, however: an increasingly large number of books are becoming available in electronic format. This means that the need for language processing techniques able to handle very large documents such as books is becoming increasingly important. This thesis addresses the problem of summarization of novels, which are long and complex literary narratives. While there is a significant body of research that has been carried out on the task of automatic text summarization, most of this work has been concerned with the summarization of short documents, with a particular focus on news stories. However, novels are different in both length and genre, and consequently different summarization techniques are required. This thesis attempts to close this gap by analyzing a new domain for summarization, and by building unsupervised and supervised systems that effectively take into account the properties of long documents and outperform the traditional extractive summarization systems that typically address the news genre. digital.library.unt.edu/ark:/67531/metadc103298/
Measuring Semantic Relatedness Using Salient Encyclopedic Concepts
While pragmatics, through its integration of situational awareness and real-world relevant knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other hand, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in literature has revolved around the famous quote "You shall know a word by the company it keeps". In this thesis we investigate the role of context constituents in decoding the semantic meaning of the surrounding context; specifically, we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues to an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to infer semantic relatedness between textual pairs, whether they are words, sentences, or paragraphs. Moreover, we explore the abstraction of semantics across languages and use our findings to build a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized synthetic datasets for semantic relatedness as well as real-world applications such as paraphrase detection and short answer grading. Our work represents a novel approach to integrating world knowledge into current semantic models and a means to cross the language boundary for a better and more robust semantic relatedness representation, thus opening the door for an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines. digital.library.unt.edu/ark:/67531/metadc84212/
Toward a Data-Type-Based Real Time Geospatial Data Stream Management System
The advent of sensory and communication technologies enables the generation and consumption of large volumes of streaming data. Many of these data streams are geo-referenced. Existing spatio-temporal databases and data stream management systems are not capable of handling real time queries on spatial extents. In this thesis, we investigated several fundamental research issues toward building a data-type-based real time geospatial data stream management system. The thesis makes contributions in the following areas: geo-stream data models, aggregation, window-based nearest neighbor operators, and query optimization strategies. The proposed geo-stream data model is based on second-order logic and multi-typed algebra. Both abstract and discrete data models are proposed and exemplified. I further propose two useful geo-stream operators, namely Region By and WNN, which abstract common aggregation and nearest neighbor queries as generalized data model constructs. Finally, I propose three query optimization algorithms based on spatial, temporal, and spatio-temporal constraints of geo-streams. I show the effectiveness of the data model through many query examples. The effectiveness and the efficiency of the algorithms are validated through extensive experiments on both synthetic and real data sets. This work established the fundamental building blocks toward a full-fledged geo-stream database management system and has potential impact in many applications such as hazard weather alerting and monitoring, traffic analysis, and environmental modeling. digital.library.unt.edu/ark:/67531/metadc68070/
A Wireless Traffic Surveillance System Using Video Analytics
Video surveillance systems have been commonly used in transportation systems to support traffic monitoring, speed estimation, and incident detection. However, there are several challenges in developing and deploying such systems, including high development and maintenance costs, bandwidth bottleneck for long range link, and lack of advanced analytics. In this thesis, I leverage current wireless, video camera, and analytics technologies, and present a wireless traffic monitoring system. I first present an overview of the system. Then I describe the site investigation and several test links with different hardware/software configurations to demonstrate the effectiveness of the system. The system development process was documented to provide guidelines for future development. Furthermore, I propose a novel speed-estimation analytics algorithm that takes into consideration roads with slope angles. I prove the correctness of the algorithm theoretically, and validate the effectiveness of the algorithm experimentally. The experimental results on both synthetic and real dataset show that the algorithm is more accurate than the baseline algorithm 80% of the time. On average the accuracy improvement of speed estimation is over 3.7% even for very small slope angles. digital.library.unt.edu/ark:/67531/metadc68005/
Graph-Based Keyphrase Extraction Using Wikipedia
Keyphrases describe a document in a coherent and simple way, giving the prospective reader a way to quickly determine whether the document satisfies their information needs. Given the huge amount of information on the Web, and with only a small fraction of documents having keyphrases assigned, there is a definite need for automatic keyphrase extraction systems. Typically, a document written by a human develops around one or more general concepts or sub-concepts. These concepts or sub-concepts should be structured and semantically related to each other, so that they can form a meaningful representation of the document. Considering that the phrases or concepts in a document are related to each other, a new approach for keyphrase extraction is introduced that exploits the semantic relations in the document. For measuring the semantic relations between concepts or sub-concepts in the document, I present a comprehensive study aimed at using collaboratively constructed semantic resources like Wikipedia and its link structure. In particular, I introduce a graph-based keyphrase extraction system that exploits the semantic relations in the document and features such as term frequency. I evaluated the proposed system using novel measures and the results obtained compare favorably with previously published results on established benchmarks. digital.library.unt.edu/ark:/67531/metadc67939/
Techniques for Improving Uniformity in Direct Mapped Caches
Direct-mapped caches are an attractive option for processor designers as they combine fast lookup times with reduced complexity and area. However, direct-mapped caches are prone to higher miss rates because there are no alternative candidates for replacement on a cache miss: the data residing in the indexed cache set has to be evicted to the next-level cache. Another issue that inhibits cache performance is the non-uniformity of accesses exhibited by most applications: some sets are under-utilized while others receive the majority of accesses. This implies that increasing the size of caches may not lead to proportionally improved cache hit rates. Several solutions that address cache non-uniformity have been proposed in the literature over the past decade, and each proposal independently claims the benefit of reduced conflict misses. However, because the published results use different benchmarks and different experimental setups, there is no established frame of reference for comparing them. In this work we report a side-by-side comparison of these techniques. Finally, we propose an Adaptive-Partitioned cache for multi-threaded applications. This design limits inter-thread thrashing while dynamically reducing traffic to heavily accessed sets. digital.library.unt.edu/ark:/67531/metadc68025/
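The conflict-miss and non-uniformity behavior described in this abstract can be illustrated with a minimal sketch. It is not the thesis's simulator: the function names, the 64-byte block size, the 8-set cache, and the address trace are all assumptions chosen for illustration.

```python
from collections import Counter


def set_index(address: int, block_size: int, num_sets: int) -> int:
    """Map a byte address to its (only) set in a direct-mapped cache."""
    return (address // block_size) % num_sets


def simulate(addresses, block_size=64, num_sets=8):
    """Return (miss count, per-set access counts) for a direct-mapped cache."""
    resident = {}          # set index -> tag of the block currently stored there
    per_set = Counter()    # how often each set is touched
    misses = 0
    for addr in addresses:
        block = addr // block_size
        idx = block % num_sets
        tag = block // num_sets
        per_set[idx] += 1
        if resident.get(idx) != tag:
            # No alternative way to choose from: the resident block is evicted.
            misses += 1
            resident[idx] = tag
    return misses, per_set


# Hypothetical trace: addresses 0 and 512 both map to set 0 and keep evicting
# each other, while the other sets are barely used.
trace = [0, 512, 0, 512, 64]
misses, per_set = simulate(trace)
```

In this trace every access misses even though only three distinct blocks are touched, and four of the five accesses land in set 0, which is the kind of skew the surveyed techniques try to reduce.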
Measuring Vital Signs Using Smart Phones
Smart phones today have become increasingly popular with the general public for their diverse abilities, such as navigation, social networking, and multimedia facilities, to name a few. These phones are equipped with high-end processors, high-resolution cameras, and built-in sensors such as an accelerometer, orientation sensor, light sensor, and much more. According to a comScore survey, 25.3% of US adults use smart phones in their daily lives. Motivated by the capability of smart phones and their extensive usage, I focused on utilizing them for bio-medical applications. In this thesis, I present a new application for a smart phone to quantify vital signs such as heart rate, respiratory rate and blood pressure with the help of its built-in sensors. Using the camera and a microphone, I have shown how the blood pressure and heart rate can be determined for a subject. People sometimes encounter minor situations like fainting or fatal accidents like a car crash at unexpected times and places. It would be useful to have a device which can measure all vital signs in such an event. The second part of this thesis demonstrates a new mode of communication for next generation 9-1-1 calls. In this new architecture, the call-taker will be able to control the multimedia elements in the phone from a remote location. This would help the call-taker or first responder to have better control over the situation. Transmission of the vital signs measured using the smart phone can be a life saver in critical situations. In today's voice-oriented 9-1-1 calls, the dispatcher first collects critical information (e.g., location, call-back number) from the caller, and assesses the situation. Meanwhile, the dispatchers constantly face a "60-second dilemma"; i.e., within 60 seconds, they need to make a complicated but important decision: whether to dispatch and, if so, what to dispatch. The dispatchers often feel that they lack sufficient information to make a confident dispatch decision. The remote media control described in this system will be able to facilitate information acquisition and decision-making in emergency situations within the 60-second response window in 9-1-1 calls using new multimedia technologies. digital.library.unt.edu/ark:/67531/metadc33139/
Anchor Nodes Placement for Effective Passive Localization
Access: Use of this item is restricted to the UNT Community.
Wireless sensor networks are composed of sensor nodes, which can monitor an environment and observe events of interest. These networks are applied in various fields including but not limited to environmental, industrial and habitat monitoring. In many applications, the exact location of the sensor nodes is unknown after deployment. Localization is a process used to find a sensor node's positional coordinates, which is vital information. Localization is generally assisted by anchor nodes, which are also sensor nodes but with known locations. Anchor nodes are generally expensive and need to be optimally placed for effective localization. Passive localization is one of the localization techniques, in which the sensor nodes silently listen to global events like thunder sounds, seismic waves, lightning, etc. According to previous studies, the ideal location to place anchor nodes was on the perimeter of the sensor network. This may not be the case in passive localization, since the function of anchor nodes here is different from that of the anchor nodes used in other localization systems. I conduct extensive studies on positioning anchor nodes for effective localization. Several simulations are run in dense and sparse networks for proper positioning of anchor nodes. I show that, for effective passive localization, the optimal placement of the anchor nodes is at the center of the network in such a way that no three anchor nodes are collinear. The greater the non-linearity, the better the localization. The localization for our network design proves better when I place anchor nodes at right angles. digital.library.unt.edu/ark:/67531/metadc33132/
A Framework for Analyzing and Optimizing Regional Bio-Emergency Response Plans
The presence of naturally occurring and man-made public health threats necessitates the design and implementation of mitigation strategies, such that adequate response is provided in a timely manner. Since multiple variables, such as geographic properties, resource constraints, and government-mandated time-frames must be accounted for, computational methods provide the necessary tools to develop contingency response plans while respecting underlying data and assumptions. A typical response scenario involves the placement of points of dispensing (PODs) in the affected geographic region to supply vaccines or medications to the general public. Computational tools aid in the analysis of such response plans, as well as in the strategic placement of PODs, such that feasible response scenarios can be developed. Due to the sensitivity of bio-emergency response plans, geographic information, such as POD locations, must be kept confidential. The generation of synthetic geographic regions allows for the development of emergency response plans on non-sensitive data, as well as for the study of the effects of single geographic parameters. Further, synthetic representations of geographic regions allow for results to be published and evaluated by the scientific community. This dissertation presents methodology for the analysis of bio-emergency response plans, methods for plan optimization, as well as methodology for the generation of synthetic geographic regions. digital.library.unt.edu/ark:/67531/metadc33200/
Elicitation of Protein-Protein Interactions from Biomedical Literature Using Association Rule Discovery
Extracting information from a stack of data is a tedious task and the scenario is no different in proteomics. Volumes of research papers are published about the study of various proteins in several species, their interactions with other proteins, and the identification of proteins as possible biomarkers in causing diseases. It is a challenging task for biologists to keep track of these developments manually by reading through the literature. Several tools have been developed by computational linguists to assist identification, extraction and hypothesis generation of proteins and protein-protein interactions from biomedical publications and protein databases. However, they are confronted with the challenges of term variation, term ambiguity, access only to abstracts, and inconsistencies in time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to attenuate these challenges by extracting protein-protein interactions in humans and eliciting possible interactions using association rule mining on full text, abstracts and figure captions from publicly available biomedical literature databases. Two such databases are used in our study: Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built using articles based on search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) is cross-referenced to validate discovered interactive pairs. A set of an optimal size of possible binary protein-protein interactions is generated to be made available for clinical or biological validation. A significant change in the number of new associations was found by altering the thresholds for support and confidence metrics. This study reduces the limitations biologists face in keeping pace with the discovery of protein-protein interactions by manually reading the literature, and their need to validate each and every possible interaction. digital.library.unt.edu/ark:/67531/metadc30508/
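The support and confidence thresholds mentioned in this abstract are the standard association-rule metrics. The sketch below shows how they could be computed over sets of protein mentions; the protein names "P1" to "P3", the toy corpus, and the function names are hypothetical and are not taken from the thesis.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical corpus: each set holds the proteins mentioned together in one
# sentence, abstract, or figure caption.
mentions = [{"P1", "P2"}, {"P1", "P2", "P3"}, {"P1"}, {"P3"}]

pair_support = support(mentions, {"P1", "P2"})          # 2 of 4 units
rule_confidence = confidence(mentions, {"P1"}, {"P2"})  # 2 of the 3 units with P1
```

Raising the minimum support or confidence prunes candidate pairs, which is consistent with the abstract's observation that the number of new associations changes with the thresholds.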
Rhythms of Interaction in Global Software Development Teams
Researchers have speculated that global software teams have activity patterns that are dictated by work-place schedules or a client's need. Similar patterns have been suggested for individuals enrolled in distance learning projects that require students to post feedback in response to questions or assignments. Researchers tend to accept the notion that students' temporal patterns adjust to academic or social calendars and are a result of choices made within these constraints. Although there is some evidence that culture does have an impact on communication activity behavior, it is not clear how each of these factors may relate to work done in online groups. This particular study represents a new approach to studying student-group communication activities and also pursues an alternative approach by using activity data from students participating in a global software development project to generate a variety of complex measures that capture patterns about when students work. Students' work habits are also often determined by where they live and what they are working on. Moreover, students tend to work on group projects in cycles, which correspond to a start, middle, and end time period. Knowledge obtained from this study should provide insight into current empirical research on global software development by defining the different time variables that can also be used to compare temporal patterns found in real-world teams. It should also inform studies about student team projects by helping instructors schedule group activities. digital.library.unt.edu/ark:/67531/metadc30476/
Socioscope: Human Relationship and Behavior Analysis in Mobile Social Networks
Access: Use of this item is restricted to the UNT Community.
The widely used mobile phone, as well as its related technologies, has opened opportunities for a complete change in how people interact and build relationships across geographic and time constraints. The convenience of instant communication by mobile phones, which has broken the barrier of space and time, is evidently the key reason why such technologies are so important in people's lives and daily activities. Mobile phones have become the most popular communication tools. Mobile phone technology is apparently changing our relationship to each other in our work and lives. The impact of new technologies on people's lives in social spaces gives us the chance to rethink the possibilities of technologies in social interaction. Accordingly, mobile phones are changing social relations in ways that are difficult to measure with any precision. In this dissertation I propose a socioscope model for social network, relationship and human behavior analysis based on mobile phone call detail records. Because of the diversity and complexity of human social behavior, a single technique cannot detect the different features of human social behavior. Therefore I use multiple probability and statistical methods for quantifying social groups, relationships and communication patterns, for predicting social tie strengths, and for detecting human behavior changes and unusual consumption events. I propose a new reciprocity index to measure the level of reciprocity between users and their communication partners. The experimental results show that this approach is effective. Among other applications, this work is useful for homeland security, detection of unwanted calls (e.g., spam), telecommunication presence, and marketing. In my future work I plan to analyze and study social network dynamics and evolution. digital.library.unt.edu/ark:/67531/metadc30533/
Design and Implementation of Large-Scale Wireless Sensor Networks for Environmental Monitoring Applications
Environmental monitoring represents a major application domain for wireless sensor networks (WSN). However, despite significant advances in recent years, there are still many challenging issues to be addressed to exploit the full potential of the emerging WSN technology. In this dissertation, we introduce the design and implementation of low-power wireless sensor networks for long-term, autonomous, and near-real-time environmental monitoring applications. We have developed an out-of-box solution consisting of a suite of software, protocols and algorithms to provide reliable data collection with extremely low power consumption. Two wireless sensor networks based on the proposed solution have been deployed in remote field stations to monitor soil moisture along with other environmental parameters. As parts of the ever-growing environmental monitoring cyberinfrastructure, these networks have been integrated into the Texas Environmental Observatory system for long-term operation. Environmental measurement and network performance results are presented to demonstrate the capability, reliability and energy-efficiency of the network. digital.library.unt.edu/ark:/67531/metadc28493/
Survey of Approximation Algorithms for Set Cover Problem
In this thesis, I survey 11 approximation algorithms for the unweighted set cover problem. I have also implemented the three algorithms and created a software library that stores the code I have written. The algorithms I survey are: 1. Johnson's standard greedy; 2. f-frequency greedy; 3. Goldschmidt, Hochbaum and Yu's modified greedy; 4. Halldorsson's local optimization; 5. Duh and Fürer's semi-local optimization; 6. Asaf Levin's improvement to Duh and Fürer; 7. Simple rounding; 8. Randomized rounding; 9. LP duality; 10. Primal-dual schema; and 11. Network flow technique. Most of the algorithms surveyed are refinements of the standard greedy algorithm. digital.library.unt.edu/ark:/67531/metadc12118/
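As a point of reference for the first algorithm in the list, the standard greedy heuristic can be sketched in a few lines. This is a textbook version, not the code from the thesis's library; the function name and the example instance are assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Standard greedy heuristic: repeatedly pick the subset that covers the
    most still-uncovered elements. Returns the indices of the chosen subsets."""
    subsets = [set(s) for s in subsets]
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the subset with the largest gain; ties go to the lowest index.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance: cover {1, ..., 5} with four candidate subsets.
cover = greedy_set_cover(range(1, 6), [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
```

On this instance the heuristic first takes {1, 2, 3} (gain 3) and then {4, 5} (gain 2), so the cover uses two subsets.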
Urban surface characterization using LiDAR and aerial imagery.
Many calamities in history, such as hurricanes, tornadoes and flooding, are proof of the large-scale impact such events have on life and the economy. Computer simulation and GIS help in modeling a real-world scenario, which assists in evacuation planning, damage assessment, assistance and reconstruction. Computer simulation and modeling require accurate classification of ground objects. One of the most significant aspects of this research is that it achieves improved classification for regions within which light detection and ranging (LiDAR) has low spatial resolution. This thesis describes a method for accurate classification of bare ground, water bodies, roads, vegetation, and structures using LiDAR data and aerial infrared imagery. The most basic step for any terrain modeling application is filtering, which is the classification of ground and non-ground points. We present an integrated systematic method that makes classification of terrain and non-terrain points effective. Our filtering method uses the geometric features of the triangle meshes created from LiDAR samples and calculates a confidence for every point. Geometrically homogeneous blocks and confidence are derived from the TIN model and gridded LiDAR samples. The results from the two representations are used in a classifier to determine whether a block belongs to the ground or not. Another important step is detection of water bodies, which is based on the LiDAR sample density of the region. Objects like trees and bare ground are characterized by the geometric features present in the LiDAR data and the color features in the infrared imagery. These features are fed into an SVM classifier which detects bare ground in the given region. Similarly, trees are extracted using another trained SVM classifier. Once we obtain bare ground and trees, roads are extracted by removing the bare ground. Structures are identified by the properties of non-ground segments. 
Experiments were conducted using LiDAR samples and infrared imagery from the city of New Orleans. We evaluated the influence of different parameters on the classification. Water bodies were extracted successfully using density measures. Experiments showed that fusion of geometric properties and confidence levels resulted in efficient classification of ground and non-ground regions. Classification of vegetation using SVM was promising and effective using features like height variation, HSV, angle, etc. It is demonstrated that our methods successfully classified the region by using LiDAR data in a complex urban area with high-rise buildings. digital.library.unt.edu/ark:/67531/metadc12196/
End of Insertion Detection in Colonoscopy Videos
Colorectal cancer is the second leading cause of cancer-related deaths, behind lung cancer, in the United States. Colonoscopy is the preferred screening method for detection of diseases like colorectal cancer. In 2006, the American Society for Gastrointestinal Endoscopy (ASGE) and the American College of Gastroenterology (ACG) issued guidelines for quality colonoscopy. The guidelines suggest that on average the withdrawal phase during a screening colonoscopy should last a minimum of 6 minutes. My aim is to classify the colonoscopy video into insertion and withdrawal phases. The problem is that existing shot detection techniques cannot be applied because a colonoscopy is a single camera shot from start to end. An algorithm to detect the phase boundary has already been developed by the MIGLAB team. The existing method has acceptable levels of accuracy, but the main issue is its dependency on MPEG (Moving Pictures Expert Group) 1/2. I implemented exhaustive search for motion estimation to reduce the execution time and improve the accuracy. I took advantage of the C/C++ programming languages with multithreading, which helped achieve even better performance in terms of execution time. I propose a method for improving the current method of colonoscopy video analysis and also an extension to make it usable for real-time videos. The real-time version I implemented is capable of handling streams coming directly from the camera in the form of uncompressed bitmap frames. The existing implementation could not be applied to a real-time scenario because of its dependency on MPEG 1/2. Future directions of this research include improved motion search and GPU parallel computing techniques. digital.library.unt.edu/ark:/67531/metadc12159/
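Exhaustive-search motion estimation, as named in this abstract, is commonly done by block matching: every candidate displacement in a search window is tried and the one with the lowest matching cost is kept. The sketch below assumes a sum-of-absolute-differences cost, tiny grayscale frames stored as nested lists, and made-up function and parameter names; the thesis's C/C++ implementation may differ in all of these.

```python
def exhaustive_search(prev, curr, x, y, block=2, radius=1):
    """Find the displacement (dx, dy) within `radius` pixels that best matches
    the `block` x `block` patch of `curr` at (x, y) against the frame `prev`.
    Returns (dx, dy, cost), where cost is the sum of absolute differences."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            px, py = x + dx, y + dy
            # Skip candidates whose block would fall outside the previous frame.
            if px < 0 or py < 0 or px + block > width or py + block > height:
                continue
            cost = sum(
                abs(curr[y + j][x + i] - prev[py + j][px + i])
                for j in range(block)
                for i in range(block)
            )
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Hypothetical frames: a 2x2 pattern moves from (0, 0) to (1, 1).
prev_frame = [[9, 8, 0, 0], [7, 6, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
curr_frame = [[0, 0, 0, 0], [0, 9, 8, 0], [0, 7, 6, 0], [0, 0, 0, 0]]
motion = exhaustive_search(prev_frame, curr_frame, x=1, y=1)
```

Because it operates on raw pixel values, this kind of search does not depend on motion vectors stored in an MPEG stream, which fits the abstract's goal of handling uncompressed bitmap frames.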
Cross Language Information Retrieval for Languages with Scarce Resources
Our generation has experienced one of the most dramatic changes in how society communicates. Today, we have online information on almost any imaginable topic. However, most of this information is available in only a few dozen languages. In this thesis, I explore the use of parallel texts to enable cross-language information retrieval (CLIR) for languages with scarce resources. To build the parallel text I use the Bible. I evaluate different variables and their impact on the resulting CLIR system, specifically: (1) the CLIR results when using different amounts of parallel text; (2) the role of paraphrasing on the quality of the CLIR output; (3) the impact on accuracy when translating the query versus translating the collection of documents; and finally (4) how the results are affected by the use of different dialects. The results show that all these variables have a direct impact on the quality of the CLIR system. digital.library.unt.edu/ark:/67531/metadc12157/
Force-Directed Graph Drawing and Aesthetics Measurement in a Non-Strict Pure Functional Programming Language
Non-strict pure functional programming often requires redesigning algorithms and data structures to work more effectively under new constraints of non-strict evaluation and immutable state. Graph drawing algorithms, while numerous and broadly studied, have no presence in the non-strict pure functional programming model. Additionally, there is currently no freely licensed standalone toolkit used to quantitatively analyze aesthetics of graph drawings. This thesis addresses two previously unexplored questions. Can a force-directed graph drawing algorithm be implemented in a non-strict functional language, such as Haskell, and still be practically usable? Can an easily extensible aesthetic measuring tool be implemented in a language such as Haskell and still be practically usable? The focus of the thesis is on implementing one of the simplest force-directed algorithms, that of Fruchterman and Reingold, and comparing its resulting aesthetics to those of a well-known C++ implementation of the same algorithm. digital.library.unt.edu/ark:/67531/metadc12125/
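For readers unfamiliar with the Fruchterman and Reingold algorithm mentioned here, the following is a minimal sketch of its main loop. It is written in Python rather than Haskell and is not the thesis's implementation; the function name, the linear cooling schedule, the frame size, and the starting positions are assumptions.

```python
import math


def fruchterman_reingold(positions, edges, width=1.0, height=1.0, iterations=50):
    """Force-directed layout: all vertex pairs repel, edges attract, and a
    cooling temperature caps how far a vertex may move per iteration."""
    pos = {v: list(p) for v, p in positions.items()}
    nodes = list(pos)
    k = math.sqrt(width * height / len(nodes))  # ideal edge length
    temperature = width / 10
    cooling = temperature / (iterations + 1)
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive force k^2 / d between every pair of vertices.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Attractive force d^2 / k along every edge.
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each vertex, limited by the temperature, and keep it in the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temperature)
                pos[v][0] += dx / d * step
                pos[v][1] += dy / d * step
            pos[v][0] = min(width, max(0.0, pos[v][0]))
            pos[v][1] = min(height, max(0.0, pos[v][1]))
        temperature -= cooling
    return {v: tuple(p) for v, p in pos.items()}


# Hypothetical three-vertex path a - b - c with fixed starting positions.
layout = fruchterman_reingold(
    {"a": (0.1, 0.5), "b": (0.9, 0.5), "c": (0.5, 0.1)},
    [("a", "b"), ("b", "c")],
)
```

The sketch mutates per-vertex displacement accumulators in place. In a pure functional setting such as Haskell, that state would have to be rebuilt each iteration (for example with a fold over an immutable map), which is the kind of redesign the abstract refers to.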
Computational Epidemiology - Analyzing Exposure Risk: A Deterministic, Agent-Based Approach
Many infectious diseases are spread through interactions between susceptible and infectious individuals. Keeping track of where each exposure to the disease took place, when it took place, and which individuals were involved in the exposure can give public health officials important information that they may use to formulate their interventions. Further, knowing which individuals in the population are at the highest risk of becoming infected with the disease may prove to be a useful tool for public health officials trying to curtail the spread of the disease. Epidemiological models are needed to allow epidemiologists to study the population dynamics of transmission of infectious agents and the potential impact of infectious disease control programs. While many agent-based computational epidemiological models exist in the literature, they focus on the spread of disease rather than exposure risk. These models are designed to simulate very large populations, representing individuals as agents, and using random experiments and probabilities in an attempt to more realistically guide the course of the modeled disease outbreak. The work presented in this thesis focuses on tracking exposure risk to chickenpox in an elementary school setting. This setting is chosen due to the high level of detailed information realistically available to school administrators regarding individuals' schedules and movements. Using an agent-based approach, contacts between individuals are tracked and analyzed with respect to both individuals and locations. The results are then analyzed using a combination of tools from computer science and geographic information science. digital.library.unt.edu/ark:/67531/metadc11017/
Development, Implementation, and Analysis of a Contact Model for an Infectious Disease
With growing concern about infectious diseases spreading in a population, epidemiology is becoming more important for the future of public health. In the past, epidemiologists used existing data from an outbreak to help them determine how an infectious disease might spread in the future. Now, with computational models, they are able to analyze data produced by these models to help with prevention and intervention plans. This paper looks at the design, implementation, and analysis of a computational model based on the interactions between individuals in the population. The design of the working contact model looks closely at the SEIR model used as the foundation and the two timelines of a disease. The implementation of the contact model is reviewed while looking closely at data structures. The analysis of the experiments provides evidence that this contact model can be used to help epidemiologists study the spread of an infectious disease based on the contact rate of individuals. digital.library.unt.edu/ark:/67531/metadc9824/
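The SEIR model named as the foundation of this work divides a population into susceptible, exposed, infectious, and recovered compartments. The sketch below is the textbook compartmental form with a simple forward-Euler step, not the individual-based contact model developed in the thesis; the function name and all parameter values are assumptions.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance a textbook SEIR compartment model by one Euler step of length dt.

    beta  : transmission rate (effective contacts per person per unit time)
    sigma : rate at which exposed individuals become infectious
    gamma : rate at which infectious individuals recover
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt
    new_infectious = sigma * e * dt
    new_recovered = gamma * i * dt
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


# Hypothetical population of 1000 with 10 initially infectious individuals.
state = seir_step(990.0, 0.0, 10.0, 0.0, beta=0.5, sigma=0.2, gamma=0.1)
```

Each step only moves people between compartments, so the total population is conserved; a contact model replaces the aggregate term beta * s * i / n with explicit contacts between individuals.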
Direct Online/Offline Digital Signature Schemes.
Online/offline signature schemes are useful in many situations, and two such scenarios are considered in this dissertation: bursty server authentication and embedded device authentication. In this dissertation, new techniques for online/offline signing are introduced and applied in a variety of ways to create online/offline signature schemes, and five different online/offline signature schemes that are proved secure under a variety of models and assumptions are proposed. Two of the proposed five schemes have the best offline or best online performance of any currently known technique, and are particularly well-suited for the scenarios that are considered in this dissertation. To determine if the proposed schemes provide the expected practical improvements, a series of experiments were conducted comparing the proposed schemes with each other and with other state-of-the-art schemes in this area, both on a desktop-class computer and under AVR Studio, a simulation platform for an 8-bit processor that is popular for embedded systems. Under AVR Studio, the proposed SGE scheme, using a typical key size for the embedded device authentication scenario, can complete the offline phase in about 24 seconds and then produce a signature (the online phase) in 15 milliseconds, which is the best offline performance of any known signature scheme that has been proven secure in the standard model. In the tests on a desktop-class computer, the proposed SGS scheme, which has the best online performance and is designed for the bursty server authentication scenario, generated 469,109 signatures per second, while the Schnorr scheme (the next best scheme in terms of online performance) generated only 223,548 signatures per second. The experimental results demonstrate that the SGE and SGS schemes are the most efficient techniques for embedded device authentication and bursty server authentication, respectively. digital.library.unt.edu/ark:/67531/metadc9717/
Graph-based Centrality Algorithms for Unsupervised Word Sense Disambiguation
This thesis introduces an innovative methodology of combining some traditional dictionary-based approaches to word sense disambiguation (semantic similarity measures and overlap of word glosses, both based on WordNet) with some graph-based centrality methods, namely the degree of the vertices, PageRank, closeness, and betweenness. The approach is completely unsupervised, and is based on creating graphs for the words to be disambiguated. We experiment with several possible combinations of the semantic similarity measures as the first stage in our experiments. The next stage attempts to score individual vertices in the graphs previously created based on several graph connectivity measures. During the final stage, several voting schemes are applied to the results obtained from the different centrality algorithms. The most important contributions of this work are not only that it is a novel approach and it works well, but also that it has great potential in overcoming the new-knowledge-acquisition bottleneck which has apparently brought research in supervised WSD as an explicit application to a plateau. The type of research reported in this thesis, which does not require manually annotated data, holds considerable promise, and our work is one of the first steps, albeit a small one, in this direction. The complete system is built and tested on standard benchmarks, and is comparable with work done on graph-based word sense disambiguation as well as lexical chains. The evaluation indicates that the right combination of the above-mentioned metrics can be used to develop an unsupervised disambiguation engine as powerful as the state of the art in WSD. digital.library.unt.edu/ark:/67531/metadc9736/
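The second stage described in this abstract, scoring sense vertices by graph centrality, can be illustrated with the simplest of the listed measures, vertex degree. In the sketch below the vertices are (word, sense) pairs and the edge weights stand in for WordNet-based similarity scores; the sense labels, the weights, and the function name are invented for illustration and do not come from the thesis.

```python
from collections import defaultdict


def pick_senses(edges):
    """Choose, for each word, the sense vertex with the highest weighted degree.

    `edges` holds (u, v, weight) triples whose endpoints are (word, sense) pairs.
    """
    degree = defaultdict(float)
    for u, v, weight in edges:
        degree[u] += weight
        degree[v] += weight
    best = {}
    for (word, sense), score in degree.items():
        if word not in best or score > best[word][1]:
            best[word] = (sense, score)
    return {word: sense for word, (sense, _) in best.items()}


# Hypothetical similarity graph for the two-word context "bank water".
similarity_edges = [
    (("bank", "river"), ("water", "liquid"), 0.9),
    (("bank", "finance"), ("water", "liquid"), 0.2),
    (("bank", "finance"), ("water", "utility"), 0.3),
]
senses = pick_senses(similarity_edges)
```

Swapping the degree computation for PageRank, closeness, or betweenness gives the other scorers the abstract lists, and a voting scheme can then combine their per-word choices.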
Exploring Trusted Platform Module Capabilities: A Theoretical and Experimental Study
Trusted platform modules (TPMs) are hardware modules that are bound to a computer's motherboard and are being included in many desktops and laptops. Augmenting computers with these hardware modules adds powerful functionality in distributed settings, allowing us to reason about the security of these systems in new ways. In this dissertation, I study the functionality of TPMs from a theoretical as well as an experimental perspective. On the theoretical front, I leverage various features of TPMs to construct applications, such as random oracles, that are impossible to implement in a standard model of computation. Apart from random oracles, I construct a new cryptographic primitive that is essentially a non-interactive form of the standard cryptographic primitive of oblivious transfer. I apply this new primitive to secure mobile agent computations, where interaction between various entities is typically required to ensure security. I prove these constructions are secure using standard cryptographic techniques and assumptions. To test the practicability of these constructions and their applications, I performed an experimental study, both on an actual TPM and on a software TPM simulator that has been enhanced to reflect timings from a real TPM. This allowed me to benchmark the performance of the applications and test the feasibility of the proposed extensions to standard TPMs. My tests also show that these constructions are practical. digital.library.unt.edu/ark:/67531/metadc6101/
General Purpose Programming on Modern Graphics Hardware
I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming. digital.library.unt.edu/ark:/67531/metadc6112/
Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.
This research addresses the problem of automatic keyphrase extraction from large documents and back-of-the-book indexing. The potential benefits of automating this process are far-reaching, from improving information retrieval in digital libraries to saving countless hours of labor by helping professional indexers create back-of-the-book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system and the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back-of-the-book indexing system based on the keyphrase extraction system is shown to produce back-of-the-book indexes closely resembling those created by human experts. digital.library.unt.edu/ark:/67531/metadc6118/
Evaluation of MPLS Enabled Networks
Access: Use of this item is restricted to the UNT Community.
Recent developments in the Internet have inspired a wide range of business and consumer applications. The deployment of multimedia-based services has driven the demand for increased and guaranteed bandwidth over the network. The diverse requirements of the wide range of users demand differentiated classes of service and quality assurance. The new technology of Multiprotocol Label Switching (MPLS) has emerged as a high-performance and reliable option to address these challenges, in addition to offering features that were not addressed before. This problem in lieu of thesis describes how the new paradigm of MPLS is advantageous over the conventional architecture. The motivation for this paradigm is discussed in the first part, followed by a detailed description of the new architecture. The information flow, the underlying protocols, and the MPLS extensions to some of the traditional protocols are then discussed, followed by a description of the simulation. The simulation results are used to show the advantages of the proposed technology. digital.library.unt.edu/ark:/67531/metadc5797/
A Netcentric Scientific Research Repository
Access: Use of this item is restricted to the UNT Community.
The Internet and networks in general have become essential tools for disseminating information. Search engines have become the predominant means of finding information on the Web and all other data repositories, including local resources. Domain scientists regularly acquire and analyze images generated by equipment such as microscopes and cameras, resulting in complex image files that need to be managed in a convenient manner. This type of integrated environment has been recently termed a netcentric scientific research repository. I developed a number of data manipulation tools that allow researchers to manage their information more effectively in a netcentric environment. The specific contributions are: (1) A unique interface for management of data including files and relational databases. A wrapper for relational databases was developed so that the data can be indexed and searched using traditional search engines. This approach allows data in databases to be searched with the same interface as other data. Furthermore, this approach makes it easier for scientists to work with their data if they are not familiar with SQL. (2) A Web services based architecture for integrating analysis operations into a repository. This technique allows the system to leverage the large number of existing tools by wrapping them with a Web service and registering the service with the repository. Metadata associated with Web services was enhanced to allow this feature to be included. In addition, an improved binary to text encoding scheme was developed to reduce the size overhead for sending large scientific data files via XML messages used in Web services. (3) Integrated image analysis operations with SQL. This technique allows for images to be stored and managed conveniently in a relational database. SQL supplemented with map algebra operations is used to select and perform operations on sets of images. digital.library.unt.edu/ark:/67531/metadc5611/
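The size overhead that the abstract above refers to can be made concrete by comparing standard binary-to-text encodings. The abstract does not describe the author's improved scheme, so the sketch below only measures three encodings from Python's standard `base64` module; the payload and the function name `encoded_sizes` are assumptions for illustration.

```python
import base64


def encoded_sizes(payload: bytes) -> dict:
    """Return the encoded length in bytes for three standard encodings."""
    return {
        "base16": len(base64.b16encode(payload)),  # 2 characters per byte
        "base64": len(base64.b64encode(payload)),  # 4 characters per 3 bytes
        "base85": len(base64.b85encode(payload)),  # 5 characters per 4 bytes
    }


if __name__ == "__main__":
    # Deterministic stand-in for a binary data file (hypothetical payload).
    payload = bytes(i % 256 for i in range(3000))
    for name, size in encoded_sizes(payload).items():
        overhead = 100.0 * (size - len(payload)) / len(payload)
        print(f"{name}: {size} bytes ({overhead:.0f}% overhead)")
```

Denser alphabets reduce the overhead, but Python's Base85 alphabet includes characters such as `<`, `>`, and `&`, which would need escaping inside an XML message; any scheme aimed at XML transport has to account for that trade-off.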
Analysis of Web Services on J2EE Application Servers
The Internet has become a standard way of exchanging business data between B2B and B2C applications, and with this came the need to provide various services on the web instead of just static text and images. Web services are a new type of service offered via the web that aid in the creation of globally distributed applications. Web services are enhanced e-business applications that are easier to advertise and easier to discover on the Internet because of their flexibility and uniformity. In a real-life scenario, it is difficult to decide which J2EE application server to choose when deploying an enterprise web service. This thesis analyzes the various ways in which web services can be developed and deployed. Underlying protocols and crucial issues such as EAI (enterprise application integration), asynchronous messaging, and the Registry tModel architecture have been considered in this research. The thesis reports what various J2EE application servers provide, based on a case study and on applications developed to test functionality. digital.library.unt.edu/ark:/67531/metadc5547/