UNT Theses and Dissertations - Browse

A Multi-Variate Analysis of SMTP Paths and Relays to Restrict Spam and Phishing Attacks in Emails

Description: The classifier discussed in this thesis considers the path traversed by an email (instead of its content) and the reputation of the relays, features inaccessible to spammers. Groups of spammers and individual behaviors of a spammer in a given domain were analyzed to yield association patterns, which were then used to identify similar spammers. Unsolicited and phishing emails were successfully isolated from legitimate emails using the analysis results. Spammers and phishers are also categorized into serial spammers/phishers, recent spammers/phishers, prospective spammers/phishers, and suspects. Legitimate emails and trusted domains are classified into socially close (family members, friends), socially distinct (strangers, etc.), and opt-outs (resolved false positives and false negatives). Overall, this classifier produced far fewer false positives than current filters such as SpamAssassin, achieving 98.65% precision, which is comparable to the precision achieved by SPF and DNSRBL blacklists.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2006
Creator: Palla, Srikanth
Partner: UNT Libraries

A Language and Visual Interface to Specify Complex Spatial Pattern Mining

Description: The emerging interest in spatial pattern mining creates a demand for a flexible spatial pattern mining language on which an easy-to-use and easy-to-understand visual pattern language can be built. It is worthwhile to define a pattern mining language called LCSPM to allow users to specify complex spatial patterns. I describe the proposed pattern mining language in this paper. A visual interface that allows users to specify the patterns visually has been developed. Visual pattern queries are translated into the LCSPM language by a parser, and the data mining process can be triggered afterwards. The visual language is based on and goes beyond the visual language proposed in the literature. I implemented a prototype system based on the open source JUMP framework.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2006
Creator: Li, Xiaohui
Partner: UNT Libraries

Power-benefit analysis of erasure encoding with redundant routing in sensor networks.

Description: One of the problems sensor networks face is adversaries corrupting nodes along the path to the base station. One way to reduce the effect of these attacks is multipath routing. This introduces some intrusion tolerance in the network by way of redundancy, but at the cost of higher power consumption by the sensor nodes. Erasure coding can be applied to this scenario so that the base station can receive a subset of the total data sent and reconstruct the entire message packet at its end. This thesis takes two commonly used encodings and compares their power consumption with that of unencoded data in multipath routing. It is found that using encoding with multipath routing reduces the power consumption and at the same time enables the user to send reasonably large data sizes. The experiments in this thesis were performed on the TinyOS platform, with the simulations done in TOSSIM and the power measurements taken in PowerTOSSIM. They were performed on the simple radio model and the lossy radio model provided by TinyOS. The lossy radio model was simulated with distances of 10 feet, 15 feet, and 20 feet between nodes. It was found that by using erasure encoding, double or triple the data size can be sent at the same power consumption rate as unencoded data. All the experiments were performed with the radio set at a normal transmit power and later at a high transmit power.
Date: December 2006
Creator: Vishwanathan, Roopa
Partner: UNT Libraries
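
The reconstruction idea in the abstract above (the base station rebuilds the full message from a subset of what was sent) can be illustrated with a minimal sketch. The abstract does not name the two encodings the thesis evaluates, so the single XOR parity fragment used here is only a stand-in, and the names encode, decode, and k are hypothetical. With k data fragments plus one parity fragment, each sent along a different path, any k of the k + 1 fragments are enough to recover the message.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(message: bytes, k: int) -> list[bytes]:
    """Split message into k equal-size fragments plus one XOR parity fragment."""
    frag_len = -(-len(message) // k)  # ceiling division
    padded = message.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def decode(received: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the message from any k of the k + 1 fragments.

    received maps fragment index to payload; index k is the parity fragment.
    length is the original message length, used to strip the padding.
    """
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1 or (missing and k not in received):
        raise ValueError("too many fragments lost")
    if missing:
        # XOR of all surviving fragments (including parity) yields the lost one.
        others = [received[i] for i in range(k + 1) if i != missing[0]]
        received = {**received, missing[0]: reduce(xor_bytes, others)}
    return b"".join(received[i] for i in range(k))[:length]
```

A single parity fragment tolerates the loss of only one path; codes that tolerate more losses follow the same send-more, need-fewer pattern at a higher encoding cost.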

Grid-based Coordinated Routing in Wireless Sensor Networks

Description: Wireless sensor networks are battery-powered ad-hoc networks in which sensor nodes that are scattered over a region connect to each other and form multi-hop networks. These nodes are equipped with sensors such as temperature sensors, pressure sensors, and light sensors and can be queried to get the corresponding values for analysis. However, since they are battery operated, care has to be taken so that these nodes use energy efficiently. One of the areas in sensor networks where an energy analysis can be done is routing. This work explores grid-based coordinated routing in wireless sensor networks and compares the energy available in the network over time for different grid sizes.
Date: December 2006
Creator: Sawant, Uttara
Partner: UNT Libraries

Mediation on XQuery Views

Description: The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and have developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm to decompose an outsourced query into atomic queries which are each executed on a single data source. I also developed an algorithm to generate integrating queries, which the system uses to compose the answers from the atomic queries into a single answer to the original user query. I present a proof of both the extended XML document projection algorithm and the query integration algorithm. An analysis of the efficiency of the new extended algorithm is also presented. Finally I describe a collaborative schema-matching tool that was implemented to facilitate maintaining metadata.
Date: December 2006
Creator: Peng, Xiaobo
Partner: UNT Libraries

CLUE: A Cluster Evaluation Tool

Description: Modern high performance computing is dependent on parallel processing systems. Most current benchmarks reveal only the high level computational throughput metrics, which may be sufficient for single processor systems, but can lead to a misrepresentation of true system capability for parallel systems. A new benchmark is therefore proposed. CLUE (Cluster Evaluator) uses a cellular automata algorithm to evaluate the scalability of parallel processing machines. The benchmark also uses algorithmic variations to evaluate individual system components' impact on the overall serial fraction and efficiency. CLUE is not a replacement for other performance-centric benchmarks, but rather shows the scalability of a system and provides metrics to reveal where one can improve overall performance. CLUE is a new benchmark which demonstrates a better comparison among different parallel systems than existing benchmarks and can diagnose where a particular parallel system can be optimized.
Date: December 2006
Creator: Parker, Brandon S.
Partner: UNT Libraries

An Approach Towards Self-Supervised Classification Using Cyc

Description: Due to the long duration required to perform manual knowledge entry by human knowledge engineers it is desirable to find methods to automatically acquire knowledge about the world by accessing online information. In this work I examine using the Cyc ontology to guide the creation of Naïve Bayes classifiers to provide knowledge about items described in Wikipedia articles. Given an initial set of Wikipedia articles the system uses the ontology to create positive and negative training sets for the classifiers in each category. The order in which classifiers are generated and used to test articles is also guided by the ontology. The research conducted shows that a system can be created that utilizes statistical text classification methods to extract information from an ad-hoc generated information source like Wikipedia for use in a formal semantic ontology like Cyc. Benefits and limitations of the system are discussed along with future work.
Date: December 2006
Creator: Coursey, Kino High
Partner: UNT Libraries

Natural Language Interfaces to Databases

Description: Natural language interfaces to databases (NLIDB) are systems that aim to bridge the gap between the languages used by humans and computers, and automatically translate natural language sentences to database queries. This thesis proposes a novel approach to NLIDB, using graph-based models. The system starts by collecting as much information as possible from existing databases and sentences, and transforms this information into a knowledge base for the system. Given a new question, the system will use this knowledge to analyze and translate the sentence into its corresponding database query statement. The graph-based NLIDB system uses English as the natural language, a relational database model, and SQL as the formal query language. In experiments performed with natural language questions run against a large database containing information about U.S. geography, the system showed good performance compared to the state of the art in the field.
Date: December 2006
Creator: Chandra, Yohan
Partner: UNT Libraries

Group-EDF: A New Approach and an Efficient Non-Preemptive Algorithm for Soft Real-Time Systems

Description: Hard real-time systems in robotics, space and military missions, and control devices are specified with stringent and critical time constraints. On the other hand, soft real-time applications arising from multimedia, telecommunications, Internet web services, and games are specified with more lenient constraints. Real-time systems can also be distinguished in terms of their implementation into preemptive and non-preemptive systems. In preemptive systems, tasks are often preempted by higher priority tasks. Non-preemptive systems are gaining interest for implementing soft real-time applications on multithreaded platforms. In this dissertation, I propose a new algorithm that uses a two-level scheduling strategy for scheduling non-preemptive soft real-time tasks. Our goal is to improve the success ratios of the well-known earliest deadline first (EDF) approach when the load on the system is very high and to improve the overall performance in both underloaded and overloaded conditions. Our approach, known as group-EDF (gEDF), is based on dynamic grouping of tasks with deadlines that are very close to each other, and using a shortest job first (SJF) technique to schedule tasks within the group. I believe that grouping tasks dynamically with similar deadlines and utilizing secondary criteria, such as minimizing the total execution time, can lead to new and more efficient real-time scheduling algorithms. I present results comparing gEDF with other real-time algorithms, including EDF, best-effort, and the guarantee scheme, by using randomly generated tasks with varying execution times, release times, deadlines, and tolerances to missing deadlines, under varying workloads. Furthermore, I implemented the gEDF algorithm in the Linux kernel and evaluated gEDF for scheduling real applications.
Date: August 2006
Creator: Li, Wenming
Partner: UNT Libraries
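
The two-level strategy described in the abstract above (group tasks whose deadlines are close, then apply shortest job first within the group) might be sketched as follows. The abstract does not state the exact grouping rule, so this sketch assumes a task joins the current group when its deadline is within group_range time units of the earliest ready deadline. The Task fields, gedf_schedule, and group_range are hypothetical names, not the thesis's implementation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    release: float
    exec_time: float
    deadline: float


def gedf_schedule(tasks, group_range):
    """Non-preemptive schedule; returns (name, start, finish, met_deadline) tuples.

    Assumed grouping rule: ready tasks whose deadline lies within group_range
    of the earliest ready deadline form the group; the shortest job in the
    group runs next. group_range = 0 reduces to plain non-preemptive EDF
    (ties broken by execution time).
    """
    pending = sorted(tasks, key=lambda t: t.release)
    clock = 0.0
    timeline = []
    while pending:
        ready = [t for t in pending if t.release <= clock]
        if not ready:
            # Idle until the next task is released.
            clock = min(t.release for t in pending)
            continue
        earliest = min(t.deadline for t in ready)
        group = [t for t in ready if t.deadline - earliest <= group_range]
        task = min(group, key=lambda t: (t.exec_time, t.deadline))
        pending.remove(task)
        start, finish = clock, clock + task.exec_time
        timeline.append((task.name, start, finish, finish <= task.deadline))
        clock = finish
    return timeline
```

For example, with tasks A (execution 5, deadline 10), B (execution 1, deadline 11), and C (execution 2, deadline 30) all released at time 0, a group_range of 2 runs B before A because both fall in the same group and B is shorter, while a group_range of 0 runs them in deadline order.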

Modeling Infectious Disease Spread Using Global Stochastic Field Simulation

Description: Susceptibles-infectives-removals (SIR) and its derivatives are the classic mathematical models for the study of infectious diseases in epidemiology. In order to model and simulate epidemics of an infectious disease, a global stochastic field simulation paradigm (GSFS) is proposed, which incorporates geographic and demographic based interactions. The interaction measure between regions is a function of population density and geographical distance, and has been extended to include demographic and migratory constraints. The progression of diseases using GSFS is analyzed, and similar behavior to the SIR model is exhibited by GSFS, using the geographic information systems (GIS) gravity model for interactions. The limitations of the SIR and similar models of homogeneous population with uniform mixing are addressed by the GSFS model. The GSFS model is oriented to heterogeneous population, and can incorporate interactions based on geography, demography, environment and migration patterns. The progression of diseases can be modeled at higher levels of fidelity using the GSFS model, and facilitates optimal deployment of public health resources for prevention, control and surveillance of infectious diseases.
Date: August 2006
Creator: Venkatachalam, Sangeeta
Partner: UNT Libraries
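
For readers unfamiliar with the SIR baseline that the abstract above compares against, a minimal deterministic sketch follows. It assumes a homogeneous, uniformly mixing population (the very limitation GSFS is said to address) and a simple Euler time step. The function name simulate_sir and the parameters beta (transmission rate), gamma (removal rate), and dt are illustrative assumptions, not taken from the thesis.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Euler integration of the classic SIR equations.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history
```

A model such as GSFS would replace the single uniform-mixing term beta * S * I / N with region-to-region interactions; that extension is not shown here because the abstract does not give its functional form.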

Using Reinforcement Learning in Partial Order Plan Space

Description: Partial order planning is an important approach that solves planning problems without completely specifying the orderings between the actions in the plan. This property provides greater flexibility in executing plans; hence making the partial order planners a preferred choice over other planning methodologies. However, in order to find partially ordered plans, partial order planners perform a search in plan space rather than in space of world states and an uninformed search in plan space leads to poor efficiency. In this thesis, I discuss applying a reinforcement learning method, called First-visit Monte Carlo method, to partial order planning in order to design agents which do not need any training data or heuristics but are still able to make informed decisions in plan space based on experience. Communicating effectively with the agent is crucial in reinforcement learning. I address how this task was accomplished in plan space and the results from an evaluation of a blocks world test bed.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: May 2006
Creator: Ceylan, Hakan
Partner: UNT Libraries
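
The first-visit Monte Carlo method named in the abstract above estimates the value of a state as the average return observed after the first time that state appears in each episode. The sketch below shows the generic prediction form of the method only. How states, rewards, and episodes are defined in plan space is not given in the abstract, so the (state, reward) encoding and the name first_visit_mc are assumptions made for illustration.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo value estimation.

    episodes: list of episodes, each a list of (state, reward) pairs, where
    reward is the reward received on leaving that state.
    Returns {state: mean discounted return following its first visit}.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Compute the discounted return from every step, working backwards.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            g = episode[t][1] + gamma * g
            returns[t] = g
        # Credit each state only at its first occurrence in the episode.
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state in seen:
                continue
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Because the estimates come only from experienced episodes, no training data or hand-written heuristic is required, which matches the motivation stated in the abstract.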

Towards Communicating Simple Sentence using Pictorial Representations

Description: Language can sometimes be an impediment in communication. Whether we are talking about people who speak different languages, students who are learning a new language, or people with language disorders, the understanding of linguistic representations in a given language requires a certain amount of knowledge that not everybody has. In this thesis, we propose "translation through pictures" as a means for conveying simple pieces of information across language barriers, and describe a system that can automatically generate pictorial representations for simple sentences. Comparative experiments conducted on visual and linguistic representations of information show that a considerable amount of understanding can be achieved through pictorial descriptions, with results within a comparable range of those obtained with current machine translation techniques. Moreover, a user study conducted around the pictorial translation system reveals that users found the system to generally produce correct word/image associations, and rate the system as interactive and intelligent.
Date: May 2006
Creator: Leong, Chee Wee
Partner: UNT Libraries

Flexible Digital Authentication Techniques

Description: This dissertation investigates authentication techniques in some emerging areas. Specifically, authentication schemes have been proposed that are well suited for embedded systems and for privacy-respecting pay Web sites. With embedded systems, a person could own several devices which are capable of communication and interaction, but these devices use embedded processors whose computational capabilities are limited as compared to desktop computers. Examples of this scenario include entertainment devices or appliances owned by a consumer, multiple control and sensor systems in an automobile or airplane, and environmental controls in a building. An efficient public key cryptosystem has been devised, which provides a complete solution to an embedded system, including protocols for authentication, authenticated key exchange, encryption, and revocation. The new construction is especially suitable for devices with constrained computing capabilities and resources. Compared with other available authentication schemes, such as X.509, identity-based encryption, etc., the new construction provides unique features such as simplicity, efficiency, forward secrecy, and an efficient re-keying mechanism. In the application scenario for a pay Web site, users may be sensitive about their privacy, and do not wish their behaviors to be tracked by Web sites. Thus, an anonymous authentication scheme is desirable in this case. That is, a user can prove his/her authenticity without revealing his/her identity. On the other hand, the Web site owner would like to prevent a group of users from sharing a single subscription while hiding behind user anonymity. The Web site should be able to detect these possible malicious behaviors, and exclude corrupted users from future service. This dissertation extensively discusses anonymous authentication techniques, such as group signature, direct anonymous attestation, and traceable signature. Three anonymous authentication schemes have been proposed, which include a group signature scheme with signature claiming and variable linkability, a scheme for direct anonymous attestation in trusted computing platforms ...
Date: May 2006
Creator: Ge, He
Partner: UNT Libraries

Modeling the Impact and Intervention of a Sexually Transmitted Disease: Human Papilloma Virus

Description: Many human papilloma virus (HPV) types are sexually transmitted, and HPV DNA types 16, 18, 31, and 45 account for more than 75% of all cervical dysplasia. Candidate vaccines are successfully completing US Food and Drug Administration (FDA) phase III testing, and several drug companies are in licensing arbitration. Once this vaccine becomes available, 100% vaccination coverage is unlikely; hence the need for vaccination strategies that will achieve the greatest reduction in the endemic prevalence of HPV. This thesis introduces two discrete-time models for evaluating the effect of demographic-biased vaccination strategies: one model incorporates temporal demographics (i.e., age) in population compartments; the other incorporates non-temporal demographics (i.e., race, ethnicity). Also presented is an intuitive Web-based interface that was developed to allow the user to evaluate the effects on prevalence of a demographic-biased intervention by tailoring the model parameters to specific demographics and geographical regions.
Date: May 2006
Creator: Corley, Courtney D.
Partner: UNT Libraries

An Integrated Architecture for Ad Hoc Grids

Description: Extensive research has been conducted by the grid community to enable large-scale collaborations in pre-configured environments. Grid collaborations can vary in scale and motivation, resulting in a coarse classification of grids: national grid, project grid, enterprise grid, and volunteer grid. Despite the differences in scope and scale, all the traditional grids in practice share some common assumptions. They support mutually collaborative communities, adopt a centralized control for membership, and assume a well-defined non-changing collaboration. To support grid applications that do not conform to these assumptions, we propose the concept of ad hoc grids. In the context of this research, we propose a novel architecture for ad hoc grids that integrates a suite of component frameworks. Specifically, our architecture combines the community management framework, security framework, abstraction framework, quality of service framework, and reputation framework. The overarching objective of our integrated architecture is to support a variety of grid applications in a self-controlled fashion with the help of a self-organizing ad hoc community. We introduce mechanisms in our architecture that successfully isolate malicious elements from the community, inherently improving the quality of grid services and extracting deterministic quality assurances from the underlying infrastructure. We also emphasize the technology-independence of our architecture, thereby offering the requisite platform for technology interoperability. The feasibility of the proposed architecture is verified with a high-quality ad hoc grid implementation. Additionally, we have analyzed the performance and behavior of ad hoc grids with respect to several control parameters.
Date: May 2006
Creator: Amin, Kaizar Abdul Husain
Partner: UNT Libraries

Bayesian Probabilistic Reasoning Applied to Mathematical Epidemiology for Predictive Spatiotemporal Analysis of Infectious Diseases

Description: Probabilistic reasoning under uncertainty is well suited to the analysis of disease dynamics. The stochastic nature of disease progression is modeled by applying the principles of Bayesian learning. Bayesian learning predicts the disease progression, including prevalence and incidence, for a geographic region and demographic composition. Public health resources, prioritized by the order of risk levels of the population, will efficiently minimize the disease spread and curtail the epidemic at the earliest opportunity. A Bayesian network representing the outbreak of influenza and pneumonia in a geographic region is ported to a newer region with a different demographic composition. Upon analysis for the newer region, the corresponding prevalence of influenza and pneumonia among the different demographic subgroups is inferred. Bayesian reasoning coupled with a disease timeline is used to reverse engineer an influenza outbreak for a given geographic and demographic setting. The temporal flow of the epidemic among the different sections of the population is analyzed to identify the corresponding risk levels. In comparison to spread vaccination, prioritizing the limited vaccination resources to the higher risk groups results in relatively lower influenza prevalence. HIV incidence in Texas from 1989-2002 is analyzed using demographic based epidemic curves. Dynamic Bayesian networks are integrated with probability distributions of HIV surveillance data coupled with the census population data to estimate the proportion of HIV incidence among the different demographic subgroups. Demographic based risk analysis reveals a varied spectrum of HIV risk among the different demographic subgroups. A methodology using hidden Markov models is introduced that enables investigation of the impact of social behavioral interactions on the incidence and prevalence of infectious diseases. The methodology is presented in the context of simulated disease outbreak data for influenza. Probabilistic reasoning analysis enhances the understanding of disease progression in order to identify the critical points of surveillance, ...
Date: May 2006
Creator: Abbas, Kaja Moinudeen
Partner: UNT Libraries

The enhancement of machine translation for low-density languages using Web-gathered parallel texts.

Description: The majority of the world's languages are poorly represented in informational media like radio, television, newspapers, and the Internet. Translation into and out of these languages may offer a way for speakers of these languages to interact with the wider world, but current statistical machine translation models are only effective with a large corpus of parallel texts - texts in two languages that are translations of one another - which most languages lack. This thesis describes the Babylon project which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web -- specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be effectively used as a source of training data for machine translation and can significantly improve the translation quality for text in a similar domain. However, the small quantity of high-quality low-density language parallel texts on the Web remains a significant obstacle.
Date: December 2007
Creator: Mohler, Michael Augustine Gaylord
Partner: UNT Libraries

Automated Syndromic Surveillance using Intelligent Mobile Agents

Description: Current syndromic surveillance systems utilize centralized databases that are scalable neither in storage space nor in computing power. Such systems are limited in the amount of syndromic data that may be collected and analyzed for the early detection of infectious disease outbreaks. However, with the increased prevalence of international travel, public health monitoring must extend beyond the borders of municipalities or states, which will require the ability to store vast amounts of data and significant computing power for analyzing the data. Intelligent mobile agents may be used to create a distributed surveillance system that will utilize the hard drives and central processing unit (CPU) power of the hosts on the agent network where the syndromic information is located. This thesis proposes the design of a mobile agent-based syndromic surveillance system and an agent decision model for outbreak detection. Simulation results indicate that mobile agents are capable of detecting an outbreak that occurs at all hosts the agent is monitoring. Further study of agent decision models is required to account for localized epidemics and variable agent movement rates.
Date: December 2007
Creator: Miller, Paul
Partner: UNT Libraries

High Performance Architecture using Speculative Threads and Dynamic Memory Management Hardware

Description: With the advances in very large scale integration (VLSI) technology, hundreds of billions of transistors can be packed into a single chip. With the increased hardware budget, how to take advantage of available hardware resources becomes an important research area. Some researchers have shifted from the control flow von Neumann architecture back to the dataflow architecture in order to explore scalable architectures leading to multi-core systems with several hundreds of processing elements. In this dissertation, I address how the performance of modern processing systems can be improved, while attempting to reduce hardware complexity and energy consumption. My research described here tackles both central processing unit (CPU) performance and memory subsystem performance. More specifically, I describe my research related to the design of an innovative decoupled multithreaded architecture that can be used in multi-core processor implementations. I also address how memory management functions can be off-loaded from processing pipelines to further improve system performance and eliminate cache pollution caused by runtime management functions.
Date: December 2007
Creator: Li, Wentong
Partner: UNT Libraries

System and Methods for Detecting Unwanted Voice Calls

Description: Voice over IP (VoIP) is a key enabling technology for the migration of circuit-switched PSTN architectures to packet-based IP networks. However, this migration is successful only if the present problems in IP networks are addressed before deploying VoIP infrastructure on a large scale. One of the important issues that the present VoIP networks face is the problem of unwanted calls, commonly referred to as SPIT (spam over Internet telephony). Mostly, these SPIT calls are from unknown callers who broadcast unwanted calls. There may be unwanted calls from legitimate and known people too. In this case, the unwantedness depends on the social proximity of the communicating parties. For detecting these unwanted calls, I propose a framework that analyzes incoming calls for unwanted behavior. The framework includes a VoIP spam detector (VSD) that analyzes incoming VoIP calls for spam behavior using trust and reputation techniques. The framework also includes a nuisance detector (ND) that proactively infers the nuisance (or reluctance of the end user) to receive incoming calls. This inference is based on past mutual behavior between the calling and the called party (i.e., caller and callee), the callee's presence (mood or state of mind) and tolerance in receiving voice calls from the caller, and the social closeness between the caller and the callee. The VSD and ND learn the behavior of callers over time and estimate the likelihood that a call is unwanted based on predetermined thresholds configured by the callee (or the filter administrators). These threshold values have to be automatically updated to integrate dynamic behavioral changes of the communicating parties. For updating these threshold values, I propose an automatic calibration mechanism using receiver operating characteristic (ROC) curves. The VSD and ND use this mechanism for dynamically updating thresholds to optimize their accuracy of detection. In addition to unwanted calls ...
Date: December 2007
Creator: Kolan, Prakash
Partner: UNT Libraries
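
The ROC-based threshold calibration mentioned in the abstract above might look like the following minimal sketch. The abstract does not say which criterion the thesis uses to pick an operating point on the curve, so this sketch assumes the threshold that maximizes the true positive rate minus the false positive rate (Youden's J statistic). The names roc_points and calibrate_threshold, and the score and label inputs, are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, false_positive_rate, true_positive_rate) tuples.

    scores: detector outputs, higher meaning more likely unwanted.
    labels: 1 for calls known to be unwanted, 0 for wanted calls.
    Requires at least one example of each label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def calibrate_threshold(scores, labels):
    """Pick the threshold maximizing TPR - FPR (an assumed criterion)."""
    best = max(roc_points(scores, labels), key=lambda p: p[2] - p[1])
    return best[0]
```

Re-running calibrate_threshold on a sliding window of recent, callee-labeled calls is one way such a threshold could track behavioral change over time; the window policy is likewise an assumption.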

Automated Defense Against Worm Propagation.

Description: Worms have caused significant destruction over the last few years. Network security elements such as firewalls, IDS, etc., have been ineffective against worms. Some worms are so fast that manual intervention is not possible. This brings in the need for a stronger security architecture which can automatically react to stop worm propagation. The method has to be signature independent so that it can stop new worms. In this thesis, an automated defense system (ADS) is developed to automate defense against worms and contain the worm to a level where manual intervention is possible. This is accomplished with a two-level architecture with feedback at each level. The inner loop is based on control system theory and uses the properties of a PID (proportional-integral-derivative) controller. The outer loop works at the network level and stops the worm from reaching its spread saturation point. In our lab setup, we verified that with only the inner loop active the worm was delayed, and with both loops active we were able to restrict the propagation to 10% of the targeted hosts. One concern for deployment of a worm containment mechanism was degradation of throughput for legitimate traffic. We found that with a proper intelligent algorithm we can minimize the degradation to an acceptable level.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2005
Creator: Patwardhan, Sudeep
Partner: UNT Libraries
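
As a reminder of what the PID inner loop in the abstract above computes, here is a minimal discrete-time PID controller. The abstract does not specify what the thesis measures or actuates, so treating the measurement as a host's outbound connection rate and the output as a throttle adjustment is purely an illustrative assumption, as are the class name PID and the gain values used in the example.

```python
class PID:
    """Textbook discrete PID controller: output = Kp*e + Ki*sum(e*dt) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp = kp
        self.ki = ki
        self.kd = kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        """Return the control output for one sample taken dt seconds after the last."""
        error = self.setpoint - measurement
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a hypothetical containment loop, the setpoint would be an acceptable connection rate, each update would receive the currently observed rate, and a negative output would signal that traffic from the host should be slowed.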

Planning techniques for agent based 3D animations.

Description: The design of autonomous agents capable of performing a given goal in a 3D domain continues to be a challenge for computer animated story generation systems. We present a novel prototype which consists of a 3D engine and a planner for a simple virtual world. We incorporate the 2D planner into the 3D engine to provide 3D animations. Based on the plan, the 3D world is created and the objects are positioned. Then the plan is linearized into simpler actions for object animation and rendered via the 3D engine. We use JINNI3D as the engine and WARPLAN-C as the planner for the above-mentioned prototype. The user can interact with the system using a simple natural language interface. The interface consists of a shallow parser, which is capable of identifying a set of predefined basic commands. The command given by the user is considered as the goal for the planner. The resulting plan is created and rendered in 3D. The overall system is comparable to a character based interactive story generation system except that it is limited to the predefined 3D environment.
Date: December 2005
Creator: Kandaswamy, Balasubramanian
Partner: UNT Libraries

A Minimally Supervised Word Sense Disambiguation Algorithm Using Syntactic Dependencies and Semantic Generalizations

Description: Natural language is inherently ambiguous. For example, the word "bank" can mean a financial institution or a river shore. Finding the correct meaning of a word in a particular context is a task known as word sense disambiguation (WSD), which is essential for many natural language processing applications such as machine translation, information retrieval, and others. While most current WSD methods try to disambiguate a small number of words for which enough annotated examples are available, the method proposed in this thesis attempts to address all words in unrestricted text. The method is based on constraints imposed by syntactic dependencies and concept generalizations drawn from an external dictionary. The method was tested on standard benchmarks as used during the SENSEVAL-2 and SENSEVAL-3 WSD international evaluation exercises, and was found to be competitive.
Date: December 2005
Creator: Faruque, Md. Ehsanul
Partner: UNT Libraries

FP-tree Based Spatial Co-location Pattern Mining

Description: A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, the paradigm of frequent pattern mining has undergone a shift from candidate generation-and-test based approaches to projection based approaches. Co-location patterns resemble frequent patterns in many aspects. However, the lack of a transaction concept, which is crucial in frequent pattern mining, makes a similar shift of paradigm in co-location pattern mining very difficult. This thesis investigates a projection based co-location pattern mining paradigm. In particular, an FP-tree based co-location mining framework and an algorithm called FP-CM, for FP-tree based co-location miner, are proposed. It is proved that FP-CM is complete, correct, and only requires a small constant number of database scans. The experimental results show that FP-CM outperforms the candidate generation-and-test based co-location miner by an order of magnitude.
Date: May 2005
Creator: Yu, Ping
Partner: UNT Libraries