48 Matching Results

Search Results

Data Mining and Homeland Security: An Overview

Description: Data mining has become one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. In the context of homeland security, data mining can be a potential means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records. This report explores the issue of data mining in detail and in the context of homeland security, as well as relevant initiatives and pieces of legislation.
Date: October 3, 2006
Creator: Seifert, Jeffrey W.
Partner: UNT Libraries Government Documents Department

Data Mining: An Overview

Description: Data mining is emerging as one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. This report discusses the uses of data mining (e.g., the Terrorism Information Awareness (TIA) Program) and related issues (e.g., data quality, interoperability, privacy), as well as the limitations of data mining.
Date: May 3, 2004
Creator: Seifert, Jeffrey W.
Partner: UNT Libraries Government Documents Department

Data Mining: An Overview

Description: Data mining is emerging as one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. This report discusses the uses of data mining (e.g., the Terrorism Information Awareness (TIA) Program) and related issues (e.g., data quality, interoperability, privacy), as well as the limitations of data mining.
Date: December 16, 2004
Creator: Seifert, Jeffrey W.
Partner: UNT Libraries Government Documents Department

Data Mining: An Overview

Description: Data mining is emerging as one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. This report discusses the uses of data mining (e.g., the Terrorism Information Awareness (TIA) Program) and related issues (e.g., data quality, interoperability, privacy), as well as the limitations of data mining.
Date: May 21, 2003
Creator: Seifert, Jeffrey W.
Partner: UNT Libraries Government Documents Department

Data Mining: An Overview

Description: Data mining is emerging as one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. This report discusses the uses of data mining (e.g., the Terrorism Information Awareness (TIA) Program) and related issues (e.g., data quality, interoperability, privacy), as well as the limitations of data mining.
Date: June 7, 2005
Creator: Seifert, Jeffrey W.
Partner: UNT Libraries Government Documents Department

Data Mining and Homeland Security: An Overview

Description: Data mining is emerging as one of the key features of many homeland security initiatives. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. This report discusses the uses of data mining (e.g., the Terrorism Information Awareness (TIA) Program) and related issues (e.g., data quality, interoperability, privacy), as well as the limitations of data mining.
Date: July 27, 2006
Creator: Seifert, Jeffrey W.
Partner: UNT Libraries Government Documents Department

Critical Success Factors in Data Mining Projects.

Description: The increasing awareness of data mining technology, along with the attendant increase in the capturing, warehousing, and utilization of historical data to support evidence-based decision making, is leading many organizations to recognize that the effective use of data is the key element in the next generation of client-server enterprise information technology. The concept of data mining is gaining acceptance in business as a means of seeking higher profits and lower costs. To deploy data mining projects successfully, organizations need to know the key factors for successful data mining. Implementing emerging information systems (IS) can be risky if the critical success factors (CSFs) have been researched insufficiently or documented inadequately. While numerous studies have listed the advantages and described the data mining process, there is little research on the success factors of data mining. This dissertation identifies CSFs in data mining projects. Chapter 1 introduces the history of the data mining process and states the problems, purposes, and significance of this dissertation. Chapter 2 reviews the literature, discusses general concepts of data mining and data mining project contexts, and reviews general concepts of CSF methodologies. It also describes the identification process for the various CSFs used to develop the research framework. Chapter 3 describes the research framework and methodology, detailing how the CSFs were identified and validated from more than 1,300 articles published on data mining and related topics. The validated CSFs, organized into a research framework using 7 factors, generate the research questions and hypotheses. Chapter 4 presents analysis and results, along with the chain of evidence for each research question, the quantitative instrument and survey results. In addition, it discusses how the data were collected and analyzed to answer the research questions. Chapter 5 concludes with a summary of the findings, describing assumptions and limitations and suggesting future research.
Date: August 2003
Creator: Sim, Jaesung
Partner: UNT Libraries

Data Mining Techniques for Predicting Breast Cancer Survivability Among Women in the United States

Description: Poster for the 2014 UNT Graduate Exhibition in the Computer Science and Information Technology category. This poster discusses data mining techniques for predicting breast cancer survivability among women in the United States.
Date: March 1, 2014
Creator: Alshammari, Sultanah M.; Shah, Tawfiq M. & Huang, Yan
Partner: UNT College of Engineering

GPS CaPPture: a System for GPS Trajectory Collection, Processing, and Destination Prediction

Description: In the United States, smartphone ownership surpassed 69.5 million in February 2011 with a large portion of those users (20%) downloading applications (apps) that enhance the usability of a device by adding functionality. A large percentage of apps are written specifically to utilize the geographical position of a mobile device. One of the prime factors in developing location prediction models is the use of historical data to train such a model. With larger sets of training data, prediction algorithms become more accurate; however, the use of historical data can quickly become a downfall if the GPS stream is not collected or processed correctly. Inaccurate, incomplete, or even improperly interpreted historical data can lead to the inability to develop accurately performing prediction algorithms. As GPS chipsets become the standard in the ever-increasing number of mobile devices, the opportunity for the collection of GPS data increases remarkably. The goal of this study is to build a comprehensive system that addresses the following challenges: (1) collection of GPS data streams in a manner such that the data is highly usable and has a reduction in errors; (2) processing and reduction of the collected data in order to prepare it and make it highly usable for the creation of prediction algorithms; (3) creation of prediction/labeling algorithms at such a level that they are viable for commercial use. This study identifies the key research problems toward building the CaPPture (collection, processing, prediction) system.
Date: May 2012
Creator: Griffin, Terry W.
Partner: UNT Libraries

Data Mining and Homeland Security: An Overview

Description: This report explores the issue of data mining in detail and in the context of homeland security, as well as relevant initiatives and pieces of legislation. Often used as a means for detecting fraud, assessing risk, and product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. In the context of homeland security, data mining can be a potential means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records.
Date: August 27, 2008
Creator: Seifert, Jeffrey W.
Partner: UNT Libraries Government Documents Department

A Proposed Data Fusion Architecture for Micro-Zone Analysis and Data Mining

Description: Data Fusion requires the ability to combine or “fuse” data from multiple data sources. Time Series Analysis is a data mining technique used to predict future values from a data set based upon past values. Unlike other data mining techniques, however, Time Series places special emphasis on periodicity and how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to Time Series analysis as well as other predictive data mining techniques. Results show this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture.
Date: August 1, 2012
Creator: McCarthy, Kevin & Manic, Milos
Partner: UNT Libraries Government Documents Department

Mining Nuclear Transient Data Through Symbolic Conversion

Description: Dynamic Probabilistic Risk Assessment (DPRA) methodologies generate enormous amounts of data for a very large number of simulations. The data contain temporal information of both the state variables of the simulator and the temporal status of specific systems/components. In order to measure system performance, limitations and resilience, such data need to be carefully analyzed with the objective of discovering the correlations between sequence/timing of events and system dynamics. A first approach toward discovering these correlations from data generated by DPRA methodologies has been performed by organizing scenarios into groups using classification or clustering based algorithms. The identification of the correlations between system dynamics and timing/sequencing of events is performed by observing the temporal distribution of these events in each group of scenarios. Instead of performing “a posteriori” analysis of these correlations, this paper shows how it is possible to identify the correlations implicitly by performing a symbolic conversion of both continuous (temporal profiles of simulator state variables) and discrete (status of systems and components) data. Symbolic conversion is performed for each simulation by properly quantizing both continuous and discrete data and then converting them into a series of symbols. After merging both series together, a temporal phrase is obtained. This phrase preserves duration, coincidence and sequence of both continuous and discrete data in a uniform and consistent manner. This paper also shows that, by using specific distance measures, it is still possible to post-process such symbolic data using clustering and classification techniques, but in considerably less time, since the memory needed to store the data is greatly reduced by the symbolic conversion.
Date: September 1, 2013
Creator: Mandelli, Diego; Aldemir, Tunc; Yilmaz, Alper & Smith, Curtis
Partner: UNT Libraries Government Documents Department
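
The symbolic conversion described in the record above can be illustrated with a minimal sketch. It is not the authors' method: the equal-width binning, the four-letter alphabet, the function names `quantize` and `temporal_phrase`, and the temperature and pump-status values are all assumptions made up for illustration.

```python
def quantize(values, low, high, alphabet="abcd"):
    """Map each continuous value to a letter using equal-width bins (an assumed scheme)."""
    n = len(alphabet)
    width = (high - low) / n
    symbols = []
    for v in values:
        idx = int((v - low) / width)
        idx = max(0, min(n - 1, idx))  # clamp values on or beyond the range edges
        symbols.append(alphabet[idx])
    return symbols


def temporal_phrase(continuous, discrete, low, high):
    """Merge the quantized continuous series with a discrete status series, step by step."""
    cont_symbols = quantize(continuous, low, high)
    return [c + d for c, d in zip(cont_symbols, discrete)]


# Hypothetical data: a temperature profile and a pump status (U = up, D = down).
temps = [300.0, 310.0, 345.0, 390.0, 400.0]
pump = ["U", "U", "D", "D", "D"]
phrase = temporal_phrase(temps, pump, low=300.0, high=400.0)
print(phrase)
```

Each element of `phrase` pairs one continuous symbol with one discrete symbol for the same time step, so sequence and coincidence are kept in a compact form.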

Revealing Occupancy Patterns in Office Buildings Through the use of Annual Occupancy Sensor Data

Description: Energy simulation programs like DOE-2 and EnergyPlus are tools that have been proven to aid with energy calculations to predict energy use in buildings. Some inputs to energy simulation models are relatively easy to find, including building size, orientation, construction materials, and HVAC system size and type. Others vary with time (e.g. weather and occupancy) and some can be a challenge to estimate in order to create an accurate simulation. In this paper, the analysis of occupancy sensor data for a large commercial, multi-tenant office building is presented. It details occupancy diversity factors for private offices and summarizes the same for open offices, hallways, conference rooms, break rooms, and restrooms in order to better inform energy simulation parameters. Long-term data were collected, allowing results to be presented to show variations of occupancy diversity factors in private offices for time of day, day of the week, holidays, and month of the year. The diversity factors presented differ by as much as 46% from those currently published in ASHRAE 90.1 2004 energy cost method guidelines, a document referenced by energy modelers regarding occupancy diversity factors for simulations. This may result in misleading simulation results and may introduce inefficiencies in the final equipment and systems design.
Date: June 1, 2013
Creator: Duarte, Carlos; Wymelenberg, Kevin Van Den & Rieger, Craig
Partner: UNT Libraries Government Documents Department

FP-tree Based Spatial Co-location Pattern Mining

Description: A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, the paradigm of frequent pattern mining has undergone a shift from candidate generation-and-test based approaches to projection based approaches. Co-location patterns resemble frequent patterns in many aspects. However, the lack of a transaction concept, which is crucial in frequent pattern mining, makes a similar paradigm shift in co-location pattern mining very difficult. This thesis investigates a projection based co-location pattern mining paradigm. In particular, an FP-tree based co-location mining framework and an algorithm called FP-CM, for FP-tree based co-location miner, are proposed. FP-CM is proved to be complete and correct and to require only a small constant number of database scans. The experimental results show that FP-CM outperforms the candidate generation-and-test based co-location miner by an order of magnitude.
Date: May 2005
Creator: Yu, Ping
Partner: UNT Libraries

A Language and Visual Interface to Specify Complex Spatial Pattern Mining

Description: The emerging interest in spatial pattern mining leads to the demand for a flexible spatial pattern mining language, on which an easy-to-use and easy-to-understand visual pattern language could be built. It is worthwhile to define a pattern mining language called LCSPM to allow users to specify complex spatial patterns. I describe a proposed pattern mining language in this paper. A visual interface which allows users to specify the patterns visually is developed. Visual pattern queries are translated into the LCSPM language by a parser, and the data mining process can be triggered afterwards. The visual language is based on and goes beyond the visual language proposed in the literature. I implemented a prototype system based on the open source JUMP framework.
Access: This item is restricted to UNT Community Members. Login required if off-campus.
Date: December 2006
Creator: Li, Xiaohui
Partner: UNT Libraries

Revealing the Impact of Climate Variability on the Wind Resource Using Data Mining Techniques (Poster)

Description: A data mining technique called 'k-means clustering' can be used to group winds at the NWTC into 4 major clusters. The frequency of some winds in the clusters is correlated with regional pressure gradients and climate indices. The technique could also be applied to wind resource assessment and selecting scenarios for flow modeling.
Date: December 1, 2011
Creator: Clifton, A. & Lundquist, J.
Partner: UNT Libraries Government Documents Department
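
The k-means clustering named in the record above can be sketched as follows. This is a minimal illustration, not the poster's analysis: the wind-component points are invented, and taking the first k points as starting centroids is an assumption made so the result is reproducible (it needs Python 3.8+ for `math.dist`).

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; returns (centroids, labels)."""
    # Assumed deterministic start: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = list(centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_centroids == centroids:
            break  # converged: labels already match the final centroids
        centroids = new_centroids
    return centroids, labels


# Hypothetical (u, v) wind components in m/s, forming four loose groups.
points = [
    (1.0, 1.0), (9.0, 9.0), (1.0, 9.0), (9.0, 1.0),
    (1.2, 0.8), (8.8, 9.2), (0.8, 9.2), (9.2, 0.8),
]
centroids, labels = kmeans(points, k=4)
print(labels)
```

Real wind data would normally call for a library implementation with better initialization; the sketch only shows the assign-then-update loop that groups observations into four clusters.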

Data mining for ontology development.

Description: A multi-laboratory ontology construction effort during the summer and fall of 2009 prototyped an ontology for counterfeit semiconductor manufacturing. This effort included an ontology development team and an ontology validation methods team. Here the third team of the Ontology Project, the Data Analysis (DA) team, reports on its approaches, the tools it used, and its results for mining literature for terminology pertinent to counterfeit semiconductor manufacturing. A discussion of the value of ontology-based analysis is presented, with insights drawn from other ontology-based methods regularly used in the analysis of genomic experiments. Finally, suggestions for future work are offered.
Date: June 1, 2010
Creator: Davidson, George S.; Strasburg, Jana (Pacific Northwest National Laboratory, Richland, WA); Stampf, David (Brookhaven National Laboratory, Upton, NY); Neymotin, Lev (Brookhaven National Laboratory, Upton, NY); Czajkowski, Carl (Brookhaven National Laboratory, Upton, NY); Shine, Eugene (Savannah River National Laboratory, Aiken, SC) et al.
Partner: UNT Libraries Government Documents Department

An Introduction to Data Science

Description: "This book provides non-technical readers with a gentle introduction to essential concepts and activities of data science. For more technical readers, the book provides explanations and code for a range of interesting applications using the open source R language for statistical computing and graphics"--Resource home page.
Date: 2012
Creator: Stanton, Jeffrey M.
Partner: UNT Libraries

Creating a Criterion-Based Information Agent Through Data Mining for Automated Identification of Scholarly Research on the World Wide Web

Description: This dissertation creates an information agent that correctly identifies Web pages containing scholarly research approximately 96% of the time. It does this by analyzing the Web page with a set of criteria, and then uses a classification tree to arrive at a decision. The criteria were gathered from the literature on selecting print and electronic materials for academic libraries. A Delphi study was done with an international panel of librarians to expand and refine the criteria until a list of 41 operationalizable criteria was agreed upon. A Perl program was then designed to analyze a Web page and determine a numerical value for each criterion. A large collection of Web pages was gathered comprising 5,000 pages that contain the full work of scholarly research and 5,000 random pages, representative of user searches, which do not contain scholarly research. Datasets were built by running the Perl program on these Web pages. The datasets were split into model building and testing sets. Data mining was then used to create different classification models. Four techniques were used: logistic regression, nonparametric discriminant analysis, classification trees, and neural networks. The models were created with the model datasets and then tested against the test dataset. Precision and recall were used to judge the effectiveness of each model. In addition, a set of pages that were difficult to classify because of their similarity to scholarly research was gathered and classified with the models. The classification tree created the most effective classification model, with a precision ratio of 96% and a recall ratio of 95.6%. However, logistic regression created a model that was able to correctly classify more of the problematic pages. This agent can be used to create a database of scholarly research published on the Web. In addition, the technique can be used to create a ...
Date: May 2000
Creator: Nicholson, Scott
Partner: UNT Libraries
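
The precision and recall measures mentioned in the record above can be computed as in the following minimal sketch. The label lists are invented toy data, not the dissertation's results, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for boolean labels, where True means 'scholarly'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # scholarly pages correctly flagged
    fp = sum(1 for a, p in pairs if not a and p)    # non-scholarly pages wrongly flagged
    fn = sum(1 for a, p in pairs if a and not p)    # scholarly pages that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels for eight Web pages.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
precision, recall = precision_recall(actual, predicted)
print(precision, recall)
```

Precision is the share of pages flagged as scholarly that really are scholarly; recall is the share of scholarly pages that the model manages to flag.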