Distributed Frameworks Towards Building an Open Data Architecture

PDF Version Also Available for Download.

Description

Data is everywhere. The current Technological advancements in Digital, Social media and the ease at which the availability of different application services to interact with variety of systems are causing to generate tremendous volumes of data. Due to such varied services, Data format is now not restricted to only structure type like text but can generate unstructured content like social media data, videos and images etc. The generated Data is of no use unless been stored and analyzed to derive some Value. Traditional Database systems comes with limitations on the type of data format schema, access rates and storage sizes ... continued below

Physical Description

viii, 47 pages : color illustrations

Creation Information

Venumuddala, Ramu Reddy May 2015.

Context

This thesis is part of the collection entitled: UNT Theses and Dissertations and was provided by UNT Libraries to Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 131 times . More information about this thesis can be viewed below.

Who

People and organizations associated with either the creation of this thesis or its content.

Chair

Committee Members

Publisher

Rights Holder

For guidance see Citations, Rights, Re-Use.

  • Venumuddala, Ramu Reddy

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

Contact Us

What

Descriptive information to help identify this thesis. Follow the links below to find similar items on the Digital Library.

Degree Information

Description

Data is everywhere. The current Technological advancements in Digital, Social media and the ease at which the availability of different application services to interact with variety of systems are causing to generate tremendous volumes of data. Due to such varied services, Data format is now not restricted to only structure type like text but can generate unstructured content like social media data, videos and images etc. The generated Data is of no use unless been stored and analyzed to derive some Value. Traditional Database systems comes with limitations on the type of data format schema, access rates and storage sizes etc. Hadoop is an Apache open source distributed framework that support storing huge datasets of different formatted data reliably on its file system named Hadoop File System (HDFS) and to process the data stored on HDFS using MapReduce programming model. This thesis study is about building a Data Architecture using Hadoop and its related open source distributed frameworks to support a Data flow pipeline on a low commodity hardware. The Data flow components are, sourcing data, storage management on HDFS and data access layer. This study also discuss about a use case to utilize the architecture components. Sqoop, a framework to ingest the structured data from database onto Hadoop and Flume is used to ingest the semi-structured Twitter streaming json data on to HDFS for analysis. The data sourced using Sqoop and Flume have been analyzed using Hive for SQL like analytics and at a higher level of data access layer, Hadoop has been compared with an in memory computing system using Spark. Significant differences in query execution performances have been analyzed when working with Hadoop and Spark frameworks. This integration helps for ingesting huge Volumes of streaming json Variety data to derive better Value based analytics using Hive and Spark.

Physical Description

viii, 47 pages : color illustrations

Subjects

Language

Collections

This thesis is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

What responsibilities do I have when using this thesis?

When

Dates and time periods associated with this thesis.

Creation Date

  • May 2015

Added to The UNT Digital Library

  • Feb. 9, 2016, 4:37 p.m.

Description Last Updated

  • March 2, 2017, 11:27 a.m.

Usage Statistics

When was this thesis last used?

Yesterday: 0
Past 30 days: 3
Total Uses: 131

Interact With This Thesis

Here are some suggestions for what to do next.

Start Reading

PDF Version Also Available for Download.

International Image Interoperability Framework

IIF Logo

We support the IIIF Presentation API

Venumuddala, Ramu Reddy. Distributed Frameworks Towards Building an Open Data Architecture, thesis, May 2015; Denton, Texas. (digital.library.unt.edu/ark:/67531/metadc801911/: accessed September 26, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; .