Search Results

2015 FIFA Corruption Scandal Twitter Dataset
This dataset is comprised of tweets that are related to the 2015 FIFA corruption scandal. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 8,615,937 Tweets make up the combined dataset.
2016 Democratic National Convention in Philadelphia Twitter Dataset
This dataset is comprised of tweets that are related to the 2016 Democratic National Convention in Philadelphia, Pennsylvania, which took place on July 25–28, 2016. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 15,676 Tweets make up the combined dataset.
2018 Texas Senate Debate Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to the United States Senate race between Beto O'Rourke and Ted Cruz. This dataset contains Tweets captured around their first debate on September 21, 2018. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 3,006,198 Tweets and 101,050 media files make up the combined dataset.
[Age of the UNT Libraries Collection Dataset, 2013]
Dataset generated for the University of North Texas Libraries collection tabulating the number of items published by decade within each subject area.
ALA Values and LGBT Social Justice
This dataset contains survey results from librarians regarding their stance on American Library Association values and social justice in relation to LGBTQ issues.
Analyzing COVID-19 Resources on Association of Academic Health Sciences Libraries’ (AAHSL) LibGuides – DATA
Data collected in order to analyze Association of Academic Health Sciences Libraries (AAHSL) member libraries’ COVID-19 LibGuides to determine quantity and origin of links included. The data set includes information on AAHSL member libraries, the stratified sample, and links/structure of applicable LibGuides.
The Autonomic Spectrum Questionnaire: A Factor Analysis
Dataset for the article, "The Autonomic Spectrum Questionnaire: A Factor Analysis" by Colin Ross, Justin Litvin, Anthony Ryals, and Patricia Kaminski (2021).
Badlands National Park Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to tweets by the Badlands National Park (BadlandsNPS) account about climate change and the Trump administration. This dataset was collected a few days before and after the phenomenon on Twitter. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 321,821 Tweets make up the combined dataset.
Biological Systems Dataset
Dataset generated for research on biological systems.
Blood Brain Organic Solute Descriptors
Dataset generated for research on blood brain organic solute descriptors.
Blood Fat Solute Descriptors Dataset
Dataset generated for research on blood fat solute descriptors.
Blood Liver Organic Solute Descriptors Dataset
Dataset generated for research on blood liver organic solute descriptors.
Bluegill Dataset
Dataset generated for research on bluegills.
Coda Archival Digital Repository Dataset
This dataset contains information extracted from the UNT Libraries' Coda Digital Repository. It contains information related to number of files, size, and ingest date of digital objects added to that system. It can be used for analysis and investigation of the growth and makeup of digital repositories.
Congressional Globe OCR Dataset
Dataset of OCR text from the Congressional Globe collection in the UNT Digital Library. In all there are 112 volumes and 104,615 pages of text in this dataset.
Consumer health information on public library websites: Availability and characteristics
Data documenting a sample of 200 U.S. public libraries. Each library's website was content analyzed to 1) determine if it provides online consumer health information (CHI) sources and, if yes, 2) describe eight characteristics of the CHI sources.
Dallas Police Shooting Twitter Dataset
This dataset contains Twitter JSON data for several Twitter search queries that were collected the week following the shooting of police officers in Dallas, Texas, on July 7, 2016, using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 7,146,993 Tweets make up the combined dataset.
Data Annex to the United Nations Truth Commission on the civil war in El Salvador from 1979–1991 (digitized text)
This dataset contains statistical information transcribed from the supplementary documentation of a United Nations (UN) report compiled by The Commission on the Truth for El Salvador (La Comisión de la Verdad para El Salvador). It includes information about approximately 20,000 civilian/noncombatant victims of the civil war in El Salvador (from 1979 to 1991), taken from interviews with survivors and with people who knew, or knew of, the victims.
DataRes Project Institution Policy Scan Data
Dataset from the DataRes Project indicating the names of the institutions in the study, funding awarded by the National Science Foundation (NSF) and the National Institutes of Health (NIH) during the 2010-2011 fiscal year, whether institutions have a Data Management Policy, and the URL if a policy exists.
DataRes Project Primary Survey
Dataset from the DataRes Project. This dataset is the primary survey on data management needs of researchers.
DataRes Project Secondary Survey
Dataset from the DataRes Project. This dataset is the secondary survey on data management needs of researchers.
[Dataset of Web Archiving Research Articles]
Datasets used in the presentation, "Towards Building a Collection of Web Archiving Research Articles." The files included here were used to conduct several Machine Learning classification experiments that result in a corpus of scholarly research articles on the topic of web archiving.
[Dataset Supplemental Material and References]
Supplemental materials and references accompanying a series of chemistry datasets.
#DescribeTrumpWithOneWord Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to the hashtag #DescribeTrumpWithOneWord. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 15,676 Tweets make up the combined dataset.
Development of Facultative Air Breathing in Bristlenose Plecos (Ancistrus cirrhosus)
Data collected on air breathing development in the bristlenose pleco. Bristlenose plecos breathe air with a highly vascularized stomach when exposed to aquatic hypoxic conditions. This study looked at the development of this behavior and when the fish first began to breathe air.
#DiaperDon Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to the hashtag #DiaperDon. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 866,987 Tweets make up the combined dataset.
Digital Public Library of America: Bulk Metadata Download Feb 2015
Dataset containing metadata contributed to the Digital Public Library of America and normalized into their internal format.
ERCOT/2021 Texas Power Crisis Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to the Electric Reliability Council of Texas (ERCOT) during the 2021 Texas power crisis, from February 10 through February 27, 2021. The dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 612,082 Tweets make up the combined dataset.
Ethics Gaming Survey Results
Dataset generated for a National Science Foundation grant project, "EAGER: Prototyping a Virtue Ethics Game." These files contain the research results of the pre-test and post-test surveys.
Extended Date/Time Format (EDTF) Dates Research Datasets
Two datasets, each with 390,751 date samples from the UNT Libraries' digital collections. These samples were compiled for research regarding the Extended Date/Time Format (EDTF) standard. The first dataset contains a concatenated list of date values from the metadata records in The Portal to Texas History, the UNT Digital Library, and The Gateway to Oklahoma History. The "classified" dataset includes labels expressing whether each date is EDTF-valid and the level of conformance.
Gaming Census Dataset
This dataset represents survey feedback gathered about games in libraries, collections, cataloging, outreach, and programming.
Goldfish Dataset
Dataset generated for research on goldfish.
Hurricane Dorian Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to Hurricane Dorian, which is the most intense tropical cyclone on record to strike the Bahamas and is regarded as the worst natural disaster in the country's history. This dataset was created using the twarc (https://github.com/DocNow/twarc) package that makes use of Twitter's search API. A total of 3,000,553 Tweets and 84,216 media files make up the combined dataset.
Hurricane Florence Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to Hurricane Florence and the subsequent flooding along the Carolina coastal region. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 4,971,575 Tweets and 347,205 media files make up the combined dataset.
Hurricane Harvey Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to Hurricane Harvey and the subsequent flooding along the Texas gulf region. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 7,041,866 Tweets make up the combined dataset.
Hurricane Ida Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to Hurricane Ida, which was a deadly and destructive Category 4 Atlantic hurricane that made landfall in Louisiana in 2021. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 1,868,703 Tweets make up the combined dataset.
Hurricane Laura Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to Hurricane Laura that formed August 20, 2020 and dissipated August 29, 2020. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 1,168,178 Tweets make up the combined dataset.
Hydroxychloroquine Twitter Dataset
This dataset contains Twitter JSON data for several Twitter search queries related to the drug hydroxychloroquine and claims about its effectiveness as a coronavirus treatment. This dataset was created to capture the opinions on Twitter after a group of people calling themselves "America’s Frontline Doctors" released a video sharing misleading claims about the virus and the drug's use as an effective treatment. This dataset was created using the twarc (https://github.com/DocNow/twarc) package that makes use of Twitter's search API. A total of 4,187,890 Tweets and 15,779 media files make up the combined dataset.
Ionic Liquids Dataset
Dataset generated for research on ionic liquids.
John Lewis Twitter Dataset
This dataset contains Twitter JSON data for several Twitter search queries that were collected following the death on July 17, 2020, of American politician and civil-rights leader John Lewis, who served in the United States House of Representatives for Georgia's 5th congressional district from 1987 until his death. This dataset was created using the twarc (https://github.com/DocNow/twarc) package that makes use of Twitter's search API. A total of 6,870,881 Tweets and 42,055 media files make up the combined dataset.
#Kaepernick7 and #ISupportKaepernickBecause Twitter Dataset
This dataset contains Twitter JSON data for Tweets related to the hashtags #Kaepernick7 and #ISupportKaepernickBecause. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 573,379 Tweets make up the combined dataset.
Labeled PDF Dataset from End of Term (EOT) 2008 Web Archive
This dataset contains a random sample of 2000 PDF documents from the usda.gov domain in the End of Term (EOT) 2008 Web Archive. These samples were categorized as being of interest for possible inclusion in the Technical Report Archive and Image Library (TRAIL). Each PDF has been sorted into two categories, Technical_Report and Not_Technical_Report.
Labeled PDF Dataset from Texas Records and Information Locator (TRAIL) Web Archive
This dataset contains a random sample of 2000 PDF documents from the Texas Records and Information Locator (TRAIL) Web Archive from the Texas State Library and Archives Commission. Each PDF has been sorted into two categories, TX_Pub_In_Scope and Not_TX_Pub.
Labeled PDF Dataset from UNT.edu
This dataset contains a random sample of 2000 PDF documents from the Spring 2017 Web Archive of the unt.edu domain (https://digital.library.unt.edu/ark:/67531/metadc993363/) that have been sorted into two categories, ForRepo and NotForRepo.
Log LC Bluegill Dataset
Dataset generated for research on bluegills.
Log LC Daphnia pulex Dataset
Dataset generated for research on Daphnia pulex.
Log LC Goldfish Dataset
Dataset generated for research on goldfish.
Log P Blood Brain Dataset
Dataset generated for research on blood brain.
Log P Blood Fat Dataset
Dataset generated for research on blood fat.
Log P Blood Liver Dataset
Dataset generated for research on blood liver.