Search Results

Corpus of News on the Web (NOW) - April 2018

Description: Dataset of words collected from newspapers and magazines from twenty different countries; the individual files include concordance information, parts-of-speech, and other arrangements of the data.
Access: Restricted to UNT Community Members. Login required if off-campus.
Date: April 2018
Creator: Davies, Mark
Partner: UNT Libraries

Corpus of News on the Web (NOW) - May 2018

Description: Dataset of words collected from newspapers and magazines from twenty different countries; the individual files include concordance information, parts-of-speech, and other arrangements of the data.
Access: Restricted to UNT Community Members. Login required if off-campus.
Date: May 2018
Creator: Davies, Mark
Partner: UNT Libraries

Congressional Globe OCR Dataset

Description: Dataset of OCR text from the Congressional Globe collection in the UNT Digital Library. In all there are 112 volumes and 104,615 pages of text in this dataset.
Date: April 6, 2015
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Link Resolver Testing

Description: This Excel file accompanies a workshop presentation titled 'Is it really that bad? Verifying the extent of full-text linking problems'.
Date: August 9, 2012
Creator: Harker, Karen
Partner: UNT Libraries

Dallas Police Shooting Twitter Dataset

Description: This dataset contains Twitter JSON data for several Twitter search queries that were collected the week following the shooting of police officers in Dallas, Texas on July 7, 2016, using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 7,146,993 Tweets make up the combined dataset.
Date: 2016-07-05/2016-07-14
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Badlands National Park Twitter Dataset

Description: This dataset contains Twitter JSON data for Tweets related to the Badlands National Park (BadlandsNPS) account's tweets about climate change and the Trump administration. The dataset covers the few days before and the days following the phenomenon on Twitter. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 321,821 Tweets make up the combined dataset.
Date: 2017-01-15/2017-01-29
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Hurricane Harvey Twitter Dataset

Description: This dataset contains Twitter JSON data for Tweets related to Hurricane Harvey and the subsequent flooding along the Texas Gulf region. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 7,041,866 Tweets make up the combined dataset.
Date: 2017-08-18/2017-09-22
Creator: Phillips, Mark Edward
Partner: UNT Libraries

#Kaepernick7 and #ISupportKaepernickBecause Twitter Dataset

Description: This dataset contains Twitter JSON data for Tweets related to the hashtags #Kaepernick7 and #ISupportKaepernickBecause. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 573,379 Tweets make up the combined dataset.
Date: 2016-08-20/2016-08-31
Creator: Phillips, Mark Edward
Partner: UNT Libraries

2015 FIFA Corruption Scandal Twitter Dataset

Description: This dataset is comprised of tweets that are related to the 2015 FIFA corruption scandal. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 8,615,937 Tweets make up the combined dataset.
Date: 2015-05-21/2015-06-05
Creator: Phillips, Mark Edward
Partner: UNT Libraries

#DescribeTrumpWithOneWord Twitter Dataset

Description: This dataset contains Twitter JSON data for Tweets related to the hashtag #DescribeTrumpWithOneWord. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 15,676 Tweets make up the combined dataset.
Date: 2017-09-02/2017-09-22
Creator: Phillips, Mark Edward
Partner: UNT Libraries

2016 Democratic National Convention in Philadelphia Twitter Dataset

Description: This dataset is comprised of tweets that are related to the 2016 Democratic National Convention in Philadelphia, Pennsylvania, which took place on July 25–28, 2016. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 15,676 Tweets make up the combined dataset.
Date: 2016-07-15/2016-08-01
Creator: Phillips, Mark Edward
Partner: UNT Libraries

[Dataset: Paper Mould Version 2]

Description: 3D dataset model of a hand papermaking mould consisting of three parts: a mould frame, a mould surface, and a deckle. In this version (v2), the mould frame and the mould surface are printed separately and can be fastened together with common sheet metal screws. The user will need to print all three files to have a complete papermaking mould. The resulting 3D printed model will replicate the historical artifact used in Great Britain and parts of Europe in the nineteenth and twentieth centuries w…
Date: November 10, 2020
Creator: Queen, Brian
Partner: UNT Libraries

[Dataset: Paper Mould Version 3]

Description: 3D dataset model of a hand papermaking mould consisting of three parts: a mould frame, a mould surface, and a deckle. This version (v3) is the same as version 2 in that the mould frame and the mould surface are printed separately, but the holes in v3's mould frame are larger in diameter to accept a heat-set threaded insert, and machine screws are used in place of sheet metal screws. The user will need to print all three files to have a complete papermaking mould. The resulting 3D printed model w…
Date: November 10, 2020
Creator: Queen, Brian
Partner: UNT Libraries

Labeled PDF Dataset from End of Term (EOT) 2008 Web Archive

Description: This dataset contains a random sample of 2000 PDF documents from the usda.gov domain in the End of Term (EOT) 2008 Web Archive. These samples were categorized as being of interest for possible inclusion in the Technical Report Archive and Image Library (TRAIL). Each PDF has been sorted into two categories, Technical_Report and Not_Technical_Report.
Date: July 2018
Creator: Kirkwood, Patricia; Phillips, Mark Edward & Caldwell, Christopher
Partner: UNT Libraries

Gaming Census Dataset

Description: This dataset represents survey feedback gathered about games in libraries, collections, cataloging, outreach, and programming.
Date: December 3, 2018
Creator: Brannon, Sian; Robson, Diane & Dewitt-Miller, Erin
Partner: UNT Libraries

John Lewis Twitter Dataset

Description: This dataset contains Twitter JSON data for several Twitter search queries that were collected following the death on July 17, 2020, of American politician and civil-rights leader John Lewis, who served in the United States House of Representatives for Georgia's 5th congressional district from 1987 until his death. This dataset was created using the twarc (https://github.com/DocNow/twarc) package that makes use of Twitter's search API. A total of 6,870,881 Tweets and 42,055 media files make up …
Date: 2020-07-10/2020-08-10
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Hydroxychloroquine Twitter Dataset

Description: This dataset contains Twitter JSON data for several Twitter search queries related to the drug hydroxychloroquine and claims about its effectiveness as a coronavirus treatment. This dataset was created to capture the opinions on Twitter after a group of people calling themselves "America’s Frontline Doctors" released a video sharing misleading claims about the virus and the drug's use as an effective treatment. This dataset was created using the twarc (https://github.com/DocNow/…
Date: 2020-07-20/2020-08-11
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Extended Date/Time Format (EDTF) Dates Research Datasets

Description: Two datasets, each with 390,751 date samples from the UNT Libraries' digital collections. These samples were compiled for research regarding the Extended Date/Time Format (EDTF) standard. The first dataset contains a concatenated list of date values from the metadata records in The Portal to Texas History, the UNT Digital Library, and The Gateway to Oklahoma History. The "classified" dataset includes labels expressing whether each date is EDTF-valid and the level of conformance.
Date: February 28, 2013
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Portal to Texas History Newspaper OCR Text Dataset: Gainesville

Description: Dataset of OCR text from The Portal to Texas History and the Texas Digital Newspaper Program. This dataset includes titles from Gainesville, Texas from the years 1888 to 1897. Titles in this dataset include: The Daily Hesperian and The Gainesville Daily Hesperian. In all there are 2,286 issues comprised of 9,359 pages of text.
Date: November 12, 2015
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Portal to Texas History Newspaper OCR Text Dataset: Temple

Description: Dataset of OCR text from The Portal to Texas History and the Texas Digital Newspaper Program. This dataset includes titles from Temple, Texas from the years 1907 to 1922. Titles in this dataset include: Temple Daily Telegram. In all there are 4,627 issues comprised of 44,633 pages of text.
Date: November 12, 2015
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Portal to Texas History Newspaper OCR Text Dataset: Abilene

Description: Dataset of OCR text from The Portal to Texas History and the Texas Digital Newspaper Program. This dataset includes titles from Abilene, Texas from the years 1888 to 1923. Titles in this dataset include: Abilene Daily Reporter, Abilene Morning Reporter, Abilene Semi-Weekly Farm Reporter, Abilene Semi-Weekly Reporter, Abilene Weekly Reporter, The Abilene Reporter, The Abilene Semi-Weekly Reporter, and the Abilene Weekly Reporter. In all there are 7,208 issues comprised of 62,871 pages…
Date: November 12, 2015
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Portal to Texas History Newspaper OCR Text Dataset: Denton

Description: Dataset of OCR text from The Portal to Texas History and the Texas Digital Newspaper Program. This dataset includes titles from Denton, Texas from the years 1892 to 1911. Titles in this dataset include: Denton County News, Denton County Record and Chronicle, Denton Evening News, Legal Tender, Record and Chronicle, The Denton County Record, and The Denton Monitor. In all there are 690 issues comprised of 4,686 pages of text.
Date: November 12, 2015
Creator: Phillips, Mark Edward
Partner: UNT Libraries

Portal to Texas History Newspaper OCR Text Dataset: McKinney

Description: Dataset of OCR text from The Portal to Texas History and the Texas Digital Newspaper Program. This dataset includes titles from McKinney, Texas from the years 1880 to 1936. Titles in this dataset include: Collin County Mercury, McKinney Weekly Democrat-Gazette, The Daily Courier, The Daily Gazette, The Democrat, The Democrat-Gazette, The Lion Roar, The McKinney Advocate, The McKinney Examiner, The McKinney Gazette, The Semi-Weekly Courier, The Southern Jerseyite, and The Weekly Democr…
Date: November 12, 2015
Creator: Phillips, Mark Edward
Partner: UNT Libraries