Mapping Texts: Combining Text-Mining and Geo-Visualization To Unlock The Research Potential of Historical Newspapers Page: 21
53 p.
Extracted Text
The following text was automatically extracted from the image on this page using optical character recognition software:
Lastly, all of the onscreen text is drawn from a simple text "configuration" document that could
be easily edited to change the labeling, geographic or temporal context, or underlying data sets.
Here is the completed version:
[Figure: screenshot of the completed "Mapping Texts: Assessing Digitization Quality" interface,
"Scans of Texas Newspapers, 1829-2008".]

On-screen description: "This visualization plots the quantity and quality of 232,567 pages of
historical Texas newspapers, as they spread out over time and space. The graphs plot the overall
quantity of information available by year and the quality of the corpus (by comparing the number of
words we can recognize to the total number scanned). The map shows the geography of the collection,
grouping all newspapers by their publication city, and can show both the quantity and quality of
the newspapers from various locations. Clicking on a particular city will provide a detailed view
of the individual newspapers, where you can examine both the quantity and quality of information.
A timeline of historical events related to Texas is also available for context."

Panels and labels visible in the screenshot:
   Timeline: "Quantity of Recognized and Unrecognized Text, 1828-2008", with year ticks from 1875
   to 2000 and the readout "Total Words Scanned 563.85 k, Correct Words Scanned 454.13 k,
   February 01, 2008".
   Map: "Collection Quantity and Quality by Location". Totals shown: Total words 93,508,784;
   Good words 72,186,505.
   Newspapers listed in the detail view: The HSU Brand, The McMurry Bulletin, The Optimist,
   The Reata, The War Whoop, plotted against a percentage scale from 20% to 100%.
   Legend: "Ratio of Good to Bad Words". Showing cities having publications with 50.0% - 100%
   correct words. Circle Scaling: Log or Linear. Circle size is relative to total number of words
   scanned. Color indicates overall scan quality.
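The quality measure named in the on-screen description (recognized words compared to the total number scanned), the per-year quantities behind the graphs, and the "Log" and "Linear" circle scaling options can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout `YearCount`, the function names, the default `max_radius` of 40, and the use of `log1p` to normalize the logarithmic scale are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class YearCount:
    """Word counts for one publication year (hypothetical record layout)."""
    year: int
    total_words: int  # every word produced by the scan
    good_words: int   # words that could be recognized


def quality_ratio(good_words: int, total_words: int) -> float:
    """Share of scanned words that were recognized, from 0.0 to 1.0."""
    if total_words == 0:
        return 0.0
    return good_words / total_words


def summarize(counts: list[YearCount], start: int, end: int) -> tuple[int, int, float]:
    """Total words, good words, and quality ratio for years start..end inclusive."""
    selected = [c for c in counts if start <= c.year <= end]
    total = sum(c.total_words for c in selected)
    good = sum(c.good_words for c in selected)
    return total, good, quality_ratio(good, total)


def circle_radius(total_words: int, max_words: int,
                  max_radius: float = 40.0, scaling: str = "log") -> float:
    """Map a city's word count to a circle radius (assumed scaling rule)."""
    if total_words <= 0 or max_words <= 0:
        return 0.0
    if scaling == "log":
        # Assumption: normalize on a log scale so small collections stay visible.
        fraction = math.log1p(total_words) / math.log1p(max_words)
    else:
        fraction = total_words / max_words
    return max_radius * fraction


# Totals visible in the screenshot: 72,186,505 good words of 93,508,784 total.
print(f"{quality_ratio(72_186_505, 93_508_784):.1%}")  # prints 77.2%
```

Calling `summarize` with a narrower pair of years corresponds to adjusting the dates on the timeline, and switching `scaling` between `"log"` and `"linear"` shows why a logarithmic scale keeps cities with small collections visible next to the largest ones.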
* At the top is a timeline that plots the quantity of words (both in the complete corpus and the
"good words") over time, providing an overall sense of how the quantity of information ebbs
and flows across time periods. Users can also adjust the dates on the timeline to focus on a
particular date range and explore the quantity of information available in more detail.
o In our collection, visualizing the data on this timeline reveals that two time periods
in particular dominate the information available: 1883-1911 and 1925-1940. Even though the
entire collection represents 1829-2008, newspapers from those
Torget, Andrew J., 1978-; Mihalcea, Rada, 1974-; Christensen, Jon & McGhee, Geoff. Mapping Texts: Combining Text-Mining and Geo-Visualization To Unlock The Research Potential of Historical Newspapers, paper, 2011; (https://digital.library.unt.edu/ark:/67531/metadc83797/m1/21/: accessed April 23, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Arts and Sciences.