Text Mapping is a collaboration between the University of North Texas and Stanford University with a pretty simple mission: experiment with new methods for finding and analyzing meaningful patterns embedded within massive collections of historical newspapers.
IDEA BEHIND THE PROJECT
Why do we think this is important? Because, quite simply, historical newspapers are currently being digitized at a scale that is rapidly overwhelming our traditional methods of research. The Chronicling America project (a joint endeavor of the National Endowment for the Humanities and the Library of Congress), for example, recently digitized its one millionth historical newspaper page, and it will soon make millions more freely available online.
What can scholars do with such an immense wealth of information? Currently, they cannot do much. Without tools and methods capable of handling such large datasets, and thus of sifting out the meaningful patterns embedded within them, scholars typically find themselves confined to performing only basic word searches across enormous collections. While such basic searches can, indeed, find stray information scattered in unlikely places, they become increasingly less useful as datasets continue to grow in size. If a search for a particular term yields 4,000,000 results, even those search results produce a dataset far too large for any single scholar to analyze in a meaningful way using traditional methods.
Our goal, then, is to help solve this problem by combining the two most promising methods for finding meaning in such massive collections of historical newspapers: text-mining and visualization.
With nearly a quarter million pages, we could experiment with scale.
The newspapers were all digitized according to the standards set by the national Chronicling America project, providing a uniform sample.
The Texas orientation of all the newspapers gave us a consistent geography for our visualization experiments.
And so we have been experimenting with mining language patterns and mapping the results. We are currently working on a series of prototypes of what this might look like, which we will be releasing on this site as we develop them. These prototypes consist of visualizations that build on top of text- and data-mining that we are doing with the newspaper collection.
Our first prototype, which is nearly ready for initial release, examines the quantity and quality of information available in our newspaper collection as it is spread out across both time and space. Future prototypes will attempt to answer specific research questions using the collection.