Introduction

Writing of Richmond, Virginia, capital of the Confederate States of America, shortly after the city fell to Union forces in April 1865, northern journalist George Alfred Townsend declared that "This town is the Rebellion .... Its history is the epitome of the whole contest." And yet, Townsend concluded, though for four years the city had been "all that we have directly striven for," ironically "to us ... Richmond is still a mystery."1

In a book review written over 125 years later historian Kenneth W. Noe arrived at a strikingly similar conclusion. Historians have appreciated the extraordinary importance of Richmond as the political and military headquarters of the Confederacy and as the object of Union forces throughout the conflict. And yet, Noe concludes, "Few American cities have been mentioned more and understood less than Civil War-era Richmond." "It is a rare volume on the conflict's eastern theater," he continues, "that does not place the Confederate capital somewhere near the center of events .... Richmond in these volumes, however, remains an offstage presence, a sort of metropolitan version of Hamlet's father, mentioned with frequency but rarely seen."2

While it would be too much to repeat Townsend and suggest that on the eve of the sesquicentennial of the Civil War Richmond is still a mystery, many aspects of the Confederate capital's Civil War-era history (especially the rhythms of daily life of the city's inhabitants during a period marked by massive sudden in-migration, escalating material and food deprivation, hyperinflation and economic collapse, and near-constant military threat) are still something of a mystery, subjects of ongoing research that continues to reveal new insights.

This project, "Mining the Dispatch," seeks to explore, and to encourage exploration of, the dramatic and often traumatic changes as well as the sometimes surprising continuities in the social and political life of Civil War Richmond. It uses as its evidence nearly the full run of the Richmond Daily Dispatch from the eve of Lincoln's election in November 1860 through the evacuation of the city in April 1865 (at which point publication of the Dispatch ceased) to December 1865 (when publication of the Dispatch resumed), an archive of over 112,000 pieces consisting of nearly 24 million words.3 It uses as its principal methodology topic modeling, a computational, probabilistic technique to uncover categories and discover patterns in and among texts.4

Topic Modeling

Topic modeling uses statistical techniques to categorize individual texts and, perhaps more importantly, to discover categories, topics, and patterns that we might not be aware of in those texts. A topic modeling program—here the impressive MALLET application developed by Andrew McCallum and others at the University of Massachusetts, Amherst—generates a specified number of topics from a group of documents. The specific topics are not predetermined by the researcher but instead emerge from the patterns uncovered by the statistical algorithm. All that is provided by the researcher is the number of topics; for this project I chose to begin with a 40 topic model for the Dispatch.
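To make the idea of topics "emerging" from a statistical algorithm a little more concrete, here is a toy sketch of one common approach, collapsed Gibbs sampling for latent Dirichlet allocation. It is not MALLET's code and is far too slow for an archive of this size; the function name `toy_lda`, the smoothing values `alpha` and `beta`, and the four tiny "documents" are all assumptions made for illustration. As in the project, the only substantive choice handed to the algorithm is the number of topics.

```python
import random
from collections import Counter

def toy_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; illustrative only, not MALLET's code."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: topic counts per document, word counts per topic, topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every word token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight each topic by how much this document uses it
                # and how much the topic uses this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic proportions for each document (each row sums to 1).
    proportions = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # The words most often assigned to each topic.
    top_words = [[w for w, c in tw.most_common(5) if c > 0] for tw in topic_word]
    return top_words, proportions

# Four invented "documents"; the real corpus has over 112,000 pieces.
docs = [
    "negro reward ranaway jail subscriber".split(),
    "reward ranaway negro delivery jail".split(),
    "hire servants wanted cook year".split(),
    "wanted hire cook servants washer".split(),
]
top_words, proportions = toy_lda(docs, num_topics=2)
```

The function returns two things that correspond to what the essay goes on to describe: a list of characteristic words for each topic, and a set of topic proportions for each document.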

Exactly what is meant by a topic in the context of topic modeling is a bit abstract. A topic is a group of words that are likely to appear together in the same document, and a topic modeling program processes a group of documents to identify clusters of words (topics) that often appear in the same document together. To be a bit more technical, a topic is a probability distribution over words: a set of statistics indicating how likely each individual word is to appear in a document on a particular topic.5 An example makes this far clearer. The words most likely to appear in one topic identified by MALLET in the Dispatch were "negro, years, reward, boy, man, named, jail, delivery, give, left, black, paid, pay, ran, color, richmond, subscriber, high, apprehension, age, ranaway, free, feet, delivered."6 While it can be a bit challenging to grasp topic modeling conceptually and technically, it's far less challenging to look at that list of words and know exactly what that topic is. Those are the words most likely to appear in fugitive slave advertisements.
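As a minimal sketch of what "a probability distribution of words" looks like in practice, a topic can be represented as a mapping from words to probabilities, and the familiar word list is simply that mapping sorted from most to least probable. The six words come from the list above, but the probabilities are invented for illustration and the function name `top_words` is hypothetical.

```python
# Invented probabilities for a few words in one topic (illustration only).
fugitive_ad_topic = {
    "reward": 0.031,
    "negro": 0.035,
    "ranaway": 0.012,
    "jail": 0.018,
    "years": 0.033,
    "boy": 0.027,
}

def top_words(topic, n):
    """Return the n most probable words in a topic, most probable first."""
    return sorted(topic, key=topic.get, reverse=True)[:n]

print(top_words(fugitive_ad_topic, 3))  # prints ['negro', 'years', 'reward']
```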

Besides identifying topics (those lists of words associated with one another), a topic model also provides the proportions in which topics appear in each and every document. In other words, a given document likely doesn't fall neatly and entirely into a single topic but consists of multiple topics. A topic model quantifies those proportions for every document. An example from the category above: in most issues of the Dispatch in December 1861 the following fugitive slave ad appeared:

Ranaway.—$10 reward.

—Ranaway from the subscriber, on the 3d inst., my slave woman Parthena. Had on a dark brown and white calico dress. She is of a ginger-bread color; medium size; the right fore-finger shortened and crooked, from a whitlow. I think she is harbored somewhere in or near Duvall's addition. For her delivery to me I will pay $10.

de 6—ts G. W. H. Tyler.

This is, obviously, a fugitive slave ad. The topic model produced by MALLET associates this piece with the "fugitive slave ad" topic, but not exclusively. In the topic model 90% of this piece comes from the fugitive slave ad topic and the other 10% from two other topics. In fact, while the Dispatch contained thousands of fugitive slave ads from late 1860 to the end of the war, no single ad was identified by MALLET as belonging exclusively, 100%, within the topic I've labeled "fugitive slave ads."
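One way to picture where a figure like 90% comes from: in models of this kind each word in a document is attributed to a topic, and the document's proportions are the shares of its words attributed to each. The sketch below assumes that simplified picture. The token counts and the labels "topic A" and "topic B" are invented, and MALLET's own output also applies smoothing, which is omitted here.

```python
from collections import Counter

def topic_proportions(token_topics):
    """Share of a document's word tokens attributed to each topic."""
    total = len(token_topics)
    if total == 0:
        return {}
    return {topic: count / total for topic, count in Counter(token_topics).items()}

# Hypothetical 20-word ad: 18 words from one topic, one word each from two others.
labels = ["fugitive slave ads"] * 18 + ["topic A", "topic B"]
props = topic_proportions(labels)
print(props)
```

With these invented counts the dominant topic accounts for 18 of 20 words, or 90%, and the remaining 10% is split between the other two.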

Another example. Four years later in December 1865 the following appeared in the Dispatch:

Some of our tradesmen advertise the Fenian hat. We should think that the style just at present must be a shocking bad hat.—New York Tribune.

The above is not very witty, but very remarkable as coming from Horace Greeley, who is everywhere known as "the philosopher of the old white coat and shocking bad hat."

It's not straightforward for a human being to try to categorize this piece, let alone a piece of software. It's not an article that falls in any neat or simple way within the genres of journalism. It's a humorous dig at the northern abolitionist reformer and publisher of the New York Tribune that associates him with Irish revolutionaries (the Fenians) who were "black hats," political and social troublemakers. As the pie chart below shows, MALLET, appropriately and not surprisingly, doesn't place this in any single topic, instead concluding that it combines ten of the forty topics.

pie chart for topic proportions

This distribution has much to recommend it. About 14% of the piece comes from the topic I've labeled "entertainment and culture"—reasonable as this has something to do with fashion. Four topics each account for 9% of the piece, all of them explicable, perhaps even surprisingly and remarkably subtle for a computer program that has no knowledge of the meaning of any words: "European news" (the piece is connected to Irish politics), "anti-northern diatribes" (this piece is a dig at the New York "bad hat" Greeley), "humor" (the piece makes a joke at Greeley's expense of another "not very witty" joke from the Tribune), and "North" (the piece repeats news from New York). Yet the largest single topic identified by MALLET for this piece is clearly off base—the topic "fugitive slave ads." While this classification is explicable too—most fugitive slave ads mentioned the clothing worn by the runaway, and this short piece mentions "hat" three times and "coat" once—this piece appearing eight months after the end of the war and the end of American slavery is clearly not a fugitive slave ad.

These two examples suggest both the classificatory power and the limitations of topic modeling for individual documents within a larger corpus. The real potential of topic modeling, however, isn't at the level of the individual document. Topic modeling, instead, allows us to step back from individual documents and look at larger patterns among all the documents, to practice not close but distant reading, to borrow Franco Moretti's memorable phrase. The Dispatch archive is substantial in size: again, over 112,000 pieces amounting to nearly 24 million words. Conventional historical methods would involve skimming much of this and closely reading a small sampling of articles. That method has much to recommend it, and topic modeling is certainly not a replacement for conventional, close reading methods. It and other distant reading methods do, however, provide historians an additional method that allows us to examine and detect patterns within not a sampling but the entirety of an archive.7

Slavery

Examining a couple of the topics related to the institution of slavery suggests the value of topic modeling as a research tool. Using the data generated by MALLET, this site allows you to examine particular topics in depth. There are two types of graphs you can generate for a topic. First, you can choose to see a graph of the relative space that a topic occupied in the paper over time; this is particularly useful for topics that are more thematic in nature like "anti-northern diatribes" or "humor." Second, you can generate graphs that count the number of articles or advertisements where the proportion for a specified topic is above a threshold you specify. This is useful for topics that are more generic in nature like "fugitive slave ads" or "deserters." Generally a large number of documents will contain a particular topic, but in the majority of them only a tiny fraction of the document draws from the topic you might be interested in. For example, articles that have a topic proportion of, say, 20% or less for the "fugitive slave ads" category very likely aren't fugitive slave ads. The "Fenian hat" article above is a perfect example of this, and it has an even larger topic proportion from the "fugitive slave ads" category. Thus for this latter type of graph it's usually useful to select some percentage, maybe 30% or 40%, as the minimum topic proportion for articles to include in the graph you generate.

The following graph is of the latter variety. It shows the number of pieces from the paper where the proportion from the fugitive slave ad topic is equal to or greater than 21.5%:

fugitive slave ads graph--modeled
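The two kinds of graph described above reduce to two simple calculations over the per-document topic proportions. The following is a minimal sketch under stated assumptions, not the site's actual code: the record layout (`month`, `words`, `proportions`), the function names, and the three sample pieces are all hypothetical, and weighting "relative space" by word count is an assumption about how that measure might be computed.

```python
from collections import defaultdict

def count_above_threshold(pieces, topic, threshold):
    """Per month, count pieces whose proportion for `topic` is at least `threshold`."""
    counts = defaultdict(int)
    for piece in pieces:
        if piece["proportions"].get(topic, 0.0) >= threshold:
            counts[piece["month"]] += 1
    return dict(counts)

def relative_space(pieces, topic):
    """Per month, share of all words attributable to `topic` (word-weighted; an assumption)."""
    topic_words = defaultdict(float)
    all_words = defaultdict(int)
    for piece in pieces:
        share = piece["proportions"].get(topic, 0.0)
        topic_words[piece["month"]] += share * piece["words"]
        all_words[piece["month"]] += piece["words"]
    return {m: topic_words[m] / all_words[m] for m in all_words if all_words[m]}

# Three invented pieces; real input would be one record per piece in the archive.
pieces = [
    {"month": "1862-06", "words": 60, "proportions": {"fugitive slave ads": 0.90}},
    {"month": "1862-06", "words": 40, "proportions": {"fugitive slave ads": 0.10}},
    {"month": "1862-07", "words": 50, "proportions": {"fugitive slave ads": 0.50}},
]

monthly_counts = count_above_threshold(pieces, "fugitive slave ads", 0.215)
monthly_space = relative_space(pieces, "fugitive slave ads")
```

With a threshold of 21.5%, the second piece (10%) is excluded from the count even though it still contributes a little to the topic's relative space for that month, which is why the two measures suit different kinds of topics.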

When the graph is overlaid with the actual count of fugitive slave ads in the paper, the accuracy of the model in detecting these ads is apparent, even remarkable.8

fugitive slave ads graph--modeled and actual

What conclusions can we draw from this kind of distant reading, and from this graph specifically? For one, I think, this graph underscores the role of the Union army in presenting enslaved African Americans with opportunities, risky opportunities, to seize their freedom by running to the Yankee lines. There are two sustained spikes in the number of fugitive slave ads, the first in the summer of 1862 and the second in the summer of 1864. At both of those moments the Union army approached Richmond. In 1862 the Union army under McClellan came within seven miles of Richmond during the Peninsula Campaign. (Jefferson Davis's coachman, William A. Jackson, was one of the slaves who ran to the Union army that spring.) Two summers later Grant's Overland Campaign brought Union armies near Richmond. The graph registers how during each of those summers enslaved men and women used the opportunity offered by the relative proximity of the Union armies (which we might think of as a mobile North, bringing the free states closer to men and women enslaved in the South) to attempt to escape from bondage. A third spike at the end of 1861 is less explicable, and we'll return to it in a moment.

This graph of fugitive slave ads is an abstraction, but a powerful and moving abstraction inasmuch as it evidences the courageous choices that many enslaved men and women made to attempt to escape their individual enslavement and to challenge and compromise the institution of slavery. The graph of another topic shows something different: the resiliency of at least some portion of the marketplace for slave labor during the Civil War. Slave hiring, where an owner would rent a slave to a third party, was common in the South. Calculating the precise extent of slave hiring isn't possible, but estimates of the number of slaves rented in any given year range from a twentieth to a third of the total enslaved population. In a city like Richmond slaves would be rented to work as laborers in tobacco factories or ironworks, as domestics or nurses in homes, or as agricultural field hands in the surrounding countryside. While occasionally a slave might be rented for a week or a month, a year-long rental corresponding to the calendar year was the norm. The graph of the relative distribution of "hiring and wanted advertisements" evidences the annual cycle of slave hiring in Richmond.

graph--relative distribution of hiring ads

This graph suggests the stability of at least this portion of the marketplace for slave labor in Civil War Richmond. Advertisements took up a consistent amount of space in the paper in 1863, 1864, and 1865—all after the issuing of the Emancipation Proclamation—as they did in 1861 before the beginning of the war.9

Yet at the beginning of 1862 we see something different. As the graph suggests, advertisements for slave hiring were far less frequent that year than the previous year or any of the next three years. Jaime Amanda Martinez has noted a drop in the rental prices in Virginia's slave hiring market at the beginning of 1862, but the cause of this market stagnation and recession isn't entirely clear.10

This is the same moment where we see a not easily explainable spike among fugitive slave ads.

graph--relative distribution of hiring ads and fugitive slave ads

Could it be that enslaved African American men and women destabilized the slave hiring market by using the chaos of war mobilization in and around Richmond to run away in increasing numbers? This is a question that can be formulated from but not answered by these graphs alone, one that will require more traditional research methods to investigate. But the question itself suggests the value of topic modeling. Topic modeling and other distant reading methods are most valuable not when they allow us to see patterns that we can easily explain but when they reveal patterns that we can't, patterns that surprise us and that prompt interesting and useful research questions.

Robert K. Nelson

Notes

  1. Geo. Alfred Townsend, Campaigns of a Non-Combatant, and his Romaunt Abroad during the War (New York: Blelock and Company, 1866), 330.
  2. Kenneth W. Noe, Review of Ashes of Glory: Richmond at War by Ernest B. Furgurson, Journal of American History 84 (June 1997): 242.
  3. The Dispatch archive produced by the University of Richmond and Tufts University's Perseus Project includes e-text for nearly the full run of the paper except for most advertisements. Except for fugitive slave ads, advertisements having to do with the selling or hiring of slaves, military notices, and a couple of other categories, advertisements are a sample, with e-texts for a day of advertisements produced every two weeks.
  4. Two thoughtful introductions to topic modeling by and for historians are Sharon Block, "Doing More with Digitization: An Introduction to Topic Modeling of Early American Sources," Common-Place 6 (January 2006), http://www.common-place.org/vol-06/no-02/tales/ and Cameron Blevins, "Topic Modeling Martha Ballard's Diary," historying, September 23, 2010, http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/.
  5. M. Steyvers and T. Griffiths, "Probabilistic Topic Models" in Latent Semantic Analysis: A Road to Meaning, eds. T. Landauer et al. (Hillsdale, NJ: Erlbaum, 2007).
  6. Technically, in topic modeling these words are not in but rather are the topic.
  7. Franco Moretti, "Conjectures on World Literature," New Left Review 1 (January-February 2000): 54-68. Moretti's Graphs, Maps, Trees: Abstract Models for Literary History (New York: Verso, 2005), particularly the chapter on graphs, has influenced my approach to and thinking about this project.
  8. I'd like to thank and acknowledge Megan Molnar, who reviewed every issue of the Dispatch counting fugitive slave ads so I could compare the topic model for this category to actual data.
  9. Jonathan D. Martin, Divided Mastery: Slave Hiring in the American South (Cambridge, Mass.: Harvard University Press, 2004), 8; Gregg D. Kimball, American City, Southern Place: A Cultural History of Antebellum Richmond (Athens, Ga.: University of Georgia Press, 2000), 26-30.
  10. Jaime Amanda Martinez, "The Slave Market in Civil War Virginia" in Crucible of the Civil War: Virginia from Secession to Commemoration, ed. Edward L. Ayers, Gary W. Gallagher, and Andrew J. Torget (Charlottesville, Va.: University of Virginia Press, 2006), 111.