A number of readers, including Mickey C., Lu Fong (writer and editor at The Good Men Project and Good Feed), Cheryl S., and Kelly V., let us know about Google Ngram. The program includes a database of a little over 5 million books and allows you to graph the frequency with which various words or phrases show up in books published in various languages over time (English can also be broken down into British or American English). Mickey and Lu each graphed the words “men” and “women” (see Lu’s discussion here):
Cheryl S. tried “shameful divorce” vs. “amicable divorce”:
The plateaus are due to smoothing, which presents the data as 3-year averages. Averaging damps the huge spikes and valleys from individual data points and makes overall trends more apparent. You can change the level of smoothing. Here’s the graph with no smoothing at all:
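For anyone curious how averaging turns a one-year spike into a plateau, here is a minimal sketch in Python. It assumes a simple centered moving average; the function name `smooth`, the `window` parameter, and the numbers are made up for illustration, and Google's exact smoothing procedure may differ.

```python
def smooth(values, window=3):
    """Centered moving average; the window shrinks at the ends of the series.

    Illustrative only: this is an assumed smoothing method, not Google's code.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)                # clip the window at the start
        hi = min(len(values), i + half + 1)  # clip the window at the end
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up yearly frequencies with a single-year spike in the middle.
raw = [1.0, 1.0, 7.0, 1.0, 1.0]
print(smooth(raw))  # [1.0, 3.0, 3.0, 3.0, 1.0]
```

The lone spike of 7 gets spread across three neighboring years, which is the flat-topped shape you see in the smoothed graph. Setting `window=1` returns the raw values unchanged.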
Overall, the tool provides a way to track changes in language as well as social trends. Google provides some info on their methodology, though not as much as I’d like. Some key points:
1. They “normalize” the results based on the number of books published each year. Many more books are published each year now than in, say, 1800, so 100 occurrences of a phrase today means less than 100 occurrences then. That’s why results are presented as percentages, not as raw numbers.
2. Phrases have to appear in at least 40 books total to be included in the database.
3. Keep in mind, the dataset is not based on all books published, but on a subset of books digitized by Google Books. The database includes about 4% of all published books, according to a journal article just published in Science.
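To make the first two points concrete, here is a rough sketch in Python of what normalizing and thresholding could look like. The function names, the yearly totals, and the choice of denominator are all assumptions for illustration; only the 40-book cutoff comes from Google's description, and none of this is Google's actual code.

```python
MIN_BOOKS = 40  # cutoff mentioned in Google's methodology notes


def relative_frequency(phrase_count, yearly_total):
    """Occurrences of a phrase as a percentage of everything counted that year.

    What exactly goes into `yearly_total` is an assumption in this sketch.
    """
    if yearly_total <= 0:
        raise ValueError("yearly_total must be positive")
    return 100.0 * phrase_count / yearly_total


def is_included(books_containing_phrase, min_books=MIN_BOOKS):
    """True if the phrase appears in enough books to make it into the database."""
    return books_containing_phrase >= min_books


# Invented numbers: the same raw count in a small year and in a large year.
share_1800 = relative_frequency(100, 1_000_000)
share_2000 = relative_frequency(100, 100_000_000)
print(share_1800 > share_2000)  # True

print(is_included(39), is_included(40))  # False True
```

The same 100 occurrences make up a much larger share of the small year than of the large one, which is why the graphs plot percentages rather than raw counts.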
I suspect it will be an amazing time-killer.