Photo by Quinn Dombrowski via flickr.com

The ever-expanding world of Google has opened the door for all kinds of large-scale statistical analyses, and in a paper published in Science, physicists Alexander Petersen, Joel Tenenbaum, and their co-authors demonstrate the utility of all that data. They mined Google's massive collection of scanned books to uncover the patterns behind the birth and death of words.

The Wall Street Journal picked up on the physicists' study and recently ran an article on their findings about language evolution. For starters, the study produces the most accurate estimate yet of the number of words in the English language: a whopping 1 million, far more than any dictionary has recorded (Webster's Third New International Dictionary has 348,000). And even though new slang seems to appear faster than anyone can decipher it, the English language appears to be growing more slowly than in past decades, partly because it has already become so rich that there is little need for new words. The words that are born, though, tend to be used relatively frequently, since they are usually coined to describe something new (think "Facebook").

The authors describe the world of words as "an inherently competitive, evolutionary environment. All these different words are battling it out against synonyms, variant spellings, and related words." Synonyms in particular, Tenenbaum told the WSJ, seem to be locked in "Darwinian battles."

Among the examples cited by the WSJ, the authors document how "Roentgenogram" was the most popular term for "X-ray" (or "radiogram," another contender) in the 20th century but is now effectively dead (that is, extremely rare). Similarly, the article notes that "loanmoneys" died around 1950, killed off by "loans," and that "persistency" is breathing its last, out-competed, appropriately enough, by "persistence."

Homogenization, the WSJ suggests, may be another reason words die faster in the modern era. For instance, William Clark (of Lewis & Clark fame) "spelled 'Sioux' 27 different ways in his journals ('Sieoux,' 'Seaux,' 'Souixx,' etc.), and several of those variants would have made it into 19th-century books." Now, between auto-correct and copy editors, such "chaotic variety" is weeded out much more quickly, in effect speeding up natural selection in the warring world of words.

Furthermore, the study suggests a "tipping point" for words. At around 30 to 50 years old, new words either become long-standing staples of the language or fall out of style like so many Zubaz. The authors suggest this may be because that stretch of decades is when dictionary makers approve or reject new candidates for inclusion. Or perhaps it's generational turnover: ever-innovative children accept or reject their parents' coinages, and the words they leave behind don't make it to the next generation of speakers.