Saturday, 18 December 2010
A novel look at novels
I guess I am a sucker for playing with data, but the new Google thing called Ngram viewer is too much fun to resist.
As many people know, Google has been digitising all the literature that it can lay its hands on for some while. They have now made available a new tool based on this stuff. What it lets you do is explore the use of words and phrases as they have occurred over time in the sample that they have made available. They are caling this culturomics, see www.culturomics.org
My first instinct was to explore the word novel. Here I am trying to write them, so I need to knowwhere they sit in the culture. The first graph (fig 1) shows you what comes up. It looks like they are going out of style. I am fifteen years too late.
Like any investigator finding a disappointing result, I immediately began querying the data. How do I know that the sample means anything? According to the paper in Science, (Science DOI: 10.1126/science.1199644) this tool looks at about 5.2 million books, 4% of all the books ever published. It could still be a biased sample and the bias could change over time. I emailed the authors of the science paper about it and they agreed that I had a point and gave me some more information, but not enough to resolve the question in my mind.
I am stuck with the elephant in the room that so often is never talked about in popular writing in the media; can I rely on the data?
What things are constant? I asked myself. I fell back on Benjamin Franklin who said “nothing is certain but death and taxes” and he said it in 1789, before the Google sample starts, so I’m not adding bias or double counting by relying on it. Franklin said this in a letter to one Jean-Baptiste Leroy. By an amazing coincidence, one of the authors of the Science paper is also a Jean-Baptiste. Jean-Baptiste gets no score at all and Benjamin Franklin varies widely, peaking in 1940, so I stuck with death and tax.
Franklin may be right that they are certain, but if this literature is to be believed, our interest in them is not constant (fig 2). I’m still left with the elephant in the room, so I tried elephant, alongside death and tax (fig 3). Elephants live a long time and I don’t think they go in an out of fashion much, so perhaps it is no surprise that they give a relatively constant score.
I tried running novel against elephant, it shows a rise of popularity, as compared to elephants, but it still peaks fifteen years ago. Is my writing career really doomed?
I tried running thriller against elephant because I write thrillers, and I did get some encouragement, thrillers are a lot less popular than elephants but are clearly on the up. Finally, I ran thriller on its own, and it gets better still. I leave you with thriller in the American English data (fig 4), because that’s the most encouraging graph, while I rush to get back to writing before they go out of style.