I guess I am a sucker for playing with data, and Google's new Ngram Viewer is too much fun to resist.

My first instinct was to explore the word "novel." Here I am trying to write them, so I need to know where they sit in the culture. The first graph (fig 1) shows what comes up. It looks like novels are going out of style. I am fifteen years too late.
Like any investigator faced with a disappointing result, I immediately began questioning the data. How do I know that the sample means anything? According to the paper in Science (DOI: 10.1126/science.1199644), this tool draws on about 5.2 million books, roughly 4% of all the books ever published. That could still be a biased sample, and the bias could change over time. I emailed the authors of the Science paper about it. They agreed that I had a point and gave me some more information, but not enough to settle the question in my mind.
I am left with the elephant in the room that popular writing in the media so rarely mentions: can I rely on the data?



I tried running "thriller" against "elephant" because I write thrillers, and I did get some encouragement: thrillers are a lot less popular than elephants, but they are clearly on the rise. Finally, I ran "thriller" on its own, and it looks better still. I leave you with "thriller" in the American English data (fig 4), because that's the most encouraging graph, while I rush back to my writing before thrillers go out of style.