Did more famous people than usual really die in 2016?

On Tuesday, actress and author Carrie Fisher died, setting off a wave of lamentations not just for the beloved Fisher, but for the entire calendar year 2016.

The year now ending has seen the deaths not just of Fisher and of pop star George Michael on Christmas Day, but of other luminaries of sports, entertainment and politics: Prince and David Bowie, Muhammad Ali and Arnold Palmer, Fidel Castro and Nancy Reagan, Alan Rickman and John Glenn, among many others.

Combined with some people’s dissatisfaction with 2016’s political developments, these deaths have created an impression that the year 2016 has been particularly bad, an annus horribilis for the Anglo-American world.

But all these deaths are particularly raw and fresh. Has 2016 really seen more famous people die than, say, 2011, when Steve Jobs, Amy Winehouse, Osama bin Laden, Elizabeth Taylor, Joe Frazier, Václav Havel, Kim Jong-Il and Muammar Gaddafi passed away?[1]

There’s no easy way to measure collective grief, especially when it varies so widely from person to person. Prince’s death might have devastated one person while another simply noted it as a curiosity.

Similarly, there’s no right or wrong way to grieve, and no one’s wrong to feel personally affected by any or all of the celebrity deaths in 2016.

But there is an empirical question we can try to answer: have more significant people really died in 2016 than in past years? Or does it just seem that way because of the recency of the deaths or emotional carryover from the tumultuous year in politics?

You can make a good data-driven argument that 2016 has in fact seen more significant deaths than recent years.

There’s no one true way to answer that question. BBC News, for one, looked to news obituaries for its answer. (They came to a similar conclusion.)

My approach is a little different: use the records of deaths each year on Wikipedia, the open-source, user-edited encyclopedia (while aware of Wikipedia’s many flaws as a source[2]). Alongside categories like “20th-century American singers” and “African-American male actors,” Wikipedia articles can be tagged with categories like “2016 deaths.” There’s one of those for every year, and its category page lists every person with a Wikipedia article who’s marked as having died in that year.

I used an R script to compile the English-language Wikipedia pages for every person marked as having died in 2016 through Dec. 30, as well as for every year going back to 1990. You have to be somewhat notable to warrant a surviving Wikipedia page, so this is a rough proxy for notable deaths, even if most of the tens of thousands of people in the dataset weren’t world-famous like Carrie Fisher or Fidel Castro. We can also use the length of each article as a very rough proxy for a person’s importance.[3]
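The original analysis was done in R, and its collection code isn't reproduced here. As a rough illustration only, here is a minimal Python sketch of one way to pull a year's category listing along with article lengths. It assumes the MediaWiki Action API, using a `categorymembers` generator over `Category:<year> deaths` combined with `prop=info`, which reports each page's `length`. That `length` is measured in bytes rather than characters, which only approximates character counts for mostly-ASCII English text. Whether the author measured length this way is an assumption. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict

API_URL = "https://en.wikipedia.org/w/api.php"

def category_params(year: int) -> Dict[str, str]:
    """Hypothetical helper: query parameters for articles in 'Category:<year> deaths'."""
    return {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": f"Category:{year} deaths",
        "gcmnamespace": "0",  # main (article) namespace only
        "gcmlimit": "500",    # largest batch size for ordinary clients
        "prop": "info",       # 'info' includes each page's 'length' (in bytes)
    }

def fetch_json(params: Dict[str, str]) -> dict:
    """Make one live API request and decode the JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "deaths-by-year-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def page_lengths(year: int, fetch: Callable[[Dict[str, str]], dict] = fetch_json) -> Dict[str, int]:
    """Map article title -> page length, following API continuation until done."""
    lengths: Dict[str, int] = {}
    params = category_params(year)
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            lengths[page["title"]] = page["length"]
        if "continue" not in data:
            return lengths
        # The API returns the parameters needed for the next batch under "continue".
        params = {**category_params(year), **data["continue"]}
```

Calling `page_lengths(2016)` makes live requests to Wikipedia. Looping over `range(1990, 2017)` would cover the span of years discussed here. The `fetch` argument exists so the paging logic can be exercised without network access.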

The most famous celebrities

For example, here’s the number of Wikipedia death articles each year that are at least 80,000 characters long. In 2016 that cutoff includes basketball coach Pat Summitt, astronaut John Glenn, Supreme Court Justice Antonin Scalia, First Lady Nancy Reagan, boxer Muhammad Ali and musicians Merle Haggard, George Michael, Leonard Cohen, Prince and David Bowie, but not people like Fisher, hockey legend Gordie Howe, Israeli politician Shimon Peres or author and Holocaust survivor Elie Wiesel. In 2015 it would include Singapore’s founding prime minister Lee Kuan Yew and actors Christopher Lee and Leonard Nimoy, but not singers B.B. King or Scott Weiland or baseball player Yogi Berra.
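Once each article has been reduced to a year, a title and a length, the threshold count is straightforward. Below is a minimal Python sketch, not the author's R code. The record layout, the function name `count_at_least` and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (year of death, article title, article length in characters)
Article = Tuple[int, str, int]

def count_at_least(articles: Iterable[Article], threshold: int) -> Dict[int, int]:
    """Count, per year, the death articles whose length is at least `threshold`."""
    articles = list(articles)
    # Seed every year with 0 so years with no long articles still appear in a chart.
    counts = {year: 0 for year, _title, _length in articles}
    for year, _title, length in articles:
        if length >= threshold:
            counts[year] += 1
    return dict(sorted(counts.items()))

# Toy records for illustration, not real Wikipedia data.
toy = [(2015, "A", 90_000), (2015, "B", 50_000),
       (2016, "C", 120_000), (2016, "D", 80_000), (2016, "E", 30_000)]
print(count_at_least(toy, 80_000))  # {2015: 1, 2016: 2}
```

Passing `40_000` instead of `80_000` gives the lower-threshold version of the same count.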

[Chart: number of Wikipedia death articles of at least 80,000 characters, by year]

Among these most famous people, 2016 has significantly more deaths than any past year.

What about a lower threshold? Cutting off articles at 40,000 characters would add people like police shooting victim Keith Lamont Scott, murdered British M.P. Jo Cox, conservative activist Phyllis Schlafly, mixed martial artist Kimbo Slice, actress Zsa Zsa Gabor and author Harper Lee. Some famous people still fall below this line, such as baseball player José Fernández and news anchor Gwen Ifill, but this group also includes many more people who are not household names in the West than the 80,000-character group does.

[Chart: number of Wikipedia death articles of at least 40,000 characters, by year]

This is less dramatic: 2016 is the highest year in the dataset, but it fits a trend of more articles rather than standing out as an aberration. Still, a peak is a peak. If the data had shown a significant decrease in deaths this year, that would suggest people are wrong to feel there have been more celebrity deaths in 2016 than normal. Instead, the data leave that possibility open.


Another way to look at the same question: let’s take the 50 longest articles from each year and see how their length compares.
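Comparing the top 50 articles means keeping only the longest articles in each year. Here is a minimal Python sketch of that selection, using the same hypothetical `(year, title, length)` records as a stand-in for whatever structure the original R script used. The function name and toy data are illustrative only.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Article = Tuple[int, str, int]  # hypothetical: (year, title, length in characters)

def top_n_by_year(articles: Iterable[Article], n: int = 50) -> Dict[int, List[int]]:
    """Return, per year, the lengths of the n longest death articles, longest first."""
    by_year: Dict[int, List[int]] = defaultdict(list)
    for year, _title, length in articles:
        by_year[year].append(length)
    # heapq.nlargest avoids fully sorting years with thousands of articles.
    return {year: heapq.nlargest(n, lengths) for year, lengths in sorted(by_year.items())}

# Toy records for illustration, not real Wikipedia data.
toy = [(2015, "A", 90_000), (2015, "B", 50_000),
       (2016, "C", 120_000), (2016, "D", 80_000), (2016, "E", 30_000)]
print(top_n_by_year(toy, n=2))  # {2015: [90000, 50000], 2016: [120000, 80000]}
```

With `n=1`, the same function yields the single longest death article per year.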

[Chart: lengths of the 50 longest death articles, by year]

Again, 2016 is a peak; again, it’s not in a league of its own.

Adding less-famous celebrities

What if we broaden our scope significantly? Instead of looking at just the most famous people, let’s look at every single person with a Wikipedia page. That’s a much lower bar than the celebrities we’ve been discussing so far. Most people don’t meet Wikipedia’s standard of “significant coverage in reliable sources that are independent of the subject,” but that standard still admits lots of people of only minor significance: semi-obscure authors, backbench state legislators and backup athletes. Looking at the entire dataset shows some interesting trends:

[Chart: median death-article length, by year]

Looking at the median article length removes one source of bias: the fact that there are simply more Wikipedia articles each year. Here we see that 2016, and 2015 before it, have significantly longer articles among the thousands of people who qualify for articles. Just as interestingly, the increase isn’t automatic: this measure plateaued and even decreased for much of the current decade.
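The median is useful here because it depends neither on how many articles a year has nor on a handful of enormous outliers. A minimal Python sketch, again using hypothetical `(year, title, length)` records and a hypothetical function name:

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple

Article = Tuple[int, str, int]  # hypothetical: (year, title, length in characters)

def median_length_by_year(articles: Iterable[Article]) -> Dict[int, float]:
    """Median death-article length for each year."""
    by_year: Dict[int, List[int]] = defaultdict(list)
    for year, _title, length in articles:
        by_year[year].append(length)
    # statistics.median averages the two middle values when a year has an even count.
    return {year: median(lengths) for year, lengths in sorted(by_year.items())}

# Toy records for illustration, not real Wikipedia data.
toy = [(2015, "A", 90_000), (2015, "B", 50_000),
       (2016, "C", 120_000), (2016, "D", 80_000), (2016, "E", 30_000)]
print(median_length_by_year(toy))  # {2015: 70000.0, 2016: 80000}
```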

In fact, 2016 actually has fewer death articles overall than past years did, even though the ones it does have are longer:

[Chart: total number of death articles, by year]

I’d argue this roughly 12 percent decline doesn’t disprove the annus horribilis theory, in part because it’s still early: many of the lesser-known figures on Wikipedia who died this year may not yet have been marked as deceased.
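The text doesn't say which baseline the roughly 12 percent decline is measured against, such as the previous year or an average of earlier years. The sketch below therefore treats the baseline as an input. It counts articles per year and computes a percent change, using hypothetical function names and toy numbers. It illustrates the arithmetic only and does not reproduce the author's figure.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Article = Tuple[int, str, int]  # hypothetical: (year, title, length in characters)

def totals_by_year(articles: Iterable[Article]) -> Dict[int, int]:
    """Number of death articles tagged with each year."""
    return dict(sorted(Counter(year for year, _title, _length in articles).items()))

def percent_change(baseline: float, current: float) -> float:
    """Percent change from baseline to current; negative values are declines."""
    return (current - baseline) * 100 / baseline

# Toy numbers: a drop from 1,000 articles to 880 is a 12 percent decline.
print(percent_change(1_000, 880))  # -12.0
```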

Reasons for caution

This dataset suggests that 2016 could in fact have seen more celebrity deaths than usual, though it doesn’t prove it. And this approach to the question has several significant weaknesses.

The first has to do with geography. The sense of a plethora of celebrity deaths in 2016 is primarily an American one, or more broadly an Anglo-American one. But even the English-language Wikipedia has lots and lots of articles about non-Americans, many of them very long. Many of the longest articles about 2016 deaths are about people whose passing barely registered with the Americans who were moved by the deaths of Carrie Fisher or John Glenn.

For example, can you guess which of 2016’s deaths has the longest Wikipedia article? If you’re a typical American, you’d never figure it out. It’s not Prince, Bowie or Muhammad Ali, or even Fidel Castro. It’s Johan Cruyff, a Dutch soccer player and coach.

Here are the 20 longest articles about 2016 deaths, which include not just famous Americans but the King of Thailand, a French composer, a Scottish serial killer, a German historian, and Iranian and French filmmakers.

(Also of note: there are only two women among the 20 longest articles.)

Article                       Length (characters)
Johan Cruyff                  273,322
Muhammad Ali                  196,472
Fidel Castro                  168,219
David Bowie                   161,916
Prince (musician)             153,721
Keith Emerson                 120,863
Bhumibol Adulyadej            118,807
Pierre Boulez                 114,294
Lonnie Mack                   113,258
Leonard Cohen                 113,082
Nancy Reagan                  112,516
Robert Black (serial killer)  106,796
Antonin Scalia                99,220
George Michael                97,712
Merle Haggard                 91,324
Ernst Nolte                   88,350
John Glenn                    85,557
Abbas Kiarostami              85,343
Jacques Rivette               83,183
Pat Summitt                   82,533

The other major limit is that there’s no easy way to filter by reason for prominence. Americans don’t feel like 2016 was an annus horribilis just because some famous literary authors or world leaders died. It feels that way because a number of famous entertainers passed away. An ideal analysis would have some way to break out articles into rough groups: athletes, entertainers, politicians, authors, etc.

While Wikipedia does have a number of categories, they’re unfortunately far more granular than we need. Fisher, for example, is listed under “American film actresses,” “American television actresses,” “20th-century American actresses,” “21st-century American actresses,” “Actresses from Beverly Hills, California,” “Alumni of the Central School of Speech and Drama,” “American agnostics” and “Jewish agnostics” — but not just “Americans” or “Actors.”

This kind of information could be hand-coded, but that would be a laborious process that would become even more so the more people and years were included.
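Short of full hand-coding, one crude shortcut would be to map category names to rough groups by keyword. This Python sketch is not something the original analysis did. The keyword lists and function name are hypothetical, and substring matching will misfire on some names, so any real use would still need hand-checking. The sample input is the set of Fisher's categories quoted above.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical keyword rules; a real analysis would need a longer, hand-checked list.
GROUP_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "entertainer": ("actors", "actresses", "singers", "musicians", "comedians"),
    "athlete": ("players", "boxers", "athletes", "coaches"),
    "politician": ("politicians", "legislators", "presidents", "first ladies"),
    "author": ("writers", "novelists", "poets", "authors"),
}

def rough_groups(categories: Iterable[str]) -> Set[str]:
    """Assign an article to every rough group whose keywords appear in its categories."""
    groups: Set[str] = set()
    for category in categories:
        lowered = category.lower()
        for group, keywords in GROUP_KEYWORDS.items():
            # Plain substring matching: simple, but can misfire on unrelated words.
            if any(word in lowered for word in keywords):
                groups.add(group)
    return groups

fisher = [
    "American film actresses", "American television actresses",
    "20th-century American actresses", "21st-century American actresses",
    "Actresses from Beverly Hills, California",
    "Alumni of the Central School of Speech and Drama",
    "American agnostics", "Jewish agnostics",
]
print(rough_groups(fisher))  # {'entertainer'}
```

Fisher was also an author, but none of the categories quoted above say so. That is the kind of gap that makes hand-coding attractive despite the labor.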

Finally, while Wikipedia infoboxes list people’s ages, that information doesn’t appear to be accessible via the API. If it were, we could determine whether 2016’s deaths seemed unusually shocking because the people who died were younger than in prior years. If someone dies at a ripe old age like Harper Lee, people may feel sad, and may be inspired to think back on the person’s life, but they’re not likely to be shocked the way people were by the deaths of Fisher or Prince at comparatively younger ages.

So we can’t say for certain whether more Americans or more celebrities died in 2016, which are the questions we’re really trying to get at. Nor can we analyze whether 2016’s deaths were more surprising than actuarial expectations would suggest. All we can say is that it’s possible that more notable inhabitants of the Earth died in the past year than in prior years.

This report was generated using R, ggplot2 and R Markdown. Full data and the R scripts used to generate this report are available on GitHub. Below are several additional charts for readers who want to learn more:

[Chart: frequency polygon of death-article lengths]

[Chart: total size of death articles, by year]

The very longest death article on Wikipedia varies widely in length from year to year; perhaps surprisingly, Watergate prosecutor Archibald Cox’s long and distinguished career gives him the longest death article of the past 27 years.

[Chart: length of the longest death article, by year]

[Chart: box plot of death-article lengths, by year]

There’s been a gradual rise in the median article length over the past 27 years, but not a dramatic one. There seems to be more variation among the outliers: unusually long or short articles.

[Chart: dot plot of the 50 longest death articles, by year]

  1. Aside from that year’s prevalence of dictatorial deaths, which might not inspire the same level of grief.

  2. Wikipedia is, of course, rife with flaws, none more so than its dependence on a large cadre of volunteer authors and editors to stay up to date, with some pages getting far more attention than others. More recent events are likely to be covered more thoroughly on Wikipedia than events of similar magnitude that happened before Wikipedia was created in 2001. With those caveats in mind, this analysis should not be seen as a definitive answer to our question about how 2016 deaths compare.

  3. This proxy has flaws beyond the obvious one that a less-notable person with a particularly enthusiastic fan or fans could end up with a longer article than a more famous person. For one, many authors, musicians and actors have their bibliography, filmography or discography split into a separate page from their main bio, and that page isn’t counted. There are also a few special cases. In 2011, for example, Wikipedia has separate articles for “Osama bin Laden” and “Death of Osama bin Laden,” both of which are tagged “2011 deaths.” (Interestingly, the “Death of” article is considerably longer than bin Laden’s main bio.)