With every word, I drop knowledge: A lexical analysis of 'Hamilton'

Lin-Manuel Miranda’s cultural phenomenon of a musical, “Hamilton” (review), has gotten plenty of attention for its words. It’s chock full of them, for one thing, with more than 20,000 words in its two-hour-and-23-minute runtime (an order of magnitude more than “1776,” a prior generation’s Revolutionary War musical). For another, the very concept of the show is about words, and how a young man’s capacity with a pen lifts him from poverty to power.

So I think it’s worth a closer look at those words. Here they all are — the most important ones, anyway:


The most common words (common English words like “the” and “a” have been omitted) are those repeated over and over again in motifs, such as the show’s effective motto, “I’m not throwing away my shot.” Character names are frequently mentioned, as are a handful of short action verbs: “look,” “wait,” “take,” “see.” In fact, lots of the biggest words there are just a single syllable — overall, the average word in the show is just 1.4 syllables long.
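For readers who want to try this kind of count themselves, here is a minimal Python sketch of the idea: drop common words, tally the rest, and estimate the average syllables per word. It is not the code used for this project. The tiny stopword list, the vowel-group syllable heuristic and the function names are all assumptions made for illustration, and a real analysis would use a fuller stopword list and a proper syllable dictionary.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "i", "i'm", "my", "not", "of", "to", "in", "is", "it"}

def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))

def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def top_words(text, n=5):
    """Return the n most common words after dropping stopwords."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    return Counter(words).most_common(n)

def average_syllables(text):
    """Estimate the mean number of syllables per word (stopwords included)."""
    words = tokenize(text)
    return sum(count_syllables(w) for w in words) / len(words)

sample = "I’m not throwing away my shot. I’m not throwing away my shot."
print(top_words(sample, 3))
print(round(average_syllables(sample), 2))
```

The vowel-run heuristic miscounts words with silent vowels (“take” comes out as two syllables), so treat its averages as rough estimates.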

But we can delve deeper. Different characters have very different voices in the show — from the dense verses of Alexander Hamilton and Angelica Schuyler to the more balladic stylings of George Washington and Elizabeth Schuyler to the basic raps of Hamilton’s first act friends.

Here are side-by-side comparisons of Hamilton’s and Burr’s word use in the musical:


Or the two main Schuyler Sisters, Eliza and Angelica:


Other characters have distinctive personalities in the show, but are defined lexically by their role as a backup chorus:


And some just don’t have enough lines for any of them to be especially significant:

Philip Hamilton

For completeness, here are the show’s three presidents:


And the words spoken, sung and rapped by the show’s background chorus:


Pretty cool? We don’t have to say goodbye yet.

As you can tell by the relative size of the word clouds, some characters have far more words in the show than others. Unsurprisingly, the titular character has by far the most words, followed by his rival Aaron Burr:


But not all these words are created equal. Though the show as a whole is dominated by a pretty simple vocabulary, some characters are much more likely to break out a $10 word than others — and Alexander Hamilton is perhaps surprisingly not the leader in that category.

I calculated the Flesch-Kincaid reading level scores for each significant character. Their numbers reflect the number of years of education generally required to understand a text:


King George, in fact, has the longest words of the bunch, despite singing languidly instead of rapping furiously. Philip is second, though he has so few words that this may be an anomaly. My favorite character, Angelica Schuyler, uses the most complex language of any of the major characters, at just shy of a 10th grade level. Hamilton and Burr rap at a middle school level, as do Washington and Jefferson. Meanwhile Laurens, Lafayette and Mulligan, who perform what Miranda describes as “super beginner raps,” are understandable by a typical third grader.
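For the curious, the Flesch-Kincaid grade level combines two ratios, words per sentence and syllables per word, using the standard published constants. The sketch below is a rough Python illustration, not the code behind the scores in this piece. The sentence splitter (on ., ! and ?) and the vowel-run syllable counter are simplifying assumptions, and lyrics without punctuation would need a different definition of “sentence,” such as one per line.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Assumption: sentences end at ., ! or ? (lyrics may need line breaks instead).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.replace("’", "'"))
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

print(round(flesch_kincaid_grade("Look around. Look around."), 2))
```

Short sentences made of short words push the score down, which is why simple, punchy raps land at an early grade level.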

I sense the convention is listless, so one final piece — the most complex text-mining tool so far. There’s an algorithm called “fuzzy c-means clustering” which divides up data into groups “so that items in the same class are as similar as possible, and items in different classes are as dissimilar as possible.” (For other examples of how this can be used, check out this piece on 2016 presidential candidates and my own article on Minnesota State of the State addresses.)

In this case, the code divided each character’s combined lyrics among three different categories. These categories don’t have any inherent meaning, but each reflects certain words that are distinctive to that category. For example, Group 2 is most strongly associated with Angelica and Eliza, and is typified by words such as “satisfied” and “sister,” though it also features refrains from other characters like “must” and “nice.”
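To make the idea of partial membership concrete, here is a small pure-Python sketch of fuzzy c-means. It is not the code used for this project. The function name, the two-cluster setup and the toy word counts are all invented for illustration. Each item gets a percentage in every cluster rather than a single label, and the percentages for an item sum to 100.

```python
import random

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Return (centers, memberships). memberships[j][i] is the degree to which
    point j belongs to cluster i; each row sums to 1. m > 1 is the fuzziness."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Start from random memberships, normalized so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Each center is the mean of all points, weighted by membership ** m.
        centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        for j, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centers]
            if min(dists) == 0:
                # The point sits exactly on a center: assign it there fully.
                hits = sum(1 for d in dists if d == 0)
                u[j] = [1.0 / hits if d == 0 else 0.0 for d in dists]
            else:
                inv = [d ** (-2.0 / (m - 1)) for d in dists]
                total = sum(inv)
                u[j] = [v / total for v in inv]
    return centers, u

# Invented counts of two words per "character": [uses of word A, uses of word B].
points = [[9, 1], [8, 0], [10, 2], [1, 9], [0, 8], [2, 10]]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
for row in memberships:
    print(" ".join(f"{100 * v:.2f}%" for v in row))
```

Which cluster ends up numbered first depends on the random starting point, which is one reason the group numbers carry no meaning of their own.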

The details are less important than the big picture, which is that the algorithm immediately pegs Alexander Hamilton’s lyrics as distinct from the rest of the cast. He dominates his category in a way that no one else does:

Character             Group 1   Group 2   Group 3
Angelica Schuyler       3.96%    64.97%    31.08%
Aaron Burr             30.16%    40.46%    29.39%
Elizabeth Schuyler      8.54%    54.03%    37.43%
Ensemble               35.40%    33.97%    30.63%
Extras                  4.35%    68.45%    27.19%
King George             2.95%    35.87%    61.18%
Alexander Hamilton     93.28%     3.51%     3.21%
Thomas Jefferson        5.08%    57.72%    37.20%
Lafayette               2.63%    18.90%    78.47%
John Laurens            4.53%    24.70%    70.77%
James Madison           3.50%    47.36%    49.14%
Hercules Mulligan       2.32%    17.42%    80.26%
Philip Hamilton         2.90%    36.27%    60.83%
George Washington       5.63%    50.86%    43.51%

The man is simply non-stop.

(Can we get back to politics, please? For a different take on the time period covered in “Hamilton,” check out my essay on the Election of 1800: “1800 and the politics of the absolute.”)

Visit my GitHub to view the code used to create this project.