Visualizing Amounts

This week I am examining different ways to compare amounts across categories. The data set I am working with describes Broadway shows that have run over the past several decades. I am interested in comparing Broadway shows from two time periods: 2000-2008 and 2009-2016. Specifically, I want to know which shows were the most popular in each period, where I define “most popular” by the total number of people who saw the show during that period. In addition, I am only interested in shows that ran during both time periods, so that we can examine how each show's popularity changed. I selected the top seven shows by attendance that ran in both periods: The Lion King, Wicked, Mamma Mia!, The Phantom of the Opera, Chicago, Jersey Boys, and Mary Poppins. I will use two types of graphs to compare attendance: grouped bar plots and stacked bar plots.
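The selection described above (label each record with a period, keep shows that appear in both periods, then rank by total attendance) can be sketched in base R. This is only an illustration of the logic, not the code used for this post: the column names show, year, and attendance are assumptions, the show names are placeholders, and the numbers are invented.

```r
# Toy data with invented numbers; the column names are assumptions.
shows <- data.frame(
  show = c("Show A", "Show A", "Show B", "Show B", "Show C", "Show D", "Show D"),
  year = c(2005, 2012, 2003, 2010, 2006, 2007, 2015),
  attendance = c(500, 400, 300, 350, 900, 100, 150)
)

# Label each row with its time period (all toy years fall in 2000-2016).
shows$period <- ifelse(shows$year <= 2008, "2000-2008", "2009-2016")

# Total attendance per show and period.
totals <- aggregate(attendance ~ show + period, data = shows, FUN = sum)

# Keep only shows that have a total in both periods.
n_periods <- table(totals$show)
both <- names(n_periods)[n_periods == 2]
totals <- totals[totals$show %in% both, ]

# Rank the remaining shows by combined attendance and keep the top few.
combined <- sort(tapply(totals$attendance, totals$show, sum), decreasing = TRUE)
top_n <- 2  # the post keeps seven; two is enough for the toy data
top_shows <- names(head(combined, top_n))
top_shows
```

Note that Show C has the largest single-period total in the toy data but is dropped because it ran in only one period, which mirrors the "both periods" rule.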

First, we will look at a grouped bar graph that could use some improvement.

This grouped bar graph shows each production on the x-axis and plots the total attendance for that production on the y-axis. Some of the show titles are difficult to read because the names are long and run into each other, creating a mess. In addition, the order of the shows on the x-axis carries no meaning: ggplot defaults to alphabetical order. It would be more informative to sort the shows in ascending or descending order of total attendance.
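One common way to replace the alphabetical default is to turn the show column into a factor whose levels are sorted by total attendance, since ggplot draws a discrete axis in factor-level order. The sketch below uses base R's reorder() on invented numbers with placeholder show names; the column names are assumptions.

```r
# Toy totals with invented numbers; two rows per show, one for each period.
totals <- data.frame(
  show = rep(c("Show A", "Show B", "Show C"), each = 2),
  period = rep(c("2000-2008", "2009-2016"), times = 3),
  attendance = c(100, 200, 500, 400, 250, 250)
)

# reorder() sorts the factor levels by a summary of a second variable.
# FUN = sum adds up both periods for each show, and negating attendance
# turns the default ascending sort into descending order.
totals$show <- reorder(totals$show, -totals$attendance, FUN = sum)

levels(totals$show)
```

With these toy values the levels come out as Show B, Show C, Show A, so a plot with show on the x-axis would run from the largest total to the smallest.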

Below is the same data graphed in a stacked bar graph.

Once again, the overlapping x-axis labels and the arbitrary x-axis order are present. In addition, the total attendance is displayed as a power of 10, which makes the y-axis somewhat tricky to read. We can fix all three issues in the next two graphs.

This grouped bar chart and stacked bar chart solve the x-axis label problem by rotating the graphs 90 degrees, so that the show titles run down the vertical axis and the attendance values are displayed horizontally. In addition, the shows are now ordered from greatest total attendance to least. This allows the reader to see not only how attendance changed between time periods, but also which shows have remained strong draws over time. For example, it is easy to see that longer-running shows like The Lion King, The Phantom of the Opera, and Mamma Mia! declined in popularity between the two time periods, while newer shows like Wicked (2003) and Jersey Boys (2005) increased in popularity. Finally, the attendance values are displayed in millions, which makes interpreting the raw numbers from the graphs much more straightforward.
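For readers who want to try the three fixes together, here is a minimal ggplot2 sketch. It is not the code behind the graphs in this post: the data are invented, the show names are placeholders, and the column names are assumptions. It uses coord_flip() for the rotation, reorder() for the ordering, and division by one million for the axis units.

```r
library(ggplot2)

# Toy totals with invented numbers; two rows per show, one for each period.
totals <- data.frame(
  show = rep(c("Show A", "Show B", "Show C"), each = 2),
  period = rep(c("2000-2008", "2009-2016"), times = 3),
  attendance = c(1.0e6, 2.0e6, 5.0e6, 4.0e6, 2.5e6, 2.5e6)
)

# Sort levels by total attendance. After coord_flip() the last level is
# drawn at the top, so ascending order puts the largest show first.
totals$show <- reorder(totals$show, totals$attendance, FUN = sum)

p <- ggplot(totals, aes(x = show, y = attendance / 1e6, fill = period)) +
  geom_col(position = "dodge") +  # position = "stack" gives the stacked version
  coord_flip() +                  # rotate so the show titles read horizontally
  labs(x = NULL, y = "Total attendance (millions)", fill = "Period")

p
```

Dividing inside aes() keeps the original column untouched and changes only what the axis shows, which is why the axis label states the unit explicitly.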
