Making Use of Color!

In the first part of this blog post we will be making use of color to distinguish between different time series graphs that are plotted on the same set of axes.

We are interested in how the average audience capacity for the Broadway show Beauty and the Beast changes over several years of the show's run. The data set can be obtained here: https://think.cs.vt.edu/corgis/csv/broadway/broadway.html. We subset the data to the years 1996-1998 and keep only the show of interest. If we plot the mean capacity as a function of the week, we get the following plot:
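The subsetting and averaging step can be sketched in a few lines. This is only an illustration in plain Python, not the ggplot workflow used for the plots in this post, and the field names (show, year, week, capacity) and the sample rows are made up for the example rather than taken from the real CSV headers.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows; the keys are placeholder names, not the real CSV headers.
records = [
    {"show": "Beauty and the Beast", "year": 1996, "week": 1, "capacity": 98},
    {"show": "Beauty and the Beast", "year": 1996, "week": 1, "capacity": 94},
    {"show": "Beauty and the Beast", "year": 1997, "week": 1, "capacity": 90},
    {"show": "Cats", "year": 1997, "week": 1, "capacity": 70},
    {"show": "Beauty and the Beast", "year": 1999, "week": 1, "capacity": 85},
]


def mean_capacity_by_week(rows, show, first_year, last_year):
    """Return {(year, week): mean capacity} for one show within a year range."""
    groups = defaultdict(list)
    for row in rows:
        # Keep only the show of interest and the requested years.
        if row["show"] == show and first_year <= row["year"] <= last_year:
            groups[(row["year"], row["week"])].append(row["capacity"])
    # Average the capacities collected for each (year, week) pair.
    return {key: mean(values) for key, values in sorted(groups.items())}


weekly_means = mean_capacity_by_week(records, "Beauty and the Beast", 1996, 1998)
```

Keying the result by (year, week) keeps one series per year, which is what lets each year be drawn in its own color on a shared week axis.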

Now we can see that, for the most part, the mean capacity follows the same trend across all three years. A notable exception is 1997, which shows a much sharper dip in average attendance around week 36 than the other two years in the study. Perhaps this can be explained by the funeral of Diana, Princess of Wales, which was held during that week: people may have wanted either to attend the funeral or to watch it on television. We can also note that these time series are heavily intertwined, so it is not easy to determine an overall trend for each individual series. We can somewhat alleviate this difficulty by adding a smoothing curve to the plot. We will use the loess smoothing curve discussed in the previous blog post.
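To make the idea of a loess smoother concrete, here is a minimal sketch of local linear regression with tricube weights. It is a simplified stand-in written in plain Python, not the smoother used for the plots in this post, and the span value and function name are our own choices for illustration.

```python
def loess(xs, ys, span=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(xs)
    k = max(2, int(round(span * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in xs:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        # Tricube weights: close points count most, points beyond h count zero.
        weights = []
        for x in xs:
            u = abs(x - x0) / h
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        # Weighted least-squares line through the neighbourhood of x0.
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up weekly values with one sharp dip, just to exercise the function.
weeks = list(range(1, 11))
capacity = [95, 94, 96, 93, 70, 92, 95, 94, 96, 95]
smoothed = loess(weeks, capacity, span=0.6)
```

A larger span averages over more weeks and gives a flatter curve, while a smaller span follows the raw series more closely, so the choice of span controls how much of a dip like the one in week 36 survives the smoothing.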

With the loess smoothers included, we get a much clearer idea of the individual trend in each time series. The red smoother, which corresponds to 1996, lies above the other two for most of the year, so 1996 saw better attendance on average than 1997 or 1998.

In the second part of the blog post we will use color to create a contour plot. Consider the contour plot with the Spectral color scheme as given in the code:

This contour plot uses the Spectral color scheme, which relies on several different hues and shades of those hues to distinguish the zones of the plot. The boundaries between zones are easy to see, but because the hues have no natural order, it takes effort to work out which zones encode larger or smaller quantities. By using the “YlOrRd” palette within our ggplot code we can create a graph that solves this problem:

We can see that this color scheme drifts from shades of yellow through orange to red. This progression provides an ordering that we can easily follow without having to refer back to the legend constantly. We can further see that, even though we are using fewer colors, there are still clear boundaries between zones, so this graph keeps the advantage of the previous plot while correcting its drawback. Thus, the selected color scheme is better than the original.
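One rough way to check this ordering argument is to compare how light each color in a palette is. The sketch below is plain Python rather than ggplot code. The hex values are the five-class ColorBrewer palettes as we recall them, so treat them as assumptions, and the luma formula is a simple approximation of perceived lightness.

```python
# Five-class ColorBrewer palettes (hex values quoted from memory; an assumption).
YLORRD = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]
SPECTRAL = ["#d7191c", "#fdae61", "#ffffbf", "#abdda4", "#2b83ba"]


def luma(hex_color):
    """Approximate lightness of a '#rrggbb' color using Rec. 709 weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_monotonic(values):
    """True if the values strictly increase or strictly decrease."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


# A sequential palette should get steadily darker (or lighter) along its length.
ylorrd_ordered = is_monotonic([luma(c) for c in YLORRD])
spectral_ordered = is_monotonic([luma(c) for c in SPECTRAL])
```

With these values, the YlOrRd colors get steadily darker from one end to the other, while the Spectral colors get lighter toward the middle and darker again, which matches the impression that only the first palette carries an obvious order.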
