About Dataset
Following the “Broadway shows” blog post, I chose one specific show for the time series plot and picked my favorite, “The Lion King”. I want to plot the mean Capacity as a function of week number, comparing several years.
Time series Plot
Note: Capacity is the percentage of the theatre that was filled during that week
The above plot shows the weekly mean capacity of “The Lion King” for each year from 2012 to 2016. I draw all five years on the same graph, using a different color for each. The lines follow broadly similar patterns, which indicates that attendance at “The Lion King” was stable over the five chosen years. This makes sense to me because the show has been one of the most popular on Broadway for a long time.
Still, two things stand out. The data for 2016 are incomplete: the records stop at the 33rd week. Also, there seems to be an unusual decrease in capacity between the 30th and the 40th week of 2015.
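A plot like the one described in this section could be produced roughly as follows. This is only a sketch: the column names (Show, Year, Week, Capacity) are assumptions about how the Broadway data is laid out, and the small data frame is made-up placeholder data so that the example runs on its own.

```r
library(ggplot2)

# Placeholder data, NOT the real Broadway records: two years, 52 weeks,
# two records per week, with made-up capacity values between 85 and 100.
# The column names are assumed for illustration.
set.seed(1)
shows <- expand.grid(Show = "The Lion King", Year = 2012:2013,
                     Week = 1:52, Record = 1:2)
shows$Capacity <- round(runif(nrow(shows), min = 85, max = 100), 1)

# Mean capacity per (Year, Week) for one chosen show.
weekly_mean_capacity <- function(df, show) {
  one_show <- df[df$Show == show, ]
  aggregate(Capacity ~ Year + Week, data = one_show, FUN = mean)
}

lion <- weekly_mean_capacity(shows, "The Lion King")

# One line per year; Year is turned into a factor so that each year
# gets its own discrete color.
p <- ggplot(lion, aes(x = Week, y = Capacity, colour = factor(Year))) +
  geom_line() +
  labs(x = "Week number", y = "Mean capacity (%)", colour = "Year")
print(p)
```

With the real data, an incomplete year such as 2016 would simply show up as a shorter line, because no rows exist for the missing weeks.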
Choice of Color in ggplot2
In the “sim.and.plot2” function, I simulate a sample of size 200 from a bivariate normal distribution with correlation rho = -0.9 and use a bivariate density estimation algorithm to construct a contour graph of the density estimate.
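The body of “sim.and.plot2” is not reproduced here, so the following is only a hypothetical stand-in that follows the same description. It assumes zero means and unit variances, draws the sample with mvrnorm() from the MASS package, and lets ggplot2's stat_density_2d() do the bivariate density estimation. The function name sim_and_plot_sketch and its palette argument are invented for this illustration.

```r
library(MASS)     # mvrnorm(): sampling from a multivariate normal
library(ggplot2)

# Hypothetical stand-in for sim.and.plot2(); not the original function.
sim_and_plot_sketch <- function(n = 200, rho = -0.9, palette = "Spectral") {
  # Covariance matrix with unit variances (an assumption) and correlation rho.
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  xy <- as.data.frame(mvrnorm(n, mu = c(0, 0), Sigma = sigma))
  names(xy) <- c("x", "y")

  # Filled contours of a 2D kernel density estimate; the fill color encodes
  # the contour level and is mapped through a ColorBrewer palette.
  plot <- ggplot(xy, aes(x = x, y = y)) +
    stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
    scale_fill_distiller(palette = palette)

  list(data = xy, plot = plot)
}

set.seed(42)
res <- sim_and_plot_sketch(palette = "Spectral")
print(res$plot)
```

Calling the sketch with palette = "Blues" instead gives a sequential palette for the same kind of sample, which is the comparison made in the two figures that follow.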
Palette: “Spectral” (Diverging palettes)
Palette: Sequential palettes
Comparing the two plots, and keeping in mind the shape of a bivariate normal distribution, I want to emphasize the region close to the mean, where the density is highest. Conversely, points far from the center should read as outlying. A sequential palette, which moves in a single direction from low to high density, is therefore more suitable here than the diverging “Spectral” palette.
Of course, if I wanted to highlight the middle range of values with light colors and both the low and high extremes with dark colors, then “Spectral” would be the better choice.