Daily Archives: September 28, 2018

Broadway show comparisons

Hello world! 

This is a typical Friday night in graduate school, and I am about to start my blog. I intend to compare Broadway shows across two time periods: 2000-2008 and 2009-2016.

Unfortunately, I am not that familiar with Broadway shows, so I cannot ramble about them as much as I would like.

How did I get the data?

The data and explanation can be found here.

Since I intend to compare the best performances, I could use any of the variables in the data set for the comparison. The easiest ones to chart seem to be Gross and Attendance, either of which works well in a bar chart. I choose Attendance as the criterion for best performance.
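To make the criterion concrete, here is a minimal sketch of how total attendance per show and period could be computed in base R. The data frame `toy` and all of its values are made up for illustration; only the column names Name, Year, Attendance and Period follow the ones used in the R code at the end of this post.

```r
# Hypothetical toy data; the real data set has many more rows and columns.
toy <- data.frame(
  Name       = c("Show A", "Show A", "Show A", "Show B", "Show B"),
  Year       = c(2005, 2007, 2012, 2010, 2015),
  Attendance = c(1000, 1500, 2000, 500, 700)
)

# Assign each row to one of the two periods.
toy$Period <- ifelse(toy$Year <= 2008, "2000-2008", "2009-2016")

# Sum attendance for every show within each period.
totals <- aggregate(Attendance ~ Period + Name, data = toy, FUN = sum)
print(totals)
```

Each row of `totals` is one show in one period, which is the unit the bar charts compare.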

Part 1:

First, I display a bar chart of total attendance by show name, grouped by the period in which the show ran. As explained above, there are two periods.

As we can see, there are far too many names on the horizontal axis, there is no title, and the bar plot is very cluttered. Fortunately, we can improve the plot a little.

Below is the edited graph. I have swapped the x and y axes, kept only the shows whose total attendance in a period is at least three million, and expressed attendance in millions for clarity.
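The filtering and rescaling steps can be sketched in base R as follows. The data frame `totals` and its numbers are hypothetical, chosen only to show the effect of the three-million threshold and the conversion to millions.

```r
# Hypothetical totals per show and period (made-up numbers).
totals <- data.frame(
  Name       = c("Show A", "Show A", "Show B", "Show C"),
  Period     = c("2000-2008", "2009-2016", "2009-2016", "2000-2008"),
  Attendance = c(4260000, 5120000, 2900000, 3000000)
)

# Keep only rows with at least three million attendees.
best <- totals[totals$Attendance >= 3e6, ]

# Express attendance in millions, rounded to one decimal place.
best$Attendance <- round(best$Attendance / 1e6, 1)
print(best)
```

Rounding after the filter keeps the threshold exact; rounding first could let a value just under three million slip through.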

Wicked comes out ahead in the later period, based on the criterion of success I defined above.

Part 2:

Next, I start with a deliberately bad stacked bar plot.

 

The show names are not easy to read, and there is no title.

Next, I improve this stacked bar plot a little. I add the attendance values on the bars to make comparisons easier.

 

 

 

R code:

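######################
library(ggplot2)
library(tidyverse)
library(lubridate)

# `broadway` is assumed to be already loaded as a data frame
d <- broadway
# Extract the year from the date column `Full` (parsed as month-day-year)
d$Year <- year(mdy(d$Full))

# Total attendance per show within each period
S <- d %>%
  filter(Year >= 2000, Year <= 2016) %>%
  mutate(Period = ifelse(Year <= 2008, "2000-2008", "2009-2016")) %>%
  group_by(Period, Name) %>%
  summarize(Attendance = sum(Attendance))
names(S)

# First, cluttered bar chart: every show on the horizontal axis, no title
ggplot(data = S, aes(x = Name, y = Attendance, colour = Period)) +
  geom_bar(stat = "identity")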

######################## Keep show/period totals of at least 3,000,000
S <- S[which(S$Attendance >= 3000000), ]

# Express attendance in millions, rounded to one decimal place
S$Attendance <- round(S$Attendance / 1e6, 1)
#######################

# Improved dodged bar chart: flipped axes, axis labels, and a centered title
ggplot(data = S, aes(x = Name, y = Attendance)) +
  geom_bar(aes(fill = Period), stat = "identity", position = "dodge") +
  coord_flip() +
  xlab("Show Name") + ylab("Total Attendance (millions)") +
  ggtitle("Best Broadway Shows, Total Attendance (in millions)\nPeriods: 2000-2008 / 2009-2016") +
  theme(
    plot.title = element_text(colour = "black", size = 16, hjust = 0.5)
  )
#################

# Deliberately bad stacked bar chart: no axis flip and no title
ggplot(data = S, aes(x = Name, y = Attendance)) +
  geom_bar(aes(fill = Period), stat = "identity") +
  xlab("Show Name") + ylab("Total Attendance (millions)")
#################

# Improved stacked bar chart: flipped axes, a title, and attendance labels
ggplot(data = S, aes(x = Name, y = Attendance)) +
  geom_bar(aes(fill = Period), stat = "identity") +
  coord_flip() +
  xlab("Show Name") + ylab("Total Attendance (millions)") +
  ggtitle("Best Broadway Shows, Total Attendance (in millions)\nPeriods: 2000-2008 / 2009-2016") +
  theme(
    plot.title = element_text(colour = "black", size = 16, hjust = 0.5)
  ) +
  # group = Period keeps each label stacked with its own bar segment
  geom_text(aes(label = Attendance, group = Period), size = 3,
            position = position_stack(vjust = 0.5))
#################