Hello world!
This is a typical Friday night in graduate school and I am about to start my blog. I intend to compare Broadway shows in two different time periods. The first time period is 2000-2008, and the second is 2009-2016.
Unfortunately, I am not that familiar with Broadway shows, so I cannot ramble about them as much as I would like.
How did I get the data?
The data and explanation can be found here.
Since I intend to compare the best-performing shows, I need a criterion of success, and any of the variables in the data set could serve. The easiest ones to plot as bar charts seem to be Gross and Attendance. I choose Attendance as the criterion of best performance.
Part 1:
First, I display a bar chart below in which I have plotted the show names against attendance, grouped by the period in which the show ran. Recall that there are two periods, as explained above.
As we can see, there are far too many names on the horizontal axis, there is no title, and the plot is very cluttered. Fortunately, we can improve the plot a little.
Below is the edited graph. I have switched the x and y axes. I only keep the shows whose attendance in a period is at least three million, and I express attendance in millions for clarity.
Wicked stands out in the later period, based on the criterion of success I defined above.
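As a minimal sketch of the filtering and rescaling behind the edited graph, here is the same idea on a toy data frame in base R. The show names and numbers in `toy` are made up for illustration and do not come from the real data set.

```r
# Hypothetical totals per show and period (made-up numbers, not the real data)
toy <- data.frame(
  Name = c("Show A", "Show A", "Show B", "Show C"),
  Period = c("2000-2008", "2009-2016", "2009-2016", "2000-2008"),
  Attendance = c(3200000, 4700000, 2900000, 3000000)
)

# Keep only the rows with at least three million attendees
top <- toy[toy$Attendance >= 3e6, ]

# Express attendance in millions, rounded to one decimal place
top$Attendance <- round(top$Attendance / 1e6, 1)

print(top)
```

In this toy example "Show B" falls below the threshold and is dropped, while the remaining rows keep their period labels so they can still be coloured by period in a bar chart.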
Part 2:
Next, I deliberately start with a bad stacked bar plot.
The show names are not easily read, and there is no title.
Next, I improve this stacked bar plot a little. I add the attendance values on the bars for ease of comparison.
R code:
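######################
library(ggplot2)
library(tidyverse)
library(lubridate)
# `broadway` is the data set described above, assumed to be already loaded
d <- broadway
# Extract the year from the date column (month/day/year format)
d$Year <- year(mdy(d$Full))
# Total attendance per show within each period
d %>% filter(Year >= 2000, Year <= 2016) %>%
  mutate(Period = ifelse(Year <= 2008, "2000-2008", "2009-2016")) %>%
  group_by(Period, Name) %>%
  summarize(Attendance = sum(Attendance)) -> S
names(S)
# First (cluttered) bar chart: show name versus attendance
ggplot(data=S, aes(x=Name, y=Attendance, colour = Period)) +
  geom_bar(stat="identity")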
######################## Keep shows with attendance of at least 3,000,000
S <- S[which(S$Attendance >= 3000000), ]
# Express attendance in millions, rounded to one decimal place
S$Attendance <- round(S$Attendance / 1e6, 1)
#######################
# Edited bar chart: bars side by side per period, axes flipped, title added
ggplot(data=S, aes(x=Name, y=Attendance)) +
  geom_bar(aes(fill=Period), stat="identity", position = "dodge") +
  coord_flip() +
  xlab("Show Name") + ylab("Total Attendance in million") +
  ggtitle("Best Broadway Shows, Total Attendance (in million)\nPeriods: 2000-2008 / 2009-2016") +
  theme(
    plot.title = element_text(
      colour = "black", size = 16, hjust = 0.5
    )
  )
#################
# Bad stacked bar plot: axes not flipped and no title
ggplot(data=S, aes(x=Name, y=Attendance)) +
  geom_bar(aes(fill=Period), stat="identity") +
  #coord_flip() +
  xlab("Show Name") + ylab("Total Attendance in million") +
  #ggtitle("Best Broadway Shows, Total Attendance (in million)\nPeriods: 2000-2008 / 2009-2016") +
  theme(
    plot.title = element_text(
      colour = "black", size = 16, hjust = 0.5
    )
  )
#################
# Improved stacked bar plot: axes flipped, title added, values printed on the bars
ggplot(data=S, aes(x=Name, y=Attendance)) +
  geom_bar(aes(fill=Period), stat="identity") +
  coord_flip() +
  xlab("Show Name") + ylab("Total Attendance in million") +
  ggtitle("Best Broadway Shows, Total Attendance (in million)\nPeriods: 2000-2008 / 2009-2016") +
  theme(
    plot.title = element_text(
      colour = "black", size = 16, hjust = 0.5
    )
  ) +
  # Centre each label inside its stacked segment
  geom_text(aes(label = Attendance), size = 3, position = position_stack(vjust = .5))
#################