View on GitHub


STA/ISS 313 - Project 1

Superbowl Ads

The Outliers


Every year millions of Americans tune into the most watched television broadcast in the United States — the Super Bowl. The Super Bowl is the NFL championship game, pitting the winners of the two major conferences, the AFC and the NFC, in a final showdown for all the glory. Due to the game’s high viewership, many brands use the occasion to advertise their products as a means to generate hype and exposure around their brands. In fact, Super Bowl commercials have been regarded as such a cultural phenomenon that many viewers only watch the games to see the beloved commercials. The invaluable exposure the Super Bowl brings to brands also comes with a hefty price as In 2021 alone the cost for a 30-second slot boasted an incredible price tag of $6 million.

Using a FiveThirtyEight dataset of 233 ads from 10 brands that aired the most spots since 2000 we decided to explore the themes that make Super Bowl ads so popular. The dataset includes categorical variables used to classify each ad as funny, show_product_quickly, patriotic, celebrity, danger, animals and/or use_sex, while also including variables that that we will use to explore popularity such as year, brand, view_count, like_count, dislike_count, favorite_count, and comment_count. We chose to use this dataset because we — like millions other Americans – love Super Bowl ads, and are interested in viewing trends over time.


We first explore the themes featured in Super Bowl ads and how they have changed over the last 21 years by leveraging the following variables: funny, show_product_quickly, patriotic, celebrity, danger, animals, use_sex, and year. Each variable (except for year) answers whether or not that specific characteristic was featured in the ad. For example, if an ad contains a celebrity appearance then there is a “TRUE” inputted for that brands’s commercial. Furthermore, a commercial can feature multiple characteristics such as being funny while using patriotism to raise awareness for said product. We are interested in seeing the changes in commercial advertising over time and predict that patriotic themes have become less prominent over time. Further, we hoped to identify which characteristics of advertisements were popular among certain brand categories.


Graph 1: Our first graph sought to explore the prevalence of ad characteristics over time. Originally, we hoped to show this data in a line graph with year on the x axis, count of ad characteristics on the y axis, and lines colored by ad characteristic. However, due to the number of ad categories it was difficult to discern the trends in popularity for each ad type and the graph appeared very crowded. We then decided to use animated bar chart to show the changing prevalence of certain ad features over time. This animated plot was useful to see how certain ad characteristics changed from year to year. After feedback, we decided to make this a racing bar chart in which the categories changed positions based on their prevalence. However, in both cases it was difficult to compare trends over the entire time frame (i.e. trends in 2000 Super Bowl commercials versus 2020 Super Bowl commercials). For this reason we have included an additional plot which captures 5 frames of the animated plot. This graph essential gives snapshots of the animated bar plot at different year intervals. We used the Spectral palette because it clearly distinguished the 7 characteristics by color.

Graph 2: For our second graph, we created a new variable brand_type, which grouped the 10 brands into 4 categories: Beverage, Car, NFL, and misc. All drink brands were placed in the “Beverage” category, car brands were placed in the Car category, Doritos and E-Trade were placed in the miscellaneous category, and NFL was placed in its own category. Then we grouped by brand_type and used the summarise function to create a new variable representing the count of each ad characteristic. We used pivot_longer to group all of the ad characteristics under a new variable called ad_type. The count of each ad characteristic was represented in the value variable. Finally, for each brand_type we calculated the percentage of each ad characteristic among ads for that brand_type. These percentages are represented in the stacked bar chart. This plot clearly identified which characteristics of advertisements were popular amongst certain brand categories.


Wrangling data

charactertistic_count <- superbowl_data %>%
  group_by(year) %>%
  summarise(Funny = sum(funny),
            Patriotic= sum(patriotic),
            Celebrity = sum(celebrity),
            Danger= sum(danger),
            Animal= sum(animals),
            "Product Quickly" = sum(show_product_quickly),
            Sexuality = sum(use_sex)) %>%
  pivot_longer(cols = c(Funny, Patriotic, Celebrity, "Product Quickly",
                        Danger, Animal, Sexuality), names_to = "type")
brand_category_data <- superbowl_data%>%
  mutate(brand_type = case_when(brand == "Toyota" ~ "Car",
                               brand == "Kia" ~ "Car",
                               brand == "Hynudai" ~ "Car",
                               brand == "Bud Light" ~ "Beverage",
                               brand == "Coca-Cola" ~ "Beverage",
                               brand == "Budweiser" ~ "Beverage",
                               brand == "Pepsi" ~ "Beverage",
                               brand == "Doritos" ~ "Misc.",
                               brand == "E-Trade" ~ "Misc.",
                               brand == "NFL" ~ "NFL",
                               TRUE ~ "Miscellaneous"))

brand_category_data <- brand_category_data %>%
  group_by(brand_type) %>%
  summarise(Funny = sum(funny),
            Patriotic= sum(patriotic),
            Celebrity = sum(celebrity),
            Danger= sum(danger),
            Animal= sum(animals),
            "Product Quickly" = sum(show_product_quickly),
            Sexuality = sum(use_sex)) %>%
  pivot_longer(cols = c(Funny, Patriotic, Celebrity, "Product Quickly",
                        Danger, Animal, Sexuality), names_to = "type")

brand_category_data<- brand_category_data %>%
  mutate(per=value/sum(value)) %>%
  mutate(labels = paste(per, "%"))

Visualization one

g <- ggplot(charactertistic_count, aes(type, value, fill  = type)) +
  geom_col(color = "black") + labs(title = 'Popularity of Ad Characteristics: {closest_state}',
                    x = 'Ad Type', y= " Ad Count") +
  theme_minimal() +
  coord_flip() +
  geom_text(x = 1000 , y = -10,  
            family = "Times",  
            aes(label = as.character(year)),  
            size = 35, col = "grey18")  +
  theme(axis.text.x=element_text(color = "black", size=10, angle=30,
                                 vjust=.8, hjust=0.8),
        legend.position = "none") +
                    transition_length = 0,
                    state_length = 2 ) +
  scale_fill_brewer(palette = "Spectral")

        nframes = 100,
        fps = 5,
        end_pause = 15)

characteristic_count_updated <- charactertistic_count %>%
  mutate(rank1 = rank(-value),
         Value_lbl = paste0(" ", value)) %>%
  ungroup() %>%
  group_by(year) %>%
  arrange(rank1, type) %>%
  mutate(rank = seq(1, n())) %>%

anim = ggplot(characteristic_count_updated, aes(rank, group = type, fill = type))+
  geom_tile(aes(y = value/2,
                height = value,
                width = 0.9), alpha = 0.8, color = NA) +
  scale_fill_viridis_d(option = "magma",  
                       direction = -1) +
  geom_text(aes(y = 0, label = paste(type, " ")), vjust = 0.2, hjust = 1, size = 5) +
  geom_text(aes(y= value,label = Value_lbl, hjust=0),size = 5 ) +
  coord_flip(clip = "off", expand = TRUE) +
  scale_x_reverse() +
  scale_y_continuous(breaks = seq(0, 14, by = 2)) +
  theme_minimal() +
        axis.title.y = element_blank(),
        axis.title.x = element_text(size=12, face="italic",color = "black"),
        panel.grid.major.x = element_line( size=.1, color="grey" ),
        panel.grid.minor.x = element_line( size=.1, color="grey" ),
        plot.title=element_text(size=16, hjust=0.5, face="bold",colour="black", vjust=-1),
        plot.subtitle=element_text(size=18, hjust=0.5, face="italic", color="red"),
        plot.caption =element_text(size=12, hjust=0.5, face="italic", color="red"),
        plot.margin = margin(1,4, 1, 8, "cm")) +
  transition_states(year, transition_length = 4, state_length = 1) +
  ease_aes('sine-in-out') +
  labs(title = 'Prevalence of Ad Characteristics: {closest_state}', y= "Ad Count")

animate(anim, nframes = 325,fps = 20)

ggplot(charactertistic_count %>%
         filter(year %in% c(2000, 2005, 2010,2015, 2020)),
       aes(type, value, fill  = type)) +
  geom_col(color = "black") +
  labs(title = 'Popularity of Superbowl Ad Characteristics Over Time',
       y= " Ad Count", fill = "Ad Characteristic") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        legend.position = c(0.85, 0.25),
        legend.key.size = unit(.5, 'cm'),
        legend.key.height = unit(.5, 'cm'), #change legend key height
        legend.key.width = unit(.5, 'cm'),
        legend.background = element_rect(size = 0.5, colour = 1),
        axis.title.y = element_text(angle = 90, vjust = 0.5)) +
  scale_fill_brewer(palette = "Spectral") +
  facet_wrap(~year, ncol = 3) +
  scale_y_continuous(breaks = seq(0, 12, by = 2))

Visualization two

ggplot(brand_category_data, aes(fill = type, y = per, x = brand_type)) +
    geom_bar(position = "stack", stat = "identity",
             color = "black") +
  scale_fill_brewer(palette = "Spectral") +
  theme_minimal() +
  labs(title = "Popularity of Ad Characteristics by Brand Type",
       x= "Brand Type",
       y = "Percentage") +
  scale_y_continuous(labels = percent) +
  geom_bar(position = "stack", stat = "identity", color = "black") +
  guides(fill = guide_legend(reverse = TRUE, nrow = 1))  +
  coord_flip() +  
  theme(legend.direction = "horizontal",
        legend.position = "bottom", = "horizontal",
        legend.title = element_blank(),
        legend.text = element_text(size =7),
        aspect.ratio = .4


From our first plot above we can see that there have been considerable changes to the content featured in Super Bowl ads. Primarily we can see that the use of sex has gone down considerably, especially since 2015. We believe this could be attributed to social movements such as the Me Too movement which has put considerable pressure on how sexuality and gender roles are portrayed in the media.

We can also see however, that there are ad categories that have consistently featured — or not featured — among the top brands. For example, in over 67% of years covered in our dataset, brands consistently relied on being funny and showing the product quickly to advertise their products. Intuitively this makes sense. Humor is a universal method to draw attention and showing the product you are advertising is critical in order to develop a relationship with the consumer . Alternatively, in over 70% of years covered in our dataset, patriotism was the last marketing resort used by brands to advertise their products. We were surprised to see this as our initial hypothesis was it would among the most featured categories; the Super Bowl prides itself as America’s most watched broadcast which emanates underlying patriotic ties.

Our second plot shows the breakdown of ad features by brand type. We can see that NFL owned ads feature celebrities at a much higher rate compared to other brand types. We believe this is true because NFL ads heavily feature prominent football players such as Tom Brady and Aaron Rodgers. The other 3 brand types have similar breakdowns of ad features with features funny and shows product quickly being the most prevalent. This validates the results from our previous plots as those features have consistently featured among top brands in our dataset.


At the heart of every Super Bowl advertisement lies a brand. Due the significance of branding in advertisements, we wanted to look at the brand variable’s relationship to other factors in our dataset. To get a general idea of which brands are well received by the audience, we decided to do an analysis with brand, like_count, and dislike_count to see which brands had the highest proportion of likes relative to total interactions (likes + dislikes). These were the most direct interactions in our dataset, and we felt that like_count and dislike_count provide the most information on whether or not the audience enjoys particular advertisements. However, we did not want to completely dismiss the other audience metrics such as view_count and comment_count. For our second analysis, we looked at the relationship between these two variables and brand to see if they provided any extra insight into the popularity of brands who make super bowl commercials.

We thought this would be an interesting question to answer so that next time we watch the Super Bowl, we know which brands are likely to have advertisements we will enjoy. Furthermore, we are all familiar with the brands included in the dataset and felt like the class could connect personally with the analysis since they too are likely to be familiar with the brands.


Graph 1: Originally, for our first graph, we planned to use a 100% stacked bar chart to compare the proportion of likes out of total reactions received by each brand for all commercials in the dataset. When we tried this, however, the visualization looked very crowded and did not look appealing. We instead decided that a lollipop plot of the same data was easier to interpret and more aesthetically appealing. In order to display the brands in a neat manner, we put them on the y-axis and the proportions on the x. The lollipop plot is effective in this case, as it clearly demonstrates which brands received a higher relative number of likes. This metric is important, as it gauges how audiences react to a video, giving us some idea of how well each of the brands is received by their respective audiences. In addition, this plot has a fairly reasonable data-to-ink ratio and the use of lines, as opposed to bars, creates a less crowded appearance.

Graph 2: For our second graph, we created a scatterplot using geom_point. While we personally believe that looking at the proportion of likes relative to total interactions is the best way to gauge the audience’s opinion on each brand’s commercials, we thought it would also be important to look at other measures of popularity, in this case, avg_total_views and avg_total_comments. These were both variables we generated by calculating the average total views and comments each brand received per commercial. We put avg_total_views on the x-axis and avg_total_comments on the y-axis, coloring by brand. Due to the large discrepancy in values of view and comment count, we decided to change the scales of the x-axis and y-axis to a log10 scale. This made it much easier to interpret the graph and spaced out the points in a way that made them easy to see. We believe this visualization is appropriate for our analysis, as it clearly demonstrates the which brands receive the most amount of views and comments per commercial. This graph is a nice complement to our first visualization and will be helpful to see how each brand is represented in each scenario.



q2_p1_data <- filter(superbowl_data, like_count != & dislike_count != %>%
  group_by(brand) %>%
  mutate(likes_total = sum(like_count),
         dislikes_total = sum(dislike_count),
         proportion_likes_total = likes_total / (dislikes_total + likes_total))
q2_p2_data <- filter(superbowl_data, comment_count != &
                         view_count != %>%
  group_by(brand) %>%
  summarise(avg_total_comments = mean(comment_count),
            avg_total_views = mean(view_count))

Visualization one

ggplot(q2_p1_data, aes(x = proportion_likes_total, y = brand)) +
    geom_segment(aes(y = reorder(brand, proportion_likes_total),
                     yend = reorder(brand, proportion_likes_total),
                     x = 0, xend = proportion_likes_total, color = brand),
                 show.legend = FALSE, size = 2) +
  geom_point(aes(color = brand), size = 3, show.legend = FALSE) +
  coord_cartesian(xlim = c(.75, 1)) +
  scale_color_brewer(palette = "Paired") +
  labs(title = "Proportion of likes received out of total video interactions,
       by brand", x = "Proportion of likes out of total interactions",
       y = "Brand", subtitle = "(Video interactions = likes + dislikes)") +
  theme_minimal() +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_text(size = 11, color = "black"),
        axis.text.x = element_text(size = 11, color = "black"),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 12))

Visualiztion two

ggplot(q2_p2_data, aes(x = avg_total_views, y = avg_total_comments)) +
  geom_point(aes(color = brand), show.legend = FALSE, size = 4) +
  geom_text_repel(aes(label = brand)) +
  scale_color_brewer(palette = "Paired") +
  scale_y_log10() +
  scale_x_log10(labels = unit_format(unit = "M", scale = 1e-6)) +
  labs(title = "Average Total Comments vs. Average Total Views, by brand",
       x = "Average Total Views", y = "Average Total Comments",
       caption = "Axes on log10 scale, labels are raw units") +


In graph one, it is clear that Kia, the NFL, Pepsi, and E-Trade Super Bowl commercials all received a high proportion of likes out of total reactions compared to other brands included in the dataset. In general, it appears that these brands are the most well received by the audience. On the other hand, Budweiser, Coca-Cola, and Doritos received the lowest proportion of likes and therefore seem to be less appealing to the audience. When looking at similar brands, it appears that the audience reacted very similarly to their commercials in some instances. For example, Toyota and Hyundai, both multinational car brands, received a very similar proportion of likes. Bud Light and Budweiser were roughly similar as well, though their proportions differed by a greater amount than Toyota and Hyundai. This trend was not true across all brands, however (i.e. Coca-Cola and Pepsi). Going forward, this analysis tells us we should pay attention to Super Bowl commercials by Kia, the NFL, Pepsi, and E-Trade, as they have historically been rated highly.

Graph two is a nice complement to graph one. Originally, we thought that the brands with the highest proportion of likes would also have many views, but this pattern was not true. In fact, there does not appear to be much of a relationship between what we found in the first graph and what the second graph shows us. Kia, the top-rated brand in the first visualization, actually had the lowest average view and comment values. The NFL, on the other hand, was the second highest rated in the first graph, but has the highest average number of comments, and the second highest average views on their commercials. Therefore, it appears that audience’s reactions to a commercial may not have much of a relationship to the number of views and comments it receives.

It is difficult to extract general relationships that are not brand-specific from these visualizations. If we had more observations and more brands in our dataset, it would likely help reveal overarching trends. However, one thing we did notice was that all the car companies were in the lower left portion of the second graph. One potential explanation of this is that cars are not a mass-consumer brand. Households typically only have one or two cars–if they have one at all. The goods in the top right of the second graph are all things that people consume in large quantities and on a consistent basis. Therefore, these items could have broader audiences and this could explain why these commercials receive a high number of likes and comments, on average.


Our presentation can be found here.


Thomas Mock (2021). Tidy Tuesday: A weekly data project aimed at the R ecosystem. SuperBowl Ads., viewed 13 September 2021.