The following is a guest post by Concordia College sociology major Ryan Larson ’14. After graduation, Ryan intends to pursue graduate study in sociology and criminology. He is also a huge hockey fan.
Hockey is back at the forefront of the national sports consciousness thanks to T.J. Oshie and his Olympic shootout heroics against host team Russia on Saturday morning. Many in the media have made claims about which country will earn the coveted title of world hockey dominance (via a gold medal, which isn’t actually solid gold). But to what extent are these claims mere speculation?
The Claims
Baseball has long been the hallmark sport for analytics, thanks to its large sample sizes (162-game seasons) and relatively independent events (for a more thorough discussion, I highly recommend Nate Silver’s The Signal and the Noise, Ch. 3). More recently, analytics has moved into ice hockey, an effort spearheaded by Rob Vollman. Not surprisingly, he has made one of the only predictions about who will take home the gold that is backed by any quantitative substance. Vollman makes the implicit assumption that having many NHL players (and good ones at that) is an indicator of Olympic success. This makes theoretical sense, as the NHL’s hegemonic dominance of the professional hockey market clearly attracts the world’s finest athletes. Jaideep Kanungo, in an aptly titled “Hockeynomics” article (following the scholarship of Simon Kuper and Stefan Szymanski’s Soccernomics), claims that a country’s population (a higher likelihood of producing elite talent), gross domestic product (more resources to support player development, such as indoor ice and equipment), and experience (a proxy for country support) may give clues to a team’s success in Sochi.
The Data
To evaluate these claims, I channeled my inner Nate Silver and constructed a dataset of the Olympic men’s hockey teams from 1998 to 2010 (before 1998, NHL players were not allowed to participate). I coded each team’s aggregate NHL games played, goaltender games played, goals, assists, points, and all-stars. I then supplemented the NHL data with each country’s GDP per capita, population, and IIHF World Ranking in the respective competition year (the IIHF World Ranking was instituted in 2003, so I manually calculated each country’s ranking for the 1998 and 2002 Games). The IIHF ranking serves as an indicator of success in international competition. I also coded whether a team won gold, and whether it won any medal irrespective of its elemental composition. As one might expect, the NHL measures are all highly correlated (Table 1). Therefore, in each analysis I used the NHL metric most highly correlated with the respective dependent variable (specifically, NHL games played for winning a medal and all-stars for winning gold). For the stats geeks out there, I use a multilevel random-effects probit model structured hierarchically by year. Probit regression models the probability of an outcome (here, winning any medal and winning a gold medal). This model accounts for the non-independence of outcomes among teams in the same Olympic year: when three teams medal (or one team wins gold), all the others do not. While these analyses have very few cases (n=52), the dataset is the population of all relevant teams and years (making statistical significance irrelevant).
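For readers less familiar with probit regression, the minimal sketch below shows how a random-intercept probit turns predictors into a probability: an intercept, the predictors times their coefficients, and an effect shared by every team in the same Olympic year are summed and passed through the standard normal CDF. The coefficient values, the predictor value, and the `year_effect` argument are hypothetical placeholders rather than this analysis’s estimates, and fitting the model (which integrates the year effect out of the likelihood) is not shown.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def random_intercept_probit(x: list[float], beta: list[float],
                            intercept: float, year_effect: float = 0.0) -> float:
    """P(y = 1 | x, u_year) = Phi(intercept + x . beta + u_year).

    `year_effect` (u_year) is shared by every team in the same Olympics;
    in a fitted model it would be a normally distributed random intercept.
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    linear_predictor = intercept + sum(xi * bi for xi, bi in zip(x, beta)) + year_effect
    return std_normal_cdf(linear_predictor)


# Hypothetical coefficients and predictor value, for illustration only.
beta = [0.05]     # made-up effect per unit of an NHL metric
intercept = -1.5  # made-up intercept
for year_effect in (-0.2, 0.0, 0.2):
    p = random_intercept_probit([20.0], beta, intercept, year_effect)
    print(f"u_year={year_effect:+.1f}: P(medal)={p:.3f}")
```

A positive `year_effect` raises the probability of every team in that year together, which is how a multilevel structure links teams that played in the same tournament.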
The Model
Table 2 shows each predictor’s effect on a team’s probability of success at the Winter Olympics.
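Table 2’s entries read as changes in probability. One common way to obtain such numbers from a probit, assumed here because the post does not say which method produced Table 2, is the average discrete change: raise one predictor by one unit for every team, recompute each team’s probability, and average the differences. The coefficients and team rows below are hypothetical.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def average_discrete_change(rows: list[list[float]], beta: list[float],
                            intercept: float, column: int,
                            delta: float = 1.0) -> float:
    """Average change in Phi(intercept + x . beta) when predictor `column` rises by `delta`.

    Each team's probability is computed at its observed values and again with
    one predictor bumped; the per-team differences are then averaged.
    """
    total = 0.0
    for x in rows:
        base = intercept + sum(xi * bi for xi, bi in zip(x, beta))
        bumped = base + beta[column] * delta  # only x[column] changes
        total += std_normal_cdf(bumped) - std_normal_cdf(base)
    return total / len(rows)


# Hypothetical teams described by one predictor (e.g. NHL players on the roster).
rows = [[3.0], [10.0], [18.0], [22.0]]
effect = average_discrete_change(rows, beta=[0.1], intercept=-2.0, column=0)
print(f"average change in P(medal) per extra unit: {effect:.3f}")
```

Under this reading, an average of 0.129 would be reported as a 12.9 percentage-point increase in the probability of winning a medal.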
Looking at Table 2, we can glean three major insights on what best predicts Olympic hockey success:
1. NHL measures are relatively good predictors of Olympic team success. Adding one NHL player increases a team’s probability of winning a medal by 12.9 percentage points, and adding one all-star increases a country’s probability of winning gold by 13.4 percentage points. The NHL measures outperformed the other predictors, accounting by themselves for about a third of the variation in medal and gold medal wins. This finding supports the notion that having players with experience in the best league on the planet is crucial for Olympic success. These effects are particularly impressive considering the small size of the population and the fact that these models predict relatively rare events.
2. IIHF World Ranking points, GDP per capita, and population prove to be relatively poor predictors of Olympic medals. Compared to the NHL metrics, the other factors in the model were not as predictive. The only measure associated with any substantial change in probability was population size in the gold model, and it decreased the probability of winning gold! This finding is most likely a statistical artifact of the small sample, as only 4 gold medalists were included in the analysis. One possible explanation is the strong cultural support for hockey (which is outside the scope of these data) in countries with relatively small populations that tend to fare well in the tournament (Czech Republic, Sweden). The same explanation most likely holds for GDP per capita, and a bigger sample may show positive effects. For these theoretical reasons, population and GDP per capita were not included in the final gold model (which brings its pseudo R2 to .25).
3. NHL all-stars drive gold medal wins. Olympic play consists of preliminary round robins followed by a single-elimination bracket. Among the NHL metrics, the number of NHL players on a country’s roster best predicts whether a team finishes the tournament on the podium. However, when predicting the rare event of a gold medal, all-stars take the predictive lead. In other words, when only 4 teams remain in the bracket (most likely stacked with NHL players), the team with the most all-stars has the greatest probability of taking home the title of world champion.
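The post summarizes model fit with a pseudo R2 (about a third for the NHL measures alone, .25 for the final gold model). It does not say which pseudo R2 was computed; this sketch assumes McFadden’s version, a common choice for probit and logit models. It compares the model’s log-likelihood with that of an intercept-only model, so “share of variation explained” is a loose rather than literal reading. The outcomes and fitted probabilities are made up.

```python
import math


def bernoulli_log_likelihood(outcomes: list[int], probs: list[float]) -> float:
    """Sum of log P(observed outcome) for binary outcomes (1 = won, 0 = did not)."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(outcomes, probs))


def mcfadden_pseudo_r2(outcomes: list[int], fitted: list[float]) -> float:
    """McFadden's pseudo R2: 1 - LL(model) / LL(intercept-only model).

    The intercept-only model predicts the same base rate for every team.
    """
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate in (0.0, 1.0):
        raise ValueError("outcomes must contain both wins and non-wins")
    ll_null = bernoulli_log_likelihood(outcomes, [base_rate] * len(outcomes))
    ll_model = bernoulli_log_likelihood(outcomes, fitted)
    return 1.0 - ll_model / ll_null


# Hypothetical tournament: 1 gold medalist among 4 teams, made-up fitted probabilities.
outcomes = [1, 0, 0, 0]
fitted = [0.6, 0.2, 0.1, 0.1]
print(f"pseudo R2 = {mcfadden_pseudo_r2(outcomes, fitted):.3f}")
```

A model that simply predicts the base rate for every team scores 0, and the score approaches 1 as fitted probabilities approach the observed outcomes.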
In sum, the models support Vollman’s notion that NHL players matter, and that having very good players (all-stars) is key to winning gold. The effects of GDP per capita, IIHF ranking, and population, by contrast, were relatively weak. To fully investigate these factors, however, a larger sample would be ideal (which may soon become impossible).
Predicting Sochi 2014
Using the above models, I entered the 2014 Olympic teams’ data into the equations (excluding GDP per capita and population from the gold model, for the reasons discussed above). Table 3 reports each team’s probability of winning any medal and of taking home the gold in Sochi. As the pseudo R2 values in Table 2 show, these models do not account for the majority of the variation, but model fits of .329 (medal) and .25 (gold) are far from nothing. Despite the small historical sample and the difficulty of picking winners among the 12 best international squads, the included predictors should give us a better idea of who will “bring home some hardware” in Sochi than the speculation rampant in the media.
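Producing a table like Table 3 amounts to scoring each 2014 roster with the fitted equations. One wrinkle in a random-intercept model is that the 2014 year effect is unknown. This sketch assumes a normal year effect with standard deviation `sigma_u` and averages over it using the identity E[Phi(a + u)] = Phi(a / sqrt(1 + sigma_u^2)) for u ~ N(0, sigma_u^2), then ranks the teams. The coefficients, `sigma_u`, and the teams are hypothetical placeholders, not the post’s estimates or real rosters. The post does not say whether the year effect was averaged over or set to zero; the ranking is the same either way, because both versions increase monotonically with the linear predictor.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict_new_year(teams: dict[str, list[float]], beta: list[float],
                     intercept: float, sigma_u: float = 0.0) -> list[tuple[str, float]]:
    """Return (team, probability) pairs sorted from most to least likely.

    For a year outside the fitted data, the year effect u is unknown; averaging
    Phi(xb + u) over u ~ N(0, sigma_u^2) gives Phi(xb / sqrt(1 + sigma_u^2)).
    """
    scale = math.sqrt(1.0 + sigma_u ** 2)
    scored = []
    for name, x in teams.items():
        xb = intercept + sum(xi * bi for xi, bi in zip(x, beta))
        scored.append((name, std_normal_cdf(xb / scale)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical inputs: one predictor (NHL players on roster) for four made-up teams.
teams = {"Team A": [22.0], "Team B": [18.0], "Team C": [9.0], "Team D": [2.0]}
ranking = predict_new_year(teams, beta=[0.12], intercept=-2.0, sigma_u=0.3)
predicted_medalists = [name for name, _ in ranking[:3]]
print(predicted_medalists)  # ['Team A', 'Team B', 'Team C']
```

Because each team is scored separately, the medal probabilities need not sum to three; the sketch simply takes the three highest as the predicted podium.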
Much to my chagrin given my love for the Yanks, my models predict that the medalists for the 2014 Winter Olympics are as follows: