The following post is by Ryan Larson ’14, a senior sociology major at Concordia College. He loves sports of all kinds, plays jazz sax, and will begin a graduate program in sociology in the fall.
With the NCAA’s Men’s Basketball Tournament starting today, the media are alight with predictions as to who will cut down the nets on April 7th. The annual phenomenon of penciling in the winners in tens of millions of brackets has a new twist this year: a billion-dollar prize. The grand prize is being offered by Quicken Loans, the Detroit mortgage lender, with the backing of Warren E. Buffett, to anyone who fills out a perfect 2014 tournament bracket. The prize money will be paid out in 40 annual payments of $25 million, or as a one-time lump sum of $500 million. But how likely is a perfect bracket to surface?
In all likelihood, it won’t. No record of a perfect bracket has surfaced to date, and the advent of Internet-based bracket filling makes this much easier to track. For example, in the 16 years of the ESPN online bracket challenge, not one bracket has been perfect (this also holds for the other Internet-based hosts). Jeff Berge, Professor of Mathematics at DePaul University, says the odds of picking a perfect bracket randomly are 1 in 9,223,372,036,854,775,808 (the probability of getting 63 out of 63 right is the product of the probabilities of getting each one right, which for a coin flip is 50 percent). If everyone on earth filled out 100 brackets, it would theoretically take 13 million years to get a perfect bracket. In sum, the prediction worth putting much credence in is that Buffett won’t have to part with his billion.
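The coin-flip arithmetic above can be checked in a few lines. This is only a sketch: the world population of 7 billion and the reading of "100 brackets" as 100 brackets per person per year are assumptions added here to reproduce the 13-million-year figure, not details given in the quote.

```python
GAMES = 63  # games in a 64-team single-elimination bracket

# With two possible outcomes per game, there are 2^63 distinct brackets,
# so a random guess is perfect with probability (1/2)^63.
total_brackets = 2 ** GAMES
p_perfect = 0.5 ** GAMES

# Assumptions (not from the quote): about 7 billion people,
# each filling out 100 distinct brackets every year.
WORLD_POPULATION = 7_000_000_000
BRACKETS_PER_PERSON_PER_YEAR = 100

brackets_per_year = WORLD_POPULATION * BRACKETS_PER_PERSON_PER_YEAR
years_to_cover_all = total_brackets / brackets_per_year

print(f"Possible brackets: {total_brackets:,}")
print(f"Chance of a perfect random bracket: {p_perfect:.3e}")
print(f"Years to cover every bracket: {years_to_cover_all:,.0f}")
```

Under those assumptions, the number of years comes out on the order of 13 million, in line with the figure quoted above.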
However, not all NCAA March Madness contests are a 50/50 coin flip. A No. 1 seed has never lost to a No. 16 seed, which makes these games easier to predict correctly than the Final Four contests. Incorporating just this kind of information, University of Minnesota Professor of Biostatistics Brad Carlin put the odds at more like “1 in 128 billion.” This estimate is based solely on the probabilities of correct predictions in each round: the probability of calling a first-round game correctly ranges from 51 percent for the No. 8 vs. No. 9 game to 100 percent for the No. 1 vs. No. 16 game, and second-round games can be called with 65 percent accuracy. The figures are 60 percent for Sweet Sixteen games and 50 percent for every game from the Elite Eight through the final. To put this in perspective, your odds of being killed by a vending machine are higher than your odds of picking a perfect bracket, even with these conditions incorporated.
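The round-by-round estimate is again just a product of per-game probabilities. The sketch below uses the figures quoted above for the endpoints and later rounds. The accuracies for the first-round matchups between No. 2 vs. No. 15 and No. 7 vs. No. 10 are not given, so they are filled in by linear interpolation, which is an assumption made here and not necessarily what Carlin used.

```python
import math

REGIONS = 4

# First round: 8 matchups per region, from No. 1 vs. No. 16 (100 percent)
# down to No. 8 vs. No. 9 (51 percent). The six values in between are
# ASSUMED to fall on a straight line; the source gives only the endpoints.
first_round = [1.00 - i * (1.00 - 0.51) / 7 for i in range(8)]

p_first = math.prod(first_round) ** REGIONS  # 32 games
p_second = 0.65 ** 16                        # 16 second-round games
p_sweet16 = 0.60 ** 8                        # 8 Sweet Sixteen games
p_late = 0.50 ** 7                           # Elite Eight (4) + Final Four (2) + final (1)

p_perfect = p_first * p_second * p_sweet16 * p_late
odds_against = 1 / p_perfect

print(f"Chance of a perfect bracket: {p_perfect:.3e}")
print(f"Roughly 1 in {odds_against:,.0f}")
```

With the interpolated first-round values, the result lands in the neighborhood of the quoted 1 in 128 billion. Different assumptions for the middle seeds would shift it.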
All hope is not lost (although it’s pretty close to it). Applying statistical modeling techniques to historical tournament data can help increase your chances of picking games correctly (albeit only modestly). Arguably the most popular model is that of former New York Times, now ESPN, prognosticator Nate Silver. Silver and his team at FiveThirtyEight are in their fourth year of building a model to correctly pick the winners of the March Madness contests. The model is primarily based (weighted at 5/7 of the model) on a composite of computer college basketball rankings. These computer-based rankings are combined with two human-based metrics (2/7 of the model): the NCAA selection committee’s S-Curve and preseason rankings from the Associated Press and the coaches (used as an indicator for “underlying player and coaching talent”). Additionally, Silver and his team adjust for injuries and player suspensions (using a statistic called win shares) and for travel distance. Silver then simulates the tournament thousands of times to obtain predicted probabilities of each team advancing in each round (an interactive graphic with the final model can be found here).
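To make the "weight the rankings, then simulate" idea concrete, here is a minimal sketch for a toy 8-team bracket. Only the 5/7 and 2/7 weights come from the description above. The team names, the ratings, the logistic win-probability curve and its `scale` parameter are hypothetical stand-ins, and the injury and travel adjustments are left out. It is not Silver's actual model.

```python
import random

def composite(computer, human):
    # Weights from the description above: 5/7 computer rankings, 2/7 human rankings.
    return (5 * computer + 2 * human) / 7

def win_prob(rating_a, rating_b, scale=10.0):
    # ASSUMPTION: a logistic curve turns a rating gap into a win probability.
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / scale))

def simulate(teams, ratings, n_sims=10_000, seed=0):
    """Return, for each team, the share of simulations in which it won each round."""
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1  # 8 teams -> 3 rounds
    wins = {t: [0] * n_rounds for t in teams}
    for _ in range(n_sims):
        alive = list(teams)  # bracket order: adjacent teams meet
        for rnd in range(n_rounds):
            winners = []
            for a, b in zip(alive[0::2], alive[1::2]):
                w = a if rng.random() < win_prob(ratings[a], ratings[b]) else b
                wins[w][rnd] += 1
                winners.append(w)
            alive = winners
    return {t: [c / n_sims for c in counts] for t, counts in wins.items()}

# Hypothetical (computer rating, human rating) pairs, listed in bracket order.
inputs = {
    "A": (95, 94), "H": (78, 80), "D": (88, 86), "E": (86, 87),
    "C": (90, 89), "F": (84, 83), "B": (91, 92), "G": (81, 82),
}
teams = list(inputs)
ratings = {t: composite(c, h) for t, (c, h) in inputs.items()}
probs = simulate(teams, ratings)

for t in teams:
    print(t, [f"{p:.2f}" for p in probs[t]])
```

Each row gives a team's estimated chance of winning its first game, its second, and the title. Even the strongest team in a sketch like this wins the title in only a fraction of the simulated runs.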
What other factors influence a win probability? Other research has backed up Silver’s notion that rankings matter, and that season performance (wins, particularly away wins, and offensive scoring) and historical team performance (Final Four appearances, championships) can also lend some predictive insight. Ken Pomeroy’s predictive rankings are also very popular (and are incorporated into Silver’s model), although details of his methods are hidden behind a paywall. His models highlight strength of schedule as an important factor in the equation. Additionally, ESPN’s Basketball Power Index (BPI), created by Alok Pattani and Dean Oliver, accounts for the final score, pace of play, site, strength of opponent and absence of key players in every Division I men’s game (a new addition to Silver’s model this year). However, including these metrics in a regression equation rarely gets you more predictive prowess than a coin toss (R² = .5).
Although modeling could help you gain valuable insight for your office bracket pool, it will not lead to a perfect bracket without a large amount of luck coming your way. Although sports do have a large amount of systematic variation, the good amount of random variation is what makes prediction difficult and athletic contests beloved. When filling out your brackets this year, data-driven analysis should give you a leg up you wouldn’t have had otherwise. Listen to what the fox has to say. (For further reading: predictive analytics are also used to predict which teams will be selected to the tournament on Selection Sunday, with surprising accuracy.)