Lester Primus, Taylor Madison, Julian White
In the spirit of March, the best application of statistics is to analyze the famed NCAA College Basketball tournament, also known as “March Madness”. Every year thousands of people of all ages rush to their favorite sports website to fill out a bracket in hopes that they could be the one to correctly predict the outcome of every game in the tournament. Though no one is known to have predicted 100% of the tournament’s games correctly, there have been reports of people guessing the entire first round correctly. The current field is 68 teams, an increase made in 2011 from 65, which was itself an increase from the 64 teams of 1985. Because each game eliminates exactly one team, there are 67 total games played today, and a team must win at least six games to be crowned champion.
The possible combinations of surviving teams each year seem almost endless, especially to those doing their best to win competitions with their friends. Teams are ranked, or “seeded”, based on several factors from their season. These factors include their win-loss record, their performance in their conference tournament, their strength of schedule, the number of ranked teams they defeated, and the number of unranked teams that beat them. The teams seeded number one in each region are the top four teams overall, the number two seeds are teams five through eight overall, and so on. At first glance, therefore, it is intuitive to predict that the higher-seeded team will always win its game. History has repeatedly proven otherwise. The victory of a lower seed over a higher seed is known as an “upset”. Lower-seeded teams that pull off multiple upsets to remain in the tournament are known as “Cinderella” teams. We hope to use statistics to answer the many questions of the common sports fan. We will analyze the relationship between seed and likelihood of victory, as well as the potential of teams of one seed to upset teams of another.
First, we will determine the relationship, if any, between a team’s seed and the number of games it wins in the tournament. A linear relationship would show that the teams expected to win, that is, the higher seeds, do in fact win more games. This could easily be framed with confidence intervals and hypothesis tests. If one were to pick the winner of each game purely based on seeding, how often would that person be correct, and at what significance level? This is done by finding the average number of games a higher seed tends to win each year and comparing it to the average number of games a lower seed wins. Because there are sixteen different seeds, several seed matchups can be compared.
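As a rough illustration of how the seed-versus-wins relationship could be measured, the sketch below fits a least-squares line and computes Pearson's correlation coefficient. The function name `seed_win_trend` and its inputs are assumptions made for this example; no tournament data is included, and the reader would supply one (seed, wins) pair per team.

```python
from math import sqrt


def seed_win_trend(seeds, wins):
    """Fit wins = intercept + slope * seed by least squares.

    Returns (slope, intercept, r), where r is Pearson's correlation.
    Hypothetical helper for illustration; inputs are parallel sequences
    with one entry per team.
    """
    n = len(seeds)
    if n != len(wins) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_s = sum(seeds) / n
    mean_w = sum(wins) / n

    # Sums of squared deviations and of cross-products.
    sxx = sum((s - mean_s) ** 2 for s in seeds)
    syy = sum((w - mean_w) ** 2 for w in wins)
    sxy = sum((s - mean_s) * (w - mean_w) for s, w in zip(seeds, wins))
    if sxx == 0 or syy == 0:
        raise ValueError("seeds and wins must each vary")

    slope = sxy / sxx
    intercept = mean_w - slope * mean_s
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r
```

A negative slope with r close to -1 would suggest that better (numerically lower) seeds win more games in a nearly linear fashion, while r near 0 would suggest that seed tells us little.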
A second question is whether lower seeds are motivated by the “upset factor” to win their games. For example, out of the 32 first-round games, would an upset occur in at least half of them, showing that lower seeds are more likely to win overall? The null hypothesis is that the mean number of first-round upsets equals 16, while the alternative hypothesis is that it is less than 16.
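One possible way to test these hypotheses for a single year is an exact binomial test. This is our assumption for the sketch, not a method fixed above: if every first-round game were a coin flip (upset probability 0.5), the expected number of upsets in 32 games would be 16, and the left-tail probability of seeing a count at least as small as the one observed serves as the p-value. The function name `upset_p_value` is hypothetical.

```python
from math import comb


def upset_p_value(upsets, games=32, p0=0.5):
    """Left-tailed exact binomial p-value: P(X <= upsets) under H0.

    H0: each game is an upset with probability p0 (mean of games * p0 upsets).
    Ha: the upset probability is lower than p0.
    """
    if not 0 <= upsets <= games:
        raise ValueError("upsets must be between 0 and games")
    return sum(
        comb(games, k) * p0 ** k * (1 - p0) ** (games - k)
        for k in range(upsets + 1)
    )
```

A small p-value (for example, below 0.05) would lead us to reject the null hypothesis in favor of fewer than 16 upsets on average. This treats the games as independent and equally likely to produce an upset, which is a simplification.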
The number one and two seeds are considered to be the dominant teams of the tournament and are almost never picked to lose in the first round. At the time of writing, a number one seed has never lost a first-round game since the field expanded to 64 teams in 1985. A final question is the likelihood of a number one or two seed making the Final Four (the semifinals of the tournament). This probability can be estimated by treating each year as a sample and counting the number one seeds that reach the Final Four that year. A confidence interval can then be built from the sample mean and standard deviation of these yearly counts.
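The confidence interval described above could be computed along the following lines. This is a minimal sketch assuming a t-based interval, mean ± t · s / √n; the function name `mean_confidence_interval` is hypothetical, and the critical value `t_crit` must be looked up by the reader in a t-table for n − 1 degrees of freedom at the chosen confidence level.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(yearly_counts, t_crit):
    """Return (lower, upper) for the mean of yearly_counts.

    yearly_counts: one value per year, e.g. how many number one seeds
                   reached the Final Four that year (supplied by the reader).
    t_crit:        critical t value for n - 1 degrees of freedom,
                   taken from a t-table (an input, not computed here).
    """
    n = len(yearly_counts)
    if n < 2:
        raise ValueError("need at least two years of data")

    centre = mean(yearly_counts)
    # stdev() is the sample standard deviation (divides by n - 1).
    margin = t_crit * stdev(yearly_counts) / sqrt(n)
    return centre - margin, centre + margin
```

Dividing the endpoints by four (the number of one seeds each year) would turn the interval for the yearly count into a rough interval for the proportion of one seeds that reach the Final Four.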