Correlation between SAT Scores, Graduation Rate, and SAT Participation Rates





The SAT has long been the standard by which universities and students measure readiness for college, and a large amount of data exists about both the test and the high school students who make up most of its takers. The SAT is normally integral to the college admissions process and is the bane of many a high school student. It serves as an indicator of scholastic achievement, and, although the share of students who take it varies, it gives an idea of how many students are considering college. A high school's SAT scores are taken as a measure of its students' readiness for college and can influence funding. However, the majority of private and public education studies focus on the entire teenage population and the ever-important graduation rate. This rate has been used as the benchmark of school performance in legislation and public perception, and it can elevate a school or force it into reorganization. The graduation rate is therefore arguably the most important statistic regarding our secondary education system, and translating it into enrollment in postsecondary education is a focus of legislative efforts. Combining these two facets offers a wide window into a segment of the population that is the focus of numerous social studies: the American teenager.

To find the SAT data go to  Afterwards click on The Data Library followed by Data Sets.  Once there look for the 14th link from the top titled “SAT Scores by State (1990-2004)” to download the Excel file of the data.  If you look at the download link directly below it titled “1984-1993 Teen Statistics” you will find the other data set we used.  The data collected on SAT statistics from 1990 through 2004 shows the participation rate, along with mean math and verbal scores, of students by state.  In addition, it has the national average scores and participation rate from 1991 through 2004.  Our other data set tabulates a variety of information by state such as high school graduation rate, juvenile violent crime rate and median income.  We hope to use the data of overlapping years in the two sets to find correlation between a subset of the variables.

The data suggest a correlation between the SAT participation rate of high school students and the mean SAT I scores on both the verbal and the math portions of the test.  At first glance, participation rate appears to be negatively correlated with mean SAT score.  One theory for this apparent phenomenon is that in low-participation states only the very well-prepared students take the SAT, which raises the average relative to high-participation states, where more ill-prepared students take the test.  Using confidence intervals and hypothesis testing, we will test whether there is indeed a negative correlation between participation rate and mean SAT score that is unlikely to be the result of random chance.
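As a rough illustration of the kind of check described above, the sketch below computes a sample correlation coefficient and an approximate confidence interval for it. The Fisher z-transform interval is one common choice and is our assumption, not a method named in the proposal; the numbers are made up for illustration and are not the Math Forum data.

```python
import math
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation (Fisher z-transform)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + confidence / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Made-up illustrative values, NOT the real state data.
participation = [4, 9, 12, 23, 35, 52, 60, 68, 71, 80]  # percent of students
mean_score = [1190, 1150, 1160, 1090, 1060, 1030, 1020, 1010, 1000, 1005]

r = pearson_r(participation, mean_score)
low, high = fisher_interval(r, len(participation))
print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

If the whole interval lies below zero, the sample is consistent with a negative correlation at that confidence level; it says nothing about why the correlation arises.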

The data also suggest a correlation between the high school graduation rate and SAT scores.  Graduation rate appears to be positively correlated with mean SAT score.  One theory for this apparent correlation is that states with higher graduation rates produce more well-prepared students who go on to take the SAT.  Again using confidence intervals and hypothesis testing, we will test whether there is a positive correlation between graduation rate and mean SAT score that is unlikely to be the result of random chance.

States in the U.S. want to improve their mean SAT scores, since higher scores reflect well on their education systems and encourage more funding.  If either of these correlations can be shown to be unlikely to result from random chance, then the states will have a general idea of how to improve their mean scores (if the correlations hold, lower the number of students taking the test and raise the graduation rate).  This knowledge would greatly aid policymakers in education departments across the U.S.


Drexel University. (2008, August 19).  SAT Scores by State (1990-2004). Math Forum. Retrieved March 25, 2011, from

Drexel University. (2008, August 19).  1984-1993 Teen Statistics. Math Forum. Retrieved March 25, 2011, from

Is the NFL now a passing league?

Seth Friedman, Xiongfei Gao, Joseph Newman

In recent years, it has been claimed that offensive production and statistics in the NFL have changed considerably. Articles suggesting that the NFL is becoming a “passing league,” such as Clary (2011), are becoming more and more frequent. Many football analysts stress that having a franchise quarterback is more important than relying on a running back or defense, because that’s the way the league is now.  The question to be asked, however, is whether only the number of passing yards has been increasing over the years, or whether this is a general trend across all of the offensive statistics.

To examine this, we will use a database of offensive statistics covering the past several decades: the NFL statistics database, with offensive data from 1966 (the season of the first Super Bowl) to 2011. We will analyze both passing yards/game and rushing yards/game to determine whether one has increased, both have increased, or neither has increased. We use yards/game rather than total yards because the NFL regular season was lengthened from 14 to 16 games in 1978.  For each statistic (i.e. passing or rushing), we will perform two one-tailed hypothesis tests (each using the median year of 1988 in the null hypothesis): one will test whether the pre-1988 mean is less than the 1988 average, and the other will test whether the post-1988 mean is greater than the 1988 average.

To do this, we will use μ = the 1988 average (219.4 for passing and 69.0 for rushing) as the null hypothesis for both one-tailed tests, with μ < the 1988 average as the alternate hypothesis for the pre-1988 test and μ > the 1988 average as the alternate hypothesis for the post-1988 test. If we reject the null hypothesis, the data support an increasing trend; if we fail to reject the null hypothesis, then we fail to show that there is an increasing trend. We will examine the top 20 NFL quarterbacks and running backs instead of all of them, as this will still produce an accurate result for each one-tailed test: 20 players * approximately 20 years for each of the two tests gives around 400 data points, a sample large enough for the Central Limit Theorem to apply.
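The two one-tailed tests described above could be sketched as follows. The function name and the sample values are hypothetical placeholders; only the 1988 passing average of 219.4 comes from the proposal. With only eight values per sample the normal approximation is not really justified, so the tiny samples here merely show the mechanics that would be applied to the roughly 400 data points per test.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z(sample, mu0, tail):
    """One-tailed z-test of H0: mu = mu0; tail is "less" or "greater"."""
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))
    cdf = NormalDist().cdf(z)
    p_value = cdf if tail == "less" else 1.0 - cdf
    return z, p_value


MU_1988_PASSING = 219.4  # 1988 average passing yards/game, from the proposal

# Hypothetical placeholder samples, not real NFL data.
pre_1988 = [201.3, 188.7, 215.0, 196.4, 209.9, 192.2, 205.8, 199.1]
post_1988 = [231.5, 244.0, 226.8, 239.3, 252.1, 228.4, 247.6, 235.9]

z_pre, p_pre = one_sample_z(pre_1988, MU_1988_PASSING, "less")
z_post, p_post = one_sample_z(post_1988, MU_1988_PASSING, "greater")
print(f"pre-1988:  z = {z_pre:.2f}, p = {p_pre:.4f}")
print(f"post-1988: z = {z_post:.2f}, p = {p_post:.4f}")
```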

Other methods that we will use to examine the data include building 95% confidence intervals around the data and constructing a few plots with fitted lines to confirm our conclusions about the statistics. For example, we can plot the number of passing yards/game versus the year, the number of rushing yards/game versus the year, and the number of rushing yards/game versus the number of passing yards/game. In all of these plots, we can use the tools that we have learned in class to determine whether or not a linear relationship exists between each of these pairs of variables. Through all of this, we believe that we can determine whether the offensive style of play has changed.
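One way to check for a linear relationship in such plots is an ordinary least-squares fit together with the coefficient of determination. The sketch below is a minimal version under that assumption; the yearly values are invented for illustration only.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented passing yards/game by year, for illustration only.
years = [1970, 1980, 1990, 2000, 2010]
passing = [180.0, 205.0, 215.0, 228.0, 245.0]

slope, intercept, r_squared = least_squares(years, passing)
print(f"slope = {slope:.2f} yards/game per year, R^2 = {r_squared:.3f}")
```

A slope near zero or a low R^2 would suggest that a straight line describes the trend poorly.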


Jason Clary.  (2011, June 09).  NFL: Has the NFL really turned into a pass-first league?
Retrieved from:   -a-pass-first-league

NFL Statistics.

Red Light Cameras

Megan Covington
Kasey Hill

Statistical Analysis of Red Light Cameras in Texas Intersections

Over 30,000 fatal crashes occur in the United States each year (NHTSA 2009).  Many of these occur at intersections, specifically when drivers fail to stop at a red light.  Several cities now use red light cameras to automatically ticket those who run red lights.  These cameras identify the license plate numbers of the offending vehicles, and tickets are mailed to the registered owners of the cars.  The stated purpose of these cameras is to improve public safety and decrease the number of fatal crashes by deterring motorists from running red lights and thus causing accidents; however, critics, including AAA, think that the cameras are installed merely to generate increased revenue for local and state governments (Batista 2010).  For this project, we will set out to determine whether red light cameras actually decrease the number of crashes and decrease the likelihood of injury in crashes that occur at an intersection.

For this application project, we intend to use data gathered by the Texas Department of Transportation (2011) regarding the number of accidents at specific intersections before and after the installation of red light cameras at those intersections.  The population is all intersections with red lights, and the sample is a set of chosen intersections in Texas.  We assume that all motorists are informed of the presence of the red light cameras at each of the specified intersections, that each motorist who runs a red light is captured by the camera and given a ticket, and that no other change affects the intersection except the installation of red light cameras.  Once the data is collected, we will block by crash type: fatal, injury, and non-injury.  We will compute the percent change in the number of each type of accident before and after installation of the red light cameras at each intersection.  We will also look at the percent change in the total number of crashes before and after installation of the red light cameras at each intersection.

Assuming that the percent changes at each intersection form a normal distribution for each block, we can then conduct a hypothesis test for each block (fatal, injury, non-injury, total) with the null hypothesis being that X = 0, where X represents the mean percent change in the number of crashes before and after the installation of the red light cameras.  The alternate hypothesis will be that X < 0, meaning that the number of accidents in an intersection has decreased following the addition of red light cameras.
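The percent-change calculation and the lower-tailed test for one block could be sketched as below. The crash counts are hypothetical, and the helper names are ours. Because the Python standard library has no t distribution, the sketch compares the t statistic against a critical value read from a t table (about -1.833 for a lower-tailed test at the 0.05 level with 9 degrees of freedom); the 0.05 level is our assumption.

```python
import math
from statistics import mean, stdev


def percent_change(before, after):
    """Percent change in crash count; undefined when `before` is zero."""
    return 100.0 * (after - before) / before


def t_statistic(sample, mu0=0.0):
    """One-sample t statistic for H0: mean = mu0."""
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))


# Hypothetical (before, after) crash counts at 10 intersections.
crashes = [(12, 9), (8, 8), (15, 11), (6, 7), (10, 6),
           (9, 7), (14, 12), (7, 5), (11, 10), (13, 9)]
changes = [percent_change(before, after) for before, after in crashes]

t = t_statistic(changes)
T_CRITICAL = -1.833  # lower-tail critical value, alpha = 0.05, df = 9 (t table)
reject_null = t < T_CRITICAL
print(f"mean change = {mean(changes):.1f}%, t = {t:.2f}, reject H0: {reject_null}")
```

Intersections with no crashes before installation would need separate handling, since their percent change is undefined.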

We shall examine the possible Type I and Type II errors for these hypothesis tests.  Type I error would occur when there is no percent change in the number of crashes before and after the red light cameras are added, but we reject the null hypothesis anyway.   This type of error could lead local and state governments to install the cameras thinking that they effectively reduce the number of crashes when in fact, they have no effect on the accident rate.  Type II error would be that we do not reject the null hypothesis when in fact there is a decrease in the number of crashes.  This type of error would result in governments choosing not to install red light cameras when they could reduce the number of accidents and help save lives.   95% confidence intervals for the mean percent change in accidents for each block will be computed and examined.

The results of this statistical analysis could demonstrate the effectiveness of red light cameras and support governments’ argument that they improve safety, countering the theory that they are merely installed to increase revenue.  If a larger amount of data were collected from across the country, further statistical analysis could determine whether or not red light cameras actually are effective in helping to decrease the number of crashes and thus save lives.


Batista, Elysa.  (2010, May 13).  Crist signs Fla. bill legalizing red light cameras.  Naples Daily News.  Retrieved from

National Highway Traffic Safety Administration (NHTSA).  (2009).  Fatality Analysis Reporting System (FARS) Encyclopedia [data file]. Retrieved from

Texas Department of Transportation.  (2011).  Red Light Cameras – Annual Data Reports [data file]. Retrieved from

Where to Live to Be Happy

Jack Minardi and Chris Lioi

Many notions of happiness exist, and most of them are subjective and hard to quantify.  However, many people have proposed various schemes for the quantification of happiness.  Many of these are the result of surveys and self-assessment questionnaires that aim to assign a number on a certain scale indicative of how happy a person is.  What sort of factors affect happiness?  Or, to ask a weaker but more reasonable question, what sort of factors affect a certain numerical quantification of happiness?  It may be that metrics based on different characterizations of happiness are affected by different things.

Our project proposal is to find what (if any) such correlations exist for a certain metric (or metrics).  In particular, there is ample data offered by the organization World Database of Happiness.  Their website[1] is “an ongoing register of scientific research on the subjective enjoyment of life”.  It lists several metrics of happiness by nation or geographical region. There is also another measure known as Gross National Happiness [2] (GNH), so named to parallel the concept of Gross National Product. It is claimed to be a better measure of a given country’s success than GDP. Either one of these databases may be used. Since the data presented in the World Database of Happiness does not seem to be easily downloadable, we will write a Python script to scrape the site and collect the needed data.  We will correlate these metrics with various other data for the world’s nations, such as population, GDP, average age, average lifespan, and any other variables that would appear to be of consequence.  These other data may be obtained easily from any number of public data sources, such as the CIA Factbook[3].  Again, if the data is not presented in an easily downloadable format, such as an Excel file, our plan is to write simple scripts to scrape the website for the relevant data.  The data analysis, in the form of linear regression using gradient descent, will be done in either R or MATLAB. We plan on writing the algorithms ourselves to get a better understanding of how they operate.
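To make the idea of linear regression by gradient descent concrete, here is a minimal single-predictor sketch. It is written in Python only for illustration (the proposal plans to use R or MATLAB), and the function name, learning rate, step count, and data are all assumptions. The predictor is assumed to be standardized beforehand, since unscaled values such as raw GDP would require a much smaller learning rate.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y ~ slope * x + intercept by batch gradient descent on mean squared error."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current line at every data point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data: standardized GDP per capita vs. a happiness score.
gdp_z = [-1.2, -0.5, 0.0, 0.4, 1.3]
happiness = [5.1, 5.9, 6.4, 6.6, 7.5]

slope, intercept = fit_line_gradient_descent(gdp_z, happiness)
print(f"happiness ~ {slope:.3f} * gdp_z + {intercept:.3f}")
```

For a single predictor the closed-form least-squares solution gives the same line, which makes it a convenient check on a hand-written descent loop.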

In the end we hope to be able to present statistics that show how many different metrics are related to happiness, and hope to gain a better insight into what makes us happy. Using the tools learned in the class we will be able to show how strongly the different measures are correlated, and what seems to contribute the most to overall happiness.




Big Bully on Campus: Is Vanderbilt Stealing Your Lunch Money?

Curtis Northcutt
Peter York
Hayden Kelly

MATH 216 – Statistics, Dr. Derek Bruff
Statistics Project Proposal
March 26, 2012


Does the average student lose money on a given meal purchased using a meal plan at Rand? This project compares the average cost a student pays per meal under a meal plan with the average price of a single meal at Rand. While we note that it is commonly held that the price of a given meal does not accurately reflect the market value of the food purchased (i.e. on-campus food is generally believed to be overpriced), the analysis of whether or not this is actually the case is beyond the scope of the project. Rather, our goal is to advise students who have decided to eat on campus whether to purchase a meal plan next semester or to use Commodore Cash for food purchases, based on the results of our study. Thus, a successful statistical analysis of the population of Vanderbilt students who eat at Rand will allow such students to save money next semester on their meal expenses.

We will watch the registers in Rand on Monday and Tuesday, gathering samples consisting of three pieces of data: (1) the price of the meal, (2) the number of remaining meals, and (3) the gender of the person purchasing the meal.  Because we do not know the distribution of meal prices, we will gather 100-200 samples so that the sample mean can be treated as approximately normal. If a student does not use a meal plan, we will not gather any data for that student, as they are not in the population we are considering. We will gather samples at breakfast and lunch only, because if we gathered data on the only night that Rand serves dinner, Tuesday, our data would likely be distorted by the homogeneity of prices on Tortellini Tuesday.  By gathering data on Monday and Tuesday, we will be able to ascertain the percentage of students with each type of meal plan (8, 14, 19, or 21 meals) from the number of remaining meals. Since all Vanderbilt meal plans reset at 12:00 a.m. on Monday, it will not be possible for students to have used enough meals by Monday afternoon for their “meals left” to drop below their meal plan category. This project may be extended to analyze other variables by also answering the questions: “Do males or females lose more money on meal plan?” and “Would our results be different if we sampled from Branscomb Munchie Mart instead?”

We will calculate the average cost per meal for all students based on the price for each plan, provided by Vanderbilt Dining[1]. We will then construct a probability distribution, where x = the average cost per meal for a given meal plan type and P(x) = the proportion of students who have that meal plan. The expected value of this distribution is the average cost per meal for all Vanderbilt students on a meal plan.

We will then perform a hypothesis test to determine whether the average cost and average price of a given meal are in line with each other.  We will let H0: μ = E(x), where μ is the average price per meal and E(x) is the average cost per meal, as calculated above. This choice for our null hypothesis stems from Vanderbilt’s assertion that the meal plan average cost approximates your expenditures. We will let HA: μ < E(x), meaning that the average price per meal is less than the average cost per meal. Using a normal model for the sample mean, we will conduct a hypothesis test to determine whether there is statistically significant evidence to reject H0 in favor of HA with α = 0.05.
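The expected cost per meal and the lower-tailed test could be sketched as follows. Every number here is hypothetical: the per-plan costs, the plan shares, and the sampled prices are placeholders, not Vanderbilt Dining figures or register data, and the helper names are ours.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical cost per meal (dollars) and share of students for each plan type.
plan_cost_per_meal = {"8": 9.50, "14": 8.75, "19": 8.25, "21": 8.00}
plan_share = {"8": 0.10, "14": 0.35, "19": 0.30, "21": 0.25}


def expected_cost(costs, shares):
    """E(x): sum of x * P(x) over the meal plan types."""
    return sum(costs[plan] * shares[plan] for plan in costs)


def lower_tail_p_value(prices, mu0):
    """p-value for H0: mu = mu0 against HA: mu < mu0 (normal approximation)."""
    z = (mean(prices) - mu0) / (stdev(prices) / math.sqrt(len(prices)))
    return NormalDist().cdf(z)


# Hypothetical register prices; the real sample would hold 100-200 values.
prices = [7.25, 8.10, 6.80, 7.95, 8.40, 7.10, 7.60, 8.00, 6.95, 7.45]

e_x = expected_cost(plan_cost_per_meal, plan_share)
p_value = lower_tail_p_value(prices, e_x)
print(f"E(x) = {e_x:.4f}, p-value = {p_value:.6f}, reject H0: {p_value < 0.05}")
```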

If we fail to reject the null hypothesis, we will advise students who eat on campus to stick with a meal plan; however, if we reject the null hypothesis, we will advise students to purchase their meals with Commodore Cash and save the difference between the cost of the meal plan and the price of dining hall food. While it would be more exciting to reject the null hypothesis, if we fail to reject it, it would be gratifying to learn that the university is not taking advantage of students.


[1] Vanderbilt University. VU Meal Plans. Retrieved from




Moore’s Law Holding Steady?

Authors:  Graham G.,  Colin T.,  Richard W.

Technology development is advancing at a very rapid rate.  Gordon Moore proposed in the mid-1960s that the number of transistors that can be placed on an integrated circuit at a reasonable cost doubles every 18 to 24 months. This trend has been observed not only in transistors, but also in processor speed, memory capacity, and even pixel densities in digital cameras. However, some now fear that we are approaching the physical limits of how small we can make these technologies while still maintaining the same reliability. How long can this trend of unbounded growth continue?

These technological increases are significant to many different aspects of the world, including businesses, education, communication, the “information grid,” and the digital divide between 1st and 3rd world countries. In industry, companies need to forecast what technologies will be available to them when they go to develop new products down the line. For instance, if a company plans to create a mobile device to be released 4 years from now, they need to draw up the specifications based on the best components they will be able to find when they go into manufacturing, not the best parts currently on the market.  Educators need to be aware of different technologies as they come into existence, as they need to teach their students how to utilize new technologies in order to produce an efficient workforce.  The divide between technological capabilities in 1st world countries and 3rd world countries is currently quite large, but as technology gets less expensive, will 3rd world countries continue to lag behind, or will they be able to catch up?

We are interested in testing whether Moore’s Law holds in several different tech sectors.  Does Moore’s Law hold for processing power? How about memory capacity? What about pixel densities in cameras? All of these questions relate directly back to whether or not Moore’s Law holds, because these technologies are the areas the law is supposed to affect. By looking at data over many years for these individual traits, we can compare how the number of transistors on a chip translates to the technologies that number is supposed to make better.

Data for these questions should be quite easy to obtain.  It isn’t very difficult to go online and find historic prices for different processors, hard drives and cameras.  What is difficult is determining what is reasonable as “the technology” for a given year.  For any given year, we will likely be able to find many processor and hard drive models on the market, so a given year’s “transistor count” or “cost per megabyte” may be much more difficult to ascertain.  We will need to come up with some method for averaging the prices for different models in a given year.

For each of our questions, we will compare the data on processing power, memory capacity and pixel densities to the number of transistors on an integrated circuit to see if the trend holds. We will do a two-sided test in our analysis of each technology. Our null hypothesis will be that Moore’s Law holds, since it is merely an estimate of the advancement of technology and we can allow some tolerance if it does not hold precisely. Our alternate hypothesis will be that the Law does not hold and that it either overestimates or underestimates our ability to continue this growth.
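One possible way to turn a yearly series into a number that can be compared with the 18 to 24 month claim is to fit a line to the base-2 logarithm of the series; the reciprocal of the slope is then the estimated doubling time. This log-linear fit is our assumption about how the comparison might be done, and the series below is synthetic (constructed to double exactly every two years), not real chip data.

```python
import math
from statistics import mean


def doubling_time_years(years, counts):
    """Estimate doubling time from the least-squares slope of log2(count) vs. year."""
    logs = [math.log2(c) for c in counts]
    mean_year, mean_log = mean(years), mean(logs)
    slope = (sum((y - mean_year) * (l - mean_log) for y, l in zip(years, logs))
             / sum((y - mean_year) ** 2 for y in years))
    return 1.0 / slope  # slope is doublings per year


# Synthetic series doubling every 2 years; NOT real transistor counts.
years = list(range(2000, 2011))
counts = [1_000_000 * 2 ** ((year - 2000) / 2) for year in years]

estimate = doubling_time_years(years, counts)
print(f"estimated doubling time: {estimate:.2f} years")
```

The same function could be applied to memory capacity or pixel density once a representative value per year has been chosen.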

We will also examine the errors that go along with our hypothesis tests. Type I Error would be concluding Moore’s Law doesn’t hold when it actually does. Type II Error would be failing to reject Moore’s Law when in fact it doesn’t hold.


1. Long, Phillip D.  (May 2002).  Moore’s Law and the Conundrum of Human Learning.  Retrieved from

2.  Intel.  (February 2003).  Moore’s Law: Raising the Bar.  Retrieved from

3.  McCallum, John C. (2012). Memory Prices 1957 to 2012. Retrieved from

4.  Wikipedia.  Moore’s Law.  Retrieved from’s_law

Go Dores: Final Shot Strategy Proposal

By Jiacheng Ren and Haolin Wang


All Vanderbilt basketball fans have faced a situation like this: the Vanderbilt Commodores are 2 points behind in the second half and we get the ball. There are only 15 seconds left to take the final shot. Should we make a two-point basket to tie the game or shoot a three to end it now?

To answer this question, we must know the probability of scoring on a two-point shot and on a three-point shot. A two-point shot appears easier to make than a three-pointer, because it is easier to score when shooting nearer to the basket and the defense might be more focused on preventing a three-point shot. However, even if we make the two-point shot, we still have to play overtime. Unfortunately, we have a very poor overtime record, which could largely negate the benefit of making the two-point shot. On the other hand, we have some of the top three-point shooters in the country.  The chance of scoring a three might still be slim, since our opponent would pay more attention to preventing it, but we might be better off letting John Jenkins or Jeffery Taylor try to win the game right away.

In order to find the best chance to win the game, we need to know the key factors that could influence the result. According to Bill Hanks, a basketball coach with 32 years of experience, “the final shot by a team is dictated by five factors.”  The first is the time on the clock, which dictates how much time the shooter has for the final shot and how complicated the final play can be. We will ignore this factor in this project to simplify our analysis. The second factor is the foul situation, and we will also simplify this factor by assigning a fixed probability of getting fouled based on Hanks’s experience: “The closer a player is to the basket, the higher the chance of a foul.” The third factor is the players. In this scenario, we assume that the players in the game are Jenkins, Taylor, Ezeli, Tinsley and Goulbourne. The table below shows the stats of the players. The fourth factor is the placement of the ball and the fifth factor is the defense. We will ignore these two by making assumptions in the scenario. In addition, we would like to add another key factor, because what we are interested in is not only making the last shot, but also winning the game. This factor is our overtime performance. By gathering data from the web, we can find the ranking of the two teams and our expected chance of winning in overtime. Sadly, we lost all 3 overtime games this season.

Player              FG%     FT%     3P%
John Jenkins        .474    .837    .439
Jeffery Taylor      .493    .605    .423
Festus Ezeli        .539    .604    .000
Brad Tinsley        .474    .855    .415
Lance Goulbourne    .456    .680    .309

First, we can apply a hypothesis test to our chance of winning in overtime. H0 will be that we have a 50% chance, and HA will be that the chance is less than 50%. For the shooting, we will run simulations in which, for example, Jenkins attempts a three-point shot 100 times and Ezeli attempts a close-range shot 100 times. We will also run a simulation to see whether or not we draw a foul. The approximate chance of a foul can be estimated from ESPN’s database.  Finally, we will consider all of our simulations together and find the expected outcome of each final-play strategy.
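A simplified version of this plan could look like the sketch below. It uses Jenkins’s 3P% and Ezeli’s FG% from the table, but it makes several assumptions that are ours: Ezeli’s overall FG% stands in for his close-range percentage, fouls are ignored, the overtime win probability is set to the null value of 50%, and the exact binomial tail is used as the overtime test.

```python
import random
from math import comb

JENKINS_3P = 0.439      # from the player table
EZELI_FG = 0.539        # from the player table; assumed to apply at close range
P_OVERTIME_WIN = 0.5    # assumption: the H0 value for winning in overtime


def binomial_lower_tail(wins, games, p=0.5):
    """P(X <= wins) for X ~ Binomial(games, p): exact p-value for HA: p < 0.5."""
    return sum(comb(games, k) * p ** k * (1 - p) ** (games - k)
               for k in range(wins + 1))


def simulate_win_rate(p_make, p_win_after_make, trials, rng):
    """Fraction of simulated final possessions that end in a win."""
    wins = 0
    for _ in range(trials):
        if rng.random() < p_make and rng.random() < p_win_after_make:
            wins += 1
    return wins / trials


rng = random.Random(216)  # fixed seed so the run is repeatable

# 0 overtime wins in 3 overtime games this season.
p_value_overtime = binomial_lower_tail(0, 3)

# A made three wins outright; a made two only forces overtime.
three_rate = simulate_win_rate(JENKINS_3P, 1.0, 100_000, rng)
two_rate = simulate_win_rate(EZELI_FG, P_OVERTIME_WIN, 100_000, rng)

print(f"overtime test p-value: {p_value_overtime}")
print(f"win rate shooting the three: {three_rate:.3f}")
print(f"win rate shooting the two:   {two_rate:.3f}")
```

With only three overtime games the test has very little power, so a larger record of overtime results would be needed before replacing the 50% assumption with an estimate.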



Social Networking and GPAs

Robert Price, Suzanne Ward, Robert Wolff

Application Project: Part 1

Math 216


Most people hold the belief that social networking has an overall negative effect on students’ scholastic success. Many academic researchers and online articles support this point of view, as Facebook, Twitter, and other networking sites have been deemed wastes of time that are used at the expense of studying. Although most online articles suggest that time on social networking sites has a negative impact on a student’s GPA, other researchers suggest that the so-called negative impact is non-existent. Our goal will be to determine the relationship between social networking use and students’ GPAs among Vanderbilt students.

A recent article in The Telegraph claims that educators hold social networking websites responsible for students’ poor performance in classes (Bloxham, 2010). This article cites a study of 500 teachers in which the overwhelming majority responded that the influence of social media websites on their students has negatively affected their performance in school for various reasons. The teachers believe that students have a tendency to hurriedly finish homework in order to communicate with people online, and may even be communicating online in the classroom during class time. Additionally, teachers note that students are less able to concentrate in class due to their obsession with social media usage. Although this study was based purely on the observations of teachers, many other studies have shown that this correlation is real. One, documented in The Daily Mail, stated that the average GPA of those who frequently use Facebook is 3.06, whereas the average GPA of those who don’t use Facebook very much is 3.82 (Choney, 2010).

In contrast, research conducted by the University of New Hampshire Whittemore School of Business and Economics found otherwise. This study of 1,127 students determined that there was no correlation between online social networking usage and grades (Capano). The researchers split students’ grades into one of two categories, high grades or low grades. Coupled with this division, social networking usage was split into either a heavy usage or light usage category. Of the heavy users, 65% were placed into the high grades category, while 63% of the light users were placed into the high grades category. The results of this study are contrary to popular belief, and led us to wonder what kind of results could be seen at Vanderbilt.

Our group will collect data from Vanderbilt students by sending an anonymous survey to the class along with our fraternity and sorority listservs.  We want to conduct the survey solely via email and not through social media sites so that our data is not skewed towards people who use these sites.  We will assume that the responders will represent a random sample of Vanderbilt students, but include a few qualifying questions on the survey to make sure that our data is not weighted too heavily toward a particular college.  The survey will also include questions regarding the student’s use of social media websites as well as their GPA.  While only surveying the class would most likely give us a good random sample of engineers, we hope that surveying more students will allow us to more confidently analyze any relationships that appear in the data.  The following social media websites will be included: Facebook, YouTube, Twitter, Pinterest, Google+, and Other.

After the data has been collected, we will analyze correlations between social media use and GPA of Vanderbilt students.  As of now, we want to look for a relationship between time spent using social media and GPA as well as a relationship between number of sites used and GPA.   The null hypotheses are that time spent and number of accounts have no effect on a student’s grade point average.  The alternate hypotheses are that they do have an effect.  We will also use linear regression to look for relationships between GPA and time spent, or GPA and the number of accounts.  To add to our study, we will also compute various confidence intervals to assess whether social networking is having a negative impact on grades. Alongside the confidence intervals, we will analyze the possibility of both Type I and Type II errors.


Bloxham, A. (2010, November 18). Social networking: teachers blame Facebook and Twitter for pupils’ poor grades. The Telegraph. Retrieved March 25, 2012, from

Capano, N. Social Networking Usage and Grades Among College Students. University of New Hampshire, Whittemore School of Business & Economics. Retrieved March 25, 2012, from

Choney, S. (2010, September 7). Facebook use can lower grades by 20 percent, study says. MSNBC. Retrieved March 25, 2012, from

Income-Dependent Hospital Admissions Statistical Analysis

Nathan Hall, Siana Aspy, Tim Altmansberger

The discrepancies between medical facilities in low-income and high-income American communities can be striking. Health insurance is a highly controversial political issue in the modern age, largely because of the very high price of medical care; this exorbitant price tends to affect those who can least afford it, the same people who are unable to pay for insurance. Preventative care can ward off higher costs in the future, but it can be seen as an unnecessary expense by those who are in financial turmoil. Intuitively, then, people with lower income would be less likely to seek medical attention for a given problem until it becomes serious. It would follow that, for a given hospital in a low-income area, the severity of cases would be, on average, higher than that found at a hospital in a high-income area.

In order to test the idea that low-income hospitals would see cases of a higher severity for a given ailment, we propose a hypothesis test on readmission data from a variety of hospitals. The number of times a patient is readmitted should correlate with an inability to pay for preventative care, since maintaining overall health and well-being can get expensive! The null hypothesis in this study would take the following form:

H0: Low-income hospitals display a readmission rate per patient similar to that of higher-income hospitals for any given type of illness.

Accordingly, the Alternative Hypothesis would be as follows:

Ha: Low-income hospitals display a significantly higher readmission rate per patient than higher-income hospitals for any given type of illness.

(The terms “low-income hospitals” and “high-income hospitals” will, for the duration of the project, refer to hospitals with a low-income patient base and a high-income patient base, respectively. This may also be determined by the area in which the hospital is located if no patient salary information is available.)

Sampling error may result from the types of cases being treated at the hospital in question. For instance, a cancer patient would be expected to make more return visits (thus producing a higher readmission rate) than a stomach virus patient. Thus, a hospital with a state-of-the-art cancer facility would generally be more likely to show a higher readmission rate because of a flaw in sampling, not because of patient income. This can occur for any hospital with a field of specialization, and it may skew the data drastically. To reduce this source of error, readmission rates will be grouped by type of ailment. The data will be separated into categories (such as oncology, gastroenterology, etc.) for a more representative comparison.

The data obtained on readmissions at different hospitals can be analyzed using various statistical techniques. Confidence intervals at varying confidence levels can be constructed around the mean readmission rate at high-income hospitals. If the mean readmission rate from low-income hospitals falls outside the confidence interval, the null may be rejected in favor of the alternative. In addition, because a larger percentage of Medicare and Medicaid funding goes to people with lower income, a linear regression could be used to find what correlation (if any) exists between Medicare/Medicaid spending and hospital readmission rates. Using these forms of statistical analysis, it can be determined whether or not there is a link between patient income and hospital readmission rates.
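As a rough illustration of the confidence interval check described above, the sketch below builds an interval around the mean readmission rate of high-income hospitals and tests whether a low-income mean falls outside it. The function names and all rates are hypothetical, and the interval uses a normal approximation; with only a handful of hospitals per category, a t-based interval would be more appropriate.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * stdev(sample) / n ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width

def outside_interval(value, interval):
    """True if value lies outside the (low, high) interval."""
    low, high = interval
    return value < low or value > high

# Hypothetical readmission rates for one ailment category (e.g. oncology).
high_income_rates = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12]
low_income_mean = 0.18  # hypothetical mean rate at low-income hospitals

interval = mean_confidence_interval(high_income_rates, confidence=0.95)
print("95% interval:", interval)
print("Reject H0:", outside_interval(low_income_mean, interval))
```

Repeating the check at several confidence levels (for example 0.90, 0.95, and 0.99) shows how sensitive the conclusion is to the chosen level.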

Average Value of MTG Packs

James Trippe

Daniel Hasday

Average Money Earned per Magic Booster

Traders buy and sell Magic: The Gathering cards daily for up to hundreds of dollars per card. However, cards are generally sold in sealed packs, each filled with a random set of cards. These packs have 1 rare, 3 uncommons, 10 commons, and a land, token, or foil card that is randomly chosen from any rarity. Additionally, the rare is a “mythic rare” one time in eight. Ultimately, though, bulk buyers end up buying many of these packs and opening them to sell the cards individually. Thus, their livelihoods depend on how much money they get from opening a pack. We want to compare the average amount earned by opening a pack against selling the pack for $4, the standard price at gaming stores. We will use three sets of Magic cards: the most recently released (Dark Ascension), the most recent core set (M12), and Worldwake. With these three sets we will determine whether it is feasible for a bulk buyer to sell the cards pulled from the packs.

Our null hypothesis is that the mean price of cards pulled from a pack is equal to or less than four dollars, meaning we want to sell the packs. The alternative hypothesis is that the price of the cards in the pack is greater than four dollars, meaning we want to open the packs and sell the cards. More rigorously:

H0: μcards ≤ $4          Ha: μcards > $4

To test our hypothesis, we will use some online resources, since we don’t want to spend thousands of dollars on hundreds of packs. There is a website (1) that can simulate packs of any set using the same rarity distribution that Wizards of the Coast (2) uses when printing the cards. We will pull boosters from the website and catalog the contents of each pack in an Excel file. We can get the average prices of the cards from another site (3) and put these in the same file. With a little computer wizardry we can then get the total amount of money stored in each pack. We will do this for a sample of 50 packs from each set so that the sample is large enough to support statistical inference.
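The bookkeeping step could look something like the sketch below, which sums the listed price of every card in a simulated pack. The card names, prices, and the function name `pack_value` are placeholders for illustration only; the real inputs would be the cataloged packs and the price list gathered from the sites above.

```python
def pack_value(cards, prices):
    """Sum the listed price of every card in one simulated pack.

    Raises KeyError if a card is missing from the price list.
    """
    return sum(prices[card] for card in cards)

# Hypothetical price list (dollars) and two shortened hypothetical packs.
prices = {"Rare A": 6.50, "Uncommon B": 0.75, "Common C": 0.10, "Land D": 0.05}
packs = [
    ["Rare A", "Uncommon B", "Common C", "Land D"],
    ["Uncommon B", "Common C", "Common C", "Land D"],
]

values = [pack_value(pack, prices) for pack in packs]
average = sum(values) / len(values)
print("Value per pack:", values)
print("Average value:", round(average, 2))
```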

What we care about in this exercise is the amount of money stored in each pack. Thus, we will construct 90% confidence intervals for each of the samples and reject the null hypothesis at the 10% significance level. We chose this lenient level because there is little risk involved: once you have the packs, opening them to sell the individual cards and simply selling them sealed are largely interchangeable. Thus, if there is even a slight trend towards opening the packs earning more money, we want to take advantage of it. Once these confidence intervals are done, another interesting question is whether the money earned grows linearly with the number of packs opened. To check this, we can plot the number of packs opened versus money earned and fit a line. For our data visualization, we will compare the average money earned for each of the three sets and determine which one is most likely to earn money when its packs are opened.
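One way the decision rule might be carried out is sketched below as a one-sided test of the null hypothesis that the mean pack value is at most $4. The function name and the pack values are hypothetical, and the p-value uses a normal approximation, which is an assumption that is reasonable for samples of about 50 packs; a t-test would be the more careful choice.

```python
from statistics import NormalDist, mean, stdev

def one_sided_test(pack_values, mu0=4.0, alpha=0.10):
    """Test H0: mean <= mu0 against Ha: mean > mu0 (normal approximation).

    Returns (z statistic, p-value, whether H0 is rejected at level alpha).
    """
    n = len(pack_values)
    standard_error = stdev(pack_values) / n ** 0.5
    z = (mean(pack_values) - mu0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value, p_value < alpha

# Hypothetical total values (dollars) for 50 simulated packs of one set.
sample = [5.0, 5.5, 4.5, 6.0, 5.0] * 10

z, p_value, reject = one_sided_test(sample)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Rejecting the null here would favor opening the packs and selling the cards individually; failing to reject would favor selling the packs sealed.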




1) Online Free Bestiaire Magic Draft: Magic Draft. (January 30, 2012). Online Free Bestiaire Magic Draft: Magic Draft. Retrieved March 25, 2012, from

2) Booster pack – Wikipedia, the free encyclopedia. (March 5, 2012). Wikipedia, the free encyclopedia. Retrieved March 25, 2012, from

3) (n.d.) Retrieved March 25, 2012, from