Exploring Relationships in Body Dimensions

Irene Hukkelhoven, Zachary Sanicola, Natalie Thoni

For years, the Body Mass Index (BMI) has been used to quantify obesity. Recently, questions have been raised about the appropriateness of using this index to define a person's healthy weight range. A particularly pressing concern stems from the practice among medical insurance companies of using an applicant's BMI as one of the terms used to compute and justify that person's premium. Moreover, some researchers insist that the "BMI is bogus" [1]. Our goal is to investigate the validity of this concern by redefining a healthy weight range and determining whether the BMI falsely labels people as obese. As a secondary focus, we will also discuss the applications of skeletal measurements in gender determination and define size ranges for manufacturers of retail and ergonomic goods.

The dataset [2] comprises 507 individuals: 247 men and 260 women, all of whom complete several hours of exercise a week. The majority of the subjects were in their twenties and early thirties at the time of sampling, though the dataset also contains a handful of older men and women. To concentrate on a more specific demographic, we will exclude men and women over a certain age. We therefore define the population as all physically active young adults.

Several questions we will attempt to address are as follows:
1. Is height a good indicator of weight? How accurately does the BMI assign under- or overweight statuses?
2. What skeletal bone is the best indicator of gender?
3. How many units of each shirt size should a clothing retail store order to ensure they will not be overstocked? How big should an airline make their seats to accommodate the smaller 95% of the population?

Studies have already shown that height is a poor indicator of weight, so we predict that the correlation coefficient from a linear regression of weight on height will not be near an absolute value of 1. We will show that there are other body measurements with stronger correlations that can be used to better predict a subject's "scale weight." To justify the higher accuracy of using other body measurements, we will use hypothesis testing on a randomly selected group within our sample population, with two predicted weights for each individual (H_0: μ = x; H_A: μ ≠ x, where μ is the scale weight and x is the weight predicted by (1) height alone and (2) other body measurements), and then calculate and compare the respective p-values. Furthermore, we will attempt to define a more rigorous equation than the BMI for determining whether an individual is within a healthy weight range, thus answering questions about the value of the obesity index for individuals whose body build is atypical for their height.
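
As a rough sketch of this comparison (not part of the original analysis plan), the snippet below fits a height-only model and a multi-measurement model on synthetic stand-in data, then runs the paired test on a randomly held-out group. The predictor names (`chest`, `waist`) and all numbers are placeholders for columns in the Heinz et al. dataset [2].

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the body-dimension data (n = 507).
n = 507
height = rng.normal(171, 9, n)                    # cm
chest = 0.5 * height + rng.normal(0, 6, n)        # cm, hypothetical girth
waist = 0.45 * height + rng.normal(0, 8, n)       # cm, hypothetical girth
weight = 0.4 * height + 0.5 * chest + 0.3 * waist + rng.normal(0, 5, n)  # kg

# Randomly split into a fitting set and a held-out test group.
idx = rng.permutation(n)
fit_idx, test_idx = idx[100:], idx[:100]

def lstsq_fit(X, y):
    """Ordinary least squares with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Model 1: height only.  Model 2: height plus other girths.
X1_fit, X1_test = height[fit_idx, None], height[test_idx, None]
X2_fit = np.column_stack([height, chest, waist])[fit_idx]
X2_test = np.column_stack([height, chest, waist])[test_idx]

for name, Xf, Xt in [("height only", X1_fit, X1_test),
                     ("height + girths", X2_fit, X2_test)]:
    beta = lstsq_fit(Xf, weight[fit_idx])
    pred = predict(beta, Xt)
    r = np.corrcoef(pred, weight[test_idx])[0, 1]
    # Paired test: H0: mean(scale weight - predicted weight) = 0
    t_stat, p_val = stats.ttest_rel(weight[test_idx], pred)
    print(f"{name:16s}  r = {r:.3f}  t = {t_stat:.2f}  p = {p_val:.3f}")
```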

Contrary to popular belief, pelvic measurements are not the most reliable data for determining the gender of skeletal remains. Using histograms, we will determine which skeletal measurement best distinguishes a male from a female body. Specifically, we will seek to identify the body measurements whose normal distribution curves for male and female subjects overlap the least. Such information can be useful in forensic science and anthropological studies (e.g., identifying the remains of a missing person, or shedding light on ancient burial rituals).

To figure out what quantity of each shirt size a buyer for a clothing store should order, we will calculate five two-sided confidence intervals to relate weight to shirt size (XS, S, M, L, XL). Then, we can use a probability distribution to determine how many units of each shirt size a buyer should order without fear of being overstocked. In a similar manner, confidence intervals can be employed to calculate the size of an airplane seat suitable for the smallest 95% of the population (this will be a one-sided interval, since only an upper bound matters: a seat big enough for larger passengers automatically fits smaller ones).
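
The sketch below illustrates the kinds of interval calculations described above, using made-up measurements in place of the real dataset columns; the 62-72 kg "M" band, the 1,000-shirt order, and the use of hip girth as a proxy for required seat width are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the real dataset columns: body weight (kg) to
# relate to shirt size, and hip girth (cm) as a proxy for required seat width.
weight = rng.normal(69, 13, 507)
hip = rng.normal(97, 7, 507)

# Two-sided 95% CI for the mean weight of the "M" group (here, a made-up
# 62-72 kg band; real bands would come from the shirt-size definitions).
m_group = weight[(weight >= 62) & (weight < 72)]
mean, se = m_group.mean(), stats.sem(m_group)
lo, hi = stats.t.interval(0.95, df=len(m_group) - 1, loc=mean, scale=se)
print(f"M shirts: mean weight {mean:.1f} kg, 95% CI ({lo:.1f}, {hi:.1f})")

# Share of subjects falling in that band -> units of M per 1000 shirts ordered.
print(f"Stock about {1000 * len(m_group) / len(weight):.0f} M shirts per 1000")

# One-sided question: a seat wide enough for the smallest 95% of the population.
# Under a normal model, this is the 95th percentile of hip girth.
p95 = hip.mean() + stats.norm.ppf(0.95) * hip.std(ddof=1)
print(f"Seat should accommodate hip girths up to roughly {p95:.1f} cm")
```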

In conclusion, our project will primarily highlight the topic of health, obesity, and BMI. As a secondary focus, we will look into the applications of skeletal measurements for gender determination as used in forensic and anthropological research. Lastly, we will address how our data can be used to provide useful information to clothing and furniture manufacturers.

References:

1. Devlin, Keith. “Top 10 Reasons Why the BMI Is Bogus.” Npr.org. National Public Radio, 4 July 2009. Web. 25 Mar. 2012. <http://www.npr.org/templates/story/story.php?storyId=106268439>.

2. Heinz, Grete, Louis J. Peterson, Roger W. Johnson, and Carter J. Kerk. “Exploring Relationships in Body Dimensions.” Journal of Statistics Education 11.2 (2003). Amstat. Web. 25 Mar. 2012. <www.amstat.org/publications/jse/v11n2/datasets.heinz.html>.

Bracketology: How does Seeding determine Success?

Lester Primus, Taylor Madison, Julian White

In the spirit of March, the best application of statistics is to analyze the famed NCAA college basketball tournament, also known as “March Madness.” Every year, thousands of people of all ages rush to their favorite sports website to fill out a bracket in hopes that they will be the one to correctly predict the outcome of every game in the tournament. Though no one has ever correctly predicted 100% of the games played, there have been reports of people guessing the entire first round correctly. The current system includes 68 teams, a field expanded in 2011 and up from the 64 teams of 1985. This means that today there are 67 total games, and a team must win at least six of them to be crowned champion.

The possible combinations of surviving teams each year seem almost endless, especially to those doing their best to win competitions with their friends. Teams are ranked, or “seeded,” based on several factors from their season. These factors include their win-loss record, their performance in their conference tournament, their strength of schedule, and the number of ranked teams they defeated along with the number of unranked teams that beat them. The teams ranked number one in each region are the top four teams overall, the number two seeds are teams five through eight overall, and so on. Therefore, at first glance, it is intuitive to predict that the higher-ranked team will always win its game. History, year after year, has proven otherwise. The victory of a lower seed over a higher seed is known as an “upset,” and lower-seeded teams that pull off multiple upsets to remain in the tournament are known as “Cinderella” teams. We hope to use statistics to answer the many questions of the common sports fan. We will analyze the relationship between seed and likelihood of victory, as well as the potential of one seed to upset another.

First, we will determine whether there is a relationship between a team’s seed and the number of games it wins in the tournament, and if so, what that relationship is. A linear relationship would indicate that the teams expected to win, that is, the higher seeds, do in fact win more games. This could readily be framed with confidence intervals and hypothesis tests. If one were to pick the winner of each game purely based on seeding, how often would that person be correct, and at what significance level? We will address this by finding the average number of games a higher seed tends to win each year and comparing it to the average number of games a lower seed wins. Because there are sixteen different seeds, several seed matchups can be compared.

A second question is whether lower seeds are motivated by the “upset factor” to win their games. For example, out of the 32 first-round games, would an upset occur in at least half of them, suggesting that lower seeds are at least as likely to win? The null hypothesis is that the expected number of first-round upsets equals 16, while the alternative hypothesis is that it is less than 16.
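
A minimal sketch of this test treats the first-round upset count as binomial with p = 0.5 under the null; the count of 9 upsets below is a made-up example, not an actual season's figure.

```python
from scipy import stats

# Hypothetical count: suppose a season's 32 first-round games produced
# 9 upsets (lower seed beats higher seed).
# H0: p = 0.5 (16 expected upsets)   vs.   HA: p < 0.5 (fewer than 16).
games, upsets = 32, 9

# One-sided binomial p-value: P(X <= 9) when X ~ Binomial(32, 0.5).
p_value = stats.binom.cdf(upsets, games, 0.5)
print(f"P(X <= {upsets} | p = 0.5) = {p_value:.4f}")
# A small p-value is evidence against "half of first-round games are upsets";
# pooling several seasons would give a much larger sample.
```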

The number one and two seeds are considered the dominant teams of the tournament and are almost never picked to lose in the first round. In fact, a number one seed has never lost a first-round game since the tournament began seeding teams. A final question is the likelihood of a number one or two seed making the Final Four (the semifinals of the tournament). This probability can be found by treating each year as a sample and counting the number of one seeds that reach the Final Four each year. A confidence interval can then be constructed from these sample means and standard deviations.
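
A sketch of that confidence interval, using invented yearly counts in place of the real tournament records:

```python
import numpy as np
from scipy import stats

# Hypothetical yearly counts of No. 1 seeds reaching the Final Four
# (0-4 per year); real values would come from tournament results.
counts = np.array([2, 1, 1, 0, 3, 1, 2, 1, 0, 2, 4, 1, 2, 1, 1, 3])

mean, se = counts.mean(), stats.sem(counts)
lo, hi = stats.t.interval(0.95, df=len(counts) - 1, loc=mean, scale=se)
print(f"Average No. 1 seeds in the Final Four per year: {mean:.2f}")
print(f"95% CI: ({lo:.2f}, {hi:.2f})  ->  roughly {mean/4:.0%} of No. 1 seeds")
```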

Project Proposal: Alessandra, Medhi, Sonja

Household energy consumption is an important element of household expenditures and environmental impact. Understanding the factors that drive energy consumption is important for determining what affects it most and how to improve efficiency. Energy consumption is a crucial issue worldwide because of the limited availability of energy resources. Because the United States is a world leader and one of the top two energy-consuming countries worldwide [1], knowing the patterns and distributions of energy consumption in the U.S. could help shape energy use worldwide.

The United States Energy Information Administration conducts periodic surveys of household energy consumption and publishes the results in a document, “Home Energy Use and Costs.” These data describe the many factors that drive household energy consumption, including geographic location, demographics, and more. It would be interesting to hypothesize why households in different geographic locations use more or less energy, and which demographics affect household energy consumption. The data provided show that energy consumption varies dramatically by geographic location, race, and income.

First, we will analyze the data from 2005 [2] to observe geographic variance, determining which region uses the most energy per household and how large the standard deviation of the values is. The same will be done for the data on race and income. To determine which factor drives household energy consumption the most, we will compare the variability associated with each factor: the higher the variability, the more strongly consumption depends on that factor.

According to the data, the Northeast shows the highest energy consumption per household, at 122.2 million Btu. To verify this, we will conduct a statistical analysis that yields the percentile of energy consumption associated with the northeastern region, giving us an idea of how much greater Northeast consumption is compared to the other regions of the United States. For household consumption by income, households earning $100,000 or more consume the most energy compared to lower income levels, at 130.5 million Btu per household. The trend appears to be that the greater the income, the more energy consumed; we can test the linearity of this trend. The data on consumption by race show that non-Hispanic white households consume the most energy, at 99.9 million Btu per household. However, consumption does not appear to vary as much by race as it does by geographic location and income.

For each factor, we will attempt to determine whether the data fit a normal distribution or are skewed. We will then find the mean and standard deviation of the data and, if the data are approximately normal, calculate z-scores based on these values. Finally, for each factor, we will calculate the probability that household energy consumption falls at or above a given value.
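
As an illustration of these steps (normality check, z-score, and probability), the snippet below uses a small set of made-up per-household consumption figures rather than the actual RECS tables in [2]:

```python
import numpy as np
from scipy import stats

# Hypothetical per-household consumption values (million Btu) for a set of
# sampled households; real figures would come from the 2005 RECS tables [2].
consumption = np.array([95.1, 130.4, 88.7, 104.5, 140.2, 99.9, 76.3,
                        118.0, 131.5, 92.4, 109.8, 85.6, 126.7])

mean, sd = consumption.mean(), consumption.std(ddof=1)

# Rough normality check: a small p-value suggests departure from normality.
stat, p_norm = stats.shapiro(consumption)
print(f"Shapiro-Wilk p = {p_norm:.3f}")

# If approximately normal: z-score and percentile for a value of interest,
# e.g. the Northeast figure of 122.2 million Btu per household.
z = (122.2 - mean) / sd
print(f"z = {z:.2f}, percentile = {stats.norm.cdf(z):.1%}")

# Probability a household exceeds 130 million Btu under the fitted normal.
print(f"P(consumption > 130) = {1 - stats.norm.cdf(130, mean, sd):.1%}")
```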

After the locations, races, and income levels with the highest and lowest household energy consumption are identified, we can hypothesize why this is the case. For example, certain locations have higher temperatures that may result in higher energy consumption for cooling. For race, differences may reflect varying levels of concern with sustainability or education about energy use. Households with higher incomes may use more energy because of the additional appliances and other commodities that more disposable income allows. Lastly, these factors may be interdependent, so a more in-depth study would look at the ways they affect one another.

References

[1] Swartz, S., & Oster, S. (2010, July 18). China tops U.S. in energy use. The Wall Street Journal. Retrieved from http://online.wsj.com/article/SB10001424052748703720504575376712353150310.html

[2] Energy Information Administration. (2005). 2005 residential energy consumption survey: Energy consumption and expenditures tables. Retrieved from http://www.eia.gov/consumption/residential/data/2005/c&e/summary/pdf/alltables1-15.pdf

Correlation between SAT Scores, Graduation Rate, and SAT Participation Rates

Daniel, Peter, William

The SAT has long been the standard by which universities and students measure readiness for college, and there is a large amount of data concerning the test and the high school students who make up the main segment of the population taking it. The SAT is normally integral to the collegiate entrance process and is the bane of many a high school student. The test is an indicator of performance and scholastic achievement, and, while varying numbers of students take it, it provides an idea of how many students are looking at college. A high school’s SAT scores measure the readiness of its students for college and can determine funding. However, the majority of private and public education studies focus on the entire teenage population and the ever-important graduation rate. This rate has been used as the benchmark of school performance by legislation and public perception, and it can elevate a school or force it into reorganization. The graduation rate, therefore, is the most important statistic regarding our public school system, and translating it into enrollment in higher education is a focus of legislative efforts. Combining both of these facets offers a large window into a segment of the population that is the focus of numerous social studies: the American teenager.

To find the SAT data, go to www.mathforum.org/workshops/sum96/data.collections. Click on The Data Library, then Data Sets, and look for the 14th link from the top, titled “SAT Scores by State (1990-2004),” to download the Excel file of the data. The download link directly below it, titled “1984-1993 Teen Statistics,” contains the other data set we used. The SAT data from 1990 through 2004 give the participation rate, along with mean math and verbal scores, of students by state; they also include national average scores and participation rates from 1991 through 2004. Our other data set tabulates a variety of information by state, such as high school graduation rate, juvenile violent crime rate, and median income. We hope to use the overlapping years in the two sets to look for correlations between a subset of the variables.

The data suggest a correlation between the participation rate of high school students and mean SAT I scores on both the verbal and math portions. At first glance, there appears to be a negative correlation between participation rate and mean SAT score. One explanation for this apparent phenomenon is that only the best-prepared students in low-participation states take the SAT, raising the average compared to high-participation states, where more ill-prepared students take the test. We will attempt to show, with confidence intervals and hypothesis testing, that there is indeed a negative correlation between participation rate and mean SAT score that is not the result of random chance.

The data also suggest a correlation between the high school graduation rate and SAT scores: there appears to be a positive correlation between graduation rate and mean SAT score. One explanation for this apparent correlation is that states with higher graduation rates produce more well-prepared students who take the SAT. We will again attempt to show, with confidence intervals and hypothesis testing, that there is a positive correlation between graduation rate and mean SAT score that is not the result of random chance.
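
Both correlation questions can be handled with the same significance test on the Pearson correlation coefficient. The sketch below uses simulated state-level values rather than the Math Forum data, and converts the two-sided p-value from `scipy.stats.pearsonr` into the one-sided alternatives described above:

```python
import numpy as np
from scipy import stats

# Hypothetical state-level values standing in for the Math Forum data:
# SAT participation rate (%) and mean combined SAT score per state.
rng = np.random.default_rng(2)
participation = rng.uniform(5, 80, 50)
mean_score = 1100 - 2.5 * participation + rng.normal(0, 40, 50)

# Pearson correlation and its two-sided p-value (H0: rho = 0).
r, p_two_sided = stats.pearsonr(participation, mean_score)
# One-sided p-value for HA: rho < 0 (negative correlation).
p_one_sided = p_two_sided / 2 if r < 0 else 1 - p_two_sided / 2
print(f"r = {r:.3f}, one-sided p = {p_one_sided:.2e}")

# The same machinery applies to graduation rate vs. mean score,
# with HA: rho > 0 instead.
```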

States will want to improve their mean SAT scores, since this reflects well on their education systems and encourages more funding. If either of these correlations can be shown not to be the result of random chance, then states will have a general idea of how to improve their mean scores (if the correlations hold: reduce the number of students taking the test and raise the graduation rate). This knowledge would greatly aid policymakers in education departments across the U.S.

References:

Drexel University. (2008, August 19). SAT Scores by State (1990-2004). Math Forum. Retrieved March 25, 2011, from http://mathforum.org/workshops/sum96/data.collections/datalibrary/data.set6.html

Drexel University. (2008, August 19). 1984-1993 Teen Statistics. Math Forum. Retrieved March 25, 2011, from http://mathforum.org/workshops/sum96/data.collections/datalibrary/data.set6.html

Is the NFL now a passing league?

Seth Friedman, Xiongfei Gao, Joseph Newman

In recent years, it has been claimed that offensive production and statistics in the NFL have changed considerably. Articles suggesting the NFL is becoming a “passing league” (Clary, 2011) are becoming more and more frequent. Many football analysts stress that having a franchise quarterback is more important than relying on a running back or defense, because that is the way the league is now. The question to ask, however, is: is it just the number of passing yards that has been increasing over the years, or is this a general trend across all offensive statistics?

To examine this, it is helpful to use a database containing offensive statistics for the past several decades. We will use the NFL.com statistics database and consider offensive data from 1966 (the season of the first Super Bowl) through 2011. We will analyze both passing yards per game and rushing yards per game to determine whether one has increased, both have increased, or neither has increased. We use yards per game instead of total yards because the NFL season grew from 14 to 16 games in 1978. For each statistic (passing or rushing), we will perform two one-tailed hypothesis tests, each using the median year of 1988 in the null hypothesis: one will test whether the pre-1988 average is less than the 1988 average, and the other will test whether the post-1988 average is greater than the 1988 average.

Specifically, we will use H0: μ = the 1988 average (219.4 yards per game for passing and 69.0 for rushing) for both one-tailed tests, with HA: μ < the 1988 average for the pre-1988 test and HA: μ > the 1988 average for the post-1988 test. If we reject both null hypotheses, we have evidence of an increasing trend in the data; if we fail to reject them, we fail to show such a trend. We will examine the top 20 NFL quarterbacks and running backs rather than all of them; this should still produce an accurate result for each one-tailed test, since 20 players times approximately 20 years for each test gives around 400 data points, so the Central Limit Theorem applies.
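
A sketch of these two one-tailed tests, run on simulated yards-per-game values rather than the actual NFL.com data; the 219.4 figure is the 1988 passing average quoted above, and the helper simply converts SciPy's two-sided p-value to the stated one-sided alternatives.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical passing yards/game figures for top-20 players across seasons;
# real values would come from the NFL.com statistics database.
pre_1988 = rng.normal(205, 25, 20 * 20)    # 20 players x ~20 pre-1988 seasons
post_1988 = rng.normal(235, 25, 20 * 20)   # 20 players x ~20 post-1988 seasons
mu_1988 = 219.4                            # 1988 passing average from the proposal

def one_tailed(sample, mu0, tail):
    """One-sample t-test of H0: mu = mu0 against a one-sided alternative."""
    t_stat, p_two = stats.ttest_1samp(sample, mu0)
    if tail == "less":
        p = p_two / 2 if t_stat < 0 else 1 - p_two / 2
    else:  # "greater"
        p = p_two / 2 if t_stat > 0 else 1 - p_two / 2
    return t_stat, p

t1, p1 = one_tailed(pre_1988, mu_1988, "less")     # HA: pre-1988 mean < 219.4
t2, p2 = one_tailed(post_1988, mu_1988, "greater") # HA: post-1988 mean > 219.4
print(f"pre-1988:  t = {t1:.2f}, p = {p1:.2e}")
print(f"post-1988: t = {t2:.2f}, p = {p2:.2e}")
# Rejecting both nulls would support an increasing trend in passing yards/game.
```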

Other methods for examining the data will include building 95% confidence intervals around the sample averages, and we can also construct several linear plots to check our conclusions. For example, we can plot passing yards per game versus year, rushing yards per game versus year, and rushing yards per game versus passing yards per game. In each of these plots, we can use the tools learned in class to determine whether a linear relationship exists between the variables. Through all of this, we believe we can determine whether the offensive style of play has changed.

 

References:

Clary, Jason. (2011, June 9). NFL: Has the NFL really turned into a pass-first league? Bleacher Report. Retrieved from http://bleacherreport.com/articles/729496-nfl-has-the-nfl-really-turned-into-a-pass-first-league

NFL.com. NFL Statistics. Retrieved from http://www.nfl.com/stats/player

Red Light Cameras

Megan Covington
Kasey Hill

Statistical Analysis of Red Light Cameras in Texas Intersections

Over 30,000 fatal crashes occur in the United States each year (NHTSA 2009).  Many of these occur at intersections, specifically when drivers fail to stop at a red light.  Several cities now use red light cameras to automatically ticket those who run red lights.  These cameras identify the license plate numbers of the offending vehicles and mail tickets to the registered drivers.  The stated purpose of these cameras is to improve public safety and decrease the number of fatal crashes by deterring motorists from running red lights and causing accidents; however, critics, including AAA, think that the cameras are installed merely to generate revenue for local and state governments (Batista 2010).  For this project, we will set out to determine whether red light cameras actually decrease the number of crashes and the likelihood of injury for crashes that occur at an intersection.

For this application project, we intend to use data gathered by the Texas Department of Transportation (2011) regarding the number of accidents at specific intersections before and after the installation of red light cameras at those intersections.  The population is all intersections with red lights, and the sample is chosen intersections in Texas.  We assume that all motorists are informed of the presence of the red light cameras at each of the specified intersections, that each motorist who runs a red light is captured by the camera and given a ticket, and that no other change affects the intersection except the installation of red light cameras.  Once the data is collected, we will block for the crash type: fatal, injury, and non-injury.  We will compute the percent change in the number of each type of accident before and after installation of the red light cameras at each intersection.  We will also look at the percent change in the total number of crashes before and after installation of the red light cameras at each intersection.

Assuming that the percent changes at each intersection form a normal distribution within each block, we can conduct a hypothesis test for each block (fatal, injury, non-injury, total) with the null hypothesis μ = 0, where μ represents the mean percent change in the number of crashes before and after the installation of the red light cameras.  The alternative hypothesis will be μ < 0, meaning that the number of accidents at an intersection has decreased following the addition of red light cameras.
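
A minimal sketch of one such block-level test and its companion confidence interval, using invented percent-change values in place of the TxDOT figures:

```python
import numpy as np
from scipy import stats

# Hypothetical percent changes in total crashes at sampled intersections
# after camera installation (negative = fewer crashes); real values would
# be computed from the TxDOT annual reports (Texas DOT, 2011).
pct_change = np.array([-12.0, -5.5, 3.1, -20.4, -8.7, -1.2, -15.3,
                       6.4, -9.8, -3.3, -11.1, -7.6])

# H0: mean percent change = 0   vs.   HA: mean percent change < 0
t_stat, p_two_sided = stats.ttest_1samp(pct_change, 0.0)
p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
print(f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.3f}")

# 95% confidence interval for the mean percent change in this block.
mean, se = pct_change.mean(), stats.sem(pct_change)
lo, hi = stats.t.interval(0.95, df=len(pct_change) - 1, loc=mean, scale=se)
print(f"Mean change {mean:.1f}%, 95% CI ({lo:.1f}%, {hi:.1f}%)")
# The same test would be repeated for the fatal, injury, and non-injury blocks.
```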

We shall examine the possible Type I and Type II errors for these hypothesis tests.  Type I error would occur when there is no percent change in the number of crashes before and after the red light cameras are added, but we reject the null hypothesis anyway.   This type of error could lead local and state governments to install the cameras thinking that they effectively reduce the number of crashes when in fact, they have no effect on the accident rate.  Type II error would be that we do not reject the null hypothesis when in fact there is a decrease in the number of crashes.  This type of error would result in governments choosing not to install red light cameras when they could reduce the number of accidents and help save lives.   95% confidence intervals for the mean percent change in accidents for each block will be computed and examined.

The results of this statistical analysis could demonstrate the effectiveness of red light cameras and support governments’ argument that they improve safety, countering the theory that they are installed merely to increase revenue.  If a larger amount of data were collected from across the country, further statistical analysis could determine whether red light cameras actually are effective in helping to decrease the number of crashes and thus save lives.

References:

Batista, Elysa.  (2010, May 13).  Crist signs Fla. bill legalizing red light cameras.  Naples Daily News.  Retrieved from http://www.naplesnews.com/news/2010/may/13/crist-signs-fla-bill-legalizing-red-light-cameras/

National Highway Traffic Safety Administration (NHTSA).  (2009).  Fatality Analysis Reporting System (FARS) Encyclopedia [data file]. Retrieved from http://www-fars.nhtsa.dot.gov/Main/index.aspx

Texas Department of Transportation.  (2011).  Red Light Cameras – Annual Data Reports [data file]. Retrieved from http://www.txdot.gov/safety/red_light_reports.htm

Where to Live to Be Happy

Jack Minardi and Chris Lioi

Many notions of happiness exist, and most of them are subjective and hard to quantify.  However, many people have proposed various schemes for quantifying happiness.  Many of these are based on surveys and self-assessment questionnaires that aim to assign a number on a certain scale indicating how happy a person is.  What sorts of factors affect happiness?  Or, a weaker but more tractable question: what sorts of factors affect a particular numerical quantification of happiness?  It may be that metrics based on different characterizations of happiness are affected by different things.

Our proposal is to find what correlations (if any) exist for a certain metric or metrics.  In particular, there is ample data offered by the World Database of Happiness, whose website [1] is “an ongoing register of scientific research on the subjective enjoyment of life.”  It lists several metrics of happiness by nation or geographical region. There is also another measure, Gross National Happiness (GNH) [2], named to parallel the concept of Gross National Product and claimed to be a better measure of a country’s success than GDP. Either of these databases may be used. Since the data presented in the World Database of Happiness does not seem to be easily downloadable, we will write a Python script to scrape the site and collect the needed data.  We will correlate these metrics with various other data for the world’s nations, such as population, GDP, average age, average lifespan, and any other variables that appear to be of consequence.  These other data may be obtained easily from any number of public data sources, such as the CIA World Factbook [3].  Again, if the data are not presented in an easily downloadable format, such as an Excel file, our plan is to write simple scripts to scrape the website for the relevant data.  The data analysis, in the form of linear regression using gradient descent, will be done in either R or MATLAB. We plan on writing the algorithms ourselves to get a better understanding of how they operate.
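
Although the regression itself is planned for R or MATLAB, the following Python sketch shows the gradient-descent approach on simulated country-level data (the predictors and coefficients are made up), with a closed-form least-squares fit as a sanity check:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical country-level predictors (standardized) and a happiness score;
# real inputs would be scraped from the World Database of Happiness and
# merged with CIA World Factbook figures.
n = 150
X = rng.normal(size=(n, 3))            # e.g. log GDP per capita, lifespan, median age
y = 5.5 + X @ np.array([0.8, 0.4, -0.2]) + rng.normal(0, 0.5, n)

# Batch gradient descent for least-squares linear regression.
Xb = np.column_stack([np.ones(n), X])  # add intercept column
theta = np.zeros(Xb.shape[1])
lr, steps = 0.01, 5000
for _ in range(steps):
    residuals = Xb @ theta - y
    grad = Xb.T @ residuals / n        # gradient of (1/2n) * ||Xb @ theta - y||^2
    theta -= lr * grad

print("gradient-descent coefficients:", np.round(theta, 3))
print("closed-form check:            ",
      np.round(np.linalg.lstsq(Xb, y, rcond=None)[0], 3))
```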

In the end, we hope to present statistics that show how various metrics relate to happiness and to gain better insight into what makes us happy. Using the tools learned in class, we will be able to show how strongly the different measures are correlated and which factors seem to contribute most to overall happiness.


[1] http://www1.eur.nl/fsw/happiness/

[2] http://en.wikipedia.org/wiki/Gross_national_happiness

[3] https://www.cia.gov/library/publications/the-world-factbook/

Big Bully on Campus: Is Vanderbilt Stealing Your Lunch Money?

Curtis Northcutt
Peter York
Hayden Kelly

MATH 216 – Statistics, Dr. Derek Bruff
Statistics Project Proposal
March 26, 2012

 


Does the average student lose money on a given meal purchased with the meal plan at Rand? This project compares the average cost paid per meal under a meal plan with the average price of a single meal at Rand. While we note it is commonly held that the price of a given meal does not accurately reflect the market value of the food purchased (i.e., on-campus food is generally believed to be overpriced), analyzing whether that is actually the case is beyond the scope of the project. Rather, our goal is to advise students who have decided to eat on campus whether to purchase a meal plan next semester or to use Commodore Cash for food purchases, based on the results of our experiment. A successful statistical analysis of the population of Vanderbilt students who eat at Rand will thus allow such students to save money on their meal expenses next semester.

We will watch the registers in Rand on a Monday and Tuesday, gathering samples consisting of three pieces of data: (1) the price of the meal, (2) the number of remaining meals on the purchaser’s plan, and (3) the gender of the person purchasing the meal.  Because we do not know the distribution of meal prices, we will gather 100-200 samples so that the sample mean can be treated as approximately normal. If a student does not use a meal plan, we will not gather data for that student, as they are not in the population we are considering. We will gather samples at breakfast and lunch only, because if we gathered data on the only night that Rand serves dinner, Tuesday, our data would likely be skewed by the homogeneity of prices on Tortellini Tuesday.  By gathering data on Monday and Tuesday, we will be able to ascertain the percentage of students with each type of meal plan (8, 14, 19, or 21 meals) from the number of remaining meals. Since all Vanderbilt meal plans reset at 12:00 a.m. on Monday, students cannot have used enough meals by Monday afternoon for their “meals left” to drop below their meal plan category. This project could be extended to other variables by also answering the questions: “Do males or females lose more money on the meal plan?” and “Would our results differ if we sampled from the Branscomb Munchie Mart instead?”

We will calculate the average cost per meal for each plan based on the prices provided by Vanderbilt Dining [1]. We will then construct a discrete probability distribution, where x = the average cost per meal for a given meal plan type and P(x) = the proportion of students who have that meal plan. The expected value of this distribution is the average cost per meal across all Vanderbilt students on a meal plan.

We will then perform a hypothesis test to determine whether the average price and average cost of a given meal are in line with each other.  We will let H0: μ = E(x), where μ is the average price per meal and E(x) is the average cost per meal calculated above. This choice of null hypothesis stems from Vanderbilt’s assertion that the meal plan’s average cost approximates your expenditures. We will let HA: μ < E(x), i.e., the average price per meal is less than the average cost per meal. Assuming the sampling distribution is approximately normal, we will conduct the hypothesis test to determine whether there is statistically significant evidence to reject H0 in favor of HA at α = 0.05.
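
A sketch of the calculation and test described in the last two paragraphs; the per-meal costs, plan shares, and register prices below are invented placeholders, not Vanderbilt Dining's actual figures.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs: per-meal cost of each plan (plan price / meals covered)
# and the share of sampled students on each plan; real numbers would come from
# Vanderbilt Dining's posted prices [1] and our register counts.
cost_per_meal = np.array([10.50, 9.25, 8.40, 8.10])   # 8, 14, 19, 21 meal plans
plan_share = np.array([0.20, 0.35, 0.30, 0.15])       # must sum to 1

expected_cost = float(np.dot(cost_per_meal, plan_share))   # E(x)
print(f"E(x) = ${expected_cost:.2f} per meal")

# Hypothetical observed register prices for meals bought on a meal plan.
rng = np.random.default_rng(5)
prices = rng.normal(8.2, 1.5, 150)

# H0: mu = E(x)   vs.   HA: mu < E(x), at alpha = 0.05.
t_stat, p_two_sided = stats.ttest_1samp(prices, expected_cost)
p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
print(f"t = {t_stat:.2f}, one-sided p = {p_one_sided:.4f}")
# Rejecting H0 would suggest the average meal price falls short of the
# average per-meal cost of the plan.
```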

If we fail to reject the null hypothesis, we will advise students who eat on campus to stick with the meal plan; if we reject it, we will advise students to purchase their meals with Commodore Cash and save the difference between the cost of the meal plan and the price of dining hall food. While it would be more exciting to reject the null hypothesis, failing to reject it would be gratifying in its own way: it would suggest the university is not taking advantage of students.

References

[1] Vanderbilt University. VU Meal Plans. Retrieved from http://www.vanderbilt.edu/dining/vumealplans.php


Moore’s Law Holding Steady?

Authors:  Graham G.,  Colin T.,  Richard W.

Technology is developing at a very rapid rate.  Gordon Moore proposed in the mid-1960s that the number of transistors that can be placed on an integrated circuit at a reasonable cost doubles every 18 to 24 months. This trend has been observed not only in transistor counts, but also in processor speed, memory capacity, and even pixel densities in digital cameras. However, some now fear that we are approaching the physical limits of how small we can make these technologies while still maintaining the same reliability. How long can this trend of seemingly unbounded growth continue?

These technological increases matter to many different parts of the world, including business, education, communication, the “information grid,” and the digital divide between first- and third-world countries. In industry, companies need to forecast what technologies will be available to them when they develop new products down the line. For instance, if a company plans to release a mobile device four years from now, it needs to draw up specifications based on the best components it will be able to find when it goes into manufacturing, not the best parts currently on the market.  Educators need to be aware of new technologies as they come into existence, since they must teach students how to use them in order to produce an efficient workforce.  The divide between technological capabilities in first-world and third-world countries is currently quite large, but as technology gets less expensive, will third-world countries continue to lag behind, or will they be able to catch up?

We are interested in testing whether Moore’s Law holds in several different tech sectors.  Does it hold for processing power? For memory capacity? For pixel densities in cameras? All of these questions relate back to whether Moore’s Law holds, because these are the areas the law is supposed to affect. By looking at data over many years for each of these traits, we can compare how the number of transistors on a chip translates into the technologies that number is supposed to improve.

Data for these questions should be relatively easy to obtain.  It is not difficult to go online and find historical specifications and prices for different processors, hard drives, and cameras.  What is difficult is deciding what counts as “the technology” for a given year: for any given year there will be many processor and hard drive models on the market, so pinning down that year’s “transistor count” or “cost per megabyte” may be much harder.  We will need some method for averaging across the different models available in a given year.

For each question, we will compare the data on processing power, memory capacity, and pixel density with the number of transistors on an integrated circuit to see whether the trend holds. We will perform a two-sided test for each technology. Our null hypothesis will be that Moore’s Law holds (a doubling time within the stated 18-to-24-month range), since the law is merely an estimate of the rate of technological advancement and we can allow some tolerance if it does not hold precisely. Our alternative hypothesis will be that the law does not hold, i.e., that it either overestimates or underestimates our ability to continue this growth.
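
One concrete version of this test fits log2(transistor count) against year and compares the implied doubling time with a 24-month benchmark; the (year, count) pairs below are rough stand-ins for the data we would actually collect.

```python
import numpy as np
from scipy import stats

# Hypothetical (year, transistor count) pairs standing in for the real
# chip data; counts roughly double every ~2 years here.
years = np.array([1995, 1997, 1999, 2001, 2003, 2005, 2007, 2009, 2011])
transistors = np.array([5.5e6, 9.5e6, 2.4e7, 4.2e7, 2.2e8, 3.0e8,
                        8.2e8, 1.9e9, 2.6e9])

# Fit log2(count) vs. year; the slope is doublings per year.
slope, intercept, r, p, se = stats.linregress(years, np.log2(transistors))
doubling_months = 12.0 / slope
print(f"estimated doubling time: {doubling_months:.1f} months (r = {r:.3f})")

# Two-sided test of H0: doubling time = 24 months (slope = 0.5 doublings/yr).
t_stat = (slope - 0.5) / se
p_val = 2 * stats.t.sf(abs(t_stat), df=len(years) - 2)
print(f"H0: 24-month doubling  ->  t = {t_stat:.2f}, p = {p_val:.3f}")
```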

We will also examine the errors that go along with our hypothesis tests. Type I Error would be concluding Moore’s Law doesn’t hold when it actually does. Type II Error would be concluding that Moore’s Law does hold when in fact it doesn’t.

References:

1. Long, Phillip D.  (May 2002).  Moore’s Law and the Conundrum of Human Learning.  Retrieved from http://net.educause.edu/ir/library/pdf/erm0230.pdf

2.  Intel.  (February 2003).  Moore’s Law: Raising the Bar.  Retrieved from http://download.intel.com/museum/Moores_Law/Printed_Materials/Moores_Law_Backgrounder.pdf

3.  McCallum, John C. (2012). Memory Prices 1957 to 2012. Retrieved from http://www.jcmit.com/memoryprice.htm

4.  Wikipedia.  Moore’s Law.  Retrieved from http://en.wikipedia.org/wiki/Moore's_law

Go Dores: Final Shot Strategy Proposal

By Jiacheng Ren and Haolin Wang

3/25/2012

All Vanderbilt basketball fans have faced a situation like this: the Vanderbilt Commodores are two points behind late in the second half, and we have the ball with only 15 seconds left for the final shot. Should we go for a two-point basket to tie the game, or shoot a three to end it now?

To answer this question, we must know the probability of scoring a two-point shot and a three-point shot. On one hand, it is relatively easier to make a two-point shot than a three-pointer, because it is easier to score closer to the basket and the defense may be more focused on preventing a three-point attempt. However, even if we make the two-point shot, we still have to play an overtime period, and we have a very poor overtime record, which would largely undo the value of making the two-point shot. On the other hand, we have some of the top three-point shooters in the country. Then again, the chance of scoring a three might be slim if our opponent puts more attention on defending the three-point line. We might be better off letting John Jenkins or Jeffery Taylor win the game right away.

In order to find the best chance of winning the game, we need to know the key factors that could influence the result. According to Bill Hanks, a basketball coach with 32 years of experience, “the final shot by a team is dictated by five factors.” The first is the time on the clock, which dictates how much time the shooter has for the final shot and how complicated the final play can be; we will ignore this factor to simplify our analysis. The second factor is the foul situation, which we will also simplify by assigning a fixed probability of getting fouled based on Hanks’s experience: “The closer a player is to the basket, the higher the chance of a foul.” The third factor is the players. We assume that in this scenario the players on the floor are Jenkins, Taylor, Ezeli, Tinsley, and Goulbourne; the table below shows their shooting statistics. The fourth factor is the placement of the ball and the fifth is the defense; we will ignore these two by making assumptions in the scenario. In addition, we add another key factor, because we are interested not only in making the last shot but in winning the game: our overtime performance. By gathering data from the web, we can determine the rankings of the two teams and our expected chance of winning in overtime. Sadly, we lost all three of our overtime games this season.

Player              FG%     FT%     3P%
John Jenkins        .474    .837    .439
Jeffery Taylor      .493    .605    .423
Festus Ezeli        .539    .604    .000
Brad Tinsley        .474    .855    .415
Lance Goulbourne    .456    .680    .309

First, we can apply a hypothesis test to our overtime winning chance: H0 will be that we have a 50% chance of winning in overtime, and HA will be that the chance is less than 50%. For the shooting, we will run simulations that, for example, have Jenkins attempt 100 three-pointers or have Ezeli attempt 100 close-range shots. We will also simulate whether we draw a foul; the approximate chance of a foul can be estimated from ESPN’s database. Finally, we will combine all of these simulations to estimate the expected outcome of each final-play strategy.
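
A first-pass sketch of such a simulation, comparing "shoot the three" against "tie and go to overtime." The shooting percentages come from the table above (FG% is used as a proxy for a close-range two), while the overtime win probability and the decision to ignore fouls and offensive rebounds are simplifying assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 100_000  # simulated end-of-game possessions per strategy

# Assumed inputs: shot probabilities from the stats table above, and an
# overtime win chance well below 50% given the 0-3 overtime record.
P3_JENKINS = 0.439   # Jenkins from three
P2_EZELI   = 0.539   # Ezeli near the basket (FG% as a proxy)
P_OT_WIN   = 0.35    # assumed overtime win probability

def simulate(strategy: str) -> float:
    """Return the fraction of simulated end-of-game possessions won."""
    if strategy == "three":
        # Make the three -> win outright; miss -> lose.
        return np.mean(rng.random(N) < P3_JENKINS)
    # Make the two -> overtime; then win overtime with probability P_OT_WIN.
    made_two = rng.random(N) < P2_EZELI
    won_ot = rng.random(N) < P_OT_WIN
    return np.mean(made_two & won_ot)

for s in ("three", "two"):
    print(f"go for the {s}: estimated win probability {simulate(s):.3f}")
```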

 

References:

www.espn.com

http://voices.yahoo.com/basketball-basics-taking-last-shot-685849.html