Here's your final reading assignment. Read **Section 6.2** in your textbook (**omitting Section 6.2.4**) and answer the following questions by noon, Monday, April 16th. Be sure to log in to the blog (using the link near the bottom of the sidebar) before leaving your answers in the comment section below.

- Conducting a two-sample t-test requires that the underlying populations for both samples are normal. What are two methods we've seen for checking this normality assumption?
- Suppose you recruit 10 Vanderbilt undergraduates at random to sample the coffee at Starbucks and Panera, rating each store's coffee on a scale of 1 to 5. The average of the differences in the 10 pairs of ratings is 0.7 in favor of Starbucks, with a standard deviation of 0.5. Is this sufficient evidence to conclude that all Vanderbilt students prefer Starbucks coffee to Panera coffee? (Assume that the underlying populations here are normal.)
- What's one question you have about the reading?

1. Regression plot (horizontal) and normal probability plot (linear).

2.

H0: u-diff = 0

HA: u-diff >0

df = 10 -1 = 9

t-value = (0.7 - 0) / (0.5/sqrt(10)) = 4.43

Reject the null hypothesis at the 0.001 significance level.
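The calculation in this answer can be reproduced with a short Python sketch. It treats the 10 rating differences as a single sample (SE = s/sqrt(n), T = (mean difference - 0)/SE, df = n - 1). The function names are illustrative, and the p-value is approximated by numerically integrating the t density with Simpson's rule. That integration is an assumption made so the sketch runs on the standard library alone; `scipy.stats.t.sf` would return the same tail area directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 10_000) -> float:
    """Approximate P(T > t) using Simpson's rule on the density over [0, |t|]."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)  # symmetry of the t density
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above 0


def paired_t_test(mean_diff: float, sd_diff: float, n: int, null: float = 0.0):
    """One-sample t-test on paired differences, computed from summary statistics."""
    se = sd_diff / math.sqrt(n)      # standard error of the mean difference
    t = (mean_diff - null) / se      # test statistic
    df = n - 1
    p_one = t_upper_tail(t, df)      # HA: mu_diff > null
    p_two = 2 * min(p_one, 1 - p_one)
    return se, t, df, p_one, p_two


se, t, df, p_one, p_two = paired_t_test(0.7, 0.5, 10)
print(f"SE = {se:.4f}, T = {t:.3f}, df = {df}")
print(f"one-sided p = {p_one:.5f}, two-sided p = {p_two:.5f}")
```

With the question's numbers the first line prints `SE = 0.1581, T = 4.427, df = 9`, and the one-sided p-value falls below 0.001, consistent with rejecting H0 at the 0.001 level.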

1) Using probability plots and ensuring the population is not too skewed.

2) P(t> 0.7/0.5) = P(t>1.4)=0.0975 with df=9 --> Fail to reject null at 5% sig level

3) No questions

1) Using a normal probability plot with theoretical quantiles on the x-axis and normalized experimental values on the y-axis, and confirming that the points are roughly linear. Another way to confirm this is to check that 66% of the data is within +-1 sigma of the mean, 95% within +-2 sigma, and 99% within +-3 sigma.
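The "percent within k standard deviations" check is usually stated as the 68-95-99.7 rule: for a normal distribution about 68% of values lie within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD. The Python sketch below applies that check to two simulated samples, one normal and one right-skewed. The simulated data and the function name are hypothetical, chosen only for illustration.

```python
import random
import statistics


def empirical_rule_fractions(data):
    """Fraction of observations within 1, 2 and 3 sample SDs of the sample mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return [sum(abs(x - m) <= k * s for x in data) / len(data) for k in (1, 2, 3)]


random.seed(1)
normal_sample = [random.gauss(0, 1) for _ in range(5000)]      # hypothetical normal data
skewed_sample = [random.expovariate(1.0) for _ in range(5000)]  # hypothetical right-skewed data

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    f1, f2, f3 = empirical_rule_fractions(sample)
    print(f"{name}: {f1:.3f} within 1 SD, {f2:.3f} within 2 SD, {f3:.3f} within 3 SD")
```

For the skewed sample the 1-SD fraction lands well above 68%, which is the kind of mismatch this check is meant to catch. With only 10 observations, as in question 2, these fractions are too coarse to be informative.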

2) Ho: mean_starbucks = mean_panera; Ha: mean_starbucks > mean_panera

SE = sqrt(2s^2/n) = sqrt(2(.5)^2/10) = .316

t = (.7-0)/.316 = 2.21; using df = 9 gives p < .05, which means we have pretty good evidence that students prefer Starbucks coffee to Panera coffee.

3) Does the difference method used above require adjusting the SE for having two samples? I multiplied it by sqrt(2) to adjust it, and based on the book that looks correct, but I'm not sure.

1. Conducting a two-sample t-test requires that the underlying populations for both samples are normal. What are two methods we've seen for checking this normality assumption?

Checking the sample size

Normal probability plot

2. Suppose you recruit 10 Vanderbilt undergraduates at random to sample the coffee at Starbucks and Panera, rating each store's coffee on a scale of 1 to 5. The average of the differences in the 10 pairs of ratings is 0.7 in favor of Starbucks, with a standard deviation of 0.5. Is this sufficient evidence to conclude that all Vanderbilt students prefer Starbucks coffee to Panera coffee? (Assume that the underlying populations here are normal.)

P = (.7-0)/.5 = 1.4

T(P) = .05

No

3. What's one question you have about the reading?

Where does the t* for the confidence interval come from?
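In general, t* is the quantile of the t distribution with the chosen degrees of freedom (here n - 1 = 9) that leaves (1 - confidence)/2 of the area in each tail. A t-table simply lists these values. The Python sketch below finds t* by bisection on a numerically integrated upper-tail area. The bisection and Simpson integration are assumptions made so the sketch needs only the standard library; they are not how the printed tables were necessarily built. It also shows t* shrinking toward the normal z* = 1.96 as df grows, which is one way to see the difference between t and Z.

```python
import math
from statistics import NormalDist


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > t) for T ~ t(df) and t >= 0, using Simpson's rule on [0, t]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - total * h / 3


def t_critical(conf: float, df: int) -> float:
    """t* with P(-t* < T < t*) = conf, found by bisection on the upper-tail area."""
    target = (1 - conf) / 2  # area left in each tail
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid  # too much area beyond mid, so t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2


se = 0.5 / math.sqrt(10)  # SE of the mean difference in the coffee question
t_star = t_critical(0.95, 9)
print(f"t* for 95% confidence, df = 9: {t_star:.3f}")
print(f"95% CI for the mean difference: {0.7 - t_star * se:.3f} to {0.7 + t_star * se:.3f}")

z_star = NormalDist().inv_cdf(0.975)
for df in (9, 30, 100):
    print(f"df = {df:>3}: t* = {t_critical(0.95, df):.3f} vs z* = {z_star:.3f}")
```

For df = 9 this should give t* of about 2.262, the value in the 95% two-tail column of the table.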

1. We can use the probability plot to check whether the points form a straight line. We can also check that 66% of the data is within 1 sigma of the mean, 95% within 2 sigma, and 99% within 3 sigma.

2. SE = 0.5/sqrt(10) * sqrt(2) = 0.22

T = 0.7/0.22 = 3.18

Using df = 9 gives p < 0.05.

Therefore, we have strong evidence that students prefer Starbucks coffee.

3. I'm not sure whether I have to multiply by sqrt(2) above.

1. We can check to see if the distribution is bell-shaped and symmetric around the mean. We can also make a normal probability plot and see if the data resembles a straight line.
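A normal probability plot pairs each sorted observation with the normal quantile expected at its rank. If the data are nearly normal, the points fall close to a straight line. The Python sketch below builds those points and summarizes straightness with the correlation of the plot, which is an assumed numerical stand-in for eyeballing the graph. It also assumes the plotting positions (i - 0.5)/n, one common choice. The data are simulated and hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pairs (theoretical normal quantile, sorted observation) for a normal probability plot."""
    n = len(data)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(data)))


def plot_correlation(points):
    """Pearson correlation of the plotted points; values near 1 mean a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(2)
normal_sample = [random.gauss(50, 10) for _ in range(200)]      # hypothetical normal data
skewed_sample = [random.expovariate(0.1) for _ in range(200)]   # hypothetical right-skewed data
r_normal = plot_correlation(normal_plot_points(normal_sample))
r_skewed = plot_correlation(normal_plot_points(skewed_sample))
print(f"normal sample: r = {r_normal:.3f}; right-skewed sample: r = {r_skewed:.3f}")
```

Plotting the pairs from `normal_plot_points` gives the graph itself; the skewed sample bends away from a line at the upper end, which shows up as a lower correlation.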

2. T = (.7 - 0) / SE, where SE = sqrt(.5^2/10 + .5^2/10)

p-value = P(x >= .7 | mu = 0) = P(T >= 3.13) with 9 df

We get a p-value of less than .005, so we reject the null that the difference in scores for Starbucks and Panera coffee is zero. We conclude there is strong evidence that Vanderbilt students prefer Starbucks to Panera.

3. I'm kind of confused about how to use the t probability table.
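A t probability table only brackets the p-value. Go to the row for your degrees of freedom, find the two critical values your T falls between, and read the one-tail areas at the top of those two columns. The Python sketch below encodes the df = 9 row using the standard one-tail critical values (1.383, 1.833, 2.262, 2.821, 3.250). The function name and row layout are illustrative.

```python
# One-tail areas and critical values for df = 9, as printed in a standard t-table
DF9_ROW = [(0.100, 1.383), (0.050, 1.833), (0.025, 2.262), (0.010, 2.821), (0.005, 3.250)]


def bracket_p_value(t: float, row=DF9_ROW) -> str:
    """Locate a positive T within one row of a t-table and report the one-tail p-value range."""
    if t < row[0][1]:
        return f"p > {row[0][0]}"
    for (p_hi, t_lo), (p_lo, t_hi) in zip(row, row[1:]):
        if t_lo <= t < t_hi:
            return f"{p_lo} < p < {p_hi}"
    return f"p < {row[-1][0]}"


for t in (1.4, 2.21, 3.13, 4.43):
    print(f"T = {t}: one-tail {bracket_p_value(t)}")
```

For a two-sided alternative, double both bounds. For example, T = 3.13 with df = 9 gives a one-tail p-value between 0.005 and 0.01, so the two-tail p-value is between 0.01 and 0.02.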

1) 1 - Sample is independent.

2 - Size of sample is less than 10% of the population size.

2) This is sufficient evidence, since the difference of the two averages is not zero.

3) No question.

1) In testing the normality assumption we can check if the distribution is symmetric as well as find if outliers are extremely rare, among other methods.

2) Degrees of freedom = 9

T = 3.13

P-value is at least 0.01. The P-value is decently large. We do not reject the null hypothesis; therefore the data do not convincingly show that students prefer Starbucks over Panera.

3) No questions, unless I got number 2 wrong.

1. Plotting the samples on a normal probability plot and checking linearity. Plotting the samples on a histogram and comparing it against the standard normal curve.

2. H0: xbar_s-xbar_p = 0. HA: xbar_s-xbar_p >0. df = 9. SE = 0.313. T = (0.7-0)/0.313 = 2.24. Cutoff for 9 df = 1.83. T>1.83 therefore sufficient evidence that Vanderbilt students prefer Starbucks.

3. None

1) We can do that either by checking whether 66% of the data falls within + or – one standard deviation and 95% within + or – two standard deviations, etc., or by using a normal probability plot and checking whether the points fit a line.

2) We set up a hypothesis test where the null hypothesis is that the mean for Starbucks is equal to that of Panera, and the alternative hypothesis is that the mean for Starbucks is greater. We then calculate the t-stat, t = (.7-0)/.316 = 2.21, using a calculated standard error = sqrt(2(.5)^2/10) = .316. With df = 10 - 1 = 9, the corresponding p-value is smaller than 0.05, which means that students prefer Starbucks coffee.

3) can we quantify when the t-stat should be used instead of the p-value?

1. The first would be to check the standard deviations, making sure the appropriate number of values falls within each Z score. The other would be to look at the theoretical distribution plot we've seen before. It was very easy to tell from that plot whether we were dealing with a normal or non-normal data set.

2. H0: meanS - meanP = 0

HA: meanS > meanP

SE = (((.5^2)/10)+((.5^2)/10))^.5 = .223

T = (.7-0)/ .223 = 3.14

Degrees of freedom = n – 1 = 10 – 1 = 9. We know that with this t value that there is a good chance that students prefer Starbucks to Panera

3. Did you do the mobile version of the website yourself, or did WordPress automatically handle that? Either way, props. The majority of websites don't have mobile-optimized versions.

The mobile version of this WordPress site? There's a plugin for that. I use WPtouch.

1. One method could be to determine whether 68% of the data is within one standard deviation of the mean, 95% within two standard deviations, and 99% within three standard deviations. Another method could be to look at the normal probability plot to determine normality.

2. Use the two hypotheses H0: Starbucks mean (s) = Panera mean (p) and Ha: s > p. The SE = 0.224. t = (.7-0)/.224 = 3.13. Using 9 degrees of freedom, p < .01, giving reasonable evidence that Starbucks is preferred over Panera.

3. The calculation of SE is becoming confusing to me. Not sure if I did it right in the previous example.

1. Looking at the normal probability plot and checking for skewness in the sample

2. Since p = 0.000827, this is sufficient evidence to say that Vanderbilt students prefer Starbucks coffee to Panera coffee

1- Using a normal probability plot with theoretical quantiles on the x-axis and normalized experimental values on the y-axis, and confirming that the points are roughly linear. Another way is to check that 66% of the data is within one SD of the mean, 95% within two SDs, and 99% within three SDs.

2- t = 0.7/0.316 = 2.21 with df = 9 gives p < .05 so adequate evidence students prefer starbucks coffee over panera coffee. Plus of course they do--it's just better!

3-None

1. To check whether samples are normal, one may look at a plot of the data for extreme outliers or consider whether previous experiences indicate that the data is not nearly normal.

2.

H0: u1-u2 = 0

H1: u1-u2 > 0

n = 10

XDIFF = 0.7

s = 0.5

SE = sqrt( 0.5^2/10 + 0.5^2/10 ) = 0.2236

T = (point estimate - null value)/SE = 0.7/0.2236 = 3.1306

df = n - 1 = 10 - 1 = 9

The t-distribution table indicates that the associated p-value for this T statistic is between 0.005 and 0.01. Both of these probabilities are lower than 5%. Thus, there is sufficient evidence to conclude that all Vanderbilt students prefer Starbucks coffee to Panera coffee.

3. The book refers to using a computer to calculate the degrees of freedom for the sampling distribution of a difference of two means. What is the algorithm used to perform this task?
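The reading does not say which formula its software uses. A common choice in statistical software is the Welch-Satterthwaite approximation, and the sketch below assumes that is what the book's computer output reflects. It compares that value with the hand-calculation shortcut, the smaller of n1 - 1 and n2 - 1. The summary statistics are hypothetical, chosen only to show the gap between the two rules.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximation to the df for a difference of two means."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard error of each sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def conservative_df(n1: int, n2: int) -> int:
    """Hand-calculation shortcut: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Hypothetical summary statistics for two independent samples
s1, n1, s2, n2 = 0.5, 10, 0.5, 10
print(f"Welch df = {welch_df(s1, n1, s2, n2):.2f}, conservative df = {conservative_df(n1, n2)}")
```

The Welch value always lies between the conservative df and n1 + n2 - 2, so the shortcut never overstates the df. A smaller df means a larger t*, which gives wider intervals and slightly larger p-values than the software's value. This is why the shortcut is called conservative, and why computer output usually reports a noticeably higher df.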

What's one question you have about the reading?

1) Sample is assumed to be normal if its underlying population is a normal distribution or when the sample size is large.

2) X = average rating for Starbucks, Y = average rating for Panera, n = 10

X - Y = 0.7, standard deviation = 0.5

Since underlying population is normal, the sample is also normal.

Ho : X-Y = 0

HA : X-Y > 0; degrees of freedom = 10 - 1 = 9

SE = sqrt(0.5^2/10) = 0.1581

T = (0.7 - 0)/0.1581 = 4.427

Using this T value and looking at row for dof 9 on the table, we see that the p-value lies between .001 and .002, which is less than .05 significance level. Since the p-value is small, we have strong evidence to reject the null hypothesis and conclude that indeed students favor Starbucks more than Panera.

3) How do we know which tail to use?

1. To check if the underlying distribution is normal, we can use a normal probability plot and look at the linearity of the relation between the distributions and the experimental values. Another way to check for normality is to see what percentage of the data is between plus or minus 1, 2, and 3 standard deviations away. 1 standard deviation corresponds to 66% of the data, 2 standard deviations corresponds to approximately 99% of the data and 3 standard deviations contains about 99% of the data.

2. If the underlying population is normal, then the small sample size will not matter. The null hypothesis is H0: mean of starbucks = mean of panera. The alternate hypothesis is Ha: mean of starbucks > mean of panera

SE = sqrt((2s^2)/n) = sqrt((2(.5)^2)/10) = .316

t = (x1-x0)/SE = (0.7-0)/0.316 = 2.21 and we use 9 degrees of freedom to give a p-value of p < .05. This shows that students at Vanderbilt most likely prefer Starbucks coffee over Panera coffee.

3. Why are the actual degrees of freedom so much higher than the n-1 version? Is there a better way to estimate this value?

2 standard deviations actually includes 95% of the data, not 99%

1. By analyzing a residuals plot, by ensuring the sample size is sufficiently large, and by ensuring that each piece of data in the samples is independent.

2. Hypothesis test: T = (0.7-0)/sqrt(2*(0.5^2/10)) = 3.13. df=10-1=9. The p-value lies between 0.01 and 0.02, so we can confidently say that Vanderbilt students prefer Starbucks coffee to Panera coffee.

3. What is the computer software used to find the exact degrees of freedom?

1) Conducting a two-sample t-test requires that the underlying populations for both samples are normal. What are two methods we've seen for checking this normality assumption?

> Greater than 50 samples, or underlying process is normal.

2) Suppose you recruit 10 Vanderbilt undergraduates at random to sample the coffee at Starbucks and Panera, rating each store's coffee on a scale of 1 to 5. The average of the differences in the 10 pairs of ratings is 0.7 in favor of Starbucks, with a standard deviation of 0.5. Is this sufficient evidence to conclude that all Vanderbilt students prefer Starbucks coffee to Panera coffee? (Assume that the underlying populations here are normal.)

> Yes, if the underlying populations are normal then this is sufficient information to conclude students prefer starbucks.

3) What's one question you have about the reading?

> I am a little fuzzy on 2 sample t-tests

1. One way is to check the 66% and 95% point on the distribution plot. Another way is to draw a normal probability plot and compare the data with theoretical quantiles.

2. H0: Mean ratings of the two stores are the same.

HA: Starbucks has a higher mean rating than Panera.

SE = 0.316

t = 0.7/0.316 = 2.21

9 degrees of freedom

p < 0.05 according to the t table

i.e., we have strong evidence that Vandy students prefer Starbucks

3. What are the conditions for applying the t distribution?

1. normal probability plot - check for linear fit

determine if 66%, 95%, and 99% of the data are within 1, 2, and 3 standard deviations from the mean, respectively

2. S = Starbucks, P = Panera

Ho: mean(S) = mean(P)

Ha: mean(S) > mean(P)

SE = sqrt(2(.5)^2/10) = .316

t = (.7-0)/.316 = 2.21

df = 9, p < .05

the evidence should be enough

3. We have been using the t-test a lot in our lab. Normally we require much larger sample sizes though. Is that a requirement of good lab work or the math behind it?

1. a linear distribution of the normal probability plot and by looking at the distribution of the population itself.

2. SE = sqrt((0.5^2)/10) = 0.16

t = 0.7/0.16 = 4.43

p-value is really small so students prefer starbucks more

1)95% confidence interval and hypothesis test method

2) Since the underlying population is normal, it doesn't matter how small a sample we take, so it is pretty good evidence to say that students prefer Starbucks as opposed to Panera. But the standard deviation can't be near the population standard deviation.

3) The two-sample t difference is so confusing; I don't know when to use this formula.

1. We have used normal probability plots as well as checking if the z score appropriately describes the percentage.

2. We run a hypothesis test here to find out if the evidence is adequate.

Ho: Starbucks = Panera, Ha: Starbucks > Panera

SE = .316

p < .05 which means we have adequate evidence to reject the null hypothesis.

3. What is the difference between the difference method and some of the other methods we've learned?

1.

A quantile plot is one tactic of testing normality on a set of data. Alternatively we can create a histogram with medium sized buckets that can also visually test for normality. This does require some test data to be found that allows us to scope out how well this might be used.

2.

It is safe to conclude from this information that Vanderbilt undergraduates do prefer Starbucks over Panera, though as a matter of phrasing, it is NOT safe to conclude that ALL Vanderbilt students prefer it. This is a measure of averages, so not every single person must like the same item more for the overall lean to be towards Starbucks. Also, this study was limited to undergraduates, so not all students were represented (i.e., graduate students' preferences cannot be predicted).

3. What's one question you have about the reading?

Alternatively, could this test be done visually with a side-by-side normal plot? It could be a test along the lines of "if they cross before their 1.5 standard deviations, they are not equivalent." This might be a more intuitive way to convey the differences between these two sets.

1) That the observations are independent and the sample size is greater than or equal to 50.

2) T = 1.4, df = 9. P value = .1 so this is not sufficient evidence.

3) Why do we use the approximation for df if it is so inaccurate? Like in the example the df found by the computer was 45.97 but using the approximation it was 26 which seems like a significant difference.

1) The obvious choice for determining normality is a normal probability plot. One could also use a histogram to get a rough idea of what is going on.

2) T-value of 4.427 for the probability that the difference in the 10 ratings was 0.7 given that there exists no preference between Starbucks and Panera. Degrees of freedom = n-1 = 9. This corresponds to a p-value that is very low, meaning that all Vanderbilt students really do prefer Starbucks coffee to Panera coffee. That seems like too strong of a result for such a humble sample size.

3) I am still shaky on the normality condition stuff. Are summary statistics from a sample really enough information from which to say with authority that the underlying population is normal?

Your question will be answered in class today.

1) Checking to see if the data is fairly symmetric about the mean without any obvious outliers and the samples are independent of one another.

2) (Xbar)starbucks - (Xbar)panera = 0.7

n = 10

s = 0.5

df = n - 1 = 10 -1 = 9 degrees of freedom

T = 3.125

p-value falls between 0.005 and 0.010 (one tail) --> the p-value is very small, fail to reject the null hypothesis. There is sufficient evidence to conclude that all Vanderbilt students prefer Starbucks coffee to Panera coffee.

3) In the examples in the reading there were always two values, one for each sample's mean; it was confusing in number 2 to deal with just the mean difference without each one's standard deviation. Will you go over how to do problem 2 in class?

1) In order to be considered normal the data cannot be skewed and must be unimodal

2) Yes, that is enough evidence. The resulting t-value suggests that we can reject our null hypothesis that there is no difference in coffee ratings

3) If data is medium-sized, which would be most accurate to determine the significance--large or small sample inference?

1) First, seeing that 95% of the data is within 2 SDs of the mean and ~65 percent is within 1 SD. We can also see normality visually, if the data is presented as a bell curve.

2) Ho: mean(starbucks) = mean(panera) Ha: mean(starbucks) > mean(panera)

SE = root(2s^2/n) = root(2(.5)^2/10) = .316

t = (.7)/.316 = 2.21; using df = 9, p < .05, which supports the claim that students prefer Starbucks

3) How do you account for SE being based off of two samples?

1) 66% of the data is between +-1 sigma, 95% between +-2 sigma, and 99% between +-3 sigma.

2) Ho: mean of starbucks = mean of panera

Ha: mean of starbucks > mean_panera

SE = 0.316

t = 2.21; using df = 9, p < .05, so students prefer Starbucks coffee to Panera coffee.

Conducting a two-sample t-test requires that the underlying populations for both samples are normal. What are two methods we've seen for checking this normality assumption?

We can use a normal probability plot and check whether the points form a straight line.

We can view the histogram of the data and see whether it appears normal (bell-shaped, unimodal, symmetric).

Suppose you recruit 10 Vanderbilt undergraduates at random to sample the coffee at Starbucks and Panera, rating each store's coffee on a scale of 1 to 5. The average of the differences in the 10 pairs of ratings is 0.7 in favor of Starbucks, with a standard deviation of 0.5. Is this sufficient evidence to conclude that all Vanderbilt students prefer Starbucks coffee to Panera coffee? (Assume that the underlying populations here are normal.)

Since the data is paired, we are considering the distribution of the difference for each pair and thus can use matched pairs statistics (i.e., SE is just s/sqrt(n), not the more complicated formula used for unpaired differences).
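Several answers in this thread debate whether to multiply by sqrt(2). The Python sketch below puts the candidate standard errors side by side for s = 0.5 and n = 10. Because 0.5 is the SD of the 10 differences, the matched-pairs SE is s/sqrt(n). The two-sample formula sqrt(s1^2/n1 + s2^2/n2) assumes two independent groups with their own SDs, which the question does not provide. The last line separates two expressions that look alike: sqrt(2 * s**2 / n) evaluates to about 0.224, while 0.316 corresponds to sqrt(2 * s / n), with s in place of s^2.

```python
import math

s, n, mean_diff = 0.5, 10, 0.7  # SD of the paired differences, number of pairs, mean difference

se_paired = s / math.sqrt(n)                      # matched pairs: SD of differences / sqrt(n)
se_two_sample = math.sqrt(s**2 / n + s**2 / n)    # two-sample formula, reusing s for both groups

print(f"paired SE = {se_paired:.4f} -> T = {mean_diff / se_paired:.2f}")
print(f"two-sample-formula SE = {se_two_sample:.4f} -> T = {mean_diff / se_two_sample:.2f}")
print(f"sqrt(2*s^2/n) = {math.sqrt(2 * s**2 / n):.4f}, sqrt(2*s/n) = {math.sqrt(2 * s / n):.4f}")
```

The three standard errors, 0.158, 0.224 and 0.316, correspond to the three T values that appear across the answers: about 4.43, 3.13 and 2.21.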

Ho: muS - muP = 0

Ha: muS - muP > 0

n = 10, df = 9

p = P(T > (.7 - 0)/(.5/sqrt(10))) with df = 9 -> p-value < .005

yes, this is significant evidence to believe Vandy students prefer Starbucks.

What's one question you have about the reading?

No questions

One great way to check if the population is normal is to look at the normal probability plot, and see if the data are linear or not. Also, we can plot the data in a histogram and check how well it lines up with a normal curve. If the data seems roughly symmetric about the mean, we are more likely to have a normal population.

H0: x1-x2=0; HA: x1>x2;

df=9; t(twotail,95%)=2.26; SE=.5/sqrt(10)=.15; x1-x2=.7;

T=(.7-0)/.15=4.6; this corresponds to a p-value of less than .0002, so it is safe to say that Starbucks has superior quality by Vandy students' standards.

In the case of the teacher, if we find that there is evidence for one test being easier than the other, how do we determine how much to curve the other test by?

1. We've used Q-Q plots before to determine whether a sample is normal by measuring the linearity of a given data set. Additionally, we've checked whether 68% of the data set is within one standard deviation of the mean, 95% within two standard deviations, and 99% within three standard deviations.

2. The null hypothesis is that the mean rating of Starbucks is equal to the mean rating of Panera, and the alternate hypothesis is that the mean rating of Starbucks is greater than the mean rating of Panera.

SE = sqrt ( s^2/n + s^2/n ) = sqrt ( 2 * s^2/n ) = sqrt ( ( 2 * .5^2 ) / 10 ) = 0.224

T = (point estimate - null value) / SE = ( .7-0 ) / .224 = 3.125

p is less than .05 which indicates strongly that students prefer Starbucks as opposed to Panera coffee.

3. Although this isn't a specific question, I'm a little confused as to where the degrees of freedom is supposed to be used in calculations of this sort.

1. Normal Probability Plot

2. The high T score is good evidence that Vanderbilt students prefer Starbucks coffee to Panera coffee.

SE = Standard Deviation/n^.5 = .5/(10^.5) = .158

T = (.7-0)/.158

T = 4.43

df = 9

t(df = 9, .05) = 2.26

4.43 > 2.26

If the samples are independent and large, then it is safe to assume they are nearly normal

It is not enough to make this assumption because the sample size is so small

What is the difference between t and Z?

1. One method is checking the maximum and minimum values to make sure that there are no outliers and the data is not skewed in any way. Another method is simply examining the data itself and seeing if it appears to take the shape of a normal distribution.

2. H0: u1 - u2 = 0. HA: u1 - u2 > 0. Since the underlying populations are normal and the sample is small, we can use a one-tailed test using the t distribution with df = 9. SE = 0.5/sqrt(10) = 0.158. T = (0.7 - 0)/0.158 = 4.427. This means that the p-value is < 0.005 < 0.05, which means that we reject the null hypothesis and accept that all Vanderbilt students prefer Starbucks coffee to Panera coffee.

3. I have seen this material before, so I did not have any questions about the reading.

1. a. scatterplot

b. look at a normal distribution chart and see if it fits the normal curve

2. Ho: Coffees are the same (panera = starbucks)

Ha: Starbucks is preferred (starbucks > panera)

DOF = 9

SE = sqroot(s^2/n + s^2/n)=sqroot(.5^2/10 + .5^2/10)= 0.223606798

t = (point estimate - null) / SE = (.7 - 0) / .224 = 3.13 w/ DOF = 9

p < .005

Therefore, we will reject our Ho and accept our Ha. Starbucks coffee is preferred.

3. When it's a single tail, do we use the same s and n in the SE equation?

1. Visual inspection of histogram, QQ plot

2. t = .7/(.5*sqrt(1/5)) = 3.13, which has a 2-tailed p-value of 0.0058, so this is strong evidence that students prefer Starbucks.

3. What's the R function for t-table lookup?

1) X-axis: Normal probability plot w/ theoretical distributions, Y-Axis: Normalized experimental values. Also confirm a linear distribution. Or check 66% of the data is w/in 1 SD, 95% w/in 2 SD, and 99% w/in 3 SD.

2) Ho: Mean Starbucks = Mean Panera

Ha: Mean Starbucks > Mean Panera

SE = sqrt(2s^2/n) = sqrt(2(.5)^2/10) = .316

t = (.7-0)/.316 = 2.21

df = 9 --> p < .05 --> strong evidence.

3) Why are they called degrees of freedom?

1. Check for independence; check the graph of the residuals.

2. Yes (T = 3.162)

3.None

1) Using a normal probability plot with theoretical quantiles on the x-axis and normalized experimental values on the y-axis, and confirming that the points are roughly linear. It is also feasible to check the dataset to ensure that approximately 66% lies within 1 stdv of the mean, 95% within 2, and so on.

2) Ho: Starbucksmean = Paneramean

Ha: Starbucksmean > Paneramean

SE = sqrt(2s^2/n) = sqrt(2(.5)^2/10) = .316

t = (.7-0)/.316 = 2.21 using df = 9 ====> p < .05

This is strong evidence that students prefer Starbucks to Panera!

3) The book makes an adjustment look necessary as a product of having two samples - is this the case?

1. One way to do this is to use a q-q plot where a theoretical normalized distribution is on the x-axis and the sample distribution is on the y-axis. If there is a linear relationship between those two sets of data, then you have a nearly normal distribution. Another way to check is to overlay the normal distribution on top of a histogram representing your data. If they fit each other pretty well, then the sample distribution is nearly normal.

2. Ho = meanPanera >= meanStarbucks

Ha = meanStarbucks 0.012 (p-value)

Since our p-value is less than .05, we can safely reject our null hypothesis and say that there is strong evidence that students prefer Starbucks to Panera coffee (as they should).

3. I really don't feel like I computed the standard error in question 2 correctly... is there a different way to do this?

1. To check this normality assumption, you can use a normal probability plot and also check the standard deviation of the data.

2. Constructing a t test: null: mean Starbucks = mean Panera. Alternative: mean Starbucks > mean Panera. T = (point estimate - null)/SE. T = (.7-0)/0.316 = 2.21. df = 9. p < 0.05, so reject the null hypothesis. There is evidence that Vanderbilt students prefer Starbucks over Panera coffee.

3. How do you create confidence intervals for t tests in comparison to when using z scores?

1) Normal Probability Plot and Central Limit Theorem.

2) replacing SEx1-x2 with the standard deviation of the difference and df=9, we get T=.7/(.5/10)^.5 = 3.13 which gives us a p-value between .01 and .005 which is significant evidence.

3) none

1) To determine if a set of data is normal then we make sure that 66% of the data is within one standard deviation and 95% of the data is within two standard deviations. Another way to determine this is if a line of best fit shows that there are an equal number of data points above and below the line.

2) H0: Starbucks = Panera; Ha: Starbucks> Panera

SE = sqrt(2s^2/10) = sqrt(2(.5)^2/10) = .316

t = (.7-0)/.316 = 2.21 so our t value is .0136 which means that we can reject the null hypothesis and conclude that we do have sufficient evidence to say that students prefer Starbucks to Panera.

3) Is a sample t test better to use than a sample p test?

1.

One method may be plotting the data in a quantile plot to see if there are numerous outliers. Another may be to plot the data using a histogram to check for any large skew to one side.

2.

SE = sqrt(2*s^2/n) = 0.224

t = (.7 - 0) / 0.224 = 3.125

df = 10 -1 = 9

using table C2 we can see our p-value is significantly less than 0.05, and so we can conclude students prefer starbucks to panera coffee.

3.

Why do we take the smaller of the df values? How does the computer calculate them?

1. Conducting a two-sample t-test requires that the underlying populations for both samples are normal. What are two methods we've seen for checking this normality assumption?

a) probability plot

b) look at a histogram and see if it is symmetric

2. Suppose you recruit 10 Vanderbilt undergraduates at random to sample the coffee at Starbucks and Panera, rating each store's coffee on a scale of 1 to 5. The average of the differences in the 10 pairs of ratings is 0.7 in favor of Starbucks, with a standard deviation of 0.5. Is this sufficient evidence to conclude that all Vanderbilt students prefer Starbucks coffee to Panera coffee? (Assume that the underlying populations here are normal.)

X1=Starbucks X2=Panera

Ho: X1-X2=0

Ha: X1-X2>0

p-value=P(x1-x2>.7|X1-X2=0)=P(t>(0.7-0)/SE)

SE=2*.5/sqrt(10)=1/sqrt(10)

d.o.f=9

P(t>2.214)=.027

if Vanderbilt students had no preference, we would only get such a large skew 3% of the time. That is pretty convincing evidence that Vanderbilt students prefer Starbucks to Panera coffee.

1) Ensure the independence of the samples, and verify that they are roughly symmetric about the mean.

2) No, since we have very few degrees of freedom.

3) N/A

1.) The two techniques we have for testing the normality of a sample are to compare a histogram of the data to a normal probability curve (a slightly informal test), and to use a normal probability plot (a slightly less informal test).

2.) To test this, we employ the t-distribution with mean 0.7 and standard deviation 0.5. The test statistic is then T = (0.7 - 0)/0.5 = 1.4. With 10 students in our sample, we have df = 9 degrees of freedom. We find the p-value for this test to be relatively high; therefore we fail to reject the null hypothesis that there is no difference between the sample means.

3.) Would the t-test work if we were interested in estimating the sum of two sample means?

1) Confidence Intervals, difference of two means

2) No, it is not sufficient evidence

3) No questions about the reading

1. Samples must be at least 50 and the normal probability plot needs to be somewhat linear.

2. I am only now seeing that there is a reading assignment, and because I do not have my book with me I cannot answer this question.

3. How do we do two sample t tests...

1) In order to assume that samples are normal, you need to assume that the samples are independent and that the population sizes are large enough.

2) No we would need the standard deviations and test statistics for Starbucks and Panera, not the standard deviation of the differences.

1.) We can check if the data is symmetric around the mean, i.e plot similar to normal probability plot. A linear normal probability plot could also be used

2.)

ms = mean of starbucks

mp = mean of panera

Ho: mean_star - mean_pan = 0

Ha: ms - mp > 0

SE = sqrt(2s^2/n) = sqrt(2(.5)^2/10) = .316

t = (.7-0)/.316 = 2.21

df = 9

p < .05

We can conclude that students prefer starbucks coffee to Panera coffee

3) I'm not clear on how the df is really found.

1. You can graph them on a probability plot and check the basic stats: mean, median, outliers.

2. No, it is not.

3. The reading was clear.

1. There are two ways you can check for normality using graphs: (1) graphically using a histogram by comparing a histogram of the sample data to a normal probability plot and examining how similar the trends are; (2) by creating a Q-Q plot of the standardized data against the standard normal distribution and comparing the correlation (the closer the points fall on a line and the more straight that line is, the more normal the data is).

2. I don’t think the sample size is big enough to make a conclusion about all students at Vanderbilt. However, if you assume that it’s a representative sample, you can perform a t-test and be within the 95% confidence level that Vanderbilt students prefer Starbucks coffee to Panera.

3. How are we going to recognize that you are asking us to do a t-test versus some other statistical analysis method on the final?

1. Looking at the distribution curve, seeing whether it follows the 68–95–99.7 rule.

2. SE = sqrt(.5^2/10), T = (.7-0)/SE = 4.42719, degrees of freedom = 9, so we have a p-value between 0 and 0.010, which is very small. Therefore, we can reject the null hypothesis.

3. If we're only given one standard deviation and sample population as we are in this problem, how should we use the formula for Standard Error given in the book that requires two of each? Should we just say SE=sqrt(s^2/n)?

1. We need to first verify conditions for each sample separately and then verify that the samples are also independent. Before we move on, we must first verify that the t distribution method can be applied. Because the sheep were randomly assigned their treatment and, presumably, were kept separate from one another, the independence assumption is verified for each sample as well as for between samples. The data are very limited, so we can only check for obvious outliers in the raw data in Figure 6.13. Since the distributions are (very) roughly symmetric, we will assume the normality condition is acceptable. Because the conditions are satisfied, we can apply the t distribution.

2. Yes, this is sufficient evidence, as the .5 standard deviation is not as great as the .7 (point estimate) margin of victory. Use a confidence interval.

3. How do t-tests and p-tests relate? are they derived in a similar method?

1) Normal probability plot with theoretical distributions, and check the standard deviations

2) Ho: the mean at starbucks is = to the mean at panera

Ha: the mean at starbucks is > the mean at panera

t= 2.21, so p <.05 ....... Students tend to prefer Starbucks coffee over Panera's

3) None

A)

1) normal probability plot with theoretical distributions.

2) calculate 99% of the data is between +-3 sigma, 95% between +-2 sigma, and 66% between +-1sigma.

B)

First of all, the number of students is very small compared to the population. Also, it depends on many other factors (living place, income, etc.). For that reason, we cannot draw a conclusion from this method. That said, by finding the P-value, we can get some idea:

H0: mean_starbucks = mean_panera

Ha: mean_starbucks > mean_panera

SE = 0.31

Then, we found that the P-value is smaller than 0.05. Therefore, H0 is true, and students have the same preference for both brands.

C)

Thanks, and sorry for any spelling mistakes; I am in a bit of a rush. Sorry 🙂