Here's your next reading assignment. Read **Sections 7.1-7.2** in your textbook and answer the following questions by 8 a.m., Wednesday, March 21st. Be sure to log in (using the link near the bottom of the sidebar) to the blog before leaving your answers in the comment section below.

- Compute the residual for the observation (95.5, 94.0), the one marked with a triangle on Figure 7.6, using the linear model given in Equation 7.1.
- Take a look at the residual plot in part (a) of question 7.2 on page 296. Imagine the x-y scatterplot for these data. Based on the residual plot, what can you about the scatterplot?
- Assuming that the line of best fit for the possum data seen in Figure 7.4 is indeed y = 41 + 0.59x, what does the number 0.59 tell you about possums?
- What's one question you have about the reading?

1.

Data = Fit + Residual

94 = 41 + .59*95.5 + Residual

Residual = -3.35

2.

The scatter plot will have the data points in the beginning very bunched together, looking very linear. As you move along the y axis, the points will become more scattered on both ends from the line of best fit, and they will become more sparse.

3.

The 0.59 tells us how strongly total length is more likely to affect the possum's head length. A higher number means a higher correlation between the possum's total length and the possum's head length.

4.

The rightmost plot in figure 7.8, it seems unclear to me how they came up with that slope.
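For reference, the residual arithmetic in question 1 can be checked with a short sketch. It assumes only the line quoted in the assignment, y = 41 + 0.59x, and the observation (95.5, 94.0); the function names are made up for illustration.

```python
# Equation 7.1 as quoted in the assignment: predicted head length = 41 + 0.59 * total length
def predict_head_length(total_length):
    return 41 + 0.59 * total_length


def residual(x, y):
    # Residual = observed value minus the value predicted by the line
    return y - predict_head_length(x)


fit = predict_head_length(95.5)
e = residual(95.5, 94.0)
print(round(fit, 3), round(e, 3))
```

The sign matters: the observation sits below the line, so observed minus predicted is negative.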

1. e = y - 41 - 0.59x = 94 - 41 - (0.59*95.5) = -3.345

2. This scatterplot is approximately linear. The correlation is strong when x is small, and decreases with increasing x.

3. Head length changes more than total length per unit. If head length increased by 1, total length would only increase by 0.59.

1. y = 41 + 0.59(95.5) = 97.345

e = y - ŷ = 94 - 97.345 = -3.345

2. the data is more spread out x wise as we increase y

3. The tails are bigger than the heads bc the slope is less than 1. Also, the bigger the head the bigger the tail since the slope is positive.

4. Nope. Straight forward.

1. 3.345

2. For lower values of x, the points will be close together and closely form a line. As x gets higher, the points will get farther and farther away from each other and the best-fit line.

3. For every increase in total length of a possum by 1 cm, the head length increases by approximately 0.59 mm.

1. -4

2. The scatter plot is shaped like a cone, with the data getting more variability the larger the x variable gets.

3. It is the standard deviation of the x variable divided by the standard deviation of the y variable, multiplied by the correlation variable.

4. none

1. -3.345

2. Since the residuals get larger, the points on the x-y scatterplot would get farther and farther away from the line, meaning they bulge out farther from the line as x gets larger.

3. 0.59 represents the expected difference in the head length of a possum if the total length of a possum increased by 1 unit.

4. None really; I remember most of this from high school.

1) y-hat(95.5cm) = 41 + .59(95.5) = 97.345

e = 94 - 97.345 = -3.345

2) The larger that x gets, the more y varies from the line of best fit.

3) That for every cm of total length of the possum, we can expect the possum's head to be .59 mm longer

4) If given *only* a residual plot, is there any way to be able to see positive or negative correlation between the data, or is this only possible with the scatter plot also known?
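One way to explore the question about reading direction from a residual plot is a small sketch on made-up numbers. The data below are hypothetical: one increasing and one decreasing data set are built from the same noise values, and the least-squares residuals are compared. The helper names are illustrative only.

```python
def least_squares(xs, ys):
    # Returns (intercept, slope) of the least-squares line
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys):
    b0, b1 = least_squares(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
# Made-up noise, chosen to sum to zero and to be uncorrelated with xs
noise = [1, -2, 1, 0, 0]
ys_up = [2 + 3 * x + e for x, e in zip(xs, noise)]    # increasing trend
ys_down = [2 - 3 * x + e for x, e in zip(xs, noise)]  # decreasing trend

slope_up = least_squares(xs, ys_up)[1]
slope_down = least_squares(xs, ys_down)[1]
res_up = residuals(xs, ys_up)
res_down = residuals(xs, ys_down)
print(slope_up, slope_down)
print(res_up)
print(res_down)
```

In this constructed case the two residual lists match even though the slopes have opposite signs, which suggests that a residual plot by itself does not show the direction of the association.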

1) -3.34

2) As the data becomes larger, there is more and more variability introduced into the calculations.

3) The slope of a line tells you how much the value on the y axis will increase for every unit of x that increases. In this example, y increases more slowly than x. I can't find what units they both represent, however.

4) How would we find these lines using our calculators or with R or Excel. It seems that most of my classmates have a good bit of experience using these tools, but I have never really been exposed to them (other than the calculator, of course).

1. y hat = 41 + 0.59x

41 + 0.59(95.5) = 97.345

residual = y - y hat = 94 - 97.345 = -3.345

2. Because we don't know if there is a positive or negative correlation, the only thing we can tell is that residual variability increases with the horizontal axis.

3. As the total length of a possum increases by 1 cm, its head length increases by about 0.59 mm.

4. I am kind of confused on how to calculate R and R^2. Is residual^2 the average of residual squared?

1) y^ = 41 + 0.59 * x = 41 + 0.59 * 95.5 = 97.345

e = y - y^ = 94.0 - 97.345 = -3.35

2) The shape of the scatterplot will resemble the shape of a funnel where as values get larger, the points get farther from the best fit line.

3) For every centimeter increase in possum's total length, the possum's head will increase by .59.

4) R describes the strength of the linear relationship and it also tells us whether data are negatively or positively associated. R-squared doesn't tell us the direction (positive or negative), so why is it more commonly used, as the book states?

1) 11.3

2) The error in the observations grew considerably as the predictor variable grew. The relationship is very linear, though.

3) How much the head length increases per increase in length of the body

4) 7.2 may not be linear now that I think of it

1) -3.345

2) The plot has strong correlation at the beginning, and it gets weaker towards the end of the plot. It has small residuals at the beginning, and they get larger towards the end of the plot.

3) 0.59 is the slope of the line, and it's the ratio of the rate of change of the length and length of head of the possum.

4) No question.

1.

Eq 7.1: y = 41 + 0.59 x

y1 = 41 + 0.59 (95.5) = 97.345

Estimated Value at 95.5: 97.345

Actual Value at 95.5: 94.0

Residual: 94.0 - 97.345 = - 3.345

2.

The linear relationship in the data seems to become weaker the farther along the line it gets. The data spreads out, and a cone would be a better representation of the plot.

3.

For every additional centimeter of total possum length, we might expect that possum to have a 0.59 cm longer head.

4. What's one question you have about the reading?

I had some difficulty understanding the formula for creating a least-squares line. Is that something we need to do by hand?

1) 41 + .59 ( 95.5 ) = 97.345

94.0 - 97.345 = -3.345

2) The scatter plot would have a very linear pattern at the beginning but the plot would become more scattered with less of a pattern towards the ends. The dots would be very close together at the beginning but become more far apart at the end.

3) 0.59 is beta1 which is equal to (Sy/Sx)R. 0.59 is a parameter of the regression line. 0.59 is also the slope of the least squares line

4) The idea of least square is confusing
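The slope formula mentioned in the answer above, slope = (s_y / s_x) R, can be sketched on a tiny data set. The four data points below are invented for illustration and are not the possum data from the textbook; the intercept step assumes the usual fact that the least-squares line passes through the point of means.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    # R as the sum of products of standardized values, divided by n - 1
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    total = sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys))
    return total / (n - 1)


def least_squares_line(xs, ys):
    # Slope = R * s_y / s_x; the line passes through (x_bar, y_bar)
    r = correlation(xs, ys)
    slope = r * stdev(ys) / stdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return intercept, slope


# Hypothetical measurements, made up for this sketch
total_length_cm = [80.0, 85.0, 90.0, 95.0]
head_length_mm = [88.0, 92.0, 93.0, 97.0]

b0, b1 = least_squares_line(total_length_cm, head_length_mm)
print(round(b0, 2), round(b1, 2))
```

On these made-up numbers the slope works out to 0.56 mm of head length per cm of total length, with an intercept of 43.5 mm.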

1. Compute the residual for the observation (95.5, 94.0), the one marked with a triangle on Figure 7.6, using the linear model given in Equation 7.1.

Data= Fit + Residual

94=41+.59(95.5)+residual

Residual=3.345

2. Take a look at the residual plot in part (a) of question 7.2 on page 296. Imagine the x-y scatterplot for these data. Based on the residual plot, what can you about the scatterplot?

When the x values of the scatterplot are low, the data is very linear, but as the x values increase the data becomes more and more spread out.

3. Assuming that the line of best fit for the possum data seen in Figure 7.4 is indeed y = 41 + 0.59x, what does the number 0.59 tell you about possums?

Generally, for each cm of total length added, a Possum's head will be .59 cm larger.

4. What's one question you have about the reading?

I would like to do some clicker questions matching residual plots and scatter plots. I am having trouble visualizing the two together. It reminds me a bit of the probability plots

1. y=41+.59(95.5)=97.345

97.345-94=3.345

2. The data spreads out as it gets further away from the origin. In other words, the residuals increase the further from zero we get.

3. For every additional inch of length, the possums head length increases by .59 inches.

4. Why do we have readings due on the same day as homework?

Further details on 3 (as I accidentally posted before being ready): .59 is the slope of the line of best fit, and it means that for each additional cm of body length, the head length will increase by an average of .59 mm.

1. The observation marked with a triangle is 3.345 units away from the linear trend. Therefore, the residual is 3.345.

2. The data very nearly fit the least square line. The observations near the origin fit the line very closely. As you move away from the origin, the observations fit the least square line worse.

3. The number 0.59 indicates the slope of the line of best fit. This number means that, for each unit increase in total length, the head length increases by 0.59 length units.

4. What is the difference between statistical error, residual, and deviation?

1) e = yi - ^yi = 94.0 - (41 + 0.59 * 95.5) = -3.345

2) The points on the scatter plot fan out for high x-values, meaning there is lots of variation in y for high x, but low variation for low x.

3) For every centimeter a possum's total length is larger, the length of its head is larger by .59 millimeters.

4) Can we model the residuals normally for any data or only some data?

1) y_hat = 41 + .59(95.5) = 97.35, e = 94 - 97.35 = -3.35.

2) Based on the residual plot, the scatterplot most likely does not have a very strong linear dependency.

3) The .59 tells us that for every 1mm increase in the possum's total length, there will be a .59mm increase in their head length.

1. y^ = 41 + (.59) (95.5) = 97.345

Residual = 97.345 - 94 = 3.345

2. It seems that for this residual plot there is a pattern as there is more residual towards the right end of the plot. Due to this pattern it seems the scatterplot may not be the most accurate way to depict the data.

3. This slope indicates that for each additional cm in the length of a possum it has an additional 0.59 cm of head length on average.

4. The section on least squares regression was a little confusing to me.

1) ^y=97.345 and so residual = ^y-yi = 97.345-94 = 3.345

2) This is an example of non-constant variability and so there would appear to be a linear relation initially but as x increases, y spreads out.

3) For every unit increase in total length of the possum, we would expect to see an average increase in head length of the possum by 0.59 units.

4) None

1.) y^ = 41+ 0.59(95.5) = 97.25

y = 94.0

residual = 97.25 - 94.0 = 3.25.

2.) The variability of the points around the least square line increases as x increases, therefore the data cannot be fitted with a least square line.

3.) 0.59 is the slope of the regression line.

4.) What percentage of the data is large enough not to be considered outliers?

1)

Data = Fit + Residual

Data = 41+ .59*95.5 = 97.345

Fit = 94.0

Residual = 97.345 - 94.0 = 3.345

2) The residual plot shows that the linear fit is appropriate for the data in the scatter plot. All residuals are centered about the line shown in 7.2 (a).

3) For every 1 cm a possum grows in total length, you can expect its head size to grow 0.59 cm

4) How do we determine if the linear fit is significant?

1. ^yi=41+0.59*(95.5)=97.345

e=yi-^yi=94-97.345= -3.345

e=-3.345

2. The clustering towards the best fit line in the beginning shows that the residual is very low when the x variable is low. The residuals spread out as x increases. So we can infer that as x increases, so does the variance of the data. Also the best fit line describes the lower x’s well, but not the higher values of x.

3. 0.59 shows the numerical relationship between the total length and the head length of the possum relative to the average head length. Average head length increases by 0.59mm for every increase of 1mm to total length. The positive association tells us that as the y-variable increases so does the x-variable.

4. Is it possible to determine correlation with relatively small amounts of data?

yhat = 41 + 0.59*x = 41 + 0.59*95.5 = 97.345

esubi = ysubi - yhat = 94.0 - 97.345 = -3.345

Looking at the residual plots, a few trends can be applied.

On the first graph, the graph starts off very linearly, but starts to vary from the expected line after a while.

The second graph should not be applied with a linear fit, at least at the beginning. By seeing a curve in the residual, it can be guessed that the actual graph is nonlinear. It does taper off linearly after the first third or so, however.

The number 0.59 is the average amount that the possum's head length (in mm) changes as the possum itself lengthens (in cm). For each centimeter longer a possum is, it is most likely going to have a 0.59 mm increase in head length.

How are nonlinear curves of best fit created?

1. Residual is the vertical distance of the point above or below the line. In this case, the equation for the fit line is y=41+0.59x. The y value of the fit line at the triangle point is 97.345. Thus the vertical distance between the triangle point (y=94) and the fit line (y=97.345) is 97.345-94=3.345

2. The scatterplot for this data would start out with the data points clumped around the fit line, but as the graph progressed along the x axis, the data would start to drift further from the fit line and become more scattered.

3. The 0.59 term in the fit line equation signifies the relationship between a possum's body length in centimeters and its head length in millimeters. This number tells us that the head of a possum is 0.59 as long in millimeters as the possum is long (in centimeters)

4. Why bother making a residual plot? The residual is fairly easy to eyeball from the actual data plot with the best fit line and thus the residual plot doesn't really tell us any new information about the data

1) expected y = 41 + 0.59 x = 97.345

residual = 94 - 97.345 = -3.345

2) The variability of the data increases with increasing x-value.

3) The .59 means that on average 5.9% (after equating units) of an increase in overall length goes to an increase in head length.

I'm wondering a bit why the book doesn't at least touch on other fits but delays them to another course. Also, what are the curve-fitting tools in R?

1) -3.33

2) I'm not sure "what I can" about the scatter plot, but I could always say that: there seems to be higher variation in Y for larger values of X than there does for smaller values of X

3) An opossum's head is larger by .59 millimeters for every extra mm it is in length

4) nothing atm

1) e=94-(41+.59*95.5)= -3.345

2) the variation in y values seems to increase with an increase in x values.

3) The size of a possum's head increases by .59 mm for every cm increase in total body length.

4) I'm unsure as to when residual modeling is the appropriate choice.

1) e = -3.345 (Actually, it is more like 2.71828)

2) In the panel on the left, the variability increases with increasing x. In the panel on the right, a straight line likely does not fit the data.

3) The 0.59 means that head length increases 0.59 mm for every 1 cm of total length increase.

4) When is extrapolation appropriate?

1.) The actual data point is 94, and the fit point is 41 + (0.59)*(95.5) = 97.345. So the residual is 94 - 97.345 = -3.345

2.) The residuals seem to be relatively evenly dispersed above and below the neutral axis. On the low portion of the graph, the residuals are clustered right around zero, so the data are almost exactly collinear. However, on the higher end of the graph, the residuals spread out and although they seem to be randomly distributed, they are not coincident with the neutral axis. So the data are still well modeled by a linear fit, but the data do not exactly lie on the fit as in the earlier part of the graph.

3.) The number 0.59 is the slope of the linear model. It indicates that for every unit increase in the total length of the possum, x, we will expect an increase in the head length, y.

4.) Is there a more exact numerical metric for determining if the residuals in a residual plot are randomly distributed about zero other than visual inspection?

1. The observed y value at x = 95.5 is 94 . The value predicted by the model fit is computed to be y = 41+.59*(95.5)=97.345. The residual value: observed-predicted = 94-97.345= -3.345.

2. The best fit line for the graph fits the data really well for the lower range of x, but as x increases, we see the residual values also increase.

3. .59 tells us the proportion of which the length of the possum translates into its height.

4. How do you determine what level of curvature is tolerated before linear model is no longer valid and we should use other tools to try and model the data?

1. ~10

2. Linear, data is less precise farther down the x axis.

3. Possums must in general have much longer bodies for the possibility of having a kind of longer head length.

4. Why is a graph of the residual helpful

Compute the residual for the observation (95.5, 94.0), the one marked with a triangle on Figure 7.6, using the linear model given in Equation 7.1.

y_hat = 41 + .59 * 96 = 97.64

y_actual = 94

e = y_actual - y_hat = -3.64

Take a look at the residual plot in part (a) of question 7.2 on page 296. Imagine the x-y scatterplot for these data. Based on the residual plot, what can you about the scatterplot?

The variability in the scatterplot increases as x increases. As x increases, the points in the scatterplot deviate more. A linear model is not appropriate.

Assuming that the line of best fit for the possum data seen in Figure 7.4 is indeed y = 41 + 0.59x, what does the number 0.59 tell you about possums?

For an increase of 1 cm in total length, there is a predicted increase of .59 cm in the head length.

What's one question you have about the reading?

A lot of formulas got thrown around about calculating the slope and intercept of the regression line. Can you please explain why we are supposed to believe those equations and where they came from?

1) y = 41 + .59*95.5 = 97.345. Residual = data - fit = 94 - 97.345 = 3.345

2) That in the beginning of the scatter plot the points fall closely along a line but it slowly gets further away from the line in both directions.

3) The .59 means that for each cm increase in head length, there is a .59 cm increase for the total length.

4) How do you compose a line of best fit that's not linear?

1. Compute the residual for the observation (95.5, 94.0), the one marked with a triangle on Figure 7.6, using the linear model given in Equation 7.1.

41+0.59*95.5=97.345

94-97.345 = -3.345 <- residual

2. Take a look at the residual plot in part (a) of question 7.2 on page 296. Imagine the x-y scatterplot for these data. Based on the residual plot, what can you about the scatterplot?

The scatter plot is not linear because the residual grows with x instead of remaining relatively constant for all the data. Therefore, a linear model is not a good fit for this data.

3. Assuming that the line of best fit for the possum data seen in Figure 7.4 is indeed y = 41 + 0.59x, what does the number 0.59 tell you about possums?

For every centimeter in length, their head grows 0.59mm

4. What's one question you have about the reading?

Could you give an example of the third reason for using criterion 7.8?

1. Yhat = 41 + .59 * 95.5 = 97.345

94 - 97.345 = -3.345 = residual

2. I can tell that the line that models this data is very good at low x values, and then it becomes less precise towards the higher x values. Since the values seem to neither trend up or down together in the residual plot, but instead just become more spread out, it seems that the line could be a good approximation for these values.

3. This number tells us that the possum's head length increases by .59 times that of the total length of the possum every time the possum's total length increases by 1 cm, so it's not quite a 1 to 1 ratio.

4. Nothing right now. Probably will have some later in class.

1. The observed response is y=94.0. The predicted response is y=41+0.59*95.5=97.345. Thus, the residual is equal to 94-97.345 or "-3.345".

2. The linear model becomes less reliable as the "x" variable increases. This can be easily seen since the magnitude of the residuals increases with the "x" variable. The scatterplot thus does not have constant variation and a linear model should not be used.

3. For each additional centimeter of total length, we can estimate a possum's head to be 0.59 mm longer.

4. How do we perform the calculations for "R" without using a computer?

1. 94.0-(41+0.59*95.5) = -3.345

2. It means that there is lots of variation in y when x is high.

3. The length of head is larger by 0.59 millimeters

4. Whether the residuals can be modeled for any data?

1. e = yi - y(hat)i = 94 - (41 + .59 * 95.5) = -3.35

2. The data spreads out more as x gets higher, meaning there is more expected variability in higher x data

3. For each cm longer the possum is, the head is expected to be .59 mm longer

4. What is the airspeed velocity of an unladen swallow?

1. By equation 7.1, we get -3.345 as the residual.

2. Since there is a lot of variation in y with high x values, we can infer that there is a lot of residual variation for high x, but low for low x.

3. .59 is the slope of this graph. This means that for every centimeter a possum's tail is longer, its head is .59 millimeters longer.

4. Are residuals always normal? Or only for some data?

1. e=yi-^yi=-3.345

2. The variation level of y increases as x increases.

3. The length of the head increases by 0.59 for every centimeter a possum is larger.

1. Residual = data - fit = 94-(41+0.59*95.50) = -3.345

2. The scatterplot has a linear trendline, but as the numbers get bigger, the data spreads farther and farther from the trendline in both directions.

3. It tells you for every cm a possum grows, its head length grows 0.59mm.

4. What can R-squared tell us that R can't and vice versa (besides - vs +)?
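On the R versus R-squared question above, a small sketch on made-up numbers may help. It computes R directly and computes R-squared as the fraction of variation in y explained by the line, 1 - SSE/SST, for an increasing and a decreasing data set. The data and function name are hypothetical.

```python
from math import sqrt


def r_and_r_squared(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y (SST)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    r = sxy / sqrt(sxx * syy)

    # Least-squares line, then the variation left over in the residuals (SSE)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

    r_squared = 1 - sse / syy
    return r, r_squared


# Made-up data: the same scatter around an increasing and a decreasing line
xs = [1, 2, 3, 4, 5]
ys_up = [6, 6, 12, 14, 17]
ys_down = [0, -6, -6, -10, -13]

r_up, r2_up = r_and_r_squared(xs, ys_up)
r_down, r2_down = r_and_r_squared(xs, ys_down)
print(round(r_up, 4), round(r2_up, 4))
print(round(r_down, 4), round(r2_down, 4))
```

In this example R carries the sign of the slope, while R-squared is the same for both data sets and reads directly as the share of the variation in y accounted for by the line.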

1) -3.345

2) Large variation in y for high x, but low variation for low x.

3) For each additional centimeter in a possum's length, the length of its head increases by .59 millimeters.

1. 94 - 41 - 0.59*95.5 = -3.345

2. as x increases, the data points deviate from the linear model.

3. their head length grows on average of about 0.59 cm for each centimeter of total length growth.

1) Compute the residual for the observation (95.5, 94.0), the one marked with a triangle on Figure 7.6, using the linear model given in Equation 7.1.

e=yi-yi_hat=94.0-(41+0.59*95.5)=-3.345

2) Take a look at the residual plot in part (a) of question 7.2 on page 296. Imagine the x-y scatterplot for these data. Based on the residual plot, what can you about the scatterplot?

There's low y-variation for low values of x, and high y-variation for higher values of x.

3) Assuming that the line of best fit for the possum data seen in Figure 7.4 is indeed y = 41 + 0.59x, what does the number 0.59 tell you about possums?

.59*possumLength (in centimeters) = possumHeadLength (in millimeters)

4) What's one question you have about the reading?

I know the chapter says that there are more advanced techniques used to attempt curved best fits, but I was just wondering now what those techniques WERE, even if we're not doing them yet (or in case we're not covering that in this class).

1. 41 + 0.59 * 95.5 = 97.345. Residual = 94 - 97.345 = -3.345

2. There appears to be a decently strong trend in the data, but around a quarter of the way down the line, the variability drastically increases, meaning that in the actual x-y scatterplot, there is even more variability, so the fit might be even less appropriate.

3. 0.59 describes the estimated difference in head length if the total length for a case happened to be one unit larger.

4. There is a lot of terminology thrown around, especially 7.1. Could this terminology be clarified in class? It gets a bit overwhelming.

1. e = 94 - 97.345 = -3.345

2. As the x values grow larger there seems to be an extreme amount of variation and the scatter plot loses the linearity that it possessed in the lower values of x.

3. Each time the possum gets larger by a cm, its head grows larger by .59 mm.

4. How do residual values help us with determining linear patterns of scatter plots?

1) Expected y = 41 + .59*95.5

Expected y = 97.345

Residual = 97.345-94

Residual = 3.345

2) I would say that the data are tightly grouped at smaller x values but the data begin to spread out as x increases.

3) For every inch that a possum's length increases, the expected increase in their head length is .59 inches.

1) e = 94.0 - (41 + 0.59 * 95.5) = -3.345

2) there is high variation in y for high x, and small variation for small values of x. Therefore, the points on the scatter plot spread out for larger values of x.

3) It's the rate at which the head of the possum increases as its length increases.

4) are there different ways to model the residual?

1) 94 - (41 + .59*95.5) = -3.345

2) The y-values get less accurate as x increases

3) On average, if a possum is 1 inch larger in length its head is .59 inches larger

4) can we use linear regression for a closed set of curved data such as the first plot on p 284?

A)

-3.35

B)

There is some variation in y for big values of x. However, this variation gets small when x is small.

C)

The difference in length in each one is 0.59.

D)

Can you accept my late answers?