# Reading Assignment #4 - Due 1/25/12

Here's your next reading assignment. Read Sections 1.5-1.7 (omitting Section 1.6.2) in your textbook and answer the following questions by 8 a.m., Wednesday, January 25th. Be sure to log in to the blog before leaving your answers in the comment section below.

1. These charts from Businessweek make the point that correlation does not imply causation. Identify a possible lurking variable for one of the charts.
2. Let's say I wanted to compare the effects of teaching calculus in two different ways--with clickers and without. Why would it be difficult to conduct a randomized experiment to compare these methods here at Vanderbilt?
3. What's one question you have about the reading?
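If it helps to see the idea behind question 1 in action, here is a minimal simulation sketch. It is not taken from the textbook or the Businessweek charts; the variable names, the coefficients 2 and 3, and the sample size are all made up for illustration. A hidden variable `z` drives both `x` and `y`, so the two look strongly correlated even though neither affects the other. Once the contribution of `z` is subtracted out, the correlation should mostly vanish.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=1000, seed=0):
    """Return (raw correlation, correlation after removing z's effect).

    All numbers here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # z is the lurking variable: it influences both x and y.
    z = [rng.gauss(0, 1) for _ in range(n)]
    # x and y each depend on z plus independent noise, never on each other.
    x = [2 * zi + rng.gauss(0, 1) for zi in z]
    y = [3 * zi + rng.gauss(0, 1) for zi in z]
    # Subtract the (known, in this toy setup) contribution of z.
    x_adjusted = [xi - 2 * zi for xi, zi in zip(x, z)]
    y_adjusted = [yi - 3 * zi for yi, zi in zip(y, z)]
    return pearson(x, y), pearson(x_adjusted, y_adjusted)


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"correlation of x and y:            {raw:.2f}")
    print(f"correlation after accounting for z: {adjusted:.2f}")
```

In real data the lurking variable is usually unmeasured and its effect unknown, which is why the subtraction step above is only possible in a toy example like this one.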

## 60 thoughts on “Reading Assignment #4 - Due 1/25/12”

1. 1.
For Figure 2, a possible lurking variable is the public's concern over global warming and their desire for more research or action on the issue.
2.
It may be difficult to control the differences between the groups because there is not a large enough sample size of students taking calculus to know that each class has the same level of intellect.
It may be hard to get a completely random sample of calc students because the lazier and/or more cost-sensitive students would most likely take the class that didn't require a clicker (or clicker participation).
3.
Would a person be able to point out lurking variables for any kind of faulty correlation and causation, or do lurking variables only exist for more closely related data?

2. 1. In the NSF R&D Fund vs. Global Warming graph, causation is implied between the budget for the National Science Foundation and the increase in global temperature. However, a lurking variable, namely global CO2 production, is left out of the graph, despite being the most likely explanatory variable for the rising global temperature.

2. Because of the limited number of calculus classes and sections at Vanderbilt, dividing the study by having one teacher use clickers and one teach without clickers would give unreliable information depending more on the teacher than on the single clicker variable. With a larger sample size of teachers (in the hundreds), the study could be more accurately conducted in this manner.

3. When cluster sampling, should the population be split into random clusters, or divided based on some characteristics that each cluster would have in common?

3. 1. A lurking variable for figure 1 is where the Facebook users are located and when Facebook was started. The timeline over which Facebook became more popular might just reflect independent events.
2. It would be difficult to conduct this experiment because there are more than just two variables involved. Some students learn differently than others and some were never good at calculus in the first place. There is also the variable of how much effort a student puts into the class. It doesn't matter which method you use if a student puts in no effort whatsoever. Also, for the group not using clickers, there are many methods that can be used that do not involve clickers.
3. I do not understand figure 1.41 where they randomly split in half. Why is it that in the control group there is one group with both low-risk and high-risk and one group with just high-risk?

4. 1. In Figure 2, showing the correlation between the average global temperature increase and the increase in the National Science Foundation's budget, perhaps a lurking variable is the amount of money in the Science Foundation's budget allocated for global warming research. It may be that as the global temperature rises, more money is being given to the National Science Foundation to research the problem.
2. In conducting this randomized trial it would be impossible to split the students up into two different classes without them knowing which class was which, as one group would have clickers and one would not. This would lead to bias, as one group may see the clickers as more effective and believe that not having clickers is what is negatively affecting their grade, when it may actually have no relation and the student may just be bad at calculus with or without a clicker.
3. Most of the information in these chapters was pretty clear. I was a little confused on how to determine possible lurking variables for some of the figures in the Businessweek charts but it may just be that the correlations were very obscure to begin with.

5. 1) A possible lurking variable in the second graph could be the amount of money spent on researching global warming. While both the National Science Foundation R&D budget and global warming have gone up in the past 16 years, that doesn't mean that scientists caused global warming. The budget could have gone up due to an increase in research about global warming, or the increase in global warming and the research budget could be completely unrelated. There isn't enough information to tell.
2) It would be difficult because it would be impossible to conduct a blind experiment. Since the students and the teacher know if clickers are being used, the placebo effect comes into play. The students using clickers might start studying more or feel more confident while taking tests since they know the clickers are supposed to help. That would make it impossible to really tell which teaching method is better.
3) In blocking, are the different groups given different treatments to see how a drug works with a certain group of people or are they all given the same treatment? And if they are treated the same, is all of the data from the different blocks combined into one data set or are the multiple sets compared?

6. 1. One possible lurking variable for Figure 4 is the advance of technology. As society has gained access to cellphones and the internet, we no longer rely on newspapers to get our information, so the number of newspapers sold has decreased. This advancement in technology also means information gets around quicker, so Shyamalan's movies - most of which rely on one big secret or reveal - are more easily spoiled, which could potentially make the movies less enjoyable.

2. It would be difficult to randomize the experiment because you can't randomly place students in calculus classes. Students choose which class they are in based on teacher and class time. Students may be more likely to choose a teacher with a better reputation, and the quality of the teacher impacts how well students learn Calculus, meaning you could not control all the variables affecting how well Calculus students perform.

3. None, really. I remember experimental design fairly well.

7. 1. (Figure 3) A good housing market is likely related to a good economy. A good economy can also be linked to people having more children. The more children that are born, the more likely some of them are named Ava.

2. The best way to conduct this experiment would be to have the same teacher teach two classes at the same level: one with clickers and one without. Major grades could then be compared. The students would be randomized into classes because you don't know during registration if a clicker is required. However, the study cannot account for students who forget their clicker or skip class.
Also, there could be lurking variables. For example, the two classes would have to be taught at different times. Students who would show up for a 9 am class might be overall better students (and thus receive higher grades) than those who register for the class at noon. Or, everyone skips the 9 am but is well rested and ready to focus at noon.

3. What determines a big enough sample size?

8. 1) A possible lurking variable for Figure 3 entitled "Did Avas Cause the US Housing Bubble?" is the increase in the population in the US. This would have caused both housing prices to increase (more people require homes, demand increases, prices rise) and since there are more babies being born, the number of Avas increased.
2) There are several factors which influence the performance of students. The calibre of the students might be higher in one of the classes, one of the classes might be earlier and students might decide to skip those lectures, and one professor might be better than the other.
3) Do lurking variables have to be correlated with both the explanatory and response variables?

9. 1. These charts from Businessweek make the point that correlation does not imply causation. Identify a possible lurking variable for one of the charts.
Figure 3 insinuates that the number of babies named Ava caused the housing bubble, showing an increase and subsequent decrease in Avas that coincided with the housing price index over time. One lurking variable is that the high housing price index may have encouraged people to have babies and the fall discouraged them. In that case, the housing bubble would have a causative effect on the number of babies, regardless of their names, which would undoubtedly affect the number of Avas.
2. Let's say I wanted to compare the effects of teaching calculus in two different ways--with clickers and without. Why would it be difficult to conduct a randomized experiment to compare these methods here at Vanderbilt?
Vanderbilt students are more intelligent than most and those taking calculus are most likely mathematically inclined. Furthermore, a majority of calculus students here have taken calculus in high school. This makes for a very skewed sample.
3. What's one question you have about the reading?
How can you ever really be sure of cause and effect variables? Is there some quantitative rule?

10. 1. A potential lurking variable for chart one is population growth, which increases both the number of Facebook users and the number of people needing welfare/other government assistance from the Greek government.
2. There could potentially be a type of reverse placebo effect, as the students without clickers know that the teacher is not doing everything to help them study.
3. Is it possible to randomize too much?

11. 1. A lurking variable in figure 3 is the economic recession.
2. You couldn't do a simple random sample of Vanderbilt students because many have not taken Calculus, with or without a clicker. Instead you would have to search for students that have taken Calculus and then create blocks of students who used a clicker and did not use a clicker. It may be difficult to find any students who have used a clicker in Calculus at Vanderbilt, so it would definitely be hard to perform a randomized experiment.
3. What type of sampling is most used by researchers conducting real experiments?

12. 1) There is a lot going on under the ground. Something formed the mountain range in New York State a long time ago. And maybe this something creates some magnetic fields or whatever out there, and this could make people crazy and cause more murders. (It might sound stupid, I know, but one time I watched a documentary about something like this and so I thought it would make some sense here...)
2) It would be difficult to conduct a randomized experiment to compare clicker/non-clicker methods when teaching calculus at Vanderbilt because, say, if you have 2 calculus classes, then you might want to teach one class using clickers and another class without clickers, but the experiment would not be random. In a randomized experiment, each person should be assigned randomly into one of two groups. But students will be already registered for their specific time of a class and it would be difficult to divide the group of calculus students into two groups in such circumstances.
3) It seems to be kind of difficult to avoid non-response bias. Is there a good strategy for avoiding it when collecting a random sample?

13. 1. Looking at the charts, it was very difficult to find a lurking variable because usually the two variables were extremely different, like the mountain range and murder rates... The only possible lurking variable I could think of deals with Fig. 2. I believe a lurking variable would be the amount of studies done about global warming. As temperatures rise, there may be more global warming experiments to explain why, and the rising number of experiments could cause the budget to increase. Therefore the number of global warming experiments is correlated with both the explanatory and response variables.

2. A randomized experiment would require that each student is randomly assigned to either the treatment or control group, and that would be near impossible at Vanderbilt. There are way too many students in calculus with complicated and different schedules, so you could not randomly assign students without many conflicts arising. If you changed any of the students for a time conflict, then it would hurt the experiment and it would not be randomized.

3. I know that correlation does not imply causation, but how do you get around this to fully determine causation? I feel like experimentation would still only be showing correlation. I believe it could help support the theory of causation, but causation seems like a very hard thing to prove.

14. 1) Average global temperature in figure 2.

2) Because calculus classes at Vanderbilt do not use clickers at all, so the data will be biased.

3) If association does not imply causation, would it apply the other way around? Or will causation always imply association?

15. 1) Although most of the data on these charts is completely unrelated (i.e., the shape of a mountain and NY murder rates), you could say that the increase in average global temperature has affected the public's interest in discovering new energy sources, which then might have caused the increase in the National Science Foundation's budget.
2) The population you will have to work with has, for the most part, already been exposed to clickers. Any study showing a correlation between learning and clickers would be biased since most of us know how they work and are comfortable with them already. To make it truly randomized, you would need to collect people of different ages, backgrounds, exposures to technology, and income levels.
3) Doesn't blocking cause more problems than it solves? Yes, you are dividing the groups and trying to eliminate variables, but at the same time, you are halving your sample size and creating probably more variables than you can reliably account for.

16. 1) These charts from Businessweek make the point that correlation does not imply causation. Identify a possible lurking variable for one of the charts.
> The charts are jokes and I don't think that there is a variable linked to both lines in any of the charts.

2) Let's say I wanted to compare the effects of teaching calculus in two different ways--with clickers and without. Why would it be difficult to conduct a randomized experiment to compare these methods here at Vanderbilt?
> I do not think it would be difficult to conduct this experiment.

3) What's one question you have about the reading?
> No questions

17. One example of a lurking variable is the study of global warming. Because global warming is becoming a big issue, more money is provided to study the phenomenon.
It would be difficult to conduct a randomized experiment because most of the students who take calculus also take classes that use clickers.
How do researchers decide how many samples would be enough to find a meaningful correlation in a population?

18. 1. One potential lurking variable for the plot titled “Would M. Night Shyamalan Start Making Good Movies Again if People Bought More Newspapers?” is the economy and a decrease in the sales of non-essential goods. Newspaper sales took a big hit from the economic crisis, as did the big film production companies, who could no longer shell out a ton of money for their productions to make sure they don’t suck.
2. It would be difficult to conduct a randomized experiment to compare these methods here at Vanderbilt because of all the other factors that would have to be controlled, such as professor, number of students, textbook, room location, etc., and Vanderbilt cannot easily provide ample classrooms in which to potentially make duplicates.
3. I’m a little confused about blocking. Is there a limit to how many variables we can block for?

19. 1.) In figure 6, the position from which the mountain is viewed is a possible lurking variable. Viewed from a different angle, the peaks of the mountain might not at all sync up with the peaks of the graph.

2.) Different classes are taught by different professors, whose individual teaching and grading styles may vary. This alone could greatly affect the outcome of the experiment (at least if the metric for evaluating the 'effectiveness' of teaching calculus is the grade in the class). Furthermore, we cannot assume that students have signed up for the classes at random. Students may have specifically signed up for the section taught by a particular teacher because they have some outside knowledge (e.g. ratemyprofessor.com). In this case a lurking variable would be whether or not students use ratemyprofessor.com. There are many other issues. In fact using classes at Vanderbilt to conduct this experiment might eliminate much experimental control.

3.) The experimental design section indicated that repeatability is an important factor in experimental design. Statisticians may repeat an entire study to ensure that the results are consistent. How are we to interpret disparate study results? That is, if we replicate a study several times and the results are not consistent, how are we to go about numerically investigating the anomaly, assuming we are confident that all experimental parameters have been consistently controlled?

20. For Chart #4, the lurking variable that affects both M. Night Shyamalan and newspapers could be technological advancements. People now expect a lot more out of their movies than just really bad plot twists. The quality of movies can be increased dramatically through proper use of technology (and writing). The same can be said for newspapers. Through the rise of the internet and digital media, print media is becoming less and less popular. Information can be up to the minute, and therefore quality can be a lot higher than could previously be achieved (note: CAN be). Another reason could be the economy, and how much people are likely to spend on movies. Nowadays if you see an M. Night Shyamalan movie, you know you got ripped off, and you are more averse to losing that money. Hence worse ratings, as your money doesn't go as far anymore, and so you're bitter. With print media, when all the same news is readily available and free, paying for it doesn't much make sense.

If you were comparing those two sets, it'd be difficult because of your sample set. It'd be biased in terms of average intelligence level, because of the location of the students (a top 25 college), and because of the sample size (not many people take calculus, especially when you consider Vanderbilt's size compared to most colleges). There's also a bias, in that someone who knows they'll be tested on the information from class may be more likely to pay attention in class, and the teacher will pay more attention to specific concepts. It's sort of like the case given in the book (1.7.2), without blinds or the placebo effect.

What are some of the lurking variables for the other Businessweek figures?

21. 1. These charts from Businessweek make the point that correlation does not imply causation. Identify a possible lurking variable for one of the charts.

In Figure 4, media budgets may cause worse Shyamalan films and lower budgets for newspaper ads.

2. Let's say I wanted to compare the effects of teaching calculus in two different ways--with clickers and without. Why would it be difficult to conduct a randomized experiment to compare these methods here at Vanderbilt?

Each calculus class has a different level of students and it would be difficult to account for other variables such as the effectiveness of the professor by simply comparing a few classes.

3. What's one question you have about the reading?

When it's unclear whether one variable causes the other, can the correlation still be compared?

22. 1. A lurking variable for the graph of money spent on scientific research and global warming opinion could be the amount of money spent on global warming research.

2. In order to conduct a random experiment you would need to have students from many different majors take the calculus classes, which is not likely because many of the students who pursue non-math-related majors would not have time or would not want to participate in the experiment.

3. Why can causation not be inferred from association in an observation?

23. 1) In figure 2, both global warming and the NSF R&D budget steadily increase with time. The possible lurking variable is the research needs of the United States as a function of time. R&D has been funded in steadily increasing amounts since the end of the Cold War. This is likely just a coincidence.

2) The possible reason this study would not be effective is that you will not get the same class to take calculus twice (at least without knowing it from the first class). Thus, the study won't have the same sample for both with clicker and without. This means that there will be a large source of error in the study unless it is performed many years in a row alternating between with clicker and without.

3) What is a qualitative method of measuring randomization such that you can tell if a sample is sufficiently randomized?

24. 1. For Figure 1, a lurking variable could be the internet itself. The internet is a variable correlated to both the explanatory and response variables. Facebook cannot be the only leading cause of the Greek debt crisis.
2. It would be difficult to conduct a randomized experiment to compare these methods here at Vanderbilt because there is a large population to take from and it would be difficult to get meaningful data. Also, not everyone takes calculus, and thus it would not be effective to do solely random sampling. It may be better to use blocking techniques.
3. I am not exactly sure how to identify lurking variables.

25. 1. Figure 3 has the lurking variable of the state of the economy. As the economy prospers, people have more children (more children of every name) and house prices go up. Then the recession occurred, people weren't economically stable and had fewer children (fewer of every name) and couldn't afford houses, and the index dropped.

2. You couldn't randomize the experiment, otherwise half of the class would have to follow a different grading rubric since they would not be able to participate in clicker questions. They'd also have to leave the classroom so as not to be affected by the process. Lastly, in order to get a large enough sample size (enough replication) you'd need to have multiple calc classes in the experiment, but each professor teaches differently (some well, some poorly) and this would greatly affect results because it would introduce another uncontrolled variable.

3. The reading mentioned at least five times that causation cannot be determined from observational studies even when it is shown that all other variables were controlled. Why is this?

Also, a quick question: it says a lurking variable has to be one that correlates with both the response and independent variable. But in most of the graphs in the first question, there wasn't a single confounding factor for both; the two graphs just happen to both be increasing for completely different reasons (e.g., inflation goes up and population goes up). So what do you call those sorts of variables where they affect one, but not both?

26. 1. For the first chart, Greek debt would be a lurking variable.
2. Clicker integration and use varies with each teacher. Each teacher's clicker use is not random, and that throws off any hope of generating a study that limits the effect of teachers on teaching.
3. I don't have any questions from this reading.

27. 1) Figure 3: One possible lurking variable would be the economic impact of the housing bust on family planning. Families who wish to expand may choose to wait until their financial stability improves. Therefore, instead of Avas causing the US housing bubble, the bubble itself may have impacted the family planning of millions in the US, which led to fewer babies born after the bubble burst.

2) Probably because it is hard to obtain two test groups that are similar and comparable to each other.

3) How do you judge whether your sample is sufficiently random?

28. 1) A possible lurking variable for figure 6 is the amount of legislation in the state of New York concerning the acquisition of guns. Fewer gun laws would generally make it easier to obtain a potential murder weapon.
2) There are too many outside variables that could affect the outcome of the study. For instance, something as harmless as the time of the class could influence how awake students are and how prepared they are for the class, which in turn might weight the study in one class' favor over another's.
3) Is there a scientific way to say that, for example, ice cream sales do not affect number of shark attacks in a given month but rather that both are caused by warm/cold weather instead of just saying "it's common sense"? Is there a way to prove that two data that are both effects of the same cause, and which therefore have a misleading relationship, are not actually caused by each other?

29. 1) For figure 2, the global temperature might have increased because of global warming. The national budget has also increased and so would the science budget.

2) It would be tough to randomize a trial because each teacher would have to teach with/without clickers. Any data collected could be caused by the teachers themselves and not the clickers.

3) I'm not sure what the advantages of cluster sampling are. Why not just conduct a simple random sample?

30. 1. A lurking variable in the National Science Foundation chart may be the amount of CO2 emissions, which may affect both the research funding aimed at reducing the emissions and the temperature of the planet.

2. Because it would be hard to block all the other factors that affect learning other than the use of clickers.

3. Is there a mathematical way of determining lurking variables?

31. 1. These charts from Businessweek make the point that correlation does not imply causation. Identify a possible lurking variable for one of the charts.
Population size is a possible lurking variable for the global warming chart.

2. Let's say I wanted to compare the effects of teaching calculus in two different ways--with clickers and without. Why would it be difficult to conduct a randomized experiment to compare these methods here at Vanderbilt?
It would be difficult to remove the bias of the professor and the disposition of students taking the class. For something of this nature, perhaps the best means would be to compare different years--all with clickers one year and all without clickers another year. Even this has its drawbacks though and would need several iterations to remove differences within the groups year-to-year.

Nowadays it is common to take a large source of data (e.g., all patient records for a hospital or state) with many variables and try to identify causal links within them. This textbook seems to frown unconditionally on the use of observational data to draw causal links. However, at what point can a strong association in such a comprehensive observational dataset imply causation?

32. 1) A lurking variable for the newspaper/M Night Shyamalan graph is that newspapers are readily available online or on e-readers now.

2) There is a small number of calculus classes at a smaller private school like Vanderbilt. This would not allow for a large amount of randomization.

3) What is the quantitative way to decide how many people you should randomly select?

33. 1) In the "Would M. Night Shyamalan Start Making Good Movies Again if People Bought More Newspapers?" chart, a possible lurking variable would be the increase in the usage of the internet. As more people use the internet, fewer people buy newspapers, and more people with good taste in movies have the ability to use sites such as Rotten Tomatoes to display their contempt for M. Night Shyamalan's unnecessary use of plot twists.

2) Some students have more experience using clickers than others, making it difficult to get an accurate read on which works better.

3) Was there a third question about the reading?

34. 1. For fig 3, a lurking variable could be an increase in the population, which would increase the number of babies named Ava and increase the house price index due to demand for more houses.

2. The sample is too small due to the small number of classes that teach calculus. Furthermore, there are other variables that may affect the effects of teaching.

35. 1. A possible lurking variable could be present in Figure 2 (global warming and R&D budget). That variable is the level of pollution. As pollution increased, global warming increased. Similarly, as pollution increased, the National Science Foundation increased the R&D budget to attempt to find ways to reduce the pollution level in an attempt to make the world cleaner. Thus, an explanatory variable which was not considered in the chart could have influenced both of the graphs and the two may not have any effect on each other directly.

2. It would be difficult to conduct a randomized experiment here at Vanderbilt because the students at Vanderbilt are (or at least should be) smarter than the general student population. By only collecting students from Vanderbilt, a bias is introduced into the experiment which may influence the final results. In addition, even just at Vanderbilt, there are likely other variables which could also influence the result (such as class time, intelligence of the class, how many extracurriculars the students in each class have, etc.).

3. I don't understand the concept of "blocking."

36. 1) The growth of global temperature and the growth of the NSF R&D budget are both related to the lurking variable of economic growth.

2) Whether or not a student uses a clicker in his or her calculus class is one of a myriad of factors in his or her success or failure to learn calculus. Others include professor, class size, and length of class. It would be difficult to control those other factors.

3) How does one test a sample for bias?

37. 1) A lurking variable for the first graph would be the number, or proportion, of Facebook users in Greece.

2) The sample size would be very small, and too many other factors that vary with the professor would also change the results. You would have to ensure that the time of day that the students took the class was consistent between clicker and non-clicker courses.

3) Is it possible or necessary to control for different variables in an observational study?

38. 1) In figure 4, the increase of internet news sites/blogs/commentary sites probably hasn't affected M. Night Shyamalan's ability to write a good story.
2) Yes, because if your entire student body is the population, most have taken calculus at different levels, and had many different teachers. Also I'm not sure any teachers teach calculus (or at least very few do) with clickers.
3) How can you account for a non-response bias?

39. 1. Most of the correlations seen in these graphs are probably just coincidental. A lurking variable that might exist in the M. Night Shyamalan graph is the number of people using the internet. More people surfing the internet could mean more news is being obtained that way, causing a decrease in newspaper sales. More people on the internet could also lead to more online discussions and reviews of current movies, leading to higher standards for critics, thus Mr. Shyamalan's plummeting Rotten Tomatoes score.

2. Conducting this experiment at Vanderbilt would be difficult because the students here are not randomly selected and tend to be "above average" academically. Therefore, an experiment testing the effects of teaching calculus with and without clickers might not apply to the entire population of college students.

3. Even in a double blind test, doesn't someone have to know who's getting which treatment? Can't that person introduce bias? It seems then that an experiment can never truly be blind.

40. 1. For figure 1, the lurking variable is simply time. As time increases, Facebook's popularity increases, resulting in an increasing number of Facebook users. As time increases, the yield on bonds also increases, due to values of interest.
2. Most students at Vanderbilt have already used clickers so it might be skewed in favor of using clickers just because students are more accustomed to them.
3. Can you give some more examples of lurking variables?

41. 1.) A possible lurking variable in the chart of Babies named "Ava" and Housing price index might be an increase in population. An increase in population would mean more children being born and consequently more babies named Ava. This would also increase demand for houses and can lead to an increase in the price of houses.

2.) It is going to be hard to conduct such an experiment because it would not be truly randomized: most people taking Calculus here at Vanderbilt have already been taught calculus without clickers at an earlier stage, and that would not be accounted for in the experiment.

3.) In cases where a really large number of possible lurking variables have been considered and none found, is it safe to conclude causation?

42. 1. The number of active Facebook users in graph 1.
2. There is no control, not enough randomization and not enough replication.
3. At what point is there enough replication to make a correlation between variables you are studying?

43. 1. A lurking variable for fig. 3 "Did Avas cause the US Housing bubble" could be an increase in births in general. More babies would likely mean more Avas, and could also cause the demand for houses to go up, which would increase the price index.
2. Calculus is a "weed-out" course at Vanderbilt, so the grades will likely end up in the same ranges regardless of whether a clicker is used in the class or not.
3. How do you decide what a meaningful sample size would be? I know that with baseball statistics, they have different minimum sample sizes for each statistic, but how do you go about finding those?

44. 1. I'm not sure about the meaning of this question. In figure 1, there is a variable x for the rate of increase. The rate of increase in Facebook users is similar to that of the yield on Greek government bonds.
2. Because learning ability differs from student to student, it's difficult to conclude that one group of students doing better than the other is due to using the clicker. Other things during the experiment may also influence the result.
3. From my point of view, it's impossible to create a complete randomization. What can we do to improve randomization as much as possible?

45. 1. In the example with the babies named Ava and the housing index, there could have been a boom in childbirth as more people could afford to have more children, and then when the housing market collapsed, fewer children were born as parents, thinking ahead, didn't have the money to spend on children in the near future, which would lead to a decrease in baby names altogether. Additionally, in the global warming example, more money could have been budgeted to the National Science Foundation to research global warming as it became a major talking point in the political media, and that could have caused the budget to increase at a rate similar to the actual increase in global temperature.
2. This experiment would never be completely random at Vanderbilt. Too many students have taken calculus in high school, and will be quicker at re-learning, or simply regurgitating the material. This will more often than not skew the results of the clicker vs. non clicker experiment because there could be some correlation between the clicker vs. non clicker that won't necessarily be caused by the fact that either clickers are used, or not used.
3. Although it's definitely true that correlation doesn't imply causation, with pure statistical numbers, is there any size of population measured to determine true causation and not correlation?

46. 1. One lurking variable for the first chart is time. Facebook users increase with time as well as the Greek debt.

2. This is difficult because the only difference between the classes can be the use of clickers. Everything else needs to be exactly the same.

3. When is correlation most useful?

47. 1. A possible lurking variable for the Global Warming / National Science Foundation chart could be that increased industrialization is causing both more pollutants to be shot into the atmosphere (increasing temperatures) and spurring a greater public interest in science and tech (increasing the R&D budget). Alternatively, the increasing temperature could be the reason for the increasing budget, as the NSF could be researching ways to counter the trend.

2. It would be very difficult to do a randomized trial about teaching with clickers or without at Vanderbilt because there are very few (or no) calculus classes that use clickers. The sample size for that group would be very small and likely to have additional biases.

3. My major lurking question (punny!) is how cluster sampling would not adversely affect the randomness. Say you took a sample of education levels among all the counties in a given state by choosing 4 random counties and then 200 people in each. What happens if you choose three counties that are fair representations of the average state, but then one that has an abnormally high education rate, which results in very skewed returns? Instead of returning something like 5% master's degrees for the whole state, you would have 1/4 of your people coming from one place that has a 70% rate. That would give you back about 21%.
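The arithmetic in this example can be checked with a short sketch. It assumes exactly the numbers given in the comment (three sampled counties at 5%, one at 70%, 200 people from each); the function name `pooled_rate` is made up for illustration and does not come from the textbook.

```python
# Hypothetical figures from the comment: three typical counties and one outlier.
county_rates = [0.05, 0.05, 0.05, 0.70]  # share with a master's degree in each sampled county
people_per_county = 200                  # same number sampled from every county


def pooled_rate(rates, n_per_cluster):
    """Pooled estimate from a cluster sample with equal-sized clusters."""
    total_with_degree = sum(rate * n_per_cluster for rate in rates)
    total_sampled = n_per_cluster * len(rates)
    return total_with_degree / total_sampled


estimate = pooled_rate(county_rates, people_per_county)
print(f"Pooled estimate: {estimate:.2%}")
```

With equal-sized clusters the pooled estimate is just the average of the four county rates, (5 + 5 + 5 + 70) / 4 = 21.25%, so a single unusual county can pull the estimate far from the 5% typical of the rest.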

48. 1. A lurking variable could be that because the temperature of the earth is increasing, the government is spending more money on R&D. Not the other way around.
2. There are tons of other variables involved than just clickers. Different professors for example.
3. It mentions association versus causation. It didn't discuss correlation versus causation.

49. 1. For global warming, a lurking variable connecting the two is that as the Earth gets warmer, people become more concerned and therefore more budgets are allocated to science to fix the problem.
2. Despite how great the math department is, only select teachers use clickers, and those teachers may be either great or bad. Because of the small sample size of teachers, even one 'outlier' of a teacher (good/bad) could greatly change the results.
3. I read 1.6.2 - what are the real world applications of the 3 sampling methods?

50. 1. A lurking variable for figure 2 could be global warming, since scientists need R&D funds to study it and it is "supposedly" increasing the Earth's temperature.
2. It is hard to determine how good your data will be because there are other factors that can come into play with the experiment. Some of the students may not have taken a calculus course or even used a clicker before. The data will be skewed a certain way depending on who you sample at the university.
3. I am having trouble understanding the blocking of data and how this process is accomplished correctly.

51. 1. Years.
2. Because it is difficult to apply the blocking principle to our experiment.
3. How can we block by separating low-risk and high-risk patients and still randomize at the same time, when each patient's susceptibility to a disease can only be approximated through family history?

52. 1. A lurking variable in the housing and Avas chart is the number of babies. If more babies are born, then we can expect more babies to be called Ava and in turn more houses will be bought due to growing families that can't be supported by apartments.

2. There are not enough students here at Vanderbilt to satisfy the replication principle of randomized experiments. The experiments will have to be done in numerous schools.

3. None.

53. 1) For Figure #4, a lurking variable could be the rise of the internet and other electronics. Many do not feel the need to purchase newspapers anymore. These people end up reading articles online, causing less of the actual newspaper to be read and less concern for things such as the director's movie ratings.

2) Clicker questions are often discussed, skewing results by trading individual work for group effort. The professor may pay greater attention to the group without clickers simply because they do not have them as a learning tool. Also, Vanderbilt in general uses clickers. I know I personally have used them in at least six classes. Students have already decided that they like clickers, so if they are taken away the students may unknowingly stop focusing as hard.

3) Does a lurking variable act as a direct causal joint operation (as in this causes this by way of lurking variable) or is it simply related to the two given variables?

54. 1. A possible lurking variable in Fig. 2 is that the National Science Foundation R&D Budget could be rising for other reasons than to research global warming. For instance, it could be rising due to an increased emphasis on cancer research or something of that sort.
2. I have not taken any of the lower-level calculus courses here - only Math 175 - so I do not know if clickers are commonly used in the courses. If they are, it would be difficult to perform this randomized experiment because there wouldn't be enough cases of classes that do not use clickers to measure the effect. The same is true if clickers are not used often, as there wouldn't be enough cases of classes that do use the clickers frequently. In addition, it would be hard to randomize the student sample in each class, as the students have the right to pick which class they want to be in. This could introduce bias, as some students may learn exceptionally well with clickers, while others may not.
3. I do not have any questions about the reading. Much of the information has either been discussed in class, or I have learned it in previous Psychology courses.

55. A)
Let's take Fig. #3: The figure is an example of anecdotal evidence. According to the figure, children are the reason for the housing bubble (it ignores the facts and makes up a very interesting one). The goal seems to be to make viewers unaware of the real reasons for the housing bubble. Maybe the number of babies is an explanatory variable in the relationship. However, the study has an interest in making the number of babies the main reason. Because of that, the figure is biased by this weird interest!

B)
It is difficult because humans are the ones the experiment (study) will be done on. In other words, the results of the study may change because of our emotions. We, the humans, will know by common sense what is supposed to be the right choice (answer) by default in this study. To solve this issue, we should blind the two groups if possible (not by making them unable to see anything, but by fooling them!).
Let me give the expected result from the students. Most of them (if not all) will say the clickers are much better. However, they are not better because of the advanced tech. It is because students will be in a much more relaxed mode with clickers (they will become lazier).
Another story: most students will say it is not good to have clickers. The reason is not that clickers are bad or slow down education. The reason is that they are expensive for the students (so it affects the students from a totally different aspect).
Therefore, the study is biased. The solution is that we need to blind the students somehow!

I think my answer is clear

C)
I am not sure about principles of experimental design.....

Thanks

56. 1) A possible lurking variable is the stated purpose (what they're being given that money FOR) of the National Science Foundation's R&D budget.

2) There would be a limited sample size for the experiment, "not using clickers" is rather vague/not specific, and one cannot randomize the students in the calculus classes because students have control of their own scheduling.

3) The reading says that a lurking variable is one which is correlated with both other variables being examined, but this makes question 1 of this quiz seem "loaded," in that the whole point of the graphs is that they're absurd because the two variables appear to be unconnected. I had to stretch just to find something that kind-of linked the two variables of a graph.

57. 1. In Figure 2, global wealth may be responsible for both global warming (more technology causes more waste), and an increase in the National Science Foundation's R&D Budget.
2. Because you're still only teaching students who want to take Calculus, you won't get a representative sample of Vanderbilt students.
3. How is data collected using blocking typically analyzed?

58. 1) A variable pertaining to gun control strictness in the final graph
2) There are a plethora of lurking variables in such a study that the final data could be easily contested
3) Better explanation of lurking variables

59. 1. One possible lurking variable for figure 2 would be CO2 emissions.
2. It would be difficult to conduct a randomized experiment because bias will unintentionally arise. This may be because some students prefer to have class without the clicker, and some students may not participate, throwing off the statistical data.
3. What would a correct answer to #2 be? I had a range of answers but I wasn't sure which one was right.