By Joe Cassiere, Kevin Johnson, and Michael Thomas

# Who are the worst drivers?

Statistical analysis is important for civil engineers in developing our nation’s transportation systems. Speed limits, turning radii, traffic signaling and more are all designed to reduce the number of accidents on the roadways. Car insurance companies likewise rely on statistical evidence to set the rates they charge their customers. For the average motorist, it’s common to think that everyone else on the road is a bad driver, and stereotypes often cast certain groups of people as worse on the road than others. Which demographic, though, has the worst drivers? Are men or women more likely to get into accidents? Does one specific state or area of the country have more accident-prone drivers? Are younger drivers more likely to crash than older ones, or vice versa? Using statistical techniques, we will attempt to answer these questions.

To start, we will use data from the US Census Bureau’s website, which supplies information on accidents by age, state, gender and several other categories. For the question of age, our null hypothesis is that all age groups get into roughly the same number of accidents, while the alternate hypothesis is that at least one age group differs from the others. By setting up confidence intervals around the overall mean of the accident rates across the age groups, we will be able to tell which age groups are worse (or better) drivers.
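One possible reading of the age-group comparison is sketched below: build a normal-approximation (Wald) interval around each group's accident rate and check whether the pooled rate for all drivers falls inside it. The function names, the choice of a Wald interval, and the counts are all assumptions made for illustration; none of the numbers come from the Census Bureau.

```python
from math import sqrt
from statistics import NormalDist


def rate_interval(accidents, drivers, confidence=0.95):
    """Wald (normal-approximation) confidence interval for an accident rate."""
    p = accidents / drivers
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(p * (1 - p) / drivers)
    return p - half_width, p + half_width


def compare_to_overall(groups, confidence=0.95):
    """Label each group 'higher', 'lower', or 'similar' relative to the pooled rate."""
    total_accidents = sum(a for a, _ in groups.values())
    total_drivers = sum(d for _, d in groups.values())
    overall = total_accidents / total_drivers
    labels = {}
    for name, (accidents, drivers) in groups.items():
        low, high = rate_interval(accidents, drivers, confidence)
        if low > overall:
            labels[name] = "higher"   # whole interval sits above the pooled rate
        elif high < overall:
            labels[name] = "lower"    # whole interval sits below the pooled rate
        else:
            labels[name] = "similar"  # pooled rate is inside the interval
    return overall, labels


# Hypothetical (accidents, licensed drivers) counts, invented for illustration only.
example_groups = {
    "16-24": (120, 1000),
    "25-64": (280, 4000),
    "65+": (80, 1000),
}
overall_rate, labels = compare_to_overall(example_groups)
print(overall_rate, labels)
```

A group labeled "higher" or "lower" here is one whose interval excludes the pooled rate. Because each group also contributes to the pooled rate, this is a rough screen, not a formal test.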

To address the question of which state has the best drivers, we can again use hypothesis testing. Our null hypothesis is that most states have similar numbers of driving fatalities per driver, which would mean that the fatality rate has a low variance across states. Once we choose a variance threshold that is suitably low, the alternate hypothesis is that the true variance is higher than that threshold. Testing this hypothesis lets us determine whether all states have roughly the same number of driving fatalities per driver. To compare the driving skill of drivers by state, we can also find the overall mean and standard deviation of car accidents per capita in the United States by averaging all the states’ accidents per capita. Then we can construct confidence intervals of 90%, 95%, and 99%. By comparing each state’s accident rate to the intervals, we will be able to tell which states have significantly better or worse drivers.
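The state-by-state comparison could be sketched as follows, assuming the intervals are normal-theory bands of the form mean ± z·(standard deviation) computed across the states' per-capita rates. That interpretation, the function name `flag_states`, and the rates for the placeholder states "A" through "J" are all assumptions for illustration.

```python
from statistics import NormalDist, mean, stdev


def flag_states(rates, levels=(0.90, 0.95, 0.99)):
    """For each state, return the highest level whose band it falls outside, or None."""
    values = list(rates.values())
    center = mean(values)   # unweighted average of the states' rates
    spread = stdev(values)  # sample standard deviation across states
    flags = {}
    for state, rate in rates.items():
        flags[state] = None
        for level in sorted(levels):
            z = NormalDist().inv_cdf(0.5 + level / 2)
            if abs(rate - center) > z * spread:
                flags[state] = level
    return flags


# Hypothetical accidents per 1,000 residents for ten placeholder states.
example_rates = {
    "A": 9, "B": 10, "C": 11, "D": 9, "E": 10,
    "F": 11, "G": 9, "H": 10, "I": 11, "J": 20,
}
flags = flag_states(example_rates)
print(flags)
```

A state flagged at 0.99 lies outside even the widest band, while a state flagged `None` sits inside all three. Whether the rate is above or below the mean tells you if its drivers look worse or better.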

To test whether men or women are better drivers, we can use hypothesis testing again. The null hypothesis is that neither gender drives better than the other. Numerically, this means that the percentage of accidents that are men’s fault is 50%. The alternate hypothesis is that one gender is worse at driving than the other. Numerically, the alternate hypothesis is that the percentage of accidents that are men’s fault is either greater than or less than 50%.
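A two-sided, one-sample z-test for a proportion is one standard way to check a null value of 50%. The sketch below uses the normal approximation. The function name and the counts (560 of 1,000 accidents attributed to men) are hypothetical and only illustrate the mechanics.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0=0.5):
    """Two-sided one-sample z-test of H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard error under the null
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: accidents attributed to men out of all accidents sampled.
z, p_value = proportion_z_test(successes=560, n=1000)
print(f"z = {z:.3f}, p-value = {p_value:.5f}")
```

A small p-value would lead us to reject the null hypothesis that the share is 50%. The sign of `z` then indicates which gender accounts for more of the accidents.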

For a finished product, we wish to use at least two important infographics. The comparison of drivers by state could be effectively depicted in a heat map of the United States. The shade of each state on the map will reflect the number of crashes per capita for its population. To address other factors simultaneously, we hope to put together a meaningful mosaic plot. This type of data visualization will be useful for trying to make predictions for the entire United States population based on our available data sets. For example, the mosaic could reflect that 22-year-old men from the Northeast are the “most likely” to be in a fatal car accident. Our preliminary research has found that there is a significant amount of data regarding the blood alcohol content (BAC) of drivers in fatal accidents. We will keep this data in mind as we move forward with our analysis in case our findings lack the appropriate complexity. BAC data could provide us with enough data for a strong linear regression analysis. In conclusion, we feel confident that both the quantity and quality of data regarding car accidents will appropriately fit the scope of the project, and we look forward to the results of our analysis.