Though the novel offers many interesting insights about cryptographic security, the point that stood out to me most was the “paradox of the false positive.” In chapter eight, Marcus comes up with the idea of cloning arphids to create a large number of unusual travel patterns and consequently bog down the DHS’s tracking systems. He goes on to explain that any time you collect data on a wide scale, let’s say from one million people, the test’s error rate needs to be about as small as the rarity of the thing being looked for. For example, with one million people being tested, a 99% accurate test would still wrongly flag about 10,000 of them, which is very unhelpful if you’re looking for only one specific person. The more people you test, and the rarer the thing you’re searching for, the less usable your test becomes.
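To check the arithmetic for myself, here is a minimal sketch in Python. It assumes that “99% accurate” means the test gives the wrong answer for 1% of the people who are not the target; the function name and its parameters are my own, not anything from the novel.

```python
def expected_false_positives(population: int, accuracy: float, true_cases: int = 1) -> float:
    """Expected number of people wrongly flagged.

    Assumption: 'accuracy' is the share of non-targets the test clears
    correctly, so (1 - accuracy) of them are flagged by mistake.
    """
    innocent = population - true_cases
    return innocent * (1 - accuracy)


# One million people, a 99% accurate test, one person actually being sought
wrongly_flagged = expected_false_positives(1_000_000, 0.99)
print(f"Expected false positives: {wrongly_flagged:,.0f}")
```

Under that assumption, roughly ten thousand people are flagged in order to find one, which is the heart of the paradox.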
What I found so interesting about this point is how counterintuitive the math behind this sort of data mining is. Finding usable data in this manner becomes more and more difficult as the amount of data increases. However, just hearing the phrase “99% accuracy” creates a false sense of security, which becomes dangerous when we rely heavily on technologies like these to find information for us. What happens when the accuracy is lower than 90%? Lower than 80%?
One thing we have discussed in class more than once is the idea of data mining, especially in schools, to find patterns that might predict crime before it occurs. The point that always gets brought up in favor of this sort of data mining is that it could potentially keep students safe, which of course would be beneficial. However, let’s say that these measures were implemented at Vanderbilt, with 12,725 students, and the test had a seemingly reasonable 95% accuracy rate at identifying potential threats. With a 5% error rate, about 636 students could be wrongly flagged as potential threats by the system. It’s impractical and illogical to question over 600 kids in order to find one or two possible actual suspects. Though neither Marcus nor I claim that all data mining is useless, seeing the numbers on how useful it really is puts the idea into better perspective.
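The Vanderbilt numbers can be worked out the same way. This is only a rough sketch: it assumes there are exactly two real threats on campus (my own illustrative guess, based on the “one or two” above) and that the 95% accuracy applies equally to catching real threats and to clearing everyone else. The function name is hypothetical.

```python
def share_of_flags_that_are_real(population: int, accuracy: float, true_cases: int) -> float:
    """Fraction of flagged people who are actual threats.

    Assumptions: the test catches 'accuracy' of the real cases and
    wrongly flags (1 - accuracy) of everyone else.
    """
    true_positives = true_cases * accuracy
    false_positives = (population - true_cases) * (1 - accuracy)
    return true_positives / (true_positives + false_positives)


students = 12_725
wrongly_flagged = (students - 2) * (1 - 0.95)   # about 636 students
real_share = share_of_flags_that_are_real(students, 0.95, true_cases=2)

print(f"Students wrongly flagged: {wrongly_flagged:.0f}")
print(f"Share of flagged students who are real threats: {real_share:.2%}")
```

Under these assumptions, well under one percent of the flagged students would be actual threats, so nearly everyone questioned would be innocent.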