More on the Kevin Bacon Problem and the NSA Überdatabase

19 May 2006

This is actually a response to a comment made to an earlier post of mine.

Background: Midtopia invoked the “Kevin Bacon Problem” to argue that the NSA database of telephone calls is foolish. I disagreed. My disagreement merited the following response from Sean Aqui:

An interesting take on the Kevin Bacon argument. I’ll take issue with a couple of points:

  1. If they’re looking for “anomalous” calls, fine. But to do that you need to have a suspect, and at that point you can get the job done using old-fashioned warrants and wiretaps. The Kevin Bacon problem is meant to point out the difficulty in spotting meaningful calling patterns in a sea of random data.

  2. I like the idea of using pseudonumbers to protect privacy. But that idea has two problems: One, there’s no evidence that that’s what they’re doing; and two, generating pseudonumbers scuttles your first point, because it would make it impossible to identify “anomalies”. You’d have no idea (to use your example) that the number your suspected terrorist was calling belonged to a fertilizer company or a disposable cell phone.

Just my pair of Abe Lincolns.

Good thoughts. However, I offer the following in the way of rebuttal. :)

On pseudonumbers: my comments were intended to refer to an earlier post I made on the subject (but forgot to link to *sigh*). One element of using pseudonumbers would be an additional data table that describes each pseudonumber, e.g. location and type of account (residential/commercial/cellular/payphone…), without disclosing the identity of the number's owner. (This is, by the way, essentially what I do when researching the use of consumer credit data for insurance purposes.)

Then, if you know a phone number of a suspected terrorist and you get a judge to agree with you, you can request that a phone company disclose the real name/number associated with the pseudonumber. If you find an anomalous call from that pseudonumber, go back to a judge and get permission to identify that accountholder.
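To make the pseudonumber idea concrete, here is a minimal sketch in Python. Everything in it is my own illustration, not a description of anything the NSA or a phone company actually does: the `Carrier` class, the `CARRIER_KEY` secret, and the use of a keyed hash (HMAC-SHA256) to derive pseudonumbers are all assumptions. The point is only that the analyst sees the attribute table, while the identity table stays with the phone company until a judge signs off.

```python
import hashlib
import hmac

# Hypothetical secret held only by the phone company (an assumption for this sketch).
CARRIER_KEY = b"example-key-held-by-the-phone-company"


def pseudonumber(phone_number: str, key: bytes = CARRIER_KEY) -> str:
    """Derive a stable pseudonumber from a real number using a keyed hash."""
    digest = hmac.new(key, phone_number.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


class Carrier:
    """Hypothetical phone company holding both tables."""

    def __init__(self, accounts):
        # accounts maps real number -> (owner name, location, account type)
        self._identity = {}   # pseudonumber -> (owner, real number); never shared
        self.attributes = {}  # pseudonumber -> non-identifying details; shared
        for number, (owner, location, account_type) in accounts.items():
            pseudo = pseudonumber(number)
            self._identity[pseudo] = (owner, number)
            self.attributes[pseudo] = {
                "location": location,
                "account_type": account_type,
            }

    def reveal(self, pseudo: str, court_order: bool = False):
        """Return the real owner and number only when a court order is presented."""
        if not court_order:
            raise PermissionError(
                "a court order is required to identify an accountholder"
            )
        return self._identity[pseudo]
```

With this layout an analyst can look up `carrier.attributes[pseudo]` to learn that a number is, say, a commercial account in a given city, but has to go through `reveal(..., court_order=True)` to learn whose it is.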

I think we both agree, however, that given other actions taken (or alleged to have been taken) by the current administration, it seems unlikely that the NSA is concerned enough about privacy to implement even basic protections like pseudonumbers.

Re the Kevin Bacon issue: it’s true that spotting anomalies in a large sea of data is difficult. But it’s not impossible. Such models already exist, for example the fraud-detection models used by credit card companies. If an anomalous transaction is made on your credit card, the fraud department investigates. However, what constitutes “anomalous” varies depending on your transaction patterns and the patterns of others like you.
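As a toy illustration of "anomalous relative to your own pattern", the sketch below builds a per-caller profile of the account types each pseudonumber usually calls and flags a call whose account type is rare for that caller. The function names, the 5% threshold, and the choice of account type as the only feature are all assumptions of mine; real fraud models are far richer and also compare you against similar accounts, which this sketch does not attempt.

```python
from collections import Counter, defaultdict


def build_profiles(history):
    """Count, per caller pseudonumber, how often each callee account type appears.

    history is an iterable of (caller_pseudonumber, callee_account_type) pairs.
    """
    profiles = defaultdict(Counter)
    for caller, callee_type in history:
        profiles[caller][callee_type] += 1
    return profiles


def is_anomalous(profiles, caller, callee_type, threshold=0.05):
    """Flag a call whose account type is rare in this caller's own history."""
    counts = profiles.get(caller)
    if not counts:
        return False  # no baseline for this caller, so nothing to compare against
    share = counts[callee_type] / sum(counts.values())
    return share < threshold
```

Note that this works entirely on pseudonumbers plus the non-identifying attribute table, so no accountholder has to be identified in order to raise a flag.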

Considering that several of the statisticians I’m familiar with who have worked on credit fraud models also have some classified government experience on their resumes… I wouldn’t be too surprised if a version of my “anomalous call” idea is already in use.

Yes, you do lose some information when anonymizing the data… but if you keep enough relevant but non-identifying details in the database and use the right types of tools, you should be able to build a strong model.

Tags: Privacy · War on Terror