As one might expect, the blogosphere has been abuzz with talk of the NSA’s collection of telephone call logs. Some sample quotes
I’ve encountered:
But the paranoia on the left, and in particular, the hatred for the Bush administration has become so intense there
is an automatic assumption that the NSA has to be engaging in nefarious activity, spying on you and your neighbor. The idea that the
agency is thinking creatively and proactively about how they can legally monitor the bad guys instead of just going about business
as usual is, apparently, out of the question for some. The sad truth is it is probably going to take another devastating attack to
convince many in this country that we are actually at war against Islamic jihadists.
That is something true civil libertarians should think long and hard about. The more vigilant we are today in preventing attacks,
the more it will pay off in spades in terms of protecting our civil liberties in the future. Because if this country gets hit with a
small nuke and 30,000 or 100,000 Americans die, all of the debating will be over. The ensuing crackdown will be massive, and the
loss of REAL civil liberties will become very, very possible.
That may be true. At the very least, it touches upon something at the core of the debate over privacy, the Patriot Act, and
surveillance programs: at what price? To what extent can security be developed and maintained without infringing “too much” on
individual liberties, freedom from government intrusion into private lives, and the right to privacy in general?
Next up: your bank records? Your medical records? Your credit card records?
On the medical-records front… you might want to reread the HIPAA legislation, and perhaps educate yourself about the MIB (the Medical Information Bureau).
As for credit cards… well, I don’t think I’m alone in not being particularly surprised that the NSA was (or could be) mining telephone
records. In fact, I wouldn’t be at all surprised if one of the reasons the story has caught on and spread so quickly is that it’s
exactly the sort of thing that many of us suspected could be happening.
I, for one, have similar suspicions about credit card records, based on prior legislation proposed or passed, and based on my
limited experience in working with a bank.
DefenseTech, via Donklephant:
Here’s what Krebs had to say about the newly-revealed NSA program that aims to track “every call ever made”: “If
you’re looking for a needle, making the haystack bigger is counterintuitive. It just doesn’t make sense.”
I don’t agree with that assertion. At least in the modeling/mining exercises I’ve participated in, more data was always better when
subtle patterns indicating infrequent events were being sought. Yes, the larger the database, the bigger the data management
headaches. There is definitely something to be said for only pulling “just enough” data to build the model you’re working on, but
unless calls within terror cells are commonplace in the country, “just enough” is likely to equal “almost all”, at least for the
types of models I’d imagine the NSA working with.
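To put rough numbers on the “just enough” versus “almost all” point: if only one record in N reflects the rare behavior being modeled, a sample needs about k × N records to contain k examples of it. Below is a minimal Python sketch of that arithmetic. The one-in-ten-million rate is a made-up figure for illustration, not anything known about terror-cell calling.

```python
import math


def records_needed(one_in: int, positives_wanted: int) -> int:
    """Records needed, on average, to contain `positives_wanted` rare cases
    when one record in `one_in` is of the rare type."""
    return positives_wanted * one_in


def prob_at_least_one(one_in: int, n: int) -> float:
    """Probability that a random sample of n records holds at least one rare case.

    Computes 1 - (1 - 1/one_in) ** n in a numerically stable way.
    """
    return -math.expm1(n * math.log1p(-1.0 / one_in))


# Hypothetical base rate: one relevant call in ten million.
print(records_needed(10_000_000, 100))                      # 1000000000
print(round(prob_at_least_one(10_000_000, 10_000_000), 3))  # 0.632
```

At that (hypothetical) rate, a sample of ten million records, which sounds enormous, has only about a 63% chance of containing even one example, and collecting a hundred examples takes a billion records.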
In the fuss yesterday, it was said that the NSA was respecting privacy by only obtaining phone numbers, and not getting
names/addresses as part of the pulls. Others correctly observed that it’d be trivial to get the name and address associated with a
phone number.
I submit that it would be possible for the NSA to build its database and do its data-mining magic while better respecting the
privacy rights of innocent citizens.
From what we learned yesterday, the NSA obtained records that probably contain the following fields:
- Phone number making the outgoing call
- Phone number receiving the call
- Time/date of call
- Length of call
If I were building this database, I’d rely on a technique I’ve used when working with credit data. When I do a retro study, the
vendor removes all information that would permit me to identify an individual. However, my data may contain records in different
tables, whose links I would like to maintain. In those situations, I’ll provide a “link value”, and when the vendor sends the data
back to me, it encodes those link values using an algorithm unknown to me. Links in the data are preserved, but I lack the ability
to tie data back to real individuals.
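Here is a minimal sketch of the link-value idea. The vendor’s actual algorithm is unknown to me, so the code substitutes a keyed hash (HMAC-SHA256 from Python’s standard library) as an assumption; the table and field names (`accounts`, `payments`, `acct`) are hypothetical. Because the same input always encodes to the same output under the same key, the tables can still be joined, but without the key the encoded values can’t be traced back to accounts.

```python
import hashlib
import hmac
import secrets


def encode_link(value: str, key: bytes) -> str:
    """Replace a link value with a keyed digest (HMAC-SHA256, a stand-in algorithm).

    Same value + same key -> same output, so joins survive; without the key,
    the output cannot be recomputed or traced back to the original value.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(rows, link_field, key):
    """Return copies of `rows` with `link_field` replaced by its encoded form."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[link_field] = encode_link(row[link_field], key)
        out.append(new_row)
    return out


# Hypothetical tables that share the link value "acct".
vendor_key = secrets.token_bytes(32)  # held by the vendor only
accounts = [{"acct": "A-1001", "limit": 5000}, {"acct": "A-1002", "limit": 2500}]
payments = [{"acct": "A-1001", "late": 0}, {"acct": "A-1002", "late": 2}]

accounts_out = deidentify(accounts, "acct", vendor_key)
payments_out = deidentify(payments, "acct", vendor_key)

# The analyst can still join the two tables on the encoded link value.
joined = {
    a["acct"]: (a["limit"], p["late"])
    for a in accounts_out
    for p in payments_out
    if a["acct"] == p["acct"]
}
```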
In the NSA’s case, I’d recommend having the phone company supply pseudo-phone-numbers. E.g., rather than supplying the phone number
202-555-0101, encode it using an algorithm and encryption keys unknown to the NSA but common among the phone companies, to
generate a pseudonumber that might look like 67ad7891f78e. The database would then look something like this:
- Pseudonumber making the outgoing call
- Pseudonumber receiving the call
- Time/date of call
- Length of call
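The pseudonumber scheme could be sketched the same way. This assumes, hypothetically, that the phone companies share a secret HMAC key; the names `CallRecord`, `pseudonumber`, and `pseudonymize` are illustrative. The secret key matters more here than it might seem: there are only about ten billion possible ten-digit numbers, so an unkeyed hash could be reversed simply by hashing every possible number. Truncating the digest to 12 hex characters mimics the 67ad7891f78e example, though a real system would keep more of the digest so that two numbers don’t collide.

```python
import hashlib
import hmac
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    caller: str   # phone number, or pseudonumber once encoded
    callee: str
    started: str  # time/date of call (ISO 8601 here, by assumption)
    seconds: int  # length of call


def pseudonumber(phone: str, key: bytes, length: int = 12) -> str:
    """Map a phone number to a pseudonumber with a keyed hash (HMAC-SHA256).

    Formatting is stripped first so "202-555-0101" and "(202) 555-0101" agree.
    length=12 mimics the post's example; keep more hex digits in practice,
    since 48 bits invites collisions across hundreds of millions of numbers.
    """
    digits = "".join(ch for ch in phone if ch in string.digits)
    digest = hmac.new(key, digits.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize(record: CallRecord, key: bytes) -> CallRecord:
    """Swap both phone numbers for pseudonumbers; keep time and length as-is."""
    return CallRecord(
        caller=pseudonumber(record.caller, key),
        callee=pseudonumber(record.callee, key),
        started=record.started,
        seconds=record.seconds,
    )


# Hypothetical key, known to the phone companies but not to the NSA.
SHARED_KEY = b"hypothetical-key-shared-by-phone-companies"
raw = CallRecord("202-555-0101", "312-555-0199", "2006-05-11T09:15", 240)
print(pseudonymize(raw, SHARED_KEY))
```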
To restore some of the meaning lost in creating pseudonumbers, I’d also have a second table with fields like:
- Pseudonumber
- Exchange (i.e., city/state)
- Type of phone (residential, commercial, cellular, payphone, VOIP)
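The second table lets an analyst ask questions about kinds of phones without ever seeing a number. Here is a minimal sketch using made-up pseudonumbers and records (only 67ad7891f78e comes from the example above): join each call to the attribute table, then count calls by phone type or by exchange. The function name `calls_by` is hypothetical.

```python
from collections import Counter

# Made-up attribute table: pseudonumber -> exchange and type of phone.
attributes = {
    "67ad7891f78e": {"exchange": "Washington, DC", "type": "residential"},
    "1c9e04b2aa31": {"exchange": "Chicago, IL", "type": "cellular"},
    "f03b5d6e8c27": {"exchange": "Chicago, IL", "type": "payphone"},
}

# Made-up pseudonymized call records: (caller, callee, time/date, seconds).
calls = [
    ("67ad7891f78e", "1c9e04b2aa31", "2006-05-11T09:15", 240),
    ("f03b5d6e8c27", "1c9e04b2aa31", "2006-05-11T22:40", 35),
    ("f03b5d6e8c27", "67ad7891f78e", "2006-05-12T01:05", 20),
]


def calls_by(calls, attributes, field):
    """Count calls by (caller's field, callee's field), e.g. field="type".

    Pseudonumbers missing from the attribute table are counted as "unknown".
    """
    counts = Counter()
    for caller, callee, _started, _seconds in calls:
        caller_value = attributes.get(caller, {}).get(field, "unknown")
        callee_value = attributes.get(callee, {}).get(field, "unknown")
        counts[(caller_value, callee_value)] += 1
    return counts


print(calls_by(calls, attributes, "type"))
```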
The NSA could then use that database to build a model that identifies what different types of “normal” or “anomalous” calls might
look like.
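I don’t know what models the NSA actually uses, so the following is only a toy stand-in: it flags pseudonumbers whose count of outgoing calls is far from typical, using the median-based “modified z-score” (0.6745 × deviation from the median ÷ median absolute deviation, with the commonly suggested 3.5 cutoff). A real model would use many more features (call timing, duration, the types of phones at each end), but the input would be the same pseudonymized table, with no real numbers involved. The function name `flag_anomalies` is hypothetical.

```python
from collections import Counter
from statistics import median


def flag_anomalies(calls, threshold=3.5):
    """Flag callers whose outgoing-call count is far from the median.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. `calls` is an iterable of (caller, callee)
    pseudonumber pairs; only callers that placed at least one call are scored.
    """
    counts = Counter(caller for caller, _callee in calls)
    if not counts:
        return set()
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return set()  # no spread to measure against; flag nothing
    return {
        number
        for number, count in counts.items()
        if abs(0.6745 * (count - med) / mad) > threshold
    }


# Toy data: nine pseudonumbers, one of which places far more calls than the rest.
demo_counts = [4, 5, 5, 6, 5, 4, 6, 5, 40]
demo_calls = [(f"p{i}", "q0") for i, n in enumerate(demo_counts) for _ in range(n)]
print(flag_anomalies(demo_calls))  # {'p8'}
```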
Once the model is built/trained, the agency could go back to the phone companies (after receiving appropriate blessings from
judicial officials providing oversight) with a list of phone numbers associated with “persons of interest”, and require the vendors
to identify which pseudonumbers map to those phone numbers. Consult the model to identify other potentially interesting
pseudonumbers, and (again, after appropriate oversight-blessing) obtain the decoded phone numbers mapping to those pseudonumbers.
Wash, rinse, repeat (if needed).
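The wash-rinse-repeat loop might look like this in outline. The `approved` flag is a hypothetical placeholder for judicial sign-off, and the “model” is reduced to a one-hop search for pseudonumbers that exchanged calls with a person of interest. Since a keyed hash can’t be run backward, the sketch assumes the phone company “decodes” by keeping a reverse lookup from pseudonumbers to its own subscribers’ numbers. `PhoneCompany` and `contacts_of` are illustrative names, not any real system.

```python
import hashlib
import hmac


class PhoneCompany:
    """Holds the secret key and real numbers; the analyst sees neither."""

    def __init__(self, key: bytes, subscribers):
        self._key = key
        # A keyed hash can't be run backward, so "decoding" is a reverse lookup
        # over the company's own subscriber numbers.
        self._reverse = {self.encode(n): n for n in subscribers}

    def encode(self, number: str) -> str:
        return hmac.new(self._key, number.encode("ascii"), hashlib.sha256).hexdigest()[:12]

    def decode(self, pseudonumbers, approved: bool):
        # `approved` is a hypothetical stand-in for judicial sign-off.
        if not approved:
            raise PermissionError("decoding requires oversight approval")
        return {p: self._reverse[p] for p in pseudonumbers if p in self._reverse}


def contacts_of(calls, targets):
    """Toy 'model': pseudonumbers that called, or were called by, a target."""
    found = set()
    for caller, callee in calls:
        if caller in targets:
            found.add(callee)
        if callee in targets:
            found.add(caller)
    return found - set(targets)


# Hypothetical setup: four subscribers, two pseudonymized calls.
company = PhoneCompany(b"hypothetical-shared-key",
                       ["2025550101", "2025550102", "3125550199", "3125550150"])
calls = [(company.encode(a), company.encode(b))
         for a, b in [("2025550101", "3125550199"), ("3125550150", "2025550102")]]

# 1. The company maps the person of interest's number to a pseudonumber.
targets = {company.encode("2025550101")}
# 2. The analyst consults the "model" using pseudonumbers only.
leads = contacts_of(calls, targets)
# 3. With approval, the company decodes just those leads.
print(company.decode(leads, approved=True))  # maps one pseudonumber to '3125550199'
```

Only the pseudonumbers the model surfaces, and that oversight approves, are ever turned back into phone numbers; everyone else stays a string of hex digits.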
If the exercise were designed like that, or in some other manner that didn’t disclose information about normal “uninteresting”
people, I’d find the exercise less problematic.
Mike The Actuary’s Musings » German Court Rules Data-Mining to be Invasion of Privacy // 24 May 2006 at 6:38 pm
[...] On the other hand, I don’t like seeing data-mining written off as being inherently evil. It’s a tool. Like other tools, it can be used in an “evil” manner, as well as a “good” one. I’ve written previously about one way that a data-mining exercise could (I think) be undertaken without significant erosion of privacy rights. [...]