Occasional reflections on Life, the World, and Mathematics

Identifiability


A hot topic in statistics is the problem of anonymisation of data. Medical records clearly contain highly sensitive, private information. But if I extract just the blood pressure measurements for purposes of studying variations in blood pressure over time, it’s hard to see any reason for keeping those data confidential.

But what happens when you want to link up the blood pressure with some sensitive data (current medications, say), and look at the impact of local pollution, so you need at least some sort of address information? You strip out the names, of course, but is that enough? There may be only one 68-year-old man living in a certain postcode. It could turn into one of those logic puzzles where you are told that Mary likes cantaloupe and has three tattoos, while John takes cold baths and dances samba, along with a bunch of other clues, and by putting it all together in an appropriate grid you can determine that Henry is adopted and it’s Sarah’s birthday. Some sophisticated statistical work, particularly in the peculiar field of algebraic statistics, has gone into defining the conditions under which there can be hidden relations among the data that would allow individuals to be identified with high probability.
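To make the lone 68-year-old concrete, here is a minimal sketch in Python. The records, the field names and the postcode districts are all made up for illustration, and the check is far cruder than anything in the algebraic-statistics literature: it simply counts how many records share each combination of age, sex and postcode. Any combination that occurs exactly once points at a single person, name or no name.

```python
from collections import Counter

# Hypothetical, made-up records. Names have already been stripped out.
records = [
    {"age": 68, "sex": "M", "postcode": "OX1", "medication": "warfarin"},
    {"age": 34, "sex": "F", "postcode": "OX1", "medication": "none"},
    {"age": 34, "sex": "F", "postcode": "OX1", "medication": "statin"},
    {"age": 51, "sex": "M", "postcode": "OX2", "medication": "insulin"},
    {"age": 51, "sex": "M", "postcode": "OX2", "medication": "none"},
]

# Fields that look harmless on their own but can single someone out together.
QUASI_IDENTIFIERS = ("age", "sex", "postcode")


def key(record):
    """Return the combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def group_sizes(rows):
    """Count how many records share each quasi-identifier combination."""
    return Counter(key(row) for row in rows)


def unique_records(rows):
    """Return the records whose combination appears exactly once."""
    sizes = group_sizes(rows)
    return [row for row in rows if sizes[key(row)] == 1]


def smallest_group(rows):
    """Size of the smallest group: the k in what is usually called k-anonymity."""
    return min(group_sizes(rows).values())


exposed = unique_records(records)
```

In this toy table the only exposed record is the 68-year-old man, and anyone who knows he lives in that postcode now knows his medication. A smallest group size of 1 means the table offers no protection at all for at least one person.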

I thought of this careful and subtle body of work when I read this article about private-sector mass surveillance of automobile license plates, another step in the Cthulhu-ised correlations of otherwise innocuous information that modern information technology is enabling. Two companies are suing the state of Utah to block a law that prevents them from using their own networks of cameras to record who is travelling where and when, and from using that information for blackmail, er, market research.

The Wall Street Journal reports that DRN’s own website boasted to its corporate clients that it can “combine automotive data such as where millions of people drive their cars … with household income and other valuable information” so companies can “pinpoint consumers more effectively.” Yet, in announcing its lawsuit, DRN and Vigilant argue that their methods do not violate individual privacy because the “data collected, stored or provided to private companies (and) to law enforcement … is anonymous, in the sense that it does not contain personally identifiable information.”

So, in their representation, data are suitably anonymised if they don’t actually include the name and address. We’re just tracking vehicles. Could be anyone inside… We’re just linking it up with those vehicles’ household incomes. Presumably they’re going to target ads for high-grade oil and new tires at those cars, or something.
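I have no idea how these companies actually do their linking, but it is easy to sketch why "no names" is not much of a guarantee. The Python below is purely illustrative: the camera log, the plate numbers, the street names and the income table are all invented, and the rule that "home is wherever the car is usually seen at night" is my own assumption.

```python
from collections import Counter

# Hypothetical camera log: (plate, place, hour of day). No names anywhere.
sightings = [
    ("ABC123", "clinic car park", 9),
    ("ABC123", "Elm Street", 23),
    ("ABC123", "Elm Street", 23),
    ("XYZ789", "Oak Avenue", 22),
    ("ABC123", "Elm Street", 2),
]

# Hypothetical marketing table keyed by street. Again, no names.
households = {
    "Elm Street": {"income_band": "high"},
    "Oak Avenue": {"income_band": "middle"},
}


def likely_home(plate, log):
    """Guess a home location as the most common night-time sighting."""
    night = Counter(
        place
        for seen_plate, place, hour in log
        if seen_plate == plate and (hour >= 21 or hour < 6)
    )
    if not night:
        return None
    return night.most_common(1)[0][0]


def profile(plate, log, table):
    """Link one plate's movements to the household table via the guessed home."""
    home = likely_home(plate, log)
    visited = sorted({place for seen_plate, place, _ in log if seen_plate == plate})
    return {
        "plate": plate,
        "likely_home": home,
        "household": table.get(home),
        "visited": visited,
    }


result = profile("ABC123", sightings, households)
```

Neither table contains a name, yet the joined profile says that a high-income household on Elm Street has someone visiting a clinic. Anyone who knows who lives there can fill in the rest.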

 
