Hear Ye! Since 1998.
Oct 10

Anonymizing data by removing enough personal information

Rapleaf has an informative blog post about how to anonymize personal data more effectively:

Notice the new interest categories. Specifically, take a look at that bottom record: a 56+ year-old man who enjoys Twilight, knitting, and Motocross. In the dataset, there aren’t any other records that look like him. Furthermore, if we were given just that set of attributes, we’d be able to tie them back to that specific record. Even though each individual attribute is non-identifying, the dataset is no longer anonymous.

The goal of Anonymouse is to selectively exclude data from the cookies we drop so that our users are sufficiently indistinguishable. We define “sufficiently indistinguishable” using the notion of k-anonymity. A dataset is k-anonymous as long as every record in the set is identical to no fewer than k-1 other records. We can therefore think of a k-anonymous dataset as consisting of clusters of records, or equivalence classes, of size k or greater.
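To make that definition concrete, here is a minimal sketch of a k-anonymity check in Python. The function name `is_k_anonymous` and the sample records are my own illustration, not anything taken from Rapleaf's post or from Anonymouse itself. It simply counts how many times each exact record appears and requires every count to be at least k.

```python
from collections import Counter

def is_k_anonymous(records, k):
    """Return True if every record appears at least k times.

    Each group of identical records is one equivalence class, so this
    checks that every class has size k or greater.
    """
    class_sizes = Counter(records)
    return all(size >= k for size in class_sizes.values())

# Hypothetical records: (age band, gender, interests).
# frozenset keeps each record hashable so it can be counted.
records = [
    ("18-25", "F", frozenset({"Twilight"})),
    ("18-25", "F", frozenset({"Twilight"})),
    ("56+", "M", frozenset({"Twilight", "knitting", "Motocross"})),
]

# The last record has no twin, so the set is 1-anonymous but not 2-anonymous.
print(is_k_anonymous(records, 1))
print(is_k_anonymous(records, 2))
```

The 56+ man forms an equivalence class of size one, which is exactly why the set fails for any k above 1.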

Furthermore, we wouldn’t just like to k-anonymize the dataset; we’d also like to maintain as much valuable data as possible.
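The tension between anonymity and keeping useful data can be seen even in a naive baseline. The sketch below is not Rapleaf's algorithm (the excerpt doesn't describe one); it is an assumed greedy approach that drops whole attributes, most varied first, until the dataset is k-anonymous. The names `suppress_columns` and `rows` are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, k):
    """True if every distinct row appears at least k times."""
    return all(n >= k for n in Counter(rows).values())

def suppress_columns(rows, k):
    """Greedily drop whole columns until the rows are k-anonymous.

    Returns (kept_column_indexes, projected_rows). This is a crude
    baseline: it never suppresses a value for just one record.
    """
    if not rows:
        return [], []
    if len(rows) < k:
        raise ValueError("fewer than k records; cannot be k-anonymous")

    kept = list(range(len(rows[0])))

    def project(cols):
        return [tuple(row[c] for c in cols) for row in rows]

    while not is_k_anonymous(project(kept), k):
        # Drop the remaining column with the most distinct values,
        # on the guess that it splits records apart the most.
        worst = max(kept, key=lambda c: len({row[c] for row in rows}))
        kept.remove(worst)
    return kept, project(kept)

# Hypothetical rows: (age band, gender, interest).
rows = [
    ("18-25", "F", "Twilight"),
    ("18-25", "F", "Twilight"),
    ("56+", "M", "knitting"),
    ("56+", "M", "Motocross"),
]

kept, anonymized = suppress_columns(rows, 2)
print(kept)
print(anonymized)
```

Here the interest column is thrown away for everyone, including the two records that were already indistinguishable. Suppressing data selectively per record, as the quoted post describes, is meant to avoid that kind of unnecessary loss.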

  10:35pm  •  Computing  •  Law