Rapleaf has an informative blog post about how to more effectively anonymize personal data.
Notice the new interest categories. Specifically, take a look at that bottom record: a 56+ year-old man who enjoys Twilight, knitting, and Motocross. In the dataset, there aren’t any other records that look like him. Furthermore, if we were given just that set of attributes, we’d be able to tie them back to that specific record. Even though each individual attribute is non-identifying, the dataset is no longer anonymous.
The goal of Anonymouse is to selectively exclude data from the cookies we drop so that our users are sufficiently indistinguishable. We define “sufficiently indistinguishable” using the notion of k-anonymity. A dataset is k-anonymous if every record in it is identical to at least k-1 other records. We can therefore think of a k-anonymous dataset as a collection of clusters of identical records, or equivalence classes, each of size k or greater.
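To make the definition concrete, here is a minimal sketch of a k-anonymity check. It is an illustration only, not Rapleaf's implementation: the record layout (age bracket, gender, set of interests), the sample values, and the function names are all assumptions made for this example.

```python
from collections import Counter

def equivalence_classes(records):
    """Count how many times each distinct record appears.

    Each distinct record is one equivalence class; the count is its size.
    """
    return Counter(records)

def is_k_anonymous(records, k):
    """Return True if every equivalence class contains at least k records."""
    return all(size >= k for size in equivalence_classes(records).values())

# Hypothetical records: (age bracket, gender, interests).
# frozenset keeps each record hashable so identical records can be grouped.
records = [
    ("18-25", "F", frozenset({"Twilight"})),
    ("18-25", "F", frozenset({"Twilight"})),
    ("56+", "M", frozenset({"Twilight", "knitting", "Motocross"})),
]

print(is_k_anonymous(records, 2))      # the lone 56+ record breaks 2-anonymity
print(is_k_anonymous(records[:2], 2))  # the two identical records form a class of size 2
```

The 56+ record sits in an equivalence class of size one, so the full dataset fails the check for any k greater than 1, even though no single attribute in that record is identifying on its own.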
Furthermore, we don’t just want to k-anonymize the dataset; we also want to retain as much valuable data as possible.
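The tension between anonymity and retained data can be seen in a deliberately naive sketch. The greedy heuristic below is an assumption chosen for illustration and is not the method Anonymouse uses: it repeatedly drops the least common interest from every record until the dataset is k-anonymous, then reports how many interest values survived. The record layout and the names `suppress_rarest_interests` and `retained_interests` are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, k):
    """Return True if every distinct record appears at least k times."""
    return all(n >= k for n in Counter(records).values())

def suppress_rarest_interests(records, k):
    """Greedily remove the least common interest until the data is k-anonymous.

    Naive illustrative heuristic: it only suppresses interests, so it stops
    early if the demographics alone already prevent k-anonymity.
    """
    records = list(records)
    while not is_k_anonymous(records, k):
        counts = Counter(i for _, interests in records for i in interests)
        if not counts:
            break  # no interests left to suppress
        # Ties are broken by name so the result is deterministic.
        rarest = min(counts, key=lambda i: (counts[i], i))
        records = [(demo, interests - {rarest}) for demo, interests in records]
    return records

def retained_interests(records):
    """Total number of interest values still present across all records."""
    return sum(len(interests) for _, interests in records)

# Hypothetical records: ((age bracket, gender), interests).
records = [
    (("18-25", "F"), frozenset({"Twilight"})),
    (("18-25", "F"), frozenset({"Twilight"})),
    (("56+", "M"), frozenset({"Twilight", "knitting", "Motocross"})),
    (("56+", "M"), frozenset({"Twilight"})),
]

anonymized = suppress_rarest_interests(records, k=2)
print(retained_interests(records), "->", retained_interests(anonymized))
```

In this toy dataset the two rare interests are suppressed and the common one is kept, so four of the six interest values survive. A real system would weigh how valuable each attribute is rather than only how rare it is.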