Bits of you are all over the Internet. If you've signed into Google and searched, saved a file in your Dropbox folder, made a phone call using Skype, or just woken up in the morning and checked your email, you're leaving a trail of digital crumbs. People who have access to this information — companies powering your emails and Web searches, advertisers who are strategically directing ads at you — can build a picture of who you are, what you like, and what you will probably do next. Revelations about government counter-terrorism programs such as PRISM indicate that federal agents and other operatives may use this data, too.
"Google knows what kinds of porn everyone in the world likes," Bruce Schneier, a security and cryptography expert, told NBC News. Not only are companies tracking what you are doing, they are correlating it, he said.
Since news of PRISM broke, the leaders of the tech companies named in the reports have denied knowledge of government access to their information. At Facebook, one of the world's biggest data collectors, CEO Mark Zuckerberg posted a message that read: "When governments ask Facebook for data, we review each request carefully to make sure they always follow the correct processes and all applicable laws, and then only provide the information if [it] is required by law."
But the law already permits quite a bit of digital sniffing — much of it without a warrant.
While authorities need a warrant to access the content of emails stored by companies like Yahoo and Google, they don't need one for the IP addresses of the computers used to access those accounts, ProPublica notes. Nor does the government need a warrant to request draft emails, data stored in the cloud on services like Dropbox and Google Drive, or emails and texts that are more than 180 days old: investigators can demand them with a subpoena.
And when authorities do get a court order, the amount of available data multiplies. Director of National Intelligence James Clapper clarified in a statement to the press on Thursday that the government has access not to the content of phone calls, but to "telephony metadata." That's a vague term, but at the very least, it includes who you called, from where, and when.
Painting a picture of you
Gather all of these shreds of metadata, apply algorithms that pick out telling patterns, and you can piece together a pretty good idea of who a person is and what they're up to.
For example, when a group from MIT analyzed location data from the cellphones of 1.5 million people in a single country over 15 months, the team could single out 95 percent of individuals simply by knowing where they were, and roughly when, on four separate occasions.
When Netflix released anonymized movie-rating histories of 500,000 subscribers as part of a public contest to build an algorithm that predicted which movies a person would like, Arvind Narayanan, a security researcher at Princeton University, and his colleague Vitaly Shmatikov pinned names to numbers by comparing histories in the anonymized data with ratings and comments posted by named individuals on IMDb. "In every case, you find two location points, or six to eight movies, or three data points ... it's enough to identify a person," Narayanan told NBC News.
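To see why a handful of sightings can be so revealing, here is a toy sketch that counts how many people in a dataset match a given set of (place, hour) observations. The people, tower IDs and traces are all invented, and this is not the MIT team's methodology, which worked with real antenna locations and timestamps across 1.5 million users; it only illustrates how each extra point shrinks the pool of candidates.

```python
# Toy illustration of re-identification from a few location "points".
# All data is invented: each person maps to a set of
# (hypothetical cell-tower ID, hour of day) observations.
from typing import Dict, List, Set, Tuple

Point = Tuple[str, int]

traces: Dict[str, Set[Point]] = {
    "alice": {("tower_A", 8), ("tower_B", 12), ("tower_C", 18), ("tower_A", 22)},
    "bob":   {("tower_A", 8), ("tower_B", 12), ("tower_D", 18), ("tower_A", 22)},
    "carol": {("tower_A", 8), ("tower_E", 12), ("tower_C", 18), ("tower_F", 22)},
    "dave":  {("tower_G", 8), ("tower_B", 12), ("tower_C", 18), ("tower_A", 22)},
}

def matching_people(known: Set[Point]) -> List[str]:
    """Return everyone whose trace contains all of the known points."""
    return sorted(name for name, trace in traces.items() if known <= trace)

# Reveal the observed points one at a time and watch the candidate list shrink.
observed = [("tower_A", 8), ("tower_B", 12), ("tower_C", 18), ("tower_A", 22)]
for k in range(1, len(observed) + 1):
    print(k, matching_people(set(observed[:k])))
```

With this made-up data, one point matches three people, two points match two, and three points already single out "alice."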
Narayanan is now researching ways to make people harder to identify by their online behavior.
Facebook, as you might imagine, provides a wealth of identifying information. In a study published in the Proceedings of the National Academy of Sciences in March this year, a team of data scientists showed that they could work out a person's sexual orientation, political leanings, and a host of other personal traits from their "likes." Similar characteristics can likely be inferred from "browsing histories, search queries, or purchase histories," the researchers write in their paper.
"Commercial companies, governmental institutions, or even one’s Facebook friends could use software to infer attributes such as intelligence, sexual orientation, or political views that an individual may not have intended to share," they add. "One can imagine situations in which such predictions, even if incorrect, could pose a threat to an individual’s well-being, freedom, or even life."
Advertisers already profile us with astonishing accuracy from very little information: Target has worked out that a shopper was pregnant before her own family knew. And just as advertisers profile you to make money, law enforcement and counter-terrorism operatives use the same kinds of clues to hunt for suspects.
In April, the NSA released a document called "Untangling the Web: A Guide to Internet Research." The 643-page guide is packed with tips for hunting down information that's publicly available on the Internet. One section, on "Google Hacking," isn't really hacking at all, just clever search tricks for finding secrets that were published by accident.
Raytheon's Rapid Information Overlay Technology (or RIOT) software was built to make some of this searching easier. It is designed to let analysts compile case files of location data scraped from check-ins on Twitter, Facebook, Foursquare and other public social networks.
President Obama said on Friday that surveillance programs like PRISM have "helped us prevent terrorist attacks." Still, privacy advocates have long criticized such initiatives as potentially overreaching.
The key point is that even without a so-called "back door" into Facebook, Google, Apple and Microsoft servers (which the companies have vehemently denied exists), and even without the warrants investigators need to obtain specific information about individuals from those companies, the feds, and anyone else, can see an awful lot. And they have you to thank for it. Remember that next time you post to Facebook, upload a picture or comment on an article ... like this one.