Privacy implications of email tracking

kali null
8 min read · Jan 4, 2018

What happens when you open an email and allow it to display embedded images and pixels? You may expect the sender to learn that you’ve read the email, and which device you used to read it. But in a new paper we find that privacy risks of email tracking extend far beyond senders knowing when emails are viewed. Opening an email can trigger requests to tens of third parties, and many of these requests contain your email address. This allows those third parties to track you across the web and connect your online activities to your email address, rather than just to a pseudonymous cookie.

Illustrative example. Consider an email from the deals website LivingSocial (see details of the example email). When the email is opened, the email client will make requests to 24 third parties across 29 third-party domains.[1] A total of 10 third parties receive an MD5 hash of the user’s email address, including major data brokers Datalogix and Acxiom. Nearly all of the third parties (22 of the 24) set or receive cookies with their requests. In a webmail client the cookies are the same browser cookies used to track users on the web, and indeed many major web trackers (including domains belonging to Google, comScore, Adobe, and AOL) are loaded when the email is opened. While this example email has a large number of trackers relative to the average email in our corpus, the majority of emails (70%) embed at least one tracker.

How it works. Email tracking is possible because modern graphical email clients allow rendering a subset of HTML. JavaScript is invariably stripped, but embedded images and stylesheets are allowed. These are downloaded and rendered by the email client when the user views the email.[2] Crucially, many email clients (and, in the case of webmail, almost all web browsers) send third-party cookies with these requests. The email address is leaked by being encoded as a parameter in these third-party URLs.

When the user opens the email, a tracking pixel from “tracker.com” is loaded. The user’s email address is included as a parameter within the pixel’s URL. The email client here is a web browser, so it automatically sends the tracking cookies for “tracker.com” along with the request. This allows the tracker to create a link between the user’s cookie and her email address. Later, when the user browses a news website, the browser sends the same cookie, and thus the new activity can be connected back to the email address. Email addresses are generally unique and persistent identifiers. So email-based tracking can be used for targeting online ads based on offline activity (say, to shoppers who used a loyalty card linked to an email address) and for linking different devices belonging to the same user.
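The flow described above can be sketched in a few lines of Python. This is only an illustration, not any real tracker's code: the parameter name "e", the path "/pixel.gif", the lowercasing of the address before hashing, and the in-memory dictionary are all assumptions made for the example. Only the domain "tracker.com" comes from the description above.

```python
import hashlib
from urllib.parse import urlencode, urlparse, parse_qs


def pixel_url(email: str) -> str:
    """Sender side: build a pixel URL that carries an MD5 hash of the address."""
    # Normalizing the address before hashing is an assumption of this sketch.
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    # "e" and "/pixel.gif" are hypothetical names chosen for illustration.
    return "https://tracker.com/pixel.gif?" + urlencode({"e": digest})


# Tracker side: maps a cookie ID to the email hash seen alongside it.
cookie_to_email = {}


def handle_pixel_request(url: str, cookie_id: str) -> None:
    """Record the link between the browser cookie and the email hash."""
    params = parse_qs(urlparse(url).query)
    if "e" in params:
        cookie_to_email[cookie_id] = params["e"][0]


def identify(cookie_id: str):
    """Later, on any site embedding the tracker, look up the email hash."""
    return cookie_to_email.get(cookie_id)


if __name__ == "__main__":
    # The webmail client loads the pixel and the browser attaches the cookie.
    handle_pixel_request(pixel_url("Alice@example.com"), cookie_id="abc123")
    # The same cookie arrives from a news site: the visit is now tied to the address.
    print(identify("abc123"))
```

Once the first request has been recorded, every later request that carries the same cookie can be attributed to the email address, which is what turns a pseudonymous cookie into an identified one.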

Measuring email tracking at scale. To understand the privacy implications of viewing and interacting with emails we assembled a collection of messages from mailing lists on the top sites.[3] Using OpenWPM, a web measurement platform developed at Princeton, we simulated a user opening each email and clicking links from within a webmail client that loads remote content. We found that 85% of emails in our corpus contain embedded third-party content, and 70% contain resources categorized as trackers by popular tracking-protection lists. Many of these third parties, including 7 of the top 10, also have a large web presence.

When “anonymous” web tracking isn’t. About 29% of emails leak the user’s email address to at least one third party when the email is opened, and about 19% of senders sent at least one email that had such a leak. The majority of these leaks (62%) are intentional.[4] If the leaked email address is associated with a tracking cookie, as it would be in many webmail clients, the privacy risk to users is greatly amplified. Since a tracking cookie can be shared with traditional web trackers, the email address can allow those trackers to link tracking profiles from before and after a user clears their cookies. If a user reads their email on multiple devices, trackers can use that address as an identifier to link tracking data cross-device.
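To make the linking step concrete, here is a minimal sketch of how observations keyed by an email hash could be merged into one profile. The data layout (pairs of cookie ID and email hash) and the function name are hypothetical; nothing here describes how any particular company stores its data.

```python
from collections import defaultdict


def merge_profiles(observations):
    """Group cookie IDs by the email hash they were seen with.

    `observations` is an iterable of (cookie_id, email_hash) pairs, a
    hypothetical record format used only for this illustration.
    """
    by_email = defaultdict(set)
    for cookie_id, email_hash in observations:
        by_email[email_hash].add(cookie_id)
    return dict(by_email)


if __name__ == "__main__":
    seen = [
        ("laptop-cookie-before-clearing", "hash-of-alice"),
        ("laptop-cookie-after-clearing", "hash-of-alice"),
        ("phone-cookie", "hash-of-alice"),
        ("other-cookie", "hash-of-bob"),
    ]
    for email_hash, cookies in merge_profiles(seen).items():
        print(email_hash, sorted(cookies))
```

Clearing cookies produces a new cookie ID, but as soon as that new ID is observed next to the same email hash it lands in the same group as the old one, and the same holds for a cookie on a second device.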

Most of the top leak recipients, including LiveIntent, Acxiom, Conversant Media, and Neustar, are involved in “people-based” marketing. These third parties receive leaked email addresses from between 24 and 68 of the 902 email senders studied. People-based marketing is defined by Acxiom as “the ability to perform targeting and measurement at the level of real people, not just devices, by resolving identity across digital and offline channels.” In other words, it is a term used to describe a set of services which allow marketers to use tracking data collected across any of a user’s devices, as well as offline data, to target that user on any of their devices. As discussed above, this could include offline data such as purchases made using a loyalty card at a grocery store, if that data is associated with the purchaser’s email address (or a hash of it).

While our data does not let us measure how the companies use leaked email addresses they receive when a user views an email, we can get some insight into potential uses by examining their product pages. The marketing materials and privacy policies of the four companies mentioned above detail their use of email addresses for cross-device targeting and/or data onboarding products.[5]

Are leaks of hashed email addresses less of a privacy concern? In many cases the leaked email address is hashed; in fact, 68% of all leaks which occur while viewing emails are hashed, one-third of which also include the domain portion of the email address in plaintext. Hashed email is considered by some leak recipients to not be personally identifying information.[6]

From a computer science perspective, the claim that a hashed email address is not personally identifying is patently false. When user records in a database are keyed by hashed email address, looking up the record for a given email address is trivial: simply hash it first and look it up (indeed, this is the whole point of storing hashed email addresses at all). What if you have data associated with a hash of an unknown email address and want to recover the original address? It’s surprisingly easy: you can rent a multi-GPU virtual machine for $14.40 an hour[7], which gives you 73 billion MD5 hash computations per second based on published benchmarks. Modern methods have gotten really good at enumerating plausible sequences of characters and numbers in passwords, and we believe these methods will extend to email addresses. If they do, it would mean that email address hashes can be broken much more efficiently than through brute forcing (i.e., trying all possible combinations of characters). We posit that with a trillion guesses (a cost of 6 US cents) it should be possible to enumerate the majority of email addresses in use.
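Both points in the paragraph above can be checked with a short sketch: recovering an address from its MD5 hash by enumerating candidates, and the arithmetic behind the cost estimate. The candidate lists and function names are made up for the example, and a real attack would use a far better guess generator than a small Cartesian product. The hash rate and hourly price are the figures quoted above.

```python
import hashlib
from itertools import product


def md5_hex(email: str) -> str:
    """Return the hex MD5 digest of an email address."""
    return hashlib.md5(email.encode("utf-8")).hexdigest()


def recover_address(target_hash, local_parts, domains):
    """Try every local-part/domain combination until one matches the hash."""
    for local, domain in product(local_parts, domains):
        candidate = f"{local}@{domain}"
        if md5_hex(candidate) == target_hash:
            return candidate
    return None


def guessing_cost_usd(guesses, hashes_per_second=73e9, usd_per_hour=14.40):
    """Cost of computing `guesses` MD5 hashes at the quoted rate and price."""
    seconds = guesses / hashes_per_second
    return seconds * usd_per_hour / 3600.0


if __name__ == "__main__":
    leaked = md5_hex("jane.doe@example.com")
    # Hypothetical candidate lists; real guessers are far more sophisticated.
    print(recover_address(leaked, ["john", "jane.doe"], ["example.org", "example.com"]))
    print(guessing_cost_usd(1e12))
```

At 73 billion hashes per second, a trillion guesses take a little under 14 seconds, and at $14.40 an hour that works out to roughly 5.5 cents, in line with the rounded figure of 6 US cents.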

Additional leaks occur when users click on links in emails. When an email link is clicked the URL is typically handed over to the user’s browser, or to a new tab in the user’s browser, in the case of webmail. Email addresses and other identifiers may be embedded in these links, and may ultimately cause the user’s email address to leak to third parties on the web. We found that about 11% of links contain requests that leak the user’s email address to a third party and about 12% of all emails contain such a link. The largest recipients of these leaks are Google, Facebook, and Twitter, and the top recipients overall are very similar to the top third-party trackers on the web.
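One simple way to spot this kind of leak is to search a URL for the email address in plaintext and under a few common hashes. The sketch below does exactly that and nothing more; it is a simplified illustration, not the detection method used in the paper, and the set of hashes, the lowercasing step, and the function names are assumptions.

```python
import hashlib
from urllib.parse import unquote


def email_tokens(email: str) -> dict:
    """Return the address in plaintext and under a few common hashes."""
    normalized = email.strip().lower()
    raw = normalized.encode("utf-8")
    return {
        "plaintext": normalized,
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(url: str, email: str) -> list:
    """List which forms of the address appear anywhere in the URL."""
    # Decode percent-escapes (e.g. %40 for "@") and ignore case.
    haystack = unquote(url).lower()
    return [name for name, token in email_tokens(email).items() if token in haystack]


if __name__ == "__main__":
    address = "user@example.com"
    hashed = hashlib.md5(address.encode("utf-8")).hexdigest()
    print(find_leaks("https://shop.example/landing?uid=" + hashed, address))
    print(find_leaks("https://shop.example/landing?email=user%40example.com", address))
    print(find_leaks("https://shop.example/landing?ref=newsletter", address))
```

A check like this misses addresses that are encrypted, salted before hashing, or encoded in some other way, so it gives a lower bound on leakage rather than a complete picture.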

Leaks in link clicks can also allow email trackers to work around privacy protections in email clients that strip cookies from remote resources (like Apple Mail) or in those that proxy remote resources (like Gmail). Since the clicked link is opened in the user’s browser, the tracker can make the explicit link between the user’s cookie and the leaked email address while the resulting page is loaded.

What can users do? All of the privacy risks discussed in our paper stem from remote resources, so users can use mail clients which support blocking images by default to completely avoid the problem. However, that can often result in emails which are unreadable; this is particularly true for marketing emails.

Blocking images by default provides complete protection from tracking when emails are viewed, but can often result in unreadable emails.

In Section 6.2 of the paper we survey 16 mail clients and find that a patchwork of privacy features is employed, but that no setup offers complete protection from the threats we identify. Mail clients that block cookies by default, like Apple Mail, offer some level of protection. In these clients it’s more difficult for a tracker to track users across mailing lists, since the mail client doesn’t provide a persistent identifier. The same is true for webmail clients which proxy images, like Gmail and Yandex. Content proxying has the added benefit of preventing a tracker from being able to link the browser’s cookies to any identifiers received when an email is opened.

Even with the defenses employed by the clients we studied, trackers which receive the user’s leaked email address will continue to be able to track and target users in these clients and on the web. As an example, LiveIntent’s marketing material reassures clients that it will continue to work in Gmail since “targeting is primarily based around the e-mail address’s [sic] MD5 hash”. Regardless of the defenses deployed by the client, control of tracking is handed off to the user’s browser when email links are clicked.

We found that the tracking protection lists EasyList and EasyPrivacy reduce the number of email leaks that occur when an email is viewed by 87%. Perhaps the best option for privacy-conscious users today is to use webmail and install tracking protection tools, such as uBlock Origin or Ghostery. Users who want to use a standalone client must find one which supports privacy extensions; of the clients we studied, the only one that supports such extensions is Thunderbird. Having tracking protection tools installed in the browser will also provide protection when email links are clicked. In Section 7 of the paper we prototyped a server-side filtering feature which uses the tracking protection lists to filter the HTML body of emails before they reach the user. We found it to be nearly as effective as a tracking blocker running in the user’s browser.
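The idea of server-side filtering can be illustrated with a small sketch that removes images from an email's HTML body when they are hosted on a blocked domain. This is not the prototype from the paper: the hard-coded domain set stands in for EasyList and EasyPrivacy, whose rule syntax is much richer than a list of domains, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical stand-in for rules derived from a tracking-protection list.
BLOCKED_DOMAINS = {"tracker.com"}


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


class TrackerFilter(HTMLParser):
    """Re-emit the HTML it is fed, dropping <img> tags with blocked sources."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and is_blocked(dict(attrs).get("src") or ""):
            return  # drop the tracking image entirely
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: the raw tag text already includes the "/>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def filter_html(body: str) -> str:
    """Return the email body with blocked images removed."""
    parser = TrackerFilter()
    parser.feed(body)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    email_body = (
        '<p>Hi &amp; welcome</p>'
        '<img src="https://tracker.com/p.gif?e=abc">'
        '<img src="https://cdn.example.org/logo.png">'
    )
    print(filter_html(email_body))
```

Filtering on the server means the protection applies regardless of which mail client the user reads the message in, although a sketch like this only covers images and would need to handle stylesheets and other remote resources as well.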

Github: https://github.com/citp/email_tracking

Paper: https://senglehardt.com/papers/pets18_email_tracking.pdf

Source: https://freedom-to-tinker.com/2017/09/28/i-never-signed-up-for-this-privacy-implications-of-email-tracking/

Bitcoin tip jar: bc1qgpl6lhf09j6kcdvkh8cz90p4cfxuyfec3ecjrd

Ethereum tip jar: 0x7e0Bf6D50b5F5fcbf76A16Bd5285CE0c74C063a9

kali null

security researcher and penetration tester. twitter: @kali_null