Today I’m writing for you a fairly long and technical article on user privacy when browsing the web.
It’s a well-known fact that the internet is fuelled by advertising, and that privacy is an ongoing battleground between users and site builders, refereed by browser software.
This fact was made very evident to me a few weeks ago when I browsed some online reviews of a very expensive camera lens.  For about a week after that, every single web site I visited showed me adverts telling me where I could get that very lens, with an intensity and single-mindedness of purpose which was almost hallucinatory.
Web sites track people using cookies, small pieces of data carried in the headers of the web protocol HTTP that identify users and record data about them.  Sites can pass cookie data between them to tell each other about you.  This is to allow them to customise themselves to your wishes (well, really, to customise advertisements based on your browsing history).  That can be a good thing, but it can also be a violation of your privacy.
For this reason, all modern browsers give you a high degree of control over how they will permit cookies to be used.  Also, you can clear out cookies in the browser.
But, increasingly, that’s not the whole story.  In this article, I’m going to describe some of the mechanisms web sites can use to track you without you being aware of it.  These are:

  • Browser fingerprinting, identifying you by how your browser behaves
  • Evercookies, other storage mechanisms available on your browser which can be used to replace cookies or regenerate them when they’re removed
  • Cookie syncing, a way of passing cookies between web sites within link URLs.

Browser fingerprinting.  New features within HTML5 allow web sites to draw images in the browser which are invisible to the user, using the system’s graphics hardware.  The details of the rendered image vary slightly with the exact version of the browser, operating system and graphics card being used, and the site can read the image back and compare it between visits.  On its own, this gives enough different combinations in common use to tell apart perhaps 100-200 people.  This particular technique is called canvas fingerprinting, but there are many other fingerprinting techniques around, based on other aspects of browser behaviour which are accessible to web sites, including enumerating the installed plugins and fonts.  Taken together, they should be enough to uniquely identify one user among millions.
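To make the idea of combining several weak signals more concrete, here is a minimal Python sketch of what a tracking server might do with the values a page reports back.  It is an illustration under my own assumptions rather than any real tracker's code: the function name fingerprint and the attribute names (user_agent, canvas_hash, fonts, plugins) are hypothetical, and the values are made up.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Combine browser attributes into one stable identifier (illustrative only)."""
    # Serialise with sorted keys so the same attributes always produce the
    # same string, whatever order they were collected in.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values a page script might report back to the server.
visitor_a = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS)",
    "canvas_hash": "3f2a9c",  # digest of the pixels read back from the hidden canvas
    "fonts": ["Arial", "Courier New", "Georgia"],
    "plugins": ["Flash", "PDF Viewer"],
}

# Same machine except for one extra installed font.
visitor_b = dict(visitor_a, fonts=["Arial", "Courier New", "Georgia", "Comic Sans MS"])

id_a = fingerprint(visitor_a)
id_b = fingerprint(visitor_b)
```

No single attribute here is unique to you, but a difference in any one of them yields a different identifier, which is why the combination is so much more identifying than canvas fingerprinting alone.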
Evercookies.  Some browser plugins, such as Adobe Flash, provide their own cookie mechanism independent of the browser cookies.  Also, HTML5 provides new facilities for storing data client-side, including the localStorage, sessionStorage and IndexedDB APIs.  Finally, web sites can use ETags (opaque identifiers generated by the server, passed back to the browser for cache management and retained by the browser long term) to track users.  Each of these has the effect that:

  • Web sites have facilities they can use in the same way as browser cookies to track users, but which users are not aware of and cannot easily control.
  • When a user deletes their browser cookies, these additional local data stores can be used to regenerate the cookies, meaning that the user is still being tracked.
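
The regeneration step can be sketched in a few lines of Python.  This is a simplified model under my own assumptions, not the code of any actual evercookie library: the browser's storage mechanisms are represented as a plain dictionary, and the names STORE_NAMES and respawn are hypothetical.

```python
import uuid
from typing import Dict, Optional

# Hypothetical labels for the places an identifier can be hidden.
STORE_NAMES = ("cookie", "localStorage", "flash", "etag")


def respawn(stores: Dict[str, Optional[str]]) -> str:
    """Return the user's identifier, restoring it to every store."""
    # Look for a copy of the identifier that survived in any store.
    surviving = next(
        (stores.get(name) for name in STORE_NAMES if stores.get(name)), None
    )
    # Only a visitor with no surviving copy at all gets a fresh identifier.
    uid = surviving or uuid.uuid4().hex
    # Write the identifier back everywhere, undoing any partial deletion.
    for name in STORE_NAMES:
        stores[name] = uid
    return uid


# The user has cleared cookies, but a copy remains in localStorage.
browser = {"cookie": None, "localStorage": "user-42", "flash": None, "etag": None}
restored = respawn(browser)
```

After the call, the cookie entry holds the old identifier again, so clearing cookies only works if every other store is cleared at the same moment.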

Cookie syncing.  When one web site redirects a user to a third party site, it can pass data to that site by adding a parameter string to the URL it sends to the site.  This parameter can include any information, including a cookie or identifier that can be used to track the user.
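Here is a short Python sketch of both sides of that exchange, using only the standard library.  The domain tracker-b.example, the parameter name partner_uid and the function names are all my own inventions for illustration; real trackers use their own endpoints and parameter names.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_sync_url(partner_endpoint: str, user_id: str) -> str:
    """Sending site: append its own identifier for the user to the redirect URL."""
    separator = "&" if urlparse(partner_endpoint).query else "?"
    return partner_endpoint + separator + urlencode({"partner_uid": user_id})


def record_sync(request_url: str, own_id: str, id_map: dict):
    """Receiving site: link the sender's identifier to its own identifier."""
    params = parse_qs(urlparse(request_url).query)
    partner_id = params.get("partner_uid", [None])[0]
    if partner_id is not None:
        id_map[partner_id] = own_id
    return partner_id


# Site A knows the user as "A-123" and redirects the browser to site B.
sync_url = build_sync_url("https://tracker-b.example/sync", "A-123")

# Site B reads its own cookie ("B-987") from the same request and stores the link.
id_map = {}
seen = record_sync(sync_url, "B-987", id_map)
```

Once the mapping is stored, the two sites can merge what each has recorded about the same person, even though neither can read the other's cookies directly.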
So, where does all this leave the user?
Unfortunately, most web users are not aware of privacy issues, and browser vendors are more influenced by the need for web sites to work well than by the need for user privacy.  None of the common browsers (Internet Explorer, Firefox, Safari, Chrome) provides good controls against the types of attacks I have described.  Only the Tor browser, specifically designed to protect user privacy, does so.  So, if you value your privacy or live in an oppressive country, you have few options when you choose a web browser.
Acknowledgement: I made use of the excellent (but very technical) paper “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild” when writing this article.