Privacy and HTTP Referer header (2/2)

The first part of this post left off with the question of whether blocking the Referer header with client-side tweaks is a useful feature. There is a long history of vilifying the Referer header in the name of security. Some personal firewall suites implemented this pseudo-mitigation, as do one experimental web browser, a Firefox add-on and one Chrome extension. In the standards realm, an Origin request header was proposed to convey a subset of the same information, leaving out the path and query-string parts of the URL. The HTML5 working group also jumped into the fray with a new noreferrer link type that lets website authors designate links for which Referer is suppressed.

Getting by without Referer

Paradoxically, as referrer information became more valuable to advertising-based business models, the Referer header itself became less critical. It turns out the same information can be conveyed in alternative ways, provided the originating website cooperates. For example, the identity of the website containing the link can be appended to the URL as a query-string parameter. This parameter could be the verbatim Referer value or some shortened representation both sides agree on. In the case of a banner ad, the advertising network crafts the final URL that users will be taken to after clicking on the link. Depending on the arrangement, the advertiser paying for this service can receive additional information about the user (including the page where they encountered the display ad) incorporated into the query string at the end of that URL.
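As a rough illustration of the query-string approach, here is a minimal Python sketch. The parameter name `src`, the function names and the example URLs are all made up for this example; real ad networks use their own conventions, often an opaque identifier rather than the full URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_with_source(target_url, source_page, param="src"):
    """Append the originating page to the target URL as a query parameter.

    `param` is a hypothetical name; both sites only need to agree on it.
    """
    parts = urlsplit(target_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, source_page))
    # urlencode percent-escapes the source URL so it survives as one value.
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_source(landing_url, param="src"):
    """On the destination side, recover the originating page (or None)."""
    query = dict(parse_qsl(urlsplit(landing_url).query))
    return query.get(param)


if __name__ == "__main__":
    link = tag_with_source(
        "https://shop.example/item?id=7",
        "https://news.example/story?page=2",
    )
    print(link)
    print(read_source(link))
```

The destination learns where the visitor came from whether or not the browser sends a Referer header, because the information travels inside the URL itself.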

Standard header vs home-brew alternatives

The Referer header provides this functionality for free, without either site having to do any extra work to stuff more information into query strings. There are certain advantages to relying on that built-in functionality. For example, if affiliate websites are paid based on the amount of traffic they drive to the destination, there is an incentive to fraudulently inflate those figures. Tweaking the query string to create the impression that a particular visit originated from any desired origin is trivial. Referer, by contrast, is chosen by the visitor’s web browser and cannot be influenced by the originating page. (Note this assumes the affiliate is counting on real users to inflate the statistics, running unmodified “honest” web browsers that get bounced off to the real target. Of course the affiliate could maintain its own bot army of modified web browsers that forge requests with bogus Referer headers. But such artificial streams are easier to detect due to lack of IP diversity, among other anomalies.)

Omitted by design

The Referer header is also omitted in certain situations, such as going from an HTTPS page using encryption to a plain HTTP page in the clear. This is an intentional security measure, to keep personal information from a secure connection from leaking out on a subsequent request in the clear. Similarly, Referer is not modified during redirection chains, which can have surprising effects: if a link on page A leads to page B, and B responds with an HTTP 302 status code redirecting to C, the Referer observed by C will be A instead of B. In these situations it is critical to use a different mechanism for conveying information about the path a particular user traveled. (Incidentally, this is also why defenses against cross-site request forgery cannot rely solely on checking the Referer header. When present, the header does serve as a reliable indicator of whether a request originated externally, modulo client-side bugs that allow forging the header as in this Flash example. But there are legitimate cases when the header is missing by design. Rejecting these would be a false positive.)
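The two points above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any particular browser or server: `referer_for` mimics only the HTTPS-to-HTTP omission rule described here (browsers support other referrer policies too), and `classify_request` shows why a server-side check needs a third outcome for a missing header. Both function names and the example hosts are invented for illustration.

```python
from urllib.parse import urlsplit


def referer_for(source_url, target_url):
    """Referer value under the downgrade rule, or None when it is omitted."""
    src, dst = urlsplit(source_url), urlsplit(target_url)
    if src.scheme == "https" and dst.scheme != "https":
        return None  # secure page -> cleartext page: header is left out
    # The fragment part of a URL is never sent in Referer.
    return src._replace(fragment="").geturl()


def classify_request(referer, own_host):
    """Classify a request by its Referer header.

    A missing header yields "unknown" rather than "cross-site", since the
    header can be absent by design on perfectly legitimate requests.
    """
    if not referer:
        return "unknown"
    host = urlsplit(referer).hostname
    return "same-site" if host == own_host else "cross-site"


if __name__ == "__main__":
    ref = referer_for("https://bank.example/account?id=1#top",
                      "http://other.example/")
    print(ref, classify_request(ref, "other.example"))
```

A CSRF defense built on `classify_request` still has to decide what to do with "unknown", which is why Referer checking is usually paired with another mechanism such as a per-session token.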

Partial solutions

Combining the previous two observations, the Referer header is neither necessary nor sufficient for contemporary web tracking scenarios. Returning to the question of whether vilifying Referer and stripping it out does any good: there is a marginal benefit in stopping accidental leaks. These are security vulnerabilities where sensitive information intended only for one website is unintentionally divulged to a third party by sourcing embedded content or clicking on links. Diligently suppressing the header from every request will defend against these oversights. But it does nothing to prevent deliberate information sharing, when the websites in question are colluding to track users. That happens to be exactly the arrangement between a publisher offering advertising space on its pages and the advertising network providing the content for that slot. Since the publisher has an incentive to provide the necessary information to the advertiser, it can do so through the link itself, avoiding any dependence on the unreliable Referer header.

HTTP cookies, and equivalent functionality that can be used to emulate cookies (DOM storage, Flash cookies, etc.), are far more critical for tracking. This is why the advertising industry panics whenever a major browser considers mucking with cookie settings in the name of privacy. The Referer header, on the other hand, is largely historic, incidental behavior in web browsers which has been superseded by improved proprietary designs that achieve the same purpose.

