All about User Agents & Referrer

User Agent

In computing, a user agent is software (a software agent) that is acting on behalf of a user. One common use of the term refers to a web browser telling a web site information about the browser and operating system. This allows the web site to customize content for the capabilities of a particular device, but also raises privacy issues.

There are other uses of the term "user agent". For example, an email reader is a mail user agent. In many cases, a user agent acts as a client in a network protocol used in communications within a client–server distributed computing system. In particular, the Hypertext Transfer Protocol (HTTP) identifies the client software originating the request, using a "User-Agent" header, even when the client is not operated by a user. The Session Initiation Protocol (SIP) protocol (based on HTTP) followed this usage. In the SIP, the term user agent refers to both end points of a communications session.

User agent identification

When a software agent operates in a network protocol, it often identifies itself, its application type, operating system, software vendor, or software revision, by submitting a characteristic identification string to its operating peer. In HTTP, SIP, and NNTP protocols, this identification is transmitted in a header field User-Agent. Bots, such as Web crawlers, often also include a URL and/or e-mail address so that the Webmaster can contact the operator of the bot.

Use in HTTP

In HTTP, the User-Agent string is often used for content negotiation, where the origin server selects suitable content or operating parameters for the response. For example, the User-Agent string might be used by a web server to choose variants based on the known capabilities of a particular version of client software. The concept of content tailoring is built into the HTTP standard in RFC 1945 "for the sake of tailoring responses to avoid particular user agent limitations."

The User-Agent string is one of the criteria by which Web crawlers may be excluded from accessing certain parts of a Web site using the Robots Exclusion Standard (robots.txt file).

As with many other HTTP request headers, the information in the "User-Agent" string contributes to the information that the client sends to the server, since the string can vary considerably from user to user.

Source: WIKI

HTTP Referrer

The HTTP referer (originally a misspelling of referrer) is an HTTP header field that identifies the address of the webpage (i.e. the URI or IRI) that linked to the resource being requested. By checking the referrer, the new webpage can see where the request originated.

In the most common situation this means that when a user clicks a hyperlink in a web browser, the browser sends a request to the server holding the destination webpage. The request includes the referrer field, which indicates the last page the user was on (the one where they clicked the link).

Referer logging is used to allow websites and web servers to identify where people are visiting them from, for promotional or statistical purposes.

Details

When visiting a webpage, the referrer or referring page is the URL of the previous webpage from which a link was followed.

More generally, a referrer is the URL of a previous item which led to this request. The referrer for an image, for example, is generally the HTML page on which it is to be displayed. The referrer field is an optional part of the HTTP request sent by the web browser to the web server.

Many websites log referrers as part of their attempt to track their users. Most web log analysis software can process this information. Because referrer information can violate privacy, some web browsers allow the user to disable the sending of referrer information. Some proxy and firewall software will also filter out referrer information, to avoid leaking the location of non-public websites. This can, in turn, cause problems: some web servers block parts of their website to web browsers that do not send the right referrer information, in an attempt to prevent deep linking or unauthorised use of images (bandwidth theft). Some proxy software has the ability to give the top-level address of the target website as the referrer, which usually prevents these problems while still not divulging the user's last-visited website.

Recently many blogs have started publishing referrer information in order to link back to people who are linking to them, and hence broaden the conversation. This has led, in turn, to the rise of referrer spam: the sending of fake referrer information in order to popularize the spammer's website.

Many pornographic paysites use referrer information to secure their websites. Only web browsers arriving from a small set of approved (login) pages are given access; this facilitates the sharing of materials among a group of cooperating paysites. Referrer spoofing is often used to gain free access to these paysites.

It is possible to access the referrer information on the client side using document.referrer in JavaScript. This can be used, for example, to individualize a web page based on a user's search engine query. However, the referrer field does not always include queries, such as when using Google Search with https.

Referer hiding

Most web servers maintain logs of all traffic, and record the HTTP referrer sent by the web browser for each request. This raises a number of privacy concerns, and as a result, a number of systems to prevent web servers being sent the real referring URL have been developed. These systems work either by blanking the referrer field or by replacing it with inaccurate data. Generally, Internet-security suites blank the referrer data, while web-based servers replace it with a false URL, usually their own. This raises the problem of referrer spam. The technical details of both methods are fairly consistent – software applications act as a proxy server and manipulate the HTTP request, while web-based methods load websites within frames, causing the web browser to send a referrer URL of their website address. Some web browsers give their users the option to turn off referrer fields in the request header.

Most web browsers do not send the referrer field when they are instructed to redirect using the "Refresh" field. This does not include some versions of Opera and many mobile web browsers. However, this method of redirection is discouraged by the World Wide Web Consortium (W3C).

If a website is accessed from a HTTP Secure (HTTPS) connection and a link points to anywhere except another secure location, then the referrer field is not sent.

The HTML5 standard added support for the attribute/value rel="noreferrer", which instructs the user agent to not send a referrer.

Another referrer hiding method is to convert the original link URL to a Data URI scheme-based URL containing small HTML page with a meta refresh to the original URL. When the user is redirected from the data: page, the original referrer is hidden. The first public implementation of this method is the Darefer app for ownCloud.

Upcoming Content Security Policy standard version 1.1 introduces a new referrer directive that allows more control over the browser's behavior in regards to the referrer header. Specifically it allows the webmaster to instruct the browser to block referrer at all, reveal it only when moving with the same origin etc.

Source: WIKI