What a Website Knows About You

Most websites prominently display some sort of privacy policy that describes what kind of information the site collects about its visitors and what it does with that information. This makes perfect sense when you supply your name, email address, or other personal information to the site, such as when you're creating an account or making a purchase. But what does a website know about you if you do not register with it? To understand this, you need to know a bit about how web browsers and web servers interact.

A web server is the software application that hosts a website. Your web browser communicates with the web server to fetch the HTML pages, images, videos, etc. that make up the website. This communication is done using a “protocol” (a set of commands) called HTTP, which is short for “Hypertext Transfer Protocol”.

An interesting feature of HTTP is that it's mostly a plain-text protocol. In other words, the commands are human-readable words and phrases. Here, for example, is about the simplest HTTP command for fetching a single web page from a web server:

  GET /index.html HTTP/1.0

This command says “Please GET the page '/index.html' and, by the way, I only understand version 1.0 of HTTP”.

The web server would typically respond with a status code, some extra information, and the contents of the page in question.
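To make the shape of such a response concrete, here is a minimal Python sketch that splits a response into its status code, headers, and page contents. The sample response text and the names `SAMPLE_RESPONSE` and `parse_response` are made up for illustration; a real server would send more headers than this.

```python
# Hypothetical sample response; header values and body are invented for illustration.
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 28\r\n"
    "\r\n"
    "<html><body>Hi</body></html>"
)


def parse_response(raw):
    """Split a raw HTTP response into (status_code, reason, headers, body)."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Status line looks like: HTTP/1.0 200 OK
    _version, status_code, reason = lines[0].split(" ", 2)

    # The remaining lines are "Name: value" pairs (the "extra information").
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return int(status_code), reason, headers, body


status, reason, headers, body = parse_response(SAMPLE_RESPONSE)
print(status, reason)
print(headers)
print(body)
```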

A web browser normally sends additional information along with the request for a specific page. This information is sent to the web server using headers, which are name-value pairs. A modern browser would send headers like these:

  GET /index.html HTTP/1.1
  Host: www.yahoo.com
  Referer: http://www.google.com/search?q=best+directory
  Accept-Language: en-US, en, fr-CA, fr
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) Gecko/2009101601 Firefox/3.0.15 GTB5
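Because the request is plain text, pulling the name-value pairs out of it takes only a few lines of code. The following is a rough Python sketch of how a server might do that; `RAW_REQUEST` and `parse_request` are illustrative names, and real servers handle many edge cases (repeated headers, folded lines, malformed input) that this ignores.

```python
# The request from the article, written out with the CRLF line endings HTTP uses.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.yahoo.com\r\n"
    "Referer: http://www.google.com/search?q=best+directory\r\n"
    "Accept-Language: en-US, en, fr-CA, fr\r\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) "
    "Gecko/2009101601 Firefox/3.0.15 GTB5\r\n"
    "\r\n"
)


def parse_request(raw):
    """Return (method, path, version, headers) for a raw HTTP request."""
    # Everything before the first blank line is the request line plus headers.
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")

    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()

    return method, path, version, headers


method, path, version, headers = parse_request(RAW_REQUEST)
print(method, path, version)
for name, value in headers.items():
    print(f"{name} = {value}")
```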

These headers tell the web server:

* That the visitor was directed to the website from a google.com search for the term “best directory” (the “Referer” header; yes, it's misspelled, but that's how it appears in the protocol and it cannot be changed).

* That the visitor reads English and French (the “Accept-Language” header).

* That the visitor is using the Firefox browser (the “User-Agent” header).
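The three deductions above can be sketched in Python using only the standard library. The function names (`search_term`, `languages`, `browser`) are invented for this example, and the User-Agent pattern is deliberately crude: it only recognizes a few browser names and is an assumption about how one might do this, not a robust detection method.

```python
import re
from urllib.parse import parse_qs, urlparse


def search_term(referer):
    """Return the 'q' query parameter of the referring URL, if there is one."""
    query = parse_qs(urlparse(referer).query)
    return query.get("q", [None])[0]


def languages(accept_language):
    """Return the distinct base languages, in the order the browser listed them."""
    result = []
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip()   # drop an optional ";q=0.8" weight
        base = tag.split("-")[0].lower()   # "fr-CA" -> "fr"
        if base and base not in result:
            result.append(base)
    return result


def browser(user_agent):
    """Return (name, version) for a handful of known browsers, else None."""
    match = re.search(r"(Firefox|Chrome|Safari|Opera)/([\d.]+)", user_agent)
    return match.groups() if match else None


term = search_term("http://www.google.com/search?q=best+directory")
langs = languages("en-US, en, fr-CA, fr")
agent = browser(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) "
    "Gecko/2009101601 Firefox/3.0.15 GTB5"
)
print(term)   # best directory
print(langs)  # ['en', 'fr']
print(agent)  # ('Firefox', '3.0.15')
```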

When combined with the IP address of your computer (which the web server gets directly from the network connection the browser makes), this information can tell the webmaster a lot about the visitors that are browsing the site. None of it is personally identifiable information, but it's definitely useful. Webmasters can even tell which part of the world you're coming from based on your computer's IP address.
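As a rough illustration of the server's side of this, the Python sketch below uses the standard `http.server` module to log what it can see about each visitor. Note that the IP address comes from the network connection itself, not from any header. The names `visitor_summary`, `LoggingHandler`, and `serve`, and the choice of port 8000, are assumptions made for this example; mapping an IP address to a location needs a separate geolocation database, which is not shown.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def visitor_summary(client_ip, headers):
    """Collect what a server can see about a visitor who never registered."""
    return {
        "ip": client_ip,
        "referer": headers.get("Referer"),
        "languages": headers.get("Accept-Language"),
        "user_agent": headers.get("User-Agent"),
    }


class LoggingHandler(BaseHTTPRequestHandler):
    """Answers every GET with a short page and prints the visitor summary."""

    def do_GET(self):
        # client_address is taken from the TCP connection, not from a header.
        print(visitor_summary(self.client_address[0], self.headers))

        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server on localhost; port 8000 is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), LoggingHandler).serve_forever()
```

Calling `serve()` and then visiting http://127.0.0.1:8000/ in a browser should print one summary per request; absent headers (for example, no Referer when the address is typed directly) show up as `None`.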

You can control how much of this information makes it to the web server. If you use the Firefox browser, for example, there are add-ons (extensions) that let you disable or otherwise mask these headers.

For the most part, though, these headers are actually useful for the webmaster and there's no need to block most of them. The only ones that should concern you are cookies, small pieces of data that web servers can insert into an HTTP conversation and that your browser sends back on later visits. Cookies have their uses, but they're also a privacy concern when misused by website owners. Luckily, a good cookie blocker is all you need to fix that problem.
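Cookies travel in headers too: the server sets one with a “Set-Cookie” response header, and the browser returns it in a “Cookie” request header on later visits. Here is a small Python sketch of that round trip using the standard `http.cookies` module. The cookie name `visitor_id`, its value, and the helper names are hypothetical and chosen only for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie response header sent by a server.
SET_COOKIE = "visitor_id=abc123; Path=/; Max-Age=31536000"


def parse_set_cookie(header_value):
    """Parse a Set-Cookie header value into a SimpleCookie object."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return cookie


def cookie_request_header(cookie):
    """Build the Cookie header value a browser would send back later."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in cookie.items())


cookie = parse_set_cookie(SET_COOKIE)
morsel = cookie["visitor_id"]
print(morsel.value)        # the identifier the server assigned
print(morsel["path"])      # which part of the site it applies to
print(morsel["max-age"])   # lifetime in seconds (here, one year)
print(cookie_request_header(cookie))
```

Because the same identifier comes back with every request, the server can link separate visits together, which is why blocking or clearing cookies limits this kind of tracking.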