White House website and the Robots.txt file

Written by Shanx October 29th, 2003


Does the White House have something to hide?

Senator Hiram Johnson famously quipped (http://www.bartleby.com/73/1925.html) that
“the first casualty when war comes is the truth.” As the war in Iraq
continues, is the White House intentionally preventing search engines
from preserving a record of its statements on the conflict with an egregious robots.txt file? Or did
its staff simply make a technical mistake?

When search engines “spider” the web in search of documents for their
indices, web site owners sometimes put a file called robots.txt (http://www.robotstxt.org/wc/robots.html) on their site, which
instructs the “spiders” not to index certain files. This can be for
policy reasons, if an author does not want his or her pages to appear
in search listings, or it can be for technical reasons, for example if
a web site is dynamically generated and cannot or should not be
downloaded in its entirety.
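
To make the mechanism concrete, here is a minimal sketch of how a well-behaved spider consults robots.txt before fetching a page, using the parser in Python's standard library. The file contents, the host www.example.com and the crawler name ExampleBot are all made up for illustration; this is not the White House's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A polite crawler asks before each fetch; "ExampleBot" is a made-up name.
for path in ("/index.html", "/private/notes.html"):
    url = "http://www.example.com" + path
    allowed = parser.can_fetch("ExampleBot", url)
    print(path, "may be fetched" if allowed else "should be skipped")
```

Note that the file is only a request. Nothing stops a spider from ignoring it, which is why the article speaks of the White House "requesting" that pages not be indexed.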

According to reports (http://yro.slashdot.org/article.pl?sid=03/10/27/2052228&mode=nested&tid=103&tid=126&tid=95&tid=99&threshold=2),
though, the White House is requesting that search engines not index certain pages related to Iraq (http://www.bway.net/~keith/whrobots/). In addition to keeping those pages out of search results, this prevents archives like Google’s cache (http://www.google.com/help/features.html#cached) and
the Internet Archive from
storing copies of pages that may later change. 2600
called the White House to investigate the matter.

According to White House spokesman Jimmy Orr, the blocking of search engines is not an attempt to ensure future revisions
will remain undetected. Rather, he explained, they “have an Iraq
section [of the website] with a different template than the main
site.” Thus, for example, a press release on a meeting between
President Bush and “Special Envoy” Bremer is available in the
Iraq template (http://www.whitehouse.gov/news/releases/2003/10/iraq/20031027-1.html, blocked from being indexed by search engines) or the
normal White House template (http://www.whitehouse.gov/news/releases/2003/10/20031027-1.html, available for indexing by search
engines). The intent, Mr. Orr said, was that when people search,
they should not get multiple copies of the same information. Most of
the “suspicious” entries in the robots.txt file do, indeed, appear to
have only this effect.
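
One way to test the "duplicate template" explanation is to derive, for each blocked Iraq-template URL, the address where the normal-template copy would be expected. The sketch below assumes the pattern seen in the single press-release pair above, where the Iraq copy sits in an extra "iraq" directory directly above the file. That rule is an inference from one example, not a documented White House convention, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normal_template_url(iraq_url: str) -> Optional[str]:
    """Guess the normal-template URL for an Iraq-template page.

    Assumption: the Iraq copy differs only by an "iraq" directory
    directly above the file name. Returns None if the URL does not
    fit that pattern.
    """
    parts = urlsplit(iraq_url)
    segments = parts.path.split("/")
    # Need a file name preceded by an "iraq" directory.
    if len(segments) < 3 or segments[-2] != "iraq" or not segments[-1]:
        return None
    del segments[-2]
    return urlunsplit(parts._replace(path="/".join(segments)))


blocked = "http://www.whitehouse.gov/news/releases/2003/10/iraq/20031027-1.html"
print(normal_template_url(blocked))
```

The result is only a candidate address. A blocked entry is harmless duplication only if a page with the same content actually exists there, which still has to be checked by hand or by fetching it.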

According to the robots.txt of October
24 (http://www.bway.net/~keith/whrobots/robotsWHcurrent10-24-03.txt), though, the In Focus: Iraq
section of the site (http://www.whitehouse.gov/infocus/iraq/) was blocked from search engines. Some of the information
there (http://www.whitehouse.gov/infocus/iraq/kay-20031008.html) does not appear (http://www.whitehouse.gov/query.html?col=colpics&qt=%2B%22david+kay%22+%2B%22interim+progress%22&submit.x=0&submit.y=0)
to be available anywhere else on the White House site. However, it
seems that, in response to inquiries from 2600 and other
sources, the White House web team has recently changed its robots.txt (http://www.whitehouse.gov/robots.txt) so that
these files are no longer blocked. (The current Last-Modified
date (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.29) on the robots.txt is 23:22 GMT,
October 27th, after work on this article had already begun.)
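
For readers who want to check such a date themselves, the following is a rough sketch of reading the Last-Modified header with Python's standard library. The function names are mine, and the sample header string is a hypothetical value in the standard HTTP date format, built from the time quoted above with the seconds assumed to be zero. The server's actual header is not reproduced in this article.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.request import Request, urlopen


def parse_last_modified(header_value: str) -> datetime:
    """Convert an HTTP date such as a Last-Modified value to a datetime."""
    return parsedate_to_datetime(header_value)


def fetch_last_modified(url: str) -> Optional[datetime]:
    """Send a HEAD request and return the Last-Modified time, if any."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        value = response.headers.get("Last-Modified")
    return parse_last_modified(value) if value else None


# Hypothetical header value; the seconds field is an assumption.
sample = parse_last_modified("Mon, 27 Oct 2003 23:22:00 GMT")
print(sample.isoformat())
```

Calling fetch_last_modified("http://www.whitehouse.gov/robots.txt") today would of course report whatever the server currently sends, not the 2003 value.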

It is of course open to speculation as to whether the original
blocking of the content in question was malicious or an honest
mistake. Certainly anyone who maintains a large website has made some
sort of technical mistake at least once, and the promptness with which
the error was fixed after it was pointed out suggests that the White
House had no interest in keeping it in place. The White House, as an
entity responsible to the citizenry and an entity that has generated a
lot of criticism over its handling of the situation in Iraq, ought to
take special care to avoid similar mistakes in the
future. Nonetheless, we are pleased to learn that, at least this time,
the issue seems to have been resolved promptly.

Posted in Miscellaneous

2 Comments

  1. jacob phillips says:

    you may also enjoy reading this:
    http://condi.topcities.com/whrobots/index.html

  2. taylo says:

    I have a problems with this error when trying to go to windows updated site Error Ox80072EEZ .
