Level 0: lifelike-unpredictable events
A web crawler is employed to realize a lifelike-unpredictable event:
unpredictable as the unexpected encounter with a friend in a crowded
street of a foreign city.
A lifelike-unpredictable crawler is started with a click on the Infome interface.
The instant T0 of the click will be captured by the crawler. From that time
on, its task is to search the web, moving from page to page till it
encounters a page that someone has created/modified at a point in time T1
later than the originating instant T0 (T1 > T0). The page found is returned
together with various measures of the length of the search that was required.
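The quest can be sketched as follows. This is a minimal simulation, not the Infome implementation: the page store, link structure, and modification timestamps are toy stand-ins for fetching real pages and reading their Last-Modified times.

```python
import random
from datetime import datetime, timedelta, timezone

def lifelike_crawl(pages, start_url, t0, max_steps=10000):
    """Walk the (simulated) web link by link until reaching a page whose
    modification time T1 is later than the originating instant T0
    (T1 > T0). Returns the page found together with measures of the
    search effort: the number of links visited and the leap into the
    future, T1 - T0."""
    url, visited = start_url, 0
    while visited < max_steps:
        page = pages[url]
        visited += 1
        t1 = page["modified"]
        if t1 > t0:  # the quest succeeds: this page postdates the click
            return {"url": url, "links_visited": visited, "leap": t1 - t0}
        url = random.choice(page["links"])  # hop to a linked page
    return None  # the web stood still for too long
```

A usage sketch: with T0 at the moment of the click, a page last modified yesterday is skipped, while one modified an hour after T0 ends the search.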
The simple fact that the final output is something created/modified after
you started the crawler's quest accounts for the unpredictable nature of
its result and for its similarity to an unplanned
encounter. Such lifelike-unpredictability is very rarely found in the
digital realm (do you know of other cases?). Mostly, when computers provide
us with random events or numbers, they are just scrambling around some
definite input so that the result of a deterministic
algorithm might appear to us unpredictable rather than obvious. But the web,
while digital, comes incredibly close to life, defying the deterministic
spell, because of its continuous-creation process.
A lifelike-unpredictable crawler may suggest that what will happen tomorrow,
rather than having been laid out yesterday in some cosmic plan, will be the
result of tomorrow's creation, and that we can all contribute. Ultimately, the crawler's
quest is for something that doesn't exist yet, but which is already dreamt about.
Starting such crawlers is playing with time.
Level 1: web clocks
Under the simplest hypothesis, the effort required by a crawler to find its page is
inversely related to the ratio between the rate of change or growth of the web and its total size.
The minute the web stops changing and growing, no lifelike-unpredictable crawler
will return from its voyage anymore. Even just a slowdown in the web's
evolution would cause our crawlers to spend much more time finding their pages.
As time in our physical world is measured by movement, a measure of time
which is completely intrinsic to the web can be realized by starting new
crawlers at regular intervals and monitoring their search efforts (for
instance the number of links visited, or their leap into the future, T1 - T0).
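Purely as an illustrative sketch of reading such a clock (the function name and the smoothing window are assumptions, not part of the project): record the effort of each regularly started crawler, smooth the sequence, and watch whether the ticks lengthen.

```python
def web_clock(efforts, window=5):
    """Given the search efforts (e.g. number of links visited) of
    crawlers started at regular intervals, return a moving average of
    those efforts. A rising curve means each tick of this intrinsic
    clock takes longer: the web's evolution is slowing down."""
    ticks = []
    for i in range(len(efforts) - window + 1):
        ticks.append(sum(efforts[i:i + window]) / window)
    return ticks
```

For example, a run of efforts that doubles over time produces a steadily rising tick curve, the web clock's way of "running slow."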
Looking at a clock always generates a slight anxiety, maybe
because we fear that time could stop, or that it could run too fast, leaving us
behind. This is surely the case for the web clock.
The lifelike-unpredictable crawlers
return (they do!) a newly created or modified page in a reassuringly small
amount of time, i.e. number of visited links. The web is alive, at least for now.
Some of the pages returned belong to categories of pages updated very
frequently and regularly (news, bulletins, forecasts).
There are pages with the wrong time stamp, ugh!
Greenwich Mean Time is the universal time of the web, allowing
simultaneity to be defined across time zones.
If a crawler gets to a dead link, something pointing to a page which has
been removed, it gets back the infamous page-not-found (404) page. Such a
page is created by HTTP servers right at the time of the query. So our crawler
stops its search, having found a page created later than its originating
instant T0. This unexpected side-effect allows the same crawlers to also
measure the web's rate of decay. But the resulting web clock can
be of a depressing kind.
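The side-effect can be sketched in a few lines (a hypothetical helper, not the project's code): each crawl ends either on a genuinely fresh page or on a 404 page that the server generated at query time, so the fraction of 404 endings gives a rough measure of decay relative to creation.

```python
def decay_rate(crawl_results):
    """Each crawl result ends either on a freshly created/modified page
    (status 200) or on a dead link answered with a just-generated
    page-not-found page (status 404). The fraction of 404 endings is a
    rough measure of the web's rate of decay."""
    dead = sum(1 for r in crawl_results if r["status"] == 404)
    return dead / len(crawl_results)
```

A clock built on this fraction ticks faster the more the web rots, which is why it can be of a depressing kind.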