giuseppe prisco

Level 0: lifelike-unpredictable events

A web crawler is employed to realize a lifelike-unpredictable event: unpredictable like the unexpected encounter with a friend on a crowded street in a foreign city.

A lifelike-unpredictable crawler is started with a click on the Infome interface. The crawler captures the instant T0 of the click. From then on, its task is to search the web, moving from page to page until it encounters a page that someone created or modified at a time T1 later than the originating instant T0 (T1 > T0). The page found is returned together with several measures of the length of the search that was required.
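The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Infome implementation: the breadth-first order, the `max_visits` budget, and the `fetch` callback (a hypothetical function returning a page's modification time and its outgoing links) are all choices made for the example, and a small in-memory "web" stands in for real HTTP requests.

```python
from collections import deque
from datetime import datetime, timezone


def lifelike_crawl(start_url, fetch, t0, max_visits=1000):
    """Walk from page to page until one modified after t0 is found.

    fetch(url) is a hypothetical callback returning (last_modified, links),
    where last_modified is a timezone-aware datetime or None.
    Returns (url, t1, visits), or None if the visit budget runs out.
    """
    queue = deque([start_url])
    seen = {start_url}
    visits = 0
    while queue and visits < max_visits:
        url = queue.popleft()
        last_modified, links = fetch(url)
        visits += 1
        # Stop at the first page created or modified after the click (T1 > T0).
        if last_modified is not None and last_modified > t0:
            return url, last_modified, visits
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return None


# Toy in-memory web (an assumption for the example, not real data).
T0 = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)
OLD = datetime(1999, 6, 1, tzinfo=timezone.utc)
NEW = datetime(2000, 1, 1, 12, 5, tzinfo=timezone.utc)
TOY_WEB = {
    "a": (OLD, ["b", "c"]),
    "b": (OLD, ["a", "d"]),
    "c": (None, []),
    "d": (NEW, []),
}

result = lifelike_crawl("a", TOY_WEB.__getitem__, T0)
found_url, t1, visits = result
leap = t1 - T0  # the crawler's "leap in future", T1 - T0
```

Both the number of visits and the difference T1 - T0 are returned, since either can serve as a measure of the length of the search.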

The simple fact that the final output is something created or modified after you started the crawler's quest accounts for the unpredictable nature of its result and for its similarity to an unplanned encounter. Such lifelike-unpredictability is very rarely found in the digital realm (do you know of other cases?). Mostly, when computers provide us with random events or numbers, they are just scrambling some definite input, so that the result of a deterministic algorithm appears unpredictable to us rather than obvious. But the web, while digital, comes incredibly close to life, defying the deterministic spell, because of its process of continuous creation.

A lifelike-unpredictable crawler may suggest that what will happen tomorrow, rather than having been laid out yesterday in some cosmic plan, will be the result of tomorrow's creation, and that we can all contribute to it. Ultimately, the crawler's quest is for something that doesn't exist yet but is already dreamt about. Starting such crawlers is playing with time.

Level 1: web clocks

Under the simplest hypothesis, the effort a crawler needs to find its page is directly related to the ratio between the web's rate of change or growth and its total size: the higher the ratio, the shorter the search. The minute the web stops changing and growing, no lifelike-unpredictable crawler will ever return from its voyage again. Even a mere slowdown in the web's evolution would cause our crawlers to spend much more time finding their yet-to-be destinations. Just as time in our physical world is measured by movement, a measure of time completely intrinsic to the web can be realized by regularly starting new crawlers and monitoring their search efforts (for instance, the number of links visited, or perhaps their leap into the future, T1-T0). Looking at a clock always generates a slight anxiety, maybe because we fear that time could stop, or that it could run too fast, leaving us behind. This is surely the case for the web clock.
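One way to make the simplest hypothesis concrete is the toy model below. It assumes, purely for illustration, that each page a crawler visits is an independent, uniformly random sample of the web; real crawlers follow links, so this is a rough approximation at best. Under that assumption the expected number of visits is the total size divided by the number of pages changed since T0, and a batch of observed search efforts can be turned back into an estimate of that fraction. The function names are hypothetical.

```python
from statistics import mean


def expected_visits(fresh_pages, total_pages):
    """Mean number of visits until the first fresh page is met,
    assuming each visit is a uniform random sample of the web."""
    if fresh_pages <= 0:
        return float("inf")  # a frozen web: the crawler never returns
    return total_pages / fresh_pages


def clock_reading(visit_counts):
    """Estimate the fresh fraction of the web from the search efforts
    (links visited) of a batch of regularly started crawlers."""
    return 1.0 / mean(visit_counts)


slow_web = expected_visits(10, 1000)        # 1% of pages changed
frozen_web = expected_visits(0, 1000)       # nothing changed
reading = clock_reading([50, 100, 150])     # three hypothetical crawler runs
```

In this model a web clock "ticks" faster when the readings grow and slows toward zero as the web freezes.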

Findings

The lifelike-unpredictable crawlers return (they do!) a newly created or modified page after a reassuringly small amount of time (i.e. of visited links). The web is alive, at least for now. Some of the pages returned belong to categories that are updated very frequently and regularly (news, bulletins, forecasts). There are pages with the wrong time stamp, ugh! Greenwich Mean Time is the universal time of the web, making it possible to define simultaneity across time zones.
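The role of Greenwich Mean Time can be illustrated with a short sketch that normalizes time stamps to a single universal time before comparing them with T0. It uses the standard library's `email.utils.parsedate_to_datetime` to read HTTP-style dates. Treating a stamp that lies in the future as a "wrong time stamp" is an assumption of this example, as are the function names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(header_value):
    """Return the time stamp as a UTC datetime, or None if unusable."""
    try:
        stamp = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return None  # unparseable date
    if stamp.tzinfo is None:
        return None  # no usable time zone information
    return stamp.astimezone(timezone.utc)


def created_after(header_value, t0, now):
    """True if the stamp is later than t0 but not later than now.

    Rejecting stamps from the future is one guess at what a
    'wrong time stamp' means; other checks are possible.
    """
    t1 = parse_http_date(header_value)
    return t1 is not None and t0 < t1 <= now


t0 = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2000, 1, 1, 12, 10, tzinfo=timezone.utc)

# The same instant, written in two different time zones.
gmt = parse_http_date("Sat, 01 Jan 2000 12:05:00 GMT")
est = parse_http_date("Sat, 01 Jan 2000 07:05:00 -0500")

fresh = created_after("Sat, 01 Jan 2000 12:05:00 GMT", t0, now)
bogus = created_after("Tue, 01 Jan 2030 00:00:00 GMT", t0, now)
```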

If a crawler gets to a dead link, one pointing to a page that has been removed, it gets back the infamous page-not-found (404) page. Such a page is created by the HTTP server right at the time of the query. So our crawler stops its search, having found a page created later than its originating instant T0. This unexpected side effect allows the same crawlers to measure the rate of decay of the web as well. But the resulting web clock can be of a depressing kind.
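To tell the two kinds of stop apart, a crawler could record the HTTP status code of the page on which it ended. The sketch below is an assumption about how such bookkeeping might look, with hypothetical function names and made-up sample data; it labels each stop as growth or decay and reports the share of searches that ended on a dead link.

```python
def classify_stop(status_code):
    """Label the page that ended a search: a 404 signals decay,
    anything else is counted as growth or change."""
    return "decay" if status_code == 404 else "growth"


def decay_share(stop_statuses):
    """Fraction of crawler runs that ended on a page-not-found page,
    or None when there are no runs to judge."""
    if not stop_statuses:
        return None
    dead = sum(1 for code in stop_statuses if classify_stop(code) == "decay")
    return dead / len(stop_statuses)


# Status codes of the final page for four hypothetical crawler runs.
runs = [200, 404, 200, 404]
share = decay_share(runs)
```

Tracking this share over time would give the "depressing" variant of the web clock, one that ticks with removals rather than with creation.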
