Workshop A: Web-wide Indexing/Semantic Header or Cover Page
Chair: Bipin C. Desai, Brian Pinkerton
John Leavitt
The WebAnts project is working to develop both cooperative explorers
and cooperative servers, called "ants". This cooperation will allow
the processing and throughput load to be distributed among many
machines.
During indexing and exploration, this means that many explorers work
at once and remain in constant communication with each other to
ensure that no area is explored twice. This expands both the
processing and throughput resources available, so that much
more of the web can be explored in a given period of time.
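
As a minimal sketch of this idea, and not the WebAnts implementation,
the Python below runs several explorers against a shared registry of
claimed URLs, so that no page is explored twice. The names
ClaimRegistry, explore, run_explorers and the toy LINKS graph are
hypothetical, and threads in a single process stand in for explorers
on separate machines.

```python
import threading
from collections import deque


class ClaimRegistry:
    """Shared record of URLs that some explorer has already claimed."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, url):
        """Return True only for the first explorer to claim this URL."""
        with self._lock:
            if url in self._claimed:
                return False
            self._claimed.add(url)
            return True


def explore(start, links, registry, visited_log):
    """Breadth-first walk that skips anything another explorer claimed."""
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if not registry.try_claim(url):
            continue  # someone else already covers this page
        visited_log.append(url)
        queue.extend(links.get(url, []))


# Toy link graph standing in for the web (assumption for illustration).
LINKS = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}


def run_explorers(starts, links):
    """Run one explorer per start URL and return each explorer's log."""
    registry = ClaimRegistry()
    logs = [[] for _ in starts]
    threads = [
        threading.Thread(target=explore, args=(s, links, registry, log))
        for s, log in zip(starts, logs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs
```

Which explorer ends up with which page depends on timing, but every
reachable page is claimed exactly once. In a real deployment the
registry would have to be replaced by messages between machines.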
When serving information, a similar situation will exist, with many
servers each providing only part of the entire database. When a user
asks one server for information, the server can not only check its own
portion of the database, but can also, if necessary or desired, pass
the request on to the other servers. The responses from the different
servers can then be assembled into a single response for the user. In
general, the user should not even notice that this extra work is being
done. The advantage is that it is not necessary to have a single
machine providing all the processing, throughput, and storage.
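
The serving side described above can be sketched in the same spirit.
This is an illustration under assumptions, not the project's design:
IndexServer, its peers list, and the substring matching are all
hypothetical. Each server searches its own portion of the database,
optionally forwards the request to the other servers, and merges the
answers into one response.

```python
class IndexServer:
    """One server holding only part of the whole database."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # maps URL -> document text
        self.peers = []             # other IndexServer instances

    def search_local(self, term):
        """Return URLs in this server's portion whose text has the term."""
        return [
            url for url, text in self.documents.items()
            if term in text.lower()
        ]

    def search(self, term, forward=True):
        """Answer from the local portion and, if asked, from the peers."""
        term = term.lower()
        results = self.search_local(term)
        if forward:
            for peer in self.peers:
                # forward=False keeps peers from forwarding again
                results.extend(peer.search(term, forward=False))
        # A single merged, de-duplicated response for the user.
        return sorted(set(results))
```

A user who queries any one server sees a single list and need not
know that other servers contributed to it:

```python
east = IndexServer("east", {"http://a.example/": "Ants and robots"})
west = IndexServer("west", {"http://b.example/": "More about ants"})
east.peers.append(west)
west.peers.append(east)
merged = east.search("ants")
```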
Note that the WebAnts project is concerned with developing the
technology to support this sort of distribution. There remain
significant questions as to what the best way may be to break up the
database. One obvious approach is to distribute the information based on
its geographic location, i.e. a server in London has the database for
all European documents, one in Pittsburgh has all of the US and
Canada, etc. This approach would have many advantages, not the least
of which is eliminating unnecessary traffic between continents.
Unfortunately, it also has a serious problem, in that it does not
reflect the structure of the web, and therefore is unlikely to mirror
the distribution during exploration. This lack of symmetry would
necessitate an intermediate step in which each explorer attempts to
determine which servers should have which information and ships it off
to them. Such a step would be non-trivial to implement and could
tend to decrease the value of distributing the exploration phase.
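
To make the intermediate step concrete, the sketch below groups the
URLs an explorer has gathered by the server that should hold them.
It rests on loud assumptions: the REGION_SERVER table, the
DEFAULT_SERVER fallback, and the use of the final label of the host
name as a stand-in for geographic location are all hypothetical, and
a real system would need a far more reliable way of locating a
document.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical table: final host-name label -> server holding that region.
REGION_SERVER = {
    "uk": "london", "de": "london", "fr": "london",
    "us": "pittsburgh", "ca": "pittsburgh",
}
DEFAULT_SERVER = "pittsburgh"  # assumed fallback for unknown suffixes


def home_server(url):
    """Guess which regional server should store this document."""
    host = urlsplit(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1]
    return REGION_SERVER.get(suffix, DEFAULT_SERVER)


def route_records(explored_urls):
    """Group one explorer's findings into a batch per destination server."""
    batches = defaultdict(list)
    for url in explored_urls:
        batches[home_server(url)].append(url)
    return dict(batches)
```

Every explorer would have to run such a pass and then ship each batch
to its server, which is the extra traffic and complexity the text
warns about.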
The WebAnts project recently received funding from Texas Instruments
so that a more concentrated, albeit still part-time, effort may be put
forth. As part of this effort, cooperative explorers are currently
being developed. These will follow a master and servants design, in
which each master will coordinate the efforts of its servants, both
among them and by communicating with other masters. The first
versions should start exploring the web by the end of March. The next
phase of the project is the development of distributed servers, which
should make their appearance in early fall.