Workshop A: Web-wide Indexing/Semantic Header or Cover Page
Co-Chairs: Bipin C. Desai, Brian Pinkerton
Summary
What to index?
- Use the anchor text of the HTML links,
- Use the title and headings of the HTML page,
- Use the full text to create an index,
- Use the filename of the HTML resource,
- Word occurrence - URL pairs,
- Inverted indices of keywords,
- Indices of interesting keywords.
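
The word occurrence - URL pairs and inverted indices listed above can be illustrated with a minimal sketch. It is not taken from any of the systems discussed at the workshop; the function names (tokenize, build_inverted_index, search) and the example.org URLs are assumptions made for illustration, and the text passed in for each page could be its title, headings, anchor text, or full text.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(pages):
    """Map each word to the set of URLs whose text contains it.

    `pages` maps a URL to the text chosen for indexing (title, headings,
    anchor text, or full text); that choice is left to the caller.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)  # one word occurrence - URL pair
    return index

def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # index.get avoids creating empty entries for unknown words.
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))

# Hypothetical pages, used only to exercise the sketch.
pages = {
    "http://example.org/a.html": "Web-wide Indexing Workshop",
    "http://example.org/b.html": "Indexing images and sound",
}
index = build_inverted_index(pages)
print(search(index, "indexing"))
print(search(index, "indexing workshop"))
```

Indexing only titles or anchor text keeps the index small, while indexing the full text improves recall at the cost of size; the sketch works the same way for either choice.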
How are indices created?
- Use a robot to scan the Web for new and changed HTML resources,
- Systems based on server-side support,
- Different frequency of updates.
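
One way a robot might tell new and changed resources apart is to keep a digest of each page from the previous scan and compare it on the next visit. The sketch below assumes this digest-based approach, which is not necessarily what any existing robot does; scan, digest, and the fetch callable are hypothetical names, and a dictionary stands in for the Web so that no network access is needed.

```python
import hashlib

def digest(content):
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def scan(urls, fetch, seen):
    """Classify each URL as 'new', 'changed', or 'unchanged'.

    `fetch` is any callable returning the content of a URL (assumed here).
    `seen` maps URL -> digest from the previous scan and is updated in place.
    """
    report = {}
    for url in urls:
        current = digest(fetch(url))
        if url not in seen:
            report[url] = "new"
        elif seen[url] != current:
            report[url] = "changed"
        else:
            report[url] = "unchanged"
        seen[url] = current
    return report

# A dictionary standing in for the Web (illustrative data only).
web = {"http://example.org/a.html": "first version",
       "http://example.org/b.html": "stable page"}
seen = {}
first = scan(list(web), web.get, seen)

web["http://example.org/a.html"] = "second version"
web["http://example.org/c.html"] = "brand new page"
second = scan(list(web), web.get, seen)
print(second)
```

Only the resources reported as new or changed would need to be re-indexed, and how often scan is run corresponds to the update frequency mentioned above.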
What is indexed?
- Approximately twenty search engines with accompanying services for parts of the WWW,
- Each covers a part of the Web,
- Gateways to other indexing services such as WAIS.
What is needed?
- How does a naive user start a search? Danger of hierarchical index!
- Hierarchical searching,
- Share information and avoid replication,
- Parallel, fault-tolerant, and scalable index server,
- Natural-language independent,
- Find all the _____ having concept/property _____,
- Index other forms of resources: images, graphics, or sound,
- Capture structure of the Web, hyper-media,
- Common interface,
- Index generation with revision control,
- Establish a Web Indexers' Working Group.
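
Since each existing service covers only part of the Web, a common interface could let one query be sent to several services and the answers merged. The following is a rough sketch under that assumption; federated_search and the two stand-in services are hypothetical, and each service is modelled simply as a callable that takes a term and returns a list of URLs.

```python
def federated_search(services, term):
    """Query every service and merge the results, dropping duplicate URLs.

    The order of first appearance is kept, so earlier services rank first.
    """
    merged = []
    seen = set()
    for service in services:
        for url in service(term):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Two stand-in services, each covering a different part of the Web.
def service_one(term):
    data = {"indexing": ["http://example.org/a.html", "http://example.org/b.html"]}
    return data.get(term, [])

def service_two(term):
    data = {"indexing": ["http://example.org/b.html", "http://example.org/c.html"]}
    return data.get(term, [])

results = federated_search([service_one, service_two], "indexing")
print(results)
```

A real common interface would also have to agree on query syntax and result ranking, which this sketch leaves out.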
Dr. Bipin C. Desai
Concordia University
Montreal, Canada
Email: bcdesai@cs.concordia.ca
Messages: (514)-848-3040
Fax: (514)-848-8652