Ex Bibliotheca

The life and times of Zack Weinberg.
Wednesday, 11 December 2002

# 6:30 PM
web logs and search engines

These days there's a lot of good content out there in the form of web logs. Unfortunately, it's not indexed well by search engines. The trouble is, the webcrawler comes by and records whatever's on the front page of the log at the time, but by the time you go to make a search, a whole bunch more entries will have been added, pushing the entry you searched for off the front page. Entries in the archives may not be indexed at all. This despite the fact that most weblogs have 'permanent links': the little
blue hashmark at the beginning of this entry is an example.

Here's how I would implement this. Suppose we invent a labelling scheme
which will allow a webcrawler to tell that an <a> tag is a permanent link:

<a class="permalink-above" href="...">
<a class="permalink-below" href="...">

"permalink-above" means that the tag precedes the text it is a
permanent link for; "permalink-below" means that it follows that
text. (Both styles are used.) We also need a way to indicate the
block-level element that contains all the permanent links, so that
navigation and page header boilerplate don't get sucked into the
permalink indexing mode. For this, we define another

Search engines then should record each chunk of text in the ambit of a permalink tag as a separate logical document. However, links to the base URL for the weblog should count as links to all of these chunks, for scoring purposes. (This corresponds to the intuitive interpretation of a link to the base URL, which is "I like everything this author says.") Links to individual permalinks count only for that chunk.

Your comments are requested.
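
To make the proposal a little more concrete, here is a rough Python sketch of what the crawler side might look like. It is only an illustration under some assumptions: the names PermalinkSplitter and link_counts are made up for this example, the hrefs are assumed to be absolute URLs already, and the containing block-level element is not handled at all, so text before the first "permalink-above" link or after the last "permalink-below" link is simply dropped. The scoring function counts raw inbound links, which stands in for whatever scoring a real search engine uses.

```python
from html.parser import HTMLParser


class PermalinkSplitter(HTMLParser):
    """Hypothetical helper: split a page into (permalink_href, text) chunks."""

    def __init__(self):
        super().__init__()
        self.chunks = []     # finished (href, text) pairs, in page order
        self._href = None    # pending "permalink-above" href, if any
        self._text = []      # text collected since the last permalink
        self._skip = False   # True while inside a permalink's own link text

    def _flush(self, href):
        # Collapse whitespace; text with no owning permalink is discarded.
        text = " ".join("".join(self._text).split())
        if href is not None and text:
            self.chunks.append((href, text))
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "permalink-above" in classes:
            # Close the previous entry; the text that follows belongs here.
            self._flush(self._href)
            self._href = attrs.get("href")
            self._skip = True
        elif "permalink-below" in classes:
            # The text collected so far belongs to this link.
            self._flush(attrs.get("href"))
            self._href = None
            self._skip = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self._text.append(data)

    def close(self):
        super().close()
        # Emit the last "permalink-above" entry, if one is still open.
        self._flush(self._href)
        self._href = None


def link_counts(chunks, base_url, inbound_links):
    """Count inbound links per chunk; a link to base_url counts for all chunks."""
    counts = {href: 0 for href, _ in chunks}
    for target in inbound_links:
        if target == base_url:
            for href in counts:
                counts[href] += 1
        elif target in counts:
            counts[target] += 1
    return counts
```

To use it, feed the page to a PermalinkSplitter, call close(), and read the chunks attribute; each pair would then be indexed as its own logical document, with link_counts supplying the per-chunk link tally.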