ROADS v1 Requirements Document

21st February 1996


1. Overview of objectives for ROADS development and design

2. Objectives for ROADS v1

The following objectives for v1 are identified in the project proposal and are elaborated on the ROADS web pages:

3. General Topics

3.1 Documentation

At present there is no summary document outlining ROADS features. An outline description is required for v1, listing all features, not just new ones.
To be done by Bristol

A definition of the `boundaries' of the product is required e.g. an explicit statement of which servers and clients are supported and any constraints on search engines. Similarly there should be a document outlining existing implementation platforms.
To be done by LUT

The release documentation should list all customisable options (with default settings) in order to assist in identifying which features can be changed by the system administrator.

3.2 Customisation

The ROADS product will be customisable, but the basic release should be a working quality product i.e. sensible choices should be made for all customisable options so as little adjustment as possible is required to achieve a working service. All options should have defaults set. The default settings should be included in the list of customisable options mentioned in 3.1.

3.3 Subject headings and thesauri

Access is required to authority files in the form of external lists of terms (e.g. LC authority files) and/or internal lists of terms already used in the database. Access to these lists is required by the template creator when they are about to input to a field, and by the searcher when they are looking for search terms.

Access is required to externally produced thesauri by the template creator when they are adding terms (subject headings in particular), and by the searcher when they are considering which search terms to use.

Access to these lists will need to be by searching and browsing. The user must easily be able to add the found term to a template or search.

4. Template design

4.1 Clusters

A cluster is defined as a grouping of data which recurs every time an individual, group, or organisation is described. To avoid repeating this information in every record in which it occurs, the cluster is referred to in the record by a handle. The end-user (the searcher) can display the cluster attribute values whenever a record containing that cluster is retrieved. For the searcher, the cluster values are in effect part of the main record, i.e. a search will retrieve the record by any combination of values in the cluster and the main body of the record.

In order to work effectively the indexer must be able to locate and include already existing clusters when creating a record containing a cluster.
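A minimal sketch of how a handle might be resolved so that cluster values behave as part of the main record when searching. The field names, the handle field and the sample data are assumptions made for illustration and are not taken from the ROADS templates.

```python
# Hypothetical in-memory stores; names and fields are illustrative only.
clusters = {
    "org-001": {"Organisation-Name": "Example University",
                "Organisation-City": "Bristol"},
}
records = [
    {"Title": "Social science gateway", "Admin-Handle": "org-001"},
]


def expand(record, clusters, handle_field="Admin-Handle"):
    """Return a copy of the record with the referenced cluster merged in."""
    merged = dict(record)
    cluster = clusters.get(record.get(handle_field), {})
    for name, value in cluster.items():
        # Values already present in the main record win on a clash.
        merged.setdefault(name, value)
    return merged


def search(records, clusters, term):
    """Match the term against main-record and cluster values alike."""
    term = term.lower()
    return [r for r in records
            if any(term in str(v).lower()
                   for v in expand(r, clusters).values())]
```

Because the stored record holds only the handle, a change to the cluster is picked up by every record that refers to it.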

4.2 Destination attribute

The Destination attribute will be used to indicate whether a particular record is `approved' as available to end-users, as well as specifying the quality level of the resource. There needs to be a mechanism which allows the system administrator to specify which users retrieve which records according to the value of the Destination attribute. There would need to be several groupings allowed e.g. record unfinished; record finished but requires checking; high quality; low quality etc. High and low quality would be built into the `all UK' and `high quality world-wide' searches that are now available.
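One way the administrator's mapping from users to Destination values could look is sketched below. The group names and Destination values are assumptions chosen to mirror the groupings listed above; the real values would be set by each service.

```python
# Hypothetical administrator-defined policy: which Destination values each
# class of user may retrieve. Group and value names are illustrative only.
VISIBLE = {
    "indexer": {"unfinished", "needs-checking", "high-quality", "low-quality"},
    "end-user": {"high-quality", "low-quality"},
    "end-user-high-quality-only": {"high-quality"},
}


def retrievable(records, user_group, policy=VISIBLE):
    """Return only the records this group of users is allowed to retrieve."""
    allowed = policy.get(user_group, set())  # unknown groups see nothing
    return [r for r in records if r.get("Destination") in allowed]
```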

4.5 Recording extensions to templates

A central document needs to be maintained recording all additional templates and data elements used by ROADS implementations. This should help to deter unnecessary proliferation of new data elements.
To be done by Bath

5. Database management

5.1 Link checker

A link checker is needed to check links at regular intervals. The interval length needs to be customisable in days, weeks or months. If a link fails a customisable number of times, an e-mail should be sent to the database administrator. If the link fails a further customisable number of times, the value of the status attribute (a new attribute) should be changed in that record to ensure it can no longer be retrieved by end-users.
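The two failure thresholds can be sketched as a small state update applied after each check. The field names, the "withdrawn" status value and the rule that a successful check resets the failure count are assumptions; the requirement itself does not specify them.

```python
def record_check(record, link_ok, notify_after=3, withdraw_after=3):
    """Update a record after one link check and return the action to take.

    notify_after:   failures before the administrator is e-mailed.
    withdraw_after: further failures before the record is hidden.
    """
    if link_ok:
        record["failures"] = 0  # assumption: a success resets the count
        return "none"
    record["failures"] = record.get("failures", 0) + 1
    if record["failures"] == notify_after:
        return "notify"  # caller e-mails the database administrator
    if record["failures"] == notify_after + withdraw_after:
        record["Status"] = "withdrawn"  # hypothetical status attribute value
        return "withdraw"
    return "none"
```

Returning an action rather than sending mail directly keeps the thresholds testable and leaves the scheduling interval to whatever runs the checker.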

5.2 Resource change checker

LUT will monitor possibilities for a checker which would alert the administrator to changes in the content of a resource, with a view to inclusion in v2. In v1 the database administrator should be alerted to the new URL if the resource has moved.

5.3 Record security

Mechanisms are needed to ensure there are no unauthorised changes to templates. The design needs to take account of information providers adding and amending templates. There needs to be password protection at the server level and at the ROADS level i.e. allow for password protection on scripts such as mktemp.pl and admin.pl.

If possible, unauthorised changes should be detected and prevented. Only authorised people should be allowed to create records, and only the creator of a record should be authorised to edit it, except for the database administrator(s), who can edit any record.
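The create and edit rules reduce to two small checks, sketched here. How users are identified and where the creator is stored are assumptions; the "creator" field is hypothetical.

```python
def can_create(user, authorised_users, administrators):
    """Only authorised people (and administrators) may create records."""
    return user in authorised_users or user in administrators


def can_edit(user, record, administrators):
    """Only the record's creator or a database administrator may edit it."""
    # "creator" is a hypothetical field holding the creating user's identity.
    return user in administrators or user == record.get("creator")
```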

5.4 Database integrity

The system administrator should have means to ensure the database is not corrupted. There should be regular checks to ensure the data has not been corrupted, and automatic e-mails to alert the administrator to any problems.

Any problems with regular indexing and addition of data should be flagged to the administrator by e-mail.

There should be regular statistical reports on the number of records in the database, number of additions, deletions and edits.

5.5 Record review

If records contain review dates then the database administrator should be mailed with details of the record on that date. There should also be an option for the record creator to be mailed with details of the record at review date, either as well as, or instead of, the copy for the database administrator. Details of the record should be a customisable cut down version of the record or the complete record.

5.6 E-mail address for system administrator

The installation procedure should request an e-mail address for the system administrator(s). There should be an option to have all mails go to at least two people.

5.7 What's new list

The what's new list is compiled by including all records identified as changed from a tick box in the mktemp.pl editor. The system administrator should specify how long records should stay in the list, and there needs to be an automatic cull.
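A minimal sketch of the automatic cull, assuming each record carries the tick-box flag and the date of the change. Both field names and the 30-day default are illustrative assumptions.

```python
from datetime import date, timedelta


def whats_new(records, today, keep_days=30):
    """Return flagged records changed within the last keep_days days."""
    cutoff = today - timedelta(days=keep_days)
    return [r for r in records
            if r.get("flagged_new") and r["changed_on"] >= cutoff]
```

Recomputing the list from the flag and date on each run means no separate cull step has to remember what was listed before.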

5.8 Identifying duplicates

No action on duplicates at this stage; for consideration in v2.

5.9 Configuration checker

There should be a configuration checker for ROADS, which identifies any installation problems which may prevent the software from operating effectively and suggests remedies.

5.9 Administration using WWW forms

As much as possible of the administration of the ROADS software should be capable of being done via WWW forms. Currently there are several programs which need to be run from the command line.

5.10 Whois++

The search engine and its user interface should be divorced, and the whois++ protocol inserted between them.

5.11 Improvement to installation procedure

When a new version of the ROADS software is being installed, any configuration files for previously installed versions should not be destroyed. They might either be renamed, or the new configuration files installed with, say, a -dist suffix. The number of files which are modified during the installation process should be cut down.
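The second option (installing the new file alongside the old one with a -dist suffix) could be sketched as follows. The function name and directory layout are assumptions, not part of the existing installation procedure.

```python
import shutil
from pathlib import Path


def install_config(new_file, target_dir, suffix="-dist"):
    """Copy a configuration file into target_dir without overwriting.

    If a file of the same name is already installed, the new copy is
    written next to it with the suffix appended, and the existing file
    is left untouched.
    """
    target = Path(target_dir) / Path(new_file).name
    if target.exists():
        target = target.with_name(target.name + suffix)
    shutil.copyfile(new_file, target)
    return target
```

The administrator can then compare the -dist copy with the local file and merge any new options by hand.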

5.12 Improvement to ROADS distribution

The process of packaging up the ROADS software for distribution sometimes causes it to be damaged - this is a side-effect of the software being used to do the packaging. It may be desirable to find an alternative approach.

5.13 Reduce size of inverted index

The size of the inverted index should be cut down, preferably by not including the full path to the template, since we no longer have multiple databases. The size will also be reduced by application of the stopword list.
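Both savings can be illustrated with a toy index builder. This is a sketch only: the in-memory structure and the example path are assumptions and do not describe the existing ROADS index format.

```python
import os


def build_index(templates, stopwords=frozenset()):
    """Build a word -> set of template names index.

    templates maps a full template path to the template text.
    """
    index = {}
    for path, text in templates.items():
        handle = os.path.basename(path)  # keep the file name, drop the path
        for word in text.lower().split():
            if word not in stopwords:
                index.setdefault(word, set()).add(handle)
    return index
```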

6. User Interface

6.1 Display of search results

The default format for the display of search results should remain as full templates (with the fields to be displayed decided by the service administrator). The display of "title only" search results should be consistent with the resource listings, i.e. they should consist of a button that links to the resource itself, and a title that links to the resource description.

6.2 Display of resource descriptions

The example at <URL:http://sosig.ac.uk/rl/docs/display8.html> will be used as a model for the display of resource descriptions.

The URLs displayed within the resource description should be ordered according to the following rank:

The format of the URL display should remain as a link to the URL with the URL itself as the link text. Some mailto URLs are a problem, however, because they may need special instructions. mailto URLs should therefore link to a separate HTML page with information on how to use them. If the mailto URL requires service-specific information, this should be contained in the access-policy or authentication fields of the record as appropriate.

6.3 Display of number of hits

The number of hits returned should be displayed with the search results.

6.4 Stopwords

It should be possible to specify and update a list of stopwords which should not be indexed.

A frequency counting tool should be written to help the database administrator discover which words should be added to the stopword list. The database administrator should be able to specify whether numbers are indexed, and whether terms of fewer than three characters are indexed, with a default of no in both cases.
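A minimal sketch of the frequency counting tool and of the indexing options follows. The tokenisation rule and the parameter names are assumptions; the defaults follow the "default of no" stated above.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each word occurs across all template texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def should_index(word, stopwords, index_numbers=False, index_short=False,
                 min_length=3):
    """Apply the stopword list and the two administrator options."""
    if word in stopwords:
        return False
    if word.isdigit() and not index_numbers:
        return False
    if len(word) < min_length and not index_short:
        return False
    return True
```

The administrator would review the most frequent words, for example with `word_frequencies(texts).most_common(50)`, and add the uninformative ones to the stopword list.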

6.5 Translation of American spellings

There is a need to translate American spellings into English spellings when searching, as part of the stemming algorithm (for example colour/color, grey/gray, optimise/optimize, etc.).
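A simple way to approach this is a lookup table applied to each word before it is stemmed, sketched below. The table holds only the three examples given above; a table-based approach and the function name are assumptions rather than a description of the stemming algorithm itself.

```python
# American spelling -> English spelling, limited to the examples above.
SPELLING = {"color": "colour", "gray": "grey", "optimize": "optimise"}


def normalise(word, table=SPELLING):
    """Map a word to its English spelling; other words pass through."""
    word = word.lower()
    return table.get(word, word)
```

Applying the same function to indexed words and to search terms means either spelling retrieves the same records.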

6.6 Relevance Ranking

The search results should be displayed according to a relevance ranking algorithm. For ROADS v1 this should be kept simple. Records should be ranked according to how many times the search words appear in the template, and where. More sophisticated relevance ranking algorithms can be developed for future versions of ROADS.
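Ranking by "how many times, and where" can be sketched as a weighted count of term occurrences per field. The field names and weights are illustrative assumptions; the requirement does not fix them.

```python
# Hypothetical per-field weights; fields not listed count once per match.
WEIGHTS = {"Title": 3, "Keywords": 2, "Description": 1}


def score(record, terms, weights=WEIGHTS):
    """Sum weighted occurrences of the search terms across all fields."""
    total = 0
    for field, value in record.items():
        words = str(value).lower().split()
        weight = weights.get(field, 1)
        for term in terms:
            total += weight * words.count(term.lower())
    return total


def rank(records, terms, weights=WEIGHTS):
    """Return the records ordered from most to least relevant."""
    return sorted(records, key=lambda r: score(r, terms, weights),
                  reverse=True)
```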

The service administrator needs a mechanism to decide whether the audience ranking attribute, if used, takes precedence over other elements in the relevance algorithm.

6.7 Audience Ranking

A need has been identified to specify whether a resource will be useful to a wide or narrow audience. This will be accomplished by the addition of an audience attribute to the template which will have values set by the individual services e.g.

Audience: 2

The service administrator should ensure that the end-user has access to a document containing the meanings of the values used. The service administrator (at the point of installation?) would need to have control over how, or whether, audience ranking is to be used by the relevance ranking algorithm, and what control the end-user will have over audience ranking.
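The precedence switch could be as simple as choosing the order of a two-part sort key, as in this sketch. It assumes that a higher Audience value should rank first, which is an illustration only, since the meaning of the values is left to each service.

```python
def rank_with_audience(scored, audience_first=False):
    """Order records by relevance and audience value.

    scored is a list of (relevance, record) pairs. When audience_first is
    true, the Audience attribute takes precedence over relevance; when it
    is false, Audience only breaks ties between equal relevance scores.
    """
    def key(pair):
        relevance, record = pair
        audience = record.get("Audience", 0)  # assumption: missing means 0
        if audience_first:
            return (audience, relevance)
        return (relevance, audience)

    return [record for _, record in sorted(scored, key=key, reverse=True)]
```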


Chris Osborne, Rachel Heery