bin/addsl.pl - generate HTML subject listings from ROADS templates
bin/addsl.pl [-ANacdhi] [-f config_dir] [-l view] [-m filename] [-n database_name] [-o override_file] [-p pattern] [-s source_dir] [-t target_dir] [-u name] [-w waylay_url] [handle1 handle2 ... handleN]
The addsl.pl program generates a set of subject listing files for the templates with the specified handles. These listing files are also converted into static HTML documents which can be placed on the WWW. The program can also generate HTML lists in numerical and alphabetical order based on the contents of a subject descriptor mapping file.
The addsl.pl program can generate a number of different subject listings. This allows, for example, a subject listing of UK based resources in addition to a subject listing of all resources. The views also allow easy selection of which subject listing a template should be added to in the admin-cgi/mktemp.pl editor.
You can arrange for the ROADS software to generate listings of some or all of your templates broken down by subject area. Note that each template which you would like to appear in a subject listing should contain at least one URI attribute and at least one Subject-Descriptor cluster.
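As a rough illustration of this requirement, the following Python sketch checks whether a template would qualify for inclusion. It assumes the template has already been parsed into a dictionary of attribute names and values, and that attribute names may carry an IAFA style variant suffix such as -v1. The function name and sample values are hypothetical, and this is not a description of how addsl.pl itself performs the check.

```python
import re


def eligible_for_subject_listing(template):
    """Return True if the template has a non-empty URI and Subject-Descriptor."""
    def has_attr(base):
        # Match "URI" or "URI-v1" exactly, so that an attribute such as
        # "Subject-Descriptor-Scheme-v1" is not mistaken for a descriptor.
        pat = re.compile(r"^%s(-v\d+)?$" % re.escape(base), re.IGNORECASE)
        return any(pat.match(name) and value.strip()
                   for name, value in template.items())

    return has_attr("URI") and has_attr("Subject-Descriptor")


# Hypothetical parsed template.
sample = {
    "Handle": "0123",
    "URI-v1": "http://www.lut.ac.uk/",
    "Subject-Descriptor-v1": "30.442",
    "Subject-Descriptor-Scheme-v1": "UDC",
}
print(eligible_for_subject_listing(sample))
```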
You may have as many different views of your templates as you like. Each view is normally a collection of statically generated HTML documents created by addsl.pl, though in version 2 of ROADS you can also browse dynamically through your database using "canned" queries. The subject listings may be customized in a number of ways, notably via HTML outline files, which specify the overall format of each HTML document generated by the ROADS software. These have some extra pseudo-HTML tags which allow you to indicate where in the resulting documents you would like the subject listing information to appear.
It is also possible to specify a pattern which the URIs in the resource description templates will have to match in order to be included in a subject listing. This can be used to generate, for example, lists of resources which are found in the UK academic community, resources which are generated dynamically by scripts, all resources of a particular type (e.g. MPEG movies), and so on.
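The effect of such a pattern can be sketched in Python as follows. The handles and URIs are made up for illustration, and Python's regular expression syntax is close to, but not identical to, Perl's, so treat this as an approximation of the matching rather than the actual addsl.pl code.

```python
import re


def filter_by_uri_pattern(templates, pattern=".*"):
    """Return the handles of templates with at least one URI matching pattern.

    templates maps a handle to a list of its URIs. The default pattern
    matches every URI, so every template with a URI is kept.
    """
    regex = re.compile(pattern)
    return [handle for handle, uris in templates.items()
            if any(regex.search(uri) for uri in uris)]


# Hypothetical handles and URIs.
templates = {
    "0123": ["http://www.lut.ac.uk/"],
    "0124": ["http://www.example.com/movie.mpg"],
    "0125": ["wais://wais.example.org/db"],
}
print(filter_by_uri_pattern(templates, r"\.ac\.uk"))
```

Patterns such as `\.mpe?g$` or `^wais:` would select resources by type or by URL scheme in the same way.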
addsl.pl will also generate customizable lists of the available subject categories in both alphabetical and numerical order (assuming the Subject-Descriptor classification is numeric).
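The two orderings can be pictured with a short Python sketch. It assumes each category is a (code, name, file root) triple and that the codes can be compared as decimal numbers; both are assumptions made for illustration, and two of the three entries are invented.

```python
# (code, section name, filename root) triples.
entries = [
    ("30.442", "Development Studies", "devstud"),
    ("7", "Arts", "arts"),            # hypothetical entry
    ("51", "Mathematics", "maths"),   # hypothetical entry
]

# Alphabetical list: order by section name, ignoring case.
alphabetical = sorted(entries, key=lambda e: e[1].lower())

# Numerical list: order by the classification code read as a number.
numerical = sorted(entries, key=lambda e: float(e[0]))

print([e[1] for e in alphabetical])
print([e[0] for e in numerical])
```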
A default set of subject categories based on the different programme areas in the UK Electronic Libraries Programme (to match our sample database) is distributed with the ROADS software as config/classmap, under the top level ROADS installation directory. You will probably want to change this to reflect your installation.
The file format of the subject listing views is explained in detail below. Essentially, it should contain pointers to the location of each of the following:
HTML-Directory: subject-listing
WWW-Directory: subject-listing
Listing-Directory: subject-listing
Mapping-File: class-map
Subject-Scheme: DDC
AlphaList-File: alphalist.html
NumList-File: numlist.html
The meanings of these path names are explained below. It is worth noting that they can be either relative (to the various directories involved in generating the subject listings, such as the ROADS config, guts and htdocs directories), or absolute - e.g. /usr/local/roads/guts/subject-listing/Default. You may prefer to refer to them by the full path name to avoid confusion, but be aware that this may cause you problems if you move the ROADS installation to another directory tree.
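A minimal Python sketch of this relative versus absolute rule is given below. The installation root and the function name are assumptions. The mapping of attributes to base directories follows the view file attribute descriptions later in this manual page (HTML-Directory relative to htdocs, Listing-Directory to guts, Mapping-File to config).

```python
import posixpath

ROADS_ROOT = "/usr/local/roads"  # assumed installation directory

# ROADS directory that each view attribute is resolved against.
BASE_DIRS = {
    "HTML-Directory": "htdocs",
    "Listing-Directory": "guts",
    "Mapping-File": "config",
}


def resolve_view_path(attribute, value, roads_root=ROADS_ROOT):
    """Return value unchanged if absolute, else join it to its base directory."""
    if posixpath.isabs(value):
        return value
    return posixpath.join(roads_root, BASE_DIRS[attribute], value)


print(resolve_view_path("Listing-Directory", "subject-listing"))
print(resolve_view_path("Mapping-File", "/etc/roads/class-map"))
```

Relative paths keep a view file valid if the whole installation is moved, since only the root changes.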
Note that the ROADS software comes shipped with defaults for the Default, DefaultAlpha and DefaultNumber outlines. The outline HTML used to generate the actual subject listings lives by default under config/multilingual/*/subject-listing-views. In version 2 of ROADS we switched to using our generic HTML rendering code, away from the old hard-coded HTML rendering embedded in the older versions of this code.
If your Subject-Descriptor-Scheme is UDC (the default),
you should be able to generate subject listings for all your templates using
the default view by running addsl.pl with the -a
argument:
% addsl.pl -a
You will not need to do this if you are creating templates from scratch
using the WWW based forms editor - this gives you the option of entering
new templates into the subject listings automatically. In fact, it runs
addsl.pl behind the scenes. If you only want to add a subset of
your templates (such as those which have changed recently),
addsl.pl should be called without the -a argument, and
with the handles of the templates as arguments, e.g.
% addsl.pl 0123 0124 0125
If you would like to create more than one view of your resource description
templates, e.g. to have a separate AllUK listing of resources
which pertain to the UK higher education community (Internet domain -
ac.uk), you will need to make another view file and run
addsl.pl with the -l specifying this, e.g. the view file
for AllUK might look something like this:
Outline-File: subject-listing/Default
HTML-Directory: subject-listing/AllUK/
Listing-Directory: subject-listing/AllUK/
Mapping-File: subject-listing/classmap
Alpha-Outline: subject-listing/DefaultAlpha
Number-Outline: subject-listing/DefaultNumber
Whilst in this example the same HTML outline documents have been used for
both views, this is entirely under the control of the ROADS server
administrator. To create the AllUK view, you would need to run
addsl.pl with both the -l and -p arguments,
e.g.
% addsl.pl -a -p '\.ac\.uk' -l AllUK
The resulting subject listing files will be generated in the directory specified in the view file as HTML-Directory, e.g. /usr/local/www/ROADS/subject-listing/AllUK. The following files will be generated:
Note that if they do not exist already, you will need to create parent directories for the directories referred to in a subject listing view configuration file.
A number of options are available for the addsl.pl program to control which files are used for generating the subject listings and where configuration options are located. Note that most of these can also be supplied in the addsl.pl view config file (see below), and that settings which appear in this will usually override command line arguments.
-A
    Don't generate alphabetical subject index.
-a
    Process all templates in source directory.
-c
    Specify that the alphabetical listing should take account of the case of the characters. Without this option, acorn, Apple and Zebra are sorted in that order. With this flag set, they would be sorted as Apple, Zebra and acorn.
-d
    Specify that some (fairly copious) debugging information should be generated during the generation of the hypertext tree. This option is probably not of interest to anyone bar the developers.
-f config_dir
    Specify the directory for views configuration files.
-h
    Provide some online help outlining the options available and exit.
-i
    Regenerate HTML files regardless of timestamps on subject listing files.
-l view
    Set subject listing view name. This is the name of the file that contains the configuration information concerning the location of the listings, HTML and outline files. For more information on this see below.
    Be aware that for a given view you will actually need three sub-directories under the config/multilingual/*/subject-listing-views directory, named view, viewAlpha, and viewNumber. This is because the addsl.pl tool generates three separate sets of HTML files when it runs - the regular view of your database, plus views sorted by numerical and alphabetical order.
    Just another reminder that the settings specified in a view file typically override other command line arguments, e.g. Subject-Scheme overrides the -u argument.
-m filename
    Specify the subject descriptor mapping file to use.
-N
    Don't generate numeric subject index.
-n database_name
    Specifies the name of the database to use when generating HTML. The default is the service name which was entered when the ROADS software was installed.
-p pattern
    Only enter entries in the subject listings for templates that have URI fields that match the supplied pattern. The pattern can be a full Perl regular expression and allows one to, for example, restrict entries in the subject listings to only include UK academic sites. By default the pattern matches all URLs and so all templates are included in the hypertext lists.
-s source_dir
    Set the absolute pathname of the directory containing the IAFA templates.
-t target_dir
    Set the absolute pathname of the directory where the files created by addsl.pl will be placed.
-u name
    Sets the name of the Subject-Descriptor-Scheme to search for in the templates. The default is UDC.
-w waylay_url
    The URL to waylay people to when dealing with an unusual or complex URL scheme, e.g. wais. See the manual page for cgi-bin/waylay.pl for more information on this.
These options are then followed by zero or more templates' handles (note -
not filenames). If the -a option is given, no handles
need be given on the command line; all templates in the database will be
added to the subject listings.
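The difference that the case option makes to the alphabetical listing can be reproduced with a couple of lines of Python. This only illustrates the two orderings described for that option, and is not the sorting code used by addsl.pl.

```python
names = ["Zebra", "acorn", "Apple"]

# Default behaviour: case is ignored when ordering the names.
case_insensitive = sorted(names, key=str.lower)

# With case taken into account, upper case letters sort before lower case.
case_sensitive = sorted(names)

print(case_insensitive)
print(case_sensitive)
```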
config/class-map - default mappings from Subject-Descriptor-Scheme attributes in templates to filenames used for generating HTML.
config/subject-listing/* - view files, each of which describes a particular way of rendering the templates into HTML.
config/multilingual/*/subject-listing-views/* - HTML rendering rules for addsl.pl subject listing views, with a separate directory per view. The actual rendering rules are as per search results.
guts/subject-listing/*.lst - default location of the internal files used to maintain state between runs of subject listing tools.
htdocs/subject-listing - default location of the HTML generated by addsl.pl
The subject descriptor mapping file specifies the code for a particular subject section, the name given to that section in the HTML documents and the root of the filename used to hold that section's hypertext listing, each element being separated by a colon. An example line from a subject descriptor mapping file (for the UDC subject descriptor scheme) is:
30.442:Development Studies:devstud
Note that the section name should not contain the colon character ":" - this would confuse addsl.pl.
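The line format, and the reason a colon in the section name causes trouble, can be seen in the following Python sketch. The type and function names are hypothetical. A line that does not split into exactly three fields is rejected, which is one possible way of handling a stray colon and not necessarily what addsl.pl does.

```python
from collections import namedtuple

Section = namedtuple("Section", "code name file_root")


def parse_mapping_line(line):
    """Split a code:name:fileroot line into its three fields."""
    parts = line.rstrip("\n").split(":")
    if len(parts) != 3:
        # An extra colon in the section name would yield four or more fields.
        raise ValueError("expected code:name:fileroot, got %r" % line)
    return Section(*(part.strip() for part in parts))


print(parse_mapping_line("30.442:Development Studies:devstud"))
```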
Each available HTML view of the templates is specified by a view file. A sample file is:
HTML-Directory: /WWW/htdocs/ROADS/subject-listing/
WWW-Directory: /ROADS/sl/
NumList-File: /ROADS/sl/numeric.html
AlphaList-File: /ROADS/sl/numeric.html
Listing-Directory: /usr/local/ROADS/guts/subject-listing/
Mapping-File: /usr/local/ROADS/config/subject-listing/class-map
Generate-Children: yes
The various attributes currently defined in the view file are:
The name of the file into which addsl.pl will save a list of the subject categories sorted by alphabetical order.
Turns on case folding when alphabetising the list - the same as the -c option on the command line.
Whether or not to generate subject listings for templates that only have ChildOf relation types in them.
The path to the directory in which the subject listing HTML documents should be generated. This directory should be accessible to the HTTP daemon that serves the ROADS documents if they are to be accessible via the World Wide Web. If the path is a relative one, it is assumed to be relative to the ROADS htdocs directory, i.e. the directory under which the ROADS related HTML documents are rooted.
The path to the directory in which the subject listing files should be located. This is typically a subdirectory of the guts directory of the ROADS installation, where internal files used only by the ROADS software are kept. If this is a relative path, it is assumed to be relative to the ROADS guts directory.
The path to the subject descriptor mapping file. If this is a relative path, it is assumed to be relative to the ROADS config directory.
The name of the subject scheme that this view relates to.
The WWW path to the directory in which the HTML generated by addsl.pl will appear. This includes the AlphaList-File and NumList-File listings.
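For readers who want to inspect a view file from their own scripts, a minimal Python sketch of a reader is given below. It assumes one "Attribute: value" pair per line, as in the sample file. The function name is hypothetical and this is not the parser used by the ROADS tools.

```python
def parse_view_file(text):
    """Return a dictionary of attribute names to values from view file text."""
    view = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may themselves contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not an "Attribute: value" line
        view[name.strip()] = value.strip()
    return view


SAMPLE = """\
HTML-Directory: /WWW/htdocs/ROADS/subject-listing/
WWW-Directory: /ROADS/sl/
Listing-Directory: /usr/local/ROADS/guts/subject-listing/
Generate-Children: yes
"""

view = parse_view_file(SAMPLE)
print(view["WWW-Directory"])
```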
the manual page for bin/addwn.pl, the manual page for
bin/cullsl.pl, the manual page for bin/cullwn.pl, the
manual page for bin/mkinv.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>
bin/addwn.pl - add what's new entries for specified templates
bin/addwn.pl [-acdh] [-f directory] [-l number] [-n name] [-p pattern] [-r] [-s directory] [-w name] [-z date] [handle1 handle2 ... handleN]
The addwn.pl program adds templates with the specified handles to a What's New listing file. This listing file is then converted into a static HTML document which can be placed on the WWW. The What's New file is intended to show end users what resources have just been catalogued by a subject service and/or when some aspect of a catalogued resource's template has changed.
The ROADS software can generate lists of resource descriptions which have been entered recently or changed recently. The configuration of this is very similar to that of the resource listings. Essentially, each What's New view is specified by an HTML outline file, a file to add the new resource information to, and an internal file. The default What's New view can be found in the file config/multilingual/*/whats-new-views/Default under the top level ROADS installation directory.
The default What's New view installed by the ROADS software will be configured to create a listing file called whats-new.html in the ROADS directory on your WWW server, and use sub-directories of the ROADS installation for its outline and internal files, e.g.
Outline-File: whats-new/outlines/Default
HTML-File: whats-new.html
Listing-File: whats-new/Default.lst
If you create your resource description templates using the WWW based
template editor, you will be given the option of entering them into a
What's New list - addwn.pl will be called to do this.
Alternatively, if you wish to generate these listings manually, you can
run addwn.pl yourself. Use the -a option to add all your
templates, e.g.
% addwn.pl -a
If you only want to include a subset of the resource description templates
in your database, addwn.pl takes a similar set of options to
addsl.pl - e.g. the -p option can be used to restrict
the templates which are included based on the contents of their URIs, and
individual templates to include can be specified on the command line.
Note that your templates must include at least one URI attribute.
A number of options are available for the addwn.pl program to control which files are used for generating the subject listings and where configuration options are located:
-a
    Process all templates in source directory.
-c
    Specify that the alphabetical listing should take account of the case of the characters. Without this option, acorn, Apple and Zebra are sorted in that order. With this flag set, they would be sorted as Apple, Zebra and acorn.
-d
    Specify that some (fairly copious) debugging information should be generated during the generation of the hypertext tree. This option is probably not of interest to anyone bar the developers.
-f directory
    Specify the directory for views configuration files.
-h
    Provide some online help outlining the options available and exit.
-l number
    Specifies that only the last N resources added to the ROADS server should be used in the "What's New" listing.
-n name
    Specifies the name of the database to use - defaulting to the service name which was entered when the ROADS server was created.
-p pattern
    Only enter entries in the subject listings for templates that have URI fields that match the supplied pattern. The pattern can be a full Perl regular expression and allows one to, for example, restrict entries in the subject listings to only include UK academic sites. By default the pattern matches all URLs and so all templates are included in the hypertext lists.
-r
    Specifies that any duplicates should be removed (pared down to a single entry).
-s directory
    Set the absolute pathname of the directory containing the IAFA templates.
-w name
    Sets the name of What's New view to use in configuring the addwn.pl script.
-z date
    Specifies that only resources added since this date should be included in the "What's New" listing, where the date fields are:
    hh - hours
    mm - minutes
    ss - seconds
    dd - day
    mm - month
    yyyy - year
These options are then followed by zero or more template handles (note -
not filenames). If the -a option is given, no handles
need be given on the command line; all templates in the database will be
added to the What's New listing.
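The combined effect of limiting the listing to the last N resources and removing duplicates can be sketched in Python as follows. The representation of the listing as (handle, time added) pairs, the function name, and the choice to keep the most recent entry for a duplicated handle are all assumptions made for illustration. The real listing file format is not described here.

```python
def whats_new(entries, last_n=None, remove_duplicates=False):
    """Select What's New entries from (handle, added) pairs, oldest first."""
    if remove_duplicates:
        latest = {}
        for handle, added in entries:
            latest[handle] = added  # a later entry replaces an earlier one
        entries = sorted(latest.items(), key=lambda e: e[1])
    if last_n is not None:
        # Keep only the last_n most recently added entries.
        entries = entries[max(len(entries) - last_n, 0):]
    return entries


# Hypothetical listing; the second field stands in for a timestamp.
listing = [("0123", 1), ("0124", 2), ("0123", 3), ("0125", 4)]
print(whats_new(listing, last_n=2))
print(whats_new(listing, remove_duplicates=True))
```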
config/whats-new/* - "What's New" view specifications
config/multilingual/*/whats-new-views/* - rendering rules for the various "What's New" views
htdocs/whats-new.html - default location of listing.
The addwn.pl program can generate a number of different subject listings. This allows, for example, a subject listing of UK based resources in addition to a subject listing of all resources. The views also allow easy selection of which subject listing a template should be added to in the mktemp.pl editor.
The view is specified by a view file. A sample file is:
HTML-File: /WWW/htdocs/ROADS/whats-new.html
Listing-File: /usr/local/ROADS/guts/whats-new/Default.lst
The various attributes currently defined in the view file are:
HTML-File
    The path to the file in which the subject listing HTML document should be generated. This file should be accessible to the HTTP daemon that serves the ROADS documents if the HTML document is to be accessible via the World Wide Web. If the path is a relative one, it is assumed to be relative to the ROADS htdocs directory - i.e. the directory in which ROADS related WWW pages are rooted.
Listing-File
    The path to the file in which the What's New listing file should be located. This is typically located in the guts directory of the ROADS installation, which is where files needed for the internal operation of the ROADS software are kept. If the path is a relative one, it is assumed to be relative to the ROADS guts directory.
the manual page for bin/addwn.pl, the manual page for
bin/cullsl.pl, the manual page for bin/mkinv.pl
Jon Knight <jon@net.lut.ac.uk>
bin/bg_exterminate.pl - background reindexing to remove stale templates
bin/bg_exterminate.pl
This Perl program launches a process to remove stale templates from a ROADS server database. On completion it sends email to the server's system admin and database admin contacts.
It is intended for invocation from a World-Wide Web CGI program, a cron job, or an at job.
None.
Mail to server maintainers.
the manual page for bin/exterminate.pl
Martin Hamilton <martinh@gnu.org>
bin/bg_lc.pl - background link checking run for WWW or cron
bin/bg_lc.pl
This Perl program launches a process to check the validity of the links (URLs) in a ROADS server database. On completion it sends email to the server's system admin and database admin contacts.
It is intended for invocation from a World-Wide Web CGI program, a cron job, or an at job.
None.
Mail to server maintainers. Link check log file left in logs/lc.
the manual page for bin/lc.pl
Martin Hamilton <martinh@gnu.org>
bin/bogus.pl - flag possible errors in ROADS installation
bin/bogus.pl [-h]
This Perl program tests the following aspects of the ROADS installation:
-h
    Generate output in HTML format.
List of phases, and problem information if any problems found.
the manual page for admin-cgi/bogus.pl
Martin Hamilton <martinh@gnu.org>
bin/countattr.pl - count the attributes used in a template
bin/countattr.pl [-adh] [-s sourcedir] [file1 file2 ... fileN]
This Perl program runs through a set of IAFA (or IAFA style) templates and generates a report of which fields have been used and how many times.
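The kind of report this produces can be approximated in a few lines of Python. The sketch assumes templates are plain text with one "Attribute: value" pair per line and continuation lines that begin with white space. The sample templates are invented and the function name is hypothetical.

```python
from collections import Counter


def count_attributes(template_texts):
    """Count how many times each attribute name appears across templates."""
    counts = Counter()
    for text in template_texts:
        for line in text.splitlines():
            # Skip blank lines, continuation lines and lines without a name.
            if not line or line[0].isspace() or ":" not in line:
                continue
            counts[line.split(":", 1)[0].strip()] += 1
    return counts


# Hypothetical templates.
t1 = "Template-Type: DOCUMENT\nHandle: 0123\nTitle: Example\nURI-v1: http://www.lut.ac.uk/\n"
t2 = "Template-Type: SERVICE\nHandle: 0124\nURI-v1: http://www.example.org/\n more: text\n"

for name, total in sorted(count_attributes([t1, t2]).items()):
    print(name, total)
```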
-a
    Iterate over all of the templates in the source directory.
-d
    Generate debugging information.
-h
    Generate output in HTML format.
-s sourcedir
    Change the source directory from the default.
Mail to server maintainers.
the manual page for admin-cgi/countattr.pl
Jon Knight <jon@net.lut.ac.uk> Martin Hamilton <martinh@gnu.org>
bin/cullsl.pl - cull entries from subject listings
bin/cullsl.pl [-ANacdh] [-f directory] [-l view] [-m filename] [-n name] [-p pattern] [-s directory] [-t directory] [-u name] [handle1 handle2 ... handleN]
The cullsl.pl program removes one or more templates from a set of subject listing files. These changed listing files are also converted into static HTML documents which can be placed on the WWW. The program also generates HTML lists in numerical and alphabetical order based on the contents of a subject descriptor mapping file. This program shares many of its configuration files with addsl.pl.
cullsl.pl lets you remove selected templates' details from the subject listings generated by addsl.pl. It uses the same mechanism as addsl.pl, and simply takes the handles of the templates you wish to remove as its arguments when run, e.g.
% cullsl.pl 814010256-14355
A number of options are available for the cullsl.pl program to control which files are used for generating the subject listings and where configuration options are located:
-A
    Don't generate alphabetically sorted breakdown of subject categories.
-a
    Process all templates in source directory.
-c
    Specify that the alphabetical listing should take account of the case of the characters. Without this option, acorn, Apple and Zebra are sorted in that order. With this flag set, they would be sorted as Apple, Zebra and acorn.
-d
    Specify that some (fairly copious) debugging information should be generated during the generation of the hypertext tree. This option is probably not of interest to anyone bar the developers.
-f directory
    Specify the directory for views configuration files.
-h
    Provide some online help outlining the options available and exit.
-l view
    Set subject listing view name. This is the name of the file that contains the configuration information concerning the location of the listings, HTML and outline files. For more information on this see below.
-m filename
    Specify the subject descriptor mapping file to use.
-n name
    Specifies the name of the database to use - by default this is the name of the service as entered when the ROADS software was installed.
-p pattern
    Only enter entries in the subject listings for templates that have URI fields that match the supplied pattern. The pattern can be a full Perl regular expression and allows one to, for example, restrict entries in the subject listings to only include UK academic sites. By default the pattern matches all URLs and so all templates are included in the hypertext lists.
-s directory
    Set the absolute pathname of the directory containing the IAFA templates.
-t directory
    Set the absolute pathname of the directory where subject listing mapping files (internal files used to maintain state between runs of the ROADS subject listing tools) should be stored.
-u name
    Sets the name of the Subject-Descriptor-Scheme to search for in the templates. The default is UDC.
These options are then followed by zero or more template handles (note -
not filenames). If the -a option is given, no handles
need be given on the command line; all templates in the database will be
removed from the subject listings.
config/class-map - default mappings from
Subject-Descriptor-Scheme attributes in templates to filenames used for generating HTML.
config/subject-listing/* - view files, each of which describes a particular way of rendering the templates into HTML.
config/multilingual/*/subject-listing-views/* - HTML rendering rules for subject listing views.
guts/subject-listing/*.lst - default location of the internal files used to maintain state between runs of subject listing tools.
htdocs/subject-listing - default location of the HTML generated by cullsl.pl
The subject descriptor mapping file specifies the code for a particular subject section, the name given to that section in the HTML documents and the root of the filename used to hold that section's hypertext listing, each element being separated by a colon. An example line from a subject descriptor mapping file (for the UDC subject descriptor scheme) is:
30.442:Development Studies:devstud
Note that the section name should not contain the colon character ":" - this would confuse addsl.pl.
The cullsl.pl program can generate a number of different subject listings. This allows, for example, a subject listing of UK based resources in addition to a subject listing of all resources. The views also allow easy selection of which subject listing a template should be added to in the mktemp.pl editor.
The view is specified by a view file. An example file is:
HTML-Directory: /WWW/htdocs/ROADS/subject-listing/
WWW-Directory: /ROADS/sl/
NumList-File: /ROADS/sl/numeric.html
AlphaList-File: /ROADS/sl/numeric.html
Listing-Directory: /usr/local/ROADS/guts/subject-listing/
Mapping-File: /usr/local/ROADS/config/subject-listing/class-map
Generate-Children: yes
The various attributes currently defined in the view file are:
The name of the file into which addsl.pl will save a list of the subject categories sorted by alphabetical order.
Turns on case folding when alphabetising the list - the same as the -c option on the command line.
Whether or not to generate subject listings for templates that only have ChildOf relation types in them.
The path to the directory in which the subject listing HTML documents should be generated. This directory should be accessible to the HTTP daemon that serves the ROADS documents if they are to be accessible via the World Wide Web. If the path is a relative one, it is assumed to be relative to the ROADS htdocs directory, i.e. the directory under which the ROADS related HTML documents are rooted.
The path to the directory in which the subject listing files should be located. This is typically a subdirectory of the guts directory of the ROADS installation, where internal files used only by the ROADS software are kept. If this is a relative path, it is assumed to be relative to the ROADS guts directory.
The path to the subject descriptor mapping file. If this is a relative path, it is assumed to be relative to the ROADS config directory.
The name of the subject scheme that this view relates to.
The WWW path to the directory in which the HTML generated by addsl.pl will appear. This includes the AlphaList-File and NumList-File listings.
the manual page for bin/addsl.pl, the manual page for
bin/addwn.pl, the manual page for bin/cullwn.pl, the
manual page for bin/mkinv.pl
Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>
bin/cullwn.pl - cull stale entries from what's new listings
bin/cullwn.pl [-cdh] [-f directory] [-n name] [-w name] hhmmssDDMMYYYY
The cullwn.pl program removes entries from a What's New listing file that were added before a certain date. The new listing file is then converted into a static HTML document which can be placed on the WWW. The What's New file is intended to show end users what resources have just been catalogued by the ROADS service and/or when some aspect of a catalogued resource's template has changed.
It is anticipated that you will want to remove What's New listing entries which are past their use-by date, and the ROADS software provides a tool to help you do this. cullwn.pl will remove any What's New entries which are older than a given date - or the current date if no date is specified. At the moment you have to run this from the command line, but in a future version of the software we will be providing a World-Wide Web front end.
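The culling rule itself is simple and can be sketched in Python. The listing is modelled here as (handle, time added) pairs, which is an assumption, as is keeping an entry whose time is exactly equal to the cutoff. The function name and sample data are hypothetical.

```python
from datetime import datetime


def cull(entries, cutoff=None):
    """Return the entries that are not older than cutoff (default: now)."""
    if cutoff is None:
        cutoff = datetime.now()
    return [(handle, added) for handle, added in entries if added >= cutoff]


# Hypothetical What's New entries.
entries = [
    ("0123", datetime(1997, 1, 10)),
    ("0124", datetime(1997, 1, 20)),
]
print(cull(entries, datetime(1997, 1, 15)))
```

With no cutoff given, every entry added before the present moment is dropped, which matches the behaviour described for running the tool without a date.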
The cullwn.pl tool uses the same view configuration information as the addwn.pl tool - see the section on this for more information. It can be run either with or without a date from which to begin culling, e.g.
(start culling from now...)
% cullwn.pl
(start culling from the 15th of January 1997...)
% cullwn.pl 00000015011997
A number of options are available for the cullwn.pl program to control which files are used for generating the subject listings and where configuration options are located:
-c
    Specify that the alphabetical listing should take account of the case of the characters. Without this option, acorn, Apple and Zebra are sorted in that order. With this flag set, they would be sorted as Apple, Zebra and acorn.
-d
    Specify that some (fairly copious) debugging information should be generated during the generation of the hypertext tree. This option is probably not of interest to anyone bar the developers.
-f directory
    Specify the directory for views configuration files.
-h
    Provide some online help outlining the options available and exit.
-n name
    Specifies the name of the database to use - defaulting to the service name which was entered when the ROADS server was created.
-w name
    Sets the name of What's New view to use in configuring the cullwn.pl script.
These options are then followed by a 14 character time and date string in the following format:
hhmmssDDMMYYYY
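where (in order) hh is the hour, mm the minute, ss the second, DD the day of the month, MM the month and YYYY the four digit year.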
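Reading the fields as hours, minutes, seconds, day, month and four digit year (which agrees with the 15th of January 1997 example given earlier), such a string could be decoded along the following lines. This is only a sketch in Python; the function name parse_cull_stamp is hypothetical and is not part of the ROADS software.

```python
from datetime import datetime

def parse_cull_stamp(stamp: str) -> datetime:
    """Decode a 14 character hhmmssDDMMYYYY string into a datetime."""
    if len(stamp) != 14 or not stamp.isdigit():
        raise ValueError("expected 14 digits in hhmmssDDMMYYYY order")
    hour, minute, second = int(stamp[0:2]), int(stamp[2:4]), int(stamp[4:6])
    day, month, year = int(stamp[6:8]), int(stamp[8:10]), int(stamp[10:14])
    # datetime() itself rejects impossible values such as month 13
    return datetime(year, month, day, hour, minute, second)

print(parse_cull_stamp("00000015011997"))
```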
config/whats-new/* - "What's New" view specifications
config/multilingual/*/whats-new-views/* - rendering rules for the various "What's New" views
htdocs/whats-new.html - default location of listing.
The cullwn.pl program can generate a number of different subject listings. This allows, for example, a subject listing of UK based resources in addition to a subject listing of all resources. The views also allow easy selection of which subject listing a template should be added to in the mktemp.pl editor.
The view is specified by a view file. An example file is:
HTML-File: /WWW/htdocs/ROADS/whats-new.html
Listing-File: /usr/local/ROADS/guts/whats-new/Default.lst
The various attributes currently defined in the view file are:
HTML-File
The path to the file in which the What's New HTML document should be generated. This file should be accessible to the HTTP daemon that serves the ROADS documents if the HTML document is to be accessible via the World Wide Web. If the path is a relative one, it is assumed to be relative to the ROADS htdocs directory, i.e. the directory where ROADS related HTML documents are rooted.
Listing-File
The path to the file in which the What's New listing file should be located. This is typically located in the guts directory of the ROADS installation, which is where the internal files needed by the ROADS software are found. If the path is a relative one, it is assumed to be relative to the ROADS guts directory.
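The relative path rules for the two view attributes can be illustrated with a short Python sketch. It assumes the view file is a series of "Attribute: value" lines as in the example above; the function name parse_view and the directory arguments are hypothetical, and the real tools may well read the file differently.

```python
import posixpath

def parse_view(text, htdocs_dir, guts_dir):
    """Read 'Attribute: value' lines and resolve relative paths."""
    view = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        view[name.strip()] = value.strip()
    # HTML-File is relative to htdocs, Listing-File to guts; joining an
    # absolute path onto a base directory leaves the absolute path unchanged.
    bases = {"HTML-File": htdocs_dir, "Listing-File": guts_dir}
    for name, base in bases.items():
        if name in view:
            view[name] = posixpath.join(base, view[name])
    return view
```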
the manual page for bin/addsl.pl, the manual page for
bin/addwn.pl, the manual page for bin/cullsl.pl, the
manual page for bin/mkinv.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>
bin/deindex.pl - remove templates from index
bin/deindex.pl [-c ci_path] [-dh] [-i index_dir] [-s source_dir] [-t tmp_dir] handle1 handle2 ... handleN
The deindex.pl script removes one or more templates from a filesystem based inverted index of IAFA templates created by mkinv.pl. The inverted index allows the search.pl and admin.pl programs to rapidly match keywords and boolean expressions in a large number of IAFA templates. The deindex.pl program removes all keywords associated with the specified template(s) from the inverted index.
A number of options are available for the deindex.pl program to control where it looks for its files:
-c ci_path
Location of the RCS ci program.
-d
Enable debugging mode.
-h
Display usage.
-i index_dir
Set the absolute pathname of the directory in which the resulting inverted index is to be placed. By default this is the guts directory of the ROADS installation.
-s source_dir
Set the absolute pathname of the directory containing the IAFA templates. By default this is the source directory of the ROADS installation.
-t tmp_dir
Set the absolute pathname of the temporary directory used to hold a working copy of the template(s) being de-indexed.
The options are then followed by one or more template handles to be deindexed. The deindex.pl script removes all traces of these templates from the selected inverted index. The script also archives a copy of the template in a .archive subdirectory of the IAFA template source directory. This archiving uses the GNU Revision Control System (RCS) if available, allowing multiple copies of a template's change history to be recorded.
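The effect of de-indexing can be pictured with an in-memory stand-in for the inverted index. The Python sketch below assumes the index is simply a mapping from keyword to the set of template handles containing it; the real index is held in files under the guts directory, and the function name deindex is used here purely for illustration.

```python
def deindex(index, handles):
    """Remove every posting for the given handles from an in-memory index."""
    doomed = set(handles)
    for term in list(index):
        index[term] -= doomed
        if not index[term]:
            del index[term]  # no templates left for this keyword
    return index
```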
config/guts - default location of index data
config/source - default location of template database
config/source/.archive - location of archived templates
the manual page for bin/admin.pl, the manual page for
bin/mkinv.pl, the manual page for bin/search.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>
bin/dodgy.pl - find persistently stale templates
bin/dodgy.pl [-l basename] [-n grace]
This Perl program analyses the results of the last three runs of the ROADS link checking tool, and returns a list of the templates which have been unreachable at least a given number of times.
It is intended for invocation from the likes of a World-Wide Web CGI program, a cron job, or an at job. Another tool has been written to take the results of this program and modify the actual templates so as to remove them from the portion of the ROADS server's database which is visible to the end user.
-l basename
This is the path to the link checker log files. By default it is assumed to be the lc file in the ROADS logs directory. A different number is appended for each log run - lc, lc.0, lc.1 and so on.
-n grace
This is the amount of "grace" to allow before removing templates, e.g. to flag templates when two sessions out of three are bad:
-n 2
List of filenames for the templates which have been persistently unreachable.
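The "grace" rule can be sketched in Python as follows. The sketch assumes each log holds lines of the form "response-code file URL", as produced by the link checker, and treats any code outside the 2xx range as a failure; the function name and the default grace of 2 are hypothetical, not taken from dodgy.pl itself.

```python
from collections import Counter

def persistently_stale(runs, grace=2):
    """Return template files that failed in at least `grace` of the runs."""
    failures = Counter()
    for log_text in runs:
        failed_here = set()
        for line in log_text.splitlines():
            fields = line.split(None, 2)
            if len(fields) == 3 and not fields[0].startswith("2"):
                failed_here.add(fields[1])
        failures.update(failed_here)  # one count per template per run
    return sorted(name for name, count in failures.items() if count >= grace)
```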
the manual page for admin-cgi/dodgy.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/dup_urls.pl - check for duplicate URLs in a collection of IAFA templates
bin/dup_urls.pl [-ad] [-s sourcedir] [file1 file2 ... fileN]
This program looks for duplicate URLs in IAFA templates, such as may be found on a ROADS server.
dup_urls.pl produces a report listing any duplicate URLs it comes across, and the handle names of the templates in which they are found.
dup_urls.pl takes the following arguments:
-a
This argument indicates that all of the templates in the given source directory should be processed.
-d
If this argument is given, debugging information will be dumped to the standard error output stream.
-s sourcedir
This argument can be used to override the default template source directory, which is where dup_urls.pl looks for IAFA templates to check.
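The duplicate check itself amounts to grouping template handles by URL. A minimal Python sketch, assuming URLs live in URI or URI-vN attributes and that the templates have already been read into a mapping from handle to template text (both assumptions, as is the function name):

```python
import re
from collections import defaultdict

# Matches "URI: ..." and variant forms such as "URI-v1: ..."
URI_LINE = re.compile(r"^URI(?:-v\d+)?:\s*(\S+)", re.IGNORECASE | re.MULTILINE)

def duplicate_urls(templates):
    """Map each URL seen in more than one template to the handles citing it."""
    seen = defaultdict(set)
    for handle, text in templates.items():
        for url in URI_LINE.findall(text):
            seen[url].add(handle)
    return {url: sorted(handles) for url, handles in seen.items() if len(handles) > 1}
```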
the manual page for admin-cgi/dup_urls.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/exterminate.pl - remove templates with persistently unreachable URLs
bin/exterminate.pl
This Perl program runs another tool in order to discover which templates have been persistently unreachable. Each of the resulting templates is modified so that any existing Status attribute is stripped out, and a new one introduced:
Status: stale
Finally, the ROADS server resource description database is reindexed.
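The Status rewrite can be sketched in Python as below. The sketch assumes one template per file and that continuation lines are indented by at least one space; the function name mark_stale is hypothetical and the real program may place the new attribute elsewhere in the template.

```python
def mark_stale(template_text):
    """Drop any Status attribute (and its continuation lines), then add a new one."""
    kept, skipping = [], False
    for line in template_text.splitlines():
        if line.lower().startswith("status:"):
            skipping = True          # drop the old Status attribute
            continue
        if skipping and line[:1] in (" ", "\t"):
            continue                 # drop its indented continuation lines
        skipping = False
        kept.append(line)
    kept.append("Status: stale")
    return "\n".join(kept) + "\n"
```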
The program is intended for invocation from a World-Wide Web CGI program, a cron job, or an at job.
None.
It is assumed that there is only one template per file.
the manual page for admin-cgi/exterminate.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/freq.pl - term frequency counter for IAFA style templates
freq.pl [-ad] [-f maxhits] [-m min-count] [-s sourcedir] [-t tmpdir] [-A attrib1|attrib2|...|attribN]
This Perl program will look at all the IAFA style templates in a given directory, and count the number of times each term found in the templates occurs. This has a number of uses - notably in determining an appropriate stop-list of words which should not be indexed, and in helping the user to devise an effective query.
Frequently appearing terms such as "a" and "the" will likely cause large numbers of spurious hits when people search your database. To reduce the likelihood of this, we have added a "stoplist" feature to the ROADS search back end. This lets you arrange for certain search terms to be automatically removed, and we ship a sample stop list with the ROADS distribution.
The default behaviour is to sort the frequency count into order, and return the top fifty terms. This can be overridden by a set of command-line options.
-a
send back a complete frequency count, rather than just the most frequently used terms
-d
produce verbose debugging output
-f maxhits
send back at most the top maxhits most frequently used terms, e.g. to see the top 100 with debugging info
freq.pl -df 100
-m min-count
stop once the frequency count falls below min-count, e.g. to get a list of all the terms which occur more than 999 times
freq.pl -m 999 | cut -f2 -d' '
-s sourcedir
look for the templates in the directory sourcedir, e.g. to use the templates in the directory /work2/WWW/roads and return a complete frequency breakdown
freq.pl -as /work2/WWW/roads
-t tmpdir
use tmpdir as temporary directory. This defaults to /tmp, but you may need to change the default if your machine does not have enough room in /tmp for any temporary files generated by freq.pl, e.g.
freq.pl -t /var/tmp
-A attrib1|attrib2|...|attribN
only produce frequency list for the attributes listed in attribute-list. attribute-list is a '|' (pipe) separated list of attribute names, e.g.
freq.pl -A 'description|keywords'
The output of freq.pl consists of the frequency count for a term, followed by a single space character, followed by the term itself, e.g.
310 research
283 mailing
270 available
268 University
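The counting and output format can be approximated in a few lines of Python. This is a sketch only: the tokenising rule (runs of letters and digits), the choice to count attribute values but not attribute names, and the function name are all assumptions rather than a description of what freq.pl actually does.

```python
import re
from collections import Counter

def term_frequencies(templates, max_hits=50, min_count=1):
    """Return 'count term' lines, most frequent first."""
    counts = Counter()
    for text in templates:
        for line in text.splitlines():
            # Count words in the value only, not in the attribute name;
            # indented continuation lines are all value.
            if line[:1].isspace() or ":" not in line:
                value = line
            else:
                value = line.split(":", 1)[1]
            counts.update(re.findall(r"[A-Za-z0-9]+", value))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ["%d %s" % (n, term) for term, n in ranked[:max_hits] if n >= min_count]
```

Terms taken from the high end of such a list are the natural candidates for a stop list.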
An external program called "sort" is used to sort the frequency count into
descending order. This is a standard feature of most (all?)
implementations of Unix, but the command line options it takes may differ
from version to version. Let us know if you find a version which does not
understand -r, -n or -T!
Nothing ? :-)
the manual page for admin-cgi/freq.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/harvest_centroid.pl - extract centroid from SOIF or Harvest Broker/Gatherer
bin/harvest_centroid.pl [-d] [-h host] [-p port] [-s serverhandle]
This program tries to extract a WHOIS++ compatible centroid from one of the following: a Harvest Gatherer, a Harvest Broker, or a collection of SOIF templates supplied on standard input.
If invoked with a host name or IP address to contact, this program will try to establish whether it is talking to a Harvest Gatherer or Broker, and send the appropriate command to fetch a dump of the entire contents of the Gatherer or Broker's database.
With no -h argument, this program will expect to receive a
collection of SOIF templates on STDIN, such as you could get by
gzip -dc /usr/local/harvest/gatherers/*/All-Templates.gz
or
gdbmutil dump /usr/local/harvest/gatherers/*/PRODUCTION.gdbm
Note that when generating a centroid from a flat file collection of SOIF
templates, the -s argument should be used to specify a
serverhandle for the resulting centroid.
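On the usual reading of RFC 1913, a centroid records, for each template type and attribute, the set of distinct words found in the server's records. The Python sketch below shows that idea only; lower-casing and splitting on whitespace are assumptions made here, and the input structure and function name are hypothetical rather than anything harvest_centroid.pl defines.

```python
from collections import defaultdict

def build_centroid(records):
    """records: iterable of (template_type, {attribute: value}) pairs."""
    centroid = defaultdict(set)
    for template_type, attributes in records:
        for attribute, value in attributes.items():
            # A centroid keeps each word once, however often it occurs.
            centroid[(template_type, attribute)].update(value.lower().split())
    return {key: sorted(words) for key, words in centroid.items()}
```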
-d
Turn on debugging output - very verbose!
-h host
The host name or IP address of the server to contact, if talking to a Gatherer or a Broker
-p port
The port number to use when connecting to a Gatherer or a Broker. This defaults to 8501 if not set, which is Harvest's default for a Broker when it's created.
We should let people specify the starting time for the poll, and pass this on to the Broker/Gatherer, so that it's possible to do a relative "poll" of the Harvest server.
We don't do anything special about character sets/encodings.
Not up to date with current CIP specifications - this is really intended for use with a WHOIS++ server which speaks the old RFC 1913 indexing protocol.
Should be integrated with wpp_shim.pl, so that WHOIS++ servers which cannot load a centroid from a flat file can think they're polling a WHOIS++ server - when in fact the shim would simply be returning a centroid which had been calculated already.
the manual page for bin/harvest_shim.pl, RFC 1913
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/harvest_shim.pl - search gateway between WHOIS++ and Harvest Broker
bin/harvest_shim.pl [-h host] [-p port]
This program relays WHOIS++ search requests to a Harvest Broker, and returns the results in WHOIS++ result format.
Before passing the WHOIS++ query on to the Harvest Broker, it is munged to remove WHOIS++ search syntax which would confuse the Broker. The search results, if any, are massaged into WHOIS++ templates using the template type FILE.
-h host
The host to contact, or "localhost" by default.
-p port
The TCP port number to use, or 8501 by default.
Should be rewritten to allow for stand-alone operation.
the manual page for bin/harvest_centroid.pl, RFC 1913
Copyright (c) 1988, Peter Valkenburg <valkenburg@terena.nl>, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Peter Valkenburg <valkenburg@terena.nl>, Martin Hamilton <martinh@gnu.org>, Jon Knight <jon@net.lut.ac.uk>.
bin/iafa2digger.pl - convert IAFA templates to Digger v2 input format
bin/iafa2digger.pl [-ad] [-o outlinedir] [-s sourcedir] [file1 file2 ... fileN]
This Perl program converts IAFA templates such as those generated by the ROADS template editor into the format accepted by version 2 of Bunyip's "Digger" WHOIS++ server. This is necessary because Digger takes its input in the WHOIS++ on-the-wire format, which is slightly different to the IAFA templates used internally within the ROADS software.
-a
Process all of the templates in the source directory.
-d
Generate debugging information.
-o outlinedir
Use the template outline descriptions (the attributes and values which are legal in each template and cluster) from this directory.
-s sourcedir
Look in this directory for the templates being converted.
A single file containing the Digger formatted templates.
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/iafa_lint.pl - perform sanity check on a collection of IAFA templates
bin/iafa_lint.pl [-ad] [-o outlinedir] [-s sourcedir] [file1 file2 ... fileN]
This program performs some basic checks on the contents of a collection of IAFA templates, such as may be found on a ROADS server.
The contents of each template are checked against an outline version of that template type. Outline templates are used within the ROADS software to indicate the fields which a template may contain, and provide some of the configuration information used by the WWW based template editor.
iafa_lint.pl produces a report listing any of these problems which it finds with the IAFA templates it processes. The following checks are performed:
These are neither continuation lines from previous attribute/value pairs, nor the beginning of a new attribute/value pair. They may be caused by, for example, hand editing templates and forgetting to indent continuation lines by at least one space.
In ROADS, records must have at least one space following the colon after the attribute name and before the value. This might not be the case if the record has been hand edited or imported from another system.
If attributes appear in a template but not in the outline specification for this template type, attention is drawn to them. It also serves to flag mis-spelled attribute names, e.g. Handel instead of Handle.
This is effectively an error if you are using the ROADS software, since the outline information is used in a number of places. It also serves to draw notice to mis-spelled Template-Type values!
This is to draw your attention to any templates whose handles (unique IDs within the database) contain whitespace characters such as tabs and spaces. Whitespace should not appear in handles because this would confuse some of the ROADS tools.
The use of older versions of the ROADS template editor tool mktemp.pl, and some external programs such as text editors and FTP clients, may result in templates containing carriage return characters - ASCII code 13. We have tried to make the ROADS tools fairly tolerant, but this may cause problems.
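Several of the checks above are simple enough to sketch in Python. The version below covers stray lines, a missing space after the colon, whitespace in handles and carriage returns; the checks against outline files are left out. The pattern used for attribute names and the wording of the messages are assumptions, not the output of iafa_lint.pl.

```python
import re

# An attribute line is a name made of letters, digits and hyphens, then a colon.
ATTRIBUTE = re.compile(r"^([A-Za-z0-9()*-]+):(.*)$")

def lint_template(text):
    """Return a list of (line_number, problem) pairs for one template."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if "\r" in line:
            problems.append((number, "carriage return"))
            line = line.replace("\r", "")
        if not line.strip() or line[0] in " \t":
            continue  # blank line or indented continuation line
        match = ATTRIBUTE.match(line)
        if match is None:
            problems.append((number, "not an attribute or continuation line"))
            continue
        name, value = match.groups()
        if value and not value.startswith(" "):
            problems.append((number, "no space after colon"))
        if name.lower() == "handle" and re.search(r"\s", value.strip()):
            problems.append((number, "whitespace in handle"))
    return problems
```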
iafa_lint.pl takes the following arguments:
-a
This argument indicates that all of the templates in the given source directory should be processed.
-d
If this argument is given, debugging information will be dumped to the standard error output stream.
-o outlinedir
This argument can be used to override the default outline directory, which is where the outline versions of each template type are stored.
-s sourcedir
This argument can be used to override the default template source directory, which is where iafa_lint.pl looks for IAFA templates to check.
You may need to either modify an existing outline file or create a new one, depending on whether you have invented a new template type or changed the attributes in an existing one. A set of default template outlines is distributed with the ROADS software, and can be found in the directory "$ROADS::Config" on your installation.
It is necessary to have outline files for each template type which you will be checking using iafa_lint.pl.
Each outline file must feature the Template-Type and Handle attributes. Attributes which only occur once should be written as they appear in the template, e.g. Title. Attributes which may occur multiple times should be written as variants, e.g. URI-v*. Finally, it is possible to refer to clusters of attributes drawn from another type of template by writing its name in brackets after a disambiguating prefix, e.g. Admin-(USER*).
A sample outline specification for a very short SERVICE template would look like this:
Template-Type: SERVICE
Handle:
Title:
URI-v*:
Admin-(USER*):
Note that other information may appear after the ":" character. This is not used by iafa_lint.pl.
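One way to picture how an outline constrains attribute names is to turn each outline entry into a pattern. The Python sketch below handles the three forms described above: plain names, variants such as URI-v*, and cluster references such as Admin-(USER*). It simplifies clusters by accepting any attribute that carries the prefix, where a fuller check would consult the USER outline; the function names are hypothetical.

```python
import re

def outline_patterns(outline_text):
    """Turn each outline attribute name into a compiled regular expression."""
    patterns = []
    for line in outline_text.splitlines():
        if ":" not in line:
            continue
        name = line.split(":", 1)[0].strip()
        cluster = re.match(r"^(.*-)\((\w+)\*\)$", name)
        if cluster:
            # e.g. Admin-(USER*): accept any attribute carrying the prefix
            regex = re.escape(cluster.group(1)) + r".+"
        elif name.endswith("-v*"):
            # e.g. URI-v*: accept URI-v1, URI-v2, ...
            regex = re.escape(name[:-1]) + r"\d+"
        else:
            regex = re.escape(name)
        patterns.append(re.compile("^" + regex + "$", re.IGNORECASE))
    return patterns

def is_known(attribute, patterns):
    """True if the attribute name is allowed by any outline entry."""
    return any(p.match(attribute) for p in patterns)
```

With the SERVICE outline above, URI-v2 and Admin-Name-v1 would be accepted, while a mis-spelling such as Handel would be flagged.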
admin-cgi/iafa_lint.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/info.pl - display information about the ROADS server installation
bin/info.pl
This Perl program scans the ROADS installation for the following information:
None.
The contents of ROADS.pm from the ROADS library directory, and the result of doing a uname -a.
the manual page for admin-cgi/info.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/lc.pl - Perl based HTML/IAFA link checker
bin/lc.pl [-acdilPsvux] [-b base_url] [-g guts_dir] [-p proxyurl] [-r seconds] [-t templatedir] [-w when_changed] [file1 file2 ... fileN]
This program will take a set of URLs on their own, in a set of IAFA templates, or in HTML documents and attempt to check their accessibility. It can be passed a list of file names to examine on the command line or via standard input, e.g.
find . -print | lc.pl -i
or
lc.pl -v *.html > logfile
Normal behaviour is to ignore directories, files whose names begin with a dot ".", and files which do not appear to contain HTML - based on their suffix. This last restriction can be removed with a command line option which tells the program to assume the files are all IAFA templates.
Currently the only URL schemes which can be checked with lc.pl are "http:", "gopher:", "ftp:" and "wais:". A future version may try to check other URL schemes.
lc.pl will not follow links in HTML documents recursively!
It is recommended that a World-Wide Web cache server be used as a go-between in the link checking process. This can be enabled via environmental variables, e.g. in the style of csh and tcsh:
setenv http_proxy "http://wwwcache.lut.ac.uk:3128/"
setenv gopher_proxy "http://wwwcache.lut.ac.uk:3128/"
setenv ftp_proxy "http://wwwcache.lut.ac.uk:3128/"
setenv wais_proxy "http://wwwcache.lut.ac.uk:8001/"
setenv no_proxy "lut.ac.uk"
Or in the sh/bash/ksh/zsh style:
http_proxy="http://wwwcache.lut.ac.uk:3128/"
gopher_proxy="http://wwwcache.lut.ac.uk:3128/"
ftp_proxy="http://wwwcache.lut.ac.uk:3128/"
wais_proxy="http://wwwcache.lut.ac.uk:8001/"
no_proxy="lut.ac.uk"
export http_proxy gopher_proxy ftp_proxy wais_proxy no_proxy
The -p and -P options may also be used to affect proxying
and hence caching behaviour. Note that if you use -p to specify a
single proxy server for all your requests, this must be capable of
handling any "wais:" URLs that may be passed to it. You can run
lc.pl with the -l option to check for these in advance
of doing the actual link check.
In addition to cache support via the proxy HTTP mechanism, URLs which have already been visited during a link checking session will not be requested again in the same session, and the HTTP "HEAD" method is used whenever an "http" URL is requested. The time to sleep between requests is configurable, defaulting to two seconds.
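The three behaviours just described (visit each URL once per session, use HEAD for HTTP, rest between requests) can be sketched in Python using the standard urllib module in place of libwww-perl. The function names are hypothetical, and the fetch and sleep parameters exist only so the loop can be exercised without touching the network.

```python
import time
import urllib.error
import urllib.request

def head_status(url):
    """Issue an HTTP HEAD request and return the response code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a stale link

def check_links(urls, fetch=head_status, rest=2.0, sleep=time.sleep):
    """Check each distinct URL once, resting between network requests."""
    results = {}
    for url in urls:
        if url in results:
            continue  # already visited in this session
        if results:
            sleep(rest)
        results[url] = fetch(url)
    return results
```

urllib picks up http_proxy style environment variables by default, which is broadly similar in spirit to the proxy settings shown above.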
-a
check all IAFA templates. Uses ROADS default template directory, or another directory specified with the -t option. Implies -i.
-b base_url
specifies a base URL which will be used to make any relative links absolute, e.g.
-b http://www.roads.lut.ac.uk/
-c
check HTTP URLs which appear to run a script, i.e. contain the strings "/htbin/", "/cgi-bin/", or "?". Normally these will not be checked.
-d
generate debugging info
-g guts_dir
'guts' directory, used to hold DBM databases of Last-Modified times and Content-Length information on a per URL basis.
-i
specify source is IAFA templates, default is HTML
-l
don't actually check, just dump out URLs. This can be useful in finding out which URLs are cited, which documents make the citations, and so on
-p proxyurl
proxy all requests through the URL which follows, e.g.
-p http://wwwcache.lut.ac.uk:3128/
-P
don't import any proxy settings from the environment
-r seconds
rest time between URL lookups (default is 2 seconds). This feature is turned off if you enable the -l option, since there is not going to be any networking going on
-s
strict checking mode, default is not to follow links which look as though they might be to large objects, e.g. MPEG movies. Strict mode causes all links to be checked
-t templatedir
look in this directory for IAFA templates when the -a option is enabled
-u
list unchecked URLs to stderr, e.g.
lc.pl -u *.html >successlog 2>failslog
-v
list OK URLs as well as stale URLs
-w when_changed
list only URLs which have changed in the last N days
-x
the input is a series of URLs, rather than IAFA or HTML files, e.g.
lc.pl -x < my_list_of_urls
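Two of the options above lend themselves to a short illustration: making relative links absolute against a base URL, and skipping URLs which appear to run a script. The Python sketch below uses the three marker strings quoted in the option description; the function names are hypothetical and lc.pl may apply further rules of its own.

```python
from urllib.parse import urljoin

SCRIPT_MARKERS = ("/htbin/", "/cgi-bin/", "?")

def looks_like_script(url):
    """True if the URL contains one of the script marker strings."""
    return any(marker in url for marker in SCRIPT_MARKERS)

def urls_to_check(links, base_url=None, check_scripts=False):
    """Resolve relative links and drop script URLs unless asked to keep them."""
    selected = []
    for link in links:
        url = urljoin(base_url, link) if base_url else link
        if looks_like_script(url) and not check_scripts:
            continue
        selected.append(url)
    return selected
```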
The basic format for lc.pl output is
<HTTP response code> <name of file containing URL> <URL>
e.g.
404 SOSIG347 http://www.iss.u-tokyo.ac.jp/center/SSJ.html
Libwww-perl automatically translates the result codes of requests in
protocols other than HTTP into their HTTP equivalents. If you use the
-v option to get the results of successful requests too, the
successful requests will be stamped with a 200 response code,
e.g.
200 SOSIG345 http://www.ssd.gu.se/enghome.html
The output generated by the -u and -l options takes the
form
<name of file containing URL> <URL>
e.g.
SOSIG345 http://www.ssd.gu.se/enghome.html
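Both output forms are easy to split apart for further processing. The Python sketch below assumes that neither file names nor URLs contain whitespace; the function name is hypothetical.

```python
def parse_report_line(line):
    """Split one lc.pl style line into (code, file, url); code may be None."""
    fields = line.split()
    if len(fields) == 3 and fields[0].isdigit():
        return int(fields[0]), fields[1], fields[2]
    if len(fields) == 2:
        # the two-field form produced by the -u and -l options
        return None, fields[0], fields[1]
    raise ValueError("unrecognised report line: %r" % line)
```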
The libwww-perl package is used to parse HTML documents, and to check the links themselves. At the time of writing, libwww-perl version 5 and Perl version 5.003 or above are recommended.
Add support for other protocol schemes ? "finger:" should be easily done via proxy HTTP, but the cache servers don't speak this protocol scheme yet (and neither do many WWW authors?) "mailto:" and "mailserver:" could be done up to a point with code which checked for valid domain names, MX records and so on. An SMTP session to the remote server would be do-able, but then we wouldn't be able to take advantage of the current caching infrastructure... "telnet:" is another case in point. We could check the machine had a working DNS entry, and perhaps try to ping it, or even connect to the listed port. How far to take this is a matter for debate!
the manual page for admin-cgi/lc.pl, the manual page for
bin/report.pl, the manual page for admin-cgi/report.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/lc2sd.pl - convert from Library-Catalog to Subject-Descriptor
bin/lc2sd.pl [-h] [-s directory] [-u name]
The lc2sd.pl program is intended to change any Library-Catalog fields in a set of templates into Subject-Descriptor fields. Older versions of the ROADS software (prior to v0.2.0) generated Library-Catalog fields, which also appeared in several old versions of the Internet Draft describing IAFA templates. This program converts these templates into a format compatible with the latest IAFA Internet Draft.
A number of options are available for the lc2sd.pl program:
-h
Provide some online help outlining the options available and exit.
-s directory
Sets the absolute pathname of the directory containing the IAFA templates.
-u name
Sets the name of the classification scheme that is to be inserted into Subject-Descriptor-Scheme fields.
the manual page for bin/addsl.pl, the manual page for
bin/cullsl.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>
bin/mail_owners.pl - send mail to people whose links have gone stale
bin/mail_owners.pl [-d] [-m mailtemplate] [-t ownertable]
This Perl program takes the results of the link checking tool and uses either a prepared table of maintainers for the various parts of the filesystem or stat to find out who is responsible for bad URLs.
This collected data of failed URLs is then mailed to each of these maintainers, if and only if there are bad URLs on their pages. Hopefully, these users will then take the appropriate actions.... :)
It is suitable for invocation from a World-Wide Web CGI program, a cron job, or an at job.
-d
Generate debugging information
-m mailtemplate
Form letter to be sent to all those whose links are stale
-t ownertable
This file indicates who is responsible for a given file or hierarchy of files, and this information will be used to determine who to send the link checker report to. If it is not specified, the user name of the person who owns the file will be used as the contact address instead.
Link checker summary report in the format
<HTTP-RC> <file> <URL>
e.g.
200 /home/roads/source/SOSIG106 gopher://nisp.ncl.ac.uk:70/
Where HTTP-RC is the HTTP (or equivalent) response code for the request. Non-HTTP response codes will have been translated into HTTP style response codes before the link checker report is dumped out.
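The grouping of failed URLs by maintainer might look roughly like the Python sketch below. The format of the owner table is not described here, so the sketch assumes it has already been loaded into a mapping from path prefix to address, with the longest matching prefix winning; default_owner stands in for the fallback of looking up the owner of the file. All of these names are hypothetical.

```python
from collections import defaultdict

def failures_by_owner(report_lines, owners, default_owner):
    """owners maps a path prefix to an address; the longest prefix wins."""
    grouped = defaultdict(list)
    for line in report_lines:
        code, path, url = line.split(None, 2)
        if code.startswith("2"):
            continue  # successful request, nothing to report
        matches = [prefix for prefix in owners if path.startswith(prefix)]
        owner = owners[max(matches, key=len)] if matches else default_owner(path)
        grouped[owner].append((path, url.strip(), int(code)))
    return dict(grouped)
```

Each key of the result is then one recipient, and maintainers with no bad URLs never appear in it, so they receive no mail.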
Warning messages to information providers.
This is really geared up to WWW server maintainers, rather than ROADS server maintainers. It should have a way of extracting the contact address from the templates if desired.
the manual page for bin/lc.pl
Copyright (c) 1988, Mattias Borrell <mattias@munin.ub2.lu.se>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by Lund University NetLab, as part of the DESIRE project. DESIRE is funded under the European Commission Telematics for Research Programme.
Mattias Borrell <mattias@munin.ub2.lu.se>
bin/makethes.pl - create thesaurus file or another DB(M) database
bin/makethes.pl [-d] [-f filename]
This program will create a DB(M) database based on a series of whitespace separated attribute/value pairs in a line delimited text file.
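The same idea can be shown with Python's standard dbm module. The sketch assumes the first whitespace separated field on each line is the key and the remainder of the line is the value, which is one plausible reading of the input format; the function name is hypothetical, and the database files written by Python are not necessarily readable by the Perl tools.

```python
import dbm

def build_dbm(lines, filename):
    """Store the first field of each line as key, the rest as value."""
    with dbm.open(filename, "c") as database:
        for line in lines:
            fields = line.split(None, 1)
            if len(fields) == 2:
                database[fields[0]] = fields[1].strip()
            # lines without a value are skipped
```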
-d
Turn on debugging.
-f filename
DB(M) database filename to operate on.
config/Thesaurus* - default DB(M) database and input files
the manual page for admin-cgi/mktemp.pl, the manual page for
admin-cgi/dumpdbm.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/mkinv.pl - build ROADS database index
bin/mkinv.pl [-adhu] [-i directory] [-m minsize] [-s directory] [-t directory] [-x stoplist] [-y stopattr] [-z alltemps] [handle1 handle2 ... handleN]
The mkinv.pl program generates an index of IAFA templates which can be searched using the search.pl and admin.pl CGI programs. The index is used by these programs to rapidly match keywords and boolean expressions in a large number of IAFA templates.
A number of options are available to the mkinv.pl program to control where it looks for its files:
Index all the templates in the specified source directory.
Turn on debugging mode.
Provides online help and exits.
Set the absolute pathname of the directory in which the resulting inverted index is to be placed.
Don't index terms which are shorter than this - default is two characters.
Set the absolute pathname of the directory containing the source IAFA templates.
Set the absolute pathname of the directory to be used for intermediate temporary files. This option is useful if you find that you are running out of room in the system default temporary directory during particularly large indexing runs.
Unlink temporary files when in debug mode. Gives visual feedback without leaving lots of unsightly junk lying around.
The absolute pathname of a file containing a list of terms which should not be indexed.
The absolute pathname of a file containing a list of attributes which should not be indexed.
The absolute pathname of the file to which the list of template handle to filename mappings should be saved.
If the -a option is not used, the mkinv.pl script expects
one or more filenames containing IAFA templates to be given. These files
are then processed, and all the templates in them are indexed.
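The general technique, an inverted index with a stoplist, an attribute stoplist and a minimum term length, can be sketched in a few lines of Python. This is an illustration only and says nothing about the on-disk format mkinv.pl actually uses. The function name, the in-memory template representation (a dict of attribute names to values per handle) and the tokenisation rule are all assumptions.

```python
import re
from collections import defaultdict


def build_index(templates, stoplist=(), stopattr=(), minsize=2):
    """Map each term to the set of (handle, attribute) pairs that contain it."""
    stop_terms = {t.lower() for t in stoplist}
    stop_attrs = {a.lower() for a in stopattr}
    index = defaultdict(set)
    for handle, attributes in templates.items():
        for attr, value in attributes.items():
            if attr.lower() in stop_attrs:
                continue  # attribute excluded from the index
            for term in re.findall(r"\w+", value.lower()):
                if len(term) < minsize or term in stop_terms:
                    continue  # too short, or on the stoplist
                index[term].add((handle, attr))
    return dict(index)


templates = {"SOSIG106": {"Title": "The Social Science Information Gateway"}}
print(sorted(build_index(templates, stoplist=["the"])))
```

A keyword search then reduces to a dictionary lookup per term, which is why an index of this kind makes matching fast even for a large number of templates.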
config/stopattr - default list of attributes to exclude from the index.
config/stoplist - default list of terms to exclude from the index.
guts/index* - index files themselves.
guts/alltemps - list of template handle to filename mappings.
source - the source templates themselves.
the manual page for admin-cgi/admin.pl, the manual page for
bin/deindex.pl, the manual page for
admin-cgi/deindex.pl, the manual page for
cgi-bin/search.pl, the manual page for
admin-cgi/mktemp.pl
The indexer will only correctly index IAFA templates that have a Template-Type attribute first and a Handle attribute second. All other attributes can be in any order. All templates generated by the ROADS software are in this format, but the actual IAFA Internet Draft is not as strict. If you are processing templates derived from outside the ROADS system, make sure that these conditions hold before attempting to index them with mkinv.pl.
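A quick pre-flight check for templates from outside ROADS might look like the following Python sketch. It is not part of the ROADS distribution. It assumes templates are plain "Attribute: value" lines with indented continuation lines, and that attribute names can be compared case-insensitively; both points are assumptions, as are the function names.

```python
def attribute_names(template_text):
    """Return attribute names in order of appearance."""
    names = []
    for line in template_text.splitlines():
        # Skip blank lines, indented continuation lines and lines without a colon.
        if not line.strip() or line[0].isspace() or ":" not in line:
            continue
        names.append(line.split(":", 1)[0].strip())
    return names


def is_indexable(template_text):
    """True if Template-Type comes first and Handle second."""
    names = [n.lower() for n in attribute_names(template_text)]
    return names[:2] == ["template-type", "handle"]


print(is_indexable("Template-Type: SERVICE\nHandle: SOSIG106\nTitle: Example\n"))
```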
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>
bin/rebuild.pl - rebuild ROADS index, subject/what's new listings
rebuild.pl [-adp] [-s source_dir] [-t index_dir] [-S subject_listing_view] [-W whats_new_view] [handle1 handle2 ... handleN]
To allow indexing and addition to the subject listings and What's New files to take place, the bin/rebuild.pl program must have access to the bin/deindex.pl, bin/mkinv.pl, bin/addsl.pl and bin/addwn.pl scripts.
Index all templates rather than just specified handles.
Turn on debugging mode.
Incorporate templates stored in holding area, normally guts/pending.
Set template source directory.
Set template index directory.
Set the subject listing view to use.
Set the "What's New" view to use.
config/subject-listing - subject listing views.
config/whats-new - "What's New" views.
guts/pending - holding area for templates created using the offline mode in the template editor.
guts/index* - index files.
source - templates themselves.
the manual page for bin/addsl.pl, the manual page for
bin/addwn.pl, the manual page for bin/deindex.pl, the
manual page for bin/mkinv.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>
bin/report.pl - generate report based on link check results
bin/report.pl [-h] [-l logname] [-s sortpath]
This Perl program generates a human-digestible summary report of the errors which arose in the specified link checking run, i.e. those requests for which the response was not HTTP 200 or equivalent.
The often cryptic response codes are translated into plain English using the libwww-perl package, and the report is broken into sections, each of which deals with the occurrences of a particular problem.
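The grouping step can be pictured with a short Python sketch. Note the substitution: report.pl uses libwww-perl to translate the codes, whereas this illustration uses the standard http.HTTPStatus table from Python. The function names and the exact layout of the output are assumptions, and the entries are taken to be already parsed into (code, file, URL) tuples.

```python
from collections import defaultdict
from http import HTTPStatus


def describe(code):
    """Translate a numeric response code into a short English phrase."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown response code"


def format_report(entries):
    """Group non-200 entries by response code, one section per problem."""
    problems = defaultdict(list)
    for code, filename, url in entries:
        if code != 200:
            problems[code].append((filename, url))
    lines = []
    for code in sorted(problems):
        lines.append(f"{code} {describe(code)}")
        for filename, url in problems[code]:
            lines.append(f"  {filename} {url}")
    return "\n".join(lines)


# Hypothetical entries, for illustration only.
print(format_report([
    (200, "/home/roads/source/A1", "http://example.org/"),
    (404, "/home/roads/source/A2", "http://example.org/gone"),
]))
```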
Use HTML formatting.
The name of the file which contains the link checker session log.
The location of a sort program to use instead of the default.
List of link checker problems.
logs/lc - log file created by link checker run.
The Unix sort program is used, as is the libwww-perl-5 package. The latter is also a dependency for the link checker itself.
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/review.pl - generate template review info breakdown
bin/review.pl [-dnr] [-a attribute] [-o owner] [-s sourcedir] [-v view]
This Perl program checks resource descriptions to see whether they have passed their review date. It is intended for invocation from a World-Wide Web CGI program, a cron job, or an at job.
The report which this tool generates can be customized via a view file, which specifies the attributes which should appear in the listings of templates which are due for review.
The review.pl tool lets you automatically search your database for templates which are due to be checked. This works by scanning the To-Be-Reviewed-Date attribute in each template, if present. It has the limitation that it only understands the following two ways of writing the date and time:
Fri Aug 1 23:00:00 1997
Tue, 23 May 98 13:51:41 GMT
To deal with the "year 2000" problem, years which are only two digits will automatically have 1900 added to them. We've tried to make the ROADS software immune to year 2000 bugs - please let us know if you spot any problems in this area so that we can fix them.
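As an illustration of the two accepted date formats and the two-digit year rule, here is a small Python sketch. It is not the review.pl code. The regular expressions, the function names and the decision to ignore the weekday and treat every time as a naive timestamp are all assumptions.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# e.g. "Fri Aug 1 23:00:00 1997"
CTIME_RE = re.compile(
    r"^\w{3} (\w{3}) +(\d{1,2}) (\d\d):(\d\d):(\d\d) (\d{2,4})$")
# e.g. "Tue, 23 May 98 13:51:41 GMT"
GMT_RE = re.compile(
    r"^\w{3}, (\d{1,2}) (\w{3}) (\d{2,4}) (\d\d):(\d\d):(\d\d) GMT$")


def parse_review_date(text):
    """Return a datetime for either supported format, or None."""
    text = text.strip()
    match = CTIME_RE.match(text)
    if match:
        month, day, hh, mm, ss, year = match.groups()
    else:
        match = GMT_RE.match(text)
        if not match:
            return None
        day, month, year, hh, mm, ss = match.groups()
    if month not in MONTHS:
        return None
    year = int(year)
    if year < 100:
        year += 1900  # two-digit years have 1900 added, as described above
    try:
        return datetime(year, MONTHS[month], int(day),
                        int(hh), int(mm), int(ss))
    except ValueError:
        return None


def is_due(text, now):
    """True if the review date could be parsed and has passed."""
    when = parse_review_date(text)
    return when is not None and when < now


print(parse_review_date("Tue, 23 May 98 13:51:41 GMT"))
```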
Attribute to look in for record owner's email address.
Generate debugging information
Match templates which have no To-Be-Reviewed-Date attribute.
Owner to search for - typically email address. It is assumed that you know this already.
Template must have To-Be-Reviewed-Date attribute.
Directory where resource descriptions may be found, if not default.
Template view to be used. This is a file which specifies the attributes which should be returned (if present) in the summary report.
Summary report on templates which are due for review.
config/review-views - alternative sets of attributes to return in review.pl reports.
the manual page for admin-cgi/review.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/simplethes.pl - simple sample thesaurus plug-in
bin/simplethes.pl [-d] [-f filename]
This is a simple example program which is intended to illustrate the possibilities for using Perl and DB(M) databases to perform query expansion. The query to be expanded is passed in the environment variable QUERY_STRING, as per the CGI specification.
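The same idea can be sketched in Python. This is an illustration of query expansion in general, not a port of simplethes.pl: the in-memory dictionary stands in for the DB(M) database, the query is assumed to be already URL-decoded and whitespace separated, and the output format is an assumption.

```python
import os


def expand_query(query, thesaurus):
    """Return the query terms plus their expansions, in order, without repeats."""
    expanded = []
    for term in query.split():
        for candidate in [term] + thesaurus.get(term.lower(), []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    thesaurus = {"colour": ["color"]}  # stand-in for the thesaurus database
    query = os.environ.get("QUERY_STRING", "")
    print(" ".join(expand_query(query, thesaurus)))
```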
Turn on debugging mode.
The name of the thesaurus DB(M) database which should be used.
the manual page for bin/wppd.pl, the manual page for
admin-cgi/dumpdbm.pl, the manual page for
bin/makethes.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>
bin/snarf.pl - do a WHOIS++ search and snarf the resulting handles
bin/snarf.pl [-dfl] [-p port] server query
The snarf.pl program performs a WHOIS++ search on the specified server and returns a list of the matching handles on a line by line basis. Note that the search must be structured as per the WHOIS++ query syntax defined in RFC 1835, the WHOIS++ protocol specification.
If the search was performed successfully, snarf.pl returns 0, otherwise it returns -1.
Turn on debugging output
Dump out the full records
Dump results out on one line, suitable for use as parameters to another program
Specify port number, default is your ROADS WHOIS++ server's port number, or 63
the manual page for cgi-bin/search.pl, the manual page for
bin/wppd.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>, Jon Knight <jon@net.lut.ac.uk>
bin/templateadmin.pl - template editor ACL manager
bin/templateadmin.pl [-h handle] [-o operation] [-u user]
This program provides a mechanism for adding users to and removing users from the access control lists used by the ROADS template editor. The access control lists (if present) control which users are allowed to update the nominated templates.
This is the handle to be operated on.
This is the operation to be carried out, one of
ADD - add user to the ACL for this template
DELETE - delete user from the ACL for this template
LIST - list ACL for this template
The user name to add or delete from the ACL for this template.
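The three operations map naturally onto a keyed store of user lists. The Python sketch below models the ACL database as a plain dictionary from handle to list of user names. It is a hypothetical illustration of the semantics described above, not the format of config/template_users or the behaviour of templateadmin.pl in edge cases.

```python
def acl_operation(acls, handle, operation, user=None):
    """Apply ADD, DELETE or LIST to the ACL for one template handle."""
    users = list(acls.get(handle, []))
    if operation == "ADD":
        if user not in users:
            users.append(user)
    elif operation == "DELETE":
        users = [u for u in users if u != user]
    elif operation != "LIST":
        raise ValueError(f"unknown operation: {operation}")
    if operation != "LIST":
        acls[handle] = users  # only ADD and DELETE modify the store
    return users


acls = {}
acl_operation(acls, "SOSIG106", "ADD", "alice")  # "alice" is a made-up user
print(acl_operation(acls, "SOSIG106", "LIST"))
```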
config/template_users - DB(M) database of template ACLs.
the manual page for admin-cgi/tempuserauth.pl, the manual page for
admin-cgi/mktemp.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>
bin/wig.pl - gather indexes (centroids)
bin/wig.pl [-d] spec_file
The wig.pl program is used to gather WHOIS++ index and Common Indexing Protocol (CIP) centroids from remote servers. It is intended to be run either from the command line or, more likely, from cron periodically. It implements the protocol described in RFC 1913, and the client side of the Common Indexing Protocol. Please note that at the time of writing, CIP was still under development by the IETF's FIND working group. Please let us know if you find any interoperability problems.
The upshot is that wig.pl lets you configure your ROADS WHOIS++ server to grab the database indexes from other people's WHOIS++ and CIP aware servers, e.g. CNIDR's Iknow and Bunyip's Digger. When a search performed on your server matches information in one or more of these indexes, the client will be returned a "referral" to the relevant server or servers. The ROADS WWW based WHOIS++ client, search.pl, will automatically follow these referrals and search the indexed WHOIS++ servers in addition to your own.
Enter debug mode (only of interest to developers and during debugging)
config/wig/* - index gatherer specification files
guts/wig/* - per-server centroids
Note that the config file name in config/wig should be the same as the indexed server's WHOIS++ server handle. This is the "Serverhandle" parameter in lib/ROADS.pm. Each server you index must have a unique server handle.
wig.pl is configured at run time by specifying the name of an indexing specification file. This filename is mandatory and it is assumed to be a file within the config/wig directory. Each line in the specification file contains either a comment (indicated by a hash character at the start of the line) or a configuration directive, followed by a colon and whitespace and then the value for that directive. Valid directives are:
The hostname of the machine that is to be polled for a centroid. A specification file must contain the hostname of the remote server.
The port number of the remote server that is to be polled. By default this is assumed to be the same as the port number of the local ROADS WHOIS++ server.
The type of poll to perform. This can either be CENTROID or QUERY. By default it is CENTROID.
For a QUERY type-of-poll the directive specifies the WHOIS++ style search string to send to the remote server. For a CENTROID type-of-poll, it can take one of two values: FULL or RELATIVE. A FULL poll-scope means that the full centroid should be returned (still taking into account the Start-Time and End-Time), whereas RELATIVE means that the centroid returned should contain any changes since the last poll by this index server. The default for a CENTROID type-of-poll is FULL.
The time before which we're not interested in changed centroid details. The default is empty (ie no constraint on the start time).
The time after which we're not interested in changed centroid details. This directive and Start-Time allow a selective subset of the remote server's centroid to be returned based on when the underlying data changed. The default is empty (i.e. no constraint on the end time).
The name of the template from which the centroids should be generated, or the special value ALL. ALL means consider all templates on the remote server. The default is ALL.
The list of names of fields that are of interest in the centroid, or the special value ALL. ALL means consider all fields within the specified template(s) when generating the centroid. The default value is ALL.
Specifies this machine's relation to the remote server. This directive can take one of three values: Topology, Geographical or Administrative (note that these are case sensitive). Topology means that this index server is indexing the remote server because of its place in the network topology, Geographical means that it is indexing the remote server because of their respective geographical locations and Administrative means that the indexing is taking place because of an administrative decision. The default value is Administrative.
A free text description of this index gatherer (or its related WHOIS++ server that makes use of the centroids it gathers) which the remote server can use when asked to describe the servers that index it. There is no default value for this directive.
This directive specifies the type of authentication to supply to the remote server. Common values are NONE (for no authentication) and Password (for a simple plaintext password exchange). RFC 1913 does not specify any others but any value that is understood by the remote server can be entered in this directive. There is no default value for this directive.
This directive's value is used in conjunction with the Authentication-Type directive to pass the actual password, key or other data required for this index server to be authenticated to the remote server. There is no default value for this directive.
The presence of this directive (its value doesn't actually matter) indicates that the remote server should be polled using the Common Indexing Protocol, rather than the standard WHOIS++ centroids mechanism.
Sets the CIP index type - by default we use the tagged index object, "application/index.obj.tagged".
For CIP polls, this corresponds to the Data Set Identifier of the server being polled. For ROADS we construct these by appending the (remote!) server's IP address and port number to the Loughborough University Department of Computer Studies enterprise identifier. For the SOSIG example below, this gives:
1.3.6.1.4.1.1828.10.198.168.254.252.8237
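The construction is plain string concatenation, as this small Python sketch shows. The function name is hypothetical. The prefix 1.3.6.1.4.1.1828.10 is not stated anywhere in this manual page; it is simply what remains of the example identifier once the IP address and port number are removed, so treat it as an assumption.

```python
def build_dsi(prefix, ip_address, port):
    """Append the remote server's IP address and port to an identifier prefix."""
    return f"{prefix}.{ip_address}.{port}"


# Prefix inferred from the example above (an assumption, see the note).
print(build_dsi("1.3.6.1.4.1.1828.10", "198.168.254.252", 8237))
```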
The output of the wig.pl program is held in the guts/wig directory. In this directory a subdirectory named after the remote server's handle will be generated. In the subdirectory, an index file generated from the returned centroid(s) will be created, along with a DBM database file used to rapidly locate items within the file. The format of each line of the index file is:
template:oldfield:term
The DBM file is keyed on the terms and the associated values are a list of offsets into the main index file that match that term. The DBM file must be regenerated every time the main index file is changed.
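The relationship between the index file and the term-keyed lookup table can be sketched in Python as follows. This is an illustration of the technique, not of the actual file layout wig.pl writes: a dictionary stands in for the DBM file, each index line is assumed to hold three colon-separated fields with the term last, and the function names and sample data are hypothetical.

```python
import io
from collections import defaultdict


def build_offsets(index_file):
    """Map each term to the byte offsets of the index lines that contain it."""
    offsets = defaultdict(list)
    while True:
        position = index_file.tell()
        line = index_file.readline()
        if not line:
            break
        parts = line.rstrip(b"\n").split(b":", 2)
        if len(parts) == 3:
            offsets[parts[2].decode()].append(position)
    return dict(offsets)


def lookup(index_file, offsets, term):
    """Seek straight to the lines for a term instead of scanning the file."""
    results = []
    for position in offsets.get(term, []):
        index_file.seek(position)
        results.append(index_file.readline().decode().rstrip("\n"))
    return results


# Made-up index contents, opened in binary mode so offsets are byte offsets.
data = io.BytesIO(b"SERVICE:Title:social\n"
                  b"SERVICE:Keywords:science\n"
                  b"DOCUMENT:Title:social\n")
table = build_offsets(data)
print(lookup(data, table, "social"))
```

Because the table stores offsets into the index file, any change to that file invalidates them, which is why the DBM file has to be regenerated whenever the index file changes.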
To cross search the WHOIS++ server running on sosig.ac.uk, the Social Science Information Gateway at the University of Bristol, you would create the file config/wig/sosigacuk01. As a bare minimum, this file would need to contain the host name of the server to contact, but in practice you will probably want to include the following:
Host-Name: sosig.ac.uk
Host-Port: 8237
Description: Muppet Gateway; lets put on makeup and light up lights.
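For readers who want to check a specification file before running wig.pl, the line format described earlier (hash comments, otherwise a directive name, a colon, whitespace and a value) can be parsed with a few lines of Python. This is a sketch under those stated rules only. The function name is hypothetical, and whether wig.pl itself tolerates blank lines or a missing value is not something this page states.

```python
import re


def parse_spec(text):
    """Parse 'Directive: value' lines, ignoring blank lines and # comments."""
    directives = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        match = re.match(r"^([^:\s]+):\s*(.*)$", line)
        if match:
            directives[match.group(1)] = match.group(2).strip()
    if "Host-Name" not in directives:
        # The hostname of the remote server is the one mandatory entry.
        raise ValueError("specification file must contain Host-Name")
    return directives


spec = """# SOSIG example
Host-Name: sosig.ac.uk
Host-Port: 8237
"""
print(parse_spec(spec))
```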
It's typically necessary for you to contact the remote server's administrator at this stage, because most WHOIS++ implementations will only let you index a server if you've been given permission to by its administrator. The ROADS WHOIS++ server uses an access control list based on the file config/hostsallow, and comes with some default settings which let the ROADS developers index your server by default. To add a new machine, we recommend that you put both its domain name and IP address into config/hostsallow, e.g.
bork.swedish-chef.org: poll
198.168.254.252: poll
Once this has been done, the ROADS WHOIS++ server will automatically allow the machine doing the indexing to "poll" it for centroids. Now all you need to do at the local end is run wig.pl, e.g.
bin/wig.pl sosigacuk01
If the index is successful, subsequent searches of your server will result in the centroid from SOSIG also being searched, and referrals being returned for any matches in this.
the manual page for bin/wppd.pl
If you want to set up an index server which has no local data of its own, you'll still need to build the main ROADS index, e.g. with bin/mkinv.pl. It's debatable whether this is a bug or a feature!
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>
bin/wppd.pl - LUT WHOIS++ server
bin/wppd.pl [-cCdDiLR] [-a admin-maxhits] [-e expansions] [-f maxfull] [-g wgipath] [-h serverhandle] [-l logfile] [-m maxhits] [-p portnumber] [-r restrictionsfile] [-s sourcedir] [-S stoplistfile] [-T thesaurus_prog] [-t indexdir]
This is a WHOIS++ server (see RFC 1835) which can be used to make the contents of the ROADS server's database available for searching over the Internet using the WHOIS++ protocol.
Administratively assigned upper limit on the number of hits which may be returned in response to a search.
Make searches case sensitive - by default they are case insensitive, i.e. the case of the letters in search terms is ignored.
Whether or not to return a hit count with the WHOIS++ response.
Return debugging information in the WHOIS++ protocol stream.
Do DNS lookups to find out the client's hostname - off by default, since it results in lots of unnecessary traffic. You can always do this in a batch job later on using the server logs.
File containing list of expansions to use in stemming search, e.g.
colour color
indicates that all instances of the search term colour should automatically be expanded to consider the search term color too.
Sets the default upper limit on the number of records which may be returned in full. The server administrator can set an upper limit on this value, and the client can indicate in their request how many records they would like to be returned in full - though there is no guarantee the server will honour this request.
Path to WHOIS++ Gateway Interface (WGI) executable which should be run on receiving a request. Off by default.
Server handle, unique ID for your server. This should be set by the ROADS installation program, but you can override it here.
Don't keep an in-memory cache of the database index. We do this by default because it speeds up searching. If your machine only has a small amount of RAM you might prefer to read the index off disk rather than cache it in memory.
Whether to log search terms and hit counts. The default is to keep logs of these things.
The name of the file where log entries should be placed.
The default maximum number of hits to return. The client can request an alternative upper limit, but there is no guarantee that the server will honour this request. In particular, the server administrator may have set an administratively defined upper limit which is lower than the value requested by the client.
The TCP port number to listen on. You will need to run the server as root if you want it to listen on a port below 1024. This is discouraged.
File to look in for search restrictions. This is a list of the templates, and attributes within those templates, which the ordinary user will be allowed to search on. Anything which does not appear in this file will be ignored.
Use access control lists.
This is the directory where the ROADS database may be found, if different from the default.
This is the file in which the stoplist used when building the database index may be found. Words which appear in here are silently discarded when they're searched for. For example, if the word "the" was in the stoplistfile, a search for "the AND big AND breakfast" would be trimmed to "big AND breakfast."
This is the location of WHOIS++ Gateway Interface (WGI) conformant thesaurus program.
This is the location of the ROADS database index.
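The stoplist trimming described for the stoplist file option above can be illustrated with a short Python sketch. It is not the wppd.pl implementation. It assumes a flat query of terms joined by AND or OR, with no parentheses or NOT, and the function name is hypothetical.

```python
OPERATORS = {"AND", "OR"}


def trim_query(query, stoplist):
    """Drop stoplisted terms and any operator left dangling by their removal."""
    stop = {word.lower() for word in stoplist}
    kept = []
    for token in query.split():
        if token.upper() in OPERATORS:
            # Keep an operator only when it directly follows a kept term.
            if kept and kept[-1].upper() not in OPERATORS:
                kept.append(token)
        elif token.lower() not in stop:
            kept.append(token)
    # A trailing operator has nothing to its right, so remove it.
    while kept and kept[-1].upper() in OPERATORS:
        kept.pop()
    return " ".join(kept)


print(trim_query("the AND big AND breakfast", ["the"]))
```

With "the" on the stoplist, the call above yields "big AND breakfast", matching the example given for that option.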
config/admin-restrict - search restrictions for admin users.
config/adminpasswd - password(s) for admin users in /etc/passwd format.
config/expansions - list of simple query expansions, e.g. 'color' to 'colour'.
config/hostsallow - TCP wrapper format list of client domain names and IP addresses, and allowed operations.
config/outlines - template outline definitions (schemas).
config/search-restrict - search restrictions for end users.
guts/alltemps - list of template handle to filename mappings.
guts/index* - database index used in searching.
guts/wppd.pid - WHOIS++ server process ID.
source - the actual templates themselves.
the manual page for bin/wppdc.pl, the manual page for
admin-cgi/wppdc.pl, the manual page for bin/snarf.pl,
the manual page for cgi-bin/search.pl, the manual page for
admin-cgi/admin.pl, ...
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>, Jon Knight <jon@net.lut.ac.uk>, with apologies to Tom Christiansen, and Larry Wall :-)
bin/wppdc.pl - control wppd.pl WHOIS++ server
bin/wppdc.pl [coldstart|status|restart|start|stop|safetyfirst]
This program lets you drive your LUT WHOIS++ server by remote control, making it possible to have it automatically restarted, shut down and so on from things like cron jobs and WWW CGI programs.
There is only one option, which is the operation to be performed. This may be one of the following:
Start from scratch, ignoring any wppd.pid status files which might be present in the ROADS guts directory.
Dump out some status information about any LUT WHOIS++ servers which happen to be running.
Restart any WHOIS++ servers which happen to be running. This consists of a stop followed by a start.
Start the LUT WHOIS++ server.
Stop any LUT WHOIS++ servers which happen to be running.
Start a new WHOIS++ server if there don't appear to be any running already.
This program uses the ps command to find out what processes are running. The options that ps takes and the output it produces typically vary quite a bit between different versions of Unix. If you find that this program fails on your system, please get in touch so that we can fix it!
the manual page for bin/wppd.pl, the manual page for
admin-cgi/wppdc.pl
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/z3950_centroid.pl - extract centroid from NWI/EWI objects
bin/z3950_centroid.pl [-d] [-h hashtemp1] [-H hashtemp2] [-s serverhandle] < filename
This Perl program creates a WHOIS++ compatible centroid from the attributes and values in a collection of NWI/EWI index objects, as created by the Combine harvester. Note that you should give a server handle when invoking this program, or the default value of 'undefined' will be used.
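In WHOIS++ terms a centroid is, roughly, the list of unique words that appear in each attribute of each template type on a server. The Python sketch below shows that reduction for records already held in memory. It does not read NWI/EWI objects or emit the RFC 1913 wire format. The record representation, the lowercasing and the tokenisation rule are all assumptions, as is the function name.

```python
import re
from collections import defaultdict


def build_centroid(records):
    """Collect the unique terms per attribute per template type.

    records is an iterable of (template_type, {attribute: value}) pairs.
    """
    centroid = defaultdict(lambda: defaultdict(set))
    for template_type, attributes in records:
        for attribute, value in attributes.items():
            terms = re.findall(r"\w+", value.lower())
            centroid[template_type][attribute].update(terms)
    return {template: {attr: sorted(terms) for attr, terms in attrs.items()}
            for template, attrs in centroid.items()}


# Made-up records, for illustration only.
print(build_centroid([
    ("DOCUMENT", {"Title": "Social science gateway"}),
    ("DOCUMENT", {"Title": "Science news"}),
]))
```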
The Combine harvester creates its database in a two level directory hierarchy, with a separate file for each indexed object. You can combine them together for feeding into this program using a simple find invocation:
find HDB/hdb -type f -exec cat {} \; | z3950_centroid.pl -s test01
Or perhaps something more complicated!
Turn on debugging output - very verbose!
Filename to use for temporary DB hash database used in the construction of the centroid. This defaults to hashtemp1, and is used to hold a list of the document titles being indexed.
Filename to use for temporary DB hash database used in the construction of the centroid. This defaults to hashtemp2, and is used to hold a list of the terms in the document text being indexed.
We could traverse the filesystem and look at the timestamps on the index objects - this would let us do a relative centroid.
We don't do anything special about character sets/encodings.
Not up to date with current CIP specifications - this is really intended for use with a WHOIS++ server which speaks the old RFC 1913 indexing protocol.
the manual page for bin/harvest_centroid.pl, RFC 1913
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Martin Hamilton <martinh@gnu.org>
bin/z3950_shim.pl - search gateway between WHOIS++ and Z39.50 server
bin/z3950_shim.pl [-d database] [-h host] [-p port] [-z path_to_zbatch]
This program relays WHOIS++ search requests to a Z39.50 server and tries to munge the results back into WHOIS++ result format. It runs from the command line listening to STDIN and writing its results to STDOUT, and hence is suitable for launching via inetd.
Before passing the WHOIS++ query on to the Z39.50 server, it is munged to remove WHOIS++ search syntax which would confuse it. The search results, if any, are massaged into WHOIS++ templates using the template type GILS-NWI.
The database to use, or "Default" by default.
The host to contact, or "localhost" by default.
The TCP port number to use, or 210 by default.
The path to the zbatch program, or /usr/local/bin/zbatch by default.
This program depends on the zbatch program from the CNIDR Isite distribution - see http://www.cnidr.org. It should be rewritten to include native Z39.50 support!
Z39.50 is a very complex protocol, and it's highly likely that you won't be able to use this tool to talk to an arbitrary Z39.50 server. Be prepared to get your hacking gloves out!
Should be rewritten to allow for operation as a stand-alone server.
the manual page for bin/z3950_centroid.pl, RFC 1913
Copyright (c) 1988, Peter Valkenburg <valkenburg@terena.nl>, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
Peter Valkenburg <valkenburg@terena.nl>, Martin Hamilton <martinh@gnu.org>, Jon Knight <jon@net.lut.ac.uk>.