Transcript of ROADS API requirements meeting - May 6th 1998

Date: Wednesday, May 6th 1998
Location: irc://maple.ilrt.bris.ac.uk:8282/#roads
Convener (and editor of this document): Martin Hamilton

Attendees:

IRC handle  In Real Life       Affiliation  Institution                      Country
Dan         Dan Brickley       ILRT/ROADS   Bristol University               UK
Nicky       Nicky Ferguson     ILRT/ROADS   Bristol University               UK
Lisa        Lisa Gray          OMNI         Nottingham University            UK
martin      Martin Hamilton    ROADS        Loughborough University          UK
PJ          Paul Hollands      ILRT/ROADS   Bristol University               UK
EEVL        Linda Kerr         EEVL         Heriot-Watt University           UK
John_K      John Kirriemuir    ILRT/ROADS   Bristol University               UK
EricM       Eric Lease Morgan  ALEX         North Carolina State University  USA
Bob         Bob Parkinson      OMNI         Nottingham University            UK
Ian         Ian Peacock        UKOLN        Bath University                  UK
andy        Andy Powell        UKOLN/ROADS  Bath University                  UK
jasper      Jasper Tredgold    ILRT/ROADS   Bristol University               UK
Sue         Sue Welsh          OMNI         Nottingham University            UK

1. Introductions

martin: this meeting is hereby called to order!
martin: OK, so where were we ? I was going to say (redundantly, to give people time to catch up on those two WWW pages :-), that I'm Martin Hamilton, and I work at Loughborough University on developing the ROADS software. I share an office with ROADS co-developer Jon Knight :-) who couldn't be with us today. He should be able to make it to tomorrow's meeting, however.
Dan: Oh, right. I'm Dan Brickley, work at the Institute for Learning and Research Technology (ILRT) in Bristol, and am involved with our ROADS effort as well as the Biz/ed subject gateway. I'm here today as part ROADS person and part ROADS-user person...
Bob: Bob is rwp@omni.ac.uk, technical officer, aka hacker, with a tendency to read the manual last, interested in backends for the index.
EricM: my name is eric morgan. i am a librarian at the north carolina state university libraries (ncsu) here in the united states. i have been playing with roads so i can better support my alex database of literature links. i have a fat head and you can see me and my office at the following url: http://www.lib.ncsu.edu/staff/morgan/office.html.
John_K: I'm John - I'm not a teccie :-) and am mainly here as an observer, and to 1) make sure no-one makes an outrageous claim about ROADS and 2) chuck any rogues off :-)
Sue: Sue Welsh, OMNI Project Manager, ROADS fan, and mainly here to lurk
andy: Andy Powell (UKOLN - ROADS Project partner). Minimal experience of running real ROADS based services but... done fair amount of playing with ROADS tools in various ways :-)
Nicky: Nicky Ferguson, bureaucrat - here to see if John's right that "these chat meetings really work"
jasper: Hi, I'm Jasper Tredgold and I work at Bristol University as technical support for the SOSIG ROADS-based subject gateway. I also spend some time on ROADS.
PJ: Hi folks, I'm Paul Hollands ILRT Training dogsbody and now roped into ROADS
Lisa: Hello, I'm Lisa, the Project Officer from OMNI, and I'm having a lurk!
John_K: We have an independent observer (looking over shoulder :-) here - Sarah Ashton, ex-Netlinks and now at the National Maritime Museum, who will be setting up a ROADS-based gateway in the near future

2. Approving the agenda

martin: Is everyone happy to proceed with the agenda as it stands ?
John_K: Yes!
EricM: i am happy.
Dan: Yes
jasper: yep
andy: yes
martin: OK, I think we have a quorum!

3. Done already?

martin: What I'd like to do first is clear up a couple of the requirements/comments - 9 and 10.
martin: As per the agenda, I think we've actually done these already for ROADS version 2, and they're not particularly API related.
martin: If anyone is desperate for us to talk about these, speak now or forever hold your peace!
Dan: I think you're right; they're interesting issues but out of scope for an API meeting...
martin: As no one has come forward to suggest that we re-include these two topics, let's move on to the third item on the agenda.
martin: Or even the fourth :-)

4. What do we mean by "API"?

martin: Are the four categories of "API" which I've listed in the agenda acceptable to people ?
Dan: Isn't the third a special case of the first?
EricM: yes, the categories are acceptable to me.
Bob: ok for discussion
Dan: Yep; i wasn't objecting. Just thinking out loud.
martin: There is a subtle distinction between the two - we could provide a "callback" facility in the ROADS library routines which let you plug in your own code. Alternatively, you'd be faced with hacking at the ROADS code itself, or replacing it with your own. Does that help to clarify?
Dan: Yes. In both cases we're talking about a standard way of interfacing my icky Perl hacks with the ROADS modules...
martin: aka Our icky Perl hacks :-)
Bob: I would think that the idea behind an api is to stop people hacking at the roads code, and provide some sort of plug in replacement for a general roads provided soln.
Dan: "plug-ins" are a nice metaphor.
andy: user supplied ranking code (rank.pl ?) is an example of the third kind of API, right?
martin: Andy - if anything, I'd put the ranking code in the first category. The ranking module doesn't have the ability to pull in code of yours. If you wanted to extend it, you'd have to hack at it, or replace it. Does that help to clarify ?
andy: Ah, right... is there currently an example of the third kind of API in ROADS?
martin: I suppose you could say that WGI was - though here you're calling an external program rather than another Perl module.
Dan: Is the distinction between (case 1) physically replacing ROADS perl files with my own versus (case 3) telling ROADS - in a config file of some sort? - the name of my Perl module and function to call. (or, with WGI, the path to the executable program).
martin: Yes, that's it.
andy: OK, agreed
Dan: Is there a distinction between callbacks that are hard coded into configuration files and ones which are registered with ROADS at runtime?
Dan: for eg...
Dan: I want a script to be called when a template gets added into the database.
Bob: protocol oriented?
Dan: This could either be a config file entry $ScriptToBeCalledWhenANewTemplateAdded
Bob: part 4
martin: :-)
Dan: or it could be runtime - call some ROADS function registering the name of my script.
Dan: Are these the same thing for sake of this discussion?
martin: I think it would be better to move on to the next item of the agenda and discuss this there. Holler now if you have any more to say on item 4!
Dan: Fine.
Bob: yo
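
Editor's note: Dan's two options (a config file entry versus a call made at runtime) can be sketched roughly as below. This is an illustrative Python sketch, not ROADS code (ROADS itself is Perl). The names `CallbackRegistry`, `register`, `load_config` and `fire`, the event name `template_added`, and the "module:function" config syntax are all assumptions made up for the example. Under those assumptions a config entry is just a deferred runtime registration, so both routes end up in the same table.

```python
import importlib


class CallbackRegistry:
    """Maps event names to lists of callables (hypothetical design)."""

    def __init__(self):
        self._callbacks = {}

    def register(self, event, func):
        # Runtime route: calling code hands over a callable directly.
        self._callbacks.setdefault(event, []).append(func)

    def load_config(self, config):
        # Config-file route: each entry names "module:function" as a string,
        # which is resolved to a callable and then registered as above.
        for event, target in config.items():
            module_name, func_name = target.split(":")
            module = importlib.import_module(module_name)
            self.register(event, getattr(module, func_name))

    def fire(self, event, *args):
        # Call every callback for the event and collect the results.
        return [func(*args) for func in self._callbacks.get(event, [])]


registry = CallbackRegistry()
# A standard-library function stands in for site-specific code here.
registry.load_config({"template_added": "os.path:basename"})
registry.register("template_added", lambda path: len(path))
results = registry.fire("template_added", "/tmp/new/template42")
```

Seen this way, the answer to "are these the same thing?" would depend mainly on when the name is resolved, not on how the callback is eventually invoked.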

5. Implementation status

martin: Hopefully I've given everyone enough time to read my monologue on what we currently implement in ROADS...
Bob: well, on line documentation, i do read sometimes, generally after I've emailed Jon/Martin with a bug report
martin: Does the list of possible new stuff which follows in the agenda fairly represent people's submissions ? I want to make sure that we don't leave anything important out.
martin: Last chance... :-)
Dan: The main thing I was interested in for Biz/ed was having some notion of an Event...
martin: Dan - which of the R numbers would that be ?
Dan: Callbacks I guess.
Dan: Your list: future devts.
martin: :-)
martin: OK, I'm moving us on to the main part of the meeting now...

6. Discuss!

martin: I'll add an extra item - "callbacks" to this section, unless anyone has a problem with this. Holler now if you do...
martin: Done!
martin: So, the first question I'd like us to discuss is...
martin: Do we need a Perl level API to call (as opposed to running an external program) for tasks such as rebuilding the database index ?
Bob: how do we implement the sql index api?
Bob: and the sql templates api
Bob: ?
Dan: Re Martin's question: do we need perl functions to do things we've got command line tools for...
Dan: I'd quite like this, but guess it should be trivial for us to do ourselves. exec()?
martin: Currently we run an external program to index newly added records - you could substitute for this with your own program. Another possibility is that we provide a common SQL front end for people using SQL with ROADS - but there are some potential problems with this.
Dan: SQL is a big tricky problem if we try to address it properly
Dan: Can we divide up the discussion and talk about read-only versus read/write access to SQL stored records?
martin: By all means....
Bob: I'd like access to a mk.indexterms function, that calls the internal roads indexing stuff, and just presents me with data that I can bang into a relational table
Dan: So you'd be doing your own indexing, in effect, rather than relying on whatever facilities the RDBMS offered?
Bob: Ahh, indexing in terms of word lists, attributes etc, not indexes as in btree
martin: In pre-meeting discussions it occurred to me that another distinction which is worth making is whether the SQL (or a.n.other) backend is being used just to index the records, or store and index them. If the answer is just to index, we could adopt a Harvest compatible model which allowed for multiple search engines - including SQL based systems :-)
Dan: Could you expand on this a little? Sketch a scenario? (I wasn't around pre-meeting...)
Bob: I think that the relational storage of templates should be distinct from the storage of the index terms associated with a template
martin: Dan - the "Harvest scenario" would be that we had something like the generic db_api.pm suggested in R7, which would act as a buffer between (at least) queries and whichever backend indexer was being used. This is essentially what Harvest does.
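
Editor's note: the "buffer" role Martin describes for db_api.pm might look something like the sketch below. It is written in Python purely for illustration (ROADS is Perl), and none of it reflects the actual R7 proposal or the Harvest interfaces: the `Backend` class, its `search` method, `InMemoryBackend` and `run_query` are hypothetical names. The assumption is only that the query side talks to one small interface and each indexer supplies its own implementation of it.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """What every indexer would have to provide (hypothetical interface)."""

    @abstractmethod
    def search(self, attribute, term):
        """Return the handles of records whose attribute contains term."""


class InMemoryBackend(Backend):
    # Stand-in for a real indexer; an SQL backend would subclass Backend too.
    def __init__(self, records):
        self._records = records  # handle -> {attribute: value}

    def search(self, attribute, term):
        term = term.lower()
        return sorted(
            handle
            for handle, record in self._records.items()
            if term in record.get(attribute, "").lower()
        )


def run_query(backend, attribute, term):
    # The front end only ever talks to the Backend interface.
    return backend.search(attribute, term)


backend = InMemoryBackend({
    "rec1": {"Title": "Medical imaging resources"},
    "rec2": {"Title": "Business education gateway"},
})
hits = run_query(backend, "Title", "medical")
```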
martin: To follow this train of thought...
martin: Do we also want to wrap up the indexing process inside a Perl module type API ? (that was the original question :-)
Dan: I've been working with PostgreSQL [http://www.postgresql.org] for relational storage of ROADS-like records, and have found its internal indexing facilities to be pretty poor for queries of the form 'term appears somewhere in such'n'so field'.
Dan: (I started typing the above while it was more relevant to the discussion)
Dan: To go back to Martin's comments on db_api.pm, I think this would be an excellent thing to do. However it's not a million miles away from inventing our own simple search protocol...
martin: :-)
Dan: We've effectively been doing this in another context by writing WHOIS++ wrapper scripts for a variety of interestingly searchable networked resources. How, apart from being function instead of network socket oriented, would db_api.pm differ?
martin: I should mention, for the benefit of anyone who hasn't picked up on this, that Jon has created an unofficial WHOIS++ extension which allows the Tk based version of the ROADS template editor to edit templates and then upload them to the ROADS server.
Bob: Part of my thought with db_api, was to allow a simple interface between a hidden world of wppd protocol streams/formats and the data. letting roads code format (sql result sets) into a wppd stream
martin: Would you want to have a Perl API for regenerating things like subject listing breakdowns ?
Bob: yes to wrapping the indexing process in a callable function/module
Dan: Regarding the WHOIS++ write extension: this is a great idea when the database's internal representation maps nicely onto the public view of the data as WHOIS++ records. If the database stores stuff internally in several relation tables, I can't see a WHOIS++ write facility being useful.
Bob: no for listings api, shell scripts are good enough here
martin: You should bear in mind that the immediate rationale for it is to provide a way for us to have a template editor which runs on PCs under Windows (and also, coincidentally, on Macs :-)
martin: FWIW - at the moment the new template is uploaded into the "pending" templates area, awaiting a ROADS database rebuild.
Bob: Sue's machine crashed installing the roads screen saver
martin: We can certainly move the body of code which is currently in bin/mkinv.pl into a separate library module.
PJ: Re: Tk editor - Perl Tk runs quite happily on my Windows PC and Macs as far as I'm aware. How about a port of Jon's code?
Dan: (from John) the screensaver is an alpha release :-) Non-fast machines struggle with it, so am doing a cut-down version of it for June
martin: Jon has been looking at Perl/Tk too - but unfortunately he couldn't make it for today's meeting. He'll be around tomorrow afternoon, though, and may have some progress to report!
martin: thinks... I thought Bob was joking about the screensaver!
Dan: Can I jump back to the readonly (searching) discussion? We started to talk about a standard ROADS representation for a searchable resource.
Dan: This would be an alternative approach to WGI, which passes off all the work of implementing WHOIS++ to the WGI implementor.
martin: Dan - before you get going on this. Do we have consensus on a "yes" for APIs for a) adding a new record and b) rebuilding the database ?
martin: We can discuss details later...
Dan: Can I give a qualified yes? ie. Yes with-respect-to the ROADS built in database. Not w.r.t. arbitrary backend databases.
Bob: yes
PJ: We certainly want a painless way of adding new records in a variety of ways that doesn't involve hacking ROADS. So yes to a.
jasper: yes
andy: yes
martin: Dan - what's your problem with alternative backends ?
Dan: I just wanted to be clear what the requirement was. There's a big difference between sending something like a ROADS record to a relational database and saying 'store this' and sending it to a ROADS built-in IAFA based database.
Bob: surely not if the api is correctly designed?
martin: Yes, that's true - the cop out is that we provide the mechanism for people to write support for arbitrary backends, but aren't likely to be able to code up the actual support ourselves!
Dan: I'd quite like to be able to do both: say to any ROADS service "store this" and send it a template. But it's a bigger job if we're allowing for relational database stores with arbitrary schemas.
Dan: We've got an agenda bullet-point that covers this: Schemas and schema normalisation for SQL backends
Dan: The relational database / SQL philosophy is rather different to the data model implicit in IAFA/WHOIS++ records.
andy: Doesn't the API hide the individual database schemas from you?
martin: We also have to bear in mind that some people might want to use their (say) SQL server to actually store the records, and some just for indexing them. So, we need to be flexible enough to handle this - preferably without jumping through hoops.
martin: Before we go any further...
Dan: Mmmm....
martin: I'm interpreting the above to mean that on the whole we like the idea of APIs for a) adding new records to a ROADS server and b) reindexing the database.
Dan: Could you clarify what a ROADS server is?
Dan: (I'm not being facetious. I think we could do with inventing names for the different components ROADS offers.)
martin: Sorry - I'm just using that as shorthand for "installation of the ROADS software"
Dan: I've a ROADS server (aka installation) that uses WGI instead of the ROADS-textfile-database. I suspect it's the latter that you mean?
martin: No, either - if we have an API in-between it shouldn't matter.
Dan: OK, if that's what we mean. I vote Yes. It's a hard problem though.
martin: It does require us to be disciplined in terms of what we throw about when calling the various ROADS modules.
Dan: In what way?
martin: At the moment we use global variables all over the place - but we should have a more clearly defined interface between the calling program and the module which implements the API.
jasper: How much of a re-write do you reckon this would be?
martin: Not much work to hide the indexing code behind a Perl module front end, but there may be implications to exterminating global variables that run right through the code.
Dan: So... is the consensus that we would like a standard API in ROADS for reading and writing data to permanent storage, and that this should abstract away from details like whether we're using a relational or textfile or OO databases behind the scenes...
martin: That's my reading.
Dan: We could be very lazy and just have a storeThis(String IAFAtextfile) function that throws a record at someone else's code.
martin: :-))
Bob: yes, and it's up to the roads hacker to implement the appropriate details for the particular mechanism of long term storage, for their particular DB system
Dan: Or we could try for a more abstract representation of the data to be stored. We all did a fair bit of work on this earlier in the spring, and as I recall ended up with a representation of the data using triples...?
martin: At the moment we have a very simple internal representation of a ROADS template when read in using readtemplate()
martin: It returns an associative array with each of the template's attributes as an element, and that attribute's value as the element's value
Dan: Yep - it flattens all the structure down to a simple list of attributes (including -v1 -v2 suffixes etc)
Dan: Since the attribute names themselves also have composite structure (Author-Email; Admin-Email etc) we're passing off a lot of the work onto the backend implementor.
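
Editor's note: to make the "flat associative array" point concrete, here is a rough Python sketch (not the Perl readtemplate() itself, whose details are not reproduced here). The sample template text is invented, and `read_template` and `split_variant` are hypothetical helper names. The sketch assumes one "Attribute: value" pair per line and the -v1/-v2 suffix convention Dan mentions.

```python
import re


def read_template(text):
    """Parse 'Attribute: value' lines into one flat dict."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        record[name.strip()] = value.strip()
    return record


def split_variant(name):
    """Split 'Author-Email-v2' into ('Author-Email', 2); None if no suffix."""
    match = re.fullmatch(r"(.+)-v(\d+)", name)
    if match:
        return match.group(1), int(match.group(2))
    return name, None


# Invented sample record, for illustration only.
SAMPLE = """\
Template-Type: DOCUMENT
Title: An example record
Author-Name-v1: A. Author
Author-Email-v1: a.author@example.org
Author-Name-v2: B. Author
Keywords: metadata, gateways
"""

record = read_template(SAMPLE)
# A backend that wants one row per author has to rediscover the grouping
# from the key names, which is the extra work Dan is pointing at.
authors = sorted({split_variant(k)[1] for k in record if k.startswith("Author-")})
```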
Bob: sorry, still thinks it's not too difficult
jasper: What would the advantage be of having a more complex internal representation of a template?
Bob: KISS
jasper: ?
Dan: We've already got the complexity; the whois++ schema gives us that. It's whether we make the ROADS internal representation reflect that complexity that is at stake here.
jasper: When we looked at the ROADS object idea before I got the impression that it would be too much to get into v3?
Dan: it's all gone quiet... :-)
martin: I'd like to suggest that the current approach actually makes it quite easy for us to do quite complex things - just by taking advantage of Perl's associative array mechanism. Having said that, it might be worth us hiding the actual internal data structure in a Perl object, just in case we want to change it in the future.
martin: I guess the question boils down to... what do you want to do that readtemplate() is getting in the way of ?
martin: :-)
Dan: I've no problem with readtemplate(). It's the idea of implementing writetemplate() for relational databases with a variety of schemas that gives me the worries...
Dan: ROADS templates have lots of interesting hidden structures, eg. list-valued attributes (Keywords; subject categories); hierarchical structure (clusters) and so on.
martin: ... and you have to have semantic knowledge of the (ROADS) schema in use if you're to do something sensible with them at the back end ?
Dan: Exactly.
Dan: I suspect it could be such a headache to implement writetemplate() properly that we could find nobody using it.
martin: In ROADS-speak you can see whether an element is repeated or a cluster included by looking in config/outlines/*
martin: Lists of things (e.g. keywords) are another matter!
martin: I think we've moved on to schema normalization and SQL backends ? :-)
Bob: wait wait, am I missing plot here. It's down to roads to supply a suitable API, and a module that'll interface to a basic index/storage mechanism, it's down to the roads user to implement the backend for a particular system
Dan: For some definition of "suitable API". Would it be suitable for the receiving code to just be thrown an IAFA/WHOIS++ template?
martin: We could say a Metadata::Template object ? :-)
martin: With a public method to retrieve a list of attributes
Dan: Sounds better already!
martin: And some magic to let you get the value for each element.
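
Editor's note: a minimal sketch of the object idea, in Python for illustration only. Metadata::Template did not exist at the time of the meeting, so the class below is hypothetical: the method name `attributes` and the use of item access as the "magic" are assumptions, chosen to show how the internal data structure could be hidden from callers.

```python
class Template:
    """Wraps the flat attribute dict so callers never touch it directly."""

    def __init__(self, attributes):
        # Private copy; the internal layout could change later without
        # affecting code that only uses the public methods.
        self._attributes = dict(attributes)

    def attributes(self):
        # Public method: the attribute names, in a stable order.
        return sorted(self._attributes)

    def __getitem__(self, name):
        # The "magic": template["Title"] looks like plain dict access.
        return self._attributes[name]


template = Template({"Title": "An example record", "Keywords": "metadata, gateways"})
names = template.attributes()
title = template["Title"]
```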
Dan: The public method to retrieve list of attributes is getting more interesting. We ought ideally to let the config/outlines schema info be inspected...
martin: You mean in a programmatic sense ? So that a (say) Oracle backend could find out what its host ROADS installation was using for its (ROADS) schema ?
Dan: Or is that getting a bit sci-fi?
martin: "ROADS - it's Rocket Science" :-)
Dan: I've not mentioned CORBA-ROADS yet ;-)
martin: now now1
martin: s/1/!/
martin: I think the schema issue is problematic at both ends. Consider...
martin: 1) you install ROADS and create some template types and new attributes of your own
martin: 2) you have an existing SQL database which you want to interface to the ROADS WHOIS++ server
martin: If we're going to try and cater for both cases, we need a "schema normalization" module
Dan: RocketScience.pm?
martin: I'm sure that one's been taken!
martin: DoWhatIMeantNotWhatISaid.pm
martin: :-)
Dan: And we want to be able to store records by sending them to some storage function which will Do The Right Thing, by having some knowledge of how our schema maps onto its favourite way of storing information.
Bob: It's not outrageous to attempt to automatically normalise a set of roads templates, and create a fresh set of table defns containing any new attributes.
martin: This is essentially what Digger (Bunyip WHOIS++ server) used to do - though they dropped the SQL support because it was too slow and unreliable...
martin: :-)
martin: From a ROADS developer's point of view, the main issue for me is whether I should be putting some infrastructure in the code base to help people who are trying to do 1) or 2) above. Assume that we already have an API which they can plug their own code into.
martin: e.g. should there be a ROADS module which lets you read in the current (ROADS) schema ?
martin: From the silence I assume everyone's feverishly typing... or dozed off!
Dan: Was this the topic that was going to be addressed in ROADSv3 but moved back (forward?) into v4?
Dan: I'd (for Biz/ed and as a ROADS person) like to see this addressed. Relational databases aren't going to go away, but ROADS offers a lot of value added stuff which could work very well alongside out-of-the-box database packages.
martin: Not schema issues particularly - we were going to do a no-holds-barred rewrite for version 3 (think Microsoft with Windows here :-), but were persuaded to hold off on this until version 4. If we're to properly support third party code, we do need to do some of this work in version 3 - IMHO.
Bob: I'd like to see an emphasis on getting a nice api I can use, even if I have to keep an eye on mismatches between templates and other representation
Dan: Yep. It would be great to get simple support for writetemplate() facility, as bob says, without waiting for the CORBA/RDF rewrite ;-)
martin: What would you like ?
Dan: Me?
martin: Yes, you.
martin: :-)
martin: I know - TheWorld.pm !
Dan: I'd like to start simple with all template writing being bottlenecked through one bit of code, so ANR people could - in principle - replace it with their own way of storing stuff.
martin: OK, let's call that writetemplate()
Bob: and reading :-)
Dan: Even if this was done just by passing it the representation used by readtemplate().
Dan: As a second item, I'd like ROADS exec to let us look again at the schemas and data model stuff, so we can have a more sensible representation of the records that reflects their internal structure.
martin: Well, we happen to have an Exec. committee meeting on Monday... :-)
Dan: without this, I would be surprised if many people implemented alternative backend stores for ROADS.
Dan: Oh, forgot to witter on about RDF. Consider it done ;-)
martin: and XML :-)
Dan: Nah. XML's the new ASCII. RDF's the new Java...
martin: Going back to the method formerly known as writetemplate()...
martin: Is it acceptable to be reading the ROADS schema out of config/outlines/* or should we be providing an API for this too ? NB: this is not the same thing as the SQL (or whatever!) schema.
Dan: Do you mean just dish up the textfile?
Bob: I'm not sure that you need to supply the schema associated with a template.
Dan: Just the record-type would be enough. People will hardcode this stuff anyway.
martin: I'm getting the impression that we don't need to write any special code at the ROADS end to make the schema info in config/outlines/* available to hackers. What about the back end - is it any of our business what the ROADS attributes map onto ?!
Bob: no
Dan: No. Although tools / a framework for doing the mapping might be a good direction for ROADS to grow in.
jasper: Sorry. Got caught by the phone. Can we / have we talked about the callback stuff yet?
martin: Not really!
martin: Before we go onto that...
jasper: Shall we? Or tomorrow?
martin: I'd like to suggest we discuss it, and then reconvene tomorrow to talk about the actual technical nitty gritty - that should give me enough time to throw something together code-wise
martin: Ambitious, but it might just work... !
Dan: Martin - you started to talk earlier about a Harvest model for searchable things...
martin: Yep!
Dan: Would this be something the CGI-based search client would use, instead of assuming that everything in the world was a W++ server?
Dan: How would you see this idea fitting in with ROADS-as-we-know-it?
martin: I was thinking about the bolting-on of backend databases to the WHOIS++ server - Harvest defines an API of sorts for doing this.
Dan: Ah, right. Making it easier to write backends without having to know the WHOIS++ formalities.
martin: Well...
Dan: This would be coming up with a simple (ie. not Z39.50) API for searchable resources. Do we have schema traumas again?
martin: No, it's OK...
Sue: bob says you can't talk about this while he's not listening :-)
martin: The glue code would still need to do something with a WHOIS++ query (or a hypothetical future ROADS "query object") and return WHOIS++ format records.
Dan: Not ROADS::Metadata record objects?
martin: We already have this with WGI, but this is geared up to launching external programs. It would be useful if we could run a Perl module which might or might not launch an external program.
Dan: Martin - yep, doesn't sound like much of a bigger win over WGI unless it abstracts away from WHOIS++ a bit.
martin: The actual module/method to run could be implemented as a callback - cue Jasper :-)
jasper: Was that a clever link to moving on to discussing callbacks?
martin: :-))
martin: Or would you prefer to do this tomorrow ?
jasper: I'm happy to start it now. Dan? Anyone else?
Dan: How about we talk about implementing a Z39.50 API in ROADS?
Dan: ;-)
martin: OK - callbacks.
Dan: Yes...?
jasper: Would it be better to implement callbacks as modules rather than separate scripts?
martin: I'd been thinking about them as modules - with something in (say) ROADS.pm which let you override the default ROADS behaviour
martin: e.g. call Jasper::MyTemplateThingy::writetemplate instead of ROADS::Template::writetemplate
martin: I'm not even sure if that's possible, though!
jasper: In terms of where they should go...
martin: Thinking in terms of Perl references, typeglobs, ...
jasper: when templates are inserted, edited, deleted...
jasper: when templates go 'stale'...
martin: We could have callbacks both at the "method" level (e.g. replace the default bit of code to insert a new record) and also within the method itself
martin: e.g. "before inserting a new record", "after inserting a new record"
martin: The difficulty is knowing when to stop! cf. X Windows :-)
jasper: Yes. As a suggestion, could we have, say, half a dozen in the next beta to play around with?
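
Editor's note: the "before inserting" and "after inserting" callbacks Martin describes could be sketched as follows. This is Python for illustration (ROADS is Perl, where code references or typeglobs would be the natural mechanism). The class `TemplateStore` and the hook list names `before_insert` and `after_insert` are hypothetical. One assumption worth flagging: here a "before" hook may return a modified record, which is a design choice the meeting did not discuss.

```python
class TemplateStore:
    def __init__(self):
        self.records = {}
        self.before_insert = []  # hooks called with (handle, record)
        self.after_insert = []

    def insert(self, handle, record):
        for hook in self.before_insert:
            # A "before" hook may return a modified record,
            # or None to leave the record as it is.
            record = hook(handle, record) or record
        self.records[handle] = record
        for hook in self.after_insert:
            hook(handle, record)


log = []
store = TemplateStore()
# Tidy the title before the record is stored.
store.before_insert.append(lambda h, r: {**r, "Title": r["Title"].strip()})
# Note the addition afterwards, e.g. to trigger a reindex or send a mail.
store.after_insert.append(lambda h, r: log.append(f"added {h}"))
store.insert("rec1", {"Title": "  Padded title  "})
```

Hooks for "edited", "deleted" and "stale" events, as Jasper lists them, would follow the same pattern with further hook lists.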

We agreed at this point to continue the discussion in the session scheduled for the afternoon of Thursday, May 7th


The End (of part I)