Transcript of ROADS API requirements meeting - May 7th 1998

Date: Thursday, May 7th 1998
Location: irc://maple.ilrt.bris.ac.uk:8282/#roads
Convener: Jon Knight
Editor of this document: Martin Hamilton

Attendees:

IRC handle  In Real Life     Affiliation  Institution              Country
ecdb        Dan Brickley     ILRT/ROADS   Bristol University       UK
martin      Martin Hamilton  ROADS        Loughborough University  UK
EEVL        Geir Granum      EEVL         Heriot-Watt University   UK
jon         Jon Knight       ROADS        Loughborough University  UK
John_K      John Kirriemuir  ILRT/ROADS   Bristol University       UK
Bob         Bob Parkinson    OMNI         Nottingham University    UK
Una         Una O'Sullivan   ROUTES       Open University          UK
jasper      Jasper Tredgold  ILRT/ROADS   Bristol University       UK

1. Introductions.

jon bangs loudly with a gavel
jon: I'd like to call this meeting to some sort of order...
jon: Looking at the session list, I'd guess that we all know who we all are and so there's no need for intros today... is that OK?
John_k: Yes
Yep!
Bob: yes
ecdb: huh

2. Approval of the first meeting's conclusions.

jon: I'm happy with point 1) (no major new features).. Any no's out there?
ecdb: Does 'creating a record' count as a major new item?
martin: We do do it already :-)
ecdb: via API?
jon: I don't think so. It's more of a repackaging of what's already there. I'd taken "no major new features" to mean no new functionality AND an API for it.
ecdb: OK.
jon: Am I right in thinking that (seeing as I was in Brum yesterday!)
jon: Is everyone happy that there are no major new features requiring special APIs?
martin: Yep!
Bob: yes
jasper: uh-huh
jon: Right, then the second conclusion from yesterday was that some of the external record creation/indexing/removal code be packaged up as Perl modules.
jon: Is this OK by everyone?
martin: Yep!
Bob: yes
ecdb: Yep.
ecdb: 'some of the'? Not all of it?
martin: The details are below...
jon: Just the three things in the web page - creation/indexing/removal.
ecdb: Yep, so we're saying that all creation/indexing/removal will go through the modules?
jon: I guess so. Seems a bit pointless otherwise.
martin: :-)
ecdb: (that's why I was puzzled by 'some of the ...' above.)
martin: Don't get hung up on the language - it was just written in a hurry
Bob: just to be pedantic, indexing means taking a template and getting a data structure that is then fed, by whatever mechanism, into the persistent storage system.
ecdb: No problem!
jon: OK "all of" - I was cribbing the line "Some existing functionality in external Perl programs <blah> <blah>"
ecdb: While we're on the topic of pedantry... could we move to saying "record" consistently instead of sometimes "record" and sometimes "template"?
jon: Bob - we can discuss the details of what indexing means later in the meeting.
Bob: ok
jon: Dan - this is a historical hangup but yep, sounds reasonable. Maybe we should use "template" for the unfilled-in outlines and "record" for the filled-in version
jon: Martin and I are bound to slip in meetings/IRC/phone calls though as we're so used to using them interchangeably
martin: I've replaced the couple of references in the future tense to ROADS templates on the agenda page with ROADS objects
ecdb: Will this be a pain for you in the code? eg. readtemplate()?
martin: s/readtemplate/readobject/ :-)
ecdb: :-)
jon: OK, so I think we agree that all record creation, indexing (whatever that turns out to mean) and removal will be packaged up as Perl modules.
martin: Yep!
jon: Onto the last conclusion - that we'll basically clean up the code to get rid of global variables where possible and
jon: reach an agreement on the standard parameters and returns from the API calls/
Bob: umm, i went yesterday just as dan and martin were moving into the areas of support for wppd.
jon: s/\//\./ :-) for the programmers out there
martin: We didn't really move on much further - the agenda is mostly me trying to rationalise how we'd actually do what we discussed on IRC yesterday
martin: do == write code to do :-)
ecdb: As for getting rid of global variables... will that help with getting ROADS running under fastCGI type setups?
martin: Yep!
ecdb: kewl
jon: The globals are mostly due to my sloppy coding/the fact that the ROADS tools have often evolved (mutated?) beyond their original aims
jon: So does that sound OK to people?
martin: Yep!
jasper: sounds good
jon: Right, that seems to move us onto section 3 in the agenda...

3. The API itself.

jon: OK, the first suggestion is that we have an open database routine
Bob: just a sec, what database, template or index?
jon: Not really too much use with the current ROADS backend but as Martin says, it could well be useful for forming connections to SQL databases
martin: Bob - It was your suggestion :-)
Bob: my ideas were all oriented to wppd/sql interaction for faster data retrieval
Bob: most of the talk yesterday seemed focused on the template side of things, and missed the wppd side
martin: See the "Perl visualization" section later on in the agenda for how I'm picturing this getting used in practice
jon: The fact that you can currently see the ROADS index as separate files from the actual source files is an "implementation detail"...
jon: ...it's all still one database surely?
jon: To think in SQL for a moment....
Bob: if you say so... not convinced
Bob: i think of them as separate
martin: The rationale for "opening the database" specifically is that with session oriented backends you need to get a handle at the start of the session, and will most likely need to pass it to other functions which interact with the database backend.
jon: our library OPAC all exists inside a single Sybase SQL database. That's the MARC data (equivalent of our records) and the indexes (equivalent of our indexes in ROADS/guts)
martin: s/need/may need/ :-)
jon: Does that make sense to people (or am I talking silly?)
martin: I should mention in passing, in case it's not clear, that the things after the dashes "-" in the bullet points for section 3 are what we do at the moment (where appropriate)
ecdb: jon - makes sense to me
Bob: I just think of them as separate as I can envisage having a go at the code to handle templates at a different time to the code for the index
Bob: still, looks like i'm on my own here
ecdb: yesterday, there seemed to be some talk about splitting out the indexing from the main datastore, eg. keeping records in ascii files but indexing using some relational table somehow
jon: That's still an implementation detail. The fact is that the records (ie filled in templates) + the indexing (however it is done) is what I'd call a database for this discussion
ecdb: OK.
jon: Of course, you might all disagree with that "fact"
martin: No :-)
Bob: 'for this discussion' then
jon: OK then, in that case does an "open the database" API routine sound reasonable (it does to me - I tended to think of SybPerl's dbopen routines when I read that)
martin: Yep!
jasper: yes
Bob: yes
jon: Rightio, I think we can take it as read that the close database is OK and move onto the get a list of record handles in the database
martin: We didn't explicitly discuss this yesterday - but it's something that a lot of the ROADS tools need
Bob: seems to be a useful function
jon: This is currently done using ROADS::ReadTemplate::readalltemps() - obviously if you plug in Brand X database and use that to store the records you'll need to have some way of retrieving all the handles
ecdb: Oh, i've been using `ls source/` ;-)
martin: tsk!
jon: sort of like (in SQL): select distinct(HANDLES) from RECORD
ecdb: Yes, it would be useful.
Bob: lucky you, I get arg list too long, ...
jasper: me too
Bob: something's gone wrong if you want distinct(handles), but that's the idea
martin: cut -f1 -d' ' guts/alltemps
jon: OK, seeing as we're talking about inputs and outputs later on, can we say that an API to get all the handles is a good idea and move on to...
jon: ...reading a record
jon: That's currently done with ROADS::ReadTemplate::readtemplate()
jon: I guess in the new naming scheme that's going to need to be something like:
ecdb: Soon to be ROADS::ReadRecord::readrecord() ?
martin: :-)
jon: What Dan said... :-)
martin: Or ROADS::API::Record::read()
jon: Or better yet, what Martin said. :-) :-)
Bob: ok so far
jon: Which I guess means that we'll also want a ROADS::API::Record::write()
jon: right?
martin: Yep!
Bob: yup
jasper: yip
ecdb: Are we voting on this? (ie. yep)
John_k: Yes
jon: Now whether this writes out to an ASCII file as we currently do or sticks it into some un-indexed SQL table or BLOB or whatever is again an implementation detail in my mind. Am I right?
martin: Yep!
Bob: yes, a mere trifle
John_k: That is an ecumenical matter
ecdb: Are we talking about a "record" object with a write() method? or a write function that's passed a record?
Bob: now that is a detail for 'ron
ecdb: ok
jon: We can discuss that in the input/output part of the meeting later.
jon: OK then, next - an API call to index record. Something like ROADS::API::Record::index(). I guess this is time to look at your definition of indexing again Bob.
jon: To my mind, indexing is the operation of generating some (implementation defined) data structures that allow you to rapidly locate records matching a user's query
ecdb: Sounds ok to me.
jon: Now, I'd say that it would be "persistent" in that the index would remain intact after the Perl code that created it has exited....
Bob: this comes down to my confusion if the index and the templates are 'the same database' or not
jon: ... but that it isn't persistent in the "be around forever" sort of persistence.
martin: If you only use the ROADS::API code to read/write/index/... your records, it doesn't matter whether they're the same database
martin: Bear in mind that you might - or might not - want to store the objects in the same database you use for indexing and searching
jon: OK, in which case I take it that a ROADS::API::Record::index() is OK by everyone?
martin: yep!
Bob: yes, i'm just trying to reorient my thoughts
Bob: but yes, need an index function or class or whatever
ecdb: yep. I think. having the index implemented in one system and the data stored / cached elsewhere should be OK within this framework. If anyone wants to do that.
martin: Think Glimpse or freeWAIS
jon: Dan - it's one of those cool "implementation details" that as API designers we don't need to worry too much about (as long as the API doesn't go out of its way to make it impossible to do!)
jon: Right, assuming that the need for an indexing call is OK, I think we can move onto the next one: delete a record.
ecdb: ROADS::API::KillEmAll()
jon: Now this does make me think - remove the record AND the index or just the index data or...?
ecdb: Deleting gets my vote.
jon: So I'd like to propose this:
jon: ROADS::API::Record::unindex() and...
jon: ROADS::API::Record::delete()
martin: I'll go for that!
jon: The late would do the former and delete the actual data
jon: s/late/later/
jasper: yep
ecdb: Yep
jon: Most people would probably only use ROADS::API::Record::delete() most of the time, but the other is there in case it's needed
Bob: i'm still thinking this
Bob: i'm not sure
Bob: I think that the operation delete template has two distinct elements
jasper: Hmm.. shouldn't we keep the two methods separate?
Bob: the template element and the index element, should the functions operating on different bits of the same ( :-) ) database
jon: Jasper - what you mean have ROADS::API::Record::delete() NOT unindex the record? I'm easy on that one.
Bob: be coded within the same function
jon: Bob - I don't think I quite follow you. Could you explain some more please?
Bob: deletefromindex(handle) and deletefromtemplate(handle)
ecdb: I'd prefer both functions to exist, and have a convenience function that did both
Bob: i don't think the delete a template function should have an implicit call to delete the keywords from the index
jon: Ah right... well that's fine by me - any objections?
Bob: is the computer science jargon 'cohesion'
martin: :-)
martin: Can we wrap this up with - you might want to store a template independently of indexing it, e.g. if it hadn't been approved, and just been submitted by a trusted info provider ?
martin: Ditto removal from storage, removal from index
Bob: nice example
jon: Hang on a mo though - deleting a record without un-indexing it can break things surely?
martin: where deleting == deleting from "storage"
Bob: sure, but if i'm writing some code to replace your functions, that's my problem?
jon: whatever "storage" turns out to be in a particular implementation
Bob: as long as your base distribution does the right thing
jon: Fair enough, but it does seem a little dangerous to give the API built in semantics that can nuke a database's consistency.
jon: And what if someone else later uses your code without realising that the record removal doesn't un-index?
jon: Still, if people are happy with that danger I don't mind pressing on...
Bob: they'll have some fun then won't they :-)
martin: Our base distribution will call deindex/unindex/... what have you when deleting a record :-)
jon: Other opinions on this before we move on?
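
NB: the storage/index split argued over above can be illustrated with a small Perl sketch. It is only an illustration under stated assumptions, not ROADS code: the package name Sketch::Store and the method names write_record, index_record, unindex_record, delete_record, remove_record and search are all hypothetical, and a plain in-memory hash stands in for whatever backend is actually used. It shows Jon's worry (deleting from storage alone leaves a dangling index entry) next to the convenience call Dan asked for, which does both.

```perl
use strict;
use warnings;

package Sketch::Store;    # hypothetical name, not part of ROADS

# Records live in {storage}; {index} maps words to the handles using them.
sub new {
    my ($class) = @_;
    return bless { storage => {}, index => {} }, $class;
}

# Store a record (a hash ref of attribute => value) under a handle.
sub write_record {
    my ($self, $handle, $record) = @_;
    $self->{storage}{$handle} = { %$record };
    return 1;
}

# Index a stored record: map each lower-cased word to its handle.
sub index_record {
    my ($self, $handle) = @_;
    my $record = $self->{storage}{$handle} or return 0;
    for my $value (values %$record) {
        for my $word (grep { length } split /\W+/, lc $value) {
            $self->{index}{$word}{$handle} = 1;
        }
    }
    return 1;
}

# Remove a handle from the index only; the stored record is untouched.
sub unindex_record {
    my ($self, $handle) = @_;
    for my $word (keys %{ $self->{index} }) {
        delete $self->{index}{$word}{$handle};
        delete $self->{index}{$word} unless %{ $self->{index}{$word} };
    }
    return 1;
}

# Remove a record from storage only; the index is untouched.
sub delete_record {
    my ($self, $handle) = @_;
    my $existed = exists $self->{storage}{$handle};
    delete $self->{storage}{$handle};
    return $existed ? 1 : 0;
}

# Convenience call that keeps storage and index consistent.
sub remove_record {
    my ($self, $handle) = @_;
    $self->unindex_record($handle);
    return $self->delete_record($handle);
}

# Return the sorted handles whose indexed text contains $word.
sub search {
    my ($self, $word) = @_;
    return sort keys %{ $self->{index}{ lc $word } || {} };
}

package main;

my $db = Sketch::Store->new;
$db->write_record('XYZ1', { Title => 'Bridge engineering' });
$db->index_record('XYZ1');

# Deleting from storage alone leaves a dangling index entry ...
$db->delete_record('XYZ1');
my @dangling = $db->search('bridge');

# ... whereas the convenience call removes the record and its index terms.
$db->write_record('XYZ2', { Title => 'Bridge design' });
$db->index_record('XYZ2');
$db->remove_record('XYZ2');
my @after = $db->search('design');
```

Keeping the two primitives separate, with a third call layered on top, is one way to satisfy both positions: a replacement backend can override either primitive, while ordinary callers only ever use the combined one.
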
jon: OK, let's move on to performing a search. I think we can safely say that most people will want this, right?
martin: We don't have to thrash out absolutely everything today - but it would be good to get through as much as possible while we're focussed on what we want
martin: PS Searching is good for me!
ecdb: Searching's good. Are we talking about an API for pulling handles out of the database or distributed whois++y searching. The former I'd have thought.
martin: ROADS::Index::index_search()
martin: i.e. searching a local "database"
Bob: is this a detail, or, will the search function provide a protocol neutral result stream, and will the roads SW format this to a wppd format
ecdb: Yep. Is the proposal that we treat the backend as something that understands whois++ queries? ie. just send in w++ formatted query
ecdb: (+ what Bob said)
martin: that's an input/output issue - can we deal with it later ?!
Bob: ok then
ecdb: OK. Last thing on searching: what about referrals? Is this something we consider all backends as potentially capable of doing?
Bob: not mine
ecdb: or is the forward knowledge stuff separate?
martin: a) that's outputs, and b) see sample Perl at end of agenda !
Bob: but the wppd on front may
ecdb: OK. Section 4 then...
jon: Up to you I would guess, but the API should allow it. As Martin said we can discuss that later.

4. Inputs.

NB: may want to turn the internal representation of a WHOIS++ record from a simple associative array into a Perl object. Needs to be investigated.

jon: OK, to speed things up in Section 4, can I just put out each headline and then you all yell out "OK" if it's fine or discuss it otherwise?
martin: OK
Bob: OK
jasper: ok
jon: open database
ecdb: OK
martin: Yep, er OK
ecdb: Sounds handy
John_k: ok
jasper: ok
Bob: local or remote databases important here?
Bob: as this is inputs, thinking about arguments
Bob: or is this trying to be too clever (or too silly) by half
ecdb: guess it could be remote. Could you use URIs to identify (eg whois++://omni:8327/ )
jon: I was going to say local but Martin is just convincing me that it should be both. In which case I guess we need to have a structure of some sort to pass hostname, port, protocol, etc
ecdb: How about just identify the database with URI, and let ROADS throw an exception if it can't talk to what you've asked for.
jon: URI might be a good idea in some cases but what about SQL databases? Sybase://liba.lut.ac.uk/DB=prod_talis/ ??
Bob: nick odbc style ideas
Bob: as jon's example
martin: The intention when writing this document was that it should be for opening a back end database, and that you should use ROADS::WPPC::wppc if you wanted to talk WHOIS++ to someone. It might, however, be useful to be able to wrap up other things in a single API... Can we concentrate on the backend database for now, though ?
martin: Bearing in mind that ideally ROADS::WPPC will become Net::WHOIS++ or something like that eventually....
ecdb: OK. So there's no issue of identifying which database as there's only one per installation?
jon: Bearing in mind of course that the world being what it is, a "backend" database might not actually be on the machine that the Perl code is running on.
ecdb: yep
martin: I'd like us to be able to ship support for multiple contributed backends, e.g. Glimpse, freeWAIS, ...
martin: And for people to be able to choose in say ROADS.pm which one to use
ecdb: SWISH-E is half finished - need to do something about fielded searching...
martin: SQL type people might need to pass user names and passwords too
ecdb: How about... each ROADS installation has a set of convenience names for backends it knows about
martin: My reading is that we probably couldn't reach closure on arguments today, but we can protect ourselves with the Database => $database type approach (see Perl fragments below)
martin: below == below in the agenda :-)
ecdb: and per-backend configuration files appropriate to the type of backend we're dealing with.
jon: SQL people will also need to pass database names as a single SQL server typically hosts multiple independent databases.
ecdb: I agree.
jon: Sounds nice and flexible to me. I guess the proof of the pudding is in the eating though... :-)
ecdb: So... give ROADS a notion of named backends "GRAPEVINESQLBACKEND" and leave it to the implementors to handle the various other parameters associated with that backend type and that particular database within the backend.
martin: Something like that!
Bob: yep,
ecdb: This is what we've done with grapevine more or less. An ad hoc config file mapping fieldnames etc.
ecdb: Shall we move on?
Bob: yes
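
NB: the "named backends" idea, combined with the Database => $database style of argument passing mentioned above, might look roughly like the following Perl sketch. Everything in it is an assumption made for illustration: the open_database function, the %BACKENDS table and its keys, and the server and database values are made up, and only the name GRAPEVINESQLBACKEND comes from the discussion. The point is simply that callers name a backend and pass any extras (user name, password) as key/value pairs, so new parameters can be added later without changing the call signature.

```perl
use strict;
use warnings;

# Hypothetical per-installation table of named backends. In practice this
# would presumably be read from per-backend configuration files.
my %BACKENDS = (
    DEFAULT             => { Type => 'flatfile', Source => 'source' },
    GRAPEVINESQLBACKEND => {
        Type   => 'sql',
        Server => 'sql.example.org',    # made-up value
        DBName => 'grapevine',          # made-up value
    },
);

# Look up a named backend and merge in any caller-supplied parameters,
# for example a user name and password for SQL backends. Dies if the
# name is unknown, along the lines of the "throw an exception" suggestion.
sub open_database {
    my (%args) = @_;
    my $name   = delete($args{Database}) || 'DEFAULT';
    my $config = $BACKENDS{$name}
        or die "Unknown backend '$name'\n";
    my %handle = (Name => $name, %$config, %args);
    return \%handle;
}

my $db = open_database(Database => 'GRAPEVINESQLBACKEND', User => 'roads');

# An unknown name raises an exception, which eval traps here.
my $ok    = eval { open_database(Database => 'NOSUCHBACKEND'); 1 };
my $error = $@;
```
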
jon: OK, let's move on. close database inputs.
jasper: ok
martin: OK
jon: OK
Bob: any assumption for SQL users about commit and stuff
jon: I assume a $database->close(); will commit any unterminated transactions if it can or return a failure code.
ecdb: up to them I'd guess. if their notion of writing a template involves several stages, they can wrap it in SQLese to make it an atomic operation.
Bob: i'm just flagging thoughts here
Bob: next point?
jon: read record
ecdb: I don't like the filename bit.
ecdb: Is there a need for it?
martin: Seemed like a good idea at the time :-)
martin: This is what ROADS::ReadTemplate::readtemplate() currently does, though we might not want to advertise it
Bob: as this is an api, handle is enuf, then the local implementor can do it to read a file if wanted
ecdb: Can we have a quick vote? Does anyone feel that we need to say it's a filename? (of course it could be, behind the scenes)
Bob: no
jasper: no
jon: I'd go along with Bob - the filename was a hack that made life easier for us but we can work round it.
ecdb: no
jon: (that's a no folks)
martin: In the current implementation the filename is a separate argument which you can supply. If you don't, the template with the specified handle will have its handle mapped to a filename in guts/alltemps.
martin: That's a long winded way of agreeing with everyone :-)
ecdb: OK. What about writerecord... could you sketch how variants and clusters work? and how attribute ordering happens?
jon: OK then, onto: write record
martin: Dan - it's up to the backend implementor to decide how to store the object
ecdb: Yep. So there's no ordering of attributes other than what you know separately from knowing the outline files
jasper: what happens if the hash doesn't contain elements that are already in the record?
jon: In our case I'd say that the basic code from mktemp.pl would just use the config/outlines to determine the attribute ordering (not that it really matters)
Bob: just a thought, we have not thought about the template structure, martin has a possible assoc.
Bob: ie the representation within the api
martin: I've got two socks :-)
Bob: same color?
martin: Actually, yes!
jon: Colour even.
martin: Can we discuss object structure "later" ?
Bob: ok
Bob: non-linear or what
jon: OK then folks - are we OK to go onto: index record?
martin: One thing - Jasper - I'd say that an update of an object *should* overwrite an existing one, so any existing bits which aren't in the updated version *should* be lost
jasper: ok
ecdb: Seems simple enough to remember
ecdb: What about locking?
jon: Which is not how my CHANGE operation in the WHOIS++ server for v3 works I might add... :-)
martin: Locking is up to the backend implementor ?
Bob: yes locking
Bob: might make it optimum
ecdb: How will the backend know when a record is being 'checked out' for editing?
martin: That could be part of the ROADS object structure ?
jon: The reason being (for CHANGE) that not all of a template might be returned to a template editor for editing.
Bob: does it matter? delete record, delete terms
Bob: edit in browser, insert template, insert terms
jon: Dan - it doesn't. However the CHANGE protocol allows you to dump the templates into a pending directory so that a cron/web based indexer can pick them up later
jon: But enough of this aside on CHANGE... I shouldn't have mentioned it really. :-)
ecdb: OK. Guess it won't hurt in practice.
jon: Right, is everyone comfy with the inputs to index records?
martin: OK
jasper: ok
ecdb: ok
Bob: yes
jon: Ok then: delete record
jon: (I guess this is actually two now - unindex and delete)
jasper: ok
ecdb: handle is fine
Bob: yes
jon: What about a * to unindex/delete all records. Martin doesn't like it but it seems to me that some people might like it. Opinions?
Bob: no
ecdb: dangerously appealing. no.
martin: you might just type it by accident somewhere... :-)
Bob: perform search then
Bob: no for whois query format
jon: OK, so I think that we can safely say no to a * in un-index/delete and use handles. OK with all?
martin: OK
Bob: ok
jasper: ok
jon: Right: perform search
ecdb: agree with Bob. No to whois++ format. We should start to lose the assumption that everything/one will end up using WHOIS++ :-(
Bob: a bit more of an abstract format than wppd
jon: Reverse Polish Notation? :-)
ecdb: Yep. Simple though...
martin: Sick!
ecdb: (yep was to bob not to poland)
ecdb: We should look at whether we've any requirements of a query interface that W++ didn't meet.
ecdb: eg. "please expand this query with your thesaurus"
martin: you had your opportunity :-)
ecdb: or ability to search for records where Subject-clas-fication-scheme is DDC and value is 321
ecdb: I'm not saying implement this. Just that if we move away from W++ it might look feasible to think about this more
martin: We've got a query language already - if we're going to replace it, we need to replace it with something. What ?
jon: OK, so if not WHOIS++ or RPN parse trees, what then?
ecdb: Well it could be the ROADS query language which just happens to look a lot like Whois++.
martin: :-))
Bob: actually, an rpn tree structure may not be such a bad representation
martin: which bit of WHOIS++ query language don't you like ?
ecdb: I've no good answer to the what-would-be-better. There's SQL3 and OQL to read up on...
martin: I'll get me coat :-)
Bob: easy to parse and turn into any (?) format
jon: SQL is a no-no as it relies on you knowing the E-R model to form sensible queries.
ecdb: I don't like the fact that there are things I wanted to ask of my server that whois++ didn't allow. It knows nothing about the variant/cluster structures we use to organise the data.
jon: And an immediate <CLUNK> to anyone who suggests Z39.50 style ASN.1 :-)
ecdb: :-)
ecdb: There are noises in Z39.50 land about a Z39.50 lite, and possibly aligning that with a future RDF query language...
jon: That's because the variant/cluster feature of IAFA templates was, after we started to use them, considered to be broken.
ecdb: And RDF will probably pinch something that already exists...
Bob: umm, is the search "behind" the wppd, or parallel to it?
ecdb: Oh, you never told us it was broken.
martin: You never asked :-)
jon: I didn't think it was for what we were using it for. But the more you try to push it to do....
ecdb: Yep. It's not great for complex data. Multilingual with multiple URLs and 3 classification schemes, two authors etc...
Bob: that's not what i called the variant/cluster "feature" of IAFA
martin: Hang on, folks...
ecdb: OK.
martin: We're talking about query language here
martin: Just to keep us on terra firma :-)
ecdb: Which is bound up with the logical structure of the thing being queried.
ecdb: no?
martin: The logical structure and how it's written down are two different things
martin: Logically the variants and clusters thing is very nice
ecdb: Agreed.
martin: In practice it's a pain for the reasons we've discussed from time time
martin: ...time to time
ecdb: More like, whois++ doesn't understand the nice logical structures IAFA gives us.
jon: OK, here's a proposal to get us out of this quagmire: we pass in an argument of the search syntax to use as well as the query
Bob: sorry to repeat myself, but I've assumed that the search function would be behind the wppd, is that the current working assumption
martin: Bob - you mean this is what wppd calls when it gets a query ?
Bob: yes
martin: That's my assumption too. Though it could be used behind something else - e.g. LDAP, Z39.50, ...
martin: :-)
ecdb: Yes
jon: Something like $result = new ROADS::API::Result $database->search("WHOIS++", "sex and drugs and rock and roll");
martin: Harvest, ...
Bob: no jon
ecdb: Which brings us to the syntax of the stuff that comes back...
martin: No - that's later
Bob: i think roads should make some effort to parse the query
jon: OK, so if we don't want to have multiple syntaxes, then we need to decide on the syntax to pass into this API call.
Bob: i liked the idea of a tree structure
jon: Though what about $result = new ROADS::API::Result $database->search("ROADSParsed", "(((sex drugs AND) rock AND) roll AND)");
jon: for those that want it?
Bob: i'll be happy with that, perhaps ought to write that chunk of code in lisp, but ....
ecdb: Query languages hurt my head
martin: What do we gain by doing this ?
jon: We take the parsing of whatever we get in from the punter into a separate routine I guess so that the backend code only really needs to know about its particular query language(s)
jon: Different backends might share the same "query normalisation" code though
Bob: roads parses the query to an independent format, and parsing the query is the hard part. the implementor takes the abstract representation and creates SQL or whatever, relatively easy
ecdb: sounds plausible to me
jasper: what about options (case sensitive, etc)?
jon: Well, I don't know about that Bob - I don't really think there's much difference in complexity between the WHOIS++ and the RPN if you're going to have to translate it into SQL. But that's just me.
Bob: agreed that whois is not too bad and could be used, just thinking what I'd _like_ here
jon: I guess that for case sensitivity, etc, you'd actually need something that did quadruples rather than (RPN style) triples:
jon: (sex drugs AND "case;expand;turnpurple")
ecdb: do you want case/expand/etc properties per bit of the query or per query?
martin: and if we're to incorporate support for multiple languages, character sets, and character set encodings...
jon: Per bit of the query would allow local constraints (a bit of the WHOIS++ spec we currently ignore)
martin:  ... and which is being written out of the updated WHOIS++ spec
jon: And someone is *bound* to want it for something in a year or so's time.
ecdb: Updated spec? WHOIS++++? I thought W++ was 'resting' now.
martin: No, draft-ietf-asid-whois*
martin: The main thing is (going back to query languages :-) that we have WHOIS++ already, but if we do something else it means that we have to do lots of work.
martin: Laziness, Hubris, ...
martin: :-)
ecdb: OK, appreciate that. Only concern is that ROADS isn't seen as just a whois++ system.
ecdb: Z39.50 blah blah RDF blah blah LDAP etc. etc.
martin: So, if we do move away from WHOIS++ as a query language, we need to be very clear on why we're putting lots of effort into doing this
ecdb: Yes. Shall we continue this another time?
Bob: yep
jasper: ok
martin: OK
jon: Move the discussion to open-roads?
jon: or roads-hackers more probably?
ecdb: yep
martin: OK
martin: (either :-)
jon: OK, roads-hackers it is then. Onto the gorgeously chunky section 5 of the agenda....
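
NB: for anyone picking the query syntax thread up on roads-hackers, here is a minimal Perl sketch of the "parse once, translate per backend" idea, using Jon's parenthesised postfix example from above. It is an illustration only: the function names parse_rpn and to_infix are hypothetical, the tree layout [ OPERATOR, left, right ] is an assumption, and the per-term options (the "quadruples") are not handled. The infix rendering stands in for whatever a real backend would generate, such as SQL.

```perl
use strict;
use warnings;

# Parse a parenthesised postfix query, e.g. "((a b AND) c OR)", into a
# nested array ref of the form [ OPERATOR, left, right ].
sub parse_rpn {
    my ($query) = @_;
    my @tokens = $query =~ /([()]|[^\s()]+)/g;
    my $tree   = _parse_group(\@tokens);
    die "Trailing tokens in query\n" if @tokens;
    return $tree;
}

# Consume one term or one "(left right OPERATOR)" group from the tokens.
sub _parse_group {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "Unexpected end of query\n" unless defined $token;
    die "Unexpected ')'\n" if $token eq ')';
    return $token unless $token eq '(';    # a bare search term

    my @parts;
    while (@$tokens && $tokens->[0] ne ')') {
        push @parts, _parse_group($tokens);
    }
    die "Missing ')'\n" unless @$tokens;
    shift @$tokens;                        # discard the closing ')'
    die "Expected (left right OPERATOR)\n" unless @parts == 3;

    my ($left, $right, $op) = @parts;
    die "Operator must be a plain token\n" if ref $op;
    return [ uc $op, $left, $right ];
}

# Example backend translation: render the tree as a fully bracketed
# infix expression.
sub to_infix {
    my ($node) = @_;
    return $node unless ref $node;
    my ($op, $left, $right) = @$node;
    return '(' . to_infix($left) . " $op " . to_infix($right) . ')';
}

my $tree  = parse_rpn('(((sex drugs AND) rock AND) roll AND)');
my $infix = to_infix($tree);
```
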

5. Outputs.

jon: Outputs. Again, I'll call them out - OK if they're fine, argue (discuss) otherwise
jon: open database
martin: OK - though see database object stuff below :-)
jasper: ok
Bob: could have a simple boolean, or try to pick up the SQL result code
Bob: (if you use sql)
Bob: pick up -> return
jon: I think we need a stand "Yes it worked"/"No it didn't" with a follow on implementation defined error code/message if needed
jon: s/stand/standard/
martin: I'm thinking that in practice we'll probably want to return an object that encapsulates any private info which is needed by the back end for that database, or "undef" if the open failed
martin: This didn't make it into the agenda, coz I forgot to put it in!
martin: e.g. might need to hold a socket/file descriptor, process ID, Unix domain socket's filename, ...
jon: How about always return an object (no undefs) but have an OK/Not OK flag in it so that you can then look into the object for the error code/message if things go pear shaped
martin: OK!
Bob: yo
jasper: ok
jon: OK, then: close database
Bob: same for close ?
jon: How about more or less the same thing
jon: Right Bob!
jasper: ok
martin: OK
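
NB: the "always return an object with an OK/Not OK flag" convention agreed above for open and close could be sketched in Perl along these lines. This is a guess at the shape, not the agreed interface: the package name Sketch::Database, the methods open_database, close_database, ok, error_code and error_message, and the numeric error codes are all hypothetical.

```perl
use strict;
use warnings;

package Sketch::Database;    # hypothetical name, not part of ROADS

# Always returns an object, never undef. Callers check ->ok and, on
# failure, look at ->error_code and ->error_message. The codes used here
# (1, 2) are arbitrary stand-ins for implementation defined values.
sub open_database {
    my ($class, %args) = @_;
    my $self = bless {
        open => 1, ok => 1, error_code => 0, error_message => '',
    }, $class;
    unless (defined $args{Database} && length $args{Database}) {
        @$self{qw(open ok error_code error_message)}
            = (0, 0, 1, 'No database named');
    }
    return $self;
}

# Closing reports its outcome through the same flag and error fields.
sub close_database {
    my ($self) = @_;
    if ($self->{open}) {
        @$self{qw(open ok error_code error_message)} = (0, 1, 0, '');
    }
    else {
        @$self{qw(ok error_code error_message)}
            = (0, 2, 'Database is not open');
    }
    return $self;
}

sub ok            { return $_[0]{ok} }
sub error_code    { return $_[0]{error_code} }
sub error_message { return $_[0]{error_message} }

package main;

my $good = Sketch::Database->open_database(Database => 'DEFAULT');
my $bad  = Sketch::Database->open_database();

# No undef checks are needed: a failed open still yields an object.
my $why = $bad->ok ? '' : $bad->error_message;

my $closed_ok = $good->close_database->ok;
```

Compared with returning undef on failure, this keeps the error detail attached to the thing that failed, at the cost of callers having to remember to test the flag.
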
jon: OK, straight onto: read record
Bob: it looks like a difference between a totally exposed assoc. type structure or an encapsulated object?
martin: if it's an object, we get to give it methods, and we *could* hide the internal data structure
ecdb: :-)
jon: We've got a hash now and that sort of works. A Perl object would be neater though. I'd go with the latter were it not for the fact that our glorious exec committee didn't like the idea of us tweaking all the code to use objects
Bob: but it's a real pain writing OO code correctly
martin: In Perl ?
Bob: any form of OO, java, perl c++
martin: Perl is very... forgiving :-)
jon: Perl is the foam padded banana seat in the trike of OO programming languages
martin: Suggest people check out the use of the $record object in the sample Perl to get a flavour for how I saw it being used
ecdb: Using other people's Perl objects is a joy. Writing OO perl modules yourself is less fun I think.
Bob: 'cause you can ditch/bypass all OO constraints when you want to
martin: I've been reimplementing ssh in Perl, and it's not at all bad actually
martin: I find myself wanting to turn everything into a Perl object :-)
Bob: should that be on alt.sex.deviant
jon: Advanced Programming in Perl makes objects seem like fun actually (OK, so I'm a sick puppy)
ecdb: Yeah? Did I mention that there's a perl language binding for the ILU corba/dist-obj system...
martin: I just discovered (reading Usenet in another window... ) talk.underwear.veg :-)
jon: No, not the CORBA word, please!
martin: Can we come back to the result of reading a template in later and just zip through the rest of section 5 ?
ecdb: OK. How about the HTTP-NG word?
ecdb: Yep.
martin: Wash your mouth out!
jon: Anyway, objects or hashes? Which is it to be?
ecdb: Results of reading a template: objects. which contain hashes for now.
jasper: I say objects
martin: objects
Bob: well, if jon and martin are so keen on OO then objects
martin: :-)
martin: And we're making work for ourselves too
ecdb: OK. Will you need to ask permission of the exec committee on monday to work on this rocket science OO stuff?
jon: Well objects from me as well. In which case JohnK and Martin will need to inform the Exec Ctte of this overwhelming user demand... :-)
martin: I think John has already pencilled me in for a session to explain what we did and what we decided...
ecdb: Yep. Biz/ed officially requests Objects. Year2000 compliant objects at that ;-)
jon: Expect them in 2014 then... :-)
ecdb: Modelled in rdf and exposed over corba... ahem. sorry. swallowed a marketing manual...
martin: One point - objects vs. complete re-write. They weren't keen on version 3 being a complete re-write, but if we need objects to make APIs work properly...
jon: <BANG> <thud>
ecdb: Ouch!
ecdb: Have we finished now? (i've lost my agenda url)
jon: OK folks, objects it is and to hell with the committee. On to write record
martin: You won't get away that easily - http://www.roads.lut.ac.uk/roads-meetings/api-agenda.html
martinwrite record - OK by me
ecdbOK.
jonOK
Bobok
ecdbA record being, like, an object or something?
martina thingy
Bobsame as open database for status code
jonA doodah
jonOK Bob - sounds good to me.
Bobits an object now then isn't it
ecdbyep
jonAn objecty doodah
jonOK then: index record(s)
martinOK
ecdbOK
jonOK
Boban encapsulated structure with a status code, as before, and some result structure
jonSo that's an OK is it?
ecdbok
martinOK
martin:--)
jasperok
Bobok
jondelete record(s)/deindex record(s)
ecdbkillemall
ecdbsorry must stop typing that
ecdbok
martin* err... OK
jon(and a special redrum option for Dan :-) )
jaspero k
Bobyo
jonperform search
martinOK
jon(this is the fun one)
martinto be defined later ... :-) in section 6
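
Illustrative sketch (not part of the IRC log): the agreement in section 5 was that each call returns an object carrying a status code plus some result structure, with hashes inside for now. A minimal Perl sketch of that shape might look like the following. The package name Sketch::Result, its methods, and the convention that status 0 means success are all assumptions made for illustration; none of them come from the ROADS code or the agenda.

```perl
use strict;
use warnings;

# Hypothetical result object: a status code, a message, and a data hash.
# Package and method names are assumptions, not part of ROADS.
package Sketch::Result;

sub new {
    my ($class, %args) = @_;
    my $self = {
        status  => exists $args{status}  ? $args{status}  : 0,   # 0 assumed to mean success
        message => exists $args{message} ? $args{message} : '',
        data    => exists $args{data}    ? $args{data}    : {},  # plain hash for now
    };
    return bless $self, $class;
}

# Simple read-only accessors.
sub ok      { my ($self) = @_; return $self->{status} == 0; }
sub status  { my ($self) = @_; return $self->{status}; }
sub message { my ($self) = @_; return $self->{message}; }
sub data    { my ($self) = @_; return $self->{data}; }

package main;

# A caller checks the status first, then reads the result structure.
my $result = Sketch::Result->new(
    status => 0,
    data   => { Title => 'Example record' },
);

my $title = $result->data->{Title};
my $line  = $result->ok ? "OK: $title\n" : "Failed: " . $result->message . "\n";
print $line;
```

The same wrapper could in principle be returned by open database, write record, index and delete alike, which is roughly what "same as open database for status code" suggests.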

6. Discuss!

ecdbA set of ROADS doobries from the native database, describing the end resources.
ecdbAnd a set of ROADS doobries that describe referrals to remote searchable resources (ie. referrals)
ecdbRemotesearchabledoobries being resource descriptions like anything else
Bobhere need to worry about the result stream going back into wppd.
Bobplus status info
ecdbWhy? Doobries will have a toWHOIS() method or something, no?
jonwppd.pl will be doing the formatting of these ROADS results objects that the search API returns
jonYep Dan
ecdbWhere, if anywhere, should a ROADS database store descriptions of the remote services it has forward knowledge of?
ecdbeg if I know whois++://omni,.ac.yukj:// has a Title "OMNI - medical stuff" and a description ".....", where do we put this?
martinThat's a separate issue
ecdbOK.
ecdbCan searches return referrals?
jonAt the moment ROADS stores the centroids in DBM files and the metadata about the remote service that supplied them in a config file. These seem pretty efficient. However, do people want to have the ability to put these into some other database?
jonI think for the purposes of this API, we should say no. (oops looks like Bob's fallen in the water)
ecdbI think they can live fine where they are. I was thinking about how we deal with referrals
martinWe do need to be able to deal with referrals in search results
jonThat way we separate our database of records (and associated index) from the centroid information (which is what you use to generate referrals)
ecdbif the search api has some notion of a referral, i'd like the api to also present me with a mechanism for finding out about the resource I've been referred to
martinWe should bear in mind that at the moment centroids are completely separate from the main ROADS database thingy
martin(as Jon has just reminded me :-)
ecdbQuick, w.r.t. john's point. We have both the database of centroids and the database of centroid-server-descriptions...
jonAnd I'd like to keep it that way!
ecdbMe too.
martinI have no opinion either way and will swing with the crowd :-)
jonSo in which case the search API we're talking about here doesn't need any referral stuff in it.
martinOK by me
jonBut we might want to put centroids in another backend database...
martin:-)
jon... or more than one :-)
ecdbWe could probably re-use the search API for searching the referral database? And just treat the records that come back (eg. whois++// urls) as referrals?
jonI guess so - in which case I guess I'd have to go back on what I just said and let the search results include a referral special return type.
martin ... which as I now realise begs the question of how we'd put centroids into the API ?!
jonExactly
martinStrange that it didn't occur to me before. Must have been sunspot activity or something...
jonWe need a ROADS::Centroid:: hierarchy
jonROADS::Centroid::Open()
jonROADS::Centroid::Close()
jonROADS::Centroid::AddTerm()
jonROADS::Centroid::RemoveTerm()
jonROADS::Centroid::Search()
jon:-)
jonJasper, Dan (if you're still there), martin - does this sound reasonable?
martinMy theory is that we need to ask ourselves whether people need to write their own code to munge centroids
martinIf they do, we should roll centroids and referrals into the API
jasperjust catching up - interruptions
martinIf not, we should leave things as they currently are :-)
jonWell seeing as it wasn't suggested in the API requirements, let's say that they don't for the time being.
martin:-))
jonIf we treat centroids as separate things that we do internally we can bring them out into the light of day via an API at a later stage if people want them
martinOk by me!
jonJasper?
jasper[dan] - i'm in jasper room - in passing briefly - could you mail these Qs to roads-hackers? I'd like some support for building centroids. eg if i get a list of unique words per field into perl arrays, can ROADS help build a legal CIP object?
jonAh, I'd like to steer clear of CIP. Pain in the ass.
jasperIs CIP a dead duck then? [dan]
jasperThe basic idea seems sound. It's just a list of lists of words, per field, right?
jonProbably not, but for what people are currently using centroids for in ROADS it's overkill.
jonNo. CIP is a super complex indexing system that is open ended and lets you send more or less anything you like (using MIME index types to say what it is)
jasper[dan] OK. What instead? Centroids (CIPv1)?
jonI've implemented tagged-index-objects using CIPv3 and it's a lot more complex than centroids were
jonFor the time being. If CIPv3 takes off in a big way we've got the code to munge it into our centroids structure if needs be now.
jonLet's say that if the world of CLUMPS comes up with Z39.50 servers using CIPv3 to permit some sort of forward knowledge we should be ready for it (and hopefully are now) but that I'd encourage ROADS users to stick to the conceptually simpler centroids.
jasper[DAN] been looking at the centroid methods suggested above; is the idea that there would be a getCentroid(string DATABASEIDENTIFIER) function which we could call and would give us a handle to a centroid for each database backend that roads knows about?
jasper[dan] w.r.t. Z39.50, I think we've got a bit of time till they catch up.
martinYep, keep banging the rocks together guys :-)
jonEr no. I was thinking that we'd still have the wig.pl style config files that we have now but instead of just looking
jonin DBM databases we might look in (for example) an SQL database for a particular centroid.
jasperhuh?
jonThat way an implementation is free to either have one centroid per database or stuff them all into one database
jonHuh to which bit?
jaspergetCentroid(DBID) would allow this? Huh to didn't know what you meant about DBM and SQL in same sentence [Dan being dumb not Jasper]
martinSomething else we need to think about - we have an API for getting records in and out of back end databases, and searching/indexing/... - but not for generating a centroid of the records in the database. We probably need that, regardless of whether we have full blown centroid/CIP manipulation in our API ?
jonOK: you might (for some reason best known to an implementor) want to keep centroids for SOSIG in our DBM style databases but OMNI and ADAM ones in some SQL based database.
jonThe wig.pl config file will need to tell the code where this backend database is (both for centroid gathering in
jonwig.pl and looking at in wppd.pl)
jasper[dan] agree with martin. need to be able to get some metadata about a database, including bulk metadata like a centroid.
jonYep, I agree with Martin too.
jon$database->centroid()?
martinsounds good to me!
jasper[is-dan] centroid() returning what? A centroid textfile? Or a centroid Object(tm)?
jonWe could have a $database->cipv3("application/index-obj-tagged;DSI=32.3.4.5.2.213") call in the future as well. :-)
martinI think we should have the option of getting a CIPv3 type centroid out of them, perhaps by specifying protocol/version as an argument. Backends shouldn't be required to support centroids or CIP, but should be encouraged to support at least vanilla WHOIS++ centroids.
jasperoh, right. $database implying some kind of Roads object for database?
jonOr something like that - needs a bit more thought if we do do it but I'd leave it at the moment.
martinThe question is - is generation of centroids (and/or other manipulation of centroids) a core API requirement ?
jon$database is returned from ROADS::API::Database->new(BackEnd => "Glimpse"); in the examples in the agenda. It's the object with the status of the open, internal guff, error code/msg, etc
martinNobody's mentioned it until now... :-)
jasper[is-dan] Can/should database have a description of some sort associated with it? Title, Description etc? Subject...
jonI'd say generation of the centroid from a backend database is core, but I don't know about the actual centroid data itself (the stuff we've gathered rather than what we're generating)
jonDunno. We've got a database name at the moment. I don't know whether a description or other metadata will be of much use.
jonI suppose we could include an API call to get the metadata though.
jasper[dan] I think what I'm really after is richer metadata headers in the CIP things themselves.
jasperor centroid things.
jasperso if I get a referral to some Finnish library, I know what it is before I go querying it and find out it's all in Finnish [dan]
jonNot in centroids you won't - we more or less do everything in there as it is. CIPv3, well, you can define your
jonown index MIME type... :-)
martinWe still need to have a WHOIS++/CIP protocol level interface to get them back to somebody...
jasperok
jonI think what you're after is what Z39.50 people call EXPLAIN. And which they are currently getting very confused about.
jasper[dan] Yep. self describing resources, more or less.
jon(basically because nobody knows how to implement it properly to make database servers self describing)
jonI think we should put that down as a "do in the future" rather than an API thing now (though feel free to hack at it if you can see a clean way to do it)
jasper[dan] Yep. With Z39.50 it's more of an immediate problem since a 'z client' knows very little about a resource just by being told it's a Z39.50 server. Knowing something is a whois++ server is a lot more, er, empowering.
martinSo - have we wound up the API discussions for now ? :-)
jasperYep. agree. I may hack some alternative non-CIP ish centroid format in XML with metadata headers. Just to see...
jasper[dan + jasper] Yep.
jonOK!
jonRight, time for some choccy then methinks. Thanks for the input fellas!
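
Illustrative sketch (not part of the IRC log): the discussion above treats a plain centroid as, roughly, the unique words seen under each field of the records in a database, and floats a $database->centroid() call to generate one. The Perl below is a minimal sketch of that idea over records held as ordinary hashes. The function name build_centroid, the sample records, and the choice to lower-case and split on non-word characters are assumptions for illustration only; this is not the ROADS implementation and it does not produce WHOIS++ or CIP wire format.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ROADS: collect the unique words that
# appear under each attribute across a list of records (plain hashes).
sub build_centroid {
    my (@records) = @_;
    my %seen;    # attribute name => { word => 1 }

    for my $record (@records) {
        for my $attr (keys %$record) {
            # Assumed tokenisation: lower-case, split on non-word characters.
            for my $word (split /\W+/, lc $record->{$attr}) {
                next if $word eq '';
                $seen{$attr}{$word} = 1;
            }
        }
    }

    # Turn each per-attribute set into a sorted list of words.
    my %centroid;
    for my $attr (keys %seen) {
        $centroid{$attr} = [ sort keys %{ $seen{$attr} } ];
    }
    return \%centroid;
}

# Made-up sample records, for illustration only.
my $centroid = build_centroid(
    { Title => 'OMNI medical gateway', Keywords => 'medicine health' },
    { Title => 'Medical images',       Keywords => 'health imaging' },
);

for my $attr (sort keys %$centroid) {
    print "$attr: @{ $centroid->{$attr} }\n";
}
```

A backend that already keeps an inverted index could presumably answer the same question without walking every record, which is one reason to leave the method behind the database object rather than fix the algorithm in the API.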

The End (of Part II)