| martin | I'll add an extra item - "callbacks" to this section, unless anyone has a problem with this. Holler now if you do...
|
| martin | Done!
|
| martin | So, the first question I'd like us to discuss is...
|
| martin | Do we need a Perl level API to call (as opposed to running an external program) for tasks such as rebuilding the database index ?
|
| Bob | how do we implement the sql index api?
|
| Bob | and the sql templates api
|
| Bob | ?
|
| Dan | Re Martin's question: do we need perl functions to do things we've got command line tools for...
|
| Dan | I'd quite like this, but guess it should be trivial for us to do ourselves. exec()?
|
| martin | Currently we run an external program to index newly added records - you could substitute for this with your own program. Another possibility is that we provide a common SQL front end for people using SQL with ROADS - but there are some potential problems with this.
|
| Dan | SQL is a big tricky problem if we try to address it properly
|
| Dan | Can we divide up the discussion and talk about read-only versus read/write access to SQL stored records?
|
| martin | By all means....
|
| Bob | I'd like access to a mk.indexterms function, that calls the internal roads indexing stuff, and just presents me with data that I can bang into a relational table
|
| Dan | So you'd be doing your own indexing, in effect, rather than relying on whatever facilities the RDBMS offered?
|
| Bob | Ahh, indexing in terms of word lists, attributes etc, not indexes as in btree
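|
Bob's request can be pictured as a function that turns one template into rows ready for a relational table. The following is only a Python sketch of that idea, not ROADS code: the name `make_index_terms`, the tokenisation rule, and the sample record are all assumptions made for illustration.

```python
import re


def make_index_terms(handle, template):
    """Return sorted (handle, attribute, term) rows for one flat template.

    `template` is a dict mapping attribute names to value strings.
    Splitting values into runs of letters and digits is an assumed
    tokenisation rule, chosen only to keep the example small.
    """
    rows = set()
    for attribute, value in template.items():
        for term in re.findall(r"[a-z0-9]+", value.lower()):
            rows.add((handle, attribute, term))
    return sorted(rows)


# Hypothetical record, used only to exercise the function.
record = {"Title": "Business Education", "Keywords": "economics, business"}
rows = make_index_terms("rec1", record)
```

Each row could then be inserted into a table with handle, attribute, and term columns, leaving btree-style indexing to the database itself.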
|
| martin | In pre-meeting discussions it occurred to me that another distinction which is worth making is whether the SQL (or a.n.other) backend is being used just to index the records, or store and index them. If the answer is just to index, we could adopt a Harvest compatible model which allowed for multiple search engines - including SQL based systems :-)
|
| Dan | Could you expand on this a little? Sketch a scenario? (I wasn't around pre-meeting...)
|
| Bob | I think that the relational storage of templates should be distinct from the storage of the index terms associated with a template
|
| martin | Dan - the "Harvest scenario" would be that we had something like the generic db_api.pm suggested in R7, which would act as a buffer between (at least) queries and whichever backend indexer was being used. This is essentially what Harvest does.
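|
The buffer layer Martin describes might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the db_api.pm proposed in R7: the class names, the `search` method, and the table layout are hypothetical. The point is only that the caller talks to one front end while the backend behind it can be swapped.

```python
import sqlite3


class ListBackend:
    """Toy backend: scans an in-memory list of (handle, text) pairs."""

    def __init__(self, records):
        self.records = list(records)

    def search(self, term):
        return sorted(h for h, text in self.records
                      if term.lower() in text.lower())


class SqliteBackend:
    """Toy SQL backend: same records held in an in-memory SQLite table."""

    def __init__(self, records):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE recs (handle TEXT, body TEXT)")
        self.conn.executemany("INSERT INTO recs VALUES (?, ?)", records)

    def search(self, term):
        # LIKE wildcards in `term` are not escaped; fine for a sketch only.
        cursor = self.conn.execute(
            "SELECT handle FROM recs WHERE body LIKE ? ORDER BY handle",
            ("%" + term + "%",))
        return [row[0] for row in cursor]


class DbApi:
    """Routes each query to whichever backend was registered by name."""

    def __init__(self):
        self.backends = {}

    def register(self, name, backend):
        self.backends[name] = backend

    def search(self, name, term):
        return self.backends[name].search(term)


records = [("rec1", "Economics teaching notes"), ("rec2", "Road atlas")]
api = DbApi()
api.register("list", ListBackend(records))
api.register("sql", SqliteBackend(records))
```

Calling `api.search("list", "economics")` and `api.search("sql", "economics")` goes through the same front end, which is the property the Harvest-style model relies on.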
|
| martin | To follow this train of thought...
|
| martin | Do we also want to wrap up the indexing process inside a Perl module type API ? (that was the original question :-)
|
| Dan | I've been working with PostgreSQL [http://www.postgresql.org] for relational storage of ROADS-like records, and have found its internal indexing facilities to be pretty poor for queries of the form 'term appears somewhere in such'n'so field'.
|
| Dan | (I started typing the above while it was more relevant to the discussion)
|
| Dan | To go back to Martin's comments on db_api.pm, I think this would be an excellent thing to do. However it's not a million miles away from inventing our own simple search protocol...
|
| martin | :-)
|
| Dan | We've effectively been doing this in another context by writing WHOIS++ wrapper scripts for a variety of interestingly searchable networked resources. How, apart from being function instead of network socket oriented, would db_api.pm differ?
|
| martin | I should mention, for the benefit of anyone who hasn't picked up on this, that Jon has created an unofficial WHOIS++ extension which allows the Tk based version of the ROADS template editor to edit templates and then upload them to the ROADS server.
|
| Bob | Part of my thought with db_api, was to allow a simple interface between a hidden world of wppd protocol streams/formats and the data. letting roads code format (sql result sets) into a wppd stream
|
| martin | Would you want to have a Perl API for regenerating things like subject listing breakdowns ?
|
| Bob | yes to wrapping the indexing process in a callable function/module
|
| Dan | Regarding the WHOIS++ write extension: this is a great idea when the database's internal representation maps nicely onto the public view of the data as WHOIS++ records. If the database stores stuff internally in several relational tables, I can't see a WHOIS++ write facility being useful.
|
| Bob | no for listings api, shell scripts are good enough here
|
| martin | You should bear in mind that the immediate rationale for it is to provide a way for us to have a template editor which runs on PCs under Windows (and also, coincidentally, on Macs :-)
|
| martin | FWIW - at the moment the new template is uploaded into the "pending" templates area, awaiting a ROADS database rebuild.
|
| Bob | Sue's machine crashed installing the roads screen saver
|
| martin | We can certainly move the body of code which is currently in bin/mkinv.pl into a separate library module.
|
| PJ | Re: Tk editor - Perl Tk runs quite happily on my Windows PC and Macs as far as I'm aware. How about a port of Jon's code?
|
| Dan | (from John) the screensaver is an alpha release :-) Non-fast machines struggle with it, so am doing a cut-down version of it for june
|
| martin | Jon has been looking at Perl/Tk too - but unfortunately he couldn't make it for today's meeting. He'll be around tomorrow afternoon, though, and may have some progress to report!
|
| martin | thinks... I thought Bob was joking about the screensaver!
|
| Dan | Can I jump back to the readonly (searching) discussion? We started to talk about a standard ROADS representation for a searchable resource.
|
| Dan | This would be an alternative approach to WGI, which passes off all the work of implementing WHOIS++ to the WGI implementer.
|
| martin | Dan - before you get going on this. Do we have consensus on a "yes" for APIs for a) adding a new record and b) rebuilding the database ?
|
| martin | We can discuss details later...
|
| Dan | Can I give a qualified yes? ie. Yes with-respect-to the ROADS built in database. Not w.r.t. arbitrary backend databases.
|
| Bob | yes
|
| PJ | We certainly want a painless way of adding new records in a variety of ways that doesn't involve hacking ROADS. So yes to a.
|
| jasper | yes
|
| andy | yes
|
| martin | Dan - what's your problem with alternative backends ?
|
| Dan | I just wanted to be clear what the requirement was. There's a big difference between sending something like a ROADS record to a relational database and saying 'store this', and sending it to a ROADS built-in IAFA based database.
|
| Bob | surely not if the api is correctly designed?
|
| martin | Yes, that's true - the cop out is that we provide the mechanism for people to write support for arbitrary backends, but aren't likely to be able to code up the actual support ourselves!
|
| Dan | I'd quite like to be able to do both: say to any ROADS service "store this" and send it a template. But it's a bigger job if we're allowing for relational database stores with arbitrary schemas.
|
| Dan | We've got an agenda bullet-point that covers this: Schemas and schema normalisation for SQL backends
|
| Dan | The relational database / SQL philosophy is rather different to the data model implicit in IAFA/WHOIS++ records.
|
| andy | Doesn't the API hide the individual database schemas from you?
|
| martin | We also have to bear in mind that some people might want to use their (say) SQL server to actually store the records, and some just for indexing them. So, we need to be flexible enough to handle this - preferably without jumping through hoops.
|
| martin | Before we go any further...
|
| Dan | Mmmm....
|
| martin | I'm interpreting the above to mean that on the whole we like the idea of APIs for a) adding new records to a ROADS server and b) reindexing the database.
|
| Dan | Could you clarify what a ROADS server is?
|
| Dan | (I'm not being facetious. I think we could do with inventing names for the different components ROADS offers.)
|
| martin | Sorry - I'm just using that as shorthand for "installation of the ROADS software"
|
| Dan | I've a ROADS server (aka installation) that uses WGI instead of the ROADS-textfile-database. I suspect it's the latter that you mean?
|
| martin | No, either - if we have an API in-between it shouldn't matter.
|
| Dan | OK, if that's what we mean. I vote Yes. It's a hard problem though.
|
| martin | It does require us to be disciplined in terms of what we throw about when calling the various ROADS modules.
|
| Dan | In what way?
|
| martin | At the moment we use global variables all over the place - but we should have a more clearly defined interface between the calling program and the module which implements the API.
|
| jasper | How much of a re-write do you reckon this would be?
|
| martin | Not much work to hide the indexing code behind a Perl module front end, but there may be implications to exterminating global variables that run right through the code.
|
| Dan | So... is the consensus that we would like a standard API in ROADS for reading and writing data to permanent storage, and that this should abstract away from details like whether we're using a relational or textfile or OO database behind the scenes...
|
| martin | That's my reading.
|
| Dan | We could be very lazy and just have a storeThis(String IAFAtextfile) function that throws a record at someone else's code.
|
| martin | :-))
|
| Bob | yes, and it's up to the roads hacker to implement the appropriate details for the particular mechanism of long term storage, for their particular DB system
|
| Dan | Or we could try for a more abstract representation of the data to be stored. We all did a fair bit of work on this earlier in the spring, and as I recall ended up with a representation of the data using triples...?
|
| martin | At the moment we have a very simple internal representation of a ROADS template when read in using readtemplate()
|
| martin | It returns an associative array with each of the template's attributes as an element, and that attribute's value as the element's value
|
| Dan | Yep - it flattens all the structure down to a simple list of attributes (including -v1 -v2 suffixes etc)
|
| Dan | Since the attribute names themselves also have composite structure (Author-Email; Admin-Email etc) we're passing off a lot of the work onto the backend implementer.
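|
The flat representation under discussion can be illustrated with a small Python sketch. It is not the ROADS readtemplate() code: `read_template`, `split_variant`, the one-attribute-per-line parsing, and the sample record are assumptions made for illustration. The second helper shows the kind of work a backend implementer would have to redo to recover the variant structure hidden in the attribute names.

```python
import re


def read_template(text):
    """Parse 'Attribute: value' lines into a flat dict.

    Assumes one attribute per line with no continuation lines.
    """
    template = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        template[name.strip()] = value.strip()
    return template


def split_variant(name):
    """Split 'Author-Email-v2' into ('Author-Email', 2).

    Names without a -vN suffix come back with variant None.
    """
    match = re.fullmatch(r"(.+)-v(\d+)", name)
    if match:
        return match.group(1), int(match.group(2))
    return name, None


# Made-up record in the general style of an IAFA template.
SAMPLE = """Template-Type: DOCUMENT
Title: An example record
Author-Name-v1: A. Writer
Author-Email-v1: a.writer@example.org
Author-Name-v2: B. Writer
"""

flat = read_template(SAMPLE)
```

Here `flat` holds five independent keys; nothing in the dict itself says that the two `Author-Name` entries belong to different authors until the names are taken apart again.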
|
| Bob | sorry, still thinks it's not too difficult
|
| jasper | What would the advantage be of having a more complex internal representation of a template?
|
| Bob | KISS
|
| jasper | ?
|
| Dan | We've already got the complexity; the whois++ schema gives us that. It's whether we make the ROADS internal representation reflect that complexity that is at stake here.
|
| jasper | When we looked at the ROADS object idea before I got the impression that it would be too much to get into v3?
|
| Dan | it's all gone quiet... :-)
|
| martin | I'd like to suggest that the current approach actually makes it quite easy for us to do quite complex things - just by taking advantage of Perl's associative array mechanism. Having said that, it might be worth us hiding the actual internal data structure in a Perl object, just in case we want to change it in the future.
|
| martin | I guess the question boils down to... what do you want to do that readtemplate() is getting in the way of ?
|
| martin | :-)
|
| Dan | I've no problem with readtemplate(). It's the idea of implementing writetemplate() for relational databases with a variety of schemas that gives me the worries...
|
| Dan | ROADS templates have lots of interesting hidden structures, eg. list-valued attributes (Keywords; subject categories); hierarchical structure (clusters) and so on.
|
| martin | ... and you have to have semantic knowledge of the (ROADS) schema in use if you're to do something sensible with them at the back end ?
|
| Dan | Exactly.
|
| Dan | I suspect it could be such a headache to implement writetemplate() properly that we could find nobody using it.
|
| martin | In ROADS-speak you can see whether an element is repeated or a cluster included by looking in config/outlines/*
|
| martin | Lists of things (e.g. keywords) are another matter!
|
| martin | I think we've moved on to schema normalization and SQL backends ? :-)
|
| Bob | wait wait, am I missing the plot here? It's down to roads to supply a suitable API, and a module that'll interface to a basic index/storage mechanism; it's down to the roads user to implement the backend for a particular system
|
| Dan | For some definition of "suitable API". Would it be suitable for the receiving code to just be thrown an IAFA/WHOIS++ template?
|
| martin | We could say a Metadata::Template object ? :-)
|
| martin | With a public method to retrieve a list of attributes
|
| Dan | Sounds better already!
|
| martin | And some magic to let you get the value for each element.
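|
A template object along the lines Martin suggests might be sketched as below. This is a Python illustration, not a real Metadata::Template module: the class, its method names, and the sample attributes are hypothetical. The internal dict is hidden so that it could later be replaced by a richer structure without changing callers.

```python
class Template:
    """Hides the flat attribute dict behind a small public interface."""

    def __init__(self, attributes):
        # Copy so later changes to the caller's dict do not leak in.
        self._attributes = dict(attributes)

    def attributes(self):
        """Public method returning the attribute names, sorted."""
        return sorted(self._attributes)

    def __getitem__(self, name):
        """The 'magic': template[name] returns that attribute's value."""
        return self._attributes[name]

    def get(self, name, default=None):
        """Like __getitem__, but with a fallback for missing attributes."""
        return self._attributes.get(name, default)


# Hypothetical record used only to exercise the class.
template = Template({"Title": "An example record", "Handle": "rec1"})
```

Backend code would then iterate over `template.attributes()` and read values with `template[name]`, never touching the underlying storage directly.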
|
| Dan | The public method to retrieve list of attributes is getting more interesting. We ought ideally to let the config/outlines schema info be inspected...
|
| martin | You mean in a programmatic sense ? So that a (say) Oracle backend could find out what its host ROADS installation was using for its (ROADS) schema ?
|
| Dan | Or is that getting a bit sci-fi?
|
| martin | "ROADS - it's Rocket Science" :-)
|
| Dan | I've not mentioned CORBA-ROADS yet ;-)
|
| martin | now now1
|
| martin | s/1/!/
|
| martin | I think the schema issue is problematic at both ends. Consider...
|
| martin | 1) you install ROADS and create some template types and new attributes of your own
|
| martin | 2) you have an existing SQL database which you want to interface to the ROADS WHOIS++ server
|
| martin | If we're going to try and cater for both cases, we need a "schema normalization" module
|
| Dan | RocketScience.pm?
|
| martin | I'm sure that one's been taken!
|
| martin | DoWhatIMeantNotWhatISaid.pm
|
| martin | :-)
|
| Dan | And we want to be able to store records by sending them to some storage function which will Do The Right Thing, by having some knowledge of how our schema maps onto its favourite way of storing information.
|
| Bob | It's not outrageous to attempt to automatically normalise a set of roads templates, and create a fresh set of table defns containing any new attributes.
|
| martin | This is essentially what Digger (Bunyip WHOIS++ server) used to do - though they dropped the SQL support because it was too slow and unreliable...
|
| martin | :-)
|
| martin | From a ROADS developer's point of view, the main issue for me is whether I should be putting some infrastructure in the code base to help people who are trying to do 1) or 2) above. Assume that we already have an API which they can plug their own code into.
|
| martin | e.g. should there be a ROADS module which lets you read in the current (ROADS) schema ?
|
| martin | From the silence I assume everyone's feverishly typing... or dozed off!
|
| Dan | Was this the topic that was going to be addressed in ROADSv3 but moved back (forward?) into v4?
|
| Dan | I'd (for Biz/ed and as a ROADS person) like to see this addressed. Relational databases aren't going to go away, but ROADS offers a lot of value added stuff which could work very well alongside out-of-the-box database packages.
|
| martin | Not schema issues particularly - we were going to do a no-holds-barred rewrite for version 3 (think Microsoft with Windows here :-), but were persuaded to hold off on this until version 4. If we're to properly support third party code, we do need to do some of this work in version 3 - IMHO.
|
| Bob | I'd like to see an emphasis on getting a nice api I can use, even if I have to keep an eye on the mismatch between templates and other representations
|
| Dan | Yep. It would be great to get simple support for writetemplate() facility, as bob says, without waiting for the CORBA/RDF rewrite ;-)
|
| martin | What would you like ?
|
| Dan | Me?
|
| martin | Yes, you.
|
| martin | :-)
|
| martin | I know - TheWorld.pm !
|
| Dan | I'd like to start simple with all template writing being bottlenecked through one bit of code, so ANR people could - in principle - replace it with their own way of storing stuff.
|
| martin | OK, let's call that writetemplate()
|
| Bob | and reading :-)
|
| Dan | Even if this was done just by passing it the representation used by readtemplate().
|
| Dan | As a second item, I'd like ROADS exec to let us look again at the schemas and data model stuff, so we can have a more sensible representation of the records that reflects their internal structure.
|
| martin | Well, we happen to have an Exec. committee meeting on Monday... :-)
|
| Dan | without this, I would be surprised if many people implemented alternative backend stores for ROADS.
|
| Dan | Oh, forgot to witter on about RDF. Consider it done ;-)
|
| martin | and XML :-)
|
| Dan | Nah. XML's the new ASCII. RDF's the new Java...
|
| martin | Going back to the method formerly known as writetemplate()...
|
| martin | Is it acceptable to be reading the ROADS schema out of config/outlines/* or should we be providing an API for this too ? NB: this is not the same thing as the SQL (or whatever!) schema.
|
| Dan | Do you mean just dish up the textfile?
|
| Bob | I'm not sure that you need to supply the schema associated with a template.
|
| Dan | Just the record-type would be enough. People will hardcode this stuff anyway.
|
| martin | I'm getting the impression that we don't need to write any special code at the ROADS end to make the schema info in config/outlines/* available to hackers. What about the back end - is it any of our business what the ROADS attributes map onto ?!
|
| Bob | no
|
| Dan | No. Although tools / a framework for doing the mapping might be a good direction for ROADS to grow in.
|
| jasper | Sorry. Got caught by the phone. Can we / have we talked about the callback stuff yet?
|
| martin | Not really!
|
| martin | Before we go onto that...
|
| jasper | Shall we? Or tomorrow?
|
| martin | I'd like to suggest we discuss it, and then reconvene tomorrow to talk about the actual technical nitty gritty - that should give me enough time to throw something together code-wise
|
| martin | Ambitious, but it might just work... !
|
| Dan | Martin - you started to talk earlier about a Harvest model for searchable things...
|
| martin | Yep!
|
| Dan | Would this be something the CGI-based search client would use, instead of assuming that everything in the world was a W++ server?
|
| Dan | How would you see this idea fitting in with ROADS-as-we-know-it?
|
| martin | I was thinking about the bolting-on of backend databases to the WHOIS++ server - Harvest defines an API of sorts for doing this.
|
| Dan | Ah, right. Making it easier to write backends without having to know the WHOIS++ formalities.
|
| martin | Well...
|
| Dan | This would be coming up with a simple (ie. not Z39.50) API for searchable resources. Do we have schema traumas again?
|
| martin | No, it's OK...
|
| Sue | bob says you can't talk about this while he's not listening :-)
|
| martin | The glue code would still need to do something with a WHOIS++ query (or a hypothetical future ROADS "query object") and return WHOIS++ format records.
|
| Dan | Not ROADS::Metadata record objects?
|
| martin | We already have this with WGI, but this is geared up to launching external programs. It would be useful if we could run a Perl module which might or might not launch an external program.
|
| Dan | Martin - yep, doesn't sound like much of a bigger win over WGI unless it abstracts away from WHOIS++ a bit.
|
| martin | The actual module/method to run could be implemented as a callback - cue Jasper :-)
|
| jasper | Was that a clever link to moving on to discussing callbacks?
|
| martin | :-))
|
| martin | Or would you prefer to do this tomorrow ?
|
| jasper | I'm happy to start it now. Dan? Anyone else?
|
| Dan | How about we talk about implementing a Z39.50 API in ROADS?
|
| Dan | ;-)
|
| martin | OK - callbacks.
|
| Dan | Yes...?
|
| jasper | Would it be better to implement callbacks as modules rather than separate scripts?
|
| martin | I'd been thinking about them as modules - with something in (say) ROADS.pm which let you override the default ROADS behaviour
|
| martin | e.g. call Jasper::MyTemplateThingy::writetemplate instead of ROADS::Template::writetemplate
|
| martin | I'm not even sure if that's possible, though!
|
| jasper | In terms of where they should go...
|
| martin | Thinking in terms of Perl references, typeglobs, ...
|
| jasper | when templates are inserted, edited, deleted...
|
| jasper | when templates go 'stale'...
|
| martin | We could have callbacks both at the "method" level (e.g. replace the default bit of code to insert a new record) and also within the method itself
|
| martin | e.g. "before inserting a new record", "after inserting a new record"
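|
The two levels of callback mentioned here (replacing the whole method, and hooks before and after it) can be sketched in Python as follows. This is only an illustration of the idea, not a proposal for how ROADS.pm would do it in Perl: `TemplateWriter`, `default_write`, and the hook list names are hypothetical.

```python
def default_write(store, handle, template):
    """Default behaviour: keep a copy of the template under its handle."""
    store[handle] = dict(template)


class TemplateWriter:
    """Single bottleneck for writing templates, with overridable parts."""

    def __init__(self):
        self.write = default_write   # "method level" callback
        self.before_insert = []      # hooks run before the write
        self.after_insert = []       # hooks run after the write

    def write_template(self, store, handle, template):
        for callback in self.before_insert:
            callback(handle, template)
        self.write(store, handle, template)
        for callback in self.after_insert:
            callback(handle, template)


def upper_case_write(store, handle, template):
    """Stand-in for a site's own storage code, replacing the default."""
    store[handle.upper()] = dict(template)


events = []
writer = TemplateWriter()
writer.before_insert.append(lambda h, t: events.append("before " + h))
writer.after_insert.append(lambda h, t: events.append("after " + h))
writer.write = upper_case_write

store = {}
writer.write_template(store, "rec1", {"Title": "Example"})
```

Keeping every write behind `write_template` is what makes both kinds of override possible; how many hook points to offer is the open question raised in the discussion.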
|
| martin | The difficulty is knowing when to stop! cf. X Windows :-)
|
| jasper | Yes. As a suggestion, could we have, say, half a dozen in the next beta to play around with?
|