BUBL April 1998

BUBL: the Browse Engine...

By Alan Dawson, BUBL Information Services Manager: ijs96122@ccsun.strath.ac.uk

I've never liked the term 'search engine'. If something has big shiny wheels and runs on rails, or if it drives the tyres that erode roads, then that's an engine. Piston broke is the sort of terminology I associated with engines, not the sobering wealth of information churned out by Internet so-called search engines. But the term has caught on, and now even our local parish newsletter offers a quick guide to search engines, alongside details of the latest campaign to complain about cracks in the pavement. It can't be long before 'search engine' appears in Volume 7 of the Camford Shorter Concise Abbreviated Dictionary, so I'd better learn to use the term.

Search engines are of course fantastically useful ways of finding information. It might not be the information you wanted, but what the heck, this is the Internet, not the British Library catalogue. Some of us can remember being impressed by Archie and Veronica, so it's a bit unfair to criticise services that are a thousand times better. Trouble is, there's a hundred thousand times more information around than in the days when FTP and Gopher were state-of-the-art information retrieval tools. So it's no wonder that people like search engines.

Yet you never hear anyone enthuse about browse engines. It's a phrase I'd never heard until I started this article, so it needs a bit of explaining. The idea is to retain some of the benefits of search engines while overcoming some of their deficiencies, which are quite well documented by regular users; a useful summary is given in "Cross-Searching Subject Gateways" in the January 1998 issue of D-Lib Magazine. To summarise, search engines index pages, not resources; they return huge numbers of hits (many of which are irrelevant); their results are usually unstructured and unclassified; item descriptions are variable at best; and many of the links are broken. Each search engine requires careful use of advanced features and precise syntax to get the best results (which can be very good), and of course they're all different.

All this poses problems because it seems that users in general are not very good at searching. Since the new BUBL service was launched from Strathclyde University Library in March 1997 we have accumulated a vast archive of terms typed into the search box of BUBL LINK (our catalogue of Internet resources), and it's rather depressing to scroll through this archive. It's not so much the spelling and typing errors, it's the lack of technique that is quite striking. Five-, six- and seven-word phrases appear regularly. Since the default search type is 'AND' for two or more words, every word typed has to match, so failure is almost guaranteed. It's not uncommon for users to respond by repeating the same search, perhaps hoping for better luck. The more experienced may try adding quotes or brackets (not bad, this works with some search engines), but adding extra search terms after getting zero hits is not a great idea.
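To see why long phrases fare so badly under a default 'AND', here is a minimal sketch in Python. It is not how BUBL LINK is actually implemented; the function name and the three sample records are invented purely for illustration.

```python
def and_search(query, records):
    """Return the records that contain every word of the query (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for record in records:
        words = set(record.lower().split())
        # 'AND' semantics: a record matches only if every query word is present.
        if all(term in words for term in terms):
            hits.append(record)
    return hits

# Hypothetical resource descriptions, invented for this example.
records = [
    "guide to human resource management",
    "personnel management journals",
    "chronic fatigue syndrome support groups",
]

short = and_search("management", records)
longer = and_search("management journals", records)
longest = and_search("where can I find management journals online", records)

print(len(short), len(longer), len(longest))
```

Each extra word can only shrink the result set, never grow it, which is why adding terms after getting zero hits cannot help.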

Then there is the whole problem of terminology. Should one search for 'personnel management' or 'human resource management'? Is Chronic Fatigue Syndrome the same as Myalgic Encephalomyelitis? Do the capital letters matter? And just how do you spell encephalowhatsit? It's easier to just type ME instead and hope for the best.

So the idea of a browse engine is to remove these sources of error and uncertainty by saving users the trouble of having to type anything at all. This in itself is not a new idea, as Internet directories such as Yahoo (or indeed BUBL LINK) can be driven by mouse alone, though it might take a lot of clicking to find what you want, and even then it might not be there after all.

A browse engine would ideally emulate a search engine by offering a single page from which it was possible to find information on a large number of subjects, but by using an extensive controlled vocabulary and a field-based database it would locate only resources relevant to the subject rather than any that happened to mention it. There would be far fewer hits for most subjects, but they would all be relevant.
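The difference between matching a subject field and matching any mention can be sketched roughly as follows. Everything here is an assumption made for illustration: the two catalogue records, the field names, and the vocabulary table (including the decision to treat Myalgic Encephalomyelitis as a variant of Chronic Fatigue Syndrome) are invented, and none of it is taken from the real BUBL database.

```python
# Hypothetical catalogue records with separate fields, invented for illustration.
CATALOGUE = [
    {"title": "ME Association",
     "subject": "chronic fatigue syndrome",
     "description": "Information and support for people with ME."},
    {"title": "Staff handbook",
     "subject": "human resource management",
     "description": "Sick-leave policy that mentions chronic fatigue syndrome."},
]

# Hypothetical controlled vocabulary: variant terms map to one preferred subject term.
PREFERRED = {
    "myalgic encephalomyelitis": "chronic fatigue syndrome",
    "me": "chronic fatigue syndrome",
    "personnel management": "human resource management",
}

def browse(term):
    """Return titles whose subject field equals the preferred form of the term."""
    key = term.strip().lower()
    subject = PREFERRED.get(key, key)
    return [rec["title"] for rec in CATALOGUE if rec["subject"] == subject]

def full_text(term):
    """Return titles of records that mention the term in any field."""
    key = term.strip().lower()
    return [rec["title"] for rec in CATALOGUE
            if key in " ".join(rec.values()).lower()]

print(browse("Myalgic Encephalomyelitis"))
print(full_text("chronic fatigue syndrome"))
```

In this toy catalogue the subject lookup finds only the resource that is about the condition, while the full-text match also picks up the handbook that merely mentions it.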

The BUBL browse engine aims to add further value with features such as academic relevance, quality control, concise and meaningful resource descriptions, catalogue records including author, resource type and resource location, regular link checking and fixing, a guaranteed minimum number of hits (5) for every subject, and an upper limit on hits (15) for most subjects. The service is still under development but it already has over 1100 subject terms, covering all main subject areas. It is oriented toward JISC services in that it uses JISC's nine top-level subject areas, and includes numerous references to resources held by other JISC services and projects. It is not intended to be a substitute for searching, rather a fast and effective alternative for certain tasks and topics.

There are no wheels or pistons grinding away, but then it's not an engine at all, more of a glider really. With a glider you hope it takes you where you want to go, but if not at least you enjoy the ride and it's easy to drive.

The trial service is called BUBL 5:15 (pronounced 'bubble five fifteen') and it's just getting off the ground for initial evaluation at http://bubl.ac.uk/link/five/. We would welcome constructive feedback from anyone interested in trying cross-domain browsing or gliding as a change from searching or surfing. Please send any comments to a.dawson@strath.ac.uk
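For anyone curious how the five and fifteen in the name might be policed, here is a rough sketch of a check over hit counts per subject. The limits come from the description above; the function, the subject names and the counts are hypothetical, and this is not the actual BUBL maintenance procedure.

```python
MIN_HITS = 5   # guaranteed minimum number of hits for every subject
MAX_HITS = 15  # upper limit on hits for most subjects

def check_subjects(hit_counts):
    """Return (subjects below the minimum, subjects above the upper limit)."""
    too_few = sorted(s for s, n in hit_counts.items() if n < MIN_HITS)
    too_many = sorted(s for s, n in hit_counts.items() if n > MAX_HITS)
    return too_few, too_many

# Hypothetical hit counts per subject term, invented for illustration.
hit_counts = {"gliding": 7, "railways": 3, "medicine": 22}

too_few, too_many = check_subjects(hit_counts)
print("Need more resources:", too_few)
print("Candidates for splitting or pruning:", too_many)
```

Since the upper limit applies only to most subjects, the second list is better read as a prompt for review than as a list of errors.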
