Open Source Software in Libraries an Update

Library Hi Tech News

ISSN: 0741-9058

Article publication date: 1 May 2002

Citation

Bretthauer, D. (2002), "Open Source Software in Libraries an Update", Library Hi Tech News, Vol. 19 No. 5. https://doi.org/10.1108/lhtn.2002.23919eaf.003

Publisher

Emerald Group Publishing Limited

Copyright © 2002, MCB UP Limited


Open Source Software in Libraries an Update

David Bretthauer

Librarians who are looking for alternatives to restrictive software licenses and to expensive software that does not fully meet their needs continue to incorporate open source software into their institutions. In the past year, there have been a number of developments in library-related open source software.

Simply stated, open source software is supplied with the source code, the underlying programming which is used to create any software package. In the case of proprietary software, the end user cannot legally view or change the source code. By contrast, open source software users are encouraged to look at the source code and offer improvements where possible, using a process which is similar to peer review. Over the past decade, open source software proponents have created some stunning successes: Linux, FreeBSD, OpenBSD, Apache, MySQL, PostgreSQL, PHP, PERL, and Python are all open source products. Opensource.org also has a certification process for licenses intended to protect the use of the term "open source", described at http://opensource.org/docs/certification_mark.htm. A complete definition of a license which meets the criteria for this certification is available at http://opensource.org/docs/definition.html[1].

To be sure, open source proponents are an enthusiastic group, but use of open source is not a religion – it does not require an all-or-nothing commitment, as open source software can often be found on computers and in IT centers along with proprietary software. And when license agreements allow, open source software has been incorporated into proprietary software. For example, the open source code which implemented Internet connectivity in the FreeBSD operating system has also been used in most releases of Microsoft Windows™ (Gomes, 2001).

Projects of interest

A number of library-specific open source projects have been in development and production use for several years. One example, Prospero, is about to receive a major upgrade. Initially designed to work in conjunction with Ariel® to produce a fully electronic document delivery system, this widely used open source package has recently received unexpected competition from Ariel itself. Prospero's manager Eric Schnell announced in February 2002 that Prospero 2.0 would be released soon (Schnell, 2002); in early March 2002 he projected a release date around the beginning of April 2002[2]. This significantly updated package will include a number of improvements, but unlike Ariel's upgrade last year, it will not require updated hardware or operating systems – in libraries with expensive scanners which only have Windows 95 drivers available, this can mean thousands of dollars in savings. All current Prospero 1.x users, as well as Ariel 2.x and 3.x users, will be interested in this release. More information about Prospero is available at http://bones.med.ohio-state.edu/prospero/.

Another ongoing project is Jake (http://jake-db.org and http://jake.lib.sfu.ca), which concisely tells users which aggregated products index and/or provide full text for which electronic journals. While development has slowed throughout much of the past year, rumors of Jake's demise are greatly exaggerated. While Dan Chudnov, Jake's developer and manager, has been preoccupied with a new job, Todd Holbrook and his colleagues at Simon Fraser University continue to add to the code and contribute vendor data, and the product continues to be widely used (Chudnov, 2002). It would be fair, though, to consider Jake's slowed progress as evidence that open source projects do not maintain and evolve themselves even with the support of a bazaar, but must be directed by someone who acts as a manager, owner, shepherd, or champion.
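The kind of lookup Jake answers can be pictured with a small sketch. This is not Jake's actual schema or code; the journal titles, product names, and the `products_for` helper below are all hypothetical, chosen only to show a title-to-coverage mapping that distinguishes indexing from full text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Coverage:
    """One aggregated product's coverage of one journal (hypothetical record)."""
    product: str
    indexed: bool
    full_text: bool


# Hypothetical sample data; real coverage data would come from vendors.
COVERAGE: Dict[str, List[Coverage]] = {
    "Example Journal of Library Science": [
        Coverage("Aggregator A", indexed=True, full_text=True),
        Coverage("Aggregator B", indexed=True, full_text=False),
    ],
    "Sample Review of Information Systems": [
        Coverage("Aggregator B", indexed=True, full_text=True),
    ],
}


def products_for(title: str, full_text_only: bool = False) -> List[str]:
    """Return the products covering a journal, optionally only those with full text."""
    records = COVERAGE.get(title, [])
    return [r.product for r in records if r.full_text or not full_text_only]
```

A patron asking "where can I read this journal online?" would call `products_for(title, full_text_only=True)`, while a patron asking "where is it indexed?" would leave the flag off.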

One very promising package now in beta testing is LOCKSS™ (Lots of Copies Keeps Stuff Safe), intended to provide low-cost, long-term, distributed, redundant, archival access to electronic journals. Developed by Vicky Reich of HighWire Press, with support from NSF, Sun Microsystems, and the Mellon Foundation, this package is in beta test by 45 libraries, including major research libraries such as Library of Congress and Yale University, through summer 2002. The project is also endorsed by 53 publishers (LOCKSS, 2002). The intent of this project is to allow interested libraries both large and small to participate in archiving of e-journals which they have licensed – online and fully available to their patrons without intervention by library or systems staff, not by such offline means as CD-ROM or tape storage – for the cost of a PC. Because titles can be mirrored at many institutions, there is no need for expensive redundant servers. Thus any interested library with funding for a low-end PC with perhaps a 60GB hard drive would be able to proactively ensure the availability of e-journal resources for which it has paid license fees, even if the publisher goes out of business or its servers fail (even temporarily). Further information on the LOCKSS project can be found at http://lockss.stanford.edu/. A live world map showing the current status of all beta test sites is at http://sul-lockss18.stanford.edu:8080/GlobalCacheMonitor. Vicky Reich also is pursuing funding "to evolve the project into a production quality archival program"[3].
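The availability idea described above, in which a library's own PC keeps serving licensed content when the publisher cannot, can be sketched in a few lines. This is only an illustration under stated assumptions and not the LOCKSS implementation: the `LocalCache` class, the `serve` function, and the way the publisher is fetched are all hypothetical.

```python
from typing import Callable, Dict, Optional


class LocalCache:
    """Hypothetical local store of licensed e-journal pages, keyed by URL."""

    def __init__(self) -> None:
        self._pages: Dict[str, bytes] = {}

    def store(self, url: str, content: bytes) -> None:
        self._pages[url] = content

    def get(self, url: str) -> Optional[bytes]:
        return self._pages.get(url)


def serve(url: str,
          fetch_from_publisher: Callable[[str], bytes],
          cache: LocalCache) -> bytes:
    """Prefer the publisher's copy; fall back to the local copy on failure."""
    try:
        content = fetch_from_publisher(url)
    except OSError:
        # Publisher unreachable (ConnectionError is a subclass of OSError).
        cached = cache.get(url)
        if cached is None:
            raise  # nothing was archived locally, so the failure is passed on
        return cached
    cache.store(url, content)  # keep the local copy current
    return content
```

Because the fallback happens inside `serve`, a patron's request succeeds without any action by library or systems staff, which is the property the project emphasizes.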

Several project developers have envisioned complex products which they prefer to develop using what Eric Raymond has often characterized as the "Cathedral" model of software development[4]. Often these projects are developed with grant funding. In these cases, developers announce work on their projects and the intention of releasing the source code using an open source license, but often do not release the code, except to selected testers, until the project is well on its way to completion. One notable past example is the OpenBook integrated library system, for which source code was not available until after the product's first public demonstration. Current projects adopting this model with expected release dates in the near future (Summer 2002) are Infomine and the Internet Scout Portal Toolkit. These use open source packages such as Apache and MySQL to create sophisticated software. Some confusion is unfortunately developing over the use of the term "portal," given its past use to describe products such as MyLibrary. Depending on the application being discussed, it may mean "a user-customizable interface to a library's resources with separate accounts for each user (such as MyLibrary)", or "a product which allows individuals to easily create organized collections of electronic resources (such as Infomine, the Internet Scout Portal Toolkit, or conceivably even OCLC's SiteSearch)."

MyLibrary (http://hegel.lib.ncsu.edu/development/mylibrary/) is currently in version 2.5. It was initially developed at NCSU, where it is still maintained, and has been used in production user portals by several other libraries, both in the USA and elsewhere.

Infomine (http://infomine.ucr.edu) and the Internet Scout Portal Toolkit (http://scout.cs.wisc.edu/research/SPT/) are in development and beta testing. Neither is currently available via anonymous FTP download, but the current beta version of the source code for the Internet Scout Portal Toolkit can be obtained from David Sleasman by contacting him directly at sleasman@cs.wisc.edu. The Scout Portal Toolkit, which is currently seeking beta testers, "allows organizations that want to share the collection of knowledge and resources they have assembled via the World Wide Web to do so without making a substantial investment in technical resources or expertise" (The Internet Scout Portal Toolkit, 2002). Infomine's goal "is to help portal efforts scale better by combining the best of expert created portals (i.e. small collections notable for accuracy and precision) with the best of focused crawler/classifiers (i.e. very large collections notable for their reach and recall and machine assistance in metadata application)"[5]. Both projects expect to release their source code in summer 2002.

By contrast, other open source developers are using what Eric Raymond describes as the "Bazaar" method of software development by making early source code immediately available. Both Libproxy and Citation Manager are described by their primary authors as in early development, but both are available for download.

Libproxy is a rewriting pass-through proxy system, which is intended to permit authenticated users to access library resources from outside a library's IP range. Patrons of libraries which use this software to provide access to IP-authenticated electronic resources from outside the library's IP range are not required to modify their Web browsers. The current version, described as 0.020 as of early March 2002, has been implemented in production use. Further information about Libproxy is available at http://www.goerwitz.com:31265/software/libproxy/dist.
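The general technique of a rewriting proxy can be sketched as follows. This is not Libproxy's code, whose internals are not described here; the proxy address, the `rewrite_links` function, and the regular expression are assumptions made for illustration. The idea is that the proxy rewrites the links in each page it passes through so that the patron's next click also goes through the proxy, which is why no browser configuration is needed.

```python
import re
from urllib.parse import quote

# Hypothetical proxy address; a real deployment would use the library's own host.
PROXY_PREFIX = "http://proxy.example.edu/fetch?url="

# Matches absolute http(s) URLs inside double-quoted href or src attributes.
_LINK = re.compile(r'(href|src)="(https?://[^"]+)"')


def rewrite_links(html: str, proxy_prefix: str = PROXY_PREFIX) -> str:
    """Point every absolute link in the page back through the proxy."""

    def _through_proxy(match):
        attribute = match.group(1)
        target = quote(match.group(2), safe="")  # percent-encode the whole URL
        return '{}="{}{}"'.format(attribute, proxy_prefix, target)

    return _LINK.sub(_through_proxy, html)
```

A production proxy would also need to authenticate the patron and handle relative links, cookies, and scripts; this sketch covers only the link-rewriting step.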

Citation Manager (http://stalefish.lib.sfu.ca/CitationManager/) is an open source online citation database for end users. It provides an easy way for institutions to give all patrons equal access to simple bibliographic organization tools both on-site and at home. Its author, Todd Holbrook (2001) of Simon Fraser University Library, describes the package as "still at an alpha stage as I add features, but it's stable so far and usable, if limited". This package may be of interest to libraries which support commercial citation management packages.

Another project which is available, but for which documentation is still being written, is WIBS, or Windsor Internet Booking System, written by Art Rhyno. This package is a server-based scheduler for public PCs, and is based on another open source package called MRBS (Meeting Room Booking System). As such it should meet a demand among many libraries, and is in production use at Windsor Public Library and the University of Windsor. Like MRBS, it is built using Apache, PHP, and MySQL, and also provides an example of how open source licensing allows one programmer to reuse another's code for a different purpose. More information is available at http://wibs.sourceforge.net/

Work also continues on several open source integrated library systems. Generally, open source ILSs are probably best considered as up-and-coming but not quite ready for production use. Koha (http://www.koha.org/) has been in production use in its home library, the Horowhenua Library Trust, Levin, New Zealand, since the beginning of 2000, and development continues. OpenBook suffered a slight setback when its sponsoring organization, the Technology Resource Foundation, lost its funding. However, Willem Scholten, OpenBook's principal developer, is continuing development and will field-test the system in several Washington state-area libraries in spring 2002. OpenBook is MARC21-compliant, has a Web interface, supports multiple languages, includes a Z39.50 server and client, and is geared toward collections of up to 25,000 items. Avanti (http://www.avantilibrarysystems.com/) also continues development, and is being tested, although its author plans no further releases until development is significantly further than the current 0.4 release.

When is it "open source" and when is it not? And when does the difference matter?

Some recent developments have received attention with the open source moniker. While these developments are welcome, they unfortunately do not meet the definition of open source stated at opensource.org. How important this distinction is remains to be seen.

One is the recent announcement that OCLC would in the near future release its extensively developed and previously proprietary SiteSearch software package under what OCLC has described as an open source license (Binkley, 2001). As of this writing (early March 2002) the license itself has not yet been released, but OCLC has stated there will be two versions: one for non-commercial use, and one for commercial use. OCLC spent considerable resources developing this software, which it has sold for several years; this therefore represents a major shift of a major product from proprietary-commercial to open source software. It appears OCLC wishes to protect this investment, and the code it is sharing, from reuse in competing proprietary commercial products. Practically, this restriction should have little effect on libraries, which typically are non-profit organizations. However, the restriction is enough to prevent opensource.org from placing such a license on its list of approved licenses. Nevertheless, this decision represents a development which should be of interest to many commercial providers of library software.

Another type of license agreement, sometimes incorrectly referred to as "open source," allows a user who pays a membership fee to access source code. This model has also been called, perhaps more accurately, a "community source" license. In it, the community of users contributes a membership fee, which helps pay salaries of programmers and project managers who develop the package and may at their discretion accept and incorporate code contributed by members to the source code base. Outside the library world, this model was used successfully to develop the course management package Prometheus (recently sold to Blackboard). But community source is not the same as fee-based support for open source software, such as Red Hat provides for Linux, where the fee pays for support rather than for access to the source code.

Digital Library Federation

Several library conferences have included presentations about open source software in the past year, including LITA's National Forum, a symposium sponsored by the North Suburban Library System near Chicago, the American Library Association, and Access. But perhaps the most significant development in open source software for libraries, at least in the USA, was a meeting sponsored by the Digital Library Federation in October 2001. Organized by Aaron Trehub, Eric Lease Morgan, and Martin Halbert, and facilitated by Dan Greenstein of the DLF, the meeting's focus was on mainstreaming open source software in libraries. Many but not all of the attendees represented ARL libraries. There were two major outcomes of the meeting. The first is an effort to develop a portal approach to exhibiting library-related open source software, including the ability to sort or find products used by type of library and by type of project. The goal of such a project would be to help libraries determine what packages would be suitable for use. The intent was to build on the strengths and accomplishments of Dan Chudnov's www.oss4lib.org, rather than compete with this successful site, which has functioned for several years with minimal support. That idea has been prototyped by Eric Lease Morgan and Notre Dame Libraries' Digital Access and Information Architecture Department, with the intent of pursuing grant funding opportunities and building awareness of open source within some library circles. The second outcome was to sponsor, or help sponsor, a research project which would test a number of hypotheses about open source software. The report of this meeting can be found at http://www.diglib.org/architectures/ossrep.htm.

Clearly, open source software specific to meeting library needs is evolving. Despite some growing pains, open source is beginning to fulfill its potential for replacing increasingly restrictive software licenses and software subscriptions with a useful, viable alternative.

Notes

1. A fuller explanation of the benefits of open source software can be found in Bretthauer (2001).

2. Eric Schnell to David Bretthauer, personal mail, 1 March 2002.

3. Victoria Reich to David Bretthauer, personal e-mail, 4 March 2002.

4. See Raymond (1999). Obviously this is the topic of the entire essay, but especially see pages 29-30 and 37.

5. Steve Mitchell to David Bretthauer, personal e-mail, 24 January 2002.

David Bretthauer (dave.bretthauer@uconn.edu) is Network Services Librarian, University of Connecticut, Storrs, and vice-chair of the LITA Open Source Systems Interest Group.

References

Binkley, P. (2001), "SiteSearch to be open-sourced", jake-list@sourceforge.net 1 October 2001 and http://www.geocrawler.com/archives/3/6067/2001/10/016734724/ 2 March 2002.

Bretthauer, D. (2001), "Open source software in libraries", Library Hi Tech News, June, p. 8-9.

Chudnov, D. (2002), "Using Jake for 'Indexed in'", available at jake-list@lists.sourceforge.org 7 March and http://geocrawler.com/lists/3/SourceForge/11697/0/8040256/

Gomes, L. (2001), "E-business: Microsoft uses free code", The Wall Street Journal, 18 June.

Holbrook, T. (2001), "EBSCO titles update", jake-list@sourceforge.net 4 October 2001 and http://www.geocrawler.com/archives/3/11697/2001/10/0/6757782 2 March 2002.

(The) Internet Scout Portal Toolkit (2002), http://scout.cs.wisc.edu/research/SPT/main.html 2 March.

LOCKSS (2002), "Permanent publishing: local control of content delivered via the Web", http://lockss.stanford.edu/projectdescbrief.htm 1 March.

Raymond, E.S. (1999), "The cathedral and the bazaar", The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, O'Reilly & Associates, Sebastopol, CA.

Schnell, E. (2002), "Just released: Prospero 2.0", prosper@auto.med.ohio-state.edu http://auto.med.ohio-state.edu/pipermail/propspero/2002-February/000112.html
