Archiving Websites: A Practical Guide for Information Management Professionals

Alastair G. Smith (Victoria University of Wellington, Wellington, New Zealand)

The Electronic Library

ISSN: 0264-0473

Article publication date: 20 November 2007

Citation

Smith, A.G. (2007), "Archiving Websites: A Practical Guide for Information Management Professionals", The Electronic Library, Vol. 25 No. 6, pp. 794-795. https://doi.org/10.1108/02640470710837245

Publisher: Emerald Group Publishing Limited

Copyright © 2007, Emerald Group Publishing Limited


Libraries have always seen preservation of the documents they deal with as one of their key functions. Now that many of the “documents” we deal with are web based, web archiving is a significant concern in the library and related communities. The statistics are worrying: a web page is like a highly unstable isotope, with an estimated half-life of two years, and “link rot” means that significant numbers of citations to websites are no longer useful after a few years.

Preserving websites presents challenges that do not occur with traditional materials. A website changes from day to day, so multiple versions may need to be preserved. Website technology changes rapidly, so that viewing a five‐year‐old website with a modern browser may not recreate the original experience. Also, carefully developed archiving procedures may become obsolete when new paradigms of web publishing, such as blogging, appear. The interconnected nature of the web means that preserving a single page may be meaningless without access to the pages to which it links. The sheer volume – in excess of ten billion pages and growing – poses both technological and organisational challenges.

In Archiving Websites, Adrian Brown, Head of Digital Archiving at the United Kingdom National Archives, takes on these challenges. He starts with an overview of the development of web archiving, looking at projects such as the Internet Archive (the famous “Wayback Machine”), the Nordic Web Archive, and Australia's Pandora project. Brown then works systematically through the process of archiving. Starting with selection and collection, he considers selective archiving (favoured by a number of national library based projects) and the unselective approach of the Internet Archive. He then moves on to quality assurance and cataloguing, including the need for appropriately applied metadata. The chapter on preservation looks at the preservation of physical media, but also the more important issue of migration and emulation to preserve content through changing technologies. The last stage considered is delivery to users – how users can access, search and browse the archived content. A separate chapter considers the overall management model for a web archiving programme. The book finishes by considering legal issues such as intellectual property and privacy, and makes some guesses about future trends. This last is difficult – as Brown points out, the advent of a more interactive Web 2.0 makes the notion of fixed content, which can be archived, increasingly obsolete.

As well as discussing principles, Brown includes numerous case studies from the UK National Archives and elsewhere. Additional material includes a brief glossary, an index, and appendices with details of useful software tools, model forms, checklists, job descriptions, etc. Archiving Websites is a compact but thorough survey of the challenges of developing a web archiving programme and will repay reading by any librarian or information manager concerned about the long‐term viability of the web‐based information that is now our stock in trade.
