Through the looking glass: envisioning new library technologies

Library Hi Tech News

ISSN: 0741-9058

Article publication date: 26 August 2014


Citation

Fernandez, P. (2014), "Through the looking glass: envisioning new library technologies", Library Hi Tech News, Vol. 31 No. 7. https://doi.org/10.1108/LHTN-07-2014-0057

Publisher: Emerald Group Publishing Limited



Article Type: Regular column
From: Library Hi Tech News, Volume 31, Issue 7

Archiving the Web

Libraries have long served as stewards of informational resources and cultural artifacts. Increasingly, however, materials are created and shared digitally, which presents a host of challenges for those who seek to preserve information. This column explores emerging tools that enable non-specialists to envision preservation for the modern era and highlights technological and ethical challenges in the preservation of digital artifacts.

A case in contrasts: preserving born-print publications, podcasts and blogs

Different mediums of publication and their surrounding infrastructure can have vastly different impacts on ease of preservation. To illustrate these differences, I explore the preservation of video gaming commentary from a print magazine, a related podcast and a blog to highlight some of the changes that occurred during the transition from print to electronic media.

In the late 1990s and early 2000s, Computer Gaming World (later renamed Games for Windows: The Official Magazine) was among the most popular computer gaming magazines. Even today, more than a thousand libraries have holdings for this title. Hundreds of these libraries hold print copies, while even more provide digital access through one of multiple vendors. As a result of this redundancy, and the ease with which libraries have been able to collect materials produced by large magazine publishers, the legacy of this magazine is relatively safe. This example represents the traditional model of preservation, and materials such as these will almost certainly be preserved for future historians and enthusiasts to discover and examine.

Computer Gaming World was published during a time of rapid change in the media landscape, and its affiliated podcasts helped popularize the podcasting medium within the gaming niche. Although most libraries do not retain copies of the magazine’s podcasts, the prospects for their preservation appear relatively bright. First, the company that owns this material continues to host the podcast, even though the Web site is no longer updated. Second, many podcast episodes are held on the servers of the Internet Archive (https://archive.org/). Finally, the podcasts are easily retrievable through peer-to-peer file sharing programs. In truth, the intellectual property issues surrounding this form of archiving are far from clear – it is not obvious, for example, that the Internet Archive has the legal right to host this podcast. Still, for most practical purposes, the podcast is well preserved.

At the same time, the example of a popular gaming blog – Fidgit – highlights the ease with which digital content can go from accessible to nearly lost overnight. Fidgit’s host, SyFy, terminated the site and chose not to host a public archive of its content. Readers looking for Fidgit’s video gaming content were redirected to SyFy’s television channel Web site, and hundreds of articles and reviews formerly contained on Fidgit are currently lost for most purposes. Other Fidgit articles remain accessible, but only through the cumbersome interface of the Internet Archive’s Wayback Machine (https://archive.org/web/ – more on this later).

These three examples illustrate three different preservation outcomes. The print magazine is archived in many places under a relatively simple legal doctrine and distribution method. Moreover, the magazine’s copyright holders allow vendors to provide electronic access to the publication.

The podcast, on the other hand, is more emblematic of modern challenges.

Because podcasts are distributed as complete files with accompanying metadata, peer-to-peer file sharing and third-party hosts can easily redistribute and archive copies of the material. However, podcasts are preserved in ways that are largely dependent on the permissiveness of their copyright holders and the culture surrounding podcasts. This means that copyright holders could challenge the existence of these archives.

In contrast, the blog simply vanished overnight, its disappearance facilitated by its very method of distribution. Instead of pushing out easily archivable copies of its content to readers, the blog was designed to bring users into the site, which was ever-changing. This characterizes the vast majority of Web sites available today. The Internet Archive, one of the world’s largest digital libraries, has made some of this site viewable. However, the archived copy has lost much of the site’s core functionality – most notably the ability to easily search old posts and follow links – and the context of many posts has been lost.

Meeting the challenges of digital preservation

These examples highlight the precarious nature of digital preservation and its dependence on technological innovation. The relatively seamless integration of features that make any given resource function while under active development can disappear under changing circumstances. For example, on any given Web site, versions of JavaScript, HTML, CSS and more can interact to create a single page; if that page is no longer actively maintained, the code may cease to function properly. Similarly, some site content – such as advertisements or images – may not be hosted on that site, thereby complicating preservation of the site’s content in its original context.
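To make the problem of off-site content concrete, the following is a minimal sketch in Python that lists the resources a page loads from hosts other than its own. It is not one of the tools discussed in this column. The class and function names and the example.com addresses are hypothetical, and the sketch relies only on Python's standard html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collect the URLs a page references through common resource tags."""

    # Tag name -> attribute that holds the resource URL.
    # Note: <link href> also covers non-stylesheet links (icons, feeds, etc.).
    RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            # Resolve relative references against the page's own URL.
            self.resources.append(urljoin(self.page_url, value))


def off_site_resources(page_url, html):
    """Return resource URLs served from a different host than the page."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    return [u for u in collector.resources if urlparse(u).netloc != page_host]
```

Anything this function returns lives on a server the site does not control, so an archive that copies only the site itself would likely miss it.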

In addition to these technological challenges, we confront two important practical questions: what, exactly, do we want to preserve – and who will preserve it? In a sense, parts of Fidgit have largely been preserved, but what has been lost is the interaction of content. Links to the site now redirect to an entirely different site. The ability to easily search Fidgit’s content and understand the interaction of its pages has been diminished.

Libraries and other institutions are currently developing a new generation of tools that enable users to apply basic traditions of librarian stewardship to a new era. These new tools – like Perma.cc, Zotero and Storify – often focus on units of content smaller than an entire site, which allows them to preserve the content in greater context than larger-scale efforts. Many tools are built on technology from larger institutions, but they are end-user focused and designed to empower individuals to curate the content that is important to them.

Perma.cc

Perma.cc (http://perma.cc/) tackles the simple but pervasive problem of link rot, which occurs when Internet hyperlinks lead to webpages that are no longer in existence or no longer contain the same content. Perma.cc counteracts this problem by enabling users to easily create a permanent web hyperlink. Originally conceived of in the context of law libraries – where research shows that almost half of links contained in US Supreme Court opinions and more than 70 per cent of links contained in the Harvard Law Review are now obsolete (Zittrain et al., 2014) – Perma.cc may become vital to preserving documents related to the core edifice of law.

The principle of creating permanent hyperlinks has applications that extend far beyond the law. When Perma.cc links to a site, it creates an electronic copy of the resource and any ephemera, such as pop-ups. Then, it generates a new URL pointing to the archived copy. Perma.cc is hosted through a collaboration of a number of libraries, which legitimizes the resource and helps ensure that it will continue into the future. If the myriad sites, forum posts and news outlets that linked to Fidgit posts had used similar technology to archive those links, the links would still function today.

Permanent hyperlinking tools serve as a building block to help us understand the future of preservation technology. Instead of trying to archive a large set of complicated interrelated webpages, Perma.cc preserves one specific connection at a specific point in time, thereby preserving the context of the link for the future.
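The two forms of link rot described above (a page that no longer exists, and a page that no longer contains the same content) can be told apart with a very small check. The sketch below is an illustration only and is not how Perma.cc works internally. The function names are hypothetical, and it assumes that a fingerprint of the page was recorded at the moment the link was first cited.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 digest of the page, recorded when the link is cited."""
    return hashlib.sha256(body).hexdigest()


def classify_link(status: int, body: bytes, recorded_digest: str) -> str:
    """Classify a later visit to a cited link.

    status is the HTTP status code of the later visit and body is the
    content that came back.
    """
    if status >= 400:
        return "missing"   # the page no longer exists (e.g. 404 or 410)
    if fingerprint(body) != recorded_digest:
        return "changed"   # the page exists but its content differs
    return "intact"
```

An exact digest is a deliberately strict test: a rotating advertisement or a changed timestamp would also count as "changed", so a practical tool would need a more forgiving comparison.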

Zotero

Zotero (http://www.zotero.org), a citation management system like EndNote, Mendeley or RefWorks, demonstrates that archival technology does not have to stand on its own. Tools like Zotero enable users to gather metadata (such as access date, Web site title, etc.) from Web sites, articles and books, and easily reformat that metadata in any number of ways (most commonly as citations for academic writing).

When Zotero gathers this basic information from a Web site, it also archives a copy of the Web site in its database. This ensures that the user can view the archived webpage to see how it looked when they originally viewed it (note that this tool also enables the user to view the archived version even without continual access to the Internet). Additionally, Zotero includes an internal editor, which allows users to annotate their copy of the Web site. In this way, archival technology complements and enhances tools that are designed to accomplish other tasks.
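The idea of gathering metadata once and reformatting it many ways can be sketched in a few lines of Python. This is an illustration of the general principle only: it does not use or reproduce Zotero's software, and the class and function names are hypothetical. The sample record is the Mashable article cited in this column.

```python
from dataclasses import dataclass
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


@dataclass
class WebCitation:
    """Metadata captured once, at the moment the page is saved."""
    author: str
    year: int
    title: str
    site: str
    url: str
    accessed: date


def full_reference(c: WebCitation) -> str:
    """Format the record as a reference-list entry."""
    accessed = f"{c.accessed.day} {MONTHS[c.accessed.month - 1]} {c.accessed.year}"
    return (f'{c.author} ({c.year}), "{c.title}", {c.site}, '
            f"available at: {c.url} (accessed {accessed}).")


def in_text(c: WebCitation) -> str:
    """Format the same record as a short in-text citation."""
    surname = c.author.split(",")[0]
    return f"({surname}, {c.year})"


record = WebCitation(
    author="Franceschi-Bicchierai, L.",
    year=2014,
    title="Facebook playing your feelings is legal but 'creepy,' say law experts",
    site="Mashable",
    url="http://mashable.com/2014/07/01/facebook-emotions-study-legal/",
    accessed=date(2014, 7, 11),
)
```

Because the access date is stored as data rather than as text, the same record can feed any citation style without being retyped.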

Storify: a glimpse into the future

Storify (https://storify.com/) points the way toward a more dynamic understanding of the preservation of context. This tool allows users to take content from a variety of sources, including popular social media platforms (such as Twitter, Instagram and Facebook), and remix it into customized timelines, which they can then share with their friends. Users, therefore, have the ability to contextualize and recontextualize digital content.

Librarians have used Storify as a digital curation tool to tell stories that are relevant to their institution. For example, a library can Storify a timeline of tweets alongside YouTube videos and images of a library event or give a virtual tour of their library alongside user comments posted to Twitter.

Lessons from Storify

Storify demonstrates how the concept of stewardship extends beyond information storage. With Storify, users can curate materials to highlight and display aspects that seem most relevant to them. Digital resources are constantly evolving, and the possibilities for which aspects to display – and in what context – are nearly endless.

Storify also integrates with social media, an important step toward conceptualizing the future of preservation. By enabling users to place video content alongside content from other social media, Storify can be used to capture content that is disconnected in both time and origin. For example, while a captured YouTube page will display related videos that were chosen by YouTube, Storify can capture reactions and responses from across social media, allowing future readers to understand connections between content from Twitter, Facebook, YouTube and other sources. Thus, Storify captures the video’s significance to its curator.

Storify’s integration with social media is important because an ever-increasing category of digital content is being produced through this medium. Social media tools enable users to create and share content, but that content is locked within proprietary systems, which makes conventional archiving difficult. In the past, primary sources like personal letters and diaries were relatively easy to preserve. However, as correspondence and public discourse have moved online, this shift has created a variety of preservation challenges. Although Storify does not resolve many of the problems surrounding the preservation of quasi-public discourse in social media, it does engage with these issues and helps to demonstrate the value of such content, which may one day be a vital part of our historical record.

Storify also raises questions about what context is important to preserve. For example: Do we need to know which ads were present alongside a post to understand the post? Do we need to know what news stories were covered that day to understand a Twitter conversation?

Finally, Storify engages non-librarians through its ease of use and provision of immediate reward mechanisms for sharing and curating content. As a result, Storify’s tremendous crowd sourcing potential could allow for more participatory forms of curation. Storify is not primarily a preservation tool, and its content could disappear based on lack of funding. However, as a curation tool, Storify can help us envision what might happen if straightforward curation tools are integrated with more traditional archiving infrastructure.

Institutional backbones

Although this column focuses on technology that can empower individuals, rather than on institutional preservation efforts, these two often complement one another. Institutions such as the Internet Archive, the Internet Memory Foundation (http://internetmemory.org) and Memento (http://www.mementoweb.org/) have built the technological infrastructure that enables large-scale archiving and have developed end-user tools for individuals.

For example, people can use Memento – through either a browser plug-in or the mobile web browser – to view past versions of a Web site, which may bear little resemblance to how it looks today. Similarly, the Wayback Machine, which is hosted by the Internet Archive and is one of the most visible instances of online webpage archives, has made copies of more than 400 billion web pages. The archived pages are now freely accessible, either through a web interface or as an integrated browser add-on. The Internet Archive seeks to provide archival copies of all forms of digital materials. It also hosts a number of tools and initiatives, including a broken link checker, a Save Page Now tool (similar to Perma.cc) and an API for its Wayback Machine.
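As a rough illustration of what the Wayback Machine API makes possible, the sketch below asks whether an archived copy of a page exists near a given date. The endpoint and the shape of the JSON response reflect the Internet Archive's public availability API as I understand it and should be checked against the current documentation; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the query URL; timestamp is YYYYMMDD, optionally with hhmmss."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the URL of the closest archived copy, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url, timestamp=None):
    """Query the live service (requires network access)."""
    with urlopen(availability_query(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

A library Web page could call lookup() for each outbound link that has stopped working and offer readers the archived copy instead.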

The Internet Memory Foundation is developing the open source technological infrastructure that will facilitate even better archiving over time. As this happens, the Internet may become a more malleable resource, where the current version of a Web site is only a starting point for exploring its past, and defunct Web sites will become increasingly findable.

Intellectual property and ethical pitfalls

One major limiting factor in the growth of these tools is the complexity of the relevant intellectual property law. Potential archivists must not only contend with local regulations, but they must also consider international law if they want to share their efforts. Legal concepts pertaining to ownership, artistic and moral rights, fair use and the transformative nature of archiving are of crucial importance. Many Web sites also have terms of service agreements that bring in contract law.

Attempts to archive and display information contained in private social media accounts or online forum postings and blogs will undoubtedly provide a tremendous service to future historians. But such activities could also violate the implicit boundaries of users. For example, Facebook was recently involved in a study that tweaked content on some of its users’ feeds to determine how the content would impact its users’ emotions. That most analysts view this research as legal has not prevented many users from becoming upset and feeling violated (Franceschi-Bicchierai, 2014).

In the process of thinking through preservation, libraries must not only consider questions of legality, but also of ethics. For example, if a user decides to take down their personal Web site, should that site be preserved even if the owner was unaware that the site’s robots.txt file gave permission for it to be archived? Consider another example. Tweets are often made in public with the implicit understanding that they function in the moment, are read primarily by a small audience of followers and are usually ignored soon thereafter. Preserving these tweets and making them accessible in different contexts may transform them in ways that violate their creators’ assumptions. Such questions become even more complicated when contemplating sources of information that are less clearly public, such as e-mail or private social networks.
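The robots.txt question is easier to weigh once it is clear how such a file is read. The sketch below uses Python's standard urllib.robotparser module to ask whether a given crawler may fetch a page. The function name and example.com addresses are hypothetical, and "ia_archiver" is used on the assumption that it is the crawler name that sites have historically used to address the Internet Archive.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A file that places no restrictions on any crawler.
PERMISSIVE = "User-agent: *\nDisallow:\n"

# A file that opts out of one crawler (name assumed) and allows all others.
OPT_OUT = "User-agent: ia_archiver\nDisallow: /\n\nUser-agent: *\nDisallow:\n"
```

Under this parser an empty robots.txt is treated as permission, which is the crux of the ethical problem: for an owner who never thought about the file, silence is read as consent.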

Conclusion

As these curation and archival tools develop, they will continue to transform the nature of what is possible. As this change occurs, librarians can assist patrons in using practical archival tools that meet their needs and taking advantage of large-scale archives that already exist. Librarians will also be able to participate in systematic permission-based preservation of digital materials more easily than ever before.

As these tools become more integrated into the fabric of the Internet, libraries are equally well positioned to help inform their users about potential implications and pitfalls. Librarians can use their expertise about how information is structured to preserve existing resources and to help share these resources with their patrons. Additionally, librarians can empower patrons to help archive digital materials that are important to them in the context that matters to them.

Peter Fernandez

References

Franceschi-Bicchierai, L. (2014), “Facebook playing your feelings is legal but ‘creepy,’ say law experts”, Mashable, available at: http://mashable.com/2014/07/01/facebook-emotions-study-legal/ (accessed 11 July 2014).

Zittrain, J., Albert, K. and Lessig, L. (2014), “Perma: scoping and addressing the problem of link and reference rot in legal citations”, Harvard Law Review Forum, Vol. 127, pp. 176-199.

Peter Fernandez (pfernand@utk.edu) is based at Research Services Agricultural Sciences & Natural Resources, University of Tennessee, Knoxville, Tennessee, USA
