Tackling Internet Infoglut. How Can Internet Researchers Keep Sane?

Library Hi Tech News

ISSN: 0741-9058

Article publication date: 1 May 1999

Citation

Kennewick, M. (1999), "Tackling Internet Infoglut. How Can Internet Researchers Keep Sane?", Library Hi Tech News, Vol. 16 No. 5. https://doi.org/10.1108/lhtn.1999.23916eaf.001

Publisher: Emerald Group Publishing Limited

Copyright © 1999, MCB UP Limited


Introduction

In 1936, H.G. Wells envisioned a large encyclopedia similar to the Internet's World Wide Web. Today, with an estimated 61 million Internet users nationwide, it appears that his vision has become a reality. However, Wells failed to anticipate how the rise of the Internet would create the problem of information overload (Shillingford, 1998). A recent Reuters survey of 1,300 managers found that almost half believed the Internet will be a prime cause of information overload over the next few years (King, 1996). Sheer volume demonstrates this: by late 1997, there were more than 640,000 Web sites and 100 million pages on the Web, a number that continues to double every six months.

"Infoglut" is the new term to describe our constant saturation by too much information. As Nobel laureate and economist Herbert Simon once observed: "A wealth of information creates a poverty of attention" (Simon, 1999). In other words, with so much out there, how do you know what to pay attention to? And once you decide what is important, then how do you manage the information for future use? These are the problems of infoglut: an overabundance of information met by under-equipped consumers.

But infoglut has its good side. Despite the problems caused by infoglut, most managers said they could not operate effectively without a high level of information, and most found the Internet to be a prime source for that information. They want the information, but they want to stay sane while they try to find, organize, track, file, boil down, index, refer to and share it.

Whether people consider themselves "researchers" or not, a primary use of all this information on the Web is for research of all stripes. There are public databases with government and business information, such as trademark databases or tax records. There is competitive information, culled from manufacturers' Web sites, research reports and magazine articles. PR firms do coverage research, job seekers do employer research, consumers research major purchases, and patients and doctors look up the latest medical research. This variety and usefulness of Internet information is a boon, and is shaking up old ways of working.

The fact that the Internet is revolutionizing our world is not surprising. What is surprising is how unprepared we are to take advantage of the wealth of information available to us. It is up to the industry that spawned the beast called infoglut to come up with tools for dealing effectively with it.

People trying to conduct research on the Web (an activity more than 80 percent of Internet users cite as one of their primary reasons for going online) have two goals as they consume information: to discover the right information, and then to use it to make decisions. So how can people resolve the paradox of too much information and too few tools to help them manage it all?

Solving the First Problem: Information Is Hard to Find

For anyone trying to conduct research, the Web has just too many places to look. Today's search engines sort Web resources in such a broad manner that a single query can find literally thousands of hits, many of them with no relevance to the intended search topic. In addition, according to a study conducted by the NEC Research Institute, at best, only 35 percent of all web content is indexed by the search engines, leaving a large proportion of content basically unfindable (Lawrence and Giles, 1998).

Thus, researchers need a more intuitive and broader-reaching way to use the Web, one that is more like a "how-to book" than a large, indexed encyclopedia. For the discovery portion of the process, search tools need to become simpler, smarter and more granular.

Some progress has been made in making information easier to find by reducing the number of Web site hits returned by each query. The Ask Jeeves search site, for example, includes a pattern-matching mechanism that allows users to conduct searches using natural language and retrieves a relatively short list of relevant Web sites. Metasearch sites, such as Infoseek Express, offer another approach to refining the search process.

Portal sites have also been working on ways to make online information easier for researchers to find. For example, Yahoo! has created an extensive list of Web-site hierarchies based on its own identified list of topics. By clicking on Recreation, for example, the user is given the next level of subcategories to choose from, including chat, dating and dance. However, Yahoo!'s preselected categories might not be arranged in a pattern that makes sense to the user.

So to further personalize the searching process, there have been some efforts by portal sites such as "My Yahoo!" to give users a search experience that remembers preferences and favorite sites. These features can save time during the search process because, through the pre-set portal, the search engine remembers what the user wants. It does not need to be reconfigured before every inquiry.

The next generation of Internet formatting standards after HTML (hypertext markup language), called XML (eXtensible Markup Language), also offers some relief. XML will enable Web page designers to tag specific kinds of content with standard classifications, such as by subject matter or industries. By enabling more effective identification of content, retrieval promises to become a more efficient process.
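To make the idea of tagged content concrete, here is a minimal sketch in Python. The element and attribute names (catalog, page, subject, industry) and the example.com addresses are invented for illustration; they are assumptions, not part of any published XML vocabulary. The sketch only shows how a retrieval tool might filter pages by their classification tags rather than by matching words in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are assumptions
# made up for this sketch, not a real tagging standard.
CATALOG = """
<catalog>
  <page url="http://example.com/a" subject="medicine" industry="healthcare">
    <title>Clinical trial results</title>
  </page>
  <page url="http://example.com/b" subject="trademarks" industry="legal">
    <title>Trademark database guide</title>
  </page>
  <page url="http://example.com/c" subject="medicine" industry="pharmaceutical">
    <title>Drug interaction tables</title>
  </page>
</catalog>
"""


def find_pages(xml_text, **criteria):
    """Return (url, title) pairs for pages whose tags match every criterion."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for page in root.iter("page"):
        # A page qualifies only if each requested attribute has the requested value.
        if all(page.get(name) == value for name, value in criteria.items()):
            matches.append((page.get("url"), page.findtext("title")))
    return matches


medicine_pages = find_pages(CATALOG, subject="medicine")
```

Because the query runs against the classification rather than the page text, a search for "medicine" here returns only the two pages tagged with that subject, however often the word appears elsewhere.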

After the Search: How Do You Organize or Share Findings?

Discovery is one part of the infoglut problem. Usage is the other, less well-defined half. End-users today lack any effective way to organize, track or share retrieved Web information. The lack of a way to use information already accumulated is a big problem. Most people today use their browser's tools to bookmark URLs (Uniform Resource Locators), or they print out Web pages and sort them into piles, but these methods have severe limitations.

For example, a bookmarked link to a daily news site shows only the current day's news, not necessarily the article the user intended to save. If the page is updated frequently, the URL can become meaningless. A bookmarked URL can also lead to an invalid or obsolete site (the NEC study cited earlier found that some search engines returned invalid links 5 percent of the time).

Similarly, the manual method of printing out Web pages and putting the hard copies in folders is only a partial solution, offering no keyword search, quick link, abstract, index or other management features possible through digital technology.

While some point to "knowledge management" software as a solution to the infoglut problem, this category of tools is aimed not at the day-to-day Internet surfer but at professionals mining the corporate data of large enterprises. These large data-mining analysis tools cannot address Web research issues. However, the concept introduced by the knowledge management software providers (the need to turn information into actionable knowledge easily available to the right people) is as applicable to Web research as it is to large data-mining applications.

What people doing research on the Web really want are tools that make their everyday lives easier. They need tools to handle the volume of information they are tracking, ways to organize information logically, and ways to find it again later, classify it and reuse it.

Moving Toward the Ideal Research Tool

What end-users need is a set of individual tools to help organize Web research in a topical way, then help them manage it and easily share that information with others. In other words, users need an "after the search" solution to the infoglut problem. Such tools would allow users not only to reference Web sites but also to collect or link all the information together in an associative manner, rather like a personal Internet composed of HTML pages and hot links.

Providing Web researchers with individual research collection tools would help alleviate two of the Internet's greatest problems: too much information and no way to manage it all.

Research tools that solve the problems of information consumption need to help people remember where a site is, and refresh it or not, depending on whether the information needed is static or dynamic. Alternatively, a tool that lets people choose whether to be informed when a site is updated, and whether to refresh it at that time, would be useful. People also want to annotate their selections, making note of what is there and why they chose it. People may want just a piece of information, or the entire site, and they want the flexibility to choose which they save. They want the ability to retain information, show changes, and see trends over time.

Ideally, these tools will allow users to compare old and new information by keeping a record of content as it changes. Being able to track changes in Web information over time is particularly important for disciplines that need to capture trends indicated by changes in data.
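As a rough illustration of keeping a record of content as it changes, the following Python sketch stores dated snapshots of a page and reports what differs between the two most recent ones. The class name SnapshotHistory, its methods, and the sample text are hypothetical, invented for this sketch; fetching the page from the Web is left out, so the caller supplies the text directly.

```python
import difflib
import hashlib


class SnapshotHistory:
    """Keeps dated copies of a page's text and reports what changed (illustrative only)."""

    def __init__(self):
        self._snapshots = {}  # url -> list of (label, text), oldest first

    @staticmethod
    def _digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def record(self, url, label, text):
        """Store a snapshot; return False if the text is unchanged since the last one."""
        history = self._snapshots.setdefault(url, [])
        if history and self._digest(history[-1][1]) == self._digest(text):
            return False  # static content: nothing new to keep
        history.append((label, text))
        return True

    def changes(self, url):
        """Return the removed (-) and added (+) lines between the last two snapshots."""
        history = self._snapshots.get(url, [])
        if len(history) < 2:
            return []
        (old_label, old_text), (new_label, new_text) = history[-2], history[-1]
        diff = list(difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(),
            fromfile=old_label, tofile=new_label, lineterm="",
        ))
        # Skip the two file-header lines, then keep only added or removed lines.
        return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

A researcher tracking a price list, for instance, would call record each time the page is visited and changes to see which figures moved since the previous visit.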

Then, people will want to have abstracts for the information, index it in the method they desire, and store the document via keywords that make sense to them. In effect, they want a personal Internet librarian: someone, or some tool, to organize and manage their research, and serve it up at a moment's notice.
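One way to picture such a personal librarian is a small keyword index kept alongside each saved page. The Python sketch below is only an assumption about how a tool of this kind might work; the names SavedItem and PersonalIndex, and the example.com addresses, are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SavedItem:
    url: str
    abstract: str
    keywords: set = field(default_factory=set)


class PersonalIndex:
    """Stores saved pages with an abstract and user-chosen keywords (illustrative only)."""

    def __init__(self):
        self._items = {}                     # url -> SavedItem
        self._by_keyword = defaultdict(set)  # keyword -> set of urls

    def save(self, url, abstract, keywords):
        normalized = {k.strip().lower() for k in keywords}
        # If the page was saved before, drop its old keyword entries first.
        previous = self._items.get(url)
        if previous is not None:
            for keyword in previous.keywords:
                self._by_keyword[keyword].discard(url)
        self._items[url] = SavedItem(url, abstract, normalized)
        for keyword in normalized:
            self._by_keyword[keyword].add(url)

    def lookup(self, *keywords):
        """Return the URLs filed under every one of the given keywords, sorted."""
        wanted = [k.strip().lower() for k in keywords]
        if not wanted:
            return []
        url_sets = [self._by_keyword.get(k, set()) for k in wanted]
        return sorted(set.intersection(*url_sets))

    def abstract_for(self, url):
        return self._items[url].abstract
```

Because the keywords are the user's own, the index reflects how that person thinks about the material rather than how a portal has chosen to categorize it.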

Using these tools, researchers will be able to share their Web information in associative formats to create mini Web "user guides" or templates. For example, an organization could create a "how-to" guide for researching a city's resources for newly relocated employees. The template might include links to finding temporary housing, statistics on school quality and ratings on the best real estate agents. Alternatively, a researcher can create a personal digest or report, keeping tabs on sources and links. Basically, these tools let people create their own personal search indices and categories that map back to how they think.

The Problem Has Been Identified. The Solutions Are Next

Companies that learn how to tame infoglut will gain a considerable competitive advantage. In the next five years, it will become even more critical for organizations to learn how to use Internet resources to the fullest, because our "anytime, anywhere" information society is only going to accelerate. There will also be more interactive information on the Web and users will be able to access information no matter where they are, via telephone, digital assistant, car, hotel room or their own desktop.

So look for tools that address the infoglut problem, on both the consumption and the usage side, to become pervasive. Once we have them, we'll all become saner.

References

King, J. (1996), "Information overload threatens managers' health", Computerworld, 17 October, http://www.computerworld.com/home/online9697.nsf/all/9610147information

Lawrence, S. and Giles, L. (1998), "Searching the World Wide Web", Science, Vol. 280 No. 5360, p. 98.

Shillingford, J. (1998), "The evolving office environment", Financial Times, p. 4.

Simon, H. (1999), Cybernation, 15 March, http://www.cybernation.com/victory/quotations/authors/quotes_simon_herbert.html

Mike Kennewick is CEO of Webforia, an Internet software company in Bellevue, Washington, addressing the problems of infoglut. http://www.webforia.com or (425) 401-6500.