Guide to the professional literature

Online Information Review

ISSN: 1468-4527

Article publication date: 2 October 2007

Citation

(2007), "Guide to the professional literature", Online Information Review, Vol. 31 No. 5. https://doi.org/10.1108/oir.2007.26431eae.002

Publisher: Emerald Group Publishing Limited

Copyright © 2007, Emerald Group Publishing Limited


This column is designed to alert readers to pertinent wider journal literature on digital information and research.

The Accessibility of Chinese Local Government Web Sites: An Exploratory Study

Shi, Y.Q. in Government Information Quarterly, Vol. 24, No. 2, 2007, pp. 377-403

With the rapid development of Chinese e-government, Chinese citizens are encouraged to access e-government services at their convenience. However, the accessibility of Chinese e-government web sites has been overlooked. This research study tries to provide an overview of the accessibility of Chinese local government web sites. Three hundred twenty-four Chinese local government web sites were examined to find out how accessible they are with reference to the Web Content Accessibility Guidelines 1.0 (WCAG) published by the World Wide Web Consortium (W3C). This research found that all the surveyed Chinese e-government web sites failed one or more of the W3C’s accessibility measures, and thus many disabled Chinese people may have substantial problems accessing them. Several valuable recommendations are made based on the research findings and China’s actual conditions.

An Assessment of the Usability of an Internet-Based Education System in a Cross-Cultural Environment: The Case of The Interreg Crossborder Program in Central Europe

Blazic, B.J., Law, E.L.C., and Arh, T. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 1, 2007, pp. 66-75

In this article, we assess the usability of an internet-based system for e-learning in a cross-cultural environment. The context of the evaluation and testing was a training program launched with the intention of introducing and promoting a new way of learning about and understanding the emerging technologies in regions with a low educational level and a high unemployment rate. The aim of the study was to assess the usability of the e-learning system using different methods and approaches, in order to obtain a sound assessment of its learnability and applicability in various circumstances.

Automatic Cognitive Style Identification of Digital Library Users for Personalization

Frias-Martinez, E., Chen, S.Y., and Liu, X.H. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 2, 2007, pp. 237-251

Digital libraries have become one of the most important web services for information seeking. One of their main drawbacks is their global approach: In general, there is just one interface for all users. One of the key elements in improving user satisfaction in digital libraries is personalization. When considering personalizing factors, cognitive styles have been shown to be one of the relevant parameters that affect information seeking. This justifies the introduction of cognitive style as one of the parameters of a personalized web service. Nevertheless, this approach has one major drawback: Each user has to take a time-consuming test that determines his or her cognitive style. In this article, we present a study of how different classification systems can be used to automatically identify the cognitive style of a user from the set of interactions with a digital library. These classification systems can be used to automatically personalize, from a cognitive-style point of view, the interaction between the digital library and each of its users.

Can Interactivity Make a Difference? Effects of Interactivity on the Comprehension of and Attitudes toward Online Health Content

Lustria, M.L.A. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 6, 2007, pp. 766-776

The internet is increasingly being recognized for its potential for health communication and education. The perceived relative advantage of the internet over other media is its cost-effectiveness and interactivity, which in turn contribute to its persuasive capabilities. Ironically, despite this potential, we are still far from understanding how interactivity affects the processing of health information and what it contributes to health outcomes. An experiment was conducted to examine the effects of web interactivity on comprehension of and attitudes towards two health web sites, and whether individual differences might moderate such effects. Two sites on skin cancer were designed with different levels of interactivity, and 441 undergraduate students (aged 18-26) at a large southeastern university were randomly assigned to one of them. The findings suggest that interactivity can significantly affect comprehension as well as attitudes towards health web sites. The article also discusses insights into the role of interactivity in online health communications, and presents implications for the effective design of online health content.

Checking out Facebook.Com: The Impact of a Digital Trend on Academic Libraries

Charnigo, L., and Barnett-Ellis, P. in Information Technology and Libraries, Vol. 26, No. 1, 2007, pp. 23-34

While the burgeoning trend in online social networks has gained much attention from the media, few studies in library science have yet addressed the topic in depth. This article reports on a survey of 126 academic librarians concerning their perspectives toward Facebook.com, an online network for students. Findings suggest that librarians are overwhelmingly aware of the “Facebook phenomenon”. Those who are most enthusiastic about the potential of online social networking suggested ideas for using Facebook to promote library services and events. Few individuals reported problems or distractions as a result of patrons accessing Facebook in the library. When problems have arisen, strict regulation of access to the site seems unfavorable. While some librarians were excited about the possibilities of Facebook, the majority surveyed appeared to consider Facebook outside the purview of professional librarianship.

Children as Architects of Web Directories: An Exploratory Study

Bar-Ilan, J., and Belous, Y. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 6, 2007, pp. 895-907

Children are increasingly using the web. Cognitive theory tells us that directory structures are especially suited for information retrieval by children; however, empirical results show that they prefer keyword searching. One of the reasons for these findings could be that the directory structures and terminology are created by grown-ups. Using a card-sorting method and an enveloping system, we simulated the structure of a directory. Our goal was to try to understand what browsable, hierarchical subject categories children create when suggested terms are supplied and they are free to add or delete terms. Twelve groups of four children each (fourth and fifth graders) participated in our exploratory study. The initial terminology presented to the children was based on names of categories used in popular directories, in the sections on Arts, Television, Music, Cinema, and Celebrities. The children were allowed to introduce additional cards and change the terms appearing on the 61 cards. Findings show that the different groups reached reasonable consensus; the majority of the category names used by existing directories were acceptable to them and only a small minority of the terms caused confusion. Our recommendation is to include children in the design process of directories, not only in designing the interface but also in designing the content structure.

Defining a Session on Web Search Engines

Jansen, B.J., Spink, A., Blakely, C., and Koshman, S. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 6, 2007, pp. 862-871

Detecting query reformulations within a session by a web searcher is an important area of research for designing more helpful searching systems and targeting content to particular users. Methods explored by other researchers include both qualitative (i.e. the use of human judges to manually analyze query patterns on usually small samples) and nondeterministic algorithms, typically using large amounts of training data to predict query modification during sessions. In this article, we explore three alternative methods for detection of session boundaries. All three methods are computationally straightforward and therefore easily implemented for detection of session changes. We examine 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005. We compare session analysis using: Internet Protocol address and cookie; Internet Protocol address, cookie, and a temporal limit on intrasession interactions; and Internet Protocol address, cookie, and query reformulation patterns. Overall, our analysis shows that defining sessions by query reformulation along with Internet Protocol address and cookie provides the best measure, resulting in an 82 percent increase in the count of sessions. Regardless of the method used, the mean session length was fewer than three queries, and the mean session duration was less than 30 min. Searchers most often modified their query by changing query terms (nearly 23 percent of all query modifications) rather than adding or deleting terms. Implications are that for measuring searching traffic, unique sessions may be a better indicator than the common metric of unique visitors. This research also sheds light on the more complex aspects of web searching involving query modifications and may lead to advances in searching tools.
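The three session-boundary rules compared in this study can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' code: the Interaction record and the sessionize function are hypothetical, the term-overlap test is a crude stand-in for the paper's "query reformulation patterns", and the 30-minute gap in the usage note is only an example value, not the temporal limit the authors used.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Interaction:
    """Hypothetical log record: one query submission."""
    ip: str
    cookie: str
    timestamp: float  # seconds since some fixed origin
    query: str


def sessionize(interactions, max_gap_seconds=None, use_term_overlap=False):
    """Split interactions into sessions.

    Rule 1 (default): one session per (IP address, cookie) pair.
    Rule 2: additionally start a new session when the gap between
            consecutive interactions exceeds max_gap_seconds.
    Rule 3 (assumed simplification): additionally start a new session when
            the new query shares no terms with the previous query.
    """
    def user_key(i):
        return (i.ip, i.cookie)

    # Sorting by user and then time makes each user's interactions contiguous,
    # which groupby requires.
    ordered = sorted(interactions, key=lambda i: (i.ip, i.cookie, i.timestamp))
    sessions = []
    for _, group in groupby(ordered, key=user_key):
        current = []
        for item in group:
            if current:
                prev = current[-1]
                too_long = (max_gap_seconds is not None
                            and item.timestamp - prev.timestamp > max_gap_seconds)
                new_topic = use_term_overlap and not (
                    set(item.query.lower().split()) & set(prev.query.lower().split()))
                if too_long or new_topic:
                    sessions.append(current)
                    current = []
            current.append(item)
        sessions.append(current)
    return sessions
```

Calling `sessionize(log)`, `sessionize(log, max_gap_seconds=1800)` and `sessionize(log, use_term_overlap=True)` on the same log gives the three session counts to compare; the stricter rules can only split sessions, never merge them, which is consistent with the abstract's report of a higher session count under the reformulation rule.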

Development of Measures of Online Privacy Concern and Protection for Use on the Internet

Buchanan, T., Paine, C., Joinson, A.N. and Reips, U.D. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 2, 2007, pp. 157-165

As the internet grows in importance, concerns about online privacy have arisen. The authors describe the development and validation of three short internet-administered scales measuring privacy-related attitudes (Privacy Concern) and behaviors (General Caution and Technical Protection). In Study 1, 515 people completed an 82-item questionnaire from which the three scales were derived. In Study 2, scale validity was examined by comparing scores of individuals drawn from groups considered likely to differ in privacy-protective behaviors. In Study 3, correlations between the scores on the current scales and two established measures of privacy concern were examined. The authors conclude that these scales are reliable and valid instruments suitable for administration via the internet, and present them for use in online privacy research.

Does Topic Metadata Help with Web Search?

Hawking, D., and Zobel, J. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 5, 2007, pp. 613-628

It has been claimed that topic metadata can be used to improve the accuracy of text searches. Here, we test this claim by examining the contribution of metadata to effective searching within web sites published by a university with a strong commitment to and substantial investment in metadata. The authors use four sets of queries, a total of 463, extracted from the university’s official query logs and from the university’s site map. The results are clear: The available metadata is of little value in ranking answers to those queries. A follow-up experiment with the web sites published in a particular government jurisdiction confirms that this conclusion is not specific to the particular university. Examination of the metadata present at the university reveals that, in addition to implementation deficiencies, there are inherent problems in trying to use subject and description metadata to enhance the searchability of web sites. Our experiments show that link anchor text, which can be regarded as metadata created by others, is much more effective in identifying best answers to queries than other textual evidence. Furthermore, query-independent evidence such as link counts and uniform resource locator (URL) length, unlike subject and description metadata, can substantially improve baseline performance.

E-Mail Interviewing in Qualitative Research: A Methodological Discussion

Meho, L.I. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 10, 2006, pp. 1284-1295

This article summarizes findings from studies that employed electronic mail (e-mail) for conducting in-depth interviewing. It discusses the benefits of, and the challenges associated with, using e-mail interviewing in qualitative research. The article concludes that while a mixed-mode interviewing strategy should be considered when possible, e-mail interviewing can in many cases be a viable alternative to face-to-face and telephone interviewing. A list of recommendations for carrying out effective e-mail interviews is presented.

Enriching E-Learning Metadata through Digital Library Usage Analysis

Ferran, N., Casadesus, J., Krakowska, M., and Minguillon, J. in The Electronic Library, Vol. 25, No. 2, 2007, pp. 148-165

The purpose of this research is to propose an evaluation framework for analysing learning objects usage, with the aim of extracting useful information for improving the quality of the metadata used to describe the learning objects, but also for personalization purposes, including user models and adaptive itineraries. The paper presents experimental results from the log usage analysis of two different subjects, taken by 350 students, during one academic semester. The experiment examines raw server log data generated from the interactions of the students with the classroom learning objects, in order to find relevant information that can be used to improve the metadata used for describing both the learning objects and the learning process. Preliminary studies have been carried out in order to obtain an initial picture of the interactions between learners and the virtual campus, including both services and resources usage. These studies try to establish relationships between user profiles and their information and navigational behavior in the virtual campus, with the aim of promoting personalization and improving the understanding of what learning in virtual environments means.

An Examination of Searcher’s Perceptions of Nonsponsored and Sponsored Links during E-commerce Web Searching

Jansen, B.J. and Resnick, M. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 14, 2006, pp. 1949-1961

This article reports results of an investigation into the effect of sponsored links on ecommerce information seeking on the web. In this research, 56 participants each engaged in six ecommerce web searching tasks. We extracted these tasks from the transaction log of a web search engine, so they represent actual ecommerce searching information needs. Using 60 organic and 30 sponsored web links, the quality of the web search engine results was controlled by switching nonsponsored and sponsored links on half of the tasks for each participant. This allowed for investigating the bias toward sponsored links while controlling for quality of content. The study also investigated the relationship between searching self-efficacy, searching experience, types of ecommerce information needs, and the order of links on the viewing of sponsored links. Data included 2,453 interactions with links from result pages and 961 utterances evaluating these links. The results of the study indicate that there is a strong preference for nonsponsored links, with searchers viewing these results first more than 82 percent of the time. Searching self-efficacy and experience do not increase the likelihood of viewing sponsored links, and the order of the result listing does not appear to affect searcher evaluation of sponsored links. The implications for sponsored links as a long-term business model are discussed.

Faceted Classification in Web Information Architecture – A Framework for Using Semantic Web Tools

Uddin, M.N., and Janecek, P. in The Electronic Library, Vol. 25, No. 2, 2007, pp. 219-233

The purpose of this paper is to develop and implement a faceted classification structure to improve web information organization, access and navigability. Some case studies of commercial websites using faceted metadata were analyzed to develop the classification approach. The proposed framework adapts the facet analysis theory from the Faceted Classification System (FCS) to use semantic web tools, especially XML, RDF stores and ontologies, and is designed to be integrated within a Content Management System (CMS). A detailed example of a faceted classification system for an academic information system is used to demonstrate the construction of an FCS from metadata. Detailed examples show how information classified and organised in multidimensional hierarchies is more accessible than information in simple one-dimensional taxonomic hierarchies.

Forming an Instructional Approach to Teach Web Searching Skills to Non-English Users

Lazarinis, F. in Program – Electronic Library and Information Systems, Vol. 41, No. 2, 2007, pp. 170-179

Locating information on the internet is an important skill in the information society. Some recent studies showed that searching using non-English terms is a more demanding task than searching in English. Based on these observations, this paper aims to apply the Instructional System Design (ISD) methodology to analyze, design and implement a training course for Greek users. This instructional approach takes as its basic structural element the explanation of the internal intelligence of search engines and their inefficiencies with respect to non-English natural language. Based on the ISD methodology, the tasks in which web searchers need to be trained were identified and a six-phase instructional sequence was constructed. The instructional methodology is evaluated with the aid of students in an authentic environment. The evaluation revealed that learners who followed the structured approach and were aware of the search engines’ limitations relating to the Greek language performed better in the web searching experiments.

The Impact of Time Constraints on Internet and Web Use

Slone, D.J. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 4, 2007, pp. 508-517

This study examines the influence of time constraints on internet and web search goals and search behavior. Specifically, it looks at the searching behavior of public library internet users who, previously limited to 30 minutes per internet session, are given an unlimited amount of time for use. Interviews and observations were conducted with 34 participants searching on their own queries. Despite an increase in the time allowed for searching, most people spent less than 30 minutes on the internet, carrying out tasks like paying bills, shopping, browsing, and making reservations. Those who took more than 30 minutes were looking for jobs or browsing. E-mail use was universal. In this context, influences like time-dependent and time-independent tasks, use of search hubs to perform more efficient searches, and search diversity were recorded. Though there are a number of large and small studies of internet and web use, few of them focus on temporal influences. This study extends knowledge in this area of inquiry.

Improving the Search Process through Ontology-based Adaptive Semantic Search

Yang, C., Yang, K.C., and Yuan, H.C. in The Electronic Library, Vol. 25, No. 2, 2007, pp. 234-248

The purpose of this research is to describe an efficient search methodology to help improve the search results in the top portion of a lengthy search list. When facing a lengthy search list, people often limit themselves to the top ten items on the list, even though there may be more useful information after the top ten items. This study proposes an ontology-based adaptive semantic search to significantly improve the search experience. To capture semantic differences between search terms, a naive ontology is used to store the relationships among terms. Before a search term is processed by the search engine Lucene, the related words of the search term are selected from ontology structures to form new query phrases in the process of query expansion. The weighting of the expanded query phrases is dynamically learned by observing the users’ clicking behaviours. Research results show that with the aid of the ontology the average precision rate over all cases is dramatically higher than the precision rate for the default search result. Even in the worst cases, the precision rate with the ontology remains close to that of the default search result.
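The expand-then-learn loop described above can be sketched in a few lines of Python. This is a hypothetical illustration only: it does not use Lucene, and the AdaptiveExpander class, the dictionary standing in for the ontology, the neutral starting weight of 0.5 and the incremental update toward 1 (result clicked) or 0 (not clicked) are all assumptions rather than the authors' weighting scheme.

```python
from collections import defaultdict


class AdaptiveExpander:
    """Hypothetical sketch of ontology-based query expansion with
    click-learned weights (not the authors' implementation)."""

    def __init__(self, ontology, learning_rate=0.1):
        self.ontology = ontology            # term -> list of related terms
        self.learning_rate = learning_rate
        # (term, related_term) -> weight in [0, 1]; unseen pairs start neutral.
        self.weights = defaultdict(lambda: 0.5)

    def expand(self, term):
        """Return the original term plus its related terms, ranked by weight."""
        expanded = [(term, 1.0)]
        for related in self.ontology.get(term, []):
            expanded.append((related, self.weights[(term, related)]))
        return sorted(expanded, key=lambda pair: pair[1], reverse=True)

    def record_click(self, term, related, clicked):
        """Nudge a pair's weight toward 1 if its results were clicked, else toward 0."""
        key = (term, related)
        target = 1.0 if clicked else 0.0
        self.weights[key] += self.learning_rate * (target - self.weights[key])
```

For example, with the ontology `{"car": ["automobile", "vehicle"]}`, a click on a result retrieved through "vehicle" and a skipped result from "automobile" reorder the expansion of "car" so that "vehicle" is tried first. A fixed learning rate keeps the update simple; a real system would need to decide how much click evidence to trust.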

Language Evolution and the Spread of Ideas on the Web: A Procedure for Identifying Emergent Hybrid Word Family Members

Thelwall, M., and Price, L. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 10, 2006, pp. 1326-1337

Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.

Link Decay in Leading Information Science Journals

Goh, D.H.L., and Ng, P.K. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 1, 2007, pp. 15-24

Web citations have become common in scholarly publications as the amount of online literature increases. Yet, such links are not persistent and many decay over time, causing accessibility problems for readers. The present study investigates the link decay phenomenon in three leading information science journals. Articles spanning a period of seven years (1997-2003) were downloaded, and their links were extracted. From these, a measure of link decay, the half-life, was computed to be approximately five years, which compares favorably against other disciplines (1.4-4.8 years). The study also investigated types of link accessibility errors encountered as well as examined characteristics of links that may be associated with decay. It was found that approximately 31 percent of all citations were not accessible during the time of testing, and the majority of errors were due to missing content (HTTP Error Code 404). Citations from the edu domain were also found to have the highest failure rate (36 percent) of the popular top-level domains compared. Results indicate that link decay is a problem that cannot be ignored, and implications for journal authors and readers are discussed.
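The half-life measure can be made concrete with a small sketch. The abstract does not say how the half-life was computed, so this assumes, purely for illustration, that the share of accessible links decays exponentially with age, p(t) = e^(-kt), which gives a half-life of ln 2 / k; k is estimated by a least-squares fit of ln p against age through the origin. The function name and input format are hypothetical.

```python
import math


def decay_half_life(observations):
    """Estimate link half-life under an assumed exponential decay model.

    observations: iterable of (age_in_years, fraction_of_links_still_accessible).
    Fits ln(p) = -k * t by least squares through the origin and returns ln(2) / k.
    """
    num = 0.0  # sum of t * ln(p)
    den = 0.0  # sum of t^2
    for age, fraction in observations:
        if age <= 0 or not 0 < fraction <= 1:
            continue  # points at age 0 or with p outside (0, 1] carry no rate information
        num += age * math.log(fraction)
        den += age * age
    if den == 0 or num >= 0:
        raise ValueError("need at least one valid point showing some decay")
    k = -num / den
    return math.log(2) / k
```

Given, for each publication year, the link age at testing time and the fraction of that year's links that still resolve, the function returns a single half-life in years. Fitting the whole curve rather than reading off the age at which 50 percent of links fail makes the estimate usable even when no cohort has yet lost half its links.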

Measuring Online Information Seeking Context, Part 2: Findings and Discussion

Kelly, D. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 14, 2006, pp. 1862-1874

Context is one of the most important concepts in information seeking and retrieval research. However, the challenges of studying context are great; thus, it is more common for researchers to use context as a post hoc explanatory factor, rather than as a concept that drives inquiry. The purpose of this study was to develop a method for collecting data about information seeking context in natural online environments, and identify which aspects of context should be considered when studying online information seeking. The study is reported in two parts. In this, the second part, results and implications of this research are presented. Part 1 (Kelly, 2006) discussed previous literature on information seeking context and behavior, situated the current study within this literature, and described the naturalistic, longitudinal research design that was used to examine and measure the online information seeking context of seven users during a 14-week period. Results provide support for the value of the method in studying online information seeking context, the relative importance of various measures of context, how these measures change over time, and, finally, the relationship between these measures. In particular, results demonstrate significant differences in distributions of usefulness ratings according to task and topic.

Metadata for Digital Collections of E-Government Resources

Tambouris, E., Manouselis, N., and Costopoulou, C. in The Electronic Library, Vol. 25, No. 2, 2007, pp. 176-192

The purpose of this paper is to introduce a process for developing a metadata element set that will describe e-government resources in digital collections. The outcome of the process is a metadata schema that reuses as many elements as possible from existing specifications and standards (termed an e-government metadata application profile). E-government metadata are intended to facilitate the electronic categorisation and storage of governmental resources, as well as to enhance users’ electronic interactions with the public sector. The paper extends an initial process presented in the context of the European Standardization Committee CEN/ISSS, proposing four steps for developing the application profile: determine the resources to be described by the metadata, identify the stakeholder groups who will use the metadata, determine the use of metadata for each stakeholder group, and specify the metadata elements corresponding to each use. The steps of the proposed process are followed in order to develop an e-government metadata application profile for a particular digital collection: a one-stop governmental web portal that enables discovery and access to e-government services and documents residing at the websites of geographically dispersed public authorities.

The Power and Vulnerability of the “New Professional”: Web Management in UK Universities

Cox, A. in Program – Electronic Library and Information Systems, Vol. 41, No. 2, 2007, pp. 148-169

The purpose of this research is to explore the character of an emergent occupational role, that of university web manager. The primary data used were 15 semi-structured interviews conducted in 2004. These were analyzed partly for factual and attitudinal data, but also for the discursive interpretative repertoires in use. The paper examines the diverse backgrounds, occupational trajectories, organisational positions, job roles and status of practitioners working in “web management” in UK higher education. The discursive divide between the marketing and IT approaches to the web is investigated. Two case studies explore further the complexity and creativity involved in individuals’ construction of coherent and successful occupational identities.

Predicting User Concerns about Online Privacy

Yao, M.Z., Rice, R.E., and Wallis, K. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 5, 2007, pp. 710-722

With the rapid diffusion of the internet, researchers, policy makers, and users have raised concerns about online privacy, although few studies have integrated aspects of usage with psychological and attitudinal aspects of privacy. This study develops a model involving gender, generalized self-efficacy, psychological need for privacy, internet use experience, internet use fluency, and beliefs in privacy rights as potential influences on online privacy concerns. Survey responses from 413 college students were analyzed by bivariate correlations, hierarchical regression, and structural equation modeling. Regression results showed that beliefs in privacy rights and a psychological need for privacy were the main influences on online privacy concerns. The proposed structural model was not well supported by the data, but a revised model, linking self-efficacy with psychological need for privacy and indicating indirect influences of internet experience and fluency on online privacy concerns through beliefs in privacy rights, was supported by the data.

Redips: Backlink Search and Analysis on the Web for Business Intelligence Analysis

Chau, M., Shiu, B., Chan, I., and Chen, H. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 3, 2007, pp. 351-365

The World Wide Web presents significant opportunities for business intelligence analysis as it can provide information about a company’s external environment and its stakeholders. Traditional business intelligence analysis on the web has focused on simple keyword searching. Recently, it has been suggested that the incoming links, or backlinks, of a company’s web site (i.e. other web pages that have a hyperlink pointing to the company of interest) can provide important insights about the company’s “online communities”. Although analysis of these communities can provide useful signals for a company and information about its stakeholder groups, the manual analysis process can be very time-consuming for business analysts and consultants. In this article, we present a tool called Redips that automatically integrates backlink meta-searching and text-mining techniques to help users perform such business intelligence analysis on the web. The architectural design and implementation of the tool are presented in the article. To evaluate the effectiveness, efficiency, and user satisfaction of Redips, an experiment was conducted to compare the tool with two popular business intelligence analysis methods: using backlink search engines and manual browsing. The experiment results showed that Redips was statistically more effective than both benchmark methods (in terms of Recall and F-measure) but required more time in search tasks. In terms of user satisfaction, Redips scored statistically higher than backlink search engines in all five measures used, and also statistically higher than manual browsing in three measures.

Relevance Criteria Identified by Health Information Users during Web Searches

Crystal, A., and Greenberg, J. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 10, 2006, pp. 1368-1382

This article focuses on the relevance judgments made by health information users who use the web. Health information users were conceptualized as motivated information users concerned about how an environmental issue affects their health. Users identified their own environmental health interests and conducted a web search of a particular environmental health web site. Users were asked to identify (by highlighting with a mouse) the criteria they use to assess relevance in both web search engine surrogates and full-text web documents. Content analysis of document criteria highlighted by users identified the criteria these users relied on most often. Key criteria identified included (in order of frequency of appearance) research, topic, scope, data, influence, affiliation, web characteristics, and authority/person. A power-law distribution of criteria was observed (a few criteria represented most of the highlighted regions, with a long tail of occasionally used criteria). Implications of this work are that information retrieval (IR) systems should be tailored in terms of users’ tendencies to rely on certain document criteria, and that relevance research should combine methods to gather richer, contextualized data. Metadata for IR systems, such as that used in search engine surrogates, could be improved by taking into account actual usage of relevance criteria. Such metadata should be user-centered (based on data from users, as in this study) and context appropriate (fit to users’ situations and tasks).

The Role of the Internet in Informal Scholarly Communication

Barjak, F. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 10, 2006, pp. 1350-1367

The present analysis looks at how scientists use the internet for informal scientific communication. It investigates the relationship between several explanatory variables and internet use in a cross-section of scientists from seven European countries and five academic disciplines (astronomy, chemistry, computer science, economics, and psychology). The analysis confirmed some of the results of previous USA-based analyses. In particular, it corroborated a positive relationship between research productivity and internet use. The relationship was found to be nonlinear, with very productive (nonproductive) scientists using the internet less (more) than would be expected according to their productivity. Also, being involved in collaborative R&D and having large networks of collaborators is associated with increased internet use. In contrast to older studies, the analysis did not find any equalizing effect whereby higher internet use rates help to overcome the problems of potentially disadvantaged researchers. Obviously, everybody who wants to stay at the forefront of research and keep up-to-date with developments in their research fields has to use the internet.

Rule-Based Metadata Interoperation in Heterogeneous Digital Libraries

Ding, H., and Solvberg, I. in The Electronic Library, Vol. 25, No. 2, 2007, pp. 193-207

The purpose of this research is to describe a system that supports querying across distributed digital libraries created with heterogeneous metadata schemas, without requiring a global schema. The advantages and weaknesses of ontology-based applications were investigated, and the findings justify the use of inferential rules to express complex relations between metadata terms in different metadata schemas. A process was designed that combines ontologies and rules to specify complex relations between metadata schemas. The process is divided into a set of working phases, and examples illustrate how two similar bibliographic ontology fragments can be interrelated for subsequent query reformulation. Equipping ontologies with inferencing power can help describe more complex relations between metadata terms. This approach is critical for properly interpreting queries from one ontology in terms of another.
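The idea of rules that express relations richer than one-to-one term equivalence can be illustrated with a small query-rewriting sketch. This is not the authors’ system: it replaces their ontologies and inference rules with plain Python predicates. The source fields borrow Dublin Core element names, and the target field names (`title`, `author`, `publicationYear`, `dateNote`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


def always(value: str) -> bool:
    """Default condition: the rule applies unconditionally."""
    return True


@dataclass
class Rule:
    """Hypothetical mapping rule: rewrite source_field as target_field when condition holds."""
    source_field: str
    target_field: str
    condition: Callable[[str], bool] = always


def reformulate(query: dict, rules: list) -> dict:
    """Rewrite a {field: value} query from the source schema into the target schema.

    Rules are tried in order and the first match wins; fields with no
    matching rule are dropped rather than guessed.
    """
    target = {}
    for field, value in query.items():
        for rule in rules:
            if rule.source_field == field and rule.condition(value):
                target[rule.target_field] = value
                break
    return target


def is_year(value: str) -> bool:
    return re.fullmatch(r"\d{4}", value.strip()) is not None


rules = [
    Rule("dc:title", "title"),
    Rule("dc:creator", "author"),
    Rule("dc:date", "publicationYear", is_year),  # conditional, not a 1:1 mapping
    Rule("dc:date", "dateNote"),                  # fallback for free-text dates
]
```

For example, `reformulate({"dc:title": "Ghosts", "dc:date": "1881"}, rules)` yields a `publicationYear` constraint, while a free-text date such as `"ca. 1880"` falls through to `dateNote`. Rule order matters, which is one reason rule languages with explicit inference are attractive for this task.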

Subject Retrieval from Full-Text Databases in the Humanities

East, J.W. in portal: Libraries and the Academy, Vol. 7, No. 2, 2007, pp. 227-241

This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focussing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards, weighting of search terms, limiting by type of document, controlled vocabulary indexing, and the ranking and display of search results. The author suggests ways in which full-text searching might be improved, whether by enhancement of database records, by introduction of enhanced search functionality, or by the education of searchers in more effective search techniques. The conclusion is that current digitization projects are not producing databases that meet the needs of scholars.

Temporal Analysis of a Very Large Topically Categorized Web Query Log

Beitzel, S.M., Jensen, E.C., Chowdhury, A., Frieder, O., and Grossman, D. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 2, 2007, pp. 166-178

The authors review a log of billions of web queries that constituted the total query traffic for a 6-month period of a general-purpose commercial web search service. Previously, query logs were studied from a single, cumulative view. In contrast, this study builds on the authors’ previous work, which showed changes in popularity and uniqueness of topically categorized queries across the hours in a day. To further their analysis, they examine query traffic on a daily, weekly, and monthly basis by matching it against lists of queries that have been topically precategorized by human editors. These lists represent 13 percent of the query traffic. They show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. Additionally, they show that certain categories of queries trend differently over varying periods. The authors’ key contribution is twofold: they outline a method for studying both the static and topical properties of a very large query log over varying periods, and they identify and examine topical trends that may provide valuable insight for improving both retrieval effectiveness and efficiency.
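The matching step described in this abstract can be sketched in a few lines. Each logged query is looked up in the human-precategorized lists, and each category’s share of that period’s total traffic is recorded. The sketch below makes several assumptions that the abstract does not state: exact matching after a simple normalization (the `normalize` helper is hypothetical), an in-memory log of `(datetime, query)` pairs, and daily buckets. The log entries and category lists are invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def normalize(query: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace before matching."""
    return " ".join(query.lower().split())


def category_share_by_day(log, categories):
    """Return {date: {category: fraction of that day's total queries}}.

    log: iterable of (datetime, query) pairs
    categories: {category: set of precategorized query strings}
    """
    lookup = {normalize(q): cat for cat, qs in categories.items() for q in qs}
    totals = Counter()           # all queries per day, matched or not
    hits = defaultdict(Counter)  # matched queries per day and category
    for ts, query in log:
        day = ts.date()
        totals[day] += 1
        cat = lookup.get(normalize(query))
        if cat is not None:
            hits[day][cat] += 1
    return {
        day: {cat: n / totals[day] for cat, n in hits[day].items()}
        for day in totals
    }


log = [
    (datetime(2005, 1, 3, 9), "weather"),
    (datetime(2005, 1, 3, 10), "Weather "),
    (datetime(2005, 1, 3, 11), "cheap flights"),
    (datetime(2005, 1, 3, 12), "something uncategorized"),
    (datetime(2005, 1, 4, 8), "cheap  flights"),
]
cats = {"weather": {"weather"}, "travel": {"cheap flights"}}
shares = category_share_by_day(log, cats)
```

Summing a day’s shares gives the fraction of traffic covered by the lists, which corresponds to the 13 percent coverage reported in the abstract. Weekly or monthly trends need only a different bucket key, for example `ts.isocalendar()[:2]` or `(ts.year, ts.month)`.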

Understanding Journal Usage: A Statistical Analysis of Citation and Use

McDonald, J.D. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 1, 2007, pp. 39-50

This study examined the relationship between print journal use, online journal use, and online journal discovery tools with local journal citations. Local use measures were collected from 1997 to 2004, and negative binomial regression models were designed to test the effect that local use, online availability, and access enhancements have on citation behaviors of academic research authors. Models are proposed and tested to determine whether multiple locally recorded usage measures can predict citations and if locally controlled access enhancements influence citation. The regression results indicated that print journal use was a significant predictor of local journal citations prior to the adoption of online journals. Publisher-provided and locally recorded online journal use measures were also significant predictors of local citations. Online availability of a journal was found to significantly increase local citations, and, for some disciplines, a new access tool like an OpenURL resolver significantly impacts citations and publisher-provided journal usage measures.

User Modeling for Personalized Web Search with Self-Organizing Map

Ding, C., and Patra, J.C. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 4, 2007, pp. 494-507

The widely used web search engines index and recommend individual web pages in response to queries of a few keywords to assist users in locating relevant documents. However, the web search engines give different users the same answer set, although the users may have different preferences. A personalized web search would carry out the search for each user according to his or her preferences. To conduct the personalized web search, the authors provide a novel approach to model the user profile with a self-organizing map (SOM). Their results indicate that SOM is capable of helping the user to find the related category for each query used in the web search to make a personalized web search effective.
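The general mechanism, a self-organizing map that groups a user’s documents and then assigns a category to each new query, can be sketched as follows. The abstract does not describe the authors’ features, map topology, or training schedule. This is therefore a generic one-dimensional SOM with a Gaussian neighbourhood and linearly decaying parameters. The term list, document vectors, and category labels are hypothetical.

```python
import math
import random
from collections import Counter


def best_matching_unit(units, v):
    """Index of the unit whose weight vector is closest (Euclidean) to v."""
    return min(range(len(units)),
               key=lambda i: sum((w - x) ** 2 for w, x in zip(units[i], v)))


def train_som(vectors, n_units=3, epochs=100, lr0=0.5, seed=0):
    """Train a one-dimensional SOM on term vectors; returns the unit weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = max(1.0, n_units / 2)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(0.5, radius0 * (1 - frac))  # neighbourhood shrinks over time
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(units, v)
            for i, w in enumerate(units):
                # Gaussian neighbourhood: units near the winner on the grid move more
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (v[d] - w[d])
    return units


def label_units(units, vectors, labels):
    """Label each unit with the majority category of the documents mapped to it."""
    votes = [Counter() for _ in units]
    for v, lab in zip(vectors, labels):
        votes[best_matching_unit(units, v)][lab] += 1
    return [c.most_common(1)[0][0] if c else None for c in votes]


def category_for_query(units, unit_labels, query_vec):
    """Map a query vector to the category of its best-matching unit (None if unlabeled)."""
    return unit_labels[best_matching_unit(units, query_vec)]


# Hypothetical profile: term weights over ["flight", "hotel", "doctor", "symptom"]
docs = [[1, 1, 0, 0], [1, 0.8, 0, 0.1], [0, 0, 1, 1], [0.1, 0, 0.9, 1]]
cats = ["travel", "travel", "health", "health"]
units = train_som(docs)
unit_labels = label_units(units, docs, cats)
```

A query is personalized by first mapping it to a category (for instance, `category_for_query(units, unit_labels, [1, 1, 0, 0])` for a travel-like query) and then using that category to filter or re-rank results. The re-ranking step is not shown.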

Visualization of the Citation Impact Environments of Scientific Journals: An Online Mapping Exercise

Leydesdorff, L. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 1, 2007, pp. 25-38

Aggregated journal-journal citation networks based on the Journal Citation Reports 2004 of the Science Citation Index (5,968 journals) and the Social Science Citation Index (1,712 journals) are made accessible from the perspective of any of these journals. A vector-space model is used for normalization, and the results are brought online at www.leydesdorff.net/jcr04 as input files for the visualization program Pajek. The user is thus able to analyze the citation environment in terms of links and graphs. Furthermore, the local impact of a journal is defined as its share of the total citations in the specific journal’s citation environment; the vertical size of the nodes is varied proportionally to this citation impact. The horizontal size of each node can be used to provide the same information after correction for within-journal (self-)citations. In the “citing” environment, the equivalent of this measure can be considered as a citation activity index which maps how the relevant journal environment is perceived by the collective of authors of a given journal. As a policy application, the mechanism of interdisciplinary developments among the sciences is elaborated for the case of nanotechnology journals.
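The “local impact” measure defined in this abstract is a simple share: citations received by a journal within its environment divided by all citations exchanged in that environment. The optional correction drops within-journal citations. The sketch below assumes a small in-memory citing-by-cited matrix. It uses cosine similarity as the vector-space normalization, which is a common choice that the abstract does not name explicitly. The three-journal matrix is hypothetical, and no Pajek output is produced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two citation vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cited_profile(matrix, j):
    """Column j: how often journal j is cited by each journal in the environment."""
    return [row[j] for row in matrix]


def local_impact(matrix, j, exclude_self=False):
    """Journal j's share of all citations exchanged within the environment.

    matrix[i][k] counts citations from journal i (citing) to journal k (cited).
    With exclude_self=True, within-journal citations (the diagonal) are removed
    from both the numerator and the denominator.
    """
    n = len(matrix)

    def keep(i, k):
        return not (exclude_self and i == k)

    total = sum(matrix[i][k] for i in range(n) for k in range(n) if keep(i, k))
    received = sum(matrix[i][j] for i in range(n) if keep(i, j))
    return received / total if total else 0.0


# Hypothetical three-journal environment (rows cite columns)
env = [[5, 2, 1],
       [3, 4, 0],
       [1, 1, 2]]
```

For journal 0, the share is 9/19 of all citations in this environment. Excluding self-citations leaves 4 of the 8 remaining citations, a share of 0.5. `cosine(cited_profile(env, 0), cited_profile(env, 1))` compares two journals’ cited patterns.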

Web Issue Analysis: An Integrated Water Resource Management Case Study

Thelwall, M., Vann, K., and Fairclough, R. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 10, 2006, pp. 1303-1314

In this article web issue analysis is introduced as a new technique to investigate an issue as reflected on the web. The issue chosen, integrated water resource management (IWRM), is a United Nations-initiated paradigm for managing water resources in an international context, particularly in developing nations. As with many international governmental initiatives, there is a considerable body of online information about it: 41,381 hypertext markup language (HTML) pages and 28,735 PDF documents mentioning the issue were downloaded. A page uniform resource locator (URL) and link analysis revealed the international and sectoral spread of IWRM. A noun and noun phrase occurrence analysis was used to identify the issues most commonly discussed, revealing some unexpected topics such as private sector and economic growth. Although the complexity of the methods required to produce meaningful statistics from the data is disadvantageous to easy interpretation, it was still possible to produce data that could be subject to a reasonably intuitive interpretation. Hence web issue analysis is claimed to be a useful new technique for information science.

Web Searcher Interaction with the Dogpile.Com Metasearch Engine

Jansen, B.J., Spink, A., and Koshman, S. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 5, 2007, pp. 744-755

Metasearch engines are an intuitive method for improving the performance of web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major web metasearch engine, with the aim of discovering how web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005 and compare these results with findings from other web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84 percent of searchers), use about three terms per query (mean=2.85), implement system feedback moderately (8.4 percent of users), and generally (56 percent of users) spend less than one minute interacting with the web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are for a shorter period of time. These aspects of metasearching may be what define the differences from other forms of web searching. We discuss the implications of our findings in relation to metasearch for web searchers, search engines, and content providers.

Web Searching on the Vivisimo Search Engine

Koshman, S., Spink, A., and Jansen, B.J. in Journal of the American Society for Information Science and Technology, Vol. 57, No. 14, 2006, pp. 1875-1887

The application of clustering to web search engine technology is a novel approach that offers structure to the information deluge often faced by web searchers. Clustering methods have been well studied in research labs; however, real user searching with clustering systems in operational web environments is not well understood. This article reports on results from a transaction log analysis of Vivisimo.com, which is a web meta-search engine that dynamically clusters users’ search results. A transaction log analysis was conducted on two weeks’ worth of data collected from March 28 to April 4 and April 25 to May 2, 2004, representing 100 percent of site traffic during these periods and 2,029,734 queries overall. The results show that the highest percentage of queries contained two terms. The highest percentage of search sessions contained one query and was less than 1 minute in duration. Almost half of user interactions with clusters consisted of displaying a cluster’s result set, and a small percentage of interactions showed cluster tree expansion. Findings show that 11.1 percent of search sessions were multitasking searches, and there is a broad variety of search topics in multitasking search sessions. Other searching interactions and statistics on repeat users of the search engine are reported. These results provide insights into search characteristics with a cluster-based web search engine and extend research into web searching trends.

Web-Based Text Classification in the Absence of Manually Labeled Training Documents

Hung, C.M., and Chien, L.F. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 1, 2007, pp. 88-96

Most text classification techniques assume that manually labeled documents (corpora) can be easily obtained while learning text classifiers. However, labeled training documents are sometimes unavailable or, even when available, inadequate. The goal of this article is to present a self-learned approach to extract high-quality training documents from the web when the required manually labeled documents are unavailable or of poor quality. To learn a text classifier automatically, we need only a set of user-defined categories and some highly related keywords. Extensive experiments are conducted to evaluate the performance of the proposed approach using the test set from the Reuters-21578 news data set. The experiments show that very promising results can be achieved using only documents automatically extracted from the web.

Webmasters, Web Policies and Academic Libraries: A Survey

Hendricks, A. in Library Hi Tech, Vol. 25, No. 1, 2007, pp. 136-146

The purpose of this paper is to gauge how university libraries are currently handling web policies as well as to see if the role of the library webmaster has evolved. A survey was created and an invitation to participate was sent to various electronic discussion lists. Most of the questions were quantitative and were coded to find trends in the responses. Most of the respondents are either reference librarians or webmasters, and they are mostly staff or faculty. As increasing numbers of resources become available electronically, university library web pages are going to continue to play an important role in academia. Survey responses indicate that most libraries (52 percent) have developed a web policy and 64 percent have formed a web advisory committee to maintain their web content. Responses also indicate the desire for further training in keeping up with the new technologies and concern about the increased workload due to the time spent maintaining web pages.

Which Factors Explain the Web Impact of Scientists’ Personal Homepages?

Barjak, F., Li, X.M., and Thelwall, M. in Journal of the American Society for Information Science and Technology, Vol. 58, No. 5, 2007, pp. 200-211

In recent years, a considerable body of webometric research has used hyperlinks to generate indicators for the impact of web documents and the organizations that created them. The relationship between this web impact and other, offline impact indicators has been explored for entire universities, departments, countries, and scientific journals, but not yet for individual scientists, an important omission. The present research closes this gap by investigating factors that may influence the web impact (i.e. inlink counts) of scientists’ personal homepages. Data concerning 456 scientists from five scientific disciplines in six European countries were analyzed, showing that both homepage content and personal and institutional characteristics of the homepage owners had significant relationships with inlink counts. A multivariate statistical analysis confirmed that full-text articles are the most linked-to content in homepages. At the individual homepage level, hyperlinks are related to several offline characteristics. Notable differences regarding total inlinks to scientists’ homepages exist between the scientific disciplines and the countries in the sample. There are also both gender and age effects: the homepages of female and of older scientists attract fewer external inlinks (i.e. links from other web domains). There is only a weak relationship between a scientist’s recognition and homepage inlinks and, surprisingly, no relationship between research productivity and inlink counts. Contrary to expectations, the size of collaboration networks is negatively related to hyperlink counts. Some of the relationships between hyperlinks to homepages and the properties of their owners can be explained by the content that the homepage owners put on their homepage and their level of internet use; however, the findings about productivity and collaborations do not seem to have a simple, intuitive explanation.
Overall, the results emphasize the complexity of the phenomenon of web linking, when analyzed at the level of individual pages.
