Web robot detection in scholarly Open Access institutional repositories

Joseph W. Greene (James Joyce Library, University College Dublin, Dublin, Ireland)

Library Hi Tech

ISSN: 0737-8831

Article publication date: 19 September 2016

Abstract

Purpose

The purpose of this paper is to investigate the impact of web robots on usage statistics collected by Open Access (OA) institutional repositories (IRs) and the techniques available for mitigating that impact.

Design/methodology/approach

A close review of the literature provides a comprehensive list of web robot detection techniques. Reviews of system documentation and open source code are carried out, along with personal interviews, to compare the robot detection techniques used in the major IR platforms. An empirical test, based on a simple random sample of downloads with 96.20 per cent certainty, is undertaken to measure the accuracy of web robot detection in the IR of a large Irish university.

Findings

While web robot detection is not ignored in IRs, there are areas where the two main systems could be improved. The technique tested here is found to have successfully detected 94.18 per cent of web robots visiting the site over a two-year period (recall), with a precision of 98.92 per cent. Due to the high level of robot activity in repositories, correctly labelling more robots has an exponential effect on the accuracy of usage statistics.
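As an illustration of the two accuracy measures reported above, the following minimal Python sketch computes recall and precision from classification counts and shows how robots that go undetected inflate the reported human download figure. The function names and all counts are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
def precision(tp: int, fp: int) -> float:
    """Share of downloads labelled as robot that really were robots."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual robot downloads that were labelled as robot."""
    return tp / (tp + fn)


def reported_human_downloads(humans: int, fp: int, fn: int) -> int:
    """Human downloads as the statistics would show them.

    Undetected robots (fn) are counted as humans; humans wrongly
    labelled as robots (fp) are dropped from the count.
    """
    return humans - fp + fn


# Hypothetical repository: 800 robot downloads and 200 human downloads.
robots, humans = 800, 200

for tp in (720, 760):  # robots correctly detected in two scenarios
    fn = robots - tp   # robots missed by the detector
    reported = reported_human_downloads(humans, fp=0, fn=fn)
    overcount = (reported - humans) / humans
    print(f"recall={recall(tp, fn):.2f} "
          f"reported_humans={reported} overcount={overcount:.0%}")
```

In this made-up scenario, raising recall from 0.90 to 0.95 halves the overcount of human downloads (from 40 to 20 per cent), because robot traffic is four times the size of human traffic.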

Research limitations/implications

This study is performed on one repository using a single system. Future studies across multiple sites and platforms are needed to determine the accuracy of web robot detection in OA repositories generally.

Originality/value

This is the only study to date to have investigated web robot detection in IRs. It puts forward the first empirical benchmarking of accuracy in IR usage statistics.

Acknowledgements

The author would like to thank Paul Needham (University of Cranfield and IRUS-UK) and Stefan Amshey, Ann Connolly, and Jean-Gabriel Bankier (BePress Digital Commons) for invaluable discussions and suggestions on the draft of this paper.

Citation

Greene, J.W. (2016), "Web robot detection in scholarly Open Access institutional repositories", Library Hi Tech, Vol. 34 No. 3, pp. 500-520. https://doi.org/10.1108/LHT-04-2016-0048

Publisher

Emerald Group Publishing Limited

Copyright © 2016, Emerald Group Publishing Limited
