Automatic vs manual categorisation of documents in Spanish

Carlos G. Figuerola (Universidad de Salamanca, Facultad de Documentación, C/Fco. De Vitoria 6‐16, 37008 Salamanca, Spain)
Angel Zazo Rodríguez (Universidad de Salamanca, Facultad de Documentación, C/Fco. De Vitoria 6‐16, 37008 Salamanca, Spain)
José Luis Alonso Berrocal (Universidad de Salamanca, Facultad de Documentación, C/Fco. De Vitoria 6‐16, 37008 Salamanca, Spain)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 1 December 2001

Abstract

Automatic categorisation can be understood as a learning process in which a program recognises the characteristics that distinguish each category or class from the others, i.e. the characteristics a document should have in order to belong to that category. As yet, few experiments have been carried out with documents in Spanish. Here we show how pattern vectors that capture the characteristics of different classes or categories of documents can be built using techniques based on those applied to query expansion by relevance feedback, and we report the results of applying these techniques to a collection of documents in Spanish. The same collection was also categorised manually, and the results of the two procedures were compared.

Citation

Figuerola, C.G., Zazo Rodríguez, A. and Alonso Berrocal, J.L. (2001), "Automatic vs manual categorisation of documents in Spanish", Journal of Documentation, Vol. 57 No. 6, pp. 763-773. https://doi.org/10.1108/EUM0000000007099

Publisher

MCB UP Ltd

Copyright © 2001, MCB UP Limited
