Measuring vocabulary use in the Linked Data Cloud

Alberto Nogales (University of Alcalá, Madrid, Spain)
Miguel Angel Sicilia-Urban (University of Alcalá, Madrid, Spain)
Elena García-Barriocanal (University of Alcalá, Madrid, Spain)

Online Information Review

ISSN: 1468-4527

Article publication date: 10 April 2017

Abstract

Purpose

This paper reports on a quantitative study of data gathered from the Linked Open Vocabularies (LOV) catalogue, including the use of network analysis and metrics. The purpose of this paper is to gain insights into the structure of LOV and the use of vocabularies in the Web of Data. It is important to note that not all the vocabularies used in the Web of Data are registered in LOV. Given the decentralised and collaborative nature of the use and adoption of these vocabularies, the results of the study can be used to identify emergent important vocabularies that are shaping the Web of Data.

Design/methodology/approach

The methodology is based on an analytical approach to a data set that captures a complete snapshot of the LOV catalogue dated April 2014. An initial analysis of the data is presented in order to obtain insights into the characteristics of the vocabularies found in LOV. This is followed by an analysis of the use of the Vocabulary of a Friend (VOAF) properties that describe relations among vocabularies. Finally, the study is complemented with an analysis of the usage of the different vocabularies, and concludes by proposing a number of metrics.
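As a rough illustration of the kind of network analysis described above, the sketch below counts how many distinct vocabularies point at each vocabulary through VOAF-style relations. The edge list is invented for illustration and is not taken from the LOV snapshot, and in-degree is only one plausible metric; the abstract does not state which metrics the paper actually proposes.

```python
# Hypothetical (source, property, target) relations between vocabularies.
# These edges are illustrative only; they are NOT data from the LOV catalogue.
relations = [
    ("foaf", "voaf:reliesOn", "rdfs"),
    ("foaf", "voaf:reliesOn", "owl"),
    ("skos", "voaf:reliesOn", "rdfs"),
    ("dcterms", "voaf:extends", "dce"),
    ("skos", "voaf:metadataVoc", "dcterms"),
]


def in_degree(edges):
    """Count how many distinct vocabularies refer to each target vocabulary."""
    incoming = {}
    for source, _prop, target in edges:
        incoming.setdefault(target, set()).add(source)
    return {vocab: len(sources) for vocab, sources in incoming.items()}


degrees = in_degree(relations)

# Rank by in-degree (descending), breaking ties alphabetically.
ranking = sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))

for vocab, count in ranking:
    print(f"{vocab}: referenced by {count} vocabularies")
```

Counting distinct sources rather than raw edges keeps a vocabulary from being ranked highly just because one other vocabulary links to it through several different properties.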

Findings

The most relevant insight is that, unsurprisingly, the vocabularies with the greatest presence are those used to model Semantic Web data, such as Resource Description Framework (RDF), RDF Schema and OWL, as well as broadly used standards such as Simple Knowledge Organization System (SKOS), DCTERMS and DCE. It was also found that the most used language is English and that the vocabularies are not highly specialised in any one field. Nor is there a dominant scope among the vocabularies. Regarding the structural analysis, it is concluded that LOV is a heterogeneous network.

Originality/value

The paper provides an empirical analysis of the structure of LOV and the relations between its vocabularies, together with some metrics that may help to determine which vocabularies are important from a practical perspective. The results are of interest for a better understanding of the evolution and dynamics of the Web of Data, and for applications that attempt to retrieve data in the Linked Data Cloud. These applications can benefit from the insights into which important vocabularies should be supported and into the value added when mapping between and using the vocabularies.

Acknowledgements

The work leading to these results has received funding from the European Union Seventh Framework Programme, in the context of the SemaGrow project (ICT-318497): www.semagrow.eu/

Citation

Nogales, A., Angel Sicilia-Urban, M. and García-Barriocanal, E. (2017), "Measuring vocabulary use in the Linked Data Cloud", Online Information Review, Vol. 41 No. 2, pp. 252-271. https://doi.org/10.1108/OIR-06-2015-0183

Publisher

Emerald Publishing Limited

Copyright © 2017, Emerald Publishing Limited
