
Learning representations of Web entities for entity resolution

Luciano Barbosa (Universidade Federal de Pernambuco, Recife, Brazil)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 10 December 2018

Issue publication date: 8 August 2019


Abstract

Purpose

Matching instances of the same entity, a task known as entity resolution, is a key step in the process of data integration. This paper aims to propose a deep learning network that learns different representations of Web entities for entity resolution.

Design/methodology/approach

To match Web entities, the proposed network learns the following representations of entities: embeddings, which are low-dimensional vector representations of the words in the entities; convolutional vectors from a convolutional layer, which capture short-distance patterns in the word sequences of the entities; and bag-of-words vectors, created by a bag-of-words (BoW) layer that learns weights for the words in the vocabulary based on the task at hand. Given a pair of entities, the similarity between their learned representations is used as a feature for a binary classifier that decides whether the pair is a match. In addition to these features, the classifier uses a modification of inverse document frequency (IDF) for pairs, which identifies discriminative words in pairs of entities.
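The abstract describes the feature pipeline only at a high level. A minimal Python sketch, under stated assumptions, of how the three similarity features and a pair-level IDF feature could be computed for one entity pair is given below. All weights are random and untrained (in the proposed network they would be learned for the task). Mean-pooling of word embeddings, cosine similarity, the ReLU/max-pooling convolution and the `pair_idf` formula are assumptions, and every name (`ToyRepresentations`, `pair_idf`, `pair_idf_overlap`, `pair_features`, `DIM`, `WINDOW`, `N_FILTERS`) is hypothetical. The binary classifier itself is not shown.

```python
import math
import random
from collections import Counter

# Hypothetical sizes; the abstract does not report the real hyperparameters.
DIM, WINDOW, N_FILTERS = 8, 2, 4

def tokenize(text):
    return text.lower().split()

class ToyRepresentations:
    """Untrained, randomly initialized stand-ins for the three learned representations."""

    def __init__(self, vocab, seed=0):
        rng = random.Random(seed)
        self.index = {w: i for i, w in enumerate(sorted(vocab))}
        self.emb = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in self.index]
        # Each convolutional filter spans WINDOW consecutive word embeddings.
        self.filters = [[rng.uniform(-1, 1) for _ in range(DIM * WINDOW)]
                        for _ in range(N_FILTERS)]
        # One weight per vocabulary word for the bag-of-words layer.
        self.bow_weights = [rng.uniform(0, 1) for _ in self.index]

    def _vectors(self, tokens):
        return [self.emb[self.index[t]] for t in tokens if t in self.index]

    def embedding_vector(self, tokens):
        # Assumption: word embeddings are mean-pooled into one entity vector.
        vecs = self._vectors(tokens)
        if not vecs:
            return [0.0] * DIM
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def conv_vector(self, tokens):
        # Slide each filter over WINDOW-word spans, apply ReLU, then max-pool.
        vecs = self._vectors(tokens)
        vecs += [[0.0] * DIM] * max(0, WINDOW - len(vecs))  # pad short inputs
        out = []
        for f in self.filters:
            acts = []
            for s in range(len(vecs) - WINDOW + 1):
                span = [x for v in vecs[s:s + WINDOW] for x in v]
                acts.append(max(0.0, sum(a * b for a, b in zip(f, span))))
            out.append(max(acts))
        return out

    def bow_vector(self, tokens):
        # Weighted term-frequency vector over the vocabulary.
        vec = [0.0] * len(self.index)
        for t in tokens:
            if t in self.index:
                vec[self.index[t]] += self.bow_weights[self.index[t]]
        return vec

def cosine(u, v):
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def pair_idf(pairs):
    # Hypothetical variant: each entity pair is one "document", so a word shared
    # by many pairs (e.g. a common brand name) gets a low weight.
    df = Counter()
    for a, b in pairs:
        df.update(set(tokenize(a)) | set(tokenize(b)))
    return {w: math.log(len(pairs) / c) for w, c in df.items()}

def pair_idf_overlap(a, b, idf):
    # Share of the pair's IDF mass carried by words that both entities contain.
    ta, tb = set(tokenize(a)), set(tokenize(b))
    total = sum(idf.get(w, 0.0) for w in ta | tb)
    if total == 0:
        return 0.0
    return sum(idf.get(w, 0.0) for w in ta & tb) / total

def pair_features(a, b, reps, idf):
    ta, tb = tokenize(a), tokenize(b)
    return [
        cosine(reps.embedding_vector(ta), reps.embedding_vector(tb)),
        cosine(reps.conv_vector(ta), reps.conv_vector(tb)),
        cosine(reps.bow_vector(ta), reps.bow_vector(tb)),
        pair_idf_overlap(a, b, idf),
    ]

if __name__ == "__main__":
    pairs = [
        ("apple iphone 8 64gb black", "iphone 8 black 64 gb apple"),
        ("apple iphone 8 64gb black", "samsung galaxy s9 64gb black"),
        ("canon eos 80d body", "canon eos 80d dslr camera body"),
    ]
    vocab = {w for a, b in pairs for w in tokenize(a) + tokenize(b)}
    reps, idf = ToyRepresentations(vocab), pair_idf(pairs)
    for a, b in pairs:
        print([round(x, 3) for x in pair_features(a, b, reps, idf)])
```

Running the script prints one four-element feature vector per pair; in a full system these vectors, computed from trained representations, would be the input to the binary match classifier. Treating each pair as a "document" in the IDF sketch means a word that recurs across many candidate pairs contributes little to the overlap score, which is one plausible reading of "discriminative words in pairs", not necessarily the paper's exact definition.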

Findings

The proposed approach was evaluated on two commercial and two academic entity resolution benchmark data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and achieves results similar to those of its competitors on the academic data sets.

Originality/value

No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.


Citation

Barbosa, L. (2019), "Learning representations of Web entities for entity resolution", International Journal of Web Information Systems, Vol. 15 No. 3, pp. 346-358. https://doi.org/10.1108/IJWIS-07-2018-0059

Publisher

Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited
