The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution
Author:
Abstract
Recently word embedding has become a beneficial technique for diverse natural language processing tasks, especially after the successful introduction of several popular neural word embedding models, such as word2vec, GloVe, and FastText. Also entity resolution, i.e., the task of identifying digital records that refer to the same real-world entity, has been shown to benefit from word embedding. However, the use of word embeddings does not lead to a one-size-fits-all solution, because it cannot provide an accurate result for those values without any semantic meaning, such as numerical values. In this paper, we propose to use the combination of general word embedding with traditional hand-picked similarity measures for solving ER tasks, which aims to select the most suitable similarity measure for each attribute based on its property. We provide some guidelines on how to choose suitable similarity measures for different types of attributes and evaluate our proposed hybrid method on both synthetic and real datasets. Experiments show that a hybrid method reliant on correctly selecting required similarity measures can outperform the method of purely adopting traditional or word-embedding-based similarity measures.
- Citation
- BibTeX
Chen, X., Campero Durand, G., Zoun, R., Broneske, D., Li, Y. & Saake, G.,
(2019).
The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution.
In:
Grust, T., Naumann, F., Böhm, A., Lehner, W., Härder, T., Rahm, E., Heuer, A., Klettke, M. & Meyer, H.
(Hrsg.),
BTW 2019.
Gesellschaft für Informatik, Bonn.
(S. 215-224).
DOI: 10.18420/btw2019-14
@inproceedings{mci/Chen2019,
author = {Chen, Xiao AND Campero Durand, Gabriel AND Zoun, Roman AND Broneske, David AND Li, Yang AND Saake, Gunter},
title = {The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution},
booktitle = {BTW 2019},
year = {2019},
editor = {Grust, Torsten AND Naumann, Felix AND Böhm, Alexander AND Lehner, Wolfgang AND Härder, Theo AND Rahm, Erhard AND Heuer, Andreas AND Klettke, Meike AND Meyer, Holger} ,
pages = { 215-224 } ,
doi = { 10.18420/btw2019-14 },
publisher = {Gesellschaft für Informatik, Bonn},
address = {}
}
author = {Chen, Xiao AND Campero Durand, Gabriel AND Zoun, Roman AND Broneske, David AND Li, Yang AND Saake, Gunter},
title = {The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution},
booktitle = {BTW 2019},
year = {2019},
editor = {Grust, Torsten AND Naumann, Felix AND Böhm, Alexander AND Lehner, Wolfgang AND Härder, Theo AND Rahm, Erhard AND Heuer, Andreas AND Klettke, Meike AND Meyer, Holger} ,
pages = { 215-224 } ,
doi = { 10.18420/btw2019-14 },
publisher = {Gesellschaft für Informatik, Bonn},
address = {}
}
Sollte hier kein Volltext (PDF) verlinkt sein, dann kann es sein, dass dieser aus verschiedenen Gruenden (z.B. Lizenzen oder Copyright) nur in einer anderen Digital Library verfuegbar ist. Versuchen Sie in diesem Fall einen Zugriff ueber die verlinkte DOI: 10.18420/btw2019-14
Haben Sie fehlerhafte Angaben entdeckt? Sagen Sie uns Bescheid: Send Feedback
More Info
DOI: 10.18420/btw2019-14
ISBN: 978-3-88579-683-1
ISSN: 1617-5468
xmlui.MetaDataDisplay.field.date: 2019
Language: (en)