Identity Management in Knowledge Graphs

In the absence of a central naming authority on the Semantic Web, it is common for different knowledge graphs to refer to the same thing by different names (IRIs). Whenever multiple names denote the same thing, owl:sameAs statements are needed to link the data and foster reuse. Such identity statements have strict logical semantics: every property asserted of one name is also inferred for the other, and vice versa. While such inferences can be extremely useful in enabling and enhancing knowledge-based systems such as search engines and recommendation systems, incorrect use of identity can have wide-ranging effects in a global knowledge space like the Semantic Web. With several studies showing that owl:sameAs is indeed misused for different reasons, a proper approach to handling identity links is required for the Semantic Web to succeed as an integrated knowledge space.
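To make the semantics above concrete, here is a minimal Python sketch of how a single owl:sameAs link copies properties between two names. It is only an illustration: the `ex:` and `other:` names and the function `infer_shared_properties` are hypothetical, triples are plain tuples, and only one step of subject substitution is applied (a full reasoner would also handle transitivity and object positions).

```python
SAME_AS = "owl:sameAs"

def infer_shared_properties(triples):
    """Return the input triples plus those inferred by swapping
    subjects that are directly linked by owl:sameAs."""
    triples = set(triples)
    # owl:sameAs is symmetric, so record each link in both directions.
    links = set()
    for s, p, o in triples:
        if p == SAME_AS:
            links.add((s, o))
            links.add((o, s))
    inferred = set(triples)
    for a, b in links:
        for s, p, o in triples:
            if s == a and p != SAME_AS:
                # Every property asserted of `a` is inferred for `b`.
                inferred.add((b, p, o))
    return inferred

# Hypothetical data: the sameAs link below is deliberately wrong.
facts = [
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("other:ParisTexas", "ex:locatedIn", "ex:Texas"),
    ("ex:Paris", SAME_AS, "other:ParisTexas"),
]
result = infer_shared_properties(facts)
```

In this toy example the single erroneous link makes both names inherit each other's properties, which is the kind of wide-ranging effect described above.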

This thesis investigates this identity problem and provides different, yet complementary, solutions. Firstly, it presents the largest dataset of identity statements gathered from the LOD Cloud to date, and a web service from which the data and its equivalence closure can be queried. Such a resource has both practical impact (it helps data users and providers find different names for the same entity) and analytical value (it reveals important aspects of the connectivity of the LOD Cloud). In addition, relying on this collection of 558M identity statements, we show how network metrics such as the community structure of the owl:sameAs graph can be used to detect possibly erroneous identity assertions. For this, we assign an error degree to each owl:sameAs link based on the density of the community or communities in which it occurs and on its symmetry characteristics. One benefit of this approach is that it does not rely on any additional knowledge. Finally, as a way to limit the excessive and incorrect use of owl:sameAs, we define a new relation for asserting the identity of two ontology instances in a specific context. This identity relation is accompanied by an approach for automatically detecting these links, with the ability to use certain expert constraints for filtering out irrelevant contexts. As a first experiment, these contextual identity links are detected and exploited on a knowledge graph for life sciences, constructed in the context of this thesis in collaboration with experts from the French National Institute of Agricultural Research (INRA).
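The equivalence closure mentioned above groups all names connected by chains of owl:sameAs links into identity sets. The sketch below shows one common way to compute such a closure, using a union-find structure; it is an assumption for illustration, not a description of how the thesis's web service is implemented, and the function name `equivalence_classes` and the sample names are hypothetical.

```python
def equivalence_classes(same_as_pairs):
    """Group names into sets closed under the reflexive, symmetric
    and transitive reading of the given owl:sameAs pairs."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative, halving paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two identity sets

    classes = {}
    for name in list(parent):
        classes.setdefault(find(name), set()).add(name)
    return list(classes.values())

# Hypothetical names: a-b and b-c are linked, so a, b and c share one set.
pairs = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
classes = equivalence_classes(pairs)
```

Each returned set lists the different names that the closure treats as denoting the same entity, so looking up any one name yields all of its aliases.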

Full PhD dissertation

Members of the Jury

Reviewer: Catherine Faron Zucker, Assistant Professor - HDR, University of Nice Sophia Antipolis
Reviewer: Mathieu d'Aquin, Professor, National University of Ireland Galway
Examiner: Harry Halpin, Research Scientist, Massachusetts Institute of Technology
Examiner: Pascal Molli, Professor, Nantes University
Examiner: Sarah Cohen Boulakia, Professor, Paris-Sud University
Thesis Director: Juliette Dibie, Professor
Thesis Director: Nathalie Pernelle, Assistant Professor - HDR, Paris-Sud University
Thesis Supervisor: Fatiha Saïs, Assistant Professor, Paris-Sud University
Thesis Supervisor: Liliana Ibanescu, Assistant Professor

Date: Friday 30th of November 2018 at 14h30
Location: 16 rue Claude Bernard, 75005, Paris

How to get here:

  • By Bus: stop "Berthollet-Vauquelin" (Bus 21 or Bus 27) + 1 minute on foot
  • By Metro: stop "Censier-Daubenton" (Metro 7) + 4 minutes on foot
  • By RER: stop "Luxembourg" (RER B) + 11 minutes on foot

I defended my PhD thesis in the Coléou room, located in the Claude Bernard (CB) wing of the AgroParisTech building. Arriving from the main entrance of AgroParisTech (16 rue Claude Bernard, marked by the yellow circle in the figure below), the CB wing is accessed from the right-hand side. The Coléou room is on the same level as the main entrance (i.e. no need to take the stairs).

The defense was followed by celebratory drinks in the Centenaire room, located in wing A of the building, also on the same level as the main entrance.