In the past few years, knowledge graphs (KGs), as a form of structured human intelligence, have attracted considerable research attention from academia and industry. In this very active field of study, a widely explored problem is link prediction, the task of predicting whether two nodes should be connected based on node attributes and local or global graph connectivity properties. The state of the art in this area is represented by techniques based on graph embeddings. However, KGs, especially those acquired using automated or partly automated techniques, are often riddled with noise, e.g., wrong relationships, which makes the problem of link deletion as important as that of link prediction. In this paper, we address three main research questions. The first concerns the true effectiveness of different knowledge graph embedding models in the presence of an increasing number of wrong links. The second is to assess whether methods that can predict unknown relationships effectively work equally well in recognizing incorrect relations. The third is to verify whether there are systems robust enough to maintain primacy in all experimental conditions. To answer these research questions, we performed a systematic benchmark study whose experimental setting includes ten state-of-the-art models, three common KG datasets with different structural properties, and three downstream tasks: the widely explored tasks of link prediction and triple classification, and the less popular task of link deletion. Comparative studies often yield contradictory results, where the same systems score better or worse depending on the experimental context. In our work, to facilitate the discovery of clear performance patterns and their interpretation, we select and/or aggregate performance data to highlight each specific comparison dimension: dataset complexity, type of task, category of models, and robustness against noise.