1. Presentation by Marco Lehner
Reporters lose time investigating topics twice and performing tedious, easily automatable tasks.
Readers have a hard time finding (background) information on topics of interest.
Newsrooms can't use their collected information in its entirety because it isn't machine-readable.
Knowledge base population systems add extracted information to existing knowledge bases (KBs).
Knowledge extraction often consists of three steps:
Absence
Extracted entities are not part of a given KB.
Latency
Knowledge needs to be served to users in real-time.
Evolving Information (not tackled)
Change of facts over time needs to be represented in the KB.
Any text document in German.
Best results on documents with many entities.
Python wrapper for Stanford CoreNLP.
Local score
Global score
Weighted product of embedding variance and local score.
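The global score could be sketched as below. The slide only states "weighted product of embedding variance and local score", so the exact combination and the weighting parameter `alpha` are assumptions:

```python
import numpy as np

def global_score(embeddings, local_scores, alpha=0.5):
    """Hypothetical sketch: weighted product of embedding variance
    and local score for a candidate entity set.

    embeddings: (n, d) array of candidate embeddings
    local_scores: (n,) array of per-candidate local scores
    alpha: assumed weighting parameter (not given in the source)
    """
    variance = np.mean(np.var(embeddings, axis=0))  # spread of the set
    local = np.mean(local_scores)                   # average local score
    # weighted product via exponents on each factor
    return (variance ** alpha) * (local ** (1 - alpha))
```

With `alpha = 0.5` this reduces to the geometric mean of the two factors; other weightings shift emphasis between set coherence and per-candidate evidence.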
(Baseline)
While the set's global score is below a given threshold
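A baseline loop of this shape could look as follows. The source only states the loop condition (global score below a threshold); the pruning step inside the loop is an assumption for illustration:

```python
def prune_candidates(candidates, score_fn, threshold):
    """Hypothetical baseline sketch: while the set's global score is
    below the threshold, drop the candidate whose removal improves
    the score the most. score_fn maps a candidate list to a score."""
    candidates = list(candidates)
    while len(candidates) > 1 and score_fn(candidates) < threshold:
        # try removing each candidate; keep the removal with the best score
        best = max(
            (candidates[:i] + candidates[i + 1:] for i in range(len(candidates))),
            key=score_fn,
        )
        if score_fn(best) <= score_fn(candidates):
            break  # no single removal helps; stop early
        candidates = best
    return candidates
```

The early-exit guard keeps the greedy loop from looping forever when the threshold is unreachable.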
Parravicini uses node2vec to calculate embeddings, but node2vec cannot produce embeddings for new nodes.
Therefore GraphSAGE is (probably) used instead. GraphSAGE trains an encoding function on a subset of the graph, which later calculates embeddings for unseen parts of the graph. The structure of the graph has to remain stable.