Extropolis AI
What is a Knowledge Graph (KG)?
The KG Pipeline
Upstream:
Domain Discovery
Named Entity Recognition
Web Info Extraction
Relation Extraction
Representation:
RDF/RDFS
Wikidata
Property-centric
Ontology vs. Open
Transformation:
Instance matching
Statistical Relational Learning
Representation Learning
Downstream:
Reasoning
Retrieval
Structured Querying
Question Answering
Representation
RDF/RDFS:
Collection of triples <s,p,o>
Uses URIs
Designed for the semantic web
Example: :kalin_ovtcharov foaf:name 'Kalin Ovtcharov'
Wikidata:
Richer than RDF
Items/Properties/Statements
Statements: claims, references, qualifiers
Think Wikipedia infoboxes
Property-centric:
Local identifiers (no URIs)
Ontology vs. Open:
This choice is critical
Domain-specific?
User-focused?
Open?
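To make the triple model concrete, here is a minimal Python sketch of the RDF example on this slide, serialized as a single N-Triples line. The `http://example.org/` base URI standing in for the `:` prefix is an assumption (the slide does not give one), and `Triple` and `to_ntriples` are hypothetical helper names. The FOAF namespace is the standard one.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"  # standard FOAF namespace
BASE = "http://example.org/"         # assumed base URI for the ':' prefix


class Triple(NamedTuple):
    """One <s,p,o> statement; the object is either a URI or a plain literal."""
    s: str
    p: str
    o: str
    o_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    # Literals are quoted, URIs are wrapped in angle brackets.
    # Escaping of special characters inside literals is omitted in this sketch.
    obj = f'"{t.o}"' if t.o_is_literal else f"<{t.o}>"
    return f"<{t.s}> <{t.p}> {obj} ."


triple = Triple(BASE + "kalin_ovtcharov", FOAF + "name",
                "Kalin Ovtcharov", o_is_literal=True)
line = to_ntriples(triple)
print(line)
```

Subject and predicate are always URIs here, which is the main contrast with the property-centric model and its local identifiers.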
Upstream (construction)
Domain Discovery:
All about crawlers
Lexical term matching, semantics, HMM-based
Usually requires an ontology
Using raw text is somewhat old-school
Named Entity Recognition:
Extract instances of a predefined set of concepts from text (considered hard)
With an ontology: NER pipeline (CRFs, LSTMs, etc.)
No ontology? See Open IE (even harder)
Web Info Extraction:
IE from HTML pages, basically
Rule-based, hybrid, heuristics
Works with structured data too
Might not be relevant for Extropolis
Relation Extraction:
Think classification: given entities A and B, which relation is most likely?
Based on text, needs an ontology
No ontology? Clustering
SVMs, CNNs/PCNNs, LSTM/BERT?
Bootstrapping/Distant Supervision
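As an illustration of the distant supervision idea for relation extraction, the sketch below labels sentences automatically: any sentence that mentions both the subject and the object of a known fact is tagged with that fact's relation. The seed facts, the sentences, and the `distant_labels` helper are all hypothetical toy inputs, and the substring matching stands in for a real entity linker.

```python
# Seed knowledge base of known facts: (subject, relation, object). Toy data.
KB = {
    ("Marie Curie", "born_in", "Warsaw"),
    ("Alan Turing", "born_in", "London"),
}

# Unlabeled toy corpus.
SENTENCES = [
    "Marie Curie was born in Warsaw in 1867.",
    "Alan Turing gave a lecture in London.",
    "Warsaw is the capital of Poland.",
]


def distant_labels(kb, sentences):
    """Return (sentence, subject, object, relation) training examples."""
    labeled = []
    for sent in sentences:
        for s, r, o in sorted(kb):
            # Distant supervision assumption: co-occurrence of both
            # entities in a sentence expresses the known relation.
            if s in sent and o in sent:
                labeled.append((sent, s, o, r))
    return labeled


labeled = distant_labels(KB, SENTENCES)
for example in labeled:
    print(example)
```

The second sentence is labeled `born_in` even though it does not express that relation. This kind of label noise is the usual price of distant supervision, and the resulting examples would then feed one of the classifiers listed on the slide.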
Upstream (construction)
Open IE
Domain-independent extraction
Good for heterogeneous corpora
Main focus on relations
Domain-agnostic concepts exist: DBpedia, Wiki classes, YAGO
Challenges:
Inherently unsupervised
Suffers in multilingual settings
Granularity? Events?
Techniques:
Self-supervised ML (distant supervision)
Rule-based
Clause-based
+ Context
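A rough sketch of the rule-based flavor of Open IE: a single regular expression pulls (argument, relation phrase, argument) triples out of a sentence without any predefined relation set. Treating capitalized word sequences as arguments is a crude assumption made for this example, and `extract_triples` is a hypothetical helper, not part of any Open IE system.

```python
import re

# Assumption: a run of capitalized words approximates a noun-phrase argument.
ARG = r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*"
# The relation phrase is the lowercase text between the two arguments.
PATTERN = re.compile(rf"({ARG})\s+([a-z][a-z\s]*?)\s+({ARG})")


def extract_triples(sentence):
    """Return a list of (arg1, relation, arg2) tuples found in the sentence."""
    return [
        (m.group(1), m.group(2).strip(), m.group(3))
        for m in PATTERN.finditer(sentence)
    ]


triples = extract_triples("Marie Curie was born in Warsaw.")
print(triples)
```

Because the relation is whatever text sits between the arguments, the extractor is domain-independent, but it has no notion of granularity or context, which matches the challenges listed on the slide.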
Transformation
Instance matching:
Resolve instances that refer to the same entity
+ ENS and Swoosh
Statistical Relational Learning:
Infer relationships probabilistically
Why?
Markov Logic
PSL (Probabilistic Soft Logic)
Representation Learning:
Embeddings galore
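To illustrate instance matching, here is a minimal sketch that groups name variants referring to the same entity. It uses plain pairwise string similarity plus union-find for the transitive closure. This is a deliberately simple stand-in, not Swoosh or ENS, and the `normalize`, `similar`, and `match_instances` helpers and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    # Lowercase, drop periods and commas, collapse whitespace.
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def match_instances(names, threshold=0.85):
    """Cluster names whose normalized forms are similar (transitively)."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if similar(names[i], names[j], threshold):
            parent[find(j)] = find(i)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


names = ["Kalin Ovtcharov", "kalin ovtcharov", "Kalin Ovtcharov.", "Alan Turing"]
clusters = match_instances(names)
print(clusters)
```

Comparing every pair is quadratic in the number of instances, so a real pipeline would typically add a blocking step before the pairwise comparison.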
To be continued...