On Modeling and Leveraging the Web of Data

Semantic Representations and Methods

Nikita Zhiltsov, Kazan Federal University

Kazan, Russia

Kazan Federal University

Founded in 1804, Kazan University is the third-oldest university in Russia.

Breakthrough inventions and discoveries:

  • non-Euclidean geometry (N. Lobachevsky)
  • the chemical element ruthenium (K. E. Claus)
  • theory of the structure of organic compounds (A. Butlerov)
  • electron paramagnetic resonance (Y. Zavoisky)
  • acoustic paramagnetic resonance (S. Altshuler) ...

R&D

  • A leading Russian research group in NLP & Semantic Web
  • Research grants from the government and industry (HP Labs Russia)
  • Publications at major conferences, such as CIKM, ISWC, ESWC, ECIR, COLING

Education

  • Courses on AI, IR and NLP
  • First Russian course on Semantic Web (Microsoft/Yandex prize) 
  • Hosting RuSSIR'2013, a summer school in IR (sponsored by Google, Yandex)

Intelligent Search Technologies Lab

1 professor, 1 PhD, 2 PhD candidates, 2 research fellows

Overview

  • Introduction
  • Ontological Modeling
  • Link prediction
  • Semantic search
  • Software

Entities

  • Describe real or abstract objects
  • Use a pre-defined schema (database or ontology)
  • Are connected with other entities by relations
  • Entities as nodes & relations as edges => knowledge graphs
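
A minimal sketch of the "entities as nodes, relations as edges" view, assuming a knowledge graph is stored as (subject, relation, object) triples. The identifiers and the helper names `build_graph` and `neighbors` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object); identifiers are illustrative.
TRIPLES = [
    ("Barack_Obama", "type", "US_President"),
    ("Barack_Obama", "married_to", "Michelle_Obama"),
    ("Michelle_Obama", "married_to", "Barack_Obama"),
    ("US_President", "subclass_of", "Person"),
]


def build_graph(triples):
    """Index triples as node -> list of (relation, neighbor) outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def neighbors(graph, node, relation=None):
    """Nodes reachable from `node` by one edge, optionally filtered by relation."""
    return [obj for rel, obj in graph.get(node, [])
            if relation is None or rel == relation]


graph = build_graph(TRIPLES)
print(neighbors(graph, "Barack_Obama", "married_to"))  # ['Michelle_Obama']
```

An adjacency index like this keeps one-hop lookups cheap; real deployments would more likely use an RDF store.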

Knowledge graphs

Barack Obama's descriptions

Linking Open Data

Research Problems

  • Data incompleteness
    • Link prediction
    • Ontology matching
    • Knowledge extraction
  • Data validation
    • Entity resolution
    • Error detection
  • Interface
    • Semantic search
    • Question answering
  • Reasoning
    • automatic reasoning
    • generalization

based on KDD'14 tutorial by A. Bordes and E. Gabrilovich


Ontological modeling

  1. Entities: Barack Obama, Michelle Obama
  2. Classes: Barack Obama is a US_President
  3. Relations: Barack Obama is married to Michelle Obama
  4. Axioms: US_President is a Person
  5. Inference rules: US_President(x) => ¬Russian_President(x)
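
The building blocks of an ontology can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the dictionary encoding and the function names are hypothetical, and a real ontology would be written in a language such as OWL and processed by a reasoner.

```python
# Facts from the slide, encoded as plain dictionaries (the encoding is an assumption).
instance_of = {"Barack_Obama": {"US_President"}}         # class membership
subclass_of = {"US_President": {"Person"}}               # axiom: US_President is a Person
disjoint_with = {"US_President": {"Russian_President"}}  # US_President(x) => not Russian_President(x)


def superclasses(cls):
    """All classes reachable from `cls` via subclass_of (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_classes(entity):
    """Asserted classes plus everything implied by the subclass axioms."""
    asserted = instance_of.get(entity, set())
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls)
    return result


def excluded_classes(entity):
    """Classes the entity cannot belong to, according to the disjointness rules."""
    excluded = set()
    for cls in inferred_classes(entity):
        excluded |= disjoint_with.get(cls, set())
    return excluded


print(sorted(inferred_classes("Barack_Obama")))  # ['Person', 'US_President']
print(sorted(excluded_classes("Barack_Obama")))  # ['Russian_President']
```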

Problem:

modeling mathematical knowledge

This is a hard task due to:

  • abstractness
  • duality
  • emergence of novel terms

OntoMathPro

  • An expressive applied ontology of professional-level mathematics
  • 3,450 entities
  • ~ 5,000 relations
  • 2 taxonomies:
    • fields of mathematics
    • math objects
  • Bilingual (Russian/English)

with O. Nevzorova, A. Kirillovich, E. Lipachev (KESW'14)

Relations

  • Taxonomic
    • Lambda matrix is a Matrix
  • Logical dependency
    • Christoffel Symbol is defined by Connectedness
  • Topical relatedness
    • Barycentric Coordinates belongs to Metric Geometry
  • General associativity
    • Chebyshev Iterative Method see also Numerical Solution of Linear Equation Systems

Semantic Platform for Math Collections

with O. Nevzorova, D. Zaikin, O. Zhibrik, A. Kirillovich, V. Nevzorov, E. Birialtsev (ISWC'13)

Semantic Platform for Math Collections

  1. Logical structure analysis:
    1. Markup of significant document parts
    2. Extraction of relations between parts
  2. Text mining (Russian-language texts only):
    1. Named entity recognition
    2. Entity resolution (OntoMathPro, DBpedia)
  3. Publishing as an RDF dataset, available at http://cll.niimm.ksu.ru:8890/sparql

  4. Semantic search of math formulas (see the next part =>)

 

ontomathpro.org

Link prediction

Problem

RESCAL: Tensor Factorization

The tensor X is very sparse

RESCAL Optimization Task

Approximation under least-squares loss with regularization
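
For reference, the regularized least-squares problem is usually written as follows in the RESCAL literature; this may differ in detail from the formula originally shown on the slide. The notation is an assumption: X_k is the n x n adjacency slice of relation k, A is the n x r matrix of latent entity vectors, R_k is the r x r interaction matrix of relation k, and \lambda is the regularization weight.

```latex
X_k \approx A R_k A^T, \qquad k = 1, \dots, m

\min_{A, R_k} \; \frac{1}{2} \sum_{k} \left\| X_k - A R_k A^T \right\|_F^2
  + \frac{\lambda}{2} \left( \| A \|_F^2 + \sum_{k} \| R_k \|_F^2 \right)
```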

Memory Efficient Objective

Thus, we need to store dense matrices of at most r x n size, rather than n x n
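
One way to see why the n x n product A R_k A^T never has to be materialized is to expand the squared Frobenius norm of the residual. This is a sketch of the idea, not necessarily the exact derivation on the slide. Here X_k is a sparse n x n slice, A is n x r and R_k is r x r.

```latex
\left\| X_k - A R_k A^T \right\|_F^2
  = \| X_k \|_F^2
  - 2 \, \mathrm{tr}\!\left( R_k^T \, (A^T X_k) \, A \right)
  + \mathrm{tr}\!\left( R_k^T \, (A^T A) \, R_k \, (A^T A) \right)
```

The first term only touches the nonzeros of X_k, A^T X_k is an r x n matrix, and A^T A is r x r.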

RESCAL ALS Algorithm

  1. The starting matrix A^{(0)} is initialized with the first r eigenvectors of the matrix \sum_k (X_k + X_k^T)
  2. Alternately update A and R_k
  3. Keep updating until the objective value converges
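
The alternating updates are commonly given in the following closed form in the RESCAL literature. This is shown as a sketch and may not match the slide's original formulas term for term. The notation is assumed: A is the n x r entity matrix, R_k is the r x r matrix of relation k, \lambda is the regularization weight, I is the identity matrix, and \otimes is the Kronecker product.

```latex
A \leftarrow
  \left[ \sum_{k} \left( X_k A R_k^T + X_k^T A R_k \right) \right]
  \left[ \sum_{k} \left( R_k A^T A R_k^T + R_k^T A^T A R_k \right) + \lambda I \right]^{-1}

\mathrm{vec}(R_k) \leftarrow
  \left( Z^T Z + \lambda I \right)^{-1} Z^T \, \mathrm{vec}(X_k),
  \qquad Z = A \otimes A
```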

Semantic search

Entity Ranking in the Web of Data

Entity Ranking by Modeling Latent Semantics

  1. Retrieve the top-k results using a strong baseline
  2. Get entity vectors from RESCAL
  3. For each candidate entity, compute the cosine similarity to the top-k entity vectors
  4. Re-rank the results using pseudo-relevance feedback
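
The steps above can be sketched as follows. This is an illustration under assumptions, not the method from the paper: the way the baseline score and the similarity are combined (adding `alpha` times the mean cosine similarity to the top-k entities) is hypothetical, as are the function names and the toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(baseline, vectors, k=2, alpha=1.0):
    """Re-rank baseline results by similarity to the top-k results.

    baseline: list of (entity, score) pairs, sorted by descending score.
    vectors:  entity -> latent vector (e.g. a row of the RESCAL matrix A).
    """
    top_k = [entity for entity, _ in baseline[:k]]
    rescored = []
    for entity, score in baseline:
        # Compare against the top-k entities, excluding the entity itself.
        sims = [cosine(vectors[entity], vectors[t]) for t in top_k if t != entity]
        feedback = sum(sims) / len(sims) if sims else 0.0
        rescored.append((entity, score + alpha * feedback))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy data: e4 is close to the top-2 entities in latent space, e3 is not.
baseline = [("e1", 1.0), ("e2", 0.9), ("e3", 0.5), ("e4", 0.45)]
vectors = {"e1": [1.0, 0.0], "e2": [1.0, 0.0], "e3": [0.0, 1.0], "e4": [1.0, 0.0]}
print([entity for entity, _ in rerank(baseline, vectors)])  # ['e1', 'e2', 'e4', 'e3']
```

Treating the top-k baseline results as relevant without any user judgment is what makes this a pseudo-relevance feedback scheme.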

with E. Agichtein (CIKM'13)

=> +7.1% NDCG@10, +6.4% MAP@10

Semantic Search of Math Formulas

http://cll.niimm.ksu.ru/mathsearch

Software

  • Anduin, processing RDF on Hadoop: http://github.com/nzhiltsov/Anduin

 (> 600 downloads on mloss.org)

 

  • Ext-RESCAL, scalable factorization of sparse tensors: http://github.com/nzhiltsov/Ext-RESCAL

Contacts

{firstname}.{lastname}@gmail.com

@nzhiltsov

 

 

Thanks for your attention!

Spasibo!

Presentation at Wayne State University

By Nikita Zhiltsov
