Enterprise

Semantics


Michael Grove — Chief Software Architect
@mikegrovesoft — http://clarkparsia.com

About Us


  • Since 2005; offices in Washington, DC and Boston, MA
  • Global leader in semantic technology & information integration
  • Strong academic partnerships in US, UK, EU, Latin America
  • Extensive experience building semantic technology based solutions

Services Experience

  • 10 years building semantic technology based solutions
    • US Gov't, banking, financial, energy, health/bio, retail
    • 75+ years aggregate experience in the field
      • 60% of employees: advanced degrees in semtech, AI, etc.
  • Application development and consulting
    • We've built lots of applications for and with our customers
    • We're good at this because we love doing it
  • Lots of semtech R&D: publicly and privately funded
    • "Research as a Service"
    • Contributed code to Sesame, Jena, OWL-API

      Training and Support


      • Comprehensive training suite
        • Based on a decade of customer engagements
        • Covers the full spectrum of semtech and related technologies:
          • Introduction to Semantic Technologies all the way to Advanced OWL Modeling 
          • Training for all products in our stack
      • Consulting & Support
        • For our products, naturally
        • But also for open source
          • Protege
          • Sesame, Jena

      Our Technology


      POPS


      • NASA is a big, interesting organization
        • 100k employees spread over 12 centers across US
        • Have a universe of data
      • But one simple, terrestrial problem: finding experts
        • COTS solutions were not working
        • Data required to solve problem was already in-house
      • Enter POPS
        • We built a model over all relevant data sources
        • Created a simple web interface & some social analytics to let NASA use their own data
        • Saved them $38M a year!
        • Now an official W3C case study on how to use semtech

      Why Clark & Parsia?


      • Experience ... We've been building solutions for Enterprise Semantics for nearly a decade
      • Leadership ... We've led from academia, from industry, from standardization
      • Technical Excellence ... We provide the leading OWL DL reasoner, the leading RDF database, and unique capabilities available nowhere else
      • Focus on Customer Success ... We're uniquely focused on helping our customers solve the hardest information problems in the world




      Stardog


      • The leading RDF database
        • Currently at 2.1.1 (2014-02-13)
        • 54 releases, or, 1 every 3 weeks
        • 5000x increase in scale, 1000x increase in performance
      • Pure Java
      • Community & Enterprise Editions
      • Rich feature set
        • Reasoning, ICV, full-text search, ACID, High Availability ...
      • Focus on developer experience

      Performance

      • Query
        • Loading lots of data not useful if you cannot query it
        • Query 100M triples with a throughput of 3M+ queries/hour, 1B at 500k queries/hour and 10B at 20k queries/hour.
          • BSBM, 64 concurrent clients
        • Fastest SP2B benchmark results at 5M, only known completion at 25M, close to completing at 100M
      • Scale
        • Up to 50B triples/quads on modest hardware
      • Load rates up to 500k triples/second
        • That's 100M triples in 3 minutes, 1B in 30 minutes, and 20B in 20 hours
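
The load-time figures above are simple division of triple counts by a load rate. A quick check, assuming the peak rate of 500k triples/second is sustained for the whole load (the function names are illustrative only). Note that the 20B-in-20-hours figure implies a lower average rate at that scale, which is consistent with the "up to" wording:

```python
PEAK_RATE = 500_000  # triples/second, the "up to" figure quoted above


def load_seconds(triples: int, rate: float = PEAK_RATE) -> float:
    """Seconds needed to load `triples` at a constant `rate` (triples/second)."""
    return triples / rate


def implied_rate(triples: int, seconds: float) -> float:
    """Average rate (triples/second) implied by loading `triples` in `seconds`."""
    return triples / seconds


for label, count in [("100M", 100_000_000), ("1B", 1_000_000_000)]:
    print(f"{label}: {load_seconds(count) / 60:.1f} minutes at peak rate")

rate_20b = implied_rate(20_000_000_000, 20 * 3600)
print(f"20B in 20 hours implies about {rate_20b:,.0f} triples/second")
```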

        Developers

        Query


        • SPARQL 1.1
          • Query, Update, Graph Protocol
        • Highly optimized query planner
          • Optimized for analytic/BI queries
          • And queries from the reasoner
          • But does not sacrifice performance at low scales
        • Scalable query answering
          • Intermediate results get big, and fast
          • Runtime analysis of heap status; flow intermediate results into direct memory, or onto disk
        • Query Management
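
The idea of flowing intermediate results out of the heap can be sketched as below. This is an illustration only, not Stardog's implementation: `SpillingBuffer` is a hypothetical name, and a fixed `max_in_memory` threshold stands in for real runtime analysis of heap status.

```python
import pickle
import tempfile


class SpillingBuffer:
    """Keeps rows in memory up to a threshold, then writes the rest to a temp file."""

    def __init__(self, max_in_memory: int):
        self.max_in_memory = max_in_memory  # stand-in for a heap-pressure check
        self._memory = []
        self._file = None
        self.spilled = 0  # number of rows written to disk

    def add(self, row) -> None:
        if len(self._memory) < self.max_in_memory:
            self._memory.append(row)
            return
        if self._file is None:
            self._file = tempfile.TemporaryFile()
        self._file.seek(0, 2)  # always append at the end of the file
        pickle.dump(row, self._file)
        self.spilled += 1

    def __iter__(self):
        # In-memory rows first, then the spilled rows in insertion order.
        yield from self._memory
        if self._file is not None:
            self._file.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self._file)


buffer = SpillingBuffer(max_in_memory=2)
for i in range(5):
    buffer.add((i, i * i))
print(buffer.spilled, list(buffer))
```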

        Enterprise


        • JMX Server Monitoring
        • High Availability
        • Hot backup & restore
        • Access & Audit logging
        • Web Console built on Stardog Web Framework
        • PROV support

        Transactions & Security

        • Transactions
          • ACID
          • Guarded (optionally) by ICV
          • 2 Phase Commit over all database components
            • RDF Index, Lucene, KB, etc.
            • Automatically managed by database
        • Security
          • RBAC model
            • Based on Apache Shiro
            • R/W ACLs for access to individual databases
            • Administrative controls for actions against the DBMS
              • Online/offline a database, modify security settings, etc.
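
A minimal sketch of two-phase commit over several database components, to show the shape of the protocol mentioned above. The component names come from the slide; the `Participant` class and its methods are hypothetical and do not reflect Stardog's internal API.

```python
class Participant:
    """One component taking part in a transaction (e.g. an index)."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare  # simulates a component that cannot commit
        self.state = "idle"

    def prepare(self) -> bool:
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants) -> bool:
    # Phase 1: ask each participant to prepare; stop at the first "no" vote.
    if all(p.prepare() for p in participants):
        # Phase 2: everyone voted yes, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the transaction for every participant.
    for p in participants:
        p.rollback()
    return False


components = [Participant("RDF Index"), Participant("Lucene"), Participant("KB")]
print(two_phase_commit(components), [p.state for p in components])
```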

        ICV


        • Integrity Constraint Validation inside Stardog's transaction system
        • Prevents data modifications that violate your integrity constraints
        • Can be used in 'guard mode'
        • Also supports 'oracle mode' aka 'middleware mode'
          • Check whether a given graph & constraints are valid
          • Executed outside a transaction
        • Constraints expressed in SPARQL, OWL, SWRL, or Stardog Rules
          • High-level declarative languages make it easy to author simple constraints, possible to author complex ones

        Simple ICV Example


        Every supervisor should supervise at least one Employee
        Supervisor subClassOf supervises some Employee 
        IF { 
            ?x a Supervisor 
        } 
        THEN { 
            ?x supervises ?y . 
            ?y a Employee 
        } 
        select * { 
            ?x a Supervisor. 
            FILTER NOT EXISTS {
                ?x supervises ?y . 
                ?y a Employee 
            } 
        }  
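
To make the semantics of the SPARQL form concrete, here is a small sketch that evaluates the same check over an in-memory set of triples. The sample data and the `supervisor_violations` helper are made up for illustration; in practice the database evaluates the constraint.

```python
def supervisor_violations(triples):
    """Supervisors with no `supervises` edge to anything typed as Employee."""
    supervisors = {s for s, p, o in triples if p == "a" and o == "Supervisor"}
    employees = {s for s, p, o in triples if p == "a" and o == "Employee"}
    # Subjects that satisfy the pattern inside FILTER NOT EXISTS.
    satisfied = {s for s, p, o in triples if p == "supervises" and o in employees}
    return sorted(supervisors - satisfied)


data = {
    ("Alice", "a", "Supervisor"),
    ("Bob", "a", "Supervisor"),
    ("Carol", "a", "Employee"),
    ("Bob", "supervises", "Carol"),
}
print(supervisor_violations(data))
```

Alice is typed as a Supervisor but supervises no Employee, so she is the only binding the query returns.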

        A more complex example

        If a project is funded by only internal funding sources, then it should be approved by the internal budget office
          Project and (fundedBy only InternalFundingSource) subClassOf approvedBy value InternalBudgetOffice 
          select * where { 
              ?x a Project . 
              FILTER NOT EXISTS {
                  ?x fundedBy ?y . 
                  FILTER NOT EXISTS { 
                      ?y a InternalFundingSource 
                  } 
              } . 
              FILTER NOT EXISTS {
                  ?x approvedBy InternalBudgetOffice 
              } 
          } 
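
The nested FILTER NOT EXISTS is a double negation: "there is no funder that is not internal". A sketch of the same logic over in-memory triples, with made-up data and a hypothetical helper name. As in the SPARQL query, a project with no funders at all counts as "funded only by internal sources".

```python
def unapproved_internal_projects(triples):
    """Projects whose funders are all internal but which lack budget office approval."""
    projects = {s for s, p, o in triples if p == "a" and o == "Project"}
    internal = {s for s, p, o in triples if p == "a" and o == "InternalFundingSource"}
    violations = []
    for project in sorted(projects):
        funders = {o for s, p, o in triples if s == project and p == "fundedBy"}
        # Outer NOT EXISTS: no funder falls outside the internal sources.
        only_internal = funders <= internal
        approved = (project, "approvedBy", "InternalBudgetOffice") in triples
        if only_internal and not approved:
            violations.append(project)
    return violations


data = {
    ("P1", "a", "Project"), ("P2", "a", "Project"), ("P3", "a", "Project"),
    ("F1", "a", "InternalFundingSource"),
    ("P1", "fundedBy", "F1"),
    ("P2", "fundedBy", "F1"), ("P2", "approvedBy", "InternalBudgetOffice"),
    ("P3", "fundedBy", "F2"),  # F2 is not known to be internal
}
print(unapproved_internal_projects(data))
```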

          ICV Explanations


          • If you are using ICV
            • You may not understand why a constraint was violated
            • Or you might want to show the reason to a user
          • Explanations
            • Tell you why the violation occurred
              • Give you the proof used to derive the violation
              • Show the exact data which violated the constraint

          ICV Explanation Example


          Every supervisor should supervise at least one Employee:
          Supervisor subClassOf supervises some Employee
          Alice a Supervisor 
          VIOLATED Supervisor subClassOf (supervises some Employee)
             ASSERTED     Alice a Supervisor
             NOT_INFERRED x a Employee
                          Alice supervises x 

          Full-text Search


          • Embeds Lucene
            • Automatically managed by the database as if it were another RDF index
          • Enables full-text search over your RDF
            • Literal values indexed by Lucene
            • Use the Lucene query language to search
          • Seamless integration with SPARQL
            • Lucene search results are returned as SPARQL results
          • Also available directly via SNARL API

          What is Reasoning?


          • Make implicit information explicit
            • Implicit in the schema, or the data, or both
          • Represent domain knowledge in a formal, declarative model
            • Called an ontology
              • Like UML, but with formal semantics
            • W3C specification called OWL, the Web Ontology Language
          • Reasoners consume ontologies to derive new information
            • Answer queries, find inconsistencies

          • Complex, but manageable
            • OWL divided into profiles with less expressivity but better computational properties

          Reasoning


          • Unmatched OWL Support
            • All OWL2 profiles (QL, RL, EL, DL) and Stardog profile (SL)
            • Caveats: no equality reasoning, no datatype reasoning, DL over schema only
          • Query time reasoning
            • No write performance penalty
            • Pay for what you use
          • Reasoning services
            • Consistency checking, Satisfiability
          • Explanations
            • Like ICV, you can get explanations for inferences

          Rules


          • Stardog supports SWRL, a W3C Member Submission
            • Part of the SL profile
            • You can use SWRL/RDF syntax, but it's terrible
            • Much easier to use Stardog Rules
              • If-Then style based on SPARQL Syntax
          PREFIX :
          PREFIX math: 
          IF {
               ?c a :Circle ;
                    :radius ?r
               BIND (math:pi() * math:pow(?r, 2) AS ?area)
          }
          THEN {
              ?c :area ?area
          }
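
For readers unfamiliar with rule syntax, the effect of the rule above can be mimicked over in-memory triples as follows. This is only a sketch of what the rule infers, not of how a reasoner evaluates it; the triple encoding and function name are assumptions.

```python
import math


def apply_circle_area_rule(triples):
    """For every :Circle with a :radius, infer an :area triple (pi * r^2)."""
    circles = {s for s, p, o in triples if p == "a" and o == ":Circle"}
    inferred = set()
    for s, p, o in triples:
        if s in circles and p == ":radius":
            inferred.add((s, ":area", math.pi * math.pow(o, 2)))
    return inferred


data = {
    (":c1", "a", ":Circle"),
    (":c1", ":radius", 2.0),
    (":s1", ":radius", 3.0),  # has a radius but is not a :Circle, so the rule skips it
}
print(apply_circle_area_rule(data))
```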



          Questions?



          Thanks!

          Enterprise Semantics


          • Complex organizations face challenging IT problems
          • Enterprise Semantics: model-driven information integration
            • Provide solutions for decision support, analysis, BI
            • Use all facets of semantic technology, AI, knowledge representation, etc.
          • Smart data, not big data
            • Scale is not a condition for utility
            • Not all problems solved by adding more data

          Graph Analytics

          • Coming in 2.2
          • Take advantage of the fact that RDF is a graph
          • Graph measures: in-degree, out-degree, PageRank, betweenness centrality
          • Clustering: weakly/strongly connected components, Bron-Kerbosch clique finding
          • Path finding: BFS and shortest path
          • Seamless SPARQL integration and SNARL API 
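
Two of the measures listed above, degree counts and BFS shortest path, can be sketched directly over triples treated as directed edges. This illustrates the idea only; it is not the 2.2 API, and the function names and sample data are made up.

```python
from collections import Counter, deque


def degrees(triples):
    """Return (in_degree, out_degree) counters, treating each triple as an edge s -> o."""
    out_degree = Counter(s for s, _, _ in triples)
    in_degree = Counter(o for _, _, o in triples)
    return in_degree, out_degree


def shortest_path(triples, start, goal):
    """Breadth-first search ignoring predicates; returns a list of nodes or None."""
    adjacency = {}
    for s, _, o in triples:
        adjacency.setdefault(s, []).append(o)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the predecessor chain back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


edges = [
    ("a", "knows", "b"), ("b", "knows", "c"),
    ("a", "knows", "d"), ("d", "knows", "c"),
    ("c", "knows", "e"),
]
in_degree, out_degree = degrees(edges)
print(in_degree["c"], out_degree["a"], shortest_path(edges, "a", "e"))
```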

          Stardog Web


          • Focus on the Web part of the Semantic Web
            • Organizations don't always have semtech experts at hand
          • Provide a framework that abstracts away those details
            • Stick to well-known web technologies
              • HTML, CSS, JavaScript, JSON
              • backbone.js as the model layer, SPARQLRoutes as middleware
          • Goal is to provide good out of the box capabilities
            • Faceted-browsing, semantic search, REST, CRUD, etc.
            • Minimal configuration or programming required, just add data
            • Provide basis for quickly building web apps based on semtech
              • Aimed at data discovery/exploration use-cases
          • Stardog Web Console built on this technology

          How does it work?


          • Built on Backbone.js
            • Using a Yeoman generator to bootstrap the application
          • SPARQL Routes
            • RESTful middleware for SPARQL without all the query 
          • Data exchange is JSON-LD
            • You don't need to know RDF or fumble with RDF serializations
          • Templates based on Handlebars
          • Visualizations in D3

          Reasoning Example


          • For example, enforcing security (ACLs)
          • Can Bob access Resource1?
          Bob is-a Admin OR Bob created Resource1 OR (Bob hasRole ?r AND ?r canAccess Resource1) OR ... 
          • Hard to maintain; encodes domain knowledge in the query
          • Can leverage reasoning to simplify
          Bob canAccess Resource1 
          • More concise and maintainable
            • Reasoner handles the implementing logic transparently
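
The shift described above, from a long disjunctive query to a single `canAccess` lookup, can be sketched as follows. The three rules are taken from the disjunction on this slide; in Stardog they would live in the ontology or rules, and the Python function here is only a stand-in for the reasoner. The sample facts and the `Resource` type are assumptions.

```python
def infer_can_access(facts):
    """Derive (user, 'canAccess', resource) triples from the three access rules."""
    resources = {s for s, p, o in facts if p == "a" and o == "Resource"}
    inferred = set()
    for s, p, o in facts:
        if p == "a" and o == "Admin":
            # Admins can access every resource.
            inferred |= {(s, "canAccess", r) for r in resources}
        elif p == "created" and o in resources:
            # Creators can access what they created.
            inferred.add((s, "canAccess", o))
        elif p == "hasRole":
            # Users inherit access granted to their roles.
            inferred |= {(s, "canAccess", r) for r in resources
                         if (o, "canAccess", r) in facts}
    return inferred


facts = {
    ("Resource1", "a", "Resource"),
    ("Bob", "hasRole", "Editor"),
    ("Editor", "canAccess", "Resource1"),
    ("Eve", "a", "Guest"),
}
knowledge = facts | infer_can_access(facts)
# The application query is now a single pattern.
print(("Bob", "canAccess", "Resource1") in knowledge)
print(("Eve", "canAccess", "Resource1") in knowledge)
```

Adding a new access rule changes only the rule set; every query that asks `canAccess` picks it up without being rewritten.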

          Clark & Parsia: Enterprise Semantics

          By Michael Grove

          A brief overview of Clark & Parsia and Stardog