Stardog Unleashed on NYC


Michael Grove
Chief Software Architect - Clark & Parsia, LLC

About Us


  • Founded in 2005; offices in Washington DC & Boston
  • Application development & consulting
  • Customers in US Gov't, banking/financial, energy, health/bio, retail
  • Strong academic partnerships in US, UK, Europe, and Mexico
  • Experts in all things semantic
    • OWL/RDF/SPARQL/SWRL and any other acronym
    • Information Integration, Expertise Location, Policy Management, Enterprise Decision Support, Application Development

Overview


  • Use Cases of Semantic Technology in the Financial Industry
  • Stardog Overview
  • Stardog Web demo

Customer 360


  • Unify customer information: integrate all data about a customer as it's discovered
    • Past, Present, and Future
    • Pull from a variety of sources, including unstructured ones, many of which are non-relational
    • Take advantage of the flexible nature of semantic technology

Data Provenance


  • Capture the provenance of data throughout its lifecycle
  • Utilize this information to enable data governance and regulatory compliance
  • Annotate data as it comes in; continuously updated
  • W3C spec dedicated to this: PROV

Reference Data


  • Create a 'gold standard' for names, labels and identities
    • Represent core industry terms and concepts
    • Ties into Data Provenance
  • Modeling complex relationships between entities can be trivial using semantic technology
  • FIBO (Financial Industry Business Ontology) is a great example

Compliance


  • Reduce compliance efforts to query answering and graph analytics
  • Legal regulations are complex
    • And tracking related policies is a time-consuming job
    • Cost of implementation is high; cost of a failure is catastrophic
  • Utilize reasoning & rules
    • Express regulations and policies as complex relationships
    • Workflows and compliance checking can be performed by a reasoner
      • Automated compliance analysis with explanations

Analytics & Decision Support


  • Empower human decision making with contextualized, relevant information
  • There is a lot of value in unstructured information
    • But it is hard to query
    • And even harder to extract
  • But what you can extract is very valuable
    • As you build up, you create actionable information
    • Sift through the data to find the facts so a human can make decisions more quickly and easily

Examples


  • We've built some applications for our customers based on some of these concepts
    • Cross Matcher
    • Policy Management
    • POPS

What's the Common Thread?


  • All information integration problems
    • i.e., not really specific to financial services
  • So how do you solve them?
    • Specifically, what's the best way to perform information integration?
  • Semantic Graphs

Semantic Graphs


  • Create graphs with meaning
    • Encoded within the graph
      • By giving formal, declarative definitions of the nodes and edges
      • Using a high-level language
    • Specifically, to create computer understandable meaning
      • So the computer can help
  • This lets us use the appropriate abstractions
  • And is the obvious choice for information integration problems

Some Programming Required?


  • It's a fact of life: non-programmers exist
    • You might be sitting next to one right now!
    • They can make valuable contributions to a codebase
      • Except, they can't write code
        • And we can't teach them
      • But we tend to lock everything up in the code
        • Like business logic

No Programming Required!


  • So we encode our business logic using a formal semantics
    • We're encoding it in the graph
      • Using a high-level language
        • No programming required
      • Frees it from the codebase; frees it from programmers

Non-programmers Rejoice


  • Let non-programmers perform complex information processing tasks without writing code
  • More directly capture expertise
    • By letting the actual experts author the business logic
  • Easier and more maintainable
    • Using the appropriate abstractions
    • Inference rules & queries
      • So the computer can do the work





Stardog


Performance


  • Query
    • Loading lots of data is not useful if you cannot query it
    • Query 100M triples at 3M+ queries/hour, 1B at nearly 500k queries/hour, and 10B at nearly 20k queries/hour
      • This is BSBM with 64 concurrent clients
    • Fastest SP2B benchmark results at 5M, only known implementation to complete 25M, close to completing 100M
  • Scale
    • Up to 50B triples/quads on modest hardware
  • Load rates up to 500k triples/second
    • That's 100M triples in about 3 minutes, 1B in about 30 minutes, and 20B in 20 hours
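The load figures are simple division. A back-of-the-envelope sketch, assuming a constant rate (the helper names are illustrative, not a Stardog API), gives about 3.3 minutes for 100M triples and about 33 minutes for 1B at the 500k triples/second peak, while the 20 hours quoted for 20B corresponds to a sustained average of roughly 278k triples/second, below the peak.

```python
# Back-of-the-envelope conversions for the performance figures above.
# Assumes a constant rate; helper names are illustrative, not a Stardog API.

def load_seconds(triples: int, rate_per_sec: float) -> float:
    """Seconds needed to load `triples` at a constant `rate_per_sec`."""
    return triples / rate_per_sec


def average_rate(triples: int, hours: float) -> float:
    """Average triples/second implied by loading `triples` in `hours`."""
    return triples / (hours * 3600)


def queries_per_second(queries_per_hour: float) -> float:
    """Convert an hourly query throughput into queries per second."""
    return queries_per_hour / 3600


if __name__ == "__main__":
    peak = 500_000  # peak load rate quoted on the slide, triples/second
    for n in (100_000_000, 1_000_000_000):
        print(f"{n:,} triples at peak: {load_seconds(n, peak) / 60:.1f} min")
    print(f"20B in 20 h: {average_rate(20_000_000_000, 20):,.0f} triples/s average")
    print(f"3M queries/hour: {queries_per_second(3_000_000):.0f} queries/s")
```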

Developers



ICV


  • Integrity Constraint Validation keeps data safe and consistent
  • Prevent modifications that violate your integrity constraints
    • 'Guard mode'
    • Constraint violations abort transactions
  • Also supports 'oracle' mode, aka 'middleware' mode
    • Outside of a transaction
    • Check whether data is valid w.r.t. some constraints
  • Violations can be explained
  • Inferences can satisfy or violate a constraint
  • Constraints expressed in SPARQL, OWL, SWRL, or Stardog Rules
    • High-level declarative languages make it easy to write simple constraints, possible to write complex ones

ICV Example


Every supervisor should supervise at least one employee
Supervisor subClassOf supervises some Employee  
IF { 
    ?x a Supervisor 
} 
THEN { 
    ?x supervises ?y . 
    ?y a Employee 
}  
select * { 
    ?x a Supervisor. 
    FILTER NOT EXISTS {
        ?x supervises ?y . 
        ?y a Employee 
    } 
} 
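The SPARQL form of the constraint reads as a closed-world check: a Supervisor with no supervised Employee is a violation. A minimal Python sketch of that check, assuming triples are plain (subject, predicate, object) tuples with "a" standing for rdf:type. Unlike Stardog ICV, it applies no reasoning, so inferred types and relationships are ignored.

```python
# Closed-world check for "every Supervisor supervises at least one Employee",
# mirroring the FILTER NOT EXISTS query above. No reasoning is applied.

Triple = tuple[str, str, str]  # (subject, predicate, object); "a" = rdf:type


def instances(triples: set[Triple], cls: str) -> set[str]:
    """Subjects explicitly typed as `cls`."""
    return {s for s, p, o in triples if p == "a" and o == cls}


def supervisor_violations(triples: set[Triple]) -> set[str]:
    """Supervisors with no `supervises` edge to an Employee."""
    employees = instances(triples, "Employee")
    satisfied = {s for s, p, o in triples if p == "supervises" and o in employees}
    return instances(triples, "Supervisor") - satisfied


if __name__ == "__main__":
    data = {
        ("Alice", "a", "Supervisor"),
        ("Bob", "a", "Supervisor"),
        ("Bob", "supervises", "Carol"),
        ("Carol", "a", "Employee"),
    }
    print(supervisor_violations(data))  # {'Alice'}
```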

Another ICV Example


If a project is funded by only internal funding sources, then it should be approved by the internal budget office

Project and (fundedBy only InternalFundingSource) subClassOf approvedBy value InternalBudgetOffice 
select * where { 
    ?x a Project . 
    FILTER NOT EXISTS {
        ?x fundedBy ?y . 
        FILTER NOT EXISTS { 
            ?y a InternalFundingSource 
        } 
    } . 
    FILTER NOT EXISTS {
        ?x approvedBy InternalBudgetOffice 
    } 
} 
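The nested FILTER NOT EXISTS is how SPARQL expresses the universal "only": a project qualifies when none of its funders falls outside InternalFundingSource. A sketch of the same logic in Python, under the same tuple assumption and again without reasoning, makes one consequence explicit: a project with no recorded funders satisfies "only" vacuously, so it must be approved too.

```python
# Closed-world check for: Project and (fundedBy only InternalFundingSource)
#                         subClassOf approvedBy value InternalBudgetOffice

def project_violations(triples: set[tuple[str, str, str]]) -> set[str]:
    """Projects funded only internally (or not at all) but not approved."""
    types: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "a":
            types.setdefault(s, set()).add(o)

    violations = set()
    for x, classes in types.items():
        if "Project" not in classes:
            continue
        funders = {o for s, p, o in triples if s == x and p == "fundedBy"}
        # all() over an empty set is True: "only" holds vacuously
        only_internal = all("InternalFundingSource" in types.get(f, set()) for f in funders)
        approved = (x, "approvedBy", "InternalBudgetOffice") in triples
        if only_internal and not approved:
            violations.add(x)
    return violations
```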

ICV Explanations


  • If you are using ICV
    • You may not understand why a violation occurred
    • Or want to communicate it to the user
  • Explanations
    • Tell you why the violation occurred
      • Show exactly the data that caused the violation
      • Give you the proof used to derive the violation

ICV Explanation Example

Every Supervisor should supervise at least one Employee

Supervisor subClassOf supervises some Employee
Alice a Supervisor 
VIOLATED Supervisor subClassOf (supervises some Employee)
   ASSERTED     Alice a Supervisor
   NOT_INFERRED x a Employee
                Alice supervises x 
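The explanation pairs the asserted fact that triggers the constraint with the pattern that could not be found. A toy sketch that produces text in the same shape for the supervisor constraint; the function is hypothetical and only imitates the layout above, whereas Stardog derives its explanations from the proof used to detect the violation.

```python
from typing import Optional

# Toy generator for explanations shaped like the example above.
# Hypothetical helper; it only handles the Supervisor constraint.


def explain_supervisor(triples: set[tuple[str, str, str]], subject: str) -> Optional[list[str]]:
    """Return explanation lines if `subject` violates the constraint, else None."""
    if (subject, "a", "Supervisor") not in triples:
        return None  # constraint does not apply
    employees = {s for s, p, o in triples if p == "a" and o == "Employee"}
    if any(s == subject and p == "supervises" and o in employees for s, p, o in triples):
        return None  # constraint satisfied
    return [
        "VIOLATED Supervisor subClassOf (supervises some Employee)",
        f"   ASSERTED     {subject} a Supervisor",  # fact that triggered the check
        "   NOT_INFERRED x a Employee",              # missing pattern, x unbound
        f"                {subject} supervises x",
    ]


if __name__ == "__main__":
    print("\n".join(explain_supervisor({("Alice", "a", "Supervisor")}, "Alice")))
```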

What is reasoning?


  • Make implicit information explicit
    • Implicit in the schema, or data, or both
    • Represent domain knowledge in a formal declarative model
      • Called an ontology
        • Like UML, but with formal semantics
      • W3C specification called OWL, Web Ontology Language
  • Reasoners consume ontologies to derive new information
    • Answer queries, find inconsistencies
  • Complex, but manageable
    • OWL divided into profiles with less expressivity, but better computational properties 
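The smallest illustration of making implicit information explicit is subclass inference: if Manager is a subclass of Employee and Employee of Person, then every Manager is a Person even though nobody wrote that down. A sketch by forward chaining to a fixpoint, assuming a bare "subClassOf" predicate; real OWL reasoners handle far more constructs than this single rule.

```python
# Forward-chain one rule to a fixpoint:
#   (x a C) and (C subClassOf D)  =>  (x a D)

def infer_types(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    facts = set(triples)
    subclass = {(c, d) for c, p, d in facts if p == "subClassOf"}
    while True:
        new = {
            (x, "a", d)
            for x, p, c in facts if p == "a"
            for c2, d in subclass if c2 == c
        } - facts
        if not new:
            return facts  # fixpoint reached: nothing left to derive
        facts |= new
```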

Reasoning


  • Unmatched OWL support
    • All OWL 2 profiles (EL, QL, RL), OWL 2 DL, and the Stardog profile (SL)
    • Caveats: no equality reasoning, no datatype reasoning, no DL reasoning over your ABox
  • Query time reasoning
    • No write performance penalty
    • Pay for what you use
  • Explanations
    • Inference you don't understand?
    • Reasoner will give you the proof used to derive it!
  • Reasoning Services
    • Consistency checking, satisfiability
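Query-time reasoning leaves the stored data alone and expands the query instead, which is why writes carry no reasoning penalty and only queries that ask for inference pay for it. A sketch of the idea using query rewriting over a class hierarchy, with the schema given as hypothetical (subclass, superclass) pairs; it illustrates the general technique, not Stardog's actual query planner.

```python
# Answer "?x a cls" at query time by rewriting cls into itself plus all of
# its (transitive) subclasses. The stored data is never modified.

def subclasses_of(schema: set[tuple[str, str]], cls: str) -> set[str]:
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in schema:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def instances_of(data, schema, cls, reasoning=True):
    """Without reasoning only explicit types match; with it, subclasses count too."""
    targets = subclasses_of(schema, cls) if reasoning else {cls}
    return {s for s, p, o in data if p == "a" and o in targets}
```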

    Stardog Rules


    • Stardog supports SWRL
      • Part of the SL profile
      • You can't write it by hand; SWRL/RDF is unusable
      • Much easier to use Stardog Rules
        • If-Then style rules based on SPARQL syntax:
     
    PREFIX :
    PREFIX math: 
    IF {
        ?c a :Circle ;
             :radius ?r
        BIND (math:pi() * math:pow(?r, 2) AS ?area)
    }
    THEN {
        ?c :area ?area
    } 
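Read procedurally, the rule says: for every :Circle with a :radius, derive an :area of pi times the radius squared. The same derivation as a Python sketch over tuples, with literal values stored as floats; the prefixed names are kept as plain strings because the prefix IRIs are not given above.

```python
import math

# Procedural reading of the Stardog Rule above:
#   IF ?c a :Circle ; :radius ?r  THEN ?c :area (pi * r^2)


def apply_area_rule(triples):
    circles = {s for s, p, o in triples if p == "a" and o == ":Circle"}
    return {
        (s, ":area", math.pi * math.pow(r, 2))
        for s, p, r in triples
        if p == ":radius" and s in circles
    }
```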

    Query


    • SPARQL 1.1
      • Update, Query, Graph Store Protocol
    • Custom query planner, optimized for complex queries
      • Targets BI/analytic queries
      • And also reasoning
      • But does not sacrifice performance at low scales or with simple queries
    • Scalable query answering
      • Intermediate results can get big, and fast
      • Runtime will automatically flow results off-heap, and then to disk as needed
    • Query management  
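The off-heap and on-disk behaviour is specific to Stardog's JVM runtime, but the underlying idea of keeping intermediate results in memory up to a limit and spilling the rest to disk can be sketched in a few lines. This is a hypothetical buffer, not a Stardog API; it uses pickle and a temporary file and does not support appending while iterating.

```python
import io
import pickle
import tempfile


class SpillingBuffer:
    """Keep up to `max_in_memory` rows in memory; spill the rest to a temp file."""

    def __init__(self, max_in_memory: int):
        self.max_in_memory = max_in_memory
        self.memory: list = []
        self.spill = None  # temp file, created lazily on first overflow
        self.spilled = 0   # number of rows written to disk

    def append(self, row) -> None:
        if len(self.memory) < self.max_in_memory:
            self.memory.append(row)
            return
        if self.spill is None:
            self.spill = tempfile.TemporaryFile()  # binary mode, deleted on close
        self.spill.seek(0, io.SEEK_END)            # always append at the end
        pickle.dump(row, self.spill)
        self.spilled += 1

    def __iter__(self):
        yield from self.memory
        if self.spill is not None:
            self.spill.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self.spill)

    def close(self) -> None:
        if self.spill is not None:
            self.spill.close()
```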

    Full Text Search


    • Embeds Lucene
      • Automatically managed by the database as if it were another RDF index
    • Enables full-text searches over your RDF
      • Literals are indexed by Lucene
      • Uses the Lucene query language
    • Seamless integration via SPARQL
      • Join results of full-text searches with regular SPARQL query
    • Also available via SNARL Java API
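Joining full-text hits with a regular graph pattern amounts to intersecting two sets of subjects. A toy sketch, assuming a hypothetical Literal wrapper to tell literals from IRIs and a naive word index standing in for Lucene (no analyzers, scoring, or Lucene query syntax):

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper marking an object as a literal rather than an IRI."""
    value: str


def build_index(triples) -> dict[str, set[str]]:
    """Naive inverted index: lower-cased word -> subjects whose literals contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for s, _, o in triples:
        if isinstance(o, Literal):
            for word in re.findall(r"\w+", o.value.lower()):
                index[word].add(s)
    return index


def search_join(triples, index, word: str, cls: str) -> set[str]:
    """Full-text hits for `word`, joined with the graph pattern '?s a cls'."""
    hits = index.get(word.lower(), set())
    typed = {s for s, p, o in triples if p == "a" and o == cls}
    return hits & typed
```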

    Enterprise Features


    • JMX server monitoring
    • Hot Backup & Restore
    • Access/Audit logging
    • Web console built on Stardog Web Framework
    • PROV and SKOS support
    • ACID Transactions
    • Rich Security model

    Archetypes


    • Named bundle of data and functionality that can be applied when a database is created
      • Intended to support data standards and/or toolchains in a simple way
      • Mix and match these when the database is created
    • PROV and SKOS support are built in
      • FIBO is next
      • Can also be user-defined!




    Graph Versioning


    • Version control is insanely useful
      • Sometimes I wonder how people live without it
      • So why not for an RDF database?
    • Stardog adds commit management features similar to those of popular version control systems
      • Add metadata, like comments, to commits
      • Create tags
      • Revert to a previous version
      • Get diffs between versions
    • Oh, all of this is stored as RDF
      • So you can query your version history
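Commits, tags, reverts, and diffs all fall out of treating each version as a set of triples. A deliberately naive sketch that stores a full snapshot per commit (a practical system would store deltas); the class and method names are hypothetical, not Stardog's versioning API.

```python
class VersionedGraph:
    """Each commit is (snapshot of triples, comment); version ids are list indexes."""

    def __init__(self):
        self.commits = [(frozenset(), "initial")]
        self.tags: dict[str, int] = {}

    @property
    def head(self) -> frozenset:
        return self.commits[-1][0]

    def commit(self, added=(), removed=(), comment="") -> int:
        snapshot = (self.head - set(removed)) | set(added)
        self.commits.append((frozenset(snapshot), comment))
        return len(self.commits) - 1

    def tag(self, name: str, version: int) -> None:
        self.tags[name] = version

    def diff(self, old: int, new: int):
        """Return (triples added, triples removed) going from `old` to `new`."""
        a, b = self.commits[old][0], self.commits[new][0]
        return b - a, a - b

    def revert(self, version: int, comment: str = "revert") -> int:
        """Record a new commit whose content equals an earlier version."""
        target = self.commits[version][0]
        return self.commit(added=target - self.head, removed=self.head - target, comment=comment)
```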

    Admin Console


    • In Stardog 2.0 we added the Web Console
      • Expose the features of the stardog CLI in an easy-to-use web interface
        • Add/Remove data, execute queries, etc.
        • Or simply browse your data
    • Coming in 2.2, we're adding an administrative web console
      • Create and drop databases, manage security, etc.
      • Everything you can do via the stardog-admin CLI

    Sneak Peek



    The Fixer


    • We talked about ICV
    • Finding and explaining violations is nice
      • But what do you do with these?
      • How about we fix them?
    • Semi-automated repair plans
      • Use the reasoner, constraints, and a planner to find ways to fix violations
      • When the solution is unambiguous, it can be applied automatically
      • And when it's not, Stardog can present multiple plans
        • So a human can pick which one to apply
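The Fixer was only previewed here, so the following is a hypothetical illustration of the decision rule described above rather than its actual design: generate candidate repairs for a violation, apply automatically when exactly one exists, and otherwise hand the options to a person. Candidate plans for the supervisor constraint are either adding a supervises edge to a known employee or retracting the Supervisor type.

```python
Triple = tuple[str, str, str]
Plan = tuple[frozenset, frozenset]  # (triples to add, triples to remove)


def supervisor_repairs(supervisor: str, known_employees: list[str]) -> list[Plan]:
    """Hypothetical repair candidates for one supervisor violation."""
    plans = [(frozenset({(supervisor, "supervises", e)}), frozenset()) for e in known_employees]
    plans.append((frozenset(), frozenset({(supervisor, "a", "Supervisor")})))
    return plans


def apply_or_ask(triples: set[Triple], plans: list[Plan]):
    """Apply an unambiguous plan; otherwise return the data unchanged plus the options."""
    if len(plans) == 1:
        to_add, to_remove = plans[0]
        return (triples - to_remove) | to_add, None
    return triples, plans
```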

    Stardog Cluster


    • HA Cluster
    • Active Replication
      • 2PC-based commit protocol for strong consistency
      • Writes processed by coordinator to determine order of operations
      • Reads are distributed evenly over all nodes
    • Coming Soon!
      • Closed beta starting next month
      • Aiming for general release in Q3
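The two-phase commit protocol the cluster builds on has a simple core: the coordinator asks every node to prepare, and commits only if all of them vote yes. A textbook sketch of that core; it omits the durable logging, timeouts, and coordinator recovery a real cluster needs, and the classes are hypothetical rather than Stardog's implementation.

```python
class Participant:
    """A node (or database component) taking part in a distributed commit."""

    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "idle"

    def prepare(self, txn: str) -> bool:
        # Phase 1: a real node would make the write durable before voting yes
        self.state = "prepared" if self.will_vote_yes else "refused"
        return self.will_vote_yes

    def commit(self, txn: str) -> None:
        self.state = "committed"

    def abort(self, txn: str) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant], txn: str) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(txn) for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: commit
            p.commit(txn)
        return True
    for p in participants:  # phase 2: abort
        p.abort(txn)
    return False
```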

    What else?


    • Graph analytics
    • Named graph security
    • Stored Procedures
    • GeoSPARQL
    • Materialized views
    • Equality reasoning
    • R2RML support
    • And as always, faster & more scalable

    Stardog Web


    • Focus on the Web part of Semantic Web
      • Organizations don't always have experts in semtech available
    • Provide a framework that abstracts away these details
      • Stick to well-known web technologies
        • HTML, CSS, JavaScript, JSON as data
        • Backbone.js as a model layer, SPARQL Routes middleware
    • Enable teams to start building an application right away
      • Without focusing on learning semtech or graphs
      • Because the value is in solving the problem

    Stardog Web


    • Stardog Web Console is built on this technology
    • Have good out-of-the-box capabilities
      • Search, CRUD, REST, faceted browsing
      • Templates, plugin mechanisms
    • Requiring minimal programming
      • JSON based configuration
      • Just Add Data
    • Provide basics for building web applications based on semtech quickly and easily
    • Soon to be open source




    Demo





    Questions?




    Thanks!

    Transactions & Security


    • Transactions
      • ACID
      • Guarded (optionally) by ICV
      • Two-phase commit over all database components
        • RDF Index, Lucene, KB, etc.
        • Automatically managed by the database
    • Security
      • RBAC model
        • Based on Apache Shiro
        • R/W ACLs for access to individual databases
        • Administrative controls for actions against DBMS
          • Online/offline a database, modify security settings, etc.
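Role-based access control reduces to two lookups: user to roles, and role to permissions. A minimal sketch with hypothetical names; it is not the Apache Shiro API and ignores Shiro features such as wildcard permission strings.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityModel:
    user_roles: dict[str, set[str]] = field(default_factory=dict)
    # role -> set of (action, resource) pairs, e.g. ("read", "db:finance")
    role_permissions: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def is_permitted(self, user: str, action: str, resource: str) -> bool:
        """True if any of the user's roles grants `action` on `resource`."""
        return any(
            (action, resource) in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )
```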

    Graph Analytics


    • Coming in Stardog 2.3
    • RDF graphs are still just graphs
    • Graph measures: in-degree, out-degree, PageRank, betweenness centrality
    • Clustering: weakly/strongly connected components, clique finding
    • Path finding: BFS and shortest path
    • Seamless SPARQL integration
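Because an RDF graph is still a graph, the classic measures apply directly once each triple is read as a directed edge from subject to object. A sketch of in/out-degree and BFS shortest path that ignores predicate labels; the function names are hypothetical, not the Stardog 2.3 interface.

```python
from collections import defaultdict, deque


def degrees(triples):
    """In- and out-degree per node, counting each triple as one edge."""
    in_deg: dict[str, int] = defaultdict(int)
    out_deg: dict[str, int] = defaultdict(int)
    for s, _, o in triples:
        out_deg[s] += 1
        in_deg[o] += 1
    return dict(in_deg), dict(out_deg)


def shortest_path(triples, start, goal):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    adjacency = defaultdict(list)
    for s, _, o in triples:
        adjacency[s].append(o)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```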

    Graph Analytics



    Reasoning Example


    • For example, enforcing security (ACLs)
    • Can Bob access Resource1?
    Bob is-a Admin OR Bob created Resource1 OR (Bob hasRole ?r AND ?r canAccess Resource1) OR ... 
    • Hard to maintain: domain knowledge is encoded into the query
    • Can leverage reasoning to simplify
    Bob canAccess Resource1 
    • More concise and maintainable
      • Reasoner handles the implementing logic transparently
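The gain is in where the access logic lives. In this sketch the three access conditions are rules applied by a generic fixpoint loop, and the question itself shrinks to a single triple lookup. The rules are Python lambdas standing in for OWL axioms or Stardog Rules, and the loop materializes inferences eagerly, whereas Stardog reasons at query time; both are simplifications.

```python
# Each rule maps the current facts to derived (user, "canAccess", resource) triples.
RULES = [
    # Admins can access every Resource
    lambda f: {(u, "canAccess", r)
               for u, p, c in f if p == "a" and c == "Admin"
               for r, p2, c2 in f if p2 == "a" and c2 == "Resource"},
    # Creators can access what they created
    lambda f: {(u, "canAccess", r) for u, p, r in f if p == "created"},
    # Users inherit access granted to their roles
    lambda f: {(u, "canAccess", r)
               for u, p, role in f if p == "hasRole"
               for s, p2, r in f if s == role and p2 == "canAccess"},
]


def saturate(facts, rules):
    """Apply rules until nothing new is derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new
```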

    Smart Data


    • Scale is not a necessary condition for utility
      • Not all problems are solved by adding more data
      • Getting value from data comes down to how easily you can do something with it
    • Smart data is data with semantics attached to it
      • Gives data meaning
        • More specifically, data with a computer understandable meaning
      • Which means the computer can help
        • And that makes it easier to utilize data
        • Analysis, BI, decision support, etc.
