ahmad.haj.mosa@pwc.com
LinkedIn: ahmad-haj-mosa
schneider.fabian@pwc.com
LinkedIn: fabian-schneider-24122379
[Figure, source: DARPA XAI program: learning techniques arranged along prediction accuracy vs. explainability — Neural Nets / Deep Learning at the high-accuracy, low-explainability end; Decision Trees at the high-explainability end; with Statistical Models (SVMs), Graphical Models (Bayesian Belief Nets, SRL, MLNs, Markov Models, AOGs) and Ensemble Methods (Random Forests) in between.]
| System 1 | System 2 |
|---|---|
| drive a car on highways | drive a car in cities |
| come up with a good chess move (if you're a chess master) | point your attention towards the clowns at the circus |
| understand simple sentences | understand legal clauses |
| correlation | causation |
| hard to explain | easy to explain |
source: Thinking fast and slow by Daniel Kahneman
[Diagram: thinking fast vs. thinking slow, learning slow vs. learning fast, hard vs. easy explanation — bridged by the consciousness prior.]
source: arXiv:1810.02338: Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Sym
Sym
Sym
RE
ML
NTN (Neural Tensor Network): learns representations of entities and their relations in a knowledge base, unsupervised, using a tensor representation.
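As a rough illustration (not the authors' implementation), the NTN scoring function of Socher et al. — score(e1, R, e2) = uᵀ tanh(e1ᵀ W_R e2 + V_R [e1; e2] + b_R) — can be sketched in plain Python; all weights below are hypothetical toy values:

```python
import math

def bilinear(e1, W, e2):
    # e1^T W e2 for one tensor slice W (a d x d matrix as nested lists)
    return sum(e1[i] * sum(W[i][j] * e2[j] for j in range(len(e2)))
               for i in range(len(e1)))

def ntn_score(e1, e2, W_slices, V, b, u):
    # One hidden unit per tensor slice: tanh(bilinear + linear + bias),
    # then a weighted sum with u gives the relation score.
    concat = e1 + e2
    hidden = [math.tanh(bilinear(e1, W_slices[k], e2)
                        + sum(V[k][j] * concat[j] for j in range(len(concat)))
                        + b[k])
              for k in range(len(W_slices))]
    return sum(u[k] * hidden[k] for k in range(len(u)))
```

A higher score indicates that the relation is more likely to hold between the two entity embeddings.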
source: arXiv:1812.07997: Explanatory Graphs for CNNs
[Architecture diagram: initial and target states, encoded both as truth tables and as graph representations, shown for binary logic (single-column tables) and multivariate logic (multi-column tables). A shared layer of stacked Bi-LSTMs and an attention LSTM feeds, via concatenation, two heads built from Flatten / Dense blocks: a value layer, which outputs the value of the selection in [-1, 1], and a policy layer.]
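The truth tables in the diagram can be produced by enumerating all variable assignments; a minimal sketch for the binary-logic case (the target rule below is made up for illustration):

```python
from itertools import product

def truth_table(rule, n_vars):
    # Enumerate all 2^n assignments in lexicographic order and
    # record the rule's output column as 0/1 values.
    return [int(rule(*bits)) for bits in product([0, 1], repeat=n_vars)]

# hypothetical target rule: (A or B) and not C
table = truth_table(lambda a, b, c: (a or b) and not c, 3)
```

The multivariate case would emit one such column per output variable.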
MCTS node statistics:
- N — number of visits, updated as N = N + 1
- W — accumulated value, updated as W = W + v
- Q — mean value, Q = W / N
- P — prior probability of an action
- G — active nodes in the graph
Repeat the following steps M times:
- Generate binary data
- Generate a target rule
- Simulate the MCTS:
  - choose the action that maximises Q + U
  - continue until a leaf node is reached
  - roll out and update W, N, Q, P
- Train the policy layer using policy gradients
- Train the value layer by minimising the MSE between W and the predicted W
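The node statistics and selection step above can be sketched as follows. The exact form of the exploration term U is an assumption here (AlphaZero-style PUCT); the slides only state that the action maximising Q + U is chosen:

```python
import math

class Node:
    def __init__(self, prior):
        self.N = 0      # number of visits
        self.W = 0.0    # accumulated value
        self.P = prior  # prior probability of the action

    @property
    def Q(self):        # mean value Q = W / N
        return self.W / self.N if self.N else 0.0

    def update(self, v):
        # rollout backup: N = N + 1, W = W + v
        self.N += 1
        self.W += v

def select(children, c_puct=1.0):
    # PUCT-style selection (assumed): U grows with the prior P and
    # shrinks as a child accumulates visits.
    total = sum(ch.N for ch in children)
    def u(ch):
        return c_puct * ch.P * math.sqrt(total + 1) / (1 + ch.N)
    return max(children, key=lambda ch: ch.Q + u(ch))
```

With all values equal, selection follows the policy prior; as visits accumulate, the mean value Q dominates.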
[Pipeline: Variational Disentangled Autoencoder feeding a dense classifier; representations visualised with t-SNE and explained with LIME; CGN.]
(Ship_to = AT) & (Material_Tax = 1) & (IncoTerm = 1) & (VAT_ID = AT) --> domestic full tax
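Such a rule can be checked mechanically against a record; a minimal sketch (field names taken from the slide, the sample record itself is made up):

```python
def domestic_full_tax(rec):
    # (Ship_to = AT) & (Material_Tax = 1) & (IncoTerm = 1) & (VAT_ID = AT)
    return (rec["Ship_to"] == "AT" and rec["Material_Tax"] == 1
            and rec["IncoTerm"] == 1 and rec["VAT_ID"] == "AT")

# hypothetical order record satisfying the rule
order = {"Ship_to": "AT", "Material_Tax": 1, "IncoTerm": 1, "VAT_ID": "AT"}
```

An extracted rule set is then just a list of such predicates, each paired with its conclusion.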
[ not(A || C) || (C && not(B && E)), f7(f1(A)), not(f8(f7(f1(A)))), not(A || B) || implication(C, D), ... ]
```haskell
-- funcList: the previous node's function list
-- newFunc:  this node's function
-- inp:      a sample input
(.-) :: [[a] -> [b]] -> ([[b]] -> [a] -> [c]) -> [a] -> [c]
(.-) funcList newFunc inp = newFunc (map ($ inp) funcList) inp
```
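A rough Python analogue of this combinator (names hypothetical) makes the data flow explicit: a node applies its own function to the outputs of its parent nodes together with the raw input:

```python
def compose_node(func_list, new_func):
    # func_list: the previous node's functions, each mapping an input to an output
    # new_func:  this node's function, taking the parents' outputs plus the raw input
    def node(inp):
        return new_func([f(inp) for f in func_list], inp)
    return node

# e.g. a node computing NOT(parent A) OR parent B over a single sample
parents = [lambda s: s["A"], lambda s: s["B"]]
node = compose_node(parents, lambda outs, inp: (not outs[0]) or outs[1])
```

Each composed node is itself a function of the input, so nodes can be chained into arbitrarily deep computation graphs.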
predicted rules → search for best-fitting rules → re-validate rules
- Accuracy alone is not enough to trust an AI
- Accuracy vs. explainability is not a trade-off
Look for solutions in weird places:
- Try functional programming & category theory
Trends in XAI:
- closing the gap between symbolic AI and deep learning
- disentangled representations
- object-oriented representations
- Computational Graph Networks
+ Don't let your robot read legal texts ;)
Fabian Schneider
PoC-Engineer & Researcher
schneider.fabian@pwc.com
Ahmad Haj Mosa
AI Researcher
ahmad.haj.mosa@pwc.com