ExAIS: Executable AI Semantics

Richard Schumi, Jun Sun

Singapore Management University

ICSE'22

  • Semantics
  • Applications
    • Test Case Generation
    • Model Validation

Tensorflow in Prolog

Background

  • Prolog
  • Tensorflow
layer(average).
layer(flatten).
ai_components(X) :- layer(X).

?- ai_components(flatten).   % true
?- ai_components(X).         % X = average ; X = flatten
# Prolog

Prolog

  • declarative: describes the computation rather than the control flow; no side effects
  • logic programming: based on first-order logic
  • a Prolog program = knowledge base (rules & facts) + queries

Term

  • atom: a symbolic constant with no inherent meaning
  • number: float or integer
  • variable: starts with an upper-case letter or underscore
  • compound term
    • list / string
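A minimal query illustrating each kind of term (the names are arbitrary):

?- X = foo,             % atom: starts lower-case, no inherent meaning
	N = 42, F = 3.14,    % numbers: integer and float
	P = point(1, 2),     % compound term: functor with arguments
	L = [a, b, c],       % list: a special compound term
	S = "a string".      % string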
import tensorflow as tf

# example from the Tensorflow quickstart: a small classifier for
# 28x28 images; x_train holds the MNIST training images
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

predictions = model(x_train[:1]).numpy()
# Tensorflow

Tensorflow

  • popular DL framework from Google (OSDI'16)
  • computational graph (model)
  • inference / training
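How such a model looks in Prolog is the topic of the rest of the talk. As a rough, hypothetical sketch (flatten_layer/2 and the treatment of the relu activation are assumed names here, not verified against the ExAIS source), the Keras model above becomes a chain of layer predicates:

?- flatten_layer(Input, F),
	dense_layer(F, W1, B1, D1),        % Dense(128); activation applied separately
	dropout_layer(D1, D2, 0.2, 0.05),  % Dropout(0.2): property check with tolerance
	dense_layer(D2, W2, B2, Output).   % Dense(10)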

Semantics

  • Overview
  • Ex: dense
  • Ex: conv1D
  • Ex: dropout
  • Recap
# Semantics

Semantics

  • 72 layers (nearly all)
  • 3200 lines of Prolog
  • correctness assurance:
    • code review
    • manual testing (examples from the documentation)
    • automated testing (own fuzzing engine)
# Def: Dense

Definition: Dense

For 1-D input and output, a dense layer is an affine transformation.

\mathbf{y} = W \mathbf{x} + \mathbf{b}
  • X: n-d tensor
    • the first dimension indexes the samples
  • W: m-d tensor
  • B: (m-1)-d tensor
  • Y: (n+m-2)-d tensor
y_{i_1 \cdots i_{n - 1} j_2 \cdots j_m} = b_{j_2 \cdots j_m} + \sum_{k} x_{i_1 \cdots i_{n - 1} k} w_{k j_2 \cdots j_m}
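For the common case of a 2-D input batch (n = 2) and a weight matrix (m = 2), this reduces to the familiar matrix form:

y_{i j} = b_{j} + \sum_{k} x_{i k} w_{k j} \qquad\Leftrightarrow\qquad Y = X W + \mathbf{b}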
% 2-d input (a list of samples): compute one output row per sample
dense_layer([I|Is], IWs, Bs, [O|Os]) :-
	depth([I|Is], 2),
	dense_node_comp(I, IWs, Bs, O),
	dense_layer(Is, IWs, Bs, Os).

% higher-dimensional input: recurse into the nested lists
dense_layer([I|Is], IWs, Bs, [O|Os]) :-
	depth([I|Is], D), D > 2,
	dense_layer(I, IWs, Bs, O),
	dense_layer(Is, IWs, Bs, Os).

dense_layer([], _, _, []).

% starting from the biases, accumulate input * weights over all inputs
dense_node_comp([I|Is], [IW|IWs], Res0, Res) :-
	multiply_list_with(IW, I, Res1),
	add_lists(Res0, Res1, Res2),
	dense_node_comp(Is, IWs, Res2, Res).

dense_node_comp([], [], Res, Res).
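The clauses can be queried directly as a sanity check. Assuming the helper predicates multiply_list_with/3 and add_lists/3 behave as their names suggest, one sample with two features and two output units gives:

?- dense_layer([[1.0, 2.0]], [[0.5, 1.0], [1.5, 2.0]], [0.1, 0.2], Os).
% Os = [[3.6, 5.2]]
% i.e. [0.1 + 1.0*0.5 + 2.0*1.5, 0.2 + 1.0*1.0 + 2.0*2.0]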
# Semantics: Dense

Semantics: Dense

y_{i_1 \cdots i_{n - 1} j_2 \cdots j_m} = b_{j_2 \cdots j_m} + \sum_{k} x_{i_1 \cdots i_{n - 1} k} w_{k j_2 \cdots j_m}
\mathbf{y}_{i_1 \cdots i_{n - 1}} = \mathbf{b} + \sum_{k} x_{i_1 \cdots i_{n - 1} k} \mathbf{w}_{k}
# Def: Conv1D

Definition: Conv1D

  • I: (N, L, C) — batch, length, channels
  • W: (K, C, F) — kernel size, channels, filters
  • B: (F,)
  • O: (N, L - K + 1, F) — with stride 1 and no padding
O_{i, j, k} = B_k + \sum_{0 \le j' < K, l} W_{j', l, k} I_{i, j + j', l}

arguments:

  • stride: the window slides by this step (default 1)
  • padding: add 0s to the left and right of the input
% validate shapes, then reuse the generic 1-d sliding-window walker
% with a weighted sum as pooling function
conv1D_layer(Is, KernelSize, IWs, Bs, Strides, Padding, Os) :-
	check_dimensions(Is, 3),
	check_valid_kernel(Is, KernelSize, Padding),
	check_valid_weight_shapes(Is, KernelSize, IWs, Bs),
	pool1D(sum, Is, KernelSize, Strides, Padding, IWs, Bs, false, Os).

% iterate over the samples of the batch
pool1D(Poolfunc, [I|Is], PoolSize, Strides, Padding, IWs, Bs, MultiLayerPool, [O|Os]) :-
	pool1D(Poolfunc, I, 0, 0, PoolSize, Strides, Padding, IWs, Bs, MultiLayerPool, [], O),
	pool1D(Poolfunc, Is, PoolSize, Strides, Padding, IWs, Bs, MultiLayerPool, Os).

pool1D(_, [], _, _, _, _, _, _, []).

% Padding = true: pad the input first, then continue unpadded
pool1D(Poolfunc, [[I|Is0]|Is], 0, 0, PoolSize, Strides, true, IWs, Bs, MultiLayerPool, [], Os) :-
	atomic(I), length([[I|Is0]|Is], L),
	calc_padding(L, PoolSize, Strides, LeftP, RightP),
	padding1D([[I|Is0]|Is], x, LeftP, RightP, Is1),
	pool1D(Poolfunc, Is1, 0, 0, PoolSize, Strides, false, IWs, Bs, MultiLayerPool, [], Os).

% slide the window with the given stride, one output field per position
pool1D(Poolfunc, [[I|Is0]|Is], X, 0, PoolSize, Strides, false, IWs, Bs, false, Os0, Os) :-
	atomic(I), length([[I|Is0]|Is], LX),
	get_pool_res1D(Poolfunc, [[I|Is0]|Is], X, Y, PoolSize, Strides, IWs, Bs, false, O),
	insert_pool_field(Os0, O, true, X, Y, Strides, Os1),
	(X + Strides + PoolSize =< LX -> X1 is X + Strides ; X1 is LX + 1),
	pool1D(Poolfunc, [[I|Is0]|Is], X1, 0, PoolSize, Strides, false, IWs, Bs, false, Os1, Os).

% multi-layer pooling: additionally walk over the second dimension Y
pool1D(Poolfunc, [[I|Is0]|Is], X, Y, PoolSize, Strides, Padding, IWs, Bs, true, Os0, Os) :-
	atomic(I), length([[I|Is0]|Is], LX),
	get_pool_res1D(Poolfunc, [[I|Is0]|Is], X, Y, PoolSize, Strides, IWs, Bs, true, O),
	insert_pool_field(Os0, O, true, X, Y, Strides, Os1),
	(X + Strides + PoolSize =< LX -> X1 is X + Strides, Y1 is Y ; X1 is 0, Y1 is Y + 1),
	pool1D(Poolfunc, [[I|Is0]|Is], X1, Y1, PoolSize, Strides, Padding, IWs, Bs, true, Os1, Os).

% stop when the window has moved past the input
pool1D(_, [[I|Is0]|Is], X, Y, _, _, false, _, _, _, Os, Os) :-
	atomic(I),
	(length([[I|Is0]|Is], LX), X >= LX;
	length([I|Is0], LY), Y >= LY).
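A hand-computed example for the 7-argument variant defined above (one sample of length 3 with one channel, kernel size 2, one filter, stride 1, no padding), assuming the helper predicates behave as named:

?- conv1D_layer([[[1.0], [2.0], [3.0]]], 2, [[[0.5]], [[1.0]]], [0.0], 1, false, Os).
% Os = [[[2.5], [4.0]]]
% i.e. [0.5*1.0 + 1.0*2.0, 0.5*2.0 + 1.0*3.0]; output length L - K + 1 = 2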
# Semantics: Conv1D

Semantics: Conv1D

\mathbf{O}_{i, j} = \mathbf{B} + \sum_{0 \le j' < K, l} \mathbf{W}_{j', l} I_{i, j + j', l}
# Def: Dropout

Definition: Dropout

  • randomly ignore some nodes with a certain probability
  • proven to reduce "overfitting" when training a large neural network on a small dataset
  • only affects training
  • characteristic: non-determinism
% dropout does not compute an output; it checks that a given output has
% the same size as the input and that the rate of newly zeroed values is
% within AcceptedRateDiff of the requested Rate
dropout_layer(Is, Os, Rate, AcceptedRateDiff) :-
	count_atoms(Is, N), count_atoms(Os, NO), NO = N,
	count_occurrences(Is, 0, NZeroOrig),
	count_occurrences(Os, 0, NZeroNew),
	RealRate is (NZeroNew - NZeroOrig) / (N - NZeroOrig),
	Diff is abs(Rate - RealRate),
	(Diff > AcceptedRateDiff
	-> (write("Expected Rate: "), writeln(Rate),
	    write("Actual Rate: "), writeln(RealRate), false)
	; true).
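For example, assuming count_atoms/2 and count_occurrences/3 behave as named, an output that zeroes 2 of 4 non-zero values passes a rate-0.5 check:

?- dropout_layer([[1.0, 2.0, 3.0, 4.0]], [[0, 2.0, 3.0, 0]], 0.5, 0.1).
% succeeds: RealRate = (2 - 0) / (4 - 0) = 0.5, Diff = 0 =< 0.1
% with Rate = 0.9 the same query would print both rates and fail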
# Semantics: Dropout

Semantics: Dropout

  • Note (from Xingyu): this is not an executable semantics in the usual sense; for non-deterministic layers like dropout, the Prolog rule can only check properties of a given output rather than compute one.
# Recap

Recap

  • Identified 79 unique non-abstract, non-wrapper layers.
  • Implemented 72 of them.
  • The remaining 7 layers take inputs that are too generic or flexible, such as functions or other layers.
  • 7 layers are non-deterministic; for these, the semantics only checks whether the documented properties hold.
  • The implementation took about 8 person-months.
  • Prolog's major advantage is that it is declarative and high-level, which enables a compact and straightforward implementation.
    • 3.2k lines of semantics vs. 3M lines of source
  • Only 7 layers had detailed descriptions in the documentation; 44 were explained with examples or a referenced paper; the remaining 21 were under-specified or had very few examples.

Applications

  • Testing
  • Model Validation
  • Evaluation
# Testing

Testing Approach

  • A fuzzing method that utilises feedback from our semantics
  • The test case generator produces test data in the form of input tensors together with a deep learning model (see the sketch below)
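A minimal sketch of the resulting oracle (close_enough/3 is an illustrative helper, not taken from the ExAIS code base): the Prolog semantics computes the expected output, which is then compared element-wise against the Tensorflow result within a small tolerance:

close_enough([], [], _).
close_enough([E|Es], [A|As], Eps) :-
	(atomic(E) -> D is abs(E - A), D =< Eps ; close_enough(E, A, Eps)),
	close_enough(Es, As, Eps).

?- dense_layer([[1.0]], [[2.0]], [0.5], Expected),
	close_enough(Expected, [[2.4999998]], 0.001).   % Tensorflow's float32 result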

Model Validation

% layers are executed step by step; this model is invalid, since
% cropping1D removes 5 elements from each side of a length-2 sequence
A = max_pool1D_layer([[[1.313, 1.02], [1.45, 1.92]]], 1, 1, false, Max),
B = conv1D_layer([[[0.9421, 0.7879], [0.809, 0.855]]], 1, [[[0.572, 0.621], [0.5388, 0.5741]]], [0, 0], 1, false, 1, Con),
C = cropping1D_layer(Con, 5, 5, Cro),
D = concatenate_layer([Max, Cro], 1, Con1),
exec_layers([A, B, C, D], ["Max", "Con", "Cro", "Con1"], Con1, "Con1")
# Model Validation
  • Check whether a model is valid, i.e. executable: detect errors such as wrong shapes of parameters or input data, or invalid arguments.
  • AI libraries already produce error messages for such issues. However, these messages can be difficult to understand, and in rare cases no error is reported at all.
  • By converting a Tensorflow model into our Prolog representation, we can validate it.
  • In the evaluation, invalid models produced by the test generator are used to investigate bugs in Tensorflow.

Evaluation

10,000 test cases generated in 35 hours

100 invalid models generated

Conclusion

  • Tensorflow semantics in Prolog
    • deterministic behavior
  • Testing
    • several issues were found
  • https://github.com/rschumi0/ExAIS