Modeling Transformers in Natural Language Processing Based on Spiking Neural Networks
AmirHossein Ebrahimi
September 2022
1
School of Mathematics, Statistics and Computer Science
Supervisors
Dr. Mohammad Ganjtabesh
Dr. Morteza Mohammadnori
It all starts here.
Image credit: The FlyEM team, Janelia Research Campus, HHMI (CC BY 4.0)
2
3
Transformer inspired
architecture
based on
Spiking Neural Network
4
5500 years of evolution.
5.1
Find the optimal delay for each connection
6.1
Find the optimal weight for each connection (both updates are sketched in code below)
6.2
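To make the two objectives above concrete, here is a minimal, hypothetical sketch (illustrative names and learning rates, not the thesis code): a single connection shifts its conduction delay so the pre-synaptic spike arrives just before the post-synaptic spike, and nudges its weight in an STDP-like direction.

```python
import numpy as np

class Connection:
    """Toy connection with a learnable conduction delay and weight."""

    def __init__(self, delay=5.0, weight=0.5, lr_delay=0.2, lr_weight=0.05):
        self.delay = delay        # conduction delay (ms)
        self.weight = weight      # synaptic efficacy in [0, 1]
        self.lr_delay = lr_delay
        self.lr_weight = lr_weight

    def update(self, t_pre, t_post):
        # Arrival time of the pre-synaptic spike at the post-synaptic neuron.
        arrival = t_pre + self.delay
        lag = t_post - arrival    # > 0: arrives too early; < 0: too late
        # Shift the delay so the arrival moves toward the post-synaptic spike.
        self.delay += self.lr_delay * lag
        # STDP-like weight update: potentiate if pre precedes post, else depress.
        self.weight = float(np.clip(self.weight + self.lr_weight * np.sign(lag),
                                    0.0, 1.0))

conn = Connection()
for _ in range(20):               # repeated pre/post pairing at 0 ms and 9 ms
    conn.update(t_pre=0.0, t_post=9.0)
print(conn.delay, conn.weight)    # delay converges toward 9 ms
```

With repeated pairings, the delay converges to the lag between the pre- and post-synaptic spikes, while the weight saturates; the actual model presumably couples these updates to spike timing rather than to known spike times.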
un·break·able
– Morpheme Detection
7.1
Adapted from Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company (2019)
7.2
8.1
8.2
LIF is the building block of our biological model (a minimal sketch follows below).
Delays make it possible to accumulate firing effects.
Homeostasis makes the stabilization process easier.
STDP optimizes weights via the LTP and LTD mechanisms.
Winner-take-all simplifies the model and its output.
Dopamine plays an important role in reinforcement learning.
8.3
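As a concrete reference for the LIF and homeostasis points above, here is a generic textbook-style sketch (not the thesis implementation; all parameter values are illustrative): a leaky integrate-and-fire neuron whose threshold rises after each spike and decays back, stabilizing the firing rate.

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=10.0, v_rest=-65.0,
                 v_reset=-65.0, theta=-52.0, theta_plus=0.5, tau_theta=100.0):
    """Leaky integrate-and-fire neuron with a homeostatic adaptive threshold."""
    v, th = v_rest, theta
    spike_times = []
    for t, i_t in enumerate(input_current):
        v += dt / tau * (v_rest - v) + i_t       # leaky integration of input
        th += dt / tau_theta * (theta - th)      # threshold decays back home
        if v >= th:                              # membrane crossed threshold
            spike_times.append(t)
            v = v_reset                          # reset the membrane potential
            th += theta_plus                     # homeostasis: raise the bar
    return spike_times

rng = np.random.default_rng(0)
current = rng.uniform(0.0, 2.5, size=200)        # noisy input drive (200 steps)
print(simulate_lif(current))
```

The adaptive threshold is what implements homeostasis here: a neuron that fires too often raises its own bar, so activity across the population stays balanced.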
9.1
p = 0.9
tau = 7
noise = 0.8
1000 words
The longer it takes for the model to converge, the darker it grows.
9.2
This show was an amazing, fresh & innovative idea in the 70s when it first aired...
– Sentiment Analysis
10
11.1
11.2
Population Base Activity regulates the long-term effect on a neuron's decisions.
Stimulus merges and accumulates the currents of other populations.
Trace makes the attention mechanism possible (sketched below).
The Condition Resetter separates the simulation into inference and learning phases.
Population Winner is a strategy for output detection across neuron populations.
The Decision Maker selects the winning neuron population.
11.3
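A minimal sketch of the Trace and Population Winner mechanisms named above, assuming (as a simplification, not the thesis code) an exponentially decaying spike trace per neuron and a winner chosen as the population with the largest summed trace:

```python
import numpy as np

def update_traces(traces, spikes, decay=0.9):
    """Exponentially decaying spike trace per neuron: the short-term memory
    that makes an attention-like weighting over recent activity possible."""
    return traces * decay + spikes

def population_winner(traces, population_slices):
    """Decision-maker-style selection: the population with the largest
    summed trace wins and determines the output."""
    scores = [traces[s].sum() for s in population_slices]
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
pops = [slice(0, 10), slice(10, 20)]                    # two neuron populations
traces = np.zeros(20)
for _ in range(50):                                     # 50 simulation timesteps
    rates = np.r_[np.full(10, 0.3), np.full(10, 0.1)]   # population 0 fires more
    spikes = (rng.random(20) < rates).astype(float)
    traces = update_traces(traces, spikes)
print("winner population:", population_winner(traces, pops))   # likely 0
```

The decaying trace is what lets the decision depend on recent activity rather than a single instant, which is the memory an attention mechanism needs.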
Adapted from the Sentiment Analysis on IMDb benchmark at paperswithcode.com
12
2021
13
Image credit: https://flyvec.vizhub.ai
2022
Image credit: the "Text Classification in Memristor-based Spiking Neural Networks" paper
14
Sentiment analysis model: partially bio-plausible, delay learning, generality, scalability.
Transformer-inspired architecture (NEW): fully bio-plausible, delay learning, generality.
Word embedding model: bio-plausible, delay learning, generality, scalability.
15
Increase the number of neurons in the neural populations by growing the model size layer by layer.
Benchmark the model's performance on additional data domains, such as images (using Gabor filters; see the sketch below) or other NLP tasks.
Achieve optimal convergence by reordering the mapping function when the desired outputs are first seen.
16
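For the image-domain direction above, a Gabor filter bank is the standard bio-inspired front end (a model of V1 simple cells, often used to turn images into orientation-selective input for an SNN). A minimal NumPy sketch with illustrative parameters:

```python
import numpy as np

def gabor_kernel(size=11, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5):
    """2-D Gabor kernel: a Gaussian envelope times a sinusoidal carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam)
    return envelope * carrier

# A bank of four orientations, as a V1-like preprocessing stage.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(bank[0].shape, len(bank))   # (11, 11) 4
```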
For your attention
STACK & SOURCE CODE
17