Modeling Transformers in Natural Language Processing Based on Spiking Neural Networks
AmirHossein Ebrahimi
September 2022
1
School of Mathematics, Statistics and Computer Science
Supervisors
Dr. Mohammad Ganjtabesh
Dr. Morteza Mohammadnori
It all starts here.
Image credit: The FlyEM team, Janelia Research Campus, HHMI (CC BY 4.0)
2
3
Transformer inspired
architecture
based on
Spiking Neural Network
4
5500 years of evolution.
5.1
Find the optimal delay for each connection
6.1
Find the optimal weight for each connection (both updates are sketched in code below)
6.2
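To make the two objectives above concrete, here is a minimal, hypothetical sketch (illustrative names and learning rates, not the thesis code): a single connection shifts its conduction delay so the pre-synaptic spike arrives just before the post-synaptic spike, and nudges its weight in an STDP-like direction.

```python
import numpy as np

class Connection:
    """Toy connection with a learnable conduction delay and weight."""

    def __init__(self, delay=5.0, weight=0.5, lr_delay=0.2, lr_weight=0.05):
        self.delay = delay        # conduction delay (ms)
        self.weight = weight      # synaptic efficacy in [0, 1]
        self.lr_delay = lr_delay
        self.lr_weight = lr_weight

    def update(self, t_pre, t_post):
        # Arrival time of the pre-synaptic spike at the post-synaptic neuron.
        arrival = t_pre + self.delay
        lag = t_post - arrival    # > 0: arrives too early; < 0: too late
        # Shift the delay so the arrival moves toward the post-synaptic spike.
        self.delay += self.lr_delay * lag
        # STDP-like weight update: potentiate if pre precedes post, else depress.
        self.weight = float(np.clip(self.weight + self.lr_weight * np.sign(lag),
                                    0.0, 1.0))

conn = Connection()
for _ in range(20):               # repeated pre/post pairing at 0 ms and 9 ms
    conn.update(t_pre=0.0, t_post=9.0)
print(conn.delay, conn.weight)    # delay converges toward 9 ms
```

With repeated pairings, the delay converges to the lag between the pre- and post-synaptic spikes, while the weight saturates; the actual model presumably couples these updates to spike timing rather than to known spike times.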
un·break·able
– Morpheme Detection
7.1
Adapted from Cognitive Neuroscience: The Biology of the Mind, W. W. Norton & Company (2019)
7.2
8.1
8.2
LIF is the building block of our biological model (a minimal sketch follows below).
Delays make it possible to accumulate firing effects.
Homeostasis makes the stabilization process easier.
STDP optimizes weights via the LTP and LTD mechanisms.
Winner-take-all simplifies the model and its output.
Dopamine plays an important role in reinforcement learning.
8.3
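As a concrete reference for the LIF and homeostasis points above, here is a generic textbook-style sketch (not the thesis implementation; all parameter values are illustrative): a leaky integrate-and-fire neuron whose threshold rises after each spike and decays back, stabilizing the firing rate.

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=10.0, v_rest=-65.0,
                 v_reset=-65.0, theta=-52.0, theta_plus=0.5, tau_theta=100.0):
    """Leaky integrate-and-fire neuron with a homeostatic adaptive threshold."""
    v, th = v_rest, theta
    spike_times = []
    for t, i_t in enumerate(input_current):
        v += dt / tau * (v_rest - v) + i_t       # leaky integration of input
        th += dt / tau_theta * (theta - th)      # threshold decays back home
        if v >= th:                              # membrane crossed threshold
            spike_times.append(t)
            v = v_reset                          # reset the membrane potential
            th += theta_plus                     # homeostasis: raise the bar
    return spike_times

rng = np.random.default_rng(0)
current = rng.uniform(0.0, 2.5, size=200)        # noisy input drive (200 steps)
print(simulate_lif(current))
```

The adaptive threshold is what implements homeostasis here: a neuron that fires too often raises its own bar, so activity across the population stays balanced.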
9.1
p = 0.9
tau = 7
noise = 0.8
1000 words
The longer it takes for the model to converge, the darker it grows.
9.2
This show was an amazing, fresh & innovative idea in the 70s when it first aired...
– Sentiment Analysis
10
11.1
11.2
Population Base Activity regulates the long-term effect on a neuron's decisions.
Stimulus merges and accumulates the currents of other populations.
Trace makes the attention mechanism possible (sketched below).
The Condition Resetter separates the simulation into inference and learning phases.
Population Winner is a strategy for output detection across neuron populations.
The Decision Maker selects the winning neuron population.
11.3
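A minimal sketch of the Trace and Population Winner mechanisms named above, assuming (as a simplification, not the thesis code) an exponentially decaying spike trace per neuron and a winner chosen as the population with the largest summed trace:

```python
import numpy as np

def update_traces(traces, spikes, decay=0.9):
    """Exponentially decaying spike trace per neuron: the short-term memory
    that makes an attention-like weighting over recent activity possible."""
    return traces * decay + spikes

def population_winner(traces, population_slices):
    """Decision-maker-style selection: the population with the largest
    summed trace wins and determines the output."""
    scores = [traces[s].sum() for s in population_slices]
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
pops = [slice(0, 10), slice(10, 20)]                    # two neuron populations
traces = np.zeros(20)
for _ in range(50):                                     # 50 simulation timesteps
    rates = np.r_[np.full(10, 0.3), np.full(10, 0.1)]   # population 0 fires more
    spikes = (rng.random(20) < rates).astype(float)
    traces = update_traces(traces, spikes)
print("winner population:", population_winner(traces, pops))   # likely 0
```

The decaying trace is what lets the decision depend on recent activity rather than a single instant, which is the memory an attention mechanism needs.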
Adapted from the Sentiment Analysis on IMDb benchmark at paperswithcode.com
12
2021
13
Image credit: https://flyvec.vizhub.ai
2022
Image credit: the "Text Classification in Memristor-based Spiking Neural Networks" paper
14
Sentiment analysis model: partially bio-plausible, delay learning, generality, scalability.
Transformer-inspired architecture (NEW): fully bio-plausible, delay learning, generality.
Word embedding model: bio-plausible, delay learning, generality, scalability.
15
Increase the number of neurons in the neural populations by growing the model size layer by layer.
Benchmark the model's performance on additional data domains, such as images (using Gabor filters; see the sketch below) or other NLP tasks.
Achieve optimal convergence by reordering the mapping function when the desired outputs are first seen.
16
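For the image-domain direction above, a Gabor filter bank is the standard bio-inspired front end (a model of V1 simple cells, often used to turn images into orientation-selective input for an SNN). A minimal NumPy sketch with illustrative parameters:

```python
import numpy as np

def gabor_kernel(size=11, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5):
    """2-D Gabor kernel: a Gaussian envelope times a sinusoidal carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam)
    return envelope * carrier

# A bank of four orientations, as a V1-like preprocessing stage.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(bank[0].shape, len(bank))   # (11, 11) 4
```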
For your attention
STACK & SOURCE CODE
17