XOR8  HASH  FUNCTION

B04902073   洪崇凱

B02208025   蔡誌烜

B02705025   賴冠廷

Complexity

md5, sha1

djb2, sdbm

sum8, xor8

B02208025

B02407068

B03701137

B03208028

B04705045

studentID

djb2

xor8

'B02208025'

hash

read

djb2: \text{hash}'=\text{hash}\times33+\text{read}

xor8: \text{hash}'=\text{hash}\oplus\text{read}

read: 'B' = 66, '0' = 48, '2' = 50

djb2:
5381\times33+66=177639
177639\times33+48=5862135
5862135\times33+50=193450505

xor8:
0\oplus66=66
66\oplus48=114
114\oplus50=64
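The two update rules can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the 32-bit mask in `djb2` is an assumption (C implementations usually wrap on overflow), and the function names are only illustrative.

```python
def djb2(data: bytes, bits: int = 32) -> int:
    """djb2: start at 5381, then hash = hash * 33 + byte for each byte."""
    h = 5381
    mask = (1 << bits) - 1  # assumed word size; not stated on the slides
    for byte in data:
        h = (h * 33 + byte) & mask
    return h


def xor8(data: bytes) -> int:
    """xor8: start at 0, then hash = hash XOR byte for each byte."""
    h = 0
    for byte in data:
        h ^= byte
    return h


if __name__ == "__main__":
    sid = b"B02208025"
    # Hash state after the first three characters, then the full xor8 hash in bits.
    print(djb2(sid[:3]), xor8(sid[:3]))
    print(format(xor8(sid), "08b"))
```

xor8 does one XOR per byte and always stays within 8 bits, while djb2 needs a multiply and an add per byte.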
B 0 1 0 0 0 0 1 0
0 0 0 1 1 0 0 0 0
2 0 0 1 1 0 0 1 0
2 0 0 1 1 0 0 1 0
0 0 0 1 1 0 0 0 0
8 0 0 1 1 1 0 0 0
0 0 0 1 1 0 0 0 0
2 0 0 1 1 0 0 1 0
5 0 0 1 1 0 1 0 1
\oplus 0 1 0 0 1 1 0 1

Since XOR is bit-independent, commutative and self-inverse (a \oplus a = 0)...

A=\begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}
H=\text{xor8}(A)\rightarrow f(\;[1\,1\,1\,1\,1\,1\,1\,1\,1]\cdot A\;)
=[0\,1\,0\,0\,1\,1\,0\,1]
f([x,y\cdots])=[x\,\text{mod}\,2,y\,\text{mod}\,2\cdots]
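The matrix view can be checked directly: build one row of bits per character, sum each column, and reduce mod 2. The sketch below uses plain Python lists; `xor8_matrix` is a hypothetical helper name, not something from the slides.

```python
def xor8_matrix(text: str) -> list[int]:
    """Return the xor8 hash of `text` as a list of 8 bits (most significant first)."""
    # Row i of A holds the 8 bits of the i-th character.
    A = [[(ord(ch) >> (7 - j)) & 1 for j in range(8)] for ch in text]
    # [1 1 ... 1] . A is the vector of column sums.
    column_sums = [sum(row[j] for row in A) for j in range(8)]
    # f reduces every entry mod 2.
    return [s % 2 for s in column_sums]


if __name__ == "__main__":
    print(xor8_matrix("B02208025"))
```

Because each output bit depends only on its own column, the 8 bits of the hash can be analysed one at a time.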

XOR over terms of n bits each  ⇒  an operation on n-dimensional vectors  ⇒

\text{hash}\sim U[0,2^n-1]

 Conclusion 1A

k\ \text{buckets},\ n\text{-bit hash}

k\mid2^n\Rightarrow\text{optimal (most even spread)}

k\nmid2^n\Rightarrow2^n\,\text{mod}\,k\ \text{buckets see}\ \frac{\lceil\frac{2^n}{k}\rceil}{\lfloor\frac{2^n}{k}\rfloor}\ \text{times the collisions}

k>2^n\Rightarrow k-2^n\ \text{buckets are never used}
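These bucket counts can be confirmed by brute force. The sketch assumes the bucket index is simply `hash % k` (the slides do not say how a hash is mapped to a bucket) and that every n-bit hash value occurs exactly once; `bucket_loads` and `summarize` are illustrative names.

```python
def bucket_loads(n_bits: int, k: int) -> list[int]:
    """Count how many of the 2**n_bits hash values fall into each of k buckets."""
    loads = [0] * k
    for value in range(2 ** n_bits):
        loads[value % k] += 1  # assumed mapping: bucket = hash mod k
    return loads


def summarize(n_bits: int, k: int) -> dict:
    """Report the heaviest and lightest load and how many buckets stay empty."""
    loads = bucket_loads(n_bits, k)
    heaviest = max(loads)
    return {
        "heavy_buckets": sum(1 for c in loads if c == heaviest),
        "max_load": heaviest,
        "min_load": min(loads),
        "unused": sum(1 for c in loads if c == 0),
    }


if __name__ == "__main__":
    for k in (127, 128, 300):
        print(k, summarize(8, k))
```

For n = 8 and k = 127, two buckets (256 mod 127) hold three values while the rest hold two; with k = 128 every bucket holds exactly two.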

 Optimization 1

k=127\rightarrow128
\text{Data}\sim U[0,255]
\sigma=43.7\rightarrow25.2
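One plausible way to measure such a spread is the standard deviation of bucket occupancy. The slides do not state the exact definition of σ or the sample size, so the sketch below is an assumption and is not expected to reproduce the figures above; `bucket_sigma` is a hypothetical helper.

```python
from statistics import pstdev


def xor8(data: bytes) -> int:
    """xor8: XOR of all bytes."""
    h = 0
    for byte in data:
        h ^= byte
    return h


def bucket_sigma(keys, hash_fn, k: int) -> float:
    """Population standard deviation of the number of keys per bucket."""
    loads = [0] * k
    for key in keys:
        loads[hash_fn(key) % k] += 1  # assumed mapping: bucket = hash mod k
    return pstdev(loads)


if __name__ == "__main__":
    keys = [bytes([v]) for v in range(256)]  # every byte value exactly once
    print(bucket_sigma(keys, xor8, 127), bucket_sigma(keys, xor8, 128))
```

With each byte value present exactly once, k = 128 gives a spread of zero, while k = 127 leaves two buckets overloaded.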

 Optimization 1

k=60\rightarrow64
\text{Data}\sim\text{studentID}
\sigma=1815\rightarrow1782
P_A=\begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,n}\\ a_{3,1} & a_{3,2} & \cdots & a_{3,n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}
P_H=\begin{bmatrix}h_{1} & h_{2} & \cdots & h_{n}\end{bmatrix}

P_A=\begin{bmatrix}x\\y\\z\end{bmatrix}
\rightarrow\Pr(\text{hash}=1)=x(1-y)(1-z)+(1-x)y(1-z)+(1-x)(1-y)z+xyz
=4(x-\frac{1}{2})(y-\frac{1}{2})(z-\frac{1}{2})+\frac{1}{2}
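The closed form can be checked numerically by enumerating all outcomes of three independent bits. This is only a sanity check under the independence assumption the derivation relies on; the function names are illustrative.

```python
from itertools import product


def prob_xor_is_one(probs):
    """Exact Pr(b1 ^ b2 ^ ... = 1) for independent bits with Pr(bi = 1) = probs[i]."""
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        if sum(bits) % 2 == 1:  # XOR is 1 when an odd number of bits are 1
            weight = 1.0
            for bit, p in zip(bits, probs):
                weight *= p if bit else 1 - p
            total += weight
    return total


def closed_form(x, y, z):
    """The three-term formula: 4(x - 1/2)(y - 1/2)(z - 1/2) + 1/2."""
    return 4 * (x - 0.5) * (y - 0.5) * (z - 0.5) + 0.5


if __name__ == "__main__":
    print(prob_xor_is_one([0.2, 0.7, 0.9]), closed_form(0.2, 0.7, 0.9))
```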

 Conclusion 2A

P_A=\begin{bmatrix}x\\y\\z\end{bmatrix}\rightarrow P_H=[h]

h=\frac{1}{2}\Leftrightarrow x = \frac{1}{2}\vee y = \frac{1}{2}\vee z = \frac{1}{2}
\rightarrow h_n = \frac{1}{2}\;\Leftrightarrow\;\exists m\ni a_{m,n} = \frac{1}{2}

When any one term of the XOR follows a Uniform Distribution (independently of the other terms),

the final result also follows a Uniform Distribution.

A\sim U\Rightarrow (A\oplus B\oplus C)\sim U

By Induction
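The claim can also be verified exactly for small word sizes. The sketch below assumes the three variables are independent and take values in 0..7 (the range must be a power of two so that the XOR stays in range); `xor_distribution` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product


def xor_distribution(*dists):
    """Exact distribution of the XOR of independent variables.

    Each dist is a list with dist[v] = Pr(variable == v); all lists share
    the same power-of-two length.
    """
    size = len(dists[0])
    out = [Fraction(0)] * size
    for values in product(range(size), repeat=len(dists)):
        p = Fraction(1)
        result = 0
        for v, dist in zip(values, dists):
            p *= dist[v]
            result ^= v
        out[result] += p
    return out


if __name__ == "__main__":
    A = [Fraction(1, 8)] * 8                                       # uniform
    B = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8), 0, 0, 0, 0]
    C = [0, 0, 0, 0, 0, 0, 0, 1]                                   # constant 7
    print(xor_distribution(A, B, C))
```

However skewed B and C are, the result stays uniform as long as A is.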

 Conclusion 2B

When all terms share the same property, i.e. x=y=z=t

[Plot: h against t]
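Substituting x = y = z = t into the three-term formula gives the curve of h against t. The general form for m independent terms is added here as a standard result for the parity of independent bits; it is not taken from the slides.

```latex
h = 4\left(t-\tfrac{1}{2}\right)^{3}+\tfrac{1}{2}
\qquad\text{and, for } m \text{ independent terms,}\qquad
h_m = \tfrac{1}{2}-\tfrac{1}{2}\,(1-2t)^{m}
```

For 0 < t < 1 the deviation |h_m - 1/2| = (1/2)|1 - 2t|^m shrinks as m grows, so XORing more equally biased terms pushes each hash bit toward 1/2.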

 Optimization 2

'B02    208  025'

'B02' (HEX) \rightarrow 2818
'208' (HEX) \rightarrow 520
'025' (DEC) \rightarrow 25

2818\oplus520\oplus25
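A minimal sketch of the regrouping step, assuming the ID is split into three 3-character fields with the first two read as hexadecimal and the last as decimal. The field boundaries, the bases and the name `regrouped_xor` are all inferred from the slide labels rather than stated in full.

```python
def regrouped_xor(student_id: str) -> int:
    """XOR the three 3-character fields of a 9-character student ID."""
    head, middle, tail = student_id[0:3], student_id[3:6], student_id[6:9]
    # Assumed bases: the first two fields are parsed as hexadecimal,
    # the last one as decimal.
    return int(head, 16) ^ int(middle, 16) ^ int(tail, 10)


if __name__ == "__main__":
    for sid in ("B02208025", "B02407068", "B03701137"):
        print(sid, regrouped_xor(sid))
```

Treating each field as one wide number, rather than XORing single characters, lets the fast-changing trailing digits influence more bits of the result.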

 Optimization 2

\text{Data}\sim\text{studentID}\ \text{(Re-grouped)}
\sigma=1782\rightarrow147

'B02    208  025'

2818\oplus520\oplus25\,\oplus\,x

Candidates for the extra term x:
  1. Random
  2. Hash
  3. Histogram Equalization

Histogram Equalization

From Histogram Equalization:

h(\text{``B02208''})=60

'B02    208  025'

2818\oplus520\oplus25\,\oplus\,60

 Optimization 3

\text{Data}\sim\text{studentID}\ \text{(Histogram Equalization)}
\sigma=147\rightarrow48.6
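The slides do not show how the equalized value of an ID prefix was computed, so the following is only a textbook-style sketch of histogram equalization: each distinct value is mapped to an output level through the empirical CDF of the observed data. `build_equalizer` and its parameters are hypothetical names.

```python
from collections import Counter


def build_equalizer(samples, levels: int) -> dict:
    """Map each distinct sample to a level in 0..levels-1 via its empirical CDF."""
    counts = Counter(samples)
    total = len(samples)
    ordered = sorted(counts)
    cdf_min = counts[ordered[0]]  # CDF value of the smallest sample
    mapping, cumulative = {}, 0
    for value in ordered:
        cumulative += counts[value]
        if total == cdf_min:  # only one distinct value: nothing to spread
            mapping[value] = 0
        else:
            # Rescale the CDF so it spans 0..levels-1.
            mapping[value] = (cumulative - cdf_min) * (levels - 1) // (total - cdf_min)
    return mapping


if __name__ == "__main__":
    prefixes = ["a", "b", "c", "c"]  # toy data, not real student IDs
    print(build_equalizer(prefixes, 4))
```

Frequent values take up a wider share of the output range, so a skewed feature becomes closer to uniform before it is XORed into the hash.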
  • Use a suitable number of buckets: 2^i, i\leq n
  • Find a feature of the data that follows a Uniform Distribution
  • Apply Histogram Equalization to known distributions

xor8: \sigma=1815

djb2: \sigma=40.4

xor8 (optimized): \sigma=48.6

but 6\text{x} Faster! (156 ms  ⇒  25 ms)

Reference

XOR8 hash function

By RedBug312

Presentation slides of Possibility Project