Strings

227 高翊恩

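About the lecturer

  • OJ id: pring, pringDeSu
  • Likes squeezing code onto one line
  • Uses half-open intervals (closed on the left, open on the right)

What we will cover

  • String matching
  • Counting palindromes
  • サ (the suffix array, SA) and LCP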

About string

C++ STL string

  • size_t size() (an unsigned integer)
  • size_t find(char) (returns string::npos when nothing is found)
  • string substr(from, length) (also written $s_{l\dots r}$)
#include <bits/stdc++.h>
using namespace std;

int main() {
    string s = "abcdefg";
    cout << s.size() << endl;       // 7
    cout << s.find('f') << endl;    // 5
    cout << s.find('z') << endl;    // string::npos: 4294967295 with a 32-bit size_t, 18446744073709551615 with a 64-bit one
    cout << s.substr(2, 3) << endl; // "cde"
    return 0;
}

Definitions up front

  • The $i$-th prefix of $s$: s.substr(0, i) (it has length $i$)
  • I like to call it $P_s[i]$
  • The $i$-th suffix of $s$: s.substr(i, s.size() - i) (it starts at $s[i]$)
#include <bits/stdc++.h>
using namespace std;

// The i-th prefix: the first i characters of s.
string Prefix(string s, int i) {
    return s.substr(0, i);
}

// The i-th suffix: everything from s[i] to the end.
string Suffix(string s, int i) {
    return s.substr(i, s.size() - i);
}

int main() {
    string s = "abcdefg";
    cout << Prefix(s, 3) << endl; // "abc"
    cout << Suffix(s, 4) << endl; // "efg"
    cout << Prefix(s, 0) << endl; // ""
    return 0;
}

Definitions up front

  • Palindrome
  • I call it PD for short
#include <bits/stdc++.h>
using namespace std;

bool isPD(string s) {
    int n = s.size();
    // compare s[i] with its mirror image s[n - 1 - i]
    for (int i = 0; i < (n >> 1); i++) if (s[i] != s[n - 1 - i]) return false;
    return true;
}

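Definitions up front

  • Dictionary order
  • lexicographical order
  • Compare starting from the leftmost character
  • (char) '\0' = (int) 0, so a string that runs out first compares smaller
  • bool operator>(string, string)
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;             // assumption: the input starts with the number of strings
    vector<string> v(n);
    for (auto &i : v) cin >> i;
    sort(v.begin(), v.end()); // sorts in lexicographical order
    for (auto &i : v) cout << i << endl;
    return 0;
}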

Trie

Trie

  • Store a bunch of strings in a tree structure
  • That's it

$aa$

$aba$

$bb$

$bba$

Try it:

struct TrieNode {
    int C;
    vector<TrieNode*> child;
    TrieNode() {
        C = 0;
        child = vector<TrieNode*>(26, nullptr);
    }
    void push(string s) {
        TrieNode *now = this;
        for (auto &i : s) {
            if (now -> child[i - 'a'] == nullptr) now -> child[i - 'a'] = new TrieNode();
            now = now -> child[i - 'a'];
        }
        now -> C++;
    }
};

Code

It's actually quite simple

  • Given one long string $s$ and a pile of short strings $t[]$
  • Count the ways to build $s$ by concatenating strings from $t$ (each string may be reused)
  • $|s|\leq5000$
  • $t.size()\leq10^5,\ \sum |t_i|<10^6$

DP solution

  • $dp[i]=$ the number of ways to build $P_s[i]$
  • $dp_{i+1}=\sum\limits_{j\in t}[(s_{i+1-|j|\dots i+1}==j)\times dp_{i+1-|j|}]$
  • The transition takes too long...

A fancier DP solution

  • $dp[i]=$ the number of ways to build $P_s[i]$
  • $dp_{i+1}=\sum\limits_{j\in t}[(s_{i+1-|j|\dots i+1}==j)\times dp_{i+1-|j|}]$
  • This can be optimized: for every substring ending at $i+1$, check whether it is in $t$
  • Advanced trick: do it in reverse
  • Put $t$ into a set

$O(n^2\log n)$
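The set-based DP could be sketched roughly as follows. This is only an illustration: the function name `countWaysWithSet` and the modulus `MOD` are assumptions that do not come from the slides, and the $O(n^2\log n)$ bound ignores the cost of building and comparing each substring.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>
using namespace std;

const long long MOD = 1000000007; // assumption: the answer is wanted modulo 1e9+7

// dp[i] = number of ways to build the prefix s[0, i) out of the words in t.
long long countWaysWithSet(const string &s, const vector<string> &t) {
    set<string> words(t.begin(), t.end()); // duplicates in t collapse into one word
    int n = s.size();
    vector<long long> dp(n + 1, 0);
    dp[0] = 1; // the empty prefix can be built in exactly one way
    for (int r = 1; r <= n; r++) {
        // look at every substring s[l, r) that ends at position r
        for (int l = 0; l < r; l++) {
            if (dp[l] != 0 && words.count(s.substr(l, r - l))) {
                dp[r] = (dp[r] + dp[l]) % MOD;
            }
        }
    }
    return dp[n];
}
```

For instance, with $s=aab$ and $t=\{a, aa, b, ab\}$ the three decompositions are a|a|b, aa|b and a|ab.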

Let's squeeze out the $\log$

Trade space for time

[Animation: for each start position, walk the Trie (node labels $a$, $a$, $b$, $b$, $c$) along $s$ and fill in the dp table]

dp[]  0 1 2 3 4 5
val   1 0 1 0 2

So by now you should know what a Trie is good for
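A minimal sketch of the same DP with the set replaced by a Trie, so that all substrings starting at one position are checked in a single walk. The names `FlatTrie` and `countWaysWithTrie`, the array-based node layout and the modulus are assumptions made for this illustration. The input is assumed to contain only the letters a to z.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>
using namespace std;

const long long MOD = 1000000007; // assumption: the answer is wanted modulo 1e9+7

struct FlatTrie {
    vector<array<int, 26>> next; // next[v][c] = child of node v on letter c, or -1
    vector<int> cnt;             // cnt[v] = number of words that end at node v
    FlatTrie() { newNode(); }    // node 0 is the root
    int newNode() {
        array<int, 26> row;
        row.fill(-1);
        next.push_back(row);
        cnt.push_back(0);
        return (int)next.size() - 1;
    }
    void push(const string &w) {
        int v = 0;
        for (char ch : w) {
            int c = ch - 'a';
            if (next[v][c] == -1) {
                int u = newNode(); // create first, then store (push_back may reallocate)
                next[v][c] = u;
            }
            v = next[v][c];
        }
        cnt[v]++;
    }
};

// dp[i] = number of ways to build s[0, i); each finished dp[l] is pushed forward.
long long countWaysWithTrie(const string &s, const vector<string> &t) {
    FlatTrie trie;
    for (const string &w : t) trie.push(w);
    int n = s.size();
    vector<long long> dp(n + 1, 0);
    dp[0] = 1;
    for (int l = 0; l < n; l++) {
        if (dp[l] == 0) continue; // contributes nothing
        int v = 0;
        for (int r = l; r < n; r++) {
            v = trie.next[v][s[r] - 'a'];
            if (v == -1) break; // no word continues with this letter
            dp[r + 1] = (dp[r + 1] + dp[l] * trie.cnt[v]) % MOD;
        }
    }
    return dp[n];
}
```

Each start position costs at most $O(|s|)$ Trie steps, so the whole DP is $O(|s|^2)$ after the Trie is built. Unlike a set, `cnt` counts a word that appears twice in $t$ twice.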

Hash

Treat a string as a number

Assembly

Base 16

$\Sigma=\{0\dots9,A\dots F\}$

Conversion:

$\mathbf{8E5C}_{16}={36444}_{10}$

Base 26

$\Sigma=\{a\dots z\}$

Conversion:

$\mathbf{pring}_{26}={7159184}_{10}$

Problem 1

$a\rightarrow0$

$\mathbf{pring}_{26}=\mathbf{aaaaapring}_{26}$

Base 27

$\Sigma=\{a\dots z\}$

$\{a,b,\dots,z\}\rightarrow\{1,2,\dots,26\}$

Conversion:

$\mathbf{pring}_{27}={8864296}_{10}$
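The two conversions can be checked with a few lines of code. The helper name `toNumber` and its `offset` parameter are assumptions introduced for this sketch. No modulus is applied yet, so it only works for short strings.

```cpp
#include <cassert>
#include <string>
using namespace std;

// Reads s as a number in the given base, most significant letter first.
// offset = 0 maps a..z to 0..25 (base 26); offset = 1 maps a..z to 1..26 (base 27).
long long toNumber(const string &s, long long base, int offset) {
    long long value = 0;
    for (char c : s) value = value * base + (c - 'a' + offset);
    return value;
}
```

With `offset = 0` a leading run of a's behaves like leading zeros, which is exactly Problem 1. With `offset = 1` the collision disappears.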

Problem 2

The number gets far too big!!

So give it a $\%$

Base 27

$\Sigma=\{a\dots z\}$

$\{a,b,\dots,z\}\rightarrow\{1,2,\dots,26\}$

At the end, take $\%(1e9+7)$

Conversion:

$\mathbf{pring}_{27}={8864296}_{10}$

There are two kinds of Hash

Forward:

$Hash(s)=(s_0p^0+s_1p^1+s_2p^2+\dots+s_{|s|-1}p^{|s|-1})\%M$

Reversed:

$hsaH(s)=(s_0p^{|s|-1}+s_1p^{|s|-2}+s_2p^{|s|-3}+\dots+s_{|s|-1}p^0)\%M$

Hash: the key points

  • When two strings hash to the same number, treat the two strings as equal
  • Choose the numbers well, or you will get collisions
  • If collisions happen, pick several primes and bases and hash several times
  • Chinese remainder theorem

String matching with Hash

$s=abbabbab$

$t=abbab$

$Hash(t)=1084078$

[from, to)   substring of s   Hash()
[0, 5)       abbab            1084078
[1, 6)       bbabb            1103033
[2, 7)       babba            572294
[3, 8)       abbab            1084078

But computing each Hash directly is too slow

Can we use $Hash(s.substr(0, i))$ to obtain $Hash(s.substr(1, i))$ in only $O(1)$?

Rolling update

$s_0s_1s_2s_3s_4\longrightarrow h$

$s_1s_2s_3s_4\longrightarrow h-s_0p^{|t|-1}$

$s_1s_2s_3s_4\_\longrightarrow (h-s_0p^{|t|-1})\times p$

$s_1s_2s_3s_4s_5\longrightarrow (h-s_0p^{|t|-1})\times p+s_5$

Remember the modular arithmetic

This uses $hsaH$ (the slides take $p=27$)
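The rolling update can be turned into a small matcher. This is a sketch under assumptions: the function name `rollingHashMatch` and the constants $p=27$ and $M=10^9+7$ are picked for illustration, and the input is assumed to be lowercase letters. Where the slides simply trust equal hashes, this version also compares the characters, which makes it safe against collisions at some extra cost.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

// Returns every start index at which t occurs in s, using the hsaH rolling hash.
vector<int> rollingHashMatch(const string &s, const string &t) {
    const long long M = 1000000007, p = 27; // assumed modulus and base
    int n = s.size(), m = t.size();
    vector<int> ans;
    if (m == 0 || m > n) return ans;
    auto val = [](char c) { return (long long)(c - 'a' + 1); }; // a..z -> 1..26
    long long top = 1; // p^(m-1) % M, the weight of the leading character
    for (int i = 1; i < m; i++) top = top * p % M;
    long long ht = 0, h = 0;
    for (int i = 0; i < m; i++) {
        ht = (ht * p + val(t[i])) % M; // hash of t
        h = (h * p + val(s[i])) % M;   // hash of the first window of s
    }
    for (int i = 0;; i++) {
        // h is now the hash of s[i, i + m)
        if (h == ht && s.compare(i, m, t) == 0) ans.push_back(i);
        if (i + m >= n) break;
        h = (h - val(s[i]) * top % M + M) % M; // drop the leading character
        h = (h * p + val(s[i + m])) % M;       // shift left and append the next one
    }
    return ans;
}
```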

Prefix sums

Keep an array $Pref$ that records $Hash(P_s[i])$

Then $(\sum\limits_{i=l}^{r-1}s_ip^i)\%M$ can be obtained in $O(1)$

But what we want is $(\sum\limits_{i=l}^{r-1}s_ip^{i-l})\%M$

Modular arithmetic

int mod;

int PLUS(int x, int y) {
    return (x + y) % mod;
}

int MINUS(int x, int y) {
    return (x - (y % mod) + mod) % mod;
}

int TIMES(int x, int y) {
    return (int)(1LL * x * y % mod); // widen to 64 bits so the product cannot overflow
}

Modular inverse

  • $P\div Q=P\times Q^{-1}$

$Q\times Q^{-1}=1$

$\Rightarrow Q\times Q^{-1}\equiv1\pmod M$

So we just need to find $Q^{-1}$

How?

Fermat's little theorem (for a prime $p$)

$a^p\equiv a\pmod p$

$0<a<p\Rightarrow a^{p-1}\equiv1\pmod p$

$\Rightarrow a^{p-2}\equiv a^{-1}\pmod p$

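// Computes a^x % mod by scanning the bits of x from the highest one down.
int modPow(int a, int x) {
    int ans = 1;
    for (int i = 1 << 30; i > 0; i >>= 1) {
        ans = 1LL * ans * ans % mod;            // 64-bit product to avoid overflow
        if (i & x) ans = 1LL * ans * a % mod;
    }
    return ans;
}

// x / y modulo a prime mod, using y^(mod - 2) as the inverse of y.
int DIVIDE(int x, int y) {
    return 1LL * x * modPow(y, mod - 2) % mod;
}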

Back to Hash

int mod; // must be set to a prime (for example 1000000007) before use

int modPow(int a, int x) {
    int ans = 1;
    for (int i = 1 << 30; i > 0; i >>= 1) {
        ans = 1LL * ans * ans % mod;
        if (i & x) ans = 1LL * ans * a % mod;
    }
    return ans;
}

struct Hash {
    int p;
    vector<int> pref; // pref[i] = Hash(P_s[i])
    Hash(string s, int _p) {
        p = _p;
        int n = s.size();
        pref.resize(n + 1);
        int mul = 1; // p^i % mod
        pref[0] = 0;
        for (int i = 0; i < n; i++) {
            pref[i + 1] = (pref[i] + 1LL * (s[i] - 'a' + 1) * mul) % mod;
            mul = 1LL * mul * p % mod;
        }
    }
    // Hash of s.substr(from, len)
    int query(int from, int len) {
        int ans = (pref[from + len] - pref[from] + mod) % mod;
        // divide by p^from: multiply by the inverse of p raised to the power from
        return 1LL * ans * modPow(modPow(p, mod - 2), from) % mod;
    }
};

Problem set

KMP

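Common prefix-suffix

Prefix which is also Suffix

I like to call it CPS (Common Prefix and Suffix)

  • A substring that is both a prefix and a suffix
  • List all substrings that qualify, from longest to shortest
  • Define $s.CPS(i)$ as the $i$-th entry of that list
String      CPS(0)      CPS(1)   CPS(2)   CPS(3)
"abcabca"   "abcabca"   "abca"   "a"      ""
"abcde"     "abcde"     ""       ---      ---
"zzz"       "zzz"       "zz"     "z"      ""

  • They can also be stored as numbers
  • $k$ means that the $k$-th prefix equals the suffix of length $k$
String      CPS(0)   CPS(1)   CPS(2)   CPS(3)
"abcabca"   7        4        1        0
"abcde"     5        0        -1       -1
"zzz"       3        2        1        0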

Store it for every prefix

$s=abbabbab$

Prefix of s   CPS(0)       CPS(1)    CPS(2)   CPS(3)
""            ""           ---       ---      ---
"a"           "a"          ""        ---      ---
"ab"          "ab"         ""        ---      ---
"abb"         "abb"        ""        ---      ---
"abba"        "abba"       "a"       ""       ---
"abbab"       "abbab"      "ab"      ""       ---
"abbabb"      "abbabb"     "abb"     ""       ---
"abbabba"     "abbabba"    "abba"    "a"      ""
"abbabbab"    "abbabbab"   "abbab"   "ab"     ""

The same table as numbers:

$P_s[i]$   CPS(0)   CPS(1)   CPS(2)   CPS(3)
0          0        -1       -1       -1
1          1        0        -1       -1
2          2        0        -1       -1
3          3        0        -1       -1
4          4        1        0        -1
5          5        2        0        -1
6          6        3        0        -1
7          7        4        1        0
8          8        5        2        0

Slimming the table

Slimming, step one

The CPS(0) column is useless: its entry for $P_s[i]$ is always $i$ itself

Slimming, step two

$P_s[i]$   CPS(1)   CPS(2)   CPS(3)
0          -1       -1       -1
1          0        -1       -1
2          0        -1       -1
3          0        -1       -1
4          1        0        -1
5          2        0        -1
6          3        0        -1
7          4        1        0
8          5        2        0

Observe the property

Proof

$s.CPS(2)=s.CPS(1).CPS(1)$

[Animation: $s=abbabba$ with its $CPS(1)$, $abba$, aligned as both a prefix and a suffix of $s$; then $a$, the $CPS(1)$ of $abba$, aligned as a prefix and a suffix of $abba$ and therefore of $s$]

99% Completed!

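We can deduce:

$t\in s.CPS\ \Rightarrow\ t.CPS\subset s.CPS$

The ordering issue

  • By definition, $s.CPS(1)$ is "the longest $CPS$ of $s$ other than $s$ itself"
  • Let $t=s.CPS(1)$; then $t.CPS(1)$ is "the longest $CPS$ of $t$ other than $t$ itself"
  • By what we just derived, $t.CPS(1)\in s.CPS$. Every $CPS$ of $s$ shorter than $t$ is also a $CPS$ of $t$, so $t.CPS(1)$ is "the longest $CPS$ of $s$ other than $s$ and $t$"
  • That is exactly $s.CPS(2)$

Thinking about it a little harder, we find

$s.CPS(a+b)=s.CPS(a).CPS(b)$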

So what does this have to do with slimming?

$P_s[i]$   CPS(1)   CPS(2)   CPS(3)
0          -1       -1       -1
1          0        -1       -1
2          0        -1       -1
3          0        -1       -1
4          1        0        -1
5          2        0        -1
6          3        0        -1
7          4        1        0
8          5        2        0

Let $t$ be one of the elements of $P_s$

Then $t.CPS\subseteq P_s$

If we record $CPS(1)$ for every prefix of $s$

$t.CPS(2)=t.CPS(1).CPS(1)$

we easily get $t.CPS(2)$

$t.CPS(3)=t.CPS(2).CPS(1)$

we easily get $t.CPS(3)$

...

So storing only column 1 is enough

$P_s[i]$   CPS(1)
0          -1
1          0
2          0
3          0
4          1
5          2
6          3
7          4
8          5

When we want every common prefix-suffix of the 7th prefix of $s$

$P_s[7].CPS(1)$

$P_s[7].CPS(2)$

$P_s[7].CPS(3)$

Finally…

  • Squeeze it into a one-dimensional array
  • With this array we can recover any $CPS$ of any prefix
  • Define $\pi_s[i]$: the "second-longest common prefix-suffix" of $P_s[i]$, i.e. $CPS(1)$

$s=abbabbab$

$\pi_s=\{-1,0,0,0,1,2,3,4,5\}$
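To make the second bullet concrete, here is a small sketch that recovers every $CPS$ length of one prefix by following $\pi$ repeatedly. The function name `allCPS` is an assumption made for this illustration. It takes a $\pi$ array in the convention above (one entry per prefix length, with $\pi[0]=-1$).

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Returns the lengths CPS(0), CPS(1), CPS(2), ... of the prefix of length i.
vector<int> allCPS(const vector<int> &pi, int i) {
    vector<int> lengths;
    // CPS(k + 1) is the CPS(1) of CPS(k), so keep applying pi until it hits -1
    for (int k = i; k >= 0; k = pi[k]) lengths.push_back(k);
    return lengths;
}
```

For $s=abbabbab$ and $i=7$ this walks $7\rightarrow4\rightarrow1\rightarrow0$, matching the row for $P_s[7]$ in the full table.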

Pause for a moment

Computing the $\pi$ array

vector<int> Pi(string s) {
    int n = s.size();
    vector<int> result(n + 1);
    result[0] = -1;
    for (int i = 1; i <= n; i++) {          // result[i] describes the prefix of length i
        for (int j = i - 1; j >= 0; j--) {  // try the longest candidate first
            if (s.substr(0, j) == s.substr(i - j, j)) {
                result[i] = j;
                break;
            }
        }
    }
    return result;
}

Complexity: $O(n^3)$

Super slow

Let's DP

Base cases:

$\pi[0]=-1,\pi[1]=0$

Observe the property

$P_s[\pi_s[i+1]-1]\in P_s[i].CPS$

[Figure: $P_s[i+1]$ drawn as a row of cells with its common prefix and suffix marked]

Suppose we want to find $\pi_s[i+1]$

So now we have a way to find it

$\pi_s[i+1]$

$j=1,2,\dots$

If the "next character" after $P_s[i].CPS(j)$ $==s[i]$ (the character that $P_s[i+1]$ appends to $P_s[i]$)

$\pi_s[i+1]=P_s[i].CPS(j)+1$

[Figure: $P_s[i+1]$ with the two characters being compared marked "?"]

$s=abbacabbab\dots$

$\pi_s=\{-1, 0, 0, 0, 1, 0, 1, 2, 3, 4, ?\}$

$abbacabba\ b$

$P_s[9].CPS(1)=\pi_s[9]=4$ (the next character is $s[4]=c\neq b$, so it fails)

$P_s[9].CPS(2)=\pi_s[\pi_s[9]]=1$ (the next character is $s[1]=b$, so it works)

$\pi_s[9+1]=1+1=2$

Code

Typed live!
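Since the code was typed live, it is not in the slides. The following is one possible version of the DP described above, a sketch and not necessarily what the lecturer wrote. It keeps the same convention: $n+1$ entries, $\pi[0]=-1$.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

// pi[i] = length of CPS(1) of the prefix of length i, with pi[0] = -1.
vector<int> Pi(const string &s) {
    int n = s.size();
    vector<int> pi(n + 1);
    pi[0] = -1;
    for (int i = 0; i < n; i++) {
        // j runs over CPS(1), CPS(2), ... of the prefix of length i
        int j = pi[i];
        // s[j] is the "next character" after the candidate, s[i] is the new character
        while (j >= 0 && s[j] != s[i]) j = pi[j];
        pi[i + 1] = j + 1; // j == -1 means no candidate could be extended
    }
    return pi;
}
```

Each iteration raises `j` by at most one and the `while` loop only lowers it, so the total work is $O(n)$.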

KMP applications

String matching

$s=abbaabbab$

$t=abbab$

[Animation: $t$ sliding under $s$, one cell at a time]

Shift $t$ to the right

$t$ shifts right by one cell each time

Can we foresee the two attempts that are doomed to die before getting anywhere?

[Figure: $s$ and $t$ drawn as rows of cells, with $t[j]$ at the mismatch; below, $t$ shifted right by $x$ so that $t[j-x]$ lines up with the same cell]

$t$ shifted right by $x$

Show that when $s[i]\neq t[j]$:

  1. for certain shifts $x\in S$, $t$ is guaranteed to match perfectly up to $t[j-x]$
  2. for every other $x$, $t$ is guaranteed to break before it reaches $t[j-x]$

Then this operation may shift $t$ right by the smallest $x$ in $S$

OK, it really is just $CPS$

Suppose $j-x\notin P_t[j].CPS$

and yet the match succeeds all the way to the blue cell

Then the first $j-x$ characters of $t$ equal the last $j-x$ characters of $P_t[j]$, that is

$j-x\in P_t[j].CPS$

$\Rightarrow$ only the shifts by such an $x$ can survive

So when a mismatch is found

take the $CPS(1)\ (\pi)$ of the part matched so far

and continue from the position that just failed

[Animation: $t=abbab$ jumping along $s=abbaabbab$ using $\pi$]

$s=abbabbab$

Append a \$ where appropriate

$t=abbab\$$

Code

vector<int> StringMatching(string s, string t) {
    int n = s.size(), m = t.size();
    vector<int> pi = Pi(t), ans;
    s.push_back('^');
    t.push_back('$');
    for (int i = 0, j = 0; i < n + 1; i++, j++) {
        if (j == m) ans.push_back(i - m);
        while (j >= 0 && s[i] != t[j]) j = pi[j];
    }
    return ans;
}

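Another idea

Just concatenate $t,\$,s$

then compute $\pi$

then count the positions after the $\$$ whose $\pi$ value is $|t|$

vector<int> stringMatching(string s, string t) {
    int m = t.size();
    vector<int> v;
    vector<int> pi = Pi(t + "$" + s);
    for (int i = 0; i < (int)pi.size(); i++) {
        // the prefix of length i ends at index i - m - 2 of s, so the match starts m - 1 earlier
        if (pi[i] == m) v.push_back(i - 2 * m - 1);
    }
    return v;
}

Why didn't we start with this one$\dots$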

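Prefix Automaton

$s=abbaa\dots$

$t=abbab$

Optimize, then optimize again!

i   'a'   'b'
0   1     0
1   1     2
2   1     3
3   4     0
4   1     5
5   1     3

$t=abbab$

Build a $PA$ table for $t$:

  • We are about to match the $i$-th character of $t$ ($i$ characters are already matched)
  • The next character of $s$ is $j$
  • After comparing these two characters, $t$ may be shifted several times in a row
  • The character of $t$ lined up with the following character of $s$ will be $t[PA[i][j]]$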

[Animation: feeding $s=abbaabbab$ through the $PA$ table of $t=abbab$, one character per frame]

Code

vector<vector<int>> PrefixAutomaton(string s) {
    int n = s.size();
    vector<vector<int>> dp(n + 1, vector<int>(26, 0));
    vector<int> pi = Pi(s);
    s += '$';
    dp[0][s[0] - 'a'] = 1;
    for (int i = 1; i <= n; i++) for (int j = i; j >= 0; j = pi[j]) if (j < n) dp[i][s[j] - 'a'] = max(dp[i][s[j] - 'a'], j + 1);
    return dp;
}

vector<int> StringMatching(string s, string t) {
    int n = s.size(), m = t.size();
    auto PA = PrefixAutomaton(t);
    vector<int> ans;
    for (int i = 0, j = 0; i < n; i++) {
        j = PA[j][s[i] - 'a'];
        if (j == m) ans.push_back(i - m + 1);
    }
    return ans;
}

Building the table: $O(n)$ (treating the alphabet size as a constant)

Matching: $O(1)$ per character
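One common way to reach that bound is to fill each row from the row of its $CPS(1)$ instead of walking the whole $CPS$ chain. The sketch below is that variant, not the code from the slides. The name `buildPA` is an assumption, and the alphabet is assumed to be a to z.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>
using namespace std;

// pa[i][c] = number of matched characters after reading letter c with i characters matched.
vector<array<int, 26>> buildPA(const string &t) {
    int m = t.size();
    vector<int> pi(m + 1); // pi[i] = CPS(1) of the prefix of length i, pi[0] = -1
    pi[0] = -1;
    for (int i = 0; i < m; i++) {
        int j = pi[i];
        while (j >= 0 && t[j] != t[i]) j = pi[j];
        pi[i + 1] = j + 1;
    }
    vector<array<int, 26>> pa(m + 1);
    for (int i = 0; i <= m; i++) {
        for (int c = 0; c < 26; c++) {
            if (i < m && t[i] - 'a' == c) pa[i][c] = i + 1; // the letter matches: advance
            else if (i == 0) pa[i][c] = 0;                  // nothing matched, nothing to reuse
            else pa[i][c] = pa[pi[i]][c];                   // same answer as from CPS(1)
        }
    }
    return pa;
}
```

Row `pi[i]` is always finished before row `i` because `pi[i] < i`, so every entry costs $O(1)$ and the table costs $O(26\cdot|t|)$.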

But can't string matching be done directly with $\pi$?

Why bother learning PA?

Compressed String

Given three strings $s_0,s_1,t$ and an integer $n$

$s_{i+1}=5\times s_i+6\times s_{i-1},i\in\mathbb{Z}^+$

Count how many times $t$ can be found in $s_n$

Obviously

$|s_n|=\frac{1}{7}\{[6\times(-1)^n+6^n]|s_0|+[6^n-(-1)^n]|s_1|\}$

Using $\pi$ directly blows up

Using $PA$ directly blows up too

So let's compress the matching

[Animation: a "bab" column is added to the $PA$ table and filled in one state at a time; (+1) marks a full match of $t$ along the way]

i   'a'   'b'   "bab"
0   1     0     2
1   1     2     2
2   1     3     5 (+1)
3   4     0     2
4   1     5     2 (+1)
5   1     3     5 (+1)

$t=abbab$

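What PA is good for

  • Advanced KMP
  • It treats a whole string as if it were a single character
  • It matches an entire string in one step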
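The "string as a character" idea can be sketched as a per-state jump table that records where each state lands and how many full matches happen on the way. Two such tables compose in $O(|t|)$, which is what makes $s_{i+1}=5\times s_i+6\times s_{i-1}$ tractable. The names `Jump`, `feed`, `concat` and `buildPA` are assumptions for this illustration. A real solution would also reduce the counts by whatever modulus the problem asks for.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>
using namespace std;

using Table = vector<array<int, 26>>;

// The usual PA table of t (letters a..z): pa[i][c] = next state from state i on letter c.
Table buildPA(const string &t) {
    int m = t.size();
    vector<int> pi(m + 1);
    pi[0] = -1;
    for (int i = 0; i < m; i++) {
        int j = pi[i];
        while (j >= 0 && t[j] != t[i]) j = pi[j];
        pi[i + 1] = j + 1;
    }
    Table pa(m + 1);
    for (int i = 0; i <= m; i++)
        for (int c = 0; c < 26; c++)
            pa[i][c] = (i < m && t[i] - 'a' == c) ? i + 1 : (i == 0 ? 0 : pa[pi[i]][c]);
    return pa;
}

struct Jump {
    vector<int> to;        // to[q]  = state reached after reading the whole string from q
    vector<long long> hit; // hit[q] = how many times the full-match state was entered
};

// Builds the jump of a concrete string w by running it from every start state.
Jump feed(const Table &pa, const string &w) {
    int states = pa.size(), m = states - 1;
    Jump j{vector<int>(states), vector<long long>(states, 0)};
    for (int q = 0; q < states; q++) {
        int cur = q;
        for (char ch : w) {
            cur = pa[cur][ch - 'a'];
            if (cur == m) j.hit[q]++;
        }
        j.to[q] = cur;
    }
    return j;
}

// Jump of the concatenation x + y, using only the two jumps.
Jump concat(const Jump &x, const Jump &y) {
    int states = x.to.size();
    Jump j{vector<int>(states), vector<long long>(states)};
    for (int q = 0; q < states; q++) {
        j.to[q] = y.to[x.to[q]];
        j.hit[q] = x.hit[q] + y.hit[x.to[q]];
    }
    return j;
}
```

Under these assumptions the jump of $s_{i+1}$ is eleven `concat` calls away from the jumps of $s_i$ and $s_{i-1}$, and the answer is `hit[0]` of the jump of $s_n$.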

Problem set

Aho-Corasick Automaton

Automaton

An automaton $=\langle Q,\Sigma,\delta,q_0,F\rangle$

$=\langle$ states, alphabet, transition function, initial state, accepting states $\rangle$

$Q=$ the circles

$\Sigma=\{a,b\}$

$\delta=$ the arrows

$q_0=$ the orange circle

$F=$ the green circles

Try walking it: $abbaabbab$

The real PA

$t=abbab$

Remember the Trie?

PA on a Trie

$\delta(q,c)=$

among the suffixes of (the string that $q$ represents)$+c$,

the longest one that appears in the tree

Try walking it: $abbaabbab$

But this takes too long

We need a different approach

The in-tree suffixes of $q$

  • The suffixes that appear in the tree
  • Ordered from longest to shortest
  • Suffix in Trie
  • I like to call it SIT

SIT properties

  • $q_i.SIT(0)=q_i$
  • $q_i.SIT(2)=q_i.SIT(1).SIT(1)$
  • So computing $SIT(1)$ for every node is enough to find $SIT(n)$ for every node
  • We call it the Suffix Link

Building the Suffix Link

Base cases:

$q[0].SIT(1)=nullptr$

$q[i].SIT(1)=q[0]\quad\forall q[i]\in depth(1)$

[Figure: the string of $q_i$ drawn as a row of cells, with blue, red and green spans marked]

Blue $\in Trie$

Suppose red is $q_i.SIT(1)$

Then red $\in Trie$

Then green $\in Trie$

Green $\in$ the $SIT$ of blue

So now we have a way to find it

$q[i].SIT(1)$

$j=1,2,\dots$

If $q[i].father.SIT(j)$ can move along $q[i].lastChar$

$q[i].SIT(1)=q[i].father.SIT(j).next[q[i].lastChar]$

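Just kidding

[Animation: trying candidate suffix links on the example Trie]

If it could move along $a$,

that $a$ would be the $SIT$

but it clearly can't

Yay

Example

Build its $SL$

[Animation: the suffix links of the example Trie filled in one by one]

Yay

Notes

  • An $SL$ always points upward
  • Every $SL$ of all earlier generations must be finished before this node can be done
  • $BFS$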

So how do we transition?

$\delta(q_i,c)$

  • $now=q_i$
  • If $now$ can move along $c$, then $\delta(q_i,c)=now.next[c]$
  • Otherwise $now=now.SL$
  • Repeat until $now$ falls off the $root$ (in that case the result is the $root$)
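The loop above can also be paid for once, up front, by filling in every missing edge during the BFS, so that $\delta(q,c)$ becomes a single table lookup. The sketch below is that swapped-in variant, using array indices instead of pointers. The name `FlatAC`, its members, and the `depth` bookkeeping (kept only to make the walk observable) are assumptions for this illustration. Letters are assumed to be a to z.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>
using namespace std;

struct FlatAC {
    vector<array<int, 26>> delta; // Trie edges; after build() the complete transition table
    vector<int> sl;               // suffix link of each node
    vector<int> depth;            // length of the string each node represents
    FlatAC() { newNode(0); }      // node 0 is the root
    int newNode(int d) {
        array<int, 26> row;
        row.fill(-1);
        delta.push_back(row);
        sl.push_back(0);
        depth.push_back(d);
        return (int)delta.size() - 1;
    }
    void push(const string &w) {
        int v = 0;
        for (char ch : w) {
            int c = ch - 'a';
            if (delta[v][c] == -1) {
                int u = newNode(depth[v] + 1);
                delta[v][c] = u;
            }
            v = delta[v][c];
        }
    }
    void build() {
        queue<int> q;
        for (int c = 0; c < 26; c++) {
            int v = delta[0][c];
            if (v == -1) delta[0][c] = 0; // falling off the root lands on the root
            else { sl[v] = 0; q.push(v); }
        }
        while (!q.empty()) { // BFS: shallower nodes are complete before deeper ones
            int u = q.front();
            q.pop();
            for (int c = 0; c < 26; c++) {
                int v = delta[u][c];
                if (v == -1) delta[u][c] = delta[sl[u]][c]; // borrow the answer of SL
                else { sl[v] = delta[sl[u]][c]; q.push(v); }
            }
        }
    }
    // Depth of the current node after each character of text.
    vector<int> walk(const string &text) const {
        vector<int> depths;
        int q = 0;
        for (char ch : text) {
            q = delta[q][ch - 'a'];
            depths.push_back(depth[q]);
        }
        return depths;
    }
};
```

With the single pattern $abbab$ this reproduces the $PA$ table, and walking $abbaabbab$ visits depths 1, 2, 3, 4, 1, 2, 3, 4, 5.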

Try walking it: $abbaabbab$

99% completed!

Match the following strings in $abbabbab$:

$abbab$

$bab$

$ba$

Why can't we match $bab$ and $ba$?

  • When we step on a node, that node represents $s_{now}.SIT(1)$
  • But if $s_{now}.SIT(k)\in F$, it should be matched as well
  • Walk the whole $SL$ path
  • This amounts to finding every Answer in Trie
  • I like to call it AIT

  • Walking the $SL$ path to find the $AIT$ wastes some time$\dots$
  • Compress the $SL$ path: for each node $q$, store a link to the longest $q'\neq q$ with $q'\in q.SIT\cap F$
  • We call it the Answer Link

100% completed!

  • Try walking it: $abbabbab$

When do we build AIT?

  • Together with $SIT$
$q[i].AIT=\left\{\begin{array}{ll}q[i].SL&\text{if } q[i].SL\in F\\q[i].SL.AIT&\text{otherwise}\end{array}\right.$

Yay

Let's look at the code

struct TrieNode {
    int id;
    char lastChar;
    TrieNode *next[26];
    TrieNode *SL;
    TrieNode *AL;
    TrieNode(TrieNode *parent = nullptr, char _lastChar = '\0') {
        id = -1;
        lastChar = _lastChar;
        fill(next, next + 26, nullptr);
        SL = parent;
        AL = nullptr;
    }
    void push(string s, int _id) {
        TrieNode *now = this;
        for (auto &i : s) {
            if (now -> next[i - 'a'] == nullptr) now -> next[i - 'a'] = new TrieNode(now, i);
            now = now -> next[i - 'a'];
        }
        now -> id = _id;
    }
    void construct() {
        queue<TrieNode*> q;
        for (auto &i : next) {
            if (i == nullptr) continue;
            q.push(i);
        }
        while (q.size()) {
            TrieNode *now = q.front();
            q.pop();
            for (auto &i : now -> next) {
                if (i == nullptr) continue;
                q.push(i);
            }
            TrieNode *back = now -> SL -> SL;
            while (back && back -> next[now -> lastChar - 'a'] == nullptr) back = back -> SL;
            if (back) now -> SL = back -> next[now -> lastChar - 'a'];
            else now -> SL = this;
            if (now -> SL -> id != -1) now -> AL = now -> SL;
            else now -> AL = now -> SL -> AL;
        }
    }
};

struct AC {
    vector<string> s;
    TrieNode *root;
    AC(vector<string> _s) {
        s = _s;
        root = new TrieNode();
    }
    void push(string _s) {
        s.push_back(_s);
    }
    void construct() {
        int n = s.size();
        for (int i = 0; i < n; i++) root -> push(s[i], i);
        root -> construct();
    }
    vector<int> stringMatching(string _s) {
        int n = s.size();
        vector<int> ans(n);
        TrieNode *now = root;
        for (auto &i : _s) {
            while (now && now -> next[i - 'a'] == nullptr) now = now -> SL;
            if (now) now = now -> next[i - 'a'];
            else now = root;
            if (now -> id != -1) ans[now -> id]++;
            TrieNode *back = now -> AL;
            while (back) {
                ans[back -> id]++;
                back = back -> AL;
            }
        }
        return ans;
    }
};

Problem set

Gusfield's Algorithm

$z_s[i]$

  • $\operatorname*{arg\,max}\limits_{r\leq|s|-i}(P_s[r]==s_{i\dots i+r})$
  • The largest $r$ that satisfies $P_s[r]==s_{i\dots i+r}$
  • Put plainly: starting from $s[i]$ and from $s[0]$, how many characters can be matched going right

$s=abbabbab$

$z_s=\{\_,0,0,5,0,0,2,0\}$

String matching

Compute $z$ on $t+"\$"+s$

and that's it

So how do we generate $z$?

[Figure: $s$ drawn as a row of cells; the prefix of length $z_s[l]$ is followed by a cell $A$, and its copy starting at $l$ is followed by a cell $B$ at position $l+z_s[l]$]

$l=\operatorname*{arg\,max}\limits_j(j+z_s[j])$

the one whose match currently reaches furthest to the right

We now want to find $z_s[i]$

Depending on where $i$ is, there are $2$ cases:

  1. $i<l+z_s[l]$

In this case let $i'=i-l$,

the position inside the prefix that corresponds to $i$

1-1. $i'+z_s[i']<z_s[l]$

$\Rightarrow z_s[i]=z_s[i']$

1-2. $i'+z_s[i']=z_s[l]$

$A\neq B$

$C\neq B$

$A\neq C?$

So in this case

we can only be sure that $z_s[i]\geq z_s[i']$

so keep matching

1-3. $i'+z_s[i']>z_s[l]$

$A\neq B$

$\Rightarrow z_s[i]=z_s[l]-i'$

2. $i\geq l+z_s[l]$

Nothing to go on

so we have to match from scratch

Code

vector<int> Z(string s) {
    int n = s.size();
    vector<int> z(n);
    if (n == 0) return z;
    z[0] = 0;
    int l = 0;
    for (int i = 1; i < n; i++) {
        if (i >= l + z[l]) {                                                // 2.
            for (z[i] = 0; i + z[i] < n && s[z[i]] == s[i + z[i]]; z[i]++);
            l = i;
            continue;
        }
        int i_ = i - l;
        if (i_ + z[i_] < z[l]) z[i] = z[i_];                                // 1-1.
        else if (i_ + z[i_] > z[l]) z[i] = z[l] - i_;                       // 1-3.
        else {                                                              // 1-2.
            // z[i] >= z[i_] is already known, so resume the comparison from there
            for (z[i] = z[i_]; i + z[i] < n && s[z[i]] == s[i + z[i]]; z[i]++);
            l = i; // here i + z[i] >= l + z[l], so i reaches at least as far right
        }
    }
    return z;
}

int stringMatching(string s, string t) {
    int n = s.size(), m = t.size(), C = 0;
    vector<int> z = Z(t + "$" + s);
    for (int i = m + 1; i < n + m + 1; i++) {
        if (z[i] == m) C++; // a full copy of t starts at position i - m - 1 of s
    }
    return C;
}

Problem set

Manacher

Finally we get to use PD!

A trick for matching palindromes

  • Palindromes have odd or even length; what if there is no center character?
  • Just insert a * between every pair of characters

$abbab$

$*a*b*b*a*b*$

$m_s[i]$

  • $\operatorname*{arg\,max}\limits_{x\leq\min(i,n-1-i)}(\operatorname{isPD}(s_{i-x\dots i+x+1}))$
  • The largest $x$ that satisfies $isPD(s_{i-x\dots i+x+1})$
  • Put plainly: the longest palindrome radius centered at $s[i]$ (the radius does not count the center itself)

$s=*a*b*b*a*b*$

$m_s=\{0,1,0,1,4,1,0,3,0,1,0\}$

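Counting palindromes

$\sum \lceil m_s[i]/2\rceil$ (in the string with * inserted, a radius of $m_s[i]$ covers $\lceil m_s[i]/2\rceil$ palindromes of the original string)

and that's it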
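A compact sketch of the whole count, assuming the input never contains the separator `*`. The function name `countPalindromes` is an assumption. It merges the three sub-cases into the usual `min` shortcut, and each center contributes $\lceil m_s[i]/2\rceil$ palindromes of the original string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>
using namespace std;

// Number of palindromic substrings of orig (each occurrence counted separately).
long long countPalindromes(const string &orig) {
    string s = "*";
    for (char c : orig) { s += c; s += '*'; } // "abbab" -> "*a*b*b*a*b*"
    int n = s.size();
    vector<int> m(n, 0); // m[i] = palindrome radius around s[i], center excluded
    int l = 0;           // center whose palindrome reaches furthest right
    long long total = 0;
    for (int i = 1; i < n; i++) {
        int r = l + m[l];
        // inside the palindrome of l: start from the mirrored radius, clipped at r
        if (i <= r) m[i] = min(m[2 * l - i], r - i);
        while (i - m[i] - 1 >= 0 && i + m[i] + 1 < n && s[i - m[i] - 1] == s[i + m[i] + 1]) m[i]++;
        if (i + m[i] > l + m[l]) l = i;
        total += (m[i] + 1) / 2; // ceil(m[i] / 2) palindromes of the original string
    }
    return total;
}
```

For $abbab$ the radii are $\{0,1,0,1,4,1,0,3,0,1,0\}$ and the count is 8: five single letters, $bb$, $abba$ and $bab$.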

所以要怎麼生mm

zzz

$$A$$$$$$$$$$$$$B$$$\$\$A\$\$\$\$\$\$\$\$\$\$\$\$\$B\$\$\$\dots

我們現在要找ms[i]m_s[i]

l=arg maxj<i(j+ms[j])l=\argmax\limits_{j<i}(j+m_s[j])

展開雙翼(?)後最右邊的人

根據ii的位置,可以分成22種情況:

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

所以要怎麼生mm

$$A$$$$$$$$$$$$$B$$$\$\$A\$\$\$\$\$\$\$\$\$\$\$\$\$B\$\$\$\dots

1. il+ms[l]i\leq l+m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

這個時候我們令i=l(il)i'=l-(i-l)

即以s[l]s[l]對稱過去的ii

所以要怎麼生mm

$$A$$$$$$$$$$$$$B$$$\$\$A\$\$\$\$\$\$\$\$\$\$\$\$\$B\$\$\$\dots

1. il+ms[l]i\leq l+m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

這個時候我們令i=l(il)i'=l-(i-l)

即以s[l]s[l]對稱過去的ii

ii'

所以要怎麼生mm

$$AC$$$$$D$$$$$$B$$$\$\$AC\$\$\$\$\$D\$\$\$\$\$\$B\$\$\$\dots

1-1. ims[i]>lms[l]i'-m_s[i']>l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

所以要怎麼生mm

$$AC$$$$$D$$$$$CB$$$\$\$AC\$\$\$\$\$D\$\$\$\$\$CB\$\$\$\dots

1-1. ims[i]>lms[l]i'-m_s[i']>l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

所以要怎麼生mm

$$AC$$$$$D$$$$$CB$$$\$\$AC\$\$\$\$\$D\$\$\$\$\$CB\$\$\$\dots

1-1. ims[i]>lms[l]i'-m_s[i']>l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

ms[i]=ms[i]\Rightarrow m_s[i]=m_s[i']

所以要怎麼生mm

$$A$$$$$$$C$$$$$B$$$\$\$A\$\$\$\$\$\$\$C\$\$\$\$\$B\$\$\$\dots

1-2. ims[i]=lms[l]i'-m_s[i']=l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

所以要怎麼生mm

$$A$$$$$C$C$$$$$B$$$\$\$A\$\$\$\$\$C\$C\$\$\$\$\$B\$\$\$\dots

1-2. ims[i]=lms[l]i'-m_s[i']=l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

所以要怎麼生mm

$$A$$$$$C$C$$$$$B$$$\$\$A\$\$\$\$\$C\$C\$\$\$\$\$B\$\$\$\dots

1-2. ims[i]=lms[l]i'-m_s[i']=l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

ABA\neq B

CBC\neq B

AC?A\neq C?

所以要怎麼生mm

$$A$$$$$C$C$$$$$B$$$\$\$A\$\$\$\$\$C\$C\$\$\$\$\$B\$\$\$\dots

1-2. ims[i]=lms[l]i'-m_s[i']=l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

所以這個時候

只能確定ms[i]ms[i]m_s[i]\geq m_s[i']

那就繼續配下去

所以要怎麼生mm

$CA$$$$$$$$D$$$$B$$$\$CA\$\$\$\$\$\$\$\$D\$\$\$\$B\$\$\$\dots

1-3. ims[i]<lms[l]i'-m_s[i']<l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

所以要怎麼生mm

$CA$$$$D$$$D$$$$B$$\$CA\$\$\$\$D\$\$\$D\$\$\$\$B\$\$\dots

1-3. ims[i]<lms[l]i'-m_s[i']<l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

所以要怎麼生mm

$CA$$$$D$$$D$$$$B$$\$CA\$\$\$\$D\$\$\$D\$\$\$\$B\$\$\dots

1-3. ims[i]<lms[l]i'-m_s[i']<l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

ABA\neq B

所以要怎麼生mm

$CA$$$$D$$$D$$$$B$$\$CA\$\$\$\$D\$\$\$D\$\$\$\$B\$\$\dots

1-3. ims[i]<lms[l]i'-m_s[i']<l-m_s[l]

ll

lms[l]l-m_s[l]

l+ms[l]l+m_s[l]

ii

ii'

ms[i]=il+ms[l]\Rightarrow m_s[i]=i'-l+m_s[l]

So how do we build $m$?

2. $i>l+m_s[l]$

$\$\$A\$\$\$\$\$\$\$\$\$\$\$\$\$B\$\$\$\dots$

(Figure: the positions $l-m_s[l]$, $l$, $l+m_s[l]$, and $i$ are marked on the string above.)

There is nothing to reuse,

so we have to match from scratch.
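To make the meaning of $m$ concrete, here is a minimal brute-force sketch that expands around every center of the interleaved string. The helper name `brute_force_radii` is made up for illustration and is not part of the slides; it runs in $O(n^2)$ in the worst case and is only meant as a reference to compare a faster implementation against.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the slides): computes every palindrome
// radius by direct expansion on the interleaved string.
std::vector<int> brute_force_radii(const std::string &s) {
    std::string t = "*";
    for (char c : s) { t += c; t += '*'; }  // "aba" -> "*a*b*a*"
    int n = t.size();
    std::vector<int> m(n, 0);
    for (int i = 0; i < n; i++) {
        // Grow the radius while both neighbours exist and agree.
        while (i - m[i] - 1 >= 0 && i + m[i] + 1 < n &&
               t[i - m[i] - 1] == t[i + m[i] + 1])
            m[i]++;
    }
    return m;
}
```

For `"aba"` the interleaved string is `*a*b*a*`, and the center on `b` gets radius 3 because the whole string is a palindrome.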

Code

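// Returns the palindrome radius m[i] around every position of the
// interleaved string "*s0*s1*...*".
vector<int> M(string s) {
    string br = "*";
    for (auto &c : s) br += string(1, c) + "*"; // c is the character itself
    s = br;
    int n = s.size();
    vector<int> m(n);
    m[0] = 0;
    int l = 0; // center of the palindrome that reaches furthest to the right
    for (int i = 1; i < n; i++) {
        if (i > l + m[l]) { // case 2: nothing to reuse, match from scratch
            for (m[i] = 0; i - m[i] - 1 >= 0 && i + m[i] + 1 < n && s[i - m[i] - 1] == s[i + m[i] + 1]; m[i]++);
            l = i;
            continue;
        }
        int i_ = l - i + l; // mirror image of i about l
        if (i_ - m[i_] > l - m[l]) m[i] = m[i_]; // case 1-1
        else if (i_ - m[i_] < l - m[l]) m[i] = i_ - l + m[l]; // case 1-3
        else { // case 1-2: only a lower bound is known, keep matching
            for (m[i] = m[i_]; i - m[i] - 1 >= 0 && i + m[i] + 1 < n && s[i - m[i] - 1] == s[i + m[i] + 1]; m[i]++);
            if (i + m[i] > l + m[l]) l = i;
        }
    }
    return m;
}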
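As a usage sketch, the radii can be turned into a count of palindromic substrings: a center with radius $r$ in the interleaved string contributes $\lfloor (r+1)/2\rfloor$ palindromes of the original string. The block below is self-contained and uses the common single-branch formulation of the algorithm rather than the three-case version above; `radii` and `count_palindromes` are illustrative names, not from the slides.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Palindrome radii on the interleaved string "*s0*s1*...*".
// This is the usual "take the min, then extend" variant, assumed
// equivalent to the three-case version for the purpose of this sketch.
std::vector<int> radii(const std::string &s) {
    std::string t = "*";
    for (char c : s) { t += c; t += '*'; }
    int n = t.size();
    std::vector<int> m(n, 0);
    int l = 0;  // center of the palindrome reaching furthest right
    for (int i = 1; i < n; i++) {
        if (i < l + m[l]) m[i] = std::min(m[2 * l - i], l + m[l] - i);
        while (i - m[i] - 1 >= 0 && i + m[i] + 1 < n &&
               t[i - m[i] - 1] == t[i + m[i] + 1])
            m[i]++;
        if (i + m[i] > l + m[l]) l = i;
    }
    return m;
}

// Number of palindromic substrings (counted by position) of s.
long long count_palindromes(const std::string &s) {
    long long total = 0;
    for (int r : radii(s)) total += (r + 1) / 2;
    return total;
}
```

For `"aba"` this counts `a`, `b`, `a`, and `aba`.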

Problem set

Remember suffixes?

$s=abbab$

$S_s[0]=\backslash0$

$S_s[1]=b$

$S_s[2]=ab$

$S_s[3]=bab$

$S_s[4]=bbab$

$S_s[5]=abbab$

(On this slide $S_s[i]$ is the suffix of length $i$.)

sort

$S_s[0]=\backslash0$, $S_s[2]=ab$, $S_s[5]=abbab$, $S_s[1]=b$, $S_s[3]=bab$, $S_s[4]=bbab$

$サ=\{0,2,5,1,3,4\}$

Anyway, this is the サ (suffix array).

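// Naive construction: materialize every suffix and sort them as strings.
// sa[k] is the start index of the k-th smallest suffix; the empty suffix
// (start index n) always comes first. For "abbab" this gives {5,3,0,4,2,1}.
vector<int> SA(string s) {
    int n = s.size();
    vector<int> sa(n + 1);
    vector<pair<string, int>> v(n + 1);
    for (int i = 0; i <= n; i++) v[i] = {s.substr(i, n - i), i};
    sort(v.begin(), v.end());
    for (int i = 0; i <= n; i++) sa[i] = v[i].second;
    return sa;
}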

$O(n^2\log n)$
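The naive version copies every suffix, which costs $O(n^2)$ memory. A small variation, sketched here under the assumption that only the order is needed, sorts the start indices and compares suffixes in place with `std::string::compare`. The name `suffix_order` is illustrative; the running time is still $O(n^2\log n)$ in the worst case.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Sorts the start indices 0..n of all suffixes (n is the empty suffix)
// without copying any substring.
std::vector<int> suffix_order(const std::string &s) {
    int n = s.size();
    std::vector<int> sa(n + 1);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        // Lexicographic comparison of s[a..] and s[b..].
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}
```

With start-index numbering, `"abbab"` sorts to `{5, 3, 0, 4, 2, 1}`.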

Another サ

$s.cycle(0)=abbab\backslash0$

$s.cycle(1)=bbab\backslash0a$

$s.cycle(2)=bab\backslash0ab$

$s.cycle(3)=ab\backslash0abb$

$s.cycle(4)=b\backslash0abba$

$s.cycle(5)=\backslash0abbab$

sort

$s.cycle(5)$, $s.cycle(3)$, $s.cycle(0)$, $s.cycle(4)$, $s.cycle(2)$, $s.cycle(1)$

$サ=\{5,3,0,4,2,1\}$

We actually prefer this ordering.
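A minimal sketch of the cyclic-shift view, assuming the sentinel `\0` is appended before rotating. The helpers `cycle` and `cycle_order` are illustrative names (the slides only use the notation $s.cycle(i)$). Because the sentinel is unique and smaller than every letter, sorting the rotations gives the same order as sorting the suffixes by start index.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// The i-th rotation of s + '\0'.
std::string cycle(const std::string &s, int i) {
    std::string t = s + '\0';
    return t.substr(i) + t.substr(0, i);
}

// Start indices of all rotations in sorted order (naive, for illustration).
std::vector<int> cycle_order(const std::string &s) {
    int n = s.size();
    std::vector<std::pair<std::string, int>> v;
    for (int i = 0; i <= n; i++) v.push_back({cycle(s, i), i});
    std::sort(v.begin(), v.end());
    std::vector<int> order;
    for (auto &p : v) order.push_back(p.second);
    return order;
}
```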

Complexity optimization

Let's start with a small animation (the frames are condensed into one table per round).

Round 0: rank every cycle by its first character

Cyc  string    prefix  rank
0    abbab\0   a       1
1    bbab\0a   b       3
2    bab\0ab   b       3
3    ab\0abb   a       1
4    b\0abba   b       3
5    \0abbab   \0      0

Round 1: append the next 1 character, write both halves as ranks, sort the pairs

Cyc  string    halves   pair   rank
0    abbab\0   a b      1, 3   1
1    bbab\0a   b b      3, 3   5
2    bab\0ab   b a      3, 1   4
3    ab\0abb   a b      1, 3   1
4    b\0abba   b \0     3, 0   3
5    \0abbab   \0 a     0, 1   0

Round 2: append the next 2 characters

Cyc  string    halves   pair   rank
0    abbab\0   ab ba    1, 4   2
1    bbab\0a   bb ab    5, 1   5
2    bab\0ab   ba b\0   4, 3   4
3    ab\0abb   ab \0a   1, 0   1
4    b\0abba   b\0 ab   3, 1   3
5    \0abbab   \0a bb   0, 5   0

Round 3: append the next 4 characters (all ranks are already distinct, so nothing changes)

Cyc  string    halves        rank
0    abbab\0   abba b\0ab    2
1    bbab\0a   bbab \0abb    5
2    bab\0ab   bab\0 abba    4
3    ab\0abb   ab\0a bbab    1
4    b\0abba   b\0ab bab\0   3
5    \0abbab   \0abb ab\0a   0

Summary

  • Extend each string by as many characters as its current length
  • Represent each of the two halves by its $rank$ value, which gives a $pair$
  • Sort by the $pair$s
  • Copy the new $rank$s back
  • Repeat for $\log n$ rounds in total

Complexity

  • Forming the $pair$s: $O(n)$
  • Sorting by the $pair$s: $O(n\log n)$
  • Copying the new $rank$s back: $O(n)$
  • $\log n$ rounds in total

Total complexity: $O(n\log^2n)$

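typedef pair<int, int> pii; // assumed alias; it is not defined in this excerpt

struct P {
    pii p;  // (rank of the first half, rank of the second half)
    int id; // start index of the cycle
};

// Returns r, where r[i] is the rank of s.cycle(i) among all n + 1 cycles.
vector<int> SA(string s) {
    int n = s.size();
    vector<int> r(n + 1);
    vector<P> v(n + 1);
    function<bool(P, P)> cmp = [](P a, P b) {
        return a.p < b.p;
    };
    // Equal pairs share one rank: the index of the first element of their group.
    function<void(void)> GetRank = [&]() {
        for (int i = 0, j = 0; i <= n; i = j) {
            while (j <= n && v[i].p == v[j].p) j++;
            for (int k = i; k < j; k++) {
                v[k].p.first = i;
                r[v[k].id] = i;
            }
        }
    };
    for (int i = 0; i < n; i++) v[i] = {{s[i] - 'a' + 1, 0}, i};
    v[n] = {{0, 0}, n}; // the sentinel \0 is smaller than every letter
    sort(v.begin(), v.end(), cmp);
    GetRank();
    int len = 1;
    while (len <= n) {
        // The second half starts len characters later, cyclically.
        for (auto &i : v) i.p.second = r[(i.id + len) % (n + 1)];
        sort(v.begin(), v.end(), cmp);
        GetRank();
        len <<= 1;
    }
    return r;
}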
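Note that `SA` above returns the rank array `r` rather than the sorted order itself. Once all ranks are distinct, the order can be recovered by inverting the permutation. A minimal sketch, with `order_from_rank` as an illustrative name:

```cpp
#include <vector>

// Inverts a rank array: sa[k] is the start index whose rank is k.
// Assumes r is a permutation of 0..r.size()-1 (all ranks distinct).
std::vector<int> order_from_rank(const std::vector<int> &r) {
    std::vector<int> sa(r.size());
    for (int i = 0; i < (int)r.size(); i++) sa[r[i]] = i;
    return sa;
}
```

For the final ranks `{2, 5, 4, 1, 3, 0}` of `abbab` this yields `{5, 3, 0, 4, 2, 1}`, matching the サ of the cycles.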

Code

Radix Sort

A tiny bit faster

Sort the following numbers:

$\{41,78,38,57,59,50,72,43,46,61\}$

Pass 1: drop each number into the bucket of its ones digit

0 50
1 41 61
2 72
3 43
4
5
6 46
7 57
8 78 38
9 59

$\Rightarrow\{50,41,61,72,43,46,57,78,38,59\}$

Pass 2: drop each number into the bucket of its tens digit, keeping the order from pass 1

0
1
2
3 38
4 41 43 46
5 50 57 59
6 61
7 72 78
8
9

$\Rightarrow\{38,41,43,46,50,57,59,61,72,78\}$
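The two passes above can be sketched as a least-significant-digit radix sort. This is a minimal illustration assuming non-negative integers and base 10; `radix_sort` is an illustrative name. Each pass is stable, which is what lets the later pass preserve the order established by the earlier one.

```cpp
#include <algorithm>
#include <vector>

// LSD radix sort in base 10; assumes every value is non-negative.
std::vector<int> radix_sort(std::vector<int> a) {
    int mx = a.empty() ? 0 : *std::max_element(a.begin(), a.end());
    for (long long base = 1; mx / base > 0; base *= 10) {
        std::vector<std::vector<int>> bucket(10);
        // Stable distribution by the current digit.
        for (int x : a) bucket[(x / base) % 10].push_back(x);
        int j = 0;
        // Collect the buckets from 0 to 9.
        for (auto &b : bucket)
            for (int x : b) a[j++] = x;
    }
    return a;
}
```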

Complexity

  • Extend each string by as many characters as its current length
  • Represent each of the two halves by its $rank$ value, which gives a $pair$: $O(n)$
  • Sort by the $pair$s: $O(n\log n)$ before, $O(n)$ with radix sort
  • Copy the new $rank$s back: $O(n)$
  • Repeat for $\log n$ rounds in total

Total complexity: $O(n\log n)$

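typedef pair<int, int> pii; // assumed alias; it is not defined in this excerpt

struct P {
    pii p;  // (rank of the first half, rank of the second half)
    int id; // start index of the cycle
};

// Same as before, but each round sorts the pairs with a two-pass radix sort.
vector<int> SA(string s) {
    int n = s.size();
    vector<int> r(n + 1);
    vector<P> v(n + 1);
    vector<vector<P>> bar(n + 1); // buckets; every rank lies in [0, n]
    function<void(void)> RS = [&]() {
        // Pass 1: stable bucket pass on the second component.
        for (auto &i : v) bar[i.p.second].push_back(i);
        for (int i = 0, j = 0; i <= n; i++) {
            for (auto &k : bar[i]) v[j++] = k;
            bar[i].clear();
        }
        // Pass 2: stable bucket pass on the first component.
        for (auto &i : v) bar[i.p.first].push_back(i);
        for (int i = 0, j = 0; i <= n; i++) {
            for (auto &k : bar[i]) v[j++] = k;
            bar[i].clear();
        }
    };
    // Equal pairs share one rank: the index of the first element of their group.
    function<void(void)> GetRank = [&]() {
        for (int i = 0, j = 0; i <= n; i = j) {
            while (j <= n && v[i].p == v[j].p) j++;
            for (int k = i; k < j; k++) {
                v[k].p.first = i;
                r[v[k].id] = i;
            }
        }
    };
    for (int i = 0; i < n; i++) v[i] = {{s[i] - 'a' + 1, 0}, i};
    v[n] = {{0, 0}, n}; // the sentinel \0 is smaller than every letter
    // The first sort uses raw character codes, so std::sort is used here.
    sort(v.begin(), v.end(), [](P a, P b){return a.p < b.p;});
    GetRank();
    int len = 1;
    while (len <= n) {
        for (auto &i : v) i.p.second = r[(i.id + len) % (n + 1)];
        RS();
        GetRank();
        len <<= 1;
    }
    return r;
}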

Code

Problem set

LCP

  • Longest Common Prefix

$LCP(abbab,abab)=2$

(Figure: the strings $abbab$ and $abab$, with the shared prefix $ab$ highlighted.)
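As a baseline, the definition can be written down directly. This is a minimal sketch with the illustrative name `lcp`; it takes time proportional to the answer, which is what the rest of the section tries to beat for repeated queries.

```cpp
#include <string>

// Length of the longest common prefix of a and b, by direct comparison.
int lcp(const std::string &a, const std::string &b) {
    int k = 0;
    while (k < (int)a.size() && k < (int)b.size() && a[k] == b[k]) k++;
    return k;
}
```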

Next goal

Build a data structure that can answer the LCP of any two suffixes.

$s=abbab\backslash0$

$S_s[0]=\backslash0$

$S_s[1]=b\backslash0$

$S_s[2]=ab\backslash0$

$S_s[3]=bab\backslash0$

$S_s[4]=bbab\backslash0$

$S_s[5]=abbab\backslash0$

$LCP_s(1,4)=LCP(b,bbab)=1$

OK, fine, we still prefer $cycle$.

Next goal

Build a data structure that can answer the LCP of any two $cycle$s.

$s=abbab\backslash0$

$s.cycle(0)=abbab\backslash0$

$s.cycle(1)=bbab\backslash0a$

$s.cycle(2)=bab\backslash0ab$

$s.cycle(3)=ab\backslash0abb$

$s.cycle(4)=b\backslash0abba$

$s.cycle(5)=\backslash0abbab$

$LCP_s(4,1)=LCP(b\backslash0abba,bbab\backslash0a)=1$

Approach

Remember the $rank$ array from earlier?

round  cycle    0      1      2      3      4      5
0      string   a      b      b      a      b      \0
       rank     1      3      3      1      3      0
1      string   ab     bb     ba     ab     b\0    \0a
       rank     1      5      4      1      3      0
2      string   abba   bbab   bab\0  ab\0a  b\0ab  \0abb
       rank     2      5      4      1      3      0
...    string   ...
       rank     2      5      4      1      3      0

If we store every $rank$…

  • $rank[i][j]$ is the rank of $s.cycle(j)$ in round $i$
  • Suppose we want to know whether $s.cycle(x)$ and $s.cycle(y)$ have the same prefix of length $2^i$
  • Look up $rank[i][x]$ and $rank[i][y]$; equal numbers mean equal prefixes
  • This means we can binary search on the prefix length!
  • Complexity: $O(\log n)$ per query
  • (not counting the cost of building $rank$)

But what about the space complexity?

Keeping all $\log n$ rounds takes $O(n\log n)$ memory: the space blows up.
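The query on stored rank tables can be sketched as follows. This is an illustration under assumptions, not the slides' code: `build_levels` keeps one rank array per round (using `std::sort` for brevity), and `lcp_cycles` walks the rounds from the longest length down, jumping $2^k$ characters whenever the two ranks agree. Both names are made up for this example, and the query assumes $x\neq y$.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// levels[k][j] = rank of the j-th rotation of s + '\0' when only its
// first 2^k characters are compared.
std::vector<std::vector<int>> build_levels(const std::string &s) {
    int n = s.size() + 1;  // length including the sentinel
    std::vector<int> r(n);
    for (int i = 0; i + 1 < n; i++) r[i] = (unsigned char)s[i] + 1;
    r[n - 1] = 0;  // sentinel ranks below every character
    std::vector<std::vector<int>> levels{r};
    for (int len = 1; len < n; len <<= 1) {
        std::vector<std::pair<std::pair<int, int>, int>> v(n);
        for (int i = 0; i < n; i++) v[i] = {{r[i], r[(i + len) % n]}, i};
        std::sort(v.begin(), v.end());
        std::vector<int> nr(n);
        for (int i = 0; i < n; i++)  // equal pairs share the first index
            nr[v[i].second] = (i > 0 && v[i].first == v[i - 1].first)
                                  ? nr[v[i - 1].second] : i;
        r = nr;
        levels.push_back(r);
    }
    return levels;
}

// LCP of rotations x and y (x != y), one table lookup per level.
int lcp_cycles(const std::vector<std::vector<int>> &levels, int x, int y) {
    int n = levels[0].size(), res = 0;
    for (int k = (int)levels.size() - 1; k >= 0; k--) {
        if (levels[k][x] == levels[k][y]) {  // next 2^k characters agree
            res += 1 << k;
            x = (x + (1 << k)) % n;
            y = (y + (1 << k)) % n;
        }
    }
    return res;
}
```

The table has about $\log n$ rows of $n+1$ entries each, which is the memory cost the slide complains about.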

Let's try another approach

$s=abbab$

The suffixes in sorted order (here $S_s[i]$ is the suffix starting at index $i$):

$S_s[5]=\backslash0$

$S_s[3]=ab\backslash0$

$S_s[0]=abbab\backslash0$

$S_s[4]=b\backslash0$

$S_s[2]=bab\backslash0$

$S_s[1]=bbab\backslash0$

We create an array $l$

such that $l[i]=$ the $LCP$ of the suffix ranked $i$ and the suffix ranked $i+1$

$l[0]=0$

$l[1]=2$

$l[2]=0$

$l[3]=1$

$l[4]=1$

Suppose we want $LCP(S_s[3],S_s[2])$

The answer is $\min(l[1\dots4])$ (left-closed, right-open: $S_s[3]$ has rank 1 and $S_s[2]$ has rank 4)

$LCP$ turns into a range minimum query (RMQ)

Just maintain $l$ with a segment tree or something similar
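Since $l$ never changes after it is built, a sparse table is one possible stand-in for the segment tree. The sketch below is an assumption-laden illustration: `SparseMin` and `suffix_lcp` are made-up names, `r[a]` is taken to be the rank of the suffix starting at `a`, and the query requires two different suffixes.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sparse table for range-minimum queries over a fixed array.
struct SparseMin {
    std::vector<std::vector<int>> t;  // t[k][i] = min of a[i, i + 2^k)
    explicit SparseMin(const std::vector<int> &a) {
        int n = a.size();
        t.push_back(a);
        for (int k = 1; (1 << k) <= n; k++) {
            t.emplace_back(n - (1 << k) + 1);
            for (int i = 0; i + (1 << k) <= n; i++)
                t[k][i] = std::min(t[k - 1][i], t[k - 1][i + (1 << (k - 1))]);
        }
    }
    // Minimum of a[lo, hi); requires lo < hi.
    int query(int lo, int hi) const {
        int k = 0;
        while ((1 << (k + 1)) <= hi - lo) k++;
        return std::min(t[k][lo], t[k][hi - (1 << k)]);
    }
};

// LCP of the suffixes starting at a and b (a != b), using ranks r and
// a table built over the adjacent-LCP array l.
int suffix_lcp(const std::vector<int> &r, const SparseMin &rmq, int a, int b) {
    int x = r[a], y = r[b];
    if (x > y) std::swap(x, y);
    return rmq.query(x, y);  // half-open range [x, y) of l
}
```

Building the table costs $O(n\log n)$ time and memory, and each query is answered from two overlapping blocks.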

Building $l$

Let's start with a small animation (the frames are condensed into the steps below).

$s=abbab$

The suffixes in sorted order (the subscript is the start index, and $0$ stands for $\backslash0$):

rank 0: $_5 0$

rank 1: $_3 ab0$

rank 2: $_0 abbab0$

rank 3: $_4 b0$

rank 4: $_2 bab0$

rank 5: $_1 bbab0$

Step 1: match $S_s[0]$ against the suffix ranked right before it, $S_s[3]$. $LCP(S_s[0],S_s[3])=2$, so $l[1]=2$.

Step 2: $LCP(S_s[0],S_s[3])=2\Rightarrow LCP(S_s[1],S_s[4])=1\Rightarrow LCP(S_s[1],S_s[2])\geq1$, so $l[4]\geq1$. Matching on from there gives $l[4]=1$.

Step 3: $LCP(S_s[1],S_s[2])=1\Rightarrow LCP(S_s[2],S_s[3])=0\Rightarrow LCP(S_s[2],S_s[4])\geq0$, so $l[3]\geq0$. Matching on from there gives $l[3]=1$.

Step 4: $LCP(S_s[2],S_s[4])=1\Rightarrow LCP(S_s[3],S_s[5])=0\Rightarrow LCP(S_s[3],S_s[5])\geq0$, so $l[0]\geq0$. Matching on from there gives $l[0]=0$.

Step 5: finally $S_s[4]$ is matched against $S_s[0]$, which gives $l[2]=0$.

Result:

$l[0]=0$

$l[1]=2$

$l[2]=0$

$l[3]=1$

$l[4]=1$

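Food for thought

  • Can each step always reuse what the previous step matched?
  • Could the range left over from the previous match happen to lie below the new target?
  • What is the time complexity?

(Honestly, I was just too lazy to write the proof.)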

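// r[i] is the rank of the suffix starting at i (r[n] = 0 for the sentinel).
// Returns l, where l[k] is the LCP of the suffixes ranked k and k + 1.
vector<int> LCP(string s, vector<int> &r) {
    int n = s.size();
    vector<int> p(n + 1), l(n);
    for (int i = 0; i <= n; i++) p[r[i]] = i; // p[k] = start index of rank k
    int len = 0; // matched length carried over from the previous suffix
    for (int i = 0; i < n; i++) {
        int j = p[r[i] - 1]; // the suffix ranked right before suffix i
        while (i + len < n && j + len < n && s[i + len] == s[j + len]) len++;
        l[r[i] - 1] = len;
        if (len) len--; // dropping the first character loses at most one match
    }
    return l;
}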
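For checking the output of `LCP` on small inputs, a brute-force reference can be handy. The sketch below is an illustration only: `brute_force_l` is a made-up name, it sorts the suffixes naively, and it compares each adjacent pair character by character, so it is quadratic or worse.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// l[k] = LCP of the suffixes ranked k and k + 1, computed naively.
std::vector<int> brute_force_l(const std::string &s) {
    int n = s.size();
    std::vector<int> sa(n + 1);
    std::iota(sa.begin(), sa.end(), 0);
    // Sort start indices by the suffix they begin (index n is empty).
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    std::vector<int> l(n);
    for (int k = 0; k < n; k++) {
        int a = sa[k], b = sa[k + 1], len = 0;
        while (a + len < n && b + len < n && s[a + len] == s[b + len]) len++;
        l[k] = len;
    }
    return l;
}
```

For `"abbab"` this reproduces the array from the walkthrough, `{0, 2, 0, 1, 1}`.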

Code

Problem set

That's all

Assorted nasty extensions:

  • Suffix Automaton (another
  • Main-Lorentz Algorithm
  • Lyndon Factorization (a reference for anyone who cannot solve this problem)
  • String Matching with FFT

One website to recommend
