String
Outline
- Hash
- Z-Algorithm
- KMP
- Manacher
- Trie
- AC Automaton
- Suffix Array
- Main-Lorentz
Hash
What is hash?
For a hash function \(f\),
\(x=y \Rightarrow f(x)=f(y) \)
\(x \neq y \Rightarrow f(x) \neq f(y) \) (with very high probability)
For two strings \(s,t\), if we want to know whether \(s\) and \(t\) are the same, we can hash them and check whether \(f(s)=f(t)\).
Rabin Karp
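Given a string \(s_0...s_{n-1}\), define \(a[i]=s_i*p^i\); the hash of \(s\) is \(\sum_{i=0}^{n-1} a[i] \bmod M\) for a base \(p\) and a prime modulus \(M\).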
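A minimal sketch of turning the terms \(a[i]\) into one hash value, assuming the hash is their sum modulo a prime. The names `poly_hash`, `P` and `MOD` are illustrative and not from the slides; the constants are borrowed from the solution code in this section.

```cpp
#include <cstdint>
#include <string>

// Assumed constants: base P and prime modulus MOD (illustrative choices).
const std::uint64_t P = 127, MOD = 998244353;

// hash(s) = sum over i of s[i] * P^i, taken modulo MOD
std::uint64_t poly_hash(const std::string &s) {
    std::uint64_t h = 0, pw = 1;      // pw holds P^i mod MOD
    for (unsigned char ch : s) {
        h = (h + ch * pw) % MOD;      // add a[i] = s[i] * P^i
        pw = pw * P % MOD;
    }
    return h;
}
```

Equal strings always get equal hashes. Two different strings may still collide, so a matching hash only means "equal with high probability".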
Problem
Given a string \(s\), answer \(q\) queries:
given a string \(t\), print the number of occurrences of \(t\) in \(s\)
\(|s|, |t| \leq 10000\)
\(q \leq 50000 \)
\( \sum |t| \leq 350000\)
Solution
Use prefix sums of the hash: the hash of any substring of \(s\) can then be obtained in \( \Omicron (1) \), so comparing \(t\) against the length-\(|t|\) substring at every starting position of \(s\) takes \( \Omicron (|s|+|t|) \) per query.
Solution
#include <bits/stdc++.h>
#define IO ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);
#define int long long
using namespace std;
const int p=127,M=998244353;
int pref[10005],po[10005];
signed main(){ // "signed" because int is redefined as long long above
    IO
    po[0]=1;
    for(int i=1;i<10005;i++) po[i]=po[i-1]*p%M; // po[i] = p^i mod M
    int tc;cin >> tc; // number of test cases
    while(tc--){
        string T;cin >> T; // T: the text (s in the statement)
        int n=T.length();
        // pref[i] = hash of T[0..i], with T[0] carrying the highest power of p
        pref[0]=T[0];
        for(int i=1;i<n;i++) pref[i]=(pref[i-1]*p+T[i])%M;
        int q;cin >> q;
        while(q--){
            int has=0,cnt=0;
            string P;cin >> P; // P: the query string (t in the statement)
            int m=P.length();
            for(int i=0;i<m;i++) has=(has*p+P[i])%M;
            // i is the last index of the window T[i-m+1..i]
            for(int i=m-1;i<n;i++){
                int cur=pref[i];
                if(i>=m) cur=((cur-pref[i-m]*po[m])%M+M)%M; // drop the prefix T[0..i-m]
                if(cur==has) cnt++;
            }
            cout << cnt << '\n';
        }
    }
    return 0;
}
Z-algorithm
Given a string \(s_0...s_{n-1}\), define an array \(z\):
\(z[i]=\) the biggest \(k\) that satisfies
\(s_0...s_{k-1}=s_is_{i+1}...s_{i+k-1}\)
(\(k=0\) if \(s_0 \neq s_i\))
Calculate \(z\)
Say we know \(z[0] \sim z[i-1]\).
First, we try to find a lower bound of \(z[i]\)
let \(l= \argmax_{0 \leq j \leq i-1} (j+z[j]-1)\) (taking \(z[0]=0\)), \(r=l+z[l]-1\)
\(\Rightarrow s_0...s_{r-l}=s_l...s_r\).
if \(i \leq r\), we know that \(s_{i-l}...s_{r-l}=s_i...s_r\),
\( \Rightarrow z[i]\) is at least \(\min(z[i-l],r-i+1)\)
Calculate \(z\)
Then, we can repeatedly check if \(s[z[i]]=s[i+z[i]]\),
and update \(z[i]\).
Finally, we can update \(l,r\) if \(i+z[i]-1 > r\).
Notice that \(r\) never decreases, and every successful comparison in the extension step increases \(r\) by one, so the algorithm is \( \Omicron (n) \) in total.
Implementation
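vector<int> z_algo(string &s){
    int n=s.size();
    vector<int> z(n,0); // z[0] is left as 0 by convention
    // [l,r]: the match s[l..r]=s[0..r-l] with the largest r found so far
    for(int i=1,l=0,r=0;i<n;i++){
        if(i<=r) z[i]=min(z[i-l],r-i+1); // lower bound from position i-l
        while(i+z[i]<n&&s[z[i]]==s[i+z[i]]) z[i]++; // extend by direct comparison
        if(i+z[i]-1>r) l=i,r=i+z[i]-1;
    }
    return z;
}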
Problem
Given a string \(s\), find the number of strings \(t\) such that
\(t\) is both a prefix and a suffix of \(s\).
Solution
Count the indices \(i \geq 1\) with \(i+z[i]=n\): the suffix starting at \(i\) is then also a prefix. (Add one if \(s\) itself counts.)
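The counting rule can be sketched as follows, assuming only proper, non-empty prefixes are counted. `z_function` and `count_borders` are illustrative names; the Z computation mirrors the implementation shown earlier.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Standard Z-function; z[0] is left as 0.
std::vector<int> z_function(const std::string &s) {
    int n = s.size();
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        if (i <= r) z[i] = std::min(z[i - l], r - i + 1);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] - 1 > r) l = i, r = i + z[i] - 1;
    }
    return z;
}

// Number of proper non-empty prefixes of s that are also suffixes of s.
int count_borders(const std::string &s) {
    int n = s.size(), cnt = 0;
    std::vector<int> z = z_function(s);
    for (int i = 1; i < n; i++)
        if (i + z[i] == n) cnt++;   // the match starting at i reaches the end of s
    return cnt;
}
```

For example, "ababa" has the two such strings "a" and "aba", so `count_borders("ababa")` is 2.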
KMP
Given a string \(s_0...s_{n-1}\), define failure function \(p\):
\(p[i]=\) the biggest \(k<i+1\) that satisfies
\(s_0...s_{k-1}=s_{i-k+1}...s_i\)
Build
Say we know \(p[0] \sim p[i-1]\).
let \(j=p[i-1]\); if \(s_j=s_i\), then \(p[i]=j+1\).
Otherwise, while \(j \neq 0\), set \(j=p[j-1]\) and check the condition again; if it never holds, \(p[i]=0\).
Build
Notice that \(j\) increases by at most 1 for each \(i\), so it increases at most \(n\) times in total and therefore decreases at most \(n\) times; the algorithm is \( \Omicron(n) \) amortized.
Implementation
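vector<int> kmp(string &s){
    int n=s.size();
    vector<int> pi(n,0); // pi[i]: the failure function p[i]
    for(int i=1;i<n;i++){
        int j=pi[i-1]; // longest border of s[0..i-1]
        while(j>0&&s[i]!=s[j]) j=pi[j-1]; // fall back to the next shorter border
        if(s[i]==s[j]) j++;
        pi[i]=j;
    }
    return pi;
}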
Problem
Given a string \(s\), answer \(q\) queries:
given a string \(t\), print the number of occurrences of \(t\) in \(s\)
\(|s|, |t| \leq 10000\)
\(q \leq 50000 \)
\( \sum |t| \leq 350000\)
Solution
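For every \(t\), calculate its failure function \(p\).
Scan \(s\) while maintaining \(r\), the number of characters of \(t\) currently matched, so that \(t_0...t_{r-1}\) ends at \(s_i\).
If \(s_{i+1} \neq t_r\), set \(r=p[r-1]\) (while \(r>0\)) and keep matching; whenever \(r=|t|\), count one occurrence and set \(r=p[r-1]\).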
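A sketch of answering one query, where `r` is the number of characters of \(t\) matched so far. `prefix_function` and `count_occurrences` are illustrative names, and overlapping occurrences are counted.

```cpp
#include <string>
#include <vector>

// Failure function of t: p[i] = length of the longest proper border of t[0..i].
std::vector<int> prefix_function(const std::string &t) {
    int n = t.size();
    std::vector<int> p(n, 0);
    for (int i = 1; i < n; i++) {
        int j = p[i - 1];
        while (j > 0 && t[i] != t[j]) j = p[j - 1];
        if (t[i] == t[j]) j++;
        p[i] = j;
    }
    return p;
}

// Number of (possibly overlapping) occurrences of t in s.
int count_occurrences(const std::string &s, const std::string &t) {
    if (t.empty()) return 0;                 // assumption: empty patterns are not counted
    std::vector<int> p = prefix_function(t);
    int m = t.size(), r = 0, cnt = 0;        // r = number of characters of t matched
    for (char ch : s) {
        while (r > 0 && ch != t[r]) r = p[r - 1];    // mismatch: fall back
        if (ch == t[r]) r++;
        if (r == m) { cnt++; r = p[m - 1]; }         // full match: count, then fall back
    }
    return cnt;
}
```

Each query costs \(O(|s|+|t|)\). With up to \(50000\) queries against a text of length \(10000\), that is about \(5 \cdot 10^8\) character steps in total.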
Why failure function?
The name comes from this: when a match fails, we can jump straight to the longest prefix that still matches instead of starting over.
Manacher's Algorithm
Problem
Given a string \(s\), find the longest palindromic substring.
\(|s| \leq 10^6 \)
First of all
Palindromes include two kinds:
1. odd length, center is a position in the string
2. even length, center is a position between two characters
Hard to deal with...
Insert '*' between every two characters and at the front and the end of the string; a palindrome of length len then becomes an odd-length palindrome of length 2*len+1!
Construct
for a string \(s\) (after inserting '*'), define an array \(p\):
\(p[i]=\) the biggest \(k\) so that \(s_{i-k+1}...s_{i+k-1}\) is a palindrome, i.e. \(s_{i-j}=s_{i+j}\) for all \(0 \leq j < k\)
Then, how can we construct the array?
Calculate \(p\)
Say we have \(p[0] \sim p[i-1] \).
Let \(x= \argmax_{0 \leq j \leq i-1} (j+p[j]-1)\).
Since \(s_{x-p[x]+1}...s_{x+p[x]-1}\) is a palindrome centered at \(x\), position \(i\) mirrors position \(2x-i\), so if \(i \leq x+p[x]-1\),
\( \Rightarrow p[i] \geq \min(p[2x-i], p[x]-(i-x))\)
Same idea as the Z-algorithm!
Implementation
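vector<int> manacher(string &ss){
    // s = ss with '.' inserted between characters and at both ends
    string s;
    s.resize(ss.size()*2+1,'.');
    for(int i=0;i<ss.size();i++){
        s[i*2+1]=ss[i];
    }
    vector<int> p(s.size(),1);
    // l: center of the palindrome reaching furthest right, r: its last index
    for(int i=0,l=0,r=0;i<s.size();i++){
        if(i<r) p[i]=min(p[l*2-i],r-i); // mirror bound; l*2-i is only in range when i<r
        while(0<=i-p[i]&&i+p[i]<s.size()&&s[i-p[i]]==s[i+p[i]]){
            l=i,r=i+p[i],p[i]++;
        }
    }
    return p; // the longest palindrome in ss has length max(p[i])-1
}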
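As a usage sketch, the array \(p\) can be turned back into the longest palindromic substring of the original string. `longest_palindrome` is an illustrative name. This is a self-contained variant with the separator '.', in which a radius \(p[i]\) in the separated string corresponds to a palindrome of length \(p[i]-1\) in the original.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns one longest palindromic substring of orig (the leftmost on ties).
std::string longest_palindrome(const std::string &orig) {
    // Separated string: '.' at even indices, original characters at odd indices.
    std::string s(orig.size() * 2 + 1, '.');
    for (std::size_t i = 0; i < orig.size(); i++) s[2 * i + 1] = orig[i];
    int n = s.size();
    std::vector<int> p(n, 1);
    // l: center of the palindrome reaching furthest right, r: its last index
    for (int i = 0, l = 0, r = 0; i < n; i++) {
        if (i <= r) p[i] = std::min(p[2 * l - i], r - i + 1);   // mirror bound
        while (i - p[i] >= 0 && i + p[i] < n && s[i - p[i]] == s[i + p[i]]) p[i]++;
        if (i + p[i] - 1 > r) l = i, r = i + p[i] - 1;
    }
    int best = 0;
    for (int i = 1; i < n; i++)
        if (p[i] > p[best]) best = i;
    // The palindrome covers s[best-p[best]+1 .. best+p[best]-1]; map back to orig.
    return orig.substr((best - p[best] + 1) / 2, p[best] - 1);
}
```

For instance, `longest_palindrome("cbbd")` returns "bb", an even-length palindrome found through the center that sits on a separator.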
Trie
Implementation
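//not test-compiled
const int N=5e5+5; //maximum number of trie nodes (assumed; N was not defined on the slide)
int ch[N][26]{0},cnt[N]{0},ptr=0; //ch: child links (0 = none), cnt: words ending at a node
void insert(string &s){
    int cur=0; //node 0 is the root
    for(int i=0;i<s.length();i++){
        if(!ch[cur][s[i]-'a']) ch[cur][s[i]-'a']=++ptr; //create the missing child
        cur=ch[cur][s[i]-'a'];
    }
    cnt[cur]++;
}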
so ez la
Problem
2021 Taipei City Contest (北市賽) pB
I forgot the problem statement lol
Problem
Given an array \(a\), find the pair \((i,j)\) where \(a_i \oplus a_j\) is the biggest among all pairs.
\(n \leq 10^5, a_i \leq 10^9 \)
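One common way to solve this with a trie is sketched below; the slides do not spell out the intended solution, so treat it as an assumption. Store each number as a 30-bit binary string (enough since \(a_i \leq 10^9 < 2^{30}\)), and for each number walk the trie greedily, preferring the opposite bit. `max_xor_pair` is an illustrative name, and the function returns the best value rather than the pair of indices.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Maximum a[i] xor a[j] over pairs i != j (0 if there are fewer than two numbers).
int max_xor_pair(const std::vector<int> &a) {
    const int B = 30;                               // bits needed for values up to 1e9
    std::vector<std::array<int, 2>> ch(1);          // binary trie, node 0 is the root
    int best = 0;
    for (std::size_t idx = 0; idx < a.size(); idx++) {
        int x = a[idx];
        if (idx > 0) {                              // query against earlier numbers
            int cur = 0, val = 0;
            for (int b = B - 1; b >= 0; b--) {
                int bit = (x >> b) & 1;
                if (ch[cur][bit ^ 1]) { val |= 1 << b; cur = ch[cur][bit ^ 1]; }
                else cur = ch[cur][bit];            // forced to take the same bit
            }
            best = std::max(best, val);
        }
        int cur = 0;                                // insert x into the trie
        for (int b = B - 1; b >= 0; b--) {
            int bit = (x >> b) & 1;
            if (!ch[cur][bit]) {
                ch[cur][bit] = (int)ch.size();
                ch.push_back(std::array<int, 2>{0, 0});
            }
            cur = ch[cur][bit];
        }
    }
    return best;
}
```

This takes \(O(30n)\) time, compared with \(O(n^2)\) for trying all pairs.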
AC Automaton
Problem
(The stronger version of TIOJ 1306)
\(|s| \leq 10^5\)
\( \sum |t| \leq 5*10^5\)
Can't get AC with hash, Z, or KMP...
Aho-Corasick Algorithm
Trie with fail link!
A fail link from \(u\) to \(v\): \(v\) represents the longest proper suffix of the string of \(u\) that also exists in the trie (as a path from the root).
Remember what failure function is?
Demo
Build
Tree edge: just a trie
Fail link: a simple bfs would work!
Implementation
const int N=5e5+5;
int ch[N][26]{0},fail[N]{0},ptr=0;
void insert(string &s,int ind){ //ind: pattern index (unused here; record it at cur to mark the end)
    int cur=0;
    for(int i=0;i<s.size();i++){
        if(!ch[cur][s[i]-'a']) ch[cur][s[i]-'a']=++ptr;
        cur=ch[cur][s[i]-'a'];
    }
}
void build(){
    queue<int> q;
    //depth-1 vertices keep fail=0 (the root)
    for(int i=0;i<26;i++) if(ch[0][i]) q.push(ch[0][i]);
    while(!q.empty()){
        int cur=q.front();q.pop();
        for(int i=0;i<26;i++){
            //missing edge: reuse the transition of the fail target
            if(!ch[cur][i]) ch[cur][i]=ch[fail[cur]][i];
            else{
                q.push(ch[cur][i]);
                int tem=fail[cur];
                while(tem&&!ch[tem][i]) tem=fail[tem];
                fail[ch[cur][i]]=ch[tem][i];
            }
        }
    }
}
So how to solve the problem?
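Matching: walk along tree edges, and whenever the needed edge is missing, follow fail links until one exists (or the root is reached).
Maintain a visit count for each vertex; afterwards, a dfs over the fail links adds each vertex's count to the vertex its fail link points to, giving the number of occurrences of every pattern.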
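The whole pipeline can be sketched in one function. It assumes lowercase, non-empty patterns and text. Instead of a dfs it pushes the counts up the fail links in reverse BFS order, which visits deeper vertices first and has the same effect. `count_all` and `endNode` are illustrative names.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// For every pattern, the number of its occurrences in text.
std::vector<long long> count_all(const std::string &text,
                                 const std::vector<std::string> &pats) {
    std::vector<std::array<int, 26>> ch(1);          // node 0 is the root
    std::vector<int> fail(1, 0), endNode(pats.size());
    for (std::size_t k = 0; k < pats.size(); k++) {  // build the trie
        int cur = 0;
        for (char c : pats[k]) {
            int x = c - 'a';
            if (!ch[cur][x]) {
                ch[cur][x] = (int)ch.size();
                ch.push_back(std::array<int, 26>{});
                fail.push_back(0);
            }
            cur = ch[cur][x];
        }
        endNode[k] = cur;
    }
    std::vector<int> order;                          // BFS order of non-root nodes
    std::queue<int> q;
    for (int x = 0; x < 26; x++) if (ch[0][x]) q.push(ch[0][x]);
    while (!q.empty()) {
        int cur = q.front(); q.pop();
        order.push_back(cur);
        for (int x = 0; x < 26; x++) {
            int nxt = ch[cur][x];
            if (!nxt) ch[cur][x] = ch[fail[cur]][x]; // fill in the missing transition
            else { fail[nxt] = ch[fail[cur]][x]; q.push(nxt); }
        }
    }
    std::vector<long long> visits(ch.size(), 0);
    int cur = 0;
    for (char c : text) { cur = ch[cur][c - 'a']; visits[cur]++; }
    for (int i = (int)order.size() - 1; i >= 0; i--) // deepest first
        visits[fail[order[i]]] += visits[order[i]];
    std::vector<long long> res(pats.size());
    for (std::size_t k = 0; k < pats.size(); k++) res[k] = visits[endNode[k]];
    return res;
}
```

The running time is \(O(26 \cdot \sum |t| + |s|)\), independent of the number of patterns.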
Suffix Array
My implementation
const int N=2e5+5;
string s;
int n,p[N],pn[N],c0[N],c1[N],*c,*cn,cnt[N]{0};
void SA(string t){
    s=t+'$';//cyclic or not; kept in the global s, which LCP() reads
    n=s.length();
    c=c0;cn=c1;
    //length=1: counting sort by single character
    for(int i=0;i<256;i++) cnt[i]=0;
    for(int i=0;i<n;i++) cnt[s[i]]++;
    for(int i=1;i<256;i++) cnt[i]+=cnt[i-1];//256: sigma size
    for(int i=0;i<n;i++){
        p[--cnt[s[i]]]=i;
    }
    int cl=0;
    c[p[0]]=cl;
    for(int i=1;i<n;i++){
        if(s[p[i]]!=s[p[i-1]]) cl++;
        c[p[i]]=cl;
    }
    for(int k=1;k<n;k*=2){
        //sorting make_pair(c[i-k],c[i]), c[i] already sorted in p[]
        for(int i=0;i<=max((int)256,cl);i++) cnt[i]=0;//256
        for(int i=0;i<n;i++){
            pn[i]=p[i]-k;
            if(pn[i]<0) pn[i]+=n;
        }
        for(int i=0;i<n;i++){
            cnt[c[pn[i]]]++;
        }
        for(int i=1;i<=cl;i++) cnt[i]+=cnt[i-1];
        for(int i=n-1;i>=0;i--){
            p[--cnt[c[pn[i]]]]=pn[i];
        }
        cl=0;
        cn[p[0]]=cl;
        for(int i=1;i<n;i++){
            auto prev=make_pair(c[p[i-1]],c[(p[i-1]+k)%n]);
            auto cur=make_pair(c[p[i]],c[(p[i]+k)%n]);
            if(prev!=cur) cl++;
            cn[p[i]]=cl;
        }
        swap(c,cn);
    }
    //making all rank different
    //for(int i=0;i<n;i++) c[p[i]]=i;
}
//p: starting indices after sort
//c: rank of indices, may be same if '$' not added
int lcp[N][20],po[20];
void LCP(){
    po[0]=1;
    for(int i=1;i<20;i++) po[i]=po[i-1]*2;
    //walk the suffixes in text order, reusing the previous match length k
    int k=0;
    for(int i=0;i<n;i++){
        if(c[i]==n-1){//the last suffix in sorted order has no successor
            k=0;
            continue;
        }
        int j=p[c[i]+1];
        while(i+k<n&&j+k<n&&s[i+k]==s[j+k]) k++;
        lcp[c[i]][0]=k;
        if(k) k--;
    }
    for(int j=1;j<20;j++){
        for(int i=0;i<n-1;i++){
            if(i+po[j-1]<n-1) lcp[i][j]=min(lcp[i][j-1],lcp[i+po[j-1]][j-1]);
        }
    }
}
//lcp[i][0]: longest common prefix of s.substr(p[i]), s.substr(p[i+1])
//lcp: a sparse table
int qry(int i,int j){//i,j: starting indices of two different suffixes (i!=j)
    i=c[i],j=c[j];
    if(i>j) swap(i,j);
    int lg=__lg(j-i);
    return min(lcp[i][lg],lcp[j-po[lg]][lg]);
}
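When debugging a suffix array, a naive reference to compare against can help. The sketch below is not the method from the slides: it sorts the suffixes directly in \(O(n^2 \log n)\) and computes an LCP by direct comparison. `naive_suffix_array` and `naive_lcp` are illustrative names, and no '$' is appended here, so the result has \(n\) entries rather than \(n+1\).

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// p[i] = starting index of the i-th smallest suffix of s (naive construction).
std::vector<int> naive_suffix_array(const std::string &s) {
    std::vector<int> p(s.size());
    std::iota(p.begin(), p.end(), 0);
    std::sort(p.begin(), p.end(), [&](int a, int b) {
        // compare the suffix starting at a with the suffix starting at b
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return p;
}

// Longest common prefix of the suffixes starting at i and j.
int naive_lcp(const std::string &s, int i, int j) {
    int k = 0, n = s.size();
    while (i + k < n && j + k < n && s[i + k] == s[j + k]) k++;
    return k;
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so the array is 5, 3, 1, 0, 4, 2.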
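By the way, a reminder to everyone: at the Taipei City Contest (北市賽), be sure to grab your partial points!
(I think someone said last year that Sam would become training material at Jianguo High School (建中). Well, here it is.)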
Problems
I haven't looked at these either
TIOJ 1927
ARC 151E
ABC 268 Ex
CF 1562 E
CF 1721 E
CF 985 F
CF 1366 G
CF 1363 F
CF 1313 E
Main-Lorentz
Problem
Finding repetitions - Algorithms for Competitive Programming (cp-algorithms.com)
Just learned this algorithm this Tuesday, very cool though.
String
By peter940324