Me @arnaudroger
You
It all adds up
the 1900s cost model
today's cost model
What does that mean for Java?
What kinds of optimization does the JIT provide?
Also worth considering Solaris Studio.
Workflow
Why
"If such choices are deeply nested, this strategy requires an exponential number of passes over the input data before it can detect whether the input matches. If the input is large, it is easy to construct a pattern whose running time would exceed the lifetime of the universe."
https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
// Run the following on Java 8; fixed in Java 9.
// Requires: import java.util.regex.Pattern;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 80; i++) sb.append("a");
sb.append("!");
long start = System.currentTimeMillis();
boolean b = Pattern.compile("^(a+)+$").matcher(sb.toString()).matches();
long t = System.currentTimeMillis() - start; // elapsed time: end minus start
System.out.println("t = " + t + " " + b);
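A minimal sketch of one way to sidestep the nested quantifier in the pattern above. The flattened pattern `^a+$` is an assumption of this sketch, not something taken from the talk; it accepts the same strings as `^(a+)+$` but gives a backtracking engine only one way to split a run of `a`s. The class and method names are hypothetical, and the equivalence is only checked on short inputs so that it stays fast on Java 8 too.

```java
import java.util.regex.Pattern;

final class NestedQuantifierSketch {
    // Pattern from the slide: nested quantifiers, exponential backtracking on Java 8.
    static final Pattern NESTED = Pattern.compile("^(a+)+$");
    // Assumed rewrite: same language, no nested quantifier.
    static final Pattern FLAT = Pattern.compile("^a+$");

    // Builds n copies of 'a', optionally followed by '!' to force a mismatch.
    static String input(int n, boolean withBang) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('a');
        if (withBang) sb.append('!');
        return sb.toString();
    }

    // Checks that both patterns agree on every input of up to maxLen 'a's.
    static boolean sameVerdict(int maxLen) {
        for (int n = 0; n <= maxLen; n++) {
            for (boolean bang : new boolean[] {false, true}) {
                String s = input(n, bang);
                if (NESTED.matcher(s).matches() != FLAT.matcher(s).matches()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("same verdict up to 12: " + sameVerdict(12));
        // The flat pattern rejects the 80-character input from the slide immediately.
        System.out.println(FLAT.matcher(input(80, true)).matches());
    }
}
```

Rewriting patterns by hand only helps when you control them; for untrusted patterns a linear-time engine such as re2j is the safer route.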
All the following modifications are available at
https://github.com/arnaudroger/re2j/tree/jug
Two old PRs in re2j, not totally in line with the changes in this presentation
https://github.com/google/re2j/pull/35/ - Merged!
https://github.com/google/re2j/pull/36/
All the benchmarks run 20 iterations and 20 forks on a box with a fixed CPU speed, but the results can still be noisy
cd /tmp
sudo apt-get --yes install git linux-tools-generic linux-tools-x.y.z-w-generic cmake && \
wget https://github.com/jvm-profiling-tools/perf-map-agent/archive/master.zip && \
unzip master.zip && \
cd perf-map-agent-master && \
cmake . && \
make && \
git clone https://github.com/brendangregg/FlameGraph
export FLAMEGRAPH_DIR=/tmp/perf-map-agent-master/FlameGraph
export PERF_RECORD_SECONDS=60
bin/perf-java-flames <pid>
-XX:+PreserveFramePointer
you'll need to add that to your start script
then on the instance
String EXP1 = "\\\\.*(documents|\\$documents\\.user)\\\\";
String EXP2 = "abcdef.exe|foooooo.exe|bargoo.exe|ratatouille.exe|orleans.exe";
String[] DATA = {
"bargoo.exe",
"\\SystemRoot\\System32\\bargoo.exe",
"somefile.exe",
"C:\\WINDOWS\\system32\\somefile.exe",
"cmd.exe",
"\"C:\\WINDOWS\\system32\\cmd.exe\" ",
"powershell.exe",
"powershell.exe -Command function Main {\n $lorem = \\\"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur\\\"\n }\n \nMain\n"
};
@Benchmark
public void testExp1(Blackhole blackhole) {
for(String str : data) {
blackhole.consume(exp1.matcher(str).find());
}
}
@Benchmark
public void testExp2(Blackhole blackhole) {
for (String str : data) {
blackhole.consume(exp2.matcher(str).find());
}
}
@Benchmark
public void testCombine(Blackhole blackhole) {
for (String str : data) {
blackhole.consume(exp1.matcher(str).find());
blackhole.consume(exp2.matcher(str).find());
}
}
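To see what each benchmark iteration actually computes, here is a sketch that runs the same two patterns with `java.util.regex` outside of JMH. It uses a shortened copy of `DATA` (the quoted `cmd.exe` line and the long PowerShell command are left out), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class WorkloadSketch {
    static final Pattern EXP1 =
        Pattern.compile("\\\\.*(documents|\\$documents\\.user)\\\\");
    static final Pattern EXP2 =
        Pattern.compile("abcdef.exe|foooooo.exe|bargoo.exe|ratatouille.exe|orleans.exe");

    // Shortened copy of the DATA array from the slide.
    static final String[] DATA = {
        "bargoo.exe",
        "\\SystemRoot\\System32\\bargoo.exe",
        "somefile.exe",
        "C:\\WINDOWS\\system32\\somefile.exe",
        "cmd.exe",
        "powershell.exe",
    };

    // Counts how many inputs contain a match, as find() does in the benchmarks.
    static int countFinds(Pattern p) {
        int n = 0;
        for (String s : DATA) {
            if (p.matcher(s).find()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("exp1 finds: " + countFinds(EXP1)); // 0
        System.out.println("exp2 finds: " + countFinds(EXP2)); // 2
    }
}
```

On this subset EXP1 never matches, so testExp1 mostly measures the cost of scanning an input to the end without success, while EXP2 hits only on the two `bargoo.exe` entries.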
Benchmark - jmh
Java 8 vs Re2j 1.1
Benchmark Mode Cnt Score Error Units
JavaRegex.testCombine thrpt 200 16553.594 ± 397.406 ops/s
Re2jFindRegex.testCombine thrpt 200 1504.195 ± 9.869 ops/s 11 x 😞
JavaRegex.testExp1 thrpt 200 64284.475 ± 308.279 ops/s
Re2jFindRegex.testExp1 thrpt 200 4842.201 ± 56.966 ops/s 13 x 😞
JavaRegex.testExp2 thrpt 200 28048.060 ± 110.140 ops/s
Re2jFindRegex.testExp2 thrpt 200 2195.518 ± 34.807 ops/s 13 x 😞
java -jar target/benchmarks.jar -f 1 -wi 1 -i 1000 Re2jRegex.testExp2 \
-jvmArgs "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints"
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 1645.880 ± 17.718 ops/s 9%
Re2jFindRegex.testExp1 thrpt 200 5713.571 ± 61.726 ops/s 18%
Re2jFindRegex.testExp2 thrpt 200 2495.788 ± 24.896 ops/s 14%
9% for a 60-second change, not bad!
The 18% looks inflated compared to Re2jMatchRegex.
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 1691.707 ± 16.574 ops/s 3%
Re2jFindRegex.testExp1 thrpt 200 5899.987 ± 58.776 ops/s 3%
Re2jFindRegex.testExp2 thrpt 200 2544.503 ± 28.540 ops/s 2%
And another 3%.
3% could easily be just noise, but here it is consistent across benchmarks.
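A rough way to sanity-check a small gain like this is to compare the score intervals of the two runs. The sketch below treats JMH's Error column as the half-width of a confidence interval around the score and checks whether the before and after intervals overlap; this is a heuristic, not a formal significance test, and the class and method names are hypothetical.

```java
final class BenchmarkCompareSketch {
    // Relative change from before to after, in percent.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    // True when [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] share at least one point.
    static boolean intervalsOverlap(double s1, double e1, double s2, double e2) {
        return (s1 - e1) <= (s2 + e2) && (s2 - e2) <= (s1 + e1);
    }

    public static void main(String[] args) {
        // testCombine before and after the change, scores and errors from the tables above.
        double before = 1645.880, beforeErr = 17.718;
        double after = 1691.707, afterErr = 16.574;
        System.out.printf("change: %.1f%%%n", percentChange(before, after));
        System.out.println("overlap: " + intervalsOverlap(before, beforeErr, after, afterErr));
    }
}
```

For testCombine the two intervals do not overlap, which supports reading the 3% as a real, if small, gain.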
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 1885.322 ± 26.392 ops/s 11%
Re2jFindRegex.testExp1 thrpt 200 6465.808 ± 142.104 ops/s 10%
Re2jFindRegex.testExp2 thrpt 200 2791.424 ± 28.951 ops/s 10%
Total improvement over the Re2j 1.1 baseline so far: testCombine 25%, testExp1 34%, testExp2 27%
// https://github.com/google/re2j/blob/master/java/com/google/re2j/Inst.java#L64
if ((arg & RE2.FOLD_CASE) != 0) {
for (int r1 = Unicode.simpleFold(r0);
r1 != r0; // loop until folded on the original code point A -> a -> A over!
r1 = Unicode.simpleFold(r1)) {
if (r == r1) {
return true;
}
}
}
"simpleFold iterates over Unicode code points equivalent under the Unicode-defined simple case folding"
// https://github.com/google/re2j/blob/master/java/com/google/re2j/Unicode.java#L203
static int simpleFold(int r) {
// Consult caseOrbit table for special cases.
int lo = 0;
int hi = UnicodeTables.CASE_ORBIT.length;
while (lo < hi) {
int m = lo + (hi - lo) / 2;
if (UnicodeTables.CASE_ORBIT[m][0] < r) {
lo = m + 1;
} else {
hi = m;
}
}
if (lo < UnicodeTables.CASE_ORBIT.length &&
UnicodeTables.CASE_ORBIT[lo][0] == r) {
return UnicodeTables.CASE_ORBIT[lo][1];
}
// No folding specified. This is a one- or two-element
// equivalence class containing rune and toLower(rune)
// and toUpper(rune) if they are different from rune.
int l = toLower(r);
if (l != r) {
return l;
}
return toUpper(r);
}
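The fold-case loop and `simpleFold` above can be tried in isolation with the following sketch. It is not re2j's code: the three-entry `ORBIT` table is a hand-picked excerpt covering only K, k and the Kelvin sign, the fallback uses `java.lang.Character` instead of re2j's `Unicode` helpers, and the step cap is a safety net that exists only because this sketch's table is incomplete.

```java
final class FoldSketch {
    // Hypothetical excerpt of an orbit table: K -> k -> KELVIN SIGN (U+212A) -> K.
    private static final int[][] ORBIT = {
        {0x004B, 0x006B},
        {0x006B, 0x212A},
        {0x212A, 0x004B},
    };

    // Returns the next code point in r's case-folding orbit.
    static int simpleFold(int r) {
        for (int[] entry : ORBIT) {          // linear scan; the real code binary-searches
            if (entry[0] == r) return entry[1];
        }
        int l = Character.toLowerCase(r);    // one- or two-element class
        if (l != r) return l;
        return Character.toUpperCase(r);
    }

    // Case-insensitive comparison of r against the instruction rune r0.
    static boolean matchesFoldCase(int r0, int r) {
        if (r == r0) return true;
        int steps = 0;
        // Walk the orbit until it comes back to r0 (capped defensively in this sketch).
        for (int r1 = simpleFold(r0); r1 != r0 && steps < 8; r1 = simpleFold(r1), steps++) {
            if (r == r1) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesFoldCase('k', 0x212A)); // true: Kelvin sign folds with k
        System.out.println(matchesFoldCase('a', 'A'));    // true
        System.out.println(matchesFoldCase('a', 'b'));    // false
    }
}
```

Every mismatching rune pays for at least one `simpleFold` call, including its table search, which is why this path shows up in the profile when FOLD_CASE is set.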
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 2973.766 ± 32.765 ops/s 58% total 98% 👍
Re2jFindRegex.testExp1 thrpt 200 7970.696 ± 99.064 ops/s 23% total 65% 👍
Re2jFindRegex.testExp2 thrpt 200 5315.117 ± 71.776 ops/s 90% total 142% 👍
PS: Used to be less than 2%
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 3301.226 ± 22.490 ops/s 11%/119%
Re2jFindRegex.testExp1 thrpt 200 8701.832 ± 52.752 ops/s 9%/ 80%
Re2jFindRegex.testExp2 thrpt 200 5497.766 ± 75.145 ops/s 3%/150%
Used to store the current active matching thread
We can
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 3755.242 ± 39.526 ops/s 14%/150%
Re2jFindRegex.testExp1 thrpt 200 9335.040 ± 177.798 ops/s 7%/ 93%
Re2jFindRegex.testExp2 thrpt 200 6313.462 ± 88.504 ops/s 15%/188%
Total improvement over the Re2j 1.1 baseline so far: testCombine 150%, testExp1 93%, testExp2 188%
Flight Recorder got us this far
from 12x to 4-5x slower
a perf improvement of 93 and 188%
now time to....
JMH has multiple profilers integrated,
including perfasm.
As good as it can get, but it needs to run on Linux...
or Windows
- needs the hsdis library in your VM
see https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid # needed only once per boot
java -jar target/benchmarks.jar -f 1 Re2jRegex.testExp2 -prof perfasm
27.59% 29.33% C2, level 4 com.google.re2j.Machine::add, version 487 (373 bytes)
22.30% 20.97% C2, level 4 com.google.re2j.Machine::add, version 487 (231 bytes)
19.77% 18.44% C2, level 4 com.google.re2j.Machine::step, version 490 (301 bytes)
11.40% 13.05% C2, level 4 com.google.re2j.Machine::match, version 538 (764 bytes)
8.37% 7.57% C2, level 4 com.google.re2j.Machine::step, version 490 (348 bytes)
5.79% 6.73% runtime stub StubRoutines::jint_disjoint_arraycopy (128 bytes)
0.11% 0.09% │ 0x00007fd8dd2104ba: mov 0x38(%rsp),%r10
0.55% 0.53% │ 0x00007fd8dd2104bf: mov 0xc(%r10),%r10d ;*getfield op
│ ; - com.google.re2j.Machine::add@23 (line 343)
0.80% 0.54% │ 0x00007fd8dd2104c3: or %r11,%r8
0.61% 0.53% │ 0x00007fd8dd2104c6: mov %r8,0x10(%rdx) ;*putfield pcsl
│ ; - com.google.re2j.Machine$Queue::add@15 (line 57)
│ ; - com.google.re2j.Machine::add@19 (line 342)
0.13% 0.17% │ 0x00007fd8dd2104ca: mov %r10d,%r11d
0.47% 0.55% │ 0x00007fd8dd2104cd: dec %r11d
0.76% 0.56% │ 0x00007fd8dd2104d0: cmp $0xc,%r11d
│ 0x00007fd8dd2104d4: jae 0x00007fd8dd21070e ;*tableswitch
│ ; - com.google.re2j.Machine::add@26 (line 343)
0.55% 0.53% │ 0x00007fd8dd2104da: mov 0x38(%rsp),%r11
0.15% 0.24% │ 0x00007fd8dd2104df: mov 0x14(%r11),%r8d ;*getfield arg
│ ; - com.google.re2j.Machine::add@141 (line 357)
0.56% 0.54% │ 0x00007fd8dd2104e3: mov 0x30(%r11),%r11d
0.77% 0.76% │ 0x00007fd8dd2104e7: movslq %r10d,%r9
0.64% 0.71% │ 0x00007fd8dd2104ea: mov %r11,%rcx
0.10% 0.18% │ 0x00007fd8dd2104ed: shl $0x3,%rcx ;*getfield outInst
│ ; - com.google.re2j.Machine::add@176 (line 363)
0.55% 0.60% │ 0x00007fd8dd2104f1: movabs $0x7fd8dd2103e0,%r10 ; {section_word}
0.73% 0.67% │ 0x00007fd8dd2104fb: jmpq *-0x8(%r10,%r9,8) ;*tableswitch
│ ; - com.google.re2j.Machine::add@26 (line 343)
0.00% ↘ 0x00007fd8dd210500: mov 0x70(%rsp),%rax
0.01% 0x00007fd8dd210505: jmpq 0x00007fd8dd2106e5
0x00007fd8dd21050a: andn %r8d,%edi,%r10d
0x00007fd8dd21050f: test %r10d,%r10d
Machine.add and Machine.step are very costly.
The methods are big, and inlining is very limited.
switch(inst.op) is very much like virtual call dispatch
-> use polymorphism instead
-> better profiling information, JIT might do better than the switch
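The switch-to-polymorphism idea can be sketched as follows. Everything here is hypothetical (the opcodes, the class names, the single `matchRune` operation); it only illustrates the shape of the refactoring, not the actual re2j change.

```java
final class DispatchSketch {
    // --- Before: one instruction class, behaviour selected by a switch on op ---
    static final int OP_RUNE = 1;   // hypothetical opcode
    static final int OP_ANY = 2;    // hypothetical opcode

    static final class SwitchInst {
        final int op;
        final int rune;
        SwitchInst(int op, int rune) { this.op = op; this.rune = rune; }
    }

    static boolean matchSwitch(SwitchInst inst, int r) {
        switch (inst.op) {
            case OP_RUNE: return inst.rune == r;
            case OP_ANY:  return true;
            default:      return false;
        }
    }

    // --- After: one subclass per opcode, behaviour selected by virtual dispatch ---
    static abstract class Inst {
        abstract boolean matchRune(int r);
    }

    static final class RuneInst extends Inst {
        final int rune;
        RuneInst(int rune) { this.rune = rune; }
        @Override boolean matchRune(int r) { return rune == r; }
    }

    static final class AnyInst extends Inst {
        @Override boolean matchRune(int r) { return true; }
    }

    public static void main(String[] args) {
        System.out.println(matchSwitch(new SwitchInst(OP_RUNE, 'a'), 'a')); // true
        System.out.println(new RuneInst('a').matchRune('a'));               // true
        System.out.println(new AnyInst().matchRune('z'));                   // true
    }
}
```

With one small method per subclass, each call site carries its own receiver type profile, so the JIT has a chance to inline the common case instead of compiling one large switch body.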
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 4272.693 ± 71.341 ops/s 14%/184% 👍
Re2jFindRegex.testExp1 thrpt 200 11552.934 ± 199.890 ops/s 24%/139% 👍
Re2jFindRegex.testExp2 thrpt 200 7973.784 ± 113.688 ops/s 26%/263% 👍
29.17% 26.09% C2, level 4 com.google.re2j.Machine::step, version 504 (1200 bytes)
25.99% 27.85% C2, level 4 com.google.re2j.Machine::step, version 504 (605 bytes)
22.36% 24.75% C2, level 4 com.google.re2j.Machine::match, version 557 (1126 bytes)
7.44% 8.03% runtime stub StubRoutines::jint_disjoint_arraycopy (128 bytes)
7.39% 7.60% C2, level 4 com.google.re2j.Machine::step, version 504 (365 bytes)
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 5139.382 ± 37.840 ops/s 20%/242%
Re2jFindRegex.testExp1 thrpt 200 14420.410 ± 193.187 ops/s 25%/198%
Re2jFindRegex.testExp2 thrpt 200 9594.136 ± 173.792 ops/s 20%/337%
62.67% 60.66% C2, level 4 com.google.re2j.Machine::step, version 500 (1434 bytes)
22.73% 25.48% C2, level 4 com.google.re2j.Machine::match, version 550 (979 bytes)
4.77% 4.96% C2, level 4 com.google.re2j.Machine::step, version 500 (381 bytes)
3.68% 4.65% C2, level 4 com.google.re2j.Machine::step, version 500 (111 bytes)
1.31% 0.24% C2, level 4 com.google.re2j.Machine::init, version 541 (312 bytes)
0.56% 0.60% C2, level 4 com.google.re2j.Machine::match, version 550 (267 bytes)
0.55% 0.54% [kernel.kallsyms] [unknown] (5 bytes)
0.19% 0.05% C2, level 4 com.google.re2j.Machine::init, version 541 (61 bytes)
No more
0.88% 0.58% │ │ 0x00007f242c3896fc: lea (%r12,%r8,8),%r11
0.55% 0.44% │ │ 0x00007f242c389700: mov 0x10(%r11,%r10,4),%r14d ;*aaload
│ │ ; - com.google.re2j.Machine::step@27 (line 278)
0.31% 0.41% │ │ 0x00007f242c389705: mov 0x10(%r12,%r14,8),%ebp ;*getfield inst
│ │ ; - com.google.re2j.Machine::step@78 (line 283)
│ │ ; implicit exception: dispatches to 0x00007f242c38aded
3.40% 3.11% │ │ 0x00007f242c38970a: mov 0x8(%r12,%rbp,8),%r8d ; implicit exception: dispatches to 0x00007f242c38adfd
7.00% 6.89% │ │ 0x00007f242c38970f: cmp $0xf8019992,%r8d ; {metadata('com/google/re2j/Inst$RuneInst')}
│ │ 0x00007f242c389716: jne 0x00007f242c389f11
1.79% 1.36% │ │ 0x00007f242c38971c: lea (%r12,%rbp,8),%r11 ;*invokevirtual isMatch
│ │ ; - com.google.re2j.Machine::step@85 (line 285)
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 5024.247 ± 76.465 ops/s -2%/234%
Re2jFindRegex.testExp1 thrpt 200 14606.561 ± 111.745 ops/s 1%/202%
Re2jFindRegex.testExp2 thrpt 200 10152.329 ± 85.381 ops/s 6%/362%
0.14% 0.12% ││ 0x00007fcab81811ee: mov 0x10(%r11,%r10,4),%r14d ;*aaload
││ ; - com.google.re2j.Machine::step@27 (line 278)
0.87% 0.66% ││ 0x00007fcab81811f3: mov 0x10(%r12,%r14,8),%r11d ;*getfield inst
││ ; - com.google.re2j.Machine::step@78 (line 283)
││ ; implicit exception: dispatches to 0x00007fcab8182849
3.71% 3.24% ││ 0x00007fcab81811f8: mov 0xc(%r12,%r11,8),%ebp ;*getfield op
││ ; - com.google.re2j.Machine::step@85 (line 285)
││ ; implicit exception: dispatches to 0x00007fcab8182859
6.46% 6.71% ││ 0x00007fcab81811fd: cmp $0x6,%ebp
1.53% 1.75% ││ 0x00007fcab8181200: je 0x00007fcab8181a6d ;*if_icmpne
││ ; - com.google.re2j.Machine::step@90 (line 285)
1.86% 1.80% ││ 0x00007fcab8181206: mov 0x8(%r12,%r11,8),%r9d
││ 0x00007fcab818120b: cmp $0xf8019992,%r9d ; {metadata('com/google/re2j/Inst$RuneInst')}
││ 0x00007fcab8181212: jne 0x00007fcab81819c5 ;*invokevirtual matchRune
││ ; - com.google.re2j.Machine::step@189 (line 299)
0.00% 0.01% ││ 0x00007fcab8181218: mov 0x20(%rsp),%r8
Still a high cost around inst access; cache misses?
What else?
0.65% 0.55% │ 0x00007f36b9225d3f: mov %rbx,%rsi
0.05% 0.06% │ 0x00007f36b9225d42: and %r9,%rsi ;*land
│ ; - com.google.re2j.Machine$Queue::contains@13 (line 47)
│ ; - com.google.re2j.Inst$AltInst::add@5 (line 187)
│ ; - com.google.re2j.Inst$AltInst::add@-1 (line 187)
0.07% 0.11% │ 0x00007f36b9225d45: test %rsi,%rsi
│ 0x00007f36b9225d48: jne 0x00007f36b9226481 ;*ifeq
│ ; - com.google.re2j.Machine$Queue::contains@16 (line 47)
│ ; - com.google.re2j.Inst$AltInst::add@5 (line 187)
│ ; - com.google.re2j.Inst$AltInst::add@-1 (line 187)
0.30% 0.49% │ 0x00007f36b9225d4e: cmp $0x40,%ecx
│ 0x00007f36b9225d51: jge 0x00007f36b92264cd ;*if_icmpge
│ ; - com.google.re2j.Machine$Queue::add@3 (line 56)
│ ; - com.google.re2j.Inst$AltInst::add@19 (line 190)
│ ; - com.google.re2j.Inst$AltInst::add@-1 (line 187)
0.25% 0.21% │ 0x00007f36b9225d57: mov 0x1c(%r10),%ebp ;*getfield outInst
│ ; - com.google.re2j.Inst$AltInst::add@23 (line 192)
│ ; - com.google.re2j.Inst$AltInst::add@-1 (line 187)
0.05% 0.05% │ 0x00007f36b9225d5b: or %r9,%rbx ;*lor ; - com.google.re2j.Machine$Queue::add@14 (line 57)
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 5467.610 ± 60.482 ops/s 9%/263%
Re2jFindRegex.testExp1 thrpt 200 14808.181 ± 134.538 ops/s 1%/206%
Re2jFindRegex.testExp2 thrpt 200 11636.934 ± 91.307 ops/s 15%/430%
0.38% 0.33% │ 0x00007f220122449e: mov %ebx,0xac(%rsp)
0.00% │ 0x00007f22012244a5: vmovd %eax,%xmm3
│ 0x00007f22012244a9: mov %rcx,%r14
0.13% 0.12% │ 0x00007f22012244ac: mov 0xc(%rcx),%r10d ;*getfield size
│ ; - com.google.re2j.Machine$Queue::addThread@6 (line 65)
│ ; - com.google.re2j.Inst$MatchInst::add@74 (line 106)
│ ; - com.google.re2j.Inst$Alt2Inst::add@35 (line 193)
│ ; - com.google.re2j.Machine::step@-1 (line 276)
0.29% 0.27% │ 0x00007f22012244b0: mov %r10d,0x28(%rsp)
│ 0x00007f22012244b5: mov 0x20(%rcx),%r10d ;*getfield denseThreads
│ ; - com.google.re2j.Machine$Queue::addThread@1 (line 65)
│ ; - com.google.re2j.Inst$MatchInst::add@74 (line 106)
│ ; - com.google.re2j.Inst$Alt2Inst::add@35 (line 193)
│ ; - com.google.re2j.Machine::step@-1 (line 276)
│ 0x00007f22012244b9: vmovd %r10d,%xmm2
0.16% 0.11% │ 0x00007f22012244be: mov 0x28(%rsp),%r10d
0.29% 0.27% │ 0x00007f22012244c3: inc %r10d ;*iadd
│ ; - com.google.re2j.Machine$Queue::addThread@11 (line 65)
│ ; - com.google.re2j.Inst$MatchInst::add@74 (line 106)
│ ; - com.google.re2j.Inst$Alt2Inst::add@35 (line 193)
│ ; - com.google.re2j.Machine::step@-1 (line 276)
│ 0x00007f22012244c6: vmovd %r10d,%xmm4
│ 0x00007f22012244cb: mov %r10d,0xc(%rcx) ;*putfield size
│ ; - com.google.re2j.Machine$Queue::addThread@12 (line 65)
│ ; - com.google.re2j.Inst$MatchInst::add@74 (line 106)
│ ; - com.google.re2j.Inst$Alt2Inst::add@35 (line 193)
│ ; - com.google.re2j.Machine::step@-1 (line 276)
Thread pooling takes a lot of time in different places
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 7818.190 ± 108.756 ops/s 43%/420%
Re2jFindRegex.testExp1 thrpt 200 20212.245 ± 387.433 ops/s 36%/317%
Re2jFindRegex.testExp2 thrpt 200 16208.431 ± 189.066 ops/s 39%/638%
Re2jMatchRegex.testCombine thrpt 200 4078.958 ± 36.783 ops/s -5%
Re2jMatchRegex.testExp1 thrpt 200 10905.446 ± 143.010 ops/s -8%
Re2jMatchRegex.testExp2 thrpt 200 7200.117 ± 62.800 ops/s -14%
1.47% 1.51% │ │ 0x00007f2da121b3e3: mov 0x20(%r9),%ebp ;*getfield denseThreadsInstructions
│ │ ; - com.google.re2j.Machine::step@78 (line 294)
0.39% 0.32% │ │ 0x00007f2da121b3e7: mov 0xc(%r12,%rbp,8),%r10d ; implicit exception: dispatches to 0x00007f2da121bd85
0.90% 0.70% │ │ 0x00007f2da121b3ec: cmp %r10d,%r8d
│ │ 0x00007f2da121b3ef: jae 0x00007f2da121b6d3
0.91% 0.92% │ │ 0x00007f2da121b3f5: lea (%r12,%rbp,8),%r10
1.29% 1.40% │ │ 0x00007f2da121b3f9: mov 0x10(%r10,%r8,4),%ebp ;*aaload
│ │ ; - com.google.re2j.Machine::step@83 (line 294)
0.34% 0.30% │ │ 0x00007f2da121b3fe: mov 0xc(%r12,%rbp,8),%r11d ; implicit exception: dispatches to 0x00007f2da121bd99
for (int j = 0; j < runq.size; ++j) {
The JIT cannot prove that runq.size will not change, and in fact it does change.
if (!longest) {
// First-match mode: cut off all lower-priority threads.
freeQueue(runq, j + 1); // calls queue.clear(), which sets the size to 0
// which will trigger an exit from the loop
}
matched = true;
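A sketch of the loop-bound issue, using a hypothetical `Queue` with a `size` field and a `dense` array. The first method has the original shape, where clearing the queue inside the loop is what ends it; the second reads the bound once into a local and leaves the loop with an explicit `break`. This may make the bound easier for the JIT to reason about, but whether a bounds check actually disappears has to be confirmed in the generated assembly.

```java
final class LoopBoundSketch {
    // Hypothetical stand-in for the run queue.
    static final class Queue {
        int size;
        final int[] dense;
        Queue(int... items) { dense = items.clone(); size = items.length; }
        void clear() { size = 0; }
    }

    // Original shape: q.size is re-read every iteration, and clear() ends the loop.
    static int sumFieldBound(Queue q, int matchValue) {
        int sum = 0;
        for (int j = 0; j < q.size; ++j) {
            int v = q.dense[j];
            sum += v;
            if (v == matchValue) {
                q.clear(); // size becomes 0, so the loop condition fails next time
            }
        }
        return sum;
    }

    // Alternative shape: loop-invariant local bound, explicit exit.
    static int sumLocalBound(Queue q, int matchValue) {
        int sum = 0;
        final int[] dense = q.dense;
        final int size = q.size;
        for (int j = 0; j < size; ++j) {
            int v = dense[j];
            sum += v;
            if (v == matchValue) {
                q.clear();
                break; // same effect as before, but visible in the loop itself
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumFieldBound(new Queue(1, 2, 3, 4), 3)); // 6
        System.out.println(sumLocalBound(new Queue(1, 2, 3, 4), 3)); // 6
    }
}
```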
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 8061.101 ± 111.423 ops/s 3%/436%
Re2jFindRegex.testExp1 thrpt 200 20755.356 ± 356.537 ops/s 3%/329%
Re2jFindRegex.testExp2 thrpt 200 17957.057 ± 107.874 ops/s 11%/718%
0.79% 1.09% │ 0x00007f0c9921b4fa: mov 0x20(%rax),%ebp ;*getfield denseThreadsInstructions
│ ; - com.google.re2j.Machine::step@82 (line 295)
0.22% 0.25% │ 0x00007f0c9921b4fd: mov 0xc(%r12,%rbp,8),%r8d ; implicit exception: dispatches to 0x00007f0c9921c661
1.10% 1.20% │ 0x00007f0c9921b502: cmp %r8d,%r10d
│ 0x00007f0c9921b505: jae 0x00007f0c9921baa9
1.80% 1.41% │ 0x00007f0c9921b50b: lea (%r12,%rbp,8),%r8
0.61% 0.60% │ 0x00007f0c9921b50f: mov 0x10(%r8,%r10,4),%ecx ;*aaload
│ ; - com.google.re2j.Machine::step@87 (line 295)
This did not eliminate the bounds check in exp1, though,
but it did in exp2; we will come back to that one later
0.26% 0.16% │ 0x00007f5f41204509: mov 0x20(%rdx),%r11d ;*getfield denseThreadsInstructions
│ ; - com.google.re2j.Machine::step@82 (line 295)
0.15% 0.10% │ 0x00007f5f4120450d: mov 0xc(%r12,%r11,8),%r10d ;*aaload
│ ; - com.google.re2j.Machine::step@87 (line 295)
│ ; implicit exception: dispatches to 0x00007f5f4120494d
1.35% 1.70% │ │ │ 0x00007f5f41204557: mov 0x8(%r12,%r10,8),%ecx
1.03% 1.25% │ │ │ 0x00007f5f4120455c: cmp $0xf8019993,%ecx ; {metadata('com/google/re2j/Inst$RuneInst')}
│ │ │ 0x00007f5f41204562: jne 0x00007f5f412047cd
0.86% 0.89% │ │ │ 0x00007f5f41204568: shl $0x3,%r10 ;*invokevirtual matchRune
│ │ │ ; - com.google.re2j.Machine::step@181 (line 312)
The instance type check is costly, even though matchRune is only called on RuneInst.
-> move matchRune method to Inst as final
-> no need for virtual dispatch
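A sketch of the "final method on the base class" idea, with hypothetical classes and a hypothetical `runes` field. Because `matchRune` is declared `final` on `Inst`, no subclass can override it, so a call through an `Inst` reference has exactly one possible target and needs no receiver type check.

```java
final class FinalMethodSketch {
    static abstract class Inst {
        final int[] runes; // hypothetical: runes this instruction accepts

        Inst(int... runes) { this.runes = runes.clone(); }

        // final: single implementation for every Inst, no virtual dispatch needed.
        final boolean matchRune(int r) {
            for (int candidate : runes) {
                if (candidate == r) return true;
            }
            return false;
        }
    }

    static final class RuneInst extends Inst {
        RuneInst(int... runes) { super(runes); }
    }

    static final class MatchInst extends Inst {
        MatchInst() { super(); } // accepts no rune
    }

    public static void main(String[] args) {
        Inst rune = new RuneInst('a', 'b');
        Inst match = new MatchInst();
        System.out.println(rune.matchRune('b'));  // true
        System.out.println(match.matchRune('a')); // false
    }
}
```

The trade-off is that the data `matchRune` needs has to live on the base class, even for instruction types that never use it.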
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 9082.066 ± 73.694 ops/s 13%/504%
Re2jFindRegex.testExp1 thrpt 200 21453.043 ± 332.811 ops/s 3%/343%
Re2jFindRegex.testExp2 thrpt 200 20037.627 ± 253.445 ops/s 12%/813%
0.53% 0.48% 0x00007f6ad9217394: mov 0x8(%rsp),%r8
1.20% 0.96% 0x00007f6ad9217399: movzbl 0x11(%r8),%r8d ;*getfield captures
; - com.google.re2j.Machine::step@26 (line 285)
2.20% 2.40% 0x00007f6ad921739e: test %r8d,%r8d
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 9307.324 ± 63.348 ops/s 2%/519%
Re2jFindRegex.testExp1 thrpt 200 22421.399 ± 230.262 ops/s 5%/363%
Re2jFindRegex.testExp2 thrpt 200 20411.229 ± 228.717 ops/s 2%/830%
0.04% 0.02% │││ ││││↘│ ││││ ││││ 0x00007f2e7923aab0: mov %r10d,0x5c(%rsp) ;*aload_0
│││ ││││ │ ││││ ││││ ; - com.google.re2j.Machine::match@267 (line 237)
0.09% 0.18% │││ ││││ ↘ ││││ ││││ 0x00007f2e7923aab5: test %eax,%eax
│││ ││││ ││││ ││││ 0x00007f2e7923aab7: jne 0x00007f2e7923b27d ;*ifne
│││ ││││ ││││ ││││ ; - com.google.re2j.Machine::match@271 (line 237)
0.50% 0.40% │││ ││││ ││││ ││││ 0x00007f2e7923aabd: mov 0x64(%rsp),%r11d
0.12% 0.12% │││ ││││ ││││ ││││ 0x00007f2e7923aac2: test %r11d,%r11d
│││ ││││ ╭││││ ││││ 0x00007f2e7923aac5: je 0x00007f2e7923ac65 ;*ifeq
│││ ││││ │││││ ││││ ; - com.google.re2j.Machine::match@275 (line 237)
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 9307.737 ± 144.002 ops/s 0%
Re2jFindRegex.testExp1 thrpt 200 22643.527 ± 276.346 ops/s 1%
Re2jFindRegex.testExp2 thrpt 200 20796.029 ± 175.146 ops/s 2%
A waste of time, as the perfasm output shows;
this was originally included after perf indicated a higher cost.
Benchmark Mode Cnt Score Error Units
Re2jFindRegex.testCombine thrpt 200 9195.517 ± 118.936 ops/s -1%
Re2jFindRegex.testExp1 thrpt 200 22150.490 ± 84.177 ops/s -2%
Re2jFindRegex.testExp2 thrpt 200 20539.044 ± 121.331 ops/s -1%
no impact ;(
0.34% 0.31% ││ 0x00007f4a39235d73: mov %r11,%rax ;*iload
││ ; - com.google.re2j.Machine::step@37 (line 287)
0.61% 0.49% ││ 0x00007f4a39235d76: mov 0x10(%rbx,%r10,4),%r8d ;*aaload
││ ; - com.google.re2j.Machine::step@95 (line 297)
1.79% 1.69% ││ 0x00007f4a39235d7b: mov 0xc(%r12,%r8,8),%r11d ;*getfield op
││ ; - com.google.re2j.Machine::step@100 (line 299)
││ ; implicit exception: dispatches to 0x00007f4a3923701d
Total improvement over the Re2j 1.1 baseline: testCombine 511%, testExp1 357%, testExp2 835%
The great stagnation!
What else?
Other related perf resource