Solving the web's most popular code shortening competition in Python

Alessandro Amici - @alexamici - <a.amici@bopen.eu>

B-Open Solutions - http://bopen.eu

Abstract

“Code shortening” is the “sport” where participants strive to achieve the shortest possible source code that solves a programming problem by exploiting all the tricks and quirks of the language.

SIZECON on SPOJ is one of the oldest and most popular code shortening problems on the web, with a bizarre twist: only characters above ASCII value 32 count toward the penalty. During the talk we will take a journey into some frightening depths of the Python language in order to write shorter and shorter solutions to SIZECON until, exploiting a number of truly mind-blowing tricks, we reach the current record solution of 28 characters (above ASCII 32!).

I promise I’ll show you the most obfuscated, contrived and sick Python code you have ever seen and (hopefully!) will ever see. I invite participants to give SIZECON a try and check their score against the Python 2 and Python 3 SPOJ rankings.

SPOJ and SIZECON

SPOJ is a coding platform and an online judge:

  • support for 45+ languages
  • a huge trove of 20,000+ problems
  • 50,000+ users
  • user scores are public -> ranks
  • solutions are not public -> can compete any time

SIZECON is an unusual "code golf" problem:

  • created in 2005
  • among the 20 most popular problems on SPOJ, with 8000+ solvers across all languages and 1400+ for Python
  • our own Tim Peters is among the very best solvers, in Perl

SIZECON problem statement

SIZECON best solutions for Python 2

SIZECON best solutions for Python 3

The "python golf" master plan

  • correctness
    • reference solutions
  • algorithm wizardry
    • alternative algorithms
  • language wizardry
    • shortened solutions

Reference solutions - 50-ish

T = int(raw_input())
r = 0
for _ in range(T):
    n = int(raw_input())
    if n > 0:
        r += n
print r
Golf score: 107
SIZECON score: 70

print sum(max(0,input())for _ in range(input()))
Golf score: 48
SIZECON score: 44
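
The two scores quoted on these slides can be reproduced with a few lines. This is a minimal sketch, assuming "Golf score" is the total character count and "SIZECON score" counts only characters with an ASCII value above 32. The function names are made up for illustration, and the code targets Python 3 rather than the Python 2 of the slides.

```python
def golf_score(source):
    """Total number of characters, whitespace and newlines included."""
    return len(source)


def sizecon_score(source):
    """Number of characters with an ASCII value above 32."""
    return sum(1 for ch in source if ord(ch) > 32)


one_liner = "print sum(max(0,input())for _ in range(input()))"
print(golf_score(one_liner), sizecon_score(one_liner))  # prints: 48 44
```

Spaces, tabs and newlines are free under the second rule, which is what the rest of the talk exploits.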

Alternative algorithms

Nothing to see here*,

please move along.

* as long as you forget the bizarre ASCII special characters exception.

Shortened solutions - down to 33!

i=input
print sum(max(0,i())for _ in range(i()))
Golf score: 48
SIZECON score: 43

i=input
print eval("+max(0,i())"*i())
Golf score: 37
SIZECON score: 35

i=input
print sum(eval("max(0,i()),"*i()))
Golf score: 42
SIZECON score: 40

i=input
i(eval("+max(0,i())"*i()))
Golf score: 34
SIZECON score: 33
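
The eval versions rely on string repetition: "+max(0,i())" repeated T times is itself a valid expression that adds up the clamped inputs. A sketch of that idea in Python 3, where `solve` and the canned `lines` list are hypothetical stand-ins for reading from standard input:

```python
def solve(lines):
    """Sum the positive numbers in `lines`; the first entry is their count."""
    it = iter(lines)
    i = lambda: int(next(it))  # plays the role of Python 2's input()
    # The multiplication is evaluated before eval runs, so this i() yields T.
    # The resulting text "+max(0,i())+max(0,i())..." then reads T more numbers.
    return eval("+max(0,i())" * i(), {"i": i})


print(solve(["3", "5", "-2", "7"]))  # prints: 12
```

With T equal to 0 the repeated string is empty and eval raises a SyntaxError, so the trick relies on at least one number being present. The final 33-character version appears to lean on Python 2's input() writing its argument out as a prompt before it attempts to read anything.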

Child's play

Alternative algorithms

We can use as many ASCII special characters as we like...

Code legal characters

Alternative algorithms

We can use as many ASCII special characters as we like...

String literal legal characters

Alternative algorithms

Algorithm building blocks:

  1. build a string literal with lots of ASCII special characters
  2. turn the string literal into code
  3. actually do something with that code
original_solution = 'print sum(max(0,input())for _ in range(input()))'

encrypted_original_solution = ASCII32_encrypt(original_solution)

solution_template = "exec ASCII32_decrypt('{}')"
solution = solution_template.format(encrypted_original_solution)
with open('solution.py', 'w') as fp:
    fp.write(solution)

exec ASCII32_decrypt(' ** encrypted string literal ** ')

Building ASCII32_decrypt

What we want

  • it takes as input a string full of ASCII control characters

  • it outputs a string of Python code

  • it must be short!

str.translate(table)
Return a copy of the string where all characters [...] have been mapped through the given translation table, which must be a string of length 256.
exec' ** encrypted string literal ** '.translate(' ** decrypt table ** ')
SIZECON score: 20 + number of characters above ASCII 32 in the decrypt table

translate based ASCII32_decrypt

original_solution = 'print sum(max(0,input())for _ in range(input()))'
chars = ''.join(set(original_solution) - set([' ']))
decrypt_table = ' ' + chars[:12] + ' ' + chars[12:] + ' ' * (254 - len(chars))

encrypted_original_solution = ASCII32_encrypt(original_solution)

solution_template = "exec'{}'.translate('{}')"
solution = solution_template.format(encrypted_original_solution, decrypt_table)
with open('solution.py', 'w') as fp:
    fp.write(solution)

print sum(max(0,input())for i in range(input()))
SIZECON score: 20 + 18  # '(),0aefgimnoprstux'
print sum(max(int(),input())for i in range(input()))
SIZECON score: 20 + 17  # '(),aefgimnoprstux'
print sum(max(0,input())for _ in range(input()))
SIZECON score: 20 + 19  # '(),0_aefgimnoprstux'
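
The translate-based scheme above can be checked as an in-memory round trip. This is a sketch for Python 3, where `bytes.translate` takes a 256-byte table much like Python 2's `str.translate`. The slides do not show `ASCII32_encrypt`, so the `position` lookup below is only an assumption about what it does.

```python
original = b"print sum(max(0,input())for _ in range(input()))"

# Every distinct character except the space, in a fixed order.
chars = bytes(sorted(set(original) - {ord(" ")}))

# Same layout as the slide: positions 1-12 and 14 onward hold the characters,
# every other position decodes to a space. The table must be 256 bytes long.
decrypt_table = b" " + chars[:12] + b" " + chars[12:] + b" " * (254 - len(chars))
assert len(decrypt_table) == 256

# Assumed encoder: replace each character by its position in the table.
position = {c: idx for idx, c in enumerate(decrypt_table) if c != ord(" ")}
position[ord(" ")] = ord(" ")
encrypted = bytes(position[c] for c in original)

decoded = encrypted.translate(decrypt_table)
table_cost = sum(1 for c in decrypt_table if c > 32)
print(decoded == original, max(encrypted), table_cost)  # prints: True 32 19
```

Every byte of the encrypted literal is at or below 32, so only the 19 characters in the table add to the fixed cost of 20. The sketch does not address whether each control character can appear raw inside a source file.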

This is a new, unusual shortening problem.

"SIZECON2" problem

Same as SIZECON, but the score is the number of distinct characters with ASCII value above 32

input(sum(max(int(),input())for i in range(input())))
SIZECON score: 20 + 17  # '(),aefgimnoprstux'
input(sum(n for n in(input()for i in repr(int())*input())if repr(int())*n))
SIZECON score: 20 + 14  # '()*efimnoprstu'
print sum(max(int(),input())for i in range(input()))
SIZECON score: 20 + 17  # '(),aefgimnoprstux'
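
The "SIZECON2" cost of a candidate is just the size of its alphabet. A small Python 3 sketch, with `distinct_chars` as a made-up helper name:

```python
def distinct_chars(source):
    """Sorted distinct characters above ASCII 32, joined into one string."""
    return "".join(sorted({ch for ch in source if ord(ch) > 32}))


candidate = "print sum(max(int(),input())for i in range(input()))"
alphabet = distinct_chars(candidate)
print(len(alphabet), alphabet)  # prints: 17 (),aefgimnoprstux
```

Running candidates through a helper like this makes it easy to see which single character a rewrite manages to drop.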

...

How can we do better?

input(len(tuple(()for n in(input()*repr(int()))for i in repr(int())*input())))
SIZECON score: 20 + 13  # '()*efilnoprtu'
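
The 13-character alphabet works because counting replaces adding: a string of n characters is empty for n <= 0, so the total length of all such strings is the sum of the positive inputs. A sketch of the counting step alone in Python 3, with the numbers passed in as a list instead of being read from input:

```python
def positive_sum(numbers):
    """Sum the positive integers in `numbers` by counting, without sum or max."""
    # repr(int()) is "0"; "0" * n has n characters for n > 0 and none otherwise,
    # so the inner loop yields one empty tuple per unit of each positive n.
    return len(tuple(() for n in numbers for i in repr(int()) * n))


print(positive_sum([5, -2, 7]))  # prints: 12
```

The price is time and memory proportional to the answer, which code golf does not penalise.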

"SIZECON2" problem

"Downward is the only way forward."

exec"input(len(tuple(()for(n)in(repr((int())))*int(input())for(i)in(repr(int()))*int(input()))))"

SIZECON score: 20 + 12  # '"0124567\cex'

YES! 32!

'\160\162\151\156\164\040\042\110\145\154\154\157\040\167\157\162\154\144\041\042\073'

Octal representation of characters in literal strings!

exec"\160\162\151\156\164\040\042\110\145\154\154\157\040\167\157\162\154\144\041\042\073"
SIZECON score: 20 + 13  # '"01234567\cex'
'print "Hello world!";'
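
The octal spelling is mechanical to produce. A Python 3 sketch with a hypothetical `octal_escape` helper; encoding `print "Hello world!";` reproduces the literal shown above, and parsing the escapes as a string literal gives the text back:

```python
import ast


def octal_escape(text):
    """Spell every character as a three-digit octal escape."""
    return "".join("\\%03o" % ord(ch) for ch in text)


escaped = octal_escape('print "Hello world!";')
print(escaped)
# Parsing the escapes as a string literal recovers the original text.
print(ast.literal_eval('"' + escaped + '"'))
```

Whatever the payload, the encoded form only ever contains the backslash and the digits 0 to 7.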

...

exec"\151\156\160\165\164\050\154\145\156\050\164\165\160\154\145\050\050\051\146\157\162\050\156\051\151\156\050\162\145\160\162\050\050\151\156\164\050\051\051\051\051\052\151\156\164\050\151\156\160\165\164\050\051\051\146\157\162\050\151\051\151\156\050\162\145\160\162\050\151\156\164\050\051\051\051\052\151\156\164\050\151\156\160\165\164\050\051\051\051\051\051"

How can we do better?

"Downward is the only way forward."

'\\160\\162\\151\\156\\164'
exec"exec\"\\151\\156\\160\\165\\164\\050\\154\\145\\156\\050\\164\\165\\160\\154\\145\\050\\050\\051\\146\\157\\162\\050\\156\\051\\151\\156\\050\\162\\145\\160\\162\\050\\050\\151\\156\\164\\050\\051\\051\\051\\051\\052\\151\\156\\164\\050\\151\\156\\160\\165\\164\\050\\051\\051\\146\\157\\162\\050\\151\\051\\151\\156\\050\\162\\145\\160\\162\\050\\151\\156\\164\\050\\051\\051\\051\\052\\151\\156\\164\\050\\151\\156\\160\\165\\164\\050\\051\\051\\051\\051\\051\""
'\\160\\162\\151\\156\\16'+'4'
'\\160\\162\\151\\156\\16'+repr(4)
'\\160\\162\\151\\156\\16'+repr(1+1+1+1)
'\\160\\162\\151\\156\\16'+`1+1+1+1`
repr(object)
Return a string containing a printable representation of an object. This is the same value yielded by conversions (reverse quotes).
SIZECON score: 20 + 12  # '"0124567\cex'
'print'
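
The digit-removal step can be mimicked in Python 3, which has no backtick syntax, by calling repr() explicitly. A sketch, with `ones` as a made-up helper that writes a positive integer as a sum of ones:

```python
def ones(n):
    """Write the positive integer n as the text 1+1+...+1."""
    return "+".join("1" * n)


tail = repr(eval(ones(4)))  # Python 2 could write `1+1+1+1` instead
print(ones(4), tail)        # prints: 1+1+1+1 4
print("\\160\\162\\151\\156\\16" + tail)  # prints: \160\162\151\156\164
```

Each digit rewritten this way disappears from the alphabet, at the cost of introducing `+` and the backtick once.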

"Downward is the only way forward."

'\\160\\162\\151\\156\\16'+`1+1+1+1`
'\\160\\162\\151\\156\\1'+`1+1+1+1+1+1`+`1+1+1+1`
'\\160\\162\\151\\15'+`1+1+1+1+1+1`+'\\'+`1`+`1+1+1+1+1+1`+`1+1+1+1`
SIZECON score: 20 + 9  # '"+01\`cex'
exec"exec\"\\1"+`1+1+1+1+1`+"1\\1"+`1+1+1+1+1`+`1+1+1+1+1`+"1\\1"+`1+1+1+1+1`+"0\\"+...+"\""

...

'\\160'
'\\1'+`1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1`
SIZECON score: 20 + 8  # '"+1\`cex'
exec"exec\"\\1"+`1+1+1+1+1`+"1\\1"+`1+1+1+1+1`+`1+1+1+1+1`+"1\\1"+`1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1`+"\\"+...+"\""

YES! 28!

"That many exec's within exec's is too unstable."

Python interpreter: decrypt the ASCII32_encrypted string

exec level 1: build the numbers and compose the "string literal" string

exec level 2: parse the string literal

exec level 3: finally run the solution
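
The layering can be tried in miniature. The Python 3 sketch below covers only the last two levels: an outer exec parses a string literal made of octal escapes, and an inner exec runs the recovered text. The payload, `level2_source` and `namespace` are illustrative stand-ins, not the actual solution.

```python
payload = "result = 6 * 7"  # harmless stand-in for the real solution
octal = "".join("\\%03o" % ord(ch) for ch in payload)

# Source text for the outer level: an exec call whose argument is a string
# literal made only of octal escapes. Parsing it recovers the payload text.
level2_source = 'exec("%s", namespace)' % octal

namespace = {}
# The outer exec parses level2_source; the inner exec then runs the payload.
exec(level2_source, {"namespace": namespace})
print(namespace["result"])  # prints: 42
```

The real solution adds one more level in front, whose only job is to assemble the digits of those escapes.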

SIZECON solution - Python interpreter

SIZECON solution - exec level 1

SIZECON solution - exec level 2

exec'\151\156\160\165\164\x28\145\166\141\154\x28\42\53\155\141\x78\x28\151\156\164\x28\51\54\151\156\160\165\164\x28\51\51\42\52\151\156\160\165\164\x28\51\51\51'

SIZECON solution - exec level 3

input(eval("+max(int(),input())"*input()))
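
As a check, the level 2 literal can be decoded on a current interpreter. This Python 3 sketch copies the literal from the level 2 slide, split across lines only for readability, and parses it with `ast.literal_eval`:

```python
import ast

# The string literal from the level 2 slide, kept raw so the escapes survive.
level2_literal = (
    r"'\151\156\160\165\164\x28\145\166\141\154\x28\42\53\155\141\x78\x28"
    r"\151\156\164\x28\51\54\151\156\160\165\164\x28\51\51\42\52"
    r"\151\156\160\165\164\x28\51\51\51'"
)

level3_source = ast.literal_eval(level2_literal)
print(level3_source)
```

The decoded text is the level 3 line above, character for character.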

SIZECON absolute best solutions!

"Your condescension, as always, is

much appreciated, thank you."

 

Alessandro Amici

<a.amici@bopen.eu>

@alexamici

http://linkedin.com/in/alexamici

 

B-Open - http://bopen.eu

Solving the web's most popular shortening contest with Python - EuroPython 2015

Full talk online at: https://www.youtube.com/watch?v=4-3zLTg3GKk