Reformatting your code without AI

Let's see how a formatter works

Grab the slides:
https://slides.com/cheukting_ho/reformating-code

Hello, I am Cheuk

  • Open-Source contributor
  • Organiser of community events
  • PSF director and fellow
  • Community manager at OpenSSF

Do you write code with another person?

👩‍✈️

👀

We don't actually need AI to reformat the code

There are so many good reformatters available

Have you used Black?

or PyBetter or Autoflake or isort or...

Do you know how it works?

Do you want to make one yourself?

Chapter 1:
How to analyse Python code

Challenge:
How to represent code in a tree structure

Answer:
Abstract Syntax Tree (AST)

fn(1, 2)  # calls fn

Source: https://libcst.readthedocs.io/en/latest/why_libcst.html

Useful for

  • compilers / interpreters
  • syntax analysis
  • optimization
  • essence of what the code does
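To see what "essence of what the code does" means in practice, here is a minimal sketch using Python's built-in ast module (not LibCST) on the same fn(1, 2) call. It assumes Python 3.9 or later for ast.unparse and the indent argument of ast.dump. Note how the comment and the extra space before it do not survive the round trip.

```python
import ast

# The call from the slide above, including its trailing comment
source = "fn(1, 2)  # calls fn\n"

tree = ast.parse(source)

# Show the tree: a Module holding an Expr that wraps a Call node
print(ast.dump(tree, indent=4))

# Regenerate source text from the tree (ast.unparse needs Python 3.9+)
regenerated = ast.unparse(tree)
print(regenerated)
```

The regenerated text keeps the call but drops the comment, which is why an AST alone is not enough for a formatter.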

Challenge:
Formatters need the details

(They care about those white spaces)

Answer:
Concrete Syntax Tree (CST)

fn(1, 2)  # calls fn

Source: https://libcst.readthedocs.io/en/latest/why_libcst.html

Still no white spaces!!!

How?

LibCST

(*not all formatters use LibCST)

LibCST parses Python 3.0 -> 3.11 source code as a CST tree that keeps all formatting details (comments, whitespaces, parentheses, etc). It's useful for building automated refactoring (codemod) applications and linters.

fn(1, 2)  # calls fn

Source: https://libcst.readthedocs.io/en/latest/why_libcst.html

Parse Source Code

import libcst as cst

cst.parse_expression("1 + 2")
BinaryOperation(
    left=Integer(
        value='1',
        lpar=[],
        rpar=[],
    ),
    operator=Add(
        whitespace_before=SimpleWhitespace(
            value=' ',
        ),
        whitespace_after=SimpleWhitespace(
            value=' ',
        ),
    ),
    right=Integer(
        value='2',
        lpar=[],
        rpar=[],
    ),
    lpar=[],
    rpar=[],
)

Chapter 2:
Matching code patterns

Trees generated by LibCST can be huge

How do we know where to refactor?

Imagine you are writing a reformatter...

Using Matchers and Visitors

Matchers

  • flexible
  • define the shape
  • provide matching logic
    (e.g. ZeroOrMore, OneOf, AtLeastN)
  • replace isinstance

import libcst.matchers as m

def if_there_is_addition(node):
    # True when node is a binary operation whose operator is `+`
    return m.matches(
        node,
        m.BinaryOperation(
            operator=m.Add(),
        ),
    )

Visitors

  • read-only
  • traverse the tree
  • provide functions like
    on_visit and on_leave
  • visit nodes one at a time

import libcst as cst

class CountAdd(cst.CSTVisitor):
    def __init__(self):
        self.add_count = 0

    # Called when the traversal enters an Add operator node
    def visit_Add(self, node):
        self.add_count += 1
        print(f"counting add number {self.add_count}")

    # Called when the traversal leaves the same node
    def leave_Add(self, original_node):
        print("bye!")

source = "1 + 2"
module = cst.parse_module(source)
module.visit(CountAdd())

Chapter 3:
Rewrite Python code

Visitors are read-only

to travel through and modify code

we need Transformers

Transformers

  • can read or write
  • traverse the tree
  • visit nodes one at a time
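  • like Visitors, but leave methods
    also receive updated_node

class OffByOne(cst.CSTTransformer):
    def leave_BinaryOperation(self, original_node, updated_node):
        # first_addition is assumed to be a predicate defined elsewhere
        # (not shown on this slide), e.g. a matcher check like
        # if_there_is_addition from Chapter 2.
        if first_addition(original_node):
            # Wrap the left operand: `x + y` becomes `1 + x + y`
            return updated_node.with_changes(
                left=cst.BinaryOperation(
                    left=cst.Integer(
                        value='1',
                        lpar=[],
                        rpar=[],
                    ),
                    operator=cst.Add(
                        whitespace_before=cst.SimpleWhitespace(
                            value=' ',
                        ),
                        whitespace_after=cst.SimpleWhitespace(
                            value=' ',
                        ),
                    ),
                    # Use updated_node so changes made to children are kept
                    right=updated_node.left,
                    lpar=[],
                    rpar=[],
                )
            )
        # No match: return the node unchanged
        return updated_node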

Generate Source Code

It is as easy as accessing the code attribute

source = "1 + 2"
module = cst.parse_module(source)
new = module.visit(OffByOne())
print(new.code)

Conclusion

I think...

  • A parser expresses code in a tree structure,
  • which makes it easy to carry out static analysis,
  • so that we can reformat our code,
  • in a way that could make collaboration better

Why not learn a bit more about parsers?

And write your own

https://conference.pyladies.com/

Reformatting your code without AI

By Cheuk Ting Ho
