Constantly improve your code

ARKADIUSZ KONDAS

Lead Software Architect
@ Proget Sp. z o.o.

Zend Certified Engineer

Code Craftsman

Ultra Runner

@ ArkadiuszKondas

Zend Certified Architect

The problem

  • Programmers make mistakes
  • Mistakes = bugs
  • Bugs = vulnerabilities
  • Goal: avoid bugs :) 

How to find bugs?

Dynamic analysis

  • Analysing an application by monitoring and interacting with it as it executes
  • Fuzzing, penetration testing, functional testing

Static analysis

  • Analysing an application without executing it
  • Code review, binary analysis, reverse engineering

Code review

Manual

Code review

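    /**
     * @param string $path
     */
    public function read(string $path) : string
    {
        // Review finding: the result is never returned, although a string return type is declared.
        $this->filesystem->read($path);
    }

    public function doSomeStuff()
    {
        // Review finding: unsanitised user input reaches a shell command.
        shell_exec('rm -rf ' . $_POST['dir']);
    }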

Manual code review is expensive

900 000 LOC

160 000 LOC

230 000 LOC

Manual code review

Steve McConnell (Code Complete) says 10-20 defects per 1000 lines of code. At the midpoint of 15 defects per 1000 LOC:

900 000 LOC: ~ 13 500 bugs

160 000 LOC: ~ 2 400 bugs

230 000 LOC: ~ 3 450 bugs

Manual code review

~ 45 000 000 LOC: ~ 675 000 bugs

~ 86 000 000 LOC: ~ 1 290 000 bugs

~ 24 000 000 LOC: ~ 360 000 bugs
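
The bug counts on these slides follow from simple arithmetic on the quoted defect density. A minimal sketch, assuming the 10-20 defects per 1000 LOC range from the slide and taking its midpoint; the function name `estimateDefects` is made up for illustration:

```php
<?php
// Defect densities per 1000 LOC, as quoted on the slide.
const DEFECTS_PER_KLOC_LOW = 10;
const DEFECTS_PER_KLOC_HIGH = 20;

/**
 * Estimate defect counts for a codebase of the given size.
 *
 * @return array{low: int, mid: int, high: int}
 */
function estimateDefects(int $linesOfCode): array
{
    $low = intdiv($linesOfCode * DEFECTS_PER_KLOC_LOW, 1000);
    $high = intdiv($linesOfCode * DEFECTS_PER_KLOC_HIGH, 1000);

    return ['low' => $low, 'mid' => intdiv($low + $high, 2), 'high' => $high];
}

// Codebase sizes taken from the slides.
foreach ([900000, 160000, 230000, 45000000, 86000000, 24000000] as $loc) {
    $estimate = estimateDefects($loc);
    printf(
        "%d LOC: %d to %d defects, midpoint %d\n",
        $loc,
        $estimate['low'],
        $estimate['high'],
        $estimate['mid']
    );
}
```

The midpoint values reproduce the figures shown on the slides, for example 13 500 for 900 000 LOC.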

Code review

Automated

Static Analysis

  • Automated searching of source code for known issues

  • Higher up-front costs

  • ‘Free’ security once built and configured

  • Catches low-hanging fruit automatically

Code review

Automated

VS

Manual

Both!

Computer Science Theory

To best use tools, you need to understand them.

  • Language types
  • Automata
  • Parsers

Language

Alphabet

Symbols

Words

Grammar

Chomsky’s Language Hierarchy

Regular expressions

  • Regular expressions can parse any regular language
  • Process input until an accept or error state is reached

Regular expressions

  • Quick and easy to write, so low cost
     
  • “Does my code match this very specific known issue?”

Examples

  • Bad imports

  • Calls to known dangerous functions

  • Known security misconfigurations

     

REGEX example


Code

$data = file_get_contents($basePath . $_GET['filename']);

Regex

'file_get_contents\(.*\$_(GET|POST)'
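
A rule like this can be run over source text line by line. The following is a minimal sketch, assuming a variant of the slide's pattern with the opening parenthesis escaped so that it matches literally; the function name `findTaintedFileReads` and the sample source are made up for illustration:

```php
<?php
// Variant of the slide's pattern: file_get_contents( followed by $_GET or $_POST on the same line.
const TAINTED_FILE_READ = '/file_get_contents\(.*\$_(GET|POST)/';

/**
 * Return the 1-based line numbers where the pattern matches.
 *
 * @return int[]
 */
function findTaintedFileReads(string $source): array
{
    $hits = [];
    foreach (explode("\n", $source) as $index => $line) {
        if (preg_match(TAINTED_FILE_READ, $line) === 1) {
            $hits[] = $index + 1;
        }
    }

    return $hits;
}

// Hypothetical input: one harmless read, one read driven by user input.
$source = <<<'PHP'
$safe = file_get_contents($basePath . 'config.json');
$data = file_get_contents($basePath . $_GET['filename']);
PHP;

print_r(findTaintedFileReads($source));
```

Only the second line is reported. The check is cheap, but it only sees one line at a time: input that is first copied into another variable would slip through.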

REGEX example

Code

if (DEBUG) {
    printf('Some variable %s', $var1);
    printf('Other variable %s', $var2);
    printf('Another variable %s', $var3);
}

Regex

'printf\(.*\)'

Regex Disadvantages

  • No way to maintain state
  • Cannot trace back to earlier context

Solutions

  • Check backwards line by line until you reach the beginning of the file: inefficient
  • Check the previous X lines: lots of false positives
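
The "check the previous X lines" workaround can be sketched as follows. This is an illustrative guess at such a heuristic, not code from any real tool; the function name `findUnguardedPrintf` and the sample source are assumptions:

```php
<?php
/**
 * Flag printf() calls that have no "if (DEBUG)" in the previous $lookback lines.
 * Purely line-based: the scanner has no idea where a block starts or ends.
 *
 * @return int[] 1-based line numbers of flagged calls
 */
function findUnguardedPrintf(string $source, int $lookback): array
{
    $lines = explode("\n", $source);
    $flagged = [];
    foreach ($lines as $index => $line) {
        if (preg_match('/printf\(.*\)/', $line) !== 1) {
            continue;
        }
        // Look at up to $lookback lines directly above the call.
        $start = max(0, $index - $lookback);
        $window = array_slice($lines, $start, $index - $start);
        if (preg_grep('/if\s*\(DEBUG\)/', $window) === []) {
            $flagged[] = $index + 1;
        }
    }

    return $flagged;
}

// Hypothetical input: three guarded calls and one unguarded call after the block.
$source = <<<'PHP'
if (DEBUG) {
    printf('Some variable %s', $var1);
    printf('Other variable %s', $var2);
    printf('Another variable %s', $var3);
}
printf('Leaked in production %s', $secret);
PHP;

print_r(findUnguardedPrintf($source, 2)); // small window
print_r(findUnguardedPrintf($source, 5)); // large window
```

With a window of 2 the guarded call on line 4 is flagged as well as the real problem on line 6. With a window of 5 nothing is flagged at all, because the closed `if (DEBUG)` block is still within reach. No window size fixes both, since the scanner does not know where the block ends.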

Regex Disadvantages

  • Regular expressions only match regular languages*
     
  • Programming languages are usually context-free

Chomsky’s Language Hierarchy

Context-Free Languages

  • Superset of regular languages
  • Anything that can be accepted by a pushdown automaton
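
Nested brackets are the textbook case of a language that needs a stack. A minimal sketch of the idea, assuming plain code with no brackets hidden inside strings or comments; the function name `isBalanced` is made up for illustration:

```php
<?php
/**
 * Check that (), [] and {} are balanced and properly nested.
 * The explicit stack is the memory a pushdown automaton has and a finite automaton lacks.
 */
function isBalanced(string $code): bool
{
    $closers = [')' => '(', ']' => '[', '}' => '{'];
    $stack = [];
    foreach (str_split($code) as $char) {
        if (in_array($char, $closers, true)) {
            // Opening bracket: remember it.
            $stack[] = $char;
        } elseif (isset($closers[$char])) {
            // Closing bracket: it must match the most recent opener.
            if (array_pop($stack) !== $closers[$char]) {
                return false;
            }
        }
    }

    // Anything left on the stack was never closed.
    return $stack === [];
}

var_dump(isBalanced('if (DEBUG) { printf($a[0]); }'));
var_dump(isBalanced('if (DEBUG) { printf($a[0]; }'));
```

The first call returns true and the second false, because the `}` arrives while `(` is still on top of the stack.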

Parsers

  • Converts text into a hierarchical data structure
     
  • Construct a Parse Tree or Abstract Syntax Tree (AST) from the source code
     
  • Two separate stages: Lexer and Parser

Parser

if (DEBUG)
{
    printf(...);
    printf(...);
    printf(...);
}

if

code block

printf

printf

printf
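
Once the code is tokenised, the question "is this printf inside the if (DEBUG) block?" becomes answerable. The sketch below uses only the lexer stage, PHP's built-in `token_get_all()`, plus a stack of open blocks; a real tool would build a full AST instead. The helper names `tokenText` and `findPrintfCalls` are assumptions, and the sketch ignores edge cases such as `${var}` string interpolation:

```php
<?php
/** Return the source text of a token, whether it is an array token or a single character. */
function tokenText($token): string
{
    return is_array($token) ? $token[1] : $token;
}

/**
 * Find printf() calls and report whether each sits inside an "if (DEBUG)" block.
 *
 * @return array<int, bool> line number => guarded by if (DEBUG)?
 */
function findPrintfCalls(string $code): array
{
    // Lex the source and drop whitespace tokens.
    $tokens = array_values(array_filter(
        token_get_all($code),
        fn ($t) => !is_array($t) || $t[0] !== T_WHITESPACE
    ));

    $blocks = [];            // one entry per open "{": true if it belongs to if (DEBUG)
    $pendingDebugIf = false; // set after "if (DEBUG)" until its "{" is seen
    $calls = [];

    foreach ($tokens as $i => $token) {
        $text = tokenText($token);
        if (is_array($token) && $token[0] === T_IF) {
            // Does the condition read exactly "(DEBUG)"?
            $condition = implode('', array_map('tokenText', array_slice($tokens, $i + 1, 3)));
            $pendingDebugIf = ($condition === '(DEBUG)');
        } elseif ($text === '{') {
            $blocks[] = $pendingDebugIf;
            $pendingDebugIf = false;
        } elseif ($text === '}') {
            array_pop($blocks);
        } elseif ($text === ';') {
            $pendingDebugIf = false; // brace-less if: the guard ends with the statement
        } elseif (is_array($token) && $token[0] === T_STRING && $text === 'printf') {
            $calls[$token[2]] = in_array(true, $blocks, true);
        }
    }

    return $calls;
}

// Hypothetical input: one guarded call and one unguarded call.
$code = <<<'PHP'
<?php
if (DEBUG) {
    printf('Some variable %s', $var1);
}
printf('Always runs %s', $var2);
PHP;

var_dump(findPrintfCalls($code));
```

The call on line 3 is reported as guarded and the one on line 5 as unguarded, which is exactly the distinction a line-based regex cannot make.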

Control Flow Graphs

  • Allows tracing of execution paths that depend on given inputs, without running the application
     
  • Trace data sinks back to original source

Control Flow Graphs

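$result = login($_POST['user'], $_POST['password']);

function login($user, $password) {
    return login_query($user, $password);
}

function login_query($user, $password) {
    // Deliberately vulnerable: user input flows unescaped into the SQL sink.
    // $db is an assumed, already-open mysqli connection.
    global $db;
    return mysqli_query($db, '
        select * from user 
        where user=' . $user . ' 
        and password=' . $password . ';');
}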
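
Tracing the sink back to its source can be sketched as a backwards walk over flow edges. This is a simplified data-flow trace over hand-written edges for the login example, not a real control flow graph builder; a real tool would derive the edges from the AST. The edge labels, `FLOWS` and `traceBack` are all assumptions made for illustration:

```php
<?php
// Hand-written edges for the login example: "value A flows into B".
const FLOWS = [
    '$_POST[user]'          => ['login:$user'],
    '$_POST[password]'      => ['login:$password'],
    'login:$user'           => ['login_query:$user'],
    'login:$password'       => ['login_query:$password'],
    'login_query:$user'     => ['mysqli_query'],
    'login_query:$password' => ['mysqli_query'],
];

/**
 * Walk the flow edges backwards from a sink and return every node that reaches it.
 *
 * @return string[]
 */
function traceBack(string $sink, array $flows): array
{
    // Invert the edges: node => list of nodes that feed it.
    $feeds = [];
    foreach ($flows as $from => $targets) {
        foreach ($targets as $to) {
            $feeds[$to][] = $from;
        }
    }

    // Breadth-first search from the sink towards the sources.
    $seen = [];
    $queue = [$sink];
    while ($queue !== []) {
        $node = array_shift($queue);
        foreach ($feeds[$node] ?? [] as $origin) {
            if (!isset($seen[$origin])) {
                $seen[$origin] = true;
                $queue[] = $origin;
            }
        }
    }

    return array_keys($seen);
}

// Keep only the nodes that are superglobals, i.e. user-controlled sources.
$origins = traceBack('mysqli_query', FLOWS);
$taintedSources = array_values(array_filter(
    $origins,
    fn (string $node): bool => strpos($node, '$_') === 0
));

print_r($taintedSources);
```

Both `$_POST` values are reported as reaching `mysqli_query`, even though the sink and the sources sit in different functions.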

Parsers - cons

  • Higher upfront cost to develop
     
  • More computationally intensive

Tools

https://github.com/exakat/php-static-analysis-tools

  • Bug finders
  • Coding standards
  • Fixers
  • Metrics
  • DIY

jakzal/phpqa

docker pull jakzal/phpqa
alias phpqa='docker run -it --rm -v `pwd`:/project -w /project jakzal/phpqa'
phpqa phploc .

https://github.com/jakzal/phpqa

sensiolabs/security-checker

https://github.com/sensiolabs/security-checker

phploc

https://github.com/sebastianbergmann/phploc

phploc 4.0.0 by Sebastian Bergmann.

Directories                                        826
Files                                             3695

Size
  Lines of Code (LOC)                           426055
  Comment Lines of Code (CLOC)                  102659 (24.10%)
  Non-Comment Lines of Code (NCLOC)             323396 (75.90%)
  Logical Lines of Code (LLOC)                  103807 (24.36%)
    Classes                                      89321 (86.05%)
      Average Class Length                          23
        Minimum Class Length                         0
        Maximum Class Length                       894
      Average Method Length                          3
        Minimum Method Length                        0
        Maximum Method Length                      143
    Functions                                      257 (0.25%)
      Average Function Length                        0
    Not in classes or functions                  14229 (13.71%)

Cyclomatic Complexity
  Average Complexity per LLOC                     0.18
  Average Complexity per Class                    6.06
    Minimum Class Complexity                      1.00
    Maximum Class Complexity                    339.00
  Average Complexity per Method                   1.91
    Minimum Method Complexity                     1.00
    Maximum Method Complexity                   155.00

Dependencies
  Global Accesses                                  187
    Global Constants                                12 (6.42%)
    Global Variables                                 4 (2.14%)
    Super-Global Variables                         171 (91.44%)
  Attribute Accesses                             27479
    Non-Static                                   26295 (95.69%)
    Static                                        1184 (4.31%)
  Method Calls                                   80455
    Non-Static                                   77774 (96.67%)
    Static                                        2681 (3.33%)

Structure
  Namespaces                                       774
  Interfaces                                       309
  Traits                                            56
  Classes                                         3397
    Abstract Classes                               181 (5.33%)
    Concrete Classes                              3216 (94.67%)
  Methods                                        20996
    Scope
      Non-Static Methods                         20369 (97.01%)
      Static Methods                               627 (2.99%)
    Visibility
      Public Methods                             17603 (83.84%)
      Non-Public Methods                          3393 (16.16%)
  Functions                                       1211
    Named Functions                                 30 (2.48%)
    Anonymous Functions                           1181 (97.52%)
  Constants                                        846
    Global Constants                                 6 (0.71%)
    Class Constants                                840 (99.29%)

phpmetrics/phpmetrics

Deptrac

# depfile.yml
paths:
  - ./src
exclude_files:
  - .*test.*
layers:
  - name: Domain
    collectors:
      - type: className
        regex: .*Domain.*
  - name: Application
    collectors:
      - type: className
        regex: .*Application.*
  - name: Infrastructure
    collectors:
      - type: className
        regex: .*Infrastructure.*
  - name: UserInterface
    collectors:
      - type: className
        regex: .*UserInterface.*

https://github.com/sensiolabs-de/deptrac

Deptrac

# depfile.yml


ruleset:
  Domain: ~
  Application:
    - Domain
  Infrastructure:
    - Application
    - Domain
  UserInterface:
    - Application

Deptrac

Start to create an AstMap for 119 Files.
.......................................................................................................................
AstMap created.
start emitting dependencies "InheritanceDependencyEmitter"
start emitting dependencies "BasicDependencyEmitter"
end emitting dependencies
start flatten dependencies
end flatten dependencies
collecting violations.
formatting dependencies.
Phpml\Classification\MLPClassifier::8 must not depend on Phpml\Math\Matrix (Classification on Math)
Phpml\Classification\MLPClassifier::22 must not depend on Phpml\Math\Matrix (Classification on Math)

Found 2 Violations

Continuous Integration

Commercial Tools

Scrutinizer

https://scrutinizer-ci.com/

Insight

https://insight.sensiolabs.com

Code climate

https://codeclimate.com

Other

  • Bliss - Automatically reviews code in real-time and shows how much it's worth in lines of code.
  • Checkmarx - Get a full PHP static security code analysis and prevent security vulnerabilities.
  • Codacy - Automated code review.
  • RIPS - The superior security software for PHP applications. Source code static analyser for vulnerabilities.
  • SideCI - CI for automated code review by code analysis.

Summary

  • Static analysis can provide low-cost security checks once configured
     
  • ASTs and CFGs let you do all kinds of awesome things
     
  • Automated code analysis complements traditional manual assessment

Q&A

Thanks for listening

@ ArkadiuszKondas

https://slides.com/arkadiuszkondas

https://joind.in/talk/07b5c

Constantly improve your PHP code

By Arkadiusz Kondas

When testing a PHP application, whether manually or automatically, programmers spend a lot of time debugging code that would not even compile in other languages. This leaves less time to test the real business logic, which should matter most. A number of interesting tools built on static code analysis now make it possible to fully automate the Continuous Improvement process. This presentation is an objective overview of the latest developments that let you continually improve the quality of the code in your projects.
