Constantly improve your code
ARKADIUSZ KONDAS
Lead Software Architect
@ Proget Sp. z o.o.
Zend Certified Engineer
Code Craftsman
Ultra Runner
@ ArkadiuszKondas
Zend Certified Architect
The problem
- Programmers make mistakes
- Mistakes = bugs
- Bugs = vulnerabilities
- Goal: avoid bugs :)
How to find bugs?
Dynamic analysis
- Analysing by monitoring and interacting with the application as it executes
- Fuzzing, penetration testing, functional testing
Static analysis
- Analysing an application without executing it
- Code review, binary analysis, reverse engineering
Manual code review
/**
* @param string $path
*/
public function read(string $path) : string
{
$this->filesystem->read($path);
}
public function doSomeStuff()
{
shell_exec('rm -rf ' . $_POST['dir']);
}
Manual code review is expensive
Steve McConnell (Code Complete) estimates 10-20 defects per 1000 lines of code. Taking the midpoint of 15 defects per 1000 LOC:
- 900 000 LOC: ~13 500 bugs
- 160 000 LOC: ~2 400 bugs
- 230 000 LOC: ~3 450 bugs
- ~45 000 000 LOC: ~675 000 bugs
- ~86 000 000 LOC: ~1 290 000 bugs
- ~24 000 000 LOC: ~360 000 bugs
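The estimates on this slide are plain arithmetic. A minimal sketch, assuming the midpoint of 15 defects per 1000 lines; the function name `estimateDefects` is hypothetical:

```php
<?php
// Estimate the number of defects in a code base from its size.
// Assumption: 15 defects per 1000 LOC, the midpoint of the 10-20 range.
function estimateDefects(int $linesOfCode, int $defectsPerKloc = 15): int
{
    return intdiv($linesOfCode * $defectsPerKloc, 1000);
}

foreach ([900000, 160000, 230000] as $loc) {
    printf("%d LOC -> ~%d bugs\n", $loc, estimateDefects($loc));
}
```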
Automated code review
Static Analysis
- Automated searching of source code for known issues
- Higher up front costs
- ‘Free’ security once built and configured
- Catch low hanging fruit automatically
Code review: automated vs manual?
Both!
Computer Science Theory
To best use tools, you need to understand them.
- Language types
- Automata
- Parsers
Language
Alphabet
Symbols
Words
Grammar
Chomsky’s Language Hierarchy
Regular expressions
- Regular expressions can recognise any regular language
- Process input until accept or error state is reached
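To make "process input until accept or error state is reached" concrete, here is a hand-written finite automaton for one small regular language: simple PHP variable names. This is only a sketch; the state names and the function `acceptsVariableName` are made up for illustration.

```php
<?php
// A deterministic finite automaton (DFA) that accepts simple PHP variable
// names such as $var1, i.e. the regular language \$[A-Za-z_][A-Za-z0-9_]*.
function acceptsVariableName(string $input): bool
{
    $state = 'start';
    foreach (str_split($input) as $char) {
        if ($state === 'start' && $char === '$') {
            $state = 'dollar';
        } elseif ($state === 'dollar' && (ctype_alpha($char) || $char === '_')) {
            $state = 'name';
        } elseif ($state === 'name' && (ctype_alnum($char) || $char === '_')) {
            $state = 'name';
        } else {
            return false; // error state: no transition for this character
        }
    }
    return $state === 'name'; // 'name' is the only accepting state
}

var_dump(acceptsVariableName('$var1'));
var_dump(acceptsVariableName('$1abc'));
```

The automaton has a fixed number of states and no other memory, which is the limitation the later slides come back to.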
Regular expressions
- Quick and easy to write, so low cost
- “Does my code match this very specific known issue?”
Examples
- Bad imports
- Calls to known dangerous functions
- Known security misconfigurations
REGEX example
Code:
$data = file_get_contents($basePath . $_GET['filename']);
Regex:
file_get_contents\(.*\$_(GET|POST)
REGEX example
Code:
if (DEBUG) {
    printf('Some variable %s', $var1);
    printf('Other variable %s', $var2);
    printf('Another variable %s', $var3);
}
Regex:
printf\(.*\)
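The two patterns from these slides can be applied line by line with `preg_match`. A minimal sketch; the rule names and the `scan` function are hypothetical, and the first pattern escapes the opening parenthesis so that it matches a literal call:

```php
<?php
// Hypothetical rule set: rule name => PCRE pattern.
$rules = [
    'user input in file_get_contents' => '/file_get_contents\(.*\$_(GET|POST)/',
    'leftover printf'                 => '/printf\(.*\)/',
];

// Scan source code line by line and return one finding per matching rule.
function scan(string $source, array $rules): array
{
    $findings = [];
    foreach (explode("\n", $source) as $index => $line) {
        foreach ($rules as $name => $pattern) {
            if (preg_match($pattern, $line) === 1) {
                $findings[] = ['line' => $index + 1, 'rule' => $name];
            }
        }
    }
    return $findings;
}

$source = <<<'CODE'
$data = file_get_contents($basePath . $_GET['filename']);
if (DEBUG) {
    printf('Some variable %s', $var1);
}
CODE;

foreach (scan($source, $rules) as $finding) {
    echo "line {$finding['line']}: {$finding['rule']}\n";
}
```

Both lines are reported, including the printf that is guarded by `if (DEBUG)`: the pattern has no way to see the surrounding block.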
Regex Disadvantages
- No way to maintain state
- Cannot trace backwards through the code
Solutions
- Check backwards line by line until you reach beginning of file - inefficient
- Check X many previous lines – lots of false positives
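The "check X previous lines" workaround might look like the sketch below. The function name `findUnguardedPrintf` and the window sizes are assumptions for illustration; the point is how much the result depends on the window.

```php
<?php
// Flag printf() calls unless "if (DEBUG)" appears within the previous
// $lookBack lines. A heuristic only: it has no notion of where the
// if-block actually starts or ends.
function findUnguardedPrintf(array $lines, int $lookBack): array
{
    $flagged = [];
    foreach ($lines as $i => $line) {
        if (preg_match('/printf\(/', $line) !== 1) {
            continue;
        }
        // The (at most) $lookBack lines directly before line $i.
        $window = array_slice($lines, max(0, $i - $lookBack), min($i, $lookBack));
        $guarded = false;
        foreach ($window as $previous) {
            if (preg_match('/if\s*\(\s*DEBUG\s*\)/', $previous) === 1) {
                $guarded = true;
                break;
            }
        }
        if (!$guarded) {
            $flagged[] = $i + 1; // report 1-based line numbers
        }
    }
    return $flagged;
}

$code = <<<'CODE'
if (DEBUG) {
    printf('Some variable %s', $var1);
    printf('Other variable %s', $var2);
    printf('Another variable %s', $var3);
}
CODE;
$lines = explode("\n", $code);

print_r(findUnguardedPrintf($lines, 2)); // window too small: line 4 is a false positive
print_r(findUnguardedPrintf($lines, 3)); // large enough for this block only
```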
Regex Disadvantages
- Regular expressions only match regular languages*
- Programming languages are usually context-free
Chomsky’s Language Hierarchy
Context-Free Languages
- Superset of regular languages
- Anything that can be accepted by a pushdown automaton
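Nested brackets are the classic example of a language that needs a stack. A minimal sketch of a pushdown-style checker; the function `isBalanced` is hypothetical and, as a simplification, it does not skip brackets inside string literals or comments:

```php
<?php
// Check that brackets in a piece of source text are balanced and properly
// nested. The stack is what a finite automaton lacks: it remembers an
// unbounded number of still-open brackets.
function isBalanced(string $source): bool
{
    $pairs = [')' => '(', ']' => '[', '}' => '{'];
    $stack = [];
    foreach (str_split($source) as $char) {
        if (in_array($char, $pairs, true)) {
            $stack[] = $char;           // opening bracket: push
        } elseif (isset($pairs[$char])) {
            if (array_pop($stack) !== $pairs[$char]) {
                return false;           // wrong or missing opening bracket
            }
        }
    }
    return $stack === [];               // nothing may be left open
}

var_dump(isBalanced('if (DEBUG) { printf($a[0]); }'));
var_dump(isBalanced('if (DEBUG) { printf($a[0]; }'));
```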
Parsers
- Converts text into a hierarchical data structure
- Construct a Parse Tree or Abstract Syntax Tree (AST) from the source code
- Two separate stages: Lexer and Parser
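PHP ships the lexer stage as `token_get_all`. The sketch below uses it to list called function names; `calledFunctions` is a hypothetical helper, and treating "T_STRING followed by an opening parenthesis" as a call is a simplification (it would also match function declarations, for instance).

```php
<?php
// Lexer stage only: PHP's built-in tokenizer turns source text into tokens.
// Collect names that look like calls: a T_STRING token followed by "(".
function calledFunctions(string $code): array
{
    // Drop whitespace and comments so "name (" and "name(" look the same.
    $ignored = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $tokens = array_values(array_filter(
        token_get_all($code),
        fn ($t) => !is_array($t) || !in_array($t[0], $ignored, true)
    ));

    $calls = [];
    foreach ($tokens as $i => $token) {
        $next = $tokens[$i + 1] ?? null;
        if (is_array($token) && $token[0] === T_STRING && $next === '(') {
            $calls[] = $token[1];
        }
    }
    return $calls;
}

$code = <<<'CODE'
<?php
// printf('in a comment');
if (DEBUG) {
    printf('%s', 'printf( inside a string');
}
CODE;

print_r(calledFunctions($code));
```

Unlike the regex, the token stream distinguishes the real call from the text inside the comment and the string literal.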
Parser
if (DEBUG)
{
    printf(...);
    printf(...);
    printf(...);
}

if
    code block
        printf
        printf
        printf
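Once block structure is known, "is this printf inside `if (DEBUG)`?" becomes a simple lookup. The sketch below approximates that with the token stream and a stack of open blocks rather than a full AST. `unguardedPrintfLines` is a hypothetical helper, and it assumes every `if` uses braces.

```php
<?php
// Return line numbers of printf() calls that are NOT inside an
// "if (DEBUG) { ... }" block. Assumption: every if statement uses braces.
function unguardedPrintfLines(string $code): array
{
    $tokens = array_values(array_filter(
        token_get_all($code),
        fn ($t) => !is_array($t) || $t[0] !== T_WHITESPACE
    ));

    $blocks = [];          // one entry per open "{": true if it is a DEBUG guard
    $pendingGuard = false; // set when "if ( DEBUG )" was just seen
    $lines = [];

    foreach ($tokens as $i => $token) {
        $text = is_array($token) ? $token[1] : $token;

        if (is_array($token) && $token[0] === T_IF) {
            // Look ahead for the exact shape: if ( DEBUG )
            $cond = array_slice($tokens, $i + 1, 3);
            $pendingGuard = ($cond[0] ?? null) === '('
                && is_array($cond[1] ?? null) && $cond[1][1] === 'DEBUG'
                && ($cond[2] ?? null) === ')';
        } elseif ($text === '{') {
            $blocks[] = $pendingGuard;
            $pendingGuard = false;
        } elseif ($text === '}') {
            array_pop($blocks);
        } elseif (is_array($token) && $token[0] === T_STRING
            && $text === 'printf' && !in_array(true, $blocks, true)) {
            $lines[] = $token[2]; // index 2 holds the token's line number
        }
    }
    return $lines;
}

$code = <<<'CODE'
<?php
if (DEBUG) {
    printf('a %s', $x);
}
function f($y) {
    printf('b %s', $y);
}
CODE;

print_r(unguardedPrintfLines($code));
```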
Control Flow Graphs
- Allow tracing of execution dependent on given inputs without running the application
- Trace data sinks back to their original source
Control Flow Graphs
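$result = login($_POST['user'], $_POST['password']);

function login($user, $password) {
    return login_query($user, $password);
}

function login_query($user, $password) {
    // Deliberately vulnerable: user input flows unescaped into the SQL sink.
    // $db is assumed to be an open mysqli connection.
    global $db;
    return mysqli_query($db, '
        select * from user
        where user=' . $user . '
        and password=' . $password . ';');
}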
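Tracing the sink in this example back to its sources can be sketched as a backwards walk over data-flow edges. The edge list below is written by hand for the login example (a real tool would derive it from the AST and control flow graph), and the node names and `traceSources` are hypothetical.

```php
<?php
// Hand-written data-flow edges for the login example: each key receives
// its value from the listed nodes.
$flowsFrom = [
    'mysqli_query:query'    => ['login_query:$user', 'login_query:$password'],
    'login_query:$user'     => ['login:$user'],
    'login_query:$password' => ['login:$password'],
    'login:$user'           => ["\$_POST['user']"],
    'login:$password'       => ["\$_POST['password']"],
];

// Walk backwards from a sink and collect every node without an incoming
// edge, i.e. the original sources of the data.
function traceSources(string $sink, array $flowsFrom): array
{
    $sources = [];
    $seen = [];
    $queue = [$sink];
    while ($queue !== []) {
        $node = array_shift($queue);
        if (isset($seen[$node])) {
            continue; // already visited
        }
        $seen[$node] = true;
        if (!isset($flowsFrom[$node])) {
            $sources[] = $node; // nothing flows into it: a source
            continue;
        }
        foreach ($flowsFrom[$node] as $origin) {
            $queue[] = $origin;
        }
    }
    return $sources;
}

print_r(traceSources('mysqli_query:query', $flowsFrom));
```

The query string is traced back to the two `$_POST` values, which is the evidence an analyser needs to report the SQL injection.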
Parsers - cons
- Higher upfront cost to develop
- More computationally intensive
Tools
https://github.com/exakat/php-static-analysis-tools
- Bugs finders
- Coding standards
- Fixers
- Metrics
- DIY
jakzal/phpqa
docker pull jakzal/phpqa
alias phpqa='docker run -it --rm -v "$(pwd)":/project -w /project jakzal/phpqa'
phpqa phploc .
https://github.com/jakzal/phpqa
sensiolabs/security-checker
https://github.com/sensiolabs/security-checker
phploc
https://github.com/sebastianbergmann/phploc
phploc 4.0.0 by Sebastian Bergmann.
Directories 826
Files 3695
Size
Lines of Code (LOC) 426055
Comment Lines of Code (CLOC) 102659 (24.10%)
Non-Comment Lines of Code (NCLOC) 323396 (75.90%)
Logical Lines of Code (LLOC) 103807 (24.36%)
Classes 89321 (86.05%)
Average Class Length 23
Minimum Class Length 0
Maximum Class Length 894
Average Method Length 3
Minimum Method Length 0
Maximum Method Length 143
Functions 257 (0.25%)
Average Function Length 0
Not in classes or functions 14229 (13.71%)
Cyclomatic Complexity
Average Complexity per LLOC 0.18
Average Complexity per Class 6.06
Minimum Class Complexity 1.00
Maximum Class Complexity 339.00
Average Complexity per Method 1.91
Minimum Method Complexity 1.00
Maximum Method Complexity 155.00
Dependencies
Global Accesses 187
Global Constants 12 (6.42%)
Global Variables 4 (2.14%)
Super-Global Variables 171 (91.44%)
Attribute Accesses 27479
Non-Static 26295 (95.69%)
Static 1184 (4.31%)
Method Calls 80455
Non-Static 77774 (96.67%)
Static 2681 (3.33%)
Structure
Namespaces 774
Interfaces 309
Traits 56
Classes 3397
Abstract Classes 181 (5.33%)
Concrete Classes 3216 (94.67%)
Methods 20996
Scope
Non-Static Methods 20369 (97.01%)
Static Methods 627 (2.99%)
Visibility
Public Methods 17603 (83.84%)
Non-Public Methods 3393 (16.16%)
Functions 1211
Named Functions 30 (2.48%)
Anonymous Functions 1181 (97.52%)
Constants 846
Global Constants 6 (0.71%)
Class Constants 840 (99.29%)
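The cyclomatic complexity figures in this report count decision points in the code. A rough approximation can be built on `token_get_all`; the token list below is an assumption and not necessarily the exact set phploc uses, and `approximateComplexity` is a hypothetical name.

```php
<?php
// Approximate cyclomatic complexity: 1 + number of decision points.
// Caveat: "?" is counted as a ternary, so nullable types such as ?string
// would be over-counted by this sketch.
function approximateComplexity(string $code): int
{
    $decisionTokens = [
        T_IF, T_ELSEIF, T_FOR, T_FOREACH, T_WHILE, T_CASE, T_CATCH,
        T_BOOLEAN_AND, T_BOOLEAN_OR, T_LOGICAL_AND, T_LOGICAL_OR,
    ];
    $complexity = 1;
    foreach (token_get_all($code) as $token) {
        if (is_array($token) && in_array($token[0], $decisionTokens, true)) {
            $complexity++;
        } elseif ($token === '?') {
            $complexity++; // ternary operator
        }
    }
    return $complexity;
}

$code = <<<'CODE'
<?php
function f($a, $b) {
    if ($a && $b) { return 1; }
    foreach ([1, 2] as $x) { echo $x; }
    return $a ? 2 : 3;
}
CODE;

echo approximateComplexity($code), "\n";
```

For the sample function the decision points are `if`, `&&`, `foreach` and the ternary, giving 1 + 4.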
phpmetrics/phpmetrics
Deptrac
# depfile.yml
paths:
  - ./src
exclude_files:
  - .*test.*
layers:
  - name: Domain
    collectors:
      - type: className
        regex: .*Domain.*
  - name: Application
    collectors:
      - type: className
        regex: .*Application.*
  - name: Infrastructure
    collectors:
      - type: className
        regex: .*Infrastructure.*
  - name: UserInterface
    collectors:
      - type: className
        regex: .*UserInterface.*
https://github.com/sensiolabs-de/deptrac
Deptrac
# depfile.yml
ruleset:
  Domain: ~
  Application:
    - Domain
  Infrastructure:
    - Application
    - Domain
  UserInterface:
    - Application
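The idea behind this ruleset can be sketched in a few lines: map each class to a layer, then check every dependency against the layers its source layer is allowed to use. This is an illustration of the concept, not Deptrac's implementation; `layerOf`, `findViolations`, and the sample class names are hypothetical.

```php
<?php
// The ruleset from depfile.yml: layer => layers it may depend on.
$ruleset = [
    'Domain'         => [],
    'Application'    => ['Domain'],
    'Infrastructure' => ['Application', 'Domain'],
    'UserInterface'  => ['Application'],
];

// Map a class name to the first layer whose name occurs in it, mirroring
// the ".*Layer.*" className regexes. Returns null for unmatched classes.
function layerOf(string $class, array $ruleset): ?string
{
    foreach (array_keys($ruleset) as $layer) {
        if (preg_match('/.*' . preg_quote($layer, '/') . '.*/', $class) === 1) {
            return $layer;
        }
    }
    return null;
}

// $dependencies is a list of [fromClass, toClass] pairs.
function findViolations(array $dependencies, array $ruleset): array
{
    $violations = [];
    foreach ($dependencies as [$from, $to]) {
        $fromLayer = layerOf($from, $ruleset);
        $toLayer = layerOf($to, $ruleset);
        if ($fromLayer === null || $toLayer === null || $fromLayer === $toLayer) {
            continue; // outside all layers, or inside a single layer
        }
        if (!in_array($toLayer, $ruleset[$fromLayer], true)) {
            $violations[] = "$from must not depend on $to ($fromLayer on $toLayer)";
        }
    }
    return $violations;
}

$dependencies = [
    ['App\Application\RegisterUser', 'App\Domain\User'],
    ['App\Domain\User', 'App\Infrastructure\DoctrineUserRepository'],
];

print_r(findViolations($dependencies, $ruleset));
```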
Deptrac
Start to create an AstMap for 119 Files.
.......................................................................................................................
AstMap created.
start emitting dependencies "InheritanceDependencyEmitter"
start emitting dependencies "BasicDependencyEmitter"
end emitting dependencies
start flatten dependencies
end flatten dependencies
collecting violations.
formatting dependencies.
Phpml\Classification\MLPClassifier::8 must not depend on Phpml\Math\Matrix (Classification on Math)
Phpml\Classification\MLPClassifier::22 must not depend on Phpml\Math\Matrix (Classification on Math)
Found 2 Violations
Continuous Integration
Commercial Tools
Scrutinizer
https://scrutinizer-ci.com/
Insight
https://insight.sensiolabs.com
Code climate
https://codeclimate.com
Other
- Bliss - Automatically reviews code in real-time and shows how much it's worth in lines of code.
- Checkmarx - Get a full PHP static security code analysis and prevent security vulnerabilities.
- Codacy - Automated code review.
- RIPS - The superior security software for PHP applications. Source code static analyser for vulnerabilities.
- SideCI - CI for automated code review by code analysis.
Summary
- Static analysis can provide low-cost security checks once configured
- ASTs and CFGs let you do all kinds of awesome things
- Automated code analysis complements traditional manual assessment
Q&A
Thanks for listening
@ ArkadiuszKondas
https://slides.com/arkadiuszkondas
https://joind.in/talk/07b5c
Constantly improve your PHP code
By Arkadiusz Kondas
When testing a PHP application, whether manually or automatically, programmers spend a lot of time debugging code that would not even compile in other languages. This leaves less time to test the real business logic, which should matter most. A number of interesting tools now use static analysis of the code to fully automate the Continuous Improvement process. This presentation is an objective overview of the latest developments that will allow you to continually improve the quality of the code in your projects.