Exploring new methods of code structure analysis to support understanding and evolution of complex IT systems

mgr inż. Krzysztof Borowski

dr hab. Bartosz Baliś

dr Tomasz Orzechowski

Presentation plan

  1. Maintenance and evolution of complex software programs

  2. Software development tools

    1. Understanding the source code

    2. Guarding code quality

  3. Why are the tools available today not sufficient?

  4. Semantic Code Graph as an abstract code representation model

  5. Various Semantic Code Graph applications

    1. Code visualization - Graph Buddy

    2. Monitoring software structure quality

  6. Future plans

Current state of software development

  • January 2020: GitHub reports over 190 million repositories and 40 million active users (in 2010: 1 million repositories)

  • Billions of lines of new code are estimated to be produced every year

https://medium.com/modern-stack/how-much-computer-code-has-been-written-c8c03100f459

Maintenance of complex software programs [1]

"The only constant in code is change"

  • Changing functional requirements
    • business evolution
    • not advancing means falling behind
  • Changing non-functional requirements 
    • Availability, scalability, reliability
    • Next-generation libraries
    • Next-generation languages
    • New technologies

A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.

https://en.wikipedia.org/wiki/Big_ball_of_mud

Properly maintained software enables rapid software evolution to address changing requirements.

Current software development tools

  • Understanding the source code
  • Guarding the code quality

Understanding the source code

  • IDE advanced features
    • syntax highlighting
    • integrated compilers
    • semantic browsing of the source code
    • debugging
  • IDE integrations
    • version control systems integrations
    • others (ticketing systems, database plugins, container plugins)
  • Visualizations (mostly UML diagrams)

Guarding the code quality [3]

  • Linters
    • guarding against common pitfalls
    • enforcing programming style
    • usually integrated with the compiler (e.g. Scalastyle, Scapegoat)
  • IDE suggestions and guidance (often based on integrated linters)
  • Tests (unit tests, integration tests) - more about code correctness than quality
  • Static code analysis on an external server (e.g. SonarQube)
    • gathering metrics from different tools
    • static code analysis
    • code coverage

Tools available today are not sufficient

  • Understanding project semantic structure

    • folders and files
    • browsing flatly structured files
    • invisible coupling
  • Maintaining the project semantic structure

    • keeping the code modular
    • advising on project structure
    • guarding against accidental complexity (cyclic dependencies, highly coupled modules and classes)
    • guarding against code structure degradation over time
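
One of the checks listed above, guarding against cyclic dependencies, can be sketched as a depth-first search over a module dependency graph. This is a minimal illustration only: the module names and the `find_cycle` helper are hypothetical and are not taken from any of the tools mentioned in this presentation.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {module: WHITE for module in deps}
    path: List[str] = []

    def visit(module: str) -> Optional[List[str]]:
        color[module] = GREY
        path.append(module)
        for dep in deps.get(module, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: dep is already on the current path, so we found a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[module] = BLACK
        return None

    for module in list(deps):
        if color[module] == WHITE:
            found = visit(module)
            if found:
                return found
    return None


# Hypothetical module graph: "service" and "repository" depend on each other.
deps = {
    "controller": ["service"],
    "service": ["repository"],
    "repository": ["service"],
    "model": [],
}
cycle = find_cycle(deps)
acyclic = find_cycle({"controller": ["service"], "service": ["model"], "model": []})
```

Here `cycle` lists the modules on the offending path, which is the kind of information a structure-guarding tool could report back to the developer.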

Understanding and visualization

  • Sourcetrail - https://www.sourcetrail.com/documentation/ (C++, Java) (free)
  • NDepend - https://www.ndepend.com (C#) (paid)
  • JArchitect - https://www.jarchitect.com (Java) (paid)

Guarding project structure

  • ArchUnit - https://www.archunit.org
    • Java-specific (bytecode and reflection)
    • Should be applied from the very beginning
    • Helps with maintenance, not with understanding the current state
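import com.tngtech.archunit.lang.ArchRule;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.classes;

ArchRule myRule = classes()
    .that().resideInAPackage("..service..")
    .should().onlyBeAccessed().byAnyPackage("..controller..", "..service..");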

Abstract code representations [2]

  • Abstract Syntax Tree (AST)
  • Control Flow Graph
  • Program Dependence Graph
  • Code Property Graph 

Abstract Syntax Tree

Control Flow Graph

Program Dependence Graph

Code Property Graph [4]

Semantic code structure representation

  • Representing code syntax structure
  • Representing code semantic structure
class A()

object AFactory {
  def createA() = {
    new A()
  }
}

Semantic Code Graph (Scala)

class A()
object AFactory {
  def createA() = {
    new A()
  }
}

Hypothesis:

The code structure and dependencies of any program can be represented as a directed graph that is precise enough to be valuable in various analyses and visualizations.
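
To make the idea of a directed graph over code concrete, the `A` / `AFactory` snippet can be written down as a handful of nodes and typed edges. This is only a sketch: the node identifiers, node kinds and edge types (`DECLARATION`, `CALL`, `RETURN_TYPE`) are assumptions chosen for illustration, not the vocabulary actually used by the Semantic Code Graph.

```python
# Nodes: id -> kind. Kinds are illustrative labels, not a fixed vocabulary.
nodes = {
    "A": "CLASS",
    "A.<init>": "CONSTRUCTOR",
    "AFactory": "OBJECT",
    "AFactory.createA": "METHOD",
}

# Directed edges: (source, edge type, target). Edge types are assumptions.
edges = [
    ("A", "DECLARATION", "A.<init>"),
    ("AFactory", "DECLARATION", "AFactory.createA"),
    ("AFactory.createA", "CALL", "A.<init>"),
    ("AFactory.createA", "RETURN_TYPE", "A"),
]


def incoming(target):
    """All (source, edge type) pairs pointing at the given node."""
    return sorted((src, kind) for src, kind, dst in edges if dst == target)


users_of_a = incoming("A")
users_of_constructor = incoming("A.<init>")
```

Queries such as `incoming("A.<init>")` answer the question "who uses this constructor?", a dependency that is invisible when browsing folders and files.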

Semantic Code Graph

Common format

syntax = "proto3";

message Location {
    string uri = 1;
    int32 startLine = 2;
    int32 startCharacter = 3;
    int32 endLine = 4;
    int32 endCharacter = 5;
}

message Edge {
    string to = 1;
    string type = 2;
    Location location = 3;
    map<string, string> properties = 4;
}

message GraphNode {
    string id = 1;
    string kind = 2;
    Location location = 3;
    map<string, string> properties = 4;
    string displayName = 5;
    repeated Edge edges = 6;
}

message SemanticGraphFile {
    string uri = 1;
    repeated GraphNode nodes = 2;
}
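
The schema above appears to store each edge on its source node, with `to` holding the identifier of the target node. The sketch below mirrors the four messages as plain Python dataclasses (no protobuf runtime involved) and flattens a file into an edge list. The `edge_list` helper, the file name `A.scala`, and the node kinds and edge type used in the example data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Location:
    uri: str
    startLine: int
    startCharacter: int
    endLine: int
    endCharacter: int


@dataclass
class Edge:
    to: str
    type: str
    location: Optional[Location] = None
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class GraphNode:
    id: str
    kind: str
    location: Optional[Location] = None
    properties: Dict[str, str] = field(default_factory=dict)
    displayName: str = ""
    edges: List[Edge] = field(default_factory=list)


@dataclass
class SemanticGraphFile:
    uri: str
    nodes: List[GraphNode] = field(default_factory=list)


def edge_list(files: List[SemanticGraphFile]) -> List[Tuple[str, str, str]]:
    """Flatten per-node outgoing edges into (source id, edge type, target id)."""
    return [(node.id, edge.type, edge.to)
            for f in files for node in f.nodes for edge in node.edges]


# Hypothetical content for the A / AFactory example.
graph_file = SemanticGraphFile(
    uri="A.scala",
    nodes=[
        GraphNode(id="A", kind="CLASS", displayName="A"),
        GraphNode(
            id="AFactory.createA",
            kind="METHOD",
            displayName="createA",
            edges=[Edge(to="A", type="CALL")],
        ),
    ],
)
flat = edge_list([graph_file])
```

A flat edge list like this is a convenient input for the graph metrics discussed later, independent of the language the graph was extracted from.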

Various Semantic Code Graph applications

Visual help

Graph Buddy - demo

Semantic Code Graph visualization seamlessly integrated with the IDE

Graph Buddy

Software structure monitoring [6]

  • Particular nodes
    • Betweenness centrality
    • Clustering coefficient
    • Modularity class
  • Whole graph
    • Finding communities
    • Average node degree
    • Graph diameter
    • Modularity 
    • Graph depth
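
Two of the whole-graph metrics above, average node degree and graph diameter, can be computed with a few lines of breadth-first search. The sketch assumes the graph is treated as undirected and connected; the module names are hypothetical, and a real analysis would more likely rely on a graph library.

```python
from collections import deque


def undirected(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node (2 * |E| / |V| for a simple graph)."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


def bfs_distances(adj, start):
    """Shortest-path length (in edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(adj):
    """Longest shortest path between any two nodes of a connected graph."""
    return max(max(bfs_distances(adj, node).values()) for node in adj)


# Hypothetical dependency edges between four modules.
adj = undirected([
    ("controller", "service"),
    ("service", "repository"),
    ("service", "model"),
    ("repository", "model"),
])
avg = average_degree(adj)
diam = diameter(adj)
```

Tracking how values like `avg` and `diam` drift between releases is one simple way to watch for structural degradation over time.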

Betweenness centrality

\[ g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]

where \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\) and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).
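
Betweenness centrality sums, over all pairs of nodes \(s\) and \(t\), the fraction of shortest paths between them that pass through a given node \(v\). A minimal sketch of how it could be computed for an unweighted directed graph, using Brandes' algorithm, is shown below. The example graph is hypothetical, and every node is assumed to appear as a key of the adjacency map.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for a directed graph {node: [successors]}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # w is seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical graph: a diamond a -> {b, c} -> d followed by d -> e.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
scores = betweenness(graph)
```

In this example node `d` scores highest because every path into `e` runs through it, which is the kind of structural bottleneck the metric is meant to expose.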

Where we want to be

Future plans

  • Graph Buddy plugin evaluation
  • Metrics-based tips in the IDE
  • Monitoring project evolution [5]
  • Integration with popular CI systems
  • Integration with SonarQube

 Bibliography

[1] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus, “Does code decay? Assessing the evidence from change management data,” IEEE Transactions on Software Engineering, vol. 27, no. 1, pp. 1–12, 2001.

[2] V. Arora, R. Bhatia, and M. Singh, “Evaluation of Flow Graph and Dependence Graphs for Program Representation,” International Journal of Computer Applications, vol. 56, pp. 18–23, Oct. 2012. doi: 10.5120/8959-3161.

[3] J. Bohnet and J. Döllner, “Monitoring code quality and development activity by software maps,” Proceedings - International Conference on Software Engineering, Jan. 2011.

[4] F. Yamaguchi et al., “Modeling and Discovering Vulnerabilities with Code Property Graphs,” in 2014 IEEE Symposium on Security and Privacy, 2014, pp. 590–604. doi: 10.1109/SP.2014.44.

[5] P. Bhattacharya et al., “Graph-based analysis and prediction for software evolution,” in 2012 34th International Conference on Software Engineering (ICSE), June 2012, pp. 419–429.

[6] Walunj et al., “GraphEvo: Characterizing and Understanding Software Evolution using Call Graphs,” in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 4799–4807.

By liosedhel