Exploring new methods of code structure analysis to support understanding and evolution of complex IT systems

mgr inż. Krzysztof Borowski

dr hab. Bartosz Baliś

dr Tomasz Orzechowski

Presentation plan

  1. Modern software development

  2. The current state of the art in code understanding

  3. Semantic Code Graph as a semantics-aware representation of code structure

  4. Semantic Code Graph applications

Current state of software development

  • January 2020: GitHub reports over 190 million repositories and 40 million active users (in 2010: 1 million repositories)

  • An estimated billions of lines of new code are produced every year

https://medium.com/modern-stack/how-much-computer-code-has-been-written-c8c03100f459

A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.

https://en.wikipedia.org/wiki/Big_ball_of_mud

Properly maintained software enables rapid software evolution to address changing requirements.

In 2021, IT spending on enterprise software is expected to amount to around 601 billion U.S. dollars worldwide, a growth of 13.6 percent from the previous year.

Current software development tools

Understanding the source code

  • IDE advanced features
    • syntax highlights
    • integrated compilers
    • semantically browsing the source code
    • debugging
  • IDE integrations
    • version control systems integrations
    • others (ticketing systems, database plugins, containers plugins)
  • Visualizations (mostly UML diagrams)

Guarding the code quality [3]

  • Linters
    • guarding against common pitfalls
    • enforcing programming style
    • usually integrated with the compiler (e.g. scalastyle, scapegoat)
  • IDE suggestions and guidance (often based on integrated linters)
  • Tests (unit tests, integration tests) - more about code correctness than quality
  • Static code analysis on an external server (SonarQube)
    • gathering metrics from different tools
    • static code analysis
    • code coverage

Understanding and visualization

  • Sourcetrail - https://www.sourcetrail.com/documentation/ (C++, Java) (free)
  • NDepend - https://www.ndepend.com (C#) (paid)
  • JArchitect - https://www.jarchitect.com (paid)

Guarding project structure

  • ArchUnit - https://www.archunit.org
    • Java-specific (bytecode and reflection)
    • Should be applied from the very beginning
    • Helps with maintenance, not with understanding the current state

ArchRule myRule = classes()
    .that().resideInAPackage("..service..")
    .should().onlyBeAccessed().byAnyPackage("..controller..", "..service..");

Can the tools be better?

  • Understanding project semantic structure

    • folders and files
    • browsing flatly structured files
    • invisible coupling
  • Maintaining the project semantic structure

    • keeping the code modular
    • advising on project structure
    • guarding against accidental complexity (cyclic dependencies, highly coupled modules and classes)
    • guarding against code structure degradation over time
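
As an illustration of guarding against cyclic dependencies, the sketch below searches a module dependency graph for a cycle with a depth-first traversal. The `DependencyCycles` object, the `Graph` alias, and the module names in the usage note are hypothetical and only serve the example; they are not part of any tool mentioned here.

```scala
object DependencyCycles {
  // Dependency graph: each module maps to the modules it depends on.
  type Graph = Map[String, Set[String]]
  // Left = a cycle was found, Right = set of fully explored nodes so far.
  private type Result = Either[List[String], Set[String]]

  /** Returns one dependency cycle (first node repeated at the end), if any exists. */
  def findCycle(graph: Graph): Option[List[String]] = {
    // `path` holds the nodes on the current DFS branch, most recent first.
    def visit(node: String, path: List[String], done: Set[String]): Result =
      if (path.contains(node)) {
        // Back edge: keep only the part of the path that forms the cycle.
        Left((node :: path.takeWhile(_ != node) ::: List(node)).reverse)
      } else if (done(node)) Right(done)
      else {
        val deps = graph.getOrElse(node, Set.empty[String]).toList.sorted
        deps.foldLeft[Result](Right(done)) {
          case (Right(d), dep) => visit(dep, node :: path, d)
          case (found, _)      => found
        }.map(_ + node)
      }

    graph.keys.toList.sorted.foldLeft[Result](Right(Set.empty[String])) {
      case (Right(d), n) => visit(n, Nil, d)
      case (found, _)    => found
    }.swap.toOption
  }
}
```

Calling `DependencyCycles.findCycle` on a graph where `api` depends on `service`, `service` on `repository`, and `repository` on `api` reports that loop; an acyclic graph yields `None`.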

Can we represent the source code differently?

Abstract code representations [2]

  • Abstract Syntax Tree (AST)
  • Control Flow Graph
  • Data Flow Graph
  • Program Dependence Graph
  • Code Property Graph 

Abstract Syntax Tree

Control Flow Graph

Program Dependence Graph

CFG + DFG

Code Property Graph [4]

CFG + PDG + AST

Call Graph

Semantic code structure representation

  • Representing code syntax structure
  • Representing code semantic structure
class A()

object AFactory {
  def createA() = {
    new A()
  }
}

Semantic Code Graph (Scala)

class A()
object AFactory {
  def createA() = {
    new A()
  }
}

Semantic Code Graph - Code Structure

class A() {
  def runA(): Unit = ??? // body elided
  def run(): Unit = ???  // body elided
}
object AFactory {
  def createA() = {
    new A()
  }
}

def main() = {
  AFactory.createA().runA()
}
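
One way to picture the graph behind the snippet above is as plain lists of nodes and edges. This is a minimal sketch: the `Node` and `Edge` case classes and the kind and edge-type labels (`CLASS`, `METHOD`, `DECLARATION`, `CALL`) are assumptions made for illustration, not the vocabulary actually used by the Semantic Code Graph.

```scala
object ExampleGraph {
  // Labels below are illustrative assumptions, not the real SCG vocabulary.
  final case class Node(id: String, kind: String)
  final case class Edge(from: String, to: String, edgeType: String)

  val nodes: List[Node] = List(
    Node("A", "CLASS"),
    Node("A.runA", "METHOD"),
    Node("A.run", "METHOD"),
    Node("AFactory", "OBJECT"),
    Node("AFactory.createA", "METHOD"),
    Node("main", "METHOD")
  )

  val edges: List[Edge] = List(
    Edge("A", "A.runA", "DECLARATION"),
    Edge("A", "A.run", "DECLARATION"),
    Edge("AFactory", "AFactory.createA", "DECLARATION"),
    Edge("AFactory.createA", "A", "CALL"), // new A() invokes the constructor of A
    Edge("main", "AFactory.createA", "CALL"),
    Edge("main", "A.runA", "CALL")
  )

  /** Methods that are not the target of any CALL edge. */
  def uncalledMethods: List[String] = {
    val called = edges.filter(_.edgeType == "CALL").map(_.to).toSet
    nodes.filter(_.kind == "METHOD").map(_.id).filterNot(called)
  }
}
```

Even on this tiny graph a simple query is informative: `ExampleGraph.uncalledMethods` returns `A.run` (never called) and `main` (the entry point).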

Hypothesis:

The code structure and dependencies of any program can be represented as a directed graph that is precise enough to be valuable in various analyses and visualizations.

Semantic Code Graph extraction

  • Semantic Graphs Scalac Compiler Plugin - published
  • Proto files generated during compilation are available for later static analysis
  • Java Semantic Graphs Extractor - in progress

Common format

syntax = "proto3";

message Location {
    string uri = 1;
    int32 startLine = 2;
    int32 startCharacter = 3;
    int32 endLine = 4;
    int32 endCharacter = 5;
}

message Edge {
    string to = 1;
    string type = 2;
    Location location = 3;
    map<string, string> properties = 4;
}

message GraphNode {
    string id = 1;
    string kind = 2;
    Location location = 3;
    map<string, string> properties = 4;
    string displayName = 5;
    repeated Edge edges = 6;
}

message SemanticGraphFile {
    string uri = 1;
    repeated GraphNode nodes = 2;
}
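
To show how files in this format could feed later analyses, the sketch below mirrors the four messages as hand-written Scala case classes and flattens a set of files into an adjacency map. In practice the classes would more likely be generated from the schema by a protobuf code generator. The `ScgModel` object, the `edgeType` field name (standing in for `type`), and the `adjacency` helper are assumptions made for this example.

```scala
object ScgModel {
  // Hand-written mirrors of the proto messages; message-typed fields are optional in proto3.
  final case class Location(uri: String, startLine: Int, startCharacter: Int,
                            endLine: Int, endCharacter: Int)
  final case class Edge(to: String, edgeType: String,
                        location: Option[Location] = None,
                        properties: Map[String, String] = Map.empty)
  final case class GraphNode(id: String, kind: String,
                             location: Option[Location] = None,
                             properties: Map[String, String] = Map.empty,
                             displayName: String = "",
                             edges: Seq[Edge] = Seq.empty)
  final case class SemanticGraphFile(uri: String, nodes: Seq[GraphNode])

  /** Merges all files into a map from node id to the ids its edges point to. */
  def adjacency(files: Seq[SemanticGraphFile]): Map[String, Set[String]] =
    files.flatMap(_.nodes)
      .groupBy(_.id)
      .map { case (id, sameId) => id -> sameId.flatMap(_.edges.map(_.to)).toSet }
}
```

Because edges are stored on their source node, a node that appears in several files contributes the union of its outgoing edges, and a node without edges still shows up with an empty target set.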

Various Semantic Code Graph applications

Framework for working with the code

  1. Learning about the software (big picture)
  2. Interactively browsing the source code graph
  3. Monitoring code evolution

 

Learning about the software

(big picture)

Visual help - metals library

Visual help - betweenness

Visual help - Eigenvector

Visual help - LOC

Semantic Graphs Analytic

Analyzing and combining centralities:

  1. Degree Centrality
  2. Closeness Centrality
  3. Betweenness Centrality*
  4. Eigenvector Centrality
    + LOC metric

Finding noticeable endpoints via Eccentricity
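
Two of these measures are simple enough to sketch directly. The example below computes degree (in-degree plus out-degree) and eccentricity on a directed graph stored as an adjacency map. The `Centrality` object and the choice to measure eccentricity over reachable nodes only are assumptions for illustration, not a description of the analytics tool itself.

```scala
object Centrality {
  // Directed graph: each node maps to the nodes its edges point to.
  type Graph = Map[String, Set[String]]

  def allNodes(g: Graph): Set[String] = g.keySet ++ g.values.flatten

  /** In-degree plus out-degree of every node. */
  def degree(g: Graph): Map[String, Int] = {
    val out = g.map { case (n, targets) => n -> targets.size }
    val in = g.values.flatten.toList.groupBy(identity).map { case (n, xs) => n -> xs.size }
    allNodes(g).map(n => n -> (out.getOrElse(n, 0) + in.getOrElse(n, 0))).toMap
  }

  /** Breadth-first distances from `source` to every node reachable from it. */
  def distances(g: Graph, source: String): Map[String, Int] = {
    @annotation.tailrec
    def loop(frontier: Set[String], dist: Map[String, Int], d: Int): Map[String, Int] =
      if (frontier.isEmpty) dist
      else {
        val next = frontier.flatMap(n => g.getOrElse(n, Set.empty[String])) -- dist.keySet
        loop(next, dist ++ next.map(_ -> (d + 1)), d + 1)
      }
    loop(Set(source), Map(source -> 0), 0)
  }

  /** Longest shortest-path distance from `node` to any node reachable from it. */
  def eccentricity(g: Graph, node: String): Int = distances(g, node).values.max
}
```

On a call graph, a node with high eccentricity sits at the start of a long chain of calls, which is one reading of a "noticeable endpoint".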

Betweenness centrality

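The betweenness centrality of a node \(v\) is

\[ g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]

where \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\) and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).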
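
Betweenness centrality as defined here can be computed without enumerating every shortest path. The sketch below uses Brandes' algorithm for an unweighted directed graph: one breadth-first search per source node, followed by a backward pass that accumulates path fractions. The `Betweenness` object and `Graph` alias are hypothetical names, and this is not necessarily how the presented tooling computes the metric.

```scala
import scala.collection.mutable

object Betweenness {
  // Directed, unweighted graph: each node maps to its successors.
  type Graph = Map[String, Set[String]]

  def betweenness(g: Graph): Map[String, Double] = {
    val nodes = g.keySet ++ g.values.flatten
    val score = mutable.Map.empty[String, Double].withDefaultValue(0.0)

    for (s <- nodes) {
      val sigma = mutable.Map(s -> 1.0).withDefaultValue(0.0) // shortest-path counts from s
      val dist  = mutable.Map(s -> 0)                         // BFS distance from s
      val preds = mutable.Map.empty[String, List[String]].withDefaultValue(Nil)
      val order = mutable.ArrayBuffer.empty[String]           // visit order, nearest first
      val queue = mutable.Queue(s)

      while (queue.nonEmpty) {
        val v = queue.dequeue()
        order += v
        for (w <- g.getOrElse(v, Set.empty[String])) {
          if (!dist.contains(w)) { dist(w) = dist(v) + 1; queue.enqueue(w) }
          // v precedes w on a shortest path from s
          if (dist(w) == dist(v) + 1) { sigma(w) += sigma(v); preds(w) = v :: preds(w) }
        }
      }

      // Backward pass: push dependencies from the farthest nodes towards s.
      val delta = mutable.Map.empty[String, Double].withDefaultValue(0.0)
      for (w <- order.reverseIterator) {
        for (v <- preds(w)) delta(v) += sigma(v) / sigma(w) * (1 + delta(w))
        if (w != s) score(w) += delta(w)
      }
    }
    nodes.map(n => n -> score(n)).toMap
  }
}
```

The result sums over ordered pairs of nodes, which suits a directed graph; for an undirected graph each pair would be counted twice, so the scores would need to be halved.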

Project insights - spark

Project insights - akka

Interactively browsing the source code graph

Graph Buddy

Semantic Code Graph visualization seamlessly integrated with the IDE

Call hierarchy

Code structure

Architecture

Code evolution

  • Average node degree
  • Graph diameter
  • Modularity 
  • Graph depth
  • Average clustering coefficient
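
Two of the listed metrics can be sketched for a graph treated as undirected: average node degree and average clustering coefficient. Tracking such numbers across releases is one way to observe structural drift. The `EvolutionMetrics` object, the edge-set representation, and the convention of assigning a coefficient of 0 to nodes with fewer than two neighbours are assumptions made for this example.

```scala
object EvolutionMetrics {
  // Undirected graph given as a set of edges between node ids.
  type Edges = Set[(String, String)]

  /** Neighbour sets for every node that appears in at least one non-loop edge. */
  def neighbours(edges: Edges): Map[String, Set[String]] =
    edges.toList
      .filter { case (a, b) => a != b }
      .flatMap { case (a, b) => List(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (n, pairs) => n -> pairs.map(_._2).toSet }

  /** Mean number of neighbours per node. */
  def averageDegree(edges: Edges): Double = {
    val adj = neighbours(edges)
    if (adj.isEmpty) 0.0 else adj.values.map(_.size).sum.toDouble / adj.size
  }

  /** Share of a node's neighbour pairs that are themselves connected. */
  def clustering(adj: Map[String, Set[String]], node: String): Double = {
    val ns = adj.getOrElse(node, Set.empty[String]).toList
    val k = ns.size
    if (k < 2) 0.0 // assumption: coefficient is 0 when no neighbour pair exists
    else {
      val links = ns.combinations(2).count(pair => adj(pair.head).contains(pair(1)))
      links.toDouble / (k * (k - 1) / 2)
    }
  }

  /** Mean of the local clustering coefficients over all nodes. */
  def averageClustering(edges: Edges): Double = {
    val adj = neighbours(edges)
    if (adj.isEmpty) 0.0 else adj.keys.toList.map(clustering(adj, _)).sum / adj.size
  }
}
```

For a triangle `a`, `b`, `c` with one extra node `d` attached to `c`, the average degree is 2 and only `c` has a clustering coefficient below 1 among the triangle members, since its neighbour `d` is not connected to `a` or `b`.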

Ideas yet to be explored

  • Architecture Release Guard
  • Architecture Code Guard

Towards better modularization

Future plans

  • Graph Buddy plugin evaluation
  • Metrics-based tips in the IDE
  • Monitoring project evolution [5]
  • Integration with popular CI systems
  • Integration with SonarQube

Bibliography

[1] A. Bandi, B. J. Williams, and E. B. Allen, “Empirical evidence of code decay: A systematic mapping study,” in 2013 20th Working Conference on Reverse Engineering (WCRE), 2013, pp. 341–350.

[2] R. Arora and S. Goel, “JavaRelationshipGraphs (JRG): Transforming Java projects into graphs using Neo4j graph databases,” in Proceedings of the 2nd International Conference on Software Engineering and Information Management, ser. ICSIM 2019. New York, NY, USA: Association for Computing Machinery, 2019, pp. 80–84. [Online]. Available: https://doi.org/10.1145/3305160.3305173

[3] V. Walunj, G. Gharibi, D. H. Ho, and Y. Lee, “GraphEvo: Characterizing and understanding software evolution using call graphs,” in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 4799–4807.

[4] L. Bedu, O. Tinh, and F. Petrillo, “A tertiary systematic literature review on software visualization,” in 2019 Working Conference on Software Visualization (VISSOFT), 2019, pp. 33–44.

[5] J. Ferrante, K. J. Ottenstein, and J. D. Warren, “The program dependence graph and its use in optimization,” ACM Trans. Program. Lang. Syst., vol. 9, no. 3, pp. 319–349, Jul. 1987. [Online]. Available: https://doi.org/10.1145/24039.24041

[6] B. G. Ryder, “Constructing the call graph of a program,” IEEE Transactions on Software Engineering, vol. SE-5, no. 3, pp. 216–226, May 1979.

[7] D. Grove, G. DeFouw, J. Dean, and C. Chambers, “Call graph construction in object-oriented languages,” SIGPLAN Not., vol. 32, no. 10, pp. 108–124, Oct. 1997. [Online]. Available: https://doi.org/10.1145/263700.264352

[8] J. Bohnet and J. Döllner, “Visual exploration of function call graphs for feature location in complex software systems,” in Proceedings of the 2006 ACM Symposium on Software Visualization, ser. SoftVis ’06. New York, NY, USA: Association for Computing Machinery, 2006, pp. 95–104. [Online]. Available: https://doi.org/10.1145/1148493.1148508

[9] G. Shu, B. Sun, T. A. D. Henderson, and A. Podgurski, “JavaPDG: A new platform for program dependence analysis,” in 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 2013, pp. 408–415.

Exploring new methods of code structure analysis 2021

By liosedhel
