Exploring new methods of code structure analysis to support understanding and evolution of complex IT systems
mgr inż. Krzysztof Borowski
dr hab. Bartosz Baliś
dr Tomasz Orzechowski


Presentation plan
- Modern software development
- The current state of the art in code understanding space
- Semantic Code Graph as a semantic aware code structure representation
- Semantic Code Graph applications


Current state of software development


- January 2020: GitHub reports over 190 million repositories and 40 million active users (in 2010: 1 million repositories)
- An estimated billions of lines of new code are produced yearly

A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.
https://en.wikipedia.org/wiki/Big_ball_of_mud


Properly maintained software enables rapid software evolution to address changing requirements.


In 2021, IT spending on enterprise software is expected to amount to around 601 billion U.S. dollars worldwide, a growth of 13.6 percent from the previous year.
Current software development tools


Understanding the source code
- IDE advanced features
  - syntax highlighting
  - integrated compilers
  - semantic browsing of the source code
  - debugging
- IDE integrations
  - version control system integrations
  - others (ticketing systems, database plugins, container plugins)
- Visualizations (mostly UML diagrams)


Guarding the code quality [3]
- Linters
  - guarding against common pitfalls
  - enforcing programming style
  - usually integrated with the compiler (e.g. scalastyle, scapegoat)
- IDE suggestions and guidance (often based on integrated linters)
- Tests (unit tests, integration tests) - more about code correctness than quality
- Static code analysis on an external server (SonarQube)
  - gathering metrics from different tools
  - static code analysis
  - code coverage


Understanding and visualization


- Sourcetrail - https://www.sourcetrail.com/documentation/ (C++, Java) (free)
- NDepend - https://www.ndepend.com (C#) (paid)
- JArchitect - https://www.jarchitect.com (paid)


Guarding project structure
- ArchUnit - https://www.archunit.org
  - Java-specific (bytecode and reflection)
  - Should be applied from the very beginning
  - Helps with maintenance, not with understanding the current state


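import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.classes;
import com.tngtech.archunit.lang.ArchRule;

// Service classes may only be accessed from the controller and service packages.
ArchRule myRule = classes()
    .that().resideInAPackage("..service..")
    .should().onlyBeAccessed().byAnyPackage("..controller..", "..service..");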
Can the tools be better?
- Understanding project semantic structure
  - folders and files
  - browsing flatly structured files
  - invisible coupling
- Maintaining the project semantic structure
  - keeping the code modular
  - advising on project structure
  - guarding against accidental complexity (cyclic dependencies, highly coupled modules and classes)
  - guarding against code structure degradation over time
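
As an illustration of guarding against cyclic dependencies, the check can be sketched as a depth-first search over a module dependency graph. This is a minimal sketch, not the tooling described in this talk: the `CycleGuard` object, the `findCycle` helper, and the module names are all hypothetical.

```scala
object CycleGuard {
  // Hypothetical module dependency graph: module -> modules it depends on.
  type Graph = Map[String, Set[String]]

  /** Returns one dependency cycle as a path of modules, if the graph contains any. */
  def findCycle(graph: Graph): Option[List[String]] = {
    // Nodes whose whole subtree was explored without finding a cycle.
    var done = Set.empty[String]

    // `path` holds the current DFS stack, most recent node first.
    def visit(node: String, path: List[String]): Option[List[String]] =
      if (path.contains(node)) {
        // Back edge: rebuild the cycle in dependency order, e.g. a, b, c, a.
        Some(node :: (node :: path.takeWhile(_ != node)).reverse)
      } else if (done(node)) None
      else {
        val found = graph.getOrElse(node, Set.empty[String]).toList.sorted
          .iterator.map(visit(_, node :: path)).collectFirst { case Some(c) => c }
        if (found.isEmpty) done += node
        found
      }

    graph.keys.toList.sorted.iterator.map(visit(_, Nil)).collectFirst { case Some(c) => c }
  }

  // Made-up example: api -> service -> repository -> api forms a cycle.
  val example: Graph = Map(
    "api" -> Set("service"),
    "service" -> Set("repository"),
    "repository" -> Set("api"),
    "util" -> Set.empty[String]
  )
}
```

A build step could fail whenever `CycleGuard.findCycle` returns a non-empty result, which is one way to stop structure degradation from creeping in unnoticed.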


Can we represent the source code differently?



Abstract code representations [2]
- Abstract Syntax Tree (AST)
- Control Flow Graph
- Data Flow Graph
- Program Dependence Graph
- Code Property Graph



Abstract Syntax Tree





Control Flow Graph


Program Dependence Graph
CFG + DFG



Code Property Graph [4]
CFG + PDG + AST



Call Graph



Semantic code structure representation
- Representing code syntax structure
- Representing code semantic structure
class A()
object AFactory {
  def createA() = {
    new A()
  }
}


Semantic Code Graph (Scala)
class A()
object AFactory {
  def createA() = {
    new A()
  }
}



Semantic Code Graph - Code Structure
class A() {
  def runA(): Unit = ...
  def run(): Unit = ...
}
object AFactory {
  def createA() = {
    new A()
  }
}
def main() = {
  AFactory.createA().runA()
}



Hypothesis:
The code structure and dependencies of any program can be represented as a directed graph that is precise enough to be valuable in various analyses and visualizations.


Semantic Code Graph extraction
- Semantic Graphs Scalac Compiler Plugin - published
- proto files generated during compilation are available for later static analysis
- Java Semantic Graphs Extractor - in progress

Common format
syntax = "proto3";
message Location {
  string uri = 1;
  int32 startLine = 2;
  int32 startCharacter = 3;
  int32 endLine = 4;
  int32 endCharacter = 5;
}
message Edge {
  string to = 1;
  string type = 2;
  Location location = 3;
  map<string, string> properties = 4;
}
message GraphNode {
  string id = 1;
  string kind = 2;
  Location location = 3;
  map<string, string> properties = 4;
  string displayName = 5;
  repeated Edge edges = 6;
}
message SemanticGraphFile {
  string uri = 1;
  repeated GraphNode nodes = 2;
}
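
To make the format more concrete, the messages above could be mirrored in Scala roughly as follows and filled in for the earlier `class A` / `object AFactory` example. This is only a sketch: the node ids, the kinds (`CLASS`, `OBJECT`, `METHOD`) and the edge types (`DECLARATION`, `CALL`) are assumptions made for illustration and are not taken from the published plugin; the `properties` maps are omitted for brevity.

```scala
object ScgSketch {
  // Case classes mirroring the proto messages (the proto field `type` is named edgeType here).
  final case class Location(uri: String, startLine: Int, startCharacter: Int,
                            endLine: Int, endCharacter: Int)
  final case class Edge(to: String, edgeType: String, location: Option[Location] = None)
  final case class GraphNode(id: String, kind: String, displayName: String,
                             edges: Seq[Edge] = Nil, location: Option[Location] = None)
  final case class SemanticGraphFile(uri: String, nodes: Seq[GraphNode])

  // Hypothetical graph for: class A(); object AFactory { def createA() = { new A() } }
  val file: SemanticGraphFile = SemanticGraphFile("A.scala", Seq(
    GraphNode("A#", "CLASS", "A"),
    GraphNode("AFactory.", "OBJECT", "AFactory",
      Seq(Edge("AFactory.createA().", "DECLARATION"))),
    GraphNode("AFactory.createA().", "METHOD", "createA",
      Seq(Edge("A#", "CALL")))
  ))

  /** Flattens a file into (from, edgeType, to) triples, a convenient input for graph analyses. */
  def triples(f: SemanticGraphFile): Seq[(String, String, String)] =
    for (n <- f.nodes; e <- n.edges) yield (n.id, e.edgeType, e.to)
}
```

Keeping edges inside their source node, as the proto format does, means a single file can be read and analysed without loading the rest of the project.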



Various Semantic Code Graph applications




Framework for working with the code
- Learning about the software (big picture)
- Interactively browsing the source code graph
- Monitoring code evolution
Learning about the software (big picture)
Visual help - metals library



Visual help - betweenness



Visual help - Eigenvector



Visual help - LOC



Semantic Graphs Analytic
Analyzing and combining centralities:
- Degree Centrality
- Closeness Centrality
- Betweenness Centrality*
- Eigenvector Centrality
+ LOC metric
Finding noticeable endpoints via eccentricity
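
Two of the measures listed above, degree centrality and eccentricity, can be sketched on a small directed graph as follows. This is a minimal sketch rather than the analytics tool itself: the `CentralitySketch` object and its helpers are hypothetical, and the example graph is hand-built from the earlier `A` / `AFactory` / `main` snippet with assumed edges.

```scala
object CentralitySketch {
  import scala.annotation.tailrec

  // Directed graph as adjacency sets; every node appears as a key.
  type Graph = Map[String, Set[String]]

  /** (in-degree + out-degree) per node, normalised by n - 1. */
  def degreeCentrality(g: Graph): Map[String, Double] = {
    val inDegree = g.values.toList.flatten.groupBy(identity).map { case (k, v) => k -> v.size }
    val norm = math.max(g.size - 1, 1)
    g.map { case (node, out) =>
      node -> (out.size + inDegree.getOrElse(node, 0)).toDouble / norm
    }
  }

  /** BFS distances (in edges) from `start` to every node reachable from it. */
  def distances(g: Graph, start: String): Map[String, Int] = {
    @tailrec
    def loop(frontier: Set[String], dist: Map[String, Int], d: Int): Map[String, Int] = {
      val next = frontier.flatMap(n => g.getOrElse(n, Set.empty[String])) -- dist.keySet
      if (next.isEmpty) dist
      else loop(next, dist ++ next.map(n => n -> (d + 1)), d + 1)
    }
    loop(Set(start), Map(start -> 0), 0)
  }

  /** Eccentricity: length of the longest shortest path starting at `node`. */
  def eccentricity(g: Graph, node: String): Int = distances(g, node).values.max

  // Assumed edges: main calls createA and runA, AFactory declares createA, and so on.
  val example: Graph = Map(
    "main" -> Set("createA", "runA"),
    "AFactory" -> Set("createA"),
    "createA" -> Set("A"),
    "A" -> Set("runA", "run"),
    "runA" -> Set.empty[String],
    "run" -> Set.empty[String]
  )
}
```

In this example `createA` and `A` get the highest degree centrality, while `main` has the largest eccentricity, which matches the intuition that entry points sit at the edge of the graph.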
Betweenness centrality

\[ g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]

where \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\) and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).
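
The definition can be evaluated directly on a small unweighted directed graph by counting shortest paths with BFS. The sketch below is a brute-force illustration and not Brandes' faster algorithm; the `BetweennessSketch` object, its helpers, and the example graph are hypothetical.

```scala
object BetweennessSketch {
  import scala.annotation.tailrec

  type Graph = Map[String, Set[String]]

  /** Distance from the source and number of shortest paths reaching a node. */
  final case class Sp(dist: Int, count: Long)

  /** BFS from `s` that also counts shortest paths to every reachable node. */
  def shortestPaths(g: Graph, s: String): Map[String, Sp] = {
    @tailrec
    def loop(frontier: Set[String], acc: Map[String, Sp], d: Int): Map[String, Sp] = {
      // A node first seen at depth d + 1 inherits the path counts of its frontier predecessors.
      val counts = frontier.toList
        .flatMap(u => g.getOrElse(u, Set.empty[String]).toList
          .filterNot(acc.contains).map(v => (v, acc(u).count)))
        .groupMapReduce(_._1)(_._2)(_ + _)
      if (counts.isEmpty) acc
      else loop(counts.keySet, acc ++ counts.map { case (v, c) => (v, Sp(d + 1, c)) }, d + 1)
    }
    loop(Set(s), Map(s -> Sp(0, 1L)), 0)
  }

  /** g(v) = sum over s != v != t of sigma_st(v) / sigma_st. */
  def betweenness(g: Graph): Map[String, Double] = {
    val nodes = g.keySet ++ g.values.flatten
    val sp = nodes.map(s => s -> shortestPaths(g, s)).toMap
    nodes.map { v =>
      val score = (for {
        s <- nodes.toList if s != v
        t <- nodes.toList if t != v && t != s
        st <- sp(s).get(t).toList          // skip pairs where t is unreachable from s
        sv <- sp(s).get(v).toList
        vt <- sp(v).get(t).toList
        if sv.dist + vt.dist == st.dist    // v lies on a shortest s-t path
      } yield sv.count.toDouble * vt.count / st.count).sum
      v -> score
    }.toMap
  }

  // Made-up example: a diamond a -> {b, c} -> d followed by d -> e.
  val example: Graph = Map(
    "a" -> Set("b", "c"), "b" -> Set("d"), "c" -> Set("d"),
    "d" -> Set("e"), "e" -> Set.empty[String]
  )
}
```

In the example, `d` scores highest because every path into `e` passes through it, while `b` and `c` split the paths from `a` between them.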
Project insights - Spark



Project insights - Spark



Project insights - Akka



Interactively browsing the source code graph
Graph Buddy

Semantic Code Graph visualization seamlessly integrated with IDE


Graph Buddy



Call hierarchy



Code structure



Architecture



Code evolution


- Average node degree
- Graph diameter
- Modularity
- Graph depth
- Average clustering coefficient
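
Two of these metrics, average node degree and average clustering coefficient, can be sketched for a single graph snapshot as shown below; comparing the values between releases gives a simple evolution signal. This is a minimal sketch on an undirected view of the graph: the `EvolutionMetrics` object and the two snapshots are hypothetical.

```scala
object EvolutionMetrics {
  // Undirected edge list of one snapshot; node names are made up.
  type Edges = Set[(String, String)]

  /** Neighbour sets per node, ignoring self-loops and edge direction. */
  def neighbours(edges: Edges): Map[String, Set[String]] =
    edges.toList.filter { case (a, b) => a != b }
      .flatMap { case (a, b) => List((a, b), (b, a)) }
      .groupMap(_._1)(_._2).map { case (k, v) => k -> v.toSet }

  /** Average number of neighbours per node. */
  def averageDegree(edges: Edges): Double = {
    val nb = neighbours(edges)
    if (nb.isEmpty) 0.0 else nb.values.map(_.size).sum.toDouble / nb.size
  }

  /** Mean local clustering coefficient: share of a node's neighbour pairs that are connected. */
  def averageClustering(edges: Edges): Double = {
    val nb = neighbours(edges)
    val local = nb.toList.map { case (_, ns) =>
      val k = ns.size
      if (k < 2) 0.0
      else {
        val links = ns.toList.combinations(2).count(p => nb(p(0)).contains(p(1)))
        links.toDouble / (k * (k - 1) / 2)
      }
    }
    if (local.isEmpty) 0.0 else local.sum / local.size
  }

  // Two hypothetical snapshots: the second one drops the a-c dependency.
  val v1: Edges = Set(("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"))
  val v2: Edges = Set(("a", "b"), ("b", "c"), ("c", "d"))
}
```

Between the two snapshots the average degree falls from 2.0 to 1.5 and the clustering coefficient drops to zero, the kind of change such monitoring would be expected to surface.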

Code evolution



Ideas yet to be explored


- Architecture Release Guard
- Architecture Code Guard
Towards better modularization



Future plans


- Graph Buddy plugin evaluation
- Metrics-based tips in IDE
- Monitoring project evolution [5]
- Integration with popular CI systems
- Integration with SonarQube
Bibliography


[1] A. Bandi, B. J. Williams, and E. B. Allen, “Empirical evidence of code decay: A systematic mapping study,” in 2013 20th Working Conference on Reverse Engineering (WCRE), 2013, pp. 341–350.
[2] R. Arora and S. Goel, “Javarelationshipgraphs (jrg): Transforming java projects into graphs using neo4j graph databases,” in Proceedings of the 2nd International Conference on Software Engineering and Information Management, ser. ICSIM 2019. New York, NY, USA: Association for Computing Machinery, 2019, pp. 80–84. [Online]. Available: https://doi.org/10.1145/3305160.3305173
[3] V. Walunj, G. Gharibi, D. H. Ho, and Y. Lee, “Graphevo: Characterizing and understanding software evolution using call graphs,” in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 4799–4807.
[4] L. Bedu, O. Tinh, and F. Petrillo, “A tertiary systematic literature review on software visualization,” in 2019 Working Conference on Software Visualization (VISSOFT), 2019, pp. 33–44.
[5] J. Ferrante, K. J. Ottenstein, and J. D. Warren, “The program dependence graph and its use in optimization,” ACM Trans. Program. Lang. Syst., vol. 9, no. 3, pp. 319–349, Jul. 1987. [Online]. Available: https://doi.org/10.1145/24039.24041
[6] B. G. Ryder, “Constructing the call graph of a program,” IEEE Transactions on Software Engineering, vol. SE-5, no. 3, pp. 216–226, May 1979.
[7] D. Grove, G. DeFouw, J. Dean, and C. Chambers, “Call graph construction in object-oriented languages,” SIGPLAN Not., vol. 32, no. 10, pp. 108–124, Oct. 1997. [Online]. Available: https://doi.org/10.1145/263700.264352
[8] J. Bohnet and J. Döllner, “Visual exploration of function call graphs for feature location in complex software systems,” in Proceedings of the 2006 ACM Symposium on Software Visualization, ser. SoftVis ’06. New York, NY, USA: Association for Computing Machinery, 2006, pp. 95–104. [Online]. Available: https://doi.org/10.1145/1148493.1148508
[9] G. Shu, B. Sun, T. A. D. Henderson, and A. Podgurski, “Javapdg: A new platform for program dependence analysis,” in 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 2013, pp. 408–415.
Exploring new methods of code structure analysis 2021
By liosedhel