Exploring new methods of code structure analysis to support understanding and evolution of complex IT systems
mgr inż. Krzysztof Borowski
dr hab. Bartosz Baliś
dr Tomasz Orzechowski
Presentation plan
- Modern software development
- The current state of the art in code understanding
- Semantic Code Graph as a semantics-aware code structure representation
- Semantic Code Graph applications
Current state of software development
- January 2020: GitHub reports over 190 million repositories and 40 million active users (in 2010: 1 million repositories)
- Billions of lines of new code are estimated to be produced yearly
A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.
https://en.wikipedia.org/wiki/Big_ball_of_mud
Properly maintained software enables rapid software evolution to address changing requirements.
In 2021, IT spending on enterprise software is expected to amount to around 601 billion U.S. dollars worldwide, a growth of 13.6 percent from the previous year.
Current software development tools
Understanding the source code
- IDE advanced features
- syntax highlights
- integrated compilers
- semantically browsing the source code
- debugging
- IDE integrations
- version control systems integrations
- others (ticketing systems, database plugins, containers plugins)
- Visualizations (mostly UML diagrams)
Guarding the code quality [3]
- Linters
- guarding against common pitfalls
- enforcing programming style
- usually integrated with the compiler (e.g. scalastyle, scapegoat)
- IDE suggestions and guidance (often based on integrated linters)
- Tests (unit tests, integration tests) - more about code correctness than quality
- Static code analysis on an external server (SonarQube)
- gathering metrics from different tools
- static code analysis
- code coverage
Understanding and visualization
- Sourcetrail - https://www.sourcetrail.com/documentation/ (C++, Java) (free)
- NDepend - https://www.ndepend.com (C#) (paid)
- JArchitect - https://www.jarchitect.com (paid)
Guarding project structure
- ArchUnit - https://www.archunit.org
- Java-specific (bytecode and reflection)
- Should be applied from the very beginning
- Helps with maintenance, not with understanding the current state
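import com.tngtech.archunit.lang.ArchRule;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.classes;
// Classes in "service" packages may only be accessed from "controller" or "service" packages.
ArchRule myRule = classes()
    .that().resideInAPackage("..service..")
    .should().onlyBeAccessed().byAnyPackage("..controller..", "..service..");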
Can the tools be better?
- Understanding project semantic structure
- folders and files
- browsing flatly structured files
- invisible coupling
- Maintaining the project semantic structure
- keeping the code modular
- advising on project structure
- guarding against accidental complexity (cyclic dependencies, highly coupled modules and classes)
- guarding against code structure degradation over time
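Cyclic dependencies, one of the forms of accidental complexity listed above, can be detected mechanically once module dependencies are available as a directed graph. The following is a minimal sketch using depth-first search; the class name `CycleCheck`, the method `findCycle`, and the module names are hypothetical and not part of any tool mentioned in this presentation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: names and data are assumptions, not an existing tool's API.
final class CycleCheck {
    // Returns one dependency cycle (first element repeated at the end), or an empty list.
    static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, new ArrayList<>());
            if (!cycle.isEmpty()) return cycle;
        }
        return new ArrayList<>();
    }

    private static List<String> visit(String node, Map<String, List<String>> deps,
                                      Set<String> done, List<String> path) {
        int seenAt = path.indexOf(node);
        if (seenAt >= 0) { // node is already on the current DFS path, so a cycle is closed
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(node);
            return cycle;
        }
        if (done.contains(node)) return new ArrayList<>(); // fully explored earlier
        path.add(node);
        for (String next : deps.getOrDefault(node, List.of())) {
            List<String> cycle = visit(next, deps, done, path);
            if (!cycle.isEmpty()) return cycle;
        }
        path.remove(path.size() - 1);
        done.add(node);
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("controller", List.of("service"));
        deps.put("service", List.of("repository"));
        deps.put("repository", List.of("service")); // accidental back-reference
        System.out.println(findCycle(deps)); // [service, repository, service]
    }
}
```

The check reports only the first cycle it finds, which is usually enough for a build-time guard that should fail fast.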
Can we represent the source code differently?
Abstract code representations [2]
- Abstract Syntax Tree (AST)
- Control Flow Graph
- Data Flow Graph
- Program Dependence Graph
- Code Property Graph
Abstract Syntax Tree
Control Flow Graph
Program Dependence Graph
CFG + DFG
Code Property Graph [4]
CFG + PDG + AST
Call Graph
Semantic code structure representation
- Representing code syntax structure
- Representing code semantic structure
class A()
object AFactory {
  def createA() = {
    new A()
  }
}
Semantic Code Graph (Scala)
class A()
object AFactory {
  def createA() = {
    new A()
  }
}
Semantic Code Graph - Code Structure
class A() {
  def runA(): Unit = ...
  def run(): Unit = ...
}
object AFactory {
  def createA() = {
    new A()
  }
}
def main() = {
  AFactory.createA().runA()
}
Hypothesis:
The code structure and dependencies of any program can be represented as a directed graph that is precise enough to be valuable in various analyses and visualizations.
Semantic Code Graph extraction
- Semantic Graphs Scalac Compiler Plugin - published
- proto files generated during compilation process available for later static analysis
- Java Semantic Graphs Extractor - in progress
Common format
syntax = "proto3";

message Location {
  string uri = 1;
  int32 startLine = 2;
  int32 startCharacter = 3;
  int32 endLine = 4;
  int32 endCharacter = 5;
}

message Edge {
  string to = 1;
  string type = 2;
  Location location = 3;
  map<string, string> properties = 4;
}

message GraphNode {
  string id = 1;
  string kind = 2;
  Location location = 3;
  map<string, string> properties = 4;
  string displayName = 5;
  repeated Edge edges = 6;
}

message SemanticGraphFile {
  string uri = 1;
  repeated GraphNode nodes = 2;
}
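To make the format more concrete, the sketch below mirrors the `Edge` and `GraphNode` messages with hand-written Java classes (these are not the protobuf-generated classes, and `Location` and `properties` are omitted) and flattens the nodes of one file into an adjacency list. The node ids, the kinds `CLASS`, `OBJECT`, `METHOD` and the edge types `DECLARATION`, `CALL` are illustrative assumptions; the values actually emitted by the extractor may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hand-written stand-ins, not generated protobuf code.
final class ScgSketch {
    static final class Edge {
        final String to, type;
        Edge(String to, String type) { this.to = to; this.type = type; }
    }

    static final class GraphNode {
        final String id, kind;
        final List<Edge> edges = new ArrayList<>();
        GraphNode(String id, String kind) { this.id = id; this.kind = kind; }
        GraphNode edge(String to, String type) { edges.add(new Edge(to, type)); return this; }
    }

    // Flattens nodes into "source id -> target ids", keeping only edges of one type.
    static Map<String, List<String>> adjacency(List<GraphNode> nodes, String edgeType) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (GraphNode n : nodes) {
            List<String> targets = new ArrayList<>();
            for (Edge e : n.edges) if (e.type.equals(edgeType)) targets.add(e.to);
            out.put(n.id, targets);
        }
        return out;
    }

    // The A / AFactory / main example from the slides, with assumed ids, kinds and edge types.
    // The constructor call `new A()` is simplified to a CALL edge pointing at the class node.
    static List<GraphNode> example() {
        return List.of(
            new GraphNode("A", "CLASS").edge("A.runA", "DECLARATION").edge("A.run", "DECLARATION"),
            new GraphNode("A.runA", "METHOD"),
            new GraphNode("A.run", "METHOD"),
            new GraphNode("AFactory", "OBJECT").edge("AFactory.createA", "DECLARATION"),
            new GraphNode("AFactory.createA", "METHOD").edge("A", "CALL"),
            new GraphNode("main", "METHOD").edge("AFactory.createA", "CALL").edge("A.runA", "CALL"));
    }

    public static void main(String[] args) {
        adjacency(example(), "CALL").forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```

Filtering by edge type is what lets a single graph serve several views, for example a call graph (`CALL` edges) or a containment tree (`DECLARATION` edges).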
Various Semantic Code Graph applications
Framework for working with the code
- Learning about the software (big picture)
- Interactively browsing the source code graph
- Monitoring code evolution
Learning about the software (big picture)
Visual help - metals library
Visual help - betweenness
Visual help - Eigenvector
Visual help - LOC
Semantic Graphs Analytic
Analyzing and combining centralities:
- Degree Centrality
- Closeness Centrality
- Betweenness Centrality*
- Eigenvector Centrality
+ LOC metric
Finding noticeable endpoints via Eccentricity
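Two of the measures listed above are simple enough to sketch directly. The example below computes out-degree centrality and eccentricity with a breadth-first search over a small directed graph. The class name `NodeMetrics`, the sample graph, and the choice to follow edges in their forward direction and to ignore unreachable nodes are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names, data and direction convention are assumptions.
final class NodeMetrics {
    // Out-degree centrality: outgoing edges divided by the n - 1 possible targets.
    static double outDegreeCentrality(Map<String, List<String>> g, String v) {
        if (g.size() <= 1) return 0.0;
        return (double) g.getOrDefault(v, List.of()).size() / (g.size() - 1);
    }

    // Eccentricity over reachable nodes: the largest BFS distance starting from v.
    static int eccentricity(Map<String, List<String>> g, String v) {
        Map<String, Integer> dist = new HashMap<>();
        ArrayDeque<String> queue = new ArrayDeque<>();
        dist.put(v, 0);
        queue.add(v);
        int max = 0;
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            for (String next : g.getOrDefault(cur, List.of())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(cur) + 1);
                    max = Math.max(max, dist.get(next));
                    queue.add(next);
                }
            }
        }
        return max;
    }

    static Map<String, List<String>> example() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("main", List.of("createA", "runA"));
        g.put("createA", List.of("A"));
        g.put("runA", List.of());
        g.put("A", List.of());
        return g;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = example();
        for (String v : g.keySet()) {
            System.out.println(v + ": degree=" + outDegreeCentrality(g, v)
                + ", eccentricity=" + eccentricity(g, v));
        }
    }
}
```

In this sample, `main` has the highest eccentricity (2), which matches the intuition that entry points sit at the far end of call chains.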
Betweenness centrality
\[ g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
where \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\) and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).
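The definition above can be evaluated without enumerating every shortest path. The sketch below uses Brandes' algorithm for unweighted directed graphs, which is one common way to compute betweenness; whether the presented analytics use this algorithm is not stated in the slides. The class name `Betweenness` and the sample graph are hypothetical, and the code assumes every edge target also appears as a key in the map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: Brandes' algorithm on an unweighted directed graph.
final class Betweenness {
    static Map<String, Double> compute(Map<String, List<String>> g) {
        Map<String, Double> cb = new LinkedHashMap<>();
        for (String v : g.keySet()) cb.put(v, 0.0);
        for (String s : g.keySet()) {
            Map<String, List<String>> pred = new HashMap<>();
            Map<String, Double> sigma = new HashMap<>(); // number of shortest paths from s
            Map<String, Integer> dist = new HashMap<>();
            Map<String, Double> delta = new HashMap<>(); // accumulated dependency of s on a node
            for (String v : g.keySet()) {
                pred.put(v, new ArrayList<>());
                sigma.put(v, 0.0);
                dist.put(v, -1);
                delta.put(v, 0.0);
            }
            sigma.put(s, 1.0);
            dist.put(s, 0);
            Deque<String> stack = new ArrayDeque<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty()) { // BFS phase: count shortest paths
                String v = queue.poll();
                stack.push(v);
                for (String w : g.get(v)) {
                    if (dist.get(w) < 0) {
                        dist.put(w, dist.get(v) + 1);
                        queue.add(w);
                    }
                    if (dist.get(w) == dist.get(v) + 1) {
                        sigma.put(w, sigma.get(w) + sigma.get(v));
                        pred.get(w).add(v);
                    }
                }
            }
            while (!stack.isEmpty()) { // back-propagation in order of decreasing distance
                String w = stack.pop();
                for (String v : pred.get(w)) {
                    delta.put(v, delta.get(v) + sigma.get(v) / sigma.get(w) * (1.0 + delta.get(w)));
                }
                if (!w.equals(s)) cb.put(w, cb.get(w) + delta.get(w));
            }
        }
        return cb;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("a", List.of("b", "c"));
        g.put("b", List.of("d"));
        g.put("c", List.of("d"));
        g.put("d", List.of());
        System.out.println(compute(g));
    }
}
```

In the sample graph there are two shortest paths from `a` to `d`, one through `b` and one through `c`, so each of the two intermediate nodes receives a score of 0.5 and the endpoints receive 0.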
Project insights - spark
Project insights - akka
Interactively browsing the source code graph
Graph Buddy
Semantic Code Graph visualization seamlessly integrated with IDE
Graph Buddy
Call hierarchy
Code structure
Architecture
Code evolution
- Average node degree
- Graph diameter
- Modularity
- Graph depth
- Average clustering coefficient
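Tracking such metrics across releases amounts to recomputing them on each version of the graph and comparing the values. The sketch below computes two of the listed metrics, average node degree and average clustering coefficient, on an undirected view of the graph. The class name `EvolutionMetrics` and the two sample "versions" are made up for illustration, and treating edges as undirected is an assumption rather than the method used in the presented work.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: names and sample data are assumptions.
final class EvolutionMetrics {
    // Undirected neighbour sets derived from a directed graph (self-loops dropped).
    static Map<String, Set<String>> undirected(Map<String, List<String>> g) {
        Map<String, Set<String>> u = new LinkedHashMap<>();
        for (String v : g.keySet()) u.computeIfAbsent(v, k -> new LinkedHashSet<>());
        for (Map.Entry<String, List<String>> e : g.entrySet()) {
            for (String w : e.getValue()) {
                if (w.equals(e.getKey())) continue;
                u.get(e.getKey()).add(w);
                u.computeIfAbsent(w, k -> new LinkedHashSet<>()).add(e.getKey());
            }
        }
        return u;
    }

    static double averageDegree(Map<String, Set<String>> u) {
        if (u.isEmpty()) return 0.0;
        int total = 0;
        for (Set<String> nbrs : u.values()) total += nbrs.size();
        return (double) total / u.size();
    }

    // Mean of local clustering coefficients; nodes with fewer than two neighbours count as 0.
    static double averageClustering(Map<String, Set<String>> u) {
        if (u.isEmpty()) return 0.0;
        double sum = 0.0;
        for (Set<String> nbrs : u.values()) {
            int k = nbrs.size();
            if (k < 2) continue;
            int links = 0; // edges among the neighbours, each unordered pair counted once
            for (String a : nbrs)
                for (String b : nbrs)
                    if (a.compareTo(b) < 0 && u.get(a).contains(b)) links++;
            sum += 2.0 * links / (k * (k - 1));
        }
        return sum / u.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> v1 = new LinkedHashMap<>();
        v1.put("a", List.of("b"));
        v1.put("b", List.of("c"));
        v1.put("c", List.of());
        Map<String, List<String>> v2 = new LinkedHashMap<>(v1);
        v2.put("c", List.of("a")); // a new dependency closes a triangle
        for (Map<String, List<String>> g : List.of(v1, v2)) {
            Map<String, Set<String>> u = undirected(g);
            System.out.println("avgDegree=" + averageDegree(u) + " clustering=" + averageClustering(u));
        }
    }
}
```

A single added dependency moves the clustering coefficient of this toy graph from 0 to 1, which shows how sensitive the metric is on small graphs and why trends over many releases are more informative than single values.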
Ideas yet to be explored
- Architecture Release Guard
- Architecture Code Guard
Towards better modularization
Future plans
- Graph Buddy plugin evaluation
- Metrics-based tips in the IDE
- Monitoring project evolution [5]
- Integration with popular CIs
- Integration with SonarQube
Bibliography
[1] A. Bandi, B. J. Williams, and E. B. Allen, “Empirical evidence of code decay: A systematic mapping study,” in 2013 20th Working Conference on Reverse Engineering (WCRE), 2013, pp. 341–350.
[2] R. Arora and S. Goel, “JavaRelationshipGraphs (JRG): Transforming Java projects into graphs using Neo4j graph databases,” in Proceedings of the 2nd International Conference on Software Engineering and Information Management, ser. ICSIM 2019. New York, NY, USA: Association for Computing Machinery, 2019, pp. 80–84. [Online]. Available: https://doi.org/10.1145/3305160.3305173
[3] V. Walunj, G. Gharibi, D. H. Ho, and Y. Lee, “GraphEvo: Characterizing and understanding software evolution using call graphs,” in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 4799–4807.
[4] L. Bedu, O. Tinh, and F. Petrillo, “A tertiary systematic literature review on software visualization,” in 2019 Working Conference on Software Visualization (VISSOFT), 2019, pp. 33–44.
[5] J. Ferrante, K. J. Ottenstein, and J. D. Warren, “The program dependence graph and its use in optimization,” ACM Trans. Program. Lang. Syst., vol. 9, no. 3, pp. 319–349, Jul. 1987. [Online]. Available: https://doi.org/10.1145/24039.24041
[6] B. G. Ryder, “Constructing the call graph of a program,” IEEE Transactions on Software Engineering, vol. SE-5, no. 3, pp. 216–226, May 1979.
[7] D. Grove, G. DeFouw, J. Dean, and C. Chambers, “Call graph construction in object-oriented languages,” SIGPLAN Not., vol. 32, no. 10, pp. 108–124, Oct. 1997. [Online]. Available: https://doi.org/10.1145/263700.264352
[8] J. Bohnet and J. Döllner, “Visual exploration of function call graphs for feature location in complex software systems,” in Proceedings of the 2006 ACM Symposium on Software Visualization, ser. SoftVis ’06. New York, NY, USA: Association for Computing Machinery, 2006, pp. 95–104. [Online]. Available: https://doi.org/10.1145/1148493.1148508
[9] G. Shu, B. Sun, T. A. D. Henderson, and A. Podgurski, “JavaPDG: A new platform for program dependence analysis,” in 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 2013, pp. 408–415.
Exploring new methods of code structure analysis 2021
By liosedhel