Exploring new methods of code structure analysis to support understanding and evolution of complex IT systems
mgr inż. Krzysztof Borowski
dr hab. Bartosz Baliś
dr Tomasz Orzechowski
Presentation plan
- Maintenance and evolution of complex software programs
- Software development tools
  - Understanding the source code
  - Guarding code quality
- Why are the tools available today not sufficient?
- Semantic Code Graph as an abstract code representation model
- Various Semantic Code Graph applications
  - Code visualization - Graph Buddy
  - Monitoring software structure quality
- Future plans
Current state of software development
- January 2020: GitHub reports over 190 million repositories and 40 million active users (in 2010: 1 million repositories)
- Billions of lines of new code are estimated to be produced every year
Maintenance of complex software programs [1]
"The only constant in code is change"
- Changing functional requirements
  - business evolution
  - not advancing means falling behind
- Changing non-functional requirements
  - availability, scalability, reliability
- Next-generation libraries
- Next-generation languages
- New technologies
A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.
https://en.wikipedia.org/wiki/Big_ball_of_mud
Properly maintained software enables rapid software evolution to address changing requirements.
Current software development tools
- Understanding the source code
- Guarding code quality
Understanding the source code
- Advanced IDE features
  - syntax highlighting
  - integrated compilers
  - semantic browsing of the source code
  - debugging
- IDE integrations
  - version control system integrations
  - others (ticketing systems, database plugins, container plugins)
- Visualizations (mostly UML diagrams)
Guarding code quality [3]
- Linters
  - guarding against common pitfalls
  - enforcing programming style
  - usually integrated with the compiler or build tool (e.g. Scalastyle, Scapegoat)
- IDE suggestions and guidance (often based on integrated linters)
- Tests (unit tests, integration tests): more about code correctness than quality
- Static code analysis on an external server (SonarQube)
  - gathering metrics from different tools
  - static code analysis
  - code coverage
Tools available today are not sufficient
- Understanding the project's semantic structure
  - navigation limited to folders and files
  - browsing flatly structured files
  - invisible coupling
- Maintaining the project's semantic structure
  - keeping the code modular
  - advising on project structure
  - guarding against accidental complexity (cyclic dependencies, highly coupled modules and classes)
  - guarding against code structure degradation over time
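Cyclic dependencies are one of the structural problems that a folders-and-files view does not surface. The following is a minimal sketch, assuming module dependencies have already been extracted into a map from a module name to the modules it depends on (the module names below are hypothetical). It finds one dependency cycle with a depth-first search that tracks the current path.

```java
import java.util.*;

// Minimal sketch: finds one cycle in a module dependency graph with a depth-first search.
// Module names are hypothetical; the map goes from a module to the modules it depends on.
class CycleDetector {
    private final Map<String, List<String>> deps;
    private final Set<String> done = new HashSet<>();    // fully explored, no cycle reachable from it
    private final List<String> path = new ArrayList<>(); // modules on the current DFS path
    private final Set<String> onPath = new HashSet<>();

    CycleDetector(Map<String, List<String>> deps) {
        this.deps = deps;
    }

    /** Returns one cycle such as [a, b, a], or an empty list if there is none. Call once per instance. */
    List<String> findCycle() {
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start);
            if (!cycle.isEmpty()) return cycle;
        }
        return List.of();
    }

    private List<String> visit(String node) {
        if (done.contains(node)) return List.of();
        if (onPath.contains(node)) {
            // Back edge: the cycle is the path suffix starting at the first occurrence of `node`.
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(node), path.size()));
            cycle.add(node);
            return cycle;
        }
        path.add(node);
        onPath.add(node);
        for (String next : deps.getOrDefault(node, List.of())) {
            List<String> cycle = visit(next);
            if (!cycle.isEmpty()) return cycle;
        }
        path.remove(path.size() - 1);
        onPath.remove(node);
        done.add(node);
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("ui", List.of("service"));
        deps.put("service", List.of("repository"));
        deps.put("repository", List.of("service"));
        System.out.println(new CycleDetector(deps).findCycle()); // [service, repository, service]
    }
}
```

The search stops at the first cycle it finds. A tool guarding structure over time would more likely report all strongly connected components, but a single witness cycle is often enough to fail a build.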
Understanding and visualization
- Sourcetrail - https://www.sourcetrail.com/documentation/ (C++, Java) (free)
- NDepend - https://www.ndepend.com (C#) (paid)
- JArchitect - https://www.jarchitect.com (Java) (paid)
Guarding project structure
- ArchUnit - https://www.archunit.org
  - Java-specific (bytecode and reflection)
  - should be applied from the very beginning
  - helps with maintenance, not with understanding the current state
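import com.tngtech.archunit.lang.ArchRule;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.classes;
// Classes in service packages may only be accessed from controller or service packages
ArchRule myRule = classes()
    .that().resideInAPackage("..service..")
    .should().onlyBeAccessed().byAnyPackage("..controller..", "..service..");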
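To make the intent of this rule concrete, here is a hypothetical plain-Java illustration of what it verifies. It is not how ArchUnit implements it (ArchUnit analyses compiled bytecode). The sketch assumes class-to-class accesses are already available as pairs of fully qualified class names, and it approximates the `..service..` pattern as "the package name contains a segment named `service`". The class names are made up.

```java
import java.util.*;

// Hypothetical, simplified illustration of the rule
// "classes in ..service.. should only be accessed by ..controller.. or ..service..".
// `..x..` is approximated as "the package name contains the segment x".
class ServiceAccessRule {
    record Access(String from, String to) {}

    static String packageOf(String className) {
        int i = className.lastIndexOf('.');
        return i < 0 ? "" : className.substring(0, i);
    }

    static boolean inPackageSegment(String className, String segment) {
        return Arrays.asList(packageOf(className).split("\\.")).contains(segment);
    }

    /** Returns the accesses that break the rule. */
    static List<Access> violations(List<Access> accesses) {
        List<Access> result = new ArrayList<>();
        for (Access a : accesses) {
            boolean targetIsService = inPackageSegment(a.to(), "service");
            boolean callerAllowed = inPackageSegment(a.from(), "controller")
                    || inPackageSegment(a.from(), "service");
            if (targetIsService && !callerAllowed) result.add(a);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Access> accesses = List.of(
            new Access("com.shop.controller.OrderController", "com.shop.service.OrderService"),
            new Access("com.shop.repository.OrderRepository", "com.shop.service.OrderService"));
        // Only the repository -> service access is reported.
        System.out.println(violations(accesses));
    }
}
```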
Abstract code representations [2]
- Abstract Syntax Tree (AST)
- Control Flow Graph
- Program Dependence Graph
- Code Property Graph
Abstract Syntax Tree
Control Flow Graph
Program Dependence Graph
Code Property Graph [4]
Semantic code structure representation
- Representing code syntax structure
- Representing code semantic structure
class A()
object AFactory {
def createA() = {
new A()
}
}
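The syntax structure of the snippet above is a tree in which each definition contains its children. A minimal Java sketch of that containment tree follows. The node kinds and the file name `example.scala` are hypothetical labels chosen for illustration, not the output of the Scala compiler. In this plain tree nothing links the `new A()` node to the definition of class `A`. That cross-reference is the semantic information a syntax tree alone does not express.

```java
import java.util.*;

// Minimal sketch of the syntax (containment) structure of the Scala example.
// Node kinds and the file name are illustrative labels, not real Scala compiler tree types.
class SyntaxTree {
    record Node(String kind, String name, List<Node> children) {
        Node(String kind, String name) { this(kind, name, List.of()); }
    }

    static Node example() {
        return new Node("File", "example.scala", List.of(
            new Node("Class", "A"),
            new Node("Object", "AFactory", List.of(
                new Node("Method", "createA", List.of(
                    new Node("New", "A")))))));
    }

    /** Appends one line per node, indented by its depth in the tree. */
    static void print(Node node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(node.kind()).append(' ').append(node.name()).append('\n');
        for (Node child : node.children()) print(child, depth + 1, out);
    }

    static int count(Node node) {
        int n = 1;
        for (Node child : node.children()) n += count(child);
        return n;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        print(example(), 0, out);
        System.out.print(out);
    }
}
```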
Semantic Code Graph (Scala)
class A()
object AFactory {
def createA() = {
new A()
}
}
hypothesis:
the code structure and dependencies of any program can be represented as a directed graph, precise enough to be valuable in various analyses and visualizations
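Under this hypothesis, the snippet with class `A` and object `AFactory` becomes a directed graph whose edges capture references between definitions, not only containment. The sketch below assumes hypothetical node identifiers and two illustrative edge types, `DECLARATION` for containment and `CALL` for the constructor call. The edge types actually used by Semantic Code Graph may differ. The `dependenciesOf` query collects everything reachable from a node, which is the kind of question such a graph is meant to answer.

```java
import java.util.*;

// Minimal sketch of a semantic code graph for the Scala example.
// Node ids and edge types (DECLARATION, CALL) are hypothetical, for illustration only.
class SemanticGraphSketch {
    record Edge(String from, String type, String to) {}

    static final List<Edge> EDGES = List.of(
        new Edge("AFactory", "DECLARATION", "AFactory.createA"), // createA is declared in AFactory
        new Edge("AFactory.createA", "CALL", "A"));               // new A() calls A's constructor

    /** All nodes reachable from `start` by following edges forward (breadth-first search). */
    static Set<String> dependenciesOf(String start, List<Edge> edges) {
        Map<String, List<String>> out = new HashMap<>();
        for (Edge e : edges) out.computeIfAbsent(e.from(), k -> new ArrayList<>()).add(e.to());
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            for (String next : out.getOrDefault(queue.poll(), List.of())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(dependenciesOf("AFactory", EDGES)); // [AFactory.createA, A]
    }
}
```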
Semantic Code Graph
Common format
syntax = "proto3";
// Source range of a definition or reference (start and end positions)
message Location {
string uri = 1;
int32 startLine = 2;
int32 startCharacter = 3;
int32 endLine = 4;
int32 endCharacter = 5;
}
// Directed edge from the enclosing GraphNode to the node whose id is `to`
message Edge {
string to = 1;
string type = 2;
Location location = 3;
map<string, string> properties = 4;
}
// A code element (e.g. class, object, method) together with its outgoing edges
message GraphNode {
string id = 1;
string kind = 2;
Location location = 3;
map<string, string> properties = 4;
string displayName = 5;
repeated Edge edges = 6;
}
// All graph nodes extracted from one source file
message SemanticGraphFile {
string uri = 1;
repeated GraphNode nodes = 2;
}
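The shape of the schema is easier to see when it is mirrored with plain Java records. Edges are stored inside the node they start from, so an edge's source is implicit. The sketch below is a hypothetical hand-written mirror, not protoc-generated code. It encodes the `A`/`AFactory` example and flattens it into source, type, target triples. The file URI, node kinds, edge types and positions are all made up for illustration.

```java
import java.util.*;

// Hand-written Java mirror of the proto messages above; NOT code generated by protoc.
class SemanticGraphModel {
    record Location(String uri, int startLine, int startCharacter, int endLine, int endCharacter) {}
    record Edge(String to, String type, Location location, Map<String, String> properties) {}
    record GraphNode(String id, String kind, Location location, Map<String, String> properties,
                     String displayName, List<Edge> edges) {}
    record SemanticGraphFile(String uri, List<GraphNode> nodes) {}

    static final String URI = "file:///example/AFactory.scala"; // hypothetical file

    // Illustrative positions; they are not produced by a real extractor.
    static Location loc(int startLine, int startChar, int endLine, int endChar) {
        return new Location(URI, startLine, startChar, endLine, endChar);
    }

    /** The Scala example encoded with hypothetical node kinds and edge types. */
    static SemanticGraphFile example() {
        GraphNode a = new GraphNode("A", "CLASS", loc(0, 6, 0, 7), Map.of(), "A", List.of());
        GraphNode createA = new GraphNode("AFactory.createA", "METHOD", loc(2, 6, 2, 13), Map.of(), "createA",
            List.of(new Edge("A", "CALL", loc(3, 8, 3, 9), Map.of())));
        GraphNode factory = new GraphNode("AFactory", "OBJECT", loc(1, 7, 1, 15), Map.of(), "AFactory",
            List.of(new Edge("AFactory.createA", "DECLARATION", loc(2, 6, 2, 13), Map.of())));
        return new SemanticGraphFile(URI, List.of(a, factory, createA));
    }

    /** Flattens the file into "source -TYPE-> target" strings; the source is the node owning the edge. */
    static List<String> edgeTriples(SemanticGraphFile file) {
        List<String> triples = new ArrayList<>();
        for (GraphNode node : file.nodes())
            for (Edge edge : node.edges())
                triples.add(node.id() + " -" + edge.type() + "-> " + edge.to());
        return triples;
    }

    public static void main(String[] args) {
        edgeTriples(example()).forEach(System.out::println);
    }
}
```

Keeping edges inside their source node makes each file's graph self-describing. Merging files into one project graph then only requires resolving `to` ids across files.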
Various Semantic Code Graph applications
Visual help
Graph Buddy - demo
Semantic Code Graph visualization seamlessly integrated with the IDE
Graph Buddy
Software structure monitoring [6]
- Individual nodes
  - Betweenness centrality
  - Clustering coefficient
  - Modularity class
- Whole graph
  - Finding communities
  - Average node degree
  - Graph diameter
  - Modularity
  - Graph depth
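Several of the listed metrics are straightforward to compute once the code graph is available. The sketch below treats the graph as undirected and unweighted, a simplification that drops edge direction and edge types. It computes the average node degree, the graph diameter (breadth-first search from every node), and the local clustering coefficient of a single node. Node names in the usage example are hypothetical.

```java
import java.util.*;

// Minimal sketch of simple graph metrics on an undirected, unweighted graph
// given as an adjacency map (every edge is stored in both directions).
class GraphMetrics {
    static void addEdge(Map<String, Set<String>> g, String u, String v) {
        g.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        g.computeIfAbsent(v, k -> new HashSet<>()).add(u);
    }

    static double averageDegree(Map<String, Set<String>> g) {
        int total = 0;
        for (Set<String> nbrs : g.values()) total += nbrs.size();
        return g.isEmpty() ? 0 : (double) total / g.size();
    }

    /** Shortest-path distances (in hops) from `source`, computed with BFS. */
    static Map<String, Integer> distances(Map<String, Set<String>> g, String source) {
        Map<String, Integer> dist = new HashMap<>(Map.of(source, 0));
        Deque<String> queue = new ArrayDeque<>(List.of(source));
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : g.getOrDefault(u, Set.of()))
                if (dist.putIfAbsent(v, dist.get(u) + 1) == null) queue.add(v);
        }
        return dist;
    }

    /** Longest shortest path between any two nodes that are connected. */
    static int diameter(Map<String, Set<String>> g) {
        int best = 0;
        for (String s : g.keySet())
            for (int d : distances(g, s).values()) best = Math.max(best, d);
        return best;
    }

    /** Fraction of pairs of neighbours of `v` that are themselves connected. */
    static double clusteringCoefficient(Map<String, Set<String>> g, String v) {
        List<String> nbrs = new ArrayList<>(g.getOrDefault(v, Set.of()));
        int k = nbrs.size();
        if (k < 2) return 0;
        int links = 0;
        for (int i = 0; i < k; i++)
            for (int j = i + 1; j < k; j++)
                if (g.getOrDefault(nbrs.get(i), Set.of()).contains(nbrs.get(j))) links++;
        return 2.0 * links / (k * (k - 1));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = new HashMap<>();
        addEdge(g, "a", "b");
        addEdge(g, "b", "c");
        addEdge(g, "a", "c");
        addEdge(g, "c", "d");
        System.out.println(averageDegree(g) + " " + diameter(g) + " " + clusteringCoefficient(g, "c"));
    }
}
```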
Betweenness centrality
\(C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}\)
where \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\) and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).
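Betweenness centrality of a node \(v\) sums, over all pairs \((s, t)\), the fraction of shortest \(s\)-\(t\) paths passing through \(v\). For unweighted graphs it can be computed efficiently with Brandes' algorithm. The algorithm runs one breadth-first search per source node, counts shortest paths, and then accumulates pair dependencies in order of decreasing distance. The sketch below is a minimal version for a directed, unweighted graph stored as an adjacency map. It sums over ordered pairs and applies no normalization, which is only one of several conventions in use.

```java
import java.util.*;

// Minimal sketch of Brandes' algorithm for betweenness centrality
// on a directed, unweighted graph (ordered pairs, no normalization).
class Betweenness {
    static Map<String, Double> compute(Map<String, List<String>> g) {
        Set<String> nodes = new LinkedHashSet<>(g.keySet());
        for (List<String> targets : g.values()) nodes.addAll(targets);
        Map<String, Double> cb = new HashMap<>();
        for (String v : nodes) cb.put(v, 0.0);

        for (String s : nodes) {
            Deque<String> stack = new ArrayDeque<>();        // nodes in order of non-decreasing distance
            Map<String, List<String>> pred = new HashMap<>(); // predecessors on shortest paths from s
            Map<String, Long> sigma = new HashMap<>();        // number of shortest paths from s
            Map<String, Integer> dist = new HashMap<>();
            for (String v : nodes) { pred.put(v, new ArrayList<>()); sigma.put(v, 0L); dist.put(v, -1); }
            sigma.put(s, 1L);
            dist.put(s, 0);
            Deque<String> queue = new ArrayDeque<>(List.of(s));
            while (!queue.isEmpty()) {
                String v = queue.poll();
                stack.push(v);
                for (String w : g.getOrDefault(v, List.of())) {
                    if (dist.get(w) < 0) { dist.put(w, dist.get(v) + 1); queue.add(w); }
                    if (dist.get(w) == dist.get(v) + 1) {   // v lies on a shortest path to w
                        sigma.put(w, sigma.get(w) + sigma.get(v));
                        pred.get(w).add(v);
                    }
                }
            }
            // Accumulate dependencies delta(s, v), farthest nodes first.
            Map<String, Double> delta = new HashMap<>();
            for (String v : nodes) delta.put(v, 0.0);
            while (!stack.isEmpty()) {
                String w = stack.pop();
                for (String v : pred.get(w))
                    delta.put(v, delta.get(v) + sigma.get(v).doubleValue() / sigma.get(w) * (1 + delta.get(w)));
                if (!w.equals(s)) cb.put(w, cb.get(w) + delta.get(w));
            }
        }
        return cb;
    }

    public static void main(String[] args) {
        // Diamond a -> {b, c} -> d: each of b and c carries half of the a-to-d shortest paths.
        Map<String, List<String>> g = Map.of("a", List.of("b", "c"), "b", List.of("d"), "c", List.of("d"));
        System.out.println(compute(g));
    }
}
```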
Where we want to be
Future plans
- Graph Buddy plugin evaluation
- Metrics-based tips in the IDE
- Monitoring project evolution [5]
- Integration with popular CI systems
- Integration with SonarQube
Bibliography
[1] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus, "Does code decay? Assessing the evidence from change management data," IEEE Transactions on Software Engineering, vol. 27, no. 1, pp. 1–12, 2001.
[2] V. Arora, R. Bhatia, and M. Singh, "Evaluation of Flow Graph and Dependence Graphs for Program Representation," International Journal of Computer Applications, vol. 56, pp. 18–23, Oct. 2012, doi: 10.5120/8959-3161.
[3] J. Bohnet and J. Döllner, "Monitoring code quality and development activity by software maps," Proceedings - International Conference on Software Engineering, Jan. 2011.
[4] F. Yamaguchi et al., "Modeling and Discovering Vulnerabilities with Code Property Graphs," in 2014 IEEE Symposium on Security and Privacy, 2014, pp. 590–604, doi: 10.1109/SP.2014.44.
[5] P. Bhattacharya et al., "Graph-based analysis and prediction for software evolution," in 2012 34th International Conference on Software Engineering (ICSE), June 2012, pp. 419–429.
[6] Walunj et al., "GraphEvo: Characterizing and Understanding Software Evolution using Call Graphs," in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 4799–4807.
Exploring new methods of code structure analysis
By liosedhel