Algorithms for estimating truck factors: a comparative study
Mívian Ferreira · Thaís Mombach · Marco Tulio Valente · Kecia Ferreira
Kevin Hernández Rostrán
Seminario de Investigación II - Tecnológico de Costa Rica
Truck Factor
“is defined as the minimal number of developers that have to be hit by a truck (or leave the team) in order to put the project in trouble”
Low TF
The project’s knowledge is concentrated in few team members and therefore the project faces a serious risk of discontinuation in case these developers leave.
High TF
Every developer contributes to the project to a similar degree.
ZWK
- Zazworka et al. (2010).
- First algorithm proposed for estimating TF.
AVL
- Avelino et al. (2016).
- Degree-of-Authorship (DOA).
Coverage
The percentage of files that must be abandoned to configure a disaster scenario.
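The role of the coverage parameter can be illustrated with a small greedy sketch. It is not the published AVL pseudo-code: the input `file_authors` (a map from file to its set of authors), the function name, and the tie-breaking rule are assumptions made for this example. The sketch removes the developer who authors the most files until more than `coverage` of the files have no author left.

```python
from collections import Counter


def estimate_truck_factor(file_authors, coverage=0.5):
    """Greedy truck-factor sketch (hypothetical helper, not the AVL tool).

    file_authors: dict mapping file path -> set of author names.
    coverage: fraction of files that must be abandoned to call it a disaster.
    Returns the list of developers removed, in removal order.
    """
    # Work on a copy so the caller's data is not modified.
    remaining = {f: set(authors) for f, authors in file_authors.items()}
    total = len(remaining)
    removed = []
    while total > 0:
        abandoned = sum(1 for authors in remaining.values() if not authors)
        if abandoned / total > coverage:
            break  # disaster scenario reached
        counts = Counter(d for authors in remaining.values() for d in authors)
        if not counts:
            break  # nobody left to remove
        # Developer with most authored files; alphabetical order breaks ties.
        top = max(sorted(counts), key=lambda d: counts[d])
        removed.append(top)
        for authors in remaining.values():
            authors.discard(top)
    return removed


files = {
    "a.py": {"ana"},
    "b.py": {"ana"},
    "c.py": {"ana", "bob"},
    "d.py": {"bob"},
}
tf_developers = estimate_truck_factor(files, coverage=0.5)
print(len(tf_developers), tf_developers)
```

In this toy project, removing one developer abandons exactly half of the files, which does not exceed the 50% coverage, so a second developer has to be removed.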
Degree-of-Authorship (DOA)
The creation of a file f by a developer d initializes the value of DOA(d, f); further commits on f by d increase DOA(d, f); finally, commits on f by other developers decrease DOA(d, f).
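The three effects described above can be sketched as a linear model. The function name and arguments are hypothetical, and the default weights are an assumption: they are the coefficients commonly quoted for the DOA model and should be checked against Avelino et al. (2016) before being relied upon.

```python
import math


def degree_of_authorship(first_author, own_commits, other_commits,
                         w0=3.293, w_fa=1.098, w_dl=0.164, w_ac=0.321):
    """DOA(d, f) sketch; the default weights are assumed, not taken from the slides.

    first_author: True if developer d created file f.
    own_commits: number of commits on f made by d.
    other_commits: number of commits on f made by other developers.
    """
    fa = 1 if first_author else 0
    return (w0
            + w_fa * fa                           # creating the file initializes DOA
            + w_dl * own_commits                  # own commits increase DOA
            - w_ac * math.log(1 + other_commits)) # others' commits decrease DOA


creator = degree_of_authorship(True, own_commits=1, other_commits=0)
after_more_work = degree_of_authorship(True, own_commits=5, other_commits=0)
after_others = degree_of_authorship(True, own_commits=5, other_commits=10)
print(creator, after_more_work, after_others)
```

The logarithm makes the first commits by other developers count more than later ones, which is a property of this assumed model rather than something stated on the slide.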
RIG
- Rigby et al. (2016).
- It is a non-deterministic algorithm.
- RIG can finish without computing a valid TF result.
CST
- Cosentino et al. (2015).
- Primary developers (P).
- Secondary developers (S).
- The authors provide neither further details nor the pseudo-code of the CST algorithm.
Core Developers
Commit-Based Heuristic
LOC-Based Heuristic
- Based on the sum of added and removed lines, which the authors call churn.
- The input parameter size receives the total churn of the analyzed project.
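A minimal sketch of the LOC-based heuristic, under stated assumptions: the `changes` tuples, both function names, and the 80% `share` threshold are hypothetical choices for illustration, since the slides do not state a cut-off. Passing commit counts instead of churn gives the commit-based variant.

```python
from collections import defaultdict


def churn_per_developer(changes):
    """changes: iterable of (developer, lines_added, lines_removed) tuples."""
    churn = defaultdict(int)
    for developer, added, removed in changes:
        churn[developer] += added + removed  # churn = added + removed lines
    return dict(churn)


def core_developers(contribution, size=None, share=0.8):
    """Smallest group of top contributors covering `share` of `size`.

    contribution: dict developer -> churn (or commit count).
    size: total churn of the project; defaults to the sum of contributions.
    share: assumed threshold, not taken from the slides.
    """
    if size is None:
        size = sum(contribution.values())
    # Largest contributors first; names break ties deterministically.
    ranked = sorted(contribution.items(), key=lambda kv: (-kv[1], kv[0]))
    core, covered = [], 0
    for developer, amount in ranked:
        if covered >= share * size:
            break
        core.append(developer)
        covered += amount
    return core


changes = [("ana", 80, 10), ("bob", 5, 2), ("cy", 2, 1)]
churn = churn_per_developer(changes)
print(core_developers(churn, size=sum(churn.values())))
```

Keeping `size` as an explicit parameter mirrors the slide, where the total churn of the project is passed to the heuristic as input.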
Research Questions
How accurate are the results provided by each algorithm?
How accurate are the results provided by each heuristic?
Conclusions
- The original threshold (50% of abandoned files) proposed by Avelino et al. (2016) gives the best results in identifying Truck Factor developers.
- The RIG algorithm is non-deterministic, and increasing the number of tested samples does not improve its accuracy.
- Truck Factor developers are in most cases a subset of Core Developers, i.e., they tend to be the “core of the core developers”.
Suggestions
- Replicate the work with proprietary software.
- Consider a time window for the artifacts used by Truck Factor algorithms.
- Consider social interaction in TF algorithms.