Algorithms for estimating truck factors: a comparative study

Mívian Ferreira · Thaís Mombach · Marco Tulio Valente · Kecia Ferreira

Kevin Hernández Rostrán

Seminario de Investigación II - Tecnológico de Costa Rica

Truck Factor

“is defined as the minimal number of developers that have to be hit by a truck (or leave the team) in order to put the project in trouble”

Low TF

The project’s knowledge is concentrated in few team members and therefore the project faces a serious risk of discontinuation in case these developers leave.

High TF

Every developer is contributing to the project in similar terms.

ZWK

  • Zazworka et al. (2010).
  • First algorithm for TF.

AVL

  • Avelino et al. (2016).
  • Degree-of-Authorship (DOA).

Coverage

Percentage of files that should be abandoned to configure a disaster scenario.
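
To make the coverage threshold concrete, here is a minimal sketch of a greedy truck-factor estimate: it repeatedly removes the developer who authors the most files and stops once more than the threshold share of files has no remaining author. The input format (a mapping from file to its set of authors), the function name, and the default of 0.5 (the 50% figure cited in the conclusions) are assumptions for illustration; this is not the published AVL pseudo-code.

```python
from collections import Counter


def greedy_truck_factor(file_authors, threshold=0.5):
    """Return the developers whose removal abandons > threshold of files.

    file_authors: dict mapping file path -> set of author names
                  (hypothetical input format).
    threshold:    share of abandoned files that counts as a disaster.
    """
    # Work on a copy so the caller's data is not modified.
    remaining = {f: set(a) for f, a in file_authors.items()}
    total = len(remaining)
    removed = []
    while total:
        abandoned = sum(1 for authors in remaining.values() if not authors)
        if abandoned / total > threshold:
            break
        counts = Counter(d for authors in remaining.values() for d in authors)
        if not counts:
            break  # nobody left to remove
        # Pick the developer with the most files; ties are broken by name
        # so the result is deterministic.
        top = min(counts, key=lambda d: (-counts[d], d))
        removed.append(top)
        for authors in remaining.values():
            authors.discard(top)
    return removed
```

The length of the returned list is the estimated truck factor; the list itself names the developers it depends on.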

Degree-of-Authorship (DOA)

The creation of a file f by a developer d initializes the value of DOA(d, f); further commits on f by d increase DOA(d, f); finally, commits by other developers decrease DOA(d, f).
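
As a rough illustration of these three ingredients, the sketch below scores one developer on one file. The linear form and the weights are the ones commonly quoted for DOA in Avelino et al. (2016); they do not appear in these slides, so treat them as an assumption. The function name and the commit-history format are hypothetical.

```python
import math


def doa(developer, file_commits):
    """Degree-of-Authorship of `developer` for a single file.

    file_commits: the file's history, oldest first, as a list of author
                  names; the first entry is taken to be the file's creator.
    """
    if not file_commits:
        raise ValueError("file has no commits")
    # 1 if the developer created the file, else 0.
    first_authorship = 1 if file_commits[0] == developer else 0
    # Later commits by the same developer raise the score.
    deliveries = sum(1 for a in file_commits[1:] if a == developer)
    # Later commits by other developers lower the score.
    # Simplification: the creating commit is counted only through
    # first_authorship, never as a change by others.
    acceptances = sum(1 for a in file_commits[1:] if a != developer)
    # Assumed weights (not stated in the slides).
    return (3.293
            + 1.098 * first_authorship
            + 0.164 * deliveries
            - 0.321 * math.log(1 + acceptances))
```

With a history such as `["ana", "ana", "bob"]`, the creator `ana` scores higher than `bob`, and each further commit by `bob` pulls `ana`'s score down.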

RIG

  • Rigby et al. (2016).
  • It is a non-deterministic algorithm.
  • RIG can finish without computing a valid TF result.

CST

  • Cosentino et al. (2015).
  • Primary developers (P).
  • Secondary developers (S).
  • The authors provide neither further details nor the pseudo-code of the CST algorithm.

Core Developers

Commit-Based Heuristic

LOC-Based Heuristic

  • Uses the sum of added and removed lines, which the authors call churn.
  • The input parameter size receives the total churn of the analyzed project.
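
A minimal sketch of a churn-based core-developer heuristic, assuming per-commit records of (author, lines added, lines removed). The parameter `size` mirrors the slide's description (the total churn of the project); the 0.8 cut-off and the function name are hypothetical placeholders, since the slides do not state the threshold used.

```python
from collections import defaultdict


def core_developers_by_churn(commits, size=None, share=0.8):
    """Return the top contributors who together reach `share` of the churn.

    commits: iterable of (author, lines_added, lines_removed) tuples
             (hypothetical input format).
    size:    total churn of the project; computed from `commits` if omitted.
    share:   fraction of churn the core group must cover (assumed value).
    """
    churn = defaultdict(int)
    for author, added, removed in commits:
        churn[author] += added + removed  # churn = added + removed lines
    if size is None:
        size = sum(churn.values())
    core, covered = [], 0
    # Largest contributors first; ties are broken by name.
    for author, lines in sorted(churn.items(), key=lambda kv: (-kv[1], kv[0])):
        if size == 0 or covered / size >= share:
            break
        core.append(author)
        covered += lines
    return core
```

A commit-based variant would follow the same shape, counting one unit per commit instead of summing changed lines.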

 

Research Questions

How accurate are the results provided by each algorithm?

How accurate are the results provided by each heuristic?

Conclusions

  • The original threshold (50% of abandoned files) proposed by Avelino et al. (2016) gives the best results in identifying Truck Factor developers.
  • The RIG algorithm is non-deterministic, and increasing the number of tested samples does not improve its accuracy.
  • Truck Factor developers are in most cases a subset of Core Developers, i.e., they tend to be the “core of the core developers”.

Suggestions

  • Replicate the work with proprietary software.
  • Consider a time window for the artifacts used by Truck Factor algorithms.
  • Consider social interactions in TF algorithms.
