Theories, tools and research methods in program comprehension: past, present and future
By Margaret-Anne Storey, 2006
Presented by Rodrigo Araujo
Paper goals
Summarize many cognitive theories and techniques applied to program comprehension
How do developers understand code?
How these theories and techniques can lead to improvements in program comprehension tools
How program comprehension will change in the future
Cognitive models + development tools
Easier program comprehension
Cognitive science
Study of the mind and its processes. It examines the nature, the tasks, and the functions of cognition.
Mental model
Engineer's mental representation of the program to be understood
Cognitive model
Cognitive processes and temporary information structures in the engineer's head used to form the mental model
Programming plans
Generic fragments of code that represent typical scenarios
Delocalized plans
Programming plans implemented in disparate areas of the program
Beacons
Recognizable, familiar features in the code that act as cues to the presence of certain structures
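As an illustration that is not taken from the paper, the sketch below shows a small Python function in which an adjacent swap inside nested loops acts as such a cue: a reader who spots the swap can guess "sorting" before reading every line. The function name and the data are hypothetical.

```python
def order_scores(scores):
    """Return a sorted copy of scores (bubble sort, for illustration only)."""
    items = list(scores)
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                # The cue: an adjacent swap inside nested loops hints at sorting.
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


ordered = order_scores([3, 1, 2])
```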
Comprehension models
Top-down comprehension
Reconstructing knowledge about the domain of the program and mapping it to the source code
Hypothesis on the nature of the program
Hypothesis is refined in a hierarchical fashion by forming sub-hypotheses
Sub-hypotheses are refined and evaluated depth-first
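The depth-first refinement described above can be loosely sketched as a tree walk. This is only a toy model, not something the paper specifies: the `Hypothesis` class, the `evidence` substring check and the sample source text are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    claim: str
    evidence: str = ""  # hypothetical check: text the reader expects to find
    children: list = field(default_factory=list)


def evaluate(hyp, source, trace, depth=0):
    """Visit a hypothesis, then each sub-hypothesis in depth-first order."""
    trace.append((depth, hyp.claim))  # record the visiting order
    if not hyp.children:
        return hyp.evidence in source
    # Evaluate every child (no short-circuit) so the trace shows the full walk.
    results = [evaluate(c, source, trace, depth + 1) for c in hyp.children]
    return all(results)


source = "def load(path): ...\ndef total(rows): return sum(r for r in rows)\n"
root = Hypothesis("program summarises invoices", children=[
    Hypothesis("reads input records", children=[
        Hypothesis("has a loader", evidence="def load"),
    ]),
    Hypothesis("adds up amounts", evidence="sum("),
])
trace = []
confirmed = evaluate(root, source, trace)
```

The trace visits the deepest sub-hypothesis under "reads input records" before moving on to its sibling, which is the depth-first order the slide refers to.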
Bottom-up comprehension
Group small chunks of statements
Form higher level abstractions
Repeat until high-level understanding of the program is attained
Knowledge-base model
Programmers as opportunistic processors capable of exploiting both bottom-up and top-down cues (Letovsky, 1986)
Knowledge base
Developer's expertise and background knowledge
Mental model
Developer's current understanding of the program
Assimilation process
How the mental model evolves using the developer's knowledge base + code + docs. Bottom-up or top-down
Asking questions and conjecturing answers
Why conjectures
Questioning the role of a function or piece of code
How conjectures
What is the method for accomplishing a goal
What conjectures
What something is, e.g. a variable or one of the program's functions
Integrated metamodel
Everything together: it builds on the previous models. It uses a knowledge base to support three comprehension processes
Top-down domain model
Invoked and developed using an as-needed strategy, when the code is familiar. It uses domain knowledge as a starting point for formulating hypotheses
Program model
Invoked when the code is unfamiliar; it is a control-flow abstraction
Situation model
Dataflow and functional abstractions in the program. Developed after a partial program model is formed
Knowledge base
Information needed to build the top-down domain model, program model and situation model
"Understanding is formed at several levels of abstractions simultaneously by switching between the three comprehension processes"
Current theories and tool support
Documentation should be designed to support top-down comprehension (Brooks, 1983). It is important to document the problem domain, programming concepts and domain knowledge
Tools should support searching code by analogy and iterative searching. They should allow querying on the role of a variable, function, etc.
The top-down process requires browsing from high-level abstractions to lower-level details, taking advantage of beacons. The bottom-up process requires following control flow and data flow. Both should be supported
Multiple views
Programming environments need to provide different ways of representing programs: textual representation, call graphs, classes and their relationships, etc. These views, if easily accessible, should facilitate comprehension, especially if combined and cross-referenced
Tool requirements
Concept assignment problem
Hard task of mapping the code to the requirements
#1: User indicates a starting point and then uses program slicing techniques to find related code
#2: Intelligent agent to scan code and search for candidate starting points
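Approach #1 can be sketched with a very small backward slicer. This is a minimal illustration under strong assumptions, not the technique any particular tool uses: it handles only straight-line Python with simple `name = expr` assignments, the function `backward_slice` and the sample program are hypothetical, and a real slicer would also have to deal with control flow, calls and aliasing.

```python
import ast


def backward_slice(source, target):
    """Return line numbers of top-level assignments that may affect `target`.

    Simplification: only straight-line code with `name = expr` statements.
    """
    statements = [s for s in ast.parse(source).body if isinstance(s, ast.Assign)]
    wanted = {target}  # variables whose definitions we still need to find
    kept = []
    for stmt in reversed(statements):
        defined = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        if defined & wanted:
            kept.append(stmt.lineno)
            used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            wanted = (wanted - defined) | used
    return sorted(kept)


code = """\
price = 10
qty = 3
label = "order"
subtotal = price * qty
tax = subtotal * 0.2
banner = label.upper()
total = subtotal + tax
"""
slice_lines = backward_slice(code, "total")
```

Here the user's starting point is the variable `total`; the slice keeps lines 1, 2, 4, 5 and 7 and leaves out the `label` and `banner` lines, which do not influence it.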
Reverse engineering
Call trees, diff tools, browsing history and entity fan-in can support the top-down model
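As a rough sketch of how entity fan-in might be computed, the snippet below uses Python's standard `ast` module to record which top-level functions call each name. It is a simplification under stated assumptions: only direct calls by plain name are counted (method calls such as `text.split()` are ignored), and the helper `fan_in` and the sample code are hypothetical.

```python
import ast
from collections import defaultdict


def fan_in(source):
    """Map each called name to the sorted list of top-level functions calling it."""
    callers = defaultdict(set)
    for func in ast.parse(source).body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Only direct calls by name; attribute calls are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callers[node.func.id].add(func.name)
    return {name: sorted(who) for name, who in callers.items()}


code = """
def parse(text):
    return text.split()

def count(text):
    return len(parse(text))

def report(text):
    return (count(text), parse(text))
"""
table = fan_in(code)
```

A function with many callers, such as `parse` here, is a candidate for a widely shared utility and may be a useful place to start reading.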
Software visualization tool needs
Support the comprehension models by displaying how components interact
Importance of search and history
Tools should support rediscovery. Developers quickly forget the details of a specific part of the program when they move to a new location.
Information needs for maintainers
7 questions are usually asked:
1. Where is a particular subroutine/procedure invoked?
2. What are the arguments and results of a function?
3. How does control flow reach a particular location?
4. Where is a particular variable set, used or queried?
5. Where is a particular variable declared?
6. Where is a particular data object accessed?
7. What are the inputs and outputs of a module?
Tools should try to answer these questions
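As a hedged illustration of how a tool might answer question 4 for Python code, the sketch below uses the standard `ast` module to list the lines where a variable is set and where it is used. The helper `variable_sites` and the sample program are hypothetical, and the sketch ignores scoping, function parameters and augmented assignments.

```python
import ast


def variable_sites(source, name):
    """List the lines where `name` is set (Store) and used (Load)."""
    sites = {"set": [], "used": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            if isinstance(node.ctx, ast.Store):
                sites["set"].append(node.lineno)
            elif isinstance(node.ctx, ast.Load):
                sites["used"].append(node.lineno)
    return {kind: sorted(lines) for kind, lines in sites.items()}


code = """\
total = 0
for n in [1, 2, 3]:
    total = total + n
print(total)
"""
sites = variable_sites(code, "total")
```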
Tool research
Parsers & data gathering tools to collect static/dynamic data.
Support clustering, concept assignment, feature identification, metrics. Dynamic analysis: code instrumentation
Code browsers, code editors, visualization tools
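The dynamic analysis by code instrumentation mentioned above can be sketched with a decorator that counts calls at run time. This is one simple way to collect dynamic data, not the approach of any specific tool; the names `instrument`, `call_counts` and `fib` are hypothetical.

```python
import functools
from collections import Counter

call_counts = Counter()  # dynamic data gathered while the program runs


def instrument(func):
    """Wrap func so that every call is recorded in call_counts."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@instrument
def fib(n):
    # Recursive calls go through the wrapper too, so they are counted.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(5)
```

Counts like these could feed the metrics or feature identification steps, for example by showing which functions are exercised when a given feature runs.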
Programmer and program trends: what will the future look like?
More diversified programmers
"Programming, and hence program comprehension, is no longer a niche"
Sophisticated users
The author bets that developers will use more complex UIs to build software
Globally distributed teams
Spot on!

Agile developers
Spot on!

Popularization of distributed and web applications
"[...] more prevalent with technologies such as .NET, J2EE and web services. One programming challenge that is occurring now and is likely to increase, is the combination of different paradigms in distributed applications, e.g. a client side script sends XML to a server application"
Improved and newer software engineering practices
We still have problems, but practices have definitely improved over the past few years
How tools/theories will evolve in response to these changes
"Learning theories (Exton, 2002) will become more relevant to end-users doing programming-like tasks."
"Theories are currently being developed to describe the social and organizational aspects of program comprehension (Gutwin, 2004)"
"The use of frameworks as an underlying technology for software tools is leading to faster tool innovations as less time needs to be spent reinventing the wheel"
Recommendation systems and search systems will evolve (e.g. NavTracks). Visualization tools will evolve. Collaboration tools will evolve. Thus, program comprehension tools will evolve.
But how can all this be seamlessly integrated into the developer's workflow without increasing complexity?

How can we avoid bloated IDEs?
How can we make software more cognitively friendly?
Paper pros/cons
Huge number of ideas for developer tools
Great interdisciplinary study
Spreading awareness on how important cognitive models are
Evaluating tools is hard
Experience cannot always measure a developer's ability and creativity
By Rodrigo Araújo
