Theories, tools and research methods in program comprehension: past, present and future
By Margaret-Anne Storey, 2006
Presented by Rodrigo Araujo
Paper goals
Summarize many cognitive theories and techniques applied to program comprehension
How do developers understand code?
How these theories and techniques can lead to improvements in program comprehension tools
How program comprehension will change in the future
Cognitive models + development tools
Easier program comprehension
Terminology
Cognitive science
Study of the mind and its processes. It examines the nature, the tasks, and the functions of cognition.
Mental model
Engineer's mental representation of the program to be understood
Cognitive model
Cognitive processes and temporary information structures in the engineer's head used to form the mental model
Programming plans
Generic fragments of code that represent typical scenarios
Delocalized plans
Programming plans implemented in disparate areas of the program
Beacons
Recognizable, familiar features in the code that act as cues to the presence of certain structures
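A minimal sketch of two of the terms above, not taken from the paper. It assumes a running-total loop as an example of a programming plan and a swap inside nested loops as an example of a beacon for sorting. The function names `total` and `bubble_sort` are hypothetical.

```python
def total(values):
    # Programming plan: the "running total" loop
    # (initialize an accumulator, iterate, accumulate).
    result = 0
    for v in values:
        result += v
    return result


def bubble_sort(items):
    items = list(items)  # work on a copy so the input is not modified
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                # Beacon: a swap inside nested loops hints that this is a sort.
                items[j], items[j + 1] = items[j + 1], items[j]
    return items
```

A reader who spots the swap can guess "this sorts" before reading the loop bounds in detail.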
Comprehension models
Top-down comprehension
Reconstructing knowledge about the domain of the program and mapping it to the source code
Hypothesis on the nature of the program
Hypothesis is refined hierarchically by forming sub-hypotheses
Sub-hypotheses are refined and evaluated depth-first
Bottom-up comprehension
Group small chunks of statements
Form higher-level abstractions
Repeat until a high-level understanding of the program is attained
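The bottom-up steps can be illustrated on a small function. This is only a sketch: the function `average_positive` and the chunk labels in the comments are hypothetical, chosen to show how a reader might group statements and then name the whole.

```python
def average_positive(numbers):
    # Chunk 1 (filter): "keep the positive values"
    positives = []
    for n in numbers:
        if n > 0:
            positives.append(n)

    # Chunk 2 (guard): "nothing to average"
    if not positives:
        return 0.0

    # Chunk 3 (accumulate and divide): "compute the mean"
    total = 0
    for p in positives:
        total += p
    return total / len(positives)

# Higher-level abstraction formed from the three chunks:
# "mean of the positive inputs, or 0.0 if there are none"
```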
Knowledge-base model
Programmers as opportunistic processors capable of exploiting both bottom-up and top-down cues (Letovsky, 1986)
Knowledge base
Developer's expertise and background knowledge
Mental model
Developer's current understanding of the program
Assimilation process
How the mental model evolves using the developer's knowledge base, the code and the docs. It can be bottom-up or top-down
Knowledge base + mental model + assimilation process
Inquiries
Asking questions and conjecturing answers
Why conjectures
Questioning the role of a function or piece of code
How conjectures
What is the method for accomplishing a goal?
What conjectures
What is a variable or one of the program functions?
Integrated metamodel
Everything together: it builds on the previous models and uses a knowledge base to support three comprehension processes
Top-down domain model
Invoked and developed using an as-needed strategy when the code is familiar. It uses domain knowledge as a starting point for formulating hypotheses
Program model
Invoked when the code is unfamiliar. It is a control-flow abstraction
Situation model
Dataflow and functional abstractions in the program. Developed after a partial program model is formed
Knowledge base
Information needed to build the top-down domain model, program model and situation model
"Understanding is formed at several levels of abstractions simultaneously by switching between the three comprehension processes"
Current theories and tool support
Documentation
Should be designed to support top-down comprehension (Brooks, 1983). Important to document problem domain, programming concepts and domain knowledge
Searching/Querying
Tools should support searching code by analogy and iterative searching. They should allow querying on the role of a variable, function, etc.
Browsing/Navigation
The top-down process requires browsing from high-level abstractions to lower-level details, taking advantage of beacons. The bottom-up process requires following control flow and dataflow. Both should be supported
Multiple views
Programming environments need to provide different ways of representing programs: textual representations, call graphs, classes and their relationships, etc. These views, if easily accessible, should facilitate comprehension, especially if combined and cross-referenced
Tool requirements
Concept assignment problem
Hard task of mapping the code to the requirements
#1: User indicates a starting point and then uses program slicing techniques to find related code
#2: Intelligent agent to scan code and search for candidate starting points
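Approach #1 can be sketched with a toy backward slice. This is not the technique from any specific tool: it assumes straight-line code made only of simple assignments, and the function name `backward_slice` is hypothetical. The user's starting point is a variable name, and the slice is the set of lines that variable depends on.

```python
import ast


def backward_slice(source, target):
    """Return the line numbers of the assignments that `target` depends on.

    Assumption: `source` is straight-line code made of simple assignments
    (no branches, loops, or function calls with side effects).
    """
    relevant = {target}  # variables whose definitions we still need to find
    lines = []
    # Walk the statements from last to first.
    for stmt in reversed(ast.parse(source).body):
        if not isinstance(stmt, ast.Assign):
            continue
        defined = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        if defined & relevant:
            used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            # The defined names are resolved; the names they use become relevant.
            relevant = (relevant - defined) | used
            lines.append(stmt.lineno)
    return sorted(lines)
```

For the source `a = 1`, `b = 2`, `c = a + 1`, `d = b * 2`, `e = c + 5` (one statement per line), slicing on `e` keeps lines 1, 3 and 5 and drops the unrelated assignments to `b` and `d`.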
Reverse engineering
Call trees, diff tools, browsing history and entity fan-in can support the top-down model
Software visualization tool needs
Support the comprehension models by displaying how components interact
Importance of search and history
Tools should support rediscovery. Developers quickly forget the details of a specific part of the program when they move to a new location.
Information needs for maintainers
7 questions are usually asked:
1. Where is a particular subroutine/procedure invoked?
2. What are the arguments and results of a function?
3. How does control flow reach a particular location?
4. Where is a particular variable set, used or queried?
5. Where is a particular variable declared?
6. Where is a particular data object accessed?
7. What are the inputs and outputs of a module?
Tools should try to answer these questions
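As a rough sketch of how a tool might answer question 1 for Python code, the snippet below uses the standard `ast` module to list the lines where a function is called. The helper name `find_call_sites` is hypothetical, and the sketch assumes calls by plain name only (it ignores method calls such as `obj.f()`).

```python
import ast


def find_call_sites(source, function_name):
    """Return the sorted line numbers where `function_name` is called by plain name."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == function_name):
            sites.append(node.lineno)
    return sorted(sites)
```

A line that calls the function twice appears twice in the result, so the list length is the number of call sites.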
Tool research
Extraction
Parsers and data-gathering tools to collect static/dynamic data
Analysis
Support clustering, concept assignment, feature identification and metrics. Dynamic analysis: code instrumentation
Presentation
Code browsers, code editors, visualization tools
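The three stages can be sketched as a tiny static pipeline over Python source. This is an illustrative assumption, not a tool described in the paper: the function names `extract_calls`, `fan_in` and `present` are hypothetical, only top-level functions and calls by plain name are considered, and fan-in is used as the example metric.

```python
import ast
from collections import Counter


def extract_calls(source):
    """Extraction: collect (caller, callee) pairs from top-level functions."""
    edges = []
    for func in ast.parse(source).body:
        if isinstance(func, ast.FunctionDef):
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.append((func.name, node.func.id))
    return edges


def fan_in(edges):
    """Analysis: count the distinct callers of each callee."""
    return Counter(callee for _, callee in set(edges))


def present(counts):
    """Presentation: one text line per function, highest fan-in first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{name}: called by {n} function(s)" for name, n in ordered]
```

Keeping the stages separate means the text output in `present` could be swapped for a graphical view without touching extraction or analysis.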
Programmer and program trends: what will the future look like?
Programmer and program's trends
More diversified programmers
"Programming, and hence program comprehension, is no longer a niche"
Sophisticated users
The author bets that developers will use more complex UIs to build software
?
Globally distributed teams
Spot on!
Agile developers
Spot on!
Popularization of distributed and web applications
"[...] more prevalent with technologies such as .NET, J2EE and web services. One programming challenge that is occurring now and is likely to increase, is the combination of different paradigms in distributed applications, e.g. a client side script sends XML to a server application"
Improved and newer software engineering practices
We still have problems, but we have definitely improved over the past few years
How tools/theories will evolve in response to these changes
"Learning theories (Exton, 2002) will become more relevant to end-users doing programming-like tasks."
"Theories are currently being developed to describe the social and organizational aspects of program comprehension (Gutwin, 2004)"
"The use of frameworks as an underlying technology for software tools is leading to faster tool innovations as less time needs to be spent reinventing the wheel"
Conclusion
Recommendation systems and search systems will evolve (e.g., NavTracks). Visualization tools will evolve. Collaboration tools will evolve. Thus, program comprehension tools will evolve.
But how can all this be seamlessly integrated into the developer's workflow without increasing complexity?
How can we avoid bloated IDEs?
How can we make software more cognitively friendly?
Paper pros/cons
Pros
Cons
Huge amount of ideas for developer tools
Great interdisciplinary study
Spreading awareness on how important cognitive models are
Evaluating tools is hard
Experience cannot always measure developers ability and creativity