Theories, tools and research methods in program comprehension: past, present and future

By Margaret-Anne Storey, 2006

Presented by Rodrigo Araujo

Paper goals

Summarize many cognitive theories and techniques applied to program comprehension

How do developers understand code?

Paper goals

How these theories and techniques can lead to improvements in program comprehension tools

Paper goals

How program comprehension will change in the future

Paper goals

Cognitive models + development tools

Easier program comprehension

Terminology

Terminology

Cognitive science

Study of the mind and its processes. It examines the nature, the tasks, and the functions of cognition.

Terminology

Mental model

Engineer's mental representation of the program to be understood

Terminology

Cognitive model

Cognitive processes and temporary information structures in the engineer's head used to form the mental model

Terminology

Programming plans

Generic fragments of code that represent typical scenarios

Terminology

Delocalized plans

Programming plans implemented in disparate areas of the program

Terminology

Beacons

Recognizable, familiar features in the code that act as cues to the presence of certain structures
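
As a rough illustration of programming plans and beacons (not taken from the paper), the Python sketch below shows a "running total" plan and a swap inside nested loops, which is a commonly cited beacon for sorting. The function names `average` and `bubble_sort` are hypothetical and chosen only for this example.

```python
def average(values):
    # Programming plan: "running total" (initialize, accumulate in a loop, use the result).
    total = 0
    for v in values:
        total += v
    return total / len(values) if values else 0.0


def bubble_sort(items):
    result = list(items)  # work on a copy so the caller's list is untouched
    for i in range(len(result)):
        for j in range(len(result) - 1 - i):
            if result[j] > result[j + 1]:
                # Beacon: the swap hints "sorting" before the reader has traced the loops.
                result[j], result[j + 1] = result[j + 1], result[j]
    return result
```

If the accumulation in `average` were spread across several functions or files, the same plan would become a delocalized plan.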

Comprehension models

Comprehension models

Top-down comprehension

Reconstructing knowledge about the domain of the program and mapping it to the source code

Comprehension models

Top-down comprehension

Hypothesis on the nature of the program

Hypothesis is refined in a hierarchical fashion by forming sub-hypotheses

Sub-hypotheses are refined and evaluated depth-first

Comprehension models

Bottom-up comprehension

Group small chunks of statements

Form higher-level abstractions

Repeat until a high-level understanding of the program is attained

Comprehension models

Knowledge-base model

Programmers as opportunistic processors capable of exploiting both bottom-up and top-down cues (Letovsky, 1986)

Comprehension models

Knowledge-base model

Knowledge base

Developer's expertise and background knowledge

Mental model

Developer's current understanding of the program

Assimilation process

How the mental model evolves using the developer's knowledge base, the code, and the documentation. It can proceed bottom-up or top-down

Comprehension models

Knowledge-base model

Knowledge base + Mental model + Assimilation process

Inquiries

Comprehension models

Knowledge-base model

Inquiries

Asking questions and conjecturing answers

Comprehension models

Knowledge-base model

Inquiries

Why conjectures

Questioning the role of a function or piece of code

How conjectures

What is the method for accomplishing a goal?

What conjectures

What is a variable or one of the program's functions?

Comprehension models

Integrated metamodel

Everything together: it builds on the previous models and uses a knowledge base to support three comprehension processes

Comprehension models

Integrated metamodel

Top-down domain model

Invoked and developed using an as-needed strategy when the code is familiar. It uses domain knowledge as a starting point for formulating hypotheses

Program model

Invoked when the code is unfamiliar. It is a control-flow abstraction

Situation model

Dataflow and functional abstractions in the program. Developed after a partial program model is formed

Comprehension models

Integrated metamodel

Knowledge base

Information needed to build the top-down domain model, program model and situation model

Comprehension models

Integrated metamodel

"Understanding is formed at several levels of abstractions simultaneously by switching between the three comprehension processes"

Current theories and tool support

Current theories and tool support

Documentation

Should be designed to support top-down comprehension (Brooks, 1983). Important to document problem domain, programming concepts and domain knowledge

Searching/Querying

Tools should support searching code by analogy and iterative searching. They should allow querying on the role of a variable, function, etc.

Current theories and tool support

Browsing/Navigation

The top-down process requires browsing from high-level abstractions to lower-level details, taking advantage of beacons. The bottom-up process requires following control flow and data flow. Both should be supported

Multiple views

Programming environments need to provide different ways of representing programs: textual representation, call graphs, classes and their relationships, etc. These views, if easily accessible, should facilitate comprehension, especially if combined and cross-referenced

Tool requirements

Concept assignment problem

Hard task of mapping the code to the requirements

#1: User indicates a starting point and then uses program slicing techniques to find related code

#2: Intelligent agent to scan code and search for candidate starting points
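
To make approach #1 more concrete, here is a minimal sketch of a backward slice, assuming the user's starting point is a variable name and the program is straight-line Python with simple `name = expr` assignments. The helper `backward_slice` and the sample program are hypothetical; a real slicer would also have to handle branches, loops, calls, and aliasing.

```python
import ast


def backward_slice(source, target):
    """Return line numbers of top-level assignments that can affect `target`.

    Simplification: only straight-line code with simple `name = expr`
    assignments is handled (no branches, loops, or side effects).
    """
    statements = ast.parse(source).body
    relevant = {target}  # variables whose values still need to be explained
    lines = []
    for stmt in reversed(statements):
        if not isinstance(stmt, ast.Assign):
            continue
        defined = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        if defined & relevant:
            used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            # The assigned names are now explained; their inputs become relevant.
            relevant = (relevant - defined) | used
            lines.append(stmt.lineno)
    return sorted(lines)


SLICE_EXAMPLE = """\
a = 1
b = 2
c = a + 10
d = b * 2
e = c + 1
"""

print(backward_slice(SLICE_EXAMPLE, "e"))  # lines that contribute to `e`
```

Starting from `e`, the slice keeps only the assignments to `a`, `c`, and `e`, which is the "related code" a maintainer would be pointed to.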

Tool requirements

Reverse engineering

Call trees, diff tools, browsing history and entity fan-in can support the top-down model
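
As a hedged sketch of two of these aids, the code below derives a simple call graph and entity fan-in for Python source using the standard `ast` module. The functions `call_graph` and `fan_in` and the sample program are assumptions made for illustration; only calls by plain name are tracked, so method calls such as `text.split(...)` are ignored.

```python
import ast
from collections import defaultdict


def call_graph(source):
    """Map each top-level function name to the set of plain names it calls."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph


def fan_in(graph):
    """Count how many distinct functions call each callee."""
    counts = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            counts[callee] += 1
    return dict(counts)


CALLS_EXAMPLE = """\
def load(): return parse(read())
def read(): return "1,2"
def parse(text): return text.split(",")
def main(): print(parse("3"), load())
"""

graph = call_graph(CALLS_EXAMPLE)
print(graph["main"])   # callees of main
print(fan_in(graph))   # how many callers each function has
```

A function with high fan-in, such as `parse` here, is a natural candidate to understand early.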

Tool requirements

Software visualization tool needs

Support the comprehension models by displaying how components interact

Tool requirements

Importance of search and history

Tools should support rediscovery. Developers quickly forget the details of a specific part of the program when they move to a new location.

Tool requirements

Information needs for maintainers

7 questions are usually asked:

1. Where is a particular subroutine/procedure invoked?

2. What are the arguments and results of a function?

3. How does control flow reach a particular location?

4. Where is a particular variable set, used or queried?

5. Where is a particular variable declared?

6. Where is a particular data object accessed?

7. What are the inputs and outputs of a module?

Tools should try to answer these questions
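
A tool could answer some of these questions with fairly little machinery. The sketch below, which is an assumption for illustration and not a tool described in the paper, uses Python's standard `ast` module to answer question 1 (where is a function invoked?) and question 4 (where is a variable set or used?). The helper names are hypothetical, and scoping is ignored: every occurrence of the name in the file is reported.

```python
import ast


def where_invoked(source, function_name):
    """Question 1: line numbers where `function_name` is called by plain name."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == function_name
    )


def where_set_or_used(source, variable):
    """Question 4: line numbers where `variable` is assigned and where it is read."""
    set_lines, used_lines = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == variable:
            if isinstance(node.ctx, ast.Store):
                set_lines.add(node.lineno)
            elif isinstance(node.ctx, ast.Load):
                used_lines.add(node.lineno)
    return sorted(set_lines), sorted(used_lines)


QUESTIONS_SAMPLE = """\
def area(w, h):
    return w * h

width = 3
height = 4
total = area(width, height)
total = total + area(1, 1)
print(total)
"""

print(where_invoked(QUESTIONS_SAMPLE, "area"))
print(where_set_or_used(QUESTIONS_SAMPLE, "total"))
```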

Tool research

Extraction

Analysis

Presentation

Tool research

Extraction

Parsers & data gathering tools to collect static/dynamic data.

Analysis

Support clustering, concept assignment, feature identification, metrics. Dynamic analysis: code instrumentation

Presentation

Code browsers, code editors, visualization tools
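
The three stages can be pictured as a small pipeline. The following is a toy sketch under stated assumptions: extraction uses Python's `ast` parser, analysis computes one made-up metric (the number of `if`, `for`, and `while` statements per function), and presentation is a plain-text report. The function names `extract`, `analyze`, and `present` are hypothetical.

```python
import ast


def extract(source):
    """Extraction: parse the source and collect (name, FunctionDef node) facts."""
    return [(n.name, n) for n in ast.parse(source).body if isinstance(n, ast.FunctionDef)]


def analyze(facts):
    """Analysis: a toy metric, the number of branching statements per function."""
    branch_types = (ast.If, ast.For, ast.While)
    return {
        name: sum(isinstance(n, branch_types) for n in ast.walk(node))
        for name, node in facts
    }


def present(metrics):
    """Presentation: a plain-text report, highest count first."""
    ordered = sorted(metrics.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{name:<12}{count}" for name, count in ordered)


PIPELINE_SAMPLE = """\
def simple(x):
    return x + 1

def busy(xs):
    out = []
    for x in xs:
        if x > 0:
            out.append(x)
    return out
"""

print(present(analyze(extract(PIPELINE_SAMPLE))))
```

Keeping the stages separate means a different presentation (for example, a graph view) could reuse the same extracted facts.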

Programmer and program's trends: what will the future look like?

Programmer and program's trends

More diversified programmers

"Programming, and hence program comprehension, is no longer a niche"

Programmer and program's trends

Sophisticated users

The author bets that developers will use more complex UIs to build software

?

Programmer and program's trends

Globally distributed teams

Spot on!

Programmer and program's trends

Agile developers

Spot on!

Programmer and program's trends

Popularization of distributed and web applications

"[...] more prevalent with technologies such as .NET, J2EE and web services. One programming challenge that is occurring now and is likely to increase, is the combination of different paradigms in distributed applications, e.g. a client side script sends XML to a server application"

Programmer and program's trends

Improved and newer software engineering practices

We still have problems, but we have definitely improved over the past few years

How tools/theories will evolve in response to these changes

How tools/theories will evolve in response to these changes

"Learning theories (Exton, 2002) will become more relevant to end-users doing programming-like tasks."

"Theories are currently being developed to describe the social and organizational aspects of program comprehension (Gutwin, 2004)"

How tools/theories will evolve in response to these changes

"The use of frameworks as an underlying technology for software tools is leading to faster tool innovations as less time needs to be spent reinventing the wheel"

Conclusion

Recommendation systems and search systems will evolve (e.g., NavTracks). Visualization tools will evolve. Collaboration tools will evolve. Thus, program comprehension tools will evolve.

Conclusion

But how can all this be seamlessly integrated into the developer's workflow without increasing complexity?

How can we avoid bloated IDEs?

Conclusion

How can we make software more cognitively friendly?

Paper pros/cons

Pros

A huge number of ideas for developer tools

Great interdisciplinary study

Spreads awareness of how important cognitive models are

Cons

Evaluating tools is hard

Experience cannot always measure developers' ability and creativity

Theories, tools and research methods in program comprehension: past, present and future

By Rodrigo Araújo
