May 24, 2012

The Importance of Development Documentation

Overview

Lately I've found myself harping on the importance of documenting code, program execution, and SCM items (i.e. JIRA issues and Perforce changelists). Documentation can be a controversial topic, particularly when mixing people from opposite camps on the subject. It has even been referred to as a philosophical difference.

Typical arguments against producing documentation for internal consumption tend to fall into the following two categories:

  • The documentation is superfluous with respect to the source code.
  • The resources spent producing documentation are better spent elsewhere.

While I could continue to espouse the benefits of good documentation, in many ways the discussion reduces to a he-said/she-said disagreement. So instead of proselytizing I will present scientific evidence in support of documentation. It is not a difference of philosophy.

The majority of the evidence presented here applies to software developers, but analogous benefits apply to anyone involved in the development process, including QA, technical writers, and anyone else who may need to synthesize information about the product. Only evidence indicated as statistically significant is included.

This essay will not cover the benefits of clean code although those benefits may be discussed in the referenced papers. For more on clean code, see Clean Code: A Handbook of Agile Software Craftsmanship [Google Books] or Writing clean code [IBM developerWorks].

The Importance of Comprehension

To measure the value of a variable or attribute in a scientific experiment, a metric must be defined. For documentation, that metric is comprehension, together with the benefits that result from improved comprehension.

Debugging Efficiency

Leo Gugerty and Gary M. Olson. 1986. Comprehension Differences in Debugging by Skilled and Novice Programmers. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 13-27.

Gugerty and Olson conducted an experiment to determine differences in debugging skill between novice and expert programmers. Experts were able to identify and fix the bugs in roughly half the time (18.2m/17.3m for novices, 7.0m/9.3m for experts), with fewer attempts (4.5/2.2 for novices, 1.9/1.1 for experts), and with a lower probability of introducing new bugs (23%/30% for novices, 17%/0% for experts). The results indicated that this was largely because experts generated high-quality hypotheses with less study of the code, primarily owing to their superior ability to comprehend the program.

Murthi Nanja and Curtis R. Cook. 1987. An analysis of the on-line debugging process. In Empirical studies of programmers: second workshop, Gary M. Olson, Sylvia Sheppard, and Elliot Soloway (Eds.). Ablex Publishing Corp., Norwood, NJ, USA 172-184.

Nanja and Cook studied differences in the debugging process of expert, intermediate, and novice programmers and measured their performance when debugging. Their results support the conclusions of Gugerty and Olson's study: experts relied on superior program comprehension to fix bugs faster (19.8m for experts, 36.55m/56.0m for intermediates and novices), with fewer code changes (8.83 LOC for experts, 10.33/23.16 LOC for intermediates and novices), and without introducing as many new bugs (1 for experts, 2.33/4.83 for intermediates and novices).

Robert W. Holt, Deborah A. Boehm-Davis, and Alan C. Shultz. 1987. Mental representations of programs for student and professional programmers. In Empirical studies of programmers: second workshop, Gary M. Olson, Sylvia Sheppard, and Elliot Soloway (Eds.). Ablex Publishing Corp., Norwood, NJ, USA 33-46.

Holt et al. examined the correlation between a programmer's perception of the difficulty and complexity of code and that programmer's debugging performance. They found a small but significant correlation between debugging time/attempts and the difficulty in finding information (0.235/0.184/0.237) and the difficulty in recognizing program units (0.291/0.177/0.205). A somewhat less significant correlation was found between difficulty in working with the code and time to debug (0.210) and between program formatting being too condensed and number of debugging transactions (0.197).

Poor comprehension increased the time to fix bugs and correlated with the introduction of new bugs or incorrect fixes.

Systematic Understanding

David C. Littman, Jeannine Pinto, Stanley Letovsky, and Elliot Soloway. 1987. Mental models and software maintenance. Journal of Systems and Software. 7, 4 (December 1987), 341-355. DOI=10.1016/0164-1212(87)90033-1 http://dx.doi.org/10.1016/0164-1212(87)90033-1

Littman et al. analyzed the development process of experienced programmers tasked with modifying a program and identified two strategies for understanding programs.

  1. Systematic developers trace data and control flow to understand global program behavior. The programmer detects causal interactions between program components and designs a modification taking these interactions into account.
  2. As-needed developers limit the scope of their understanding to the code that must be modified to implement the change. Data flow, control flow, and interactions that the modification may affect are unlikely to be discovered.

In their experiment all five developers who used the systematic strategy successfully modified the program while all five developers who used the as-needed strategy failed to modify the program correctly.

Failure to understand global program behavior and interactions between components resulted in incorrect implementation every time.

Code Reuse

Hoadley, C.M., Mann, L.M., Linn, M.C., & Clancy, M.J. (1996). When, Why and How do Novice Programmers Reuse Code? In W. Gray & D. Boehm-Davis (Eds.), Empirical Studies of Programmers, Sixth Workshop (pp. 109-130). Norwood, NJ: Ablex.

Among developers who are predisposed toward code reuse, comprehension influenced both the frequency and the form of reuse. Two mechanisms of reuse were examined:

  1. Direct reuse calls an existing function from new code.
  2. Cloning copies code out of an existing function into new code.

An abstract understanding of functions resulted in 65% reuse (both direct and cloned) while only an algorithmic understanding resulted in 12% reuse. Misunderstood functions had low direct reuse of 5% but were reused by cloning 40% of the time.

Code that is not well understood is less likely to be reused. Code that is misunderstood is likely to result in incorrect code.
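The two reuse mechanisms can be sketched in Python as follows. This is a contrived illustration, not code from the Hoadley et al. study: the function names are hypothetical, and the defect in the cloned version is invented to show how a copy of misunderstood code can silently diverge from the original.

```python
def normalize(name):
    """Return `name` trimmed and lower-cased so names compare reliably."""
    return name.strip().lower()


def same_user_direct(a, b):
    # Direct reuse: call the existing function from new code.
    return normalize(a) == normalize(b)


def same_user_cloned(a, b):
    # Cloning: the body of normalize() was copied by hand. In this
    # hypothetical copy the strip() call was dropped because its purpose
    # was not understood, so padded names no longer match.
    return a.lower() == b.lower()
```

With the inputs " Ann" and "ann ", the direct version reports a match while the cloned version does not.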

Improving Comprehension

Beacons

Beacons are key features in code that indicate the presence of a structure or operation and strengthen the reader's hypothesis of functional behavior. They serve as shortcuts towards comprehension; failing to recognize a beacon requires a developer to spend additional time on comprehension.

Susan Wiedenbeck. 1986. Processes in Computer Program Comprehension. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 48-57.

Wiedenbeck's experiments found that experienced programmers were able to recall 77.75% of the beacons versus 47.50% of the non-beacons in the code while novices only recalled 13.83% of the beacons and 30.42% of the non-beacons.

Martha E. Crosby, Jean Scholtz, and Susan Wiedenbeck. 2002. The Roles Beacons Play in Comprehension for Novice and Expert Programmers. In Programmers, 14th Workshop of the Psychology of Programming Interest Group, Brunel University. 18-21.

Comment beacons indicative of functionality are quickly processed by experienced programmers. Pure code beacons (i.e. important lines of code) require more time to process and might benefit from comprehension aids.

Edward M. Gellenbeck and Curtis R. Cook. 1991. An Investigation of Procedure and Variable Names as Beacons During Program Comprehension. Technical Report. Oregon State University, Corvallis, OR, USA.

Gellenbeck and Cook found that meaningful procedure and variable names resulted in higher rates (52% and 74%) of correct behavior identification compared to combinations with neutral procedure and variable names. However, this still leaves a large percentage of incorrect identification (48% and 26%) for undocumented source code.

Add documentation beacons (comments, mnemonic hints, or whitespace and formatting) that highlight important operations and logical concepts, both to speed up comprehension and to ensure the code is understood correctly.
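As a rough illustration of this recommendation, the Python sketch below combines a meaningful function name, a one-line summary, and a comment that flags the key operation. The routine and all of its names are hypothetical and are not taken from any of the cited studies.

```python
def sort_ascending(values):
    """Return a new list with the items of `values` in ascending order."""
    result = list(values)  # work on a copy so the caller's list is untouched

    # Each outer pass moves the largest remaining item to position `end`.
    for end in range(len(result) - 1, 0, -1):
        for i in range(end):
            if result[i] > result[i + 1]:
                # BEACON: swap adjacent out-of-order items (bubble sort step)
                result[i], result[i + 1] = result[i + 1], result[i]

    return result
```

The blank lines and the comment on the swap are the formatting and comment beacons; the name `sort_ascending` is the mnemonic hint.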

Plausible Slot Filling

Stanley Letovsky. 1986. Cognitive Processes in Program Comprehension. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 58-79.

Plausible slot filling is an attempt to explain an unknown based on existing incomplete knowledge. It is a result of abductive inference (http://en.wikipedia.org/wiki/Abductive_inference), i.e. guessing, where one tries to explain something through reversed logical deduction. In other words:

if "Q" and "P implies Q" then "maybe P"

The resulting guess may be incorrect. In Letovsky's experiment a developer incorrectly guessed that a memory allocation within a database function was for a database record. In another example the developer did not immediately understand why only six elements were displayed when the record array contained seven elements.

Document background information and the purpose of code to prevent incorrect conclusions, even when the issue appears isolated or minor.
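A minimal Python sketch of this kind of background comment follows. It loosely echoes the six-of-seven situation described above, but the reason given here (a reserved scratch slot) is invented for illustration and is not the explanation from Letovsky's study; the names are hypothetical.

```python
def visible_records(records):
    """Return the records that should be shown to the user.

    Background (hypothetical design): the last slot of `records` is
    reserved as scratch space for an edit in progress, so it never holds
    displayable data. That is why a seven-element array shows six rows.
    """
    # Drop the reserved scratch slot at the end of the array.
    return records[:-1]
```

Without the docstring, a reader seeing `records[:-1]` has to guess whether the slice is deliberate or an off-by-one bug.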

Program-Dependent Items

Mark Thomas and Stuart Zweben. 1986. The Effects of Program-Dependent and Program-Independent Deletions on Software Cloze Tests. In Papers presented at the first workshop on empirical studies of programmers on Empirical studies of programmers, Elliot Soloway and Sitharama Iyengar (Eds.). Ablex Publishing Corp., Norwood, NJ, USA, 138-152.

A cloze test is a comprehension and vocabulary test where words are removed from a larger body of text and the participant must fill them back in. Removed items fall into one of two categories:

  1. Program-independent items can be resolved correctly without understanding the functionality (e.g. by process of elimination or to meet compilation requirements).
  2. Program-dependent items require functional understanding for correct resolution.

In the tests conducted by Thomas and Zweben, cloze test error rates were 41.14%/32.11% for program-dependent items compared with only 12.41%/5.75% for program-independent items. Stated differently, participants had a much harder time deciphering the correct meaning of the code when lacking program-dependent information.

Document considerations (global, external, state) to reduce the chance of incorrect conclusions due to missing context.
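One way to record such considerations is directly in a docstring, as in the Python sketch below. The cache, the function, and the `fetch` callback are all hypothetical and serve only to show context that a reader could not recover from the function body alone.

```python
_cache = {}  # module-level state shared by every caller (hypothetical)


def lookup_price(item_id, fetch):
    """Return the price for `item_id`, fetching it at most once.

    Considerations that are not visible from this function alone:
    - Global: results are stored in the module-level `_cache`.
    - External: `fetch` is assumed to call a slow remote service.
    - State: a cached price is never refreshed; call `_cache.clear()`
      to force a new fetch.
    """
    if item_id not in _cache:
        _cache[item_id] = fetch(item_id)
    return _cache[item_id]
```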

Abstract Comprehension

Hoadley, C.M., Mann, L.M., Linn, M.C., & Clancy, M.J. (1996). When, Why and How do Novice Programmers Reuse Code? In W. Gray & D. Boehm-Davis (Eds.), Empirical Studies of Programmers, Sixth Workshop (pp. 109-130). Norwood, NJ: Ablex.

Experiments found that students having difficulty summarizing code were less likely to reuse code. Additionally, abstract comprehension resulted in 65% function reuse versus 12% function reuse with only algorithmic comprehension. Code that was not understood either abstractly or algorithmically was cloned 40% of the time which likely resulted in incorrect code.

Documentation should be written towards both abstract and algorithmic comprehension to increase code reuse and prevent incorrect code cloning.
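The sketch below shows one possible way to address both levels in a single Python docstring: a short abstract summary of what the function is for, followed by a note on how it works. The layout and labels are an assumption for illustration, not a format prescribed by the cited study.

```python
def median(values):
    """Return the middle value of `values`.

    Abstract: the median is the value that splits the data in half, so it
    is less sensitive to outliers than the mean.

    Algorithm: sort a copy, then take the middle item; for an even count,
    average the two middle items. Raises ValueError on empty input.
    """
    if not values:
        raise ValueError("median() requires at least one value")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

A reader deciding whether to reuse the function needs only the abstract line, while a reader modifying it can rely on the algorithm note.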

Encouraging Documentation

While the benefits and mechanisms of improved development documentation may be clear, it is also important to take action that will result in the production of this documentation.

Herb Krasner, Bill Curtis, and Neil Iscoe. 1987. Communication breakdowns and boundary spanning activities on large programming projects. In Empirical studies of programmers: second workshop, Gary M. Olson, Sylvia Sheppard, and Elliot Soloway (Eds.). Ablex Publishing Corp., Norwood, NJ, USA 47-64.

Krasner et al. conducted an informal analysis of the communication issues affecting large programming projects and identified areas in which the culture and environment discouraged effective communication. These areas include communication skills, incentive systems, representational formats, rapid change, jargon, information overload, scheduling pressure, and peer/management expectations.

Encouraging the production of documentation and effective communication must be accomplished through a combination of peer pressure and management behavior.

  • Hire or foster developers with high communication and technical competence who exhibit an attitude of egoless programming.
  • Reward documentation, communication, and long-term goals instead of short-term performance.
  • Use similar/standard documentation formats and minimize the use of jargon.

Posted by josuah at May 24, 2012 12:55 AM UTC+00:00
