Software Essence and Untyped Languages

In his classic book "The Mythical Man-Month" (specifically the essay "No Silver 
Bullet", included in the 1995 anniversary edition), Frederick P. Brooks noted 
that software development tools and techniques have not been able to greatly 
improve programming productivity.  Brooks posited that software development was 
composed of difficulties essential to the task - for which he used the term 
'essence' - and difficulties that were incidental to the task, which he called 
'accidents'.  His explanation for the lack of large improvements in productivity 
was that tools and techniques - including programming languages and paradigms - 
were already good enough to minimize the effort spent on incidental tasks, so 
that most software development effort goes into the essence of the software.  
essence of the software.  As a former colleague of mine once put it, 'the 
limiting factor when I program is not how fast I can type.'

Independent evidence for Brooks' theory comes from studies indicating that 
software development time does not depend greatly on the language used.  The 
one exception is assembler, which takes about twice as long, perhaps because 
assembly languages do not provide the help with incidental coding tasks that 
higher-level languages provide.

Several writers have recently observed that using untyped languages 
- languages that do not associate each variable name with a single data type 
- provides perhaps a factor of two improvement in productivity over strongly 
typed languages like Java and C++.  This advantage seems independent of the 
specific untyped language used: Joel Spolsky has observed the effect with 
Visual Basic, Bruce Eckel with Python, and I've had similar experiences with 
PHP.  The size of this improvement suggests that it is related to the essence, 
rather than merely to the incidentals, of software development, a suggestion 
reinforced by the fact that the effect does not depend on the language.

This would only be possible if the lack of strong typing actually reduced the 
essence involved in developing a piece of software, by eliminating some 
essential, rather than merely incidental, difficulties of that development.  
And, indeed, it does:  in a strongly typed language, data comes in various 
types, and the developer must consciously think about the type of each datum 
when coding to ensure that the program will compile and run.  In contrast, an 
untyped language has only a single, global, syntactic data type, relieving the 
developer of the burden of tracking the syntactic type, or representation, 
associated with each datum.
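For a concrete picture, here is a minimal sketch in Python, one of the untyped 
languages named above (strictly, Python values carry a type at run time, but 
variable names do not, which matches the definition used here).  The names 
`price` and `describe` are hypothetical, chosen only for illustration.

```python
# A variable name is not tied to one data type: `price` first refers to a
# string of digits, then to an integer built from that string.
price = "1250"
first_kind = type(price).__name__   # "str"
price = int(price)
second_kind = type(price).__name__  # "int"


def describe(value):
    # No declared parameter type: any value that str() can convert is
    # accepted, so the caller need not track which representation it passes.
    return "value: " + str(value)


print(first_kind, second_kind)  # str int
print(describe("1250"))         # value: 1250
print(describe(1250))           # value: 1250
```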

This suggests, however, that there are limits to this benefit.  Data, 
particularly in object-oriented programs, can have a wide variety of real, or 
essential, data types, even if this typing is not enforced by language syntax.

As long as the programmer can mentally track the essential type of a datum 
- for example, that it's an amount in dollars - being relieved of the tedium of 
tracking a syntactic type - such as whether the dollar value is represented as 
a string of digits or a binary integer - is advantageous.  This is especially 
true when the programmer's occasional lapses produce only non-catastrophic 
errors: for example, one value displayed incorrectly on a web page or other 
user interface, while the information on the rest of the page remains intact 
and useful.
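A minimal Python sketch of the dollar example, assuming a hypothetical web page 
built from (label, amount) rows.  The helper names `format_dollars` and 
`render_row` are invented for illustration.  `format_dollars` cares only about 
the essential type (a whole-dollar amount), so it accepts either a string of 
digits or an integer; `render_row` confines a lapse to the one row it affects.

```python
def format_dollars(amount):
    # The essential type is "whole dollars"; int() accepts both a string of
    # digits and an int, so the caller need not track the representation.
    return f"${int(amount):,}"


def render_row(label, amount):
    try:
        return f"{label}: {format_dollars(amount)}"
    except (TypeError, ValueError):
        # A bad value (such as "n/a" or None) spoils only this row;
        # the rest of the page still renders normally.
        return f"{label}: ???"


rows = [("Rent", "1250"), ("Food", 300), ("Misc", "n/a")]
page = [render_row(label, amount) for label, amount in rows]
print("\n".join(page))
# Rent: $1,250
# Food: $300
# Misc: ???
```

Catching the error per row is just one possible design; the point is only that 
a lapse degrades a single cell rather than the whole page.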

When a program becomes large and complex enough that the programmer can no 
longer mentally track the essential types of all data without automated help, 
however, one might expect this advantage to shrink or even reverse.  This would 
be especially true in applications where even a single minor error can be 
catastrophic.
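As one illustration of such automated help (a later Python feature, offered 
here only as an example), optional type hints let a program record essential 
types that a separate static checker, such as mypy, can verify; the interpreter 
itself ignores them.  The `Dollars` and `Cents` types and the two functions are 
hypothetical.

```python
from typing import Iterable, NewType

# Hypothetical essential types.  At run time both are plain ints; a static
# checker treats them as distinct, so mixing them up is reported before
# the program runs.
Dollars = NewType("Dollars", int)
Cents = NewType("Cents", int)


def to_cents(amount: Dollars) -> Cents:
    return Cents(amount * 100)


def total(amounts: Iterable[Dollars]) -> Dollars:
    return Dollars(sum(amounts))


budget = total([Dollars(1250), Dollars(300)])
print(budget, to_cents(budget))  # 1550 155000

# A checker such as mypy would flag the following call, because a Cents
# value is not a Dollars value; plain Python would run it without complaint:
# to_cents(Cents(5))
```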

Warren J. Dew
25 November 2004