Software Essence and Untyped Languages
In his classic book "The Mythical Man-Month", Frederick P. Brooks noted that
software development tools and techniques have not been able to greatly improve
programming productivity. Brooks posited that software development was
composed of difficulties essential to the task - for which he used the term
'essence' - and difficulties that were incidental to the task. His explanation
for the lack of large improvements in productivity was that tools and
techniques - including programming languages and programming paradigms - were
already sufficiently good to minimize the effort spent on incidental tasks,
with the result that the majority of software development is spent on the
essence of the software. As a former colleague of mine once put it, 'the
limiting factor when I program is not how fast I can type.'
Independent evidence for Brooks' theory is available in studies that indicate
that software development time is not greatly dependent on the language used.
The one exception is that programming in assembler takes about twice as much
time, perhaps because assembly language does not provide the help with
incidental coding tasks that higher level languages provide.
Several writers have recently observed that use of untyped languages
- languages that do not associate each variable name with a single data type
- provides perhaps a factor of two improvement in productivity over strongly
typed languages like Java and C++. This advantage seems independent of the
specific untyped language used; Joel Spolsky has observed the effect with
Visual Basic, Bruce Eckel with Python, and I've had similar experiences with
PHP. The degree of this improvement suggests that it is related to the
essence, rather than merely to the incidentals, of software development, a
suggestion that is reinforced by the language independence of the effect.
This would only be possible if lack of strong typing actually reduced the
software essence needed for development of a piece of software, by eliminating
some essential, rather than merely incidental, difficulties associated with
that development. And, indeed, it does: in a strongly typed language, data
comes in various types, and the developer must consciously think about the type
of each datum when coding to ensure that the program will compile and run. In
contrast, an untyped language has only a single, global, syntactic data type,
relieving the developer of the burden of tracking the syntactic type or
representation associated with each datum.
This suggests, however, that there are limits to this benefit. Data,
particularly in object oriented programs, can have a wide variety of real data
types, even if this typing is not enforced by language syntax.
As long as the programmer can mentally track the essential type of a datum
- for example, that it's an amount in dollars - being relieved of the tedium of
tracking a syntactic type - such as whether the dollar value is represented as
a string of digits or a binary integer - is advantageous. This advantage is
especially strong if the application is such that occasional lapses on the part
of the programmer only result in errors that are not catastrophic - for example
if one value is represented incorrectly on a web page or other user interface,
but the information on the rest of the page remains intact and useful.
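To make this concrete, here is a minimal sketch in Python, one of the languages mentioned above. The function name render_price_row and the HTML layout are hypothetical, chosen only for illustration. The function never declares whether the dollar amount arrives as an integer or as a string of digits, and the rendered output is the same either way.

```python
def render_price_row(label, amount):
    """Render one table row for a price (hypothetical helper).

    `amount` is the essential type "an amount in dollars"; its
    representation (int or string of digits) is never declared.
    """
    return f"<tr><td>{label}</td><td>${amount}</td></tr>"


# The same essential value in two different representations.
row_from_int = render_price_row("Widget", 1250)
row_from_str = render_price_row("Widget", "1250")

# Both calls produce identical output.
assert row_from_int == row_from_str
```

Python will still raise an error if a string and an integer are mixed in arithmetic, so this sketch shows the benefit only for the common case of passing a value through to the display.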
When a program becomes sufficiently large and complex that the programmer can
no longer mentally track essential types for all data without automated help,
however, one might expect this advantage to be reduced or even reversed. This
would be especially true in software applications where even a single minor
error can be catastrophic.
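The following Python sketch illustrates how such a lapse might look. The names total_untyped, Dollars and Cents are hypothetical, and the wrapper classes are only one possible form of automated help, assumed here for illustration. The plain function adds a dollar amount to a cent amount without complaint; the wrapper classes turn the same mistake into an error.

```python
from dataclasses import dataclass


def total_untyped(price_in_dollars, shipping_in_cents):
    # Both arguments are plain integers, so nothing flags
    # that the two amounts are in different units.
    return price_in_dollars + shipping_in_cents


@dataclass(frozen=True)
class Dollars:
    amount: int

    def __add__(self, other):
        # Only dollars may be added to dollars.
        if not isinstance(other, Dollars):
            return NotImplemented
        return Dollars(self.amount + other.amount)


@dataclass(frozen=True)
class Cents:
    amount: int


# The lapse goes unnoticed: 20 dollars plus 499 cents gives 519 of nothing.
wrong_total = total_untyped(20, 499)

# With the essential type made explicit, the same lapse raises TypeError.
try:
    Dollars(20) + Cents(499)
    mismatch_caught = False
except TypeError:
    mismatch_caught = True
```

Note that even here the error surfaces only when the offending line runs, not when the program is written.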
Warren J. Dew
25 November 2004
[put links to Spolsky and Eckel articles here.]