Thursday, May 22, 2008

Are All Programs Representational?

This excellent question was raised by George Furnas in reaction to an early presentation of these ideas. What about a program that runs in a process controller, for example? It seems plausible that the job of such a program isn't to represent anything, but just to do something.

But the way programs do things is by representing things. The target domain for the process control program contains the actions it controls, together with the conditions on them, and the operations in this domain include things like causing a particular action to occur under given conditions. Rather than working directly in this domain, the program permits us to carry out this operation in a computational representation domain.

This is parallel to the situation described in the previous post on composition. There is an operation on programs that represents composition of mappings, and there is an operation on programs that represents placing an action under the control of a condition. As with composition, for all this to work, the relevant programs have to have actual implementations that carry out the actual work, that is, that really carry out the actions, really make them obey the conditions, and so on.
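To make the point concrete, here is a minimal sketch of a condition-action rule as it might appear in a process control program (the names, the threshold value, and the openValve action are all hypothetical). The program works only with representations: a number standing for a temperature reading, a function call standing for the physical action. It is the implementation behind openValve that makes anything actually happen.

#Sketch of a condition-action rule (hypothetical names and values)
temperatureLimit=80.0   #an assumed threshold

def controlStep(temperatureReading,openValve):
    #a condition in the representation domain controls whether the
    #action is carried out in the target domain
    if temperatureReading>temperatureLimit:
        openValve()   #the implementation performs the physical action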

Notations for Composition

What notational devices do programming languages use to specify the composition of two mappings? A common one is borrowed from mathematical notation, nesting of operators. For example, the expression

2+3*4

uses nesting to compose the multiplication and addition mappings in a particular way. The expression

(2+3)*4

composes the same mappings in a different way. How the mappings are composed has to be worked out by analyzing the details of the expression, including the role of operator precedence (if there are no parentheses, multiplications are performed before additions, even when the addition comes first in the expression) and parentheses, if present.

Here is a version of an expression for do-echo-computation that uses this device heavily:

#Version 1
[originalSong[i]+(intervalOfSilence+[.5*e for e in originalSong])[i] for i in range(len(originalSong))]

The code assumes that originalSong is the sound to be operated upon, and that an appropriate constant intervalOfSilence has been constructed.
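For concreteness, one possible setup is sketched below; the sample values, the sample rate, and the length of the silence are illustrative assumptions, not values from the original program.

#Setup sketch (assumed values)
sampleRate=8000                           #samples per second, assumed
originalSong=[0.1,0.3,-0.2,0.4]           #stand-in for a real recording
intervalOfSilence=[0.0]*(sampleRate//2)   #half a second of silent samples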

Is it clear in this notation what mappings are being composed, and how? Before considering this question more fully, here is an alternative form:

#Version 2
softerSong=[.5*e for e in originalSong]
echo=intervalOfSilence+softerSong
[originalSong[i]+echo[i] for i in range(len(originalSong))]

Version 2 of the program uses another common way of specifying composition, the use of variables. Rather than nesting operations to show composition of mappings, Version 2 produces intermediate results with some mappings and then feeds them as inputs to other mappings, accomplishing the effect of composition.

Both Version 1 and Version 2 specify the mappings to be composed in terms of lower level operations like list comprehension in the main expressions. We can focus more clearly on the contrasts in notation by separating out these specifications, like this:

def attenuate(s):
    return [.5*e for e in s]
def delay(s):
    return intervalOfSilence+s
def mix(s1,s2):
    return [s1[i]+s2[i] for i in range(len(s1))]

We then have

#Version 1a
mix(originalSong,delay(attenuate(originalSong)))

#Version 2a
softerSong=attenuate(originalSong)
echo=delay(softerSong)
mix(originalSong,echo)

Note that Version 2a has no nesting of operations; composition is specified entirely by the use of variables. What are the merits and demerits of these two notations?

The best-developed framework for analyzing notations is Cognitive Dimensions analysis, developed by Thomas Green, Marian Petre, Alan Blackwell, and others (see http://www.cl.cam.ac.uk/%7Eafb21/CognitiveDimensions/). Cognitive Dimensions are aspects of a notation to examine; here are a few of them, with capsule definitions from Blackwell (http://www.cl.cam.ac.uk/Teaching/2000/AGraphHCI/HCI/hcinotes.html#cds), and a discussion of Versions 1a and 2a with respect to each.

Hidden dependencies: important links between entities are not visible.

If the structure of the product means some parts are closely related to other parts, and changes to one may affect the other, are those dependencies visible? What kind of dependencies are hidden? In what ways can it get worse when you are creating a particularly large description? Do these dependencies stay the same, or are there some actions that cause them to get frozen? If so, what are they?

Both versions suffer from a hard-to-see connection between the definition of mix and its use. While "hearing two sounds together" is completely symmetric (hearing A and B together is just the same as hearing B and A together), the computational operation mix that approximates it is not symmetric: neither Version 1a nor Version 2a will work if mix(echo,originalSong) is substituted for mix(originalSong,echo). mix is defined in such a way that the longer sound has to be specified as the second parameter, and there's no way to tell this without close reading of the definition of mix.
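The dependency can be seen concretely by running mix both ways (a sketch, using the setup values assumed earlier):

#Only one argument order works, and nothing in the notation says which
echo=delay(attenuate(originalSong))
mix(originalSong,echo)      #works: iterates over the shorter sound
#mix(echo,originalSong)     #fails with IndexError: range(len(echo))
#                           #runs past the end of originalSong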

When an operation has multiple parameters and is not symmetric, functional notation does not work well, because nothing in the notation conveys which role each parameter plays. Perhaps to help with this, common mathematical notation uses special forms for very asymmetric operations like division and exponentiation, providing some help in keeping track of the different roles of numerator and denominator, and of base and exponent. Even here, though, one just has to know which parameter is which.

Hard mental operations: high demand on cognitive resources.

What kind of things require the most mental effort with this notation? Do some things seem especially complex or difficult to work out in your head (e.g. when combining several things)? What are they?

There are hard mental operations required in both Version 1a and Version 2a. In 1a one has to parse the expression, keeping track of parentheses, to determine what is composed with what, and in what way (when mappings take more than one parameter, the possible compositions multiply). In 2a one has to trace the history of the variables to recover the same information. Neither of these tasks is easy.

We'll consider the next two dimensions together.

Closeness of mapping: closeness of representation to domain.

How closely related is the notation to the result that you are describing? Why?


Role-expressiveness: the purpose of a component is readily inferred.

When reading the notation, is it easy to tell what each part is for? Why? Are there some parts that are particularly difficult to interpret? Which ones? Are there parts that you really don't know what they mean, but you put them in just because it's always been that way? What are they?

These two dimensions point to advantages of Version 2a over 1a that are similar to those pointed out in the earlier post (March 20, 2008 What difference does it make?) comparing two solutions to the echo problem. The introduction of named variables in 2a points up correspondences between entities in the program and entities in the target domain, an example of closeness of mapping. At the same time, these correspondences help the reader understand the roles of these parts in the program.

Note that while these uses of names seem quite important from the representational perspective, choice of names is outside the scope of programming language design, as traditionally conceived. This situation parallels that seen in the discussion of application: it's crucial that a program be applied to the right thing, but languages don't provide support for this.

A further benefit of Version 2a is that the order in which the mappings are composed corresponds to the lexical order of the program text, while it is reversed in Version 1a. That is, the operation mix, which comes last in composition order, is written last in Version 2a, but first in Version 1a.

On balance, these cognitive dimensions favor Version 2a over 1a. Yet common programming style, including the recommendations I have made to my intro students in the past, favors 1a. This mismatch seems to reflect the lack of a representational perspective on programs, a perspective that is needed to exploit closeness of mapping, with derivative advantages in role-expressiveness.

It's interesting to speculate whether notations could be devised that would reduce the problems of hidden dependencies and hard mental operations that both versions show.
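Here is one direction such a notation might take, sketched in ordinary Python rather than a new language (mixNamed and mixSymmetric are definitions introduced here purely for illustration). Parameter names can surface the role that mix's positional parameters hide, and padding the shorter sound removes the asymmetry altogether, though at the cost of producing a result as long as the longer input rather than the original sound.

#Two sketched remedies for the hidden dependency in mix
def mixNamed(shorterSound,longerSound):
    #the parameter names now carry the ordering requirement
    return [shorterSound[i]+longerSound[i] for i in range(len(shorterSound))]

def mixSymmetric(s1,s2):
    #pad the shorter sound with silence, so argument order no longer matters
    n=max(len(s1),len(s2))
    p1=s1+[0.0]*(n-len(s1))
    p2=s2+[0.0]*(n-len(s2))
    return [p1[i]+p2[i] for i in range(n)]

mixNamed(shorterSound=originalSong,longerSound=echo)
mixSymmetric(echo,originalSong)   #same result with the arguments either way around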

Flexibility in Programs II: Composition

As mentioned in the previous post (Flexibility in Programs I: Application), another kind of flexibility of computational representations is that computational operations can be combined in different ways to produce different results. A simple illustration is that the attenuate operation that appears as part of the program to add echoes could also be used by itself just to produce a softer version of a sound. To account for this flexibility we need to decompose our computational representational system in a further way.

We know that in the echo program the mapping do-echo-computation is built up by combining the operations attenuate, delay, and mix, as shown here:



Figure 1

Figure 1a shows the individual character of the computational operations that appear in combination in Figure 1:


Figure 1a

We can see that attenuate and delay each take a sequence of samples as input and produce one as output. The double-tailed arrow is used to show that mix takes two sequences of samples as input and produces one as output. Figure 1 shows just one particular way of combining these operations so as to produce a new operation, one of many. Clearly we could have a program that just does attenuate, as shown in a representational system in Figure 2:


Figure 2

Mathematically, the process of combining operations is composition. Given mappings whose outputs fit the inputs of other mappings, we can create a composite mapping that combines the effects of these mappings.

Against this background, a program can be seen as specifying two separable things: a collection of operations, and a way of composing those operations (to create a mapping that corresponds to some desired operation in the target domain). Because these two things are separable, we do not have to start from scratch when we want a new combination of operations, just as we don't have to start from scratch when we want to add an echo to a new sound.
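Using the Python definitions of attenuate, delay, and mix given above, the separation can be made explicit (a sketch; echoOf and softerVersionOf are names introduced here for illustration):

#The same collection of operations, composed in two different ways
def echoOf(song):
    #the combination shown in Figure 1
    return mix(song,delay(attenuate(song)))

def softerVersionOf(song):
    #the combination shown in Figure 2: attenuate by itself
    return attenuate(song)

echoOf(originalSong)            #one mapping in the target domain
softerVersionOf(originalSong)   #a different mapping, same operations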

Composition and Notation

As we've seen, a program specifies a collection of operations, and a way of composing the operations to create a desired mapping. How do programming languages support this activity? This is itself a problem in representation: our target domain has mappings, with operations that include the various ways we might wish to compose them. A programming language represents these things in a way that satisfies two requirements:

(1) There has to be an operation on programs that allows us to get the effect of composing mappings.

(2) Working with programs has to allow people to obtain these effects easily and accurately, in a sufficient range of situations.

Note that these conditions correspond to Mackinlay's expressiveness and effectiveness conditions, as discussed in the post for February 17, 2008, Representation and Cost.

What does Condition (1) entail about programs? One might think that a picture like Figure 3a would capture the essentials of the situation:




Figure 3a
Here there are two mappings in a target domain, m1 and m2, and their composition, m3. There is a mapping notate that takes mappings to programs, and another mapping, interpret, that takes a program to a mapping. It would seem that programs in this representational system, and an operation on them, accurately represent the mappings and the composition operation.

But how can interpret actually work? Since the idea of a representational system is that one obtains the actual results of work in the target domain by passing through the representational domain, interpret would have to actually produce the mapping m3. But how can that be done? What does it mean to actually produce a mapping?

An auxiliary picture, Figure 3b, explains this.


Figure 3b
Here a mapping m is shown in a setting that includes the things on which it operates and the things it produces. The picture also includes not only the program p, but also an implementation, ip, of program p. The implementation is an actual physical apparatus that operates on and produces things in the target domain. In the echo example, for instance, the combination of record, do-echo-computation, and playback operates on an actual sound and produces an actual sound.

As Figure 3b suggests, for p to represent m requires not just that there be some correspondence between p and m, but that there be an implementation of p that can actually do the work of m. Figure 3c elaborates Figure 3a to show the implementations as well as the programs:


Figure 3c
Rather than including these entities whenever we show programs representing mappings, however, we can just assume that, whenever we show mappings notate and interpret between mappings and programs, there are implementations available that satisfy the requirement in Figure 3b. It is these implementations that actually produce the mappings that programs represent.

Turning now to condition (2), note that this condition is about what people can do: if people can't do the required work, a programming language can't be a good representation. We'll use the term notation for representations that have to be operated on in this way by people.


Flexibility in Programs I: Application

Looking more closely at the echo program discussed earlier (March 20, 2008, What difference does it make?), we can see that some important aspects are not captured by the representational view we've developed so far. In our diagrams we have shown programs operating on the representation of some particular sound, whereas in fact the programs can be deployed so as to operate on any sound. Further, as seen in Figure 1,

Figure 1

the computational operations in the program are arranged so that an echo is produced, and not some other effect. But these operations could be arranged differently, so as to produce quite different effects, in a different but related program. So there are two kinds of flexibility in programs that we have not captured: the ability to use the same program to represent different sounds, and the ability to recombine the same computational operations so as to represent different operations in a target domain. These capabilities are crucial in reducing the cost of creating computational representations. How can we understand them within the representational framework?

Continuing to use the program in Figure 1 as an example, we can distinguish two aspects of the representation system in which the program participates. First, we can pick out the general relationships that exist, independent of identifying any particular sound, or any particular arrangement of operations. As this diagram shows,

Figure 2

there is a correspondence between the domain of sounds, as a body, and the domain of sequences of samples, that we can refer to without picking out any one sound, or any one sequence. As shown there, the operations in the target and representational domains also appear as mappings connecting whole bodies of sounds, or whole bodies of sequences of samples. That is, we do not need to think of these operations as acting on individual sounds or sequences, but rather as connecting whole collections of sounds or sequences. In this way of looking at the situation, the requirement that a representational system actually works is the requirement that the composition 'record'--'do-echo-computation'--'playback' equals the mapping 'echo' (for now we neglect questions of approximation; see February 17, 2008, Representation and Cost).

With the general relationships among collections of things laid out, the second aspect of the representation system we need to attend to is how we can deploy the operations in it so as to secure some particular desired result. After all, Figure 2 just tells us that there is a way to add the echo; it doesn't actually add it to anything. To get this to happen we have to have a way to pick out a particular sound, and then cause the operation shown in Figure 2 to be carried out as appropriate for that sound. Let's call this part of our work application: we want to apply the operations in the representational system to a particular input.

In practice, this important matter is handled rather haphazardly in programming systems. In one scenario, application starts with a physical recording process, and what sound I am applying that process to is determined by when and where I do it: it applies to whatever sound is happening at that time and place. The recording process produces a computational entity called a file, which is given a name and stored in a certain way in a computer. Applying computational operations to that file, and not some other, usually depends on using the name in an appropriate way, and also on carrying out the computational operations in a particular context, one that associates our particular file with the operations, and not some other file with the same name stored in a different way. Even though getting the intended results from our computational operations depends crucially on these operations (recording, naming, and storing) being carried out in the right way, programming systems don't specify how they should be done, nor provide any check that they have been done correctly. For example, if we use the wrong file name when setting up our computational operations, there will be no indication whatever that we have done that, provided only that the name we use picks out some file of the appropriate kind in our context.
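A sketch of the scenario (the file names and the loadSamples helper are hypothetical): nothing in the program checks that the name picks out the recording we intend, only that it picks out some file of the appropriate kind.

#Application depends on the name used, not on anything the language checks
def loadSamples(filename):
    #stand-in for reading a stored recording back into a list of samples
    with open(filename) as f:
        return [float(line) for line in f]

originalSong=loadSamples("birthdayParty.samples")    #the intended recording
#originalSong=loadSamples("staffMeeting.samples")    #wrong file, right kind:
#                                                    #no error, wrong result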

These defects aside, we see that we can account for some of the flexibility of computational representational systems. These systems can be applied to different particular things in their target domains. We don't have to build a new system when we want to add an echo to a new sound. This works because we can separate the construction of the system from its application.