Thursday, May 22, 2008

Are All Programs Representational?

This excellent question was raised by George Furnas in reaction to an early presentation of these ideas. What about a program that runs in a process controller, for example? It seems plausible that the job of a such a program isn't to represent anything, but just to do something.

But the way programs do things is by representing things. The target domain for the process control program contains the actions it controls, together with the conditions on them, and the operations in this domain include things like causing a particular action to occur under given conditions. Rather than working directly in this domain, the program permits us to carry out this operation in a computational representation domain.

This is parallel to the situation described in the previous post on composition. There is an operation on programs that represents composition of mappings, and there is an operation on programs that represents placing an action under the control of a condition. As for composition, for all this to work, the relevant programs have to have actual implementations that carry out the actual work, that is, that really carry out the actions, really make them obey the conditions, and so on.

Notations for Composition

What notational devices do programming languages use to specify the composition of two mappings? A common one is borrowed from mathematical notation, nesting of operators. For example, the expression

2+3*4

uses nesting to compose the multiplication and addition mappings in a particular way. The expression

(2+3)*4

composes the same mappings in a different way. How the mappings are composed has to be worked out by analyzing the details of the expression, including the role of operator precedence (if there are no parentheses, multiplications are performed before additions, even when the addition comes first in the expression) and parentheses, if present.

Here is a version of an expression for do-echo-computation that uses this device heavily:

#Version 1
[originalSong[i]+(intervalOfSilence+[.5*e for e in originalSong])[i] for i in range(len(originalSong))]

The code assumes that originalSong is the sound to be operated upon, and that an appropriate constant intervalOfSilence has been constructed.

Is it clear in this notation what mappings are being composed, and how? Before considering this question more fully, here is an alternative form:

#Version 2
softerSong=[.5*e for e in originalSong]
echo=intervalOfSilence+softerSong
[originalSong[i]+echo[i] for i in range(len(originalSong))]

Version 2 of the program uses another common way of specifying composition, the use of variables. Rather than nesting operations to show composition of mappings, Version 2 shows the production of intermediate results, using some mappings, which are then used as inputs to other mappings, to accomplish the effect of composition.

Both Version 1 and Version 2 specify the mappings to be composed in terms of lower level operations like list comprehension in the main expressions. We can focus more clearly on the contrasts in notation by separating out these specifications, like this:

def attenuate(s):
return [.5*e for e in s]
def delay(s):
return intervalOfSilence+s
def mix(s1,s2):
return [s1[i]+s2[i] for i in range(len(s1))]

We then have

#Version 1a
mix(originalSong,delay(attenuate(originalSong)))

#Version 2a
softerSong=attenuate(originalSong)
echo=delay(softerSong)
mix(originalSong,echo)

Note that Version 2a has no nesting of operations; composition is specified entirely by the use of variables. What are the merits and demerits of these two notations?

The best developed framework for analyzing notations is Cognitive Dimensions analysis, developed by Thomas Green, Marian Petre, Alan Blackwell, and others (see http://www.cl.cam.ac.uk/%7Eafb21/CognitiveDimensions/). Cognitive Dimensions are aspects of a notation to examine; here are a few of them, with capsule definitions from Blackwell http://www.cl.cam.ac.uk/Teaching/2000/AGraphHCI/HCI/hcinotes.html#cds, and a discussion of Versions 1a and 2a with respect to each.

Hidden dependencies: important links between entities are not visible.

If the structure of the product means some parts are closely related to other parts, and changes to one may affect the other, are those dependencies visible? What kind of dependencies are hidden? In what ways can it get worse when you are creating a particularly large description? Do these dependencies stay the same, or are there some actions that cause them to get frozen? If so, what are they?

Both versions suffer from a hard to see connection between the definition of mix and its use. While "hearing two sounds together" is completely symmetric (hearing A and B together is just the same as hearing B and A together) the computational operation mix that approximates it is not symmetric: neither Version 1a nor Version 1b will work if mix(echo,originalSound) is substituted for mix(originalSound,echo): mix is defined in such a way that the longer sound has to be specified as the second parameter. There's no way to tell this without close reading of the definition of mix.

When operations have multiple parameters, and they are not symmetric, mathematical notation does not work well, because the notation does not convey the order in which parameters must appear. Perhaps to help with this, common mathematical notation uses special forms for very asymmetric operations like division and exponentiation, providing some help in keeping track of the different roles of numerator and denominator, and base and exponent. Even here, though, one just has to know which parameter is which.

Hard mental operations: high demand on cognitive resources.

What kind of things require the most mental effort with this notation? Do some things seem especially complex or difficult to work out in your head (e.g. when combining several things)? What are they?

There are hard mental operations required in both Version 1a and Version 1b. In 1a one has to parse the expression, keeping track of parentheses, to determine what is composed with what, and in what way (when mappings take more than one parameter the possible compositions multiply). In 2a one has to trace the history of the variables to recover the same information. Neither of these tasks is easy.

We'll consider the next two dimensions together.

Closeness of mapping: closeness of representation to domain.

How closely related is the notation to the result that you are describing? Why?


Role-expressiveness: the purpose of a component is readily inferred.

When reading the notation, is it easy to tell what each part is for? Why? Are there some parts that are particularly difficult to interpret? Which ones? Are there parts that you really don't know what they mean, but you put them in just because it's always been that way? What are they?

These two dimensions point to advantages of Version 2a over 1a that are similar to those pointed out in the earlier post (March 20, 2008 What difference does it make?) comparing two solutions to the echo problem. The introduction of named variables in 2a points up correspondences between entities in the program and entities in the target domain, an example of closeness of mapping. At the same time, these correspondences help the reader understand the roles of these parts in the program.

Note that while these uses of names seem quite important from the representational perspective, choice of names is outside the scope of programming language design, as traditionally conceived. This situation parallels that seen in the discussion of application: it's crucial that a program be applied to the right thing, but languages don't provide support for this.

A further benefit of Version 2a is that the order in which the mappings are composed corresponds to lexical order there, while it is reversed in Version 1a. That is, the operation mix, that comes last in composition order, is written last in Version 2a, but first in Version 1a.

On balance, these cognitive dimensions favor Version 2a over 1a. Yet common programming style, including the recommendations I have made to my intro students in the past, favors 1a. This mismatch seems to reflect the lack of a representational perspective on programs, a perspective that is needed to exploit closeness of mapping, with derivative advantages in role-expressiveness.

It's interesting to speculate whether notations could be devised that would reduce the problems of hidden dependencies and hard mental operations that both versions show.

Flexibility in Programs II: Composition

As mentioned in the previous post (Flexibility in Programs I: Application), another kind of flexibility of computational representations is that computational operations can be combined in different ways to produce different results. A simple illustration is that the attenuate operation that appears as part of the program to add echoes could also be used by itself just to produce a softer version of a sound. To account for this flexibility we need to decompose our computational representational system in a further way.

We know that in the echo program the mapping do-echo-computation is built up by combining the operations attenuate, delay, and mix, as shown here:



Figure 1

Figure 1a shows the individual character of the computational operations shown in combination in Figure 1:


Figure 1a

We can see that attenuate and delay each take a sequence of samples as input and produce one as output. The double tailed arrow is used to show that mix takes two sequences of samples as input and produces one as output. Figure 1 shows just one particular way of combining these operations so as to produce a new operation, one of many. Clearly we could have a program that just does attenuate, as shown in a representational system in Figure 2:


Figure 2

Mathematically, the process of combining operations is composition. Given mappings whose outputs fit the inputs of other mappings, we can create a composite mapping that combines the effects of these mappings.

Against this background, a program can be seen as specifying two separable things: a collection of operations, and a way of composing those operations (to create a mapping that corresponds to some desired operation in the target domain.) Because these two things are separable, we do not have to start from scratch when we want a new combination of operations, just as we don't have to start from scratch when we want to add an echo to a new sound.

Composition and Notation

As we've seen, a program specifies a collection of operations, and a way of composing the operations to create a desired mapping. How do programming languages support this activity? This is itself a problem in representation: our target domain has mappings, with operations that include the various ways we might wish to compose them. A programming language represents these things in a way that satisfies two requirements:

(1) There has to be an operation on programs that allows us to get the effect of composing mappings.

(2) Working with programs has to allow people to obtain these effects easily and accurately, in a sufficient range of situations.

Note that these conditions correspond to Mackinlay's expressiveness and effectiveness conditions, as discussed in the post for February 17, 2008
Representation and Cost.

What does Condition (1) entail about programs? One might think that a picture like Figure 3a would capture the essentials of the situation:




Figure 3a
Here there are two mappings in a target domain, m1 and m2, and their composition, m3. There is a mapping notate that takes mappings to programs, and another mapping, interpret, that takes a program to a mapping. It would seem that programs in this representational system, and an operation on them, accurately represent the mappings and the composition operation.

But how can interpret actually work? Since the idea of a representational system is that one obtains the actual results of work in the target domain, by passing through the representational domain, interpret would have to actually produce the mapping m3. But how can that be done? What does it mean to actually produce a mapping?

An auxiliary picture, Figure 3b, explains this.


Figure 3b
Here a mapping m is shown in a setting that includes the things on which it operates and which it produces. The picture also includes not only the program p, but also an implementation, ip, of program p. The implementation is an actual physical apparatus that operates on and produces things in the target domain. For example, in the echo example, the combination of record, do-echo-computation, and playback, operates on an actual sound and produces an actual sound.

As Figure 3b suggests, for p to represent m requires not just that there be some correspondence between p and m, but that there be an implementation of p that can actually do the work of m. Figure 3c elaborates Figure 3a to show the implementations as well as the programs:


Figure 3c
Rather than including these entities whenever we show programs representing mappings, however, we can just assume that whenever we show mappings notate and interpret between mappings and programs that there are implementations available that satisfy the requirement in Figure 3b. It is these implementations that actually produce the mappings that programs represent.

Turning now to condition (2), note that this condition is about what people can do: if people can't do the required work, a programming language can't be a good representation. We'll use the term notation for representations that have to be operated on in this way by people.


Flexibility in Programs I: Application

Looking more closely at the echo program discussed earlier (March 20, 2008
What difference does it make? ) we can see that some important aspects are not captured by the representational view we've developed so far. In our diagrams we have shown programs operating on the representation of some particular sound, whereas in fact the programs can be deployed so as to operate on any song. Further, as seen in Figure 1,

Figure 1

the computational operations in the program are arranged so that an echo is produced, and not some other effect. But these operations could be arranged differently, so as to produce quite different effects, in a different but related program. So there are two kinds of flexibility in programs that we have not captured: the ability to use the same program to represent different sounds, and the ability to recombine the same computational operations so as to represent different operations in a target domain. These capabilities are crucial in reducing the cost of creating computational representations. How can we understand them within the representational framework?

Continuing to use the program in Figure 1 as an example, we can distinguish two aspects of the representation system in which the program participates. First, we can pick out the general relationships that exist, independent of identifying any particular sound, or any particular arrangement of operations. As this diagram shows,

Figure 2

there is a correspondence between the domain of sounds, as a body, and the domain of sequences of samples, that we can refer to without picking out any one sound, or any one sequence. As shown there, the operations in the target and representational domains also appear as mappings connecting whole bodies of sounds, or whole bodies of sequences of samples. That is, we do not need to think of these operations as acting on individual sounds or sequences, but rather as connecting whole collections of sounds or sequences. In this way of looking at the situation, the requirement that a representational system actually works is the requirement that the composition 'record'--'do echo computation'--'play back' equals the mapping 'echo' (for now we neglect questions of approximation; see February 17, 2008 Representation and Cost.)

With the general relationships among collections of things laid out, the second aspect of the representation system we need to attend to is how we can deploy the operations in it so as to secure some particular desired result. After all, Figure 2 just tells us that there is a way to add the echo, it doesn't actually add it to anything. To get this to happen we have to have a way to pick out a particular sound, and then cause the operation shown in Figure 2 to be carried out as appropriate for that sound. Let's call this part of of our work application: we want to apply the operations in the representational system to a particular input.

In practice, this important matter is handled rather haphazardly in programming systems. In one scenario, application starts with a physical recording process, and what sound I am applying that process to is determined by when and where I do it: it applies to whatever sound is happening at that time and place. The recording process produces a computational entity called a file, which is given a name and stored in a certain way in a computer. Applying computational operations to that file, and not some other, usually depends on using the name in an appropriate way, and also carrying out the computational operations in a particular context, one that associates our particular file with the operations, and not some other file with the same name stored in a different way. Even though getting the intended results from our computational operations depends crucially on these operations (recording, naming, and storing) being carried out in the right way, programming systems don't specify how they should be done, nor provide any check that they have been done correctly. For example, if we use the wrong file name when setting up our computational operations there will no indication whatever that we have done that, provided only that the name we use picks out some file of the appropriate in our context.

These defects aside, we see that we can account for some of the flexibility of computational representational systems. These systems can be applied to different particular things in their target domains. We don't have to build a new system when we want to add an echo to a new sound. This works because we can separate the construction of the system from its application.

Saturday, April 26, 2008

Some Philosophical Perspectives

There is an extensive philosophical literature on representations. Much if it is aimed at understanding mental representations, which isn't our concern here, but some of the arguments presented are intended to clarify the nature of representations generally.

In what way do representations represent what they represent? I've argued that (i) a representation represents something only in the context of a representational system, in which its correspondence to a target is picked out. I've also argued that (ii) representational systems arise, or are devised, because they allow operations to be performed with less work, in an inclusive sense, than they can be performed without them. These arguments are similar to those presented by Ruth Millikan in her Language, Thought, and Other Biological Categories (MIT Press, 1984) [Millikan's terminology is different; she reserves the term "representation" for situations in which something has as "its focused proper function to precipitate acts of identification of the referents of its elements (p 198)." For example, a bee dance, clearly a representation in our sense, is not a representation for Millikan, because its purpose is not to identify any referent.]Allowing for difference in terminology, Millikan argues that (i) what representations refer to is determined by the function it serves in the (biological) system in which it is embedded, and that (ii) this function is seen in the adaptive value of the representation in an evolutionary sense. My arguments (i) and (ii) are extensions of these beyond the domain of biological systems.

A conflicting interpretation of representation is sought by Cummins in Representations, Targets, and Attitudes (MIT Press, 1996), where he argues for a "conception of representational content" that is "independent of use or functional role" (pp. 85-86). Cummins's central worry about meaning-as-use theories of representation is making sense of error in representations. We want to talk about a representation being "wrong", but if its meaning is determined by its use, how can a representation ever be "wrong", rather than meaningless?

Cummins's account of "wrong" representations relies on assigning "content" to representations, "independent of use or functional role", and comparing the content with the "target" of the representation, the thing that it is meant to refer to in a given situation. A representation is "wrong" if there is a mismatch, as when a map of Paris is used when New York is the target. But, as we've seen earlier, computational representations, like those using blocks of bits, simply can't be assigned content out of context. So a different account of content, and "wrongness", has to be offered, at least for these cases.

In the account I am suggesting, we identify a "wrong" representation when we analyze a system of things and relationships as if it were a representational system, but then find that the crucial requirement, that mapping operations into the representation domain and then back has to save work, doesn't hold. For example, a map of Paris is "correct" as a representation when seen in a representational system for navigating in Paris, but "wrong" in a system for navigating in New York: we don't save any work trying to get around Paris using a map of New York.

This kind of usage for "wrong" is common in talking about complex systems that partially satisfy some description. Rather than refusing to apply the term "representation" at all to a system that only partially meets the definition, we apply the term with a qualifier, "wrong" in this case, allowing us to describe both the points of agreement with the definition and the discrepancy. (Viewing the application of a term like "representation" as itself an instance of representation, one can say that using the term, with qualifications, provides value as long as the value gained by applying the term outweighs the complication involved in managing the qualifications. If a system shares very little with a full-fledged, working representational system, describing it as a representational system won't be useful. But there is no bright line separating appropriate from inappropriate usage, any more than there is bright line separating "insufficiently precise" measurements from "sufficiently precise" ones.)

This treatment is consistent with Millikan. Allowing as before for difference in terminology, for Millikan a representation is "wrong" when it fails to perform its proper (biological) function.

What does a representation represent?

It's common usage to talk about what something represents, as in "this sequence of numbers represents that sound." But it's clear that computational representations, at least, can't be said to represent anything, taken out of context. The same sequence of numbers, in the context of different uses, could represent a time series of stock prices, or a time series of predicted water levels in a reservoir, or ... . More radically, blocks of bits in memory are used to represent all the different things that can be represented computationally, and a given pattern of bits can represent numbers of different kinds (floating point or integer), or a group of characters, or a part of a data structure tied together with addresses, or... . In the context of a particular representational system, a block of bits can be said to represent whatever it is that it corresponds to in that system. But unless that context is assumed it is really meaningless to ascribe representational content to a block of bits. Anytime we ascribe content we are presuming context.

There's nothing unusual in this situation. Many attributes are commonly ascribed to things that really are determined in complex ways by the situations in which the things are encountered. Colors cannot be assigned to things independent of the viewer, for example, or independent of the context in which the thing is viewed (for striking demonstrations see http://www.purveslab.net/seeforyourself/). A thing can be said to be "large" or "small" only in a comparative context, usually implicit, and so on.

Thursday, March 20, 2008

What difference does it make?

Does it make any difference to view computational systems as representational systems? Yes. Here is a comparison of two programs I wrote as instructor's solutions for the same homework exercise in an introductory course using Python. The problem is to read a list of sound samples from a WAV file, and record a new sound, consisting of the original plus an echo. The echo is delayed by .15 sec, and attenuated by a factor of .25. This is the same example discussed in the post, Computational Stuff, a Worked Example.

Program 1 is the solution discussed in the earlier post, with the portion of the solution that records the song with echo slightly modified for comparability with the solution presented as Program 2.

#Program 1
originalSong=readwave("song.wav").getLeft()
softerSong=[.5*e for e in originalSong] #line 1
intervalOfSilence=int(.25*sampleRate)*[0] #line 6
delayedSofterSong=intervalOfSilence+softerSong #line 7 #line 2
#line 3:
songWithEcho=[originalSong[i]+delayedSofterSong[i] for i in range(len(originalSong))]
outputWAV=makewave("modifiedsong.wav",44100)
for sample in songWithEcho:
outputWAV.record(sample,0)
outputWAV.close()

Program 2 is my solution to the same problem, written earlier without considering the representational perspective:

#Program 2
originalSong = readwave("song.wav").getLeft()
echospread = int( 44100 * .25 )
outputWAV = makewave("modifiedsong.wav", sampleRate)
for i in range(0, echospread):
outputWAV.record(originalSong[i], 0)
for i in range(echospread, len(originalSong)):
outputWAV.record( originalSong[i] + originalSong[i - echospread] * .5,0)
outputWAV.close()


Both programs produce correct results. Indeed, they produce the same results. But they work quite differently.

It is difficult to identify any computational operation in Program 2 with an operation in the target domain of sounds, or to identify any data structure in Program 2 with a sound, except for originalSong. The thrust of Program 2 is to record certain numbers in a file, not to construct a representation of a sound that is produced in a certain way.

In contrast, in Program 1, softerSong, delayedSofterSong, and songWith Echo all correspond to sounds, and they are created using operations on sequences of samples that correspond to operations on sounds: attentuation (in line 1), delaying (in line 2), and mixing the original song with the echo (in line 3).

Here are diagrams showing these relationships in the two programs.


Here is shown the original song, bouncing off a reflector shown at right, in attenuated form, being delayed, and being mixed with the original song. Color coding and dashed lines show which things in the representation domain correspond to which things in the target domain. Solid lines show which operations in the representation domain correspond to mappings in the target domain.

Notice that the computational representation produces only an approximation to the actual echo effect. As shown, when the recorded sound is played back the echo is truncated, because when mixing is done in the program the resulting sound is only as long as the shorter of the two sounds being mixed.
The similar diagram for Program 2 has many fewer correspondences to point out. The value of originalSong corresponds to the original song in the target domain, but there are no other simple correspondences. Because Program 2 builds up the recording sample by sample, the operations of attentuation, delay, and mixing, are hidden in the arithmetic of the construction of the samples. The prominent decomposition of the program into two loops make sense in view of the change in arithmetic needed to produce two groups of samples, but this split doesn't reflect anything about how echos are produced physically.

Notice also that there is a mapping, "record samples" shown as an arrow for Program 1 that appears only as a floating label in Program 2. In Program 1 there is a list of samples created that corresponds to the song with echo, and this list of samples is recorded, an operation easily understood as a mapping. In Program 2, there is no list of samples that is created and then recorded; rather, samples are created and recorded piecemeal. Recording is happening, but not in a way that can be understood separately from other operations in the program.

So what? For an experienced programmer, Program 2 isn't hard to write. One imagines the numbers one has to record, and figures out how to do the arithmetic that will produce these numbers, in the process of recording them. One's understanding of the physics of sound plays a role only in determining what the correct numbers are, not in writing a program to produce the numbers.

But for beginners, Program 2 is difficult to write. Experience is needed to relate what the numbers are to a method for producing them.

Program 1, on the other hand, is easier to write, if one understands the physics of echo production and the computational operations that correspond to operations on sounds, as discussed in the earlier post. Writing the program requires only that the computational operations that correspond to attenuation, delay, and mixing be deployed and combined, and that one knows how to record a list of samples.