Thursday, May 22, 2008

Are All Programs Representational?

This excellent question was raised by George Furnas in reaction to an early presentation of these ideas. What about a program that runs in a process controller, for example? It seems plausible that the job of a such a program isn't to represent anything, but just to do something.

But the way programs do things is by representing things. The target domain for the process control program contains the actions it controls, together with the conditions on them, and the operations in this domain include things like causing a particular action to occur under given conditions. Rather than working directly in this domain, the program permits us to carry out this operation in a computational representation domain.

This is parallel to the situation described in the previous post on composition. There is an operation on programs that represents composition of mappings, and there is an operation on programs that represents placing an action under the control of a condition. As for composition, for all this to work, the relevant programs have to have actual implementations that carry out the actual work, that is, that really carry out the actions, really make them obey the conditions, and so on.

Notations for Composition

What notational devices do programming languages use to specify the composition of two mappings? A common one is borrowed from mathematical notation, nesting of operators. For example, the expression

2+3*4

uses nesting to compose the multiplication and addition mappings in a particular way. The expression

(2+3)*4

composes the same mappings in a different way. How the mappings are composed has to be worked out by analyzing the details of the expression, including the role of operator precedence (if there are no parentheses, multiplications are performed before additions, even when the addition comes first in the expression) and parentheses, if present.

Here is a version of an expression for do-echo-computation that uses this device heavily:

#Version 1
[originalSong[i]+(intervalOfSilence+[.5*e for e in originalSong])[i] for i in range(len(originalSong))]

The code assumes that originalSong is the sound to be operated upon, and that an appropriate constant intervalOfSilence has been constructed.

Is it clear in this notation what mappings are being composed, and how? Before considering this question more fully, here is an alternative form:

#Version 2
softerSong=[.5*e for e in originalSong]
echo=intervalOfSilence+softerSong
[originalSong[i]+echo[i] for i in range(len(originalSong))]

Version 2 of the program uses another common way of specifying composition, the use of variables. Rather than nesting operations to show composition of mappings, Version 2 shows the production of intermediate results, using some mappings, which are then used as inputs to other mappings, to accomplish the effect of composition.

Both Version 1 and Version 2 specify the mappings to be composed in terms of lower level operations like list comprehension in the main expressions. We can focus more clearly on the contrasts in notation by separating out these specifications, like this:

def attenuate(s):
return [.5*e for e in s]
def delay(s):
return intervalOfSilence+s
def mix(s1,s2):
return [s1[i]+s2[i] for i in range(len(s1))]

We then have

#Version 1a
mix(originalSong,delay(attenuate(originalSong)))

#Version 2a
softerSong=attenuate(originalSong)
echo=delay(softerSong)
mix(originalSong,echo)

Note that Version 2a has no nesting of operations; composition is specified entirely by the use of variables. What are the merits and demerits of these two notations?

The best developed framework for analyzing notations is Cognitive Dimensions analysis, developed by Thomas Green, Marian Petre, Alan Blackwell, and others (see http://www.cl.cam.ac.uk/%7Eafb21/CognitiveDimensions/). Cognitive Dimensions are aspects of a notation to examine; here are a few of them, with capsule definitions from Blackwell http://www.cl.cam.ac.uk/Teaching/2000/AGraphHCI/HCI/hcinotes.html#cds, and a discussion of Versions 1a and 2a with respect to each.

Hidden dependencies: important links between entities are not visible.

If the structure of the product means some parts are closely related to other parts, and changes to one may affect the other, are those dependencies visible? What kind of dependencies are hidden? In what ways can it get worse when you are creating a particularly large description? Do these dependencies stay the same, or are there some actions that cause them to get frozen? If so, what are they?

Both versions suffer from a hard to see connection between the definition of mix and its use. While "hearing two sounds together" is completely symmetric (hearing A and B together is just the same as hearing B and A together) the computational operation mix that approximates it is not symmetric: neither Version 1a nor Version 1b will work if mix(echo,originalSound) is substituted for mix(originalSound,echo): mix is defined in such a way that the longer sound has to be specified as the second parameter. There's no way to tell this without close reading of the definition of mix.

When operations have multiple parameters, and they are not symmetric, mathematical notation does not work well, because the notation does not convey the order in which parameters must appear. Perhaps to help with this, common mathematical notation uses special forms for very asymmetric operations like division and exponentiation, providing some help in keeping track of the different roles of numerator and denominator, and base and exponent. Even here, though, one just has to know which parameter is which.

Hard mental operations: high demand on cognitive resources.

What kind of things require the most mental effort with this notation? Do some things seem especially complex or difficult to work out in your head (e.g. when combining several things)? What are they?

There are hard mental operations required in both Version 1a and Version 1b. In 1a one has to parse the expression, keeping track of parentheses, to determine what is composed with what, and in what way (when mappings take more than one parameter the possible compositions multiply). In 2a one has to trace the history of the variables to recover the same information. Neither of these tasks is easy.

We'll consider the next two dimensions together.

Closeness of mapping: closeness of representation to domain.

How closely related is the notation to the result that you are describing? Why?


Role-expressiveness: the purpose of a component is readily inferred.

When reading the notation, is it easy to tell what each part is for? Why? Are there some parts that are particularly difficult to interpret? Which ones? Are there parts that you really don't know what they mean, but you put them in just because it's always been that way? What are they?

These two dimensions point to advantages of Version 2a over 1a that are similar to those pointed out in the earlier post (March 20, 2008 What difference does it make?) comparing two solutions to the echo problem. The introduction of named variables in 2a points up correspondences between entities in the program and entities in the target domain, an example of closeness of mapping. At the same time, these correspondences help the reader understand the roles of these parts in the program.

Note that while these uses of names seem quite important from the representational perspective, choice of names is outside the scope of programming language design, as traditionally conceived. This situation parallels that seen in the discussion of application: it's crucial that a program be applied to the right thing, but languages don't provide support for this.

A further benefit of Version 2a is that the order in which the mappings are composed corresponds to lexical order there, while it is reversed in Version 1a. That is, the operation mix, that comes last in composition order, is written last in Version 2a, but first in Version 1a.

On balance, these cognitive dimensions favor Version 2a over 1a. Yet common programming style, including the recommendations I have made to my intro students in the past, favors 1a. This mismatch seems to reflect the lack of a representational perspective on programs, a perspective that is needed to exploit closeness of mapping, with derivative advantages in role-expressiveness.

It's interesting to speculate whether notations could be devised that would reduce the problems of hidden dependencies and hard mental operations that both versions show.

Flexibility in Programs II: Composition

As mentioned in the previous post (Flexibility in Programs I: Application), another kind of flexibility of computational representations is that computational operations can be combined in different ways to produce different results. A simple illustration is that the attenuate operation that appears as part of the program to add echoes could also be used by itself just to produce a softer version of a sound. To account for this flexibility we need to decompose our computational representational system in a further way.

We know that in the echo program the mapping do-echo-computation is built up by combining the operations attenuate, delay, and mix, as shown here:



Figure 1

Figure 1a shows the individual character of the computational operations shown in combination in Figure 1:


Figure 1a

We can see that attenuate and delay each take a sequence of samples as input and produce one as output. The double tailed arrow is used to show that mix takes two sequences of samples as input and produces one as output. Figure 1 shows just one particular way of combining these operations so as to produce a new operation, one of many. Clearly we could have a program that just does attenuate, as shown in a representational system in Figure 2:


Figure 2

Mathematically, the process of combining operations is composition. Given mappings whose outputs fit the inputs of other mappings, we can create a composite mapping that combines the effects of these mappings.

Against this background, a program can be seen as specifying two separable things: a collection of operations, and a way of composing those operations (to create a mapping that corresponds to some desired operation in the target domain.) Because these two things are separable, we do not have to start from scratch when we want a new combination of operations, just as we don't have to start from scratch when we want to add an echo to a new sound.

Composition and Notation

As we've seen, a program specifies a collection of operations, and a way of composing the operations to create a desired mapping. How do programming languages support this activity? This is itself a problem in representation: our target domain has mappings, with operations that include the various ways we might wish to compose them. A programming language represents these things in a way that satisfies two requirements:

(1) There has to be an operation on programs that allows us to get the effect of composing mappings.

(2) Working with programs has to allow people to obtain these effects easily and accurately, in a sufficient range of situations.

Note that these conditions correspond to Mackinlay's expressiveness and effectiveness conditions, as discussed in the post for February 17, 2008
Representation and Cost.

What does Condition (1) entail about programs? One might think that a picture like Figure 3a would capture the essentials of the situation:




Figure 3a
Here there are two mappings in a target domain, m1 and m2, and their composition, m3. There is a mapping notate that takes mappings to programs, and another mapping, interpret, that takes a program to a mapping. It would seem that programs in this representational system, and an operation on them, accurately represent the mappings and the composition operation.

But how can interpret actually work? Since the idea of a representational system is that one obtains the actual results of work in the target domain, by passing through the representational domain, interpret would have to actually produce the mapping m3. But how can that be done? What does it mean to actually produce a mapping?

An auxiliary picture, Figure 3b, explains this.


Figure 3b
Here a mapping m is shown in a setting that includes the things on which it operates and which it produces. The picture also includes not only the program p, but also an implementation, ip, of program p. The implementation is an actual physical apparatus that operates on and produces things in the target domain. For example, in the echo example, the combination of record, do-echo-computation, and playback, operates on an actual sound and produces an actual sound.

As Figure 3b suggests, for p to represent m requires not just that there be some correspondence between p and m, but that there be an implementation of p that can actually do the work of m. Figure 3c elaborates Figure 3a to show the implementations as well as the programs:


Figure 3c
Rather than including these entities whenever we show programs representing mappings, however, we can just assume that whenever we show mappings notate and interpret between mappings and programs that there are implementations available that satisfy the requirement in Figure 3b. It is these implementations that actually produce the mappings that programs represent.

Turning now to condition (2), note that this condition is about what people can do: if people can't do the required work, a programming language can't be a good representation. We'll use the term notation for representations that have to be operated on in this way by people.


Flexibility in Programs I: Application

Looking more closely at the echo program discussed earlier (March 20, 2008
What difference does it make? ) we can see that some important aspects are not captured by the representational view we've developed so far. In our diagrams we have shown programs operating on the representation of some particular sound, whereas in fact the programs can be deployed so as to operate on any song. Further, as seen in Figure 1,

Figure 1

the computational operations in the program are arranged so that an echo is produced, and not some other effect. But these operations could be arranged differently, so as to produce quite different effects, in a different but related program. So there are two kinds of flexibility in programs that we have not captured: the ability to use the same program to represent different sounds, and the ability to recombine the same computational operations so as to represent different operations in a target domain. These capabilities are crucial in reducing the cost of creating computational representations. How can we understand them within the representational framework?

Continuing to use the program in Figure 1 as an example, we can distinguish two aspects of the representation system in which the program participates. First, we can pick out the general relationships that exist, independent of identifying any particular sound, or any particular arrangement of operations. As this diagram shows,

Figure 2

there is a correspondence between the domain of sounds, as a body, and the domain of sequences of samples, that we can refer to without picking out any one sound, or any one sequence. As shown there, the operations in the target and representational domains also appear as mappings connecting whole bodies of sounds, or whole bodies of sequences of samples. That is, we do not need to think of these operations as acting on individual sounds or sequences, but rather as connecting whole collections of sounds or sequences. In this way of looking at the situation, the requirement that a representational system actually works is the requirement that the composition 'record'--'do echo computation'--'play back' equals the mapping 'echo' (for now we neglect questions of approximation; see February 17, 2008 Representation and Cost.)

With the general relationships among collections of things laid out, the second aspect of the representation system we need to attend to is how we can deploy the operations in it so as to secure some particular desired result. After all, Figure 2 just tells us that there is a way to add the echo, it doesn't actually add it to anything. To get this to happen we have to have a way to pick out a particular sound, and then cause the operation shown in Figure 2 to be carried out as appropriate for that sound. Let's call this part of of our work application: we want to apply the operations in the representational system to a particular input.

In practice, this important matter is handled rather haphazardly in programming systems. In one scenario, application starts with a physical recording process, and what sound I am applying that process to is determined by when and where I do it: it applies to whatever sound is happening at that time and place. The recording process produces a computational entity called a file, which is given a name and stored in a certain way in a computer. Applying computational operations to that file, and not some other, usually depends on using the name in an appropriate way, and also carrying out the computational operations in a particular context, one that associates our particular file with the operations, and not some other file with the same name stored in a different way. Even though getting the intended results from our computational operations depends crucially on these operations (recording, naming, and storing) being carried out in the right way, programming systems don't specify how they should be done, nor provide any check that they have been done correctly. For example, if we use the wrong file name when setting up our computational operations there will no indication whatever that we have done that, provided only that the name we use picks out some file of the appropriate in our context.

These defects aside, we see that we can account for some of the flexibility of computational representational systems. These systems can be applied to different particular things in their target domains. We don't have to build a new system when we want to add an echo to a new sound. This works because we can separate the construction of the system from its application.

Saturday, April 26, 2008

Some Philosophical Perspectives

There is an extensive philosophical literature on representations. Much if it is aimed at understanding mental representations, which isn't our concern here, but some of the arguments presented are intended to clarify the nature of representations generally.

In what way do representations represent what they represent? I've argued that (i) a representation represents something only in the context of a representational system, in which its correspondence to a target is picked out. I've also argued that (ii) representational systems arise, or are devised, because they allow operations to be performed with less work, in an inclusive sense, than they can be performed without them. These arguments are similar to those presented by Ruth Millikan in her Language, Thought, and Other Biological Categories (MIT Press, 1984) [Millikan's terminology is different; she reserves the term "representation" for situations in which something has as "its focused proper function to precipitate acts of identification of the referents of its elements (p 198)." For example, a bee dance, clearly a representation in our sense, is not a representation for Millikan, because its purpose is not to identify any referent.]Allowing for difference in terminology, Millikan argues that (i) what representations refer to is determined by the function it serves in the (biological) system in which it is embedded, and that (ii) this function is seen in the adaptive value of the representation in an evolutionary sense. My arguments (i) and (ii) are extensions of these beyond the domain of biological systems.

A conflicting interpretation of representation is sought by Cummins in Representations, Targets, and Attitudes (MIT Press, 1996), where he argues for a "conception of representational content" that is "independent of use or functional role" (pp. 85-86). Cummins's central worry about meaning-as-use theories of representation is making sense of error in representations. We want to talk about a representation being "wrong", but if its meaning is determined by its use, how can a representation ever be "wrong", rather than meaningless?

Cummins's account of "wrong" representations relies on assigning "content" to representations, "independent of use or functional role", and comparing the content with the "target" of the representation, the thing that it is meant to refer to in a given situation. A representation is "wrong" if there is a mismatch, as when a map of Paris is used when New York is the target. But, as we've seen earlier, computational representations, like those using blocks of bits, simply can't be assigned content out of context. So a different account of content, and "wrongness", has to be offered, at least for these cases.

In the account I am suggesting, we identify a "wrong" representation when we analyze a system of things and relationships as if it were a representational system, but then find that the crucial requirement, that mapping operations into the representation domain and then back has to save work, doesn't hold. For example, a map of Paris is "correct" as a representation when seen in a representational system for navigating in Paris, but "wrong" in a system for navigating in New York: we don't save any work trying to get around Paris using a map of New York.

This kind of usage for "wrong" is common in talking about complex systems that partially satisfy some description. Rather than refusing to apply the term "representation" at all to a system that only partially meets the definition, we apply the term with a qualifier, "wrong" in this case, allowing us to describe both the points of agreement with the definition and the discrepancy. (Viewing the application of a term like "representation" as itself an instance of representation, one can say that using the term, with qualifications, provides value as long as the value gained by applying the term outweighs the complication involved in managing the qualifications. If a system shares very little with a full-fledged, working representational system, describing it as a representational system won't be useful. But there is no bright line separating appropriate from inappropriate usage, any more than there is bright line separating "insufficiently precise" measurements from "sufficiently precise" ones.)

This treatment is consistent with Millikan. Allowing as before for difference in terminology, for Millikan a representation is "wrong" when it fails to perform its proper (biological) function.

What does a representation represent?

It's common usage to talk about what something represents, as in "this sequence of numbers represents that sound." But it's clear that computational representations, at least, can't be said to represent anything, taken out of context. The same sequence of numbers, in the context of different uses, could represent a time series of stock prices, or a time series of predicted water levels in a reservoir, or ... . More radically, blocks of bits in memory are used to represent all the different things that can be represented computationally, and a given pattern of bits can represent numbers of different kinds (floating point or integer), or a group of characters, or a part of a data structure tied together with addresses, or... . In the context of a particular representational system, a block of bits can be said to represent whatever it is that it corresponds to in that system. But unless that context is assumed it is really meaningless to ascribe representational content to a block of bits. Anytime we ascribe content we are presuming context.

There's nothing unusual in this situation. Many attributes are commonly ascribed to things that really are determined in complex ways by the situations in which the things are encountered. Colors cannot be assigned to things independent of the viewer, for example, or independent of the context in which the thing is viewed (for striking demonstrations see http://www.purveslab.net/seeforyourself/). A thing can be said to be "large" or "small" only in a comparative context, usually implicit, and so on.

Thursday, March 20, 2008

What difference does it make?

Does it make any difference to view computational systems as representational systems? Yes. Here is a comparison of two programs I wrote as instructor's solutions for the same homework exercise in an introductory course using Python. The problem is to read a list of sound samples from a WAV file, and record a new sound, consisting of the original plus an echo. The echo is delayed by .15 sec, and attenuated by a factor of .25. This is the same example discussed in the post, Computational Stuff, a Worked Example.

Program 1 is the solution discussed in the earlier post, with the portion of the solution that records the song with echo slightly modified for comparability with the solution presented as Program 2.

#Program 1
originalSong=readwave("song.wav").getLeft()
softerSong=[.5*e for e in originalSong] #line 1
intervalOfSilence=int(.25*sampleRate)*[0] #line 6
delayedSofterSong=intervalOfSilence+softerSong #line 7 #line 2
#line 3:
songWithEcho=[originalSong[i]+delayedSofterSong[i] for i in range(len(originalSong))]
outputWAV=makewave("modifiedsong.wav",44100)
for sample in songWithEcho:
outputWAV.record(sample,0)
outputWAV.close()

Program 2 is my solution to the same problem, written earlier without considering the representational perspective:

#Program 2
originalSong = readwave("song.wav").getLeft()
echospread = int( 44100 * .25 )
outputWAV = makewave("modifiedsong.wav", sampleRate)
for i in range(0, echospread):
outputWAV.record(originalSong[i], 0)
for i in range(echospread, len(originalSong)):
outputWAV.record( originalSong[i] + originalSong[i - echospread] * .5,0)
outputWAV.close()


Both programs produce correct results. Indeed, they produce the same results. But they work quite differently.

It is difficult to identify any computational operation in Program 2 with an operation in the target domain of sounds, or to identify any data structure in Program 2 with a sound, except for originalSong. The thrust of Program 2 is to record certain numbers in a file, not to construct a representation of a sound that is produced in a certain way.

In contrast, in Program 1, softerSong, delayedSofterSong, and songWith Echo all correspond to sounds, and they are created using operations on sequences of samples that correspond to operations on sounds: attentuation (in line 1), delaying (in line 2), and mixing the original song with the echo (in line 3).

Here are diagrams showing these relationships in the two programs.


Here is shown the original song, bouncing off a reflector shown at right, in attenuated form, being delayed, and being mixed with the original song. Color coding and dashed lines show which things in the representation domain correspond to which things in the target domain. Solid lines show which operations in the representation domain correspond to mappings in the target domain.

Notice that the computational representation produces only an approximation to the actual echo effect. As shown, when the recorded sound is played back the echo is truncated, because when mixing is done in the program the resulting sound is only as long as the shorter of the two sounds being mixed.
The similar diagram for Program 2 has many fewer correspondences to point out. The value of originalSong corresponds to the original song in the target domain, but there are no other simple correspondences. Because Program 2 builds up the recording sample by sample, the operations of attentuation, delay, and mixing, are hidden in the arithmetic of the construction of the samples. The prominent decomposition of the program into two loops make sense in view of the change in arithmetic needed to produce two groups of samples, but this split doesn't reflect anything about how echos are produced physically.

Notice also that there is a mapping, "record samples" shown as an arrow for Program 1 that appears only as a floating label in Program 2. In Program 1 there is a list of samples created that corresponds to the song with echo, and this list of samples is recorded, an operation easily understood as a mapping. In Program 2, there is no list of samples that is created and then recorded; rather, samples are created and recorded piecemeal. Recording is happening, but not in a way that can be understood separately from other operations in the program.

So what? For an experienced programmer, Program 2 isn't hard to write. One imagines the numbers one has to record, and figures out how to do the arithmetic that will produce these numbers, in the process of recording them. One's understanding of the physics of sound plays a role only in determining what the correct numbers are, not in writing a program to produce the numbers.

But for beginners, Program 2 is difficult to write. Experience is needed to relate what the numbers are to a method for producing them.

Program 1, on the other hand, is easier to write, if one understands the physics of echo production and the computational operations that correspond to operations on sounds, as discussed in the earlier post. Writing the program requires only that the computational operations that correspond to attenuation, delay, and mixing be deployed and combined, and that one knows how to record a list of samples.

Saturday, March 8, 2008

Human-Centered Computing and Representation

Here is the synopsis of the program description for the Human-Centered Computing Cluster at the National Science Foundation (see http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=500051):

This cluster, Human-Centered Computing (HCC), encompasses a rich panoply of diverse themes in Computer Science and IT, all of which are united by the common thread that human beings, whether as individuals, teams, organizations or societies, assume participatory and integral roles throughout all stages of IT development and use.

HCC topics include, but are not limited to:

  • Problem-solving in distributed environments, ranging across Internet-based information systems, grids, sensor-based information networks, and mobile and wearable information appliances.
  • Multimedia and multi-modal interfaces in which combinations of speech, text, graphics, gesture, movement, touch, sound, etc. are used by people and machines to communicate with one another.
  • Intelligent interfaces and user modeling, information visualization, and adaptation of content to accommodate different display capabilities, modalities, bandwidth and latency.
  • Multi-agent systems that control and coordinate actions and solve complex problems in distributed environments in a wide variety of domains, such as disaster response teams, e-commerce, education, and successful aging.
  • Models for effective computer-mediated human-human interaction under a variety of constraints, (e.g., video conferencing, collaboration across high vs. low bandwidth networks, etc.).
  • Definition of semantic structures for multimedia information to support cross-modal input and output.
  • Specific solutions to address the special needs of particular communities.
  • Collaborative systems that enable knowledge-intensive and dynamic interactions for innovation and knowledge generation across organizational boundaries, national borders, and professional fields.
  • Novel methods to support and enhance social interaction, including innovative ideas like social orthotics, affective computing, and experience capture.
  • Studies of how social organizations, such as government agencies or corporations, respond to and shape the introduction of new information technologies, especially with the goal of improving scientific understanding and technical design.

Can this broad area of work, defined mainly by example, be given conceptual coherence? Focusing on the role of people in the representational work of computational system suggests that it can. Further, the representational perspective shows that HCC is not, as some have thought, a peripheral aspect of computing, but rather a central aspect, not just in its practical importance, but also in the intellectual content it shares with other areas of computing.

As we have seen, some operations in representational systems can be automated, that is, carried out by machine. But some operations, such as making a judgment from a bar chart, are carried out by people. For a representational system to work, the cost structure of these human operations have to meet the same conditions as the cost structure of automated operations, including conditions on accuracy and reliability.

While the cost structures of automated and human-performed operations have symmetric status in designing or analyzing representational systems, the specifics naturally are different. For example, what computers like to do isn't a relevant or even well-defined consideration, while what people like to do certainly is. Here are a few examples that illustrate some of the factors that influence the cost structure of operations performed by people in various representational situations.

Bar charts: The bars are often drawn by computer, and then people make the actual judgments. When a person implements a mapping in this context, they perceive the bars, and then create an output. As discussed earlier, if the bars are base-aligned and parallel, the length comparison will be easy and accurate.

Picture processing: Recognizing objects or people in a scene, or judging aesthetic qualities, or many other everyday operations, can be done from pictures. That is, if a real scene isn't at hand, it can be represented by a picture, and judgments made from the picture can substitute for the judgments that would be made from the real scene. Conveniently, suitable pictures can be created and displayed by computer, taking into account the following characteristics of human vision.

Color perception: For homo sapiens the brightness of just three colors can be chosen to reproduce perfectly any color (but see http://www.purveslab.net/main/ for complications that are at work in color vision). A display for dogs would only need two colors (dogs only discriminate short from long wavelengths.)

Limited spatial resolution, and blending: Below a certain size, closely spaced dots merge into a smooth-appearing area. Exploiting this principle, and the principles of human color perception, most displays work by presenting a large number of tiny, closely spaced dots, of three colors. Even though this arrangement produces images that are physically very different from "real" scense in the world, to the human visual system these images look realistic, and they are processed effectively and easily.

Depth perception: There are many cues for depth, that is, distance from the viewer. Nearly all of these are included in two-dimensional images. For example, perspective, the effect by which more distant objects form smaller images, or aerial perspective, the effect by which more distant objects appear fainter or more hazy, because they are seen through thicker intervening atmosphere, are preserved in two dimensional images. A depth cue that is important for objects near at hand is stereopsis: images seen by the two eyes differ in a way that is related to distance. Displays that exploit this indication of depth present different images to the two eyes, for example by using differently polarized light for the two images, and placing differently polarized filters over the two eyes. As for images made of dots, notice that the resulting pattern of light is very different from the pattern of light in a real scene. But it is different in a way that the human visual system doesn't detect. There could be organisms whose visual system is sensitive to the polarization of light. For such organisms these displays wouldn't work.

Moving from pictures to moving scenes, presentation of movies and videos relies on further facts about how people see. Just as closely spaced dots merge into the appearance of smooth surfaces, so images closely spaced in time merge into the appearance of smooth motion. As we all know, a movie consists of a series of frames, each a static picture, changed very rapidly, but we don’t see it that way. One could imagine a Martian in a movie theater wondering why Earthlings like to sit in the dark and watch long series of very similar pictures flashed on a screen. Video is more complicated, in that the pictures aren’t even complete: they are collections of closely spaced lines, with only half the lines included in each frame. But this ridiculous presentation looks quite good to the human visual system.

These examples bring out what the point of human centered computing is. Effective displays are not based on faithful physical and geometric reproductions of the signals available from real scenes. Rather, they systematically exploit facts that are specific to the human perceptual apparatus. Representational systems in which humans play a role cannot be designed without at least implicit knowledge of these facts.

Input forms, created by people.

In the examples just discussed, operations are carried out by people on entities, like bars, that are presented to them by a computer. From the point of view of the computer these are output forms. But it’s usually also necessary for people to produce forms that are given to the computer, so as to provide data for the computer to operate on, or to control what the computer does. From the point of the computer these are input forms. Just as for output forms, input forms have to be shaped to fit people’s capabilities. For output forms the human abilities that matter center on perception. But since input forms have to be produced by people, the key human capabilities are those of action. What are the actions people can control, to produce forms usable by a computer?

Keypresses. Most people have dexterous fingers that can be moved quickly and accurately so as to apply force to appropriately sized objects. Keys and buttons are such objects, and are designed to fit people’s fingers in size, the force required to activate them, and (for high quality keys) the feedback they provide to confirm that they have been activated. Most people can move their fingers so as to press more than one key at a time, and this ability is exploited in many musical instruments and a few computer input devices, such as chord keyboards.

Text entry. Often keys are used in groups to allow people to specify sequences of characters making up a text. For people who know a written language, many pieces of text are familiar, and can be generated quickly and accurately, whereas random sequences of keypresses can be entered only slowly and with high error rates.

Drawing. Most people can use their hands and fingers to grasp a pointed object of appropriate size and shape, and move the point along a desired path. With more difficulty, most people can move an object that has no point (a mouse) so as to control a pointed marker whose movement traces a desired path. In either case, the path can be sensed and act as input to a computer.

We could add many examples to this list, and perhaps invent new ways to use actions to communicate. For example, HCC researchers are developing ways for people to use facial expression, or tone of voice, to create input forms.

HCC and social systems.

Humans are social animals: most things that people do are done in groups. This influences human-centered computing in a number of ways.

There are usually multiple people in a computational system. The designers of such systems therefore have to understand not just what individual people are likely to do, and can do, but what people working in groups will and can do. For example, sometimes systems fail because some users don’t do things, like entering information into a data base, that are needed to support other users. But sometimes, as in Wikipedia, people working strictly as volunteers put huge amounts of effort into really useful contributions. If you are designing a system for a lot of people to use, you have to try to understand what shapes contrasts like this. In our terms, we need to understand the cost structures of operations performed by groups, not just by individuals. Plainly, our ability to predict what will happen naturally (and hence cheaply) in a group is poor.

Decisions about system design have to reflect the needs and wants of groups of people, not just those of individuals. Historically, before computers became cheap enough for individuals to buy them, nearly all computational systems were created to meet the needs of organizations, especially businesses. Today, even though it is common for individuals to own one or more computers, facilities for communicating with other people, especially via the Internet, are now crucial for almost all users. People need to communicate, in many situations, and want to communicate in many others. The design of future computational systems will continue to be shaped strongly by these needs and wants. Think of MySpace, or FaceBook.

These considerations, too, shape the cost structures we are concerned with as designers of representational systems. Here we are seeing the value side of cost: systems need to deliver value to offset their costs.

HCC and particular user groups.

People have different capabilities. Because, as we've seen, representational systems are shaped by users’ capabilities, differences in capabilities need to be reflected in different designs. Assistive technology uses representations that are suited to the needs of people with limited vision, or limited ability to read text, to mention just two examples. Inclusive design seeks to create representational systems that can be used by people with the widest possible range of capabilities.

To come: representation and programs; philosophical perspectives.

Saturday, March 1, 2008

Computational Stuff, A Worked Example

Suppose you are listening to a sound, say a bird song, and you think, “I’d like to hear that song with an echo.” Good luck doing that in the target domain, the leafy glade in which you are enjoying the ambiance. You could try to wheel up some kind of sound reflector (if by some miracle there is one handy) but the bird would probably fly off while you do it. Even if it stays around it may not sing for you.

What you’ll need to do is map the bird’s song into some representation domain, say a pattern of magnetization in a film of oxide on a cassette tape. (Of course, doing a mapping of this kind is what we call "recording".) Now, with two cassette players (one to play the original song, and one to play the song again, starting a little later, at reduced volume) you can get your effect. But there’s a good deal of work (you have to copy the tape), and some dexterity to get the right delay, involved.

If you map the bird’s song into computational stuff, things are much easier. You can automate the production of the echo, for the bird’s song or any other sound. That is, given any sound, represented computationally, you can get the sound with the echo just by asking for it. No more work than that.

Representing Sounds using Sequences of Numbers

You hear a sound when certain cells in your ears detect rapid changes in air pressure, with the pressure going up, going down, going up, going down and so on, very rapidly. How big the swing in pressure is determines the loudness of the sound. How rapidly the pressure swings up and down determines the pitch or frequency of the sound. The more rapidly the swings happen, the higher the pitch. Human ears are sensitive to swings that go up and down between about 20 times per second and about 20,000 times a second.

If you measure air pressure, very rapidly and accurately, you’ll get a collection of numbers. If you have some way of working out when each measurement was made, you have captured a representation of the sound: you know when the pressure goes up, and how high, when it goes down, and how far, and so on. If you had some way of taking these numbers, and creating air pressures that match them, with the right timing, you could reproduce the sound, at least roughly. The roughness comes in because you don’t really know what is supposed to happen to the air pressure in between your measurements. But if our measurements are close enough together we can hope that the roughness won’t matter.

In theory, you could do all of this by keeping track separately of the time associated with each air pressure measurement. In practice a simpler scheme is used. If we know how many measurements per second we are making, evenly spaced in time, we can work out the time associated with each measurement, without having to store the time for each measurement.

If we want to represent a sound, then, we need a big collection of numbers (44,100 numbers for one second of sound), which are the pressure measurements. We also need to keep track of how often we made the measurements, 44,100 times a second in our example. We need to know this number to play back the sound: if recreate the air pressure assuming the measurements were collected more or less often, the sound we get won’t approximate the original, because the timing of the air pressure changes will be off.

The terminology that is used in talking about this stuff is different from what we’ve been using. Rather than “air pressure measurement” we’ll say sample from now on, and the number of measurements we collect per second when recording is called the sample rate. Using this terminology, we can represent a sound by a sequence of samples, also keeping track of the sample rate that was used when they were recorded, and that should be used to play them back.

Doing Work on Sounds

For our representation to be useful we have to be able to work on it that corresponds to work we might want to do on sounds. Here are some example operations.

Making a sound louder. You hear that bird song, and you want it louder. You could try to make the bird angry, but that isn’t very likely to work. Instead, you map the song to a sequence of samples, and you multiply each sample by some factor greater than one. If you now play back the new samples, you’ll get a louder version of the original song. Here’s a picture of the representation system at work here:


Making a sound fainter. If you multiply each sample by a number less than one, and play back, the sound will be softer (because the swings in pressure aren’t as big.)

Hearing one sound and then another. You hear one bird song, and then, after a while, another. You decide you’d like to hear them together, one right after the other. You map each song to a sequence of samples, and then you stick the two sequences together to get one long sequence. When you play back the long sequence you’ll hear what you want.

Delaying a sound. If you have a sequence representing some silence, and a sequence representing the sound you want to delay, you can stick the sequences together, with the silence first. When you play this sequence back, you’ll first hear the silence, and then, after a delay, the sound.

Hearing two sounds together. You hear one bird song, and then another. You think they might harmonize, so you wish the birds would sing together. Good luck trying to get them to do that! Using our representation, this is easy. Get the sequences of samples for each song, and then make a new sequence by adding together to first samples in each of the songs, then the second two, and so on. It may not be clear that this will work. But if you think about the physics of the original sounds, you may be able to see that the individual air pressures add up in just this way.

Making an echo. Suppose we have our favorite bird song, and we want to hear it with an echo. What happens in an echo is that the original song hits our ears, but it also goes off and hits some reflector, like a cliff, and bounces back to our ears from there. So what is coming to our ears at any moment consists of whatever we are hearing of the original song, plus the song that is bouncing back to us. Since it takes time for the song to get to the cliff and back to us, what we hear in the echo is delayed. Also, since the bounce off the cliff produces some scattering, sound that doesn’t get back to our ears, the delayed, bounced version of the song is softer that the original.

Here’s how we can do all this, step by step.

1. Record the original song, getting a sequence of samples. Let’s call this sequence originalSong.

2. Make a softer version of the original, by multiplying each sample by (say) .5. Call this sequence softerSong

3. Record .25 seconds of silence, getting a sequence called quarterSecondOfSilence.

4. Stick the sequence quarterSecondOfSilence onto the front of softerSong. Call the resulting sequence delayedSofterSong.

5. Add together the samples in originalSong and delayedSofterSong. If we play back the resulting sequence, we’ll hear the original song with an echo.

Representing Sounds using Computational Stuff

In theory you could do all of this with pencil and paper, if you had some way to read the sample numbers, and to get the samples from your paper into some kind of player. But at 44,100 samples in a second of bird song, that’s a lot of paper and pencil. You want to get a machine to do the work.

To do this, we are going to represent the sequences of numbers we are using to represent the sounds, using computational stuff. Notice the three levels here: sounds are what we are interested in, and we use sequences of numbers to represent them. But then to be able to automate work on the sequences of numbers, we represent them using computational stuff.

We’ll do this using the Python programming language.

We are going to used things called lists in our Python programs to represent our sequences of samples.

To do the above work, the things we need to do are: get a list of samples representing the original bird song, multiply each sample in a list by a factor, get a list of samples representing .25 sec of silence, stick two lists together, and produce a new list by adding the samples in two lists together. We’ll also need to play back a list of samples. Here’s how to do these things in Python.

Basics of lists. A list is a sequence of things, numbers in our case, written like this: [1,22,47,29]. You can give a list a name, using an assignment statement, like this:

eggplant=[1,22,47,29]

Now we can refer to any of the elements of the list eggplant by using a number as an index. The indices start with 0, so eggplant[0] is 1, eggplant[1] is 22, eggplant[2] is 47, and eggplant[3] is 29. Indices are also often called subscripts, from mathematical notation).

Get a list of samples representing the original bird song. Rather than make a field recording, we’ll get our samples from a WAV file. Because computational representations of sounds are so handy, machinery has been created for recording sounds and storing them as collections of information on computer systems. Collections of information that can be filed away on the computer for future use are called files. A WAV file is a file that contains samples of a sound, stored in a particular way.

Here is a program that gets a list of samples from a WAV file named "birdsong.wav", and prints the first 10 of them:

originalSong=readwave("birdsong.wav") for i in range(10): print originalSong[i]

Multiply each sample in a list by a factor. This statement is all we need:

softerSong=[.5*e for e in originalSong]

This statement builds a new list, each of whose elements is made from an element of originalSong by multiplying it by .5. You can read it by thinking of the first thing in the brackets, .5*e, as a kind of pattern that shows how to make an element of the new list from an element of the old one. We know that e is an element of the list originalSong, because we wrote “for e in originalSong”. The square brackets tell us that we are making a list.

Get a list of samples representing .25 sec of silence. We could look for a WAV file that contains a recording of .25 sec of silence, but it will be easier just to make our own list of samples. This code will do it:

quarterSecondOfSilence=int(.25*44100)*[0]

This takes a little explaining. The number .25*44100 is the number of samples we need for a quarter second of sound, since we need 44,100 samples for each second of sound. The “int” with parens around it makes this into a whole number, or integer (you and I know that that product is already a whole number, but because .25 isn’t a whole number Python worries that the product might not be.) Then comes something funny looking: we are “multiplying” the list [0] by that number. When you “multiply” a list in Python by a number, Python sticks together that number of copies of the original list. So we are getting a list of .25*44100 copies of the list with just 0 in it. The result is a sequence of as many samples as we need to get .25 sec of sound, and each sample is 0.

Stick two lists together. This one is really easy. In Python, if I have two lists, I can just “add” them together. For example, [1,2,3]+[11,12] is [1,2,3,11,12]. So to produce a delayed version of softerSong we can just write

delayedSofterSong=quarterSecondOfSilence+softerSong

Produce a new list by adding the samples in two lists together. We’ve just seen that Python won’t do this if we just “add” the two lists with +... it will concatenate them. So this is a little more involved. Here is Python code that will do the trick.

songWithEcho=[originalSong[i]+delayedSofterSong[i] for i in range(len(originalSong))]

The expression originalSong[i]+delayedSofterSong[i] is the pattern for elements of the new list. The values of i used in the pattern come from the list range(len(originalSong)). The function range produces the list [0,1,...] with as many elements as the number you specify. Finally, len(originalSong) is the number of elements in the list, originalSong. This means that the values of i will step up from 0 until we have used as many values as there are things in originalSong, so in the pattern, originalSong[i] will get to be all of the elements in the list originalSong, which is what we want. Note that delayedSofterSong[i] picks out the corresponding elements of that list.

Play back a list of samples. We now hope we have a version of the song with the echo added, but we can’t hear it. Unfortunately we don’t have a way to convert a list of samples directly to sound from Python. What we can do, though, is put our samples into a WAV file, and then play the WAV file on our computer. Let's assume that the statement makewave("modifiedsong.wav",44100,songWithEcho) will put the samples in the list songWithEcho into a WAV file named "modifiedsong.wav". (Note: functions readwave and makewave aren't part of standard Python, but they can be implemented.)

Pulling these pieces together, here is a complete program for adding an echo to a sound, represented by a WAV file:

originalSong=readwave("birdsong.wav")
softerSong=[.5*e for e in originalSong]

quarterSecondOfSilence=int(.25*44100)*[0]
delayedSofterSong=quarterSecondOfSilence+softerSong songWithEcho=[originalSong[i]+delayedSofterSong[i] for i in range(len(originalSong))] makewave("modifiedsong.wav",44100,songWithEcho)

Below is a tabular summary of the methods we’ve discussed. The table distinguishes three domains, the sounds, the sequences of samples, and the lists (the computational stuff).



What is Implementation?

As was mentioned earlier, the key difference between computer science and mathematics is that mappings used in computing are (or could be) implemented, whereas mappings in mathematics need not be. But what does it mean to implement a mapping?

An implementation of a mapping is a thing with two ends, an input end and an output end. In the case of computer implementation, the thing will be a technical device of some kind. For example, an adder is a device whose input end can be set to correspond to a pair of numbers, say 2 and 3. After a little delay, the adder sets its output end to correspond to the sum of the two numbers on the input end, 5 in this case. Any computer will have one or more adders in it.

Similarly, to draw a bar chart we will want a device of some kind whose input end can be set to represent a length, and whose output end will produce a colored bar. In practice, you won’t find a bar-drawer as a separate device inside a computer. Rather, simpler devices are hooked together so as to act like a bar-drawer when needed.

The program above, when installed on an appropriately configured computer, implements some of the key stages of the production of a song with an echo. When an WAV file is presented as input, after some delay an appropriate WAV file appears as output. The steps connecting the original bird song in the leafy glade to a WAV file (recording), and connecting the output WAV file to a sound (playback) are not implemented by the program. If we were to examine their implementation, in practice, we would find that recording is probably implemented by a person operating a piece of special equipment, while playback is implemented by another program on a computer. The fact that mappings are often implemented by people leads into the topic of human-centered computing, to be taken up later.

Input and output forms.

We'll often need to refer to the states of the input and output ends of implemented mappings. Because implementations are very diverse, we'll want a generic term: we'll describe these states as input and output forms. For example, a bar graph displayed by a program may act as an output form the the program, and as an input form for a person viewing it.

Aside: Communication and Storage as Implemented Mappings

Many of the mappings one thinks of are like addition, in that the output is different from the input. But there are two very useful families of implementations of the identity mapping, the mapping whose output is the same as its input. Because the ends of a physical device can be separated in space, we can create a communication system, whose input end is in New York and whose output end is in Singapore. Here it is an advantage that the output is identical to the input.

Similarly, the ends of a device can be separated in time. A device whose input end is in September, 2006, but whose output end is in September, 2016, can be a storage system. Again, we are happy if the output is the same as the input.

Reprise: Advantages of computational representations.

The example we've discussed here provides concrete illustrations of the characteristic advantages of computational representations, representations using "computational stuff", over other forms. A sound represented as a sequence of samples can be stored in very little space, and accurately retrieved at a later time. It can also be communicated, that is, made available in another place, very rapidly and cheaply. These advantages give us the iPod, and the ability to acquire all kinds of material to play on it at very low cost. Finally, we can automate all kinds of transformations on sounds, incomparably more easily than can be done in other representation domains, such as magnetic tape, let alone in the target domain, the leafy glade.

Saturday, February 23, 2008

Herbert Simon, Sciences of the Artificial

Simon includes in this book (MIT Press, 1969) some early remarks on the importance of representation in problem solving and design. He uses as a motivating example the game of "number scrabble", in which players alternately take digits from the collection {1,2,3,4,5,6,7,8,9}, seeking to secure three digits whose sum is 15. Simon notes that this game is isomorphic to tic-tac-toe: if the digits are arranged thus,
4 9 2
3 5 7
8 1 6
the triples whose sum is 9 are exactly the rows, columns, and diagonals that constitute wins in tic-tac-toe. Thus a change of representation for the game allows players of number scrabble to engage their preexisting knowledge of how to play tic-tac-toe. Though Simon doesn't make the connection explicit, the context of this discussion suggests that he might also have been thinking that the re-represented game engages spatial reasoning abilities that the original game does not.

Simon calls for further work on representations: "This view can be extended to all of problem solving: solving a problem simply means representing it so as to make the solution transparent. If the problem solving could actually be organized in these terms, the issue of representation would indeed become central. But even if it cannot-- if this is too exaggerated a view-- a deeper understanding of how representations are created and how they contribute to the solution of problems will become an essential component in the future theory of design. ... We have only a sketchy and incomplete knowledge of the different ways in which problems can be represented and much less knowledge of the significance of the differences. ... But even though our classification is incomplete, and perhaps superficial, we can begin to build a theory of the properties of these representations. (pp. 77-79)

Robert Rosen, Life Itself

In this book (Columbia University Press, 1991) theoretical biologist Robert Rosen describes a conception of modeling that is closely related to the framework of representational systems described here. His Figure 3H.2, here,

connects a "natural" system, N, with a "formal" system, F, by encoding and decoding maps, in such a way that the composite map 2+3+4 agrees with the map 1 within N. This is a horizontal presentation of the same relationships I've shown vertically, linking a target domain and a representation domain.

Friday, February 22, 2008

Layering and Branching


Representations form natural layered structures, with the representation of one system being the target domain of another, and so, as shown schematically here:

A common family of layers in computational representations connects entities in some target domain with numbers, the numbers with collections of bits, and the bits with physical structures of some kind. Here's an elaborated example of this, starting with sticks, in a target domain at the top:


This diagram shows branching, as well as layering: for a given target domain there can be more than one possible representation domain. For example, numbers can be represented by elements of a data type (double) in a language like C, or by voltages in an analog machine. Elements in a data type can be represented by blocks of bits, and bits can be represented by very different physical structures, like magnetic cores, at left, or circuits with more than one pattern of circulation of current, at right.

The usefulness of representations is greatly increased by these two kinds of structure, layering and branching. Layering means that representations often can be devised and thought about without knowing or caring about most of the layers in a structure. Thus I can use numbers in a program without knowing or caring how the bits used to represent the numbers are implemented.

Branching works with layering to create vital flexibility. In my career the actual representation of bits has changed completely, from magnetic cores to nests of tiny transistors, but as a programmer this change affects me only secondarily, through improvements in cost and performance. The investments in software I and many others made were not lost, even as the hardware structures used to represent them changed utterly.

Although this example doesn't show it, there is upward branching as well as downward branching. Upward branching shows up when the same representation domain is used with more than one target domain. Clearly, lots of things can be represented by numbers with addition as a combining operation, for example. Thus upward branching signifies multiple use of representations. Mathematical (and computational) structures get much of their usefulness from the fact that they can be used to represent lots of different things.

Layering and the Organization of Disciplines

Looking at bundles of layered structures one sees a tree with branches (or roots) at top and bottom:


The branches at the top radiate into all the corners of human endeavor, from management to chemistry, and beyond. The roots of the tree extend into the wide variety of physical structures that are used, way down at the bottom of all the layered structures, to implement the representations above them. The traditional discipline of computer science focuses on the structures in the trunk of the tree, largely independent of the branches above, that distinguish different applications, and at the same time largely independent of the roots below, that distinguish various possible technologies of implementation.

Mathematics also inhabits this middle zone, studying structures that are largely independent of their applications, above, and certainly independent of the layers below. Indeed, mathematics is different from computer science, or vice versa, largely in the demand in computer science that its representational structures actually be implemented, that is, that they be grounded, or at least groundable, in physical structures.

(Telecommunications is also in the trunk along with computer science and mathematics. A good part of this field concerns communication in a way that is disconnected from the content of communications, as worked out in layers above the communication system, and from the implementation, worked out in layers below, where it is worked out that a system is built from electric wires, or from optical fibers, or from radio transmitters and receivers.)

The fact that the disciplines can be sorted out in this way, by reference to layered representational systems, does not mean that there are important advantages in doing so. The success of tight collaborations between computer scientists and scientists in other disciplines, like those that have led to the explosion of computational biology, suggests the value of understanding multiple layers at the same time, in devising new and more useful representations. Looking down from traditional computer science, there have also been important gains in blurring the traditional software-hardware boundary, in devising implementations that are sensitive to attributes of programming languages, and vice versa.

Donald Campbell, in his "Ethnocentrism of the disciplines and the fishscale model of omniscience", 1969, showed how very simple organizational processes lead to disciplines withdrawing from boundary areas. Computer science may stand to gain even more than other disciplines from fighting this tendency, and actively developing its boundary areas.

Fun with Theory of Measurement

Suppose we have three sticks, and we want to know whether the first two, when stuck together end to end, are longer or shorter than the third. We all know that we can measure the first two sticks, add the numbers we get, and compare the sum to the number we get when we measure the third stick. Would it be possible to measure length in such a way that when we combine the lengths of the first two sticks we would multiply instead of adding?

The answer is, yes, it is possible. This picture shows the unusual kind of ruler we would use:

This shows how it works:
Notice that the null stick has length 1 with this ruler. A little thought shows that this must be true: the null stick is the identity for putting sticks together, and so its measure must be the identity for the operation we use for putting lengths together.

Now that you see you can use multiplication rather than addition to combine lengths, do you think we could use subtraction?

The answer is that we can't. Putting sticks together is commutative, where comparison is concerned: it doesn't matter which stick we start with when putting them together. But subtraction is not commutative. So we can't use subtraction to represent putting sticks together.

Sunday, February 17, 2008

Representation and Cost

The point of a representational system is to move work from the target domain into the representation domain, where it can be done more easily, or more accurately, or better in some other way. I'll refer to the complex of issues that arise here (speed, effort, etc.) under the general heading of cost.

Mackinlay's thesis, cited in the previous post, distinguished two aspects of representations, expressiveness and effectiveness, in a way that illustrates the importance of cost. Expressiveness refers to the logical adequacy of something in a representation domain to capture the structure of a target domain. Mackinlay and Genesereth provide a nice example of an expressiveness failure, pointing out that the relation nested-within between closed contours in a diagram can't be used in general to represent contiguity of geographic regions like states or provinces, because nested-within is transitive, and contiguity is not. So expressiveness is a mathematical constraint on representational systems.

Effectiveness, on the other hand, refers to the ease and accuracy with which a human judge can extract correct answers from a visualization (the kind of representational system Mackinlay addressed.) This diagram shows three bar graphs that might be used to represent the fuel economy of two cars, as in the example in the last post:

The three representations are equally expressive: in each the yellow bar is longer than the red bar. But the representations differ considerably in effectiveness: human viewers find it easy to compare the lengths of bars when the bars are parallel and base-aligned, and only chart A has these attributes.

We can generalize the considerations in effectiveness to a much broader range of situations, including ones in which human perceptual judgments play no or little role. In many situations where computational representations are used, crucial operations in the representation domain are carried out automatically, and the relevant considerations in evaluating the representational system center on the cost of carrying out the operations by machine. The value of the representational system may hinge on how rapidly the needed operations can be carried out, and whether or not adequate memory is available, on a machine with an acceptable price.

Cost of creation, maintenance, and use.

Another kind of generalization considers costs beyond those incurred in using a representational system to answer some particular question. A system that provides fine answers at low cost will not be useful if is very expensive to create, and if this first cost cannot be amortized over enough usage. Thus in many situations the cost of creating a representational system for a given problem or family of problems will be an important consideration, motivating improvements in the technology of creating representational systems such as programming languages that can more easily be applied in connection with various target domains.

Another practical problem with computational representations is maintenance, the work required to modify a representational system so that it continues to support a shifting portfolio of work in a target domain that also shifts over time. Famously, the cost of maintenance commonly exceeds the cost of initial creation for many computational representational systems.

Cognitive dimensions analysis, as developed by Green, Petre, Blackwell, and others (see http://www.cl.cam.ac.uk/~afb21/CognitiveDimensions/), provides useful insight into the aspects of representations that determine costs of creation and maintenance, and I'll return to this subject below, when discussing the role of programming languages in representational systems.

Approximation and cost.

In the introductory post I said that the results obtained by moving work into the representation domain have to agree with those that would have been obtained by staying in the target domain. That's not quite true. The results only have to be "good enough", or "close enough", or "approximately correct". But what do these criteria really mean?

The crux is that the cost of the path to an ultimate result in the target domain that detours through the representation domain has to be less than that of the cheapest available path that stays in the target domain. The results obtained from the work in the representation domain are "good enough" just when the incremental costs, of all kinds, associated with any difference between these results and those that would have been obtained by staying in the target domain, are less than the savings realized by moving work into the representation domain.

For example, suppose in the situation with the log and the chasm that we aren't able to measure the log and chasm with complete accuracy (as will always be the case.) Using measurement, and moving the work of comparing the log and chasm into the representation domain of numbers, is nevertheless worthwhile, as long as the inaccuracy doesn't increase by much the probability of an expensive failure, when we act on the results of the measurement process, in placing or not placing the log across the chasm. How much error is acceptable? Depends on the many practicalities of the situation: Are there other logs available if we drop one into the chasm? How soon do we need the bridge? There's the further consideration that we can often tell whether measurement error is likely to matter, when we see the results: if the log measures much longer than the chasm, we don't worry about small errors.

These multiple practicalities suggest that we can't expect too much from a purely formal treatment of the "correctness" of a representational system. It's clearly much too strong a condition that the mappings from input to result in the target domain, and from input into the representation domain, over to a result in the representation domain, and then back into the target domain, strictly commute, in the parlance of category theory, though category theory may provide some useful insight. I'll return to category theory in a later post.

Saturday, February 9, 2008

Getting started

The mantra of this blog is, "computational systems are representational systems".


The aim is to provide a view of what computer science is all about. We computer scientists haven't been very good at explaining the intellectual basis of our field, and this may be one of the reasons we are struggling to interest students, and potential collaborators, in our field, even while the importance of computing in all aspects of life is growing by leaps and bounds.

I've been developing the representational view of computing for a while now, giving talks and having encouraging conversations with folks at conferences, and the blog is a way to spread the ideas around, and get feedback, pending the Big Paper, which of course will be available "real soon now." If you want to read more, while the blog takes shape, you can find some notes, and a link to a talk kindly recorded by the folks at the University of Toronto Faculty of Information Studies, on my website, http://www.cs.colorado.edu/~clayton (what's there are earlier versions of the ideas.)

Why do we need to work on what computer science is about? Jeanette Wing's influential paper on "computational thinking", and follow-on activities (see http://www.cs.cmu.edu/~CompThink/index.html) sets out the challenges. After a stimulating conference session on enrollment problems in CS at Microsoft Research a couple of years ago, one of the participants stood up at the closing to express enthusiasm, saying something like, "It's been great talking about how we can communicate better... we all need to get the word out about the key ideas in the field...", and then caught himself in midstream, "... but I'm not sure we know what they are!" In one of my favorite books, "Python Programming: An Introduction to Computer Science", John Zelle says, "[T]he fundamental question of computer science, is 'What can be computed?'" I don't agree! This question gets at little or nothing of why computer science is so important to all kinds of people doing all kinds of things. The importance of computing is what we need to explain.

Computational systems are representational systems.

Computational systems are important because they are used to represent all kinds of things people care about, and representations are really useful. Further, computational representations have a number of properties that make them especially useful.

So, what is a representational system? There's some theory of representation that we can draw on, best developed in the case of measurement systems, a special case. For theory of measurement see the three volume Foundations of Measurement, by Krantz et al., recently reissued in paperback; for the broader theory of representations, see papers by Mackinlay and Genesereth, including Mackinlay, J. D. (1986) Automating the Design of Graphical Presentations of Relational Information. ACM Transactions on Graphics, 5(2, April), 110-141.

A representational system consists of a target domain, in which there is something we want to accomplish, and a representation domain, with mappings connecting them. The point of representation is that we map work in the target domain into work in the representation domain, where it can be done more easily, or faster, or better in some other way. Then the results are mapped back to the target domain, where we need them.

The kind of representational system that’s best understood is measurement systems. You’ve been using them all your life, but if you are like most people you’ve never thought about how, let alone why, they work. Let’s look at an example.

Suppose we have some logs, and a chasm we want to bridge. We need to know whether one of the logs is long enough to bridge the chasm. We could answer this question by picking up the log, moving it over the chasm, and seeing whether it reaches the other side. This involves actual work: picking up the log, and moving it. The work could be hard, if the log is heavy, and it could be dangerous, if the chasm is deep. We might need to rig up some kind of derrick to swing the log over the chasm. It would be bad to go to all this trouble and then find that the log is too short.

We all know that there’s a way to avoid this possibility. We measure the log, and we measure the chasm, and we compare the measurements. If the number we get when we measure the log is bigger than the number get when measure the chasm, then the log will reach.

In this example, the target domain contains the logs and the chasm. The representation domain is numbers: when we measure the length of something we get a number.

Our question in the target domain is “Will this log reach across the chasm?” We map the question “Will this log reach across the chasm?” onto a question in the representation domain: “Is this number bigger than that one?” Finally, we map the answer we get in the representation domain, yes or no, onto an answer in the target domain, also yes or no (that mapping is easy.) Here’s a diagram showing all this:


The gain from doing all this (which we would normally do almost without thinking) is big. We don’t have to move the log!

For this or any other measurement system to work, the answer we get when we map our question into the representation domain, and then map the answer back to the target domain, has to be the same as the answer we would have gotten if we had done the work to answer the original question in the target domain. After all, how would we feel if we measured the log and the chasm, decided that the log was longer than the chasm, did all the work to swing the log across, and found out that it didn’t reach? The measurement systems we all rely on always do work, and if they appear not to, we assume we made a mistake of some kind, like not pulling the tape measure tight when we used it to measure something.

Measurement is interesting, and exceedingly important, but we are after even bigger game. Measurement systems are representational systems in which the representation domain is a fairly simple mathematical structure, like numbers. But in many representational systems the representation domain isn’t a simple mathematical structure, or a mathematical structure of any kind. For example, consider a simple bar chart, used to represent the fuel economy of cars. Here’s a diagram of this representational system:

Here the representation domain isn’t a mathematical structure, it’s marks on a piece of paper, or a pattern of colored dots on a computer screen.

In the representational systems we'll mainly be concerned with, the representation domain is neither simple mathematical structures, nor marks on paper (though patterns of dots on a computer screen come in around the edges.) Rather, our representational domains will contain computational stuff, whose nature will become clearer as we discuss it. Computational stuff has wonderful properties, so wonderful that computational representations are in use in virtually every sphere of human endeavor. As stressed earlier, the whole point of having a representation is to be able to do something more easily, or cheaper, or faster, or better in some way, and computational representations often have huge advantages over others.

Like the advantages of measurement, the advantages of computational representations often go unnoticed, even when we use them. Here are a few examples.

• If I create a computational representation of something, someone can access that representation, and use it to do work, on the other side of the world, incredibly quickly, and at incredibly little cost.

• Computational representations can easily be accessed in other times, as well as in other places. That is, they are easy and cheap to store and retrieve. The contents of all the books in all the libraries in the world can be stored in computers that fit in two standard shipping containers, each 20x8x8 feet; see http://www.archive.org/web/petabox.php.

• Many operations on computational representations can be automated, meaning that they can be carried out by a machine, rather than a person. This often offers huge advantages in speed, accuracy, and cost.