Sunday, February 17, 2008

Representation and Cost

The point of a representational system is to move work from the target domain into the representation domain, where it can be done more easily, or more accurately, or better in some other way. I'll refer to the complex of issues that arise here (speed, effort, etc.) under the general heading of cost.

Mackinlay's thesis, cited in the previous post, distinguished two aspects of representations, expressiveness and effectiveness, in a way that illustrates the importance of cost. Expressiveness refers to the logical adequacy of something in a representation domain to capture the structure of a target domain. Mackinlay and Genesereth provide a nice example of an expressiveness failure, pointing out that the relation nested-within between closed contours in a diagram can't be used in general to represent contiguity of geographic regions like states or provinces, because nested-within is transitive, and contiguity is not. So expressiveness is a mathematical constraint on representational systems.
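The mismatch between nested-within and contiguity can be checked mechanically. Here is a minimal sketch in Python. The region names, contour names, and the helper is_transitive are hypothetical, made up for this illustration, and are not taken from Mackinlay and Genesereth.

```python
from itertools import product


def is_transitive(relation):
    """Return True if (a, b) and (b, c) in the relation always imply (a, c)."""
    return all(
        (a, d) in relation
        for (a, b), (c, d) in product(relation, repeat=2)
        if b == c
    )


# Hypothetical target domain: three regions in a row, A | B | C.
# Contiguity is symmetric, so each pair is listed in both directions.
contiguous = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")}

# Hypothetical representation domain: contour x lies inside y, and y inside z,
# so x necessarily lies inside z as well.
nested_within = {("x", "y"), ("y", "z"), ("x", "z")}

print(is_transitive(nested_within))  # nesting of closed contours
print(is_transitive(contiguous))     # A touches B, B touches C, A does not touch C
```

Nesting forces the third pair into the relation whether we want it or not, so any diagram that drew A inside B inside C would assert a contiguity between A and C that the map doesn't have.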

Effectiveness, on the other hand, refers to the ease and accuracy with which a human judge can extract correct answers from a visualization (the kind of representational system Mackinlay addressed). This diagram shows three bar graphs that might be used to represent the fuel economy of two cars, as in the example in the last post:

The three representations are equally expressive: in each the yellow bar is longer than the red bar. But the representations differ considerably in effectiveness: human viewers find it easy to compare the lengths of bars when the bars are parallel and base-aligned, and only chart A has these attributes.

We can generalize the considerations in effectiveness to a much broader range of situations, including ones in which human perceptual judgments play little or no role. In many situations where computational representations are used, crucial operations in the representation domain are carried out automatically, and the relevant considerations in evaluating the representational system center on the cost of carrying out the operations by machine. The value of the representational system may hinge on how rapidly the needed operations can be carried out, and on whether adequate memory is available, on a machine with an acceptable price.

Cost of creation, maintenance, and use.

Another kind of generalization considers costs beyond those incurred in using a representational system to answer some particular question. A system that provides fine answers at low cost will not be useful if it is very expensive to create, and if this first cost cannot be amortized over enough usage. Thus in many situations the cost of creating a representational system for a given problem or family of problems will be an important consideration, motivating improvements in the technology of creating representational systems, such as programming languages that can more easily be applied in connection with various target domains.
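The amortization argument comes down to simple arithmetic. The sketch below assumes a deliberately crude model: a one-time creation cost, and a constant cost per question with and without the representational system. The function name break_even_uses and the figures in the example are hypothetical.

```python
import math


def break_even_uses(creation_cost, direct_cost_per_use, represented_cost_per_use):
    """Smallest number of uses at which building the system is strictly cheaper.

    Building pays off once
        creation_cost + n * represented_cost_per_use < n * direct_cost_per_use.
    Returns None if the system saves nothing per use, so it never pays off.
    """
    saving_per_use = direct_cost_per_use - represented_cost_per_use
    if saving_per_use <= 0:
        return None
    # Smallest integer n with n > creation_cost / saving_per_use.
    return math.floor(creation_cost / saving_per_use) + 1


# Hypothetical figures: 100 units to build, 5 per question by hand, 1 with the system.
print(break_even_uses(100, 5, 1))
```

A real accounting would also have to include maintenance, and the per-use costs would rarely be constant, so this is only the skeleton of the calculation.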

Another practical problem with computational representations is maintenance, the work required to modify a representational system so that it continues to support a shifting portfolio of work in a target domain that also shifts over time. Famously, the cost of maintenance exceeds the cost of initial creation for many computational representational systems.

Cognitive dimensions analysis, as developed by Green, Petre, Blackwell, and others (see http://www.cl.cam.ac.uk/~afb21/CognitiveDimensions/), provides useful insight into the aspects of representations that determine costs of creation and maintenance, and I'll return to this subject below, when discussing the role of programming languages in representational systems.

Approximation and cost.

In the introductory post I said that the results obtained by moving work into the representation domain have to agree with those that would have been obtained by staying in the target domain. That's not quite true. The results only have to be "good enough", or "close enough", or "approximately correct". But what do these criteria really mean?

The crux is that the cost of the path to an ultimate result in the target domain that detours through the representation domain has to be less than that of the cheapest available path that stays in the target domain. The results obtained from the work in the representation domain are "good enough" just when the incremental costs, of all kinds, associated with any difference between these results and those that would have been obtained by staying in the target domain, are less than the savings realized by moving work into the representation domain.

For example, suppose in the situation with the log and the chasm that we aren't able to measure the log and chasm with complete accuracy (as will always be the case). Using measurement, and moving the work of comparing the log and chasm into the representation domain of numbers, is nevertheless worthwhile, as long as the inaccuracy doesn't much increase the probability of an expensive failure when we act on the results of the measurement process, in placing or not placing the log across the chasm. How much error is acceptable? That depends on the many practicalities of the situation: Are there other logs available if we drop one into the chasm? How soon do we need the bridge? There's the further consideration that we can often tell whether measurement error is likely to matter when we see the results: if the log measures much longer than the chasm, we don't worry about small errors.
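To make the log-and-chasm trade-off concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: that each measurement carries independent Gaussian error with the same standard deviation sigma (which must be positive), that a failed attempt has a single known cost, and that declining to place the log has a single known cost. The function names are hypothetical.

```python
from statistics import NormalDist


def prob_log_too_short(measured_log, measured_chasm, sigma):
    """Probability that the true log is shorter than the true chasm.

    Assumes independent Gaussian error with standard deviation sigma > 0 on
    each measurement, so the error in the difference has standard deviation
    sigma * sqrt(2).
    """
    margin = measured_log - measured_chasm
    return NormalDist(mu=margin, sigma=sigma * 2 ** 0.5).cdf(0.0)


def should_place_log(measured_log, measured_chasm, sigma,
                     cost_of_failure, cost_of_not_placing):
    """Place the log only if the expected loss from failure is the smaller cost."""
    p_fail = prob_log_too_short(measured_log, measured_chasm, sigma)
    return p_fail * cost_of_failure < cost_of_not_placing


# A log that measures much longer than the chasm: small errors don't matter.
print(should_place_log(10.0, 5.0, 0.1, cost_of_failure=100, cost_of_not_placing=1))

# A log that measures the same as the chasm: failure is a coin flip.
print(should_place_log(5.0, 5.0, 0.1, cost_of_failure=100, cost_of_not_placing=1))
```

The two cost parameters are where the practicalities enter: a spare log lowers the cost of failure, and an urgent need for the bridge raises the cost of not placing the log.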

These multiple practicalities suggest that we can't expect too much from a purely formal treatment of the "correctness" of a representational system. There are two routes from an input to a result: the mapping that stays in the target domain, and the mapping that goes from the input into the representation domain, over to a result in the representation domain, and then back into the target domain. It's clearly much too strong a condition that these two routes strictly commute, in the parlance of category theory, though category theory may provide some useful insight. I'll return to category theory in a later post.
