Measuring things seems quite straightforward. Most of us already learned in primary school how to measure how far apart two points are on a piece of paper: draw a line between them, and compare its length with the length of your ruler. After checking how many marks on the ruler your line covers, you can express the outcome of your measurement numerically. You may now see that the points are 10 centimeters or 4 inches apart, but knowledge of such units is hardly necessary for your purposes. You could equally settle on an outcome like ‘10 ruler marks with equal intervals between them’, leaving it to others to calibrate their own rulers against yours, in case they are interested in further comparisons. ‘What’, I expect you to think now, ‘is epistemologically interesting about such an uncomplicated activity? Epistemology is usually conceived of as dealing with the nature and justification of knowledge, not with the comparison between marks on a ruler and distances on a sheet of paper!’

Before I try to convince you that your healthy skepticism is not entirely justified, let us have a look at exactly what we were doing when measuring distances in primary school. As noted above, we were comparing something—namely, the space between two points and our ruler with its marks. Naturally, things can be compared for various purposes, and on the basis of various attributes. In our case, we tried to find out how far apart those two points on a piece of paper were, using the marks on the ruler. We might have done this to complete an exercise on a math test; to improve the symmetry of our fine art sketches; or out of a terribly human sense of curiosity. Whichever aim we had in mind, we were using our ruler to infer the length of a stretch of paper. Measurement is not a matter of comparison simpliciter, but an inferential, knowledge-oriented activity. We compare objects in order to gain knowledge about some of their attributes—because the latter are relevant for reaching some cognitive or practical aim.

Epistemologists of measurement try to understand under which conditions such inferences are successful—i.e. they investigate when certain measurements license certain knowledge claims. This might seem straightforward when applied to our primary school example, yet it becomes more complicated when we want to measure such esoteric things as gravitational waves or human well-being. Indeed, even simple distance measurements turn out to be trickier than our toy example suggests. We just have to ask ourselves: are we really always willing to trust the distances measured with rulers other than our own? Surely—you may say—I can assume that your outcome will be consistent with mine as long as we compare our rulers before using them! However, what if you take your ruler to a location 20°C hotter than where we first compared them? Now you also have to consider plenty of empirical details regarding the thermal expansion of the ruler’s metal before trusting your inferences! Of course, this is still a rather harmless case of distance measurement. Things get more worrisome if you want to measure the routes that humans actually travel. With the help of GPS satellites, many of us infer the distance between ourselves and other points on the earth’s surface. For these inferences to offer reliable travel advice, we need mathematical models of the earth’s topography and must account for the effects of time dilation and gravitational redshift on the satellite clocks. If engineers trusted the satellite clocks too hastily, GPS coordinates would quickly stop matching the terrestrial topography, and your navigation system might accidentally tell you to drive off a cliff!
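
To get a feel for the magnitudes involved, here is a minimal back-of-the-envelope sketch in Python; the constants (a steel ruler’s expansion coefficient, the commonly quoted net relativistic drift of GPS satellite clocks) are approximate textbook values, used purely for illustration:

```python
# Back-of-the-envelope magnitudes for the two error sources above.
# All constants are approximate textbook values, not authoritative data.

ALPHA_STEEL = 1.2e-5        # thermal expansion coefficient of steel, per deg C
RULER_LENGTH = 0.30         # metres
DELTA_T = 20.0              # how much hotter the new location is, in deg C

expansion = ALPHA_STEEL * RULER_LENGTH * DELTA_T
print(f"A {RULER_LENGTH:.2f} m steel ruler grows by {expansion * 1e3:.3f} mm "
      f"when it is {DELTA_T:.0f} deg C hotter.")

# GPS: satellite clocks run fast by roughly 38 microseconds per day once
# gravitational blueshift (~ +45 us/day) and velocity time dilation
# (~ -7 us/day) are combined.
C = 299_792_458             # speed of light, m/s
CLOCK_DRIFT_PER_DAY = 38e-6 # seconds of drift per day (approximate)

ranging_error_per_day = C * CLOCK_DRIFT_PER_DAY
print(f"Left uncorrected, that drift adds ~{ranging_error_per_day / 1e3:.1f} km "
      "of ranging error per day.")
```

The ruler error is tiny but real; the uncorrected clock drift, by contrast, would render satellite navigation useless within hours.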

Given all the different kinds of measurement procedures used in and outside of science—as well as the diverse attributes they are measurements of—how can we find general criteria for their success? A lot would be gained here if we agreed on what measurements are supposed to be successful at. In one sense, this is a deeply contextual question. As discussed above, the external purposes of measurements can vary widely: from passing an examination to determining the laws of planetary motion. In a narrower sense, however, the success of measurements might be defined relative to the specific attribute of interest. Let us see, then, if we can cash out some success criteria by looking at the way measurement inferences relate to their target attributes.

One way of approaching this issue is to construe measurement as a matter of correspondence between the structure of our symbolic conventions and the structure of our attributes of interest. Measurement becomes a kind of mapping, in which the results of our different inferences, taken together, form a neat representation of the behavior of the attribute under consideration. Such a view is in line with many established positions in formal semantics and the philosophy of language. Moreover, correspondence as an account of truth has always enjoyed strong intuitive appeal among philosophers. It seems natural to say that a measurement inference is successful in virtue of the similarity between the structure of our symbolic measurement system (i.e. the marks on our ruler and the intervals between them) and the structure of the attribute we want to acquire knowledge about (i.e. the distances on the paper). This position is compatible with many different philosophical commitments. An empiricist will be happy to endorse it if she takes it to mean that our measurement conventions approximate structures apparent in human experience. Indeed, attributes like length or temperature might be taken to denote certain regularities that hold across the perceptual features of spatial and thermal phenomena. A realist, in turn, can adopt correspondence talk by ascribing the structures captured by our measurement conventions to mind-independent features of reality. Instead of empirical representation, the realist sees measurement conventions as literal descriptions of the metaphysical structure of particular bits of reality. We could even succeed in convincing some platonists of a correspondence view of measurement—although the reasoning would be slightly more technical. Some philosophers and mathematicians have noted that the ratios between different magnitudes inferred through an internally consistent measurement procedure mirror the logical relations holding between real numbers. Frege and Russell, among others, famously made this observation in their seminal attempts at defining basic mathematics in terms of symbolic logic. The platonist could thus join the realist in believing measurement inferences to be aimed at literal descriptions of mind-independent relations; she need only disagree that these relations occur in the physical world, holding instead that they form part of what she believes is the world’s fundamental mathematical structure.

The epistemological utility of correspondence considerations stems from their shared focus on structure. Indeed, we do not have to accept any of the semantic or metaphysical interpretations offered by the realist, empiricist, or platonist to appreciate the value of such structural thinking. There are many ways in which we can usefully differentiate between the structures we find in various kinds of measurement inferences. Going back to our primary school case of distance measurement—we would all intuitively agree on some features characterizing inferred knowledge about spatial distance:

i. There is a clear case in which the distance can be said to be 0 (i.e. the distance from a point to itself).

ii. We can vary the precision of our standards (nearly) infinitely according to our needs (i.e. setting practical, relativistic, and microphysical constraints aside, we can construct rulers of arbitrary sizes).

iii. We know exactly how much two different lines on a piece of paper differ in their length.

None of these three features, however, holds when we look at other cases—for example, when we measure the hardness of physical substances. We can directly compare the hardness of two or more materials, but we can neither speak of an objective minimum of hardness nor say precisely how much two objects differ in hardness. The differences between hardness and distance measurement can be ascribed to differences in the kind of relations we know to exist between magnitudes of hardness and magnitudes of spatial distance, respectively. In other words, they pertain to the known structural features of the target attributes of our measurements. Based on such structural differences, we can neatly distinguish between various kinds of measurement inferences, which license knowledge claims of varying informativeness. In a more widespread idiom, these levels of structural complexity are referred to as different types of measurement scales:

Scale    | Structural features
nominal  | Assignment of (numerical) labels without implying any relational structure.
ordinal  | Assignment of a rank ordering.
interval | Assignment of a rank ordering on a scale whose intervals between elements are numerically comparable.
ratio    | Assignment of a rank ordering on an interval scale with a true minimum, such that outcomes can be represented by non-negative numbers whose ratios also reflect physical relationships.

Fig. 1: Simplified reconstruction of a famous scale typology by Stanley Smith Stevens (1946, 678).
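
To make the typology concrete, here is a minimal sketch in Python (my own toy encoding, not part of Stevens’ formal apparatus) of which kinds of claims each scale type licenses:

```python
# A toy encoding of Stevens' scale types: each scale licenses all the
# comparisons of the scales above it in the table, plus one more.

SCALE_OPERATIONS = {
    "nominal":  {"equality"},
    "ordinal":  {"equality", "ordering"},
    "interval": {"equality", "ordering", "differences"},
    "ratio":    {"equality", "ordering", "differences", "ratios"},
}

def licensed(scale: str, operation: str) -> bool:
    """Return True if a claim of the given kind is meaningful on this scale."""
    return operation in SCALE_OPERATIONS[scale]

# Hardness is (roughly) ordinal: we can rank minerals, but 'twice as hard'
# is undefined. Distance is ratio-scaled: 'twice as far' is meaningful.
print(licensed("ordinal", "ordering"))  # True  ('quartz is harder than talc')
print(licensed("ordinal", "ratios"))    # False ('twice as hard' is undefined)
print(licensed("ratio", "ratios"))      # True  ('twice as far')
```

The point of the encoding is merely structural: each row of the table inherits the comparisons licensed by the rows above it and adds one of its own.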

Many philosophers and scientists have taken such structural considerations to exhaust what is philosophically interesting about measurement. The table above lists structures that characterize the knowledge gained through measurement procedures of varying complexity. Does this, however, answer our epistemological questions about the justification of measurement inferences? Remember—we were originally concerned with the conditions under which knowledge claims based on measurement procedures can be judged successful, appealing to the difficulties of trusting our results in changing contexts. In measurement practice, however, a structural typology of measurement scales can only help us in a limited range of cases—namely, in cases where we have antecedent knowledge about the kind of structure our target attribute has. For example, ever since the first balances were invented, humans have assumed that weight can have an absolute zero value, that there can be extremely fine-grained differences in weight, and that it is theoretically possible to say how much two objects differ in their weight. Consequently, our scale typology could have provided some antecedent guidance on the kind of measurement procedure we want to establish—namely, one that produces outcomes that can be mapped onto a ratio scale.

If we think about the worries that initially motivated an epistemological treatment of measurement, however, abstract scale typologies are of very limited help! To illustrate this point, think back to our initial thought experiments. We were concerned with whether we can trust initial attempts at measuring attributes, or attempts to extend existing measurement procedures into hitherto inaccessible domains. In other words, we wanted to know whether we can trust inferences based on a novel kind of instrumental indicator or in a novel kind of context. Such trust presupposes the reliability of a postulated or extrapolated empirical law connecting our indicator to our target attribute. Whether we want to find out if we can infer temperature below the freezing point of ordinary mercury thermometers, or whether we can even attempt to quantify well-being—we are ignorant of such precise empirical laws. In fact, scientists want to employ measurements precisely to find out whether temperature is a well-behaved quantity outside the domain of reliability of our available thermoscopic substances, whether well-being exhibits any law-like connections to a certain indicator, and so forth. Here, we become aware of a peculiar justificatory circularity affecting scientific measurement, which the philosopher Hasok Chang famously dubbed the “problem of nomic measurement”: we confirm our empirical knowledge claims by comparing them to the outcomes of reliable measurements, but we base our measurement inferences on reliable knowledge of empirical laws! It is in these circular situations that we need epistemological guidance—situations that appear surprisingly often at the frontiers of science! Here, pointing to typologies such as the one above seems somewhat idle, since abstract scale types do not tell us about the empirical laws that might help us realize them for a particular attribute. We do not want to know onto what type of scale our outcomes might map one day, but how we can increase their justificatory import. An epistemology of measurement, in other words, should not merely tell us something about the structure of settled knowledge—it should elucidate how we can go about justifying measurement inferences!

It is no wonder that many of the most fruitful answers to this question come not from professional philosophers, but from practicing scientists. One attempt to ensure the success of measurement inferences without antecedent access to the structure of their target attributes is to bite the bullet and suspend any talk about target attributes altogether. This is the strategy prescribed by operationalism, a position popularized by the experimental physicist Percy Bridgman. The reliability of any measurement inference, on this view, is restricted to those circumstances in which we employ identical, or fully interchangeable, measurement operations. Strictly speaking, measurement is not a matter of inference at all; rather, our operations define what we mean by a measurable concept and where we can use it. In other words, ‘distance’ is what the rod measures. Operationalists insist that, prior to meticulously studying the behavior of measuring rods beyond the earth’s surface, there is no meaningful way in which the celestial distances inferred from telescope observations can be compared with ordinary measurements of length. This view, however, quickly runs into serious difficulties. The only way an operationalist will allow us to extend the domain of our inferences is through extremely fine-grained extrapolation—requiring us constantly to check whether our previous and known operations are fully substitutable by new ones. Such a strategy vastly underestimates the importance of theoretical conjecture, as becomes apparent when we consider indirect forms of measurement. In many cases, we do not motivate our inferences by comparing two instances of the same attribute (e.g. a distance on a paper and a distance on a ruler), but through the theoretical links we suspect to exist between our target attribute and another, more accessible, attribute. Familiar examples include inferring temperature from the expansion rate of mercury; inferring intelligence from performance on certain questionnaires; and inferring the strength of gravitational attraction from the motion of pendulums. Only by acting on our theoretical suspicion that there is some attribute that our observable indicator is measuring can we start constructing measurement procedures!

A more sophisticated take on the justification of measurement inferences was proposed by Ernst Mach, another experimental physicist. Mach has often been labeled a positivist—but the typical connotations of that term obscure his key contribution to the epistemology of measurement. To better distinguish the view under consideration, we will call it economic conventionalism. For Mach, measurements first and foremost ought to simplify the acquisition of knowledge by usefully coordinating theory and experiment. This facilitates what he famously referred to as the cognitive “economy of science”. When engaging in measurement inferences, on his view, we commit ourselves to certain coordinative principles. What does this mean? Remember that the operationalists would require us to exhaustively study the changes in the behavior of our measuring rods before we can compare geographical distances with distances inferred from telescope observations. For an economic conventionalist, however, this is not the only way to extend the domain of our measurement inferences, but one among many. By applying an operationalist strategy, we commit to a coordinative principle P1—according to which every distance must be a multiple or fraction of the intervals between the marks of some measuring rod(s) at the earth’s surface. Alternatively, we could adopt the principle P2—stating that every distance is a multiple or fraction of our measuring rod—without any qualification as to where we use it. We could yet commit to the principle P3, asserting that every distance is a multiple or fraction of the wavelength of light—and so forth. These principles take a more or less theoretical notion (e.g. distance) and coordinate it with some observable operation that serves as its basic unit. While P1 might be the most secure, it will be far simpler for physicists to adopt P2 or P3. Isaac Newton, for example, inferred his universal laws of motion based on a principle closer to P2—thus opening the way for 200 years of successful physics. Albert Einstein would later find out that there are, in fact, shortcomings in extending simple distance measurement indefinitely across all possible objects and contexts. An economic conventionalist, however, will say that we learned about these shortcomings precisely because of our initially broad conjecture. The fact that we eventually adjusted our principle provides her with yet another reason to embrace simplicity, for it shows that broad conjectures are perfectly capable of self-correction!

To be sure, even the economic conventionalist will grant that the choice between principles is constrained—for we could come up with various candidates that would, for empirical reasons, complicate our inferences. For example, I could propose an outright nonsensical principle such as ‘Every distance is a multiple or fraction of the length of the yearly average temperature in New Delhi.’ Evidently, the temperature in New Delhi does not have a ‘length’. I could also, however, come up with a principle that is intelligible but useless—for example, by employing the distance between me and the northern Portuguese village of Lanheses as a coordinative standard, defining all other distances as fractions or multiples of it. This would lead to endless coordination problems, as my location changes every hour or so. In both mock cases, our principle would not simplify but complicate experimental knowledge acquisition—in the latter case, anyone interested in measuring anything would constantly need to investigate my location.

Nonetheless, we may still argue that economic conventionalism can hardly be the whole story when it comes to how we justify measurement inferences. For what exactly does it mean to simplify experimental research? What seems to lurk in the background of Mach’s epistemology is a notion of progress. Whether a measurement procedure is effective at simplifying inquiry depends, on Mach’s “economic” account, on how it advances scientific knowledge. To clarify this notion, we must, once again, qualify what our research is supposed to be successful at.

We initially agreed to limit discussions about successful measurements to specific target attributes, leaving aside the external purposes for which we might want to measure those attributes. Could we say that Mach’s view, if sufficiently explicated, defines as successful those measurement inferences of an attribute which continue to get better over time? Such a view has been advocated by Hasok Chang, who calls such processes of self-correction epistemic iteration. The idea is simple but powerful. We can best illustrate it by looking at any imperfect measurement situation—one in which we have different inferences available that are not mutually compatible. These correspond to conflicting operations or coordinative principles. Mercury and alcohol, for example, can both be used in thermometers, but they expand at different rates relative to the rise in temperature. Which of them should we rely on? The strategy of epistemic iteration advises us to take all those inferences to be justified that contribute to a process of iterative correction. In other words, we are making progress as long as we can employ different measurements for the sake of their mutual improvement. The process is iterative since our starting assumptions (i.e. all the knowledge that went into the construction of our imperfect procedures) are altered in the course of using them. Chang thinks that we can judge progressiveness by assessing the improvement of coherence between measurement inferences across measurement contexts. The aim of epistemic iteration is not coherence simpliciter, but the improvement of coherence. While Chang’s conception of coherence may be applied to non-quantitative knowledge too, inferential coherence in measurement amounts to numerical convergence between measurement outcomes.
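
What might ‘improving coherence’ look like numerically? The following toy simulation is my own construction, not Chang’s historical case: the two instrument response curves and the quadratic recalibration step are invented purely for illustration. Two idealized thermometers that agree at their fixed points but disagree in between are repeatedly recalibrated against their shared best estimate, and we track whether their outcomes converge:

```python
# A toy model of epistemic iteration: two idealized 'thermometers' that
# disagree between their fixed points are recalibrated against their shared
# best estimate, and we track the convergence of their outcomes.
import numpy as np

true_temp = np.linspace(0.0, 100.0, 21)   # the unknown 'real' temperatures

# Two instruments with slightly different nonlinear responses; both agree
# at the fixed points 0 and 100 by construction.
def mercury(t):
    return t + 0.0004 * t * (100.0 - t)

def alcohol(t):
    return t - 0.0009 * t * (100.0 - t)

readings = [mercury(true_temp), alcohol(true_temp)]
estimates = [r.copy() for r in readings]  # naive initial calibration

print(f"initial max disagreement: "
      f"{np.max(np.abs(readings[0] - readings[1])):.3f} deg")

for step in range(3):
    consensus = np.mean(estimates, axis=0)  # shared best estimate
    # Recalibrate each instrument: fit a quadratic correction from its raw
    # readings to the current consensus, then re-derive its outcomes.
    estimates = [np.polyval(np.polyfit(r, consensus, 2), r) for r in readings]
    spread = np.max(np.abs(estimates[0] - estimates[1]))
    print(f"iteration {step + 1}: max disagreement = {spread:.6f} deg")

# Note: convergence here tracks coherence, not truth. The corrected
# instruments approach their shared estimate, not necessarily true_temp.
```

The closing comment matters philosophically: the simulation shows outcomes converging on one another, which is exactly Chang’s criterion, while remaining silent on whether they converge on the truth.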

Epistemic iteration seems to offer a more realistic justificatory strategy than operationalism, as well as a clearer idea of progress than economic conventionalism. Should we, then, end our philosophical survey here? We are not quite there yet. All of our discussions so far assumed that there is always a clear direction of progress, at the end of which lie ever more coherent knowledge claims about the target attribute of our measurement inferences. Measurement, on such a view, aims to provide us with the most coherent knowledge about a very particular epistemic target that we all agree upon. Now consider the following situations:

  1. We want to measure human well-being. Our measurements are important for improving government policy and for grounding further studies that explain variations across regions and countries. We can either use (i) a one-time questionnaire based on the direct self-assessment of participants or (ii) a series of questionnaires sent out at multiple points during the year. We know that option (ii) is more likely to give us coherent values, as it minimizes the impact of daily mood swings among participants. However, option (i) represents variations among regions well enough that we might isolate structural determinants of well-being for further research and prioritize certain regions for policy improvements. Option (ii) is more coherent, while option (i) promises to be much more effective for our external purposes.

  2. Eighteenth- and nineteenth-century physicists tried to measure the ellipticity of the earth (i.e. how far it diverges from a sphere due to the centrifugal effect of its rotation), to better understand phenomena such as the terrestrial gravity field or density variations across geological strata. Knowing the shape of the earth precisely makes astronomical navigation more reliable and increases the accuracy of maps. These physicists had two different measurement procedures, inferring ellipticity either from variations in the length of meridian arcs with latitude or from variations in surface gravity with latitude (a first-order sketch of both routes follows below). The outcomes of the two procedures turned out to be inconsistent. However, by studying the inconsistent data, researchers discovered geological anomalies, the subsequent study of which helped them understand the physical relation between the earth’s crust and mantle. For a considerable amount of time, they did not improve the coherence between their two inferences—yet they were nonetheless very successful in reaching some of their external aims.
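
For readers curious about how surface gravity could bear on the earth’s shape at all, here is a first-order sketch, assuming Clairaut’s classical approximation for a rotating body in hydrostatic equilibrium (with flattening f, equatorial gravity g_e, equatorial radius a, and rotation rate ω):

\[
g(\varphi) \approx g_e \left( 1 + \beta \sin^2 \varphi \right),
\qquad
\beta + f = \tfrac{5}{2}\, m,
\qquad
m = \frac{\omega^2 a}{g_e}.
\]

Pendulum surveys at different latitudes estimate β, which Clairaut’s relation converts into the flattening f; meridian arc surveys instead estimate f geometrically, from how the length of a degree of latitude varies toward the poles. In principle the two routes should agree; in practice, they did not.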

In cases 1 and 2, relying on incoherent measurement inferences proves valuable outside of the iterative process of self-correction. Such inferences can offer access to novel kinds of empirical data that serve our external aims and may even ground novel theoretical hypotheses. Remember that we had so far excluded the external aims of measurements from our discussions of inferential success, tying the latter to a particular target attribute. To account for our new examples, we need to cash out a success criterion that takes into account what one can do with a measurement procedure beyond its inherent aim! The criterion usually invoked by philosophers and scientists in this context is fruitfulness. Applying its usual definition, we could say that the fruitfulness of a measurement procedure is assessed on the basis of the novel empirical and theoretical insights it generates. Fruitfulness is a unique success criterion in that it is inherently dynamic, i.e. it pertains to the performance of epistemic activities over time. As case 1 nicely illustrates, fruitfulness can trade off against the demand for more coherence, making it necessary to stress its independent importance for scientific measurement. Epistemic iteration, consequently, cannot be the full story about measurement success. We do not only care about improving the coherence between measurements of predefined attributes but also value their fruitfulness beyond their ostensive target. In other words, measurements should not only coordinate knowledge claims as neatly as possible; they also have a dynamic function in generating novel insights.

If we take this dynamic function of measurements seriously, we should embrace a position we may call dynamic coherentism. Measurement inferences, for the dynamic coherentist, are successful if they are fruitful, coherent, or both. Consequently, we may justify our measurement inferences based on their contribution to progressive corrections, as well as on their ability to generate novel insights. One immediate worry you might have regarding this view is that ‘fruitfulness’ remains a notoriously ambiguous criterion to actually employ in scientific practice, while epistemic iteration offers a more elegant account of the justification of measurement inferences. Allow me, then, to return to our examples to illustrate how exactly the appeal to fruitfulness enriches the iterative model of progress. Incoherent measurement procedures of the same attribute will produce sets of outcomes that cannot be consistently fitted into a single empirical law describing the behavior of that target attribute. That is, we are unable to adjudicate between conflicting coordinative principles. Such a procedure can be fruitful in two different ways:

  • First, it may be coherent enough to let us learn about other epistemically interesting phenomena related to our target attribute. In such cases, measurements are of instrumental utility. Take, for example, the structural determinants of well-being in case 1. We might be unsure how exactly to operationalize ‘well-being’ beyond vague clusters of indicators, yet nonetheless find it important to investigate the factors that influence such vague clusters.
  • Second, systematic discrepancies between outcomes can themselves guide inquiry. As case 2 illustrates, scientists can learn a great deal from understanding under which circumstances, and to what degree, measurement outcomes conflict. In some instances, learning from conflicting results is itself an instance of epistemic iteration—but this is far from necessary. Epistemic iteration is achieved only where we can explain the cause of a discrepancy by appealing to available theoretical knowledge, which allows us to correct for sources of error. For example, we might have two metal rods that offer inconsistent results because one of them has been exposed to a high temperature and has thus expanded in length (as sketched below). Taking basic thermal physics into account, we may explain and correct the incoherence without altering our coordinative principle or our definition of length. In such cases, fruitfulness is a by-product of epistemic iteration, since it is fully accounted for by the gain in coherence across contexts. The challenging situation, however, occurs when inquiry into the discrepancies leads to discoveries without improving coherence. Such is the case in example 2, where the geological and geophysical discoveries were valuable in their own right, yet unable to restore the global coherence of the ellipticity measurements. Here, progress is not equivalent to self-correction, as the discrepancies opened up completely new lines of inquiry.
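
A minimal sketch of the rod case just mentioned, assuming simple linear thermal expansion (the coefficient is an approximate textbook value for steel):

```python
# Correcting the heated rod's readings with basic thermal physics: the hot
# rod's unit intervals have stretched, so it under-reports distances. The
# correction restores coherence without touching our definition of length.

ALPHA_STEEL = 1.2e-5   # thermal expansion coefficient, per deg C (approx.)

def correct_reading(reading: float, delta_t: float,
                    alpha: float = ALPHA_STEEL) -> float:
    """Rescale a hot rod's reading back to the reference temperature."""
    return reading * (1.0 + alpha * delta_t)

distance = 2.0                                       # metres, per the cold rod
hot_reading = distance / (1.0 + ALPHA_STEEL * 50.0)  # same span, rod 50 deg C hotter
print(f"raw disagreement:   {abs(distance - hot_reading) * 1e3:.3f} mm")
print(f"after correction:   "
      f"{abs(distance - correct_reading(hot_reading, 50.0)) * 1e3:.3f} mm")
```

The correction is applied to the readings, not to the coordinative principle: ‘distance’ still means what the reference rod measures.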

Finally, you might worry that fruitfulness opens the door to purely practical concerns, since novel forms of empirical data can be valuable for non-epistemic reasons. Indeed, some incoherent measurement procedures are highly useful practically, as indicated by the applications in cases 1 and 2. While not strictly an epistemological matter, philosophers of science have done much work on how non-epistemic concerns do affect—and should affect—scientific methodology. It is beyond the scope of this introduction to discuss how such concerns might influence the justification of measurement inferences—but it offers a fruitful topic for epistemologists of measurement to explore in the future. Moving forward, we must clarify which non-epistemic values we deem worthy of consideration in the justification of measurement procedures, and how trade-offs between practical fruitfulness, epistemic fruitfulness, and coherence can be settled. We may try to dispense with such concerns by appealing to the value-freedom of scientific measurement—yet the success of such appeals is far from guaranteed.

Miguel Ohnesorge

Further reading:

Formal measurement theory

  1. Joel Michell, ‘The Logic of Measurement: A Realist Overview’, Measurement 38 (2005): 285–94.
  2. David H. Krantz et al., Foundations of Measurement, Volume I: Additive and Polynomial Representations (Mineola, NY: Dover Publications, 2006).
  3. Joel Michell, ‘The Representational Theory of Measurement’, in Measurement in Economics: A Handbook, ed. Marcel Boumans (Amsterdam: Elsevier, 2007), 19–40.
  4. Marcel Boumans, ‘Suppes’s Outlines of an Empirical Measurement Theory’, Journal of Economic Methodology (2016).

Operationalism

  1. Percy W. Bridgman, The Logic of Modern Physics (New York: Beaufort Books, 1927).
  2. Percy W. Bridgman, ‘Operational Analysis’, Philosophy of Science 5 (1938): 114–31.
  3. Uljana Feest, ‘Operationism in Psychology: What the Debate Is About, What the Debate Should Be About’, Journal of the History of the Behavioral Sciences 41, no. 2 (2005): 131–49.
  4. Hasok Chang, ‘Operationalism’, The Stanford Encyclopedia of Philosophy (Winter 2019 Edition), ed. Edward N. Zalta, https://plato.stanford.edu/archives/win2019/entries/operationalism/

Ernst Mach’s conventionalism

  1. Ernst Mach, Principles of the Theory of Heat: Historically and Critically Elucidated, Vienna Circle Collection, vol. 17 (Dordrecht: Springer, 1986).
  2. Paul K. Feyerabend, ‘Mach’s Theory of Research and Its Relation to Einstein’, Studies in History and Philosophy of Science Part A 15, no. 1 (1984): 1–22.
  3. E. C. Banks, ‘The Philosophical Roots of Ernst Mach’s Economy of Thought’, Synthese 139, no. 1 (2004): 23–53.

Epistemic iteration

  1. Hasok Chang, Inventing Temperature: Measurement and Scientific Progress (New York: Oxford University Press, 2004).
  2. Hasok Chang, ‘Scientific Progress: Beyond Foundationalism and Coherentism’, Royal Institute of Philosophy Supplement 61 (2007): 1–20.

Dynamic coherentism

  1. Miguel Ohnesorge, ‘How Incoherent Measurement Succeeds: Coordination and Success in the Measurement of the Earth’s Polar Flattening’, Studies in History and Philosophy of Science Part A (forthcoming; draft available on request).