This week, Brian reports on a successful launch to production and thinks about the “colours of bits”, applied not to a financial framing but from the perspective of academic paradigms.[1]
The Pilot Soil Monitoring and Incentives Program launched last week and the code that FAIMS3 is working on is powering the app being used to collect well-integrated geospatial and lab data!
In development news, we’re starting to work on updating Capacitor to version 3 for iOS support and planning Q2 development.
On the Colour of Bits, Paranoia, and implications for data collection paradigms:
From “What colour are your bits?”:
In intellectual property and some other fields we're very interested in information, data, artistic works, a whole lot of things that I'll summarize with the term "bits". Bits are all the things you can (at least in principle) represent with binary ones and zeroes. And very much of intellectual property law comes down to rules regarding intangible attributes of bits – Who created the bits? Where did they come from? Where are they going? Are they copies of other bits? Those questions are perhaps answerable by "metadata", but metadata suggests to me additional bits attached to the bits in question, and I'd like to emphasize that I'm talking here about something that is not properly captured by bits at all and actually cannot be, ever. Let's call it "Colour", because it turns out to behave a lot like the colour-coded security clearances of the Paranoia universe. [emphasis mine]
From Lakatos on research programmes:
Newtonian science, for instance, is not simply a set of four conjectures – the three laws of mechanics and the law of gravitation. These four laws constitute only the ‘hard core’ of the Newtonian programme. But this hard core is tenaciously protected from refutation by a vast ‘protective belt’ of auxiliary hypotheses. And, even more importantly, the research programme also has a ‘heuristic’, that is, a powerful problem-solving machinery, which, with the help of sophisticated mathematical techniques, digests anomalies and even turns them into positive evidence.
Sitting somewhere between a programme’s heuristic and its hard core are the allowed academic colours of bits: a fundamental understanding of what digitally collected data can and should be able to measure/record/document.[3] We’ve been down this road before as we negotiate the names-of-things and the organisation-of-things; this feeling is a very familiar one, where the nature of what-should-be is discussed.
Colours in fieldwork
To be clear, my perspective on the colour of bits of astronomers comes from a very very very outsider’s perspective — all I have seen (besides an undergrad astronomy unit at RIT two decades ago) are the expressed priorities in design of a scientific tool by our colleagues at AAO. From my digital humanities perspective, the ‘colour’ of bits in astronomy — their coded-in design assumptions about “what makes a valid record” — tends much more towards the colour-blind CS implementation: a bit is a bit, fungible, and therefore all bits must be held to the same standards of rigour: why would you ever want to commit iffy bits to the database so other people could read them? This design-space of ‘yes, of course you want to share only the highest quality data…’ neglects the pragmatic heuristics of various disciplines I’ve had to deal with.
Unfortunately, many colours of bits exist in fieldwork disciplines, for many reasons:
Technical literacy and the pragmatic realities of what constitutes academic research over in Arts (and the social sciences)
Temporal spread of the record (sometimes different people at hugely different times need to contribute to a record before it’s “complete”)
Literal, physical, uncertainty — some things are unknowable from the evidence at hand, but that shouldn’t prevent creating what data we can know
Granularity — sometimes high data quality isn’t desirable. It’s either unaffordable, not helpful to the goals of the person in charge, not a known problem (we’re trying to fix this last bit), or a judgement call about a rushed field season and a very long data cleaning season
Divergence of theory and practice.
While I am firmly in camp reproducible-data, it turns out that very few of my colleagues are, at least in their expressed preferences for how they want field data collection notebooks created. Aspects of method, discipline, provenance, annotations and certainty, someone’s philosophy of data, and (frankly) what they think a spreadsheet should be used for all colour a bit. We can claim that all of these are meta- or para-data, but at a certain point it becomes turtles all the way down.
A decade ago, when I was designing FAIMS 1, I included provision for annotation and uncertainty as part of each field, to allow for a digital simulacrum of scribbling a question mark in the margins of the paper form. It turns out that this was both insufficient and too much. A subset of our users would write the ? directly into the text fields where they were recording their data. One especially dedicated user wrote a small essay in an identifier explaining why they made various decisions.[4] At the same time, the generalised exporter was thought of as a monolithic exporter and, despite 10 years of custom exporters, no one asked me for the ability to toggle the presence or absence of metadata, or to break it out onto different rows. But all of these examples are bits.
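For the curious, here is roughly what that never-requested toggle could look like. This is a minimal sketch and not the FAIMS exporter: the function name `export_records`, the record layout, and the three mode names (`omit`, `inline`, `rows`) are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical metadata modes; none of these names come from FAIMS.
METADATA_MODES = ("omit", "inline", "rows")


def export_records(records, metadata_mode="inline"):
    """Return CSV text for `records`, placing per-field metadata per `metadata_mode`."""
    if metadata_mode not in METADATA_MODES:
        raise ValueError(f"unknown metadata_mode: {metadata_mode!r}")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    headers = {
        "omit": ["record", "field", "value"],
        "inline": ["record", "field", "value", "annotation", "uncertain"],
        "rows": ["record", "field", "kind", "value"],
    }
    writer.writerow(headers[metadata_mode])

    for record in records:
        for name, field in record["fields"].items():
            value = field["value"]
            annotation = field.get("annotation", "")
            uncertain = bool(field.get("uncertain", False))
            if metadata_mode == "omit":
                # Data only: annotations and uncertainty are dropped.
                writer.writerow([record["id"], name, value])
            elif metadata_mode == "inline":
                # Metadata sits in extra columns beside the value.
                writer.writerow(
                    [record["id"], name, value, annotation, str(uncertain).lower()]
                )
            else:
                # "rows": metadata is broken out onto its own rows.
                writer.writerow([record["id"], name, "value", value])
                if annotation:
                    writer.writerow([record["id"], name, "annotation", annotation])
                if uncertain:
                    writer.writerow([record["id"], name, "uncertain", "true"])
    return out.getvalue()


# Made-up example record, purely for illustration.
example = [
    {
        "id": "site-1",
        "fields": {
            "depth_cm": {"value": "12", "annotation": "after rain", "uncertain": True}
        },
    }
]
print(export_records(example, metadata_mode="rows"))
```

Whichever mode is chosen, the annotation and the flag remain ordinary bits in the output; the toggle only changes where they sit.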
What isn’t describable in “bits”, and what I didn’t anticipate, were the strongly varied operationalisations of what “uncertainty” meant. Is it “physical uncertainty to the limits of measurement,” “personal uncertainty due to degradation in the thing being observed,” or “Help! I need an adult supervisor!”? When I designed uncertainty as a sliding scale between 0 and 1, I was thinking quite strongly of Bayesian reasoning, being influenced by HPMOR and the folks at Less Wrong. (I thought that the ability to document one’s epistemic state while collecting data would be useful.) The problem is that, having compelled uncertainty into a decimal value, I’m not sure anyone actually used it the way I designed it. I was trying to take a colour-of-bits problem and stuff it into bits.
In FAIMS3, we’ve turned uncertainty into a checkbox and allowed for a custom label — which is certainly one way of doing things. It allows for people to check, for example: “data taken with handheld GPS.” And while this uncertainty (meta)data is certainly useful — it’s even farther away from documenting the colour of bits than we were when I started.
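To make the contrast concrete, here is a minimal sketch of the two shapes uncertainty has taken. The class names `SliderUncertainty` and `FlagUncertainty` are hypothetical and do not come from either codebase; only the 0-to-1 range and the checkbox-with-custom-label idea come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SliderUncertainty:
    """Decimal-style uncertainty: one number between 0 and 1 (hypothetical model)."""

    value: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("uncertainty must be between 0 and 1")


@dataclass(frozen=True)
class FlagUncertainty:
    """Checkbox-style uncertainty: a flag whose meaning is a project-defined label."""

    label: str
    checked: bool = False

    def describe(self) -> Optional[str]:
        # Only a ticked box says anything about the record.
        return self.label if self.checked else None


# Three different meanings of "uncertain" all collapse to the same stored bits.
slider_readings = {
    "limit of measurement": SliderUncertainty(0.5),
    "degraded object": SliderUncertainty(0.5),
    "needs a supervisor": SliderUncertainty(0.5),
}
assert len(set(slider_readings.values())) == 1

gps = FlagUncertainty(label="data taken with handheld GPS", checked=True)
print(gps.describe())
```

The slider stores the same 0.5 for all three situations, so the reason for the doubt is not in the data. The checkbox names one reason chosen in advance, which is easier to compute on but still not the colour.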
(Meta)data doesn’t capture the evaluative accent
Every ideological sign – the verbal sign included – in coming about through the process of social intercourse, is defined by the social purview of the given time period and the given social group. So far, we have been speaking about the form of the sign as shaped by the forms of social interaction. Now we shall deal with its other aspect – the content of the sign and the evaluative accentuation that accompanies all content.
What determines this circle of items endowed with value accents?
In order for any item, from whatever domain of reality it may come, to enter the social purview of the group and elicit ideological semiotic reaction, it must be associated with the vital socioeconomic prerequisites of the particular group’s existence; it must somehow, even if only obliquely, make contact with the bases of the group’s material life.
Individual choice under these circumstances, of course, can have no meaning at all. The sign is a creation between individuals, a creation within a social milieu. Therefore the item in question must first acquire inter-individual significance, and only then can it become an object for sign formation. In other words, only that which has acquired social value can enter the world of ideology, take shape, and establish itself there.
Bits only have meaning as sign-arrangements. Literal, computational, bit-arrangement and parsing misses the point of bits having semiotic weight. Arts scholars, in my personal experience, have an Aristotelian evaluative accent — the colour of a bit is informed by its academic pedigree, not simply the rigour by which it was collected.
Security clearance, trust, academic side-eye, and colour-coding cells of a spreadsheet as data[6] are all expressions of this infological evaluative accent. They are a “frame” by which we parse and understand the affordances of the bits being presented to us. What can we do with this data? What should we do with this data?
To some in arts, high quality data with a red security clearance is far less valuable than, for example, blue data shared as a photocopy of some scholar’s table (or pdf in these more modern times) when they worked it out by hand some time ago. To others, data is only an intermediate byproduct — and it can only increase in quality by being passed around and edited. A lack of validation on fields may stem from the fact that the field season’s permit is strictly 6 weeks long and being able to capture any data while in the high-friction environment of standing up to one’s knees in muck is better than fighting with the tablet. (Or, perhaps, acknowledging Rumsfeld’s unknown unknowns and knowing that the specific data collected while in the muck will be quite difficult to anticipate while dry at home).
The presence of non-fungible bits is an affront to those of us who work on “big” datasets[7] as it becomes very difficult to compute on these evaluated dimensions of meaning. However, when the time, cost, or pragmatics of the reality allow for either the collection of colour-coded data or no data at all … most folks in Arts would prefer some over none, even if it is hard to describe in a FAIR fashion. If we don’t support them, we’ll never be able to persuade them that tools beyond the word processor and the spreadsheet are suitable to their task. This judgement is a mistake.
Reading this week
Julia Evans on the importance of tiny learning milestones while programming
Patrick McKenzie on virtual sword accounting (where I found the above article linked) has amazing quotes like:
And that is why accounting standards have destroyed more imaginary swords than all the rust monsters in all the prime material planes combined.
Andrew Hudson in Ars Technica provides A brief tour of the PDP-11
[1] Technically, I should be using Lakatos’ Research Programmes rather than Kuhn’s paradigms here. But if I did that, even fewer people would understand me. (I try to have realistic expectations of counts of people…) This isn’t an appropriate use of the term Paradigm because it’s a whole-of-science phrase, rather than the more loosely-connected disciplinary Research Programme. I should stop here before my footnotes need footnotes… (Let me know if you want me to muse more on the philosophy of science in these things…)
[3] This seems to be effective for data-as-objective-measurement, data-as-subjective-recorded-observation, and data-as-electronic-comms (Ballsun-Stanton 2012).
[4] I still remember this incident from 2015 because it broke my exporters in a few significant ways.
[5] Voloshinov was talking about Marxist propaganda. I used his theoretical basis in my dissertation’s look at philosophy of data. Also, by reading this Soviet scholar, I’m afraid Friend Computer has judged you to be a Communist Mutant Traitor. Your troubleshooting team has been assigned this problem…
[6] If you’re one of the people who uses colour-as-meaning in a spreadsheet, I implore you to stop. It’s fine if colour is derived from a cell-value. But if it’s an expression of a human judgement, it’s impossible to compute on your spreadsheet, and generally quite tricky to share with other meatbags.
[7] Big data = “it doesn’t fit on my laptop”
[8] My most favourite tabletop RPG is Ars Magica, which has gameplay options like: “The Extreme-Research Saga … Learn Latin so that you can read the ones that haven’t been translated yet. Learn paleography so that you can read the ones still in manuscript. Go back to university and get a Ph.D. in medieval studies while actually just researching your saga.” Ars Magica is an opportunity to explore an Academic Power Fantasy: undisturbed time researching alone in the lab is possible!