Data, code, and regulation

Data is code and code is data. The distinction between software (“code”) and input (“data”) is blurry at best, arbitrary at worst. And this distinction, or lack thereof, has interesting implications for regulation.

In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, maybe you have to maintain test records for software but not for data.

Suppose as part of some project you need to search for files containing the word “apple” and you use the command-line utility grep. The text “apple” is data, input to the grep program. Since grep is a widely used third-party tool, it doesn’t have to be validated, and you haven’t written any code.

Next you need to search for “apple” and “Apple” and so you search on the regular expression “[aA]pple” rather than a plain string. Now is the regular expression “[aA]pple” code? It’s at least a tiny step in the direction of code.
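To make that step concrete, here is a minimal sketch in Python rather than grep. The sample lines and variable names are made up for illustration. The plain search treats “apple” as an inert string, while the second search hands “[aA]pple” to a regex engine that has to interpret it.

```python
import re

# Hypothetical sample input, standing in for lines of a file.
lines = ["An apple a day", "Apple pie", "pineapple", "APPLE", "grape"]

# Plain substring search: the pattern is just a string to compare against.
plain = [s for s in lines if "apple" in s]

# Regular expression search: the pattern is compiled and then executed.
pattern = re.compile(r"[aA]pple")
regex = [s for s in lines if pattern.search(s)]

print(plain)
print(regex)
```

Note that `re.compile` is literally a compilation step: the pattern string is translated into an internal program before it is run against the input.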

What about more complicated regular expressions? Regular expressions in the formal sense are equivalent in expressive power to deterministic finite automata, which sure seem like code. And that’s only regular expressions as originally defined. The term “regular expression” has come to mean more expressive patterns. Perl regular expressions can even contain arbitrary Perl code.
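As a rough illustration of why an automaton looks like code, here is a hand-written state machine that does the same job as searching for “[aA]pple”. This is a sketch, not how grep or Python's `re` module work internally, and the function name `contains_apple` is invented for the example.

```python
def contains_apple(text: str) -> bool:
    """Hand-written DFA equivalent to searching for the pattern [aA]pple."""
    tail = "pple"   # characters expected after the initial a or A
    state = 0       # number of pattern characters matched so far
    for ch in text:
        if state >= 1 and ch == tail[state - 1]:
            state += 1      # the next expected character arrived
        elif ch in "aA":
            state = 1       # restart a possible match at this character
        else:
            state = 0       # no partial match survives
        if state == 5:
            return True     # accepting state: all five characters matched
    return False

print(contains_apple("pineapple"))
print(contains_apple("APPLE"))
```

The six-character pattern and the fifteen-line function describe the same behavior. It is hard to call one of them data and the other code without drawing the line somewhat arbitrarily.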

In practice we can agree that certain things are “code” and others are “data,” but there are gray areas where people could sincerely disagree. And someone wanting to be argumentative could stretch this gray zone to include everything. One could argue, for example, that all software is data because it’s input to a compiler or interpreter.
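The argumentative position is easy to demonstrate. In the sketch below, a program starts life as an ordinary string, and only becomes something runnable when it is handed to Python's built-in `compile` and `exec`. The function `double` is a made-up example.

```python
# At this point the "software" is nothing more than a string value.
source = "def double(x):\n    return 2 * x\n"
print(type(source).__name__, len(source))

# Feed the string to the compiler, then execute the result in a fresh namespace.
code_obj = compile(source, "<string>", "exec")
namespace = {}
exec(code_obj, namespace)

# Now the same text behaves as code.
result = namespace["double"](21)
print(result)
```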

You might say “data is what goes into a database and code is what goes into a compiler.” That’s a reasonable rule of thumb, but databases can store code and programs can store data. Programmers routinely have long discussions about what belongs in a database and what belongs in source code. Throw regulatory considerations into the mix and there could be incentives to push more code into the database or more data into the source code.
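Here is a small sketch of how behavior can migrate into a database. The table name `rules`, its columns, and the sample patterns are all hypothetical. The point is that the Python source below never changes, yet what the program does depends entirely on the rows stored in the table.

```python
import re
import sqlite3

# An in-memory database holding patterns that drive the program's behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (label TEXT, pattern TEXT)")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?)",
    [("fruit", r"[aA]pple"), ("year", r"\b(19|20)\d{2}\b")],
)

def classify(text):
    """Return the labels of all stored rules whose pattern matches the text."""
    rows = conn.execute("SELECT label, pattern FROM rules ORDER BY label")
    return [label for label, pattern in rows if re.search(pattern, text)]

labels = classify("Apple was founded in 1976")
print(labels)
```

Adding or editing a row changes what `classify` does without touching a line of source code. Whether that edit counts as a software change or a data change is exactly the kind of question a regulation would have to answer.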

* * *

See Slava Akhmechet’s essay The Nature of Lisp for a longer discussion of the duality between code and data.

4 thoughts on “Data, code, and regulation”

  1. This is a good argument AGAINST government regulation. It’s obvious that politicians do not understand the points you are making, as it’s hard for many experienced programmers to understand what you are saying. What government regulation WILL do, though, is get a few major players who do understand what you are talking about to heavily influence those politicians who write that regulation in such a way as to benefit them as much as possible and damage their competitors as much as possible. Regulatory capture in action.

  2. Regulation is often futile for the Internet in particular, and technology in general, as both tend to “route around” the obstructions imposed by regulation.

    Regulations to “not do things” are also inherently more difficult to enforce than regulations which require certain verifiable behaviors. That is, verifying absence vs. verifying existence, which leads to the old saw: “The absence of evidence is not evidence of absence.” Something especially true in the digital age.

    So, what to do? Attempting to limit the availability of a class of tools solely to block their access by a few “ne’er do wells” harkens back to Benjamin Franklin’s sayings, which include: “They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” “Those Who Sacrifice Liberty For Security Deserve Neither.” “He who would trade liberty for some temporary security, deserves neither liberty nor security.”

    Given that Internet freedom can be even moderately constrained only by overwhelmingly draconian means (e.g., The Great Firewall of China), what is the proper response to those who would use technology in ways deemed “harmful”?

    Well, we don’t regulate everyone’s kitchen knives merely because each and every one of them has the clear potential to become a murder weapon. What we regulate is the misuse of kitchen knives, along with all other means to commit murder. We outlaw the result of a verifiable action, not the tools.

    And guess what? All the results of illicit use of encryption are already illegal. (To some extent: The legal meaning of “Digital Privacy” still needs to be clarified.)

    But what about preventing terrorism and mass murder? These are not distinguishably harder to do without encryption than with. Should we outlaw oxygen because it directly enables terrorists to do all their actions? Oxygen is viewed as a common good, despite its “misuse” by criminals. Encryption falls solidly within the category of “common good”, rather than outside it.

    I’m far, far more concerned about the illicit breaking of encryption than the illicit use of it. I’d be more concerned if the government, rather than outlawing all knives, instead mandated they be made of “safe” rubber. All in the name of “preventing” murder.

    New regulation to limit use of encryption (or the creation and sharing of more effective forms of it) would be little more than a waste of effort and funds.

    Let’s first classify kitchen knives as “munitions” and see how that goes over.

    Bah. I really hate it when my buttons get pushed.

    Of course, a better analogy than kitchen knives would be a regulation that all postal/shipping packaging be transparent. Or that envelopes are outlawed, but 8×11 postcards are OK. Bah!

    Then there’s the metadata, recording the existence, timing and size of physical or electronic communication. Or the widespread use of mass public surveillance, including cell phone trackers and license plate scanners. Thornier subjects, to be sure.

    Which reminds me: I still need to finish setting up the security cams to monitor my property and the adjacent public property (but not my neighbor’s property – that would be illegal). I’ll have to record everything until I figure out how best to filter it. Sigh.

    I mean, it’s different if I’m doing it for my personal security, right? I’m certain you or your neighbors are doing the same thing, right?

    Maybe I should ask the NSA or FBI or my local police to do it for me. I mean, why am I paying taxes in the first place?

  3. Thanks very much for the link to “The Nature of Lisp”.

    Has it been nearly a decade already? It seems like yesterday that I saw it mentioned on Digg (cough, cough).

    That article arrived just as I was being pushed to use XML as a communication format in a project. The thought of data as code as data effectively sidelined the effort when I realized “JSON *is* Python!” (with the same emphasis as: “I *am* the Kwisatz Haderach!”).

  4. Andrew Rodland

    More examples: an Excel spreadsheet is “data”, but with enough formulas and macros it becomes an interactive program. Many systems have configuration files complicated enough that configuring them is a special case of programming (e.g. Asterisk dialplans, mail filtering rules). In today’s world a program can even consist of configuration on different services owned by different companies all around the world, as when a monitoring system sends an email to PagerDuty, which has an escalation chain that triggers IFTTT to change the color of an internet-connected lightbulb to red.
