Control characters

I didn’t realize until recently that there’s a connection between the control key on a computer keyboard and controlling a mechanical device. Both uses of the word “control” are related via ASCII control characters, as I discovered by reading the blog post Four Column ASCII.

Computers store characters in eight-bit bytes, but ASCII itself is a seven-bit code with 128 values. That is far more than the number of letters in the Roman alphabet, so some of the values, codes 0 through 31 and 127, were reserved for control codes for printers, teletypes, and similar devices. This is most obvious when you arrange the ASCII table in four columns of 32 values, with most of the control characters filling the first column, hence the title of the post above.
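A minimal Python sketch (Python 3.6+ for f-strings) can print the table in that four-column layout. Row i holds the codes i, i + 32, i + 64, and i + 96, so every entry in a row shares the same low five bits. The function name four_column_ascii is illustrative, not something from the original post.

```python
def four_column_ascii():
    """Return 32 rows; row i holds the codes i, i + 32, i + 64 and i + 96."""
    return [tuple(i + 32 * col for col in range(4)) for i in range(32)]


# Column 0 holds control characters (shown with repr, e.g. '\x08'),
# column 2 holds @, A-Z and [ \ ] ^ _, and column 3 holds ` and a-z.
for row in four_column_ascii():
    print("  ".join(f"{code:07b} {chr(code)!r:8}" for code in row))
```

In row 8, for example, backspace (0001000) sits beside ( (0101000), H (1001000), and h (1101000): the four codes differ only in their top two bits.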

Most of the codes for controlling printers are obsolete, but historical vestiges remain. When you hold down the control key and type a letter, it may produce the corresponding control character, whose code differs from the letter’s by changing its second bit (counting from the left in an eight-bit byte, the bit worth 64) from 1 to 0. Often, though, control-key combinations have been put to other uses.
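The bit flip can be sketched in Python. This is a minimal illustration, assuming the input is one of the 32 characters from @ (64) to _ (95), the only ones that land in the control range this way; control_char is a hypothetical helper name.

```python
def control_char(ch: str) -> str:
    """Return the control character for ch by flipping the bit worth 64."""
    code = ord(ch)
    # Only codes 64-95 (@, A-Z, [ \ ] ^ _) have the 64 bit set and the 32 bit
    # clear, so flipping the 64 bit lands them in the control range 0-31.
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no control character for {ch!r}")
    return chr(code ^ 0x40)


for letter in "HIJM":
    ctrl = control_char(letter)
    print(f"Control-{letter}: {ord(letter):08b} -> {ord(ctrl):08b} {ctrl!r}")
```

For H the first line shows 01001000 -> 00001000, the backspace code; I, J, and M give tab, line feed, and carriage return.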

Control-H

The letter H has ASCII code 0100 1000, and the backspace control character has ASCII code 0000 1000. In some software, such as the bash shell and the Windows command line, cmd, holding down the control key and typing H has the same effect as pressing the backspace key [1].
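To see what “the same effect as the backspace key” means at the character level, here is a simplified Python model of line editing. It assumes each backspace character (0000 1000, what Control-H produces) simply erases the previous character; real shells and terminal drivers are more elaborate, and apply_backspaces is a hypothetical name.

```python
def apply_backspaces(typed: str) -> str:
    """Simulate simple line editing: each backspace erases the previous character."""
    buffer = []
    for ch in typed:
        if ch == "\b":  # Python's escape for ASCII 0000 1000
            if buffer:
                buffer.pop()  # erase the last character, if there is one
        else:
            buffer.append(ch)
    return "".join(buffer)


print(apply_backspaces("cat\b\bow"))  # prints cow
```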

Other software uses Control-H for its own purposes. For example, Windows software often uses it to bring up a find-and-replace dialog, and Emacs uses it as the prefix for its help commands.

Control-I

In ASCII the letter I is 0100 1001 and the tab character is 0000 1001. In some software, such as Emacs and Notepad, you can produce a tab character with Control-I. This doesn’t work in WYSIWYG programs like Word, where Control-I usually italicizes text.
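A tab character stores no width of its own; traditionally, display devices advanced to the next multiple of 8 columns. A minimal Python sketch of that rule, assuming a single line of text and 8-column stops by default (expand_tabs is a hypothetical name; Python’s built-in str.expandtabs does the same job):

```python
def expand_tabs(line: str, tabsize: int = 8) -> str:
    """Replace each tab in a single line with spaces up to the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            spaces = tabsize - column % tabsize  # distance to the next multiple of tabsize
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)


print(repr(expand_tabs("a\tbc\td")))
print(expand_tabs("a\tbc\td") == "a\tbc\td".expandtabs(8))  # True: matches the built-in
```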

Control-J and Control-M

The letter J has ASCII code 0100 1010, and the line feed control character has ASCII code 0000 1010. In some software, typing Control-J inserts a line feed, and in other software it does something analogous.

Unix uses a line feed character alone to mark a line break, but DOS used a carriage return followed by a line feed, a convention Windows inherited. If you type Control-J in Windows Notepad, you’ll get a new line, but it will be saved as a carriage return and a line feed.
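The difference is easy to see at the byte level. Here is a minimal Python sketch, assuming the file contents are handled as bytes (Python’s text mode translates line endings on read by default, which would hide the carriage returns); to_unix and to_dos are hypothetical names.

```python
def to_unix(data: bytes) -> bytes:
    """Convert DOS/Windows line endings (CR LF, bytes 13 10) to Unix ones (LF, byte 10)."""
    return data.replace(b"\r\n", b"\n")


def to_dos(data: bytes) -> bytes:
    """Convert Unix line endings to DOS/Windows ones without doubling existing CR LF pairs."""
    return to_unix(data).replace(b"\n", b"\r\n")


windows_bytes = b"first line\r\nsecond line\r\n"
print(to_unix(windows_bytes))  # b'first line\nsecond line\n'
```

Normalizing to LF first in to_dos means a file that already mixes both conventions still comes out with exactly one CR before each LF.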

In Emacs, the behavior of Control-J depends on the mode. In text mode, it simply inserts a newline. In TeX mode, Control-J ends a paragraph, but it also checks the preceding paragraph for unbalanced delimiters. If you have something like an open brace with no corresponding close brace, you’ll see a warning “Paragraph being closed appears to contain a mismatch.”
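The kind of check described above can be sketched with a stack of open delimiters. This is only an illustration of the idea, not Emacs’s actual implementation: it assumes plain (), [] and {} pairs and ignores TeX details such as escaped braces and math delimiters. has_mismatch is a hypothetical name.

```python
CLOSERS = {")": "(", "]": "[", "}": "{"}


def has_mismatch(paragraph: str) -> bool:
    """Return True if the paragraph's (), [] or {} delimiters don't pair up."""
    stack = []
    for ch in paragraph:
        if ch in "([{":
            stack.append(ch)
        elif ch in CLOSERS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != CLOSERS[ch]:
                return True
    return bool(stack)  # leftover openers were never closed


if has_mismatch(r"We define \emph{the norm of x."):
    print("Paragraph being closed appears to contain a mismatch.")
```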

The carriage return character has ASCII code 0000 1101, and M has ASCII code 0100 1101. That’s why, if you open a file created on Windows in some Unix programs, you may see ^M at the end of every line.
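The ^M comes from caret notation: many Unix tools display a control code c as ^ followed by the character whose code is c with the bit worth 64 flipped, so carriage return (13) shows as ^M and escape (27) as ^[. A minimal Python sketch, assuming ASCII input and leaving tab and line feed as-is, roughly as cat -v does (cat -v also marks bytes above 127 with M-, which this sketch skips); caret_notation is a hypothetical name.

```python
def caret_notation(data: bytes) -> str:
    """Render control bytes as ^X, leaving tab and line feed untouched."""
    out = []
    for b in data:
        if (b < 32 or b == 127) and b not in (9, 10):
            out.append("^" + chr(b ^ 0x40))  # 13 -> ^M, 27 -> ^[, 127 -> ^?
        else:
            out.append(chr(b))  # assumes ASCII; bytes above 127 get no special handling
    return "".join(out)


print(caret_notation(b"first line\r\nsecond line\r\n"), end="")
```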

Control-[

Some control characters correspond to characters other than letters. If you flip the second bit of the ASCII code for [ (0101 1011), you get the escape control character (0001 1011). And in some software, such as vi or Emacs, Control-[ has the same effect as the escape key.
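The same bit flip covers all six non-letter characters in the column with the capital letters: @ [ \ ] ^ _ map to NUL, ESC, FS, GS, RS, and US. A short Python sketch, assuming the standard ASCII abbreviations for those codes; flip_second_bit and NAMES are hypothetical names.

```python
# Standard ASCII abbreviations for the control codes reached from @ [ \ ] ^ _.
NAMES = {0x00: "NUL", 0x1B: "ESC", 0x1C: "FS", 0x1D: "GS", 0x1E: "RS", 0x1F: "US"}


def flip_second_bit(ch: str) -> int:
    """Clear the bit worth 64 in a code from 64-95, giving a control code 0-31."""
    code = ord(ch)
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"{ch!r} is not in the range @ through _")
    return code & ~0x40


for ch in "@[\\]^_":
    code = flip_second_bit(ch)
    print(f"Control-{ch}: {ord(ch):08b} -> {code:08b} {NAMES[code]}")
```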


[1] Control keys are often written with capital letters, like Control-H. This can be misleading if you think this means you have to also hold down the shift key as if you were typing a capital H. Control-h would be better notation. But the ASCII codes for control characters correspond to capital letters, so I use capital letters here.
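Capital and lowercase letters differ only in the bit worth 32 (H is 0100 1000, h is 0110 1000), so a program that keeps only the low five bits gets the same control code from either. Many terminal programs treat Control-h and Control-H alike in this way, but treat this Python sketch as an illustration under that assumption; control_code is a hypothetical name.

```python
def control_code(key: str) -> int:
    """Map an ASCII letter of either case to its control code by keeping the low five bits."""
    if len(key) != 1 or not key.isascii() or not key.isalpha():
        raise ValueError(f"expected one ASCII letter, got {key!r}")
    return ord(key) & 0x1F  # 0x1F = 0001 1111


print(control_code("H"), control_code("h"))  # 8 8: both give backspace
```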

5 thoughts on “Control characters”

  1. OK … so now I feel …

    Among other things, I wrote a couple of terminal emulators long ago, so I’m familiar with how “control” characters are used.

    They still have practical uses today. Want to serialize data into a compact text form without having to worry about complex escape or quote rules? Say you want to serialize a database table. Pick unused control characters as field and record separators, say RS for records and FS for fields. Easy to code, and very, very fast to serialize and deserialize large amounts of data this way.

    Note that most of the Unix command-line text-processing tools can be used this way, though most people miss it.

  2. FWIW: “Group Separator”, ^], is the Telnet command prefix (escape) code, which I always forget if it’s been too long.

    Telnet is still used to move between processors within multi-processor real-time embedded systems, such as instrumentation. Plus, it’s included in BusyBox.

  3. Back in the day, young coders who discovered what happens when you print a string of Control-Gs were even more annoying than people who think HTML blink codes are cool.

  4. I learned typing and programming in the late 1970s, when the control key was to the immediate left of the A key on All Good Computer Keyboards. I recall using Ctrl-H instead of Backspace because it was all in the typing keys and the Backspace key was further away. And yes, Ctrl-I, Ctrl-J and Ctrl-M were alternatives to Tab, Line Feed and Carriage Return, as the keyboard would send the same code for either the control combination or the labeled key.

    And of course, many things happened. The IBM PC came out and moved the Caps Lock key to the former Control key location to make the keyboard closer to a typewriter keyboard. It also did something more subtle: each key sent its own scan code (on both press and release!) to the computer, so Ctrl-H was not interpreted as a backspace unless the OS or application translated that key combination into one. Line Feed went away, and Carriage Return was probably just Return at some point, and is now Enter (on my Model M and the numerous copies of this layout since, by all manufacturers).

    As software became more sophisticated, the meaning of Tab changed from a tab stop every 8 characters that all terminals handled identically to whatever the editing or word processing software said it meant. Because of this, many coding standards now disallow the tab character in source files, and text and programming editors are (or should be) set to insert spaces when the Tab key is pressed. The tab character offered convenience and some data compression by replacing runs of spaces, but because of diverging conventions it has become a liability.

    Speaking of controlling a mechanical device, there was Baudot.
    https://twitter.com/shapr/status/1148196693726375938
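The first comment’s suggestion of RS between records and FS between fields can be sketched in a few lines of Python (3.9+ for the list[list[str]] annotations). As the comment implies, this assumes no field ever contains those two characters; the sketch checks for that instead of escaping. serialize and deserialize are hypothetical names, and ASCII also defines GS and US, so other assignments are possible.

```python
RS = "\x1e"  # ASCII record separator, placed between rows
FS = "\x1c"  # ASCII file separator, used here between fields as the comment suggests


def serialize(rows: list[list[str]]) -> str:
    """Join fields with FS and records with RS; refuse fields containing either."""
    for row in rows:
        for field in row:
            if RS in field or FS in field:
                raise ValueError(f"field contains a separator: {field!r}")
    return RS.join(FS.join(row) for row in rows)


def deserialize(text: str) -> list[list[str]]:
    """Inverse of serialize: split records on RS, then fields on FS."""
    if not text:
        return []
    return [record.split(FS) for record in text.split(RS)]


table = [["id", "name"], ["1", "Smith, Jane"], ["2", 'said "hi"']]
encoded = serialize(table)
print(repr(encoded))
print(deserialize(encoded) == table)  # True: commas and quotes need no escaping
```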
