Five kinds of subscripts in R

Five kinds of things can be subscripts in R, and they all behave differently.

  1. Positive integers
  2. Negative integers
  3. Zero
  4. Booleans
  5. Nothing

For all examples below, let x be the vector (3, 1, 4, 1, 5, 9).

Positive integers

Ordinary vector subscripts in R start with 1, like FORTRAN and unlike C and its descendants. So for the vector above, x[1] is 3, x[2] is 1, etc. R doesn’t actually have scalar types; everything is a vector, so subscripts are vectors. In the expression x[2], the subscript is a vector containing a single element equal to 2. But you could use the vector (2, 3) as a subscript of x, and you’d get (1, 4).
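Here is a minimal sketch of the positive-subscript cases described above. The variable names (first, second, pair) are only for illustration and are not part of the original examples.

```r
# The example vector used throughout the post
x <- c(3, 1, 4, 1, 5, 9)

first  <- x[1]        # 3
second <- x[2]        # 1
pair   <- x[c(2, 3)]  # 1 4: the subscript is itself a vector

# A bare number is just a vector of length one
len <- length(2)      # 1
```

Note that the subscript c(2, 3) picks out elements in the order the subscript lists them.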

Negative integers

Although subscripts that reference particular elements are positive, negative subscripts are legal. However, they may not do what you’d expect. In scripting languages, it is conventional for negative subscripts to indicate indexing from the end of the array. So in Python or Perl, for example, the statement y = x[-1] would set y equal to 9 and y = x[-2] would set y equal to 5.

In R, a negative subscript is an instruction to remove an element from a vector. So y = x[-2] would set y equal to the vector (3, 4, 1, 5, 9), i.e. the vector x with the element x[2] removed.
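A short sketch of negative subscripts, assuming the same x as before. The names y, z, and last are illustrative. Indexing with length(x) is shown only as one possible way to get the last element, since x[-1] does not do that in R.

```r
x <- c(3, 1, 4, 1, 5, 9)

y <- x[-2]             # 3 4 1 5 9: x with the second element removed
z <- x[c(-1, -2)]      # 4 1 5 9: x with the first two elements removed

# x[-1] is NOT the last element in R; one way to get it is to index by length
last <- x[length(x)]   # 9
```

The original vector x is unchanged in each case. A negative subscript returns a new vector with the element left out.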

While R’s use of negative subscripts is unconventional, it makes sense in context. In some ways vectors in R are more like sets than arrays. Removing elements is probably a more common task than iterating backward.

Zero

So if positive subscripts index elements, and negative subscripts remove elements, what does a zero subscript do? Nothing. It doesn’t even produce an error. A zero is silently ignored: x[0] returns an empty vector, and a zero mixed in with other subscripts is simply dropped. See Radford Neal’s blog post about zero subscripts in R for examples of how bad this can be.
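To make the zero case concrete, here is a small sketch with the same x. The names empty, only_second, and dropped are illustrative.

```r
x <- c(3, 1, 4, 1, 5, 9)

empty       <- x[0]          # numeric(0): a zero-length vector, no error
only_second <- x[c(0, 2)]    # 1: the zero is ignored, only x[2] comes back
dropped     <- x[c(0, -2)]   # 3 4 1 5 9: same as x[-2]

n_empty <- length(empty)     # 0
```

This is the sort of thing that can hide an off-by-one bug: a subscript that accidentally evaluates to 0 gives back an empty vector rather than an error.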

Booleans

Boolean subscripts are very handy, but look very strange to the uninitiated. Ask a C programmer to guess what x[x>3] would be and I doubt they would have an idea. A Boolean expression involving a vector evaluates to a vector of Boolean values, the results of evaluating the expression componentwise. So for our value of x, the expression x>3 evaluates to the vector (FALSE, FALSE, TRUE, FALSE, TRUE, TRUE). When you use a Boolean vector as a subscript, the result consists of the elements at the positions where the Boolean vector is TRUE. So x[x>3] is the subset of x consisting of elements larger than 3, i.e. x[x>3] equals (4, 5, 9).
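The two steps described above, building the Boolean vector and then subscripting with it, can be sketched separately. The names mask, big, and idx are illustrative, and which() is included only to show the positions that the TRUE values correspond to.

```r
x <- c(3, 1, 4, 1, 5, 9)

mask <- x > 3        # FALSE FALSE TRUE FALSE TRUE TRUE
big  <- x[mask]      # 4 5 9, the same result as x[x > 3]
idx  <- which(mask)  # 3 5 6: positions where mask is TRUE
```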

When a vector with a Boolean subscript appears in an assignment, the assignment applies to the elements that would have been extracted if there had been no assignment. For example, x[x > 3] <- 7 turns (3, 1, 4, 1, 5, 9) into (3, 1, 7, 1, 7, 7). Also, x[x > 3] <- c(10, 11, 12) would produce (3, 1, 10, 1, 11, 12).
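Both assignments from the paragraph above, written out as a sketch. The copies y and z are my own addition so that each assignment starts from the original x.

```r
x <- c(3, 1, 4, 1, 5, 9)

y <- x
y[y > 3] <- 7               # y is now 3 1 7 1 7 7

z <- x
z[z > 3] <- c(10, 11, 12)   # z is now 3 1 10 1 11 12
```

In the second case the replacement values are assigned in order to the selected positions 3, 5, and 6.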

Nothing

A subscript can be left out entirely. So x[] would simply return x. In multi-dimensional arrays, missing subscripts are interpreted as wildcards. For example, M[3,] would return the third row of the matrix M.
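A brief sketch of missing subscripts. The matrix M here is a hypothetical 4 by 3 example of my own, filled column by column with 1 through 12, and the other names are illustrative.

```r
x <- c(3, 1, 4, 1, 5, 9)
same <- x[]                   # 3 1 4 1 5 9: identical to x

# Hypothetical 4 x 3 matrix, filled by column with 1..12
M <- matrix(1:12, nrow = 4)
row3 <- M[3, ]                # 3 7 11: the third row
col2 <- M[, 2]                # 5 6 7 8: the second column
```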

Mixtures

Fortunately, mixing positive and negative values in a single subscript array is illegal. But you can, for example, mix zeros and positive numbers. And since numbers can be NA, you can even include NA as a component of a subscript.
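The mixtures above can be checked with a small sketch. The tryCatch wrapper is my own addition so that the example runs to completion instead of stopping at the error, and the names are illustrative.

```r
x <- c(3, 1, 4, 1, 5, 9)

# Mixing positive and negative subscripts raises an error;
# tryCatch turns it into a flag so the script keeps going.
mixed_fails <- tryCatch({ x[c(-1, 2)]; FALSE }, error = function(e) TRUE)

zeros   <- x[c(0, 2, 0, 3)]   # 1 4: the zeros are dropped
with_na <- x[c(1, NA)]        # 3 NA: an NA subscript yields an NA element
```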

10 thoughts on “Five kinds of subscripts in R”

  1. I looked at R when SwRI began an internal multivariate statistical control project. I was both amazed at the project and dumbfounded by its complexity. I really wanted time to study the scripting language, but alas, I found I could write a program to do a calculation in less time than I could learn the ins and outs of programming R. It’s a heck of a program and the language is very powerful. But don’t pick it up on Thursday, thinking you’ll be ready to use it Monday morning.

    (Incidentally, R is what kicked off the whole port of DCDFLIB to .Net flurry in my office.)

  2. Brendan O'Connor

    also, strings can be used as subscripts when doing double-bracket indexing for lists.

  3. Thanks for sharing this concise information, it is very helpful. It would help even more, if you can add the list and data.frame indexing (e.g., [1] vs. [[1]]) and accessing slots and attributes to this list.

  4. There’s also at least a sixth method of subscripting vectors as one can assign names for vector elements and query them by name:

    > x <- c(1,2,3)
    > names(x) <- c("a","b","c")
    > x["c"]
    c
    3

    So vectors in R seem to be some kind of a mix of flattened vectors and maps of more popular languages.

  5. Sorry about the formatting earlier. There’s also at least a sixth method of subscripting vectors as one can assign names for vector elements and query them by name:

    > x <- c(1,2,3)
    > names(x) <- c('a','b','c')
    > x['c']
    c
    3

    So vectors in R seem to be some kind of a mix of flattened vectors and maps of more popular languages.

  6. Hari: You are correct. I must have been thinking of lists. I updated the post to remove the error.

  7. Nearly seven years have passed since this article was posted, and over three years since the last comment. And here I come along, wrestling with R for a data science class and stumble into this post. Brief. Precise. And to the point. Thanks for the info!

  8. Fantastic! This is just what I was looking for: “here’s what’s different or non-obvious in R for programmers of other languages.”

    Thanks for taking the time to post this!
