Five kinds of things can be subscripts in R, and they all behave differently.
- Positive integers
- Negative integers
- Zero
- Booleans
- Nothing
For all examples below, let x be the vector (3, 1, 4, 1, 5, 9).
Positive integers
Ordinary vector subscripts in R start with 1, like FORTRAN and unlike C and its descendants. So for the vector above, x[1] is 3, x[2] is 1, and so on. R doesn't actually have scalar types; everything is a vector, so subscripts are vectors too. In the expression x[2], the subscript is a vector containing a single element equal to 2. But you could use the vector (2, 3) as a subscript of x, and you'd get (1, 4).
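A quick check you can paste into an R session, using the same x. The length() and is.vector() calls are only there to illustrate that a "scalar" like 2 is really a length-one vector.

```r
# The running example vector from the text
x <- c(3, 1, 4, 1, 5, 9)

x[1]          # 3 (indexing starts at 1)
x[2]          # 1
x[c(2, 3)]    # c(1, 4): a vector subscript selects several elements

# A "scalar" is just a vector of length one
length(2)     # 1
is.vector(2)  # TRUE
```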
Negative integers
Subscripts that pick out particular elements are positive, but negative subscripts are legal too. However, they may not do what you'd expect. In scripting languages, it is conventional for negative subscripts to index from the end of the array. So in Python or Perl, for example, the statement y = x[-1] would set y equal to 9 and y = x[-2] would set y equal to 5.
In R, a negative subscript is an instruction to remove an element from a vector. So y = x[-2] would set y equal to the vector (3, 4, 1, 5, 9), i.e. the vector x with the element x[2] removed.
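Here is a small sketch of negative subscripts in R, again with the x from above. The x[-c(1, 2)] and length(x) lines go a little beyond the text: the first drops several elements at once, and the second shows how you reach the last element in R, since there is no count-from-the-end subscript.

```r
x <- c(3, 1, 4, 1, 5, 9)

y <- x[-2]      # c(3, 4, 1, 5, 9): x with its second element removed
x[-c(1, 2)]     # c(4, 1, 5, 9): a vector of negatives drops several elements

# No count-from-the-end subscript exists, so the analogue of Python's
# x[-1] computes the position explicitly.
x[length(x)]    # 9: the last element
x[-length(x)]   # c(3, 1, 4, 1, 5): everything except the last element
```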
While R’s use of negative subscripts is unconventional, it makes sense in context. In some ways vectors in R are more like sets than arrays. Removing elements is probably a more common task than iterating backward.
Zero
So if positive subscripts index elements, and negative subscripts remove elements, what does a zero subscript do? Nothing. It doesn't even produce an error. It is silently ignored: x[0] returns an empty vector, and a zero mixed in with other subscripts is simply dropped. See Radford Neal's blog post about zero subscripts in R for examples of how bad this can be.
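A minimal sketch of how quietly zero subscripts behave, assuming the same x. Note that the assignment through x[0] is accepted without complaint and leaves x untouched.

```r
x <- c(3, 1, 4, 1, 5, 9)

x[0]           # numeric(0): an empty vector, no error or warning
x[c(0, 2)]     # 1: the zero is dropped, leaving x[2]
x[c(2, 0, 3)]  # c(1, 4)

# Assigning through a zero subscript changes nothing, again silently
x[0] <- 99
x              # still c(3, 1, 4, 1, 5, 9)
```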
Booleans
Boolean subscripts are very handy, but look very strange to the uninitiated. Ask a C programmer to guess what x[x>3] would be and I doubt they would have an idea. A comparison involving a vector evaluates to a vector of Boolean values, the results of evaluating the comparison componentwise. So for our value of x, the expression x>3 evaluates to the vector (FALSE, FALSE, TRUE, FALSE, TRUE, TRUE). When you use a Boolean vector as a subscript, the result consists of the elements in the positions where the subscript is TRUE. So x[x>3] is the subset of x consisting of elements larger than 3, i.e. x[x>3] equals (4, 5, 9).
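The two steps described above, building the Boolean vector and then subscripting with it, can be spelled out separately. The name big is just an illustrative variable, and which() is included to show the positions the TRUE values correspond to.

```r
x <- c(3, 1, 4, 1, 5, 9)

big <- x > 3   # c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE), computed componentwise
x[big]         # c(4, 5, 9): elements where the Boolean vector is TRUE
x[x > 3]       # the same thing written inline
which(x > 3)   # c(3L, 5L, 6L): positions of the TRUE values
```

So x[x > 3] selects the same elements as the positive subscript x[c(3, 5, 6)].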
When a vector with a Boolean subscript appears on the left side of an assignment, the assignment applies to the elements that would have been extracted if there had been no assignment. For example, x[x > 3] <- 7 turns (3, 1, 4, 1, 5, 9) into (3, 1, 7, 1, 7, 7). Similarly, starting from the original x, x[x > 3] <- c(10, 11, 12) would produce (3, 1, 10, 1, 11, 12), with the new values assigned to the selected positions in order.
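To run both assignments from the same starting point, this sketch copies x into two illustrative variables, y and z, and uses the mask computed from the original x for each.

```r
x <- c(3, 1, 4, 1, 5, 9)

y <- x
y[x > 3] <- 7               # a single value is recycled over every selected position
y                           # c(3, 1, 7, 1, 7, 7)

z <- x
z[x > 3] <- c(10, 11, 12)   # one value per selected position, assigned in order
z                           # c(3, 1, 10, 1, 11, 12)
```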
Nothing
A subscript can be left out entirely. So x[] would simply return x. In multi-dimensional arrays, missing subscripts are interpreted as wildcards. For example, M[3,] would return the third row of the matrix M.
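A short sketch of empty subscripts. The 4 by 3 matrix M here is hypothetical example data (R fills it column by column), and N is an illustrative copy. The last lines show one common use of x[] on the left of an assignment: it overwrites every element while keeping attributes such as the dimensions.

```r
x <- c(3, 1, 4, 1, 5, 9)
identical(x[], x)   # TRUE: an empty subscript selects everything

# Hypothetical 4 x 3 matrix, filled column by column with 1..12
M <- matrix(1:12, nrow = 4)
M[3, ]              # c(3L, 7L, 11L): third row, every column
M[, 2]              # c(5L, 6L, 7L, 8L): second column, every row

# Assigning through an empty subscript keeps N a 4 x 3 matrix
N <- M
N[] <- 0L
dim(N)              # c(4L, 3L)
```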
Mixtures
Fortunately, mixing positive and negative values in a single subscript vector is illegal. But you can, for example, mix zeros and positive numbers. And since numbers can be NA, you can even include NA as a component of a subscript; the corresponding element of the result is NA.
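A sketch of these mixtures with the same x. The tryCatch() wrapper and the name mixed_ok are only there to record whether R accepted the mixed subscript without printing the error; the exact wording of R's error message is not relied on. The x[c(-1, 0)] line adds one case the text doesn't mention: zeros may also be mixed with negatives.

```r
x <- c(3, 1, 4, 1, 5, 9)

# Positive and negative together: R signals an error
mixed_ok <- tryCatch({ x[c(-1, 2)]; TRUE }, error = function(e) FALSE)
mixed_ok        # FALSE

x[c(0, 2, 3)]   # c(1, 4): zeros mix freely with positives
x[c(-1, 0)]     # c(1, 4, 1, 5, 9): and with negatives
x[c(1, NA)]     # c(3, NA): an NA subscript yields NA in the result
```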