Five kinds of things can be subscripts in R, and they all behave differently.
- Positive integers
- Negative integers
- Zero
- Booleans
- Nothing
For all examples below, let x be the vector (3, 1, 4, 1, 5, 9).
Positive integers
Ordinary vector subscripts in R start with 1, like FORTRAN and unlike C and its descendants. So for the vector above, x[1] is 3, x[2] is 1, etc. R doesn’t actually have scalar types; everything is a vector, so subscripts are vectors. In the expression x[2], the subscript is a vector containing a single element equal to 2. But you could use the vector (2, 3) as a subscript of x, and you’d get (1, 4).
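As a minimal sketch of the above, using the post's example vector (the variable names first and pair are just for illustration):

```r
# x is the example vector used throughout the post.
x <- c(3, 1, 4, 1, 5, 9)

first <- x[1]        # a one-element vector holding 3
pair  <- x[c(2, 3)]  # elements 2 and 3: the vector (1, 4)

# A "scalar" in R is just a vector of length one.
is.vector(first)
length(first)
```

The subscript c(2, 3) is itself a vector, which is why x[2] and x[c(2, 3)] are the same kind of operation.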
Negative integers
Although subscripts that reference particular elements are positive, negative subscripts are legal. However, they may not do what you’d expect. In scripting languages, it is conventional for negative subscripts to indicate indexing from the end of the array. So in Python or Perl, for example, the statement y = x[-1] would set y equal to 9 and y = x[-2] would set y equal to 5.
In R, a negative subscript is an instruction to remove an element from a vector. So y = x[-2] would set y equal to the vector (3, 4, 1, 5, 9), i.e. the vector x with the element x[2] removed.
While R’s use of negative subscripts is unconventional, it makes sense in context. In some ways vectors in R are more like sets than arrays. Removing elements is probably a more common task than iterating backward.
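A short sketch of the removal behavior described above. The last line shows one common way to get the final element, since a negative subscript will not do it; the names z and last are illustrative only.

```r
x <- c(3, 1, 4, 1, 5, 9)

y <- x[-2]         # drops the second element: (3, 4, 1, 5, 9)
z <- x[-c(1, 2)]   # drops the first two elements: (4, 1, 5, 9)

# To get the last element, as x[-1] would in Python, index by length.
last <- x[length(x)]   # 9
```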
Zero
So if positive subscripts index elements, and negative subscripts remove elements, what does a zero subscript do? Nothing. It doesn’t even produce an error. A zero mixed in with other subscripts is silently ignored, and a subscript consisting only of zero returns an empty vector. See Radford Neal’s blog post about zero subscripts in R for examples of how bad this can be.
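Here is a small sketch of what a zero subscript does in practice, again with the example vector. The names only_zero and with_zero are made up for illustration.

```r
x <- c(3, 1, 4, 1, 5, 9)

only_zero <- x[0]         # an empty numeric vector, and no error
with_zero <- x[c(0, 2)]   # the zero is dropped; same result as x[2]

length(only_zero)   # 0
```

An off-by-one bug that produces a zero index therefore tends to show up later as a vector of unexpected length rather than as an error at the point of the mistake.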
Booleans
Boolean subscripts are very handy, but look very strange to the uninitiated. Ask a C programmer to guess what x[x>3] would be and I doubt they would have an idea. A Boolean expression with a vector evaluates to a vector of Boolean values, the results of evaluating the expression componentwise. So for our value of x, the expression x>3 evaluates to the vector (FALSE, FALSE, TRUE, FALSE, TRUE, TRUE). When you use a Boolean array as a subscript, the result is the subset of elements whose index corresponds to an index in the Boolean array containing a TRUE value. So x[x>3] is the subset of x consisting of elements larger than 3, i.e. x[x>3] equals (4, 5, 9).
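To make the two steps visible, this sketch stores the Boolean vector in a variable before using it as a subscript. The name mask is just a convenient label, and which() is shown only as a way to see the positions involved.

```r
x <- c(3, 1, 4, 1, 5, 9)

mask <- x > 3     # FALSE FALSE TRUE FALSE TRUE TRUE
big  <- x[mask]   # (4, 5, 9)

# which() converts the mask to the positions of the TRUE values.
positions <- which(mask)   # 3 5 6
```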
When a vector with a Boolean subscript appears in an assignment, the assignment applies to the elements that would have been extracted if there had been no assignment. For example, x[x > 3] <- 7 turns (3, 1, 4, 1, 5, 9) into (3, 1, 7, 1, 7, 7). Also, x[x > 3] <- c(10, 11, 12) would produce (3, 1, 10, 1, 11, 12).
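Both assignments can be tried side by side. This sketch works on copies (a and b, names chosen for illustration) so that x itself stays unchanged.

```r
x <- c(3, 1, 4, 1, 5, 9)

a <- x
a[a > 3] <- 7               # (3, 1, 7, 1, 7, 7)

b <- x
b[b > 3] <- c(10, 11, 12)   # (3, 1, 10, 1, 11, 12)
```

In the first case the single value 7 is recycled to fill all three selected positions; in the second the three replacement values are used in order.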
Nothing
A subscript can be left out entirely. So x[] would simply return x. In multi-dimensional arrays, missing subscripts are interpreted as wildcards. For example, M[3,] would return the third row of the matrix M.
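A quick sketch of missing subscripts. The post does not define M, so the 3 x 4 matrix below is a hypothetical stand-in.

```r
x <- c(3, 1, 4, 1, 5, 9)
M <- matrix(1:12, nrow = 3)   # hypothetical 3 x 4 matrix, filled by column

all_of_x   <- x[]      # the whole vector
third_row  <- M[3, ]   # 3 6 9 12
second_col <- M[, 2]   # 4 5 6
```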
Mixtures
Fortunately, mixing positive and negative values in a single subscript array is illegal. But you can, for example, mix zeros and positive numbers. And since numbers can be NA, you can even include NA as a component of a subscript.
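The sketch below tries each of these mixtures. The call to try() is there only so the script keeps running past the error; the variable names are illustrative.

```r
x <- c(3, 1, 4, 1, 5, 9)

zero_and_pos <- x[c(0, 1, 3)]   # zero dropped: (3, 4)
with_na      <- x[c(1, NA)]     # (3, NA)

# Mixing signs is an error; try() captures it instead of stopping the script.
mixed <- try(x[c(-1, 2)], silent = TRUE)
inherits(mixed, "try-error")    # TRUE
```

Note that an NA subscript yields an NA in the result rather than being dropped, so the result has the same length as the subscript.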
I looked at R when SwRI began an internal multivariate statistical control project. I was both amazed at the project and dumbfounded by its complexity. I really wanted time to study the scripting language, but alas, I found I could write a program to do a calculation in less time than I could learn the ins and outs of programming R. It’s a heck of a program and the language is very powerful. But don’t pick it up on Thursday thinking you’ll be ready to use it Monday morning.
(Incidentally, R is what kicked off the whole port of DCDFLIB to .Net flurry in my office.)
Also, strings can be used as subscripts when doing double-bracket indexing for lists.
Thanks for sharing this concise information, it is very helpful. It would help even more, if you can add the list and data.frame indexing (e.g., [1] vs. [[1]]) and accessing slots and attributes to this list.
“Vectors in R are not required to consist of a single type, so you can mix the types above all in one subscript!”
I think you got this wrong here, but right in your other blog: http://www.johndcook.com/R_language_for_programmers.html.
how to import vector and list type of files
There’s also at least a sixth method of subscripting vectors as one can assign names for vector elements and query them by name:
> x <- c(1,2,3)
> names(x) <- c('a','b','c')
> x['c']
c
3
So vectors in R seem to be some kind of a mix of flattened vectors and maps of more popular languages.
Hari: You are correct. I must have been thinking of lists. I updated the post to remove the error.
Nearly seven years have passed since this article was posted, and over three years since the last comment. And here I come along, wrestling with R for a data science class and stumble into this post. Brief. Precise. And to the point. Thanks for the info!
Fantastic! This is just what I was looking for; “here’s what’s different or non-obvious in R for programmers of other languages”
Thanks for taking the time to post this!