The case of the missing zero index

Published
2018-05-07
Tagged

R is 1-indexed. Some people probably get unreasonable riled up about this.

1
arr <- c("first entry", "second entry", "third entry")
2
arr[1] # => [1] "first entry"

But what happens if you try to use 0 as an index?

1
arr[0] # => character(0)

This might not be what you want: in fact, I suspect that nine times out of ten this is exactly not what you want. Consider:

1
indices <- c(1, 0, 2, 0, 3)
2
3
values <- arr[indices]
4
5
length(values) # => 3
6
length(indices) # => 5

I’ve had issues with this in the past. For example, consider the problem of trying to turn a number (or a vector of numbers) into Excel-style columns:

1
excel.columnify(1) # => "A"
2
excel.columnify(26) # => "Z"
3
excel.columnify(27) # => "AA"

If you’ve ever encountered R’s LETTERS constant, the process is pretty straightforward for numbers between 1 and 26:

1
excel.columnify <- function(n) {
2
  LETTERS[n]
3
}

But what if n is greater than 26? We could do something like this1:

1
excel.columnify <- function(n) {
2
  ifelse(
3
    n > 26,
4
    paste0(LETTERS[(n-1) %/% 26], LETTERS[(n-1) %% 26 + 1]),
5
    LETTERS[n]
6
  )
7
}

This seems to run OK…

1
excel.columnify(1:10)
2
  # =>  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
3
4
excel.columnify(20:30)
5
  # =>  [1] "T"  "U"  "V"  "W"  "X"  "Y"  "Z"  "AA" "AB" "AC" "AD"

…until we hit a very specific and weird exception:

1
excel.columnify(26:53)
2
  # => [1] "Z"  "AA" "AB" "AC" "AD" "AE" "AF" "AG" "AH" "AI" "AJ" "AK" "AL" "AM" "AN" "AO" "AP" "AQ" "AR" "AS" "AT"
3
  # => [22] "AU" "AV" "AW" "AX" "AY" "BZ" "AA"

Notice how the last couple of items in this vector have the wrong first letter? What’s going on here? If you’ve worked out that we have some zero-indices at work, you’re on the right track.

It’s worth noting how ifelse() works in R: it evaluates both value arguments (the last two arguments to the function) and splices them together based on the values of the first, boolean argument. Of note here, this means that the paste0 function runs on every item in the array:

1
paste0(LETTERS[(n-1) %/% 26], LETTERS[(n-1) %% 26 + 1])

Let’s examine those numeric vectors we pass as lookups to LETTERS:

1
n <- 26:53
2
(n-1) %/% 26
3
  # => [1] 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2

We’ve got a zero in this vector! Given what we know about zero lookups, we might correctly guess that this is going to cause issues:

1
LETTERS[(n-1) %/% 26]
2
  # =>  [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
3
  # => [27] "B"
4
5
length(LETTERS[(n-1) %/% 26])
6
  # => [1] 27
7
8
length(n)
9
  # => [1] 28
10
11
length(LETTERS[(n-1) %% 26 + 1])
12
  # => [1] 28

The paste0 function is fine with two arguments of differing lengths: it just duplicates the smallest until it’s the same length as the largest. This is normally useful, but in this case it’s broken.

How do we fix the error? It turns using NA as an index will return NA, maintaining the correct vector length:

1
LETTERS[c(1, NA, 3)]
2
  # => [1] "A" NA  "C"

Our solution is to replace every instance of 0 with NA, and then swap them back later on:

1
excel.columnify <- function(n) {
2
  first_letter_lookup <- (n-1) %/% 26
3
  first_letter_lookup[first_letter_lookup == 0] <- NA
4
5
  first_letters <- LETTERS[first_letter_lookup]
6
  first_letters[is.na(first_letters)] <- ""
7
8
  ifelse(
9
    n > 26,
10
    paste0(first_letters, LETTERS[(n-1) %% 26 + 1]),
11
    LETTERS[n]
12
  )
13
}

With this safeguard in place, our bug is fixed:

1
excel.columnify(26:53)
2
  # => [1] "Z"  "AA" "AB" "AC" "AD" "AE" "AF" "AG" "AH" "AI" "AJ" "AK" "AL" "AM" "AN" "AO" "AP" "AQ" "AR" "AS" "AT"
3
  # => [22] "AU" "AV" "AW" "AX" "AY" "AZ" "BA"

  1. Trust me, the maths works here.