R and colour palettes part 2 - set your primary

A few posts ago, I got very invested in how to build a nice colour palette in R. I went through how to build your own palette, select colours depending on the number of series you needed to show, and how to wrap it all up in some handy shortcut functions, so you could feel like a real pro as you bring your own colour palette in to help with whatever plot you need to build.

What could be better than a whole article devoted to colour palettes in R? If you guessed “doubling down with a follow-up article about niche aspects of colour palettes”, you’d be correct.

The story so far

By the end of our previous post, we’d built a company-specific colour palette that would automatically use our specified colours, and would even pick the best colours if we ended up plotting three (or four, or five) series against one another. We built all of the back-end, and then made it very easy to use by creating a function, scale_colour_acme(), which called the right palette for us. Here’s a concise example of the sort of code we made:

library("dplyr")
library("tidyr")
library("ggplot2")
library("colorspace")

.acme_colours <- c(
  red = "#eb3b5a",
  orange = "#fa8231",
  yellow = "#f7b731",
  green = "#20bf6b",
  topaz = "#0fb9b1",
  light_blue = "#2d98da",
  dark_blue = "#3867d6",
  purple = "#8854d0"
)

acme_colours <- function(index = NULL, named = FALSE) {
  # Default to everything
  if (is.null(index)) {
    index <- names(.acme_colours)
  }

  # This works with integer or character values
  return_value <- .acme_colours[index]

  if (!named) {
    names(return_value) <- NULL
  }

  return(return_value)
}

acme_colour_names <- function() {
  names(.acme_colours)
}

acme_palette <- function() {

  acme_colour_length <- length(acme_colours())

  function(n) {
    stopifnot(n <= acme_colour_length)
    return(acme_colours(1:n))
  }
}

scale_colour_acme <- function(...) {
    ggplot2::discrete_scale(
      aesthetics = "colour",
      scale_name = "acme",
      palette = acme_palette(),
      ...
    )
}

Note that this example has a very naive colour-picking algorithm within acme_palette(). It’ll do for this example, but you might want to change it when you build yours. We can use this very easily to produce some nice plots - for example, for plotting how car manufacturers’ fuel efficiency has improved over time:

# Set up data ----
mileage <-
  mpg %>%
  filter(manufacturer %in% c("toyota", "honda", "nissan", "subaru", "hyundai")) %>%
  group_by(manufacturer, year) %>%
  summarise(mileage = mean(cty))


# Example 1 ----
# No primary series
ggplot(mileage, aes(x = year, y = mileage, colour = manufacturer)) +
  geom_line() +
  scale_colour_acme() +
  theme_minimal()
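
As an aside on that naive colour-picking step: one possible alternative, sketched below, spreads the n picks across the whole palette instead of taking the first n. The name acme_palette_spread() and the acme_hex vector are hypothetical and not part of the original code; the sketch only assumes the same eight hex values, and it keeps red in first position.

```r
# Hypothetical alternative to acme_palette(); an illustration only.
acme_hex <- c(
  "#eb3b5a", "#fa8231", "#f7b731", "#20bf6b",
  "#0fb9b1", "#2d98da", "#3867d6", "#8854d0"
)

acme_palette_spread <- function(colours = acme_hex) {
  function(n) {
    stopifnot(n >= 1, n <= length(colours))
    # Evenly spaced positions running from the first colour to the last
    index <- round(seq(1, length(colours), length.out = n))
    colours[index]
  }
}

pick <- acme_palette_spread()
pick(3)  # the first colour, one from the middle, and the last colour
```

Because the returned function takes a single n, it could be passed as the palette argument in the same way acme_palette() is.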

“What more could we do?” I hear you ask. Well...

You can be any colour you want, as long as you’re red

Let’s say we’re producing these plots for Toyota. We don’t really care whether Honda is yellow or blue or green or orange, but we know one thing: we have to be red. We should also be at the top. This is, after all, about us.

By default, ggplot2 sorts your categories in whatever way they can be sorted: factors by level order, strings alphabetically. It then assigns them, one by one, to the aesthetic values you’ve specified. So technically, you could convert your string values to factors, set their levels so that Toyota is always first, and then feed them into our algorithm... but that gets tiring, time after time. What if you could just tell your colour palette, “Hey, by the way, can you make sure that I’m always on top”?
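
Before answering that, here is roughly what the manual factor route looks like, as a minimal base R sketch. The manufacturers vector is made up for illustration; relevel() is the standard stats function that moves one level to the front.

```r
# Illustrative data only; these are the five manufacturers used in the plots
manufacturers <- c("honda", "hyundai", "nissan", "subaru", "toyota")

# factor() sorts the levels alphabetically; relevel() then moves "toyota" first
manufacturer_factor <- relevel(factor(manufacturers), ref = "toyota")

levels(manufacturer_factor)
```

It works, but you would have to repeat it for every dataset you plot, which is the tiring part.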

# Set up a "primary" scale
scale_colour_acme <- function(primary = NULL, ...) {
  scale <- ggplot2::discrete_scale(
      aesthetics = "colour",
      scale_name = "acme",
      palette = acme_palette(),
      ...
    )

  if (!is.null(primary)) {
    scale$old_map <- scale$map

    scale$breaks <- function(values) {
      c(
        primary,
        setdiff(values, primary)
      )
    }

    scale$map <- function(self, x, limits = self$get_limits()) {
      limits <- c(
        primary,
        setdiff(limits, primary)
      )

      self$old_map(x = x, limits = limits)
    }
  }

  return(scale)
}

# Example 2 ----
# Ensure "toyota" is plotted in red
ggplot(mileage, aes(x = year, y = mileage, colour = manufacturer)) +
  geom_line() +
  scale_colour_acme(primary = "toyota") +
  theme_minimal()

We do two cool things here. First, we hijack the scale’s map() function. This is what lets the scale map values (e.g. “honda”, “hyundai”, “nissan”) to aesthetics (“#eb3b5a”, “#fa8231”, “#f7b731”). For discrete scales (like these, where we have a series of categories), the limits of the scale are considered to be all the distinct values - so this map() function will just be passed a list of ready-sorted categories. We step in before the mapping occurs, and gently re-arrange that list to ensure that our primary value (whatever that might be) is always on top.

Second, we do the same for our breaks attribute. This determines the order in which our categories appear in the key. By setting the order correctly here, we make sure that not only is our primary always red, it’s also at the top of the key (which is where it would be if it just happened to come first in the order anyway).
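
Stripped of the ggplot2 machinery, the re-arranging step is just a vector shuffle. The sketch below illustrates it in base R; it is not ggplot2’s internal mapping code, and primary_first() is a hypothetical helper name used only here.

```r
# First five Acme colours, in palette order
palette <- c("#eb3b5a", "#fa8231", "#f7b731", "#20bf6b", "#0fb9b1")

# Limits as a discrete scale would see them: sorted alphabetically
limits <- c("honda", "hyundai", "nissan", "subaru", "toyota")

# Hypothetical helper: move the primary category to the front
primary_first <- function(limits, primary) {
  c(primary, setdiff(limits, primary))
}

reordered <- primary_first(limits, "toyota")

# Pair each category with a colour by position
colour_map <- setNames(palette[seq_along(reordered)], reordered)
colour_map[["toyota"]]
```

Since colours are handed out by position, putting the primary first is enough to give it the first colour in the palette.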

Highlighting some of the people, all of the time

Sometimes you only want to highlight one series, and every other series can just be dull grey or some other noncommittal colour. Think small multiples, or showing your series of interest against fifty other series. You could laboriously build a mapping of series to colours like a schmuck - but what if you could just say “if I don’t specify the colour to give this series, make it grey”? Then you could save all manner of time. Turns out, it’s not too hard - you just need to ensure your mapping function knows what to do with otherwise undefined values. We build on the mapping code we used above:

# Build primary/other scales ----
scale_colour_acme <- function(primary = NULL, other = NULL, ...) {
  scale <- ggplot2::discrete_scale(
      aesthetics = "colour",
      scale_name = "acme",
      palette = acme_palette(),
      ...
    )

  if (!is.null(primary)) {
    scale$old_map <- scale$map

    scale$breaks <- function(values) {
      c(
        primary,
        setdiff(values, primary)
      )
    }

    scale$map <-
      if (is.null(other)) {
        function(self, x, limits = self$get_limits()) {
          limits <- c(
            primary,
            setdiff(limits, primary)
          )

          self$old_map(x = x, limits = limits)
        }
      } else {
        function(self, x, limits = self$get_limits()) {
          ifelse(
            x == primary,
            acme_colours(1),
            other
          )
        }
      }
  }

  return(scale)
}


# Example 3 ----
# Plot "toyota" in red, everything else in light grey
ggplot(mileage, aes(x = year, y = mileage, colour = manufacturer)) +
  geom_line() +
  scale_colour_acme(primary = "toyota", other = "#AAAAAA") +
  theme_minimal() +
  theme(legend.position = "none")

Normally, our mapping function would assign any categories we haven’t assigned manually to the various colours we supply through the palette. Now, however, we’re giving ourselves a way to override this default behaviour. If we supply the other argument, we tell the mapping function to plot the primary category in our default Acme colour, and everything else in that other colour. We don’t really need the key here - we know what we want to focus on. It turns out you can actually do this with scale_colour_manual() through the na.value argument, but I think this is a bit nicer.
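
For comparison, here is a rough sketch of that scale_colour_manual() route. It assumes ggplot2 is installed and uses a simple scatter of the built-in mpg data rather than the mileage summary, purely to keep the example self-contained.

```r
library(ggplot2)

# Only "toyota" gets a named colour; categories without a match
# fall back to na.value
p <- ggplot(
  subset(mpg, manufacturer %in% c("toyota", "honda", "nissan")),
  aes(x = displ, y = cty, colour = manufacturer)
) +
  geom_point() +
  scale_colour_manual(
    values = c(toyota = "#eb3b5a"),
    na.value = "#AAAAAA"
  ) +
  theme_minimal()
```

The catch is that you have to spell out the primary colour yourself each time, rather than letting the palette supply it.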

But I ordered extra emphasis!

What if you wanted to emphasise the primary series even more? It turns out we can use this “primary/other” distinction in scales other than colour and fill - scales we haven’t touched yet in this series:

# Applying primary/other to other scales ----
scale_size_specific <- function(..., .default) {
  scale_map <- c(...)
  stopifnot(all(names(scale_map) != ""))

  scale <- scale_size_manual(guide = "none")
  scale$map <- function(self, x, limits = self$get_limits()) {
    mapping <- scale_map[x]
    mapping[is.na(mapping)] <- .default
    mapping
  }

  return(scale)
}

scale_alpha_specific <- function(..., .default) {
  scale_map <- c(...)
  stopifnot(all(names(scale_map) != ""))

  scale <- scale_alpha_manual(guide = "none")
  scale$map <- function(self, x, limits = self$get_limits()) {
    mapping <- scale_map[x]
    mapping[is.na(mapping)] <- .default
    mapping
  }

  return(scale)
}

# Example 4 ----
# Plot "toyota" in red and size 1, everything else light grey and size 0.5
ggplot(mileage, aes(x = year, y = mileage, colour = manufacturer, size = manufacturer, alpha = manufacturer)) +
  geom_line() +
  scale_colour_acme(primary = "toyota", other = "#AAAAAA") +
  scale_size_specific("toyota" = 1, .default = 0.5) +
  scale_alpha_specific("toyota" = 1, .default = 0.8) +
  theme_minimal() +
  theme(legend.position = "none")

Here we’re building functions for both size and alpha, allowing us to split them along that primary/other distinction. Our “Toyota” series is blood-red, with a thicker and more solid line than every other series. Note that we also set guide = "none" in our scale_*_specific functions: we’re assuming that the user is mainly going to be using these scales to add emphasis, so these aesthetic mappings don’t need to show up in the key.
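
The heart of both functions is a named lookup with a fallback, which can be tried on its own. The sketch below pulls that step into a hypothetical helper, specific_map() (not part of the original code), and adds two assumptions of mine: an as.character() call so that factors are looked up by label, and a check that the values are actually named.

```r
# Hypothetical standalone version of the lookup inside scale_*_specific()
specific_map <- function(x, ..., .default) {
  scale_map <- c(...)
  stopifnot(!is.null(names(scale_map)), all(names(scale_map) != ""))

  # as.character() guards against factors, which would otherwise
  # be used as integer positions rather than names
  mapping <- unname(scale_map[as.character(x)])

  # Anything without a named entry comes back as NA; give it the default
  mapping[is.na(mapping)] <- .default
  mapping
}

specific_map(c("honda", "toyota", "nissan"), toyota = 1, .default = 0.5)
```

Only the named category keeps its own value; everything else receives .default.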

Now, for this example, it might be considered a bit overkill to use colour and line thickness and alpha to distinguish Toyota from other series. But consider how we could use it on our previous, not-so-obviously-Toyota-centric plots, to give them a bit of emphasis:

# Example 5 ----
# Plot "toyota" in red and size 1, everything else gets a colour but is fainter and thinner
ggplot(mileage, aes(x = year, y = mileage, colour = manufacturer, size = manufacturer, alpha = manufacturer)) +
  geom_line() +
  scale_colour_acme(primary = "toyota") +
  scale_size_specific("toyota" = 1, .default = 0.5) +
  scale_alpha_specific("toyota" = 1, .default = 0.8) +
  theme_minimal()

Now we can tell who’s who on the chart, using our colour palette - but our preferred series is very definitely highlighted with a thicker, bolder line, so we know where we should be focussing.

Additional resources

As before - you have your chance to see the code that made all of these plots, in one place. Grab it here.