Argument ordering
Bringing tidy design principles back to life and thinking about argument order
This week I’ve focussed on bringing the tidy design principles website back to life. Jenny Bryan and I started on this book all the way back in 2018, and there have obviously been quite a few technology changes in the meantime. (Un)fortunately I have quite a bit of experience converting bookdown books to Quarto, and tools like knitr::convert_chunk_header() do much of the heavy lifting, so overall it was just a couple of hours of work. But it’s highly likely I missed some stuff in the translation, so please let me know if you see anything that looks odd.
I also managed to get some thoughts in place about how you might think about ordering function arguments. I think both patterns are fairly obvious (please let me know if you think otherwise!) but worth writing down.
The first principle is that you should put the most important arguments first. Of course, this raises the question: what makes an argument important? It’s hard for me to make this concrete (so I hope you’ll send me your thoughts too), but there are three clear cases:
If a function transforms some existing object, that object is most important and should be the first argument.
Other arguments that affect the “shape” of the output (either the size or the type) are usually very important and should be near the front.
Optional arguments are clearly least important and should be at the end.
Do you think there are tidyverse functions that break this rule? Please let me know so I can include them as examples!
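To make the three cases concrete, here is a minimal sketch. chunk() is a hypothetical function made up for illustration (it is not from any package), and the argument names size and keep_names are my own assumptions.

```r
# Hypothetical helper, for illustration only: split a vector into pieces.
# 1. `x` is the existing object being transformed, so it comes first.
# 2. `size` controls how many pieces come back (the "shape" of the
#    output), so it comes next.
# 3. `keep_names` is optional (it has a default), so it goes last.
chunk <- function(x, size, keep_names = FALSE) {
  # Assign each element to a group: 1, 1, 2, 2, 3, ... when size = 2
  groups <- ceiling(seq_along(x) / size)
  out <- split(x, groups)
  if (!keep_names) {
    out <- unname(out)
  }
  out
}

# Returns a list of three pieces: 1:2, 3:4, and 5
chunk(1:5, size = 2)
```

Read from left to right, the call says what is being transformed, then what the result will look like, and only then any details.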
The second principle is that if you use …, it should be placed between the required and optional arguments. This is important because of the way that argument matching works in R: you can supply arguments by position (like mean(x)), by full name (like mean(x, na.rm = TRUE)), or by part of the name (like mean(x, na = TRUE)). Placing arguments after … means that you can only supply them by their full name; not only does this force users into a clearer style, it also makes it easier for you to change the order of arguments in the future (e.g. to move important arguments earlier, as above).
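Here is a small sketch of how matching differs on either side of the dots. Both functions are hypothetical and exist only to show the matching rules. Each one simply returns the value that na_rm ended up with.

```r
# Hypothetical functions, for illustration only.

# `na_rm` comes BEFORE `...`: it can be matched by position or by prefix.
before_dots <- function(x, na_rm = FALSE, ...) {
  na_rm
}

# `na_rm` comes AFTER `...`: it can only be matched by its full name.
after_dots <- function(x, ..., na_rm = FALSE) {
  na_rm
}

before_dots(1:3, TRUE)          # matched by position, na_rm is TRUE
before_dots(1:3, na = TRUE)     # matched by partial name, na_rm is TRUE

after_dots(1:3, TRUE)           # TRUE is absorbed by `...`, na_rm stays FALSE
after_dots(1:3, na = TRUE)      # `na` is absorbed by `...`, na_rm stays FALSE
after_dots(1:3, na_rm = TRUE)   # full name required, na_rm is TRUE
```

Because callers of after_dots() must write na_rm = TRUE in full, the optional arguments after the dots can later be reordered without breaking existing calls.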
The goal of writing this newsletter is to get your thoughts. So please let me know if you think these patterns are useful or wrong or anything in between!
— Hadley
I hate to be pedantic, but want to point out that if we're talking about functions that work with the tidyverse, the first argument should always be the data. I think this is implicit for most users, but maybe not all?
Does the facet_grid() function count as something that might 'break' these rules?
I use facet_grid() to tweak the aesthetics of a plot but the facet_grid() function doesn't explicitly require (or need) an argument for the base ggplot or even the source data frame.
facet_grid() works great (and is very intuitive) but I think it is an exception to the first bullet of "If a function transforms some existing object, that object is most important and should be the first argument."