About Tidy design principles

Hadley Wickham

Jul 28, 2023

And one example of a principle: function definitions should be scannable.

Read →

33 Comments

Jim Hester

Jul 28, 2023

Another example of the 'arguments shouldn't change behavior of other arguments' that came to mind.

In `install.packages()`, the default interprets `pkgs` as a vector of package names. But if `repo == NULL` then `pkgs` is instead interpreted as a vector of filesystem paths.

Expand full comment

Reply (1)

Hadley Wickham

Jul 28, 2023

Oh yeah, that's a good one!

Expand full comment

Shibaprasad Bhattacharya

Jul 28, 2023

"how do you write high-quality R code that's easy to understand"

Just a side comment on this: R codes, in general, are so easier to understand compared to other languages. :-)

Expand full comment

Brandon Loudermilk

Jul 28, 2023

I like your solution to enumerate possible options in function specifications by encoding the permissible values as a character vector and assigning as default. A less elegant approach I have relied upon is encoding as an exported list. Benefits are: (1) being able to call up the permissible values using tab completion with `$`; (2) being able to explicitly assign an obvious default value (see note at bottom); (3) a bit of information hiding (values of switch strings and encoded list values can be modified without breaking client code -- assuming client adheres to using the list); (4) permissible values can be queried programatically. The cons of this approach include that the list is separate from the function (though the default assignment illustrates the intended usage); Note that similar naming conventions can reinforce the intended coupling between exported function and arg list.

STAT <- list(

MIN = "min",

MAX = "max",

MEAN = "mean"

)

stat <- function(x, stat = STAT$MEAN) {

if (!stat %in% STAT) {

supported_stat <- paste0(STAT, collapse = ", ")

print(glue::glue("Unsupported arg value: stats = '{stat}'; must be one of ({supported_stat})"))

} else {

switch (stat,

mean = mean(x),

max = max(x),

min = min(x)

)

}

I prefer your solution - definitely more elegant. That said it was not at all obvious to me that the first value of the character vector is the default value; rather upon naive inspection it seems that the function expects a vector of values. This could be a point of confusion for beginning R programmers transitioning from a different language.

Expand full comment

Reply (1)

Hadley Wickham

Jul 28, 2023

Yeah, I agree it's not super obvious that the first value is the default; that's one of the reasons I'd like to promote this pattern more: so more people know about it.

I like the idea of storing the possible options in a named list; I'll add that to the list of alternatives in https://design.tidyverse.org/enumerate-options.html#how-keep-defaults-short

Expand full comment

Reply (1)

Igor Segota

Jul 29, 2023Edited

Yeah I only realized the first option is the default in R when using the base R p.adjust function and inspecting the source code - which is very difficult to read. This function probably violates every single clean code guidelines ever. Base R is such an incoherent mess...

Have you seen R package typed? https://github.com/moodymudskipper/typed

In spirit, very similar to pydantic in Python, but overrides `?` to create assertions. So the tidy variant of base::p.adjust function could be along the lines of:

```

library(typed)

PAdjustMethod <- Character(1, ... = ~ . %in% c("holm", "BH", "bonferroni"))

adjust_p_values <- ? function(p_values= ? Double, method= "holm" ? PAdjustMethod) {

### p.adjust(p_values, method)

}

# If the function is called with a wrong method it returns an informative error:

#> adjust_p_values(x, 'wrong_method')

# Error: In `adjust_p_values(x, "wrong_method")` at `check_arg(method, PAdjustMethod)`:

# wrong argument to function, mismatch

`# value %in% c("holm", "BH", "bonferroni")`: FALSE

# `expected`: TRUE

```

This makes it a lot more obvious what is going on, what is a default (I would say it is KIND OF important Holm method is the default one!) and what are allowed values. However, I am aware this violates your point #6 - all valid values are now in another object that needs to be exported.

Expand full comment

Jon Harmon

Jul 28, 2023

I'm definitely looking forward to this book! I've consulted what's there often while coding, I definitely look forward to it being fleshed out!

One quibble: I'm not sure that I'd call the thing that appears in autocomplete and at the top of documentation the *definition* of a function. I'd include the body of the function in the definition. R4DS 2e seems to agree (https://r4ds.hadley.nz/functions.html#:~:text=definition%20of%20a%20function) "To find the definition of a function that you’ve written, place the cursor on the name of the function and press F2." Unfortunately that's not very helpful, because I can't think of anything to call it instead. Unless you DO mean the whole thing, and it's just that PART of the definition appears all over the place.

I can't think of another example for arguments changing the meaning of other arguments, but I suspect S3s result in some cases where it might be ok; where the class of the first argument (subtly?) changes the meanings of later arguments. I still can't find any concrete examples even in that case, but it feels like there might be cases where that's ok.

Expand full comment

Reply (1)

Hadley Wickham

Jul 28, 2023

Yeah, definition might not be quite precise enough; maybe formals would be a better shorthand.

Expand full comment

Reply (2)

Maarten-Jan Kallen

Aug 7, 2023

I always call it the function signature for lack of a better name.

Expand full comment

Jon Harmon

Jul 28, 2023

Maybe "function API"?

Expand full comment

Reply (1)

Hadley Wickham

Jul 31, 2023

API is more general to me as it also includes the types of inputs and output.

Expand full comment

Boris Zlatopolsky

Jul 29, 2023

On functions, the description of what it does should be super clear over brevity. Within the tidyverse, I would much rather have 4 or 5 sentences that are written for complete beginners in mind, than 1 accurate sentence.

Expand full comment

Richard DeWald

Jul 28, 2023

“Scannable” isn’t scannable. I’m in your target demo for this book in the US and I thought you were referring to another programming function (scanability) I didn’t know anything about. Having said that, the only thing I come up with “glance-worthy” which probably doesn’t have the polish you wanted :). But scannable really threw me.

Expand full comment

Reply (2)

Hadley Wickham

Jul 28, 2023

Skimmable?

Expand full comment

Reply (1)

Marko Kazimirović

Jul 28, 2023

previewable?

Expand full comment

Christoph Scheuch

Aug 1, 2023

I also tripped over the adjective "scannable". Maybe "concise" or "intuitive" better describe the requirement "gives useful information at a glance"?

Expand full comment

Jul 28, 2023

One design choice for functions that I've found awkward as a user has been when particular arguments are ignored by the function, conditional on the value of other arguments. ranger::ranger() comes to mind, if I'm remembering right I think a number of arguments, like probability=TRUE are ignored unless other specific setting are chosen. It creates a lot dependence on the user parsing the docs carefully.

In cases like that, maybe it's a sign that the function should be split into two function apis, rather than having a big omnibus interface?

Expand full comment

Reply (2)

Hadley Wickham

Jul 28, 2023

I've been thinking of that as a slightly different problem because I think it's much more common. I have a few notes at https://design.tidyverse.org/arguments-independent.html

Expand full comment

Jorre Vannieuwenhuyze

Jul 30, 2023

Would such a principle not clash with function overloading, or do I understand you incorrectly?

Expand full comment

Adam Sadowski

Sep 8, 2023

For "You should be able to tell what affects the output of the function because all inputs are explicit arguments," I would be curious on your thoughts toward read_csv(). Why not an argument like "guess_column_types = c(FALSE, TRUE)"?

Expand full comment

Reply (1)

Hadley Wickham

Sep 8, 2023

If you don't guess the column types, you have to supply them. So I think that would make guess_colum_types redundant, because you can always tell if it would be TRUE or FALSE based on whether col_types was supplied or not. Or where you thinking of something else?

Expand full comment

Reply (1)

Adam Sadowski

Sep 8, 2023

Not if the default column type is character. (Apologies, my question should have included that.) Why not use such a default in hand with a guess_column_types argument?

Expand full comment

Reply (1)

Hadley Wickham

Sep 9, 2023

I guess you could probably design the interface that way. It feels like an alternative to the a big part of the current readr `read_` function interface which would need to be analysed from scratch.

Expand full comment

Robert DelRossi

Aug 3, 2023Edited

This is is *precisely* the book I’ve been hunting for a while now. Section 10, “Enumerate possible options” answers many questions I’ve had about the right way to address this topic ideally. I’ve kludged together approaches over the years, but this is so much more robust and elegant. Being able to capitalize on the experiences of building and refining popular tidyverse packages will be a tremendous value to us all!

Expand full comment

Reply (1)

Hadley Wickham

Aug 3, 2023

Thanks Robert!

Expand full comment

Hide Shidara

Jul 30, 2023

Take care not to suffer from Design Principle creep, some things only need a principle if they repeat a lot.

Expand full comment

Joseph Rickert

Jul 28, 2023

Among the arguments for the function fdapace::FCReg() are measurementError, default = TRUE and kern, default = “gauss”. FCReg() calls fdpace:::MvCov() which has arguments measurementError, default = TRUE, and kern (no default set). Is there a preferred pattern for this sort of thing? How “deep” should defaults go?

Expand full comment

Jon Harmon

Jul 28, 2023

After reading the linked chapters, do you have any suggestions for complex required arguments? I have some that have their own helper functions to help you keep track of everything that can go into them (to construct objects as part of the input to an api). Would you just reference those in the docs, put that function in as a default arg (possibly in a form such that it would itself throw an error if they didn't supply details), or something else?

Expand full comment

Reply (1)

Hadley Wickham

Jul 28, 2023

Are you thinking along the lines of the (rough draft) discussion in https://design.tidyverse.org/arguments-independent.html#creating-a-details-object ?

Expand full comment

Reply (1)

Jon Harmon

Jul 28, 2023

Yes! I started working out the idea while helping on the youtubeR package (see https://github.com/kevin-m-kent/youtubeR/blob/main/R/videos.R#L72 + https://github.com/kevin-m-kent/youtubeR/blob/main/R/zzz_schemas.R#L23 but I was already preparing for a large rewrite of that stuff even BEFORE reading these patterns 🙃)

In general in APIs, there are objects that follow the same schema across multiple endpoints, so I want a constructor for those objects rather than having the user supply them as individual arguments for each piece of that schema, so I want to make it easy for the user to find the helper function. But they're often required arguments in the endpoint-function, so theoretically they shouldn't have defaults. Importantly, I also don't think there's a simple version in a lot of these cases; it's not like sometimes I want "10" and sometimes I want a complicated object; I always want the complicated object.

I *think* I'm leaning toward setting constructor_helper() as the default, but then THAT would have non-default, required args, so the error message could explain things ~clearly. Maybe.

Expand full comment

Reply (1)

Hadley Wickham

Jul 31, 2023

Now also some details in https://design.tidyverse.org/argument-clutter.html

Expand full comment

Reply (1)

Jon Harmon

Jul 31, 2023

That makes sense. In my case, it's a collection of both required arguments and related optional arguments that describe a complex (required) object. I feel like the same idea applies, but it breaks the "required args shouldn't have defaults" pattern (the default would be an ~empty, failing call to the options function, I think).

Expand full comment

James Laird-Smith

Jul 28, 2023Edited

In `do.call()`, the `quote=` argument changes whether items in the list of the `args=` argument are quoted or not.

Expand full comment