Debugging in R

1 Introduction

2 Rationale

Why debugging?
- We are all stupid at times
- Sometimes, we are clever
- Later, this cleverness makes us feel stupid

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? - Brian Kernighan and P. J. Plauger, The Elements of Programming Style

3 How?

print/message et al
browser
traceback
trace
trace on predicates
Condition handling more generally

4 Print/REPL debugging

Easy to start with it, intuitive
Can be used across many languages
- In some languages, it is the only way :(
Pretty useless for larger functions
Makes for horrible, horrible code

5 Stupid Example

stupid_func <- function(data, indices) {
    train <- data[indices,]
    test <- data[indices]
    return(list(train, test))
}

Spot the problem?
I didn't (for more time than I'm comfortable admitting)

6 Print debugging

test <- sample(nrow(iris), 75, replace=FALSE)
mytest <- stupid_func(iris, test)

My `stupid_func` actually works well as an example, because none of the debugging methods are going to work well on this
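
As a rough sketch of what print debugging looks like here, the copy below reports the dimensions of each piece as it is built. The names stupid_func_printed and toy are made up for this illustration; a square toy data frame is used so that both indexing forms succeed and the difference shows up in the printed shapes.

```r
# Hypothetical instrumented copy of stupid_func (name is ours, not the original's)
stupid_func_printed <- function(data, indices) {
    train <- data[indices, ]
    message("train: ", nrow(train), " rows x ", ncol(train), " cols")
    test <- data[indices]
    message("test: ", nrow(test), " rows x ", ncol(test), " cols")
    list(train, test)
}

# Toy 10 x 10 data frame, assumed only for this demonstration
toy <- as.data.frame(matrix(1:100, nrow = 10))
res <- stupid_func_printed(toy, 1:5)
```

The two messages report different shapes for the same indices, which is the hint that the second subset is not selecting rows.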

7 Getting Better: Browser

stupid_func <- function(data, indices) {
    browser()
    train <- data[indices,]
    test <- data[indices]
    return(list(train, test))
}

browser() stops execution of the function at the point where it is called
There are then a number of things you can do
- The main commands:
  - n for next statement
  - s for step into the next function call
  - f for finish execution of the current loop/function
  - Q for stop (exit debugging)
  - <Enter> for repeat the previous n or s command
  - where for print out the current call stack

8 Five browser commands (No. 4 will astound you!)

n: moves to the next line
Q: leaves the browser without evaluating the rest of the function
c: leaves the browser and continues evaluating the function
s: steps into the function call at point
f: finishes execution of the current loop/function
Seriously though, number 4 means that you can move seamlessly from your code to the code you call, which is amazing (to me, at least ;)
Additionally, this works better with smaller functions
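
One way to keep browser() from stopping on every call is its expr argument, which only opens the browser when the expression is true. The sketch below is a made-up example (find_first_negative is not from the original deck) that would stop only when a missing value turns up.

```r
# Hypothetical function used only to illustrate a conditional browser() call
find_first_negative <- function(x) {
    for (i in seq_along(x)) {
        # Enter the browser only for the suspicious case (a missing value)
        browser(expr = is.na(x[i]))
        if (x[i] < 0) return(i)
    }
    NA_integer_
}

# No NA in the input, so the browser is never entered
pos <- find_first_negative(c(3, 1, -2))
```

With an NA in the input, the call would pause at that iteration with i and x available for inspection.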

9 More realistic examples

parse_quote <- function(quote) {
    ## content() and get_component() are defined elsewhere (not shown)
    quotecontents <- sapply(quote,
                            function(x) content(x))
    numrows <- length(quotecontents)
    numcols <- max(sapply(quotecontents, length))
    resmat <- matrix(data=NA, nrow=numrows, ncol=numcols)
    nameextractors <- tolower(names(quotecontents[[1]]))

    for(i in seq_along(nameextractors)) {
        fun <- get_component(
            component=nameextractors[i])
        part <- fun(x=quotecontents)
        resmat[seq_along(part), i] <- part
    }
    resmat
}

WTF?
I think I was trying to build a generic extractor for the responses returned from an API, but given the name, I definitely didn't start that way.
Let's use browser to see what the hell actually happens in this function
Conveniently, there's one already there :)

10 Laziness to a whole new level: trace

I could add some browser calls to the above function
That's going to be really annoying (especially when you run it in a script on a remote machine and it times out and it takes days to figure out what the hell even happened)
Of course, I would never do that
there's a better way - trace

11 Trace

trace with just a function name reports each call to that function

stupidfunc <- function(x, y) {res <- x + y}
stupiderfunc <- function(n) {
    res <- vector(length=n)
    for(i in 1:n) {
        res[i] <- stupidfunc(i, 2)
    }
}
trace(stupidfunc)
stupiderfunc(10)

trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc

12 More Trace

trace can be used for debugging your own functions
It can also be used to debug functions from packages

trace("ggplot", tracer=browser, where=asNamespace("ggplot2"))

This drops you into the browser whenever the function is called, so you can step through all of it
Reloading the function removes the trace
It can also be removed using untrace

untrace(stupidfunc)
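
The tracer does not have to be browser. It can be any unevaluated expression, which is evaluated inside the traced function's frame, so a predicate can restrict it to the calls of interest. A minimal sketch, assuming a toy function add_two and a recording environment seen (both invented here):

```r
# Toy function and recording environment, both hypothetical
add_two <- function(x, y) x + y
seen <- new.env()

# The tracer runs on entry to add_two and can see its arguments;
# print = FALSE suppresses the default "Tracing ..." line on each call
trace(add_two,
      tracer = quote(if (x > 8) seen$x <- c(seen$x, x)),
      print = FALSE)

for (i in 1:10) add_two(i, 2)
untrace(add_two)

seen$x   # only the calls where x exceeded 8 were recorded
```

Swapping the recording expression for browser() inside the same if gives a breakpoint that fires only when the predicate holds.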

13 Other Tricks

If an error occurs in someone else's code, there is an easy way to inspect it

options(error=recover)   # open the recover menu whenever an error occurs
options(error=NULL)      # restore the default behaviour

This will then present you with the call stack and the ability to step into any frame using its number (or 0 to exit to the top-level)
This is really useful for errors which are sporadic, so you can see what data actually causes the error
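
recover needs someone at the keyboard. For unattended scripts, a variant along the lines of the example in the dump.frames help page saves the call stack to disk instead, so it can be inspected afterwards. This is a sketch of that configuration, not something the original deck used.

```r
# On error: write the call stack to last.dump.rda in the working directory,
# then quit with a non-zero exit status
options(error = quote({
    dump.frames(to.file = TRUE)
    q(status = 1)
}))

# Later, in an interactive session started in the same directory:
#   load("last.dump.rda")
#   debugger(last.dump)
```

The debugger then offers the same numbered frame menu, working from the saved frames rather than a live session.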

14 Handling errors

We don't always have the option of dropping into a REPL to debug
The code could be on a remote server
The code could be running on someone else's machine
You may wish to automate a set of reports/decisions in which case you definitely can't handle errors manually
This is a job for R's condition system

15 Try

The simplest way to do this is try

## a <- "a"
## 1+a

err <- try(1+a, silent=TRUE)
class(err)

[1] "try-error"

So now we can record each of the errors (and potentially take some action based on them)
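
A small sketch of recording errors this way: run the same operation over several inputs, keep going when one fails, and check afterwards which results are errors. The inputs are made up for the illustration.

```r
# Toy inputs; the second one cannot be added to a number
inputs <- list(1, "a", 3)

# try() returns either the value or an object of class "try-error"
results <- lapply(inputs, function(x) try(x + 1, silent = TRUE))

# Flag the failures so they can be logged or retried
failed <- vapply(results, inherits, logical(1), what = "try-error")
failed
```

The error message itself is kept in the try-error object, so cat(results[[2]]) shows what went wrong.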

16 tryCatch

tryCatch is a more general form of try
Using this, we can take different actions based on what happened in the function

conditions <- function(code) {
    tryCatch(code,
             error=function(c) "error",
             warning=function(c) "warning",
             message=function(c) "message"
             )

}
conditions(stop(1+2))
conditions(warning(1+2))
conditions(message(1+2))

[1] "error"
[1] "warning"
[1] "message"

If the code is successful, the result is returned
Otherwise, the respective condition function is evaluated
So, for instance, if we were trying to get a bunch of webpages, then we could log errors (and potentially retry) and warnings, while using message to report on the progress which was made
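
The log-and-retry idea can be sketched as below. fetch_with_retry and flaky_fetch are invented for this example, and flaky_fetch stands in for a real download function so that the block runs without a network connection.

```r
# Hypothetical helper: call fetch(url), retrying up to max_tries times
fetch_with_retry <- function(fetch, url, max_tries = 3) {
    for (attempt in seq_len(max_tries)) {
        result <- tryCatch(
            fetch(url),
            error = function(e) {
                # Log the failure and signal "no result" to the loop
                message("attempt ", attempt, " failed for ", url, ": ",
                        conditionMessage(e))
                NULL
            }
        )
        if (!is.null(result)) return(result)
    }
    NULL
}

# Stand-in for a real fetcher: fails twice, then succeeds
state <- new.env()
state$calls <- 0
flaky_fetch <- function(url) {
    state$calls <- state$calls + 1
    if (state$calls < 3) stop("timeout")
    paste("contents of", url)
}

page <- fetch_with_retry(flaky_fetch, "http://example.com")
```

Returning NULL from the error handler is a design choice for this sketch; it would not work for a fetcher whose legitimate result can be NULL.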

17 Better Examples

message_handler <- function(m) message(m)
warning_handler <- function(w) warning(w)
error_handler <- function(e) simpleError(
                                 message="simple error",
                                 call=conditionCall(e))
ok_res <- tryCatch(expr=1+1,
         message=message_handler,
         warning=warning_handler,
         error=error_handler)

warning_res <- tryCatch(expr=as.integer(2^32+1),
         message=message_handler,
         warning=warning_handler,
         error=error_handler)

error_res <- tryCatch(expr=1+a,
         message=message_handler,
         warning=warning_handler,
         error=error_handler)

 Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  NAs introduced by coercion to integer range

18 Finally

tryCatch also has an argument finally, an expression which is evaluated before control is handed away from the tryCatch block
This is normally most useful for writing out files and ensuring that connections are closed.
In general, when something always needs to happen, regardless of any errors, it should be in a finally block.

tryCatch({
    while(isTRUE(levok)) {
        ## loop body not shown
    }
},
error=function(e) message(conditionMessage(e)),
finally={
    ## because i starts at one
    statelist_done <- statelist[1:(i-1)]
    saveRDS(statelist_done,
            file=paste("statelist",
                       as.character(Sys.time()),
                       args[1],
                       ".rds",
                       sep="_"))
    change_instance(first, "stop")
})

This was code that hit an API and, in a finally block, saved all of the data collected in each session. It also ensured that the connection to the API was closed.
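
A smaller, self-contained sketch of the same pattern: whatever happens in the main expression, the finally expression closes the connection, so the partial output reaches the disk. The temporary file and the deliberate error are assumptions made for the demonstration.

```r
# Temporary file used only for this illustration
path <- tempfile(fileext = ".txt")
con <- file(path, open = "w")

result <- tryCatch({
    writeLines("partial results", con)
    stop("something broke")          # simulate a failure mid-run
},
error = function(e) conditionMessage(e),
finally = close(con))                # runs whether or not an error occurred

readLines(path)
```

The error handler runs first and supplies the value of the tryCatch call; the finally expression runs afterwards and its value is discarded.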

19 Custom Conditions

You can create custom conditions
They should inherit from error, warning or message if you want them to work
They must contain message and call components

# shamelessly stolen from Advanced R, Wickham (2014)
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    list(message = message, call = call, ...)
  )
}
is.condition <- function(x) inherits(x, "condition")
myerr <- condition("error", message="this is my error")
is.condition(myerr)

[1] TRUE
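
The point of a custom class is that handlers can match on it. The sketch below defines a made-up http_error condition (the name and the status field are illustrative, not from the deck) and catches it separately from ordinary errors.

```r
# Hypothetical constructor for a custom error class carrying extra data
http_error <- function(msg, status) {
    structure(
        class = c("http_error", "error", "condition"),
        list(message = msg, call = sys.call(-1), status = status)
    )
}

outcome <- tryCatch(
    stop(http_error("not found", status = 404)),
    # Matched first because the condition inherits from "http_error"
    http_error = function(e) paste("HTTP failure with status", e$status),
    # Fallback for any other error
    error = function(e) "some other error"
)
outcome
```

Because the class vector also contains "error", code that only knows about plain errors still catches the condition.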

20 Example: getting loads of photos for CNN usage

Because deep learning is so hot right now
And because I suspect most of the benchmarks are horribly over-fitted

get_one_photo <- function(url, name) {
    download.file(url, destfile=name, mode="wb")
    message(paste("got ", url, " saved to ", name, sep=""))
}
get_some_photos <- function(list, id, folder) {
    for (i in seq_along(list)) {
        nam <- paste0(folder, "/", id, "-", i, ".jpg")
        get_one_photo(list[i], name=nam)
    }
}
dir.create("photos_sample")

21 Explanation

We wrap download.file to fetch one URL and save it as a JPEG
We then call this function repeatedly to get all photos associated with a given row (the URLs are stored in a list-column)

22 Using messages to record state

log_results <- function(e) {
    if(!exists("num_processed")) {
        num_processed <<- 1
    }
    else {
        num_processed <<- num_processed + 1
    }
    if(!exists("messagedf")) {
        ## start the log with the first message rather than an empty slot
        messagedf <<- list(e)
    }
    else {
        ## wrap in list() so each condition stays one element
        messagedf <<- c(messagedf, list(e))
    }
}

We will log each URL processed
We can also log the number of URLs processed (just because, I guess)
Note the (normally a bad idea) use of global variables (<<-)
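
If the globals are bothersome, one alternative is to keep the log inside a closure. This is a sketch of that idea rather than what the original code did; make_logger and its fields are invented names.

```r
# Hypothetical logger factory: the log lives in the closure, not in globalenv
make_logger <- function() {
    entries <- list()
    list(
        log = function(cond) {
            # <<- here modifies `entries` in the enclosing function's frame
            entries[[length(entries) + 1]] <<- conditionMessage(cond)
            invisible(NULL)
        },
        count = function() length(entries),
        messages = function() unlist(entries)
    )
}

logger <- make_logger()
logger$log(simpleMessage("got a.jpg\n"))
logger$log(simpleMessage("got b.jpg\n"))
logger$count()
```

logger$log can be passed directly as a message handler, and each logger keeps its own independent log.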

23 Warnings

handle_warnings <- function(e) {
    message(conditionMessage(e))
    if(!exists("warning_vec")) {
        warning_vec <<- list(e)
    }
    else {
        ## wrap in list() so each warning stays one element
        warning_vec <<- c(warning_vec, list(e))
    }
}

24 Putting it together

get_all_photos <- function(data) {
    for(j in seq_len(nrow(data))) {
        if(length(data$photos[[j]])==0) {
            next
        }
        else {
            tryCatch(expr={get_some_photos(
                               unlist(data$photos[j]),
                               id=data$listing_id[j],
                               folder="photos_sample")},
                     message=function(e) log_results(e),
                     warning=function(e) handle_warnings(e),
                     error=function(e) message(conditionMessage(e))
                     )
        }
    }
}

25 Recap

This is, I admit, definitely not best practice
But if the errors are independent and infrequent, it does have the advantage of working.
To make it better, we'll need to go a little further into R's condition system
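
One direction this could take: tryCatch handlers are exiting, so the first message or warning abandons the rest of the wrapped call. withCallingHandlers runs the handler at the point where the condition is signalled and then lets the code carry on. A minimal sketch with a stand-in download_all function (no real downloads; all names here are hypothetical):

```r
# Stand-in for the photo-fetching loop: warns on "bad" URLs, messages on each
download_all <- function(urls) {
    for (u in urls) {
        if (grepl("bad", u)) warning("could not fetch ", u)
        message("got ", u)
    }
    "finished"
}

record <- new.env()
record$messages <- character(0)
record$warnings <- character(0)

status <- withCallingHandlers(
    download_all(c("a.jpg", "bad.jpg", "c.jpg")),
    message = function(m) {
        record$messages <- c(record$messages, conditionMessage(m))
        invokeRestart("muffleMessage")   # resume after the message() call
    },
    warning = function(w) {
        record$warnings <- c(record$warnings, conditionMessage(w))
        invokeRestart("muffleWarning")   # resume after the warning() call
    }
)
```

The loop runs to completion and every message and warning is recorded, whereas the tryCatch version would stop at the first one.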

26 Conclusions

R has a variety of mechanisms for debugging
browser is quick and easy
options(error=recover) is useful, but annoying
trace allows you to debug any function
When you need to respond to unexpected events, use the condition system

27 References

Hadley Wickham, Advanced R (this chapter)
John Chambers, Software for Data Analysis (read all of it)
Peter Seibel, Beyond Exception Handling (translated to R by Hadley)