Debugging in R

1 Introduction

2 Rationale

  • Why debugging?
    • We are all stupid at times
    • Sometimes, we are clever
    • Later, this cleverness makes us feel stupid

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? - Brian Kernighan and P. J. Plauger, The Elements of Programming Style

3 How?

  • print/message et al
  • browser
  • traceback
  • trace
  • trace on predicates
  • Condition handling more generally

4 Print/REPL debugging

  • Easy to start with it, intuitive
  • Can be used across many languages
    • In some languages, it is the only way :(
  • Pretty useless for larger functions
  • Makes for horrible, horrible code

5 Stupid Example

stupid_func <- function(data, indices) {
    train <- data[indices,]
    test <- data[indices]
    return(list(train, test))
}

  • Spot the problem?
  • I didn't (for more time than I'm comfortable admitting)

6 Print debugging

test <- sample(nrow(iris), 75, replace=FALSE)
mytest <- stupid_func(iris, test)
  • My `stupid_func` actually works well as an example, because none of the debugging methods are going to work well on this
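The print-debugging approach can be sketched as follows. This is only an illustration, not the original code: `debug_func` is a hypothetical name, and the `data[-indices, ]` line assumes the intent was a row-wise train/test split.

```r
# Print-debugging sketch: report the shape of each intermediate object.
debug_func <- function(data, indices) {
    train <- data[indices, ]
    message("train: ", nrow(train), " rows, ", ncol(train), " cols")
    # Assumption: the test set should be every row NOT in `indices`.
    test <- data[-indices, ]
    message("test: ", nrow(test), " rows, ", ncol(test), " cols")
    list(train = train, test = test)
}

set.seed(1)
idx <- sample(nrow(iris), 75, replace = FALSE)
res <- debug_func(iris, idx)
```

The messages make a wrong shape obvious at a glance, but they also have to be deleted again afterwards, which is the "horrible code" problem mentioned earlier.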

7 Getting Better: Browser

stupid_func <- function(data, indices) {
    browser()
    train <- data[indices,]
    test <- data[indices]
    return(list(train, test))
}
  • Browser will stop execution of the function at the point at which it is called
  • There are then a number of things you can do
    • Main commands:
      • n for next command
      • s for step into the next function
      • f for finish execution of current loop/function
      • c for continue execution (leave the browser)
      • Q for stop (exit debugging)
      • <Enter> to repeat the previous n or s
      • where to print out the current call stack
8 Five browser commands (No. 4 will astound you!)

  • n: moves to the next line
  • Q: leaves the browser without evaluating the rest of the function
  • c: leaves the browser and continues evaluating the function
  • s: steps into the function call at point
  • f: finishes execution of the current loop/function
  • Seriously though, number 4 means that you can move seamlessly from your code to the code you call, which is amazing (to me, at least ;)
  • Additionally, this works better with smaller functions

9 More realistic examples

parse_quote <- function(quote) {
    browser()  # pause here so we can step through interactively
    ## content() is assumed to be httr::content(); get_component() is a
    ## helper defined elsewhere that returns an extractor function
    quotecontents <- sapply(quote,
                            function(x) content(x))
    numrows <- length(quotecontents)
    numcols <- max(sapply(quotecontents, length))
    resmat <- matrix(data=NA, nrow=numrows, ncol=numcols)
    nameextractors <- tolower(names(quotecontents[[1]]))

    for(i in seq_along(nameextractors)) {
        fun <- get_component(
            component=nameextractors[i])
        part <- fun(x=quotecontents)
        resmat[seq_along(part),i] <- part
    }
    resmat
}

  • WTF?
  • I think I was trying to build a generic extractor for the responses returned from an API, but given the name, I definitely didn't start that way.
  • Let's use browser to see what the hell actually happens in this function
  • Conveniently, there's one already there :)

10 Laziness to a whole new level: trace

  • I could add some browser calls to the above function
  • That's going to be really annoying (especially when you run it in a script on a remote machine and it times out and it takes days to figure out what the hell even happened)
  • Of course, I would never do that
  • There's a better way: trace

11 Trace

  • trace called with only a function name reports each time that function is called
stupidfunc <- function(x, y) {res <- x + y}
stupiderfunc <- function(n) {
    res <- vector(length=n)
    for(i in 1:n) {
        res[i] <- stupidfunc(i, 2)
    }
}
trace(stupidfunc)
stupiderfunc(10)
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc
trace: stupidfunc

12 More Trace

  • Trace can be used for debugging your own functions
  • It can also be used to debug functions from packages
trace("ggplot", tracer=browser, where=asNamespace("ggplot2"))
  • This will allow you to step through the entire function
  • Upon reload of the function, this is removed
  • Can also be removed using untrace
untrace(stupidfunc)
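The outline mentions tracing on predicates. A minimal sketch, assuming the code runs at top level: the tracer is a quoted expression evaluated inside the traced function on entry, so it can test the arguments. `safe_log` is a hypothetical function, and message() stands in for browser() so the example runs non-interactively.

```r
safe_log <- function(x) log(abs(x))

# The tracer runs at function entry, in the function's frame, so it can see `x`.
# print = FALSE suppresses the default "Tracing ..." line on every call.
trace(safe_log,
      tracer = quote(if (x < 0) message("negative input: ", x)),
      print = FALSE)

out <- vapply(c(1, -2, 3), safe_log, numeric(1))

untrace(safe_log)
```

Replacing the message() call with browser() gives a breakpoint that only fires for the inputs you care about.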

13 Other Tricks

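  • If an error occurs in someone else's code, there is an easy way to inspect it
options(error=recover)  # enter recover() whenever an error occurs
options(error=NULL)     # restore the default behaviour afterwards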
  • This will then present you with a call stack and the ability to step into any of them using a number (or 0 to exit to the top-level)
  • This is really useful for errors which are sporadic, so you can see what data actually causes the error
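recover needs an interactive session. When that is not available, the call stack can still be recorded at the moment of the error. The following is a rough sketch of one way to do that with a calling handler; `inner_fn` and `outer_fn` are hypothetical functions made up for the illustration.

```r
inner_fn <- function(x) stop("bad value: ", x)
outer_fn <- function(x) inner_fn(x)

stack <- NULL
res <- tryCatch(
    withCallingHandlers(
        outer_fn(1),
        # Runs while the failing frames still exist, so sys.calls() sees them.
        error = function(e) stack <<- sys.calls()
    ),
    # The exiting handler then turns the error into an ordinary value.
    error = function(e) conditionMessage(e)
)

# First line of each recorded call, outermost first.
calls <- vapply(as.list(stack), function(cl) deparse(cl)[1], character(1))
```

`calls` can then be written to a log alongside the data that triggered the error.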

14 Handling errors

  • We don't always have the option of dropping into a REPL to debug
  • The code could be on a remote server
  • The code could be running on someone else's machine
  • You may wish to automate a set of reports/decisions in which case you definitely can't handle errors manually
  • This is a job for R's condition system

15 Try

  • The simplest way to do this is try
## a <- "a"
## 1+a
err <- try(1+a, silent=TRUE)
class(err)
[1] "try-error"

  • So now we can record each of the errors (and potentially take some action based on them)
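Recording errors across many inputs might look like the sketch below. The `inputs` list is made-up data for the illustration.

```r
inputs <- list(1, "a", 3)

# try() returns either the value or an object of class "try-error".
results <- lapply(inputs, function(x) try(log(x), silent = TRUE))

# Flag which inputs failed so they can be inspected or retried later.
failed <- vapply(results, inherits, logical(1), what = "try-error")
```

`inputs[failed]` then gives exactly the data that caused the errors.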

16 tryCatch

  • tryCatch is a more general form of try
  • Using this, we can take different actions based on what happened in the function
conditions <- function(code) {
    tryCatch(code,
             error=function(c) "error",
             warning=function(c) "warning",
             message=function(c) "message"
             )

}
conditions(stop(1+2))
conditions(warning(1+2))
conditions(message(1+2))
[1] "error"
[1] "warning"
[1] "message"

  • If the code is successful, the result is returned
  • Otherwise, the respective condition function is evaluated
  • So, for instance, if we were trying to get a bunch of webpages, then we could log errors (and potentially retry) and warnings, while using message to report on the progress which was made
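The log-and-retry idea could be sketched like this. `with_retry` and `make_flaky` are hypothetical helpers written for the illustration; the flaky function simulates a page fetch that fails twice before succeeding, so no network access is needed.

```r
with_retry <- function(f, max_tries = 3) {
    for (attempt in seq_len(max_tries)) {
        # Return the condition object itself on failure so we can inspect it.
        result <- tryCatch(f(), error = function(e) e)
        if (!inherits(result, "error")) return(result)
        message("attempt ", attempt, " failed: ", conditionMessage(result))
    }
    stop("all ", max_tries, " attempts failed")
}

# Simulated fetch: fails on the first two calls, succeeds on the third.
make_flaky <- function() {
    n <- 0
    function() {
        n <<- n + 1
        if (n < 3) stop("timeout")
        "page contents"
    }
}

page <- with_retry(make_flaky())
```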

17 Better Examples

message_handler <- function(m) message(m)
warning_handler <- function(w) warning(w)
error_handler <- function(e) simpleError(
                                 message="simple error",
                                 call=conditionCall(e))
ok_res <- tryCatch(expr=1+1,
         message=message_handler,
         warning=warning_handler,
         error=error_handler)

warning_res <- tryCatch(expr=as.integer(2^32+1),
         message=message_handler,
         warning=warning_handler,
         error=error_handler)

error_res <- tryCatch(expr=1+a,
         message=message_handler,
         warning=warning_handler,
         error=error_handler)


Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  NAs introduced by coercion to integer range

18 Finally

  • tryCatch also has an argument finally, an expression which is evaluated before control is handed away from the tryCatch block, whether or not an error occurred
  • This is normally most useful for writing out files and ensuring that connections are closed.
  • In general, when something always needs to happen, regardless of any errors, it should be in a finally block.
tryCatch({
    while(isTRUE(levok)) {
        ## ... loop body (the API calls) omitted ...
    }
},
error=function(e) message(conditionMessage(e)),
finally={
    ##because i starts at one
    statelist_done <- statelist[1:(i-1)]
    saveRDS(statelist_done,
            file=paste("statelist",
                       as.character(
                           Sys.time()),
                       args[1],
                       ".rds",
                       sep="_"))
    change_instance(first, "stop")})

  • This was code that hit an API and logged all of the data got in each session in a finally block. It also ensured that the connection to the API was closed.
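A smaller, self-contained sketch of the same pattern: a file connection stands in for the API connection, and the failure is simulated with stop().

```r
path <- tempfile(fileext = ".txt")
con <- file(path, open = "w")

res <- tryCatch({
    writeLines("partial results", con)
    stop("API went away")  # simulated failure part-way through
},
error = function(e) paste("caught:", conditionMessage(e)),
# Evaluated whether or not an error occurred, so the connection is always closed.
finally = close(con))

saved <- readLines(path)
```

The partial results survive the error because the connection is closed (and therefore flushed) in the finally expression.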

19 Custom Conditions

  • You can create custom conditions
  • They should inherit from error, warning or message if you want them to work
  • They must contain message and call components
# shamelessly stolen from Advanced R, Wickham (2014)
condition <- function(subclass, message, call = sys.call(-1), ...) {
  structure(
    class = c(subclass, "condition"),
    list(message = message, call = call, ...)
  )
}
is.condition <- function(x) inherits(x, "condition")
myerr <- condition("error", message="this is my error")
is.condition(myerr)

[1] TRUE
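The payoff of a custom condition is that handlers can match on its class and read extra fields. A minimal sketch follows; the `http_error` class, its `status` and `url` fields, and the example URL are all made up for the illustration.

```r
http_error <- function(status, url) {
    structure(
        class = c("http_error", "error", "condition"),
        list(message = paste0("HTTP ", status, " for ", url),
             call = NULL, status = status, url = url)
    )
}

res <- tryCatch(
    stop(http_error(404, "http://example.com/a.jpg")),
    # Matched first because the most specific class is listed first.
    http_error = function(e) paste("skipping", e$url, "status", e$status),
    error = function(e) "some other error"
)
```

Because the class vector also contains "error", code that only knows about generic errors still catches it.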

20 Example: getting loads of photos for CNN usage

  • Because deep learning is so hot right now
  • And because I suspect most of the benchmarks are horribly over-fitted
get_one_photo <- function(url, name) {
    download.file(url, destfile=name, mode="wb")
    message(paste("got ", url, " saved to ", name, sep=""))
}
get_some_photos <- function(list, id, folder) {
    ## seq_along() copes with an empty list, unlike 1:length(list)
    for (i in seq_along(list)) {
        nam <- paste0(folder, "/", id, "-", i, ".jpg")
        get_one_photo(list[i], name=nam)
    }
}
dir.create("photos_sample")

21 Explanation

  • We wrap download.file to fetch one URL and save it as a JPEG
  • We then call this function repeatedly to get all photos associated with a given row (the URLs are stored in a list-column)

22 Using messages to record state

log_results <- function(e) {
    if(!exists("num_processed")) {
        num_processed <<- 1
    }
    else {
        num_processed <<- num_processed + 1
    }
    ## store the message text; c() on a condition object would flatten it
    if(!exists("messagedf")) {
        messagedf <<- list(conditionMessage(e))
    }
    else {
        messagedf <<- c(messagedf, conditionMessage(e))
    }
}
  • We will log each URL processed
  • We can also log the number of URLs processed (just because, I guess)
  • Note the (normally a bad idea) use of global variables (<<-)

23 Warnings

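handle_warnings <- function(e) {
    message(conditionMessage(e))
    ## store the warning text rather than the whole condition object
    if(!exists("warning_vec")) {
        warning_vec <<- conditionMessage(e)
    }
    else {
        warning_vec <<- c(warning_vec, conditionMessage(e))
    }
}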

24 Putting it together

get_all_photos <- function(data) {
    for(j in 1:nrow(data)) {
        if(length(data$photos[[j]])==0) {
            next
        }
        else {
            tryCatch(expr={get_some_photos(
                               unlist(data$photos[j]),
                               id=data$listing_id[j],
                               folder="photos_sample")},
                     message=function(e) log_results(e),
                     warning=function(e) handle_warnings(e),
                     error=function(e) message(conditionMessage(e))
                     )
        }
    }

}

25 Recap

  • This is, I admit, definitely not best practice
  • But if the errors are independent and infrequent, it does have the advantage of working.
  • To make it better, we'll need to go a little further into R's condition system
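One reason to go further: a tryCatch handler is an exiting handler, so the first message it catches ends the wrapped expression. withCallingHandlers runs the handler and then lets the code carry on. The sketch below contrasts the two; `work` is a hypothetical stand-in for the photo-downloading loop.

```r
work <- function() {
    for (i in 1:3) message("got item ", i)
    "done"
}

# Exiting handler: the first message unwinds the stack, so work() never finishes.
seen_exit <- character()
res_exit <- tryCatch(
    work(),
    message = function(m) {
        seen_exit <<- c(seen_exit, conditionMessage(m))
        "interrupted"
    }
)

# Calling handler: log the message, muffle it, and resume where we left off.
seen_call <- character()
res_call <- withCallingHandlers(
    work(),
    message = function(m) {
        seen_call <<- c(seen_call, conditionMessage(m))
        invokeRestart("muffleMessage")
    }
)
```

With the exiting handler only one message is logged and the result is "interrupted"; with the calling handler all three are logged and `work()` returns "done".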

26 Conclusions

  • R has a variety of mechanisms for debugging
  • browser is quick and easy
  • options(error=recover) is useful, but annoying
  • trace allows you to debug any function
  • When you need to respond to unexpected events, use the condition system

27 References

  • Hadley Wickham, Advanced R (this chapter)
  • John Chambers, Software for Data Analysis (read all of it)
  • Peter Seibel, Beyond Exception Handling (translated to R by Hadley)

Author: richie

Created: 2020-04-29 Wed 11:45
