Easy error handling in R with purrr’s possibly
It’s frustrating to see your code choke partway through while trying to apply a function in R. You may know that something in one of those objects caused a problem, but how do you track down the offender?
The purrr package’s possibly() function is one easy way.
In this example, I’ll demo code that imports multiple CSV files. Most files’ Value columns import as characters, but one of them comes in as numbers. Running a function that expects characters as input will cause an error.
For setup, the code below loads several libraries I need and then uses base R’s list.files() function to return a sorted vector with names of all the files in my data directory.
library(purrr)
library(readr)
library(rio)
library(dplyr)
my_data_files <- list.files("data_files", full.names = TRUE) %>%
  sort()
I can then import the first file and look at its structure.
x <- rio::import("data_files/file1.csv")
str(x)
'data.frame': 3 obs. of 3 variables:
 $ Category     : chr "A" "B" "C"
 $ Value        : chr "$4,256.48 " "$438.22" "$945.12"
 $ MonthStarting: chr "12/1/20" "12/1/20" "12/1/20"
Both the Value and MonthStarting columns are importing as character strings. What I ultimately want is Value as numbers and MonthStarting as dates.
I sometimes deal with issues like this by writing a small function, such as the one below, to make changes in a file after import. It uses dplyr’s transmute() to create a new Month column from MonthStarting as Date objects, and a new Total column from Value as numbers. I also make sure to keep the Category column (transmute() drops all columns not explicitly mentioned).
library(dplyr)
library(lubridate)
# Import one file and standardize its columns
process_file <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),  # month/day/year text to Date
      Total = readr::parse_number(Value)      # expects character input
    )
}
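To see why Category has to be listed explicitly, here is a minimal sketch on a made-up data frame. The `toy` object and its values are assumptions for illustration only, not one of the article’s files.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for an imported file
toy <- data.frame(
  Category = c("A", "B"),
  Value = c("$1.50", "$2,000"),
  MonthStarting = c("12/1/20", "12/1/20")
)

# Category is named, so it survives alongside the new Total column
kept <- transmute(toy,
  Category = as.character(Category),
  Total = parse_number(Value)
)
names(kept)

# Category is not named, so transmute() drops it
dropped <- transmute(toy, Total = parse_number(Value))
names(dropped)
```

Comparing the two sets of names shows that only columns created or named inside transmute() are returned.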
I like to use readr’s parse_number() function for converting values that come in as character strings because it deals with commas, dollar signs, or percent signs in numbers. However, parse_number() requires character strings as input. If a value is already a number, parse_number() will throw an error.
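As a quick illustration of that behavior, here is a small sketch. The first two strings come from file1.csv above; the "12%" value is made up for the example. The numeric call is wrapped in tryCatch() so the script keeps running, and the outcome of that call may depend on the readr version installed.

```r
library(readr)

# Character input: the currency symbol, comma, trailing space, and percent sign are ignored
raw_values <- c("$4,256.48 ", "$438.22", "12%")
parsed <- parse_number(raw_values)
parsed

# Numeric input: the article reports an error here, so catch it instead of stopping
numeric_attempt <- tryCatch(
  parse_number(c(3738, 723, 5494)),
  error = function(e) paste("parse_number() failed:", conditionMessage(e))
)
numeric_attempt
```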
My new function works fine when I test it on the first two files in my data directory using purrr’s map_df() function.
my_results <- map_df(my_data_files[1:2], process_file)
But if I try running my function on all the files, including the one where Value imports as numbers, it will choke.
all_results <- map_df(my_data_files, process_file)
Error: Problem with `mutate()` input `Total`.
x is.character(x) is not TRUE
ℹ Input `Total` is `readr::parse_number(Value)`.
Run `rlang::last_error()` to see where the error occurred.
That error tells me the input for Total is not a character column in one of the files, but I’m not sure which one. Ideally, I’d like to run through all the files, marking the one(s) with problems as errors but still processing all of them instead of stopping at the error.
possibly() lets me do this by creating a brand new function from my original function:
safer_process_file <- possibly(process_file, otherwise = "Error in file")
The first argument for possibly() is my original function, process_file. The second argument, otherwise, tells possibly() what to return if there’s an error.
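The same pattern can be tried on a toy function before pointing it at real files. In this sketch, `half()` and `safe_half()` are hypothetical names invented for the example; only possibly() and its otherwise argument come from purrr.

```r
library(purrr)

# Hypothetical function that fails on non-numeric input
half <- function(x) {
  if (!is.numeric(x)) stop("input must be numeric")
  x / 2
}

# Wrapped version: returns NA instead of stopping when half() errors
safe_half <- possibly(half, otherwise = NA_real_)

ok_result <- safe_half(10)      # half() succeeds, so its result is returned
bad_result <- safe_half("ten")  # half() errors, so the otherwise value is returned
```

Choosing an otherwise value of a different type than the normal result, as the article does with a character string, makes failures easy to spot later.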
To apply my new safer_process_file() function to all my files, I’ll use the map() function and not purrr’s map_df() function. That’s because the results need to come back as a list, not a data frame. And that’s because if there’s an error, those error results won’t be a data frame; they’ll be the character string that I told otherwise to generate.
all_results <- map(my_data_files, safer_process_file)
str(all_results, max.level = 1)
List of 5
 $ :'data.frame': 3 obs. of 3 variables:
 $ :'data.frame': 3 obs. of 3 variables:
 $ :'data.frame': 3 obs. of 3 variables:
 $ : chr "Error in file"
 $ :'data.frame': 3 obs. of 3 variables:
You can see here that the fourth item, from my fourth file, is the one with the error. That’s easy to see with only five items, but wouldn’t be quite so easy if I had a thousand files to import and three had errors.
If I name the list with my original file names, it’s easier to identify the problem file:
names(all_results) <- my_data_files
str(all_results, max.level = 1)
List of 5
 $ data_files/file1.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file2.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file3.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file4.csv: chr "Error in file"
 $ data_files/file5.csv:'data.frame': 3 obs. of 3 variables:
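With many files, scanning str() output by eye gets tedious. One possible approach, sketched here on a made-up named list rather than the article’s actual results, is to test each element with is.data.frame() and pull out the names that fail. The object names and values below are assumptions for illustration.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for a named list returned by map() with a possibly()-wrapped function
demo_results <- list(
  "data_files/file1.csv" = data.frame(Category = "A", Total = 1),
  "data_files/file2.csv" = "Error in file",
  "data_files/file3.csv" = data.frame(Category = "B", Total = 2)
)

# TRUE for elements that imported cleanly, FALSE for the otherwise value
is_ok <- map_lgl(demo_results, is.data.frame)

# Names of the files that need a closer look
problem_files <- names(demo_results)[!is_ok]

# Combine only the successful imports into one data frame
good_rows <- bind_rows(demo_results[is_ok])
```

Here `problem_files` holds the file names to inspect by hand, while `good_rows` holds everything that did import.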
I can even save the results of str() to a text file for further examination.
str(all_results, max.level = 1) %>%
  capture.output(file = "results.txt")
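The same idea can also be written without the pipe, passing the str() call directly to capture.output(). This is a sketch on a made-up list, writing to a temporary file rather than to the working directory; the object names are assumptions.

```r
# Hypothetical results list with one failed import
demo_results <- list(
  a = data.frame(x = 1:3),
  b = "Error in file"
)

# Write the str() summary to a temporary text file
out_file <- tempfile(fileext = ".txt")
capture.output(str(demo_results, max.level = 1), file = out_file)

# Read the summary back in to confirm what was saved
summary_lines <- readLines(out_file)
summary_lines
```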
Now that I know file4.csv is the problem, I can import just that one and confirm what the issue is.
x4 <- rio::import(my_data_files[4])
str(x4)
'data.frame': 3 obs. of 3 variables:
 $ Category     : chr "A" "B" "C"
 $ Value        : num 3738 723 5494
 $ MonthStarting: chr "9/1/20" "9/1/20" "9/1/20"
Ah, Value is indeed coming in as numeric. I’ll revise my process_file() function to account for the possibility that Value isn’t a character string, using an if ... else check. (is.character(Value) returns a single TRUE or FALSE for the whole column, so the vectorized ifelse() would return only the first value here.)
process_file2 <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      # Parse only when Value arrived as character; keep numeric input as-is
      Total = if (is.character(Value)) readr::parse_number(Value) else Value
    )
}
Now if I use purrr’s map_df() with my new process_file2() function, it should run and give me a single data frame.
all_results2 <- map_df(my_data_files, process_file2)
str(all_results2)
'data.frame': 15 obs. of 3 variables:
 $ Category: chr "A" "B" "C" "A" ...
 $ Month   : Date, format: "2020-12-01" "2020-12-01" "2020-12-01" ...
 $ Total   : num 4256 438 945 3156 ...
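One detail worth checking in a type test like the one in process_file2(): is.character() returns a single TRUE or FALSE for the whole column, and base R’s ifelse() returns a result the same length as its test. The sketch below compares ifelse() with a plain if ... else on the Value strings from file1.csv; the variable names are made up for the example.

```r
library(readr)

value_chr <- c("$4,256.48 ", "$438.22", "$945.12")

# ifelse() with a length-one test keeps only the first parsed value
with_ifelse <- ifelse(is.character(value_chr), parse_number(value_chr), value_chr)
length(with_ifelse)

# if ... else returns the whole parsed vector
with_if <- if (is.character(value_chr)) parse_number(value_chr) else value_chr
length(with_if)
```

Inside transmute(), a length-one result gets recycled down the column, which would show up as the same Total repeated for every row of a file.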
That’s just the data and format I wanted, thanks to wrapping my original function in possibly() to create a new, error-handling function.
For more R tips, head to the “Do More With R” page on InfoWorld or check out the “Do More With R” YouTube playlist.
Copyright © 2020 IDG Communications, Inc.