Easy error handling in R with purrr’s possibly

It’s frustrating to see your code choke part of the way through while trying to apply a function in R. You may know that something in one of those objects caused a problem, but how do you track down the culprit?

The purrr package’s possibly() function is one easy way.

In this example, I’ll demo code that imports multiple CSV files. Most files’ Value columns import as characters, but one of them comes in as numbers. Running a function that expects characters as input will cause an error.

For setup, the code below loads several libraries I need and then uses base R’s list.files() function to return a sorted vector with the names of all the files in my data directory.

library(purrr)
library(readr)
library(rio)
library(dplyr)
# Full paths to every file in the data_files directory, in sorted order
my_data_files <- list.files("data_files", full.names = TRUE) %>%
  sort()
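If the directory ever holds more than CSVs, list.files() also accepts a pattern argument (a regular expression). The sketch below is hypothetical: it builds a throwaway folder under tempdir() with made-up file names and contents only to show the call, and is not part of the article’s data.

```r
# Hypothetical demo folder; names and contents are made up for illustration
demo_dir <- file.path(tempdir(), "demo_data_files")
dir.create(demo_dir, showWarnings = FALSE)

# Three small CSVs plus one non-CSV file that the pattern should skip
for (i in 1:3) {
  write.csv(
    data.frame(Category = c("A", "B", "C"), Value = c("$1.00", "$2.50", "$3.75")),
    file.path(demo_dir, paste0("file", i, ".csv")),
    row.names = FALSE
  )
}
writeLines("not data", file.path(demo_dir, "README.txt"))

# Keep only names ending in .csv, as full paths, in sorted order
demo_files <- sort(list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE))
print(basename(demo_files))
```

list.files() already returns names in alphabetical order, so sort() mostly makes the intent explicit. Note that alphabetical order puts file10.csv before file2.csv if the numbering ever reaches two digits.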

I can then import the first file and look at its structure.

x <- rio::import("data_files/file1.csv")
str(x)
'data.frame':	3 obs. of  3 variables:
 $ Category     : chr  "A" "B" "C"
 $ Value        : chr  "$4,256.48 " "$438.22" "$945.12"
 $ MonthStarting: chr  "12/1/20" "12/1/20" "12/1/20"

Both the Value and MonthStarting columns are importing as character strings. What I ultimately want is Value as numbers and MonthStarting as dates.
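If you already know which types you want, another option is to declare column types at import time so every file arrives the same way. This sketch swaps in readr::read_csv() for rio::import() and reads every column as character; the two-row file is made up, modeled on the numeric Value column shown later for file4.csv.

```r
library(readr)

# Made-up file whose Value column would normally be guessed as numeric
tmp <- tempfile(fileext = ".csv")
writeLines(c("Category,Value,MonthStarting", "A,3738,9/1/20", "B,723,9/1/20"), tmp)

# Force every column to character so later parsing always sees text
df <- read_csv(tmp, col_types = cols(.default = col_character()))
str(df$Value)
```

The trade-off is that every column then needs explicit conversion afterward, which a function like process_file() would do anyway.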

I sometimes deal with issues like this by writing a small function, such as the one below, to make changes in a file after import. It uses dplyr’s transmute() to create a new Month column from MonthStarting as Date objects, and a new Total column from Value as numbers. I also make sure to keep the Category column (transmute() drops all columns not explicitly mentioned).

library(dplyr)
library(lubridate)
process_file <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      # "12/1/20" style strings become Date objects
      Month = lubridate::mdy(MonthStarting),
      # "$4,256.48" style strings become numbers
      Total = readr::parse_number(Value)
    )
}
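To check the transformation logic without touching disk, the same transmute() step can run on a small data frame shaped like file1.csv. The values below are copied from the str() output of file1.csv; the helper name clean_values() is hypothetical and simply drops the import step from process_file().

```r
library(dplyr)
library(lubridate)
library(readr)

# A data frame shaped like file1.csv, with every column as character
raw <- data.frame(
  Category = c("A", "B", "C"),
  Value = c("$4,256.48 ", "$438.22", "$945.12"),
  MonthStarting = c("12/1/20", "12/1/20", "12/1/20"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the transmute() step of process_file(), minus the import
clean_values <- function(df) {
  df %>%
    transmute(
      Category = as.character(Category),
      Month = mdy(MonthStarting),   # month/day/two-digit-year strings to Date
      Total = parse_number(Value)   # drops "$", grouping commas, and whitespace
    )
}

cleaned <- clean_values(raw)
str(cleaned)
```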

I like to use readr’s parse_number() function for converting values that come in as character strings because it deals with commas, dollar signs, or percent signs in numbers. However, parse_number() requires character strings as input. If a value is already a number, parse_number() will throw an error.
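A quick way to see both behaviors described above: character input with currency symbols, commas, and percent signs parses cleanly, while numeric input is rejected. The sample strings are made up for illustration.

```r
library(readr)

# Character input: "$", grouping commas, "%" and trailing text are dropped
parsed <- parse_number(c("$4,256.48", "45%", "1,000 units"))
parsed

# Numeric input raises an error instead of passing through
numeric_fails <- tryCatch(
  {
    parse_number(3738)
    FALSE
  },
  error = function(e) TRUE
)
numeric_fails
```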

My new function works fine when I test it on the first two files in my data directory using purrr’s map_df() function.

my_results <- map_df(my_data_files[1:2], process_file)
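In purrr 1.0.0 and later, map_df() is superseded, and the purrr documentation points to map() followed by list_rbind() instead. A minimal sketch, assuming purrr 1.0.0 or newer, with two made-up in-memory data frames standing in for imported files and a trivial transformation standing in for process_file():

```r
library(purrr)

# Two made-up data frames standing in for two imported files
pieces <- list(
  data.frame(Category = c("A", "B"), Total = c(1, 2)),
  data.frame(Category = c("C", "D"), Total = c(3, 4))
)

# map() returns a list of data frames; list_rbind() stacks them into one
combined <- list_rbind(map(pieces, function(df) transform(df, Total = Total * 10)))
combined
```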

But if I try running my function on all the files, including the one where Value imports as numbers, it will choke.

all_results <- map_df(my_data_files, process_file)
 Error: Problem with `mutate()` input `Total`.
x is.character(x) is not TRUE
ℹ Input `Total` is `readr::parse_number(Value)`.
Run `rlang::last_error()` to see where the error occurred.

That error tells me that Value, the input for Total, isn’t a character column in one of the files, but I’m not sure which one. Ideally, I’d like to run through all the files, marking the one(s) with problems as errors but still processing all of them instead of stopping at the error.
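For comparison, the same “keep going and flag the failure” behavior can be written with base R’s tryCatch(). This is a swapped-in base R technique, not the approach this article uses; to_total() is a hypothetical stand-in for a function that fails on some inputs, and the inputs are made up.

```r
library(readr)

# Hypothetical stand-in: errors whenever x is not character
to_total <- function(x) parse_number(x)

inputs <- list(c("$1.00", "$2.50"), c(3, 4), "$7.25")

# Each error becomes a marker string instead of stopping the loop
results <- lapply(inputs, function(x) {
  tryCatch(to_total(x), error = function(e) "Error in input")
})
str(results)
```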

possibly() lets me do this by creating a brand-new function from my original function:

safer_process_file <- possibly(process_file, otherwise = "Error in file")

The first argument for possibly() is my original function, process_file. The second argument, otherwise, tells possibly() what to return if there’s an error.
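Here is possibly() on a toy function, to isolate what the wrapper does. log() errors on character input, so the wrapped version returns the otherwise value for that element and carries on. The inputs are made up.

```r
library(purrr)

# possibly() returns a new function; errors become NA instead of stopping
safe_log <- possibly(log, otherwise = NA_real_)

# Because otherwise is a number here, map_dbl() can collect the results
out <- map_dbl(list(1, "a", exp(2)), safe_log)
out
```

possibly() stays silent about the errors it swallows by default; setting quiet = FALSE makes it print each error message as it happens.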

To apply my new safer_process_file() function to all my files, I’ll use purrr’s map() function rather than map_df(). That’s because the results need to go into a list, not a data frame: if there’s an error, that file’s result won’t be a data frame, it’ll be the character string I told otherwise to return.
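Once the results are in a list, the successful imports can still be combined by filtering out the placeholder strings first. A minimal sketch with a made-up results list, using purrr’s keep() and dplyr’s bind_rows():

```r
library(purrr)
library(dplyr)

# Made-up results: two data frames and one placeholder string from possibly()
results <- list(
  data.frame(Category = "A", Total = 1),
  "Error in file",
  data.frame(Category = "B", Total = 2)
)

# Keep only the elements that are data frames, then stack them
good <- bind_rows(keep(results, is.data.frame))
good
```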

all_results <- map(my_data_files, safer_process_file)
str(all_results, max.level = 1)
List of 5
 $ :'data.frame':3 obs. of  3 variables:
 $ :'data.frame':3 obs. of  3 variables:
 $ :'data.frame':3 obs. of  3 variables:
 $ : chr "Error in file"
 $ :'data.frame':3 obs. of  3 variables:

You can see here that the fourth item, from my fourth file, is the one with the error. That’s easy to see with only five items, but it wouldn’t be quite so easy if I had a thousand files to import and three had errors.
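With many files, the failing positions can be found programmatically instead of by eye. This sketch checks each element with is.data.frame() via purrr’s map_lgl(); the results list is made up.

```r
library(purrr)

# Made-up results: the third element is the placeholder from possibly()
results <- list(data.frame(x = 1), data.frame(x = 2), "Error in file", data.frame(x = 4))

# Positions of every element that is not a data frame
failed <- which(!map_lgl(results, is.data.frame))
failed
```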

If I name the list with my original file names, it’s easier to identify the problem file:

names(all_results) <- my_data_files
str(all_results, max.level = 1) 
List of 5
 $ data_files/file1.csv:'data.frame':	3 obs. of  3 variables:
 $ data_files/file2.csv:'data.frame':	3 obs. of  3 variables:
 $ data_files/file3.csv:'data.frame':	3 obs. of  3 variables:
 $ data_files/file4.csv: chr "Error in file"
 $ data_files/file5.csv:'data.frame':	3 obs. of  3 variables:
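An alternative is to name the input vector before mapping, since map() keeps the names of its input. purrr re-exports set_names(), which uses the values themselves as names when called with one argument. The file paths and fake_process() below are hypothetical stand-ins for my_data_files and safer_process_file().

```r
library(purrr)

# Hypothetical paths and a stand-in for safer_process_file()
files <- c("data_files/file1.csv", "data_files/file2.csv")
fake_process <- function(path) nchar(path)

# Names carry through map(), so each result is labeled by its file
res <- map(set_names(files), fake_process)
names(res)
```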

I can even save the results of str() to a text file for further analysis.

# capture.output() runs str() and writes what it prints to results.txt
capture.output(str(all_results, max.level = 1), file = "results.txt")
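If only the problem files matter, writing just their names may be handier than the full str() output. A sketch with a made-up named results list; keep() preserves names, so names() returns the failing files.

```r
library(purrr)

# Made-up named results, as produced after naming the list by file
all_res <- list(`file1.csv` = data.frame(x = 1), `file4.csv` = "Error in file")

# Names of the elements that are placeholder strings
problem_files <- names(keep(all_res, is.character))

out_file <- file.path(tempdir(), "problem_files.txt")
writeLines(problem_files, out_file)
readLines(out_file)
```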

Now that I know file4.csv is the problem, I can import just that one and confirm what the issue is.

x4 <- rio::import(my_data_files[4])
str(x4)
'data.frame':	3 obs. of  3 variables:
 $ Category     : chr  "A" "B" "C"
 $ Value        : num  3738 723 5494
 $ MonthStarting: chr  "9/1/20" "9/1/20" "9/1/20"
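If you want each failure’s error message and not just a placeholder, purrr’s safely(), a sibling of possibly(), wraps a function so every call returns a list with result and error components, one of which is NULL. This is a different function from the one the article uses; the inputs below are made up, with the second mimicking a numeric Value column.

```r
library(purrr)
library(readr)

# safely() returns list(result = ..., error = ...) for every call
safer_parse <- safely(parse_number)

inputs <- list(c("$1.00", "$2.50"), c(3738, 723))
out <- map(inputs, safer_parse)

# Pull out the error message for each input, or NA when it succeeded
msgs <- map_chr(out, function(r) {
  if (is.null(r$error)) NA_character_ else conditionMessage(r$error)
})
msgs
```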

Ah, Value is indeed coming in as numeric. I’ll revise my process_file() function to account for the possibility that Value isn’t a character string, with an if/else check. A scalar if/else fits better than ifelse() here: is.character() returns a single TRUE or FALSE, so ifelse() would return just one value, which transmute() would then repeat down the whole column.

process_file2 <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      # Only parse Value when it arrived as text; otherwise keep the numbers
      Total = if (is.character(Value)) readr::parse_number(Value) else Value
    )
}

Now if I use purrr’s map_df() with my new process_file2() function, it should work and give me a single data frame with all 15 rows.

all_results2 <- map_df(my_data_files, process_file2)
nrow(all_results2)
[1] 15
head(all_results2, 3)
  Category      Month   Total
1        A 2020-12-01 4256.48
2        B 2020-12-01  438.22
3        C 2020-12-01  945.12
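A small illustration of the ifelse() pitfall when the test is a single value: ifelse() returns a result the same length as its test, so a length-one TRUE produces a length-one answer, which dplyr then recycles to every row. A scalar if/else returns the whole vector. The input vector is made up.

```r
x_chr <- c("$1.00", "$2.50", "$3.75")
test <- is.character(x_chr)  # a single TRUE, not one value per element

# ifelse() is shaped like its test, so only the first parsed value survives
vectorized <- ifelse(test, readr::parse_number(x_chr), x_chr)

# if/else picks one whole branch and returns it unchanged in length
scalar <- if (test) readr::parse_number(x_chr) else x_chr

length(vectorized)  # 1
length(scalar)      # 3
```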

That’s just the data and format I wanted, and I found the problem file by wrapping my original function in possibly() to create a new, error-handling function.

For more R tips, head to the “Do More With R” page on InfoWorld or check out the “Do More With R” YouTube playlist.


Maria J. Danford
