Question

Filter list of lists with purrr::keep

I have a reprex as follows:

library(dplyr)
library(purrr)

df1 <- data.frame(
  col1 = 1:5,
  col2 = 6:10
)

df2 <- df1 %>% 
  mutate(col3 = 1:5)


ls <- list(
  a = list(df1 = df1),
  b = list(
    df1 = df1,
    df2 = df2
  ),
  c = list(
    df2 = df2
  )
)

I want to filter ls so that only the data frames that have a column named col3 are kept, and any element of ls left with no data frames is dropped.

I have tried using keep but I am unable to index correctly at the right depth.

Expected output:

desired <- list(
  b = list(
    df2 = df2
  ),
  c = list(
    df2 = df2
  )
)

This is close, but the element a stays in the result as an empty list:

ls %>% 
  map(
    ~keep(.x, ~ "col3" %in% names(.x))
  )

Solution

 6

I know you asked for purrr, but here's a convenient one-liner with rrapply.

rrapply extends base R's rapply and applies functions recursively to the elements of a nested list. Here, the condition argument specifies which elements to keep, and how = "prune" removes every element that does not match the condition. classes = "data.frame" makes the condition function apply at the level of each data frame (and not, for instance, to the individual columns of the data frames).

library(rrapply)
rrapply(ls, condition = \(x) "col3" %in% names(x), classes = "data.frame", how = "prune")

# $b
# $b$df2
#   col1 col2 col3
# 1    1    6    1
# 2    2    7    2
# 3    3    8    3
# 4    4    9    4
# 5    5   10    5
# 
# 
# $c
# $c$df2
#   col1 col2 col3
# 1    1    6    1
# 2    2    7    2
# 3    3    8    3
# 4    4    9    4
# 5    5   10    5
2024-07-04
Maël

Solution

 3

This was trickier than I thought with purrr. It's simple to keep only the data frames that have a column named "col3", but you then end up with a list containing nested NULL elements. Here is a function to remove those:

remove_nulls <- function(l) {
    l |>
        # Replace nested NULLs with NULL
        map(\(x) if (is.null(unlist(x))) NULL else x) |>
        # Remove NULLs at 2nd level or below
        map(compact) |>
        # Remove NULLs at top level
        compact()
}

Then it's just a case of using purrr::modify_tree() to modify every leaf, using a predicate function to define leaves as data frames, and searching for "col3".

out <- modify_tree(
    ls,
    is_node = negate(is.data.frame),
    leaf = \(x) if ("col3" %in% names(x)) x else NULL
) |>
    remove_nulls()

identical(out, desired)
# [1] TRUE
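
For the specific two-level list in the question, a shorter keep-based sketch may be enough. It assumes every element of ls is a flat list of data frames (no deeper nesting), and has_col3 is a helper name made up for this illustration. The inner keep() drops data frames without col3, and compact() then drops the sub-lists that were left empty.

```r
library(purrr)

# Rebuild the example data from the question (base R instead of dplyr::mutate)
df1 <- data.frame(col1 = 1:5, col2 = 6:10)
df2 <- df1
df2$col3 <- 1:5

ls <- list(
  a = list(df1 = df1),
  b = list(df1 = df1, df2 = df2),
  c = list(df2 = df2)
)

# Hypothetical helper: TRUE when a data frame has a column named "col3"
has_col3 <- function(df) "col3" %in% names(df)

out <- ls |>
  # Within each sub-list, keep only the data frames that have col3
  map(\(inner) keep(inner, has_col3)) |>
  # Drop sub-lists that ended up empty (here: a)
  compact()

names(out)
```

Unlike modify_tree() or rrapply, this only handles exactly two levels; for arbitrarily nested lists the recursive approaches above are the safer choice.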
2024-07-04
SamR