Question

dplyr identify rows that do not have a matched column within group

I have a df that looks as follow:

GROUP    NAME     TITLE     LEVEL
  A      John      Lead       0
  A      John      Staff      1
  A      Jake      Jr         0
  B      Bob       Lead       1
  B      Bob       Lead       0
  C      Andrew    Jr         0
  C      Andrew    Jr         1
  C      Rebecca   Staff      0

What I am trying to do is identify names within each group that have a title in level 1 that they do not have for level 0. In this example I would be looking for:

GROUP    NAME     TITLE     LEVEL
  A      John      Staff      1

Because in group A John has a title of staff for level 1 but not for level 0.

 4  46  4
1 Jan 1970

Solution

 2

The simplest approach I can think of would be an anti-join, where we look for GROUP/NAME/TITLE in Level 1 that is not in Level 0.

anti_join(df |> filter(LEVEL == 1),
          df |> filter(LEVEL == 0),
          join_by(GROUP, NAME, TITLE))

Result

# A tibble: 1 × 4
  GROUP NAME  TITLE LEVEL
  <chr> <chr> <chr> <dbl>
1 A     John  Staff     1
2024-07-09
Jon Spring

Solution

 1

Within each name, there should be:

  1. At least one observation for each levels of LEVEL (0 and 1).
  2. At least one (length(...) != 0) TITLE among the LEVEL == 1 that is different (setdiff) from any of the TITLEs with LEVEL == 0.
df |> 
  filter(any(LEVEL == 0) & LEVEL == 1 &
         length(setdiff(TITLE[LEVEL == 1], TITLE[LEVEL == 0])) != 0,
         .by = NAME)

#   GROUP NAME TITLE LEVEL
# 1     A John Staff     1
2024-07-09
Ma&#235;l

Solution

 0

I did it like so :

library(tidyverse)
step_1 <- read_csv("GROUP,NAME, TITLE, LEVEL
A,John,Lead, 0
A,John,Staff,1
A,Jake,Jr, 0
B,Bob, Lead, 1
B,Bob, Lead, 0
C,Andrew,Jr, 0
C,Andrew,Jr, 1
C,Rebecca, Staff,0")

# identify names within each group that have a title in level 1 that they do not have for level 0

step_1 |> split(~GROUP) |> map(\(grp){
grp |> pivot_wider(names_from="LEVEL",
                   values_from="TITLE") |>
    filter(`0`!=`1`)
}) |> list_rbind()
#  A tibble: 1 × 4
#  GROUP NAME  `0`   `1`  
#  <chr> <chr> <chr> <chr>
1 A     John  Lead  Staff
2024-07-09
Nir Graham

Solution

 0

use n_distinct:

df %>% 
  filter(n_distinct(LEVEL) == 1 & LEVEL == 1, .by =c(GROUP, NAME, TITLE))

  GROUP NAME TITLE LEVEL
1     A John Staff     1
2024-07-10
Onyambu