Question
R: Efficient way to str_replace_all without recursively replacing conflicting substitutions?
Hello,
The problem
First, let me illustrate the problem. Suppose I want to apply the following cipher to encode the string "abc".
library(tidyverse)
cipher <- tibble(
  byte = c(128:153, 160:185, 246:255) %>% as.hexmode() %>% str_to_upper(),
  char = c(LETTERS, letters, 0:9)
)
"abc" %>% str_replace_all(set_names(cipher$byte, cipher$char))
# [1] "AFFCAFFDAFFE"
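The result I'd like is "A0A1A2", not "AFFCAFFDAFFE". The substitutions are applied one after another, each to the output of the previous ones. So the 0 in the first substitution, A0, is later matched by the pattern for 0 and replaced with F6, and the 6 in F6 is then replaced with FC. The same happens to A1 and A2. This is what I mean by recursive replacement of conflicting substitutions.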
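A minimal base-R sketch of that sequential behaviour: the hypothetical helper sequential_replace() applies each pattern/replacement pair to the output of the previous one, which is how the named-vector str_replace_all() call above behaves. It assumes sprintf("%02X", ...) produces the same codes as the as.hexmode()/str_to_upper() pipeline, and it uses fixed matching, which is equivalent here because every pattern is a single letter or digit.

```r
# Base-R version of the cipher: two-digit uppercase hex codes.
chars <- c(LETTERS, letters, 0:9)
bytes <- sprintf("%02X", c(128:153, 160:185, 246:255))

# Hypothetical helper: apply each pattern -> replacement pair in turn,
# feeding the result of one step into the next (no single-pass guarantee).
sequential_replace <- function(x, patterns, replacements) {
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacements[i], x, fixed = TRUE)
  }
  x
}

sequential_replace("abc", chars, bytes)
# [1] "AFFCAFFDAFFE"
```

After the letter passes the string is "A0A1A2"; the digit passes that come afterwards are what rewrite the codes that were just inserted.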
Related info
I've read this post and this issue, and I've also looked into the vectorize_all argument of the stri_replace_all* functions.
Working (but inefficient) solution
The only way I've managed to make multiple string substitutions whose replacement values would otherwise conflict is to split each string into individual characters, substitute each character, and finally paste everything back together, like so:
library(tidyverse)
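c("abc", "123") %>%
  map_chr(\(string) {
    str_split_1(string, "") %>%
      map_chr(\(char) {
        # Anchored patterns (^a$) only match a whole one-character string,
        # so "A0" can no longer be matched by the pattern for "0"
        str_replace_all(char, set_names(cipher$byte, paste0("^", cipher$char, "$")))
      }) %>%
      paste(collapse = "")
  })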
# [1] "A0A1A2" "F7F8F9"
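One way to keep the split/substitute/paste idea while avoiding a regex call per character is to turn the cipher into a named vector and translate each character by subsetting. This is a base-R sketch that I haven't benchmarked; encode_lookup() is a hypothetical helper, and characters without a cipher entry are assumed to pass through unchanged.

```r
# Named lookup vector: names are the plain characters, values the hex codes.
lookup <- setNames(
  sprintf("%02X", c(128:153, 160:185, 246:255)),
  c(LETTERS, letters, 0:9)
)

# Hypothetical helper: split each string once, translate all of its
# characters with one vectorised subset, then paste them back together.
encode_lookup <- function(x, lookup) {
  pieces <- strsplit(x, "")
  vapply(pieces, function(ch) {
    out <- lookup[ch]
    miss <- is.na(out)      # characters that have no cipher entry
    out[miss] <- ch[miss]   # keep those unchanged
    paste(out, collapse = "")
  }, character(1))
}

encode_lookup(c("abc", "123"), lookup)
# [1] "A0A1A2" "F7F8F9"
```

Each code is looked up exactly once from the original character, so an inserted code such as A0 is never re-read as input.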
Unfortunately, this way of encoding strings is slow for large vectors (at least on my 2020 Intel MacBook Pro). I prefer working within the tidyverse, but at this stage I'd consider other methods too.
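Another option is a single regex pass: find every character that has a cipher entry in the original string, then replace all of those matches at once, so inserted codes are never scanned again. Below is a base-R sketch using gregexpr() and regmatches<-(); encode_regmatches() is a hypothetical helper, and the character class [A-Za-z0-9] is written by hand to match exactly the keys of this particular cipher.

```r
lookup <- setNames(
  sprintf("%02X", c(128:153, 160:185, 246:255)),
  c(LETTERS, letters, 0:9)
)

# Hypothetical helper: locate all cipherable characters in one pass over
# the original strings, then swap every match for its code in one step.
encode_regmatches <- function(x, lookup) {
  m <- gregexpr("[A-Za-z0-9]", x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(s) unname(lookup[s]))
  x
}

encode_regmatches(c("abc", "123", "a-b"), lookup)
# [1] "A0A1A2" "F7F8F9" "A0-A1"
```

Within the tidyverse, recent stringr versions also accept a function as the replacement argument of str_replace_all() (it receives the matched text), e.g. str_replace_all(x, "[A-Za-z0-9]", \(m) unname(lookup[m])). I'd expect that to give the same single-pass result, but I haven't verified it against this example.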