Question
Capturing all matches of a string value from an array of regex patterns, while prioritizing closest matches
Let's say I have an array of names, along with a regex union of them:
match_array = [/Dan/i, /Danny/i, /Daniel/i]
match_values = Regexp.union(match_array)
I'm using a regex union because the actual data set I'm working with contains strings that often have extraneous characters, whitespace, and varied capitalization.
I want to iterate over a series of strings to see if they match any of the values in this array. If I use .scan, only the first matching element is returned:
'dan'.scan(match_values) # => ["dan"]
'danny'.scan(match_values) # => ["dan"]
'daniel'.scan(match_values) # => ["dan"]
'dannnniel'.scan(match_values) # => ["dan"]
'dannyel'.scan(match_values) # => ["dan"]
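As far as I can tell, the union tries its alternatives left to right, so /Dan/i always wins because it is listed first. Reordering the patterns longest-first changes which match comes back, but I still only get one match per position. A quick check (longest_first is just a name I made up for this test):

```ruby
# Same three patterns, but with the longer names listed before /Dan/i.
longest_first = Regexp.union([/Daniel/i, /Danny/i, /Dan/i])

'dan'.scan(longest_first)       # => ["dan"]
'danny'.scan(longest_first)     # => ["danny"]
'daniel'.scan(longest_first)    # => ["daniel"]
'dannnniel'.scan(longest_first) # => ["dan"]
'dannyel'.scan(longest_first)   # => ["danny"]
```

So ordering alone gives me the closest match, but not the shorter partial matches alongside it.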
I want to be able to capture all of the matches (which is why I thought to use .scan instead of .match), but I want to prioritize the closest/most exact matches first. If none are found, then I'd want to default to the partial matches. So the results would look like this:
'dan'.scan(match_values) # => ["dan"]
'danny'.scan(match_values) # => ["danny","dan"]
'daniel'.scan(match_values) # => ["daniel","dan"]
'dannnniel'.scan(match_values) # => ["dan"]
'dannyel'.scan(match_values) # => ["danny","dan"]
Is this possible?
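For what it's worth, here is a rough sketch of the behavior I'm after. It skips the union and tests each pattern separately, then puts the longest matches first. The all_matches helper is a hypothetical name of my own, and I'm assuming "closest" can be approximated by match length, which may not hold for every data set:

```ruby
match_array = [/Dan/i, /Danny/i, /Daniel/i]

# Hypothetical helper: try every pattern on its own, keep the text each one
# matched, and list the longest (most specific) matches first.
def all_matches(string, patterns)
  patterns
    .map { |pattern| string[pattern] } # matched text, or nil if no match
    .compact                           # drop patterns that did not match
    .uniq
    .sort_by { |match| -match.length } # assumption: longer means closer
end

all_matches('dan', match_array)       # => ["dan"]
all_matches('danny', match_array)     # => ["danny", "dan"]
all_matches('daniel', match_array)    # => ["daniel", "dan"]
all_matches('dannnniel', match_array) # => ["dan"]
all_matches('dannyel', match_array)   # => ["danny", "dan"]
```

This only records the first occurrence of each pattern in a string, so I'm not sure it is the right approach when a name appears more than once.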