Question
How do I fix this regex so that it matches hyphenated words where the final segment ends in a consonant other than the letter m?
I want to match all cases where a hyphenated string (which could be made up of one or multiple hyphenated segments) ends in a consonant that is not the letter m.
In other words, it needs to match strings such as 'crack-l', 'crac-ken', 'cr-ca-cr-cr', etc., but not 'crack' (not hyphenated), 'br-oom' (ends in m), 'br-oo' (last segment ends in a vowel) or 'cr-ca-cr-ca' (last segment ends in a vowel).
My pattern is mostly successful, except when there is more than one hyphen: it then returns part of the string, such as 'cr-ca-cr' taken from 'cr-ca-cr-ca', when 'cr-ca-cr-ca' should not be matched at all.
Here is the code I have tried with example data:
import re
dummy_data = """
broom
br-oom
br-oo
crack
crack-l
crac-ken
crack-ed
cr-ca-cr-ca
cr-ca-cr-cr
cr-ca-cr-cr-cr
"""
pattern = r'\b(?:\w+-)+\w*[bcdfghjklnpqrstvwxyz](?<!m)\b'
final_consonant_hyphenated = [
    m.group(0)
    for m in re.finditer(pattern, dummy_data, flags=re.IGNORECASE)
]
print(final_consonant_hyphenated)
Expected output:
['crack-l', 'crac-ken', 'crack-ed', 'cr-ca-cr-cr', 'cr-ca-cr-cr-cr']
Current output:
['crack-l', 'crac-ken', 'crack-ed', **'cr-ca-cr'**, 'cr-ca-cr-cr', 'cr-ca-cr-cr-cr']
(The bold string is an incorrect match: it is part of the 'cr-ca-cr-ca' string, whose final segment ends in a vowel, not a consonant.)
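
A likely cause, offered as a sketch rather than a definitive answer: `\b` treats the hyphen as a non-word character, so a word boundary exists between 'cr' and '-' inside 'cr-ca-cr-ca', and the pattern is free to stop at 'cr-ca-cr'. One way around this, assuming every token is delimited by whitespace or line breaks as in the example data, is to replace both `\b` anchors with lookarounds that also reject a neighbouring hyphen. The `(?<!m)` lookbehind is dropped here because the character class already excludes m.

```python
import re

dummy_data = """
broom
br-oom
br-oo
crack
crack-l
crac-ken
crack-ed
cr-ca-cr-ca
cr-ca-cr-cr
cr-ca-cr-cr-cr
"""

# (?<![\w-]) : the match may not start right after a word character or a hyphen
# (?:\w+-)+  : one or more segments, each followed by a hyphen
# \w*[...]   : final segment ending in a consonant other than m
# (?![\w-])  : the match may not stop right before a word character or a hyphen
pattern = r'(?<![\w-])(?:\w+-)+\w*[bcdfghjklnpqrstvwxyz](?![\w-])'

matches = [
    m.group(0)
    for m in re.finditer(pattern, dummy_data, flags=re.IGNORECASE)
]
print(matches)
# ['crack-l', 'crac-ken', 'crack-ed', 'cr-ca-cr-cr', 'cr-ca-cr-cr-cr']
```

With these lookarounds, a candidate match inside 'cr-ca-cr-ca' is rejected because it would end directly before '-' or before another letter. If the real data can contain tokens with a leading or trailing hyphen, the lookarounds may need adjusting.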