Question

Python regex to get the text within a defined pattern

I'm working on writing a parser to extract information from the output given below

[Image: sample output to be parsed]

I need to get all three blocks of text that sit between the lines of '--', so I wrote the regular expression below:

import re
def parse_ib_write_bw(mystr):
    output = dict()
    # match = re.search('--+\n(\n|.)*?--+', mystr, re.I)
    match = re.search('--+\n(.*)--+(.*)--+(.*)--+', mystr, re.DOTALL)

    if match:
        # group(n) returns the n-th capture; groups(n) would return every capture as a tuple
        print(match.group(1))
        print(match.group(2))
        print(match.group(3))

parse_ib_write_bw(my_str)

My understanding is:

--+\n(.*)--+  -->  would give the first block, up to the second line of '--'

(.*)--+  -->  would give the second block, up to the third line of '--'

(.*)--+  -->  would give the third block, up to the final line of '--'

But I get the entire output. Where am I going wrong in my understanding?


Solution


Since . matches a newline in DOTALL mode and .* is greedy, the first .* matches all the text between the first and the last line of dashes. The last line of dashes is then matched by the remaining --+(.*)--+(.*)--+, where the two .*s each match an empty string.
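
The greedy behaviour can be reproduced on a small stand-in input. The sample text below is invented for illustration (the original output was posted as an image), so only the shape of the result matters:

```python
import re

# Hypothetical sample: three blocks separated by lines of dashes.
sample = (
    "------\n"
    "block one\n"
    "------\n"
    "block two\n"
    "------\n"
    "block three\n"
    "------\n"
)

# The pattern from the question, unchanged apart from the raw-string prefix.
greedy_match = re.search(r'--+\n(.*)--+(.*)--+(.*)--+', sample, re.DOTALL)

first, second, third = greedy_match.groups()
print(repr(first))   # all three blocks plus the two inner dash lines
print(repr(second))  # ''
print(repr(third))   # ''
```

The first group swallows everything, and the last dash line is split between the three trailing --+ parts, which leaves nothing for the second and third groups.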

You can instead use ^ in MULTILINE mode to assert that each line of dashes begins at the start of a line and is followed by a newline:

re.search('^--+\n(.*)^--+\n(.*)^--+\n(.*)^--+\n', mystr, re.DOTALL | re.MULTILINE)
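
As a minimal sketch, here is that pattern applied to a stand-in input. The sample text and the helper name `split_blocks` are made up for illustration. It assumes exactly four lines of dashes, each ending in a newline:

```python
import re

# Hypothetical sample: three blocks separated by lines of dashes.
sample = (
    "------\n"
    "block one\n"
    "------\n"
    "block two\n"
    "------\n"
    "block three\n"
    "------\n"
)

def split_blocks(text):
    """Return the three blocks between four dash lines, or None if there is no match."""
    match = re.search(
        r'^--+\n(.*)^--+\n(.*)^--+\n(.*)^--+\n',
        text,
        re.DOTALL | re.MULTILINE,
    )
    return match.groups() if match else None

blocks = split_blocks(sample)
print(blocks)
```

Each captured block keeps its trailing newline, so call .strip() on the groups if that is unwanted.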


2024-07-23
blhsing