Question

Remove <br> tags from a parsed Beautiful Soup list?

I currently loop over all the rows I want like this:

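import urllib2
from BeautifulSoup import BeautifulSoup  # assuming Beautiful Soup 3 on Python 2

page = urllib2.urlopen(pageurl)
soup = BeautifulSoup(page)
tables = soup.find("td", "bodyTd")
for row in tables.findAll('tr'):
    pass  # process each row here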

At this point, I have my information, but the <br /> tags are ruining my output.

What's the cleanest way to remove these?

1 Jan 1970

Solution

# extract() detaches each <br> tag from the parse tree in place
for e in soup.findAll('br'):
    e.extract()
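
For readers on Python 3, the same idea can be sketched with the bs4 package. This is an assumption, since the question itself uses Python 2 and urllib2; the HTML snippet and the variable names below are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the fetched page.
html = """
<table><tr><td class="bodyTd">
  <table>
    <tr><td>first<br />line</td></tr>
    <tr><td>second<br/>line</td></tr>
  </table>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Remove every <br> tag from the tree in place.
for br in soup.find_all("br"):
    br.extract()

# Same lookup as in the question: the td with class "bodyTd", then its rows.
cell = soup.find("td", "bodyTd")

# Join the remaining text pieces of each row with a space so words do not merge.
rows = [row.get_text(" ", strip=True) for row in cell.find_all("tr")]
print(rows)
```

Passing a separator to get_text matters here: once the tags are gone, the text on either side of each former <br /> would otherwise run together.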

2011-05-08

Solution

If you want to convert the <br /> tags to newlines instead of dropping them, do something like this:

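def text_with_newlines(elem):
    """Return elem's text with each <br /> turned into a newline."""
    text = ''
    for e in elem.recursiveChildGenerator():
        if isinstance(e, basestring):
            # text nodes are strings: keep their stripped content
            text += e.strip()
        elif e.name == 'br':
            text += '\n'
    return text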
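
A rough Python 3 port of this function, assuming the bs4 package rather than the Beautiful Soup 3 API used above, might look like the sketch below. The sample HTML is hypothetical, and `descendants` is used as the bs4 counterpart of `recursiveChildGenerator()`.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def text_with_newlines(elem):
    """Return elem's text with each <br> tag turned into a newline."""
    parts = []
    for node in elem.descendants:
        if isinstance(node, NavigableString):
            # Text node: keep its content without surrounding whitespace.
            parts.append(node.strip())
        elif isinstance(node, Tag) and node.name == "br":
            parts.append("\n")
    return "".join(parts)


# Hypothetical sample cell for illustration.
soup = BeautifulSoup("<td>line one<br />line two<br/> line three </td>", "html.parser")
result = text_with_newlines(soup.td)
print(repr(result))
```

Collecting the pieces in a list and joining once avoids repeated string concatenation, but the behavior is otherwise meant to match the original function.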

2011-05-08