Python BeautifulSoup - Get outer HTML of Element



To get the outer HTML of an element in Python using BeautifulSoup, you can pass the element to the built-in str() function to get its string representation, or call the prettify() method on the element to get a formatted version of the same markup.
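Outer HTML means the element's own start and end tags together with everything inside them. The following minimal sketch contrasts str(), which returns the outer HTML, with Tag.decode_contents(), which returns only the inner HTML (the children without the element's own tags). The compact markup, with no whitespace between tags, is an assumption chosen so the two strings are easy to compare.

```python
from bs4 import BeautifulSoup

# Compact markup (no whitespace between tags) so the results are easy to read
html_content = '<div id="my_div"><h2>Welcome!</h2><p>This is a paragraph.</p></div>'

soup = BeautifulSoup(html_content, 'html.parser')
element = soup.find(id="my_div")

# Outer HTML: the <div> start tag, its children, and the </div> end tag
outer_html = str(element)

# Inner HTML: only the children, without the element's own tags
inner_html = element.decode_contents()

print(outer_html)
print(inner_html)
```

The first line printed includes the surrounding div tags; the second line starts directly at the h2 tag.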

1. Get outer HTML using the built-in str() function in Python

In the following program, we store sample HTML content in the html_content variable, find the HTML element whose id is "my_div", and then get the outer HTML of this element using the built-in str() function.

Find the required element using the find() method, then pass the element as an argument to the built-in str() function. The function returns the string representation of the element, which is its outer HTML: the element's own start and end tags together with everything between them.

Python Program

from bs4 import BeautifulSoup

# Sample HTML content
html_content = """
<html>
    <body>
        <div id="my_div">
            <h2>Welcome!</h2>
            <p>This is a paragraph.</p>
        </div>
        <div>This is another div.</div>
    </body>
</html>
"""

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Find the element
element = soup.find(id="my_div")

# Get outer HTML of element
outer_html = str(element)

print(outer_html)

Output

<div id="my_div">
            <h2>Welcome!</h2>
            <p>This is a paragraph.</p>
        </div>
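One pitfall with this approach: find() returns None when no element matches, and str(None) silently produces the text 'None' instead of raising an error. The following is a minimal sketch of a guard, wrapped in a hypothetical helper named outer_html_by_id (it is not part of BeautifulSoup); the compact sample markup is also an assumption.

```python
from bs4 import BeautifulSoup


def outer_html_by_id(html_content, element_id):
    """Hypothetical helper: return the outer HTML of the element with the
    given id, or None if no such element exists."""
    soup = BeautifulSoup(html_content, 'html.parser')
    element = soup.find(id=element_id)
    if element is None:
        # Without this check, str(element) would return the text 'None'
        return None
    return str(element)


html_content = '<div id="my_div"><p>This is a paragraph.</p></div>'

print(outer_html_by_id(html_content, "my_div"))
print(outer_html_by_id(html_content, "missing"))
```

Returning None (rather than the string 'None') lets the caller distinguish "element not found" from real markup with a simple `is None` check.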

2. Get outer HTML using Tag.prettify() in Python

The prettify() method of the Tag class returns the element's outer HTML as a formatted string, with each tag and text node on its own line, indented according to its nesting depth.

Find the required element using the find() method, then call prettify() on the element. The method returns a string, which we print to the output.

Python Program

from bs4 import BeautifulSoup

# Sample HTML content
html_content = """
<html>
    <body>
        <div id="my_div">
            <h2>Welcome!</h2>
            <p>This is a paragraph.</p>
        </div>
        <div>This is another div.</div>
    </body>
</html>
"""

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Find the element
element = soup.find(id="my_div")

# Get outer HTML of element
outer_html = element.prettify()

print(outer_html)

Output

<div id="my_div">
 <h2>
  Welcome!
 </h2>
 <p>
  This is a paragraph.
 </p>
</div>
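The two approaches can return different strings for the same element. str() reproduces the markup as parsed, including any whitespace text between tags, whereas prettify() drops whitespace-only text and places each tag and text node on its own indented line. A minimal sketch on compact markup (an assumption chosen to make the difference visible):

```python
from bs4 import BeautifulSoup

html_content = '<div id="my_div"><h2>Welcome!</h2></div>'

soup = BeautifulSoup(html_content, 'html.parser')
element = soup.find(id="my_div")

# str(): the markup as parsed, which here is a single line
compact = str(element)

# prettify(): one tag or text node per line, indented by nesting depth
pretty = element.prettify()

print(compact)
print(pretty)
```

Use str() when the exact markup matters (for example, when comparing or storing it), and prettify() when the output is meant to be read by a person.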

Summary

In this Python BeautifulSoup tutorial, we have seen how to get the outer HTML of an element, either by passing the element to the built-in str() function or by calling the element's prettify() method.