Python BeautifulSoup - Get HTML Elements by Tag name
You can get all the HTML elements with a specific tag name from the given HTML using Python BeautifulSoup.
To get HTML elements by tag name in Python using BeautifulSoup, parse the HTML content, and then call the find_all() method on the resulting BeautifulSoup object with the tag name passed as argument.
Steps to get HTML elements by tag name using BeautifulSoup
- Given HTML content in html_content as a string value.
- Parse the HTML content. Call BeautifulSoup() constructor, and pass the HTML content string as argument. The constructor returns a BeautifulSoup object.
soup = BeautifulSoup(html_content, 'html.parser')
- Call the find_all() method on the returned BeautifulSoup object soup, and pass the tag name as a string argument. The find_all() method returns a ResultSet, which is a subclass of the Python list, containing the HTML elements with the specified tag name. If no element is found for the given tag, then an empty list is returned.
elements_with_tag = soup.find_all(required_tag_name)
- You may iterate over the returned list of HTML elements using a Python For loop.
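The steps above can be put together in a minimal sketch. The sample HTML string and the variable names paragraphs and missing are assumptions made for illustration only; the calls themselves are the same BeautifulSoup() and find_all() calls described in the steps.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input; any HTML string can be used here
html_content = "<html><body><p>First</p><p>Second</p></body></html>"

# Step 2: parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Step 3: get the elements by tag name
paragraphs = soup.find_all("p")

# Step 4: iterate over the returned elements
for element in paragraphs:
    print(element)

# A tag that does not occur in the HTML gives an empty list
missing = soup.find_all("table")
print(len(missing))
```

Since an empty list is falsy in Python, a check such as "if not missing:" is one simple way to handle the case where no element matches.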
Example to get HTML elements by Tag name using BeautifulSoup
1. Get all div elements using find_all() in Python
In the following program, we take sample HTML content in the html_content variable, and then find the HTML elements with the "div" tag, following the steps mentioned above.
Python Program
from bs4 import BeautifulSoup
# Sample HTML content
html_content = """
<html>
<body>
<h2>Welcome!</h2>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
<h2>Sample Heading</h2>
<div>This is a div.</div>
<div>This is another div.</div>
</body>
</html>
"""
# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')
# Get elements by tag name
elements_with_tag = soup.find_all("div")
# Loop through the found elements and print
for element in elements_with_tag:
    print(element)
Output
<div>This is a div.</div>
<div>This is another div.</div>
We have found two elements with the "div" tag.
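The number of matches can also be checked in code rather than by reading the output. The following is a small sketch, assuming a shortened version of the sample HTML; the names divs, count, and texts are chosen here for illustration. It uses len() on the returned list and get_text() on each element to read only the text inside the tag.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample HTML (an assumption for this sketch)
html_content = """
<html>
<body>
<h2>Welcome!</h2>
<div>This is a div.</div>
<div>This is another div.</div>
</body>
</html>
"""

soup = BeautifulSoup(html_content, 'html.parser')
divs = soup.find_all("div")

# Number of elements found with the "div" tag
count = len(divs)
print(count)

# Text content of each element, without the surrounding tags
texts = [element.get_text() for element in divs]
print(texts)
```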
2. Get all elements with tag "h2" using find_all() in Python
In the following program, we take sample HTML content in the html_content variable, and then find the Heading 2 elements (with the "h2" tag).
Since the tag name for Heading 2 elements is h2, pass this tag name as string argument to find_all() method. The method returns a list containing all the Heading 2 elements in the HTML.
Python Program
from bs4 import BeautifulSoup
# Sample HTML content
html_content = """
<html>
<body>
<h2>Welcome!</h2>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
<h2>Sample Heading</h2>
<div>This is a div.</div>
<div>This is another div.</div>
</body>
</html>
"""
# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')
# Get all the h2 elements
elements_with_tag = soup.find_all("h2")
# Loop through the found elements and print
for element in elements_with_tag:
    print(element)
Output
<h2>Welcome!</h2>
<h2>Sample Heading</h2>
We have found two elements with "h2" tag.
3. Get all paragraph elements using find_all() in Python
In the following program, given the HTML content in the html_content variable, we find all the paragraph elements in the HTML using the find_all() method.
Since the tag name for paragraph elements is p, pass this tag name as string argument to find_all() method. The method returns a list containing all the paragraphs in the HTML.
Python Program
from bs4 import BeautifulSoup
# Sample HTML content
html_content = """
<html>
<body>
<h2>Welcome!</h2>
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
<h2>Sample Heading</h2>
<div>This is a div.</div>
<div>This is another div.</div>
</body>
</html>
"""
# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')
# Get all the paragraph elements
elements_with_tag = soup.find_all("p")
# Loop through the found elements and print
for element in elements_with_tag:
    print(element)
Output
<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
We have found two paragraph elements in the HTML.
Summary
In this Python BeautifulSoup tutorial, given the HTML content string, we have seen how to find the HTML elements by tag name, using BeautifulSoup.find_all() method.