Python BeautifulSoup - Find Elements by Text

To select elements in an HTML page by their text using BeautifulSoup, first find candidate elements with a selection criterion such as a CSS selector or a tag name, and then filter those elements by the value of each element's text property.

In this tutorial, we go through a step-by-step process to find the elements in a webpage that contain specific text.

Steps to find elements by text using BeautifulSoup

Let us see the steps to find div elements that contain the text 'Article'.

  1. Import BeautifulSoup from the bs4 library.
  2. Take the HTML content as a string in html_content, and the text to search for in search_text.
  3. Parse the HTML content using the BeautifulSoup() constructor, and store the returned object in soup.
  4. Call the select() method on the soup object, and pass the required CSS selector. The select() method returns a list of Tag elements.
  5. Use a Python for loop to iterate over the list, and for each element, check whether the element's text contains the search text.
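
The steps above can also be wrapped in a small reusable function. The following is a minimal sketch, not part of the original program: the function name find_elements_by_text and its parameters are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup


def find_elements_by_text(html_content, selector, search_text):
    """Return the elements matching `selector` whose text contains `search_text`.

    Hypothetical helper for illustration; it is not a BeautifulSoup API.
    """
    # Parse the HTML content
    soup = BeautifulSoup(html_content, "html.parser")
    # Select candidate elements, then keep those whose text contains the search text
    return [el for el in soup.select(selector) if search_text in el.text]


html_content = "<div>Article 1</div><p>Article 2</p><div>Story 1</div>"
matches = find_elements_by_text(html_content, "div", "Article")
for element in matches:
    print(element)
```

Note that the text property includes the text of all descendants, so with nested elements an outer div is also returned when only one of its inner elements contains the search text.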

Example: Get Div Elements with the Text 'Article'

In the following program, we take sample HTML content, and then find the div elements containing the text 'Article' using the BeautifulSoup.select() method.

Python Program

from bs4 import BeautifulSoup

# Sample HTML content
html_content = """
<html>
    <body>
        <div>Article 1</div>
        <div>Article 2</div>
        <div>Story 1</div>
        <div>Story 2</div>
    </body>
</html>
"""
# Search text
search_text = "Article"

# Parse the HTML content
soup = BeautifulSoup(html_content, "html.parser")

# Get all div elements
div_elements = soup.select('div')

# Filter div elements based on text
elements_with_text = []
for element in div_elements:
    if search_text in element.text:
        elements_with_text.append(element)

# Print filtered elements
for element in elements_with_text:
    print(element)

Output

<div>Article 1</div>
<div>Article 2</div>
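
As an alternative to filtering in a loop, the same result can likely be obtained in a single call with find_all() and its string argument. This is a different technique from the select()-and-filter approach shown above, sketched here under the assumption of Beautiful Soup 4.4.0 or later (where the argument is named string). The variable name matching_divs is introduced only for this illustration.

```python
import re

from bs4 import BeautifulSoup

html_content = """
<html>
    <body>
        <div>Article 1</div>
        <div>Article 2</div>
        <div>Story 1</div>
        <div>Story 2</div>
    </body>
</html>
"""
search_text = "Article"

soup = BeautifulSoup(html_content, "html.parser")

# Match div tags whose .string contains the search text.
# re.escape() keeps any special characters in search_text literal.
matching_divs = soup.find_all("div", string=re.compile(re.escape(search_text)))

for element in matching_divs:
    print(element)
```

This variant compares against each tag's string property, which is None when a tag has more than one child. A div that mixes text with other tags is therefore skipped here, whereas the loop over element.text would still match it.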

Summary

In this Python BeautifulSoup tutorial, we have seen how to find the elements in an HTML page by their text, with a step-by-step process and an example.