Beautiful Soup (HTML parser)

Beautiful Soup
Original author(s): Leonard Richardson
Stable release: 4.6.0 / May 7, 2017
Written in: Python
Platform: Python
Type: HTML parser library, Web scraping
License: Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4 and later)[1]
Website: www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags; it is named after tag soup. It creates a parse tree for each parsed page that can be used to extract data from HTML, which is useful for web scraping.[1]
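
As a minimal sketch of how the parse tree copes with malformed markup, the snippet below parses a string whose list items and anchors are never closed. The sample markup and the helper name extract_links are illustrative assumptions, not part of the library; the sketch uses the built-in 'html.parser' backend, and other parsers may build a slightly different tree from the same input.

```python
from bs4 import BeautifulSoup

# Illustrative, deliberately malformed markup: the <li>, <a>, <body>
# and <html> tags are never closed.
MALFORMED_HTML = """
<html><head><title>Tag soup</title></head>
<body>
<ul>
  <li><a href="/one">One
  <li><a href="/two">Two
</ul>
"""


def extract_links(markup):
    """Return the page title and every href found in the markup.

    Hypothetical helper written for this example only.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # soup.title is None when the document has no <title> element
    title = soup.title.string if soup.title else None
    # href=True keeps only the anchors that actually carry an href attribute
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    return title, hrefs


title, hrefs = extract_links(MALFORMED_HTML)
print(title)
print(hrefs)
```

Both anchors are still reachable through find_all even though the parser had to guess where the unclosed elements end.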

It is available for Python 2.6+ and Python 3.

Code example

# Python 2
# Anchor extraction from an HTML document
from bs4 import BeautifulSoup
import urllib2  # Python 2 has no urllib.request module

webpage = urllib2.urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
for anchor in soup.find_all('a'):
    # Fall back to '/' when the anchor has no href attribute
    print(anchor.get('href', '/'))

# Python 3
# Anchor extraction from html document
from bs4 import BeautifulSoup
import urllib.request

with urllib.request.urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    webpage = response.read()
    soup = BeautifulSoup(webpage, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))
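
The examples above print each href exactly as it appears in the page, so relative links stay relative. The following sketch (Python 3 only) shows one possible way to turn them into absolute URLs with the standard library's urljoin. It parses a string instead of making a live request; the sample markup, the base URL and the helper name absolute_links are assumptions made for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page content and base URL, used instead of a live request.
BASE_URL = "https://example.org/wiki/Main_Page"
SAMPLE_HTML = """
<p><a href="/wiki/Python">Python</a>
<a href="Help">Help</a>
<a name="top">no href here</a>
<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a></p>
"""


def absolute_links(markup, base_url):
    """Return every anchor href in the markup, resolved against base_url.

    Hypothetical helper written for this example only.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # href=True skips anchors without an href, so no default value is needed
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


links = absolute_links(SAMPLE_HTML, BASE_URL)
for link in links:
    print(link)
```

Filtering with href=True is an alternative to the '/' default used in the examples above: anchors without an href are skipped rather than reported as the site root.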

References

  1. "Beautiful Soup website". Retrieved 18 April 2012. Beautiful Soup is licensed under the same terms as Python itself.