Beautiful Soup (HTML parser)

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search
Beautiful Soup
Original author(s)Leonard Richardson
Stable release
4.7.1 / January 6, 2019; 6 months ago (2019-01-06)
Repository Edit this at Wikidata
Written inPython
PlatformPython
TypeHTML parser library, Web scraping
LicensePython Software Foundation License (Beautiful Soup 3 - an older version) MIT License 4+[1]
Websitewww.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents (including having malformed markup, i.e. non-closed tags, so named after tag soup). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.[1]

It is available for Python 2.7 and Python 3.

Code example[edit]

#!/usr/bin/python3
# Anchor extraction from html document
from bs4 import BeautifulSoup
from urllib.request import urlopen

with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))

See also[edit]

References[edit]

  1. ^ a b "Beautiful Soup website". Retrieved 18 April 2012. Beautiful Soup is licensed under the same terms as Python itself