Beautiful Soup (HTML parser)
|Original author(s)||Leonard Richardson|
|Stable release||4.7.1 / January 6, 2019|
|Type||HTML parser library, Web scraping|
|License||Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4 and later)|
Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags; it is named after tag soup. It creates a parse tree for each parsed page that can be used to extract data from HTML, which makes it useful for web scraping.
It is available for Python 2.7 and Python 3.
#!/usr/bin/python3
# Anchor extraction from html document
from bs4 import BeautifulSoup
from urllib.request import urlopen

with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))
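The tolerance for malformed markup mentioned in the introduction can also be tried without network access. The following is a minimal sketch, assuming the `bs4` package is installed and using Python's built-in `html.parser` backend; the names `MALFORMED` and `extract_links` are illustrative only and are not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Illustrative input (an assumption for this sketch): the <p> and <b> tags
# are never closed, which is typical "tag soup".
MALFORMED = (
    '<html><body>'
    '<p>First <a href="/one">one</a>'
    '<p>Second <b>bold <a href="/two">two</a>'
    '</body></html>'
)


def extract_links(markup):
    """Return (text, href) pairs for every anchor that has an href attribute."""
    soup = BeautifulSoup(markup, 'html.parser')
    # href=True restricts the search to anchors that carry an href attribute.
    return [(a.get_text(), a['href']) for a in soup.find_all('a', href=True)]


if __name__ == '__main__':
    for text, href in extract_links(MALFORMED):
        print(text, href)
```

Even though the paragraph and bold tags are left open, both anchors still end up in the parse tree and can be found with `find_all`. Other parser backends may nest the unclosed elements differently, so the exact tree shape depends on the parser chosen.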