Beautiful Soup (HTML parser): Difference between revisions

From Wikipedia, the free encyclopedia
Undid revision 760504102 by Kazkaskazkasako (talk)
Updated latest version to 4.6.0
Line 12:
  | released = <!-- {{Start date|YYYY|MM|DD|df=yes/no}} -->
  | discontinued =
- | latest release version = 4.5.1
+ | latest release version = 4.6.0
- | latest release date = {{Start date and age|2016|08|02|df=no}}
+ | latest release date = {{Start date and age|2017|05|07|df=no}}
  | latest preview version =
  | latest preview date = <!-- {{Start date and age|YYYY|MM|DD|df=yes/no}} -->

Revision as of 16:03, 12 July 2017

Beautiful Soup
Original author(s): Leonard Richardson
Stable release: 4.6.0 / May 7, 2017 (2017-05-07)
Written in: Python
Platform: Python
Type: HTML parser library, Web scraping
License: Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)[1]
Website: www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name refers to tag soup). It builds a parse tree for each parsed page, which can be used to extract data from HTML and is therefore useful for web scraping.[1]

It is available for Python 2.6+ and Python 3.

Code example

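# Extract the target of every anchor (<a>) tag in an HTML document
from bs4 import BeautifulSoup

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

webpage = urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
for anchor in soup.find_all('a'):
    # Fall back to '/' when an anchor has no href attribute
    print(anchor.get('href', '/'))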
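
The tolerance for malformed markup mentioned in the introduction can also be illustrated without network access. The following is a minimal sketch, not taken from the Beautiful Soup documentation: the `broken_html` string and the variable names are hypothetical, chosen only for illustration, and the snippet assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed snippet: the <p> and <b> tags are never closed.
broken_html = '<p>Hello <b>world<a href="/wiki/Tag_soup">tag soup</a>'

# Python's built-in parser is used here, as in the example above.
soup = BeautifulSoup(broken_html, 'html.parser')

# Collect the href of every anchor found in the parse tree.
links = [anchor.get('href') for anchor in soup.find_all('a')]
print(links)

# The paragraph's text is still reachable although its closing tag is missing.
paragraph_text = soup.p.get_text()
print(paragraph_text)
```

Parsing from a string rather than a live page keeps the sketch reproducible; the same `find_all` and `get` calls apply to a document fetched over the network.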

References

  1. "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself."