Beautiful Soup (HTML parser): Difference between revisions

Beautiful Soup
Original author(s)	Leonard Richardson
Stable release	4.6.0 / May 7, 2017; 7 years ago
Repository	code.launchpad.net/beautifulsoup/
Written in	Python
Platform	Python
Type	HTML parser library, Web scraping
License	Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)
Website	www.crummy.com/software/BeautifulSoup/

Revision as of 16:03, 12 July 2017

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name refers to tag soup). It builds a parse tree for each parsed page, which can be used to extract data from HTML and is therefore useful for web scraping.^[1]

It is available for Python 2.6+ and Python 3.

Code example

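# Extract the href of every anchor in an HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen

webpage = urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
for anchor in soup.find_all('a'):
    # Fall back to '/' when an anchor has no href attribute
    print(anchor.get('href', '/'))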
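
The tolerance for malformed markup mentioned in the introduction can be illustrated without network access. The following is a minimal sketch, not an example from the project itself: the sample markup and the variable names are made up for illustration, and it assumes the bs4 package is installed. The list items are deliberately left unclosed.

```python
# Sketch: parse a fragment whose <li> tags are never closed.
# The markup below is hypothetical sample data, not taken from a real page.
from bs4 import BeautifulSoup

malformed = "<ul><li><a href='/one'>One</a><li><a href='/two'>Two</a></ul>"

# Python's built-in parser is used here; other parsers may build a
# differently shaped tree from the same broken input.
soup = BeautifulSoup(malformed, "html.parser")

# Collect (link text, href) pairs in document order.
links = [(a.get_text(), a.get("href")) for a in soup.find_all("a")]
print(links)
```

Both anchors are still reachable through find_all even though the surrounding list items are unclosed. How the unclosed tags are nested in the resulting tree depends on the parser chosen.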

References

[1] "Beautiful Soup website". Retrieved 18 April 2012. Quote: "Beautiful Soup is licensed under the same terms as Python itself."



@@ Line 12: / Line 12: @@
 | released               = <!-- {{Start date|YYYY|MM|DD|df=yes/no}} -->
 | discontinued           =
-| latest release version = 4.5.1
+| latest release version = 4.6.0
-| latest release date    = {{Start date and age|2016|08|02|df=no}}
+| latest release date    = {{Start date and age|2017|05|07|df=no}}
 | latest preview version =
 | latest preview date    = <!-- {{Start date and age|YYYY|MM|DD|df=yes/no}} -->
