Beautiful Soup (HTML parser): Difference between revisions
Content deleted Content added
Undid revision 760504102 by Kazkaskazkasako (talk) |
Updated latest version to 4.6.0 |
Line 12: | Line 12: |
| released = <!-- {{Start date|YYYY|MM|DD|df=yes/no}} --> |
| released = <!-- {{Start date|YYYY|MM|DD|df=yes/no}} --> |
| discontinued = |
| discontinued = |
| latest release version = 4. |
| latest release version = 4.6.0 |
| latest release date = {{Start date and age| |
| latest release date = {{Start date and age|2017|05|07|df=no}} |
| latest preview version = |
| latest preview version = |
| latest preview date = <!-- {{Start date and age|YYYY|MM|DD|df=yes/no}} --> |
| latest preview date = <!-- {{Start date and age|YYYY|MM|DD|df=yes/no}} --> |
Revision as of 16:03, 12 July 2017
Original author(s) | Leonard Richardson |
---|---|
Stable release | 4.6.0 / May 7, 2017 |
Repository | |
Written in | Python |
Platform | Python |
Type | HTML parser library, Web scraping |
License | Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)[1] |
Website | www |
Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags; it is named after tag soup. It creates a parse tree for each parsed page that can be used to extract data from the HTML, which is useful for web scraping.[1]
It is available for Python 2.6+ and Python 3.
Code example
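# Anchor extraction from an HTML document
from bs4 import BeautifulSoup
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

webpage = urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
# Print each link target, falling back to '/' when an anchor has no href
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))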
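The handling of malformed markup described in the introduction can be illustrated without network access. The following is a minimal sketch, assuming the built-in 'html.parser' backend; the markup string and the helper name extract_hrefs are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed snippet: the <p> and <b> tags are never closed,
# and the second anchor has no href attribute.
html = '<p>See <a href="/wiki/Tag_soup">tag soup</a> and <b>more <a>no link</a>'


def extract_hrefs(markup):
    """Return the href of every anchor, using '/' when the attribute is absent."""
    soup = BeautifulSoup(markup, 'html.parser')
    return [a.get('href', '/') for a in soup.find_all('a')]


hrefs = extract_hrefs(html)
print(hrefs)  # ['/wiki/Tag_soup', '/']
```

Both anchors are found even though their enclosing tags are left open, because the parse tree is built from whatever markup is present rather than rejecting the document.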
References
- [1] "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself."