Commit 2af5a32b authored by aslesha

Upload New File

parent 69ed1f74
%% Cell type:code id: tags:
``` python
import requests

# Download the sample page; the timeout keeps the call from hanging indefinitely.
page = requests.get("http://dataquestio.github.io/web-scraping-pages/simple.html", timeout=10)
page
```
%% Output
<Response [200]>
%% Cell type:code id: tags:
``` python
page.status_code
```
%% Output
200
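%% Cell type:markdown id: tags:
A status code of 200 means the request succeeded; any code in the 2xx range counts as success. The sketch below uses only the standard library to attach the reason phrase to a code. The helper names `describe_status` and `is_success` are hypothetical and not part of the original notebook. With a real response object, `page.raise_for_status()` is the usual way to fail early on 4xx and 5xx codes.
%% Cell type:code id: tags:
```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code with its standard reason phrase, e.g. '200 OK'."""
    return f"{code} {HTTPStatus(code).phrase}"


def is_success(code: int) -> bool:
    """Return True for any 2xx status code."""
    return 200 <= code < 300


# Hypothetical usage with the code returned above and a failing code.
print(describe_status(200))
print(is_success(200), is_success(404))
```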
%% Cell type:code id: tags:
``` python
page.content
```
%% Output
b'<!DOCTYPE html>\n<html>\n <head>\n <title>A simple example page</title>\n </head>\n <body>\n <p>Here is some simple content for this page.</p>\n </body>\n</html>'
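%% Cell type:markdown id: tags:
The `b'...'` prefix shows that `page.content` is a `bytes` object, not a string. As a minimal sketch, the bytes from the output above can be decoded by hand, assuming the page is UTF-8 encoded (an assumption; the notebook does not state the encoding). In requests, `page.text` returns an already decoded string.
%% Cell type:code id: tags:
```python
# Bytes copied from the output above; UTF-8 is assumed, not confirmed.
raw = b'<!DOCTYPE html>\n<html>\n <head>\n <title>A simple example page</title>\n </head>\n <body>\n <p>Here is some simple content for this page.</p>\n </body>\n</html>'

# Decode the raw bytes into a regular Python string.
text = raw.decode("utf-8")

print(type(raw).__name__, type(text).__name__)
print(text)
```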
%% Cell type:code id: tags:
``` python
from bs4 import BeautifulSoup

# Parse the downloaded bytes with Python's built-in HTML parser.
soup = BeautifulSoup(page.content, 'html.parser')
```
%% Cell type:code id: tags:
``` python
print(soup.prettify())
```
%% Output
<!DOCTYPE html>
<html>
 <head>
  <title>
   A simple example page
  </title>
 </head>
 <body>
  <p>
   Here is some simple content for this page.
  </p>
 </body>
</html>
%% Cell type:code id: tags:
``` python
list(soup.children)
```
%% Output
['html', '\n', <html>
<head>
<title>A simple example page</title>
</head>
<body>
<p>Here is some simple content for this page.</p>
</body>
</html>]
%% Cell type:code id: tags:
``` python
[type(item) for item in soup.children]
```
%% Output
[bs4.element.Doctype, bs4.element.NavigableString, bs4.element.Tag]
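%% Cell type:markdown id: tags:
Only one of the three top-level children is a `Tag`; the others are the doctype and a newline string. A possible way to keep just the tags is an `isinstance` filter, sketched here on an inline copy of the page so that it runs without a network request. The `HTML` constant is an assumption that mirrors the structure shown above.
%% Cell type:code id: tags:
```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Inline copy of the sample page (assumed to match the downloaded content).
HTML = """<!DOCTYPE html>
<html>
<head>
<title>A simple example page</title>
</head>
<body>
<p>Here is some simple content for this page.</p>
</body>
</html>"""

soup = BeautifulSoup(HTML, "html.parser")

# Keep only Tag children, dropping the doctype and whitespace strings.
tags = [child for child in soup.children if isinstance(child, Tag)]
tag_names = [tag.name for tag in tags]
print(tag_names)
```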
%% Cell type:code id: tags:
``` python
# Index 2 is the <html> tag, after the doctype (0) and a newline string (1).
html = list(soup.children)[2]
```
%% Cell type:code id: tags:
``` python
list(html.children)
```
%% Output
['\n', <head>
<title>A simple example page</title>
</head>, '\n', <body>
<p>Here is some simple content for this page.</p>
</body>, '\n']
%% Cell type:code id: tags:
``` python
# Index 3 is the <body> tag; indices 0, 2, and 4 are newline strings.
body = list(html.children)[3]
```
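%% Cell type:markdown id: tags:
Selecting by position depends on the exact whitespace in the page, so an index like `[3]` breaks if a newline is added or removed. Beautiful Soup also lets you reach a child tag by name as an attribute. The sketch below shows that alternative on an inline copy of the page (the `HTML` constant is an assumption, not part of the notebook).
%% Cell type:code id: tags:
```python
from bs4 import BeautifulSoup

# Inline copy of the sample page (assumed to match the downloaded content).
HTML = """<!DOCTYPE html>
<html>
<head>
<title>A simple example page</title>
</head>
<body>
<p>Here is some simple content for this page.</p>
</body>
</html>"""

soup = BeautifulSoup(HTML, "html.parser")

# Attribute access returns the first descendant tag with that name.
body = soup.html.body
paragraph_text = body.p.get_text()

print(body.name)
print(paragraph_text)
```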
%% Cell type:code id: tags:
``` python
list(body.children)
```
%% Output
['\n', <p>Here is some simple content for this page.</p>, '\n']
%% Cell type:code id: tags:
``` python
# Index 1 is the <p> tag, between two newline strings.
p = list(body.children)[1]
```
%% Cell type:code id: tags:
``` python
p.get_text()
```
%% Output
'Here is some simple content for this page.'
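%% Cell type:markdown id: tags:
Walking `children` level by level works, but the same paragraph can be reached in one step by searching the whole tree. As a sketch, `find` returns the first matching tag and `find_all` returns every match; the inline `HTML` constant is an assumption that stands in for the downloaded page.
%% Cell type:code id: tags:
```python
from bs4 import BeautifulSoup

# Inline copy of the sample page (assumed to match the downloaded content).
HTML = """<!DOCTYPE html>
<html>
<head>
<title>A simple example page</title>
</head>
<body>
<p>Here is some simple content for this page.</p>
</body>
</html>"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first <p> tag, or None when there is no match.
first_p = soup.find("p")
first_text = first_p.get_text()

# find_all() returns every <p> tag in document order.
all_texts = [tag.get_text() for tag in soup.find_all("p")]

print(first_text)
print(all_texts)
```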