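How can I get the URL of a 404 page with Python requests?
I want to get a page's URL using Python's requests, but it is not an ordinary page: it is a redirect destination that returns a 404 status code.
I have asked about the details here.
If I send GET requests to the following URL several times in a loop, the URL given further below comes back as the redirect destination.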
https://www.pinterest.com/r/pin/554505772848816821/4995915543595742901/51a16eafabbb128c3c57b4a5087421b7a031e52bcbc41b26a4cb756137b347a6
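Problem
However, the redirect destination URL is a dead link, so sending a GET with requests raises an error and the program stops. I tried to get the redirect destination URL from r.url, but that does not work.
The error message contains the redirect destination URL, so the URL appears to be held in some variable inside the program.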
Traceback (most recent call last):
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/util/connection.py", line 57, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f49a6e75358>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.behance.netgallery', port=443): Max retries exceeded with url: /Portfolio/23252259 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f49a6e75358>: Failed to establish a new connection: [Errno -2] Name or service not known',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "urlget.py", line 55, in <module>
got_url = get_redirect()
File "urlget.py", line 31, in get_redirect
r = requests.get(target_url)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/sessions.py", line 524, in request
resp = self.send(prep, **send_kwargs)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/sessions.py", line 659, in send
history = [resp for resp in gen] if allow_redirects else []
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/sessions.py", line 659, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/sessions.py", line 238, in resolve_redirects
**adapter_kwargs
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/sessions.py", line 637, in send
r = adapter.send(request, **kwargs)
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.behance.netgallery', port=443): Max retries exceeded with url: /Portfolio/23252259 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f49a6e75358>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I would appreciate it if you could tell me how to extract that URL.
Thank you in advance.
What I want to know
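One possible way to read the URL out of the exception, as a minimal sketch: the last frame of the traceback shows requests raising ConnectionError(e, request=request), and the exception keeps that request in its request attribute, so the URL of the hop that failed should be available as exc.request.url. The function names url_from_error and resolve_final_url and the timeout value are made up for illustration and are not taken from urlget.py.

```python
import requests


def url_from_error(exc):
    """Return the URL of the request attached to a requests exception, if any."""
    req = getattr(exc, "request", None)
    return getattr(req, "url", None)


def resolve_final_url(target_url, timeout=10):
    """Follow redirects and return the last URL reached.

    If a hop in the redirect chain cannot be connected to (for example
    because the host name does not resolve), return the URL of that hop
    instead of letting the exception stop the program.
    """
    try:
        r = requests.get(target_url, timeout=timeout)
    except requests.exceptions.ConnectionError as exc:
        return url_from_error(exc)
    # A 404 response does not raise by itself, so r.url is still usable here.
    return r.url
```

With this, get_redirect() could call resolve_final_url(target_url) instead of requests.get(target_url) directly; whether the returned value is a live page or the dead link would still have to be checked separately.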
https://www.behance.netgallery/Portfolio/23252259
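I want to extract this URL.
Addendum
Thank you for the answer.
However, when I run the code from the answer, the following error is output.
I thought it meant I should import the regular expression module re and tried that, but it did not help.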
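Another approach that might work here, sketched under the assumption that the redirect is a plain HTTP redirect with a Location header: turn off automatic redirect handling with allow_redirects=False and walk the chain one hop at a time, so each destination URL is known before any connection to it is attempted. The name trace_redirects and the max_hops and timeout defaults are hypothetical.

```python
from urllib.parse import urljoin

import requests


def trace_redirects(start_url, session=None, max_hops=10, timeout=10):
    """Return the list of URLs in a redirect chain, starting with start_url.

    Each hop is requested with allow_redirects=False, so the Location
    header is recorded before the next request is made. If a hop cannot
    be fetched, the chain still ends with the URL that failed.
    """
    if session is None:
        session = requests.Session()
    chain = [start_url]
    for _ in range(max_hops):
        try:
            resp = session.get(chain[-1], allow_redirects=False, timeout=timeout)
        except requests.exceptions.RequestException:
            break  # chain[-1] is the URL that could not be reached
        location = resp.headers.get("Location")
        if not (resp.is_redirect and location):
            break  # not a redirect, so chain[-1] is the final page
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain
```

The last element of the returned list would then be the URL to keep, whether or not it can actually be opened.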
Traceback (most recent call last):
File "others2.py", line 2, in <module>
import requests
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/requests/__init__.py", line 43, in <module>
import urllib3
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/__init__.py", line 8, in <module>
from .connectionpool import (
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/site-packages/urllib3/connectionpool.py", line 3, in <module>
import logging
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/logging/__init__.py", line 26, in <module>
import sys, os, time, io, traceback, warnings, weakref, collections
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/traceback.py", line 5, in <module>
import linecache
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/linecache.py", line 11, in <module>
import tokenize
File "/home/vagrant/.pyenv/versions/anaconda3-5.0.1/envs/twitter/lib/python3.6/tokenize.py", line 37, in <module>
cookie_re = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)', re.ASCII)
AttributeError: module 're' has no attribute 'compile'
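An AttributeError like this, raised from inside the standard library's own tokenize.py, is often a sign that a local file named re.py is being imported in place of the standard re module. That is only a guess from the traceback, not something confirmed here. A minimal sketch for checking it follows; the function name find_shadowing_files and the short list of module names are hypothetical, and only .py files are checked, not package directories.

```python
import os

# Small, hand-picked sample of standard-library module names (an assumption
# for illustration; extend it as needed).
COMMON_STDLIB_NAMES = frozenset(
    {"re", "os", "sys", "json", "time", "random", "logging", "socket", "string", "types"}
)


def find_shadowing_files(directory, stdlib_names=COMMON_STDLIB_NAMES):
    """List .py files in directory whose names collide with stdlib modules."""
    hits = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if ext == ".py" and stem in stdlib_names:
            hits.append(entry)
    return hits
```

Calling find_shadowing_files(".") from the directory that contains others2.py would list any such files; if re.py shows up, renaming it (and removing any leftover compiled cache for it) is the usual fix.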