Python urllib.request timeout error on a resource my browser can access
A script I use for scraping stock ticker changes from nasdaq.com has stopped working.
The URL is still accessible from my browsers, but raises TimeoutError: [WinError 10060] when I request it with Python 3's urllib.request.
Other URLs still behave as expected. Is some server-side script detecting that I'm not in a browser? I am sending a spoofed User-Agent header; see below.
import urllib.request
import traceback

# Spoofed browser User-Agent sent with every request
req_headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
urls = ['http://msn.com',
        'https://google.com',
        'https://www.nasdaq.com/markets/stocks/symbol-change-history.aspx?sortby=EFFECTIVE&page=1',
        'https://yahoo.com']
for url in urls:
    try:
        req = urllib.request.Request(
            url,
            data=None,
            headers=req_headers
        )
        print('trying %s' % url)
        pagetext = urllib.request.urlopen(req).read()
        print('Success', len(pagetext))
    except Exception:
        # Print the full traceback and move on to the next URL
        traceback.print_exc()
The outputs are as follows:
trying http://msn.com
Success 446035
trying https://google.com
Success 234077
trying https://www.nasdaq.com/markets/stocks/symbol-change-history.aspx?sortby=EFFECTIVE&page=1
Traceback (most recent call last):
...
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
trying https://yahoo.com
Success 642758
What is going wrong here?
python python-3.x screen-scraping urllib
Cannot reproduce. I get "200 OK" from the nasdaq.com url with your code.
– Tomalak
Nov 11 at 15:58
Can you increase the timeout value, maybe?
– Joe A
Nov 11 at 16:14
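For reference, urlopen accepts a timeout argument in seconds. The following is a minimal sketch rather than the asker's exact script: the fetch helper name and the 25 second default are assumptions for illustration, and catching OSError is just one way to cover URLError, TimeoutError and connection resets in a single clause.

```python
import urllib.request


def fetch(url, headers=None, timeout=25):
    """Return the response body as bytes, or None if the request fails.

    `fetch` and the 25 s default are illustrative choices, not part of
    the original script.
    """
    req = urllib.request.Request(url, data=None, headers=headers or {})
    try:
        # timeout applies to blocking socket operations such as connect
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError, TimeoutError and ConnectionResetError all derive
        # from OSError, so WinError 10060 and 10054 both land here.
        print('failed %s: %r' % (url, exc))
        return None
```

Calling fetch(url, headers=req_headers, timeout=25) inside the loop would then report the failure and continue with the next URL.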
Interesting. I just tried a timeout of 25s and now the nasdaq url fails with [WinError 10054] An existing connection was forcibly closed by the remote host. Maybe I've become blacklisted? Odd given that I just ping that resource once per day. I will explore further.
– deseosuho
Nov 11 at 16:38
I was able to resolve the issue using a proxyhandler as in this question: stackoverflow.com/questions/22967084/…
– deseosuho
Nov 11 at 16:56
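The comment above does not show the code that was used, so the following is only a sketch of how urllib.request.ProxyHandler is typically wired in. The build_proxy_opener helper and the proxy address in the usage note are hypothetical. Passing an empty dict to ProxyHandler disables any auto-detected system proxy, which is sometimes the relevant difference between a browser and a script.

```python
import urllib.request


def build_proxy_opener(proxy_url=None):
    """Return an opener that uses proxy_url for HTTP and HTTPS.

    With proxy_url=None an empty mapping is passed, which tells
    ProxyHandler to ignore auto-detected proxy settings.
    `build_proxy_opener` is a hypothetical helper name.
    """
    if proxy_url is None:
        proxies = {}
    else:
        proxies = {'http': proxy_url, 'https': proxy_url}
    handler = urllib.request.ProxyHandler(proxies)
    # Supplying our own ProxyHandler replaces the default one.
    return urllib.request.build_opener(handler)
```

Usage would look something like opener = build_proxy_opener('http://proxy.example.com:8080'), where the address is a placeholder, followed by opener.open(req, timeout=25).read() in place of urllib.request.urlopen(req).read().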
It could be the time between two requests to Nasdaq. What is the average time between two requests? Could you also try changing the user agent?
– Praneeth
Nov 11 at 16:56
asked Nov 11 at 15:51
deseosuho