python urllib.request timeout error on a resource my browser can access

A script I use for scraping stock ticker changes from nasdaq.com has stopped working.

The URL can still be accessed from my browsers, but when I try to fetch it with Python 3's urllib.request I get TimeoutError: [WinError 10060].

Other URLs still behave as expected. I'm trying to understand whether some server-side script is detecting that I'm not using a browser, even though I am providing spoofed headers. See below.

```python
import urllib.request
import traceback

req_headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}

urls = ['http://msn.com',
        'https://google.com',
        'https://www.nasdaq.com/markets/stocks/symbol-change-history.aspx?sortby=EFFECTIVE&page=1',
        'https://yahoo.com']

for url in urls:
    try:
        req = urllib.request.Request(url, data=None, headers=req_headers)
        print('trying %s' % url)
        pagetext = urllib.request.urlopen(req).read()
        print('Success', len(pagetext))
    except Exception:
        # Print the full traceback, then move on to the next URL
        traceback.print_exc()
```

The output is as follows:

```
trying http://msn.com
Success 446035
trying https://google.com
Success 234077
trying https://www.nasdaq.com/markets/stocks/symbol-change-history.aspx?sortby=EFFECTIVE&page=1
Traceback (most recent call last):
  ...
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
trying https://yahoo.com
Success 642758
```

What is going wrong here?
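One way to check what the script actually sends is to build the request without sending it and list its headers. This is a minimal sketch: `build_request()` and `BROWSER_HEADERS` are hypothetical names, only the User-Agent string comes from the code above, and the Accept/Accept-Language values are assumptions about typical browser headers. Adding them is not guaranteed to get past whatever filtering the server applies.

```python
import urllib.request

# User-Agent is the string from the question; Accept and Accept-Language
# are assumed, browser-like values added for illustration only.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Build (but do not send) a GET request carrying the given headers."""
    return urllib.request.Request(url, headers=dict(headers))


req = build_request(
    "https://www.nasdaq.com/markets/stocks/symbol-change-history.aspx?sortby=EFFECTIVE&page=1"
)
# Request stores header names via str.capitalize(), e.g. "User-agent".
for name, value in req.header_items():
    print(f"{name}: {value}")
```

Because urllib capitalizes header names this way, `User-Agent` is listed as `User-agent`. HTTP header names are case-insensitive, so a standards-compliant server should treat the two spellings the same.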

asked Nov 11 at 15:51

deseosuho

Cannot reproduce. I get "200 OK" from the nasdaq.com URL with your code.
– Tomalak
Nov 11 at 15:58

Could you try increasing the timeout value?
– Joe A
Nov 11 at 16:14
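For reference, `urlopen()` takes a `timeout` argument in seconds that applies to blocking operations such as the connection attempt and each read, not to the transfer as a whole. The sketch below wraps it in a hypothetical `fetch()` helper and, rather than hitting nasdaq.com, demonstrates the failure mode offline against a local socket that accepts connections but never replies.

```python
import socket
import urllib.request


def fetch(url, headers=None, timeout=25.0):
    """GET url and return the body; timeout is in seconds per blocking operation."""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


# Offline demo: this socket completes TCP handshakes via its listen backlog
# but never sends an HTTP response, so the client's read has to time out.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

caught = None
try:
    fetch(f"http://127.0.0.1:{port}/", timeout=0.5)
except OSError as exc:  # URLError, TimeoutError and socket.timeout all derive from OSError
    caught = exc
    print("gave up:", type(exc).__name__, exc)
finally:
    silent.close()
```

Without an explicit `timeout`, urllib falls back to the global socket default (blocking), so how long it waits before a WinError 10060 appears is left to the operating system.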

Interesting. I just tried a timeout of 25 s, and now the nasdaq URL fails with [WinError 10054] An existing connection was forcibly closed by the remote host. Maybe I've been blacklisted? That would be odd, given that I only request that resource once per day. I will explore further.
– deseosuho
Nov 11 at 16:38
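The two Windows errors point at different things: WinError 10060 means nothing answered in time, while WinError 10054 means the remote side actively reset the connection, which is consistent with (though not proof of) server-side blocking. Python raises them as `TimeoutError` and `ConnectionResetError` respectively, so a hypothetical `describe_failure()` helper like the one below can tell them apart in the loop's `except` branch. The messages are illustrative readings, not diagnoses.

```python
import socket
import urllib.error


def describe_failure(exc):
    """Return a rough, human-readable reading of an exception from urlopen()."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server did answer, just not with a 2xx status.
        return f"server replied with HTTP {exc.code}"
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, OSError):
        # urlopen() wraps some socket errors (e.g. during connect) in URLError.
        exc = exc.reason
    if isinstance(exc, ConnectionResetError):
        return "connection reset by the remote host (cf. WinError 10054)"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "no reply before the timeout (cf. WinError 10060)"
    return f"other network failure: {exc!r}"


# In the question's loop this would be: except OSError as err: print(url, describe_failure(err))
print(describe_failure(urllib.error.URLError(ConnectionResetError("simulated reset"))))
```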

I was able to resolve the issue by using a ProxyHandler, as in this question: stackoverflow.com/questions/22967084/…
– deseosuho
Nov 11 at 16:56
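A minimal sketch of routing urllib traffic through a proxy with `urllib.request.ProxyHandler` and `build_opener()`. This is not necessarily the code from the linked question; `make_proxy_opener()` is a hypothetical helper and the proxy address is a placeholder assumption to be replaced with a proxy you are permitted to use.

```python
import urllib.request

# Hypothetical proxy address: substitute a proxy you are permitted to use.
PROXY_URL = "http://127.0.0.1:8080"
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)


def make_proxy_opener(proxy_url, user_agent):
    """Build an opener that routes http and https requests through proxy_url."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(proxy)
    # Replace urllib's default "Python-urllib/3.x" User-Agent for every request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_proxy_opener(PROXY_URL, USER_AGENT)
# Nothing is sent until you call, for example:
# page = opener.open(url, timeout=25).read()
```

Passing a `ProxyHandler` instance to `build_opener()` replaces the default one, which would otherwise pick up proxy settings from the environment.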

It could be the time between consecutive requests to Nasdaq. What is the average interval between your requests? Could you also try changing the user agent?
– Praneeth
Nov 11 at 16:56
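If request frequency is the trigger, the usual remedy is to slow down rather than retry harder. The sketch below swaps in a standard technique, retrying with exponential backoff plus random jitter; `fetch_with_backoff()` and its parameters are hypothetical, and since the asker only requests the page once per day it may not address the actual cause here.

```python
import random
import time


def fetch_with_backoff(fetch, url, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponentially growing pauses.

    Between attempts it waits base_delay * 2**attempt seconds plus up to one
    second of random jitter; after the last attempt the error is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:  # covers URLError (and HTTPError), TimeoutError, ConnectionResetError
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))


# Usage with a stand-in fetcher that is reset twice and then succeeds;
# sleep is swapped for list.append so the demo records pauses instantly.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionResetError("simulated reset")
    return b"<html>ok</html>"


pauses = []
body = fetch_with_backoff(flaky_fetch, "https://example.invalid/", sleep=pauses.append)
print(body, [round(p, 2) for p in pauses])
```

With a real site, `fetch` would be something like `lambda u: urllib.request.urlopen(u, timeout=25).read()`. Keeping `attempts` small avoids hammering a server that may already be rate-limiting you.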

python python-3.x screen-scraping urllib
