python urllib.request timeout error on a resource my browser can access










A script I use for scraping stock ticker changes from nasdaq.com has stopped working.



The URL still loads in my browsers, but requesting it with Python 3's urllib.request fails with TimeoutError: [WinError 10060].



Other URLs still behave as expected. I'm trying to understand whether some server-side script is detecting that I'm not using a browser, even though I am sending a spoofed User-Agent header. See below.



import urllib.request
import traceback

req_headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}

urls = ['http://msn.com',
        'https://google.com',
        'https://www.nasdaq.com/markets/stocks/symbol-change-history.aspx?sortby=EFFECTIVE&page=1',
        'https://yahoo.com']
for url in urls:
    try:
        req = urllib.request.Request(
            url,
            data=None,
            headers=req_headers
        )
        print('trying %s' % url)
        pagetext = urllib.request.urlopen(req).read()
        print('Success', len(pagetext))
    except Exception:
        # Print the full traceback, then continue with the next URL.
        traceback.print_exc()


The output is as follows:



trying http://msn.com
Success 446035
trying https://google.com
Success 234077
trying https://www.nasdaq.com/markets/stocks/symbol-change-history.aspx?sortby=EFFECTIVE&page=1
Traceback (most recent call last):
...
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
trying https://yahoo.com
Success 642758


What is going wrong here?
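For context, WinError 10060 is what Windows reports when a TCP connection attempt gets no answer before the operating system gives up. The script above never passes `timeout=` to `urlopen`, so that OS-level limit is the only one in effect, and every failure surfaces through the same bare traceback. A minimal sketch, assuming you want each failure mode reported separately, passes an explicit timeout and distinguishes an HTTP error status, a failure to connect, a slow response and a reset connection. The helper name `fetch` and its status labels are hypothetical, chosen only for illustration.

```python
import socket
import urllib.error
import urllib.request

# Browser-like User-Agent copied from the question's script.
REQ_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/35.0.1916.47 Safari/537.36'),
}


def fetch(url, timeout=25.0):
    """Hypothetical helper: return (status, detail) instead of raising.

    `timeout` (seconds) bounds the connect and each blocking socket read,
    so a host that never answers fails after `timeout` seconds rather
    than after the operating system's own TCP connect timeout.
    """
    req = urllib.request.Request(url, headers=REQ_HEADERS)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 'ok', len(resp.read())
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 403).
        # HTTPError subclasses URLError, so it must be caught first.
        return 'http-error', exc.code
    except urllib.error.URLError as exc:
        # Connecting or sending the request failed: refused connection,
        # DNS failure, TLS failure, or a timeout during connect.
        return 'url-error', repr(exc.reason)
    except (socket.timeout, TimeoutError):
        # Connected, but the response did not arrive within `timeout`.
        return 'timeout', timeout
    except ConnectionResetError as exc:
        # The remote host dropped the connection (WinError 10054 on Windows).
        return 'reset', repr(exc)
```

With the question's `urls` list, `for url in urls: print(url, fetch(url))` prints one line per URL without aborting the loop, which makes it easier to tell a silent drop from an explicit refusal or an error status.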










  • Cannot reproduce. I get "200 OK" from the nasdaq.com URL with your code.
    – Tomalak
    Nov 11 at 15:58

  • Could you try increasing the timeout value?
    – Joe A
    Nov 11 at 16:14

  • Interesting. I just tried a timeout of 25 s, and now the nasdaq URL fails with [WinError 10054] An existing connection was forcibly closed by the remote host. Maybe I've been blacklisted? That would be odd, given that I only request that resource once per day. I will explore further.
    – deseosuho
    Nov 11 at 16:38

  • I was able to resolve the issue using a ProxyHandler, as in this question: stackoverflow.com/questions/22967084/…
    – deseosuho
    Nov 11 at 16:56

  • It could be the time between two requests to Nasdaq. What is the average time between two requests? Could you also try changing the user agent?
    – Praneeth
    Nov 11 at 16:56
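The fix reported in the comments used urllib's `ProxyHandler`. The linked question is truncated here, so the exact setup is unknown. A minimal sketch, assuming an HTTP proxy at the placeholder address `proxy.example.com:8080` (hypothetical, as is the helper name `make_opener`), builds an opener that routes requests through that proxy and sends the browser-like User-Agent on every request. Routing through a proxy changes the source IP address the server sees, which is consistent with, though it does not prove, the blacklisting hypothesis above.

```python
import urllib.request

# Placeholder proxy address (hypothetical); substitute a proxy you are permitted to use.
PROXIES = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://proxy.example.com:8080',
}

# Same browser-like User-Agent as in the question.
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/35.0.1916.47 Safari/537.36')


def make_opener(proxies):
    """Hypothetical helper: build an opener that sends requests via `proxies`."""
    # ProxyHandler maps a URL scheme to a proxy URL. Passing one to
    # build_opener replaces the default ProxyHandler, which would otherwise
    # read proxy settings from the environment (and, on Windows, the registry).
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    # Replace the default ('User-agent', 'Python-urllib/3.x') header pair.
    opener.addheaders = [('User-Agent', USER_AGENT)]
    return opener


opener = make_opener(PROXIES)

# Network usage, left commented out because the proxy above is a placeholder:
# with opener.open(url, timeout=25) as resp:
#     pagetext = resp.read()
```

Calling `urllib.request.install_opener(opener)` would make plain `urllib.request.urlopen` calls use the same proxy and header, so the original loop could stay unchanged.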















python python-3.x screen-scraping urllib






asked Nov 11 at 15:51 by deseosuho
