Unable to parse the href tag in Python

I get the following output from Beautiful Soup:
[Search over 301,944 datasets\n]

I need to extract only the number 301,944 from this. Please guide me on how this can be done. My code so far:

import requests
import re
from bs4 import BeautifulSoup

source = requests.get('https://www.data.gov/').text
soup = BeautifulSoup(source, 'lxml')
#print(soup.prettify())
images = soup.find_all('small')
print(images)
con = images.find_all('a')  # I am unable to get the anchor tag here. It says the anchor tag is not present
print(con)
#for con in images.find_all('a', href=True):
#    print(con)
#content = images.split('metrics')
#print(content[1])
#images = soup.find_all('a', {'href': re.compile(r'\d+')})
#print(images)

asked Nov 10 at 19:22

user1107731

beautifulsoup

1 Answer

accepted

There is only one <small> tag on the website.

Your images variable holds it, but you use it the wrong way to retrieve the anchor tag: find_all returns a list of elements (a ResultSet), and that list has no find_all method of its own.

If you want to retrieve the text from the anchor tag, you can get it with:

soup.find('small').a.text

Here the find method returns the first small element it encounters on the page. If you use find_all, you will get a list of all small elements (there is only one here), so you have to pick an element from the list before accessing .a.
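To make the difference concrete, here is a minimal sketch rather than a definitive solution. It assumes the markup looks roughly like the inline `html` string below (a hypothetical stand-in; the live data.gov page may differ) and uses the built-in `html.parser` so it runs without lxml. It also adds a regular expression, which is not part of the answer above, to pull out only the number the question asks for.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the relevant part of the page; the real markup may differ.
html = '<small><a href="/metrics">Search over 301,944 datasets\n</a></small>'
soup = BeautifulSoup(html, 'html.parser')

# find() returns a single Tag (or None), so attribute access such as .a works.
small = soup.find('small')
link_text = small.a.text if small is not None and small.a is not None else ''

# find_all() returns a ResultSet (a list of Tags); index it or loop over it first.
texts = [tag.a.text for tag in soup.find_all('small') if tag.a is not None]

# Take the first run of digits and commas, then drop the commas to get an int.
match = re.search(r'\d[\d,]*', link_text)
number_text = match.group(0) if match else None
number = int(number_text.replace(',', '')) if number_text else None

print(number_text)
print(number)
```

The `None` checks are there because `find()` returns `None` when nothing matches, which would otherwise raise an `AttributeError` on `.a`.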

answered Nov 11 at 0:04

Dinko Pehar

It's not working. When I do that it reports: "Traceback (most recent call last): File "C:vishwamyscriptsvalue_site.py", line 7, in <module> images = soup.find_all('small').a.text File "C:Python27libsite-packagesbs4element.py", line 1884, in __getattr__ "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. " % key AttributeError: ResultSet object has no attribute 'a'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?"
– user1107731
Nov 12 at 11:01

I ran it on my PC and it just showed the text from the anchor tag. I used find_all(), but since it's the only anchor tag in the small tag, I used find() to retrieve just that one.
– Dinko Pehar
Nov 12 at 11:25

Thank you. Now I got it.
– user1107731
Nov 12 at 15:24

Thanks. Can you please mark the question as complete? Thank you.
– Dinko Pehar
Nov 12 at 17:37
