Scrapy: get text outside of a span


URL: https://myanimelist.net/anime/236/Es_Otherwise

I'm trying to scrape the following content from this URL:

[screenshot of the content to scrape]

I tried:

for i in response.css('span[class = dark_text]') :
 i.xpath('/following-sibling::text()')

I also tried the following XPath expressions, which don't work (or I'm missing something):

aired_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[11]/text()')

producer_xpath = response.xpath("//*[@id='content']/table/tbody/tr/td[1]/div/div[12]/span/a/@href/text()")
licensor_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[13]/a/text()')
studio_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[14]/a/@href/title/text()')
studio_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[17]/text()')
str_rating_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[18]/text()')
ranked_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[20]/span/text()')
japanese_title_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[7]/text()')
source_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[15]/text()')
genre_xpath = [response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[16]/a[0]'.format(i)) for i in range(1,4)]
genre_xpath_v2 = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[16]/a/@href/text()')
number_of_users_rated_anime_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[19]/span[3]/text()')
popularity_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[21]/span/text()')
members_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[22]/span/text()')
favorite_xpath = response.xpath('//*[@id="content"]/table/tbody/tr/td[1]/div/div[23]/span/text()')

But I figured out that some of the text is outside the span elements, so I would like to get that text (the part outside the span) with a CSS/XPath expression.

python html css scrapy

edited Nov 11 at 8:41 by quant
asked Nov 10 at 15:10 by user9176398

Hi. Please can you write a paragraph or so to better explain your question?
– user
Nov 10 at 15:17

What language do you want to use? Do you have a deal with that site to scrape the content?
– bestprogrammerintheworld
Nov 10 at 16:38

I'm using Python with the Scrapy framework.
– user9176398
Nov 10 at 20:50

2 Answers

If you are only trying to scrape the information shown in the image, you can just use:

response.xpath('//div[@class="space-it"]//text()').extract()

Otherwise, I may not have understood your question properly.

answered Nov 10 at 17:18 by Gaurav

That expression returns an empty list.
– user9176398
Nov 10 at 20:49

Have you changed the class name? The actual class name is spaceit.
– Gaurav
Nov 11 at 15:19

For a better result, you can try response.xpath('//div[@class="js-scrollfix-bottom"]//div[@class="spaceit"]')
– Gaurav
Nov 11 at 15:41

It just won't return the alternative names and the type.
– Gaurav
Nov 11 at 15:43


It's simpler to just loop through the divs inside the table:

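from scrapy.selector import Selector

# htmlString is assumed to hold the page HTML (e.g. response.text in a Scrapy callback)
foundH2 = False
response = Selector(text=htmlString).xpath('//*[@id="content"]/table/tr/td[1]/div/*')

for resp in response:
    # name() returns the tag name of the current child element ('div', 'h2', ...)
    tagName = resp.xpath('name()').extract_first()
    if 'h2' == tagName:
        foundH2 = True
    if foundH2:
        # start collecting 'info' once <h2>Alternative Titles</h2> has been found
        info = None
        if 'div' == tagName:
            for item in resp.xpath('.//text()').extract():
                # stop reading this div's text at the 'googletag.' content
                if 'googletag.' in item:
                    break
                item = item.strip()
                if item and item != ',':
                    info = info + " " + item if info else item
        if info:
            print(info)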

Just my opinion: BeautifulSoup is faster and better than Scrapy.

answered Nov 10 at 21:19 by ewwink

Thanks, it works! But what are name and googletag? Can you explain your code a bit, please?
– user9176398
Nov 11 at 9:03

googletag is part of the div content after "Favorites: 27", and the loop stops once it reaches it.
– ewwink
Nov 11 at 9:04
