regex - How to extract a number more gracefully in Python using XPath and regular expressions
I have a small HTML snippet and want to extract a number from it, the grade. I am using Python with Scrapy and re.
My code works, but it is far from nice.
Here is the HTML snippet; I want the 2.
<div id="left">
  <div class="0"><b>certificate:</b></div>
  <div class="1">
    <div></div>
    <div>
      <a class="link" href="new.html">maths</a> (first) grade 2<br>
    </div>
  </div>
  <div class="2"></div>
</div>
And here is how I have solved it so far:
note = sel.xpath('//*[@id="left"]/div[2]/div[2]/text()[2]').extract()
print note
> [u'\xa0(first)\xa0\xa0\xa0grade 2']
note_string = ''.join(note)
note_only = re.search(r'\d+', note_string).group()
> 2
It is not best practice to transform lists into strings just to extract such a tiny piece of information.
How can I do this better?
You can use the following XPath expression to get the 2:
substring-after(//*[@id="left"]/div[2]/div[2]/text()[2], "grade ")
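If you would rather stay with a regular expression, here is a minimal sketch that avoids the join. It assumes the extracted list has the shape printed in the question, and `extract_grade` is a hypothetical helper name, not part of Scrapy or `re`. It searches each extracted string directly and returns None when nothing matches, instead of failing on `.group()`.

```python
import re

# Hypothetical helper (not part of Scrapy or re): anchors on the word
# "grade" so that other digits in the text are not picked up by accident.
GRADE_RE = re.compile(r'grade\s*(\d+)', re.UNICODE)


def extract_grade(text_nodes):
    """Return the first grade found in a list of strings as an int, or None."""
    for text in text_nodes:
        match = GRADE_RE.search(text)
        if match:
            return int(match.group(1))
    return None


# Sample value copied from the output shown in the question.
note = [u'\xa0(first)\xa0\xa0\xa0grade 2']
print(extract_grade(note))
```

Scrapy selector lists also have a `.re()` method that applies a regular expression to the selected nodes, so inside a spider you may not need a separate helper at all.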