My web page looks like this:
<p>
<strong class="offender">YOB:</strong> 1987<br/>
<strong class="offender">RACE:</strong> WHITE<br/>
<strong class="offender">GENDER:</strong> FEMALE<br/>
<strong class="offender">HEIGHT:</strong> 5'05''<br/>
<strong class="offender">WEIGHT:</strong> 118<br/>
<strong class="offender">EYE COLOR:</strong> GREEN<br/>
<strong class="offender">HAIR COLOR:</strong> BROWN<br/>
</p>
I want to extract each person's information as label and value pairs, such as YOB:1987, RACE:WHITE, and so on:
YOB:1987
RACE:WHITE
This is what I tried:
subc = soup.find_all('p')
subc1 = subc[1]
subc2 = subc1.find_all('strong')
However, this only gives me the labels YOB:, RACE:, and so on, without the values:
YOB:
RACE:
Is there a way to get the data in the format YOB:1987, RACE:WHITE?
Just iterate over all the <strong> tags and use next_sibling to get the text that follows each one. Like this:
for strong_tag in soup.find_all('strong'):
    # next_sibling is the text node right after </strong>; strip its surrounding whitespace
    print(strong_tag.text, strong_tag.next_sibling.strip())
Demo:
from bs4 import BeautifulSoup

html = '''
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">GENDER:</strong> FEMALE<br />
<strong class="offender">HEIGHT:</strong> 5'05''<br />
<strong class="offender">WEIGHT:</strong> 118<br />
<strong class="offender">EYE COLOR:</strong> GREEN<br />
<strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

for strong_tag in soup.find_all('strong'):
    # The value is the text node that immediately follows each <strong> label
    print(strong_tag.text, strong_tag.next_sibling.strip())
This gives you:
YOB: 1987
RACE: WHITE
GENDER: FEMALE
HEIGHT: 5'05''
WEIGHT: 118
EYE COLOR: GREEN
HAIR COLOR: BROWN
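Since the question asks for each person's information, it may be more convenient to collect the pairs into one dictionary per <p> block instead of printing them. The following is only a sketch: the helper name parse_offender is hypothetical, and it assumes each person's data sits in its own <p> element with the value as plain text right after each <strong> label.

```python
from bs4 import BeautifulSoup


def parse_offender(p_tag):
    """Return a {label: value} dict for one <p> block (hypothetical helper)."""
    record = {}
    for strong_tag in p_tag.find_all('strong'):
        # "YOB:" -> "YOB"
        label = strong_tag.get_text(strip=True).rstrip(':')
        sibling = strong_tag.next_sibling
        # next_sibling can be None or another tag; keep it only if it is a text node
        value = sibling.strip() if isinstance(sibling, str) else ''
        record[label] = value
    return record


html = '''<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
</p>'''

soup = BeautifulSoup(html, 'html.parser')
# One dictionary per <p>, assuming one person per paragraph
records = [parse_offender(p) for p in soup.find_all('p')]
print(records)
```

The isinstance check guards against labels that have no text after them, where calling strip() on next_sibling directly would fail.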