2022-09-19
[Original] [Crawler Series] A quick look at fetching Zhihu's hottest topics and their descriptions
I've recently been looking into web scraping and running some small experiments with common tools. The code here is quite basic, so experienced readers can skip ahead.
Notes
This post only implements one small, basic feature: fetching Zhihu's daily/monthly hottest questions. Looking at the page source, each question entry looks like this:

<a class="question_link" href="/question/30359991/answer/401771701" target="_blank" data-id="4326359" data-za-element-name="Title">心算可以算出羽毛球的落点吗?</a>

A quick analysis shows that we only need to grab all the <a> tags with class="question_link".
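The idea above can be checked offline against a tiny HTML fragment before hitting the live site. The markup below is a made-up sample in the same shape as the page source quoted earlier:

```python
from bs4 import BeautifulSoup

# A made-up fragment shaped like the Zhihu listing markup shown above.
html = '''
<div>
  <a class="question_link" href="/question/30359991/answer/401771701"
     target="_blank" data-id="4326359">心算可以算出羽毛球的落点吗?</a>
  <a class="other_link" href="/question/1">an unrelated link</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
# class_ with a single class name matches any tag carrying that class.
titles = [a.text for a in soup.find_all('a', class_='question_link')]
print(titles)  # ['心算可以算出羽毛球的落点吗?']
```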
Code

# coding:utf-8
#!/usr/bin/python
# @Time     : 18-5-29 下午2:12
# @Author   : Hao Chuang
# @Wechat   : nianhuaiju
# @File     : zhihu-hot.py
# @Software : PyCharm Community Edition

import requests
from bs4 import BeautifulSoup

# NOTE: the original URLs were garbled when this post was copied around;
# the Zhihu "explore" addresses below are placeholders, not necessarily
# the exact ones the author used.
url = 'https://www.zhihu.com/explore'                    # hot / recommended
url_day = 'https://www.zhihu.com/explore#daily-hot'      # daily hottest
url_month = 'https://www.zhihu.com/explore#monthly-hot'  # monthly hottest

headers = {'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36'}

def printstar(num):
    print('*' * num)

def url_set(url_):
    # Fetch the page and parse it into a soup object.
    soup = BeautifulSoup(requests.get(url_, headers=headers).text, 'html.parser')
    return soup

def show_hot(url_):
    # The original post repeated this body in three near-identical
    # functions; it is factored out here without changing behavior.
    soup = url_set(url_)
    # question titles
    for link in soup.find_all('a', class_='question_link'):
        print(link.text)
    printstar(50)
    # vote counts
    for hotnum in soup.find_all('a', class_='zm-item-vote-count js-expand js-vote-count'):
        print(hotnum.text)
    printstar(50)
    # authors
    for author in soup.find_all('a', class_='author-link'):
        print(author.text)
    printstar(50)
    # user type / user description (the original used the same selector for both)
    for usertype in soup.find_all('span', class_='badge-summary'):
        print(usertype.text)
    printstar(50)
    # hot topic summaries
    for contextdesc in soup.find_all('div', class_='zh-summary summary clearfix'):
        print(contextdesc.text)
    printstar(50)
    # comment counts
    for commentnum in soup.find_all('a', class_='meta-item toggle-comment js-toggleCommentBox'):
        print(commentnum.text)
    printstar(50)

def get_url_hot():
    show_hot(url)

def get_url_day_hot():
    show_hot(url_day)

def get_url_month_hot():
    show_hot(url_month)

if __name__ == '__main__':
    printstar(100)
    get_url_hot()
    printstar(100)
    get_url_day_hot()
    printstar(100)
    get_url_month_hot()
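One caveat worth noting (my own aside, not from the original post): passing a multi-class string such as 'zm-item-vote-count js-expand js-vote-count' to class_ only matches elements whose class attribute is exactly that string, in that order. soup.select() with a CSS selector matches regardless of class order and is often more robust:

```python
from bs4 import BeautifulSoup

# Same classes as the crawler targets, but in a different order.
html = '<a class="js-expand zm-item-vote-count js-vote-count">123</a>'
soup = BeautifulSoup(html, 'html.parser')

# Exact-string class matching fails here because the order differs.
exact = soup.find_all('a', class_='zm-item-vote-count js-expand js-vote-count')
print(len(exact))  # 0

# A CSS selector matches regardless of class order.
css = soup.select('a.zm-item-vote-count.js-expand.js-vote-count')
print(css[0].text)  # 123
```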
Run
I use PyCharm, so I just press Ctrl+Shift+F10 to run it. Of course, you can also run it from the command line: python zhihu.py
The results are as follows (output screenshot omitted):
This is just a fairly simple example; there is still room to optimize it, and I'll keep working on it later.
Give someone roses, and the fragrance lingers on your hand.
We once longed for the waves of fate, only to discover in the end that life's most beautiful scenery is inner calm and composure... We once craved the world's approval, only to learn in the end that the world is our own and has nothing to do with anyone else. - Yang Jiang