Running a Crawler with Azure Functions (Part 1): Code and Environment Preparation

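Next, let's try porting the crawler from the previous post to Azure Functions. Where it runs doesn't really matter; the point is to explore another option.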

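A utility program like this doesn't need to live alongside business code. It can run on demand in its own Function, or, in a containerized environment, as a cron job. The advantage of Functions is not execution speed or cost but ease of maintenance: monitoring and logging are left to Azure.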

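Running Python code in a Function is much the same as running it on a server, except that Functions is a managed environment and does not guarantee that the code runs on the same host each time: one run lands on this host, the next on another. Data persistence therefore needs some thought. An earlier post covered how App Service can now mount a storage directory directly. Functions can do something similar, though the procedure differs slightly: an Azure Files share can be mounted into the function app, and any data that must persist is written under the mount path.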

This is simple to set up and can be done entirely from the CLI:

az webapp config storage-account add `
    --resource-group blog `
    --name mxyfun `
    --custom-id myshare123 `
    --storage-type AzureFiles `
    --share-name share `
    --account-name mxyblob `
    --mount-path /formount `
    --access-key $KEY

Note that $KEY holds the storage account key. Once the mount is in place, list the configuration to check the result:

az webapp config storage-account list `
    --resource-group blog `
    --name mxyfun
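From the Python side, it can be worth confirming that the mounted path is usable before writing to it. The following is a minimal sketch, not part of the original code: the helper name `ensure_output_dir` is hypothetical, and it assumes the output file sits in a subdirectory of the mount (such as `/formount/data`) that may not exist yet on a fresh share.

```python
import logging
import os


def ensure_output_dir(csv_path: str) -> str:
    """Create the parent directory of csv_path if needed and return it.

    Hypothetical helper; expects a path that includes a directory part.
    """
    out_dir = os.path.dirname(csv_path)
    # exist_ok avoids an error when an earlier run already created the directory
    os.makedirs(out_dir, exist_ok=True)
    if not os.access(out_dir, os.W_OK):
        # Fail early so the problem shows up clearly in the function logs
        raise PermissionError(f"{out_dir} is not writable; check the mount")
    logging.info("output directory ready: %s", out_dir)
    return out_dir
```

In the crawler code in this post, it could be called as `ensure_output_dir(csv_file)` at the start of `main`, so a missing `data` directory does not cause the first write to fail.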

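Creating the function app itself is not covered here. What follows is the code preparation. Running code in a Function is straightforward: copy the existing code over and make a few adjustments.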

First, list the required modules in requirements.txt. They are installed automatically at deployment time.
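The file contents are not shown in this post. Judging only from the third-party imports in the crawler code here (`azure.functions`, `bs4`, `requests`), a plausible requirements.txt would look like the fragment below. This is an assumption rather than the author's actual file, and versions are left unpinned.

```text
azure-functions
beautifulsoup4
requests
```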

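Then put the prepared code into the init script (__init__.py). Replace calls such as print with the logging functions appropriate for Functions, and change the output path to the mounted path.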

import csv
import datetime
import logging
import os
from urllib import parse

import azure.functions as func
import requests
from bs4 import BeautifulSoup

# The two URLs below are missing from the source text; fill them in before running.
base_url = ''
announce_base_url = ''
no = 1
csv_file = '/formount/data/Azure_Updates.csv'
csv_header = ['No.', 'Update_Content', 'Update_URL', 'Update_date']
csv_data = []


def get_update_per_page(base_url, headers, params):
    """Fetch one listing page, collect its entries, and rewrite the CSV file."""
    response = requests.get(base_url, headers=headers, params=params)
    soup = BeautifulSoup(response.text, 'html.parser')
    update_lists = soup.find_all('a', attrs={'data-test-element': 'update-entry-link'})
    date_lists = soup.find_all(attrs={'class': 'column medium-1'})
    global no
    for i in range(len(update_lists)):
        update_data = {
            'No.': no,
            'Update_Content': update_lists[i].text,
            'Update_URL': parse.urljoin(announce_base_url, update_lists[i]['href']),
            'Update_date': date_lists[i].text.strip()
        }
        no += 1
        csv_data.append(update_data)
    write_dict_to_csv(csv_header, csv_data, csv_file)


def write_dict_to_csv(csv_header, csv_data, csv_file):
    """Write all collected rows to csv_file, overwriting any previous content."""
    with open(csv_file, 'w', encoding="utf_8_sig", newline='') as f:
        try:
            f_csv = csv.DictWriter(f, csv_header)
            f_csv.writeheader()
            f_csv.writerows(csv_data)
        except Exception as errors:
            # logging instead of print, so the message reaches the function logs
            logging.error(errors)


headers = {
    'authority': 'azure.microsoft.com',
    'cache-control': 'max-age=0',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'referer': '',  # value missing from the source text
    'accept-language': 'en,zh-CN;q=0.9,zh;q=0.8'
}


def main(mytimer: func.TimerRequest) -> None:
    global no
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    logging.info('Python timer trigger function ran at %s', utc_timestamp)

    # Module-level state can survive between invocations on a warm host, so reset it
    no = 1
    csv_data.clear()

    if os.access(csv_file, os.F_OK):
        logging.info("csv file already exists, removing file %s", csv_file)
        os.remove(csv_file)

    # Pages 1 through 19
    for i in range(1, 20):
        params = (
            ('Page', i),
        )
        get_update_per_page(base_url, headers, params)
    logging.info('please check csv file')
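The CSV-writing step can be checked locally without touching the network or the mount. The sketch below uses only the standard library and writes to a temporary directory. The sample row is made up for illustration, and `write_rows` / `read_rows` are hypothetical names that mirror what `write_dict_to_csv` does rather than the author's code.

```python
import csv
import os
import tempfile


def write_rows(csv_header, rows, csv_path):
    # utf_8_sig prepends a BOM so that Excel detects UTF-8;
    # newline='' lets the csv module control line endings itself
    with open(csv_path, 'w', encoding='utf_8_sig', newline='') as f:
        writer = csv.DictWriter(f, csv_header)
        writer.writeheader()
        writer.writerows(rows)


def read_rows(csv_path):
    # Reading with utf_8_sig strips the BOM again, so the first header is clean
    with open(csv_path, encoding='utf_8_sig', newline='') as f:
        return list(csv.DictReader(f))


header = ['No.', 'Update_Content', 'Update_URL', 'Update_date']
# Made-up sample row, used only for this check
sample = [
    {'No.': 1, 'Update_Content': 'Sample update',
     'Update_URL': 'https://example.com/a', 'Update_date': 'Jan 1'},
]

tmp_path = os.path.join(tempfile.mkdtemp(), 'Azure_Updates.csv')
write_rows(header, sample, tmp_path)
write_rows(header, sample, tmp_path)  # mode 'w' overwrites, so rows are not duplicated
result = read_rows(tmp_path)
```

Because the file is opened in `'w'` mode, each call rewrites the whole file. That is why the crawler can call its write function once per page with the growing list and still end up with one copy of every row.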

The next step is deploying the code to the Function, which actually turns out to be more tedious than writing the code.
