ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);">网页如下,有多个链接:ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);"> ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);">找到其中的a标签:ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);"><a hotrep="doc.overview.modules.path.0.0.1" href="https://cloud.tencent.com/document/product/1093/35681" title="产品优势">ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);">产品优势ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);"> ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);"></a>ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);">在deepseek中输入提示词:ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);">你是一个Python编程专家,要完成一个百度搜索页面爬取的Python脚本,具体任务如下:ingFang SC", "Microsoft YaHei", "Source Han Sans SC", "Noto Sans CJK SC", "WenQuanYi Micro Hei", sans-serif;font-size: medium;letter-spacing: normal;text-align: start;text-wrap: wrap;background-color: rgb(255, 255, 255);">解析网页:https://cloud.tencent.com/document/product/1093定位class="rno-learning-path-wrap"的div标签; 然后定位div标签中所有a标签,提取title属性值作为网页文件名,提取href属性值作为网页下载地址,下载网页,保存网页到文件夹:F:\aivideo\腾讯云语音识别 注意: 每一步都要输出信息到屏幕上 每下载一个网页后,随机暂停3-6秒; 设置请求标头: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7 Accept-Encoding: gzip, deflate, br, zstd Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6 Cache-Control: max-age=0 Priority: u=0, i Referer: https://cloud.tencent.com/product/asr?from_column=20421&from=20421 Sec-Ch-Ua: " Not A;Brand";v="99", "Chromium";v="127", "Google Chrome";v="127" Sec-Ch-Ua-Mobile: ?0 Sec-Ch-Ua-Platform: "Windows" Sec-Fetch-Dest: document Sec-Fetch-Mode: navigate Sec-Fetch-Site: same-origin Sec-Fetch-User: ?1 Upgrade-Insecure-Requests: 1 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Deepseek生成的源代码: import requests from bs4 import BeautifulSoup import os import time import random # 设置请求头 headers = { "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7", "Accept-Encoding": "gzip, deflate, br, zstd", "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6", "Cache-Control": "max-age=0", "Priority": "u=0, i", "Referer": "https://cloud.tencent.com/product/asr?from_column=20421&from=20421", "Sec-Ch-Ua": '" Not A;Brand";v="99", "Chromium";v="127", "Google Chrome";v="127"', "Sec-Ch-Ua-Mobile": "?0", "Sec-Ch-Ua-Platform": '"Windows"', "Sec-Fetch-Dest": "document", "Sec-Fetch-Mode": "navigate", "Sec-Fetch-Site": "same-origin", "Sec-Fetch-User": "?1", "Upgrade-Insecure-Requests": "1", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36" } # 创建文件夹 output_folder = 'F:\\aivideo\\腾讯云语音识别' if not os.path.exists(output_folder): os.makedirs(output_folder) # 请求网页 url = "https://cloud.tencent.com/document/product/1093" response = requests.get(url, headers=headers) response.encoding = 'utf-8' # 解析网页 soup = BeautifulSoup(response.text, 'html.parser') # 定位class="rno-learning-path-wrap"的div标签 div_tag = soup.find('div', class_='rno-learning-path-wrap') # 定位div标签中所有a标签 a_tags = div_tag.find_all('a') # 提取title属性值作为网页文件名,提取href属性值作为网页下载地址 for a_tag in a_tags: title = a_tag.get('title') href = a_tag.get('href') if title and href: # 输出信息到屏幕上 print(f"Found link: {title} - {href}") # 下载网页 try: response = requests.get(href, headers=headers) response.encoding = 'utf-8' file_path = os.path.join(output_folder, f"{title}.html") with open(file_path, 'w', encoding='utf-8') as file: file.write(response.text) print(f"Downloaded {href} to {file_path}") except Exception as e: print(f"Failed to download {href}: {e}") # 随机暂停3-6秒 time.sleep(random.uniform(3, 6))  |