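Install ScrapeGraphAI with pip:
pip install scrapegraphai
Install Playwright, which is used for JavaScript-based scraping:
playwright install
Installing the library in a virtual environment is recommended to avoid conflicts with other libraries.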
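SmartScraperGraph: single-page scraper that needs only a user prompt and an input source.
SearchGraph: multi-page scraper that extracts information from the top n results of a search engine.
SpeechGraph: single-page scraper that extracts information from a website and generates an audio file.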
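SmartScraperGraph with a local model:
Make sure Ollama is installed and download the model with the ollama pull command.
The example code creates a SmartScraperGraph instance and runs it to get a list of projects and their descriptions.
SearchGraph with mixed models:
Groq serves as the LLM and Ollama as the embedding model.
The example code creates a SearchGraph instance and runs it to get a list of traditional recipes from Chioggia.
SpeechGraph with OpenAI:
Only the OpenAI API key and the model name need to be passed.
The example code creates a SpeechGraph instance and runs it to generate an audio file summarizing the projects.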
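The output of SmartScraperGraph is a list of projects with their descriptions.
The output of SearchGraph is a list of recipes.
The output of SpeechGraph is an audio file summarizing the projects on the page.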
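Since the first two outputs are plain dictionaries, they can be consumed with ordinary Python. The following is a minimal sketch, assuming the result shapes match the sample outputs shown further down in this post (a 'projects' list with 'title' keys and a 'recipes' list with 'name' keys). The helper names project_titles and recipe_names are hypothetical and not part of ScrapeGraphAI, and the sample data is abbreviated from those outputs.

```python
from typing import Any, Dict, List


def project_titles(result: Dict[str, Any]) -> List[str]:
    """Collect the 'title' of every entry under 'projects', skipping malformed entries."""
    return [p["title"] for p in result.get("projects", []) if "title" in p]


def recipe_names(result: Dict[str, Any]) -> List[str]:
    """Collect the 'name' of every entry under 'recipes', skipping malformed entries."""
    return [r["name"] for r in result.get("recipes", []) if "name" in r]


# Abbreviated sample data with the same shape as the outputs in this post.
sample_projects = {
    "projects": [
        {"title": "Rotary Pendulum RL", "description": "Open Source project"},
        {"title": "DQN Implementation from scratch", "description": "Deep Q-Network"},
    ]
}
sample_recipes = {"recipes": [{"name": "Sarde in Saòre"}, {"name": "Bigoli in salsa"}]}

if __name__ == "__main__":
    print(project_titles(sample_projects))
    print(recipe_names(sample_recipes))
```

Because the structure is produced by an LLM, it may vary between runs, which is why the sketch uses .get() and skips entries without the expected key instead of indexing directly.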
An OpenAI API key must be set up before use.
The documentation and reference pages are available on the official ScrapeGraphAI pages.
The reference page for Scrapegraph-ai is available on PyPI.
pip install scrapegraphai
You will also need to install Playwright for JavaScript-based scraping:
playwright install
Note: installing the library in a virtual environment is recommended to avoid conflicts with other libraries.
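To make the virtual-environment advice easy to verify before running pip, here is a small sketch that checks whether the current interpreter belongs to a virtual environment. It relies only on the standard library; the function name in_virtual_environment is hypothetical, and the check covers environments created with the venv module (and recent virtualenv releases), not every possible isolation tool.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; one can be created with: python -m venv .venv")
```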
Follow the procedure at the following link to set up your OpenAI API key: link.
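Once a key exists, one common way to hand it to a script is through an environment variable rather than hard-coding it. The sketch below assumes the key is stored in a variable named OPENAI_API_KEY (a widespread convention, not something this post specifies); the helper load_api_key is hypothetical and not part of ScrapeGraphAI.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment and fail early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


if __name__ == "__main__":
    try:
        key = load_api_key()
        # Never print the key itself; report only that it was found.
        print("API key loaded")
    except RuntimeError as exc:
        print(exc)
```

The returned string could then be placed in the graph configuration wherever an API key is expected.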
The documentation for ScrapeGraphAI can be found here.
Also check out the Docusaurus documentation.
There are three main scraping pipelines that can be used to extract information from a website (or local file):
SmartScraperGraph: single-page scraper that only needs a user prompt and an input source;
SearchGraph: multi-page scraper that extracts information from the top n search results of a search engine;
SpeechGraph: single-page scraper that extracts information from a website and generates an audio file.
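The choice among the three pipelines follows from two questions: is there a specific input source, and is audio output wanted. As a rough illustration only, the decision could be sketched as below; choose_pipeline is a hypothetical helper that returns the pipeline name as a string and does not call ScrapeGraphAI.

```python
def choose_pipeline(has_source: bool, wants_audio: bool = False) -> str:
    """Pick a pipeline name from the three described above.

    has_source: a website URL or local file is available as input.
    wants_audio: the result should be an audio file instead of structured data.
    """
    if wants_audio:
        # SpeechGraph is described as a single-page scraper, so it needs a source.
        if not has_source:
            raise ValueError("SpeechGraph needs an input source")
        return "SpeechGraph"
    # With a source, scrape that single page; without one, go through search results.
    return "SmartScraperGraph" if has_source else "SearchGraph"


if __name__ == "__main__":
    print(choose_pipeline(has_source=True))
    print(choose_pipeline(has_source=False))
    print(choose_pipeline(has_source=True, wants_audio=True))
```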
It is possible to use different LLMs through APIs, such as OpenAI, Groq, Azure and Gemini, or local models using Ollama.
Remember to have Ollama installed and download the models using the ollama pull command.
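Both prerequisites can be checked from Python before a local-model run. The sketch below looks for the ollama executable on the PATH and builds the ollama pull command for a given model. The helper names are hypothetical, and the model name used in the example ("mistral") is only an illustration, not a value taken from this post.

```python
import shutil
from typing import List


def ollama_installed() -> bool:
    """Return True if an 'ollama' executable is found on the PATH."""
    return shutil.which("ollama") is not None


def ollama_pull_command(model: str) -> List[str]:
    """Build the argument list for 'ollama pull <model>'."""
    name = model.strip()
    if not name:
        raise ValueError("model name must not be empty")
    return ["ollama", "pull", name]


if __name__ == "__main__":
    cmd = ollama_pull_command("mistral")  # hypothetical model name
    if ollama_installed():
        print("Ollama found; the model can be downloaded with:", " ".join(cmd))
    else:
        print("Ollama was not found on the PATH; install it before pulling models.")
```

The returned list can be passed to subprocess.run(cmd, check=True) to perform the download; the sketch stops short of that so it has no side effects.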
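from scrapegraphai.graphs import SmartScraperGraph

# The literal values were lost from this copy of the example; entries in
# angle brackets are placeholders and commented values are assumptions.
graph_config = {
    "llm": {
        "model": "<ollama model name>",
        "temperature": 0,  # assumed value
        "format": "json",  # assumed value
        "base_url": "<Ollama server URL>",
    },
    "embeddings": {
        "model": "<ollama embedding model name>",
        "base_url": "<Ollama server URL>",
    },
    "verbose": True,  # assumed value
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List all the projects with their descriptions",
    source="<URL of the page to scrape>",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
The output will be a list of projects with their descriptions, like the following: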
{'projects': [{'title': 'Rotary Pendulum RL', 'description': 'Open Source project aimed at controlling a real life rotary pendulum using RL algorithms'}, {'title': 'DQN Implementation from scratch', 'description': 'Developed a Deep Q-Network algorithm to train a simple and double pendulum'}, ...]}
We use Groq for the LLM and Ollama for the embeddings.
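from scrapegraphai.graphs import SearchGraph

# The literal values were lost from this copy of the example; entries in
# angle brackets are placeholders and commented values are assumptions.
graph_config = {
    "llm": {
        "model": "<groq model name>",
        "api_key": "<GROQ_API_KEY>",
        "temperature": 0  # assumed value
    },
    "embeddings": {
        "model": "<ollama embedding model name>",
        "base_url": "<Ollama server URL>",
    },
    "max_results": 5,  # assumed value for the number n of search results
}

search_graph = SearchGraph(
    prompt="List all the traditional recipes from Chioggia",
    config=graph_config
)

result = search_graph.run()
print(result)
The output will be a list of recipes like the following: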
{'recipes': [{'name': 'Sarde in Saòre'}, {'name': 'Bigoli in salsa'}, {'name': 'Seppie in umido'}, {'name': 'Moleche frite'}, {'name': 'Risotto alla pescatora'}, {'name': 'Broeto'}, {'name': 'Bibarasse in Cassopipa'}, {'name': 'Risi e bisi'}, {'name': 'Smegiassa Ciosota'}]}
You just need to pass the OpenAI API key and the model name.
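from scrapegraphai.graphs import SpeechGraph

# The literal values were lost from this copy of the example; entries in
# angle brackets are placeholders.
graph_config = {
    "llm": {
        "api_key": "<OPENAI_API_KEY>",
        "model": "<openai model name>",
    },
    "tts_model": {
        "api_key": "<OPENAI_API_KEY>",
        "model": "<text-to-speech model name>",
        "voice": "<voice name>"
    },
    "output_path": "<path of the audio file to write>",
}

speech_graph = SpeechGraph(
    prompt="Make an audio summary of the projects",
    source="<URL of the page to scrape>",
    config=graph_config,
)

result = speech_graph.run()
print(result)
The output will be an audio file with the summary of the projects on the page.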
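After a run, it can be worth confirming that the audio file was actually written. The following is a minimal standard-library sketch; audio_file_ready is a hypothetical helper, and the file name audio_summary.mp3 is an assumed example path rather than a value taken from this post.

```python
from pathlib import Path


def audio_file_ready(path: str) -> bool:
    """Return True if the path points to an existing, non-empty file."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


if __name__ == "__main__":
    output_path = "audio_summary.mp3"  # assumed example path
    if audio_file_ready(output_path):
        print(f"Audio summary written to {output_path}")
    else:
        print(f"No audio file found at {output_path}")
```

This checks only that a non-empty file exists; it does not validate the audio format.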