
Title: A Detailed but Incomplete Guide to the Ollama API: Calling the Ollama API Directly from Python, with a Translation-Assistant Demo

Author: 链载Ai    Time: 2025-12-2 09:28

I have written a few posts about Ollama recently and they drew a lot of interest. Ollama makes it possible to play with large language models even without a GPU; an ordinary CPU can run Qwen or Gemma. Ollama ships its own Python package, which exposes everything a local Ollama instance can do, and it also provides an HTTP API. Calling the API is simple and similar to calling OpenAI's, and there is an OpenAI-compatible endpoint as well. Going through the Ollama API docs, the API lets you list local models, hold chat conversations, generate completions, create models, and pull or delete models. At the end I use Python to build a translation-assistant demo. I hope this is useful to you; remember to like and follow.

1. Generate a completion

Format

POST /api/generate

Generates a response for the given prompt with the provided model. This is a streaming endpoint, so by default there is a series of response objects; the final response object includes statistics and additional data about the request.

Parameters

model: (required) the model name
prompt: the prompt to generate a response for

Advanced parameters (optional):

format: the format to return the response in (currently only json)
options: additional model parameters such as temperature (see below)
system: system prompt, overriding what is defined in the Modelfile
template: prompt template, overriding what is defined in the Modelfile
context: the context returned from a previous request, used for short conversational memory
stream: if false, the response is returned as a single object rather than a stream
raw: if true, no formatting is applied to the prompt
keep_alive: how long the model stays loaded in memory after the request (default: 5m)

Both the request and the response are JSON. A request looks like:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What color is the sky at different times of the day? Respond using JSON",
  "format": "json",
  "stream": false
}'

Any of the advanced parameters can be carried in the request. keep_alive, for example, defaults to 5 minutes: if there is no activity for 5 minutes, the model's memory is released. Setting it to -1 keeps the model loaded in memory indefinitely.

The response comes back as:

{
  "model": "llama2",
  "created_at": "2023-11-09T21:07:55.186497Z",
  "response": "{\n\"morning\":{\n\"color\":\"blue\"\n},\n\"noon\":{\n\"color\":\"blue-gray\"\n},\n\"afternoon\":{\n\"color\":\"warm gray\"\n},\n\"evening\":{\n\"color\":\"orange\"\n}\n}\n",
  "done": true,
  "context": [1, 2, 3],
  "total_duration": 4648158584,
  "load_duration": 4071084,
  "prompt_eval_count": 36,
  "prompt_eval_duration": 439038000,
  "eval_count": 180,
  "eval_duration": 4196918000
}
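The duration fields are reported in nanoseconds, so a rough generation speed can be derived from eval_count and eval_duration. A small offline sketch using the sample numbers above; note also that with "format": "json" the "response" field is itself a JSON string and needs a second parse:

```python
import json

# Canned final-response object, trimmed from the sample above
raw = '''{
  "model": "llama2",
  "response": "{\\n\\"morning\\": {\\"color\\": \\"blue\\"}\\n}",
  "done": true,
  "eval_count": 180,
  "eval_duration": 4196918000
}'''

resp = json.loads(raw)

# Durations are nanoseconds; convert to tokens per second
tokens_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tokens_per_sec:.1f} tokens/s")  # roughly 42.9 for these numbers

# With "format": "json", the "response" field holds a JSON string
inner = json.loads(resp["response"])
print(inner["morning"]["color"])  # blue
```

On a live server the same arithmetic applies to the dict returned by requests.post(...).json().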

Calling the API from PowerShell:

(Invoke-WebRequest -Method POST -Body '{"model":"llama2", "prompt":"Why is the sky blue?", "stream":false}' -Uri http://localhost:11434/api/generate).Content | ConvertFrom-Json

Calling the API from Python:

import requests
import json

url_generate = "http://localhost:11434/api/generate"

def get_response(url, data):
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    response_content = response_dict["response"]
    return response_content

data = {
    "model": "gemma:7b",
    "prompt": "Why is the sky blue?",
    "stream": False
}

res = get_response(url_generate, data)
print(res)

The above accesses the endpoint from Python, so it can be called directly from program code, which suits batch jobs that generate results.
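When stream is left at its default of true, the endpoint returns one JSON object per line, each carrying a fragment of the output in its "response" field, with "done" marking the last one. A sketch of collecting such a stream; collect_stream is a hypothetical helper, and with a live server you would feed it response.iter_lines() from requests.post(url, json=data, stream=True), while here canned chunks stand in:

```python
import json

def collect_stream(lines):
    """Join the "response" fragments from a stream of JSON lines
    (one object per line, as /api/generate emits when streaming)."""
    text = []
    for line in lines:
        if not line:
            continue
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Canned chunks standing in for response.iter_lines()
sample = [
    '{"model":"gemma:7b","response":"The sky ","done":false}',
    '{"model":"gemma:7b","response":"is blue.","done":true}',
]
print(collect_stream(sample))  # The sky is blue.
```

Streaming lets you show partial output immediately instead of waiting for the whole generation.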

In a normal request, options is omitted. It can set many parameters, such as temperature, whether to use the GPU, and the context length; all of that is configured here. Below is a request that includes options:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20,
    "top_p": 0.9,
    "tfs_z": 0.5,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2,
    "presence_penalty": 1.5,
    "frequency_penalty": 1.0,
    "mirostat": 1,
    "mirostat_tau": 0.8,
    "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gqa": 1,
    "num_gpu": 1,
    "main_gpu": 0,
    "low_vram": false,
    "f16_kv": true,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "rope_frequency_base": 1.1,
    "rope_frequency_scale": 0.8,
    "num_thread": 8
  }
}'
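In practice only a handful of these options are worth touching. A minimal Python sketch of a request payload carrying a few commonly tuned ones; the values are illustrative, not recommendations:

```python
import json

payload = {
    "model": "llama2",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {
        "temperature": 0.8,       # higher = more varied, lower = more deterministic
        "num_ctx": 1024,          # context window size in tokens
        "seed": 42,               # fixed seed for reproducible sampling
        "num_predict": 100,       # cap on the number of generated tokens
        "stop": ["\n", "user:"],  # sequences that end generation
    },
}

# This is the JSON body you would POST to /api/generate
body = json.dumps(payload)
print(json.loads(body)["options"]["num_ctx"])  # 1024
```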

2. Generate a chat completion

Format

POST /api/chat

Very similar to generating a completion above.

Parameters

model: (required) the model name
messages: the messages of the chat; this is what keeps the conversational memory

The message object has the following fields:

role: the role of the message, one of system, user, or assistant
content: the content of the message
images: (optional) a list of images to include, for multimodal models such as llava

Advanced parameters (optional):

format, options, stream, keep_alive: as for /api/generate

Sending a chat request:

curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'

The difference from generate: messages corresponds to prompt. prompt is followed directly by the text to discuss, while each entry in messages also carries a role; an entry with the user role holds the question being asked.
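That mapping can be captured in a tiny helper; as_chat is a hypothetical name, not part of any Ollama package:

```python
def as_chat(prompt, history=None):
    """Wrap a bare /api/generate prompt into the /api/chat messages
    format, optionally appending it to an existing conversation."""
    messages = list(history or [])  # copy, so the caller's list is untouched
    messages.append({"role": "user", "content": prompt})
    return messages

print(as_chat("why is the sky blue?"))
# [{'role': 'user', 'content': 'why is the sky blue?'}]
```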

The response returns (this is the final object of the stream, which carries the statistics):

{
  "model": "llama2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 4883583458,
  "load_duration": 1334875,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 342546000,
  "eval_count": 282,
  "eval_duration": 4535599000
}

You can also send a request that includes chat history:

curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "due to rayleigh scattering."
    },
    {
      "role": "user",
      "content": "how is that different than mie scattering?"
    }
  ]
}'

Generating a chat completion from Python:

import requests
import json

url_chat = "http://localhost:11434/api/chat"

data = {
    "model": "llama2",
    "messages": [
        {
            "role": "user",
            "content": "why is the sky blue?"
        }
    ],
    "stream": False
}

response = requests.post(url_chat, json=data)
response_dict = json.loads(response.text)
print(response_dict)
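For a genuinely multi-turn session, you append the assistant's reply to messages before the next request. A sketch with the bookkeeping factored out of the network call; advance is a hypothetical helper, and on a live server the assistant text would come from the response's message content:

```python
def advance(history, user_text, assistant_text):
    """Append one user/assistant exchange to the running history,
    returning a new list so earlier snapshots stay intact."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]

history = []
history = advance(history, "why is the sky blue?",
                  "Due to Rayleigh scattering.")
history = advance(history, "how is that different than mie scattering?",
                  "Mie scattering involves larger particles.")
print(len(history))  # 4
```

The full history list is then what you pass as "messages" on the next /api/chat request.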

3. Create a model

Format

POST /api/create

Parameters

name: name of the model to create
modelfile: (optional) contents of the Modelfile as a string
stream: (optional) if false, the response is returned as a single object rather than a stream
path: (optional) path to the Modelfile

A request that creates a model:

curl http://localhost:11434/api/create -d '{
  "name": "mario",
  "modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
}'

This creates a model based on llama2, with the character set through the system role. I will not go into the returned result.

Creating a model with Python:

import requests
import json

url_create = "http://localhost:11434/api/create"

data = {
    "name": "mario",
    "modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
}

response = requests.post(url_create, json=data)
response_dict = json.loads(response.text)
print(response_dict)

This Python snippet does the same thing as the curl request above.

4. List models

Format

GET /api/tags

Lists all models available locally.

Listing models with Python:

import requests
import json

url_list = "http://localhost:11434/api/tags"

def get_list(url):
    response = requests.get(url)
    response_dict = json.loads(response.text)
    model_names = [model["name"] for model in response_dict["models"]]
    # Print every model name with an index
    for idx, name in enumerate(model_names, start=1):
        print(f"{idx}. {name}")
    return model_names

get_list(url_list)

The result:

1. codellama:13b
2. codellama:7b-code
3. gemma:2b
4. gemma:7b
5. gemma_7b:latest
6. gemma_sumary:latest
7. llama2:7b
8. llama2:latest
9. llava:7b
10. llava:v1.6
11. mistral:latest
12. mistrallite:latest
13. nomic-embed-text:latest
14. qwen:1.8b
15. qwen:4b
16. qwen:7b

5. Show model information

Format

POST /api/show

Shows information about a model, including details, modelfile, template, parameters, license, and system prompt.

Parameters

name: name of the model to show

Request:

curl http://localhost:11434/api/show -d '{
  "name": "llama2"
}'

Showing model information with Python:

import requests
import json

url_show_info = "http://localhost:11434/api/show"

def show_model_info(url, model_name):
    data = {
        "name": model_name
    }
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    print(response_dict)

show_model_info(url_show_info, "gemma:7b")
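Since the payload can be large (the license field alone may run to several pages), it often helps to pull out just a few keys. A hedged sketch against a canned stand-in response; the field names follow the Ollama docs, and summarize_show is a hypothetical helper:

```python
def summarize_show(info):
    """Pick the commonly useful keys out of an /api/show response,
    defaulting to an empty string when a key is absent."""
    return {key: info.get(key, "") for key in ("template", "parameters", "details")}

# Canned stand-in for a live /api/show response
sample = {
    "license": "Gemma Terms of Use ...",
    "template": "{{ .Prompt }}",
    "parameters": "stop <start_of_turn>",
}
print(summarize_show(sample)["template"])  # {{ .Prompt }}
```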

The returned result (truncated):

{'license': 'Gemma Terms of Use\n\nLast modified: February 21, 2024\n\nBy using, reproducing, modifying, distributing, performing or displaying any portion or element of Gemma, Model Derivatives including via any Hosted Service, (each as defined below) (collectively, the "Gemma Services") or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement. ...'
.......

6. Other endpoints

Besides the features above, you can also copy, delete, and pull models, and if you have an Ollama account you can push models to the Ollama server.

7. Ollama-related settings

1. Local model storage

Default storage location for Windows users:

C:\Users\<username>\.ollama\models

To change the default storage location, set the OLLAMA_MODELS environment variable to the desired path, and models will be stored there instead.
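For reference, the per-user default and the override can be resolved in Python; a small sketch:

```python
import os

# Per-user default model directory when OLLAMA_MODELS is not set
default = os.path.join(os.path.expanduser("~"), ".ollama", "models")

# The environment variable, if present, takes precedence
models_dir = os.environ.get("OLLAMA_MODELS", default)
print(models_dir)
```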

2. Importing a GGUF model

You may have a GGUF model downloaded from Hugging Face; it can be imported by creating a model from a Modelfile. Create a file named Modelfile:

FROM ./mistral-7b-v0.1.Q4_0.gguf

Then create a new model from this Modelfile:

ollama create example -f Modelfile

Here example is the name of the new model; afterwards you just refer to the model by that name.

3. Parameter settings

When running a model normally you rarely change the parameters, but a request can set them through options, for example the number of context tokens:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": {
    "num_ctx": 4096
  }
}'

The default is 2048; here it is raised to 4096. Other knobs, such as whether to use the GPU, can be configured the same way through the request parameters once the background service is running.

8. OpenAI compatibility

Ollama exposes an OpenAI-compatible endpoint, so the openai package can talk directly to the local Ollama service.

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama2',
)

which yields (response content truncated):

ChatCompletion(id='chatcmpl-173', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='\nThe question "Why is the sky blue?" is a common one, and there are several reasons why the sky appears blue to our eyes. ...', role='assistant', function_call=None, tool_calls=None))], created=1710810193, model='llama2:7b', object='chat.completion', system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=498, prompt_tokens=34, total_tokens=532))

9. Translation assistant

Finally, a translation assistant. With this many large models, trained on ample Chinese and English corpora, acting as a free translator should be no problem. I like digging up English resources online; sometimes there are no subtitles and my English is poor, so if this model study gets subtitle translation working well, it will have paid off. The Python code below calls Ollama and assigns it an identity: it acts as a translator, receives only English text, and outputs Chinese directly ("Translate the following into chinese and only show me the translated"). This is just a demo; subtitle extraction and feeding the text in for translation should both be manageable. The demo below translates the Grok announcement page, so we can see how well it does.

import requests
import json

text = """
We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with using the model, follow the instructions at github.com/xai-org/grok.

Model Details
Base model trained on a large amount of text data, not fine-tuned for any particular task.
314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.
"""

url_generate = "http://localhost:11434/api/generate"

data = {
    "model": "mistral:latest",
    "prompt": text,
    "system": "Translate the following into chinese and only show me the translated",
    "stream": False
}

def get_response(url, data):
    response = requests.post(url, json=data)
    response_dict = json.loads(response.text)
    response_content = response_dict["response"]
    return response_content

res = get_response(url_generate, data)
print(res)
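Real subtitle files can exceed the model's context window, so a practical next step is to split the text and translate it chunk by chunk. A sketch of such a splitter; chunk_paragraphs is a hypothetical helper, and each chunk would become the prompt of its own /api/generate request with the same system instruction:

```python
def chunk_paragraphs(text, max_chars=500):
    """Group blank-line-separated paragraphs into chunks that stay
    under max_chars, so each request fits the model's context."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

parts = chunk_paragraphs("First paragraph.\n\nSecond paragraph.", max_chars=20)
print(parts)  # ['First paragraph.', 'Second paragraph.']
```

Translating chunk by chunk also makes failures cheap to retry: only the failed chunk needs to be resent.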

That is a rough demo; the details can be tuned later. That wraps up today's content.






Welcome to 链载Ai (http://www.lianzai.com/). Powered by Discuz! X3.5