Ollama runs LLMs locally. It is essentially a Go-based web server plus a CLI, and it calls into the model backend through CGO. On macOS it can be installed with Homebrew:
brew install ollama
Start Ollama; under the hood this brings up a Gin web server.
ollama serve
Pull a model (the workflow resembles pulling a container image)
ollama pull qwen2.5:1.5b
List the downloaded models
ollama list
Check which models are loaded and how much memory they use
ollama ps
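The CLI views above are thin wrappers over the local HTTP API. Below is a minimal Python sketch of fetching the same information; it assumes the server exposes /api/tags (downloaded models) and /api/ps (models currently loaded in memory) on the default port.

import json
from urllib.request import urlopen

# Assumption: /api/tags lists downloaded models, /api/ps lists loaded models.
for path in ('/api/tags', '/api/ps'):
    with urlopen(f'http://localhost:11434{path}') as resp:
        data = json.load(resp)
    print(path, [m['name'] for m in data.get('models', [])])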
Call it with curl
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:1.5b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
Call it from Python with the ollama package
from ollama import ChatResponse
from ollama import Client

# Point the client at the local Ollama server.
client = Client(host='http://localhost:11434')

response: ChatResponse = client.chat(
    model='qwen2.5:1.5b',
    messages=[
        {
            'role': 'user',
            'content': 'What is Helm?',
        },
    ],
)
print(response.message.content)
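The ollama client can also stream the reply instead of returning it in one piece. A small sketch, assuming the client's stream=True option, which yields partial ChatResponse chunks:

from ollama import Client

client = Client(host='http://localhost:11434')

# Assumption: with stream=True, chat() returns an iterator of partial responses.
stream = client.chat(
    model='qwen2.5:1.5b',
    messages=[{'role': 'user', 'content': 'What is Helm?'}],
    stream=True,
)
for chunk in stream:
    print(chunk.message.content, end='', flush=True)
print()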
Or call it through the OpenAI Python SDK, since Ollama exposes an OpenAI-compatible endpoint
from openai import OpenAI

# Ollama's /v1 endpoint is OpenAI-compatible; the api_key is required by the SDK but not checked by Ollama.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = client.chat.completions.create(
    model="qwen2.5:1.5b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
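Streaming works through the OpenAI SDK as well; the sketch below uses the SDK's standard stream=True flag and should behave the same against Ollama's OpenAI-compatible endpoint.

from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

# Each streamed chunk carries an incremental delta of the assistant message.
stream = client.chat.completions.create(
    model="qwen2.5:1.5b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()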