Ollama runs LLMs locally. It is essentially a Go-based web server plus a CLI, and it calls into the model backend through CGO. On macOS it can be installed with Homebrew:
brew install ollama
Start Ollama; under the hood this brings up a Gin web server.
ollama serve
Pull a model (the workflow resembles pulling a container image)
ollama pull qwen2.5:1.5b
List the downloaded models
ollama list
Check which models are loaded and how much memory they use
ollama ps
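The CLI views above are thin wrappers over the local HTTP API. Below is a minimal Python sketch of fetching the same information; it assumes the server exposes /api/tags (downloaded models) and /api/ps (models currently loaded in memory) on the default port.

import json
from urllib.request import urlopen

# Assumption: /api/tags lists downloaded models, /api/ps lists loaded models.
for path in ('/api/tags', '/api/ps'):
    with urlopen(f'http://localhost:11434{path}') as resp:
        data = json.load(resp)
    print(path, [m['name'] for m in data.get('models', [])])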
Call it with curl
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:1.5b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
Call it from Python with the ollama package
from ollama import ChatResponse
from ollama import Client

# Point the client at the local Ollama server.
client = Client(host='http://localhost:11434')

response: ChatResponse = client.chat(
    model='qwen2.5:1.5b',
    messages=[
        {
            'role': 'user',
            'content': 'What is Helm?',
        },
    ],
)
print(response.message.content)
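The ollama client can also stream the reply instead of returning it in one piece. A small sketch, assuming the client's stream=True option, which yields partial ChatResponse chunks:

from ollama import Client

client = Client(host='http://localhost:11434')

# Assumption: with stream=True, chat() returns an iterator of partial responses.
stream = client.chat(
    model='qwen2.5:1.5b',
    messages=[{'role': 'user', 'content': 'What is Helm?'}],
    stream=True,
)
for chunk in stream:
    print(chunk.message.content, end='', flush=True)
print()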
Or call it through the OpenAI Python SDK, since Ollama exposes an OpenAI-compatible endpoint
from openai import OpenAI

# Ollama's /v1 endpoint is OpenAI-compatible; the api_key is required by the SDK but not checked by Ollama.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = client.chat.completions.create(
    model="qwen2.5:1.5b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
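Streaming works through the OpenAI SDK as well; the sketch below uses the SDK's standard stream=True flag and should behave the same against Ollama's OpenAI-compatible endpoint.

from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

# Each streamed chunk carries an incremental delta of the assistant message.
stream = client.chat.completions.create(
    model="qwen2.5:1.5b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()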