I. Deploying Large Models on K8s with the Model Resource

1. On K8s, a large model can be deployed directly through the Ollama Operator's Model CRD. For example, deploy a phi model:

# Write the manifest
cd /home/ubuntu

vim phi.yaml

apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: phi
spec:
  image: phi
  storageClassName: local-path
  replicas: 1
  imagePullPolicy: IfNotPresent

# Apply the YAML
kubectl create -f phi.yaml

# Verify
kubectl get po  
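
You can also inspect the Model custom resource itself. A quick sketch, assuming the CRD's plural resource name is models under the ollama.ayaka.io group used in the manifest above:

# Check the Model resource status
kubectl get models.ollama.ayaka.io

kubectl describe models.ollama.ayaka.io phi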

2. The first time a model is deployed, the Operator first creates a store service that holds the ollama model files.

# Once the store is up, an ollama workload is created to run the model
kubectl get po 

# If the model is not already cached locally, an init container downloads it first
kubectl logs -f  ollama-model-phi-6dbf6988fc-5jb7r  -c  ollama-image-pull

# Once the download finishes, the model service starts
kubectl get po -n ollama-llms

# Inspect the downloaded model files
kubectl exec -ti ollama-models-store-0 -n ollama-llms -- bash
root@ollama-models-store-0:/# ls /root/.ollama/models/
blobs  manifests

# A Service is also created for calling the model
root@VM-0-2-ubuntu:/home/ubuntu# kubectl get svc 
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
ollama-model-phi      ClusterIP   10.103.112.80   <none>        11434/TCP   5m8s
ollama-models-store   ClusterIP   10.98.84.255    <none>        11434/TCP   8m3s

# Test access (use the ClusterIP of the ollama-model-phi Service shown above)
curl http://10.103.112.80:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "phi",
        "messages": [
          {
            "role": "user",
            "content": "Hello! 你是什么模型?参数量有多大?"
          }
        ]
      }'

# Response
{"id":"chatcmpl-267","object":"chat.completion","created":1739962225,"model":"phi","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":" Hi there! As an NLP model, my components are quite extensive. However, I can perform various functions such as language translation, question answering, sentiment analysis, text summarization, and more. The parameters for each of these functions vary depending on the specific task at hand, but overall, my training process involves numerous layers and models to improve performance over time while using large datasets.\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":65,"completion_tokens":79,"total_tokens":144}}

3. After testing, the model can be removed.

# Remove the model
kubectl delete -f phi.yaml

# Verify the result
root@VM-0-2-ubuntu:/home/ubuntu# kubectl get svc
NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
kubernetes            ClusterIP   10.96.0.1        <none>        443/TCP     67m
ollama-models-store   ClusterIP   10.105.115.183   <none>        11434/TCP   10m

root@VM-0-2-ubuntu:/home/ubuntu# kubectl get po
NAME                    READY   STATUS      RESTARTS   AGE
cuda-vectoradd          0/1     Completed   0          61m
ollama-models-store-0   1/1     Running     0          10m
volume-test             1/1     Running     0          56m

Note: the store service is not removed. When other models are created later, the store service does not need to be installed again.
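
If you ever want to reclaim the store as well, it has to be deleted manually. A sketch, assuming the Operator uses the ollama-models-store naming seen in the outputs above (verify the exact resource names in your cluster first):

# Remove the shared model store and its Service
kubectl delete sts ollama-models-store
kubectl delete svc ollama-models-store

# The PVC holding the cached model blobs must be deleted separately
kubectl get pvc | grep ollama-models-store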

II. Deploying Large Models on K8s with the Kollama Tool

Besides deploying through the custom resource directly, the Kollama CLI can be used as well.

1. Deploy the phi model in the ollama-llms namespace

cd /home/ubuntu

./kollama deploy phi --image=phi --storage-class  local-path -n ollama-llms
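
Under the hood, kollama is a thin wrapper around the same CRD: the deploy command creates a Model resource just like the YAML in Part I. You can confirm this (again assuming the plural resource name is models):

kubectl get models.ollama.ayaka.io -n ollama-llms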

2. Check the created Pod (models that were already downloaded are not downloaded again)

kubectl get po -n ollama-llms

3. Expose the service with the expose command

cd /home/ubuntu

root@VM-0-2-ubuntu:/home/ubuntu# ./kollama expose phi -n ollama-llms
🎉 The model has been exposed through a service over 10.224.0.2:30997.

To start a chat with ollama:

  OLLAMA_HOST=10.224.0.2:30997 ollama run phi

To integrate with your OpenAI API compatible client:

  curl http://10.224.0.2:30997/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "phi",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'.

4. Test

# Test
curl http://10.224.0.2:30997/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "phi",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

# Response
{"id":"chatcmpl-708","object":"chat.completion","created":1739962460,"model":"phi","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":" Hi there! How can I assist you today?\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":34,"completion_tokens":13,"total_tokens":47}}

5. After testing, the model can be removed

cd /home/ubuntu

./kollama undeploy phi -n ollama-llms

III. One-Click Deployment of the DeepSeek R1 Model on K8s

1. Define the YAML file

vim deepseek-r1-1.5b.yaml

apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: deepseek-r1
  namespace: ollama-llms
spec:
  image: deepseek-r1:1.5b
  storageClassName: local-path 
  replicas: 1
  imagePullPolicy: IfNotPresent

2. Create the model

kubectl create -f deepseek-r1-1.5b.yaml -n ollama-llms

3. Check the status

kubectl get po -n ollama-llms

4. Check the created Service and run a test

root@VM-0-2-ubuntu:/home/ubuntu# kubectl get svc -n ollama-llms
NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
ollama-model-deepseek-r1   ClusterIP   10.103.67.9    <none>        11434/TCP   5m41s
ollama-models-store        ClusterIP   10.98.84.255   <none>        11434/TCP   15m

# Test (use the ClusterIP of the ollama-model-deepseek-r1 Service shown above)
curl http://10.103.67.9:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1:1.5b",
        "messages": [
          {
            "role": "user",
            "content": "Hello! 你是什么模型?参数量有多大?"
          }
        ]
      }'

# Response

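If the ClusterIP is not reachable from your workstation, a port-forward works as well. A sketch using standard kubectl, with the Service name following the ollama-model-<name> pattern seen above:

# Forward the Service locally, then call it on 127.0.0.1
kubectl port-forward svc/ollama-model-deepseek-r1 -n ollama-llms 11434:11434 &

curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1:1.5b", "messages": [{"role": "user", "content": "Hello!"}]}'
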
IV. One-Click Deployment of Any Large Model on K8s

Other large models can be deployed in exactly the same way.

1. Find the model you want to deploy on the Ollama site; here we use llama3.2 as an example

Ollama model library: https://ollama.com/search

2. Locate the specific llama3.2 tag

3. Create the Model resource

# Create the YAML file
cd /home/ubuntu

vim llama3.2.yaml

apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: llama-3-2-1b
spec:
  image: llama3.2:1b
  storageClassName: local-path
  replicas: 1
  imagePullPolicy: IfNotPresent

4. Create the model

kubectl create -f llama3.2.yaml 
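
Verification is the same as before. Assuming the Operator follows the ollama-model-<name> naming seen in the earlier outputs:

# Watch the init container pull the image, then the model start
kubectl get po -w

# The Operator exposes it through a Service as well
kubectl get svc ollama-model-llama-3-2-1b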

V. Deploying Open WebUI on K8s

1. Create the PVC for the WebUI

cd /home/ubuntu

# Define the YAML file
vim open-webui-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: open-webui
  name: open-webui-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi

# Apply the YAML to create the PVC
kubectl create -f open-webui-pvc.yaml 
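
Make sure the claim binds before moving on. Note that the local-path provisioner typically uses WaitForFirstConsumer binding, so the PVC may stay Pending until the WebUI pod is scheduled:

# Check the PVC status
kubectl get pvc open-webui-pvc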

2. Create the Deployment

cd /home/ubuntu

# Define the YAML file
vim open-webui-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: registry.cn-beijing.aliyuncs.com/dotbalo/open-webui:main
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "500m"
              memory: "500Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          env:
            - name: OLLAMA_BASE_URL
              value: "http://ollama-models-store:11434"
            - name: ENABLE_OPENAI_API
              value: "false"
            - name: HF_HUB_OFFLINE
              value: "1"
          tty: true
          volumeMounts:
            - name: webui-volume
              mountPath: /app/backend/data
      volumes:
        - name: webui-volume
          persistentVolumeClaim:
            claimName: open-webui-pvc

3. Create the Deployment and expose the service

# Create the Deployment
kubectl create -f open-webui-deploy.yaml 

# Expose the service
kubectl expose deploy open-webui-deployment --type NodePort 
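
kubectl expose generates the Service for you; the equivalent declarative manifest would look roughly like this (a sketch; without an explicit nodePort, K8s picks one from the 30000-32767 range):

apiVersion: v1
kind: Service
metadata:
  name: open-webui-deployment
spec:
  type: NodePort
  selector:
    app: open-webui
  ports:
    - port: 8080
      targetPort: 8080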

4. Check the service

root@VM-0-2-ubuntu:/home/ubuntu# kubectl get svc open-webui-deployment
NAME                    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
open-webui-deployment   NodePort   10.103.84.203   <none>        8080:31500/TCP   82s

5. Once the WebUI has started, it can be accessed at the node's IP address on port 31500.

VI. Deploying Large Models on K8s with Specific GPU Resources

When deploying large models with ollama on K8s, just as on a virtual machine, the model service uses all available GPU resources by default.

To control how GPU resources are scheduled, specify them in the resources field. For example, to limit this model to a single GPU card:

apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: deepseek-r1
spec:
  image: deepseek-r1:1.5b
  storageClassName: local-path
  replicas: 2
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 4
      memory: 8Gi
      nvidia.com/gpu: 1
    requests:
      cpu: 4
      memory: 8Gi
      nvidia.com/gpu: 1
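
Before applying this, confirm that the nodes actually advertise GPU capacity; the nvidia.com/gpu resource only exists once the NVIDIA device plugin is installed:

# Check schedulable GPU capacity on the nodes
kubectl describe nodes | grep -i "nvidia.com/gpu"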

VII. One-Click Scaling of a Model Service on K8s

1. To scale a model service on K8s, simply change the replica count of the Model resource:

...
...
  replicas: 2
...
...

The complete file:

vim deepseek-r1-1.5b.yaml

apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: deepseek-r1
  namespace: ollama-llms
spec:
  image: deepseek-r1:1.5b
  storageClassName: local-path
  replicas: 2
  imagePullPolicy: IfNotPresent

2. Update the K8s resource

kubectl replace -f deepseek-r1-1.5b.yaml
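
Alternatively, the replica count can be patched in place without editing the file; kubectl patch works against any custom resource:

kubectl patch models.ollama.ayaka.io deepseek-r1 -n ollama-llms --type merge -p '{"spec":{"replicas":2}}'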

3. Check the resources

kubectl get po | grep deepseek-r1