I have successfully deployed the full-size DeepSeek-R1-671B across four servers. A brief summary follows. I am now available for consulting, guidance, or deployment engagements. The deployment procedure is still being refined, and readers are welcome to learn along with me. The sections below show the deployment after it came up successfully.

Full-size DeepSeek-R1-671B in action

Ray cluster status

[Screenshot: Ray cluster status]

Production Metrics

(self-llm) deepseek@deepseek2:~$ curl http://10.119.85.138:8000/metrics
...
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 37427.0
python_gc_objects_collected_total{generation="1"} 14232.0
python_gc_objects_collected_total{generation="2"} 16818.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 3033.0
python_gc_collections_total{generation="1"} 267.0
python_gc_collections_total{generation="2"} 315.0
...

OpenAI API endpoint test

# 10.119.85.138 is the IP address of the IB NIC on the deepseek2 node
(self-llm) deepseek@deepseek2:~$ curl 10.119.85.138:8000/v1/models -H "Authorization: Bearer zY0MrQwXV9Oo3g==" | jq
# Output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   523  100   523    0     0   105k      0 --:--:-- --:--:-- --:--:--  127k
{
  "object": "list",
  "data": [
    {
      "id": "DeepSeek-R1-671B",
      "object": "model",
      "created": 1740405511,
      "owned_by": "vllm",
      "root": "/root/.cache/huggingface/hub/models/unsloth/DeepSeek-R1-BF16/",
      "parent": null,
      "max_model_len": 32768,
      "permission": [
        {
          "id": "modelperm-ced685e8156b4618b593580109205165",
          "object": "model_permission",
          "created": 1740405511,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
Meanwhile, the window running the vllm serve command shows the following output:

[Screenshot: vllm serve output]

Service functionality check

# Prompt: "你好" ("Hello")
(self-llm) deepseek@deepseek2:~$ curl -X POST "http://10.119.85.138:8000/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer zY0MrQwXV9Oo3g==" -d '{ "model": "DeepSeek-R1-671B", "messages": [{"role": "user", "content": "你好"}]}'

# Prompt: "请证明勾股定理" ("Please prove the Pythagorean theorem")
(self-llm) deepseek@deepseek2:~$ curl -X POST "http://10.119.85.138:8000/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer zY0MrQwXV9Oo3g==" -d '{ "model": "DeepSeek-R1-671B", "messages": [{"role": "user", "content": "请证明勾股定理"}]}'
# Response (truncated; the model starts by reasoning, in Chinese, about a proof that arranges four right triangles into a square)
{"id":"chatcmpl-11ae1ddf321343af848b5c683e67b72d","object":"chat.completion","created":1740411348,"model":"deepseek-r1","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"<think>\n嗯,用户让我证明勾股定理。勾股定理是数学里非常基础但又重要的定理,肯定有很多不同的证明方法。先回忆一下,勾股定理是说在直角三角形中,斜边的平方等于两条直角边的平方和,也就是a² + b² = c²。现在得选一种合适的证明方式,可能是几何的或者代数的。\n\n首先想到的是几何证明中的拼接法,通过将四个直角三角形拼成一个大的正方形,然后比较面积。我要不要试试这个方法?比如说,四个全等的直角三角形,设它们的直角边为a和b,斜边为c,拼起来的话中间应该会形成一个边长为(a+b)的正方形,中间的空隙可能是小正方形,边长是c或者别的?\n\n不对,应该会形成一个边长为c的正方形,或者这个?等一下,可能得仔细画个图想象一下。假设我们把四个三角形每个的直角边朝外,那斜边就会组成里面的一个正方形,这时候 ...

# While the question is being answered, the vllm serve window prints lines like the following, which report the average token generation throughput
INFO 02-24 17:21:12 metrics.py:455] Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:17 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 37.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:22 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
...
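To track throughput over time rather than eyeballing the console, the periodic `metrics.py` lines can be parsed. The sketch below is a rough illustration, not part of the deployment: the regular expression is written against the log format in the excerpt above (which may differ in other vLLM versions), and the names `LOG_PATTERN`, `parse_throughput`, and `mean_generation` are invented for this example.

```python
import re

# Pattern matched against the log lines in the excerpt above; an assumption
# about the format, which may change between vLLM releases.
LOG_PATTERN = re.compile(
    r"Avg prompt throughput: (?P<prompt>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs"
)

SAMPLE_LOG = """\
INFO 02-24 17:21:12 metrics.py:455] Avg prompt throughput: 1.6 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:17 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 37.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 02-24 17:21:22 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 36.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%.
"""


def parse_throughput(log_text: str) -> list[dict]:
    """Extract prompt/generation throughput and running request count per line."""
    rows = []
    for match in LOG_PATTERN.finditer(log_text):
        rows.append(
            {
                "prompt": float(match["prompt"]),
                "generation": float(match["gen"]),
                "running": int(match["running"]),
            }
        )
    return rows


def mean_generation(rows: list[dict]) -> float:
    """Average generation throughput (tokens/s) over the parsed lines."""
    return sum(r["generation"] for r in rows) / len(rows)


if __name__ == "__main__":
    parsed = parse_throughput(SAMPLE_LOG)
    print(f"{len(parsed)} samples, mean generation throughput "
          f"{mean_generation(parsed):.1f} tokens/s")
```

Lines that do not match the pattern, such as the `async_llm_engine.py` "Finished request" messages, are simply skipped.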
# Even higher throughput with several concurrent requests
INFO 02-24 23:32:00 metrics.py:455] Avg prompt throughput: 442.9 tokens/s, Avg generation throughput: 38.8 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:05 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 102.4 tokens/s, Running: 3 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 02-24 23:32:07 async_llm_engine.py:179] Finished request chatcmpl-03add50cba264c84afe98fd6cce9907f.
INFO 02-24 23:32:10 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 79.4 tokens/s, Running: 2 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.

# apt install nvtop
(self-llm) deepseek@deepseek1:~/installPkgs$ nvtop
# The output of the `nvtop` command is shown below

[Screenshot: nvtop output]

open-webui chat interface

# 10.119.85.138 is the IP address of the IB NIC on the deepseek2 node
(self-llm) deepseek@deepseek2:~$ curl http://10.119.85.138:18080
# Alternatively, open the address above directly in a browser. The first user to register becomes the administrator by default. After registering, log in and ask a question.

II. Hardware and Software Used in the Successful Deployment

Server information

Notes:
(1) The 10 GbE NICs in these servers were not used during deployment.
(2) Details of the NVIDIA A800 GPUs are as follows:
deepseek@deepseek1:~$ nvidia-smi
Fri Feb 21 09:25:35 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A800-SXM4-80GB          On  |   00000000:3D:00.0 Off |                    0 |
| N/A   33C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A800-SXM4-80GB          On  |   00000000:42:00.0 Off |                    0 |
| N/A   29C    P0             58W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A800-SXM4-80GB          On  |   00000000:61:00.0 Off |                    0 |
| N/A   30C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A800-SXM4-80GB          On  |   00000000:67:00.0 Off |                    0 |
| N/A   33C    P0             64W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A800-SXM4-80GB          On  |   00000000:AD:00.0 Off |                    0 |
| N/A   32C    P0             57W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A800-SXM4-80GB          On  |   00000000:B1:00.0 Off |                    0 |
| N/A   29C    P0             61W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A800-SXM4-80GB          On  |   00000000:D0:00.0 Off |                    0 |
| N/A   30C    P0             62W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A800-SXM4-80GB          On  |   00000000:D3:00.0 Off |                    0 |
| N/A   32C    P0             60W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Software information

Physical server operating system: Ubuntu 22.04.4 LTS-x86_64
Nvidia driver version: 550.90.07
CUDA runtime version: 12.1.105 (inside the node container), V12.4.99 (on the physical server)
nvidia-fabricmanager version: 550.90.07
nvlink: 3.0
nvswitch: 2.0
PyTorch version: 2.5.1+cu124
CUDA used to build PyTorch: 12.4
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version: version 3.31.4
Libc version: glibc-2.35
Python version: 3.12.9 (main, Feb 5 2025, 08:49:00) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA_MODULE_LOADING set to: LAZY
Is XNNPACK available: True
CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz, 112 cores
numpy==1.26.4
torch==2.5.1
torchaudio==2.5.1
torchvision==0.20.1
triton==3.1.0
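A back-of-the-envelope calculation helps explain why four servers are involved. The sketch below estimates the BF16 weight footprint against the aggregate GPU memory. It rests on several assumptions that the article does not state outright: the parameter count is taken as 671 billion from the model name, every one of the four servers is assumed to carry the same eight 81920 MiB GPUs that `nvidia-smi` reports on deepseek1, and KV cache, activations, and framework overhead are ignored.

```python
# Rough capacity estimate; all constants are assumptions derived from the
# model name and the nvidia-smi output above, not measured values.
PARAMS = 671e9               # parameter count implied by "671B"
BYTES_PER_PARAM_BF16 = 2     # BF16 stores each parameter in 2 bytes
NODES = 4                    # "four servers" from the introduction
GPUS_PER_NODE = 8            # assumed identical to deepseek1
GPU_MEM_MIB = 81920          # per-GPU memory reported by nvidia-smi


def weights_gib(params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB."""
    return params * bytes_per_param / 2**30


def cluster_gib(nodes: int, gpus_per_node: int, gpu_mem_mib: int) -> float:
    """Aggregate GPU memory across the cluster in GiB."""
    return nodes * gpus_per_node * gpu_mem_mib / 1024


if __name__ == "__main__":
    w = weights_gib(PARAMS, BYTES_PER_PARAM_BF16)
    c = cluster_gib(NODES, GPUS_PER_NODE, GPU_MEM_MIB)
    print(f"weights: {w:.0f} GiB, cluster GPU memory: {c:.0f} GiB, "
          f"share used by weights: {w / c:.0%}")
```

Under these assumptions the weights alone exceed what two such servers could hold (1280 GiB leaves almost nothing for KV cache), which is consistent with spreading the model over more nodes.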