
Title: Load Balancing LLM Services with Nginx in Practice

Author: 链载Ai    Posted: 3 hours ago
Imagine a deployment with many inference-service instances. Without balancing, some nodes become hot spots while others sit idle, hurting both latency and throughput.

Distributing requests across nodes so that each one runs at a reasonable load is exactly what load balancing does.

1. Introduction

Nginx is an open-source, high-performance web server that is also widely used as a reverse proxy, load balancer, and HTTP cache. It was designed to overcome the performance bottlenecks that traditional servers such as Apache hit under high concurrency, and it has become one of the most popular web servers in the world.

Key characteristics: an event-driven, asynchronous architecture; a low memory footprint; and the ability to sustain very large numbers of concurrent connections.

2. Installing Nginx

Using Ubuntu as an example:

sudo apt install nginx

Edit the configuration file /etc/nginx/nginx.conf and merge the following into the http block:

http {
    upstream backend {
        least_conn;                # balancing algorithm
        server 127.0.0.1:8001;     # backend 1
        server 127.0.0.1:8002;     # backend 2
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }

    log_format upstream_log '$remote_addr - $remote_user [$time_local] "$request" '
                            '$status $body_bytes_sent "$http_referer" '
                            '"$http_user_agent" "$http_x_forwarded_for" '
                            'to: $upstream_addr';

    access_log /var/log/nginx/upstream.log upstream_log;
}

The complete configuration:

# cat /etc/nginx/nginx.conf
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    types_hash_max_size 2048;
    # server_tokens off;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # Logging Settings
    ##

    log_format upstream_log '$remote_addr:$remote_port $request_uri - $remote_user [$time_local] "$request" '
                            '$status $body_bytes_sent "$http_referer" '
                            '"$http_user_agent" "$http_x_forwarded_for" '
                            'to: $upstream_addr';

    access_log /var/log/nginx/access.log upstream_log;
    error_log /var/log/nginx/error.log;

    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    ##
    # Gzip Settings
    ##

    gzip on;
    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Upstream Servers
    ##

    upstream backend {
        least_conn;
        server 127.0.0.1:8001;     # backend 1
        server 127.0.0.1:8002;     # backend 2
    }

    ##
    # Server Blocks
    ##

    server {
        listen 80;
        server_name localhost;
        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            add_header X-Upstream $upstream_addr always;   # expose the backend address in the response headers
            #proxy_http_version 1.1;       # force HTTP/1.1 to the upstream
            #proxy_set_header Connection '';
        }
    }

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

#mail {
#	# See sample authentication script at:
#	# http://wiki.nginx.org/ImapAuthenticateWithApachePhpScript
#
#	# auth_http localhost/auth.php;
#	# pop3_capabilities "TOP" "USER";
#	# imap_capabilities "IMAP4rev1" "UIDPLUS";
#
#	server {
#		listen localhost:110;
#		protocol pop3;
#		proxy on;
#	}
#
#	server {
#		listen localhost:143;
#		protocol imap;
#		proxy on;
#	}
#}

Once the configuration is written, run nginx -t to verify it, then systemctl reload nginx to apply it.

3. Balancing Algorithms

In the configuration above, least_conn selects the balancing algorithm. Several algorithms are available:

# Weighted round robin (the default algorithm, with explicit weights)
upstream backend {
    server 127.0.0.1:8001 weight=3;   # ~60% of requests
    server 127.0.0.1:8002 weight=2;   # ~40% of requests
}

# Least connections: prefer the backend with the fewest active connections
upstream backend {
    least_conn;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

# IP hash: requests from the same client IP stick to the same backend
upstream backend {
    ip_hash;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

Consistent hashing, via a hash ring and virtual nodes, solves the data-migration and load-imbalance problems that plain hashing suffers from in a dynamic environment. Its core idea is to shrink the blast radius of node changes, while virtual nodes even out the key distribution; it is a key technique for high availability and scalability in distributed systems.

upstream backend {
    hash $request_uri consistent;   # distribute by request URI
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}
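The ring-and-virtual-nodes mechanism described above can be sketched in a few lines of Python. This is not nginx's actual implementation (nginx uses its own hash internally); the node names and vnode count are illustrative:

```python
import hashlib
from bisect import bisect

def ring_pos(key: str) -> int:
    # position of a key on a 32-bit hash ring
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (1 << 32)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # each physical node contributes `vnodes` virtual points on the ring
        self.ring = sorted((ring_pos(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node(self, key):
        # first virtual point clockwise from the key's position (wrapping around)
        i = bisect(self.points, ring_pos(key)) % len(self.points)
        return self.ring[i][1]

ring = HashRing(["127.0.0.1:8001", "127.0.0.1:8002"])
# the same URI always maps to the same backend
assert ring.node("/v1/chat") == ring.node("/v1/chat")
```

Adding a third backend only remaps the keys that now land on its virtual points; every other key keeps its old backend, which is exactly the reduced blast radius the paragraph above describes.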

Any of these algorithms can be combined with per-server weights and health parameters:

server 127.0.0.1:8001 weight=5 max_fails=3 fail_timeout=30s;
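Nginx's weighted round robin is a "smooth" variant: rather than sending a burst of requests to the heavier server, it interleaves picks so each server receives its share evenly over time. A small sketch of that idea (the function name is illustrative):

```python
def smooth_wrr(servers, n):
    """Smooth weighted round robin: servers maps name -> weight;
    returns the sequence of n picks."""
    current = {name: 0 for name in servers}
    total = sum(servers.values())
    picks = []
    for _ in range(n):
        # every server's current weight grows by its configured weight...
        for name, weight in servers.items():
            current[name] += weight
        # ...the largest one wins and pays back the total weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

picks = smooth_wrr({"127.0.0.1:8001": 3, "127.0.0.1:8002": 2}, 5)
# over one full cycle of 5 picks, 8001 is chosen 3 times and 8002 twice
```

Note that the picks alternate rather than producing three 8001s in a row, which keeps short-term load even.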

4. Testing the Load Balancer

Start two HTTP services (on ports 8001 and 8002) with the Python script server.py:

# server.py
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading


class DebugRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Print client information
        client_ip, client_port = self.client_address
        print(f"\n--- New request @ {self.server.server_port} ---")
        print(f"[client] {client_ip}:{client_port}")
        print(f"[method] {self.command}")
        print(f"[path] {self.path}")
        print(f"[headers] {self.headers}")

        # Send the response
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write(f"Response from: {self.server.server_port}".encode())

    def do_POST(self):
        # Handle POST requests
        content_length = int(self.headers["Content-Length"])
        post_data = self.rfile.read(content_length).decode("utf-8")

        print(f"\n--- POST request @ {self.server.server_port} ---")
        print(f"[data] {post_data}")
        self.do_GET()  # reuse the GET response logic


def run(server_class=HTTPServer, handler_class=DebugRequestHandler, port=8001):
    server_address = ("0.0.0.0", port)  # bind to all interfaces
    httpd = server_class(server_address, handler_class)
    print(f"Serving on 0.0.0.0:{port}...")
    httpd.serve_forever()


if __name__ == "__main__":
    # Start two service instances (on ports 8001 and 8002)
    threading.Thread(target=run, kwargs={"port": 8001}).start()
    threading.Thread(target=run, kwargs={"port": 8002}).start()

python3 server.py
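To see the balancing from the client side, a small script can send requests through nginx and count which backend answered, using the response body server.py returns (the port number after the final colon). The helper names are illustrative, and the live call at the bottom needs nginx plus both backends running:

```python
from collections import Counter
from urllib.request import urlopen

def tally(bodies):
    """Count responses per backend; each body ends with the backend's port."""
    return Counter(b.rsplit(":", 1)[-1].strip() for b in bodies)

def probe(url="http://localhost/", n=20):
    """Send n requests through the nginx front end and tally the backends."""
    return tally(urlopen(url).read().decode() for _ in range(n))

# counts = probe()   # requires nginx and both server.py instances running
```

With least_conn and two equally fast backends, the tally should come out close to even.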

Stress testing:

apt install apache2-utils
ab -n 1000 -c 10 http://localhost/

ab is short for Apache Bench, a performance-testing tool from the Apache HTTP Server project, used to generate load against a web server.

Breaking down the command

  1. **ab**: the tool itself (Apache Bench).
  2. **-n 1000**: 1000 requests in total (-n = number of requests).
  3. **-c 10**: 10 concurrent clients (-c = concurrency, the number of requests in flight at once).
  4. **http://localhost/**: the target URL (a local service or a remote one).

This simulates 10 concurrent users hitting http://localhost/, each sending requests back-to-back until 1000 requests have been issued in total, which exercises the throughput and latency of the proxied backends.

Interpreting the output

A sample run produces output like this:

This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software: nginx/1.18.0
Server Hostname: localhost
Server Port: 80

Document Path: /
Document Length: 18 bytes

Concurrency Level: 10
Time taken for tests: 1.201 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 178000 bytes
HTML transferred: 18000 bytes
Requests per second: 832.30 [#/sec] (mean)
Time per request: 12.015 [ms] (mean)
Time per request: 1.201 [ms] (mean, across all concurrent requests)
Transfer rate: 144.68 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 2 12 1.9 12 26
Waiting: 1 12 1.9 12 25
Total: 2 12 1.9 12 26

Percentage of the requests served within a certain time (ms)
50% 12
66% 12
75% 13
80% 13
90% 14
95% 15
98% 16
99% 18
100% 26 (longest request)

Key metrics

  1. Requests per second: 832.30 [#/sec]
     The server handled about 832 requests per second (higher throughput means better performance).

  2. Time per request
     12.015 ms is the mean latency seen by a single concurrent client; 1.201 ms is the mean across all concurrent requests (total test time divided by total requests).







Welcome to 链载Ai (https://www.lianzai.com/) Powered by Discuz! X3.5