Ollama Service Monitoring Program

Background

deepseek-r1 has become hugely popular recently. One business system needed to integrate deepseek-r1 locally: the customer required a private deployment and happened to have an idle 3090 card, so I used Ollama to pull the deepseek-r1:32b model for them. Everything went smoothly at first, but after running for a while the service would hang for no apparent reason. I tried many approaches and still could not find the cause, so I wrote the monitoring program below. The principle is simple: call the Ollama service API on a schedule, and if the request times out, restart the Ollama service.

Code Implementation

The monitor is written in Python. Create a file named ollama_monitor.py.

import requests
import time
import subprocess
import psutil
import logging
import os
from datetime import datetime
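OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "localhost:11434")
try:
    # Extract the port number; rsplit also copes with values such as "http://host:port".
    port = OLLAMA_HOST.rsplit(":", 1)[1]
except IndexError:
    logging.error("OLLAMA_HOST is malformed; expected 'host:port'")
    port = "11434"  # default port
OLLAMA_API_URL = f"http://localhost:{port}/api/tags"

TIMEOUT_SECONDS = 10
# Command used to bring the service back up after the hung process is terminated.
# "ollama ps" presumably works because the CLI relaunches the service when it is
# not running; if it does not in your setup, "ollama serve" is an alternative.
RESTART_COMMAND = "ollama ps"
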
def setup_logging():
    """Configure logging to write to both a file and the console."""
    log_dir = "logs"
    os.makedirs(log_dir, exist_ok=True)

    current_time = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    log_file_path = os.path.join(log_dir, f"ollama_monitor_{current_time}.log")

    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

    # Create the file handler
    file_handler = logging.FileHandler(log_file_path)
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(formatter)

    # Create the console handler
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)

    # Attach both handlers to the root logger. force=True (Python 3.8+) replaces
    # any handler that an earlier logging call installed implicitly.
    logging.basicConfig(level=logging.INFO, handlers=[file_handler, console_handler], force=True)

def check_ollama_status():
    """Check the Ollama status. Return True if it appears to be stuck, otherwise False."""
    try:
        response = requests.get(OLLAMA_API_URL, timeout=TIMEOUT_SECONDS)
        response.raise_for_status()
        return False
    except requests.exceptions.RequestException as e:
        logging.error(f"Ollama may be stuck: {e}")
        return True

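def restart_ollama():
    """Restart the Ollama service."""
    logging.info("Restarting the Ollama service...")
    try:
        # Terminate every running ollama.exe process (the name is Windows-specific).
        procs = [p for p in psutil.process_iter(['pid', 'name']) if p.info['name'] == 'ollama.exe']
        for proc in procs:
            proc.terminate()
        # Give the processes a few seconds to exit before starting a new one.
        psutil.wait_procs(procs, timeout=5)

        subprocess.Popen(RESTART_COMMAND, shell=True)
        logging.info("Ollama service restarted.")
    except Exception as e:
        logging.error(f"Failed to restart the Ollama service: {e}")
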
if __name__ == "__main__":
    setup_logging()
    while True:
        if check_ollama_status():
            restart_ollama()
        sleep_time = int(os.environ.get("OLLAMA_MONITOR_INTERVAL", 60))
        time.sleep(sleep_time)

The script probes http://localhost:{port}/api/tags, the endpoint that returns the model list. I tried the ps endpoint instead, but it had no effect. The script also reads the environment variable OLLAMA_MONITOR_INTERVAL as the monitoring interval, which defaults to 60 seconds.

Installing Dependencies

requirements.txt

requests
psutil
pyinstaller

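Run pip install -r requirements.txt

Packaging the Program

In the program directory, run:

pyinstaller --onefile ollama_monitor.py

This produces dist/ollama_monitor.exe under the project root. Copy that file to the target machine and run it there.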
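
Returning to the monitoring loop in ollama_monitor.py: the script restarts Ollama on the first failed request. A single slow response, for example while a large model is still loading, could therefore trigger a restart. One possible variation, not part of the original program, is to restart only after several consecutive failures. The sketch below is a minimal illustration under that assumption; the names monitor, max_failures and max_rounds are hypothetical, and the check and restart functions are passed in so the loop can be tried without a running Ollama instance.

```python
import time
from typing import Callable, Optional


def monitor(
    is_stuck: Callable[[], bool],
    restart: Callable[[], None],
    max_failures: int = 3,
    interval: float = 60.0,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll is_stuck() and call restart() after max_failures consecutive failures.

    max_rounds limits the number of polls (None means run forever).
    Returns the number of restarts performed.
    """
    failures = 0
    restarts = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if is_stuck():
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0  # start a fresh streak after restarting
        else:
            failures = 0  # any healthy response resets the streak
        sleep(interval)
    return restarts
```

In the script above, this would be called as monitor(check_ollama_status, restart_ollama) in place of the while True loop. The trade-off is a slower reaction: with the default 60-second interval and max_failures=3, a real hang is only handled after about three minutes.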