
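In depth: byzer-llm, a Ray-based solution covering the full lifecycle of large language models (LLMs), offering simple, efficient, low-cost pretraining, fine-tuning, and serving for everyone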

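Project home: https://github.com/allwefantasy/byzer-llm

Manual: https://github.com/allwefantasy/byzer-llm/blob/master/docs/zh/001_%E4%B8%80%E4%B8%AA%E5%8A%AA%E5%8A%9B%E6%88%90%E4%B8%BA%E5%A4%A7%E6%A8%A1%E5%9E%8B%E7%BC%96%E7%A8%8B%E6%8E%A5%E5%8F%A3%E7%9A%84%E7%A5%9E%E5%A5%87Python%E5%BA%93.md

Byzer-LLM is built on Ray and covers the full lifecycle of a large language model (LLM): pretraining, fine-tuning, deployment, and inference serving.

What sets Byzer-LLM apart:

  1. Full lifecycle management: pretraining, fine-tuning, deployment, and inference serving are all supported
  2. Both Python and SQL APIs are available
  3. It is designed on top of Ray, so it scales out easily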

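Glossary

SaaS model: an LLM/AI capability packaged as a cloud service and offered to companies or individuals by subscription or per-call billing. In other words, this is what people usually mean by calling a hosted AI model.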

Installation and usage

Install directly with pip

pip install byzer-llm

Start the Ray service

ray start --head

The service starts and prints:

Local node IP: 192.168.0.95
/home/skywalk/minipy312/lib/python3.12/site-packages/ray/thirdparty_files/psutil/__init__.py:2017: RuntimeWarning: shared, active, inactive memory stats couldn't be determined and were set to 0
  ret = _psplatform.virtual_memory()
--------------------
Ray runtime started.
--------------------
Next steps
  To add another node to this Ray cluster, run
    ray start --address='192.168.0.95:6379'

  To connect to this Ray cluster:
    import ray
    ray.init()

  To submit a Ray job using the Ray Jobs CLI:
    RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py

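Installing Auto-Coder to get byzer-llm

According to the manual, installing byzer-llm directly with pip leaves a fairly tedious configuration step afterwards. Installing Auto-Coder instead sets everything up automatically. Run the following together: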

pip install pip -U
pip install -U auto-coder
ray start --head

Output:

ray start --head
Enable usage stats collection? This prompt will auto-proceed in 10 seconds to avoid blocking cluster startup. Confirm [Y/n]:
Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
Local node IP: 172.25.183.186
--------------------
Ray runtime started.
--------------------
Next steps
  To add another node to this Ray cluster, run
    ray start --address='172.25.183.186:6379'

  To connect to this Ray cluster:
    import ray
    ray.init()

  To submit a Ray job using the Ray Jobs CLI:
    RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py

  See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
  for more information on submitting Ray jobs to the Ray cluster.

  To terminate the Ray runtime, run
    ray stop

  To view the status of the cluster, use
    ray status

  To monitor and debug Ray, view the dashboard at
    127.0.0.1:8265

  If connection to the dashboard fails, check your firewall settings and network configuration.

Now things get interesting: let's put together a Ray cluster.

Joining Ray

A Ray head node was started earlier on 192.168.1.5; here another machine joins it.

ray start --address='192.168.1.5:6379'

Joined:

ray status
======== Autoscaler status: 2025-11-01 19:16:47.050543 ========
Node status
---------------------------------------------------------------
Active:
 1 node_dfa56c248840a471c7b0be2e7dbeb3fb28041a6010a6887cabf07c79
 1 node_f32f6c64e9df213e0c403f833f54d28a0493fe8d800284f46e9f878c
Pending:
 (no pending nodes)
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Total Usage:
 0.0/28.0 CPU
 0.0/1.0 GPU
 0B/23.64GiB memory
 0B/10.13GiB object_store_memory
Total Constraints:
 (no request_resources() constraints)
Total Demands:
 (no resource demands)
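To confirm from a script that the second machine really joined, the status text can be checked for two active nodes. The following is only a minimal sketch: it assumes the status has been captured as text with one item per line (as `ray status` prints it in a terminal), the node IDs in the sample are shortened placeholders, and the helper names `count_active_nodes` and `parse_cpu_gpu` are made up for illustration.

```python
import re

# Sample text in the shape of `ray status` output.
# The node IDs are shortened placeholders, not real IDs.
SAMPLE_STATUS = """\
Node status
---------------------------------------------------------------
Active:
 1 node_aaaa
 1 node_bbbb
Pending:
 (no pending nodes)
Resources
---------------------------------------------------------------
Total Usage:
 0.0/28.0 CPU
 0.0/1.0 GPU
 0B/23.64GiB memory
"""


def count_active_nodes(status_text: str) -> int:
    """Sum the counts on the 'N node_<id>' lines listed under 'Active:'."""
    count = 0
    in_active = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Active:"):
            in_active = True
            continue
        if in_active:
            match = re.match(r"(\d+)\s+node_", stripped)
            if match:
                count += int(match.group(1))
            else:
                # First non-node line ends the Active section.
                in_active = False
    return count


def parse_cpu_gpu(status_text: str) -> dict:
    """Return {'CPU': (used, total), 'GPU': (used, total)} from 'used/total NAME' lines."""
    result = {}
    pattern = r"^[ \t]*([\d.]+)/([\d.]+)[ \t]+(CPU|GPU)[ \t]*$"
    for match in re.finditer(pattern, status_text, re.MULTILINE):
        result[match.group(3)] = (float(match.group(1)), float(match.group(2)))
    return result


if __name__ == "__main__":
    print("active nodes:", count_active_nodes(SAMPLE_STATUS))
    print("resources:", parse_cpu_gpu(SAMPLE_STATUS))
```

With the real output, two active nodes and a CPU total of 28.0 would indicate that both machines are contributing resources.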

Applications

Using byzer-llm as an LLM relay

I am skipping the details for now, since I am not yet familiar with this part.

The command:

byzerllm deploy --pretrained_model_type saas/openai \
--cpus_per_worker 0.001 \
--gpus_per_worker 0 \
--num_workers 3 \
--infer_params  saas.api_key=${MODEL_OPENAI_TOKEN} saas.model=gpt-3.5-turbo-0125 \
--model gpt3_5_chat

Using byzer-llm to launch a local LLM

This is recorded in a separate document.

Debugging

Error: server_ Failed to start the grpc server

  File "/home/skywalk/minipy312/lib/python3.12/site-packages/ray/_private/node.py", line 796, in _init_gcs_client
    raise RuntimeError(
RuntimeError: Failed to start GCS.  Last 1 lines of error files:
[2025-05-28 19:20:11,803 C 54659 54659] (gcs_server) grpc_server.cc:128:  Check failed: server_ Failed to start the grpc server. The specified port is 6379. This means that Ray's core components will not be able to function correctly. If the server startup error message is `Address already in use`, it indicates the server fails to start because the port is already used by other processes (such as --node-manager-port, --object-manager-port, --gcs-server-port, and ports between --min-worker-port, --max-worker-port). Try running sudo lsof -i :6379 to check if there are other processes listening to the port.

.Please check /tmp/ray/session_2025-05-28_19-20-11_612410_54651/logs/gcs_server.out for details. Last connection error: None
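The log suggests `sudo lsof -i :6379` to see whether something already listens on the port. As a rough alternative that needs no extra tools, the sketch below tries to bind the port from Python. The helper name `port_is_free` and the default host `127.0.0.1` are assumptions made for this example, and a successful bind only shows the port is free at that instant.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding (host, port) succeeds, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use" (or a permission problem).
            return False
        return True


if __name__ == "__main__":
    # 6379 is the GCS port named in the error message above.
    print(6379, "free" if port_is_free(6379) else "in use")
```

If the port is reported as in use, stopping the leftover process (for example with `ray stop`, which the Ray startup message mentions) is usually the next step.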

Error: Session name session_2025-05-29_09-05-56_429060_59678 does not match persisted value

AssertionError: Session name session_2025-05-29_09-05-56_429060_59678 does not match persisted value b'session_2025-05-28_19-39-24_560590_55048'. Perhaps there was an error connecting to Redis.

Clear /tmp/ray/:

rm -rf /tmp/ray/

The error persists, so kill the Ray processes:

# Kill all Ray processes (note: this matches any process whose command line contains "ray")
ps aux | grep ray | grep -v grep | awk '{print $2}' | xargs kill -9
# Kill any leftover redis (older Ray releases bundled their own redis-server)
ps aux | grep redis | grep -v grep | awk '{print $2}' | xargs kill -9

Still failing.

Install redis:

sudo pkg install redis

To setup "redis" you need to edit the configuration file:
      /usr/local/etc/redis.conf

      To run redis from startup, add redis_enable="YES"
      in your /etc/rc.conf.
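After starting redis, the error is still there.

Setting this aside for now.

In bash on FreeBSD, with Python 3.12 installed through the Linux compatibility layer, Auto-Coder installs fine but Ray will not start.

Startup error: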

  File "/home/skywalk/minipy312/lib/python3.12/site-packages/ray/_private/node.py", line 364, in __init__
    self.start_head_processes()
  File "/home/skywalk/minipy312/lib/python3.12/site-packages/ray/_private/node.py", line 1458, in start_head_processes
    self.start_gcs_server()
  File "/home/skywalk/minipy312/lib/python3.12/site-packages/ray/_private/node.py", line 1225, in start_gcs_server
    process_info = ray._private.services.start_gcs_server(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/skywalk/minipy312/lib/python3.12/site-packages/ray/_private/services.py", line 1515, in start_gcs_server
    stdout_file = open(os.devnull, "w")
                  ^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/dev/null'

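I checked, and /dev/null is writable, yet the error remains and I found no way around it.

So switch into the Linux compatibility environment instead:

sudo chroot /compat/ubuntu22/ /bin/bash

Then install: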

pip install pip -U
pip install -U auto-coder
ray start --head

Error when starting Ray: Ray component worker_ports is trying to use a port number 12868 that is used by other components.

Start command:

ray start --head

    raise ValueError(
ValueError: Ray component worker_ports is trying to use a port number 12868 that is used by other components.
Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 6379, 'client_server': 10001, 'dashboard': 8265, 'dashboard_agent_grpc': 12868, 'dashboard_agent_http': 52365, 'runtime_env_agent': 33302, 'metrics_export': 63589, 'redis_shards': 'random', 'worker_ports': '9998 ports from 10002 to 19999'}
If you allocate ports, please make sure the same port is not used by multiple components.
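The port information in this error can be read as a simple range check: a component with a fixed port must not fall inside the worker port range. The sketch below reproduces that check on the integer-valued entries copied from the message above. It is an illustration only, not Ray's own validation code, and the function name `find_worker_port_conflicts` is hypothetical.

```python
from typing import Dict, Tuple

# Integer-valued entries copied from the "Port information" line above.
# Entries reported as 'random' are left out of this sketch.
PORT_INFO: Dict[str, int] = {
    "gcs_server": 6379,
    "client_server": 10001,
    "dashboard": 8265,
    "dashboard_agent_grpc": 12868,
    "dashboard_agent_http": 52365,
    "runtime_env_agent": 33302,
    "metrics_export": 63589,
}

# "9998 ports from 10002 to 19999" in the error message.
WORKER_PORT_RANGE: Tuple[int, int] = (10002, 19999)


def find_worker_port_conflicts(
    port_info: Dict[str, int], worker_range: Tuple[int, int]
) -> Dict[str, int]:
    """Return the components whose port lies inside the inclusive worker range."""
    low, high = worker_range
    return {
        name: port
        for name, port in port_info.items()
        if low <= port <= high
    }


if __name__ == "__main__":
    print(find_worker_port_conflicts(PORT_INFO, WORKER_PORT_RANGE))
```

On the values from the error, only dashboard_agent_grpc (12868) lands inside 10002 to 19999. The earlier grpc error text also mentions --min-worker-port and --max-worker-port, which suggests the worker range can be moved if the collision keeps recurring.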

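Not a serious problem. My first guess was that the port was still held by the earlier process. Note, though, that the port information above lists dashboard_agent_grpc at 12868, which lies inside the worker_ports range of 10002 to 19999.

Run ray start --head once more and, sure enough, it starts: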

ray start --head
Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
Local node IP: 192.168.1.5
/home/skywalk/py312/lib/python3.12/site-packages/ray/thirdparty_files/psutil/__init__.py:2017: RuntimeWarning: shared, active, inactive memory stats couldn't be determined and were set to 0
  ret = _psplatform.virtual_memory()
--------------------
Ray runtime started.
--------------------
Next steps
  To add another node to this Ray cluster, run
    ray start --address='192.168.1.5:6379'

  To connect to this Ray cluster:
    import ray
    ray.init()

  To submit a Ray job using the Ray Jobs CLI:
    RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py

  See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
  for more information on submitting Ray jobs to the Ray cluster.

  To terminate the Ray runtime, run
    ray stop

  To view the status of the cluster, use
    ray status

  To monitor and debug Ray, view the dashboard at
    127.0.0.1:8265

  If connection to the dashboard fails, check your firewall settings and network configuration.

Error when joining Ray: RuntimeError: Version mismatch

ray start --address='192.168.1.5:6379'

  File "/home/skywalk/py312/lib/python3.12/site-packages/ray/scripts/scripts.py", line 1164, in start
    node.check_version_info()
  File "/home/skywalk/py312/lib/python3.12/site-packages/ray/_private/node.py", line 454, in check_version_info
    ray._private.utils.check_version_info(
  File "/home/skywalk/py312/lib/python3.12/site-packages/ray/_private/utils.py", line 1569, in check_version_info
    raise RuntimeError(error_message)
RuntimeError: Version mismatch: The cluster was started with:
    Ray: 2.47.1
    Python: 3.12.9
This process on node 172.25.183.186 was started with:
    Ray: 2.47.1
    Python: 3.12.3
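As the message shows, the join fails even though both sides run Ray 2.47.1, because the Python patch versions differ (3.12.9 versus 3.12.3). A small pre-flight comparison like the one below can catch this before running ray start on the new node. It is a sketch under assumptions: it is not Ray's own check_version_info, the names `NodeVersions`, `describe_mismatch`, and `local_python_version` are invented for the example, and the cluster's versions have to be supplied by hand.

```python
import sys
from typing import List, NamedTuple


class NodeVersions(NamedTuple):
    """Ray and Python versions of one node, as plain strings."""
    ray: str
    python: str


def local_python_version() -> str:
    """Return the running interpreter's version as 'major.minor.patch'."""
    return ".".join(str(part) for part in sys.version_info[:3])


def describe_mismatch(cluster: NodeVersions, local: NodeVersions) -> List[str]:
    """List every field that differs; an empty list means the versions agree."""
    problems = []
    if cluster.ray != local.ray:
        problems.append(f"Ray: cluster {cluster.ray}, local {local.ray}")
    if cluster.python != local.python:
        problems.append(f"Python: cluster {cluster.python}, local {local.python}")
    return problems


if __name__ == "__main__":
    # Values taken from the error message above.
    cluster = NodeVersions(ray="2.47.1", python="3.12.9")
    local = NodeVersions(ray="2.47.1", python="3.12.3")
    for line in describe_mismatch(cluster, local):
        print(line)
    print("this interpreter:", local_python_version())
```

Comparing full version strings, patch level included, mirrors what the error above reports.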

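Upgrade the local Python from 3.12.3 to 3.12.9 (strictly speaking, this is a reinstall). First install pyenv:

curl https://pyenv.run | bash

Install Python 3.12.9:

pyenv install 3.12.9

Install Auto-Coder:

pip install pip -U
pip install -U auto-coder

Join Ray:

ray start --address='192.168.1.5:6379'

Joined successfully. Check the status: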

ray status
======== Autoscaler status: 2025-11-01 19:16:47.050543 ========
Node status
---------------------------------------------------------------
Active:
 1 node_dfa56c248840a471c7b0be2e7dbeb3fb28041a6010a6887cabf07c79
 1 node_f32f6c64e9df213e0c403f833f54d28a0493fe8d800284f46e9f878c
Pending:
 (no pending nodes)
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Total Usage:
 0.0/28.0 CPU
 0.0/1.0 GPU
 0B/23.64GiB memory
 0B/10.13GiB object_store_memory
Total Constraints:
 (no request_resources() constraints)
Total Demands:
 (no resource demands)
