
Rust Microservice Performance Optimization: A Field Record of Going from 500ms to 50ms

news 2026/5/3 19:31:37

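Background: A Requirement Born of Slowness

Last month I took over an order query service written in Go. It handled roughly 2,000 QPS with a P99 latency above 500ms, and the business side kept asking: "Can it be any faster?"

I made a bold decision: rewrite it in Rust.

The result? P99 latency dropped to 50ms, QPS rose past 15,000, and memory usage fell from roughly 2GB to roughly 200MB.

In this post I want to retrace the whole optimization process. No bragging, just the practical details and the pitfalls I ran into.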

Performance Baseline: Measure First

Before optimizing anything, I spent half a day profiling with pprof + flamegraph:

# Profile the Go version (URL quoted so the shell does not interpret the "?")
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=30"

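The flame graph revealed three bottlenecks:

JSON serialization: 35% of CPU time (using encoding/json)
Database connections: the pool was misconfigured, so connections were constantly created and destroyed
Memory allocation: about 150KB allocated per request on average, putting heavy pressure on the GC

With a baseline in hand, the optimization work has a direction.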
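Since every figure in this post is quoted as a P50 or P99, it may help to spell out how a percentile is read off raw latency samples. The sketch below uses the nearest-rank definition, which is one common choice and an assumption here: the original does not say how its percentiles were computed. The function name `percentile` is hypothetical.

```rust
/// Nearest-rank percentile over latency samples (e.g. milliseconds).
/// `p` must be in (0, 100]; returns None for an empty slice or invalid `p`.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest sample with at least p% of all samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    // Clamp defensively so floating-point rounding can never index out of bounds.
    let rank = rank.clamp(1, sorted.len());
    Some(sorted[rank - 1])
}
```

With 100 samples valued 1 through 100, `percentile(&samples, 99.0)` picks the 99th smallest value, so a single slow outlier moves the maximum but not the P99.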

Step 1: Selection and Tech Stack

The Rust ecosystem has matured a lot over the past few years. My stack:

[dependencies]
# Web framework
axum = "0.8"
tokio = { version = "1", features = ["full"] }

# Serialization
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# Database ("json" is needed to decode columns into serde_json::Value)
sqlx = { version = "0.8", features = ["runtime-tokio-rustls", "postgres", "json"] }

# Logging
tracing = "0.1"
tracing-subscriber = "0.3"

# Metrics
metrics = "0.24"
metrics-exporter-prometheus = "0.16"

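Why Axum?

The official description reads: "HTTP routing and request-handling library for Rust that focuses on ergonomics and modularity". I tried Actix-web and Warp and ended up choosing Axum because of:

Deep integration with the Tokio ecosystem
A type-safe routing system
Middleware that is written the way Rust developers expect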

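Step 2: Core Optimizations

1. Zero-Copy JSON Parsing

With Go's encoding/json, the data is deserialized into a struct and then serialized again for the response, so it gets copied several times along the way.

In Rust, the rows can be passed through as serde_json::Value without mapping them onto an intermediate struct: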

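use axum::{
    extract::{Query, State},
    http::StatusCode,
    Json,
};
use serde::Deserialize;
use serde_json::Value;

type DbPool = sqlx::PgPool; // assumed alias; not shown in the original

#[derive(Deserialize)]
struct OrderQuery {
    user_id: i64, // assumed type; not shown in the original
}

async fn query_order(
    Query(params): Query<OrderQuery>,
    State(db): State<DbPool>,
) -> Result<Json<Value>, StatusCode> {
    // Fetch JSON straight from the database, with no intermediate struct
    let result = sqlx::query_scalar::<_, Value>(
        "SELECT row_to_json(t) FROM (
            SELECT * FROM orders WHERE user_id = $1 LIMIT 100
        ) t",
    )
    .bind(params.user_id)
    .fetch_all(&db)
    .await
    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

    Ok(Json(Value::Array(result)))
}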

Result: CPU time spent on JSON handling dropped from 35% to 8%.
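In Rust terms, "zero-copy" usually means handing out slices that borrow from the input buffer instead of allocating owned copies. The following std-only sketch illustrates that idea on a hypothetical query-string parser rather than on JSON; `parse_pairs` is an illustrative name and is not part of the service described here.

```rust
/// Splits "k=v&k2=v2" into key/value pairs that borrow from `input`.
/// No String is allocated for any key or value; only the Vec of slices is.
fn parse_pairs(input: &str) -> Vec<(&str, &str)> {
    input
        .split('&')
        .filter(|part| !part.is_empty())
        // A part without '=' is treated as a key with an empty value.
        .map(|part| part.split_once('=').unwrap_or((part, "")))
        .collect()
}
```

The returned slices point into the caller's buffer, so the borrow checker guarantees the input outlives them. That compile-time guarantee is what makes this style safe to use on a hot path.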

2. Connection Pool Tuning

The default sqlx pool settings are fairly conservative, so I adjusted them based on load-test results:

use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

type DbPool = sqlx::PgPool; // assumed alias; not shown in the original

async fn init_db(database_url: &str) -> DbPool {
    PgPoolOptions::new()
        .max_connections(20) // adjust to the CPU core count
        .min_connections(5) // keep a minimum number of connections open
        .acquire_timeout(Duration::from_secs(5))
        .idle_timeout(Duration::from_secs(600))
        .max_lifetime(Duration::from_secs(1800))
        .connect(database_url)
        .await
        .expect("Failed to create pool")
}

Key parameters:

max_connections: I size it as CPU cores * 2 + 1
min_connections: keep 5 connections open to avoid cold starts
idle_timeout: reclaim idle connections after 10 minutes
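The sizing rule above can be written down as a small helper. This is a minimal sketch under stated assumptions: the function names are hypothetical, and core detection relies on `std::thread::available_parallelism`, which reports the parallelism available to the process and may differ from the physical core count inside a container.

```rust
use std::thread::available_parallelism;

/// Rule of thumb from the text: cores * 2 + 1.
fn pool_size_for(cores: u32) -> u32 {
    cores * 2 + 1
}

/// Applies the rule to the detected parallelism, falling back to 1 core
/// if the platform cannot report it.
fn detected_pool_size() -> u32 {
    let cores = available_parallelism()
        .map(|n| n.get() as u32)
        .unwrap_or(1);
    pool_size_for(cores)
}
```

For a 4-core machine the rule gives 9, which differs from the 20 used in the configuration above, so the formula is best treated as a starting point to be adjusted by load testing.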

3. Memory Pool Reuse

This is Rust's killer feature. I use object_pool to reuse buffers:

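use object_pool::Pool;
use std::sync::Arc;

// Assumed alias; the original does not show how Result is defined
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

// Create the buffer pool once at startup
fn new_buffer_pool() -> Arc<Pool<Vec<u8>>> {
    Arc::new(Pool::new(100, || Vec::with_capacity(4096)))
}

async fn process_request(buffer_pool: Arc<Pool<Vec<u8>>>) -> Result<Vec<u8>> {
    // Borrow a buffer; pull() falls back to a fresh one if the pool is empty
    let mut buffer = buffer_pool.pull(|| Vec::with_capacity(4096));
    buffer.clear(); // a reused buffer still holds its previous contents

    // Process data...
    buffer.extend_from_slice(b"response data");

    // The buffer goes back to the pool when it is dropped, no manual drop needed.
    // Note that to_vec() copies the bytes out into a new allocation.
    Ok(buffer.to_vec())
}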

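Result: memory allocated per request dropped from 150KB to 5KB. Rust has no garbage collector, so the GC pressure seen in the Go version disappears entirely.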
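To make the mechanism behind buffer pooling concrete, here is a std-only sketch of the idea. It is not how object_pool is implemented; `BufferPool` and its methods are hypothetical names, and a production pool would return buffers automatically through a guard type rather than an explicit `give_back` call.

```rust
use std::sync::Mutex;

/// A tiny buffer pool: a stack of reusable Vec<u8> behind a mutex.
struct BufferPool {
    buffers: Mutex<Vec<Vec<u8>>>,
    capacity: usize, // initial capacity of newly allocated buffers
}

impl BufferPool {
    fn new(capacity: usize) -> Self {
        BufferPool { buffers: Mutex::new(Vec::new()), capacity }
    }

    /// Takes a buffer from the pool, or allocates one if the pool is empty.
    fn take(&self) -> Vec<u8> {
        let pooled = self.buffers.lock().unwrap().pop();
        pooled.unwrap_or_else(|| Vec::with_capacity(self.capacity))
    }

    /// Clears the buffer (keeping its heap allocation) and puts it back.
    fn give_back(&self, mut buf: Vec<u8>) {
        buf.clear();
        self.buffers.lock().unwrap().push(buf);
    }

    /// Number of buffers currently waiting in the pool.
    fn idle(&self) -> usize {
        self.buffers.lock().unwrap().len()
    }
}
```

The saving comes from `clear()` keeping the allocation: the second request that calls `take()` gets the same heap block back and never touches the allocator.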

4. Async Concurrency Model

Tokio's scheduler is lighter-weight than Go's GMP scheduler. I spawn independent work as Tokio tasks:

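use tokio::task::JoinSet;

// Order, ProcessedOrder, Result and process_single_order are defined elsewhere in the service
async fn batch_process(orders: Vec<Order>) -> Vec<Result<ProcessedOrder>> {
    let mut tasks = JoinSet::new();

    for order in orders {
        tasks.spawn(async move {
            // Each order is processed independently
            process_single_order(order).await
        });
    }

    // Collect results in completion order (not input order)
    let mut results = Vec::new();
    while let Some(res) = tasks.join_next().await {
        // unwrap() panics if the task itself panicked or was cancelled
        results.push(res.unwrap());
    }
    results
}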

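Note: JoinSet manages task lifetimes automatically (any tasks still running are aborted when the set is dropped), which is much safer than manual spawn + join.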

Step 3: Building Observability

Good performance still has to be monitored. I set up the usual trio:

1. Structured Logging

use tracing::{info, instrument};

// `query` is not skipped, so OrderQuery must implement Debug
#[instrument(skip(db), fields(user_id = %query.user_id))]
async fn query_order(query: OrderQuery, db: DbPool) -> Result<Order> {
    info!("Querying order");
    // ...
}

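Span fields (user_id here, plus a trace_id when one is recorded on the span) are attached to the logs automatically, which makes troubleshooting much easier.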

2. Prometheus Metrics

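use metrics::{counter, histogram};
use std::time::Instant;

// Record request latency (metrics 0.24 uses handle methods instead of macro arguments)
let start = Instant::now();
let outcome = process_request().await;
histogram!("request_duration_seconds").record(start.elapsed().as_secs_f64());

// Count errors only when the request actually failed
if outcome.is_err() {
    counter!("request_errors_total").increment(1);
}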

The Grafana dashboard shows:

QPS over time
P50/P95/P99 latency
Error rate
Connection pool utilization

3. Distributed Tracing

With Jaeger integrated, calls across services can be stitched into a single trace:

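use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::layer::SubscriberExt; // provides .with()

// `tracer` is an OpenTelemetry tracer configured elsewhere
let subscriber = tracing_subscriber::registry()
    .with(OpenTelemetryLayer::new(tracer));
tracing::subscriber::set_global_default(subscriber)?;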

Performance Comparison

Metric	Go version	Rust version	Improvement
P50 latency	120ms	15ms	8x
P99 latency	520ms	50ms	10x
QPS	2,100	15,200	7x
Memory usage	2.1GB	180MB	11x
CPU usage	45%	12%	3.7x

Test conditions: 4-core, 8GB container, 1,000 concurrent connections, sustained for 30 minutes.
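The improvement column can be re-derived from the raw figures with simple division. The sketch below does that arithmetic; the function names are hypothetical, and the memory row assumes 1GB = 1000MB, which the original does not specify.

```rust
/// Ratio between two measurements; pass the larger value first.
fn ratio(larger: f64, smaller: f64) -> f64 {
    larger / smaller
}

/// Improvement factors recomputed from the Go and Rust columns of the table.
fn table_ratios() -> [(&'static str, f64); 5] {
    [
        ("P50 latency", ratio(120.0, 15.0)),
        ("P99 latency", ratio(520.0, 50.0)),
        ("QPS", ratio(15_200.0, 2_100.0)),
        ("Memory usage", ratio(2_100.0, 180.0)), // 2.1GB as 2100MB (assumed)
        ("CPU usage", ratio(45.0, 12.0)),
    ]
}
```

The computed factors come out at about 8.0, 10.4, 7.2, 11.7 and 3.75, so the figures in the table's last column appear to be rounded, in some rows rounded down.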

Pitfalls

Pitfall 1: Fighting Lifetimes

// Wrong
fn get_data(input: &str) -> &str {
    let result = format!("processed: {}", input);
    &result // ❌ result is dropped at the end of this function
}

// Right
fn get_data(input: &str) -> String {
    format!("processed: {}", input) // ✅ return owned data
}

Lesson: don't fight the compiler. It is right.
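Returning a reference is not always wrong, though: it works whenever the result borrows from the input rather than from a local. The sketch below shows that case, plus `std::borrow::Cow` for functions that only sometimes need to build new data. Both function names and the "order:" prefix are hypothetical and chosen for illustration.

```rust
use std::borrow::Cow;

/// Fine: the returned slice borrows from `input`, not from a local variable,
/// so it stays valid for as long as `input` does.
fn strip_order_prefix(input: &str) -> &str {
    input.strip_prefix("order:").unwrap_or(input)
}

/// Cow borrows when nothing has to change and allocates a String
/// only when new data is actually built.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}
```

A caller can treat the `Cow` like a `&str` in both cases, so the allocation is paid only on the path that needs it.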

Pitfall 2: Blocking in Async Code

use std::time::Duration;

// Wrong
async fn bad_example() {
    // ❌ blocks a runtime worker thread, stalling every task scheduled on it
    std::thread::sleep(Duration::from_secs(1));
}

// Right
async fn good_example() {
    tokio::time::sleep(Duration::from_secs(1)).await; // ✅ waits asynchronously
}

Lesson: don't make synchronous blocking calls inside async functions.