Iceberg REST Catalog + OSS in Practice: Pitfalls with the Polaris x-amz-content-sha256 Error and Nessie Configuration

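I have recently been running Iceberg performance tests on query engines. The work splits into environment setup, test dataset preparation, and the performance runs themselves.
This post covers only environment setup and records the process, which has three aspects: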

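  1. Catalog: stay as close to production as possible, which means a mainstream catalog type. The performance tests run in mainland China, so Glue, Snowflake Catalog and similar services are unavailable, and we have to deploy our own catalog service.
  2. Storage: the test machines are in China, so overseas object stores (S3, Azure, GCS) are out. Only domestic ones (OSS, COS, OBS) are available, and because the catalog server may not support them natively, access has to go through the S3 protocol.
  3. Query engine: the chosen catalog type must be supported by all of the query engines under test.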

Filtering on these conditions leaves the following environment:

                | Type         | System
----------------+--------------+-----------------
 Catalog type   | REST catalog | Polaris, Nessie
 Storage scheme | S3           | OSS
 Query engine   |              | Doris, Trino

That leaves two candidate integrations, of which one is enough:

  1. Doris/Trino + Polaris + OSS
  2. Doris/Trino + Nessie + OSS

Polaris
Conclusion first: the latest Polaris release (1.2.0) does not work with OSS over the S3 protocol. It fails with this error:

2025-12-12 17:13:45,460 INFO  [org.apa.pol.ser.exc.IcebergExceptionMapper] [4a2120d6-8520-441d-b502-a090f890b03d_0000000000000000030,POLARIS] [,,,] (executor-thread-1) Handling runtimeException aws-chunked encoding is not supported with the specified x-amz-content-sha256 value. (Service: S3, Status Code: 400, Request ID: 693C4D496D7461373771398C) (SDK Attempt Count: 1)

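Two documents are relevant here:
OSS error code 0002-00000427
Using AWS SDKs to access OSS
The gist is that OSS rejects the x-amz-content-sha256 value sent with aws-chunked uploads, so the header must not be sent in that form, and Polaris has no configuration parameter to control it.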
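To check whether failing requests hit this specific rejection, the Polaris log can be filtered for the error text. The helper below is a minimal sketch: the function name chunked_error_request_ids is hypothetical, and it assumes the log lines look like the one quoted above. It prints the OSS request ID of each matching failure.

```sh
# Hypothetical helper: read Polaris log lines on stdin and print the OSS
# request ID of every "aws-chunked encoding is not supported" failure.
chunked_error_request_ids() {
  grep 'aws-chunked encoding is not supported' |
    sed -n 's/.*Request ID: \([0-9A-Za-z]*\).*/\1/p'
}
```

One possible use, assuming the service is named polaris in the compose file: docker compose logs polaris | chunked_error_request_ids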

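The latest Polaris release (1.2.0) adds a stsUnavailable switch that is meant to let Polaris work with any object store that speaks the S3 protocol. Before 1.2.0, Polaris required standard S3 STS authentication, so older versions cannot work with OSS at all.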

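A small detour: the stsUnavailable parameter was misspelled in the release note, so Polaris kept using STS authentication and I lost some time on it. The logs eventually showed that the parameter had never been set, because I had copied the misspelled name from the documentation.
See the Polaris 1.2.0 release note.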

Naturally, I also submitted a PR to fix it:
https://github.com/apache/polaris/pull/3262
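A misspelled key like this is easy to miss because catalog creation still succeeds. One way to catch it early is to read the catalog back and check that the flag is present. This is a minimal sketch: the function name sts_flag_is_set is hypothetical, and it assumes the management API returns the stored storageConfigInfo, including stsUnavailable, when the catalog is fetched.

```sh
# Hypothetical helper: read a catalog JSON document on stdin and succeed
# only if it contains "stsUnavailable": true (whitespace is ignored).
sts_flag_is_set() {
  tr -d ' \n\t' | grep -q '"stsUnavailable":true'
}

# Example call against a running Polaris. TOKEN, REALM and CATALOG_NAME are
# assumed to be set by the caller:
#   curl -s "http://localhost:8181/api/management/v1/catalogs/$CATALOG_NAME" \
#     -H "Authorization: Bearer $TOKEN" -H "Polaris-Realm: $REALM" \
#     | sts_flag_is_set && echo "stsUnavailable is set" || echo "stsUnavailable is NOT set"
```

The check is a plain text match rather than a JSON parse, which keeps it free of extra tools; with jq installed, a stricter test on .storageConfigInfo.stsUnavailable would be preferable.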

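Here is the docker compose YAML for Polaris + OSS, adapted from the quickstart and Ceph examples.
The echo messages at the end still mention MinIO, a leftover from the example it was adapted from; they only print text and do not affect the setup.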


services:

  polaris:
    image: apache/polaris:latest
    ports:
      # API port
      - "8181:8181"
      # Management port (metrics and health checks)
      - "8182:8182"
      # Optional, allows attaching a debugger to the Polaris JVM
      - "5005:5005"
    environment:
      JAVA_DEBUG: true
      JAVA_DEBUG_PORT: "*:5005"
      AWS_REGION: cn-beijing
      AWS_ACCESS_KEY_ID: xxxx
      AWS_SECRET_ACCESS_KEY: xxxx
      AWS_ENDPOINT: http://oss-cn-beijing-internal.aliyuncs.com
      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 10
      start_period: 10s

  polaris-setup:
    image: alpine/curl
    depends_on:
      polaris:
        condition: service_healthy
    environment:
      - CLIENT_ID=${ROOT_CLIENT_ID:-root}
      - CLIENT_SECRET=${ROOT_CLIENT_SECRET:-s3cr3t}
      - CATALOG_NAME=${CATALOG_NAME:-quickstart_catalog}
      - REALM=${POLARIS_REALM:-POLARIS}
      - BASE_LOCATION=${BASE_LOCATION:-s3://xxx/polaris_warehouse}
      - S3_ENDPOINT=${S3_ENDPOINT:-http://oss-cn-beijing-internal.aliyuncs.com}
    entrypoint: /bin/sh
    command:
      - -c
      - |
        set -ex
        sleep 10
        sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
        apk add --no-cache jq

        echo "Obtaining root access token..."
        TOKEN_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/catalog/v1/oauth/tokens \
          -H 'Content-Type: application/x-www-form-urlencoded' \
          -d "grant_type=client_credentials&client_id=$${CLIENT_ID}&client_secret=$${CLIENT_SECRET}&scope=PRINCIPAL_ROLE:ALL")
        TOKEN=$$(echo $$TOKEN_RESPONSE | jq -r '.access_token')
        echo "Obtained access token"

        echo "Creating catalog '$$CATALOG_NAME' in realm $$REALM..."
        PAYLOAD='{
          "catalog": {
            "name": "'$$CATALOG_NAME'",
            "type": "INTERNAL",
            "readOnly": false,
            "properties": {
              "default-base-location": "'$$BASE_LOCATION'"
            },
            "storageConfigInfo": {
              "storageType": "S3",
              "allowedLocations": ["'$$BASE_LOCATION'", "'$$BASE_LOCATION'/"],
              "endpoint": "'$$S3_ENDPOINT'",
              "region": "cn-beijing",
              "endpointInternal": "'$$S3_ENDPOINT'",
              "pathStyleAccess": false,
              "stsUnavailable": true
            }
          }
        }'
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Accept: application/json" \
          -H "Content-Type: application/json" \
          -H "Polaris-Realm: $$REALM" \
          -d "$$PAYLOAD" > /dev/null
        echo "✅ Catalog created"
        echo ""

        echo "Creating principal 'quickstart_user'..."
        PRINCIPAL_RESPONSE=$$(curl -s -X POST http://polaris:8181/api/management/v1/principals \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principal": {"name": "quickstart_user", "properties": {}}}')
        USER_CLIENT_ID=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientId')
        USER_CLIENT_SECRET=$$(echo $$PRINCIPAL_RESPONSE | jq -r '.credentials.clientSecret')
        echo "✅ Principal created with clientId: $$USER_CLIENT_ID"

        echo "Creating principal role 'quickstart_user_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role", "properties": {}}}' > /dev/null
        echo "✅ Principal role created"

        echo "Creating catalog role 'quickstart_catalog_role'..."
        curl -s -X POST http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role", "properties": {}}}' > /dev/null
        echo "✅ Catalog role created"

        echo "Assigning principal role to principal..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principals/quickstart_user/principal-roles \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"principalRole": {"name": "quickstart_user_role"}}' > /dev/null
        echo "✅ Principal role assigned"

        echo "Assigning catalog role to principal role..."
        curl -s -X PUT http://polaris:8181/api/management/v1/principal-roles/quickstart_user_role/catalog-roles/$$CATALOG_NAME \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"catalogRole": {"name": "quickstart_catalog_role"}}' > /dev/null
        echo "✅ Catalog role assigned"

        echo "Granting CATALOG_MANAGE_CONTENT privilege..."
        curl -s -X PUT http://polaris:8181/api/management/v1/catalogs/$$CATALOG_NAME/catalog-roles/quickstart_catalog_role/grants \
          -H "Authorization: Bearer $$TOKEN" \
          -H "Polaris-Realm: $$REALM" \
          -H "Content-Type: application/json" \
          -d '{"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}' > /dev/null
        echo "✅ Privileges granted"
        echo ""

        echo "=========================================="
        echo "🎉 Polaris Quickstart Setup Complete!"
        echo "=========================================="
        echo ""
        echo "Catalog: $$CATALOG_NAME"
        echo "  Storage: S3 (MinIO)"
        echo "  Location: s3://bucket123"
        echo "  MinIO UI: http://localhost:9001"
        echo ""
        echo "Root credentials:"
        echo "  Client ID:     $$CLIENT_ID"
        echo "  Client Secret: $$CLIENT_SECRET"
        echo ""
        echo "User credentials:"
        echo "  Client ID:     $$USER_CLIENT_ID"
        echo "  Client Secret: $$USER_CLIENT_SECRET"
        echo ""
        echo "Polaris main APIs:"
        echo "  - Iceberg REST:   http://localhost:8181/api/catalog/v1"
        echo "  - Management:     http://localhost:8181/api/management/v1"
        echo "  - Generic Tables: http://localhost:8181/api/polaris/v1"
        echo ""
        echo "Polaris admin APIs:"
        echo "  - Health check:   http://localhost:8182/q/health"
        echo "  - Metrics:        http://localhost:8182/q/metrics"
        echo ""
        echo "To get started with Spark:"
        echo "  spark-sql \\"
        echo "    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \\"
        echo "    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\"
        echo "    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \\"
        echo "    --conf spark.sql.catalog.polaris.type=rest \\"
        echo "    --conf spark.sql.catalog.polaris.warehouse=$$CATALOG_NAME \\"
        echo "    --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \\"
        echo "    --conf spark.sql.catalog.polaris.credential=$$USER_CLIENT_ID:$$USER_CLIENT_SECRET \\"
        echo "    --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \\"
        echo "    --conf spark.sql.catalog.polaris.s3.endpoint=http://localhost:9000 \\"
        echo "    --conf spark.sql.catalog.polaris.s3.path-style-access=true \\"
        echo "    --conf spark.sql.catalog.polaris.s3.access-key-id=minio_root \\"
        echo "    --conf spark.sql.catalog.polaris.s3.secret-access-key=m1n1opwd \\"
        echo "    --conf spark.sql.catalog.polaris.client.region=irrelevant \\"
        echo "    --conf spark.sql.defaultCatalog=polaris"
        echo ""
        echo "To get started with REST API:"
        echo "  # Get a token"
        echo "  export TOKEN=\$$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \\"
        echo "    -d 'grant_type=client_credentials' \\"
        echo "    -d 'client_id=$$USER_CLIENT_ID' \\"
        echo "    -d 'client_secret=$$USER_CLIENT_SECRET' \\"
        echo "    -d 'scope=PRINCIPAL_ROLE:ALL' \\"
        echo "    | jq -r '.access_token')"
        echo ""
        echo "  # Create a namespace"
        echo "  curl -X POST http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\" \\"
        echo "    -H 'Content-Type: application/json' \\"
        echo "    -d '{\"namespace\": [\"my_namespace\"], \"properties\": {}}'"
        echo ""
        echo "  # List namespaces"
        echo "  curl -X GET http://localhost:8181/api/catalog/v1/$$CATALOG_NAME/namespaces \\"
        echo "    -H \"Authorization: Bearer \$$TOKEN\""
        echo ""
        echo "=========================================="

Nessie
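Conclusion first here as well: it works.
Start with how the sha256 header problem is handled.
Nessie has a switch that controls this. Judging by the configuration below, it is the chunked-encoding-enabled option, set to false.
With that, the problem goes away.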

Here is the docker compose YAML for Nessie + OSS:

version: '3'

services:
  nessie:
    image: ghcr.io/projectnessie/nessie
    container_name: nessie
    ports:
      - "19120:19120"
    environment:
      - nessie.catalog.default-warehouse=warehouse
      - nessie.catalog.warehouses.warehouse.location=s3://mybucket/my-lakehouse/
      - nessie.catalog.warehouses.zgx.location=s3://xxxxx/iceberg_warehouse/
      - nessie.catalog.service.s3.default-options.endpoint=http://oss-cn-beijing-internal.aliyuncs.com
      - nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie.catalog.secrets.access-key
      - nessie.catalog.service.s3.default-options.path-style-access=false
      - nessie.catalog.service.s3.default-options.chunked-encoding-enabled=false
      - nessie.catalog.service.s3.default-options.auth-type=STATIC
      - nessie.catalog.secrets.access-key.name=xxx
      - nessie.catalog.secrets.access-key.secret=xxx
      - nessie.catalog.service.s3.default-options.region=cn-beijing
      - nessie.server.authentication.enabled=false
      - nessie.catalog.service.s3.default-options.request-signing-enabled=false
    networks:
      nessie-rest:

networks:
  nessie-rest:

Testing Nessie connectivity from Trino
Fetch the matching Trino configuration as described in
https://projectnessie.org/nessie-latest/trino/?h=client+temp#starter-configuration

NESSIE_BASE_URL="http://127.0.0.1:19120"
curl "${NESSIE_BASE_URL}/iceberg-ext/v1/client-template/trino?format=static"

Then add s3.aws-access-key and s3.aws-secret-key to the resulting configuration.
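The two steps can be combined by piping the template through a small filter that appends the credentials. This is a minimal sketch under stated assumptions: the function name add_s3_credentials is hypothetical, and it assumes the template is a plain key=value properties file. Any existing s3.aws-access-key or s3.aws-secret-key lines are dropped first so the keys are not duplicated.

```sh
# Hypothetical helper: read a Trino catalog properties template on stdin,
# drop any existing S3 credential lines, and append the two keys.
# $1 = access key, $2 = secret key.
add_s3_credentials() {
  access_key=$1
  secret_key=$2
  # awk re-emits each kept line with a trailing newline, so the appended
  # properties always start on their own line.
  awk '!/^s3\.aws-(access|secret)-key=/ { print }'
  printf 's3.aws-access-key=%s\n' "$access_key"
  printf 's3.aws-secret-key=%s\n' "$secret_key"
}
```

A possible use, where OSS_AK, OSS_SK and the output path are placeholders: curl -s "${NESSIE_BASE_URL}/iceberg-ext/v1/client-template/trino?format=static" | add_s3_credentials "$OSS_AK" "$OSS_SK" > etc/catalog/nessie.properties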

After that, Trino can read the Iceberg tables normally:

[trino@dec7c1a34cb6 /]$ trino --catalog nessie
trino> use zgx;
USE
trino:zgx> show tables;
        Table
----------------------
 unpartitioned_table
 unpartitioned_table1
 unpartitioned_table2
 unpartitioned_table3
(4 rows)

Query 20251214_145124_00043_v9qpy, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.24 [4 rows, 417B] [16 rows/s, 1.72KiB/s]

trino:zgx> select * from unpartitioned_table;
 col1 | col2 |        col3         |  col4  |    col5    |    col6    | col7  |    col8    |            col9
------+------+---------------------+--------+------------+------------+-------+------------+----------------------------
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
 true |  101 | 9223372036854775807 | 123.45 | 987.654321 | 12345.6789 | xxxxx | 2025-12-14 | 2025-12-14 22:30:00.123456
(5 rows)

Query 20251214_145128_00044_v9qpy, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
0.26 [5 rows, 27.5KiB] [19 rows/s, 107KiB/s]

trino:zgx>

Appendix
https://trino.io/docs/current/connector/iceberg.html
https://projectnessie.org/nessie-latest/configuration
https://projectnessie.org/nessie-latest/trino/
