Installing and using the chdb package in a Python 3.14 container

1. Log in to the container with docker exec -it and install the package with pip install

sudo docker exec -it python3143 bash
root@DESKTOP-59T6U68:/# pip install chdb
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chdb
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/23/28/f3aa551b4af78b8ac967c191407301eff5906dc7239ddb232d4d34bf8ad4/chdb-4.0.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (149.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 MB 31.3 MB/s 0:00:04
Collecting pandas<3.0.0,>=2.1.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/15/b2/0e62f78c0c5ba7e3d2c5945a82456f4fac76c480940f805e0b97fcbc2f65/pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (12.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 51.4 MB/s 0:00:00
Collecting pyarrow>=13.0.0 (from chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9c/86/95c61ad82236495f3c31987e85135926ba3ec7f3819296b70a68d8066b49/pyarrow-23.0.0-cp314-cp314-manylinux_2_28_x86_64.whl (47.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.6/47.6 MB 33.8 MB/s 0:00:01
Collecting numpy>=1.26.0 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/5d/6c/7f237821c9642fb2a04d2f1e88b4295677144ca93285fd76eff3bcba858d/numpy-2.4.2-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.6/16.6 MB 43.4 MB/s 0:00:00
Collecting python-dateutil>=2.8.2 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Collecting pytz>=2020.1 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting tzdata>=2022.7 (from pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl (348 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas<3.0.0,>=2.1.0->chdb)
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, pyarrow, numpy, python-dateutil, pandas, chdb
Successfully installed chdb-4.0.1 numpy-2.4.2 pandas-2.3.3 pyarrow-23.0.0 python-dateutil-2.9.0.post0 pytz-2025.2 six-1.17.0 tzdata-2025.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
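After an install like the one above, it can be handy to confirm from Python which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for this illustration, and the package names are the ones listed in the pip output.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only wraps importlib.metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip output above.
for name in ("chdb", "pandas", "pyarrow", "numpy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Inside the container from this article, the loop should report the versions pip listed as successfully installed.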

2. After importing the package, run queries with chdb.query

root@DESKTOP-59T6U68:/# python3
Python 3.14.3 (main, Feb 4 2026, 20:08:31) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import chdb
>>> res = chdb.query('select version()', 'Pretty'); print(res)
   ┏━━━━━━━━━━━┓
   ┃ version() ┃
   ┡━━━━━━━━━━━┩
1. │ 25.8.2.1  │
   └───────────┘
>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)'); print(res)
"13522500"
>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','Pretty'); print(res)
   ┏━━━━━━━━━━┓
   ┃ mpz_sum  ┃
   ┡━━━━━━━━━━┩
1. │ 13522500 │
   └──────────┘
>>>
>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)', 'JSON'); print(res)
{
    "meta":
    [
        {
            "name": "mpz_sum",
            "type": "Nullable(String)"
        }
    ],
    "data":
    [
        {
            "mpz_sum": "13522500"
        }
    ],
    "rows": 1,
    "statistics":
    {
        "elapsed": 0.037387861,
        "rows_read": 0,
        "bytes_read": 0
    }
}
>>> res = chdb.query('select * from file("/par/duck.parquet", Parquet)','PrettyCompact'); print(res)
   ┌─mpz_sum──┐
1. │ 13522500 │
   └──────────┘

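The queries above fetch the version number and read a Parquet file. The second argument of query selects the output format. When it is omitted, the result is a plain string without a header row; PrettyCompact is the format the clickhouse client uses by default.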
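To compare output formats side by side, the same statement can be run once per format. The sketch below is only an illustration: render_in_formats is a hypothetical helper, it takes the query function as a parameter so that it also works with a stand-in, and it relies on str() of the result giving the same text that print(res) shows in the session above. The chdb import is guarded so the snippet does nothing where chdb is not installed.

```python
from typing import Callable, Dict, Iterable


def render_in_formats(
    run_query: Callable[[str, str], object],
    sql: str,
    formats: Iterable[str],
) -> Dict[str, str]:
    """Run `sql` once per output format and collect each result as text."""
    return {fmt: str(run_query(sql, fmt)) for fmt in formats}


try:
    import chdb  # only present where `pip install chdb` has been run
except ImportError:
    chdb = None

if chdb is not None:
    # Format names as used in the session above, plus CSV.
    outputs = render_in_formats(
        chdb.query, "select version()", ["CSV", "JSON", "Pretty", "PrettyCompact"]
    )
    for fmt, text in outputs.items():
        print(f"--- {fmt} ---")
        print(text)
```

Passing the query function in, rather than calling chdb.query directly inside the helper, keeps the helper testable without a chdb installation.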
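To query a TSV file whose lines end with \r (Windows-style CRLF line endings), the setting input_format_tsv_crlf_end_of_line must be enabled. The clickhouse client offers two ways to do this: issue a separate SET command, or add a SETTINGS clause to the query itself.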

:) set input_format_tsv_crlf_end_of_line=true;
SET input_format_tsv_crlf_end_of_line = true
Query id: 60a7f1df-aad9-41fc-b100-d2c507da7949
Ok.
0 rows in set. Elapsed: 0.009 sec.

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;
SELECT * FROM file('/mnt/c/d/data/cxy.tsv', Tsv)
Query id: 5246c40d-2915-412c-8f28-97912d88e14f
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘
6 rows in set. Elapsed: 0.067 sec.

:) set input_format_tsv_crlf_end_of_line=false;
SET input_format_tsv_crlf_end_of_line = false
Query id: 591cf022-de92-4165-80e1-9744b8bdf93c
Ok.
0 rows in set. Elapsed: 0.000 sec.

:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) ;
SELECT * FROM file('/mnt/c/d/data/cxy.tsv', Tsv)
Query id: d56dd6fe-b217-4c0b-b3a0-344202ed0fd9
Elapsed: 0.071 sec.
Received exception:
Code: 117. DB::Exception: You have carriage return (\r, 0x0D, ASCII 13) at end of first row. It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format. But if you really need carriage return at end of string value of last column, you need to escape it as \r or else enable setting 'input_format_tsv_crlf_end_of_line': (while reading header): (in file/uri /mnt/c/d/data/cxy.tsv): While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

:)
:) select * from file('/mnt/c/d/data/cxy.tsv', Tsv) settings input_format_tsv_crlf_end_of_line =true;
SELECT * FROM file('/mnt/c/d/data/cxy.tsv', Tsv) SETTINGS input_format_tsv_crlf_end_of_line = true
Query id: 6bb8ec95-c597-4716-aee3-b60a001c76f2
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘
6 rows in set. Elapsed: 0.008 sec.

:) \q
Bye.
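The exception text above also names another way out: transform the file to Unix format so that no setting is needed. A minimal sketch of that route with the standard library follows. The helper names and the temporary file names are invented for the example, and the whole file is read into memory, so this suits small files like the one queried here rather than very large ones.

```python
import tempfile
from pathlib import Path


def has_crlf(path: Path) -> bool:
    """Return True if the file contains at least one CRLF line ending."""
    return b"\r\n" in path.read_bytes()


def convert_to_unix(src: Path, dst: Path) -> None:
    """Write a copy of `src` to `dst` with every CRLF replaced by LF."""
    dst.write_bytes(src.read_bytes().replace(b"\r\n", b"\n"))


# Demonstration on a throwaway file with Windows-style line endings.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "cxy_crlf.tsv"   # hypothetical file names
    dst = Path(tmp) / "cxy_unix.tsv"
    src.write_bytes(b"id\tname\tlanguage\r\n1\tJoe\tJava\r\n")
    convert_to_unix(src, dst)
    print(has_crlf(src), has_crlf(dst))  # True False
```

Working on bytes rather than text avoids any implicit newline translation by Python's text mode.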

In Python, with chdb.query, only the second approach (the SETTINGS clause) can be used.

>>> import chdb
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv)','Pretty'); print(res)
  File "/usr/local/lib/python3.14/site-packages/chdb/__init__.py", line 205, in query
    res = conn.query(sql, output_format, params=params)
RuntimeError: Code: 636. DB::Exception: The table structure cannot be extracted from a Tsv format file. Error: Code: 117. DB::Exception: You have carriage return (\r, 0x0D, ASCII 13) at end of first row. It's like your input data has DOS/Windows style line separators, that are illegal in TabSeparated format. You must transform your file to Unix format. But if you really need carriage return at end of string value of last column, you need to escape it as \r or else enable setting 'input_format_tsv_crlf_end_of_line'. (INCORRECT_DATA) (version 25.8.2.1). You can specify the structure manually: (in file/uri /par/data/cxy.tsv). (CANNOT_EXTRACT_TABLE_STRUCTURE)
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','Pretty'); print(res)
   ┏━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ id ┃ name    ┃ language   ┃
   ┡━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
1. │  1 │ Joe     │ Java       │
   ├────┼─────────┼────────────┤
2. │  2 │ Alice   │ JavaScript │
   ├────┼─────────┼────────────┤
3. │  3 │ Leon    │ C/C++      │
   ├────┼─────────┼────────────┤
4. │  4 │ William │ Java       │
   ├────┼─────────┼────────────┤
5. │  5 │ James   │ C/C++      │
   ├────┼─────────┼────────────┤
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘
>>>
>>> res = chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact'); print(res)
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘
>>> chdb.query('select * from file("/par/data/cxy.tsv", Tsv) settings input_format_tsv_crlf_end_of_line=true','PrettyCompact')
   ┌─id─┬─name────┬─language───┐
1. │  1 │ Joe     │ Java       │
2. │  2 │ Alice   │ JavaScript │
3. │  3 │ Leon    │ C/C++      │
4. │  4 │ William │ Java       │
5. │  5 │ James   │ C/C++      │
6. │  6 │ Enson   │ C/C++      │
   └────┴─────────┴────────────┘

The query result can be stored in a variable and then shown with print(), or it can be displayed directly when the query is executed.
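Because the SETTINGS clause has to be part of every statement in the Python session above, a small string helper can cut down on retyping. This is only a sketch: with_settings is a hypothetical function, not part of chdb, and it assumes the statement has no SETTINGS clause yet and that the values are booleans or numbers (string values would need quoting, which is not handled here).

```python
def with_settings(sql: str, **settings: object) -> str:
    """Append a SETTINGS clause to `sql`; booleans are rendered as true/false.

    Hypothetical helper: assumes `sql` has no SETTINGS clause of its own
    and that the values are booleans or numbers.
    """
    if not settings:
        return sql

    def render(value: object) -> str:
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    clause = ", ".join(f"{name} = {render(value)}" for name, value in settings.items())
    # Drop trailing whitespace and an optional semicolon before appending.
    return f"{sql.rstrip().rstrip(';')} SETTINGS {clause}"


sql = with_settings(
    'select * from file("/par/data/cxy.tsv", Tsv)',
    input_format_tsv_crlf_end_of_line=True,
)
print(sql)
# select * from file("/par/data/cxy.tsv", Tsv) SETTINGS input_format_tsv_crlf_end_of_line = true
```

The resulting string can then be passed to chdb.query together with a format name such as 'PrettyCompact', exactly like the hand-written statements in the session.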
