别再手动算了!KingbaseES数据库与表大小查询的3个高效命令(附实战截图)
KingbaseES数据库存储分析实战:从基础查询到自动化监控
在数据库运维工作中,存储空间管理是DBA和开发人员无法回避的核心任务。无论是日常性能调优、容量规划,还是紧急故障排查,快速准确地获取数据库和表的空间占用情况都是解决问题的第一步。本文将带您深入掌握KingbaseES数据库存储分析的完整方法论,从基础命令到高级脚本封装,构建一套高效的存储监控体系。
1. 存储分析基础:核心函数与实战解读
KingbaseES提供了一系列内置函数用于存储空间分析,但单纯知道函数名称远远不够。我们需要理解每个函数返回值的含义、适用场景以及如何组合使用这些函数来获取真正有价值的信息。
1.1 数据库级空间分析
sys_database_size()函数是分析整个数据库占用空间的起点。但直接使用原始输出(以字节为单位的整数值)并不直观:
-- 原始字节输出 SELECT sys_database_size('my_database'); -- 人性化格式输出 SELECT sys_size_pretty(sys_database_size('my_database'));在实际工作中,我们通常需要对比多个数据库的大小分布:
SELECT d.datname AS database_name, sys_size_pretty(sys_database_size(d.datname)) AS size_pretty, sys_database_size(d.datname) AS size_bytes FROM sys_database d WHERE d.datname NOT IN ('template0', 'template1') ORDER BY size_bytes DESC;表:数据库大小查询结果关键字段解析
| 字段名 | 数据类型 | 说明 | 典型用途 |
|---|---|---|---|
| datname | text | 数据库名称 | 识别目标数据库 |
| size_pretty | text | 格式化后的大小(如'1.2 GB') | 快速阅读 |
| size_bytes | bigint | 原始字节数 | 精确计算和比较 |
1.2 表级空间分解技术
表空间分析比数据库级更复杂,因为KingbaseES提供了多个不同维度的函数:
-- 基本表大小(不含索引和TOAST) SELECT sys_size_pretty(sys_relation_size('my_table')); -- 表+索引大小 SELECT sys_size_pretty(sys_total_relation_size('my_table')); -- 表+索引+TOAST大小 SELECT sys_size_pretty(pg_table_size('my_table')) AS total_size, sys_size_pretty(pg_indexes_size('my_table')) AS indexes_size, sys_size_pretty(pg_total_relation_size('my_table') - pg_table_size('my_table') - pg_indexes_size('my_table')) AS toast_size;理解这些函数的区别至关重要:
sys_relation_size():仅表数据文件大小pg_indexes_size():该表所有索引的大小总和pg_table_size():表数据+TOAST+空闲空间映射pg_total_relation_size():表数据+索引+TOAST的总和
2. 高级分析技巧:定位存储热点
基础查询只能告诉我们"有多大",而运维人员更需要知道"为什么这么大"和"如何优化"。下面这些技巧可以帮助您深入分析存储使用情况。
2.1 按模式统计空间分布
当数据库包含多个模式(schema)时,按模式分组统计可以快速定位问题区域:
SELECT schemaname, sum(pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename)))::bigint AS total_size, pg_size_pretty(sum(pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename)))::bigint) AS pretty_size FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema') GROUP BY schemaname ORDER BY total_size DESC;2.2 识别最大的表和索引
以下查询可以列出数据库中最大的10张表及其关联索引:
SELECT t.schemaname || '.' || t.tablename AS table_name, pg_size_pretty(pg_total_relation_size(quote_ident(t.schemaname) || '.' || quote_ident(t.tablename))) AS total_size, pg_size_pretty(pg_table_size(quote_ident(t.schemaname) || '.' || quote_ident(t.tablename))) AS table_size, pg_size_pretty(pg_indexes_size(quote_ident(t.schemaname) || '.' || quote_ident(t.tablename))) AS indexes_size, (SELECT count(*) FROM pg_indexes WHERE tablename = t.tablename AND schemaname = t.schemaname) AS index_count FROM pg_tables t WHERE t.schemaname NOT IN ('pg_catalog', 'information_schema') ORDER BY pg_total_relation_size(quote_ident(t.schemaname) || '.' || quote_ident(t.tablename)) DESC LIMIT 10;2.3 分析表空间使用细节
对于特别大的表,我们还需要了解其内部空间分配情况:
SELECT schemaname, relname, pg_size_pretty(pg_relation_size(relid)) AS data_size, pg_size_pretty(pg_relation_size(relid, 'fsm')) AS fsm_size, pg_size_pretty(pg_relation_size(relid, 'vm')) AS vm_size, pg_size_pretty(pg_relation_size(relid, 'init')) AS init_size FROM pg_stat_user_tables WHERE schemaname = 'public' ORDER BY pg_relation_size(relid) DESC LIMIT 5;表:表空间组成解析
| 组件 | 说明 | 优化建议 |
|---|---|---|
| 主数据文件 | 实际表数据存储 | 考虑分区或归档旧数据 |
| FSM(空闲空间映射) | 跟踪表中可用空间 | 过大可能表明频繁更新/删除 |
| VM(可见性映射) | 加速vacuum操作 | 通常占用空间较小 |
| INIT文件 | 未初始化区域 | 通常可以忽略 |
3. 自动化监控方案
临时查询适合即时分析,但专业的数据库环境需要建立持续的监控机制。以下是几种常见的自动化方案。
3.1 创建监控视图
将常用查询封装为视图可以简化日常监控:
CREATE OR REPLACE VIEW db_size_monitor AS SELECT d.datname AS database_name, pg_size_pretty(pg_database_size(d.datname)) AS size_pretty, pg_database_size(d.datname) AS size_bytes, age(d.datfrozenxid) AS txid_age FROM pg_database d ORDER BY pg_database_size(d.datname) DESC; CREATE OR REPLACE VIEW table_size_monitor AS SELECT schemaname, relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size, pg_size_pretty(pg_relation_size(relid)) AS data_size, pg_size_pretty(pg_indexes_size(relid)) AS indexes_size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid) - pg_indexes_size(relid)) AS toast_size, n_live_tup AS live_rows, n_dead_tup AS dead_rows FROM pg_stat_user_tables ORDER BY pg_total_relation_size(relid) DESC;3.2 定期监控脚本
将查询封装为Shell脚本,配合cron实现定期监控:
#!/bin/bash # 监控数据库大小 psql -U monitor_user -d postgres -c " SELECT d.datname AS database, pg_size_pretty(pg_database_size(d.datname)) AS size, age(d.datfrozenxid) AS txid_age FROM pg_database d ORDER BY pg_database_size(d.datname) DESC; " > /var/log/db_size_$(date +%Y%m%d).log # 监控大表变化 psql -U monitor_user -d my_database -c " SELECT schemaname, relname, pg_size_pretty(pg_total_relation_size(relid)) AS size, pg_total_relation_size(relid) AS size_bytes FROM pg_stat_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 20; " >> /var/log/db_size_$(date +%Y%m%d).log3.3 历史趋势分析
建立历史数据表,记录空间使用变化趋势:
CREATE TABLE db_growth_history ( check_time timestamp PRIMARY KEY, total_size_bytes bigint, largest_db text, largest_db_size_bytes bigint, table_count int, avg_table_size_bytes bigint ); -- 每日收集统计信息 INSERT INTO db_growth_history SELECT now(), sum(pg_database_size(datname)), (SELECT datname FROM pg_database ORDER BY pg_database_size(datname) DESC LIMIT 1), (SELECT pg_database_size(datname) FROM pg_database ORDER BY pg_database_size(datname) DESC LIMIT 1), (SELECT count(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')), (SELECT avg(pg_total_relation_size(relid)) FROM pg_stat_user_tables) FROM pg_database;4. 实战优化案例
掌握了监控方法后,我们来看几个实际优化案例,了解如何根据空间分析结果采取行动。
4.1 大表分区优化
当发现某个表异常庞大时,分区通常是首选方案。以下是一个日志表的分区过程:
-- 原始表结构 CREATE TABLE app_logs ( log_id bigserial, log_time timestamp, app_name text, log_level text, message text, PRIMARY KEY (log_id) ); -- 转换为按日期分区 CREATE TABLE app_logs ( log_id bigserial, log_time timestamp, app_name text, log_level text, message text, PRIMARY KEY (log_id, log_time) ) PARTITION BY RANGE (log_time); -- 创建每月分区 CREATE TABLE app_logs_202301 PARTITION OF app_logs FOR VALUES FROM ('2023-01-01') TO ('2023-02-01'); CREATE TABLE app_logs_202302 PARTITION OF app_logs FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');分区后,可以单独管理每个分区的存储,例如将旧分区移动到较慢的存储设备上。
4.2 索引优化
通过分析发现某些表的索引占用空间超过数据本身:
-- 查找索引比表数据大的表 SELECT t.schemaname || '.' || t.tablename AS table_name, pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size, pg_size_pretty(pg_relation_size(i.indrelid)) AS table_size, i.indexrelname AS index_name FROM pg_stat_user_tables t JOIN pg_indexes i ON t.schemaname = i.schemaname AND t.relname = i.tablename WHERE pg_relation_size(i.indexrelid) > pg_relation_size(i.indrelid) ORDER BY (pg_relation_size(i.indexrelid) - pg_relation_size(i.indrelid)) DESC;对于这类情况,可以考虑:
- 删除重复或很少使用的索引
- 重建索引使用更紧凑的存储方式
- 考虑使用部分索引或条件索引
4.3 TOAST表优化
当表的TOAST部分异常大时,可能表明存在存储效率问题:
-- 查找TOAST占比高的表 SELECT schemaname, relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size, pg_size_pretty(pg_relation_size(relid)) AS data_size, pg_size_pretty(pg_indexes_size(relid)) AS indexes_size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid) - pg_indexes_size(relid)) AS toast_size, round((pg_total_relation_size(relid) - pg_relation_size(relid) - pg_indexes_size(relid)) * 100.0 / pg_total_relation_size(relid), 2) AS toast_percentage FROM pg_stat_user_tables WHERE pg_total_relation_size(relid) > 0 ORDER BY toast_percentage DESC LIMIT 10;对于TOAST占比高的表,可以考虑:
- 审查大字段的使用方式
- 考虑使用外部存储或文件系统存储大对象
- 对大字段数据进行压缩后再存储
