Sorting and Bucketing in Hive Explained

An Overview of Hive Sorting and Bucketing

❒ ORDER BY

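ORDER BY performs a global sort on the final output of a SQL query. It does the sort in a single Reducer task, which guarantees total ordering; however, when the input is large, that single Reducer can make the job run for a long time. By default, ORDER BY sorts in ascending order. For example, the following statement uses ORDER BY to sort by cust_id: select distinct cust_id,id_no,part_date from ads_api_cda_basic_info_parquet_pt order by cust_id.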
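
To make the single-Reducer behavior concrete, here is a small Python simulation of ORDER BY semantics. It is only a model of what the clause guarantees, not Hive's implementation; the function name and the sample rows are hypothetical, borrowing the column names from the query above.

```python
# A simulation of ORDER BY semantics, not Hive's implementation:
# all rows go to ONE reducer, which sorts the whole dataset.
from typing import Any, Dict, List

Row = Dict[str, Any]


def order_by(rows: List[Row], key: str, descending: bool = False) -> List[List[Row]]:
    """Return one list per reducer; ORDER BY always yields exactly one."""
    single_reducer = list(rows)  # every row is shuffled to the same reducer
    single_reducer.sort(key=lambda r: r[key], reverse=descending)  # ascending by default
    return [single_reducer]


# Hypothetical sample rows using column names from the example query
rows = [
    {"cust_id": 3, "id_no": "c"},
    {"cust_id": 1, "id_no": "a"},
    {"cust_id": 2, "id_no": "b"},
]
outputs = order_by(rows, "cust_id")
print(len(outputs))                        # 1
print([r["cust_id"] for r in outputs[0]])  # [1, 2, 3]
```

The single output list is the whole point: total order is cheap to guarantee when one worker sees every row, and that same worker is the bottleneck on large inputs.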

❒ SORT BY

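SORT BY does not sort the final output of the query as a whole. Instead, the Map-side output is sorted on the specified column before it enters each Reducer, so every Reducer's output is sorted by that column. Note that SORT BY does not change the number of Reducers (unlike ORDER BY, which always uses one). It guarantees order within each Reducer, but not a global order across the query result. Example: select distinct cust_id, id_no, part_date from ads_api_cda_basic_info_parquet_pt SORT BY cust_id.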
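
The difference between per-Reducer order and global order can be sketched with a similar simulation. The round-robin assignment of rows to Reducers is a hypothetical stand-in chosen only because it ignores the sort key; it is not how Hive actually places rows.

```python
# A simulation of SORT BY semantics, not Hive's implementation:
# rows are spread over several reducers and each reducer sorts only its share.
from typing import Any, Dict, List

Row = Dict[str, Any]


def sort_by(rows: List[Row], key: str, num_reducers: int) -> List[List[Row]]:
    reducers: List[List[Row]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Hypothetical round-robin placement; it ignores the sort key on purpose
        reducers[i % num_reducers].append(row)
    for part in reducers:
        part.sort(key=lambda r: r[key])  # local sort within one reducer only
    return reducers


rows = [{"cust_id": c} for c in [5, 1, 4, 2, 3, 6]]
outputs = sort_by(rows, "cust_id", num_reducers=2)
per_reducer = [[r["cust_id"] for r in part] for part in outputs]
print(per_reducer)                                # [[3, 4, 5], [1, 2, 6]]
print([c for part in per_reducer for c in part])  # [3, 4, 5, 1, 2, 6]
```

Each Reducer's list is sorted, but concatenating the Reducer outputs does not give a sorted result.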

❒ DISTRIBUTE BY

DISTRIBUTE BY is a distribution rule: it decides which Reducer each Map-side output record is sent to for further processing. It does not change the number of Reducers. By default, a hash-modulo scheme sends all Map output rows with the same DISTRIBUTE BY value to the same Reducer. Note, however, that DISTRIBUTE BY does not guarantee that the records within each Reducer are sorted. Example: select distinct cust_id, id_no, part_date from ads_api_cda_basic_info_parquet_pt distribute by cust_id.
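
The hash-modulo routing can be sketched as follows. The hash used here (the integer value itself, masked to a non-negative number) is a simplifying assumption for integer keys; Hive's real hash function depends on the version and column type. The helper names are hypothetical.

```python
# A simulation of DISTRIBUTE BY semantics, not Hive's implementation:
# hash(key) mod N picks the reducer; nothing is sorted.
from typing import Any, Dict, List

Row = Dict[str, Any]


def reducer_for(key_value: int, num_reducers: int) -> int:
    # Stand-in hash: the integer itself, masked to non-negative as in Java,
    # then taken modulo the reducer count (real Hive hashing varies by version/type)
    return (key_value & 0x7FFFFFFF) % num_reducers


def distribute_by(rows: List[Row], key: str, num_reducers: int) -> List[List[Row]]:
    reducers: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[reducer_for(row[key], num_reducers)].append(row)  # arrival order kept
    return reducers


rows = [{"cust_id": c} for c in [7, 2, 7, 4, 2, 9]]
outputs = distribute_by(rows, "cust_id", num_reducers=3)
per_reducer = [[r["cust_id"] for r in part] for part in outputs]
print(per_reducer)  # [[9], [7, 7, 4], [2, 2]]
```

Both rows with cust_id 7 land on the same Reducer, but that Reducer's rows ([7, 7, 4]) are not in sorted order.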

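Combining DISTRIBUTE BY with SORT BY ensures that all records handled by each Reducer are sorted by the SORT BY column. The DISTRIBUTE BY partitioning column and the SORT BY sort column may differ, for example: select distinct cust_id, id_no, part_date from ads_api_cda_basic_info_parquet_pt distribute by cust_id sort by id_no. Because each Reducer writes its own output file, the DISTRIBUTE BY column effectively controls how rows are spread across output files. Choosing that column carefully, together with SORT BY, can therefore address a range of problems, including uneven Map output file sizes, uneven Reduce output file sizes, too many small files, and oversized files.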
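
A sketch of distribute by cust_id sort by id_no, using the same simplified integer hash as an assumption (real Hive hashing differs by version and type). It shows routing on one column and sorting on another.

```python
# A simulation of DISTRIBUTE BY cust_id SORT BY id_no, not Hive's implementation:
# route rows by one column, then sort each reducer's rows by another column.
from typing import Any, Dict, List

Row = Dict[str, Any]


def distribute_and_sort(rows: List[Row], dist_key: str, sort_key: str,
                        num_reducers: int) -> List[List[Row]]:
    reducers: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        # Stand-in hash (identity on ints, masked non-negative) modulo reducer count
        reducers[(row[dist_key] & 0x7FFFFFFF) % num_reducers].append(row)
    for part in reducers:
        part.sort(key=lambda r: r[sort_key])  # per-reducer sort on a different column
    return reducers


rows = [
    {"cust_id": 1, "id_no": "z"},
    {"cust_id": 2, "id_no": "b"},
    {"cust_id": 1, "id_no": "a"},
    {"cust_id": 3, "id_no": "y"},
    {"cust_id": 2, "id_no": "c"},
]
outputs = distribute_and_sort(rows, "cust_id", "id_no", num_reducers=2)
for i, part in enumerate(outputs):
    print(i, [(r["cust_id"], r["id_no"]) for r in part])
# 0 [(2, 'b'), (2, 'c')]
# 1 [(1, 'a'), (3, 'y'), (1, 'z')]
```

Within Reducer 1 the rows are ordered by id_no, not by cust_id: the two clauses answer different questions (where a row goes, and in what order it is written).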

❒ CLUSTER BY

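CLUSTER BY is equivalent to using DISTRIBUTE BY and SORT BY together, with the DISTRIBUTE BY partitioning column and the SORT BY sort column being the same. CLUSTER BY does not change the number of Reducers. Example: select distinct cust_id, id_no, part_date from ads_api_cda_basic_info_parquet_pt cluster by cust_id; which behaves like ... distribute by cust_id sort by cust_id.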
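
The equivalence can be expressed directly in a simulation: cluster_by below simply calls the DISTRIBUTE BY + SORT BY model with the same column twice. The function names and the simplified integer hash are assumptions for illustration only.

```python
# A simulation showing CLUSTER BY x == DISTRIBUTE BY x SORT BY x (not Hive's code).
from typing import Any, Dict, List

Row = Dict[str, Any]


def distribute_and_sort(rows: List[Row], dist_key: str, sort_key: str,
                        num_reducers: int) -> List[List[Row]]:
    reducers: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        # Stand-in hash (identity on ints, masked non-negative) modulo reducer count
        reducers[(row[dist_key] & 0x7FFFFFFF) % num_reducers].append(row)
    for part in reducers:
        part.sort(key=lambda r: r[sort_key])
    return reducers


def cluster_by(rows: List[Row], key: str, num_reducers: int) -> List[List[Row]]:
    # Same column for partitioning and sorting
    return distribute_and_sort(rows, key, key, num_reducers)


rows = [{"cust_id": c} for c in [4, 1, 3, 1, 2, 4]]
clustered = cluster_by(rows, "cust_id", num_reducers=2)
per_reducer = [[r["cust_id"] for r in part] for part in clustered]
print(per_reducer)  # [[2, 4, 4], [1, 1, 3]]
print(clustered == distribute_and_sort(rows, "cust_id", "cust_id", 2))  # True
```

Every cust_id value lands in exactly one Reducer and each Reducer's rows come out sorted, yet the concatenation [2, 4, 4, 1, 1, 3] is still not globally sorted; CLUSTER BY is not a substitute for ORDER BY.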

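How CLUSTER BY shows up in the Spark Web UI

When a CLUSTER BY query runs on Spark, its underlying mechanism is visible in the Spark Web UI: the query plan shows a hash-partitioning Exchange on the column (the DISTRIBUTE BY step) followed by a Sort within each partition (the SORT BY step). Because CLUSTER BY uses the same column for both steps, the data is partitioned and sorted within a single shuffle. And because CLUSTER BY does not change the number of Reducers, it does not funnel all data through one Reducer the way ORDER BY does, which is where its efficiency advantage comes from.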

❒ BUCKET (bucketed tables)

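In Hive, a bucketed table (BUCKET) is a special kind of table with several advantages, including efficient sampling and support for map-side joins. When declaring a bucketed table, the user specifies the bucketing column and the number of buckets, for example "CLUSTERED BY(user_id) INTO 31 BUCKETS". When data is written, Hive automatically adds a CLUSTER BY clause on the bucketing column so that rows are distributed by it. Writes to a bucketed table therefore go through Reducers, and the number of Reducers is set automatically to the declared number of buckets, so each bucket corresponds to one output file. (In Hive 0.x and 1.x this behavior requires set hive.enforce.bucketing=true; from Hive 2.0 on it is always enforced.)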
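
The following simulation, assuming the example DDL CLUSTERED BY(user_id) INTO 31 BUCKETS, shows why a write always produces one file per bucket and why two tables bucketed the same way can be joined bucket by bucket (the idea behind a map-side bucket join). The hash is a simplified stand-in (identity on integers); the tables, values, and helper names are hypothetical.

```python
# A simulation of writing a bucketed table, not Hive's implementation.
# Assumes the DDL from the text: CLUSTERED BY(user_id) INTO 31 BUCKETS.
from typing import List

NUM_BUCKETS = 31


def bucket_of(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in hash (identity on ints, masked non-negative) modulo bucket count;
    # the real hash function depends on the Hive version and column type
    return (user_id & 0x7FFFFFFF) % num_buckets


def write_bucketed(user_ids: List[int], num_buckets: int = NUM_BUCKETS) -> List[List[int]]:
    # One reducer per bucket, so the write always yields num_buckets files (some may be empty)
    files: List[List[int]] = [[] for _ in range(num_buckets)]
    for uid in user_ids:
        files[bucket_of(uid, num_buckets)].append(uid)
    return files


# Two hypothetical tables bucketed the same way on user_id
orders = [3, 34, 65, 7]
users = [34, 7, 100]
a = write_bucketed(orders)
b = write_bucketed(users)

# Bucket i of one table can only match bucket i of the other,
# so joining bucket by bucket finds the same matches as a full join.
bucketwise = sorted(x for i in range(NUM_BUCKETS) for x in a[i] if x in b[i])
full = sorted(x for x in orders if x in users)
print(len(a), a[3], b[3])  # 31 [3, 34, 65] [34]
print(bucketwise, full)    # [7, 34] [7, 34]
```

Since equal keys always hash to the same bucket index, each pair of matching buckets can be joined independently without a full shuffle, and reading a single bucket gives a sample of roughly 1/31 of the table.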

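By choosing the bucketing column and bucket count sensibly, a bucketed table can keep the number of underlying files under control, which helps with small-file problems and, when the bucketing column's values are evenly distributed, with data skew. Moreover, when bucketed tables are used to handle these problems, all changes happen at the DDL level; there is no need to modify DML statements to add CLUSTER BY/DISTRIBUTE BY clauses. Since DDL is usually a one-time operation, performed at system launch or during later tuning, this makes the system more flexible and easier to operate.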
