当前位置: 首页 > news >正文

Spark SQL - Recursive CTE

Can you give me an example to query parent-child hierarchy using Spark SQL recursive CTE?
 
ChatGPT said:

Sure! Here are clear, complete Spark SQL examples that show how to query a parent-child hierarchy using a recursive CTE.
(Supported in Spark 3.4+.)


⭐ Example Scenario

Suppose you have a table:

CREATE TABLE org (id INT,parent_id INT,name STRING
);

Example data:

idparent_idname
1 NULL CEO
2 1 VP Eng
3 1 VP Sales
4 2 Manager A
5 2 Manager B
6 4 Engineer

This forms a tree:

CEO├─ VP Eng│    ├─ Manager A│    │     └─ Engineer│    └─ Manager B└─ VP Sales

 


Spark SQL Recursive CTE Example

WITH RECURSIVE org_tree AS (-- 1. Anchor: all root nodesSELECTid,parent_id,name,CAST(name AS STRING) AS path,0 AS levelFROM orgWHERE parent_id IS NULLUNION ALL-- 2. Recursive: join children to parentsSELECTc.id,c.parent_id,c.name,CONCAT(p.path, ' > ', c.name) AS path,p.level + 1 AS levelFROM org cJOIN org_tree pON c.parent_id = p.id
)SELECT *
FROM org_tree
ORDER BY path;

 


✔ Output

idparent_idnamepathlevel
1 NULL CEO CEO 0
2 1 VP Eng CEO > VP Eng 1
3 1 VP Sales CEO > VP Sales 1
4 2 Manager A CEO > VP Eng > Manager A 2
5 2 Manager B CEO > VP Eng > Manager B 2
6 4 Engineer CEO > VP Eng > Manager A > Engineer 3

🔍 How it works

1. Anchor query

Gets root rows (parent_id IS NULL).

2. Recursive query

Keeps joining child rows to the current “tree”.

3. Spark handles recursion

Spark automatically expands:

level 0 → level 1 → level 2 → …

Until no more child rows exist.


⭐ Bonus: Query all descendants of a specific id

WITH RECURSIVE sub AS (SELECT *FROM orgWHERE id = 2        -- start from VP EngUNION ALLSELECT c.*FROM org cJOIN sub p ON c.parent_id = p.id
)
SELECT * FROM sub;

 


⭐ Bonus: Query all ancestors of a specific id

WITH RECURSIVE ancestors AS (SELECT *FROM orgWHERE id = 6         -- EngineerUNION ALLSELECT p.*FROM org pJOIN ancestors c ON p.id = c.parent_id
)
SELECT * FROM ancestors;