
Semantic Layer for the People and by the People

news 2026/6/4 18:07:45

Original: towardsdatascience.com/semantic-layer-for-the-people-and-by-the-people-ce9ecbd0a6f6

TL;DR:

My three direct Jokers and one hidden Joker are:

Joker #1: Pattern-driven repository structure 🗂️
Joker #2: Organised code 👩‍💻
Joker #3: (Non-)embedded documentation 📜
[🃏 Hidden Joker: Refinement Loop 🃏]

<…/Images/86cf76a5bb32cbbf682673c729214aab.png>

“Simple and consistent. – This is how I describe to someone the two most important dimensions to consider when building a semantic layer.” [Photo by Zuzana Ruttkay on Unsplash]

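Semantics – the study of linguistic meaning

According to Wikipedia, the term semantics, the study of linguistic meaning, examines what meaning is.

Or, in essence, how words get their meaning and how the meaning of a complex expression depends on its parts [1].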

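Although the explanation of the term semantics is straightforward, I did have to spend some time thinking about the part "the meaning of a complex expression depends on its parts", because I wanted to reuse it to explain the semantic layer in analytics.

After rereading it, my interpretation is as follows:

Similar to semantics in language, the semantic layer in analytics is about giving data meaning.

Just as words combine into a specific meaning so that what is said can be understood, raw data from different sources is enriched and shaped into specific insights.

Just as meaning depends on the combination of words in an expression, the results derived from raw data depend on the modelling approach used in the semantic layer.

Just as properly structured and well-formed expressions are easy to understand, properly modelled data leads to high-quality data insights.

All in all, it is about creating better, faster and novel value from raw data, which leads to insights and to an understanding of the actions the business should take.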

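This is the core purpose of the semantic layer, and building it comes with countless challenges.

The two key problems I always face when building a semantic layer are (1) simplicity and (2) consistency.

Or, better said, how to achieve both at the same time.

When I tried to simplify my semantic layer by focusing on one area of my analytics development, it usually weakened consistency.
When I tried to keep my semantic layer consistent, it usually became overly complex.

Hence, finding the balance, or more precisely a semantic layer structure that is both simple and consistent, matters because it affects the speed, and sometimes even the quality, of delivering business value.

So in this post, I will share the 3 "Jokers" I use to balance these challenges when creating a semantic layer designed to evolve with business needs.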

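My explanations, and the visual templates provided, will mainly be based on how I build a semantic layer using Looker.

However, they are generic and can be applied to building a semantic layer in any other BI or modelling tool (such as dbt Core or Dataform).

Let's dive in.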

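Joker #1: Pattern-driven repository structure 🗂️

The repository structure of the semantic layer is the blueprint you need to get right. As a foundational task, it really does need to be both simple and consistent.

Why, you ask?

Because a proper repository structure for a semantic layer needs to serve both technical colleagues (for data modelling and test development) and business colleagues (for the self-service analytics part). In short, it needs to be understood by both sides.
Because business requirements will "explode" over time, and so will your analytics development. If you don't follow a pattern in the semantic layer repository structure, your development is likely to become redundant and will lack proper development standards.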

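So, what key components should a good semantic layer repository contain?

(1) Layers of folders or subfolders: create clear layers in your repository hierarchy to separate business logic from technical logic. This way, technical and business users can "get on the same page" when they discuss the same data model or the same method for comparing data insights (for example, a specific period-over-period method or a specific forecasting model).
(2) Naming conventions: related to the first component, naming conventions play an important role in standardising development and keeping the repository structure tidy. Properly defined naming conventions speed up data modelling by following a consistent pattern, and they simplify navigation during troubleshooting.

To give more context to the theory above, I will explain the blueprint of my semantic layer repository structure visually.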

    .
    └── Semantic Layer/
        ├── area_association_rules/
        │   ├── models/
        │   │   ├── association_rules.model.lkml
        │   │   └── frequent_items.model.lkml
        │   └── views_shop/
        │       ├── association_rules.view.lkml
        │       └── frequent_items.view.lkml
        ├── area_benchmarking/
        ├── area_demand_forecast/
        ├── area_financial_forecasting/
        ├── area_customer_intelligence/
        ├── ....
        ├── area_business_controlling/
        │   ├── models/
        │   ├── views/
        │   ├── views_derived/
        │   └── views_aggregated/
        ├── area_finances/
        ├── area_logistics/
        ├── area_performance_marketing/
        ├── ....
        ├── base_views/
        ├── base_date_granularity/
        ├── base_date_granularity_customised/
        ├── base_pop_logic/
        ├── base_pop_logic_customised/
        ├── ....
        ├── data_tests/
        ├── documentation/
        ├── locales/
        ├── manifest.lkml
        └── README.md

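I usually divide my semantic layer repository structure into area_* and base_* folders:

(1) area_* folders.

The purpose of these folders is to organise analytics development and its code by specific business area (e.g. area_finances or area_marketing) or by shared, cross-business analytical use case (e.g. area_demand_forecast, area_benchmarking, etc.).
Each area_* folder encapsulates the logic and code relevant to a specific business unit or use case, and is further divided into models and views_* folders.
The models folder of the semantic layer contains the model files, while the views_* folders contain view files or aggregated/derived views.

| 👉🏼 Note: more information about these files can be found in the next section.

(2) base_* folders.

The purpose of these folders is to separate out core logic or functionality, such as customised time-range analysis methods, as well as shared views that are used across multiple business areas and included in multiple model files of the semantic layer.
This organisation ensures that common business modelling logic is not scattered across areas but managed centrally, which keeps the whole semantic layer consistent.

By applying this template and focusing on a pattern-driven repository structure, I have seen data development speed up (for the whole team) and troubleshooting become faster and easier.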
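A folder convention like this can be checked automatically. The following is only a minimal sketch in Python, not part of the original blueprint: the function name `check_top_level`, the list of shared entries, and the lower snake_case rule are assumptions made for illustration.

```python
import re

# Assumed convention, read off the blueprint above: top-level folders start
# with area_ or base_, next to a handful of shared folders and files.
ALLOWED_PREFIXES = ("area_", "base_")
SHARED_ENTRIES = {"data_tests", "documentation", "locales", "manifest.lkml", "README.md"}

# Assumption for this sketch: folder names are lower snake_case.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def check_top_level(entries):
    """Return (entry, reason) pairs for entries that break the convention."""
    problems = []
    for name in entries:
        if name in SHARED_ENTRIES:
            continue
        if not name.startswith(ALLOWED_PREFIXES):
            problems.append((name, "missing area_ or base_ prefix"))
        elif not NAME_PATTERN.match(name):
            problems.append((name, "not lower snake_case"))
    return problems


if __name__ == "__main__":
    sample = ["area_finances", "base_pop_logic", "Marketing", "area_Demand forecast", "README.md"]
    for entry, reason in check_top_level(sample):
        print(f"{entry}: {reason}")
```

In a real repository the list of entries could come from `os.listdir()` on the project root, and the check could run as part of a pull request pipeline.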

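Since the repository folders themselves contain different files, keeping the code tidy is just as important.

This brings me to my second "Joker": creating clean code files.

Joker #2: Organised code 👩🏻‍💻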

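In most semantic layers, the data project repository contains a few common file types.

For example, in the Looker semantic layer, the 3 main file types are model, view and manifest [2].

| 👉🏼 Note: there are more file types in Looker, but I will focus on the three main ones listed above.

Each of the listed files has its own code logic, and that logic can be organised.

Once again, let me give you a visual template of how I organise the code within specific files in Looker:

(1) Model file – contains information about the views (tables) and how they are joined together in an Explore [2].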

    ###########
    # MODEL_NAME & METADATA
    # Description: This model reflects the [business context].
    # Author: [Your Name] | Created on: [Date]
    # Contributors: [Your Name] | Last change on: [Date]
    ###########

    #########
    # 1. DISPLAY & CONNECTION PARAMETERS
    #########

    ## 1.1 Label and Connection Setup
    connection: "your_connection_name"
    label: "Your Model Label"

    #########
    # 2. STRUCTURAL PARAMETERS
    #########

    ## 2.1 Include Statements for Views
    include: "../views/*.view"
    include: "../views_aggregated/*.view"
    include: "../views_derived/*.view"

    ## 2.2 Additional Include Statements (E.g. Logistics and Base Logic Views | Optional)
    include: "/area_logistics/views/*.view"
    include: "/base/*.view"
    include: "/base_granularity/*.view"

    ## 2.3 Include Statements for Data Tests
    include: "../../data_tests/*.lkml"
    include: "../../data_tests/views/*.view"

    #########
    # 3. ACCESS CONTROL (Optional)
    #########

    ## 3.1 Define Access Grants
    access_grant: access_grant_name {
      user_attribute: user_attribute_name
      allowed_values: ["value_1", "value_2"]
    }

    #########
    # 4. EXPLORES
    #########

    ## 4.1 Main Explore for Your Data Model
    explore: explore_name {
      label: "Explore Label"
      view_name: view_name
      persist_for: "N hours"

      ## 4.1.1 SQL Filters and Conditions (Optional)
      sql_always_where:
        @{sql_always_where_condition_1}
        AND {% if some_field._in_query %} ${some_field} IS NOT NULL {% else %} 1=1 {% endif %}
        AND @{sql_always_where_condition_2} ;;

      ## 4.1.2 Joins for Additional Views
      join: another_view_name {
        type: left_outer
        relationship: one_to_many
        sql_on: ${view_name.field_name} = ${another_view_name.field_name} ;;
      }

      join: another_join_view {
        type: inner
        relationship: one_to_one
        sql_on: ${view_name.field_name} = ${another_join_view.field_name} ;;
      }
      # Add more joins to create data model
    }

    #########
    # 5. DATA TESTING EXPLORE
    #########

    ## 5.1 Define Explore for Data Testing (Optional)
    explore: data_testing {
      label: "Data Testing Explore"
      view_name: view_name
      hidden: yes

      join: another_test_view {
        type: left_outer
        relationship: one_to_one
        sql_on: ${view_name.field_name} = ${another_test_view.field_name} ;;
      }
      # Add more testing joins as required
    }

    #########
    # 6. MAP LAYER (Optional)
    #########

    ## 6.1 Define Map Layer for Geographic Data (Optional)
    map_layer: map_name {
      file: "../files/your_map_file.json"
      property_key: "your_property_key"
      label: "Map Label"
      format: topojson
      max_zoom_level: 15
      min_zoom_level: 2
    }

(2) View file – contains the dimensions and measures accessed from a specific database table (or from multiple joined tables) [2].

    ###########
    # VIEW_NAME & METADATA
    # Description: This view reflects the [business context] or [data source] it represents.
    # Author: [Your Name] | Created on: [Date]
    # Contributors: [Your Name] | Last change on: [Date]
    ###########

    view: view_name {
      sql_table_name: `project.dataset.table_name` ;;

      #########
      # 1. DISPLAY PARAMETERS
      #########

      ## 1.1 Label for View
      # Specifies how the view name will appear in the field picker
      label: "Your View Display Name"

      ## 1.2 Fields Hidden by Default
      # When set to yes, hides all fields in the view by default
      fields_hidden_by_default: yes

      #########
      # 2. STRUCTURAL & FILTER PARAMETERS (Optional)
      #########

      ## 2.1 Include Files
      # Includes additional files or views to be part of this view
      # Note: LookML expects include at file level, so place this line above the view block when used
      include: "filename_or_pattern"

      ## 2.2 Extends View
      # Specifies views that this view will extend
      extends: [another_view_name]

      ## 2.3 Drill Fields
      # Specifies the default list of fields shown when drilling into measures
      drill_fields: [dimension_name, another_dimension]

      ## 2.4 Default Filters for Common Queries
      filter: default_date_filter {
        label: "Date Filter"
        type: date
        sql: ${order_date} ;;
        description: "Filter data based on order date."
      }

      ## 2.5 Suggestions for Dimensions
      # Enables or disables suggestions for all dimensions in this view
      suggestions: yes

      ## 2.6 Set of Fields
      # Defines a reusable set of dimensions and measures
      set: set_name {
        fields: [dimension_name, measure_name]
      }

      #########
      # 3. DIMENSIONS
      #########

      ## 3.1 Simple Dimensions (Directly from DB)
      dimension: dimension_name {
        label: "Dimension Display Name"
        type: string
        sql: ${TABLE}.column_name ;;
        description: "This dimension represents [business context] and contains values like [example]."
      }

      dimension: another_dimension {
        label: "Another Dimension Display Name"
        type: number
        sql: ${TABLE}.other_column ;;
        description: "Explanation of the dimension, including business context and possible values."
      }

      ## 3.2 Compound Dimensions (Concatenated from Existing Dimensions)
      dimension: compound_dimension {
        label: "Compound Dimension Name"
        type: string
        sql: CONCAT(${dimension_name}, "-", ${another_dimension}) ;;
        description: "A compound dimension created by concatenating [dimension_name] and [another_dimension]."
      }

      ## 3.3 Derived Dimensions (Filtered/Grouped Values from Existing Dimensions)
      dimension: filtered_dimension {
        label: "Filtered Dimension Name"
        type: string
        sql: CASE WHEN ${dimension_name} = 'specific_value' THEN 'Subset Value' ELSE 'Other' END ;;
        description: "This dimension subsets values from [dimension_name] based on specific business rules."
      }

      ## 3.4 Tiered Dimension (Grouped by Tiers)
      dimension: order_amount_tier {
        label: "Order Amount Tier [€]"
        type: tier
        tiers: [50, 100, 150]
        style: integer
        sql: ${revenue_column} ;;
        description: "This dimension creates tiers of order amounts based on thresholds (50, 100, 150)."
      }

      #########
      # 4. MEASURES
      #########

      ## 4.1 Simple Aggregated Measures (Sum, Count, Average)
      measure: total_revenue {
        group_label: "KPIs"
        label: "Total Revenue [€]"
        type: sum
        sql: ${revenue_column} ;;
        value_format_name: currency_format
        description: "Total revenue, summing up all revenue from each record."
      }

      ## 4.2 Calculated Measures (Derived from Existing Measures)
      measure: profit_margin {
        group_label: "KPIs"
        label: "Profit Margin [%]"
        type: number
        sql: (${total_revenue} - ${cost_column}) / NULLIF(${total_revenue}, 0) ;;
        value_format_name: percent_2
        description: "Calculated profit margin as (Revenue - Cost) / Revenue."
      }
    }
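Every dimension and measure in the view template carries a description. One way to keep that habit enforceable is a small text check, sketched below in Python. It is an illustration only: the function name `fields_missing_description` is made up here, the scan is a naive brace counter rather than a real LookML parser, and it assumes that braces inside quoted strings are balanced.

```python
import re

# Matches the opening of a dimension or measure block, e.g. "dimension: name {".
FIELD_START = re.compile(r"\b(dimension|measure):\s*(\w+)\s*\{")


def fields_missing_description(lookml):
    """Return names of dimensions and measures whose block has no description."""
    missing = []
    for match in FIELD_START.finditer(lookml):
        depth = 1
        pos = match.end()
        # Walk forward until the brace opened by the field block is closed.
        # ${TABLE} and similar references open and close a brace in between.
        while pos < len(lookml) and depth > 0:
            if lookml[pos] == "{":
                depth += 1
            elif lookml[pos] == "}":
                depth -= 1
            pos += 1
        body = lookml[match.end():pos]
        if not re.search(r"\bdescription\s*:", body):
            missing.append(match.group(2))
    return missing


if __name__ == "__main__":
    sample_view = """
    view: orders {
      dimension: status {
        sql: ${TABLE}.status ;;
        description: "Order status."
      }
      measure: total {
        type: sum
        sql: ${TABLE}.amount ;;
      }
    }
    """
    print(fields_missing_description(sample_view))
```

For anything beyond a quick check, a proper LookML parser would be the safer choice, since this sketch ignores comments and string contents.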

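(3) Manifest file – a configuration file that contains project constants, code for importing files from another project (or multiple projects), code for localisation settings, and is used to add extensions or custom visualisations [2].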

    ###########
    # MANIFEST_INFO & METADATA
    # Description: This file reflects the [business context].
    # Author: [Your Name] | Created on: [Date]
    # Contributors: [Your Name] | Last change on: [Date]
    ###########

    #########
    # 1. STRUCTURAL PARAMETERS
    #########

    ## 1.1 Project Name & LookML Runtime
    project_name: "Current Project Name"
    new_lookml_runtime: yes

    ## 1.2 Local Dependency
    local_dependency: {
      project: "project_name"
      override_constant: constant_name {
        value: "string value"
      }
    }
    # Add additional local dependencies as needed.

    ## 1.3 Remote Dependency (Optional)
    remote_dependency: remote_project_name {
      url: "remote_project_url"
      ref: "remote_project_ref"
      override_constant: constant_name {
        value: "string value"
      }
    }
    # Add additional remote dependencies as needed.

    ## 1.4 Constants (Optional, but useful)
    constant: constant_name {
      value: "string value"
      export: none | override_optional | override_required
    }

    #########
    # 2. LOCALIZATION PARAMETERS
    #########

    ## 2.1 Localization Settings
    localization_settings: {
      localization_level: strict | permissive
      default_locale: locale_name
    }

    #########
    # 3. EXTENSION FRAMEWORK PARAMETERS (Optional)
    #########

    ## 3.1 Application Definitions
    application: application_name {
      label: "Application Label"
      url: "application_url"
      file: "application_file_path"

      ## 3.1.1 Mount Points
      mount_points: {
        # Define mount points here (refer to the application page for more details)
      }

      ## 3.1.2 Entitlements
      entitlements: {
        # Define entitlements here (refer to the application page for more details)
      }
    }
    # Add additional application declarations as required.

    #########
    # 4. CUSTOM VISUALIZATION PARAMETERS (Optional)
    #########

    ## 4.1 Visualization Definition
    visualization: {
      id: "unique-id"
      label: "Visualization Label"
      url: "visualization_url"
      sri_hash: "SRI hash"
      dependencies: ["dependency_url_1", "dependency_url_2"]
      file: "visualization_file_path"
    }
    # Add additional visualizations as needed.
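The model template references constants with the `@{constant_name}` syntax, and the manifest is where such constants are declared. A rough Python sketch of a cross-check between the two follows. The function name `undeclared_constants` is hypothetical, the matching is regex-based rather than a real LookML parse, and constants supplied by imported projects are not considered.

```python
import re

# "constant: name {" declarations in the manifest. The word boundary keeps
# "override_constant:" from being counted as a declaration.
CONSTANT_DECL = re.compile(r"\bconstant:\s*(\w+)\s*\{")

# "@{name}" references in model or view files.
CONSTANT_REF = re.compile(r"@\{(\w+)\}")


def undeclared_constants(manifest_text, lookml_text):
    """Return referenced constant names that the manifest does not declare."""
    declared = set(CONSTANT_DECL.findall(manifest_text))
    referenced = set(CONSTANT_REF.findall(lookml_text))
    return sorted(referenced - declared)


if __name__ == "__main__":
    manifest = 'constant: sql_always_where_condition_1 { value: "1=1" }'
    model = (
        "sql_always_where: @{sql_always_where_condition_1} "
        "AND @{sql_always_where_condition_2} ;;"
    )
    print(undeclared_constants(manifest, model))
```

Run against the manifest and each model file, a check like this would flag a reference such as `sql_always_where_condition_2` when it has no matching `constant:` block.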