Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings

news 2026/7/4 14:29:18

Authors: Suhyung Jang, Ghang Lee, Jaekun Lee, Hyunjun Lee

Deep-Dive Summary:

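Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings

Authors: Suhyung Jang, Ghang Lee, Jaekun Lee, Hyunjun Lee
Affiliations: Department of Architectural Engineering, Yonsei University (Korea); Institute for Advanced Study, Technical University of Munich (Germany)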

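1. Introduction

The effective application of AI in the AECO industry depends on accurately representing building project information in a machine-readable format. Previous studies have focused mostly on data formats (e.g., photos, point clouds, BIM graphs) but have often overlooked the choice of encoding method, defaulting to one-hot or label encoding. Although LLM embeddings excel at capturing domain-specific context, their potential as encodings for AI model training remains underexplored.

This study proposes using LLM embeddings as encodings (“LLM encodings”) and validates the approach with GraphSAGE models on a building object subtype classification task. The experiments compare one-hot encoding with embeddings generated by OpenAI's ‘text-embedding-3’ series and Meta's ‘llama3’, and examine how dimensionality reduction affects semantic preservation.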
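
To make the motivation concrete, the sketch below contrasts the two encoding styles on three subtype labels. The labels and the 3-dimensional "embeddings" are made up for illustration and are not taken from the paper; real LLM embeddings have thousands of dimensions. The point it tries to show is that any two distinct one-hot vectors have cosine similarity 0, so one-hot encoding cannot express that two door subtypes are closer to each other than to a curtain wall.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def one_hot(index, size):
    """One-hot vector with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def pairwise(codes):
    """Cosine similarity for every unordered pair of labels."""
    names = list(codes)
    return {
        (a, b): cosine(codes[a], codes[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical subtype labels, for illustration only.
labels = ["interior door", "exterior door", "curtain wall"]
one_hot_codes = {name: one_hot(i, len(labels)) for i, name in enumerate(labels)}

# Made-up toy vectors standing in for LLM embeddings (assumption).
toy_embeddings = {
    "interior door": [0.9, 0.4, 0.1],
    "exterior door": [0.8, 0.5, 0.2],
    "curtain wall": [0.1, 0.3, 0.9],
}

one_hot_sims = pairwise(one_hot_codes)
embedding_sims = pairwise(toy_embeddings)

for pair in one_hot_sims:
    print(pair, one_hot_sims[pair], round(embedding_sims[pair], 3))
```

With the toy vectors, the two door subtypes end up far more similar to each other than either is to the curtain wall, while the one-hot similarities are all zero.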

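3. LLM Encodings and the Matryoshka Representation Model

3.1 LLM Encodings

Using LLM encodings in neural network training requires changing how the loss is computed. Because LLM embeddings live in a high-dimensional space, a conventional sigmoid function would dilute their semantic features. This study sets the dimension of the network's final layer to match that of the target LLM embedding and uses the **cosine embedding loss** to measure the difference between the output embedding $\mathbf{e}_p$ and the target embedding $\mathbf{e}_t$: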

$$L(\mathbf{e}_p,\mathbf{e}_t) = 1 - \frac{\mathbf{e}_p\cdot\mathbf{e}_t}{\|\mathbf{e}_p\|\|\mathbf{e}_t\|} \quad (1)$$
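
A minimal, dependency-free sketch of Equation (1) follows. The function `nearest_label` is an assumption added for illustration: the summary does not say how a predicted embedding is turned back into a class, and picking the target embedding with the highest cosine similarity is only one plausible decoding step. The label names and 2-dimensional vectors are likewise hypothetical. In PyTorch, `torch.nn.CosineEmbeddingLoss` computes the same quantity for pairs whose target label is 1.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_embedding_loss(e_p, e_t):
    """Equation (1): 1 minus the cosine similarity of output and target."""
    return 1.0 - cosine_similarity(e_p, e_t)


def nearest_label(e_p, targets):
    """Assumed decoding step: label whose target embedding is most similar."""
    return max(targets, key=lambda name: cosine_similarity(e_p, targets[name]))


# Hypothetical 2-dimensional target embeddings, for illustration only.
targets = {"door": [1.0, 0.0], "wall": [0.0, 1.0]}

same_direction = cosine_embedding_loss([2.0, 0.0], targets["door"])   # scaled copy
orthogonal = cosine_embedding_loss([0.0, 1.0], targets["door"])       # unrelated
opposite = cosine_embedding_loss([-1.0, 0.0], targets["door"])        # reversed
predicted = nearest_label([0.9, 0.2], targets)

print(same_direction, orthogonal, opposite, predicted)
```

The loss depends only on direction, not magnitude: a scaled copy of the target gives 0, an orthogonal vector gives 1, and an opposite vector gives 2, so the loss ranges over [0, 2].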

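3.2 Matryoshka Representation Model

To address the computational cost of high-dimensional embeddings, the study adopts the Matryoshka representation model, which projects high-dimensional embeddings into a lower-dimensional space (e.g., 1,024 dimensions) while retaining key semantic features.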
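
The summary does not describe the projection itself. As a rough illustration only, embeddings trained in the Matryoshka style are commonly shortened by keeping the first k components and rescaling the result to unit length; the sketch below shows that operation. Whether the paper compacts its embeddings this way, particularly the llama-3 ones, is an assumption, and the helper name `truncate_and_normalize` is hypothetical.

```python
import math


def truncate_and_normalize(embedding, k):
    """Keep the first k components and rescale them to unit L2 norm.

    This mirrors the usual way Matryoshka-style embeddings are shortened;
    it is an illustrative assumption, not the paper's documented procedure.
    """
    if not 0 < k <= len(embedding):
        raise ValueError("k must be between 1 and the embedding length")
    head = embedding[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


# Toy 3-dimensional vector standing in for a 3,072-dimensional embedding.
full = [3.0, 4.0, 12.0]
compact = truncate_and_normalize(full, 2)
compact_norm = math.sqrt(sum(x * x for x in compact))

print(compact, compact_norm)
```

Renormalizing matters less for the cosine loss above, which ignores magnitude, but it keeps compacted vectors comparable if they are later used with dot products.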

5. Results

As shown in Table 2, the weighted average F1-score rises as the performance of the LLM embedding improves.

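Table 2. Weighted average F1-scores by encoding type

| Encoding type | Dimensions | Weighted average F1-score |
| --- | --- | --- |
| One-hot encoding | 42 | 0.8475 |
| One-hot encoding | 1,024 | 0.8705 |
| text-embedding-3-small | 1,536 (original) | 0.8498 |
| text-embedding-3-small | 1,024 | 0.8655 |
| text-embedding-3-large | 3,072 (original) | 0.8529 |
| text-embedding-3-large | 1,024 (compacted) | 0.8766 |
| llama-3 | 4,096 (original) | 0.8714 |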
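
For readers unfamiliar with the metric, the sketch below computes a weighted average F1-score by hand: each class's F1 is weighted by its share of the true labels. The label lists are toy data, not the paper's, and the function name `weighted_f1` is chosen here for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += support[label] / total * f1
    return score


# Toy labels: class "a" has three true samples, class "b" has one.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
score = weighted_f1(y_true, y_pred)

print(round(score, 4))
```

Here class "a" has F1 = 0.8 and class "b" has F1 = 2/3, so the weighted average is 0.75 × 0.8 + 0.25 × 2/3 = 23/30. Weighting by support means frequent subtypes dominate the score, which is worth keeping in mind when the 42 subtypes are unevenly represented.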

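5.1 One-hot vs. LLM Encodings

Statistical analysis shows that the compacted ‘text-embedding-3-large’ encoding significantly outperforms one-hot encoding ($p = 0.006596$). This suggests that compaction may remove noise while retaining key semantic cues.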

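5.2 Comparison among LLM Encodings

In compacted form, the LLM encodings differ significantly from one another (Table 6). ‘llama-3 (compacted)’ performs strongly, benefiting from its large training corpus and parameter count. The results also show that compacted embeddings tend to outperform the original high-dimensional ones, possibly because the AI model used here (GraphSAGE) is not large enough to capture the full semantics of very high-dimensional embeddings.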

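7. Conclusion

By introducing LLM embeddings as encodings, this study addresses the preservation and enrichment of building semantics in AI models. The experiments show that encodings such as “llama-3 (compacted)” significantly outperform one-hot encoding in BIM object classification. The framework offers practitioners and researchers in the AECO field a practical way to improve the accuracy and semantic fidelity of AI models.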

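Acknowledgments

This project was funded by the Korean Ministry of Land, Infrastructure and Transport (KAIA) and by the Hans Fischer Senior Fellowship program of the Technical University of Munich Institute for Advanced Study (TUM-IAS), Germany.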

Original Abstract: Accurate representation of building semantics, encompassing both generic object types and specific subtypes, is essential for effective AI model training in the architecture, engineering, construction, and operation (AECO) industry. Conventional encoding methods (e.g., one-hot) often fail to convey the nuanced relationships among closely related subtypes, limiting AI’s semantic comprehension. To address this limitation, this study proposes a novel training approach that employs large language model (LLM) embeddings (e.g., OpenAI GPT and Meta LLaMA) as encodings to preserve finer distinctions in building semantics. We evaluated the proposed method by training GraphSAGE models to classify 42 building object subtypes across five high-rise residential building information models (BIMs). Various embedding dimensions were tested, including original high-dimensional LLM embeddings (1,536, 3,072, or 4,096) and 1,024-dimensional compacted embeddings generated via the Matryoshka representation model. Experimental results demonstrated that LLM encodings outperformed the conventional one-hot baseline, with the llama-3 (compacted) embedding achieving a weighted average F1-score of 0.8766, compared to 0.8475 for one-hot encoding. The results underscore the promise of leveraging LLM-based encodings to enhance AI’s ability to interpret complex, domain-specific building semantics. As the capabilities of LLMs and dimensionality reduction techniques continue to evolve, this approach holds considerable potential for broad application in semantic elaboration tasks throughout the AECO industry.

PDF Link: 2602.15791v1