
Git-RSCLIP Visualization Tutorial: Using t-SNE to Show How Remote Sensing Image and Text Embeddings Are Distributed

news 2026/3/27 0:33:08


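1. Introduction: Why visualize the embedding space?

When you use Git-RSCLIP for remote sensing image classification or retrieval, have you ever wondered how the model actually understands images and text? Internally, the model converts every image and every piece of text into a high-dimensional vector (usually called an embedding). These vectors act like numeric fingerprints that capture the semantic features of the image or text.

High-dimensional vectors, however, cannot be inspected directly by a human. t-SNE works like a translator: it projects high-dimensional data into two or three dimensions so that we can see how the model organizes this semantic information.

In this tutorial you will learn how to use t-SNE to visualize the distribution of the Git-RSCLIP embedding space and see at a glance:

How remote sensing images of similar types cluster in the embedding space
How text descriptions relate to their matching images
How well the model separates different land-cover classes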

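2. Environment setup and quick deployment

2.1 Install the required dependencies

First make sure the Git-RSCLIP image is running, then install the extra libraries needed for visualization:

    pip install matplotlib scikit-learn plotly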

2.2 Prepare sample data

We will use a few typical remote sensing images and text descriptions as examples:

    import numpy as np
    from PIL import Image
    import requests
    from io import BytesIO

    # Sample image URLs (placeholders; replace them with your own image paths)
    image_urls = [
        "https://example.com/river.jpg",
        "https://example.com/forest.jpg",
        "https://example.com/farmland.jpg",
        "https://example.com/city.jpg",
        "https://example.com/airport.jpg"
    ]

    # Matching text descriptions, in the same order as the images
    text_descriptions = [
        "a remote sensing image of river",
        "a remote sensing image of forest",
        "a remote sensing image of farmland",
        "a remote sensing image of urban buildings",
        "a remote sensing image of airport runway"
    ]

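3. Extracting the embeddings

3.1 Extract image embeddings

    import torch
    from models.model import create_model_and_transforms

    # Load the pretrained model
    model, preprocess = create_model_and_transforms('git_large_rsclip')

    # Use the GPU when one is available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    model.eval()  # inference mode: turns off dropout and other training-only behavior

    def get_image_embeddings(image_paths):
        """Extract embedding vectors for a batch of images."""
        image_embeddings = []
        for img_path in image_paths:
            # Load the image from a URL or from a local path
            if img_path.startswith('http'):
                response = requests.get(img_path, timeout=30)
                response.raise_for_status()  # fail early on a bad download
                image = Image.open(BytesIO(response.content))
            else:
                image = Image.open(img_path)
            # Force three channels, preprocess, and add a batch dimension
            image = preprocess(image.convert('RGB')).unsqueeze(0).to(device)

            # Extract features without tracking gradients
            with torch.no_grad():
                image_features = model.encode_image(image)
            image_embeddings.append(image_features.cpu().numpy())
        return np.vstack(image_embeddings)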

3.2 Extract text embeddings

    def get_text_embeddings(texts):
        """Extract embedding vectors for a list of text descriptions."""
        text_embeddings = []
        for text in texts:
            # Tokenize the text and move the tokens to the model's device
            text_inputs = model.tokenize([text]).to(device)

            with torch.no_grad():
                text_features = model.encode_text(text_inputs)
            text_embeddings.append(text_features.cpu().numpy())
        return np.vstack(text_embeddings)
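
Models in the CLIP family are usually compared by cosine similarity, and the code above does not say whether `encode_image` and `encode_text` already return unit-length vectors. As a precaution, the embeddings can be L2-normalized before any distance-based analysis. The sketch below is an illustration only: `l2_normalize` is a hypothetical helper, and the toy array stands in for real encoder output.

```python
import numpy as np

def l2_normalize(embeddings, eps=1e-12):
    """Scale each row to unit length so that dot products equal cosine similarity."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # eps guards against division by zero for an all-zero row
    return embeddings / np.maximum(norms, eps)

# Toy stand-ins for encoder outputs (not real Git-RSCLIP features)
toy = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(toy)
```

Normalizing vectors that are already unit length leaves them unchanged, so applying the helper to both `image_embs` and `text_embs` should be harmless either way.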

4. t-SNE dimensionality reduction and visualization

4.1 Prepare the data for reduction

    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    # Compute the embeddings
    image_embs = get_image_embeddings(image_urls)
    text_embs = get_text_embeddings(text_descriptions)

    # Stack all embeddings: the five images first, then the five texts
    all_embeddings = np.vstack([image_embs, text_embs])
    labels = ['river_img', 'forest_img', 'farmland_img', 'city_img', 'airport_img',
              'river_txt', 'forest_txt', 'farmland_txt', 'city_txt', 'airport_txt']

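4.2 Run t-SNE

    # Initialize t-SNE; perplexity must be smaller than the number of samples (10 here)
    tsne = TSNE(n_components=2, random_state=42, perplexity=3)

    # Reduce to two dimensions
    embeddings_2d = tsne.fit_transform(all_embeddings)

    # Split the coordinates back into images and texts
    image_coords = embeddings_2d[:5]
    text_coords = embeddings_2d[5:]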
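
scikit-learn rejects a perplexity that is not smaller than the number of samples, so a hard-coded value stops working as soon as the dataset size changes. The sketch below derives the perplexity from the sample count. The `(n - 1) / 3` rule and the cap of 30 are heuristic assumptions rather than values from this tutorial, and `safe_perplexity` and `run_tsne` are hypothetical helper names. The cosine metric is also an assumption, chosen because CLIP-style embeddings are usually compared by angle.

```python
import numpy as np

def safe_perplexity(n_samples, target=30.0):
    """Pick a perplexity strictly below n_samples, capped at `target`."""
    if n_samples < 3:
        raise ValueError("need at least 3 samples for a meaningful t-SNE layout")
    return float(min(target, max(1.0, (n_samples - 1) / 3.0)))

def run_tsne(embeddings, seed=42):
    """Project embeddings to 2-D with a perplexity derived from the sample count."""
    from sklearn.manifold import TSNE  # imported lazily; requires scikit-learn
    embeddings = np.asarray(embeddings)
    tsne = TSNE(
        n_components=2,
        random_state=seed,
        perplexity=safe_perplexity(len(embeddings)),
        metric="cosine",  # assumption: compare embeddings by angle, not magnitude
    )
    return tsne.fit_transform(embeddings)
```

For the ten points in this tutorial the helper returns 3.0, matching the value used above; for a few thousand points it settles at the cap of 30.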

4.3 Visualize the result

    plt.figure(figsize=(12, 8))

    # Draw all image points in one call so the legend gets a single 'Images' entry
    plt.scatter(image_coords[:, 0], image_coords[:, 1],
                c='blue', s=200, alpha=0.7, marker='o', label='Images')

    # Draw all text points in one call so the legend gets a single 'Texts' entry
    plt.scatter(text_coords[:, 0], text_coords[:, 1],
                c='red', s=200, alpha=0.7, marker='s', label='Texts')

    # Annotate every point; labels follow the row order of embeddings_2d
    for label, (x, y) in zip(labels, embeddings_2d):
        plt.annotate(label, (x, y), xytext=(5, 5),
                     textcoords='offset points', fontsize=9)

    # Connect each image to its matching text with a dashed line
    for i in range(5):
        plt.plot([image_coords[i][0], text_coords[i][0]],
                 [image_coords[i][1], text_coords[i][1]],
                 'gray', linestyle='--', alpha=0.5)

    plt.title('Git-RSCLIP Embedding Space Visualization with t-SNE')
    plt.xlabel('t-SNE Dimension 1')
    plt.ylabel('t-SNE Dimension 2')
    plt.grid(True, alpha=0.3)
    plt.legend(loc='best')
    plt.tight_layout()
    plt.show()

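5. Analyzing the visualization

5.1 Reading the cluster patterns

The t-SNE plot lets you look for several important patterns. Keep in mind that t-SNE preserves local neighborhoods rather than global distances, so the plot is a qualitative guide, and with only ten points the layout can shift noticeably with the perplexity and the random seed.

Tight clustering: when images and text descriptions of the same class land close together, it suggests the model captures their semantic similarity.

Class separation: when different land-cover classes (such as water, vegetation, and man-made structures) form distinct clusters, it reflects the model's ability to tell those classes apart.

Image-text alignment: each image is joined to its matching text description by a dashed line. A short line suggests the pair is well matched.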
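
Because the 2-D layout can distort distances, it may be worth confirming image-text alignment in the original embedding space as well. The sketch below computes a cosine similarity matrix and a top-1 retrieval accuracy with NumPy. The function names are hypothetical, the toy arrays stand in for `image_embs` and `text_embs`, and the code assumes row `i` of each array belongs to the same pair.

```python
import numpy as np

def cosine_similarity_matrix(image_embs, text_embs):
    """Return sim[i, j] = cosine similarity between image i and text j."""
    a = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    b = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return a @ b.T

def top1_retrieval_accuracy(sim):
    """Fraction of images whose best-scoring text sits at the same index."""
    predicted = sim.argmax(axis=1)
    return float((predicted == np.arange(sim.shape[0])).mean())

# Toy stand-ins for real embeddings: three matched image-text pairs
toy_images = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]])
toy_texts = np.array([[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 1.0]])

sim = cosine_similarity_matrix(toy_images, toy_texts)
acc = top1_retrieval_accuracy(sim)
```

A diagonal that dominates each row of `sim` points to the same conclusion as short dashed lines in the plot, but without depending on the t-SNE projection.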

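5.2 Practical insights

This kind of visualization can help you:

Evaluate model performance: tightly grouped samples of the same class and clearly separated samples of different classes suggest the model is performing well.

Spot anomalous samples: a sample that sits far from its own cluster may be a misclassified case and deserves a closer look.

Refine prompts: comparing where different text descriptions land can help you find the most effective wording.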
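
Spotting anomalous samples by eye gets harder as the number of points grows. One possible complement, sketched below, is to flag samples whose cosine distance to their class centroid is unusually large in the original embedding space. The helper name, the z-score rule, and the threshold of 2.0 are assumptions made for illustration, and the toy data is synthetic.

```python
import numpy as np

def centroid_distance_outliers(embeddings, class_labels, z_threshold=2.0):
    """Return sorted indices of samples unusually far from their class centroid."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    class_labels = np.asarray(class_labels)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    outliers = []
    for cls in np.unique(class_labels):
        idx = np.flatnonzero(class_labels == cls)
        centroid = unit[idx].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        dist = 1.0 - unit[idx] @ centroid  # cosine distance to the centroid
        spread = dist.std()
        if spread == 0:
            continue  # every sample is equally far away; nothing to flag
        z = (dist - dist.mean()) / spread
        outliers.extend(idx[z > z_threshold].tolist())
    return sorted(outliers)

# Synthetic example: nine similar vectors and one that points elsewhere
toy_embeddings = np.array([[1.0, 0.0]] * 9 + [[0.0, 1.0]])
toy_classes = ["river"] * 10
flagged = centroid_distance_outliers(toy_embeddings, toy_classes)
```

Flagged indices are only candidates for review: a flagged image may be mislabeled, atypical for its class, or simply hard for the model.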

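6. Advanced tips and practical advice

6.1 Handling many data points

When you need to visualize a large number of samples, you can use a sampling strategy:

    def visualize_large_dataset(embeddings, labels, sample_size=100):
        """Visualize a random sample of a large dataset."""
        # Never request more samples than exist; replace=False would raise otherwise
        sample_size = min(sample_size, len(embeddings))
        indices = np.random.choice(len(embeddings), sample_size, replace=False)
        sample_embeddings = embeddings[indices]
        sample_labels = [labels[i] for i in indices]

        # Run t-SNE (the default perplexity of 30 needs more than 30 samples)
        tsne = TSNE(n_components=2, random_state=42)
        coords_2d = tsne.fit_transform(sample_embeddings)

        # Visualization code...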
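
Uniform random sampling can leave rare classes with very few points, or none at all. A possible alternative is to sample up to a fixed number of points per class, as in the sketch below. The helper name `stratified_sample_indices` and the `per_class` parameter are hypothetical, and the code assumes one class label per embedding row.

```python
import numpy as np

def stratified_sample_indices(class_labels, per_class=20, seed=42):
    """Pick up to `per_class` random indices from every class."""
    rng = np.random.default_rng(seed)
    class_labels = np.asarray(class_labels)
    chosen = []
    for cls in np.unique(class_labels):
        idx = np.flatnonzero(class_labels == cls)
        take = min(per_class, len(idx))  # small classes are kept whole
        chosen.append(rng.choice(idx, size=take, replace=False))
    return np.sort(np.concatenate(chosen))

# Synthetic labels: one common class and one rare class
toy_labels = ["forest"] * 50 + ["airport"] * 3
picked = stratified_sample_indices(toy_labels, per_class=5)
```

The returned indices can be used in place of the random `indices` in a function like `visualize_large_dataset`, so that every class stays visible in the plot.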

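6.2 Interactive visualization

Use Plotly to create an interactive chart in which you can hover over a point to see its details:

    import plotly.express as px

    def create_interactive_plot(coords_2d, labels):
        """Show a scatter plot whose points reveal their label on hover."""
        fig = px.scatter(x=coords_2d[:, 0], y=coords_2d[:, 1],
                         hover_name=labels,
                         title='Interactive Embedding Visualization')
        fig.show()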
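
To tell images and texts apart in the interactive chart, the points can be arranged into a small table with a modality column before plotting. The sketch below assumes the labels end in `_img` or `_txt`, as in the example labels of this tutorial. `build_plot_table` and `save_interactive_plot` are hypothetical helpers, and the output file name is a placeholder.

```python
def build_plot_table(coords_2d, labels):
    """Arrange 2-D coordinates and labels into columns, adding a modality column."""
    table = {"x": [], "y": [], "label": [], "modality": []}
    for (x, y), label in zip(coords_2d, labels):
        table["x"].append(float(x))
        table["y"].append(float(y))
        table["label"].append(label)
        # Assumption: the label suffix encodes the modality
        if label.endswith("_img"):
            table["modality"].append("image")
        elif label.endswith("_txt"):
            table["modality"].append("text")
        else:
            table["modality"].append("unknown")
    return table

def save_interactive_plot(table, path="embedding_plot.html"):
    """Color points by modality and write the chart to a standalone HTML file."""
    import plotly.express as px  # imported lazily; requires plotly
    fig = px.scatter(table, x="x", y="y", color="modality", symbol="modality",
                     hover_name="label",
                     title="Interactive Embedding Visualization")
    fig.write_html(path)
    return fig

# Toy coordinates standing in for t-SNE output
toy_table = build_plot_table([(0.0, 1.0), (2.0, 3.0)], ["river_img", "river_txt"])
```

Writing the chart to an HTML file makes it easy to share without a running Python session.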