当前位置：首页 > news >正文

告别NIfTI恐惧症：手把手教你用Python和SimpleITK搞定BraTS 2018数据集预处理

news 2026/5/13 5:13:37

告别NIfTI恐惧症：手把手教你用Python和SimpleITK搞定BraTS 2018数据集预处理

第一次接触医学影像数据的研究者，面对.nii.gz格式文件时往往会感到无从下手。这种专为神经影像学设计的NIfTI格式，虽然能高效存储三维体数据，但对计算机视觉或机器学习背景的研究者来说却像一堵无形的墙。本文将用最直白的语言和可复现的代码，带你快速掌握BraTS数据集的处理技巧。

医学影像处理与传统图像分析最大的区别在于三维体数据的概念。我们熟悉的JPG、PNG是二维矩阵，而MRI扫描结果是由多层二维切片堆叠形成的三维立方体。BraTS 2018数据集中的每个病例包含四种模态（t1、t2、flair、t1ce），每种模态都是一个155×240×240的三维数组，相当于155张240×240的切片图像。

1. 环境准备与数据理解

1.1 必备工具安装

处理NIfTI文件需要以下Python包，建议使用conda创建虚拟环境：

conda create -n brats python=3.8 conda activate brats pip install simpleitk numpy matplotlib nibabel scipy

SimpleITK是处理医学影像的瑞士军刀，相比传统的nibabel库，它的API更加简洁直观。我们额外安装nibabel是为了后续对比两种读取方式的差异。

1.2 数据集目录结构

下载解压后的BraTS 2018训练集通常呈现这样的结构：

MICCAI_BraTS_2018_Data_Training/ ├── HGG/ │ ├── Brats18_2013_10_1/ │ │ ├── Brats18_2013_10_1_flair.nii.gz │ │ ├── Brats18_2013_10_1_t1.nii.gz │ │ ├── Brats18_2013_10_1_t1ce.nii.gz │ │ ├── Brats18_2013_10_1_t2.nii.gz │ │ └── Brats18_2013_10_1_seg.nii.gz │ └── ...其他病例 └── LGG/ └── ...类似HGG的结构

四种模态图像的区别：

T1：解剖结构清晰，适合观察脑组织形态
T2：对水肿区域敏感
FLAIR：抑制脑脊液信号，突出病变区域
T1ce：对比增强后的T1，显示血脑屏障破坏区域

2. 数据加载与基础探查

2.1 批量获取文件路径

使用glob模块可以快速收集所有模态文件路径：

import glob def get_brats_paths(root_dir): cases = glob.glob(f"{root_dir}/*/*") paths = [] for case in cases: paths.append({ 't1': glob.glob(f"{case}/*t1.nii.gz")[0], 't1ce': glob.glob(f"{case}/*t1ce.nii.gz")[0], 't2': glob.glob(f"{case}/*t2.nii.gz")[0], 'flair': glob.glob(f"{case}/*flair.nii.gz")[0], 'seg': glob.glob(f"{case}/*seg.nii.gz")[0] }) return paths # 示例使用 data_paths = get_brats_paths("MICCAI_BraTS_2018_Data_Training/HGG") print(f"共加载{len(data_paths)}个病例")

2.2 使用SimpleITK读取NIfTI文件

SimpleITK的读取接口非常直观：

import SimpleITK as sitk def load_nii_as_array(filepath): """ 加载.nii.gz文件并返回numpy数组 返回：形状为(depth, height, width)的3D数组 """ img = sitk.ReadImage(filepath) arr = sitk.GetArrayFromImage(img) print(f"加载 {filepath.split('/')[-1]} 形状: {arr.shape}") return arr # 加载第一个病例的T1图像 t1_arr = load_nii_as_array(data_paths[0]['t1'])

注意：SimpleITK读取后的数组维度顺序是(z,y,x)，即(155,240,240)。这与某些库的(x,y,z)顺序不同，操作时需特别注意。

3. 三维可视化与切片探索

3.1 交互式切片查看器

医学影像分析的关键是理解三维结构。这个简易查看器可以帮助你浏览不同切面：

import numpy as np import matplotlib.pyplot as plt from ipywidgets import interact def explore_3d(arr): def show_slice(z=80, y=120, x=120): fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15,5)) ax1.imshow(arr[z,:,:], cmap='gray') ax1.set_title(f'Axial Slice {z}') ax2.imshow(arr[:,y,:], cmap='gray') ax2.set_title(f'Coronal Slice {y}') ax3.imshow(arr[:,:,x], cmap='gray') ax3.set_title(f'Sagittal Slice {x}') plt.show() return interact(show_slice, z=(0, arr.shape[0]-1), y=(0, arr.shape[1]-1), x=(0, arr.shape[2]-1)) # 查看T1图像 explore_3d(t1_arr)

3.2 多模态对比可视化

同时显示四种模态的中线切片，方便比较差异：

def plot_modalities(case_path): fig, axes = plt.subplots(2, 2, figsize=(10,10)) modalities = ['t1', 't1ce', 't2', 'flair'] titles = ['T1', 'T1 Contrast Enhanced', 'T2', 'FLAIR'] for ax, mod, title in zip(axes.flat, modalities, titles): arr = load_nii_as_array(case_path[mod]) ax.imshow(arr[arr.shape[0]//2], cmap='gray') ax.set_title(title) ax.axis('off') plt.tight_layout() plt.show() plot_modalities(data_paths[0])

4. 高级预处理流程

4.1 标准化与重采样

医学影像预处理通常包含以下关键步骤：

强度标准化：消除扫描仪差异
重采样：统一体素间距
脑部提取：去除颅骨等无关组织
配准：对齐不同模态

以下是整合这些步骤的完整预处理函数：

from scipy.ndimage import zoom def preprocess_volume(arr, target_shape=(80, 96, 64)): """ 完整预处理流程： 1. Z-score标准化 2. 重采样到目标形状 3. 裁剪异常值 """ # 强度标准化 arr = (arr - np.mean(arr)) / np.std(arr) # 计算重采样比例 scale_factors = [ target_shape[0]/arr.shape[0], target_shape[1]/arr.shape[1], target_shape[2]/arr.shape[2] ] # 三次样条插值重采样 arr = zoom(arr, scale_factors, order=3) # 裁剪到[-3,3]范围 arr = np.clip(arr, -3, 3) return arr.astype(np.float32)

4.2 标签处理技巧

BraTS的分割标签包含三类肿瘤组织，需要特殊处理：

标签值	肿瘤类型	对应颜色
1	坏死和非增强肿瘤核心(NCR/NET)	红色
2	瘤周水肿(ED)	绿色
4	增强肿瘤(ET)	蓝色

def process_segmentation(seg_arr, target_shape): # 创建三个独立通道的mask ncr = (seg_arr == 1).astype(np.uint8) # 坏死核心 ed = (seg_arr == 2).astype(np.uint8) # 水肿区域 et = (seg_arr == 4).astype(np.uint8) # 增强肿瘤 # 重采样每个通道 ncr = zoom(ncr, [ target_shape[0]/seg_arr.shape[0], target_shape[1]/seg_arr.shape[1], target_shape[2]/seg_arr.shape[2] ], order=0) # 最近邻插值保持二值特性 ed = zoom(ed, [ target_shape[0]/seg_arr.shape[0], target_shape[1]/seg_arr.shape[1], target_shape[2]/seg_arr.shape[2] ], order=0) et = zoom(et, [ target_shape[0]/seg_arr.shape[0], target_shape[1]/seg_arr.shape[1], target_shape[2]/seg_arr.shape[2] ], order=0) return np.stack([ncr, ed, et], axis=0) # 形状(3,d,h,w)

5. 构建高效数据管道

5.1 内存映射存储方案

处理大量3D数据时，内存可能成为瓶颈。使用HDF5格式可以高效存储和访问：

import h5py def create_hdf5_dataset(data_paths, output_file, target_shape=(4,80,96,64)): with h5py.File(output_file, 'w') as hf: # 初始化数据集 hf.create_dataset('images', shape=(len(data_paths), *target_shape), dtype=np.float32) hf.create_dataset('masks', shape=(len(data_paths), 3, *target_shape[1:]), dtype=np.uint8) # 进度条 from tqdm import tqdm for i, path in enumerate(tqdm(data_paths)): # 加载并预处理四种模态 modalities = [] for mod in ['t1', 't1ce', 't2', 'flair']: arr = load_nii_as_array(path[mod]) arr = preprocess_volume(arr, target_shape[1:]) modalities.append(arr) # 堆叠模态 (4,d,h,w) hf['images'][i] = np.stack(modalities, axis=0) # 处理分割标签 seg = load_nii_as_array(path['seg']) hf['masks'][i] = process_segmentation(seg, target_shape[1:])

5.2 数据增强策略

3D医学影像的特殊性要求定制的增强方法：

import random def augment_3d(image, mask): """ 应用3D空间增强 """ # 随机翻转 if random.random() > 0.5: image = np.flip(image, axis=1) mask = np.flip(mask, axis=1) if random.random() > 0.5: image = np.flip(image, axis=2) mask = np.flip(mask, axis=2) # 随机旋转(0,90,180,270度) k = random.randint(0, 3) image = np.rot90(image, k=k, axes=(1,2)) mask = np.rot90(mask, k=k, axes=(1,2)) # 随机亮度调整 image = image * (0.9 + 0.2*random.random()) return image, mask

6. 实战技巧与常见陷阱

处理BraTS数据集时，这些经验教训可能帮你节省数小时调试时间：

内存管理：同时加载多个3D体积会快速耗尽内存，建议使用生成器或HDF5
标签一致性：不同病例的标签值可能不同，确保统一处理为1/2/4
方向问题：某些.nii文件可能需要调整方向，使用sitk.DICOMOrient校正
模态对齐：虽然BraTS数据已配准，但自己收集的数据可能需要额外配准步骤

可视化检查预处理结果的代码片段：

def sanity_check(image, mask, slice_idx=40): """ 检查图像和mask的对齐情况 """ fig, axes = plt.subplots(1, 4, figsize=(20,5)) # 显示四种模态 for i in range(4): axes[i].imshow(image[i, slice_idx], cmap='gray') axes[i].set_title(f'Modality {i}') axes[i].axis('off') # 叠加显示mask plt.figure(figsize=(6,6)) plt.imshow(image[0, slice_idx], cmap='gray') # 用不同颜色显示三类肿瘤 plt.imshow(np.ma.masked_where(mask[0,slice_idx]==0, mask[0,slice_idx]), alpha=0.5, cmap='Reds') # NCR/NET plt.imshow(np.ma.masked_where(mask[1,slice_idx]==0, mask[1,slice_idx]), alpha=0.5, cmap='Greens') # ED plt.imshow(np.ma.masked_where(mask[2,slice_idx]==0, mask[2,slice_idx]), alpha=0.5, cmap='Blues') # ET plt.title('Segmentation Overlay') plt.axis('off') plt.show()

查看全文

http://www.jsqmd.com/news/806737/