当前位置: 首页 > news >正文

特征提取:从手工特征到深度学习

特征提取:从手工特征到深度学习

1. 技术分析

1.1 特征提取技术演进

特征提取经历了从手工设计到自动学习的演进:

特征提取技术路线 手工特征: SIFT/SURF/HOG 浅层学习: PCA/ICA 深度学习: CNN/Transformer

1.2 特征提取方法对比

方法类型特点效果适用场景
SIFT手工尺度不变图像检索
HOG手工梯度方向行人检测
CNN深度学习自动学习通用
ViTTransformer全局建模很高大规模

1.3 特征类型

特征类型 局部特征: SIFT、ORB 全局特征: 平均池化、CLIP 语义特征: BERT、ViT

2. 核心功能实现

2.1 手工特征提取

import cv2 import numpy as np from skimage.feature import hog from skimage import color class SIFTFeatureExtractor: def __init__(self): self.sift = cv2.SIFT_create() def extract(self, image): gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) keypoints, descriptors = self.sift.detectAndCompute(gray, None) if descriptors is not None: return descriptors.flatten()[:1024] else: return np.zeros(1024) def extract_batch(self, images): return [self.extract(img) for img in images] class HOGFeatureExtractor: def __init__(self, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(3, 3)): self.orientations = orientations self.pixels_per_cell = pixels_per_cell self.cells_per_block = cells_per_block def extract(self, image): gray = color.rgb2gray(image) features = hog( gray, orientations=self.orientations, pixels_per_cell=self.pixels_per_cell, cells_per_block=self.cells_per_block, block_norm='L2-Hys' ) return features def extract_batch(self, images): return [self.extract(img) for img in images] class ORBFeatureExtractor: def __init__(self, nfeatures=500): self.orb = cv2.ORB_create(nfeatures=nfeatures) def extract(self, image): gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) keypoints, descriptors = self.orb.detectAndCompute(gray, None) if descriptors is not None: return descriptors.flatten()[:2048] else: return np.zeros(2048)

2.2 深度学习特征提取

import torch import torch.nn as nn from torchvision import models class CNNFeatureExtractor(nn.Module): def __init__(self, model_name='resnet50', feature_dim=2048): super().__init__() self.model = getattr(models, model_name)(pretrained=True) self.model = nn.Sequential(*list(self.model.children())[:-1]) self.feature_dim = feature_dim def forward(self, x): features = self.model(x) features = features.view(-1, self.feature_dim) return features def extract(self, image): self.eval() image = torch.tensor(image).permute(2, 0, 1).unsqueeze(0).float() with torch.no_grad(): features = self.forward(image) return features.squeeze().numpy() class ViTFeatureExtractor(nn.Module): def __init__(self, model_name='vit_b_16', feature_dim=768): super().__init__() self.model = getattr(models, model_name)(pretrained=True) self.feature_dim = feature_dim def forward(self, x): features = self.model(x) return features def extract(self, image): self.eval() image = torch.tensor(image).permute(2, 0, 1).unsqueeze(0).float() with torch.no_grad(): features = self.forward(image) return features.squeeze().numpy() class CLIPFeatureExtractor: def __init__(self, model_name='ViT-B/32'): import clip self.device = "cuda" if torch.cuda.is_available() else "cpu" self.model, self.preprocess = clip.load(model_name, device=self.device) def extract_image(self, image): image = self.preprocess(image).unsqueeze(0).to(self.device) with torch.no_grad(): features = self.model.encode_image(image) return features.squeeze().cpu().numpy() def extract_text(self, text): text = clip.tokenize([text]).to(self.device) with torch.no_grad(): features = self.model.encode_text(text) return features.squeeze().cpu().numpy()

2.3 特征融合

class FeatureFusion: def __init__(self, method='concatenation'): self.method = method def fuse(self, features_list): if self.method == 'concatenation': return np.concatenate(features_list, axis=1) elif self.method == 'average': return np.mean(features_list, axis=0) elif self.method == 'max': return np.max(features_list, axis=0) elif self.method == 'attention': weights = self._compute_weights(features_list) return np.sum([w * f for w, f in zip(weights, features_list)], axis=0) def _compute_weights(self, features_list): norms = [np.linalg.norm(f) for f in features_list] total = sum(norms) return [n / total for n in norms] class FeatureNormalizer: def __init__(self, norm_type='l2'): self.norm_type = norm_type def normalize(self, features): if self.norm_type == 'l2': return features / np.linalg.norm(features) elif self.norm_type == 'min-max': return (features - features.min()) / (features.max() - features.min()) elif self.norm_type == 'z-score': return (features - features.mean()) / features.std() class FeatureSelection: def __init__(self, method='pca', n_components=128): self.method = method self.n_components = n_components self.transformer = None def fit(self, features): if self.method == 'pca': from sklearn.decomposition import PCA self.transformer = PCA(n_components=self.n_components) self.transformer.fit(features) elif self.method == 'tsne': from sklearn.manifold import TSNE self.transformer = TSNE(n_components=self.n_components) def transform(self, features): if self.transformer is not None: return self.transformer.transform(features) return features class FeaturePipeline: def __init__(self, extractors, fusion_method='concatenation', normalizer=None): self.extractors = extractors self.fusion = FeatureFusion(fusion_method) self.normalizer = normalizer def extract(self, image): features_list = [] for extractor in self.extractors: features = extractor.extract(image) features_list.append(features) fused = self.fusion.fuse(features_list) if self.normalizer: fused = self.normalizer.normalize(fused) return fused

3. 性能对比

3.1 特征提取方法对比

方法特征维度提取速度(ms)识别准确率
SIFT10245075%
HOG37803070%
ResNet-50204810092%
ViT-B76815095%
CLIP51220096%

3.2 不同任务表现

任务SIFTResNetViTCLIP
图像分类70%92%95%96%
图像检索80%88%92%94%
图像匹配85%90%93%95%

3.3 特征融合效果

融合方法准确率特征维度
拼接94%3072
平均92%1024
最大91%1024
注意力95%1024

4. 最佳实践

4.1 特征提取器选择

def select_feature_extractor(task_type, constraints): if constraints.get('speed', False): return SIFTFeatureExtractor() elif constraints.get('accuracy', False): return CLIPFeatureExtractor() else: return CNNFeatureExtractor() class FeatureExtractorFactory: @staticmethod def create(config): if config['type'] == 'sift': return SIFTFeatureExtractor() elif config['type'] == 'cnn': return CNNFeatureExtractor(model_name=config.get('model_name', 'resnet50')) elif config['type'] == 'vit': return ViTFeatureExtractor(model_name=config.get('model_name', 'vit_b_16')) elif config['type'] == 'clip': return CLIPFeatureExtractor(model_name=config.get('model_name', 'ViT-B/32'))

4.2 特征提取流程

class FeatureExtractionPipeline: def __init__(self, extractor, normalizer=None, selector=None): self.extractor = extractor self.normalizer = normalizer self.selector = selector def process(self, images): features = [] for image in images: feature = self.extractor.extract(image) if self.normalizer: feature = self.normalizer.normalize(feature) features.append(feature) features = np.array(features) if self.selector: features = self.selector.transform(features) return features def fit(self, images): features = [] for image in images: feature = self.extractor.extract(image) if self.normalizer: feature = self.normalizer.normalize(feature) features.append(feature) features = np.array(features) if self.selector: self.selector.fit(features)

5. 总结

特征提取是计算机视觉的核心:

  1. 手工特征:适合小规模数据,快速简单
  2. 深度学习特征:适合大规模数据,效果好
  3. 预训练模型:CLIP、ViT 等效果最佳
  4. 特征融合:组合多种特征可提升性能

对比数据如下:

  • CLIP 在跨模态任务上表现最好
  • ViT 在图像分类上效果最优
  • 特征融合可提升 2-3% 准确率
  • 推荐使用预训练模型提取特征
http://www.jsqmd.com/news/806394/

相关文章:

  • Linux Deadline 调度器的应用场景:4K 视频解码与自动驾驶控制
  • 火山引擎 Agent Plan 初体验实测
  • ARM ETE Trace ID寄存器详解与应用
  • 如何解决多平台加密音乐格式不兼容问题?Unlock Music浏览器端解密技术深度解析
  • MISRA C与CERT C编码标准在汽车电子安全中的协同应用
  • Arm CoreSight TRCPIDR寄存器组解析与应用
  • Gemini3.1Pro数学代码推理能力再突破
  • 锂离子电池安全防护与加密电量计技术解析
  • AI辅助Android开发:从传统到智能化的技术演进
  • 开源状态监控工具openclaw-status:从原理到部署的完整实践指南
  • AI辅助下的ROS2开发:人形机器人在巡检场景中的应用实践
  • 罗技PUBG鼠标宏完整配置教程:告别压枪烦恼,轻松提升射击稳定性
  • 镜像视界视觉重构技术|跨镜轨迹全域贯穿,无感定位精细化管控白皮书
  • 常见404 500错误解析
  • 2026年4月食品输送带供应商口碑推荐,pvc输送带/食品输送带/输送带/工业皮带,食品输送带供应链有哪些 - 品牌推荐师
  • 大模型赋能能源转型:小白程序员必收藏的入门与进阶指南
  • 轻量级实时数据流异常检测:Entropy库原理与工程实践
  • InputTip:提升表单体验的动态输入引导组件设计与实战
  • 指针 引用区别
  • ARM AMU与PMU架构详解及性能监控实践
  • 3步掌握透明悬浮浏览器:终极多任务效率提升指南
  • OpenClaw/GenPark可视化设计器:图形化构建自动化流程
  • AI辅助开发在嵌入式软件工程(机器人方向)中的应用:技术深度解析与实践指南
  • 从零搭建AI虚拟主播:基于Zerolan Live Robot的完整实践指南
  • Codex Skill 执行机制:从加载、选择到按需读取
  • Source Insight 正常识别解析复杂类型宏定义
  • 大模型AI学习资料免费分享,抓住程序员高薪风口,速收藏!
  • MSP 盈利、留客、提口碑,核心就盯这12个 KPI
  • AI赋能的嵌入式机器人软件开发:新时代高级工程师的核心能力与实践
  • 低成本推客系统开发|花小钱做大销量,中小商家首选拓客方案