AI Large Model Fundamentals: Parsing the Object Recognition Module (Part 2)
Picking up where the last article left off: after getting familiar with Colab and Hugging Face, we used the pipeline's sentiment-analysis task, a sentiment-analysis module, and through it walked through a simplified flow of how a large language model handles emotion-related user input:
- Input parsing and understanding
- Sentiment analysis
- Content splitting and module assignment
- Information integration and reply generation
- Output optimization and feedback
If any of this is unclear, it may help to review the previous article.
## Object Recognition
Before designing the program, we should get familiar with the models the library offers. In Python, you can query all task types supported by the transformers library as follows:
```python
from transformers.pipelines import SUPPORTED_TASKS
print(SUPPORTED_TASKS)
```
Formatted output:
```
dict_items([
('audio-classification', {'impl': <class 'transformers.pipelines.audio_classification.AudioClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForAudioClassification'>,), 'default': {'model': {'pt': ('superb/wav2vec2-base-superb-ks', '372e048')}}, 'type': 'audio'}),
('automatic-speech-recognition', {'impl': <class 'transformers.pipelines.automatic_speech_recognition.AutomaticSpeechRecognitionPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForCTC'>, <class 'transformers.models.auto.modeling_auto.AutoModelForSpeechSeq2Seq'>), 'default': {'model': {'pt': ('facebook/wav2vec2-base-960h', '55bb623')}}, 'type': 'multimodal'}),
('text-to-audio', {'impl': <class 'transformers.pipelines.text_to_audio.TextToAudioPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForTextToWaveform'>, <class 'transformers.models.auto.modeling_auto.AutoModelForTextToSpectrogram'>), 'default': {'model': {'pt': ('suno/bark-small', '645cfba')}}, 'type': 'text'}),
('feature-extraction', {'impl': <class 'transformers.pipelines.feature_extraction.FeatureExtractionPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModel'>,), 'default': {'model': {'pt': ('distilbert-base-cased', '935ac13'), 'tf': ('distilbert-base-cased', '935ac13')}}, 'type': 'multimodal'}),
('text-classification', {'impl': <class 'transformers.pipelines.text_classification.TextClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>,), 'default': {'model': {'pt': ('distilbert-base-uncased-finetuned-sst-2-english', 'af0f99b'), 'tf': ('distilbert-base-uncased-finetuned-sst-2-english', 'af0f99b')}}, 'type': 'text'}),
('token-classification', {'impl': <class 'transformers.pipelines.token_classification.TokenClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForTokenClassification'>,), 'default': {'model': {'pt': ('dbmdz/bert-large-cased-finetuned-conll03-english', 'f2482bf'), 'tf': ('dbmdz/bert-large-cased-finetuned-conll03-english', 'f2482bf')}}, 'type': 'text'}),
('question-answering', {'impl': <class 'transformers.pipelines.question_answering.QuestionAnsweringPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForQuestionAnswering'>,), 'default': {'model': {'pt': ('distilbert-base-cased-distilled-squad', '626af31'), 'tf': ('distilbert-base-cased-distilled-squad', '626af31')}}, 'type': 'text'}),
('table-question-answering', {'impl': <class 'transformers.pipelines.table_question_answering.TableQuestionAnsweringPipeline'>, 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForTableQuestionAnswering'>,), 'tf': (), 'default': {'model': {'pt': ('google/tapas-base-finetuned-wtq', '69ceee2'), 'tf': ('google/tapas-base-finetuned-wtq', '69ceee2')}}, 'type': 'text'}),
('visual-question-answering', {'impl': <class 'transformers.pipelines.visual_question_answering.VisualQuestionAnsweringPipeline'>, 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForVisualQuestionAnswering'>,), 'tf': (), 'default': {'model': {'pt': ('dandelin/vilt-b32-finetuned-vqa', '4355f59')}}, 'type': 'multimodal'}),
('document-question-answering', {'impl': <class 'transformers.pipelines.document_question_answering.DocumentQuestionAnsweringPipeline'>, 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForDocumentQuestionAnswering'>,), 'tf': (), 'default': {'model': {'pt': ('impira/layoutlm-document-qa', '52e01b3')}}, 'type': 'multimodal'}),
('fill-mask', {'impl': <class 'transformers.pipelines.fill_mask.FillMaskPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForMaskedLM'>,), 'default': {'model': {'pt': ('distilroberta-base', 'ec58a5b'), 'tf': ('distilroberta-base', 'ec58a5b')}}, 'type': 'text'}),
('summarization', {'impl': <class 'transformers.pipelines.text2text_generation.SummarizationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>,), 'default': {'model': {'pt': ('sshleifer/distilbart-cnn-12-6', 'a4f8f3e'), 'tf': ('t5-small', 'd769bba')}}, 'type': 'text'}),
('translation', {'impl': <class 'transformers.pipelines.text2text_generation.TranslationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>,), 'default': {('en', 'fr'): {'model': {'pt': ('t5-base', '686f1db'), 'tf': ('t5-base', '686f1db')}}, ('en', 'de'): {'model': {'pt': ('t5-base', '686f1db'), 'tf': ('t5-base', '686f1db')}}, ('en', 'ro'): {'model': {'pt': ('t5-base', '686f1db'), 'tf': ('t5-base', '686f1db')}}}, 'type': 'text'}),
('text2text-generation', {'impl': <class 'transformers.pipelines.text2text_generation.Text2TextGenerationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>,), 'default': {'model': {'pt': ('t5-base', '686f1db'), 'tf': ('t5-base', '686f1db')}}, 'type': 'text'}),
('text-generation', {'impl': <class 'transformers.pipelines.text_generation.TextGenerationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,), 'default': {'model': {'pt': ('gpt2', '6c0e608'), 'tf': ('gpt2', '6c0e608')}}, 'type': 'text'}),
('zero-shot-classification', {'impl': <class 'transformers.pipelines.zero_shot_classification.ZeroShotClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>,), 'default': {'model': {'pt': ('facebook/bart-large-mnli', 'c626438'), 'tf': ('roberta-large-mnli', '130fb28')}, 'config': {'pt': ('facebook/bart-large-mnli', 'c626438'), 'tf': ('roberta-large-mnli', '130fb28')}}, 'type': 'text'}),
('zero-shot-image-classification', {'impl': <class 'transformers.pipelines.zero_shot_image_classification.ZeroShotImageClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForZeroShotImageClassification'>,), 'default': {'model': {'pt': ('openai/clip-vit-base-patch32', 'f4881ba'), 'tf': ('openai/clip-vit-base-patch32', 'f4881ba')}}, 'type': 'multimodal'}),
('zero-shot-audio-classification', {'impl': <class 'transformers.pipelines.zero_shot_audio_classification.ZeroShotAudioClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModel'>,), 'default': {'model': {'pt': ('laion/clap-htsat-fused', '973b6e5')}}, 'type': 'multimodal'}),
('conversational', {'impl': <class 'transformers.pipelines.conversational.ConversationalPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>), 'default': {'model': {'pt': ('microsoft/DialoGPT-medium', '8bada3b'), 'tf': ('microsoft/DialoGPT-medium', '8bada3b')}}, 'type': 'text'}),
('image-classification', {'impl': <class 'transformers.pipelines.image_classification.ImageClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForImageClassification'>,), 'default': {'model': {'pt': ('google/vit-base-patch16-224', '5dca96d'), 'tf': ('google/vit-base-patch16-224', '5dca96d')}}, 'type': 'image'}),
('image-segmentation', {'impl': <class 'transformers.pipelines.image_segmentation.ImageSegmentationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForImageSegmentation'>, <class 'transformers.models.auto.modeling_auto.AutoModelForSemanticSegmentation'>), 'default': {'model': {'pt': ('facebook/detr-resnet-50-panoptic', 'fc15262')}}, 'type': 'multimodal'}),
('image-to-text', {'impl': <class 'transformers.pipelines.image_to_text.ImageToTextPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>,), 'default': {'model': {'pt': ('ydshieh/vit-gpt2-coco-en', '65636df'), 'tf': ('ydshieh/vit-gpt2-coco-en', '65636df')}}, 'type': 'multimodal'}),
('object-detection', {'impl': <class 'transformers.pipelines.object_detection.ObjectDetectionPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForObjectDetection'>,), 'default': {'model': {'pt': ('facebook/detr-resnet-50', '2729413')}}, 'type': 'multimodal'}),
('zero-shot-object-detection', {'impl': <class 'transformers.pipelines.zero_shot_object_detection.ZeroShotObjectDetectionPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForZeroShotObjectDetection'>,), 'default': {'model': {'pt': ('google/owlvit-base-patch32', '17740e1')}}, 'type': 'multimodal'}),
('depth-estimation', {'impl': <class 'transformers.pipelines.depth_estimation.DepthEstimationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForDepthEstimation'>,), 'default': {'model': {'pt': ('Intel/dpt-large', 'e93beec')}}, 'type': 'image'}),
('video-classification', {'impl': <class 'transformers.pipelines.video_classification.VideoClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForVideoClassification'>,), 'default': {'model': {'pt': ('MCG-NJU/videomae-base-finetuned-kinetics', '4800870')}}, 'type': 'video'}),
('mask-generation', {'impl': <class 'transformers.pipelines.mask_generation.MaskGenerationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForMaskGeneration'>,), 'default': {'model': {'pt': ('facebook/sam-vit-huge', '997b15')}}, 'type': 'multimodal'}),
('image-to-image', {'impl': <class 'transformers.pipelines.image_to_image.ImageToImagePipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForImageToImage'>,), 'default': {'model': {'pt': ('caidas/swin2SR-classical-sr-x2-64', '4aaedcb')}}, 'type': 'image'})])
```
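Each entry in this dict maps a task name to its pipeline class and a default checkpoint. As a small illustration of navigating that structure, the sketch below pulls out the default model for a task; it operates on a hand-copied subset of the output above (so it runs without transformers installed), and the helper name `default_checkpoint` is made up for this example:

```python
# Hand-copied subset of the SUPPORTED_TASKS output above, kept as plain
# dicts so this snippet does not require the transformers library.
supported = {
    "object-detection": {
        "default": {"model": {"pt": ("facebook/detr-resnet-50", "2729413")}}
    },
    "zero-shot-object-detection": {
        "default": {"model": {"pt": ("google/owlvit-base-patch32", "17740e1")}}
    },
}

def default_checkpoint(task):
    # Navigate default -> model -> pt -> (repo_id, revision) and return the repo id
    return supported[task]["default"]["model"]["pt"][0]

print(default_checkpoint("zero-shot-object-detection"))  # → google/owlvit-base-patch32
```

Note that a few tasks (e.g. translation) key their defaults by language pair rather than directly by 'model', so this lookup does not generalize to every entry.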
Next, let's run object detection on the image below (loaded from the URL in the code that follows).
1. Load the detection model

```python
from transformers import pipeline

# Zero-shot object detection: the labels to look for are supplied at inference time
detector = pipeline("zero-shot-object-detection", model='google/owlvit-base-patch32')
```
2. Import the image-processing library and run detection

```python
import requests
# PIL: the Python image-processing library
from PIL import Image

url = "https://images.pexels.com/photos/125514/pexels-photo-125514.jpeg?auto=compress&cs=tinysrgb&w=600"
img = Image.open(requests.get(url, stream=True).raw)

predictions = detector(
    img,
    candidate_labels=['car', 'wheel']
)
predictions
```
Running this returns a list of detection objects. Here we look for the two labels car and wheel, and each detection carries its label, box, and score. These are the fields of every result we retrieve; any follow-up processing will read from them.
3. Import the drawing library and render the annotations

```python
from PIL import ImageDraw

draw = ImageDraw.Draw(img)
for prediction in predictions:
    box = prediction["box"]
    label = prediction["label"]
    score = prediction["score"]
    xmin, ymin, xmax, ymax = box.values()
    if label == 'car':
        draw.rectangle((xmin, ymin, xmax, ymax), outline='red', width=1)
        draw.text((xmin, ymin), f"{label}: {round(score, 2)}", fill='red')
    else:
        draw.rectangle((xmin, ymin, xmax, ymax), outline='blue', width=1)
        draw.text((xmin, ymin), f"{label}: {round(score, 2)}", fill='blue')
img
```
Two things to note here: when we want to display a PIL image in a notebook, placing the image variable on the last line of a cell renders it directly; and by branching on each detection's label we can classify the recognized objects and give each class its own rendering style.
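The if/else branching above works for two labels, but a lookup table scales better as the label list grows. A minimal sketch of that idea (the colors and the `color_for` helper are arbitrary choices for this example, not part of any library API):

```python
# Map each candidate label to a drawing color; unknown labels fall back
# to a default so the draw loop never crashes on an unexpected label.
color_map = {"car": "red", "wheel": "blue"}
default_color = "green"

def color_for(label):
    return color_map.get(label, default_color)

print(color_for("car"))     # → red
print(color_for("person"))  # → green
```

Inside the draw loop, `draw.rectangle(..., outline=color_for(label))` then replaces the if/else entirely.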
Reposted from: https://juejin.cn/post/7380274613186215955