Module vision_model

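Chinese Contrastive Language-Image Pre-Training

Chinese Contrastive Language-Image Pre-Training (Chinese-CLIP) is an architecture trained on pairs of images and their related Chinese texts. This module contains the vision side of the model: its configuration, embeddings, encoder, and transformer.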

  • 💻 Chinese-CLIP
  • 💻 [GH](https://github.com/huggingface/transformers/blob/5af7d41e49bbfc8319f462eb45253dcb3863dfb7/src/transformers/models/chinese_clip/modeling_chinese_clip.py)
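
To illustrate what training on image-text pairs makes possible, the sketch below scores one image embedding against several text embeddings by cosine similarity and picks the closest text. It is a minimal illustration using plain `f32` slices and toy vectors, not this crate's API: `cosine_similarity` and `best_match` are hypothetical helper names, and real embeddings would come from the vision and text models rather than being written by hand.

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// (Hypothetical helper for illustration; not part of this module.)
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Index of the text embedding most similar to the image embedding,
/// or `None` when no text embeddings are given.
/// (Hypothetical helper for illustration; not part of this module.)
fn best_match(image: &[f32], texts: &[Vec<f32>]) -> Option<usize> {
    texts
        .iter()
        .enumerate()
        .map(|(i, t)| (i, cosine_similarity(image, t)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

fn main() {
    // Toy 3-dimensional embeddings; real ones are model outputs.
    let image = vec![0.9_f32, 0.1, 0.0];
    let texts = vec![vec![0.0_f32, 1.0, 0.0], vec![1.0, 0.0, 0.0]];
    println!("{:?}", best_match(&image, &texts));
}
```

In a contrastive setup, matching image-text pairs are trained to score higher than mismatched ones, so a ranking like this one can be used for retrieval or zero-shot classification.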

Structs

ChineseClipVisionConfig
ChineseClipVisionEmbeddings
ChineseClipVisionEncoder
ChineseClipVisionTransformer