Chinese contrastive Language-Image Pre-Training
Chinese contrastive Language-Image Pre-Training (CLIP) is an architecture trained on pairs of images with related texts.
- 💻 Chinese-CLIP
- 💻 HF
Structs§
Enums§
- PositionEmbeddingType - Type of position embedding. Choose one of "absolute", "relative_key", "relative_key_query". For positional embeddings use "absolute". For more information on "relative_key", please refer to Self-Attention with Relative Position Representations (Shaw et al.). For more information on "relative_key_query", please refer to Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).
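
The three string values listed above could be mapped onto an enum roughly as follows. This is a minimal sketch, not the crate's actual definition: the variant names (`Absolute`, `RelativeKey`, `RelativeKeyQuery`) and the `FromStr` implementation are assumptions made for illustration.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the enum documented above.
/// The variant names are assumed, not taken from the crate source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PositionEmbeddingType {
    Absolute,
    RelativeKey,
    RelativeKeyQuery,
}

impl FromStr for PositionEmbeddingType {
    type Err = String;

    /// Parses one of the three configuration strings named in the docs;
    /// any other input is rejected with an error message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "absolute" => Ok(Self::Absolute),
            "relative_key" => Ok(Self::RelativeKey),
            "relative_key_query" => Ok(Self::RelativeKeyQuery),
            other => Err(format!("unknown position embedding type: {other}")),
        }
    }
}
```

With this sketch, `"relative_key".parse::<PositionEmbeddingType>()` yields the `RelativeKey` variant, and an unrecognized string produces an error instead of silently falling back to a default.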