Module text_model

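Chinese Contrastive Language-Image Pre-Training

Chinese Contrastive Language-Image Pre-Training (Chinese-CLIP) is an architecture trained on pairs of images and their related texts. This module provides the text side of the model: its configuration, embeddings, and transformer.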

Structs

ChineseClipTextConfig
ChineseClipTextEmbeddings
ChineseClipTextTransformer

Enums

PositionEmbeddingType
Type of position embedding: one of "absolute", "relative_key", or "relative_key_query". Use "absolute" for ordinary absolute position embeddings. For details on "relative_key", see Self-Attention with Relative Position Representations (Shaw et al.). For details on "relative_key_query", see Method 4 in Improve Transformer Models with Better Relative Position Embeddings (Huang et al.).
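To make the three options concrete, here is a minimal standalone sketch. It does not use this crate: `PosEmbKind`, `parse_pos_emb_kind`, and `relative_index` are hypothetical names introduced only for illustration, and the variant names are assumptions rather than the actual definition of `PositionEmbeddingType`. The offset used in `relative_index` follows a convention common in BERT-style implementations (shifting the signed query-to-key distance by `max_positions - 1`), which is likewise an assumption and not something this page confirms.

```rust
/// Hypothetical stand-in for the three position embedding choices.
/// Variant names are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PosEmbKind {
    Absolute,
    RelativeKey,
    RelativeKeyQuery,
}

/// Map the configuration strings listed above to the sketch enum.
/// Returns `None` for any unrecognized name.
pub fn parse_pos_emb_kind(name: &str) -> Option<PosEmbKind> {
    match name {
        "absolute" => Some(PosEmbKind::Absolute),
        "relative_key" => Some(PosEmbKind::RelativeKey),
        "relative_key_query" => Some(PosEmbKind::RelativeKeyQuery),
        _ => None,
    }
}

/// For the relative variants, turn a (query, key) position pair into a
/// non-negative row index of a distance-embedding table with
/// `2 * max_positions - 1` rows. Assumes both positions are
/// smaller than `max_positions`.
pub fn relative_index(query_pos: usize, key_pos: usize, max_positions: usize) -> usize {
    // Equivalent to (query_pos - key_pos) + (max_positions - 1),
    // reordered so the unsigned arithmetic cannot underflow.
    query_pos + max_positions - 1 - key_pos
}
```

With `max_positions = 512`, a query and key at the same position map to row 511, the middle of the table; keys to the right of the query map to smaller rows and keys to the left to larger ones. In the "absolute" case no such table is consulted, since each position has its own embedding added to the token embedding.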