An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Created by MG96

External Public cs.CV cs.AI cs.LG

Statistics

Citations: 9998
References: 58
Authors

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
Project Resources

Name Type Source
ArXiv Paper Paper arXiv
Semantic Scholar Paper Semantic Scholar
GitHub Repository Code Repository GitHub
google/vit-base-patch16-224 Model Hugging Face
Falconsai/nsfw_image_detection Model Hugging Face
google/vit-base-patch16-224-in21k Model Hugging Face
google/paligemma-3b-pt-224 Model Hugging Face
google/paligemma2-3b-pt-224 Model Hugging Face
google/paligemma-3b-pt-896 Model Hugging Face
google/paligemma-3b-mix-448 Model Hugging Face
google/paligemma-3b-mix-224 Model Hugging Face
google/paligemma2-28b-pt-896 Model Hugging Face
google/paligemma2-3b-pt-448 Model Hugging Face
google/paligemma2-3b-mix-448 Model Hugging Face
timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k Model Hugging Face
google/vit-base-patch16-384 Model Hugging Face
google/paligemma2-10b-pt-896 Model Hugging Face
google/vit-large-patch16-224 Model Hugging Face
google/paligemma-3b-pt-448 Model Hugging Face
google/vit-large-patch16-224-in21k Model Hugging Face
google/paligemma2-10b-mix-448 Model Hugging Face
google/paligemma2-28b-mix-448 Model Hugging Face
google/paligemma2-3b-mix-224 Model Hugging Face
google/paligemma2-3b-pt-896 Model Hugging Face
google/vit-base-patch32-384 Model Hugging Face
google/vit-huge-patch14-224-in21k Model Hugging Face
google/vit-base-patch32-224-in21k Model Hugging Face
google/vit-large-patch32-384 Model Hugging Face
google/paligemma-3b-ft-vqav2-448 Model Hugging Face
qualcomm/VIT Model Hugging Face
google/paligemma2-10b-ft-docci-448 Model Hugging Face
google/paligemma-3b-ft-ocrvqa-896 Model Hugging Face
google/paligemma2-10b-pt-448 Model Hugging Face
google/vit-large-patch16-384 Model Hugging Face
google/paligemma2-3b-ft-docci-448 Model Hugging Face
timm/vit_large_patch14_dinov2.lvd142m Model Hugging Face
timm/vit_base_patch14_reg4_dinov2.lvd142m Model Hugging Face
google/paligemma2-28b-pt-448 Model Hugging Face
google/paligemma-3b-ft-docvqa-896 Model Hugging Face
timm/vit_base_patch16_224.augreg_in21k Model Hugging Face
google/paligemma2-10b-pt-224 Model Hugging Face
facebook/dinov2-with-registers-small Model Hugging Face
facebook/dinov2-with-registers-large Model Hugging Face
timm/vit_base_patch16_224.augreg2_in21k_ft_in1k Model Hugging Face
timm/vit_large_patch14_reg4_dinov2.lvd142m Model Hugging Face
google/paligemma2-10b-mix-224 Model Hugging Face
google/paligemma2-28b-pt-224 Model Hugging Face
Genius-Society/ViT Model Hugging Face
google/vit-hybrid-base-bit-384 Model Hugging Face
NetherlandsForensicInstitute/vuurwerkverkenner Model Hugging Face
facebook/dinov2-with-registers-giant Model Hugging Face
timm/vit_small_patch14_reg4_dinov2.lvd142m Model Hugging Face
timm/vit_base_patch16_clip_384.laion2b_ft_in1k Model Hugging Face
google/paligemma-3b-ft-ocrvqa-448 Model Hugging Face
google/paligemma-3b-ft-refcoco-seg-896 Model Hugging Face
google/paligemma-3b-ft-ocrvqa-896-jax Model Hugging Face
timm/vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k Model Hugging Face
google/hear Model Hugging Face
facebook/dinov2-with-registers-base Model Hugging Face
timm/vit_base_patch16_224.dino Model Hugging Face
timm/vit_small_patch16_224.dino Model Hugging Face
timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k Model Hugging Face
google/paligemma-3b-ft-rsvqa-hr-224 Model Hugging Face
google/paligemma-3b-ft-ocrvqa-224 Model Hugging Face
google/paligemma-3b-ft-ai2d-224-jax Model Hugging Face
google/paligemma2-28b-mix-224 Model Hugging Face
google/paligemma2-28b-pt-896-jax Model Hugging Face
timm/vit_base_patch16_224.orig_in21k_ft_in1k Model Hugging Face
timm/vit_base_r50_s16_384.orig_in21k_ft_in1k Model Hugging Face
probing-vits/vit-dino-base16 Model Hugging Face
wesleyacheng/dog-breeds-multiclass-image-classification-with-vit Model Hugging Face
timm/vit_base_patch14_dinov2.lvd142m Model Hugging Face
timm/vit_small_patch14_dinov2.lvd142m Model Hugging Face
facebook/hiera_base_224.mae_in1k_ft_in1k Model Hugging Face
google/paligemma-3b-pt-224-jax Model Hugging Face
google/paligemma-3b-ft-widgetcap-448 Model Hugging Face
google/paligemma-3b-ft-cococap-448 Model Hugging Face
google/paligemma-3b-ft-docvqa-896-jax Model Hugging Face
google/paligemma2-3b-pt-224-jax Model Hugging Face
timm/vit_base_patch32_224.augreg_in21k_ft_in1k Model Hugging Face
timm/vit_huge_patch14_224.orig_in21k Model Hugging Face
timm/vit_small_patch8_224.dino Model Hugging Face
timm/vit_small_patch16_224.augreg_in1k Model Hugging Face
timm/vit_small_patch16_384.augreg_in21k_ft_in1k Model Hugging Face
timm/vit_small_patch32_224.augreg_in21k_ft_in1k Model Hugging Face
timm/beitv2_large_patch16_224.in1k_ft_in22k_in1k Model Hugging Face
Matthijs/vit-base-patch16-224 Model Hugging Face
timm/vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k Model Hugging Face
timm/vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k Model Hugging Face
timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k Model Hugging Face
timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k Model Hugging Face
timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k Model Hugging Face
timm/vit_large_patch16_224.orig_in21k Model Hugging Face
DenisNovac/nsfw_image_detection Model Hugging Face
facebook/hiera-tiny-224-in1k-hf Model Hugging Face
google/paligemma-3b-pt-448-jax Model Hugging Face
google/paligemma-3b-pt-896-jax Model Hugging Face
google/paligemma-3b-ft-vqav2-224 Model Hugging Face
google/paligemma-3b-ft-science-qa-448 Model Hugging Face
google/paligemma-3b-ft-widgetcap-224 Model Hugging Face
google/paligemma-3b-ft-widgetcap-448-jax Model Hugging Face
google/paligemma-3b-ft-vizwizvqa-448-jax Model Hugging Face
google/paligemma-3b-ft-vqav2-224-jax Model Hugging Face
kakaobrain/coyo-700m Dataset Hugging Face
kakaobrain/coyo-labeled-300m Dataset Hugging Face
miral91/imagnet21K Dataset Hugging Face
yanze/PuLID-FLUX Space/Demo Hugging Face
gunship999/SexyImages Space/Demo Hugging Face
Yntec/ToyWorld Space/Demo Hugging Face
DamarJati/FLUX.1-DEV-Canny Space/Demo Hugging Face
mfidabel/controlnet-segment-anything Space/Demo Hugging Face
llamameta/flux-pro-uncensored Space/Demo Hugging Face
fantaxy/flx-pulid Space/Demo Hugging Face
Nymbo/Compare-6 Space/Demo Hugging Face
Yntec/PrintingPress Space/Demo Hugging Face
M2UGen/M2UGen-Demo Space/Demo Hugging Face
Uthar/SexyReality Space/Demo Hugging Face
google/paligemma2-10b-mix Space/Demo Hugging Face
big-vision/paligemma-hf Space/Demo Hugging Face
llamameta/fluxproV2 Space/Demo Hugging Face
merve/paligemma-doc Space/Demo Hugging Face
Yntec/ToyWorldXL Space/Demo Hugging Face
merve/paligemma2-vqav2 Space/Demo Hugging Face
guardiancc/flux-advanced-explorer Space/Demo Hugging Face
phenixrhyder/NSFW-ToyWorld Space/Demo Hugging Face
merve/paligemma-tracking Space/Demo Hugging Face
Yntec/blitz_diffusion Space/Demo Hugging Face
OpenGenAI/open-parti-prompts Space/Demo Hugging Face
hugging-fellows/paper-to-pokemon Space/Demo Hugging Face
probing-vits/attention-heat-maps Space/Demo Hugging Face
John6666/Diffusion80XX4sg Space/Demo Hugging Face
Deddy/PuLid-FLX-GPU Space/Demo Hugging Face
John6666/PrintingPress4 Space/Demo Hugging Face
RitaParadaRamos/SmallCapDemo Space/Demo Hugging Face
onuralpszr/paligemma2-detection Space/Demo Hugging Face
llamameta/fast-sd3.5-large Space/Demo Hugging Face
sonalkum/GAMA Space/Demo Hugging Face
sofianhw/PuLID-FLUX Space/Demo Hugging Face
Abinivesh/Multi-models-prompt-to-image-generation Space/Demo Hugging Face
AnnasBlackHat/Image-Similarity Space/Demo Hugging Face
flax-community/koclip Space/Demo Hugging Face
JournalistsonHF/text-to-image-bias Space/Demo Hugging Face
Yntec/Image-Models-Test-April-2024 Space/Demo Hugging Face
DemiPoto/TestDifs Space/Demo Hugging Face
Aanisha/Image_to_story Space/Demo Hugging Face
Nuno-Tome/simple_image_classifier Space/Demo Hugging Face
team-indain-image-caption/Hindi-image-captioning Space/Demo Hugging Face
Yntec/Image-Models-Test-2024 Space/Demo Hugging Face
Yntec/Image-Models-Test Space/Demo Hugging Face
Justinrune/LLaMA-Factory Space/Demo Hugging Face
ALM/CALM Space/Demo Hugging Face
John6666/hfd_test_nostopbutton Space/Demo Hugging Face
yergyerg/ImgGenClone Space/Demo Hugging Face
qiuzhi2046/PuLID-FLUX Space/Demo Hugging Face
Woleek/image-based-soundtrack-generation Space/Demo Hugging Face
abidlabs/vision-transformer Space/Demo Hugging Face
omerXfaruq/FindYourTwins Space/Demo Hugging Face
theaiinstitute/theia Space/Demo Hugging Face
stazizov/XFluxSpace Space/Demo Hugging Face
Yntec/Image-Models-Test-May-2024 Space/Demo Hugging Face
Hatman/AWS-Nova-Canvas Space/Demo Hugging Face
Nymbo/Diffusion80XX4sg Space/Demo Hugging Face
Yntec/Image-Models-Test-September-2024 Space/Demo Hugging Face
DemiPoto/testSortModels Space/Demo Hugging Face
JCTN/controlnet-segment-anything Space/Demo Hugging Face
42digital/DeepFashion_Classification Space/Demo Hugging Face
Shriharshan/Image-Caption-Generator Space/Demo Hugging Face
Deadmon/FLUX.1-DEV-Canny Space/Demo Hugging Face
autonomous019/image_story_generator Space/Demo Hugging Face
SunderAli17/ToonMage Space/Demo Hugging Face
tonyassi/product-recommendation Space/Demo Hugging Face
yasserrmd/MagicDoodles Space/Demo Hugging Face
kenken999/fastapi_django_main_live Space/Demo Hugging Face
Chakshu123/image-colorization-with-hint Space/Demo Hugging Face
akhaliq/paligemma2-10b-ft-docci-448 Space/Demo Hugging Face
dwb2023/omniscience Space/Demo Hugging Face
egmaminta/indoor-scene-recognition-to-speech Space/Demo Hugging Face
Amrrs/image-caption-with-vit-gpt2 Space/Demo Hugging Face
Somnath3570/food_calories Space/Demo Hugging Face
probing-vits/attention-rollout Space/Demo Hugging Face
gagan3012/ViTGPT2 Space/Demo Hugging Face
Yntec/MiniToyWorld Space/Demo Hugging Face
dennisjooo/Age-and-Emotion-Classifier Space/Demo Hugging Face
dwb2023/hf_extractor Space/Demo Hugging Face
Ramos-Ramos/visual-emb-gam-probing Space/Demo Hugging Face
Chakshu123/sketch-colorization-with-hint Space/Demo Hugging Face
John6666/ToyWorld4 Space/Demo Hugging Face
John6666/Diffusion80XX4g Space/Demo Hugging Face
SAITAN666/StableDiffusion35Large-Image-Models-Test-November-2024 Space/Demo Hugging Face
kaleidoskop-hug/PrintingPress Space/Demo Hugging Face
Yntec/Image-Models-Test-December-2024 Space/Demo Hugging Face
martynka/TasiaExperiment Space/Demo Hugging Face
sonalkum/GAMA-IT Space/Demo Hugging Face
abidlabs/image-classifier Space/Demo Hugging Face
hysts/space-that-creates-model-demo-space Space/Demo Hugging Face
st0bb3n/Cam2Speech Space/Demo Hugging Face
juliensimon/battle_of_image_classifiers Space/Demo Hugging Face
Npps/Food_Indentification_and_Nutrition_Info Space/Demo Hugging Face
John6666/Diffusion80XX4 Space/Demo Hugging Face
K00B404/HuggingfaceDiffusion_custom Space/Demo Hugging Face
John6666/blitz_diffusion4 Space/Demo Hugging Face
John6666/blitz_diffusion_builtin Space/Demo Hugging Face
NativeAngels/HuggingfaceDiffusion Space/Demo Hugging Face
K00B404/SimpleBrothel Space/Demo Hugging Face
rphrp1985/PuLID-FLUX Space/Demo Hugging Face
Saee/vQA-exploration Space/Demo Hugging Face
Abstract

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
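The abstract does not spell out how an image becomes a "sequence of image patches", so the following is only a minimal sketch of one plausible reading: cut the image into non-overlapping square patches and flatten each one into a vector. The 16x16 patch size comes from the title and the 224x224 resolution from model names such as vit-base-patch16-224 in the resource list. The function name `image_to_patches`, the nested-list image layout, and the row-by-row patch order are assumptions made for illustration. Whatever the model does to these vectors afterwards is outside the scope of this sketch.

```python
from typing import List

# Assumed layout for illustration: height x width x channels, as nested lists.
Image = List[List[List[float]]]


def image_to_patches(image: Image, patch_size: int = 16) -> List[List[float]]:
    """Split an H x W x C image into flattened, non-overlapping square patches.

    Returns (H / P) * (W / P) vectors, each of length P * P * C,
    ordered row by row (an assumed ordering).
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            # Walk the patch row by row and append every channel value.
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    flat.extend(pixel)
            patches.append(flat)
    return patches


# Dummy all-zero 224 x 224 RGB image.
dummy = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = image_to_patches(dummy, patch_size=16)
print(len(tokens), len(tokens[0]))  # 196 768
```

Under these assumptions a 224x224 RGB image yields 14 x 14 = 196 patch vectors of length 16 x 16 x 3 = 768, which is the sequence a Transformer would then consume in place of word tokens.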

Note:

No note available for this project.

Contact:

No contact available for this project.
