
Conghui He
Dec 30, 2023 | linyq17.github.io | Conghui He | Alex Wang | Bin Wang | Weijia Li
TL;DR Captions in LAION-2B are significantly biased towards describing the visual text embedded in images. Released CLIP models show a strong text spotting bias in almost every style of web image, so CLIP-filtered datasets are inherently biased towards visual-text-dominant data. CLIP models easily learn text spotting ability from parrot captions while failing to connect vision-language semantics, just like a text-spotting parrot.
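To make the bias concrete, here is a minimal sketch (not the authors' code) of how one might probe a released CLIP model: score one image against a "parrot caption" that simply repeats the text rendered in the image and against a caption describing the visual scene. The image path, the example captions, and the `openai/clip-vit-base-patch32` checkpoint are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: compare CLIP scores for a parrot caption vs. a semantic caption.
# Assumptions (hypothetical): a local image "sign_photo.jpg" containing the
# visible text "OPEN 24 HOURS", and the openai/clip-vit-base-patch32 checkpoint.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("sign_photo.jpg")  # hypothetical image with embedded text
captions = [
    "OPEN 24 HOURS",                                  # parrot caption: repeats the rendered text
    "a neon sign glowing outside a diner at night",   # caption describing the visual scene
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the temperature-scaled image-text similarities.
scores = outputs.logits_per_image.squeeze(0)
for caption, score in zip(captions, scores.tolist()):
    print(f"{score:6.2f}  {caption}")
# If the parrot caption scores higher, a CLIP-score filter would keep this pair
# even though the caption ignores the actual visual content.
```

A dataset curated by keeping only pairs above a CLIP-score threshold would, under this behavior, preferentially retain exactly such visual-text-dominant pairs, which is the filtering bias the TL;DR describes.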