
Arnab Dhua
Articles
-
Jun 13, 2024 |
amazon.science | Shasha Li | Ming Du | Arnab Dhua | Shuai Tang
Vision-language transformer models play a pivotal role in e-commerce product search. When such models are trained on pairs of product descriptions (e.g., product titles) and product images, the descriptions often contain text attributes that do not describe anything visual, which makes visual-textual alignment challenging. We introduce MultiModal Learning with online Token Pruning (MML-TP).
-
Apr 16, 2024 |
amazon.science | Michael Huang | Xinliang Zhu | Arnab Dhua | Oriol Barbany Mayor
Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries.