Arnab Dhua

Articles

Computer vision

Jun 13, 2024 | amazon.science | Shasha Li | Ming Du | Arnab Dhua | Shuai Tang

Vision-language transformer models play a pivotal role in e-commerce product search. When such models are trained on pairs of product descriptions (e.g., product titles) and product images, the descriptions often contain text attributes that do not describe anything visual, which makes visual-textual alignment challenging. We introduce MultiModal Learning with online Token Pruning (MML-TP).

Leveraging large language models for multimodal search

Apr 16, 2024 | amazon.science | Michael Huang | Xinliang Zhu | Arnab Dhua | Oriol Barbany Mayor

Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries.
