
Jesse Manders
Articles
- 2 weeks ago | aws.amazon.com | Adewale Akinfaderin | Ishan Singh | Jesse Manders | Shreyas Vathul Subramanian
AWS Machine Learning Blog
Organizations deploying generative AI applications need robust ways to evaluate their performance and reliability. When we launched LLM-as-a-judge (LLMaJ) and Retrieval Augmented Generation (RAG) evaluation capabilities in public preview at AWS re:Invent 2024, customers used them to assess their foundation models (FMs) and generative AI applications, but asked for more flexibility beyond Amazon Bedrock models and knowledge bases.
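The capability the article refers to is exposed through the Amazon Bedrock Evaluations API. As a minimal sketch (not taken from the article), an LLM-as-a-judge evaluation job can be started with boto3's create_evaluation_job call; the job name, IAM role, S3 URIs, model identifiers, and metric names below are illustrative assumptions, and the exact request shape should be checked against the current Bedrock documentation.

```python
import boto3

# Sketch: start an LLM-as-a-judge evaluation job in Amazon Bedrock.
# All names, ARNs, S3 URIs, and model IDs are placeholder assumptions.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_evaluation_job(
    jobName="llmaj-quality-check",  # assumed job name
    roleArn="arn:aws:iam::111122223333:role/BedrockEvalRole",  # assumed role
    evaluationConfig={
        "automated": {
            # Prompt dataset and the built-in metrics the judge model scores.
            "datasetMetricConfigs": [
                {
                    "taskType": "General",
                    "dataset": {
                        "name": "CustomPrompts",
                        "datasetLocation": {
                            "s3Uri": "s3://my-bucket/eval/prompts.jsonl"  # assumed
                        },
                    },
                    "metricNames": ["Builtin.Correctness", "Builtin.Helpfulness"],
                }
            ],
            # Judge model that grades the generated responses (field name assumed).
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    inferenceConfig={
        # Model under evaluation.
        "models": [
            {"bedrockModel": {"modelIdentifier": "amazon.titan-text-premier-v1:0"}}
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-bucket/eval/results/"},  # assumed
)
print(response["jobArn"])
```

The job runs asynchronously; results land in the configured S3 prefix and can also be reviewed in the Bedrock console.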
- 1 month ago | aws.amazon.com | Ishan Singh | Adewale Akinfaderin | Ayan Ray | Jesse Manders
Organizations building and deploying AI applications, particularly those using large language models (LLMs) with Retrieval Augmented Generation (RAG) systems, face a significant challenge: how to evaluate AI outputs effectively throughout the application lifecycle. As these AI technologies become more sophisticated and widely adopted, maintaining consistent quality and performance becomes increasingly complex. Traditional AI evaluation approaches have significant limitations.