Articles

  • 1 week ago | aws.amazon.com | Shreyas Vathul Subramanian | Adewale Akinfaderin | Ishan Singh | Jesse Manders

    With Amazon Bedrock Evaluations, you can evaluate foundation models (FMs) and Retrieval Augmented Generation (RAG) systems, whether they are hosted on Amazon Bedrock, such as Amazon Bedrock Knowledge Bases, or elsewhere, including multi-cloud and on-premises deployments. We recently announced the general availability of the large language model (LLM)-as-a-judge technique in model evaluation and the new RAG evaluation tool, which is also powered by an LLM-as-a-judge behind the scenes.
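
    To make the LLM-as-a-judge idea concrete, here is a minimal sketch of the underlying pattern, assuming a boto3 bedrock-runtime client and an Anthropic judge model ID; the rubric wording, scoring scale, and `judge` helper are illustrative assumptions, not the managed Evaluations API.

    ```python
    import json
    import boto3

    # Hypothetical rubric: score a candidate answer on a 1-5 scale.
    # The wording and scale are assumptions, not Bedrock Evaluations defaults.
    JUDGE_PROMPT = """You are an impartial judge. Rate the answer to the question
    on a 1-5 scale for correctness and completeness. Reply with only a JSON object:
    {{"score": <int>, "reason": "<one sentence>"}}

    Question: {question}
    Answer: {answer}"""

    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    def judge(question: str, answer: str) -> dict:
        response = client.converse(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed judge model
            messages=[{
                "role": "user",
                "content": [{"text": JUDGE_PROMPT.format(question=question, answer=answer)}],
            }],
        )
        # The judge is instructed to reply with a single JSON object.
        return json.loads(response["output"]["message"]["content"][0]["text"])

    print(judge("What is Amazon Bedrock?", "A managed service for foundation models."))
    ```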

  • 2 weeks ago | aws.amazon.com | Yanyan Zhang | Ishan Singh | David Yan | Shreeya Sharma

    Amazon Bedrock Model Distillation is generally available, and it addresses the fundamental challenge many organizations face when deploying generative AI: how to maintain high performance while reducing costs and latency. This technique transfers knowledge from larger, more capable foundation models (FMs) that act as teachers to smaller, more efficient models (students), creating specialized models that excel at specific tasks.
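
    As a rough illustration of the distillation pattern the article describes, the sketch below queries a larger "teacher" model and writes prompt/completion pairs as JSONL for fine-tuning a smaller "student". The managed Model Distillation feature automates this end to end; the model ID, prompts, and file layout here are assumptions.

    ```python
    import json
    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Hypothetical task prompts; in practice these would come from production traffic.
    prompts = [
        "Summarize: The quarterly report shows revenue grew 12% year over year.",
        "Summarize: The new policy takes effect on March 1 for all employees.",
    ]

    def teacher_response(prompt: str) -> str:
        """Query the larger 'teacher' model for a reference completion."""
        resp = client.converse(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed teacher
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return resp["output"]["message"]["content"][0]["text"]

    # Write prompt/completion pairs as JSONL, a common format for fine-tuning
    # a smaller 'student' model on the teacher's outputs.
    with open("distillation_pairs.jsonl", "w") as f:
        for p in prompts:
            f.write(json.dumps({"prompt": p, "completion": teacher_response(p)}) + "\n")
    ```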

  • 1 month ago | aws.amazon.com | Adewale Akinfaderin | Ishan Singh | Jesse Manders | Shreyas Vathul Subramanian

    AWS Machine Learning Blog Organizations deploying generative AI applications need robust ways to evaluate their performance and reliability. When we launched LLM-as-a-judge (LLMaJ) and Retrieval Augmented Generation (RAG) evaluation capabilities in public preview at AWS re:Invent 2024, customers used them to assess their foundation models (FMs) and generative AI applications, but asked for more flexibility beyond Amazon Bedrock models and knowledge bases.

  • 2 months ago | aws.amazon.com | Ishan Singh | Adewale Akinfaderin | Ayan Ray | Jesse Manders

    Organizations building and deploying AI applications, particularly those using large language models (LLMs) with Retrieval Augmented Generation (RAG) systems, face a significant challenge: how to evaluate AI outputs effectively throughout the application lifecycle. As these AI technologies become more sophisticated and widely adopted, maintaining consistent quality and performance becomes increasingly complex. Traditional AI evaluation approaches have significant limitations.

  • Aug 18, 2024 | dev.to | Ishan Singh

    This is a submission for the Build Better on Stellar: Smart Contract Challenge: Build a dApp. What We Built: Hermes is a decentralized perpetual exchange built on the Stellar blockchain, designed to offer traders the ability to trade with up to 100x leverage. See it in action in our demo. The platform draws heavy inspiration from Jupiter on Solana, not by copying its code (Jupiter is not open source), but by emulating its system architecture and trading mechanics.
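
    For intuition about why 100x leverage is unforgiving, here is a minimal sketch of generic leveraged-position math, assuming a simple linear perpetual with a flat maintenance margin and no funding or fees; it is not Hermes' actual implementation.

    ```python
    # Generic leveraged-position math, not Hermes' mechanics: a position's
    # initial margin is 1/leverage of notional, and it is liquidated once
    # losses leave only the maintenance margin.

    def liquidation_price(entry: float, leverage: float, is_long: bool,
                          maintenance_margin: float = 0.005) -> float:
        """Price at which losses consume the initial margin (1/leverage),
        leaving only the maintenance margin."""
        move = (1.0 / leverage) - maintenance_margin
        return entry * (1.0 - move) if is_long else entry * (1.0 + move)

    # A 100x long posts ~1% initial margin, so a ~0.5% adverse move liquidates it.
    print(liquidation_price(entry=100.0, leverage=100.0, is_long=True))  # -> 99.5
    ```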