Bryce Dubayah

Featured in:

Articles

Driving model performance optimization: 2024 highlights

Jan 9, 2025 | baseten.co | Pankaj Gupta | William Gao | Bryce Dubayah

Baseten’s Model Performance Team is a group of engineers dedicated to adapting cutting-edge inference optimization research and productionizing it to support high-volume real-world workloads. Our roadmap is customer-driven and focuses on: Latency: Achieving top-tier TTFT, TPOT, and other key latency metrics. Scalability: Supporting massive expansions in user volume without compromising latency or quality. Quality: Retaining or enhancing model output quality throughout the optimization process.

How to build function calling and JSON mode for open-source and fine-tuned LLMs

Sep 12, 2024 | baseten.co | Bryce Dubayah | Philip Kiely | Abu Qader | Pankaj Gupta

Today, we announced support for function calling and structured output for LLMs deployed with our TensorRT-LLM Engine Builder. This adds support at the model server level for two key features: Function calling: also known as “tool use,” this feature lets you pass a set of defined tools to an LLM as part of the request body. Based on the prompt, the model selects and returns the most appropriate function/tool from the provided options.
