
Matei Zaharia
Chief Technology Officer and Co-Founder at Databricks
CTO at @Databricks and CS prof at @UCBerkeley. Working on data+AI, including @ApacheSpark, @DeltaLakeOSS, @MLflow, https://t.co/94gROE5Xa0. https://t.co/nmRYAKG0LZ
Articles
-
2 months ago |
arxiv.org | Jan Luca | Matei Zaharia | Christopher Potts | Gustavo Alonso
-
Nov 13, 2024 |
databricks.com | Naveen Rao | Matei Zaharia | Patrick Wendell | Eric Peter
Monolithic to Modular: The proof of concept (POC) of any new technology often starts with large, monolithic units that are difficult to characterize. By definition, POCs are designed to show that a technology works without considering issues around extensibility, maintenance, and quality. However, once technologies mature and are deployed widely, these needs drive products to be broken down into smaller, more manageable units.
-
Oct 8, 2024 |
databricks.com | Quinn Leng | Jacob P. Portes | Sam Havens | Matei Zaharia
Retrieval Augmented Generation (RAG) is the top use case for Databricks customers who want to customize AI workflows on their own data. The pace of large language model releases is incredibly fast, and many of our customers are looking for up-to-date guidance on how to build the best RAG pipelines. In a previous blog post, we ran over 2,000 long context RAG experiments on 13 popular open source and commercial LLMs to uncover their performance on various domain-specific datasets.
-
Oct 2, 2024 |
databricks.com | Linqing Liu | Matthew Hayes | Ritendra Datta | Matei Zaharia
Introduction: Applying Large Language Models (LLMs) to code generation is becoming increasingly prevalent, as it helps you code faster and smarter. A primary concern with LLM-generated code is its correctness. Most open-source coding benchmarks are designed to evaluate general coding skills. But in enterprise environments, LLMs must be capable not only of general programming but also of using domain-specific libraries and tools, such as MLflow and Spark SQL.
-
Aug 12, 2024 |
databricks.com | Quinn Leng | Jacob P. Portes | Sam Havens | Matei Zaharia
Retrieval Augmented Generation (RAG) is the most widely adopted generative AI use case among our customers. RAG enhances the accuracy of LLMs by retrieving information from external sources such as unstructured documents or structured data.
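The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration only: real RAG systems use vector embeddings and an LLM rather than the naive keyword overlap and toy corpus assumed here, and every name in this snippet is hypothetical.

```python
# Minimal RAG sketch: retrieve relevant documents, then build a grounded prompt.
# Keyword-overlap scoring stands in for real embedding-based retrieval.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_terms & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved context plus the question into one LLM prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Toy in-memory corpus standing in for external documents.
corpus = [
    "Delta Lake stores table data as Parquet files with a transaction log.",
    "MLflow tracks experiments, parameters, and model artifacts.",
    "Spark SQL runs distributed queries over structured data.",
]

query = "How does MLflow track experiments?"
prompt = build_prompt(query, retrieve(query, corpus))
# `prompt` would then be sent to an LLM, which answers from the retrieved context.
```

The key design point is that the model's answer is grounded in retrieved source text rather than relying solely on what the LLM memorized during training.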
X (formerly Twitter)
- Followers
- 42K
- Tweets
- 2K
- DMs Open
- No

RT @databricks: There’s still time to join tomorrow’s webinar with Databricks CEO @AliGhodsi, @AnthropicAI CEO @DarioAmodei, and other indu…

RT @lateinteraction: This was super cool work from Tomu Hirata and the rest of the DSPy and MLflow OSS teams at @databricks. Understanding…

RT @tqchenml: Less than a month to go before #MLSys2025. We have a great line of keynote speakers @soumithchintala @istoica05 @AnimaAnandku…