🗞 This Week in News
Instagram’s new “AI Studio” will let creators make AI chatbot versions of themselves. Meta wants to let creators and eventually small businesses “create an AI for themselves” to interact with their communities and customers.
🥁 Interesting Products & Features
Claude Projects - Each project includes a 200K context window (the equivalent of a 500-page book) so users can add all of the relevant documents, code, and insights to enhance Claude’s effectiveness. Projects can also be shared across teams.
Gemini 1.5 Pro's 2M-token context window, code execution capabilities, and Gemma 2 now available - not new announcements, but these features are finally moving from waitlist to general availability for all developers
CriticGPT, a model based on GPT-4, writes critiques of ChatGPT responses to help human trainers spot mistakes during RLHF. (See paper summary below, LLM Critics Help Catch LLM Bugs)
📄 Interesting Papers
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization - a suite of robust, openly available, pre-trained models specifically designed for code optimization tasks. Built on the foundation of Code Llama, LLM Compiler (7B and 13B) enhances the understanding of compiler intermediate representations, assembly language, and optimization techniques. Has undergone instruction fine-tuning to interpret compiler behavior. Authors from Meta.
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data - A model that generates images from multimodal prompts of interleaved text and images, such as "a <picture of a man> man and his <picture of a dog> dog in an <picture of a cartoon> animated style." The MUMU model generalizes to tasks such as style transfer and character consistency. Authors from Sutter Hill Ventures (venture capital firm).
Suri: Multi-constraint Instruction Following for Long-form Text Generation - Introduces Suri, a dataset of 20K human-written long-form texts (between 2k and 5k words) paired with LLM-generated backtranslated instructions, each containing multiple complex constraints (~10 constraints per instruction). Authors from University of Massachusetts Amherst.
LLM Critics Help Catch LLM Bugs - Reinforcement learning from human feedback (RLHF) is fundamentally limited by humans' capacity to correctly evaluate model output. To overcome that limitation, this work trains "critic" models that help humans evaluate model-written code more accurately. These critics are themselves LLMs trained with RLHF to write natural-language feedback highlighting problems in code from real-world assistant tasks. Fine-tuned LLM critics successfully identify hundreds of errors in ChatGPT training data rated as "flawless", even though the majority of those tasks are non-code tasks and thus out-of-distribution for the critic model. Authors from OpenAI.
Banishing LLM Hallucinations Requires Rethinking Generalization - The authors developed Lamini-1, a model designed to reduce hallucinations by storing facts in a massive mixture of millions of memory experts that are retrieved dynamically. Here’s a blog with more details. Authors from Lamini.
Retrieval Augmented Instruction Tuning for Open NER with Large Language Models - In this paper, the authors explore Retrieval Augmented Instruction Tuning (RA-IT) for information extraction, focusing on the task of open named entity recognition (NER). Specifically, for each training sample, they retrieve semantically similar examples from the training dataset as the context and prepend them to the input of the original instruction. Authors from Tencent AI Lab Seattle.
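The RA-IT recipe in the last paper above (retrieve semantically similar training examples and prepend them to each sample's instruction) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the authors use proper semantic embeddings, whereas this sketch substitutes a toy bag-of-words embedding, and the `build_ra_it_input` function and its prompt format are hypothetical.

```python
import numpy as np

def embed(text, vocab):
    # Toy bag-of-words embedding; RA-IT uses semantic sentence embeddings.
    return np.array([text.lower().split().count(w) for w in vocab], dtype=float)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def build_ra_it_input(sample, train_set, vocab, k=2):
    """Prepend the k most similar training examples to the sample's instruction."""
    q = embed(sample["instruction"], vocab)
    scored = sorted(
        (ex for ex in train_set if ex is not sample),  # exclude the sample itself
        key=lambda ex: cosine(q, embed(ex["instruction"], vocab)),
        reverse=True,
    )
    context = "\n".join(
        f"Example: {ex['instruction']} -> {ex['output']}" for ex in scored[:k]
    )
    return context + "\n" + sample["instruction"]
```

Fine-tuning then proceeds on these augmented inputs, so the model learns to exploit in-context examples of the same extraction task.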
🧠 Sources of Inspiration
Claude Engineer - A CLI tool that leverages Anthropic's Claude 3.5 Sonnet model to assist with software development tasks, combining the capabilities of a large language model with practical file system operations and web search functionality.
Header image from Meta Large Language Model Compiler: Foundation Models of Compiler Optimization.