This Week in News
OpenAI has created an internal system to rank progress toward artificial general intelligence on a scale from 1 to 5. They believe current tech is approaching the second level.
Level 1: Chatbots, AI with conversational language
Level 2: Reasoners, human-level problem solving
Level 3: Agents, systems that take actions
Level 4: Innovators, AI that can aid in invention
Level 5: Organizations, AI that can do the work of an organization
Interesting Products & Features
Updates at Anthropic
Artifacts can now be shared and published (and remixed)
The max output token limit for Claude 3.5 Sonnet has been doubled (4096 → 8192) in the Anthropic API
LlamaCloud and LlamaParse from LlamaIndex - document parsing and data management for RAG, designed for enterprise
Rufus - Amazon's LLM-based shopping assistant - an example of using generative AI to augment the shopping experience by getting product recommendations, comparing options, and accessing orders
Interesting Papers
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs [technical report] - speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). Enables speech-to-speech translation, emotional voice chatting, interactive podcasts, and expressive audiobooks. Authors from Alibaba.
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision - a novel self-training approach for video instruction tuning, leveraging labeled video datasets to enhance Large Vision Language Models. They iteratively generate and filter answers containing the correct video labels, improving general video understanding and enabling adaptation to new tasks. Authors from Stanford and Google.
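The generate-and-filter cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `generate` and `finetune` callables are hypothetical stand-ins for the Large Vision Language Model's sampling and training steps, and the filter shown is a simple substring check for the ground-truth label.

```python
def video_star_round(generate, finetune, videos, labels, n_samples=4):
    """One self-training cycle (interfaces are hypothetical): sample
    candidate answers for each labeled video, keep only answers that
    contain the ground-truth label, then fine-tune on the kept set."""
    kept = []
    for video, label in zip(videos, labels):
        for answer in generate(video, n=n_samples):
            if label.lower() in answer.lower():  # label-verification filter
                kept.append((video, answer))
    return finetune(kept)

# Toy stand-ins for the model calls, just to show the loop's shape
demo = video_star_round(
    generate=lambda v, n: [f"clip of a {v}", "unrelated caption"][:n],
    finetune=lambda pairs: pairs,  # a real run would update the LVLM here
    videos=["dog"], labels=["dog"], n_samples=2,
)
print(demo)  # the (video, answer) pairs that passed the filter
```

Running the round repeatedly, with the fine-tuned model generating the next batch of candidates, gives the iterative loop the paper describes.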
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning - Personalized text-to-image models allow users to generate varied styles of images for an object. However, visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. This paper introduces a novel reinforcement learning framework to supervise the diffusion models to improve the quality of the generated images. Authors from University of Electronic Science and Technology of China.
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps - This paper describes a simple approach for detecting contextual hallucinations. They hypothesize that contextual hallucinations are related to the extent to which an LLM attends to information in the provided context versus its own generations. They propose a hallucination detection model whose input features are given by the ratio of attention weights on the context versus newly generated tokens (for each attention head). A linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model. [GitHub] Authors from MIT, U Washington.
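The per-head feature the paper builds on can be sketched in a few lines. This is a simplified illustration, not the authors' code: the attention-array layout and the `lookback_ratio` function name are assumptions, and in practice the weights would come from the model's attention outputs at each generation step.

```python
import numpy as np

def lookback_ratio(attn, context_len):
    """Per-head lookback ratio for one generated token.

    attn: (num_heads, seq_len) attention weights of the current step over
    all previous tokens (assumed layout). context_len: number of tokens
    belonging to the provided context.
    """
    ctx = attn[:, :context_len].sum(axis=1)   # attention mass on the context
    gen = attn[:, context_len:].sum(axis=1)   # mass on newly generated tokens
    return ctx / (ctx + gen + 1e-9)           # one feature per attention head

# Toy example: 4 heads attending over a 6-token prefix (4 context + 2 generated)
rng = np.random.default_rng(0)
attn = rng.random((4, 6))
attn /= attn.sum(axis=1, keepdims=True)       # normalize each head's weights
features = lookback_ratio(attn, context_len=4)
print(features.shape)  # one ratio per head, the classifier's input features
```

In the paper, these ratios (collected across heads and layers) are the inputs to the linear hallucination detector.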
ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting - ViTime overcomes the limitations of numerical time series data fitting by utilizing visual data processing paradigms and employs an innovative data synthesis method during training, called Real Time Series. Experiments on a diverse set of forecasting datasets demonstrate that ViTime achieves SOTA zero-shot performance, even surpassing the best individually trained supervised models in some situations. [GitHub] Authors from City University of Hong Kong.
Sources of Inspiration
Just for fun: Rabbit Hole Explorer, where you can dive deep into any topic with content from the web
Cover image from Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning