AI that smells trouble, generative ghosts, and the race between interpretability and model intelligence
In the News
🗞 General News
Anthropic CEO Dario Amodei on the urgency of interpretability - “Many of the risks and worries associated with generative AI are ultimately consequences of this opacity, and would be much easier to address if the models were interpretable.” Dario also refers to the “race between interpretability and model intelligence.”
Anthropic is exploring model welfare - the company will examine how to determine when, or if, the welfare of AI systems deserves moral consideration; the potential importance of model preferences and signs of distress; and possible practical, low-cost interventions
Detecting and Countering Malicious Uses of Claude - a report from Anthropic (March 2025) with case studies of how actors have misused the Claude models, and the steps Anthropic has taken to detect and counter that misuse
Report on Understanding and Addressing AI Harms from Anthropic - the team shares insights into its evolving approach to assessing and mitigating the harms that could result from generative AI systems, ranging from catastrophic scenarios like biological threats to critical concerns like child safety, disinformation, and fraud
The US Government Accountability Office releases a report on Generative AI's Environmental and Human Effects - discusses policy options that could enhance the benefits of generative AI or address the challenges posed by its environmental and human effects
LLM Arena Pareto Frontier - shows how models stack up based on their performance (LLM Arena Elo rating) and cost (price per million tokens)
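A model sits on the Pareto frontier when no other model is both cheaper and higher rated. A minimal sketch of how such a frontier can be computed (the model names and numbers below are invented for illustration, not actual leaderboard data):

```python
# Illustrative sketch: find models on the cost/quality Pareto frontier.
# The (cost, rating) numbers below are made up for demonstration only.
models = {
    "model-a": (15.0, 1300),  # (price per million tokens in $, Arena Elo)
    "model-b": (3.0, 1250),
    "model-c": (0.5, 1150),
    "model-d": (8.0, 1200),   # dominated: model-b is cheaper AND higher rated
}

def pareto_frontier(models: dict[str, tuple[float, float]]) -> list[str]:
    """A model is on the frontier if no other model is both cheaper and better."""
    frontier = []
    for name, (cost, elo) in models.items():
        dominated = any(
            o_cost <= cost and o_elo >= elo and (o_cost, o_elo) != (cost, elo)
            for o_name, (o_cost, o_elo) in models.items()
            if o_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier(models))  # ['model-a', 'model-b', 'model-c']
```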
🥁 Interesting Products & Features
AI Nose lets robots smell trouble, infections, and gas leaks before humans can - Ainos has introduced an AI Nose system that combines a high-precision gas sensor array, real-time signal processing, and AI algorithms to identify and digitize a wide range of scents
Mobility AI from Google Research - provides transportation agencies with tools for data-driven policymaking, traffic management, and continuous monitoring of urban transportation systems.
📄 Interesting Papers
Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives - The authors anticipate that it may become common practice for people to create a custom AI agent to interact with loved ones and/or the broader world after death. They call these agents generative ghosts, since such agents will be capable of generating novel content rather than merely parroting content produced by their creator while living. The paper discusses the design space of potential implementations and their practical and ethical implications, including potential positive and negative impacts on individuals and society. Authors from Google DeepMind.
Evaluating Evaluation Metrics – The Mirage of Hallucination Detection - Many metrics have been proposed to assess faithfulness and factuality concerns in LLM generation; however, the robustness and generalization of these metrics are still untested. This research conducts a large-scale evaluation of 6 sets of hallucination detection metrics across 4 datasets, 37 language models from 5 families, and 5 decoding methods. The investigation reveals concerning gaps in current hallucination evaluation: metrics often fail to align with human judgments, take an overly myopic view of the problem, and show inconsistent gains with parameter scaling. LLM-based evaluation yields the best overall results, and mode-seeking decoding methods seem to reduce hallucinations, especially in knowledge-grounded settings. Authors from University of Southern California and Apple.
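As a concrete picture of what “aligning with human judgments” involves, a detection metric’s scores can be rank-correlated against human ratings of the same generations. A hedged sketch, with invented scores and ratings; the paper’s actual protocol may differ:

```python
# Illustrative sketch of checking whether a hallucination metric tracks
# human judgments, using Spearman rank correlation. All data is invented.
from scipy.stats import spearmanr

# Higher metric score = generation judged more faithful by the metric.
metric_scores = [0.91, 0.12, 0.55, 0.78, 0.30, 0.66]
# Human faithfulness ratings for the same six generations (1 = worst, 5 = best).
human_ratings = [5, 1, 2, 4, 3, 3]

rho, p_value = spearmanr(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A low or unstable rho across datasets is the kind of misalignment the paper reports.
```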
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator - This paper proposes an Audio-Visual Generation and Separation model (AV-GAS) for generating images from soundscapes (mixed audio containing multiple classes). They propose a new challenge in audio-visual generation, which is to generate an image given a multi-class audio input, and they develop a method that solves this task using an audio-visual separator. They also introduce a new audio-visual separation task, which involves generating separate images for each class present in a mixed audio input. Authors from King’s College London.
Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts - This paper investigates how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are neural architectures that first predict a set of high-level concepts (e.g., stripes, black) and then predict a task label from those concepts. This study explores the impact of concept interventions (i.e., operations where a human expert corrects a CM’s mispredicted concepts at test time) on CMs’ task predictions when inputs are OOD. The analysis reveals a weakness in current state-of-the-art CMs, “leakage poisoning”, which prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, the paper introduces MixCEM, a CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Authors from University of Cambridge.
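To make the intervention mechanism concrete, here is a minimal concept-bottleneck sketch in PyTorch. The layer sizes and the hard replacement of concept values are illustrative assumptions; MixCEM itself handles leaked information more carefully than this:

```python
# Minimal concept-bottleneck sketch: x -> concepts -> label, with test-time
# intervention that overwrites mispredicted concepts. Illustrative only; not
# the paper's MixCEM implementation.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim=64, n_concepts=8, n_classes=4):
        super().__init__()
        self.concept_net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                         nn.Linear(32, n_concepts), nn.Sigmoid())
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x, interventions=None):
        c = self.concept_net(x)   # predicted concept probabilities
        if interventions:         # {concept_index: ground-truth value}
            c = c.clone()
            for idx, value in interventions.items():
                c[:, idx] = value  # expert overwrites the mispredicted concept
        return self.label_net(c)

model = ConceptBottleneck()
x = torch.randn(1, 64)
y_plain = model(x)                          # prediction from raw concepts
y_fixed = model(x, interventions={2: 1.0})  # expert asserts concept 2 is present
```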
FlowReasoner: Reinforcing Query-Level Meta-Agents - This paper proposes a query-level meta-agent named FlowReasoner that automates the design of query-level multi-agent systems, i.e., one system per user query. The objective is to incentivize a reasoning-based meta-agent via external execution feedback. The authors combine distillation from DeepSeek R1 with reinforcement learning, using a multi-purpose reward designed to guide RL training on performance, complexity, and efficiency. This enables FlowReasoner to generate a personalized multi-agent system for each user query via deliberative reasoning. GitHub. Authors from Sea AI Lab and National University of Singapore.
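The multi-purpose reward can be pictured as a weighted combination of the three aspects. A hedged sketch; the weights, term definitions, and function name below are assumptions, not FlowReasoner’s actual reward:

```python
# Illustrative sketch of a multi-purpose RL reward balancing performance,
# complexity, and efficiency. Weights and term definitions are assumptions.
def multi_purpose_reward(task_success: float, n_agents: int,
                         latency_s: float, w=(1.0, 0.1, 0.05)) -> float:
    w_perf, w_complex, w_eff = w
    performance = task_success          # e.g., fraction of unit tests passed
    complexity_penalty = n_agents       # discourage needlessly large systems
    efficiency_penalty = latency_s      # discourage slow executions
    return (w_perf * performance
            - w_complex * complexity_penalty
            - w_eff * efficiency_penalty)

# A system that passes all tests with 3 agents in 4 seconds:
print(multi_purpose_reward(task_success=1.0, n_agents=3, latency_s=4.0))  # 0.5
```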
🧠 Sources of Inspiration
Resilient AI Infrastructure (by the Harvey team) - the team discusses how it overcame the challenges of deploying an LLM-based application at scale and shares best practices for load balancing and monitoring
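One common pattern in this vein is client-side failover across multiple LLM endpoints with retries and backoff. A minimal sketch; the endpoint names and call_endpoint helper are hypothetical, not Harvey’s implementation:

```python
# Hypothetical sketch of a resilient LLM call: try endpoints in order of
# preference and fail over on errors. Endpoint names are made up.
import random
import time

ENDPOINTS = ["llm-primary", "llm-secondary", "llm-fallback"]

def call_endpoint(endpoint: str, prompt: str) -> str:
    """Stand-in for a real API call; randomly fails to simulate outages."""
    if random.random() < 0.3:
        raise ConnectionError(f"{endpoint} unavailable")
    return f"[{endpoint}] response to: {prompt}"

def resilient_completion(prompt: str, retries_per_endpoint: int = 2) -> str:
    for endpoint in ENDPOINTS:
        for attempt in range(retries_per_endpoint):
            try:
                return call_endpoint(endpoint, prompt)
            except ConnectionError:
                time.sleep(0.1 * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all endpoints failed")

print(resilient_completion("Summarize this contract."))
```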
Cover photo generated using ChatGPT.