🗞 This Week in News
NIST on Adversarial Machine Learning - This report provides a taxonomy of concepts and defines terminology in the field of adversarial machine learning (AML). The taxonomy is arranged in a conceptual hierarchy that includes key types of ML methods, life cycle stages of attack, and attacker goals, objectives, capabilities, and knowledge. The report also identifies current challenges in the life cycle of AI systems and describes corresponding methods for mitigating and managing the consequences of AML attacks. Taken together, the taxonomy and terminology are meant to inform other standards and future practice guides for assessing and managing the security of AI systems by establishing a common language for the rapidly developing AML landscape.
OpenAI adopts Anthropic’s Model Context Protocol - MCP is an open-source standard that lets LLMs integrate more easily with external tools and data sources. Learn more about it here.
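For a sense of scale, a complete MCP tool server is only a few lines with the official Python SDK; the `word_count` tool below is a hypothetical example, not part of the spec.

```python
# Minimal MCP server sketch using the official Python SDK (pip install mcp).
# The "word_count" tool is a hypothetical example, not part of the standard.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    # Serves the tool over stdio so an MCP-capable client can call it.
    mcp.run()
```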
Anthropic Economic Index v2 - Anthropic releases its second research report. Findings include a rise in usage for coding, education, science, and healthcare applications; “extended thinking” mode is used primarily for technical tasks; and the data is broken down at the task and occupation level. They also released the datasets, available here.
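If you want to explore the data yourself, something like the sketch below should fetch it, assuming the release lives in the Anthropic/EconomicIndex dataset repo on Hugging Face (check the release page for exact paths and file names).

```python
# Sketch: pull the Anthropic Economic Index release files locally.
# Assumes the data is hosted in the "Anthropic/EconomicIndex" dataset repo
# on Hugging Face; check the release notes for the exact file layout.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Anthropic/EconomicIndex",
    repo_type="dataset",
)
print("Downloaded to:", local_dir)
```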
🥁 Interesting Products & Features
4o Image Generation - character consistency across generations and edits, text rendering, restyling, and transparent layers. Incredible results, but the primary use case so far has been… turning photos into anime?
Gemini 2.5 - debuts at the top of a wide range of benchmarks and at #1 on LMArena (human preferences) by a significant margin. The model exhibits enhanced reasoning and advanced coding, and is available in Google AI Studio.
DeepSeek-V3-0324 Release - improved reasoning performance, stronger front-end development skills, and smarter tool-use capabilities.
FLUX.1 inpainting - adding inpainting capabilities on top of the FLUX.1 model.
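As a rough sketch of what FLUX.1 inpainting looks like in code, diffusers exposes a FluxFillPipeline for the FLUX.1 Fill [dev] checkpoint; the release linked above may sit behind a different API, so treat this as illustrative only.

```python
# Sketch: inpainting with FLUX.1 via diffusers' FluxFillPipeline.
# Assumes the FLUX.1 Fill [dev] weights; the announced feature may use a
# different API, so this is illustrative rather than definitive.
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("photo.png")  # original image
mask = load_image("mask.png")    # white where content should be replaced
result = pipe(
    prompt="a red vintage car parked on the street",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```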
Significant improvements to AI-driven weather prediction - Aardvark Weather uses thousands of times less computing power and is much faster than current systems.
Magic Doodles from Luma AI Ray2 Image-to-Video - turning doodles into animations.
📄 Interesting Papers
Tracing the thoughts of a large language model: Two new papers from Anthropic:
Circuit Tracing: Revealing Computational Graphs in Language Models - extending prior mechanistic interpretability work, the team introduces a method to uncover the mechanisms underlying behaviors of language models. They produce graph descriptions of the model’s computation on prompts of interest by tracing individual computational steps in a “replacement model”. This replacement model substitutes a more interpretable component (here, a “cross-layer transcoder”) for the parts of the underlying model (the multi-layer perceptrons) that it is trained to approximate. They also develop a suite of visualization and validation tools to investigate the “attribution graphs” underlying simple behaviors of an 18-layer language model; a toy sketch of the transcoder idea follows these two summaries.
On the Biology of a Large Language Model - This paper applies attribution graphs to Claude 3.5 Haiku to gain insight into the model's internal workings. They find that the model performs multi-step reasoning, plans ahead when writing poetry by preselecting rhyming words, and uses both language-specific and abstract, language-independent circuits. The model demonstrates strategies including forward and backward planning, generalizing addition circuitry across different contexts, and employing "metacognitive" circuits that allow it to assess its own knowledge boundaries. The study also explores how the model processes medical diagnoses, distinguishes between familiar and unfamiliar entities (which affects hallucination rates), handles harmful requests, responds to jailbreak attempts, and exhibits varying degrees of faithfulness in its chain-of-thought reasoning.
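On the methods side, the cross-layer transcoder in the first paper is conceptually close to a sparse autoencoder trained to imitate an MLP block; the toy PyTorch sketch below captures that training setup. All shapes, the ReLU/L1 sparsity recipe, and hyperparameters are illustrative assumptions, not the paper's exact architecture (a true cross-layer transcoder reads from one layer and writes to several).

```python
# Toy sketch of a transcoder: an interpretable stand-in trained to
# approximate an MLP block's output from the residual stream. Shapes and
# the L1 sparsity penalty are illustrative, not the paper's recipe.
import torch
import torch.nn as nn

class Transcoder(nn.Module):
    def __init__(self, d_model=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, resid):
        acts = torch.relu(self.encoder(resid))  # sparse feature activations
        return self.decoder(acts), acts

transcoder = Transcoder()
opt = torch.optim.Adam(transcoder.parameters(), lr=1e-4)

def train_step(resid_in, mlp_out, l1_coeff=1e-3):
    """One step: match the frozen MLP's output while keeping features sparse."""
    pred, acts = transcoder(resid_in)
    loss = (pred - mlp_out).pow(2).mean() + l1_coeff * acts.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```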
Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection - This research leverages the Contrastive Language-Image Pre-training (CLIP) model to develop a generalizable detection method that performs robustly across diverse datasets and unknown forgery techniques with minimal modifications to the original model. The proposed approach utilizes parameter-efficient fine-tuning (PEFT) techniques, such as LN-tuning, to adjust a small subset of the model's parameters, preserving CLIP's pre-trained knowledge and reducing overfitting. A tailored preprocessing pipeline optimizes the method for facial images, while regularization strategies, including L2 normalization and metric learning on a hyperspherical manifold, enhance generalization. The proposed method achieves detection accuracy comparable to, and in some cases better than, much more complex state-of-the-art techniques. Authors from the Czech Technical University in Prague.
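LN-tuning here means freezing everything except the LayerNorm affine parameters. A minimal PyTorch sketch follows; the model checkpoint and parameter-name filter are assumptions, not the authors' exact recipe.

```python
# Sketch of LN-tuning on CLIP: freeze all weights, then re-enable only the
# LayerNorm parameters. Checkpoint and name filter are assumptions.
import torch
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

for param in model.parameters():
    param.requires_grad = False
for name, param in model.named_parameters():
    if "layer_norm" in name or "layernorm" in name.lower():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Tuning {trainable:,} of {total:,} parameters")

# L2-normalizing embeddings places them on the unit hypersphere, echoing
# the paper's hyperspherical regularization.
def embed(pixel_values):
    feats = model.get_image_features(pixel_values=pixel_values)
    return feats / feats.norm(dim=-1, keepdim=True)
```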
Diffusion Counterfactuals for Image Regressors - Although counterfactual explanations have been widely applied to classification models, their application to regression tasks remains underexplored. This research presents two methods for creating counterfactual explanations for image regression tasks using diffusion-based generative models, addressing challenges in sparsity and quality: 1) one based on a Denoising Diffusion Probabilistic Model that operates directly in pixel space and 2) another based on a Diffusion Autoencoder operating in latent space. Both produce realistic, semantically meaningful, and smooth counterfactuals on CelebA-HQ and a synthetic dataset, providing easily interpretable insights into the decision-making process of the regression model and revealing spurious correlations. Authors from Technische Universität Berlin.
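Stripped of the diffusion machinery, the core idea is a gradient search for a nearby input whose prediction moves to a target value. The schematic below is a generic sketch with placeholder `decode` and `regressor` functions, not the authors' exact algorithm.

```python
# Schematic counterfactual search for a regressor: nudge a latent code so
# the decoded image gets the target prediction while staying close to the
# original. `decode` and `regressor` are placeholders for the paper's
# diffusion autoencoder and image regressor.
import torch

def counterfactual(latent, decode, regressor, target,
                   steps=200, lr=0.05, dist_weight=0.1):
    z = latent.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        pred = regressor(decode(z))
        loss = (pred - target).pow(2).mean() \
             + dist_weight * (z - latent).pow(2).mean()  # stay near original
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decode(z.detach())
```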
Self-Organizing Graph Reasoning Evolves into a Critical State for Continuous Discovery Through Structural-Semantic Dynamics - This research uncovered a link between the physics concept of entropy and how AI can discover new ideas without stagnating. They measured structural entropy using Von Neumann graph entropy (applied to the adjacency Laplacian), while semantic entropy came from a similarity matrix over deep language-model embeddings. Although semantic entropy consistently outpaces structural entropy, the two remain in a near-critical balance, fueling "surprising edges" that introduce relationships between distant concepts. This mirrors physical systems on the brink of a phase transition, where a little bit of "disorder" keeps the process dynamic yet avoids chaos. Post. Author from MIT.
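For reference, Von Neumann graph entropy treats the trace-normalized Laplacian as a density matrix and takes the Shannon entropy of its eigenvalues; a small worked example:

```python
# Worked example: Von Neumann entropy of a graph from its Laplacian.
# Normalize the Laplacian by its trace so its eigenvalues form a
# probability distribution, then take the Shannon entropy.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix
L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
rho = L / np.trace(L)                       # density-matrix analogue
eigs = np.linalg.eigvalsh(rho)
eigs = eigs[eigs > 1e-12]                   # drop zeros (0 log 0 = 0)
entropy = -np.sum(eigs * np.log2(eigs))
print(f"Von Neumann graph entropy: {entropy:.3f} bits")
```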
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation - Harmful fine-tuning attacks introduce significant security risks to fine-tuning services. Standard defenses are fragile: with a few fine-tuning steps, the model can still learn the harmful knowledge. This research finds that a remarkably simple intervention, adding purely random perturbations to the fine-tuned model, can recover the model from harmful behavior, though it degrades the model's fine-tuning performance. To address that degradation, the authors propose Panacea, which optimizes an adaptive perturbation that is applied to the model after fine-tuning. Panacea maintains the model's safety alignment without compromising downstream fine-tuning performance. Comprehensive experiments across different harmful-data ratios, fine-tuning tasks, and mainstream LLMs show average harmful scores reduced by up to 21.5% while fine-tuning performance is maintained. Authors from Tsinghua University and Nanyang Technological University.
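The random-perturbation baseline is simple enough to sketch in a few lines; the noise scale below is a guess, and Panacea itself replaces the random noise with a learned, adaptive perturbation.

```python
# Sketch of the random-perturbation baseline: add small Gaussian noise to
# every weight of the fine-tuned model. The noise scale is illustrative;
# Panacea instead optimizes an adaptive perturbation.
import torch

@torch.no_grad()
def perturb(model, scale=1e-3):
    for param in model.parameters():
        param.add_(torch.randn_like(param) * scale)
```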
Attention IoU: Examining Biases in CelebA using Attention Maps - Existing methods for quantifying bias in computer vision classification models primarily focus on dataset distribution and model performance on subgroups, overlooking the internal workings of a model. This research introduces the Attention-IoU (Attention Intersection over Union) metric and related scores, which use attention maps to reveal biases within a model's internal representations and identify image features potentially causing the biases. Authors from Princeton University.
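The name suggests a soft intersection-over-union between attention heatmaps; the sketch below is one plausible reading (an illustration, not necessarily the paper's exact definition), e.g. comparing a model's attention map against a feature's ground-truth mask.

```python
# One plausible reading of an attention-map IoU: a soft intersection-over-
# union between two nonnegative heatmaps via elementwise min/max. This is
# an illustration, not necessarily the paper's exact definition.
import numpy as np

def soft_iou(map_a, map_b, eps=1e-8):
    a = map_a / (map_a.sum() + eps)   # normalize each map to sum to 1
    b = map_b / (map_b.sum() + eps)
    inter = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return inter / (union + eps)
```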
🧠 Sources of Inspiration
Can you mitigate your AI carbon footprint? Understanding tokens, the energy equation, and 4 simple ways to prompt your way to a smaller footprint.
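Since every token costs compute, counting tokens before sending a prompt is the easiest lever; a quick sketch with tiktoken (the encoding choice is an assumption and varies by model):

```python
# Quick token count with tiktoken. "cl100k_base" is one common encoding;
# the right choice depends on the model you're actually calling.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize this article in three bullet points."
print(f"{len(enc.encode(prompt))} tokens")
```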
Tutorial from Hugging Face: Training and Finetuning Reranker Models with Sentence Transformers v4
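For a taste of the API the tutorial builds on, scoring query-document pairs with a pretrained reranker takes only a few lines in Sentence Transformers; the checkpoint below is one public example, and the tutorial covers training your own with the v4 trainer.

```python
# Scoring query-document pairs with a pretrained cross-encoder reranker.
# The checkpoint is one public example; see the tutorial for training.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [
    ("how do rerankers work", "Rerankers score query-document pairs jointly."),
    ("how do rerankers work", "The weather in Prague is mild in spring."),
]
scores = model.predict(pairs)
print(scores)  # higher score = more relevant
```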
Awesome Vision-to-Music Generation - survey on vision-to-music generation (V2M), including video-to-music and image-to-music generation. Nice collection of models, datasets, evaluations, and papers.