🗞 This Week in News
The Creativity Edition 🎹:
AI-designed jewelry from Arcade AI - the platform connects you with independent jewelry makers who collaborate with you to bring your generated designs to life. It took a few tries, but I was able to “dream up” the ring below using their tool, and I could have it made for $72 by one of their makers.
MusicFX DJ - designed in collaboration between Google and artist Jacob Collier, this generative music creation tool makes you the “conductor” of an AI-powered jam session. By mixing prompts such as instruments, genres, and emotions, MusicFX DJ lets anyone steer a continuous flow of music, making it ideal for musical experimentation.
Stable Diffusion 3.5 - more customizable, more efficient, more diverse, and more versatile. How? Stability AI integrated Query-Key Normalization into the transformer blocks, which stabilizes model training and simplifies further fine-tuning and development, in addition to “several adjustments to the architecture and training protocols.”
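For the curious, here is a minimal PyTorch sketch of what Query-Key Normalization looks like inside an attention block: queries and keys are RMS-normalized per head before the dot product, which bounds attention logits and tends to stabilize training. The module and dimensions are illustrative, not Stability AI's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Self-attention with Query-Key Normalization: q and k are
    RMS-normalized per head before the dot product."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Per-head-dim RMS norms, as in common QK-norm variants
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, T, self.num_heads, self.head_dim)
        # Normalize q and k before attention; v passes through unchanged
        q = self.q_norm(q.view(shape)).transpose(1, 2)
        k = self.k_norm(k.view(shape)).transpose(1, 2)
        v = v.view(shape).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.proj(out.transpose(1, 2).reshape(B, T, C))

x = torch.randn(2, 16, 64)
print(QKNormAttention(dim=64, num_heads=4)(x).shape)  # torch.Size([2, 16, 64])
```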
The Open Source Initiative (OSI) has released the official version of its Open Source AI Definition, which requires, among other things (read responses here):
Sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system
The complete source code used to train and run the system
The model parameters, such as weights or other configuration settings
🥁 Interesting Products & Features
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku - an upgraded Claude 3.5 Sonnet plus a new Claude 3.5 Haiku. The new Haiku matches the performance of Claude 3 Opus, the prior largest model, on many evaluations, at the same cost and similar speed as the previous generation of Haiku.
Anthropic also introduced a new capability in public beta: computer use. This allows developers to direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. If you haven’t seen the demo video yet, it’s worth a watch.
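Getting started takes little code. Below is a rough sketch of a single request using the Python SDK, based on the tool type and beta flag Anthropic documented at launch; treat the exact identifiers as assumptions and check the current docs, since beta names change. In a real agent loop, your code executes each action Claude requests (screenshot, click, type) and feeds the result back.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask Claude to plan a screen action; your harness would then carry it out
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",   # beta tool type at launch
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the downloads folder."}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)
```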
Claude Analysis Tool - a new built-in feature that lets Claude write and run JavaScript code, so it can process data, compute complex math, and iterate on different approaches before sharing an answer.
Replicate Playground - allows you to compare different models, prompts, and settings side by side and create grids of images to explore variations.
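You can approximate the same side-by-side idea from code with Replicate's Python client, if you prefer scripting to the UI. The model slugs below are just examples; swap in whichever models you want to compare.

```python
import replicate  # requires REPLICATE_API_TOKEN in the environment

# Run the same prompt through two image models, mimicking a Playground grid
prompt = "a lighthouse in a thunderstorm, oil painting"
for model in ["black-forest-labs/flux-schnell",
              "stability-ai/stable-diffusion-3.5-large"]:
    output = replicate.run(model, input={"prompt": prompt})
    print(model, "->", output)  # URLs of the generated images
```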
Hominis - an open-source 15B-parameter model, a collaboration between RealAI B.V., University of Naples Federico II, and NVIDIA. Check out a video of the playground here.
📄 Interesting Papers
Evaluating feature steering: A case study in mitigating social biases - Is feature steering useful and reliable? Authors from Anthropic.
Here is what the data says:
Within a certain range (the feature steering sweet spot) one can successfully steer the model without damaging other model capabilities. However, past a certain point, feature steering the model may come at the cost of decreasing model capabilities—sometimes to the point of the model becoming unusable.
Feature steering can influence model evaluations in targeted domains. For example, increasing the value of a feature that fires on discussions of gender bias increases the gender identity bias score.
Evidence suggests that we can’t always predict a feature’s effects just by looking at the contexts in which it fires. For example, features that seem related to gender bias may also significantly affect age bias, a general trend the authors refer to as off-target effects.
They also found a neutrality feature that significantly decreases social biases across nine social dimensions without substantially hurting the capabilities they tested (a minimal sketch of the steering mechanics follows below).
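If the mechanics are unfamiliar: feature steering is commonly implemented by clamping a dictionary feature's activation to a chosen value and pushing the residual stream along that feature's decoder direction. Here is a minimal sketch; all names are hypothetical, and Anthropic's internal setup differs in detail.

```python
import torch

def steer(resid: torch.Tensor, decoder_dir: torch.Tensor,
          current_act: torch.Tensor, target_act: float) -> torch.Tensor:
    """Clamp one feature to target_act by shifting the residual stream.

    resid:       (batch, seq, d_model) residual-stream activations
    decoder_dir: (d_model,) the feature's decoder direction
    current_act: (batch, seq) the feature's current activations
    """
    delta = target_act - current_act            # how far to push the feature
    return resid + delta.unsqueeze(-1) * decoder_dir

d_model = 512
resid = torch.randn(1, 8, d_model)
decoder_dir = torch.nn.functional.normalize(torch.randn(d_model), dim=0)
current = torch.zeros(1, 8)                     # pretend the feature is off
steered = steer(resid, decoder_dir, current, target_act=5.0)
print(steered.shape)  # torch.Size([1, 8, 512])
```

The "sweet spot" finding maps onto target_act here: moderate values steer behavior, while extreme values start degrading the model.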
Sparse Crosscoders for Cross-Layer Features and Model Diffing - This note (not a full research paper) introduces sparse crosscoders, a variant of sparse autoencoders for understanding models in superposition. Where an autoencoder encodes and predicts activations at a single layer, a crosscoder reads from and writes to multiple layers. This lets crosscoders resolve cross-layer superposition, remove “duplicate features” from analysis, simplify circuits, and produce shared sets of features across layers and even across models. Authors from Anthropic.
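In code, the core idea is compact: one shared feature dictionary is encoded from several layers' activations and decoded back to each layer separately. The toy below is my own sketch with illustrative dimensions and an L1 sparsity penalty, not the note's exact training setup.

```python
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    """One shared feature vector reads from and reconstructs every layer."""

    def __init__(self, n_layers: int, d_model: int, n_features: int):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Linear(d_model, n_features, bias=False) for _ in range(n_layers)])
        self.decoders = nn.ModuleList(
            [nn.Linear(n_features, d_model, bias=False) for _ in range(n_layers)])

    def forward(self, acts: list[torch.Tensor]):
        # Shared features: sum encoder contributions from all layers
        f = torch.relu(sum(enc(a) for enc, a in zip(self.encoders, acts)))
        recons = [dec(f) for dec in self.decoders]  # one reconstruction per layer
        return f, recons

acts = [torch.randn(4, 256) for _ in range(3)]      # activations from 3 layers
f, recons = Crosscoder(n_layers=3, d_model=256, n_features=1024)(acts)
loss = sum(((r - a) ** 2).mean() for r, a in zip(recons, acts)) \
       + 1e-3 * f.abs().mean()                      # reconstruction + sparsity
print(f.shape, loss.item())
```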
Automatically Interpreting Millions of Features in Large Language Models - An open-source automated pipeline to generate and evaluate natural language explanations for SAE features using LLMs. They test the framework on SAEs of varying sizes, activation functions, and losses, trained on two different open-weight LLMs. Five new techniques for scoring explanation quality are developed that are cheaper to run than the previous state of the art. One of these, intervention scoring, evaluates the interpretability of a feature's effects when it is intervened on, and the authors find it explains features that are not recalled by existing methods. Authors from EleutherAI and Northwestern University.
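To make "scoring an explanation" concrete, here is a stripped-down sketch in the spirit of the paper's detection-style scorers: a judge predicts, from the explanation alone, whether the feature fires on each text, and the score is agreement with ground truth. The judge here is a keyword stub standing in for an LLM call, and all names are mine, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    activates: bool  # ground truth: does the SAE feature fire on this text?

def judge(explanation: str, text: str) -> bool:
    # Stand-in for an LLM judge asked: "given this explanation, would the
    # feature fire on this text?" Replace with a real model call.
    return any(word in text.lower() for word in explanation.lower().split())

def detection_score(explanation: str, snippets: list[Snippet]) -> float:
    """Fraction of snippets where the judge's prediction matches reality."""
    hits = sum(judge(explanation, s.text) == s.activates for s in snippets)
    return hits / len(snippets)

examples = [
    Snippet("The Golden Gate Bridge at sunset", True),
    Snippet("A recipe for sourdough bread", False),
]
print(detection_score("bridge", examples))  # 1.0 for this toy case
```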
Scalable watermarking for identifying large language model outputs - Introduces SynthID Text, which encodes a watermark into AI-generated text so you can determine whether text was generated by your LLM, without changing how the underlying LLM works or degrading generation quality. Google DeepMind's technique uses a pseudo-random function, called a g-function, to augment the generation process of any LLM such that the watermark is imperceptible to humans but visible to a trained model. Released on Hugging Face. Authors from Google DeepMind.
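The g-function idea is easy to build intuition for with a toy: a keyed pseudo-random function over (context, candidate token) nudges sampling toward tokens scoring 1, and detection checks whether a text's mean g-score sits suspiciously above 0.5. Real SynthID Text uses tournament sampling over the LLM's actual distribution; everything below is simplified for illustration.

```python
import hashlib
import random

KEY = b"secret-watermark-key"  # only the key holder can detect the watermark

def g(context: tuple[str, ...], token: str) -> int:
    # Keyed pseudo-random bit per (context, candidate token)
    h = hashlib.sha256(KEY + " ".join(context).encode() + token.encode())
    return h.digest()[0] & 1

def sample(context, candidates, rng):
    # Bias generation toward candidates scoring g == 1
    preferred = [t for t in candidates if g(context, t) == 1]
    return rng.choice(preferred or candidates)

def mean_g(tokens, context_len=2):
    scores = [g(tuple(tokens[max(0, i - context_len):i]), tokens[i])
              for i in range(1, len(tokens))]
    return sum(scores) / len(scores)

rng = random.Random(0)
vocab = ["sun", "moon", "star", "sky", "sea", "wind"]
text = ["the"]
for _ in range(50):
    text.append(sample(tuple(text[-2:]), vocab, rng))
print("watermarked mean g:", mean_g(text))                               # well above 0.5
print("unwatermarked mean g:", mean_g([rng.choice(vocab) for _ in range(50)]))  # near 0.5
```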
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages - This paper introduces Pangea, a multilingual multimodal LLM trained on PangeaIns, a diverse 6M-sample instruction dataset spanning 39 languages. PangeaIns features: 1) high-quality English instructions, 2) carefully machine-translated instructions, and 3) culturally relevant multimodal tasks to ensure cross-cultural coverage. Authors from Carnegie Mellon University.
🧠 Sources of Inspiration
NotebookLlama: An Open Source version of NotebookLM from Meta - A guided series of tutorials/notebooks that can be used as a reference or course for building a PDF-to-podcast workflow. It assumes zero knowledge of LLMs, prompting, and audio models; everything is covered in the respective notebooks.
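For a taste of the flow the notebooks walk through, here is a compressed sketch: PDF to clean text, text to a two-host transcript, transcript to audio. The model choice and prompt below are placeholders, not Meta's exact recipe, and you'll need access to whichever chat model you plug in.

```python
from pypdf import PdfReader
from transformers import pipeline

def pdf_to_text(path: str) -> str:
    # Step 1: extract raw text from the PDF
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def write_transcript(text: str) -> str:
    # Step 2: rewrite the text as a podcast script; NotebookLlama uses Llama
    # models here, but any instruction-tuned model you have access to works
    chat = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
    prompt = ("Rewrite this as a lively two-host podcast transcript:\n"
              + text[:4000])  # truncated for the sketch
    return chat(prompt, max_new_tokens=1024)[0]["generated_text"]

transcript = write_transcript(pdf_to_text("paper.pdf"))
# Step 3 (omitted): feed the transcript to a TTS model speaker by speaker
# and stitch the audio clips into the final podcast.
print(transcript[:500])
```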
We have a lot of interesting work on LLMs and Mechanistic Interpretability this week!
Resources for getting started with LLMs:
Resources for getting started with Mechanistic Interpretability:
Coursera course on Interpretable ML (this one is a plug 😇)