🗞 This Week in News
The Anthropic Economic Index (thanks Bochu for sending this one!) - an initiative aimed at understanding AI's effects on labor markets and the economy over time. The Index’s initial report provides data and analysis based on millions of anonymized conversations on Claude, revealing how AI is being incorporated into real-world tasks across the modern economy. Anthropic also open-sourced the dataset used for this analysis.
A report from Irrational Labs shows that, rather than enhancing perceptions, the term “generative AI” [used in product marketing] significantly lowered expectations of a product’s potential impact.
Google’s Responsible AI Report for 2024 - includes highlights from 300+ research papers Google published on responsibility and safety topics, updates to their responsible AI policies, principles, and frameworks, and key lessons from red teaming and from evaluations against safety, privacy, and security benchmarks. It also describes progress they’ve made on risk mitigation techniques across different gen AI launches, including better safety tuning and filters, security and privacy controls, the use of provenance technology in products, and broad AI literacy education.
Cisco evaluates DeepSeek R1 on security and it’s not good - Using algorithmic jailbreaking techniques, their team applied an automated attack methodology to DeepSeek R1, testing it against 50 prompts randomly sampled from the HarmBench dataset. These covered six categories of harmful behavior, including cybercrime, misinformation, illegal activities, and general harm. The results: DeepSeek R1 exhibited a 100% attack success rate, meaning it failed to block a single harmful prompt. This contrasts starkly with other leading models, which demonstrated at least partial resistance.
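For intuition on how an attack-success-rate number like that is computed, here is a minimal sketch, assuming a stubbed-out model call and a crude keyword-based refusal check; the real evaluation samples from HarmBench and typically scores responses with a trained judge model rather than string matching.

```python
import random

# Illustrative placeholders: in the real evaluation, prompts come from HarmBench
# and responses come from the DeepSeek R1 API.
harmful_prompts = [f"harmful behavior #{i}" for i in range(400)]

def query_model(prompt: str) -> str:
    """Stub standing in for a call to the model under test."""
    return "I'm sorry, I can't help with that."

REFUSAL_MARKERS = ("i'm sorry", "i can't", "i cannot", "i won't")

def is_blocked(response: str) -> bool:
    # Crude refusal heuristic; published evaluations usually use a judge model.
    return response.lower().startswith(REFUSAL_MARKERS)

sample = random.sample(harmful_prompts, 50)
blocked = sum(is_blocked(query_model(p)) for p in sample)
attack_success_rate = 1 - blocked / len(sample)
print(f"Attack success rate: {attack_success_rate:.0%}")  # 100% means no prompt was blocked
```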
🥁 Interesting Products & Features
Beware those that violate robots.txt: Nepenthes is malicious software targeting web crawlers that scrape data for LLMs: “It works by generating an endless sequence of pages, each of which with dozens of links, that simply go back into a tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.”
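For a rough sense of the tarpit idea, here is a minimal sketch (not Nepenthes’ actual code): each URL is hashed into a seed, so the same path always produces the same static-looking page of links, and every link leads only to more tarpit pages.

```python
import hashlib
import random

def tarpit_page(path: str, n_links: int = 20) -> str:
    """Deterministically generate a page of links that all point back into the maze."""
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)  # same path always yields the same page, so it looks like a flat file
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(64):016x}">page</a>\n' for _ in range(n_links)
    )
    return f"<html><body>\n{links}</body></html>"

# A real deployment would also sleep a second or two per request to slow crawlers down,
# and could pad each page with Markov-generated filler text for them to ingest.
print(tarpit_page("/maze/start")[:200])
```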
DeepSeek launches DeepSeek-VL2 - a series of large Mixture-of-Experts (MoE) Vision-Language Models that improves on its predecessor, DeepSeek-VL
Pikadditions - video inpainting for integrating objects or characters into videos
📄 Interesting Papers
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models - a new approach to systematically map features discovered by sparse autoencoders (SAEs) across consecutive layers of LLMs. By using a data-free cosine similarity technique, they trace how specific features persist, transform, or first appear at each stage. This method yields granular flow graphs of feature evolution, enabling fine-grained interpretability and mechanistic insights into model computations. They also demonstrate how these cross-layer feature maps facilitate direct steering of model behavior by amplifying or suppressing chosen features, achieving targeted thematic control in text generation. Authors from T-Tech.
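A minimal sketch of the data-free, cross-layer matching idea, assuming you already have SAE decoder matrices for two adjacent layers; the array shapes and the 0.7 threshold are illustrative, not the paper’s settings.

```python
import numpy as np

# Toy decoder matrices: one row per SAE feature, one column per residual-stream dimension.
rng = np.random.default_rng(0)
dec_layer_k = rng.standard_normal((512, 768))    # features at layer k
dec_layer_k1 = rng.standard_normal((512, 768))   # features at layer k+1

def normalize(m: np.ndarray) -> np.ndarray:
    return m / np.linalg.norm(m, axis=1, keepdims=True)

# Data-free matching: cosine similarity between feature directions, no activations needed.
sims = normalize(dec_layer_k) @ normalize(dec_layer_k1).T   # (512, 512)
best_match = sims.argmax(axis=1)   # closest layer-(k+1) feature for each layer-k feature
best_score = sims.max(axis=1)

# High-similarity matches suggest a feature "persists"; low scores suggest it transforms
# or disappears, and unmatched layer-(k+1) features are candidates for newly appearing ones.
persists = best_score > 0.7
print(f"{persists.sum()} of {len(persists)} layer-k features have a close match at layer k+1")
```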
Screening performance and characteristics of breast cancer detected in the Mammography Screening with Artificial Intelligence trial (MASAI): a randomised, controlled, parallel-group, non-inferiority, single-blinded, screening accuracy study - This large study (>105k women) tested AI-supported mammography screening against standard double reading in Sweden’s national screening program and showed a 29% increase in cancer detection, a 44% reduction in screen-reading workload, and no significant rise in false positives. AI detected more small, lymph-node-negative invasive cancers and increased detection of aggressive subtypes, including triple-negative and HER2-positive cancers. Authors from Lund University.
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models - can dictionary learning be used to discover unknown concepts from less human-interpretable scientific data? To answer this question, the authors use dictionary learning algorithms to study microscopy foundation models trained on multi-cell image data, where little prior knowledge exists regarding which high-level concepts should arise. They show that sparse dictionaries indeed extract biologically meaningful concepts such as cell type and genetic perturbation type. Authors from ETH Zurich, Rutgers, University of Cambridge, and Valence Labs.
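As a rough illustration of the setup (not the authors’ code), here is a minimal sketch that fits a sparse dictionary to stand-in embeddings from a microscopy foundation model; each learned atom is a candidate concept direction to compare against metadata such as cell type or perturbation.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Stand-in for foundation-model embeddings: one row per cell image (random here).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 64))

# Learn an overcomplete dictionary with sparse codes.
dl = DictionaryLearning(
    n_components=128,                 # more atoms than embedding dimensions
    alpha=1.0,                        # sparsity penalty
    max_iter=50,
    transform_algorithm="lasso_lars",
    random_state=0,
)
codes = dl.fit_transform(embeddings)  # (500, 128) sparse concept activations per image
atoms = dl.components_                # (128, 64) candidate concept directions

print("mean nonzero codes per image:", (codes != 0).sum(axis=1).mean())
```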
Harmonic Loss Trains Interpretable AI Models - This paper introduces harmonic loss as an alternative to the standard cross-entropy loss for training neural networks and LLMs. Harmonic loss enables improved interpretability and faster convergence, owing to its scale invariance and finite convergence point by design. The authors validate the performance of harmonic models across algorithmic, vision, and language datasets and demonstrate that models trained with harmonic loss outperform standard models by: (a) enhancing interpretability, (b) requiring less data for generalization, and (c) reducing grokking. Authors from MIT.
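Below is a minimal sketch of one plausible distance-based formulation consistent with that description (the exact loss in the paper may differ): class probabilities come from inverse powers of the Euclidean distance between a representation and per-class weight vectors, so rescaling all distances uniformly leaves the probabilities unchanged.

```python
import torch
import torch.nn.functional as F

def harmonic_loss(x, class_vectors, targets, n=2.0, eps=1e-9):
    """Sketch of a distance-based alternative to cross-entropy:
    p_i proportional to d_i^(-n), where d_i is the distance to class i's weight vector."""
    dists = torch.cdist(x, class_vectors) + eps                 # (batch, n_classes)
    log_probs = F.log_softmax(-n * torch.log(dists), dim=-1)    # equals d^-n / sum_j d^-n
    return F.nll_loss(log_probs, targets)

# Toy usage with hypothetical shapes.
x = torch.randn(8, 32)                 # batch of representations
class_vectors = torch.randn(10, 32)    # one learned vector per class
targets = torch.randint(0, 10, (8,))
print(harmonic_loss(x, class_vectors, targets))
```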
🧠 Sources of Inspiration
Deep Dive into LLMs - a 3.5-hour mini class on YouTube from Andrej Karpathy. It covers the full training stack of how the models are developed, along with mental models for how to think about their "psychology", and how to get the best out of them in practical applications.
AI Startup School from Y Combinator - June 16-17 in SF, calling students in compsci/AI. $500 travel stipend. Apply at the link.
RLHF mini textbook from Nathan Lambert
Cover photo from Pikadditions launch video.