🥧 Happy Thanksgiving!
It’s that time of year again - time to break out my blog post from years ago on Thanksgiving Pie [Charts]. Happy feasting and plotting!

🗞 This Week in News
This is pretty fun - a thread on the Sam Altman / OpenAI board struggle as an AI agent-based simulation! And the paper that inspired it, for those who want to build it themselves.
Rust Foundation Releases Problem Statement on C++/Rust Interoperability - the Foundation is requesting community feedback on the problem statement and participation in the effort.
🥁 Interesting Products & Features
Flux.1 Tools - a suite of models designed to add control and steerability to the base text-to-image model FLUX.1, enabling the modification and re-creation of real and generated images. These tools currently include: Fill for inpainting and outpainting, Depth for guidance based on depth maps, Canny for canny edge guidance, and Redux for mixing and recreating input images and text prompts. Think: ControlNet specifically for the Flux model suite.
AlphaQubit from Google identifies errors inside quantum computers, helping to make the new technology more reliable. AlphaQubit is a neural-network decoder built on the transformer architecture. Using the consistency checks as input, its task is to correctly predict whether the logical qubit, when measured at the end of the experiment, has flipped from how it was prepared. Paper here.
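For intuition only, here is a toy sketch of the shape of the decoding task: a small transformer classifier that reads a time series of consistency-check bits and predicts whether the logical qubit flipped. The dimensions, pooling, and model below are illustrative assumptions, not AlphaQubit's actual architecture.

```python
import torch
import torch.nn as nn

# Toy stand-in for a syndrome decoder: 8 consistency checks per round,
# T rounds per experiment, binary label = "did the logical qubit flip?"
# This is NOT AlphaQubit; it only illustrates the input/output shape.
NUM_CHECKS, D_MODEL, T = 8, 64, 25

class ToySyndromeDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(NUM_CHECKS, D_MODEL)           # per-round syndrome -> vector
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, 1)                     # logit for "logical flip"

    def forward(self, syndromes):                             # (batch, T, NUM_CHECKS) in {0, 1}
        h = self.encoder(self.embed(syndromes.float()))
        return self.head(h.mean(dim=1)).squeeze(-1)           # pool over rounds -> one logit

model = ToySyndromeDecoder()
fake_syndromes = torch.randint(0, 2, (4, T, NUM_CHECKS))      # random placeholder data
print(torch.sigmoid(model(fake_syndromes)))                   # P(flip) for each experiment
```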
Figure: Nine physical qubits (small gray circles) in a qubit grid of side length 3 (the code distance) form a logical qubit. At each time step, eight more qubits perform consistency checks (square and semicircle areas; blue and magenta when failing, gray otherwise) that inform the neural-network decoder, AlphaQubit. At the end of the experiment, AlphaQubit determines which errors occurred. Source: Google
Ai2 OpenScholar - a retrieval-augmented language model (LM) designed to answer user queries by first searching for relevant papers in the literature and then generating responses grounded in those sources. A joint project between Semantic Scholar and the University of Washington. Try the demo here.
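Under the hood this is the familiar retrieve-then-ground RAG pattern. Below is a minimal, self-contained sketch of that pattern; the toy corpus, lexical scoring, and prompt format are placeholders, not OpenScholar's retriever or model.

```python
# Minimal retrieve-then-ground sketch of the RAG pattern OpenScholar follows.
# The toy corpus, overlap scoring, and prompt layout are illustrative assumptions.
from collections import Counter

PAPERS = [
    {"title": "Paper A", "abstract": "rotary positional embeddings for long context"},
    {"title": "Paper B", "abstract": "speculative decoding with suffix automata"},
    {"title": "Paper C", "abstract": "error correction decoders for quantum computers"},
]

def score(query: str, text: str) -> int:
    """Crude lexical-overlap score standing in for a learned retriever."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())

def retrieve(query: str, k: int = 2):
    return sorted(PAPERS, key=lambda p: score(query, p["abstract"]), reverse=True)[:k]

def build_grounded_prompt(query: str) -> str:
    """Assemble a prompt whose answer must cite the retrieved sources."""
    sources = retrieve(query)
    context = "\n".join(f"[{i+1}] {p['title']}: {p['abstract']}" for i, p in enumerate(sources))
    return f"Answer using only the sources below and cite them by number.\n{context}\n\nQuestion: {query}"

print(build_grounded_prompt("how can speculative decoding speed up generation?"))
```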
Advancing red teaming with people and AI from OpenAI - “red teaming” means using people or AI to explore a new system’s potential risks in a structured way. This week OpenAI shares two papers on red teaming: a white paper detailing how OpenAI engages with external red teamers to test frontier models, and a research study that introduces a new method for automated red teaming.
📄 Interesting Papers
⭐️ Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning - This research transformed a dataset of 1,000 scientific papers on biological materials into an ontological knowledge graph. Using a large language embedding model, the author computes deep node representations and uses combinatorial node similarity ranking to develop a path-sampling strategy that links dissimilar concepts that had not previously been related. One comparison revealed detailed structural parallels between biological materials and Beethoven's 9th Symphony, highlighting shared patterns of complexity through isomorphic mapping. In another example, the algorithm proposed an innovative hierarchical mycelium-based composite by integrating path sampling with principles extracted from Kandinsky's 'Composition VII' painting. The resulting material integrates an innovative set of concepts that include a balance of chaos and order, adjustable porosity, mechanical strength, and complex patterned chemical functionalization. Author from MIT. Article in MIT News.
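For a rough feel of the path-sampling idea, here is a toy sketch: embed concepts, pick a dissimilar pair, and walk the knowledge graph between them so the intermediate nodes form the chain that relates the two distant ideas. The tiny graph, random embeddings, and cosine ranking below are stand-ins, not the paper's pipeline.

```python
# Toy illustration of linking dissimilar concepts via graph paths.
# The graph, the random "embeddings", and cosine ranking are placeholders
# for the paper's ontological knowledge graph and LLM-derived node vectors.
import itertools
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
concepts = ["mycelium", "hierarchy", "porosity", "counterpoint", "symphony", "toughness"]
G = nx.Graph([("mycelium", "hierarchy"), ("hierarchy", "porosity"),
              ("porosity", "toughness"), ("hierarchy", "counterpoint"),
              ("counterpoint", "symphony")])

emb = {c: rng.normal(size=16) for c in concepts}              # stand-in node embeddings

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank concept pairs by similarity and take the LEAST similar connected pair.
pairs = sorted(itertools.combinations(concepts, 2), key=lambda p: cosine(emb[p[0]], emb[p[1]]))
src, dst = next(p for p in pairs if nx.has_path(G, *p))

# The path through the graph is the chain of intermediate concepts that
# "explains" how the two distant ideas can be related.
print(nx.shortest_path(G, src, dst))
```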
Generative World Explorer - Most prior work develops agents that physically explore their environment to update their beliefs about the world state. In contrast, humans can imagine unseen parts of the world through mental exploration and revise their beliefs with imagined observations, which lets them make more informed decisions without always having to physically explore. To achieve this human-like ability, this paper introduces the Generative World Explorer (Genex), an egocentric world exploration framework that allows an agent to mentally explore a large-scale 3D world and acquire imagined observations to update its belief. The updated belief then helps the agent make a more informed decision at the current step. Authors from Johns Hopkins. Project page here.
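As a schematic of that loop (observe, imagine unseen viewpoints, fold them into the belief, then act), here is a sketch in which every function is a stub; the names and logic are placeholders, not Genex's components.

```python
# Schematic of the "mental exploration" loop described above. Every function
# here is a stub standing in for Genex's generative video model, belief
# representation, and planner; only the control flow is the point.
def observe(env_state):
    return {"view": env_state}                          # egocentric observation

def imagine_views(observation, num_views=3):
    # Stand-in for the generative world model that renders unseen viewpoints.
    return [f"imagined view {i} from {observation['view']}" for i in range(num_views)]

def update_belief(belief, observations):
    return belief | {"evidence": list(observations)}    # fold imagined evidence into belief

def plan(belief, actions=("wait", "proceed", "detour")):
    # Stand-in for a decision rule conditioned on the (revised) belief.
    return actions[len(belief["evidence"]) % len(actions)]

belief = {"evidence": []}
obs = observe("intersection with occluded crosswalk")
belief = update_belief(belief, [obs["view"]] + imagine_views(obs))
print(plan(belief))                                     # decision informed by imagined views
```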
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory - This paper introduces SAMURAI, an enhanced version of SAM 2 designed for visual object tracking in videos with crowded scenes and fast-moving or self-occluding objects. By incorporating temporal motion cues with the proposed motion-aware memory selection mechanism, SAMURAI effectively predicts object motion and refines mask selection, achieving robust, accurate tracking without retraining or fine-tuning. Authors from the University of Washington.
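A toy version of the motion-aware scoring idea: combine each candidate's mask confidence with how well its box agrees with a simple motion prediction, and prefer the consistent candidate over a confident but implausible one. The weights and the constant-velocity extrapolation below are assumptions; SAMURAI itself uses a Kalman filter.

```python
# Toy motion-aware scoring: blend mask confidence with agreement against a
# constant-velocity prediction. Illustrative only; not SAMURAI's actual rule.
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def predict_next(prev_box, prev_prev_box):
    """Constant-velocity extrapolation from the previous two boxes."""
    return tuple(2 * p - q for p, q in zip(prev_box, prev_prev_box))

def select_mask(candidates, prev_box, prev_prev_box, alpha=0.5):
    """candidates: list of (box, mask_confidence); return the best-scoring one."""
    predicted = predict_next(prev_box, prev_prev_box)
    return max(candidates,
               key=lambda c: alpha * iou(c[0], predicted) + (1 - alpha) * c[1])

history = [(10, 10, 50, 50), (20, 10, 60, 50)]          # object moving right
candidates = [((30, 10, 70, 50), 0.60),                 # consistent with the motion
              ((200, 200, 240, 240), 0.95)]             # confident but implausible jump
print(select_mask(candidates, history[-1], history[-2]))
```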
Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations - Drawing on statistical theory and the experiment-design literature, the paper makes a number of recommendations to the AI research community for reporting benchmark evaluation results in a scientifically informative way. Recommendations include: use the Central Limit Theorem, cluster standard errors, reduce variance within questions, analyze paired differences, and use power analysis. Author from Anthropic. Article summary from Anthropic.
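Here is a minimal sketch of the first two recommendations, a CLT-based standard error plus a cluster-robust variant for questions that come in related groups; the scores and cluster labels are fabricated for illustration, and the clustered estimator omits finite-sample corrections.

```python
# Report an eval score with a CLT standard error, plus a clustered standard
# error when questions come in related groups (e.g. several questions per
# shared passage). Data below are fabricated for illustration.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=400).astype(float)     # per-question 0/1 correctness
clusters = np.repeat(np.arange(100), 4)                 # 100 passages x 4 questions each

mean = scores.mean()
sem = scores.std(ddof=1) / np.sqrt(len(scores))         # naive CLT standard error

# Clustered SE: sum residuals within each cluster before estimating variance,
# since questions drawn from the same passage are not independent.
resid = scores - mean
cluster_sums = np.array([resid[clusters == c].sum() for c in np.unique(clusters)])
clustered_se = np.sqrt((cluster_sums ** 2).sum()) / len(scores)

print(f"accuracy = {mean:.3f} +/- {1.96 * sem:.3f} (naive 95% CI)")
print(f"accuracy = {mean:.3f} +/- {1.96 * clustered_se:.3f} (cluster-robust 95% CI)")
```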
SAM Decoding: Speculative Decoding via Suffix Automaton - LLM inference speed remains a major bottleneck. SAM-Decoding introduces a novel retrieval-based speculative decoding method that uses a suffix automaton for efficient and accurate draft generation. Unlike the n-gram matching used by current retrieval-based approaches, SAM-Decoding finds the longest suffix match between the generated text and a text corpus, achieving an average time complexity of O(1) per generation step. It constructs static and dynamic suffix automatons for the text corpus and the input prompt, respectively, enabling fast and precise draft generation, and delivers a 2.2-2.5x speedup on benchmarks. Authors from Renmin University of China.
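To make the suffix-automaton idea concrete, below is a compact character-level sketch (the actual method operates on tokens and maintains both a static corpus automaton and a dynamic one for the prompt and output): build the automaton over a reference text, stream the generated text through it while following suffix links on mismatches, and at each step read off the longest matching suffix plus a few reference characters as the draft.

```python
# Character-level sketch of retrieval by suffix automaton. Real SAM-Decoding
# works on tokens and keeps both a static (corpus) and a dynamic (prompt/output)
# automaton; this toy version only shows the longest-suffix-match machinery.
class SuffixAutomaton:
    def __init__(self, text):
        self.link, self.length, self.next, self.endpos = [-1], [0], [{}], [-1]
        last = 0
        for i, ch in enumerate(text):
            cur = self._new(self.length[last] + 1, i)
            p = last
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:                                   # clone q to keep lengths consistent
                    clone = self._new(self.length[p] + 1, self.endpos[q])
                    self.next[clone] = dict(self.next[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = self.link[cur] = clone
            last = cur
        self.text = text

    def _new(self, length, endpos):
        self.link.append(0); self.length.append(length)
        self.next.append({}); self.endpos.append(endpos)
        return len(self.length) - 1

    def stream(self, generated, draft_len=5):
        """Walk the generated text; after each char, yield the longest suffix
        match length and a draft continuation copied from the reference."""
        state, match = 0, 0
        for ch in generated:
            while state and ch not in self.next[state]:
                state = self.link[state]                # drop to a shorter suffix
                match = self.length[state]
            if ch in self.next[state]:
                state = self.next[state][ch]
                match += 1
            else:
                state, match = 0, 0
            pos = self.endpos[state] + 1                # reference position after the match
            yield match, (self.text[pos:pos + draft_len] if match else "")

sam = SuffixAutomaton("the quick brown fox jumps over the lazy dog")
*_, (match_len, draft) = sam.stream("a quick brown")
print(match_len, repr(draft))   # -> 12 ' fox ': " quick brown" matches, " fox " is the draft
```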
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training - Extending context window sizes allows LLMs to process longer sequences and handle more complex tasks. Rotary Positional Embedding (RoPE) has become the de facto standard thanks to relative positional encoding properties that benefit long-context training. However, using RoPE with the BFloat16 format causes numerical issues and deviations due to its limited precision, and the problem accumulates as context length increases. To address this, the paper introduces AnchorAttention, a plug-and-play attention method that alleviates the numerical issues caused by BFloat16, improves long-context capabilities, and speeds up training. Authors from the National University of Singapore.
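One easy-to-see ingredient of the problem is representability: bfloat16 keeps only 8 significant bits, so integer position ids above 256 cannot all be stored exactly, and the rounding error grows with position before it ever reaches the rotary angles. The check below illustrates just that ingredient, not the paper's full analysis.

```python
# bfloat16 keeps ~8 significant bits, so position ids above 256 get rounded and
# the rounding error grows with position. Those rounded ids then feed the rotary
# angles, degrading RoPE's positional signal as the context gets longer.
# (Illustrative check only; the paper's analysis goes further than this.)
import torch

positions = torch.arange(0, 32768, dtype=torch.float32)
rounded = positions.to(torch.bfloat16).float()          # what bf16 actually stores
err = (rounded - positions).abs()

for limit in (256, 1024, 8192, 32768):
    print(f"positions < {limit:>5}: max rounding error = {err[:limit].max().item():.0f}")
# prints 0, 2, 16, 64: the error roughly doubles each time the position range doubles
```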
Other interesting papers:
🧠 Sources of Inspiration
The Beginner’s Guide to Visual Prompt Injections - What is a visual prompt injection attack, and how do you recognize one? Read this short guide and check out real-life examples of visual prompt injection attacks performed during Lakera's hackathon.