A special thanks to Dave Wang, who curated this week’s edition of Spill the GPTea!
🗞 This Week in News
🥁 Interesting Products & Features
Resource Efficient LLMs - A ByteDance team has proposed a novel architecture UltraMem that revolutionizes the implementation of large-scale memory layers in language models. It is built upon the foundation of PKM while introducing highly-sparse memory layers to dramatically improve computational efficiency and reduce inference latency. UltraMem achieves superior performance compared to both PKM and MoE models at equivalent scales, making it particularly suitable for resource-constrained environments.
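The product-key memory (PKM) idea that UltraMem builds on can be sketched in a few lines. This is a hypothetical, simplified illustration, not the paper's implementation: the table sizes, scoring, and softmax read below are illustrative assumptions. The point is that a query is split in half, each half is scored against a small sub-key table, and only k×k candidate keys (out of n² total) are ever scored in full.

```python
import numpy as np

# Hypothetical sketch of a product-key memory (PKM) lookup, the mechanism
# UltraMem builds on. Sizes and the scoring scheme are illustrative.
rng = np.random.default_rng(0)
d, n_sub, k = 8, 16, 2                      # half-query dim, sub-keys per table, top-k
sub_keys_1 = rng.normal(size=(n_sub, d))    # first sub-key table
sub_keys_2 = rng.normal(size=(n_sub, d))    # second sub-key table
values = rng.normal(size=(n_sub * n_sub, d * 2))  # one value per key pair

def pkm_lookup(query):
    q1, q2 = query[:d], query[d:]           # split the query in half
    s1, s2 = sub_keys_1 @ q1, sub_keys_2 @ q2
    top1 = np.argsort(s1)[-k:]              # top-k per sub-key table
    top2 = np.argsort(s2)[-k:]
    # combine into k*k candidate keys; a full key's score is the sum
    # of its two sub-key scores
    cand = [(s1[i] + s2[j], i * n_sub + j) for i in top1 for j in top2]
    cand.sort(reverse=True)
    scores = np.array([c[0] for c in cand[:k]])
    idx = [c[1] for c in cand[:k]]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values[idx]            # sparse weighted read

out = pkm_lookup(rng.normal(size=d * 2))
print(out.shape)
```

Only 2×16 sub-key scores and 4 candidate combinations are computed here, versus 256 full keys in a dense memory, which is the source of the sparsity the blurb describes.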
A New Model from Anthropic to Rival OpenAI's o3-mini - Anthropic aims to strike a balance between deep reasoning and fast response times, especially on large codebases and other business-oriented benchmarks. The company will reportedly introduce a “sliding scale” alongside the model to let developers control costs, since the deep reasoning capabilities consume more compute.
Baidu to Open-Source New Ernie 4.5 Series LLM Amid Competitive AI Market Landscape
SpotDraft - Streamlines contract management with automation software that extracts and summarizes key details and clauses from contracts, then suggests follow-up work in a unified planning center that tracks deadlines and organization members.
📄 Interesting Papers
Dream to Drive: Model-Based Vehicle Control Using Analytic World Models (2025): For training autonomous vehicle controllers, there has been ongoing research on differentiable simulators in end-to-end training loops, letting policies learn priors. Now, a team uses them to train world models. Their proposed setups rely on the gradient of the next state with respect to the current state. They call this approach Analytic World Models (AWMs) and showcase its applications, including planning in the Waymax simulator and performance gains of up to 12% on the large-scale Waymo Open Motion dataset. Authors from Sofia University and ETH Zurich.
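The quantity the summary highlights, the gradient of the next state with respect to the current state, is easy to see on a toy model. The linear dynamics and one-step planning objective below are illustrative assumptions, not the paper's learned model: for x' = Ax + Bu, the Jacobian of the next state with respect to the current state is just A, and the analytic gradient with respect to the control is B.

```python
import numpy as np

# Toy illustration of the gradient the summary describes. For hand-written
# linear dynamics x' = A x + B u (an assumption, not the paper's model),
# the Jacobian of the next state w.r.t. the current state is the matrix A.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition (position, velocity)
B = np.array([[0.0], [1.0]])             # control affects velocity only

def step(x, u):
    return A @ x + B @ u                 # next state

jacobian_wrt_state = A                   # d(next state)/d(current state)

# Gradient-based planning: descend the squared error to a goal state
# using the analytic gradient d(next state)/d(u) = B.
x = np.array([0.0, 1.0])
u = np.array([0.0])
goal = np.array([0.1, 0.0])
for _ in range(200):
    err = step(x, u) - goal
    u -= 0.5 * (B.T @ err)               # analytic gradient step on u
print(step(x, u))
```

With a learned differentiable world model in place of the hand-written dynamics, the same gradients enable the kind of planning the paper demonstrates in Waymax.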
Cooperative Multi-Agent Planning with Adaptive Skill Synthesis (2025): Building multi-purpose AI agents remains a challenge, especially with respect to sample efficiency, interpretability, and transferability. LLMs struggle with the non-Markovian nature of multi-agent interactions under partial observability. To address this, the team created a novel multi-agent architecture that integrates vision-language models (VLMs) with a dynamic skill library and structured communication for decentralized, closed-loop decision-making. Authors from Aalto University.
On the Generalization Properties of Diffusion Models (2023): This paper delves into the inner workings of diffusion models, a type of AI used for image generation. It explores how these models generalize, which is crucial for their ability to create diverse and realistic images. To establish a stronger theoretical understanding of diffusion models, the researchers extend their quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities with progressively increasing distances between modes. Authors from Stanford and Microsoft Research Asia.
Highly accurate protein structure prediction with AlphaFold (2021): Predicting the 3-D structure that a protein will adopt based solely on its amino acid sequence, the structure prediction component of the ‘protein folding problem’, has been an important open research problem for more than 50 years. The group trained a deep learning model to regularly predict protein structures with atomic accuracy even where no similar structure is known. They validated an entirely redesigned version of their neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14). Authors from Google DeepMind.
Learning Interactive Real-World Simulator (2023): Generative models trained on internet data have revolutionized how text, image, and video content can be created. Perhaps the next milestone for generative models is to simulate the real world in response to actions carried out by humans, robots, and other types of interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies to training embodied agents purely in simulation that can be directly deployed in the real world. This paper explores using generative modeling to learn such a simulator. Authors from UC Berkeley, Google DeepMind, MIT, and University of Alberta.
🧠 Sources of Inspiration
AgentCoder is a novel multi-agent code generation framework that leverages the power of large language models (LLMs) to enhance the effectiveness of code generation. The framework consists of three specialized agents: the programmer agent, the test designer agent, and the test executor agent.
A walkthrough of how to build a RAG (retrieval-augmented generation) system from scratch, a great entry point for building AI applications!
Cover photo from Adobe Firefly launch video.