"Based on everything you know about me", simulating 1k people, and a blind-spot free motorcycle helmet
🗞 This Week in News
Open-R1: a fully open reproduction of DeepSeek-R1 - Hugging Face announces its intention to build a completely open version of DeepSeek-R1 and asks for the community's help in doing so. (The post is also a neat writeup of how the R1 model was built.)
How can we improve adoption of standard data licenses for AI? A recent article from the Open Data Institute and Duke provides recommendations.
🥁 Interesting Products & Features
A motorcycle helmet promises to eliminate blind spots using computer vision
Citations on Anthropic API - a new API feature that lets Claude ground its answers in source documents. Claude can now provide detailed references to the exact sentences and passages it uses to generate responses, leading to more verifiable, trustworthy outputs.
Holistic Agent Leaderboard (HAL) - A standardized, cost-aware, and third-party leaderboard for evaluating agents from Princeton University.
Dolphin 3.0 - the next generation of the open-source Dolphin series of instruct-tuned models, designed to be the ultimate general-purpose local model for coding, math, agentic workflows, function calling, and general use cases.
📄 Interesting Papers
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning - The authors of DeepSeek-R1 released a technical report detailing the training and evaluation of the DeepSeek-R1 model. Authors from DeepSeek-AI.
Generative Agent Simulations of 1,000 People - The authors present a novel architecture that simulates the attitudes and behaviors of 1,052 real individuals by applying large language models to qualitative interviews about their lives, then measuring how well these agents replicate the attitudes and behaviors of the individuals they represent. The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and outcomes in experimental replications. The amazing part is that this was done using interviews only 2 hours long (<10k words). A major gap remains, however: on economic games like the Prisoner's Dilemma, the generative agents did not replicate human behavior sufficiently. News Article. Authors from Google DeepMind, Stanford, and Northwestern.
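The "85% as accurately" figure is a normalized metric: the agent's raw match rate on survey items is divided by the participants' own test-retest consistency, since humans don't answer identically two weeks apart either. A minimal sketch of that normalization, using illustrative (not the paper's exact) rates:

```python
def normalized_accuracy(agent_match_rate, self_consistency_rate):
    # Normalize the agent's raw match rate by the participants' own
    # test-retest consistency, so an agent that is exactly as "noisy"
    # as the human would score 1.0.
    return agent_match_rate / self_consistency_rate

# Illustrative numbers only: agent matches 69% of original answers,
# participants match their own earlier answers 81% of the time.
print(round(normalized_accuracy(0.69, 0.81), 2))  # → 0.85
```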
Scaling up self-supervised learning for improved surgical foundation models - This paper introduces SurgeNetXL, a surgical foundation model that sets a new benchmark in surgical computer vision. Trained on the largest reported surgical dataset to date (>4.7 million video frames), SurgeNetXL achieves consistent top-tier performance across six datasets spanning four surgical procedures and three tasks, including semantic segmentation, phase recognition, and critical view of safety (CVS) classification. This study also provides key insights into scaling pretraining datasets, extending training durations, and optimizing model architectures specifically for surgical computer vision. GitHub. Authors from Eindhoven University of Technology.
Evolving Deeper LLM Thinking - an evolutionary search strategy for scaling inference-time compute in LLMs. The proposed approach, Mind Evolution, uses a language model to generate, recombine, and refine candidate responses, avoiding the need to formalize the underlying inference problem whenever a solution evaluator is available. Controlling for inference cost, the authors found that Mind Evolution significantly outperforms other inference strategies such as Best-of-N and Sequential Revision on natural language planning tasks. Authors from Google DeepMind.
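The generate-recombine-refine loop can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: `llm_generate` and `llm_refine` are hypothetical stand-ins (here returning numbers instead of text), and `score` plays the role of the task-specific solution evaluator.

```python
import random

def llm_generate(prompt, n):
    # Hypothetical stand-in for sampling n candidate responses from an LLM;
    # random integers here stand in for free-form text candidates.
    return [random.randint(0, 100) for _ in range(n)]

def llm_refine(parent_a, parent_b):
    # Hypothetical stand-in for an LLM prompt that critiques two parent
    # candidates and proposes an improved child; here, averaging plus noise.
    return (parent_a + parent_b) // 2 + random.randint(-3, 3)

def score(candidate, target=42):
    # The solution evaluator: higher is better. A real task would check
    # plan feasibility, constraint satisfaction, etc.
    return -abs(candidate - target)

def mind_evolution(prompt, population=8, generations=10, seed=0):
    random.seed(seed)
    pool = llm_generate(prompt, population)
    for _ in range(generations):
        # Select the fittest half, then recombine/refine to refill the pool.
        pool.sort(key=score, reverse=True)
        survivors = pool[: population // 2]
        children = [llm_refine(random.choice(survivors), random.choice(survivors))
                    for _ in range(population - len(survivors))]
        pool = survivors + children
    return max(pool, key=score)

best = mind_evolution("plan a 3-city trip")
print("best candidate:", best, "score:", score(best))
```

Because the best-so-far candidate always survives selection, the top score can only improve across generations, which is what lets extra inference compute translate into better answers.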
Universal Actions for Enhanced Embodied Foundation Models - This paper introduces UniAct, a new embodied foundation modeling framework operating in a Universal Action Space. The learned universal actions capture generic behaviors across diverse robots by exploiting their shared structural features, enabling better cross-domain data utilization and cross-embodiment generalization by eliminating the notorious heterogeneity between action spaces. The universal actions can be efficiently translated back into heterogeneous, actionable commands by simply adding embodiment-specific details, making fast adaptation to new robots simple and straightforward. The 0.5B instantiation of UniAct outperforms SOTA embodied foundation models 14X its size in evaluations on various real-world and simulated robots. Authors from Tsinghua University.
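The core idea of a shared action space plus embodiment-specific translation can be illustrated with a toy sketch. All names and values below are made up; UniAct learns both the universal action codebook and the decoders rather than hard-coding them as done here.

```python
# A shared discrete codebook of "universal" behaviors.
UNIVERSAL_CODEBOOK = ["advance", "rotate_left", "grasp"]

def decode_for_arm(universal_action):
    # Embodiment-specific details for a hypothetical 7-DoF arm:
    # translate the universal action into joint-velocity commands.
    table = {
        "advance": [0.05] * 7,
        "rotate_left": [-0.05, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "grasp": [0.0] * 6 + [1.0],
    }
    return table[universal_action]

def decode_for_rover(universal_action):
    # The same universal action, translated into (linear, angular)
    # velocity for a hypothetical wheeled rover.
    table = {
        "advance": (0.5, 0.0),
        "rotate_left": (0.0, 0.8),
        "grasp": (0.0, 0.0),  # no gripper: a no-op on this embodiment
    }
    return table[universal_action]

# One policy output drives two very different robots:
action = "advance"
print(decode_for_arm(action), decode_for_rover(action))
```

Because every robot's data is expressed in the same codebook, trajectories collected on one embodiment become usable training signal for all of them.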
🧠 Sources of Inspiration
Synthetic Data Engine - Create high-fidelity, privacy-safe synthetic data: prepare, analyze, and encode the original data; train a generative model on the encoded data; and generate synthetic data samples.
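The three-step pipeline above can be sketched with a toy example. This deliberately uses the simplest possible "generative model" (independent per-column empirical distributions) to make the encode-train-sample flow concrete; all function names are illustrative, and a real engine would use a model that preserves correlations between columns.

```python
import random
from collections import Counter

def encode(rows):
    # Step 1: prepare, analyze, and encode the original data,
    # here as per-column value frequencies.
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, Counter())[value] += 1
    return columns

def train(columns):
    # Step 2: "train" a generative model on the encoded data; here the
    # model is just each column's empirical marginal distribution.
    return {key: (list(counts.keys()), list(counts.values()))
            for key, counts in columns.items()}

def sample(model, n, seed=0):
    # Step 3: generate synthetic rows by sampling each column from its
    # learned distribution (independently, in this toy version).
    rng = random.Random(seed)
    return [{key: rng.choices(values, weights=weights)[0]
             for key, (values, weights) in model.items()}
            for _ in range(n)]

original = [
    {"age_band": "30-39", "city": "Austin"},
    {"age_band": "20-29", "city": "Boston"},
    {"age_band": "30-39", "city": "Boston"},
]
synthetic = sample(train(encode(original)), n=5)
print(synthetic)
```

Privacy comes from the fact that synthetic rows are drawn from the learned distribution rather than copied from real records, though a toy marginal model like this offers no formal guarantee on its own.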
In a recent op-ed, LinkedIn co-founder Reid Hoffman shares his views on how AI will empower humanity, starting with the viral "Based on everything you know about me" ChatGPT prompts.
AI mistakes are different from human mistakes - what does this mean for the design of security systems?
Cover photo from