🗞 This Week in News
Law and order - California has enacted a number of new AI regulations addressing problematic uses of the technology, such as deepfakes (specifically regarding elections and nudity). The laws also require watermarking of generatively created content.
New materials discovered by analyzing crystallography data with an ML model from MIT - for more details, see their webpage
🥁 Interesting Products & Features
1X World Model - a world model that serves as a virtual simulator for robots.
From the same starting image sequence, the 1X world model can imagine multiple futures from different robot action proposals. It can also predict non-trivial dynamics, including rigid-body interactions, the effects of dropping objects, partial observability, deformable objects (curtains, laundry), and articulated objects (doors, drawers, chairs).
📄 Interesting Papers
Towards Physically-Realizable Adversarial Attacks in Embodied Vision Navigation - a new attack method for embodied navigation / object detection that attaches adversarial patches with learnable textures and opacity to objects. They use a multi-view optimization strategy based on object-aware sampling, which uses feedback from the navigation model to optimize the patch's texture. Experimental results show the adversarial patches reduce navigation success rates by about 40%. GitHub. Authors from Beijing University of Posts and Telecommunications.
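The generic idea behind learnable-texture patches can be sketched at toy scale. This is not the paper's multi-view, object-aware method: `nav_score` is a hypothetical stand-in for the navigation model (here just a linear scorer), and the attack simply descends the patch pixels against its success signal.

```python
import numpy as np

# Hedged sketch of a learnable-texture adversarial patch: treat the patch
# pixels as parameters and descend on the victim model's success signal.
# `nav_score` is a toy linear "navigation model", not the paper's method.

rng = np.random.default_rng(0)
w = rng.standard_normal(16)   # toy navigation model weights

def nav_score(scene: np.ndarray) -> float:
    # Higher score ~ navigation succeeds; the attacker wants it low.
    return float(w @ scene)

def attack_step(patch: np.ndarray, lr: float = 0.1) -> np.ndarray:
    # For a linear scorer the gradient w.r.t. the patch pixels is just `w`;
    # descend and clip to the valid pixel range [0, 1].
    return np.clip(patch - lr * w, 0.0, 1.0)

scene = rng.random(16)
patch = rng.random(16)
before = nav_score(scene + patch)
for _ in range(20):
    patch = attack_step(patch)
after = nav_score(scene + patch)  # success score never increases as the patch adapts
```

The real attack replaces the toy gradient with feedback gathered across multiple views of the patched object.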
V-STaR: Training Verifiers for Self-Taught Reasoners - self-improvement approaches iteratively fine-tune LLMs on their own self-generated solutions to improve problem-solving ability. This research proposes V-STaR, which utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier that judges the correctness of model-generated solutions. At inference time, the verifier selects one solution from among many candidates. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4-17% test-accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models. Authors from Université de Montréal, Microsoft Research, and Google DeepMind.
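The inference-time half of this recipe is just best-of-N selection with a verifier. In the sketch below, `verifier_score` is a toy keyword heuristic standing in for the trained verifier LLM:

```python
# Hedged sketch of V-STaR-style inference: a verifier scores N candidate
# solutions and the highest-scoring one is kept (best-of-N selection).
# `verifier_score` is a toy stand-in; in the paper it is an LLM fine-tuned
# on both correct and incorrect self-generated solutions.

def verifier_score(problem: str, solution: str) -> float:
    # Toy heuristic: count how many problem terms the solution mentions.
    return float(sum(tok in solution for tok in problem.split()))

def best_of_n(problem: str, candidates: list[str]) -> str:
    # Pick the candidate the verifier judges most likely to be correct.
    return max(candidates, key=lambda s: verifier_score(problem, s))

candidates = ["return x + 1", "add one to x and return the result"]
best = best_of_n("add one to x", candidates)
```

The training half then feeds both the kept and discarded candidates back in as verifier training data for the next iteration.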
PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba - facial-video-based remote photoplethysmography (rPPG) aims to measure physiological signals and monitor heart activity without any contact. This paper proposes PhysMamba, a Mamba-based framework that efficiently represents long-range physiological dependencies in facial videos. Specifically, it introduces a Temporal Difference Mamba block that first enhances local dynamic differences and then models the long-range spatio-temporal context. The findings show improved performance across three benchmark datasets. GitHub. Authors from Great Bay University.
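The "local dynamic differences" part is essentially frame differencing: subtle pulse-driven changes live in frame-to-frame deltas, not in any single frame. A minimal sketch of just that step (not the Mamba state-space model itself):

```python
import numpy as np

# Hedged sketch of the temporal-difference idea behind PhysMamba-style
# blocks: emphasize frame-to-frame changes before modeling long-range
# context. This is only the differencing step, not the paper's block.

def temporal_difference(frames: np.ndarray) -> np.ndarray:
    # frames: (T, H, W) grayscale video; keep the same length by
    # prepending a zero difference for the first frame.
    diff = np.diff(frames, axis=0)
    return np.concatenate([np.zeros_like(frames[:1]), diff], axis=0)

# Toy video whose pixel values increase by 1 each frame.
video = np.stack([np.full((4, 4), float(t)) for t in range(5)])
td = temporal_difference(video)
```

In the real model these difference features are then passed to Mamba layers to capture the long-range periodic pulse signal.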
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models - Promptriever is the first retrieval model that can be prompted like an LM. To train it, the authors curated and released a new instance-level instruction training set from MS MARCO spanning nearly 500k instances. Promptriever achieves strong performance on standard retrieval tasks while also following instructions, demonstrating that retrieval models can be controlled with prompts on a per-query basis. GitHub. Authors from Johns Hopkins University and Samaya AI.
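Mechanically, per-query prompting in a bi-encoder amounts to concatenating the instruction onto the query before encoding. The sketch below uses a toy bag-of-words `embed` as a stand-in for the trained sentence Transformer:

```python
import numpy as np

# Hedged sketch of instruction-prompted retrieval in the spirit of
# Promptriever: the free-text instruction is concatenated to the query
# before encoding. `embed` is a toy bag-of-words encoder, not the model.

VOCAB = ["python", "tutorial", "paper", "snake", "biology"]

def embed(text: str) -> np.ndarray:
    toks = text.lower().split()
    return np.array([float(toks.count(w)) for w in VOCAB])

def retrieve(query: str, instruction: str, docs: list[str]) -> str:
    q = embed(query + " " + instruction)   # instruction becomes part of the query
    scores = [q @ embed(d) for d in docs]
    return docs[int(np.argmax(scores))]

docs = ["python tutorial", "python snake biology"]
# The same query returns different documents under different instructions.
hit = retrieve("python", "prefer biology not programming", docs)
```

What Promptriever's instruction training set buys is a dense encoder that actually changes its relevance judgments in response to that appended text, rather than ignoring it.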
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey - a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy. Within this framework, the authors thoroughly review the existing literature on each dimension, create an evaluation benchmark, and conduct comprehensive evaluations of a variety of proprietary and open-source models. They also identify potential challenges for future research based on their findings. Authors from various institutions, including Tsinghua University and Microsoft Research.
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems - a framework for training sentence Transformer models with interaction data. Models trained with beeFormer can transfer knowledge between datasets while outperforming not only semantic similarity sentence Transformers but also traditional collaborative filtering methods. Training on multiple datasets from different domains accumulates knowledge in a single model, unlocking the possibility of training universal, domain-agnostic sentence Transformer models to mine text representations for recommender systems. Authors from Czech Technical University in Prague.
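The underlying objective can be pictured as aligning two similarity matrices: one from text embeddings, one from interactions. The toy sketch below uses random placeholders and plain gradient descent on the embeddings, whereas the real method trains a sentence Transformer end to end:

```python
import numpy as np

# Toy sketch of the beeFormer idea: nudge text-derived item embeddings so
# their pairwise similarities match interaction (co-occurrence)
# similarities. Embeddings and the user-item matrix are random stand-ins.

rng = np.random.default_rng(0)
n_items, dim = 4, 3
E = rng.standard_normal((n_items, dim))                    # item embeddings
R = rng.integers(0, 2, size=(50, n_items)).astype(float)   # user-item interactions

S_target = R.T @ R            # item-item co-occurrence from interactions
S_target /= S_target.max()    # normalize to [0, 1]

def loss(E):
    # Mean squared gap between embedding similarity and interaction similarity.
    return float(((E @ E.T - S_target) ** 2).mean())

def step(E, lr=0.02):
    # Analytic gradient of the loss above (S_target is symmetric).
    S = E @ E.T
    grad = (4.0 / S.size) * (S - S_target) @ E
    return E - lr * grad

before = loss(E)
for _ in range(100):
    E = step(E)
after = loss(E)  # the similarity gap shrinks as embeddings align
```

Because the gradient flows into the text encoder in the real setup, the resulting embeddings stay usable on unseen items and datasets.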
🧠 Sources of Inspiration
Quickstart Guide to using the OpenAI o1 model - advice on prompting (different from previous models!), managing the context window, and minimizing costs
An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability - If you are curious about the current SOTA in mechanistic interpretability, this short read is for you!
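For a taste of what that read covers: a sparse autoencoder expands an LLM activation vector into a much larger, mostly-zero feature vector and reconstructs it back. A minimal forward pass, with random placeholder weights rather than trained values:

```python
import numpy as np

# Minimal sketch of a sparse autoencoder (SAE) forward pass as used in
# mechanistic interpretability. Weights are random placeholders here;
# a trained SAE's features tend to align with interpretable concepts.

rng = np.random.default_rng(0)
d_model, d_features = 8, 32   # feature dim is larger than the activation dim

W_enc = rng.standard_normal((d_model, d_features)) * 0.1
b_enc = np.zeros(d_features)
W_dec = rng.standard_normal((d_features, d_model)) * 0.1

def sae_forward(x):
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU keeps features non-negative
    x_hat = f @ W_dec                        # reconstruction of the activation
    return f, x_hat

x = rng.standard_normal(d_model)             # stand-in for an LLM activation
features, x_hat = sae_forward(x)
# Training minimizes ||x - x_hat||^2 + lambda * ||f||_1 to encourage sparsity.
```

The L1 penalty is what pushes most feature activations to exactly zero, which is why individual features end up human-interpretable.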
World Model Challenge from 1X - minimize training loss on an extremely diverse robot dataset and win $10k
One missing piece in Vision and Language: A Survey on Comics Understanding - curated list of research papers and resources focusing on Comics Understanding [GitHub] [Paper]
If you’re interested in Responsible AI topics, I just launched a series of Coursera courses on Interpretable ML, Explainable ML, and Developing Explainable AI Systems! These courses dive deep into Responsible AI topics with code tutorials and case studies. In addition to traditional approaches, you will learn cutting-edge topics like Mechanistic Interpretability, XAI in LLMs, and XAI in Generative Computer Vision.
Developing Explainable AI (XAI)
Interpretable Machine Learning
Explainable Machine Learning (XAI)