🗞 This Week in News
Pushing the frontiers of audio generation - this article from Google DeepMind covers use cases for audio generation, the team's research and novel techniques, and how they scaled audio generation. I thought it was particularly interesting that they fine-tuned the model on unscripted conversations from a number of voice actors, complete with the “umm”s and “aah”s of real conversation.
🥁 Interesting Products & Features
PII & PHI Detection Synthetic Dataset from Gretel for use in tasks like Named Entity Recognition and sensitive data redaction, which typically require access to private information. By using synthetic data, developers can simulate real-world conditions and safely build models that handle sensitive information while ensuring privacy compliance. They also introduce GLiNER models fine-tuned on this dataset.
Robots that feel? Digit 360 from Meta is “a tactile fingertip with human-level multimodal sensing capabilities” for robots. Researchers can apply to use them before their release.
SmolLM2 from HuggingFace - SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device.
📄 Interesting Papers
ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate - Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless a hyperparameter (i.e., β2) is chosen in a problem-dependent manner. This paper proposes a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate with any choice of β2 without depending on the bounded noise assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second moment estimate and changing the order of the momentum update and the normalization by the second moment estimate. ADOPT achieves superior results compared to Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning. Authors from The University of Tokyo.
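To make the two ADOPT changes concrete, here is a minimal NumPy sketch of the update as I understand it from the description above: the gradient is normalized by the previous second moment estimate (which does not yet include the current gradient), and that normalization happens before the momentum update rather than after it. The function name `adopt_minimize`, the toy quadratic objective, and the hyperparameter defaults are my own assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np

def adopt_minimize(grad_fn, theta0, lr=0.05, beta1=0.9, beta2=0.9999,
                   eps=1e-6, steps=2000):
    """Illustrative ADOPT-style loop (hypothetical helper, assumed defaults)."""
    theta = np.asarray(theta0, dtype=float).copy()
    g = grad_fn(theta)
    v = g * g                   # second moment initialized from the first gradient
    m = np.zeros_like(theta)    # momentum starts at zero
    for _ in range(steps):
        g = grad_fn(theta)
        # Normalize by the PREVIOUS second moment, so the current gradient
        # is not part of its own normalizer, and do this BEFORE momentum.
        normalized = g / np.maximum(np.sqrt(v), eps)
        m = beta1 * m + (1.0 - beta1) * normalized
        theta = theta - lr * m
        # Only now fold the current gradient into the second moment.
        v = beta2 * v + (1.0 - beta2) * g * g
    return theta

# Toy problem (an assumption for this sketch): minimize ||theta - target||^2.
target = np.array([1.0, 2.0])
result = adopt_minimize(lambda th: 2.0 * (th - target), theta0=[5.0, -3.0])
print(result)
```

For comparison, standard Adam updates the second moment with the current gradient first and divides the momentum by it afterwards; the reordering above is the part the paper credits for convergence with any β2.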
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable - This paper proposes 'B-cosification' to transform existing pre-trained models to become “inherently interpretable”. They evaluate their approach for both convolutional neural networks and vision transformers. They find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability, while often outperforming them in terms of classification performance at a fraction of the training cost. Authors from Max Planck Institute for Informatics.
Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge - This paper introduces CHAIC, an embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints (e.g., unable to reach high places or confined to a wheelchair) in performing common household or outdoor tasks as efficiently as possible. They benchmark planning- and learning-based baselines on the challenge and introduce a new method that leverages Large Language Models and behavior modeling. Authors from various institutions, including Carnegie Mellon, UMass Amherst, and MIT.
🧠 Sources of Inspiration
gsplat - an open-source library for CUDA-accelerated differentiable rasterization of 3D Gaussians with Python bindings. This library makes Gaussian splatting faster and more memory efficient.