🗞 This Week in News
LLMs for updating software - Amazon claims their coding assistant saved the equivalent of 4,500 developer-years of work and delivered $260M in efficiency gains.
AGI Safety and Alignment at Google DeepMind - an interesting high-level overview of the current state of safety and alignment work at Google DeepMind, with specific callouts to frontier safety, mechanistic interpretability, amplified oversight, and causal alignment.
🥁 Interesting Products & Features
Luma Labs releases Dream Machine 1.5 - text-to-video with custom text rendering and improved image-to-video generation
Ideogram 2.0 - another text-to-image model. They claim it outperforms other models on image-text alignment and rendering accuracy, and it offers color palette control, greater realism, and accurate text rendering. They are also hiring several ML engineers right now.
You no longer need Discord to try Midjourney - their web platform is now available and open to everyone
Jamba-1.5 - the next generation of hybrid SSM-Transformer instruction-following foundation models from AI21. Released under the Jamba Open Model License, which permits both research and commercial use under its terms.
📄 Interesting Papers
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model - combining diffusion and language models. The authors provide a recipe for training a multi-modal model over discrete and continuous data: Transfusion combines the language-modeling loss (next-token prediction) with a diffusion loss to train a single transformer over mixed-modality sequences. They demonstrate that scaling the recipe to 7B parameters and 2T multi-modal tokens produces a model that can generate images and text on par with similarly sized diffusion models and language models. Authors from Meta, Waymo, and USC.
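The core idea is a single objective over one mixed-modality sequence: cross-entropy on the discrete text positions plus a DDPM-style noise-prediction loss on the continuous image positions. Here is a minimal sketch of that combined loss; the function name, tensor shapes, and the balancing coefficient `lam` are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def transfusion_loss(text_logits, text_targets, noise_pred, noise_true, lam=5.0):
    # Next-token prediction loss on the discrete (text) positions.
    lm_loss = F.cross_entropy(
        text_logits.reshape(-1, text_logits.size(-1)),
        text_targets.reshape(-1),
    )
    # Denoising loss on the continuous (image-latent) positions: the model
    # predicts the noise that was added to the latents.
    diffusion_loss = F.mse_loss(noise_pred, noise_true)
    # One objective over the mixed-modality sequence; `lam` balances the
    # two terms (an assumed stand-in for the paper's coefficient).
    return lm_loss + lam * diffusion_loss

# Toy usage with random tensors:
logits = torch.randn(2, 16, 32000)          # (batch, text positions, vocab)
targets = torch.randint(0, 32000, (2, 16))  # next-token targets
noise_pred = torch.randn(2, 64, 8)          # predicted noise on image latents
noise_true = torch.randn(2, 64, 8)          # actual noise added
print(transfusion_loss(logits, targets, noise_pred, noise_true).item())
```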
To Code, or Not To Code? Exploring Impact of Code in Pre-training - Including code in the pre-training mixture, even for models not specifically designed for code, has become common practice in LLM pre-training based on anecdotal consensus. This paper systematically investigates the impact of code data on general performance. The authors conduct extensive ablations and evaluate across a broad range of tasks and benchmarks for model sizes from 470M to 2.8B parameters. They find consistently that code is a critical building block for generalization far beyond coding tasks, and that improvements to code quality have an outsized impact across all tasks. In particular, compared to text-only pre-training, adding code yields up to an 8.2% relative increase in natural language reasoning, a 4.2% increase in world knowledge, a 6.6% improvement in generative win-rates, and a 12x boost in code performance. Authors from Cohere.
"Image, Tell me your story!" Predicting the original meta-context of visual misinformation - How can we fact-check generated images? The authors of this paper propose to do this by first explaining what is actually true about the image. This would allow fact-checkers to focus their efforts on check-worthy visual content, engage in counter-messaging before misinformation spreads widely, and make their explanation more convincing. This paper introduces the task of automated image contextualization by creating 5Pils, a dataset of 1,676 fact-checked images with question-answer pairs about their original meta-context. Authors from TU Darmstadt and KU Leuven.
A Survey of Embodied Learning for Object-Centric Robotic Manipulation - a comprehensive survey of the latest advancements in embodied learning for object-centric robotic manipulation. The authors categorize existing work into three main branches: 1) embodied perceptual learning, which aims to predict object pose and affordance from various data representations; 2) embodied policy learning, which focuses on generating optimal robotic decisions using methods such as reinforcement learning and imitation learning; 3) embodied task-oriented learning, designed to optimize the robot's performance based on the characteristics of different tasks in object grasping and manipulation. A great read if you are interested in embodied learning and robotics. Authors from Hong Kong Polytechnic University and Tsinghua University.
Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask - Time Series Diffusion Embedding (TSDE) is the first diffusion-based self-supervised representation learning approach for time series. TSDE segments time series data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask, then applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. A reverse diffusion process, conditioned on the embeddings, is trained to predict the noise added to the masked part. Experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. Authors from various institutions, including EQT Group and KTH Royal Institute of Technology.
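The IIF mask is easy to picture: each training example hides either scattered timesteps (imputation), a contiguous interior block (interpolation), or the suffix (forecasting). Below is a minimal sketch based on that high-level description; the masking ratio and sampling details are assumptions, not TSDE's exact procedure.

```python
import numpy as np

def iif_mask(length, mode, rng, ratio=0.2):
    # mask == 1 marks the "masked" part the diffusion process must
    # reconstruct; mask == 0 marks the observed part fed to the embedding
    # function. `ratio` is an illustrative choice.
    mask = np.zeros(length, dtype=int)
    n = max(1, int(length * ratio))
    if mode == "imputation":
        # Scattered random timesteps are hidden.
        mask[rng.choice(length, size=n, replace=False)] = 1
    elif mode == "interpolation":
        # A contiguous block inside the series is hidden.
        start = rng.integers(0, length - n + 1)
        mask[start:start + n] = 1
    elif mode == "forecasting":
        # The suffix is hidden, so the model must predict the future.
        mask[length - n:] = 1
    return mask

rng = np.random.default_rng(0)
for mode in ("imputation", "interpolation", "forecasting"):
    print(mode, iif_mask(20, mode, rng))
```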
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models - This paper establishes criteria for gender-equality benchmarks spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realism. The authors also develop GenderPair, a pair-based benchmark for assessing gender bias in LLMs that includes previously overlooked gender groups such as transgender and non-binary individuals. They propose debiasing techniques that combine counterfactual data augmentation with specialized fine-tuning strategies, and experiments demonstrate bias reductions averaging above 35% across 17 different LLMs on various gender bias benchmarks. Authors from the University of Science and Technology of China and Nanyang Technological University.
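Counterfactual data augmentation is the more mechanical of the two debiasing ingredients: pair each training sentence with a copy in which gendered terms are swapped, so the model cannot rely on gender-dependent associations. A toy sketch follows, with a deliberately tiny swap table that is not the paper's actual method or word list.

```python
# Toy sketch of counterfactual data augmentation for gender debiasing.
# The swap table is illustrative only; real implementations handle
# part-of-speech ambiguity (e.g. "her" as possessive vs. object pronoun)
# and use far larger term lists.
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",
    "his": "her", "man": "woman", "woman": "man",
}

def counterfactual(sentence: str) -> str:
    # Swap gendered tokens to build the counterfactual copy; training on
    # both copies discourages gender-dependent associations.
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.lower().split())

sentence = "she said he is a nurse and she is a doctor"
print([sentence, counterfactual(sentence)])
```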
Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes - a fine-tuning approach for the Segment Anything Model (SAM) that allows variable input image sizes. Authors from Meijo University.
🧠 Sources of Inspiration
Prompting tutorials/courses from Anthropic (for beginners and developers) - free on GitHub!
Microsoft open-sourced Aurora, a pre-trained foundation model of the atmosphere, along with a fine-tuned version for high-resolution weather forecasting. An air pollution forecasting model is also coming soon.
Anthropic releases the system prompts for Claude - pretty interesting if you are into prompt engineering.
Cover photo from the Ideogram 2.0 press release.