🗞 This Week in News
The AI Scientist from Sakana AI - a framework for fully automated scientific discovery that uses LLMs to perform research independently and communicate the findings. The “AI Scientist” generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings in a full scientific paper, and then runs a simulated review process for evaluation. Interestingly, although it can execute experiments and write a paper for less than $15, one of its limitations is that it struggles with the seemingly simple task of comparing the magnitude of two numbers. The question remains: did the AI Scientist write the paper about itself? [See Paper]
AI Risk Repository from MIT - A curated collection of over 700 risks from artificial intelligence, categorized by cause and risk domain. [Preprint]
🥁 Interesting Products & Features
FLUX.1 from Black Forest Labs - an impressive image generation model with strong prompt adherence, legible text rendering, and precise color control.
SWE-bench Verified - OpenAI releases a human-validated subset of SWE-bench to more reliably evaluate AI models' ability to solve real-world software issues.
Grok-2 Beta Release - Performs similarly to Claude 3.5 Sonnet and GPT-4 Turbo, so not quite SOTA, but pretty close. The interface is available on X (formerly Twitter).
Hermes 3 from Nous Research - Hermes 3 was created by fine-tuning Llama 3.1 8B, 70B, and 405B on a dataset of primarily synthetically generated responses. The model performs comparably to Llama 3.1 but has stronger reasoning and creative capabilities.
Genie from Cosine - another “AI Software Engineer”. While it performs better than any other system out there, it is still a long way from replacing the human SWE.
📄 Interesting Papers
ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation - CLIP is known to have biases. This paper explicitly models and rectifies the bias in CLIP to improve unsupervised semantic segmentation. Authors from Beihang University.
DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model - allows for the use of a flat, lensless camera to reduce camera size and weight. A diffusion model reconstructs images from the very poor-quality raw captures such a camera produces. Authors from Tel Aviv University.
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 - A few weeks ago we shared the release of Gemma Scope, the results of Google DeepMind’s experiments with mechanistic interpretability. Here is the paper recently released by the team sharing training details, infrastructure requirements, and open problems. Authors from Google DeepMind.
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers - Improves the reasoning of small LLMs without fine-tuning. First, a target small language model (SLM) augments Monte Carlo Tree Search with a set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM acts as a discriminator to verify each trajectory generated by the target SLM. Trajectories that both models agree on are considered mutually consistent and are therefore more likely to be correct. Authors from Microsoft Research.
ECG-FM: An Open Electrocardiogram Foundation Model - ECG-FM has a transformer-based architecture and is pretrained on 2.5 million samples using ECG-specific augmentations and contrastive learning, as well as a continuous signal masking objective. Evaluation includes a diverse range of downstream tasks, including predicting ECG interpretation labels, reduced left ventricular ejection fraction, and abnormal cardiac troponin. [GitHub] Authors from University Health Network and University of Toronto.
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing - proposes a set of methods to improve consistency in inpainting across different views. Authors from Fudan University.
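For readers curious about the "mutual consistency" idea in the Mutual Reasoning paper above, here is a minimal Python sketch of the filtering step only. It is not the authors' implementation: the tree search is omitted, both models are replaced by toy stand-ins, and the names `Trajectory`, `mutually_consistent`, `complete`, and `split` are hypothetical. It also assumes the discriminator verifies a trajectory by completing a truncated prefix of it and comparing final answers.

```python
from typing import Callable, List, NamedTuple


class Trajectory(NamedTuple):
    steps: List[str]  # intermediate reasoning steps
    answer: str       # final answer reached by those steps


def mutually_consistent(
    candidates: List[Trajectory],
    complete: Callable[[List[str]], str],
    split: float = 0.5,
) -> List[Trajectory]:
    """Keep trajectories whose answer the second model reproduces from a prefix.

    `complete` is a hypothetical stand-in for the discriminator SLM: it takes
    the first part of a trajectory and returns the answer it arrives at.
    """
    kept = []
    for traj in candidates:
        # Show the discriminator only the first part of the reasoning.
        cut = max(1, int(len(traj.steps) * split))
        prefix = traj.steps[:cut]
        # Agreement between the two models marks the trajectory as consistent.
        if complete(prefix) == traj.answer:
            kept.append(traj)
    return kept


# Toy candidates, standing in for trajectories produced by the target SLM.
candidates = [
    Trajectory(["3 apples + 4 apples", "3 + 4 = 7"], "7"),
    Trajectory(["3 apples + 4 apples", "3 + 4 = 8"], "8"),
]


def toy_discriminator(prefix: List[str]) -> str:
    # Assumption for illustration: the second model always finishes with "7".
    return "7"


agreed = mutually_consistent(candidates, toy_discriminator)
print([t.answer for t in agreed])
```

In this toy run only the first trajectory survives, because the stand-in discriminator reaches the same answer from its prefix; the second is discarded as inconsistent.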
🧠 Sources of Inspiration
Digital Scents from Osmo - identifying which molecules are associated with which types of aromas and then training an AI to recognize and identify specific patterns? Sounds like science fiction, but Osmo is working on this. Their team is also working on a method to recreate smells using molecular synthesis, allowing a computer to “smell” something and then send that information to another computer for resynthesis. They hope this AI could be used to smell diseases like cancer.
OpenResearcher: Unleashing AI for Accelerated Scientific Research - open-source GitHub repository for scientific research.
Vision-Language Model Evaluation Repository from Facebook Research - a comprehensive set of tools and scripts for evaluating VLMs, including implementations of 40 evaluation benchmarks.
Long Context RAG Performance of LLMs - this blog post explores the impact of increased context length on the quality of RAG applications, based on over 2,000 experiments across 13 popular open source and commercial LLMs. The authors found that the performance of most models decreases beyond a certain context size, and they identified unique failure patterns across different models.
Cover image from Digital Scents from Osmo.