🗞 This Week in News
Consent in Crisis: The Rapid Decline of the AI Data Commons (Report from the Data Provenance Initiative) - The authors audit 14k web domains and demonstrate a proliferation of AI-specific clauses limiting use, acute differences in restrictions across AI developers, and widespread inconsistencies between websites’ expressed intentions in their Terms of Service and their robots.txt. In a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in the open training corpus C4, or 28%+ of its most actively maintained, critical sources, fully restricted from use. Counting Terms of Service crawling restrictions as well, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws of general-purpose AI systems.
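The robots.txt half of this audit boils down to checking which user agents each site disallows. A minimal sketch of that check with Python's standard `urllib.robotparser` (the crawler names are real AI crawler user agents, but the domain list here is purely illustrative, and this captures only robots.txt, not the Terms of Service language the report shows often disagrees with it):

```python
from urllib.robotparser import RobotFileParser

# Illustrative sample; the report audits ~14k domains.
DOMAINS = ["example.com", "example.org"]
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai"]

for domain in DOMAINS:
    rp = RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    try:
        rp.read()  # fetch and parse the site's robots.txt
    except OSError:
        print(f"{domain}: could not fetch robots.txt")
        continue
    blocked = [bot for bot in AI_CRAWLERS if not rp.can_fetch(bot, f"https://{domain}/")]
    print(f"{domain}: blocks {blocked or 'no listed AI crawlers'}")
```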
Energy consumption of AI - What are the actual ramifications of the explosive growth of AI when it comes to power consumption? How much more expensive is it to run an AI model than to use the next-best method? Do we have the resources to apply AI to tasks we weren’t using it for before, and is it responsible to do so? Is it worth it? This article has the numbers to try to answer these questions and more.
Enhance RAG results using GraphRAG - GraphRAG uses LLMs to create a comprehensive knowledge graph that details entities and their relationships from any collection of text documents. This graph enables GraphRAG to leverage the semantic structure of the data and generate responses to complex queries that require a broad understanding of the entire text.
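A minimal sketch of the core GraphRAG idea described above: use an LLM to extract entities and relations into a graph, then answer queries from the graph neighborhood instead of raw chunks. This is not the GraphRAG library's actual API; `llm` is any text-in/text-out callable, the prompts and JSON format are assumptions, and networkx is just a convenient graph store.

```python
import json
import networkx as nx

def extract_triples(llm, chunk: str) -> list[tuple[str, str, str]]:
    """Ask an LLM for (subject, relation, object) triples in a text chunk.
    Prompt and output format are illustrative, not GraphRAG's own prompts."""
    prompt = ("Extract (subject, relation, object) triples from the text below. "
              "Return a JSON list of 3-element lists.\n\n" + chunk)
    return [tuple(t) for t in json.loads(llm(prompt))]

def build_knowledge_graph(llm, chunks: list[str]) -> nx.MultiDiGraph:
    graph = nx.MultiDiGraph()
    for chunk in chunks:
        for subj, rel, obj in extract_triples(llm, chunk):
            graph.add_edge(subj, obj, relation=rel, source=chunk[:80])
    return graph

def neighborhood_context(graph: nx.MultiDiGraph, entity: str, hops: int = 2) -> str:
    """Collect facts around an entity to feed into the generation prompt."""
    nodes = nx.ego_graph(graph, entity, radius=hops, undirected=True).nodes
    facts = [f"{u} -[{d['relation']}]-> {v}"
             for u, v, d in graph.edges(data=True)
             if u in nodes and v in nodes]
    return "\n".join(facts)
```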
🥁 Interesting Products & Features
Qwen2-VL from Alibaba can analyze videos more than 20 minutes long! Apache 2.0 licensed and competitive with GPT-4o mini, with video understanding and function calling.
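A minimal sketch of video Q&A with the Hugging Face transformers integration, assuming the 7B instruct checkpoint and the `qwen_vl_utils` helper from the model card; exact names and processor arguments may differ in newer releases.

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped with the model card

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "file:///path/to/long_video.mp4"},
        {"type": "text", "text": "Summarize the main events in this video."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```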
📄 Interesting Papers
The AdEMAMix Optimizer: Better, Faster, Older - Momentum-based optimizers typically rely on an Exponential Moving Average (EMA) of gradients, which exponentially decays the contribution of older gradients. This work questions the use of a single EMA to accumulate past gradients and empirically demonstrates how this choice can be sub-optimal: a single EMA cannot simultaneously give a high weight to the immediate past and a non-negligible weight to older gradients. To resolve this, the authors propose AdEMAMix, a modification of the Adam optimizer with a mixture of two EMAs to better take advantage of past gradients. Experiments show that with AdEMAMix, gradients can stay relevant for tens of thousands of steps; they help models converge faster and significantly slow down forgetting during training. Authors from EPFL and Apple.
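A minimal sketch of the core update as I read it: a second, slow EMA is added to Adam's numerator with weight alpha. The paper's warmup schedulers for alpha and beta3 are omitted, and the hyperparameter defaults below are illustrative rather than authoritative.

```python
import torch

def ademamix_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999, 0.9999),
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix-style update on a plain tensor (simplified sketch).
    m1: fast EMA (bias-corrected, as in Adam); m2: slow EMA mixed into the
    numerator with weight alpha; v: second-moment EMA as in Adam."""
    beta1, beta2, beta3 = betas
    state["step"] += 1
    t = state["step"]
    state["m1"].mul_(beta1).add_(grad, alpha=1 - beta1)        # fast EMA
    state["m2"].mul_(beta3).add_(grad, alpha=1 - beta3)        # slow EMA
    state["v"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m1_hat = state["m1"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    update = (m1_hat + alpha * state["m2"]) / (v_hat.sqrt() + eps)
    param.add_(update + weight_decay * param, alpha=-lr)       # decoupled weight decay

# Usage per parameter tensor p (acting on p.data and p.grad):
# state = {"step": 0, "m1": torch.zeros_like(p), "m2": torch.zeros_like(p),
#          "v": torch.zeros_like(p)}
# ademamix_step(p.data, p.grad, state)
```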
WarpAdam: A new Adam optimizer based on Meta-Learning approach - In the conventional Adam optimizer, gradients are used to compute estimates of the gradient mean and variance, which then update the model parameters. By contrast, this approach introduces a learnable distortion matrix, “P”, which linearly transforms gradients. This transformation slightly adjusts gradients at each iteration, enabling the optimizer to better adapt to the characteristics of different datasets. Experimental results across various tasks and datasets validate the adaptability of this optimizer, which integrates the 'warped gradient descent' concept. Authors from Beijing University of Chemical Technology, Hohai University, and FuZhou University.
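A minimal sketch of the idea as summarized above: a learnable matrix P warps the gradient before the usual Adam moment updates. How P itself is meta-learned is left out, and flattening all parameters into one vector is just for illustration; this is not the authors' reference implementation.

```python
import torch

class WarpedAdamSketch:
    """Illustrative WarpAdam-style update for one flattened parameter vector.
    P is a learnable distortion matrix applied to the gradient before the
    standard Adam moment estimates; training P (the meta-learning part) is omitted."""
    def __init__(self, dim, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.P = torch.eye(dim, requires_grad=True)  # identity init = plain Adam
        self.m = torch.zeros(dim)
        self.v = torch.zeros(dim)
        self.t = 0
        self.lr, self.betas, self.eps = lr, betas, eps

    def step(self, param, grad):
        """param: plain tensor (e.g. p.data); grad: flattened gradient vector."""
        beta1, beta2 = self.betas
        self.t += 1
        warped = self.P.detach() @ grad              # linearly transform the gradient
        self.m = beta1 * self.m + (1 - beta1) * warped
        self.v = beta2 * self.v + (1 - beta2) * warped ** 2
        m_hat = self.m / (1 - beta1 ** self.t)
        v_hat = self.v / (1 - beta2 ** self.t)
        param -= self.lr * m_hat / (v_hat.sqrt() + self.eps)
        return param
```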
KAN See In the Dark - Current low-light image enhancement methods struggle to fit the complex nonlinear relationship between normal and low-light images due to uneven illumination and noise. Kolmogorov-Arnold networks (KANs) feature spline-based convolutional layers and learnable activation functions, which can effectively capture nonlinear dependencies. This paper proposes a KAN-Block based on KANs for low-light image enhancement, alleviating the limitations of current methods constrained by linear network structures and a lack of interpretability. Authors from Chongqing University of Technology.
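For context, a minimal sketch of the KAN idea of learnable per-edge activation functions. This stand-in uses a Gaussian RBF basis purely for brevity, whereas the paper's KAN-Block is built from spline-based convolutional layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleKANLayer(nn.Module):
    """Simplified KAN-style layer: each (input, output) edge applies a learnable
    1-D function, parameterized as a mixture of Gaussian radial basis functions
    plus a SiLU base term (an illustrative stand-in for B-spline bases)."""
    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(*grid_range, num_basis))
        self.width = (grid_range[1] - grid_range[0]) / (num_basis - 1)
        self.coeffs = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)
        self.base_weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)

    def forward(self, x):  # x: (batch, in_dim)
        # RBF features of each scalar input: (batch, in_dim, num_basis)
        rbf = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # Learnable per-edge activations, summed over inputs.
        spline_out = torch.einsum("bik,oik->bo", rbf, self.coeffs)
        base_out = F.silu(x) @ self.base_weight.T
        return spline_out + base_out
```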
Planning In Natural Language Improves LLM Search For Code Generation - The authors hypothesize that a core missing component in LLM search for code generation is output diversity: models repeatedly sample highly similar, yet incorrect, generations, making search inefficient. They demonstrate that this lack of diversity can be mitigated by searching over candidate plans for solving a problem in natural language, and propose PLANSEARCH, a new search algorithm. PLANSEARCH generates a diverse set of observations about the problem and then uses these observations to construct plans for solving it. By searching over plans in natural language rather than directly over code solutions, PLANSEARCH explores a significantly more diverse range of potential solutions than baseline search methods. Authors from Scale AI.
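A rough sketch of that search loop: generate observations, combine subsets into natural-language plans, translate each plan into code, and keep whatever passes the public tests. The prompts, the way observation subsets are combined, and the test filtering are simplifications of the paper's method; `llm` is any text-in/text-out callable.

```python
import itertools
import random
import subprocess
import sys

def run_tests(code: str, tests: list[tuple[str, str]]) -> bool:
    """Rough public-test check: run the candidate on each input, compare stdout."""
    for stdin, expected in tests:
        try:
            result = subprocess.run([sys.executable, "-c", code], input=stdin,
                                    capture_output=True, text=True, timeout=10)
        except subprocess.TimeoutExpired:
            return False
        if result.stdout.strip() != expected.strip():
            return False
    return True

def plan_search(llm, problem: str, tests, num_observations=8, subset_size=2, budget=20):
    """Search over natural-language plans instead of directly over code."""
    # 1. Generate diverse observations about the problem.
    obs = [llm(f"State one non-obvious observation about this problem:\n{problem}")
           for _ in range(num_observations)]
    # 2. Combine subsets of observations into candidate plans.
    subsets = list(itertools.combinations(obs, subset_size))
    random.shuffle(subsets)
    for subset in subsets[:budget]:
        plan = llm("Using these observations, write a step-by-step plan to solve the "
                   f"problem:\n{problem}\nObservations:\n" + "\n".join(subset))
        # 3. Translate the plan into code and check it against public tests.
        code = llm(f"Implement this plan as a Python solution:\n{plan}\nProblem:\n{problem}")
        if run_tests(code, tests):
            return code, plan
    return None, None
```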
🧠 Sources of Inspiration
Text2X Repository - an open collection of state-of-the-art Text-to-X (where X can be anything) methods (papers, code, and datasets)
iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models - Python package designed to incrementally construct consistent knowledge graphs with resolved entities and relations by leveraging large language models for entity and relation extraction from text documents.
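The "resolved entities" part of incremental construction usually means matching newly extracted entities against nodes already in the graph before adding them. A generic sketch of that matching step, not iText2KG's actual API; `embed` is any text-embedding callable and the similarity threshold is illustrative.

```python
import numpy as np

def resolve_entity(name: str, embed, graph_entities: dict[str, np.ndarray],
                   threshold: float = 0.85) -> str:
    """Map a newly extracted entity to an existing node if similar enough,
    otherwise register it as a new node and return its name."""
    vec = embed(name)
    best_name, best_sim = None, -1.0
    for existing, existing_vec in graph_entities.items():
        sim = float(vec @ existing_vec /
                    (np.linalg.norm(vec) * np.linalg.norm(existing_vec)))
        if sim > best_sim:
            best_name, best_sim = existing, sim
    if best_name is not None and best_sim >= threshold:
        return best_name            # merge with the existing entity
    graph_entities[name] = vec      # otherwise add as a new entity
    return name
```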
Anthropic Quickstarts - a collection of projects designed to help developers quickly get started with building deployable applications using the Anthropic API.
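The quickstarts all revolve around calls like the following; a minimal sketch with the `anthropic` Python SDK (the model id is just an example, substitute whichever current Claude model you target):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # example model id
    max_tokens=512,
    messages=[{"role": "user",
               "content": "Outline a customer-support bot in three bullet points."}],
)
print(message.content[0].text)
```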
Cover photo from The AdEMAMix Optimizer: Better, Faster, Older.