🗞 This Week in News
This is the last regularly scheduled Spill the GPTea of 2024! Happy Holidays!
🥁 Interesting Products & Features
Gemini 2.0 - updates include native image and audio output and native tool use. The Gemini 2.0 Flash experimental model is now available to all Gemini users. And they also launched a new feature this past week called Deep Research, which uses advanced reasoning and long context capabilities to act as a research assistant, exploring complex topics and compiling reports on your behalf.
BrowserGym - an open, easy-to-use and extensible framework to accelerate the field of web agent research. AgentLab is a framework to implement, test, and evaluate web agents on all BrowserGym benchmarks.
📄 Interesting Papers
Let Curves Speak: A Continuous Glucose Monitor based Large Sensor Foundation Model for Diabetes Management - large sensor models (LSMs) to capture latent knowledge in CGM data by modeling patients as sequences of glucose readings over time. CGM-LSM is pretrained on 15.96 million glucose records from 592 diabetes patients for near-future glucose prediction. They evaluated CGM-LSM against state-of-the-art methods using the OhioT1DM dataset across various metrics, prediction horizons, and unseen patients. This approach of leveraging pretraining to uncover latent glucose generation patterns in sensor data is very interesting! Authors from Johns Hopkins University.
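The "near-future prediction" framing can be illustrated with a toy sliding-window setup: slice a CGM series into (history, target) pairs, where the model sees a window of past readings and predicts a reading some horizon ahead. The window and horizon sizes below are our own illustration, not the paper's actual configuration:

```python
import numpy as np

def make_forecast_windows(glucose, history_len=24, horizon=6):
    """Slice a CGM series (e.g. one reading per 5 min) into (history, target)
    pairs: history_len=24 gives 2 h of context, horizon=6 predicts 30 min ahead.
    """
    X, y = [], []
    for start in range(len(glucose) - history_len - horizon + 1):
        X.append(glucose[start : start + history_len])
        y.append(glucose[start + history_len + horizon - 1])
    return np.array(X), np.array(y)

# toy series: 40 readings drifting from 90 to 180 mg/dL
series = np.linspace(90, 180, 40)
X, y = make_forecast_windows(series)
print(X.shape, y.shape)  # (11, 24) (11,)
```

A foundation model like CGM-LSM would be pretrained on millions of such windows before being evaluated on unseen patients.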
Probabilistic weather forecasting with machine learning - introducing GenCast, a probabilistic weather model with greater skill and speed than the top operational medium-range weather forecast in the world, ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts. GenCast is an ML weather prediction method, trained on decades of reanalysis data. GenCast generates an ensemble of stochastic 15-day global forecasts, at 12-h steps and 0.25° latitude–longitude resolution, for more than 80 surface and atmospheric variables, in 8 min. It has greater skill than ENS on 97.2% of 1,320 targets. Blog. Authors from Google DeepMind.
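Probabilistic ensemble forecasts like GenCast's are commonly scored with the continuous ranked probability score (CRPS), which rewards ensembles that are both sharp and well centered on the observation. A minimal sketch of the empirical ensemble CRPS, as our own illustration rather than DeepMind's evaluation code:

```python
import numpy as np

def ensemble_crps(members, obs):
    """Empirical CRPS of an ensemble forecast for a scalar observation:
    CRPS = E|X - y| - 0.5 * E|X - X'|, estimated over ensemble members.
    Lower is better.
    """
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# A sharp, well-centered ensemble scores lower than a diffuse one:
print(ensemble_crps([9.8, 10.1, 10.0, 10.2], 10.0))
print(ensemble_crps([5.0, 15.0, 8.0, 12.0], 10.0))
```

GenCast's "97.2% of 1,320 targets" result comes from comparing scores like this against ENS across many variables, lead times, and pressure levels.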
AI Red-Teaming is a Sociotechnical System. Now What? - This essay calls for collaboration between computer scientists and social scientists to study the sociotechnical systems surrounding AI technologies, including the work of red-teaming, to avoid repeating the mistakes of the recent past. The authors highlight the importance of understanding the values and assumptions behind red-teaming, the labor involved, and the psychological impacts on red-teamers. Authors from Microsoft Research.
From hearing to seeing: Linking auditory and visual place perceptions with soundscape-to-image generative artificial intelligence - proposes a soundscape-to-image diffusion model aiming to visualize soundscapes through the generation of street view images. By creating audio-image pairs, acoustic environments are first represented as high-dimensional semantic audio vectors. The model can then translate those semantic audio vectors into visual representations of the place. They evaluated the model using both machine-based and human-centered approaches and showed that the generated street view images align with common human perceptions and accurately recreate several key street elements of the original soundscapes. News Article. Authors from Wuhan University, University of South Carolina, and UT Austin.
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations - introduces LAION-SG, a large-scale dataset with high-quality structural annotations of scene graphs (SG), which precisely describe attributes and relationships of multiple objects, effectively representing the semantic structure in complex scenes. Based on LAION-SG, they trained a new model SDXL-SG to incorporate structural annotation information into the generation process. Models trained on LAION-SG outperform models trained on existing datasets at complex scene generation. GitHub. Author affiliations vary but include Zhejiang University and Alibaba.
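A scene-graph annotation of this kind can be thought of as object nodes with attributes plus (subject, relation, object) triples. The field names below are our own illustration of the data structure, not the dataset's actual schema:

```python
# Toy scene graph: objects with attributes, plus relation triples.
scene_graph = {
    "objects": {
        0: {"name": "dog", "attributes": ["brown", "small"]},
        1: {"name": "frisbee", "attributes": ["red"]},
        2: {"name": "grass", "attributes": ["green"]},
    },
    "relations": [
        (0, "catching", 1),     # dog catching frisbee
        (0, "standing on", 2),  # dog standing on grass
    ],
}

def to_caption(sg):
    """Flatten the graph into a text prompt a generator could condition on."""
    parts = []
    for s, rel, o in sg["relations"]:
        subj = " ".join(sg["objects"][s]["attributes"] + [sg["objects"][s]["name"]])
        obj = " ".join(sg["objects"][o]["attributes"] + [sg["objects"][o]["name"]])
        parts.append(f"{subj} {rel} {obj}")
    return ", ".join(parts)

print(to_caption(scene_graph))
```

The appeal of structural annotations is exactly what this flattening loses: a plain caption blurs which attribute belongs to which object, while the graph keeps those bindings explicit.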
An Evolved Universal Transformer Memory - This paper introduces Neural Attention Memory Models (NAMMs), a learned network for memory management that improves both the performance and efficiency of transformers. NAMMs are evolved atop pre-trained transformers to provide different latent contexts focusing on the most relevant information for individual layers and attention heads. NAMMs are universally applicable to any model using self-attention as they condition exclusively on the values in the produced attention matrices. Authors from Sakana AI.
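The idea of conditioning memory management purely on attention values can be sketched with a toy heuristic: score each cached token by the attention it receives and evict the lowest scorers. A simple mean here stands in for NAMMs' learned, evolved network, so this is an illustration of the interface, not the method itself:

```python
import numpy as np

def evict_kv_tokens(attn, keep_frac=0.75):
    """Toy KV-cache eviction conditioned only on attention values:
    score each cached token by the mean attention it has received
    (across heads and queries), then keep the top-scoring fraction.
    attn: array of shape (heads, queries, keys).
    Returns indices of cached tokens to keep, in original order.
    """
    scores = attn.mean(axis=(0, 1))          # one score per cached token
    k = max(1, int(len(scores) * keep_frac))
    keep = np.sort(np.argsort(scores)[-k:])  # top-k, original order
    return keep

rng = np.random.default_rng(0)
attn = rng.random((4, 8, 12))                # 4 heads, 8 queries, 12 cached tokens
attn /= attn.sum(axis=-1, keepdims=True)     # normalize over keys
print(evict_kv_tokens(attn))
```

Because the input is just the attention matrix, the same mechanism can be bolted onto any self-attention model, which is the universality claim in the paper.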
From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding - Large vision-language models are prone to misinterpreting visual inputs, often resulting in hallucinations and unreliable outputs. To address these challenges, this paper proposes Dropout Decoding, an inference-time approach that quantifies the uncertainty of visual tokens and selectively masks uncertain tokens to improve decoding. The method measures the uncertainty of each visual token by projecting it onto the text space and decomposing it into aleatoric and epistemic components. Inspired by dropout regularization, they introduce uncertainty-guided token dropout, which applies the dropout principle to input visual tokens instead of model parameters, and during inference rather than training. By aggregating predictions from an ensemble of masked decoding contexts, Dropout Decoding robustly mitigates errors arising from visual token misinterpretations. Authors from Stony Brook University and University of Chicago.
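A much-simplified sketch of the aggregation idea: mask the most uncertain visual tokens, add some random extra masking for ensemble diversity, decode each masked context, and majority-vote. The `predict` function below is a hypothetical stand-in for the vision-language model:

```python
import random
from collections import Counter

def dropout_decode(tokens, uncertainty, predict, n_masks=5, drop_frac=0.25, seed=0):
    """Toy uncertainty-guided token dropout: always mask the most uncertain
    visual tokens, vary the rest of the mask across contexts, and aggregate
    the per-context predictions by majority vote.
    """
    rng = random.Random(seed)
    n = len(tokens)
    n_drop = max(1, int(n * drop_frac))
    ranked = sorted(range(n), key=lambda i: -uncertainty[i])  # most uncertain first
    always_drop = set(ranked[:n_drop])
    votes = Counter()
    for _ in range(n_masks):
        extra = set(rng.sample(range(n), n_drop))  # random extra masking
        ctx = [t for i, t in enumerate(tokens) if i not in always_drop | extra]
        votes[predict(ctx)] += 1
    return votes.most_common(1)[0][0]

# Toy example: a highly uncertain "lens-flare" token misleads the
# (hypothetical) decoder into answering "bird"; masking it fixes the answer.
tokens = ["grass", "dog", "lens-flare"]
uncertainty = [0.1, 0.2, 0.9]
predict = lambda ctx: "bird" if "lens-flare" in ctx else "dog"
print(dropout_decode(tokens, uncertainty, predict))  # dog
```

The real method is richer than this sketch: uncertainty is estimated by projecting visual tokens into text space and decomposed into aleatoric and epistemic parts, and aggregation happens over decoding distributions rather than final answers.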
🧠 Sources of Inspiration
$1M to the open source AI that breaks 90% on a new (uncontaminated) version of SWE-Bench. The motivation? To measure how AI coders perform when they can’t cheat, and to model a better way to benchmark.
Why is Claude the go-to chatbot for tech insiders? It has more to do with Claude’s EQ than IQ.
Semantic “world” map of GitHub - I don’t think this is useful, but super fun.