🗞 This Week in News
An AI Safety Board has been created to advise the US Department of Homeland Security on deploying artificial intelligence safely within America’s critical infrastructure.
GPT-2 is back and climbing the LMSYS leaderboard (just kidding: this is probably a cover for an unannounced model). Is it GPT-4-turbo? GPT-4.5? Who knows, but it is performing well.
🥁 Interesting Products & Features
Perplexica - an open-source alternative to the Perplexity AI-powered search engine. It uses techniques such as similarity search and embeddings to refine results, and provides clear answers with cited sources.
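Perplexica's actual pipeline is more involved, but the core idea behind embedding-based similarity search can be sketched in a few lines. This is a toy illustration, not Perplexica's code: the hand-written three-dimensional "embeddings" stand in for vectors a real embedding model would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": in practice these come from an embedding model.
documents = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.8, 0.3],
    "doc3": [0.7, 0.2, 0.1],
}
query = [0.8, 0.15, 0.05]

# Rank documents by similarity to the query and keep the top results,
# which would then be fed to an LLM as cited sources.
ranked = sorted(
    documents.items(),
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
top = [name for name, _ in ranked[:2]]
```

The retrieved `top` documents are what a Perplexity-style engine passes to the language model alongside the user's question, so every claim in the answer can point back to a source.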
Cohere Toolkit - plug-and-play components and source code for interfaces, models, and retrieval. Ready to deploy with instructions for specific platforms such as AWS, Microsoft Azure, and GCP. While it currently plugs into Cohere’s proprietary models and tools, the project is open source.
Snowflake Arctic - an open-source LLM for enterprise. It excels at enterprise tasks such as SQL generation, coding, and instruction following, even compared to open-source models trained with significantly larger compute budgets. An Apache 2.0 license provides ungated access to the weights and code, and Snowflake is also open-sourcing its data recipes and research insights.
📄 Interesting Papers
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models - This paper shows that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought (CoT) to solve two hard algorithmic tasks they could not solve when responding without intermediate tokens. This means additional tokens can provide computational benefits independent of token choice. The fact that meaningless tokens can substitute for a legible CoT raises concerns about LLMs performing unauditable, hidden computation that is increasingly detached from the observed CoT tokens. Authors from NYU.
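To make the contrast concrete, here is a sketch of the two prompting styles the paper compares. The question, filler count, and prompt wording are illustrative assumptions, not the paper's exact setup:

```python
question = "Does 1234 + 4321 equal 5555? Answer yes or no."

# Chain of thought: the model emits human-readable intermediate reasoning
# that we can, in principle, audit.
cot_prompt = f"{question}\nLet's think step by step:"

# Filler tokens: the intermediate tokens are meaningless dots. Any benefit
# comes purely from the extra forward passes, and the "reasoning" behind
# the final answer is invisible in the transcript.
n_filler = 20
filler_prompt = f"{question}\n" + "." * n_filler + "\nAnswer:"
```

The safety worry follows directly: if performance gains can come from tokens that carry no legible meaning, a model's visible CoT may not faithfully reflect the computation that produced its answer.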
Capabilities of Gemini Models in Medicine - This paper introduces Med-Gemini, a family of highly capable multimodal models specialized in medicine, with the ability to seamlessly use web search and to be efficiently tailored to novel modalities using custom encoders. Evaluated on 14 medical benchmarks, Med-Gemini establishes new SOTA performance on 10 of them. Authors from Google.
A Survey on Diffusion Models for Time Series and Spatio-Temporal Data - Diffusion models have recently seen widespread application in time series and spatio-temporal data mining. This survey covers their use across fields including healthcare, recommendation, climate, energy, audio, and transportation, providing a foundational understanding of how these models analyze and generate data. Author affiliations vary, including Oxford, the University of Technology Sydney, and Squirrel AI.
Hallucination of Multimodal Large Language Models: A Survey - Review of recent advances in identifying, evaluating, and mitigating hallucinations in MLLMs, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and strategies developed to address this issue. Additionally, they analyze the current challenges and limitations and formulate open questions that delineate potential pathways for future research. Authors from National University of Singapore.
🧠 Sources of Inspiration
FrugalGPT: Reducing LLM Costs & Improving Performance - a guide to reducing LLM costs while improving performance, based on the 2023 paper of the same name.
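A centerpiece of the FrugalGPT approach is the LLM cascade: route each query to a cheap model first and escalate to an expensive one only when a scorer deems the cheap answer unreliable. A minimal sketch of that routing logic, where `cheap_model`, `expensive_model`, and the confidence threshold are stand-ins of my own invention, not APIs from the paper:

```python
def cheap_model(query):
    """Stand-in for an inexpensive LLM call (illustrative only)."""
    return {"answer": "draft answer", "confidence": 0.4}

def expensive_model(query):
    """Stand-in for a costly, more capable LLM call (illustrative only)."""
    return {"answer": "refined answer", "confidence": 0.9}

def cascade(query, threshold=0.7):
    """Return the cheap answer if its score clears the threshold;
    otherwise escalate to the expensive model (the cascade idea)."""
    result = cheap_model(query)
    if result["confidence"] >= threshold:
        return result["answer"], "cheap"
    result = expensive_model(query)
    return result["answer"], "expensive"

answer, tier = cascade("What is the capital of France?")
```

Because many queries are easy, most traffic never reaches the expensive tier, which is where the cost savings come from; the threshold trades cost against answer quality.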
Google Health's annual The Check Up event showcased new ways Google is integrating AI and healthcare, including fine-tuning Gemini for the medical domain, building a personal health LLM in the Fitbit mobile app for personalized health insights, developing models to help with early detection of diseases, and doing research on ways AI can augment clinical conversations.