🗞 General News
GPT-4o tells you what you want to hear, to an extreme:
GPT-4o Is An Absurd Sycophant (April 28) - an article with many examples of the sycophancy GPT-4o exhibited after the April 25 update
Sycophancy in GPT-4o: what happened and what we’re doing about it (April 29) - OpenAI rolls back the update and shares initial thoughts on why the model exhibited sycophancy.
Expanding on what we missed with sycophancy (May 2) - OpenAI shares a deep dive on their findings, what went wrong, and future changes they are making.
Anecdotes suggest that LLMs are good at “geo-guessing” - inferring where a photo was taken.
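If you want to try this yourself, here is a minimal sketch using the OpenAI Python SDK; the model name, prompt, and image path are illustrative assumptions, not details from the anecdotes above.

```python
# Minimal geo-guessing probe: send a local photo to a multimodal LLM and
# ask where it was taken. Model name, prompt, and file path are
# illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("street_scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Guess where this photo was taken. Name the most "
                     "likely city or region and explain the visual clues."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```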
Anthropic introduces Economic Advisory Council - a group of economists who will provide expert guidance on the economic implications of AI development and deployment, advising Anthropic on AI's impact on labor markets, economic growth, and broader socioeconomic systems.
🥁 Interesting Products & Features
Google open-sources SpeciesNet, an AI model designed to identify animal species by analyzing photos from camera traps. Since 2019, thousands of wildlife biologists have used SpeciesNet through Wildlife Insights, a Google Cloud-based tool, to streamline biodiversity monitoring and inform conservation decision-making. The open release enables tool developers, academics, and biodiversity-related startups to scale biodiversity monitoring in natural areas.
Ai2 releases open-source OLMo 2 - 1B, 7B, 13B, and 32B models. OLMo 2 32B is the first fully open model to outperform GPT-3.5 Turbo and GPT-4o mini on a suite of popular, multi-skill academic benchmarks.
Microsoft introduces Phi-4 - a 14B open-weight model that outperforms OpenAI o1-mini and DeepSeek-R1-Distill-Llama-70B on most benchmarks.
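Both of these releases ship open weights on the Hugging Face Hub, so a local smoke test takes only a few lines of transformers code. A minimal sketch follows; the model IDs are assumptions, so check the respective model cards for exact names and licenses.

```python
# Smoke-test an open-weight checkpoint with Hugging Face transformers.
# Model IDs are assumptions; verify them on the Hub before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed; e.g. "microsoft/phi-4" for Phi-4
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The key idea behind open-weight models is",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```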
WebThinker: Empowering Large Reasoning Models with Deep Research Capability - an open-source deep research framework powered by large reasoning models (LRMs). WebThinker enables LRMs to search, explore web pages, and draft research reports.
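To make the loop concrete, here is a toy sketch of the search-browse-draft cycle such a framework runs. Every function below is a hypothetical stand-in for illustration, not WebThinker's actual API.

```python
# Toy deep-research loop: a reasoning model repeatedly decides whether to
# search, read a page, or draft the report. All functions are hypothetical
# stand-ins, not WebThinker's actual API.
def web_search(query: str) -> list[str]:
    """Hypothetical search tool: return candidate URLs for a query."""
    return ["https://example.com/survey"]

def fetch_page(url: str) -> str:
    """Hypothetical browser tool: return the text content of a page."""
    return f"text of {url} ..."

def llm(prompt: str) -> str:
    """Hypothetical call to a large reasoning model."""
    if "Reply with" in prompt:
        return "SEARCH: deep research frameworks for LRMs"
    return "Draft report: ..."

def research(question: str, max_steps: int = 5) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        action = llm(f"Question: {question}\nNotes: {notes}\n"
                     "Reply with SEARCH: <query>, READ: <url>, or WRITE.")
        if action.startswith("SEARCH:"):
            notes.extend(web_search(action.removeprefix("SEARCH:").strip()))
        elif action.startswith("READ:"):
            notes.append(fetch_page(action.removeprefix("READ:").strip()))
        else:  # WRITE: stop gathering and draft
            break
    return llm(f"Write a research report on '{question}' using: {notes}")

print(research("How do LRM-powered deep research agents work?"))
```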
📄 Interesting Papers
Agentic Neurodivergence as a Contingent Solution to the AI Alignment Problem - This paper argues that achieving complete alignment is inherently unattainable due to mathematical principles rooted in the foundations of predicate logic and computability, in particular Turing's computational universality, Gödel's incompleteness, and Chaitin's randomness. Instead, the authors propose embracing AI misalignment, or agent 'neurodivergence' - fostering a dynamic ecosystem of competing, partially aligned agents - as possibly the only viable path to mitigating risk. Authors from various institutions, including Oxford University, King’s College London, and the University of Tokyo.
Advancing Conversational Diagnostic AI with Multimodal Reasoning - Large language model-based systems show promise for diagnostic conversations, but their evaluation has mostly excluded multimodal data, limiting relevance for remote care. The Articulate Medical Intelligence Explorer (AMIE), built with Gemini 2.0 Flash, introduces a state-aware dialogue framework that dynamically incorporates multimodal inputs like images and PDFs to emulate structured clinical consultations. In a randomized, double-blind study, AMIE outperformed primary care physicians in both multimodal reasoning and traditional diagnostic competencies, signaling progress toward more capable AI-assisted healthcare. Authors from Google.
Model Evaluation in the Dark: Robust Classifier Metrics with Missing Labels - This paper proposes a multiple imputation technique for evaluating classifiers using metrics such as precision, recall, and ROC-AUC. The method offers not only point estimates but also a predictive distribution for these quantities when labels are missing. The authors empirically show that the predictive distribution's location and shape are generally correct, even when data is missing not at random (MNAR). Authors from J.P. Morgan Research.
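To give a flavor of the idea (a simplified illustration, not the paper's exact estimator): given a probability estimate for each unlabeled point, you can repeatedly impute the missing labels and recompute the metric, turning a single number into a predictive distribution.

```python
# Multiple imputation for classifier metrics with missing labels: sample
# plausible labels for the unlabeled points many times and recompute the
# metric each time. Simplified illustration; assumes calibrated P(y=1|x).
import numpy as np
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
n = 200

y_pred = rng.integers(0, 2, size=n)              # model's hard predictions
y_true = rng.integers(0, 2, size=n).astype(float)
missing = rng.random(n) < 0.4                    # 40% of labels unobserved
y_true[missing] = np.nan
p_pos = rng.random(n)                            # assumed P(y=1|x) per point

precisions = []
for _ in range(1000):                            # number of imputations
    y_imp = y_true.copy()
    y_imp[missing] = rng.random(missing.sum()) < p_pos[missing]
    precisions.append(precision_score(y_imp.astype(int), y_pred))

# A predictive distribution for precision rather than one point estimate.
print(f"precision: mean={np.mean(precisions):.3f}, "
      f"90% interval=({np.quantile(precisions, 0.05):.3f}, "
      f"{np.quantile(precisions, 0.95):.3f})")
```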
Privacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping Review - A review that addresses 3 research questions: (1) what are the privacy risks of releasing explanations in AI systems? (2) what current methods have researchers employed to achieve privacy preservation in XAI systems? and (3) what constitutes a privacy preserving explanation? Authors from University of Guelph and National University of Singapore.
🧠 Sources of Inspiration
Anthropic’s AI for Science program - Anthropic will provide free API credits to support researchers working on high-impact scientific projects, with a particular focus on biology and life sciences applications
Google for Startups Accelerator: AI for Nature welcomes startups in the Americas and offers 10 weeks of virtual programming, including mentoring and technical support from Google engineers and experts through a mix of one-on-one and group learning sessions. Applications are open from March 3 to March 31, 2025, and the program starts in May 2025.
Observability for RAG Agents - Nice article with informative diagrams on LLMOps, evaluation of RAG systems, and monitoring.
From Retrieval to Reasoning: Advancing AI Agents for Knowledge Discovery and Collaboration - a talk by Stanford Prof. Jure Leskovec introducing STaRK, AvaTaR, and CollabLLM, new frameworks that encourage AI agents to reason, collaborate, and test hypotheses using knowledge graphs, tools, and multi-turn optimization
MegaMath - an open math pretraining dataset curated from diverse, math-focused sources, with over 300B tokens.
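If you want to poke at the data, streaming from the Hugging Face Hub avoids downloading 300B+ tokens up front. A minimal sketch; the hub path and field names are assumptions, so check the MegaMath release page.

```python
# Stream a few MegaMath examples instead of downloading the full corpus.
# The hub path is an assumption; a subset/config name may also be required.
from datasets import load_dataset

ds = load_dataset("LLM360/MegaMath", split="train", streaming=True)
for i, example in enumerate(ds):
    print(str(example)[:200])  # field names vary by subset
    if i >= 2:
        break
```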
Cover photo from Google’s Wildlife Insights.