Educating Llamas about cyberattacks

In the News

Brinnae Bent

Apr 15, 2025

🗞 General News

AI Index 2025: The Top 10 Takeaways - smaller models are getting better; models are becoming cheaper to use; China’s models are catching up; the number of AI-related harms is increasing; AI agents show early promise; AI investment is sky high; AI use in business is rapidly increasing; the number of FDA-approved, AI-enabled medical devices has increased; US regulation is moving to the states; and regional differences in AI optimism persist, with Asia showing the most AI optimism.
Anthropic Education Report: How University Students Use Claude - a report based on analyzing one million anonymized student conversations. Key findings: the majority of early adopters are STEM students, and students use AI in four ways: direct problem solving, direct output creation, collaborative problem solving, and collaborative output creation. Students primarily use AI systems for creating (using information to learn something new) and analyzing (taking apart the known and identifying relationships).
Cyberattacks by AI agents are coming - AI agents are much cheaper than hiring the services of professional hackers and could orchestrate attacks more quickly and at larger scale. Cybersecurity experts believe that ransomware attacks—the most lucrative kind—are relatively rare because they require considerable human expertise, but those attacks could be outsourced to agents in the future.
AI masters Minecraft: DeepMind program finds diamonds without being taught - An AI system has for the first time figured out how to collect diamonds in the video game Minecraft, a difficult task requiring multiple steps, without being shown how to play. The system, “Dreamer”, is a step towards machines that can generalize knowledge learned in one domain to new situations.

🥁 Interesting Products & Features

Meta launches the “Llama 4 herd” - including Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context length support, built using a mixture-of-experts architecture. Meta is also previewing Llama 4 Behemoth, which is intended to serve as a teacher for new models.
NotebookLM adds feature to discover sources from around the web
Claude for Education - introduces “learning mode”, which guides students' reasoning process rather than providing answers
Midjourney v7 - model personalization is now on by default, to capture each person’s preferred aesthetics. Also includes “draft mode”, with 10x speed improvements.
Firebase Studio - a cloud-based development environment from Google, designed to accelerate how you build, test, deploy, and run production-quality AI applications
Sec-Gemini v1: a new experimental cybersecurity model from Google

📄 Interesting Papers

Mechanistic understanding and validation of large AI models with SemanticLens - This paper introduces SEMANTICLENS, a universal explanation method for neural networks that maps hidden knowledge encoded by components (e.g., individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. See the demo here. Authors from Fraunhofer Heinrich Hertz Institute.
An Approach to Technical AGI Safety and Security - Google DeepMind shares their approach to AGI safety, exploring four main risk areas: misuse, misalignment, accidents, and structural risks, with a deeper focus on misuse and misalignment. Authors from Google DeepMind.
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features - This paper introduces ConceptAttention, a method that leverages the expressive power of DiT attention layers to generate high-quality saliency maps that precisely locate textual concepts within images. Without requiring additional training, ConceptAttention repurposes the parameters of DiT attention layers to produce contextualized concept embeddings [Code]. Authors from Georgia Tech, Virginia Tech, and IBM.
New Evaluation Benchmarks:
- MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs - a large-scale high-quality medical reasoning dataset designed to enable faithful and explainable medical problem-solving in large language models (LLMs). Authors from multiple institutions including UC Santa Cruz, Stanford, and University of British Columbia.
- PaperBench: Evaluating AI's Ability to Replicate AI Research - a benchmark evaluating the ability of AI agents to replicate AI research. Agents must replicate 20 ICML 2024 Spotlight and Oral papers from scratch, including understanding paper contributions, developing a codebase, and successfully executing experiments. For objective evaluation, they developed rubrics that hierarchically decompose each replication task into smaller sub-tasks with clear grading criteria. Authors from OpenAI.
- BrowseComp: a benchmark for browsing agents - To measure the ability of AI agents to locate hard-to-find, entangled information on the internet, OpenAI recently released a new open-source benchmark of 1,266 challenging problems called BrowseComp (“Browsing Competition”). Authors from OpenAI.

🧠 Sources of Inspiration

Hugging Face AI Agents Course
AI on Screen - Google and Range Media are partnering to commission films about the relationship between humanity and AI. They are calling for ideas and submissions for emotionally-driven short films across genres.
DeepSite - chat interface for creating websites using Deepseek (hosted in Hugging Face Spaces)
Single file “RL for LLM” library - Implementation of DeepSeek R1-zero style training with a single 80G GPU, no RL library, 3B base model.
RecML: High-Performance Recommender Library - a high-performance, large-scale deep learning recommender system library optimized for Cloud TPUs. It aims to provide researchers and practitioners state-of-the-art reference implementations, tools, and best practice guidelines for building and deploying recommender systems.

Cover photo from SemanticLens 1.1 UMAP Viewer.

Spill the GPTea
