🗞 This Week in News
Claude 3 model family released (Haiku, Sonnet, and Opus) - outperforms GPT-4 on many benchmarks, supports vision inputs, and is fast (Haiku can read an information-dense research paper on arXiv, figures included, in under three seconds). At launch the context window is 200k tokens, but Anthropic claims the “models are capable of accepting inputs >1 million tokens” and will release this to select customers.
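If you want to try the new models, here is a minimal sketch using Anthropic's Python SDK (the messages API available at launch); the model ID and prompt are illustrative:

```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-haiku-20240307",  # launch-era Haiku model ID
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize this abstract in two sentences: ..."}
    ],
)
# Responses arrive as a list of content blocks.
print(message.content[0].text)
```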
All the 2024 Leap Day bugs are documented here.
🥁 Interesting Products & Features
ChunkLlama: Dual chunk attention is a training-free and effective method for extending the context window of large language models (LLMs) to more than 8× their original pre-training length. A toy sketch of the core idea follows below.
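A toy illustration (not the ChunkLlama implementation) of the chunked relative-position idea: keys inside the current chunk keep their exact relative distance, while keys in earlier chunks are clamped so every distance the model sees stays inside its pre-training window. Chunk and window sizes here are arbitrary:

```python
import numpy as np

def chunked_relative_positions(seq_len: int, chunk: int, max_dist: int):
    """Relative-position matrix: intra-chunk positions kept exact,
    cross-chunk distances clamped to the pre-training window.
    Causal masking is omitted for brevity."""
    q = np.arange(seq_len)[:, None]   # query positions
    k = np.arange(seq_len)[None, :]   # key positions
    rel = q - k                       # standard relative distances
    same_chunk = (q // chunk) == (k // chunk)
    # Keep exact distances within a chunk; clamp everything else.
    return np.where(same_chunk, rel, np.minimum(rel, max_dist))

# Example: 16 tokens, chunks of 4, pre-training window of 7.
print(chunked_relative_positions(16, 4, 7))
```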
Gleam is a new programming language. The authors claim you can learn it in a single afternoon. ⭐️
📄 Interesting Papers
HyperAttention: Long-context Attention in Near-Linear Time: Not a new paper (October ‘23), but it is having a moment because it is rumored to be the technique behind the enormous context windows seen in the Gemini models.
Axe the X in XAI: A Plea for Understandable AI: This paper argues that the term “Explainable AI (XAI)” is misleading and the terminology should be updated to “Understandable AI”.
Stable Diffusion 3 Research Paper: Stability AI shares the tech behind their recent model, including the Multimodal Diffusion Transformer (MMDiT) architecture, which uses separate sets of weights for image and language representations. Full paper here.
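To make the dual-weight design concrete, here is a simplified PyTorch sketch (not Stability AI's code) of a block where image and text tokens are projected with separate weights but attend jointly over the concatenated sequence; the real MMDiT additionally uses modality-specific MLPs and timestep conditioning:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamBlock(nn.Module):
    """Illustrative sketch of the MMDiT idea: per-modality projection
    weights, joint attention over all tokens."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        self.qkv_img = nn.Linear(dim, 3 * dim)  # image-stream weights
        self.qkv_txt = nn.Linear(dim, 3 * dim)  # text-stream weights
        self.out_img = nn.Linear(dim, dim)
        self.out_txt = nn.Linear(dim, dim)

    def forward(self, img: torch.Tensor, txt: torch.Tensor):
        n_img = img.shape[1]
        # Project each stream with its own weights, then concatenate
        # so attention mixes information across both modalities.
        qkv = torch.cat([self.qkv_img(img), self.qkv_txt(txt)], dim=1)
        q, k, v = qkv.chunk(3, dim=-1)

        def split(x):  # (b, n, d) -> (b, heads, n, d/heads)
            b, n, d = x.shape
            return x.view(b, n, self.heads, d // self.heads).transpose(1, 2)

        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
        b, h, n, dh = attn.shape
        attn = attn.transpose(1, 2).reshape(b, n, h * dh)
        # Route tokens back through modality-specific output weights.
        return self.out_img(attn[:, :n_img]), self.out_txt(attn[:, n_img:])

block = DualStreamBlock(dim=64, heads=8)
img_out, txt_out = block(torch.randn(1, 16, 64), torch.randn(1, 8, 64))
```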
🧠 Sources of Inspiration
AI Hackathon by Columbia and Cornell (March 23-24)
Have an idea for practical applications of quantum computing? Win $5M from Google.
Prompt Library (mostly for educators, but you may find some interesting ideas from the way they structure prompts)
PKU is attempting to build an “Open Sora” model and is asking for open-source contributors. GitHub.
AI internship hunt resource. Research-focused, but with relevant details for anyone searching for an internship in data science/ML/AI.