🗞 This Week in News
Meta Llama 3 released - pretrained and instruction-fine-tuned language models with 8B and 70B parameters that support a broad range of use cases. According to benchmarks, it is the most impressive open-access model to date. Up against commercial English-language models on the LMSYS leaderboard, it is second only to GPT-4.
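If you want to try it yourself, here is a minimal sketch of loading the 8B instruct variant with Hugging Face transformers. It assumes you have accepted Meta's license for the gated repo on the Hub, are logged in, and have accelerate installed; recent transformers versions accept chat-style messages directly.

```python
# Minimal sketch: running Meta-Llama-3-8B-Instruct via the transformers pipeline.
# Assumes Hub access to the gated repo and the `accelerate` package for device_map.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",   # place weights on available GPU(s) / CPU
    torch_dtype="auto",  # use the checkpoint's native dtype
)

messages = [{"role": "user", "content": "Summarize this week's AI news in one sentence."}]
out = generator(messages, max_new_tokens=64)
# With chat-style input, generated_text is the full conversation; the last turn is the reply.
print(out[0]["generated_text"][-1]["content"])
```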
TED 2024 collaboration with OpenAI's Sora and artist Paul Trillo to create a generated video of what TED 2064 may look like.
🥁 Interesting Products & Features
Zerve, the IDE for LLMs - a platform for importing, fine-tuning, and deploying generative AI in enterprise solutions.
Hugging Chat launches on iOS - explore and chat with the best open-source LLMs and customize your own.
Phi-3 mini - a 3.8B-parameter LLM with benchmark results similar to Mixtral 8x7B and GPT-3.5, yet small enough to be deployed on a phone. How did they do it? A carefully curated dataset composed of heavily filtered web data and synthetic data. Read the technical report here.
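A quick, hedged sketch of trying it locally with transformers, assuming the Hub model id microsoft/Phi-3-mini-4k-instruct; early checkpoints required trust_remote_code=True, so it is included here.

```python
# Minimal sketch: loading Phi-3-mini-4k-instruct and generating a short reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# Build a chat prompt with the model's own template, then generate.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Why can small language models run on phones?"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```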
📄 Interesting Papers
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time - One portrait photo + speech audio = a hyper-realistic talking-face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, generated in real time. Their diffusion model accepts optional conditioning signals such as main eye-gaze direction, head distance, and emotion offsets. Authors from Microsoft. **Obviously a lot of harm could be done with this technology. Fortunately, the authors agree and released this statement with the publication: "Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations."**
The Curse of Recursion: Training on Generated Data Makes Models Forget - This study finds that using model-generated content in training causes irreversible defects in the resulting models, where the tails of the original content distribution disappear. Authors from Google, Cambridge, and the University of Toronto.
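To get a feel for the effect, here is a toy simulation of my own (not the paper's setup): fit a Gaussian to data, sample the next "training set" from the fit, and repeat. The heavy tails of the original distribution disappear after the first generation, and finite-sample estimation error keeps eroding what spread remains.

```python
# Toy illustration of model collapse: each generation is trained only on the
# previous model's samples, so rare (tail) events get progressively lost.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

data = rng.standard_t(df=3, size=n)  # generation 0: heavy-tailed "real" data
for gen in range(6):
    tail_mass = np.mean(np.abs(data) > 10)  # how often extreme events occur
    print(f"gen {gen}: P(|x| > 10) = {tail_mass:.4f}, std = {data.std():.2f}")
    mu, sigma = data.mean(), data.std()      # fit a Gaussian "model" by MLE
    data = rng.normal(mu, sigma, n)          # next generation sees only model output
```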
Automated Social Science: Language Models as Scientist and Subjects - Structural causal models provide a language to state hypotheses, a blueprint for constructing LLM-based agents, an experimental design, and a plan for data analysis. The fitted structural causal model becomes an object available for prediction or the planning of follow-on experiments. Authors from MIT and Harvard.
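A hypothetical toy sketch of that recipe (names and numbers are mine, not the paper's): the causal hypothesis "price affects purchase" defines the experiment, a simulated subject stands in for the paper's LLM-based agents, and the analysis estimates the effect from the collected data.

```python
# Toy sketch: run an automated experiment over a hypothesized cause (price)
# and measure the outcome (purchase). simulate_subject() is a placeholder
# for an LLM-backed agent deciding whether to buy.
import random

def simulate_subject(price: float) -> int:
    # Placeholder decision rule; in the paper an LLM agent would answer.
    return int(random.random() < max(0.0, 1.0 - price / 100))

def run_experiment(prices, n_subjects=50):
    data = [(p, simulate_subject(p)) for p in prices for _ in range(n_subjects)]
    # "Data analysis plan": purchase rate at each intervened price level.
    return {p: sum(y for q, y in data if q == p) / n_subjects for p in prices}

print(run_experiment(prices=[20, 50, 80]))
```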
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey - This survey outlines key themes in selecting an agentic architecture, the impact of leadership on agent systems, agent communication styles, and the planning, execution, and reflection phases that enable robust AI agent systems. Authors from Neudesic (IBM) and Microsoft.
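The planning, execution, and reflection phases the survey describes boil down to a short loop; the sketch below is a generic illustration, with an llm() helper standing in for whatever model or API you would actually call.

```python
# Generic plan -> execute -> reflect loop; llm() is a placeholder to be replaced
# with a real model call.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def run_agent(task: str, max_rounds: int = 3) -> str:
    result = ""
    for _ in range(max_rounds):
        plan = llm(f"Task: {task}\nWrite a short step-by-step plan.")
        result = llm(f"Task: {task}\nPlan:\n{plan}\nCarry out the plan and report the result.")
        critique = llm(f"Task: {task}\nResult:\n{result}\nAnswer DONE if solved, else list what to fix.")
        if critique.strip().startswith("DONE"):  # reflection gate: stop when the critic is satisfied
            break
    return result
```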
🧠 Sources of Inspiration
What to do with your life? Follow curiosity.