🗞 This Week in News
Jamba from AI21 Labs - a pretrained MoE generative text model with 12B active parameters (52B total) and a 256K context length. It combines transformer components with the Mamba Structured State Space model (SSM), hence the "Joint Attention and Mamba" (Jamba) architecture. AI21 reports that it outperforms Llama, Gemma, and Mixtral on most reasoning benchmarks.
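For intuition, here is a toy PyTorch sketch of the layer-interleaving idea. The simplified SSM stand-in, dimensions, and attention ratio are illustrative assumptions, not AI21's implementation, and the MoE feed-forward layers are omitted for brevity.

```python
# Toy sketch of Jamba-style interleaving of attention and SSM layers.
# Illustrative only; ratios, dims, and the SSM stand-in are assumptions.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba block: a gated, causal, linear-time mixer."""
    def __init__(self, dim):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        # Causal running mean of projected inputs (linear-time stand-in
        # for the SSM scan).
        h = torch.cumsum(self.in_proj(x), dim=1)
        h = h / torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return x + self.out_proj(h * torch.sigmoid(self.gate(x)))

class AttnBlock(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_jamba_like_stack(dim=64, n_layers=8, attn_every=4):
    # Mostly SSM layers, with an attention layer every few blocks.
    layers = [AttnBlock(dim) if i % attn_every == 0 else ToySSMBlock(dim)
              for i in range(n_layers)]
    return nn.Sequential(*layers)

x = torch.randn(2, 16, 64)                # (batch, seq, dim)
print(build_jamba_like_stack()(x).shape)  # torch.Size([2, 16, 64])
```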
Robots that can detect emotion! A team from Columbia created a robot called "Emo." They developed two AI models: one that predicts a person's upcoming facial expression by analyzing subtle changes in the target's face, and another that generates the motor commands needed to produce the corresponding expression on the robot's face. Together, these allow the robot to anticipate and mirror a person's smile before the person actually smiles.
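As a rough illustration of that two-model pipeline, here is a hypothetical Python sketch; every function name, data shape, and mapping below is an assumption for demonstration, not the Columbia team's code.

```python
# Hypothetical sketch of a two-model "anticipate, then mirror" pipeline.
from dataclasses import dataclass
from typing import List

@dataclass
class Expression:
    smile: float       # normalized activation, e.g. 0.0 to 1.0
    brow_raise: float

def predict_expression(recent_landmarks: List[List[float]]) -> Expression:
    """Model 1 (stubbed): anticipate the target's upcoming expression
    from subtle changes across recent facial-landmark frames."""
    delta = recent_landmarks[-1][0] - recent_landmarks[0][0]
    return Expression(smile=max(0.0, delta), brow_raise=0.0)

def expression_to_motors(expr: Expression) -> List[float]:
    """Model 2 (stubbed): map a desired expression to motor commands
    for the robot's face actuators."""
    return [expr.smile * 0.8, expr.smile * 0.8, expr.brow_raise * 0.5]

frames = [[0.10], [0.15], [0.30]]  # toy landmark trajectory
print(expression_to_motors(predict_expression(frames)))
```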
🥁 Interesting Products & Features
EvoEval: a holistic LLM coding benchmark. It includes a visualization tool, new benchmark problem sets, and a leaderboard.
The Berkeley Function Calling Leaderboard (also called the Berkeley Tool Calling Leaderboard) evaluates LLMs' ability to call functions (a.k.a. tools) accurately.
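To make the kind of check such a leaderboard runs concrete, here is a minimal sketch that compares a model's emitted call against an expected one by parsing both; this is illustrative only, not the actual BFCL harness.

```python
# Does the model's emitted call match the expected function name and
# arguments? Parsing both sides makes the check order-insensitive.
import ast

def parse_call(source: str):
    """Parse a single call like get_weather(city='Paris')."""
    node = ast.parse(source, mode="eval").body
    assert isinstance(node, ast.Call)
    name = ast.unparse(node.func)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs

def call_matches(model_output: str, expected: str) -> bool:
    return parse_call(model_output) == parse_call(expected)

print(call_matches("get_weather(city='Paris', unit='C')",
                   "get_weather(unit='C', city='Paris')"))  # True
```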
Qwen1.5-MoE - a new model from the Qwen team that matches 7B-model performance while activating only about a third as many parameters (2.7B). The team reports a 75% decrease in training costs and a 1.74x speedup in inference at the same performance.
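For intuition on why an MoE model activates only a fraction of its parameters per token, here is a toy top-2 routing sketch in PyTorch; the sizes and routing details are illustrative, not Qwen's code.

```python
# Toy top-2 expert routing: each token runs through only 2 of 8 experts,
# so most parameters stay idle on any given token.
import torch
import torch.nn as nn

dim, n_experts, top_k = 32, 8, 2
experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
router = nn.Linear(dim, n_experts)

x = torch.randn(4, dim)                    # 4 tokens
scores = router(x)                         # (tokens, experts)
weights, idx = scores.topk(top_k, dim=-1)  # pick 2 experts per token
weights = torch.softmax(weights, dim=-1)   # normalize the chosen scores

out = torch.zeros_like(x)
for t in range(x.size(0)):
    for k in range(top_k):                 # only top_k experts execute
        out[t] += weights[t, k] * experts[idx[t, k].item()](x[t])
print(out.shape)  # torch.Size([4, 32])
```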
Grok-1.5 was announced this week, with improved reasoning capabilities relative to Grok-1 and a context length of 128K tokens. Interestingly, it was built on a custom distributed training framework based on JAX, Rust, and Kubernetes.
📄 Interesting Papers
ReALM: Reference Resolution As Language Modeling - In this paper, Apple presents ReALM, an on-device-friendly LLM that the authors report "substantially outperforms" GPT-4 on reference resolution. The paper shows how reference resolution can be recast as a language modeling problem, yielding a highly effective system for resolving references of various types, including entities such as those on screen that are not traditionally conducive to being reduced to a text-only modality. Authors from Apple.
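Here is a minimal sketch of the core idea: serialize candidate entities, including on-screen ones, into the prompt so that resolving a reference becomes ordinary next-token prediction. The tagging format below is a made-up assumption, not the paper's exact encoding.

```python
# Turn "which thing does the user mean?" into a text-completion problem
# by listing candidate entities as tagged lines in the prompt.
onscreen_entities = [
    {"id": 1, "type": "phone_number", "text": "415-555-0132"},
    {"id": 2, "type": "address", "text": "1 Infinite Loop, Cupertino"},
]

def build_prompt(user_request: str) -> str:
    lines = ["Entities visible on screen:"]
    for e in onscreen_entities:
        lines.append(f"  [{e['id']}] ({e['type']}) {e['text']}")
    lines.append(f"User: {user_request}")
    lines.append("Which entity id does the user refer to?")
    return "\n".join(lines)

print(build_prompt("Call that number"))  # an LLM would be asked to emit "1"
```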
Foresight - a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study - Foresight is a transformer-based pipeline that uses named entity recognition and linking (NER+L) tools to convert free-text EHR documents into structured, coded concepts, and then produces probabilistic forecasts of future medical events such as disorders, substances, procedures, and findings. Authors from King's College London.
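A hedged sketch of the pipeline's shape: free-text notes become linked concept codes, and a model forecasts the next event. The tiny lexicon, codes, and probabilities below are invented for illustration; the paper uses proper NER+L tooling and a trained transformer.

```python
# Stand-in pipeline: note text -> concept codes -> next-event forecast.
concept_lexicon = {"chest pain": "C001", "aspirin": "C002", "mi": "C003"}

def text_to_concepts(note: str):
    """Stand-in for NER+linking: map recognized phrases to codes."""
    return [code for phrase, code in concept_lexicon.items()
            if phrase in note.lower()]

def forecast_next(timeline):
    """Stand-in for the transformer: toy distribution over next events."""
    if "C001" in timeline:                   # after chest pain...
        return {"C003": 0.4, "C002": 0.3}    # ...MI and aspirin more likely
    return {"C001": 0.1}

timeline = text_to_concepts("Patient reports chest pain, given aspirin")
print(timeline, forecast_next(timeline))     # ['C001', 'C002'] {...}
```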
Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning: In this paper, the author transforms a dataset of 1,000 scientific papers into an ontological knowledge graph. An in-depth structural analysis (node degrees, community detection and connectivity, clustering coefficients, and the betweenness centrality of pivotal nodes) uncovers fascinating knowledge architectures. One comparison reveals structural parallels between biological materials and Beethoven's 9th Symphony, highlighting shared patterns of complexity through isomorphic mapping. Author from MIT.
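The structural measures mentioned are all standard graph analytics; here is how they might be computed with networkx on a toy graph. The actual study builds its graph from the 1,000 papers via generative knowledge extraction.

```python
# Standard graph analytics of the kind described in the paper, run on a
# built-in toy graph as a stand-in for the ontological knowledge graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()

degrees = dict(G.degree())                      # node degrees
communities = greedy_modularity_communities(G)  # community detection
clustering = nx.average_clustering(G)           # clustering coefficient
betweenness = nx.betweenness_centrality(G)      # pivotal "bridge" nodes

hub = max(betweenness, key=betweenness.get)
print(f"{len(communities)} communities, avg clustering {clustering:.2f}, "
      f"top bridge node {hub} (degree {degrees[hub]})")
```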
🧠 Sources of Inspiration
The OpenAI Evals framework is a system for evaluating LLMs, and it includes an open-source registry of evaluations. Docs and code here.
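As a rough illustration of what a simple "match"-style eval does, here is a self-contained sketch; this is not the actual Evals API (which defines evals in a YAML registry and runs them via its CLI), just the underlying idea.

```python
# Compare model completions against expected answers over a sample set
# and report accuracy; the model call is stubbed out.
samples = [
    {"input": "2+2=", "ideal": "4"},
    {"input": "Capital of France?", "ideal": "Paris"},
]

def fake_model(prompt: str) -> str:   # stand-in for a real API call
    return {"2+2=": "4", "Capital of France?": "Paris"}[prompt]

def run_match_eval(samples) -> float:
    correct = sum(fake_model(s["input"]).strip() == s["ideal"]
                  for s in samples)
    return correct / len(samples)

print(f"accuracy: {run_match_eval(samples):.0%}")  # accuracy: 100%
```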
A blog post from OpenAI on lessons learned from Voice Engine, a model for creating custom voices. Potential use cases include reading assistance, translation, support for non-verbal people, and helping patients recover their voice. OpenAI shares the risks and safety/privacy concerns it has identified, as well as recommendations for the future of this technology.
"Emo" could have very promising applications in the area of healthcare with the ability to read facial cues and respond empathetically, especially when interacting with children or the elderly.