🗞 This Week in News
Sakana AI’s Evolutionary Model Merge - uses evolutionary search to discover effective ways of combining open-source models with diverse capabilities, automatically creating new foundation models with the capabilities a user specifies (a toy sketch of the general merging idea follows the news items below). They’re hiring (in Japan)!
NVIDIA GTC happened last week (see my recap here)
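For intuition on the merging item above, here is a minimal Python sketch of parameter-space model merging driven by an evolutionary loop. This is not Sakana AI’s actual recipe (their search is more sophisticated and covers more than simple weight interpolation); it only illustrates searching over merge weights and keeping whichever candidate scores best on a user-defined fitness function. The models, `fitness_fn`, and all hyperparameters here are hypothetical.

```python
# Hypothetical sketch of evolutionary weight merging: interpolate two models'
# parameters, mutate the per-tensor merge weights, and keep the best candidate
# according to a user-supplied fitness function (e.g. accuracy on a small eval set).
# Assumes both models share the exact same architecture and state_dict keys.
import copy
import random


def merge_state_dicts(sd_a, sd_b, alphas):
    """Linearly interpolate two state dicts with one weight per tensor."""
    return {k: alphas[k] * sd_a[k] + (1.0 - alphas[k]) * sd_b[k] for k in sd_a}


def evolve_merge(model_a, model_b, fitness_fn, generations=20, population=8):
    """Random-mutation evolutionary search over per-tensor merge weights."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    best_alphas = {k: 0.5 for k in sd_a}  # start from a uniform 50/50 merge
    candidate = copy.deepcopy(model_a)
    candidate.load_state_dict(merge_state_dicts(sd_a, sd_b, best_alphas))
    best_score = fitness_fn(candidate)

    for _ in range(generations):
        for _ in range(population):
            # Perturb the merge weights slightly and clamp them to [0, 1].
            alphas = {
                k: min(1.0, max(0.0, v + random.gauss(0.0, 0.1)))
                for k, v in best_alphas.items()
            }
            candidate.load_state_dict(merge_state_dicts(sd_a, sd_b, alphas))
            score = fitness_fn(candidate)
            if score > best_score:
                best_score, best_alphas = score, alphas

    candidate.load_state_dict(merge_state_dicts(sd_a, sd_b, best_alphas))
    return candidate, best_score
```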
🥁 Interesting Products & Features
Cognition launches Devin, nicknamed the “first AI software engineer.” Devin boasts impressive problem-solving abilities and autonomous learning capabilities, enabling end-to-end development. This was the talk of the town at NVIDIA GTC last week!
Stable Code Instruct 3B released by Stability AI - an instruction-tuned large language model built on top of Stable Code 3B. It enhances code completion, supports natural language interactions, and outperforms comparable models including CodeLlama 7B Instruct and DeepSeek-Coder Instruct 1.3B.
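If you want to try it, here is a hedged sketch of prompting an instruction-tuned code model through the Hugging Face `transformers` API. The model identifier and chat-template usage below are assumptions based on Stability AI’s usual Hugging Face conventions; check the official model card for the exact id, prompt format, and license terms.

```python
# Hedged sketch: prompting an instruction-tuned code model via transformers.
# The model id below is an assumption; consult the model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-instruct-3b"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```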
📄 Interesting Papers
Chronos: Learning the Language of Time Series - Chronos tokenizes time series values into a fixed vocabulary using scaling and quantization, then trains existing transformer-based language model architectures on these tokenized series via the cross-entropy loss. Chronos models were pretrained on the T5 family (ranging from 20M to 710M parameters) using a large collection of publicly available datasets, complemented by a synthetic dataset generated via Gaussian processes to improve generalization (a toy tokenization sketch appears after this paper list). Authors from Amazon.
CoLLEGe: Concept Embedding Generation for Large Language Models - This paper seeks to modernize few-shot concept learning. CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts from a small number of example sentences or definitions. Authors from NYU.
LLMLingua-2: Learn Compression Target via Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression - It may be from Microsoft, but it’s a little meta: using a transformer to compress prompts for input into a … transformer. In academic language, this paper introduces a data distillation procedure that derives knowledge from an LLM to compress prompts without losing crucial information, and formulates prompt compression as a token classification problem to guarantee the faithfulness of the compressed prompt to the original (a minimal token-classification sketch also follows this list). Authors from Microsoft.
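To make the Chronos idea concrete, here is a toy sketch of scale-then-quantize tokenization for a time series. The vocabulary size, bin range, and mean-scaling choice are illustrative assumptions, not the paper’s exact configuration.

```python
# Toy sketch of Chronos-style tokenization: scale a series, then quantize
# values into a fixed vocabulary of bins so a standard language model can be
# trained on the token ids with cross-entropy. Bin range and vocab size are
# illustrative only.
import numpy as np


def tokenize_series(values, vocab_size=4096, limit=15.0):
    """Mean-scale a 1D series and map each value to an integer token id."""
    values = np.asarray(values, dtype=np.float64)
    scale = np.mean(np.abs(values)) or 1.0        # avoid division by zero
    scaled = values / scale
    # Uniform bin edges over [-limit, limit]; out-of-range values are clipped.
    edges = np.linspace(-limit, limit, vocab_size - 1)
    tokens = np.digitize(scaled, edges)           # ids in [0, vocab_size - 1]
    return tokens, scale


def detokenize(tokens, scale, vocab_size=4096, limit=15.0):
    """Map token ids back to approximate real values (bin centers)."""
    centers = np.linspace(-limit, limit, vocab_size)
    return centers[np.asarray(tokens)] * scale


# Example: a noisy sine wave round-trips through the token vocabulary.
series = np.sin(np.linspace(0, 6.28, 50)) + 0.1 * np.random.randn(50)
tokens, scale = tokenize_series(series)
reconstructed = detokenize(tokens, scale)
```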
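And for LLMLingua-2, here is an illustrative sketch of treating prompt compression as token classification. The encoder backbone and keep ratio are placeholders and the classification head is untrained; the real project ships its own trained compressor rather than anything like this.

```python
# Illustrative sketch of prompt compression framed as token classification:
# score each token with a keep/drop classifier and retain the top fraction.
# The backbone below is a placeholder, and its head is randomly initialized.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

encoder_id = "xlm-roberta-base"  # placeholder backbone, not the paper's checkpoint
tokenizer = AutoTokenizer.from_pretrained(encoder_id)
model = AutoModelForTokenClassification.from_pretrained(encoder_id, num_labels=2)


def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the tokens the classifier scores as most worth preserving."""
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0]                    # (seq_len, 2)
    keep_scores = logits.softmax(-1)[:, 1]                 # P(label == "keep")
    k = max(1, int(keep_ratio * keep_scores.numel()))
    keep_idx = keep_scores.topk(k).indices.sort().values   # preserve original order
    kept_ids = enc["input_ids"][0][keep_idx]
    return tokenizer.decode(kept_ids, skip_special_tokens=True)
```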
🧠 Sources of Inspiration
Google AI Hackathon - build with Gemini ($50k in prizes, deadline May 3)