🗞 This Week in News
Mistral AI - Non-Production License - Mistral releases their new model under a new license that allows developers to use the technology for non-commercial purposes and to support research work, but commercial products will need to pay Mistral for a separate license. What does this mean? Companies building large pre-trained (aka "foundation") models are still trying to find the right balance between releasing software openly and building profitable businesses.
AI Chip War
NVIDIA announces its next chip architecture, "Rubin," less than three months after the announcement of Blackwell, which has yet to ship to customers.
AMD announces its new MI350 AI chips, expected to be available in 2025, which AMD claims will offer 35x faster inference than the current generation.
🥁 Interesting Products & Features
Perplexity Pages - From search engine to content creation: Perplexity introduces a new way to interact with its platform. It's interesting from a UX perspective, but even more interesting from a data-curation perspective: will they use this human-curated content to improve their algorithms?
Codestral from Mistral AI (22B parameters) - trained on a diverse dataset of 80+ programming languages. It shows solid improvements over other code-specific models, though Mistral did not compare it against SOTA general models like Claude Opus or GPT-4o.
ControlNet Scribble with SDXL - Promises greater control of SDXL generations via rough sketches. (If this looks familiar, it's because scribble conditioning shipped with the original ControlNet for base Stable Diffusion; we have been waiting a while for comparable ControlNet options for the improved SDXL model.)
📄 Interesting Papers
Transformers Can Do Arithmetic with the Right Embeddings - The authors modify the transformer model by adding an embedding to each digit that encodes its position relative to the start of the number. They found that this allows models to solve arithmetic problems that are larger and more complex than those in their training data. Authors from University of Maryland.
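To make the mechanism concrete, here is a minimal PyTorch sketch of the idea as described above. This is not the authors' code; the `MAX_DIGITS` cap, the `is_digit` mask, and the offset recurrence are illustrative assumptions.

```python
import torch
import torch.nn as nn

MAX_DIGITS = 32  # hypothetical cap on digits per number

class DigitPositionEmbedding(nn.Module):
    """Adds a learned embedding indexed by each digit's offset from the
    start of the number it belongs to; non-digit tokens are unaffected."""
    def __init__(self, d_model: int):
        super().__init__()
        self.emb = nn.Embedding(MAX_DIGITS, d_model)

    def forward(self, token_embs: torch.Tensor, is_digit: torch.Tensor) -> torch.Tensor:
        # token_embs: (batch, seq, d_model); is_digit: bool (batch, seq)
        offsets = torch.zeros(is_digit.shape, dtype=torch.long, device=is_digit.device)
        for t in range(1, is_digit.size(1)):
            # A digit's offset is the previous offset + 1 if the previous
            # token was also a digit; otherwise it starts a new number at 0.
            offsets[:, t] = torch.where(
                is_digit[:, t] & is_digit[:, t - 1],
                offsets[:, t - 1] + 1,
                torch.zeros_like(offsets[:, t]),
            )
        offsets = offsets.clamp(max=MAX_DIGITS - 1)
        # Only digit tokens receive the extra positional signal.
        return token_embs + self.emb(offsets) * is_digit.unsqueeze(-1)
```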
This is a fun one - How Random is Random? Evaluating the Randomness and Humanness of LLMs' Coin Flips - Humans are famously bad at being random. This study finds that GPT-4 and Llama 3 exhibit, and even exacerbate, nearly every human randomness bias tested, while GPT-3.5 behaves more randomly. Authors from Cornell.
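For flavor, here are toy versions of two classic bias checks from this line of work (not the paper's exact metrics). Humans tend to over-alternate between heads and tails and avoid long streaks, so a high alternation rate or a short longest run is a "human" signature in generated flips.

```python
from itertools import groupby

def alternation_rate(flips: str) -> float:
    """Fraction of consecutive pairs that differ; a fair coin gives ~0.5."""
    return sum(a != b for a, b in zip(flips, flips[1:])) / (len(flips) - 1)

def longest_run(flips: str) -> int:
    """Length of the longest streak of identical outcomes."""
    return max(len(list(g)) for _, g in groupby(flips))

print(alternation_rate("HTHTHHTHTT"))  # 0.78 - over-alternating, human-like
print(longest_run("HTHTHHTHTT"))       # 2   - suspiciously short for 10 flips
```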
The Platonic Representation Hypothesis - The authors put forth this hypothesis: "Neural networks, trained with different objectives on different data and modalities, are converging to a shared statistical model of reality in their representation spaces." Why the convergence? The authors point to task generality, model capacity, and simplicity bias. Authors from MIT. Explanatory outline with awesome visualizations.
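How do you even measure convergence across models? Here is a rough NumPy sketch of a mutual nearest-neighbor alignment score, in the spirit of (but not identical to) the paper's metric. `feats_a` and `feats_b` are assumed to be feature matrices from two different models over the same n inputs.

```python
import numpy as np

def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    def knn_indices(feats: np.ndarray) -> np.ndarray:
        x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sim = x @ x.T                    # cosine similarities between inputs
        np.fill_diagonal(sim, -np.inf)   # exclude self-matches
        return np.argsort(-sim, axis=1)[:, :k]

    na, nb = knn_indices(feats_a), knn_indices(feats_b)
    # Average overlap of each input's k nearest neighbors across the two
    # models: 1.0 means the representation spaces induce identical neighborhoods.
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(na, nb)]))
```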
Improving Clinician Performance in Classifying EEG Patterns on the Ictal–Interictal Injury Continuum Using Interpretable Machine Learning - This paper presents an interpretable deep-learning system that accurately classifies six patterns of potentially harmful EEG activity. Notably, the interpretable model significantly outperformed a corresponding uninterpretable black-box model. Authors from Duke. News Article.
An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation - By introducing information bottleneck theory into retrieval-augmented generation, this paper shows improvements on Q&A tasks. The approach filters noise by simultaneously maximizing the mutual information between the compressed context and the ground-truth output while minimizing the mutual information between the compressed context and the retrieved passages. Authors from Harbin Institute of Technology.
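For reference, the classic information-bottleneck objective adapted to this RAG setting looks like the following (notation is mine, not necessarily the paper's): minimizing the first term discards retrieved-passage noise, while the second term preserves answer-relevant information.

```latex
% X = retrieved passages, \tilde{X} = compressed context, Y = ground-truth output
\min_{p(\tilde{x} \mid x)} \; I(\tilde{X}; X) \;-\; \beta \, I(\tilde{X}; Y)
```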
🧠 Sources of Inspiration
1-bit LLMs are small, speedy, and nearly as accurate as their full-precision counterparts - this short article summarizes recent research in this space.
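A toy sketch of the ternary ("1.58-bit") weight quantization behind much of this research (e.g., BitNet b1.58): scale by the mean absolute weight, then round every weight to -1, 0, or +1. This is a simplification of the rounding step only, not the papers' full quantization-aware training recipe.

```python
import torch

def ternary_quantize(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    scale = w.abs().mean().clamp(min=1e-8)  # absmean scaling factor
    q = (w / scale).round().clamp(-1, 1)    # every weight becomes -1, 0, or +1
    return q, scale                         # forward pass multiplies q by scale

w = torch.randn(256, 256)
q, scale = ternary_quantize(w)
print(q.unique())  # tensor([-1., 0., 1.])
```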
LLM Merging Competition (NeurIPS 2024) - Recent work has shown that specialized fine-tuned models can be rapidly merged to combine capabilities and generalize to new skills. The competition deadline is in mid-September, so this is perfect if you are looking to build some skills over the summer! Winners receive a cash prize and an opportunity to present their work at NeurIPS 2024.
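For a sense of what "merging" means here, this is a minimal illustration of the simplest baseline, uniform weight averaging ("model soups"), assuming all checkpoints share an architecture. Competitive entries will be far more sophisticated (task arithmetic, TIES, DARE, etc.).

```python
import torch

def average_state_dicts(state_dicts: list[dict]) -> dict:
    """Merge same-architecture checkpoints by averaging each parameter."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Usage sketch:
#   merged = average_state_dicts([m1.state_dict(), m2.state_dict()])
#   model.load_state_dict(merged)
```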
Mozilla Builders Accelerator on open-source Local AI - up to $100k in funding, applications due end of July.
Reproducing GPT-2 in 90 minutes with $20 - Includes code and discussion.