Not quite a fairytale: a bite of the Apple, a Dragonfly, and character tuning
In the News
This Week in News
What is "Apple Intelligence"? Apple has vertically integrated AI, from chips to data centers to on-device models. The platform has "3 levels of LLMs": a 3B on-device model, a larger model hosted on Apple's private servers, and third-party models (i.e., GPT-4o from ChatGPT). They rely heavily on fine-tuning for specific tasks and route each query to the model best suited to it. They are taking safety seriously and are continuously red-teaming their models.
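The routing piece is the most interesting architectural detail. As a purely hypothetical illustration (none of the names, tiers, or thresholds below come from Apple), query routing across "3 levels" of models might look something like this:

```python
# Hypothetical sketch of routing a query across three model tiers.
# This is NOT Apple's implementation; all names and thresholds are invented.
from dataclasses import dataclass

@dataclass
class Route:
    tier: str      # "on_device", "private_cloud", or "third_party"
    model: str

def route_query(query: str, needs_world_knowledge: bool) -> Route:
    """Pick a model tier based on a rough, illustrative complexity heuristic."""
    if needs_world_knowledge:
        # Broad, open-ended requests get handed off to a third-party model.
        return Route("third_party", "gpt-4o")
    if len(query.split()) < 50:
        # Short, personal-context tasks stay on the small on-device model.
        return Route("on_device", "on-device-3b")
    # Heavier tasks go to the larger model on private servers.
    return Route("private_cloud", "private-cloud-model")

print(route_query("Summarize this note", needs_world_knowledge=False))
```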
Anthropic researchers discuss how they developed Claude's character and share considerations, alignment strategies, and ideas for the future of "character training".
Two weeks ago we shared Anthropic's work on interpreting the inner workings of Claude using a sparse autoencoder. This week, OpenAI shared their own experiments using the same mechanistic interpretability methods on GPT-4. While not as comprehensive a report as Anthropic's, they shared a fascinating visualization tool that lets you browse features for both GPT-2 small and GPT-4.
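For readers who haven't dug into these papers, the core object is simple: a sparse autoencoder trained to reconstruct a model's internal activations through a wide, sparsely activating feature layer. A minimal PyTorch sketch (the dimensions and sparsity coefficient are assumptions, not values from either paper):

```python
# Minimal sparse autoencoder of the kind used in these interpretability papers:
# an overcomplete dictionary trained to reconstruct activations under an L1 penalty.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 768 * 16):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(32, 768)            # stand-in for residual-stream activations
recon, feats = sae(acts)
l1_weight = 1e-3                        # assumed sparsity coefficient
loss = ((recon - acts) ** 2).mean() + l1_weight * feats.abs().mean()
loss.backward()
```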
Interesting Products & Features
Stable Audio Open - an open-source text-to-audio model for generating audio samples and sound design from Stability AI. You can use it to create drum beats, instrument riffs, ambient sounds, foley and production elements up to 47 seconds long.
Qwen 2 released by Alibaba - pretrained and instruction-tuned models in 5 sizes, open-source on Hugging Face. Larger context window (128K tokens), SOTA performance on benchmarks, improvements in code and math, and training on 29 different languages. Check out the demo here.
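If you'd rather run it locally than use the demo, the checkpoints load with the standard transformers API. A quick sketch using the 7B instruct checkpoint (swap the repo id for the other released sizes):

```python
# Load and prompt an instruction-tuned Qwen2 checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain rotary position embeddings in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```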
Dragonfly - A large vision-language model with multi-resolution zoom from Together AI. They launched two open-source models, Llama-3-8b-Dragonfly-v1 and Llama-3-8b-Dragonfly-Med-v1, the latter trained specifically on biomedical image-instruction data. Dragonfly employs two key strategies, multi-resolution visual encoding and zoom-in patch selection, which let the model focus on fine-grained details in image regions and improve commonsense reasoning.
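As a rough illustration of those two strategies (this is not Dragonfly's actual code, and the patch scoring below is a stand-in for the model's learned selection):

```python
# Illustrative only: encode an image at multiple resolutions, then keep just the
# highest-scoring high-resolution patches ("zoom-in" selection) for the LLM.
import torch
import torch.nn.functional as F

def multi_resolution_patches(image: torch.Tensor, sizes=(224, 448), patch=112):
    """Split resized copies of an image (C, H, W) into patches at each resolution."""
    per_resolution = []
    for size in sizes:
        resized = F.interpolate(image.unsqueeze(0), size=(size, size), mode="bilinear")
        grid = resized.unfold(2, patch, patch).unfold(3, patch, patch)  # (1, C, gh, gw, p, p)
        patches = grid.permute(0, 2, 3, 1, 4, 5).reshape(-1, image.shape[0], patch, patch)
        per_resolution.append(patches)
    return per_resolution

def zoom_in_select(patches: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Keep the k patches with the highest (stand-in) relevance score."""
    scores = patches.flatten(1).abs().mean(dim=1)   # placeholder for a learned score
    return patches[scores.topk(k).indices]

image = torch.rand(3, 448, 448)
low_res, high_res = multi_resolution_patches(image)
selected = zoom_in_select(high_res)                 # only the most informative crops are kept
```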
Mistral releases a LoRA fine-tuning API for customizing their models.
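The fine-tuning itself runs on Mistral's side, but as a reminder of what LoRA actually adapts, here is a minimal local sketch using Hugging Face PEFT; the base checkpoint and hyperparameters are assumptions, not Mistral's defaults:

```python
# Minimal LoRA setup with Hugging Face PEFT (not Mistral's hosted API).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.3")
lora = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()         # only the small adapter weights are trainable
```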
Interesting Papers
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model - Compresses the UNet of Stable Diffusion to 1.99 bits (1.72 GB → 219 MB!). This is a 7.9X compression ratio, and they report improved generation quality across three benchmarks and in human evaluation. Browsing their results, the outputs look much better than the original model's, in my opinion. Authors from Snap.
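For a sense of scale, the arithmetic checks out, and a naive 2-bit uniform quantizer makes the idea concrete; BitsFusion's actual recipe (mixed precision, careful scale initialization, distillation) is far more involved than this sketch:

```python
# Back-of-the-envelope check of the reported numbers, plus a toy 2-bit quantizer.
import torch

print(1.72 * 1000 / 219)   # ~7.9x size reduction, matching the reported ratio

def quantize_2bit(w: torch.Tensor):
    """Map weights to 4 levels per tensor; returns integer codes and a scale."""
    scale = w.abs().max() / 1.5                              # levels at {-1.5, -0.5, 0.5, 1.5} * scale
    codes = torch.clamp(torch.round(w / scale + 1.5), 0, 3).to(torch.uint8)
    return codes, scale

def dequantize_2bit(codes: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (codes.float() - 1.5) * scale

w = torch.randn(256, 256)
codes, scale = quantize_2bit(w)
print((dequantize_2bit(codes, scale) - w).abs().mean())      # average quantization error
```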
Scalable MatMul-free Language Modeling - This paper shows that matrix multiplication operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. They also provide code compatible with the Hugging Face transformers library. Authors from University of California, Santa Cruz.
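One way to see how eliminating matrix multiplication is even possible: with weights constrained to {-1, 0, +1}, every "multiply" collapses into an addition or subtraction of input elements. A toy, unoptimized sketch of that idea (not the paper's kernels):

```python
# Illustrative ternary-weight "linear" layer computed with adds/subtracts only.
import torch

def ternary_linear(x: torch.Tensor, w_ternary: torch.Tensor) -> torch.Tensor:
    """Equivalent to x @ w_ternary.T when w_ternary has entries in {-1, 0, 1}."""
    out = torch.empty(x.shape[0], w_ternary.shape[0])
    for j in range(w_ternary.shape[0]):
        # For output unit j: sum inputs where the weight is +1, subtract where it is -1.
        out[:, j] = x[:, w_ternary[j] == 1].sum(dim=1) - x[:, w_ternary[j] == -1].sum(dim=1)
    return out

x = torch.randn(4, 16)
w = torch.randint(-1, 2, (8, 16))                      # ternary weights
assert torch.allclose(ternary_linear(x, w), x @ w.float().T, atol=1e-5)
```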
Towards a Personal Health Large Language Model - fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. They show impressive results that are comparable to human experts. Authors from Google.
GKAN: Graph Kolmogorov-Arnold Networks - Extends Kolmogorov-Arnold Networks (KAN) to graph-structured data. Shows improved performance relative to Graph Convolutional Networks. Authors from University of Southern California.
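For intuition, here is a loose sketch of one way to graft KAN-style learnable univariate functions onto a GCN-style layer. This is my simplification, not the paper's formulation: the univariate functions are simple learnable polynomials rather than splines.

```python
# Toy "graph KAN" layer: learnable univariate functions per (input, output) pair,
# followed by ordinary GCN-style aggregation over a normalized adjacency matrix.
import torch
import torch.nn as nn

class TinyGKANLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, degree: int = 3):
        super().__init__()
        # coeffs[o, i, d]: coefficient of x_i**d contributing to output feature o
        self.coeffs = nn.Parameter(torch.randn(out_dim, in_dim, degree + 1) * 0.1)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # Powers of each input feature: (N, in_dim, degree + 1)
        powers = torch.stack([x ** d for d in range(self.coeffs.shape[-1])], dim=-1)
        # Apply the univariate functions and sum over inputs: (N, out_dim)
        h = torch.einsum("nid,oid->no", powers, self.coeffs)
        return adj_norm @ h     # neighborhood aggregation, as in a GCN

n, f = 5, 8
x = torch.randn(n, f)
adj = torch.eye(n)              # stand-in for a normalized adjacency matrix
out = TinyGKANLayer(f, 4)(x, adj)
```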
Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses - This paper assesses the abilities of LLMs to perceive and integrate human intentions and emotions into their Theory of Mind (ToM) reasoning processes within open-ended questions. They use posts from Reddit's ChangeMyView platform, which demands nuanced social reasoning to craft persuasive responses. Their analysis reveals clear disparities in ToM reasoning capabilities in open-ended questions, even in the most advanced models. They implement a prompt tuning method which shows some improvement, but it still falls short of fully achieving human-like reasoning. Authors from University of Washington.
Sources of Inspiration
NeurIPS Competitions 2024 - including competitions on generative AI, LLMs, reinforcement learning, privacy, and bionic humans!
There were a lot of interesting papers this week!
Other papers that were interesting but didn't make the list above:
LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing
ReLUs Are Sufficient for Learning Implicit Neural Representations
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
The image for today's post shows results from BitsFusion.