Not quite a fairytale: a bite of the Apple, a Dragonfly, and character tuning
In the News
This Week in News
What is "Apple Intelligence"? Apple has vertically integrated AI, from chips to data centers to on-device models. The platform has "3 levels of LLMs": a 3B on-device model, a larger model hosted on Apple's private servers, and third-party models (i.e., GPT-4o from ChatGPT). They rely heavily on fine-tuning for specific tasks and route each query to the model best suited to it. They are taking safety seriously and are continuously red-teaming their models.
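The routing piece is the most interesting architectural detail. As a purely hypothetical illustration (none of the names, tiers, or thresholds below come from Apple), query routing across "3 levels" of models might look something like this:

```python
# Hypothetical sketch of routing a query across three model tiers.
# This is NOT Apple's implementation; all names and thresholds are invented.
from dataclasses import dataclass

@dataclass
class Route:
    tier: str      # "on_device", "private_cloud", or "third_party"
    model: str

def route_query(query: str, needs_world_knowledge: bool) -> Route:
    """Pick a model tier based on a rough, illustrative complexity heuristic."""
    if needs_world_knowledge:
        # Broad, open-ended requests get handed off to a third-party model.
        return Route("third_party", "gpt-4o")
    if len(query.split()) < 50:
        # Short, personal-context tasks stay on the small on-device model.
        return Route("on_device", "on-device-3b")
    # Heavier tasks go to the larger model on private servers.
    return Route("private_cloud", "private-cloud-model")

print(route_query("Summarize this note", needs_world_knowledge=False))
```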
Anthropic researchers discuss how they developed Claude's character and share considerations, alignment strategies, and ideas for the future of "character training".
Two weeks ago we shared Anthropic's work on interpreting the inner workings of Claude using a sparse autoencoder. This week, OpenAI shared their own experiments using the same mechanistic interpretability methods on GPT-4. While not as comprehensive a report as Anthropic's, they shared a fascinating visualization tool that lets you browse features for both GPT-2 small and GPT-4.
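For readers who haven't dug into these papers, the core object is simple: a sparse autoencoder trained to reconstruct a model's internal activations through a wide, sparsely activating feature layer. A minimal PyTorch sketch (the dimensions and sparsity coefficient are assumptions, not values from either paper):

```python
# Minimal sparse autoencoder of the kind used in these interpretability papers:
# an overcomplete dictionary trained to reconstruct activations under an L1 penalty.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 768 * 16):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(32, 768)            # stand-in for residual-stream activations
recon, feats = sae(acts)
l1_weight = 1e-3                        # assumed sparsity coefficient
loss = ((recon - acts) ** 2).mean() + l1_weight * feats.abs().mean()
loss.backward()
```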
Interesting Products & Features
Stable Audio Open - an open-source text-to-audio model for generating audio samples and sound design from Stability AI. You can use it to create drum beats, instrument riffs, ambient sounds, foley and production elements up to 47 seconds long.
Qwen 2 released by Alibaba - pretrained and instruction-tuned models in 5 sizes, open-source on Hugging Face. Larger context window (128K tokens), SOTA performance on benchmarks, improvements in code and math, and training on 29 different languages. Check out the demo here.
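If you'd rather run it locally than use the demo, the checkpoints load with the standard transformers API. A quick sketch using the 7B instruct checkpoint (swap the repo id for the other released sizes):

```python
# Load and prompt an instruction-tuned Qwen2 checkpoint with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain rotary position embeddings in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```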
Dragonfly - A large vision-language model with multi-resolution zoom from Together AI. They launched two open-source models, Llama-3-8b-Dragonfly-v1 and Llama-3-8b-Dragonfly-Med-v1, the latter trained specifically on biomedical image-instruction data. Dragonfly employs two key strategies, multi-resolution visual encoding and zoom-in patch selection, which let the model focus on fine-grained details in image regions and improve commonsense reasoning.
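As a rough illustration of those two strategies (this is not Dragonfly's actual code, and the patch scoring below is a stand-in for the model's learned selection):

```python
# Illustrative only: encode an image at multiple resolutions, then keep just the
# highest-scoring high-resolution patches ("zoom-in" selection) for the LLM.
import torch
import torch.nn.functional as F

def multi_resolution_patches(image: torch.Tensor, sizes=(224, 448), patch=112):
    """Split resized copies of an image (C, H, W) into patches at each resolution."""
    per_resolution = []
    for size in sizes:
        resized = F.interpolate(image.unsqueeze(0), size=(size, size), mode="bilinear")
        grid = resized.unfold(2, patch, patch).unfold(3, patch, patch)  # (1, C, gh, gw, p, p)
        patches = grid.permute(0, 2, 3, 1, 4, 5).reshape(-1, image.shape[0], patch, patch)
        per_resolution.append(patches)
    return per_resolution

def zoom_in_select(patches: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Keep the k patches with the highest (stand-in) relevance score."""
    scores = patches.flatten(1).abs().mean(dim=1)   # placeholder for a learned score
    return patches[scores.topk(k).indices]

image = torch.rand(3, 448, 448)
low_res, high_res = multi_resolution_patches(image)
selected = zoom_in_select(high_res)                 # only the most informative crops are kept
```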
Mistral releases a LoRA fine-tuning API for customizing their models.
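The fine-tuning itself runs on Mistral's side, but as a reminder of what LoRA actually adapts, here is a minimal local sketch using Hugging Face PEFT; the base checkpoint and hyperparameters are assumptions, not Mistral's defaults:

```python
# Minimal LoRA setup with Hugging Face PEFT (not Mistral's hosted API).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.3")
lora = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()         # only the small adapter weights are trainable
```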
Interesting Papers
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model - Compresses the UNet of Stable Diffusion to 1.99 bits (1.72 GB → 219 MB!). This is a 7.9X compression ratio, and they report improved generation quality across three benchmarks and in human evaluation. Browsing their results, the outputs look much better than the original model's, in my opinion. Authors from Snap.
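For a sense of scale, the arithmetic checks out, and a naive 2-bit uniform quantizer makes the idea concrete; BitsFusion's actual recipe (mixed precision, careful scale initialization, distillation) is far more involved than this sketch:

```python
# Back-of-the-envelope check of the reported numbers, plus a toy 2-bit quantizer.
import torch

print(1.72 * 1000 / 219)   # ~7.9x size reduction, matching the reported ratio

def quantize_2bit(w: torch.Tensor):
    """Map weights to 4 levels per tensor; returns integer codes and a scale."""
    scale = w.abs().max() / 1.5                              # levels at {-1.5, -0.5, 0.5, 1.5} * scale
    codes = torch.clamp(torch.round(w / scale + 1.5), 0, 3).to(torch.uint8)
    return codes, scale

def dequantize_2bit(codes: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (codes.float() - 1.5) * scale

w = torch.randn(256, 256)
codes, scale = quantize_2bit(w)
print((dequantize_2bit(codes, scale) - w).abs().mean())      # average quantization error
```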
Scalable MatMul-free Language Modeling - This paper shows that matrix multiplication operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. They also provide code compatible with the Hugging Face transformers library. Authors from University of California, Santa Cruz.
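One way to see how eliminating matrix multiplication is even possible: with weights constrained to {-1, 0, +1}, every "multiply" collapses into an addition or subtraction of input elements. A toy, unoptimized sketch of that idea (not the paper's kernels):

```python
# Illustrative ternary-weight "linear" layer computed with adds/subtracts only.
import torch

def ternary_linear(x: torch.Tensor, w_ternary: torch.Tensor) -> torch.Tensor:
    """Equivalent to x @ w_ternary.T when w_ternary has entries in {-1, 0, 1}."""
    out = torch.empty(x.shape[0], w_ternary.shape[0])
    for j in range(w_ternary.shape[0]):
        # For output unit j: sum inputs where the weight is +1, subtract where it is -1.
        out[:, j] = x[:, w_ternary[j] == 1].sum(dim=1) - x[:, w_ternary[j] == -1].sum(dim=1)
    return out

x = torch.randn(4, 16)
w = torch.randint(-1, 2, (8, 16))                      # ternary weights
assert torch.allclose(ternary_linear(x, w), x @ w.float().T, atol=1e-5)
```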
Towards a Personal Health Large Language Model - fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. They show impressive results that are comparable to human experts. Authors from Google.
GKAN: Graph Kolmogorov-Arnold Networks - Extends Kolmogorov-Arnold Networks (KAN) to graph-structured data. Shows improved performance relative to Graph Convolutional Networks. Authors from University of Southern California.
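For intuition, here is a loose sketch of one way to graft KAN-style learnable univariate functions onto a GCN-style layer. This is my simplification, not the paper's formulation: the univariate functions are simple learnable polynomials rather than splines.

```python
# Toy "graph KAN" layer: learnable univariate functions per (input, output) pair,
# followed by ordinary GCN-style aggregation over a normalized adjacency matrix.
import torch
import torch.nn as nn

class TinyGKANLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, degree: int = 3):
        super().__init__()
        # coeffs[o, i, d]: coefficient of x_i**d contributing to output feature o
        self.coeffs = nn.Parameter(torch.randn(out_dim, in_dim, degree + 1) * 0.1)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # Powers of each input feature: (N, in_dim, degree + 1)
        powers = torch.stack([x ** d for d in range(self.coeffs.shape[-1])], dim=-1)
        # Apply the univariate functions and sum over inputs: (N, out_dim)
        h = torch.einsum("nid,oid->no", powers, self.coeffs)
        return adj_norm @ h     # neighborhood aggregation, as in a GCN

n, f = 5, 8
x = torch.randn(n, f)
adj = torch.eye(n)              # stand-in for a normalized adjacency matrix
out = TinyGKANLayer(f, 4)(x, adj)
```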
Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses - This paper assesses the abilities of LLMs to perceive and integrate human intentions and emotions into their Theory of Mind (ToM) reasoning processes within open-ended questions. They use posts from Reddit's ChangeMyView platform, which demands nuanced social reasoning to craft persuasive responses. Their analysis reveals clear disparities in ToM reasoning capabilities in open-ended questions, even in the most advanced models. They implement a prompt tuning method which shows some improvement, but it still falls short of fully achieving human-like reasoning. Authors from University of Washington.
Sources of Inspiration
NeurIPS Competitions 2024 - including competitions on generative AI, LLMs, reinforcement learning, privacy, and bionic humans!
There were a lot of interesting papers this week!
Other papers that were interesting but didn't make the list above:
LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing
ReLUs Are Sufficient for Learning Implicit Neural Representations
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
The image for today's post shows results from BitsFusion.