🗞 This Week in News
Web browser company Opera now lets users download and run LLMs locally. More than 150 models are available, including Llama, Gemma, and Vicuna; note that each model variant takes up more than 2 GB of local storage.
🥁 Interesting Products & Features
Stable Audio 2.0 sets a new standard in AI-generated audio, producing high-quality full tracks with coherent musical structure up to three minutes long. It also enables audio-to-audio generation, letting users upload samples and transform them with text prompts. Trained exclusively on licensed data 👏
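For the curious, here is a purely illustrative sketch of what an audio-to-audio call could look like. The endpoint URL and request fields below are hypothetical placeholders, not Stability AI's documented API; only the upload-a-sample-plus-prompt shape of the workflow comes from the announcement, so check their docs before use.

```python
# NOTE: the endpoint URL and request fields here are HYPOTHETICAL placeholders,
# not Stability AI's documented API.
import os
import requests

HYPOTHETICAL_URL = "https://api.stability.ai/audio-to-audio"  # placeholder

with open("drum_loop.wav", "rb") as sample:
    resp = requests.post(
        HYPOTHETICAL_URL,
        headers={"Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}"},
        files={"audio": sample},  # the sample to transform
        data={"prompt": "lo-fi hip hop rework, warm vinyl texture", "duration": 180},
    )
resp.raise_for_status()

with open("reworked.wav", "wb") as out:
    out.write(resp.content)  # the transformed track
```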
Cosmopedia: a dataset of synthetic textbooks, blog posts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. It contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date. To gauge the data's quality, they trained a 1B-parameter Llama 2-architecture model on it, and released that model (cosmo-1b) along with their code and the dataset. In other words: a model trained on LLM-created data that was quality-checked by another LLM. I particularly liked this anecdote from the article: “If you are anticipating tales about deploying large-scale generation tasks across hundreds of H100 GPUs, in reality most of the time for Cosmopedia was spent on meticulous prompt engineering.”
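If you want to poke at the data yourself, here's a minimal sketch using the Hugging Face datasets library; the repo id, subset, and column names follow the Hub listing, but treat them as assumptions to verify.

```python
# Minimal sketch: stream Cosmopedia so you don't download all ~25B tokens.
from datasets import load_dataset

# "stories" is one of several subsets (others include "wikihow" and "stanford").
ds = load_dataset("HuggingFaceTB/cosmopedia", "stories", split="train", streaming=True)

for sample in ds.take(3):
    print(sample["prompt"][:100])  # the Mixtral prompt used to generate the doc
    print(sample["text"][:100])    # the synthetic document itself
```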
OpenAI improves its fine-tuning API and expands its custom models program. Improvements include a new Playground UI for comparing models, Weights & Biases integration, better evaluation metrics, and hyperparameter configuration. Statement from OpenAI: “We believe that in the future, the vast majority of organizations will develop customized models that are personalized to their industry, business, or use case.”
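A hedged sketch of kicking off a job with the openai Python client (v1.x), including the hyperparameter and Weights & Biases options mentioned above; the file id and project name are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file="file-abc123",      # placeholder id of an uploaded JSONL file
    hyperparameters={"n_epochs": 3},  # per-job hyperparameter configuration
    integrations=[{"type": "wandb", "wandb": {"project": "my-finetunes"}}],
)
print(job.id, job.status)
```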
📄 Interesting Papers
Many-shot Jailbreaking: This paper investigates a method for evading the safety guardrails put in place by LLM developers. The TLDR: “many-shot jailbreaking” simply packs a large number of faux dialogue examples into a single long-context prompt, which causes LLMs to produce potentially harmful responses despite their safety training. The authors aim to find and expose such attacks before those with harmful intent do, and published this paper to accelerate research on mitigation strategies. Authors from Anthropic.
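To make the structure concrete, here is a benign sketch of the prompt shape the paper studies: many faux user/assistant turns packed into one long-context prompt (harmless trivia here stands in for the attack content; function names are mine, not the paper's).

```python
# Benign illustration of the many-shot prompt *structure* only.
faux_dialogues = [
    ("What is the capital of France?", "Paris."),
    ("What is 2 + 2?", "4."),
]

def build_many_shot_prompt(dialogues, final_question):
    """Concatenate many faux user/assistant turns, then append the real question."""
    shots = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in dialogues)
    return f"{shots}\nUser: {final_question}\nAssistant:"

# The paper's key finding: effectiveness scales with the number of shots,
# which today's long context windows allow to grow into the hundreds.
prompt = build_many_shot_prompt(faux_dialogues * 128, "What is the capital of Japan?")
print(prompt[:200])
```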
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want: This paper introduces a new model, a multi-domain dataset, and a benchmark for visual prompting. The new model is SPHINX-V, an MLLM that connects a vision encoder, a visual prompt encoder, and an LLM to handle various visual prompts (points, bounding boxes, circles, and free-form shapes) alongside language understanding.
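The paper's code isn't reproduced here, but a schematic sketch of how those three components could be wired together looks roughly like this; all class and method names are hypothetical.

```python
# Schematic only: image tokens, visual-prompt tokens, and text embeddings
# fused into one sequence for the LLM. Not the authors' implementation.
import torch
import torch.nn as nn

class VisualPromptMLLM(nn.Module):
    def __init__(self, vision_encoder, prompt_encoder, llm, d_model):
        super().__init__()
        self.vision_encoder = vision_encoder     # encodes the full image
        self.prompt_encoder = prompt_encoder     # encodes points/boxes/shapes
        self.llm = llm                           # autoregressive language model
        self.proj = nn.Linear(d_model, d_model)  # maps visual tokens to LLM space

    def forward(self, image, visual_prompts, text_embeds):
        img_tokens = self.proj(self.vision_encoder(image))          # (B, N_img, d)
        vp_tokens = self.proj(self.prompt_encoder(visual_prompts))  # (B, N_vp, d)
        # Prepend image and prompt tokens to the text embeddings, then let
        # the LLM attend over the combined sequence.
        inputs = torch.cat([img_tokens, vp_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```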
Implementing machine learning techniques for continuous emotion prediction from uniformly segmented voice recordings: In this study, the authors use nonsensical emotional audio clips: the sentences did not make sense, but they were spoken with a specific emotion. These clips were used to train models to classify the vocal emotion as fear, anger, joy, sadness, disgust, or neutral. The same clips were also played to human participants, who classified the emotions they heard. The models' classifications were comparable in accuracy to the humans'. Small sample size, but an interesting study! Authors from Thomas Bayes Institute.
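The study's exact pipeline isn't specified here, so below is a minimal sketch of one common approach to the same task (MFCC features plus an off-the-shelf classifier); file paths and labels are placeholders.

```python
# Minimal sketch, NOT the paper's pipeline: time-averaged MFCC features per
# clip, fed to a standard classifier.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

EMOTIONS = ["fear", "anger", "joy", "sadness", "disgust", "neutral"]

def mfcc_features(path, sr=16000):
    """Load a clip and summarize it as time-averaged MFCCs."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

# Placeholder data: in practice, one (path, label) pair per segmented clip.
clips = [("clip_000.wav", "anger"), ("clip_001.wav", "joy")]
X = np.stack([mfcc_features(p) for p, _ in clips])
y = np.array([EMOTIONS.index(label) for _, label in clips])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(EMOTIONS[clf.predict(X[:1])[0]])
# With real data, cross-validate and compare accuracy against human listeners.
```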
🧠 Sources of Inspiration
PoorOrpo: Finetuning LLMs on a budget: a Colab notebook offering an end-to-end LLM finetuning guide. It finetunes, evaluates, and runs inference with an LLM.
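The notebook's exact recipe isn't shown here, but a budget ORPO run with Hugging Face TRL, the kind of pipeline such a guide typically wires up, looks roughly like this; the model and dataset ids are illustrative, and the ORPOTrainer API may differ across TRL versions.

```python
# Minimal budget-ORPO sketch with Hugging Face TRL (ids are illustrative).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small enough for a free Colab GPU
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ORPO expects preference data with "prompt", "chosen", and "rejected" columns.
train_ds = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1%]")

args = ORPOConfig(
    output_dir="orpo-out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # keeps memory low on a single small GPU
    num_train_epochs=1,
)
trainer = ORPOTrainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer)
trainer.train()
```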
Based on our in-class discussion last week: study finds reliance on ChatGPT is linked to procrastination, memory loss, and a decline in academic performance. 😬
Using LLMs in Products - discusses potential implementations and their challenges. A good (relatively quick!) read for those interested in building products with LLM tech.