This Week in News
OpenAI o1 - a new series of AI models designed to spend more time thinking before they respond. According to OpenAI, they can reason through complex tasks and solve harder problems than previous models in science, coding, and math. People have pretty mixed feelings about the new models but seem to agree on two things: they are slower and a lot more expensive.
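If you want to try them yourself, here's a minimal sketch of querying the models through OpenAI's Python SDK. It assumes the o1-preview model id; at launch the o1 models don't support system messages or custom sampling parameters, and they bill for hidden "reasoning tokens."

```python
# Minimal sketch: querying an o1 model via OpenAI's Python SDK.
# Assumes the "o1-preview" model id; reads OPENAI_API_KEY from the env.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```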
Interesting Products & Features
Pixtral 12B from Mistral AI - Mistral's first multimodal model. Built on Nemo 12B, the new model can answer questions about an arbitrary number of images of arbitrary size, given either URLs or images encoded using base64, the binary-to-text encoding scheme. Available on Hugging Face. Good news - it's released under the Apache 2.0 license.
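Here's a minimal sketch of the base64 route, assuming Mistral's OpenAI-style chat endpoint and the pixtral-12b-2409 model id - check the official docs for the exact payload shape:

```python
# Minimal sketch: sending a base64-encoded image to Pixtral.
# Endpoint, model id, and payload shape are assumptions based on
# Mistral's OpenAI-style chat API.
import base64
import os
import requests

with open("chart.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "pixtral-12b-2409",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show?"},
            {"type": "image_url", "image_url": f"data:image/png;base64,{b64}"},
        ],
    }],
}
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
)
print(resp.json()["choices"][0]["message"]["content"])
```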
Reader-LM from Jina AI - small language models for cleaning and converting HTML to Markdown.
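A minimal sketch of running it locally with transformers, assuming the jinaai/reader-lm-1.5b checkpoint on Hugging Face (the raw HTML simply goes in as the user message):

```python
# Minimal sketch: HTML -> Markdown with Reader-LM.
# "jinaai/reader-lm-1.5b" is an assumed checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jinaai/reader-lm-1.5b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

html = "<html><body><h1>Hello</h1><p>World</p></body></html>"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": html}],
    add_generation_prompt=True,
    return_tensors="pt",
)
output = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens (the Markdown)
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
```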
Hugging Face open-sourced the FineVideo dataset - a collection of over 43,000 YouTube videos with detailed annotations on scenes, characters, plot twists, and how audio and visuals play together, making it a versatile tool for everything from improving pre-trained models to fine-tuning AI for specific video tasks.
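The videos make the dataset large, so streaming is the practical way to poke at it. A minimal sketch, assuming the HuggingFaceFV/finevideo dataset id:

```python
# Minimal sketch: streaming FineVideo samples instead of downloading
# the whole dataset. "HuggingFaceFV/finevideo" is an assumed id.
from datasets import load_dataset

ds = load_dataset("HuggingFaceFV/finevideo", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # video bytes plus the per-video annotations
```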
Interesting Papers
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models - Introduces "Concept Sliders", low-rank adaptors applied on top of pretrained image generation models for precise editing. By using simple text descriptions or a small set of paired images, concept sliders are trained to represent the direction of desired attributes. At generation time, these sliders can be used to control the strength of the concept in the image, enabling nuanced tweaking. Authors from Northeastern and MIT.
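The core mechanic is easy to see in code: a slider is a low-rank update W + alpha * BA whose scale alpha is chosen at generation time. A toy PyTorch sketch (illustrative only, not the authors' implementation):

```python
# Toy sketch of the slider mechanic: a frozen base layer plus a
# low-rank update scaled by a strength chosen at inference time.
import torch
import torch.nn as nn

class SliderLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.alpha = 0.0  # slider strength, set at generation time

    def forward(self, x):
        return self.base(x) + self.alpha * (x @ self.A.T @ self.B.T)

layer = SliderLinear(nn.Linear(768, 768))
layer.alpha = 1.5   # push the concept (e.g. "older") harder
layer.alpha = -1.0  # or pull in the opposite direction
```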
One-Shot Diffusion Mimicker for Handwritten Text Generation - This paper proposes a one-shot diffusion mimicker for stylized handwritten text generation that requires only a single reference sample as style input and imitates its writing style to generate handwritten text with arbitrary content. Code is on GitHub. Authors from South China University of Technology and National University of Singapore.
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs - Hierarchical cOntext MERging (HOMER) is a method for extending the context limit without additional training. HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. Each chunk is then processed collectively, employing a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token-reduction technique precedes each merging, keeping memory usage efficient. Authors from various institutions, including KAIST, University of Michigan, and Carnegie Mellon University.
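To make the chunk-prune-merge loop concrete, here's a toy sketch. It operates on plain token lists; the actual method merges hidden states inside transformer layers and prunes by importance rather than position:

```python
# Toy sketch of HOMER-style hierarchical merging: split a long input
# into chunks, reduce tokens before each merge to bound memory, and
# merge adjacent chunks pairwise, level by level. Illustrative only.
def homer_merge(tokens, chunk_size=1024, keep_ratio=0.5):
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    while len(chunks) > 1:
        # token reduction precedes each merge (here: naive truncation)
        pruned = [c[: max(1, int(len(c) * keep_ratio))] for c in chunks]
        # merge adjacent chunks, as would happen at a deeper layer
        chunks = [pruned[i] + pruned[i + 1] if i + 1 < len(pruned) else pruned[i]
                  for i in range(0, len(pruned), 2)]
    return chunks[0]

merged = homer_merge(list(range(8192)))
print(len(merged))  # 8192 tokens compressed to a single 1024-token chunk
```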
Sources of Inspiration
Build a RAG Pipeline - step-by-step tutorial for building a RAG pipeline using Amazon S3, Vectorize, Pinecone, and OpenAI; the core retrieve-then-generate loop is sketched below.
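A minimal sketch of that loop, assuming an existing Pinecone index named "docs" already filled with embedded chunks (e.g. loaded from S3 and embedded via Vectorize or your own script):

```python
# Minimal sketch of retrieve-then-generate with Pinecone + OpenAI.
# Index name, metadata field, and models are assumptions.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("docs")

question = "What is our refund policy?"

# 1. Embed the query
emb = openai_client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

# 2. Retrieve the most similar chunks
hits = index.query(vector=emb, top_k=3, include_metadata=True)
context = "\n".join(m["metadata"]["text"] for m in hits["matches"])

# 3. Generate an answer grounded in the retrieved context
answer = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Answer using this context:\n{context}\n\nQ: {question}"}],
)
print(answer.choices[0].message.content)
```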
Build a Brain-Computer Interface - PiEEG-16 is a low-cost shield that lets you convert a Raspberry Pi into a brain-computer interface (EEG).
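A hypothetical read loop, assuming the board streams samples over the Pi's SPI bus; the bus, clock rate, and frame layout below are placeholders, so consult the PiEEG docs for the real protocol:

```python
# Hypothetical sketch: reading raw frames from a PiEEG-style board
# over SPI with the spidev library. All hardware parameters are
# placeholders, not the board's documented protocol.
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)                 # assumed: SPI bus 0, chip select 0
spi.max_speed_hz = 1_000_000   # assumed clock rate

frame = spi.readbytes(3 * 16)  # assumed: 16 channels, 24-bit samples
for ch in range(16):
    raw = bytes(frame[3 * ch: 3 * ch + 3])
    value = int.from_bytes(raw, "big", signed=True)
    print(f"channel {ch}: {value}")

spi.close()
```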
Build a fine-tuned LLM - fine-tune a 405B-parameter model on 64 H100 GPUs in one click with Axolotl and Lambda Cloud.
Build a serverless AI app - AWS AI Stack is a full-stack boilerplate project for building serverless AI applications on AWS.
Starst3r - Python package for 3D reconstruction from 2D images
Deepfakes & Social Engineering - article by VC firm Greylock; the firm is currently looking for cybersecurity founders in the deepfake detection space.
Cover photo from One-Shot Diffusion Mimicker for Handwritten Text Generation.