🗞 This Week in News
Gemini 1.5 from Google achieves quality comparable to 1.0 Ultra while using less compute, and uses a Mixture-of-Experts architecture. The exciting advancement here is the 1M-token context window (compare to the 128k-token context window of OpenAI’s gpt-4-turbo). If you are wondering what a context window is, check out the “Continued Learning” section below.
OpenAI’s new text-to-video model Sora launches with video previews that are knocking everyone’s socks off. TBD how good it is in “real life,” but this could be another disruptive technology from OpenAI. 🎥 (Thanks to Mark DeLong for sending this link over to share!)
🥁 Interesting Products & Features
🚀 Groq is able to run LLM inference at incredible speeds. They are using Language Processing Units (LPUs), which they claim are faster than GPUs for running LLMs. Currently serving Mixtral and Llama models.
✨Just launched ✨ Taipy is an open-source Python library for building data and AI web apps (new alternative to Streamlit)
📄 Interesting Papers
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence: This paper critically assesses 23 state-of-the-art LLM benchmarks through the lenses of people, process, and technology, under the pillars of functionality and security. Authors from La Trobe University.
Can Separators Improve Chain-of-Thought Prompting?: Using Chain-of-Thought (CoT) prompting with separators significantly improves LLM performance on complex reasoning tasks compared with vanilla CoT, which does not use separators. Authors from Yonsei University.
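To make the idea concrete, here is a toy sketch of building a CoT prompt with and without separators. The exemplar problems and the "###" separator choice below are illustrative assumptions, not taken from the paper:

```python
# Toy illustration of Chain-of-Thought prompting with separators.
# The exemplars and the "###" separator are illustrative, not from the paper.

exemplars = [
    "Q: Tom has 3 apples and buys 2 more. How many apples does he have?\n"
    "A: Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 5. The answer is 5.",
    "Q: A train travels 60 miles in 1 hour. How far does it go in 3 hours?\n"
    "A: The train covers 60 miles per hour. In 3 hours that is 60 * 3 = 180 miles. The answer is 180.",
]

question = "Q: Sara reads 10 pages a day. How many pages does she read in a week?\nA:"

SEP = "\n###\n"  # a visible delimiter between reasoning exemplars

vanilla_prompt = "\n\n".join(exemplars + [question])  # plain CoT: blank lines only
separated_prompt = SEP.join(exemplars + [question])   # CoT with explicit separators

print(separated_prompt)
```

The claim tested in the paper is that marking exemplar boundaries explicitly (as in `separated_prompt`) helps the model keep each reasoning chain distinct.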
🧠 Sources of Inspiration
YC Request for Startups includes climate tech, robotics, spatial computing, cancer tech, and explainable AI.
Hacker News thread: How do you come up with side project ideas in 2024?
🍎 Continued Learning
What’s a context window?
In the ML space, a “context window” refers to the number of tokens (tokens = broken-up pieces of text; you can think of these roughly as words) that can be sent to the model at one time.
I like to think of this as the model’s “working memory”: the immediate information available to the model in a non-compressed format. (All of a model’s learned data is extremely compressed, similar to how you can barely remember the breakfast you ate two Sundays ago because that information has given way to more relevant information, like the reading of this newsletter.)
This is the only information the model has about the current world, and it includes any previous conversation, searches, images, or documents you give the model.
Human working memory is limited to about seven items, plus or minus two (according to Miller’s Law, which is why phone numbers are seven digits long). With the jump from 128k tokens (gpt-4-turbo) to 1M+ tokens (gemini-1.5), this would be like suddenly being able to remember eight phone numbers at one time (how’s that for a party trick?).
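The mechanics can be sketched in a few lines of code. Note this uses naive whitespace splitting as a stand-in tokenizer purely for illustration; real models use subword tokenizers (e.g. BPE), and the function below is a hypothetical helper, not any provider’s API:

```python
# Naive sketch of how a context window limits what a model can "see".
# Whitespace splitting stands in for a real subword tokenizer (illustrative only).

def tokenize(text: str) -> list[str]:
    # Stand-in tokenizer: one "token" per whitespace-separated word.
    return text.split()

def fit_to_context(history: list[str], context_window: int) -> list[str]:
    """Keep the most recent messages that fit inside the context window."""
    kept, used = [], 0
    for message in reversed(history):  # walk from newest to oldest
        n = len(tokenize(message))
        if used + n > context_window:
            break  # older messages fall out of "working memory"
        kept.append(message)
        used += n
    return list(reversed(kept))  # restore chronological order

conversation = [
    "Hello, can you summarize this article for me?",
    "Sure, paste the article text here.",
    "Here is the article: ...",
]

print(fit_to_context(conversation, context_window=12))
```

A bigger `context_window` simply means more of the conversation (or more documents) survives this truncation step, which is why the jump from 128k to 1M tokens matters.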
What other news from the week are you excited about? Leave a comment!
Also worth checking out: V-JEPA by Meta.
https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/