This Week in News
Llama 3.1 from Meta - Open-source is king this week - Llama 3.1 benchmark scores top the leaderboards, outshining closed models GPT-4o and Claude 3.5 Sonnet on some tasks. Sizes include 405B, 70B, and 8B parameters. Context window of 128k tokens. Open weights on Hugging Face.
AI achieves silver-medal standard solving International Mathematical Olympiad problems - AlphaProof and AlphaGeometry 2 together solved four out of six problems from this year's International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time.
Interesting Products & Features
Mistral Large 2 from Mistral AI - 123B parameters, 128k context window, supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Released for research (non-commercial) use.
SAM 2 from Meta - SOTA real-time object segmentation in images and videos. Meta open sourced the model and the training data. Try it yourself in their free web demo. It's pretty impressive! If you have worked with SAM before, you will notice significant improvements in SAM 2.
The free-tier Gemini experience just got faster and better: Google has added Gemini 1.5 Flash to Gemini. The improvements are especially noticeable in reasoning and image understanding. The update also quadruples Gemini's context window to 32K tokens.
OpenAI is beta testing SearchGPT, a prototype of new AI search features that "give you fast and timely answers with clear and relevant sources". The demos are interesting, but you will have to join the waitlist to test it for yourself.
AI Studio from Meta - AI Studio is "a place for people to create, share and discover AIs to chat with - no tech skills required". It allows anyone to create their own AI chatbot "to make you laugh, generate memes, give travel advice and so much more". Creators can also make an AI as an extension of themselves to answer common DM questions and story replies.
GPT-4o Long Output - an experimental version of GPT-4o with a maximum of 64K output tokens per request
Interesting Papers
SurvReLU: Inherently Interpretable Survival Analysis via Deep ReLU Networks - bridges the gap between previous deep survival models and traditional tree-based survival models through deep ReLU networks. A deliberately constructed deep ReLU network (SurvReLU) can combine the interpretability of tree-based structures with the representational power of deep survival models. Authors from University of Arkansas and Washington University in St. Louis.
KAN or MLP: A Fairer Comparison - a more comprehensive comparison of KAN and MLP models across various tasks, including computer vision, audio processing, natural language processing, and symbolic formula representation. The authors control the number of parameters and FLOPs to compare the performance of KAN and MLP. The main observation is that, except for symbolic formula representation tasks, MLP generally outperforms KAN. Authors from National University of Singapore. (Side note - I appreciate the comprehensive evaluation, although it would be nice to also see an analysis of the differences in interpretability, since that is a huge benefit of KAN.)
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation - this paper introduces a new approach to object segmentation, which segments objects in 3D point clouds using natural language descriptions. Authors from Fudan University.
StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory - Moving object segmentation based on LiDAR is a crucial and challenging task for autonomous driving and mobile robotics. A critical issue is inconsistent segmentation results for the same object in different frames. This paper proposes a streaming network with a memory mechanism, called StreamMOS, to build the association of features and predictions among multiple inferences. They utilize a short-term memory to convey historical features and build a long-term memory to store previous predictions and exploit them to refine the present forecast. Authors from Northeastern University, Shenyang, China.
Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal - This one is pretty neat. They introduce a large-scale, real-world raindrop removal dataset called Raindrop Clarity. Raindrop Clarity comprises 15,186 high-quality pairs/triplets (raindrops, blur, and background) of images with raindrops and the corresponding clear background images. The dataset can be used to explore background-focused and raindrop-focused images, including challenges unique to daytime and nighttime conditions. Authors from National University of Singapore.
Sources of Inspiration
Fine-tune GPT-4o mini for free through September 23.
"Friend" open-source AI wearable necklace dev kit - from the same folks who brought us "Open Glass" - the open-source glasses dev kit
Using an LLM as a judge for evaluations - a nice review blog from a Netflix ML engineer
Cover photo from SAM 2 from Meta.