🗞 This Week in News
For people who listen to podcasts - what do you think of the new NotebookLM feature that can turn anything into a podcast? Let me know in the comments. I have listened to a few, but I am not an avid podcast listener, so I am curious what you think!
Meta Orion AR glasses - paired with an EMG-based wristband for control, these glasses will be able to overlay holograms on the real world in front of you. They are being released as a dev kit only for now, but this is a peek into the future of wearable tech. Orion combines a lot of cool tech, from biosignals to miniaturized projectors to AI.
OpenAI Academy - OpenAI to invest in developers and organizations leveraging AI to help solve hard problems and catalyze economic growth in their communities. The Academy will “ensure that the transformative potential of artificial intelligence is accessible and beneficial to diverse communities worldwide, starting in low- and middle-income countries.”
🥁 Interesting Products & Features
Raspberry Pi + Sony team up to release an AI-powered camera module that enables “edge AI solutions that process visual data”. It pairs the Raspberry Pi RP2040 microcontroller chip with Sony’s IMX500 image sensor and eliminates the need for additional components like accelerators or a GPU, which makes it a strong fit for vision-based AI on the edge. $70.
📄 Interesting Papers
Clinical evaluation of a machine learning–based early warning system for patient deterioration - A machine learning–based early warning system for patient deterioration was deployed at St. Michael’s Hospital for 1.5 years, and it reduced unexpected hospital deaths. Authors from St. Michael’s Hospital.
Time-MoE: Billion-scale Time Series Foundation Models with Mixture of Experts - a family of decoder-only transformer models that operate in an autoregressive manner and support flexible forecasting horizons with varying input context lengths. The models are pre-trained on Time-300B, which spans 9 domains and encompasses over 300 billion time points. This is the first time a time series foundation model has been scaled up to 2.4 billion parameters, and the model achieved significantly improved forecasting precision. The results validate that scaling laws for training tokens and model size apply to time series forecasting, which had not been shown before. Authors from Princeton, Squirrel AI, and Griffith University.
PGN: The RNN's New Successor is Effective for Long-Range Time Series Forecasting - PGN directly captures information from previous time steps through a dedicated Historical Information Extraction layer and uses gated mechanisms to select that information and fuse it with the current time step. This shortens the information propagation path, addressing a key limitation of RNNs. The authors also propose a temporal modeling framework called Temporal PGN (TPGN), which uses two branches to capture the semantic information of a time series. One branch uses PGN to capture long-term periodic patterns while preserving their local characteristics. The other branch uses patches to capture short-term information and aggregate the global representation of the series. Experimental results on five benchmark datasets demonstrate SOTA performance. Authors from Beijing Jiaotong University.
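To make the gating idea a bit more concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: the function name `gated_history_fusion`, the scalar weights, and the use of a simple mean over a lookback window as the "historical information" are all assumptions made for illustration. The point it shows is that each step reads its history directly from a window of earlier values rather than through a chain of recurrent updates, and a sigmoid gate decides how much of that history to mix with the current value.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_history_fusion(series, lookback=4, w_hist=1.0, w_cur=-1.0, bias=0.0):
    """Fuse each value with a summary of the values before it.

    Hypothetical illustration only: the weights are fixed scalars here,
    whereas a real model would learn them.
    """
    fused = []
    for t, x_t in enumerate(series):
        # Read history directly from the previous `lookback` steps.
        window = series[max(0, t - lookback):t]
        if not window:
            # No history yet, so pass the current value through.
            fused.append(float(x_t))
            continue
        h_t = sum(window) / len(window)  # stand-in for history extraction
        gate = sigmoid(w_hist * h_t + w_cur * x_t + bias)
        # Convex combination: gate near 1 favors history, near 0 favors now.
        fused.append(gate * h_t + (1.0 - gate) * x_t)
    return fused


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0]
    print(gated_history_fusion(series, lookback=3))
```

Because every step only looks at its own window, the steps do not depend on each other's outputs and could be computed in parallel, which is the contrast with an RNN that the summary above is pointing at.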
ControlEdit: A MultiModal Local Clothing Image Editing Method - proposes a new image editing method, ControlEdit, which recasts clothing image editing as multimodal-guided local inpainting of clothing images. The authors use a self-supervised learning approach and extend the channels of the feature extraction network to keep the clothing image style consistent before and after editing. They also design an inverse latent loss function to achieve soft control over the content of non-edited areas, and adopt Blended Latent Diffusion as the sampling method so that editing boundaries transition naturally and the content of non-edited areas stays consistent. Authors from School of Arts & Sciences, Beijing Institute of Fashion Technology.
Iterative Object Count Optimization for Text-to-image Diffusion Models - Diffusion models have a hard time with accurate numbers of objects. This paper focuses on optimizing the generated image based on a counting loss derived from a counting model that aggregates an object’s potential. Authors from Tel-Aviv University and Bar-Ilan University.
Breaking reCAPTCHAv2 - Using YOLO models, researchers were able to solve 100% of the captchas from Google's reCAPTCHAv2 system. This implies that current AI technologies can exploit advanced image-based captchas. Authors from ETH Zurich.
🧠 Sources of Inspiration
Practitioner’s Guide to Triton - Triton is a language for programming GPUs with “Python-ish” code, which is then compiled into PTX code (the same thing CUDA code is compiled into).
Do you know about AlphaChip? It is a relatively “old” framework from 2021 and one of the first reinforcement learning approaches used to solve a real-world engineering problem. It has led to a proliferation of research in AI for chips over the past few years. AlphaChip is an open-source framework for generating chip floorplans with distributed deep reinforcement learning.