August 10, 2025

Snap Research 2025

Conferences & Events

The Snap Research team is leading innovation across AR & generative AI, recommendation systems, and personalized creative tools.

In 2025, we are showcasing our work across several of the leading industry conferences & events.

Past Events:

SIGGRAPH 2025 -- Vancouver, Canada from August 10th - August 14th

Nested Attention: Semantic-aware Attention Values for Concept Personalization

Nested Attention is a new method that improves identity preservation in image generation models, producing more consistent and accurate pictures of specific subjects across different styles and scenes. It does so by introducing a semantic-aware attention structure. This makes it possible to create personalized images, even combining different subjects, like a person and their pet, in one picture.

InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

This paper introduces InstantRestore, a method for restoring degraded face images using a single forward pass through a diffusion model. It aims to retain identity-specific features, supporting efficient identity-aware restoration for portrait photo enhancement.

Dynamic Concepts Personalization from Single Videos

Set-and-Sequence is a new framework for video generation models that addresses the challenge of generating videos with “dynamic concepts” – entities defined not only by their appearance but also by their unique motion patterns across time, such as ocean waves or a flickering bonfire. Set-and-Sequence enables realistic video personalization by learning how dynamic subjects behave over time, allowing for consistent motion, scene composition, and cross-scene blending.

DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling

DuetGen is a framework for generating synchronized two-person dance motions directly from music. It addresses the challenge of modeling interactive choreography, such as coordinated movement and physical interactions between dance partners. The system enables realistic duet dance generation for applications in animation, virtual avatars, and digital performance.

Be Decisive: Noise-Induced Layouts for Multi-Subject Generation

Our work Be Decisive tackles the challenge of accurately generating multiple distinct subjects in complex images without visual inaccuracies or unintended blending. Be Decisive introduces a small neural network that predicts and refines a noise-induced spatial layout during denoising, guiding where each subject should appear from the earliest stages of image generation. This allows for the creation of highly detailed images with multiple specific subjects, ensuring clear boundaries and natural compositions between them.

KDD 2025 -- Toronto, Ontario, Canada from August 3rd - August 7th

GiGL: Large-Scale Graph Neural Networks at Snapchat

GiGL is an open-source library for training and running Graph Neural Networks (GNNs) on large-scale graphs, supporting hundreds of millions of nodes and billions of edges. GiGL is used at Snap across key machine learning applications, including user growth, content ranking, and advertising.

On the Role of Weight Decay in Collaborative Filtering: A Popularity Perspective

This paper introduces PRISM (Popularity-awaRe Initialization Strategy for embedding Magnitudes). PRISM replaces embedding weight decay, a common but expensive technique in recommendation model training, with a single lightweight computation at the onset of training. PRISM is fast and simple to apply, leading to more efficient recommendation systems.

Revisiting Self-Attention for Cross-Domain Sequential Recommendation

This work introduces AutoCDSR, a method for improving how sequential recommendation systems predict user behavior across different interaction domains by promoting effective knowledge sharing while mitigating noisy or irrelevant signals. AutoCDSR improves the accuracy and robustness of personalization in recommendation settings.

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

SnapGen is a high-performance text-to-image research model designed to run directly on mobile devices, generating high-quality images in under two seconds. It has the potential to drastically reduce the compute and memory required for on-device image generation.

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

SnapGen-V extends our SnapGen model to generate five-second videos directly on mobile devices in just five seconds. It brings fast, on-device video generation into reach, building on our advances in text-to-image modeling.

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

The 4Real-Video research model generates realistic 4D videos with rich detail and natural motion, viewable from multiple angles. This technology has potential applications in immersive VR and next-gen storytelling experiences.

Stable Flow: Vital Layers for Training-Free Image Editing

Our Stable Flow research model enables powerful image editing, such as adding or removing objects, without requiring complex training or high-end hardware. This approach allows anyone to edit photos with ease, with no technical expertise needed.

Omni-ID: Holistic Identity Representation Designed for Generative Tasks

Our Omni-ID research model builds a comprehensive representation of a person’s face across various angles and expressions, enabling more realistic and personalized AI and AR generations.

PrEditor3D: Fast and Precise 3D Shape Editing

PrEditor3D is a tool developed by our research teams that allows for quick and precise editing of 3D models with minimal input, simplifying how 3D shapes are manipulated and adjusted. In application, PrEditor3D has the potential to make it easier for animators and Lens creators to bring their visions to life efficiently, leading to richer and more immersive AR experiences.

Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning

MM-Graph introduces the first benchmark for multi-modal graph learning, incorporating both visual and textual data to address the lack of visual information in current benchmarks. This allows for more comprehensive model evaluation and drives innovation in graph learning systems that can understand richer, real-world inputs.

Video Alchemist

Given a text prompt and a set of reference images, Video Alchemist generates videos without extensive tuning or optimization. In application, this will streamline video personalization with custom appearances and backgrounds, saving time while enhancing creativity.

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Mind the Time introduces precise temporal control into AI-generated videos. It allows creators to dictate the sequence and timing of events, enabling more structured, coherent storytelling in video generation.

Video Motion Transfer with Diffusion Transformers

Video Motion Transfer is a method for transferring realistic motion from one video to another using a diffusion research model. In application, this model could create videos with realistic movement by transferring motion from reference videos, without needing complex setups.

Wonderland: Navigating 3D Scenes from a Single Image

Wonderland creates detailed 3D scenes from just one photo, allowing for faster and more efficient design without needing multiple angles or extensive resources.

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

AC3D improves camera control within video generation models, enabling smoother, more realistic movement. This gives creators more flexibility over camera movements in videos, and improves the quality and realism of generated scenes.

*All models and work outlined here are for research purposes only.

This post will continue to be updated.
