June 11, 2025

Snap at CVPR

How Snap Research is Poised to Shape the Future of Creative Digital Technology

This year, we are sharing 12 papers at CVPR 2025, the premier conference for AI and computer vision innovation, taking place in Nashville, Tennessee, from today through June 15th.

77% of Snap Research submissions were accepted, well above the conference-wide acceptance rate of 22%, a testament to the innovative work being done by our team.

Snap will present on a range of topics, including two of our papers, SnapGen and 4Real-Video, which CVPR highlighted as among the top 3% of submissions.

See below for a complete summary and schedule.

1. SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Snap AI T2I Model for Mobile Devices

SnapGen is a high-performance text-to-image research model designed to run directly on mobile devices, generating high-quality images in under two seconds. It has the potential to drastically reduce the compute and memory required for on-device image generation.
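SnapGen itself is a research model and is not publicly released, but the few-step, low-latency generation it targets can be illustrated with a minimal sketch. The snippet below uses a small distilled public model (sdxl-turbo) via the diffusers library purely as a stand-in; the model name, step count, and hardware target are assumptions for illustration, not SnapGen's actual implementation.

```python
# Minimal sketch: few-step text-to-image generation, in the spirit of
# SnapGen's low-latency goal. SnapGen is not publicly released, so a
# small distilled public model (sdxl-turbo) serves as a stand-in here.
import time
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16
).to("cuda")  # on a phone this would target the device's NPU/GPU instead

start = time.perf_counter()
image = pipe(
    prompt="a golden retriever wearing sunglasses, studio photo",
    num_inference_steps=1,   # distilled models need very few denoising steps
    guidance_scale=0.0,      # turbo-style models are trained guidance-free
).images[0]
print(f"generated in {time.perf_counter() - start:.2f}s")
image.save("out.png")
```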

2. SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device

SnapGen-V extends our SnapGen model to generate five-second videos directly on mobile devices in just five seconds. It brings fast, on-device video generation into reach, building on our advances in text-to-image modeling.

3. 4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion 

The 4Real-Video research model generates realistic 4D videos with rich detail and natural motion, viewable from multiple angles. This technology has potential applications in immersive VR and next-gen storytelling experiences.

4. Stable Flow: Vital Layers for Training-Free Image Editing

Our Stable Flow research model enables powerful image editing, such as adding or removing objects without requiring complex training or high-end hardware. This approach allows anyone to edit photos with ease, no technical expertise needed.

5. Omni-ID: Holistic Identity Representation Designed for Generative Tasks

Our Omni-ID research model builds a comprehensive representation of a person’s face across various angles and expressions, enabling more realistic and personalized AI and AR generations.

6. PrEditor3D: Fast and Precise 3D Shape Editing

PrEditor3D, a tool developed by our research teams, enables quick and precise editing of 3D shapes with minimal input, streamlining 3D content creation. In application, PrEditor3D has the potential to make it easier for animators and Lens creators to bring their visions to life efficiently, leading to richer and more immersive AR experiences.

7. Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning 

MM-Graph introduces the first benchmark for multi-modal graph learning, incorporating both visual and textual data to address the absence of visual information in current benchmarks. This allows for more comprehensive model evaluation and drives innovation in graph learning systems that can understand richer, real-world inputs.

8. Video Alchemist

With a text prompt and a set of reference images, Video Alchemist generates personalized videos without extensive tuning or optimization. In application, this will streamline video personalization with custom appearances and backgrounds, saving time while enhancing creativity.
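Video Alchemist's interface is not public; as a hedged illustration of that input contract (one prompt plus a handful of reference images per subject, with no fine-tuning pass), here is a small hypothetical sketch. The `Subject` dataclass and all field names are invented for clarity and are not the paper's actual API.

```python
# Hypothetical sketch of the conditioning inputs a tuning-free personalized
# video generator works from: one text prompt plus reference images per
# subject. Names and structure are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class Subject:
    word: str                        # token in the prompt this subject binds to
    reference_paths: list[str] = field(default_factory=list)

request = {
    "prompt": "<dog> chasing a frisbee on a beach at sunset",
    "subjects": [Subject("<dog>", ["dog_front.jpg", "dog_side.jpg"])],
    "background": "beach_reference.jpg",   # optional custom background
}
print(request)
```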

9. Mind the Time: Temporally-Controlled Multi-Event Video Generation

Mind the Time introduces precise temporal control into AI-generated videos, allowing creators to dictate the sequence and timing of events and enabling more structured, coherent storytelling in video generation.
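To make "temporal control" concrete, here is a hypothetical sketch of a timed event schedule: each event carries a description and a frame range, and a temporally-controlled generator would condition each frame only on the events covering it. The structure and names are illustrative assumptions, not the paper's actual prompt format.

```python
# Hypothetical timed prompt schedule: each event gets a text description
# and a frame range. A generator with temporal control would condition
# each frame on the events whose range covers it, so ordering and timing
# follow the script. Illustrative names only.
from dataclasses import dataclass

@dataclass
class TimedEvent:
    text: str          # what should happen
    start_frame: int   # first frame of the event (inclusive)
    end_frame: int     # last frame of the event (exclusive)

script = [
    TimedEvent("a cat walks up to a red ball", 0, 48),
    TimedEvent("the cat bats the ball off the table", 48, 96),
    TimedEvent("the ball bounces across the floor", 96, 144),
]

for ev in script:
    print(f"frames {ev.start_frame:3d}-{ev.end_frame:3d}: {ev.text}")
```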

10. Video Motion Transfer with Diffusion Transformers

Video Motion Transfer is a method for transferring realistic motion from one video to another using a diffusion research model. In application, this model could make it easy to create videos with realistic movement by transferring motion from reference videos, without needing complex setups.

11. Wonderland: Navigating 3D Scenes from a Single Image

Wonderland creates detailed 3D scenes from just one photo, allowing for faster and more efficient design without needing multiple angles or extensive resources.

12. AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

AC3D improves camera control within video generation models, enabling smoother, more realistic movement. This gives creators more flexibility over camera movements in videos, and improves the quality and realism of generated scenes.
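As a rough illustration of what camera-control conditioning looks like, the sketch below builds a smooth per-frame camera trajectory (a quarter orbit) as 4x4 pose matrices, the kind of signal a camera-conditioned video model can take as input. The look-at convention, helper function, and parameters are assumptions for illustration, not AC3D's actual interface.

```python
# Sketch of per-frame camera poses for a smooth orbit, the kind of
# trajectory a camera-controlled video model is conditioned on.
# The camera-to-world convention here is an assumption for illustration.
import numpy as np

def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 camera-to-world matrix looking from `eye` at `target`."""
    fwd = target - eye
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, fwd)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2], pose[:3, 3] = right, true_up, fwd, eye
    return pose

num_frames, radius = 49, 3.0
angles = np.linspace(0, np.pi / 2, num_frames)   # quarter orbit over the clip
trajectory = np.stack([
    look_at(np.array([radius * np.cos(a), 0.5, radius * np.sin(a)]),
            target=np.zeros(3))
    for a in angles
])                                               # shape: (num_frames, 4, 4)
print(trajectory.shape)
```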

Come find us at CVPR! 

*All models and work outlined here are for research purposes only.
