Glossary · Term

Show-o2

Definition

A unified multimodal (vision-language) model that interleaves autoregressive text generation with diffusion-based image generation. In VibeServe, it is used as a long-tail serving target.

Mentioned in 1 episode

  1. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure