Glossary · Term

multimodal

Definition

AI systems that handle more than one kind of data, such as text and images together, instead of just one.

Models trained on or operating over multiple data modalities (text, image, audio, video, action), often using a unified representation space across modalities.
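
As a rough illustration of what a unified representation space means, the sketch below projects toy text and image feature vectors into the same two-dimensional space and compares them with cosine similarity. Everything here is an assumption made for illustration: the projection weights, the feature values, and the names (TEXT_TO_SHARED, IMAGE_TO_SHARED, project, cosine) are hypothetical and hand-picked, whereas a real multimodal model would learn its projections from data.

```python
import math

def project(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical, hand-picked weights; a trained model would learn these.
TEXT_TO_SHARED = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]]
IMAGE_TO_SHARED = [[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5]]

# Toy features: text and images start in spaces of different sizes.
text_features = [1.0, 0.0, 0.2]      # stands in for a caption
image_match = [0.9, 1.1, 0.0, 0.0]   # stands in for a matching image
image_other = [0.0, 0.0, 1.0, 1.0]   # stands in for an unrelated image

# After projection, all three vectors live in the same 2-D space.
text_vec = project(TEXT_TO_SHARED, text_features)
match_vec = project(IMAGE_TO_SHARED, image_match)
other_vec = project(IMAGE_TO_SHARED, image_other)

similarity_match = cosine(text_vec, match_vec)
similarity_other = cosine(text_vec, other_vec)

print(f"caption vs matching image:  {similarity_match:.3f}")
print(f"caption vs unrelated image: {similarity_other:.3f}")
```

The point of the sketch is only that, once both modalities are mapped into one space, a caption and an image can be compared directly, and the matching pair scores higher than the unrelated pair.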

Mentioned in 3 episodes

  1. 080
    How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents
  2. 066
    Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
  3. 027
    When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure