Glossary · Term

multimodal

Definition

AI systems that handle more than one kind of input, such as text and images together, rather than a single kind.

Models trained on or operating over multiple data modalities (text, image, audio, video, action), often using a unified representation space across modalities.
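
As a rough illustration of a unified representation space, the sketch below maps text features and image features of different sizes into one shared 2-dimensional space, where they can be compared directly. The projection matrices, feature values, and function names are all hypothetical and hand-picked for the example; a real model would learn its projections from data.

```python
import math


def project(features, weights):
    """Linear projection: one output value per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical projections (assumed for illustration, not learned):
# 3-d text features -> 2-d shared space, 4-d image features -> 2-d shared space.
TEXT_TO_SHARED = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]]
IMAGE_TO_SHARED = [[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5]]

text_vec = project([2.0, 0.0, 1.0], TEXT_TO_SHARED)
image_vec = project([1.0, 3.0, 0.0, 0.0], IMAGE_TO_SHARED)
other_vec = project([0.0, 0.0, 2.0, 2.0], IMAGE_TO_SHARED)

# Once both modalities live in the same space, one similarity measure
# applies to any pair, regardless of where each vector came from.
print(cosine(text_vec, image_vec))  # matching pair
print(cosine(text_vec, other_vec))  # non-matching pair
```

Each modality keeps its own encoder-side projection, but everything downstream of the shared space can treat text and image vectors interchangeably.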

Mentioned in 3 episodes

080 · How a Two-Agent Trick Unlocked Large-Scale Training for Computer-Use Agents
066 · Why Giving an AI Agent More Tools Can Make It Worse at Using a Computer
027 · When AI Agents Build the Serving Stack: A Bet on Bespoke Infrastructure