Glossary · Term

ScienceWorld

Definition

A simulated science-lab environment for testing whether AI agents can carry out experimental procedures.

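A text-based interactive environment for evaluating LLM agents on multi-step, elementary-school-level science tasks such as growing plants or measuring temperatures. Its simulator models object states and physical processes such as thermodynamics and electrical circuits.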

Mentioned in 2 episodes

  1. Episode 064: When Agent Memory Stops Being a Database and Starts Being a Skill
  2. Episode 052: An Old Reinforcement Learning Tradeoff Sneaks Back Into LLM Agents