Version: 25.5

Evaluation Limits

When using Snorkel for evaluation, each dataset is limited to fewer than 1k traces, and each trace to fewer than 100 steps.

LLM-as-a-judge (LLMAJ) iteration can be used only on the train and valid splits.
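The limits above can be checked locally before uploading a dataset. The following is a minimal sketch under stated assumptions: a dataset is modeled as a plain list of traces, each trace as a list of steps, and "1k" is read as 1,000. The helper names `check_dataset` and `can_iterate_llmaj` are hypothetical and are not part of any Snorkel SDK.

```python
from typing import List, Sequence

# Limits from this page. Reading "1k" as 1,000 is an assumption.
MAX_TRACES = 1000           # each dataset must have fewer than this many traces
MAX_STEPS_PER_TRACE = 100   # each trace must have fewer than this many steps
LLMAJ_SPLITS = {"train", "valid"}  # splits that support LLMAJ iteration


def check_dataset(traces: Sequence[Sequence[object]]) -> List[str]:
    """Hypothetical pre-flight check. Returns a list of limit violations;
    an empty list means the dataset fits within the stated limits."""
    problems = []
    if len(traces) >= MAX_TRACES:
        problems.append(
            f"dataset has {len(traces)} traces; must be fewer than {MAX_TRACES}"
        )
    for i, trace in enumerate(traces):
        if len(trace) >= MAX_STEPS_PER_TRACE:
            problems.append(
                f"trace {i} has {len(trace)} steps; "
                f"must be fewer than {MAX_STEPS_PER_TRACE}"
            )
    return problems


def can_iterate_llmaj(split: str) -> bool:
    """Hypothetical helper: True only for splits where LLMAJ iteration is allowed."""
    return split in LLMAJ_SPLITS


if __name__ == "__main__":
    small = [["step"] * 5] * 10          # 10 traces of 5 steps each
    print(check_dataset(small))          # no violations
    print(check_dataset([["step"] * 100]))  # one trace at the step limit
    print(can_iterate_llmaj("test"))     # not an LLMAJ split
```

The check uses strict "fewer than" comparisons to match the wording of the limits; if the actual service treats the limits as inclusive, the `>=` comparisons would need to become `>`.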