Using dataset views for generative AI
Dataset views power the generative AI annotation data viewers. Snorkel Flow offers two dataset view types, each suited to a different review task:
- Single response review: View and annotate LLM responses one at a time.
- Response ranking: Rank multiple LLM responses relative to one another.
To create a dataset view
You need to create the dataset view only once per dataset. You can create it in the SDK using snorkelflow.client.dataset_views.
Single response view
```python
import snorkelflow.client as sf

dataset_name = "your-dataset"
view_name = "Single Response View"
view_type = "single_llm_response_view"

# Map each field of the view (key) to a column in your dataset (value)
column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

sf.create_dataset_view(
    dataset=dataset_name,
    name=view_name,
    view_type=view_type,
    column_mapping=column_mapping,
)
```
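Before calling create_dataset_view, it can help to confirm that every column named in column_mapping actually exists in your dataset. The sketch below assumes that the mapping's values are dataset column names, as the example suggests, and uses a hypothetical helper, missing_columns, which is not part of the Snorkel Flow SDK. You would pass it the column names of your data, for example the columns of a pandas DataFrame.

```python
from typing import Iterable, List, Mapping


def missing_columns(column_mapping: Mapping[str, str], columns: Iterable[str]) -> List[str]:
    """Return mapped dataset columns that are absent from `columns`.

    Hypothetical helper (not part of the Snorkel Flow SDK). Assumes each value
    in `column_mapping` names a column of the dataset.
    """
    available = set(columns)
    return [col for col in column_mapping.values() if col not in available]


column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

# Illustrative dataset that lacks the prompt_prefix column
dataset_columns = ["questions", "responses", "rc_title_text"]
print(missing_columns(column_mapping, dataset_columns))  # ['prompt_prefix']
```

An empty list means every mapped column is present.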
Ranking view
The ranking view scales to as many labels as you need. Once a response is assigned a rank, it changes from white to gray, and the responses are rearranged in the user interface in ascending order of rank. Responses are grouped by their prompt and retrieved context.
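To make the grouping and ordering behavior concrete, the following plain-Python sketch models it. It is not Snorkel Flow code: the Response record and its fields (prompt, context, text, rank) are hypothetical, and listing unranked responses after the ranked ones is an assumption, since the description above does not say where unranked responses appear.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Response:
    # Hypothetical record shape; not a Snorkel Flow type
    prompt: str
    context: str
    text: str
    rank: Optional[int] = None  # None means not yet ranked

    @property
    def color(self) -> str:
        # Ranked responses turn gray; unranked ones stay white
        return "gray" if self.rank is not None else "white"


def arrange(responses: List[Response]) -> Dict[Tuple[str, str], List[Response]]:
    """Group responses by (prompt, context) and sort each group by ascending rank.

    Assumption: unranked responses follow the ranked ones, in input order
    (Python's sort is stable, so their relative order is preserved).
    """
    groups: Dict[Tuple[str, str], List[Response]] = defaultdict(list)
    for r in responses:
        groups[(r.prompt, r.context)].append(r)
    return {
        key: sorted(items, key=lambda r: (r.rank is None, r.rank or 0))
        for key, items in groups.items()
    }


responses = [
    Response("Q1", "ctx", "answer A", rank=2),
    Response("Q1", "ctx", "answer B"),
    Response("Q1", "ctx", "answer C", rank=1),
    Response("Q2", "ctx", "answer D", rank=1),
]
for (prompt, _), items in arrange(responses).items():
    print(prompt, [(r.text, r.color) for r in items])
# Q1 [('answer C', 'gray'), ('answer A', 'gray'), ('answer B', 'white')]
# Q2 [('answer D', 'gray')]
```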
The ranking view works only for datasets created through the SDK with the import_data command, following the LLM fine-tuning cookbook.
```python
import snorkelflow.client as sf
from snorkelflow.sdk import Dataset

dataset_name = "your-dataset"
view_name = "Ranking View"
view_type = "ranking_llm_responses_view"

# Map each field of the view (key) to a column in your dataset (value)
column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

# List the dataset's label schemas to find the ranking schema UID
dataset = Dataset.get(dataset_name)
for label_schema in dataset.label_schemas:
    print(label_schema.name, label_schema.uid)

# Placeholder: replace ... with the ranking schema UID printed above
ranking_schema_uid = ...

sf.create_dataset_view(
    dataset=dataset_name,
    name=view_name,
    view_type=view_type,
    column_mapping=column_mapping,
    label_schema_uids=[ranking_schema_uid],
)
```
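Rather than copying the UID by hand from the printed output, you could select it by name. The helper below is a hypothetical sketch, not an SDK function: it accepts any objects that expose name and uid attributes, as the label schemas in the loop above do. The schema names and UIDs in the usage example are made up; substitute the name your ranking schema actually has.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def find_schema_uid(label_schemas: Iterable[Any], name: str) -> Any:
    """Return the uid of the label schema whose name matches exactly.

    Hypothetical helper: works with any objects exposing `name` and `uid`.
    Raises ValueError if zero or several schemas share that name.
    """
    matches = [s.uid for s in label_schemas if s.name == name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one schema named {name!r}, found {len(matches)}")
    return matches[0]


# Stand-ins for dataset.label_schemas; names and UIDs are illustrative only
schemas = [
    SimpleNamespace(name="Quality", uid=11),
    SimpleNamespace(name="Ranking", uid=12),
]
print(find_schema_uid(schemas, "Ranking"))  # 12
```

With a real dataset, the call would look like find_schema_uid(dataset.label_schemas, "<your ranking schema name>"). Raising on ambiguous names avoids silently attaching the wrong schema to the view.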
To view a dataset
From within a batch, select the dataset view from the dropdown.
To display the retrieved context, the context data must be valid JSON. For example:
```json
[
  {
    "TITLE": "Raise Credit Score Quickly",
    "RELEVANT_CONTEXT": "Once you've paid …"
  }
]
```
The keys (TITLE and RELEVANT_CONTEXT) are correctly enclosed in double quotes, as JSON requires.
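A quick way to catch malformed context before importing it is to parse each value with Python's standard json module. The sketch below is a hypothetical check, not a Snorkel Flow function; it assumes the context column holds a JSON array of objects keyed by TITLE and RELEVANT_CONTEXT, as in the example above.

```python
import json
from typing import List

EXPECTED_KEYS = {"TITLE", "RELEVANT_CONTEXT"}  # assumed keys, taken from the example


def context_errors(raw: str) -> List[str]:
    """Return problems found in a retrieved-context string (empty list if valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["top-level value must be a JSON array"]
    errors = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            errors.append(f"item {i} is not a JSON object")
            continue
        missing = EXPECTED_KEYS - set(item)
        if missing:
            errors.append(f"item {i} is missing keys: {sorted(missing)}")
    return errors


good = '[{"TITLE": "Raise Credit Score Quickly", "RELEVANT_CONTEXT": "Once you\'ve paid ..."}]'
bad = "[{TITLE: 'Raise Credit Score Quickly'}]"  # unquoted key and single quotes
print(context_errors(good))  # []
print(context_errors(bad))
```

The bad string fails because JSON requires double-quoted keys and strings, which is the formatting rule the example above illustrates.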