Using dataset views for generative AI
Dataset views power the data viewers used for generative AI annotation. Snorkel Flow offers two dataset view types:
- Single response review: To view and annotate LLM responses one by one.
- Response ranking: To rank multiple LLM responses.
To create a dataset view
You only need to create a dataset view once per dataset. Create it with snorkelflow.client.dataset_views in the SDK.
Single response view
import snorkelflow.client as sf

dataset_name = "your-dataset"
view_name = "Single Response View"
view_type = "single_llm_response_view"

# Map each view field (key) to a column in your dataset (value)
column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

sf.create_dataset_view(
    dataset=dataset_name,
    name=view_name,
    view_type=view_type,
    column_mapping=column_mapping,
)
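In column_mapping, each key is a field of the view (instruction, response, context, prompt_prefix) and each value is the name of a column in your dataset. A quick pre-flight check can catch a mistyped column name before you call sf.create_dataset_view. The sketch below is a minimal, hypothetical helper (missing_columns is not part of the Snorkel Flow SDK); it assumes you already have your dataset's column names as a list, for example from a pandas DataFrame's columns attribute.

```python
# Hypothetical pre-flight check (not part of the Snorkel Flow SDK):
# verify that every dataset column referenced in a column_mapping exists.


def missing_columns(column_mapping: dict[str, str], dataset_columns: list[str]) -> list[str]:
    """Return the mapped dataset column names that are absent from dataset_columns."""
    available = set(dataset_columns)
    # Keys are view fields; values are the dataset columns that feed them.
    return [col for col in column_mapping.values() if col not in available]


column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

# Example column list; in practice, read these from your own dataset.
dataset_columns = ["questions", "responses", "rc_title_text", "prompt_prefix"]

problems = missing_columns(column_mapping, dataset_columns)
if problems:
    raise ValueError(f"column_mapping references unknown columns: {problems}")
```

Because the same mapping shape is used for both view types, the same check applies to the ranking view.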
Ranking view
The ranking view can scale to any number of labels. Once a response is assigned a rank, it changes from white to gray, and the responses are rearranged in the user interface in ascending order of rank. Responses are grouped by their prompt and retrieved context.
import snorkelflow.client as sf
from snorkelflow.sdk import Dataset

dataset_name = "your-dataset"
view_name = "Ranking View"
view_type = "ranking_llm_responses_view"

# Map each view field (key) to a column in your dataset (value)
column_mapping = {
    "instruction": "questions",
    "response": "responses",
    "context": "rc_title_text",
    "prompt_prefix": "prompt_prefix",
}

# Print each label schema's name and UID to find the ranking schema UID
dataset = Dataset.get(dataset_name)
for label_schema in dataset.label_schemas:
    print(label_schema.name, label_schema.uid)

# Replace <ranking schema id> with the UID of your ranking label schema
sf.create_dataset_view(
    dataset=dataset_name,
    name=view_name,
    view_type=view_type,
    column_mapping=column_mapping,
    label_schema_uids=[<ranking schema id>],
)
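To make the ranking view's grouping and ordering behavior concrete, the sketch below groups response records by their (prompt, retrieved context) pair and sorts each group by ascending rank. It only models the behavior described for the view; it is not how Snorkel Flow implements it. The record fields (prompt, context, response, rank), the helper name group_and_order, and the choice to place unranked responses after ranked ones are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional, TypedDict


class ResponseRecord(TypedDict):
    prompt: str
    context: str
    response: str
    rank: Optional[int]  # None means the response has not been ranked yet


def group_and_order(records: list[ResponseRecord]) -> dict[tuple[str, str], list[ResponseRecord]]:
    """Group records by (prompt, context) and sort each group by ascending rank.

    Assumption for illustration: unranked records (rank is None) sort after ranked ones.
    """
    groups: dict[tuple[str, str], list[ResponseRecord]] = defaultdict(list)
    for record in records:
        groups[(record["prompt"], record["context"])].append(record)
    for group in groups.values():
        # Ranked records first (False sorts before True), then by rank value.
        group.sort(key=lambda r: (r["rank"] is None, r["rank"] or 0))
    return dict(groups)


records: list[ResponseRecord] = [
    {"prompt": "How do I raise my credit score?", "context": "ctx-1", "response": "A", "rank": 2},
    {"prompt": "How do I raise my credit score?", "context": "ctx-1", "response": "B", "rank": None},
    {"prompt": "How do I raise my credit score?", "context": "ctx-1", "response": "C", "rank": 1},
]

ordered = group_and_order(records)
print([r["response"] for r in ordered[("How do I raise my credit score?", "ctx-1")]])  # ['C', 'A', 'B']
```

Sorting on a (is_unranked, rank) tuple keeps the comparison well-defined even when some ranks are missing, and Python's stable sort preserves the original order among unranked responses.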
To view a dataset
From within a batch, select the dataset view from the dropdown.
To annotate retrieved context in the view, the column that holds the retrieved context must contain valid JSON. For example:
[
  {
    "TITLE": "Raise Credit Score Quickly",
    "RELEVANT_CONTEXT": "Once you've paid …"
  }
]
The keys (TITLE and RELEVANT_CONTEXT) are correctly enclosed in double quotes, as JSON requires.
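Before uploading, you can check that every value in the context column parses as JSON. The sketch below is a minimal, hypothetical helper (is_valid_context is not part of Snorkel Flow) built on Python's standard json module; it assumes each cell should hold a JSON array of objects like the example above. It also shows that a curly closing quote, a common copy-paste error, makes a value invalid.

```python
import json


def is_valid_context(cell: str) -> bool:
    """Return True if cell is a JSON array whose items are all JSON objects."""
    try:
        parsed = json.loads(cell)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list) and all(isinstance(item, dict) for item in parsed)


# Building the cell with json.dumps guarantees that keys and strings use straight double quotes.
good = json.dumps([{"TITLE": "Raise Credit Score Quickly", "RELEVANT_CONTEXT": "Once you've paid ..."}])

# A curly closing quote (\u201d) instead of a straight quote leaves the string unterminated.
bad = '[{"TITLE": "Raise Credit Score Quickly", "RELEVANT_CONTEXT": "Once you\'ve paid ...\u201d}]'

print(is_valid_context(good))  # True
print(is_valid_context(bad))   # False
```

Generating the column with json.dumps, rather than assembling strings by hand, avoids most quoting mistakes in the first place.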