Walkthrough for annotators
This page walks through the process of manually annotating documents in Annotation Studio. This walkthrough is designed for users with the Annotator role, who need to access Snorkel Flow to annotate the documents that are assigned to them.
Access Annotation Studio
Typically, when you log in to Snorkel Flow, you land on the Application Overview page. If you land somewhere else, or you need to get back to the application screen, click Applications on the left-side menu. From there, select the application whose documents you will annotate. If you do not know which application to use, ask someone on your team.
Access your batches
Once you are in an application, the next step is to access your batches so that you can start annotating. A batch is a collection of documents that are assigned to you to manually annotate. A batch is typically a subset of the total number of documents. With Snorkel, the number of documents that you have to manually annotate is significantly smaller than the full dataset!
To access your batches, click Batches on the left-side menu.
The Batches page shows a list of all batches that have been created.
The following information about each batch is available:
- Batch: The name of the batch.
- Status: Specifies whether annotations on the batch have not been started, are in progress, or have been completed.
- Annotators: The list of annotators that have been assigned to annotate the batch.
- Annotated: The number of documents in the batch that have been annotated.
- Size: The total number of documents that are in the batch.
- Created: The date that the batch was created.
To view your batches:
- Find the batch that you want to annotate. There are two ways to find the batches that are assigned to you:
  - Look for your name under the Annotators column.
  - Click the Filters icon at the top right corner of your screen. Here, you can filter to only the batches that are assigned to you and that have not yet been completed (where status is either In progress or Not started).
- Click the button with the batch name under the Batch column.
Now, you can start annotating your documents!
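If it helps to picture what the filter on the Batches page is doing, here is a minimal sketch in plain Python. It is not Snorkel Flow code: the `Batch` record, the `my_open_batches` helper, and the sample batches are all hypothetical and only mirror the columns described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Batch:
    """Hypothetical record mirroring the columns on the Batches page."""
    name: str
    status: str            # "Not started", "In progress", or "Complete"
    annotators: list[str]  # annotators assigned to the batch
    annotated: int         # documents annotated so far
    size: int              # total documents in the batch
    created: date


def my_open_batches(batches: list[Batch], me: str) -> list[Batch]:
    """Keep batches assigned to `me` that have not yet been completed."""
    open_statuses = {"Not started", "In progress"}
    return [b for b in batches if me in b.annotators and b.status in open_statuses]


# Made-up sample data, for illustration only.
batches = [
    Batch("batch-1", "Complete", ["ana", "ben"], 50, 50, date(2024, 1, 8)),
    Batch("batch-2", "In progress", ["ana"], 12, 40, date(2024, 1, 9)),
    Batch("batch-3", "Not started", ["ben"], 0, 40, date(2024, 1, 9)),
]

print([b.name for b in my_open_batches(batches, "ana")])  # ['batch-2']
```

The filter combines two conditions, your name in the Annotators column and a status that is not yet complete, which is why a finished batch drops out even if you worked on it.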
Annotate your documents
Once you enter a batch, your screen will look similar to the image below:
The main canvas shows the document text. For classification and extraction applications, the right-side pane shows all classes that you can assign to the document.
Here is some information about the various buttons that you can see on your screen:
- Click Data view (or Document view for extraction applications) to change the appearance of the documents in the main canvas. The views that are available differ based on the application type.
- Click the Filter icon to filter the documents that are shown to you. For example, you can use the filter to just show the documents that have comments on them.
- Click the gear icon to adjust various settings. We recommend that you adjust a few settings before you get started annotating.
- Click the arrow icons to page through the documents.
- Click the tag icon to add tags to the document. For example, you can tag documents that you find confusing and want to bring up for discussion.
- Click the comment icon to add comments to the document. For example, you can write a comment to explain your reasoning behind selecting a particular label.
Adjust settings
After accessing your documents, we recommend that you first adjust a few settings that make annotating quicker and more comfortable. To change your settings, click the gear icon at the top-right corner of your screen. We recommend changing the following settings:
- Select display columns: This setting lets you select and display only the data columns that are necessary for marking your annotations (e.g., the text column). This removes clutter from the main canvas when you are reviewing documents.
- Auto-advance on label change: With this setting enabled, after you label a document, you automatically go to the next document. This prevents you from having to manually click the arrows every time to view new, unlabeled documents. This setting is only available for single-label and text extraction applications.
- Keyboard shortcuts: With this setting enabled, keys are assigned to each label class, which makes it quicker to assign a label. In the example below, if you want to label a document "employment," then you just have to press 0 on your keyboard.
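To see how keyboard shortcuts and auto-advance work together, consider this rough sketch in plain Python. It is an illustration, not how Annotation Studio is implemented: the `press` helper is hypothetical, and the assumption that keys are handed out in class order starting at 0 is taken only from the "employment" example.

```python
LABELS = ["employment", "loan", "services", "stock"]

# Assumption: keys are assigned in class order, starting at "0".
SHORTCUTS = {str(i): label for i, label in enumerate(LABELS)}


def press(key: str, position: int, labels: dict[int, str], auto_advance: bool = True) -> int:
    """Label the document at `position` and return the position to show next."""
    if key not in SHORTCUTS:
        return position  # unassigned key: nothing changes
    labels[position] = SHORTCUTS[key]
    return position + 1 if auto_advance else position


labels: dict[int, str] = {}
position = press("0", 0, labels)
print(labels, position)  # {0: 'employment'} 1
```

With auto-advance turned off, the same key press would record the label but leave you on the same document.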
Here are descriptions of the rest of the settings that are available when you click the gear icon:
- Mark spaces (·): Replaces all spaces in your document with a dot (·).
- Hyperlink URLs: Hyperlinks any URL in your document. This makes it easier to see links in your documents and allows you to click the hyperlink to go directly to the website.
- Right-to-left (RTL) text: Right justifies your document text. By default, the document text is left justified.
- Go to first unlabeled data: Brings you to the first document in the batch that has not yet been annotated. This option is helpful any time you re-enter a batch to continue annotating.
- Export Studio dataset: Exports your dataset to a CSV file.
How to annotate
The following sections walk through how to annotate documents for the four different task types:
Single-label classification applications
In single-label classification applications, your goal is to assign a single class to each document. For example, you might assign each banking contract document to one of the following classes: "employment," "loan," "services," or "stock."
On the right-side pane, you'll see all possible classes that you can assign to your document. To label a document, click the class in the right-side pane. Then click the arrow button to move on to the next document. If you adjusted the recommended settings, then you can use the keyboard shortcut to assign a class, and you will automatically be brought to the next document.
Multi-label classification applications
In multi-label classification applications, individual documents can have multiple label values. For example, let's say you are looking at movie review documents. You can label the movie as "Short Film," "Black and White," "Japanese Movies," or "World Cinema." Given these labels, you can see that a single movie can fall into multiple categories. In this case, for a given document, you can label each possible class as present or absent, or you can abstain from voting on it.
On the right-side pane, you'll see all possible classes. For each class, you can click to apply one of three labels:
- Abstain.
- Present.
- Absent.
By default, all classes are initially labeled as Abstain. If enabled for your application, you can set the default label for each class to Present, Absent, or Abstain.
You can also sort the classes, which is particularly helpful for applications with a large number of classes. You can sort classes marked as Present first, sort classes marked as Absent first, or sort classes in alphanumerical order.
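One way to think about a multi-label annotation is as one vote per class. The sketch below is plain Python under that assumption, not Snorkel Flow code. The names `Vote`, `new_annotation`, and `sort_classes` are hypothetical, and lowercasing the class names is a simplified stand-in for alphanumerical order.

```python
from enum import Enum


class Vote(Enum):
    ABSTAIN = "Abstain"
    PRESENT = "Present"
    ABSENT = "Absent"


CLASSES = ["Short Film", "Black and White", "Japanese Movies", "World Cinema"]


def new_annotation(classes, default=Vote.ABSTAIN):
    """Start every class at the default vote (Abstain unless configured otherwise)."""
    return {name: default for name in classes}


def sort_classes(annotation, first=None):
    """Alphanumerical order, optionally with classes voted `first` moved to the top."""
    return sorted(annotation, key=lambda name: (annotation[name] is not first, name.lower()))


doc = new_annotation(CLASSES)
doc["Japanese Movies"] = Vote.PRESENT
doc["World Cinema"] = Vote.PRESENT
doc["Short Film"] = Vote.ABSENT

print(sort_classes(doc, first=Vote.PRESENT))
# ['Japanese Movies', 'World Cinema', 'Black and White', 'Short Film']
```

"Black and White" was never touched, so it keeps the default Abstain vote, and it sorts after the classes marked Present.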
Text extraction applications
In text extraction applications, the goal is to extract key information from a document. For example, let's say that we want to extract all dates from a document. In this case, a date is considered a span. Your goal in this example is to review all highlighted spans in a document and identify whether the highlighted spans are in fact dates. You can label each highlighted span as "NEGATIVE," "POSITIVE," or "UNKNOWN" in the right-side pane. In this example, "POSITIVE" means that the highlighted span is a date, and "NEGATIVE" means that the highlighted span is not a date.
If you adjusted the recommended settings, then you can use the keyboard shortcuts to label a span, and you are then automatically brought to the next highlighted span in the document.
By default, you will be viewing the spans in Document view. We recommend annotating in either Document view or Span view:
- Document view shows the entire document on the main canvas. You can use the up and down arrows on your keyboard to navigate between spans of text to annotate. This view provides you with all context surrounding a span.
- Span view shows a single span per page. With this view, you don't have all the context surrounding a span. However, the simplified view can enable you to more quickly annotate things like dates that are easy to identify with less context.
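To make the idea of reviewing highlighted spans concrete, here is a small sketch of the date example in plain Python. Everything in it is an assumption made for illustration: the `Span` record, the sample sentence, and the regular expression that stands in for whatever produces the highlighted spans in your application.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    start: int                   # character offset where the span begins
    end: int                     # character offset just past the span
    label: Optional[str] = None  # "POSITIVE", "NEGATIVE", "UNKNOWN", or not yet reviewed


text = "Signed on 2021-03-04 under contract 1999-88-77."

# Hypothetical candidate generator: anything shaped like NNNN-NN-NN.
spans = [Span(m.start(), m.end()) for m in re.finditer(r"\d{4}-\d{2}-\d{2}", text)]

# The annotator's decisions: the first candidate is a date, the second is not.
spans[0].label = "POSITIVE"
spans[1].label = "NEGATIVE"

for s in spans:
    print(text[s.start:s.end], s.label)
# 2021-03-04 POSITIVE
# 1999-88-77 NEGATIVE
```

The second candidate looks like a date but is a contract number, which is the kind of case where the surrounding context in Document view helps.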
Sequence tagging applications
In sequence tagging applications, your goal is to highlight and label spans throughout a document. Spans are key pieces of information that you want to extract from a document. For example, let's say we want to identify all mentions of company names in a document. In this case, a company name is considered a span. Your goal is then to highlight and label all company names that you find while reading through the document.
To label spans in Annotation Studio:
- Highlight a span.
- Select the label that you want for that span in the Annotation modal.
If you don't find any spans to label in the document, you can indicate this by clicking the Label entire document as OTHER button above the document text.
At any point, you can reset and clear all spans that you have labeled in the document by clicking the Remove all labels from document button above the document text.
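The three actions above (labeling a span, labeling the entire document as OTHER, and removing all labels) can be pictured with this plain-Python sketch. It is not Snorkel Flow code: the `TaggedDocument` class, the "COMPANY" label name, and the choice to store OTHER as one span covering the whole text are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedDocument:
    text: str
    spans: list[tuple[int, int, str]] = field(default_factory=list)  # (start, end, label)

    def label_span(self, start: int, end: int, label: str) -> None:
        """Highlight text[start:end] and assign it a label."""
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span must lie inside the document")
        self.spans.append((start, end, label))

    def label_entire_document_as_other(self) -> None:
        """Record that the document contains no spans of interest."""
        # Assumption: OTHER is stored as a single span over the whole text.
        self.spans = [(0, len(self.text), "OTHER")]

    def remove_all_labels(self) -> None:
        """Clear every span labeled so far."""
        self.spans.clear()


doc = TaggedDocument("Acme Corp acquired Globex in 2020.")
doc.label_span(0, 9, "COMPANY")
doc.label_span(19, 25, "COMPANY")
print([doc.text[s:e] for s, e, _ in doc.spans])  # ['Acme Corp', 'Globex']
```

Storing character offsets rather than the highlighted text keeps two mentions of the same company distinct.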
Check progress
While you are annotating documents, you can see your progress percentage tick up in the right-hand corner of your screen.
Once you are finished, your progress bar will be labeled 100%, and the status will be updated to Complete on the Batches page.
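The progress percentage relates to the Annotated and Size values on the Batches page. As a rough sketch in plain Python, with the rounding and the status rule both being assumptions rather than documented behavior:

```python
def progress_percent(annotated: int, size: int) -> int:
    """Share of the batch that has been annotated, as a whole percentage."""
    if size <= 0:
        raise ValueError("size must be a positive number of documents")
    return 100 * annotated // size  # assumption: rounds down


def batch_status(annotated: int, size: int) -> str:
    # Assumption: the status follows directly from the annotated count.
    if annotated == 0:
        return "Not started"
    return "Complete" if annotated >= size else "In progress"


print(progress_percent(12, 40), batch_status(12, 40))  # 30 In progress
print(progress_percent(40, 40), batch_status(40, 40))  # 100 Complete
```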
Review comments and tags
Occasionally a reviewer will leave comments on documents. For example, if you left a comment asking a clarifying question, a reviewer may respond to your comment. To see all documents with comments:
- Click the filter icon.
- Click Comments.
- Under User, select the person that you want to see comments from. Alternatively, you can select Any user.
- Click the checkmark to save the filter.
Now, just the documents with comments are shown, making it easier for you to review any comments.
You can follow similar steps to filter documents based on tags:
- Click the filter icon.
- Click Tags.
- Under Tag, select a tag.
- Under Operator, select is if you want to see all documents with that tag, or is not if you want all documents with that tag removed from view.
- Click the checkmark to save the filter.
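The difference between the is and is not operators can be sketched in a few lines of plain Python. The `filter_by_tag` helper, the document IDs, and the tag names are hypothetical and exist only for this illustration.

```python
def filter_by_tag(documents: dict[str, set[str]], tag: str, operator: str = "is") -> list[str]:
    """Return the IDs of documents that pass a tag filter.

    `documents` maps a hypothetical document ID to the set of tags on it.
    """
    if operator == "is":
        return [doc_id for doc_id, tags in documents.items() if tag in tags]
    if operator == "is not":
        return [doc_id for doc_id, tags in documents.items() if tag not in tags]
    raise ValueError(f"unsupported operator: {operator!r}")


# Made-up sample data, for illustration only.
documents = {
    "doc-1": {"confusing"},
    "doc-2": set(),
    "doc-3": {"confusing", "discuss"},
}

print(filter_by_tag(documents, "confusing", "is"))      # ['doc-1', 'doc-3']
print(filter_by_tag(documents, "confusing", "is not"))  # ['doc-2']
```

Note that is not keeps untagged documents as well, because they do not carry the selected tag.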