External resource LF builders
This article describes the set of external resource LF builders that are available for classification and information extraction applications.
Dictionary builder
Label data points that contain all, any, or none of the words from a dictionary, provided through a TXT or CSV file. If you have a CSV that contains words for multiple word classes, you can use the Dictionary Generator.
We label data points where keywords in an external dictionary (e.g. a text file with a newline-separated list of loan terms) appear in the text.
The TXT or CSV file must be stored in S3.
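The all / any / none matching described above can be sketched in plain Python. This is a minimal illustration, not the builder's actual implementation: the names `load_dictionary`, `dictionary_lf`, and `ABSTAIN` are hypothetical, matching is assumed to be case-insensitive on word boundaries, and the dictionary contents are passed in as a string rather than read from S3.

```python
import re

ABSTAIN = -1  # hypothetical value meaning "this LF does not vote"


def load_dictionary(contents):
    """Parse newline-separated dictionary text into a set of lowercase terms."""
    return {line.strip().lower() for line in contents.splitlines() if line.strip()}


def contains(term, text):
    """True if `term` appears in `text` on word boundaries (text already lowercased)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def dictionary_lf(doc, terms, label, mode="any"):
    """Return `label` if `doc` contains all, any, or none of `terms`; else ABSTAIN."""
    text = doc.lower()
    found = [contains(term, text) for term in terms]
    if mode == "any":
        matched = any(found)
    elif mode == "all":
        matched = all(found)
    elif mode == "none":
        matched = not any(found)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return label if matched else ABSTAIN


# Example: a newline-separated list of loan terms, as in the text above.
loan_terms = load_dictionary("principal\ninterest rate\nAPR\n")
print(dictionary_lf("The interest rate is fixed.", loan_terms, label=1, mode="any"))
```

Matching on word boundaries rather than raw substrings is a design choice made here so that a short term such as "apr" does not fire inside an unrelated longer word.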
External model builder
Label data points using the predictions from a machine-learning model, such as sentiment analysis models from TextBlob and NLTK.
For the email-spam application, we find that many spam emails contain subjects that convey negative feelings. We label data points where the subject has a negative sentiment score as predicted by the external text analysis model.
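The negative-sentiment rule for email subjects might look like the sketch below. Everything named here is an assumption for illustration: `negative_subject_lf`, `SPAM`, `ABSTAIN`, and the threshold of 0.0 are hypothetical, and `toy_sentiment` is a small stand-in scorer so the example runs without extra dependencies. In practice a real model would be passed as `score_fn`, for example a wrapper around TextBlob's `TextBlob(text).sentiment.polarity`, which also returns a score between -1 and 1.

```python
ABSTAIN = -1  # hypothetical value meaning "this LF does not vote"
SPAM = 1      # hypothetical label id for the spam class

# Tiny illustrative word lists; a real sentiment model replaces these.
NEGATIVE = {"urgent", "warning", "suspended", "failed", "overdue"}
POSITIVE = {"thanks", "welcome", "congratulations", "great"}


def toy_sentiment(text):
    """Stand-in scorer: (positive words - negative words) / total words."""
    words = [w.strip(".,:;!?").lower() for w in text.split()]
    if not words:
        return 0.0
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score / len(words)


def negative_subject_lf(subject, score_fn=toy_sentiment, threshold=0.0):
    """Label the data point SPAM when the subject's sentiment is below threshold."""
    return SPAM if score_fn(subject) < threshold else ABSTAIN


print(negative_subject_lf("Warning: account suspended"))
print(negative_subject_lf("Thanks for the great talk"))
```

Keeping the scorer as a parameter means the labeling rule stays the same when the underlying model is swapped.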
Crowdworker builder
Label data points using crowdworker annotations, provided as a newline-separated list of UIDs, each with the doc:: prefix.
For example, say we want to annotate rows with UID 1, 2 and 3 as label X. We will first create a file, in the format shown below, and upload it to S3.
doc::1
doc::2
doc::3
The crowdworker builder will then take in a path to the file and label all the examples in the file as X.
The current form supports file input for a single label at a time.
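To make the file format concrete, here is a minimal sketch of reading such a file and assigning one label to every listed UID. The function names `parse_uid_file` and `crowdworker_labels` are hypothetical, and the file contents are passed as a string instead of being fetched from S3; the builder's real behavior on malformed lines is not documented here, so raising an error is an assumption.

```python
def parse_uid_file(contents, prefix="doc::"):
    """Return the UIDs listed one per line, checking each has the expected prefix."""
    uids = []
    for line_no, line in enumerate(contents.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if not line.startswith(prefix):
            raise ValueError(
                f"line {line_no}: expected '{prefix}' prefix, got {line!r}"
            )
        uids.append(line)
    return uids


def crowdworker_labels(uids, label):
    """Assign the same label to every UID, mirroring one file per label."""
    return {uid: label for uid in uids}


contents = "doc::1\ndoc::2\ndoc::3\n"
print(crowdworker_labels(parse_uid_file(contents), "X"))
```

Because each file carries a single label, annotating a second label would mean preparing a second file and repeating the call.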