Version: 0.95

External resource LF builders

This article describes the external resource labeling function (LF) builders available for classification and information extraction applications.

Dictionary builder

Label data points that contain all, any, or none of the words from a dictionary provided as a TXT or CSV file. If you have a CSV that contains words for multiple word classes, you can use the Dictionary Generator.
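The three matching modes can be illustrated in plain Python. This is a minimal sketch, not the builder's implementation: `tokenize` and `dictionary_vote` are hypothetical names, and the sketch assumes whole-word, case-insensitive matching of single-word dictionary entries (multi-word phrases would need separate handling).

```python
import re


def tokenize(text: str) -> set:
    # Lowercase and split on non-word characters so "Mortgage," matches "mortgage".
    return set(re.findall(r"\w+", text.lower()))


def dictionary_vote(text: str, words: set, mode: str = "any") -> bool:
    """Return True if `text` should receive the label under `mode`.

    mode="all":  every dictionary word appears in the text
    mode="any":  at least one dictionary word appears
    mode="none": no dictionary word appears
    """
    vocab = {w.lower() for w in words}
    hits = vocab & tokenize(text)
    if mode == "all":
        return hits == vocab
    if mode == "any":
        return bool(hits)
    if mode == "none":
        return not hits
    raise ValueError(f"unknown mode: {mode!r}")


loan_terms = {"mortgage", "interest", "principal"}
print(dictionary_vote("The mortgage interest rate rose.", loan_terms, "any"))  # True
print(dictionary_vote("The mortgage interest rate rose.", loan_terms, "all"))  # False
print(dictionary_vote("Great weather today.", loan_terms, "none"))  # True
```

"all" is the strictest mode and usually yields the fewest labeled data points, while "none" is useful for voting against a class when its vocabulary is absent.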

note

We label data points where keywords from an external dictionary (e.g., a text file with a newline-separated list of loan terms) appear in the text.
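As a rough illustration of preparing such a dictionary, the sketch below parses file contents into a set of terms and checks a text against them. The names `load_dictionary` and `contains_any` are hypothetical, and the CSV layout (terms in the first column, no header row) is an assumption rather than a documented format.

```python
import csv
import io
import re


def load_dictionary(content: str, file_type: str = "txt") -> set:
    """Parse dictionary file contents into a set of lowercase terms.

    Assumes TXT files hold one term per line and CSV files hold the
    term in the first column (hypothetical layout).
    """
    if file_type == "txt":
        lines = content.splitlines()
    elif file_type == "csv":
        lines = [row[0] for row in csv.reader(io.StringIO(content)) if row]
    else:
        raise ValueError(f"unsupported file type: {file_type!r}")
    # Drop blank lines and surrounding whitespace.
    return {line.strip().lower() for line in lines if line.strip()}


def contains_any(text: str, terms: set) -> bool:
    # Whole-word, case-insensitive check for single-word terms.
    return bool(terms & set(re.findall(r"\w+", text.lower())))


loan_terms_txt = "mortgage\nInterest\n\nprincipal\n"
loan_terms_csv = "mortgage,home loan\nInterest,rate charged\nprincipal,amount borrowed\n"

terms = load_dictionary(loan_terms_txt)
print(sorted(terms))  # ['interest', 'mortgage', 'principal']
print(contains_any("Your mortgage payment is due.", terms))  # True
```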

tip

The TXT or CSV file must be stored in S3.


External model builder

Label data points using the predictions from a machine-learning model, such as sentiment analysis models from TextBlob and NLTK.

note

For the email-spam application, we find that many spam emails have subjects that convey negative sentiment. We label data points where the subject has a negative sentiment score as predicted by the external text analysis model.
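The spam example can be sketched as a function that wraps any sentiment scorer. This is a hedged illustration: `make_sentiment_lf`, the `SPAM`/`ABSTAIN` encoding, and the zero threshold are assumptions. `toy_polarity` is a hypothetical stand-in so the example runs offline; with TextBlob installed, you could pass `lambda s: TextBlob(s).sentiment.polarity`, which returns a float in [-1.0, 1.0].

```python
from typing import Callable

SPAM = 1
ABSTAIN = -1  # hypothetical label encoding, not a documented convention


def make_sentiment_lf(score: Callable[[str], float], threshold: float = 0.0):
    """Build a labeling function that votes SPAM when the subject's
    sentiment score falls below `threshold`, and abstains otherwise."""

    def lf(subject: str) -> int:
        return SPAM if score(subject) < threshold else ABSTAIN

    return lf


# Toy stand-in for an external sentiment model (hypothetical word list).
NEGATIVE_WORDS = {"urgent", "suspended", "lost", "warning", "final"}


def toy_polarity(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    # Negative fraction of words that appear in the negative word list.
    return -sum(w.strip("!.,:") in NEGATIVE_WORDS for w in words) / len(words)


spam_lf = make_sentiment_lf(toy_polarity)
print(spam_lf("URGENT: your account is suspended!"))  # 1
print(spam_lf("Lunch on Friday?"))  # -1
```

Abstaining on neutral or positive subjects, rather than voting "not spam", keeps the function from asserting more than the sentiment signal supports.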

Crowdworker builder

Label data points using crowdworker annotations, provided as a newline-separated list of UIDs, each with the doc:: prefix.

For example, say we want to annotate the rows with UIDs 1, 2, and 3 as label X. We first create a file in the format shown below and upload it to S3.

doc::1
doc::2
doc::3

The crowdworker builder then takes the path to this file and labels all the examples listed in it as X.
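A rough sketch of what reading such a file involves is shown below. `parse_uid_file` and `crowdworker_labels` are hypothetical names; the sketch only assumes the format described above (one UID per line, each with the doc:: prefix) and returns UIDs as strings.

```python
PREFIX = "doc::"
EXAMPLE_FILE = "doc::1\ndoc::2\ndoc::3\n"


def parse_uid_file(content: str) -> list:
    """Return the UIDs listed in a doc::-prefixed, newline-separated file."""
    uids = []
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        if not line.startswith(PREFIX):
            raise ValueError(f"missing {PREFIX} prefix: {line!r}")
        uids.append(line[len(PREFIX):])
    return uids


def crowdworker_labels(content: str, label: str) -> dict:
    # Every UID in the file receives the same label.
    return {uid: label for uid in parse_uid_file(content)}


labels = crowdworker_labels(EXAMPLE_FILE, "X")
print(labels)  # {'1': 'X', '2': 'X', '3': 'X'}
```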

note

The current form supports file input for a single label at a time.
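If your annotations cover several labels, one way to work within this limit is to split them into one file per label and submit each file separately. The sketch below assumes annotations arrive as (UID, label) pairs; `split_by_label` is a hypothetical helper, and writing the bodies to files and uploading them to S3 is left out.

```python
from collections import defaultdict


def split_by_label(annotations):
    """Group (uid, label) pairs into one doc::-prefixed file body per label."""
    groups = defaultdict(list)
    for uid, label in annotations:
        groups[label].append(f"doc::{uid}")
    return {label: "\n".join(lines) + "\n" for label, lines in groups.items()}


files = split_by_label([(1, "X"), (2, "Y"), (3, "X")])
for label, body in files.items():
    print(label, body.split())
# X ['doc::1', 'doc::3']
# Y ['doc::2']
```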