Span-based LF builders
This article describes the basic set of span-based LF builders available for text extraction applications. They include all the builders for classification applications, with extra support for spans and the context around them. There are also three Span Builders made especially for extraction applications: Span Content Builder, Span Context Builder, and Span Location Builder. This section covers these three builders and the span-aware Regex Builder.
Span content builder
Label data points based on the content of the span, checking whether it does one of the following with the given string:
- Matches exactly (MATCHES)
- Contains (CONTAINS)
- Starts with (STARTS)
- Ends with (ENDS)
For example, an extracted candidate is more likely to be a question if it ends with a question mark.
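The four operators can be pictured as plain string checks on the span text. The following is a minimal sketch of that logic, not the product's API: the function name `span_content_vote` and the `OPERATORS` table are hypothetical, and the comparison is assumed to be case-sensitive.

```python
from typing import Callable, Dict

# Hypothetical operator table; the keys mirror the options listed above.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "MATCHES": lambda span, value: span == value,
    "CONTAINS": lambda span, value: value in span,
    "STARTS": lambda span, value: span.startswith(value),
    "ENDS": lambda span, value: span.endswith(value),
}


def span_content_vote(span_text: str, operator: str, value: str) -> bool:
    """Return True if the span text satisfies the operator for the given string."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    return OPERATORS[operator](span_text, value)


# The question-mark example: vote for "question" when the span ends with "?".
is_question = span_content_vote("Is the premium refundable?", "ENDS", "?")
```

A real builder would map a True result to the chosen label and abstain otherwise; that mapping is left out here.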
Span context builder
Label data points based on the text surrounding the extracted spans. This checks whether the span is to the [LEFT], [RIGHT], or [LEFT OR RIGHT] of any of the given strings, within the specified number of words.
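One way to read this check is as a word window on either side of the span. The sketch below is an illustration under stated assumptions, not the product's implementation: the function name `span_context_vote` is hypothetical, words are split on whitespace, matching is a case-sensitive substring test, and "the span is LEFT of a string" is taken to mean that the string occurs in the words following the span.

```python
from typing import List


def span_context_vote(
    text: str,
    span_start: int,
    span_end: int,
    keywords: List[str],
    direction: str,
    window: int,
) -> bool:
    """Check whether the span sits next to any keyword within `window` words.

    span_start/span_end are character offsets of the span in `text`.
    """
    # Words immediately before and after the span, limited to the window size.
    before_words = text[:span_start].split()
    before = " ".join(before_words[-window:]) if window > 0 else ""
    after = " ".join(text[span_end:].split()[:window])

    # Assumption: the span is LEFT of a keyword when the keyword follows it.
    span_is_left = any(k in after for k in keywords)
    span_is_right = any(k in before for k in keywords)

    if direction == "LEFT":
        return span_is_left
    if direction == "RIGHT":
        return span_is_right
    if direction == "LEFT OR RIGHT":
        return span_is_left or span_is_right
    raise ValueError(f"unknown direction: {direction}")


text = "This agreement is dated as of March 3, 2020 by the parties."
start = text.index("March 3, 2020")
end = start + len("March 3, 2020")
near_dated = span_context_vote(text, start, end, ["dated"], "RIGHT", 3)
```

Here the span is to the right of "dated" with two words in between, so a window of 3 reaches it while a window of 2 does not.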
Span location builder
Label data points based on whether the text of the extracted span appears in a specified location in the document. Words are split on a whitespace character (\s), lines on the newline character (\n), sentences on selected punctuation followed by whitespace ([.?!]\s), and paragraphs on two or more newline characters (\n\n+).
Advanced option: you can also specify the frequency of the span within that location.
For example, if the text in the span appears in the first 5 paragraphs, it is probably the date associated with a contract (the execution date for loan agreements).
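The splitting rules above translate directly into regular expressions. The following sketch shows how a "first N units" check could work; it is an assumption-laden illustration rather than the product's implementation. The names `SPLITTERS`, `span_location_vote`, `first_n`, and `min_count` are hypothetical, only a "first N" location is modeled, and `min_count` is one possible reading of the frequency option.

```python
import re

# Split patterns taken from the description above.
SPLITTERS = {
    "word": r"\s",
    "line": r"\n",
    "sentence": r"[.?!]\s",
    "paragraph": r"\n\n+",
}


def span_location_vote(
    doc: str, span_text: str, unit: str, first_n: int, min_count: int = 1
) -> bool:
    """Return True if span_text occurs at least min_count times
    within the first `first_n` units (words, lines, sentences, paragraphs)."""
    units = re.split(SPLITTERS[unit], doc)[:first_n]
    # Count occurrences inside each unit; a span crossing a unit
    # boundary is not counted in this simplified version.
    count = sum(u.count(span_text) for u in units)
    return count >= min_count


doc = (
    "LOAN AGREEMENT\n\n"
    "Dated as of March 3, 2020.\n\n"
    "Section 1. Definitions.\n\n"
    "Signed on April 1, 2021."
)
in_opening = span_location_vote(doc, "March 3, 2020", "paragraph", 2)
```

In this sample document the first date falls in the second paragraph, so it passes a "first 2 paragraphs" check, while the date in the final paragraph does not.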
Regex builder
This is similar to the Regex Builder for classification applications, except that you can use the special marker {{span}} to reference the span’s text.
To match all the data points where the text span follows the word Dated, use the regex: Dated.*{{span}}
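To see how a {{span}} marker could behave, the sketch below substitutes the escaped span text into the pattern before matching. This is an assumption about the mechanism, not a description of the product's internals: the function name `span_regex_vote` is hypothetical, and the substitution uses the span's text rather than its position, so any occurrence of the same text can satisfy the pattern.

```python
import re


def span_regex_vote(pattern: str, text: str, span_text: str) -> bool:
    """Replace the {{span}} marker with the literal span text, then search."""
    # re.escape keeps characters such as "." or "$" in the span from
    # being interpreted as regex syntax.
    compiled = pattern.replace("{{span}}", re.escape(span_text))
    return re.search(compiled, text) is not None


follows_dated = span_regex_vote(
    r"Dated.*{{span}}", "Dated as of March 3, 2020", "March 3, 2020"
)
```

Note that `.` does not match newlines by default in Python, so in this sketch Dated and the span must be on the same line.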