Version: 0.95

Span-based LF builders

This article describes the basic set of span-based LF builders available for text extraction applications. They include all the builders for classification applications, with added support for spans and the context around them. There are also three Span Builders made especially for extraction applications: Span Content Builder, Span Context Builder, and Span Location Builder. This section covers these three builders and the span-aware Regex Builder.

Span content builder

Label data points based on how the content of the span compares with the given string. The span can:

  • Match the string exactly (MATCHES)
  • Contain the string (CONTAINS)
  • Start with the string (STARTS)
  • End with the string (ENDS)

note

An extracted candidate is more likely to be a question if it ends with a question mark.
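
The four comparison modes can be sketched in plain Python. This is only an illustration of the logic, not the platform's implementation: the names Operator and span_content_matches are hypothetical, and case-sensitive comparison is an assumption.

```python
from enum import Enum


class Operator(Enum):
    """Comparison modes named after the options listed above."""
    MATCHES = "MATCHES"
    CONTAINS = "CONTAINS"
    STARTS = "STARTS"
    ENDS = "ENDS"


def span_content_matches(span_text: str, operator: Operator, value: str) -> bool:
    """Return True if the span text satisfies the operator for the given string.

    Hypothetical helper: comparison is assumed to be case-sensitive.
    """
    if operator is Operator.MATCHES:
        return span_text == value
    if operator is Operator.CONTAINS:
        return value in span_text
    if operator is Operator.STARTS:
        return span_text.startswith(value)
    if operator is Operator.ENDS:
        return span_text.endswith(value)
    raise ValueError(f"Unknown operator: {operator}")


# The question-mark heuristic from the note, expressed with ENDS.
is_question = span_content_matches("Who signed the agreement?", Operator.ENDS, "?")
```

A labeling function built on this check would return the chosen label when the result is True and abstain otherwise.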

Span context builder

Label data points based on the text surrounding the extracted span. This checks whether the span is to the [LEFT], [RIGHT], or [LEFT OR RIGHT] of any of the given strings, within the specified number of words.
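
A minimal sketch of this window check follows, under several assumptions that the description does not confirm: words are split on whitespace, matching is case-sensitive, and "the span is LEFT of a string" means the string occurs in the words immediately after the span. The function name span_context_matches and its parameters are hypothetical.

```python
from typing import List


def span_context_matches(
    text: str,
    span_start: int,
    span_end: int,
    keywords: List[str],
    side: str,
    window: int,
) -> bool:
    """Check whether any keyword occurs within `window` words of the span.

    side is "LEFT", "RIGHT", or "LEFT OR RIGHT" and describes where the
    span sits relative to the keyword (an assumed reading of the option).
    """
    if side not in ("LEFT", "RIGHT", "LEFT OR RIGHT"):
        raise ValueError(f"Unknown side: {side}")

    # Words immediately before and after the span (character offsets).
    # Note: a slice of [-0:] would return every word, so guard window == 0.
    before = text[:span_start].split()[-window:] if window > 0 else []
    after = text[span_end:].split()[:window] if window > 0 else []
    before_text = " ".join(before)
    after_text = " ".join(after)

    # Simple substring matching; it does not enforce word boundaries.
    span_is_left = any(kw in after_text for kw in keywords)
    span_is_right = any(kw in before_text for kw in keywords)

    if side == "LEFT":
        return span_is_left
    if side == "RIGHT":
        return span_is_right
    return span_is_left or span_is_right


text = "Dated March 3, 2020 between the parties"
start = text.index("March 3, 2020")
end = start + len("March 3, 2020")
# The span sits to the right of "Dated", within two words.
follows_dated = span_context_matches(text, start, end, ["Dated"], "RIGHT", 2)
```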

Span location builder

Label data points based on whether the text of the extracted span appears in a specific location in the document. Words are split on the whitespace character (\s), lines on the newline character (\n), sentences on selected punctuation ([.?!]\s), and paragraphs on two or more newline characters (\n\n+).

Advanced option: You can also specify the frequency of the span within that location.

note

If the text in the span appears in the first 5 paragraphs, it is probably the date associated with a contract (the execution date for loan agreements).
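
The location check can be approximated with the splitting rules described above. This sketch assumes the location is "the first N units" (as in the note) and that frequency means a minimum number of occurrences; both are assumptions, and span_in_location with its parameters is a hypothetical name.

```python
import re

# Splitting rules as described in this section.
SPLIT_PATTERNS = {
    "word": r"\s",
    "line": r"\n",
    "sentence": r"[.?!]\s",
    "paragraph": r"\n\n+",
}


def span_in_location(
    doc: str,
    span_text: str,
    unit: str,
    first_n: int,
    min_count: int = 1,
) -> bool:
    """Return True if span_text occurs at least min_count times within
    the first `first_n` units of the document.

    Occurrences are counted inside each unit separately, so a span that
    crosses a unit boundary (for example a multi-word span with
    unit="word") is not counted.
    """
    chunks = re.split(SPLIT_PATTERNS[unit], doc)
    region = chunks[:first_n]
    count = sum(chunk.count(span_text) for chunk in region)
    return count >= min_count


doc = (
    "LOAN AGREEMENT\n\n"
    "Dated March 3, 2020.\n\n"
    "The parties agree.\n\n"
    "Signed March 3, 2020."
)
# Does the candidate date appear in the first two paragraphs?
early_date = span_in_location(doc, "March 3, 2020", "paragraph", 2)
```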

Regex builder

This is similar to the Regex Builder for classification applications, except that you can use the special marker {{span}} to reference the span’s text.

note

To match all the data points where the text span follows the word Dated, use the regex: Dated.*{{span}}.
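
One way to picture the {{span}} marker is as a placeholder substituted into the pattern before matching. The sketch below assumes the span text is escaped so that it is matched literally and that the pattern is searched anywhere in the text; neither detail is confirmed by this article, and regex_matches is a hypothetical helper. It also matches any occurrence of the span's text, not the specific span position.

```python
import re


def regex_matches(pattern_template: str, text: str, span_text: str) -> bool:
    """Substitute the span text for {{span}} and search the text.

    Assumption: the span text is escaped, so characters such as "." or
    "(" in the span are treated literally rather than as regex syntax.
    """
    pattern = pattern_template.replace("{{span}}", re.escape(span_text))
    return re.search(pattern, text) is not None


# The example from the note: the span follows the word "Dated".
matched = regex_matches(
    "Dated.*{{span}}",
    "Dated as of March 3, 2020",
    "March 3, 2020",
)
```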