Version: 25.5

snorkelai.sdk.client.gts.align_external_ground_truth

snorkelai.sdk.client.gts.align_external_ground_truth(node_uid, x_uids, labels, user_format=False, scheduler=None)

Note: This function is only necessary when you pass a non-dataframe format to sf.add_ground_truth in sequence tagging applications. Starting with version 0.95, if you pass a dataframe to sf.add_ground_truth, the labels are aligned automatically.

(Sequence Tagging Only) This function changes external ground truth spans to compensate for the offsets caused by a text preprocessor.

Text preprocessors sometimes remove characters, which misaligns externally collected ground truth spans with the preprocessed text. For example, the default AsciiCharFilter preprocessor removes non-ASCII characters, which shifts some spans to the left, while external annotations still carry the original offsets. Use this function for applications whose text contains non-ASCII characters.
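To make the shift concrete, here is a minimal, self-contained sketch of how removing non-ASCII characters moves span offsets. It is an illustration only and not the SDK's implementation: strip_non_ascii and shift_spans are hypothetical helpers, and the sketch assumes spans are [char_start, char_end, label] with an exclusive end offset.

```python
from typing import List

# Hypothetical helpers for illustration; they are not part of the Snorkel SDK.
# Assumption: a span is [char_start, char_end, label] with an exclusive end.


def strip_non_ascii(text: str) -> str:
    """Drop every non-ASCII character, roughly what an ASCII filter would do."""
    return "".join(ch for ch in text if ord(ch) < 128)


def shift_spans(text: str, spans: List[list]) -> List[list]:
    """Map spans on the original text to offsets on the ASCII-only text."""
    # removed_before[i] is the number of non-ASCII characters in text[:i].
    removed_before = [0]
    for ch in text:
        removed_before.append(removed_before[-1] + int(ord(ch) >= 128))
    return [
        [start - removed_before[start], end - removed_before[end], label]
        for start, end, label in spans
    ]


# Two non-ASCII characters (e-acute and u-umlaut) precede the annotated span.
original = "Caf\u00e9 Z\u00fcrich hired Acme Corp"
spans = [[18, 27, "COMPANY"]]  # "Acme Corp" in the original text

filtered = strip_non_ascii(original)
aligned = shift_spans(original, spans)

# The original offsets no longer select "Acme Corp" in the filtered text,
# while the shifted offsets do.
print(filtered[18:27])
print(filtered[aligned[0][0]:aligned[0][1]])
```

Each offset moves left by the number of characters removed before it, which is why spans near the start of a document may be unaffected while later spans drift.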

Examples

An example for a sequence tagging application that uses AsciiCharFilter as the preprocessor:

node_uid = sf.get_model_node(APP_NAME)
x_uids = ["doc::0", "doc::1"] # all x_uids with labels
labels = [
    [[0, 20, "COMPANY"]], # labels for doc::0
    [[10, 15, "COMPANY"], [20, 25, "COMPANY"]], # labels for doc::1
    ...
]

aligned_labels = sf.align_external_ground_truth(node_uid, x_uids, labels, user_format=True)

Then, use sf.add_ground_truth to add aligned_labels:

sf.add_ground_truth(node_uid, x_uids, aligned_labels, user_format=True)

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| node_uid | int | | The UID of the model node. |
| x_uids | List[str] | | UIDs of data points. Can be a list or a 1D numpy array of strings. |
| labels | List[Any] | | Label values. List or numpy array of labels. Must be the same length as x_uids. If user_format is True, check that labels have not been JSON serialized. |
| user_format | bool | False | True if labels are provided in user format, False otherwise. |
| scheduler | Optional[str] | None | Dask scheduler (threads, client, or group) to use. |
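The constraints on x_uids and labels can be checked before calling the function. The sketch below is one possible pre-flight check, not something the SDK provides: check_alignment_inputs is a hypothetical helper, and treating a str or bytes entry as "JSON serialized" is an assumption about what that mistake typically looks like.

```python
from typing import Any, Sequence


def check_alignment_inputs(
    x_uids: Sequence[str], labels: Sequence[Any], user_format: bool = False
) -> None:
    """Hypothetical pre-flight check; not part of the Snorkel SDK."""
    # labels must be the same length as x_uids.
    if len(x_uids) != len(labels):
        raise ValueError(
            f"Got {len(x_uids)} x_uids but {len(labels)} label entries."
        )
    if user_format:
        for uid, doc_labels in zip(x_uids, labels):
            # Assumption: a JSON-serialized label shows up as a string
            # where a list of [start, end, label] spans is expected.
            if isinstance(doc_labels, (str, bytes)):
                raise TypeError(
                    f"Labels for {uid} look JSON serialized; pass the decoded list."
                )


# Passes silently for inputs shaped like the example on this page.
check_alignment_inputs(
    ["doc::0", "doc::1"],
    [[[0, 20, "COMPANY"]], [[10, 15, "COMPANY"], [20, 25, "COMPANY"]]],
    user_format=True,
)
```

Running a check like this first tends to give a clearer error than a failure partway through alignment.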

Returns

List of aligned labels corresponding to x_uids.

Return type

List[Any]