Version: 25.3

snorkelflow.client.datasets.get_dataset_data

snorkelflow.client.datasets.get_dataset_data(dataset, split=None, start_date=None, end_date=None, target_columns=None)

Load raw data for the given dataset (prior to applying any processors).

Deprecated since version 2024.R4: Use snorkelflow.sdk.Dataset.get_dataframe() instead.

Parameters

Name | Type | Default | Info
dataset | Union[str, int] | | Name or UID of the dataset from which to load the unlabeled data.
split | Optional[str] | None | Name of the split (“train”, “valid”, “test”) to load. None loads all splits.
start_date | Optional[str] | None | Fetch data starting from this date. Defaults to minus infinity (no lower bound).
end_date | Optional[str] | None | Fetch data up to this date. Defaults to infinity (no upper bound).
target_columns | Optional[List[str]] | None | Optional list of columns to include in the DataFrame. Defaults to all columns.

Returns

An [n_data_points x n_fields] Pandas DataFrame containing the dataset data.

Return type

DataFrame
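
Example

The following is a minimal sketch of how a call might be assembled, not an official usage pattern. The helper build_request is hypothetical (it is not part of the SDK) and only checks the split name against the values listed in the parameter table before passing the arguments through. The dataset name "my-dataset" and the column "text" are placeholders. The date parameters are left unset because the expected date string format is not stated in this reference.

```python
from typing import Any, Dict, List, Optional, Union

# Split names taken from the parameter table above.
VALID_SPLITS = ("train", "valid", "test")


def build_request(
    dataset: Union[str, int],
    split: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    target_columns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Check the split name locally and return kwargs for get_dataset_data.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    if split is not None and split not in VALID_SPLITS:
        raise ValueError(
            f"split must be one of {VALID_SPLITS} or None, got {split!r}"
        )
    return {
        "dataset": dataset,
        "split": split,
        "start_date": start_date,
        "end_date": end_date,
        "target_columns": target_columns,
    }


def load_train_text():
    """Load the "text" column of the train split.

    Calling this requires a configured Snorkel Flow environment.
    """
    # Imported lazily so build_request can be used without the SDK installed.
    from snorkelflow.client.datasets import get_dataset_data

    # "my-dataset" and "text" are placeholder names, not real resources.
    kwargs = build_request("my-dataset", split="train", target_columns=["text"])
    return get_dataset_data(**kwargs)
```

Passing only the columns you need through target_columns keeps the returned DataFrame small. Because this function is deprecated, new code should prefer snorkelflow.sdk.Dataset.get_dataframe() as noted above.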