Checks
Warning
This section is a work in progress.
allowed_strings(value)
Create a pandera check for allowed strings.
Parameters
value : list | str A list of allowed strings, or a regex pattern given as a single string.
Returns
pa.Check A pandera check for the allowed strings.
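The two accepted forms of value can be pictured with a plain-Python predicate. This is only a sketch of the rule, not the datachecker implementation: make_allowed_predicate is a hypothetical helper, and matching the regex against the whole string (re.fullmatch) is an assumption.

```python
import re


def make_allowed_predicate(value):
    """Return a function that reports whether a single string is allowed.

    Hypothetical helper: it only mirrors the documented `list | str` parameter.
    """
    if isinstance(value, list):
        allowed = set(value)
        return lambda text: text in allowed
    if isinstance(value, str):
        # Assumption: the pattern has to match the whole string.
        pattern = re.compile(value)
        return lambda text: pattern.fullmatch(text) is not None
    raise TypeError("value must be a list of strings or a regex pattern")


by_list = make_allowed_predicate(["red", "green"])
by_regex = make_allowed_predicate(r"[A-Z]{2}\d{3}")

print(by_list("red"), by_list("blue"))      # True False
print(by_regex("AB123"), by_regex("AB12"))  # True False
```

A predicate of this kind could be wrapped in a pandera check with pa.Check(predicate, element_wise=True).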
Source code in datachecker/checks_loaders_and_exporters/checks.py
convert_schema(schema, custom_checks=None)
Convert the loaded schema to a pandera DataFrameSchema. Uses simple functions, defined above, to map schema keys to pandera checks. To add a further check, define a function for the new schema key above and add it to the loop within this function.
Parameters
schema : dict The schema to convert.
Returns
pa.DataFrameSchema The converted pandera DataFrameSchema.
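The key-to-check mapping described here can be pictured as a dispatch table that a loop walks over. The sketch below is plain Python under stated assumptions: the schema layout ({"columns": {name: {key: value}}}), the names CHECK_BUILDERS, build_checks, min_val_rule and max_val_rule are all hypothetical, and the builders return simple predicates rather than pa.Check objects.

```python
def min_val_rule(value):
    # Assumption: the bound is inclusive.
    return lambda x: x >= value


def max_val_rule(value):
    # Assumption: the bound is inclusive.
    return lambda x: x <= value


# Maps a schema key to the function that builds its check.
CHECK_BUILDERS = {"min_val": min_val_rule, "max_val": max_val_rule}


def build_checks(schema):
    """Return {column: [(key, predicate), ...]} for a hypothetical schema layout."""
    converted = {}
    for column, spec in schema["columns"].items():
        checks = []
        for key, value in spec.items():
            builder = CHECK_BUILDERS.get(key)
            if builder is not None:  # keys without a builder are skipped
                checks.append((key, builder(value)))
        converted[column] = checks
    return converted


example_schema = {"columns": {"age": {"min_val": 0, "max_val": 120}}}
age_checks = build_checks(example_schema)["age"]

print([key for key, _ in age_checks])           # ['min_val', 'max_val']
print(all(check(130) for _, check in age_checks))  # False
```

With this layout, supporting a new schema key means writing one builder function and registering it in the table.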
convert_schema_into_log_entries(converted_schema)
Convert a pandera schema into a dataframe of log entries, one for each check defined in the schema. Used to create a complete log of all checks, including those that pass.
Parameters
converted_schema : pa.DataFrameSchema The pandera DataFrameSchema to convert.
Returns
pd.DataFrame | None A dataframe of log entries or None if no checks are defined.
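The idea of one log entry per defined check, whether or not it later fails, can be sketched without pandera. The function name checks_to_log_rows and the row keys "column" and "check" are hypothetical; the actual log columns are not documented in this section.

```python
def checks_to_log_rows(column_checks):
    """Flatten {column: [check names]} into one row per (column, check) pair.

    Returns None when no checks are defined, mirroring the documented
    `pd.DataFrame | None` return type.
    """
    rows = [
        {"column": column, "check": check}
        for column, names in column_checks.items()
        for check in names
    ]
    return rows or None


print(checks_to_log_rows({"age": ["min_val", "max_val"], "name": []}))
print(checks_to_log_rows({}))  # None
```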
forbidden_strings(value)
Create a pandera check for forbidden strings.
Parameters
value : list | str The list of forbidden strings.
Returns
pa.Check A pandera check for the forbidden strings.
Raises
TypeError If the value is a string (regex is not supported for forbidden_strings), or if the value is neither a list nor a string.
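The type handling described under Raises can be sketched as follows. make_forbidden_predicate is a hypothetical helper and the error messages are illustrative, not the ones datachecker emits.

```python
def make_forbidden_predicate(value):
    """Return a function that is True when a string is NOT in the forbidden list."""
    if isinstance(value, str):
        # A bare string would be read as a regex, which is not supported here.
        raise TypeError("regex is not supported for forbidden strings; pass a list")
    if not isinstance(value, list):
        raise TypeError("value must be a list of strings")
    forbidden = set(value)
    return lambda text: text not in forbidden


is_ok = make_forbidden_predicate(["N/A", "unknown"])
print(is_ok("Leeds"), is_ok("N/A"))  # True False

try:
    make_forbidden_predicate("N/A|unknown")
except TypeError as error:
    print(error)
```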
max_date(value)
Create a pandera check for maximum date.
Parameters
value : str The maximum date or datetime, in 'YYYY-MM-DD HH:MM' or an equivalent format. The compact form YYYYMMDDHHMM is also accepted, but separating the parts with - or / is recommended for clarity. A date with no time component is also accepted.
Returns
pa.Check A pandera check for the maximum date.
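The accepted input formats can be illustrated with the standard library. This is a sketch only: the FORMATS tuple is an assumed subset, parse_bound and on_or_before are hypothetical names, and treating the maximum as inclusive is an assumption.

```python
from datetime import datetime

# Assumption: an illustrative subset of formats; the real parser may accept more.
FORMATS = (
    "%Y-%m-%d %H:%M",
    "%Y/%m/%d %H:%M",
    "%Y%m%d%H%M",
    "%Y-%m-%d",
    "%Y/%m/%d",
)


def parse_bound(value):
    """Try each known format in turn and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")


def on_or_before(candidate, maximum):
    # Assumption: the maximum itself counts as valid.
    return parse_bound(candidate) <= parse_bound(maximum)


print(parse_bound("202401311345"))  # 2024-01-31 13:45:00
print(parse_bound("2024-01-31"))    # 2024-01-31 00:00:00
print(on_or_before("2024-02-01", "2024-01-31"))  # False
```

A date given without a time parses to midnight, which matters when a bound is compared against values that do carry a time.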
max_decimal(value)
Create a pandera check for maximum decimal places.
Parameters
value : int The maximum number of decimal places.
Returns
pa.Check A pandera check for the maximum decimal places.
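Counting decimal places is less obvious than it sounds, so a small sketch may help. It is one possible approach using the standard decimal module, not necessarily the one datachecker uses; decimal_places and within_max_decimal are hypothetical names.

```python
from decimal import Decimal


def decimal_places(number):
    """Count digits after the decimal point in the shortest string form of `number`."""
    exponent = Decimal(str(number)).as_tuple().exponent
    return max(0, -exponent)


def within_max_decimal(number, maximum):
    return decimal_places(number) <= maximum


print(decimal_places(1.25))            # 2
print(decimal_places(3))               # 0
print(decimal_places(1.50))            # 1
print(within_max_decimal(1.234, 2))    # False
```

Floats do not retain trailing zeros, so 1.50 counts as one decimal place here.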
max_val(value)
Create a pandera check for maximum value.
Parameters
value : int | float The maximum value to check against.
Returns
pa.Check A pandera check for the maximum value.
min_date(value)
Create a pandera check for minimum date.
Parameters
value : str The minimum date or datetime, in 'YYYY-MM-DD HH:MM' or an equivalent format. The compact form YYYYMMDDHHMM is also accepted, but separating the parts with - or / is recommended for clarity. A date with no time component is also accepted.
Returns
pa.Check A pandera check for the minimum date.
min_decimal(value)
Create a pandera check for the minimum number of decimal places in floats (possible with the pandera decimal data type).
Parameters
value : int The minimum number of decimal places.
Returns
pa.Check A pandera check for the minimum decimal places.
min_val(value)
Create a pandera check for minimum value.
Parameters
value : int | float The minimum value to check against.
Returns
pa.Check A pandera check for the minimum value.
string_length(max_length=None, min_length=None)
Create a pandera check for string length.
Parameters
max_length : int | None The maximum length of the string.
min_length : int | None The minimum length of the string.
Returns
pa.Check A pandera check for the string length.
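Because both bounds are optional, the check has to treat each one independently. A minimal sketch, assuming that None means "no bound" and that both bounds are inclusive; make_length_predicate is a hypothetical name.

```python
def make_length_predicate(max_length=None, min_length=None):
    """Return a function that checks a string against the bounds that were given."""

    def check(text):
        length = len(text)
        if min_length is not None and length < min_length:
            return False
        if max_length is not None and length > max_length:
            return False
        return True

    return check


postcode_ok = make_length_predicate(max_length=8, min_length=5)
print(postcode_ok("SW1A 1AA"), postcode_ok("SW1"))  # True False

no_upper_bound = make_length_predicate(min_length=1)
print(no_upper_bound("x" * 500))  # True
```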
validate_using_pandera(converted_schema, data)
Validate data using a pandera DataFrameSchema. Returns a dataframe of failed checks with the columns 'column', 'check', 'failure_case' and 'invalid_ids'. If all checks pass, returns None.
Parameters
converted_schema : pa.DataFrameSchema The pandera DataFrameSchema to use for validation.
data : pd.DataFrame The data to validate.
Returns
pd.DataFrame | None A dataframe of failed checks or None if all checks pass.
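The shape of the result can be sketched in plain Python, using lists of dicts in place of dataframes. Everything here is an assumption made for illustration: collect_failures is a hypothetical name, the checks are simple predicates rather than pandera checks, and invalid_ids is taken to mean the positions of the failing rows, grouped by failing value.

```python
def collect_failures(rows, column_checks):
    """Return one entry per (column, check, failing value), or None if all pass."""
    failures = []
    for column, checks in column_checks.items():
        for name, predicate in checks:
            ids_by_value = {}
            for row_id, row in enumerate(rows):
                value = row[column]
                if not predicate(value):
                    ids_by_value.setdefault(value, []).append(row_id)
            for value, ids in ids_by_value.items():
                failures.append(
                    {
                        "column": column,
                        "check": name,
                        "failure_case": value,
                        "invalid_ids": ids,
                    }
                )
    return failures or None


rows = [{"age": 30}, {"age": -1}, {"age": 150}, {"age": -1}]
column_checks = {
    "age": [("min_val", lambda x: x >= 0), ("max_val", lambda x: x <= 120)]
}

for entry in collect_failures(rows, column_checks):
    print(entry)
print(collect_failures(rows[:1], column_checks))  # None
```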