Checks
Warning
This section is a work in progress
allowed_strings(value, library=pa)
Create a pandera check for allowed strings.
Parameters
value : list | str The list of allowed strings or regex pattern.
Returns
get_dtype_lib().Check A pandera check for the allowed strings.
Source code in datachecker/checks_loaders_and_exporters/checks.py
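The two accepted forms of `value` can be illustrated with a plain-Python predicate. This is only a sketch, not the datachecker implementation: `make_allowed_predicate` is a hypothetical name, it returns an ordinary function rather than a pandera check, and it assumes a regex must match the whole string.

```python
import re


def make_allowed_predicate(value):
    """Return a function that tests one string against `value`.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, list):
        allowed = set(value)
        # A list is treated as exact membership.
        return lambda s: s in allowed
    if isinstance(value, str):
        pattern = re.compile(value)
        # Assumption: the pattern has to match the entire string.
        return lambda s: pattern.fullmatch(s) is not None
    raise TypeError("value must be a list of strings or a regex pattern")


is_colour = make_allowed_predicate(["red", "green"])
is_code = make_allowed_predicate(r"[A-Z]{2}\d{3}")
print(is_colour("red"), is_colour("blue"))
print(is_code("AB123"), is_code("AB12"))
```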
convert_schema(schema, df, custom_checks=None)
Convert the loaded schema to a pandera DataFrameSchema. Simple, separately defined functions map each schema key to a pandera check. To add a further check, define a function for the new schema key above this one and add it to the loop within this function.
Parameters
schema : dict The schema to convert.
Returns
get_dtype_lib().DataFrameSchema The converted pandera DataFrameSchema.
Source code in datachecker/checks_loaders_and_exporters/checks.py
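The key-to-check mapping described for `convert_schema` can be pictured as a small dispatch loop. The sketch below is not the library's code: the schema layout (column name mapped to a dict of rules), the key names, and `build_checks` are all assumptions made for illustration, and plain predicates stand in for pandera checks.

```python
# Hypothetical schema keys mapped to hypothetical predicate factories.
CHECK_BUILDERS = {
    "min_val": lambda v: (lambda x: x >= v),
    "max_val": lambda v: (lambda x: x <= v),
    "allowed_strings": lambda v: (lambda x: x in v),
}


def build_checks(schema):
    """Turn {column: {key: value}} into {column: [(key, predicate), ...]}."""
    built = {}
    for column, rules in schema.items():
        checks = []
        for key, value in rules.items():
            builder = CHECK_BUILDERS.get(key)
            if builder is None:
                # Keys without a registered builder are skipped in this sketch.
                continue
            checks.append((key, builder(value)))
        built[column] = checks
    return built


schema = {
    "age": {"min_val": 0, "max_val": 120},
    "colour": {"allowed_strings": ["red", "green"]},
}
checks = build_checks(schema)
print([name for name, _ in checks["age"]])
```

Under this layout, supporting a new rule means writing one factory and registering it under its key.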
convert_schema_into_log_entries(converted_schema)
Convert a pandera schema into a dataframe of log entries covering every check defined in the schema. Used to create a complete log of all checks, including those that pass.
Parameters
converted_schema : get_dtype_lib().DataFrameSchema The pandera DataFrameSchema to convert.
Returns
pd.DataFrame | None A dataframe of log entries or None if no checks are defined.
Source code in datachecker/checks_loaders_and_exporters/checks.py
dtype_check_and_convert(validation_return)
Convert the dataframe returned by pandera validation to pandas. This does not convert the input dataframe, only the validation return, which is then processed using pandas. To add support for additional dataframe libraries, extend this function.
Parameters
validation_return : pd.DataFrame | pl.DataFrame The validation return dataframe to convert.
Returns
pd.DataFrame The converted pandas dataframe.
Source code in datachecker/checks_loaders_and_exporters/checks.py
forbidden_strings(value, library=pa)
Create a pandera check for forbidden strings.
Parameters
value : list | str The list of forbidden strings.
Returns
get_dtype_lib().Check A pandera check for the forbidden strings.
Raises
TypeError If the value is a string (regex is not supported for forbidden_strings), or if the value is neither a list nor a string.
Source code in datachecker/checks_loaders_and_exporters/checks.py
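The difference from `allowed_strings`, namely that a string argument is rejected, can be sketched as follows. `make_forbidden_predicate` is a hypothetical stand-in that returns a plain function rather than a pandera check, and the error messages are invented for the example.

```python
def make_forbidden_predicate(value):
    """Return a function that is True when a string is NOT forbidden.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        # Mirrors the documented behaviour: regex patterns are not supported.
        raise TypeError("regex is not supported for forbidden strings; pass a list")
    if not isinstance(value, list):
        raise TypeError("value must be a list of strings")
    forbidden = set(value)
    return lambda s: s not in forbidden


not_banned = make_forbidden_predicate(["N/A", "unknown"])
print(not_banned("London"), not_banned("N/A"))

try:
    make_forbidden_predicate(r"N/?A")
except TypeError as err:
    print("rejected:", err)
```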
max_date(value, library=pa)
Create a pandera check for maximum date.
Parameters
value : str The maximum date or datetime in 'YYYY-MM-DD HH:MM' or an equivalent format. YYYYMMDDHHMM is also accepted, but separating the parts with - or / is recommended for clarity. A date with no time component is also accepted.
Returns
get_dtype_lib().Check A pandera check for the maximum date.
Source code in datachecker/checks_loaders_and_exporters/checks.py
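To make the accepted date layouts concrete, here is a minimal sketch that tries a fixed list of formats in turn. The format list and the `parse_bound` name are assumptions for illustration; the library may parse dates differently, and whether the bound itself counts as valid is also assumed here (inclusive).

```python
from datetime import datetime

# Assumed layouts, datetime forms first so a bare date is tried last.
FORMATS = [
    "%Y-%m-%d %H:%M",
    "%Y/%m/%d %H:%M",
    "%Y%m%d%H%M",
    "%Y-%m-%d",
    "%Y/%m/%d",
    "%Y%m%d",
]


def parse_bound(value):
    """Parse a date string using the first layout that fits (hypothetical helper)."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")


latest = parse_bound("2024-01-31 15:30")
# Assumption: values equal to the bound pass the check.
print(datetime(2024, 1, 31, 12, 0) <= latest)
print(parse_bound("202401311530") == latest)
```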
max_decimal(value, library=pa)
Create a pandera check for maximum decimal places.
Parameters
value : int The maximum number of decimal places.
Returns
get_dtype_lib().Check A pandera check for the maximum decimal places.
Source code in datachecker/checks_loaders_and_exporters/checks.py
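One way to count decimal places for a maximum check is to go through `decimal.Decimal`, as sketched below. This is an illustrative approach, not necessarily the one datachecker uses; `decimal_places` and `within_max_decimal` are hypothetical names, and the sketch only handles finite numbers.

```python
from decimal import Decimal


def decimal_places(x):
    """Count significant digits after the decimal point of a finite number."""
    # str() gives the short repr (0.1 -> "0.1"); normalize() strips trailing
    # zeros so that 2.0 counts as zero decimal places.
    exponent = Decimal(str(x)).normalize().as_tuple().exponent
    return max(0, -exponent)


def within_max_decimal(x, limit):
    """True when x has at most `limit` decimal places."""
    return decimal_places(x) <= limit


print(decimal_places(1.25), decimal_places(2.0), decimal_places(3))
print(within_max_decimal(1.234, 2))
```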
max_val(value, library=pa)
Create a pandera check for maximum value.
Parameters
value : int | float The maximum value to check against.
Returns
get_dtype_lib().Check A pandera check for the maximum value.
Source code in datachecker/checks_loaders_and_exporters/checks.py
min_date(value, library=pa)
Create a pandera check for minimum date.
Parameters
value : str The minimum date or datetime in 'YYYY-MM-DD HH:MM' or an equivalent format. YYYYMMDDHHMM is also accepted, but separating the parts with - or / is recommended for clarity. A date with no time component is also accepted.
Returns
get_dtype_lib().Check A pandera check for the minimum date.
Source code in datachecker/checks_loaders_and_exporters/checks.py
min_decimal(value, library=pa)
Create a pandera check for minimum decimal places for floats (possible with the pandera decimal data type).
Parameters
value : int The minimum number of decimal places.
Returns
get_dtype_lib().Check A pandera check for the minimum decimal places.
Source code in datachecker/checks_loaders_and_exporters/checks.py
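A likely reason a decimal data type matters for a minimum check is that a float does not keep trailing zeros, whereas a `Decimal` does. The sketch below shows that difference using only the standard library; `stored_decimal_places` and `meets_min_decimal` are hypothetical names, and the link to how datachecker handles this internally is an assumption.

```python
from decimal import Decimal


def stored_decimal_places(x):
    """Decimal places as stored in a finite Decimal, trailing zeros included."""
    return max(0, -x.as_tuple().exponent)


def meets_min_decimal(x, minimum):
    """True when the Decimal x carries at least `minimum` decimal places."""
    return stored_decimal_places(x) >= minimum


as_decimal = Decimal("1.50")          # keeps the trailing zero
via_float = Decimal(str(float("1.50")))  # the float round trip drops it

print(stored_decimal_places(as_decimal), stored_decimal_places(via_float))
print(meets_min_decimal(as_decimal, 2), meets_min_decimal(via_float, 2))
```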
min_val(value, library=pa)
Create a pandera check for minimum value.
Parameters
value : int | float The minimum value to check against.
Returns
get_dtype_lib().Check A pandera check for the minimum value.
Source code in datachecker/checks_loaders_and_exporters/checks.py
string_length(max_length=None, min_length=None, library=pa)
Create a pandera check for string length.
Parameters
max_length : int | None The maximum length of the string.
min_length : int | None The minimum length of the string.
Returns
get_dtype_lib().Check A pandera check for the string length.
Source code in datachecker/checks_loaders_and_exporters/checks.py
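Since both bounds are optional, the check can be thought of as applying only the limits that were supplied. A minimal sketch, assuming inclusive bounds and using a hypothetical `make_length_predicate` that returns a plain function rather than a pandera check:

```python
def make_length_predicate(max_length=None, min_length=None):
    """Return a function testing a string's length against the given bounds."""

    def check(s):
        n = len(s)
        # A bound left as None is simply not applied.
        if min_length is not None and n < min_length:
            return False
        if max_length is not None and n > max_length:
            return False
        return True

    return check


is_short_code = make_length_predicate(max_length=5, min_length=2)
at_most_three = make_length_predicate(max_length=3)
print(is_short_code("abc"), is_short_code("a"), is_short_code("abcdef"))
print(at_most_three(""), at_most_three("abcd"))
```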
validate_using_pandera(converted_schema, data)
Validate data using a pandera DataFrameSchema. Returns a dataframe of failed checks with the columns 'column', 'check', 'failure_case' and 'invalid_ids'. If all checks pass, returns None.
Parameters
converted_schema : get_dtype_lib().DataFrameSchema The pandera DataFrameSchema to use for validation.
data : pd.DataFrame The data to validate.
Returns
pd.DataFrame | None A dataframe of failed checks or None if all checks pass.
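The shape of the return value (one row per failure with the four named columns, or None when nothing fails) can be mimicked without pandera. In the sketch below, `collect_failures`, the list-of-dicts data layout, and the reading of `invalid_ids` as the row position are all assumptions for illustration; the real function returns a pandas dataframe.

```python
def collect_failures(rows, checks):
    """Apply {column: [(name, predicate), ...]} to a list of row dicts.

    Returns a list of failure records, or None if every check passes.
    """
    failures = []
    for column, column_checks in checks.items():
        for name, predicate in column_checks:
            for row_id, row in enumerate(rows):
                value = row[column]
                if not predicate(value):
                    failures.append(
                        {
                            "column": column,
                            "check": name,
                            "failure_case": value,
                            # Assumption: invalid_ids identifies the failing row.
                            "invalid_ids": row_id,
                        }
                    )
    return failures or None


rows = [{"age": 30}, {"age": -1}]
age_checks = {"age": [("min_val", lambda x: x >= 0)]}
print(collect_failures(rows, age_checks))
print(collect_failures([{"age": 30}], age_checks))
```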