Date Adjustment in Python#
SML User Guide#
Overview#
Descriptive |
Details |
|---|---|
Support Area |
Editing & Imputation |
Method Theme |
Editing |
Status |
Ready to Use |
Inputs |
input_dataframe, trading_weights, target_columns, contributor_returned_start_date_col, contributor_returned_end_date_col, expected_start_date_col, expected_end_date_col, domain_col, short_period_parameter_col, long_period_parameter_col, equal_weighted_col, set_to_mid_point_col, use_calendar_days_col, average_weekly_col, da_error_flag_col, trading_date_col, trading_period_start_col, trading_period_end_col, trading_weights_col, trading_domain_col |
Outputs |
dataframe with adjusted responses and sum of the weights |
Method Version |
1.2.5 |
Code Repository |
Summary#
To generate summary and headline statistics for a reporting period it is important to ensure that data reported are on a consistent basis referencing the same period. However, sometimes it is not possible for data to be reported for the exact period of time required, e.g., monthly, quarterly, yearly. The responder may specify different start and end dates for which the observed (referenced as response throughout this specification) data cover. The Date Adjustment method weights the data values from the response period to approximate the values for the desired (expected) period, either by giving varying weights to trading days or giving each date the same weight.
User Notes#
Finding and Installing the method#
This method requires Python >=3.7.1, <4.0.0 and uses the Pandas package v1.5.3.
If you are using Pandas >=2.0 this will be uninstalled and v1.5.3 installed.
To prevent downgrading software on your system, we recommend creating a virtual environment to install and run SML methods. This will enable you to install the method with the required version of Python, etc, without disrupting the newer versions you may be running on your system. If you’re new to virtual environments, please see our guidence on installing a method in the Help centre of our SML website to get started. Otherwise, use your preferred method to create a virtual environment with the correct software versions.
The method package can be installed from Artifactory/PyPI using the following code in the terminal or command prompt:
pip install sml_small
In your code you can import the method using:
import sml_small.date_adjustment as date_adjust
Requirements and Dependencies#
This method requires input data and trading day data.
These each need to be supplied as a pandas dataframe.
Assumptions and Validity#
All data inputs required by the method are available.
Each respondent is clearly classified into one mutually exclusive domain group.
All trading day weights are equal or greater than 0.
Trading day weights are available for all periods and domains, or the equal weighted option is set to yes.
How to Use the Method#
Method Input#
The method requires input data and trading day data.
Input Data#
The input data requires the following columns to perform the date adjustment method:
Reference: Unique to each respondent
Contributors start date: Start date returned by the contributor.
Contributors end date: End date returned by the contributor.
Q20: Variable response that needs to be date adjusted.
Expected start date: Start date the user is expecting from the contributor.
Expected end date: End date the user is expecting from the contributor
Domain: Domain classification.
Mid-point: Indicator as to whether the mid-point method needs to be used.
Equal-weighted: Indicator as to whether the equal-weighted method needs to be used.
Calendar days: Indicator as to whether the calendar days method needs to be used.
Average weekly: Indicator as to whether the average weekly method needs to be used.
Short period: A value that shows the user whether a response is of a short time frame.
Long period: A value that shows the user whether a response is of a long-time frame.
Error column: A column that will be populated if any errors occur.
Example
Reference |
Contributors start date |
Contributors end date |
Q20 |
Expected start date |
Expected end date |
Domain |
Mid-point |
Equal-weighted |
Calendar days |
Average Weekly |
Short period |
Long period |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 |
20220601 |
20220630 |
1184 |
20220601 |
20220630 |
A |
N |
N |
N |
N |
24 |
42 |
2 |
20220604 |
20220630 |
2045 |
20220601 |
20220630 |
A |
N |
N |
N |
N |
24 |
42 |
3 |
20220601 |
20220624 |
2013 |
20220601 |
20220630 |
A |
N |
N |
N |
N |
24 |
42 |
4 |
20220601 |
20220704 |
1992 |
20220601 |
20220630 |
A |
N |
N |
N |
N |
24 |
42 |
5 |
20220530 |
20220628 |
1027 |
20220601 |
20220630 |
A |
N |
N |
N |
N |
24 |
42 |
Trading Day Data#
The trading day data requires trading day weights for all dates present and for the domain we are interested in:
Date: Date we are interested in
Domain: Domain classification
Weight: Weight for the specific trading day
Period: An indicator of the period.
Period_start: Start date of the period
Period_end: End date of the period
**Example **
date |
domain |
weight |
period |
period_start |
period_end |
|---|---|---|---|---|---|
20220528 |
A |
0 |
202205 |
20220501 |
20220531 |
20220529 |
A |
0 |
202205 |
20220501 |
20220531 |
20220530 |
A |
0.2 |
202205 |
20220501 |
20220531 |
20220531 |
A |
0.2 |
202205 |
20220501 |
20220531 |
20220601 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220602 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220603 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220604 |
A |
0 |
202206 |
20220601 |
20220630 |
20220605 |
A |
0 |
202206 |
20220601 |
20220630 |
20220606 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220607 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220608 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220609 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220610 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220611 |
A |
0 |
202206 |
20220601 |
20220630 |
20220612 |
A |
0 |
202206 |
20220601 |
20220630 |
20220613 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220614 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220615 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220616 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220617 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220618 |
A |
0 |
202206 |
20220601 |
20220630 |
20220619 |
A |
0 |
202206 |
20220601 |
20220630 |
20220620 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220621 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220622 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220623 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220624 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220625 |
A |
0 |
202206 |
20220601 |
20220630 |
20220626 |
A |
0 |
202206 |
20220601 |
20220630 |
20220627 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220628 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220629 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220630 |
A |
0.2 |
202206 |
20220601 |
20220630 |
20220701 |
A |
0.2 |
202207 |
20220701 |
20220731 |
20220702 |
A |
0 |
202207 |
20220701 |
20220731 |
20220703 |
A |
0 |
202207 |
20220701 |
20220731 |
20220704 |
A |
0.2 |
202207 |
20220701 |
20220731 |
Method Output#
Output Data#
New columns are produced from running the method and are described below:
Sum of trading day weights over contributors period: The trading day weights will be summed over the dates returned, all dates are inclusive.
Number of days in contributors returned period: The number of days returned by the contributor, all dates are inclusive.
Actual period start date: The start date the user is looking for (usually similar to expected start date).
Actual period end date: The end date the user is looking for (usually similar to expected end date).
Number of days in actual returned period: The number of days the user set
Sum of trading day weights over actual period: The sum of the trading weights over the days set by the user.
Date adjusted Q20: The adjusted question value based on the sum of the trading days weights ratio.
Example
Ref |
Contributors start date |
Contributors end date |
Q20 |
Expected start date |
Expected end date |
Domain |
Mid-point |
Equal-weighted |
Calendar days |
Average Weekly |
Short period |
Long period |
Error Column |
Sum trading day weights over contributors returned period |
Num days in contributors returned period |
Actual period start date |
Actual period end date |
Num days in actual returned period |
Sum trading day weights over actual returned period |
Date adjusted Q20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 |
2022-06-01 |
2022-06-30 |
1184 |
2022-06-01 |
2022-06-30 |
A |
N |
N |
N |
N |
24 |
42 |
4.400000000000001 |
30 |
2022-06-01 |
2022-06-30 |
30 |
4.400000000000001 |
1184.0 |
|
2 |
2022-06-04 |
2022-06-30 |
2045 |
2022-06-01 |
2022-06-30 |
A |
N |
N |
N |
N |
24 |
42 |
3.800000000000001 |
27 |
2022-06-01 |
2022-06-30 |
30 |
4.400000000000001 |
2367.8947368421054 |
|
3 |
2022-06-01 |
2022-06-24 |
2013 |
2022-06-01 |
2022-06-30 |
A |
N |
N |
N |
N |
24 |
42 |
3.600000000000001 |
24 |
2022-06-01 |
2022-06-30 |
30 |
4.400000000000001 |
2460.3333333333335 |
|
4 |
2022-06-01 |
2022-07-04 |
1992 |
2022-06-01 |
2022-06-30 |
A |
N |
N |
N |
N |
24 |
42 |
4.800000000000002 |
34 |
2022-06-01 |
2022-06-30 |
30 |
4.400000000000001 |
1826.0 |
|
5 |
2022-05-30 |
2022-06-28 |
1027 |
2022-06-01 |
2022-06-30 |
A |
N |
N |
N |
N |
24 |
42 |
4.400000000000001 |
30 |
2022-06-01 |
2022-06-30 |
30 |
4.400000000000001 |
1027.0 |
Example (Synthetic) Data#
Files containing the example input & output data given above can be found in the example_data folder of this repository.
Input & Trading day data:
date_adjustment_input_data_example_1.csv
date_adjustment_trading_day_weights_example_1.csv
Expected output after running the worked example:
date_adjustment_output_data_example_1.csv
Worked Example#
Using the example input data and trading day data, we can then run the method as shown below ensuring that we are calling the right columns from the dataframe:
import pandas as pd
import sml_small.date_adjustment as date_adjust
# Import datafile containing the respondent's data
datafile = "date_adjustment_input_data_example_1.csv"
df = pd.read_csv(datafile)
# Import trading day weights file
trading_day_datafile = "date_adjustment_trading_day_weights_example_1.csv"
trading_df = pd.read_csv(trading_day_datafile)
# Match up column names with with variables below
output = date_adjust.date_adjustment(
input_dataframe = df, # Respondent's input dataframe
trading_weights = trading_df, # trading day weights dataframe
target_columns = ['Q20'], # The variable that needs to be date adjusted
contributor_returned_start_date_col = "Contributors start date", # Contributor returned start date
contributor_returned_end_date_col = "Contributors end date", # Contributor returned end date
expected_start_date_col = "Expected start date", # Expected start date
expected_end_date_col = "Expected end date", # Expected end date
domain_col = "Domain", # Domain classification
short_period_parameter_col = "Short period", # Short period
long_period_parameter_col = "Long period", # Long period
equal_weighted_col = "Equal-weighted", # Equal-weighted column
set_to_mid_point_col = "Mid-point", # Mid-point indicator column
use_calendar_days_col = "Calendar days", # Calendar days column
average_weekly_col = "Average Weekly", # Average weekly indicator
da_error_flag_col = "Error column", # Column for errors to be populated
trading_date_col = "date", # Date in the trading day dataframe
trading_period_start_col = "period_start", # Start date column in trading day dataframe
trading_period_end_col = "period_end", # End date column in trading day dataframe
trading_weights_col = "weight", # Weight values for each date
trading_domain_col = "domain" # Domain for each date and weight
)
# Exporting the output file to csv
output.to_csv("date_adjustment_output_data_example_1.csv")
The output can be exported as a csv file and will give you the adjusted responses along with the sum of the weights.
Methodology#
Terminology#
Expected period start date – The expected start date of the period set by the user.
Expected period end date – The expected end date of the period set by the user.
Variable(s) to be date adjusted – The user can select one or more variables to be date adjusted.
Contributor’s returned start date – The start date of the period returned by a respondent/observed in the data.
Contributor’s returned end date – The end date of the period returned by a respondent/observed in the data.
Domain – Classification group.
Set to mid-point – Allows the user to apply the mid-point method described below.
Set to equal weighted – Allows the user to apply the equal weighted method described below.
Use calendar days – Allows the user to apply the calendar days method described below.
Average weekly – Allows the user to apply the average weekly method described below.
Short period parameter – This is used to raise a flag to alert the user that the contributor’s returned period is short.
Long period parameter – This is used to raise a flag to alert the user that the contributor’s returned period is long.
Date mapping – This is a file which contains dates and relevant trading day weights per domain.
Trading day weights: These are weights associated with each day to allow the user to give a higher value to certain days relative to others in a given period. For example, setting weights of 0.2 for weekdays and 0 for weekends when considering turnover data values would imply that turnover is not generated on the weekend and is stable throughout weekdays.
Sum of trading day weights over contributor’s period: The trading day weights will be summed over the dates returned; all dates are inclusive.
Number of days in contributor’s returned period: The number of days returned by the contributor; all dates are inclusive.
Actual period start date: Will appear on the output dataset and is the start date that the output is based on.
Actual period end date: Will appear on the output dataset and is the end date that the output is based on.
Number of days in actual returned period: The number of days the user set.
Sum of trading day weights over actual period: The sum of the trading weights over the days set by the user.
Date adjusted variable: The adjusted data value based on the sum of the trading days weights ratio.
Statistical Process Flow / Formal Definition#
The basic date adjustment method is:
Adjusted response = Original response * Sum of weights of desired period / Sum of weights of response
A contributor will give a response for a period, this does not have to match the expected period, and will be fed into the method. Data which have the same returned and expected start and end dates may have this method applied but no adjustments will be made as the input data is fit for purpose. Where this is not the case trading day weights are assigned to each day covered by the contributor’s returned period and the expected period. These are then summed to provide a sum for the total trading day weights for the returned and expected periods respectively. The expected period total trading day weights are divided by the contributor’s returned period total trading day weights. This ratio is then applied to the contributor’s response so that it is representative of the expected period.
Incomplete or erroneous data provided Contributors may not always provide full or correct data and the Date Adjustment method will handle it by producing error codes. There are 16 error codes and the explanation of each of the codes can be found below:
E00: Average Weekly parameter is invalid. E01: The value to be date adjusted is missing from one of the target columns. E02: The contributor returned end date is earlier than the contributor returned start date. E03: A required record for calculating weight m is missing from the trading weights table. E04: A required trading weight for calculating weight m is null or blank. E05: A required trading weight for calculating weight m has a negative value. E06: A required record for calculating weight n is missing from or duplicated in the trading weights table. E07: A required trading weight for calculating weight n is null or blank. E08: A required trading weight for calculating weight n has a negative value. E09: Contributors return does not cover any of expected period. E10: The sum of trading day weights over contributors returned period is zero. E11: The sum of trading day weights over contributors returned period is zero. E12: A required record for calculating midpoint date is missing from the trading weights table. E13: A required record for setting APS and APE by midpoint is missing from or duplicated in the trading weights table. E14: Expected period start date is missing or an invalid date. E15: Expected period end date is missing or an invalid date. These are NOT exceptions and do not cause the method to fail. Once an error flag has been placed on a row of data, no further processing is done to that row, preserving the data in the state it was when the flag was raised. The method will continue processing even when errors occur, this was done for historical reasons.
Set to Mid-point – set as Y or YT or N A mid-point method can be used in Date Adjustment to check whether a contributor’s returned dates are within the expected period. If the mid-point of the contributor’s returned start and end dates are outside the expected period, then a “C” flag is raised in the error flag column to inform the user that the contributor’s response data aligns with a different reporting period (i.e., not the expected period). If the respondent’s mid-point does lie outside of the period, the user is interested in then date adjustment will not occur.
There are two ways to use the mid-point method: set the mid-point to “Y” and set the mid-point to “YT”. Setting the mid-point input parameter to “Y” will simply calculate the mid-point of the days returned. If the number of days in the contributor’s returned period are even, then divide the count by 2 and add that to the contributor’s returned start date to find the mid-point. If the number of days in the contributor’s returned period are odd, then add 1 to the count and divide by 2. This is then added onto the contributor’s returned start date. If the user sets the mid-point input parameter to “YT”, then the method will trim any weighted days at the start or end of the contributor’s returned dates if they have trading day weights set to 0 and do the same calculation as above with regards to the number of days in the period. Therefore, each method could provide a slightly different answer. If the user does not want the mid-point method to occur, then please set this to “N”.
Use calendar days – set as Y or N If the mid-point lies outside of the expected period start and end dates, a “C” flag is raised in the Date change column. Calendar days function will only work when there is a “C” flag present. The calendar days function, when set to “Y”, allows the user to automatically set the expected start and end period dates to the first day and last day of the month where the mid-point lies. When the calendar days function is set to “N” it allows start and end period dates to be set to the start and end dates that “C” flag lies in.
Set to equal weighted – set to Y or N The equal weighted method is where all the trading day weights are set to 1 instead of having a unique trading day weight associated with each day. This can be a useful feature when each day is of equal importance with respect to the data collected. To use this feature, the equal-weighted column should be marked as “Y”. If the user does not want the trading day weights to be equal-weighted, then please set this to “N”.
Average weekly – set to A or N The average weekly function allows the responses to be given on a weekly value rather than the period specified by the user. This can be a useful tool when the user’s period isn’t always equal, to allow the user to compare similar values. To use this feature, the average weekly value will need to be set to “A”. If the user does not want to have average weekly values in the output, then please set this to “N”.
Additional Information#
The ONS Statistical Methods Library at https://statisticalmethodslibrary.ons.gov.uk/ contains:
Further information about the methods including a link to the GitHub repository which contains detailed API information as part of the method code.
Information about other methods available through the library.
License#
Unless stated otherwise, the SML codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.
The documentation is available under the terms of the Open Government 3.0 license.