
This guide walks through the complete DeltaMax workflow: generating synthetic credit bureau data, running data quality checks (anomaly detection, T-tests, PSI, and mismatch detection), uploading the outputs to Google Cloud Storage, and loading them into BigQuery for reporting and visualization in Looker Studio.
It provides the commands and structure required to run each step end-to-end and adapt it to your project setup.
The guide is organized into three main sections: data generation and quality checks, uploading the outputs to Cloud Storage, and loading them into BigQuery for reporting.
Before you begin:
1. Create a project on Google Cloud.
2. Associate the project with your corporate billing account.
Generates synthetic datasets for the previous month (April) and the current month (May), where the May dataset intentionally includes injected anomalies that will be used for training and validation.
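The generator module itself is not shown in this guide; the sketch below illustrates the idea with pandas. The column names (`credit_score`, `utilization`) are illustrative assumptions alongside the `businessID` key, while the output file names match the ones uploaded in the Cloud Storage step.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Baseline "previous month" (April) dataset; non-key columns are illustrative
april = pd.DataFrame({
    "businessID": [f"B{i:05d}" for i in range(n)],
    "credit_score": rng.normal(680, 50, n).round(0),
    "utilization": rng.uniform(0, 1, n).round(3),
})

# "Current month" (May): same schema with small natural drift
may = april.copy()
may["credit_score"] = may["credit_score"] + rng.normal(0, 5, n).round(0)

# Intentionally inject anomalies into a small slice of May rows
anomaly_idx = rng.choice(n, size=25, replace=False)
may.loc[anomaly_idx, "credit_score"] = rng.normal(300, 20, 25).round(0)

april.to_csv("credit_bureau_data_april.csv", index=False)
may.to_csv("credit_bureau_data_may.csv", index=False)
```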
This step loads the generated datasets, standardizes the businessID column, and verifies that both monthly files share common columns, confirming the data is successfully loaded and ready to be passed to the next stages of the DeltaMax pipeline.
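A minimal sketch of this step, assuming pandas and a `businessID` key whose spelling and formatting may differ between files (the sample frames are illustrative stand-ins for the monthly CSVs):

```python
import pandas as pd

# Illustrative stand-ins for the April and May files
prev = pd.DataFrame({"BusinessId": [" b001 ", "B002"], "score": [700, 650]})
curr = pd.DataFrame({"businessID": ["B001", "B003"], "score": [710, 600]})

def standardize(df):
    # Normalize the business key's column name, then its values
    key = next(c for c in df.columns if c.lower().replace("_", "") == "businessid")
    df = df.rename(columns={key: "businessID"})
    df["businessID"] = df["businessID"].astype(str).str.strip().str.upper()
    return df

prev, curr = standardize(prev), standardize(curr)

# Verify both monthly files share common columns
common_cols = sorted(set(prev.columns) & set(curr.columns))
```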
This step identifies numeric, boolean (bit), and string columns and prepares the correct columns for anomaly analysis.
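With pandas, the column classification can be sketched as follows; the assumption here is that bit columns arrive as booleans and strings as object dtype after loading:

```python
import pandas as pd

df = pd.DataFrame({
    "score": [700, 650],
    "is_active": [True, False],   # bit column loaded as boolean
    "name": ["Acme", "Globex"],
})

# Partition columns by dtype for the downstream checks
numeric_cols = df.select_dtypes(include="number").columns.tolist()
bool_cols = df.select_dtypes(include="bool").columns.tolist()
string_cols = df.select_dtypes(include="object").columns.tolist()
```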
Detects statistical and ML-based anomalies in the dataset.
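The module's exact detectors are not shown; below is a sketch of the statistical side only, using a simple 3-sigma z-score rule on one numeric column (the ML-based detectors would flag rows in the same fashion):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# 200 typical scores plus two planted outliers at indices 200 and 201
values = np.concatenate([rng.normal(680, 30, 200), [300.0, 1100.0]])
df = pd.DataFrame({"credit_score": values})

# Flag rows more than 3 standard deviations from the column mean
z = (df["credit_score"] - df["credit_score"].mean()) / df["credit_score"].std()
df["is_anomaly"] = z.abs() > 3
anomalies = df[df["is_anomaly"]]
```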
Generates descriptive statistics to understand data distribution and month-to-month variation.
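For example, per-column summaries and the month-to-month mean shift can be computed with pandas (the inputs below are synthetic placeholders):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
april = pd.DataFrame({"credit_score": rng.normal(680, 30, 200)})
may = pd.DataFrame({"credit_score": rng.normal(640, 30, 200)})

# count, mean, std, min, quartiles, max per column
stats_april = april["credit_score"].describe()
stats_may = may["credit_score"].describe()
mean_shift = stats_may["mean"] - stats_april["mean"]
```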
Compares common numerical columns between months using T-tests to detect statistically significant mean changes.
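A sketch of the comparison using `scipy.stats.ttest_ind`; the `equal_var=False` setting (Welch's variant, which does not assume equal variances) is an assumption here, and the data is synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
april_scores = rng.normal(680, 30, 300)
may_scores = rng.normal(640, 30, 300)   # mean shifted down by ~40 points

# Welch's t-test: is the month-to-month mean change statistically significant?
t_stat, p_value = stats.ttest_ind(april_scores, may_scores, equal_var=False)
significant = p_value < 0.05
```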
Calculates Population Stability Index (PSI) to measure distribution shifts between months.
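PSI itself is a standard calculation; a self-contained sketch, with bin edges taken from the previous month's deciles and a small floor to avoid log-of-zero:

```python
import numpy as np

def psi(expected, actual, bins=10):
    # Decile edges from the previous-month (expected) distribution
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clamp current values into range so every row lands in a bin
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to keep the log term finite
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(2)
stable = psi(rng.normal(680, 30, 1000), rng.normal(680, 30, 1000))
shifted = psi(rng.normal(680, 30, 1000), rng.normal(600, 30, 1000))
```

A common rule of thumb: PSI below 0.1 indicates a stable distribution, 0.1 to 0.25 a moderate shift, and above 0.25 a significant shift.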
Checks for inconsistencies in decimal precision across common numerical columns.
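One way to sketch the precision check: compare the maximum number of digits after the decimal point per shared column (sample frames are illustrative):

```python
import pandas as pd

prev = pd.DataFrame({"utilization": [0.12, 0.345, 0.5]})
curr = pd.DataFrame({"utilization": [0.1234, 0.35, 0.5]})

def max_decimals(series):
    # Count digits after the decimal point in the string form of each value
    return series.astype(str).str.split(".").str[-1].str.len().max()

mismatches = {}
for col in set(prev.columns) & set(curr.columns):
    p, c = max_decimals(prev[col]), max_decimals(curr[col])
    if p != c:
        mismatches[col] = {"previous": int(p), "current": int(c)}
```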
Detects inconsistencies in string lengths between previous and current datasets and saves the results to a log file.
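A sketch of the string-length comparison; the log file name is an assumption for illustration:

```python
import pandas as pd

# Illustrative stand-ins for the previous and current datasets
prev = pd.DataFrame({"state": ["CA", "NY", "TX"]})
curr = pd.DataFrame({"state": ["CA", "New York", "TX"]})

lines = []
for col in set(prev.columns) & set(curr.columns):
    p_max = prev[col].astype(str).str.len().max()
    c_max = curr[col].astype(str).str.len().max()
    if p_max != c_max:
        lines.append(f"{col}: max length changed {p_max} -> {c_max}")

# Persist the findings to a log file (path is illustrative)
with open("string_length_checks.log", "w") as f:
    f.write("\n".join(lines))
```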
Identifies data type inconsistencies between previous and current datasets and logs the results.
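Comparing pandas dtypes per shared column is one way to sketch this check; the output file name matches the mismatch file uploaded later:

```python
import pandas as pd

# Same column, but loaded as integer one month and float the next
prev = pd.DataFrame({"businessID": ["B001"], "score": [700]})
curr = pd.DataFrame({"businessID": ["B001"], "score": [700.5]})

mismatch_rows = []
for col in set(prev.columns) & set(curr.columns):
    p_type, c_type = str(prev[col].dtype), str(curr[col].dtype)
    if p_type != c_type:
        mismatch_rows.append({"column": col, "previous": p_type, "current": c_type})

pd.DataFrame(mismatch_rows).to_csv("data_type_mismatches.csv", index=False)
```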
Preprocesses the data and fills missing numeric values using imputation.
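A minimal sketch using median imputation on numeric columns only; the module's actual imputation strategy may differ:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "credit_score": [700.0, np.nan, 650.0, np.nan],
    "segment": ["A", "B", "A", "B"],
})

# Median imputation, applied to numeric columns only
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
```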
Runs the Business Uniqueness module to compare previous vs current month data and identify businesses appearing in only one dataset.
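Once the key is standardized, the comparison reduces to a set difference; a sketch:

```python
import pandas as pd

prev = pd.DataFrame({"businessID": ["B001", "B002", "B003"]})
curr = pd.DataFrame({"businessID": ["B002", "B003", "B004"]})

# Businesses appearing in only one of the two monthly datasets
only_prev = sorted(set(prev["businessID"]) - set(curr["businessID"]))
only_curr = sorted(set(curr["businessID"]) - set(prev["businessID"]))
```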
Merges Dataset A and Dataset B into a single combined file using common keys and saves the merged dataset.
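Assuming `businessID` is the common key, an outer join keeps businesses present in either dataset; the output file name matches the combined file uploaded in the next section:

```python
import pandas as pd

# Illustrative stand-ins for Dataset A and Dataset B
a = pd.DataFrame({"businessID": ["B001", "B002"], "score_a": [700, 650]})
b = pd.DataFrame({"businessID": ["B002", "B003"], "score_b": [710, 600]})

# Outer join on the shared key keeps rows present in either dataset
combined = a.merge(b, on="businessID", how="outer")
combined.to_csv("credit_bureau_data_combined.csv", index=False)
```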
Note: You can modify the destination table names as needed based on your project naming conventions before loading the data into BigQuery.
Create bucket
gcloud storage buckets create gs://synthetic_cb_cli-deltamax-v1
gcloud storage cp /opt/DeltaMax-V1/credit_bureau_data_april.csv gs://synthetic_cb_cli-deltamax-v1/
gcloud storage cp /opt/DeltaMax-V1/credit_bureau_data_may.csv gs://synthetic_cb_cli-deltamax-v1/
gcloud storage cp /opt/DeltaMax-V1/anomalies_april.csv gs://synthetic_cb_cli-deltamax-v1/
gcloud storage cp /opt/DeltaMax-V1/cb_may_with_T-Test_anomalies.csv gs://synthetic_cb_cli-deltamax-v1/
gcloud storage cp /opt/DeltaMax-V1/data_type_mismatches.csv gs://synthetic_cb_cli-deltamax-v1/
gcloud storage cp /opt/DeltaMax-V1/credit_bureau_data_combined.csv gs://synthetic_cb_cli-deltamax-v1/
gcloud storage cp /opt/DeltaMax-V1/A_or_B_with_anomalies.sql gs://synthetic_cb_cli-deltamax-v1/
gcloud storage cp /opt/DeltaMax-V1/PSI_Changes_20250815_225600.csv gs://synthetic_cb_cli-deltamax-v1/
gcloud storage ls gs://synthetic_cb_cli-deltamax-v1
gcloud config list
Create BigQuery dataset
bq mk synthetic_credit_bureau_cli_v1
bq ls
Load April dataset
bq load \
--source_format=CSV \
--skip_leading_rows=1 \
--autodetect \
synthetic_credit_bureau_cli_v1.cb_april \
gs://synthetic_cb_cli-deltamax-v1/credit_bureau_data_april.csv
Load May dataset
bq load \
--source_format=CSV \
--skip_leading_rows=1 \
--autodetect \
synthetic_credit_bureau_cli_v1.cb_may \
gs://synthetic_cb_cli-deltamax-v1/credit_bureau_data_may.csv
Load Anomaly dataset
bq load \
--source_format=CSV \
--skip_leading_rows=1 \
--autodetect \
synthetic_credit_bureau_cli_v1.anomalies_april \
gs://synthetic_cb_cli-deltamax-v1/anomalies_april.csv
Load T-Test anomaly dataset
bq load \
--source_format=CSV \
--skip_leading_rows=1 \
--autodetect \
synthetic_credit_bureau_cli_v1.cb_may_with_T_Test_anomalies \
gs://synthetic_cb_cli-deltamax-v1/cb_may_with_T-Test_anomalies.csv
Load data type mismatch dataset
bq load \
--source_format=CSV \
--skip_leading_rows=1 \
--autodetect \
synthetic_credit_bureau_cli_v1.data_type_mismatches \
gs://synthetic_cb_cli-deltamax-v1/data_type_mismatches.csv
Update the project ID inside the SQL files before running them.
Create merged anomaly table
bq query --use_legacy_sql=false < A_or_B_with_anomalies.sql
Run monthly anomaly queries
bq query --use_legacy_sql=false < anomalies_april.sql
bq query --use_legacy_sql=false < anomalies_may.sql
Step-24: Load the BigQuery tables into Looker Studio and create reports and visualizations as needed
Step-25: Reach out to Katalyst Street Professional Services for custom visualizations and reports
