Messy CSV data cleaning pipeline with validation rules
Views: 10.3K | Copies: 3.5K | Likes: 2.0K | Comments: 0 | Copy rate: 34.0%
Prompt
You are a data engineer. Build a [language] data cleaning pipeline for a messy CSV file containing [data_description]. The data has these common issues:
- Inconsistent date formats
- Missing values in critical fields
- Duplicate rows
- Inconsistent casing and whitespace
- Invalid email/phone formats
- Outlier values
Create a pipeline that:
1. Loads and profiles the data (show distribution of issues)
2. Applies cleaning transformations with logging
3. Validates cleaned data against business rules
4. Exports clean data + a rejected records report
5. Generates a data quality summary
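
For orientation, here is a minimal sketch of the kind of pipeline this prompt asks for. It is an illustration under stated assumptions, not a reference answer: Python stands in for [language], and the column names (`name`, `email`, `signup_date`, `amount`), the accepted date layouts, the simplistic email pattern, and the 0 to 10,000 amount range are all hypothetical placeholders for [data_description] and its business rules. It uses only the standard library.

```python
import csv
import io
import logging
import re
from collections import Counter
from datetime import datetime

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cleaning")

# Assumed date layouts and a deliberately loose email pattern (both hypothetical).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

SAMPLE = """name,email,signup_date,amount
  alice SMITH ,Alice@Example.com,2024-01-05,120.50
Alice Smith,alice@example.com,05/01/2024,120.5
Bob Jones,bob[at]example.com,2024-02-10,80
Carol White,carol@example.com,Feb 30 2024,60
Dan Brown,,2024-03-01,40
Eve Black,eve@example.com,"Mar 03, 2024",999999
"""


def parse_date(raw):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Split rows into (clean, rejected) and count issues by type."""
    clean, rejected, issues, seen = [], [], Counter(), set()
    for row in rows:
        # Normalise whitespace and casing first so duplicates compare equal.
        name = " ".join((row.get("name") or "").split()).title()
        email = (row.get("email") or "").strip().lower()
        signup = parse_date(row.get("signup_date") or "")
        reasons = []
        if not email:
            reasons.append("missing_email")
        elif not EMAIL_RE.match(email):
            reasons.append("invalid_email")
        if signup is None:
            reasons.append("bad_date")
        try:
            amount = float((row.get("amount") or "").strip())
            if not 0 <= amount <= 10_000:  # assumed business rule
                reasons.append("outlier_amount")
        except ValueError:
            amount = None
            reasons.append("bad_amount")
        key = (name, email, signup, amount)
        if not reasons and key in seen:
            reasons.append("duplicate")
        if reasons:
            issues.update(reasons)
            rejected.append({**row, "reject_reasons": ";".join(reasons)})
            log.info("rejected %r: %s", row, reasons)
        else:
            seen.add(key)
            clean.append({"name": name, "email": email,
                          "signup_date": signup, "amount": amount})
    return clean, rejected, issues


def to_csv(records):
    """Serialise a list of dicts to CSV text (empty string if no records)."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def run(csv_text):
    """Load, clean, and summarise; returns (clean, rejected, summary)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    clean, rejected, issues = clean_rows(rows)
    summary = {"total": len(rows), "clean": len(clean),
               "rejected": len(rejected), "issues": dict(issues)}
    return clean, rejected, summary
```

Calling `run(SAMPLE)` returns the cleaned rows, the rejected rows with a `reject_reasons` column, and a summary dictionary; `to_csv` turns either list into exportable CSV text. Rejecting rows rather than imputing missing values is a design choice made here for brevity; a real pipeline might fill or flag them instead.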
Example output