Customer and address databases grow over the years within organizations. It's natural that the collected data becomes out of date over time. People move, get a new physical address or change their email provider.
Invalid data costs your organization real money and time. For example, many email providers charge monthly license fees based on the number of contacts. Why pay for contacts with invalid email addresses? It is better to detect those contacts and remove them from your databases.
Data cleansing describes the process of fixing datasets by removing or correcting incorrect, corrupted, duplicate or incomplete data. Working with incorrect data has a negative impact on your results and comes at a cost. It is crucial to establish a data cleansing process and execute it regularly to keep your datasets in good shape. In the following sections, we'll look into the different criteria of data quality.
How does the data conform to the defined business rules or constraints? This includes:
When using a database system, those constraints can be added directly to the table definition. The database system then rejects inserts or updates that violate one or more constraints.
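As a minimal sketch of constraints declared in the table definition, the following example uses Python's built-in `sqlite3` module. The `customers` table, its columns and the age range are hypothetical and chosen only for illustration; your own business rules will differ.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table.
# The column names and rules are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,               -- mandatory and unique
        age   INTEGER CHECK (age BETWEEN 0 AND 150)  -- range constraint
    )
    """
)


def try_insert(email, age):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customers (email, age) VALUES (?, ?)",
                (email, age),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("anna@example.com", 34),  # valid row
    try_insert("anna@example.com", 40),  # duplicate email
    try_insert(None, 25),                # missing mandatory email
    try_insert("bob@example.com", -5),   # age outside the allowed range
]
```

Only the first row ends up in the table; the database rejects the other three before they can pollute the data set.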
Some systems don't offer schema validation. Here it's important to include validity checks in your data cleansing process.
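When the storage layer cannot enforce rules, a small validation step in the cleansing job can do it instead. The sketch below is one possible approach; the field names, the age range and the deliberately loose email pattern are assumptions for illustration, not a complete email validation.

```python
import re

# Loose shape check only: "something@domain.tld". It does not prove
# that the address exists or can receive mail.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record):
    """Return a list of rule violations for one record (empty means valid)."""
    problems = []

    email = record.get("email")
    if not isinstance(email, str) or not EMAIL_SHAPE.match(email):
        problems.append("email: missing or malformed")

    age = record.get("age")
    if age is not None and not (isinstance(age, int) and 0 <= age <= 150):
        problems.append("age: not an integer between 0 and 150")

    return problems


records = [
    {"email": "anna@example.com", "age": 34},
    {"email": "not-an-email", "age": 34},
    {"age": 200},
]

# Map the index of each invalid record to its list of violations.
invalid = {}
for index, record in enumerate(records):
    problems = validate_record(record)
    if problems:
        invalid[index] = problems
```

Collecting the violations per record, instead of failing on the first one, makes it easier to report and fix all problems in a single cleansing run.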
How close is your data to the true value? Depending on the type of information, this could mean the precision when collecting measured data or information collected about a customer like postal address, phone number, email address, etc. In some cases, an external database can be used in the cleansing process that represents a "gold standard". This can be used to correct or enhance your data.
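The idea of correcting records against a trusted reference can be sketched as follows. The `REFERENCE_CITIES` lookup stands in for an external "gold standard" database and contains only sample entries; the field names `postal_code` and `city` are assumptions as well.

```python
# Stand-in for an external reference database (sample entries only).
REFERENCE_CITIES = {
    "10115": "Berlin",
    "20095": "Hamburg",
}


def correct_city(record, reference=REFERENCE_CITIES):
    """Compare a record's city with the reference value for its postal code.

    Returns a (record, status) tuple where status is one of
    "confirmed", "corrected" or "unverified".
    """
    expected = reference.get(record.get("postal_code"))
    if expected is None:
        # The reference has no entry, so the record cannot be checked.
        return dict(record), "unverified"
    if record.get("city") == expected:
        return dict(record), "confirmed"
    # Return a corrected copy and leave the original record untouched.
    return {**record, "city": expected}, "corrected"


fixed, status = correct_city({"postal_code": "10115", "city": "Berlin-Mitte"})
```

Returning a status next to the record lets you review corrections before writing them back, which is useful when the reference itself might be incomplete.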
Completeness means that all required information is available. Missing data is often hard to fix later. Depending on the context, it may be possible to go back and collect the data again. In other situations, it is impossible to fill the gap. In general, it is important to build completeness checks into the data entry process itself.
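A completeness check at data entry can be as simple as the sketch below. The list of required fields is a hypothetical example; which fields are mandatory depends on your own use case.

```python
# Hypothetical set of mandatory fields for a customer record.
REQUIRED_FIELDS = ("name", "email", "postal_code")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent, None, or blank strings."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


entry = {"name": "Anna", "email": "   ", "city": "Berlin"}
gaps = missing_fields(entry)  # fields to ask for before saving the record
```

Treating whitespace-only strings as missing catches form fields that were technically submitted but carry no information.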
Data should be consistent within a single data set and across multiple data sets. For example, the same customer may have different shipping addresses in two different data sets. Depending on the context, you have multiple options to clean the data:
Data across all data sets should use the same units of measure. If you collect weights, make sure to store all values as either pounds or kilograms by applying the corresponding conversion. This is also important for date and time information. Make sure to either store the time zone information or convert all timestamps to a defined time zone before saving.
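Both normalizations can be sketched in a few lines of Python. The choice of kilograms and UTC as target formats, the unit labels `"kg"` and `"lb"`, and the ISO 8601 input format are assumptions made for this example.

```python
from datetime import datetime, timezone

KG_PER_POUND = 0.45359237  # definition of the international avoirdupois pound


def to_kilograms(value, unit):
    """Convert a weight to kilograms; accepts the unit labels "kg" and "lb"."""
    if unit == "kg":
        return float(value)
    if unit == "lb":
        return value * KG_PER_POUND
    raise ValueError(f"unknown weight unit: {unit!r}")


def to_utc(timestamp):
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Without an offset the instant is ambiguous, so refuse to guess.
        raise ValueError("timestamp has no UTC offset")
    return parsed.astimezone(timezone.utc)


weight_kg = to_kilograms(10, "lb")
created_at = to_utc("2024-03-01T09:30:00+01:00")
```

Rejecting timestamps without an offset is a deliberate design choice here: silently assuming a time zone would hide exactly the kind of inconsistency the cleansing process is meant to surface.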
Tools like CampaignKit can help you with your data cleansing. Use the Developer API to embed enhanced email address verification into your data cleansing process. Alternatively, use the WebApp to connect CampaignKit directly to your CRM system or upload your data manually to detect and remove invalid email addresses from your data sets.