In this quick post, we’ll touch on an issue that a number of data sets face: duplication. A duplicate is essentially the same entity existing more than once within the same data set. Duplicates appear for a number of reasons, but the main one is multiple data sets being combined into a single repository of information: purchased lists, web-based data, and data from inbound leads. Combine all of that, and some leads will carry the same information because they were imported from more than one source. Thus, we have duplication within the database. A simple enough concept to understand…
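To make that concrete, here’s a tiny Python sketch (the field names are hypothetical) of how naively combining a purchased list with inbound leads lands the same lead in the database twice:

```python
# Two sources, one real-world person: a classic recipe for duplicates.
purchased_list = [{"name": "John Smith", "email": "jsmith@example.com"}]
inbound_leads  = [{"name": "J. Smith",   "email": "jsmith@example.com"}]

combined = purchased_list + inbound_leads
emails = [lead["email"] for lead in combined]
print(len(emails) - len(set(emails)))  # -> 1 surplus record, despite the differing names
```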
So, why do we care about duplicates? Appearance, perception, and minimising wastage: those are the three key considerations.
What makes us want to ensure there are no duplicates within the database? Duplicated data means we’re doubling, or even tripling, our effort for no further monetary gain. Not only does this cost us time, money, and effort, it also creates a perception of unprofessionalism. Perhaps this is the data geek inside me, but if a business that wants me as a customer doesn’t take its data seriously, then how seriously is it going to service me as a customer?
Organisations that care about their data can take a number of proactive steps to remove duplicates from their systems. Many front-end applications allow some form of rudimentary de-duplication when importing data, for example excluding records that share a phone number or an address. Depending upon your application, this can often be sufficient. For the majority, though, it’s just far too basic. We want something that will handle not just the easy-to-find duplicates, but also the duplicates we could never have hoped to find through normal channels. We personally use a selection of third-party and in-house algorithms to combat these issues.
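To give a feel for the difference, here’s a minimal Python sketch (the field names and threshold are hypothetical) pairing the basic approach, an exact match on a normalised phone number, with a fuzzier name comparison via the standard library’s difflib:

```python
import re
from difflib import SequenceMatcher

def normalise_phone(phone):
    """Keep digits only, so '(02) 9555-1234' matches '02 9555 1234'."""
    return re.sub(r"\D", "", phone or "")

def is_duplicate(a, b, name_threshold=0.85):
    """Exact match on normalised phone, or a similar-enough name in the same postcode."""
    phone_a, phone_b = normalise_phone(a["phone"]), normalise_phone(b["phone"])
    if phone_a and phone_a == phone_b:
        return True
    name_score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return name_score >= name_threshold and a["postcode"] == b["postcode"]

record_a = {"name": "Jon Smith",  "phone": "(02) 9555-1234", "postcode": "2000"}
record_b = {"name": "John Smith", "phone": "02 9555 1234",   "postcode": "2000"}
print(is_duplicate(record_a, record_b))  # -> True
```

Real matching engines layer phonetic keys, address standardisation and weighted scoring on top of this, but the principle is the same.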
An example that I’m sure will lend itself to the majority of users out there is this: most businesses have a database of Customers and Prospects. Normally these two data sets sit in distinct tables or systems, or are simply identified by a status field within the record. If, as a business owner or manager, you want to generate additional long-term revenue, you’ll look at sending out an acquisition campaign to increase your customer base. Such a campaign is usually time-sensitive and built around enticements that exist purely to acquire additional customers; it could be something quite basic, just enough to bring prospects on board. In other words, if an existing customer takes up the offer, there is no viable ROI. Each mail piece will cost in excess of $1, the enticement has an associated cost, and so does the labour involved in the response or the subsequent follow-up. You can see how the cost of sending this offer to existing customers quickly adds up. If your database were effectively cleaned and duplicate-free, that exposure would be eliminated entirely.
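In code, the suppression step itself is simple. Here’s a sketch, again with hypothetical fields, using the same phone normalisation as above:

```python
import re

def normalise_phone(phone):  # same helper as in the earlier sketch
    return re.sub(r"\D", "", phone or "")

def suppress_existing_customers(prospects, customers):
    """Drop any prospect whose normalised phone already appears in the customer file."""
    suppression = {normalise_phone(c["phone"]) for c in customers} - {""}
    return [p for p in prospects if normalise_phone(p["phone"]) not in suppression]

customers = [{"name": "John Smith", "phone": "02 9555 1234"}]
prospects = [
    {"name": "Jon Smith",   "phone": "(02) 9555-1234"},  # already a customer
    {"name": "Ann Citizen", "phone": "(03) 9555-0000"},
]
print(suppress_existing_customers(prospects, customers))  # -> only Ann Citizen remains
```

At a dollar-plus per piece before you count the enticement and the labour, every record the suppression removes goes straight back onto the bottom line.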
There is a great deal to think about when you come to remove duplicates from your database, but the actual process isn’t complex at all. Each business scenario is potentially different, which is why our approach varies whenever we’re tasked with this work.
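A quick first pass, and a handy health check before any removal work, is simply counting how many records share a matching key. A sketch, with the same hypothetical helper as above:

```python
import re
from collections import Counter

def normalise_phone(phone):
    return re.sub(r"\D", "", phone or "")

def duplication_audit(records, key_func):
    """Count keys that appear more than once, and the surplus records behind them."""
    counts = Counter(k for k in map(key_func, records) if k)
    duplicated = {k: n for k, n in counts.items() if n > 1}
    surplus = sum(n - 1 for n in duplicated.values())
    return duplicated, surplus

records = [
    {"phone": "(02) 9555-1234"},
    {"phone": "02 9555 1234"},
    {"phone": "0400 000 000"},
]
dupes, surplus = duplication_audit(records, lambda r: normalise_phone(r["phone"]))
print(f"{len(dupes)} duplicated keys, {surplus} surplus records")  # -> 1 and 1
```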
So just think about your database, and ask yourself, ‘When was the last time we did a proper audit on the level of duplication within the system?’