“Your car has failed its MOT.”
“We’ve run out of milk.”
“I’d give that ten minutes.”
There are certain things we just don’t want to hear. However, receiving reliable and accurate information, even if it’s bad news, is essential in every aspect of our lives.
When your address data is being verified, you don’t want to be told that a large percentage doesn’t match. But sometimes it is what you need to hear.
Different address validation systems use different methods to verify data. How these systems manage the process can look the same, but may produce very different results.
One of the key issues is the logic used within the system. Some systems use fuzzy logic, while others employ rules-based logic.
To understand which system you need, it’s important to understand how both approaches work.
The Fuzzy Way
Fuzzy logic attempts to match as many addresses as possible using phonetics and algorithms, along with percentage scores to decide on the results. However, even the most advanced algorithms can produce false results, whether in search queries, address matching or predicting exam results, so reliance on this approach can often create difficulties.
Proponents of fuzzy logic occasionally position the method as artificial intelligence, but this can be misleading and create false confidence in the system. In reality, fuzzy logic generally looks for patterns in data and uses those patterns to correct mistakes, fill in missing fields and make a ‘best guess’ at the result.
Fuzzy logic is about making as many matches as possible, even if that means changing the data. If something is 80% correct, then it’s also 20% incorrect.
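To make the point concrete, here is a minimal sketch of score-based matching. It is not any vendor’s actual algorithm: it assumes a plain string-similarity ratio from Python’s standard difflib module, and the function name fuzzy_match, the 0.8 threshold and the street names are all made up for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_match(query, candidates, threshold=0.8):
    """Return (best_candidate, score), or (None, score) if below the threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        # ratio() is a similarity score between 0.0 and 1.0
        score = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# "Mill Lane" and "Hill Lane" are different streets, but differ by one letter.
street, score = fuzzy_match("Mill Lane", ["Hill Lane", "Station Road"])
print(street, round(score, 2))
```

With these inputs the score is roughly 0.89, so the sketch accepts “Hill Lane” as a match for “Mill Lane”. The score says the strings look alike; it cannot say they are the same street.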
Following The Rules
Rules-based logic is about determining what is allowed or not allowed to match, based on specific rules.
For instance, the English language allows both ‘ie’ and ‘ei’, along with the age-old ‘i before e, except after c’ rule, even if that is not always true! In a rules-based system, if a word does not match, the programme tests a different rule to see whether that produces a match, rather than deciding two words are the same just because they could sound similar.
Other rules cover silent letters, such as allowing an ‘e’ at the end of a word to be added or removed, and double letters, which can be reduced to a single letter.
By defining the rules and working through them in a logical manner, it is easy to explain why something has or has not matched.
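The rules described above can be sketched as an ordered list that is tried one rule at a time. This is an illustrative sketch only, not Hopewiser’s implementation: the rule names, their order and the match_word function are assumptions made for the example.

```python
import re

# Each rule normalises a word; two words match under a rule if they
# normalise to the same string. Rule names and order are illustrative.
RULES = [
    ("exact", lambda w: w),
    ("ie/ei interchange", lambda w: w.replace("ei", "ie")),
    ("silent trailing e", lambda w: w[:-1] if w.endswith("e") else w),
    ("double letter reduced to single", lambda w: re.sub(r"(.)\1", r"\1", w)),
]


def match_word(entered, reference):
    """Try each rule in turn and report which one, if any, allowed the match."""
    a, b = entered.lower(), reference.lower()
    for name, normalise in RULES:
        if normalise(a) == normalise(b):
            return True, f"matched by rule: {name}"
    return False, "no rule allows these words to match"


print(match_word("Feild", "Field"))
print(match_word("Greene", "Green"))
print(match_word("Mill", "Hill"))
```

Every result carries its reason. “Feild” matches “Field” because of the ie/ei rule, while “Mill” and “Hill” are rejected because no rule permits changing the first letter, however similar the words look.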
The best method to use alongside this, for address matching purposes, is to work ‘up’ through an address: start with the town (potentially within a county), then the thoroughfare within the town, followed by the individual address details (premise number, premise name, sub-premise, organisation). Rules-based matching works so well for addresses because addresses, per country, have a known hierarchy.
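As a rough sketch of working through that hierarchy, the example below checks the town first, then the thoroughfare within that town, then the premise on that thoroughfare. The REFERENCE data, the match_address function and the exact-comparison logic are hypothetical; a real system would match against a full reference dataset and apply its matching rules at each level.

```python
# Hypothetical reference data: town -> thoroughfare -> premise numbers.
REFERENCE = {
    "bath": {"mill lane": {"1", "2", "3"}},
    "aberdeen": {"union street": {"10", "12"}},
}


def match_address(town, thoroughfare, premise):
    """Match level by level and report the level at which matching stopped."""
    streets = REFERENCE.get(town.strip().lower())
    if streets is None:
        return False, "town not found"
    premises = streets.get(thoroughfare.strip().lower())
    if premises is None:
        return False, "thoroughfare not found in town"
    if premise.strip().lower() not in premises:
        return False, "premise not found on thoroughfare"
    return True, "matched town, thoroughfare and premise"


print(match_address("Bath", "Mill Lane", "2"))
print(match_address("Bath", "Union Street", "10"))
```

Because each level narrows the search for the next, “Union Street” is only ever looked for within Bath, so it cannot accidentally match the street of the same name in Aberdeen.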
Alongside this, by ‘squeezing’ the address first, it is possible to locate the likely components, such as postcode, town, thoroughfare and premise. This helps hone the matching rules for specific items, such as industrial estates, where abbreviations vary widely.
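The article does not spell out what ‘squeezing’ involves, so the sketch below is one possible reading, not the actual process: it strips punctuation and extra spaces, then picks out a postcode and a leading premise number and expands a few abbreviations. The squeeze function, the simplified postcode pattern and the tiny abbreviation table are all assumptions for illustration.

```python
import re

# Simplified UK postcode pattern; it does not cover every valid format.
POSTCODE = re.compile(r"\b([A-Z]{1,2}[0-9][0-9A-Z]?)\s*([0-9][A-Z]{2})\b")
# Tiny illustrative abbreviation table; real lists are far longer.
ABBREVIATIONS = {"ind": "industrial", "est": "estate", "rd": "road"}


def squeeze(raw):
    """Normalise a raw address string and pull out its likely components."""
    text = re.sub(r"[^\w\s]", " ", raw.upper())  # punctuation becomes spaces
    text = re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace
    components = {"postcode": None, "premise": None, "rest": None}
    found = POSTCODE.search(text)
    if found:
        components["postcode"] = f"{found.group(1)} {found.group(2)}"
        text = (text[:found.start()] + text[found.end():]).strip()
    found = re.match(r"(\d+[A-Z]?)\s+", text)    # leading premise number, e.g. 12 or 12A
    if found:
        components["premise"] = found.group(1)
        text = text[found.end():]
    words = [ABBREVIATIONS.get(w.lower(), w.lower()) for w in text.split()]
    components["rest"] = " ".join(words)
    return components


print(squeeze("12, Riverside Ind. Est., Bath, ba1 1aa"))
```

Once the postcode and premise are set aside, the remaining words can be passed to rules aimed at specific items, such as the many ways of abbreviating ‘Industrial Estate’.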
Postcode and premise could be used for an ‘easy’ match, but a simple typo could have a radical effect. For instance, AB is the area code for Aberdeen, whilst BA is the area code for Bath. This is why Hopewiser would not recommend this form of matching where a computer is making the sole decision, such as on a web form. However, if the address is being validated by a call centre during a call, looking up just the postcode and premise number is a great way to reduce keystrokes when validating data interactively.
Alongside the generic rules, it is possible to add more specific rules based on analysis of items that fail to match. Abbreviations and colloquialisms change over time, so this allows improvements to be made in a structured manner.
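The AB and BA example can be sketched as a simple cross-check between the postcode area and the rest of the address. This is a hedged illustration, not a description of any product: the check_postcode_area function is hypothetical, the lookup table holds only the two areas mentioned above, and treating each area as a single town is a simplification.

```python
import re

# Area codes taken from the example above; a real system would use a full reference file.
AREA_TOWNS = {"AB": "Aberdeen", "BA": "Bath"}


def check_postcode_area(postcode, town):
    """Check that the postcode's area letters agree with the stated town."""
    found = re.match(r"[A-Za-z]{1,2}", postcode.strip())
    if not found:
        return False, "postcode has no area code"
    area = found.group(0).upper()
    expected = AREA_TOWNS.get(area)
    if expected is None:
        return False, f"unknown area code {area}"
    if expected.lower() != town.strip().lower():
        return False, f"area code {area} belongs to {expected}, not {town.strip()}"
    return True, f"area code {area} agrees with {expected}"


print(check_postcode_area("BA1 1AA", "Bath"))
print(check_postcode_area("AB1 1AA", "Bath"))  # two letters transposed
```

A lookup on postcode and premise alone would never see the town, so the transposed postcode would go unnoticed. In a call centre the operator can read the result back to the caller; on a web form there is nobody to spot the mistake.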
So, do you want the good news or the bad news?
Both methods (fuzzy and rules) are based on matching to a primary dataset, so the better the quality of the data behind the processes, the higher the quality of the matches. However, rules-based matching produces fewer incorrect matches because it is a defined process, rather than a best guess.
This means rules-based logic can sometimes produce fewer address matches, but you can be confident of the results produced.
Our advice is that you should not use fuzzy logic systems for mission-critical data, although they may be acceptable for some uses. However, if you want to have complete control of your address matching and be confident in the accuracy of your data, you should only use rules-based logic.