Now Covering Every State In The US

Andrew Bowell

October 7, 2024

Updates

Over the summer of 2024, Naurt rapidly increased the amount of US data fed into the final destination API. In May, Naurt's parking spot and building entrance data covered just shy of 60 million addresses. Fast forward to November and over 155 million addresses have been catalogued across all 50 states.

A plot of Naurt's parking spot and building entrance data across the US.

Nationwide coverage hasn’t been easy

The most difficult aspect of running a geocoder is acquiring data. For Naurt, parking spots and building entrances are the easy part. Addresses, however, are not. Good quality, complete address data is hard to come by, so we’ve been busy collecting datasets published at the county, state, and national level. This presented a set of challenges we hadn’t faced when rolling out our technology in the UK and Singapore; namely, that addresses can be written in wildly different ways even when they describe the same location.

The UK and Singapore essentially have single sources of high quality, standardised addresses. We found this just wasn’t the case in the US. Street names may include or exclude cardinal directions for no apparent reason, drop ordinal suffixes such as st, th or rd, and be abbreviated in general, for example Boulevard → Blvd or Mountain View → Mtn. View. This can lead to a single location having multiple different, yet correct, addresses. For this reason, we developed a pipeline in which addresses are sanitised, corrected, and standardised before being combined with our parking spot and building entrance data. The output from this pipeline is also fed into our accuracy metric, which is returned with every geocoding result.
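To give a flavour of what standardisation involves, here is a minimal Python sketch. It is not Naurt's pipeline: the function name `normalise_address` and the small lookup tables are hypothetical, and a real system would need far larger tables and many more rules. It assumes a canonical form that is lower-case, expands abbreviations and directions, and drops ordinal suffixes.

```python
import re

# Hypothetical lookup tables for illustration only.
# "st" is deliberately left out because it is ambiguous (street or saint).
ABBREVIATIONS = {
    "blvd": "boulevard",
    "rd": "road",
    "ave": "avenue",
    "dr": "drive",
    "mtn": "mountain",
}
DIRECTIONS = {"n": "north", "s": "south", "e": "east", "w": "west"}

# Matches a number followed by an ordinal suffix, e.g. "5th" or "3rd".
ORDINAL = re.compile(r"^(\d+)(st|nd|rd|th)$")


def normalise_address(raw: str) -> str:
    """Return one canonical spelling for an address string."""
    # Lower-case and treat full stops and commas as separators.
    tokens = re.sub(r"[.,]", " ", raw.lower()).split()
    out = []
    for token in tokens:
        ordinal = ORDINAL.match(token)
        if ordinal:
            out.append(ordinal.group(1))        # "5th" becomes "5"
        elif token in DIRECTIONS:
            out.append(DIRECTIONS[token])       # "n" becomes "north"
        else:
            out.append(ABBREVIATIONS.get(token, token))
    return " ".join(out)
```

With these rules, "123 N. 5th Blvd" and "123 North 5 Boulevard" normalise to the same string, so they can be treated as one address. A single-letter rule such as "e" → "east" would misfire on unit letters, which is one reason a production pipeline needs context-aware rules.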

The lack of address standardisation doesn’t just cause problems when ingesting data. It also makes it harder to find and rank suitable address matches when searching. As we mentioned in our previous blog, Naurt Update: Faster and Global, we’ve switched our search system from Postgres & PgVector to OpenSearch. The main reason was that search latency increased with the additional 100 million American addresses. However, the switch has also enabled us to be more intelligent with our full-text search, as we’re now able to use synonyms efficiently, such as rd → road. It has also helped us handle the sticky situations where an abbreviation could have multiple meanings, such as st → street or saint. Overall, the road to full US coverage has left us with a quicker, more accurate search, not to mention the benefit of being able to scale our system horizontally with ease.
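The ambiguity of st can be illustrated with a small Python sketch of query-side synonym expansion. This is an illustration of the general idea only, not OpenSearch configuration and not Naurt's implementation; the `SYNONYMS` table and `expand_query` function are hypothetical. An ambiguous token maps to several expansions, and every combination is kept so that the search engine can rank them.

```python
from itertools import product

# Hypothetical synonym table: an abbreviation may have several meanings.
SYNONYMS = {
    "rd": ["road"],
    "blvd": ["boulevard"],
    "st": ["street", "saint"],
}


def expand_query(query: str) -> list[str]:
    """Return every reading of the query after synonym expansion."""
    # Each token becomes a list of alternatives (itself if it has no synonym).
    options = [SYNONYMS.get(token, [token]) for token in query.lower().split()]
    # The Cartesian product enumerates all combinations of alternatives.
    return [" ".join(combo) for combo in product(*options)]
```

For example, `expand_query("12 st james st")` yields four candidates, one of which is "12 saint james street". The number of candidates grows multiplicatively with each ambiguous token, so this brute-force approach only suits short queries.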

Why go to all this trouble?

We realised early on that comprehensive coverage is the bedrock of any good geocoder. Often, we find a customer’s use of a geocoder is business critical: for a delivery company, an incorrect or unavailable geocoder results in lost profits. This leads to complex systems where multiple geocoders are used either as backups or as substitutes depending on the region a delivery is in. At Naurt, we believe the best way to ensure a system is robust is to keep it simple. A single geocoder with comprehensive coverage therefore becomes the obvious choice.

A side effect of bad coverage is bad search results. If you’re searching for an address in the US against a list of only 100 addresses, it’s almost guaranteed that the most relevant address will be a bad match. If, on the other hand, the list holds hundreds of millions of addresses, the most relevant result is likely to be the one you’re looking for. No matter how many sanity checks you put in place to ensure the correct addresses are returned, the best way to avoid this problem is to increase the density of address coverage.
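The effect of coverage on match quality can be shown with a toy Python example. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity score, which is an assumption made for illustration and not how Naurt ranks results; the function `best_match`, its `min_score` parameter, and the addresses are all hypothetical.

```python
from difflib import SequenceMatcher


def best_match(query, candidates, min_score=0.0):
    """Return (address, score) for the closest candidate, or None if rejected."""
    if not candidates:
        return None
    # Score every candidate against the query with a simple string similarity.
    scored = [
        (SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    score, address = max(scored)
    # A sanity check: reject the result if even the best score is too low.
    if score < min_score:
        return None
    return address, score


# A sparse index that does not contain the address being searched for.
sparse = ["12 oak street springfield", "98 pine road shelbyville"]
# A denser index that does contain it.
dense = sparse + ["14 elm avenue springfield"]
query = "14 elm avenue springfield"
```

Against `sparse`, the "most relevant" result is necessarily a wrong address, and a threshold such as `min_score=0.9` can only turn it into no result at all. Against `dense`, the same query returns the exact address. The threshold limits the damage, but only the extra coverage produces the right answer.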

What’s next?

With excellent availability in the US, UK, and Singapore, Naurt is currently working with its partners to expand into continental Europe, Australia, and Canada. Alongside expanding coverage, Naurt continuously prioritises accuracy, both of the underlying final destination data and of search. We’ll be looking to make partial address matching more accurate and to improve the rate at which we reject searches with no suitable match. Often a bad match can be worse than no match at all!
