Address Database 2024
Self-hosted street address database
A SQLite3 database file with over 150 million U.S. and Canada address records. Indexed for fast queries, even on fairly slow hardware.
Schema
The database contains a single table, addresses, with the following columns:
- zipcode - ZIP Code/Postal code
- number - House number
- street - Street address
- street2 - Apartment/unit/etc number
- city - City name
- state - Two-letter state abbreviation
- plus4 - ZIP+4 (empty, may be used in future releases)
- country - Two-letter country code
- latitude
- longitude
- source - Data source identifier (see below)
There is also a “lite” version that omits the latitude, longitude, and source columns, as well as several indexes. It is optimized for autocompleting addresses in forms, when the user starts with the house number and street name, and optionally with a ZIP Code (or list of ZIPs) as a filter.
Example of an ideal, performant query on the “lite” database: SELECT * FROM addresses WHERE number='100' AND street LIKE 'W MAIN%' LIMIT 10;
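The autocomplete pattern described above (house number, street prefix, optional ZIP filter) can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the site's own code: the function names, the TEXT column types, and the sample rows in demo_connection are assumptions made so the example runs without the real download.

```python
import sqlite3


def autocomplete(conn, number, street_prefix, zipcodes=None, limit=10):
    """Return up to `limit` rows matching a house number and street prefix."""
    sql = (
        "SELECT number, street, street2, city, state, zipcode "
        "FROM addresses WHERE number = ? AND street LIKE ?"
    )
    # Note: % and _ typed by the user are treated as LIKE wildcards here.
    params = [number, street_prefix.upper() + "%"]
    if zipcodes:
        # Optional filter on one ZIP Code or a list of ZIP Codes.
        placeholders = ",".join("?" for _ in zipcodes)
        sql += f" AND zipcode IN ({placeholders})"
        params.extend(zipcodes)
    sql += " LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build a tiny in-memory stand-in for the "lite" table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE addresses (zipcode TEXT, number TEXT, street TEXT, "
        "street2 TEXT, city TEXT, state TEXT, plus4 TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO addresses VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("59601", "100", "W MAIN ST", "", "HELENA", "MT", "", "US"),
            ("59715", "100", "W MAIN ST", "", "BOZEMAN", "MT", "", "US"),
            ("59601", "100", "E MAIN ST", "", "HELENA", "MT", "", "US"),
            ("59601", "200", "W MAIN ST", "", "HELENA", "MT", "", "US"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in autocomplete(demo_connection(), "100", "w main", ["59715"]):
        print(row)
```

Against the real file, sqlite3.connect would point at the extracted database instead of ":memory:". Bound parameters (the ? placeholders) keep user input out of the SQL text.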
Download
Full database file is approx. 35 GB uncompressed (11 GB compressed download). A small sample is available to test with if you’re not sure about grabbing all that data.
“Lite” database is approx. 21 GB uncompressed (5.5 GB compressed download).
Processing and hosting all that data costs money and time. For better data quality, we bought a subscription to the USPS ZIP+4 database, which is not cheap. Pay what you think is fair, even if that’s $0. It’s up to you.
SHA256 Checksums:
7069647eabe8ec939296fdcae75e8b6bf015f5ff639db82fc939db8a6495a06b AddressDatabase2024.zip
63013d7a7c6dcab289ae71429841c3067eeefb11b9013976686b77f3bf1ce537 AddressDatabase2024-lite.zip
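One way to check a finished download against the checksums above is sketched below, using Python's hashlib. The function names are illustrative, and the file is read in chunks so a multi-gigabyte archive does not have to fit in memory.

```python
import hashlib
import os

# Checksums copied from the list above.
EXPECTED_SHA256 = {
    "AddressDatabase2024.zip":
        "7069647eabe8ec939296fdcae75e8b6bf015f5ff639db82fc939db8a6495a06b",
    "AddressDatabase2024-lite.zip":
        "63013d7a7c6dcab289ae71429841c3067eeefb11b9013976686b77f3bf1ce537",
}


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's hash matches the checksum for its filename."""
    expected = EXPECTED_SHA256[os.path.basename(path)]
    return sha256_of(path) == expected


if __name__ == "__main__":
    print(verify_download("AddressDatabase2024-lite.zip"))
```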
Coverage Map
You can fill in missing areas in the next release by contributing to the OpenAddresses project!
Address Format
Street names were standardized per USPS Publication 28: Postal Addressing Standards.
Units (street2) were stripped of designators such as “APT”, because the source data was very inconsistent in using them.
It is recommended to display addresses to the user as [number] [street] [# street2]\n[city] [state] [zipcode], where \n is a line break.
Puerto Rico streets should be prefixed with CALLE, but this has been omitted from the database for efficiency and easier matching.
For example, display Puerto Rico addresses as [number] CALLE [street] [# street2]\n[city] PR [zipcode].
Some Canadian PO Boxes are in this dataset. The number field is the PO Box number, and the street is “PO BOX”. If a street equals “PO BOX”, it should be displayed as “[street] [number]\n[city] [state] [zipcode]”.
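The display rules in this section (standard format, the CALLE prefix for Puerto Rico, and PO Boxes) could be combined into one helper along these lines. It is a sketch under assumptions: the function name is hypothetical, rows are plain dicts keyed by column name, Puerto Rico is detected by a state value of "PR", and the unit is rendered as "# " followed by street2, which is one reading of the [# street2] template.

```python
def format_address(row):
    """Format one address row (a dict keyed by column name) for display."""
    number = row["number"]
    street = row["street"]
    street2 = row.get("street2") or ""

    if street == "PO BOX":
        # PO Boxes: the box number follows the literal "PO BOX".
        line1 = f"{street} {number}"
    else:
        parts = [number]
        if row["state"] == "PR":
            # The CALLE prefix is omitted from the database, so add it back.
            parts.append("CALLE")
        parts.append(street)
        if street2:
            # Unit designators such as "APT" were stripped; show "#" instead.
            parts.append(f"# {street2}")
        line1 = " ".join(parts)

    line2 = f'{row["city"]} {row["state"]} {row["zipcode"]}'
    return f"{line1}\n{line2}"


if __name__ == "__main__":
    # Made-up example row, not taken from the database.
    print(format_address({
        "number": "100", "street": "W MAIN ST", "street2": "4B",
        "city": "HELENA", "state": "MT", "zipcode": "59601",
    }))
```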
United States street addresses missing ZIP Codes were matched to ZIP Codes using USPS databases with a fallback to ZIP boundary polygon matching. Records with ZIP Codes in the source dataset were used as-is after confirming the ZIP Code is valid and the state matches. This means that there may be some incorrect ZIP Codes.
Canadian street addresses were published by Statistics Canada already in the preferred Canada Post format, and were used as-is without further processing. Canada Post provided verified addresses to Statistics Canada so they should all be accurate.
Sources
United States data:
- National Address Database version 17
- OpenAddresses project, which collects various government address datasets into a standard data format. We regularly donate to OpenAddresses to cover the costs of downloading their data, and you should too.
Addresses were matched to ZIP Codes using ZIP+4 data from the United States Postal Service, provided by zip-codes.com.
Canada data:
- Source: Statistics Canada, National Address Register, 2024. Reproduced and distributed on an “as is” basis with the permission of Statistics Canada.
source column
The source column contains either “NAD [org name]” for data from the National Address Database, where the org name indicates the government agency that provided the data, or “OA/[source]” for data obtained via OpenAddresses. For statewide datasets from OpenAddresses, the source will be similar to “OA/AK” (two-letter state abbreviation), while for county-level and city-level data it will be “OA/[county or city name]”.
Canadian addresses have a source field of “StatsCan NAR”.
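For anyone grouping or filtering records by origin, the three source formats described here can be split apart with simple prefix checks. The function name and the returned labels below are illustrative choices, not part of the database, and the sample values in the usage lines are made up.

```python
def classify_source(source):
    """Split a source value into (provider, detail) using its prefix."""
    if source == "StatsCan NAR":
        return ("Statistics Canada", "National Address Register")
    if source.startswith("OA/"):
        # Detail is a state abbreviation or a county/city name.
        return ("OpenAddresses", source[len("OA/"):])
    if source.startswith("NAD "):
        # Detail is the government agency that provided the data.
        return ("National Address Database", source[len("NAD "):])
    # Anything else is passed through unchanged rather than guessed at.
    return ("unknown", source)


if __name__ == "__main__":
    for value in ("OA/AK", "NAD Example Agency", "StatsCan NAR"):
        print(value, "->", classify_source(value))
```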
Copyright
Under United States copyright law, facts and collections thereof, such as a phone book listing, are not creative works, cannot therefore be copyrighted, and as a result are in the public domain.
Due to the effort involved in sourcing this data, writing software to sanitize it, manually spot-checking for accuracy, contributing changes upstream to OpenAddresses, and finally compiling it into a clean, indexed database, we kindly ask you to send other people here instead of giving them the file yourself.
Some of the addresses are from proprietary sources, because some counties make deals with private companies to manage their GIS data. These companies typically sell access to the dataset including property owner names for real estate marketing and similar purposes. We did not pay for this data, and OpenAddresses typically scraped it from their free web map viewer instead. There isn’t really anything those companies can legally do about this because facts can’t be copyrighted, and we think this data shouldn’t be paywalled because of its value to humanity so we’d happily pirate it regardless.