Building an AI NAICS Code Lookup Tool: Seven Sources, One Answer

I built a business classification tool in an afternoon using seven free public data sources and Claude. Here's the exact architecture, every data source, and what I'd do differently.


A consulting client had a data problem. Businesses in their system had incorrect NAICS codes — industry classification codes that drive real operational decisions. The wrong code on a building means the wrong priority when power goes out. They wanted a tool that could help their team look up and validate the right code, fast, without hours of manual research.

I built it in an afternoon. Here's exactly how it works.

The problem with NAICS codes

NAICS (North American Industry Classification System) codes describe what type of business an organization is. Government statistical agencies publish and revise the code list itself, but no single authority assigns codes to individual businesses. Businesses self-report to whichever government registry they interact with. SEC filings, federal contracts, healthcare licensing, and IRS registration each capture a NAICS code or an equivalent classification, but none of these systems talk to each other.

So the data exists. It's just scattered across half a dozen public registries, none of which offers a simple "look up by company name" API that covers everyone.

The solution: hit all of them at once

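SEC EDGAR — Domestic public companies file 10-K annual reports with the SEC, and EDGAR tags each registrant with a SIC code and industry description. SIC is the older system that NAICS replaced, so these codes need a SIC-to-NAICS crosswalk. Free API, no key required, though SEC asks automated clients to send a descriptive User-Agent header.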
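To make the EDGAR step concrete, here is a minimal TypeScript sketch. It assumes the company's CIK is already known (resolving a name to a CIK, for example through SEC's company_tickers.json file, is left out), and the User-Agent string is a placeholder. The submissions endpoint returns the registrant's `sic` and `sicDescription`; those are SIC values, so a SIC-to-NAICS crosswalk still has to run before synthesis.

```typescript
// Minimal sketch: build an EDGAR submissions request and read the SIC fields.
// Assumes the CIK is already known; name-to-CIK resolution is omitted.

// Subset of the submissions JSON that this sketch reads.
interface EdgarSubmission {
  name: string;
  sic: string;
  sicDescription: string;
}

// EDGAR expects the CIK left-padded with zeros to 10 digits.
function submissionsUrl(cik: number | string): string {
  const padded = String(cik).padStart(10, "0");
  return `https://data.sec.gov/submissions/CIK${padded}.json`;
}

// SEC asks automated clients to identify themselves with contact info.
// PLACEHOLDER value, not a real contact.
const EDGAR_HEADERS = { "User-Agent": "ExampleCo naics-lookup admin@example.com" };

// Reduce the response to what the synthesis step needs.
// Note: this is a SIC code; it still needs mapping to NAICS downstream.
function extractSic(sub: EdgarSubmission) {
  return {
    source: "SEC EDGAR",
    name: sub.name,
    sic: sub.sic,
    description: sub.sicDescription,
  };
}
```

Inside a function this would be `fetch(submissionsUrl(cik), { headers: EDGAR_HEADERS })` followed by `extractSic(await res.json())`.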

SAM.gov — Companies that bid on federal contracts or receive federal grants register in SAM.gov. The registration includes a self-reported primary NAICS code. Searchable by legal business name or tax ID. Free, though the API requires a key issued through a SAM.gov account.
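A sketch of the SAM.gov lookup, assuming the v3 Entity Management API. The endpoint path, the `legalBusinessName` filter, and the response fields (`entityData`, `entityRegistration.legalBusinessName`, `assertions.goodsAndServices.primaryNaics`) follow my reading of SAM.gov's public API documentation and should be checked against the current docs before relying on them. The `api_key` value comes from a SAM.gov account.

```typescript
// Minimal sketch of a SAM.gov entity lookup by legal business name.
// ASSUMPTION: endpoint and field names reflect the v3 Entity Management API
// as I understand it; verify against current SAM.gov documentation.

interface SamEntity {
  entityRegistration: { legalBusinessName: string; ueiSAM?: string };
  assertions?: { goodsAndServices?: { primaryNaics?: string } };
}

interface SamResponse {
  totalRecords: number;
  entityData: SamEntity[];
}

// URLSearchParams handles encoding (spaces become "+").
function samEntityUrl(legalName: string, apiKey: string): string {
  const url = new URL("https://api.sam.gov/entity-information/v3/entities");
  url.searchParams.set("api_key", apiKey);
  url.searchParams.set("legalBusinessName", legalName);
  return url.toString();
}

// Collect (name, primary NAICS) pairs, skipping entities with no NAICS on file.
function primaryNaics(resp: SamResponse): { name: string; naics: string }[] {
  return resp.entityData.flatMap((e) => {
    const naics = e.assertions?.goodsAndServices?.primaryNaics;
    return naics ? [{ name: e.entityRegistration.legalBusinessName, naics }] : [];
  });
}
```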

NPPES — The National Plan and Provider Enumeration System is CMS's registry of healthcare providers that hold a National Provider Identifier (NPI). Hospitals, urgent cares, dialysis centers, nursing homes, pharmacies: all in there, with provider taxonomy codes that translate to NAICS through a crosswalk. Free, no key required. This is the source that resolves mix-ups between healthcare facilities and other business types, such as an urgent care clinic coded as a restaurant.
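The NPPES NPI Registry API (version 2.1) can be queried by organization name and state, and each result lists its taxonomy codes with a `primary` flag. This sketch builds that query and maps the primary taxonomy to NAICS. The five-row crosswalk is illustrative only (NAICS 2022 codes picked by hand and worth verifying); a production tool needs a complete, reviewed mapping.

```typescript
// Minimal sketch: query the NPPES NPI Registry for an organization and map
// its primary taxonomy code to NAICS using a small ILLUSTRATIVE table.

interface NppesTaxonomy { code: string; desc: string; primary: boolean }
interface NppesResult { number: string; taxonomies: NppesTaxonomy[] }
interface NppesResponse { result_count: number; results?: NppesResult[] }

function nppesUrl(orgName: string, state: string): string {
  const url = new URL("https://npiregistry.cms.hhs.gov/api/");
  url.searchParams.set("version", "2.1");
  url.searchParams.set("enumeration_type", "NPI-2"); // organizations, not individuals
  url.searchParams.set("organization_name", orgName);
  url.searchParams.set("state", state);
  return url.toString();
}

// ILLUSTRATIVE crosswalk, not a complete or authoritative mapping.
const TAXONOMY_TO_NAICS: Record<string, { naics: string; title: string }> = {
  "282N00000X": { naics: "622110", title: "General Medical and Surgical Hospitals" },
  "261QU0200X": { naics: "621493", title: "Freestanding Ambulatory Surgical and Emergency Centers" },
  "261QE0700X": { naics: "621492", title: "Kidney Dialysis Centers" },
  "314000000X": { naics: "623110", title: "Nursing Care Facilities (Skilled Nursing Facilities)" },
  "3336C0003X": { naics: "456110", title: "Pharmacies and Drug Retailers" },
};

// Use each result's primary taxonomy (falling back to the first listed);
// unmapped codes come back with naics = null so the caller can flag them.
function mapToNaics(resp: NppesResponse) {
  return (resp.results ?? []).map((r) => {
    const primary = r.taxonomies.find((t) => t.primary) ?? r.taxonomies[0];
    const hit = primary ? TAXONOMY_TO_NAICS[primary.code] : undefined;
    return { npi: r.number, taxonomy: primary?.desc ?? null, naics: hit?.naics ?? null };
  });
}
```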

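USASpending.gov — Federal contract and grant awards are recorded here. Each contract award carries a NAICS code, which the contracting officer assigns to describe the work being bought, so a company with a federal contract history has NAICS codes attached to its awards. Grants are classified differently and generally don't carry one. Free, no key required.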
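Because a contract's NAICS code describes that particular procurement, a company with many awards can show several codes. One reasonable way to turn an award history into a single signal is to report the most frequent code and its share. The sketch below assumes the award records have already been fetched from USASpending and normalized into a hypothetical `ContractRecord` shape; the API request itself is not shown.

```typescript
// Minimal sketch: pick the most frequent NAICS code across a recipient's awards.
// ASSUMPTION: records were already fetched and normalized into this
// hypothetical shape; the USASpending request is not shown.

interface ContractRecord {
  awardId: string;
  naicsCode: string | null; // grants and other assistance awards typically have none
}

function dominantNaics(records: ContractRecord[]): { code: string; share: number } | null {
  const counts = new Map<string, number>();
  let withCode = 0;
  for (const r of records) {
    if (!r.naicsCode) continue; // skip awards with no NAICS code
    counts.set(r.naicsCode, (counts.get(r.naicsCode) ?? 0) + 1);
    withCode++;
  }
  if (withCode === 0) return null;

  // Ties go to the code seen first, since Map preserves insertion order.
  let best = "";
  let bestCount = 0;
  counts.forEach((n, code) => {
    if (n > bestCount) {
      best = code;
      bestCount = n;
    }
  });
  // share = fraction of coded awards that use the winning code.
  return { code: best, share: bestCount / withCode };
}
```

Reporting the share alongside the code lets the synthesis step treat "12 of 12 contracts" differently from "3 of 10".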

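OpenStreetMap Nominatim — OSM's geocoding API identifies facility types from the map database. Fire stations, police departments, airports, TV/radio stations, schools: it returns the OSM tag (for example, amenity=fire_station), which maps to NAICS with a small lookup table. Free, no key required, but the usage policy caps requests at one per second and asks for an identifying User-Agent.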
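A sketch of the Nominatim step. With `format=jsonv2`, each result carries a `category` (the OSM key, such as `amenity`) and a `type` (the value, such as `fire_station`), which together form the tag to look up. The tag-to-NAICS table is an illustrative assumption, not a complete crosswalk. Nominatim's usage policy allows at most one request per second and expects an identifying User-Agent, so a real function should respect both.

```typescript
// Minimal sketch: build a Nominatim search URL and map the returned OSM tag to NAICS.
// The tag-to-NAICS table is ILLUSTRATIVE, not a complete crosswalk.

interface NominatimPlace {
  display_name: string;
  category: string; // jsonv2 output; the older "json" format calls this "class"
  type: string;
}

function nominatimUrl(query: string): string {
  const url = new URL("https://nominatim.openstreetmap.org/search");
  url.searchParams.set("q", query);
  url.searchParams.set("format", "jsonv2");
  url.searchParams.set("limit", "5");
  return url.toString();
}

const OSM_TAG_TO_NAICS: Record<string, string> = {
  "amenity=fire_station": "922160", // Fire Protection
  "amenity=police": "922120",       // Police Protection
  "aeroway=aerodrome": "488119",    // Other Airport Operations
  "amenity=school": "611110",       // Elementary and Secondary Schools
  "amenity=hospital": "622110",     // General Medical and Surgical Hospitals
};

// Join key and value into "key=value" and look it up; null means "no mapping".
function tagToNaics(place: NominatimPlace): string | null {
  return OSM_TAG_TO_NAICS[`${place.category}=${place.type}`] ?? null;
}
```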

CMS Care Compare — CMS maintains separate datasets for nursing homes and dialysis centers with state-level filtering. Free, no key required.

Web scrape fallback — If all six structured sources return nothing, the tool tries to locate the company's website, fetches the page, strips it to readable text, and passes that to Claude for classification.
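The fallback's "strip it to readable text" step can be as simple as this sketch: drop script and style blocks, drop tags, decode a few common entities, collapse whitespace, and cap the length so the prompt stays small. It is regex-based and deliberately rough, on the assumption that classification needs only the gist of the page; an HTML parser would cope better with malformed markup. The 8,000-character cap is an arbitrary placeholder.

```typescript
// Minimal sketch of HTML-to-text before the page is sent to Claude.
// Regex-based on purpose; an HTML parser is more robust on messy pages.

function htmlToText(html: string, maxChars = 8000): string {
  const text = html
    .replace(/<(script|style|noscript)[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop non-visible blocks
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")   // decode &amp; last to avoid double-decoding
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
  // Cap the length to keep the prompt (and API cost) small.
  return text.slice(0, maxChars);
}
```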

The AI layer

All seven results go to Claude together in a single prompt. The prompt instructs it to cross-reference sources, note agreement and conflicts, and assign confidence based on how many sources confirm the same business type. Two or more agreeing sources → High confidence. One structured source → High. Web scrape only → Medium. Name inference only → Low.
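The confidence rules above are simple enough to express as a deterministic function, which could be used to sanity-check the confidence the model reports. This is a hypothetical sketch, not the tool's code (in the tool, Claude applies the rules from the prompt). It assumes each source's finding has already been normalized to a NAICS code, and it compares codes at a configurable number of leading digits, since two sources may agree on the industry group while differing at six digits.

```typescript
// Hypothetical deterministic version of the prompt's confidence rules.

type Confidence = "High" | "Medium" | "Low";
type SourceKind = "structured" | "web_scrape" | "name_inference";

interface SourceFinding {
  source: string;       // e.g. "NPPES", "SAM.gov"
  kind: SourceKind;
  naics: string | null; // ASSUMED already normalized to NAICS
}

function scoreConfidence(findings: SourceFinding[], candidate: string, digits = 6): Confidence {
  // Compare codes on their first `digits` digits (6 = exact industry).
  const key = (code: string) => code.slice(0, digits);
  const agreeing = findings.filter((f) => f.naics !== null && key(f.naics) === key(candidate));

  // Two or more agreeing sources of any kind -> High.
  if (agreeing.length >= 2) return "High";
  // A single structured registry match -> High.
  if (agreeing.some((f) => f.kind === "structured")) return "High";
  // Only the scraped website supports it -> Medium.
  if (agreeing.some((f) => f.kind === "web_scrape")) return "Medium";
  // Name inference only, or no support at all -> Low.
  return "Low";
}
```

If the model's stated confidence disagrees with this function's result, that is a cheap signal to flag the lookup for a human.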

Each result cites its sources, so the customer service rep (CSR) using the tool can see exactly where the data came from. If three sources all say hospital, the explanation says so, and the confidence is about as high as it gets without calling the business directly.

The stack

  • Cloudflare Pages — static hosting for the front end
  • Cloudflare Pages Functions — seven serverless functions written as ES modules, one per data source
  • Claude Sonnet via Anthropic API — synthesis layer
  • Zero databases, zero infrastructure to manage
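Here is a sketch of how the synthesis step could look as one more Pages Function that calls the Anthropic Messages API directly with `fetch`. Assumptions: the file path `functions/api/synthesize.ts`, the `ANTHROPIC_API_KEY` environment binding, the request body shape, and the prompt wording are all hypothetical, and the model ID is a placeholder for whichever Sonnet model you use. The `x-api-key` and `anthropic-version` headers and the `model`, `max_tokens`, and `messages` body fields are the Messages API's documented ones. The context type is declared inline to keep the example self-contained; a real project would take it from `@cloudflare/workers-types`.

```typescript
// functions/api/synthesize.ts (hypothetical path; Pages maps it to /api/synthesize)

// Minimal inline types so the sketch is self-contained.
interface Env { ANTHROPIC_API_KEY: string }
interface Ctx { request: Request; env: Env }

// PLACEHOLDER model ID; substitute the Sonnet model you actually use.
const MODEL = "claude-sonnet-4-20250514";

// Hypothetical prompt restating the cross-referencing and confidence rules.
const INSTRUCTIONS = [
  "Classify this business by NAICS code using the source results below.",
  "Cross-reference the sources, note where they agree and conflict, and cite the sources you relied on.",
  "Confidence: High if two or more sources agree or one structured registry matches;",
  "Medium if only the website text supports it; Low if inferred from the name alone.",
].join(" ");

export function buildRequest(business: string, sources: Record<string, unknown>, env: Env): Request {
  return new Request("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": env.ANTHROPIC_API_KEY,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: MODEL,
      max_tokens: 1024,
      // All source results travel together in a single user message.
      messages: [{
        role: "user",
        content: `${INSTRUCTIONS}\n\nBusiness: ${business}\n\nSources:\n${JSON.stringify(sources, null, 2)}`,
      }],
    }),
  });
}

// Pages Functions dispatch POST requests to an exported onRequestPost handler.
export async function onRequestPost({ request, env }: Ctx): Promise<Response> {
  const { business, sources } = (await request.json()) as {
    business: string;
    sources: Record<string, unknown>;
  };
  const upstream = await fetch(buildRequest(business, sources, env));
  if (!upstream.ok) {
    return new Response(`Anthropic API error: ${upstream.status}`, { status: 502 });
  }
  // The reply's content is a list of blocks; keep only the text blocks.
  const data = (await upstream.json()) as { content: { type: string; text?: string }[] };
  const answer = data.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  return new Response(JSON.stringify({ answer }), {
    headers: { "content-type": "application/json" },
  });
}
```

Keeping the API key in an environment binding means it never reaches the static front end.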

Total monthly cost at this usage volume: a few dollars in Anthropic API calls. Everything else is free tier.

The pattern

The interesting thing here isn't the NAICS lookup. It's the underlying pattern: identify the public registries that might already have the answer, query all of them in parallel, then use AI to synthesize the results and score confidence based on source agreement. That pattern is reusable across dozens of classification problems: contractor licensing, provider credentialing, business risk, property classification.
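The "query all of them in parallel" half of the pattern is mostly about failure handling: one slow or broken registry should not sink the whole lookup. A minimal sketch using `Promise.allSettled` with a per-source timeout; the `SourceFn` signature, the source names, and the 8-second default are hypothetical, and each function would wrap one registry call.

```typescript
// Minimal sketch of a parallel fan-out with per-source timeouts.

type SourceFn = (query: string, signal: AbortSignal) => Promise<unknown>;

interface SourceOutcome {
  source: string;
  ok: boolean;
  data?: unknown;
  error?: string;
}

// Reject after `ms` and abort the signal, even if the source ignores the signal.
function withTimeout<T>(fn: (signal: AbortSignal) => Promise<T>, ms: number): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`timed out after ${ms} ms`));
    }, ms);
  });
  return Promise.race([fn(controller.signal), timeout]).finally(() => clearTimeout(timer));
}

async function queryAll(
  sources: Record<string, SourceFn>,
  query: string,
  timeoutMs = 8000,
): Promise<SourceOutcome[]> {
  const names = Object.keys(sources);
  // Start every source at once; allSettled waits for all without failing fast.
  const settled = await Promise.allSettled(
    names.map((name) => withTimeout((signal) => sources[name](query, signal), timeoutMs)),
  );
  // Keep failures too, so the synthesis step can see which sources had nothing.
  return settled.map((r, i): SourceOutcome =>
    r.status === "fulfilled"
      ? { source: names[i], ok: true, data: r.value }
      : { source: names[i], ok: false, error: String(r.reason) },
  );
}
```

Total latency is bounded by the slowest source or the timeout, whichever comes first, rather than the sum of all seven calls.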

The public data is almost always there. The synthesis layer is cheap. The gap is usually just building the connector.


Jesse Myers is a Marine veteran, operator, and builder based in Edmond, Oklahoma. He runs 2057 Holdings and its six portfolio companies.

The business case write-up is on the 2057 blog: We Built an AI-Powered NAICS Code Lookup Tool in an Afternoon

Safire Business Services covers the enterprise data strategy angle: The NAICS Problem Is a Symptom — Your Business Data Has the Same Disease

Noevant on the composable AI pattern: One Afternoon, Seven APIs, One Answer: The Composable AI Tool Pattern