Engineering · 2026-05-11 · 9 min read

Hawaii-as-code: why I put a state's worth of parcels in Git instead of a database


John C. Thomas

Founder, BlueWave Projects

Most production geospatial systems live in a database. PostGIS on Postgres, a normalized schema, a tile server in front of it, an API for the apps, a cache for the reads, and an ETL job that keeps everything in sync with the upstream sources.

That's a reasonable architecture for a typical city or county dataset that updates daily and grows continuously. It is also a *lot of plumbing* for a problem that, in our case, doesn't actually call for it.

Hawaii-as-code is the alternative I built: every TMK parcel, every building footprint, and every address in Hawaiʻi is encoded as TypeScript modules and committed to a single Git repo. 384,262 statewide parcels. 239,458 Honolulu building footprints with real heights. 204,775 address points. Four islands. All of it sits in one repository under portofcams/hawaii-as-code. The 3D map at maps.ikenagroup.com is the static-export rendering of the same source files.

This post is about why that architecture is the right call for *this* shape of dataset, and the specific tradeoffs that come with it.

The problem

Hawaii's authoritative geospatial data is scattered across half a dozen sources:

  • State ArcGIS portal — statewide TMK parcels, the lowest common denominator
  • Honolulu DPP REST — building footprints with permit history
  • Hawaii County GIS — Big Island parcels with the deepest county schema
  • Maui County — separate REST endpoint with different field names
  • Kauai County — yet another schema
  • OSM Overpass — building heights and address points where the counties don't publish them
  • qPublic / TaxNetUSA — ownership records, scraped weekly
Every Hawaii-aware product the studio ships (Property Brief, Aloha Network, AI scope generator, ProBuildCalc, Hawaii Property Lookup) needs subsets of this data, fresh, fast, and consistent across products. The conventional architecture would mean: ingest from six sources nightly, normalize into a single schema, serve through one API, cache aggressively, and version the schema as the upstream sources drift.

That's roughly $500/mo of infra and ~2,000 lines of glue code per product that imports it.

The alternative: commit the data

The dataset is finite. The whole compressed corpus — every parcel, building, and address in Hawaiʻi — is well under a gigabyte. It changes on a weekly cadence, not minutely. There are no joins that require SQL-level optimization at this scale.

So I treated it like source code, not like data.

One file per island per layer. Directory tree:

```
parcels/
  oahu.ts
  maui.ts
  hawaii.ts
  kauai.ts
buildings/
  honolulu.ts
  hilo.ts
addresses/
  oahu.ts
  ...
```

Each file exports a typed array of records. Every record has the same TypeScript shape regardless of which county it came from — normalization happens at ingest time, not at read time.
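
For concreteness, here's a sketch of what one of those modules boils down to. The field names and the TMK value are illustrative assumptions, not the repo's real schema:

```ts
// Illustrative sketch: the real field names live in the repo, not here.
// Every county's raw response is normalized into this one shape at ingest.
export interface Parcel {
  tmk: string; // Tax Map Key, the statewide parcel identifier
  island: "oahu" | "maui" | "hawaii" | "kauai";
  acres: number;
  geometry: [number, number][]; // lon/lat ring of the parcel polygon
}

// parcels/oahu.ts is then just one typed export:
export const oahuParcels: Parcel[] = [
  {
    tmk: "1-4-2-008-001", // made-up example TMK
    island: "oahu",
    acres: 0.21,
    geometry: [[-157.74, 21.39], [-157.74, 21.40], [-157.73, 21.40]],
  },
  // ...the rest of the island, written by the Saturday scraper
];
```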

One scraper per source. Every Saturday, a cron job hits all six upstream endpoints in parallel, normalizes the responses, and writes the result back to the repo as fresh TypeScript modules. The scraper commits the change with a structured message: `refresh: oahu parcels (+47 / -12 / ~204)`.
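
A minimal sketch of one refresh step, with a placeholder endpoint and assumed raw field names (the real scrapers are more involved, and the real commit message carries the added/removed/changed counts):

```ts
// Sketch of one refresh step. The endpoint, raw field names, and file path
// are placeholders; the real scraper also computes the +/-/~ diff counts.
import { writeFileSync } from "node:fs";
import { execSync } from "node:child_process";

interface RawParcel { TMK: string; ACRES: number }

const ENDPOINT = "https://example.invalid/oahu-parcels"; // placeholder URL

async function refreshOahuParcels(): Promise<void> {
  const raw: RawParcel[] = await (await fetch(ENDPOINT)).json();

  // Normalize at ingest time: county-specific fields map to the shared shape.
  const parcels = raw.map((r) => ({ tmk: r.TMK, acres: r.ACRES }));

  // Write the dataset back as a TypeScript module, not as database rows.
  const moduleSource =
    `export const oahuParcels = ${JSON.stringify(parcels, null, 2)} as const;\n`;
  writeFileSync("parcels/oahu.ts", moduleSource);

  // Commit so the weekly diff becomes the audit trail.
  execSync("git add parcels/oahu.ts");
  execSync('git commit -m "refresh: oahu parcels"');
}
```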

Diffs are the audit trail. Because the data is text, every weekly refresh shows up as a Git diff. Forty new parcels were added in Kailua? You can see them in a PR. A building height changed from 28' to 32'? Visible in the diff. No ETL black box where "the data refreshed, no idea what changed."

Three superpowers that flow from this

1. The same source feeds every product without API drift.

Property Brief, the scope generator, ProBuildCalc, and the lookup tool all do `import { oahuParcels } from "hawaii-as-code/parcels/oahu"`. There is no API between products. No versioning headaches when one product needs a field that another doesn't. No "tile server is down, app is broken" outages. The data layer is a compile-time dependency. If it compiles, it works.
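
In a consuming product that looks like ordinary application code. A sketch, where `acres` is an assumed field name:

```ts
// The data layer as a compile-time dependency: if an upstream field is
// renamed, this fails at build time instead of at runtime in production.
import { oahuParcels } from "hawaii-as-code/parcels/oahu";

const totalAcres = oahuParcels.reduce((sum, p) => sum + p.acres, 0); // `acres` assumed
console.log(`Oahu: ${oahuParcels.length} parcels, ${totalAcres.toFixed(0)} acres`);
```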

2. Static export becomes the 3D map.

The 3D map at maps.ikenagroup.com is *just* the TypeScript compiled into a Three.js scene at build time. Buildings are extruded from the footprint arrays to their real heights. Parcels are rendered as clickable polygons. There's no map server, no runtime API for the map itself. Cloudflare Pages serves the whole experience as static assets. Page weight: about 18 MB across the four islands; load time on a good connection: ~2 seconds for the first paint with progressive enhancement.
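
The extrusion itself is a few lines of Three.js at build time. A sketch, assuming a building record with a footprint ring in local meters and a height in feet (both names are mine, not the repo's):

```ts
// Sketch of build-time extrusion with Three.js. The Building shape and the
// meters-based footprint coordinates are assumptions about the repo's data.
import * as THREE from "three";

interface Building {
  footprint: [number, number][]; // x/y ring in local meters
  heightFt: number;
}

function extrudeBuilding(b: Building): THREE.Mesh {
  const shape = new THREE.Shape();
  const [x0, y0] = b.footprint[0];
  shape.moveTo(x0, y0);
  for (const [x, y] of b.footprint.slice(1)) shape.lineTo(x, y);

  // Extrude the footprint up to the building's real height (feet -> meters).
  const geometry = new THREE.ExtrudeGeometry(shape, {
    depth: b.heightFt * 0.3048,
    bevelEnabled: false,
  });
  geometry.rotateX(-Math.PI / 2); // lay the footprint flat on the ground plane

  return new THREE.Mesh(geometry, new THREE.MeshLambertMaterial());
}
```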

3. The codebase is the data ops team.

When a new field needs to be added — say, "is this parcel in a special management area?" — I edit the scraper, push, and the next Saturday refresh fills the field in. The whole "data engineering" surface is the same code surface as the application. There's no separate team, no separate vocabulary, no separate deploy pipeline. Claude Code edits the scrapers the same way it edits the React components.
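
For the special-management-area example, the whole change is a couple of lines. A hypothetical sketch, where `SMA_FLAG` is an assumption about what the upstream publishes:

```ts
// Hypothetical sketch of the change: one field on the shared type, one line
// in the normalizer. `SMA_FLAG` is an assumed upstream field name.
interface RawCountyParcel { TMK: string; SMA_FLAG?: string }

interface Parcel {
  tmk: string;
  inSpecialManagementArea: boolean; // the new field
}

function normalizeParcel(raw: RawCountyParcel): Parcel {
  return {
    tmk: raw.TMK,
    inSpecialManagementArea: raw.SMA_FLAG === "Y", // filled on the next refresh
  };
}
```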

The honest tradeoffs

This architecture is *not* right for every dataset. The specific shape that makes it work here:

  • Finite. The whole state's data fits in memory at runtime. This pattern wouldn't survive moving to the contiguous US.
  • Stable cadence. Weekly updates, not real-time. If the underlying data changed minute-to-minute, the Git commits would be noise.
  • Public. Everything in the repo is sourced from government data feeds that are public records. No PII, no commercial scraping problems.
  • Multi-source normalization is the actual work. If the dataset came from one clean upstream API, you'd just hit the API. The value is in the unified shape across six different schemas.
Things that *don't* work:

  • Search at scale. Finding "all parcels over 5 acres in Kaneohe Bay" is a linear scan in this architecture. For the products that need that query, the same data is *also* loaded into Postgres on the server side, and Postgres does the heavy lifting there (see the sketch after this list). Hawaii-as-code is the source of truth; the database is a derived index.
  • Append-only history. Git keeps everything forever, which is great until you realize you don't want to ship 2 years of weekly diffs to the browser. Production builds load only the latest snapshot.
  • Bandwidth. The TypeScript files compress well but they're still bigger than a binary protocol. For the 3D map, this is fine. For a high-frequency mobile read, you'd want a binary format on top.
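
What that linear scan looks like in practice, as a sketch; `acres` and `district` are assumed field names, and Koolaupoko is the Oahu district that contains Kaneohe:

```ts
// The in-memory version of the query is an O(n) pass over every parcel on
// the island. Fine for build-time jobs; the Postgres copy indexes this.
import { oahuParcels } from "hawaii-as-code/parcels/oahu";

const hits = oahuParcels.filter(
  (p) => p.acres > 5 && p.district === "Koolaupoko", // assumed field names
);
```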

When to steal this pattern

Steal it when:

  • The dataset is small enough to ship to the client or compile to a static artifact (under a few GB compressed).
  • The cadence is daily-or-slower.
  • You have multiple sources that need unification.
  • You're running multiple products on top of the same data and care about consistency.
  • You want diffs to be reviewable instead of opaque.

For Hawaii — finite, slow-changing, multi-source, multi-product — this architecture is the right call. For the contiguous US it isn't. For a real-time logistics feed it isn't. For PII-bearing customer data it absolutely isn't.

But if your dataset *does* fit, treating geospatial data like source code instead of like database rows is one of the cleanest architectural moves I've made in 12+ shipped products. The whole "data ops" team is a Saturday cron job and a PR review.
