Structure Inventory
Data Schema & Source Confidence
Prepared for: Fermat Capital Management
Date: June 21, 2026
Classification: Confidential - Authorized Recipients Only
Version: 2.1

The Hummingbird Structure Inventory (SI) is a unified building-level dataset covering 123.9 million structures across 49 US states. Every field is derived from publicly available government sources: county tax assessor records (CAMA), federal regulatory GIS layers, USGS instrumented measurements, FEMA databases, and US Census data. These are fused and calibrated into a single record per building.

This document describes every field in the SI, organized by the method by which the data was collected. Confidence is field-specific, not tier-specific. A lidar ground elevation measurement and an NSI tract median year are both technically "government data" — but the first is a direct instrument reading and the second is a statistical proxy. The Confidence column tells you exactly which is which.

Buildings: 123.1M
States: 49
Columns: 100
Data Origin: PUBLIC
Perils: All Hazard
Vintage: Jun 2026
A field in Tier 3 (Federal Property Inventory) may be lower confidence than a field in Tier 2 (Federal Engineering Standards) for some applications, so check the Confidence column for each field individually.

construction_type, year_built, and roof attributes are the most important fields for wind and hail underwriting. Where these come from CAMA or FDOR (county assessor), confidence is HIGH. Where NSI fills the gap, treat the value as LOW confidence and check the construction_type_source field to see which applies to each building.

Contents

  1. Tier 1 — Permit-Recorded & Directly Measured
  2. Tier 2 — Authoritative Federal Engineering Standards & GIS
  3. Tier 3 — Federal Property Inventory (Complete, Modeled)
  4. Tier 4 — Satellite & Open Mapping (Geometry)
  5. Tier 5 — Hummingbird Engineered Inference
  6. Tier 6 — Observed Aggregate Loss Data (NFIP)
  7. Tier 7 — Survey & Socioeconomic Context
  8. Data Provenance & Source Notes
Year-Built & Construction Type — Source Coverage Summary

Two attributes drive the largest share of wind/hail underwriting variance: year_built (code vintage, post-storm resilience) and construction_type (wood vs masonry vulnerability). The table below shows exactly how much of the 123.9M-building national inventory is covered at each confidence level for each attribute.

Source | Confidence | Year-Built | Const. Type | Notes
CAMA / FDOR (direct) | HIGH | 30.0M (24.2%) | 14.9M (12.0%) | County tax assessor records. Recorded at permit issuance. FL = FDOR NAL (full state). Used as ground truth for all downstream models.
AEI / First American | HIGH | 44.9M (36.2%) | - | First American national parcel database (109M parcels). Same source methodology as CAMA (building-level assessor records). AEI vs CAMA MAE = 5.0 yrs (85% within 5 yrs). Applied to buildings not covered by CAMA.
HB Model Inference | MED | 13.3M (10.8%) | 74.9M (60.5%) | Year-built: spatial neighbour inference (50m BallTree, CAMA/AEI donors). Construction type: GradientBoosting classifier trained on CAMA ground truth (70% accuracy), applied only where a quality year_built source was available.
County/State Median | LOW | 35.7M (28.8%) | - | Fallback for buildings where no CAMA, AEI, or spatial donor was available. County median used where possible; state median as last resort. MAE vs assessor ~8–15 yrs.
NSI Census Proxy | LOW | - | 34.1M (27.5%) | Census block-group aggregate. NSI vs AEI MAE = 16.2 yrs (33% within 5 yrs). Construction type: HAZUS default class for tract. Applied only where no quality year_built existed to enable model inference. Do not use for underwriting decisions.
Total / Combined | - | 123.9M (100%) | 89.8M (72.5%) | CT coverage = CAMA (12%) + model inferred (60.5%). Remaining 27.5% is NSI-only. Year-built: HIGH/MED = 88.2M (71.2%); LOW = 35.7M (28.8%).
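
The spatial neighbour inference noted in the HB Model Inference row can be sketched as follows. This is a minimal illustration, not the production method: it scans donors by brute force instead of using a BallTree index, and it takes the median donor year, which is an assumption because the aggregate is not stated here. The function names and the sample coordinates are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def infer_year_built(target, donors, radius_m=50.0):
    """Median year_built of donors within radius_m of target, or None.

    Brute-force scan for clarity; a spatial index such as a BallTree would
    replace this loop at scale. The median is an assumed aggregate.
    """
    years = [
        d["year_built"]
        for d in donors
        if haversine_m(target["lat"], target["lon"], d["lat"], d["lon"]) <= radius_m
    ]
    return median(years) if years else None

# Illustrative donors (CAMA/AEI-style records), not real SI data.
donors = [
    {"lat": 27.95000, "lon": -82.46000, "year_built": 1988},
    {"lat": 27.95020, "lon": -82.46010, "year_built": 1992},
    {"lat": 27.95030, "lon": -82.45990, "year_built": 1990},
    {"lat": 27.96000, "lon": -82.46000, "year_built": 2015},  # roughly 1.1 km away
]
target = {"lat": 27.95010, "lon": -82.46000}
inferred = infer_year_built(target, donors)
```

A building with no donor inside the radius returns None, which corresponds to falling through to the County/State Median row.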

Use effective_year_built_source to filter by quality tier. Use construction_type_source + construction_type_inferred to distinguish assessor-recorded from model-predicted construction class.
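
As a minimal sketch of that filtering, assume SI rows have been loaded as plain Python dicts (the loading step and the sample rows are illustrative). The source values are the ones listed in this document's field descriptions; treating cook_county_cama as assessor-grade, and restricting year-built to cama and aei, are assumptions the analyst should adjust.

```python
# Source values appear in this document; the grouping into "high confidence"
# sets below is an assumption for illustration.
HIGH_CONF_YEAR_SOURCES = {"cama", "aei"}
ASSESSOR_CT_SOURCES = {"fdor_nal", "county_assessor", "cook_county_cama"}

def is_high_confidence(row):
    """True when both year-built and construction type are assessor-recorded."""
    return (
        row.get("effective_year_built_source") in HIGH_CONF_YEAR_SOURCES
        and row.get("construction_type_source") in ASSESSOR_CT_SOURCES
    )

# Illustrative rows, not real SI records.
rows = [
    {"id": 1, "effective_year_built_source": "cama",
     "construction_type_source": "fdor_nal"},
    {"id": 2, "effective_year_built_source": "aei",
     "construction_type_source": "nsi_bldgtype"},
    {"id": 3, "effective_year_built_source": "inferred_county",
     "construction_type_source": "county_assessor"},
]
high_conf_ids = [r["id"] for r in rows if is_high_confidence(r)]
```

Only the first row passes: the second has an NSI-sourced construction type and the third has a county-median year built.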

Tier 1 Permit-Recorded & Directly Measured
County Assessor / CAMA · USGS 3DEP · USGS NSHM

Highest confidence. These fields were either recorded at the moment of permit issuance (county assessor / CAMA records) or measured directly by physical instruments (lidar elevation surveys, seismograph networks). They reflect what the building actually is — not a statistical estimate of what it might be.

Where construction_type, year_built, and roof attributes come from CAMA or FDOR, confidence is HIGH. The construction_type_source field records which source was used for every building; use it to restrict analysis to high-confidence records where it matters.

Florida FDOR NAL note: For Florida's 8.4M structures, the Florida Department of Revenue (FDOR) statewide NAL database is the assessor source — it is the same type of permit-recorded data as county CAMA, but with statewide coverage in a single dataset. Post-Andrew (1992) and post-2004 hurricane season code changes mean that FL FDOR vintage + construction_type enables precise wind vulnerability classification.

County Assessor / CAMA (includes FL FDOR NAL)
Field Description Coverage Data Quality Underwriting Notes
construction_type Primary structural system. Values: wood_frame, masonry, steel, manufactured. 100% filled: 12.1% from CAMA/FDOR (HIGH confidence — permit-recorded); 87.9% from NSI inference (LOW confidence — 51–60% accuracy vs. assessor). Use construction_type_source to filter to high-confidence records.
100% 21.9% CAMA/FDOR; 78.1% other
construction_type from CAMA/assessor, % of buildings by state: FL 100.0%, CT 83.3%, AZ 27.6%, IL 27.6%, PA 17.1%, TX 16.7%, VA 15.6%, NC 7.0%, CA 1.6%; all other states 0.0% (NSI only).
HIGH (CAMA) / LOW (NSI)
FDOR NAL, County CAMA, NSI fallback
The single most important field for wind and hail underwriting. CAMA validation shows NSI achieves only 51–60% agreement vs. assessor ground truth — biased toward wood_frame, with masonry recall as low as 25% in TX and steel recall 14% in FL. Cedant EDMs derived from NSI likely carry the same bias. Map shows CAMA coverage % by state; filter to CAMA records for high-confidence analysis.
construction_type_inferred Model-predicted construction type for buildings where CAMA/FDOR did not provide a value. GradientBoosting classifier trained on CAMA ground truth (~12M buildings). Features: effective_year_built, sqft, census_region, census_division, cbsa_type, num_story. Only applied where effective_year_built_source ∈ {cama, aei, inferred_spatial, inferred_county}.
model inference
MED
Replaces NSI-sourced construction_type where AEI/CAMA year_built is now available. Accuracy ~65% vs assessor ground truth (vs 51–60% for NSI baseline). Use construction_type_source to distinguish CAMA-sourced from model-inferred rows.
construction_type_source Identifies which data source provided the construction_type value for each building. Values: fdor_nal, county_assessor, cook_county_cama, nsi_bldgtype.
100%
PROVENANCE
Derived
Use this field to filter analyses to high-confidence records. Filter to fdor_nal or county_assessor for underwriting decisions that require permit-grade construction data.
year_built_assessor Year structure was built or permitted, from tax assessor records. State coverage varies dramatically — see map.
~15% national avg
HIGH where present
County CAMA, FDOR NAL
State coverage (from the coverage map): FL 95%, CT 94%, AZ 61%, UT 47%, OR 33%, CO 31%, NE 29%, MN 27%, NY 21%, WA 19%, VA 16%, NC 15%, NV 15%, OK 14%, PA 13%, IL 12%, KS 12%, MO 9%, OH 7%, TX 6%, CA 4%; all other states 0%. Where absent, med_yr_blt (NSI tract median) substitutes but is LOW confidence. Drives wind-code vintage assignment.
effective_roof_year Year of the most recent roof installation. Priority: roofing permit records (actual replacement year) > county assessor records. This is the best available indicator of current roof age — distinct from year_built.
92% national avg
State coverage: AL 96%, AR 100%, AZ 91%, CA 23%, CO 89%, CT 100%, DE 79%, FL 92%, GA 81%, IA 99%, ID 49%, IL 98%, IN 97%, KS 99%, KY 88%, LA 98%, MA 92%, MD 93%, ME 95%, MI 95%, MN 97%, MO 99%, MS 97%, MT 93%, NC 97%, ND 99%, NE 100%, NH 91%, NJ 94%, NM 76%, NV 56%, NY 96%, OH 95%, OK 100%, OR 71%, PA 95%, RI 100%, SC 95%, SD 98%, TN 91%, TX 94%, UT 65%, VA 80%, VT 79%, WA 11%, WI 98%, WV 89%, WY 90%, AK 0%.
HIGH (permit) / MED (CAMA)
Building Permits, County CAMA
Key states: IL 98%, IA/NE/OK/AR/RI 99–100%, MN 97%, OH/PA/NY 95–96%, FL 92%, MA 92%, AL 96%, TN 91%. Low coverage: WA 11%, CA 23%, NV 56%, HI 0%. A 20-year-old shingle vs. a 3-year-old one has dramatically different hail and wind loss performance.
roof_cover_assessor Roof covering material from FDOR NAL. Highest-confidence roof material field when present. FL only.
7% (FL only)
HIGH
FDOR NAL
FL only. Where present, this is permit-recorded roof material — more reliable than modeled roof_material (Tier 5). For all other states, use Tier 5 roof_material.
enclosure_class Building enclosure per ASCE 7 / Florida Building Code. Values: Enclosed, Partially Enclosed, Open. FL only (13%).
13% (FL only)
HIGH
FDOR NAL
Partially enclosed structures face significantly higher internal wind pressure under ASCE 7. Critical hurricane vulnerability variable. Not available from assessor records outside FL.
sqft_assessor Conditioned floor area in square feet from FL FDOR assessor records. FL only.
7% (FL only)
HIGH
FDOR NAL
FL only. Higher precision than NSI sqft. Nationally, use sqft (Tier 3, NSI) as the primary area field.
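
The source priority described for effective_roof_year (roofing permit first, then county assessor) can be sketched as a simple coalesce. The function names and the as-of year are illustrative assumptions; the SI's actual derivation may apply additional rules not described here.

```python
def effective_roof_year(permit_year, assessor_year):
    """Return (year, source) using the stated priority: permit, then assessor."""
    if permit_year is not None:
        return permit_year, "permit"
    if assessor_year is not None:
        return assessor_year, "assessor"
    return None, None

def roof_age(roof_year, as_of_year):
    """Roof age in years, floored at zero; None when the roof year is unknown."""
    if roof_year is None:
        return None
    return max(as_of_year - roof_year, 0)

# A building with a 2019 re-roof permit and a 1998 assessor record.
year, source = effective_roof_year(permit_year=2019, assessor_year=1998)
age = roof_age(year, as_of_year=2026)
```

Keeping the source label alongside the year lets a downstream model weight permit-derived roof ages more heavily than assessor-derived ones.
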
Directly Measured Physical Data — Lidar & Seismograph Instruments
Field Description Coverage Data Quality Underwriting Notes
ground_elv Ground surface elevation in feet at building centroid. Measured by USGS 3DEP 1-meter lidar/photogrammetry — a direct instrument reading, not a model estimate.
100%
HIGH
USGS 3DEP
Basis for all flood stage calculations and freeboard determinations. The most accurate nationally available digital elevation model — 1-meter horizontal resolution lidar.
first_floor_elv Estimated first-floor elevation in feet AMSL = ground_elv + foundation height offset (varies by found_type).
100%
HIGH
USGS 3DEP, HB Derived
Flood damage is a function of water depth above first floor, not ground elevation. The difference between first_floor_elv and bfe_ft (freeboard) determines FEMA flood zone compliance.
ss Mapped Maximum Considered Earthquake (MCE) spectral response acceleration at short periods (g), ASCE 7-22.
100%
HIGH
USGS NSHM
Derived from decades of seismograph observations at site-specific locations. Input to seismic design category determination.
s1 MCE spectral response acceleration at 1-second period (g).
100%
HIGH
USGS NSHM
1-second period is critical for mid-rise and taller structures. Pair with num_story to determine dominant vibration period.
pgam Geometric mean peak ground acceleration (g) for MCE ground motion.
100%
HIGH
USGS NSHM
Used for nonlinear soil response and liquefaction hazard. Critical for Pacific Coast portfolios.
sds Design spectral acceleration at short periods (g) — primary seismic demand parameter for low-rise residential.
100%
HIGH
USGS NSHM
Used directly in seismic vulnerability functions for residential structures.
sd1 Design spectral acceleration at 1-second period (g).
100%
HIGH
USGS NSHM
Pair with num_story to determine dominant vibration period for mid-rise structures.
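
The first_floor_elv definition above (ground_elv plus a foundation height offset that varies by found_type) can be sketched as follows. The offsets in the lookup are placeholders chosen for illustration only; they are not the values used in the SI, which this document does not list.

```python
# Placeholder offsets in feet, NOT the SI's calibrated values.
FOUNDATION_OFFSET_FT = {
    "Slab": 1.0,
    "Crawlspace": 3.0,
    "Basement": 2.0,
    "Pier": 6.0,
}

def first_floor_elv(ground_elv_ft, found_type):
    """ground_elv + offset for the foundation type; None if either is unknown."""
    offset = FOUNDATION_OFFSET_FT.get(found_type)
    if ground_elv_ft is None or offset is None:
        return None
    return ground_elv_ft + offset

ffe = first_floor_elv(12.5, "Crawlspace")
```

Because found_type is a modeled NSI field, the result inherits its uncertainty even though ground_elv is a direct lidar measurement.
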
Tier 2 Authoritative Federal Engineering Standards & GIS
ASCE 7-22 · FEMA FIRM · FEMA WUI · USGS NSHM · USFS WRC

Federal agencies publish these as official standards or regulatory designations — the same maps that govern building codes, mandatory flood insurance, and wildfire land management. High confidence for what they measure, but area-based (not property-specific instrument readings). Every building gets a value for every field in this tier.

Why this matters: Regulatory spatial data is the foundation of catastrophe risk zoning. FEMA flood zones determine mandatory flood insurance purchase. ASCE 7 wind speeds set the design standard builders were required to meet. USGS seismic parameters define required building code seismic design category. WUI classifications identify structures in ember cast risk zones. These fields directly enable zone-based accumulation, treaty line definition, and regulatory compliance checking.

Field Description Coverage Data Quality Underwriting Notes
asce7_wind_speed_mph ASCE 7-22 design wind speed (3-sec gust at 10m) at building location, in mph. This is the current design standard — not the standard in effect at year of construction.
100%
HIGH
ASCE 7-22
FL coastal 165–200+ mph; Gulf barrier islands similar; Midwest interior ~115 mph. Based on long-term weather station observations. Use wind_code_vintage for the applicable standard at construction time.
wind_code_vintage Wind design code edition in effect when the building was constructed. Derived from year_built and ASCE 7 / state building code adoption history.
100%
HIGH
HB Derived
Pre-1993 FL structures were built to pre-Andrew codes — significantly weaker than post-2002 Florida Building Code. Key signal for FL wind vulnerability stratification.
sdc Seismic Design Category (ASCE 7). Values A–E: A/B = low seismic risk (most of eastern US); D/E = high (Pacific Coast, New Madrid Zone).
100%
HIGH
USGS NSHM
SDC D/E = seismic-resistant structural systems required by code. Used to flag seismic vulnerability and determine required construction methodology at time of build.
nehrp_site_class NEHRP soil site amplification class. A = hard rock; B = rock; C = dense soil; D = stiff soil; E = soft clay / Bay Mud.
100%
HIGH
USGS NSHM
Class E (SF Bay Mud, Seattle fill areas) can amplify ground motion 3–5× relative to Class A rock at the same location. Critical for CA and PNW earthquake accumulation analysis.
firmzone FEMA NFIP Flood Insurance Rate Map zone at building location. AE = 1% annual flood risk (riverine); VE = 1% flood + wave action (coastal); X = outside 500-yr boundary; AO = sheet flow.
100% joined; 16% of buildings in a mapped flood zone
HIGH
FEMA NFIP FIRM
VE zones = highest flood damage potential (wave action + inundation). Mandatory flood insurance purchase trigger for federally backed mortgages in AE/VE zones.
bfe_ft FEMA Base Flood Elevation in feet AMSL — the regulatory 1% annual chance flood level. Only exists where FEMA has published detailed depth grids.
3% (major rivers, coastal SFHA zones only)
HIGH where present
FEMA FIRM Depth Grids
Where present, this is the most authoritative available flood elevation benchmark. Low national coverage because FEMA only produces depth grids for recently updated, detailed study FIRMs.
freeboard_ft First floor elevation minus BFE, in feet. Positive = elevated above BFE (lower flood risk). Negative = below BFE (high flood damage probability).
3%
HIGH
HB Derived
Buildings with freeboard < −2 ft historically experience catastrophic losses even in moderate flood events. Available only where BFE data exists. Strong predictor of NFIP claim frequency.
in_wui_ignition TRUE if building is inside FEMA 2020 WUI Intermix zone — wildlands physically surround the structure. Highest direct fire risk. FALSE or null = not in Intermix.
100% joined; 1–14% TRUE by state
HIGH
FEMA WUI 2020
Intermix WUI = structures scattered among dense vegetation. Direct flame contact from adjacent vegetation is plausible. State distribution: ME 13.4%, VT 13.3%, KY 11.6%, WV 11.4% highest; AZ 0.9%, NV 1.0%, FL 1.4% lowest.
in_wui_ember TRUE if building is in FEMA 2020 WUI Interface zone — within ~2km ember cast distance of wildlands. Broader than Intermix — most US buildings qualify.
100% joined; 51–93% TRUE by state
HIGH
FEMA WUI 2020
HI 92.5%, DE 89.5%, MD 83.7% highest; VT 51.4%, NV 57.7%, NH 62% lowest. Ember ignition is the primary cause of structure loss in wildfire interface events. Pair with wrc_bp_rank for actual community fire probability.
wui_class USFS legacy WUI density/vegetation category (e.g., Med_Dens_Interface, High_Dens_Intermix). Superseded by in_wui_ignition / in_wui_ember for modeling.
100%
MEDIUM
USFS WUI
More granular than binary WUI flags but less current than FEMA 2020 layer. Useful for legacy analysis compatibility. Use in_wui_ignition and in_wui_ember for primary wildfire risk stratification.
wrc_bp_rank USFS Wildfire Risk to Communities community burn probability rank, 0–1 scale. Higher = higher community fire probability nationally. Community-level (TIGER Places); the 31.5% of buildings in unincorporated areas are null.
68.5%; 31.5% null (unincorporated)
wrc_bp_rank state coverage: AL 63%, AR 62%, AZ 92%, CA 93%, CO 81%, CT 55%, DE 40%, FL 74%, GA 47%, IA 75%, ID 63%, IL 82%, IN 66%, KS 74%, KY 46%, LA 63%, MA 64%, MD 80%, ME 42%, MI 51%, MN 70%, MO 63%, MS 53%, MT 57%, NC 55%, ND 60%, NE 66%, NH 41%, NJ 62%, NM 83%, NV 94%, NY 69%, OH 69%, OK 75%, OR 74%, PA 52%, RI 56%, SC 47%, SD 58%, TN 58%, TX 72%, UT 92%, VA 61%, VT 35%, WA 77%, WI 65%, WV 42%, WY 66%, AK 0%.
MEDIUM
USFS WRC
CA mean 0.93 (93rd pctile nationally), FL 0.81, VT 0.35. Most actionable wildfire risk stratification field in the SI for portfolio accumulation. Null for unincorporated rural areas (31.5% of buildings).
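
The freeboard_ft definition in this tier (first floor elevation minus BFE) reduces to a single subtraction with null handling, since bfe_ft exists for only about 3% of buildings. The sketch below is illustrative; the function names are hypothetical, and the default threshold comes from the "freeboard < −2 ft" note in the freeboard_ft row.

```python
def freeboard_ft(first_floor_elv_ft, bfe_ft):
    """First floor elevation minus BFE, in feet; None where either is missing."""
    if first_floor_elv_ft is None or bfe_ft is None:
        return None
    return first_floor_elv_ft - bfe_ft

def is_deep_below_bfe(freeboard, threshold_ft=-2.0):
    """True when freeboard is known and below the threshold (default -2 ft)."""
    return freeboard is not None and freeboard < threshold_ft

fb = freeboard_ft(9.0, 12.5)   # first floor 3.5 ft below BFE
flagged = is_deep_below_bfe(fb)
```

Returning None rather than zero where BFE is absent keeps unmapped buildings from being mistaken for buildings sitting exactly at BFE.
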
Tier 3 Federal Property Inventory (Complete, Modeled)
FHWA / USACE National Structures Inventory (NSI)

The National Structures Inventory (NSI) is a federally maintained building database produced by the US Army Corps of Engineers and FHWA. It provides 100% national coverage for all fields in this tier. The tradeoff is granularity: NSI uses statistical inference at the census tract level, not per-building permit data. Confidence varies significantly by field within this tier — check the Confidence column for each.

Why this matters: NSI provides the foundational building skeleton — occupancy type, story count, foundation type, baseline replacement cost — for every structure in the country. Where assessor coverage is sparse (most of the US outside FL), NSI is the primary source for construction class. Hummingbird applies local Marshall & Swift cost calibration on top of NSI values for replacement cost (see Tier 5).

Field Description Coverage Data Quality Underwriting Notes
occtype Occupancy classification. Common values: RES1-1SNB (SFR, no basement), RES1-1SWB (SFR, basement), RES2 (mobile home), RES3A (multi-family 3–4 units), COM1 (retail).
100%
MED–HIGH
NSI
Reasonably accurate for basic residential type classification. Essential for occupancy-differentiated loss curves.
num_story Number of above-grade stories. NSI uses census-calibrated story assignments by occupancy type and vintage.
100%
MEDIUM
NSI
Mostly accurate but not permit-recorded. Influences wind pressure calculations (taller buildings face higher design wind loads) and replacement cost estimation.
found_type Foundation type. Values: Slab, Crawlspace, Basement, Pier. Modeled from census tract data — not measured.
100%
MEDIUM
NSI
Critical for flood loss estimation. Slab-on-grade floods at grade level; pier/crawlspace have elevated first floors. Drives first_floor_elv calculation.
sqft Gross floor area in square feet. Primary area field for all states. NSI calibrates from ACS housing surveys at the tract level.
100%
MEDIUM
NSI
Primary area basis for rcv_adj_struct_usd nationally. Less accurate than assessor sqft_assessor (FL). Use sqft_assessor where available.
med_yr_blt Median year built for the NSI census block group — a tract-level proxy, not a building-specific value.
100%
LOW
NSI
Use only when year_built_assessor is absent. A tract median applied to all buildings in the tract obscures within-tract vintage variation. Do not use for building-specific wind code vintage assignment.
val_struct NSI baseline replacement cost in USD before Hummingbird calibration. Known to underestimate in high-cost coastal markets (FL, CA, TX) by 20–40%.
100%
LOW
NSI
Do not use for loss modeling — use rcv_adj_struct_usd (Tier 5) instead. Retained here for benchmarking against the NSI baseline only.
bldgtype NSI construction class inference: W (wood / light frame) or M (masonry). Inferred from census tract housing survey data.
100%
LOW–MED
NSI
Superseded by construction_type where CAMA is available. Used as the national fallback (fills ~8.5% of buildings where no assessor record exists). Statistically unbiased at portfolio level but not reliable at individual building level.
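
The guidance for med_yr_blt (use it only when year_built_assessor is absent, and not for building-specific wind code vintage) can be carried through a pipeline by returning the confidence and a usability flag with the year. This is a sketch; the function name and the flag name ok_for_wind_vintage are hypothetical.

```python
def resolve_year_built(year_built_assessor, med_yr_blt):
    """Pick a year-built value and record how far it can be trusted."""
    if year_built_assessor is not None:
        return {"year": year_built_assessor, "confidence": "HIGH",
                "ok_for_wind_vintage": True}
    if med_yr_blt is not None:
        # Tract-level median: acceptable as context, not for per-building vintage.
        return {"year": med_yr_blt, "confidence": "LOW",
                "ok_for_wind_vintage": False}
    return {"year": None, "confidence": None, "ok_for_wind_vintage": False}

from_assessor = resolve_year_built(1996, 1978)
from_nsi = resolve_year_built(None, 1978)
```
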
Tier 4 Satellite & Open Mapping (Geometry)
Overture Maps Foundation (Microsoft · Meta · Amazon · TomTom)

Overture Maps provides GPS location and building geometry from satellite and aerial imagery. High confidence for spatial position — sub-5-meter accuracy for 97% of structures — but lower confidence for physical attributes like height that require dense lidar coverage.

Why this matters: Accurate building geometry is essential for catastrophe modeling. Footprint area (not floor area) determines roof exposure. Spatial precision affects whether a structure falls inside or outside a flood zone, WUI boundary, or wind speed contour. Overture's satellite-derived geometry is generally more accurate than parcel centroids for spatial intersection.

Field Description Coverage Data Quality Underwriting Notes
lat Building centroid latitude (WGS84 decimal degrees). Sub-5-meter accuracy for 97% of structures.
100%
HIGH
Overture Maps
All spatial joins (FEMA flood zones, ASCE wind speeds, WUI layers, seismic parameters) are performed on these coordinates. Foundation of all risk assignment in the SI.
lon Building centroid longitude (WGS84 decimal degrees).
100%
HIGH
Overture Maps
See lat.
footprint_area_sqm Building footprint area in square meters measured from the Overture polygon. Distinct from sqft (floor area) — for a 2-story building, sqft ≈ 2× footprint.
90%
HIGH
Overture Maps
Roof exposure for hail and wind is proportional to footprint area, not total floor area. Especially useful for large commercial structures where satellite-derived footprint is more reliable than NSI estimates.
height Building height in meters derived from satellite stereo imagery or photogrammetry. Sparse in rural areas.
68%
MEDIUM
Overture Maps
Available for most urban/suburban areas; sparse in rural regions. Used to corroborate story count and calculate exposure-weighted wind pressure. Where absent, derive from num_story.
roof_shape Roof geometry. Values: Gable, Hip, Flat, Complex. 8% national; FL better covered via FDOR supplemental data.
8% national
MEDIUM where present
Overture Maps, FDOR NAL
Hip roofs are significantly more wind-resistant than gable roofs under ASCE 7. Coverage too sparse to use nationally — where absent, roof shape is null (not inferred). Use only as a supplemental signal where available.
overture_id Unique building identifier from the Overture Maps database. ~3% of SI buildings sourced from NSI geocoordinates without a matched Overture polygon.
97%
IDENTIFIER
Overture Maps
Stable identifier for cross-referencing against future Overture releases.
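
The relationship between footprint_area_sqm and sqft noted above (for a 2-story building, sqft is roughly twice the footprint) gives a quick consistency check between the Overture geometry and the NSI floor area. The sketch below is illustrative: the function names are hypothetical, and the 3 m story height used to estimate a missing height is an assumption, not a value stated in this document.

```python
SQFT_PER_SQM = 10.7639

def implied_floor_area_sqft(footprint_area_sqm, num_story):
    """Floor area implied by footprint and story count, in square feet."""
    return footprint_area_sqm * SQFT_PER_SQM * num_story

def area_ratio(sqft, footprint_area_sqm, num_story):
    """Recorded sqft divided by implied floor area; None if not computable."""
    implied = implied_floor_area_sqft(footprint_area_sqm, num_story)
    return None if implied <= 0 else sqft / implied

def estimated_height_m(num_story, story_height_m=3.0):
    """Fallback height where Overture height is absent (assumed story height)."""
    return num_story * story_height_m

implied = implied_floor_area_sqft(100.0, 2)   # 100 sqm footprint, 2 stories
ratio = area_ratio(2200.0, 100.0, 2)          # recorded sqft vs implied
```

A ratio far from 1 suggests that either num_story or sqft disagrees with the measured footprint and is worth a closer look.
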
Tier 5 Hummingbird Engineered Inference
Actuarial & Engineering Rules Applied to Tiers 1–4

These fields do not come directly from any single government database. Hummingbird engineers and actuaries derive them by applying physical and actuarial models to the source data in Tiers 1–4. The inputs are all publicly sourced; the outputs are HB's best estimate of each building's engineering characteristics. Confidence depends on the quality of underlying inputs — higher for FL where Tier 1 data is complete, lower for NSI-only states.

Why this matters: The most important wind vulnerability variables — nail schedule, roof-to-wall connection, opening protection — are almost never recorded in any public database. They can only be inferred from building vintage, construction class, and the wind code in effect when the permit was filed. HB's inference logic is based on IBHS field research, Florida Building Code historical records, and ASCE 7 edition histories. The replacement cost field (rcv_adj_struct_usd) corrects for known NSI biases in high-cost coastal markets.

Field Description Coverage Data Quality Underwriting Notes
rcv_adj_struct_usd Hummingbird replacement cost in USD — locally calibrated per county using Marshall & Swift construction cost factors on sqft × construction_type. Replaces NSI val_struct.
100%
MED–HIGH
HB Modeled
Use this field — not val_struct — for all exposure aggregation and loss modeling. Validated against FDOR improvements_value in FL. Significantly more accurate than NSI in coastal markets (Miami-Dade, Naples, coastal CA) where construction costs are 20–40% above national averages.
roof_material Roof covering material. Values: Asphalt Shingle, Metal, Tile, Built-up, Single-ply Membrane. Priority: FDOR > Overture > NSI/inference.
100%
MEDIUM
FDOR NAL, Overture Maps, NSI fallback
Asphalt shingle dominates residential nationally (~75%). Metal and tile more prevalent in coastal FL. Tile is significantly more vulnerable to hail than metal. Critical for hail damage modeling.
roof_deck_attachment Roof deck nail specification. Values: 6d (6-penny common), 8d (8-penny common), 6d_ring_shank. Inferred from vintage + construction_type + state wind code history.
100%
MEDIUM
HB Modeled
Nail schedule is the single most important predictor of roof deck failure in hurricanes (IBHS research). Pre-1994 FL = predominantly 6d; post-2002 = predominantly 8d ring-shank. Operationalizes a known carrier data bias: carriers claim 40% ring-shank; IBHS field surveys find 70% 6d in pre-1994 FL structures.
roof_wall_connection Method connecting roof structure to top wall plate. Values: Toe-nail, Clip, Strap. Inferred from vintage + code uplift history.
100%
MEDIUM
HB Modeled
Single strongest predictor of total roof loss in severe wind. Toe-nail connections (pre-1992 FL) fail at much lower wind speeds than hurricane straps. FL mandatory strap requirement phased in post-Andrew, fully enforced under 2002 Florida Building Code.
opening_protection Window and door opening protection level. Values: None, Basic, Hurricane-rated. Inferred from age + wind zone.
100%
MEDIUM
HB Modeled
Opening breach is the primary failure mode for interior hurricane damage — once a window fails, internal pressure spikes dramatically. Miami-Dade product approval (HVHZ) requirement in effect from 2002 FBC. Post-2002 FL = hurricane-rated; pre-1994 FL = None.
wind_vuln_source Identifies which source combination drove the wind vulnerability inference for each building. Audit / provenance field.
84%
PROVENANCE
Derived
Use to understand confidence of Tier 5 wind fields for any given building or portfolio segment. Filter to high-input-quality records for sensitivity testing.
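
The vintage-based rules quoted in this tier for Florida can be sketched as a lookup on year built. This is not HB's inference logic, which also uses construction class and wind zone; it encodes only the two ends of the timeline stated above (toe-nail before 1992, no opening protection before 1994, straps and hurricane-rated openings after the 2002 Florida Building Code). Reading "post-2002" as year_built > 2002 is an assumption, and transition years return None rather than a guess.

```python
def fl_wind_features(year_built):
    """Rule-of-thumb Florida wind features from construction year.

    Only the early and late eras are stated in the text; the years in
    between return None here instead of an invented value.
    """
    if year_built is None:
        return {"roof_wall_connection": None, "opening_protection": None}

    if year_built < 1992:
        connection = "Toe-nail"
    elif year_built > 2002:
        connection = "Strap"
    else:
        connection = None

    if year_built < 1994:
        opening = "None"
    elif year_built > 2002:
        opening = "Hurricane-rated"
    else:
        opening = None

    return {"roof_wall_connection": connection, "opening_protection": opening}

pre_andrew = fl_wind_features(1985)
post_fbc = fl_wind_features(2010)
transition = fl_wind_features(1998)
```

Note that the string "None" is the field's documented category for no opening protection, while the Python value None marks an era this sketch does not classify.
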
Tier 6 Observed Aggregate Loss Data
FEMA NFIP Historical Flood Insurance Claims (Tract Level)

FEMA publishes historical flood insurance claims at the census tract level. These are real flood insurance payouts — not modeled risk. Medium-high confidence for flood frequency signal, but tract-level (not building-specific). This is the behavioral evidence layer: it tells you whether buildings in this neighborhood have actually flooded before, and at what severity.

Why this matters: Repeat flooding is highly predictable geographically. Tracts with high claim counts and high average losses are documented flood problem areas. The below-BFE percentage (nfip_below_bfe_pct_tract) is a particularly useful signal — it indicates whether the FIRM flood zone designation is consistent with observed losses or understates actual flood exposure in the area.

Field Description Coverage Data Quality Underwriting Notes
nfip_claim_count_tract Total NFIP flood insurance claims paid in the building's census tract, all years. National median: 14 claims/tract. Outlier tracts: 1,000+.
97%
MED–HIGH
FEMA NFIP
Tracts with 1,000+ claims indicate chronic flood exposure regardless of current FIRM designation — the area has flooded repeatedly. Note: reflects NFIP-insured structures only; uninsured losses not captured.
nfip_avg_building_loss_tract Average paid building loss per NFIP claim in the tract, nominal USD across all claim years. National median: ~$15k.
93%
MED–HIGH
FEMA NFIP
High average loss + high count = persistent major flood risk. NFIP claim limits cap at $250K building / $100K contents — actual losses may exceed recorded amounts in high-value neighborhoods.
nfip_last_claim_year_tract Most recent year a NFIP claim was paid in the census tract.
97%
MEDIUM
FEMA NFIP
Recency signals ongoing vs. historical flood risk. Tracts with recent event claims (2017 Harvey, 2021 Ida, 2024 Helene) may have higher prevalence of flood-damaged structures with unresolved vulnerability.
nfip_below_bfe_pct_tract % of NFIP claims in the tract where the insured property was reported below the Base Flood Elevation.
97%
MEDIUM
FEMA NFIP
High below-BFE % = non-compliant construction prevalent in the area, or FIRM underestimates actual flood exposure. Key aggregate flood severity signal — buildings below BFE sustain dramatically higher losses. Also an adverse selection indicator.
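
The tract-level signals in this tier can be combined into simple screening flags. In the sketch below, the 1,000-claim threshold comes from the nfip_claim_count_tract notes; the average-loss and below-BFE thresholds are hypothetical placeholders that a user would calibrate, and the function name is illustrative.

```python
def nfip_tract_flags(claim_count, avg_building_loss_usd, below_bfe_pct,
                     chronic_claims=1000, high_avg_loss_usd=50_000.0,
                     high_below_bfe_pct=50.0):
    """Boolean screening flags from tract-level NFIP fields.

    Missing inputs (None) never raise a flag. Only chronic_claims reflects
    a figure from the text; the other two thresholds are placeholders.
    """
    def at_least(value, threshold):
        return value is not None and value >= threshold

    return {
        "chronic_flooding": at_least(claim_count, chronic_claims),
        "high_severity": at_least(avg_building_loss_usd, high_avg_loss_usd),
        "below_bfe_prevalent": at_least(below_bfe_pct, high_below_bfe_pct),
    }

flags = nfip_tract_flags(claim_count=1240, avg_building_loss_usd=38_000.0,
                         below_bfe_pct=None)
```

These flags describe the tract, not the building, so they are best used to prioritize review rather than to rate an individual structure.
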
Tier 7 Survey & Socioeconomic Context
US Census ACS 5-Year · HMDA · FHFA House Price Index

Census survey data and housing market indicators. All tract-level, 5-year rolling averages. Useful for behavioral and maintenance context but not primary physical risk indicators. These describe the neighborhood, not the individual building.

Why this matters: Socioeconomic context influences both actual risk and behavioral response. Owner-occupancy rate and maintenance investment correlate with physical property condition relative to NSI estimates. Pre-1980 vintage percentage helps calibrate older construction prevalence. CBSA classification enables market-level exposure aggregation. FHFA HPI identifies markets where insured value inflation has diverged from reconstruction cost.

Field Description Coverage Data Quality Underwriting Notes
acs_median_home_value_usd ACS 5-year tract median owner-occupied home value, USD.
66%
LOW–MED
ACS 5-Year
Proxy for local property values. Useful for identifying neighborhoods where NSI replacement cost may be understated. High-value coastal markets (Malibu, Naples FL) should be verified against rcv_adj_struct_usd.
acs_owner_occupancy_pct % of housing units in the tract that are owner-occupied vs. renter or vacant.
67%
LOW–MED
ACS 5-Year
Owner-occupied properties are generally better maintained. Lower owner-occupancy correlates with higher wind event claim frequency. Used in behavioral model calibration as a maintenance investment proxy.
acs_pct_built_pre1980 % of housing structures in the tract built before 1980.
67%
LOW–MED
ACS 5-Year
Pre-1980 structures pre-date modern building codes in most states. High prevalence = elevated structural vulnerability. Important for seismic (unreinforced masonry) and wind (pre-Andrew code) portfolios.
hmda_median_loan_usd Median home improvement loan amount originated in the tract from HMDA filings, USD.
67%
LOW–MED
HMDA / CFPB
Proxy for neighborhood reinvestment and renovation activity. High HMDA improvement lending suggests more recent roof replacements and construction upgrades than assessor vintage alone indicates.
cbsa_code Census Core-Based Statistical Area (CBSA) code for the building's metro or micropolitan area.
98%
HIGH (classification)
Census TIGER
Standard geographic identifier for metro-level accumulation analysis. Also used for labor cost calibration in rcv_adj_struct_usd. 2% of buildings in rural non-CBSA areas.
cbsa_title Human-readable CBSA name (e.g., "Miami-Fort Lauderdale-Pompano Beach, FL").
98%
HIGH (classification)
Census TIGER
See cbsa_code.
fhfa_hpi_2024 FHFA House Price Index level for the CBSA as of 2024. Available for CBSAs with sufficient ARM transaction data.
31%
MEDIUM
FHFA
Identifies markets where market value appreciation has significantly outpaced construction cost inflation — a key indicator of underinsurance risk as ACV policies diverge from current replacement cost.
fhfa_hpi_yoy_pct FHFA HPI year-over-year % change for the CBSA.
31%
MEDIUM
FHFA
Rapidly appreciating markets (Sun Belt metros 2020–2023) suggest increasing insurance-to-value gaps where policy limits have not kept pace with replacement cost increases.
acs_pct_single_unit % of housing structures in the tract that are single-unit (detached or attached) buildings.
67%
LOW–MED
ACS 5-Year
Used to calibrate NSI occupancy type distribution within each tract and to understand the concentration of residential vs. multifamily exposure.

Data Provenance & Source Notes

All data in the Hummingbird Structure Inventory is derived from publicly available government sources. No proprietary third-party data vendors are used. Complete source list: Florida Department of Revenue (FDOR NAL) · US Army Corps / FHWA National Structures Inventory (NSI) · Overture Maps Foundation · FEMA Flood Insurance Rate Maps (FIRM) and NFIP Claims · USGS 3D Elevation Program (3DEP) · USGS National Seismic Hazard Model (NSHM) · ASCE 7-22 design wind speed maps · US Census Bureau American Community Survey (ACS) · FHFA House Price Index · HMDA filings (CFPB) · USFS Wildfire Risk to Communities (WRC) · FEMA WUI boundary layers · County assessor / CAMA databases via public records requests or open data portals.

Storage: SI is stored in Apache Parquet format on Google Cloud Storage and Amazon S3. All source data is publicly licensed and suitable for storage under public data terms. No Fermat or other carrier proprietary data is incorporated into the SI.

Coverage exception: Alaska is excluded from the current version. The 48 contiguous states plus Hawaii (49 states in total) are covered. Alaska is planned for a future release.

Currency: FDOR NAL, NSI, and FEMA regulatory layers refresh annually. Overture Maps refreshes quarterly. ACS 5-year estimates are on a rolling annual cycle. County CAMA data is ingested as publicly available, with most counties updating annually or quarterly.

Hummingbird Software, LLC — Confidential. This document contains proprietary methodology descriptions. Distribution restricted to authorized recipients.