Open models for biodiversity.
Peer-reviewed papers at NeurIPS, IEEE Transactions on Field Robotics, and AAAI. Open ATProto lexicons. A self-hostable Hypersphere stack. Built with frontline partners; freely usable.
Selected papers, datasets, and writing.
daviddao.org · 2026
Governing the Commons in the Intelligent Age
David Dao
Abstract: Traces commons governance from Hardin to Ostrom to AI agents and sets out design principles for sociotechnical systems that preserve human agency, build digital trust, and scale commons governance with ML in the loop.
Essay
IEEE Trans. on Field Robotics · 2025
Autonomous Aerial-Aquatic Rapid Biodiversity Assessment in the Amazon
ETH BiodivX with GainForest
Abstract: Autonomous aerial and aquatic drones, vision-language models, environmental DNA, and bioacoustic classifiers chained into a 24-hour biodiversity assessment pipeline, with the full XPRIZE Rainforest field methodology.
Paper
NeurIPS 2024
OAM-TCD: A Globally Diverse Dataset of High-Resolution Tree Cover Maps
Veitch-Michaelis, Dao, et al.
Abstract: 280,000+ instance annotations of individual tree crowns from OpenAerialMap imagery; Mask2Former and SegFormer baselines released alongside the dataset for instance and semantic segmentation.
Dataset
NeurIPS 2023
Collaborative Machine Learning for the Natural World
David Dao
Abstract: Invited NeurIPS workshop talk on community-in-the-loop ML pipelines for biodiversity, covering field data flows from Ecuador, Brazil, and the Philippines and how attribution rewards make those pipelines durable.
Invited talk
NeurIPS 2023
GEO-Bench: Toward Foundation Models for Earth Monitoring
Lacoste, Dao, et al.
Abstract: Six classification and six segmentation tasks across six remote-sensing modalities; a standard pretrain / fine-tune protocol and a leaderboard for evaluating Earth-observation foundation models.
Paper
MBZUAI · 2023
GainForest: AI and Web3 for the Climate Frontline
David Dao
Abstract: Research seminar at MBZUAI covering ReforesTree, deep-learning baselines for forest carbon stock, smart-contract payouts to steward addresses, and the move toward ATProto-anchored proof-of-impact records.
Invited talk
AAAI Workshop · 2022
ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock
Reiersen, Dao, et al.
Abstract: Drone photogrammetry across six agroforestry sites in Ecuador with per-tree carbon-stock annotations; CNN regression baselines released openly and later reused in Earth-observation foundation-model evaluations.
Workshop
NeurIPS CCAI · 2022
ForestBench: Equitable Benchmarks for Monitoring, Reporting, and Verification of Nature-Based Solutions with Machine Learning
Newman, Exposito-Alonso, Czech, Dao, Lütjens, Gillespie, Hao, Cottam
Abstract: Six benchmark tasks for monitoring, reporting, and verification of nature-based solutions; protocol-level definitions of fairness, uncertainty, and equitable cost-of-error across forest-carbon ML pipelines.
Workshop
ICML CCAI · 2021
Tackling the Overestimation of Forest Carbon with Deep Learning and Aerial Imagery
Reiersen, Dao, Lütjens, Klemmer, Zhu, Zhang
Abstract: Spotlight. UAV imagery and CNN regression on six Ecuadorian agroforestry sites, exposing systematic overestimation in forest-carbon stock claims relative to allometric ground truth.
Workshop
ICLR CCAI · 2020
Xingu: Explaining Critical Geospatial Predictions in Weak Supervision for Climate Finance
Dao, Rausch, Zhang, Rott
Abstract: Saliency and influence-function diagnostics applied to weakly-supervised geospatial classifiers driving climate-finance decisions; case study on Amazonian forest-loss models.
Workshop
ICLR CCAI · 2020
TrueBranch: Metric Learning-based Verification of Forest Conservation Projects
Santamaria, Dao, Lütjens, Zhang
Abstract: Best Proposal Award. Metric-learning verification of forest conservation projects; on-site and off-site ground-truth photographs embedded into a space where authentic claims separate from staged ones.
Workshop
NeurIPS CCAI · 2019
GeoLabels: Towards Efficient Ecosystem Monitoring using Data Programming on Geospatial Information
Dao, Rausch, Zhang
Abstract: Snorkel-style data programming on geospatial features; weak-supervision labelling for ecosystem monitoring at continental scale without per-pixel ground truth.
Workshop
ICML CCAI · 2019
GainForest: Scaling Climate Finance for Forest Conservation using Interpretable Machine Learning on Satellite Imagery
Dao, Cang, Fung, Zhang, Pawlowski, Gonzales, Beglinger, Liu Zhang
Abstract: Founding workshop paper. Interpretable ML on satellite imagery driving smart-contract payouts to forest stewards; the original architecture on which this entire research programme is built.
Workshop
Medium · 2018
Decentralized Sustainability: Beyond the Tragedy of the Commons with Smart Contracts and AI
David Dao
Abstract: The founding essay: satellite-driven forest-loss prediction wired to a smart-contract escrow that pays steward addresses directly, demoed at the 2017 UN Climate Change Hackathon.
Essay
ATProto lexicons for nature data.
Co-authored with the Hypercerts community and shipped as five reusable layers. Every lexicon, package, and service below is open source and operable end-to-end on your own PDS.
org.hypercerts.* lexicons
the schema
Co-authored ATProto lexicons describing impact claims, evidence collections, and verification labels as portable signed records on any PDS, validated against shared JSON schemas.
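As a minimal sketch of what "portable signed records validated against shared schemas" means in practice, the TypeScript below models a hypothetical impact-claim record and checks it before it would be written to a PDS. The collection name `org.hypercerts.claim.example` and every field name are illustrative assumptions, not the published org.hypercerts.* lexicons. Real deployments validate against the lexicon definitions with ATProto tooling rather than with hand-written checks like these.

```typescript
// Hypothetical collection NSID; the published org.hypercerts.* names may differ.
const CLAIM_NSID = "org.hypercerts.claim.example";

// Hypothetical shape of an impact-claim record.
interface ImpactClaim {
  $type: typeof CLAIM_NSID;
  title: string;
  workScope: string[]; // activities the claim covers
  startDate: string;   // ISO 8601 datetime
  endDate: string;     // ISO 8601 datetime
  evidence: string[];  // at:// URIs of evidence records
  createdAt: string;   // ISO 8601 datetime
}

type ValidationResult =
  | { ok: true; value: ImpactClaim }
  | { ok: false; errors: string[] };

// Accepts datetimes such as 2024-01-01T00:00:00Z or 2024-01-01T00:00:00.5+02:00.
const ISO_DATETIME = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
const isDatetime = (v: unknown): v is string =>
  typeof v === "string" && ISO_DATETIME.test(v) && !Number.isNaN(Date.parse(v));
const isAtUri = (v: unknown): boolean => typeof v === "string" && v.startsWith("at://");

// Collects every problem instead of stopping at the first one.
function validateClaim(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["record must be an object"] };
  }
  const r = input as Record<string, unknown>;
  const { $type, title, workScope, startDate, endDate, evidence, createdAt } = r;
  const errors: string[] = [];
  if ($type !== CLAIM_NSID) errors.push(`$type must be ${CLAIM_NSID}`);
  if (typeof title !== "string" || title.length === 0) errors.push("title is required");
  if (!Array.isArray(workScope) || !workScope.every((w) => typeof w === "string")) {
    errors.push("workScope must be an array of strings");
  }
  for (const [key, value] of Object.entries({ startDate, endDate, createdAt })) {
    if (!isDatetime(value)) errors.push(`${key} must be an ISO 8601 datetime`);
  }
  if (isDatetime(startDate) && isDatetime(endDate) && Date.parse(startDate) > Date.parse(endDate)) {
    errors.push("startDate must not be after endDate");
  }
  if (!Array.isArray(evidence) || !evidence.every(isAtUri)) {
    errors.push("evidence must be an array of at:// URIs");
  }
  return errors.length === 0
    ? { ok: true, value: r as unknown as ImpactClaim }
    : { ok: false, errors };
}
```

Returning every error at once, rather than throwing on the first, suits a steward-facing form that needs to show all problems in a claim together.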
Hypersphere PDS
the data home
Self-hostable atproto-pds deployment tuned for community use; OAuth with DPoP, blob storage on S3-compatible buckets, and one-command provisioning so a steward can own every record signed under their DID.
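OAuth with DPoP (RFC 9449) binds each access token to a key the client holds, so every request carries a short-lived JWT proof signed with that key. The Node.js sketch below builds such a proof and checks its signature the way a server would. It is illustrative only: the token URL is a placeholder, and a production client would use an ATProto OAuth library, handle server-issued nonces, and add the `ath` access-token hash on resource requests.

```typescript
import { createPublicKey, generateKeyPairSync, randomUUID, sign, verify, KeyObject } from "node:crypto";

// base64url without padding, as JWS requires.
const b64url = (data: Buffer | string): string => Buffer.from(data).toString("base64url");

// Build a DPoP proof JWT for a single HTTP request.
function makeDpopProof(privateKey: KeyObject, publicKey: KeyObject, htm: string, htu: string, nonce?: string): string {
  const header = { typ: "dpop+jwt", alg: "ES256", jwk: publicKey.export({ format: "jwk" }) };
  const payload: Record<string, unknown> = {
    jti: randomUUID(),                  // unique id so the server can reject replays
    htm,                                // HTTP method of the request
    htu,                                // target URI without query or fragment
    iat: Math.floor(Date.now() / 1000), // issued-at, seconds since the epoch
  };
  if (nonce !== undefined) payload.nonce = nonce; // server-supplied DPoP-Nonce, when required
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  // ES256 is ECDSA P-256 with SHA-256; JWS expects the raw r||s form, hence ieee-p1363.
  const signature = sign("sha256", Buffer.from(signingInput), { key: privateKey, dsaEncoding: "ieee-p1363" });
  return `${signingInput}.${b64url(signature)}`;
}

// Signature-only check against the embedded public JWK; a real server
// also validates htm, htu, iat, jti uniqueness, and the nonce.
function checkDpopSignature(proof: string): boolean {
  const [h, p, s] = proof.split(".");
  const header = JSON.parse(Buffer.from(h, "base64url").toString("utf8"));
  if (header.typ !== "dpop+jwt" || header.alg !== "ES256") return false;
  const key = createPublicKey({ key: header.jwk, format: "jwk" });
  return verify("sha256", Buffer.from(`${h}.${p}`), { key, dsaEncoding: "ieee-p1363" }, Buffer.from(s, "base64url"));
}

// Per-client key pair; in practice generated once and persisted.
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "P-256" });
const proof = makeDpopProof(privateKey, publicKey, "POST", "https://pds.example.org/oauth/token");
```

Because the proof embeds only the public half of the key, a stolen access token is useless without the private key that never leaves the client.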
Hyperindex
the indexer
ATProto firehose subscriber that crawls org.hypercerts.* records across the network, normalises them into Postgres, and exposes the result through a typed GraphQL schema every downstream tool can query.
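A minimal sketch of the normalisation step follows, assuming each firehose commit (from `com.atproto.sync.subscribeRepos`) has already been decoded from its CBOR/CAR frame by an ATProto client library. The row layout is hypothetical, an in-memory `Map` stands in for the Postgres table, and the GraphQL layer is omitted.

```typescript
// Simplified view of one decoded repository operation.
interface RepoOp {
  action: "create" | "update" | "delete";
  path: string;                     // "<collection NSID>/<record key>"
  cid: string | null;               // null for deletes
  record?: Record<string, unknown>; // decoded record body for create/update
}

interface Commit {
  repo: string; // DID of the repository owner
  ops: RepoOp[];
}

// Row shape of a hypothetical normalised `records` table.
interface RecordRow {
  uri: string; // at://<did>/<collection>/<rkey>
  did: string;
  collection: string;
  rkey: string;
  cid: string;
  body: Record<string, unknown>;
}

const INDEXED_PREFIX = "org.hypercerts.";

// In-memory stand-in for the Postgres table, keyed by at:// URI.
const table = new Map<string, RecordRow>();

function applyCommit(commit: Commit): void {
  for (const op of commit.ops) {
    const slash = op.path.indexOf("/");
    if (slash < 0) continue; // malformed path
    const collection = op.path.slice(0, slash);
    const rkey = op.path.slice(slash + 1);
    if (!collection.startsWith(INDEXED_PREFIX)) continue; // skip unrelated records
    const uri = `at://${commit.repo}/${collection}/${rkey}`;
    if (op.action === "delete") {
      table.delete(uri);
    } else if (op.cid !== null && op.record !== undefined) {
      // Create and update are both upserts: the latest CID wins.
      table.set(uri, { uri, did: commit.repo, collection, rkey, cid: op.cid, body: op.record });
    }
  }
}
```

Keying rows by at:// URI makes replays idempotent, which matters when the subscriber reconnects and resumes from an earlier cursor.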
Hyperlabel
the trust layer
Labeller service that publishes signed ATProto labels (com.atproto.label.*) over Hypercert claims; tier signals (high-quality, verified, contested) reach Bumicerts and any compatible consumer the same way Bluesky moderation labels reach downstream apps and feeds.
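The sketch below shows the label shape from both sides: a labeller creating unsigned labels over a claim, and a consumer folding a label stream into the tiers currently applied. The fields follow the unsigned part of `com.atproto.label.defs#label` (signing via `sig` is omitted). The tier strings come from the description above, while the DIDs and URIs used with the code are placeholders.

```typescript
// Unsigned subset of com.atproto.label.defs#label.
interface Label {
  ver: 1;
  src: string;   // DID of the labeller
  uri: string;   // subject: at:// URI of the claim record (or an account DID)
  cid?: string;  // optional: pin the label to one version of the record
  val: string;   // label value
  neg?: boolean; // true retracts an earlier label with the same src/uri/val
  cts: string;   // creation timestamp
}

// Tier values named above; custom label values use lowercase ASCII and hyphens.
type Tier = "high-quality" | "verified" | "contested";

function makeLabel(src: string, uri: string, val: Tier, neg = false, cts = new Date().toISOString()): Label {
  return { ver: 1, src, uri, val, cts, ...(neg ? { neg: true } : {}) };
}

// Consumer side: replay labels oldest first and keep the values still in force per subject.
function activeLabels(labels: Label[], trustedSrc: string): Map<string, Set<string>> {
  const out = new Map<string, Set<string>>();
  const ordered = [...labels].sort((a, b) => Date.parse(a.cts) - Date.parse(b.cts));
  for (const l of ordered) {
    if (l.src !== trustedSrc) continue; // honour only the labeller the consumer subscribed to
    const vals = out.get(l.uri) ?? new Set<string>();
    if (l.neg) vals.delete(l.val);
    else vals.add(l.val);
    out.set(l.uri, vals);
  }
  return out;
}
```

Filtering by `src` is the key design point: a consumer chooses which labellers it trusts, so the same claim can carry different tiers in different apps.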
Hyperscan
the explorer
Web explorer for org.hypercerts.* records; resolves DID → PDS → blob CID and renders the full evidence trail behind any Bumicert, like a block explorer for community claims.
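The DID → PDS → blob CID chain can be sketched as three small steps: locate the DID document, read its `#atproto_pds` service entry, and request the blob through `com.atproto.sync.getBlob`. This sketch assumes the two DID methods ATProto uses (did:plc via the public PLC directory, and hostname-only did:web). Production resolvers should also handle caching, timeouts, and handle verification.

```typescript
interface DidDocument {
  id: string;
  service?: { id: string; type: string; serviceEndpoint: string }[];
}

// Where to fetch the DID document (did:web here covers plain hostnames only).
function didDocumentUrl(did: string): string {
  if (did.startsWith("did:plc:")) return `https://plc.directory/${did}`;
  if (did.startsWith("did:web:")) return `https://${did.slice("did:web:".length)}/.well-known/did.json`;
  throw new Error(`unsupported DID method: ${did}`);
}

// The PDS is the service entry with id #atproto_pds and type AtprotoPersonalDataServer.
function pdsEndpoint(doc: DidDocument): string {
  const svc = doc.service?.find(
    (s) => (s.id === "#atproto_pds" || s.id === `${doc.id}#atproto_pds`) && s.type === "AtprotoPersonalDataServer",
  );
  if (!svc) throw new Error(`no PDS listed for ${doc.id}`);
  return svc.serviceEndpoint.replace(/\/+$/, ""); // drop trailing slashes
}

// Blob download URL on that PDS.
function blobUrl(pds: string, did: string, cid: string): string {
  const params = new URLSearchParams({ did, cid });
  return `${pds}/xrpc/com.atproto.sync.getBlob?${params}`;
}

// Full chain over the network (requires a runtime with global fetch, e.g. Node 18+).
async function resolveBlobUrl(did: string, cid: string): Promise<string> {
  const res = await fetch(didDocumentUrl(did));
  if (!res.ok) throw new Error(`DID resolution failed: ${res.status}`);
  const doc = (await res.json()) as DidDocument;
  return blobUrl(pdsEndpoint(doc), did, cid);
}
```

Keeping the pure steps separate from `resolveBlobUrl` lets an explorer cache DID documents and test the parsing without network access.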
Open models, open datasets.
Every artefact behind the papers above is downloadable today: trained weights and datasets on HuggingFace, benchmark suites and field pipelines on GitHub, and the Taina assistant running on community-owned PDS infrastructure.
OAM-TCD
on HuggingFace
280,000+ instance annotations of individual tree crowns over OpenAerialMap imagery, plus Mask2Former and SegFormer baselines fine-tuned for instance and semantic segmentation; dataset and weights both released.
GEO-Bench
on GitHub
Community benchmark suite for Earth-observation foundation models; six classification and six segmentation tasks across six remote-sensing modalities, with a shared pretrain / fine-tune protocol and leaderboard.
ReforesTree
on arXiv
Drone photogrammetry across six agroforestry sites in Ecuador with per-tree carbon-stock annotations and CNN regression baselines; reused as a downstream task in later Earth-observation foundation-model evaluations.
BiodivX agents
in IEEE T-FR
Multi-modal field pipeline behind the XPRIZE Rainforest win: autonomous aerial and aquatic drones, vision-language agents, bioacoustic classifiers, and on-site environmental DNA sequencing chained into a 24-hour biodiversity assessment.
Taina
on PDS + Telegram
Community-owned multilingual LLM assistant co-designed with Indigenous and local communities around Manaus; memory and contributions are stored as signed records on the contributor's own PDS rather than a vendor's database.
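To illustrate "stored as signed records on the contributor's own PDS", here is a minimal sketch of writing one memory entry with the standard `com.atproto.repo.createRecord` endpoint. The collection `org.example.taina.memory` and its fields are hypothetical, not Taina's actual lexicon. For brevity the sketch also uses a plain bearer session token, whereas OAuth clients send a DPoP-bound token instead.

```typescript
// Hypothetical collection NSID for an assistant memory entry; not a published lexicon.
const MEMORY_NSID = "org.example.taina.memory";

interface MemoryRecord {
  $type: typeof MEMORY_NSID;
  text: string;
  lang: string;      // BCP 47 language tag, e.g. "pt-BR"
  createdAt: string; // ISO 8601 datetime
}

interface CreateRecordInput {
  repo: string;      // contributor's DID: the record lives in their repository
  collection: string;
  record: MemoryRecord;
}

function buildCreateRecord(did: string, text: string, lang: string, now = new Date()): CreateRecordInput {
  return {
    repo: did,
    collection: MEMORY_NSID,
    record: { $type: MEMORY_NSID, text, lang, createdAt: now.toISOString() },
  };
}

// Sends the write to the contributor's own PDS, which signs the resulting repo commit.
async function saveMemory(pds: string, accessToken: string, input: CreateRecordInput): Promise<{ uri: string; cid: string }> {
  const res = await fetch(`${pds}/xrpc/com.atproto.repo.createRecord`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${accessToken}` },
    body: JSON.stringify(input),
  });
  if (!res.ok) throw new Error(`createRecord failed: ${res.status}`);
  return (await res.json()) as { uri: string; cid: string };
}
```

Because `repo` is the contributor's DID rather than a service account, the assistant can read the memory back only for as long as the contributor keeps the record in their repository.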
The theoretical frame: regenerative intelligence.
How the papers, datasets, and protocols above fit one frame: AI as a tool for repairing the commons, governed by the people closest to the land.