Research

Open models for biodiversity.

Peer-reviewed papers at NeurIPS, IEEE Transactions on Field Robotics, and AAAI. Open ATProto lexicons. A self-hostable Hypersphere stack. Built with frontline partners; freely usable.

Publications

Selected papers, datasets, and writing.


daviddao.org · 2026
Governing the Commons in the Intelligent Age
David Dao
Abstract
From Hardin to Ostrom to AI agents: design principles for sociotechnical systems that preserve human agency, build digital trust, and scale commons governance with ML in the loop.
Essay
IEEE Trans. on Field Robotics · 2025
Autonomous Aerial-Aquatic Rapid Biodiversity Assessment in the Amazon
ETH BiodivX with GainForest
Abstract
Autonomous aerial and aquatic drones, vision-language models, environmental DNA, and bioacoustic classifiers chained into a 24-hour biodiversity assessment pipeline; full XPRIZE Rainforest field methodology.
Paper
NeurIPS 2024
OAM-TCD: A Globally Diverse Dataset of High-Resolution Tree Cover Maps
Veitch-Michaelis, Dao, et al.
Abstract
280,000+ instance annotations of individual tree crowns from OpenAerialMap imagery; Mask2Former and SegFormer baselines released alongside the dataset for instance and semantic segmentation.
Dataset
NeurIPS 2023
Collaborative Machine Learning for the Natural World
David Dao
Abstract
Invited NeurIPS workshop talk on community-in-the-loop ML pipelines for biodiversity; field data flows from Ecuador, Brazil, and the Philippines, and how attribution rewards make those pipelines durable.
Invited talk
NeurIPS 2023
GEO-Bench: Toward Foundation Models for Earth Monitoring
Lacoste, Dao, et al.
Abstract
Six classification and six segmentation tasks across six remote-sensing modalities; standard pretrain / fine-tune protocol and a leaderboard for evaluating Earth-observation foundation models.
Paper
MBZUAI · 2023
GainForest: AI and Web3 for the Climate Frontline
David Dao
Abstract
Research seminar at MBZUAI covering ReforesTree, deep-learning baselines for forest carbon stock, smart-contract payouts to steward addresses, and the move toward ATProto-anchored proof-of-impact records.
Invited talk
AAAI Workshop · 2022
ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock
Reiersen, Dao, et al.
Abstract
Drone photogrammetry across six agroforestry sites in Ecuador with per-tree carbon-stock annotations; CNN regression baselines released openly and later reused in Earth-observation foundation-model evaluations.
Workshop
NeurIPS CCAI · 2022
ForestBench: Equitable Benchmarks for Monitoring, Reporting, and Verification of Nature-Based Solutions with Machine Learning
Newman, Exposito-Alonso, Czech, Dao, Lütjens, Gillespie, Hao, Cottam
Abstract
Six benchmark tasks for monitoring, reporting, and verification of nature-based solutions; protocol-level definitions of fairness, uncertainty, and equitable cost-of-error across forest-carbon ML pipelines.
Workshop
ICML CCAI · 2021
Tackling the Overestimation of Forest Carbon with Deep Learning and Aerial Imagery
Reiersen, Dao, Lütjens, Klemmer, Zhu, Zhang
Abstract
Spotlight. UAV imagery and CNN regression on six Ecuadorian agroforestry sites exposing systematic overestimation in forest-carbon stock claims relative to allometric ground truth.
Workshop
ICLR CCAI · 2020
Xingu: Explaining Critical Geospatial Predictions in Weak Supervision for Climate Finance
Dao, Rausch, Zhang, Rott
Abstract
Saliency and influence-function diagnostics applied to weakly-supervised geospatial classifiers driving climate-finance decisions; case study on Amazonian forest-loss models.
Workshop
ICLR CCAI · 2020
TrueBranch: Metric Learning-based Verification of Forest Conservation Projects
Santamaria, Dao, Lütjens, Zhang
Abstract
Best Proposal Award. Metric-learning verification of forest conservation projects; on-site and off-site ground-truth photographs embedded into a space where authentic claims separate from staged ones.
Workshop
NeurIPS CCAI · 2019
GeoLabels: Towards Efficient Ecosystem Monitoring using Data Programming on Geospatial Information
Dao, Rausch, Zhang
Abstract
Snorkel-style data programming on geospatial features; weak-supervision labelling for ecosystem monitoring at continental scale without per-pixel ground truth.
Workshop
ICML CCAI · 2019
GainForest: Scaling Climate Finance for Forest Conservation using Interpretable Machine Learning on Satellite Imagery
Dao, Cang, Fung, Zhang, Pawlowski, Gonzales, Beglinger, Liu Zhang
Abstract
Founding workshop paper. Interpretable ML on satellite imagery driving smart-contract payouts to forest stewards; the original architecture this entire research programme is built on.
Workshop
Medium · 2018
Decentralized Sustainability: Beyond the Tragedy of the Commons with Smart Contracts and AI
David Dao
Abstract
The founding essay: satellite-driven forest-loss prediction wired to a smart-contract escrow that pays steward addresses directly, demoed at the 2017 UN Climate Change Hackathon.
Essay

Open infrastructure

ATProto lexicons for nature data.

Co-authored with the Hypercerts community and shipped as five reusable layers. Every lexicon, package, and service below is open source and operable end-to-end on your own PDS.

Models & datasets

Open models, open datasets.

Every artefact behind the papers above is downloadable today. Trained weights and datasets are on Hugging Face, benchmark suites and field pipelines are on GitHub, and the assistant runs on community-owned PDS infrastructure.

The frame

The theoretical frame: regenerative intelligence.

How the papers, datasets, and protocols above fit one frame: AI as a tool for repairing the commons, governed by the people closest to the land.


ATProto lexicons for nature data.

org.hypercerts.* lexicons

Hypersphere PDS

Hyperindex

Hyperlabel

Hyperscan

Open models, open datasets.

OAM-TCD

GEO-Bench

ReforesTree

BiodivX agents

Taina
