October 11, 2022 | Must Reads

Earthmover: The Modern Data Stack for Climate Data

Written by Tony Liu

Source: NASA

Ryan and Joe are not your usual founders. Ryan is a tenured professor who runs an oceanography lab at Columbia University. Joe was a scientist at NCAR and a co-founder and Technology Director at CarbonPlan. They were not founders in search of a problem, but practitioners who, in the course of their climate research, kept running into the same nagging problem: working with big scientific data is a nightmare.

To enable their research, Ryan and Joe had to put on their software engineering hats and build the infrastructure to work with climate data flexibly and at scale. In the process, they became core developers of the open source projects Zarr and Xarray and spearheaded Pangeo, which is now an eclectic and vibrant community of researchers and developers working together on big data geoscience.

Over time, they were consumed by the foundational work they were doing for the Pangeo community to democratize scientific data analysis. In their words, “Our side thing became our main thing.” Founding Earthmover was the natural evolution of this work. 

With Earthmover, Ryan and Joe want to lower the barrier to entry for organizations, researchers, and developers who want to work on climate-related problems, which often involve analyzing large amounts of geospatial data. The thesis is simple. The company’s mission is to build an easy-to-use, cloud-native, enterprise-grade data platform that offers underserved users a familiar UX and makes it easy to consolidate data in a cloud-native data warehouse or lakehouse.

Sound familiar?

The “Modern Data Stack” (MDS) does just that, and it has transformed the way many new companies set up their data infrastructure. It is geared towards data in tabular format, which most businesses use. However, climate researchers are, more often than not, working with labeled multi-dimensional arrays, and this type of data is not compatible with the MDS. No equivalent of the ecosystem centered on Snowflake, dbt, and Fivetran exists for climate data.

Earthmover has ambitious plans to build out that entire ecosystem for scientific data, starting with the ArrayLake that will serve as the foundation for the platform. As the name suggests, it is inspired by the data lakehouse model that Databricks popularized. With the ArrayLake, researchers and developers will be able to work with the most important climate data sources flexibly.

We are very excited by Ryan and Joe’s vision to bring the scientific community the same delightful experience that the tabular data world has been able to enjoy over the past few years. Moreover, we are thrilled by their shared passion to create the data infrastructure to solve some of the biggest challenges that the planet is facing.

