jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.15.2
National Wetlands Inventory
Description
This dataset is a copy of the National Wetlands Inventory , offering the data in more GIS-friendly and cloud-native geospatial formats. The original dataset is distributed as zipped Geodatabase files, and is available for download from here .
Data download
The script below was used to download the data from the National Wetlands Inventory in Geodatabase format from here . The script uses the leafmap Python package.
First, create a conda environment with the required packages:
1 conda create -n gdal python= 3.11
2 conda activate gdal
3 conda install -c conda-forge mamba
4 mamba install -c conda-forge libgdal-arrow-parquet gdal leafmap
1 conda create -n gdal python= 3.11
2 conda activate gdal
3 conda install -c conda-forge mamba
4 mamba install -c conda-forge libgdal-arrow-parquet gdal leafmap
If you are using Google Colab, you can install the packages as follows:
1 # %pip install leafmap lonboard==0.3.0
1 # %pip install leafmap lonboard==0.3.0
Then, run the script below:
1 import leafmap
2 import pandas as pd
3
4 url = 'https://open.gishub.org/data/us/us_states.csv'
5 df = pd.read_csv(url)
6 ids = df['id'].tolist()
7 ids.sort()
8 urls = [f"https://documentst.ecosphere.fws.gov/wetlands/data/State-Downloads/{id}_geodatabase_wetlands.zip" for id in ids]
9 leafmap.download_files(urls, out_dir='.', unzip=True)
1 import leafmap
2 import pandas as pd
3
4 url = 'https://open.gishub.org/data/us/us_states.csv'
5 df = pd.read_csv(url)
6 ids = df['id'].tolist()
7 ids.sort()
8 urls = [f"https://documentst.ecosphere.fws.gov/wetlands/data/State-Downloads/{id}_geodatabase_wetlands.zip" for id in ids]
9 leafmap.download_files(urls, out_dir='.', unzip=True)
Data conversion
The script below was used to convert the data from the original Geodatabase format to Parquet format. The script uses the leafmap Python package.
1 import leafmap
2 import pandas as pd
3
4 url = 'https://open.gishub.org/data/us/us_states.csv'
5 df = pd.read_csv(url)
6 ids = df['id'].tolist()
7
8 for index, state in enumerate(ids):
9 print(f'Processing {state} ({index+1}/{len(ids)})')
10 gdb = f"{state}_geodatabase_wetlands.gdb/"
11 layer_name = f'{state}_Wetlands'
12 leafmap.gdb_to_vector(gdb, ".", gdal_driver="Parquet", layers=[layer_name])
1 import leafmap
2 import pandas as pd
3
4 url = 'https://open.gishub.org/data/us/us_states.csv'
5 df = pd.read_csv(url)
6 ids = df['id'].tolist()
7
8 for index, state in enumerate(ids):
9 print(f'Processing {state} ({index+1}/{len(ids)})')
10 gdb = f"{state}_geodatabase_wetlands.gdb/"
11 layer_name = f'{state}_Wetlands'
12 leafmap.gdb_to_vector(gdb, ".", gdal_driver="Parquet", layers=[layer_name])
The total file size of the Geodatabase files is 32.5 GB. The total file size of the Parquet files is 75.8 GB.
Data access
The script below can be used to access the data using DuckDB . The script uses the duckdb Python package.
1 import duckdb
2
3 con = duckdb.connect()
4 con.install_extension("spatial")
5 con.load_extension("spatial")
6
7 state = "DC" # Change to the US State of your choice
8 url = f"https://data.source.coop/giswqs/nwi/wetlands/{state}_Wetlands.parquet"
9 con.sql(f"SELECT * EXCLUDE geometry, ST_GeomFromWKB(geometry) FROM '{url}'")
1 import duckdb
2
3 con = duckdb.connect()
4 con.install_extension("spatial")
5 con.load_extension("spatial")
6
7 state = "DC" # Change to the US State of your choice
8 url = f"https://data.source.coop/giswqs/nwi/wetlands/{state}_Wetlands.parquet"
9 con.sql(f"SELECT * EXCLUDE geometry, ST_GeomFromWKB(geometry) FROM '{url}'")
Alternatively, you can use the aws cli to access the data directly from the S3 bucket:
1 aws s3 ls s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/
1 aws s3 ls s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/
Data visualization
To visualize the data, you can use the leafmap Python package with the lonboard backend. The script below shows how to visualize the data.
1 import leafmap
2
3 state = "DC" # Change to the US State of your choice
4 url = f"https://data.source.coop/giswqs/nwi/wetlands/{state}_Wetlands.parquet"
5 gdf = leafmap.read_parquet(url, return_type='gdf', src_crs='EPSG:5070', dst_crs='EPSG:4326')
6 leafmap.view_vector(gdf, get_fill_color=[0, 0, 255, 128])
1 import leafmap
2
3 state = "DC" # Change to the US State of your choice
4 url = f"https://data.source.coop/giswqs/nwi/wetlands/{state}_Wetlands.parquet"
5 gdf = leafmap.read_parquet(url, return_type='gdf', src_crs='EPSG:5070', dst_crs='EPSG:4326')
6 leafmap.view_vector(gdf, get_fill_color=[0, 0, 255, 128])
Alternatively, you can specify a color map to visualize the data.
1 color_map = {
2 "Freshwater Forested/Shrub Wetland": (0, 136, 55),
3 "Freshwater Emergent Wetland": (127, 195, 28),
4 "Freshwater Pond": (104, 140, 192),
5 "Estuarine and Marine Wetland": (102, 194, 165),
6 "Riverine": (1, 144, 191),
7 "Lake": (19, 0, 124),
8 "Estuarine and Marine Deepwater": (0, 124, 136),
9 "Other": (178, 134, 86),
10 }
11 leafmap.view_vector(gdf, color_column='WETLAND_TYPE', color_map=color_map, opacity=0.5)
1 color_map = {
2 "Freshwater Forested/Shrub Wetland": (0, 136, 55),
3 "Freshwater Emergent Wetland": (127, 195, 28),
4 "Freshwater Pond": (104, 140, 192),
5 "Estuarine and Marine Wetland": (102, 194, 165),
6 "Riverine": (1, 144, 191),
7 "Lake": (19, 0, 124),
8 "Estuarine and Marine Deepwater": (0, 124, 136),
9 "Other": (178, 134, 86),
10 }
11 leafmap.view_vector(gdf, color_column='WETLAND_TYPE', color_map=color_map, opacity=0.5)
Display a legend for the data.
1 leafmap.Legend(title="Wetland Type", legend_dict=color_map)
1 leafmap.Legend(title="Wetland Type", legend_dict=color_map)
Data analysis
Find out the total number of wetlands in the United States by aggregating the 51 parquet files.
1 con.sql(f"""
2 SELECT COUNT(*) AS Count
3 FROM 's3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet'
4 """)
1 con.sql(f"""
2 SELECT COUNT(*) AS Count
3 FROM 's3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet'
4 """)
Find out the number of wetlands in each state. Note that the NWI datasets do not contain a field for state names. The filename argument can be used to add an extra filename column to the result that indicates which row came from which file.
1 df = con.sql(f"""
2 SELECT filename, COUNT(*) AS Count
3 FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
4 GROUP BY filename
5 ORDER BY COUNT(*) DESC;
6 """).df()
7 df.head()
1 df = con.sql(f"""
2 SELECT filename, COUNT(*) AS Count
3 FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
4 GROUP BY filename
5 ORDER BY COUNT(*) DESC;
6 """).df()
7 df.head()
Inspect the list of filenames.
Create a State column based on the filename column by extracting the state name from the filename.
1 count_df = con.sql(f"""
2 SELECT SUBSTRING(filename, LENGTH(filename) - 18, 2) AS State, COUNT(*) AS Count
3 FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
4 GROUP BY State
5 ORDER BY COUNT(*) DESC;
6 """).df()
7 count_df.head(10)
1 count_df = con.sql(f"""
2 SELECT SUBSTRING(filename, LENGTH(filename) - 18, 2) AS State, COUNT(*) AS Count
3 FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
4 GROUP BY State
5 ORDER BY COUNT(*) DESC;
6 """).df()
7 count_df.head(10)
Create a wetlands table from the DataFrame above.
1 con.sql("CREATE OR REPLACE TABLE wetlands AS FROM count_df")
2 con.sql("FROM wetlands")
1 con.sql("CREATE OR REPLACE TABLE wetlands AS FROM count_df")
2 con.sql("FROM wetlands")
To visualize the data on the map, we need a GeoDataFrame. Let's create a states table from the us_states.parquet file.
1 url = 'https://open.gishub.org/data/us/us_states.parquet'
2 con.sql(
3 f"""
4 CREATE OR REPLACE TABLE states AS
5 SELECT * EXCLUDE geometry, ST_GeomFromWKB(geometry)
6 AS geometry FROM '{url}'
7 """
8 )
9 con.sql("FROM states")
1 url = 'https://open.gishub.org/data/us/us_states.parquet'
2 con.sql(
3 f"""
4 CREATE OR REPLACE TABLE states AS
5 SELECT * EXCLUDE geometry, ST_GeomFromWKB(geometry)
6 AS geometry FROM '{url}'
7 """
8 )
9 con.sql("FROM states")
Join the wetlands table with the states table.
1 con.sql("""
2 SELECT * FROM states INNER JOIN wetlands ON states.id = wetlands.State
3 """)
1 con.sql("""
2 SELECT * FROM states INNER JOIN wetlands ON states.id = wetlands.State
3 """)
Export the joined table as a Pandas DataFrame.
1 df = con.sql("""
2 SELECT name, State, Count, ST_AsText(geometry) as geometry
3 FROM states INNER JOIN wetlands ON states.id = wetlands.State
4 """).df()
5 df.head()
1 df = con.sql("""
2 SELECT name, State, Count, ST_AsText(geometry) as geometry
3 FROM states INNER JOIN wetlands ON states.id = wetlands.State
4 """).df()
5 df.head()
Convert the Pandas DataFrame to a GeoDataFrame.
1 gdf = leafmap.df_to_gdf(df, src_crs="EPSG:4326")
1 gdf = leafmap.df_to_gdf(df, src_crs="EPSG:4326")
Visualize the data on the map.
1 m = leafmap.Map()
2 m.add_data(
3 gdf, column='Count', scheme='Quantiles', cmap='Greens', legend_title='Wetland Count'
4 )
5 m
1 m = leafmap.Map()
2 m.add_data(
3 gdf, column='Count', scheme='Quantiles', cmap='Greens', legend_title='Wetland Count'
4 )
5 m
Create a pie chart to show the percentage of wetlands in each state.
1 leafmap.pie_chart(count_df, 'State', 'Count', height=800, title='Number of Wetlands by State')
1 leafmap.pie_chart(count_df, 'State', 'Count', height=800, title='Number of Wetlands by State')
Create a bar chart to show the number of wetlands in each state.
1 leafmap.bar_chart(count_df, 'State', 'Count', title='Number of Wetlands by State')
1 leafmap.bar_chart(count_df, 'State', 'Count', title='Number of Wetlands by State')
Calculate the total area of wetlands in the United States. It takes about 3 minutes to run this query. Please be patient.
1 con.sql(f"""
2 SELECT SUM(Shape_Area) / 1000000 AS Area_SqKm
3 FROM 's3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet'
4 """)
1 con.sql(f"""
2 SELECT SUM(Shape_Area) / 1000000 AS Area_SqKm
3 FROM 's3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet'
4 """)
Calculate the total area of wetlands in each state. It takes about 3 minutes to run this query. Please be patient.
1 area_df = con.sql(f"""
2 SELECT SUBSTRING(filename, LENGTH(filename) - 18, 2) AS State, SUM(Shape_Area) / 1000000 AS Area_SqKm
3 FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
4 GROUP BY State
5 ORDER BY COUNT(*) DESC;
6 """).df()
7 area_df.head(10)
1 area_df = con.sql(f"""
2 SELECT SUBSTRING(filename, LENGTH(filename) - 18, 2) AS State, SUM(Shape_Area) / 1000000 AS Area_SqKm
3 FROM read_parquet('s3://us-west-2.opendata.source.coop/giswqs/nwi/wetlands/*.parquet', filename=true)
4 GROUP BY State
5 ORDER BY COUNT(*) DESC;
6 """).df()
7 area_df.head(10)
Create a pie chart to show the percentage of wetlands in each state.
1 leafmap.pie_chart(area_df, 'State', 'Area_SqKm', height=900, title='Wetland Area by State')
1 leafmap.pie_chart(area_df, 'State', 'Area_SqKm', height=900, title='Wetland Area by State')
Create a bar chart to show the wetland area in each state.
1 leafmap.bar_chart(area_df, 'State', 'Area_SqKm', title='Wetland Area by State')
1 leafmap.bar_chart(area_df, 'State', 'Area_SqKm', title='Wetland Area by State')