Quick introduction

First we need to instantiate the ERDDAP URL constructor for the server we want to query.

In [1]:
from erddapy import ERDDAP


e = ERDDAP(
    server='https://data.ioos.us/gliders/erddap',
    protocol='tabledap',
    response='csv',
)

Now we can populate the object with the constraints, the variables of interest, and the dataset id.

In [2]:
e.dataset_id = 'whoi_406-20160902T1700'

e.constraints = {
    'time>=': '2016-07-10T00:00:00Z',
    'time<=': '2017-02-10T00:00:00Z',
    'latitude>=': 38.0,
    'latitude<=': 41.0,
    'longitude>=': -72.0,
    'longitude<=': -69.0,
}

e.variables = [
    'depth',
    'latitude',
    'longitude',
    'salinity',
    'temperature',
    'time',
]


url = e.get_download_url()

print(url)
https://data.ioos.us/gliders/erddap/tabledap/whoi_406-20160902T1700.csv?depth,latitude,longitude,salinity,temperature,time&time>=1468108800.0&time<=1486684800.0&latitude>=38.0&latitude<=41.0&longitude>=-72.0&longitude<=-69.0
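Note that erddapy converted the ISO-8601 times from the constraints into seconds since the Unix epoch in the URL. We can verify the start time with the standard library (a quick check, not part of erddapy):

```python
from datetime import datetime, timezone

# 2016-07-10T00:00:00Z expressed as seconds since 1970-01-01T00:00:00Z,
# matching the time>=1468108800.0 constraint in the URL above.
start = datetime(2016, 7, 10, tzinfo=timezone.utc)
print(start.timestamp())  # 1468108800.0
```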
In [3]:
import pandas as pd


df = e.to_pandas(
    index_col='time (UTC)',
    parse_dates=True,
).dropna()

df.head()
Out[3]:
depth latitude longitude salinity temperature
time
2016-09-03 20:15:46 5.35 40.990881 -71.12439 32.245422 20.6620
2016-09-03 20:15:46 6.09 40.990881 -71.12439 32.223183 20.6512
2016-09-03 20:15:46 6.72 40.990881 -71.12439 32.237950 20.6047
2016-09-03 20:15:46 7.37 40.990881 -71.12439 32.235470 20.5843
2016-09-03 20:15:46 8.43 40.990881 -71.12439 32.224503 20.5691
In [4]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(17, 2))
cs = ax.scatter(df.index, df['depth (m)'], s=15,
                c=df['temperature (Celcius)'], marker='o', edgecolor='none')

ax.invert_yaxis()
ax.set_xlim(df.index[0], df.index[-1])
xfmt = mdates.DateFormatter('%H:%Mh\n%d-%b')
ax.xaxis.set_major_formatter(xfmt)

cbar = fig.colorbar(cs, orientation='vertical', extend='both')
cbar.ax.set_ylabel(r'Temperature ($^\circ$C)')
ax.set_ylabel('Depth (m)');
_images/quick_intro_6_0.png

Longer introduction

First we need to instantiate the ERDDAP URL constructor for a server. In this example we will use https://data.ioos.us/gliders/erddap.

In [5]:
from erddapy import ERDDAP


e = ERDDAP(server='https://data.ioos.us/gliders/erddap')

What methods/attributes are available?

In [6]:
[method for method in dir(e) if not method.startswith('_')]
Out[6]:
['constraints',
 'dataset_id',
 'get_download_url',
 'get_info_url',
 'get_search_url',
 'get_var_by_attr',
 'params',
 'protocol',
 'requests_kwargs',
 'response',
 'server',
 'to_pandas',
 'to_xarray',
 'variables']

All the get_* methods will return a valid ERDDAP URL for the requested response and options. erddapy will raise an error if the URL header cannot be validated.

In [7]:
print(e.get_search_url(search_for='all'))
https://data.ioos.us/gliders/erddap/search/advanced.html?page=1&itemsPerPage=1000&protocol=(ANY)&cdm_data_type=(ANY)&institution=(ANY)&ioos_category=(ANY)&keywords=(ANY)&long_name=(ANY)&standard_name=(ANY)&variableName=(ANY)&minLon=(ANY)&maxLon=(ANY)&minLat=(ANY)&maxLat=(ANY)&minTime=(ANY)&maxTime=(ANY)&searchFor=all

There are many response formats available; see the ERDDAP documentation for griddap and tabledap. The most useful ones for Pythonistas are .csv and .nc, which can be read with pandas and netCDF4-python respectively.

Let’s load the csv response directly with pandas.

In [8]:
import pandas as pd


df = pd.read_csv(e.get_search_url(response='csv', search_for='all'))
In [9]:
'We have {} tabledap, {} griddap, and {} wms endpoints.'.format(
    len(set(df['tabledap'].dropna())),
    len(set(df['griddap'].dropna())),
    len(set(df['wms'].dropna()))
)
Out[9]:
'We have 434 tabledap, 0 griddap, and 0 wms endpoints.'

We can refine our search by providing some constraints.

In [10]:
def show_iframe(src):
    """Helper function to show HTML returns."""
    from IPython.display import HTML
    iframe = '<iframe src="{src}" width="100%" height="950"></iframe>'.format
    return HTML(iframe(src=src))

Let’s narrow the search area, time span, and look for sea_water_temperature only.

In [11]:
kw = {
    'standard_name': 'sea_water_temperature',
    'min_lon': -72.0,
    'max_lon': -69.0,
    'min_lat': 38.0,
    'max_lat': 41.0,
    'min_time': '2016-07-10T00:00:00Z',
    'max_time': '2017-02-10T00:00:00Z',
    'cdm_data_type': 'trajectoryprofile'
}

search_url = e.get_search_url(response='html', **kw)

show_iframe(search_url)
Out[11]:

We can see that the search form above was correctly populated with the constraints we provided.

Let us change the response from .html to .csv so we can load it into a pandas.DataFrame and inspect which Dataset IDs are available for download.

In [12]:
search_url = e.get_search_url(response='csv', **kw)
search = pd.read_csv(search_url)
gliders = search['Dataset ID'].values

print('Found {} Glider Datasets:\n{}'.format(len(gliders), '\n'.join(gliders)))
Found 17 Glider Datasets:
allrutgersGliders
blue-20160818T1448
cp_335-20170116T1459
cp_336-20161011T0027
cp_336-20170116T1254
cp_340-20160809T0230
cp_374-20160529T0035
cp_374-20161011T0106
cp_376-20160527T2050
cp_379-20170116T1246
cp_380-20161011T2046
cp_387-20160404T1858
cp_388-20160809T1409
cp_389-20161011T2040
silbo-20160413T1534
sp022-20170209T1616
whoi_406-20160902T1700

Now that we know the Dataset IDs we can explore their metadata with the get_info_url method.

In [13]:
info_url = e.get_info_url(dataset_id=gliders[0], response='html')

show_iframe(src=info_url)
Out[13]:

Again, with the csv response, we can manipulate the metadata and find, for example, the value of the cdm_profile_variables attribute.

In [14]:
info_url = e.get_info_url(dataset_id=gliders[0], response='csv')

info = pd.read_csv(info_url)

info.head()
Out[14]:
Row Type Variable Name Attribute Name Data Type Value
0 attribute NC_GLOBAL acknowledgment String This deployment supported by National Science ...
1 attribute NC_GLOBAL cdm_data_type String TrajectoryProfile
2 attribute NC_GLOBAL cdm_profile_variables String profile_id, time, latitude, longitude, time_uv...
3 attribute NC_GLOBAL cdm_trajectory_variables String trajectory, wmo_id
4 attribute NC_GLOBAL comment String Glider operated by the Rutgers University Cent...
In [15]:
''.join(info.loc[info['Attribute Name'] == 'cdm_profile_variables', 'Value'])
Out[15]:
'profile_id, time, latitude, longitude, time_uv, lat_uv, lon_uv, u, v'

Selecting variables by their attributes is such a common operation that erddapy brings its own method to simplify this task.

The get_var_by_attr method is inspired by netCDF4-python’s get_variables_by_attributes; however, because erddapy operates on remote servers, it returns the variable names instead of the actual variables.

Here we check what is/are the variable(s) associated with the standard_name used in the search.

Note that get_var_by_attr caches the last response in case the user needs to make multiple requests for the same dataset, but it will lose its state when a different dataset is requested.

(See the execution times below.)
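The single-entry caching described above can be sketched as follows (a hypothetical illustration, not erddapy’s actual implementation):

```python
def make_single_entry_cache(fetch_info):
    """Cache only the most recent dataset's info; a new dataset_id evicts it."""
    state = {'dataset_id': None, 'info': None}

    def get_info(dataset_id):
        if dataset_id != state['dataset_id']:  # cache miss -> expensive fetch
            state['dataset_id'] = dataset_id
            state['info'] = fetch_info(dataset_id)
        return state['info']

    return get_info


calls = []
cached = make_single_entry_cache(lambda ds: calls.append(ds) or 'info for {}'.format(ds))
cached('whoi_406-20160902T1700')   # slow: fetches
cached('whoi_406-20160902T1700')   # fast: served from the cache
cached('cp_336-20170116T1254')     # slow again: the previous entry is evicted
print(calls)  # ['whoi_406-20160902T1700', 'cp_336-20170116T1254']
```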

In [16]:
%%time

# First one, slow.
e.get_var_by_attr(
    dataset_id='whoi_406-20160902T1700',
    standard_name='sea_water_temperature'
)
CPU times: user 127 ms, sys: 6.14 ms, total: 133 ms
Wall time: 1.79 s
Out[16]:
['temperature']
In [17]:
%%time

# Second one on the same glider, a little bit faster.
e.get_var_by_attr(
    dataset_id='whoi_406-20160902T1700',
    standard_name='sea_water_practical_salinity'
)
CPU times: user 25.3 ms, sys: 3.07 ms, total: 28.4 ms
Wall time: 697 ms
Out[17]:
['salinity']
In [18]:
%%time

# New one, slow.
e.get_var_by_attr(
    dataset_id='cp_336-20170116T1254',
    standard_name='sea_water_practical_salinity'
)
CPU times: user 137 ms, sys: 5.3 ms, total: 142 ms
Wall time: 1.64 s
Out[18]:
['salinity']

With Python it is easy to loop over all the dataset_ids and fetch the variables that have a standard_name attribute.

In [19]:
variables = [
    e.get_var_by_attr(
        dataset_id=glider,
        standard_name=lambda v: v is not None
    )
    for glider in gliders
]

We can then build the set of variables common to all those dataset_ids.

In [20]:
common_variables = set(variables[0]).intersection(*variables[1:])

common_variables
Out[20]:
{'conductivity',
 'conductivity_qc',
 'density',
 'density_qc',
 'depth',
 'depth_qc',
 'lat_uv',
 'lat_uv_qc',
 'latitude',
 'latitude_qc',
 'lon_uv',
 'lon_uv_qc',
 'longitude',
 'longitude_qc',
 'precise_lat',
 'precise_lon',
 'precise_time',
 'precise_time_qc',
 'pressure',
 'pressure_qc',
 'salinity',
 'salinity_qc',
 'temperature',
 'temperature_qc',
 'time',
 'time_qc',
 'time_uv',
 'time_uv_qc',
 'u',
 'u_qc',
 'v',
 'v_qc'}

Last, but not least, the download endpoint!

It is important to note that the download constraints use the dataset’s variable names, not the standard_names we used in the get_search_url method.

In [21]:
constraints = {
    'longitude>=': kw['min_lon'],
    'longitude<=': kw['max_lon'],
    'latitude>=': kw['min_lat'],
    'latitude<=': kw['max_lat'],
    'time>=': kw['min_time'],
    'time<=': kw['max_time'],
}



download_url = e.get_download_url(
    dataset_id=gliders[0],
    protocol='tabledap',
    variables=common_variables,
    constraints=constraints
)

print(download_url)
https://data.ioos.us/gliders/erddap/tabledap/allrutgersGliders.html?conductivity,latitude,density_qc,u_qc,temperature_qc,lon_uv_qc,depth,precise_time,longitude,salinity,lat_uv_qc,time_uv,time,salinity_qc,depth_qc,longitude_qc,precise_time_qc,time_uv_qc,v_qc,latitude_qc,pressure_qc,precise_lat,lon_uv,pressure,precise_lon,v,conductivity_qc,density,time_qc,lat_uv,u,temperature&longitude>=-72.0&longitude<=-69.0&latitude>=38.0&latitude<=41.0&time>=1468108800.0&time<=1486684800.0

Putting everything in DataFrames.

In [22]:
from requests.exceptions import HTTPError


def download_csv(url):
    return pd.read_csv(
        url,
        index_col='time (UTC)',
        parse_dates=True,
    )


dfs = {}
for glider in gliders:
    try:
        download_url = e.get_download_url(
            dataset_id=glider,
            protocol='tabledap',
            variables=common_variables,
            response='csv',
            constraints=constraints
        )
    except HTTPError:
        print('Failed to download {}'.format(glider))
        continue
    dfs.update({glider: download_csv(download_url)})
Failed to download allrutgersGliders
Failed to download silbo-20160413T1534
Failed to download sp022-20170209T1616

The glider datasets should come already masked, but we found that is not the case. The cell below applies the mask described by the data QC flags.

In [23]:
import numpy as np


for glider in dfs.keys():
    dfs[glider].loc[dfs[glider]['salinity_qc'] == 9, 'salinity'] = np.nan
    dfs[glider].loc[dfs[glider]['pressure_qc'] == 9, 'pressure'] = np.nan
    dfs[glider].loc[dfs[glider]['temperature_qc'] == 9, 'temperature'] = np.nan
    dfs[glider].loc[dfs[glider]['salinity'] <= 0, 'salinity'] = np.nan

Finally let’s see some figures!

In [24]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter


def make_map(extent):
    fig, ax = plt.subplots(
        figsize=(9, 9),
        subplot_kw=dict(projection=ccrs.PlateCarree())
    )
    ax.coastlines(resolution='10m')
    ax.set_extent(extent)

    ax.set_xticks([extent[0], extent[1]], crs=ccrs.PlateCarree())
    ax.set_yticks([extent[2], extent[3]], crs=ccrs.PlateCarree())
    lon_formatter = LongitudeFormatter(zero_direction_label=True)
    lat_formatter = LatitudeFormatter()
    ax.xaxis.set_major_formatter(lon_formatter)
    ax.yaxis.set_major_formatter(lat_formatter)

    return fig, ax


dx = dy = 0.5
extent = kw['min_lon']-dx, kw['max_lon']+dx, kw['min_lat']-dy, kw['max_lat']+dy

fig, ax = make_map(extent)
for glider, df in dfs.items():
    ax.plot(df['longitude'], df['latitude'], label=glider)

leg = ax.legend()
_images/quick_intro_42_0.png
In [25]:
def glider_scatter(df, ax, glider):
    ax.scatter(df['temperature'], df['salinity'],
               s=10, alpha=0.5, label=glider)

fig, ax = plt.subplots(figsize=(9, 9))
ax.set_ylabel('salinity')
ax.set_xlabel('temperature')
ax.grid(True)

for glider, df in dfs.items():
    glider_scatter(df, ax, glider)

leg = ax.legend()
_images/quick_intro_43_0.png

Extras

OPeNDAP response

In [26]:
e.constraints = None
e.protocol = 'tabledap'

opendap_url = e.get_download_url(
    dataset_id='whoi_406-20160902T1700',
    response='opendap',
)

print(opendap_url)
https://data.ioos.us/gliders/erddap/tabledap/whoi_406-20160902T1700
In [27]:
from netCDF4 import Dataset


with Dataset(opendap_url) as nc:
    print(nc.summary)
Slocum glider dataset gathered as part of the TEMPESTS (The Experiment to Measure and Predict East coast STorm Strength), funded by NOAA through CINAR (Cooperative Institute for the North Atlantic Region).

netCDF “file-like” to xarray

Under the hood to_xarray saves the response to a temporary netCDF file and opens it with open_dataset, so be careful with the constraints to avoid downloading several gigabytes!

In [28]:
e.dataset_id = 'cp_336-20170116T1254'
e.response = 'nc'
e.variables = common_variables
e.constraints = constraints

download_url = e.get_download_url()
In [29]:
import requests


def humansize(nbytes):
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
    k = 0
    while nbytes >= 1024 and k < len(suffixes)-1:
        nbytes /= 1024.
        k += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[k])

r = requests.head(download_url)
nbytes = float(r.headers['Content-Length'])
humansize(nbytes)
Out[29]:
'604.09 KB'
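As a quick offline sanity check, the humansize helper behaves as expected (the function is repeated here so the snippet runs standalone):

```python
def humansize(nbytes):
    # Same helper as above, repeated so this snippet is self-contained.
    suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
    k = 0
    while nbytes >= 1024 and k < len(suffixes) - 1:
        nbytes /= 1024.
        k += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[k])


print(humansize(512))            # 512 B
print(humansize(604.09 * 1024))  # 604.09 KB
```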

That is the uncompressed size; we are OK because the actual download will be smaller than that, since ERDDAP streams gzip’ed data.

In [30]:
r.headers['Content-Encoding']
Out[30]:
'gzip'
In [31]:
ds = e.to_xarray(decode_times=False)

ds
Out[31]:
<xarray.Dataset>
Dimensions:          (row: 16232)
Coordinates:
    latitude         (row) float64 ...
    depth            (row) float32 ...
    longitude        (row) float64 ...
    time_uv          (row) float64 ...
    time             (row) float64 ...
    lon_uv           (row) float64 ...
    lat_uv           (row) float64 ...
Dimensions without coordinates: row
Data variables:
    conductivity     (row) float32 ...
    density_qc       (row) float32 ...
    u_qc             (row) float32 ...
    temperature_qc   (row) float32 ...
    lon_uv_qc        (row) float32 ...
    precise_time     (row) float64 ...
    salinity         (row) float32 ...
    lat_uv_qc        (row) float32 ...
    salinity_qc      (row) float32 ...
    depth_qc         (row) float32 ...
    longitude_qc     (row) float32 ...
    precise_time_qc  (row) float32 ...
    time_uv_qc       (row) float32 ...
    v_qc             (row) float32 ...
    latitude_qc      (row) float32 ...
    pressure_qc      (row) float32 ...
    precise_lat      (row) float64 ...
    pressure         (row) float64 ...
    precise_lon      (row) float64 ...
    v                (row) float64 ...
    conductivity_qc  (row) float32 ...
    density          (row) float32 ...
    time_qc          (row) float32 ...
    u                (row) float64 ...
    temperature      (row) float32 ...
Attributes:
    acknowledgement:               Funding provided by the National Science F...
    cdm_data_type:                 TrajectoryProfile
    cdm_profile_variables:         profile_id, time, latitude, longitude, tim...
    cdm_trajectory_variables:      trajectory, wmo_id
    contributor_name:              Paul Matthias,Peter Brickley,Sheri White,D...
    contributor_role:              CGSN Program Manager,CGSN Operations Engin...
    Conventions:                   Unidata Dataset Discovery v1.0, COARDS, CF...
    creator_email:                 kerfoot@marine.rutgers.edu
    creator_name:                  John Kerfoot
    creator_url:                   http://rucool.marine.rutgers.edu
    date_created:                  2017-04-19T14:33:41Z
    date_issued:                   2017-04-19T14:33:41Z
    date_modified:                 2017-04-19T14:33:41Z
    deployment_number:             4
    Easternmost_Easting:           -69.98303682074565
    featureType:                   TrajectoryProfile
    format_version:                https://github.com/ioos/ioosngdac/tree/mas...
    geospatial_lat_max:            39.91726417227544
    geospatial_lat_min:            39.32370673037986
    geospatial_lat_units:          degrees_north
    geospatial_lon_max:            -69.98303682074565
    geospatial_lon_min:            -71.18259602604894
    geospatial_lon_units:          degrees_east
    geospatial_vertical_max:       976.756
    geospatial_vertical_min:       -0.03969577
    geospatial_vertical_positive:  down
    geospatial_vertical_units:     m
    history:                       2017-04-19T14:33:35Z: Data Source /Users/k...
    id:                            cp_336-20170116T1254_e698_69e3_e309
    infoUrl:                       http://data.ioos.us/gliders/erddap/
    institution:                   Ocean Observatories Initiative
    ioos_dac_checksum:             f42b729c0bf19af1b7229b21350ebaaf
    ioos_dac_completed:            False
    keywords:                      AUVS > Autonomous Underwater Vehicles, Oce...
    keywords_vocabulary:           GCMD Science Keywords
    license:                       All OOI data including data from OOI core ...
    Metadata_Conventions:          Unidata Dataset Discovery v1.0, COARDS, CF...
    metadata_link:                 http://ooi.visualocean.net/sites/view/CP05...
    naming_authority:              org.oceanobservatories
    Northernmost_Northing:         39.91726417227544
    platform_type:                 Slocum Glider
    processing_level:              Contains any/all of the following: L0 Data...
    project:                       Ocean Observatories Initiative
    publisher_email:               kerfoot@marine.rutgers.edu
    publisher_name:                John Kerfoot
    publisher_url:                 http://rucool.marine.rutgers.edu
    references:                    http://oceanobservatories.org/
    sea_name:                      Mid-Atlantic Bight
    source:                        Observational data from a profiling glider
    sourceUrl:                     (local files)
    Southernmost_Northing:         39.32370673037986
    standard_name_vocabulary:      CF Standard Name Table v27
    subsetVariables:               trajectory, wmo_id, profile_id, time, lati...
    summary:                       The Pioneer Array is located off the coast...
    time_coverage_end:             2017-02-09T23:03:25Z
    time_coverage_start:           2017-01-16T13:03:04Z
    title:                         cp_336-20170116T1254
    Westernmost_Easting:           -71.18259602604894
In [32]:
ds['temperature']
Out[32]:
<xarray.DataArray 'temperature' (row: 16232)>
array([14.3976, 14.4236, 14.4596, ...,  4.4004,  4.3975,  4.3978],
      dtype=float32)
Coordinates:
    latitude   (row) float64 ...
    depth      (row) float32 ...
    longitude  (row) float64 ...
    time_uv    (row) float64 ...
    time       (row) float64 ...
    lon_uv     (row) float64 ...
    lat_uv     (row) float64 ...
Dimensions without coordinates: row
Attributes:
    _ChunkSizes:          1
    actual_range:         [ 0.     17.2652]
    ancillary_variables:  temperature_qc
    colorBarMaximum:      32.0
    colorBarMinimum:      0.0
    instrument:           instrument_ctd
    ioos_category:        Temperature
    long_name:            Sea Water Temperature
    observation_type:     measured
    platform:             platform
    source_variable:      sci_water_temp
    standard_name:        sea_water_temperature
    units:                degree_Celsius
    valid_max:            40.0
    valid_min:            -5.0
In [33]:
data = ds['temperature'].values
depth = ds['depth'].values

mask = ~np.ma.masked_invalid(data).mask
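The ~np.ma.masked_invalid(...).mask idiom above yields a boolean mask selecting the finite values; here is a quick illustration on a toy array (not part of the original notebook):

```python
import numpy as np

toy = np.array([14.4, np.nan, 4.4])

# Invert the "invalid" mask to keep only the finite entries;
# this is equivalent to np.isfinite(toy).
finite = ~np.ma.masked_invalid(toy).mask
print(toy[finite])  # [14.4  4.4]
```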
In [34]:
data = data[mask]
depth = depth[mask]
lon = ds['longitude'].values[mask]
lat = ds['latitude'].values[mask]
In [35]:
import warnings


with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    mask = depth <= 5

data = data[mask]
depth = depth[mask]
lon = lon[mask]
lat = lat[mask]
In [36]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy.crs as ccrs


dx = dy = 1.5
extent = (
    ds.geospatial_lon_min-dx, ds.geospatial_lon_max+dx,
    ds.geospatial_lat_min-dy, ds.geospatial_lat_max+dy
)
fig, ax = make_map(extent)

cs = ax.scatter(lon, lat, c=data, s=50, alpha=0.5, edgecolor='none')
cbar = fig.colorbar(cs, orientation='vertical',
                    fraction=0.1, shrink=0.9, extend='both')
ax.coastlines('10m');
_images/quick_intro_57_0.png