How to use cf-pandas#
The main use of cf-pandas currently is selecting columns of a pandas DataFrame: columns that represent the axes or coordinates of the dataset, and columns that hold a particular variable. Selection is done through the accessor, using a custom vocabulary whose regular expressions are matched against column names. Some other capabilities have also been ported over from cf-xarray. Several classes and utilities support this functionality; they are used internally but are also helpful for other packages.
import cf_pandas as cfp
import pandas as pd
Get some data#
# Some data
url = "https://files.stage.platforms.axds.co/axiom/netcdf_harvest/basis/2013/BE2013_/data.csv.gz"
df = pd.read_csv(url)
df
 | time | longitude | latitude | z | profile | temperature | pressure | salinity | chlorophyll_a | conductivity | distance | segment |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2013-08-07T22:26:00 | -168.01784 | 65.409500 | 0.0 | 0 | 10.3291 | 0.0 | 30.7286 | NaN | NaN | 0.00 | 0 |
1 | 2013-08-07T22:26:00 | -168.01784 | 65.409500 | 66.0 | 0 | NaN | NaN | NaN | NaN | NaN | 0.00 | 0 |
2 | 2013-08-07T22:26:00 | -168.01784 | 65.409500 | 65.0 | 0 | NaN | NaN | NaN | NaN | NaN | 0.00 | 0 |
3 | 2013-08-07T22:26:00 | -168.01784 | 65.409500 | 64.0 | 0 | NaN | NaN | NaN | NaN | NaN | 0.00 | 0 |
4 | 2013-08-07T22:26:00 | -168.01784 | 65.409500 | 63.0 | 0 | NaN | NaN | NaN | NaN | NaN | 0.00 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12735 | 2013-09-24T22:59:00 | -168.01384 | 60.516167 | 25.0 | 139 | NaN | NaN | NaN | NaN | NaN | 15575752.91 | 0 |
12736 | 2013-09-24T22:59:00 | -168.01384 | 60.516167 | 24.0 | 139 | NaN | NaN | NaN | NaN | NaN | 15575752.91 | 0 |
12737 | 2013-09-24T22:59:00 | -168.01384 | 60.516167 | 23.0 | 139 | NaN | NaN | NaN | NaN | NaN | 15575752.91 | 0 |
12738 | 2013-09-24T22:59:00 | -168.01384 | 60.516167 | 32.0 | 139 | NaN | NaN | NaN | NaN | NaN | 15575752.91 | 0 |
12739 | 2013-09-24T22:59:00 | -168.01384 | 60.516167 | 90.0 | 139 | NaN | NaN | NaN | NaN | NaN | 15575752.91 | 0 |
12740 rows × 12 columns
Basic accessor usage#
The terminology all comes from cf-xarray, which deals with multi-dimensional data and has more layers of standardized attributes. This package ports over the useful functionality and retains some of the terminology and syntax from cf-xarray, even where it doesn't always apply to tabular data. The aim is to be able to think about and use DataFrames in a similar manner to Datasets of data or model output.
When you use the cf-pandas accessor, it first validates the object by checking that columns representing time, latitude, and longitude are present and identifiable.
Using an approach copied directly from cf-xarray, cf-pandas contains a mapping of names from the CF conventions that define the axes (“T”, “Z”, “Y”, “X”) and coordinates (“time”, “vertical”, “latitude”, “longitude”). These are built in and used to identify columns containing axes and coordinates using name matching (column names are split by white space for the comparison).
Check axes and coordinates mappings of the dataset:
df.cf.axes, df.cf.coordinates
({'Z': ['z'], 'T': ['time']},
{'longitude': ['longitude'], 'latitude': ['latitude'], 'time': ['time']})
Check all available keys:
df.cf.keys()
{'T', 'Z', 'latitude', 'longitude', 'time'}
Is a certain key in the DataFrame?
"T" in df.cf, "X" in df.cf
(True, False)
What CF standard names can be identified with strict matching in the column names? Column names will be split by white space for this comparison.
df.cf.standard_names
{'latitude': ['latitude'], 'longitude': ['longitude'], 'time': ['time']}
Select variable#
Selecting a variable typically requires knowing the name of the column that represents it. The approach demonstrated here instead selects the column by matching column names against regular expressions that the user defines for each variable. cf-pandas provides helpers for this process; see the Reg, Vocab, and widget classes, and the sections below, for more information.
Create custom vocabulary#
More information about custom vocabularies and using the Vocab class is available here: https://cf-pandas.readthedocs.io/en/latest/demo_vocab.html
You can write the regular expressions for your vocabulary by hand or use the Reg class in cf-pandas to build them.
# initialize class
vocab = cfp.Vocab()
# define a regular expression to represent your variable
reg = cfp.Reg(include="salinity", exclude="soil", exclude_end="_qc")
# Make an entry to add to your vocabulary
vocab.make_entry("salt", reg.pattern(), attr="standard_name")
# Add another entry to vocab
vocab.make_entry("temp", "temp")
vocab
{'salt': {'standard_name': '(?i)^(?!.*(soil))(?!.*(_qc)$)(?=.*salinity)'}, 'temp': {'standard_name': 'temp'}}
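To see what the generated pattern accepts and rejects, it can be tried directly with Python's standard `re` module. The sketch below copies the pattern string from the output above; the helper `matches` and the candidate names are made up for illustration.

```python
import re

# Pattern copied verbatim from the vocabulary output above.
SALT_PATTERN = r"(?i)^(?!.*(soil))(?!.*(_qc)$)(?=.*salinity)"


def matches(pattern, name):
    """Return True if the pattern matches anywhere in the name."""
    return re.search(pattern, name) is not None


# Hypothetical column names chosen to exercise each part of the pattern.
candidates = [
    "salinity",            # contains "salinity"
    "Sea_Water_Salinity",  # (?i) makes the match case-insensitive
    "soil_salinity",       # rejected: contains the excluded word "soil"
    "salinity_qc",         # rejected: ends with the excluded suffix "_qc"
    "temperature",         # rejected: does not contain "salinity"
]
for name in candidates:
    print(name, matches(SALT_PATTERN, name))
```

The two negative lookaheads implement `exclude` and `exclude_end`, and the positive lookahead implements `include`, so all three conditions must hold for a name to match.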
Access variable#
Refer to the column of data you want by the nickname described in your custom vocabulary.
You can do this with a context manager, especially if you are using more than one vocabulary:
with cfp.set_options(custom_criteria=vocab.vocab):
    print(df.cf["salt"])
0 30.7286
1 NaN
2 NaN
3 NaN
4 NaN
...
12735 NaN
12736 NaN
12737 NaN
12738 NaN
12739 NaN
Name: salinity, Length: 12740, dtype: float64
Or you can set one for use generally in this kernel:
cfp.set_options(custom_criteria=vocab.vocab)
df.cf["salt"]
0 30.7286
1 NaN
2 NaN
3 NaN
4 NaN
...
12735 NaN
12736 NaN
12737 NaN
12738 NaN
12739 NaN
Name: salinity, Length: 12740, dtype: float64
Display mapping of all variables in the dataset that can be identified using the custom criteria/vocab we defined above:
df.cf.custom_keys
{'salt': ['salinity'], 'temp': ['temperature']}
Other utilities#
Access all CF Standard Names#
sn = cfp.standard_names()
sn[:5]
['acoustic_signal_roundtrip_travel_time_in_sea_water',
'aerodynamic_particle_diameter',
'aerodynamic_resistance',
'age_of_sea_ice',
'age_of_stratospheric_air']
Use vocabulary to match any list#
This is the logic under the hood of the cf-pandas accessor that selects which column matches a variable nickname according to the custom vocabulary. It comes from cf-xarray almost exactly. It is available as a separate function because it is useful in other scenarios too. Here we filter the standard names just found by our custom vocabulary from above.
cfp.match_criteria_key(sn, "salt", vocab.vocab)
['sea_water_practical_salinity_at_sea_floor',
'tendency_of_sea_water_salinity',
'sea_water_absolute_salinity',
'tendency_of_sea_water_salinity_expressed_as_salt_content',
'change_over_time_in_sea_water_preformed_salinity',
'tendency_of_sea_water_salinity_due_to_vertical_mixing',
'tendency_of_sea_water_salinity_due_to_sea_ice_thermodynamics',
'sea_water_salinity',
'tendency_of_sea_water_salinity_expressed_as_salt_content_due_to_parameterized_submesoscale_eddy_advection',
'square_of_sea_surface_salinity',
'sea_water_cox_salinity',
'integral_wrt_depth_of_product_of_salinity_and_sea_water_density',
'sea_water_practical_salinity',
'tendency_of_sea_water_salinity_expressed_as_salt_content_due_to_parameterized_eddy_advection',
'tendency_of_sea_water_salinity_due_to_horizontal_mixing',
'tendency_of_sea_water_salinity_expressed_as_salt_content_due_to_parameterized_mesoscale_eddy_advection',
'integral_wrt_depth_of_sea_water_practical_salinity',
'tendency_of_sea_water_salinity_expressed_as_salt_content_due_to_parameterized_mesoscale_eddy_diffusion',
'sea_surface_salinity',
'change_over_time_in_sea_water_absolute_salinity',
'tendency_of_sea_water_salinity_due_to_parameterized_eddy_advection',
'ratio_of_sea_water_practical_salinity_anomaly_to_relaxation_timescale',
'tendency_of_sea_water_salinity_expressed_as_salt_content_due_to_parameterized_dianeutral_mixing',
'product_of_eastward_sea_water_velocity_and_salinity',
'product_of_northward_sea_water_velocity_and_salinity',
'tendency_of_sea_water_salinity_expressed_as_salt_content_due_to_residual_mean_advection',
'sea_water_salinity_at_sea_floor',
'tendency_of_sea_water_salinity_due_to_advection',
'sea_water_reference_salinity',
'change_over_time_in_sea_water_practical_salinity',
'sea_water_knudsen_salinity',
'sea_water_preformed_salinity',
'change_over_time_in_sea_water_salinity',
'sea_ice_salinity']
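The same kind of filtering can be sketched with the standard library alone. This is a simplified stand-in, not the code behind `cfp.match_criteria_key`: the function `filter_by_vocab` is hypothetical, and the use of `re.match` (which anchors at the start of each string) is an assumption about how patterns are applied.

```python
import re


def filter_by_vocab(values, key, vocab):
    """Return the values that match any pattern stored under vocab[key].

    Assumes the vocabulary has the shape shown earlier:
    {nickname: {attribute: regular_expression}}.
    """
    patterns = list(vocab[key].values())
    return [v for v in values if any(re.match(p, v) for p in patterns)]


# Vocabulary contents copied from the output shown earlier.
vocab = {
    "salt": {"standard_name": r"(?i)^(?!.*(soil))(?!.*(_qc)$)(?=.*salinity)"},
    "temp": {"standard_name": "temp"},
}

# A few standard names taken from the listings above.
names = [
    "sea_water_salinity",
    "sea_ice_salinity",
    "age_of_sea_ice",
    "aerodynamic_resistance",
]
print(filter_by_vocab(names, "salt", vocab))
print(filter_by_vocab(["temperature", "salinity"], "temp", vocab))
```

Because the vocabulary is a plain dictionary of regular expressions, it can be applied to any list of strings, not only DataFrame column names.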