Quick Start#
Laboratory#
It all starts in the lab…
In [1]: from lair import uataq
In [2]: lab = uataq.laboratory
The laboratory object is a singleton instance of the Laboratory class, which is initialized with the UATAQ configuration file.
The configuration file is a JSON file which specifies UATAQ site characteristics
including name, location, status, research groups collecting data, and installed instruments.
The Laboratory object contains the following attributes:
sites : A list of site identifiers.
instruments : A list of instrument names.
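As a rough illustration of how a JSON configuration keyed by site identifier could yield these two attributes, consider the sketch below. The key names inside each site entry and the instrument name example_instrument are assumptions made for this example; the real UATAQ configuration file may be laid out differently.

```python
import json

# Hypothetical configuration fragment. Only the idea of "one entry per SID"
# comes from the text; the inner keys are assumptions for illustration.
CONFIG_JSON = """
{
  "wbb": {
    "is_mobile": false,
    "instruments": {"lgr_ugga": {}}
  },
  "trx01": {
    "is_mobile": true,
    "instruments": {"example_instrument": {}}
  }
}
"""

config = json.loads(CONFIG_JSON)

# Site identifiers are the top-level keys of the configuration.
sites = sorted(config)

# Instrument names are the union of the instruments installed at every site.
instruments = sorted(
    {name for site in config.values() for name in site["instruments"]}
)

# The is_mobile flag marks which entries describe mobile sites.
mobile_sites = [sid for sid, site in config.items() if site["is_mobile"]]

print(sites)
print(instruments)
print(mobile_sites)
```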
Research Sites#
The Site object is the primary interface for accessing data from a UATAQ site.
Each site has a unique site identifier (SID) that corresponds to a key in the configuration file.
The lab is responsible for constructing Site objects from the configuration file, including building the InstrumentEnsemble for each site.
The InstrumentEnsemble is a container object that holds the different Instrument objects, which provide the linkage between a Site and the data files.
In [3]: sites = lab.sites # list of sites
In [4]: wbb = lab.get_site('wbb') # site object
The Site object contains the following information as attributes:
SID : The site identifier.
config : A dictionary containing configuration information for the site from the config file.
instruments : An instance of the InstrumentEnsemble class representing the instruments at the site.
groups : The research groups that collect data at the site.
loggers : The loggers used by research groups that record data at a site.
pollutants : The pollutants measured at the site.
There are two primary methods for reading data from a site:
Reading Instrument Data - Data for each instrument at a site is read individually and stored in a dictionary with the instrument name as the key.
Getting Observations - Finalized observations from all instruments at a site are aggregated into a single dataframe.
For convenience, Site.read_data() and Site.get_obs() are also wrapped as uataq.read_data() and uataq.get_obs(), respectively, each with an added SID parameter.
Reading Instrument Data#
Using a Site object, we can read the data from each instrument at the site for a specified processing level (lvl) and time range:
In [5]: data = wbb.read_data(instruments='all', lvl='qaqc', time_range='2024')
The data is returned as a dictionary of pandas dataframes, one for each instrument. The dataframes are indexed by time and have columns for each variable:
In [6]: lgr_ugga = data['lgr_ugga']
In [7]: lgr_ugga.head()
Out[7]:
CH4_ppm CH4_ppm_sd H2O_ppm H2O_ppm_sd CO2_ppm ... Fit_Flag ID ID_CO2 ID_CH4 QAQC_Flag
Time_UTC ...
2024-01-01 00:00:00 2.39506 0.00255 4986.806 60.87091 476.0776 ... 3 ~atmospher~atmospher -10.0 -10.0 0
2024-01-01 00:00:10 2.394785 0.002722 4986.794 32.25323 475.82 ... 3 ~atmospher~atmospher -10.0 -10.0 0
2024-01-01 00:00:20 2.39244 0.001466 4995.927 65.15379 475.643 ... 3 ~atmospher~atmospher -10.0 -10.0 0
2024-01-01 00:00:30 2.391938 0.002942 4997.648 60.03634 475.499 ... 3 ~atmospher~atmospher -10.0 -10.0 0
2024-01-01 00:00:39 2.391498 0.002398 5003.465 71.19695 475.6132 ... 3 ~atmospher~atmospher -10.0 -10.0 0
[5 rows x 25 columns]
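Because each value in the returned dictionary is an ordinary time-indexed pandas dataframe, the usual pandas tools apply. The sketch below uses a small synthetic stand-in for the dictionary (the numbers are made up and only two columns are included) to summarize each instrument and to average 10-second readings into 1-minute bins.

```python
import pandas as pd

# Synthetic stand-in for the dictionary returned by read_data().
# The values are made up; only the column names follow the output above.
index = pd.date_range(
    "2024-01-01", periods=12, freq=pd.Timedelta(seconds=10), name="Time_UTC"
)
data = {
    "lgr_ugga": pd.DataFrame(
        {
            "CH4_ppm": [2.0 + 0.01 * i for i in range(12)],
            "QAQC_Flag": [0] * 12,
        },
        index=index,
    )
}

# Summarize each instrument: row count plus first and last timestamp.
summary = {
    name: (len(df), df.index.min(), df.index.max())
    for name, df in data.items()
}

# Average the 10-second CH4 readings into 1-minute bins.
ch4_1min = data["lgr_ugga"]["CH4_ppm"].resample("1min").mean()

print(summary)
print(ch4_1min)
```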
Getting Observations#
Alternatively, we can get only the finalized observations for a site, which aggregates the instruments into a single dataframe:
In [8]: obs = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
...: time_range=['2024-02-08', None])
...:
In [9]: obs.head()
Out[9]:
NO_ppb NO2_ppb NOx_ppb CO_ppb O3_ppb CO2d_ppm_cal CO2d_ppm_raw CH4d_ppm_cal CH4d_ppm_raw
Time_UTC
2024-02-08 00:00:01.010 NaN NaN NaN NaN 24.6 NaN NaN NaN NaN
2024-02-08 00:00:01.020 NaN NaN NaN 529.643 NaN NaN NaN NaN NaN
2024-02-08 00:00:01.050 2.1 10.0 12.1 NaN NaN NaN NaN NaN NaN
2024-02-08 00:00:02.000 NaN NaN NaN NaN NaN 431.21101 420.7033 2.011395 2.002521
2024-02-08 00:00:03.150 NaN NaN NaN NaN 24.5 NaN NaN NaN NaN
Finalized observations include only data that have passed QAQC (QAQC_Flag >= 0) and that are measurements of the ambient atmosphere (ID == -10).
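The two conditions can be pictured as boolean masks over an instrument dataframe. The sketch below is not the library's implementation: it uses made-up rows and assumes the ambient check is applied to the ID_CO2 column seen in the instrument output earlier.

```python
import pandas as pd

# Made-up rows for illustration. Applying the ambient check to ID_CO2 is an
# assumption; the library may use a different column internally.
raw = pd.DataFrame(
    {
        "CO2_ppm": [476.1, 475.8, 390.2, 475.5],
        "ID_CO2": [-10.0, -10.0, 400.0, -10.0],
        "QAQC_Flag": [0, -1, 0, 1],
    },
    index=pd.date_range(
        "2024-01-01", periods=4, freq=pd.Timedelta(seconds=10), name="Time_UTC"
    ),
)

passed_qaqc = raw["QAQC_Flag"] >= 0  # non-negative flags pass QAQC
is_ambient = raw["ID_CO2"] == -10    # -10 marks ambient atmosphere samples

# Keep only rows that satisfy both conditions.
finalized = raw.loc[passed_qaqc & is_ambient, ["CO2_ppm"]]

print(finalized)
```

Here the second row is dropped because its flag is negative, and the third because its ID marks a non-ambient sample.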
The observations dataframe is indexed by time and combines all requested pollutants.
Two formats are available: wide or long.
The wide format has a column for each pollutant, and the long format has a pollutant column with the pollutant name and a value column with the measurement value.
In [10]: obs_long = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
....: time_range=['2024-02-08', None],
....: format='long')
....:
In [11]: obs_long.head(10)
Out[11]:
pollutant value
Time_UTC
2024-02-08 00:00:01.010 O3_ppb 24.6
2024-02-08 00:00:01.020 CO_ppb 529.643
2024-02-08 00:00:01.050 NOx_ppb 12.1
2024-02-08 00:00:01.050 NO2_ppb 10.0
2024-02-08 00:00:01.050 NO_ppb 2.1
2024-02-08 00:00:02.000 CO2d_ppm_raw 420.7033
2024-02-08 00:00:02.000 CO2d_ppm_cal 431.21101
2024-02-08 00:00:02.000 CH4d_ppm_cal 2.011395
2024-02-08 00:00:02.000 CH4d_ppm_raw 2.002521
2024-02-08 00:00:03.150 O3_ppb 24.5
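The relationship between the two formats can be reproduced with plain pandas. The following sketch rebuilds a few rows of the wide output shown earlier, melts them into the long layout, and pivots back. It illustrates the reshaping only and is not necessarily how get_obs() performs the conversion.

```python
import pandas as pd

nan = float("nan")

# A few rows copied from the wide output shown earlier.
wide = pd.DataFrame(
    {
        "NO_ppb": [nan, nan, 2.1],
        "NO2_ppb": [nan, nan, 10.0],
        "CO_ppb": [nan, 529.643, nan],
        "O3_ppb": [24.6, nan, nan],
    },
    index=pd.DatetimeIndex(
        [
            "2024-02-08 00:00:01.010",
            "2024-02-08 00:00:01.020",
            "2024-02-08 00:00:01.050",
        ],
        name="Time_UTC",
    ),
)

# Wide to long: one row per (time, pollutant) pair, dropping missing values.
long_obs = (
    wide.reset_index()
    .melt(id_vars="Time_UTC", var_name="pollutant", value_name="value")
    .dropna(subset=["value"])
    .sort_values("Time_UTC", kind="stable")
    .set_index("Time_UTC")
)

# Long back to wide: pollutant names become columns again.
wide_again = long_obs.pivot(columns="pollutant", values="value")

print(long_obs)
print(wide_again)
```

The long layout is often more convenient for grouped operations and plotting, while the wide layout makes it easier to compare pollutants at the same timestamp.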
Mobile Sites & Observations#
Included as part of UATAQ is the TRAX/eBus project, which collects data from mobile sites.
The MobileSite class is a subclass of Site.
The laboratory determines whether to build a Site or MobileSite object based on the is_mobile attribute in the configuration file.
Mobile sites provide the same functionality as fixed sites, but the get_obs() method merges location data with the observations and returns a geodataframe.
In [12]: trx01 = lab.get_site('TRX01')
In [13]: mobile_data = trx01.get_obs(group='horel', time_range=['2019', '2021'])
In [14]: mobile_data.head()
Out[14]:
PM2.5_ugm3 O3_ppb ... Longitude_deg geometry
Time_UTC ...
2019-01-01 00:00:00 2.0 NaN ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:00 NaN 30.4 ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:02 NaN 27.6 ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:02 2.0 NaN ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:04 NaN 27.2 ... -111.838913 POINT (-111.83891 40.76961)
[5 rows x 5 columns]
Or in the long format:
In [15]: mobile_data_long = trx01.get_obs(group='horel', time_range=['2019', '2021'], format='long')
In [16]: mobile_data_long.head()
Out[16]:
pollutant value ... Longitude_deg geometry
Time_UTC ...
2019-01-01 00:00:00 PM2.5_ugm3 2.0 ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:00 O3_ppb 30.4 ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:02 O3_ppb 27.6 ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:02 PM2.5_ugm3 2.0 ... -111.838913 POINT (-111.83891 40.76961)
2019-01-01 00:00:04 O3_ppb 27.2 ... -111.838913 POINT (-111.83891 40.76961)
[5 rows x 5 columns]
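To give a feel for what merging location data with observations involves, the sketch below attaches the nearest position fix to each observation using pandas.merge_asof. This is one possible approach, not a description of how MobileSite does it. The Latitude_deg column name, the second position fix, and the 2-second tolerance are assumptions made for the example.

```python
import pandas as pd

# Observations at 2-second spacing (values taken from the output above).
obs = pd.DataFrame(
    {
        "Time_UTC": pd.to_datetime(
            ["2019-01-01 00:00:00", "2019-01-01 00:00:02", "2019-01-01 00:00:04"]
        ),
        "O3_ppb": [30.4, 27.6, 27.2],
    }
)

# Hypothetical position fixes; the second fix is invented for illustration.
gps = pd.DataFrame(
    {
        "Time_UTC": pd.to_datetime(["2019-01-01 00:00:00", "2019-01-01 00:00:03"]),
        "Latitude_deg": [40.76961, 40.76970],
        "Longitude_deg": [-111.838913, -111.838800],
    }
)

# Attach the nearest position fix within 2 seconds to each observation.
# Both frames must be sorted by the merge key, which they are here.
located = pd.merge_asof(
    obs,
    gps,
    on="Time_UTC",
    direction="nearest",
    tolerance=pd.Timedelta(seconds=2),
).set_index("Time_UTC")

print(located)
```

If geopandas is installed, the coordinate columns could then be converted into a geometry column with geopandas.points_from_xy to obtain a geodataframe like the ones shown above.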