Quick Start
Laboratory
It all starts in the lab…
In [1]: from lair import uataq
In [2]: lab = uataq.laboratory
The laboratory object is a singleton instance of the Laboratory class, which is initialized with the UATAQ configuration file.
The configuration file is a JSON file that specifies UATAQ site characteristics, including name, location, status, research groups collecting data, and installed instruments.
The Laboratory object contains the following attributes:
sites : A list of site identifiers.
instruments : A list of instrument names.
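To make the singleton idea concrete, here is a minimal sketch of a lab object built once from a JSON configuration. The class name SketchLaboratory, the config layout (an "instruments" list under each site key), and the placeholder instrument names are assumptions for illustration only; the real UATAQ configuration schema and Laboratory implementation may differ.

```python
import json

# Hypothetical configuration; the real UATAQ file's keys may differ.
CONFIG_JSON = """
{
  "wbb":   {"instruments": ["lgr_ugga", "instrument_a"]},
  "TRX01": {"instruments": ["instrument_b", "instrument_a"]}
}
"""


class SketchLaboratory:
    """Illustrative stand-in for a lab built from a site configuration."""

    def __init__(self, config: dict):
        self.config = config
        # Site identifiers are the top-level keys of the configuration.
        self.sites = list(config)
        # Unique instrument names across all sites, sorted alphabetically.
        self.instruments = sorted(
            {name for site in config.values() for name in site["instruments"]}
        )


# Module-level instance: every importer of the module shares this one object.
laboratory = SketchLaboratory(json.loads(CONFIG_JSON))

print(laboratory.sites)
print(laboratory.instruments)
```

Creating the instance at module level is one common way to get singleton behavior in Python, since a module is only executed once per interpreter session.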
Research Sites
The Site object is the primary interface for accessing data from a UATAQ site.
Each site has a unique site identifier (SID) that corresponds to a key in the configuration file.
The lab is responsible for constructing Site objects from the configuration file, including building the InstrumentEnsemble for each site.
The InstrumentEnsemble is a container object that holds the Instrument objects, which provide the linkage between a Site and the data files.
In [3]: sites = lab.sites # list of sites
In [4]: wbb = lab.get_site('wbb') # site object
The Site object contains the following information as attributes:
SID : The site identifier.
config : A dictionary containing configuration information for the site from the config file.
instruments : An instance of the InstrumentEnsemble class representing the instruments at the site.
groups : The research groups that collect data at the site.
loggers : The loggers used by research groups that record data at a site.
pollutants : The pollutants measured at the site.
There are two primary methods for reading data from a site:
Reading Instrument Data - Data for each instrument at a site is read individually and stored in a dictionary with the instrument name as the key.
Getting Observations - Finalized observations from all instruments at a site are aggregated into a single dataframe.
For convenience, Site.read_data() and Site.get_obs() are also wrapped by uataq.read_data() and uataq.get_obs(), respectively, each of which takes an additional SID parameter.
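The wrapper pattern described above can be sketched as follows. StubSite and the module-level get_site and get_obs functions below are hypothetical stand-ins, not the lair implementation; the stub returns a description of the request instead of real data so that the example runs without any data files.

```python
class StubSite:
    """Hypothetical stand-in for a Site; echoes the request instead of reading data."""

    def __init__(self, SID):
        self.SID = SID

    def get_obs(self, pollutants=None, format="wide", time_range=None):
        return {
            "SID": self.SID,
            "pollutants": pollutants,
            "format": format,
            "time_range": time_range,
        }


def get_site(SID):
    """Look up (here: simply construct) the site for a given identifier."""
    return StubSite(SID)


def get_obs(SID, **kwargs):
    """Convenience wrapper: resolve the site from its SID, then delegate."""
    return get_site(SID).get_obs(**kwargs)


result = get_obs("wbb", pollutants=["CO2"], time_range="2024")
print(result)
```

The only thing the wrapper adds is the SID lookup; every other argument is passed through unchanged to the site method.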
Reading Instrument Data
Using a Site object, we can read the data from each instrument at the site for a specified processing level (lvl) and time range:
In [5]: data = wbb.read_data(instruments='all', lvl='qaqc', time_range='2024')
The data is returned as a dictionary of pandas dataframes, one for each instrument. The dataframes are indexed by time and have columns for each variable:
In [6]: lgr_ugga = data['lgr_ugga']
In [7]: lgr_ugga.head()
Out[7]:
CH4_ppm CH4_ppm_sd H2O_ppm ... ID_CO2 ID_CH4 QAQC_Flag
Time_UTC ...
2024-01-01 00:00:00 2.395060 0.002550 4986.806 ... -10.0 -10.0 0
2024-01-01 00:00:10 2.394785 0.002722 4986.794 ... -10.0 -10.0 0
2024-01-01 00:00:20 2.392440 0.001466 4995.927 ... -10.0 -10.0 0
2024-01-01 00:00:30 2.391938 0.002942 4997.648 ... -10.0 -10.0 0
2024-01-01 00:00:39 2.391498 0.002398 5003.465 ... -10.0 -10.0 0
[5 rows x 25 columns]
Getting Observations
Alternatively, we can get only the finalized observations for a site, which aggregates the instruments into a single dataframe:
In [8]: obs = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
...: time_range=['2024-02-08', None])
...:
In [9]: obs.head()
Out[9]:
CO2d_ppm_cal CO2d_ppm_raw ... NOx_ppb O3_ppb
Time_UTC ...
2024-02-08 00:00:01.000 NaN NaN ... NaN 24.6
2024-02-08 00:00:01.010 NaN NaN ... NaN NaN
2024-02-08 00:00:01.040 NaN NaN ... 12.1 NaN
2024-02-08 00:00:02.000 431.21101 420.7033 ... NaN NaN
2024-02-08 00:00:03.140 NaN NaN ... NaN 24.5
[5 rows x 9 columns]
Finalized observations only include data that have passed QAQC (QAQC_Flag >= 0) and that are measurements of the ambient atmosphere (ID == -10).
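The two conditions above can be illustrated on plain Python records. This is only a sketch of the filtering logic, not the code lair runs: the rows are made up (apart from the first, which echoes the earlier lgr_ugga output), and the choice of ID_CH4 as the ID column is an assumption for the example.

```python
# Hypothetical rows mimicking instrument data at the 'qaqc' level.
rows = [
    {"Time_UTC": "2024-01-01 00:00:00", "CH4_ppm": 2.395060, "ID_CH4": -10.0, "QAQC_Flag": 0},
    {"Time_UTC": "2024-01-01 00:00:10", "CH4_ppm": 2.101000, "ID_CH4": 2.0, "QAQC_Flag": 0},
    {"Time_UTC": "2024-01-01 00:00:20", "CH4_ppm": 9.999000, "ID_CH4": -10.0, "QAQC_Flag": -1},
    {"Time_UTC": "2024-01-01 00:00:30", "CH4_ppm": 2.391938, "ID_CH4": -10.0, "QAQC_Flag": 1},
]


def is_final(row, id_column="ID_CH4"):
    """Keep rows that passed QAQC and sampled the ambient atmosphere."""
    passed_qaqc = row["QAQC_Flag"] >= 0
    is_ambient = row[id_column] == -10
    return passed_qaqc and is_ambient


final = [row for row in rows if is_final(row)]

for row in final:
    print(row["Time_UTC"], row["CH4_ppm"])
```

The second row is dropped because its ID is not -10, and the third because its QAQC flag is negative.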
The observations dataframe is indexed by time and aggregates pollutants into a single dataframe.
Two formats are available: wide or long.
The wide format has a column for each pollutant, and the long format has a pollutant column with the pollutant name and a value column with the measurement value.
In [10]: obs_long = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
....: time_range=['2024-02-08', None],
....: format='long')
....:
In [11]: obs_long.head(10)
Out[11]:
pollutant value
Time_UTC
2024-02-08 00:00:01.000 O3_ppb 24.600000
2024-02-08 00:00:01.010 CO_ppb 529.643000
2024-02-08 00:00:01.040 NO_ppb 2.100000
2024-02-08 00:00:01.040 NO2_ppb 10.000000
2024-02-08 00:00:01.040 NOx_ppb 12.100000
2024-02-08 00:00:02.000 CO2d_ppm_cal 431.211010
2024-02-08 00:00:02.000 CH4d_ppm_cal 2.011395
2024-02-08 00:00:02.000 CO2d_ppm_raw 420.703300
2024-02-08 00:00:02.000 CH4d_ppm_raw 2.002521
2024-02-08 00:00:03.140 O3_ppb 24.500000
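The relationship between the two formats can be sketched in plain Python. The example below reuses two values from the outputs above and assumes, based on how the long output looks, that missing (NaN) entries are dropped when reshaping; the to_long helper is hypothetical and is not part of lair.

```python
import math

# Wide format: one row per timestamp, one entry per pollutant (NaN = no measurement).
wide = {
    "2024-02-08 00:00:01.000": {"O3_ppb": 24.6, "NOx_ppb": math.nan},
    "2024-02-08 00:00:01.040": {"O3_ppb": math.nan, "NOx_ppb": 12.1},
}


def to_long(wide_rows):
    """Reshape wide rows into (time, pollutant, value) records, skipping NaNs."""
    long_rows = []
    for time, values in wide_rows.items():
        for pollutant, value in values.items():
            if not math.isnan(value):
                long_rows.append((time, pollutant, value))
    return long_rows


long_rows = to_long(wide)

for record in long_rows:
    print(record)
```

Because instruments report at different times, the wide format is mostly NaN at any given timestamp, while the long format stores one record per actual measurement.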
Mobile Sites & Observations
Included as part of UATAQ is the TRAX/eBus project, which collects data from mobile sites.
The MobileSite class is a subclass of the Site class.
The laboratory determines whether to build a Site or MobileSite object based on the is_mobile attribute in the configuration file.
Mobile sites provide the same functionality as fixed sites, but merge location data with observations when using the get_obs() method and return a geodataframe.
In [12]: trx01 = lab.get_site('TRX01')
In [13]: mobile_data = trx01.get_obs(group='horel', time_range=['2019', '2021'])
In [14]: mobile_data.head()
Out[14]:
PM2.5_ugm3 ... geometry
Time_UTC ...
2019-01-01 00:00:00 2.0 ... POINT (-111.83891 40.76961)
2019-01-01 00:00:00 NaN ... POINT (-111.83891 40.76961)
2019-01-01 00:00:02 NaN ... POINT (-111.83891 40.76961)
2019-01-01 00:00:02 2.0 ... POINT (-111.83891 40.76961)
2019-01-01 00:00:04 NaN ... POINT (-111.83891 40.76961)
[5 rows x 5 columns]
Or in the long format:
In [15]: mobile_data_long = trx01.get_obs(group='horel', time_range=['2019', '2021'], format='long')
In [16]: mobile_data_long.head()
Out[16]:
pollutant ... geometry
Time_UTC ...
2019-01-01 00:00:00 PM2.5_ugm3 ... POINT (-111.83891 40.76961)
2019-01-01 00:00:00 O3_ppb ... POINT (-111.83891 40.76961)
2019-01-01 00:00:02 O3_ppb ... POINT (-111.83891 40.76961)
2019-01-01 00:00:02 PM2.5_ugm3 ... POINT (-111.83891 40.76961)
2019-01-01 00:00:04 O3_ppb ... POINT (-111.83891 40.76961)
[5 rows x 5 columns]
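The choice between Site and MobileSite described in this section can be sketched as a small factory function. The classes and the build_site function below are hypothetical stand-ins, and the is_mobile values in the sample configuration are assumed for illustration; only the is_mobile key itself comes from the description above.

```python
class SketchSite:
    """Illustrative stand-in for a fixed site."""

    def __init__(self, SID, config):
        self.SID = SID
        self.config = config


class SketchMobileSite(SketchSite):
    """Illustrative stand-in for a mobile site; inherits everything from SketchSite."""


def build_site(SID, config):
    """Pick the class from the site's is_mobile entry, then construct it."""
    site_config = config[SID]
    cls = SketchMobileSite if site_config.get("is_mobile", False) else SketchSite
    return cls(SID, site_config)


# Hypothetical configuration values.
config = {
    "wbb": {"is_mobile": False},
    "TRX01": {"is_mobile": True},
}

wbb = build_site("wbb", config)
trx01 = build_site("TRX01", config)

print(type(wbb).__name__, type(trx01).__name__)
```

Because the mobile class is a subclass, code written against the fixed-site interface keeps working when it is handed a mobile site.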