Quick Start

Laboratory

It all starts in the lab…

In [1]: from lair import uataq

In [2]: lab = uataq.laboratory

The laboratory object is a singleton instance of the Laboratory class, initialized from the UATAQ configuration file. The configuration file is a JSON file that specifies the characteristics of each UATAQ site, including its name, location, status, the research groups collecting data, and the installed instruments.

The Laboratory object contains the following attributes:

sites : A list of site identifiers.

instruments : A list of instrument names.
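As a rough illustration of how these two lists could relate to the configuration file, the sketch below parses a small JSON document and derives them. The schema, the `name` and `instruments` keys, and the `example_*` instrument names are assumptions made for this example; they do not reproduce the actual UATAQ configuration or how lair reads it.

```python
import json

# Hypothetical configuration: keys and values are illustrative only and
# do not reproduce the real UATAQ config file.
CONFIG_JSON = """
{
  "wbb": {
    "name": "Example fixed site",
    "is_mobile": false,
    "instruments": {"lgr_ugga": {}, "example_o3_monitor": {}}
  },
  "TRX01": {
    "name": "Example mobile site",
    "is_mobile": true,
    "instruments": {"example_o3_monitor": {}, "example_pm_sensor": {}}
  }
}
"""

config = json.loads(CONFIG_JSON)

# Site identifiers (SIDs) are taken to be the top-level keys of the config.
sites = sorted(config)

# Instrument names are the union of the instruments listed across all sites.
instruments = sorted({name
                      for site in config.values()
                      for name in site["instruments"]})

print(sites)
print(instruments)
```

Keying the file by SID keeps the lookup for a single site to one dictionary access, which fits the statement that each SID corresponds to a key in the configuration file.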

Research Sites

The Site object is the primary interface for accessing data from a UATAQ site. Each site has a unique site identifier (SID) that corresponds to a key in the configuration file. The lab is responsible for constructing Site objects from the configuration file, including building the InstrumentEnsemble for each site. The InstrumentEnsemble is a container object that holds the site's Instrument objects, which provide the link between a Site and its data files.

In [3]: sites = lab.sites          # list of sites

In [4]: wbb = lab.get_site('wbb')  # site object

The Site object contains the following information as attributes:

SID : The site identifier.

config : A dictionary containing configuration information for the site from the config file.

instruments : An instance of the InstrumentEnsemble class representing the instruments at the site.

groups : The research groups that collect data at the site.

loggers : The loggers used by research groups that record data at a site.

pollutants : The pollutants measured at the site.

There are two primary methods for reading data from a site:

Reading Instrument Data - Data for each instrument at a site is read individually and stored in a dictionary with the instrument name as the key.
Getting Observations - Finalized observations from all instruments at a site are aggregated into a single dataframe.

For convenience, Site.read_data() and Site.get_obs() are also wrapped by the module-level functions uataq.read_data() and uataq.get_obs(), respectively, which take an additional SID parameter.
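The wrapper pattern can be sketched with stand-in classes. The `Site` and `Laboratory` classes below are minimal stubs written for this example, not the lair implementation, and the stub `get_obs()` simply echoes its arguments; the point is only how a module-level function can use the SID to look up a site and delegate to it.

```python
class Site:
    """Stand-in for a site object; only the call pattern matters here."""

    def __init__(self, SID):
        self.SID = SID

    def get_obs(self, **kwargs):
        # A real site would return a dataframe; this stub echoes the request.
        return {"SID": self.SID, **kwargs}


class Laboratory:
    """Stand-in for the lab, which hands out Site objects by SID."""

    def get_site(self, SID):
        return Site(SID)


laboratory = Laboratory()


def get_obs(SID, **kwargs):
    """Convenience wrapper: look up the site by SID, then delegate."""
    return laboratory.get_site(SID).get_obs(**kwargs)


request = get_obs("wbb", pollutants=["CO2", "CH4"])
print(request)
```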

Reading Instrument Data

Using a Site object, we can read the data from each instrument at the site for a specified processing level (lvl) and time range:

In [5]: data = wbb.read_data(instruments='all', lvl='qaqc', time_range='2024')

The data is returned as a dictionary of pandas dataframes keyed by instrument name. Each dataframe is indexed by time and has a column for each variable:

In [6]: lgr_ugga = data['lgr_ugga']

In [7]: lgr_ugga.head()
Out[7]: 
                      CH4_ppm  CH4_ppm_sd   H2O_ppm  ...  ID_CO2  ID_CH4  QAQC_Flag
Time_UTC                                             ...                           
2024-01-01 00:00:00  2.395060    0.002550  4986.806  ...   -10.0   -10.0          0
2024-01-01 00:00:10  2.394785    0.002722  4986.794  ...   -10.0   -10.0          0
2024-01-01 00:00:20  2.392440    0.001466  4995.927  ...   -10.0   -10.0          0
2024-01-01 00:00:30  2.391938    0.002942  4997.648  ...   -10.0   -10.0          0
2024-01-01 00:00:39  2.391498    0.002398  5003.465  ...   -10.0   -10.0          0

[5 rows x 25 columns]

Getting Observations

Alternatively, we can get only the finalized observations for a site, which aggregates the data from its instruments into a single dataframe:

In [8]: obs = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
   ...:                 time_range=['2024-02-08', None])
   ...: 

In [9]: obs.head()
Out[9]: 
                         CO2d_ppm_cal  CO2d_ppm_raw  ...  NOx_ppb  O3_ppb
Time_UTC                                             ...                 
2024-02-08 00:00:01.000           NaN           NaN  ...      NaN    24.6
2024-02-08 00:00:01.010           NaN           NaN  ...      NaN     NaN
2024-02-08 00:00:01.040           NaN           NaN  ...     12.1     NaN
2024-02-08 00:00:02.000     431.21101      420.7033  ...      NaN     NaN
2024-02-08 00:00:03.140           NaN           NaN  ...      NaN    24.5

[5 rows x 9 columns]

Finalized observations include only data that have passed QAQC (QAQC_Flag >= 0) and that are measurements of the ambient atmosphere (ID == -10). The observations dataframe is indexed by time and combines all requested pollutants. Two formats are available: wide and long. The wide format, shown above, has one column per pollutant; the long format has a pollutant column holding the pollutant name and a value column holding the measurement.

In [10]: obs_long = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
   ....:                     time_range=['2024-02-08', None],
   ....:                     format='long')
   ....: 

In [11]: obs_long.head(10)
Out[11]: 
                            pollutant       value
Time_UTC                                         
2024-02-08 00:00:01.000        O3_ppb   24.600000
2024-02-08 00:00:01.010        CO_ppb  529.643000
2024-02-08 00:00:01.040        NO_ppb    2.100000
2024-02-08 00:00:01.040       NO2_ppb   10.000000
2024-02-08 00:00:01.040       NOx_ppb   12.100000
2024-02-08 00:00:02.000  CO2d_ppm_cal  431.211010
2024-02-08 00:00:02.000  CH4d_ppm_cal    2.011395
2024-02-08 00:00:02.000  CO2d_ppm_raw  420.703300
2024-02-08 00:00:02.000  CH4d_ppm_raw    2.002521
2024-02-08 00:00:03.140        O3_ppb   24.500000
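The filtering rule and the wide-to-long reshaping described above can be sketched in plain Python. This is an illustration under stated assumptions, not the lair implementation: the records are made up, a single `ID` key stands in for the per-species ID columns, and lists of dicts stand in for dataframes. On a pandas dataframe, `DataFrame.melt` performs the same kind of reshaping; whether lair uses it is not stated here.

```python
# Hypothetical records: one dict per timestamp, values invented for the sketch.
records = [
    {"Time_UTC": "2024-02-08 00:00:02", "CO2d_ppm_cal": 431.2,
     "CH4d_ppm_cal": 2.01, "QAQC_Flag": 0, "ID": -10},   # kept
    {"Time_UTC": "2024-02-08 00:00:12", "CO2d_ppm_cal": 400.1,
     "CH4d_ppm_cal": 1.90, "QAQC_Flag": 0, "ID": 400.0},  # not ambient
    {"Time_UTC": "2024-02-08 00:00:22", "CO2d_ppm_cal": 999.9,
     "CH4d_ppm_cal": 9.99, "QAQC_Flag": -1, "ID": -10},  # failed QAQC
]

POLLUTANT_COLUMNS = ("CO2d_ppm_cal", "CH4d_ppm_cal")


def is_final(row):
    """True if the row passed QAQC and is an ambient measurement."""
    return row["QAQC_Flag"] >= 0 and row["ID"] == -10


# Wide format: one row per timestamp, one key per pollutant.
wide_rows = [row for row in records if is_final(row)]

# Long format: one row per (timestamp, pollutant) pair.
long_rows = [
    {"Time_UTC": row["Time_UTC"], "pollutant": column, "value": row[column]}
    for row in wide_rows
    for column in POLLUTANT_COLUMNS
]

for row in long_rows:
    print(row)
```

The long format repeats the timestamp for every pollutant, which makes it convenient for grouping or plotting by pollutant, while the wide format is more compact when pollutants share timestamps.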

Mobile Sites & Observations

UATAQ includes the TRAX/eBus project, which collects data from mobile sites. The MobileSite object is a subclass of the Site object. The laboratory decides whether to build a Site or a MobileSite object based on the is_mobile attribute in the configuration file.
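The choice between the two classes can be sketched as a small factory function. The classes and the `build_site` helper below are hypothetical stand-ins written for this example; only the `is_mobile` key and the subclass relationship come from the description above.

```python
class Site:
    """Stand-in for a fixed site."""

    def __init__(self, SID, config):
        self.SID = SID
        self.config = config


class MobileSite(Site):
    """Stand-in for a mobile site; subclasses Site as described above."""


def build_site(SID, config):
    """Hypothetical factory: choose the class from the is_mobile flag."""
    site_config = config[SID]
    site_class = MobileSite if site_config.get("is_mobile", False) else Site
    return site_class(SID, site_config)


# Illustrative config containing only the flag that drives the choice.
config = {"wbb": {"is_mobile": False}, "TRX01": {"is_mobile": True}}

wbb = build_site("wbb", config)
trx01 = build_site("TRX01", config)

print(type(wbb).__name__, type(trx01).__name__)
```

Because MobileSite subclasses Site, code written against the Site interface keeps working for mobile sites, and only the mobile-specific behavior needs to be overridden.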

Mobile sites provide the same functionality as fixed sites, except that get_obs() merges location data with the observations and returns a geodataframe.

In [12]: trx01 = lab.get_site('TRX01')

In [13]: mobile_data = trx01.get_obs(group='horel', time_range=['2019', '2021'])

In [14]: mobile_data.head()
Out[14]: 
                     PM2.5_ugm3  ...                     geometry
Time_UTC                         ...                             
2019-01-01 00:00:00         2.0  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:00         NaN  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:02         NaN  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:02         2.0  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:04         NaN  ...  POINT (-111.83891 40.76961)

[5 rows x 5 columns]

Or in the long format:

In [15]: mobile_data_long = trx01.get_obs(group='horel', time_range=['2019', '2021'], format='long')

In [16]: mobile_data_long.head()
Out[16]: 
                      pollutant  ...                     geometry
Time_UTC                         ...                             
2019-01-01 00:00:00  PM2.5_ugm3  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:00      O3_ppb  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:02      O3_ppb  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:02  PM2.5_ugm3  ...  POINT (-111.83891 40.76961)
2019-01-01 00:00:04      O3_ppb  ...  POINT (-111.83891 40.76961)

[5 rows x 5 columns]