Quick Start

Laboratory

It all starts in the lab…

In [1]: from lair import uataq

In [2]: lab = uataq.laboratory

The laboratory object is a singleton instance of the Laboratory class, initialized from the UATAQ configuration file. The configuration file is a JSON file that specifies the characteristics of each UATAQ site, including its name, location, status, the research groups collecting data, and the installed instruments.

The Laboratory object contains the following attributes:

  • sites : A list of site identifiers.

  • instruments : A list of instrument names.
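As a rough illustration of the idea (not the actual lair implementation), the sketch below builds a lab-like object from a JSON configuration and exposes sites and instruments attributes. The class name LabSketch, the config keys, and the instrument name instrument_b are assumptions made for this example; the real UATAQ configuration file has its own schema.

```python
import json

# Hypothetical configuration keyed by site identifier (SID).
# The real UATAQ config file has its own schema; this layout is assumed.
CONFIG_JSON = """
{
  "wbb":   {"name": "Example fixed site",  "is_mobile": false,
            "instruments": ["lgr_ugga", "instrument_b"]},
  "TRX01": {"name": "Example mobile site", "is_mobile": true,
            "instruments": ["instrument_b"]}
}
"""


class LabSketch:
    """Toy stand-in for a laboratory object built from a config mapping."""

    def __init__(self, config):
        self.config = config
        # Site identifiers are the top-level keys of the config.
        self.sites = sorted(config)
        # Unique instrument names across all sites.
        self.instruments = sorted(
            {name for site in config.values() for name in site["instruments"]}
        )


# Module-level instance, created once at import time and shared by all users.
laboratory = LabSketch(json.loads(CONFIG_JSON))

print(laboratory.sites)
print(laboratory.instruments)
```

Creating the instance once at module level is one simple way to get singleton-like behavior in Python, since every importer receives the same object.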

Research Sites

The Site object is the primary interface for accessing data from a UATAQ site. Each site has a unique site identifier (SID) that corresponds to a key in the configuration file. The lab is responsible for constructing Site objects from the configuration file, including building the InstrumentEnsemble for each site. The InstrumentEnsemble is a container object that holds the Instrument objects, which provide the link between a Site and its data files.

In [3]: sites = lab.sites          # list of sites

In [4]: wbb = lab.get_site('wbb')  # site object

The Site object contains the following information as attributes:

  • SID : The site identifier.

  • config : A dictionary containing configuration information for the site from the config file.

  • instruments : An instance of the InstrumentEnsemble class representing the instruments at the site.

  • groups : The research groups that collect data at the site.

  • loggers : The loggers used by research groups that record data at a site.

  • pollutants : The pollutants measured at the site.

There are two primary methods for reading data from a site:

  1. Reading Instrument Data - Data for each instrument at a site is read individually and stored in a dictionary with the instrument name as the key.

  2. Getting Observations - Finalized observations from all instruments at a site are aggregated into a single dataframe.

    For convenience, Site.read_data() and Site.get_obs() are also wrapped as uataq.read_data() and uataq.get_obs(), respectively, each of which takes an additional SID parameter.
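The wrapping pattern described in the note can be sketched as a module-level function that looks up the site by SID and delegates to its method. Everything here is a stand-in: _StubSite, _StubLab, and the default argument values are hypothetical and only mimic the call shape used in this guide, not the real lair signatures.

```python
class _StubSite:
    """Stand-in for a Site; echoes its arguments instead of reading files."""

    def __init__(self, SID):
        self.SID = SID

    def read_data(self, instruments="all", lvl=None, time_range=None):
        # A real Site would return a dict of dataframes here.
        return {
            "SID": self.SID,
            "instruments": instruments,
            "lvl": lvl,
            "time_range": time_range,
        }


class _StubLab:
    """Stand-in for the laboratory; builds a site for any SID."""

    def get_site(self, SID):
        return _StubSite(SID)


laboratory = _StubLab()


def read_data(SID, *args, **kwargs):
    """Convenience wrapper: resolve the site, then forward all arguments."""
    return laboratory.get_site(SID).read_data(*args, **kwargs)


result = read_data("wbb", instruments="all", lvl="qaqc", time_range="2024")
print(result)
```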

Reading Instrument Data

Using a Site object, we can read the data from each instrument at the site for a specified processing level (lvl) and time range:

In [5]: data = wbb.read_data(instruments='all', lvl='qaqc', time_range='2024')

The data is returned as a dictionary of pandas dataframes, one for each instrument. The dataframes are indexed by time and have columns for each variable:

In [6]: lgr_ugga = data['lgr_ugga']

In [7]: lgr_ugga.head()
Out[7]: 
                      CH4_ppm CH4_ppm_sd   H2O_ppm H2O_ppm_sd   CO2_ppm  ... Fit_Flag                    ID ID_CO2 ID_CH4 QAQC_Flag
Time_UTC                                                                 ...                                                       
2024-01-01 00:00:00   2.39506    0.00255  4986.806   60.87091  476.0776  ...        3  ~atmospher~atmospher  -10.0  -10.0         0
2024-01-01 00:00:10  2.394785   0.002722  4986.794   32.25323    475.82  ...        3  ~atmospher~atmospher  -10.0  -10.0         0
2024-01-01 00:00:20   2.39244   0.001466  4995.927   65.15379   475.643  ...        3  ~atmospher~atmospher  -10.0  -10.0         0
2024-01-01 00:00:30  2.391938   0.002942  4997.648   60.03634   475.499  ...        3  ~atmospher~atmospher  -10.0  -10.0         0
2024-01-01 00:00:39  2.391498   0.002398  5003.465   71.19695  475.6132  ...        3  ~atmospher~atmospher  -10.0  -10.0         0

[5 rows x 25 columns]

Getting Observations

Alternatively, we can get only the finalized observations for a site, which aggregate data from all of its instruments into a single dataframe:

In [8]: obs = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
   ...:                 time_range=['2024-02-08', None])
   ...: 

In [9]: obs.head()
Out[9]: 
                        NO_ppb NO2_ppb NOx_ppb   CO_ppb O3_ppb CO2d_ppm_cal CO2d_ppm_raw CH4d_ppm_cal CH4d_ppm_raw
Time_UTC                                                                                                          
2024-02-08 00:00:01.010    NaN     NaN     NaN      NaN   24.6          NaN          NaN          NaN          NaN
2024-02-08 00:00:01.020    NaN     NaN     NaN  529.643    NaN          NaN          NaN          NaN          NaN
2024-02-08 00:00:01.050    2.1    10.0    12.1      NaN    NaN          NaN          NaN          NaN          NaN
2024-02-08 00:00:02.000    NaN     NaN     NaN      NaN    NaN    431.21101     420.7033     2.011395     2.002521
2024-02-08 00:00:03.150    NaN     NaN     NaN      NaN   24.5          NaN          NaN          NaN          NaN

Finalized observations include only data that have passed QAQC (QAQC_Flag >= 0) and that are measurements of the ambient atmosphere (ID == -10). The observations dataframe is indexed by time and aggregates pollutants into a single dataframe. Two formats are available: wide and long. The wide format has one column per pollutant; the long format has a pollutant column holding the pollutant name and a value column holding the measurement.

In [10]: obs_long = wbb.get_obs(pollutants=['CO2', 'CH4', 'O3', 'NO2', 'NO', 'CO'],
   ....:                     time_range=['2024-02-08', None],
   ....:                     format='long')
   ....: 

In [11]: obs_long.head(10)
Out[11]: 
                            pollutant      value
Time_UTC                                        
2024-02-08 00:00:01.010        O3_ppb       24.6
2024-02-08 00:00:01.020        CO_ppb    529.643
2024-02-08 00:00:01.050       NOx_ppb       12.1
2024-02-08 00:00:01.050       NO2_ppb       10.0
2024-02-08 00:00:01.050        NO_ppb        2.1
2024-02-08 00:00:02.000  CO2d_ppm_raw   420.7033
2024-02-08 00:00:02.000  CO2d_ppm_cal  431.21101
2024-02-08 00:00:02.000  CH4d_ppm_cal   2.011395
2024-02-08 00:00:02.000  CH4d_ppm_raw   2.002521
2024-02-08 00:00:03.150        O3_ppb       24.5
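The filtering rule and the relationship between the wide and long formats can be sketched with plain pandas. This is a minimal illustration on a hand-built frame, assuming the QAQC_Flag and ID columns sit alongside the measurements; it is not how get_obs() is implemented internally, and the flag value of -1 on the last row is invented to show a rejected record.

```python
import pandas as pd

# Toy frame: measurement columns plus the two flag columns used for filtering.
times = pd.to_datetime(
    ["2024-02-08 00:00:01", "2024-02-08 00:00:02", "2024-02-08 00:00:03"]
).rename("Time_UTC")
wide_raw = pd.DataFrame(
    {
        "O3_ppb": [24.6, None, 24.5],
        "CO_ppb": [None, 529.643, None],
        "QAQC_Flag": [0, 0, -1],   # last row is marked as failing QAQC
        "ID": [-10, -10, -10],     # -10 marks ambient-atmosphere samples
    },
    index=times,
)

# Keep rows that passed QAQC and that sample the ambient atmosphere.
passed = wide_raw[(wide_raw["QAQC_Flag"] >= 0) & (wide_raw["ID"] == -10)]
wide = passed.drop(columns=["QAQC_Flag", "ID"])

# Wide -> long: one row per (time, pollutant) pair, dropping empty cells.
long = (
    wide.reset_index()
    .melt(id_vars="Time_UTC", var_name="pollutant", value_name="value")
    .dropna(subset=["value"])
    .set_index("Time_UTC")
    .sort_index()
)

print(long)
```

The long format avoids the mostly empty rows of the wide format when instruments report at different timestamps, at the cost of repeating the timestamp for each pollutant.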

Mobile Sites & Observations

Included as part of UATAQ is the TRAX/eBus project, which collects data from mobile sites. The MobileSite object is a subclass of the Site object. The laboratory determines whether to build a Site or MobileSite object based on the is_mobile attribute in the configuration file.
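The choice between the two classes can be pictured as a small factory keyed on is_mobile. The sketch below uses simplified stand-in classes and a hypothetical build_site() helper; the real lair constructors take more information than shown here.

```python
class Site:
    """Simplified stand-in for a fixed site."""

    def __init__(self, SID, config):
        self.SID = SID
        self.config = config


class MobileSite(Site):
    """Simplified stand-in for a mobile site; inherits the Site interface."""


def build_site(SID, lab_config):
    """Hypothetical factory: pick the class from the site's is_mobile entry."""
    site_config = lab_config[SID]
    cls = MobileSite if site_config.get("is_mobile", False) else Site
    return cls(SID, site_config)


# Assumed minimal config; only the is_mobile key is taken from the text.
lab_config = {"wbb": {"is_mobile": False}, "TRX01": {"is_mobile": True}}

wbb = build_site("wbb", lab_config)
trx01 = build_site("TRX01", lab_config)

print(type(wbb).__name__, type(trx01).__name__)
```

Because MobileSite subclasses Site, code written against the Site interface keeps working for mobile sites.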

Mobile sites provide the same functionality as fixed sites, but merge location data with observations when using the get_obs() method and return a geodataframe.

In [12]: trx01 = lab.get_site('TRX01')

In [13]: mobile_data = trx01.get_obs(group='horel', time_range=['2019', '2021'])

In [14]: mobile_data.head()
Out[14]: 
                     PM2.5_ugm3  O3_ppb  ...  Longitude_deg                     geometry
Time_UTC                                 ...                                            
2019-01-01 00:00:00         2.0     NaN  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:00         NaN    30.4  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:02         NaN    27.6  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:02         2.0     NaN  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:04         NaN    27.2  ...    -111.838913  POINT (-111.83891 40.76961)

[5 rows x 5 columns]

Or in the long format:

In [15]: mobile_data_long = trx01.get_obs(group='horel', time_range=['2019', '2021'], format='long')

In [16]: mobile_data_long.head()
Out[16]: 
                      pollutant  value  ...  Longitude_deg                     geometry
Time_UTC                                ...                                            
2019-01-01 00:00:00  PM2.5_ugm3    2.0  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:00      O3_ppb   30.4  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:02      O3_ppb   27.6  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:02  PM2.5_ugm3    2.0  ...    -111.838913  POINT (-111.83891 40.76961)
2019-01-01 00:00:04      O3_ppb   27.2  ...    -111.838913  POINT (-111.83891 40.76961)

[5 rows x 5 columns]
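One way to picture the merge of location data with observations is a nearest-timestamp join. The sketch below uses pandas.merge_asof on two small hand-built frames; the matching strategy, the 2-second tolerance, and the Latitude_deg column name are assumptions for illustration, since the guide does not say how lair performs the merge.

```python
import pandas as pd

# Observations and GPS fixes recorded on slightly different clocks.
obs = pd.DataFrame(
    {"O3_ppb": [30.4, 27.6, 27.2]},
    index=pd.to_datetime(
        ["2019-01-01 00:00:00", "2019-01-01 00:00:02", "2019-01-01 00:00:04"]
    ).rename("Time_UTC"),
)
gps = pd.DataFrame(
    {
        "Latitude_deg": [40.76961, 40.76961],       # assumed column name
        "Longitude_deg": [-111.838913, -111.838913],
    },
    index=pd.to_datetime(
        ["2019-01-01 00:00:00", "2019-01-01 00:00:03"]
    ).rename("Time_UTC"),
)

# Attach the nearest GPS fix (within 2 seconds) to every observation.
merged = pd.merge_asof(
    obs,
    gps,
    left_index=True,
    right_index=True,
    direction="nearest",
    tolerance=pd.Timedelta("2s"),
)

print(merged)
```

With geopandas installed, a geometry column like the one shown above could then be built from the coordinate columns using geopandas.points_from_xy.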