General Functions

read_data(SID[, instruments, group, lvl, ...])

Read data from an instrument at a site.

get_obs(SID[, pollutants, format, group, ...])

Get observations from a site.

Functions

lair.uataq.read_data(SID: str, instruments: Literal['all'] | str | list[str] | tuple[str, ...] | set[str] = 'all', group: str | None = None, lvl: str | None = None, time_range: str | list[str | datetime | None] | tuple[str | datetime | None, str | datetime | None] | slice | None = None, num_processes: int | Literal['max'] = 1, file_pattern: str | None = None) -> dict[str, DataFrame]

Read data from an instrument at a site.

Parameters:
SID : str

The site ID.

instruments : str | list[str] | tuple[str] | set[str] | 'all'

The instrument(s) to read data from.

group : str | None

The group name.

lvl : str | None

The data level.

time_range : str | list[Union[str, dt.datetime, None]] | tuple[Union[str, dt.datetime, None], Union[str, dt.datetime, None]] | slice | None

The time range to read data. Default is None which reads all available data.

num_processes : int | 'max'

The number of processes to use. Default is 1.

file_pattern : str | None

A string pattern to filter the file paths.

Returns:
dict[str, pd.DataFrame]

The data.

lair.uataq.get_obs(SID: str, pollutants: Literal['all'] | str | list[str] | tuple[str, ...] | set[str] = 'all', format: Literal['wide', 'long'] = 'wide', group: str | None = None, time_range: str | list[str | datetime | None] | tuple[str | datetime | None, str | datetime | None] | slice | None = None, num_processes: int | Literal['max'] = 1, file_pattern: str | None = None) -> DataFrame

Get observations from a site.

Parameters:
SID : str

The site ID.

pollutants : str | list[str] | tuple[str] | set[str] | 'all'

The pollutant(s) to get observations for.

format : 'wide' | 'long'

The format of the data. Default is 'wide'.

group : str | None

The group name.

time_range : str | list[Union[str, dt.datetime, None]] | tuple[Union[str, dt.datetime, None], Union[str, dt.datetime, None]] | slice | None

The time range to get observations. Default is None which gets all available data.

num_processes : int | 'max'

The number of processes to use. Default is 1.

file_pattern : str | None

A string pattern to filter the file paths.

Returns:
pd.DataFrame

The observations.
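
The difference between the 'wide' and 'long' formats can be illustrated with a small, library-independent sketch. It assumes the usual meaning of the terms: wide data has one column per pollutant, long data has one row per (time, pollutant) pair. The helper `wide_to_long`, the key names `time`, `pollutant` and `value`, and the numbers are all hypothetical and chosen for illustration; the actual column names returned by get_obs are not stated here.

```python
def wide_to_long(rows, id_key="time"):
    """Hypothetical helper: reshape wide records into long records.

    Each wide row holds one value per pollutant; each long row holds a
    single (id, pollutant, value) triple.
    """
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name == id_key:
                continue  # the identifier is repeated on every long row
            long_rows.append({id_key: row[id_key], "pollutant": name, "value": value})
    return long_rows


# Made-up values, for illustration only.
wide = [{"time": "2020-01-01T00", "CO2": 410.0, "CH4": 1.9}]
long = wide_to_long(wide)
```

The wide layout tends to be convenient for comparing pollutants side by side, while the long layout suits grouping and plotting by pollutant.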

Input Parameters

SID

SID is the site identifier for a UATAQ site. It is a capitalized string that corresponds to a key in the configuration file.

Instruments

instruments is a single instrument name or a list of instrument names. The instrument name is a string that corresponds to a key in the lab.instruments list.

Pollutants

pollutants is a single pollutant name or a list of pollutant names. The pollutant name is a capitalized molecule abbreviation.

Research Group

group is the research group that collected the data. It is a string that corresponds to a key in the uataq.filesystem.groups dictionary.

Processing Level

lvl is the processing level of the data. Available levels are:

  • raw : Raw data.

  • qaqc : QAQC flags applied to data.

  • calibrated : Calibrated data. (Only available for instruments which receive calibration in post-processing.)

  • final : Finalized data. (Flagged data and measurements of calibration tanks are dropped.)

Time Range

time_range filters the returned data to the specified time range.

There are three primary formats for time_range:

  1. None: Returns all available data.

  2. Single string in ISO8601 format down to the hour:

    • The string is interpreted as a range from the start of the string to the start of the next time unit.

    • Examples:

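      • '2020' represents the year 2020, from the start of 2020 to the start of 2021.

      • '2020-01' represents January 2020, from the start of January to the start of February 2020.

      • '2020-01-01' represents January 1st, 2020, from the start of January 1st to the start of January 2nd.

      • '2020-01-01T12' represents January 1st, 2020 from 12:00 to 13:00.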

  3. List, tuple, or slice of two datetime-like objects:

    • Datetime-like objects include datetime objects, Timestamp objects, and strings in ISO8601 format.

    • The first object is the start of the range and the second object is the end of the range. The range is inclusive of the start and exclusive of the end.

    • The use of None in place of a datetime-like object will set the range to be unbounded in that direction.
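
The three formats above can be sketched with the standard library alone. This is a minimal illustration of the described semantics, not the library's own parsing code: the helpers `expand_partial`, `resolve_time_range` and `in_range` are hypothetical names. Two assumptions go beyond the text: a single string is treated as a half-open range (end excluded, like the two-element form), and a partial string used as one bound of a two-element range is taken to mean the start of its unit.

```python
from datetime import datetime, timedelta
from typing import Optional

# Partial ISO8601 layouts, coarsest first, paired with the unit they imply.
_PARTIAL_FORMATS = [
    ("%Y", "year"),
    ("%Y-%m", "month"),
    ("%Y-%m-%d", "day"),
    ("%Y-%m-%dT%H", "hour"),
]


def expand_partial(s: str) -> tuple[datetime, datetime]:
    """Hypothetical helper: map a partial string to (start, start of next unit)."""
    for fmt, unit in _PARTIAL_FORMATS:
        try:
            start = datetime.strptime(s, fmt)
        except ValueError:
            continue  # layout does not match; try the next, finer one
        if unit == "year":
            end = start.replace(year=start.year + 1)
        elif unit == "month":
            # December rolls over into January of the following year.
            end = start.replace(year=start.year + start.month // 12,
                                month=start.month % 12 + 1)
        elif unit == "day":
            end = start + timedelta(days=1)
        else:
            end = start + timedelta(hours=1)
        return start, end
    raise ValueError(f"unsupported time string: {s!r}")


def _bound(value) -> Optional[datetime]:
    """Convert one end of a two-element range; None means unbounded."""
    if value is None or isinstance(value, datetime):
        return value
    try:
        # Assumption: a partial string used as a bound means the start of its unit.
        return expand_partial(value)[0]
    except ValueError:
        return datetime.fromisoformat(value)  # full ISO8601 timestamp


def resolve_time_range(time_range):
    """Return (start, end) datetimes, with None for an unbounded side."""
    if time_range is None:
        return None, None
    if isinstance(time_range, str):
        return expand_partial(time_range)
    if isinstance(time_range, slice):
        return _bound(time_range.start), _bound(time_range.stop)
    start, end = time_range
    return _bound(start), _bound(end)


def in_range(t: datetime, start, end) -> bool:
    """Half-open membership test: start <= t < end, None is unbounded."""
    return (start is None or t >= start) and (end is None or t < end)
```

For example, `resolve_time_range('2020-01')` yields the pair (2020-01-01 00:00, 2020-02-01 00:00), and `resolve_time_range([None, '2020-03'])` leaves the start unbounded.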

Number of Processes

num_processes is the number of processes to use when reading data from each instrument. The default is 1.

  • If num_processes is set to 1, the data is read serially.

  • Setting num_processes to a number greater than 1 will read the data in parallel using the minimum of num_processes and the number of files for an instrument.

  • Setting num_processes to 'max' will use the minimum of the number of files for an instrument and the number of available CPU cores.

    Warning: Frequent use of num_processes='max' may upset your fellow node users.
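
The rules above amount to capping the requested process count by the number of files. A minimal sketch of that arithmetic follows; `resolve_num_processes` is a hypothetical helper, and using `os.cpu_count()` for the "number of available CPU cores" is an assumption, since the text does not say how cores are counted.

```python
import os
from typing import Literal, Union


def resolve_num_processes(num_processes: Union[int, Literal["max"]], n_files: int) -> int:
    """Hypothetical helper: effective process count for one instrument."""
    if num_processes == "max":
        # Assumption: "available CPU cores" is approximated by os.cpu_count().
        requested = os.cpu_count() or 1
    else:
        requested = int(num_processes)
    # Never start more processes than there are files, and always at least one.
    return max(1, min(requested, n_files))
```

With 2 files and `num_processes=4`, only 2 processes would be used; with `num_processes=1`, reading stays serial regardless of the file count.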

File Pattern

file_pattern is a string used to filter the file paths that are read. Its primary use is to filter raw lin GPS data by NMEA sentence type. The default is None, which applies no filtering.
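
As a rough illustration of path filtering, the sketch below keeps only the paths that contain the pattern. How the library actually matches the pattern (substring, glob, or regular expression) is not stated here, so plain substring matching is an assumption, and both the helper `filter_paths` and the file names are hypothetical.

```python
from typing import Optional


def filter_paths(paths: list[str], file_pattern: Optional[str] = None) -> list[str]:
    """Hypothetical helper: keep paths containing file_pattern (substring match)."""
    if file_pattern is None:
        return list(paths)  # None means no filtering
    return [p for p in paths if file_pattern in p]


# Hypothetical file names distinguishing two NMEA sentence types.
paths = ["gps_2020_01_gpgga.dat", "gps_2020_01_gprmc.dat", "gps_2020_02_gpgga.dat"]
gga_only = filter_paths(paths, "gpgga")
```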