General Functions#
- lair.uataq.read_data: Read data from an instrument at a site.
- lair.uataq.get_obs: Get observations from a site.
Functions
- lair.uataq.read_data(SID: str, instruments: Literal['all'] | str | list[str] | tuple[str, ...] | set[str] = 'all', group: str | None = None, lvl: str | None = None, time_range: str | list[str | datetime | None] | tuple[str | datetime | None, str | datetime | None] | slice | None = None, num_processes: int | Literal['max'] = 1, file_pattern: str | None = None) -> dict[str, DataFrame]
Read data from an instrument at a site.
- Parameters:
- SID : str
The site ID.
- instruments : str | list[str] | tuple[str, ...] | set[str] | 'all'
The instrument(s) to read data from.
- group : str | None
The group name.
- lvl : str | None
The data level.
- time_range : str | list[Union[str, dt.datetime, None]] | tuple[Union[str, dt.datetime, None], Union[str, dt.datetime, None]] | slice | None
The time range to read data. Default is None, which reads all available data.
- num_processes : int | 'max'
The number of processes to use. Default is 1.
- file_pattern : str | None
A string pattern to filter the file paths.
- Returns:
- dict[str, pd.DataFrame]
The data.
- lair.uataq.get_obs(SID: str, pollutants: Literal['all'] | str | list[str] | tuple[str, ...] | set[str] = 'all', format: Literal['wide', 'long'] = 'wide', group: str | None = None, time_range: str | list[str | datetime | None] | tuple[str | datetime | None, str | datetime | None] | slice | None = None, num_processes: int | Literal['max'] = 1, file_pattern: str | None = None) -> DataFrame
Get observations from a site.
- Parameters:
- SID : str
The site ID.
- pollutants : str | list[str] | tuple[str, ...] | set[str] | 'all'
The pollutant(s) to get observations for.
- format : 'wide' | 'long'
The format of the data. Default is 'wide'.
- group : str | None
The group name.
- time_range : str | list[Union[str, dt.datetime, None]] | tuple[Union[str, dt.datetime, None], Union[str, dt.datetime, None]] | slice | None
The time range to get observations. Default is None, which gets all available data.
- num_processes : int | 'max'
The number of processes to use. Default is 1.
- file_pattern : str | None
A string pattern to filter the file paths.
- Returns:
- pd.DataFrame
The observations.
Input Parameters#
SID#
SID is the site identifier for a UATAQ site.
It is a capitalized string that corresponds to a key in the configuration file.
Instruments#
instruments is a single instrument name or a list of instrument names.
The instrument name is a string that corresponds to a key in the lab.instruments list.
Pollutants#
pollutants is a single pollutant name or a list of pollutant names.
The pollutant name is a capitalized molecule abbreviation.
Research Group#
group is the research group that collected the data.
It is a string that corresponds to a key in the uataq.filesystem.groups dictionary.
Processing Level#
lvl is the processing level of the data.
Available levels are:
- raw: Raw data.
- qaqc: QAQC flags applied to data.
- calibrated: Calibrated data. (Only available for instruments which receive calibration in post-processing.)
- final: Finalized data. (Flagged data and measurements of calibration tanks are dropped.)
Time Range#
time_range filters the returned data to the specified time range.
There are three primary formats for time_range:
- None: Returns all available data.
- Single string in ISO8601 format down to the hour:
The string is interpreted as a range from the start of the period it names to the start of the next period of the same unit.
Examples:
'2020' represents the year 2020.
'2020-01' represents January 2020 (up to, but not including, February 1st, 2020).
'2020-01-01' represents January 1st, 2020 (up to, but not including, January 2nd, 2020).
'2020-01-01T12' represents January 1st, 2020 from 12:00 to 13:00.
- List, tuple, or slice of two datetime-like objects:
Datetime-like objects include datetime objects, Timestamp objects, and strings in ISO8601 format.
The first object is the start of the range and the second object is the end of the range. The range is inclusive of the start and exclusive of the end.
Using None in place of a datetime-like object leaves the range unbounded in that direction.
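The interpretation rules above can be illustrated with a short standard-library sketch. This is not the lair implementation: expand_partial_iso and in_range are hypothetical helper names introduced here only to show how a partial ISO8601 string maps to a half-open range, and how None leaves one side unbounded.

```python
from datetime import datetime, timedelta
from typing import Optional


def expand_partial_iso(text: str) -> tuple[datetime, datetime]:
    """Hypothetical helper: map a partial ISO8601 string (year, month, day,
    or hour precision) to a (start, stop) pair, where stop is the start of
    the next period of the same unit."""
    date_part, _, hour_part = text.partition("T")
    fields = [int(f) for f in date_part.split("-")]

    if hour_part:
        # Hour precision, e.g. '2020-01-01T12'
        if len(fields) != 3:
            raise ValueError(f"hour given without a full date: {text!r}")
        start = datetime(*fields, int(hour_part))
        return start, start + timedelta(hours=1)
    if len(fields) == 3:
        # Day precision, e.g. '2020-01-01'
        start = datetime(*fields)
        return start, start + timedelta(days=1)
    if len(fields) == 2:
        # Month precision, e.g. '2020-01'; December rolls over to next January
        year, month = fields
        return datetime(year, month, 1), datetime(year + month // 12, month % 12 + 1, 1)
    if len(fields) == 1:
        # Year precision, e.g. '2020'
        return datetime(fields[0], 1, 1), datetime(fields[0] + 1, 1, 1)
    raise ValueError(f"unsupported time string: {text!r}")


def in_range(ts: datetime, start: Optional[datetime], stop: Optional[datetime]) -> bool:
    """Hypothetical helper: half-open membership test, inclusive of start and
    exclusive of stop. None leaves that side of the range unbounded."""
    return (start is None or ts >= start) and (stop is None or ts < stop)


start, stop = expand_partial_iso("2020-01")
print(start, stop)
print(in_range(datetime(2020, 1, 31, 23), start, stop))  # inside January
print(in_range(stop, start, stop))                       # the end bound itself is excluded
print(in_range(datetime(1999, 1, 1), None, stop))        # unbounded start
```

The half-open convention means consecutive ranges such as '2020-01' and '2020-02' never share a timestamp.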
Number of Processes#
num_processes is the number of processes to use when reading data from each instrument. The default is 1.
- If num_processes is set to 1, the data is read serially.
- Setting num_processes to a number greater than 1 will read the data in parallel using the minimum of num_processes and the number of files for an instrument.
- Setting num_processes to 'max' will use the minimum of the number of files for an instrument and the number of available CPU cores.
Warning: Frequent use of num_processes='max' may upset your fellow node users.
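The rules for choosing a worker count can be sketched as follows. This is only an illustration of the description above, not the library's code: resolve_num_processes is a hypothetical name, and os.cpu_count() is assumed here as the measure of "available CPU cores" (on a shared node the usable count may be lower).

```python
import os
from typing import Literal, Union


def resolve_num_processes(num_processes: Union[int, Literal["max"]], n_files: int) -> int:
    """Hypothetical helper: number of worker processes for one instrument."""
    if n_files < 1:
        # Nothing to parallelize over; fall back to a single process
        return 1
    if num_processes == "max":
        # Assumption: os.cpu_count() stands in for "available CPU cores"
        cpus = os.cpu_count() or 1
        return min(n_files, cpus)
    if num_processes < 1:
        raise ValueError("num_processes must be a positive integer or 'max'")
    # Never start more workers than there are files to read
    return min(num_processes, n_files)


print(resolve_num_processes(1, 10))      # serial read
print(resolve_num_processes(4, 2))       # capped by the number of files
print(resolve_num_processes("max", 50))  # capped by the CPU count
```

Capping at the number of files avoids starting idle workers when an instrument has only a few files.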
File Pattern#
file_pattern is a string that is used to filter the files. The primary use for this parameter is to filter raw lin GPS data by NMEA sentence type. The default is None.
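As a rough illustration of filtering by pattern, the sketch below keeps only the paths that contain the pattern. The matching rule (plain substring), the helper name filter_paths, and the file names are all assumptions made for this example; the library may match differently (for instance with glob or regular expressions).

```python
from typing import Iterable, Optional


def filter_paths(paths: Iterable[str], file_pattern: Optional[str] = None) -> list[str]:
    """Hypothetical helper: keep paths containing file_pattern.
    Assumes plain substring matching; None keeps every path."""
    if file_pattern is None:
        return list(paths)
    return [p for p in paths if file_pattern in p]


# Hypothetical raw GPS file names, one file per NMEA sentence type
paths = [
    "gpgga_2020_01.dat",
    "gprmc_2020_01.dat",
    "gpgga_2020_02.dat",
]

print(filter_paths(paths, "gpgga"))
print(filter_paths(paths))
```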