Horel Group

John Horel's group: MesoWest, TRAX/eBUS, etc.

This module contains classes and functions for working with the Horel group data in the CHPC UATAQ filesystem.

Module Attributes

HOREL_DIR

Horel group directory

UUTRAX_DIR

UUTRAX directory

UUTRAX_PILOT_DIR

UUTRAX pilot directory

PILOT_PHASE

Pilot phase time ranges for UUTRAX data

lvl_data_dirs

Data levels for UUTRAX data

column_mapping

Horel to UATAQ column mapping

Classes

HorelCSVFile(path, instrument)

Class for parsing CSV files from the Horel group.

HorelCSVFinalizedFile(path, instrument)

Class for parsing finalized CSV files from the Horel group.

HorelFile(path, instrument)

Abstract base class for Horel data files.

HorelGroup()

A class representing the Horel group data space in the CHPC UATAQ filesystem.

HorelH5File(path, instrument)

Class for parsing H5 files from the Horel group.

lair.uataq.filesystem.groupspaces.horel.HOREL_DIR: str = '/uufs/chpc.utah.edu/common/home/horel-group'

Horel group directory

lair.uataq.filesystem.groupspaces.horel.UUTRAX_DIR: str = '/uufs/chpc.utah.edu/common/home/horel-group/uutrax'

UUTRAX directory

lair.uataq.filesystem.groupspaces.horel.UUTRAX_PILOT_DIR: str = '/uufs/chpc.utah.edu/common/home/horel-group/uutrax_pilot'

UUTRAX pilot directory

lair.uataq.filesystem.groupspaces.horel.PILOT_PHASE: dict[str, TimeRange] = {'TRX01': TimeRange(start=2014-11-11 00:00:00, stop=2018-11-19 20:03:58), 'TRX02': TimeRange(start=2016-02-04 00:00:00, stop=2018-11-19 18:53:52)}

Pilot phase time ranges for UUTRAX data
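A minimal sketch of how the PILOT_PHASE ranges could be queried, for example to decide whether a timestamp should be looked up under UUTRAX_PILOT_DIR. It assumes the keys are site IDs and that TimeRange exposes start and stop fields (as its repr suggests); the TimeRange class and the in_pilot_phase helper below are hypothetical stand-ins, not the lair API.

```python
from typing import NamedTuple

import pandas as pd


class TimeRange(NamedTuple):
    """Hypothetical stand-in for lair's TimeRange (assumed start/stop fields)."""
    start: pd.Timestamp
    stop: pd.Timestamp


# Values copied from the PILOT_PHASE attribute documented above
PILOT_PHASE = {
    'TRX01': TimeRange(pd.Timestamp('2014-11-11 00:00:00'),
                       pd.Timestamp('2018-11-19 20:03:58')),
    'TRX02': TimeRange(pd.Timestamp('2016-02-04 00:00:00'),
                       pd.Timestamp('2018-11-19 18:53:52')),
}


def in_pilot_phase(sid: str, time) -> bool:
    """Hypothetical helper: True if `time` falls in the site's pilot phase."""
    tr = PILOT_PHASE.get(sid)
    # Sites without a pilot entry are treated as never being in the pilot phase
    return tr is not None and tr.start <= pd.Timestamp(time) <= tr.stop
```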

lair.uataq.filesystem.groupspaces.horel.lvl_data_dirs: dict[str, list] = {'final': ['/uufs/chpc.utah.edu/common/home/horel-group/uutrax'], 'qaqc': ['/uufs/chpc.utah.edu/common/home/horel-group/uutrax'], 'raw': ['/uufs/chpc.utah.edu/common/home/horel-group/uutrax_pilot', '/uufs/chpc.utah.edu/common/home/horel-group/uutrax']}

Data levels for UUTRAX data

lair.uataq.filesystem.groupspaces.horel.column_mapping: dict[str, dict[str, str]] = {
    '2b_205': {
        '2B_Air_Flow_Rate': 'Flow_Lpm',
        '2B_Internal_Air_Pressure': 'Internal_P_hPa',
        '2B_Internal_Air_Temperature': 'Internal_T_C',
        '2B_Ozone_Concentration': 'O3_ppb',
        'FL2B': 'Flow_Lpm',
        'OZNE': 'O3_ppb',
        'Ozone_Data_Flagged': 'QAQC_Flag',
        'PS2B': 'Internal_P_hPa',
        'TC2B': 'Internal_T_C'},
    '2b_405': {
        '2B405_Air_Flow_Rate': 'Flow_Lpm',
        '2B405_Cell_O3_Flow_Rate': 'O3_Flow_mLpm',
        '2B405_Internal_Air_Pressure': 'Internal_P_hPa',
        '2B405_Internal_Air_Temperature': 'Internal_T_C',
        '2B405_NO2_Concentration': 'NO2_ppb',
        '2B405_NOX_Concentration': 'NOx_ppb',
        '2B405_NO_Concentration': 'NO_ppb',
        'FLNO': 'Flow_Lpm',
        'FO3N': 'O3_Flow_mLpm',
        'NO1C': 'NO_ppb',
        'NO2C': 'NO2_ppb',
        'NOXC': 'NOx_ppb',
        'PSNO': 'Internal_P_hPa',
        'TCNO': 'Internal_T_C'},
    'cr1000': {
        'Battery_Voltage': 'Battery_Voltage_V',
        'Bus_Box_Temperature': 'Logger_T_C',
        'Bus_Top_Relative_Humidity': 'Ambient_RH_pct',
        'Bus_Top_Temperature': 'Ambient_T_C',
        'TICC': 'Logger_T_C',
        'TRNR': 'Ambient_RH_pct',
        'TRNT': 'Ambient_T_C',
        'Train_Box_Temperature': 'Logger_T_C',
        'Train_Top_Relative_Humidity': 'Ambient_RH_pct',
        'Train_Top_Temperature': 'Ambient_T_C',
        'VOLT': 'Battery_Voltage_V'},
    'gps': {
        'Elevation': 'Altitude_msl',
        'GELV': 'Altitude_msl',
        'GLAT': 'Latitude_deg',
        'GLON': 'Longitude_deg',
        'GPS_Data_Flagged': 'QAQC_Flag',
        'GPS_Direction': 'Course_deg',
        'GPS_RMC_Valid': 'Status',
        'GPS_Speed': 'Speed_kt',
        'GTIM': 'Instrument_Time',
        'Latitude': 'Latitude_deg',
        'Longitude': 'Longitude_deg',
        'NSAT': 'N_Satellites',
        'RDIR': 'Course_deg',
        'RSPD': 'Speed_kt',
        'RSTS': 'Status'},
    'metone_es405': {
        'ERRR': 'Status',
        'ES405_Air_Flow_Rate': 'Flow_Lpm',
        'ES405_Error_Code': 'Status',
        'ES405_Internal_Air_Pressure': 'Internal_P_hPa',
        'ES405_Internal_Air_Temperature': 'Internal_T_C',
        'ES405_Internal_Relative_Humidity': 'Internal_RH_pct',
        'ES405_PM10_Concentration': 'PM10_ugm3',
        'ES405_PM1_Concentration': 'PM1_ugm3',
        'ES405_PM2.5_Concentration': 'PM2.5_ugm3',
        'ES405_PM4_Concentration': 'PM4_ugm3',
        'FLOW': 'Flow_Lpm',
        'INRH': 'Internal_RH_pct',
        'ITMP': 'Internal_T_F',
        'PM01': 'PM1_ugm3',
        'PM04': 'PM4_ugm3',
        'PM10': 'PM10_ugm3',
        'PM2.5_Data_Flagged': 'QAQC_Flag',
        'PM25': 'PM2.5_ugm3',
        'PRES': 'Internal_P_hpa'},
    'metone_es642': {
        'ERRR': 'Status',
        'ES642_Air_Flow_Rate': 'Flow_Lpm',
        'ES642_Error_Code': 'Status',
        'ES642_Internal_Air_Pressure': 'Ambient_P_hPa',
        'ES642_Internal_Air_Temperature': 'Ambient_T_C',
        'ES642_Internal_Relative_Humidity': 'Internal_RH_pct',
        'ES642_PM2.5_Concentration': 'PM2.5_ugm3',
        'FLOW': 'Flow_Lpm',
        'INRH': 'Internal_RH_pct',
        'ITMP': 'Ambient_T_F',
        'PM2.5_Data_Flagged': 'QAQC_Flag',
        'PM25': 'PM2.5_ugm3',
        'PRES': 'Ambient_P_hpa'}}

Horel to UATAQ column mapping
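Each inner dictionary of column_mapping maps Horel column names to UATAQ names for one instrument, so it can be passed straight to pandas DataFrame.rename. The sketch below copies a subset of the '2b_205' entry rather than importing the module, and the sample values are made up for illustration.

```python
import pandas as pd

# Subset of column_mapping['2b_205'] shown above
mapping_2b_205 = {
    '2B_Ozone_Concentration': 'O3_ppb',
    'OZNE': 'O3_ppb',
    'FL2B': 'Flow_Lpm',
}

# Illustrative raw data using the short Horel column names
raw = pd.DataFrame({'OZNE': [41.2, 40.8], 'FL2B': [1.9, 2.0]})

# rename() ignores keys that are absent, so a single mapping covers both the
# short (OZNE) and long (2B_Ozone_Concentration) naming schemes
renamed = raw.rename(columns=mapping_2b_205)
print(list(renamed.columns))  # ['O3_ppb', 'Flow_Lpm']
```

Because both naming schemes map to the same target, a file that contained both OZNE and 2B_Ozone_Concentration would end up with duplicate O3_ppb columns; the mapping appears to assume each file uses only one scheme.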

class lair.uataq.filesystem.groupspaces.horel.HorelFile(path: str, instrument: str)

Abstract base class for Horel data files.

Attributes

path

(str) The file path.

period

(pd.Period) The period of the data file.

logger

(str) The logger name.

date_slicer

(slice) A slice object to extract the date from the file name.

file_freq

(str) The file frequency.

ext

(str) The file extension.

time_col

(str) The time column name.

instrument

(str) The instrument name.

Methods

usecols(col)

Check if a column should be used based on the instrument.

convert_nodata(data, nodata=-9999.0)

Convert NoData values to NaN.

coerce_numeric(data, exclude='Time_UTC')

Coerce columns to numeric.
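The date_slicer, file_freq and period attributes suggest that a file's period is derived from a date embedded in its name. A minimal sketch of that idea follows; the file name, slice and frequency are hypothetical, since the real values are set per subclass and are not documented here.

```python
import pandas as pd

# Hypothetical values for illustration only
file_name = 'TRX01_2019-03-01.csv'  # assumed naming: <SID>_<YYYY-MM-DD>.csv
date_slicer = slice(6, 16)          # characters holding the date
file_freq = 'D'                     # assumed: one file per day

# Slice the date out of the name and turn it into a pandas Period
period = pd.Period(file_name[date_slicer], freq=file_freq)
print(period)  # 2019-03-01
```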

__init__(path: str, instrument: str)

Initialize a HorelFile subclass object.

The instrument parameter is used to filter columns based on the instrument name.

Parameters:
path : str

The file path.

instrument : str

The instrument name - used to filter columns.

usecols(col: str) → bool

Check if a column should be used based on the instrument.

Parameters:
col : str

The column name.

Returns:
bool

True if the column should be used, False otherwise.
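The exact selection rule used by usecols is not documented. One plausible rule, shown here purely as an assumption, keeps the time column plus any column listed for the instrument in column_mapping. The module-level usecols function and its time_col default are hypothetical; the real method takes only col and reads the instrument from the object.

```python
# Subset of column_mapping shown above
column_mapping = {
    '2b_205': {'OZNE': 'O3_ppb', 'FL2B': 'Flow_Lpm'},
}


def usecols(col: str, instrument: str, time_col: str = 'Time') -> bool:
    """Hypothetical stand-in for HorelFile.usecols.

    Assumption: keep the time column and any source column that the
    instrument's entry in column_mapping knows how to rename.
    """
    return col == time_col or col in column_mapping.get(instrument, {})


print([c for c in ['Time', 'OZNE', 'GLAT'] if usecols(c, '2b_205')])
# ['Time', 'OZNE']
```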

format_time(data: DataFrame, **kwargs) → DataFrame

Format the time column in the data DataFrame.

Parameters:
data : pd.DataFrame

The data DataFrame.

**kwargs : dict

Additional keyword arguments to pass to pd.to_datetime.

Returns:
pd.DataFrame

The data DataFrame with the time column formatted as Time_UTC.
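A minimal sketch of what format_time describes: parse the source time column with pd.to_datetime, forwarding any keyword arguments, and store the result as Time_UTC. The source column name 'Date' and the choice to drop the original column are assumptions; the real method reads the name from the time_col attribute.

```python
import pandas as pd


def format_time(data: pd.DataFrame, time_col: str, **kwargs) -> pd.DataFrame:
    """Hypothetical sketch: replace `time_col` with a parsed Time_UTC column."""
    data = data.copy()
    # Extra keyword arguments (e.g. format=...) go straight to pd.to_datetime
    data['Time_UTC'] = pd.to_datetime(data.pop(time_col), **kwargs)
    return data


# 'Date' is a hypothetical source time column name
df = pd.DataFrame({'Date': ['2019-03-01 00:00:00', '2019-03-01 00:00:10'],
                   'OZNE': [41.2, 40.8]})
out = format_time(df, 'Date', format='%Y-%m-%d %H:%M:%S')
```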

static convert_nodata(data: DataFrame, nodata: float = -9999.0) → DataFrame

Convert NoData values to NaN.

Parameters:
data : pd.DataFrame

The data DataFrame.

nodata : float

The NoData value.

Returns:
pd.DataFrame

The data DataFrame with NoData values converted to NaN.
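Converting a NoData sentinel to NaN maps naturally onto DataFrame.replace. The following is a sketch of the documented behavior under that assumption, not necessarily the module's own implementation.

```python
import numpy as np
import pandas as pd


def convert_nodata(data: pd.DataFrame, nodata: float = -9999.0) -> pd.DataFrame:
    """Sketch: replace every occurrence of the NoData sentinel with NaN."""
    return data.replace(nodata, np.nan)


df = pd.DataFrame({'O3_ppb': [41.2, -9999.0], 'Flow_Lpm': [-9999.0, 2.0]})
clean = convert_nodata(df)
```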

static coerce_numeric(data, exclude: str | list[str] = 'Time_UTC') → DataFrame

Coerce columns to numeric.

Parameters:
data : pd.DataFrame

The data DataFrame.

exclude : str | Sequence[str]

Columns to exclude from coercion.

Returns:
pd.DataFrame

The data DataFrame with columns coerced to numeric.
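A sketch of numeric coercion that skips the excluded columns, assuming unparseable entries should become NaN (pd.to_numeric with errors='coerce'). Whether the module uses errors='coerce' is an assumption.

```python
import pandas as pd


def coerce_numeric(data: pd.DataFrame, exclude='Time_UTC') -> pd.DataFrame:
    """Sketch: convert every column not in `exclude` to a numeric dtype."""
    # Accept either a single column name or a sequence of names
    if isinstance(exclude, str):
        exclude = [exclude]
    data = data.copy()
    for col in data.columns:
        if col not in exclude:
            # Assumption: strings that cannot be parsed become NaN
            data[col] = pd.to_numeric(data[col], errors='coerce')
    return data


df = pd.DataFrame({'Time_UTC': ['2019-03-01', '2019-03-02'],
                   'O3_ppb': ['41.2', 'bad'],
                   'Status': ['0', '1']})
out = coerce_numeric(df)
```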

class lair.uataq.filesystem.groupspaces.horel.HorelH5File(path: str, instrument: str)

Class for parsing H5 files from the Horel group.

Attributes

path

(str) The file path.

period

(pd.Period) The period of the data file.

logger

(str) The logger name.

date_slicer

(slice) A slice object to extract the date from the file name.

file_freq

(str) The file frequency.

ext

(str) The file extension.

time_col

(str) The time column name.

instrument

(str) The instrument name.

Methods

usecols(col)

Check if a column should be used based on the instrument.

convert_nodata(data, nodata=-9999.0)

Convert NoData values to NaN.

coerce_numeric(data, exclude='Time_UTC')

Coerce columns to numeric.

parse() → DataFrame

Parse the H5 file and return a DataFrame.

Returns:
pd.DataFrame

A DataFrame containing the parsed data.

class lair.uataq.filesystem.groupspaces.horel.HorelCSVFile(path: str, instrument: str)

Class for parsing CSV files from the Horel group.

Attributes

path

(str) The file path.

period

(pd.Period) The period of the data file.

logger

(str) The logger name.

date_slicer

(slice) A slice object to extract the date from the file name.

file_freq

(str) The file frequency.

ext

(str) The file extension.

time_col

(str) The time column name.

instrument

(str) The instrument name.

Methods

usecols(col)

Check if a column should be used based on the instrument.

convert_nodata(data, nodata=-9999.0)

Convert NoData values to NaN.

coerce_numeric(data, exclude='Time_UTC')

Coerce columns to numeric.

parse()

Parse the CSV file and return a DataFrame.

parse() → DataFrame

Parse the CSV file and return a DataFrame.

Returns:
pd.DataFrame

A DataFrame containing the parsed data.
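Because usecols(col) takes a single column name and returns a bool, it has the shape pandas.read_csv expects for a callable usecols argument. The sketch below shows how a CSV parse could plug such a filter in; the CSV layout and the keep set are hypothetical, and whether parse() actually does this is an assumption.

```python
import io

import pandas as pd

# Hypothetical CSV content in the style of a Horel file
csv_text = """Date,OZNE,FL2B,GLAT
2019-03-01 00:00:00,41.2,1.9,40.76
2019-03-01 00:00:10,-9999,2.0,40.77
"""

# Hypothetical column filter standing in for HorelFile.usecols
keep = {'Date', 'OZNE', 'FL2B'}

# read_csv calls the callable once per header name and keeps columns where it is True
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col in keep)
```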

class lair.uataq.filesystem.groupspaces.horel.HorelCSVFinalizedFile(path: str, instrument: str)

Class for parsing finalized CSV files from the Horel group.

Attributes

path

(str) The file path.

period

(pd.Period) The period of the data file.

logger

(str) The logger name.

date_slicer

(slice) A slice object to extract the date from the file name.

file_freq

(str) The file frequency.

ext

(str) The file extension.

time_col

(str) The time column name.

final_patterns

(list[str]) A list of patterns to filter columns.

instrument

(str) The instrument name.

Methods

usecols(col)

Check if a column should be used based on the instrument.

convert_nodata(data, nodata=-9999.0)

Convert NoData values to NaN.

coerce_numeric(data, exclude='Time_UTC')

Coerce columns to numeric.

parse()

Parse the CSV file, finalize the data, and return a DataFrame.

parse() → DataFrame

Parse the CSV file, finalize the data, and return a DataFrame.

Returns:
pd.DataFrame

A DataFrame containing the finalized data.
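The final_patterns attribute is described as a list of patterns used to filter columns. Assuming they are regular expressions matched against column names, DataFrame.filter(regex=...) expresses the idea compactly. The patterns below are hypothetical; the real values are not documented here.

```python
import pandas as pd

# Hypothetical patterns standing in for final_patterns
final_patterns = [r'^Time', r'_Concentration$', r'_Data_Flagged$']

df = pd.DataFrame({'Time': ['2019-03-01 00:00:00'],
                   '2B_Ozone_Concentration': [41.2],
                   'Ozone_Data_Flagged': [0],
                   '2B_Air_Flow_Rate': [1.9]})

# filter(regex=...) keeps columns where re.search(regex, name) finds a match;
# joining with '|' keeps a column if any single pattern matches
final = df.filter(regex='|'.join(final_patterns))
```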

class lair.uataq.filesystem.groupspaces.horel.HorelGroup

A class representing the Horel group data space in the CHPC UATAQ filesystem.

Attributes

name

(str) The group name.

datafiles

(dict[str, Type[DataFile]]) A dictionary mapping datafile keys to DataFile classes.

Methods

get_highest_lvl(SID, instrument)

Get the highest data level for a given site and instrument.

get_files(SID, instrument, lvl, logger)

Get a list of file paths for a given site, instrument, and level.

get_datafile_key(instrument, lvl, logger)

Get the datafile key based on the instrument, level, and logger.

get_datafiles(SID, instrument, lvl, logger, time_range, pattern=None)

Returns a list of data files for a given level and time range.

static get_highest_lvl(SID: str, instrument: str) → str

Get the highest data level for a given site and instrument.

Parameters:
SID : str

The site ID.

instrument : str

The instrument name.

Returns:
str

The highest data level.
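A minimal sketch of picking the highest available level, assuming the ranking final > qaqc > raw (the keys of lvl_data_dirs) and that the set of levels with data is already known. The real method presumably inspects the filesystem; the highest_lvl helper here is hypothetical.

```python
# Assumed ranking, highest first, using the keys of lvl_data_dirs
LEVELS = ['final', 'qaqc', 'raw']


def highest_lvl(available: set[str]) -> str:
    """Hypothetical helper: return the best level present in `available`."""
    for lvl in LEVELS:
        if lvl in available:
            return lvl
    raise ValueError('no data levels available')


print(highest_lvl({'raw', 'qaqc'}))  # qaqc
```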

get_files(SID: str, instrument: str, lvl: str, logger: str = 'campbellsci') → List[str]

Get a list of file paths for a given site, instrument, and level.

Parameters:
SID : str

The site ID.

instrument : str

The instrument name.

lvl : str

The data level.

logger : str

The logger name.

Returns:
list[str]

A list of file paths.

get_datafile_key(instrument: str, lvl: str, logger: str) → str

Get the datafile key based on the instrument, level, and logger.

Parameters:
instrument : str

The instrument name.

lvl : str

The data level.

logger : str

The logger name.

Returns:
str

The datafile key.

get_datafiles(SID: str, instrument: str, lvl: str, logger: str, time_range: TimeRange, pattern: str | None = None) → list[DataFile]

Returns a list of data files for a given level and time range. Extends DataFile.get_datafiles by supplying the instrument name to the DataFile subclass.

Parameters:
SID : str

The site ID.

instrument : str

The instrument name.

lvl : str

The data level.

logger : str

The logger name.

time_range : TimeRange

The time range.

pattern : str | None

The pattern to match file names.

Returns:
list[DataFile]

A list of data files.
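Selecting data files for a time range amounts to keeping files whose period overlaps the requested window. The sketch below shows that overlap test with pandas Period objects; the monthly periods, the window, and the overlaps helper are hypothetical, and how DataFile.get_datafiles actually performs the check is not documented here.

```python
import pandas as pd


def overlaps(period: pd.Period, start: pd.Timestamp, stop: pd.Timestamp) -> bool:
    """Hypothetical helper: True if the file's period intersects [start, stop]."""
    return period.start_time <= stop and period.end_time >= start


# Hypothetical monthly file periods
periods = [pd.Period('2019-02', freq='M'),
           pd.Period('2019-03', freq='M'),
           pd.Period('2019-04', freq='M')]
start, stop = pd.Timestamp('2019-03-15'), pd.Timestamp('2019-04-02')

selected = [p for p in periods if overlaps(p, start, stop)]
print([str(p) for p in selected])  # ['2019-03', '2019-04']
```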

static standardize_data(instrument: str, data: DataFrame) → DataFrame

Convert the data to a format shared across research groups by renaming columns, converting units, mapping values, etc., as needed.

Parameters:
instrument : str

The instrument model.

data : pd.DataFrame

The data to standardize.

Returns:
pd.DataFrame

The standardized data.
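A sketch of the kind of standardization described above: rename columns with the instrument's column_mapping entry, then convert units. The Fahrenheit-to-Celsius step is an assumption suggested by mapped names such as Internal_T_F; the standardize_sketch function is hypothetical, not the module's implementation.

```python
import pandas as pd

# Subset of column_mapping['metone_es405'] shown above
mapping = {'PM25': 'PM2.5_ugm3', 'ITMP': 'Internal_T_F', 'FLOW': 'Flow_Lpm'}


def standardize_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: rename columns, then convert *_T_F columns to *_T_C."""
    data = data.rename(columns=mapping)  # returns a new frame
    # Assumption: Fahrenheit temperatures are converted to Celsius and renamed
    for col in [c for c in data.columns if c.endswith('_T_F')]:
        data[col[:-1] + 'C'] = (data.pop(col) - 32) * 5 / 9
    return data


df = pd.DataFrame({'PM25': [8.0], 'ITMP': [68.0], 'FLOW': [2.0]})
out = standardize_sketch(df)
```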