File System

The UATAQ subpackage is built on the idea that each research group collects instrument data differently, even when the data come from the same site or instrument. Furthermore, the format in which data is written, and therefore read, depends on the system that logged the data rather than on the instrument itself. To address this, UATAQ provides a filesystem module consisting of GroupSpace and DataFile objects: GroupSpace objects provide group-specific methods for working with that group's data, while DataFile objects handle the actual parsing of the data files.

Group Spaces#

Each research group has its own module in groupspaces where group-specific code is stored. Each group module must contain a subclass of GroupSpace and define DataFile subclasses for each file format that the group uses.

All GroupSpace objects are stored in the groups dictionary with the group name as the key.

The default research group can be changed at runtime via uataq.filesystem.DEFAULT_GROUP, or changed permanently (until the next update) in the filesystem.__init__ module:

lair/uataq/filesystem/__init__.py

DEFAULT_GROUP: str = 'lin'

horel

John Horel group - MesoWest, TRAX/eBUS, etc.

lin

John Lin group - UUCON, TRAX, etc.

Contents

lair.uataq.filesystem.DEFAULT_GROUP: str = 'lin'

Default group to read data from.

lair.uataq.filesystem.groups: dict = {'horel': HorelGroup(), 'lin': LinGroup()}

Groups dictionary to store GroupSpace objects.

lair.uataq.filesystem.lvls: dict = {'calibrated': 3, 'final': 4, 'qaqc': 2, 'raw': 1}

Levels of data processing.
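Because `lvls` maps each level name to an integer rank, level names can be ordered by processing stage rather than alphabetically. The helper below is a hypothetical illustration of that ordering (it is not necessarily how `get_highest_lvl` is implemented): it picks the most processed level from whichever levels happen to be available.

```python
lvls = {'calibrated': 3, 'final': 4, 'qaqc': 2, 'raw': 1}


def highest_available_lvl(available):
    """Return the most processed level name in `available` (hypothetical helper)."""
    known = [lvl for lvl in available if lvl in lvls]  # ignore unrecognised names
    if not known:
        raise ValueError('no recognised data levels')
    # max() compares levels by their rank in `lvls`, not by string order.
    return max(known, key=lvls.__getitem__)
```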

class lair.uataq.filesystem.DataFile(path: str)

Abstract base class for data files.

Attributes

path

(str) The file path.

period

(pd.Period) The period of the data file.

logger

(str) The logger name.

date_slicer

(slice) A slice object to extract the date from the file name.

file_freq

(str) The file frequency.

ext

(str) The file extension.

Methods

parse()

Parse the data file.

__init__(path: str)

Initialize the DataFile object. Determines the period of the data file from the file name.

Parameters:
path : str

The file path.

abstract parse() → DataFrame

Parse the data file. Must be implemented by subclasses.
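A concrete DataFile subclass might look like the sketch below. It is not the library's implementation: it assumes daily CSV files whose names start with a `YYYY_MM_DD` date, so `date_slicer` selects the first ten characters of the file name, and it assumes `__init__` turns that date and `file_freq` into a `pd.Period`. The `ExampleCSVFile` class and its `Time_UTC` column are hypothetical.

```python
import os
from abc import ABC, abstractmethod

import pandas as pd


class DataFile(ABC):
    """Simplified stand-in for lair.uataq.filesystem.DataFile."""

    date_slicer: slice = slice(0, 10)  # assumed: file name starts with YYYY_MM_DD
    file_freq: str = 'D'               # pandas frequency string: one file per day
    ext: str = 'csv'

    def __init__(self, path: str):
        self.path = path
        name = os.path.basename(path)
        date_str = name[self.date_slicer].replace('_', '-')
        # The file's period is derived from the date embedded in its name.
        self.period = pd.Period(date_str, freq=self.file_freq)

    @abstractmethod
    def parse(self) -> pd.DataFrame:
        """Parse the data file. Must be implemented by subclasses."""


class ExampleCSVFile(DataFile):
    """Hypothetical subclass for a CSV file with a 'Time_UTC' column."""

    def parse(self) -> pd.DataFrame:
        df = pd.read_csv(self.path, parse_dates=['Time_UTC'])
        return df.set_index('Time_UTC')
```

Keeping the period logic in the base class means each subclass only has to describe its own file layout in `parse()`.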

class lair.uataq.filesystem.GroupSpace

Bases: object

Abstract base class for group spaces.

Attributes

name

(str) The group name.

datafiles

(dict[str, Type[DataFile]]) A dictionary of datafile keys and DataFile classes.

Methods

get_highest_lvl(SID, instrument)

Get the highest data level for a given site and instrument.

get_files(SID, instrument, lvl, logger)

Get list of file paths for a given site, instrument, and level.

get_datafile_key(instrument, lvl, logger)

Get the datafile key based on the instrument, level, and logger.

get_datafile_class(instrument, lvl, logger)

Get the DataFile class based on the instrument, level, and logger.

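get_datafiles(SID, instrument, lvl, logger, time_range, pattern)

Get a list of DataFile objects for a given site, instrument, level, logger, and time range.

standardize_data(instrument, data)

Standardize the data to a common format across research groups.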

abstract static get_highest_lvl(SID: str, instrument: str) → str

Get the highest data level for a given site and instrument.

Parameters:
SID : str

The site ID.

instrument : str

The instrument name.

Returns:
str

The highest data level.

abstract get_files(SID: str, instrument: str, lvl: str, logger: str) → list[str]

Get list of file paths for a given site, instrument, and level.

Parameters:
SID : str

The site ID.

instrument : str

The instrument name.

lvl : str

The data level.

logger : str

The logger name.

Returns:
list[str]

A list of file paths.
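How `get_files` locates files is group-specific. As a hypothetical example, a group that stores data as `<root>/<SID>/<instrument>/<logger>/<lvl>/<file>` could implement it with a glob, as sketched below. The directory layout and the `root` attribute are assumptions for illustration only.

```python
import glob
import os


class ExampleGroup:
    """Hypothetical group storing files as <root>/<SID>/<instrument>/<logger>/<lvl>/<file>."""

    def __init__(self, root: str):
        self.root = root  # assumed: each group knows where its data live

    def get_files(self, SID: str, instrument: str, lvl: str, logger: str) -> list[str]:
        pattern = os.path.join(self.root, SID, instrument, logger, lvl, '*')
        # Keep regular files only; sorting gives chronological order when
        # file names begin with a date.
        return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```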

abstract get_datafile_key(instrument: str, lvl: str, logger: str) → str

Get the datafile key based on the instrument, level, and logger.

Parameters:
instrument : str

The instrument name.

lvl : str

The data level.

logger : str

The logger name.

Returns:
str

The datafile key.

get_datafile_class(instrument: str, lvl: str, logger: str) → Type[DataFile]

Get the DataFile class based on the instrument, level, and logger.

Parameters:
instrument : str

The instrument name.

lvl : str

The data level.

logger : str

The logger name.

Returns:
Type[DataFile]

The DataFile class.
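Since a GroupSpace carries a `datafiles` dictionary of keys to DataFile classes, one natural way for `get_datafile_class` to work is to build the key with `get_datafile_key` and look it up in that dictionary. The sketch below assumes exactly that; the key format (`'<instrument>_<lvl>'`) and the `ExampleGroup`, `ExampleRawFile`, and `ExampleQAQCFile` classes are hypothetical.

```python
from typing import Type


class DataFile:
    """Placeholder for lair.uataq.filesystem.DataFile."""


class ExampleRawFile(DataFile): ...


class ExampleQAQCFile(DataFile): ...


class ExampleGroup:
    """Hypothetical GroupSpace subclass showing the key -> class lookup."""

    name = 'example'
    datafiles: dict[str, Type[DataFile]] = {
        'example_raw': ExampleRawFile,
        'example_qaqc': ExampleQAQCFile,
    }

    def get_datafile_key(self, instrument: str, lvl: str, logger: str) -> str:
        # Hypothetical rule: this group's file format depends only on
        # instrument and level, so the logger is ignored.
        return f'{instrument}_{lvl}'

    def get_datafile_class(self, instrument: str, lvl: str, logger: str) -> Type[DataFile]:
        key = self.get_datafile_key(instrument, lvl, logger)
        try:
            return self.datafiles[key]
        except KeyError:
            raise ValueError(f'no DataFile class registered for key {key!r}') from None
```

With this split, only `get_datafile_key` needs group-specific logic; the dictionary lookup can be shared.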

get_datafiles(SID: str, instrument: str, lvl: str, logger: str, time_range: TimeRange, pattern: str | None = None) → list[DataFile]

Get a list of DataFile objects for a given site, instrument, level, logger, and time range.

Parameters:
SID : str

The site ID.

instrument : str

The instrument name.

lvl : str

The data level.

logger : str

The logger name.

time_range : TimeRange

The time range to filter by.

pattern : str, optional

A string pattern to filter the file paths. Defaults to None.

Returns:
list[DataFile]

A list of DataFile objects.
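The documented methods suggest a pipeline for `get_datafiles`: list paths with `get_files`, wrap each in the class from `get_datafile_class`, then filter by time range and pattern. The sketch below assumes that composition; it is not the library's code. For brevity it replaces `TimeRange` with an inclusive pair of ISO date strings, takes each file's date from the start of its name, and uses a hard-coded path list in place of a directory listing. All class names and paths are hypothetical.

```python
class DataFile:
    """Stand-in DataFile: the file's date is the first 10 characters of its name."""

    def __init__(self, path: str):
        self.path = path
        self.date = path.rsplit('/', 1)[-1][:10]  # 'YYYY-MM-DD'


class ExampleGroup:
    """Hypothetical GroupSpace subclass wiring the documented steps together."""

    def get_files(self, SID, instrument, lvl, logger):
        # Stand-in for a real directory listing.
        return [f'/data/{SID}/{instrument}/{lvl}/2024-01-0{d}_{logger}.csv' for d in range(1, 6)]

    def get_datafile_class(self, instrument, lvl, logger):
        return DataFile

    def get_datafiles(self, SID, instrument, lvl, logger, time_range, pattern=None):
        start, stop = time_range  # hypothetical: inclusive ISO dates, not a TimeRange
        cls = self.get_datafile_class(instrument, lvl, logger)
        files = [cls(path) for path in self.get_files(SID, instrument, lvl, logger)]
        # ISO date strings compare correctly as plain strings.
        return [
            f for f in files
            if start <= f.date <= stop and (pattern is None or pattern in f.path)
        ]
```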

abstract static standardize_data(instrument: str, data: DataFrame) → DataFrame

Convert the data to a format that is standard across research groups, renaming columns, converting units, mapping values, etc., as needed.

Parameters:
instrument : str

The instrument model.

data : pd.DataFrame

The data to standardize.

Returns:
pd.DataFrame

The standardized data.
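A group's `standardize_data` typically chains the three kinds of changes listed above. The sketch below shows one possible implementation for a single hypothetical instrument: it renames columns, converts a temperature from degrees Celsius to kelvin, and maps numeric flag codes to labels. The column names, the `example_instrument` key, and the flag codes are all hypothetical.

```python
import pandas as pd

# Hypothetical per-instrument settings; each real group defines its own.
RENAME = {'example_instrument': {'CO2': 'CO2d_ppm', 'Temp_C': 'T_K', 'Flag': 'QAQC_Flag'}}
FLAG_VALUES = {0: 'ok', 1: 'suspect'}


def standardize_data(instrument: str, data: pd.DataFrame) -> pd.DataFrame:
    """Sketch of standardize_data for one hypothetical group."""
    # rename() returns a new DataFrame, so the caller's data is left untouched.
    out = data.rename(columns=RENAME.get(instrument, {}))
    if 'T_K' in out:
        out['T_K'] = out['T_K'] + 273.15  # degrees Celsius -> kelvin
    if 'QAQC_Flag' in out:
        out['QAQC_Flag'] = out['QAQC_Flag'].map(FLAG_VALUES)  # codes -> labels
    return out
```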

lair.uataq.filesystem.filter_datafiles(files: list[DataFile], time_range: TimeRange, pattern: str | None = None) → list[DataFile]

Filter a list of DataFile objects by a given time range and, optionally, a file path pattern.

Parameters:
files : list

A list of DataFile objects.

time_range : TimeRange

A TimeRange object representing the time range to filter by.

pattern : str, optional

A string pattern to filter the file paths. Defaults to None.

Returns:
list[DataFile]

A list of DataFile objects that match the given time range.
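One way to filter by time range is to keep every file whose coverage overlaps the requested window. The sketch below assumes each file covers the half-open interval [start, start + duration) and the window is [start, stop). It replaces `TimeRange` with two datetimes and treats `pattern` as a plain substring of the path; both choices are assumptions, since the matching rule is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataFile:
    """Stand-in: a file covering [start, start + duration)."""
    path: str
    start: datetime
    duration: timedelta = timedelta(days=1)

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def filter_datafiles(files, start, stop, pattern=None):
    """Keep files overlapping [start, stop) whose path contains `pattern`.

    Hypothetical: `start`/`stop` replace the TimeRange object, and `pattern`
    is a plain substring rather than, e.g., a regex or glob.
    """
    return [
        f for f in files
        # Two half-open intervals overlap iff each starts before the other ends.
        if f.start < stop and f.end > start
        and (pattern is None or pattern in f.path)
    ]
```

Testing for overlap rather than containment keeps files that straddle the window edges, so no data near the boundaries is lost before parsing.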

lair.uataq.filesystem.parse_datafiles(files: list[DataFile], time_range: TimeRange, num_processes: int | Literal['max'] = 1, driver: Literal['pandas', 'xarray'] = 'pandas')

Read and parse data files, optionally in parallel across multiple processes.

Parameters:
files : list

A list of DataFile objects.

time_range : TimeRange

A TimeRange object representing the time range to filter by.

num_processes : int or 'max', optional

The number of processes to use. Defaults to 1.

driver : {'pandas', 'xarray'}, optional

The data driver to use. Defaults to ‘pandas’.

Returns:
pandas.DataFrame

A DataFrame containing the parsed data.
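The sketch below shows the general shape of a pandas-only `parse_datafiles`: parse each file, in a process pool when more than one process is requested, then concatenate, sort, and trim to the time range. It is not the library's implementation. It ignores `driver`, replaces `TimeRange` with start and stop values usable in `DataFrame.loc` slicing, and assumes `'max'` means one process per CPU; each file's `parse()` is assumed to return a DataFrame with a DatetimeIndex.

```python
import os
from multiprocessing import Pool

import pandas as pd


def _parse(datafile):
    # Top-level function so it can be pickled and sent to worker processes.
    return datafile.parse()


def parse_datafiles(files, start, stop, num_processes=1):
    """Sketch: parse files (optionally in parallel), concatenate, trim to [start, stop]."""
    if num_processes == 'max':
        num_processes = os.cpu_count() or 1  # assumption about what 'max' means
    if num_processes > 1 and len(files) > 1:
        with Pool(min(num_processes, len(files))) as pool:
            frames = pool.map(_parse, files)
    else:
        frames = [_parse(f) for f in files]
    if not frames:
        return pd.DataFrame()
    data = pd.concat(frames).sort_index()
    # Label-based slicing on a sorted DatetimeIndex includes both endpoints.
    return data.loc[start:stop]
```

On platforms that start worker processes with `spawn` (Windows, and macOS by default), code that creates a `Pool` must run under an `if __name__ == '__main__':` guard, and the DataFile objects must be picklable.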