File System
The UATAQ subpackage operates on the idea that instrument data is collected
differently by each research group, even when it comes from the same site or
instrument. Furthermore, the format in which data is written, and therefore
read, depends on the system that logged the data rather than on the instrument
itself. To address this, UATAQ introduces a filesystem module, which consists
of GroupSpace and DataFile objects. The GroupSpace objects provide
group-specific methods for working with data from that group, while the
DataFile objects handle the actual parsing of the data files.
Group Spaces
Each research group has its own module in groupspaces where group-specific
code is stored. Each group module must contain a subclass of GroupSpace and
define DataFile subclasses for each file format that the group uses. All
GroupSpace objects are stored in the groups dictionary with the group name as
the key.
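The layout described above can be sketched as follows. This is a minimal, self-contained illustration rather than the package's own code: the DataFile and GroupSpace classes are simplified stand-ins, and ExampleGroup, ExampleRawFile, and the "instrument_raw" key are hypothetical names.

```python
from __future__ import annotations

from typing import Type


class DataFile:
    """Simplified stand-in for lair.uataq.filesystem.DataFile."""

    def __init__(self, path: str):
        self.path = path


class GroupSpace:
    """Simplified stand-in for lair.uataq.filesystem.GroupSpace."""

    name: str
    datafiles: dict[str, Type[DataFile]]


class ExampleRawFile(DataFile):
    """Hypothetical DataFile subclass for one file format used by a group."""


class ExampleGroup(GroupSpace):
    """Hypothetical group module content: one GroupSpace subclass per group."""

    name = "example"
    datafiles = {"instrument_raw": ExampleRawFile}


# Registry keyed by group name, mirroring the documented `groups` dictionary.
groups: dict[str, GroupSpace] = {"example": ExampleGroup()}

group = groups["example"]
print(group.name, sorted(group.datafiles))
```

Keeping one GroupSpace instance per group in a dictionary means a group can be selected by name alone, without the caller importing the group's module.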
The default research group can be changed at runtime via
uataq.filesystem.DEFAULT_GROUP, or changed permanently (until the next update)
by editing the filesystem.__init__ module:
DEFAULT_GROUP: str = 'lin'
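A runtime change of this kind is an assignment to a module attribute. The sketch below uses a stand-in module built with types.ModuleType so that it runs without the package installed; the placeholder strings in groups and the default_groupspace helper are hypothetical, and whether the package reads DEFAULT_GROUP at call time is an assumption.

```python
import types

# Stand-in for the filesystem module so the example runs without the package.
filesystem = types.ModuleType("filesystem")
filesystem.DEFAULT_GROUP = "lin"
filesystem.groups = {"horel": "HorelGroup()", "lin": "LinGroup()"}  # placeholders


def default_groupspace():
    """Look up the default group at call time, so runtime changes take effect."""
    return filesystem.groups[filesystem.DEFAULT_GROUP]


before = default_groupspace()

# Runtime change: rebind the attribute on the module object itself.
filesystem.DEFAULT_GROUP = "horel"
after = default_groupspace()

print(before, "->", after)
```

Note that a `from ... import DEFAULT_GROUP` statement copies the current value into the importing namespace, so reassigning that local name would not change the module attribute. Assigning through the module object avoids this.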
Contents
- lair.uataq.filesystem.groups: dict = {'horel': HorelGroup(), 'lin': LinGroup()}
Groups dictionary to store GroupSpace objects.
- lair.uataq.filesystem.lvls: dict = {'calibrated': 3, 'final': 4, 'qaqc': 2, 'raw': 1}
Levels of data processing.
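Because each level name maps to an integer, levels can be ordered and compared. The sketch below copies the documented dictionary value; the highest_lvl helper is hypothetical and only illustrates one way the mapping might be used.

```python
# Copied from the documented value of lair.uataq.filesystem.lvls.
lvls = {"calibrated": 3, "final": 4, "qaqc": 2, "raw": 1}


def highest_lvl(available):
    """Return the most processed level among those available (hypothetical helper)."""
    return max(available, key=lvls.__getitem__)


# Level names ordered from least to most processed.
ordered = sorted(lvls, key=lvls.get)

print(ordered)
print(highest_lvl(["raw", "qaqc"]))
```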
- class lair.uataq.filesystem.DataFile(path: str)
Abstract base class for data files.
Attributes
path
(str) The file path.
period
(pd.Period) The period of the data file.
logger
(str) The logger name.
date_slicer
(slice) A slice object to extract the date from the file name.
file_freq
(str) The file frequency.
ext
(str) The file extension.
Methods
parse()
Parse the data file.
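The relationship between the file name, date_slicer, and the file's period can be sketched as below. This is an assumption-laden illustration, not the package's implementation: DataFileSketch is a stand-in, the date_format attribute and the MonthlyDat class are hypothetical, and a plain datetime is used where the documented class stores a pd.Period.

```python
import os
from abc import ABC, abstractmethod
from datetime import datetime


class DataFileSketch(ABC):
    """Simplified stand-in for DataFile; the real class stores a pd.Period."""

    date_slicer: slice  # where the date sits in the file name
    date_format: str    # hypothetical attribute, not in the documented class
    ext: str

    def __init__(self, path: str):
        self.path = path
        name = os.path.basename(path)
        # Cut the date characters out of the file name and parse them.
        self.start = datetime.strptime(name[self.date_slicer], self.date_format)

    @abstractmethod
    def parse(self):
        """Parse the data file."""


class MonthlyDat(DataFileSketch):
    """Hypothetical format: monthly files named like '2024_01_raw.dat'."""

    date_slicer = slice(0, 7)
    date_format = "%Y_%m"
    ext = "dat"

    def parse(self):
        raise NotImplementedError("parsing is format-specific")


f = MonthlyDat("/data/site/2024_01_raw.dat")
print(f.start)
```

Deriving the period from the file name alone lets files be selected by time range before any of them is opened.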
- class lair.uataq.filesystem.GroupSpace
Bases: object
Abstract base class for group spaces.
Attributes
name
(str) The group name.
datafiles
(dict[str, Type[DataFile]]) A dictionary of datafile keys and DataFile classes.
Methods
get_highest_lvl(SID, instrument)
Get the highest data level for a given site and instrument.
get_files(SID, instrument, lvl, logger)
Get list of file paths for a given site, instrument, and level.
get_datafile_key(instrument, lvl, logger)
Get the datafile key based on the instrument, level, and logger.
get_datafile_class(instrument, lvl, logger)
Get the DataFile class based on the instrument, level, and logger.
get_datafiles(SID, instrument, lvl, logger, time_range, pattern)
Returns a list of data files for a given level and time range.
standardize_data(instrument, data)
Standardize the data to a common format across research groups.
- abstract static get_highest_lvl(SID: str, instrument: str) -> str
Get the highest data level for a given site and instrument.
- Parameters:
- SID : str
The site ID.
- instrument : str
The instrument name.
- Returns:
- str
The highest data level.
- abstract get_files(SID: str, instrument: str, lvl: str, logger: str) -> list[str]
Get list of file paths for a given site, instrument, and level.
- Parameters:
- SID : str
The site ID.
- instrument : str
The instrument name.
- lvl : str
The data level.
- logger : str
The logger name.
- Returns:
- list[str]
A list of file paths.
- abstract get_datafile_key(instrument: str, lvl: str, logger: str) -> str
Get the datafile key based on the instrument, level, and logger.
- Parameters:
- instrument : str
The instrument name.
- lvl : str
The data level.
- logger : str
The logger name.
- Returns:
- str
The datafile key.
- get_datafile_class(instrument: str, lvl: str, logger: str) -> Type[DataFile]
Get the DataFile class based on the instrument, level, and logger.
- Parameters:
- instrument : str
The instrument name.
- lvl : str
The data level.
- logger : str
The logger name.
- Returns:
- Type[DataFile]
The DataFile class.
- get_datafiles(SID: str, instrument: str, lvl: str, logger: str, time_range: TimeRange, pattern: str | None = None) -> list[DataFile]
Returns a list of data files for a given level and time range.
- Parameters:
- SID : str
The site ID.
- instrument : str
The instrument name.
- lvl : str
The data level.
- logger : str
The logger name.
- time_range : TimeRange
The time range to filter by.
- pattern : str, optional
A string pattern to filter the file paths. Defaults to None.
- Returns:
- list[DataFile]
A list of DataFile objects.
- abstract static standardize_data(instrument: str, data: DataFrame) -> DataFrame
Convert the data to a format that is standard across research groups by renaming columns, converting units, mapping values, and so on, as needed.
- Parameters:
- instrument : str
The instrument model.
- data : pd.DataFrame
The data to standardize.
- Returns:
- pd.DataFrame
The standardized data.
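The split between abstract and concrete methods suggests that a group only supplies file discovery and key naming, while the base class combines them. The sketch below shows one plausible arrangement under that assumption. GroupSpaceSketch, DemoGroup, RawCsv, and the hard-coded paths are hypothetical; time-range filtering is omitted, and treating pattern as a plain substring is also an assumption.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Type


class DataFile:
    """Simplified stand-in for the documented DataFile class."""

    def __init__(self, path: str):
        self.path = path


class GroupSpaceSketch(ABC):
    """Stand-in showing one plausible split of abstract and concrete methods."""

    datafiles: dict[str, Type[DataFile]]

    @abstractmethod
    def get_files(self, SID, instrument, lvl, logger) -> list[str]: ...

    @abstractmethod
    def get_datafile_key(self, instrument, lvl, logger) -> str: ...

    def get_datafile_class(self, instrument, lvl, logger) -> Type[DataFile]:
        # Assumed: the key from the subclass indexes the `datafiles` dictionary.
        return self.datafiles[self.get_datafile_key(instrument, lvl, logger)]

    def get_datafiles(self, SID, instrument, lvl, logger, pattern=None) -> list[DataFile]:
        cls = self.get_datafile_class(instrument, lvl, logger)
        files = [cls(p) for p in self.get_files(SID, instrument, lvl, logger)]
        # Time-range filtering is omitted; only the optional pattern is applied.
        return [f for f in files if pattern is None or pattern in f.path]


class RawCsv(DataFile):
    """Hypothetical file format."""


class DemoGroup(GroupSpaceSketch):
    datafiles = {"demo_raw": RawCsv}

    def get_files(self, SID, instrument, lvl, logger):
        # Hard-coded paths in place of a real directory listing.
        return [f"/data/{SID}/{instrument}/{lvl}/2024_01.csv",
                f"/data/{SID}/{instrument}/{lvl}/2024_02.csv"]

    def get_datafile_key(self, instrument, lvl, logger):
        return f"{instrument}_{lvl}"


found = DemoGroup().get_datafiles("abc", "demo", "raw", "logger", pattern="2024_02")
print([f.path for f in found])
```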
- lair.uataq.filesystem.filter_datafiles(files: list[DataFile], time_range: TimeRange, pattern: str | None = None) -> list[DataFile]
Filter a list of DataFile objects by a given time range and, optionally, by a path pattern.
- Parameters:
- files : list[DataFile]
A list of DataFile objects.
- time_range : TimeRange
A TimeRange object representing the time range to filter by.
- pattern : str, optional
A string pattern to filter the file paths. Defaults to None.
- Returns:
- list[DataFile]
A list of DataFile objects that match the given time range.
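One way such a filter could work is an interval-overlap test combined with a path check. The sketch below is self-contained and makes several assumptions: FileStub and RangeStub are hypothetical stand-ins for DataFile and TimeRange, each file covers a half-open span from start to end, and pattern is treated as a plain substring.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FileStub:
    """Hypothetical stand-in for a DataFile: a path plus the span it covers."""
    path: str
    start: datetime
    end: datetime


@dataclass
class RangeStub:
    """Hypothetical stand-in for TimeRange; None means unbounded on that side."""
    start: Optional[datetime] = None
    stop: Optional[datetime] = None


def filter_files(files, time_range, pattern=None):
    """Keep files whose span overlaps the range and whose path contains pattern."""
    kept = []
    for f in files:
        if pattern is not None and pattern not in f.path:
            continue
        if time_range.start is not None and f.end <= time_range.start:
            continue  # file ends before the range begins
        if time_range.stop is not None and f.start >= time_range.stop:
            continue  # file begins after the range ends
        kept.append(f)
    return kept


files = [
    FileStub("a/2024_01.dat", datetime(2024, 1, 1), datetime(2024, 2, 1)),
    FileStub("a/2024_02.dat", datetime(2024, 2, 1), datetime(2024, 3, 1)),
    FileStub("b/2024_03.dat", datetime(2024, 3, 1), datetime(2024, 4, 1)),
]
window = RangeStub(datetime(2024, 1, 15), datetime(2024, 3, 10))
selected = filter_files(files, window, pattern="a/")
print([f.path for f in selected])
```

Testing for overlap rather than containment keeps files that only partly fall inside the requested range, which matters when a range starts or ends mid-file.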
- lair.uataq.filesystem.parse_datafiles(files: list[DataFile], time_range: TimeRange, num_processes: int | Literal['max'] = 1, driver: Literal['pandas', 'xarray'] = 'pandas')
Read and parse data files, optionally using multiple processes.
- Parameters:
- files : list[DataFile]
A list of DataFile objects.
- time_range : TimeRange
A TimeRange object representing the time range to filter by.
- num_processes : int or 'max', optional
The number of processes to use. Defaults to 1.
- driver : {'pandas', 'xarray'}, optional
The data driver to use. Defaults to 'pandas'.
- Returns:
- pandas.DataFrame
A DataFrame containing the parsed data.
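The num_processes parameter accepts either an integer or the literal 'max'. The sketch below shows one way such a value might be turned into a worker count. The resolve_num_processes helper is hypothetical, the reading of 'max' as "all CPUs visible to the interpreter" is an assumption the source does not state, and capping the count at the number of files is a design choice of this sketch only.

```python
import os


def resolve_num_processes(num_processes, n_files):
    """Turn an int or 'max' into a worker count (hypothetical helper)."""
    if num_processes == "max":
        # Assumption: 'max' means every CPU the interpreter can see.
        num_processes = os.cpu_count() or 1
    if num_processes < 1:
        raise ValueError("num_processes must be a positive integer or 'max'")
    # More workers than files would leave some idle.
    return max(1, min(num_processes, n_files))


print(resolve_num_processes(1, 10))
print(resolve_num_processes(8, 3))
```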