api_24sea.datasignals.core#

The core module for the api_24sea.datasignals subpackage

Classes#

`API`	Accessor for working with data signals coming from the 24SEA API.
`AsyncAPI`	Async version of the API class. Get data from 24sea API /datasignals

Functions#

`to_category_value`(→ pandas.DataFrame)	Categorize the data based on the metrics overview.
`to_star_schema`(→ Optional[Union[Dict[str, Any], ...)	Transforms the data and metrics_overview into a star schema format for analytical purposes.

Module Contents#

class API#

Accessor for working with data signals coming from the 24SEA API.

Get the metrics names for a site, provided the following parameters.

Parameters#

site: Optional[str]: The site name. If None, the queryable metrics for all sites will be returned, and the locations and metrics parameters will be ignored.
locations: Optional[Union[str, List[str]]]: The locations for which to get the metrics. If None, all locations will be considered.
metrics: Optional[Union[str, List[str]]]: The metrics to get. They can be specified as regular expressions. If None, all metrics will be considered.

For example:

metrics=["^ACC", "^DEM"] will return all the metrics that start with ACC or DEM,
metrics=["windspeed$", "winddirection$"] will return all the metrics that end with windspeed or winddirection, and
metrics=[".*WF_A01.*", ".*WF_A02.*"] will return all the metrics that contain WF_A01 or WF_A02.
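The patterns above behave like ordinary regular expressions. A minimal sketch with Python's `re` module on a made-up list of metric names follows; the names and the `select_metrics` helper are hypothetical and not part of api_24sea, and the sketch assumes plain, case-sensitive `re.search` matching, which this page does not confirm.

```python
import re
from typing import List

# Hypothetical metric names, used only to illustrate the patterns.
AVAILABLE = [
    "ACC_WF_A01_X",
    "DEM_WF_A02_Y",
    "mean_WF_A01_windspeed",
    "mean_WF_A02_winddirection",
    "std_WF_A03_windspeed",
]


def select_metrics(patterns: List[str], names: List[str]) -> List[str]:
    """Return the names matched by at least one pattern (re.search semantics)."""
    compiled = [re.compile(p) for p in patterns]
    return [n for n in names if any(c.search(n) for c in compiled)]


# Names that start with ACC or DEM.
starts = select_metrics(["^ACC", "^DEM"], AVAILABLE)
# Names that end with windspeed or winddirection.
ends = select_metrics(["windspeed$", "winddirection$"], AVAILABLE)
# Names that contain WF_A01 or WF_A02.
contains = select_metrics([".*WF_A01.*", ".*WF_A02.*"], AVAILABLE)
```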

Returns#

Optional[List[Dict[str, Optional[str]]]]: The metrics names for the given site, locations and metrics.

Note

This class method is legacy because it does not add functionality to the DataSignals pandas accessor.

put_data(data: Any, site: str | None = None, location: str | None = None, on_conflict: str = 'replace', include_cyclecount: bool = True, profile: bool = False, headers: Dict[str, str] | None = None, timeout: int = 3600, max_retries: int = 0) → Dict[str, Any]#

Insert or fully replace DataSignals rows.

Parameters#

data: Mapping, Sequence[Mapping], or pandas.DataFrame: Rows containing timestamp, site, location, and one or more metric fields.
site: str, optional: Default site for rows that omit the field.
location: str, optional: Default location for rows that omit the field.
on_conflict: {“replace”, “nothing”}, default “replace”: Replace an existing metrics document or leave the row unchanged.
include_cyclecount: bool, default True: Whether metrics beginning with CC_ should be written.
profile: bool, default False: Whether the backend should include write timing information.
headers: dict, optional: HTTP headers. Defaults to accepting JSON.
timeout: int, default 3600: Request timeout in seconds.
max_retries: int, default 0: Maximum retries after an HTTP 502 response.

Returns#

dict: Backend write result with processed and affected row counts.

Notes#

on_conflict="replace" replaces the entire existing metrics document. Metric keys omitted from the request are removed.

patch_data(data: Any, site: str | None = None, location: str | None = None, on_conflict: str = 'replace', include_cyclecount: bool = True, profile: bool = False, headers: Dict[str, str] | None = None, timeout: int = 3600, max_retries: int = 0) → Dict[str, Any]#

Insert rows or merge supplied metric keys into existing rows.

Parameters are equivalent to put_data(). With on_conflict="replace", incoming metric values win while unrelated existing metric keys remain unchanged. With "nothing", existing conflicting keys remain unchanged.
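The difference between the two write modes can be pictured with plain dictionaries. This is a local sketch of the documented semantics only: `put_row` and `patch_row` are hypothetical helpers that mimic what the backend is described as doing to a stored metrics document, and no request is sent. Adding non-conflicting keys under `patch_row(..., on_conflict="nothing")` is an assumption read from the wording above.

```python
from typing import Dict, Optional

Metrics = Dict[str, float]


def put_row(existing: Optional[Metrics], incoming: Metrics,
            on_conflict: str = "replace") -> Metrics:
    """Mimic put_data: replace the whole document or keep the existing row."""
    if existing is None:
        return dict(incoming)      # no conflict: plain insert
    if on_conflict == "replace":
        return dict(incoming)      # keys omitted from the request are dropped
    if on_conflict == "nothing":
        return dict(existing)      # row left unchanged
    raise ValueError(f"unsupported on_conflict: {on_conflict!r}")


def patch_row(existing: Optional[Metrics], incoming: Metrics,
              on_conflict: str = "replace") -> Metrics:
    """Mimic patch_data: merge supplied keys into the existing row."""
    if existing is None:
        return dict(incoming)
    if on_conflict == "replace":
        return {**existing, **incoming}   # incoming values win
    if on_conflict == "nothing":
        return {**incoming, **existing}   # existing conflicting keys win
    raise ValueError(f"unsupported on_conflict: {on_conflict!r}")


stored = {"mean_windspeed": 7.5, "std_windspeed": 1.2}
update = {"mean_windspeed": 8.0, "max_windspeed": 12.0}

put_replace = put_row(stored, update)
put_nothing = put_row(stored, update, on_conflict="nothing")
patch_replace = patch_row(stored, update)
patch_nothing = patch_row(stored, update, on_conflict="nothing")
```

In this sketch only `put_replace` loses `std_windspeed`, which is the behaviour to keep in mind before choosing put_data() over patch_data().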

Get the data signals from the 24SEA API.

Parameters#

sites: Optional[Union[List, str]]: The site name or List of site names. If None, the site will be inferred from the metrics.
locations: Optional[Union[List, str]]: The location name or List of location names. If None, the location will be inferred from the metrics.
metrics: Union[List, str]: The metric name or List of metric names. It must be provided. They do not have to be the entire metric name, but can be a part of it. For example, if the metric name is "mean_WF_A01_windspeed", the user can equivalently provide sites="wf", locations="a01", metrics="mean windspeed".
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the query. It must be in ISO 8601 format, e.g., "2021-01-01T00:00:00Z" or a datetime object.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the query. It must be in ISO 8601 format, e.g., "2021-01-01T00:00:00Z" or a datetime object.
as_dict: bool, optional: If True, the data will be returned as a list of dictionaries. Default is False.
as_star_schema: bool, optional: If True, the data will be returned in a star schema format. Default is False.
outer_join_on_timestamp: bool: If False, the data will be returned as a block-diagonal DataFrame that contains the site and location columns; the timestamp column will not contain unique values, since each timestamp is repeated for every site and location. If True, the data will be returned as a full DataFrame without the site and location columns, and the timestamp column will contain unique values.
headers: Optional[Union[Dict[str, str]]], optional: The headers to pass to the request. If None, the default headers will be used as {"accept": "application/json"}. Default is None.
data: pd.DataFrame: The DataFrame to update with the data signals. If None, a new DataFrame will be created. Default is None.
timeout: int, optional: The timeout for the request in seconds. Default is 3600.
threads: int, optional: The number of threads to use for the request. If None, it will be set to the number of CPU cores, which is also the default.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.
force_cache_miss: bool, optional: Whether to force a cache miss on the backend data endpoint. Default is False.
method: str, optional: Retrieval method, either "GET" or "POST". Default is "POST".

Returns#

Union[pd.DataFrame, Dict[str, Dict[str, pd.DataFrame]]]

The DataFrame containing the data signals, or
A dictionary containing the data signals divided by location, or
A dictionary containing the data signals in star schema format.
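The two DataFrame layouts controlled by outer_join_on_timestamp can be pictured with plain pandas. The sketch below does not call the 24SEA API and is not the library's internal code; the two per-location frames and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical per-location results, as if retrieved separately.
a01 = pd.DataFrame({
    "timestamp": ["2021-01-01T00:00:00Z", "2021-01-01T00:10:00Z"],
    "site": "windfarm",
    "location": "A01",
    "mean_WF_A01_windspeed": [7.5, 8.0],
})
a02 = pd.DataFrame({
    "timestamp": ["2021-01-01T00:00:00Z", "2021-01-01T00:10:00Z"],
    "site": "windfarm",
    "location": "A02",
    "mean_WF_A02_windspeed": [6.9, 7.1],
})

# Block-diagonal layout: rows are stacked, site/location are kept,
# timestamps repeat, and each metric is NaN outside its own block.
block_diagonal = pd.concat([a01, a02], ignore_index=True)

# Outer-joined layout: one row per timestamp, no site/location columns.
joined = a01.drop(columns=["site", "location"]).merge(
    a02.drop(columns=["site", "location"]),
    on="timestamp",
    how="outer",
)
```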

Get the metrics statistics (MAX, MIN, AVG) for the specified time range.

Parameters#

sites: Optional[Union[List, str]]: The sites to filter the data.
locations: Optional[Union[List, str]]: The locations to filter the data.
metrics: Union[List, str]: The metrics to retrieve.
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the data retrieval.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the data retrieval.
as_dict: bool: Whether to return the data as a dictionary.
headers: Optional[Union[Dict[str, str]]]: Headers to include in the request.
timeout: int: The timeout for the request.
threads: Optional[int]: The number of threads to use for the request.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.
method: str, optional: HTTP method to use for the backend request. Default is "GET".

Returns#

Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]]: The retrieved data.

Get the list of timestamps for which the selected metrics have null values in the specified time range.

Parameters#

sites: Optional[Union[List, str]]: The sites to filter the data.
locations: Optional[Union[List, str]]: The locations to filter the data.
metrics: Union[List, str]: The metrics to retrieve.
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the data retrieval.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the data retrieval.
as_dict: bool: Whether to return the data as a dictionary.
headers: Optional[Union[Dict[str, str]]]: Headers to include in the request.
timeout: int: The timeout for the request.
threads: Optional[int]: The number of threads to use for the request.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.
method: str, optional: HTTP method to use for the backend request. Default is "GET".

Returns#

Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]]: The retrieved data.

Get the metrics statistics (MAX, MIN, AVG) for the specified time range.

Parameters#

sites: Optional[Union[List, str]]: The sites to filter the data.
locations: Optional[Union[List, str]]: The locations to filter the data.
metrics: Union[List, str]: The metrics to retrieve.
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the data retrieval.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the data retrieval.
granularity: Union[str, int]: The granularity of the data, can be a string, or an integer number of seconds. String values are restricted to “day”, “week”, “calendarmonth”, “30days”, or “365days”. If “calendarmonth” is used, the availability will refer to the specific calendar month (e.g. January 2023), and not to a rolling period of 30 days.
sampling_interval_seconds: Optional[int]: The sampling interval in seconds. If None, the default value is used, which is 600 seconds (10 minutes).
as_dict: bool: Whether to return the data as a dictionary.
headers: Optional[Union[Dict[str, str]]]: Headers to include in the request.
timeout: int: The timeout for the request.
threads: Optional[int]: The number of threads to use for the request.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.
method: str, optional: HTTP method to use for the backend request. Default is "GET".

Returns#

Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]]: The retrieved data.
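The granularity and sampling_interval_seconds parameters together imply how many samples a fully populated window would hold. The arithmetic below is only an illustration under that assumption, not the backend's algorithm; `expected_samples` and `availability` are hypothetical helpers, and "calendarmonth" is left out because its length depends on the month.

```python
from typing import Union

# Window lengths in seconds for the fixed-length string granularities.
GRANULARITY_SECONDS = {
    "day": 86_400,
    "week": 7 * 86_400,
    "30days": 30 * 86_400,
    "365days": 365 * 86_400,
}


def expected_samples(granularity: Union[str, int],
                     sampling_interval_seconds: int = 600) -> int:
    """Number of samples in one window if no sample is missing."""
    if isinstance(granularity, str):
        window = GRANULARITY_SECONDS[granularity]
    else:
        window = int(granularity)  # integer granularity is given in seconds
    return window // sampling_interval_seconds


def availability(observed: int, granularity: Union[str, int],
                 sampling_interval_seconds: int = 600) -> float:
    """Fraction of the expected samples that were actually observed."""
    return observed / expected_samples(granularity, sampling_interval_seconds)


per_day = expected_samples("day")    # 10-minute data over one day
per_hour = expected_samples(3600)    # 10-minute data over one hour
half_day = availability(72, "day")   # 72 observed samples in a day
```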

get_oldest_timestamp(sites: str | List[str], locations: List[str] | str | None, method: str = 'GET') → pandas.DataFrame#: Get oldest timestamp for one or multiple sites (sync).

Run get_stats for predefined intervals:

all_time: (datetime.min -> datetime.max)
last_year: (now-365d -> now)
last_month: (now-30d -> now)
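The three intervals listed above can be written out with the standard library. This is a sketch under stated assumptions: `predefined_intervals` is a hypothetical helper, naive datetimes are used so that `now` is comparable with `datetime.min` and `datetime.max`, and how the library itself handles time zones is not stated here.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

Interval = Tuple[datetime, datetime]


def predefined_intervals(now: Optional[datetime] = None) -> Dict[str, Interval]:
    """Build the (start, end) pairs for the three predefined intervals."""
    now = now or datetime.now()
    return {
        "all_time": (datetime.min, datetime.max),
        "last_year": (now - timedelta(days=365), now),
        "last_month": (now - timedelta(days=30), now),
    }


# A fixed reference time keeps the example reproducible.
reference = datetime(2024, 1, 31, 12, 0)
intervals = predefined_intervals(reference)
```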

class AsyncAPI#

Async version of the API class. Get data from the 24SEA API /datasignals endpoint asynchronously.

async get_metrics_overview() → pandas.DataFrame | None#: Asynchronously get metrics overview, authenticating if needed

async get_metrics(site: str | None = None, locations: str | List[str] | None = None, metrics: str | List[str] | None = None, headers: Dict[str, str] | None = None) → Any#: Get the metrics names for a site asynchronously.

async put_data(data: Any, site: str | None = None, location: str | None = None, on_conflict: str = 'replace', include_cyclecount: bool = True, profile: bool = False, headers: Dict[str, str] | None = None, timeout: int = 1800, max_retries: int = 0) → Dict[str, Any]#

Asynchronously insert or fully replace DataSignals rows.

Parameters are equivalent to API.put_data(). The coroutine returns the backend write result after the write task completes.

async patch_data(data: Any, site: str | None = None, location: str | None = None, on_conflict: str = 'replace', include_cyclecount: bool = True, profile: bool = False, headers: Dict[str, str] | None = None, timeout: int = 1800, max_retries: int = 0) → Dict[str, Any]#

Asynchronously merge metric keys into DataSignals rows.

Parameters are equivalent to API.patch_data(). The coroutine returns the backend write result after the write task completes.

Get the data signals from the 24SEA API asynchronously.

Asynchronous version of API.get_data(), with the same parameters and return type. The only difference is that this method is asynchronous and returns a coroutine, so it must be awaited to get the actual data. Moreover, in case of any request failure, instead of returning a DataFrame with the successfully retrieved data, it returns a list of error messages.
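Because the result may be either data or a list of error messages, callers typically need to check what came back after awaiting. The sketch below shows one way to do that with a stand-in coroutine; `fake_get_data` and `is_error_result` are hypothetical, no network access takes place, and the error text is invented.

```python
import asyncio
from typing import Any, Dict, List, Union


async def fake_get_data(fail: bool) -> Union[Dict[str, List[float]], List[str]]:
    """Hypothetical stand-in for an awaited data request."""
    await asyncio.sleep(0)  # yield control, as a real request would
    if fail:
        return ["request failed: HTTP 502"]
    return {"mean_WF_A01_windspeed": [7.5, 8.0]}


def is_error_result(result: Any) -> bool:
    """Treat a non-empty list of strings as a list of error messages."""
    return (
        isinstance(result, list)
        and bool(result)
        and all(isinstance(item, str) for item in result)
    )


async def main() -> tuple:
    # Both coroutines must be awaited before the results can be inspected.
    ok, failed = await asyncio.gather(fake_get_data(False), fake_get_data(True))
    return is_error_result(ok), is_error_result(failed)


flags = asyncio.run(main())
```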

Parameters#

sites: Optional[Union[List, str]]: The site name or List of site names. If None, the site will be inferred from the metrics.
locations: Optional[Union[List, str]]: The location name or List of location names. If None, the location will be inferred from the metrics.
metrics: Union[List, str]: The metric name or List of metric names. It must be provided. They do not have to be the entire metric name, but can be a part of it. For example, if the metric name is "mean_WF_A01_windspeed", the user can equivalently provide sites="wf", locations="a01", metrics="mean windspeed".
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the query. It must be in ISO 8601 format, e.g., "2021-01-01T00:00:00Z" or a datetime object.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the query. It must be in ISO 8601 format, e.g., "2021-01-01T00:00:00Z" or a datetime object.
as_dict: bool, optional: If True, the data will be returned as a list of dictionaries. Default is False.
as_star_schema: bool, optional: If True, the data will be returned in a star schema format. Default is False.
outer_join_on_timestamp: bool: If False, the data will be returned as a block-diagonal DataFrame that contains the site and location columns; the timestamp column will not contain unique values, since each timestamp is repeated for every site and location. If True, the data will be returned as a full DataFrame without the site and location columns, and the timestamp column will contain unique values.
headers: Optional[Union[Dict[str, str]]], optional: The headers to pass to the request. If None, the default headers will be used as {"accept": "application/json"}. Default is None.
data: pd.DataFrame: The DataFrame to update with the data signals. If None, a new DataFrame will be created. Default is None.
timeout: int, optional: The timeout for the request in seconds. Default is 3600.
threads: int, optional: The number of threads to use for the request. If None, it will be set to the number of CPU cores, which is also the default.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.
force_cache_miss: bool, optional: Whether to force a cache miss on the backend data endpoint. Default is False.
method: str, optional: Retrieval method, either "GET" or "POST". Default is "POST".

Returns#

Union[pd.DataFrame, Dict[str, Dict[str, pd.DataFrame]]]: Coroutine that returns either:

The DataFrame containing the data signals, or
A dictionary containing the data signals divided by location, or
A dictionary containing the data signals in star schema format.

async get_oldest_timestamp(sites: str | List[str], locations: List[str] | str | None, method: str = 'GET') → pandas.DataFrame#

Get oldest timestamp for one or multiple sites (async).

Parameters#

sites: Union[str, List[str]]: The site(s) to retrieve the oldest timestamp for.
locations: Optional[Union[List[str], str]]: The location(s) to retrieve the oldest timestamp for.

Returns#

pd.DataFrame: A DataFrame containing the oldest timestamp for the specified site(s) and location(s).

Get the metrics statistics (MAX, MIN, AVG) for the specified time range asynchronously.

Parameters#

sites: Optional[Union[List, str]]: The sites to filter the data.
locations: Optional[Union[List, str]]: The locations to filter the data.
metrics: Union[List, str]: The metrics to retrieve.
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the data retrieval.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the data retrieval.
as_dict: bool: Whether to return the data as a dictionary.
headers: Optional[Union[Dict[str, str]]]: Headers to include in the request.
timeout: int: The timeout for the request.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.

Returns#

Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]]: The retrieved data.

Get the metrics statistics (MAX, MIN, AVG) for the specified time range asynchronously.

Parameters#

sites: Optional[Union[List, str]]: The sites to filter the data.
locations: Optional[Union[List, str]]: The locations to filter the data.
metrics: Union[List, str]: The metrics to retrieve.
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the data retrieval.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the data retrieval.
as_dict: bool: Whether to return the data as a dictionary.
headers: Optional[Union[Dict[str, str]]]: Headers to include in the request.
timeout: int: The timeout for the request.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.

Returns#

Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]]: The retrieved data.

Get the metrics statistics (MAX, MIN, AVG) for the specified time range.

Parameters#

sites: Optional[Union[List, str]]: The sites to filter the data.
locations: Optional[Union[List, str]]: The locations to filter the data.
metrics: Union[List, str]: The metrics to retrieve.
start_timestamp: Union[str, datetime.datetime]: The start timestamp for the data retrieval.
end_timestamp: Union[str, datetime.datetime]: The end timestamp for the data retrieval.
granularity: Union[str, int]: The granularity of the data, can be a string, or an integer number of seconds. String values are restricted to “day”, “week”, “calendarmonth”, “30days”, or “365days”. If “calendarmonth” is used, the availability will refer to the specific calendar month (e.g. January 2023), and not to a rolling period of 30 days.
sampling_interval_seconds: Optional[int]: The sampling interval in seconds. If None, the default value is used, which is 600 seconds (10 minutes).
as_dict: bool: Whether to return the data as a dictionary.
headers: Optional[Union[Dict[str, str]]]: Headers to include in the request.
timeout: int: The timeout for the request.
threads: Optional[int]: The number of threads to use for the request.
location: Optional[Union[List, str]]: The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the locations parameter instead.

Returns#

Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]]: The retrieved data.

async get_stats_predefined_intervals(sites: List | str | None, locations: List | str | None, metrics: List | str, as_dict: bool = False, headers: Dict[str, str] | None = None, timeout: int = 3600, method: str = 'GET') → Dict[str, Any]#: Async version of get_stats_predefined_intervals.

to_category_value(data: pandas.DataFrame | Dict[str, Dict[str, pandas.DataFrame]], metrics_overview: pandas.DataFrame, keep_stat_in_metric_name: bool = False) → pandas.DataFrame#

Categorize the data based on the metrics overview.

Parameters#

data: Union[pd.DataFrame, Dict[str, Dict[str, pd.DataFrame]]]: The data to be categorized. It can be either a DataFrame or a dictionary of DataFrames.
metrics_overview: pd.DataFrame: A DataFrame containing the information about the metrics.
keep_stat_in_metric_name: bool, optional: Whether to keep the statistic in the metric name, by default False.

Returns#

Union[pd.DataFrame, Dict[str, Dict[str, pd.DataFrame]]]: The data in category-value format, based on the metrics overview.

Notes#

The function performs the following steps:

1. Transforms the data dictionary into a DataFrame if necessary.
2. Resets the index and converts the timestamp column to datetime.
3. Melts the data to long format.
4. Merges the melted data with the metrics overview DataFrame.
5. Renames columns for consistency.
6. Extracts latitude and heading information from the metric names.
7. Extracts sub-assembly information from the metric names.
8. Reorders the columns.
9. Optionally appends the statistic to the metric name.
10. Drops the rows where the metric name is “index”, “site” or “location”.
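The melt, merge, and extraction steps can be sketched with plain pandas. This is not the function's actual implementation: the column names mirror the example output on this page, and the `LAT(\d+)` and `DEG(\d+)` patterns are assumptions inferred from metric names such as TP_SG_LAT005_DEG000.

```python
import pandas as pd

wide = pd.DataFrame({
    "timestamp": ["2021-01-01", "2021-01-02"],
    "mean_WF_A01_TP_SG_LAT005_DEG000": [1.0, 1.1],
    "mean_WF_A02_TP_SG_LAT005_DEG000": [2.0, 2.1],
})
overview = pd.DataFrame({
    "metric": ["mean_WF_A01_TP_SG_LAT005_DEG000",
               "mean_WF_A02_TP_SG_LAT005_DEG000"],
    "unit": ["unit", "unit"],
    "location_id": ["A01", "A02"],
})

# Convert the timestamp column, then melt from wide to long format.
wide["timestamp"] = pd.to_datetime(wide["timestamp"])
long_df = wide.melt(id_vars="timestamp",
                    var_name="full_metric_name",
                    value_name="value")

# Attach the metadata of each metric from the overview.
long_df = long_df.merge(overview, left_on="full_metric_name",
                        right_on="metric", how="left").drop(columns="metric")

# Pull latitude and heading out of the metric name (assumed patterns).
names = long_df["full_metric_name"]
long_df["lat"] = names.str.extract(r"LAT(\d+)", expand=False).astype(float)
long_df["heading"] = names.str.extract(r"DEG(\d+)", expand=False).astype(float)
```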

Example#

>>> import pandas as pd
>>> from api_24sea.datasignals.core import to_category_value
>>> data = {
...     'timestamp': ['2021-01-01', '2021-01-02'],
...     'mean_WF_A01_TP_SG_LAT005_DEG000': [1.0, 1.1],
...     'mean_WF_A02_TP_SG_LAT005_DEG000': [2.0, 2.1]
... }
>>> metrics_overview = pd.DataFrame({
...     'metric': ['mean_WF_A01_TP_SG_LAT005_DEG000',
...                'mean_WF_A02_TP_SG_LAT005_DEG000'],
...     'short_hand': ['TP_SG_LAT005_DEG000', 'TP_SG_LAT005_DEG000'],
...     'statistic': ['mean', 'mean'],
...     'unit': ['unit', 'unit'],
...     'site': ['WindFarm', 'WindFarm'],
...     'location': ['WFA01', 'WFA02'],
...     'data_group': ['SG', 'SG'],
...     'site_id': ['WF', 'WF'],
...     'location_id': ['A01', 'A02']
... })
>>> categorized = to_category_value(data, metrics_overview)
>>> categorized
+------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+
| timestamp  | full_metric_name               | value | unit | statistic | short_hand          | site_id | location_id | lat | heading | site      | location | metric_group |
+============+================================+=======+======+===========+=====================+=========+=============+=====+=========+===========+==========+==============+
| 2021-01-01 | mean_WF_A01_TP_SG_LAT005_DEG000| 1.0   | unit | mean      | TP_SG_LAT005_DEG000 | WF      | A01         | 5.0 | 0.0     | WindFarm  | WFA01    | SG           |
+------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+
| 2021-01-02 | mean_WF_A01_TP_SG_LAT005_DEG000| 1.1   | unit | mean      | TP_SG_LAT005_DEG000 | WF      | A01         | 5.0 | 0.0     | WindFarm  | WFA01    | SG           |
+------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+
| 2021-01-01 | mean_WF_A02_TP_SG_LAT005_DEG000| 2.0   | unit | mean      | TP_SG_LAT005_DEG000 | WF      | A02         | 5.0 | 0.0     | WindFarm  | WFA02    | SG           |
+------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+
| 2021-01-02 | mean_WF_A02_TP_SG_LAT005_DEG000| 2.1   | unit | mean      | TP_SG_LAT005_DEG000 | WF      | A02         | 5.0 | 0.0     | WindFarm  | WFA02    | SG           |
+------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+

to_star_schema(data: pandas.DataFrame | Dict[str, List[Dict[str, Any]]], metrics_overview: pandas.DataFrame | None = None, as_dict: bool = False, convert_object_columns_to_string: bool = False, _username: str | None = None, _password: str | None = None) → Dict[str, Any] | pandas.DataFrame | None#

Transforms the data and metrics_overview into a star schema format for analytical purposes.

Parameters#

data: Union[pd.DataFrame, Dict[str, list[dict[str, Any]]]]: A DataFrame or dictionary representing the raw data. The keys are column names, and the values are lists of data. Must include a “timestamp” column or have indices that can be converted to timestamps.
metrics_overview: pd.DataFrame: A DataFrame containing metadata for metrics, including the following required columns:

- ‘metric’: The metric names (must match column names in data).
- ‘short_hand’: Short descriptive names for the metrics.
- ‘description’: Detailed descriptions of the metrics.
- ‘statistic’: Aggregation or statistical operation (e.g., mean, std).
- ‘unit_str’: The units for the metrics.
- ‘location’: Location identifiers.
- ‘site’: Windfarm identifiers.
- ‘data_group’: Grouping of data (e.g., “scada”).

as_dict: bool, optional: If True, the data will be returned as a dictionary. Default is False.
convert_object_columns_to_string: bool, optional: If True, convert object columns in the DataFrame to string. This feature is useful if importing the DataFrame within a database so that the ‘value’ column can be stored as a float, since the non-float values will be stored as NULL. Default is False.
_username: Optional[str]: The username for authentication. If None, the username will be inferred from the environment variables.
_password: Optional[str]: The password for authentication. If None, the password will be inferred from the environment variables.

Returns#

dict[str, pd.DataFrame]

A dictionary containing the following tables:

‘FactData’: The fact table linking metrics to timestamps, locations, metric IDs, and their values as columns.
‘FactPivotData’: The fact table in pivot format, i.e. containing timestamp, location, and “statistic” + “short_hand” metric names as columns. This pivoted format is the one generally used by BI tools and databases such as InfluxDB.
‘DimMetric’: Dimension table for metrics, including metric ID, short name, description, statistic, and unit.
‘DimWindFarm’: Dimension table for wind farms, including locations and sites.
‘DimCalendar’: Dimension table for time, including date parts (year, month, day, hour, minute).
‘DimDataGroup’: Dimension table for data groups.
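The split between a fact table and a metric dimension can be shown with plain Python structures. This dependency-free sketch only illustrates the idea: the surrogate `metric_id` numbering and the exact column names are assumptions, and the tables actually produced by to_star_schema() may differ.

```python
from typing import Any, Dict, List

# Wide input: one row per timestamp, one column per metric.
wide_rows: List[Dict[str, Any]] = [
    {"timestamp": "2020-01-01T00:00:00Z",
     "mean_WF_A01_winddirection": 257.445,
     "std_WF_A01_windspeed": 1.5165},
    {"timestamp": "2020-01-01T00:10:00Z",
     "mean_WF_A01_winddirection": 262.03,
     "std_WF_A01_windspeed": 1.7966},
]
overview: List[Dict[str, str]] = [
    {"metric": "mean_WF_A01_winddirection", "short_hand": "winddirection",
     "statistic": "mean", "unit_str": "°"},
    {"metric": "std_WF_A01_windspeed", "short_hand": "windspeed",
     "statistic": "std", "unit_str": "m/s"},
]

# Dimension table: one row per metric, keyed by an assumed surrogate id.
dim_metric = [{"metric_id": i, **row} for i, row in enumerate(overview, start=1)]
metric_ids = {row["metric"]: row["metric_id"] for row in dim_metric}

# Fact table: one row per (timestamp, metric) pair, referencing the dimension.
fact_data = [
    {"timestamp": row["timestamp"], "metric_id": metric_ids[name], "value": value}
    for row in wide_rows
    for name, value in row.items()
    if name != "timestamp"
]
```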

Raises#

ValueError: If required columns are missing in data or metrics_overview.
KeyError: If the metric column in metrics_overview contains values not present in data.

Example#

>>> import pandas as pd
>>> from api_24sea.datasignals.core import to_star_schema
>>> data = {
...     'timestamp': ['2020-01-01T00:00:00Z', '2020-01-01T00:10:00Z'],
...     'mean_WF_A01_winddirection': [257.445, 262.03],
...     'std_WF_A01_windspeed': [1.5165, 1.7966]
... }
>>> metrics_overview = pd.DataFrame({
...     'metric': ['mean_WF_A01_winddirection', 'std_WF_A01_windspeed'],
...     'short_hand': ['winddirection', 'windspeed'],
...     'description': ['Wind direction', 'Wind speed'],
...     'statistic': ['mean', 'std'],
...     'unit_str': ['°', 'm/s'],
...     'location': ['WFA01', 'WFMA4'],
...     'site': ['windfarm', 'windfarm'],
...     'data_group': ['scada', 'scada']
... })
>>> result = to_star_schema(data, metrics_overview)
>>> for key, df in result.items():
...     print(f"{key}: {df.to_markdown()}")