api_24sea.datasignals.core ========================== .. py:module:: api_24sea.datasignals.core .. autoapi-nested-parse:: The core module for the api_24sea.datasignals subpackage Classes ------- .. autoapisummary:: api_24sea.datasignals.core.API api_24sea.datasignals.core.AsyncAPI Functions --------- .. autoapisummary:: api_24sea.datasignals.core.to_category_value api_24sea.datasignals.core.to_star_schema Module Contents --------------- .. py:class:: API Accessor for working with data signals coming from the 24SEA API. .. py:property:: authenticated :type: bool Whether the client is authenticated. .. py:property:: metrics_overview :type: Optional[pandas.DataFrame] Get the metrics overview DataFrame. .. py:method:: authenticate(username: str, password: str, permissions_overview: Optional[pandas.DataFrame] = None) Authenticate with a username and password. .. py:method:: get_metrics(site: Optional[str] = None, locations: Optional[Union[str, List[str]]] = None, metrics: Optional[Union[str, List[str]]] = None, headers: Optional[Dict[str, str]] = None) -> Optional[List[Dict[str, Optional[str]]]] Get the metric names for a site, filtered by the following parameters. Parameters ---------- site : Optional[str] The site name. If None, the queryable metrics for all sites will be returned, and the locations and metrics parameters will be ignored. locations : Optional[Union[str, List[str]]] The locations for which to get the metrics. If None, all locations will be considered. metrics : Optional[Union[str, List[str]]] The metrics to get. They can be specified as regular expressions. If None, all metrics will be considered. For example: * | ``metrics=["^ACC", "^DEM"]`` will return all the metrics that | start with ACC or DEM, * Similarly, ``metrics=["windspeed$", "winddirection$"]`` will | return all the metrics that end with windspeed or | winddirection, * and ``metrics=[".*WF_A01.*",".*WF_A02.*"]`` will return all | metrics that contain WF_A01 or WF_A02.
Returns ------- Optional[List[Dict[str, Optional[str]]]] The metric names for the given site, locations and metrics. .. note:: This class method is legacy because it does not add functionality to the DataSignals pandas accessor. .. py:method:: selected_metrics(data: pandas.DataFrame) -> pandas.DataFrame Return the selected metrics for the query. .. py:method:: get_data(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], as_dict: bool = False, as_star_schema: bool = False, outer_join_on_timestamp: bool = True, headers: Optional[Dict[str, str]] = None, data: Optional[pandas.DataFrame] = None, timeout: int = 3600, threads: Optional[int] = None, location: Optional[Union[List, str]] = None, force_cache_miss: bool = False, method: str = 'POST') -> Optional[Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]], List[Union[Any, str]]]] Get the data signals from the 24SEA API. Parameters ---------- sites : Optional[Union[List, str]] The site name or List of site names. If None, the site will be inferred from the metrics. locations : Optional[Union[List, str]] The location name or List of location names. If None, the location will be inferred from the metrics. metrics : Union[List, str] The metric name or List of metric names. This parameter is required. A value does not have to be the entire metric name; a part of it is enough. For example, if the metric name is ``"mean_WF_A01_windspeed"``, the user can equivalently provide ``sites="wf"``, ``locations="a01"``, ``metrics="mean windspeed"``. start_timestamp : Union[str, datetime.datetime] The start timestamp for the query. It must be in ISO 8601 format, e.g., ``"2021-01-01T00:00:00Z"`` or a datetime object. end_timestamp : Union[str, datetime.datetime] The end timestamp for the query.
It must be in ISO 8601 format, e.g., ``"2021-01-01T00:00:00Z"`` or a datetime object. as_dict : bool, optional If True, the data will be returned as a list of dictionaries. Default is False. as_star_schema : bool, optional If True, the data will be returned in a star schema format. Default is False. outer_join_on_timestamp : bool If False, the data will be returned as a block-diagonal DataFrame, and it will contain the site and location columns. In that case the timestamp column will not contain unique values, since each timestamp is repeated for each site and location. If True, the data will be returned as a full DataFrame, it will not contain the site and location columns, and the timestamp column will contain unique values. Default is True. headers : Optional[Union[Dict[str, str]]], optional The headers to pass to the request. If None, the default headers will be used as ``{"accept": "application/json"}``. Default is None. data : pd.DataFrame The DataFrame to update with the data signals. If None, a new DataFrame will be created. Default is None. timeout : int, optional The timeout for the request in seconds. Default is 3600. threads : int, optional The number of threads to use for the request. Default is None, in which case it will be set to the number of CPU cores. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead. force_cache_miss : bool, optional Whether to force a cache miss on the backend data endpoint. Default is False. method : str, optional HTTP method to use for the backend request. Default is ``"POST"``. Returns ------- Union[pd.DataFrame, Dict[str, Dict[str, pd.DataFrame]]] - The DataFrame containing the data signals, or - A dictionary containing the data signals divided by location, or - A dictionary containing the data signals in star schema format. ..
py:method:: get_stats(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, threads: Optional[int] = None, location: Optional[Union[List, str]] = None, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]]] Get the metrics statistics (MAX, MIN, AVG) for the specified time range. Parameters ---------- sites: Optional[Union[List, str]] The sites to filter the data. locations: Optional[Union[List, str]] The locations to filter the data. metrics: Union[List, str] The metrics to retrieve. start_timestamp: Union[str, datetime.datetime] The start timestamp for the data retrieval. end_timestamp: Union[str, datetime.datetime] The end timestamp for the data retrieval. as_dict: bool Whether to return the data as a dictionary. headers: Optional[Union[Dict[str, str]]] Headers to include in the request. timeout: int The timeout for the request. threads: Optional[int] The number of threads to use for the request. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead. method : str, optional HTTP method to use for the backend request. Default is ``"GET"``. Returns ------- Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]] The retrieved data. .. 
py:method:: get_null_timestamps(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, threads: Optional[int] = None, location: Optional[Union[List, str]] = None, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]]] Get the list of timestamps at which the selected metrics have null values in the specified time range. Parameters ---------- sites: Optional[Union[List, str]] The sites to filter the data. locations: Optional[Union[List, str]] The locations to filter the data. metrics: Union[List, str] The metrics to retrieve. start_timestamp: Union[str, datetime.datetime] The start timestamp for the data retrieval. end_timestamp: Union[str, datetime.datetime] The end timestamp for the data retrieval. as_dict: bool Whether to return the data as a dictionary. headers: Optional[Union[Dict[str, str]]] Headers to include in the request. timeout: int The timeout for the request. threads: Optional[int] The number of threads to use for the request. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead. method : str, optional HTTP method to use for the backend request. Default is ``"GET"``. Returns ------- Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]] The retrieved data. ..
py:method:: get_availability(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], granularity: Union[str, int], sampling_interval_seconds: Optional[int] = None, as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, threads: Optional[int] = None, location: Optional[Union[List, str]] = None, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]]] Get the availability of the selected metrics over the specified time range, at the given granularity. Parameters ---------- sites: Optional[Union[List, str]] The sites to filter the data. locations: Optional[Union[List, str]] The locations to filter the data. metrics: Union[List, str] The metrics to retrieve. start_timestamp: Union[str, datetime.datetime] The start timestamp for the data retrieval. end_timestamp: Union[str, datetime.datetime] The end timestamp for the data retrieval. granularity: Union[str, int] The granularity of the data, given either as a string or as an integer number of seconds. String values are restricted to "day", "week", "calendarmonth", "30days", or "365days". If "calendarmonth" is used, the availability will refer to the specific calendar month (e.g. January 2023), and not to a rolling period of 30 days. sampling_interval_seconds: Optional[int] The sampling interval in seconds. If None, the default value is used, which is 600 seconds (10 minutes). as_dict: bool Whether to return the data as a dictionary. headers: Optional[Union[Dict[str, str]]] Headers to include in the request. timeout: int The timeout for the request. threads: Optional[int] The number of threads to use for the request. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead.
method : str, optional HTTP method to use for the backend request. Default is ``"GET"``. Returns ------- Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]] The retrieved data. .. py:method:: get_oldest_timestamp(sites: Union[str, List[str]], locations: Optional[Union[List[str], str]], method: str = 'GET') -> pandas.DataFrame Get the oldest timestamp for one or multiple sites (sync). .. py:method:: get_stats_predefined_intervals(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, threads: Optional[int] = None, method: str = 'GET') -> Dict[str, Any] Run get_stats for predefined intervals: - all_time: (datetime.min -> datetime.max) - last_year: (now-365d -> now) - last_month: (now-30d -> now) .. py:class:: AsyncAPI Async version of the API class. Gets data from the 24SEA API /datasignals endpoint asynchronously. .. py:method:: get_metrics_overview() -> Optional[pandas.DataFrame] :async: Asynchronously get the metrics overview, authenticating if needed. .. py:method:: get_metrics(site: Optional[str] = None, locations: Optional[Union[str, List[str]]] = None, metrics: Optional[Union[str, List[str]]] = None, headers: Optional[Dict[str, str]] = None) -> Any :async: Get the metric names for a site asynchronously. ..
py:method:: get_data(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], as_dict: bool = False, as_star_schema: bool = False, outer_join_on_timestamp: bool = True, headers: Optional[Dict[str, str]] = None, data: Optional[pandas.DataFrame] = None, max_retries: int = 5, timeout: int = 1800, location: Optional[Union[List, str]] = None, force_cache_miss: bool = False, method: str = 'POST') -> Optional[Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]], List[Union[Any, str]]]] :async: Get the data signals from the 24SEA API asynchronously. Asynchronous counterpart of :py:meth:`API.get_data`, with the same return type. This method returns a coroutine, so it must be awaited to get the actual data. Moreover, if any request fails, it returns a list of error messages instead of a DataFrame with the successfully retrieved data. Parameters ---------- sites : Optional[Union[List, str]] The site name or List of site names. If None, the site will be inferred from the metrics. locations : Optional[Union[List, str]] The location name or List of location names. If None, the location will be inferred from the metrics. metrics : Union[List, str] The metric name or List of metric names. This parameter is required. A value does not have to be the entire metric name; a part of it is enough. For example, if the metric name is ``"mean_WF_A01_windspeed"``, the user can equivalently provide ``sites="wf"``, ``locations="a01"``, ``metrics="mean windspeed"``. start_timestamp : Union[str, datetime.datetime] The start timestamp for the query. It must be in ISO 8601 format, e.g., ``"2021-01-01T00:00:00Z"`` or a datetime object. end_timestamp : Union[str, datetime.datetime] The end timestamp for the query.
It must be in ISO 8601 format, e.g., ``"2021-01-01T00:00:00Z"`` or a datetime object. as_dict : bool, optional If True, the data will be returned as a list of dictionaries. Default is False. as_star_schema : bool, optional If True, the data will be returned in a star schema format. Default is False. outer_join_on_timestamp : bool If False, the data will be returned as a block-diagonal DataFrame, and it will contain the site and location columns. In that case the timestamp column will not contain unique values, since each timestamp is repeated for each site and location. If True, the data will be returned as a full DataFrame, it will not contain the site and location columns, and the timestamp column will contain unique values. Default is True. headers : Optional[Union[Dict[str, str]]], optional The headers to pass to the request. If None, the default headers will be used as ``{"accept": "application/json"}``. Default is None. data : pd.DataFrame The DataFrame to update with the data signals. If None, a new DataFrame will be created. Default is None. max_retries : int, optional The maximum number of retries for the request. Default is 5. timeout : int, optional The timeout for the request in seconds. Default is 1800. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead. force_cache_miss : bool, optional Whether to force a cache miss on the backend data endpoint. Default is False. method : str, optional HTTP method to use for the backend request. Default is ``"POST"``. Returns ------- Union[pd.DataFrame, Dict[str, Dict[str, pd.DataFrame]]] Coroutine that returns either: - The DataFrame containing the data signals, or - A dictionary containing the data signals divided by location, or - A dictionary containing the data signals in star schema format. ..
py:method:: get_oldest_timestamp(sites: Union[str, List[str]], locations: Optional[Union[List[str], str]], method: str = 'GET') -> pandas.DataFrame :async: Get the oldest timestamp for one or multiple sites (async). Parameters ---------- sites: Union[str, List[str]] The site(s) to retrieve the oldest timestamp for. locations: Optional[Union[List[str], str]] The location(s) to retrieve the oldest timestamp for. Returns ------- pd.DataFrame A DataFrame containing the oldest timestamp for the specified site(s) and location(s). .. py:method:: get_stats(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, location: Optional[Union[List, str]] = None, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]]] :async: Get the metrics statistics (MAX, MIN, AVG) for the specified time range asynchronously. Parameters ---------- sites: Optional[Union[List, str]] The sites to filter the data. locations: Optional[Union[List, str]] The locations to filter the data. metrics: Union[List, str] The metrics to retrieve. start_timestamp: Union[str, datetime.datetime] The start timestamp for the data retrieval. end_timestamp: Union[str, datetime.datetime] The end timestamp for the data retrieval. as_dict: bool Whether to return the data as a dictionary. headers: Optional[Union[Dict[str, str]]] Headers to include in the request. timeout: int The timeout for the request. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead. Returns ------- Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]] The retrieved data. ..
py:method:: get_null_timestamps(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, location: Optional[Union[List, str]] = None, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]]] :async: Get the list of timestamps at which the selected metrics have null values in the specified time range, asynchronously. Parameters ---------- sites: Optional[Union[List, str]] The sites to filter the data. locations: Optional[Union[List, str]] The locations to filter the data. metrics: Union[List, str] The metrics to retrieve. start_timestamp: Union[str, datetime.datetime] The start timestamp for the data retrieval. end_timestamp: Union[str, datetime.datetime] The end timestamp for the data retrieval. as_dict: bool Whether to return the data as a dictionary. headers: Optional[Union[Dict[str, str]]] Headers to include in the request. timeout: int The timeout for the request. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead. Returns ------- Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]] The retrieved data. ..
py:method:: get_availability(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], granularity: Union[str, int], sampling_interval_seconds: Optional[int] = None, as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, location: Optional[Union[List, str]] = None, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Union[Dict[str, pandas.DataFrame], Dict[str, Any]]]] :async: Get the availability of the selected metrics over the specified time range, at the given granularity, asynchronously. Parameters ---------- sites: Optional[Union[List, str]] The sites to filter the data. locations: Optional[Union[List, str]] The locations to filter the data. metrics: Union[List, str] The metrics to retrieve. start_timestamp: Union[str, datetime.datetime] The start timestamp for the data retrieval. end_timestamp: Union[str, datetime.datetime] The end timestamp for the data retrieval. granularity: Union[str, int] The granularity of the data, given either as a string or as an integer number of seconds. String values are restricted to "day", "week", "calendarmonth", "30days", or "365days". If "calendarmonth" is used, the availability will refer to the specific calendar month (e.g. January 2023), and not to a rolling period of 30 days. sampling_interval_seconds: Optional[int] The sampling interval in seconds. If None, the default value is used, which is 600 seconds (10 minutes). as_dict: bool Whether to return the data as a dictionary. headers: Optional[Union[Dict[str, str]]] Headers to include in the request. timeout: int The timeout for the request. location: Optional[Union[List, str]] The location name or List of location names. This is a legacy parameter, and it is deprecated. Please use the `locations` parameter instead.
Returns ------- Union[DataFrame, Dict[str, Union[Dict[str, DataFrame], Dict[str, Any]]]] The retrieved data. .. py:method:: get_stats_predefined_intervals(sites: Optional[Union[List, str]], locations: Optional[Union[List, str]], metrics: Union[List, str], as_dict: bool = False, headers: Optional[Dict[str, str]] = None, timeout: int = 3600, method: str = 'GET') -> Dict[str, Any] :async: Async version of get_stats_predefined_intervals. .. py:function:: to_category_value(data: Union[pandas.DataFrame, Dict[str, Dict[str, pandas.DataFrame]]], metrics_overview: pandas.DataFrame, keep_stat_in_metric_name: bool = False) -> pandas.DataFrame Categorize the data based on the metrics overview. Parameters ---------- data : Union[pd.DataFrame, Dict[str, Dict[str, pd.DataFrame]]] The data to be categorized. It can be either a DataFrame or a dictionary of DataFrames. metrics_overview : pd.DataFrame A DataFrame containing the information about the metrics. keep_stat_in_metric_name : bool, optional Whether to keep the statistic in the metric name, by default False. Returns ------- pd.DataFrame The data in category-value format, based on the metrics overview. Notes ----- The function performs the following steps: 1. Transforms the data dictionary into a DataFrame if necessary. 2. Resets the index and converts the timestamp column to datetime. 3. Melts the data to long format. 4. Merges the melted data with the metrics overview DataFrame. 5. Renames columns for consistency. 6. Extracts latitude and heading information from the metric names. 7. Extracts sub-assembly information from the metric names. 8. Reorders the columns. 9. Optionally appends the statistic to the metric name. 10. Drops the rows where the metric name is "index", "site" or "location". Example ------- >>> import pandas as pd >>> from typing import Union, Dict >>> data = { ... 'timestamp': ['2021-01-01', '2021-01-02'], ... 'mean_WF_A01_TP_SG_LAT005_DEG000': [1.0, 1.1], ...
'mean_WF_A02_TP_SG_LAT005_DEG000': [2.0, 2.1] ... } >>> metrics_overview = pd.DataFrame({ ... 'metric': ['mean_WF_A01_TP_SG_LAT005_DEG000', ... 'mean_WF_A02_TP_SG_LAT005_DEG000'], ... 'short_hand': ['TP_SG_LAT005_DEG000', 'TP_SG_LAT005_DEG000'], ... 'statistic': ['mean', 'mean'], ... 'unit': ['unit', 'unit'], ... 'site': ['WindFarm', 'WindFarm'], ... 'location': ['WFA01', 'WFA02'], ... 'data_group': ['SG', 'SG'], ... 'site_id': ['WF', 'WF'], ... 'location_id': ['A01', 'A02'] ... }) >>> categorized = to_category_value(data, metrics_overview) >>> categorized +------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+ | timestamp | full_metric_name | value | unit | statistic | short_hand | site_id | location_id | lat | heading | site | location | metric_group | +============+================================+=======+======+===========+=====================+=========+=============+=====+=========+===========+==========+==============+ | 2021-01-01 | mean_WF_A01_TP_SG_LAT005_DEG000| 1.0 | unit | mean | TP_SG_LAT005_DEG000 | WF | A01 | 5.0 | 0.0 | WindFarm | WFA01 | SG | +------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+ | 2021-01-02 | mean_WF_A01_TP_SG_LAT005_DEG000| 1.1 | unit | mean | TP_SG_LAT005_DEG000 | WF | A01 | 5.0 | 0.0 | WindFarm | WFA01 | SG | +------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+ | 2021-01-01 | mean_WF_A02_TP_SG_LAT005_DEG000| 2.0 | unit | mean | TP_SG_LAT005_DEG000 | WF | A02 | 5.0 | 0.0 | WindFarm | WFA02 | SG | +------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+ | 
2021-01-02 | mean_WF_A02_TP_SG_LAT005_DEG000| 2.1 | unit | mean | TP_SG_LAT005_DEG000 | WF | A02 | 5.0 | 0.0 | WindFarm | WFA02 | SG | +------------+--------------------------------+-------+------+-----------+---------------------+---------+-------------+-----+---------+-----------+----------+--------------+ .. py:function:: to_star_schema(data: Union[pandas.DataFrame, Dict[str, List[Dict[str, Any]]]], metrics_overview: Optional[pandas.DataFrame] = None, as_dict: bool = False, convert_object_columns_to_string: bool = False, _username: Optional[str] = None, _password: Optional[str] = None) -> Optional[Union[Dict[str, Any], pandas.DataFrame]] Transform the data and metrics_overview into a star schema format for analytical purposes. Parameters ---------- data : Union[pd.DataFrame, Dict[str, list[dict[str, Any]]]] A DataFrame or dictionary representing the raw data. The keys are column names, and the values are lists of data. Must include a "timestamp" column or have indices that can be converted to timestamps. metrics_overview : pd.DataFrame A DataFrame containing metadata for metrics, including the following required columns: - | 'metric': The metric names (must match column names in `data`). - | 'short_hand': Short descriptive names for the metrics. - | 'description': Detailed descriptions of the metrics. - | 'statistic': Aggregation or statistical operation (e.g., mean, | std). - | 'unit_str': The units for the metrics. - | 'location': Location identifiers. - | 'site': Windfarm identifiers. - | 'data_group': Grouping of data (e.g., "scada"). as_dict : bool, optional If True, the data will be returned as a dictionary. Default is False. convert_object_columns_to_string : bool, optional If True, convert object columns in the DataFrame to string. This is useful when importing the DataFrame into a database, so that the 'value' column can be stored as a float, since the non-float values will be stored as NULL. Default is False.
_username : Optional[str] The username for authentication. If None, the username will be inferred from the environment variables. _password : Optional[str] The password for authentication. If None, the password will be inferred from the environment variables. Returns ------- dict[str, pd.DataFrame] A dictionary containing the following tables: - | 'FactData': The fact table linking metrics to timestamps, | locations, metric IDs, and their values as columns. - | 'FactPivotData': The fact table in pivot format, i.e. containing | timestamp, location, and "statistic" + "short_hand" metric names | as columns. This pivoted format is the one generally used by | BI tools and databases such as InfluxDB. - | 'DimMetric': Dimension table for metrics, including metric ID, | short name, description, statistic, and unit. - | 'DimWindFarm': Dimension table for wind farms, including | locations and sites. - | 'DimCalendar': Dimension table for time, including date parts | (year, month, day, hour, minute). - | 'DimDataGroup': Dimension table for data groups. Raises ------ ValueError If required columns are missing in `data` or `metrics_overview`. KeyError If the `metric` column in `metrics_overview` contains values not present in `data`. Example ------- >>> import pandas as pd >>> data = { ... 'timestamp': ['2020-01-01T00:00:00Z', '2020-01-01T00:10:00Z'], ... 'mean_WF_A01_winddirection': [257.445, 262.03], ... 'std_WF_A01_windspeed': [1.5165, 1.7966] ... } >>> metrics_overview = pd.DataFrame({ ... 'metric': ['mean_WF_A01_winddirection', 'std_WF_A01_windspeed'], ... 'short_hand': ['winddirection', 'windspeed'], ... 'description': ['Wind direction', 'Wind speed'], ... 'statistic': ['mean', 'std'], ... 'unit_str': ['°', 'm/s'], ... 'location': ['WFA01', 'WFMA4'], ... 'site': ['windfarm', 'windfarm'], ... 'data_group': ['scada', 'scada'] ... }) >>> result = to_star_schema(data, metrics_overview) >>> for key, df in result.items(): ... print(f"{key}: {df.to_markdown()}")
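The relationship between a fact table and a metric dimension table, as described for ``to_star_schema``, can be illustrated with plain pandas. The following is a minimal sketch under stated assumptions, not the library's implementation: the ``metric_id`` surrogate key, the column layout of ``fact``, and the variable names are hypothetical and chosen only for illustration; the actual ``FactData`` and ``DimMetric`` tables may be organised differently.

```python
import pandas as pd

# Wide input: one column per metric (values mirror the example above).
wide = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-01-01T00:00:00Z", "2020-01-01T00:10:00Z"]
    ),
    "mean_WF_A01_winddirection": [257.445, 262.03],
    "std_WF_A01_windspeed": [1.5165, 1.7966],
})

# Hypothetical dimension table: one row per metric plus an integer key.
dim_metric = pd.DataFrame({
    "metric": ["mean_WF_A01_winddirection", "std_WF_A01_windspeed"],
    "short_hand": ["winddirection", "windspeed"],
    "statistic": ["mean", "std"],
})
dim_metric.insert(0, "metric_id", list(range(1, len(dim_metric) + 1)))

# Melt the wide data to long format: one row per (timestamp, metric).
long_df = wide.melt(id_vars="timestamp", var_name="metric", value_name="value")

# Hypothetical fact table: replace the metric name with its key.
fact = long_df.merge(dim_metric[["metric_id", "metric"]], on="metric", how="left")
fact = fact[["timestamp", "metric_id", "value"]]

print(fact)
```

Keeping descriptive attributes (short name, statistic) only in the dimension table means the fact table stores one small integer per row instead of repeating the full metric metadata, which is the usual motivation for a star schema.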