api_24sea.utils
===============

.. py:module:: api_24sea.utils

.. autoapi-nested-parse::

   Utility functions and classes.


Functions
---------

.. autoapisummary::

   api_24sea.utils.is_executable_ipython
   api_24sea.utils.run_tqdm
   api_24sea.utils.normalize_http_method
   api_24sea.utils.build_json_request_payload
   api_24sea.utils.build_httpx_request_kwargs
   api_24sea.utils.handle_request
   api_24sea.utils.handle_request_async
   api_24sea.utils.default_to_regular_dict
   api_24sea.utils.flatten_api_response
   api_24sea.utils.normalize_group_values
   api_24sea.utils.require_auth
   api_24sea.utils.parse_timestamp
   api_24sea.utils.estimate_chunk_size
   api_24sea.utils.fetch_data_sync
   api_24sea.utils.fetch_availability_sync
   api_24sea.utils.fetch_availability_async
   api_24sea.utils.fetch_data_async
   api_24sea.utils.fetch_oldest_timestamp_sync
   api_24sea.utils.set_threads_nr
   api_24sea.utils.parse_stats_list
   api_24sea.utils.get_stats_overview_info
   api_24sea.utils.get_stats_as_dict
   api_24sea.utils.get_metrics_data_df_as_dict
   api_24sea.utils.series_to_type
   api_24sea.utils.column_to_type
   api_24sea.utils.calendar_monthly_availability


Module Contents
---------------

.. py:function:: is_executable_ipython() -> bool

   Return True when running under IPython-based shells.

   Returns
   -------
   bool
       True for IPython or Jupyter shells, False for standard Python.

   Examples
   --------
   >>> isinstance(is_executable_ipython(), bool)
   True
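
   IPython makes the ``get_ipython`` name available in its interactive
   sessions, so looking that name up is a common detection technique. The
   sketch below assumes ``is_executable_ipython`` works along these lines;
   ``is_ipython_sketch`` is a hypothetical name, not part of ``api_24sea``.

   .. code-block:: python

      def is_ipython_sketch() -> bool:
          """Return True when ``get_ipython`` is defined (IPython or Jupyter)."""
          try:
              shell = get_ipython()  # noqa: F821 (only defined under IPython)
          except NameError:
              # Plain ``python``: the name does not exist at all.
              return False
          return shell is not None


      print(is_ipython_sketch())  # False under a plain Python interpreter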


.. py:function:: run_tqdm(iterable: Iterable, *args: Any, exec_env: bool, **kwargs: Any)

   Select the appropriate tqdm variant for the active environment.

   Parameters
   ----------
   iterable : Iterable
       Items to iterate over.
   args : Any
       Positional arguments forwarded to ``tqdm``.
   exec_env : bool
       True when the notebook-aware progress bar should be used.
   kwargs : Any
       Keyword arguments forwarded to ``tqdm``.

   Returns
   -------
   tqdm.std.tqdm
       Configured progress iterator.

   Examples
   --------
   >>> list(run_tqdm(range(2), exec_env=False, disable=True))
   [0, 1]


.. py:function:: normalize_http_method(method: str) -> str

   Normalize and validate an HTTP method.

   Parameters
   ----------
   method : str
       HTTP verb to normalize.

   Returns
   -------
   str
       Upper-case HTTP method.

   Raises
   ------
   ValueError
       If the provided method is not supported.
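
   Examples
   --------
   A minimal sketch of the normalize-and-validate step, assuming the accepted
   verbs are the five listed for ``handle_request_async`` and that surrounding
   whitespace is stripped. ``normalize_http_method_sketch`` and
   ``_SUPPORTED_METHODS`` are hypothetical names, not the library's code.

   .. code-block:: python

      # Hypothetical sketch; the real api_24sea implementation may differ.
      _SUPPORTED_METHODS = frozenset({"GET", "POST", "PUT", "PATCH", "DELETE"})


      def normalize_http_method_sketch(method: str) -> str:
          """Upper-case ``method`` and reject verbs outside the assumed set."""
          normalized = method.strip().upper()
          if normalized not in _SUPPORTED_METHODS:
              raise ValueError(
                  f"Unsupported HTTP method {method!r}; "
                  f"expected one of {sorted(_SUPPORTED_METHODS)}"
              )
          return normalized


      print(normalize_http_method_sketch(" post "))  # POST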


.. py:function:: build_json_request_payload(params: Optional[Dict], json: Optional[Dict] = None) -> Optional[Dict]

   Normalize a non-GET payload to the backend JSON contract.

   Parameters
   ----------
   params : dict, optional
       Request parameters produced by the caller.
   json : dict, optional
       Explicit JSON payload fallback.

   Returns
   -------
   dict or None
       Normalized JSON payload.


.. py:function:: build_httpx_request_kwargs(method: str, params: Dict, json: Optional[Dict] = None) -> Tuple[Optional[Dict], Optional[Dict]]

   Build ``httpx`` request kwargs based on the HTTP method.

   GET requests keep using query-string parameters. Non-GET requests build
   the request payload from ``params`` and only fall back to ``json`` when
   ``params`` is empty.

   Parameters
   ----------
   method : str
       Normalized HTTP method.
   params : dict
       Parameters provided by the caller.
   json : dict, optional
       Explicit JSON payload fallback.

   Returns
   -------
   Tuple[Optional[Dict], Optional[Dict]]
       ``params`` and ``json`` values to forward to ``httpx.request``.
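
   Examples
   --------
   A minimal sketch of the dispatch described above, using the hypothetical
   ``build_request_kwargs_sketch``. It assumes non-GET requests send no query
   string, and it copies ``params`` as the body where the real function
   normalizes it with ``build_json_request_payload``.

   .. code-block:: python

      from typing import Dict, Optional, Tuple


      def build_request_kwargs_sketch(
          method: str, params: Optional[Dict], json: Optional[Dict] = None
      ) -> Tuple[Optional[Dict], Optional[Dict]]:
          """Return the ``(params, json)`` pair to forward to ``httpx``."""
          if method == "GET":
              # GET: everything travels in the query string, no body.
              return params, None
          # Non-GET: the body comes from ``params``; ``json`` is only a
          # fallback when ``params`` is empty. A plain copy stands in for
          # the normalization done by ``build_json_request_payload``.
          body = dict(params) if params else json
          return None, body


      print(build_request_kwargs_sketch("GET", {"site": "demo"}))
      # ({'site': 'demo'}, None)
      print(build_request_kwargs_sketch("POST", {}, json={"site": "demo"}))
      # (None, {'site': 'demo'})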


.. py:function:: handle_request(url: str, params: Dict, auth: Optional[httpx.BasicAuth], headers: Optional[Dict[str, str]] = {'accept': 'application/json'}, max_retries: int = 10, timeout: int = 3600, method: str = 'GET', json: Optional[Dict] = None) -> httpx.Response

   Send a request to the 24SEA API with ``httpx`` and handle errors.

   If the request succeeds, the response object is returned; otherwise an
   error is raised.

   Parameters
   ----------
   url : str
       The URL to which to send the request.
   params : dict
       The parameters to send with the request.
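   auth : httpx.BasicAuth or None
       The authentication object.
   headers : dict, optional
       The headers to send with the request. Defaults to
       ``{'accept': 'application/json'}``.
   max_retries : int, default 10
       Maximum number of retries before giving up.
   timeout : int, default 3600
       Request timeout, in seconds.
   method : str, default 'GET'
       HTTP method to use.
   json : dict, optional
       Explicit JSON payload, used for non-GET requests when ``params`` is
       empty.

   Returns
   -------
   httpx.Response
       The response object of the successful request. An error is raised
       otherwise.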
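
   Examples
   --------
   The retry behaviour implied by ``max_retries`` can be sketched with the
   standard library alone. In this hypothetical helper, ``send`` stands in
   for the ``httpx`` call and returns a status code; treating 429 and 5xx
   responses as retryable and backing off exponentially are assumptions, not
   documented behaviour of ``handle_request``.

   .. code-block:: python

      import time
      from typing import Callable


      class TransientError(Exception):
          """Hypothetical stand-in for a network error worth retrying."""


      def request_with_retries(
          send: Callable[[], int], max_retries: int = 10, backoff_s: float = 0.5
      ) -> int:
          """Call ``send`` until it returns a 2xx status or retries run out."""
          last_error: Exception = TransientError("no attempt made")
          for attempt in range(max_retries + 1):
              try:
                  status = send()
              except TransientError as exc:
                  last_error = exc
              else:
                  if 200 <= status < 300:
                      return status
                  if status != 429 and status < 500:
                      # Other client errors will not fix themselves: fail fast.
                      raise RuntimeError(f"request failed with HTTP {status}")
                  last_error = TransientError(f"HTTP {status}")
              if attempt < max_retries:
                  time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
          raise RuntimeError(
              f"giving up after {max_retries} retries"
          ) from last_error


      replies = iter([503, 429, 200])
      print(request_with_retries(lambda: next(replies), backoff_s=0.0))  # 200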


.. py:function:: handle_request_async(url: str, params: Dict, auth: Optional[httpx.BasicAuth], headers: Optional[Dict[str, str]] = {'accept': 'application/json'}, max_retries: int = 10, timeout: int = 1800, method: str = 'GET', json: Optional[Dict] = None) -> httpx.Response
   :async:


   Asynchronously handle a request to the 24SEA API using
   ``httpx.AsyncClient``. Supports the GET, POST, PUT, PATCH, and DELETE
   methods.
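
   Examples
   --------
   An ``asyncio`` sketch of retrying requests concurrently: each request
   awaits its own retries while ``asyncio.gather`` runs several of them on
   one event loop. ``send`` replaces the ``httpx.AsyncClient`` call, the
   location names are made up, and every non-2xx status is retried for
   brevity; none of this is taken from ``handle_request_async`` itself.

   .. code-block:: python

      import asyncio
      from typing import Awaitable, Callable, Dict, List


      async def request_async_with_retries(
          send: Callable[[], Awaitable[int]],
          max_retries: int = 10,
          backoff_s: float = 0.5,
      ) -> int:
          """Await ``send`` until it returns a 2xx status or retries run out."""
          for attempt in range(max_retries + 1):
              status = await send()
              if 200 <= status < 300:
                  return status
              if attempt < max_retries:
                  # asyncio.sleep yields to other tasks instead of blocking.
                  await asyncio.sleep(backoff_s * 2 ** attempt)
          raise RuntimeError(f"giving up after {max_retries} retries")


      async def main() -> List[int]:
          calls: Dict[str, int] = {}

          def make_send(location: str) -> Callable[[], Awaitable[int]]:
              async def send() -> int:
                  # Fake endpoint: first call per location fails, then succeeds.
                  calls[location] = calls.get(location, 0) + 1
                  return 503 if calls[location] == 1 else 200
              return send

          locations = ("WF_A01", "WF_A02", "WF_A03")  # made-up names
          return await asyncio.gather(
              *(request_async_with_retries(make_send(loc), backoff_s=0.0)
                for loc in locations)
          )


      print(asyncio.run(main()))  # [200, 200, 200]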


.. py:function:: default_to_regular_dict(d_: Union[DefaultDict, Dict]) -> Dict

   Convert a defaultdict to a regular dictionary.
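
   Examples
   --------
   A minimal sketch, assuming nested ``defaultdict`` values are converted
   recursively; ``to_regular_dict_sketch`` is a hypothetical name.

   .. code-block:: python

      from collections import defaultdict
      from typing import Any


      def to_regular_dict_sketch(d: Any) -> Any:
          """Recursively replace (default)dicts with plain dicts."""
          if isinstance(d, dict):  # defaultdict is a subclass of dict
              return {key: to_regular_dict_sketch(value) for key, value in d.items()}
          return d


      nested = defaultdict(lambda: defaultdict(list))
      nested["site"]["loc"].append(1.0)
      plain = to_regular_dict_sketch(nested)
      print(type(plain).__name__, type(plain["site"]).__name__)  # dict dict

   After conversion, looking up a missing key raises ``KeyError`` instead of
   silently inserting a default value.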


.. py:function:: flatten_api_response(payload: Any) -> Any

   Flatten nested list payloads returned by multi-location endpoints.
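
   Examples
   --------
   A minimal sketch, assuming multi-location endpoints return one list per
   location and that a single level of nesting is removed; ``flatten_sketch``
   and the record fields are hypothetical.

   .. code-block:: python

      from typing import Any, List


      def flatten_sketch(payload: Any) -> Any:
          """Remove one level of list nesting; other payloads pass through."""
          if not (isinstance(payload, list)
                  and any(isinstance(item, list) for item in payload)):
              return payload
          flat: List[Any] = []
          for item in payload:
              if isinstance(item, list):
                  flat.extend(item)  # splice a per-location list in place
              else:
                  flat.append(item)
          return flat


      per_location = [
          [{"location": "A01", "value": 1.0}],
          [{"location": "A02", "value": 2.0}],
      ]
      print(flatten_sketch(per_location))
      # [{'location': 'A01', 'value': 1.0}, {'location': 'A02', 'value': 2.0}]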


.. py:function:: normalize_group_values(values: List[str]) -> List[str]

   Normalize grouped request values while preserving order.

   Site-level grouping can repeat the special ``all`` metric once per
   location. The API only accepts it as a single value.
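
   Examples
   --------
   A minimal sketch, assuming only the special ``all`` value is de-duplicated
   and every other value keeps its original position;
   ``normalize_group_values_sketch`` is a hypothetical name.

   .. code-block:: python

      from typing import List


      def normalize_group_values_sketch(values: List[str]) -> List[str]:
          """Keep the first ``"all"`` only; leave other values in order."""
          result: List[str] = []
          seen_all = False
          for value in values:
              if value == "all":
                  if seen_all:
                      continue  # drop the per-location repeats
                  seen_all = True
              result.append(value)
          return result


      print(normalize_group_values_sketch(["all", "all", "all"]))  # ['all']

   If every duplicate should be dropped instead, ``list(dict.fromkeys(values))``
   also preserves first-seen order.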


.. py:function:: require_auth(func)

   Decorator that ensures authentication before the decorated method runs.
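
   Examples
   --------
   A minimal sketch of such a decorator, assuming the instance exposes a
   boolean ``authenticated`` attribute and that ``PermissionError`` is an
   acceptable failure; both the attribute and ``DemoClient`` are hypothetical.

   .. code-block:: python

      import functools


      def require_auth_sketch(func):
          """Refuse to call ``func`` unless ``self.authenticated`` is truthy."""
          @functools.wraps(func)
          def wrapper(self, *args, **kwargs):
              # ``authenticated`` is a hypothetical attribute name.
              if not getattr(self, "authenticated", False):
                  raise PermissionError(
                      f"{func.__name__} requires authentication"
                  )
              return func(self, *args, **kwargs)
          return wrapper


      class DemoClient:
          """Hypothetical client used only to exercise the decorator."""

          def __init__(self) -> None:
              self.authenticated = False

          @require_auth_sketch
          def get_data(self) -> str:
              return "data"


      client = DemoClient()
      client.authenticated = True  # pretend a login step succeeded
      print(client.get_data())  # data

   ``functools.wraps`` keeps the decorated method's name and docstring, so
   documentation generated from it stays readable.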


.. py:function:: parse_timestamp(df: pandas.DataFrame, formats: Iterable[str] = ('ISO8601', 'mixed'), dayfirst: bool = False, keep_index_only: bool = True) -> pandas.DataFrame

   Parse the timestamp column of a DataFrame, trying several formats in turn.

   Parameters
   ----------
   df : pandas.DataFrame
       Input DataFrame containing a timestamp column or index.
   formats : Iterable[str], default ('ISO8601', 'mixed')
       Datetime formats to try, in order.
   dayfirst : bool, default False
       Whether to interpret dates with the day first.

   Returns
   -------
   pandas.DataFrame
       DataFrame with the parsed timestamp column.

   Raises
   ------
   ValueError
       If timestamp parsing fails with every format.
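
   Examples
   --------
   A minimal sketch of the try-each-format loop, assuming pandas 2.0 or later
   (where ``'ISO8601'`` and ``'mixed'`` are accepted ``format`` values of
   ``pandas.to_datetime``) and a column named ``timestamp``. Index handling
   and ``keep_index_only`` are left out, and ``parse_timestamp_sketch`` is a
   hypothetical name.

   .. code-block:: python

      from typing import Iterable

      import pandas as pd


      def parse_timestamp_sketch(
          df: pd.DataFrame,
          formats: Iterable[str] = ("ISO8601", "mixed"),
          dayfirst: bool = False,
      ) -> pd.DataFrame:
          """Parse ``df["timestamp"]`` with each format in turn."""
          errors = []
          for fmt in formats:
              try:
                  parsed = pd.to_datetime(
                      df["timestamp"], format=fmt, dayfirst=dayfirst
                  )
              except (ValueError, TypeError) as exc:
                  errors.append(f"{fmt}: {exc}")  # remember why, try the next
                  continue
              return df.assign(timestamp=parsed)  # new frame; input untouched
          raise ValueError("timestamp parsing failed: " + "; ".join(errors))


      raw = pd.DataFrame({
          "timestamp": ["2024-01-01T00:00:00Z", "2024-01-01T00:10:00Z"],
          "value": [1.0, 2.0],
      })
      print(parse_timestamp_sketch(raw).dtypes)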


.. py:function:: estimate_chunk_size(tasks: list, start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], grouped_metrics: Iterable, selected_metrics: Union[pandas.DataFrame, None] = None, target: str = 'metric')

   Estimate the optimal chunk size for processing tasks based on the expected
   data volume.

   This function estimates the size of the data request in megabytes (MB)
   from the number of data points, the number of tasks, and the bytes
   required per metric, then determines an appropriate chunk size for
   processing the tasks efficiently.

   Parameters
   ----------
   tasks : list
       List of tasks to be processed.
   start_timestamp : str or datetime.datetime
       Start of the requested time range.
   end_timestamp : str or datetime.datetime
       End of the requested time range.
   grouped_metrics : iterable
       Iterable of grouped metrics, where each group is a tuple (key, group),
       and group is typically a DataFrame.
   selected_metrics : pandas.DataFrame or None, default None
       DataFrame containing selected metrics with at least a "metric" column
       and optionally a "data_group" column.
   target : str, default "metric"
       The target column name in ``selected_metrics`` and ``grouped_metrics``.

   Returns
   -------
   dict
       Dictionary with the following keys:

       - ``"total_mb"`` (float): estimated total size of the request in MB.
       - ``"n_tasks"`` (int): number of tasks.
       - ``"chunk_size"`` (int): recommended chunk size for processing.

   Notes
   -----
   - The function assumes each data point is a float64 (8 bytes) unless
     overridden by the "data_group".
   - The number of data points is estimated as one every 10 minutes between the
     start and end timestamps.
   - Chunk size is determined based on the estimated total data size.
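
   Examples
   --------
   The arithmetic in the notes can be sketched as follows. The
   one-point-per-10-minutes cadence and the 8-byte float64 come from the
   notes above; the per-task metric count, the binary megabyte
   (``1024 ** 2`` bytes), and the rule that packs as many tasks into a chunk
   as fit a 50 MB budget are assumptions, not the library's thresholds.
   ``estimate_chunk_size_sketch`` is a hypothetical name.

   .. code-block:: python

      from datetime import datetime
      from typing import Dict, Union


      def estimate_chunk_size_sketch(
          start: datetime,
          end: datetime,
          n_tasks: int,
          metrics_per_task: int,
          bytes_per_point: int = 8,            # float64
          interval_s: int = 600,               # one point every 10 minutes
          target_mb_per_chunk: float = 50.0,   # hypothetical budget
      ) -> Dict[str, Union[float, int]]:
          """Estimate the request size and how many tasks to run per chunk."""
          n_points = int((end - start).total_seconds() // interval_s)
          total_bytes = n_points * n_tasks * metrics_per_task * bytes_per_point
          total_mb = total_bytes / 1024 ** 2
          if n_tasks == 0 or total_mb == 0:
              chunk_size = max(n_tasks, 1)
          else:
              mb_per_task = total_mb / n_tasks
              # Pack as many tasks as fit in the budget, but at least one.
              fit = int(target_mb_per_chunk // mb_per_task)
              chunk_size = max(1, min(n_tasks, fit))
          return {"total_mb": total_mb, "n_tasks": n_tasks, "chunk_size": chunk_size}


      one_year = estimate_chunk_size_sketch(
          datetime(2024, 1, 1), datetime(2025, 1, 1), n_tasks=4, metrics_per_task=50
      )
      print(one_year)  # chunk_size 2: ~80.4 MB total, ~20.1 MB per task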


.. py:function:: fetch_data_sync(url, site: str, locations: Union[str, List[str]], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, target: str = 'metric', force_cache_miss: bool = False, method: str = 'GET') -> Any

   Synchronously fetch metrics data for the datasignals API app.


.. py:function:: fetch_availability_sync(url, site: str, locations: Union[str, List[str]], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], granularity: Union[str, int], sampling_interval_seconds: int, headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, method: str = 'GET') -> Any

   Synchronously fetch availability data for a site/location.


.. py:function:: fetch_availability_async(url: str, site: str, locations: Union[str, List[str], None], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], granularity: Union[str, int], sampling_interval_seconds: int, headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, max_retries: int, method: str = 'GET') -> Any
   :async:


   Asynchronously fetch availability data for a site/location.


.. py:function:: fetch_data_async(url, site: str, locations: Union[str, List[str]], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, max_retries: int, as_dict: bool = False, target: str = 'metric', force_cache_miss: bool = False, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Any]]
   :async:


   Asynchronously fetch metrics data for the datasignals API app.


.. py:function:: fetch_oldest_timestamp_sync(url: str, site: str, locations: Optional[str], headers: Dict[str, str], auth: Optional[httpx.BasicAuth], timeout: int, as_dict: bool = False, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Any]]

   Synchronously fetch oldest timestamp data for a site/location.


.. py:function:: set_threads_nr(threads: Optional[int], thread_limit: int = 30) -> int

   Set the number of threads to use for processing.

   Parameters
   ----------
   threads : Optional[int]
       The number of threads to use. If None, the number of available CPU cores
       will be used.
   thread_limit : int, default 30
       Upper bound on the number of threads.

   Returns
   -------
   int
       The number of threads to use.
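
   Examples
   --------
   A minimal sketch, assuming ``thread_limit`` acts as an upper bound and the
   result is at least 1; ``set_threads_nr_sketch`` is a hypothetical name.
   The resolved value is then suitable as ``max_workers`` for a thread pool.

   .. code-block:: python

      import os
      from concurrent.futures import ThreadPoolExecutor
      from typing import Optional


      def set_threads_nr_sketch(threads: Optional[int], thread_limit: int = 30) -> int:
          """Default to the CPU count, then clamp into ``[1, thread_limit]``."""
          if threads is None:
              # os.cpu_count() may return None, hence the fallback to 1.
              threads = os.cpu_count() or 1
          return max(1, min(threads, thread_limit))


      with ThreadPoolExecutor(max_workers=set_threads_nr_sketch(None)) as pool:
          squares = list(pool.map(lambda x: x * x, range(5)))
      print(squares)  # [0, 1, 4, 9, 16]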


.. py:function:: parse_stats_list(stats_list: List[Dict[str, Any]]) -> pandas.DataFrame

   Parse a list of statistics dictionaries into a DataFrame.

   Parameters
   ----------
   stats_list : List[Dict[str, Any]]
       List of dictionaries containing statistics data.

   Returns
   -------
   pd.DataFrame
       DataFrame containing the parsed statistics.


.. py:function:: get_stats_overview_info(stats_df: pandas.DataFrame, metrics_overview: Optional[pandas.DataFrame] = None) -> pandas.DataFrame

   Get overview information for a statistics DataFrame.

   Parameters
   ----------
   stats_df : pd.DataFrame
       DataFrame containing statistics data.
   metrics_overview : pd.DataFrame, optional
       DataFrame containing metrics overview information.

   Returns
   -------
   pd.DataFrame
       DataFrame with overview information merged with ``stats_df``.


.. py:function:: get_stats_as_dict(stats_df: pandas.DataFrame) -> Dict[str, Dict[str, pandas.DataFrame]]

   Convert the statistics DataFrame to a dictionary format.

   Parameters
   ----------
   stats_df : pd.DataFrame
       DataFrame containing statistics data.

   Returns
   -------
   Dict[str, Dict[str, pd.DataFrame]]
       Dictionary with site and location as keys and statistics as values.
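
   Examples
   --------
   A minimal sketch of the nesting step, assuming the frame has ``site`` and
   ``location`` columns (hypothetical names); ``split_by_site_location`` is
   not part of ``api_24sea``.

   .. code-block:: python

      from typing import Dict

      import pandas as pd


      def split_by_site_location(
          df: pd.DataFrame,
      ) -> Dict[str, Dict[str, pd.DataFrame]]:
          """Nest rows as ``{site: {location: sub_frame}}``."""
          nested: Dict[str, Dict[str, pd.DataFrame]] = {}
          for (site, location), sub in df.groupby(["site", "location"], sort=False):
              nested.setdefault(site, {})[location] = sub.reset_index(drop=True)
          return nested


      stats = pd.DataFrame({
          "site": ["wf", "wf", "wf2"],
          "location": ["A01", "A02", "B01"],
          "mean": [1.0, 2.0, 3.0],
      })
      print({site: list(locs) for site, locs in split_by_site_location(stats).items()})
      # {'wf': ['A01', 'A02'], 'wf2': ['B01']}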


.. py:function:: get_metrics_data_df_as_dict(metrics_data_df: pandas.DataFrame, selected_metrics: pandas.DataFrame) -> Dict[str, Dict[str, pandas.DataFrame]]

   Convert the metrics DataFrame to a dictionary format.

   Parameters
   ----------
   metrics_data_df : pd.DataFrame
       DataFrame containing metrics data.
   selected_metrics : pd.DataFrame
       DataFrame of the selected metrics.

   Returns
   -------
   Dict[str, Dict[str, pd.DataFrame]]
       Dictionary with site and location as keys and metrics data as values.


.. py:function:: series_to_type(series: pandas.Series, dtype: Union[str, type]) -> Union[pandas.Series, pandas.Timestamp]

   Convert a pandas Series to a specified data type.

   Parameters
   ----------
   series : pd.Series
       The Series to convert.
   dtype : Union[str, type]
       The data type to convert the series to.

   Returns
   -------
   pd.Series
       The Series converted to the specified data type.

   Example
   -------
   >>> import pandas as pd
   >>> s = pd.Series([1, 2, 3])
   >>> series_to_type(s, float)
   0    1.0
   1    2.0
   2    3.0
   dtype: float64


.. py:function:: column_to_type(data: pandas.DataFrame, column: str, dtype: Union[str, type]) -> pandas.DataFrame

   Convert a column in a DataFrame to a specified data type.

   Parameters
   ----------
   data : pd.DataFrame
       The DataFrame containing the column to convert.
   column : str
       The column to convert.
   dtype : Union[str, type]
       The data type to convert the column to.

   Returns
   -------
   pd.DataFrame
       The DataFrame with the column converted to the specified data type.

   Example
   -------
   >>> import pandas as pd
   >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   >>> column_to_type(df, 'A', float)
        A  B
   0  1.0  4
   1  2.0  5
   2  3.0  6


.. py:function:: calendar_monthly_availability(df: pandas.DataFrame, *, start_timestamp: Union[datetime.datetime, str, None] = None, end_timestamp: Union[datetime.datetime, str, None] = None, sampling_interval_seconds: int = 600) -> pandas.DataFrame

   Convert a daily availability dataframe to a calendar monthly availability
   dataframe.

   The columns of the input dataframe are assumed to hold daily availability
   values (between 0 and 1). The output dataframe has the same columns,
   indexed by year and month, and each value is the mean of the daily
   availability values in that month.

   Parameters
   ----------
   df : pd.DataFrame
       Input dataframe with daily availability values.

   Returns
   -------
   pd.DataFrame
       Output dataframe with calendar monthly availability values.

   Example
   -------
   >>> import pandas as pd
   >>> data = {'timestamp': pd.date_range(start='2023-01-01', periods=90,
   ...                                    freq='D'),
   ...         'availability': [0.9, 0.8, 0.95] * 30}
   >>> df = pd.DataFrame(data).set_index('timestamp')
   >>> calendar_monthly_availability(df)
               availability
   timestamp
   2023-01          0.883871
   2023-02          0.880357
   2023-03          0.885484
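
   For cross-checking, a plain calendar-month mean can be computed directly
   with ``DataFrame.groupby`` on the year and month of the index. This sketch
   reuses the data from the example above and ignores ``start_timestamp``,
   ``end_timestamp`` and ``sampling_interval_seconds``, whose effect is not
   documented here, so it is not a reimplementation of the function.

   .. code-block:: python

      import pandas as pd

      data = {"timestamp": pd.date_range(start="2023-01-01", periods=90, freq="D"),
              "availability": [0.9, 0.8, 0.95] * 30}
      df = pd.DataFrame(data).set_index("timestamp")

      # Mean of the daily values per calendar month, indexed by (year, month).
      monthly = df.groupby(
          [df.index.year.rename("year"), df.index.month.rename("month")]
      ).mean()
      print(monthly)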