api_24sea.utils
===============

.. py:module:: api_24sea.utils

.. autoapi-nested-parse::

   Utility functions and classes.


Functions
---------

.. autoapisummary::

   api_24sea.utils.is_executable_ipython
   api_24sea.utils.run_tqdm
   api_24sea.utils.normalize_http_method
   api_24sea.utils.build_json_request_payload
   api_24sea.utils.build_httpx_request_kwargs
   api_24sea.utils.handle_request
   api_24sea.utils.handle_request_async
   api_24sea.utils.default_to_regular_dict
   api_24sea.utils.flatten_api_response
   api_24sea.utils.normalize_group_values
   api_24sea.utils.require_auth
   api_24sea.utils.parse_timestamp
   api_24sea.utils.estimate_chunk_size
   api_24sea.utils.fetch_data_sync
   api_24sea.utils.fetch_availability_sync
   api_24sea.utils.fetch_availability_async
   api_24sea.utils.fetch_data_async
   api_24sea.utils.fetch_oldest_timestamp_sync
   api_24sea.utils.set_threads_nr
   api_24sea.utils.parse_stats_list
   api_24sea.utils.get_stats_overview_info
   api_24sea.utils.get_stats_as_dict
   api_24sea.utils.get_metrics_data_df_as_dict
   api_24sea.utils.series_to_type
   api_24sea.utils.column_to_type
   api_24sea.utils.calendar_monthly_availability


Module Contents
---------------

.. py:function:: is_executable_ipython() -> bool

   Return True when running under IPython-based shells.

   Returns
   -------
   bool
       True for IPython or Jupyter shells, False for standard Python.

   Examples
   --------
   >>> isinstance(is_executable_ipython(), bool)
   True


.. py:function:: run_tqdm(iterable: Iterable, *args: Any, exec_env: bool, **kwargs: Any)

   Select the appropriate tqdm variant for the active environment.

   Parameters
   ----------
   iterable : Iterable
       Items to iterate over.
   args : Any
       Positional arguments forwarded to ``tqdm``.
   exec_env : bool
       True when the notebook-aware progress bar should be used.
   kwargs : Any
       Keyword arguments forwarded to ``tqdm``.

   Returns
   -------
   tqdm.std.tqdm
       Configured progress iterator.

   Examples
   --------
   >>> list(run_tqdm(range(2), exec_env=False, disable=True))
   [0, 1]


.. py:function:: normalize_http_method(method: str) -> str

   Normalize and validate an HTTP method.

   Parameters
   ----------
   method : str
       HTTP verb to normalize.

   Returns
   -------
   str
       Upper-case HTTP method.

   Raises
   ------
   ValueError
       If the provided method is not supported.


.. py:function:: build_json_request_payload(params: Optional[Dict], json: Optional[Dict] = None) -> Optional[Dict]

   Normalize a non-GET payload to the backend JSON contract.

   Parameters
   ----------
   params : dict, optional
       Request parameters produced by the caller.
   json : dict, optional
       Explicit JSON payload fallback.

   Returns
   -------
   dict or None
       Normalized JSON payload.


.. py:function:: build_httpx_request_kwargs(method: str, params: Dict, json: Optional[Dict] = None) -> Tuple[Optional[Dict], Optional[Dict]]

   Build ``httpx`` request kwargs based on the HTTP method.

   GET requests keep using query-string parameters. Non-GET requests build
   the request payload from ``params`` and only fall back to ``json`` when
   ``params`` is empty.

   Parameters
   ----------
   method : str
       Normalized HTTP method.
   params : dict
       Parameters provided by the caller.
   json : dict, optional
       Explicit JSON payload fallback.

   Returns
   -------
   Tuple[Optional[Dict], Optional[Dict]]
       ``params`` and ``json`` values to forward to ``httpx.request``.


.. py:function:: handle_request(url: str, params: Dict, auth: Optional[httpx.BasicAuth], headers: Optional[Dict[str, str]] = {'accept': 'application/json'}, max_retries: int = 10, timeout: int = 3600, method: str = 'GET', json: Optional[Dict] = None) -> httpx.Response

   Handle the request to the 24SEA API and manage errors using httpx.

   If the request succeeds, the response object is returned; otherwise an
   error is raised.

   Parameters
   ----------
   url : str
       The URL to which to send the request.
   params : dict
       The parameters to send with the request.
   auth : httpx.BasicAuth or None
       The authentication object.
   headers : dict
       The headers to send with the request.

   Returns
   -------
   httpx.Response
       The response object if the request was successful; otherwise an
       error is raised.


.. py:function:: handle_request_async(url: str, params: Dict, auth: Optional[httpx.BasicAuth], headers: Optional[Dict[str, str]] = {'accept': 'application/json'}, max_retries: int = 10, timeout: int = 1800, method: str = 'GET', json: Optional[Dict] = None) -> httpx.Response
   :async:

   Asynchronously handle the request to the 24SEA API using httpx's
   AsyncClient. Supports the GET, POST, PUT, PATCH and DELETE methods.


.. py:function:: default_to_regular_dict(d_: Union[DefaultDict, Dict]) -> Dict

   Convert a defaultdict to a regular dictionary.


.. py:function:: flatten_api_response(payload: Any) -> Any

   Flatten nested list payloads returned by multi-location endpoints.


.. py:function:: normalize_group_values(values: List[str]) -> List[str]

   Normalize grouped request values while preserving order.

   Site-level grouping can repeat the special ``all`` metric once per
   location. The API only accepts it as a single value.


.. py:function:: require_auth(func)

   Decorator to ensure authentication before executing a method.


.. py:function:: parse_timestamp(df: pandas.DataFrame, formats: Iterable[str] = ('ISO8601', 'mixed'), dayfirst: bool = False, keep_index_only: bool = True) -> pandas.DataFrame

   Parse the timestamp column of a DataFrame, trying multiple formats.

   Parameters
   ----------
   df : pandas.DataFrame
       Input DataFrame containing a timestamp column or index.
   formats : Iterable[str], default ('ISO8601', 'mixed')
       Datetime format strings to try.
   dayfirst : bool, default False
       Whether to interpret dates as day first.

   Returns
   -------
   pandas.DataFrame
       DataFrame with the parsed timestamp column.

   Raises
   ------
   ValueError
       If timestamp parsing fails with all formats.


.. py:function:: estimate_chunk_size(tasks: list, start_timestamp: Union[str, datetime.datetime], end_timestamp: Union[str, datetime.datetime], grouped_metrics: Iterable, selected_metrics: Union[pandas.DataFrame, None] = None, target: str = 'metric')

   Estimate the optimal chunk size for processing tasks based on the
   expected data volume.

   This function estimates the size of the data request in megabytes (MB)
   from the number of data points, the number of tasks, and the bytes
   required per metric. It then determines an appropriate chunk size for
   processing the tasks efficiently.

   Parameters
   ----------
   tasks : list
       List of tasks to be processed.
   start_timestamp : str or datetime.datetime
       Start of the requested time range.
   end_timestamp : str or datetime.datetime
       End of the requested time range.
   grouped_metrics : iterable
       Iterable of grouped metrics, where each group is a tuple
       (key, group), and group is typically a DataFrame.
   selected_metrics : pandas.DataFrame or None
       DataFrame containing selected metrics with at least a "metric"
       column and optionally a "data_group" column.
   target : str, default "metric"
       The target column name in ``selected_metrics`` and
       ``grouped_metrics``.

   Returns
   -------
   dict
       Dictionary with the following keys:

       - "total_mb": float, estimated total size of the request in MB.
       - "n_tasks": int, number of tasks.
       - "chunk_size": int, recommended chunk size for processing.

   Notes
   -----
   - The function assumes each data point is a float64 (8 bytes) unless
     overridden by the "data_group".
   - The number of data points is estimated as one every 10 minutes
     between the start and end timestamps.
   - Chunk size is determined based on the estimated total data size.


.. py:function:: fetch_data_sync(url, site: str, locations: Union[str, List[str]], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, target: str = 'metric', force_cache_miss: bool = False, method: str = 'GET') -> Any

   Synchronously fetch metrics data for the datasignals API app.


.. py:function:: fetch_availability_sync(url, site: str, locations: Union[str, List[str]], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], granularity: Union[str, int], sampling_interval_seconds: int, headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, method: str = 'GET') -> Any

   Synchronously fetch availability data for a site/location.


.. py:function:: fetch_availability_async(url: str, site: str, locations: Union[str, List[str], None], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], granularity: Union[str, int], sampling_interval_seconds: int, headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, max_retries: int, method: str = 'GET') -> Any
   :async:

   Asynchronously fetch availability data for a site/location.


.. py:function:: fetch_data_async(url, site: str, locations: Union[str, List[str]], start_timestamp: Union[datetime.datetime, str], end_timestamp: Union[datetime.datetime, str], headers: Optional[Dict[str, str]], group: pandas.DataFrame, auth: Optional[httpx.BasicAuth], timeout: int, max_retries: int, as_dict: bool = False, target: str = 'metric', force_cache_miss: bool = False, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Any]]
   :async:

   Asynchronously fetch metrics data for the datasignals API app.


.. py:function:: fetch_oldest_timestamp_sync(url: str, site: str, locations: Optional[str], headers: Dict[str, str], auth: Optional[httpx.BasicAuth], timeout: int, as_dict: bool = False, method: str = 'GET') -> Union[pandas.DataFrame, Dict[str, Any]]

   Synchronously fetch oldest timestamp data for a site/location.


.. py:function:: set_threads_nr(threads: Optional[int], thread_limit: int = 30) -> int

   Set the number of threads to use for processing.

   Parameters
   ----------
   threads : Optional[int]
       The number of threads to use. If None, the number of available CPU
       cores will be used.

   Returns
   -------
   int
       The number of threads to use.


.. py:function:: parse_stats_list(stats_list: List[Dict[str, Any]]) -> pandas.DataFrame

   Parse a list of statistics dictionaries into a DataFrame.

   Parameters
   ----------
   stats_list : List[Dict[str, Any]]
       List of dictionaries containing statistics data.

   Returns
   -------
   pd.DataFrame
       DataFrame containing the parsed statistics.


.. py:function:: get_stats_overview_info(stats_df: pandas.DataFrame, metrics_overview: Optional[pandas.DataFrame] = None) -> pandas.DataFrame

   Get the overview information for a statistics DataFrame.

   Parameters
   ----------
   stats_df : pd.DataFrame
       DataFrame containing statistics data.
   metrics_overview : pd.DataFrame, optional
       DataFrame containing metrics overview information.

   Returns
   -------
   pd.DataFrame
       DataFrame with overview information merged with stats_df.


.. py:function:: get_stats_as_dict(stats_df: pandas.DataFrame) -> Dict[str, Dict[str, pandas.DataFrame]]

   Convert the statistics DataFrame to a dictionary format.

   Parameters
   ----------
   stats_df : pd.DataFrame
       DataFrame containing statistics data.

   Returns
   -------
   Dict[str, Dict[str, pd.DataFrame]]
       Dictionary with site and location as keys and statistics as values.


.. py:function:: get_metrics_data_df_as_dict(metrics_data_df: pandas.DataFrame, selected_metrics: pandas.DataFrame) -> Dict[str, Dict[str, pandas.DataFrame]]

   Convert the metrics DataFrame to a dictionary format.

   Parameters
   ----------
   metrics_data_df : pd.DataFrame
       DataFrame containing metrics data.

   Returns
   -------
   Dict[str, Dict[str, pd.DataFrame]]
       Dictionary with site and location as keys and metrics data as
       values.


.. py:function:: series_to_type(series: pandas.Series, dtype: Union[str, type]) -> Union[pandas.Series, pandas.Timestamp]

   Convert a pandas Series to a specified data type.

   Parameters
   ----------
   series : pd.Series
       The Series to convert.
   dtype : Union[str, type]
       The data type to convert the series to.

   Returns
   -------
   pd.Series
       The Series converted to the specified data type.

   Example
   -------
   >>> import pandas as pd
   >>> s = pd.Series([1, 2, 3])
   >>> series_to_type(s, float)
   0    1.0
   1    2.0
   2    3.0
   dtype: float64


.. py:function:: column_to_type(data: pandas.DataFrame, column: str, dtype: Union[str, type]) -> pandas.DataFrame

   Convert a column in a DataFrame to a specified data type.

   Parameters
   ----------
   data : pd.DataFrame
       The DataFrame containing the column to convert.
   column : str
       The column to convert.
   dtype : Union[str, type]
       The data type to convert the column to.

   Returns
   -------
   pd.DataFrame
       The DataFrame with the column converted to the specified data type.

   Example
   -------
   >>> import pandas as pd
   >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   >>> column_to_type(df, 'A', float)
        A  B
   0  1.0  4
   1  2.0  5
   2  3.0  6


.. py:function:: calendar_monthly_availability(df: pandas.DataFrame, *, start_timestamp: Union[datetime.datetime, str, None] = None, end_timestamp: Union[datetime.datetime, str, None] = None, sampling_interval_seconds: int = 600) -> pandas.DataFrame

   Convert a daily availability dataframe to a calendar monthly
   availability dataframe.

   The columns of the input dataframe are assumed to be daily availability
   values (between 0 and 1). The output dataframe will have the same
   columns, but the two indices will be year and month. The values will be
   the mean of the daily availability values in that month.

   Parameters
   ----------
   df : pd.DataFrame
       Input dataframe with daily availability values.

   Returns
   -------
   pd.DataFrame
       Output dataframe with calendar monthly availability values.

   Example
   -------
   >>> import pandas as pd
   >>> data = {'timestamp': pd.date_range(start='2023-01-01', periods=90,
   ...                                    freq='D'),
   ...         'availability': [0.9, 0.8, 0.95] * 30}
   >>> df = pd.DataFrame(data).set_index('timestamp')
   >>> calendar_monthly_availability(df)
              availability
   timestamp
   2023-01        0.883333
   2023-02        0.883333
   2023-03        0.883333
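
The monthly aggregation described for ``calendar_monthly_availability`` can be
sketched with plain pandas. This is only an illustration, not the library's
implementation: it assumes the input has a ``DatetimeIndex``, takes an
unweighted mean per calendar month, and ignores ``start_timestamp``,
``end_timestamp`` and ``sampling_interval_seconds``. The helper name
``monthly_mean_availability`` is hypothetical.

.. code-block:: python

   import pandas as pd


   def monthly_mean_availability(df: pd.DataFrame) -> pd.DataFrame:
       """Average daily availability per calendar month (hypothetical helper)."""
       # Group rows by the year and month of the DatetimeIndex, then average.
       grouped = df.groupby([df.index.year, df.index.month]).mean()
       # Name the two index levels so the result is self-describing.
       grouped.index.names = ["year", "month"]
       return grouped


   # Same synthetic input as the docstring example: 90 daily values.
   daily = pd.DataFrame(
       {"availability": [0.9, 0.8, 0.95] * 30},
       index=pd.date_range(start="2023-01-01", periods=90, freq="D"),
   )
   monthly = monthly_mean_availability(daily)

The result of this sketch carries a two-level ``(year, month)`` index with one
row per calendar month; the real function may label or format its index
differently.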