utils

General utility functions.

iowa_forecast.utils.normalize_item_name(item_name: str) → str

Convert ‘item_name’ values to lower case and replace spaces with underscores.

Parameters:

item_name (str) – Item name to normalize.

Returns:

str – Normalized item name.

Return type:

str

Examples

>>> normalize_item_name("TITOS HANDMADE VODKA")
'titos_handmade_vodka'

Notes

Used to generate names for the different ARIMA models that are created for each unique item name.
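
The normalization described above can be sketched in a few lines. This is a minimal illustration that assumes nothing beyond the documented behavior (lower-casing and replacing spaces with underscores); the name normalize_item_name_sketch is hypothetical and the library's actual implementation may differ.

```python
def normalize_item_name_sketch(item_name: str) -> str:
    """Lower-case the name and replace each space with an underscore."""
    return item_name.lower().replace(" ", "_")


# Same input as the documented example.
print(normalize_item_name_sketch("TITOS HANDMADE VODKA"))  # titos_handmade_vodka
```

Because the result contains only lower-case characters and underscores in place of spaces, it is convenient to embed in identifiers such as per-item model names.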

iowa_forecast.utils.split_table_name_info(table_name: str) → Tuple[str | None, str | None, str]

Extract components from a table name.

Parameters:

table_name (str) – Table name to extract components from.

Returns:

Tuple[str | None, str | None, str] – A tuple containing the project ID, dataset ID, and table name. Any component that is missing from table_name is returned as None.

Return type:

Tuple[str | None, str | None, str]

Examples

>>> split_table_name_info('my_project.my_dataset.my_table')
('my_project', 'my_dataset', 'my_table')
>>> split_table_name_info('my_dataset.my_table')
(None, 'my_dataset', 'my_table')
>>> split_table_name_info('my_table')
(None, None, 'my_table')
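
One way to reproduce the behavior shown in these examples is to split on dots and left-pad the result with None. The sketch below is an assumption about how such a function could work, not the library's code; split_table_name_sketch is a hypothetical name, and raising ValueError for more than three components is a choice made here for illustration only.

```python
from typing import Optional, Tuple


def split_table_name_sketch(
    table_name: str,
) -> Tuple[Optional[str], Optional[str], str]:
    """Split 'project.dataset.table' into its three components."""
    parts = table_name.split(".")
    if len(parts) > 3:
        # Assumption: more than three dot-separated parts is treated as an error.
        raise ValueError(f"Too many components in table name: {table_name!r}")
    # Left-pad with None so the table name always lands in the last slot.
    padded = [None] * (3 - len(parts)) + parts
    return padded[0], padded[1], padded[2]


print(split_table_name_sketch("my_dataset.my_table"))  # (None, 'my_dataset', 'my_table')
```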

iowa_forecast.utils.create_bigquery_table_from_pandas(client: Client, dataframe: DataFrame, table_id: str, dataset_id='bqmlforecast', if_exists: str = 'replace')

Create a BigQuery table from a pandas DataFrame.

Parameters:
  • client (bigquery.Client) – BigQuery client used to connect to the service.

  • dataframe (pd.DataFrame) – A pandas.DataFrame to load into the BigQuery table.

  • table_id (str) – ID of the table to create in BigQuery.

  • dataset_id (str, default "bqmlforecast") – ID of the dataset where the table will be created.

  • if_exists ({"fail", "replace", "append"}, default "replace") – Behavior when the table already exists.
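
The three if_exists options resemble the write dispositions that BigQuery load jobs accept. The sketch below shows one plausible mapping; it is an assumption, not a description of how this function is implemented. The helper resolve_write_disposition and the mapping itself are hypothetical, while the string values are the standard BigQuery write dispositions (also exposed as bigquery.WriteDisposition in the google-cloud-bigquery client).

```python
# Hypothetical mapping from the documented if_exists options to BigQuery
# write dispositions. This is an assumption, not taken from the library source.
WRITE_DISPOSITIONS = {
    "fail": "WRITE_EMPTY",        # the load fails if the table already holds data
    "replace": "WRITE_TRUNCATE",  # existing table contents are overwritten
    "append": "WRITE_APPEND",     # new rows are added to the existing table
}


def resolve_write_disposition(if_exists: str = "replace") -> str:
    """Translate an if_exists option into a write disposition string."""
    try:
        return WRITE_DISPOSITIONS[if_exists]
    except KeyError:
        valid = ", ".join(sorted(WRITE_DISPOSITIONS))
        raise ValueError(
            f"if_exists must be one of {valid}; got {if_exists!r}"
        ) from None


print(resolve_write_disposition("append"))  # WRITE_APPEND
```

With the official client, a value like this would typically be passed as the write_disposition of a bigquery.LoadJobConfig used with client.load_table_from_dataframe.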

Examples

>>> client = bigquery.Client()
>>> dataframe = pd.DataFrame({'column1': [1, 2], 'column2': ['a', 'b']})
>>> create_bigquery_table_from_pandas(client, dataframe, 'my_table')

iowa_forecast.utils.create_dataset_if_not_found(client: bigquery.Client, project_id: str | None = None, dataset_name: str = 'bqmlforecast', location: str = 'us')

Create a BigQuery dataset if it does not exist.

Parameters:
  • client (bigquery.Client) – BigQuery client used to connect to the service.

  • project_id (str, optional) – ID of the project where the dataset will be created. If no value is provided, the project ID is inferred from the client's project attribute.

  • dataset_name (str, default "bqmlforecast") – Name of the dataset to create.

  • location (str, default "us") – Location of the dataset.

Raises:

Exception – If any error occurs other than the one indicating that the dataset already exists.

Examples

>>> client = bigquery.Client()
>>> create_dataset_if_not_found(client, dataset_name='new_dataset')
Dataset 'new_dataset' already exists.

Notes

This function checks if the specified dataset exists in the given project. If it does not exist, the function creates the dataset.

iowa_forecast.utils.list_tables_with_pattern(client: bigquery.Client, dataset_id: str, table_pattern: str, project_id: str | None = None) → List[str]

List BigQuery tables matching a specific pattern.

Constructs a fully qualified dataset ID, retrieves the dataset, lists all tables, and filters them based on the provided pattern.

Parameters:
  • client (bigquery.Client) – The BigQuery client used to interact with the service.

  • dataset_id (str) – The ID of the dataset containing the tables to list.

  • table_pattern (str) – The pattern to match against the table IDs.

  • project_id (str, optional) – The ID of the project containing the dataset. If None, the client’s project is used.

Returns:

List[str] – A list of table IDs that match the specified pattern.

Return type:

List[str]

Notes

The fnmatch module is used to filter tables based on the pattern. Ensure that the pattern provided is compatible with fnmatch.
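
To make the pattern rules concrete, the sketch below applies fnmatch-style patterns to a plain list of table IDs. It uses only the standard library; the table IDs are made up and filter_table_ids is a hypothetical helper, not part of this package. Note that these are shell-style wildcards (*, ?, [seq]), not regular expressions.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_table_ids(table_ids: Iterable[str], pattern: str) -> List[str]:
    """Keep only the table IDs that match a shell-style pattern."""
    # fnmatchcase is always case-sensitive; fnmatch.fnmatch may fold case
    # depending on the operating system.
    return [table_id for table_id in table_ids if fnmatchcase(table_id, pattern)]


table_ids = ["sales_2021", "sales_2022", "inventory_2022"]  # made-up IDs

print(filter_table_ids(table_ids, "sales_*"))        # ['sales_2021', 'sales_2022']
print(filter_table_ids(table_ids, "*_2022"))         # ['sales_2022', 'inventory_2022']
print(filter_table_ids(table_ids, "sales_202[13]"))  # ['sales_2021']
```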

Examples

List all tables in a dataset that match the pattern ‘sales_*’:

>>> client = bigquery.Client()
>>> tables = list_tables_with_pattern(client, 'my_dataset', 'sales_*')
>>> print(tables)
['sales_2021', 'sales_2022']

iowa_forecast.utils.parse_combined_string(combined: str) → dict

Parse a combined offset string into its components.

Parameters:

combined (str) – A combined string specifying the offset, e.g., '2Y3M2W1D'.

Returns:

dict – A dictionary with keys 'years', 'months', 'weeks', 'days' and their corresponding values.

Raises:

ValueError – If the combined string is invalid.

Return type:

dict
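
A combined string such as '2Y3M2W1D' could be parsed with a single regular expression. The sketch below is illustrative only and rests on assumptions not stated above: components must appear in the order Y, M, W, D, omitted components default to 0, and unit letters are upper-case. The name parse_combined_sketch is hypothetical.

```python
import re
from typing import Dict

# Each component is optional, but the order Y, M, W, D is enforced (assumption).
_COMBINED_RE = re.compile(r"(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?")


def parse_combined_sketch(combined: str) -> Dict[str, int]:
    """Parse a string like '2Y3M2W1D' into its year/month/week/day parts."""
    match = _COMBINED_RE.fullmatch(combined)
    # The pattern also matches the empty string, so reject that explicitly.
    if not combined or match is None:
        raise ValueError(f"Invalid combined offset string: {combined!r}")
    years, months, weeks, days = (int(g) if g else 0 for g in match.groups())
    return {"years": years, "months": months, "weeks": weeks, "days": days}


print(parse_combined_sketch("2Y3M2W1D"))
# {'years': 2, 'months': 3, 'weeks': 2, 'days': 1}
```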

iowa_forecast.utils.create_date_offset_from_parts(years=0, months=0, weeks=0, days=0) → DateOffset

Create a pandas.DateOffset object from individual time components.

Parameters:
  • years (int, default 0) – Number of years for the offset.

  • months (int, default 0) – Number of months for the offset.

  • weeks (int, default 0) – Number of weeks for the offset.

  • days (int, default 0) – Number of days for the offset.

Returns:

pd.DateOffset – A pandas.DateOffset object for the specified time components.

Return type:

DateOffset

iowa_forecast.utils.date_offset(*args: int | str, freq: str = None) → pd.DateOffset

Generate a pandas.DateOffset based on the given frequency and value or a combined string.

Parameters:
  • args (int or str) –

    • If one argument is provided, it should be a combined string specifying the offset, e.g., '2Y3M2W1D'.

    • If two arguments are provided, they should be n (int) and freq (str).

  • freq ({'days', 'weeks', 'months', 'years'}, optional) – The frequency type. Ignored if a combined offset string is provided.

Returns:

pd.DateOffset – A pandas.DateOffset object for the specified frequency and value.

Raises:

ValueError – If freq is not one of the valid options or if the combined string is invalid.

Return type:

pd.DateOffset
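
The two-argument form (n, freq) can be illustrated without pandas by building the keyword arguments that a DateOffset would receive. This is a sketch under assumptions: offset_kwargs_from_n_and_freq is a hypothetical helper, it covers only the (n, freq) form and not the combined-string form, and the library may validate its inputs differently.

```python
from typing import Dict

VALID_FREQS = ("days", "weeks", "months", "years")


def offset_kwargs_from_n_and_freq(n: int, freq: str) -> Dict[str, int]:
    """Build DateOffset-style keyword arguments from a value and a frequency."""
    if freq not in VALID_FREQS:
        raise ValueError(f"freq must be one of {VALID_FREQS}; got {freq!r}")
    # Start with every component at zero, then set the requested one.
    parts = dict.fromkeys(VALID_FREQS, 0)
    parts[freq] = n
    return parts


print(offset_kwargs_from_n_and_freq(3, "months"))
# {'days': 0, 'weeks': 0, 'months': 3, 'years': 0}
```

A dictionary like this could then be unpacked into pandas.DateOffset(**parts), since years, months, weeks, and days are keyword arguments that DateOffset accepts.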