utils

General utility functions.

iowa_forecast.utils.normalize_item_name(item_name: str) → str

Convert ‘item_name’ values to lower case and replace spaces with underscores.

Parameters: item_name (str) – Item name to normalize.
Returns: str – Normalized item name.
Return type: str

Examples

>>> normalize_item_name("TITOS HANDMADE VODKA")
'titos_handmade_vodka'

Notes

Used to generate names for the different ARIMA models that are created for each unique item name.
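The normalization described above can be approximated in a single expression. The following is a minimal sketch, not the library's actual source; the function name normalize_item_name_sketch is hypothetical.

```python
def normalize_item_name_sketch(item_name: str) -> str:
    """Lower-case the name and replace spaces with underscores.

    Hypothetical stand-in for iowa_forecast.utils.normalize_item_name,
    written from the documented behavior only.
    """
    return item_name.lower().replace(" ", "_")


print(normalize_item_name_sketch("TITOS HANDMADE VODKA"))  # titos_handmade_vodka
```

Because the result contains only lower-case letters and underscores in place of spaces, it is convenient to embed in a model name, which is the use case mentioned in the Notes.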

iowa_forecast.utils.split_table_name_info(table_name: str) → Tuple[str | None, str | None, str]

Extract components from a table name.

Parameters: table_name (str) – Table name to extract components from.
Returns: Tuple[str | None, str | None, str] – A tuple containing the project ID, the dataset ID, and the table name. Any component that is missing from table_name is returned as None.
Return type: Tuple[str | None, str | None, str]

Examples

>>> split_table_name_info('my_project.my_dataset.my_table')
('my_project', 'my_dataset', 'my_table')
>>> split_table_name_info('my_dataset.my_table')
(None, 'my_dataset', 'my_table')
>>> split_table_name_info('my_table')
(None, None, 'my_table')

iowa_forecast.utils.create_bigquery_table_from_pandas(client: Client, dataframe: DataFrame, table_id: str, dataset_id='bqmlforecast', if_exists: str = 'replace')

Create a BigQuery table from a pandas DataFrame.

Parameters:

client (bigquery.Client) – BigQuery client used to connect to the service.
dataframe (pd.DataFrame) – A pandas.DataFrame to load into the BigQuery table.
table_id (str) – ID of the table to create in BigQuery.
dataset_id (str, default "bqmlforecast") – ID of the dataset where the table will be created.
if_exists ({"fail", "replace", "append"}, default "replace") – Behavior when the table already exists.

Examples

>>> client = bigquery.Client()
>>> dataframe = pd.DataFrame({'column1': [1, 2], 'column2': ['a', 'b']})
>>> create_bigquery_table_from_pandas(client, dataframe, 'my_table')

iowa_forecast.utils.create_dataset_if_not_found(client: bigquery.Client, project_id: str | None = None, dataset_name: str = 'bqmlforecast', location: str = 'us')

Create a BigQuery dataset if it does not exist.

Parameters:

client (bigquery.Client) – BigQuery client used to connect to the service.
project_id (str, optional) – ID of the project where the dataset will be created. If no value is provided, the project ID is inferred from the project attribute of client.
dataset_name (str, default "bqmlforecast") – Name of the dataset to create.
location (str, default "us") – Location of the dataset.

Raises:

Exception – If any error occurs other than the one indicating that the dataset already exists.

Examples

>>> client = bigquery.Client()
>>> create_dataset_if_not_found(client, dataset_name='new_dataset')
Dataset 'new_dataset' already exists.

Notes

This function checks if the specified dataset exists in the given project. If it does not exist, the function creates the dataset.

iowa_forecast.utils.list_tables_with_pattern(client: bigquery.Client, dataset_id: str, table_pattern: str, project_id: str | None = None) → List[str]

List BigQuery tables matching a specific pattern.

Constructs a fully qualified dataset ID, retrieves the dataset, lists all tables, and filters them based on the provided pattern.

Parameters:

client (bigquery.Client) – The BigQuery client used to interact with the service.
dataset_id (str) – The ID of the dataset containing the tables to list.
table_pattern (str) – The pattern to match against the table IDs.
project_id (str, optional) – The ID of the project containing the dataset. If None, the client’s project is used.

Returns:

List[str] – A list of table IDs that match the specified pattern.

Return type:

List[str]

Notes

The fnmatch module is used to filter tables based on the pattern. Ensure that the pattern provided is compatible with fnmatch.

Examples

List all tables in a dataset that match the pattern ‘sales_*’:

>>> client = bigquery.Client()
>>> tables = list_tables_with_pattern(client, 'my_dataset', 'sales_*')
>>> print(tables)
['sales_2021', 'sales_2022']
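The filtering step can be tried without a BigQuery connection. The sketch below applies the standard-library fnmatch module to a hard-coded list of table IDs; the table names are made up for illustration, and the real function would obtain them from the dataset instead.

```python
import fnmatch

# Hypothetical table IDs standing in for the result of listing a dataset.
table_ids = ["sales_2021", "inventory", "sales_2022", "returns_sales"]

# fnmatch uses shell-style wildcards (*, ?, [seq]), not regular expressions.
matching = fnmatch.filter(table_ids, "sales_*")
print(matching)  # ['sales_2021', 'sales_2022']

# '?' matches exactly one character.
single_char = fnmatch.filter(table_ids, "sales_202?")
```

Note that 'returns_sales' is excluded: the pattern must match the whole table ID, so 'sales_*' only selects IDs that start with 'sales_'.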

iowa_forecast.utils.parse_combined_string(combined: str) → dict

Parse a combined offset string into its components.

Parameters: combined (str) – A combined string specifying the offset, e.g., '2Y3M2W1D'.
Returns: dict – A dictionary with keys 'years', 'months', 'weeks', 'days' and their corresponding values.
Raises: ValueError – If the combined string is invalid.
Return type: dict
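A combined string such as '2Y3M2W1D' can be parsed with a single regular expression. The sketch below is one possible approach, not the library's implementation. It assumes upper-case unit letters in the fixed order Y, M, W, D, and it assumes that omitted units default to 0; the name parse_combined_sketch is hypothetical.

```python
import re

# Each unit is optional, but units must appear in the order Y, M, W, D.
_COMBINED = re.compile(r"(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?")
_KEYS = ("years", "months", "weeks", "days")


def parse_combined_sketch(combined: str) -> dict:
    """Parse a string like '2Y3M2W1D' into a dict of integer components."""
    match = _COMBINED.fullmatch(combined)
    # The pattern also matches the empty string, so reject that explicitly.
    if not combined or match is None:
        raise ValueError(f"Invalid combined offset string: {combined!r}")
    return {
        key: int(value) if value else 0
        for key, value in zip(_KEYS, match.groups())
    }


print(parse_combined_sketch("2Y3M2W1D"))
# {'years': 2, 'months': 3, 'weeks': 2, 'days': 1}
```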

iowa_forecast.utils.create_date_offset_from_parts(years=0, months=0, weeks=0, days=0) → DateOffset

Create a pandas.DateOffset object from individual time components.

Parameters:

years (int, default 0) – Number of years for the offset.
months (int, default 0) – Number of months for the offset.
weeks (int, default 0) – Number of weeks for the offset.
days (int, default 0) – Number of days for the offset.

Returns:

pd.DateOffset – A pandas.DateOffset object for the specified time components.

Return type:

DateOffset

iowa_forecast.utils.date_offset(*args: int | str, freq: str = None) → pd.DateOffset

Generate a pandas.DateOffset based on the given frequency and value or a combined string.

Parameters:

args (int or str) –
- If one argument is provided, it should be a combined string specifying the offset, e.g., '2Y3M2W1D'.
- If two arguments are provided, they should be n (int) and freq (str).
freq (str {'days', 'weeks', 'months', 'years'}, optional) – The frequency type. Valid options are 'days', 'weeks', 'months', 'years'. Ignored if a combined string is provided.

Returns:

pd.DateOffset – A pandas.DateOffset object for the specified frequency and value.

Raises:

ValueError – If freq is not one of the valid options or if the combined string is invalid.

Return type:

pd.DateOffset
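The two calling conventions described above (a single combined string, or a value plus a frequency) can be illustrated with a dispatch sketch. To stay free of third-party dependencies, it returns the keyword arguments rather than a pandas.DateOffset; passing them on as pd.DateOffset(**kwargs) would be the natural last step. Everything here is an assumption based on the parameter descriptions: the name offset_kwargs_sketch is hypothetical, as is the choice to accept freq either positionally or as a keyword.

```python
import re

VALID_FREQS = ("years", "months", "weeks", "days")
# Assumed combined format: optional Y, M, W, D parts in that order.
_COMBINED = re.compile(r"(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?")


def offset_kwargs_sketch(*args, freq=None) -> dict:
    """Return keyword arguments describing the requested offset."""
    # Case 1: a single combined string such as '2Y3M2W1D'; freq is ignored.
    if len(args) == 1 and isinstance(args[0], str):
        match = _COMBINED.fullmatch(args[0])
        if not args[0] or match is None:
            raise ValueError(f"Invalid combined offset string: {args[0]!r}")
        return {
            name: int(value) if value else 0
            for name, value in zip(VALID_FREQS, match.groups())
        }
    # Case 2: a value n plus a frequency, given positionally or as a keyword.
    if len(args) == 2:
        n, freq = args
    elif len(args) == 1:
        n = args[0]
    else:
        raise ValueError("Expected one or two positional arguments.")
    if freq not in VALID_FREQS:
        raise ValueError(f"freq must be one of {VALID_FREQS}, got {freq!r}")
    return {freq: n}


print(offset_kwargs_sketch("1Y6M"))      # {'years': 1, 'months': 6, 'weeks': 0, 'days': 0}
print(offset_kwargs_sketch(3, "months")) # {'months': 3}
```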