utils
General utility functions.
- iowa_forecast.utils.normalize_item_name(item_name: str) -> str
Convert ‘item_name’ values to lower case and replace spaces with underscores.
- Parameters:
item_name (str) – Item name to normalize.
- Returns:
str – Normalized item name.
- Return type:
str
Examples
>>> normalize_item_name("TITOS HANDMADE VODKA")
'titos_handmade_vodka'
Notes
Used to generate names for the different ARIMA models that are created for each unique item name.
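The normalization described above can be sketched in a few lines. This is a minimal illustration based only on the documented behavior (lower-casing and replacing spaces with underscores), not the library's actual source. The `arima_` prefix used for the model name is a hypothetical naming scheme.

```python
def normalize_item_name(item_name: str) -> str:
    """Lower-case the name and replace spaces with underscores."""
    return item_name.lower().replace(" ", "_")


# Hypothetical usage: derive a per-item model name.
# The "arima_" prefix is an assumption made for illustration only.
model_name = f"arima_{normalize_item_name('TITOS HANDMADE VODKA')}"
print(model_name)
```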
- iowa_forecast.utils.split_table_name_info(table_name: str) -> Tuple[str | None, str | None, str]
Extract components from a table name.
- Parameters:
table_name (str) – Table name to extract components from.
- Returns:
Tuple[str | None, str | None, str] – A tuple containing the project ID, dataset ID, and table name. Any component that is not present in table_name is returned as None.
- Return type:
Tuple[str | None, str | None, str]
Examples
>>> split_table_name_info('my_project.my_dataset.my_table')
('my_project', 'my_dataset', 'my_table')
>>> split_table_name_info('my_dataset.my_table')
(None, 'my_dataset', 'my_table')
>>> split_table_name_info('my_table')
(None, None, 'my_table')
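One way to obtain the results shown in the examples is to split on dots and left-pad the missing components with None. The sketch below is an assumption about how this could be done, not the library's source. Raising ValueError for names with more than three components is also an assumption, since the documentation does not describe that case.

```python
from typing import Optional, Tuple


def split_table_name_info(
    table_name: str,
) -> Tuple[Optional[str], Optional[str], str]:
    """Split 'project.dataset.table' into its parts, padding with None."""
    parts = table_name.split(".")
    if len(parts) > 3:
        # Assumption: the documented behavior does not cover this case.
        raise ValueError(f"Too many components in table name: {table_name!r}")
    # Left-pad so the table name is always the last element.
    padded = [None] * (3 - len(parts)) + parts
    project_id, dataset_id, table = padded
    return project_id, dataset_id, table
```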
- iowa_forecast.utils.create_bigquery_table_from_pandas(client: Client, dataframe: DataFrame, table_id: str, dataset_id='bqmlforecast', if_exists: str = 'replace')
Create a BigQuery table from a pandas DataFrame.
- Parameters:
client (bigquery.Client) – BigQuery client used to connect to the service.
dataframe (pd.DataFrame) – A pandas.DataFrame to load into the BigQuery table.
table_id (str) – ID of the table to create in BigQuery.
dataset_id (str, default "bqmlforecast") – ID of the dataset where the table will be created.
if_exists ({"fail", "replace", "append"}, default "replace") – Behavior when the table already exists.
Examples
>>> client = bigquery.Client()
>>> dataframe = pd.DataFrame({'column1': [1, 2], 'column2': ['a', 'b']})
>>> create_bigquery_table_from_pandas(client, dataframe, 'my_table')
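The three if_exists options resemble the write dispositions that BigQuery load jobs accept (`WRITE_EMPTY`, `WRITE_TRUNCATE`, `WRITE_APPEND`). As a rough sketch, assuming the function translates each option into one of those dispositions (the actual implementation is not shown on this page and may differ), the mapping could look like the following. The helper name `write_disposition_for` is hypothetical and not part of iowa_forecast.

```python
# Hypothetical helper, shown for illustration only.
_WRITE_DISPOSITIONS = {
    "fail": "WRITE_EMPTY",        # load fails if the table already has data
    "replace": "WRITE_TRUNCATE",  # existing table contents are overwritten
    "append": "WRITE_APPEND",     # rows are added to the existing table
}


def write_disposition_for(if_exists: str) -> str:
    """Translate an if_exists option into a BigQuery write disposition name."""
    try:
        return _WRITE_DISPOSITIONS[if_exists]
    except KeyError:
        valid = ", ".join(sorted(_WRITE_DISPOSITIONS))
        raise ValueError(
            f"Invalid if_exists value {if_exists!r}; expected one of: {valid}"
        ) from None
```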
- iowa_forecast.utils.create_dataset_if_not_found(client: bigquery.Client, project_id: str | None = None, dataset_name: str = 'bqmlforecast', location: str = 'us')
Create a BigQuery dataset if it does not exist.
- Parameters:
client (bigquery.Client) – BigQuery client used to connect to the service.
project_id (str, optional) – ID of the project where the dataset will be created. If no value is provided, the project ID is inferred from the project attribute of client.
dataset_name (str, default "bqmlforecast") – Name of the dataset to create.
location (str, default "us") – Location of the dataset.
- Raises:
Exception – If any error occurs other than the one indicating that the dataset already exists.
Examples
>>> client = bigquery.Client()
>>> create_dataset_if_not_found(client, dataset_name='new_dataset')
Dataset 'new_dataset' already exists.
Notes
This function checks if the specified dataset exists in the given project. If it does not exist, the function creates the dataset.
- iowa_forecast.utils.list_tables_with_pattern(client: bigquery.Client, dataset_id: str, table_pattern: str, project_id: str | None = None) -> List[str]
List BigQuery tables matching a specific pattern.
Constructs a fully qualified dataset ID, retrieves the dataset, lists all tables, and filters them based on the provided pattern.
- Parameters:
client (bigquery.Client) – The BigQuery client used to interact with the service.
dataset_id (str) – The ID of the dataset containing the tables to list.
table_pattern (str) – The pattern to match against the table IDs.
project_id (str, optional) – The ID of the project containing the dataset. If None, the client’s project is used.
- Returns:
List[str] – A list of table IDs that match the specified pattern.
- Return type:
List[str]
Notes
The fnmatch module is used to filter tables based on the pattern. Ensure that the pattern provided is compatible with fnmatch.
Examples
List all tables in a dataset that match the pattern ‘sales_*’:
>>> client = bigquery.Client()
>>> tables = list_tables_with_pattern(client, 'my_dataset', 'sales_*')
>>> print(tables)
['sales_2021', 'sales_2022']
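To see which patterns are compatible with fnmatch without connecting to BigQuery, the filtering step can be tried on a plain list of table IDs. The table names below are made up for illustration. Only the standard-library `fnmatch.filter` call is used, which supports shell-style wildcards such as `*`, `?`, and `[seq]`.

```python
import fnmatch

# Made-up table IDs standing in for the tables of a dataset.
table_ids = ["sales_2021", "sales_2022", "inventory_2021", "forecast_sales"]

# '*' matches any run of characters, so only IDs starting with 'sales_' remain.
sales_tables = fnmatch.filter(table_ids, "sales_*")

# '?' matches exactly one character.
tables_2021 = fnmatch.filter(table_ids, "*_202?")

print(sales_tables)
print(tables_2021)
```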
- iowa_forecast.utils.parse_combined_string(combined: str) -> dict
Parse a combined offset string into its components.
- Parameters:
combined (str) – A combined string specifying the offset, e.g., '2Y3M2W1D'.
- Returns:
dict – A dictionary with keys 'years', 'months', 'weeks', 'days' and their corresponding values.
- Raises:
ValueError – If the combined string is invalid.
- Return type:
dict
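A combined string such as '2Y3M2W1D' can be parsed with a single regular expression. The sketch below is one possible approach, not the library's implementation. It assumes that components appear in the order Y, M, W, D, that unit letters are upper case, that missing components default to 0, and that an empty string is invalid. None of these details are confirmed by the documentation above.

```python
import re
from typing import Dict

# Each component is optional, but the order Y, M, W, D is assumed to be fixed.
_COMBINED_PATTERN = re.compile(r"(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?")


def parse_combined_string(combined: str) -> Dict[str, int]:
    """Parse a string like '2Y3M2W1D' into its numeric components."""
    match = _COMBINED_PATTERN.fullmatch(combined)
    # The pattern also matches an empty string, so reject that explicitly.
    if not combined or match is None:
        raise ValueError(f"Invalid combined offset string: {combined!r}")
    years, months, weeks, days = (int(g) if g else 0 for g in match.groups())
    return {"years": years, "months": months, "weeks": weeks, "days": days}
```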
- iowa_forecast.utils.create_date_offset_from_parts(years=0, months=0, weeks=0, days=0) -> DateOffset
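Create a pandas.DateOffset object from individual time components.
- Parameters:
years (default 0) – Number of years in the offset.
months (default 0) – Number of months in the offset.
weeks (default 0) – Number of weeks in the offset.
days (default 0) – Number of days in the offset.
- Returns:
pd.DateOffset – A pandas.DateOffset object for the specified time components.
- Return type:
DateOffset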
- iowa_forecast.utils.date_offset(*args: int | str, freq: str = None) -> pd.DateOffset
Generate a pandas.DateOffset based on the given frequency and value or a combined string.
- Parameters:
*args (int | str) – If one argument is provided, it should be a combined string specifying the offset, e.g., '2Y3M2W1D'. If two arguments are provided, they should be n (int) and freq (str).
freq (str {'days', 'weeks', 'months', 'years'}, optional) – The frequency type. Valid options are 'days', 'weeks', 'months', 'years'. Ignored if a combined string is provided.
- Returns:
pd.DateOffset – A pandas.DateOffset object for the specified frequency and value.
- Raises:
ValueError – If freq is not one of the valid options or if the combined string is invalid.
- Return type:
pd.DateOffset
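The two-argument form (a value n plus a frequency) can be illustrated with a small stand-in built directly on pandas.DateOffset. This is a sketch under the assumption that freq is passed straight through as a DateOffset keyword. The function name `offset_from_n_and_freq` is hypothetical and is not the library's date_offset.

```python
import pandas as pd

VALID_FREQS = {"days", "weeks", "months", "years"}


def offset_from_n_and_freq(n: int, freq: str) -> pd.DateOffset:
    """Hypothetical stand-in for the (n, freq) form of date_offset."""
    if freq not in VALID_FREQS:
        raise ValueError(
            f"Invalid freq {freq!r}; expected one of {sorted(VALID_FREQS)}"
        )
    # pandas.DateOffset accepts 'days', 'weeks', 'months', 'years' as keywords.
    return pd.DateOffset(**{freq: n})


# Usage: shift a timestamp forward by two weeks.
start = pd.Timestamp("2024-01-15")
shifted = start + offset_from_n_and_freq(2, "weeks")
print(shifted)
```

Adding the resulting offset to a timestamp applies calendar-aware arithmetic, which is why a DateOffset is used here rather than a fixed number of days.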