temporary#
Module defines objects that are meant to be temporary.
- class wip.temporary.FakeScaler(feature_range=(0, 1))[source]#
Bases: BaseEstimator, TransformerMixin
A fake scaler class that contains the attributes used by scikit-learn scalers.
This class simulates the behavior of a feature scaler but does not implement any scaling.
- Parameters
feature_range (tuple of (min, max), default (0, 1)) – The desired range of transformed data. This parameter is not used in actual transformations but is kept for interface compatibility.
- Variables
n_features_in (int) – The number of features observed during fit.
n_samples_seen (int) – The number of samples observed during fit.
min (ndarray of shape (n_features_in_,)) – The minimum value in each feature in the fitted data.
max (ndarray of shape (n_features_in_,)) – The maximum value in each feature in the fitted data.
data_range (ndarray of shape (n_features_in_,)) – The data range (max - min) for each feature in the fitted data.
data_min (ndarray of shape (n_features_in_,)) – The minimum value in each feature in the fitted data.
data_max (ndarray of shape (n_features_in_,)) – The maximum value in each feature in the fitted data.
scale (ndarray of shape (n_features_in_,)) – The scaling factors applied to each feature. Set to ones.
mean (ndarray of shape (n_features_in_,)) – The mean value for each feature in the fitted data.
center (ndarray of shape (n_features_in_,)) – The centering value for each feature.
Examples
>>> import numpy as np
>>> X = np.array([[1, 2], [3, 4]])
>>> scaler = FakeScaler()
>>> scaler.fit(X)
FakeScaler()
>>> scaler.transform(X)
array([[1, 2],
       [3, 4]])
Methods
fit(X[, y]) – Compute the minimum, maximum, mean, and range for each feature in X.
fit_transform(X[, y]) – Fit to data, then transform it.
get_params([deep]) – Get parameters for this estimator.
inverse_transform(X) – Return the input data unchanged.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Return the input data unchanged.
- fit(X, y=None)[source]#
Compute the minimum, maximum, mean, and range for each feature in X.
Assumes that X is a numpy array, pandas.Series, or pandas.DataFrame. This method calculates basic statistics for each feature but does not scale the data.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – This parameter is not used in this method.
- Returns
self – Returns self.
- Return type
- Raises
ValueError – If the input array X does not meet the expected criteria.
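As a rough, hedged sketch of the FakeScaler behavior documented above, a minimal no-op scaler could look like the code below; the trailing-underscore attribute names and the zero-valued center are assumptions, not the module's actual code.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MinimalFakeScaler(BaseEstimator, TransformerMixin):
    """Illustrative no-op scaler with MinMaxScaler-like attributes."""

    def __init__(self, feature_range=(0, 1)):
        self.feature_range = feature_range  # kept only for interface compatibility

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        self.n_samples_seen_, self.n_features_in_ = X.shape
        self.data_min_ = self.min_ = X.min(axis=0)
        self.data_max_ = self.max_ = X.max(axis=0)
        self.data_range_ = self.data_max_ - self.data_min_
        self.mean_ = X.mean(axis=0)
        self.scale_ = np.ones(self.n_features_in_)    # scaling factors set to ones
        self.center_ = np.zeros(self.n_features_in_)  # assumed: no centering applied
        return self

    def transform(self, X):
        return X  # the input data is returned unchanged

    def inverse_transform(self, X):
        return X  # the input data is returned unchanged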
- fit_transform(X, y=None, **fit_params)#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new – Transformed array.
- Return type
ndarray array of shape (n_samples, n_features_new)
- wip.temporary.add_constant_tags_summary(dataframe: DataFrame)[source]#
Add a column to dataframe to indicate whether the tag has constant values.
- Parameters
dataframe (pd.DataFrame) – A pandas.DataFrame containing the tags to be checked.
- Returns
A pandas.DataFrame containing the tags and a column indicating whether the tag has constant values.
- Return type
pd.DataFrame
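A hedged sketch of how such a constant-value flag could be computed with pandas; the summary layout and the column names "Tag" and "is_constant" are illustrative assumptions rather than the module's actual output.
import pandas as pd

def constant_tags_summary(dataframe: pd.DataFrame) -> pd.DataFrame:
    # A tag is flagged as constant when it has at most one distinct non-null value.
    return pd.DataFrame({
        "Tag": dataframe.columns,
        "is_constant": [dataframe[col].nunique(dropna=True) <= 1 for col in dataframe.columns],
    })

print(constant_tags_summary(pd.DataFrame({"tag_a": [1, 1, 1], "tag_b": [1, 2, 3]})))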
- wip.temporary.adjust_column_widths(worksheet: Worksheet)[source]#
Adjust the widths of columns in a worksheet.
Column widths are adjusted based on the longest value in each column.
- Parameters
worksheet (openpyxl.worksheet.worksheet.Worksheet) – The Worksheet instance where the data is being stored.
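A hedged sketch of one common openpyxl approach, sizing each column by its longest stringified value; the 2-character padding is an arbitrary choice.
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

def autosize_columns(worksheet):
    # Width of each column = length of its longest non-empty value, plus padding.
    for idx, column_cells in enumerate(worksheet.columns, start=1):
        longest = max(
            (len(str(cell.value)) for cell in column_cells if cell.value is not None),
            default=0,
        )
        worksheet.column_dimensions[get_column_letter(idx)].width = longest + 2

wb = Workbook()
ws = wb.active
ws.append(["Tag", "Value"])
ws.append(["TEMP1_I@08QU-QU-855I-GQ04", 1355])
autosize_columns(ws)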
- wip.temporary.adjust_models_coefficients(models_results: dict, scalers: dict) dict[source]#
Adjust the coefficients of the Ridge models using the corresponding scalers.
- wip.temporary.apply_style_cells(worksheet) Tuple[int, int][source]#
Apply “bad” style to cells that have all values equal to zero.
- Parameters
worksheet (openpyxl.worksheet.worksheet.Worksheet) – The Worksheet instance where the data is being stored.
- Returns
A tuple containing the number of results that have all values equal to zero and the number of results that have at least one value equal to zero.
- Return type
Tuple[int,int]
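A hedged sketch of how zero-valued rows could be counted and styled with openpyxl's built-in "Bad" named style; the assumptions that the first column holds labels and that data starts on row 2 are illustrative, not taken from the module.
from openpyxl import Workbook

def mark_zero_rows(worksheet):
    all_zero = any_zero = 0
    for row in worksheet.iter_rows(min_row=2):     # skip the header row
        values = [cell.value for cell in row[1:]]  # assume the first column holds labels
        zeros = [v == 0 for v in values]
        if values and all(zeros):
            all_zero += 1
            for cell in row:
                cell.style = "Bad"  # built-in openpyxl named style
        if any(zeros):
            any_zero += 1
    return all_zero, any_zero

wb = Workbook()
ws = wb.active
ws.append(["Tag", "700-750", "750-800"])
ws.append(["tag_a", 0, 0])
ws.append(["tag_b", 0, 3])
print(mark_zero_rows(ws))  # (1, 2)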
- wip.temporary.bounds_linking(problem: LpProblem, variables: List[str], bounds_mapping: List[tuple]) LpProblem[source]#
- wip.temporary.can_define_inter_problem_constraint(pulp_solver, current_faixa) bool[source]#
Perform check to determine if inter-problem constraint can be added to the optimization model.
This function checks whether constraints that reference other production-range optimization problems can be added to a given production range's optimization problem.
- Parameters
pulp_solver (PulpSolver) – The PulpSolver class instance that contains the optimization problems for each production range.
current_faixa (str) – The current production range. Values should be represented as strings containing two numeric values separated by "-". For example, "700-750", "750-800", "800-850", etc.
- Returns
True if the constraint can be added to the optimization model and False otherwise.
- Return type
bool
- wip.temporary.compute_statistics(series_obj: Series, target_names: List[str]) dict[source]#
Compute statistical measures for a given pandas Series.
This function calculates various statistical measures for the provided pandas Series. It checks if the series index is of a datetime type and computes averages for different periods, standard deviation, mode, median, minimum, maximum, and percentiles. It also determines if the series name is in the target names list.
- Parameters
series_obj (pd.Series) – The pandas.Series object for which statistics are to be calculated. The index must be of datetime type.
target_names (List[str]) – List of target names to check if the series name is a target.
- Returns
A dictionary containing computed statistical values. Keys are statistical measures and values are their corresponding computed values. If certain conditions are not met, some values may be None.
- Return type
dict
- Raises
ValueError – If the index of series_obj is not of a datetime type.
Examples
>>> series = pd.Series([1, 2, 3], index=pd.date_range('20200101', periods=3))
>>> compute_statistics(series, ['target'])
{'Average last 7 days': None, 'Average last 14 days': None, ...}
- wip.temporary.compute_statistics_datasets(datasets: Dict[str, DataFrame]) DataFrame[source]#
Compute statistics for each column across multiple pandas DataFrames.
Aggregates columns from multiple DataFrames provided in a dictionary and computes statistical measures for each unique column. The function iterates over all DataFrames, concatenates column values across them, and calculates statistics using
compute_statistics. It handles columns with the same name across different DataFrames and computes overall statistics.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary where keys are dataset names (or identifiers) and values are pandas DataFrames. The function expects the last column of each DataFrame to be the target column.
- Returns
A DataFrame where each row corresponds to a unique column from the input DataFrames and contains the computed statistics for that column.
- Return type
pd.DataFrame
See also
compute_statistics – Used to compute statistics for individual columns.
Examples
>>> _datasets = {
...     "dataset1": pd.DataFrame(...),
...     "dataset2": pd.DataFrame(...)
... }
>>> compute_statistics_datasets(_datasets)  # DataFrame with computed statistics for each column across the datasets.
Notes
This function is specifically designed to work with DataFrames where the last column is considered as the target. The statistics are computed for each column, considering their presence across multiple DataFrames.
- wip.temporary.create_perfil_temperature_sheet(ranges_dataframe: DataFrame, wb, sheet_name: str = 'Perfil Grupos de Queima')[source]#
Adds a sheet to a given workbook and populates it with filtered temperature data and a line chart.
This function filters a DataFrame for specific tags, copies certain columns into a new sheet in the given Excel workbook, and creates a line chart visualizing temperature ranges. It ensures the sheet exists (creating it if necessary), adds filtered data, and then constructs and inserts a line chart based on the data.
- Parameters
ranges_dataframe (pd.DataFrame) – The DataFrame containing temperature range data to be filtered and copied.
wb (openpyxl.workbook.workbook.Workbook) – The workbook where the sheet will be added and populated with data and a chart.
sheet_name (str, optional) – The name of the sheet to be added or used for inserting data. Default is ‘Perfil Grupos de Queima’.
- Raises
ValueError – If ranges_dataframe does not contain the necessary columns for filtering or chart generation.
See also
openpyxl.workbook.workbook.Workbook – The Workbook class from openpyxl used to manipulate Excel files.
pandas.DataFrame – The DataFrame class from pandas used for data manipulation and analysis.
Notes
It’s important that the ranges_dataframe contains columns for ‘Tag’ and the temperature ranges, as these are crucial for the filtering and chart generation processes. The function dynamically adjusts the y-axis scale of the chart based on the minimum temperature in the data.
References
OpenPyXL documentation : https://openpyxl.readthedocs.io/
Pandas documentation : https://pandas.pydata.org/pandas-docs/stable/
Examples
>>> import pandas as pd
>>> from openpyxl import Workbook
>>> df = pd.DataFrame({
...     'Tag': ['TEMP1_I@08QU-QU-855I-GQ04', 'other_tag'],
...     '700-750': [1, 2],
...     '750-800': [3, 4],
...     '800-850': [5, 6],
...     '850-900': [7, 8],
...     '900-950': [9, 10],
...     '950-1000': [11, 12]
... })
>>> wb = Workbook()
>>> create_perfil_temperature_sheet(df, wb)
>>> 'Perfil Grupos de Queima' in wb.sheetnames
True
- wip.temporary.create_results_ranges(res, labels, datasets)[source]#
Create a DataFrame with the pivoted results of the optimization problem.
- Parameters
res (pd.DataFrame) – A pandas.DataFrame containing the results of the optimization problem.
labels (pd.DataFrame) – A pandas.DataFrame containing the labels for the optimization problem.
datasets (Dict[str, pd.DataFrame] | None) – A dictionary with pandas.DataFrame objects for each model used to create the optimization model. This dictionary is used to add additional statistics for each tag. If None, no statistics are added to the results.
- Returns
A pandas.DataFrame containing the pivoted results of the optimization problem.
- Return type
pd.DataFrame
- wip.temporary.dataframe_to_worksheet(dataframe, worksheet, index=True, header=True)[source]#
Convert a pandas.DataFrame to an OpenPyXL worksheet.
- Parameters
dataframe (pd.DataFrame) – A pandas.DataFrame containing the data to be converted to an OpenPyXL worksheet.
worksheet (openpyxl.worksheet.worksheet.Worksheet) – The Worksheet instance where the data is being stored.
index (bool, default True) – Whether to display the index in the formatted Excel file.
header (bool, default True) – Whether to display the header in the formatted Excel file.
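A hedged sketch using openpyxl's dataframe_to_rows helper; the real function may apply additional formatting on top of this.
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

def write_dataframe(dataframe, worksheet, index=True, header=True):
    # Stream the DataFrame into the worksheet row by row.
    for row in dataframe_to_rows(dataframe, index=index, header=header):
        worksheet.append(row)

wb = Workbook()
write_dataframe(pd.DataFrame({"a": [1, 2], "b": [3, 4]}), wb.active)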
- wip.temporary.date_select(series_obj: Series, n_days: int = 30) Series[source]#
Select the last n_days days from a pandas.Series.
- Parameters
series_obj (pd.Series) – The pandas.Series object to select the last n_days days from. This series must have datetime values as its index.
n_days (int, default 30) – The number of days to select based on the last existing date.
- Returns
The pandas.Series, with only the last n_days days.
- Return type
pd.Series
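A hedged doctest-style illustration of the selection logic, assuming a daily DatetimeIndex:
>>> import pandas as pd
>>> s = pd.Series(range(100), index=pd.date_range("2024-01-01", periods=100, freq="D"))
>>> cutoff = s.index.max() - pd.Timedelta(days=30)
>>> s[s.index >= cutoff].shape[0]
31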
- wip.temporary.drop_model_coefficients(model_coefficients: Dict[str, Dict[str, float | int]], coefficients_to_drop: List[str | Tuple[str, str]] | None = None) Dict[str, Dict[str, float | int]][source]#
Remove specified coefficients from the given model coefficients.
This function provides a way to delete specific coefficients from the model coefficients. If no specific coefficients are provided to remove, it will fall back to a set of temporary coefficients stored in TEMPORARY_MODEL_COEFFICIENTS_TO_REMOVE.
- Parameters
model_coefficients (Dict[str, Dict[str, float | int]]) – The model coefficients to change. Keys are model names and values are dictionaries where keys are coefficient names and values are their respective values.
coefficients_to_drop (List[str | Tuple[str, str]] | None, optional) – The coefficients to drop. Can either be a list of model names as strings or a list of tuples where the first element is the model name and the second is the coefficient name. If not provided, coefficients from TEMPORARY_MODEL_COEFFICIENTS_TO_REMOVE are removed.
- Returns
The modified model coefficients dictionary after dropping the specified coefficients.
- Return type
Dict[str,Dict[str,float | int]]
Examples
Assume we have the following model coefficients:
>>> _model_coefficients = {'model1': {'coeff1': 1.2, 'coeff2': 0.5},
...                        'model2': {'coeff1': 0.8, 'coeff2': 1.5}}
>>> _coefficients_to_drop = [('model1', 'coeff1'), ('model2', 'coeff1')]
>>> drop_model_coefficients(_model_coefficients, _coefficients_to_drop)
{'model1': {'coeff2': 0.5}, 'model2': {'coeff2': 1.5}}
- wip.temporary.drop_models_results(models_results: Dict[str, list | dict], models_to_drop: List[str] | None = None) Dict[str, list | dict][source]#
Remove specified models from the models_results dictionary.
The function filters out specified models from the dictionary containing model results. If no models are specified, the function defaults to removing models from the global variable TEMPORARY_MODELS_TO_REMOVE.
- Parameters
models_results (Dict[str, list | dict]) – Dictionary where keys are model names (str) and values are lists of model results.
models_to_drop (List[str] | None, optional) – List of model names to be dropped from models_results. If not provided, the function defaults to using TEMPORARY_MODELS_TO_REMOVE.
- Returns
Filtered dictionary of model results with specified models removed.
- Return type
Dict[str,list | dict]
Examples
>>> _models_results = {"model1": [1, 2, 3], "model2": [4, 5, 6],
...                    "model3": [7, 8, 9]}
>>> drop_models_results(_models_results, ["model1", "model3"])
{"model2": [4, 5, 6]}
- wip.temporary.drop_scalers(scalers: Dict[str, MinMaxScaler], scalers_to_drop: List[str] | None = None) Dict[str, MinMaxScaler][source]#
Exclude scalers from the provided dictionary of scalers.
This function allows for dropping certain scalers based on their names. The names of the scalers to be dropped are provided in scalers_to_drop. If scalers_to_drop isn’t specified, a pre-defined list SCALERS_TO_REMOVE is used.
- Parameters
scalers (Dict[str, MinMaxScaler]) – A dictionary of scaler objects (values) identified by their names (keys).
scalers_to_drop (List[str] | None, optional) – List of scaler names to drop from the scalers dictionary. If not provided, defaults to a pre-defined list named TEMPORARY_SCALERS_TO_REMOVE.
- Returns
The modified dictionary of scalers, with specified scalers removed.
- Return type
Dict[str,MinMaxScaler]
Examples
Assuming we have the following scalers dictionary and SCALERS_TO_REMOVE list:
scalers = {"scaler1": MinMaxScaler1, "scaler2": MinMaxScaler2, "scaler3": MinMaxScaler3}
SCALERS_TO_REMOVE = ["scaler1", "scaler3"]
Calling the function as:
>>> new_scalers = drop_scalers(scalers)
>>> new_scalers
{"scaler2": MinMaxScaler2}
If we specify the scalers_to_drop argument:
>>> new_scalers = drop_scalers(scalers, ["scaler2"])
>>> new_scalers
{"scaler1": MinMaxScaler1, "scaler3": MinMaxScaler3}
- wip.temporary.energy_cons_vents_faixas(pulp_solver, current_faixa: str, df_sql: DataFrame)[source]#
Add a constraint that forces each fan energy consumption variable to be greater than, less than, or equal to the value from the previous production range, depending on the historical data tendencies between ranges.
For example, for the optimization problem of the production range "800-850", if the average historical values of the tag "CONS1_Y@08QU-PF-852I-01M1" increase relative to the average historical values of the production range "750-800", then this function will create a constraint that forces the "CONS1_Y@08QU-PF-852I-01M1" variable to be equal to or greater than 101% of the value obtained during the optimization of the previous production range.
pulp_solver (PulpSolver) – The PulpSolver class instance that contains the optimization problems for each production range.
current_faixa (str) – The current production range. Values should be represented as strings containing two numeric values separated by "-". For example, "700-750", "750-800", "800-850", etc.
df_sql (pd.DataFrame) – The pandas DataFrame with all tags represented as columns. This function expects that the dataframe contains the tag values obtained after performing all data transformations and cleaning operations. This parameter is used to determine whether the fan energy consumption values of two adjacent production ranges increase, decrease, or stay the same.
- Returns
The PulpSolver class instance with the added constraints.
- Return type
PulpSolver
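A hedged PuLP sketch of the kind of linking constraint described above; the variable name, the hard-coded trend flag, and the 1.01 factor (from the 101% example) are illustrative assumptions, not the module's actual code.
import pulp

# Value obtained for the fan consumption tag when solving the previous range ("750-800").
previous_value = 120.0

prob = pulp.LpProblem("faixa_800_850", pulp.LpMinimize)
cons_var = pulp.LpVariable("CONS1_Y_08QU_PF_852I_01M1", lowBound=0)

# If the historical average increased between the two ranges, force the current
# variable to be at least 101% of the previous range's optimized value.
historical_trend_increasing = True  # would normally be derived from df_sql
if historical_trend_increasing:
    prob += cons_var >= 1.01 * previous_value, "link_with_previous_range"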
- wip.temporary.energy_cons_vents_slopes(df_sql: DataFrame) DataFrame[source]#
- Parameters
df_sql (DataFrame) –
- Return type
DataFrame
- wip.temporary.filter_corpo_moedor_especifico(datasets: Dict[str, DataFrame]) Dict[str, DataFrame][source]#
Filter DataFrames in the datasets dictionary on a specific condition.
This function filters each DataFrame in the provided datasets dictionary based on the “corpo_moedor_especifico” tag with an upper bound of 10. It uses the
filter_tag function to perform the filtering.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary containing string keys and pandas.DataFrame as values. Each DataFrame should contain a column labeled “corpo_moedor_especifico”.
- Returns
A dictionary containing the filtered DataFrames, retaining the same keys as the input dictionary.
- Return type
Dict[str,pd.DataFrame]
See also
filter_tag – Function used for performing the filtering based on the provided tag and boundary.
Examples
Given a dictionary of DataFrames datasets where each DataFrame contains a column “corpo_moedor_especifico”:
>>> import pandas as pd
>>> datasets = {"df1": pd.DataFrame({"corpo_moedor_especifico": [5, 15, 7]})}
>>> filtered_datasets = filter_corpo_moedor_especifico(datasets)
>>> print(filtered_datasets["df1"])
   corpo_moedor_especifico
0                        5
2                        7
- wip.temporary.filter_datasets_df_sql_by_date(datasets: Dict[str, DataFrame], df_sql: DataFrame, n_days: int = 60) Tuple[Dict[str, DataFrame], DataFrame][source]#
Filter the datasets and df_sql DataFrame by date.
This function filters each DataFrame in the provided datasets dictionary and the df_sql DataFrame based on a lower-bound date. The lower-bound date is calculated as the current date minus a specified number of days.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary containing string keys and pandas.DataFrame as values. Each DataFrame should contain a DateTime index.
df_sql (pd.DataFrame) – A DataFrame with a DateTime index to be filtered.
n_days (int, default 60) – The number of days to subtract from the current date to get the lower-bound date.
- Returns
A tuple containing the filtered datasets dictionary and the filtered df_sql DataFrame.
- Return type
Tuple[Dict[str,pd.DataFrame],pd.DataFrame]
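A hedged sketch of the date filtering, assuming every DataFrame carries a DatetimeIndex:
import pandas as pd

def filter_by_recent_days(datasets, df_sql, n_days=60):
    # Keep only rows newer than (now - n_days) in every DataFrame.
    cutoff = pd.Timestamp.now() - pd.Timedelta(days=n_days)
    filtered = {name: df[df.index >= cutoff] for name, df in datasets.items()}
    return filtered, df_sql[df_sql.index >= cutoff]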
- wip.temporary.fix_vent_control_tags_bounds(prob: LpProblem, datasets, faixa, lb_quantile=0.1, ub_quantile=0.9)[source]#
- Parameters
prob (LpProblem) –
- wip.temporary.format_results(header: bool = True, index: bool = False)[source]#
Format the results of the optimization problem and save them to an Excel file.
This function will pivot the results, so that the values for all production ranges are set column-wise.
Then this function formats the results and saves them to a new Excel file, located in the same directory as the original results file.
- wip.temporary.inverse_transform_models_features(models_features: Dict[str, Series], scalers: dict) Dict[str, Series][source]#
Apply the inverse transformation of the models_features dictionary values.
If a scaler is not found for a particular series, it logs an error and uses the pre-existing values for that series.
- Parameters
models_features (Dict[str, pd.Series]) – A dictionary where keys are tag names and values are pandas.Series.
scalers (Dict[str, SkLearnScalers]) – A dictionary where keys are tag names and values are instances of Scikit-learn scalers.
- Returns
A dictionary of tag names and pandas.Series values representing these tags’ original values.
- Return type
Dict[str,pd.Series]
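A hedged sketch of the per-tag inverse transformation with the fallback described above; the exact error handling in the module may differ.
import logging
import pandas as pd

def inverse_transform_features(models_features, scalers):
    restored = {}
    for tag, series in models_features.items():
        scaler = scalers.get(tag)
        if scaler is None:
            logging.error("No scaler found for %s; keeping pre-existing values.", tag)
            restored[tag] = series
            continue
        # Scalers expect 2-D input: reshape, inverse-transform, then rebuild the Series.
        values = scaler.inverse_transform(series.to_numpy().reshape(-1, 1)).ravel()
        restored[tag] = pd.Series(values, index=series.index, name=series.name)
    return restored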
- wip.temporary.pivot_optimization_results(res)[source]#
Pivot the results of the optimization problem.
- Parameters
res (pd.DataFrame) – A pandas.DataFrame containing the results of the optimization problem.
- Returns
A pandas.DataFrame containing the pivoted results of the optimization problem.
- Return type
pd.DataFrame
- wip.temporary.process_labels(labels: DataFrame) DataFrame[source]#
Process the labels to add to the optimization problem.
- Parameters
labels (pd.DataFrame) – A pandas.DataFrame containing the labels for the optimization problem.
- Returns
A pandas.DataFrame containing the processed labels for the optimization problem.
- Return type
pd.DataFrame
- wip.temporary.replace_ventiladores_tags(datasets: Dict[str, pd.DataFrame], df_sql: pd.DataFrame, models: List[str] | None = None, old_ventiladores_tags: List[str] | None = None, new_ventiladores_tags: List[str] | None = None)[source]#
Replace old ventiladores tags with new ones in specified dataframes.
This function updates a collection of pandas DataFrames by removing specified old ventiladores tags if they are not target columns, and merging the DataFrames with a new set of ventiladores tags from another DataFrame. It ensures that the structure of the DataFrames remains consistent, especially with regard to the target column.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary of DataFrames keyed by model name.
df_sql (pd.DataFrame) – A pandas.DataFrame containing new ventiladores tags to be merged.
models (List[str] | None, optional) – The list of model names corresponding to keys in datasets to be updated. Defaults to a predefined list if None.
old_ventiladores_tags (List[str] | None, optional) – The list of old ventiladores tags to be removed. Defaults to a predefined list if None.
new_ventiladores_tags (List[str] | None, optional) – The list of new ventiladores tags to be merged. Defaults to a predefined list if None.
- Returns
Updated dictionary of DataFrames with old tags replaced by new tags.
- Return type
Dict[str, pd.DataFrame]
- Raises
ValueError – If an old ventiladores tag is also a target column in any DataFrame, or if the target column changes after the operation.
Notes
It is crucial that the target column remains unchanged after this operation. The last column from each DataFrame is assumed to be the model’s target. Therefore, after running this function, the last column of each DataFrame should still contain the target values.
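A hedged outline of the per-DataFrame replacement logic described above; the inner join on the index and the explicit guards are assumptions about how the real function behaves.
import pandas as pd

def swap_tags(df, df_sql, old_tags, new_tags):
    target = df.columns[-1]  # the last column is assumed to be the model target
    if target in old_tags:
        raise ValueError(f"Old tag {target!r} is the target column and cannot be removed.")
    # Drop the old tags, merge the new ones on the index, and keep the target last.
    trimmed = df.drop(columns=[c for c in old_tags if c in df.columns])
    merged = trimmed.join(df_sql[new_tags], how="inner")
    reordered = merged[[c for c in merged.columns if c != target] + [target]]
    if reordered.columns[-1] != target:  # guard mirroring the documented requirement
        raise ValueError("Target column changed after replacing tags.")
    return reordered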
- wip.temporary.round_value(value: Any, decimals: int = 2) Any[source]#
Round numeric values that have more than 3 decimal places.
- Parameters
value (Any) – Value to be rounded, if it is a float.
decimals (int, default 2) – Number of decimal places to round to.
- Returns
Rounded value, if it is a float. Otherwise, the original value.
- Return type
Any
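A hedged doctest-style sketch of the type-guarded rounding:
>>> def round_if_float(value, decimals=2):
...     # Only floats are rounded; all other values pass through unchanged.
...     return round(value, decimals) if isinstance(value, float) else value
>>> round_if_float(3.14159)
3.14
>>> round_if_float("texto")
'texto'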
- wip.temporary.save_formatted_dataframe(dataframe: pd.DataFrame, path: str | Path, header: bool = True, index: bool = True)[source]#
Format the results of the optimization problem and save them to an Excel file.
- Parameters
dataframe (pd.DataFrame) – A pandas.DataFrame containing the results of the optimization problem.
path (str | Path) – A path to save the formatted results to.
header (bool, default True) – Whether to display the header in the formatted Excel file.
index (bool, default True) – Whether to display the index in the formatted Excel file.
- wip.temporary.temp_production_ranges_ascending(pulp_solver, current_faixa: str)[source]#
Constrain the current production range optimization problem using the previous range's values.
This function performs the following steps:
1. The function checks whether current_faixa is the first production range in the pulp_solver.probs dictionary. If it is, the function returns the pulp_solver object unchanged.
2. The function retrieves the dictionary of scalers from the pulp_solver object.
3. The function calculates the previous production range by subtracting 50 from the lower bound of the current production range.
4. The function retrieves the current and previous problem objects from the pulp_solver.probs dictionary.
5. If the previous problem object is not optimal, the function logs an error message and returns the pulp_solver object unchanged.
6. The function retrieves the LP variables for the current and previous production ranges.
7. The function creates a list of "TEMP1_I@08QU-QU-855I-GQXX" tag names. Then it retrieves the LP variables for the current and previous production range optimization problems, denormalizes them using the denormalize_lpvar function, and adds a constraint to the current optimization problem instance.
8. The function updates the pulp_solver.probs dictionary with the updated optimization problem instance.
9. The function returns the updated pulp_solver object.
- Parameters
pulp_solver (PulpSolver) – The PulpSolver class instance that contains the optimization problems for each production range.
current_faixa (str) – The current production range. Values should be represented as strings containing two numeric values separated by "-". For example, "700-750", "750-800", "800-850", etc.
- Returns
The PulpSolver class instance with the added constraints.
- Return type
PulpSolver
Notes
To better explain the purpose of this function, consider the following example:
Suppose we’re solving 5 optimization problems, for the following production ranges: '750-800', '800-850', '850-900', '900-950', and '950-1000'.
After defining and solving the first production range problem, the model returned a value for the tag 'TEMP1_I@08QU-QU-855I-GQ09' equal to 1355. In this scenario, this function will force the value of 'TEMP1_I@08QU-QU-855I-GQ09' for the next production range, '800-850', to be equal to 1355 or higher. The next production range model, in turn, will restrict the values that can be set for this same column based on the second problem’s results, and so on.
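A hedged PuLP sketch of chaining consecutive production-range problems in this ascending fashion; the variable names, the placeholder solved value, and the omission of the denormalization step are illustrative assumptions about the real PulpSolver and denormalize_lpvar code.
import pulp

ranges = ["750-800", "800-850", "850-900"]
problems = {r: pulp.LpProblem("faixa_" + r.replace("-", "_"), pulp.LpMinimize) for r in ranges}
temp_vars = {r: pulp.LpVariable("TEMP1_GQ09_" + r.replace("-", "_"), lowBound=0) for r in ranges}

previous_result = None
for faixa in ranges:
    if previous_result is not None:
        # Constrain this range's temperature to be at least the previous range's solved value.
        problems[faixa] += temp_vars[faixa] >= previous_result, "ascending_" + faixa.replace("-", "_")
    # The objective and remaining constraints would be added and the problem solved here;
    # the solved, denormalized value would then feed previous_result for the next range.
    previous_result = 1355.0  # placeholder standing in for the value returned by the solver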