temporary#
Module defines objects that are meant to be temporary.
- class wip.temporary.FakeScaler(feature_range=(0, 1))[source]#
Bases: BaseEstimator, TransformerMixin
A fake scaler class that contains the attributes used by scikit-learn scalers.
This class simulates the behavior of a feature scaler but does not implement any scaling.
- Parameters
feature_range (tuple of (min, max), default (0, 1)) – The desired range of transformed data. This parameter is not used in actual transformations but is kept for interface compatibility.
- Variables
n_features_in (int) – The number of features observed during fit.
n_samples_seen (int) – The number of samples observed during fit.
min (ndarray of shape (n_features_in_,)) – The minimum value in each feature in the fitted data.
max (ndarray of shape (n_features_in_,)) – The maximum value in each feature in the fitted data.
data_range (ndarray of shape (n_features_in_,)) – The data range (max - min) for each feature in the fitted data.
data_min (ndarray of shape (n_features_in_,)) – The minimum value in each feature in the fitted data.
data_max (ndarray of shape (n_features_in_,)) – The maximum value in each feature in the fitted data.
scale (ndarray of shape (n_features_in_,)) – The scaling factors applied to each feature. Set to ones.
mean (ndarray of shape (n_features_in_,)) – The mean value for each feature in the fitted data.
center (ndarray of shape (n_features_in_,)) – The centering value for each feature.
Examples
>>> import numpy as np
>>> X = np.array([[1, 2], [3, 4]])
>>> scaler = FakeScaler()
>>> scaler.fit(X)
FakeScaler()
>>> scaler.transform(X)
array([[1, 2],
       [3, 4]])
Methods
fit(X[, y]) – Compute the minimum, maximum, mean, and range for each feature in X.
fit_transform(X[, y]) – Fit to data, then transform it.
get_params([deep]) – Get parameters for this estimator.
inverse_transform(X) – Return the input data unchanged.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Return the input data unchanged.
- fit(X, y=None)[source]#
Compute the minimum, maximum, mean, and range for each feature in X.
Assumes that X is a numpy array, pandas.Series, or pandas.DataFrame. This method calculates basic statistics for each feature but does not scale the data.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – This parameter is not used in this method.
- Returns
self – Returns self.
- Return type
- Raises
ValueError – If the input array X does not meet the expected criteria.
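As a rough, hedged sketch of the FakeScaler behavior documented above, a minimal no-op scaler could look like the code below; the trailing-underscore attribute names and the zero-valued center are assumptions, not the module's actual code.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MinimalFakeScaler(BaseEstimator, TransformerMixin):
    """Illustrative no-op scaler with MinMaxScaler-like attributes."""

    def __init__(self, feature_range=(0, 1)):
        self.feature_range = feature_range  # kept only for interface compatibility

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        self.n_samples_seen_, self.n_features_in_ = X.shape
        self.data_min_ = self.min_ = X.min(axis=0)
        self.data_max_ = self.max_ = X.max(axis=0)
        self.data_range_ = self.data_max_ - self.data_min_
        self.mean_ = X.mean(axis=0)
        self.scale_ = np.ones(self.n_features_in_)    # scaling factors set to ones
        self.center_ = np.zeros(self.n_features_in_)  # assumed: no centering applied
        return self

    def transform(self, X):
        return X  # the input data is returned unchanged

    def inverse_transform(self, X):
        return X  # the input data is returned unchanged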
- fit_transform(X, y=None, **fit_params)#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new – Transformed array.
- Return type
ndarray array of shape (n_samples, n_features_new)
- wip.temporary.add_constant_tags_summary(dataframe: DataFrame)[source]#
Add a column to dataframe to indicate whether the tag has constant values.
- Parameters
dataframe (pd.DataFrame) – A pandas.DataFrame containing the tags to be checked.
- Returns
A pandas.DataFrame containing the tags and a column indicating whether the tag has constant values.
- Return type
pd.DataFrame
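A hedged sketch of how such a constant-value flag could be computed with pandas; the summary layout and the column names "Tag" and "is_constant" are illustrative assumptions rather than the module's actual output.
import pandas as pd

def constant_tags_summary(dataframe: pd.DataFrame) -> pd.DataFrame:
    # A tag is flagged as constant when it has at most one distinct non-null value.
    return pd.DataFrame({
        "Tag": dataframe.columns,
        "is_constant": [dataframe[col].nunique(dropna=True) <= 1 for col in dataframe.columns],
    })

print(constant_tags_summary(pd.DataFrame({"tag_a": [1, 1, 1], "tag_b": [1, 2, 3]})))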
- wip.temporary.adjust_column_widths(worksheet: Worksheet)[source]#
Adjust the widths of columns in a worksheet.
Column widths are adjusted based on the longest value in each column.
- Parameters
worksheet (openpyxl.worksheet.worksheet.Worksheet) – The Worksheet instance where the data is being stored.
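A hedged sketch of one common openpyxl approach, sizing each column by its longest stringified value; the 2-character padding is an arbitrary choice.
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

def autosize_columns(worksheet):
    # Width of each column = length of its longest non-empty value, plus padding.
    for idx, column_cells in enumerate(worksheet.columns, start=1):
        longest = max(
            (len(str(cell.value)) for cell in column_cells if cell.value is not None),
            default=0,
        )
        worksheet.column_dimensions[get_column_letter(idx)].width = longest + 2

wb = Workbook()
ws = wb.active
ws.append(["Tag", "Value"])
ws.append(["TEMP1_I@08QU-QU-855I-GQ04", 1355])
autosize_columns(ws)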
- wip.temporary.adjust_models_coefficients(models_results: dict, scalers: dict) dict[source]#
Adjust the coefficients of the Ridge models using the corresponding scalers.
- wip.temporary.apply_style_cells(worksheet) Tuple[int, int][source]#
Apply “bad” style to cells that have all values equal to zero.
- Parameters
worksheet (openpyxl.worksheet.worksheet.Worksheet) – The Worksheet instance where the data is being stored.
- Returns
A tuple containing the number of results that have all values equal to zero and the number of results that have at least one value equal to zero.
- Return type
Tuple[int,int]
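A hedged sketch of how zero-valued rows could be counted and styled with openpyxl's built-in "Bad" named style; the assumptions that the first column holds labels and that data starts on row 2 are illustrative, not taken from the module.
from openpyxl import Workbook

def mark_zero_rows(worksheet):
    all_zero = any_zero = 0
    for row in worksheet.iter_rows(min_row=2):     # skip the header row
        values = [cell.value for cell in row[1:]]  # assume the first column holds labels
        zeros = [v == 0 for v in values]
        if values and all(zeros):
            all_zero += 1
            for cell in row:
                cell.style = "Bad"  # built-in openpyxl named style
        if any(zeros):
            any_zero += 1
    return all_zero, any_zero

wb = Workbook()
ws = wb.active
ws.append(["Tag", "700-750", "750-800"])
ws.append(["tag_a", 0, 0])
ws.append(["tag_b", 0, 3])
print(mark_zero_rows(ws))  # (1, 2)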
- wip.temporary.bounds_linking(problem: LpProblem, variables: List[str], bounds_mapping: List[tuple]) LpProblem[source]#
- wip.temporary.can_define_inter_problem_constraint(pulp_solver, current_faixa) bool[source]#
Perform check to determine if inter-problem constraint can be added to the optimization model.
This function checks whether constraints that reference other production-range optimization problems can be added to a given production range's optimization problem.
- Parameters
pulp_solver (PulpSolver) – The PulpSolver class instance that contains the optimization problems for each production range.
current_faixa (str) – The current production range. Values should be represented as strings containing two numeric values separated by "-". For example, "700-750", "750-800", "800-850", etc.
- Returns
True if the constraint can be added to the optimization model and False otherwise.
- Return type
bool
- wip.temporary.compute_statistics(series_obj: Series, target_names: List[str]) dict[source]#
Compute statistical measures for a given pandas Series.
This function calculates various statistical measures for the provided pandas Series. It checks if the series index is of a datetime type and computes averages for different periods, standard deviation, mode, median, minimum, maximum, and percentiles. It also determines if the series name is in the target names list.
- Parameters
series_obj (pd.Series) – The pandas.Series object for which statistics are to be calculated. The index must be of datetime type.
target_names (List[str]) – List of target names to check if the series name is a target.
- Returns
A dictionary containing computed statistical values. Keys are statistical measures and values are their corresponding computed values. If certain conditions are not met, some values may be None.
- Return type
dict
- Raises
ValueError – If the index of series_obj is not of a datetime type.
Examples
>>> series = pd.Series([1, 2, 3], index=pd.date_range('20200101', periods=3))
>>> compute_statistics(series, ['target'])
{'Average last 7 days': None, 'Average last 14 days': None, ...}
- wip.temporary.compute_statistics_datasets(datasets: Dict[str, DataFrame]) DataFrame[source]#
Compute statistics for each column across multiple pandas DataFrames.
Aggregates columns from multiple DataFrames provided in a dictionary and computes statistical measures for each unique column. The function iterates over all DataFrames, concatenates column values across them, and calculates statistics using
compute_statistics. It handles columns with the same name across different DataFrames and computes overall statistics.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary where keys are dataset names (or identifiers) and values are pandas DataFrames. The function expects the last column of each DataFrame to be the target column.
- Returns
A DataFrame where each row corresponds to a unique column from the input DataFrames and contains the computed statistics for that column.
- Return type
pd.DataFrame
See also
compute_statistics – Used to compute statistics for individual columns.
Examples
>>> _datasets = {
...     "dataset1": pd.DataFrame(...),
...     "dataset2": pd.DataFrame(...)
... }
>>> compute_statistics_datasets(_datasets)  # DataFrame with computed statistics for each column across the datasets.
Notes
This function is specifically designed to work with DataFrames where the last column is considered as the target. The statistics are computed for each column, considering their presence across multiple DataFrames.
- wip.temporary.create_perfil_temperature_sheet(ranges_dataframe: DataFrame, wb, sheet_name: str = 'Perfil Grupos de Queima')[source]#
Adds a sheet to a given workbook and populates it with filtered temperature data and a line chart.
This function filters a DataFrame for specific tags, copies certain columns into a new sheet in the given Excel workbook, and creates a line chart visualizing temperature ranges. It ensures the sheet exists (creating it if necessary), adds filtered data, and then constructs and inserts a line chart based on the data.
- Parameters
ranges_dataframe (pd.DataFrame) – The DataFrame containing temperature range data to be filtered and copied.
wb (openpyxl.workbook.workbook.Workbook) – The workbook where the sheet will be added and populated with data and a chart.
sheet_name (str, optional) – The name of the sheet to be added or used for inserting data. Default is ‘Perfil Grupos de Queima’.
- Raises
ValueError – If ranges_dataframe does not contain the necessary columns for filtering or chart generation.
See also
openpyxl.workbook.workbook.Workbook – The Workbook class from openpyxl used to manipulate Excel files.
pandas.DataFrame – The DataFrame class from pandas used for data manipulation and analysis.
Notes
It’s important that the ranges_dataframe contains columns for ‘Tag’ and the temperature ranges, as these are crucial for the filtering and chart generation processes. The function dynamically adjusts the y-axis scale of the chart based on the minimum temperature in the data.
References
OpenPyXL documentation : https://openpyxl.readthedocs.io/
Pandas documentation : https://pandas.pydata.org/pandas-docs/stable/
Examples
>>> import pandas as pd
>>> from openpyxl import Workbook
>>> df = pd.DataFrame({
...     'Tag': ['TEMP1_I@08QU-QU-855I-GQ04', 'other_tag'],
...     '700-750': [1, 2],
...     '750-800': [3, 4],
...     '800-850': [5, 6],
...     '850-900': [7, 8],
...     '900-950': [9, 10],
...     '950-1000': [11, 12]
... })
>>> wb = Workbook()
>>> create_perfil_temperature_sheet(df, wb)
>>> 'Perfil Grupos de Queima' in wb.sheetnames
True
- wip.temporary.create_results_ranges(res, labels, datasets)[source]#
Create a DataFrame with the pivoted results of the optimization problem.
- Parameters
res (pd.DataFrame) – A pandas.DataFrame containing the results of the optimization problem.
labels (pd.DataFrame) – A pandas.DataFrame containing the labels for the optimization problem.
datasets (Dict[str, pd.DataFrame] | None) – A dictionary with pandas.DataFrame objects for each model used to create the optimization model. This dictionary is used to add additional statistics for each tag. If None, no statistics are added to the results.
- Returns
A pandas.DataFrame containing the pivoted results of the optimization problem.
- Return type
pd.DataFrame
- wip.temporary.dataframe_to_worksheet(dataframe, worksheet, index=True, header=True)[source]#
Convert a pandas.DataFrame to an OpenPyXL worksheet.
- Parameters
dataframe (pd.DataFrame) – A pandas.DataFrame containing the data to be converted to an OpenPyXL worksheet.
worksheet (openpyxl.worksheet.worksheet.Worksheet) – The Worksheet instance where the data is being stored.
index (bool, default True) – Whether to display the index in the formatted Excel file.
header (bool, default True) – Whether to display the header in the formatted Excel file.
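A hedged sketch using openpyxl's dataframe_to_rows helper; the real function may apply additional formatting on top of this.
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

def write_dataframe(dataframe, worksheet, index=True, header=True):
    # Stream the DataFrame into the worksheet row by row.
    for row in dataframe_to_rows(dataframe, index=index, header=header):
        worksheet.append(row)

wb = Workbook()
write_dataframe(pd.DataFrame({"a": [1, 2], "b": [3, 4]}), wb.active)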
- wip.temporary.date_select(series_obj: Series, n_days: int = 30) Series[source]#
Select the last n_days days from a pandas.Series.
- Parameters
series_obj (pd.Series) – The pandas.Series object to select the last n_days days from. This series must have datetime values as its index.
n_days (int, default 30) – The number of days to select based on the last existing date.
- Returns
The pandas.Series, with only the last n_days days.
- Return type
pd.Series
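A hedged doctest-style illustration of the selection logic, assuming a daily DatetimeIndex:
>>> import pandas as pd
>>> s = pd.Series(range(100), index=pd.date_range("2024-01-01", periods=100, freq="D"))
>>> cutoff = s.index.max() - pd.Timedelta(days=30)
>>> s[s.index >= cutoff].shape[0]
31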
- wip.temporary.drop_model_coefficients(model_coefficients: Dict[str, Dict[str, float | int]], coefficients_to_drop: List[str | Tuple[str, str]] | None = None) Dict[str, Dict[str, float | int]][source]#
Remove specified coefficients from the given model coefficients.
This function provides a way to delete specific coefficients from the model coefficients. If no specific coefficients are provided to remove, it will fall back to a set of temporary coefficients stored in TEMPORARY_MODEL_COEFFICIENTS_TO_REMOVE.
- Parameters
model_coefficients (Dict[str, Dict[str, float | int]]) – The model coefficients to change. Keys are model names and values are dictionaries where keys are coefficient names and values are their respective values.
coefficients_to_drop (List[str | Tuple[str, str]] | None, optional) – The coefficients to drop. Can either be a list of model names as strings or a list of tuples where the first element is the model name and the second is the coefficient name. If not provided, coefficients from TEMPORARY_MODEL_COEFFICIENTS_TO_REMOVE are removed.
- Returns
The modified model coefficients dictionary after dropping the specified coefficients.
- Return type
Dict[str,Dict[str,float | int]]
Examples
Assume we have the following model coefficients:
>>> _model_coefficients = {'model1': {'coeff1': 1.2, 'coeff2': 0.5},
...                        'model2': {'coeff1': 0.8, 'coeff2': 1.5}}
>>> _coefficients_to_drop = [('model1', 'coeff1'), ('model2', 'coeff1')]
>>> drop_model_coefficients(_model_coefficients, _coefficients_to_drop)
{'model1': {'coeff2': 0.5}, 'model2': {'coeff2': 1.5}}
- wip.temporary.drop_models_results(models_results: Dict[str, list | dict], models_to_drop: List[str] | None = None) Dict[str, list | dict][source]#
Remove specified models from the models_results dictionary.
The function filters out specified models from the dictionary containing model results. If no models are specified, the function defaults to removing models from the global variable TEMPORARY_MODELS_TO_REMOVE.
- Parameters
models_results (Dict[str, list | dict]) – Dictionary where keys are model names (str) and values are lists of model results.
models_to_drop (List[str] | None, optional) – List of model names to be dropped from models_results. If not provided, the function defaults to using TEMPORARY_MODELS_TO_REMOVE.
- Returns
Filtered dictionary of model results with specified models removed.
- Return type
Dict[str,list | dict]
Examples
>>> _models_results = {"model1": [1, 2, 3], "model2": [4, 5, 6],
...                    "model3": [7, 8, 9]}
>>> drop_models_results(_models_results, ["model1", "model3"])
{"model2": [4, 5, 6]}
- wip.temporary.drop_scalers(scalers: Dict[str, MinMaxScaler], scalers_to_drop: List[str] | None = None) Dict[str, MinMaxScaler][source]#
Exclude scalers from the provided dictionary of scalers.
This function allows for dropping certain scalers based on their names. The names of the scalers to be dropped are provided in scalers_to_drop. If scalers_to_drop isn’t specified, a pre-defined list SCALERS_TO_REMOVE is used.
- Parameters
scalers (Dict[str, MinMaxScaler]) – A dictionary of scaler objects (values) identified by their names (keys).
scalers_to_drop (List[str] | None, optional) – List of scaler names to drop from the scalers dictionary. If not provided, defaults to a pre-defined list named TEMPORARY_SCALERS_TO_REMOVE.
- Returns
The modified dictionary of scalers, with specified scalers removed.
- Return type
Dict[str,MinMaxScaler]
Examples
Assuming we have the following scalers dictionary and SCALERS_TO_REMOVE list:
scalers = {"scaler1": MinMaxScaler1, "scaler2": MinMaxScaler2, "scaler3": MinMaxScaler3}
SCALERS_TO_REMOVE = ["scaler1", "scaler3"]
Calling the function as:
>>> new_scalers = drop_scalers(scalers)
>>> new_scalers
{"scaler2": MinMaxScaler2}
If we specify the scalers_to_drop argument:
>>> new_scalers = drop_scalers(scalers, ["scaler2"])
>>> new_scalers
{"scaler1": MinMaxScaler1, "scaler3": MinMaxScaler3}
- wip.temporary.energy_cons_vents_faixas(pulp_solver, current_faixa: str, df_sql: DataFrame)[source]#
Add a constraint that forces each fan energy consumption variable to be greater than, less than, or equal to the value from the previous production range, depending on the historical data tendencies between ranges.
For example, for the optimization problem of the production range "800-850", if the average historical values of the tag "CONS1_Y@08QU-PF-852I-01M1" increase relative to the average historical values of the production range "750-800", then this function will create a constraint that forces the "CONS1_Y@08QU-PF-852I-01M1" variable to be equal to or greater than 101% of the value obtained during the optimization of the previous production range.
pulp_solver (PulpSolver) – The PulpSolver class instance that contains the optimization problems for each production range.
current_faixa (str) – The current production range. Values should be represented as strings containing two numeric values separated by "-". For example, "700-750", "750-800", "800-850", etc.
df_sql (pd.DataFrame) – The pandas DataFrame with all tags represented as columns. This function expects that the dataframe contains the tag values obtained after performing all data transformations and cleaning operations. This parameter is used to determine whether the fan energy consumption values of two adjacent production ranges increase, decrease, or stay the same.
- Returns
The PulpSolver class instance with the added constraints.
- Return type
PulpSolver
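A hedged PuLP sketch of the kind of linking constraint described above; the variable name, the hard-coded trend flag, and the 1.01 factor (from the 101% example) are illustrative assumptions, not the module's actual code.
import pulp

# Value obtained for the fan consumption tag when solving the previous range ("750-800").
previous_value = 120.0

prob = pulp.LpProblem("faixa_800_850", pulp.LpMinimize)
cons_var = pulp.LpVariable("CONS1_Y_08QU_PF_852I_01M1", lowBound=0)

# If the historical average increased between the two ranges, force the current
# variable to be at least 101% of the previous range's optimized value.
historical_trend_increasing = True  # would normally be derived from df_sql
if historical_trend_increasing:
    prob += cons_var >= 1.01 * previous_value, "link_with_previous_range"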
- wip.temporary.energy_cons_vents_slopes(df_sql: DataFrame) DataFrame[source]#
- Parameters
df_sql (DataFrame) –
- Return type
DataFrame
- wip.temporary.filter_corpo_moedor_especifico(datasets: Dict[str, DataFrame]) Dict[str, DataFrame][source]#
Filter DataFrames in the datasets dictionary on a specific condition.
This function filters each DataFrame in the provided datasets dictionary based on the “corpo_moedor_especifico” tag with an upper bound of 10. It uses the
filter_tag function to perform the filtering.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary containing string keys and pandas.DataFrame as values. Each DataFrame should contain a column labeled “corpo_moedor_especifico”.
- Returns
A dictionary containing the filtered DataFrames, retaining the same keys as the input dictionary.
- Return type
Dict[str,pd.DataFrame]
See also
filter_tag – Function used for performing the filtering based on the provided tag and boundary.
Examples
Given a dictionary of DataFrames datasets where each DataFrame contains a column “corpo_moedor_especifico”:
>>> import pandas as pd
>>> datasets = {"df1": pd.DataFrame({"corpo_moedor_especifico": [5, 15, 7]})}
>>> filtered_datasets = filter_corpo_moedor_especifico(datasets)
>>> print(filtered_datasets["df1"])
   corpo_moedor_especifico
0                        5
2                        7
- wip.temporary.filter_datasets_df_sql_by_date(datasets: Dict[str, DataFrame], df_sql: DataFrame, n_days: int = 60) Tuple[Dict[str, DataFrame], DataFrame][source]#
Filter the datasets and df_sql DataFrame by date.
This function filters each DataFrame in the provided datasets dictionary and the df_sql DataFrame based on a lower-bound date. The lower-bound date is calculated as the current date minus a specified number of days.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary containing string keys and pandas.DataFrame as values. Each DataFrame should contain a DateTime index.
df_sql (pd.DataFrame) – A DataFrame with a DateTime index to be filtered.
n_days (int, default 60) – The number of days to subtract from the current date to get the lower-bound date.
- Returns
A tuple containing the filtered datasets dictionary and the filtered df_sql DataFrame.
- Return type
Tuple[Dict[str,pd.DataFrame],pd.DataFrame]
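A hedged sketch of the date filtering, assuming every DataFrame carries a DatetimeIndex:
import pandas as pd

def filter_by_recent_days(datasets, df_sql, n_days=60):
    # Keep only rows newer than (now - n_days) in every DataFrame.
    cutoff = pd.Timestamp.now() - pd.Timedelta(days=n_days)
    filtered = {name: df[df.index >= cutoff] for name, df in datasets.items()}
    return filtered, df_sql[df_sql.index >= cutoff]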
- wip.temporary.fix_vent_control_tags_bounds(prob: LpProblem, datasets, faixa, lb_quantile=0.1, ub_quantile=0.9)[source]#
- Parameters
prob (LpProblem) –
- wip.temporary.format_results(header: bool = True, index: bool = False)[source]#
Format the results of the optimization problem and save them to an Excel file.
This function will pivot the results, so that the values for all production ranges are set column-wise.
Then this function formats the results and saves them to a new Excel file, located in the same directory as the original results file.
- wip.temporary.inverse_transform_models_features(models_features: Dict[str, Series], scalers: dict) Dict[str, Series][source]#
Apply the inverse transformation of the models_features dictionary values.
If a scaler is not found for a particular series, it logs an error and uses the pre-existing values for that series.
- Parameters
models_features (Dict[str, pd.Series]) – A dictionary where keys are tag names and values are pandas.Series.
scalers (Dict[str, SkLearnScalers]) – A dictionary where keys are tag names and values are instances of Scikit-learn scalers.
- Returns
A dictionary of tag names and pandas.Series values representing these tags’ original values.
- Return type
Dict[str,pd.Series]
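A hedged sketch of the per-tag inverse transformation with the fallback described above; the exact error handling in the module may differ.
import logging
import pandas as pd

def inverse_transform_features(models_features, scalers):
    restored = {}
    for tag, series in models_features.items():
        scaler = scalers.get(tag)
        if scaler is None:
            logging.error("No scaler found for %s; keeping pre-existing values.", tag)
            restored[tag] = series
            continue
        # Scalers expect 2-D input: reshape, inverse-transform, then rebuild the Series.
        values = scaler.inverse_transform(series.to_numpy().reshape(-1, 1)).ravel()
        restored[tag] = pd.Series(values, index=series.index, name=series.name)
    return restored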
- wip.temporary.pivot_optimization_results(res)[source]#
Pivot the results of the optimization problem.
- Parameters
res (pd.DataFrame) – A pandas.DataFrame containing the results of the optimization problem.
- Returns
A pandas.DataFrame containing the pivoted results of the optimization problem.
- Return type
pd.DataFrame
- wip.temporary.process_labels(labels: DataFrame) DataFrame[source]#
Process the labels to add to the optimization problem.
- Parameters
labels (pd.DataFrame) – A pandas.DataFrame containing the labels for the optimization problem.
- Returns
A pandas.DataFrame containing the processed labels for the optimization problem.
- Return type
pd.DataFrame
- wip.temporary.replace_ventiladores_tags(datasets: Dict[str, pd.DataFrame], df_sql: pd.DataFrame, models: List[str] | None = None, old_ventiladores_tags: List[str] | None = None, new_ventiladores_tags: List[str] | None = None)[source]#
Replace old ventiladores tags with new ones in specified dataframes.
This function updates a collection of pandas DataFrames by removing specified old ventiladores tags if they are not target columns, and merging the DataFrames with a new set of ventiladores tags from another DataFrame. It ensures that the structure of the DataFrames remains consistent, especially with regard to the target column.
- Parameters
datasets (Dict[str, pd.DataFrame]) – A dictionary of DataFrames keyed by model name.
df_sql (pd.DataFrame) – A pandas.DataFrame containing new ventiladores tags to be merged.
models (List[str] | None, optional) – The list of model names corresponding to keys in datasets to be updated. Defaults to a predefined list if None.
old_ventiladores_tags (List[str] | None, optional) – The list of old ventiladores tags to be removed. Defaults to a predefined list if None.
new_ventiladores_tags (List[str] | None, optional) – The list of new ventiladores tags to be merged. Defaults to a predefined list if None.
- Returns
Updated dictionary of DataFrames with old tags replaced by new tags.
- Return type
Dict[str, pd.DataFrame]
- Raises
ValueError – If an old ventiladores tag is also a target column in any DataFrame, or if the target column changes after the operation.
Notes
It is crucial that the target column remains unchanged after this operation. The last column from each DataFrame is assumed to be the model’s target. Therefore, after running this function, the last column of each DataFrame should still contain the target values.
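A hedged outline of the per-DataFrame replacement logic described above; the inner join on the index and the explicit guards are assumptions about how the real function behaves.
import pandas as pd

def swap_tags(df, df_sql, old_tags, new_tags):
    target = df.columns[-1]  # the last column is assumed to be the model target
    if target in old_tags:
        raise ValueError(f"Old tag {target!r} is the target column and cannot be removed.")
    # Drop the old tags, merge the new ones on the index, and keep the target last.
    trimmed = df.drop(columns=[c for c in old_tags if c in df.columns])
    merged = trimmed.join(df_sql[new_tags], how="inner")
    reordered = merged[[c for c in merged.columns if c != target] + [target]]
    if reordered.columns[-1] != target:  # guard mirroring the documented requirement
        raise ValueError("Target column changed after replacing tags.")
    return reordered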
- wip.temporary.round_value(value: Any, decimals: int = 2) Any[source]#
Round numeric values that have more than 3 decimal places.
- Parameters
value (Any) – Value to be rounded, if it is a float.
decimals (int, default 2) – Number of decimal places to round to.
- Returns
Rounded value, if it is a float. Otherwise, the original value.
- Return type
Any
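A hedged doctest-style sketch of the type-guarded rounding:
>>> def round_if_float(value, decimals=2):
...     # Only floats are rounded; all other values pass through unchanged.
...     return round(value, decimals) if isinstance(value, float) else value
>>> round_if_float(3.14159)
3.14
>>> round_if_float("texto")
'texto'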
- wip.temporary.save_formatted_dataframe(dataframe: pd.DataFrame, path: str | Path, header: bool = True, index: bool = True)[source]#
Format the results of the optimization problem and save them to an Excel file.
- Parameters
dataframe (pd.DataFrame) – A pandas.DataFrame containing the results of the optimization problem.
path (str | Path) – A path to save the formatted results to.
header (bool, default True) – Whether to display the header in the formatted Excel file.
index (bool, default True) – Whether to display the index in the formatted Excel file.
- wip.temporary.temp_production_ranges_ascending(pulp_solver, current_faixa: str)[source]#
Constrain the current production range optimization problem using the previous range's values.
This function performs the following steps:
1. The function checks whether current_faixa is the first production range in the pulp_solver.probs dictionary. If it is, the function returns the pulp_solver object unchanged.
2. The function retrieves the dictionary of scalers from the pulp_solver object.
3. The function calculates the previous production range by subtracting 50 from the lower bound of the current production range.
4. The function retrieves the current and previous problem objects from the pulp_solver.probs dictionary.
5. If the previous problem object is not optimal, the function logs an error message and returns the pulp_solver object unchanged.
6. The function retrieves the LP variables for the current and previous production ranges.
7. The function creates a list of "TEMP1_I@08QU-QU-855I-GQXX" tag names. Then it retrieves the LP variables for the current and previous production range optimization problems, denormalizes them using the denormalize_lpvar function, and adds a constraint to the current optimization problem instance.
8. The function updates the pulp_solver.probs dictionary with the updated optimization problem instance.
9. The function returns the updated pulp_solver object.
- Parameters
pulp_solver (PulpSolver) – The PulpSolver class instance that contains the optimization problems for each production range.
current_faixa (str) – The current production range. Values should be represented as strings containing two numeric values separated by "-". For example, "700-750", "750-800", "800-850", etc.
- Returns
The PulpSolver class instance with the added constraints.
- Return type
PulpSolver
Notes
To better explain the purpose of this function, consider the following example:
Suppose we’re solving 5 optimization problems, for the following production ranges: '750-800', '800-850', '850-900', '900-950', and '950-1000'.
After defining and solving the first production range problem, the model returned a value for the tag 'TEMP1_I@08QU-QU-855I-GQ09' equal to 1355. In this scenario, this function will force the value of 'TEMP1_I@08QU-QU-855I-GQ09' for the next production range, '800-850', to be equal to 1355 or higher. The next production range model, in turn, will restrict the values that can be set for this same column based on the second problem’s results, and so on.
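A hedged PuLP sketch of chaining consecutive production-range problems in this ascending fashion; the variable names, the placeholder solved value, and the omission of the denormalization step are illustrative assumptions about the real PulpSolver and denormalize_lpvar code.
import pulp

ranges = ["750-800", "800-850", "850-900"]
problems = {r: pulp.LpProblem("faixa_" + r.replace("-", "_"), pulp.LpMinimize) for r in ranges}
temp_vars = {r: pulp.LpVariable("TEMP1_GQ09_" + r.replace("-", "_"), lowBound=0) for r in ranges}

previous_result = None
for faixa in ranges:
    if previous_result is not None:
        # Constrain this range's temperature to be at least the previous range's solved value.
        problems[faixa] += temp_vars[faixa] >= previous_result, "ascending_" + faixa.replace("-", "_")
    # The objective and remaining constraints would be added and the problem solved here;
    # the solved, denormalized value would then feed previous_result for the next range.
    previous_result = 1355.0  # placeholder standing in for the value returned by the solver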