ops#
Operations applied to the data.
- wip.modules.ops.define_real_scalers(datasets: Dict[str, DataFrame])[source]#
Define a scaler for each variable in the datasets.
- Parameters
datasets (Dict[str, pd.DataFrame]) – The dictionary of datasets. Each key/value pair represents a model and its corresponding dataset.
- Returns
Dict[str, MinMaxScaler] – The dictionary of scalers, with the new scalers added.
np.ndarray – A 2D array of values used to fit the scalers.
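The fitting strategy isn't spelled out above, so the sketch below shows one plausible per-column approach; the datasets dictionary, the column names, and the choice to key scalers by column name are assumptions made for illustration only.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical datasets dictionary: one DataFrame per model.
datasets = {
    "model_a": pd.DataFrame({"TEMP1": [10.0, 20.0, 30.0], "PRES1": [1.0, 2.0, 3.0]}),
    "model_b": pd.DataFrame({"TEMP1": [15.0, 25.0, 35.0], "FLOW1": [100.0, 150.0, 200.0]}),
}

scalers = {}
for frame in datasets.values():
    for column in frame.columns:
        # Fit one MinMaxScaler per variable; reshape to 2D because the
        # scaler expects an array of shape (n_samples, n_features).
        values = frame[column].to_numpy().reshape(-1, 1)
        scalers[column] = MinMaxScaler().fit(values)

print(sorted(scalers))  # ['FLOW1', 'PRES1', 'TEMP1']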
- wip.modules.ops.detect_file_encoding(filename: str | BufferedReader) str [source]#
Detect the character encoding of a file or BufferedReader.
This function uses the chardet library to determine the character encoding of the input file or BufferedReader object. The input can be a file path (as a string) or a BufferedReader object.
- Parameters
filename (str | BufferedReader) – File path (as a string) or a BufferedReader object for which the character encoding needs to be detected.
- Returns
The detected character encoding of the input file or BufferedReader.
- Return type
str
Examples
>>> file_path = "example.txt"
>>> encoding = detect_file_encoding(file_path)
>>> print(encoding)
utf-8
Notes
The chardet library evaluates the file content and returns the detected encoding along with a confidence score. The confidence score indicates how certain chardet is that the detected encoding is correct; the encoding with the highest confidence score is returned.
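For reference, a minimal sketch of how this chardet-based detection could be implemented; the helper below is illustrative and not the module's actual code.

from io import BufferedReader

import chardet


def _detect_encoding(filename):
    # Accept either a file path or an already-open BufferedReader.
    if isinstance(filename, BufferedReader):
        raw_bytes = filename.read()
    else:
        with open(filename, "rb") as file:
            raw_bytes = file.read()
    # chardet.detect returns a dict such as
    # {"encoding": "utf-8", "confidence": 0.99, "language": ""}.
    return chardet.detect(raw_bytes)["encoding"]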
- wip.modules.ops.fit_scalers_to_tag(tag_values: Series) object [source]#
Fit a MinMaxScaler to a tag's values.
- Parameters
tag_values (pd.Series) – The pandas.Series containing the values of a tag.
- Returns
Fitted scaler.
- Return type
MinMaxScaler
Notes
When fitting a MinMaxScaler to an array-like object that contains a single column, the array-like object needs to be reshaped to a 2D array before fitting the scaler. This function's main purpose is to provide a simple shortcut for that reshape, so it doesn't have to be done manually every time.
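A minimal sketch of the reshape-then-fit shortcut described in the note; the helper name is invented here for illustration.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def _fit_scaler_to_tag(tag_values: pd.Series) -> MinMaxScaler:
    # MinMaxScaler expects a 2D array, so the single column of values is
    # reshaped to shape (n_samples, 1) before fitting.
    return MinMaxScaler().fit(tag_values.to_numpy().reshape(-1, 1))


scaler = _fit_scaler_to_tag(pd.Series([1.0, 20.0, 80.0, 100.0]))
print(scaler.data_min_, scaler.data_range_)  # [1.] [99.]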
- wip.modules.ops.get_original_tag_name(otm_tag_name: str) str [source]#
Get the original tag name from an OTM tag name.
Examples
>>> get_original_tag_name("TEMP1_I@08QU_QU_855I_GQ16")
'TEMP1_I@08QU-QU-855I-GQ16'
>>> get_original_tag_name("cfix")
'cfix'
>>> get_original_tag_name('equalPQmult24div768divFUNC')
'=PQ*24/768/FUNC'
>>> get_original_tag_name('equal192divVELO')
'=192/VELO'
>>> get_original_tag_name('GRAN_OCS_16-18@08PE-BD-840I-01')
'GRAN_OCS_16-18@08PE-BD-840I-01'
>>> get_original_tag_name("POT_TOTAL_VENT___US8")
'POT TOTAL VENT - US8'
- wip.modules.ops.inverse_transform_lpvar(lpvar: LpVariable, scaler: MinMaxScaler) LpAffineExpression [source]#
Get the inverse transform of an LpVariable.
- Parameters
lpvar (pulp.LpVariable) – The pulp.LpVariable to inverse transform.
scaler (MinMaxScaler) – The scaler used to transform the LpVariable.
- Returns
Inverse transformed LpVariable.
- Return type
pulp.LpAffineExpression
Notes
Variables in the optimization model are normalized to the range [0, 1] using sklearn.preprocessing.MinMaxScaler. Although this normalization is necessary due to the architecture of the optimization problem, some constraints require comparisons between variables that are only meaningful on their original scale.
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import MinMaxScaler
>>> data = np.array([1, 20, 80, 85, 55, 100])
>>> # Fit a scaler to the example data defined above.
>>> scaler = MinMaxScaler().fit(data.reshape(-1, 1))
>>> scaled_data = scaler.transform(data.reshape(-1, 1)).reshape(-1)
>>> # Show what the scaled data looks like. It should contain values that
>>> # range from 0 to 1.
>>> print(scaled_data)
[0.         0.19191919 0.7979798  0.84848485 0.54545455 1.        ]
>>> # Rescale the last scaled value back to its original value:
>>> print(inverse_transform_lpvar(scaled_data[-1], scaler))
99.99999999999999
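Because a fitted MinMaxScaler exposes data_min_ and data_range_, the inverse transform can be written as an affine expression of the variable, which is presumably why the return type is pulp.LpAffineExpression. A minimal sketch of that idea, not necessarily the function's actual implementation:

import numpy as np
import pulp
from sklearn.preprocessing import MinMaxScaler

# Scaler fitted on the same example data as the doctest above.
scaler = MinMaxScaler().fit(np.array([1, 20, 80, 85, 55, 100]).reshape(-1, 1))

# Decision variable normalized to [0, 1] inside the optimization model.
x = pulp.LpVariable("x", lowBound=0, upBound=1)

# X = X_norm * (X_max - X_min) + X_min, built as a pulp affine expression so
# it can be compared against other quantities on the original scale.
x_original = x * float(scaler.data_range_[0]) + float(scaler.data_min_[0])
print(x_original)  # e.g. 99.0*x + 1.0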
- wip.modules.ops.normalize_feature(scalers, feature, norm_value)[source]#
Normalize a given feature value based on scalers.
This function takes a dictionary of scalers, a feature key, and a value to normalize. It returns the normalized value based on the given scalers for the specified feature.
Attention
This function assumes that the given scalers are sklearn.preprocessing.MinMaxScaler objects. It accesses the data_range_ and data_min_ attributes of the scaler objects, which only exist on the MinMaxScaler class.
- Parameters
scalers (Dict[str, MinMaxScaler]) – Dictionary containing scaler objects, where the keys are feature names and the values are scaler objects with data_min_ and data_range_ attributes.
feature (str) – The key corresponding to the feature to be normalized in the scalers dictionary.
norm_value (float) – The value of the specified feature to normalize.
- Returns
The normalized value of norm_value for the given feature using the provided scalers. The scaled value lies between 0 and 1.
- Return type
float
Examples
>>> from sklearn.preprocessing import MinMaxScaler
>>> import numpy as np
>>> data = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = MinMaxScaler().fit(data)
>>> scalers = {"feature1": scaler}
>>> normalize_feature(scalers, "feature1", 4)
0.5
Notes
This function applies the min-max scaling formula manually. Applying the normalization formula manually allows for single value normalization, without having to implement logic that transforms the single value into a numpy array, and then back into a single value. The min-max scaling formula is as follows:
X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}
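A minimal sketch of that manual formula using the scaler's data_min_ and data_range_ attributes; the helper and its default column index are assumptions for illustration.

import numpy as np
from sklearn.preprocessing import MinMaxScaler


def _normalize_value(scaler: MinMaxScaler, value: float, column: int = 0) -> float:
    # Apply (X - X_min) / (X_max - X_min) directly to a single value,
    # avoiding the reshape round-trip that scaler.transform() would require.
    return (value - scaler.data_min_[column]) / scaler.data_range_[column]


scaler = MinMaxScaler().fit(np.array([0.0, 50.0, 100.0]).reshape(-1, 1))
print(_normalize_value(scaler, 25.0))  # 0.25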
- wip.modules.ops.replace_string_from_file(solver_path, range_min=None, range_max=None)[source]#
Replace "." with "," in the file.
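The description above is terse; the sketch below shows one way such a replacement could work, assuming range_min and range_max bound the affected line indices (that interpretation of the parameters is an assumption, not confirmed by the source).

def _replace_dots_with_commas(solver_path, range_min=None, range_max=None):
    # Read the whole file, swap "." for "," on the selected lines, and
    # write the result back in place.
    with open(solver_path, "r", encoding="utf-8") as file:
        lines = file.readlines()

    start = 0 if range_min is None else range_min
    stop = len(lines) if range_max is None else range_max
    for index in range(start, min(stop, len(lines))):
        lines[index] = lines[index].replace(".", ",")

    with open(solver_path, "w", encoding="utf-8") as file:
        file.writelines(lines)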
- wip.modules.ops.scaling_target_values(feature, scalers, lmin, lmax)[source]#
Scale the lower and upper bounds of a target variable using their scaler.
- Parameters
feature (str) – Target feature name.
scalers (Dict[str, MinMaxScaler]) – Dictionary containing scaler objects, where the keys are feature names and the values are their fitted sklearn.preprocessing.MinMaxScaler objects.
lmin (float) – The lower bound of the target feature.
lmax (float) – The upper bound of the target feature.
- Returns
The scaled lower and upper bounds of the target feature, along with the feature name.
- Return type
Tuple[float, float, str]
Examples
>>> from sklearn.preprocessing import MinMaxScaler
>>> import numpy as np
>>> data = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> scaler = MinMaxScaler().fit(data.reshape(-1, 1))
>>> feature = "QUIM_CFIX_PP_L@08PR"
>>> scalers = {feature: scaler}
>>> lmin = 0.2
>>> lmax = 0.8
>>> result = scaling_target_values(feature, scalers, lmin, lmax)
>>> print(result)
(20.0, 80.0, 'cfix')
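The doctest above suggests the bounds are passed through the feature's scaler via inverse_transform; a minimal sketch of just that bound rescaling (the shortened feature name 'cfix' in the example output comes from a name mapping that is not reproduced here):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler().fit(np.arange(0, 101, 10).reshape(-1, 1))
# Map bounds given on the normalized [0, 1] scale back to the original scale.
bounds = scaler.inverse_transform(np.array([[0.2], [0.8]])).reshape(-1)
print(float(bounds[0]), float(bounds[1]))  # 20.0 80.0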
- wip.modules.ops.string_in_list(string: str, list_strings: List[str]) bool [source]#
Check if a string starts with any element in a list of strings.
This function iterates through the list of strings and checks whether the given string starts with any of the elements in the list. If there's a match, it returns True. Otherwise, it returns False. If an empty string is provided, it returns False.
- Parameters
string (str) – The string to be checked for starting substrings.
list_strings (List[str]) – The list of strings that'll be checked as starting substrings of the input string.
- Returns
True if the input string starts with any element in the list, False otherwise.
- Return type
bool
Examples
>>> string_in_list("hello", ["hi", "hell"])
True
>>> string_in_list("world", ["wor", "earth"])
True
>>> string_in_list("", ["hi", "hello"])
False
>>> string_in_list("goodbye", ["hi", "hello"])
False
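The check described above reduces to a single expression; a minimal sketch, written here for illustration rather than taken from the source:

from typing import List


def _string_in_list(string: str, list_strings: List[str]) -> bool:
    # An empty input string never starts with a non-empty prefix, so this
    # naturally returns False for "".
    return any(string.startswith(prefix) for prefix in list_strings)


print(_string_in_list("hello", ["hi", "hell"]))  # True
print(_string_in_list("", ["hi", "hello"]))  # False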
- wip.modules.ops.unnormalize_feature(scalers: Dict[str, MinMaxScaler], feature: str, norm_value: float, operation: str = 'two feature') float | Tuple[float, float] [source]#
Unnormalize a given feature value based on scalers.
This method has two modes of rescaling the data:
- two feature: Use this mode to rescale the coefficients and intercepts of a linear regression model. In other words, this mode converts the normalized coefficients and intercepts of a linear regression model so they can be used with the original data, without having to normalize it first. The first_value and second_value returned by this method represent the new unscaled coefficient and intercept, respectively. The second_value that's returned needs to be subtracted from the normalized intercept to obtain its unscaled value.
- one feature: Use this mode to rescale a single feature value that was normalized using min-max scaling.
- Parameters
scalers (Dict[str, MinMaxScaler]) – Dictionary containing scaler objects, where the keys represent the feature names and the values are their fitted sklearn.preprocessing.MinMaxScaler objects.
feature (str) – The key corresponding to the feature to be unnormalized. This feature must be present in the scalers dictionary.
norm_value (float) – The normalized value of the specified feature to rescale back to the original range.
operation (str {"two feature", "one feature"}, default "two feature") – The operation to perform. If operation is "two feature", then norm_value is assumed to be the coefficient of a linear regression model, related to the specified feature. If operation is "one feature", then norm_value is assumed to be a single value normalized using min-max scaling.
- Returns
If operation is "two feature", a tuple containing the unscaled coefficient and intercept of the linear regression model is returned. If operation is "one feature", the unscaled value of the specified feature is returned.
- Return type
float | Tuple[float, float]
Examples
The example below shows how to use this method to rescale the coefficients and intercept of a linear regression model.
>>> import numpy as np
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.linear_model import Ridge
>>> X = np.array([[100, 400], [200, 300], [300, 200], [400, 100]])
>>> y = np.array([1, 2, 3, 4])
>>> scalers = {idx: MinMaxScaler().fit(X[:, idx].reshape(-1, 1)) for idx in range(X.shape[1])}
>>> X_norm = np.array(
...     [scalers[idx].transform(X[:, idx].reshape(-1, 1)).reshape(-1)
...      for idx in range(X.shape[1])]
... ).T
>>> print(X_norm)
[[0.         1.        ]
 [0.33333333 0.66666667]
 [0.66666667 0.33333333]
 [1.         0.        ]]
>>> model = Ridge().fit(X_norm, y)
>>> print(X_norm[0].dot(model.coef_) + model.intercept_)  # noqa
1.710526315789474
>>> # Same as model.predict(X_norm[0].reshape(1, -1))
>>> intercept = model.intercept_  # noqa
>>> coeff_unscaled = []
>>> for idx, coeff in enumerate(model.coef_):  # noqa
...     coeff, intercept_unscaled = unnormalize_feature(scalers, idx, coeff)
...     intercept -= intercept_unscaled
...     coeff_unscaled.append(coeff)
>>> print(X[0].dot(np.array(coeff_unscaled)) + intercept)
1.710526315789474
The next example demonstrates the use of this method to rescale a single feature value that was normalized using min-max scaling:
>>> unnormalize_feature(scalers, 0, np.array([[X_norm[0, 0]]]), "one feature")
array([[100.]])
Notes
The min-max scaling formula is as follows:
X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}
The unscaled value for X can be obtained by rearranging the above formula:
X = X_{norm} \times (X_{max} - X_{min}) + X_{min}
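A minimal sketch of the "two feature" coefficient rescaling implied by this rearrangement, assuming the function relies on the scaler's data_min_ and data_range_ attributes (illustrative only, not the source implementation):

import numpy as np
from sklearn.preprocessing import MinMaxScaler


def _unscale_coefficient(scaler: MinMaxScaler, coeff: float, column: int = 0):
    # For a coefficient c fitted on a normalized feature:
    # c * X_norm = (c / range) * X - c * min / range,
    # so the first value is the unscaled coefficient and the second value
    # is the correction to subtract from the model intercept.
    data_min = scaler.data_min_[column]
    data_range = scaler.data_range_[column]
    return coeff / data_range, coeff * data_min / data_range


scaler = MinMaxScaler().fit(np.array([100.0, 200.0, 300.0, 400.0]).reshape(-1, 1))
coeff_unscaled, correction = _unscale_coefficient(scaler, 3.0)
print(coeff_unscaled, correction)  # 0.01 1.0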
See also
solver_ops.solver_operations.write_descriptive_contraints
Function that uses this method to rescale the coefficients and intercepts before using them to define an LP optimization model.