ops#

Operations applied to the data

wip.modules.ops.define_real_scalers(datasets: Dict[str, DataFrame])[source]#

Define a scaler for each variable from the datasets.

Parameters

datasets (Dict[str, pd.DataFrame]) – The dictionary of datasets. Each key/value pair represents a model and its corresponding dataset.

Returns

  • Dict[str, MinMaxScaler] – The dictionary of scalers, with the new scalers added.

  • np.ndarray – A 2D array of values used to fit the scalers.
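The docstring above leaves the fitting strategy implicit; the sketch below assumes the function concatenates all datasets and fits one MinMaxScaler per column. The helper name and sample data are hypothetical, not the actual implementation:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def define_real_scalers_sketch(datasets):
    """Fit one MinMaxScaler per column found across all datasets (hypothetical)."""
    # Stack every dataset's rows so each variable's scaler sees all models' data.
    combined = pd.concat(datasets.values(), axis=0)
    scalers = {}
    for column in combined.columns:
        # MinMaxScaler expects a 2D array, hence the reshape to a single column.
        scalers[column] = MinMaxScaler().fit(
            combined[column].to_numpy().reshape(-1, 1)
        )
    return scalers, combined.to_numpy()


datasets = {
    "model_a": pd.DataFrame({"temp": [10.0, 20.0], "flow": [1.0, 2.0]}),
    "model_b": pd.DataFrame({"temp": [30.0, 40.0], "flow": [3.0, 4.0]}),
}
scalers, values = define_real_scalers_sketch(datasets)
print(sorted(scalers))            # ['flow', 'temp']
print(scalers["temp"].data_min_)  # [10.]
```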

wip.modules.ops.detect_file_encoding(filename: str | BufferedReader) str[source]#

Detect the character encoding of a file or BufferedReader.

This function uses the chardet library to determine the character encoding of the input file or BufferedReader object. The input can be a file path (as a string) or a BufferedReader object.

Parameters

filename (str | BufferedReader) – File path (as a string) or a BufferedReader object for which the character encoding needs to be detected.

Returns

The detected character encoding of the input file or BufferedReader.

Return type

str

Examples

>>> file_path = "example.txt"
>>> encoding = detect_file_encoding(file_path)
>>> print(encoding)
'utf-8'

Notes

This function uses the chardet library to detect the character encoding of the input file or BufferedReader object. The chardet library evaluates the file content and returns the encoding along with a confidence score, which indicates how certain the detection is. The encoding with the highest confidence score is returned.
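The str | BufferedReader input handling can be sketched as follows. read_raw_bytes is a hypothetical helper showing only the dispatch step that would precede the chardet.detect() call:

```python
import io


def read_raw_bytes(filename):
    """Read raw bytes from a file path or an open binary reader (hypothetical).

    The real function would pass the returned bytes to chardet.detect(),
    which yields a dict with 'encoding' and 'confidence' keys.
    """
    if isinstance(filename, str):
        with open(filename, "rb") as fh:
            return fh.read()
    # Otherwise assume a BufferedReader or any binary file-like object.
    return filename.read()


buffer = io.BytesIO("café".encode("utf-8"))
raw = read_raw_bytes(buffer)
print(raw)  # b'caf\xc3\xa9'
```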

wip.modules.ops.fit_scalers_to_tag(tag_values: Series) object[source]#

Fit a MinMaxScaler to a tag’s values.

Parameters

tag_values (pd.Series) – The pandas.Series containing the values of a tag.

Returns

Fitted scaler.

Return type

object

Notes

When fitting a MinMaxScaler to an array-like object that contains a single column, the array-like object needs to be reshaped to a 2D array prior to fitting the scaler with it. This function’s main purpose is to provide a simple shortcut to do so, without having to manually reshape the array-like object every time.
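The reshaping shortcut described in the Notes can be sketched directly; the function name below is hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def fit_scaler_to_tag_sketch(tag_values: pd.Series) -> MinMaxScaler:
    """Reshape a 1D Series into a 2D column before fitting (hypothetical)."""
    return MinMaxScaler().fit(tag_values.to_numpy().reshape(-1, 1))


tag = pd.Series([1.0, 20.0, 80.0, 100.0])
scaler = fit_scaler_to_tag_sketch(tag)
print(scaler.data_min_, scaler.data_range_)  # [1.] [99.]
```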

wip.modules.ops.get_original_tag_name(otm_tag_name: str) str[source]#

Get original tag name from OTM tag name.

Parameters

otm_tag_name (str) – OTM tag name

Returns

Original tag name

Return type

str

Examples

>>> get_original_tag_name("TEMP1_I@08QU_QU_855I_GQ16")
'TEMP1_I@08QU-QU-855I-GQ16'
>>> get_original_tag_name("cfix")
'cfix'
>>> get_original_tag_name('equalPQmult24div768divFUNC')
'=PQ*24/768/FUNC'
>>> get_original_tag_name('equal192divVELO')
'=192/VELO'
>>> get_original_tag_name('GRAN_OCS_16-18@08PE-BD-840I-01')
'GRAN_OCS_16-18@08PE-BD-840I-01'
>>> get_original_tag_name("POT_TOTAL_VENT___US8")
'POT TOTAL VENT - US8'
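Judging only from the doctests above, the "equal…" tags appear to spell out arithmetic operators. The following partial, hypothetical reconstruction reverses just those substitutions and leaves every other case untouched; it is not the actual implementation:

```python
def original_tag_name_sketch(otm_tag_name: str) -> str:
    """Partial, hypothetical reconstruction covering only the 'equal...' case."""
    if otm_tag_name.startswith("equal"):
        expression = otm_tag_name[len("equal"):]
        # Undo the spelled-out operators seen in the doctests above.
        return "=" + expression.replace("mult", "*").replace("div", "/")
    return otm_tag_name


print(original_tag_name_sketch("equalPQmult24div768divFUNC"))  # =PQ*24/768/FUNC
print(original_tag_name_sketch("equal192divVELO"))             # =192/VELO
print(original_tag_name_sketch("cfix"))                        # cfix
```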
wip.modules.ops.inverse_transform_lpvar(lpvar: LpVariable, scaler: MinMaxScaler) LpAffineExpression[source]#

Get the inverse transform of an LpVariable.

Parameters
  • lpvar (pulp.LpVariable) – The pulp.LpVariable to inverse transform.

  • scaler (MinMaxScaler) – The scaler used to transform the LpVariable.

Returns

Inverse transformed LpVariable.

Return type

pulp.LpAffineExpression

Notes

Variables on the optimization model are normalized to the range [0, 1] using sklearn.preprocessing.MinMaxScaler. Although this normalization is necessary due to the architecture of the optimization problem, some constraints require comparing variables, which is only possible when they are on their original scale.

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import MinMaxScaler
>>> data = np.array([1, 20, 80, 85, 55, 100])
>>> # Fit a scaler to the example data defined above.
>>> scaler = MinMaxScaler().fit(data.reshape(-1, 1))
>>> scaled_data = scaler.transform(data.reshape(-1, 1)).reshape(-1)
>>> # Show what the scaled data looks like. It should contain values that
>>> # range from 0 to 1.
>>> print(scaled_data)
[0.         0.19191919 0.7979798  0.84848485 0.54545455 1.        ]
>>> # Rescale the last scaled value back to its original value:
>>> print(inverse_transform_lpvar(scaled_data[-1], scaler))
99.99999999999999
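The 99.99999999999999 above is the floating-point result of the affine inverse described in the Notes. Assuming the function builds the usual min-max inverse, the same arithmetic can be checked with plain floats:

```python
# Min-max parameters for the example data [1, 20, 80, 85, 55, 100].
data_min, data_max = 1.0, 100.0
data_range = data_max - data_min

# The largest value scales to exactly 1.0, matching scaled_data[-1] above.
scaled_last = (100.0 - data_min) / data_range

# Inverse transform: X = X_norm * (X_max - X_min) + X_min
print(scaled_last * data_range + data_min)  # 100.0
```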
wip.modules.ops.normalize_feature(scalers, feature, norm_value)[source]#

Normalize a given feature value based on scalers.

This function takes a dictionary of scalers, a feature key, and a value to normalize. It returns the normalized value based on the given scalers for the specified feature.

Attention

This function assumes that the given scalers are sklearn.preprocessing.MinMaxScaler objects. It accesses the scaler’s data_range_ and data_min_ attributes, which exist only on the MinMaxScaler class.

Parameters
  • scalers (Dict[str, MinMaxScaler]) – Dictionary containing scaler objects, where the keys are feature names and the values are scaler objects with data_min_ and data_range_ attributes.

  • feature (str) – The key corresponding to the feature to be normalized in the scalers dictionary.

  • norm_value (float) – The value of the specified feature to normalize.

Returns

The normalized value of norm_value for the given feature using the provided scalers. The scaled value lies between 0 and 1.

Return type

float

Examples

>>> from sklearn.preprocessing import MinMaxScaler
>>> import numpy as np
>>> data = np.array([[1, 2], [3, 4], [5, 6]])
>>> scaler = MinMaxScaler().fit(data)
>>> scalers = {"feature1": scaler}
>>> normalize_feature(scalers, "feature1", 4)
0.5

Notes

This function applies the min-max scaling formula manually. Applying the normalization formula manually allows for single value normalization, without having to implement logic that transforms the single value into a numpy array, and then back into a single value. The min-max scaling formula is as follows:

X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}
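Applied to a single value, the formula reduces to one line. The helper name and sample range below are hypothetical:

```python
def normalize_feature_sketch(data_min: float, data_range: float, value: float) -> float:
    """Apply the min-max formula to a single value (hypothetical helper)."""
    return (value - data_min) / data_range


# Example: a feature whose fitted values span [2, 6], so data_range_ is 4.
print(normalize_feature_sketch(data_min=2.0, data_range=4.0, value=4.0))  # 0.5
```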

wip.modules.ops.read_json_dls(file_path, file_name)[source]#

Read a json file from ADLS.

Parameters
  • file_path (str) – The path to the json file.

  • file_name (str) – The name of the json file.

Returns

The data within the json file.

Return type

Dict[str, Any]

wip.modules.ops.replace_string_from_file(solver_path, range_min=None, range_max=None)[source]#

Replace “.” with “,” in the file.

Parameters
  • solver_path (str) – The path to the solver file.

  • range_min (int, optional) – The minimum range value, by default None

  • range_max (int, optional) – The maximum range value, by default None
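The described behaviour can be sketched on a list of lines rather than a file. The helper and its range semantics are assumptions; the real function’s bounds may differ (for example, they may be inclusive):

```python
def replace_dots_sketch(lines, range_min=None, range_max=None):
    """Replace '.' with ',' on lines within [range_min, range_max) (hypothetical).

    When no range is given, every line is processed.
    """
    start = range_min if range_min is not None else 0
    stop = range_max if range_max is not None else len(lines)
    return [
        line.replace(".", ",") if start <= idx < stop else line
        for idx, line in enumerate(lines)
    ]


lines = ["1.5 2.5", "3.5 4.5", "5.5 6.5"]
print(replace_dots_sketch(lines, range_min=1, range_max=2))
# ['1.5 2.5', '3,5 4,5', '5.5 6.5']
```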

wip.modules.ops.scaling_target_values(feature, scalers, lmin, lmax)[source]#

Scale the lower and upper bounds of a target variable using their scaler.

Parameters
  • feature (str) – Target feature name.

  • scalers (Dict[str, MinMaxScaler]) – Dictionary containing scaler objects, where the keys are feature names and the values are their fitted sklearn.preprocessing.MinMaxScaler objects.

  • lmin (float) – The lower bound of the target feature.

  • lmax (float) – The upper bound of the target feature.

Returns

The scaled lower and upper bounds of the target feature, along with the feature name.

Return type

Tuple[float, float, str]

Examples

>>> from sklearn.preprocessing import MinMaxScaler
>>> import numpy as np
>>> data = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
>>> scaler = MinMaxScaler().fit(data.reshape(-1, 1))
>>> feature = "QUIM_CFIX_PP_L@08PR"
>>> scalers = {feature: scaler}
>>> lmin = 0.2
>>> lmax = 0.8
>>> result = scaling_target_values(feature, scalers, lmin, lmax)
>>> print(result)
(20.0, 80.0, 'cfix')
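The bound scaling shown above is the inverse min-max map applied to the fractional bounds. A sketch with explicit scaler parameters (the helper name is hypothetical):

```python
def scale_bounds_sketch(data_min: float, data_range: float, lmin: float, lmax: float):
    """Map fractional bounds back to the feature's original range (hypothetical)."""
    return lmin * data_range + data_min, lmax * data_range + data_min


# Matches the doctest above: data spans [0, 100], so 0.2 -> 20.0 and 0.8 -> 80.0.
print(scale_bounds_sketch(data_min=0.0, data_range=100.0, lmin=0.2, lmax=0.8))
# (20.0, 80.0)
```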
wip.modules.ops.string_in_list(string: str, list_strings: List[str]) bool[source]#

Check if a string starts with any element in a list of strings.

This function iterates through the list of strings and checks if the given string starts with any of the elements in the list. If there’s a match, it returns True. Otherwise, it returns False. If an empty string is provided, it returns False.

Parameters
  • string (str) – The string to be checked for starting substrings.

  • list_strings (List[str]) – The list of strings that’ll be checked as starting substrings of the input string.

Returns

True if the input string starts with any element in the list, False otherwise.

Return type

bool

Examples

>>> string_in_list("hello", ["hi", "hell"])
True
>>> string_in_list("world", ["wor", "earth"])
True
>>> string_in_list("", ["hi", "hello"])
False
>>> string_in_list("goodbye", ["hi", "hello"])
False
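The documented behaviour amounts to a guarded any() over startswith checks; this is a sketch, not the actual implementation:

```python
from typing import List


def string_in_list_sketch(string: str, list_strings: List[str]) -> bool:
    """Return True if `string` starts with any element of `list_strings` (hypothetical)."""
    if not string:
        # Documented edge case: an empty input string yields False.
        return False
    return any(string.startswith(prefix) for prefix in list_strings)


print(string_in_list_sketch("hello", ["hi", "hell"]))  # True
print(string_in_list_sketch("", ["hi", "hello"]))      # False
```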
wip.modules.ops.unnormalize_feature(scalers: Dict[str, MinMaxScaler], feature: str, norm_value: float, operation: str = 'two feature') float | Tuple[float, float][source]#

Unnormalize a given feature value based on scalers.

This method supports two modes of rescaling the data:

  • two feature: Use this mode to rescale the coefficients and intercept of a linear regression model, i.e., to convert them back to the original data scale so they can be used with unnormalized data. The first value returned is the unscaled coefficient; the second value must be subtracted from the normalized intercept to obtain the unscaled intercept.

  • one feature: Use this mode to rescale a single feature value that was normalized using min-max scaling.

Parameters
  • scalers (Dict[str, MinMaxScaler]) – Dictionary containing scaler objects, where the keys represent the feature names, and the values are their fitted sklearn.preprocessing.MinMaxScaler objects.

  • feature (str) – The key corresponding to the feature to be unnormalized. This feature must be present in the scalers dictionary.

  • norm_value (float) – The normalized value of the specified feature to rescale back to the original range.

  • operation (str {"two feature", "one feature"}, default "two feature") – The operation to perform. If operation is “two feature”, then the norm_value is assumed to be the coefficient of a linear regression model, related to the specified feature. If operation is “one feature”, then the norm_value is assumed to be a single value normalized using min-max scaling.

Returns

If operation is “two feature”, then a tuple containing the unscaled coefficient and intercept of the linear regression model is returned. If operation is “one feature”, then the unscaled value of the specified feature is returned.

Return type

float | Tuple[float, float]

Examples

The example below shows how to use this method to rescale the coefficients and intercept of a linear regression model.

>>> import numpy as np
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.linear_model import Ridge
>>> X = np.array([[100, 400], [200, 300], [300, 200], [400, 100]])
>>> y = np.array([1, 2, 3, 4])
>>> scalers = {idx: MinMaxScaler().fit(X[:, idx].reshape(-1, 1)) for idx in range(X.shape[1])}
>>> X_norm = np.array(
...     [scalers[idx].transform(X[:, idx].reshape(-1, 1)).reshape(-1)
...     for idx in range(X.shape[1])]
... ).T
>>> print(X_norm)
[[0.         1.        ]
 [0.33333333 0.66666667]
 [0.66666667 0.33333333]
 [1.         0.        ]]
>>> model = Ridge().fit(X_norm, y)
>>> print(X_norm[0].dot(model.coef_) + model.intercept_)  # noqa
1.710526315789474
>>> # Same as model.predict(X_norm[0].reshape(1, -1))
>>> intercept = model.intercept_  # noqa
>>> coeff_unscaled = []
>>> for idx, coeff in enumerate(model.coef_):  # noqa
...     coeff, intercept_unscaled = unnormalize_feature(scalers, idx, coeff)
...     intercept -= intercept_unscaled
...     coeff_unscaled.append(coeff)
>>> print(X[0].dot(np.array(coeff_unscaled)) + intercept)
1.710526315789474

The next example demonstrates the use of this method to rescale a single feature value that was normalized using min-max scaling:

>>> print(unnormalize_feature(scalers, 0, np.array([[X_norm[0, 0]]]), "one_feature"))
array([[100.]])

Notes

The min-max scaling formula is as follows:

X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}

The unscaled value for X can be obtained by rearranging the above formula:

X = X_{norm} \times (X_{max} - X_{min}) + X_{min}
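The round trip implied by the two formulas can be verified numerically; the values below are chosen to be exactly representable in binary floating point:

```python
# Forward: X_norm = (X - X_min) / (X_max - X_min)
x, x_min, x_max = 32.0, 0.0, 128.0
x_norm = (x - x_min) / (x_max - x_min)  # 0.25

# Rearranged inverse: X = X_norm * (X_max - X_min) + X_min
x_back = x_norm * (x_max - x_min) + x_min
print(x_back)  # 32.0
```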

See also

solver_ops.solver_operations.write_descriptive_contraints

Function that uses this method to rescale the coefficients and intercepts before using them to define an LP optimization model.