io_ops

Module with I/O operations that work locally and inside DataBricks interchangeably.

The functions in this module are intended to replace both the native Python functions for reading and writing files and the pandas I/O functions, such as pandas.DataFrame.to_csv, pandas.DataFrame.to_excel, and pandas.read_csv.

In other words, instead of using:

import pandas as pd
df = pd.read_csv('path/to/file.csv')

Use:

from wip.datatools.io_ops import read_csv
df = read_csv('path/to/file.csv')

The same logic applies to other functions, such as pandas.DataFrame.to_csv.

Notes

All functions inside this module should be able to handle both local and ABFS filepaths. In other words, they should be able to handle both:

from wip.datatools.io_ops import read_csv
df = read_csv('path/to/file.csv')
# Or:
df = read_csv('abfss://insight@usazu1valesa001.dfs.core.windows.net/path/to/file.csv')
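
One way to support both kinds of path is to inspect the URL scheme before choosing how to open the file. The sketch below illustrates that idea only; it is an assumption, not the module's actual implementation, and `is_abfs_path` is a hypothetical helper name.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_abfs_path(path):
    """Return True when ``path`` uses the abfs:// or abfss:// scheme.

    Hypothetical helper, not part of wip.datatools.io_ops.
    """
    # str() accepts both plain strings and pathlib.Path objects.
    return urlparse(str(path)).scheme in {"abfs", "abfss"}


print(is_abfs_path("path/to/file.csv"))  # False
print(is_abfs_path("abfss://insight@usazu1valesa001.dfs.core.windows.net/path/to/file.csv"))  # True
```

A reader function could branch on this check and fall back to the regular local filesystem when it returns False.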

wip.datatools.io_ops.read_csv(path: str | Path, **kwargs: Any) → pd.DataFrame

Read a CSV file and convert it to a pandas.DataFrame.

This function works in both local and DataBricks environments.

Parameters
  • path (str | Path) – The file path where the CSV data is stored. It can be either a string or a pathlib.Path object.

  • kwargs (Any) – Additional keyword arguments passed to the pandas.read_csv function.

Returns

The CSV data converted to a pandas.DataFrame.

Return type

pd.DataFrame

wip.datatools.io_ops.read_joblib(path: str | Path) → Any

Read .joblib extension files from a local directory or DataBricks.

The function automatically detects whether the code is running locally or inside DataBricks and reads the file accordingly.

Parameters

path (str | Path) – The path to the .joblib extension file.

Returns

The .joblib file contents.

Return type

Any

wip.datatools.io_ops.read_json(path: str | Path, **kwargs: Any) → dict | list

Read a JSON file and convert it to a Python object.

This function works in both local and DataBricks environments.

Parameters
  • path (str | Path) – The file path where the JSON data is stored. It can be either a string or a pathlib.Path object.

  • kwargs (Any) –

    Additional keyword arguments. If ‘encoding’ is not specified, it defaults to ‘utf-8’.

    Other kwargs are passed to the open and json.load functions.

Returns

The JSON data converted to a Python object.

Return type

dict | list
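
The encoding default and keyword forwarding described above can be sketched with the standard library alone. This is a minimal illustration for local files, not the module's code: `read_json_sketch` is a hypothetical name, and sending the remaining keyword arguments only to json.load is an assumption.

```python
import json
import tempfile
from pathlib import Path


def read_json_sketch(path, **kwargs):
    """Load JSON from ``path``, defaulting the encoding to UTF-8.

    Hypothetical stand-in for illustration; handles local files only.
    """
    # Use UTF-8 unless the caller supplied an explicit encoding.
    encoding = kwargs.pop("encoding", "utf-8")
    with open(path, "r", encoding=encoding) as file:
        # Assumption: the remaining kwargs are forwarded to json.load.
        return json.load(file, **kwargs)


with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "example.json"
    json_path.write_text('{"name": "João", "age": 30}', encoding="utf-8")

    loaded = read_json_sketch(json_path)
    # parse_int is a json.load option; here it turns integers into floats.
    as_float = read_json_sketch(json_path, parse_int=float)
```

Here `loaded` holds the decoded dictionary, and `as_float["age"]` is a float because `parse_int` reached json.load.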

wip.datatools.io_ops.read_local_datasets_df_sql() → Tuple[Dict[str, DataFrame], DataFrame]

Read the datasets and df_sql files from the local filesystem.

Returns

A tuple with the datasets and df_sql files.

Return type

Tuple[Dict[str, pd.DataFrame], pd.DataFrame]

Raises

RuntimeError – If the code is being executed inside DataBricks.
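
A guard of this kind could be written as in the sketch below. It assumes that the DATABRICKS_RUNTIME_VERSION environment variable is set only on Databricks clusters. The way the module actually detects its environment is not documented here, and both function names are hypothetical.

```python
import os


def is_running_on_databricks():
    """Guess whether the code runs on a Databricks cluster.

    Assumption: the DATABRICKS_RUNTIME_VERSION environment variable is
    only set on Databricks clusters.
    """
    return "DATABRICKS_RUNTIME_VERSION" in os.environ


def ensure_local():
    """Raise RuntimeError when called inside Databricks."""
    if is_running_on_databricks():
        raise RuntimeError("This function can only be used locally.")
```

Calling `ensure_local()` at the top of a local-only function makes the failure explicit instead of letting a later file lookup fail.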

wip.datatools.io_ops.read_text(path: str | Path, mode: str = 'r', encoding: str = 'utf-8', **kwargs: Any) → str

Read a text file and convert it to a string.

This function works in both local and DataBricks environments.

Parameters
  • path (str | Path) – The file path where the text data is stored. It can be either a string or a pathlib.Path object.

  • mode (str, default "r") – The mode in which the file is opened. Possible values are: ‘r’, ‘r+’, ‘rb’.

  • encoding (str, default "utf-8") – The encoding used to decode the file. It must match the encoding the file was written with, otherwise the text may be read incorrectly or fail to decode. Common values are ‘utf-8’, ‘utf-16’, and ‘latin-1’. See the Notes section for more information.

  • kwargs (Any) –

    Additional keyword arguments passed to the open function.

Returns

The text data converted to a string.

Return type

str

Notes

For a list of the standard encodings that Python supports, see: https://docs.python.org/3.11/library/codecs.html#standard-encodings
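
The effect of a mismatched encoding can be seen with the built-in open function alone. The snippet below is a standalone illustration that does not use read_text; the file name and contents are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = Path(tmp_dir) / "example.txt"
    # Write the file using Latin-1, where "é" is the single byte 0xE9.
    text_path.write_text("café", encoding="latin-1")

    # Reading with the matching encoding recovers the original text.
    with open(text_path, "r", encoding="latin-1") as file:
        decoded = file.read()

    # Reading with UTF-8 fails: a lone 0xE9 byte is not valid UTF-8.
    try:
        with open(text_path, "r", encoding="utf-8") as file:
            file.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
```

When a file's origin is unknown, passing the encoding explicitly avoids relying on the ‘utf-8’ default.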

wip.datatools.io_ops.to_csv(data: pd.DataFrame, path: str | Path, **kwargs: Any)

Save a pandas.DataFrame as a CSV file locally or to DataBricks.

The function automatically detects whether the code is running locally or inside DataBricks and saves the CSV file accordingly.

Parameters
  • data (pd.DataFrame) – The pandas DataFrame to save as a CSV file.

  • path (str | Path) – Where to save the resulting CSV file.

  • kwargs (Any) – Keyword arguments to pass to the pandas.DataFrame.to_csv method.

wip.datatools.io_ops.to_excel(data: pd.DataFrame, path: str | Path, **kwargs: Any)

Save a pandas.DataFrame as an Excel file locally or to DataBricks.

The function automatically detects whether the code is running locally or inside DataBricks and saves the Excel file accordingly.

Parameters
  • data (pd.DataFrame) – The pandas DataFrame to save as an Excel file.

  • path (str | Path) – Where to save the resulting Excel file.

  • kwargs (Any) – Keyword arguments to pass to the pandas.DataFrame.to_excel method.

wip.datatools.io_ops.to_joblib(obj: object, path: str | Path, **kwargs: Any)

Save an object as a joblib file locally or to DataBricks.

The function automatically detects whether the code is running locally or inside DataBricks and saves the object as a joblib file accordingly.

Parameters
  • obj (object) – The object to save as a joblib file.

  • path (str | Path) – Where to save the resulting joblib file.

  • kwargs (Any) – Keyword arguments to pass to the joblib.dump method.

wip.datatools.io_ops.to_json(data: dict | list | str | int | float | bool | NoneType, path: str | Path, **kwargs: Any)

Convert and save data to a JSON file.

This function serializes the given data to JSON and writes it to the file specified by path. Additional keyword arguments are passed to the open and json.dump functions.

Parameters
  • data (dict | list | str | int | float | bool | NoneType) – The data to be converted to JSON. This can be a dictionary, list, string, integer, float, boolean, or None.

  • path (str | Path) – The file path where the JSON data should be stored. Can be a string or Path object.

  • kwargs (Any) –

    Additional keyword arguments. If ‘encoding’ is not specified, it defaults to ‘utf-8’.

    Other kwargs are passed to the open and json.dump functions.

Examples

>>> data = {"name": "John", "age": 30, "city": "New York"}
>>> to_json(data, 'path/to/file.json')
# This will save the data in JSON format in the specified file path.
>>> to_json(["apple", "banana", "cherry"], 'path/to/list.json', encoding='ascii')
# Saves the list as a JSON in ASCII encoding.

Notes

The function uses json.dump for serialization. Custom serialization can be handled by passing a custom cls parameter in kwargs if needed.
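
As an illustration of the cls mechanism, the sketch below defines a json.JSONEncoder subclass that serializes dates. `DateEncoder` is a hypothetical name introduced for this example, and the snippet calls json.dumps directly rather than to_json so that it stays self-contained.

```python
import json
from datetime import date


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes dates as ISO 8601 strings."""

    def default(self, o):
        if isinstance(o, date):
            return o.isoformat()
        # Defer to the base class, which raises TypeError.
        return super().default(o)


record = {"name": "John", "joined": date(2024, 1, 15)}
encoded = json.dumps(record, cls=DateEncoder)
print(encoded)  # {"name": "John", "joined": "2024-01-15"}
```

Going by the note above, the same encoder would be passed as `to_json(record, 'path/to/file.json', cls=DateEncoder)`. Without it, json.dump raises TypeError for the date value.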

Raises
  • TypeError – If the data cannot be serialized to JSON.

  • OSError – If there is an issue writing to the file.

wip.datatools.io_ops.to_lp(prob: pulp.LpProblem, path: str | Path)

Write a linear programming problem to an .lp file.

Parameters
  • prob (pulp.LpProblem) – The linear programming problem.

  • path (str | Path) – The path to the .lp file.

wip.datatools.io_ops.to_mps(prob: pulp.LpProblem, path: str | Path)

Write a linear programming problem to an .mps file.

Parameters
  • prob (pulp.LpProblem) – The linear programming problem.

  • path (str | Path) – The path to the .mps file.

wip.datatools.io_ops.to_pickle(obj: object, path: str | Path, **kwargs: Any)

Save an object as a pickle file locally or to DataBricks.

Parameters
  • obj (object) – The object to save as a pickle file.

  • path (str | Path) – The file path where the object is to be saved as pickle. It can be either a string or a pathlib.Path object.

  • kwargs (Any) –

    Additional keyword arguments. This function writes in byte mode (“wb”), which does not accept an encoding, so ‘encoding’ should not be specified.

    Other kwargs are passed to the open and pickle.dump functions.
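
The byte-mode behaviour can be sketched for local files as follows. `to_pickle_sketch` is a hypothetical stand-in rather than the module's implementation, and forwarding the keyword arguments only to pickle.dump is an assumption.

```python
import pickle
import tempfile
from pathlib import Path


def to_pickle_sketch(obj, path, **kwargs):
    """Pickle ``obj`` to ``path`` in byte mode.

    Hypothetical stand-in for illustration; handles local files only.
    """
    # Binary mode takes no encoding argument; passing one raises ValueError.
    with open(path, "wb") as file:
        # Assumption: keyword arguments such as protocol go to pickle.dump.
        pickle.dump(obj, file, **kwargs)


with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "example.pkl"
    to_pickle_sketch({"a": [1, 2, 3]}, pickle_path, protocol=pickle.HIGHEST_PROTOCOL)

    # Read the file back in byte mode to confirm the round trip.
    with open(pickle_path, "rb") as file:
        restored = pickle.load(file)
```

After the round trip, `restored` equals the original dictionary.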

wip.datatools.io_ops.to_text(data: str, path: str | Path, **kwargs: Any)

Save a string as a text file locally or to DataBricks.

Parameters
  • data (str) – The string to save as a text file.

  • path (str | Path) – The file path where the text is to be saved. It can be either a string or a pathlib.Path object.

  • kwargs (Any) – Additional keyword arguments.

wip.datatools.io_ops.write_lp(prob: pulp.LpProblem, key: str, tmp_path: str | Path) → Any

Write the linear programming problem to an .lp file.

Parameters
  • prob (pulp.LpProblem) – The linear programming problem.

  • key (str) – The key to identify the problem.

  • tmp_path (str | Path) – The path to the temporary directory.

Return type

Any