io_ops
Module with I/O operations that work locally and inside DataBricks interchangeably.
The functions in this module are intended to replace Python's native
functions for reading and writing files, as well as pandas I/O functions such as
pandas.DataFrame.to_csv, pandas.DataFrame.to_excel, and pandas.read_csv.
In other words, instead of using:
import pandas as pd
df = pd.read_csv('path/to/file.csv')
Use:
from wip.datatools.io_ops import read_csv
df = read_csv('path/to/file.csv')
The same logic applies to other functions, such as pandas.DataFrame.to_csv.
Notes
All functions inside this module should be able to handle both local and ABFS filepaths. In other words, they should be able to handle both:
from wip.datatools.io_ops import read_csv
df = read_csv('path/to/file.csv')
# Or:
df = read_csv('abfss://insight@usazu1valesa001.dfs.core.windows.net/path/to/file.csv')
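How the module tells the two kinds of path apart is not described on this page. As a minimal sketch, assuming the decision can be made from the scheme prefix alone, a hypothetical helper (`is_abfs_path` is an illustrative name, not part of `io_ops`) might look like this:

```python
from pathlib import Path
from typing import Union


def is_abfs_path(path: Union[str, Path]) -> bool:
    """Hypothetical helper: True if `path` looks like an ABFS URI.

    Only the scheme prefix is inspected, so the check still works when the
    URI was wrapped in pathlib.Path, which may normalize the slashes that
    follow the scheme.
    """
    return str(path).lower().startswith(("abfs:", "abfss:"))


local_result = is_abfs_path("path/to/file.csv")
remote_result = is_abfs_path(
    "abfss://insight@usazu1valesa001.dfs.core.windows.net/path/to/file.csv"
)
wrapped_result = is_abfs_path(Path("abfss://container@account/path/to/file.csv"))
```

Checking only the prefix keeps the sketch independent of any Azure client library; the real functions may rely on a different mechanism.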
- wip.datatools.io_ops.read_csv(path: str | Path, **kwargs: Any) -> pd.DataFrame
Read a CSV file and convert it to a pandas.DataFrame.
This function works in both local and DataBricks environments.
- Parameters
path (str | Path) – The file path where the CSV data is stored. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments passed to the pandas.read_csv function.
- Returns
The CSV data converted to a pandas.DataFrame.
- Return type
pd.DataFrame
- wip.datatools.io_ops.read_joblib(path: str | Path) -> Any
Read .joblib extension files from a local directory or DataBricks.
The function automatically determines whether the code is being executed locally or inside DataBricks, and reads the file accordingly.
- Parameters
path (str | Path) – The path to the .joblib extension file.
- Returns
The .joblib file contents.
- Return type
Any
- wip.datatools.io_ops.read_json(path: str | Path, **kwargs: Any) -> dict | list
Read a JSON file and convert it to a Python object.
This function works in both local and DataBricks environments.
- Parameters
path (str | Path) – The file path where the JSON data is stored. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments. If 'encoding' is not specified, it defaults to 'utf-8'. Other kwargs are passed to the open and json.load functions.
- Returns
The JSON data converted to a Python object.
- Return type
dict | list
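The keyword-argument handling described for read_json can be sketched with the standard library alone. This is a local-only illustration, not the module's implementation: `read_json_sketch` is a hypothetical name, and forwarding every remaining keyword argument to json.load (rather than splitting them between open and json.load) is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


def read_json_sketch(path: Union[str, Path], **kwargs: Any) -> Any:
    """Local-only sketch; NOT the io_ops implementation."""
    # Fall back to UTF-8 unless the caller passes an explicit encoding.
    encoding = kwargs.pop("encoding", "utf-8")
    with open(path, mode="r", encoding=encoding) as file:
        # Assumption: every remaining keyword argument goes to json.load.
        return json.load(file, **kwargs)


with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "example.json"
    payload = {"name": "Jos\u00e9", "scores": [1, 2.5]}
    json_path.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")

    loaded = read_json_sketch(json_path)
    # parse_float is a json.load option; here floats are kept as strings.
    loaded_as_str = read_json_sketch(json_path, parse_float=str)
```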
- wip.datatools.io_ops.read_local_datasets_df_sql() -> Tuple[Dict[str, DataFrame], DataFrame]
Read the datasets and df_sql files from the local filesystem.
- Returns
A tuple with the datasets and df_sql files.
- Return type
Tuple[Dict[str, pd.DataFrame], pd.DataFrame]
- Raises
RuntimeError – If the code is being executed inside DataBricks.
- wip.datatools.io_ops.read_text(path: str | Path, mode: str = 'r', encoding: str = 'utf-8', **kwargs: Any) -> str
Read a text file and convert it to a string.
This function works in both local and DataBricks environments.
- Parameters
path (str | Path) – The file path where the text data is stored. It can be either a string or a pathlib.Path object.
mode (str, default "r") – The mode in which the file is opened. Possible values are: 'r', 'r+', 'rb'.
encoding (str, default "utf-8") – The encoding used to read the file. Choosing the right encoding ensures that the file is read correctly. Possible values include 'utf-8', 'utf-16', and 'latin-1'. See the Notes section for more information.
kwargs (Any) – Additional keyword arguments passed to the open function.
- Returns
The text data converted to a string.
- Return type
str
Notes
For a list of the standard encodings that Python supports, see:
https://docs.python.org/3.11/library/codecs.html#standard-encodings
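To illustrate why the encoding argument matters, the sketch below uses Python's built-in open directly (not read_text) on a temporary file written as Latin-1. The file name and contents are made up for the example.

```python
import tempfile
from pathlib import Path

text = "caf\u00e9"  # the last character is non-ASCII

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.txt"
    # Store the text as Latin-1 bytes: b"caf\xe9".
    path.write_bytes(text.encode("latin-1"))

    # Reading with the matching encoding recovers the original text.
    with open(path, mode="r", encoding="latin-1") as file:
        decoded_ok = file.read()

    # Reading the same bytes as UTF-8 fails, because b"\xe9" alone is not
    # a valid UTF-8 sequence.
    try:
        with open(path, mode="r", encoding="utf-8") as file:
            file.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
```

Assuming read_text forwards encoding to open as documented, passing encoding='latin-1' would be the way to read such a file.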
- wip.datatools.io_ops.to_csv(data: pd.DataFrame, path: str | Path, **kwargs: Any)
Save a pandas.DataFrame as CSV locally or to DataBricks.
The function automatically detects whether the code is being executed locally or inside DataBricks, and takes the steps needed to save the results as a CSV file in that environment.
- Parameters
data (pd.DataFrame) – Pandas DataFrame to save as a CSV file.
path (str | Path) – Where to save the resulting CSV file.
kwargs (Any) – Keyword arguments to pass to the pandas.DataFrame.to_csv method.
- wip.datatools.io_ops.to_excel(data: pd.DataFrame, path: str | Path, **kwargs: Any)
Save a pandas.DataFrame as Excel locally or to DataBricks.
The function automatically detects whether the code is being executed locally or inside DataBricks, and takes the steps needed to save the results as an Excel file in that environment.
- Parameters
data (pd.DataFrame) – Pandas DataFrame to save as an Excel file.
path (str | Path) – Where to save the resulting Excel file.
kwargs (Any) – Keyword arguments to pass to the pandas.DataFrame.to_excel method.
- wip.datatools.io_ops.to_joblib(obj: object, path: str | Path, **kwargs: Any)
Save an object as a joblib file locally or to DataBricks.
The function automatically detects whether the code is being executed locally or inside DataBricks, and takes the steps needed to save the object as a joblib file in that environment.
- Parameters
obj (object) – The object to save as a joblib file.
path (str | Path) – Where to save the resulting joblib file.
kwargs (Any) – Keyword arguments to pass to the joblib.dump function.
- wip.datatools.io_ops.to_json(data: dict | list | str | int | float | bool | NoneType, path: str | Path, **kwargs: Any)
Convert and save data to a JSON file.
This function takes various data types, converts them into JSON format, and writes them to the file specified by path. The function supports additional keyword arguments that are passed to the file open function.
- Parameters
data (dict | list | str | int | float | bool | NoneType) – The data to be converted to JSON. This can be a dictionary, list, string, integer, float, boolean, or None.
path (str | Path) – The file path where the JSON data should be stored. Can be a string or Path object.
**kwargs (Any) – Additional keyword arguments. If 'encoding' is not specified, it defaults to 'utf-8'. Other kwargs are passed to the open and json.dump functions.
Examples
>>> data = {"name": "John", "age": 30, "city": "New York"}
>>> to_json(data, 'path/to/file.json')  # Saves the data in JSON format at the specified file path.
>>> to_json(["apple", "banana", "cherry"], 'path/to/list.json', encoding='ascii')  # Saves the list as JSON in ASCII encoding.
Notes
The function uses json.dump for serialization. Custom serialization can be handled by passing a custom cls parameter in kwargs if needed.
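The custom cls mechanism mentioned in the Notes is a feature of json.dump itself. The sketch below calls json.dump directly on an in-memory buffer; `ExtendedEncoder` is a hypothetical encoder written for this example and is not part of `io_ops`.

```python
import io
import json
from datetime import date
from pathlib import Path


class ExtendedEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes dates and paths as strings."""

    def default(self, o):
        if isinstance(o, date):
            return o.isoformat()
        if isinstance(o, Path):
            return o.as_posix()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


payload = {"run_date": date(2024, 1, 31), "output": Path("path/to/file.json")}

buffer = io.StringIO()
json.dump(payload, buffer, cls=ExtendedEncoder)
serialized = buffer.getvalue()
```

Assuming to_json forwards cls to json.dump as the Notes describe, the corresponding call would be to_json(payload, 'path/to/file.json', cls=ExtendedEncoder).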
- wip.datatools.io_ops.to_lp(prob: pulp.LpProblem, path: str | Path)
Write a linear programming problem to an .lp file.
- Parameters
prob (pulp.LpProblem) – The linear programming problem.
path (str | Path) – The path to the .lp file.
- wip.datatools.io_ops.to_mps(prob: pulp.LpProblem, path: str | Path)
Write a linear programming problem to an .mps file.
- Parameters
prob (pulp.LpProblem) – The linear programming problem.
path (str | Path) – The path to the .mps file.
- wip.datatools.io_ops.to_pickle(obj: object, path: str | Path, **kwargs: Any)
Save an object as a pickle file locally or to DataBricks.
- Parameters
obj (object) – The object to save as a pickle file.
path (str | Path) – The file path where the object is to be saved as a pickle file. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments. This function saves objects in binary mode ("wb"), so no 'encoding' should be specified, because this mode does not take one. Other kwargs are passed to the open and pickle.dump functions.
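The remark about binary mode can be checked with the standard library. The following is a local-only sketch, not the module's implementation: `to_pickle_sketch` is a hypothetical name, and sending all keyword arguments to pickle.dump is an assumption.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def to_pickle_sketch(obj: object, path: Union[str, Path], **kwargs: Any) -> None:
    """Local-only sketch; NOT the io_ops implementation."""
    # Binary mode ("wb") is opened without an encoding argument.
    with open(path, mode="wb") as file:
        # Assumption: every keyword argument goes to pickle.dump.
        pickle.dump(obj, file, **kwargs)


with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "example.pkl"
    original = {"weights": [0.1, 0.2], "label": "model"}
    to_pickle_sketch(original, pickle_path, protocol=pickle.HIGHEST_PROTOCOL)

    with open(pickle_path, mode="rb") as file:
        restored = pickle.load(file)

    # The built-in open rejects an encoding when the mode is binary.
    try:
        open(pickle_path, mode="wb", encoding="utf-8")
        encoding_rejected = False
    except ValueError:
        encoding_rejected = True
```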
- wip.datatools.io_ops.to_text(data: str, path: str | Path, **kwargs: Any)
Save a string as a text file locally or to DataBricks.
- Parameters
data (str) – The value to save as a text file.
path (str | Path) – The file path where the value is to be saved as text. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments.
- wip.datatools.io_ops.write_lp(prob: pulp.LpProblem, key: str, tmp_path: str | Path) -> Any
Write the linear programming problem to an .lp file.
- Parameters
prob (pulp.LpProblem) – The linear programming problem.
key (str) – The key to identify the problem.
tmp_path (str | Path) – The path to the temporary directory.
- Return type
Any