io_ops
Module with I/O operations that work interchangeably on a local machine and inside Databricks.
The functions in this module are intended to be used instead of the native Python functions for reading and writing files, and instead of pandas I/O functions such as pandas.DataFrame.to_csv, pandas.DataFrame.to_excel, and pandas.read_csv.
In other words, instead of using:
import pandas as pd
df = pd.read_csv('path/to/file.csv')
Use:
from wip.datatools.io_ops import read_csv
df = read_csv('path/to/file.csv')
The same logic applies to other functions, such as pandas.DataFrame.to_csv.
Notes
All functions inside this module should be able to handle both local and ABFS filepaths. In other words, they should be able to handle both:
from wip.datatools.io_ops import read_csv
df = read_csv('path/to/file.csv')
# Or:
df = read_csv('abfss://insight@usazu1valesa001.dfs.core.windows.net/path/to/file.csv')
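How the module tells the two kinds of path apart is not shown on this page. As a minimal sketch, assuming the distinction is made on the URI scheme, a check could look like the following. The helper name is_abfs_path and the ABFS_SCHEMES set are hypothetical and not part of io_ops.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical helper, not part of io_ops: classify a path by its URI scheme.
ABFS_SCHEMES = {"abfs", "abfss"}


def is_abfs_path(path):
    """Return True if `path` looks like an ABFS URI (abfs:// or abfss://)."""
    return urlparse(str(path)).scheme.lower() in ABFS_SCHEMES


local_result = is_abfs_path(Path("path/to/file.csv"))
remote_result = is_abfs_path(
    "abfss://insight@usazu1valesa001.dfs.core.windows.net/path/to/file.csv"
)
print(local_result, remote_result)
```

A reader function could branch on such a check and pick a local or a remote file opener, while keeping a single public signature for both cases.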
- wip.datatools.io_ops.read_csv(path: str | Path, **kwargs: Any) → pd.DataFrame
Read a CSV file and convert it to a pandas.DataFrame.
This function works in both local and Databricks environments.
- Parameters
path (str | Path) – The file path where the CSV data is stored. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments passed to the pandas.read_csv function.
- Returns
The CSV data converted to a pandas.DataFrame.
- Return type
pd.DataFrame
- wip.datatools.io_ops.read_joblib(path: str | Path) → Any
Read .joblib files from a local directory or Databricks.
The function automatically detects whether the code is running locally or inside Databricks and reads the file accordingly.
- Parameters
path (str | Path) – The path to the .joblib file.
- Returns
The .joblib file contents.
- Return type
Any
- wip.datatools.io_ops.read_json(path: str | Path, **kwargs: Any) → dict | list
Read a JSON file and convert it to a Python object.
This function works in both local and Databricks environments.
- Parameters
path (str | Path) – The file path where the JSON data is stored. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments. If ‘encoding’ is not specified, it defaults to ‘utf-8’. Other kwargs are passed to the open and json.load functions.
- Returns
The JSON data converted to a Python object.
- Return type
dict | list
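The kwargs handling described above can be sketched for the local case as follows. This is only an illustration: read_json_local and the OPEN_KWARGS set are hypothetical, and how the real read_json divides keyword arguments between open and json.load is an assumption.

```python
import json
import tempfile
from pathlib import Path

# Assumption: these keyword arguments are routed to open(); everything else
# goes to json.load. The real io_ops.read_json may split them differently.
OPEN_KWARGS = {"buffering", "errors", "newline"}


def read_json_local(path, **kwargs):
    """Local-only sketch of a JSON reader that defaults to UTF-8."""
    encoding = kwargs.pop("encoding", "utf-8")
    open_kwargs = {k: kwargs.pop(k) for k in list(kwargs) if k in OPEN_KWARGS}
    with open(path, "r", encoding=encoding, **open_kwargs) as fp:
        return json.load(fp, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_text('{"price": 1.5, "items": [1, 2]}', encoding="utf-8")
    # parse_float is a json.load argument, so it is forwarded untouched.
    loaded = read_json_local(sample, parse_float=str)

print(loaded)
```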
- wip.datatools.io_ops.read_local_datasets_df_sql() → Tuple[Dict[str, DataFrame], DataFrame]
Read the datasets and df_sql files from the local filesystem.
- Returns
A tuple with the datasets and df_sql files.
- Return type
Tuple[Dict[str, pd.DataFrame], pd.DataFrame]
- Raises
RuntimeError – If the code is being executed inside Databricks.
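The environment check behind this RuntimeError is not documented here. One common approach, shown below purely as an assumption, is to look for the DATABRICKS_RUNTIME_VERSION environment variable that Databricks runtimes set. The names running_in_databricks and require_local are hypothetical.

```python
import os


def running_in_databricks(environ=None):
    """Assumption: Databricks runtimes define DATABRICKS_RUNTIME_VERSION."""
    environ = os.environ if environ is None else environ
    return "DATABRICKS_RUNTIME_VERSION" in environ


def require_local(environ=None):
    """Raise RuntimeError when called from inside Databricks."""
    if running_in_databricks(environ):
        raise RuntimeError("This function can only be used outside Databricks.")


# Simulate both environments with plain dictionaries instead of os.environ.
require_local({})  # passes silently
try:
    require_local({"DATABRICKS_RUNTIME_VERSION": "13.3"})
    guard_raised = False
except RuntimeError:
    guard_raised = True
print(guard_raised)
```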
- wip.datatools.io_ops.read_text(path: str | Path, mode: str = 'r', encoding: str = 'utf-8', **kwargs: Any) → str
Read a text file and convert it to a string.
This function works in both local and Databricks environments.
- Parameters
path (str | Path) – The file path where the text data is stored. It can be either a string or a pathlib.Path object.
mode (str, default "r") – The mode in which the file is opened. Possible values are: ‘r’, ‘r+’, ‘rb’.
encoding (str, default "utf-8") – The encoding used to read the file. Choosing the right encoding ensures that the file is decoded correctly. Possible values include ‘utf-8’, ‘utf-16’, and ‘latin-1’. See the Notes section for more information.
kwargs (Any) – Additional keyword arguments. If ‘encoding’ is not specified, it defaults to ‘utf-8’. Other kwargs are passed to the open and json.load functions.
- Returns
The text data converted to a string.
- Return type
str
Notes
For a list of standard encodings that Python supports, see:
https://docs.python.org/3.11/library/codecs.html#standard-encodings
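To illustrate why the encoding parameter matters, the sketch below writes a Latin-1 file and reads it back with two different encodings. read_text_local is a hypothetical local stand-in that mirrors only the path and encoding parameters documented above. It is not the io_ops implementation.

```python
import tempfile
from pathlib import Path


def read_text_local(path, encoding="utf-8"):
    """Local-only stand-in: read a text file with the given encoding."""
    with open(path, "r", encoding=encoding) as fp:
        return fp.read()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latin1.txt"
    # Store the text as Latin-1 bytes, where "ã" is the single byte 0xE3.
    path.write_bytes("São Paulo".encode("latin-1"))

    decoded = read_text_local(path, encoding="latin-1")

    # The default UTF-8 decoder rejects the lone 0xE3 byte.
    try:
        read_text_local(path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

print(decoded, utf8_failed)
```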
- wip.datatools.io_ops.to_csv(data: pd.DataFrame, path: str | Path, **kwargs: Any)
Save a pandas.DataFrame as a CSV file locally or to Databricks.
The function automatically detects whether the code is running locally or inside Databricks and saves the CSV file accordingly.
- Parameters
data (pd.DataFrame) – The pandas DataFrame to save as a CSV file.
path (str | Path) – Where to save the resulting CSV file.
kwargs (Any) – Keyword arguments to pass to the pandas.DataFrame.to_csv method.
- wip.datatools.io_ops.to_excel(data: pd.DataFrame, path: str | Path, **kwargs: Any)
Save a pandas.DataFrame as an Excel file locally or to Databricks.
The function automatically detects whether the code is running locally or inside Databricks and saves the Excel file accordingly.
- Parameters
data (pd.DataFrame) – The pandas DataFrame to save as an Excel file.
path (str | Path) – Where to save the resulting Excel file.
kwargs (Any) – Keyword arguments to pass to the pandas.DataFrame.to_excel method.
- wip.datatools.io_ops.to_joblib(obj: object, path: str | Path, **kwargs: Any)
Save an object as a joblib file locally or to Databricks.
The function automatically detects whether the code is running locally or inside Databricks and saves the joblib file accordingly.
- Parameters
obj (object) – The object to save as a joblib file.
path (str | Path) – Where to save the resulting joblib file.
kwargs (Any) – Keyword arguments to pass to the joblib.dump function.
- wip.datatools.io_ops.to_json(data: dict | list | str | int | float | bool | NoneType, path: str | Path, **kwargs: Any)
Convert and save data to a JSON file.
This function takes data of the supported types, converts it to JSON, and writes it to the file specified by path. The function supports additional keyword arguments that are passed to the file open function.
- Parameters
data (dict | list | str | int | float | bool | NoneType) – The data to be converted to JSON. This can be a dictionary, list, string, integer, float, boolean, or None.
path (str | Path) – The file path where the JSON data should be stored. Can be a string or Path object.
**kwargs (Any) – Additional keyword arguments. If ‘encoding’ is not specified, it defaults to ‘utf-8’. Other kwargs are passed to the open and json.dump functions.
Examples
>>> data = {"name": "John", "age": 30, "city": "New York"}
>>> to_json(data, 'path/to/file.json')  # Saves the data in JSON format at the specified file path.
>>> to_json(["apple", "banana", "cherry"], 'path/to/list.json', encoding='ascii')  # Saves the list as JSON in ASCII encoding.
Notes
The function uses json.dump for serialization. Custom serialization can be handled by passing a custom cls parameter in kwargs if needed.
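As an illustration of the custom cls mechanism mentioned in the note, the sketch below calls json.dump directly with a json.JSONEncoder subclass. DateEncoder is a hypothetical encoder written for this example. Whether to_json forwards cls exactly this way is inferred from the note, not verified.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize datetime.date values as ISO strings."""

    def default(self, o):
        if isinstance(o, date):
            return o.isoformat()
        # Fall back to the base class, which raises TypeError for unknown types.
        return super().default(o)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.json"
    with open(path, "w", encoding="utf-8") as fp:
        json.dump({"run_date": date(2024, 1, 31)}, fp, cls=DateEncoder)
    written = path.read_text(encoding="utf-8")

print(written)
```

If to_json passes cls through to json.dump as the note suggests, the equivalent call would be to_json(data, path, cls=DateEncoder).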
- wip.datatools.io_ops.to_lp(prob: pulp.LpProblem, path: str | Path)
Write a linear programming problem to an .lp file.
- Parameters
prob (pulp.LpProblem) – The linear programming problem.
path (str | Path) – The path to the .lp file.
- wip.datatools.io_ops.to_mps(prob: pulp.LpProblem, path: str | Path)
Write a linear programming problem to an .mps file.
- Parameters
prob (pulp.LpProblem) – The linear programming problem.
path (str | Path) – The path to the .mps file.
- wip.datatools.io_ops.to_pickle(obj: object, path: str | Path, **kwargs: Any)
Save an object as a pickle file locally or to Databricks.
- Parameters
obj (object) – The object to save as a pickle file.
path (str | Path) – The file path where the object is to be saved as a pickle file. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments. This function saves objects in binary mode (“wb”), so no ‘encoding’ should be specified, because binary mode does not accept one. Other kwargs are passed to the open and pickle.dump functions.
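The binary-mode constraint can be seen with the standard library alone. The sketch below uses to_pickle_local, a hypothetical local stand-in that forwards its kwargs only to pickle.dump. The real to_pickle may also route some of them to open.

```python
import pickle
import tempfile
from pathlib import Path


def to_pickle_local(obj, path, **kwargs):
    """Local-only sketch: open in binary mode, forward kwargs to pickle.dump."""
    with open(path, "wb") as fp:
        pickle.dump(obj, fp, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "params.pkl"
    original = {"alpha": 0.1, "features": ["a", "b"]}
    to_pickle_local(original, path, protocol=pickle.HIGHEST_PROTOCOL)

    with open(path, "rb") as fp:
        restored = pickle.load(fp)

    # open() refuses an encoding in binary mode and raises ValueError.
    try:
        open(path, "wb", encoding="utf-8")
        encoding_rejected = False
    except ValueError:
        encoding_rejected = True

print(restored == original, encoding_rejected)
```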
- wip.datatools.io_ops.to_text(data: str, path: str | Path, **kwargs: Any)
Save a string as a text file locally or to Databricks.
- Parameters
data (str) – The value to save as a text file.
path (str | Path) – The file path where the value is to be saved as text. It can be either a string or a pathlib.Path object.
kwargs (Any) – Additional keyword arguments.
- wip.datatools.io_ops.write_lp(prob: pulp.LpProblem, key: str, tmp_path: str | Path) → Any
Write the linear programming problem to an .lp file.
- Parameters
prob (pulp.LpProblem) – The linear programming problem.
key (str) – The key to identify the problem.
tmp_path (str | Path) – The path to the temporary directory.
- Return type
Any