utils#
Utility functions for general-purpose tasks.
This module contains the following utility functions:
is_running_on_databricks
: Check if the code is running locally or on Databricks.
get_spark_context
: Get the Spark context.
find_filepath
: Find a file or folder in the initial_dir directory or its parent directories.
remove_files
: Remove files from a directory matching a specified pattern.
display_files
: Display tables of removed and not removed files in a given directory.
- wip.utils.dbutils_glob(pattern: str)[source]#
Perform a glob-like pattern matching for files in ABFSS using dbutils.fs.
- Parameters
pattern (str) – The glob pattern to match against file names. Supports ‘*’ and ‘?’ wildcards.
- Returns
A list of matched file paths in ABFSS.
- Return type
List[str]
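The matching step can be sketched with the standard library’s fnmatch, which supports the same ‘*’ and ‘?’ wildcards. This is a minimal illustration only: in the real function the candidate paths come from dbutils.fs, which is not shown here, and glob_match is a hypothetical helper, not part of wip.utils.

```python
from fnmatch import fnmatch
from typing import List

def glob_match(paths: List[str], pattern: str) -> List[str]:
    # Keep only the paths whose full string matches the glob pattern.
    # fnmatch supports '*' and '?' wildcards, like dbutils_glob.
    return [p for p in paths if fnmatch(p, pattern)]
```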
- wip.utils.display_files(removed_files: List[str], not_removed_files: List[str])[source]#
Display tables of removed and not removed files in a given directory.
This function creates and displays two tables:
- One for files that were successfully removed.
- One for files that were not removed from the specified directory.
The tables include file names and directory paths.
- Parameters
removed_files (List[str]) – List of file paths that were successfully removed.
not_removed_files (List[str]) – List of file paths that were not removed.
Notes
This function uses rich.console.Console and rich.table.Table for displaying the tables in a formatted manner. It relies on logger for logging the number of removed and not removed files.
Examples
>>> display_files(["/path/to/dir/removed.txt"], []) # This will display a table of removed files.
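For illustration, the table content can be assembled with pathlib alone. This is a plain-text stand-in for the rich output described above: format_file_table is a hypothetical helper, not part of wip.utils, and the real function renders with rich.table.Table instead of tab-separated text.

```python
from pathlib import Path
from typing import List

def format_file_table(title: str, files: List[str]) -> str:
    # Build one row per file: file name, then its directory path,
    # mirroring the two columns the rich table displays.
    rows = [f"{Path(f).name}\t{Path(f).parent}" for f in files]
    return "\n".join([title, *rows])
```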
- wip.utils.exists(path: str | Path) bool [source]#
Check if a file or directory exists locally or on Databricks.
- Parameters
path (str | Path) – The file path to check.
- Returns
Whether the file or directory exists.
- Return type
bool
- wip.utils.find_filepath(filename: str | Path, initial_dir: str | Path | None = None, max_upper_dirs: int = 4) Path [source]#
Find a file or folder in the initial_dir directory or its parent directories.
- Parameters
filename (str | Path) – The filename to find.
initial_dir (str | Path | None) – The initial directory to start searching from. If None, the current directory is used.
max_upper_dirs (int, default 4) – The maximum number of parent directories to search. Note that increasing the maximum number of parent directories to search can increase search time exponentially.
- Returns
The path to the file.
- Return type
Path
- Raises
If one of the following occurs:
- The file isn’t found.
- The initial directory doesn’t exist.
- The initial directory is a file.
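The upward search can be sketched as follows. This is a hypothetical re-implementation for illustration, not the library’s actual code: it searches each directory recursively, then moves one level up, at most max_upper_dirs times (each step up widens the recursive search, which is why search time can grow quickly).

```python
from pathlib import Path

def find_filepath_sketch(filename, initial_dir=None, max_upper_dirs=4):
    # Start from initial_dir (or the current directory), searching
    # recursively; on a miss, retry from the parent directory.
    current = Path(initial_dir) if initial_dir is not None else Path.cwd()
    for _ in range(max_upper_dirs + 1):
        matches = list(current.rglob(str(filename)))
        if matches:
            return matches[0]
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    raise FileNotFoundError(f"{filename!r} not found")
```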
- wip.utils.get_dbutils()[source]#
Get the Databricks dbutils module.
- Returns
The Databricks dbutils module, which contains modules like fs.
- Return type
ModuleType
- wip.utils.get_function_kwargs(func: Callable, **kwargs) Tuple[Dict[str, Any], Dict[str, Any]] [source]#
Return a dictionary of the keyword arguments accepted by a given function, together with a dictionary of the remaining keyword arguments.
- Parameters
func (Callable) – The function whose keyword arguments are to be retrieved.
kwargs (Any) – Keyword arguments to pass to the function.
- Returns
A dictionary of keyword arguments accepted by the given function and another dictionary with the remaining keyword arguments.
- Return type
Tuple[Dict[str, Any], Dict[str, Any]]
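The splitting logic can be sketched with inspect.signature. This is a hypothetical re-implementation (split_kwargs is not part of wip.utils): keyword arguments whose names appear in the function’s signature go into the first dictionary, everything else into the second.

```python
import inspect

def split_kwargs(func, **kwargs):
    # Parameter names the function accepts, per its signature.
    accepted = set(inspect.signature(func).parameters)
    matched = {k: v for k, v in kwargs.items() if k in accepted}
    rest = {k: v for k, v in kwargs.items() if k not in accepted}
    return matched, rest
```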
- wip.utils.get_function_parameters(func: Callable) List[str] [source]#
Return a list of parameter names accepted by a given function.
- Parameters
func (Callable) – The function whose parameters are to be retrieved.
- Returns
A list of parameter names.
- Return type
List[str]
- wip.utils.get_spark_context()[source]#
Get the Spark context.
- Returns
The Spark context.
- Return type
pyspark.context.SparkContext
- wip.utils.is_running_on_databricks() bool [source]#
Check if the code is running locally or on Azure Databricks.
The function checks whether the environment variable DATABRICKS_RUNTIME_VERSION exists. If it does, the code is running on Azure Databricks.
- Returns
True if running on Azure Databricks, False otherwise.
- Return type
bool
- wip.utils.remove_files(directory: str | Path, pattern: str, verbose: bool = False) Tuple[List[str], List[str]] [source]#
Remove files from a directory matching a specified pattern.
This function attempts to delete files in a specified directory that match a given pattern. It returns lists of both removed and not removed files. If the directory does not exist or is not a directory, it logs an error.
- Parameters
directory (str | Path) – The directory from which files are to be removed. Accepts either a string path or a Path object.
pattern (str) – The pattern used to match files for removal, e.g., ‘*.txt’, or the name of the file to remove.
verbose (bool, default False) – If True, displays tables of removed and not removed files.
- Returns
A tuple containing two lists:
- The first list contains paths of files successfully removed.
- The second list contains paths of files that were not removed.
- Return type
Tuple[List[str], List[str]]
- Raises
Exception – General exceptions are caught and logged if file removal fails.
Examples
>>> remove_files("/path/to/dir", "*.txt")
(['/path/to/dir/file1.txt', '/path/to/dir/file2.txt'], [])
>>> remove_files("/path/to/dir", "**/*.txt")
(['/path/to/dir/folder1/file1.txt', '/path/to/dir/folder2/file2.txt'], [])
>>> remove_files("/path/to/dir", "file1.txt")
(['/path/to/dir/file1.txt'], [])
Notes
This function logs errors and exceptions using the logger from wip.logging_config. It uses Path from pathlib for path manipulations and checks.
Added in version 2.4.0: Include the remove_files_databricks function for removing files from ABFSS paths in Databricks.
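The removal loop can be sketched as below. This is a hypothetical, simplified version for illustration (remove_files_sketch is not part of wip.utils): the real function additionally logs errors and can display tables of results.

```python
from pathlib import Path

def remove_files_sketch(directory, pattern):
    # Match files with the glob pattern and try to delete each one,
    # collecting successes and failures separately.
    directory = Path(directory)
    removed, not_removed = [], []
    for path in directory.glob(pattern):
        try:
            path.unlink()
            removed.append(str(path))
        except OSError:
            not_removed.append(str(path))
    return removed, not_removed
```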
- wip.utils.remove_files_databricks(directory: str | Path, pattern: str, verbose: bool = True) Tuple[List[str], List[str]] [source]#
Remove files from a Storage Account container path in Databricks.
This function attempts to delete files in a specified directory that match a given pattern. It returns lists of both removed and not removed files. If the directory does not exist or is not a directory, it logs an error.
- Parameters
directory (str | Path) – The directory from which files are to be removed. Accepts either a string path or a Path object.
pattern (str) – The pattern used to match files for removal, e.g., ‘*.txt’, or the name of the file to remove.
verbose (bool, default True) – If True, displays tables of removed and not removed files.
- Returns
A tuple containing two lists:
- The first list contains paths of files successfully removed.
- The second list contains paths of files that were not removed.
- Return type
Tuple[List[str], List[str]]
- Raises
Exception – General exceptions are caught and logged if file removal fails.
Notes
This function assumes it’s running in a Databricks environment. It uses Databricks’ dbutils.fs module to interact with ABFSS paths.
Changed in version 2.8.9: Added a try/except clause to check whether the path being accessed actually exists inside the Azure container.