utils
Utility functions for general-purpose tasks.
This module contains the following utility functions:
- is_running_on_databricks: Check if the code is running locally or on Databricks.
- get_spark_context: Get the Spark context.
- find_filepath: Find a file or folder in the initial_dir directory or its parent directories.
- remove_files: Remove files from a directory matching a specified pattern.
- display_files: Display tables of removed and not removed files in a given directory.
- wip.utils.dbutils_glob(pattern: str)
Perform glob-like pattern matching for files in ABFSS using dbutils.fs.
- Parameters
pattern (str) – The glob pattern to match against file names. Supports '*' and '?' wildcards.
- Returns
A list of matched file paths in ABFSS.
- Return type
List[str]
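The '*' and '?' wildcard semantics correspond to Python's fnmatch module. A minimal local sketch of the matching step, with the dbutils.fs.ls listing replaced by a hard-coded sample (an assumption for illustration, since dbutils only exists inside a Databricks runtime):

```python
from fnmatch import fnmatch
from typing import List

def glob_match(paths: List[str], pattern: str) -> List[str]:
    """Filter file paths with '*' and '?' wildcards, as dbutils_glob does."""
    return [p for p in paths if fnmatch(p, pattern)]

# In Databricks, the path list would come from dbutils.fs.ls(...);
# here we use a hard-coded sample instead.
sample = [
    "abfss://data/file1.txt",
    "abfss://data/file2.csv",
    "abfss://data/file10.txt",
]
print(glob_match(sample, "*.txt"))
```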
- wip.utils.display_files(removed_files: List[str], not_removed_files: List[str])
Display tables of removed and not removed files in a given directory.
This function creates and displays two tables:
- One for files that were successfully removed.
- One for files that were not removed from the specified directory.
The tables include file names and directory paths.
- Parameters
removed_files (List[str]) – List of file paths that were successfully removed.
not_removed_files (List[str]) – List of file paths that were not removed.
Notes
This function uses rich.console.Console and rich.table.Table for displaying the tables in a formatted manner. It relies on logger for logging the number of removed and not removed files.
Examples
>>> display_files(["/path/to/dir/removed.txt"], [])  # Displays a table of removed files.
- wip.utils.exists(path: str | Path) → bool
Check if a file or directory exists locally or in Databricks.
- Parameters
path (str | Path) – The file path to check.
- Returns
Whether the file or directory exists.
- Return type
bool
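A minimal sketch of how such a dual check could work. The dbutils.fs.ls call is an assumption about the Databricks branch (dbutils is injected by the Databricks runtime); off-platform only the pathlib branch runs:

```python
import os
from pathlib import Path

def exists(path) -> bool:
    """Check whether a file or directory exists, locally or on Databricks."""
    if "DATABRICKS_RUNTIME_VERSION" in os.environ:
        try:
            # dbutils is provided by the Databricks runtime; listing a
            # nonexistent path raises, which we treat as "does not exist".
            dbutils.fs.ls(str(path))  # noqa: F821
            return True
        except Exception:
            return False
    return Path(path).exists()
```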
- wip.utils.find_filepath(filename: str | Path, initial_dir: str | Path | None = None, max_upper_dirs: int = 4) → Path
Find a file or folder in the initial_dir directory or its parent directories.
- Parameters
filename (str | Path) – The filename to find.
initial_dir (str | Path | None) – The initial directory to start searching from. If None, the current directory is used.
max_upper_dirs (int, default 4) – The maximum number of parent directories to search. Note that increasing this value can increase search time exponentially.
- Returns
The path to the file.
- Return type
Path
- Raises
If one of the following occurs:
- The file isn't found.
- The initial directory doesn't exist.
- The initial directory is a file.
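One plausible implementation of this upward search, sketched with pathlib. The exception type is not specified in the documentation above, so FileNotFoundError here is an assumption:

```python
from pathlib import Path

def find_filepath(filename, initial_dir=None, max_upper_dirs=4) -> Path:
    """Search initial_dir, its subdirectories, and up to max_upper_dirs
    parent directories for filename."""
    start = Path(initial_dir) if initial_dir is not None else Path.cwd()
    current = start
    for _ in range(max_upper_dirs + 1):
        # rglob searches the current directory and all of its subdirectories,
        # which is why widening the search upward grows the cost quickly.
        match = next(current.rglob(str(filename)), None)
        if match is not None:
            return match
        current = current.parent
    raise FileNotFoundError(f"{filename} not found near {start}")
```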
- wip.utils.get_dbutils()
Get the Databricks dbutils module.
- Returns
The Databricks dbutils module, which contains submodules like fs.
- Return type
ModuleType
- wip.utils.get_function_kwargs(func: Callable, **kwargs) → Tuple[Dict[str, Any], Dict[str, Any]]
Return a dictionary of keyword arguments accepted by a given function.
- Parameters
func (Callable) – The function whose keyword arguments are to be retrieved.
kwargs (Any) – Keyword arguments to pass to the function.
- Returns
A dictionary of the keyword arguments accepted by the function and another dictionary with the remaining keyword arguments.
- Return type
Tuple[Dict[str, Any], Dict[str, Any]]
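This kind of kwargs splitting is typically done with inspect.signature; a minimal sketch of the approach (not necessarily the actual implementation):

```python
import inspect
from typing import Any, Callable, Dict, Tuple

def get_function_kwargs(func: Callable, **kwargs) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into those accepted by func and the remainder."""
    params = inspect.signature(func).parameters
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rest = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rest

def scale(x, factor=2):
    return x * factor

# 'color' is not a parameter of scale, so it lands in the second dict.
print(get_function_kwargs(scale, factor=3, color="red"))
# → ({'factor': 3}, {'color': 'red'})
```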
- wip.utils.get_function_parameters(func: Callable) → List[str]
Return a list of parameter names accepted by a given function.
- Parameters
func (Callable) – The function whose parameters are to be retrieved.
- Returns
A list of parameter names.
- Return type
List[str]
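A short sketch of how parameter names can be retrieved with inspect (an assumption about the implementation; signature order follows declaration order):

```python
import inspect
from typing import Callable, List

def get_function_parameters(func: Callable) -> List[str]:
    """Return the names of func's parameters, in declaration order."""
    return list(inspect.signature(func).parameters)

def greet(name, punctuation="!"):
    return f"Hello, {name}{punctuation}"

print(get_function_parameters(greet))  # → ['name', 'punctuation']
```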
- wip.utils.get_spark_context()
Get the Spark context.
- Returns
The Spark context.
- Return type
pyspark.context.SparkContext
- wip.utils.is_running_on_databricks() → bool
Check if the code is running locally or on Azure Databricks.
The function checks whether the environment variable DATABRICKS_RUNTIME_VERSION exists. If it does, the code is running on Azure Databricks.
- Returns
True if running on Azure Databricks, False otherwise.
- Return type
bool
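The check described above reduces to a single environment-variable lookup; a minimal sketch:

```python
import os

def is_running_on_databricks() -> bool:
    """Databricks runtimes export DATABRICKS_RUNTIME_VERSION; local machines do not."""
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

print(is_running_on_databricks())  # False on a local machine
```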
- wip.utils.remove_files(directory: str | Path, pattern: str, verbose: bool = False) → Tuple[List[str], List[str]]
Remove files from a directory matching a specified pattern.
This function attempts to delete files in a specified directory that match a given pattern. It returns lists of both removed and not removed files. If the directory does not exist or is not a directory, it logs an error.
- Parameters
directory (str | Path) – The directory from which files are to be removed. Accepts either a string path or a Path object.
pattern (str) – The pattern used to match files for removal, e.g., '*.txt', or the name of the file to remove.
verbose (bool, default False) – If True, displays tables of removed and not removed files.
- Returns
A tuple containing two lists:
- The first list contains paths of files successfully removed.
- The second list contains paths of files that were not removed.
- Return type
Tuple[List[str], List[str]]
- Raises
Exception – General exceptions are caught and logged if file removal fails.
Examples
>>> remove_files("/path/to/dir", "*.txt")
(['/path/to/dir/file1.txt', '/path/to/dir/file2.txt'], [])
>>> remove_files("/path/to/dir", "**/*.txt")
(['/path/to/dir/folder1/file1.txt', '/path/to/dir/folder2/file2.txt'], [])
>>> remove_files("/path/to/dir", "file1.txt")
(['/path/to/dir/file1.txt'], [])
Notes
This function logs errors and exceptions using the logger from wip.logging_config. It uses Path from pathlib for path manipulations and checks.
New in version 2.4.0: Added the remove_files_databricks function for removing files from ABFSS paths in Databricks.
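The behavior described above can be sketched with Path.glob and Path.unlink; a simplified version without the logging and verbose table output:

```python
from pathlib import Path
from typing import List, Tuple

def remove_files(directory, pattern, verbose=False) -> Tuple[List[str], List[str]]:
    """Delete files under directory matching a glob pattern;
    report removed and not-removed paths."""
    removed: List[str] = []
    not_removed: List[str] = []
    directory = Path(directory)
    if not directory.is_dir():
        # The real function logs an error here; the sketch just returns.
        return removed, not_removed
    for path in directory.glob(pattern):
        if not path.is_file():
            continue
        try:
            path.unlink()
            removed.append(str(path))
        except OSError:
            # Permission errors and the like leave the file in place.
            not_removed.append(str(path))
    return removed, not_removed
```

Because Path.glob understands the recursive '**' pattern, the same code covers the nested-directory example shown above.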
- wip.utils.remove_files_databricks(directory: str | Path, pattern: str, verbose: bool = True) → Tuple[List[str], List[str]]
Remove files from a Storage Account container path in Databricks.
This function attempts to delete files in a specified directory that match a given pattern. It returns lists of both removed and not removed files. If the directory does not exist or is not a directory, it logs an error.
- Parameters
directory (str | Path) – The directory from which files are to be removed. Accepts either a string path or a Path object.
pattern (str) – The pattern used to match files for removal, e.g., '*.txt', or the name of the file to remove.
verbose (bool, default True) – If True, displays tables of removed and not removed files.
- Returns
A tuple containing two lists:
- The first list contains paths of files successfully removed.
- The second list contains paths of files that were not removed.
- Return type
Tuple[List[str], List[str]]
- Raises
Exception – General exceptions are caught and logged if file removal fails.
Notes
This function assumes it's running in a Databricks environment. It uses Databricks' dbutils.fs module to interact with ABFSS paths.
Changed in version 2.8.9: Added a try/except clause to check whether the path being accessed actually exists inside the Azure container.