fs_utils
Utilities for file system operations
CopyPolicy
Bases: str, Enum
How to handle cases in which Pipewine detects that the data being written replicates existing data. In these cases, some operations can be skipped, trading off data consistency for speed.
Source code in pipewine/sinks/fs_utils.py
HARD_LINK = 'HARD_LINK' (class-attribute, instance-attribute)
Hard links are similar to symbolic links in terms of speed and safety. The difference is that a hard link is an additional directory entry for the same inode as the linked file, rather than a reference to the linked file's path.
Soft link: MY_LINK --> MY_FILE --> INODE
Hard link: MY_FILE --> INODE <-- MY_LINK
While this does not offer any protection against modifications of the original file (because the data is not actually replicated), it prevents links from breaking if the original file is moved, renamed, or even deleted. Hard links are also faster than symbolic links, making this option preferable. The only limitation is that a hard link can only be created when the link and the linked file reside on the same file system.
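The inode-sharing behavior described above can be checked with the Python standard library alone. The following is a minimal sketch, not Pipewine code: it assumes a file system that supports hard links (most POSIX file systems and NTFS do), and the names MY_FILE and MY_LINK simply mirror the diagram.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "MY_FILE"
    link = Path(tmp) / "MY_LINK"
    original.write_text("payload")

    # A hard link is a second directory entry for the same inode.
    os.link(original, link)
    same_inode = original.stat().st_ino == link.stat().st_ino
    link_count = original.stat().st_nlink  # two names now refer to the inode

    # No data was replicated: an in-place write via one name is visible via the other.
    original.write_text("modified")
    seen_through_link = link.read_text()

    # Deleting the original name only removes one entry; the data survives.
    original.unlink()
    survives_delete = link.read_text()

print(same_inode, link_count, seen_through_link, survives_delete)
```

Note that the in-place modification shows through the link, which is exactly the lack of protection mentioned above, while the deletion does not break it.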
REPLICATE = 'REPLICATE' (class-attribute, instance-attribute)
Avoid the serialization (which can be expensive) and simply copy the existing file, replicating all of its content. This option is significantly faster than re-serializing everything every time, but may cause data corruption in case of read-write-read races. For example, Pipewine reads the file, some other process modifies it, then Pipewine copies the modified file assuming it was not modified. Pipewine datasets should not be modified in place, and Pipewine never does so unless explicitly configured to, making this option relatively safe to use.
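The race described above can be reproduced in a few lines. This is a hedged sketch rather than Pipewine's implementation: it assumes that a REPLICATE-style write amounts to a plain byte copy, modeled here with `shutil.copyfile`, and it simulates the "other process" with a second write in the same script.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sample.txt"
    dest = Path(tmp) / "copy.txt"
    source.write_text("v1")

    # 1. The item is read and assumed to stay unchanged.
    loaded = source.read_text()

    # 2. Another process (simulated) modifies the file in place.
    source.write_text("v2")

    # 3. REPLICATE-style write (assumed: a byte copy of what is on disk now).
    shutil.copyfile(source, dest)
    written = dest.read_text()

    # The copy is a separate file with its own inode.
    independent = source.stat().st_ino != dest.stat().st_ino

print(loaded, written, independent)
```

The output file holds the externally modified content rather than the value that was read, which is the corruption scenario the paragraph warns about; once written, however, the copy no longer depends on the source.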
REWRITE = 'REWRITE' (class-attribute, instance-attribute)
Do not copy anything, ever, even if the data is untouched. Treat every write alike: serialize the object, encode it, and write it to a new file. This is the slowest option but also the safest.
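Because a REWRITE-style write encodes the in-memory object instead of touching the source file, it is immune to the race described for REPLICATE. The sketch below illustrates this under stated assumptions: JSON is an arbitrary illustrative format, and the "other process" is simulated by a second write in the same script; this is not Pipewine's serialization code.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sample.json"
    dest = Path(tmp) / "rewritten.json"
    source.write_text(json.dumps({"label": 1}))

    # Load the item into memory (JSON chosen only for illustration).
    obj = json.loads(source.read_text())

    # Another process (simulated) modifies the source after it was read.
    source.write_text(json.dumps({"label": 2}))

    # REWRITE-style write: serialize, encode, and write a brand-new file.
    data = json.dumps(obj).encode("utf-8")
    dest.write_bytes(data)

    result = json.loads(dest.read_text())

print(result)
```

The cost is a full serialization and encoding pass on every write, which is why this option is the slowest.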
SYMBOLIC_LINK = 'SYMBOLIC_LINK' (class-attribute, instance-attribute)
Symbolic links (a.k.a. soft links) are simply references to another file, which may live on any mounted file system, including a different one from the link itself. They are virtually free to create, but do not contain any replicated data. They are prone to cause data loss: whenever the original file is modified, the change is visible through the link, and whenever it is deleted, renamed, or moved, the link is left dangling, pointing to a path that no longer exists. This option is extremely unsafe; use it at your own risk.
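A dangling link is easy to reproduce. This minimal sketch (not Pipewine code) uses `os.symlink` on a temporary file; it assumes a platform where the current user may create symbolic links, which on Windows can require Developer Mode or elevated privileges.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "MY_FILE"
    link = Path(tmp) / "MY_LINK"
    original.write_text("payload")

    # A symbolic link stores only the target path, not the data.
    os.symlink(original, link)
    target = os.readlink(link)
    via_link = link.read_text()

    # Renaming the original leaves the link pointing at a missing path.
    original.rename(Path(tmp) / "MOVED")
    is_link = link.is_symlink()  # the link entry itself still exists
    resolves = link.exists()     # exists() follows the link, so this is False

print(target, via_link, is_link, resolves)
```

Unlike the hard link case, the data is still intact under its new name, but any dataset relying on `MY_LINK` can no longer read it.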
write_item_to_file(item, file, copy_policy=CopyPolicy.HARD_LINK)
Write an item to a file, using the specified copy policy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| item | Item | The item to write to the file. | required |
| file | Path | The path to the file to write to. | required |
| copy_policy | CopyPolicy | The copy policy to use when Pipewine infers that the item is a copy of an existing file. Defaults to CopyPolicy.HARD_LINK. | HARD_LINK |
Raises:
| Type | Description |
|---|---|
| IOError | If the item cannot be written to the file. |
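The following is a hedged sketch of how a copy policy might be dispatched when writing. It is not Pipewine's `write_item_to_file`: the helper `write_with_policy`, its `source` parameter (the file the data was read from, when known to be unmodified), and the fallback to a byte copy when a hard link would cross file systems are all hypothetical. The local `CopyPolicy` mirrors the documented members; thanks to the `str` mixin, members compare equal to plain strings and can be parsed from them.

```python
import errno
import os
import shutil
import tempfile
from enum import Enum
from pathlib import Path
from typing import Optional


class CopyPolicy(str, Enum):
    # Mirrors the documented members; the str mixin makes each member a string.
    REWRITE = "REWRITE"
    REPLICATE = "REPLICATE"
    SYMBOLIC_LINK = "SYMBOLIC_LINK"
    HARD_LINK = "HARD_LINK"


def write_with_policy(
    data: bytes,
    source: Optional[Path],
    file: Path,
    copy_policy: CopyPolicy = CopyPolicy.HARD_LINK,
) -> None:
    """Hypothetical helper. Failures surface as OSError (IOError is an alias)."""
    # == (not `is`) so plain strings like "REWRITE" also match, via the str mixin.
    if source is None or copy_policy == CopyPolicy.REWRITE:
        file.write_bytes(data)  # nothing to reuse, or reuse disabled
    elif copy_policy == CopyPolicy.REPLICATE:
        shutil.copyfile(source, file)  # byte copy of what is on disk now
    elif copy_policy == CopyPolicy.SYMBOLIC_LINK:
        os.symlink(source.resolve(), file)  # stores only the absolute path
    elif copy_policy == CopyPolicy.HARD_LINK:
        try:
            os.link(source, file)
        except OSError as exc:
            # Assumed fallback: hard links cannot cross file systems (EXDEV).
            if exc.errno != errno.EXDEV:
                raise
            shutil.copyfile(source, file)
    else:
        raise ValueError(f"Unknown copy policy: {copy_policy!r}")


# Usage: policies can be passed as members or parsed from plain strings.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.bin"
    src.write_bytes(b"abc")

    linked = Path(tmp) / "linked.bin"
    write_with_policy(b"abc", src, linked, CopyPolicy("HARD_LINK"))
    shares_inode = src.stat().st_ino == linked.stat().st_ino

    rewritten = Path(tmp) / "rewritten.bin"
    write_with_policy(b"abc", src, rewritten, CopyPolicy.REWRITE)
    separate_inode = src.stat().st_ino != rewritten.stat().st_ino
```

Re-raising every error other than `EXDEV` keeps genuine failures (for example, a destination that already exists) visible instead of silently overwriting the file with a copy.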