The io/ioutil package has been deprecated as of Go 1.16 [1]. This commit
replaces the existing io/ioutil functions with their new definitions in
the io and os packages.
[1]: https://golang.org/doc/go1.16#ioutil
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* Create a test for an overhead bug
This image size and filesystem overhead combination was experimentally determined
to reproduce bz#2064936 in CI when using ceph/rbd with a Filesystem mode PV, since
the filesystem capacity will be constrained by the PVC request size.
Below is the problem it tries to recreate:
When validating whether an image will fit into a PV we compare the
image's virtual size to the filesystem's reported available space to
gauge whether it will fit. The current calculation reduces the apparent
available space by the configured filesystem overhead value, but the
overhead is already (mostly) factored into the result of Statfs. This
causes the check to fail for PVCs that are just large enough to
accommodate an image plus overhead (i.e. when using the DataVolume
Storage API with filesystem PVs with capacity constrained by the PVC
storage request size).
This was not caught in testing because HPP does not have capacity
constrained PVs and we are typically testing block volumes in the ceph
lanes. It can be triggered in our CI by allocating a Filesystem PV on
ceph-rbd storage because these volumes are capacity constrained and
subject to filesystem overhead.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Fix a target pvc validation bug
Corrects the validation logic for the target volume.
Below is a description of the original problem:
When validating whether an image will fit into a PV we compare the
image's virtual size to the filesystem's reported available space to
gauge whether it will fit. The current calculation reduces the apparent
available space by the configured filesystem overhead value, but the
overhead is already (mostly) factored into the result of Statfs. This
causes the check to fail for PVCs that are just large enough to
accommodate an image plus overhead (i.e. when using the DataVolume
Storage API with filesystem PVs with capacity constrained by the PVC
storage request size).
This was not caught in testing because HPP does not have capacity
constrained PVs and we are typically testing block volumes in the ceph
lanes. It can be triggered in our CI by allocating a Filesystem PV on
ceph-rbd storage because these volumes are capacity constrained and
subject to filesystem overhead.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
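A minimal sketch of the bug described above, with illustrative function names (not the actual CDI code): the buggy path subtracts the configured overhead from a figure that Statfs has already discounted, so a PVC sized exactly "image plus overhead" is rejected.

```go
package main

import "fmt"

// Buggy version: double-counts overhead, because Statfs already reports
// space net of (most of) the filesystem's own bookkeeping.
func validateFitsBuggy(virtualSize, statfsAvailable int64, overhead float64) bool {
	usable := int64(float64(statfsAvailable) * (1 - overhead))
	return virtualSize <= usable
}

// Fixed version: compare directly against what Statfs reports.
func validateFitsFixed(virtualSize, statfsAvailable int64) bool {
	return virtualSize <= statfsAvailable
}

func main() {
	// A PVC sized "image + overhead": Statfs reports just enough room.
	var virtualSize int64 = 1 << 30     // 1 GiB image
	var statfsAvailable int64 = 1 << 30 // exactly 1 GiB reported available
	fmt.Println(validateFitsBuggy(virtualSize, statfsAvailable, 0.055)) // false: spurious failure
	fmt.Println(validateFitsFixed(virtualSize, statfsAvailable))        // true
}
```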
* Improve the warning message
Removed the redundant and misleading part about the PVC size and updated the simplified message.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Remove useless test
The test checks that the validation logic takes the filesystem overhead into
account. The new validation logic does not check the overhead, so the test is
no longer relevant.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Update clone size validation logic
The case with DV using spec.storage API needs
more complex validation that will be added in the
clone controller. The API webhook validation
for that case is removed.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Improve DV phase failure message in tests
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Add test and warning event for clone size
During clone, check whether the actual requested size on the source volume is
bigger than the target's requested size, and emit an event to notify the user
about the situation.
The actual size on the filesystem is lower than requested because of possible
filesystem overhead. When using the storage API, the overhead will be applied
on the target.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Code Review cleanup - Removing debug logs
Removed some garbage left after troubleshooting.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Move fn GetUsableSpace to common utils
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Refactor data-processor.go to allow registering new phases and execution functions.
Signed-off-by: Zhuchen Wang <zcwang@google.com>
* Add test case for unknown phase
Signed-off-by: Zhuchen Wang <zcwang@google.com>
* Correct the fsOverhead calculation in profile
The calculation needs to play well with the actual resize that is done in the data-processor.
Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
* Properly reverse the calculation for overhead.
Signed-off-by: Alexander Wels <awels@redhat.com>
Co-authored-by: Alexander Wels <awels@redhat.com>
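The forward/reverse pair described above can be sketched like this (illustrative names and the 0.055 overhead default; the real profile code differs):

```go
package main

import (
	"fmt"
	"math"
)

// Forward direction: usable space left for the image once the configured
// filesystem overhead is reserved out of the requested size.
func usableSpace(requested int64, overhead float64) int64 {
	return int64(float64(requested) * (1 - overhead))
}

// Reverse direction (the "properly reverse" fix, sketched): the request
// needed so that, after the overhead is taken back out, the image still
// fits. Rounding up matters here, or the round trip can come out short.
func requiredSpace(virtualSize int64, overhead float64) int64 {
	return int64(math.Ceil(float64(virtualSize) / (1 - overhead)))
}

func main() {
	const overhead = 0.055
	var imageSize int64 = 5 << 30 // 5 GiB virtual size
	req := requiredSpace(imageSize, overhead)
	fmt.Println(usableSpace(req, overhead) >= imageSize) // round trip holds
}
```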
* Add qemu-img rebase and commit operations.
Also only fail images with backing files that do not exist, so that
ImageIO snapshots can be downloaded and applied to a base disk image.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Add merge phase to data processor.
This keeps qemu-img details out of the ImageIO data source.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
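Assuming qemu-img's standard rebase and commit subcommands, a merge phase along these lines might drive them (function name and flags are an illustrative sketch, not the actual CDI code):

```go
package main

import (
	"fmt"
	"os/exec"
)

// mergeSnapshot rebases a downloaded QCOW2 snapshot onto a local base
// image and commits its delta into the base. Hypothetical helper name.
func mergeSnapshot(base, snapshot string) error {
	// Point the snapshot's backing file at the local base image
	// (-u rewrites the header without copying data).
	if out, err := exec.Command("qemu-img", "rebase", "-u", "-F", "raw", "-b", base, snapshot).CombinedOutput(); err != nil {
		return fmt.Errorf("rebase: %v: %s", err, out)
	}
	// Fold the snapshot's changes into the base image.
	if out, err := exec.Command("qemu-img", "commit", snapshot).CombinedOutput(); err != nil {
		return fmt.Errorf("commit: %v: %s", err, out)
	}
	return nil
}

func main() {
	// Expect an error here: the paths do not exist.
	fmt.Println(mergeSnapshot("/no/such/base.raw", "/no/such/snap.qcow2") != nil)
}
```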
* Beef up transfer ticket finalization/cancellation.
Snapshots seem to be more prone to getting locked indefinitely than
disks if not correctly finalized or cancelled, so do this more carefully
than before.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Allow downloading snapshots from ImageIO.
Download the first snapshot as a raw whole-disk image, and download
subsequent snapshots as QCOW images to be committed to that base.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Allow multi-stage fields on ImageIO data sources.
Also avoid removing base disk image when cleaning data directory.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Add ImageIO multi-stage functional tests.
Pick up fakeovirt update for stub functionality, so inventory responses
can be changed on the fly for individual tests.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Update multi-stage documentation for ImageIO.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Move if-else test block to functions.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Reset ImageIO inventory for a test I missed.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Clean up from some review comments.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* Sort out calls to cleanupTransfer.
Failures during the creation of a transfer ticket call the original
cleanupTransfer in a single location, and any exits after the data
source is created call a wrapper function. The wrapper has a lock and a
'done' flag to make sure it is only called once on exit, even when
interrupted from the goroutine that waits for SIGTERM.
Signed-off-by: Matthew Arnold <marnold@redhat.com>
* When a qemu image is resized, it needs to be preallocated to the requested
size. In-place preallocation (using the convert function of qemu-img)
clears the data.
With this PR, preallocation is simply applied when the image is resized.
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* Functional tests for preallocation verify content (MD5).
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
This PR removes the "skipped" condition for preallocation. The importer/uploader
will preallocate to the available size. Filesystem overhead needs to be
taken into account.
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* Validate images fit on a filesystem in more cases.
Background:
When the backing store is a filesystem, we store the images
as sparse files. So the file may eventually grow to be bigger
than the available storage. This will cause unfortunate
failures down the line.
Prior to this commit, we validated the size:
- In case the backing store implicitly did it for us (block volumes)
- On async upload
- When resizing (by the operation failing if the image cannot fit
in the available space).
The Resize phase is encountered quite commonly:
Transfer->Convert->Resize
TransferFile->Resize
Adding validation here for the non-resize case covers almost all
the cases.
The only exceptions that aren't validated now are:
- DataVolumeArchive via the HTTP datasource
- VDDK
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* When resizing, take into account filesystem overhead.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Add testing for too large upload/import
- Import/sync upload of too large physical size image (raw.xz, qcow2)
- Import/sync upload of too large virtual size image (raw.xz, qcow2)
- Import of a too large raw image file, if filesystem overhead is
taken into account
- Async upload of too large physical size qcow2.
The async upload cases do not mirror the sync upload ones, because if
a block device is used as scratch space, it will hit a size limit
before the validation pause and fail differently.
This scenario is identical to the sync upload case which was added.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Refactor code in a way that requires fewer comments to explain.
We can just validate that the requested image size will fit in the
available space, and not rely on the fact we typically resize the
images to the full size.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* When calculating usable space, round down to a multiple of 512.
Our validation is indirectly:
image after resize to usable space <= usable space
For this to pass, we need to ensure that qemu-img's rounding
up to 512 doesn't change the size.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
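The rounding described above can be sketched as a simple bit trick (illustrative helper name): if usable space is already a multiple of 512, qemu-img's rounding of the image size up to a 512-byte boundary cannot overshoot it.

```go
package main

import "fmt"

// roundDown512 returns the largest multiple of 512 that is <= n,
// by clearing the low 9 bits.
func roundDown512(n int64) int64 {
	return n &^ 511
}

func main() {
	fmt.Println(roundDown512(1000)) // 512
	fmt.Println(roundDown512(1024)) // 1024
}
```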
* Adjust qemu-img error checks to match the errors emitted by nbdkit:
- In some cases we clearly don't rely on the qemu-img error,
so don't check for it.
- In one case, switch to looking for the equivalent nbdkit
error message.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* [WIP] doc: User-facing doc for preallocation support
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* apis: CDI accepts `preallocation` option.
With this commit CDI accepts (but does not yet handle) `preallocation` settings
for DataVolumes and in CDIConfig.
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* core: Implementing preallocation
This commit implements preallocation support for import and upload.
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* test: Functional tests for preallocation support
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* core: Remove "preallocation for StorageClasses" config
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* test: Removed unused function
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* test: Fix rook-ceph test failures
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* Updated dependencies
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
* core: Use PVC annotation to pass preallocation parameters
The DataVolume controller now uses a PVC annotation to pass the preallocation
configuration to the import and upload controllers.
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
This phase can mean three things:
- HTTP, imageio, s3, upload: go directly to Convert.
- VDDK: unreachable code, points to an arbitrary other phase.
- registry: do minor processing after transfer.
Only registry makes actual use of this phase.
Point all current users directly to Convert, and fold the
work done in the registry Process phase into the previous
phase.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* When validating disk space, reserve space for filesystem overhead
The amount of available space in a filesystem is not exactly
the advertised amount. Things like indirect blocks or metadata
may use up some of this space. Reserve some of it to avoid
reaching full capacity by default.
This value is configurable from the CDIConfig object spec,
both globally and per-storageclass.
The default value is 0.055, or "5.5% of the space is
reserved". This value was chosen because some filesystems
reserve 5% of the space as overhead for the root user and
this space doubles as reservation for the worst case
behaviour for unclear space usage. I've chosen a value
that is slightly higher.
This validation is only necessary because we use sparse
images instead of fallocated ones, which was done to have
reasonable alerts regarding space usage from various
storage providers.
---
Update CDIConfig filesystemOverhead status, validate, and
pass the final value to importer/upload pods.
Only the status values controlled by the config controller
are used, and it's filled out for all available storage
classes in the cluster.
Use this value in Validate calls to ensure that some of the
space is reserved for the filesystem overhead to guard from
accidents.
Caveats:
Doesn't use Default: to define the default of 0.055; instead
it is hard-coded in reconcile. It seems like we can't use a
default value.
Validates the per-storageClass values in reconcile, and
doesn't reject bad values.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
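Assuming the CDIConfig field names as I recall them, the global and per-storage-class settings might look like this (the storage class name is hypothetical):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDIConfig
metadata:
  name: config
spec:
  filesystemOverhead:
    global: "0.055"         # fraction of space reserved, cluster-wide default
    storageClass:
      slow-nfs-class: "0.1" # hypothetical class with a larger reserve
```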
* Use util GetStorageClassByName
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Test filesystem overhead validation against async upload endpoint
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* wait for NFS PVs to be deleted before continuing
Intended to help with flakes, but didn't make a difference.
Probably still worth doing.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Avoid using the uncached client unnecessarily
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Add error handling for the case where even a default SC is not found
Note that this change isn't expected to make a difference, as we
check if the targetStorageClass is nil later on and have the same
behaviour, but this is probably more correct API usage.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Add testing for the validation of filesystem overhead values
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Fix logical error in waiting for NFS PVs.
Wait for all of them, not just the last one.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
Modified the function that gets the size of a block device / available space to
return an error as well as -1, so we can distinguish the path not existing from
the binary not existing in case the container doesn't have the required binaries.
The last lane also passed, but due to slow CI it timed out before reporting results.
Signed-off-by: Alexander Wels <awels@redhat.com>
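A sketch of the described signature change, with a hypothetical helper name and `blockdev --getsize64` standing in for whatever binary the container actually calls:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// getBlockDeviceSize returns an explicit error next to the -1 sentinel,
// so a missing path can be told apart from a missing binary.
func getBlockDeviceSize(path string) (int64, error) {
	if _, err := os.Stat(path); err != nil {
		return -1, fmt.Errorf("device path: %w", err) // path does not exist
	}
	out, err := exec.Command("blockdev", "--getsize64", path).Output()
	if err != nil {
		return -1, fmt.Errorf("running blockdev: %w", err) // binary missing or failed
	}
	return strconv.ParseInt(strings.TrimSpace(string(out)), 10, 64)
}

func main() {
	if _, err := getBlockDeviceSize("/no/such/device"); err != nil {
		fmt.Println("got a distinguishable error:", err)
	}
}
```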