Commit Graph

18 Commits

Author SHA1 Message Date
Eng Zer Jun
aaacbae797
refactor: move from io/ioutil to io and os packages (#2484)
The io/ioutil package has been deprecated as of Go 1.16 [1]. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

[1]: https://golang.org/doc/go1.16#ioutil
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-12-05 19:19:13 +00:00
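For illustration, a minimal sketch of the kind of replacement this commit describes. The helper names (readAll, roundTrip) are hypothetical and not taken from the repository; the standard-library calls are the Go 1.16+ equivalents of ioutil.ReadAll, ioutil.TempDir, ioutil.WriteFile and ioutil.ReadFile.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// readAll drains a reader with io.ReadAll, the replacement for ioutil.ReadAll.
// (Hypothetical helper, for illustration only.)
func readAll(r io.Reader) (string, error) {
	b, err := io.ReadAll(r)
	return string(b), err
}

// roundTrip writes content to a temporary file and reads it back using
// os.MkdirTemp, os.WriteFile and os.ReadFile, the replacements for
// ioutil.TempDir, ioutil.WriteFile and ioutil.ReadFile.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "ioutil-migration-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "example.txt")
	if err := os.WriteFile(path, []byte(content), 0o600); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	s, err := readAll(strings.NewReader("streamed"))
	fmt.Println(s, err)

	s, err = roundTrip("on disk")
	fmt.Println(s, err)
}
```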
Bartosz Rybacki
f89812c268
Do not factor fs overhead into available space during validation (#2195)
* Create a test for an overhead bug

This image size and filesystem overhead combination was experimentally determined
to reproduce bz#2064936 in CI when using ceph/rbd with a Filesystem mode PV since
the filesystem capacity will be constrained by the PVC request size.

Below is the problem it tries to recreate:
When validating whether an image will fit into a PV we compare the
image's virtual size to the filesystem's reported available space to
gauge whether it will fit.  The current calculation reduces the apparent
available space by the configured filesystem overhead value but the
overhead is already (mostly) factored into the result of Statfs.  This
causes the check to fail for PVCs that are just large enough to
accommodate an image plus overhead (i.e. when using the DataVolume
Storage API with filesystem PVs with capacity constrained by the PVC
storage request size).

This was not caught in testing because HPP does not have capacity
constrained PVs and we are typically testing block volumes in the ceph
lanes.  It can be triggered in our CI by allocating a Filesystem PV on
ceph-rbd storage because these volumes are capacity constrained and
subject to filesystem overhead.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Fix a target pvc validation bug

Corrects the validation logic for target volume.

Below is a description of the original problem:
When validating whether an image will fit into a PV we compare the
image's virtual size to the filesystem's reported available space to
gauge whether it will fit.  The current calculation reduces the apparent
available space by the configured filesystem overhead value but the
overhead is already (mostly) factored into the result of Statfs.  This
causes the check to fail for PVCs that are just large enough to
accommodate an image plus overhead (i.e. when using the DataVolume
Storage API with filesystem PVs with capacity constrained by the PVC
storage request size).

This was not caught in testing because HPP does not have capacity
constrained PVs and we are typically testing block volumes in the ceph
lanes.  It can be triggered in our CI by allocating a Filesystem PV on
ceph-rbd storage because these volumes are capacity constrained and
subject to filesystem overhead.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Improve the warning message

Removed redundant and misleading part about pvc size and update the simplification

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Remove useless test

The test checks that the validation logic takes fs overhead into account.
The new validation logic does not check fs overhead, so the test is no
longer relevant.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2022-03-22 04:38:48 +01:00
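The difference between the two checks described in this commit can be sketched as follows. This is a simplified illustration, not the repository's code: the function names (fitsWithOverhead, fitsPlain) and the sample numbers are assumptions, and the available-space figure stands in for what Statfs would report on a capacity-constrained PV.

```go
package main

import "fmt"

// fitsWithOverhead models the old check: the reported available space is
// reduced by the configured overhead fraction before comparing.
// (Hypothetical helper, for illustration only.)
func fitsWithOverhead(virtualSize, available int64, overhead float64) bool {
	return virtualSize <= int64(float64(available)*(1-overhead))
}

// fitsPlain models the new check: the reported available space is used
// as-is, because the filesystem has already accounted for its own overhead.
func fitsPlain(virtualSize, available int64) bool {
	return virtualSize <= available
}

func main() {
	const gib = int64(1) << 30

	// Assumed scenario: the PVC was sized as "image + overhead", so after the
	// filesystem is created roughly 1 GiB remains available for a 1 GiB image.
	virtualSize := gib
	available := gib

	fmt.Println("old check:", fitsWithOverhead(virtualSize, available, 0.055))
	fmt.Println("new check:", fitsPlain(virtualSize, available))
}
```

Under these assumed numbers the old check rejects an image that actually fits, because the overhead is subtracted a second time.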
Bartosz Rybacki
e18fc68718
BugId: 2038679 - Clone with volume mode file system using Storage API fails (#2096)
* Update clone size validation logic

The case with DV using spec.storage API needs
more complex validation that will be added in the
clone controller. The API webhook validation
for that case is removed.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Improve DV phase failure message in tests

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Add test and warning event for clone size

During clone, check whether the actual requested size on the source volume is
bigger than the target requested size, and emit an event to notify the user
about the situation.

The actual size on the filesystem is lower than requested because of possible filesystem overhead. When using the storage API, the overhead will be applied on the target.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Code Review cleanup - Removing debug logs

Removed some garbage left after troubleshooting.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Move fn GetUsableSpace to common utils

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2022-02-02 17:53:10 +01:00
zhuchenwang
3864a838c0
Refactor the data-processor.go to allow customization (#2103)
* Refactor data-processor.go to allow registering new phases and execution functions.

Signed-off-by: Zhuchen Wang <zcwang@google.com>

* Add test case for unknown phase

Signed-off-by: Zhuchen Wang <zcwang@google.com>
2022-01-20 14:17:33 +01:00
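A minimal sketch of what a phase registry with an unknown-phase error could look like. All names here (Phase, PhaseFunc, Processor, RegisterPhase, ErrUnknownPhase) are hypothetical and chosen for illustration; they are not claimed to match the identifiers in data-processor.go.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase names a step in the processing state machine (hypothetical type).
type Phase string

const (
	PhaseTransfer Phase = "Transfer"
	PhaseConvert  Phase = "Convert"
	PhaseComplete Phase = "Complete"
)

// ErrUnknownPhase is returned when no function is registered for a phase.
var ErrUnknownPhase = errors.New("unknown processing phase")

// PhaseFunc executes one phase and returns the phase to run next.
type PhaseFunc func() (Phase, error)

// Processor runs registered phase functions until PhaseComplete is reached.
type Processor struct {
	current Phase
	phases  map[Phase]PhaseFunc
}

func NewProcessor(start Phase) *Processor {
	return &Processor{current: start, phases: make(map[Phase]PhaseFunc)}
}

// RegisterPhase adds or replaces the function executed for a phase.
func (p *Processor) RegisterPhase(phase Phase, fn PhaseFunc) {
	p.phases[phase] = fn
}

// Run walks the state machine and fails on the first unregistered phase.
func (p *Processor) Run() error {
	for p.current != PhaseComplete {
		fn, ok := p.phases[p.current]
		if !ok {
			return fmt.Errorf("%w: %q", ErrUnknownPhase, p.current)
		}
		next, err := fn()
		if err != nil {
			return err
		}
		p.current = next
	}
	return nil
}

func main() {
	p := NewProcessor(PhaseTransfer)
	p.RegisterPhase(PhaseTransfer, func() (Phase, error) { return PhaseConvert, nil })
	// Convert hands off to a custom phase that nobody registered.
	p.RegisterPhase(PhaseConvert, func() (Phase, error) { return Phase("Custom"), nil })
	fmt.Println(p.Run())
}
```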
Bartosz Rybacki
a308404b07
Overhead on profile and usable space together (#1926)
* Correct the fsOverhead calculation in profile

The calculation needs to play well with the actual resize that is done in data-processor

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Properly reverse the calculation for overhead.

Signed-off-by: Alexander Wels <awels@redhat.com>

Co-authored-by: Alexander Wels <awels@redhat.com>
2021-09-07 16:42:03 +02:00
Matthew Arnold
cab586ab1a
Implement multi-stage ImageIO imports. (#1903)
* Add qemu-img rebase and commit operations.

Also only fail images with backing files that do not exist, so that
ImageIO snapshots can be downloaded and applied to a base disk image.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Add merge phase to data processor.

This keeps qemu-img details out of the ImageIO data source.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Beef up transfer ticket finalization/cancellation.

Snapshots seem to be more prone to getting locked indefinitely than
disks if not correctly finalized or cancelled, so do this more carefully
than before.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Allow downloading snapshots from ImageIO.

Download the first snapshot as a raw whole-disk image, and download
subsequent snapshots as QCOW images to be committed to that base.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Allow multi-stage fields on ImageIO data sources.

Also avoid removing base disk image when cleaning data directory.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Add ImageIO multi-stage functional tests.

Pick up fakeovirt update for stub functionality, so inventory responses
can be changed on the fly for individual tests.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Update multi-stage documentation for ImageIO.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Move if-else test block to functions.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Reset ImageIO inventory for a test I missed.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Clean up from some review comments.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Sort out calls to cleanupTransfer.

Failures during the creation of a transfer ticket call the original
cleanupTransfer in a single location, and any exits after the data
source is created call a wrapper function. The wrapper has a lock and a
'done' flag to make sure it is only called once on exit, even when
interrupted from the goroutine that waits for SIGTERM.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
2021-08-30 19:22:07 +02:00
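The "lock and a 'done' flag" wrapper described in the last sub-commit can be sketched as below. The type and function names (onceCleaner, countCleanups) are hypothetical, and the cleanup body is a stand-in counter rather than a real transfer-ticket call.

```go
package main

import (
	"fmt"
	"sync"
)

// onceCleaner wraps a cleanup function so that it runs at most once, even if
// several goroutines (for example a normal exit path and a SIGTERM handler)
// call Run at the same time. (Hypothetical type, for illustration only.)
type onceCleaner struct {
	mu      sync.Mutex
	done    bool
	cleanup func()
}

// Run executes the wrapped cleanup on the first call and is a no-op afterwards.
func (c *onceCleaner) Run() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.done {
		return
	}
	c.done = true
	c.cleanup()
}

// countCleanups calls Run from several goroutines and reports how many times
// the underlying cleanup actually executed.
func countCleanups(callers int) int {
	calls := 0
	c := &onceCleaner{cleanup: func() { calls++ }}

	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Run()
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println("cleanup executions:", countCleanups(8))
}
```

sync.Once would give the same guarantee with less code; the explicit mutex and flag are used here only to mirror the commit message.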
Alexander Wels
b27ff563d1
Always align size of disk image to 1Mi blocks. (#1873)
Signed-off-by: Alexander Wels <awels@redhat.com>
2021-08-05 17:16:43 +02:00
Tomasz Barański
a4fbaffe1c
Bugfix: preallocating resized image erases data (#1747)
* When qemu image is resized, it needs to be preallocated to the requested
size. In-place preallocation (using convert function of qemu-img)
cleans the data.

With this PR preallocation is applied simply when the image is resized.

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* Functional tests for preallocation verify content (MD5).

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
2021-04-13 20:09:02 +02:00
Tomasz Barański
a29c3d4165
Preallocate even if the size is too small (#1637)
This PR removes "skipped" condition for preallocation. Importer/uploader
will preallocate to the available size. Filesystem overhead needs to be
taken into account.

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
2021-02-10 21:18:56 +01:00
Maya Rashish
3b36e1cd4f
Validate image fits in filesystem in a lot more cases. Take filesystem overhead into account when resizing. (#1466)
* Validate images fit on a filesystem in more cases.

Background:
When the backing store is a filesystem, we store the images
as sparse files. So the file may eventually grow to be bigger
than the available storage. This will cause unfortunate
failures down the line.

Prior to this commit, we validated the size:
- In case the backing store implicitly did it for us (block volumes)
- On async upload
- When resizing (by the operation failing if the image cannot fit
in the available space).

The Resize phase is encountered quite commonly:
Transfer->Convert->Resize
TransferFile->Resize

Adding validation here for the non-resize case covers almost all
the cases.

The only exceptions that aren't validated now are:
- DataVolumeArchive via the HTTP datasource
- VDDK

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* When resizing, take into account filesystem overhead.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Add testing for too large upload/import

- Import/sync upload of too large physical size image (raw.xz, qcow2)
- Import/sync upload of too large virtual size image (raw.xz, qcow2)
- Import of a too large raw image file, if filesystem overhead is
taken into account

- Async upload of too large physical size qcow2.
The async upload cases do not mirror the sync upload ones because if
a block device is used as scratch space, it will hit a size limit
before the validation pause, and fail differently.
This scenario is identical to the sync upload case which was added.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Refactor code in a way that requires fewer comments to explain.

We can just validate that the requested image size will fit in the
available space, and not rely on the fact we typically resize the
images to the full size.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* When calculating usable space, round down to a multiple of 512.

Our validation is indirectly:
image after resize to usable space <= usable space
For this to pass, we need to ensure that qemu-img's rounding
up to 512 doesn't change the size.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Adjust expected qemu-img error messages to the ones emitted by nbdkit:

- In some cases, we clearly don't rely on the qemu-img error,
so don't check for it.
- In one case, switch it to looking for the nbdkit equivalent
error message.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2021-01-25 19:36:49 +01:00
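The overhead and rounding steps from this commit series can be sketched together. This is an assumption-laden illustration: the names (roundDown, roundUp, usableSpace) are hypothetical, and roundUp merely models the round-up-to-512 behaviour that the commit message attributes to qemu-img.

```go
package main

import "fmt"

// sectorSize is the 512-byte multiple mentioned in the commit message.
const sectorSize = 512

// roundDown returns the largest multiple of m that is <= n (n >= 0, m > 0).
func roundDown(n, m int64) int64 {
	return n - n%m
}

// roundUp returns the smallest multiple of m that is >= n (n >= 0, m > 0).
func roundUp(n, m int64) int64 {
	return roundDown(n+m-1, m)
}

// usableSpace reserves the overhead fraction and then rounds down to a
// multiple of 512, so a later round-up to 512 cannot push the image size
// past the usable space. (Hypothetical helper, for illustration only.)
func usableSpace(available int64, overhead float64) int64 {
	return roundDown(int64(float64(available)*(1-overhead)), sectorSize)
}

func main() {
	available := int64(1_000_000)
	usable := usableSpace(available, 0.055)

	fmt.Println("usable space:", usable)
	fmt.Println("already sector-aligned:", roundUp(usable, sectorSize) == usable)
}
```

Because the usable space is already a multiple of 512, resizing an image to it and rounding up leaves the size unchanged, which is what the "image after resize <= usable space" validation relies on.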
Tomasz Barański
91a15c57d1
Preallocation support (#1498)
* [WIP] doc: User-facing doc for preallocation support

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* apis: CDI accepts `preallocation` option.

With this commit CDI accepts (but does not yet handle) `preallocation` settings
for DataVolumes and in CDIConfig.

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* core: Implementing preallocation

This commit implements preallocation support for import and upload.

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* test: Functional tests for preallocation support

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* core: Remove "preallocation for StorageClasses" config

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* test: Removed unused function

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* test: Fix rook-ceph test failures

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* Updated dependencies
Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>

* core: Use PVC annotation to pass preallocation parameters

DataVolume controller now uses a PVC annotation to pass preallocation
configuration to import and update controllers.

Signed-off-by: Tomasz Baranski <tbaransk@redhat.com>
2020-12-18 16:46:16 -05:00
Maya Rashish
92a1ae073a
Remove the "Process" data processor phase, simplify state machine. (#1446)
This phase can mean three things:
HTTP, imageio, s3, upload: Go directly to Convert
VDDK: unreachable code, points to arbitrary other phase.
registry: do minor processing after transfer.

Only registry makes actual use of this phase.

Point all current users directly to Convert, and fold the
work done in the registry Process phase into the previous
phase.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2020-11-04 12:45:49 +01:00
Maya Rashish
b91887e1b7
Reserve overhead when validating that a Filesystem has enough space (#1319)
* When validating disk space, reserve space for filesystem overhead

The amount of available space in a filesystem is not exactly
the advertised amount. Things like indirect blocks or metadata
may use up some of this space. Reserving it to avoid reaching
full capacity by default.

This value is configurable from the CDIConfig object spec,
both globally and per-storageclass.

The default value is 0.055, or "5.5% of the space is
reserved". This value was chosen because some filesystems
reserve 5% of the space as overhead for the root user and
this space doubles as reservation for the worst case
behaviour for unclear space usage. I've chosen a value
that is slightly higher.

This validation is only necessary because we use sparse
images instead of fallocated ones, which was done to have
reasonable alerts regarding space usage from various
storage providers.

---

Update CDIConfig filesystemOverhead status, validate, and
pass the final value to importer/upload pods.

Only the status values controlled by the config controller
are used, and it's filled out for all available storage
classes in the cluster.

Use this value in Validate calls to ensure that some of the
space is reserved for the filesystem overhead to guard from
accidents.

Caveats:

Doesn't use Default: to define the default of 0.055, instead
it is hard-coded in reconcile. It seems like we can't use a
default value.

Validates the per-storageClass values in reconcile, and
doesn't reject bad values.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Use util GetStorageClassByName

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Test filesystem overhead validation against async upload endpoint

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* wait for NFS PVs to be deleted before continuing

Intended to help with flakes, but didn't make a difference.
Probably still worth doing.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Avoid using the uncached client unnecessarily

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Add error handling for the case where even a default SC is not found

Note that this change isn't expected to make a difference, as we
check if the targetStorageClass is nil later on and have the same
behaviour, but this is probably more correct API usage.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Add testing for the validation of filesystem overhead values

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Fix logical error in waiting for NFS PVs.

Wait for all of them, not just the last one.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2020-10-01 18:31:32 +02:00
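A sketch of how a global value, a per-storage-class override and the hard-coded 0.055 default might be resolved. Everything except the 0.055 default is an assumption made for illustration: the function names, the string-typed inputs, the accepted range of 0 to 1, and the choice to fall back silently instead of rejecting a bad value.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultOverhead is the 5.5% default named in the commit message.
const defaultOverhead = 0.055

// parseOverhead accepts a decimal fraction between 0 and 1 inclusive.
// The accepted range is an assumption made for this sketch.
func parseOverhead(s string) (float64, bool) {
	v, err := strconv.ParseFloat(s, 64)
	if err != nil || !(v >= 0 && v <= 1) { // the negated form also rejects NaN
		return 0, false
	}
	return v, true
}

// resolveOverhead prefers a valid per-storage-class value, then a valid
// global value, then the hard-coded default. Invalid values are skipped
// rather than rejected. (Hypothetical helper, for illustration only.)
func resolveOverhead(global string, perStorageClass map[string]string, storageClass string) float64 {
	if s, ok := perStorageClass[storageClass]; ok {
		if v, valid := parseOverhead(s); valid {
			return v
		}
	}
	if v, valid := parseOverhead(global); valid {
		return v
	}
	return defaultOverhead
}

func main() {
	perSC := map[string]string{"nfs": "0.1", "broken": "7"}

	fmt.Println(resolveOverhead("0.06", perSC, "nfs"))
	fmt.Println(resolveOverhead("0.06", perSC, "broken"))
	fmt.Println(resolveOverhead("", perSC, "other"))
}
```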
Alexander Wels
310e5e239f
GetAvailableSpace(block) now returns error (#1244)
Modified function that gets the size of a block device/available to return error as well as -1, so we
can distinguish the path not existing from the binary not existing in case the container doesn't have
the required binaries.

Last lane also passed, but due to slow CI timed out before reporting results.

Signed-off-by: Alexander Wels <awels@redhat.com>
2020-06-19 13:57:37 -04:00
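The idea of returning an error alongside -1 can be sketched as below. The function name blockDeviceSize is hypothetical, and using the util-linux `blockdev --getsize64` command is an assumption about how the size is obtained; the point is only that each failure mode yields a distinct wrapped error.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// blockDeviceSize returns the size of a block device in bytes, or -1 and an
// error that says which step failed: a missing path, a missing binary, a
// failed command, or unparsable output. (Hypothetical helper.)
func blockDeviceSize(path string) (int64, error) {
	if _, err := os.Stat(path); err != nil {
		return -1, fmt.Errorf("checking device path: %w", err)
	}
	bin, err := exec.LookPath("blockdev")
	if err != nil {
		return -1, fmt.Errorf("locating blockdev binary: %w", err)
	}
	out, err := exec.Command(bin, "--getsize64", path).Output()
	if err != nil {
		return -1, fmt.Errorf("running blockdev: %w", err)
	}
	size, err := strconv.ParseInt(strings.TrimSpace(string(out)), 10, 64)
	if err != nil {
		return -1, fmt.Errorf("parsing blockdev output: %w", err)
	}
	return size, nil
}

func main() {
	size, err := blockDeviceSize("/dev/does-not-exist")
	fmt.Println(size, err)
}
```

Callers can use errors.Is(err, os.ErrNotExist) or errors.Is(err, exec.ErrNotFound) to tell the first two cases apart.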
Alexander Wels
9a2b514365
Add async endpoint for upload that closes connection immediately after transfer completes and then continues background processing. (#1095)
Signed-off-by: Alexander Wels <awels@redhat.com>
2020-02-12 16:17:26 +01:00
Alexander Wels
47fbebad42
Added unit test to verify quantity returns right value. (#968)
Add label to verifier pods so if they fail, they also print their log.

Signed-off-by: Alexander Wels <awels@redhat.com>
2019-09-23 07:56:16 -04:00
Michael Henriksen
019c843586
make clone pods use selinux type spc_t instead of privileged (#875)
* make clone pods use selinux type spc_t instead of privileged

* fix block mode related tests
2019-07-08 13:58:42 -04:00
Alexander Wels
2d6375b057
data stream refactor.
Signed-off-by: Alexander Wels <awels@redhat.com>
2019-04-10 09:18:55 -04:00