Commit Graph

132 Commits

Author SHA1 Message Date
Eng Zer Jun
aaacbae797
refactor: move from io/ioutil to io and os packages (#2484)
The io/ioutil package has been deprecated as of Go 1.16 [1]. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

[1]: https://golang.org/doc/go1.16#ioutil
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-12-05 19:19:13 +00:00
akalenyu
4f0fa1fec2
Address possible nils in dv controller, log CSIDrivers in tests (#2253)
* [WIP] Debug ceph csidriver not being there

We'd expect the CSIDriver object to be there, otherwise ceph install might be struggling

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Avoid possible StorageClassName nils

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-10-19 23:50:04 +01:00
Arnon Gilboa
f58723d9ce
Enable DataVolume garbage collection by default (#2421)
On the next kubevirt-bot Bump kubevirtci PR (with bump-cdi), it will
fail on all kubevirtci lanes with tests referring DVs, as the tests
IsDataVolumeGC() looks at CDIConfig Spec.DataVolumeTTLSeconds and
assumes default is disabled. This should be fixed there.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-09-16 08:52:45 +01:00
Michael Henriksen
46c6aa994a
Support restricted PSA for worker pods (#2410)
* remove root worker pods

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* remove selinux requirement for worker pods

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* run tests in restricted namespace and required changes

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* handle empty tar

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* add PSA label when running functional tests in OpenShift

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* cannot use restricted PSA with istio (for now)

refactor scc management

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix clean script

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2022-09-14 21:16:23 +01:00
Arnon Gilboa
119a67a6a7
GC only if allowed to update DV owner finalizers (#2416)
PVC gets its DV ownerRefs, which can be any CR. If BlockOwnerDeletion is
true, we allow GC only if CDI has RBAC to update the owner finalizers
(we explicitly added it for VirtualMachines). BlockOwnerDeletions gets
validated with this admission plugin which is enabled in OpenShift, but
disabled in our kubevirtci clusters:

https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#ownerreferencespermissionenforcement

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-09-06 17:53:02 +01:00
Arnon Gilboa
9a839ae4dc
Always set a valid ContentType for created PVC (#2409)
When DV is garbage collected and PVC ContentType is not correctly set,
there is no way to guess the PVC ContetType. In CDI we assume that if
there is no content type it's a kubevirt disk, but is other cases
(e.g. export) we cannot make that assumption.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-08-26 00:51:17 +01:00
Shelly Kagan
5364b4b636
Check for pvc populated before doing any clone validations and actions (#2375)
In case our pvc is already populated no need to check for source PVC
unknown size etc.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

Signed-off-by: Shelly Kagan <skagan@redhat.com>
2022-08-10 21:42:06 +01:00
akalenyu
210848eb5f
Status reporting for CSI & Smart clones with WFFC storage (#2364)
* Fix logging level so we respect it in controllers/operator

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Fix CSI & Smart clones with WFFC storage status reporting

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-08-05 22:40:23 +01:00
Michael Henriksen
90ee754861
Queue DataVolumes referred to in "populatedFor" annotations (#2373)
* Queue DataVolumes referred to in "populatedFor" annotations

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* functional test should handle garbage collection

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2022-08-03 22:33:04 +02:00
alromeros
db59dcec87
Fix race condition in upload-controller to keep PVC annotations after a clone error (#2363)
* Modify upload-controller to keep the annotations of a PVC if a pod fails in another controller

When a pod fails, some annotations (running condition, pod phase...) are updated in the affected PVC to let other controllers know.

However, due to a lack of synchronization between some controllers, a race condition could happen where, if a pod succeeds just after a pod fails in a different controller, the annotations set in the error handling would just be updated inapropiately.

This commit modifies the upload-controller to check for clone failures before updating annotations.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Add unit tests to updateUploadAnnotations

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update functional tests in cloner_test to check for conditions and annotations after a pod fails

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Improve error handling in source-clone pod to avoid overwritting the DV running condition

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2022-08-03 14:50:27 +02:00
akalenyu
4d4ad12df5
Only list Ingresses/Routes in CDI namespace instead of cluster level (#2371)
* Only list Ingresses/Routes in CDI namespace instead of cluster level

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Change the way we initialize cache for cdi controller

This gives us flexibility to cache only exactly what we need.
The error that led me to this was that we were attempting to Watch()
Routes/Ingresses which is basically caching all namespaces. We only want to cache the CDI namespace for those.
Source/feature from https://github.com/kubernetes-sigs/controller-runtime/issues/1708

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-08-01 22:12:47 +02:00
alromeros
efe5109ba5
Improve error handling when pod creation fails (#2335)
* Improve the error handling when pod creation fails

When pod creation fails, the error is usually logged without providing additional information to the user. This behavior is especially risky when the user lacks the permits to check the logs, making it unintuitive and almost impossible to find the source of the problem.

This commit improves the error handling of the pod-creation process, so pertinent info about the failure is included in the pod's PVC.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update functional tests to check for events when pod-creation fails

Since error handling in pod-creation has been improved in our controllers, this commit introduces several changes in the corresponding functional tests to properly cover the new behavior included when pod-creation fails.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update unit-tests after improving error-handling of pods for proper coverage

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Minor fixes and improvements on error handling for pods

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Modify datavolume-controller to change the running condition of datavolumes when a pod fails

Until this commit, the way of handling pod errors in the datavolume-controller has been to change the affected datavolume's phase to failed, which conflicts with the declarative approach of the controllers.

This commit modifies this behavior so that, when a pod fails, the affected datavolume's running condition is changed to false while the phase remains unchanged.

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2022-07-15 10:28:53 +02:00
Alexander Wels
5554567cbb
Comply with restricted security context (#2331)
* Comply with restricted security context in kubernetes

Ensure CDI pods comply with the restricted security context as much as
possible (have to be root for nbdkit and block devices). Also cannot set
SeccompProfile since SCC won't allow us to set it.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Changed path /var/local/all_certs to stay in /var

Signed-off-by: Alexander Wels <awels@redhat.com>
2022-07-08 18:47:50 +02:00
Michael Henriksen
dcf968e1c5
fix issue #2323: Cross namespace smart clone not cleaning up (#2333)
* fix issue #2323: Cross namespace smart clone leaving tmp PVC and ObjectTransfer around

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* wait for objecttransfers to be cleaned up in clone tests

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* attempt to address test failures

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* move cross namespace cleanup code

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* move cleanup transfer call until after dataSource is resolved

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2022-06-28 14:19:11 +02:00
Arnon Gilboa
1bf34b578e
Add GC func tests; mutate DV GC annotation only on create (#2326)
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-06-28 02:32:55 +02:00
Bartosz Rybacki
b58e3924d1
Clone fs to block fails on size validation (#2299)
* Test: Clone fs to block fails on size validation

When requesting size `X` with filesystem volume mode and storage api the size
is increased for the fs overhead. When trying to clone to block using
the same size `X` the clone fails because the target is smaller than source.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Improve size validation for clone

Skip size validation for filesystem in webhook and include filesystem
overhead when doing the validation in controller.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Correct size validation for smart clone

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Correct unit test with fs overhead

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Restore CDI Config after each clone test

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Review cleanup

Removing redundant conversions and not useful comments

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2022-06-27 15:02:54 +02:00
alromeros
5516641864
Allow creating clones without source PVC (#2306)
* Modify datavolume admission webhook to enable creating clones without source PVC

This commit modifies the datavolume admission webhook to follow a more descriptive approach, enabling the creation of clones without a source PVC.

This clone will later be handled by the datavolume-controller until the source PVC is created.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Modify the datavolume-controller to improve the handling of clones without source

Since we are allowing the creation of clones without source PVC in the admission webhook, we need to improve the handling of these clones once in the datavolume-controller.

This commit modifies said controller, so we do proper error handling and validation until the source PVC is created.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Add unit tests to check the creation of clones without source PVC

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Include a mechanism to reconcile clones without source once the source PVC is created

This commit introduces a new datavolume-controller watch so, if a clone without source is created, we make sure to reconcile it once a proper PVC is created.

It also updates/includes unit tests for proper coverage.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Include functional tests to cover the creation of clones without source PVC

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Minor refactoring of clone-related code in DataVolume reconciler to improve readability

After enabling the creation of clones without source PVC in the datavolume controller, the clone-related logic outside its reconciler has increased in size and become sparse.

This commit rearranges all this code and condenses it into the clone reconciler, without changing the loop behavior.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Modify datavolume-mutate webhook to reject the creation of clones if the source PVC's namespace doesn't exist

In previous commits, a mechanism to allow the creation of clones without source PVC was added, without ever checking if the source PVC's namespace exists or not.

This behavior could lead to permission conflicts between the user and the source's namespace since the webhook skipped all the related validation.

This commit modifies the datavolume-mutate admission webhook to reject the clone if the source PVC's namespace doesn't exist.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update unit tests for proper coverage of clone-validation functions

This commit adds and updates several unit tests to improve the coverage of the clone-validation mechanism after several functions were moved to the controller.

It also introduces minor changes on related code in the datavolume-controller and functional tests following PR review.

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2022-06-21 18:05:38 +02:00
alromeros
7eebbfaf1c
Make size optional when cloning using the Storage API (#2222)
* Allow empty DV size when cloning using storage API

When cloning a Data Volume, the size of the target can be potentially obtainable via the source PVC, which discards the need to explicitly specify it.

Considering that, this commit introduces a change in the correspondent validation webhook to allow omitting the resources.request.storage field when cloning a PVC using the storage API.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Modify datavolume-controller to allow obtaining storage size from source PVC when cloning

When cloning a PVC, if the target's size is not specified, said value can be attainable from the source PVC.

This commit introduces a change in datavolume controller so, in case of detecting an empty storage size, said value can be obtained when performing CSI and Smart cloning.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update unit tests for datavolume-validation after enabling cloning with empty size

This commit updates the unit testing for the datavolume validation webhook, covering the possibility of cloning a PVC without setting any storage size.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update unit testing for controller-related functions after enabling cloning with empty size

This commit includes unit tests for the volumeSize() function after enabling creating clones with blank size.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update the datavolume controller to create a size-detection pod when performing host-assisted clone

When performing a host-assisted clone with empty clone size, simply copying the original PVC size could lead to potential overhead miscalculations if the source's VolumeMode is "filesystem".

When that's the case, an inspection pod will be created in the datavolume controller so it extracts the size of the virtual image using qemu-img.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Include an image-size detection tool to allow cloning with empty DV size

This commit introduces a new tool in charge of collecting the virtual image size when cloning with an empty DV size. In some cases where said value is unattainable from the original PVC's spec, the datavolume controller will create a new pod containing this new tool.

The binary will then run the 'qemu-img' command and handle its results appropriately.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Optimize the clone-size lookup process to avoid creating unnecessary size-detection pods

When performing host-assisted clone with an empty DV size, in some cases, a size-detection pod is used to obtain the required capacity.

This commit tries to optimize this process to keep the collected value as a PVC annotation, that is checked in subsequent clones to avoid creating redundant pods.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Minor fixes and improvements on mechanism for cloning with empty storage size
* Add new optional flag on size-detection binary to enable using a different URI scheme
* Improve the pod-creation mechanism so the pod is not created until the source PVC has finished the import
* Modify size-finlation mechanism to account for possible round-downs when importing the source image
* Improve the size inflation mechanism so only PVCs with filesystem as volume mode are considered
* Minor style corrections

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Modify the clone-controller to allow skipping the clone size validation in some cases

Due to filesystem overhead differences, the target's size can sometimes be smaller than the source's one when obtaining said value with the size-detection pod.

This commit introduces minor changes in the clone-controller so we can skip the size validation in those cases.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Minor changes and improvements in size-detection mechanism following PR review
* Added new UT that covers using empty storage API for non-cloning sources
* Added new watch on datavolume-controller that looks for changes in the size-detection pod
* Removed redundant and unnecessary specs on size-detection pod
* Added error handling when reading the pod's termination message
* Moved general-usage functions to 'util.go' file
* Updated 'datavolumes' documentation to reference the possibility of omitting the storage size when cloning
* Minor style corrections

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Add unit tests that cover the size-detection mechanism in the DataVolume controller

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Include functional tests for cloning without specifying storage size

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Improve error handling in the creation/deletion process of the size-detection pod

This commit introduces additional handling in case of error after and during the size-detection pod is created.

It also updates several related unit tests.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Minor fixes to improve fsOverhead calculations when cloning with empty storage size
* Modified the size-detection mechanism so we account for fsOverhead when cloning to filesystem volume mode in all cases
* Clean up the code for reconciling when cloning a PVC that is not ready
* Minor fix in functional test so it works when cloning from block to filesystem volume mode

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2022-05-31 17:03:45 +03:00
Shelly Kagan
8db48708fc
Fix smart clone update request size condition (#2292)
Signed-off-by: Shelly Kagan <skagan@redhat.com>
2022-05-24 13:31:19 +03:00
Arnon Gilboa
0b3900c60c
Garbage Collect Completed DVs (#2233)
* Garbage Collect Completed DVs

See design at:
https://github.com/kubevirt/community/blob/main/design-proposals/garbage-collect-completed-dvs.md

ToDos:
-DataImportCron and DataSource controllers adaptation and func tests
-Add doc for DataVolume, CDIConfig and DataImportCron changes
-Extend unit tests and functional tests
-KubeVirt adaptation

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Controller minor fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Adapt tests to GC

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add DV mutate unit test for GC

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Improve GC skip per annotation test

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Use DescribeTable for the GC tests

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-05-19 00:09:31 +02:00
Shelly Kagan
aa6b9dd403
Fix smart clone request size update (#2276)
* Fix smart clone request size update

In case the pvc got an actual size as the expected
size there wasn't an update of the pvc spec with the
actual user request size, which left the pvc spec with
a smaller size then the user requested(the data size).
This caused a discrepancy when trying to restore such pvc,
which restored a pvc with the small size instead of the user request
size.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* review fixes

Signed-off-by: Shelly Kagan <skagan@redhat.com>
2022-05-18 17:21:33 +02:00
Alexander Wels
bbee324537
Make CDI clone strategy strings correct types (#2277)
Noticed the consts were strings instead of the right types
this changes the type to the right type and modifies the usage
to not have to cast to the right type all over the place.

Signed-off-by: Alexander Wels <awels@redhat.com>
2022-05-17 14:19:25 +02:00
Maya Rashish
c05a750a9a
Detect storage capabilities for no-provisioner storage classes (#2226)
* Detect storage capabilities for no-provisioner storage classes

Assume there's a persistent volume that we can look up to infer the
correct values for volume mode and access modes.

Limit ourselves to detecting no-provisioner capabilities on LSO to
avoid greatly increasing the number of storage classes we provide
capabilities for. This is similar to our current flow where we
only provide capabilities for known storage classes.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Regenerate bazel stuff for pkg/monitoring's existence

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Add a watcher for no-provisioner PVs

We maintain a map of storage class names and provisioners whenever
storage classes are changed.

If a PV has one of the storage classes with no-provisioner as a
provisioner, reconcile that storage class.

This is because we infer the storage profile based on PVs, and
new ones might have different storage capabilities.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Use a client to do our storage class caching

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Pass a client as an argument, not global.

Suggested by awels, thanks!

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2022-05-12 14:00:02 +02:00
Alexander Wels
b3d08268e2
Start smart clone controller from datavolume controller when needed (#2265)
* periodic sync CSI snapshot CRD check

It was possible for the CSI snapshot CRD check to fail silently and
prevent the smart clone controller from starting during the cdi deployment
pod start up. This would prevent smart clone from working properly.

This adds a periodic sync of 1 minute for checking the CRDs. We also
log failures that are not is not found so we can more easily detect this
situation as humans.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Change location of the start controller call.

Signed-off-by: Alexander Wels <awels@redhat.com>
2022-05-12 04:28:01 +02:00
Bartosz Rybacki
5fe69cb718
Bugfix - Correctly handle populated PVC (#2227)
* Test handling of populated PVC

Populated PVC created from clone operation should not start
any CDI actions. It can only update DV status.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Handle prepopulated pvc with network clone

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Handle prepopulated PVC for Smart and CSI clone

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2022-04-29 14:48:01 +02:00
Bartosz Rybacki
11e9a53533
Extract the clone logic from main reconcile loop (#2241)
* Extract the clone logic for reconcile

This change only extracts the code branch that handles the clone.
The main idea is to divide the reconcile loop into parts doing import,
upload and clone. This makes code easier to read and understand.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Cleanup handling not existing PVC

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2022-04-21 04:13:51 +02:00
akalenyu
71522a1f2d
Switch VolumeSnapshot to v1 (#2235)
* Switch VolumeSnapshot to v1

VolumeSnapshot v1beta is being deprecated:
https://kubernetes.io/blog/2022/04/07/upcoming-changes-in-kubernetes-1-24/#api-removals-deprecations-and-other-changes-for-kubernetes-1-24

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Fix unit tests; change version we look for in IsCsiCrdsDeployed

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-04-12 18:50:19 +02:00
Bartosz Rybacki
2f7987d407
Improve handling prePopulated DV (#2205)
Previously, when a DataVolume was Reconciled then CDI created PVC
regardless of prePopulated annotation.

There is a possibility that DataVolume and PVC pair is being
restored or moved from other cluster. When DataVolume is marked
as prePopulated, the CDI should not try to populate it.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2022-04-07 21:55:44 +02:00
Arnon Gilboa
e9fc2009a5
Reconcile DVs waiting for default SC (#2196)
* Reconcile DVs waiting for default SC

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-03-24 12:44:59 +01:00
akalenyu
18c815261d
kubevirtci bump/controller change to overcome AfterSuite flake (#2162)
* Update kubevirtci to overcome AfterSuite flake

Update kubevirtci to get a fix for a flake where PVC cant be removed
because it still holds the `pvc-as-source-protection` finalizer:
https://github.com/kubernetes-csi/external-snapshotter/issues/349

More info in https://github.com/kubevirt/kubevirtci/pull/750.

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Don't create multiple VolumeSnapshots

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-02-23 17:44:00 +01:00
akalenyu
e678f5d0e9
Fix SnapshotForSmartCloneInProgress sometimes not occuring (#2158)
Sometimes we skip SnapshotForSmartCloneInProgress and directly go to
Succeeded.
This happens when the `r.updateDataVolume(dataVolumeCopy)` call fails.

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-02-16 22:42:21 +01:00
Matthew Arnold
e92013d079
Fix interaction between multi-stage import and retainAfterCompletion. (#2146)
* Append checkpoint ID to multi-stage importer pods.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Ignore completed pods for multi-stage imports.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Reset current import pod when checkpoint is done.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Don't prevent pod deletion for scratch space.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Only ignore pod when retainAfterCompletion is set.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Fix data volume unit tests.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Tests for checkpoint suffix and completed pods.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Test for retained pods exiting for scratch space.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Add functional test for retaining multistage pods.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Clean up lint error.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Remove scratch handling that is fixed elsewhere.

This is part of shouldDeletePod now.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Add unit tests for long PVC/checkpoint names.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Match retainAfterCompletion test to description.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
2022-02-11 23:14:07 +01:00
Bartosz Rybacki
e18fc68718
BugId: 2038679 - Clone with volume mode file system using Storage API fails (#2096)
* Update clone size validation logic

The case with DV using spec.storage API needs
more complex validation that will be added in the
clone controller. The API webhook validation
for that case is removed.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Improve DV phase failure message in tests

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Add test and warning event for clone size

During clone check if actual requested size on source volume is bigger
than target requested size and emit an event to notify user about situation.

Actual size on filesystem is lower that requested, because of possible filesystem overhead. When using storage API the overhead will be applied on target.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Code Review cleanup - Removing debug logs

Removed some garbage left after troubleshooting.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Move fn GetUsableSpace to common utils

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2022-02-02 17:53:10 +01:00
Matthew Arnold
7806e77bdf
Allow optional per-DataVolume VDDK image. (#2102)
* Add optional VDDK initImageURL field.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Pass VDDK image URL through to PVC annotation.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Unit tests for per-DV VDDK image URL.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Functional test for VDDK initImageURL field.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Update documentation for VDDK initImageURL.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Fix lint error.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Check for absence of AwaitingVDDK in unit test.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
2022-01-19 22:49:48 +01:00
akalenyu
483359bf69
Add label on our PVCs to prevent unnecessary alert from going off (#2093)
We want to silence the KubePersistentVolumeFillingUp for all our PVCs that hold virtual machine disks,
since these disks consume the entire PVC by design.

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-01-14 15:35:06 +01:00
Maya Rashish
f2939fcee8
add PVC claimName to datavolume status (#2060)
* Make it possible to find the underlying PVC name using the DV

Right now a lot of things assume that the underlying PVC has the
same name/namespace, let's make it possible to reach over and not
need to have this implicit knowledge in a lot of places.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Install some artifacts on the old version of CDI during upgrade tests

And use this to test that DataVolume.Status.ClaimName is set after
upgrades.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Bump CDI pod update timeout

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Only check if non-testing CDI pods have updated.

We don't update the testing environment, so it looks like some of the
update fails.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Restore lower timeouts

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* As elsewhere, don't use local registry artifacts with external provider

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2022-01-07 20:28:25 +01:00
Shelly Kagan
41df5c240e
Emit event and update dv conditions when pvc fails to create due to quota (#2016)
* Update datavolume conditions when quota exceeded when creating pvc

When creating the pvc from the dv the pvc size
can exceed the allowed quota, in such case so far the only
indication was to look in the logs.
Now added indication in the data volume conditions
(when possible) and emitted event.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Add functional tests to check the new conditons and event

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* tests cosmetics

-use existing functions
-add missing checks on errors
-remove unused code
-etc..

Signed-off-by: Shelly Kagan <skagan@redhat.com>
2021-11-16 17:29:39 +01:00
Matthew Arnold
703e421a8a
Allow user-specified headers in HTTP data source. (#1994)
* Update HTTP data source API to allow custom headers.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Implement custom HTTP headers API.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Document custom headers in HTTP data source.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Correct secretExtraHeader comment to reference Secret.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Add volume mounts for secret headers.

Replaces environment variables for headers from secrets.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Avoid failing when there are no extra headers.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Redact contents of headers that come from secrets.

Also split up getExtraHeaders to reduce Sonar Cloud complexity.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Ensure all HTTP client requests use extra headers.

Missed redirect check and content length retrieval, both of which might
need the extra headers.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Add some unit tests for extra HTTP headers.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Do not quote headers in nbdkit curl arguments.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Add functional tests for extra HTTP headers.

Avoids new test server by specifiying basic authorization headers to the
existing file host port that requires it.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Use filepath.Walk to read secrets.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Minor documentation update for secrets.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Re-run 'make generate' for verification failure.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
2021-11-12 21:06:57 +01:00
Shelly Kagan
1eda6db87e
Datacontroller cosmetics (#2008)
* move common code to controller util

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* remove unused functions and cosmetics

Signed-off-by: Shelly Kagan <skagan@redhat.com>
2021-11-04 18:05:55 +01:00
Shelly Kagan
e7dd62eb26
Upload archive (#1969)
* Add support for archive upload

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* fix golang errors

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Change storage profile property set to support more then one set

So far CDI supported only 1 claim propery set. We want to be able
to support more then one so in case the user provides to the
DV storage volumeMode without accessMode or vice versa cdi
will be able to fit to it the most appropriate match.
Added to rook ceph block a second default of filesystem
volume mode with RWO access mode, it will support archive
upload which has default of filesystem mode.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* CR fix - change to one endpoint for the user

upload proxy will identify if the upload is archive
or not by looking at the content type annotation on
the pvc. If the content type is archive it will route
the uplaod to upload server to a new archive upload uri.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Add storage profile and data volume controllers unit tests

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* CR fixes

* add default volume mode to archive content type
* upload server use data processor for archive upload
* tests for volume mode with archive content type
* tests for archive upload of compressed tar

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Adjust imports acording to new apis dir

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* CR small fixes

Signed-off-by: Shelly Kagan <skagan@redhat.com>
2021-11-03 20:11:47 +01:00
Michael Henriksen
aedaf513ec
Move apis to staging, push to containerized-data-importer-api (#1997)
* move apis to new staging area

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* add script to push to staging

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix lint check and api reference

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* push staging to api repo

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-10-28 13:40:24 +02:00
akalenyu
50c93e8b0e
Deploy alerts infra as part of our installation (#1979)
* Deploy alerts infra as part of our installation

Conditionally deploy the infrastructure that is needed to fire alerts for our users
when bad things are happening to CDI.

Testing with `KUBEVIRT_DEPLOY_PROMETHEUS=true`

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Watch and unit test all prometheus related resources

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* add gateway for changing monitoring namespace (rbac purposes)

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* refactor test to check for exact alert name and firing state

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Align pattern of ensuring prometheus resource exists for all

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Remove potential noisy event

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Extract duplicate code to function

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Dont use empty value for prometheus label due to open issue

https://github.com/prometheus-operator/prometheus-operator/issues/4325

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2021-10-26 21:26:07 +02:00
Bartosz Rybacki
5d2eba8e13
Test the new handling of storage class name (#1958)
Added missing tests for change "Explicitly set the storage class name #1936".
Corrected the behavior when storage class is not provided and not available.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2021-09-29 03:20:17 +02:00
Bartosz Rybacki
a82a0a7a59
Correct the cloneStrategy on StorageProfile (#1951)
Moves the cloneStrategy up one level from claimPropertySet to
StorageProfile.spec and StorageProfile.status.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2021-09-22 18:06:18 +02:00
Arnon Gilboa
addf25b4f9
Support registry import using node docker cache (#1913)
* Support registry import using node docker cache

The new CRI (container runtime interface) importer pod is created with three containers and a shared emptyDir volume:
-Init container: copies static http server binary to empty dir
-Server container: container image container configured to run the http binary and serve up the image file in /data
-Client container: import.sh uses cdi-import to import from server container, and writes "done" file on emptydir
-Server container sees "done" file and exits

Thanks mhenriks for the PoC!

Done:
-added ImportMethod to DataVolumeSourceRegistry (DataVolume.Spec.Source.Registry, DataImportCron.Spec.Source.Registry).
Import method can be "skopeo" (default), or "cri" for container runtime interface based import
-added cdi-containerimage-server & import.sh to the cdi-importer container

ToDo:
-utests and func tests
-doc

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add tests, fix CR comments

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Use deployment docker prefix and tag in func tests

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add OpenShift ImageStreams import support

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add importer pod lookup annotation for image streams

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add pullMethod and imageStream doc

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2021-09-20 22:05:36 +02:00
Bartosz Rybacki
483b9733d5
Explicitly set that storage class name (#1936)
When the storage request is specified using the 'storage' API in
a DataVolume, CDI uses Storage Profiles to look up any missing
values that would be required to create the underlying PVC.
If the user omits the storage class parameter in their DV,
CDI assumes the cluster's default storage class will be used.

Since CDI calculates the storage class that should be used in
this case, it should explicitly set that storage class name
in the PVC it creates.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2021-09-17 13:17:31 +02:00
Michael Henriksen
87a13c2f29
Add long term token to pvcs when host assisted cloning cross namespaces (#1922)
* Add long term token (10 years) to pvcs when host assisted cloning between namespaces

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* clone controller should retry if source in use

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* minor refactor if/else -> switch

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-09-17 01:24:00 +02:00
Michael Henriksen
2889d68766
BugId: 1999571 - fix clone into larger capacity nfs volume (#1939)
* BugId: 1999571 - fix clone into larger capacity nfs volume

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix lint issues

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-09-15 20:22:35 +02:00
Bartosz Rybacki
a308404b07
Overhead on profile and usable space toghether (#1926)
* Correct the fsOverhead calculation in profile

Calculation needs play well with the actual resize that is done in data-processor

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Properly reverse the calculation for overhead.

Signed-off-by: Alexander Wels <awels@redhat.com>

Co-authored-by: Alexander Wels <awels@redhat.com>
2021-09-07 16:42:03 +02:00
Matthew Arnold
33b43a6767
Avoid trying to get metrics from non-running pod. (#1911)
* Avoid trying to get metrics from non-running pod.

If the importer pod is not running, the attempt to get metrics can hit a
30-second HTTP timeout. For multi-stage imports, this causes the
transition to "Paused" to take much longer than it needs to.

Signed-off-by: Matthew Arnold <marnold@redhat.com>

* Return nil instead of new error.

Signed-off-by: Matthew Arnold <marnold@redhat.com>
2021-09-03 02:14:38 +02:00