The PVC should be timestamped on creation, not only upon import
completion; otherwise it might be mistakenly GCed. The LRU sort picks a
PVC with an empty timestamp as the first candidate for deletion.
The PVC is then recreated by the controller and eventually
timestamped, so this bug was hidden for a while.
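A minimal sketch of the failure mode described above, not the actual CDI
code. The pvcInfo type, its lastUse field and gcCandidates are hypothetical
names used only for illustration; the point is that an empty timestamp
sorts before any real one, so an untimestamped PVC is picked first.

```go
package main

import "sort"

// pvcInfo is a hypothetical stand-in for a PVC. lastUse mimics an RFC 3339
// timestamp annotation; "" means the PVC was never timestamped.
type pvcInfo struct {
	name    string
	lastUse string
}

// gcCandidates returns the names of the PVCs to delete, keeping the `keep`
// most recently used ones. The input slice is not modified.
func gcCandidates(pvcs []pvcInfo, keep int) []string {
	sorted := append([]pvcInfo(nil), pvcs...)
	// RFC 3339 UTC timestamps compare correctly as plain strings, and the
	// empty string sorts before all of them.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].lastUse < sorted[j].lastUse
	})
	var names []string
	for i := 0; i < len(sorted)-keep; i++ {
		names = append(names, sorted[i].name)
	}
	return names
}
```

With two timestamped PVCs and one freshly created PVC that has no timestamp
yet, keep=2 selects the fresh one for deletion, which is the bug.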
Fixes CNV-36896
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Add Prometheus alerts and label existing alerts
- CDINoDefaultStorageClass - not having a default (or virt default)
SC is surely not an OpenShift error, as admins may prefer their cluster
users to only use explicit SC names. However, in the CDI context when
DV is created with default SC but default does not exist, we will fire
an error event and the PVC will be Pending for the default SC, so when
there are such Pending PVCs we will fire an alert.
- CDIDefaultStorageClassDegraded - when the default (or virt default)
SC does not support CSI/Snapshot clone (smart clone) or does not have
ReadWriteMany access mode (for live migration).
- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.
- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.
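The two new alert conditions listed above can be sketched as plain
predicates. This is an illustration under assumptions, not the CDI
implementation: storageClassCaps and both function names are hypothetical,
and the real alerts are driven by Prometheus metrics rather than by direct
calls like these.

```go
package main

// storageClassCaps is a hypothetical summary of what the default (or virt
// default) storage class supports.
type storageClassCaps struct {
	smartClone    bool // CSI or snapshot clone is available
	readWriteMany bool // RWX access mode is available (for live migration)
}

// defaultStorageClassDegraded mirrors the CDIDefaultStorageClassDegraded
// condition: smart clone or ReadWriteMany is missing.
func defaultStorageClassDegraded(c storageClassCaps) bool {
	return !c.smartClone || !c.readWriteMany
}

// noDefaultStorageClassAlert mirrors the CDINoDefaultStorageClass condition:
// a missing default is only a problem while PVCs are Pending for it.
func noDefaultStorageClassAlert(hasDefault bool, pendingPVCs int) bool {
	return !hasDefault && pendingPVCs > 0
}
```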
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* CR fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Create stub VolumeSnapshotClass for testing
Including the VolumeSnapshot/Class/Content crds for the
CDIDefaultStorageClassDegraded alert func test.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Add snapshot manifests for tests
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Deploy snapshot CRDs in the hpp destructive lane
Remove stub snapshot CRDs
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Add label explanation to new metric help
Also rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage
class and storage profile.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Revert NoProvisioner check removal
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* CR fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Nicify StorageProfile metric update
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
---------
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Default virt storage class
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Add alert for multiple default virt storage classes
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Refactor content type funcs to not return strings
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
---------
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Play nice with storage class changes; don't attempt to create snapshot from old sc
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Make the DataImportCron format more visible via printed column on get
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
---------
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
This commit ups the CPU request for all our installed components
(cdi-deployment, cdi-apiserver, cdi-uploadproxy, cdi-operator)
from 10m (1% of a core) to 100m (10% of a core).
The main driver of this is BZ: 2216038.
Without this change, it is pretty easy to create a large number of
concurrent clone operations and get token timeout errors.
Upping resource requests and concurrency addresses the issue
in a very direct way.
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* dataVolume: Add default instance type labels from source
f229aeb started to pass default instance type labels from
DataImportCrons to DataVolumes and DataVolumes to any associated
destination DataSources or PVCs. As documented in issue #2782, this does
not, however, pass these labels from the initial source of a DataVolume to
either the DataVolume or the destination DataSources or PVCs.
This change corrects this by updating DataVolumes when reconciled,
adding any labels found on PVC or DataSource sources. These labels will
then be passed on to the destination PVC or DataSources by the existing
functionality highlighted above.
Note that if any default instance type labels already exist on the
DataVolume then the process is skipped as it is assumed these are
provided either directly by the user or via a DataImportCron.
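A rough sketch of the rule described above, using plain maps instead of the
real Kubernetes objects. The label prefix is an assumption about how the
default instance type labels are named, and both function names are
hypothetical.

```go
package main

import "strings"

// defaultInstanceTypePrefix is an assumed common prefix of the default
// instance type and preference labels.
const defaultInstanceTypePrefix = "instancetype.kubevirt.io/default-"

// hasDefaultInstanceTypeLabels reports whether any default instance type
// label is already present.
func hasDefaultInstanceTypeLabels(labels map[string]string) bool {
	for k := range labels {
		if strings.HasPrefix(k, defaultInstanceTypePrefix) {
			return true
		}
	}
	return false
}

// withSourceInstanceTypeLabels returns a copy of the DataVolume labels with
// the source's default instance type labels added. If the DataVolume already
// carries any such label, nothing is copied from the source.
func withSourceInstanceTypeLabels(dv, source map[string]string) map[string]string {
	out := make(map[string]string, len(dv))
	for k, v := range dv {
		out[k] = v
	}
	if hasDefaultInstanceTypeLabels(dv) {
		return out
	}
	for k, v := range source {
		if strings.HasPrefix(k, defaultInstanceTypePrefix) {
			out[k] = v
		}
	}
	return out
}
```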
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
* refactor: Use DefaultInstanceTypeLabels more often
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
---------
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
* dataimportcron: code change: Use better matchers in tests
Signed-off-by: Andrej Krejcir <akrejcir@redhat.com>
* dataimportcron: Pass dynamic credential support label
The label is passed from DataImportCron to DataVolume
and DataSource.
Signed-off-by: Andrej Krejcir <akrejcir@redhat.com>
---------
Signed-off-by: Andrej Krejcir <akrejcir@redhat.com>
* Disable DV GC by default
DataVolume garbage collection is a nice feature, but unfortunately it
violates a fundamental principle of Kubernetes. A CR should not be
auto-deleted when it completes its role (Job with
TTLSecondsAfterFinished is an exception), and once a CR is created we can
assume it is there until explicitly deleted. In addition, a CR should be
idempotent, so the same CR manifest can be applied multiple times, as long
as it is a valid update (e.g. the DataVolume validation webhook does not
allow updating the spec).
When GC is enabled, some systems (e.g. GitOps / ArgoCD) may require a
workaround (DV annotation deleteAfterCompletion = "false") to prevent
GC and function correctly.
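To make the interaction between the default and the annotation concrete,
here is a small sketch. It assumes a nil TTL means GC is disabled and that
the full annotation key is as shown; the commit message only gives the short
name deleteAfterCompletion, so treat the key and the function name as
hypothetical.

```go
package main

// annDeleteAfterCompletion is an assumed full key for the
// deleteAfterCompletion annotation mentioned above.
const annDeleteAfterCompletion = "cdi.kubevirt.io/storage.deleteAfterCompletion"

// shouldGarbageCollect decides whether a DataVolume is eligible for GC.
// ttlSeconds is nil when GC is disabled, which is the default after this
// change (an assumption of this sketch).
func shouldGarbageCollect(ttlSeconds *int32, annotations map[string]string, succeeded bool) bool {
	if ttlSeconds == nil {
		return false // GC disabled cluster-wide
	}
	if annotations[annDeleteAfterCompletion] == "false" {
		return false // per-DV opt-out, e.g. for GitOps-managed DVs
	}
	return succeeded
}
```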
The next kubevirt-bot Bump kubevirtci PR (with bump-cdi) will
fail on all kubevirtci lanes with tests referring to DVs, as the tests'
IsDataVolumeGC() looks at CDIConfig Spec.DataVolumeTTLSeconds and
assumes GC is enabled by default. This should be fixed there.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Fix test waiting for PVC deletion with UID
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Fix clone test assuming DV was GCed
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Fix DIC controller DV/PVC deletion when snapshot is ready
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
---------
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* StorageProfile API for declaring format of resulting cron disk images
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Integrate recommended format in dataimportcron controller
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Take snapclass existence into consideration when populating cloneStrategy and sourceFormat
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
---------
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* touch up zero restoresize snapshot
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* clone populator
only supports PVC source now
snapshot coming soon
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* more unit tests
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* unit test for clone populator
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* func tests for clone populator
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* move clone populator cleanup function to planner
other review comments
verifier pod should mount readonly
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* add readonly flag to test executor pods
synchronize get hash calls
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* increase linter timeout
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* better/explicit readonly support for test pods
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* check pv for driver info before looking up storageclass as it may not exist
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* addressed review comments
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* chooseStrategy should generate more events
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
---------
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
If the storage class binding mode is WaitForFirstConsumer, and the
annotation was not explicitly added to the DIC DV template, the created DV
will get stuck in WFFC phase.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Add support for imagePullSecrets in the CDI CR, to support pulling
images from repositories that require secrets.
The imagePullSecrets is propagated to the following components: cdi-apiserver,
cdi-deployment, and cdi-uploadproxy. The definition of imagePullSecrets in
cdi-operator must be done manually.
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
* Modifying code to incorporate review comments.
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
---------
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
Co-authored-by: Gleb Aronsky <gleb.aronsky@windriver.com>
* Fix hostpath CSI being skipped as "Not HPP"
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Fall back to host assisted if immediate bind requested
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
---------
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
Keeping the last completed or failed job and pod for a while is needed
for both functional tests and debugging. Since the ttl was not set, the
jobs were not automatically deleted.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* dataimportcron: Pass KubeVirt instance type labels to DataVolume and DataSource
Following on from 4fbcb2d509 a requirement
has arisen to expose the default instance type metadata previously
exposed as annotations also as labels to allow callers such as the UI to
have simple server side filtering of these resources.
The unreleased feature implementation in KubeVirt has now
switched to labels and so CDI should now do the same and pass through
the appropriate labels to the underlying resources.
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
* instancetype: Pass instance type labels from DataVolume to PVC
Unlike annotations, not all labels are copied from a given DataVolume to
a PVC during an import. This change corrects this for instance type
labels ensuring they are passed down to the underlying PVC.
The associated constants are also moved into pkg/controller/common/util
to allow access from the DataImportCron and DataVolume controllers.
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
A recent design proposal within the KubeVirt community introduced the
idea of inferring the details of default instance type and preferences
from a given volume associated with a VirtualMachine [1]. The idea being
to further reduce the number of choices a user has to make to get a
bootable VirtualMachine to a single choice of a PVC.
This change aims to support this effort by allowing operators to
annotate the underlying DataVolumes, DataSources and PVCs at import time
through CDI by first annotating the initial DataImportCrons.
This is useful to users of CDI such as the KubeVirt SSP operator that
currently defines a number of DataImportCrons to pull in various boot
sources required by the KubeVirt common-templates project.
Both the DataVolume and DataSource associated with the DataImportCron
are annotated to allow KubeVirt to potentially avoid a deeper lookup of
the associated PVC when attempting to infer these defaults.
[1] https://github.com/kubevirt/community/blob/main/design-proposals/default-instancetypes-from-volumes.md
Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
- Split the huge DV controller into smaller op-specific DV controllers -
import, clone, upload
- Add common watch-adding function so each controller watches only its
relevant DVs
- Refactor the common Reconcile() to use interface DataVolumeReconciler
implemented by each controller
- Move all functions, structs, consts to the relevant controller
- Split the utests per controller
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Add cron-job sa to scc
Signed-off-by: Alexander Wels <awels@redhat.com>
* Make sure user is added on upgrade
Signed-off-by: Alexander Wels <awels@redhat.com>
Signed-off-by: Alexander Wels <awels@redhat.com>
* remove root worker pods
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* remove selinux requirement for worker pods
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* run tests in restricted namespace and required changes
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* handle empty tar
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* add PSA label when running functional tests in OpenShift
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* cannot use restricted PSA with istio (for now)
refactor scc management
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix clean script
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* Only list Ingresses/Routes in CDI namespace instead of cluster level
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Change the way we initialize cache for cdi controller
This gives us flexibility to cache only exactly what we need.
The error that led me to this was that we were attempting to Watch()
Routes/Ingresses which is basically caching all namespaces. We only want to cache the CDI namespace for those.
Source/feature from https://github.com/kubernetes-sigs/controller-runtime/issues/1708
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Comply with restricted security context in kubernetes
Ensure CDI pods comply with the restricted security context as much as
possible (have to be root for nbdkit and block devices). Also cannot set
SeccompProfile since SCC won't allow us to set it.
Signed-off-by: Alexander Wels <awels@redhat.com>
* Changed path /var/local/all_certs to stay in /var
Signed-off-by: Alexander Wels <awels@redhat.com>
* Reconcile DIC only if DataSource is not managed by another DIC
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* CR fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Get rid of unnecessary DIC reconcile updates
Also fixed status.lastExecutionTimestamp to be the last polling time
as intended in the design, and not the last reconcile update time.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Pass DIC last execution time by annotation
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Garbage Collect Completed DVs
See design at:
https://github.com/kubevirt/community/blob/main/design-proposals/garbage-collect-completed-dvs.md
ToDos:
- DataImportCron and DataSource controllers adaptation and func tests
- Add doc for DataVolume, CDIConfig and DataImportCron changes
- Extend unit tests and functional tests
- KubeVirt adaptation
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Controller minor fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Adapt tests to GC
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Add DV mutate unit test for GC
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Improve GC skip per annotation test
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Use DescribeTable for the GC tests
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
The Watch UpdateFunc predicate checks whether ObjectNew has the
DataImportCron label with a value, while the request enqueue mapping
function may refer to the old object from before the label value was set,
causing it to pass a reconcile.Request where Name is an empty string.
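One way to guard against this, sketched with plain types rather than the
controller-runtime API. The label key, the request type and the function
name are assumptions for illustration, and this is not necessarily how the
fix was implemented: the mapping function simply refuses to enqueue when the
label value is empty.

```go
package main

// labelDataImportCron is an assumed key for the DataImportCron label.
const labelDataImportCron = "cdi.kubevirt.io/dataImportCron"

// request is a hypothetical stand-in for reconcile.Request.
type request struct {
	namespace string
	name      string
}

// mapToCronRequests maps a labeled object to the DataImportCron that owns it.
// It returns nil when the label is missing or empty, so a request with an
// empty name is never enqueued.
func mapToCronRequests(namespace string, labels map[string]string) []request {
	cronName := labels[labelDataImportCron]
	if cronName == "" {
		return nil
	}
	return []request{{namespace: namespace, name: cronName}}
}
```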
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Gather all metrics info in a single location
Signed-off-by: João Vilaça <jvilaca@redhat.com>
* Add comments to exported monitoring vars
Signed-off-by: João Vilaça <jvilaca@redhat.com>
* Delete erroneous DV on DIC desired digest update
When a DataImportCron import DV has condition Running=False with
Reason=Error, it indicates this DIC might get stuck with this DV forever,
so no new import DVs will be created even if the source sha256 is
updated. With this change, when the digest is updated, before creating the
new DV, we simply delete the erroneous DV if necessary.
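The decision described above can be sketched as a small predicate. The
dvState type and the function name are hypothetical and only model the
Running condition; the real controller works on DataVolume objects.

```go
package main

// dvState is a hypothetical summary of the current import DV's Running
// condition.
type dvState struct {
	runningFalse bool   // condition Running has status False
	reason       string // condition reason, e.g. "Error"
}

// shouldDeleteErroneousDV reports whether the existing import DV should be
// deleted before creating a new one for the desired digest.
func shouldDeleteErroneousDV(currentDigest, desiredDigest string, dv *dvState) bool {
	if dv == nil || currentDigest == desiredDigest {
		return false // nothing to delete, or the digest did not change
	}
	return dv.runningFalse && dv.reason == "Error"
}
```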
Also includes some DataImportCron test improvements and cleanup.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* CR fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Fix flaky tests
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* More CR fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Fix CRDs deletion in operator deletion
Also check DataImportCron CRD has no DeletionTimestamp before adding a
finalizer
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* CR fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
The following labels should be set by the operator that creates the
resource, and not by the DataImportCron controller, otherwise it may
result in an unnecessary reconciliation and a race condition.
app.kubernetes.io/component
app.kubernetes.io/managed-by
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Remove tag from DataImportCron registry source URL
There are cases when the DataImportCron registry source URL contains a tag
referring to a specific version. We use this tagged URL when polling for
source updates, but on import we remove the tag, as we use
a specific sha256. E.g. for docker://quay.io/containerdisks/centos:8.4
the :8.4 tag will be removed.
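A hand-rolled illustration of the tag removal using only string operations.
It is not a full image reference parser: stripTag is a hypothetical helper,
and it assumes the URL always includes a repository path after the registry
host.

```go
package main

import "strings"

// stripTag removes a trailing ":tag" from a registry URL such as
// docker://quay.io/containerdisks/centos:8.4. URLs that already reference a
// digest ("@sha256:...") are returned unchanged.
func stripTag(url string) string {
	if strings.Contains(url, "@") {
		return url
	}
	lastSlash := strings.LastIndex(url, "/")
	lastColon := strings.LastIndex(url, ":")
	// A colon before the last slash belongs to the scheme or to a registry
	// port, not to a tag.
	if lastColon > lastSlash {
		return url[:lastColon]
	}
	return url
}
```

Here stripTag("docker://quay.io/containerdisks/centos:8.4") yields
docker://quay.io/containerdisks/centos, while a registry port such as
docker://localhost:5000/centos is left alone.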
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Use Docker reference parsing
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>