Commit Graph

20 Commits

Author SHA1 Message Date
Arnon Gilboa
ee431e8b14
[release-v1.57] Suppress alerts to reduce noise of dependent ones (#3185)
* [release-v1.57] Suppress alerts to reduce noise of dependent ones

Manual backport of #3129

This is a follow-up to #2998 introducing the following changes to alert
rules:
- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* [release-v1.57] Get no-provisioner storage capabilities from PVs

Manual backport of #3128

It's time to get the storage capabilities for all no-provisioner storage
classes based on the existing PVs, and not only for those labeled with
`local.storage.openshift.io/owner-name`, as done so far.
It will also silence the unnecessary CDIStorageProfilesIncomplete alert
for local SC in the CI.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-04-12 00:59:57 +02:00
Arnon Gilboa
41f7c89187
[release-v1.57] Add new Prometheus alerts and label existing alerts (#3041)
Manual backport of #2998 & #3038

- CDINoDefaultStorageClass - not having a default SC is surely not an
OpenShift error, as admins may prefer their cluster users to only use
explicit SC names. However, in the CDI context when DV is created with
default SC but default does not exist, we will fire an error event and
the PVC will be Pending for the default SC, so when there are such
Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default SC does not support
CSI/Snapshot clone (smart clone) or does not have ReadWriteMany access
mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Also:
* Rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage class and
storage profile.
* Add snapshot manifests for tests and deploy snapshot CRDs in the hpp
destructive lane

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-01-03 20:29:20 +01:00
kubevirt-bot
01fbebfc6e
increase controller cuncurrency and cpu requests (#2867)
This commit ups the cpu request for for all our installed compopents
(cdi-deployment, cdi-apiserver, cdi-uploadproxy, cdi-operator)
for 10m (1% of a core) to 100m (10% of a core).
The main driver of this is BZ: 2216038.
Without this change, it is pretty easy to create a large number of
concurrent clone operations and get token timeout errors.
Upping resource requests and concurrency addresses the issue
in a very direct way.

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Co-authored-by: Michael Henriksen <mhenriks@redhat.com>
2023-08-24 19:50:37 +02:00
Alexander Wels
4b2c171ecc
Backport main commits to 1.57 release branch (#2764)
* Enable empty schedule in DataImportCron (#2711)

Allow disabling DataImportCron schedule and support external trigger

Signed-off-by: Ido Aharon <iaharon@redhat.com>

* expand upon #2721 (#2731)

Need to replace requeue bool with requeue duration

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* Add clone from snapshot functionalities to clone-populator (#2724)

* Add clone from snapshot functionalities to the clone populator

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update clone populator unit tests to cover clone from snapshot capabilities

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Fix storage class assignation in temp-source claim for host-assisted clone from snapshot

This commit also includes other minor and styling-related fixes

Signed-off-by: Alvaro Romero <alromero@redhat.com>

---------

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Prepare CDI testing for the upcoming non-CSI lane (#2730)

* Update functional tests to skip incompatible default storage classes

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Enable the use of non-csi HPP in testing lanes

This commit modifies several scripts to allow the usage of classic HPP as the default SC in tests.

This allows us to test our non-populator flow with a non-csi provisioner.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

---------

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Allow snapshots as format for DataImportCron created sources (#2700)

* StorageProfile API for declaring format of resulting cron disk images

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Integrate recommended format in dataimportcron controller

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Take snapclass existence into consideration when populating cloneStrategy and sourceFormat

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

---------

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Remove leader election test (#2745)

Now that we are using the standard k8s leases from
the controller runtime library, there is no need to
test our implementation as it is no longer in use.
This will save some testing time and random failures.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Integration of Data volume using CDI populators (#2722)

* move cleanup out of dv deletion

It seemed off to call cleanup in the prepare function
just because we don't call cleanup unless the dv is deleting.
Instead we check in the clenup function itself if it should be
done: in this 2 specific cases in case of deletion and in case
the dv succeeded.
The cleanup will be used in future commit also for population cleanup
which we also want to happen not only on deletion.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Use populator if csi storage class exists

Add new datavolume phase PendingPopulation to
indicate wffc when using populators, this new
phase will be used in kubevirt in order to know
that there is no need for dummy pod to pass wffc phase
and that the population will occur once creating the vm.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Update population targetPVC with pvc prime annotations

The annotations will be used to update dv that uses the
populators.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Adjust UT with new behavior

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* updates after review

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Fix import populator report progress

The import pod should be taken from pvcprime

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Prevent requeue upload dv when failing to find progress report pod

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Remove size inflation in populators

The populators are handling existing PVCs.
The PVC already has a defined requested size,
inflating the PVC' with fsoverhead will only be
on the PVC' spec and will not reflect on the target
PVC, this seems undesired.
Instead if the populators is using by PVC that the
datavolume controller created the inflation will happen
there if needed.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Adjust functional tests to handle dvs using populators

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Fix clone test

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* add shouldUpdateProgress variable to know if need to update progress

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Change update of annotation from denied list to allowed list

Instead if checking if the annotation on pvcPrime is not desired
go over desired list and if the annotation exists add it.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* fix removing annotations from pv when rebinding

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* More fixes and UT

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* a bit more updates and UTs

Signed-off-by: Shelly Kagan <skagan@redhat.com>

---------

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Run bazelisk run //robots/cmd/uploader:uploader -- -workspace /home/prow/go/src/github.com/kubevirt/project-infra/../containerized-data-importer/WORKSPACE -dry-run=false (#2751)

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>

* Allow dynamic linked build for non bazel build (#2753)

The current script always passes the static ldflag to the
compiler which will result in a static binary. We would like
to be able to build dynamic libraries instead.

cdi-containerimage-server has to be static because we
are copying it into the context of a container disk container
which is most likely based on a scratch container and has no
libraries for us to use.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Disable DV GC by default (#2754)

* Disable DV GC by default

DataVolume garbage collection is a nice feature, but unfortunately it
violates fundamental principle of Kubernetes. CR should not be
auto-deleted when it completes its role (Job with TTLSecondsAfter-
Finished is an exception), and once CR was created we can assume it is
there until explicitly deleted. In addition, CR should keep idempotency,
so the same CR manifest can be applied multiple times, as long as it is
a valid update (e.g. DataVolume validation webhook does not allow
updating the spec).

When GC is enabled, some systems (e.g GitOps / ArgoCD) may require a
workaround (DV annotation deleteAfterCompletion = "false") to prevent
GC and function correctly.

On the next kubevirt-bot Bump kubevirtci PR (with bump-cdi), it will
fail on all kubevirtci lanes with tests referring DVs, as the tests
IsDataVolumeGC() looks at CDIConfig Spec.DataVolumeTTLSeconds and
assumes default is enabled. This should be fixed there.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Fix test waiting for PVC deletion with UID

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Fix clone test assuming DV was GCed

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Fix DIC controller DV/PVC deletion when snapshot is ready

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Ido Aharon <iaharon@redhat.com>
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Signed-off-by: Alvaro Romero <alromero@redhat.com>
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
Signed-off-by: Alexander Wels <awels@redhat.com>
Signed-off-by: Shelly Kagan <skagan@redhat.com>
Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Co-authored-by: Ido Aharon <iaharon@redhat.com>
Co-authored-by: Michael Henriksen <mhenriks@redhat.com>
Co-authored-by: alromeros <alromero@redhat.com>
Co-authored-by: akalenyu <akalenyu@redhat.com>
Co-authored-by: Shelly Kagan <skagan@redhat.com>
Co-authored-by: kubevirt-bot <kubevirtbot@redhat.com>
Co-authored-by: Arnon Gilboa <agilboa@redhat.com>
2023-06-22 00:59:17 +02:00
Arnon Gilboa
bfe30a8964
Split and refactor DV controller (#2483)
- Split the huge DV controller into smaller op-specific DV controllers -
  import, clone, upload
- Add common watch-adding function so each controller watches only its
  relevant DVs
- Refactor the common Reconcile() to use interface DataVolumeReconciler
  implemented by each controller
- Move all functions, structs, consts to the relevant controller
- Split the utests per controller

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-12-22 01:03:15 +00:00
Adam Litke
1532ae9ace
Suppress alerting for unsupported provisioners (#2476)
Certain storage provisioners are known not to work with CDI.  The main
example is provisioners that implement object storage (ie. buckets).  We
don't want to populate the StorageProfile with ClaimPropertySets but
there is also no need to raise an "incomplete" alert about such a
StorageProfile.  The simplest approach is to simply not count these
unsupported provisioners when setting the IncompleteProfileGauge.

Signed-off-by: Adam Litke <alitke@redhat.com>

Signed-off-by: Adam Litke <alitke@redhat.com>
2022-11-23 20:24:40 +00:00
akalenyu
4d4ad12df5
Only list Ingresses/Routes in CDI namespace instead of cluster level (#2371)
* Only list Ingresses/Routes in CDI namespace instead of cluster level

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Change the way we initialize cache for cdi controller

This gives us flexibility to cache only exactly what we need.
The error that led me to this was that we were attempting to Watch()
Routes/Ingresses which is basically caching all namespaces. We only want to cache the CDI namespace for those.
Source/feature from https://github.com/kubernetes-sigs/controller-runtime/issues/1708

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2022-08-01 22:12:47 +02:00
Alexander Wels
bbee324537
Make CDI clone strategy strings correct types (#2277)
Noticed the consts were strings instead of the right types
this changes the type to the right type and modifies the usage
to not have to cast to the right type all over the place.

Signed-off-by: Alexander Wels <awels@redhat.com>
2022-05-17 14:19:25 +02:00
Maya Rashish
c05a750a9a
Detect storage capabilities for no-provisioner storage classes (#2226)
* Detect storage capabilities for no-provisioner storage classes

Assume there's a persistent volume that we can look up to infer the
correct values for volume mode and access modes.

Limit ourselves to detecting no-provisioner capabilities on LSO to
avoid greatly increasing the number of storage classes we provide
capabilities for. This is similar to our current flow where we
only provide capabilities for known storage classes.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Regenerate bazel stuff for pkg/monitoring's existence

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Add a watcher for no-provisioner PVs

We maintain a map of storage class names and provisioners whenever
storage classes are changed.

If a PV has one of the storage classes with no-provisioner as a
provisioner, reconcile that storage class.

This is because we infer the storage profile based on PVs, and
new ones might have different storage capabilities.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Use a client to do our storage class caching

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Pass a client as an argument, not global.

Suggested by awels, thanks!

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2022-05-12 14:00:02 +02:00
João Vilaça
1ad70c2ec2
Gather all metrics info in a single location (#2231)
* Gather all metrics info in a single location

Signed-off-by: João Vilaça <jvilaca@redhat.com>

* Add comments to exported monitoring vars

Signed-off-by: João Vilaça <jvilaca@redhat.com>
2022-04-13 01:46:19 +02:00
Michael Henriksen
d56e0cca05
23 libs (#2077)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2022-01-07 16:56:25 +01:00
Prajith Kesava Prasad
7a85f2c18c
Trigger Host Assisted Cloning for encrypted ocs sc. (#2004)
* Trigger Host Assisted Cloning for encrypted ocs sc.

Signed-off-by: Prajith Kesava Prasad <pkesavap@redhat.com>

* Add functional tests

Signed-off-by: Prajith Kesava Prasad <pkesavap@redhat.com>
2021-12-11 00:52:00 +01:00
akalenyu
38af724f1c
Add alert for incomplete storage profiles / delete profile when corresponding SC gone (#2027)
* Add alert for incomplete storage profiles

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Run metric tests on both openshift and k8s

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Add functional test for storageprofile metrics

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Delete profile as a follow up to storage class getting deleted

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Address review, alter tests to cover List metric approach

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Address review; individually loop over metric decrement, shorten reconcile.Result{}

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Address review; deletion timestamp not possible when err/teardown in AfterEach

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2021-12-01 21:54:59 +01:00
akalenyu
eb639a6ac5
Change some relationship labels on update as well (#2018)
* Update operator-lifecycle-sdk to get fix for labels on upgrade

Update dep to get https://github.com/kubevirt/controller-lifecycle-operator-sdk/pull/19

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Reconcile labels also for CDIConfig

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Reconcile labels on storageprofile

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Reconcile remaining operator resources for updated labels

BZ#2017478

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2021-11-23 16:16:49 +01:00
Shelly Kagan
e7dd62eb26
Upload archive (#1969)
* Add support for archive upload

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* fix golang errors

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Change storage profile property set to support more then one set

So far CDI supported only 1 claim propery set. We want to be able
to support more then one so in case the user provides to the
DV storage volumeMode without accessMode or vice versa cdi
will be able to fit to it the most appropriate match.
Added to rook ceph block a second default of filesystem
volume mode with RWO access mode, it will support archive
upload which has default of filesystem mode.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* CR fix - change to one endpoint for the user

upload proxy will identify if the upload is archive
or not by looking at the content type annotation on
the pvc. If the content type is archive it will route
the uplaod to upload server to a new archive upload uri.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Add storage profile and data volume controllers unit tests

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* CR fixes

* add default volume mode to archive content type
* upload server use data processor for archive upload
* tests for volume mode with archive content type
* tests for archive upload of compressed tar

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Adjust imports acording to new apis dir

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* CR small fixes

Signed-off-by: Shelly Kagan <skagan@redhat.com>
2021-11-03 20:11:47 +01:00
Michael Henriksen
aedaf513ec
Move apis to staging, push to containerized-data-importer-api (#1997)
* move apis to new staging area

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* add script to push to staging

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix lint check and api reference

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* push staging to api repo

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-10-28 13:40:24 +02:00
Bartosz Rybacki
a82a0a7a59
Correct the cloneStrategy on StorageProfile (#1951)
Moves the cloneStrategy up one level from claimPropertySet to
StorageProfile.spec and StorageProfile.status.

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2021-09-22 18:06:18 +02:00
akalenyu
2254cf0c1f
Add relationship labels (#1864)
Users don't want 👽 resources in clusters,
and we should also be able to tell if were part of a broader installation.

Note:
- Operator created resources were handled in https://github.com/kubevirt/controller-lifecycle-operator-sdk/pull/18
as these labels will be common to all resources deployed by the HCO.
- Now that the controller is guaranteed to have the labels, we can set env vars
that reference the label values (fieldRef) to spare calling GET on the CR in the controllers.
(thanks mhenriks).

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2021-07-28 20:05:24 +02:00
Michael Henriksen
d92c2f459d
update deps and bazel (#1815)
* update deps and bazel

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix apidocs and unit tests

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix generate-verify

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-06-08 01:31:59 +02:00
Bartosz Rybacki
1a6697b618
Provide storage profiles (#1734)
* Fix cluster scope for StorageProfile

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Automatically provide StorageProfiles based on well known providers

A new controller scans Storage Classes and creates a new StorageProfile for each. It provides recommended parameters for storage classes with well known providers.

Add StorageProfiles CRD to "make generate-verify" target

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>

* Refactor - extracted getStorageProfile to make code readable

Signed-off-by: Bartosz Rybacki <brybacki@redhat.com>
2021-04-08 13:40:58 +02:00