9 Commits

Author SHA1 Message Date
Arnon Gilboa
afb269737b
Suppress alerts to reduce noise of dependent ones (#3129)
* Suppress alerts to reduce noise of dependent ones

This is a follow-up to #2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Remove expensive func test

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Refactor updateDataImportCronCondition

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add some verify comments

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add IsDataVolumeUsingDefaultStorageClass helper for readability

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-04-07 20:47:51 +02:00
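
The firing conditions described in afb269737b can be read as three boolean predicates. The following Go sketch is only an illustration of that reading: the real alerts are Prometheus rules, and the `clusterState` type, its fields, and the function names are hypothetical and not taken from the CDI code.

```go
package main

import "fmt"

// clusterState is a hypothetical summary of the inputs the alert rules
// depend on. None of these names come from the CDI source.
type clusterState struct {
	HasDefaultSC        bool // a default SC exists (either k8s or virt)
	DefaultSCDegraded   bool // default SC lacks smart clone or ReadWriteMany
	PendingDefaultDVs   int  // DVs (and PVCs) pending on the default SC
	EmptyStatusDVs      int  // DVs with an empty status (waiting for default SC etc.)
	OutdatedImportCrons int  // DataImportCrons that are not up to date
}

// noDefaultStorageClass fires when there is no default SC and at least one
// DV is either pending or still has an empty status.
func noDefaultStorageClass(s clusterState) bool {
	return !s.HasDefaultSC && (s.PendingDefaultDVs > 0 || s.EmptyStatusDVs > 0)
}

// defaultStorageClassDegraded is suppressed when no default SC exists.
func defaultStorageClassDegraded(s clusterState) bool {
	return s.HasDefaultSC && s.DefaultSCDegraded
}

// dataImportCronOutdated is suppressed when no default SC exists, because
// the DIC import DVs use the default SC.
func dataImportCronOutdated(s clusterState) bool {
	return s.HasDefaultSC && s.OutdatedImportCrons > 0
}

func main() {
	s := clusterState{EmptyStatusDVs: 1, DefaultSCDegraded: true, OutdatedImportCrons: 2}
	fmt.Println("CDINoDefaultStorageClass:", noDefaultStorageClass(s))
	fmt.Println("CDIDefaultStorageClassDegraded:", defaultStorageClassDegraded(s))
	fmt.Println("CDIDataImportCronOutdated:", dataImportCronOutdated(s))
}
```

With no default SC, only the first predicate can hold, so the dependent alerts stay quiet until the root cause is resolved.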
Aviv Litman
42ec627e35
Refactor recording-rules and alerts code (#3068)
* Refactor recording-rules and alerts code

Signed-off-by: avlitman <alitman@redhat.com>

* Remove promv1 from schema

Signed-off-by: avlitman <alitman@redhat.com>

---------

Signed-off-by: avlitman <alitman@redhat.com>
2024-02-18 16:05:42 +01:00
Aviv Litman
3bb70209d0
Refactor monitoring code (#3009)
* refactor monitoring

Signed-off-by: avlitman <alitman@redhat.com>

* Upgrade pointer to pnt

Signed-off-by: avlitman <alitman@redhat.com>

* fix controller base and ready gauge

Signed-off-by: avlitman <alitman@redhat.com>

---------

Signed-off-by: avlitman <alitman@redhat.com>
2024-01-02 09:17:18 +01:00
Arnon Gilboa
edda5abe0f
Add new Prometheus alerts and label existing alerts (#2998)
* Add Prometheus alerts and label existing alerts

- CDINoDefaultStorageClass - not having a default (or virt default)
SC is surely not an OpenShift error, as admins may prefer their cluster
users to only use explicit SC names. However, in the CDI context, when a
DV is created to use the default SC but no default exists, we fire an
error event and the PVC stays Pending for the default SC. So when there
are such Pending PVCs, we fire an alert.

- CDIDefaultStorageClassDegraded - when the default (or virt default)
SC does not support CSI/Snapshot clone (smart clone) or does not have
ReadWriteMany access mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Create stub VolumeSnapshotClass for testing

Including the VolumeSnapshot/Class/Content crds for the
CDIDefaultStorageClassDegraded alert func test.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add snapshot manifests for tests

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Deploy snapshot CRDs in the hpp destructive lane

Remove stub snapshot CRDs

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add label explanation to new metric help

Also rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage
class and storage profile.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Revert NoProvisioner check removal

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Nicify StorageProfile metric update

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2023-12-19 12:29:08 +01:00
Arnon Gilboa
0ee4a61987
Get rid of DataImportCron finalizer (#2144)
* Get rid of DataImportCron finalizer

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Remove CRDs deletion in operator deletion

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Cleanups

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-02-12 05:56:08 +01:00
Michael Henriksen
d56e0cca05
23 libs (#2077)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2022-01-07 16:56:25 +01:00
Arnon Gilboa
d77abc3fa9
Add DataSource controller to update the Ready condition (#2085)
even when there is no DataImportCron associated

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2022-01-06 21:06:23 +01:00
akalenyu
3bff27dd43
Add alert for DataImportCron not being up to date (#2063)
* Add alert for DataImportCron failing

DataImportCrons now have conditions (particularly UpToDate) that tell us if
things are going as planned. We can utilize those to alert whenever we're not
UpToDate for a while.

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Address CR review; don't List, increment when needed via corresponding instance

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Address review & bugfix: don't update metric if err occurs

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* upToDateCondition => prevUpToDateCondition so it's clear we're deciding if we should inc/dec based on that

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Don't store state in controller; change metric type to GaugeVec (bool metric per DIC)

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2021-12-24 01:10:52 +01:00
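
The last step of 3bff27dd43 (a boolean metric per DIC, with no state kept in the controller) might look roughly like the Go sketch below. It is an assumption-laden illustration: a plain map stands in for the Prometheus GaugeVec, the value 1 is assumed to mean "outdated", and `outdatedGauge` and its methods are hypothetical names.

```go
package main

import "fmt"

// outdatedGauge is a stand-in for a gauge vector with one series per
// DataImportCron, keyed by "namespace/name". Hypothetical, stdlib only.
type outdatedGauge map[string]float64

// seriesKey builds the per-DIC series key from its namespace and name.
func seriesKey(namespace, name string) string {
	return namespace + "/" + name
}

// record sets the series from the condition observed in the current
// reconcile, so the controller does not need to remember earlier values.
// Assumption: 1 means outdated, 0 means up to date.
func (g outdatedGauge) record(namespace, name string, upToDate bool) {
	value := 0.0
	if !upToDate {
		value = 1.0
	}
	g[seriesKey(namespace, name)] = value
}

// remove drops the series when the DataImportCron is deleted.
func (g outdatedGauge) remove(namespace, name string) {
	delete(g, seriesKey(namespace, name))
}

// outdatedCount returns how many DataImportCrons are currently outdated.
func (g outdatedGauge) outdatedCount() int {
	count := 0
	for _, value := range g {
		if value == 1.0 {
			count++
		}
	}
	return count
}

func main() {
	g := outdatedGauge{}
	g.record("default", "fedora-cron", false)
	g.record("default", "centos-cron", true)
	fmt.Println("outdated DataImportCrons:", g.outdatedCount())
}
```

Because each reconcile overwrites the series with the current truth, a missed or repeated reconcile cannot leave the metric drifting, which an increment/decrement counter would be prone to.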
Arnon Gilboa
fe018f1dc5
Add DataImportCron status conditions (#2045)
* Add DataImportCron status conditions

The `DataImportCron` controller updates the status conditions in a
controlled `DataImportCron` and its managed `DataSource`.

DataImportCron:
- UpToDate - indicates if the most recent import is successful and
    `DataSource` is up-to-date. Updated to False whenever the source
     digest (latest sha256) is updated.
- Progressing - indicates whether the cron is currently in the process
    of importing. Updated to True if there is a current import and its
    `DataVolume` is `ImportInProgress`, otherwise False.

DataSource:
- Ready - indicates that the corresponding PVC exists and is populated.
    Updated according to the `DataVolumeReady` condition of the
    `DataImportCron.Status.LastImportedPVC` `DataVolume`, if that
    `DataVolume` exists. Otherwise False. Unlike the `DataImportCron`
    `UpToDate` condition, this one does not care about a newer source
    digest.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add DataImportCron RetentionPolicy and remove OwnerReferences

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* More CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add tests for retention policies and datasource/datavolume recreation if deleted

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add status condition tests

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* SetRecommendedLabels for all created CRs

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2021-12-16 02:21:01 +01:00
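
The three status conditions described in fe018f1dc5 can be summarized as small predicates. This Go sketch is a simplified reading of the commit message, not the controller's code: `importState`, `dataVolume`, and the function names are hypothetical, and the digest comparison is an assumed way to model "the source digest is updated".

```go
package main

import "fmt"

// importState is a hypothetical snapshot of a DataImportCron.
type importState struct {
	HasCurrentImport    bool
	DataVolumePhase     string // phase of the current import's DataVolume
	LastImportSucceeded bool
	ImportedDigest      string // digest of the most recent import
	SourceDigest        string // latest sha256 seen at the source
}

// dataVolume is a hypothetical view of the last imported DataVolume.
type dataVolume struct {
	Ready bool // value of its DataVolumeReady condition
}

// progressing is true only while there is a current import whose
// DataVolume is in the ImportInProgress phase.
func progressing(s importState) bool {
	return s.HasCurrentImport && s.DataVolumePhase == "ImportInProgress"
}

// upToDate is true when the most recent import succeeded and no newer
// source digest has appeared since.
func upToDate(s importState) bool {
	return s.LastImportSucceeded && s.ImportedDigest == s.SourceDigest
}

// dataSourceReady follows the DataVolume's readiness and is false when
// the DataVolume does not exist. It ignores newer source digests.
func dataSourceReady(dv *dataVolume) bool {
	return dv != nil && dv.Ready
}

func main() {
	s := importState{LastImportSucceeded: true, ImportedDigest: "sha256:aaa", SourceDigest: "sha256:bbb"}
	fmt.Println("UpToDate:", upToDate(s))
	fmt.Println("Progressing:", progressing(s))
	fmt.Println("Ready:", dataSourceReady(&dataVolume{Ready: true}))
}
```

The example in `main` shows the case the commit message calls out: a newer source digest makes the cron not UpToDate while the DataSource can remain Ready.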