containerized-data-importer

mirror of https://github.com/kubevirt/containerized-data-importer.git synced 2025-06-03 06:30:22 +00:00

Author	SHA1	Message	Date
akalenyu	5dbbb10ee9	Adjust alert summary that is visible in UI to be more informative (#2149) As of now OpenShift UI will not display the runbook URL field for our alerts (docs and google will though), so let's make sure we provide a better entry point by being verbose in what is actually visible to the user. Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>	2022-02-14 16:41:50 +01:00
João Vilaça	2b7f786ffe	Add tool to generate metrics documentation (#2043) Signed-off-by: João Vilaça <jvilaca@redhat.com>	2022-01-26 01:23:10 +01:00
Michael Henriksen	d56e0cca05	23 libs (#2077) Signed-off-by: Michael Henriksen <mhenriks@redhat.com>	2022-01-07 16:56:25 +01:00
akalenyu	3bff27dd43	Add alert for DataImportCron not being up to date (#2063) * Add alert for DataImportCron failing DataImportCrons now have conditions (particularly UpToDate) that tell us if things are going as planned. We can utilize those to alert whenever were not UpToDate for a while. Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Address CR review; don't List, increment when needed via corresponding instance Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Address review & bugfix: don't update metric if err occurs Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * upToDateCondition => prevUpToDateCondition so it's clear we're deciding if we should inc/dec based on that Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Don't store state in controller; change metric type to GaugeVec (bool metric per DIC) Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>	2021-12-24 01:10:52 +01:00
Assaf Admi	639c6a1bd1	Add common labels into alert definitions (#2039) We want to be able to list all kubevirt alerts so we added labels to differentiate them. Signed-off-by: assafad <aadmi@redhat.com>	2021-12-13 18:05:08 +01:00
akalenyu	38af724f1c	Add alert for incomplete storage profiles / delete profile when corresponding SC gone (#2027) * Add alert for incomplete storage profiles Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Run metric tests on both openshift and k8s Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Add functional test for storageprofile metrics Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Delete profile as a follow up to storage class getting deleted Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Address review, alter tests to cover List metric approach Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Address review; individually loop over metric decrement, shorten reconcile.Result{} Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Address review; deletion timestamp not possible when err/teardown in AfterEach Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>	2021-12-01 21:54:59 +01:00
akalenyu	fd332a3165	Degraded/unusual restartcount alerts (#2009) * Add degraded alert Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Add unusual restart count metric Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Add actual firing alerts (degraded/restartcount) Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Test newly added metrics Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Review: Rename metric to match conventions, func to check if test is eligible to run metric tests Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Get rid of similar funcs, reconcile more generally Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>	2021-11-18 01:05:01 +01:00
akalenyu	50c93e8b0e	Deploy alerts infra as part of our installation (#1979) * Deploy alerts infra as part of our installation Conditionally deploy the infrastructure that is needed to fire alerts for our users when bad things are happening to CDI. Testing with `KUBEVIRT_DEPLOY_PROMETHEUS=true` Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Watch and unit test all prometheus related resources Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * add gateway for changing monitoring namespace (rbac purposes) Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * refactor test to check for exact alert name and firing state Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Align pattern of ensuring prometheus resource exists for all Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Remove potential noisy event Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Extract duplicate code to function Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com> * Dont use empty value for prometheus label due to open issue https://github.com/prometheus-operator/prometheus-operator/issues/4325 Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>	2021-10-26 21:26:07 +02:00

8 Commits