Commit Graph

53 Commits

Author SHA1 Message Date
Aviv Litman
42ec627e35
Refactor recording-rules and alerts code (#3068)
* Refactor recording-rules and alerts code

Signed-off-by: avlitman <alitman@redhat.com>

* Remove promv1 from schema

Signed-off-by: avlitman <alitman@redhat.com>

---------

Signed-off-by: avlitman <alitman@redhat.com>
2024-02-18 16:05:42 +01:00
Alex Kalenyuk
31d12e426e
update k8s & related libraries to 1.28 (#3078)
* Bump k8s/OpenShift/ctrl-runtime/lifecycle-sdk & make deps-update

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Operator: adapt for dependency bump

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: adapt watch calls for dependency bump

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: adapt to ctrl-runtime's cache API changes

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Operator: fix unit tests by deleting resources properly in fake client

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: fix unit tests by deleting resources properly in fake client

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: adapt to fake client honoring status subresource

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Fix codegen script & make generate

There are some issues in the new script, so we
will still use the deprecated one.
More context in f4d1a5431b

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Functests: Adapt to NamespacedName now implementing MarshalLog

ns/name -> {"name":"name","namespace":"ns"}

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Functests & API server: address deprecation of wait.PollImmediate

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

---------

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2024-01-23 17:52:05 +01:00
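The logging change noted in 31d12e426e (`ns/name -> {"name":"name","namespace":"ns"}`) can be illustrated with a small sketch. The `namespacedName` type and both helper functions below are hypothetical stand-ins written with the standard library only. They are not the Kubernetes implementation and only reproduce the two output shapes quoted in the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// namespacedName is a hypothetical stand-in for a namespace/name pair.
type namespacedName struct {
	Namespace string
	Name      string
}

// legacyForm renders the older "ns/name" shape.
func legacyForm(n namespacedName) string {
	return n.Namespace + "/" + n.Name
}

// structuredForm renders the JSON shape quoted in the commit message.
func structuredForm(n namespacedName) (string, error) {
	b, err := json.Marshal(struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	}{Name: n.Name, Namespace: n.Namespace})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	n := namespacedName{Namespace: "ns", Name: "name"}
	s, err := structuredForm(n)
	if err != nil {
		panic(err)
	}
	fmt.Println(legacyForm(n))
	fmt.Println(s)
}
```

A functional test that matched log lines against the old string form would presumably need to match the structured form instead, which is what the "Functests: Adapt to NamespacedName" item suggests.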
Aviv Litman
3bb70209d0
Refactor monitoring code (#3009)
* refactor monitoring

Signed-off-by: avlitman <alitman@redhat.com>

* Upgrade pointer to ptr

Signed-off-by: avlitman <alitman@redhat.com>

* fix controller base and ready gauge

Signed-off-by: avlitman <alitman@redhat.com>

---------

Signed-off-by: avlitman <alitman@redhat.com>
2024-01-02 09:17:18 +01:00
Michael Henriksen
cc8dbc3bae
increase controller concurrency and cpu requests (#2862)
This commit ups the cpu request for all our installed components
(cdi-deployment, cdi-apiserver, cdi-uploadproxy, cdi-operator)
from 10m (1% of a core) to 100m (10% of a core).
The main driver of this is BZ: 2216038.
Without this change, it is pretty easy to create a large number of
concurrent clone operations and get token timeout errors.
Upping resource requests and concurrency addresses the issue
in a very direct way.

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2023-08-24 02:48:34 +02:00
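The percentages quoted in cc8dbc3bae follow from the millicore unit, where 1000m equals one full core. A minimal sketch of that arithmetic, with a hypothetical helper name that is not taken from the repository:

```go
package main

import "fmt"

// millicoresToCorePercent converts a CPU request in millicores to a
// percentage of one core (1000m == 100%, so 1m == 0.1%).
func millicoresToCorePercent(millicores int) float64 {
	return float64(millicores) / 10
}

func main() {
	// The old and new request values named in the commit message.
	for _, m := range []int{10, 100} {
		fmt.Printf("%dm = %g%% of a core\n", m, millicoresToCorePercent(m))
	}
}
```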
Maya Rashish
be9e17141c
Remove v1alpha1 version from CDI CRD (#2527)
* Tolerate errors due to deployment resourceVersion changing

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Watch and remove v1alpha1 version from CDI CRD

Previously we didn't monitor this resource at all, and since it's
not controlled by the cdi operator, we need to use custom watch
code for it.

Re-use the code for removing old version to make sure that if
v1alpha1 was ever a storedVersion, it's removed. Add a test for
that, too.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Don't wait for the deployment to be ready

Signed-off-by: Maya Rashish <mrashish@redhat.com>

---------

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2023-02-02 05:16:09 +01:00
Arnon Gilboa
fe006ad923
Ensure Prometheus resources exist for CDINotReady (#2546)
* Ensure Prometheus resources exist for CDINotReady

Ensure Prometheus resources exist also when cdi-deployment is not
ready. This is needed because currently when cdi deployment fails
(e.g. wrong NodeSelector) the Prometheus resources are not created,
so CDINotReady will not be fired although CDI is not ready.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2023-01-26 00:16:47 +01:00
João Vilaça
1ad70c2ec2
Gather all metrics info in a single location (#2231)
* Gather all metrics info in a single location

Signed-off-by: João Vilaça <jvilaca@redhat.com>

* Add comments to exported monitoring vars

Signed-off-by: João Vilaça <jvilaca@redhat.com>
2022-04-13 01:46:19 +02:00
Michael Henriksen
d56e0cca05
23 libs (#2077)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2022-01-07 16:56:25 +01:00
akalenyu
fd332a3165
Degraded/unusual restartcount alerts (#2009)
* Add degraded alert

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Add unusual restart count metric

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Add actual firing alerts (degraded/restartcount)

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Test newly added metrics

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Review: Rename metric to match conventions, func to check if test is eligible to run metric tests

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Get rid of similar funcs, reconcile more generally

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2021-11-18 01:05:01 +01:00
Michael Henriksen
aedaf513ec
Move apis to staging, push to containerized-data-importer-api (#1997)
* move apis to new staging area

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* add script to push to staging

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix lint check and api reference

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* push staging to api repo

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-10-28 13:40:24 +02:00
akalenyu
2254cf0c1f
Add relationship labels (#1864)
Users don't want 👽 resources in clusters,
and we should also be able to tell if we're part of a broader installation.

Note:
- Operator created resources were handled in https://github.com/kubevirt/controller-lifecycle-operator-sdk/pull/18
as these labels will be common to all resources deployed by the HCO.
- Now that the controller is guaranteed to have the labels, we can set env vars
that reference the label values (fieldRef) to spare calling GET on the CR in the controllers.
(thanks mhenriks).

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2021-07-28 20:05:24 +02:00
Michael Henriksen
d92c2f459d
update deps and bazel (#1815)
* update deps and bazel

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix apidocs and unit tests

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix generate-verify

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-06-08 01:31:59 +02:00
Vishesh Tanksale
ec52451f75
Add functionality to dump CDI install strategy in a configmap (#1741)
* Add functionality to dump CDI install strategy in a configmap

Signed-off-by: Vishesh Ajay Tanksale <vtanksale@apple.com>

* Fixing linter issues

Signed-off-by: Vishesh Ajay Tanksale <vtanksale@apple.com>

Co-authored-by: Vishesh Ajay Tanksale <vtanksale@apple.com>
2021-05-07 18:18:14 +02:00
Michael Henriksen
ee2f8376bb
fix custom cert rotation params (#1775)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-05-06 20:19:39 +02:00
Michael Henriksen
838ff7939a
update api for cert configuration (#1542)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2021-01-14 16:11:03 +01:00
Nahshon Unna Tsameret
93627f4fe8
Stop Using Deprecated Packages (#1548)
* sigs.k8s.io/controller-runtime/pkg/runtime/* packages are deprecated, and were moved to new paths.

Trying to upgrade sigs.k8s.io/controller-runtime to version v0.7.0 in HCO created a conflict because in v0.7.0 the deprecated packages were removed and cannot be used.

This PR replaces the deprecated packages with their new paths.

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>

* Run `make deps-update`

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>

* fix logger init

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>

* fix test loggers

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
2020-12-24 07:08:50 +01:00
Jakub Dzon
7f368900de
Updated controller-lifecycle-operator-sdk dependency (#1389)
Signed-off-by: Jakub Dzon <jdzon@redhat.com>
2020-09-24 14:39:29 +02:00
Jakub Dzon
5aa47587d3
Introducing operator lifecycle sdk (#1350)
Signed-off-by: Jakub Dzon <jdzon@redhat.com>
2020-09-17 23:25:26 +02:00
Alexander Wels
5a9e0ce377
Change how we look up the initial node placement (#1374)
Instead of trying to do it during the reconcile object creation, do it
during the reconcile loop, where we already look up the CR anyway and
setting the value is free.

Signed-off-by: Alexander Wels <awels@redhat.com>
2020-09-17 11:25:25 +02:00
Maya Rashish
e3436e0199
Allow specifying nodeSelector, affinity and tolerations for CDI pods (#1346)
* Generate CDI CRD using controller-tools.

This is only done for CDI CRD as it requires the existence of source
code. Other CRDs we create are created by a more bare bones pod.

CDIUninstallStrategy was missing a comment describing it, so add
one. This was spotted manually so there might be more missing.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Allow users to specify which nodes CDI pods will live on.

nodeSelector, affinity and tolerations are possible values.

This is done in the CDI CR (rather than CDIConfig) as we are
interested in having this field be populated by external operators.

Unit tests now require the existence of a CDI CR, so create it.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Add a unit test covering some node placement functions

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Specify that all our pods are linux-only.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Avoid duplicate test, accidental left over.

Pointed out by awels, thanks.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Rename to cdiOperatorDeployment for clarity.

Suggested by awels

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Specify we only run on linux using the CDI CR, no need to embed this
into the code.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Don't dereference workloadPlacement for no reason

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Split off operator test to have its own AfterEach, BeforeEach.

Use even more descriptive function names.

Do all the CDI delete/restore logic in AfterEach, to ensure that
it happens and restores the deployment with the original CR even
if the test fails.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Remove XXX. This is the proper way.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Adapt to latest changes in controller_test.go (renaming import)

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Simplify, not storing intermediate value.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Don't dereference nodeplacement in callers to CreateDeployment

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Remove redundant save & restore. Unit tests do this for us.

Pointed out by awels, thanks.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Split out "find toplevel" to a utility function

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Wait for the CDI CR update to apply before continuing.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Simplify, not storing intermediate value.

Signed-off-by: Maya Rashish <mrashish@redhat.com>

* Make it clear that the chosen node placement will not be schedulable.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
2020-09-03 22:13:18 +02:00
Alexander Wels
6cf86d5984
Add events to operator (#1182)
* Add events to operator condition changes
Add events to operator create/delete/update of managed resources.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Updated unit tests based on comments

Signed-off-by: Alexander Wels <awels@redhat.com>

* rebase on betav1

Signed-off-by: Alexander Wels <awels@redhat.com>

* Removed start events to reduce event generation spam

Signed-off-by: Alexander Wels <awels@redhat.com>
2020-08-27 18:59:15 +02:00
Alexander Wels
6dce12f090
Move CRDS from apiextensions v1beta1 to v1. (#1307)
* Move CRDS from apiextensions v1beta1 to v1.
Ensure that our code based schema validation matches the types in the api.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Ran go mod tidy and vendor in an attempt to see if we could use a newer controller-runtime, but our Go version is too old.
Addressed review comments.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Addressed more review comments and fixed k8s-1.18 functional test failing.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Remove categories 'all' from cluster scoped CRDs

Signed-off-by: Alexander Wels <awels@redhat.com>
2020-08-01 01:01:50 +02:00
Alexander Wels
72d47353a2
Strip status from objects in operator reconcile loop (#1237)
* Strip status from objects that have them before comparing them to expected objects which will have a blank status anyway. This stops the operator from needlessly reconciling objects that should not get reconciled due to a slight status change.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Address review comments:
- Simplified status check code.
- Added default values for webhooks
- Added default values for deployments

Signed-off-by: Alexander Wels <awels@redhat.com>
2020-07-27 17:19:47 +02:00
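The idea in 72d47353a2, stripping status before comparing a live object with the expected one, can be sketched with plain maps. The `object` type and function names are hypothetical and use only the standard library. The actual operator works on typed Kubernetes objects, which this sketch does not attempt to reproduce.

```go
package main

import (
	"fmt"
	"reflect"
)

// object is a hypothetical, simplified resource: a decoded JSON-like map.
type object map[string]interface{}

// stripStatus returns a shallow copy of obj without its "status" key.
func stripStatus(obj object) object {
	out := make(object, len(obj))
	for k, v := range obj {
		if k != "status" {
			out[k] = v
		}
	}
	return out
}

// needsUpdate reports whether current differs from desired once status is ignored.
func needsUpdate(current, desired object) bool {
	return !reflect.DeepEqual(stripStatus(current), stripStatus(desired))
}

func main() {
	desired := object{"spec": map[string]interface{}{"replicas": 1}}
	current := object{
		"spec":   map[string]interface{}{"replicas": 1},
		"status": map[string]interface{}{"readyReplicas": 1},
	}
	// The two objects differ only in status, so no update is reported.
	fmt.Println(needsUpdate(current, desired))
}
```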
Michael Henriksen
9e2c79b1e0
move api groups to v1beta1 (#1232)
* move upload.cdi.kubevirt.io API group to v1beta1

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* move core api to v1beta1

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix os-3.11 cluster sync and add functional tests for alpha api

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* change more occurrences of v1alpha1

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* updates after rebase

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-07-10 15:47:38 +02:00
Nahshon Unna Tsameret
ece10521e9
[Upgrade Operator] Make sure that ObservedVersion is updated (#1213)
Fix #1212

Make sure that the `Status.ObservedVersion` field is updated on upgrade, even if it was not set in the previous version.

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
2020-05-26 15:17:31 +02:00
Michael Henriksen
fba04c868b
use dedicated SCC (#1174)
* use dedicated SCC

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* SCC was not getting created on initial deploy

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-04-15 15:38:03 +02:00
Michael Henriksen
03c36c8cd8
wait for all old resources to be deleted when installing CDI (#1156)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-03-27 05:18:32 +01:00
Simone Tiraboschi
a97e16724e
Avoid failing reconciliation on new CR (#1138)
If the user explicitly deletes the CDI CR,
HCO will quickly try to create a new one.
If HCO is quick enough, the CDI operator can
enter the reconciliation loop while an older
cdi-config config-map is still there, although
marked for deletion.
In that case the CDI operator was not going to create
a new config-map but was then marking the new CDI CR
in error phase just because it's still not controlled
by the config map pending deletion.
On the next run, the CDI operator was not going to create a
new config map just because the new CR is already marked
with phase=Error.
Skip the controlled-by check on config maps marked for
deletion to avoid this bad loop.

Fixes: https://bugzilla.redhat.com/1809872

Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
2020-03-12 13:43:58 +01:00
Michael Henriksen
102ce2e78c
Uninstall strategy and blocking webhook (#1118)
* webhook to block deletion of datavolumes for BlockUninstallIfWorkloadsExist uninstallStrategy

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix apiserver permissions and tighten up cdi delete webhook functional test

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix cdi delete webhook for older k8s versions that don't send the object

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* cleanup webhooks and apiservices on upgrade

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* have to wait for cdi configmap to be garbage collected

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* do dry run deletes for datavolume protection webhook

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-02-21 05:34:50 +01:00
Michael Henriksen
64d7a26a65
need to use uncached client in certain places (#1107)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-02-16 17:30:46 +01:00
Michael Henriksen
0b9fb15e86
operator create apiservice and webhook configurations (#1103)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-02-11 05:45:15 +01:00
Michael Henriksen
bd4c4c950b
cert rotation (#1091)
* initial cert rotation controller

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix typo

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-02-03 23:36:58 +01:00
Michael Henriksen
99f8af5b86
k8s client upgrade to 1.16 (#1079)
* initial client upgrade to 1.16

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* fix Route detection in OpenShift

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2020-01-14 13:43:17 +01:00
Michael Henriksen
f1e8b88052
make operator more resilient when creating ownership configmap fails (#1047)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2019-12-06 16:07:22 +01:00
Michael Henriksen
97c23cfa5a
remove DOCKER_REPO from operator (#1022)
* remove DOCKER_REPO from operator

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* make generate and update CDI schema

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2019-11-14 02:59:16 +01:00
Alexander Wels
399a05dc53
Add upgrade from release version 1.10.4 to current master. (#954)
Switch OKD-4.1 provider from olm install to operator install.

Signed-off-by: Alexander Wels <awels@redhat.com>
2019-09-11 00:35:46 -04:00
Alexander Wels
28b0b7b70b
Set conditions properly while deploying. (#948)
Signed-off-by: Alexander Wels <awels@redhat.com>
2019-09-04 12:15:28 -04:00
Alexander Wels
45eecea14e
Added conditions to match the HCO requirements. (#910)
Signed-off-by: Alexander Wels <awels@redhat.com>
2019-08-28 18:36:44 -04:00
Michael Henriksen
412b6e10ca
CDI upgrade support (#929)
* * Initial upgrade support
* - Detect from reconcile loop that it is upgrade flow
* - Set ObservedVersion to target when upgrade is finished
* - Delete unused objects at the end of upgrade

* *     operator controller unit test - detect upgrade
    *  cdi upgrade unit tests
    *  - verify upgrade flow is detected when version is updated
    *  - verify on upgrade objects are updated
    *  - verify on upgrade unused objects are deleted

* * optimize cleanupUnusedResources function
* fix logging error

* * CR fixes
* remove unused methods in unit tests
* use reflect.DeepEqual to compare runtime.Objects in unit test
* check DeletionTimeStamp before entering upgrade

* * unit tests - CR is deleted during/before upgrade

* * CR fixes:
* - invoke Deletion callbacks before and after resource deletion on cleanupUnusedResources function
* - when looking for object to delete - search not only by name but by namespace as well

* * delete unused resources of previous version if CDI CR is marked for deletion during upgrade
* add unit test for this case
* add unit test for this case

* * should not start upgrade if versions are identical

* * add unit tests to verify there is no upgrade on identical versions

* CR fix - return error

* don't think we have to explicitly cleanup old resources when CDI deleted during upgrade

* refactor code and properly handle deleting resources on upgrade

* reconcile loop now does three way merge to better handle upgrade
2019-08-27 08:43:49 -04:00
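Several items in 412b6e10ca describe when the reconcile loop should enter the upgrade flow: the version changed, the versions are not identical, and the CR is not marked for deletion. A minimal sketch of that decision follows, using a hypothetical `cr` struct and `shouldUpgrade` function. It does not model a fresh install with an empty observed version, nor the three-way merge mentioned in the last item.

```go
package main

import "fmt"

// cr is a hypothetical, reduced view of the CDI custom resource.
type cr struct {
	ObservedVersion string // version last recorded in status
	Deleting        bool   // true when a deletion timestamp is set
}

// shouldUpgrade reports whether the reconcile loop should enter the upgrade
// flow for the given target version.
func shouldUpgrade(c cr, targetVersion string) bool {
	if c.Deleting {
		// Check deletion before entering upgrade.
		return false
	}
	// Identical versions must not start an upgrade.
	return c.ObservedVersion != targetVersion
}

func main() {
	fmt.Println(shouldUpgrade(cr{ObservedVersion: "v1.10.4"}, "v1.11.0"))
	fmt.Println(shouldUpgrade(cr{ObservedVersion: "v1.11.0"}, "v1.11.0"))
	fmt.Println(shouldUpgrade(cr{ObservedVersion: "v1.10.4", Deleting: true}, "v1.11.0"))
}
```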
Michael Henriksen
24741566f3
route creation
2019-08-06 16:35:53 -04:00
Michael Henriksen
f8b79ba5bc
CCC reconciliation in callbacks also improved merge route creation TODO
2019-08-05 22:55:42 -04:00
Michael Henriksen
3fcb8edc4b
callbacks for operator
2019-08-05 22:55:42 -04:00
Alexander Wels
630a23ef23
Fix a bunch of go score card issues.
Signed-off-by: Alexander Wels <awels@redhat.com>
2019-05-06 16:52:03 -04:00
Michael Henriksen
680e223277
allow for override of registry and tag in CDI object
2019-04-19 10:58:12 -04:00
Michael Henriksen
dd66fa7594
fix bogus switch statement
2019-03-26 16:01:06 -04:00
Michael Henriksen
d2a3b1cc2f
operator creates upload proxy route
2019-03-26 09:16:24 -04:00
Michael Henriksen
6f1d130d97
tests and review comments
2019-02-25 20:12:56 -05:00
Michael Henriksen
3892a7310d
add configmap for insecure registries
2019-02-25 20:12:56 -05:00
Michael Henriksen
316fd29188
rename operator leadership configMap and clean up other secrets/configmaps created
2019-02-04 19:58:50 -05:00
Michael Henriksen
277193f18a
operator unit tests
2019-01-29 12:50:27 -05:00