* Bump k8s/OpenShift/ctrl-runtime/lifecycle-sdk & make deps-update
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Operator: adapt for dependency bump
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Controller: adapt watch calls for dependency bump
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Controller: adapt to ctrl-runtime's cache API changes
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Operator: fix unit tests by deleting resources properly in fake client
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Controller: fix unit tests by deleting resources properly in fake client
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Controller: adapt to fake client honoring status subresource
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Fix codegen script & make generate
There are some issues in the new script, so we
will still use the deprecated one.
More context in f4d1a5431b
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Functests: Adapt to NamespacedName now implementing MarshalLog
ns/name -> {"name":"name","namespace":"ns"}
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Functests & API server: address deprecation of wait.PollImmediate
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
---------
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
This commit ups the cpu request for for all our installed compopents
(cdi-deployment, cdi-apiserver, cdi-uploadproxy, cdi-operator)
for 10m (1% of a core) to 100m (10% of a core).
The main driver of this is BZ: 2216038.
Without this change, it is pretty easy to create a large number of
concurrent clone operations and get token timeout errors.
Upping resource requests and concurrency addresses the issue
in a very direct way.
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* Tolerate errors due to deployment resourceVersion changing
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Watch and remove v1alpha1 version from CDI CRD
Previously we didn't monitor this resource at all, and since it's
not controlled by the cdi operator, we need to use custom watch
code for it.
Re-use the code for removing old version to make sure that if
v1alpha1 was ever a storedVersion, it's removed. Add a test for
that, too.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Don't wait for the deployment to be ready
Signed-off-by: Maya Rashish <mrashish@redhat.com>
---------
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Ensure Prometheus resources exist for CDINotReady
Ensure Prometheus resources exist also when cdi-deployment is not
ready. This is needed because currently when cdi deployment fails
(e.g. wrong NodeSelector) the Prometheus resources are not created,
so CDINotReady will not be fired although CDI is not ready.
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* CR fixes
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
* Gather all metrics info in a single location
Signed-off-by: João Vilaça <jvilaca@redhat.com>
* Add comments to exported monitoring vars
Signed-off-by: João Vilaça <jvilaca@redhat.com>
* Add degraded alert
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Add unusual restart count metric
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Add actual firing alerts (degraded/restartcount)
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Test newly added metrics
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Review: Rename metric to match conventions, func to check if test is eligible to run metric tests
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* Get rid of similar funcs, reconcile more generally
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* move apis to new staging area
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* add script to push to staging
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix lint check and api reference
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* push staging to api repo
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Users don't want 👽 resources in clusters,
and we should also be able to tell if were part of a broader installation.
Note:
- Operator created resources were handled in https://github.com/kubevirt/controller-lifecycle-operator-sdk/pull/18
as these labels will be common to all resources deployed by the HCO.
- Now that the controller is guaranteed to have the labels, we can set env vars
that reference the label values (fieldRef) to spare calling GET on the CR in the controllers.
(thanks mhenriks).
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
* update deps and bazel
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix apidocs and unit tests
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix generate-verify
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* sigs.k8s.io/controller-runtime/pkg/runtime/* packages are deprecated, and were moved to new paths.
Trying to upgrade sigs.k8s.io/controller-runtime to version v0.7.0 in HCO created a conflict because in v0.7.0 the deprecated packages were removed and cannot be used.
This PR replaces the deprecated packages with their new paths.
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
* Run `make deps-update`
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
* fix logger init
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
* fix test loggers
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
during the reconcile object creation, do it during the reconcile loop where
we already look up the CR anyway, and setting the value is free.
Signed-off-by: Alexander Wels <awels@redhat.com>
* Generate CDI CRD using controller-tools.
This is only done for CDI CRD as it requires the existence of source
code. Other CRDs we create are created by a more bare bones pod.
CDIUninstallStrategy was missing a comment describing it, so add
one. This was spotted manually so there might be more missing.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Allow users to specify which nodes CDI pods will live on.
nodeSelector, affinity and tolerations are possible values.
This is done in the CDI CR (rather than CDIConfig) as we are
interested in having this field be populated by external operators.
Unit tests now require the existence of a CDI CR, so create it.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Add a unit test covering some node placement functions
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Specify that all our pods are linux-only.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Avoid duplicate test, accidental left over.
Pointed out by awels, thanks.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Rename to cdiOperatorDeployment for clarity.
Suggested by awels
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Specify we only run on linux using the CDI CR, no need to embed this
into the code.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Don't dereference workloadPlacement for no reason
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Split off operator test to have its own AfterEach, BeforeEach.
Use even more descriptive function names.
Do all the CDI delete/restore logic in AfterEach, to ensure that
it happens and restores the deployment with the original CR even
if the test fails.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Remove XXX. This is the proper way.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Adapt to latest changes in controller_test.go (renaming import)
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Simplify, not storing intermediate value.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Don't dereference nodeplacement in callers to CreateDeployment
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Remove redundant save & restore. Unit tests do this for us.
Pointed out by awels, thanks.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Split out "find toplevel" to a utility function
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Wait for the CDI CR update to apply before continuing.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Simplify, not storing intermediate value.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Make it clear that the chosen node placement will not be schedulable.
Signed-off-by: Maya Rashish <mrashish@redhat.com>
* Add events to operator condition changes
Add events to operator create/delete/update of managed resources.
Signed-off-by: Alexander Wels <awels@redhat.com>
* Updated unit tests based on comments
Signed-off-by: Alexander Wels <awels@redhat.com>
* rebase on betav1
Signed-off-by: Alexander Wels <awels@redhat.com>
* Removed start events to reduce event generation spam
Signed-off-by: Alexander Wels <awels@redhat.com>
* Move CRDS from apiextensions v1beta1 to v1.
Ensure that our code based schema validation matches the types in the api.
Signed-off-by: Alexander Wels <awels@redhat.com>
* Ran go mod tidy and vendor in attempt to see if we could use newer runtime controller, but our go version too old.
Addressed review comments.
Signed-off-by: Alexander Wels <awels@redhat.com>
* Addressed more review comments and fixed k8s-1.18 functional test failing.
Signed-off-by: Alexander Wels <awels@redhat.com>
* Remove categories 'all' from cluster scoped CRDs
Signed-off-by: Alexander Wels <awels@redhat.com>
* Strip status from objects that have them before comparing them to expected objects which will have a blank status anyway. This stops the operator from needlessly reconciling objects that should not get reconciled due to a slight status change.
Signed-off-by: Alexander Wels <awels@redhat.com>
* Address review comments:
- Simplified status check code.
- Added default values for webhooks
- Added default values for deployments
Signed-off-by: Alexander Wels <awels@redhat.com>
* move upload.cdi.kubevirt.io API group to v1beta1
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* move core api to v1beta1
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix os-3.11 cluster sync and add functional tests for alpha api
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* change more occurences of v1alpha1
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* updates after rebase
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Fix#1212
Make sure that the `Status.ObservedVersion` fiels on upgrade, even if it was not set in the previous version.
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
* use dedicated SCC
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* SCC was not getting on initial deploy
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
If the user explicitly delete CDI cr,
HCO will quickly try to create a new one.
If HCO is quick enough, CDI operator can
enter the reconciliation loop when an older
cdi-config config-map is still there although
marked for deletion.
In that case CDI operator was not going to create
a new config-map but was then marking the new CDI CR
in error phase just because it's still not controlled
by the config map pending for deletion.
On the next run, CDI operator was not going to create a
new config map just because the new CR is already marked
with phase=Errror.
Skip the controlled-by check on config maps marked for
deletion to avoid this bad loop.
Fixes: https://bugzilla.redhat.com/1809872
Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
* webhook to block deletion of datavolumes for BlockUninstallIfWorkloadsExist uninstallStrategy
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix apiserver permissions and tighten up cdi delete webhook functional test
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix cdi delete webhook for older k8s versions that don't send the object
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* cleanup webhooks and apiservices on upgrade
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* have to wait for cdi configmap to be garbage collected
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* do dry run deletes for datavolume protection webhook
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* initial client upgrade to 1.16
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* fix Route detection in OpenShift
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* remove DOCKER_REPO from operator
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* make generate and update CDI schema
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
* * Initial upgrade support
* - Detect from reconcile loop that it is uograde flow
* - Set ObeservedVersion to target when upgrade is finished
* - Delete unused objects at the end of upgrade
* * opertor controller unit test - detect upgrade
* cdi upgrade unit tests
* - verify upgrade flow is detected when version is updated
* - verify on upgrade objects are updated
* - verify on upgrade unused objects are deleted
* * optimize cleanuoUnusedResourses function
* fix logging error
* * CR fixes
* remove unused methods in unit tests
* use reflect.DeepEqual to compare runtime.Objects in unit test
* check DeletionTimeStamp before entering upgrade
* * uit tests - CR is deleted during/before upgrade
* * CR fixes:
* - invoke Deletion callbacks before and after resource deletion on clenaupUnusedResourse function
* - when looking for object to delete - search not only by name but by namespace as well
* * delete unused resources of previous version is CDI CRF is marked for deletion during upgrade
* add unit test for this case
* * should not start upgrade if versions are identical
* * add unit tests to verify there is no upgrade on identical versions
* CR fix - return error
* don't think we have to explicitly cleanup old resources when CDI deleted during upgrade
* refactor code and properly handle deleting resources on upgrade
* reconcile loop now does three way merge to better handle upgrade