Commit Graph

548 Commits

Author SHA1 Message Date
Edu Gómez Escandell
cd7c8b14a5
Enable revive linter (#3241)
* Enable revive linter

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Simplify cdi-func-test-proxy

This function had quite a bit of redundant code (caught by the linter).
The workgroup was never Done because all exit paths went through a
log.Fatal.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Fix 'revive' linter warnings

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Fix tests that asserted on modified error messages

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Run make format

The formatted code has nothing to do with this PR but we may as well
include it.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Use lower-case variables and use built-in min function in vddk-datasource

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Use contexts in cdi-func-test-proxy

This added quite a bit of boilerplate per call, so I put everything in
a loop.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-05-21 22:35:42 +02:00
Arnon Gilboa
a5cedb1555
Progress metrics refactor and rename (#3254)
* clone_progress metric refactor

The clone_progress metric is not in the monitoring package. The metric
is with incorrect name, based on the kubevirt and Prometheus metrics
naming conventions. It's not documented and not located under
/pkg/monitoring. After the code refactoring we should not have
Prometheus metrics in other places in the code, other than the
/monitoring/metrics package, and metrics should be registered using
operator-observability package.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* openstack_populator_progress metric refactor

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* clone_progress metric refactor CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* ovirt_progress metric refactor

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Align progress metrics names with linter rules

Also add the metrics to the doc and json generation tools.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Remove redundant ListMetrics

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-05-17 14:02:25 +02:00
Alex Kalenyuk
74e255259c
Use direct io with qemu-img convert if pod is OOMKilled (#3017)
* Use direct io with qemu-img convert if target supports it

For a while now we have been switching between cache=none (direct io) and cache=writeback (page cache)
for qemu-img's writes.

We have settled on cache=writeback for quite some time since https://github.com/kubevirt/containerized-data-importer/pull/1919,
however, this has proven to be problematic;
Our pod's live in a constrained memory environment (default limit 600M).
cgroupsv1 compares utilization of page cache vs the host's dirty_ratio.
This means that on a standard system (30% dirty ratio) pages only get forced to disk at 0.3 * HOST_MEM (basically never),
easily triggering OOM on hosts with lots of free memory.

cgroupsv2 does come to the rescue here:
- It considers dirty_ratio against CGROUP_MEM
- Has a new memory.high knob that throttles instead of OOM killing
Sadly, k8s is yet to capitalize on memory.high since this feature is still alpha:
https://kubernetes.io/blog/2023/05/05/qos-memory-resources/
Leaving us with no way to avoid frequent OOMs.

This commit changes the way we write to bypass page cache if the target supports it,
otherwise, fall back to cache=writeback (use page cache).
There have previously been issues where target did not support O_DIRECT. A quick example is tmpfs (ram-based)

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Capitalize on cache mode=trynone if importer is being OOMKilled

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

---------

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2024-05-16 02:22:23 +02:00
Benny Zlotnik
46f0f76ed4
forklift: introduce forklift controller (#2983)
* forklift: add types for forklift populators

Introduce the OvirtVolumePopulator and OpenstackVolumePopulator types

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: introduce forklift controller

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: add CRD missing CRD suffix

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: fix tests

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: start support for mutated PVC

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: fix linter issues

and skip bound PVCs

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: update vendor and generated code

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: remove unnecessary argument

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: add ovirt-populator to cdi-importer

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: fix linter issues

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: fix contrller_test

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: regenerate swagger

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: normalize ovirt image name

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: restore OVIRT_POPULATOR_IMAGE_NAME

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: address comments

- remove accidental copy-paste
- change type of NAD reference to string
- use NAD in the controller

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: add doc with examples for the forklift populators

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: rename secret reference

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: fix linter warning

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: update deps

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: fix linter issues

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: use ginkgo for openstack populator test

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* Roll back nginx from 1.24.0 to 1.22.1 to avoid segfaults when pulling…
… images with tls

Possibly a bug, there were some segfault fixes in 1.25.4 release:
https://nginx.org/en/CHANGES

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* forklift: address comments

- Fix const
- Reverse conditional
- Pass context to reconcile
- Improve cross namespace check
- Add populator pod watcher

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: address comments

- Drop unnecessary comments
- Add dataSourceRef check
- Add node to spec

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: add restartCount annotation

- fix race condition leaving Running phase in annotation
- fix metrics UT

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: address comments

- Change optional fields to a pointer
- Fix Pod watch
- Add NotFound check to avoid an extra reconile

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

* forklift: regenerate

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>

---------

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
Co-authored-by: Alex Kalenyuk <akalenyu@redhat.com>
2024-05-14 21:38:22 +02:00
Edu Gómez Escandell
1ae6536b91
Fix flaky DataImportCron controller reconcile loop unit test (#3256)
This change caused a unit test to be flaky:
https://github.com/kubevirt/containerized-data-importer/pull/3176/files#

Before, this code would round down to the second, and add 1. Essentially
doing a "ceil" on the time delta.

After the change, the round-down part disappeared, so it became a plain
+1 rather than a ceil.

The unit test in question has an assertion ensuring that this value is
within 0 and 60.

If this value is within 59 and 60:
- Before #3176: rounds up to 60 (test passes)
- After #3176: is mapped to within 60 and 61 (test fails)

I didn't want to revert to the old code because it was a bit confusing,
so I removed the +1, which I understand it's there to make sure we
don't undershoot (no longer necessary as we don't round down anymore).

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-05-14 11:14:25 +02:00
Edu Gómez Escandell
7fb78edade
Enable dupword linter (#3175)
* Enable dupword linter

Per the readme:
> A linter that checks for duplicate words in the source code (usually miswritten)

https://github.com/Abirdcfly/dupword
https://golangci-lint.run/usage/linters/#dupword

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Remove duplicate words in comments and strings

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-25 01:00:23 +02:00
Edu Gómez Escandell
0e750262a3
Enable autoformatting linters (#3179)
* Enable gofmt linter

From the docs:

> Gofmt checks whether code was gofmt-ed. By default this tool runs with
> -s option to check for code simplification.

https://golangci-lint.run/usage/linters/#gofmt

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Run gomft on the project

Ran this command after adding the gofmt linter:

	golangci-lint run ./... --fix

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Enable whitespace linter

From the docs:
> Whitespace is a linter that checks for unnecessary newlines at the
> start and end of functions, if, for, etc.

https://golangci-lint.run/usage/linters/#whitespace

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Run whitespace on the project

Ran this command after adding the whitespace linter:

	golangci-lint run ./... --fix

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Enable GCI linter

Per the docs:

> Gci controls Go package import order and makes it always deterministic.

https://golangci-lint.run/usage/linters/#gci

NOTE: I noticed that many files separate their imports in a particular
way, so I set the linter to enforce this standard.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Run GCI on the project

Ran this command after adding the GCI linter:

	golangci-lint run ./... --fix

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-24 13:52:22 +02:00
Alex Kalenyuk
98d66cfb77
Clean up leftover snapshot sources from DataImportCron tests (#3123)
* Clean up leftover snapshot sources from DataImportCron tests

The dataimportcron tests may affect existing crons in the environment (in addition to the ones deployed by testing)
so we are better off cleaning up after ourselves.

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Add watch for volumesnapshot delete

Although we don't support seamless transition from volumesnapshot->pvc sources
(we hope you stay on snapshots if they scale better for your storage)
this will still be needed in most cases when someone switches and manually deletes snapshots.

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

---------

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2024-04-21 04:14:20 +02:00
Edu Gómez Escandell
213040a441
Enable nakedret linter (#3178)
* Enable nakedret linter

> Checks that functions with naked returns are not longer than a maximum
> size (can be zero).

https://github.com/alexkohler/nakedret

Relevant from the GO style guide:
> Naked returns are okay if the function is a handful of lines. Once
> it’s a medium sized function, be explicit with your return values.

https://go.dev/wiki/CodeReviewComments#named-result-parameters

I think a naked return is never easier to read than a regular return,
so I set the linter to always complain about it. Disagreements are
welcome.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Refactor functions with naked returns

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-19 15:30:17 +02:00
Nahshon Unna Tsameret
475a70c8cd
Add new DV reason: ImagePullFailed (#3194)
When the image pull fails, the DataVolume Running condition, will have
the Reason field of `ImagePullFailed`, to allow better error handling in
code taht uses it.

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
2024-04-18 19:35:59 +02:00
Edu Gómez Escandell
c68afa6819
Enable durationcheck linter (#3176)
* Enable durationcheck linter

> Check for two durations multiplied together.

https://github.com/charithe/durationcheck

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Fix misuse of time.Duration

I also had to rename the variable because go-statickcheck complains
about a time.Duration having the suffix 'Sec'.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-18 03:07:59 +02:00
Ben Coxford
ebb9d31ac9
fix: propagating mesh annotations to transient pods (#3186)
* fix propagating mesh annotations to transient pods

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* set pvc annotations on prep claim and size detection transient pods

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* remove old annotations

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* switch to target dv on pvc clone controller

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* use metav1 object

Signed-off-by: bcoxford <ben.coxford@ncr.com>

---------

Signed-off-by: bcoxford <ben.coxford@ncr.com>
2024-04-16 22:32:47 +02:00
Edu Gómez Escandell
021d902127
Enable linters checking error handling (#3177)
* Enable errorlint linter

Per the readme:

> go-errorlint is a source code linter for Go software that can be used
to find code that will cause problems with the error wrapping scheme
introduced in Go 1.13.

https://github.com/polyfloyd/go-errorlint

Essentially, it checks that you properly use errors.Is and errors.As

It also forces you to use %w, but that is a bit too much. Using %v is
fine when you don't want to leak the internal error types of the
package. This is why I disabled this setting.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Replace error comparisons with errors.Is

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Replace error type assertions with errors.As

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Enable errname linter

Per the readme:
> Checks that sentinel errors are prefixed with the Err and error
> types are suffixed with the Error.

https://github.com/Antonboom/errname

Not a big deal, but helps keep the naming consistent.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Fix errname linter warnings

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-16 22:32:40 +02:00
Edu Gómez Escandell
3c06e26bac
Enable unconvert linter (#3171)
* Enable unconvert linter

This linter's doc describes it as:

   The unconvert program analyzes Go packages to identify unnecessary
   type conversions; i.e., expressions T(x) where x already has type T.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Unrestrict the number of linter warnings

It is best to show all warnings at once than to reveal them piece-meal,
particularly in CI where the feedback loop can be a bit slow.

By default, linters may only print the same message three times
(https://golangci-lint.run/usage/configuration/#issues-configuration)
The unconvert linter always prints the same message, so it specially
affected by this setting.

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Remove redundant type conversions

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-15 20:32:08 +02:00
alromeros
3d1b83fdfa
Reject volumes with storage size lower than 1MiB (#3161)
Kubevirt disks need to be at least 1MiB to work.

In order to improve consistency, this commit modifies our DV and PVC admission webhook to reject storage sizes lower than 1MiB.

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2024-04-15 20:32:01 +02:00
Michael Henriksen
3a66fd5343
Remove openshift dep on api (#3172)
Less deps the better!  Could be problematic for projects importing CDI and openshift.

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2024-04-15 16:10:01 +02:00
alromeros
9241633062
Stop handling error phase as "unknown" in clone-populator (#3163)
Clone populator handles its error phase as unknown when it comes to updating the DataVolume. This has some benefits since we avoid setting the DV to failed, which break consistency with the non-populator flow. However, handling it as unknown is a bit vague and we would benefit from actually triggering a DataVolume event if necessary.

This commit updates this behavior so we handle the error accordingly.

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2024-04-10 00:57:55 +02:00
Edu Gómez Escandell
42001ae051
Enable misspell linter and fix spelling errors (#3164)
* Add misspell to list of linters

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

* Fix spelling errors

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>

---------

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-09 20:17:55 +02:00
Edu Gómez Escandell
eb6b76a399
Support IPv6 in controller GetMetricsURL (#3165)
Joining hostname+port with string concatenation leads to wrong URLs
when the hostame is an IPv6:

HOST    PORT   Sprintf    CORRECT
::1     8000   ::1:8000   [::1]:8000

To avoid this issue, it's best to use net.JoinHostPort. I added a test
that verifies this fix.

This type of issue can be discovered running the following command:

    golangci-lint run ./... --enable nosprintfhostport

Signed-off-by: Edu Gómez Escandell <egomez@redhat.com>
2024-04-08 14:43:52 +02:00
Arnon Gilboa
afb269737b
Suppress alerts to reduce noise of dependent ones (#3129)
* Suppress alerts to reduce noise of dependent ones

This is a follow-up to #2998 introducing the following changes to alert
rules:

- CDIDefaultStorageClassDegraded - do not fire when no default SC
  (either k8s or virt).
- CDIDataImportCronOutdated - do not fire when no default SC (either
  k8s or virt), as DIC import DVs use default SC.
- CDINoDefaultStorageClass - fire not only when there is a pending DV
  (and PVC) but also when DV has an empty status (waiting for default
  SC etc.)

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Remove expensive func test

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Refactor updateDataImportCronCondition

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add some verify comments

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add IsDataVolumeUsingDefaultStorageClass helper for readability

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-04-07 20:47:51 +02:00
Alex Kalenyuk
39cc6ea853
Fix volume/access mode inferring for temp host assisted source PVC in snapshot clones (#3155)
* Clone from snapshot: fix volume/access mode inferring for temp host assisted source PVC

Sometimes with snapshot cloning we have to fall back to host assisted.
To do this, we create a temporary restore from the snapshot and initiate a host assisted clone
from that -> target PVC.
The issue this commit fixes is that we set the wrong access/volume modes on this temporary restore PVC
(we set it to the target's).

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Annotate dataimportcron-created snapshot with their source volume mode

This would then get capitalized on by host assisted fallbacks from snapshot cloning

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

---------

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2024-04-05 15:29:50 +02:00
Lion
ae208e8710
Fix that the Resources configuration (#3157)
Fix that the Resources configuration
is missing the configuration of the initcontainer container

Signed-off-by: lion <mzzgaopeng@gmail.com>
Co-authored-by: gaopeng <mzzgaopeng@gmailc.om>
2024-04-01 22:34:34 +02:00
Felix Matouschek
3239d50b4e
cleanup: Return modified labels added from termination message (#3147)
Instead of changing the labels map that is passed in as parameter to the
setLabelsFromTerminationMessage function, it now returns a modified copy
of the passed in labels map and is renamed to
addLabelsFromTerminationMessage. If the passed in map is nil a new map
will be allocated and initialized.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>
2024-03-27 07:33:28 +01:00
alromeros
6a9a39c014
Improve readability of upload pod's creation code (#3151)
Signed-off-by: Alvaro Romero <alromero@redhat.com>
2024-03-27 02:43:29 +01:00
alromeros
1a88a51887
Improve readability of importer pod's creation code (#3118)
Signed-off-by: Alvaro Romero <alromero@redhat.com>
2024-03-24 14:33:18 +01:00
Felix Matouschek
ca8085a25c
feat(import): Set PVC labels derived from env vars on containerdisks (#3103)
* feat(cdi-containerimage-server): Add info endpoint

The info endpoint returns a ServerInfo object containing all
environment variables of the server serialized to json. This allows the
extraction of env vars from a containerdisk when using pullMethod node.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>

* feat(importer): Add conversion of env vars to label

This adds the conversion of env vars containing KUBEVIRT_IO_ to a label
key/value pair.

Example: TEST_KUBEVIRT_IO_TEST=testvalue becomes test.kubevirt.io/test:
testvalue.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>

* feat(importer): Extract labels from registry datasource

This allows the registry-datasource to return a termination message with
labels extracted from the env vars of a source containerdisk when using
pullMethod pod.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>

* feat(importer): Extract labels from http datasource

This allows the http-datasource to return a termination message with
labels extracted from the env vars of a source containerdisk when using
pullMethod node.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>

* feat(controller): Set PVC labels from importer termination message

With this change the import-controller is able set labels on destination
PVCs returned from the importer in its termination message.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>

* tests: Add tests for conversion of containerdisk env vars to PVC labels

This adds tests for the conversion of containerdisk env vars to PVC
labels for both pullMethods pod and node.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>

* fix: Fix race in import-populator

By running reconcileTargetPVC of populatorController on every reconcile
cycle, the import-populator controller is able to retry seting labels and
annotations on the target PVC when import-controller modified the target
PVC at the same time.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>

---------

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>
2024-03-21 23:13:16 +01:00
Michael Henriksen
602ef71263
cdi.kubevirt.io/allowClaimAdoption annotation breaking "regular" operations (#3136)
Have to better track when a PVC was created for a particular DV (populate it) or createed for someone else (adopt it)

Fixes:  https://issues.redhat.com/browse/CNV-39618

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2024-03-21 02:57:15 +01:00
alromeros
544c57af1b
Improve handling of bound temp PVC in clone populator (#3126)
There is a short window of time where a bound PVC can have a unpopulated "capacity" field in its status.

The clone-populator handles this case by returning an error. This approach is overkill as the issue is usually fixed instantly.

This commit improves the handling of this case and adds a unit tests.

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2024-03-20 21:01:15 +01:00
Arnon Gilboa
dabbcbed16
Get no-provisioner storage capabilities from PVs (#3128)
It's time to get the storage capabilities for all no-provisioner storage
classes based on the existing PVs, and not only for those labeled with
`local.storage.openshift.io/owner-name`, as done so far.
It will also silence the unnecessary CDIStorageProfilesIncomplete alert
for local SC in the CI.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-03-18 22:21:14 +01:00
Felix Matouschek
cd7ee9e9b3
feat: Make importer datasource communication explicit (#3101)
Make the communication of datasources in the importer explicit by adding
a GetTerminationMessage method to the DataSourceInterface.
Then use this method to communicate additional information to the import
controller once the importer pod has terminated, instead of writing
additional data to the termination message in the Close method of
datasources.

Signed-off-by: Felix Matouschek <fmatouschek@redhat.com>
2024-03-13 22:09:23 +01:00
alromeros
7fbe1c3d13
Avoid race condition during importer termination (#3116)
* Avoid race condition during importer termination by returning 0 exitCode when scratch space is required

The restart policy on failure along with manual pod deletion caused some issues after the importer exited with scratch space needed.

This commit sets the exit code to 0 when exiting for scratch space required so we manually delete the pod and avoid the described race condition.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Adapt functional test to work with faster importer pod recovery

Test [test_id:1990] relied on the assumption that deleting the file from an http server would always cause the DV to restart.

The old scratch space required mechanism always caused restarts on the DV, masking some false positives: This doesn't happen in all cases since the polling from the server can keep retrying without failing if the file is restored fast enough.

This commit adapts the test to work with faster importer recoveries and adds a md5sum check to make sure the imports ends up being succesfull despite removing the file.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

---------

Signed-off-by: Alvaro Romero <alromero@redhat.com>
2024-03-12 11:29:16 +01:00
Alex Kalenyuk
714d6a5517
Replace cron expression library with one used in kubernetes (#3127)
The library we use https://github.com/gorhill/cronexpr is archived for a while now
and has also started to cause errors in FOSSA:
https://storage.googleapis.com/kubevirt-prow/pr-logs/pull/kubevirt_containerized-data-importer/3116/pull-containerized-data-importer-fossa/1766767859234508800/build-log.txt

Let's use an active one which is being used in kubernetes/kubernetes as well.

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2024-03-11 16:25:16 +01:00
Arnon Gilboa
56a7eaeebc
Rename PVC webhook rendering label to applyStorageProfile (#3124)
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-03-07 08:42:43 +01:00
Arnon Gilboa
adc4abacaf
Watch DIC-orphan cronjobs and cleanup them (#3106)
* Watch DIC-orphan cronjobs and cleanup them

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-03-01 03:17:01 +01:00
Michael Henriksen
0131119b01
Check cdi.kubevirt.io/allowClaimAdoption on DataVolume rather than PVC (#3105)
This allows for better backwards compatibality with DataVolumeTemplates

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2024-02-29 21:37:02 +01:00
Aviv Litman
42ec627e35
Refactor recording-rules and alerts code (#3068)
* Refactor recording-rules and alerts code

Signed-off-by: avlitman <alitman@redhat.com>

* Remove promv1 from schema

Signed-off-by: avlitman <alitman@redhat.com>

---------

Signed-off-by: avlitman <alitman@redhat.com>
2024-02-18 16:05:42 +01:00
Arnon Gilboa
221469d062
Add PVC spec mutating webhook rendering based on StorageProfiles (#2813)
* Add PVC mutating webhook using StorageProfiles

The webhook mutates the PVC Spec based on the available StorageProfiles,
so for example you can create PVC without accessModes and it will be
auto-completed.

To use this feature, enable the `WebhookPvcRendering` feature gate.

For any PVC you want to use StorageProfile, label it with:
    cdi.kubevirt.io/useStorageProfile: "true"

If you want to use volumeMode preferred by CDI according to
StorageProfiles, set it to FromStorageProfile. Otherwise if not
explicitly set to Block, it will be Filesystem by k8s default.

E.g.:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test
  labels:
    cdi.kubevirt.io/useStorageProfile: "true"
spec:
  storageClassName: rook-ceph-block
  volumeMode: FromStorageProfile
  resources:
    requests:
      storage: 1Mi

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Move webhook create/delete to callback

plus some CR fixes and cleanups

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Move webhook CR creation to sit with callbacks

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Update existing webhook if modified

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Eliminate unnecessary CR update

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-02-12 15:26:28 +01:00
Arnon Gilboa
cdc266d3ac
Fix StorageProfile ClaimPropertySets validation (#3096)
* Fix StorageProfile ClaimPropertySets validation

We have no reason to allow setting StorageProfile ClaimPropertySet
missing either volume mode or access modes.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Use CRD validation for ClaimPropertySet

Require both accessModes and volumeMode

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add validation rule for ClaimPropertySet AccessMode

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-02-08 23:13:02 +01:00
Yaroslav Borbat
d04225a00b
Introduce customizeComponents option (#3070)
* init

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>

* add e2e

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>

* fix unit tests

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>

* fix matchselector for cdi-deployment

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>

* rebase

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>

* refactor test of customizeComponents

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>

---------

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>
2024-01-28 20:57:39 +01:00
Michael Henriksen
bddba645a8
GetActiveCDI() should return success if single resource in error state (#3080)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2024-01-25 00:16:06 +01:00
Alex Kalenyuk
31d12e426e
update k8s & related libraries to 1.28 (#3078)
* Bump k8s/OpenShift/ctrl-runtime/lifecycle-sdk & make deps-update

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Operator: adapt for dependency bump

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: adapt watch calls for dependency bump

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: adapt to ctrl-runtime's cache API changes

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Operator: fix unit tests by deleting resources properly in fake client

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: fix unit tests by deleting resources properly in fake client

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Controller: adapt to fake client honoring status subresource

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Fix codegen script & make generate

There are some issues in the new script, so we
will still use the deprecated one.
More context in f4d1a5431b

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Functests: Adapt to NamespacedName now implementing MarshalLog

ns/name -> {"name":"name","namespace":"ns"}

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Functests & API server: address deprecation of wait.PollImmediate

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

---------

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
2024-01-23 17:52:05 +01:00
Arnon Gilboa
25dcbab235
Fix DataImportCron import DataVolume creation when last import is not found (#3072)
* Fix DIC DV creation when last import is not found

The `imports` local variable was not updated correctly, and later
referenced, so another reconcile was required for the DV to be created.

Fixes CNV-33973

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Fix DV controller Watch comment

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-01-19 10:44:02 +01:00
Michael Henriksen
b86d3dd3a3
Feature Gate and annotation for PVCs to be "adopted" by DataVolumes (#3029)
* apiserver support for cdi.kubevirt.io/allowClaimAdoption

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* controller support for cdi.kubevirt.io/allowClaimAdoption on existing PVC

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* if cdi.kubevirt.io/allowClaimAdoption specified on DataVolume do not apply on PVC until DV is succeeded

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* add DataVolumeClaimAdoption featuregate and integrate with apiserver and controller

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* functional tests for claim adoption

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* Fix func test failure and address some review comments

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* rename pvcRequiresNoWork to pvcRequiresWork

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* allow unbound PVC to be adopted

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

---------

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2024-01-17 19:53:59 +01:00
Shelly Kagan
40a382dd0e
Fix populators not repopulating pvc after it was deleted (#3056)
* Add tests to verify pvc is recreated and repopulated after delete

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Create import/upload source CR in case population pod is not succeeded

In case someone deleted the pvc a new pvc is created and the
status of the dv is still succeded. when using populators
in order for the population to restart and the dv state to change accordingly
we need to retrigger the population, this depends on the existing of the source CR.
That CR should exist as long as the pvc pod state is not succeeded.
When the pvc is recreated the succeded pod state annotation was removed.
Adjusted UT accordingly

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Adjust tests after review

Move cloner test to a parallel test suite.
Simplify import test.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

---------

Signed-off-by: Shelly Kagan <skagan@redhat.com>
2024-01-17 03:51:58 +01:00
Michael Henriksen
61368754b1
add annotation cdi.kubevirt.io/garbageCollected to PVCs when DVs are garbage collected (#3059)
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
2024-01-14 12:11:57 +01:00
Arnon Gilboa
34860865bf
Fix DataImportCron PVC GC race and test flakiness (#3057)
PVC should be timestamped in creation and not only upon import
completion, as it might be mistakenly GCed. LRU sort will choose
PVC with empty timestamp as the first candidate for deletion.
The PVC will be recreated by the controller and eventually
timestamped, so this bug was hidden for a while.

Fixes CNV-36896

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2024-01-12 02:54:40 +01:00
Aviv Litman
3bb70209d0
Refactor monitoring code (#3009)
* refactor monitoring

Signed-off-by: avlitman <alitman@redhat.com>

* Upgrade pointer to pnt

Signed-off-by: avlitman <alitman@redhat.com>

* fix controller base and ready gague

Signed-off-by: avlitman <alitman@redhat.com>

---------

Signed-off-by: avlitman <alitman@redhat.com>
2024-01-02 09:17:18 +01:00
Arnon Gilboa
edda5abe0f
Add new Prometheus alerts and label existing alerts (#2998)
* Add Prometheus alerts and label existing alerts

- CDINoDefaultStorageClass - not having a default (or virt default)
SC is surely not an OpenShift error, as admins may prefer their cluster
users to only use explicit SC names. However, in the CDI context when
DV is created with default SC but default does not exist, we will fire
an error event and the PVC will be Pending for the default SC, so when
there are such Pending PVCs we will fire an alert.

- CDIDefaultStorageClassDegraded - when the default (or virt default)
SC does not support CSI/Snapshot clone (smart clone) or does not have
ReadWriteMany access mode (for live migration).

- CDIStorageProfilesIncomplete - add storageClass and provisioner
labels.

- CDIDataImportCronOutdated - add dataImportCron namespace and name
labels.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Create stub VolumeSnapshotClass for testing

Including the VolumeSnapshot/Class/Content crds for the
CDIDefaultStorageClassDegraded alert func test.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add snapshot manifests for tests

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Deploy snapshot CRDs in the hpp destructive lane

Remove stub snapshot CRDs

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Add label explanation to new metric help

Also rename the metric kubevirt_cdi_storageprofile_status to
kubevirt_cdi_storageprofile_info since it always reports value 1,
where the label values provide the details about the storage
class and storage profile.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Revert NoProvisioner check removal

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* CR fixes

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Nicify StorageProfile metric update

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
2023-12-19 12:29:08 +01:00
Alexander Wels
2787ee06d4
Prevent upload populator from hot looping prime annotation (#2985)
This hot loop was causing the upload populator to constantly
add and remove the annotation causing the node controller to
always be outdated and never update the pvc to be bound causing
the upload test to fail randomly.

Signed-off-by: Alexander Wels <awels@redhat.com>
2023-11-17 01:27:21 +01:00
Ben Coxford
3a57e0ebf2
Add support for linkerd injection annotations on datavolumes (#2954)
* add support for linkerd annotations on datavolumes

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* fix default value linkerd injection annotation

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* set default value for linkerd injection to disabled

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* resolve linting on mesh annotation hinting

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* fix linting on AnnPodSidecarInjectionLinkerdDefault hints

Signed-off-by: bcoxford <ben.coxford@ncr.com>

* add note to sidecar unit tests

Signed-off-by: bcoxford <ben.coxford@ncr.com>

---------

Signed-off-by: bcoxford <ben.coxford@ncr.com>
2023-11-16 19:23:21 +01:00