containerized-data-importer/pkg/controller/storageprofile-controller.go
Alexander Wels 4b2c171ecc
Backport main commits to 1.57 release branch (#2764)
* Enable empty schedule in DataImportCron (#2711)

Allow disabling DataImportCron schedule and support external trigger

Signed-off-by: Ido Aharon <iaharon@redhat.com>

* expand upon #2721 (#2731)

Need to replace requeue bool with requeue duration

Signed-off-by: Michael Henriksen <mhenriks@redhat.com>

* Add clone from snapshot functionalities to clone-populator (#2724)

* Add clone from snapshot functionalities to the clone populator

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Update clone populator unit tests to cover clone from snapshot capabilities

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Fix storage class assignment in temp-source claim for host-assisted clone from snapshot

This commit also includes other minor and styling-related fixes

Signed-off-by: Alvaro Romero <alromero@redhat.com>

---------

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Prepare CDI testing for the upcoming non-CSI lane (#2730)

* Update functional tests to skip incompatible default storage classes

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Enable the use of non-CSI HPP in testing lanes

This commit modifies several scripts to allow using classic HPP as the default SC in tests.

This allows us to test our non-populator flow with a non-CSI provisioner.

Signed-off-by: Alvaro Romero <alromero@redhat.com>

---------

Signed-off-by: Alvaro Romero <alromero@redhat.com>

* Allow snapshots as format for DataImportCron created sources (#2700)

* StorageProfile API for declaring format of resulting cron disk images

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Integrate recommended format in dataimportcron controller

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Take snapclass existence into consideration when populating cloneStrategy and sourceFormat

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

---------

Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>

* Remove leader election test (#2745)

Now that we are using the standard k8s leases from
the controller-runtime library, there is no need to
test our implementation, as it is no longer in use.
This will save some testing time and avoid random failures.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Integration of Data volume using CDI populators (#2722)

* Move cleanup out of dv deletion

It seemed off to call cleanup in the prepare function
just because we don't call cleanup unless the dv is being deleted.
Instead, the cleanup function itself checks whether it should run:
in two specific cases, when the dv is being deleted and when the
dv has succeeded.
The cleanup will also be used in a future commit for population
cleanup, which we also want to happen not only on deletion.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Use populator if CSI storage class exists

Add a new datavolume phase, PendingPopulation, to
indicate WFFC when using populators. This new
phase will be used in kubevirt in order to know
that there is no need for a dummy pod to pass the WFFC
phase and that the population will occur once the vm is created.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Update population targetPVC with pvc prime annotations

The annotations will be used to update a dv that uses the
populators.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Adjust UT with new behavior

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* updates after review

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Fix import populator progress reporting

The import pod should be taken from pvcPrime

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Prevent requeueing an upload dv when failing to find the progress report pod

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Remove size inflation in populators

The populators handle existing PVCs.
The PVC already has a defined requested size;
inflating the PVC' with fs overhead would only apply
to the PVC' spec and would not be reflected on the target
PVC, which seems undesired.
Instead, if the populator is used by a PVC that the
datavolume controller created, the inflation will happen
there if needed.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Adjust functional tests to handle dvs using populators

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Fix clone test

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Add shouldUpdateProgress variable to know whether progress needs to be updated

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Change update of annotation from denied list to allowed list

Instead of checking whether an annotation on pvcPrime is undesired,
go over the desired list and add each annotation that exists.

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Fix removing annotations from the pv when rebinding

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* More fixes and UT

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* a bit more updates and UTs

Signed-off-by: Shelly Kagan <skagan@redhat.com>

---------

Signed-off-by: Shelly Kagan <skagan@redhat.com>

* Run bazelisk run //robots/cmd/uploader:uploader -- -workspace /home/prow/go/src/github.com/kubevirt/project-infra/../containerized-data-importer/WORKSPACE -dry-run=false (#2751)

Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>

* Allow dynamic linked build for non bazel build (#2753)

The current script always passes the static ldflag to the
compiler, which results in a static binary. We would like
to be able to build dynamically linked binaries instead.

cdi-containerimage-server has to be static because we
are copying it into the context of a container disk container,
which is most likely based on a scratch container and has no
libraries for us to use.

Signed-off-by: Alexander Wels <awels@redhat.com>

* Disable DV GC by default (#2754)

* Disable DV GC by default

DataVolume garbage collection is a nice feature, but unfortunately it
violates a fundamental principle of Kubernetes. A CR should not be
auto-deleted when it completes its role (a Job with TTLSecondsAfterFinished
is an exception), and once a CR is created we can assume it is
there until explicitly deleted. In addition, a CR should keep idempotency,
so the same CR manifest can be applied multiple times, as long as it is
a valid update (e.g. the DataVolume validation webhook does not allow
updating the spec).

When GC is enabled, some systems (e.g. GitOps / ArgoCD) may require a
workaround (the DV annotation deleteAfterCompletion = "false") to prevent
GC and function correctly.

On the next kubevirt-bot "Bump kubevirtci" PR (with bump-cdi), it will
fail on all kubevirtci lanes with tests referring to DVs, as the tests'
IsDataVolumeGC() looks at the CDIConfig Spec.DataVolumeTTLSeconds and
assumes the default is enabled. This should be fixed there.

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Fix test waiting for PVC deletion with UID

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Fix clone test assuming DV was GCed

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

* Fix DIC controller DV/PVC deletion when snapshot is ready

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Arnon Gilboa <agilboa@redhat.com>

---------

Signed-off-by: Ido Aharon <iaharon@redhat.com>
Signed-off-by: Michael Henriksen <mhenriks@redhat.com>
Signed-off-by: Alvaro Romero <alromero@redhat.com>
Signed-off-by: Alex Kalenyuk <akalenyu@redhat.com>
Signed-off-by: Alexander Wels <awels@redhat.com>
Signed-off-by: Shelly Kagan <skagan@redhat.com>
Signed-off-by: kubevirt-bot <kubevirtbot@redhat.com>
Signed-off-by: Arnon Gilboa <agilboa@redhat.com>
Co-authored-by: Ido Aharon <iaharon@redhat.com>
Co-authored-by: Michael Henriksen <mhenriks@redhat.com>
Co-authored-by: alromeros <alromero@redhat.com>
Co-authored-by: akalenyu <akalenyu@redhat.com>
Co-authored-by: Shelly Kagan <skagan@redhat.com>
Co-authored-by: kubevirt-bot <kubevirtbot@redhat.com>
Co-authored-by: Arnon Gilboa <agilboa@redhat.com>
2023-06-22 00:59:17 +02:00


package controller
import (
"context"
"fmt"
"reflect"
"github.com/go-logr/logr"
snapshotv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
"github.com/prometheus/client_golang/prometheus"
v1 "k8s.io/api/core/v1"
storagev1 "k8s.io/api/storage/v1"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/meta"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller"
"sigs.k8s.io/controller-runtime/pkg/event"
"sigs.k8s.io/controller-runtime/pkg/handler"
"sigs.k8s.io/controller-runtime/pkg/manager"
"sigs.k8s.io/controller-runtime/pkg/predicate"
"sigs.k8s.io/controller-runtime/pkg/reconcile"
"sigs.k8s.io/controller-runtime/pkg/source"
cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
"kubevirt.io/containerized-data-importer/pkg/common"
cc "kubevirt.io/containerized-data-importer/pkg/controller/common"
"kubevirt.io/containerized-data-importer/pkg/monitoring"
"kubevirt.io/containerized-data-importer/pkg/operator"
"kubevirt.io/containerized-data-importer/pkg/storagecapabilities"
"kubevirt.io/containerized-data-importer/pkg/util"
)
var (
// IncompleteProfileGauge is the metric we use to alert about incomplete storage profiles
IncompleteProfileGauge = prometheus.NewGauge(
prometheus.GaugeOpts{
Name: monitoring.MetricOptsList[monitoring.IncompleteProfile].Name,
Help: monitoring.MetricOptsList[monitoring.IncompleteProfile].Help,
})
)
// StorageProfileReconciler members
type StorageProfileReconciler struct {
client client.Client
// use this for getting any resources not in the install namespace or cluster scope
uncachedClient client.Client
scheme *runtime.Scheme
log logr.Logger
installerLabels map[string]string
}
// Reconcile is the reconcile.Reconciler implementation for the StorageProfileReconciler object.
func (r *StorageProfileReconciler) Reconcile(_ context.Context, req reconcile.Request) (reconcile.Result, error) {
log := r.log.WithValues("StorageProfile", req.NamespacedName)
log.Info("reconciling StorageProfile")
storageClass := &storagev1.StorageClass{}
if err := r.client.Get(context.TODO(), req.NamespacedName, storageClass); err != nil {
if k8serrors.IsNotFound(err) {
return reconcile.Result{}, r.deleteStorageProfile(req.NamespacedName.Name, log)
}
return reconcile.Result{}, err
} else if storageClass.GetDeletionTimestamp() != nil {
return reconcile.Result{}, r.deleteStorageProfile(req.NamespacedName.Name, log)
}
if _, err := r.reconcileStorageProfile(storageClass); err != nil {
return reconcile.Result{}, err
}
return reconcile.Result{}, r.checkIncompleteProfiles()
}
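// reconcileStorageProfile creates or updates the StorageProfile that corresponds to the given StorageClass, populating its status from spec overrides or detected storage capabilities.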
func (r *StorageProfileReconciler) reconcileStorageProfile(sc *storagev1.StorageClass) (reconcile.Result, error) {
log := r.log.WithValues("StorageProfile", sc.Name)
storageProfile, prevStorageProfile, err := r.getStorageProfile(sc)
if err != nil {
log.Error(err, "Unable to create StorageProfile")
return reconcile.Result{}, err
}
storageProfile.Status.StorageClass = &sc.Name
storageProfile.Status.Provisioner = &sc.Provisioner
snapClass, err := cc.GetSnapshotClassForSmartClone("", &sc.Name, r.log, r.client)
if err != nil {
return reconcile.Result{}, err
}
storageProfile.Status.CloneStrategy = r.reconcileCloneStrategy(sc, storageProfile.Spec.CloneStrategy, snapClass)
storageProfile.Status.DataImportCronSourceFormat = r.reconcileDataImportCronSourceFormat(sc, storageProfile.Spec.DataImportCronSourceFormat, snapClass)
var claimPropertySets []cdiv1.ClaimPropertySet
if len(storageProfile.Spec.ClaimPropertySets) > 0 {
for _, cps := range storageProfile.Spec.ClaimPropertySets {
if len(cps.AccessModes) == 0 && cps.VolumeMode != nil {
err = fmt.Errorf("must provide access mode for volume mode: %s", *cps.VolumeMode)
log.Error(err, "Unable to update StorageProfile")
return reconcile.Result{}, err
}
}
claimPropertySets = storageProfile.Spec.ClaimPropertySets
} else {
claimPropertySets = r.reconcilePropertySets(sc)
}
storageProfile.Status.ClaimPropertySets = claimPropertySets
util.SetRecommendedLabels(storageProfile, r.installerLabels, "cdi-controller")
if err := r.updateStorageProfile(prevStorageProfile, storageProfile, log); err != nil {
return reconcile.Result{}, err
}
return reconcile.Result{}, nil
}
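// updateStorageProfile creates the StorageProfile if it did not exist before, or updates it when it differs from the previous object.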
func (r *StorageProfileReconciler) updateStorageProfile(prevStorageProfile runtime.Object, storageProfile *cdiv1.StorageProfile, log logr.Logger) error {
if prevStorageProfile == nil {
return r.client.Create(context.TODO(), storageProfile)
} else if !reflect.DeepEqual(prevStorageProfile, storageProfile) {
// Updates have happened, update StorageProfile.
log.Info("Updating StorageProfile", "StorageProfile.Name", storageProfile.Name, "storageProfile", storageProfile)
return r.client.Update(context.TODO(), storageProfile)
}
return nil
}
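// getStorageProfile fetches the StorageProfile named after the StorageClass, along with a deep copy of the previous object; if none exists yet, a freshly built empty profile and a nil previous object are returned.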
func (r *StorageProfileReconciler) getStorageProfile(sc *storagev1.StorageClass) (*cdiv1.StorageProfile, runtime.Object, error) {
var prevStorageProfile runtime.Object
storageProfile := &cdiv1.StorageProfile{}
if err := r.client.Get(context.TODO(), types.NamespacedName{Name: sc.Name}, storageProfile); err != nil {
if k8serrors.IsNotFound(err) {
storageProfile, err = r.createEmptyStorageProfile(sc)
if err != nil {
return nil, nil, err
}
} else {
return nil, nil, err
}
} else {
prevStorageProfile = storageProfile.DeepCopyObject()
}
return storageProfile, prevStorageProfile, nil
}
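// reconcilePropertySets derives claim property sets (access and volume mode pairs) from the known capabilities of the storage class provisioner.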
func (r *StorageProfileReconciler) reconcilePropertySets(sc *storagev1.StorageClass) []cdiv1.ClaimPropertySet {
claimPropertySets := []cdiv1.ClaimPropertySet{}
capabilities, found := storagecapabilities.GetCapabilities(r.client, sc)
if found {
for i := range capabilities {
claimPropertySet := cdiv1.ClaimPropertySet{
AccessModes: []v1.PersistentVolumeAccessMode{capabilities[i].AccessMode},
VolumeMode: &capabilities[i].VolumeMode,
}
claimPropertySets = append(claimPropertySets, claimPropertySet)
}
}
return claimPropertySets
}
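// reconcileCloneStrategy picks the effective clone strategy: an explicit spec value or storage class annotation wins; otherwise snapshot cloning (or a provisioner-advised strategy) is preferred, falling back to host-assisted cloning when no VolumeSnapshotClass exists.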
func (r *StorageProfileReconciler) reconcileCloneStrategy(sc *storagev1.StorageClass, desiredCloneStrategy *cdiv1.CDICloneStrategy, snapClass string) *cdiv1.CDICloneStrategy {
if desiredCloneStrategy != nil {
return desiredCloneStrategy
}
if annStrategyVal, ok := sc.Annotations["cdi.kubevirt.io/clone-strategy"]; ok {
return r.getCloneStrategyFromStorageClass(annStrategyVal)
}
// Default to trying snapshot clone unless volume snapshot class missing
hostAssistedStrategy := cdiv1.CloneStrategyHostAssisted
strategy := hostAssistedStrategy
if snapClass != "" {
strategy = cdiv1.CloneStrategySnapshot
}
if knownStrategy, ok := storagecapabilities.GetAdvisedCloneStrategy(sc); ok {
strategy = knownStrategy
}
if strategy == cdiv1.CloneStrategySnapshot && snapClass == "" {
r.log.Info("No VolumeSnapshotClass found for storage class, falling back to host assisted cloning", "StorageClass.Name", sc.Name)
return &hostAssistedStrategy
}
return &strategy
}
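// getCloneStrategyFromStorageClass maps the clone-strategy annotation value to the corresponding CDICloneStrategy.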
func (r *StorageProfileReconciler) getCloneStrategyFromStorageClass(annStrategyVal string) *cdiv1.CDICloneStrategy {
var strategy cdiv1.CDICloneStrategy
switch annStrategyVal {
case "copy":
strategy = cdiv1.CloneStrategyHostAssisted
case "snapshot":
strategy = cdiv1.CloneStrategySnapshot
case "csi-clone":
strategy = cdiv1.CloneStrategyCsiClone
}
return &strategy
}
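// reconcileDataImportCronSourceFormat picks the format (pvc or snapshot) of disk images created by DataImportCrons, honoring an explicit spec value and falling back to pvc when no VolumeSnapshotClass exists.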
func (r *StorageProfileReconciler) reconcileDataImportCronSourceFormat(sc *storagev1.StorageClass, desiredFormat *cdiv1.DataImportCronSourceFormat, snapClass string) *cdiv1.DataImportCronSourceFormat {
if desiredFormat != nil {
return desiredFormat
}
// This can be changed later on
// for example, if at some point we're confident snapshot sources should be the default
pvcFormat := cdiv1.DataImportCronSourceFormatPvc
format := pvcFormat
if knownFormat, ok := storagecapabilities.GetAdvisedSourceFormat(sc); ok {
format = knownFormat
}
if format == cdiv1.DataImportCronSourceFormatSnapshot && snapClass == "" {
// No point using snapshots without a corresponding snapshot class
r.log.Info("No VolumeSnapshotClass found for storage class, falling back to pvc sources for DataImportCrons", "StorageClass.Name", sc.Name)
return &pvcFormat
}
return &format
}
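// createEmptyStorageProfile builds a new StorageProfile for the given storage class, with recommended labels and the CDI owner reference set.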
func (r *StorageProfileReconciler) createEmptyStorageProfile(sc *storagev1.StorageClass) (*cdiv1.StorageProfile, error) {
storageProfile := MakeEmptyStorageProfileSpec(sc.Name)
util.SetRecommendedLabels(storageProfile, r.installerLabels, "cdi-controller")
// uncachedClient is used to directly get the config map;
// the controller-runtime client caches objects that are read once, and thus requires a list/watch.
// This should be cheaper than watching.
if err := operator.SetOwnerRuntime(r.uncachedClient, storageProfile); err != nil {
return nil, err
}
return storageProfile, nil
}
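// deleteStorageProfile removes the StorageProfile that corresponds to a deleted StorageClass and refreshes the incomplete-profile metric.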
func (r *StorageProfileReconciler) deleteStorageProfile(name string, log logr.Logger) error {
log.Info("Cleaning up StorageProfile that corresponds to deleted StorageClass", "StorageClass.Name", name)
storageProfileObj := &cdiv1.StorageProfile{
ObjectMeta: metav1.ObjectMeta{
Name: name,
},
}
if err := r.client.Delete(context.TODO(), storageProfileObj); cc.IgnoreNotFound(err) != nil {
return err
}
return r.checkIncompleteProfiles()
}
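// isNoProvisioner reports whether the named storage class uses the kubernetes.io/no-provisioner provisioner.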
func isNoProvisioner(name string, cl client.Client) bool {
storageClass := &storagev1.StorageClass{}
if err := cl.Get(context.TODO(), types.NamespacedName{Name: name}, storageClass); err != nil {
return false
}
return storageClass.Provisioner == "kubernetes.io/no-provisioner"
}
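// checkIncompleteProfiles counts StorageProfiles that are missing access or volume modes (skipping explicitly unsupported provisioners) and updates IncompleteProfileGauge accordingly.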
func (r *StorageProfileReconciler) checkIncompleteProfiles() error {
numIncomplete := 0
storageProfileList := &cdiv1.StorageProfileList{}
if err := r.client.List(context.TODO(), storageProfileList); err != nil {
return err
}
for _, profile := range storageProfileList.Items {
if profile.Status.Provisioner == nil {
continue
}
// We don't count explicitly unsupported provisioners as incomplete
_, found := storagecapabilities.UnsupportedProvisioners[*profile.Status.Provisioner]
if !found && isIncomplete(profile.Status.ClaimPropertySets) {
numIncomplete++
}
}
IncompleteProfileGauge.Set(float64(numIncomplete))
return nil
}
// MakeEmptyStorageProfileSpec creates a StorageProfile manifest
func MakeEmptyStorageProfileSpec(name string) *cdiv1.StorageProfile {
return &cdiv1.StorageProfile{
TypeMeta: metav1.TypeMeta{
Kind: "StorageProfile",
APIVersion: "cdi.kubevirt.io/v1beta1",
},
ObjectMeta: metav1.ObjectMeta{
Name: name,
Labels: map[string]string{
common.CDILabelKey: common.CDILabelValue,
common.CDIComponentLabel: "",
},
},
}
}
// NewStorageProfileController creates a new instance of the StorageProfile controller.
func NewStorageProfileController(mgr manager.Manager, log logr.Logger, installerLabels map[string]string) (controller.Controller, error) {
uncachedClient, err := client.New(mgr.GetConfig(), client.Options{
Scheme: mgr.GetScheme(),
Mapper: mgr.GetRESTMapper(),
})
if err != nil {
return nil, err
}
reconciler := &StorageProfileReconciler{
client: mgr.GetClient(),
uncachedClient: uncachedClient,
scheme: mgr.GetScheme(),
log: log.WithName("storageprofile-controller"),
installerLabels: installerLabels,
}
storageProfileController, err := controller.New(
"storageprofile-controller",
mgr,
controller.Options{Reconciler: reconciler})
if err != nil {
return nil, err
}
if err := addStorageProfileControllerWatches(mgr, storageProfileController, log); err != nil {
return nil, err
}
log.Info("Initialized StorageProfile controller")
return storageProfileController, nil
}
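// addStorageProfileControllerWatches sets up watches on StorageClasses, StorageProfiles, PersistentVolumes of no-provisioner storage classes, and VolumeSnapshotClasses when the snapshot API is available.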
func addStorageProfileControllerWatches(mgr manager.Manager, c controller.Controller, log logr.Logger) error {
if err := c.Watch(&source.Kind{Type: &storagev1.StorageClass{}}, &handler.EnqueueRequestForObject{}); err != nil {
return err
}
if err := c.Watch(&source.Kind{Type: &cdiv1.StorageProfile{}}, &handler.EnqueueRequestForObject{}); err != nil {
return err
}
if err := c.Watch(&source.Kind{Type: &v1.PersistentVolume{}}, handler.EnqueueRequestsFromMapFunc(
func(obj client.Object) []reconcile.Request {
return []reconcile.Request{{
NamespacedName: types.NamespacedName{Name: scName(obj)},
}}
},
),
predicate.Funcs{
CreateFunc: func(e event.CreateEvent) bool { return isNoProvisioner(scName(e.Object), mgr.GetClient()) },
UpdateFunc: func(e event.UpdateEvent) bool { return isNoProvisioner(scName(e.ObjectNew), mgr.GetClient()) },
DeleteFunc: func(e event.DeleteEvent) bool { return isNoProvisioner(scName(e.Object), mgr.GetClient()) },
}); err != nil {
return err
}
mapSnapshotClassToProfile := func(obj client.Object) (reqs []reconcile.Request) {
var scList storagev1.StorageClassList
if err := mgr.GetClient().List(context.TODO(), &scList); err != nil {
c.GetLogger().Error(err, "Unable to list StorageClasses")
return
}
vsc := obj.(*snapshotv1.VolumeSnapshotClass)
for _, sc := range scList.Items {
if sc.Provisioner == vsc.Driver {
reqs = append(reqs, reconcile.Request{NamespacedName: types.NamespacedName{Name: sc.Name}})
}
}
return
}
if err := mgr.GetClient().List(context.TODO(), &snapshotv1.VolumeSnapshotClassList{}, &client.ListOptions{Limit: 1}); err != nil {
if meta.IsNoMatchError(err) {
// Back out if there's no point in attempting the watch
return nil
}
if !cc.IsErrCacheNotStarted(err) {
return err
}
}
if err := c.Watch(&source.Kind{Type: &snapshotv1.VolumeSnapshotClass{}},
handler.EnqueueRequestsFromMapFunc(mapSnapshotClassToProfile),
); err != nil {
return err
}
return nil
}
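// scName returns the storage class name referenced by the given PersistentVolume.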
func scName(obj client.Object) string {
return obj.(*v1.PersistentVolume).Spec.StorageClassName
}
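// isIncomplete reports whether the claim property sets are empty or any set is missing access modes or a volume mode.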
func isIncomplete(sets []cdiv1.ClaimPropertySet) bool {
if len(sets) > 0 {
for _, cps := range sets {
if len(cps.AccessModes) == 0 || cps.VolumeMode == nil {
return true
}
}
} else {
return true
}
return false
}