[14:06:57] btullis: brouberol: can one of you help me take a look? I'm trying to create a PVC on aux-k8s-eqiad and it's not getting provisioned
[14:07:34] E0213 14:07:12.130351 1 controller.go:957] error syncing claim "ca19af07-5732-47ca-b7b1-1920b3f31e0f": failed to provision volume with StorageClass "ceph-rbd-ssd": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ca19af07-5732-47ca-b7b1-1920b3f31e0f already exists
[14:07:45] that's from the csi-provisioner container
[14:08:08] I can try to have a look
[14:08:28] I think the initial volume creation RPC sent by csi-rbdplugin might still be pending?
[14:09:00] indeed
[14:09:00] root@deploy2002:~# kubectl get pvc -A
[14:09:00] NAMESPACE     NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
[14:09:00] kube-system   cdanis-test-pvc   Pending                                     ceph-rbd-ssd   3m55s
[14:09:21] mhm
[14:09:49] I0213 14:05:08.459539 1 utils.go:195] ID: 35 Req-ID: pvc-ca19af07-5732-47ca-b7b1-1920b3f31e0f GRPC call: /csi.v1.Controller/CreateVolume
[14:09:51] I0213 14:05:08.460597 1 utils.go:206] ID: 35 Req-ID: pvc-ca19af07-5732-47ca-b7b1-1920b3f31e0f GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-ca19af07-5732-47ca-b7b1-1920b3f31e0f","parameters":{"clusterID":"6d4278e1-ea45-4d29-86fe-85b44c150813","csi.storage.k8s.io/pv/name":"pvc-ca19af07-5732-47ca-b7b1-1920b3f31e0f","csi.storage.k8s.io/pvc/name":"cdanis-test-pvc","csi.storage.k8s.io/pvc/namespace":"kube-system","imageFeatures":"layering","pool":"aux-k8s-csi-rbd-ssd"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
[14:09:59] I0213 14:05:08.461202 1 rbd_util.go:1279] ID: 35 Req-ID: pvc-ca19af07-5732-47ca-b7b1-1920b3f31e0f setting disableInUseChecks: false image features: [layering] mounter: rbd
[14:10:07] that looks right .. but there's never any followup for `ID: 35` in the logs of csi-rbdplugin
[14:10:17] I had a look at the storage class, and it's using what looks like a good pool: `aux-k8s-csi-rbd-ssd`
[14:10:54] brouberol@cephosd1001:~$ sudo ceph osd pool ls | grep aux-k8s-csi-rbd-ssd
[14:10:54] aux-k8s-csi-rbd-ssd
[14:11:55] ```
[14:11:55] E0213 14:06:08.458227 1 controller.go:957] error syncing claim "ca19af07-5732-47ca-b7b1-1920b3f31e0f": failed to provision volume with StorageClass "ceph-rbd-ssd": rpc error: code = DeadlineExceeded desc = context deadline exceeded
[14:11:55] ```
[14:12:20] could it be that we're simply missing an egress rule between aux-k8s-eqiad and ceph?
[14:12:27] totally possible
[14:12:35] or missing ferm rules on the ceph hosts
[14:12:54] and it fits my rules of usual suspects: 1) DNS 2) iptables/firewalls 3) brouberol
[14:17:28] I think this is the issue: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/9d3586d468918ad317b3ba5af20f06c78b7c32d9/modules/profile/manifests/ceph/server/firewall.pp#35
[14:17:57] each src_sets should include AUX_KUBEPODS_NETWORKS
[14:18:16] brouberol: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1119504
[14:18:18] :D
[14:18:26] had just gotten there
[14:18:43] well dang
[14:19:11] could you add `Hosts: cephosd1001.eqiad.wmnet` so we could run pcc?
[14:21:50] +1ed
[14:22:00] thanks!
[14:25:21] will you run puppet on cephosd once the puppet patch is merged or should I?
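The exact manifest behind `cdanis-test-pvc` is not shown in the log; a minimal sketch consistent with the CreateVolume request above (1Gi / `required_bytes: 1073741824`, `ext4`, access mode 1 i.e. ReadWriteOnce, namespace `kube-system`, StorageClass `ceph-rbd-ssd`) would look roughly like this, run from the deploy host:

```
# Hypothetical reconstruction of the test claim; the real manifest is not in the log.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cdanis-test-pvc
  namespace: kube-system
spec:
  accessModes:
    - ReadWriteOnce            # access_mode "mode":1 in the CreateVolume request
  storageClassName: ceph-rbd-ssd
  resources:
    requests:
      storage: 1Gi             # required_bytes 1073741824
EOF

# While the CSI RPC to Ceph is blocked, the claim stays Pending and the
# provisioner keeps retrying with the Aborted / DeadlineExceeded errors above.
kubectl get pvc -n kube-system cdanis-test-pvc
kubectl describe pvc -n kube-system cdanis-test-pvc
```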
[14:31:29] eey
[14:31:29] I0213 14:27:40.453664 1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"cdanis-test-pvc", UID:"ca19af07-5732-47ca-b7b1-1920b3f31e0f", APIVersion:"v1", ResourceVersion:"234214696", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-ca19af07-5732-47ca-b7b1-1920b3f31e0f
[14:32:47] :D
[14:33:38] welcome to the realm of state
[14:33:39] thanks again for the help <3
[14:33:45] my/our pleasure!
[14:33:59] ("our" because I include btullis, not because of a split brain)
[16:59:59] Ah, great! Sorry for being late to the party. Glad you got it sorted.
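With the ferm change applied on the ceph hosts, the pending CreateVolume completed and the claim bound. A hedged sketch of follow-up checks and cleanup (only the PVC name and namespace are taken from the log; the commands themselves were not run in the transcript):

```
# Confirm the claim is now Bound and see the ProvisioningSucceeded event.
kubectl get pvc -n kube-system cdanis-test-pvc
kubectl get events -n kube-system --field-selector involvedObject.name=cdanis-test-pvc

# Remove the test claim once the verification is done; with the default
# reclaim policy this also deletes the backing RBD image in aux-k8s-csi-rbd-ssd.
kubectl delete pvc -n kube-system cdanis-test-pvc
```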