[07:31:20] <_joe_> chaomodus: when you ported tcpircbot to python3, you forgot to change the monitoring accordingly
[07:32:24] <_joe_> btw, now that everything is a systemd unit, we should really stop monitoring via grepping the output of ps
[07:32:50] <_joe_> the alert is "PROCS CRITICAL: 0 processes with command name 'python', args 'tcpircbot.py'" because now it uses python3
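To make _joe_'s point above concrete: once a service runs as a systemd unit, a check can ask systemd for the unit's state instead of grepping ps for an interpreter name that changes when the service moves from python to python3. The sketch below is purely illustrative — the unit name and the Nagios-style exit codes are assumptions, not the actual production check.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: report a service's health from systemd's view.

The unit name is a guess; a real deployment would take it as an argument and
wire this up through whatever NRPE/Icinga plumbing is already in place.
"""
import subprocess
import sys

UNIT = "tcpircbot.service"  # hypothetical unit name


def main() -> int:
    # `systemctl is-active --quiet` exits 0 when the unit is active and
    # non-zero otherwise, regardless of which interpreter runs the process.
    result = subprocess.run(["systemctl", "is-active", "--quiet", UNIT], check=False)
    if result.returncode == 0:
        print(f"OK: {UNIT} is active")
        return 0  # Nagios "OK"
    print(f"CRITICAL: {UNIT} is not active")
    return 2  # Nagios "CRITICAL"


if __name__ == "__main__":
    sys.exit(main())
```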
[10:38:04] trying to run the tests for operations-cookbooks seems to take a long long time for pip to resolve and download dependencies for the first time, is anyone experiencing that issue? (getting 'this is taking longer than usual. you might need to provide the dependency resolver with stricter constraints...')
[10:38:29] dcaro: pip has been having Issues recently, might be related
[10:39:22] dcaro: I saw it yesterday but didn't yet have time to look into it, I'm on clinic duty this week. I'll try to look later today
[10:39:47] it happens on wmflib too, and that one has pretty simple dependencies
[10:39:47] ack, I'll let you know if I work around it :)
[10:40:20] ack, thx
[12:18:49] the SRE session slot is open next week, do we have any takers?
[12:19:10] kormat, marostegui I see we have you two on the list for orchestrator
[12:19:19] is that something you're interested it?
[12:19:28] in*
[12:21:32] in about 1-2 quarters, yes
[12:22:30] lol
[12:22:39] whoever added it to the list was being.. overambitious
[12:25:42] I mean, up to you really, but I also don't want to set expectations for these sessions to be perfect or presentation of complete products or anything like that
[12:26:12] something like "this is what we're trying to do, this is why it's exciting, we're at the early phases and this is our current status" is also totally ok :)
[12:28:04] or you can move the s7 master failover scheduled for the 23rd to the session time and do it live with orchestrator :-P
[12:43:27] judging from the silence I wasn't that convincing :P
[12:43:40] if anyone else is interested in the slot next week, please let me know
[13:21:30] paravoid: ah, i was lunching :) we're not yet at the stage when we know how it's going to be exciting yet. lots of integration work and testing to be done. currently it's "maybe it'll do everything?", but without knowing the details, that doesn't seem worth doing a presentation on yet
[13:44:08] volans: o/ i had to make an extra patch for samwalton, but i wanted to check with you to make sure it is right
[13:44:08] https://gerrit.wikimedia.org/r/c/operations/puppet/+/673270/1/modules/admin/data/data.yaml#b4796
[13:44:22] e.g. that it's ok to move a user from ldap_only to real like that
[13:45:56] ottomata: I'll leave it for volans to review but yep that's the correct maneuver
[13:46:15] also thanks for catching that he was in ldap_only twice! everyone has a least favorite yaml feature and that one is mine
[13:46:33] ottomata: looking
[13:46:52] "we already have a validation script, can't we just add that to it?" good question! we cannot, because the second entry overwrites the first one
[13:47:22] or, we could, but only by overriding part of the yaml parser
[13:47:40] which I looked into for httpbb but I think there wasn't a clean hook for it
[13:48:38] hahahah rzl is a master of the rhetorical style
[13:48:41] yamllint picks up this type of error so we could use that
[13:49:11] rzl: https://yamllint.readthedocs.io/en/stable/rules.html?highlight=duplicated#module-yamllint.rules.key_duplicates
[13:49:34] I think also ruyaml allows to detect it IIRC
[13:49:52] ruyaml or something else
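Some context for the exchange above: PyYAML's safe_load really does keep only the last value for a repeated key, which is why a duplicated entry can slip past the existing validation script. One way to catch it in code — shown here only as a minimal sketch with made-up data, not as the actual validation script or httpbb's implementation — is to subclass SafeLoader and check keys while each mapping is constructed; yamllint's key-duplicates rule linked above gets the same result without any custom parsing.

```python
# Minimal sketch (not the real validation code): PyYAML silently lets the last
# duplicate key win, so a strict loader has to check for repeats itself.
import yaml


class DuplicateKeyError(yaml.YAMLError):
    pass


class StrictLoader(yaml.SafeLoader):
    """SafeLoader variant that refuses duplicate mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise DuplicateKeyError(
                    f"duplicate key {key!r} near line {key_node.start_mark.line + 1}"
                )
            seen.add(key)
        return super().construct_mapping(node, deep=deep)


# Hypothetical document, loosely modelled on a users file; the keys are invented.
DOC = """
users:
  example_user: {ensure: present}
  example_user: {ensure: absent}
"""

print(yaml.safe_load(DOC))           # {'users': {'example_user': {'ensure': 'absent'}}}
yaml.load(DOC, Loader=StrictLoader)  # raises DuplicateKeyError
```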
[13:54:07] ottomata: from the task it seems that k6s is needed too
[13:54:15] if that's the case the patch is missing its setting
[13:54:17] commented
[14:02:48] thanks volans, commented too
[14:02:52] checked with luca, we don't need it in this case
[14:02:56] only if the user had ssh too
[14:03:40] ok thx
[14:04:42] jbond42, volans: hm, neat
[14:04:54] also haha k6s, the budget k8s
[14:06:51] rzl: the modern k8s ;)
[14:08:38] mw-on-k6s might be easier, we should at least check it out
[14:09:37] easier by 25% right?
[14:10:28] ottomata: +1ed (in case you missed the above)
[14:29:06] volans: thank you
[15:20:15] rzl: https://integration.wikimedia.org/ci/job/operations-puppet-tests-buster-docker/22277/console
[15:20:49] nice, tx
[15:21:19] jbond42: oh, brilliant!
[15:21:26] (for others) i have added basic yaml lint to the admin.yaml CI
[16:09:38] <_joe_> jbond42: cool
[16:15:55] thx
[17:46:36] I volunteered to give a presentation on debugging MediaWiki, if you have specific questions you'd like me to address, let me know :)
[17:50:05] +1 no specific questions but i would be interested in that
[17:50:38] +1 (same thing)
[18:04:23] :D
[18:05:03] jbond42: do you know if the order of query_facts() is deterministic? (I'm not seeing any unnecessary diffs yet, just curious)
[18:07:08] I'll watch that for sure
[18:07:56] legoktm: i don't have a definite answer on that, however we use it in enough places that i would say it probably is, otherwise we would have noticed by now. I think it provides the data as it's received from the puppetdb api, so the question is: is the puppetdb api deterministic (again i'm not sure, but probably). volans may have an authoritative answer
[18:10:21] ack, makes sense
[18:11:51] fyi legoktm re: profile::kubernetes::node::docker_kubernetes_user_password. in the private repo this is set in the following places
[18:11:54] hieradata/role/codfw/kubernetes/staging/worker.yaml
[18:11:57] hieradata/role/common/kubernetes/worker.yaml
[18:11:59] hieradata/role/common/ml_k8s/worker.yaml
[18:12:03] but in production it's only in hieradata/role/codfw/kubernetes/staging/worker.yaml
[18:12:17] yes, that's intentional for now
[18:12:20] the first list is from the labs/private repo
[18:12:38] actually it shouldn't be in ml_k8s, let me see why it got set there
[18:12:39] ack cool, just wanted to check, wasn't sure if hieradata/role/common/kubernetes/worker.yaml was a mistake
[18:13:26] we wanted to make sure it actually worked on kubestage* before rolling it out everywhere
[18:13:44] legoktm: FYI if you use query_* to get a list of hosts in which you expect the current host to be in
[18:13:56] because at first puppet run your host will not yet be there
[18:14:05] you might need to do something like:
[18:14:09] $cumin_masters = unique(concat(query_nodes('Class[Role::Cluster::Management]'), [$::fqdn]))
[18:14:16] (from modules/profile/manifests/cumin/master.pp )
[18:14:32] * jbond42 and this ^^^ is why i have to build puppetdb in the pki cloud project
[18:14:33] that ensures it will have the correct set of hosts even on reimage at first puppet run
[18:15:03] ah, in this case the registry* hosts are querying the list of k8s nodes, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/672537/7/modules/docker_registry_ha/manifests/web.pp
[18:18:44] * volans off for now, might be back later, sorry
[19:39:51] I guess we didn't figure out why pip is slow for cookbooks? CI is hung at that step: https://integration.wikimedia.org/ci/job/tox-docker/17702/console
[19:40:02] I'll try to debug it after lunch
[20:23:15] legoktm: it's the new resolver
[20:23:27] I didn't have time yet to look at it
[20:23:36] it starts from pywmflib fwiw
[20:24:39] ok, I'll start there then
[20:27:34] each mypy version it wants to inspect is a 20MB wheel :/
[21:31:20] https://phabricator.wikimedia.org/P14948
[21:31:47] I wish it provided more debug info, looks to be stuck in an infinite loop at gitdb
[21:32:03] the fact that it checks every mypy version is also problematic
[21:32:19] I'm moving on to something else now
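A footnote on the resolver thread above: pip's new (>= 20.3) backtracking resolver tries every candidate release of a dependency whose version range is loose, which is why it downloads one 20 MB mypy wheel after another. The usual mitigation is exactly what pip's hint suggests — stricter constraints — either as tighter pins in the package metadata or as an external constraints file passed with `pip install -c`. The snippet below sketches only the first option; the package names and versions are illustrative, not the real pywmflib or operations-cookbooks requirements.

```python
# Illustrative setup.py excerpt only -- these names and versions are examples,
# not the actual wmflib/cookbooks pins. Bounding the heavyweight tools gives
# the backtracking resolver far fewer candidates to download and try.
from setuptools import setup, find_packages

setup(
    name="example-lib",
    packages=find_packages(),
    extras_require={
        "tests": [
            "mypy==0.812",             # a pin instead of an open range: pip no
                                       # longer has to fetch every mypy wheel
            "prospector>=1.3.1,<1.4",  # bounded rather than unbounded
            "GitPython>=3.1,<3.2",     # also bounds gitdb, where the resolver
                                       # appeared to loop in the paste above
        ],
    },
)
```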