[06:36:33] serviceops, Maps, Product-Infrastructure-Team-Backlog, Patch-For-Review, User-jijiki: Maps 2.0 roll-out plan - https://phabricator.wikimedia.org/T280767 (Jgiannelos)
[06:36:54] serviceops, Maps, Product-Infrastructure-Team-Backlog, Patch-For-Review, User-jijiki: Maps 2.0 roll-out plan - https://phabricator.wikimedia.org/T280767 (Jgiannelos)
[09:19:33] serviceops, MW-on-K8s: On the kube-experimental mwdebug cluster, MediaWiki sees all edits as coming from localhost - https://phabricator.wikimedia.org/T297613 (Joe) With my last changes, I'm now able to correctly see the page, and REMOTE_ADDR is not set to localhost in either of the following situations:...
[09:24:01] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main2002.codfw.wmnet with OS buster
[09:24:25] reimage of kafka-main2002 started
[09:34:29] there is a "local" change in /helmfile.d/services/mwdebug/values.yaml on deploy1001 which blocks automatic sync of helm charts (git_pull_charts.service). Is this needed/intended?
[09:38:31] I think that it was probably a quick test, let's checkout it to restore the pull charts workflow
[09:38:53] maybe safe the diff somewhere
[09:39:03] (just if anybody wants it later on)
[09:40:10] Ok I save the diff (its only one line with 127.0.0.1/32 as RemoteIPInternalProxy) and restore the original state
[09:40:49] +1
[09:42:48] pull worked again and I see my expected changes in helmfile diff
[09:42:49] Dec 17 09:42:03 deploy1002 systemd[1]: git_pull_charts.service: Succeeded.
[09:54:49] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main2002.codfw.wmnet with OS buster completed: - kafka-main2002 (**PASS**) - Downtimed on Ici...
[09:57:22] kafka-main2002 on buster and recovering
[10:07:45] serviceops, Release Pipeline, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): PipelineLib deploy is broken and needs refactoring to use helm3 - https://phabricator.wikimedia.org/T297809 (Jelto) I merged the changes to fix permissions issues with `jenkins` user and hit `rebuild`....
[10:23:30] 2002 recovered, I'll do 2001 this afternnon
[10:23:33] *afternoon
[11:02:56] serviceops, MW-on-K8s: On the kube-experimental mwdebug cluster, MediaWiki sees all edits as coming from localhost - https://phabricator.wikimedia.org/T297613 (Joe) Open→Resolved
[12:34:55] _joe_: are you seeking for more reviews on https://gerrit.wikimedia.org/r/c/operations/puppet/+/748101 or just forget that I don't have +2? :-P
[12:35:10] <_joe_> ahah the former ofc
[12:35:14] <_joe_> err the latter
[12:35:35] <_joe_> sorry I'm used to only having to give +1 to patches from others, I'll merge it
[14:20:10] going to reimage kafka-main2001 (last one in codfw)
[14:22:08] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main2001.codfw.wmnet with OS buster
[14:53:01] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main2001.codfw.wmnet with OS buster completed: - kafka-main2001 (**PASS**) - Downtimed on Ici...
[15:10:19] kafka main codfw on buster \o/
[15:26:34] yay \o/
[16:02:00] serviceops, Observability-Metrics, SRE-swift-storage: thanos-be hosts filing up root filesystem with logs - https://phabricator.wikimedia.org/T297959 (fgiunchedi)
[16:02:03] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey) Kafka main codfw on buster! Next steps: - Rollout the fixed uid/gid change to Kafka main eqiad - Upgrade BIOS+NIC on kafka-main100[1-3] - Reimage the nodes to Buster
[16:41:19] serviceops, Observability-Metrics, SRE-swift-storage, Patch-For-Review: thanos-be hosts filing up root filesystem with logs - https://phabricator.wikimedia.org/T297959 (fgiunchedi) I've bandaided the immediate issue, leaving the task open since we haven't addressed the high volume of logs
[16:53:49] _joe_, you around?
[16:54:07] <_joe_> subbu: sure
[16:54:15] <_joe_> but, barely :)
[16:54:29] serviceops: Deploy improved title-parsing code - https://phabricator.wikimedia.org/T297962 (ssastry)
[16:54:38] ^ that is for you.
[16:55:35] * _joe_ checks date
[16:56:07] <_joe_> it's not april 1st, so we're actually rewriting mediawiki in rust finally
[16:56:11] <_joe_> legoktm: ^^
[16:56:20] hm?
[16:56:38] <_joe_> see above
[16:57:52] 🚀🚀🚀
[16:58:34] :)
[16:58:35] serviceops, Prod-Kubernetes, Toolhub, Kubernetes: Maintenance environment needed for running one-off commands - https://phabricator.wikimedia.org/T290357 (bd808)
[16:58:38] _joe_: so will you be able to deploy it today?
[16:58:43] serviceops, Prod-Kubernetes, Toolhub, Kubernetes: Maintenance environment needed for running one-off commands - https://phabricator.wikimedia.org/T290357 (bd808)
[16:58:45] <_joe_> legoktm: yesterday!
[16:59:20] serviceops, Prod-Kubernetes, Toolhub, Kubernetes: Maintenance environment needed for running one-off commands - https://phabricator.wikimedia.org/T290357 (bd808)
[16:59:39] <_joe_> subbu: seriously though, do you actually plan to make a microservice in rust to parse titles?
[17:00:13] lolol
[17:00:41] well, you know that i was considering rust seriously when porting parsoid from node.js
[17:01:23] but, no, not right now. :)
[17:01:35] well the upside is, it would be completely immune to all security faults and bugs.
[17:01:41] because Urst :)
[17:01:43] I hope the "since it's written in Rust, it can't have any bugs" line sold the joke
[17:01:44] *Rust
[17:05:18] I looked at rust for a bit in https://phabricator.wikimedia.org/T204595#4612789 .. but the only dom library there was was the one with servo and it was large and perf was so so then.
[17:05:32] not sure what the status is these days.
[17:06:44] bblack: https://i.redd.it/xye8vgi3t6w71.jpg
[17:07:37] * subbu hopes legoktm doesn't actually believe that :)
[17:08:28] * bblack thinks AI will replace software engineers and write perfect code in C for us, removing the need for Rust :)
[17:08:36] I think https://i.redd.it/8a89n02l45n51.jpg is more applicable to me
[17:09:06] :)
[17:12:06] but overall, if getting frustrated at the compiler gets you fewer bugs at runtime, that is probably a worthwhile tradeoff.
[17:13:00] https://i.redd.it/ougt3itg7m281.jpg
[17:13:54] serviceops: Deploy improved title-parsing code - https://phabricator.wikimedia.org/T297962 (Legoktm) Open→Invalid Maybe next year ;-)
[17:14:50] <_joe_> I don't dislike rust, apart from the syntax, the concurrency model, the compiler, the packaging system
[17:15:00] <_joe_> it's a great project otherwise
[17:15:10] <_joe_> (I'm joking, cargo is actually amazing)
[17:15:14] don't forget the toxic community /s
[17:15:28] <_joe_> well the community is actually toxic it seems :P
[17:15:44] <_joe_> or, the core team maybe more than the community
[17:15:55] :v
[17:16:06] also, the real benchmarks for title parsing in Rust vs PHP is at https://phabricator.wikimedia.org/P18116#92451
[17:17:16] the 3-4x speedup is a microbenchmark, estimating from flamegraphs it would probably be a 0.5% speedup overall. Not really worth it (IMO) to figure out how to deploy a Rust+PHP extension just for that
[17:17:16] i don't know much about the concurrency model ... I got sold on the low memory defects promise of Rust without GC and without perf hits ...
[17:17:50] but, did i miss something about the rust community? i thought it was know for welcoming newcomers, etc ...
[17:18:15] there's just a bit of drama right now
[17:18:22] https://github.com/rust-lang/team/pull/671
[17:19:08] https://blog.rust-lang.org/inside-rust/2021/11/25/in-response-to-the-moderation-team-resignation.html
[17:19:32] <_joe_> legoktm: I would rather use rust in a microservice to do parts of parsing that are particularly inefficient
[17:19:51] <_joe_> "a bit" :)
[17:20:12] the latest update is https://twitter.com/burntsushi5/status/1468594170038296597, so seems like things are going in the right direction
[17:20:17] <_joe_> imagine if our CoCC resigned en masse because some senior engineers at the wmf refused to comply with it :)
[17:20:39] <_joe_> legoktm: oh that's great to see
[17:20:53] <_joe_> it's not even the first drama that hits that community.
[17:23:19] yeah, title parsing is really not the place to start
[17:23:20] well, conflict and drama is inevitable .... i am suspicious of large groups that pretend otherwise ... i am more interested in how they navigate them when the conflicts and dramas land
[17:23:52] I mostly wrote the PHP bindings so I could plug it into the MW test suite, once we fixed all those bugs, I thought I might as well run a benchmark for fun
[17:25:22] and i suppose what institutional structures and processes are in place as well.
[17:25:29] <_joe_> subbu: hear hear re: large groups pretending otherwise ;)
[17:26:25] i guess it is _joe_'s turn to troll legoktm :)
[17:27:17] :P but I agree too
[17:27:32] <_joe_> subbu: yeah I'll just rollback to mailman 2 on january 1st
[17:27:53] I would die inside
[17:28:26] it's been fun participating in this mini "prank" ... but, time for a non-work phone call ... ttyl.
[17:29:16] <_joe_> eheh thank you subbu it was appreciated :D
[17:29:59] :D
[20:19:11] serviceops, Parsoid: Compare Parsoid perf on current production servers vs a newer test server - https://phabricator.wikimedia.org/T297259 (Legoktm) I've posted all the raw data and images at https://people.wikimedia.org/~legoktm/T297259/data/, still digging into it. From a quick eyeball of the graphs I...
[22:10:08] serviceops, Release Pipeline, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): PipelineLib deploy is broken and needs refactoring to use helm3 - https://phabricator.wikimedia.org/T297809 (dduvall) >>! In T297809#7577276, @Jelto wrote: > I merged the changes to fix permissions issu...
[22:11:09] serviceops, Release Pipeline, Release-Engineering-Team (Priority Backlog 📥): PipelineLib deploy is broken and needs refactoring to use helm3 - https://phabricator.wikimedia.org/T297809 (dduvall) Open→Resolved a:dduvall
[23:33:31] serviceops, Prod-Kubernetes, Toolhub, Kubernetes: Maintenance environment needed for running one-off commands - https://phabricator.wikimedia.org/T290357 (bd808) I have a temporary solution that I'm going to document here, but I would really, really like something less gross. My hack is running t...