[00:03:48] PROBLEM - Check systemd state on netflow5001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:10] PROBLEM - Check systemd state on netflow3001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:26] PROBLEM - Check systemd state on netflow4001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:34] PROBLEM - Check systemd state on netflow1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:04:50] PROBLEM - Check systemd state on netflow2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:14:06] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:15:54] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:28:30] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 32597776 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:30:18] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 101304 and 35 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[00:51:27] (PS1) Krinkle: mediawiki: Change mw alerts to use a 3min moving average [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608188
[00:51:29] (PS1) Krinkle: mediawiki: Raise fatal alert treshold from 50 to 100 [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608189
[00:54:36] (PS2) Krinkle: mediawiki: Change mw alerts to use a moving average [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608188
[00:54:38] (PS2) Krinkle: mediawiki: Raise fatal alert treshold from 50 to 100 [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608189
[00:55:29] (CR) Krinkle: "I've plotted both versions in Grafana with a threshold line to show how it would behave:" [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608188 (owner: Krinkle)
[00:55:41] (CR) Krinkle: "See also https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&f" [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608189 (owner: Krinkle)
[00:57:12] Nemo_bis: false alarm :)
[00:58:10] mutante: I'm getting this on every page:
[00:58:11] " Plugin install error: TypeError: self.onAction is not a function from https://gerrit.wikimedia.org/r/plugins/delete-project/static/delete-project.js "
[00:58:20] seems to not affect anything important right now
[00:58:26] but just pops up every time in the corner
[00:58:58] Krinkle: "hard refresh" is what I Was told
[00:59:01] But it didn't fix it for me
[00:59:12] indeed, same here
[00:59:15] it's still there for me hours later
[01:01:03] I see quite a bit of console noise too
[01:01:51] Are you using safari?
[01:02:21] no
[01:03:24] hmm
[01:04:46] strangeis it chrome you use?
[01:04:49] *is
[01:04:57] nope
[01:10:45] (CR) Krinkle: mediawiki: Raise fatal alert treshold from 50 to 100 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608189 (owner: Krinkle)
[01:15:44] Reedy which browser?
[01:15:52] FF
[01:16:06] oh
[02:09:26] is there a clean cache button for FF?
[02:19:31] (CR) DannyS712: mediawiki: Raise fatal alert treshold from 50 to 100 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/608189 (owner: Krinkle)
[02:58:52] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[03:49:11] I think we’re just going to have to append ?v=1 to it (delete-project)
[03:49:12] Cc qchris
[04:57:30] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 75.25 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[05:18:20] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:20:04] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 133, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:36:30] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 135, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:36:36] RECOVERY - Router interfaces on cr4-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200628T0700)
[08:34:50] Paladox: I've got to look into that one some more.
[09:22:57] Reedy, Krinkle: do you still have the issue about the delete project plugin failing to load?
[09:23:22] If so you, could you please load gerrit with the dev tools open and look in the network tab.
[09:23:52] Some 6 rows from bottom up, there should be "delete-project.js"
[09:24:06] Could you check which response you got there?
[09:24:29] Three line from bottom up in that file you should see:
[09:24:51] `;class GrDeleteRepo extends Polymer.Element....
[09:25:05] Or you might see two lines from bottom up:
[09:25:15] self.onAction('project', 'delete', onDeleteProject);
[09:26:23] And in the "Transmitted" (I hope that what it translates to in English) column of the network tab where you normally see the file size, does it show the file size or "From Cache"?
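The MediaWiki exceptions/fatals alert above fires at 00:14:06 and recovers at 00:15:54, and Krinkle's patches 608188 and 608189 propose smoothing it with a moving average and raising the fatal threshold from 50 to 100. The sketch below only illustrates the general effect, assuming a plain trailing mean over per-minute samples. The real alert lives in the puppet/Grafana configuration behind those changes and is not reproduced here; the sample series and the helper names `moving_average` and `alert_states` are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List


def moving_average(samples: Iterable[float], window: int) -> List[float]:
    """Trailing moving average; the first few points average whatever is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf: Deque[float] = deque(maxlen=window)
    out: List[float] = []
    for x in samples:
        buf.append(x)  # the deque drops the oldest sample once it holds `window` items
        out.append(sum(buf) / len(buf))
    return out


def alert_states(samples: Iterable[float], threshold: float, window: int = 1) -> List[str]:
    """Label each (smoothed) sample CRITICAL if it exceeds the threshold, else OK."""
    return ["CRITICAL" if v > threshold else "OK" for v in moving_average(samples, window)]


# Hypothetical per-minute error counts with a single one-minute spike.
errors_per_minute = [20, 30, 140, 25, 30, 20]
print(alert_states(errors_per_minute, threshold=50))             # raw value vs 50
print(alert_states(errors_per_minute, threshold=100, window=3))  # 3-sample average vs 100
```

With this made-up series the raw check against 50 flags the spike minute, while the 3-sample average stays below 100 throughout. Averaging against the old threshold of 50 would instead stretch the single spike into three consecutive CRITICAL samples.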
[09:28:02] (PS1) Legoktm: codesearch: Add port for analytics search profile [puppet] - https://gerrit.wikimedia.org/r/608203 (https://phabricator.wikimedia.org/T249318)
[09:29:32] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:31:22] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:42:16] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:44:06] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:57:45] qchris: here's what I see https://phabricator.wikimedia.org/P11680 - it says "from cache"
[09:58:03] I'm using Firefox, but I already cleared my cache multiple times (ctrl+shift+R) based on wikitech-l
[09:58:27] Mhmm.That's the old file.
[09:58:40] Thanks legoktm
[09:58:52] Could you verify with curl that the webserver gives you the new one?
[09:58:56] (It does for me)
[09:59:45] I think ctrl+shift+R isn't actually clearing the cache, because I still see "from cache"
[09:59:59] Yup. It appears so.
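qchris's check comes down to which version of delete-project.js a client receives: the old file contains `self.onAction('project', 'delete', onDeleteProject);`, the new one contains `class GrDeleteRepo extends Polymer.Element`. A minimal Python sketch of the same check done outside the browser (the equivalent of the curl test), assuming those two strings quoted in the log are enough to tell the versions apart. `plugin_version` and `fetch_text` are hypothetical helpers. A direct fetch never consults the browser's cache, which is why it can disagree with what the Network tab shows.

```python
import urllib.request

PLUGIN_URL = "https://gerrit.wikimedia.org/r/plugins/delete-project/static/delete-project.js"
# Marker strings quoted in the log; assumed to be unique to each version.
NEW_MARKER = "class GrDeleteRepo extends Polymer.Element"
OLD_MARKER = "self.onAction('project', 'delete', onDeleteProject);"


def plugin_version(js_source: str) -> str:
    """Classify a delete-project.js body as 'new', 'old' or 'unknown'."""
    if NEW_MARKER in js_source:
        return "new"
    if OLD_MARKER in js_source:
        return "old"
    return "unknown"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL straight from the server; urllib keeps no HTTP cache of its own."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `plugin_version(fetch_text(PLUGIN_URL))` returns "new" when the webserver is serving the upgraded plugin.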
[10:00:02] https://support.mozilla.org/en-US/kb/how-clear-firefox-cache
[10:00:04] yeah, curl gives me the new one
[10:00:22] Ok. So things are ok server-side even for affected browsers.
[10:01:00] Could you try the instructions on the above link and see if they solve the issue for you?
[10:01:15] aaand now it's good
[10:01:24] \o/
[10:01:25] Perfect.
[10:02:14] I'll post that onto wikitech and the bug.
[10:02:19] Re new gerrit: it doesn't load for me on safari with iOS 12. Every page will load for a while and then crash, same for the upstream gerrit. Shall I file a task upstream or what? Is it known/intended?
[10:02:51] Huh. Not sure this is know.
[10:03:01] qchris: thanks!
[10:03:06] If it fails for upstream Gerrit too, it might be best to file with them directly.
[10:03:11] Yeah it does
[10:03:14] Daimona: out of curiosity iPad or iPhone?
[10:03:16] legoktm: Thanks for debugging with me :-)
[10:03:28] It could be due to limited RAM on the device, but well...
[10:03:32] legoktm: iPhone
[10:04:25] There was already an iOS issue discovered on gerrit-test about a toggle button not working. That one affected upstream too. So I guess maybe they don't test too much on iOS? (Just vaguely vage assumption). But awesome paladox fixed that issue for us.
[10:04:36] Daimona: So yes, please report upstream.
[10:04:44] ah, don't have access to one of those (normally I can borrow my mom's iPad)
[10:05:06] You need to buy your mom an iPhone then :-P
[10:05:28] Yeah I remember that issue. I could observe the same crashes on gerrit-test, but I thought it could've been temporary
[10:05:44] I'll file a bug upstream, thank you
[10:06:56] hah, trying to move her toward more free software, not less :)
[10:07:51] qchris: also, just to make sure, gerrit-replica was upgraded too right?
[10:08:17] Yes, gerrit-replica got upgraded too.
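The underlying problem was Firefox reusing a stale cached copy of delete-project.js until the cache was cleared by hand. Reedy's earlier suggestion (03:49) of appending `?v=1` works because a different URL is a different cache key, so the browser has no stored copy to reuse. A generic Python sketch of setting or bumping such a parameter; the helper name and the parameter name `v` are illustrative, and the log does not say whether Gerrit itself versions plugin URLs this way.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, version: str, param: str = "v") -> str:
    """Return `url` with `param=version` set, replacing any existing value."""
    parts = urlsplit(url)
    # Keep every other query parameter in order; drop the old buster if present.
    query = [(k, val) for k, val in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, version))
    return urlunsplit(parts._replace(query=urlencode(query)))


plugin = "https://gerrit.wikimedia.org/r/plugins/delete-project/static/delete-project.js"
print(with_cache_buster(plugin, "1"))
```

Bumping the value whenever the file changes (for example to a release identifier) gives each release a fresh URL, so no manual cache clearing is needed.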
[10:08:34] ssh gerrit-replica.wikimedia.org gerrit version
[10:08:34] gerrit version 3.2.2-98-g98d827eaa3
[10:09:25] very few people will ever see it, but https://gerrit-replica.wikimedia.org/r/login is missing the image and probably all static resources
[10:10:11] (I only ever log in to -replica to be able to see the monitoring graphs)
[10:10:40] I've been told that we don't bother about the replica's web interface, so I only checked fetching through https.
[10:10:55] (Nothing else on the webserver)
[10:11:31] * qchris adds it on his todo list.
[10:12:45] qchris: Should I report at https://bugs.chromium.org/p/gerrit/issues/entry, and what category?
[10:12:47] when it starts lagging/OOMing, I have https://gerrit-replica.wikimedia.org/r/monitoring?part=graph&graph=usedMemory&period=mois bookmarked to see what's up - unless there's another place I can see that info
[10:13:33] Daimona: I'd pick "PolyGerrit issue"
[10:14:07] legoktm: Gotcha. It's of course ok to use that if it's working :-)
[10:14:08] Ok, thank you
[10:15:49] Operations, Continuous-Integration-Infrastructure: Add python3.8 to buster-wikimedia pyall component - https://phabricator.wikimedia.org/T241195 (Legoktm) >>! In T241195#5804364, @MoritzMuehlenhoff wrote: >>>! In T241195#5793136, @faidon wrote: >> I've updated the aforementioned apt repository with 3.8.1...
[10:23:10] Argh. On the replica, gerrit thinks gerrit.wikimedia.org is it's normative Url. That of course is wrong and screws up the WebUI. I'll see to uploading a fix for that.
[10:23:39] That's why the web UI never worked. (Even before the upgrade)
[10:25:21] hah, trying to move her toward more free software, not less :) < Windows Phone > iPhone > Androids *runs away at pace >.>*
[10:30:46] qchris: yeah, the web UI never worked, but I assumed that was on purpose
[10:39:32] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 110.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[11:32:40] (CR) Legoktm: [C: +1] "I deployed the labs/codesearch change, so this is safe to be merged at any time." [puppet] - https://gerrit.wikimedia.org/r/608203 (https://phabricator.wikimedia.org/T249318) (owner: Legoktm)
[11:45:24] (PS1) Ammarpad: Require editinterface to edit NS_CONFIG [mediawiki-config] - https://gerrit.wikimedia.org/r/608212 (https://phabricator.wikimedia.org/T256278)
[11:52:39] qchris well they don’t test iOS 12, but if it stopped working someone would have noticed (e.g me) :)
[11:54:03] Gotcha :-D
[12:16:08] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[12:30:44] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 62.03 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
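The elastic1052 GC alerts print their comparison inline: PROBLEM lines read `102.7 gt 100`, RECOVERY lines read `(C)100 gt (W)80 gt 75.25`, i.e. a critical threshold of 100 and a warning threshold of 80. A minimal Python reading of those messages, assuming strict greater-than comparisons as the `gt` suggests. The check plugin's actual source is not part of this log, and `check_state` is a hypothetical name.

```python
def check_state(value: float, warning: float = 80.0, critical: float = 100.0) -> str:
    """Nagios-style state for a 'higher is worse' metric; critical is tested first."""
    if warning > critical:
        raise ValueError("warning threshold should not exceed critical")
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


# Values copied from the elastic1052 messages in this log.
for observed in (102.7, 75.25, 110.8, 101.7, 62.03):
    print(observed, check_state(observed))
```

None of the values logged that day fall between 80 and 100, so the log never shows the WARNING state this sketch would report for, say, 90.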
[12:43:06] (PS1) QChris: gerrit: Use gerrit-replica.wikimedia.org as canonical host for gerrit2001 [puppet] - https://gerrit.wikimedia.org/r/608214
[12:45:20] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 114.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[12:49:00] (CR) QChris: "(This is not related to the recent Gerrit upgrade. It was broken already before.)" [puppet] - https://gerrit.wikimedia.org/r/608214 (owner: QChris)
[12:58:25] (PS2) QChris: gerrit: Use `gerrit-replica.wikimedia.org` as canonical host for gerrit2001 [puppet] - https://gerrit.wikimedia.org/r/608214 (https://phabricator.wikimedia.org/T256567)
[12:58:47] (CR) jerkins-bot: [V: -1] gerrit: Use `gerrit-replica.wikimedia.org` as canonical host for gerrit2001 [puppet] - https://gerrit.wikimedia.org/r/608214 (https://phabricator.wikimedia.org/T256567) (owner: QChris)
[13:04:17] (PS3) QChris: gerrit: Use `gerrit-replica.wikimedia.org` as canonical host for gerrit2001 [puppet] - https://gerrit.wikimedia.org/r/608214 (https://phabricator.wikimedia.org/T256567)
[13:21:47] qchris: Yeah, it's still showing
[13:22:30] If I look at https://gerrit.wikimedia.org/r/plugins/delete-project/static/delete-project.js
[13:22:34] }
[13:22:34] self.onAction('project', 'delete', onDeleteProject);
[13:22:34] });
[13:22:41] You said that you did the F5 thing, but did you also try to clear the cache completely?
[13:22:45] No
[13:22:50] (Yup, that's the file from the old Gerrit)
[13:22:52] Hard refreshing on the file in the browser just made it massively change
[13:23:00] Could you try?
[13:23:07] See above, I don't need to :P
[13:23:13] Looks like that was enough
[13:23:14] Ah. Ok :-)
[13:23:34] I guess something possibly just being a little too aggressive in the caching
[13:24:15] Sounds like it. I haven't found what it is (as I cannot reproduce, even if I locally upgrade Gerrits). But at least we know how to overcome it.
[13:24:48] "zero your hard drive. reinstall your operating system from source files"
[13:25:02] :-)
[13:27:09] (CR) QChris: [C: -1] "Looks like this is not enough. I'll have to dig deeper :-(" [puppet] - https://gerrit.wikimedia.org/r/608214 (https://phabricator.wikimedia.org/T256567) (owner: QChris)
[13:38:08] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[14:21:48] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 106.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[15:23:18] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:25:08] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:55:06] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 135.3 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[16:28:46] (CR) Ammarpad: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/608124 (https://phabricator.wikimedia.org/T256521) (owner: Hamish)
[17:34:59] (PS1) ProcrastinatingReader: Add 'abusefilter-view' as a default right for the CU log user [mediawiki-config] - https://gerrit.wikimedia.org/r/608222 (https://phabricator.wikimedia.org/T255506)
[17:59:40] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[18:04:24] (PS1) Krinkle: labs: Update eventgate placeholders in Beta Cluster to not use deploymentwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608223 (https://phabricator.wikimedia.org/T198673)
[18:08:08] (PS2) ProcrastinatingReader: Add 'abusefilter-view' as a default right for the CU log user [mediawiki-config] - https://gerrit.wikimedia.org/r/608222 (https://phabricator.wikimedia.org/T255506)
[18:08:51] (CR) Krinkle: [C: +2] "no-op as the configs in question were empty." [mediawiki-config] - https://gerrit.wikimedia.org/r/608223 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[18:09:36] (Merged) jenkins-bot: labs: Update eventgate placeholders in Beta Cluster to not use deploymentwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608223 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[18:18:35] (PS1) Krinkle: labs: Remove wmgULSPosition override for deploymentwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608224 (https://phabricator.wikimedia.org/T198673)
[18:30:27] (CR) Krinkle: [C: +2] labs: Remove wmgULSPosition override for deploymentwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608224 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[18:31:18] (Merged) jenkins-bot: labs: Remove wmgULSPosition override for deploymentwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608224 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[18:38:04] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 138.3 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[18:38:10] (PS1) Krinkle: labs: Disable FileExporter extension on deploymentwiki in Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/608225 (https://phabricator.wikimedia.org/T198673)
[18:38:54] (PS2) Krinkle: labs: Disable FileExporter extension on deploymentwiki in Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/608225 (https://phabricator.wikimedia.org/T198673)
[18:39:19] (CR) Krinkle: [C: +2] labs: Disable FileExporter extension on deploymentwiki in Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/608225 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[18:40:03] (Merged) jenkins-bot: labs: Disable FileExporter extension on deploymentwiki in Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/608225 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[18:46:01] (PS1) Krinkle: labs: Move Special:CollabPad from deploymentwiki to metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608246 (https://phabricator.wikimedia.org/T198673)
[19:05:28] (PS2) Elukey: piwik: Remove "slave" from comment [puppet] - https://gerrit.wikimedia.org/r/608157 (https://phabricator.wikimedia.org/T254646) (owner: Ladsgroup)
[19:06:10] (CR) Elukey: [C: +2] piwik: Remove "slave" from comment [puppet] - https://gerrit.wikimedia.org/r/608157 (https://phabricator.wikimedia.org/T254646) (owner: Ladsgroup)
[19:14:13] (CR) Huji: [C: +1] "@Proc are you planning to schedule its deployment? I am thinking it might be best if I do that, because I can test the outcome on fawiki r" [mediawiki-config] - https://gerrit.wikimedia.org/r/608222 (https://phabricator.wikimedia.org/T255506) (owner: ProcrastinatingReader)
[19:36:34] (CR) Krinkle: [C: +2] labs: Move Special:CollabPad from deploymentwiki to metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608246 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[19:37:18] (Merged) jenkins-bot: labs: Move Special:CollabPad from deploymentwiki to metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608246 (https://phabricator.wikimedia.org/T198673) (owner: Krinkle)
[19:39:12] (CR) ProcrastinatingReader: "> Patch Set 2: Code-Review+1" [mediawiki-config] - https://gerrit.wikimedia.org/r/608222 (https://phabricator.wikimedia.org/T255506) (owner: ProcrastinatingReader)
[19:41:24] (PS5) Krinkle: betacluster: Add deploymentwiki to closed-labs.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/606733 (https://phabricator.wikimedia.org/T198673) (owner: Majavah)
[19:41:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:42:40] (CR) Krinkle: [C: +1] "LGTM. I think we should delete this wiki entirely in due time, but I suppose we could keep it read-only for a little while in case anyone " [mediawiki-config] - https://gerrit.wikimedia.org/r/606733 (https://phabricator.wikimedia.org/T198673) (owner: Majavah)
[19:43:24] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:45:12] (CR) Krinkle: [C: +1] "I see, yeah, this seems cleaner indeed. So the workflow for adding a new wiki would 1) creating the wiki's YAML entry, 2) deciding its int" [mediawiki-config] - https://gerrit.wikimedia.org/r/594216 (https://phabricator.wikimedia.org/T251715) (owner: Jforrester)
[19:45:51] (PS6) Krinkle: labs: Add deploymentwiki to closed-labs.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/606733 (https://phabricator.wikimedia.org/T198673) (owner: Majavah)
[19:47:03] (PS1) ProcrastinatingReader: Setup rollbacker and mover on lijwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608247 (https://phabricator.wikimedia.org/T256109)
[19:49:33] (CR) Krinkle: [C: +2] labs: Add deploymentwiki to closed-labs.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/606733 (https://phabricator.wikimedia.org/T198673) (owner: Majavah)
[19:50:01] (Merged) jenkins-bot: labs: Add deploymentwiki to closed-labs.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/606733 (https://phabricator.wikimedia.org/T198673) (owner: Majavah)
[19:50:02] Krinkle: not sure if that will even work, see T115584
[19:50:03] T115584: closed-labs.dblist no longer read as 'closed' tag - https://phabricator.wikimedia.org/T115584
[19:53:40] Majavah: I don't think that's an issue. That applies to bascially Cirrus and Wikibase only which have been fixed by other means since then
[19:54:01] the InitialiseSettings tag 'closed' never applied to beta even before that change from 2015
[20:00:17] Hmmm, scap finished according to https://integration.wikimedia.org/ci/job/beta-scap-eqiad/306506/console but I can still edit
[20:08:35] It's not really a surprise
[20:09:01] some of prod config for 'closed' probably needs duplicating for 'closed-labs'
[20:09:05] Or some other underlying mapping
[20:09:51] see also eswiki in the closed-labs list
[20:10:03] Yeah probably, I'll look into it tomorror
[20:10:16] Yeah, aawiki isn't closed either
[20:10:22] none of the 'closed' settings apply
[20:10:52] I suppose we need to decide whether we want closed-labs to be an alternate or a supplement
[20:11:38] right now it acts as neither, so only prod closed wikis apply
[20:11:43] ah yeah aawiki is actually closed currently
[20:11:51] but presumably wasn't before 2015
[20:12:00] when it used the labs list instead
[20:19:16] I'm fine with using it either as an alternative or as a supplement, however supplement would be easier to implement
[20:19:41] Majavah: easier for wmf-config, not easier for correctness more widely
[20:20:01] if closed-labs is incomplete for labs it makes maintenance harder
[20:20:06] https://codesearch.wmflabs.org/operations/?q=closed-labs&i=nope&files=&repos=
[20:20:16] flow-labs is also a replacement
[20:33:19] Well should be
[20:36:43] (PS1) Krinkle: multiversion: Fix 'closed-labs' reading as 'closed' for static config [mediawiki-config] - https://gerrit.wikimedia.org/r/608249 (https://phabricator.wikimedia.org/T109157)
[20:36:55] Majavah: something like this perhaps
[20:41:46] (CR) Majavah: [C: +1] "LGTM, thanns" [mediawiki-config] - https://gerrit.wikimedia.org/r/608249 (https://phabricator.wikimedia.org/T109157) (owner: Krinkle)
[20:42:22] s/thanns/thanks on my CR
[20:44:55] (PS1) Krinkle: labs: Set 'test.wikipedia.*' as canonical for Beta Cluster testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608250 (https://phabricator.wikimedia.org/T99156)
[20:47:38] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:49:25] (PS1) Krinkle: mediawiki: Remove 'test.wikimedia.beta.wmflabs.org' vhost [puppet] - https://gerrit.wikimedia.org/r/608251 (https://phabricator.wikimedia.org/T99156)
[20:50:38] Reedy: could use a sanity check on the testwiki stuff if you're around :)
[20:51:03] * Reedy just deletes beta
[20:51:18] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:55:05] I should eat before doing anything else
[21:02:14] I need to get some sleep, please don't fully destroy the beta cluster while I'm sleeping
[21:04:17] Majavah: such requests can lead to things like this: https://en.wikipedia.org/w/load.php "Max made me put this here."
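Krinkle and Majavah weigh whether `closed-labs` should act as an alternate to the production `closed` list on Beta or as a supplement to it. A small Python sketch of what each choice means for the resulting set of closed wikis. `closed_wikis`, its parameters and the example sets are hypothetical (the names are simply wikis mentioned above, not the real dblist contents), and the actual change in 608249 lives in the mediawiki-config multiversion code, which is not shown here.

```python
from typing import AbstractSet, FrozenSet


def closed_wikis(
    prod_closed: AbstractSet[str],
    labs_closed: AbstractSet[str],
    realm: str,
    mode: str,
) -> FrozenSet[str]:
    """Which wikis count as 'closed' under the two designs discussed above."""
    if realm != "labs":
        return frozenset(prod_closed)  # production only ever reads the prod list
    if mode == "alternate":
        return frozenset(labs_closed)  # Beta ignores the prod list entirely
    if mode == "supplement":
        return frozenset(prod_closed) | frozenset(labs_closed)  # prod list plus Beta additions
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative sets only.
prod = {"aawiki"}
labs = {"deploymentwiki", "eswiki"}
print(sorted(closed_wikis(prod, labs, "labs", "alternate")))
print(sorted(closed_wikis(prod, labs, "labs", "supplement")))
```

In the supplement form, `closed-labs` on its own is no longer the full list of closed Beta wikis, which is the maintenance concern Krinkle raises around 20:19.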
[21:04:27] it is not *fully* destroyed ;-)
[21:06:04] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:06:30] When people are sleeping is the best time to do stuff
[21:11:38] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:20:41] (CR) Reedy: [C: +1] "If the vhost doesn't exist, it can't be served via it!" [puppet] - https://gerrit.wikimedia.org/r/608251 (https://phabricator.wikimedia.org/T99156) (owner: Krinkle)
[21:21:06] (CR) RhinosF1: Setup rollbacker and mover on lijwiki (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/608247 (https://phabricator.wikimedia.org/T256109) (owner: ProcrastinatingReader)
[21:21:31] (CR) Reedy: [C: +1] "Though, does just removing it actually remove it from the servers? Or does it just become an uncontrolled file on disk?" [puppet] - https://gerrit.wikimedia.org/r/608251 (https://phabricator.wikimedia.org/T99156) (owner: Krinkle)
[21:26:02] (PS2) Krinkle: mediawiki: Remove 'test.wikimedia.beta.wmflabs.org' vhost [puppet] - https://gerrit.wikimedia.org/r/608251 (https://phabricator.wikimedia.org/T99156)
[21:27:19] (PS2) ProcrastinatingReader: Setup rollbacker and mover on lijwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608247 (https://phabricator.wikimedia.org/T256109)
[21:27:41] (CR) ProcrastinatingReader: Setup rollbacker and mover on lijwiki (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/608247 (https://phabricator.wikimedia.org/T256109) (owner: ProcrastinatingReader)
[21:27:42] Reedy: thx, good point :)
[21:27:49] I'll cherry pick with ensure absnet for a while then revert back to PS1
[21:28:48] (CR) RhinosF1: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/608247 (https://phabricator.wikimedia.org/T256109) (owner: ProcrastinatingReader)
[21:29:31] You can just rm the file from disk on the beta appservers
[21:30:17] (PS3) Krinkle: mediawiki: Remove 'test.wikimedia.beta.wmflabs.org' vhost [puppet] - https://gerrit.wikimedia.org/r/608251 (https://phabricator.wikimedia.org/T99156)
[21:31:11] Reedy: there's sometimes other stuff with side effect like avaliable symlinks etc
[21:31:22] seems easier to let puppet agent -tv handle it
[21:31:58] (PS4) Krinkle: mediawiki: Remove 'test.wikimedia.beta.wmflabs.org' vhost [puppet] - https://gerrit.wikimedia.org/r/608251 (https://phabricator.wikimedia.org/T99156)
[21:33:21] (CR) Krinkle: [C: +2] labs: Set 'test.wikipedia.*' as canonical for Beta Cluster testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608250 (https://phabricator.wikimedia.org/T99156) (owner: Krinkle)
[21:34:03] Hm.. new Gerrit 3's "cherry pick git command" is quite hidden
[21:34:05] (Merged) jenkins-bot: labs: Set 'test.wikipedia.*' as canonical for Beta Cluster testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608250 (https://phabricator.wikimedia.org/T99156) (owner: Krinkle)
[21:34:10] It's under "Download patch" from the … menu
[21:34:35] that's gonna confuse for a while
[21:38:06] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: beta-only I56eb4a802 (duration: 01m 00s)
[21:38:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:38:56] Reedy: everything new is confusing
[21:42:12] Not if it's done right
[21:42:34] I can still make it confusing
[21:43:45] !log krinkle@deploy1001 Synchronized wmf-config/CommonSettings.php: no-op I56eb4a802 (duration: 00m 58s)
[21:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:50] PROBLEM - MariaDB Replica Lag: s4 on db1145 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1157.70 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[22:22:46] (PS2) Cicalese: Add HTTP proxy to MediaModeration. [mediawiki-config] - https://gerrit.wikimedia.org/r/608062 (https://phabricator.wikimedia.org/T247943)
[22:29:20] (CR) Aftab: [C: +1] Change bnwiki logo [mediawiki-config] - https://gerrit.wikimedia.org/r/607852 (https://phabricator.wikimedia.org/T255328) (owner: Urbanecm)
[22:32:57] (PS5) Bmansurov: Add recommendation-api helmfile stanzas [deployment-charts] - https://gerrit.wikimedia.org/r/602527 (https://phabricator.wikimedia.org/T241230)
[22:33:43] (PS6) Bmansurov: Add recommendation-api helmfile stanzas [deployment-charts] - https://gerrit.wikimedia.org/r/602527 (https://phabricator.wikimedia.org/T241230)
[22:34:13] Operations, Puppet, Beta-Cluster-Infrastructure, Technical-Debt, Tracking-Neverending: Minimize differences between beta and production (Tracking) - https://phabricator.wikimedia.org/T87220 (Krinkle)
[22:37:54] (CR) Bmansurov: "Thanks @alexandros. I can successfully ping /robots.txt now. However, the service is unable to connect to the MediaWiki host for some reas" (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/602527 (https://phabricator.wikimedia.org/T241230) (owner: Bmansurov)
[23:11:40] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[23:52:54] RECOVERY - MariaDB Replica Lag: s4 on db1145 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave