[00:26:16] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[00:53:17] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[00:53:37] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[00:54:51] <icinga-wm>	 RECOVERY - Disk space on centrallog2002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=centrallog2002&var-datasource=codfw+prometheus/ops
[01:00:29] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[01:05:19] <wikibugs>	 (PS5) DannyS712: phpcs: move AssignmentInControlStructures exclusion inline [mediawiki-config] - https://gerrit.wikimedia.org/r/796360 (https://phabricator.wikimedia.org/T171115)
[01:09:29] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:12:23] <wikibugs>	 (CR) DannyS712: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/802840 (https://phabricator.wikimedia.org/T171115) (owner: DannyS712)
[01:22:08] <wikibugs>	 (CR) DannyS712: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/802841 (https://phabricator.wikimedia.org/T171115) (owner: DannyS712)
[01:22:45] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is CRITICAL: 113 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:25:05] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute for api_appserver on alert1001 is OK: (C)100 gt (W)50 gt 1 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:29:49] <wikibugs>	 (CR) DannyS712: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/802842 (https://phabricator.wikimedia.org/T171115) (owner: DannyS712)
[01:54:51] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:17:39] <wikibugs>	 (PS4) Tim Starling: Enable SSL for master DB connections in the secondary datacenter [mediawiki-config] - https://gerrit.wikimedia.org/r/799437 (https://phabricator.wikimedia.org/T134809)
[03:17:41] <wikibugs>	 (PS5) Tim Starling: Add the master from the primary DC to the secondary DC load arrays [mediawiki-config] - https://gerrit.wikimedia.org/r/799685 (https://phabricator.wikimedia.org/T134809)
[03:17:43] <wikibugs>	 (PS4) Tim Starling: Clean up scap sequencing workaround [mediawiki-config] - https://gerrit.wikimedia.org/r/801836
[03:19:01] <wikibugs>	 (PS5) Tim Starling: Enable SSL for master DB connections in the secondary datacenter [mediawiki-config] - https://gerrit.wikimedia.org/r/799437 (https://phabricator.wikimedia.org/T134809)
[03:19:03] <wikibugs>	 (PS6) Tim Starling: Add the master from the primary DC to the secondary DC load arrays [mediawiki-config] - https://gerrit.wikimedia.org/r/799685 (https://phabricator.wikimedia.org/T134809)
[03:19:05] <wikibugs>	 (PS5) Tim Starling: Clean up scap sequencing workaround [mediawiki-config] - https://gerrit.wikimedia.org/r/801836
[03:22:39] <wikibugs>	 (CR) Tim Starling: "In PS5 I excluded x2 from the cross-DC master connection logic, reflecting the fact that MW has x2 configured such that all reads go to th" [mediawiki-config] - https://gerrit.wikimedia.org/r/799685 (https://phabricator.wikimedia.org/T134809) (owner: Tim Starling)
[03:26:39] <wikibugs>	 ops-codfw: codfw: Master PDU rack/setup row A, row B, rowC and row D task - https://phabricator.wikimedia.org/T309956 (Papaul)
[03:27:31] <icinga-wm>	 PROBLEM - Query Service HTTP Port on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[03:29:49] <icinga-wm>	 RECOVERY - Query Service HTTP Port on wdqs1006 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[04:02:31] <icinga-wm>	 PROBLEM - SSH on wtp1040.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[04:11:50] <wikibugs>	 (PS4) DannyS712: phpcs: enable and configure ValidGlobalName.allowedPrefix [mediawiki-config] - https://gerrit.wikimedia.org/r/802842 (https://phabricator.wikimedia.org/T171115)
[04:12:27] <wikibugs>	 (PS5) DannyS712: phpcs: enable and configure ValidGlobalName.allowedPrefix [mediawiki-config] - https://gerrit.wikimedia.org/r/802842 (https://phabricator.wikimedia.org/T171115)
[04:18:49] <wikibugs>	 (PS6) DannyS712: phpcs: enable and configure ValidGlobalName.allowedPrefix [mediawiki-config] - https://gerrit.wikimedia.org/r/802842 (https://phabricator.wikimedia.org/T171115)
[04:19:39] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on alert1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:20:02] <wikibugs>	 (CR) DannyS712: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/802946 (https://phabricator.wikimedia.org/T171115) (owner: DannyS712)
[04:20:41] <wikibugs>	 ops-codfw: (Need By:TBD) rack/setup/install row A new PDUs - https://phabricator.wikimedia.org/T309957 (Papaul)
[04:26:16] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[04:26:36] <wikibugs>	 (CR) Tim Starling: [C: +2] Enable SSL for master DB connections in the secondary datacenter [mediawiki-config] - https://gerrit.wikimedia.org/r/799437 (https://phabricator.wikimedia.org/T134809) (owner: Tim Starling)
[04:27:23] <wikibugs>	 (Merged) jenkins-bot: Enable SSL for master DB connections in the secondary datacenter [mediawiki-config] - https://gerrit.wikimedia.org/r/799437 (https://phabricator.wikimedia.org/T134809) (owner: Tim Starling)
[04:28:48] <wikibugs>	 ops-codfw: (Need By:TBD) rack/setup/install row A new PDUs - https://phabricator.wikimedia.org/T309957 (Papaul) p:Triage→Medium
[04:29:06] <wikibugs>	 SRE, ops-codfw: codfw: Master PDU rack/setup row A, row B, rowC and row D task - https://phabricator.wikimedia.org/T309956 (Papaul) p:Triage→Medium
[04:31:32] <logmsgbot>	 !log tstarling@deploy1002 Synchronized wmf-config/db-production.php: enable SSL for cross-DC master connections (duration: 03m 10s)
[04:31:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:33:33] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[04:33:36] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[04:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:36:20] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[04:36:22] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[04:36:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:36:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:38:04] <wikibugs>	 (CR) DannyS712: "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/802947 (https://phabricator.wikimedia.org/T171115) (owner: DannyS712)
[04:40:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[04:40:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:45:18] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[04:45:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:49:12] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[04:49:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:49:13] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[04:49:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:14] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[04:53:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:42] <wikibugs>	 (PS1) Marostegui: db1128: Enanble notifications [puppet] - https://gerrit.wikimedia.org/r/802945 (https://phabricator.wikimedia.org/T309303)
[05:02:15] <wikibugs>	 (CR) Marostegui: [C: +2] db1128: Enanble notifications [puppet] - https://gerrit.wikimedia.org/r/802945 (https://phabricator.wikimedia.org/T309303) (owner: Marostegui)
[05:03:47] <icinga-wm>	 RECOVERY - SSH on wtp1040.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:06:16] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1128 to dbctl T309303', diff saved to https://phabricator.wikimedia.org/P29418 and previous config saved to /var/cache/conftool/dbconfig/20220606-050616-marostegui.json
[05:06:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:06:21] <stashbot>	 T309303: Move db1128 from m1 (misc) to s1 (mediawiki) - https://phabricator.wikimedia.org/T309303
[05:07:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1128 on s1 with small weight after DIMM replacement T309303', diff saved to https://phabricator.wikimedia.org/P29419 and previous config saved to /var/cache/conftool/dbconfig/20220606-050707-root.json
[05:07:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:10:39] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[05:12:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Give more weight to db1137 in x1 to test 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29420 and previous config saved to /var/cache/conftool/dbconfig/20220606-051205-marostegui.json
[05:12:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:12:11] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[05:17:55] <wikibugs>	 (CR) Marostegui: [C: +1] switchover-tmpl: Add commands for the heartbeat and zarcillo (1 comment) [software] - https://gerrit.wikimedia.org/r/802778 (owner: Ladsgroup)
[05:18:00] <wikibugs>	 (CR) Marostegui: [C: +2] switchover-tmpl: Add commands for the heartbeat and zarcillo [software] - https://gerrit.wikimedia.org/r/802778 (owner: Ladsgroup)
[05:18:33] <wikibugs>	 (Merged) jenkins-bot: switchover-tmpl: Add commands for the heartbeat and zarcillo [software] - https://gerrit.wikimedia.org/r/802778 (owner: Ladsgroup)
[05:25:47] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully pool db1137 in x1 to with 10.6.8 T309679 ', diff saved to https://phabricator.wikimedia.org/P29421 and previous config saved to /var/cache/conftool/dbconfig/20220606-052546-marostegui.json
[05:25:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:25:52] <stashbot>	 T309679:  Migrate a x1 DB host to mariadb 10.6 - https://phabricator.wikimedia.org/T309679
[06:01:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 2%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29422 and previous config saved to /var/cache/conftool/dbconfig/20220606-060110-root.json
[06:01:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29423 and previous config saved to /var/cache/conftool/dbconfig/20220606-061614-root.json
[06:16:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:31:19] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29424 and previous config saved to /var/cache/conftool/dbconfig/20220606-063118-root.json
[06:31:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:19] <icinga-wm>	 RECOVERY - Memcached on an-tool1005 is OK: TCP OK - 0.001 second response time on 10.64.36.117 port 11211 https://wikitech.wikimedia.org/wiki/Memcached
[06:38:57] <marostegui>	 !log Migrate pc1014 to mariadb 10.6.8 T309612
[06:39:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:01] <stashbot>	 T309612: Migrate an active DC parsercache host to MariaDB 10.6 - https://phabricator.wikimedia.org/T309612
[06:40:04] <wikibugs>	 (PS1) Marostegui: pc1014: Install MariaDB 10.6.8 [puppet] - https://gerrit.wikimedia.org/r/803084 (https://phabricator.wikimedia.org/T309612)
[06:41:15] <wikibugs>	 (CR) Marostegui: [C: +2] pc1014: Install MariaDB 10.6.8 [puppet] - https://gerrit.wikimedia.org/r/803084 (https://phabricator.wikimedia.org/T309612) (owner: Marostegui)
[06:46:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 20%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29425 and previous config saved to /var/cache/conftool/dbconfig/20220606-064622-root.json
[06:46:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:52] <wikibugs>	 (PS7) Elukey: Add BGP configuration for the new ML staging codfw cluster [homer/public] - https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198)
[06:50:30] <wikibugs>	 (CR) Elukey: Add BGP configuration for the new ML staging codfw cluster (2 comments) [homer/public] - https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: Elukey)
[06:52:04] <wikibugs>	 (CR) Elukey: [C: +2] ml-services: add svwiki & trwiki articlequality isvcs [deployment-charts] - https://gerrit.wikimedia.org/r/802500 (https://phabricator.wikimedia.org/T307418) (owner: Kevin Bazira)
[06:57:34] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ayounsi) Resolved→Open Feel free to close the task if expected, but the latest diffscan report shows that SSH is open to...
[07:00:05] <jouncebot>	 Amir1 and Urbanecm: Your horoscope predicts another unfortunate UTC morning backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T0700).
[07:00:05] <jouncebot>	 No Gerrit patches in the queue for this window AFAICS.
[07:00:22] <wikibugs>	 (CR) Ayounsi: Add role::netmon to the netmon1003 instance. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074) (owner: Andrea Denisse)
[07:01:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29426 and previous config saved to /var/cache/conftool/dbconfig/20220606-070126-root.json
[07:01:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:31] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 40%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29427 and previous config saved to /var/cache/conftool/dbconfig/20220606-071630-root.json
[07:16:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:18:01] <wikibugs>	 (PS1) Marostegui: ProductionServices.php: Promote pc1014 to pc1 master [mediawiki-config] - https://gerrit.wikimedia.org/r/803088 (https://phabricator.wikimedia.org/T309612)
[07:24:33] <wikibugs>	 SRE, SRE-swift-storage, ops-eqiad: Power drain and restart of ms-be1059 - https://phabricator.wikimedia.org/T307667 (MatthewVernon) Sorry they're giving you the runaround, that sounds very annoying :( Thanks for the update!
[07:31:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29428 and previous config saved to /var/cache/conftool/dbconfig/20220606-073134-root.json
[07:31:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:01] <wikibugs>	 (CR) Ayounsi: [C: +1] "10.64.48.89 is pc1014" [mediawiki-config] - https://gerrit.wikimedia.org/r/803088 (https://phabricator.wikimedia.org/T309612) (owner: Marostegui)
[07:35:20] <wikibugs>	 (CR) Marostegui: [C: +2] ProductionServices.php: Promote pc1014 to pc1 master [mediawiki-config] - https://gerrit.wikimedia.org/r/803088 (https://phabricator.wikimedia.org/T309612) (owner: Marostegui)
[07:36:10] <wikibugs>	 (Merged) jenkins-bot: ProductionServices.php: Promote pc1014 to pc1 master [mediawiki-config] - https://gerrit.wikimedia.org/r/803088 (https://phabricator.wikimedia.org/T309612) (owner: Marostegui)
[07:37:11] <wikibugs>	 (PS1) Marostegui: pc1011,pc1014: Promote pc1014 to pc1 master [puppet] - https://gerrit.wikimedia.org/r/803229 (https://phabricator.wikimedia.org/T309612)
[07:38:17] <wikibugs>	 (CR) Marostegui: [C: +2] pc1011,pc1014: Promote pc1014 to pc1 master [puppet] - https://gerrit.wikimedia.org/r/803229 (https://phabricator.wikimedia.org/T309612) (owner: Marostegui)
[07:39:58] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[07:40:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:24] <logmsgbot>	 !log marostegui@deploy1002 Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc1 master T309612 (duration: 02m 53s)
[07:41:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:27] <stashbot>	 T309612: Migrate an active DC parsercache host to MariaDB 10.6 - https://phabricator.wikimedia.org/T309612
[07:44:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[07:44:10] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[07:44:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 60%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29429 and previous config saved to /var/cache/conftool/dbconfig/20220606-074638-root.json
[07:46:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:11] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[07:48:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:38] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1048.eqiad.wmnet with OS bullseye
[07:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:42] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1048.eqiad.wmnet with OS bullseye
[08:00:38] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:01:44] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29430 and previous config saved to /var/cache/conftool/dbconfig/20220606-080142-root.json
[08:01:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:38] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1048.eqiad.wmnet with reason: host reimage
[08:08:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:11] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1048.eqiad.wmnet with reason: host reimage
[08:11:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:48] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P29431 and previous config saved to /var/cache/conftool/dbconfig/20220606-081647-root.json
[08:16:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:40] <wikibugs>	 SRE, Thumbor, Thumbor Migration, serviceops: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (fgiunchedi) re: instances, a bit of historical context in case it is useful. The main reason Thumbor was deployed that way is because of concurrency limits (i.e. one instance =...
[08:20:35] <wikibugs>	 SRE, SRE-tools, Icinga, Infrastructure-Foundations, observability: Icinga paged for a host that should have been downtimed - https://phabricator.wikimedia.org/T309447 (fgiunchedi) p:High→Medium >>! In T309447#7976123, @MoritzMuehlenhoff wrote: > Severity is unclear to me from just rea...
[08:25:55] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1048.eqiad.wmnet with OS bullseye
[08:25:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:58] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1048.eqiad.wmnet with OS bullseye completed: - ms-be1048 (**PASS**)   - Downtim...
[08:26:16] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[08:31:54] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298560)', diff saved to https://phabricator.wikimedia.org/P29432 and previous config saved to /var/cache/conftool/dbconfig/20220606-083153-ladsgroup.json
[08:31:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:57] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[08:31:59] <icinga-wm>	 PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:33:00] <wikibugs>	 (PS8) Elukey: Add BGP configuration for the new ML staging codfw cluster [homer/public] - https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198)
[08:39:39] <mbsantos>	 !log maintenance: trigger full planet re-import for maps codfw
[08:39:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:52] <wikibugs>	 (PS1) Filippo Giunchedi: Deprecate 'monitoring_setup' service state [puppet] - https://gerrit.wikimedia.org/r/803231 (https://phabricator.wikimedia.org/T309774)
[08:41:24] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1049.eqiad.wmnet with OS bullseye
[08:41:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:28] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1049.eqiad.wmnet with OS bullseye
[08:46:59] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P29433 and previous config saved to /var/cache/conftool/dbconfig/20220606-084658-ladsgroup.json
[08:47:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:33] <wikibugs>	 SRE, Data-Catalog, Data-Engineering, serviceops, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (BTullis) >>! In T303049#7976511, @JMeybohm wrote: >  > Sorry for nudging @BTullis - do you miss any information or need any assistance regarding the remaining s...
[08:58:14] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1049.eqiad.wmnet with reason: host reimage
[08:58:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:23] <wikibugs>	 (CR) Filippo Giunchedi: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35734/console" [puppet] - https://gerrit.wikimedia.org/r/803231 (https://phabricator.wikimedia.org/T309774) (owner: Filippo Giunchedi)
[09:01:18] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[09:01:25] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1049.eqiad.wmnet with reason: host reimage
[09:01:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:33] <wikibugs>	 (CR) Filippo Giunchedi: [V: +1] "+ Janis and Ben re: datasearchhub (nothing functionally will change, JFYI)" [puppet] - https://gerrit.wikimedia.org/r/803231 (https://phabricator.wikimedia.org/T309774) (owner: Filippo Giunchedi)
[09:02:04] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P29434 and previous config saved to /var/cache/conftool/dbconfig/20220606-090203-ladsgroup.json
[09:02:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:26] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[09:13:29] <wikibugs>	 (CR) Btullis: [C: +1] "Looks good. Thanks for the heads-up." [puppet] - https://gerrit.wikimedia.org/r/803231 (https://phabricator.wikimedia.org/T309774) (owner: Filippo Giunchedi)
[09:14:06] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] opensearch: add support for managing opensearch 2.0 [puppet] - https://gerrit.wikimedia.org/r/802862 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[09:17:09] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298560)', diff saved to https://phabricator.wikimedia.org/P29435 and previous config saved to /var/cache/conftool/dbconfig/20220606-091709-ladsgroup.json
[09:17:11] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[09:17:12] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
[09:17:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:14] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[09:17:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:20] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1049.eqiad.wmnet with OS bullseye
[09:17:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:24] <wikibugs>	 SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1049.eqiad.wmnet with OS bullseye completed: - ms-be1049 (**PASS**)   - Downtim...
[09:18:39] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM modulo the comments already made" [puppet] - https://gerrit.wikimedia.org/r/802593 (https://phabricator.wikimedia.org/T309074) (owner: Andrea Denisse)
[09:20:40] <wikibugs>	 (PS8) MarcoAurelio: Enable $wgFixDoubleRedirects on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/780636 (https://phabricator.wikimedia.org/T305782)
[09:23:04] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] alertmanager.yml.erb: use facts directly instead of lookupvar [puppet] - https://gerrit.wikimedia.org/r/802489 (owner: David Caro)
[09:25:35] <wikibugs>	 (CR) Filippo Giunchedi: "See inline" [puppet] - https://gerrit.wikimedia.org/r/802074 (owner: David Caro)
[09:29:10] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1050.eqiad.wmnet with OS bullseye
[09:29:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:15] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1050.eqiad.wmnet with OS bullseye
[09:34:00] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35735/console" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/802104 (https://phabricator.wikimedia.org/T304716) (owner: 10Majavah)
[09:34:26] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1] "Idea itself LGTM, see inline and what David said" [puppet] - 10https://gerrit.wikimedia.org/r/802104 (https://phabricator.wikimedia.org/T304716) (owner: 10Majavah)
[09:35:13] <wikibugs>	 (03CR) 10MVernon: sre.swift.convert-ssds: add new cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/801693 (https://phabricator.wikimedia.org/T309027) (owner: 10Volans)
[09:36:13] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/802040 (owner: 10David Caro)
[09:39:01] <wikibugs>	 (03PS3) 10Volans: sre.swift.convert-ssds: add new cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/801693 (https://phabricator.wikimedia.org/T309027)
[09:42:10] <wikibugs>	 (03CR) 10Volans: [C: 03+2] sre.swift.convert-ssds: add new cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/801693 (https://phabricator.wikimedia.org/T309027) (owner: 10Volans)
[09:44:39] <urbanecm>	 jouncebot: nowandnext
[09:44:39] <jouncebot>	 No deployments scheduled for the next 3 hour(s) and 15 minute(s)
[09:44:39] <jouncebot>	 In 3 hour(s) and 15 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T1300)
[09:45:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM (untested)" [puppet] - 10https://gerrit.wikimedia.org/r/790325 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede)
[09:45:13] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops: Replace RAID controller battery in an-worker1081 - https://phabricator.wikimedia.org/T308434 (10BTullis) 05Open→03Resolved a:03BTullis I have downtimed the MegaRAID service on analytics1068 until 2022-08-30 - Apologies for the oversight @RhinosF1
[09:45:27] <wikibugs>	 (03Merged) 10jenkins-bot: sre.swift.convert-ssds: add new cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/801693 (https://phabricator.wikimedia.org/T309027) (owner: 10Volans)
[09:45:59] <wikibugs>	 (03PS1) 10Urbanecm: Revoke ipinfo-view-log from sysop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803236 (https://phabricator.wikimedia.org/T309411)
[09:46:11] <wikibugs>	 (03PS2) 10Urbanecm: Revoke ipinfo-view-log from sysop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803236 (https://phabricator.wikimedia.org/T309411)
[09:46:14] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Revoke ipinfo-view-log from sysop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803236 (https://phabricator.wikimedia.org/T309411) (owner: 10Urbanecm)
[09:47:00] <wikibugs>	 (03Merged) 10jenkins-bot: Revoke ipinfo-view-log from sysop [mediawiki-config] - 10https://gerrit.wikimedia.org/r/803236 (https://phabricator.wikimedia.org/T309411) (owner: 10Urbanecm)
[09:47:34] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1050.eqiad.wmnet with reason: host reimage
[09:47:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:24] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] ipmi: Assign SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802757 (https://phabricator.wikimedia.org/T308013) (owner: 10Muehlenhoff)
[09:48:36] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] webperf: Assign SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/802758 (https://phabricator.wikimedia.org/T308013) (owner: 10Muehlenhoff)
[09:49:47] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[09:49:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:29] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1050.eqiad.wmnet with reason: host reimage
[09:50:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:39] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[09:50:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:40] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[09:50:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:06] <wikibugs>	 (03PS1) 10Volans: sre.swift.convert-ssds: fix typo [cookbooks] - 10https://gerrit.wikimedia.org/r/803238 (https://phabricator.wikimedia.org/T309027)
[09:51:36] <wikibugs>	 (03CR) 10Volans: [C: 03+2] "Trivial typo, self-merging" [cookbooks] - 10https://gerrit.wikimedia.org/r/803238 (https://phabricator.wikimedia.org/T309027) (owner: 10Volans)
[09:51:41] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[09:51:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:44] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: b35c217163fc621bf68b982580dd68f317b08a55: Revoke ipinfo-view-log from sysop (T309411) (duration: 03m 04s)
[09:51:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:39] <wikibugs>	 (03Merged) 10jenkins-bot: sre.swift.convert-ssds: fix typo [cookbooks] - 10https://gerrit.wikimedia.org/r/803238 (https://phabricator.wikimedia.org/T309027) (owner: 10Volans)
[09:57:43] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/802810 (owner: 10JMeybohm)
[09:58:57] <wikibugs>	 (03CR) 10Volans: [C: 04-1] black format cookbooks/sre/__init__.py [cookbooks] - 10https://gerrit.wikimedia.org/r/802810 (owner: 10JMeybohm)
[10:00:34] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "The cookbook repository does not currently use black. Applying black to a single file doesn't seem wise to me because it mixes different s" [cookbooks] - 10https://gerrit.wikimedia.org/r/802810 (owner: 10JMeybohm)
[10:04:57] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1050.eqiad.wmnet with OS bullseye
[10:05:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:03] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1050.eqiad.wmnet with OS bullseye completed: - ms-be1050 (**PASS**)   - Downtim...
[10:08:30] <wikibugs>	 (03PS10) 10Btullis: Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246)
[10:13:04] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis)
[10:18:07] <wikibugs>	 (03CR) 10Jbond: [C: 04-1] "LGTM couple of minor nits/issues inline" [cookbooks] - 10https://gerrit.wikimedia.org/r/802811 (owner: 10JMeybohm)
[10:29:50] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1051.eqiad.wmnet with OS bullseye
[10:29:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:54] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1051.eqiad.wmnet with OS bullseye
[10:31:20] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/802849 (https://phabricator.wikimedia.org/T293942) (owner: 10Dzahn)
[10:35:27] <wikibugs>	 (03PS1) 10Btullis: Use latest image version in all remaining eventgate services [deployment-charts] - 10https://gerrit.wikimedia.org/r/803242 (https://phabricator.wikimedia.org/T306181)
[10:41:03] <icinga-wm>	 PROBLEM - SSH on pki2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[10:42:16] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1051.eqiad.wmnet with reason: host reimage
[10:42:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:13] <wikibugs>	 (03PS1) 10Jbond: remote: add an __iter__ to RemoteHosts [software/spicerack] - 10https://gerrit.wikimedia.org/r/803243
[10:44:35] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[10:44:54] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1051.eqiad.wmnet with reason: host reimage
[10:44:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:35] <wikibugs>	 (03CR) 10Volans: "I don't have problems adding it. I'm just wondering if it could be confusing and/or incentivate re-implementing things already available v" [software/spicerack] - 10https://gerrit.wikimedia.org/r/803243 (owner: 10Jbond)
[10:52:07] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] remote: add an __iter__ to RemoteHosts [software/spicerack] - 10https://gerrit.wikimedia.org/r/803243 (owner: 10Jbond)
[10:58:01] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1051.eqiad.wmnet with OS bullseye
[10:58:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:04] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1051.eqiad.wmnet with OS bullseye completed: - ms-be1051 (**PASS**)   - Downtim...
[11:04:02] <wikibugs>	 (03CR) 10Jbond: "thanks for the patch lgtm but cls=important dosn't seem to have an affect" [puppet] - 10https://gerrit.wikimedia.org/r/802897 (owner: 10Ladsgroup)
[11:05:13] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10Traffic, and 2 others: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (10BTullis) I'm a bit confused by the state of things now.  1) Has the update to service-runner 3.1.0 be...
[11:05:28] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/802851 (https://phabricator.wikimedia.org/T293942) (owner: 10Dzahn)
[11:07:49] <urbanecm>	 jouncebot: now
[11:07:49] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 52 minute(s)
[11:08:31] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM thx" [puppet] - 10https://gerrit.wikimedia.org/r/803231 (https://phabricator.wikimedia.org/T309774) (owner: 10Filippo Giunchedi)
[11:11:10] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/CommonSettings.php: b35c217163fc621bf68b982580dd68f317b08a55: Revoke ipinfo-view-log from sysop (T309411) (duration: 03m 18s)
[11:11:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:15] * urbanecm done
[11:11:36] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10Traffic, and 2 others: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (10akosiaris) >>! In T306181#7982366, @BTullis wrote: > I'm a bit confused by the state of things now. >...
[11:13:27] <wikibugs>	 (03PS1) 10Jbond: CONTRIBUTORS: add additional contributors [puppet] - 10https://gerrit.wikimedia.org/r/803247
[11:15:13] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] CONTRIBUTORS: add additional contributors [puppet] - 10https://gerrit.wikimedia.org/r/803247 (owner: 10Jbond)
[11:21:00] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Localisation updates from https://translatewiki.net. [software/mailman-templates] - 10https://gerrit.wikimedia.org/r/803253 (owner: 10L10n-bot)
[11:22:31] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.reimage for host ms-be1052.eqiad.wmnet with OS bullseye
[11:22:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:35] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host ms-be1052.eqiad.wmnet with OS bullseye
[11:25:36] <wikibugs>	 10SRE, 10Data-Engineering, 10Data-Engineering-Kanban, 10Traffic, and 2 others: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (10BTullis) Great, thanks for the summary @akosiaris - So the reduction in replicas alone explains the s...
[11:33:58] <wikibugs>	 10SRE, 10LDAP-Access-Requests: Add Evelien WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T309700 (10CDanis) a:03KFrancis
[11:38:26] <koi>	 Is there anyone interested in T309974?
[11:38:27] <stashbot>	 T309974: https://codesearch.wmcloud.org/ does not load - https://phabricator.wikimedia.org/T309974
[11:44:25] <Reedy>	 koi: WFM
[11:44:37] <marostegui>	 Same
[11:47:16] <koi>	 back to normal now
[11:57:10] <wikibugs>	 (03PS1) 10Ayounsi: Initial support for servers switch interfaces [cookbooks] - 10https://gerrit.wikimedia.org/r/803261
[11:57:12] <wikibugs>	 (03PS1) 10Ayounsi: [WIP] Decom cookbook: configure switches using cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/803262
[11:59:57] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Initial support for servers switch interfaces [cookbooks] - 10https://gerrit.wikimedia.org/r/803261 (owner: 10Ayounsi)
[12:01:16] <wikibugs>	 10SRE, 10ops-eqiad, 10serviceops: mw1415 (canary appserver) is down, incl. mgmt - https://phabricator.wikimedia.org/T307755 (10Cmjohnson) You have successfully submitted request SR1096030919.
[12:05:21] <logmsgbot>	 !log mvernon@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1052.eqiad.wmnet with reason: host reimage
[12:05:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:43] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
[12:06:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:48] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host backup1009.eqiad.wmnet with OS bullseye
[12:06:51] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
[12:06:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:55] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host backup1009.eqiad.wmnet with OS bullseye execut...
[12:07:01] <wikibugs>	 (03PS1) 10Ayounsi: Add python3.10 support to Tox [cookbooks] - 10https://gerrit.wikimedia.org/r/803263
[12:08:30] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1052.eqiad.wmnet with reason: host reimage
[12:08:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:55] <logmsgbot>	 !log cmjohnson@cumin1001 START - Cookbook sre.hosts.reimage for host backup1009.eqiad.wmnet with OS bullseye
[12:10:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:59] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host backup1009.eqiad.wmnet with OS bullseye
[12:11:02] <logmsgbot>	 !log cmjohnson@cumin1001 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1009.eqiad.wmnet with OS bullseye
[12:11:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:10] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup: Q4:(Need By: TBD) rack/setup/install backup1009.eqiad.wmnet - https://phabricator.wikimedia.org/T307048 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host backup1009.eqiad.wmnet with OS bullseye execut...
[12:11:15] <wikibugs>	 (03PS1) 10Jforrester: Partial revert "TextHandler::getTextTracksFromRows(): Remove unused code" [extensions/TimedMediaHandler] (wmf/1.39.0-wmf.14) - 10https://gerrit.wikimedia.org/r/802952 (https://phabricator.wikimedia.org/T309873)
[12:11:36] <icinga-wm>	 PROBLEM - Check systemd state on logstash2026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:20:08] <wikibugs>	 (03PS11) 10Btullis: Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246)
[12:21:44] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: Assign SPDX headers to puppet.git - https://phabricator.wikimedia.org/T308013 (10jbond) >>! In T308013#7980024, @Dzahn wrote: >> bundle exec rake 'spdx:convert:module[MODULENAME]' >  > Is there any way to install the ruby gem "puppet" from a Debian pac...
[12:24:30] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis)
[12:24:58] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "This fails the unit tests with AttributeError: 'PosixPath' object has no attribute 'startswith'" [cookbooks] - 10https://gerrit.wikimedia.org/r/803263 (owner: 10Ayounsi)
[12:26:16] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[12:27:21] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 03+1 C: 03+2] Deprecate 'monitoring_setup' service state [puppet] - 10https://gerrit.wikimedia.org/r/803231 (https://phabricator.wikimedia.org/T309774) (owner: 10Filippo Giunchedi)
[12:27:55] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
[12:27:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:00] <wikibugs>	 (03PS2) 10Ayounsi: Add python3.10 support to Tox [cookbooks] - 10https://gerrit.wikimedia.org/r/803263
[12:30:01] <wikibugs>	 (03PS2) 10Ayounsi: Initial support for servers switch interfaces [cookbooks] - 10https://gerrit.wikimedia.org/r/803261
[12:32:57] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Initial support for servers switch interfaces [cookbooks] - 10https://gerrit.wikimedia.org/r/803261 (owner: 10Ayounsi)
[12:34:04] <icinga-wm>	 RECOVERY - SSH on druid1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:40:48] <icinga-wm>	 PROBLEM - Host es2031 is DOWN: PING CRITICAL - Packet loss = 100%
[12:41:58] <icinga-wm>	 RECOVERY - Host es2031 is UP: PING OK - Packet loss = 0%, RTA = 33.20 ms
[12:42:28] <icinga-wm>	 PROBLEM - MariaDB read only es2 on es2031 is CRITICAL: Could not connect to localhost:3306 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Master_comes_back_in_read_only
[12:42:35] <volans>	 marostegui: es2031 got rebooted
[12:42:42] <volans>	 probably crashed, I'm having a look
[12:43:28] <icinga-wm>	 PROBLEM - mysqld processes on es2031 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[12:43:43] <volans>	 cc godog, jayme ^^^
[12:45:23] <godog>	 volans: thank you for the heads up
[12:45:43] <godog>	 volans: what assistance would you like ?
[12:46:28] <volans>	 godog: me personally nothing, I'm not sure if there is any DBA around today though to have a look
[12:46:31] <volans>	 it seems to be a hardware failure
[12:47:01] <godog>	 ack
[12:51:54] <wikibugs>	 10ops-codfw, 10DBA: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Volans) p:05Triage→03High
[12:52:02] <volans>	 godog: I've created this task ^^^
[12:53:00] <logmsgbot>	 !log mvernon@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1052.eqiad.wmnet with OS bullseye
[12:53:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:53:04] <wikibugs>	 10SRE-swift-storage: Upgrade Swift ms cluster to Bullseye and revisit mkfs.xfs options - https://phabricator.wikimedia.org/T279637 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host ms-be1052.eqiad.wmnet with OS bullseye completed: - ms-be1052 (**PASS**)   - Downtim...
[12:54:06] <godog>	 ack, not sure if we need any depool action at this time? I see cp2031 in dbconfig-instance in puppet
[12:54:48] <papaul>	 volans: godog: es2031 https://netbox.wikimedia.org/extras/reports/network.Network/
[12:55:13] <volans>	 papaul: ?
[12:55:16] <volans>	 wrong link?
[12:55:45] <papaul>	 not that one. I have: a bus fatal error was detected on a component at slot 4 on es2031
[12:55:52] <papaul>	 yes that was the wrong link sorry
[12:56:04] <volans>	 godog: yes I can depool it from dbctl
[12:56:18] <jinxer-wm>	 (ProbeDown) firing: Service wdqs-ssl:443 has failed probes (http_wdqs-ssl_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:57:31] <godog>	 volans: SGTM
[12:58:44] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
[12:58:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:24] <logmsgbot>	 !log volans@cumin1001 dbctl commit (dc=all): 'es2031 crashed T309977', diff saved to https://phabricator.wikimedia.org/P29436 and previous config saved to /var/cache/conftool/dbconfig/20220606-125923-volans.json
[12:59:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:27] <stashbot>	 T309977: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977
[13:00:04] <jouncebot>	 RoanKattouw, Lucas_WMDE, Urbanecm, and awight: It is that lovely time of the day again! You are hereby commanded to deploy UTC afternoon backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T1300).
[13:00:04] <jouncebot>	 hauskatze: A patch you scheduled for UTC afternoon backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:13] <hauskatze>	 o/
[13:01:19] <jinxer-wm>	 (ProbeDown) resolved: Service wdqs-ssl:443 has failed probes (http_wdqs-ssl_ip4) - https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=http - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[13:02:00] <wikibugs>	 (03PS12) 10Jbond: Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis)
[13:03:08] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis)
[13:04:09] <marostegui>	 volans: thanks I will check. was having lunch
[13:04:41] <volans>	 marostegui: thanks no prob, doesn't seem crazy urgent
[13:04:50] <volans>	 I've put hw logs in the task
[13:04:52] <marostegui>	 yeah it is not uses 
[13:04:54] <marostegui>	 used
[13:04:55] <volans>	 doesn't seem it was the first time
[13:05:04] <marostegui>	 thanks - I'll follow up
[13:05:17] <koi>	 https://sal.toolforge.org/ is down now
[13:06:34] <koi>	 and back to normal
[13:07:00] <koi>	 ...no, still 500 here
[13:08:58] <volans>	 koi: probably better to ask in #wikimedia-cloud-admin or #wikimedia-cloud
[13:09:30] <zabe>	 happens regularly and yes, you should ask in -cloud for someone to restart the webservice
[13:10:29] <koi>	 thanks for reply, asked
[13:11:48] <wikibugs>	 (03PS3) 10Ayounsi: Initial support for servers switch interfaces [cookbooks] - 10https://gerrit.wikimedia.org/r/803261
[13:14:44] <wikibugs>	 (03PS1) 10Andrew Bogott: Openstack nova vendordata: more fixes to metadata timeouts [puppet] - 10https://gerrit.wikimedia.org/r/803269 (https://phabricator.wikimedia.org/T309930)
[13:14:48] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Initial support for servers switch interfaces [cookbooks] - 10https://gerrit.wikimedia.org/r/803261 (owner: 10Ayounsi)
[13:17:55] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1001 is CRITICAL: instance=10.64.16.202 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[13:18:02] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Openstack nova vendordata: more fixes to metadata timeouts [puppet] - 10https://gerrit.wikimedia.org/r/803269 (https://phabricator.wikimedia.org/T309930) (owner: 10Andrew Bogott)
[13:24:27] <wikibugs>	 (03CR) 10Ayounsi: "Almost good to merge" [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: 10Elukey)
[13:25:13] <wikibugs>	 (03PS13) 10Btullis: Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246)
[13:25:30] <wikibugs>	 (03PS9) 10Elukey: Add BGP configuration for the new ML staging codfw cluster [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198)
[13:26:10] <wikibugs>	 (03CR) 10Elukey: Add BGP configuration for the new ML staging codfw cluster (032 comments) [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: 10Elukey)
[13:27:53] <hauskatze>	 No deployers around for this window? :)
[13:28:05] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis)
[13:28:40] <wikibugs>	 10ops-codfw, 10DBA: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Marostegui) a:03Papaul @Papaul can we contact Dell about this and get some advise? Checking the disk controller logs I haven't found anything relevant.
[13:29:13] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] "LGTM!" [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: 10Elukey)
[13:29:55] <wikibugs>	 (03PS1) 10Marostegui: es2031: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/803271 (https://phabricator.wikimedia.org/T309977)
[13:30:48] <wikibugs>	 (03PS1) 10Ssingh: trafficserver: 9.x upgrade: switch ip_allow.config to YAML format [puppet] - 10https://gerrit.wikimedia.org/r/803272 (https://phabricator.wikimedia.org/T309651)
[13:30:51] <wikibugs>	 10SRE, 10ops-codfw, 10DBA, 10Patch-For-Review: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Marostegui) As these hosts do not have replication, I am leaving MySQL stopped for now in case Papaul needs some reboots/firmware upgrade.  @Papaul if you need to power off or reboot this host...
[13:31:00] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] es2031: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/803271 (https://phabricator.wikimedia.org/T309977) (owner: 10Marostegui)
[13:34:15] <wikibugs>	 (03PS14) 10Btullis: Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246)
[13:35:30] <wikibugs>	 (03PS1) 10Ayounsi: Disable alert notifications on new netbox frontends [puppet] - 10https://gerrit.wikimedia.org/r/803274 (https://phabricator.wikimedia.org/T296452)
[13:36:17] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[13:36:27] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[13:36:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:37] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[13:36:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:07] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis)
[13:37:20] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[13:37:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:29] <wikibugs>	 (03CR) 10Ssingh: "https://puppet-compiler.wmflabs.org/pcc-worker1001/35737/cp2038.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/803272 (https://phabricator.wikimedia.org/T309651) (owner: 10Ssingh)
[13:37:50] <wikibugs>	 (03PS1) 10Zabe: netbase: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803275 (https://phabricator.wikimedia.org/T308013)
[13:37:52] <wikibugs>	 (03PS1) 10Zabe: ncredir: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803276 (https://phabricator.wikimedia.org/T308013)
[13:37:54] <wikibugs>	 (03PS1) 10Zabe: mtail: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803277 (https://phabricator.wikimedia.org/T308013)
[13:37:56] <wikibugs>	 (03PS1) 10Zabe: mjolnir: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803278 (https://phabricator.wikimedia.org/T308013)
[13:37:58] <wikibugs>	 (03PS1) 10Zabe: mcrouter: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803279 (https://phabricator.wikimedia.org/T308013)
[13:38:00] <wikibugs>	 (03PS1) 10Zabe: lxc: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803280 (https://phabricator.wikimedia.org/T308013)
[13:38:02] <wikibugs>	 (03PS1) 10Zabe: logster: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803281 (https://phabricator.wikimedia.org/T308013)
[13:38:04] <wikibugs>	 (03PS1) 10Zabe: logrotate: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803282 (https://phabricator.wikimedia.org/T308013)
[13:39:17] <logmsgbot>	 !log btullis@cumin1001 START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[13:39:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:16] <urbanecm>	 jouncebot: nowandnext
[13:40:16] <jouncebot>	 For the next 0 hour(s) and 19 minute(s): UTC afternoon backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T1300)
[13:40:16] <jouncebot>	 In 1 hour(s) and 49 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T1530)
[13:40:22] <wikibugs>	 (03PS9) 10Urbanecm: Enable $wgFixDoubleRedirects on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/780636 (https://phabricator.wikimedia.org/T305782) (owner: 10MarcoAurelio)
[13:40:28] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "let's try it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/780636 (https://phabricator.wikimedia.org/T305782) (owner: 10MarcoAurelio)
[13:40:31] <wikibugs>	 (03PS15) 10Btullis: Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246)
[13:41:01] <urbanecm>	 hauskatze: deploying your patch. I don't think I need your presence for that, since it's a private wiki, which makes it hard for you to test :)
[13:41:07] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] mcrouter: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803279 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[13:41:10] <hauskatze>	 urbanecm: kind of :)
[13:41:18] <urbanecm>	 ?
[13:41:30] <hauskatze>	 kind of hard to test on a wiki where I don't have an account
[13:41:35] <hauskatze>	 I mean :)
[13:42:20] <urbanecm>	 yeah, exactly :)
[13:43:24] <wikibugs>	 (03PS2) 10Zabe: mcrouter: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803279 (https://phabricator.wikimedia.org/T308013)
[13:44:19] <wikibugs>	 (03CR) 10Ladsgroup: os_reports: Make the reports look better (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/802897 (owner: 10Ladsgroup)
[13:44:30] <wikibugs>	 (03Merged) 10jenkins-bot: Enable $wgFixDoubleRedirects on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/780636 (https://phabricator.wikimedia.org/T305782) (owner: 10MarcoAurelio)
[13:45:03] <wikibugs>	 (03PS1) 10MVernon: Thanos: add search_platform user [puppet] - 10https://gerrit.wikimedia.org/r/803284 (https://phabricator.wikimedia.org/T309715)
[13:45:52] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] mcrouter: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803279 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[13:46:32] <wikibugs>	 (03PS1) 10Ssingh: trafficserver: 9.x upgrade: separate metric current_client_connections [puppet] - 10https://gerrit.wikimedia.org/r/803285 (https://phabricator.wikimedia.org/T309651)
[13:47:26] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[13:47:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:34] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35738/console" [puppet] - 10https://gerrit.wikimedia.org/r/803285 (https://phabricator.wikimedia.org/T309651) (owner: 10Ssingh)
[13:48:28] <wikibugs>	 (03PS1) 10Ssingh: trafficserver: 9.x upgrade: rename max_connections_active_in [puppet] - 10https://gerrit.wikimedia.org/r/803286 (https://phabricator.wikimedia.org/T309651)
[13:49:25] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[13:49:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:29] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35739/console" [puppet] - 10https://gerrit.wikimedia.org/r/803286 (https://phabricator.wikimedia.org/T309651) (owner: 10Ssingh)
[13:49:42] <logmsgbot>	 !log btullis@cumin1001 END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
[13:49:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:16] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/CommonSettings.php: b7ca9fb268d59a3c2262733df247fb514b97f8b7: Enable $wgFixDoubleRedirects on officewiki (T305782) (duration: 03m 10s)
[13:50:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:20] <stashbot>	 T305782: Enable $wgFixDoubleRedirects on officewiki - https://phabricator.wikimedia.org/T305782
[13:50:28] <wikibugs>	 (03PS1) 10MVernon: profile::thanos::swift: fake creds for search_platform [labs/private] - 10https://gerrit.wikimedia.org/r/803287 (https://phabricator.wikimedia.org/T309715)
[13:50:37] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] Thanos: add search_platform user [puppet] - 10https://gerrit.wikimedia.org/r/803284 (https://phabricator.wikimedia.org/T309715) (owner: 10MVernon)
[13:50:44] <urbanecm>	 hauskatze: and let's see what happens :)
[13:51:14] <wikibugs>	 (03CR) 10Andrea Denisse: [C: 03+1] Rewrite logster::job to use systemd timers. [puppet] - 10https://gerrit.wikimedia.org/r/790325 (https://phabricator.wikimedia.org/T273673) (owner: 10Slyngshede)
[13:52:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[13:52:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:07] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[13:52:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:57] <hauskatze>	 urbanecm: thanks :)
[13:54:13] <urbanecm>	 np
[13:54:26] <hauskatze>	 hmm, you synced CommonSettings, not IS (InitialiseSettings)?
[13:54:33] <urbanecm>	 ...
[13:54:37] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[13:54:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:45] <urbanecm>	 syncing again
[13:55:08] <urbanecm>	 thanks for noticing that
[13:55:12] <wikibugs>	 (03CR) 10MVernon: [C: 03+2] Thanos: add search_platform user [puppet] - 10https://gerrit.wikimedia.org/r/803284 (https://phabricator.wikimedia.org/T309715) (owner: 10MVernon)
[13:55:33] <wikibugs>	 (03CR) 10MVernon: [V: 03+2 C: 03+2] profile::thanos::swift: fake creds for search_platform [labs/private] - 10https://gerrit.wikimedia.org/r/803287 (https://phabricator.wikimedia.org/T309715) (owner: 10MVernon)
[13:56:28] <wikibugs>	 (03PS1) 10Ssingh: trafficserver: 9.x upgrade: remove deprecated parent_proxy_routing_enable [puppet] - 10https://gerrit.wikimedia.org/r/803288 (https://phabricator.wikimedia.org/T309651)
[13:56:39] <wikibugs>	 (03PS16) 10Btullis: Add initial config for pooled status [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246)
[13:56:58] <wikibugs>	 (03CR) 10Btullis: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/776225 (https://phabricator.wikimedia.org/T300246) (owner: 10Btullis)
[13:57:26] <hauskatze>	 np :)
[13:58:20] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: b7ca9fb268d59a3c2262733df247fb514b97f8b7: Enable $wgFixDoubleRedirects on officewiki (T305782) (duration: 03m 27s)
[13:58:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:24] <stashbot>	 T305782: Enable $wgFixDoubleRedirects on officewiki - https://phabricator.wikimedia.org/T305782
[13:58:30] <hauskatze>	 I was unsure if it is 'wg' or 'wmg'; MediaWiki docs say wg
[14:00:02] <urbanecm>	 it's wg. wmg is the prefix for WM-specific variables.
[14:00:43] <wikibugs>	 10SRE-swift-storage, 10Discovery-Search (Current work), 10Patch-For-Review: Create swift thanos account for Search platform team - https://phabricator.wikimedia.org/T309715 (10MatthewVernon) 05Open→03Resolved a:03MatthewVernon This should all be done now, and I've restarted all the thanos swift frontends.
[14:02:43] <hauskatze>	 I'll be leaving shortly, unless there are errors or a revert is needed
[14:03:46] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/803274 (https://phabricator.wikimedia.org/T296452) (owner: 10Ayounsi)
[14:09:37] <wikibugs>	 (03CR) 10Ayounsi: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/pcc-worker1001/35740/" [puppet] - 10https://gerrit.wikimedia.org/r/803274 (https://phabricator.wikimedia.org/T296452) (owner: 10Ayounsi)
[14:10:14] <wikibugs>	 (03CR) 10Herron: [C: 03+1] opensearch: add support for managing opensearch 2.0 [puppet] - 10https://gerrit.wikimedia.org/r/802862 (https://phabricator.wikimedia.org/T304440) (owner: 10Cwhite)
[14:11:35] <wikibugs>	 (03CR) 10MVernon: "Hi," [puppet] - 10https://gerrit.wikimedia.org/r/802604 (https://phabricator.wikimedia.org/T307801) (owner: 10Eevans)
[14:16:18] <wikibugs>	 (03PS2) 10Filippo Giunchedi: hieradata: TCP probe for ldap-ro [puppet] - 10https://gerrit.wikimedia.org/r/802071 (https://phabricator.wikimedia.org/T305847)
[14:16:20] <wikibugs>	 (03PS19) 10Filippo Giunchedi: prometheus::blackbox::check: add new blackbox exporter check [puppet] - 10https://gerrit.wikimedia.org/r/787067 (owner: 10Jbond)
[14:18:17] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[14:19:20] <wikibugs>	 (03CR) 10MVernon: "Hi," [labs/private] - 10https://gerrit.wikimedia.org/r/802631 (https://phabricator.wikimedia.org/T307801) (owner: 10Eevans)
[14:20:19] <wikibugs>	 (03CR) 10Jbond: prometheus::blackbox::check: add new blackbox exporter check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/787067 (owner: 10Jbond)
[14:21:21] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] netbase: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803275 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:21:52] <wikibugs>	 10SRE, 10ops-codfw, 10DBA, 10Patch-For-Review: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Papaul) @Marostegui thanks
[14:22:04] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] ncredir: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803276 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:22:44] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] mtail: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803277 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:23:06] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] mjolnir: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803278 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:24:26] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] "thanks again, will merge up to here; the test in mcrouter needs investigating" [puppet] - 10https://gerrit.wikimedia.org/r/803278 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:25:48] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] lxc: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803280 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:25:55] <wikibugs>	 (03PS2) 10Jbond: lxc: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803280 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:26:03] <wikibugs>	 (03PS2) 10Jbond: logster: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803281 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:26:34] <wikibugs>	 (03PS2) 10Jbond: logrotate: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803282 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:26:58] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] logster: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803281 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:27:04] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] logrotate: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803282 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:30:03] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Thank you for the followup, I've tested the change in Pontoon and reworked/adjusted a few bits and overall LGTM! See inline too" [puppet] - 10https://gerrit.wikimedia.org/r/787067 (owner: 10Jbond)
[14:30:11] <wikibugs>	 (03CR) 10Jbond: [V: 03+2 C: 03+2] logrotate: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803282 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:31:02] <wikibugs>	 (03PS1) 10Elukey: role::prometheus: enable settings for k8s ml-staging [puppet] - 10https://gerrit.wikimedia.org/r/803295 (https://phabricator.wikimedia.org/T302195)
[14:32:03] <wikibugs>	 (03CR) 10AOkoth: [C: 03+1] vrts: rename exim4 templates from otrs to vrts [puppet] - 10https://gerrit.wikimedia.org/r/802851 (https://phabricator.wikimedia.org/T293942) (owner: 10Dzahn)
[14:33:14] <wikibugs>	 (03PS2) 10Krinkle: hieradata: switchover doc to doc1002 [puppet] - 10https://gerrit.wikimedia.org/r/744763 (https://phabricator.wikimedia.org/T247653) (owner: 10Majavah)
[14:33:57] <wikibugs>	 (03CR) 10Jbond: "LGTM, thanks, will deploy" [puppet] - 10https://gerrit.wikimedia.org/r/802897 (owner: 10Ladsgroup)
[14:33:59] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] os_reports: Make the reports look better [puppet] - 10https://gerrit.wikimedia.org/r/802897 (owner: 10Ladsgroup)
[14:35:04] <wikibugs>	 (03PS1) 10Ssingh: trafficserver: 9.x upgrade: replace client.verify.server [puppet] - 10https://gerrit.wikimedia.org/r/803296 (https://phabricator.wikimedia.org/T309651)
[14:36:19] <wikibugs>	 (03PS3) 10Jbond: mcrouter: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803279 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:36:53] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35741/console" [puppet] - 10https://gerrit.wikimedia.org/r/803296 (https://phabricator.wikimedia.org/T309651) (owner: 10Ssingh)
[14:37:32] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] mcrouter: Add SPDX headers [puppet] - 10https://gerrit.wikimedia.org/r/803279 (https://phabricator.wikimedia.org/T308013) (owner: 10Zabe)
[14:38:45] <wikibugs>	 (03PS2) 10Elukey: role::prometheus: enable settings for k8s ml-staging [puppet] - 10https://gerrit.wikimedia.org/r/803295 (https://phabricator.wikimedia.org/T302195)
[14:41:05] <wikibugs>	 (03PS1) 10Ssingh: trafficserver: 9.x upgrade: remove redundant metrics [puppet] - 10https://gerrit.wikimedia.org/r/803297 (https://phabricator.wikimedia.org/T309651)
[14:41:52] <wikibugs>	 10SRE, 10Continuous-Integration-Infrastructure, 10serviceops, 10Patch-For-Review, 10Release-Engineering-Team (Seen): replace doc1001.eqiad.wmnet with a buster VM and create the codfw equivalent - https://phabricator.wikimedia.org/T247653 (10Krinkle) I propose the following rollout: 1. [change 744763 (pup...
[14:42:11] <wikibugs>	 (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35742/console" [puppet] - 10https://gerrit.wikimedia.org/r/803297 (https://phabricator.wikimedia.org/T309651) (owner: 10Ssingh)
[14:45:11] <icinga-wm>	 RECOVERY - SSH on pki2001.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[14:45:56] <wikibugs>	 (03PS1) 10Ladsgroup: os-reports: Push the ul elements inside [puppet] - 10https://gerrit.wikimedia.org/r/803299
[14:47:04] <wikibugs>	 (03PS1) 10Bking: Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720)
[14:47:38] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[14:47:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:27] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: 10Bking)
[14:49:00] <wikibugs>	 (03PS1) 10Ssingh: trafficserver: 9.x upgrade: update logging field for HTTP version [puppet] - 10https://gerrit.wikimedia.org/r/803301 (https://phabricator.wikimedia.org/T309651)
[14:50:04] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Papaul) `  2022-06-06 12:36:35  PCI1360  A bus fatal error was detected on a component at slot 4.   Log Sequence Number: 323 Detailed Description: System performance may be degraded, or system may fail to operate....
[14:50:16] <wikibugs>	 (03PS2) 10Bking: Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720)
[14:50:34] <wikibugs>	 (03CR) 10Ssingh: "Not very happy with this one but let's discuss that during the reviews." [puppet] - 10https://gerrit.wikimedia.org/r/803301 (https://phabricator.wikimedia.org/T309651) (owner: 10Ssingh)
[14:51:33] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: 10Bking)
[14:52:04] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Papaul) https://www.dell.com/support/manuals/en-us/integrated-dell-remote-access-cntrllr-8-with-lifecycle-controller-v2.00.00.00/eemi_13g-v1/pci-event-messages?guid=guid-b22e470e-adc2-4ef4-ac82-98df81dc1dff&lang=en...
[14:52:16] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add BGP configuration for the new ML staging codfw cluster [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: 10Elukey)
[14:52:39] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] Add BGP configuration for the new ML staging codfw cluster (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: 10Elukey)
[14:52:49] <wikibugs>	 (03PS3) 10Bking: Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720)
[14:52:54] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] os-reports: Push the ul elements inside [puppet] - 10https://gerrit.wikimedia.org/r/803299 (owner: 10Ladsgroup)
[14:53:27] <wikibugs>	 (03Merged) 10jenkins-bot: Add BGP configuration for the new ML staging codfw cluster [homer/public] - 10https://gerrit.wikimedia.org/r/802072 (https://phabricator.wikimedia.org/T302198) (owner: 10Elukey)
[14:53:48] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: 10Bking)
[14:54:57] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[14:55:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:35] <logmsgbot>	 !log kevinbazira@deploy1002 helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
[14:55:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:10] <wikibugs>	 (03PS4) 10Bking: Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720)
[14:56:23] <elukey>	 !log add BGP config for the k8s ml-staging cluster on cr{1,2}-codfw via homer - T302198
[14:56:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:29] <stashbot>	 T302198: Create ml-serve-staging k8s's control plane VMs - https://phabricator.wikimedia.org/T302198
[14:56:58] <wikibugs>	 (03CR) 10Jbond: prometheus::blackbox::check: add new blackbox exporter check (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/787067 (owner: 10Jbond)
[14:57:07] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: 10Bking)
[14:57:18] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[14:57:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:33] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[14:59:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:53] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[14:59:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:20] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
[15:00:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:32] <logmsgbot>	 !log elukey@deploy1002 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
[15:00:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:00:54] <wikibugs>	 (03PS5) 10Bking: Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720)
[15:01:47] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: 10Bking)
[15:02:21] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] role::prometheus: enable settings for k8s ml-staging [puppet] - 10https://gerrit.wikimedia.org/r/803295 (https://phabricator.wikimedia.org/T302195) (owner: 10Elukey)
[15:04:09] <wikibugs>	 (03PS1) 10PipelineBot: mathoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/803305
[15:11:11] <icinga-wm>	 PROBLEM - Host es2031.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:16:53] <wikibugs>	 (03PS6) 10Bking: Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720)
[15:17:07] <icinga-wm>	 RECOVERY - Host es2031.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.69 ms
[15:17:14] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] mwdebug service: Add traindev environment support [deployment-charts] - 10https://gerrit.wikimedia.org/r/798883 (https://phabricator.wikimedia.org/T299648) (owner: 10Ahmon Dancy)
[15:17:50] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] Elastic: Add elastic bindir to root's path [puppet] - 10https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: 10Bking)
[15:18:05] <wikibugs>	 (03CR) 10Ahmon Dancy: "Thanks Alex!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/798883 (https://phabricator.wikimedia.org/T299648) (owner: 10Ahmon Dancy)
[15:18:26] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Papaul) a:05Papaul→03Marostegui Firmware upgrade done for : - BIOS - IDRAC - Backplan1  Power drain on the server   @Marostegui we can repool the server for now after all the firmware upgrade according to Dell...
[15:18:52] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] mediawiki 0.2.2: Run test job as uid 1000 [deployment-charts] - 10https://gerrit.wikimedia.org/r/802799 (owner: 10Ahmon Dancy)
[15:20:52] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Marostegui) Sounds good Papaul, I will start MySQL again then
[15:21:00] <wikibugs>	 (03Merged) 10jenkins-bot: mwdebug service: Add traindev environment support [deployment-charts] - 10https://gerrit.wikimedia.org/r/798883 (https://phabricator.wikimedia.org/T299648) (owner: 10Ahmon Dancy)
[15:22:15] <wikibugs>	 (03Merged) 10jenkins-bot: mediawiki 0.2.2: Run test job as uid 1000 [deployment-charts] - 10https://gerrit.wikimedia.org/r/802799 (owner: 10Ahmon Dancy)
[15:23:07] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[15:24:53] <volans>	 XioNoX: didn't you merge the disable notification earlier for netbox1002?
[15:25:15] <XioNoX>	 volans: yep
[15:25:29] <wikibugs>	 10SRE, 10ops-codfw, 10DBA: es2031 crashed (es2) - https://phabricator.wikimedia.org/T309977 (10Marostegui) Upgraded and started mysql
[15:25:55] <volans>	 so why did it alert? :D
[15:26:13] <godog>	 jbond: ack re: stripping '---' from the output, yes that's what I meant
[15:26:31] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] tag-release.sh: add some logging, more rigorous tag push [phabricator/deployment] (wmf/stable) - 10https://gerrit.wikimedia.org/r/802869 (owner: 10Brennen Bearnes)
[15:29:55] <wikibugs>	 (03CR) 10Ahmon Dancy: [C: 03+1] GitLab: enable container registry [puppet] - 10https://gerrit.wikimedia.org/r/790778 (https://phabricator.wikimedia.org/T307537) (owner: 10Brennen Bearnes)
[15:30:05] <jouncebot>	 jan_drewniak: #bothumor I � Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T1530).
[15:31:42] <wikibugs>	 (03PS2) 10Daimona Eaytoy: Remove references to $wgEnableLocalTimedText [mediawiki-config] - 10https://gerrit.wikimedia.org/r/802894
[15:35:08] <wikibugs>	 (03CR) 10Dom Walden: "I am afraid I don't know enough about this to comment. But, happy to +2." [deployment-charts] - 10https://gerrit.wikimedia.org/r/803305 (owner: 10PipelineBot)
[15:36:34] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[15:36:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:44] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host clouddumps1001.wikimedia.org w...
[15:38:29] <icinga-wm>	 RECOVERY - SSH on cp5012.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:39:25] <icinga-wm>	 PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/translate/{from}/{to}/{provider} (Machine translate an HTML fragment using TestClient, adapt the links to target language wiki.) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[15:39:26] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/802752 (https://phabricator.wikimedia.org/T305460) (owner: 10Muehlenhoff)
[15:40:24] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/802750 (https://phabricator.wikimedia.org/T305460) (owner: 10Muehlenhoff)
[15:40:53] <wikibugs>	 (03CR) 10Dave Pifke: [C: 03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/802749 (owner: 10Muehlenhoff)
[15:41:29] <icinga-wm>	 RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[15:42:07] <wikibugs>	 10SRE, 10Performance-Team, 10Patch-For-Review: Upgrade webperf hosts to Bullseye - https://phabricator.wikimedia.org/T305460 (10dpifke)
[15:42:39] <wikibugs>	 10SRE, 10Performance-Team, 10Patch-For-Review: Upgrade webperf hosts to Bullseye - https://phabricator.wikimedia.org/T305460 (10dpifke)
[15:43:18] <wikibugs>	 10SRE, 10Performance-Team, 10Patch-For-Review: Upgrade webperf hosts to Bullseye - https://phabricator.wikimedia.org/T305460 (10dpifke)
[15:43:32] <wikibugs>	 10SRE, 10DC-Ops, 10Discovery-Search (Current work): Upgrade cloudelastic clusters to Debian Bullseye - https://phabricator.wikimedia.org/T309343 (10MPhamWMF)
[15:44:40] <wikibugs>	 (03PS1) 10Jbond: wmflib: add to_yaml function which allows striping yaml header [puppet] - 10https://gerrit.wikimedia.org/r/803311
[15:45:47] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmflib: add to_yaml function which allows striping yaml header [puppet] - 10https://gerrit.wikimedia.org/r/803311 (owner: 10Jbond)
[15:45:48] <wikibugs>	 10SRE-tools, 10Discovery, 10Infrastructure-Foundations, 10Discovery-Search (Current work), 10IPv6: Some elastic hosts do not have IPv6 DNS records - https://phabricator.wikimedia.org/T271143 (10bking) This is complete...closing!
[15:47:03] <wikibugs>	 (03PS2) 10Jbond: wmflib: add to_yaml function which allows striping yaml header [puppet] - 10https://gerrit.wikimedia.org/r/803311
[15:48:06] <wikibugs>	 10SRE, 10Traffic, 10Patch-For-Review: Package and deploy ATS 9.1.2 - https://phabricator.wikimedia.org/T309651 (10ssingh)
[15:50:40] <wikibugs>	 (03CR) 10CI reject: [V: 04-1] wmflib: add to_yaml function which allows striping yaml header [puppet] - 10https://gerrit.wikimedia.org/r/803311 (owner: 10Jbond)
[15:53:20] <wikibugs>	 (03PS3) 10Jbond: wmflib: add to_yaml function which allows striping yaml header [puppet] - 10https://gerrit.wikimedia.org/r/803311
[15:56:23] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[15:56:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:26] <wikibugs>	 (CR) Jbond: [C: +2] wmflib: add to_yaml function which allows striping yaml header [puppet] - https://gerrit.wikimedia.org/r/803311 (owner: Jbond)
[15:59:27] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[15:59:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:32] <wikibugs>	 (PS7) Bking: Elastic: Add elastic bindir to root's path [puppet] - https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720)
[16:03:18] <wikibugs>	 SRE, Data-Engineering: Also intake Network Error Logging events into the Analytics Data Lake - https://phabricator.wikimedia.org/T304373 (JArguello-WMF)
[16:03:44] <wikibugs>	 (PS20) Jbond: prometheus::blackbox::check: add new blackbox exporter check [puppet] - https://gerrit.wikimedia.org/r/787067
[16:03:46] <wikibugs>	 SRE, Data-Engineering, SRE Observability: dropped packets to kafkamon 9000/tcp - https://phabricator.wikimedia.org/T238794 (JArguello-WMF)
[16:05:11] <wikibugs>	 (PS21) Jbond: prometheus::blackbox::check: add new blackbox exporter check [puppet] - https://gerrit.wikimedia.org/r/787067
[16:06:16] <wikibugs>	 (PS4) Ayounsi: Initial support for servers switch interfaces [cookbooks] - https://gerrit.wikimedia.org/r/803261
[16:08:13] <wikibugs>	 SRE, SRE-tools, Icinga, Infrastructure-Foundations, observability: Icinga paged for a host that should have been downtimed - https://phabricator.wikimedia.org/T309447 (Volans) Instead of adding a quick check in the downtime cookbook only I preferred to add the feature to spicerack directly so...
[16:08:46] <wikibugs>	 (PS1) Volans: pylint: remove unnecessary comments [software/spicerack] - https://gerrit.wikimedia.org/r/803316
[16:08:48] <wikibugs>	 (PS1) Volans: icinga: ensure that the downtime was applied [software/spicerack] - https://gerrit.wikimedia.org/r/803317 (https://phabricator.wikimedia.org/T309447)
[16:10:55] <wikibugs>	 (CR) CI reject: [V: -1] Initial support for servers switch interfaces [cookbooks] - https://gerrit.wikimedia.org/r/803261 (owner: Ayounsi)
[16:10:58] <wikibugs>	 (CR) Ebernhardson: [C: +1] Elastic: Add elastic bindir to root's path [puppet] - https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: Bking)
[16:12:00] <wikibugs>	 (CR) Bking: [C: +2] Elastic: Add elastic bindir to root's path [puppet] - https://gerrit.wikimedia.org/r/803300 (https://phabricator.wikimedia.org/T309720) (owner: Bking)
[16:12:45] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
[16:12:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:12:55] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1001 for host clouddumps1001.wikimedia.org with...
[16:15:38] <TheresNoTime>	 legoktm: here's the nudge you asked for ref merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/800855 if no one else had :) T309449
[16:15:39] <stashbot>	 T309449: Package 'cgroup-bin' has no installation candidate on Debian 11 (modules/mediawiki/manifests/cgroup.pp) - https://phabricator.wikimedia.org/T309449
[16:16:59] <wikibugs>	 (CR) CI reject: [V: -1] pylint: remove unnecessary comments [software/spicerack] - https://gerrit.wikimedia.org/r/803316 (owner: Volans)
[16:20:09] <wikibugs>	 (PS2) Volans: pylint: remove unnecessary comments [software/spicerack] - https://gerrit.wikimedia.org/r/803316
[16:20:11] <wikibugs>	 (PS2) Volans: icinga: ensure that the downtime was applied [software/spicerack] - https://gerrit.wikimedia.org/r/803317 (https://phabricator.wikimedia.org/T309447)
[16:23:28] <wikibugs>	 (CR) Ayounsi: "Example output for an interface rename: https://phabricator.wikimedia.org/P29440" [cookbooks] - https://gerrit.wikimedia.org/r/803261 (owner: Ayounsi)
[16:26:16] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[16:30:15] <wikibugs>	 (PS1) Papaul: Testing new partman recipe for clouddumps nodes [puppet] - https://gerrit.wikimedia.org/r/803318 (https://phabricator.wikimedia.org/T302981)
[16:31:55] <wikibugs>	 (CR) Papaul: [C: +2] Testing new partman recipe for clouddumps nodes [puppet] - https://gerrit.wikimedia.org/r/803318 (https://phabricator.wikimedia.org/T302981) (owner: Papaul)
[16:33:36] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[16:33:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:46] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host clouddumps1001.wikimedia.org w...
[16:34:23] <wikibugs>	 (PS5) Ayounsi: Initial support for servers switch interfaces [cookbooks] - https://gerrit.wikimedia.org/r/803261
[16:42:06] <wikibugs>	 (PS1) Ebernhardson: Revert "Revert "Upgrade to elasticsearch 7.10.2"" [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/803321 (https://phabricator.wikimedia.org/T309720)
[16:43:30] <wikibugs>	 SRE, ops-codfw: (Need By:TBD) rack/setup/install row A new PDUs - https://phabricator.wikimedia.org/T309957 (Papaul) Testing out the "Move devices attributes" script before using it on the new PDUs move all configuration from ps1-a2-codfw to ps1-a2-codfw-new give the output below   ` [success] [dst] Sett...
[16:45:39] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[16:45:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:46:44] <XioNoX>	 volans: puppet is disabled on netbox1002
[16:46:54] <XioNoX>	 that's why, I'll follow up on that
[16:47:06] <volans>	 XioNoX: ahhh thx
[16:48:21] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[16:48:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:06] <wikibugs>	 SRE, SRE-Access-Requests: Requesting access to Superset & Turnilo for kstoller - https://phabricator.wikimedia.org/T310002 (nettrom_WMF)
[16:53:00] <wikibugs>	 SRE, SRE-Access-Requests, Product-Analytics: Requesting access to Superset & Turnilo for kstoller - https://phabricator.wikimedia.org/T310002 (nettrom_WMF) I filed this task and am notifying @KStoller-WMF about it so she can fill out the necessary information. I don't think SSH access is needed at th...
[16:59:29] <wikibugs>	 (CR) Thcipriani: [C: +2] tag-release.sh: add some logging, more rigorous tag push [phabricator/deployment] (wmf/stable) - https://gerrit.wikimedia.org/r/802869 (owner: Brennen Bearnes)
[17:00:05] <jouncebot>	 ryankemper: It is that lovely time of the day again! You are hereby commanded to deploy Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T1700).
[17:01:16] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
[17:01:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:01:27] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1001 for host clouddumps1001.wikimedia.org with...
[17:18:36] <wikibugs>	 (PS1) Jbond: C:puppetmaster: Add requestctl validate to the private repo pre-commit [puppet] - https://gerrit.wikimedia.org/r/803324
[17:20:30] <wikibugs>	 (PS2) Jbond: C:puppetmaster: Add requestctl validate to the private repo pre-commit [puppet] - https://gerrit.wikimedia.org/r/803324
[17:21:33] <wikibugs>	 (PS3) Jbond: C:puppetmaster: Add requestctl validate to the private repo pre-commit [puppet] - https://gerrit.wikimedia.org/r/803324
[17:21:54] <wikibugs>	 SRE, Data-Engineering, Discovery, Event-Platform, Platform Team Workboards (Clinic Duty Team): Avoid accepting Kafka messages with whacky timestamps - https://phabricator.wikimedia.org/T282887 (JArguello-WMF)
[17:24:03] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (NOOP 22): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35745/console" [puppet] - https://gerrit.wikimedia.org/r/803324 (owner: Jbond)
[17:35:52] <wikibugs>	 (CR) Herron: [C: +1] "LGTM! 📈📉" [grafana-grizzly] - https://gerrit.wikimedia.org/r/802646 (https://phabricator.wikimedia.org/T302842) (owner: RLazarus)
[17:36:51] <icinga-wm>	 PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
[17:37:04] <wikibugs>	 SRE, LDAP-Access-Requests: Add Evelien WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T309700 (KFrancis) @MoritzMuehlenhoff @CDanis The NDA has been completed.  Please proceed with the access request.  Thanks!
[17:39:31] <wikibugs>	 (Abandoned) Reedy: Bump default cache epochs from 20130601 to 20160101 [mediawiki-config] - https://gerrit.wikimedia.org/r/443866 (owner: Reedy)
[17:39:52] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[17:39:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:12] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host clouddumps1001.wikimedia.org w...
[17:40:45] <wikibugs>	 (PS2) Volans: Netbox Ganeti sync: add groups support [software/netbox-extras] - https://gerrit.wikimedia.org/r/802179 (https://phabricator.wikimedia.org/T262446)
[17:41:15] <wikibugs>	 (CR) RLazarus: [C: +1] C:puppetmaster: Add requestctl validate to the private repo pre-commit (2 comments) [puppet] - https://gerrit.wikimedia.org/r/803324 (owner: Jbond)
[17:44:03] <wikibugs>	 (CR) Brennen Bearnes: [V: +2] tag-release.sh: add some logging, more rigorous tag push [phabricator/deployment] (wmf/stable) - https://gerrit.wikimedia.org/r/802869 (owner: Brennen Bearnes)
[17:45:58] <wikibugs>	 (PS1) Herron: logstash-slo: update plugin id label from elasticsearch to logstash [grafana-grizzly] - https://gerrit.wikimedia.org/r/803328
[17:46:29] <wikibugs>	 (PS2) Herron: logstash-slo: update plugin id label from elasticsearch to opensearch [grafana-grizzly] - https://gerrit.wikimedia.org/r/803328
[17:49:06] <wikibugs>	 (CR) Herron: [V: +2 C: +2] logstash-slo: update plugin id label from elasticsearch to opensearch [grafana-grizzly] - https://gerrit.wikimedia.org/r/803328 (owner: Herron)
[17:50:43] <wikibugs>	 SRE, conftool, Sustainability (Incident Followup): Make it easier to create a new requestctl object - https://phabricator.wikimedia.org/T310009 (RLazarus) p:Triage→Medium
[17:57:09] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[17:57:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:20] <wikibugs>	 (PS3) Volans: Netbox Ganeti sync: add groups support [software/netbox-extras] - https://gerrit.wikimedia.org/r/802179 (https://phabricator.wikimedia.org/T262446)
[17:58:47] <wikibugs>	 (CR) Volans: "addressed comment" [software/netbox-extras] - https://gerrit.wikimedia.org/r/802179 (https://phabricator.wikimedia.org/T262446) (owner: Volans)
[18:00:18] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
[18:00:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:57] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[18:10:58] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
[18:10:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:03] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1143 (T298560)', diff saved to https://phabricator.wikimedia.org/P29442 and previous config saved to /var/cache/conftool/dbconfig/20220606-181103-ladsgroup.json
[18:11:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:07] <stashbot>	 T298560: Fix mismatching field type of revision.rev_timestamp on wmf wikis - https://phabricator.wikimedia.org/T298560
[18:12:31] <wikibugs>	 (CR) RLazarus: slo: Correct queries for error budget remaining (2 comments) [grafana-grizzly] - https://gerrit.wikimedia.org/r/802646 (https://phabricator.wikimedia.org/T302842) (owner: RLazarus)
[18:14:30] <logmsgbot>	 !log pt1979@cumin1001 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
[18:14:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:40] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin1001 for host clouddumps1001.wikimedia.org with...
[18:14:55] <wikibugs>	 (PS5) RLazarus: slo: Correct queries for error budget remaining [grafana-grizzly] - https://gerrit.wikimedia.org/r/802646 (https://phabricator.wikimedia.org/T302842)
[18:15:50] <wikibugs>	 (CR) RLazarus: [V: +2 C: +2] slo: Correct queries for error budget remaining [grafana-grizzly] - https://gerrit.wikimedia.org/r/802646 (https://phabricator.wikimedia.org/T302842) (owner: RLazarus)
[18:17:30] <icinga-wm>	 PROBLEM - k8s API server requests latencies on ml-serve-ctrl1001 is CRITICAL: instance=10.64.16.202 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[18:19:23] <wikibugs>	 (PS2) Volans: ganeti-netbox-sync: refactor into classes [software/netbox-extras] - https://gerrit.wikimedia.org/r/802178
[18:19:31] <wikibugs>	 (CR) Volans: "addressed comments" [software/netbox-extras] - https://gerrit.wikimedia.org/r/802178 (owner: Volans)
[18:19:51] <wikibugs>	 (CR) Volans: "addressed comments" [software/netbox-extras] - https://gerrit.wikimedia.org/r/802179 (https://phabricator.wikimedia.org/T262446) (owner: Volans)
[18:21:01] <wikibugs>	 (CR) Volans: "Sorry, I messed up with the rebase, I squashed the 2 CR into this latest PS, I'll fix it later re-splitting the two." [software/netbox-extras] - https://gerrit.wikimedia.org/r/802178 (owner: Volans)
[18:27:33] <wikibugs>	 (CR) RLazarus: "Just for clarity: would you like me to go ahead and deploy this?" [puppet] - https://gerrit.wikimedia.org/r/801776 (https://phabricator.wikimedia.org/T285570) (owner: Catrope)
[18:28:00] <wikibugs>	 (PS1) Andrew Bogott: magnum: update policy.yaml [puppet] - https://gerrit.wikimedia.org/r/803332
[18:30:11] <wikibugs>	 (CR) Andrew Bogott: [C: +2] magnum: update policy.yaml [puppet] - https://gerrit.wikimedia.org/r/803332 (owner: Andrew Bogott)
[18:36:00] <wikibugs>	 (PS1) Andrew Bogott: magnum policy.yaml: close a string [puppet] - https://gerrit.wikimedia.org/r/803333
[18:37:13] <wikibugs>	 (CR) Andrew Bogott: [C: +2] magnum policy.yaml: close a string [puppet] - https://gerrit.wikimedia.org/r/803333 (owner: Andrew Bogott)
[18:37:52] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2009 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[18:39:34] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1009 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[18:39:55] <wikibugs>	 SRE, LDAP-Access-Requests: Add Evelien WMDE to the ldap/wmde and ldap/nda group - https://phabricator.wikimedia.org/T309700 (CDanis) Open→Resolved a:KFrancis→CDanis Completed!
[18:56:33] <wikibugs>	 (PS1) Andrew Bogott: Revert "P:wmcs::prometheus: set openstack scrape_interval to 4m" [puppet] - https://gerrit.wikimedia.org/r/802956
[18:58:36] <wikibugs>	 SRE, ops-eqiad, serviceops: mw1415 (canary appserver) is down, incl. mgmt - https://phabricator.wikimedia.org/T307755 (Dzahn) @Cmjohnson Alright, gotcha! Thanks for the updates and Dell request.
[18:58:47] <wikibugs>	 SRE, ops-eqiad, serviceops: mw1415 (canary appserver) is down, incl. mgmt - https://phabricator.wikimedia.org/T307755 (Dzahn) Open→In progress
[19:00:15] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Revert "P:wmcs::prometheus: set openstack scrape_interval to 4m" [puppet] - https://gerrit.wikimedia.org/r/802956 (owner: Andrew Bogott)
[19:02:02] <wikibugs>	 (CR) Catrope: doc.wikimedia.org CSP: Also allow form submissions to enwiki/wikidata (1 comment) [puppet] - https://gerrit.wikimedia.org/r/801776 (https://phabricator.wikimedia.org/T285570) (owner: Catrope)
[19:13:34] <rzl>	 jouncebot: nowandnext
[19:13:34] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 46 minute(s)
[19:13:34] <jouncebot>	 In 0 hour(s) and 46 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T2000)
[19:13:53] <rzl>	 !log disabling puppet on appservers to deploy https://gerrit.wikimedia.org/r/801776
[19:13:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:46] <icinga-wm>	 PROBLEM - SSH on wtp1036.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:15:49] <wikibugs>	 (CR) RLazarus: [C: +2] doc.wikimedia.org CSP: Also allow form submissions to enwiki/wikidata [puppet] - https://gerrit.wikimedia.org/r/801776 (https://phabricator.wikimedia.org/T285570) (owner: Catrope)
[19:19:05] <rzl>	 !log enabled puppet on appservers
[19:19:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:43] <wikibugs>	 SRE, Sustainability (Incident Followup): get a legend for haproxy "anomalous session termination states" - https://phabricator.wikimedia.org/T308952 (Dzahn) This is a better link since it's directly upstream and latest docs from 2022:  https://www.haproxy.org/download/2.7/doc/configuration.txt  ^ it's th...
[19:24:17] <wikibugs>	 SRE, SRE-OnFire, Observability-Logging, Sustainability (Incident Followup), Wikimedia-Incident: create a sampled log of POST data - https://phabricator.wikimedia.org/T309186 (Krinkle) We have something like this for POST requests to `api.php` on appservers, which we log (unsampled) to `api.lo...
[19:25:58] <wikibugs>	 (CR) Ryan Kemper: [V: +1 C: +2] query service: port cronjobs to systemd timers [puppet] - https://gerrit.wikimedia.org/r/792104 (https://phabricator.wikimedia.org/T273673) (owner: Slyngshede)
[19:29:51] <wikibugs>	 (PS1) Ryan Kemper: query service: clean up absented resources [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673)
[19:34:56] <icinga-wm>	 RECOVERY - k8s API server requests latencies on ml-serve-ctrl1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&viewPanel=27
[19:37:05] <wikibugs>	 SRE, Sustainability (Incident Followup): get a legend for haproxy "anomalous session termination states" - https://phabricator.wikimedia.org/T308952 (Dzahn) In progress→Open
[19:38:04] <wikibugs>	 (PS2) Ryan Kemper: query service: clean up absented resources [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673)
[19:38:17] <wikibugs>	 (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[19:46:29] <wikibugs>	 (PS1) AntiCompositeNumber: SpecialDeletedContributions: Hide date headers [core] (wmf/1.39.0-wmf.14) - https://gerrit.wikimedia.org/r/802957
[19:54:59] <wikibugs>	 (PS1) Ryan Kemper: query_service: load categories daily, not weekly [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673)
[19:55:15] <wikibugs>	 (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[19:58:46] <wikibugs>	 (PS2) Ryan Kemper: query_service: load categories daily, not weekly [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673)
[20:00:05] <jouncebot>	 RoanKattouw, Urbanecm, and cjming: #bothumor I � Unicode. All rise for UTC late backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T2000).
[20:00:05] <jouncebot>	 AntiComposite: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[20:00:22] <AntiComposite>	 o/
[20:00:23] <wikibugs>	 (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[20:00:24] <urbanecm>	 hello AntiComposite!
[20:00:33] <urbanecm>	 i can deploy today
[20:01:43] <wikibugs>	 (PS1) Nskaggs: Add tenacity lib and retry logic [puppet] - https://gerrit.wikimedia.org/r/803340
[20:02:15] <urbanecm>	 AntiComposite: just to double check, we're removing the dates from https://test.wikipedia.org/wiki/Special:DeletedContributions/Martin_Urbanec, right?
[20:02:19] <urbanecm>	 (the headlines i mean)
[20:02:30] <AntiComposite>	 yes, that's correct
[20:02:34] <urbanecm>	 ok, thanks
[20:02:36] <wikibugs>	 (CR) Urbanecm: [C: +2] SpecialDeletedContributions: Hide date headers [core] (wmf/1.39.0-wmf.14) - https://gerrit.wikimedia.org/r/802957 (owner: AntiCompositeNumber)
[20:02:39] <AntiComposite>	 same as Special:Contribs
[20:03:05] <urbanecm>	 okay
[20:03:19] <urbanecm>	 I'll let you know once this is ready for testing -- will take a while to merge
[20:03:49] <AntiComposite>	 alright
[20:05:45] <wikibugs>	 (PS3) Ryan Kemper: query_service: load categories daily, not weekly [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673)
[20:05:55] <wikibugs>	 (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[20:08:05] <wikibugs>	 (PS1) Jbond: utils: Add small script to set up bundler [puppet] - https://gerrit.wikimedia.org/r/803341
[20:08:59] <wikibugs>	 (CR) CI reject: [V: -1] utils: Add small script to set up bundler [puppet] - https://gerrit.wikimedia.org/r/803341 (owner: Jbond)
[20:09:09] <wikibugs>	 (PS4) Ryan Kemper: query_service: load categories daily, not weekly [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673)
[20:10:44] <wikibugs>	 (CR) Ryan Kemper: [C: +2] query_service: load categories daily, not weekly [puppet] - https://gerrit.wikimedia.org/r/803339 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[20:13:09] <wikibugs>	 (CR) Dzahn: utils: Add small script to set up bundler (1 comment) [puppet] - https://gerrit.wikimedia.org/r/803341 (owner: Jbond)
[20:13:51] <wikibugs>	 (PS3) Ryan Kemper: query service: clean up absented resources [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673)
[20:17:41] <wikibugs>	 (CR) Ryan Kemper: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[20:18:17] <wikibugs>	 (CR) Eevans: WIP: Configure AQS Cassandra hosts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/802604 (https://phabricator.wikimedia.org/T307801) (owner: Eevans)
[20:21:43] <wikibugs>	 (Merged) jenkins-bot: SpecialDeletedContributions: Hide date headers [core] (wmf/1.39.0-wmf.14) - https://gerrit.wikimedia.org/r/802957 (owner: AntiCompositeNumber)
[20:21:55] <wikibugs>	 (PS4) Ryan Kemper: query service: clean up absented resources [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673)
[20:22:09] <urbanecm>	 here we go :)
[20:22:14] <wikibugs>	 (CR) Ryan Kemper: [C: +2] query service: clean up absented resources [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[20:22:16] <wikibugs>	 (CR) Ryan Kemper: [V: +2 C: +2] query service: clean up absented resources [puppet] - https://gerrit.wikimedia.org/r/803336 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[20:23:48] <urbanecm>	 AntiComposite: should be ready at mwdebug1001. can you check please?
[20:24:23] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] START helmfile.d/services/mwdebug: apply
[20:24:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:24:25] <AntiComposite>	 urbanecm, looks good to me, thanks
[20:24:40] <urbanecm>	 thanks, syncing
[20:25:42] <wikibugs>	 SRE, ops-eqiad, DC-Ops: Recycling Pickup for EQIAD - https://phabricator.wikimedia.org/T307140 (wiki_willy) Initial quote received back is  $26,642.00 for the equipment, minus  $3,325.25 for the drive shredding and $3,716.76 for freight charges.  I'm seeing if they can lower the freight costs, before...
[20:26:16] <jinxer-wm>	 (CirrusSearchHighOldGCFrequency) firing: Elasticsearch instance cloudelastic1004-cloudelastic-chi-eqiad is running the old gc excessively - https://wikitech.wikimedia.org/wiki/Search#Stuck_in_old_GC_hell - https://grafana.wikimedia.org/d/000000462/elasticsearch-memory - https://alerts.wikimedia.org/?q=alertname%3DCirrusSearchHighOldGCFrequency
[20:27:06] <wikibugs>	 (CR) Eevans: [C: +1] Dummy keys and certificates for cassandra (aqs) (1 comment) [labs/private] - https://gerrit.wikimedia.org/r/802631 (https://phabricator.wikimedia.org/T307801) (owner: Eevans)
[20:27:08] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
[20:27:09] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] START helmfile.d/services/mwdebug: apply
[20:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:06] <logmsgbot>	 !log mwdebug-deploy@deploy1002 helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
[20:28:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:34] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.39.0-wmf.14/includes/specials/SpecialDeletedContributions.php: a15c11e72d766fa45aee690d3dffb17b186a35e0: SpecialDeletedContributions: Hide date headers (duration: 03m 09s)
[20:28:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:39] <urbanecm>	 AntiComposite: and live
[20:28:41] <urbanecm>	 anything else?
[20:28:56] <AntiComposite>	 nope, looks good to me, thanks for your help!
[20:29:39] <urbanecm>	 no problem, thanks for the patch!
[20:29:47] <urbanecm>	 !log UTC late B&C window deploy
[20:29:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:51] <urbanecm>	 !log UTC late B&C window deploy completed
[20:29:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:10] <wikibugs>	 (CR) Cwhite: [C: +2] opensearch: add support for managing opensearch 2.0 [puppet] - https://gerrit.wikimedia.org/r/802862 (https://phabricator.wikimedia.org/T304440) (owner: Cwhite)
[20:31:11] <wikibugs>	 (PS1) Ryan Kemper: query_service: we don't use cron here anymore [puppet] - https://gerrit.wikimedia.org/r/803344 (https://phabricator.wikimedia.org/T273673)
[20:33:20] <wikibugs>	 (CR) Cwhite: [C: +2] add new index pattern format [software/ecs] - https://gerrit.wikimedia.org/r/802873 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)
[20:33:59] <wikibugs>	 (Merged) jenkins-bot: add new index pattern format [software/ecs] - https://gerrit.wikimedia.org/r/802873 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)
[20:39:07] <wikibugs>	 (CR) Andrew Bogott: "This might help with openstack instability. Is it possible to add a sleep between attempts? Or is that the default?" [puppet] - https://gerrit.wikimedia.org/r/803340 (owner: Nskaggs)
[20:39:49] <wikibugs>	 (CR) Andrew Bogott: Add tenacity lib and retry logic (1 comment) [puppet] - https://gerrit.wikimedia.org/r/803340 (owner: Nskaggs)
[20:43:12] <icinga-wm>	 PROBLEM - SSH on cp5012.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:44:21] <wikibugs>	 (PS1) Cwhite: templates: replace all version instances [software/ecs] - https://gerrit.wikimedia.org/r/803345 (https://phabricator.wikimedia.org/T305175)
[20:47:20] <wikibugs>	 (CR) Ryan Kemper: [C: +2] query_service: we don't use cron here anymore [puppet] - https://gerrit.wikimedia.org/r/803344 (https://phabricator.wikimedia.org/T273673) (owner: Ryan Kemper)
[20:48:16] <wikibugs>	 (CR) Cwhite: [C: +2] templates: replace all version instances [software/ecs] - https://gerrit.wikimedia.org/r/803345 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)
[20:48:49] <wikibugs>	 (Merged) jenkins-bot: templates: replace all version instances [software/ecs] - https://gerrit.wikimedia.org/r/803345 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)
[20:57:56] <icinga-wm>	 PROBLEM - SSH on druid1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:00:05] <jouncebot>	 Reedy, sbassett, Maryum, and manfredi: (Dis)respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20220606T2100). Please do the needful.
[21:03:45] <wikibugs>	 (PS1) Cwhite: add new index pattern to ecs templates [puppet] - https://gerrit.wikimedia.org/r/803350 (https://phabricator.wikimedia.org/T305175)
[21:04:15] <wikibugs>	 (PS2) Cwhite: logstash: add new index pattern to ecs templates [puppet] - https://gerrit.wikimedia.org/r/803350 (https://phabricator.wikimedia.org/T305175)
[21:07:42] <wikibugs>	 (CR) Dduvall: [C: +1] Turn mw_releases into a list [puppet] - https://gerrit.wikimedia.org/r/800758 (https://phabricator.wikimedia.org/T299648) (owner: Ahmon Dancy)
[21:09:34] <wikibugs>	 (PS1) Nskaggs: Fix spelling [puppet] - https://gerrit.wikimedia.org/r/803353
[21:10:45] <wikibugs>	 (Abandoned) Nskaggs: Fix spelling [puppet] - https://gerrit.wikimedia.org/r/803353 (owner: Nskaggs)
[21:17:56] <icinga-wm>	 PROBLEM - SSH on aqs1008.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:24:04] <wikibugs>	 ops-eqiad: Port with no description on access switch - https://phabricator.wikimedia.org/T309741 (wiki_willy) a:Cmjohnson
[21:24:54] <wikibugs>	 (PS3) Volans: ganeti-netbox-sync: refactor into classes [software/netbox-extras] - https://gerrit.wikimedia.org/r/802178
[21:24:56] <wikibugs>	 (PS4) Volans: Netbox Ganeti sync: add groups support [software/netbox-extras] - https://gerrit.wikimedia.org/r/802179 (https://phabricator.wikimedia.org/T262446)
[21:26:31] <wikibugs>	 SRE, ops-eqiad, DC-Ops, cloud-services-team (Hardware): Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts - https://phabricator.wikimedia.org/T304888 (wiki_willy)
[21:26:44] <wikibugs>	 (CR) Volans: "un-squashed commits" [software/netbox-extras] - https://gerrit.wikimedia.org/r/802178 (owner: Volans)
[21:27:08] <wikibugs>	 SRE, ops-eqiad, cloud-services-team (Kanban): Degraded RAID on cloudnet1004 - https://phabricator.wikimedia.org/T309576 (wiki_willy) Open→Resolved a:wiki_willy
[21:29:39] <wikibugs>	 (CR) Volans: "FYI, if it might be useful we have also @retry in the wmflib package. See the related documentation in:" [puppet] - https://gerrit.wikimedia.org/r/803340 (owner: Nskaggs)
[21:29:43] <wikibugs>	 (PS2) Nskaggs: Add tenacity lib and retry logic [puppet] - https://gerrit.wikimedia.org/r/803340
[21:31:33] <wikibugs>	 (CR) Dzahn: [V: +1 C: +2] "https://puppet-compiler.wmflabs.org/pcc-worker1003/35746/" [puppet] - https://gerrit.wikimedia.org/r/802851 (https://phabricator.wikimedia.org/T293942) (owner: Dzahn)
[21:33:16] <wikibugs>	 (CR) Dzahn: [V: +1 C: +2] "noop on otrs1001 confirmed" [puppet] - https://gerrit.wikimedia.org/r/802851 (https://phabricator.wikimedia.org/T293942) (owner: Dzahn)
[21:33:49] <wikibugs>	 (PS2) Dzahn: vrts: rename daemon resource and template from otrs to vrts [puppet] - https://gerrit.wikimedia.org/r/802849 (https://phabricator.wikimedia.org/T293942)
[21:34:34] * Krinkle testing on mwdebug1002
[21:35:18] <wikibugs>	 (CR) Dzahn: [V: +1 C: +2] "https://puppet-compiler.wmflabs.org/pcc-worker1002/35747/otrs1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/802849 (https://phabricator.wikimedia.org/T293942) (owner: Dzahn)
[21:41:19] <mutante>	 !log otrs1001 - stopped otrs-daemon, started vrts-daemon - after renaming it gerrit:802849 (T293942)
[21:41:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:25] <stashbot>	 T293942: refactor OTRS role/module/cumin aliases - https://phabricator.wikimedia.org/T293942
[21:48:24] <wikibugs>	 (CR) Dzahn: [V: +1 C: +2] "manually stopped otrs-daemon, started vrts-daemon" [puppet] - https://gerrit.wikimedia.org/r/802849 (https://phabricator.wikimedia.org/T293942) (owner: Dzahn)
[22:01:23] <wikibugs>	 (CR) Cwhite: [C: +2] logstash: add new index pattern to ecs templates [puppet] - https://gerrit.wikimedia.org/r/803350 (https://phabricator.wikimedia.org/T305175) (owner: Cwhite)
[22:03:00] * Krinkle done testing
[22:09:33] <wikibugs>	 SRE, SRE-Access-Requests, Product-Analytics: Requesting access to Superset & Turnilo for kstoller - https://phabricator.wikimedia.org/T310002 (KStoller-WMF)
[22:12:00] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2009 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[22:12:32] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1009 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[22:16:32] <wikibugs>	 (CR) Nskaggs: Add tenacity lib and retry logic (1 comment) [puppet] - https://gerrit.wikimedia.org/r/803340 (owner: Nskaggs)
[22:18:15] <wikibugs>	 SRE, SRE-Access-Requests, Product-Analytics: Requesting access to Superset & Turnilo for kstoller - https://phabricator.wikimedia.org/T310002 (KStoller-WMF) I've added my info to the task and signed the "Acknowledgement of Wikimedia Server Access Responsibilities".  I think I now need @MMiller_WMF 's...
[22:19:06] <icinga-wm>	 RECOVERY - SSH on aqs1008.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:19:28] <icinga-wm>	 RECOVERY - SSH on wtp1036.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[22:21:54] <cwhite>	 !log upgrade prometheus-es-exporter on logstash2026 T304440
[22:21:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:21:58] <stashbot>	 T304440: Test and upgrade OpenSearch to 2.0.0 - https://phabricator.wikimedia.org/T304440
[22:24:06] <wikibugs>	 SRE, Infrastructure-Foundations, serviceops: allow certain users to disable puppet on mwdebug hosts - https://phabricator.wikimedia.org/T305979 (Dzahn) a:Dzahn ok, thank you IF team! assigning back to me for the moment to follow-up. Yes, there was a specific person. I will readd this with a speci...
[22:33:07] <wikibugs>	 SRE, Codex, WVUI, ContentSecurityPolicy, SecTeam-Processed: WVUI and Codex demos: CSP stopping typeahead input demos working - https://phabricator.wikimedia.org/T285570 (Catrope) Open→Resolved
[22:39:36] <cwhite>	 !log upgrade prometheus-es-exporter on logstash1026 T304440
[22:39:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:39:42] <stashbot>	 T304440: Test and upgrade OpenSearch to 2.0.0 - https://phabricator.wikimedia.org/T304440
[22:45:34] <icinga-wm>	 RECOVERY - SSH on cp5012.mgmt is OK: SSH OK - OpenSSH_7.4 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:10:01] <wikibugs>	 (PS1) Papaul: Testing partman recipe for couddumps nodes [puppet] - https://gerrit.wikimedia.org/r/803366 (https://phabricator.wikimedia.org/T302981)
[23:11:58] <wikibugs>	 (CR) Papaul: [C: +2] Testing partman recipe for couddumps nodes [puppet] - https://gerrit.wikimedia.org/r/803366 (https://phabricator.wikimedia.org/T302981) (owner: Papaul)
[23:14:03] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Papaul) @Andrew it looks like the way partman is seeing disks in a raid configuration and disk in a no raid configuration is dif...
[23:14:15] <logmsgbot>	 !log pt1979@cumin1001 START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
[23:14:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:14:25] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin1001 for host clouddumps1001.wikimedia.org w...
[23:17:27] <tzatziki>	 !log removing one file for legal compliance
[23:17:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:20:42] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[23:21:12] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[23:27:52] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[23:39:13] <wikibugs>	 (PS1) BCornwall: Traffic: Add PyBal BGP sessions [alerts] - https://gerrit.wikimedia.org/r/803368 (https://phabricator.wikimedia.org/T300723)
[23:44:38] <wikibugs>	 (PS2) BCornwall: Traffic: Add PyBal BGP sessions [alerts] - https://gerrit.wikimedia.org/r/803368 (https://phabricator.wikimedia.org/T300723)
[23:49:19] <wikibugs>	 (CR) BCornwall: "Not sure if "warning" is the appropriate severity for this; I suspect it may require a more urgent severity." [alerts] - https://gerrit.wikimedia.org/r/803368 (https://phabricator.wikimedia.org/T300723) (owner: BCornwall)
[23:54:43] <wikibugs>	 SRE, ops-eqiad, DC-Ops, Infrastructure-Foundations, and 3 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (Andrew) That's similar to what I was seeing -- I don't understand why partman can tell the difference unless it's just the diffe...