| 2021-02-15 01:07:53 | <icinga-wm> | RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 01:13:05 | <icinga-wm> | PROBLEM - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 02:13:17 | <icinga-wm> | RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 02:18:25 | <icinga-wm> | PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 02:31:55 | <icinga-wm> | PROBLEM - MediaWiki memcached error rate on alert1001 is CRITICAL: 5870 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops | 
              
                | 2021-02-15 02:33:31 | <icinga-wm> | RECOVERY - MediaWiki memcached error rate on alert1001 is OK: (C)5000 gt (W)1000 gt 2 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops | 
              
                | 2021-02-15 02:42:29 | <icinga-wm> | RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 02:47:15 | <icinga-wm> | PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 03:08:29 | <icinga-wm> | RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 03:13:39 | <icinga-wm> | PROBLEM - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 03:34:49 | <icinga-wm> | PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 132, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down | 
              
                | 2021-02-15 03:35:57 | <icinga-wm> | PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down | 
              
                | 2021-02-15 04:43:13 | <icinga-wm> | RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 04:48:23 | <icinga-wm> | PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 05:02:27 | <icinga-wm> | PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/mobile-sections/{title} (Get mobile-sections for a test page on enwiki) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase | 
              
                | 2021-02-15 05:04:03 | <icinga-wm> | RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase | 
              
                | 2021-02-15 06:02:28 | <wikibugs> | SRE, DBA: Decom dbmonitor2001 - https://phabricator.wikimedia.org/T274496 (Marostegui) p:Triage→Medium a:Kormat Yeah, as far as I remember we're not using this for anything Assigning it for Stevie for confirmation and removal (if that applies) | 
              
                | 2021-02-15 06:10:23 | <wikibugs> | SRE, ops-eqiad, DBA: Investigate and repool db1134 - https://phabricator.wikimedia.org/T274472 (Marostegui) Thanks everyone who responded to this incident! | 
              
                | 2021-02-15 06:17:33 | <wikibugs> | SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) >>! In T258361#6822070, @jcrespo wrote: > I am taking db1163 to, at least temporarily, substitute db1134 due to T274472. Thanks. I... | 
              
                | 2021-02-15 06:19:15 | <wikibugs> | (PS1) Marostegui: db1162: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/664087 (https://phabricator.wikimedia.org/T258361) | 
              
                | 2021-02-15 06:20:14 | <wikibugs> | (CR) Marostegui: [C: +2] db1162: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/664087 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui) | 
              
                | 2021-02-15 06:36:31 | <wikibugs> | (PS1) Marostegui: instances.yaml: Add db1162 to dbctl [puppet] - https://gerrit.wikimedia.org/r/664088 (https://phabricator.wikimedia.org/T258361) | 
              
                | 2021-02-15 06:37:05 | <wikibugs> | (CR) Marostegui: [C: +2] instances.yaml: Add db1162 to dbctl [puppet] - https://gerrit.wikimedia.org/r/664088 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui) | 
              
                | 2021-02-15 06:40:02 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1162 to dbctl - depooled T258361', diff saved to https://phabricator.wikimedia.org/P14339 and previous config saved to /var/cache/conftool/dbconfig/20210215-064001-marostegui.json | 
              
                | 2021-02-15 06:40:06 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 06:40:08 | <stashbot> | T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 | 
              
                | 2021-02-15 06:46:28 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14340 and previous config saved to /var/cache/conftool/dbconfig/20210215-064628-marostegui.json | 
              
                | 2021-02-15 06:46:32 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 06:46:33 | <stashbot> | T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 | 
              
                | 2021-02-15 06:56:50 | <wikibugs> | (PS1) Marostegui: install_server: Do not reimage db1162 and db1163 [puppet] - https://gerrit.wikimedia.org/r/664089 | 
              
                | 2021-02-15 06:57:31 | <wikibugs> | (CR) Marostegui: [C: +2] install_server: Do not reimage db1162 and db1163 [puppet] - https://gerrit.wikimedia.org/r/664089 (owner: Marostegui) | 
              
                | 2021-02-15 06:58:07 | <icinga-wm> | RECOVERY - Check systemd state on search-loader2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 07:02:06 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14341 and previous config saved to /var/cache/conftool/dbconfig/20210215-070206-marostegui.json | 
              
                | 2021-02-15 07:02:10 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:02:12 | <stashbot> | T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 | 
              
                | 2021-02-15 07:09:46 | <wikibugs> | SRE, ops-eqiad: ms-be1034 not powering on - https://phabricator.wikimedia.org/T274488 (elukey) Resolved→Open ms-be1034 is down again, same issue as the one described by Filippo... :( | 
              
                | 2021-02-15 07:10:31 | <icinga-wm> | ACKNOWLEDGEMENT - Host ms-be1034 is DOWN: PING CRITICAL - Packet loss = 100% Elukey T274488 | 
              
                | 2021-02-15 07:14:17 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet | 
              
                | 2021-02-15 07:14:20 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:16:37 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet | 
              
                | 2021-02-15 07:16:40 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:20:41 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet | 
              
                | 2021-02-15 07:20:44 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:22:37 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet | 
              
                | 2021-02-15 07:22:40 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:24:21 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet | 
              
                | 2021-02-15 07:24:23 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:26:40 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet | 
              
                | 2021-02-15 07:26:44 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:28:21 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet | 
              
                | 2021-02-15 07:28:26 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:33:24 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet | 
              
                | 2021-02-15 07:33:29 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:38:23 | <icinga-wm> | RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 07:42:54 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes - elukey@cumin1001 | 
              
                | 2021-02-15 07:42:57 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:43:33 | <icinga-wm> | PROBLEM - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 07:47:21 | <wikibugs> | (PS1) ArielGlenn: wikidata json dumps: re-add source of shared functions [puppet] - https://gerrit.wikimedia.org/r/664090 | 
              
                | 2021-02-15 07:48:16 | <wikibugs> | (CR) ArielGlenn: [C: +2] wikidata json dumps: re-add source of shared functions [puppet] - https://gerrit.wikimedia.org/r/664090 (owner: ArielGlenn) | 
              
                | 2021-02-15 07:49:32 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 3%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14342 and previous config saved to /var/cache/conftool/dbconfig/20210215-074932-root.json | 
              
                | 2021-02-15 07:49:35 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 07:57:16 | <wikibugs> | (PS1) ArielGlenn: now that snapshot1005 is testbed host, make snapshot1007 the enwiki dumps runner [puppet] - https://gerrit.wikimedia.org/r/664091 (https://phabricator.wikimedia.org/T269377) | 
              
                | 2021-02-15 08:04:37 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 4%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14343 and previous config saved to /var/cache/conftool/dbconfig/20210215-080435-root.json | 
              
                | 2021-02-15 08:04:40 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 08:07:33 | <icinga-wm> | RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 08:08:21 | <wikibugs> | (CR) ArielGlenn: [C: +2] now that snapshot1005 is testbed host, make snapshot1007 the enwiki dumps runner [puppet] - https://gerrit.wikimedia.org/r/664091 (https://phabricator.wikimedia.org/T269377) (owner: ArielGlenn) | 
              
                | 2021-02-15 08:10:50 | <wikibugs> | (PS1) ArielGlenn: prep snapshot1005 and 1006 for reinstall with buster [puppet] - https://gerrit.wikimedia.org/r/664092 (https://phabricator.wikimedia.org/T269377) | 
              
                | 2021-02-15 08:13:14 | <wikibugs> | (CR) ArielGlenn: [C: +2] prep snapshot1005 and 1006 for reinstall with buster [puppet] - https://gerrit.wikimedia.org/r/664092 (https://phabricator.wikimedia.org/T269377) (owner: ArielGlenn) | 
              
                | 2021-02-15 08:17:33 | <wikibugs> | SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts: ` snapshot1005.eqiad.wmnet ` The log can be fo... | 
              
                | 2021-02-15 08:19:41 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14344 and previous config saved to /var/cache/conftool/dbconfig/20210215-081940-root.json | 
              
                | 2021-02-15 08:26:51 | <wikibugs> | SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) | 
              
                | 2021-02-15 08:27:19 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1075 T274235', diff saved to https://phabricator.wikimedia.org/P14345 and previous config saved to /var/cache/conftool/dbconfig/20210215-082718-marostegui.json | 
              
                | 2021-02-15 08:27:47 | <icinga-wm> | PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 08:29:05 | <gehel> | !log powercycle wdqs1009 | 
              
                | 2021-02-15 08:29:22 | <wikibugs> | (PS1) Marostegui: db1075: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/664093 (https://phabricator.wikimedia.org/T274235) | 
              
                | 2021-02-15 08:29:24 | <wikibugs> | (PS1) Elukey: profile::hadoop::backup::namenode: add a more precise notes_url [puppet] - https://gerrit.wikimedia.org/r/664094 | 
              
                | 2021-02-15 08:29:25 | <logmsgbot> | !log ariel@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1005.eqiad.wmnet with reason: REIMAGE | 
              
                | 2021-02-15 08:30:06 | <wikibugs> | (CR) Marostegui: [C: +2] db1075: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/664093 (https://phabricator.wikimedia.org/T274235) (owner: Marostegui) | 
              
                | 2021-02-15 08:31:30 | <logmsgbot> | !log ariel@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1005.eqiad.wmnet with reason: REIMAGE | 
              
                | 2021-02-15 08:31:48 | <wikibugs> | (PS1) JMeybohm: tiller: Run tiller as user nobody [docker-images/production-images] - https://gerrit.wikimedia.org/r/664095 (https://phabricator.wikimedia.org/T274254) | 
              
                | 2021-02-15 08:31:50 | <wikibugs> | (PS1) JMeybohm: eventrouter: Use numeric UID [docker-images/production-images] - https://gerrit.wikimedia.org/r/664096 (https://phabricator.wikimedia.org/T274254) | 
              
                | 2021-02-15 08:31:52 | <wikibugs> | (PS1) JMeybohm: fluent-bit: Use numeric UID [docker-images/production-images] - https://gerrit.wikimedia.org/r/664097 (https://phabricator.wikimedia.org/T274254) | 
              
                | 2021-02-15 08:31:57 | <wikibugs> | (PS1) JMeybohm: ratelimit: Use numeric UID [docker-images/production-images] - https://gerrit.wikimedia.org/r/664098 (https://phabricator.wikimedia.org/T274254) | 
              
                | 2021-02-15 08:32:20 | <wikibugs> | (CR) Elukey: [C: +2] profile::hadoop::backup::namenode: add a more precise notes_url [puppet] - https://gerrit.wikimedia.org/r/664094 (owner: Elukey) | 
              
                | 2021-02-15 08:34:44 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14346 and previous config saved to /var/cache/conftool/dbconfig/20210215-083444-root.json | 
              
                | 2021-02-15 08:44:12 | <wikibugs> | (PS1) Elukey: hadoop: enable HDFS service port for Analytics Hadoop [puppet] - https://gerrit.wikimedia.org/r/664099 (https://phabricator.wikimedia.org/T273629) | 
              
                | 2021-02-15 08:45:24 | <wikibugs> | (CR) JMeybohm: [C: +1] "Nice!" [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan) | 
              
                | 2021-02-15 08:47:53 | <wikibugs> | (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28056/console" [puppet] - https://gerrit.wikimedia.org/r/664099 (https://phabricator.wikimedia.org/T273629) (owner: Elukey) | 
              
                | 2021-02-15 08:48:01 | <logmsgbot> | !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) | 
              
                | 2021-02-15 08:49:48 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 15%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14347 and previous config saved to /var/cache/conftool/dbconfig/20210215-084947-root.json | 
              
                | 2021-02-15 08:50:59 | <wikibugs> | ops-eqiad, DC-Ops, Wikidata, Wikidata-Query-Service: Upgrade firmware on wdqs1009 - https://phabricator.wikimedia.org/T274751 (Gehel) | 
              
                | 2021-02-15 08:53:53 | <wikibugs> | SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['snapshot1005.eqiad.wmnet'] ` and were **ALL** successful. | 
              
                | 2021-02-15 08:58:58 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes - elukey@cumin1001 | 
              
                | 2021-02-15 09:01:22 | <wikibugs> | SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) | 
              
                | 2021-02-15 09:01:30 | <wikibugs> | SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) | 
              
                | 2021-02-15 09:01:32 | <wikibugs> | (CR) Elukey: [V: +1 C: +2] hadoop: enable HDFS service port for Analytics Hadoop [puppet] - https://gerrit.wikimedia.org/r/664099 (https://phabricator.wikimedia.org/T273629) (owner: Elukey) | 
              
                | 2021-02-15 09:04:52 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 20%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14348 and previous config saved to /var/cache/conftool/dbconfig/20210215-090451-root.json | 
              
                | 2021-02-15 09:05:56 | <wikibugs> | (PS1) Joal: Update oozie sharelib creation [puppet] - https://gerrit.wikimedia.org/r/664172 (https://phabricator.wikimedia.org/T274322) | 
              
                | 2021-02-15 09:06:00 | <wikibugs> | (CR) JMeybohm: [C: -1] "You do mix list indention styles a bit, don't know if we should argue about it or just leave it be." (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/651757 (owner: Giuseppe Lavagetto) | 
              
                | 2021-02-15 09:06:03 | <joal> | elukey: --^ | 
              
                | 2021-02-15 09:06:06 | <joal> | for when you have time | 
              
                | 2021-02-15 09:07:48 | <wikibugs> | (CR) Elukey: [C: +2] Update oozie sharelib creation [puppet] - https://gerrit.wikimedia.org/r/664172 (https://phabricator.wikimedia.org/T274322) (owner: Joal) | 
              
                | 2021-02-15 09:11:52 | <logmsgbot> | !log ryankemper@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) | 
              
                | 2021-02-15 09:12:50 | <wikibugs> | SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts: ` snapshot1006.eqiad.wmnet ` The log can be fo... | 
              
                | 2021-02-15 09:13:58 | <wikibugs> | (PS1) Filippo Giunchedi: grafana: stop POST to /api/snapshots [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736) | 
              
                | 2021-02-15 09:15:13 | <wikibugs> | (CR) Filippo Giunchedi: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28057/console" [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736) (owner: Filippo Giunchedi) | 
              
                | 2021-02-15 09:15:53 | <wikibugs> | (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan) | 
              
                | 2021-02-15 09:17:11 | <wikibugs> | (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736) (owner: Filippo Giunchedi) | 
              
                | 2021-02-15 09:17:49 | <wikibugs> | (CR) Filippo Giunchedi: [V: +1 C: +2] grafana: stop POST to /api/snapshots [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736) (owner: Filippo Giunchedi) | 
              
                | 2021-02-15 09:19:55 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14349 and previous config saved to /var/cache/conftool/dbconfig/20210215-091955-root.json | 
              
                | 2021-02-15 09:24:00 | <wikibugs> | (PS1) ArielGlenn: misc dumps: move commons rdf to later on Sunday and media info to earlier [puppet] - https://gerrit.wikimedia.org/r/664225 (https://phabricator.wikimedia.org/T269377) | 
              
                | 2021-02-15 09:24:02 | <wikibugs> | (CR) David Caro: "Got a couple questions, nits you can safely ignore :)" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez) | 
              
                | 2021-02-15 09:24:20 | <wikibugs> | (PS2) JMeybohm: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/663873 (https://phabricator.wikimedia.org/T274262) (owner: PipelineBot) | 
              
                | 2021-02-15 09:24:39 | <wikibugs> | (CR) Ayounsi: [C: +2] Remove sampling feature flag [homer/public] - https://gerrit.wikimedia.org/r/663533 (owner: Ayounsi) | 
              
                | 2021-02-15 09:25:47 | <logmsgbot> | !log ariel@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1006.eqiad.wmnet with reason: REIMAGE | 
              
                | 2021-02-15 09:26:49 | <wikibugs> | (CR) Ayounsi: "confirmed NOOP." [homer/public] - https://gerrit.wikimedia.org/r/663533 (owner: Ayounsi) | 
              
                | 2021-02-15 09:27:52 | <logmsgbot> | !log ariel@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1006.eqiad.wmnet with reason: REIMAGE | 
              
                | 2021-02-15 09:28:41 | <wikibugs> | (PS1) Vgutierrez: admin: Add christinedk user [puppet] - https://gerrit.wikimedia.org/r/664226 (https://phabricator.wikimedia.org/T274304) | 
              
                | 2021-02-15 09:28:43 | <wikibugs> | (PS1) Vgutierrez: admin: Add christinedk to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/664227 (https://phabricator.wikimedia.org/T274304) | 
              
                | 2021-02-15 09:34:59 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 30%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14350 and previous config saved to /var/cache/conftool/dbconfig/20210215-093458-root.json | 
              
                | 2021-02-15 09:35:26 | <wikibugs> | (CR) Muehlenhoff: admin: Add christinedk user (1 comment) [puppet] - https://gerrit.wikimedia.org/r/664226 (https://phabricator.wikimedia.org/T274304) (owner: Vgutierrez) | 
              
                | 2021-02-15 09:37:27 | <wikibugs> | SRE, observability: Icinga meta monitoring pages during icinga host reboots - https://phabricator.wikimedia.org/T274662 (Volans) If we allow for normal reboots going unnoticed, would we catch a scenario in which the icinga host reboots every 5 minutes due to a bug or DoS? P.S. Keyholder is not armed aft... | 
              
                | 2021-02-15 09:43:50 | <elukey> | !log roll restart HDFS daemons in Analytics Hadoop to pick up new RPC queue changes - T273629 | 
              
                | 2021-02-15 09:47:55 | <wikibugs> | (CR) Volans: "Optional nit inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/663860 (owner: Hnowlan) | 
              
                | 2021-02-15 09:50:03 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 40%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14351 and previous config saved to /var/cache/conftool/dbconfig/20210215-095002-root.json | 
              
                | 2021-02-15 09:50:41 | <wikibugs> | SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['snapshot1006.eqiad.wmnet'] ` and were **ALL** successful. | 
              
                | 2021-02-15 09:55:54 | <wikibugs> | (PS1) Jcrespo: Revert "dbbackups: disable all ES db bacula runs until next week" [puppet] - https://gerrit.wikimedia.org/r/663961 | 
              
                | 2021-02-15 09:56:15 | <wikibugs> | (PS2) Jcrespo: Revert "dbbackups: disable all ES db bacula runs until next week" [puppet] - https://gerrit.wikimedia.org/r/663961 | 
              
                | 2021-02-15 09:57:14 | <wikibugs> | SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ArielGlenn) I was not going to re-image snapshot1005 and 6 because their replacements were due to have come in, but the boxes have not arrived yet a... | 
              
                | 2021-02-15 09:57:18 | <wikibugs> | SRE: Create cookbook to add a node to a Ganeti cluster - https://phabricator.wikimedia.org/T274527 (MoritzMuehlenhoff) p:Triage→Medium | 
              
                | 2021-02-15 09:57:34 | <wikibugs> | (PS2) ArielGlenn: misc dumps: move commons rdf to later on Sunday and media info to earlier [puppet] - https://gerrit.wikimedia.org/r/664225 (https://phabricator.wikimedia.org/T269377) | 
              
                | 2021-02-15 09:57:51 | <wikibugs> | SRE, Packaging: Copy cassandra packages to buster-wikimedia - https://phabricator.wikimedia.org/T274119 (MoritzMuehlenhoff) p:Triage→Medium | 
              
                | 2021-02-15 09:58:12 | <wikibugs> | (CR) Jcrespo: [C: +2] Revert "dbbackups: disable all ES db bacula runs until next week" [puppet] - https://gerrit.wikimedia.org/r/663961 (owner: Jcrespo) | 
              
                | 2021-02-15 09:59:02 | <wikibugs> | (CR) ArielGlenn: [C: +2] misc dumps: move commons rdf to later on Sunday and media info to earlier [puppet] - https://gerrit.wikimedia.org/r/664225 (https://phabricator.wikimedia.org/T269377) (owner: ArielGlenn) | 
              
                | 2021-02-15 10:00:12 | <apergos> | jynus: may I merge your puppet patch "backup::set { 'mysql-srv-backups-dumps-latest':" etc? | 
              
                | 2021-02-15 10:00:17 | <jynus> | yes | 
              
                | 2021-02-15 10:00:41 | <apergos> | done! | 
              
                | 2021-02-15 10:00:44 | <jynus> | thanks | 
              
                | 2021-02-15 10:02:02 | <apergos> | thanks for the quick response! | 
              
                | 2021-02-15 10:05:06 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14352 and previous config saved to /var/cache/conftool/dbconfig/20210215-100505-root.json | 
              
                | 2021-02-15 10:09:14 | <hashar> | !log Switching Jenkins jobs to Quibble 0.0.46 | 
              
                | 2021-02-15 10:15:52 | <wikibugs> | 'SRE, ''ops-eqiad: ms-be1034 not powering on - https://phabricator.wikimedia.org/T274488 (''fgiunchedi) Thank you for all the work ! LMK how I can help e.g. if speeding up the decom of one host in T272836 would help (as opposed as decom'ing all hosts at the same time)' | 
              
                | 2021-02-15 10:20:09 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 60%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14353 and previous config saved to /var/cache/conftool/dbconfig/20210215-102009-root.json | 
              
                | 2021-02-15 10:23:30 | <logmsgbot> | !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host netmon1002.wikimedia.org | 
              
                | 2021-02-15 10:27:29 | <logmsgbot> | !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1002.wikimedia.org | 
              
                | 2021-02-15 10:30:08 | <wikibugs> | ('CR) ''Marostegui: [C: ''+2] db1134: Do not be tag as candidate master [puppet] - ''https://gerrit.wikimedia.org/r/664230 (https://phabricator.wikimedia.org/T274472) (owner: ''Marostegui)' | 
              
                | 2021-02-15 10:31:09 | <wikibugs> | ('PS1) ''Arturo Borrero Gonzalez: dumps: distribution: nfs: allow establishing connections with TCP ports > 1024 [puppet] - ''https://gerrit.wikimedia.org/r/664231 (https://phabricator.wikimedia.org/T272397)' | 
              
                | 2021-02-15 10:35:13 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 70%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14355 and previous config saved to /var/cache/conftool/dbconfig/20210215-103512-root.json | 
              
                | 2021-02-15 10:41:25 | <wikibugs> | ('PS2) ''Arturo Borrero Gonzalez: dumps: distribution: nfs: allow establishing connections with TCP ports >= 1024 [puppet] - ''https://gerrit.wikimedia.org/r/664231 (https://phabricator.wikimedia.org/T272397)' | 
              
                | 2021-02-15 10:44:09 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] dumps: distribution: nfs: allow establishing connections with TCP ports >= 1024 [puppet] - ''https://gerrit.wikimedia.org/r/664231 (https://phabricator.wikimedia.org/T272397) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 10:47:08 | <wikibugs> | ('PS1) ''Arturo Borrero Gonzalez: labstore: allow NFS connections from public cloud networks [puppet] - ''https://gerrit.wikimedia.org/r/664233 (https://phabricator.wikimedia.org/T272397)' | 
              
                | 2021-02-15 10:48:49 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] labstore: allow NFS connections from public cloud networks [puppet] - ''https://gerrit.wikimedia.org/r/664233 (https://phabricator.wikimedia.org/T272397) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 10:49:05 | <wikibugs> | ('PS1) ''ArielGlenn: swap roles of dumpsdata1001 and 1003 so 1003 is primary for xml/sql dumps [puppet] - ''https://gerrit.wikimedia.org/r/664234 (https://phabricator.wikimedia.org/T273713)' | 
              
                | 2021-02-15 10:50:16 | <godog> | jouncebot: next | 
              
                | 2021-02-15 10:50:16 | <jouncebot> | In 0 hour(s) and 39 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1130) | 
              
                | 2021-02-15 10:50:16 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 80%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14356 and previous config saved to /var/cache/conftool/dbconfig/20210215-105016-root.json | 
              
                | 2021-02-15 10:57:30 | <wikibugs> | ('PS2) ''ArielGlenn: swap roles of dumpsdata1001 and 1003 so 1003 is primary for xml/sql dumps [puppet] - ''https://gerrit.wikimedia.org/r/664234 (https://phabricator.wikimedia.org/T273713)' | 
              
                | 2021-02-15 10:57:59 | <logmsgbot> | !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet | 
              
                | 2021-02-15 10:58:44 | <wikibugs> | ('PS1) ''Jcrespo: Preventive commit for jynus to misspell "bullseye", next Debian version [puppet] - ''https://gerrit.wikimedia.org/r/664237' | 
              
                | 2021-02-15 10:58:59 | <wikibugs> | ('CR) ''ArielGlenn: [C: ''+2] swap roles of dumpsdata1001 and 1003 so 1003 is primary for xml/sql dumps [puppet] - ''https://gerrit.wikimedia.org/r/664234 (https://phabricator.wikimedia.org/T273713) (owner: ''ArielGlenn)' | 
              
                | 2021-02-15 11:00:25 | <logmsgbot> | !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet | 
              
                | 2021-02-15 11:02:02 | <wikibugs> | ('CR) ''Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1002/28058/" [puppet] - ''https://gerrit.wikimedia.org/r/663565 (https://phabricator.wikimedia.org/T273115) (owner: ''Effie Mouzeli)' | 
              
                | 2021-02-15 11:03:17 | <wikibugs> | ('PS2) ''Hnowlan: mtail: add exception handling in tests for non-Debian OSes [puppet] - ''https://gerrit.wikimedia.org/r/663860' | 
              
                | 2021-02-15 11:05:20 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 90%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14357 and previous config saved to /var/cache/conftool/dbconfig/20210215-110519-root.json | 
              
                | 2021-02-15 11:06:51 | <icinga-wm> | RECOVERY - Maps HTTPS on maps2007 is OK: HTTP OK: HTTP/1.1 200 OK - 1329 bytes in 0.301 second response time https://wikitech.wikimedia.org/wiki/Maps/RunBook | 
              
                | 2021-02-15 11:07:27 | <icinga-wm> | RECOVERY - tilerator on maps2007 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.085 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/tilerator | 
              
                | 2021-02-15 11:08:21 | <icinga-wm> | RECOVERY - Check systemd state on maps2007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 11:10:25 | <logmsgbot> | !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=maps2007.codfw.wmnet | 
              
                | 2021-02-15 11:11:57 | <wikibugs> | ('CR) ''Hnowlan: mtail: add exception handling in tests for non-Debian OSes (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/663860 (owner: ''Hnowlan)' | 
              
                | 2021-02-15 11:14:57 | <wikibugs> | ('PS1) ''Elukey: profile::hadoop::master: raise threshold for corrupt blocks [puppet] - ''https://gerrit.wikimedia.org/r/664238' | 
              
                | 2021-02-15 11:16:50 | <wikibugs> | ('CR) ''Elukey: [C: ''+2] profile::hadoop::master: raise threshold for corrupt blocks [puppet] - ''https://gerrit.wikimedia.org/r/664238 (owner: ''Elukey)' | 
              
                | 2021-02-15 11:20:24 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14358 and previous config saved to /var/cache/conftool/dbconfig/20210215-112023-root.json | 
              
                | 2021-02-15 11:27:16 | <wikibugs> | ('PS4) ''Arturo Borrero Gonzalez: cloud: drop NAT exceptions for dumps NFS [puppet] - ''https://gerrit.wikimedia.org/r/657152 (https://phabricator.wikimedia.org/T272397)' | 
              
                | 2021-02-15 11:28:11 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes - elukey@cumin1001 | 
              
                | 2021-02-15 11:28:44 | <elukey> | this may trigger (I hope not) AQS alerts --^ | 
              
                | 2021-02-15 11:28:52 | <elukey> | in case it is my fault and you can blame me | 
              
                | 2021-02-15 11:29:05 | <elukey> | sees kormat ready for it | 
              
                | 2021-02-15 11:29:31 | <kormat> | nods solemnly | 
              
                | 2021-02-15 11:29:57 | <wikibugs> | ('CR) ''Hnowlan: api-gateway: generic discovery service config option, add linkrecommendation (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: ''Hnowlan)' | 
              
                | 2021-02-15 11:30:04 | <jouncebot> | jan_drewniak: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Wikimedia Portals Update . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1130). | 
              
                | 2021-02-15 11:32:31 | <wikibugs> | ('PS1) ''Arturo Borrero Gonzalez: cloudgw: move common hiera into proper file [puppet] - ''https://gerrit.wikimedia.org/r/664241 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 11:33:13 | <wikibugs> | ('CR) ''Jbond: "See comments inline, also wonder if you considered using pathlib for the file operations." (''5 comments) [puppet] - ''https://gerrit.wikimedia.org/r/663658 (https://phabricator.wikimedia.org/T271583) (owner: ''CRusnov)' | 
              
                | 2021-02-15 11:33:17 | <wikibugs> | ('PS4) ''Effie Mouzeli: hieradata: enable memcached socket mwdebug1003, mwdebug2001 [puppet] - ''https://gerrit.wikimedia.org/r/663796 (https://phabricator.wikimedia.org/T273115)' | 
              
                | 2021-02-15 11:33:19 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] cloudgw: move common hiera into proper file [puppet] - ''https://gerrit.wikimedia.org/r/664241 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 11:34:50 | <wikibugs> | ('PS5) ''Arturo Borrero Gonzalez: cloud: drop NAT exceptions for dumps NFS [puppet] - ''https://gerrit.wikimedia.org/r/657152 (https://phabricator.wikimedia.org/T272397)' | 
              
                | 2021-02-15 11:37:34 | <moritzm> | !log reimaging bast5001 to buster | 
              
                | 2021-02-15 11:45:23 | <wikibugs> | ('CR) ''Jbond: "Adding Andrew to approve privatedata-users access" [puppet] - ''https://gerrit.wikimedia.org/r/664227 (https://phabricator.wikimedia.org/T274304) (owner: ''Vgutierrez)' | 
              
                | 2021-02-15 11:52:45 | <logmsgbot> | !log ariel@cumin1001 START - Cookbook sre.hosts.reboot-single for host snapshot1007.eqiad.wmnet | 
              
                | 2021-02-15 11:54:09 | <wikibugs> | ('CR) ''Jbond: "see comments" (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/663993 (owner: ''Urbanecm)' | 
              
                | 2021-02-15 11:55:13 | <wikibugs> | ('CR) ''Urbanecm: Update urbanecm's dotfiles (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/663993 (owner: ''Urbanecm)' | 
              
                | 2021-02-15 11:55:23 | <wikibugs> | ('PS2) ''Urbanecm: Update urbanecm's dotfiles [puppet] - ''https://gerrit.wikimedia.org/r/663993' | 
              
                | 2021-02-15 11:56:00 | <wikibugs> | ('CR) ''Jbond: [C: ''+2] Update urbanecm's dotfiles [puppet] - ''https://gerrit.wikimedia.org/r/663993 (owner: ''Urbanecm)' | 
              
                | 2021-02-15 11:56:21 | <jbond42> | Urbanecm: ^^ merged | 
              
                | 2021-02-15 11:56:24 | <Urbanecm> | thanks jbond42 ! | 
              
                | 2021-02-15 11:56:28 | <jbond42> | :) np | 
              
                | 2021-02-15 11:58:52 | <logmsgbot> | !log ariel@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1007.eqiad.wmnet | 
              
                | 2021-02-15 12:00:05 | <jouncebot> | Amir1, Lucas_WMDE, awight, and Urbanecm: That opportune time is upon us again. Time for a European mid-day backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1200). | 
              
                | 2021-02-15 12:00:05 | <jouncebot> | No GERRIT patches in the queue for this window AFAICS. | 
              
                | 2021-02-15 12:00:14 | <Urbanecm> | I'll deploy regardless | 
              
                | 2021-02-15 12:01:12 | <wikibugs> | ('CR) ''Urbanecm: [C: ''+2] Revert "Revert "Enable SandboxLink at viwiki"" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/663736 (https://phabricator.wikimedia.org/T272796) (owner: ''Urbanecm)' | 
              
                | 2021-02-15 12:02:46 | <wikibugs> | ('Merged) ''jenkins-bot: Revert "Revert "Enable SandboxLink at viwiki"" [mediawiki-config] - ''https://gerrit.wikimedia.org/r/663736 (https://phabricator.wikimedia.org/T272796) (owner: ''Urbanecm)' | 
              
                | 2021-02-15 12:04:02 | <wikibugs> | ('PS1) ''Effie Mouzeli: hiera: install memcached 1.6 on mc1037 [puppet] - ''https://gerrit.wikimedia.org/r/664271 (https://phabricator.wikimedia.org/T270315)' | 
              
                | 2021-02-15 12:06:36 | <wikibugs> | ('CR) ''Jbond: [C: ''+1] "thanks this will also be a big help to me 😊" [puppet] - ''https://gerrit.wikimedia.org/r/664237 (owner: ''Jcrespo)' | 
              
                | 2021-02-15 12:07:47 | <logmsgbot> | !log jmm@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on bast5001.wikimedia.org with reason: REIMAGE | 
              
                | 2021-02-15 12:07:55 | <wikibugs> | ('PS22) ''Kosta Harlan: linkrecommendation: Cron job to load datasets [deployment-charts] - ''https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893)' | 
              
                | 2021-02-15 12:08:54 | <Urbanecm> | can someone check mwdebug1002.eqiad.wmnet status, and remove it from scap if it is still broken (as mutante said in ops list)? | 
              
                | 2021-02-15 12:09:16 | <wikibugs> | ('CR) ''Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1003/28065/mc2037.codfw.wmnet/index.html" [puppet] - ''https://gerrit.wikimedia.org/r/664271 (https://phabricator.wikimedia.org/T270315) (owner: ''Effie Mouzeli)' | 
              
                | 2021-02-15 12:09:47 | <logmsgbot> | !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast5001.wikimedia.org with reason: REIMAGE | 
              
                | 2021-02-15 12:09:59 | <wikibugs> | ('PS2) ''Muehlenhoff: Swift: Stop setting net.ipv4.tcp_tw_recycle for buster and later [puppet] - ''https://gerrit.wikimedia.org/r/662918' | 
              
                | 2021-02-15 12:10:35 | <logmsgbot> | !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 662d5f6af01f6cf6ce7e9d56cf1bc3ba282afee1: Revert "Revert "Enable SandboxLink at viwiki"" (T272796) (duration: 05m 26s) | 
              
                | 2021-02-15 12:10:41 | <Urbanecm> | finally | 
              
                | 2021-02-15 12:11:36 | <wikibugs> | 'SRE, ''SRE-Access-Requests, ''Patch-For-Review: Requesting access to Analytic Cluster for Research Scientist (Paragon) - https://phabricator.wikimedia.org/T274631 (''MoritzMuehlenhoff) Also needs approval by @Ottomata for Hadoop access.' | 
              
                | 2021-02-15 12:13:39 | <wikibugs> | ('CR) ''JMeybohm: [C: ''+1] linkrecommendation: Cron job to load datasets [deployment-charts] - ''https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: ''Kosta Harlan)' | 
              
                | 2021-02-15 12:14:25 | <wikibugs> | ('CR) ''Kosta Harlan: [C: ''+2] linkrecommendation: Cron job to load datasets [deployment-charts] - ''https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: ''Kosta Harlan)' | 
              
                | 2021-02-15 12:15:59 | <wikibugs> | ('Merged) ''jenkins-bot: linkrecommendation: Cron job to load datasets [deployment-charts] - ''https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: ''Kosta Harlan)' | 
              
                | 2021-02-15 12:16:19 | <wikibugs> | ('CR) ''Vgutierrez: [C: ''+1] delete class tlsproxy::prometheus and nginx template [puppet] - ''https://gerrit.wikimedia.org/r/659377 (https://phabricator.wikimedia.org/T272559) (owner: ''Dzahn)' | 
              
                | 2021-02-15 12:16:21 | <wikibugs> | ('PS2) ''Urbanecm: ukwikisource: Finish removal of NS Translations [mediawiki-config] - ''https://gerrit.wikimedia.org/r/664053 (https://phabricator.wikimedia.org/T270628)' | 
              
                | 2021-02-15 12:16:24 | <wikibugs> | ('CR) ''Urbanecm: [C: ''+2] ukwikisource: Finish removal of NS Translations [mediawiki-config] - ''https://gerrit.wikimedia.org/r/664053 (https://phabricator.wikimedia.org/T270628) (owner: ''Urbanecm)' | 
              
                | 2021-02-15 12:17:21 | <wikibugs> | ('Merged) ''jenkins-bot: ukwikisource: Finish removal of NS Translations [mediawiki-config] - ''https://gerrit.wikimedia.org/r/664053 (https://phabricator.wikimedia.org/T270628) (owner: ''Urbanecm)' | 
              
                | 2021-02-15 12:17:27 | <wikibugs> | ('CR) ''Elukey: [C: ''+1] "left a nit for the commit msg, LGTM otherwise!" (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/664271 (https://phabricator.wikimedia.org/T270315) (owner: ''Effie Mouzeli)' | 
              
                | 2021-02-15 12:18:18 | <wikibugs> | ('CR) ''Elukey: [C: ''+1] "Effie can you run a pcc to see if everything looks good?" [puppet] - ''https://gerrit.wikimedia.org/r/663868 (https://phabricator.wikimedia.org/T270315) (owner: ''Effie Mouzeli)' | 
              
                | 2021-02-15 12:18:47 | <logmsgbot> | !log kharlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . | 
              
                | 2021-02-15 12:21:30 | <Urbanecm> | repeating myself: can someone depool mwdebug1002? it's currently down (see mail from dzahn in ops list), but still pooled and thus in scap dsh group :/ | 
              
                | 2021-02-15 12:22:25 | <wikibugs> | 'SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (''MoritzMuehlenhoff) p:''Triage→''Medium' | 
              
                | 2021-02-15 12:23:45 | <wikibugs> | 'SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (''MoritzMuehlenhoff) Adding a few tags for affected sub teams, simply untag when completed' | 
              
                | 2021-02-15 12:24:38 | <wikibugs> | 'SRE, ''Analytics, ''observability, ''serviceops, ''cloud-services-team (Kanban): hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (''MoritzMuehlenhoff)' | 
              
                | 2021-02-15 12:25:33 | <wikibugs> | ('CR) ''Volans: "quick direct reply, will have a pass later" (''1 comment) [puppet] - ''https://gerrit.wikimedia.org/r/663658 (https://phabricator.wikimedia.org/T271583) (owner: ''CRusnov)' | 
              
                | 2021-02-15 12:25:55 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: "Thanks for the review!" (''6 comments) [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 12:30:51 | <wikibugs> | ('PS1) ''JMeybohm: admin: Allow tiller to create batch ressources [deployment-charts] - ''https://gerrit.wikimedia.org/r/664273' | 
              
                | 2021-02-15 12:32:02 | <wikibugs> | ('CR) ''JMeybohm: [V: ''+2 C: ''+2] admin: Allow tiller to create batch ressources [deployment-charts] - ''https://gerrit.wikimedia.org/r/664273 (owner: ''JMeybohm)' | 
              
                | 2021-02-15 12:32:29 | <logmsgbot> | !log jmm@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet | 
              
                | 2021-02-15 12:33:32 | <wikibugs> | ('Merged) ''jenkins-bot: admin: Allow tiller to create batch ressources [deployment-charts] - ''https://gerrit.wikimedia.org/r/664273 (owner: ''JMeybohm)' | 
              
                | 2021-02-15 12:35:00 | <logmsgbot> | !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' . | 
              
                | 2021-02-15 12:35:39 | <logmsgbot> | !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: cdf15981f7c6f7e02a3fb1c1ce61dc14815f216d: ukwikisource: Finish removal of NS Translations (T270628) (duration: 01m 07s) | 
              
                | 2021-02-15 12:36:24 | <wikibugs> | ('PS1) ''Elukey: Add/Fix kerberos fake keytabs [labs/private] - ''https://gerrit.wikimedia.org/r/664274 (https://phabricator.wikimedia.org/T274392)' | 
              
                | 2021-02-15 12:36:46 | <wikibugs> | ('CR) ''Elukey: [V: ''+2 C: ''+2] Add/Fix kerberos fake keytabs [labs/private] - ''https://gerrit.wikimedia.org/r/664274 (https://phabricator.wikimedia.org/T274392) (owner: ''Elukey)' | 
              
                | 2021-02-15 12:37:06 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes - elukey@cumin1001 | 
              
                | 2021-02-15 12:37:32 | <wikibugs> | 'SRE, ''ops-eqiad, ''DC-Ops, ''Wikidata, and 2 others: Upgrade firmware on wdqs1009 - https://phabricator.wikimedia.org/T274751 (''MoritzMuehlenhoff) p:''Triage→''Medium' | 
              
                | 2021-02-15 12:38:28 | <wikibugs> | ('CR) ''David Caro: [C: ''+1] cloudgw: introduce HA by using keepalived/VRRP (''6 comments) [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 12:38:36 | <wikibugs> | ('PS9) ''Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 12:38:38 | <wikibugs> | 'SRE, ''observability, ''serviceops, ''Patch-For-Review, ''cloud-services-team (Kanban): hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (''elukey)' | 
              
                | 2021-02-15 12:39:18 | <logmsgbot> | !log kharlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . | 
              
                | 2021-02-15 12:40:12 | <wikibugs> | ('PS10) ''Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 12:43:59 | <moritzm> | !log reimaging bast4002 to buster | 
              
                | 2021-02-15 12:44:04 | <wikibugs> | ('PS11) ''Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 12:44:09 | <logmsgbot> | !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' . | 
              
                | 2021-02-15 12:44:39 | <icinga-wm> | PROBLEM - etherpad_lite_process_running on etherpad1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org | 
              
                | 2021-02-15 12:44:59 | <icinga-wm> | PROBLEM - etherpad_up reduced availability on alert1001 is CRITICAL: 0 le 0.8 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets | 
              
                | 2021-02-15 12:45:53 | <icinga-wm> | PROBLEM - etherpad.wikimedia.org HTTP on etherpad1002 is CRITICAL: connect to address 10.64.32.178 and port 9001: Connection refused https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org | 
              
                | 2021-02-15 12:46:25 | <icinga-wm> | RECOVERY - etherpad_lite_process_running on etherpad1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org | 
              
                | 2021-02-15 12:47:24 | <wikibugs> | ('PS12) ''Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 12:47:35 | <icinga-wm> | RECOVERY - etherpad.wikimedia.org HTTP on etherpad1002 is OK: HTTP OK: HTTP/1.1 200 OK - 9184 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org | 
              
                | 2021-02-15 12:47:58 | <wikibugs> | ('CR) ''Effie Mouzeli: "> Patch Set 1: Code-Review+1" [puppet] - ''https://gerrit.wikimedia.org/r/663868 (https://phabricator.wikimedia.org/T270315) (owner: ''Effie Mouzeli)' | 
              
                | 2021-02-15 12:48:27 | <icinga-wm> | RECOVERY - etherpad_up reduced availability on alert1001 is OK: (C)0.8 le (W)0.9 le 1 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets | 
              
                | 2021-02-15 12:49:10 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] "PCC https://puppet-compiler.wmflabs.org/compiler1002/28075/" [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 12:49:13 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [V: ''+2 C: ''+2] cloudgw: introduce HA by using keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 12:49:45 | <logmsgbot> | !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1093 T273955', diff saved to https://phabricator.wikimedia.org/P14359 and previous config saved to /var/cache/conftool/dbconfig/20210215-124944-marostegui.json | 
              
                | 2021-02-15 12:50:24 | <wikibugs> | ('PS2) ''David Caro: utils: add script to run docker ci tests locally [software/spicerack] - ''https://gerrit.wikimedia.org/r/663205 (https://phabricator.wikimedia.org/T274338)' | 
              
                | 2021-02-15 12:50:27 | <wikibugs> | ('PS1) ''Marostegui: db1093: Disable notifications [puppet] - ''https://gerrit.wikimedia.org/r/664276 (https://phabricator.wikimedia.org/T273955)' | 
              
                | 2021-02-15 12:50:50 | <logmsgbot> | !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' . | 
              
                | 2021-02-15 12:51:16 | <wikibugs> | ('CR) ''Marostegui: [C: ''+2] db1093: Disable notifications [puppet] - ''https://gerrit.wikimedia.org/r/664276 (https://phabricator.wikimedia.org/T273955) (owner: ''Marostegui)' | 
              
                | 2021-02-15 12:58:16 | <logmsgbot> | !log kharlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' . | 
              
                | 2021-02-15 12:58:16 | <logmsgbot> | !log kharlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' . | 
              
                | 2021-02-15 12:58:19 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 12:58:22 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:01:10 | <Lucas_WMDE> | we lost a whole bunch of SAL messages because stashbot was out | 
              
                | 2021-02-15 13:01:12 | <logmsgbot> | !log jmm@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on bast4002.wikimedia.org with reason: REIMAGE | 
              
                | 2021-02-15 13:01:15 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:01:21 | <Lucas_WMDE> | is it worth repeating them all? | 
              
                | 2021-02-15 13:01:49 | <Lucas_WMDE> | cc marostegui, ryankemper, ariel, elukey… | 
              
                | 2021-02-15 13:02:04 | <marostegui> | Lucas_WMDE: not from my side, thanks though! :) | 
              
                | 2021-02-15 13:02:10 | <Lucas_WMDE> | ok | 
              
                | 2021-02-15 13:02:26 | <Lucas_WMDE> | sometimes I do it but this seems to be almost 50 missed messages and I’m lazy :D | 
              
                | 2021-02-15 13:02:41 | <logmsgbot> | !log kharlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' . | 
              
                | 2021-02-15 13:02:41 | <logmsgbot> | !log kharlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' . | 
              
                | 2021-02-15 13:02:44 | <Lucas_WMDE> | (they’re all in the IRC log) | 
              
                | 2021-02-15 13:02:44 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:02:48 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:02:50 | <wikibugs> | 'SRE, ''DBA, ''Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (''Marostegui) db1162 is fully pooled' | 
              
                | 2021-02-15 13:03:18 | <logmsgbot> | !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4002.wikimedia.org with reason: REIMAGE | 
              
                | 2021-02-15 13:03:21 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:05:58 | <Lucas_WMDE> | !log notice: stashbot had issues between 8:19 and 12:50, see for https://wm-bot.wmflabs.org/browser/index.php?start=02%2F15%2F2021&end=02%2F15%2F2021&display=%23wikimedia-operations for missed !log messages | 
              
                | 2021-02-15 13:06:01 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:06:54 | <godog> | !log swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836 | 
              
                | 2021-02-15 13:06:57 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:06:58 | <stashbot> | T272836: Decom ms-be[1019-1026] from swift - https://phabricator.wikimedia.org/T272836 | 
              
                | 2021-02-15 13:14:05 | <wikibugs> | ('PS1) ''JMeybohm: linkrecommendation: Read DB_USER from public config [deployment-charts] - ''https://gerrit.wikimedia.org/r/664277 (https://phabricator.wikimedia.org/T265893)' | 
              
                | 2021-02-15 13:14:16 | <jayme> | ^ kostajh | 
              
                | 2021-02-15 13:14:58 | <wikibugs> | ('CR) ''Kosta Harlan: [C: ''+2] linkrecommendation: Read DB_USER from public config [deployment-charts] - ''https://gerrit.wikimedia.org/r/664277 (https://phabricator.wikimedia.org/T265893) (owner: ''JMeybohm)' | 
              
                | 2021-02-15 13:15:30 | <kostajh> | jayme: cheers | 
              
                | 2021-02-15 13:17:35 | <wikibugs> | ('Merged) ''jenkins-bot: linkrecommendation: Read DB_USER from public config [deployment-charts] - ''https://gerrit.wikimedia.org/r/664277 (https://phabricator.wikimedia.org/T265893) (owner: ''JMeybohm)' | 
              
                | 2021-02-15 13:19:28 | <logmsgbot> | !log kharlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . | 
              
                | 2021-02-15 13:19:36 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:21:47 | <wikibugs> | ('PS4) ''Hnowlan: mtail: create separate metrics histogram based on endpoint [puppet] - ''https://gerrit.wikimedia.org/r/634207 (https://phabricator.wikimedia.org/T263727)' | 
              
                | 2021-02-15 13:22:04 | <wikibugs> | ('CR) ''Hnowlan: [V: ''+2 C: ''+2] tegola: Add docker image. [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/654662 (https://phabricator.wikimedia.org/T270170) (owner: ''Hnowlan)' | 
              
                | 2021-02-15 13:28:57 | <wikibugs> | ('CR) ''Alexandros Kosiaris: "Shouldn't this instead be done via the pipeline? It would greatly decouple upgrading tegola from requiring an SRE to build newer versions " [docker-images/production-images] - ''https://gerrit.wikimedia.org/r/654662 (https://phabricator.wikimedia.org/T270170) (owner: ''Hnowlan)' | 
              
                | 2021-02-15 13:33:36 | <marostegui> | !log Stop MySQL on db1093 - T273955 | 
              
                | 2021-02-15 13:33:39 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:33:41 | <stashbot> | T273955: decommission db1093.eqiad.wmnet - https://phabricator.wikimedia.org/T273955 | 
              
                | 2021-02-15 13:34:02 | <wikibugs> | ('PS5) ''Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - ''https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: ''Ladsgroup)' | 
              
                | 2021-02-15 13:34:39 | <wikibugs> | ('CR) ''jerkins-bot: [V: ''-1] Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - ''https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: ''Ladsgroup)' | 
              
                | 2021-02-15 13:38:10 | <moritzm> | !log installing subversion security updates | 
              
                | 2021-02-15 13:38:14 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:41:38 | <wikibugs> | ('PS6) ''Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - ''https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: ''Ladsgroup)' | 
              
                | 2021-02-15 13:43:11 | <icinga-wm> | RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 13:47:55 | <wikibugs> | ('PS2) ''Muehlenhoff: admin: Add christinedk user [puppet] - ''https://gerrit.wikimedia.org/r/664226 (https://phabricator.wikimedia.org/T274304) (owner: ''Vgutierrez)' | 
              
                | 2021-02-15 13:48:03 | <wikibugs> | ('PS7) ''Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - ''https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: ''Ladsgroup)' | 
              
                | 2021-02-15 13:48:13 | <wikibugs> | ('CR) ''jerkins-bot: [V: ''-1] Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - ''https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: ''Ladsgroup)' | 
              
                | 2021-02-15 13:53:00 | <logmsgbot> | !log gehel@cumin2001 START - Cookbook sre.wdqs.data-reload | 
              
                | 2021-02-15 13:53:03 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 13:57:13 | <moritzm> | !log installing libonig security update for stretch | 
              
                | 2021-02-15 13:57:16 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 14:08:09 | <godog> | !log swift eqiad-prod: add weight back to sdg on ms-be1054 - T273582 | 
              
                | 2021-02-15 14:08:14 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 14:08:15 | <stashbot> | T273582: Put sdg1 on ms-be1054 back in service - https://phabricator.wikimedia.org/T273582 | 
              
                | 2021-02-15 14:10:43 | <wikibugs> | 'SRE, ''SRE-swift-storage, ''Patch-For-Review, ''User-fgiunchedi: swift backend decomms / rebalances are noisy - https://phabricator.wikimedia.org/T221904 (''fgiunchedi) ''Open→''Resolved I'm boldly resolving this again since limiting memory usage for object replication processes helped a whole lot to...' | 
              
                | 2021-02-15 14:12:42 | <wikibugs> | ('PS1) ''Urbanecm: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - ''https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789)' | 
              
                | 2021-02-15 14:13:04 | <Urbanecm> | jouncebot: now | 
              
                | 2021-02-15 14:13:05 | <jouncebot> | No deployments scheduled for the next 3 hour(s) and 46 minute(s) | 
              
                | 2021-02-15 14:13:15 | <wikibugs> | ('PS2) ''Urbanecm: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - ''https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789)' | 
              
                | 2021-02-15 14:13:18 | <wikibugs> | ('CR) ''Urbanecm: [C: ''+2] Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - ''https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789) (owner: ''Urbanecm)' | 
              
                | 2021-02-15 14:14:07 | <wikibugs> | ('Merged) ''jenkins-bot: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - ''https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789) (owner: ''Urbanecm)' | 
              
                | 2021-02-15 14:17:02 | <logmsgbot> | !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 00905c4a7e4bb69f39e52e1c4d4d6168006b0e7b: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T274789) (duration: 01m 09s) | 
              
                | 2021-02-15 14:17:06 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 14:17:07 | <stashbot> | T274789: Add <https://static.president.az/> to the wgCopyUploadsDomains allowlist of Wikimedia Commons - https://phabricator.wikimedia.org/T274789 | 
              
                | 2021-02-15 14:19:43 | <icinga-wm> | PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 14:23:44 | <wikibugs> | ('PS8) ''Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - ''https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: ''Ladsgroup)' | 
              
                | 2021-02-15 14:25:33 | <icinga-wm> | RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 14:28:37 | <wikibugs> | ('CR) ''David Caro: utils: add script to run docker ci tests locally (''3 comments) [software/spicerack] - ''https://gerrit.wikimedia.org/r/663205 (https://phabricator.wikimedia.org/T274338) (owner: ''David Caro)' | 
              
                | 2021-02-15 14:31:40 | <wikibugs> | ('CR) ''Jbond: [C: ''+2] Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - ''https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: ''Ladsgroup)' | 
              
                | 2021-02-15 14:34:23 | <wikibugs> | 'SRE, ''Maps, ''Product-Infrastructure-Team-Backlog, ''Services, ''Service-deployment-requests: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (''MoritzMuehlenhoff) p:''Triage→''Medium' | 
              
                | 2021-02-15 14:34:33 | <wikibugs> | 'SRE, ''Maps, ''Product-Infrastructure-Team-Backlog, ''Services, ''Service-deployment-requests: [DRAFT] New Service Request tegola - https://phabricator.wikimedia.org/T274390 (''MoritzMuehlenhoff) p:''Triage→''Medium' | 
              
                | 2021-02-15 14:45:09 | <icinga-wm> | RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 14:48:25 | <wikibugs> | ('PS1) ''Jbond: Gemfile: increase dependency for wmf_style-stylegude-check [puppet] - ''https://gerrit.wikimedia.org/r/664297 (https://phabricator.wikimedia.org/T209953)' | 
              
                | 2021-02-15 15:04:50 | <godog> | !log upgrade grafana to 7.4.1 on grafana1002 - T263747 | 
              
                | 2021-02-15 15:04:54 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:04:55 | <stashbot> | T263747: Upgrade Grafana to 7.4 - https://phabricator.wikimedia.org/T263747 | 
              
                | 2021-02-15 15:06:15 | <wikibugs> | ('CR) ''Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: ''Hnowlan)' | 
              
                | 2021-02-15 15:06:27 | <wikibugs> | 'SRE, ''SRE-Access-Requests: Requesting access to stat boxes for mlitn - https://phabricator.wikimedia.org/T274749 (''MoritzMuehlenhoff) Also adding @Ottomata for approval for analytics-privatedata-users.' | 
              
                | 2021-02-15 15:09:46 | <moritzm> | !log reimaging bast3004 to buster | 
              
                | 2021-02-15 15:09:49 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:15:06 | <wikibugs> | ('PS1) ''Bartosz Dziewoński: CommentFormatter: Fix problems with editsection and quotes [extensions/DiscussionTools] (wmf/1.36.0-wmf.30) - ''https://gerrit.wikimedia.org/r/664254 (https://phabricator.wikimedia.org/T274709)' | 
              
                | 2021-02-15 15:17:18 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet | 
              
                | 2021-02-15 15:17:21 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:17:21 | <wikibugs> | ('CR) ''Jbond: "did a quick pass however im not that familiar with the current decom cook book" (''7 comments) [cookbooks] - ''https://gerrit.wikimedia.org/r/663878 (owner: ''Elukey)' | 
              
                | 2021-02-15 15:20:05 | <wikibugs> | ('PS1) ''Kormat: integration_env: Rework cli to simplify operations [software/wmfmariadbpy] - ''https://gerrit.wikimedia.org/r/664300' | 
              
                | 2021-02-15 15:20:10 | <icinga-wm> | PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server | 
              
                | 2021-02-15 15:27:56 | <wikibugs> | ('CR) ''Hashar: [C: ''+1] "Can be merged anytime, the CI job always does a gem update :]" [puppet] - ''https://gerrit.wikimedia.org/r/664297 (https://phabricator.wikimedia.org/T209953) (owner: ''Jbond)' | 
              
                | 2021-02-15 15:28:49 | <wikibugs> | ('CR) ''Jbond: [C: ''+2] Gemfile: increase dependency for wmf_style-stylegude-check [puppet] - ''https://gerrit.wikimedia.org/r/664297 (https://phabricator.wikimedia.org/T209953) (owner: ''Jbond)' | 
              
                | 2021-02-15 15:30:19 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet | 
              
                | 2021-02-15 15:30:23 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:31:21 | <icinga-wm> | RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5012 is OK: HTTP OK: HTTP/1.0 200 OK - 23547 bytes in 0.829 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server | 
              
                | 2021-02-15 15:33:08 | <moritzm> | !log installing linux-4.19 update for Stretch on servers which have it installed (no reboots, just updating the kernels) | 
              
                | 2021-02-15 15:33:12 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:33:35 | <wikibugs> | ('CR) ''Kormat: [C: ''+2] integration_env: Rework cli to simplify operations [software/wmfmariadbpy] - ''https://gerrit.wikimedia.org/r/664300 (owner: ''Kormat)' | 
              
                | 2021-02-15 15:34:16 | <logmsgbot> | !log jmm@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on bast3004.wikimedia.org with reason: REIMAGE | 
              
                | 2021-02-15 15:34:20 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:34:30 | <wikibugs> | ('CR) ''Jcrespo: [C: ''+2] Preventive commit for jynus to misspell "bullseye", next Debian version [puppet] - ''https://gerrit.wikimedia.org/r/664237 (owner: ''Jcrespo)' | 
              
                | 2021-02-15 15:36:11 | <logmsgbot> | !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' . | 
              
                | 2021-02-15 15:36:12 | <logmsgbot> | !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' . | 
              
                | 2021-02-15 15:36:12 | <logmsgbot> | !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . | 
              
                | 2021-02-15 15:36:15 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:36:18 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:36:20 | <logmsgbot> | !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3004.wikimedia.org with reason: REIMAGE | 
              
                | 2021-02-15 15:36:22 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:36:25 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:36:46 | <wikibugs> | ('PS1) ''Jcrespo: testing test at test at testing [puppet] - ''https://gerrit.wikimedia.org/r/664301' | 
              
                | 2021-02-15 15:36:54 | <wikibugs> | ('Merged) ''jenkins-bot: integration_env: Rework cli to simplify operations [software/wmfmariadbpy] - ''https://gerrit.wikimedia.org/r/664300 (owner: ''Kormat)' | 
              
                | 2021-02-15 15:38:02 | <wikibugs> | ('CR) ''jerkins-bot: [V: ''-1] testing test at test at testing [puppet] - ''https://gerrit.wikimedia.org/r/664301 (owner: ''Jcrespo)' | 
              
                | 2021-02-15 15:38:36 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet | 
              
                | 2021-02-15 15:38:39 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:38:49 | <wikibugs> | ('CR) ''Jcrespo: "16:37:55 Typo found!" [puppet] - ''https://gerrit.wikimedia.org/r/664301 (owner: ''Jcrespo)' | 
              
                | 2021-02-15 15:39:13 | <wikibugs> | ('Abandoned) ''Jcrespo: testing test at test at testing [puppet] - ''https://gerrit.wikimedia.org/r/664301 (owner: ''Jcrespo)' | 
              
                | 2021-02-15 15:39:46 | <wikibugs> | ('CR) ''Alexandros Kosiaris: [C: ''-1] "1 pedantic comment but perhaps we can solve this more easily, see inline." (''2 comments) [deployment-charts] - ''https://gerrit.wikimedia.org/r/659863 (owner: ''JMeybohm)' | 
              
                | 2021-02-15 15:39:52 | <wikibugs> | 'SRE: reprepro unable to run checkupdate and import upgraded packages - https://phabricator.wikimedia.org/T274797 (''fgiunchedi)' | 
              
                | 2021-02-15 15:40:39 | <wikibugs> | ('PS1) ''Elukey: hadoop: update the HDFS Namenode rack configuration [puppet] - ''https://gerrit.wikimedia.org/r/664302 (https://phabricator.wikimedia.org/T274795)' | 
              
                | 2021-02-15 15:41:13 | <wikibugs> | 'SRE: reprepro unable to run checkupdate and import upgraded packages - https://phabricator.wikimedia.org/T274797 (''fgiunchedi)' | 
              
                | 2021-02-15 15:44:52 | <wikibugs> | ('CR) ''Alexandros Kosiaris: "+1, but perhaps we don't even need it? See dependent commit" [deployment-charts] - ''https://gerrit.wikimedia.org/r/659864 (owner: ''JMeybohm)' | 
              
                | 2021-02-15 15:45:07 | <wikibugs> | ('PS1) ''Muehlenhoff: Add a comment to the snapshot block [puppet] - ''https://gerrit.wikimedia.org/r/664303' | 
              
                | 2021-02-15 15:46:19 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet | 
              
                | 2021-02-15 15:46:21 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:46:44 | <wikibugs> | ('PS1) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - ''https://gerrit.wikimedia.org/r/664255' | 
              
                | 2021-02-15 15:46:53 | <wikibugs> | ('PS2) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - ''https://gerrit.wikimedia.org/r/664255' | 
              
                | 2021-02-15 15:47:25 | <wikibugs> | ('PS3) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - ''https://gerrit.wikimedia.org/r/664255 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 15:47:45 | <wikibugs> | ('PS2) ''Elukey: hadoop: update the HDFS Namenode rack configuration [puppet] - ''https://gerrit.wikimedia.org/r/664302 (https://phabricator.wikimedia.org/T274795)' | 
              
                | 2021-02-15 15:48:09 | <logmsgbot> | !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . | 
              
                | 2021-02-15 15:48:12 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:48:54 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet | 
              
                | 2021-02-15 15:48:56 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:49:03 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - ''https://gerrit.wikimedia.org/r/664255 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 15:50:13 | <wikibugs> | ('PS1) ''Muehlenhoff: Remove obsolete cloudera config from reprepro [puppet] - ''https://gerrit.wikimedia.org/r/664304 (https://phabricator.wikimedia.org/T274797)' | 
              
                | 2021-02-15 15:50:56 | <wikibugs> | ('CR) ''Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (''1 comment) [deployment-charts] - ''https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: ''Hnowlan)' | 
              
                | 2021-02-15 15:51:26 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet | 
              
                | 2021-02-15 15:51:26 | <wikibugs> | ('PS1) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - ''https://gerrit.wikimedia.org/r/664256' | 
              
                | 2021-02-15 15:51:29 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:51:39 | <wikibugs> | ('PS2) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - ''https://gerrit.wikimedia.org/r/664256 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 15:51:47 | <wikibugs> | ('PS3) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - ''https://gerrit.wikimedia.org/r/664256 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 15:52:15 | <wikibugs> | 'SRE, ''Patch-For-Review: reprepro unable to run checkupdate and import upgraded packages - https://phabricator.wikimedia.org/T274797 (''fgiunchedi) Note that the elastic 5 "not found" errors seem flappy, I just got a `checkupdate` run without those errors' | 
              
                | 2021-02-15 15:53:19 | <wikibugs> | ('PS1) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - ''https://gerrit.wikimedia.org/r/664257' | 
              
                | 2021-02-15 15:53:26 | <wikibugs> | ('PS2) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - ''https://gerrit.wikimedia.org/r/664257' | 
              
                | 2021-02-15 15:53:34 | <logmsgbot> | !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet | 
              
                | 2021-02-15 15:53:36 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:53:37 | <wikibugs> | ('PS3) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - ''https://gerrit.wikimedia.org/r/664257 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 15:53:49 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - ''https://gerrit.wikimedia.org/r/664256 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 15:53:57 | <wikibugs> | ('CR) ''Filippo Giunchedi: [C: ''+1] Add a comment to the snapshot block [puppet] - ''https://gerrit.wikimedia.org/r/664303 (owner: ''Muehlenhoff)' | 
              
                | 2021-02-15 15:57:26 | <wikibugs> | ('PS4) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" This reverts commit 5ca98c9df08f6c6e2d97bc7b6279cdaf573eddce. Reason for revert: rebuilding the cloudgw setup Bug: T272963 Change-Id: I8185f4fa36a70255940d78db45b0f50cfc6abb98 Signed-off-by: Arturo Borrero Gonzalez <aborrero@wikimedia.org> [puppet] - ''https://gerrit.wikimedia.org/r/664257 (https://phabricator.wi' | 
              
                | 2021-02-15 15:58:00 | <logmsgbot> | !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet | 
              
                | 2021-02-15 15:58:03 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 15:58:12 | <wikibugs> | ('PS5) ''Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - ''https://gerrit.wikimedia.org/r/664257 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 15:58:20 | <wikibugs> | 'SRE, ''SRE-tools, ''User-Joe: Covert deploy_apache_change.sh to a spicerack cookbook - https://phabricator.wikimedia.org/T203948 (''jijiki)' | 
              
                | 2021-02-15 16:02:38 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - ''https://gerrit.wikimedia.org/r/664257 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 16:04:06 | <wikibugs> | ('CR) ''Volans: "Thanks for the refactor, some comments inline, some already discussed over IRC." (''14 comments) [software/spicerack] - ''https://gerrit.wikimedia.org/r/661921 (https://phabricator.wikimedia.org/T267412) (owner: ''David Caro)' | 
              
                | 2021-02-15 16:04:51 | <wikibugs> | 'SRE, ''ops-eqiad, ''DC-Ops, ''Wikidata, and 3 others: Upgrade firmware on wdqs1009 - https://phabricator.wikimedia.org/T274751 (''Gehel)' | 
              
                | 2021-02-15 16:05:18 | <logmsgbot> | !log aborrero@cumin2001 START - Cookbook sre.hosts.reboot-single for host cloudnet2003-dev.codfw.wmnet | 
              
                | 2021-02-15 16:05:21 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:05:56 | <wikibugs> | 'SRE: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (''MoritzMuehlenhoff)' | 
              
                | 2021-02-15 16:07:37 | <logmsgbot> | !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet | 
              
                | 2021-02-15 16:07:40 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:09:49 | <logmsgbot> | !log aborrero@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2003-dev.codfw.wmnet | 
              
                | 2021-02-15 16:09:52 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:10:23 | <icinga-wm> | PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) is CRITICAL: Test Zotero and citoid alive returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid | 
              
                | 2021-02-15 16:11:29 | <wikibugs> | ('PS1) ''Arturo Borrero Gonzalez: cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/664307 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 16:11:55 | <icinga-wm> | RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid | 
              
                | 2021-02-15 16:12:12 | <logmsgbot> | !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet | 
              
                | 2021-02-15 16:12:16 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:12:57 | <logmsgbot> | !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet | 
              
                | 2021-02-15 16:13:01 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:14:35 | <hoo> | !log Updated the Wikidata property suggester with data from the 2021-02-01 JSON dump (with pre-applied T132839 workarounds) | 
              
                | 2021-02-15 16:14:38 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:14:40 | <stashbot> | T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 | 
              
                | 2021-02-15 16:16:34 | <wikibugs> | ('PS2) ''Arturo Borrero Gonzalez: cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/664307 (https://phabricator.wikimedia.org/T272963)' | 
              
                | 2021-02-15 16:18:20 | <wikibugs> | ('CR) ''Arturo Borrero Gonzalez: [C: ''+2] cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP [puppet] - ''https://gerrit.wikimedia.org/r/664307 (https://phabricator.wikimedia.org/T272963) (owner: ''Arturo Borrero Gonzalez)' | 
              
                | 2021-02-15 16:18:35 | <logmsgbot> | !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet | 
              
                | 2021-02-15 16:18:39 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:22:08 | <wikibugs> | (CR) Muehlenhoff: [C: +2] Add a comment to the snapshot block [puppet] - https://gerrit.wikimedia.org/r/664303 (owner: Muehlenhoff) | 
              
                | 2021-02-15 16:22:14 | <logmsgbot> | !log jmm@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet | 
              
                | 2021-02-15 16:22:17 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:24:53 | <wikibugs> | SRE: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (Volans) p:Triage→High a:Volans | 
              
                | 2021-02-15 16:25:11 | <wikibugs> | (PS1) Volans: interface automation: fix typo in method name [software/netbox-extras] - https://gerrit.wikimedia.org/r/664308 (https://phabricator.wikimedia.org/T274802) | 
              
                | 2021-02-15 16:26:03 | <jayme> | !log rolled back linkrecommendation helm releases to the most recent revision running chart verion linkrecommendation-0.0.4 on clusters codfw and eqiad (cc: kostajh) | 
              
                | 2021-02-15 16:26:05 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:27:09 | <logmsgbot> | !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage1001.eqiad.wmnet | 
              
                | 2021-02-15 16:27:13 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:28:09 | <wikibugs> | (CR) Volans: [C: +2] "self merging as it's just a typo, will run the script against bast3004 manually to verify it" [software/netbox-extras] - https://gerrit.wikimedia.org/r/664308 (https://phabricator.wikimedia.org/T274802) (owner: Volans) | 
              
                | 2021-02-15 16:32:38 | <logmsgbot> | !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1001.eqiad.wmnet | 
              
                | 2021-02-15 16:32:43 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:33:48 | <volans> | !log restarted netbox on netbox1001 | 
              
                | 2021-02-15 16:33:51 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:36:18 | <icinga-wm> | PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 16:36:42 | <wikibugs> | (PS1) Volans: interface automation: fix typo in method name (2) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802) | 
              
                | 2021-02-15 16:37:12 | <volans> | mmmh icinga, are you sure? it's all good there, it was me and was already fixed | 
              
                | 2021-02-15 16:37:20 | <wikibugs> | (CR) jerkins-bot: [V: -1] interface automation: fix typo in method name (2) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802) (owner: Volans) | 
              
                | 2021-02-15 16:37:56 | <wikibugs> | (PS2) Volans: interface automation: fix typo in method name (2) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802) | 
              
                | 2021-02-15 16:39:57 | <logmsgbot> | !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage1002.eqiad.wmnet | 
              
                | 2021-02-15 16:40:00 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:40:06 | <wikibugs> | (CR) Volans: [C: +2] "Typo fix." [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802) (owner: Volans) | 
              
                | 2021-02-15 16:40:14 | <wikibugs> | (PS1) Kosta Harlan: linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893) | 
              
                | 2021-02-15 16:40:14 | <icinga-wm> | PROBLEM - kubelet operational latencies on kubestage1001 is CRITICAL: instance=kubestage1001.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 | 
              
                | 2021-02-15 16:40:45 | <jayme> | ^ thats "expected" (kind of) from reboots | 
              
                | 2021-02-15 16:41:29 | <wikibugs> | (CR) jerkins-bot: [V: -1] linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan) | 
              
                | 2021-02-15 16:41:40 | <icinga-wm> | RECOVERY - kubelet operational latencies on kubestage1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 | 
              
                | 2021-02-15 16:43:00 | <wikibugs> | (PS2) Kosta Harlan: linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893) | 
              
                | 2021-02-15 16:43:18 | <icinga-wm> | RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 16:44:44 | <wikibugs> | SRE, CAS-SSO, Patch-For-Review: Investigate CAS Session logout - https://phabricator.wikimedia.org/T273867 (Gehel) Removing discovery-search, if you need our help again, please ping us! | 
              
                | 2021-02-15 16:46:44 | <logmsgbot> | !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1002.eqiad.wmnet | 
              
                | 2021-02-15 16:46:49 | <stashbot> | Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 
              
                | 2021-02-15 16:48:30 | <icinga-wm> | PROBLEM - Check systemd state on netbox2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 16:48:50 | <wikibugs> | SRE, Patch-For-Review: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (Volans) a:Volans→crusnov @crusnov passing it over to you. I've fixed the basic typos, but the problem now is that the scri... | 
              
                | 2021-02-15 16:49:43 | <wikibugs> | (PS1) Arturo Borrero Gonzalez: cloudgw: switch data place interface config modes to manual [puppet] - https://gerrit.wikimedia.org/r/664311 (https://phabricator.wikimedia.org/T272963) | 
              
                | 2021-02-15 16:49:51 | <wikibugs> | SRE, Patch-For-Review: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (crusnov) That seems reasonable, I'll look at it and get a patch out soonish. | 
              
                | 2021-02-15 16:52:45 | <wikibugs> | (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: switch data place interface config modes to manual [puppet] - https://gerrit.wikimedia.org/r/664311 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez) | 
              
                | 2021-02-15 16:53:09 | <icinga-wm> | PROBLEM - kubelet operational latencies on kubestage1002 is CRITICAL: instance=kubestage1002.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 | 
              
                | 2021-02-15 16:57:37 | <icinga-wm> | RECOVERY - kubelet operational latencies on kubestage1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 | 
              
                | 2021-02-15 17:00:58 | <wikibugs> | SRE, Maps, Product-Infrastructure-Team-Backlog, Services, Service-deployment-requests: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (akosiaris) Thanks for this task! So I 've studied the diagrams a bit, they are helpful. The deployment pipeline definitely suppor... | 
              
                | 2021-02-15 17:03:18 | <wikibugs> | (CR) Elukey: [C: +1] "Just to confirm - this will keep the cloudera components but clear all the pull-specific bits. If so, big +1, thanks :)" [puppet] - https://gerrit.wikimedia.org/r/664304 (https://phabricator.wikimedia.org/T274797) (owner: Muehlenhoff) | 
              
                | 2021-02-15 17:16:13 | <wikibugs> | (CR) Elukey: "John thanks a lot for the review! For this particular use case, I'd prefer to just move the existing code base to the class api and then m" [cookbooks] - https://gerrit.wikimedia.org/r/663878 (owner: Elukey) | 
              
                | 2021-02-15 17:27:06 | <wikibugs> | (CR) Elukey: [C: +2] hadoop: update the HDFS Namenode rack configuration [puppet] - https://gerrit.wikimedia.org/r/664302 (https://phabricator.wikimedia.org/T274795) (owner: Elukey) | 
              
                | 2021-02-15 17:28:16 | <wikibugs> | (PS1) Jcrespo: configcluster: Enable etcd v3 backups for stretch hosts [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573) | 
              
                | 2021-02-15 17:28:18 | <wikibugs> | (PS1) Jcrespo: bacula: Revert TLS 1.0 downgrade on storage servers (including director) [puppet] - https://gerrit.wikimedia.org/r/664314 (https://phabricator.wikimedia.org/T273182) | 
              
                | 2021-02-15 17:29:54 | <wikibugs> | (Abandoned) Jcrespo: jessie: Remove old openssl override after revert to package version [puppet] - https://gerrit.wikimedia.org/r/660857 (https://phabricator.wikimedia.org/T273182) (owner: Jcrespo) | 
              
                | 2021-02-15 17:30:04 | <wikibugs> | (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan) | 
              
                | 2021-02-15 17:32:07 | <wikibugs> | (CR) JMeybohm: [C: +1] linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan) | 
              
                | 2021-02-15 17:32:43 | <wikibugs> | (PS10) David Caro: toolforge.etcdctl: add new etcdctl module [software/spicerack] - https://gerrit.wikimedia.org/r/661921 (https://phabricator.wikimedia.org/T267412) | 
              
                | 2021-02-15 17:32:43 | <icinga-wm> | RECOVERY - Check systemd state on netbox2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state | 
              
                | 2021-02-15 17:33:16 | <wikibugs> | (CR) David Caro: "Done all the changes as requested" (13 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/661921 (https://phabricator.wikimedia.org/T267412) (owner: David Caro) | 
              
                | 2021-02-15 17:39:15 | <wikibugs> | (CR) Jcrespo: "Have you tested backups with the script on etcd3? I don't see anything, like a path, completely wrong, but I don't know enough about what " [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573) (owner: Jcrespo) | 
              
                | 2021-02-15 17:41:17 | <wikibugs> | SRE, serviceops, Patch-For-Review: upgrade conf2* servers to stretch - https://phabricator.wikimedia.org/T271573 (jcrespo) I've sent: https://gerrit.wikimedia.org/r/c/operations/puppet/+/664313 Independently of the pace of upgrading, we should give some priority to generating fresh backups from the... | 
              
                | 2021-02-15 17:43:56 | <wikibugs> | (PS2) Jcrespo: configcluster: Enable etcd v3 backups for stretch hosts [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573) | 
              
                | 2021-02-15 17:44:23 | <wikibugs> | (PS3) Jcrespo: configcluster: Enable etcd v3 backups for stretch hosts [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573) | 
              
                | 2021-02-15 17:55:42 | <wikibugs> | (PS1) Arturo Borrero Gonzalez: cloudgw: interfaces: relax check on routing setup by using 'onlink' [puppet] - https://gerrit.wikimedia.org/r/664317 (https://phabricator.wikimedia.org/T272963) | 
              
                | 2021-02-15 17:57:40 | <wikibugs> | (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: interfaces: relax check on routing setup by using 'onlink' [puppet] - https://gerrit.wikimedia.org/r/664317 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez) | 
              
                | 2021-02-15 17:59:36 | <wikibugs> | (CR) Muehlenhoff: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/664304 (https://phabricator.wikimedia.org/T274797) (owner: Muehlenhoff) | 
              
                | 2021-02-15 18:00:04 | <jouncebot> | ryankemper: Dear deployers, time to do the Wikidata Query Service weekly deploy deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1800). | 
              
                | 2021-02-15 18:05:14 | <wikibugs> | (CR) Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan) | 
              
                | 2021-02-15 18:10:38 | <wikibugs> | (CR) Jbond: [C: +1] "> Patch Set 1:" [cookbooks] - https://gerrit.wikimedia.org/r/663878 (owner: Elukey) | 
              
                | 2021-02-15 18:14:52 | <wikibugs> | SRE, DBA, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo) | 
              
                | 2021-02-15 18:15:15 | <wikibugs> | SRE, Data-Persistence-Backup, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo) | 
              
                | 2021-02-15 18:15:40 | <wikibugs> | (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan) | 
              
                | 2021-02-15 18:15:41 | <wikibugs> | SRE, Data-Persistence-Backup, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo) Open→Resolved Regarding the last 2 points, we have, in a way, done the last point "parametrize better the jobdefaults i... | 
              
                | 2021-02-15 18:17:39 | <icinga-wm> | PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes | 
              
                | 2021-02-15 18:28:38 | <wikibugs> | (PS1) Effie Mouzeli: (WIP) mediawiki::alerts add alert when 20% of servers is saturated [puppet] - https://gerrit.wikimedia.org/r/664319 (https://phabricator.wikimedia.org/T267176) | 
              
                | 2021-02-15 18:33:52 | <wikibugs> | (CR) Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan) | 
              
                | 2021-02-15 18:41:27 | <icinga-wm> | PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw | 
              
                | 2021-02-15 18:41:47 | <icinga-wm> | PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad | 
              
                | 2021-02-15 18:45:40 | <jynus> | that looks like DPLA bot on commons | 
              
                | 2021-02-15 18:46:29 | <jynus> | I see no issues, but keep an eye in case something degrades (thumbail generation, codfw s4 replication, etc.) | 
              
                | 2021-02-15 18:47:54 | <jynus> | that's 10 1MB files per second | 
              
                | 2021-02-15 18:48:16 | <tabbycat> | jynus: swift is TimedMediaHandler or just the place where uploads are being stored? | 
              
                | 2021-02-15 18:49:21 | <jynus> | swift is our OpenStack Swift cluster, our backend storage for media and rendered stuff: https://wikitech.wikimedia.org/wiki/Swift | 
              
                | 2021-02-15 18:49:59 | <jynus> | the alert is just a warning on a high rate of uploads- that doesn't mean there is a problem, but it is an unusual state | 
              
                | 2021-02-15 18:50:23 | <jynus> | normally we worry when it is very low, because it means there is a problem with uploads | 
              
                | 2021-02-15 19:00:04 | <jouncebot> | RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1900). Please do the needful. | 
              
                | 2021-02-15 19:00:04 | <jouncebot> | No GERRIT patches in the queue for this window AFAICS. | 
              
                | 2021-02-15 19:00:56 | <Urbanecm> | jynus: do we want to do T248177? | 
              
                | 2021-02-15 19:00:56 | <stashbot> | T248177: Enforce upload rate limits for bots on commons - https://phabricator.wikimedia.org/T248177 | 
              
                | 2021-02-15 19:01:29 | <Urbanecm> | (but 999 uploads per second is effectively no rate limit anyway :/ ) | 
              
                | 2021-02-15 19:02:09 | <tabbycat> | 999/s is o_O | 
              
                | 2021-02-15 19:03:32 | <tabbycat> | IIRC there is/was an UploadStash for large or batch uploads Urbanecm ? | 
              
                | 2021-02-15 19:04:10 | <Urbanecm> | there's still uploadstash, dunno if it helps with ratelimited uploads | 
              
                | 2021-02-15 19:11:01 | <icinga-wm> | PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad | 
              
                | 2021-02-15 19:21:03 | <icinga-wm> | PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw | 
              
                | 2021-02-15 19:28:58 | <wikibugs> | (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan) | 
              
                | 2021-02-15 19:31:51 | <wikibugs> | (CR) CRusnov: "This change is ready for review." [software/netbox-extras] - https://gerrit.wikimedia.org/r/664332 (https://phabricator.wikimedia.org/T274802) (owner: CRusnov) | 
              
                | 2021-02-15 20:10:06 | <wikibugs> | (PS1) Ladsgroup: [DNM] Test jenkins new rule on banning use of hiera() [puppet] - https://gerrit.wikimedia.org/r/664350 | 
              
                | 2021-02-15 20:11:43 | <wikibugs> | (CR) jerkins-bot: [V: -1] [DNM] Test jenkins new rule on banning use of hiera() [puppet] - https://gerrit.wikimedia.org/r/664350 (owner: Ladsgroup) | 
              
                | 2021-02-15 20:25:00 | <wikibugs> | (Abandoned) Ladsgroup: [DNM] Test jenkins new rule on banning use of hiera() [puppet] - https://gerrit.wikimedia.org/r/664350 (owner: Ladsgroup) | 
              
                | 2021-02-15 20:30:51 | <wikibugs> | SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytic Cluster for Research Scientist (Paragon) - https://phabricator.wikimedia.org/T274631 (leila) approved. Thank you for your support! | 
              
                | 2021-02-15 20:46:21 | <icinga-wm> | PROBLEM - MegaRAID on an-worker1097 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring | 
              
                | 2021-02-15 20:46:24 | <icinga-wm> | ACKNOWLEDGEMENT - MegaRAID on an-worker1097 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T274819 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring | 
              
                | 2021-02-15 20:46:27 | <wikibugs> | SRE, ops-eqiad: Degraded RAID on an-worker1097 - https://phabricator.wikimedia.org/T274819 (ops-monitoring-bot) | 
              
                | 2021-02-15 20:47:01 | <wikibugs> | SRE, ops-eqiad, Analytics: Degraded RAID on an-worker1097 - https://phabricator.wikimedia.org/T274819 (Peachey88) | 
              
                | 2021-02-15 21:00:04 | <jouncebot> | chrisalbon and accraze: It is that lovely time of the day again! You are hereby commanded to deploy Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T2100). | 
              
                | 2021-02-15 21:51:52 | <icinga-wm> | PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw | 
              
                | 2021-02-15 21:52:04 | <icinga-wm> | PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad | 
              
                | 2021-02-15 22:00:04 | <jouncebot> | Reedy and sbassett: Dear deployers, time to do the Weekly Security deployment window deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T2200). | 
              
                | 2021-02-15 22:50:50 | <wikibugs> | (CR) Volans: [C: +1] "Code looks good to me, please test it on netbox-next to be sure." (1 comment) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664332 (https://phabricator.wikimedia.org/T274802) (owner: CRusnov) | 
              
                | 2021-02-15 22:52:34 | <icinga-wm> | PROBLEM - Device not healthy -SMART- on an-worker1097 is CRITICAL: cluster=analytics device=sat+megaraid,13 instance=an-worker1097 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=an-worker1097&var-datasource=eqiad+prometheus/ops | 
              
                | 2021-02-15 23:31:52 | <wikibugs> | (CR) Gergő Tisza: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan) |