Wikimedia IRC logs browser - #wikimedia-operations

Displaying 538 items:

2021-02-15 01:07:53 <icinga-wm> RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 01:13:05 <icinga-wm> PROBLEM - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 02:13:17 <icinga-wm> RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 02:18:25 <icinga-wm> PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 02:31:55 <icinga-wm> PROBLEM - MediaWiki memcached error rate on alert1001 is CRITICAL: 5870 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
2021-02-15 02:33:31 <icinga-wm> RECOVERY - MediaWiki memcached error rate on alert1001 is OK: (C)5000 gt (W)1000 gt 2 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
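The two memcached alerts above use a compact threshold notation. The PROBLEM line reads "5870 gt 5000", meaning the value exceeded the critical threshold. The RECOVERY line reads "(C)5000 gt (W)1000 gt 2", meaning critical 5000 is above warning 1000, which is above the current value 2. Below is a minimal Python sketch of that reading, assuming a higher-is-worse metric. `evaluate()` and `describe()` are hypothetical helpers, not the real Icinga or Prometheus check. The WARNING message format is also a guess, since no warning appears in this log.

```python
# Hypothetical sketch: NOT the real Icinga/Prometheus check used by icinga-wm.
# Assumes a higher-is-worse metric with warning < critical, as suggested by
# "(C)5000 gt (W)1000 gt 2" in the RECOVERY line above.


def evaluate(value: float, warning: float, critical: float) -> str:
    """Return an Icinga-style state for a metric where higher is worse."""
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


def describe(value: float, warning: float, critical: float) -> str:
    """Render the state in the compact "gt" style seen in the alerts."""
    state = evaluate(value, warning, critical)
    if state == "CRITICAL":
        return f"{state}: {value:g} gt {critical:g}"
    if state == "WARNING":
        # Assumed format: no WARNING message appears in this log.
        return f"{state}: {value:g} gt {warning:g}"
    return f"{state}: (C){critical:g} gt (W){warning:g} gt {value:g}"


if __name__ == "__main__":
    print(describe(5870, 1000, 5000))  # CRITICAL: 5870 gt 5000
    print(describe(2, 1000, 5000))     # OK: (C)5000 gt (W)1000 gt 2
```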
2021-02-15 02:42:29 <icinga-wm> RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 02:47:15 <icinga-wm> PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 03:08:29 <icinga-wm> RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 03:13:39 <icinga-wm> PROBLEM - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 03:34:49 <icinga-wm> PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 132, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
2021-02-15 03:35:57 <icinga-wm> PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
2021-02-15 04:43:13 <icinga-wm> RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 04:48:23 <icinga-wm> PROBLEM - Check systemd state on relforge1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 05:02:27 <icinga-wm> PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/mobile-sections/{title} (Get mobile-sections for a test page on enwiki) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
2021-02-15 05:04:03 <icinga-wm> RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
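Both short-lived alerts above cleared after 96 seconds. The memcached alert ran from 02:31:55 to 02:33:31, and Restbase esams from 05:02:27 to 05:04:03. Below is a minimal Python sketch for measuring such PROBLEM-to-RECOVERY windows from lines in this log's format. `outage_durations()` is a hypothetical helper. It assumes the check name is everything between "PROBLEM - " (or "RECOVERY - ") and the first " is <STATE>".

```python
import re
from datetime import datetime

# Matches icinga-wm lines as they appear in this log, e.g.
# "2021-02-15 02:31:55 <icinga-wm> PROBLEM - <check> is CRITICAL: ..."
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) <icinga-wm> "
    r"(?P<kind>PROBLEM|RECOVERY) - (?P<check>.+?) "
    r"is (?:OK|WARNING|CRITICAL|UNKNOWN)\b"
)


def outage_durations(lines):
    """Pair each PROBLEM with the next RECOVERY of the same check.

    Returns (check, seconds) tuples in recovery order. RECOVERY lines with
    no preceding PROBLEM (e.g. at the start of the log) are skipped.
    """
    open_problems = {}
    durations = []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # not an icinga-wm PROBLEM/RECOVERY line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        check = m["check"]
        if m["kind"] == "PROBLEM":
            # Keep the earliest start if a check re-fires while still open.
            open_problems.setdefault(check, ts)
        elif check in open_problems:
            start = open_problems.pop(check)
            durations.append((check, int((ts - start).total_seconds())))
    return durations


if __name__ == "__main__":
    sample = [
        "2021-02-15 02:31:55 <icinga-wm> PROBLEM - MediaWiki memcached error rate on alert1001 is CRITICAL: 5870 gt 5000",
        "2021-02-15 02:33:31 <icinga-wm> RECOVERY - MediaWiki memcached error rate on alert1001 is OK: (C)5000 gt (W)1000 gt 2",
    ]
    print(outage_durations(sample))
```

Applied to the relforge1004 lines, the same pairing skips the 01:07:53 RECOVERY, because nothing is open yet. It then reports the 01:13:05 to 03:08:29 window as 6924 seconds.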
2021-02-15 06:02:28 <wikibugs> SRE, DBA: Decom dbmonitor2001 - https://phabricator.wikimedia.org/T274496 (Marostegui) p:Triage→Medium a:Kormat Yeah, as far as I remember we're not using this for anything Assigning it for Stevie for confirmation and removal (if that applies)
2021-02-15 06:10:23 <wikibugs> SRE, ops-eqiad, DBA: Investigate and repool db1134 - https://phabricator.wikimedia.org/T274472 (Marostegui) Thanks everyone who responded to this incident!
2021-02-15 06:17:33 <wikibugs> SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) >>! In T258361#6822070, @jcrespo wrote: > I am taking db1163 to, at least temporarily, substitute db1134 due to T274472. Thanks. I...
2021-02-15 06:19:15 <wikibugs> (PS1) Marostegui: db1162: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/664087 (https://phabricator.wikimedia.org/T258361)
2021-02-15 06:20:14 <wikibugs> (CR) Marostegui: [C: +2] db1162: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/664087 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui)
2021-02-15 06:36:31 <wikibugs> (PS1) Marostegui: instances.yaml: Add db1162 to dbctl [puppet] - https://gerrit.wikimedia.org/r/664088 (https://phabricator.wikimedia.org/T258361)
2021-02-15 06:37:05 <wikibugs> (CR) Marostegui: [C: +2] instances.yaml: Add db1162 to dbctl [puppet] - https://gerrit.wikimedia.org/r/664088 (https://phabricator.wikimedia.org/T258361) (owner: Marostegui)
2021-02-15 06:40:02 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Add db1162 to dbctl - depooled T258361', diff saved to https://phabricator.wikimedia.org/P14339 and previous config saved to /var/cache/conftool/dbconfig/20210215-064001-marostegui.json
2021-02-15 06:40:06 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 06:40:08 <stashbot> T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361
2021-02-15 06:46:28 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14340 and previous config saved to /var/cache/conftool/dbconfig/20210215-064628-marostegui.json
2021-02-15 06:46:32 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 06:46:33 <stashbot> T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361
2021-02-15 06:56:50 <wikibugs> (PS1) Marostegui: install_server: Do not reimage db1162 and db1163 [puppet] - https://gerrit.wikimedia.org/r/664089
2021-02-15 06:57:31 <wikibugs> (CR) Marostegui: [C: +2] install_server: Do not reimage db1162 and db1163 [puppet] - https://gerrit.wikimedia.org/r/664089 (owner: Marostegui)
2021-02-15 06:58:07 <icinga-wm> RECOVERY - Check systemd state on search-loader2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 07:02:06 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14341 and previous config saved to /var/cache/conftool/dbconfig/20210215-070206-marostegui.json
2021-02-15 07:02:10 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:02:12 <stashbot> T258361: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361
2021-02-15 07:09:46 <wikibugs> SRE, ops-eqiad: ms-be1034 not powering on - https://phabricator.wikimedia.org/T274488 (elukey) Resolved→Open ms-be1034 is down again, same issue as the one described by Filippo... :(
2021-02-15 07:10:31 <icinga-wm> ACKNOWLEDGEMENT - Host ms-be1034 is DOWN: PING CRITICAL - Packet loss = 100% Elukey T274488
2021-02-15 07:14:17 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
2021-02-15 07:14:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:16:37 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
2021-02-15 07:16:40 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:20:41 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
2021-02-15 07:20:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:22:37 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
2021-02-15 07:22:40 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:24:21 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
2021-02-15 07:24:23 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:26:40 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
2021-02-15 07:26:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:28:21 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
2021-02-15 07:28:26 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:33:24 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
2021-02-15 07:33:29 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:38:23 <icinga-wm> RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 07:42:54 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes - elukey@cumin1001
2021-02-15 07:42:57 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:43:33 <icinga-wm> PROBLEM - Check systemd state on relforge1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 07:47:21 <wikibugs> (PS1) ArielGlenn: wikidata json dumps: re-add source of shared functions [puppet] - https://gerrit.wikimedia.org/r/664090
2021-02-15 07:48:16 <wikibugs> (CR) ArielGlenn: [C: +2] wikidata json dumps: re-add source of shared functions [puppet] - https://gerrit.wikimedia.org/r/664090 (owner: ArielGlenn)
2021-02-15 07:49:32 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 3%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14342 and previous config saved to /var/cache/conftool/dbconfig/20210215-074932-root.json
2021-02-15 07:49:35 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 07:57:16 <wikibugs> (PS1) ArielGlenn: now that snapshot1005 is testbed host, make snapshot1007 the enwiki dumps runner [puppet] - https://gerrit.wikimedia.org/r/664091 (https://phabricator.wikimedia.org/T269377)
2021-02-15 08:04:37 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 4%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14343 and previous config saved to /var/cache/conftool/dbconfig/20210215-080435-root.json
2021-02-15 08:04:40 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 08:07:33 <icinga-wm> RECOVERY - Check systemd state on relforge1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 08:08:21 <wikibugs> (CR) ArielGlenn: [C: +2] now that snapshot1005 is testbed host, make snapshot1007 the enwiki dumps runner [puppet] - https://gerrit.wikimedia.org/r/664091 (https://phabricator.wikimedia.org/T269377) (owner: ArielGlenn)
2021-02-15 08:10:50 <wikibugs> (PS1) ArielGlenn: prep snapshot1005 and 1006 for reinstall with buster [puppet] - https://gerrit.wikimedia.org/r/664092 (https://phabricator.wikimedia.org/T269377)
2021-02-15 08:13:14 <wikibugs> (CR) ArielGlenn: [C: +2] prep snapshot1005 and 1006 for reinstall with buster [puppet] - https://gerrit.wikimedia.org/r/664092 (https://phabricator.wikimedia.org/T269377) (owner: ArielGlenn)
2021-02-15 08:17:33 <wikibugs> SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts: ` snapshot1005.eqiad.wmnet ` The log can be fo...
2021-02-15 08:19:41 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14344 and previous config saved to /var/cache/conftool/dbconfig/20210215-081940-root.json
2021-02-15 08:26:51 <wikibugs> SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
2021-02-15 08:27:19 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1075 T274235', diff saved to https://phabricator.wikimedia.org/P14345 and previous config saved to /var/cache/conftool/dbconfig/20210215-082718-marostegui.json
2021-02-15 08:27:47 <icinga-wm> PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 08:29:05 <gehel> !log powercycle wdqs1009
2021-02-15 08:29:22 <wikibugs> (PS1) Marostegui: db1075: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/664093 (https://phabricator.wikimedia.org/T274235)
2021-02-15 08:29:24 <wikibugs> (PS1) Elukey: profile::hadoop::backup::namenode: add a more precise notes_url [puppet] - https://gerrit.wikimedia.org/r/664094
2021-02-15 08:29:25 <logmsgbot> !log ariel@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1005.eqiad.wmnet with reason: REIMAGE
2021-02-15 08:30:06 <wikibugs> (CR) Marostegui: [C: +2] db1075: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/664093 (https://phabricator.wikimedia.org/T274235) (owner: Marostegui)
2021-02-15 08:31:30 <logmsgbot> !log ariel@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1005.eqiad.wmnet with reason: REIMAGE
2021-02-15 08:31:48 <wikibugs> (PS1) JMeybohm: tiller: Run tiller as user nobody [docker-images/production-images] - https://gerrit.wikimedia.org/r/664095 (https://phabricator.wikimedia.org/T274254)
2021-02-15 08:31:50 <wikibugs> (PS1) JMeybohm: eventrouter: Use numeric UID [docker-images/production-images] - https://gerrit.wikimedia.org/r/664096 (https://phabricator.wikimedia.org/T274254)
2021-02-15 08:31:52 <wikibugs> (PS1) JMeybohm: fluent-bit: Use numeric UID [docker-images/production-images] - https://gerrit.wikimedia.org/r/664097 (https://phabricator.wikimedia.org/T274254)
2021-02-15 08:31:57 <wikibugs> (PS1) JMeybohm: ratelimit: Use numeric UID [docker-images/production-images] - https://gerrit.wikimedia.org/r/664098 (https://phabricator.wikimedia.org/T274254)
2021-02-15 08:32:20 <wikibugs> (CR) Elukey: [C: +2] profile::hadoop::backup::namenode: add a more precise notes_url [puppet] - https://gerrit.wikimedia.org/r/664094 (owner: Elukey)
2021-02-15 08:34:44 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14346 and previous config saved to /var/cache/conftool/dbconfig/20210215-083444-root.json
2021-02-15 08:44:12 <wikibugs> (PS1) Elukey: hadoop: enable HDFS service port for Analytics Hadoop [puppet] - https://gerrit.wikimedia.org/r/664099 (https://phabricator.wikimedia.org/T273629)
2021-02-15 08:45:24 <wikibugs> (CR) JMeybohm: [C: +1] "Nice!" [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
2021-02-15 08:47:53 <wikibugs> (CR) Elukey: [V: +1] "PCC SUCCESS (DIFF 6): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28056/console" [puppet] - https://gerrit.wikimedia.org/r/664099 (https://phabricator.wikimedia.org/T273629) (owner: Elukey)
2021-02-15 08:48:01 <logmsgbot> !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
2021-02-15 08:49:48 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 15%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14347 and previous config saved to /var/cache/conftool/dbconfig/20210215-084947-root.json
2021-02-15 08:50:59 <wikibugs> ops-eqiad, DC-Ops, Wikidata, Wikidata-Query-Service: Upgrade firmware on wdqs1009 - https://phabricator.wikimedia.org/T274751 (Gehel)
2021-02-15 08:53:53 <wikibugs> SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['snapshot1005.eqiad.wmnet'] ` and were **ALL** successful.
2021-02-15 08:58:58 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes - elukey@cumin1001
2021-02-15 09:01:22 <wikibugs> SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
2021-02-15 09:01:30 <wikibugs> SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
2021-02-15 09:01:32 <wikibugs> (CR) Elukey: [V: +1 C: +2] hadoop: enable HDFS service port for Analytics Hadoop [puppet] - https://gerrit.wikimedia.org/r/664099 (https://phabricator.wikimedia.org/T273629) (owner: Elukey)
2021-02-15 09:04:52 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 20%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14348 and previous config saved to /var/cache/conftool/dbconfig/20210215-090451-root.json
2021-02-15 09:05:56 <wikibugs> (PS1) Joal: Update oozie sharelib creation [puppet] - https://gerrit.wikimedia.org/r/664172 (https://phabricator.wikimedia.org/T274322)
2021-02-15 09:06:00 <wikibugs> (CR) JMeybohm: [C: -1] "You do mix list indention styles a bit, don't know if we should argue about it or just leave it be." (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/651757 (owner: Giuseppe Lavagetto)
2021-02-15 09:06:03 <joal> elukey: --^
2021-02-15 09:06:06 <joal> for when you have time
2021-02-15 09:07:48 <wikibugs> (CR) Elukey: [C: +2] Update oozie sharelib creation [puppet] - https://gerrit.wikimedia.org/r/664172 (https://phabricator.wikimedia.org/T274322) (owner: Joal)
2021-02-15 09:11:52 <logmsgbot> !log ryankemper@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
2021-02-15 09:12:50 <wikibugs> SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ariel on cumin1001.eqiad.wmnet for hosts: ` snapshot1006.eqiad.wmnet ` The log can be fo...
2021-02-15 09:13:58 <wikibugs> (PS1) Filippo Giunchedi: grafana: stop POST to /api/snapshots [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736)
2021-02-15 09:15:13 <wikibugs> (CR) Filippo Giunchedi: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/28057/console" [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736) (owner: Filippo Giunchedi)
2021-02-15 09:15:53 <wikibugs> (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 09:17:11 <wikibugs> (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736) (owner: Filippo Giunchedi)
2021-02-15 09:17:49 <wikibugs> (CR) Filippo Giunchedi: [V: +1 C: +2] grafana: stop POST to /api/snapshots [puppet] - https://gerrit.wikimedia.org/r/664224 (https://phabricator.wikimedia.org/T274736) (owner: Filippo Giunchedi)
2021-02-15 09:19:55 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14349 and previous config saved to /var/cache/conftool/dbconfig/20210215-091955-root.json
2021-02-15 09:24:00 <wikibugs> (PS1) ArielGlenn: misc dumps: move commons rdf to later on Sunday and media info to earlier [puppet] - https://gerrit.wikimedia.org/r/664225 (https://phabricator.wikimedia.org/T269377)
2021-02-15 09:24:02 <wikibugs> (CR) David Caro: "Got a couple questions, nits you can safely ignore :)" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 09:24:20 <wikibugs> (PS2) JMeybohm: mathoid: pipeline bot promote [deployment-charts] - https://gerrit.wikimedia.org/r/663873 (https://phabricator.wikimedia.org/T274262) (owner: PipelineBot)
2021-02-15 09:24:39 <wikibugs> (CR) Ayounsi: [C: +2] Remove sampling feature flag [homer/public] - https://gerrit.wikimedia.org/r/663533 (owner: Ayounsi)
2021-02-15 09:25:47 <logmsgbot> !log ariel@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1006.eqiad.wmnet with reason: REIMAGE
2021-02-15 09:26:49 <wikibugs> (CR) Ayounsi: "confirmed NOOP." [homer/public] - https://gerrit.wikimedia.org/r/663533 (owner: Ayounsi)
2021-02-15 09:27:52 <logmsgbot> !log ariel@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1006.eqiad.wmnet with reason: REIMAGE
2021-02-15 09:28:41 <wikibugs> (PS1) Vgutierrez: admin: Add christinedk user [puppet] - https://gerrit.wikimedia.org/r/664226 (https://phabricator.wikimedia.org/T274304)
2021-02-15 09:28:43 <wikibugs> (PS1) Vgutierrez: admin: Add christinedk to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/664227 (https://phabricator.wikimedia.org/T274304)
2021-02-15 09:34:59 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 30%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14350 and previous config saved to /var/cache/conftool/dbconfig/20210215-093458-root.json
2021-02-15 09:35:26 <wikibugs> (CR) Muehlenhoff: admin: Add christinedk user (1 comment) [puppet] - https://gerrit.wikimedia.org/r/664226 (https://phabricator.wikimedia.org/T274304) (owner: Vgutierrez)
2021-02-15 09:37:27 <wikibugs> SRE, observability: Icinga meta monitoring pages during icinga host reboots - https://phabricator.wikimedia.org/T274662 (Volans) If we allow for normal reboots going unnoticed, would we catch a scenario in which the icinga host reboots every 5 minutes due to a bug or DoS? P.S. Keyholder is not armed aft...
2021-02-15 09:43:50 <elukey> !log roll restart HDFS daemons in Analytics Hadoop to pick up new RPC queue changes - T273629
2021-02-15 09:47:55 <wikibugs> (CR) Volans: "Optional nit inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/663860 (owner: Hnowlan)
2021-02-15 09:50:03 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 40%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14351 and previous config saved to /var/cache/conftool/dbconfig/20210215-095002-root.json
2021-02-15 09:50:41 <wikibugs> SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['snapshot1006.eqiad.wmnet'] ` and were **ALL** successful.
2021-02-15 09:55:54 <wikibugs> (PS1) Jcrespo: Revert "dbbackups: disable all ES db bacula runs until next week" [puppet] - https://gerrit.wikimedia.org/r/663961
2021-02-15 09:56:15 <wikibugs> (PS2) Jcrespo: Revert "dbbackups: disable all ES db bacula runs until next week" [puppet] - https://gerrit.wikimedia.org/r/663961
2021-02-15 09:57:14 <wikibugs> SRE, Dumps-Generation, Platform Engineering, serviceops, and 2 others: Upgrade snapshot hosts to Buster - https://phabricator.wikimedia.org/T269377 (ArielGlenn) I was not going to re-image snapshot1005 and 6 because their replacements were due to have come in, but the boxes have not arrived yet a...
2021-02-15 09:57:18 <wikibugs> SRE: Create cookbook to add a node to a Ganeti cluster - https://phabricator.wikimedia.org/T274527 (MoritzMuehlenhoff) p:Triage→Medium
2021-02-15 09:57:34 <wikibugs> (PS2) ArielGlenn: misc dumps: move commons rdf to later on Sunday and media info to earlier [puppet] - https://gerrit.wikimedia.org/r/664225 (https://phabricator.wikimedia.org/T269377)
2021-02-15 09:57:51 <wikibugs> SRE, Packaging: Copy cassandra packages to buster-wikimedia - https://phabricator.wikimedia.org/T274119 (MoritzMuehlenhoff) p:Triage→Medium
2021-02-15 09:58:12 <wikibugs> (CR) Jcrespo: [C: +2] Revert "dbbackups: disable all ES db bacula runs until next week" [puppet] - https://gerrit.wikimedia.org/r/663961 (owner: Jcrespo)
2021-02-15 09:59:02 <wikibugs> (CR) ArielGlenn: [C: +2] misc dumps: move commons rdf to later on Sunday and media info to earlier [puppet] - https://gerrit.wikimedia.org/r/664225 (https://phabricator.wikimedia.org/T269377) (owner: ArielGlenn)
2021-02-15 10:00:12 <apergos> jynus: may I merge your puppet patch "backup::set { 'mysql-srv-backups-dumps-latest':" etc?
2021-02-15 10:00:17 <jynus> yes
2021-02-15 10:00:41 <apergos> done!
2021-02-15 10:00:44 <jynus> thanks
2021-02-15 10:02:02 <apergos> thanks for the quick response!
2021-02-15 10:05:06 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14352 and previous config saved to /var/cache/conftool/dbconfig/20210215-100505-root.json
2021-02-15 10:09:14 <hashar> !log Switching Jenkins jobs to Quibble 0.0.46
2021-02-15 10:15:52 <wikibugs> SRE, ops-eqiad: ms-be1034 not powering on - https://phabricator.wikimedia.org/T274488 (fgiunchedi) Thank you for all the work ! LMK how I can help e.g. if speeding up the decom of one host in T272836 would help (as opposed as decom'ing all hosts at the same time)
2021-02-15 10:20:09 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 60%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14353 and previous config saved to /var/cache/conftool/dbconfig/20210215-102009-root.json
2021-02-15 10:23:30 <logmsgbot> !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host netmon1002.wikimedia.org
2021-02-15 10:27:29 <logmsgbot> !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1002.wikimedia.org
2021-02-15 10:30:08 <wikibugs> (CR) Marostegui: [C: +2] db1134: Do not be tag as candidate master [puppet] - https://gerrit.wikimedia.org/r/664230 (https://phabricator.wikimedia.org/T274472) (owner: Marostegui)
2021-02-15 10:31:09 <wikibugs> (PS1) Arturo Borrero Gonzalez: dumps: distribution: nfs: allow establishing connections with TCP ports > 1024 [puppet] - https://gerrit.wikimedia.org/r/664231 (https://phabricator.wikimedia.org/T272397)
2021-02-15 10:35:13 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 70%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14355 and previous config saved to /var/cache/conftool/dbconfig/20210215-103512-root.json
2021-02-15 10:41:25 <wikibugs> (PS2) Arturo Borrero Gonzalez: dumps: distribution: nfs: allow establishing connections with TCP ports >= 1024 [puppet] - https://gerrit.wikimedia.org/r/664231 (https://phabricator.wikimedia.org/T272397)
2021-02-15 10:44:09 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] dumps: distribution: nfs: allow establishing connections with TCP ports >= 1024 [puppet] - https://gerrit.wikimedia.org/r/664231 (https://phabricator.wikimedia.org/T272397) (owner: Arturo Borrero Gonzalez)
2021-02-15 10:47:08 <wikibugs> (PS1) Arturo Borrero Gonzalez: labstore: allow NFS connections from public cloud networks [puppet] - https://gerrit.wikimedia.org/r/664233 (https://phabricator.wikimedia.org/T272397)
2021-02-15 10:48:49 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] labstore: allow NFS connections from public cloud networks [puppet] - https://gerrit.wikimedia.org/r/664233 (https://phabricator.wikimedia.org/T272397) (owner: Arturo Borrero Gonzalez)
2021-02-15 10:49:05 <wikibugs> (PS1) ArielGlenn: swap roles of dumpsdata1001 and 1003 so 1003 is primary for xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/664234 (https://phabricator.wikimedia.org/T273713)
2021-02-15 10:50:16 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 80%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14356 and previous config saved to /var/cache/conftool/dbconfig/20210215-105016-root.json
2021-02-15 10:50:16 <jouncebot> In 0 hour(s) and 39 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1130)
2021-02-15 10:50:16 <godog> jouncebot: next
2021-02-15 10:57:30 <wikibugs> (PS2) ArielGlenn: swap roles of dumpsdata1001 and 1003 so 1003 is primary for xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/664234 (https://phabricator.wikimedia.org/T273713)
2021-02-15 10:57:59 <logmsgbot> !log filippo@cumin1001 START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
2021-02-15 10:58:44 <wikibugs> (PS1) Jcrespo: Preventive commit for jynus to misspell "bullseye", next Debian version [puppet] - https://gerrit.wikimedia.org/r/664237
2021-02-15 10:58:59 <wikibugs> (CR) ArielGlenn: [C: +2] swap roles of dumpsdata1001 and 1003 so 1003 is primary for xml/sql dumps [puppet] - https://gerrit.wikimedia.org/r/664234 (https://phabricator.wikimedia.org/T273713) (owner: ArielGlenn)
2021-02-15 11:00:25 <logmsgbot> !log filippo@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
2021-02-15 11:02:02 <wikibugs> (CR) Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1002/28058/" [puppet] - https://gerrit.wikimedia.org/r/663565 (https://phabricator.wikimedia.org/T273115) (owner: Effie Mouzeli)
2021-02-15 11:03:17 <wikibugs> (PS2) Hnowlan: mtail: add exception handling in tests for non-Debian OSes [puppet] - https://gerrit.wikimedia.org/r/663860
2021-02-15 11:05:20 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 90%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14357 and previous config saved to /var/cache/conftool/dbconfig/20210215-110519-root.json
2021-02-15 11:06:51 <icinga-wm> RECOVERY - Maps HTTPS on maps2007 is OK: HTTP OK: HTTP/1.1 200 OK - 1329 bytes in 0.301 second response time https://wikitech.wikimedia.org/wiki/Maps/RunBook
2021-02-15 11:07:27 <icinga-wm> RECOVERY - tilerator on maps2007 is OK: HTTP OK: HTTP/1.1 200 OK - 324 bytes in 0.085 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/tilerator
2021-02-15 11:08:21 <icinga-wm> RECOVERY - Check systemd state on maps2007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 11:10:25 <logmsgbot> !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=maps2007.codfw.wmnet
2021-02-15 11:11:57 <wikibugs> (CR) Hnowlan: mtail: add exception handling in tests for non-Debian OSes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/663860 (owner: Hnowlan)
2021-02-15 11:14:57 <wikibugs> (PS1) Elukey: profile::hadoop::master: raise threshold for corrupt blocks [puppet] - https://gerrit.wikimedia.org/r/664238
2021-02-15 11:16:50 <wikibugs> (CR) Elukey: [C: +2] profile::hadoop::master: raise threshold for corrupt blocks [puppet] - https://gerrit.wikimedia.org/r/664238 (owner: Elukey)
2021-02-15 11:20:24 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14358 and previous config saved to /var/cache/conftool/dbconfig/20210215-112023-root.json
2021-02-15 11:27:16 <wikibugs> (PS4) Arturo Borrero Gonzalez: cloud: drop NAT exceptions for dumps NFS [puppet] - https://gerrit.wikimedia.org/r/657152 (https://phabricator.wikimedia.org/T272397)
2021-02-15 11:28:11 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes - elukey@cumin1001
2021-02-15 11:28:44 <elukey> this may trigger (I hope not) AQS alerts --^
2021-02-15 11:28:52 <elukey> in case it is my fault and you can blame me
2021-02-15 11:29:05 <elukey> sees kormat ready for it
2021-02-15 11:29:31 <kormat> nods solemnly
2021-02-15 11:29:57 <wikibugs> (CR) Hnowlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 11:30:04 <jouncebot> jan_drewniak: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Wikimedia Portals Update . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1130).
2021-02-15 11:32:31 <wikibugs> (PS1) Arturo Borrero Gonzalez: cloudgw: move common hiera into proper file [puppet] - https://gerrit.wikimedia.org/r/664241 (https://phabricator.wikimedia.org/T272963)
2021-02-15 11:33:13 <wikibugs> (CR) Jbond: "See comments inline, also wonder if you considered using pathlib for the file operations." (5 comments) [puppet] - https://gerrit.wikimedia.org/r/663658 (https://phabricator.wikimedia.org/T271583) (owner: CRusnov)
2021-02-15 11:33:17 <wikibugs> (PS4) Effie Mouzeli: hieradata: enable memcached socket mwdebug1003, mwdebug2001 [puppet] - https://gerrit.wikimedia.org/r/663796 (https://phabricator.wikimedia.org/T273115)
2021-02-15 11:33:19 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: move common hiera into proper file [puppet] - https://gerrit.wikimedia.org/r/664241 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 11:34:50 <wikibugs> (PS5) Arturo Borrero Gonzalez: cloud: drop NAT exceptions for dumps NFS [puppet] - https://gerrit.wikimedia.org/r/657152 (https://phabricator.wikimedia.org/T272397)
2021-02-15 11:37:34 <moritzm> !log reimaging bast5001 to buster
2021-02-15 11:45:23 <wikibugs> (CR) Jbond: "Adding Andrew to approve privatedata-users access" [puppet] - https://gerrit.wikimedia.org/r/664227 (https://phabricator.wikimedia.org/T274304) (owner: Vgutierrez)
2021-02-15 11:52:45 <logmsgbot> !log ariel@cumin1001 START - Cookbook sre.hosts.reboot-single for host snapshot1007.eqiad.wmnet
2021-02-15 11:54:09 <wikibugs> (CR) Jbond: "see comments" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/663993 (owner: Urbanecm)
2021-02-15 11:55:13 <wikibugs> (CR) Urbanecm: Update urbanecm's dotfiles (1 comment) [puppet] - https://gerrit.wikimedia.org/r/663993 (owner: Urbanecm)
2021-02-15 11:55:23 <wikibugs> (PS2) Urbanecm: Update urbanecm's dotfiles [puppet] - https://gerrit.wikimedia.org/r/663993
2021-02-15 11:56:00 <wikibugs> (CR) Jbond: [C: +2] Update urbanecm's dotfiles [puppet] - https://gerrit.wikimedia.org/r/663993 (owner: Urbanecm)
2021-02-15 11:56:21 <jbond42> Urbanecm: ^^ merged
2021-02-15 11:56:24 <Urbanecm> thanks jbond42 !
2021-02-15 11:56:28 <jbond42> :) np
2021-02-15 11:58:52 <logmsgbot> !log ariel@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1007.eqiad.wmnet
2021-02-15 12:00:05 <jouncebot> No GERRIT patches in the queue for this window AFAICS.
2021-02-15 12:00:05 <jouncebot> Amir1, Lucas_WMDE, awight, and Urbanecm: That opportune time is upon us again. Time for a European mid-day backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1200).
2021-02-15 12:00:14 <Urbanecm> I'll deploy regardless
2021-02-15 12:01:12 <wikibugs> (CR) Urbanecm: [C: +2] Revert "Revert "Enable SandboxLink at viwiki"" [mediawiki-config] - https://gerrit.wikimedia.org/r/663736 (https://phabricator.wikimedia.org/T272796) (owner: Urbanecm)
2021-02-15 12:02:46 <wikibugs> (Merged) jenkins-bot: Revert "Revert "Enable SandboxLink at viwiki"" [mediawiki-config] - https://gerrit.wikimedia.org/r/663736 (https://phabricator.wikimedia.org/T272796) (owner: Urbanecm)
2021-02-15 12:04:02 <wikibugs> (PS1) Effie Mouzeli: hiera: install memcached 1.6 on mc1037 [puppet] - https://gerrit.wikimedia.org/r/664271 (https://phabricator.wikimedia.org/T270315)
2021-02-15 12:06:36 <wikibugs> (CR) Jbond: [C: +1] "thanks this will also be a big help to me 😊" [puppet] - https://gerrit.wikimedia.org/r/664237 (owner: Jcrespo)
2021-02-15 12:07:47 <logmsgbot> !log jmm@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on bast5001.wikimedia.org with reason: REIMAGE
2021-02-15 12:07:55 <wikibugs> (PS22) Kosta Harlan: linkrecommendation: Cron job to load datasets [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893)
2021-02-15 12:08:54 <Urbanecm> can someone check mwdebug1002.eqiad.wmnet status, and remove it from scap if it is still broken (as mutante said in ops list)?
2021-02-15 12:09:16 <wikibugs> (CR) Effie Mouzeli: "PCC https://puppet-compiler.wmflabs.org/compiler1003/28065/mc2037.codfw.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/664271 (https://phabricator.wikimedia.org/T270315) (owner: Effie Mouzeli)
2021-02-15 12:09:47 <logmsgbot> !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast5001.wikimedia.org with reason: REIMAGE
2021-02-15 12:09:59 <wikibugs> (PS2) Muehlenhoff: Swift: Stop setting net.ipv4.tcp_tw_recycle for buster and later [puppet] - https://gerrit.wikimedia.org/r/662918
2021-02-15 12:10:35 <logmsgbot> !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 662d5f6af01f6cf6ce7e9d56cf1bc3ba282afee1: Revert "Revert "Enable SandboxLink at viwiki"" (T272796) (duration: 05m 26s)
2021-02-15 12:10:41 <Urbanecm> finally
2021-02-15 12:11:36 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytic Cluster for Research Scientist (Paragon) - https://phabricator.wikimedia.org/T274631 (MoritzMuehlenhoff) Also needs approval by @Ottomata for Hadoop access.
2021-02-15 12:13:39 <wikibugs> (CR) JMeybohm: [C: +1] linkrecommendation: Cron job to load datasets [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
2021-02-15 12:14:25 <wikibugs> (CR) Kosta Harlan: [C: +2] linkrecommendation: Cron job to load datasets [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
2021-02-15 12:15:59 <wikibugs> (Merged) jenkins-bot: linkrecommendation: Cron job to load datasets [deployment-charts] - https://gerrit.wikimedia.org/r/660394 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
2021-02-15 12:16:19 <wikibugs> (CR) Vgutierrez: [C: +1] delete class tlsproxy::prometheus and nginx template [puppet] - https://gerrit.wikimedia.org/r/659377 (https://phabricator.wikimedia.org/T272559) (owner: Dzahn)
2021-02-15 12:16:21 <wikibugs> (PS2) Urbanecm: ukwikisource: Finish removal of NS Translations [mediawiki-config] - https://gerrit.wikimedia.org/r/664053 (https://phabricator.wikimedia.org/T270628)
2021-02-15 12:16:24 <wikibugs> (CR) Urbanecm: [C: +2] ukwikisource: Finish removal of NS Translations [mediawiki-config] - https://gerrit.wikimedia.org/r/664053 (https://phabricator.wikimedia.org/T270628) (owner: Urbanecm)
2021-02-15 12:17:21 <wikibugs> (Merged) jenkins-bot: ukwikisource: Finish removal of NS Translations [mediawiki-config] - https://gerrit.wikimedia.org/r/664053 (https://phabricator.wikimedia.org/T270628) (owner: Urbanecm)
2021-02-15 12:17:27 <wikibugs> (CR) Elukey: [C: +1] "left a nit for the commit msg, LGTM otherwise!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/664271 (https://phabricator.wikimedia.org/T270315) (owner: Effie Mouzeli)
2021-02-15 12:18:18 <wikibugs> (CR) Elukey: [C: +1] "Effie can you run a pcc to see if everything looks good?" [puppet] - https://gerrit.wikimedia.org/r/663868 (https://phabricator.wikimedia.org/T270315) (owner: Effie Mouzeli)
2021-02-15 12:18:47 <logmsgbot> !log kharlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
2021-02-15 12:21:30 <Urbanecm> repeating myself: can someone depool mwdebug1002? it's currently down (see mail from dzahn in ops list), but still pooled and thus in scap dsh group :/
2021-02-15 12:22:25 <wikibugs> SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (MoritzMuehlenhoff) p:Triage→Medium
2021-02-15 12:23:45 <wikibugs> SRE: hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (MoritzMuehlenhoff) Adding a few tags for affected sub teams, simply untag when completed
2021-02-15 12:24:38 <wikibugs> SRE, Analytics, observability, serviceops, cloud-services-team (Kanban): hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (MoritzMuehlenhoff)
2021-02-15 12:25:33 <wikibugs> (CR) Volans: "quick direct reply, will have a pass later" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/663658 (https://phabricator.wikimedia.org/T271583) (owner: CRusnov)
2021-02-15 12:25:55 <wikibugs> (CR) Arturo Borrero Gonzalez: "Thanks for the review!" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 12:30:51 <wikibugs> (PS1) JMeybohm: admin: Allow tiller to create batch ressources [deployment-charts] - https://gerrit.wikimedia.org/r/664273
2021-02-15 12:32:02 <wikibugs> (CR) JMeybohm: [V: +2 C: +2] admin: Allow tiller to create batch ressources [deployment-charts] - https://gerrit.wikimedia.org/r/664273 (owner: JMeybohm)
2021-02-15 12:32:29 <logmsgbot> !log jmm@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet
2021-02-15 12:33:32 <wikibugs> (Merged) jenkins-bot: admin: Allow tiller to create batch ressources [deployment-charts] - https://gerrit.wikimedia.org/r/664273 (owner: JMeybohm)
2021-02-15 12:35:00 <logmsgbot> !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
2021-02-15 12:35:39 <logmsgbot> !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: cdf15981f7c6f7e02a3fb1c1ce61dc14815f216d: ukwikisource: Finish removal of NS Translations (T270628) (duration: 01m 07s)
2021-02-15 12:36:24 <wikibugs> (PS1) Elukey: Add/Fix kerberos fake keytabs [labs/private] - https://gerrit.wikimedia.org/r/664274 (https://phabricator.wikimedia.org/T274392)
2021-02-15 12:36:46 <wikibugs> (CR) Elukey: [V: +2 C: +2] Add/Fix kerberos fake keytabs [labs/private] - https://gerrit.wikimedia.org/r/664274 (https://phabricator.wikimedia.org/T274392) (owner: Elukey)
2021-02-15 12:37:06 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes - elukey@cumin1001
2021-02-15 12:37:32 <wikibugs> SRE, ops-eqiad, DC-Ops, Wikidata, and 2 others: Upgrade firmware on wdqs1009 - https://phabricator.wikimedia.org/T274751 (MoritzMuehlenhoff) p:Triage→Medium
2021-02-15 12:38:28 <wikibugs> (CR) David Caro: [C: +1] cloudgw: introduce HA by using keepalived/VRRP (6 comments) [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 12:38:36 <wikibugs> (PS9) Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)
2021-02-15 12:38:38 <wikibugs> SRE, observability, serviceops, Patch-For-Review, cloud-services-team (Kanban): hosts failing puppet compile due to missing secrets - https://phabricator.wikimedia.org/T274392 (elukey)
2021-02-15 12:39:18 <logmsgbot> !log kharlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
2021-02-15 12:40:12 <wikibugs> (PS10) Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)
2021-02-15 12:43:59 <moritzm> !log reimaging bast4002 to buster
2021-02-15 12:44:04 <wikibugs> (PS11) Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)
2021-02-15 12:44:09 <logmsgbot> !log jayme@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
2021-02-15 12:44:39 <icinga-wm> PROBLEM - etherpad_lite_process_running on etherpad1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org
2021-02-15 12:44:59 <icinga-wm> PROBLEM - etherpad_up reduced availability on alert1001 is CRITICAL: 0 le 0.8 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
2021-02-15 12:45:53 <icinga-wm> PROBLEM - etherpad.wikimedia.org HTTP on etherpad1002 is CRITICAL: connect to address 10.64.32.178 and port 9001: Connection refused https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org
2021-02-15 12:46:25 <icinga-wm> RECOVERY - etherpad_lite_process_running on etherpad1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/node /usr/share/etherpad-lite/node_modules/ep_etherpad-lite/node/server.js https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org
2021-02-15 12:47:24 <wikibugs> (PS12) Arturo Borrero Gonzalez: cloudgw: introduce HA by using keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963)
2021-02-15 12:47:35 <icinga-wm> RECOVERY - etherpad.wikimedia.org HTTP on etherpad1002 is OK: HTTP OK: HTTP/1.1 200 OK - 9184 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/Etherpad.wikimedia.org
2021-02-15 12:47:58 <wikibugs> (CR) Effie Mouzeli: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/663868 (https://phabricator.wikimedia.org/T270315) (owner: Effie Mouzeli)
2021-02-15 12:48:27 <icinga-wm> RECOVERY - etherpad_up reduced availability on alert1001 is OK: (C)0.8 le (W)0.9 le 1 https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_exporters_%22up%22_metrics_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
2021-02-15 12:49:10 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] "PCC https://puppet-compiler.wmflabs.org/compiler1002/28075/" [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 12:49:13 <wikibugs> (CR) Arturo Borrero Gonzalez: [V: +2 C: +2] cloudgw: introduce HA by using keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/663823 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 12:49:45 <logmsgbot> !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1093 T273955', diff saved to https://phabricator.wikimedia.org/P14359 and previous config saved to /var/cache/conftool/dbconfig/20210215-124944-marostegui.json
2021-02-15 12:50:24 <wikibugs> (PS2) David Caro: utils: add script to run docker ci tests locally [software/spicerack] - https://gerrit.wikimedia.org/r/663205 (https://phabricator.wikimedia.org/T274338)
2021-02-15 12:50:27 <wikibugs> (PS1) Marostegui: db1093: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/664276 (https://phabricator.wikimedia.org/T273955)
2021-02-15 12:50:50 <logmsgbot> !log jayme@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
2021-02-15 12:51:16 <wikibugs> (CR) Marostegui: [C: +2] db1093: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/664276 (https://phabricator.wikimedia.org/T273955) (owner: Marostegui)
2021-02-15 12:58:16 <logmsgbot> !log kharlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
2021-02-15 12:58:16 <logmsgbot> !log kharlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
2021-02-15 12:58:19 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 12:58:22 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:01:10 <Lucas_WMDE> we lost a whole bunch of SAL messages because stashbot was out
2021-02-15 13:01:12 <logmsgbot> !log jmm@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on bast4002.wikimedia.org with reason: REIMAGE
2021-02-15 13:01:15 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:01:21 <Lucas_WMDE> is it worth repeating them all?
2021-02-15 13:01:49 <Lucas_WMDE> cc marostegui, ryankemper, ariel, elukey…
2021-02-15 13:02:04 <marostegui> Lucas_WMDE: not from my side, thanks though! :)
2021-02-15 13:02:10 <Lucas_WMDE> ok
2021-02-15 13:02:26 <Lucas_WMDE> sometimes I do it but this seems to be almost 50 missed messages and I’m lazy :D
2021-02-15 13:02:41 <logmsgbot> !log kharlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
2021-02-15 13:02:41 <logmsgbot> !log kharlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
2021-02-15 13:02:44 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:02:44 <Lucas_WMDE> (they’re all in the IRC log)
2021-02-15 13:02:48 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:02:50 <wikibugs> SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) db1162 is fully pooled
2021-02-15 13:03:18 <logmsgbot> !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4002.wikimedia.org with reason: REIMAGE
2021-02-15 13:03:21 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:05:58 <Lucas_WMDE> !log notice: stashbot had issues between 8:19 and 12:50, see for https://wm-bot.wmflabs.org/browser/index.php?start=02%2F15%2F2021&end=02%2F15%2F2021&display=%23wikimedia-operations for missed !log messages
2021-02-15 13:06:01 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:06:54 <godog> !log swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
2021-02-15 13:06:57 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:06:58 <stashbot> T272836: Decom ms-be[1019-1026] from swift - https://phabricator.wikimedia.org/T272836
2021-02-15 13:14:05 <wikibugs> (PS1) JMeybohm: linkrecommendation: Read DB_USER from public config [deployment-charts] - https://gerrit.wikimedia.org/r/664277 (https://phabricator.wikimedia.org/T265893)
2021-02-15 13:14:16 <jayme> ^ kostajh
2021-02-15 13:14:58 <wikibugs> (CR) Kosta Harlan: [C: +2] linkrecommendation: Read DB_USER from public config [deployment-charts] - https://gerrit.wikimedia.org/r/664277 (https://phabricator.wikimedia.org/T265893) (owner: JMeybohm)
2021-02-15 13:15:30 <kostajh> jayme: cheers
2021-02-15 13:17:35 <wikibugs> (Merged) jenkins-bot: linkrecommendation: Read DB_USER from public config [deployment-charts] - https://gerrit.wikimedia.org/r/664277 (https://phabricator.wikimedia.org/T265893) (owner: JMeybohm)
2021-02-15 13:19:28 <logmsgbot> !log kharlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
2021-02-15 13:19:36 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:21:47 <wikibugs> (PS4) Hnowlan: mtail: create separate metrics histogram based on endpoint [puppet] - https://gerrit.wikimedia.org/r/634207 (https://phabricator.wikimedia.org/T263727)
2021-02-15 13:22:04 <wikibugs> (CR) Hnowlan: [V: +2 C: +2] tegola: Add docker image. [docker-images/production-images] - https://gerrit.wikimedia.org/r/654662 (https://phabricator.wikimedia.org/T270170) (owner: Hnowlan)
2021-02-15 13:28:57 <wikibugs> (CR) Alexandros Kosiaris: "Shouldn't this instead be done via the pipeline? It would greatly decouple upgrading tegola from requiring an SRE to build newer versions " [docker-images/production-images] - https://gerrit.wikimedia.org/r/654662 (https://phabricator.wikimedia.org/T270170) (owner: Hnowlan)
2021-02-15 13:33:36 <marostegui> !log Stop MySQL on db1093 - T273955
2021-02-15 13:33:39 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:33:41 <stashbot> T273955: decommission db1093.eqiad.wmnet - https://phabricator.wikimedia.org/T273955
2021-02-15 13:34:02 <wikibugs> (PS5) Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
2021-02-15 13:34:39 <wikibugs> (CR) jerkins-bot: [V: -1] Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
2021-02-15 13:38:10 <moritzm> !log installing subversion security updates
2021-02-15 13:38:14 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:41:38 <wikibugs> (PS6) Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
2021-02-15 13:43:11 <icinga-wm> RECOVERY - Check systemd state on relforge1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 13:47:55 <wikibugs> (PS2) Muehlenhoff: admin: Add christinedk user [puppet] - https://gerrit.wikimedia.org/r/664226 (https://phabricator.wikimedia.org/T274304) (owner: Vgutierrez)
2021-02-15 13:48:03 <wikibugs> (PS7) Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
2021-02-15 13:48:13 <wikibugs> (CR) jerkins-bot: [V: -1] Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
2021-02-15 13:53:00 <logmsgbot> !log gehel@cumin2001 START - Cookbook sre.wdqs.data-reload
2021-02-15 13:53:03 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 13:57:13 <moritzm> !log installing libonig security update for stretch
2021-02-15 13:57:16 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 14:08:09 <godog> !log swift eqiad-prod: add weight back to sdg on ms-be1054 - T273582
2021-02-15 14:08:14 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 14:08:15 <stashbot> T273582: Put sdg1 on ms-be1054 back in service - https://phabricator.wikimedia.org/T273582
2021-02-15 14:10:43 <wikibugs> SRE, SRE-swift-storage, Patch-For-Review, User-fgiunchedi: swift backend decomms / rebalances are noisy - https://phabricator.wikimedia.org/T221904 (fgiunchedi) Open→Resolved I'm boldly resolving this again since limiting memory usage for object replication processes helped a whole lot to...
2021-02-15 14:12:42 <wikibugs> (PS1) Urbanecm: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789)
2021-02-15 14:13:04 <Urbanecm> jouncebot: now
2021-02-15 14:13:05 <jouncebot> No deployments scheduled for the next 3 hour(s) and 46 minute(s)
2021-02-15 14:13:15 <wikibugs> (PS2) Urbanecm: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789)
2021-02-15 14:13:18 <wikibugs> (CR) Urbanecm: [C: +2] Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789) (owner: Urbanecm)
2021-02-15 14:14:07 <wikibugs> (Merged) jenkins-bot: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/664294 (https://phabricator.wikimedia.org/T274789) (owner: Urbanecm)
2021-02-15 14:17:02 <logmsgbot> !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 00905c4a7e4bb69f39e52e1c4d4d6168006b0e7b: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T274789) (duration: 01m 09s)
2021-02-15 14:17:06 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 14:17:07 <stashbot> T274789: Add <https://static.president.az/> to the wgCopyUploadsDomains allowlist of Wikimedia Commons - https://phabricator.wikimedia.org/T274789
2021-02-15 14:19:43 <icinga-wm> PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 14:23:44 <wikibugs> (PS8) Jbond: Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
2021-02-15 14:25:33 <icinga-wm> RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 14:28:37 <wikibugs> (CR) David Caro: utils: add script to run docker ci tests locally (3 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/663205 (https://phabricator.wikimedia.org/T274338) (owner: David Caro)
2021-02-15 14:31:40 <wikibugs> (CR) Jbond: [C: +2] Add check to error when calling to hiera() [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/659789 (https://phabricator.wikimedia.org/T209953) (owner: Ladsgroup)
2021-02-15 14:34:23 <wikibugs> SRE, Maps, Product-Infrastructure-Team-Backlog, Services, Service-deployment-requests: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (MoritzMuehlenhoff) p:Triage→Medium
2021-02-15 14:34:33 <wikibugs> SRE, Maps, Product-Infrastructure-Team-Backlog, Services, Service-deployment-requests: [DRAFT] New Service Request tegola - https://phabricator.wikimedia.org/T274390 (MoritzMuehlenhoff) p:Triage→Medium
2021-02-15 14:45:09 <icinga-wm> RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 14:48:25 <wikibugs> (PS1) Jbond: Gemfile: increase dependency for wmf_style-stylegude-check [puppet] - https://gerrit.wikimedia.org/r/664297 (https://phabricator.wikimedia.org/T209953)
2021-02-15 15:04:50 <godog> !log upgrade grafana to 7.4.1 on grafana1002 - T263747
2021-02-15 15:04:54 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:04:55 <stashbot> T263747: Upgrade Grafana to 7.4 - https://phabricator.wikimedia.org/T263747
2021-02-15 15:06:15 <wikibugs> (CR) Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 15:06:27 <wikibugs> SRE, SRE-Access-Requests: Requesting access to stat boxes for mlitn - https://phabricator.wikimedia.org/T274749 (MoritzMuehlenhoff) Also adding @Ottomata for approval for analytics-privatedata-users.
2021-02-15 15:09:46 <moritzm> !log reimaging bast3004 to buster
2021-02-15 15:09:49 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:15:06 <wikibugs> (PS1) Bartosz Dziewoński: CommentFormatter: Fix problems with editsection and quotes [extensions/DiscussionTools] (wmf/1.36.0-wmf.30) - https://gerrit.wikimedia.org/r/664254 (https://phabricator.wikimedia.org/T274709)
2021-02-15 15:17:18 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
2021-02-15 15:17:21 <wikibugs> (CR) Jbond: "did a quick pass however im not that familiar with the current decom cook book" (7 comments) [cookbooks] - https://gerrit.wikimedia.org/r/663878 (owner: Elukey)
2021-02-15 15:17:21 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:20:05 <wikibugs> (PS1) Kormat: integration_env: Rework cli to simplify operations [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/664300
2021-02-15 15:20:10 <icinga-wm> PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
2021-02-15 15:27:56 <wikibugs> (CR) Hashar: [C: +1] "Can be merged anytime, the CI job always does a gem update :]" [puppet] - https://gerrit.wikimedia.org/r/664297 (https://phabricator.wikimedia.org/T209953) (owner: Jbond)
2021-02-15 15:28:49 <wikibugs> (CR) Jbond: [C: +2] Gemfile: increase dependency for wmf_style-stylegude-check [puppet] - https://gerrit.wikimedia.org/r/664297 (https://phabricator.wikimedia.org/T209953) (owner: Jbond)
2021-02-15 15:30:19 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
2021-02-15 15:30:23 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:31:21 <icinga-wm> RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp5012 is OK: HTTP OK: HTTP/1.0 200 OK - 23547 bytes in 0.829 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
2021-02-15 15:33:08 <moritzm> !log installing linux-4.19 update for Stretch on servers which have it installed (no reboots, just updating the kernels)
2021-02-15 15:33:12 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:33:35 <wikibugs> (CR) Kormat: [C: +2] integration_env: Rework cli to simplify operations [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/664300 (owner: Kormat)
2021-02-15 15:34:16 <logmsgbot> !log jmm@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on bast3004.wikimedia.org with reason: REIMAGE
2021-02-15 15:34:20 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:34:30 <wikibugs> (CR) Jcrespo: [C: +2] Preventive commit for jynus to misspell "bullseye", next Debian version [puppet] - https://gerrit.wikimedia.org/r/664237 (owner: Jcrespo)
2021-02-15 15:36:11 <logmsgbot> !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
2021-02-15 15:36:12 <logmsgbot> !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
2021-02-15 15:36:12 <logmsgbot> !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
2021-02-15 15:36:15 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:36:18 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:36:20 <logmsgbot> !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3004.wikimedia.org with reason: REIMAGE
2021-02-15 15:36:22 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:36:25 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:36:46 <wikibugs> (PS1) Jcrespo: testing test at test at testing [puppet] - https://gerrit.wikimedia.org/r/664301
2021-02-15 15:36:54 <wikibugs> (Merged) jenkins-bot: integration_env: Rework cli to simplify operations [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/664300 (owner: Kormat)
2021-02-15 15:38:02 <wikibugs> (CR) jerkins-bot: [V: -1] testing test at test at testing [puppet] - https://gerrit.wikimedia.org/r/664301 (owner: Jcrespo)
2021-02-15 15:38:36 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
2021-02-15 15:38:39 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:38:49 <wikibugs> (CR) Jcrespo: "16:37:55 Typo found!" [puppet] - https://gerrit.wikimedia.org/r/664301 (owner: Jcrespo)
2021-02-15 15:39:13 <wikibugs> (Abandoned) Jcrespo: testing test at test at testing [puppet] - https://gerrit.wikimedia.org/r/664301 (owner: Jcrespo)
2021-02-15 15:39:46 <wikibugs> (CR) Alexandros Kosiaris: [C: -1] "1 pedantic comment but perhaps we can solve this more easily, see inline." (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/659863 (owner: JMeybohm)
2021-02-15 15:39:52 <wikibugs> SRE: reprepro unable to run checkupdate and import upgraded packages - https://phabricator.wikimedia.org/T274797 (fgiunchedi)
2021-02-15 15:40:39 <wikibugs> (PS1) Elukey: hadoop: update the HDFS Namenode rack configuration [puppet] - https://gerrit.wikimedia.org/r/664302 (https://phabricator.wikimedia.org/T274795)
2021-02-15 15:41:13 <wikibugs> SRE: reprepro unable to run checkupdate and import upgraded packages - https://phabricator.wikimedia.org/T274797 (fgiunchedi)
2021-02-15 15:44:52 <wikibugs> (CR) Alexandros Kosiaris: "+1, but perhaps we don't even need it? See dependent commit" [deployment-charts] - https://gerrit.wikimedia.org/r/659864 (owner: JMeybohm)
2021-02-15 15:45:07 <wikibugs> (PS1) Muehlenhoff: Add a comment to the snapshot block [puppet] - https://gerrit.wikimedia.org/r/664303
2021-02-15 15:46:19 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
2021-02-15 15:46:21 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:46:44 <wikibugs> (PS1) Arturo Borrero Gonzalez: Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - https://gerrit.wikimedia.org/r/664255
2021-02-15 15:46:53 <wikibugs> (PS2) Arturo Borrero Gonzalez: Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - https://gerrit.wikimedia.org/r/664255
2021-02-15 15:47:25 <wikibugs> (PS3) Arturo Borrero Gonzalez: Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - https://gerrit.wikimedia.org/r/664255 (https://phabricator.wikimedia.org/T272963)
2021-02-15 15:47:45 <wikibugs> (PS2) Elukey: hadoop: update the HDFS Namenode rack configuration [puppet] - https://gerrit.wikimedia.org/r/664302 (https://phabricator.wikimedia.org/T274795)
2021-02-15 15:48:09 <logmsgbot> !log jayme@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
2021-02-15 15:48:12 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:48:54 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
2021-02-15 15:48:56 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:49:03 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] Revert "cloud: hiera: add vlan 2120 back into the neutron bridge" [puppet] - https://gerrit.wikimedia.org/r/664255 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 15:50:13 <wikibugs> (PS1) Muehlenhoff: Remove obsolete cloudera config from reprepro [puppet] - https://gerrit.wikimedia.org/r/664304 (https://phabricator.wikimedia.org/T274797)
2021-02-15 15:50:56 <wikibugs> (CR) Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 15:51:26 <wikibugs> (PS1) Arturo Borrero Gonzalez: Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - https://gerrit.wikimedia.org/r/664256
2021-02-15 15:51:26 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
2021-02-15 15:51:29 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:51:39 <wikibugs> (PS2) Arturo Borrero Gonzalez: Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - https://gerrit.wikimedia.org/r/664256 (https://phabricator.wikimedia.org/T272963)
2021-02-15 15:51:47 <wikibugs> (PS3) Arturo Borrero Gonzalez: Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - https://gerrit.wikimedia.org/r/664256 (https://phabricator.wikimedia.org/T272963)
2021-02-15 15:52:15 <wikibugs> SRE, Patch-For-Review: reprepro unable to run checkupdate and import upgraded packages - https://phabricator.wikimedia.org/T274797 (fgiunchedi) Note that the elastic 5 "not found" errors seem flappy, I just got a `checkupdate` run without those errors
2021-02-15 15:53:19 <wikibugs> (PS1) Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - https://gerrit.wikimedia.org/r/664257
2021-02-15 15:53:26 <wikibugs> (PS2) Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - https://gerrit.wikimedia.org/r/664257
2021-02-15 15:53:34 <logmsgbot> !log elukey@cumin1001 START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
2021-02-15 15:53:36 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:53:37 <wikibugs> (PS3) Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - https://gerrit.wikimedia.org/r/664257 (https://phabricator.wikimedia.org/T272963)
2021-02-15 15:53:49 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] Revert "cloud: hiera: connect cloudnet servers back to vlan 2120" [puppet] - https://gerrit.wikimedia.org/r/664256 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 15:53:57 <wikibugs> (CR) Filippo Giunchedi: [C: +1] Add a comment to the snapshot block [puppet] - https://gerrit.wikimedia.org/r/664303 (owner: Muehlenhoff)
2021-02-15 15:57:26 <wikibugs> (PS4) Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" This reverts commit 5ca98c9df08f6c6e2d97bc7b6279cdaf573eddce. Reason for revert: rebuilding the cloudgw setup Bug: T272963 Change-Id: I8185f4fa36a70255940d78db45b0f50cfc6abb98 Signed-off-by: Arturo Borrero Gonzalez <aborrero@wikimedia.org> [puppet] - https://gerrit.wikimedia.org/r/664257 (https://phabricator.wi
2021-02-15 15:58:00 <logmsgbot> !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
2021-02-15 15:58:03 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 15:58:12 <wikibugs> (PS5) Arturo Borrero Gonzalez: Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - https://gerrit.wikimedia.org/r/664257 (https://phabricator.wikimedia.org/T272963)
2021-02-15 15:58:20 <wikibugs> SRE, SRE-tools, User-Joe: Covert deploy_apache_change.sh to a spicerack cookbook - https://phabricator.wikimedia.org/T203948 (jijiki)
2021-02-15 16:02:38 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] Revert "cloud: hiera: enable back neutron hacks in codfw1dev" [puppet] - https://gerrit.wikimedia.org/r/664257 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 16:04:06 <wikibugs> (CR) Volans: "Thanks for the refactor, some comments inline, some already discussed over IRC." (14 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/661921 (https://phabricator.wikimedia.org/T267412) (owner: David Caro)
2021-02-15 16:04:51 <wikibugs> SRE, ops-eqiad, DC-Ops, Wikidata, and 3 others: Upgrade firmware on wdqs1009 - https://phabricator.wikimedia.org/T274751 (Gehel)
2021-02-15 16:05:18 <logmsgbot> !log aborrero@cumin2001 START - Cookbook sre.hosts.reboot-single for host cloudnet2003-dev.codfw.wmnet
2021-02-15 16:05:21 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:05:56 <wikibugs> SRE: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (MoritzMuehlenhoff)
2021-02-15 16:07:37 <logmsgbot> !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
2021-02-15 16:07:40 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:09:49 <logmsgbot> !log aborrero@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2003-dev.codfw.wmnet
2021-02-15 16:09:52 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:10:23 <icinga-wm> PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) is CRITICAL: Test Zotero and citoid alive returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid
2021-02-15 16:11:29 <wikibugs> (PS1) Arturo Borrero Gonzalez: cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/664307 (https://phabricator.wikimedia.org/T272963)
2021-02-15 16:11:55 <icinga-wm> RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
2021-02-15 16:12:12 <logmsgbot> !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
2021-02-15 16:12:16 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:12:57 <logmsgbot> !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
2021-02-15 16:13:01 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:14:35 <hoo> !log Updated the Wikidata property suggester with data from the 2021-02-01 JSON dump (with pre-applied T132839 workarounds)
2021-02-15 16:14:38 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:14:40 <stashbot> T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839
2021-02-15 16:16:34 <wikibugs> (PS2) Arturo Borrero Gonzalez: cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/664307 (https://phabricator.wikimedia.org/T272963)
2021-02-15 16:18:20 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: stop setting up VIP addresses that are now handle via keepalived/VRRP [puppet] - https://gerrit.wikimedia.org/r/664307 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 16:18:35 <logmsgbot> !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
2021-02-15 16:18:39 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:22:08 <wikibugs> (CR) Muehlenhoff: [C: +2] Add a comment to the snapshot block [puppet] - https://gerrit.wikimedia.org/r/664303 (owner: Muehlenhoff)
2021-02-15 16:22:14 <logmsgbot> !log jmm@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet
2021-02-15 16:22:17 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:24:53 <wikibugs> SRE: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (Volans) p:Triage→High a:Volans
2021-02-15 16:25:11 <wikibugs> (PS1) Volans: interface automation: fix typo in method name [software/netbox-extras] - https://gerrit.wikimedia.org/r/664308 (https://phabricator.wikimedia.org/T274802)
2021-02-15 16:26:03 <jayme> !log rolled back linkrecommendation helm releases to the most recent revision running chart verion linkrecommendation-0.0.4 on clusters codfw and eqiad (cc: kostajh)
2021-02-15 16:26:05 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:27:09 <logmsgbot> !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage1001.eqiad.wmnet
2021-02-15 16:27:13 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:28:09 <wikibugs> (CR) Volans: [C: +2] "self merging as it's just a typo, will run the script against bast3004 manually to verify it" [software/netbox-extras] - https://gerrit.wikimedia.org/r/664308 (https://phabricator.wikimedia.org/T274802) (owner: Volans)
2021-02-15 16:32:38 <logmsgbot> !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1001.eqiad.wmnet
2021-02-15 16:32:43 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:33:48 <volans> !log restarted netbox on netbox1001
2021-02-15 16:33:51 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:36:18 <icinga-wm> PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 16:36:42 <wikibugs> (PS1) Volans: interface automation: fix typo in method name (2) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802)
2021-02-15 16:37:12 <volans> mmmh icinga, are you sure? it's all good there, it was me and was already fixed
2021-02-15 16:37:20 <wikibugs> (CR) jerkins-bot: [V: -1] interface automation: fix typo in method name (2) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802) (owner: Volans)
2021-02-15 16:37:56 <wikibugs> (PS2) Volans: interface automation: fix typo in method name (2) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802)
2021-02-15 16:39:57 <logmsgbot> !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single for host kubestage1002.eqiad.wmnet
2021-02-15 16:40:00 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:40:06 <wikibugs> (CR) Volans: [C: +2] "Typo fix." [software/netbox-extras] - https://gerrit.wikimedia.org/r/664309 (https://phabricator.wikimedia.org/T274802) (owner: Volans)
2021-02-15 16:40:14 <icinga-wm> PROBLEM - kubelet operational latencies on kubestage1001 is CRITICAL: instance=kubestage1001.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
2021-02-15 16:40:14 <wikibugs> (PS1) Kosta Harlan: linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893)
2021-02-15 16:40:45 <jayme> ^ thats "expected" (kind of) from reboots
2021-02-15 16:41:29 <wikibugs> (CR) jerkins-bot: [V: -1] linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
2021-02-15 16:41:40 <icinga-wm> RECOVERY - kubelet operational latencies on kubestage1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
2021-02-15 16:43:00 <wikibugs> (PS2) Kosta Harlan: linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893)
2021-02-15 16:43:18 <icinga-wm> RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 16:44:44 <wikibugs> SRE, CAS-SSO, Patch-For-Review: Investigate CAS Session logout - https://phabricator.wikimedia.org/T273867 (Gehel) Removing discovery-search, if you need our help again, please ping us!
2021-02-15 16:46:44 <logmsgbot> !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1002.eqiad.wmnet
2021-02-15 16:46:49 <stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
2021-02-15 16:48:30 <icinga-wm> PROBLEM - Check systemd state on netbox2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 16:48:50 <wikibugs> SRE, Patch-For-Review: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (Volans) a:Volans→crusnov @crusnov passing it over to you. I've fixed the basic typos, but the problem now is that the scri...
2021-02-15 16:49:43 <wikibugs> (PS1) Arturo Borrero Gonzalez: cloudgw: switch data place interface config modes to manual [puppet] - https://gerrit.wikimedia.org/r/664311 (https://phabricator.wikimedia.org/T272963)
2021-02-15 16:49:51 <wikibugs> SRE, Patch-For-Review: netbox update (triggered from reimage script) failed: 'ImportPuppetDB' object has no attribute 'log_error' - https://phabricator.wikimedia.org/T274802 (crusnov) That seems reasonable, I'll look at it and get a patch out soonish.
2021-02-15 16:52:45 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: switch data place interface config modes to manual [puppet] - https://gerrit.wikimedia.org/r/664311 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 16:53:09 <icinga-wm> PROBLEM - kubelet operational latencies on kubestage1002 is CRITICAL: instance=kubestage1002.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
2021-02-15 16:57:37 <icinga-wm> RECOVERY - kubelet operational latencies on kubestage1002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
2021-02-15 17:00:58 <wikibugs> SRE, Maps, Product-Infrastructure-Team-Backlog, Services, Service-deployment-requests: New Service Request geoshapes - https://phabricator.wikimedia.org/T274388 (akosiaris) Thanks for this task! So I 've studied the diagrams a bit, they are helpful. The deployment pipeline definitely suppor...
2021-02-15 17:03:18 <wikibugs> (CR) Elukey: [C: +1] "Just to confirm - this will keep the cloudera components but clear all the pull-specific bits. If so, big +1, thanks :)" [puppet] - https://gerrit.wikimedia.org/r/664304 (https://phabricator.wikimedia.org/T274797) (owner: Muehlenhoff)
2021-02-15 17:16:13 <wikibugs> (CR) Elukey: "John thanks a lot for the review! For this particular use case, I'd prefer to just move the existing code base to the class api and then m" [cookbooks] - https://gerrit.wikimedia.org/r/663878 (owner: Elukey)
2021-02-15 17:27:06 <wikibugs> (CR) Elukey: [C: +2] hadoop: update the HDFS Namenode rack configuration [puppet] - https://gerrit.wikimedia.org/r/664302 (https://phabricator.wikimedia.org/T274795) (owner: Elukey)
2021-02-15 17:28:16 <wikibugs> (PS1) Jcrespo: configcluster: Enable etcd v3 backups for stretch hosts [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573)
2021-02-15 17:28:18 <wikibugs> (PS1) Jcrespo: bacula: Revert TLS 1.0 downgrade on storage servers (including director) [puppet] - https://gerrit.wikimedia.org/r/664314 (https://phabricator.wikimedia.org/T273182)
2021-02-15 17:29:54 <wikibugs> (Abandoned) Jcrespo: jessie: Remove old openssl override after revert to package version [puppet] - https://gerrit.wikimedia.org/r/660857 (https://phabricator.wikimedia.org/T273182) (owner: Jcrespo)
2021-02-15 17:30:04 <wikibugs> (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 17:32:07 <wikibugs> (CR) JMeybohm: [C: +1] linkrecommendation: Set backoffLimit to 1 [deployment-charts] - https://gerrit.wikimedia.org/r/664310 (https://phabricator.wikimedia.org/T265893) (owner: Kosta Harlan)
2021-02-15 17:32:43 <icinga-wm> RECOVERY - Check systemd state on netbox2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
2021-02-15 17:32:43 <wikibugs> (PS10) David Caro: toolforge.etcdctl: add new etcdctl module [software/spicerack] - https://gerrit.wikimedia.org/r/661921 (https://phabricator.wikimedia.org/T267412)
2021-02-15 17:33:16 <wikibugs> (CR) David Caro: "Done all the changes as requested" (13 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/661921 (https://phabricator.wikimedia.org/T267412) (owner: David Caro)
2021-02-15 17:39:15 <wikibugs> (CR) Jcrespo: "Have you tested backups with the script on etcd3? I don't see anything, like a path, completely wrong, but I don't know enough about what " [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573) (owner: Jcrespo)
2021-02-15 17:41:17 <wikibugs> SRE, serviceops, Patch-For-Review: upgrade conf2* servers to stretch - https://phabricator.wikimedia.org/T271573 (jcrespo) I've sent: https://gerrit.wikimedia.org/r/c/operations/puppet/+/664313 Independently of the pace of upgrading, we should give some priority to generating fresh backups from the...
2021-02-15 17:43:56 <wikibugs> (PS2) Jcrespo: configcluster: Enable etcd v3 backups for stretch hosts [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573)
2021-02-15 17:44:23 <wikibugs> (PS3) Jcrespo: configcluster: Enable etcd v3 backups for stretch hosts [puppet] - https://gerrit.wikimedia.org/r/664313 (https://phabricator.wikimedia.org/T271573)
2021-02-15 17:55:42 <wikibugs> (PS1) Arturo Borrero Gonzalez: cloudgw: interfaces: relax check on routing setup by using 'onlink' [puppet] - https://gerrit.wikimedia.org/r/664317 (https://phabricator.wikimedia.org/T272963)
2021-02-15 17:57:40 <wikibugs> (CR) Arturo Borrero Gonzalez: [C: +2] cloudgw: interfaces: relax check on routing setup by using 'onlink' [puppet] - https://gerrit.wikimedia.org/r/664317 (https://phabricator.wikimedia.org/T272963) (owner: Arturo Borrero Gonzalez)
2021-02-15 17:59:36 <wikibugs> (CR) Muehlenhoff: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/664304 (https://phabricator.wikimedia.org/T274797) (owner: Muehlenhoff)
2021-02-15 18:00:04 <jouncebot> ryankemper: Dear deployers, time to do the Wikidata Query Service weekly deploy deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1800).
2021-02-15 18:05:14 <wikibugs> (CR) Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 18:10:38 <wikibugs> (CR) Jbond: [C: +1] "> Patch Set 1:" [cookbooks] - https://gerrit.wikimedia.org/r/663878 (owner: Elukey)
2021-02-15 18:14:52 <wikibugs> SRE, DBA, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo)
2021-02-15 18:15:15 <wikibugs> SRE, Data-Persistence-Backup, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo)
2021-02-15 18:15:40 <wikibugs> (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 18:15:41 <wikibugs> SRE, Data-Persistence-Backup, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo) Open→Resolved Regarding the last 2 points, we have, in a way, done the last point "parametrize better the jobdefaults i...
2021-02-15 18:17:39 <icinga-wm> PROBLEM - Uncommitted DNS changes in Netbox on netbox1001 is CRITICAL: Netbox has uncommitted DNS changes https://wikitech.wikimedia.org/wiki/Monitoring/Netbox_DNS_uncommitted_changes
2021-02-15 18:28:38 <wikibugs> (PS1) Effie Mouzeli: (WIP) mediawiki::alerts add alert when 20% of servers is saturated [puppet] - https://gerrit.wikimedia.org/r/664319 (https://phabricator.wikimedia.org/T267176)
2021-02-15 18:33:52 <wikibugs> (CR) Ppchelko: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 18:41:27 <icinga-wm> PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
2021-02-15 18:41:47 <icinga-wm> PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
2021-02-15 18:45:40 <jynus> that looks like DPLA bot on commons
2021-02-15 18:46:29 <jynus> I see no issues, but keep an eye in case something degrades (thumbail generation, codfw s4 replication, etc.)
2021-02-15 18:47:54 <jynus> that's 10 1MB files per second
2021-02-15 18:48:16 <tabbycat> jynus: swift is TimedMediaHandler or just the place where uploads are being stored?
2021-02-15 18:49:21 <jynus> swift is our OpenStack Swift cluster, our backend storage for media and rendered stuff: https://wikitech.wikimedia.org/wiki/Swift
2021-02-15 18:49:59 <jynus> the alert is just a warning on a high rate of uploads- that doesn't mean there is a problem, but it is an unusual state
2021-02-15 18:50:23 <jynus> normally we worry when it is very low, because it means there is a problem with uploads
2021-02-15 19:00:04 <jouncebot> No GERRIT patches in the queue for this window AFAICS.
2021-02-15 19:00:04 <jouncebot> RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T1900). Please do the needful.
2021-02-15 19:00:56 <stashbot> T248177: Enforce upload rate limits for bots on commons - https://phabricator.wikimedia.org/T248177
2021-02-15 19:00:56 <Urbanecm> jynus: do we want to do T248177?
2021-02-15 19:01:29 <Urbanecm> (but 999 uploads per second is effectively no rate limit anyway :/ )
2021-02-15 19:02:09 <tabbycat> 999/s is o_O
2021-02-15 19:03:32 <tabbycat> IIRC there is/was an UploadStash for large or batch uploads Urbanecm ?
2021-02-15 19:04:10 <Urbanecm> there's still uploadstash, dunno if it helps with ratelimited uploads
2021-02-15 19:11:01 <icinga-wm> PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
2021-02-15 19:21:03 <icinga-wm> PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
2021-02-15 19:28:58 <wikibugs> (CR) Kosta Harlan: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)
2021-02-15 19:31:51 <wikibugs> (CR) CRusnov: "This change is ready for review." [software/netbox-extras] - https://gerrit.wikimedia.org/r/664332 (https://phabricator.wikimedia.org/T274802) (owner: CRusnov)
2021-02-15 20:10:06 <wikibugs> (PS1) Ladsgroup: [DNM] Test jenkins new rule on banning use of hiera() [puppet] - https://gerrit.wikimedia.org/r/664350
2021-02-15 20:11:43 <wikibugs> (CR) jerkins-bot: [V: -1] [DNM] Test jenkins new rule on banning use of hiera() [puppet] - https://gerrit.wikimedia.org/r/664350 (owner: Ladsgroup)
2021-02-15 20:25:00 <wikibugs> (Abandoned) Ladsgroup: [DNM] Test jenkins new rule on banning use of hiera() [puppet] - https://gerrit.wikimedia.org/r/664350 (owner: Ladsgroup)
2021-02-15 20:30:51 <wikibugs> SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to Analytic Cluster for Research Scientist (Paragon) - https://phabricator.wikimedia.org/T274631 (leila) approved. Thank you for your support!
2021-02-15 20:46:21 <icinga-wm> PROBLEM - MegaRAID on an-worker1097 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
2021-02-15 20:46:24 <icinga-wm> ACKNOWLEDGEMENT - MegaRAID on an-worker1097 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T274819 https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
2021-02-15 20:46:27 <wikibugs> SRE, ops-eqiad: Degraded RAID on an-worker1097 - https://phabricator.wikimedia.org/T274819 (ops-monitoring-bot)
2021-02-15 20:47:01 <wikibugs> SRE, ops-eqiad, Analytics: Degraded RAID on an-worker1097 - https://phabricator.wikimedia.org/T274819 (Peachey88)
2021-02-15 21:00:04 <jouncebot> chrisalbon and accraze: It is that lovely time of the day again! You are hereby commanded to deploy Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T2100).
2021-02-15 21:51:52 <icinga-wm> PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
2021-02-15 21:52:04 <icinga-wm> PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
2021-02-15 22:00:04 <jouncebot> Reedy and sbassett: Dear deployers, time to do the Weekly Security deployment window deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210215T2200).
2021-02-15 22:50:50 <wikibugs> (CR) Volans: [C: +1] "Code looks good to me, please test it on netbox-next to be sure." (1 comment) [software/netbox-extras] - https://gerrit.wikimedia.org/r/664332 (https://phabricator.wikimedia.org/T274802) (owner: CRusnov)
2021-02-15 22:52:34 <icinga-wm> PROBLEM - Device not healthy -SMART- on an-worker1097 is CRITICAL: cluster=analytics device=sat+megaraid,13 instance=an-worker1097 job=node site=eqiad https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=an-worker1097&var-datasource=eqiad+prometheus/ops
2021-02-15 23:31:52 <wikibugs> (CR) Gergő Tisza: api-gateway: generic discovery service config option, add linkrecommendation (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/662692 (https://phabricator.wikimedia.org/T269581) (owner: Hnowlan)

This page is generated from SQL logs, you can also download static txt files from here