[00:10:03] <Reedy>	 You need to get someone with permission to do it
[00:10:09] <Reedy>	 And/or get yourself added to the allow list
[00:11:57] <wikibugs>	 (CR) Reedy: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/741980 (https://phabricator.wikimedia.org/T296136) (owner: 4nn1l2)
[00:12:40] <wikibugs>	 (PS3) Reedy: enwikisource: enable anonymous talk page mobile tabs [mediawiki-config] - https://gerrit.wikimedia.org/r/741097 (https://phabricator.wikimedia.org/T47955) (owner: Inductiveload)
[00:16:49] <nn1l2>	 Thanks, how can I get myself added to the allow list?
[00:20:04] <AntiComposite>	 https://www.mediawiki.org/wiki/Continuous_integration/Allow_list
[00:20:26] <AntiComposite>	 tl;dr convince someone that you aren't malicious and should be added to https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/refs/heads/master/zuul/layout.yaml
[00:25:32] <nn1l2>	 Thanks, I'm an admin and interface admin on Commons: https://commons.wikimedia.org/wiki/User:4nn1l2 Been around about 10 years. Here is a list of my previous commits: https://phabricator.wikimedia.org/people/commits/4285/ Could someone please add me to the list?
[00:50:01] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=sidekiq site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:52:13] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:56:01] <nn1l2>	 Here is the associated patch: https://gerrit.wikimedia.org/r/c/integration/config/+/741985 Should I schedule it for a backport window or something?
[04:30:45] <icinga-wm>	 PROBLEM - Check systemd state on db1115 is CRITICAL: CRITICAL - degraded: The following units failed: wmf_auto_restart_prometheus-mysqld-exporter.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:43:19] <icinga-wm>	 PROBLEM - SSH on kubernetes1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:41:07] <icinga-wm>	 RECOVERY - Check systemd state on db1115 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:44:31] <icinga-wm>	 RECOVERY - SSH on kubernetes1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:54:21] <wikibugs>	 (PS1) Marostegui: control-mariadb-client-10.4-bullseye: Bump version [software] - https://gerrit.wikimedia.org/r/741997 (https://phabricator.wikimedia.org/T295965)
[05:55:32] <wikibugs>	 (CR) Marostegui: [C: +2] control-mariadb-client-10.4-bullseye: Bump version [software] - https://gerrit.wikimedia.org/r/741997 (https://phabricator.wikimedia.org/T295965) (owner: Marostegui)
[05:56:03] <wikibugs>	 (Merged) jenkins-bot: control-mariadb-client-10.4-bullseye: Bump version [software] - https://gerrit.wikimedia.org/r/741997 (https://phabricator.wikimedia.org/T295965) (owner: Marostegui)
[06:19:45] <Amir1>	 !log killing lingering process from mwmaint to depooled db (db1160) that was depooled nine hours ago
[06:19:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:11] <Amir1>	 !log killing extensions/MachineVision/maintenance/fetchSuggestions.php in mwmaint
[06:28:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:10] <Amir1>	 Created T296507
[06:35:11] <stashbot>	 T296507: fetchSuggestions opens connection to depooled database after nine hours - https://phabricator.wikimedia.org/T296507
[07:13:59] <icinga-wm>	 PROBLEM - SSH on bast5002 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:16:11] <icinga-wm>	 RECOVERY - SSH on bast5002 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:34:11] <wikibugs>	 (CR) Elukey: "Left some ideas/comments, let me know your thoughts John!" [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[07:43:20] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17873 and previous config saved to /var/cache/conftool/dbconfig/20211126-074320-ladsgroup.json
[07:43:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:25] <stashbot>	 T296143: Optimize commonswiki image table - https://phabricator.wikimedia.org/T296143
[07:58:25] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17874 and previous config saved to /var/cache/conftool/dbconfig/20211126-075824-ladsgroup.json
[07:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:29] <stashbot>	 T296143: Optimize commonswiki image table - https://phabricator.wikimedia.org/T296143
[08:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211126T0800)
[08:13:29] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17875 and previous config saved to /var/cache/conftool/dbconfig/20211126-081329-ladsgroup.json
[08:13:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:34] <stashbot>	 T296143: Optimize commonswiki image table - https://phabricator.wikimedia.org/T296143
[08:28:34] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17876 and previous config saved to /var/cache/conftool/dbconfig/20211126-082834-ladsgroup.json
[08:28:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:39] <stashbot>	 T296143: Optimize commonswiki image table - https://phabricator.wikimedia.org/T296143
[08:50:03] <wikibugs>	 SRE, SRE-Access-Requests, Wikibase Release Strategy, Wikidata, wdwb-tech: Requesting access to releasers-wikibase for rosalie-WMDE - https://phabricator.wikimedia.org/T295765 (Rosalie_WMDE) @Jelto The document has been signed
[08:50:19] <wikibugs>	 SRE, SRE-Access-Requests, Wikibase Release Strategy, Wikidata, wdwb-tech: Requesting access to releasers-wikibase for rosalie-WMDE - https://phabricator.wikimedia.org/T295765 (Rosalie_WMDE)
[08:53:19] <wikibugs>	 SRE, Data-Persistence, observability, Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (Marostegui)
[09:06:40] <wikibugs>	 (PS1) Majavah: devtools: set doc1002 to use local puppetmaster [puppet] - https://gerrit.wikimedia.org/r/742078
[09:08:43] <wikibugs>	 (CR) Alexandros Kosiaris: [C: -1] "LGTM, minor pedantic comment." [deployment-charts] - https://gerrit.wikimedia.org/r/740858 (https://phabricator.wikimedia.org/T296303) (owner: JMeybohm)
[09:13:33] <wikibugs>	 (PS7) Majavah: P::doc: sync data to non-active servers [puppet] - https://gerrit.wikimedia.org/r/741713 (https://phabricator.wikimedia.org/T247653)
[09:15:28] <wikibugs>	 (CR) Majavah: "Tested on "devtools" cloud vps project. Works as expected." [puppet] - https://gerrit.wikimedia.org/r/741713 (https://phabricator.wikimedia.org/T247653) (owner: Majavah)
[09:23:35] <wikibugs>	 (PS4) David Caro: WIP cli: add --fail-fast flag and behavior [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028)
[09:23:52] <wikibugs>	 (PS5) David Caro: cli: add --fail-fast flag and behavior [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028)
[09:24:55] <wikibugs>	 (CR) David Caro: cli: add --fail-fast flag and behavior (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028) (owner: David Caro)
[09:35:24] <wikibugs>	 (CR) David Caro: cli: add --fail-fast flag and behavior (3 comments) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028) (owner: David Caro)
[09:37:13] <wikibugs>	 (PS6) David Caro: cli: add --fail-fast flag and behavior [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028)
[09:44:06] <wikibugs>	 (CR) David Caro: cli: add --fail-fast flag and behavior (2 comments) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028) (owner: David Caro)
[09:50:12] <wikibugs>	 (PS11) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[09:50:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[09:51:10] <wikibugs>	 (PS4) David Caro: timesyncd: handle bullseye ntp hosts [puppet] - https://gerrit.wikimedia.org/r/741849 (https://phabricator.wikimedia.org/T296456)
[09:51:48] <wikibugs>	 (PS12) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[09:52:00] <wikibugs>	 (PS5) David Caro: timesyncd: handle bullseye ntp hosts [puppet] - https://gerrit.wikimedia.org/r/741849 (https://phabricator.wikimedia.org/T296456)
[09:52:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[09:52:28] <wikibugs>	 (PS6) David Caro: timesyncd: handle bullseye ntp hosts [puppet] - https://gerrit.wikimedia.org/r/741849 (https://phabricator.wikimedia.org/T296456)
[09:54:26] <wikibugs>	 (CR) David Caro: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32656/console" [puppet] - https://gerrit.wikimedia.org/r/741849 (https://phabricator.wikimedia.org/T296456) (owner: David Caro)
[09:55:10] <wikibugs>	 (CR) David Caro: [V: +1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32657/console" [puppet] - https://gerrit.wikimedia.org/r/741849 (https://phabricator.wikimedia.org/T296456) (owner: David Caro)
[09:55:34] <wikibugs>	 (CR) David Caro: [V: +1 C: +2] timesyncd: handle bullseye ntp hosts [puppet] - https://gerrit.wikimedia.org/r/741849 (https://phabricator.wikimedia.org/T296456) (owner: David Caro)
[09:55:45] <wikibugs>	 (CR) David Caro: [V: +1 C: +2] timesyncd: handle bullseye ntp hosts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/741849 (https://phabricator.wikimedia.org/T296456) (owner: David Caro)
[09:58:33] <icinga-wm>	 PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:04:06] <wikibugs>	 (PS1) David Caro: timsyncd: Flip the handling service condition [puppet] - https://gerrit.wikimedia.org/r/742107 (https://phabricator.wikimedia.org/T296456)
[10:04:28] <wikibugs>	 (CR) David Caro: [C: +2] timsyncd: Flip the handling service condition [puppet] - https://gerrit.wikimedia.org/r/742107 (https://phabricator.wikimedia.org/T296456) (owner: David Caro)
[10:04:51] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T296143
[10:04:53] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T296143
[10:04:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:04:56] <stashbot>	 T296143: Optimize commonswiki image table - https://phabricator.wikimedia.org/T296143
[10:04:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:26] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[10:05:42] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance T296274
[10:05:43] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance T296274
[10:05:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:45] <stashbot>	 T296274: Clean up wikiadmin GRANTs mess - https://phabricator.wikimedia.org/T296274
[10:05:48] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1177 (T296274)', diff saved to https://phabricator.wikimedia.org/P17877 and previous config saved to /var/cache/conftool/dbconfig/20211126-100547-ladsgroup.json
[10:05:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:36] <wikibugs>	 (PS13) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[10:08:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:09:06] <wikibugs>	 (PS1) Ayounsi: Pmacct add sflow listener [puppet] - https://gerrit.wikimedia.org/r/742110 (https://phabricator.wikimedia.org/T263277)
[10:09:25] <icinga-wm>	 RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 23694 bytes in 0.249 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[10:10:26] <jinxer-wm>	 (KubernetesCalicoDown) resolved: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[10:13:32] <wikibugs>	 (PS14) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[10:14:09] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:14:24] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repool after fixing users T296274', diff saved to https://phabricator.wikimedia.org/P17878 and previous config saved to /var/cache/conftool/dbconfig/20211126-101423-ladsgroup.json
[10:14:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:29] <stashbot>	 T296274: Clean up wikiadmin GRANTs mess - https://phabricator.wikimedia.org/T296274
[10:17:08] <logmsgbot>	 !log ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance T296274
[10:17:09] <logmsgbot>	 !log ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance T296274
[10:17:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:14] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db1111 (T296274)', diff saved to https://phabricator.wikimedia.org/P17879 and previous config saved to /var/cache/conftool/dbconfig/20211126-101714-ladsgroup.json
[10:17:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:25] <wikibugs>	 (PS15) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[10:20:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:23:06] <wikibugs>	 (PS16) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[10:23:40] <logmsgbot>	 !log ladsgroup@cumin1001 dbctl commit (dc=all): 'Repool after fixing users T296274', diff saved to https://phabricator.wikimedia.org/P17880 and previous config saved to /var/cache/conftool/dbconfig/20211126-102340-ladsgroup.json
[10:23:41] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:23:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:23:45] <stashbot>	 T296274: Clean up wikiadmin GRANTs mess - https://phabricator.wikimedia.org/T296274
[10:23:50] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32661/console" [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:26:37] <wikibugs>	 (PS1) David Caro: tests: move to pytest [software/puppet-compiler] - https://gerrit.wikimedia.org/r/742112 (https://phabricator.wikimedia.org/T296481)
[10:28:38] <wikibugs>	 SRE, Infrastructure-Foundations, Traffic-Icebox, netops, Patch-For-Review: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (ayounsi) I went the "set a different sampling pipeline for internal flows" way with the above POC for the reasons mentioned in T263...
[10:33:11] <wikibugs>	 (PS17) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[10:33:26] <wikibugs>	 (CR) Jbond: P:base::certificates: update support for trusted CA (3 comments) [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:34:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:35:51] <wikibugs>	 (CR) Jbond: P:base::certificates: update support for trusted CA (1 comment) [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089) (owner: Jbond)
[10:37:26] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[10:37:27] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:42:26] <jinxer-wm>	 (KubernetesCalicoDown) firing: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[10:47:26] <jinxer-wm>	 (KubernetesCalicoDown) resolved: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[10:54:02] <wikibugs>	 (CR) Jbond: "looks good and my local tests pass" [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028) (owner: David Caro)
[10:56:49] <wikibugs>	 (PS18) Jbond: P:base::certificates: update support for trusted CA [puppet] - https://gerrit.wikimedia.org/r/741867 (https://phabricator.wikimedia.org/T296089)
[10:59:04] <wikibugs>	 (CR) Jbond: [C: +1] "LGTM" [software/puppet-compiler] - https://gerrit.wikimedia.org/r/742112 (https://phabricator.wikimedia.org/T296481) (owner: David Caro)
[11:00:51] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:01:26] <jynus>	 atlas_exporter monitoring is flapping on and off quite frequently lately
[11:04:26] <jinxer-wm>	 (KubernetesCalicoDown) firing: kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[11:09:26] <jinxer-wm>	 (KubernetesCalicoDown) resolved: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[11:13:29] <wikibugs>	 (CR) David Caro: cli: add --fail-fast flag and behavior (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028) (owner: David Caro)
[11:15:45] <wikibugs>	 (PS1) Filippo Giunchedi: pontoon: use profile::base on puppet master [puppet] - https://gerrit.wikimedia.org/r/742121
[11:15:49] <wikibugs>	 (CR) Jbond: cli: add --fail-fast flag and behavior (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/740539 (https://phabricator.wikimedia.org/T295028) (owner: David Caro)
[11:18:07] <wikibugs>	 (Abandoned) Filippo Giunchedi: pontoon: tmp remove base::puppet for duplicate declaration? [puppet] - https://gerrit.wikimedia.org/r/740595 (owner: Filippo Giunchedi)
[11:19:16] <wikibugs>	 (PS1) Vgutierrez: cache::haproxy: Set stat socket privileve level to admin [puppet] - https://gerrit.wikimedia.org/r/742122 (https://phabricator.wikimedia.org/T290005)
[11:20:28] <wikibugs>	 (PS2) Vgutierrez: cache::haproxy: Set stat socket privilege level to admin [puppet] - https://gerrit.wikimedia.org/r/742122 (https://phabricator.wikimedia.org/T290005)
[11:24:21] <wikibugs>	 (CR) Vgutierrez: [C: +2] cache::haproxy: Set stat socket privilege level to admin [puppet] - https://gerrit.wikimedia.org/r/742122 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[11:25:50] <vgutierrez>	 !log restarting HAProxy on O:cache::(text|upload)_haproxy - T290005
[11:25:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:55] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[11:32:26] <jinxer-wm>	 (KubernetesCalicoDown) firing: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[11:39:15] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'sync'.
[11:39:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:10] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'sync'.
[11:41:12] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
[11:41:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:17] <akosiaris>	 !log T296303 cleanup weird state of calico-codfw cluster
[11:41:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:20] <stashbot>	 T296303: New Kubernetes nodes may end up with no Pod IPv4 block assigned - https://phabricator.wikimedia.org/T296303
[11:47:26] <jinxer-wm>	 (KubernetesCalicoDown) resolved: (2) kubestage2001.codfw.wmnet:9091 is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org
[11:56:34] <wikibugs>	 (CR) David Caro: "I got a question and a few nits (feel free to ignore those)" [puppet] - https://gerrit.wikimedia.org/r/740915 (https://phabricator.wikimedia.org/T295247) (owner: Majavah)
[12:06:45] <wikibugs>	 (PS1) Vgutierrez: cache::haproxy: Relax HTTP parsing rules [puppet] - https://gerrit.wikimedia.org/r/742128 (https://phabricator.wikimedia.org/T290005)
[12:09:04] <wikibugs>	 (CR) Vgutierrez: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32662/console" [puppet] - https://gerrit.wikimedia.org/r/742128 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[12:12:29] <wikibugs>	 (CR) Vgutierrez: [V: +1 C: +2] cache::haproxy: Relax HTTP parsing rules [puppet] - https://gerrit.wikimedia.org/r/742128 (https://phabricator.wikimedia.org/T290005) (owner: Vgutierrez)
[12:19:14] <wikibugs>	 SRE, DBA, Privacy Engineering, WMF-Legal, and 3 others: dbtree loads third party resources (from google.com/jsapi) - https://phabricator.wikimedia.org/T96499 (Marostegui) Stalled→Declined We are going to deprecate tendril in favour of orchestrator, we've already opened it for people under...
[12:21:33] <vgutierrez>	 !log restarting HAProxy on O:cache::upload_haproxy - T290005
[12:21:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:37] <stashbot>	 T290005: Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005
[12:23:39] <wikibugs>	 (PS1) Jbond: P:puppet_compiler::postgres_database: create ssl directory [puppet] - https://gerrit.wikimedia.org/r/742130
[12:24:01] <wikibugs>	 (CR) Jbond: [V: +2 C: +2] P:puppet_compiler::postgres_database: create ssl directory [puppet] - https://gerrit.wikimedia.org/r/742130 (owner: Jbond)
[12:40:37] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction [puppet] - https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752)
[12:42:10] <wikibugs>	 (CR) jerkins-bot: [V: -1] ceph: move bootstrap keyring into new auth abstraction [puppet] - https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752) (owner: Arturo Borrero Gonzalez)
[12:43:11] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: hieradata: ceph: refresh bootstrap auth [labs/private] - https://gerrit.wikimedia.org/r/742133 (https://phabricator.wikimedia.org/T293752)
[12:45:26] <wikibugs>	 (PS1) Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136
[12:45:59] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32663/console" [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[12:46:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[12:50:10] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction [puppet] - https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752)
[12:54:44] <wikibugs>	 (CR) Majavah: [C: +1] "Let's give this a try at some point!" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/738503 (https://phabricator.wikimedia.org/T293552) (owner: Legoktm)
[12:58:44] <wikibugs>	 (PS2) Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136
[12:59:19] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32664/console" [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[12:59:21] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[12:59:35] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:02:48] <wikibugs>	 (PS3) Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136
[13:03:22] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[13:04:24] <wikibugs>	 (PS4) Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136
[13:05:09] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[13:05:11] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32666/console" [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[13:08:55] <wikibugs>	 (PS1) Kormat: .gitignore: Ignore __pycache__ dirs. [software/wmfdb] - https://gerrit.wikimedia.org/r/742140
[13:11:31] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] pontoon: use profile::base on puppet master [puppet] - https://gerrit.wikimedia.org/r/742121 (owner: Filippo Giunchedi)
[13:13:34] <wikibugs>	 (PS5) Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136
[13:14:08] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32667/console" [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[13:14:09] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[13:16:24] <wikibugs>	 (PS6) Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136
[13:16:59] <wikibugs>	 (CR) jerkins-bot: [V: -1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[13:17:25] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32669/console" [puppet] - https://gerrit.wikimedia.org/r/742136 (owner: Jbond)
[13:24:37] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.68 ms
[13:25:03] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[13:25:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:50] <wikibugs>	 (03PS4) 10Jelto: helmfile.d:miscweb add node affinity to ssd nodes [deployment-charts] - 10https://gerrit.wikimedia.org/r/741124
[13:25:52] <logmsgbot>	 !log akosiaris@deploy1002 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[13:25:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:47] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752)
[13:29:15] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:29:55] <wikibugs>	 (03CR) 10David Caro: ceph: move bootstrap keyring into new auth abstraction (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez)
[13:31:19] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:33:15] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[13:33:54] <wikibugs>	 (03PS4) 10Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752)
[13:35:37] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez)
[13:35:49] <wikibugs>	 10SRE, 10Analytics, 10Observability-Metrics: statsd and gunicorn metrics for superset - https://phabricator.wikimedia.org/T293761 (10fgiunchedi)
[13:36:16] <wikibugs>	 10SRE, 10Observability-Logging, 10User-ema: rsyslog errors about duplicate module includes - https://phabricator.wikimedia.org/T292175 (10fgiunchedi)
[13:36:29] <wikibugs>	 10SRE, 10Observability-Logging, 10User-ema: rsyslog error: queue directory '/var/spool/rsyslog' and file name prefix 'output_kafka_json' already used - https://phabricator.wikimedia.org/T292180 (10fgiunchedi)
[13:37:47] <wikibugs>	 (03PS7) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[13:38:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:38:56] <wikibugs>	 (03CR) 10Jelto: [C: 03+2] helmfile.d:miscweb add node affinity to ssd nodes [deployment-charts] - 10https://gerrit.wikimedia.org/r/741124 (owner: 10Jelto)
[13:40:46] <wikibugs>	 10SRE, 10ops-eqiad, 10DC-Ops, 10SRE Observability (FY2021/2022-Q2): (Need By: TBD) rack/setup/install prometheus100[56] - https://phabricator.wikimedia.org/T294967 (10fgiunchedi)
[13:41:09] <wikibugs>	 (03PS8) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[13:41:19] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] Add ownership annotations for more Service SRE services [puppet] - 10https://gerrit.wikimedia.org/r/738426 (https://phabricator.wikimedia.org/T216088) (owner: 10Muehlenhoff)
[13:41:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:42:28] <wikibugs>	 (03Merged) 10jenkins-bot: helmfile.d:miscweb add node affinity to ssd nodes [deployment-charts] - 10https://gerrit.wikimedia.org/r/741124 (owner: 10Jelto)
[13:43:29] <wikibugs>	 (03PS9) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[13:44:00] <wikibugs>	 (03PS10) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[13:44:08] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32672/console" [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:44:35] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:46:03] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32673/console" [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:46:29] <logmsgbot>	 !log jelto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
[13:46:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:38] <logmsgbot>	 !log jelto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
[13:48:39] <wikibugs>	 (03CR) 10David Caro: ceph: move bootstrap keyring into new auth abstraction (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez)
[13:48:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:55] <wikibugs>	 10Puppet, 10SRE, 10Infrastructure-Foundations: Role hieradata for non-existent roles - https://phabricator.wikimedia.org/T296533 (10Majavah)
[13:52:35] <wikibugs>	 (03PS11) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[13:52:59] <wikibugs>	 10SRE, 10DBA, 10Wikimedia-Mailing-lists, 10Schema-change: Mailman3 schema change:  Switch autoresponse_text fields to Text - https://phabricator.wikimedia.org/T286552 (10Ladsgroup) I ran it in the cloud. So far everything looks good.
[13:53:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:55:43] <wikibugs>	 (03PS12) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[13:56:20] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:57:17] <wikibugs>	 (03PS13) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[13:57:26] <wikibugs>	 (03PS5) 10Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752)
[13:57:52] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32677/console" [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:57:54] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[13:58:37] <wikibugs>	 (03CR) 10David Caro: [C: 03+1] hieradata: ceph: refresh bootstrap auth [labs/private] - 10https://gerrit.wikimedia.org/r/742133 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez)
[13:58:57] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: hieradata: ceph: refresh bootstrap auth [labs/private] - 10https://gerrit.wikimedia.org/r/742133 (https://phabricator.wikimedia.org/T293752)
[14:00:24] <wikibugs>	 10SRE, 10DBA, 10Wikimedia-Mailing-lists, 10Schema-change: Mailman3 schema change:  Switch autoresponse_text fields to Text - https://phabricator.wikimedia.org/T286552 (10Ladsgroup) Added a massive text to auto response and it worked fine meaning the schema change fixes the issue. I think we can move forwar...
[14:02:54] <wikibugs>	 (03PS14) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[14:03:29] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[14:03:31] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32678/console" [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[14:06:25] <wikibugs>	 (03PS15) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[14:07:58] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[14:08:23] <wikibugs>	 10SRE, 10DBA, 10Wikimedia-Mailing-lists, 10Schema-change: Mailman3 schema change:  Switch autoresponse_text fields to Text - https://phabricator.wikimedia.org/T286552 (10Marostegui) Sounds good to me, I can help with the deployment :)
[14:09:31] <wikibugs>	 (03PS16) 10Jbond: P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136
[14:10:57] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.67 ms
[14:11:04] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:puppet_compiler::postgres_database: pass config via hiera [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[14:14:02] <wikibugs>	 (03PS17) 10Jbond: P:puppet_compiler: Refactor [puppet] - 10https://gerrit.wikimedia.org/r/742136
[14:15:51] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:puppet_compiler: Refactor [puppet] - 10https://gerrit.wikimedia.org/r/742136 (owner: 10Jbond)
[14:15:54] <wikibugs>	 10ops-codfw: logstash2028.mgmt flapping - https://phabricator.wikimedia.org/T296540 (10fgiunchedi)
[14:21:20] <logmsgbot>	 !log jelto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
[14:21:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:35] <wikibugs>	 (03PS1) 10Jbond: puppet_compiler: mkdir_p workdir not vardir [puppet] - 10https://gerrit.wikimedia.org/r/742146
[14:23:32] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet_compiler: mkdir_p workdir not vardir [puppet] - 10https://gerrit.wikimedia.org/r/742146 (owner: 10Jbond)
[14:25:41] <logmsgbot>	 !log jelto@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
[14:25:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:20] <wikibugs>	 (03PS1) 10Filippo Giunchedi: logstash: log receiver and instance alert labels [puppet] - 10https://gerrit.wikimedia.org/r/742147
[14:30:24] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] logstash: log receiver and instance alert labels [puppet] - 10https://gerrit.wikimedia.org/r/742147 (owner: 10Filippo Giunchedi)
[14:30:32] <logmsgbot>	 !log jelto@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
[14:30:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:54] <wikibugs>	 (03PS1) 10Jbond: puppet_compiler: dont use mkdir_p [puppet] - 10https://gerrit.wikimedia.org/r/742149
[14:37:31] <wikibugs>	 (03PS2) 10Jbond: puppet_compiler: dont use mkdir_p [puppet] - 10https://gerrit.wikimedia.org/r/742149
[14:38:03] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32681/console" [puppet] - 10https://gerrit.wikimedia.org/r/742149 (owner: 10Jbond)
[14:39:15] <wikibugs>	 (03PS2) 10Filippo Giunchedi: logstash: log additional alert labels [puppet] - 10https://gerrit.wikimedia.org/r/742147
[14:39:18] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] puppet_compiler: dont use mkdir_p [puppet] - 10https://gerrit.wikimedia.org/r/742149 (owner: 10Jbond)
[14:40:22] <wikibugs>	 (03CR) 10Awight: "I'm not sure how to run the image or the varnishtest, so the patch was made blindly." [puppet] - 10https://gerrit.wikimedia.org/r/742148 (https://phabricator.wikimedia.org/T296512) (owner: 10Awight)
[15:00:31] <wikibugs>	 (03PS1) 10MMandere: admin: Add user rosalie-wmde to releasers-wikibase [puppet] - 10https://gerrit.wikimedia.org/r/742152 (https://phabricator.wikimedia.org/T295765)
[15:03:15] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[15:06:14] <wikibugs>	 10ops-eqiad, 10DC-Ops: hw troubleshooting: memory stick failure (uncorrectable error + reduced available memory) for db1102 - https://phabricator.wikimedia.org/T296546 (10jcrespo)
[15:06:30] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Reduce memory allocation for dbs at db1102 due to hw failure [puppet] - 10https://gerrit.wikimedia.org/r/742153 (https://phabricator.wikimedia.org/T296546)
[15:07:58] <wikibugs>	 10ops-eqiad, 10DC-Ops, 10Data-Persistence-Backup, 10database-backups, 10Patch-For-Review: hw troubleshooting: memory stick failure (uncorrectable error + reduced available memory) for db1102 - https://phabricator.wikimedia.org/T296546 (10jcrespo)
[15:08:15] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Reduce memory allocation for dbs at db1102 due to hw failure [puppet] - 10https://gerrit.wikimedia.org/r/742153 (https://phabricator.wikimedia.org/T296546) (owner: 10Jcrespo)
[15:15:41] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 45.24 ms
[15:17:23] <icinga-wm>	 PROBLEM - Check systemd state on ores1008 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:31:42] <wikibugs>	 (03PS1) 10Jbond: puppet_compiler: additional volume is not ephemeral [puppet] - 10https://gerrit.wikimedia.org/r/742158
[15:33:01] <icinga-wm>	 PROBLEM - Disk space on ores1008 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=96%): /tmp 0 MB (0% inode=96%): /var/tmp 0 MB (0% inode=96%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1008&var-datasource=eqiad+prometheus/ops
[15:33:38] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] puppet_compiler: additional volume is not ephemeral [puppet] - 10https://gerrit.wikimedia.org/r/742158 (owner: 10Jbond)
[15:40:14] <jynus>	 who's handling orest lately? research? machine learning?
[15:40:19] <jynus>	 *ORES
[15:42:14] <elukey>	 ML :)
[15:42:30] <elukey>	 ah snap I see the alert, my highlights for IRC didn't work
[15:42:32] <elukey>	 sigh
[15:42:37] <jynus>	 deploy-cache seems the culprit
[15:43:19] <jynus>	 ah, no, that is not
[15:44:00] <elukey>	 it seems /var/tmp
[15:44:04] <jynus>	 yepo
[15:44:07] <jynus>	 *yep
[15:44:40] <elukey>	 ah lovely some fresh coredumps
[15:44:55] <elukey>	 celery indeed segfaulted
[15:45:35] <icinga-wm>	 RECOVERY - Check systemd state on ores1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:45:43] <jynus>	 for mysql we disabled cores - a 500GB file wasn't that useful :-)
[15:45:55] <jynus>	 as in, coredumps, not processor cores :-)
[15:46:45] <elukey>	 !log move /var/tmp/core/* to /srv/coredumps on ores1008 to free root space
[15:46:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:08] <elukey>	 yeah we should put a limit and/or move the target dir to a bigger partition
[15:49:46] <elukey>	 wow the disks are really slow
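The cleanup logged at 15:46 (moving /var/tmp/core/* to /srv/coredumps) together with the suggestion to "put a limit" could be sketched roughly as follows. This is only an illustration, not what was actually run on ores1008: the function names, the size threshold, and the move-everything-once-over-the-limit policy are assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import List


def core_usage_bytes(core_dir: Path) -> int:
    """Return the total size of regular files directly inside core_dir."""
    return sum(p.stat().st_size for p in core_dir.iterdir() if p.is_file())


def relocate_cores(core_dir: Path, target_dir: Path, limit_bytes: int) -> List[Path]:
    """Move all files out of core_dir once its usage exceeds limit_bytes.

    limit_bytes is a hypothetical threshold; the log does not name one.
    Returns the new paths of the moved files (empty if under the limit).
    """
    if core_usage_bytes(core_dir) <= limit_bytes:
        return []
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for p in sorted(core_dir.iterdir()):
        if p.is_file():
            # shutil.move falls back to copy+delete across filesystems
            moved.append(Path(shutil.move(str(p), str(target_dir / p.name))))
    return moved
```

Called as `relocate_cores(Path("/var/tmp/core"), Path("/srv/coredumps"), limit_bytes)`, it would mirror the manual move. Moving across partitions copies the data, so on slow disks it can take a while.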
[15:50:42] <wikibugs>	 (03PS1) 10Jbond: P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163
[15:51:18] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32682/console" [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[15:51:34] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[15:53:09] <wikibugs>	 (03PS2) 10Jbond: P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163
[15:53:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[15:53:53] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32683/console" [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[15:54:07] <icinga-wm>	 RECOVERY - Disk space on ores1008 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1008&var-datasource=eqiad+prometheus/ops
[15:55:19] <wikibugs>	 (03PS3) 10Jbond: P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163
[15:55:58] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32684/console" [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[15:56:37] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[15:58:23] <wikibugs>	 (03PS4) 10Jbond: P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163
[15:59:04] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32685/console" [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[16:00:13] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] P:ci::slave::labs::common: Add toggle for lvm managment [puppet] - 10https://gerrit.wikimedia.org/r/742163 (owner: 10Jbond)
[16:03:20] <wikibugs>	 (03PS1) 10Jelto: charts: fix affinity indentation in charts and scaffold chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/742166
[16:05:53] <arnoldokoth>	 !log drain kubestage1001 node in prep for decommissioning
[16:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:09:34] <wikibugs>	 (03CR) 10Btullis: "As discussed in the attached ticket, I propose that we abandon this CR and make a follow-up ticket to create an inventory of the paging le" [puppet] - 10https://gerrit.wikimedia.org/r/681420 (https://phabricator.wikimedia.org/T273064) (owner: 10Razzi)
[16:09:36] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Update termbox to 2021-11-26-093451-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/742167 (https://phabricator.wikimedia.org/T296202)
[16:10:52] <wikibugs>	 (03Abandoned) 10Btullis: alerts: add victorops paging for hadoop master and kafka broker [puppet] - 10https://gerrit.wikimedia.org/r/681420 (https://phabricator.wikimedia.org/T273064) (owner: 10Razzi)
[16:11:03] <arnoldokoth>	 !log drain kubestage1002 node in prep for decommissioning
[16:11:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:25] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): "I’m not sure if both files should be updated in the same commit, but it looks like this is what was done in the past, and if I understand " [deployment-charts] - 10https://gerrit.wikimedia.org/r/742167 (https://phabricator.wikimedia.org/T296202) (owner: 10Lucas Werkmeister (WMDE))
[16:20:39] <wikibugs>	 (03PS1) 10Majavah: admin: add .bashrc for taavi [puppet] - 10https://gerrit.wikimedia.org/r/742168
[16:23:22] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetdb postgress server: fix dependcey loop - https://phabricator.wikimedia.org/T296550 (10jbond) p:05Triage→03Medium
[16:24:25] <wikibugs>	 10Puppet, 10Infrastructure-Foundations, 10User-jbond: puppetdb postgress server: fix dependcey loop - https://phabricator.wikimedia.org/T296550 (10jbond)
[16:24:32] <wikibugs>	 (03CR) 10Jelto: [C: 04-1] gitlab: restore script keep_config options (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/741675 (https://phabricator.wikimedia.org/T274463) (owner: 10AOkoth)
[16:28:07] <wikibugs>	 (03PS2) 10Majavah: admin: add .bashrc for taavi [puppet] - 10https://gerrit.wikimedia.org/r/742168
[16:37:09] <icinga-wm>	 PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[16:52:24] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "I have seen and reviewed most of this code before in https://gitlab.com/wmde/wmde-technicalwishes-docker-dev/-/merge_requests/36/diffs and" [puppet] - 10https://gerrit.wikimedia.org/r/742148 (https://phabricator.wikimedia.org/T296512) (owner: 10Awight)
[16:53:23] <wikibugs>	 (03PS6) 10Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752)
[16:59:39] <wikibugs>	 (03PS4) 10Hnowlan: partman: add reuse partman profile for cassandra hosts [puppet] - 10https://gerrit.wikimedia.org/r/738924 (https://phabricator.wikimedia.org/T295375)
[16:59:56] <wikibugs>	 (03CR) 10Hnowlan: partman: add reuse partman profile for cassandra hosts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/738924 (https://phabricator.wikimedia.org/T295375) (owner: 10Hnowlan)
[17:00:00] <wikibugs>	 (03PS5) 10Hnowlan: partman: add reuse partman profile for cassandra hosts [puppet] - 10https://gerrit.wikimedia.org/r/738924 (https://phabricator.wikimedia.org/T295375)
[17:06:27] <wikibugs>	 (03PS3) 10Jcrespo: admin: add .bashrc for taavi [puppet] - 10https://gerrit.wikimedia.org/r/742168 (owner: 10Majavah)
[17:07:32] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] admin: add .bashrc for taavi [puppet] - 10https://gerrit.wikimedia.org/r/742168 (owner: 10Majavah)
[17:15:17] <icinga-wm>	 RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.79 ms
[17:16:35] <wikibugs>	 (03CR) 10Jcrespo: "Is there some obscure puppet functionality I cannot see (eg. custom module outside production referencing the btulis files) or is this a m" [puppet] - 10https://gerrit.wikimedia.org/r/731403 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[17:19:47] <wikibugs>	 (03PS1) 10Jcrespo: admin: Fix path of btullis' dotfiles and one script [puppet] - 10https://gerrit.wikimedia.org/r/742172 (https://phabricator.wikimedia.org/T285754)
[17:21:38] <wikibugs>	 (03CR) 10Jcrespo: "https://gerrit.wikimedia.org/r/c/operations/puppet/+/742172/" [puppet] - 10https://gerrit.wikimedia.org/r/731403 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[17:27:25] <wikibugs>	 (03CR) 10Jbond: Add initial personal dotfiles and one script (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731403 (https://phabricator.wikimedia.org/T285754) (owner: 10Btullis)
[17:27:38] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm thx" [puppet] - 10https://gerrit.wikimedia.org/r/742172 (https://phabricator.wikimedia.org/T285754) (owner: 10Jcrespo)
[17:30:25] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+1] "I think this should be safe to merge, but as I am about to leave for the weekend, I will let btullis themself merge at their convenience, " [puppet] - 10https://gerrit.wikimedia.org/r/742172 (https://phabricator.wikimedia.org/T285754) (owner: 10Jcrespo)
[17:42:58] <wikibugs>	 (03PS5) 10Hnowlan: C:cassandra: add optional java_package variable [puppet] - 10https://gerrit.wikimedia.org/r/722599 (https://phabricator.wikimedia.org/T261966) (owner: 10Jbond)
[17:44:33] <wikibugs>	 (03CR) 10Hnowlan: [V: 03+1] "PCC SUCCESS (DIFF 7): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32687/console" [puppet] - 10https://gerrit.wikimedia.org/r/722599 (https://phabricator.wikimedia.org/T261966) (owner: 10Jbond)
[17:56:52] <wikibugs>	 (03CR) 10Hnowlan: [V: 03+1] "lgtm, I can merge this on monday" [puppet] - 10https://gerrit.wikimedia.org/r/722599 (https://phabricator.wikimedia.org/T261966) (owner: 10Jbond)
[18:19:09] <wikibugs>	 (03PS7) 10Arturo Borrero Gonzalez: ceph: move bootstrap keyring into new auth abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742132 (https://phabricator.wikimedia.org/T293752)
[18:19:11] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: profile: ceph: cleanup firewall config [puppet] - 10https://gerrit.wikimedia.org/r/742174
[18:19:13] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: ceph: auth: introduce new parameter 'import_to_ceph' [puppet] - 10https://gerrit.wikimedia.org/r/742175 (https://phabricator.wikimedia.org/T293752)
[18:19:15] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: ceph: migrate mon auth to the new abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742176 (https://phabricator.wikimedia.org/T293752)
[18:21:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ceph: migrate mon auth to the new abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742176 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez)
[18:22:32] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: ceph: migrate mon auth to the new abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742176 (https://phabricator.wikimedia.org/T293752)
[18:24:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ceph: migrate mon auth to the new abstraction [puppet] - 10https://gerrit.wikimedia.org/r/742176 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez)
[18:44:18] <icinga-wm>	 RECOVERY - Check systemd state on ms-fe2010 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:50:54] <icinga-wm>	 PROBLEM - Check systemd state on ms-fe2010 is CRITICAL: CRITICAL - degraded: The following units failed: swift-proxy.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:11:22] <icinga-wm>	 PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:14:50] <wikibugs>	 10SRE, 10Infrastructure-Foundations, 10netops, 10Patch-For-Review, 10Sustainability (Incident Followup): Use next-hop-self for iBGP sessions - https://phabricator.wikimedia.org/T295672 (10cmooney) Earlier in the week I attempted to remove the "metric-out minimum-igp" from the iBGP session between cr1-eqi...
[19:22:16] <icinga-wm>	 RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 23674 bytes in 0.250 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:30:00] <nn1l2>	 Does anybody know approximately when Wikimedia will migrate from Gerrit to GitLab?
[19:30:58] <majavah>	 "some date that's not in the past" is the best I have
[19:31:25] <AntiComposite>	 it also depends what you mean by "Wikimedia" and "migrate"
[19:32:04] <nn1l2>	 I'm reading the Gerrit user manual. I want to know if it's worth my time.
[19:33:21] <AntiComposite>	 I expect we'll be using Gerrit for a while. Repositories will slowly migrate to GitLab, as it becomes more stable/usable
[20:05:50] <icinga-wm>	 PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:06:13] <perryprog>	 I've wondered this as well. I've always found that Gerrit seems to be... just fine, and it surprises me that moving off it, and off all the tooling we already have for it, is considered worthwhile.
[20:08:19] <perryprog>	 Although it's probably also because change is bad, everything old is good, everything new is bad, keep things the same, yada yada
[20:14:36] <icinga-wm>	 RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 23694 bytes in 0.251 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:19:10] <icinga-wm>	 PROBLEM - SSH on contint1001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:33:41] <brennen>	 nn1l2: we are in the process of that migration.  see https://www.mediawiki.org/wiki/GitLab/Roadmap#pioneers
[20:34:15] <brennen>	 there is a channel for discussion and collaboration on migrating things at #wikimedia-gitlab
[20:38:48] <nn1l2>	 Thanks for the link!
[21:07:47] <wikibugs>	 (03CR) 10Ssingh: [C: 03+1] "confirmed UID, NDA, L3, SSH key." [puppet] - 10https://gerrit.wikimedia.org/r/742152 (https://phabricator.wikimedia.org/T295765) (owner: 10MMandere)
[21:31:54] <icinga-wm>	 PROBLEM - proton LVS codfw on proton.svc.codfw.wmnet is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Proton
[21:34:02] <icinga-wm>	 RECOVERY - proton LVS codfw on proton.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Proton
[22:14:24] <icinga-wm>	 PROBLEM - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[22:23:08] <icinga-wm>	 RECOVERY - Ensure traffic_exporter for the tls instance binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 23681 bytes in 0.249 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[23:22:40] <icinga-wm>	 RECOVERY - SSH on contint1001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:32:08] <icinga-wm>	 PROBLEM - Check systemd state on ores1007 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:34:00] <icinga-wm>	 PROBLEM - Router interfaces on mr1-drmrs is CRITICAL: CRITICAL: host 185.15.58.130, interfaces up: 34, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[23:37:52] <icinga-wm>	 PROBLEM - Host mr1-drmrs.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[23:37:52] <icinga-wm>	 PROBLEM - Host mr1-drmrs.oob is DOWN: PING CRITICAL - Packet loss = 100%
[23:37:58] <icinga-wm>	 PROBLEM - Check systemd state on ores1008 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:39:34] <icinga-wm>	 PROBLEM - Check systemd state on ores1006 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus_puppet_agent_stats.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:47:26] <icinga-wm>	 PROBLEM - Disk space on ores1007 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=96%): /tmp 0 MB (0% inode=96%): /var/tmp 0 MB (0% inode=96%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1007&var-datasource=eqiad+prometheus/ops
[23:51:18] <icinga-wm>	 PROBLEM - Disk space on ores1008 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=96%): /tmp 0 MB (0% inode=96%): /var/tmp 0 MB (0% inode=96%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1008&var-datasource=eqiad+prometheus/ops