[01:37:03] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[01:57:06] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.039 second response time
[02:29:04] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[02:45:03] (PS1) Samwilson: [DarkMode] Add new extension [integration/config] - https://gerrit.wikimedia.org/r/506934 (https://phabricator.wikimedia.org/T221877)
[05:02:54] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<20.00%)
[06:52:53] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:41:20] PROBLEM - Citoid on deployment-sca02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:51:10] RECOVERY - Citoid on deployment-sca02 is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.025 second response time
[08:30:21] Continuous-Integration-Config, ArticleFeedbackv5, Patch-For-Review, User-D3r1ck01: Add CI to mediawiki/extensions/ArticleFeedbackv5 - https://phabricator.wikimedia.org/T218759 (D3r1ck01) Open→Resolved
[08:39:56] MediaWiki-Codesniffer: Add sniff to require trailing commas in multiline arrays - https://phabricator.wikimedia.org/T222042 (thiemowmde) This is one of these cases where I wonder if the effort is worth it. How often do we run into a situation where we think "dang, I wish a sniff would have reported this befo...
[08:43:41] MediaWiki-Codesniffer: Add sniff to require trailing commas in multiline arrays - https://phabricator.wikimedia.org/T222042 (D3r1ck01) > But **not** when the array is formatted like this: > `lang=php > $array = [ 'a' => 1, 'b' => 2 ]; > ` You're absolutely right per above, that is not a multiline array, it'...
[09:40:26] Beta-Cluster-Infrastructure: Migrate away from Debian Jessie to Debian Stretch - https://phabricator.wikimedia.org/T218729 (fgiunchedi) re: logstash, prod hosts are stretch so starting up a stretch instance with the same roles/hiera is expected to work. There will be a couple of migrations involved, namely m...
[09:50:27] Beta-Cluster-Infrastructure, Thumbor: thumbor on deployment-imagescaler03 does not want to start with firejail private-dev rule - https://phabricator.wikimedia.org/T221879 (Gilles) Thumbor appears to be running fine on that host right now, and I see the private-dev rule in /etc/firejail/thumbor.profile...
[09:54:15] Beta-Cluster-Infrastructure: Puppet error on deployment-imagescaler03 due to conflicting Node.js packages - https://phabricator.wikimedia.org/T219089 (Gilles) Open→Resolved a:Gilles Installed nodejs-legacy manually without issue on that host. Not sure why Puppet failed to do so. I see that the pa...
[09:56:37] Beta-Cluster-Infrastructure, Thumbor: libpng problems on deployment-imagescaler02 - https://phabricator.wikimedia.org/T221880 (Gilles) I'm going to guess that this host has an old version of 3d2png installed with node binaries compiled against Jessie instead of Stretch. libpng12-0 doesn't exist in Stretc...
[10:11:19] Beta-Cluster-Infrastructure, Thumbor, Patch-For-Review: libpng problems on deployment-imagescaler02 - https://phabricator.wikimedia.org/T221880 (Gilles) Open→Resolved a:Gilles
[10:11:43] Beta-Cluster-Infrastructure, Thumbor, Patch-For-Review: libpng problems on deployment-imagescaler02 - https://phabricator.wikimedia.org/T221880 (Gilles) I've deployed the Stretch version of 3d2png on both hosts, and that file renders fine.
[11:15:02] Continuous-Integration-Infrastructure, Operations, Traffic, Patch-For-Review: Make CI run Varnish VCL tests - https://phabricator.wikimedia.org/T128188 (ema) VTC tests can now be run from dev workstations against PCC: ` ema@ariel:~/wmf/operations-puppet$ cd modules/varnish/files/tests ; ./run.py...
[11:54:04] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[11:56:35] Gerrit, Repository-Admins, Shape Expressions, Wikidata, Wikidata-Campsite: rename repository for WikibaseSchema - https://phabricator.wikimedia.org/T221946 (WMDE-leszek) This would include: [ ] renaming the gerrit repository [ ] adjusting Jenkins CI config [ ] adjusting translatewiki/L10bot c...
[11:59:06] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.068 second response time
[11:59:44] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[12:10:03] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[12:20:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.026 second response time
[12:23:58] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[13:35:08] PROBLEM - Citoid on deployment-sca02 is CRITICAL: connect to address 172.16.5.112 and port 1970: Connection refused
[13:40:07] RECOVERY - Citoid on deployment-sca02 is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.026 second response time
[13:56:49] Gerrit, Release-Engineering-Team, VPS-project-codesearch, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (thcipriani) It seems like we've tuned a good number of gerrit parameters at this point and we're still experiencing GC thrashing (although less t...
[14:43:14] Beta-Cluster-Infrastructure, Thumbor: thumbor on deployment-imagescaler03 does not want to start with firejail private-dev rule - https://phabricator.wikimedia.org/T221879 (Krenair) I think Moritz removed the private-dev rule from the profile to start it, then put it back
[14:44:27] PROBLEM - Host deployment-ms-be03 is DOWN: CRITICAL - Host Unreachable (172.16.5.51)
[14:45:18] PROBLEM - Host deployment-ms-be04 is DOWN: CRITICAL - Host Unreachable (172.16.4.129)
[14:49:18] Beta-Cluster-Infrastructure, Thumbor: thumbor on deployment-imagescaler03 does not want to start with firejail private-dev rule - https://phabricator.wikimedia.org/T221879 (Gilles) Is private-dev problematic because it's a WMCS VM? @MoritzMuehlenhoff do you have any clue?
[14:50:03] Beta-Cluster-Infrastructure: Migrate away from Debian Jessie to Debian Stretch - https://phabricator.wikimedia.org/T218729 (Krenair) On Thursday when I get rid of deployment-ms-fe02 and deployment-poolcounter04 we should have enough room in the quota again to create an xlarge, at which point I can make a new...
[14:54:37] Beta-Cluster-Infrastructure, Thumbor: thumbor on deployment-imagescaler03 does not want to start with firejail private-dev rule - https://phabricator.wikimedia.org/T221879 (MoritzMuehlenhoff) I can't think of a reason why it should be problematic in an Cloud VPS VM and haven't had any time to investigate...
[15:13:57] Release-Engineering-Team, Librarization, Quibble: Install extension require-dev dependencies in wmf-quibble-vendor-mysql-hhvm-docker - https://phabricator.wikimedia.org/T220723 (EBernhardson) An additional complexity to keep in mind will be extensions that depend on other extensions. The initial work...
[15:17:03] Release-Engineering-Team (Kanban), Release Pipeline, Core Platform Team (Extension Management (TEC13)), Core Platform Team Kanban (Doing), Patch-For-Review: Determine a standard way of installing MediaWiki lib/extension dependencies within containers - https://phabricator.wikimedia.org/T193824 (...
[15:18:37] Beta-Cluster-Infrastructure, User-DannyS712: Beta Cluster: Remove rights from DannyS712 - https://phabricator.wikimedia.org/T222077 (DannyS712)
[15:24:06] Beta-Cluster-Infrastructure, User-DannyS712: Beta Cluster: Remove rights from DannyS712 - https://phabricator.wikimedia.org/T222077 (Krenair) Open→Resolved a:Krenair you didn't have bot but I removed interface admin
[16:05:09] PROBLEM - Citoid on deployment-sca02 is CRITICAL: connect to address 172.16.5.112 and port 1970: Connection refused
[16:10:07] RECOVERY - Citoid on deployment-sca02 is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.042 second response time
[16:24:42] PROBLEM - Citoid on deployment-sca01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:40:27] PROBLEM - Host deployment-ms-fe02 is DOWN: CRITICAL - Host Unreachable (172.16.5.66)
[16:40:46] PROBLEM - Host deployment-poolcounter04 is DOWN: CRITICAL - Host Unreachable (172.16.5.58)
[16:49:32] RECOVERY - Citoid on deployment-sca01 is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.020 second response time
[17:05:29] PROBLEM - Citoid on deployment-sca01 is CRITICAL: connect to address 172.16.5.13 and port 1970: Connection refused
[17:05:44] zuul's now a open infrastructure project https://twitter.com/ZuulCI/status/1122581290396438528
[17:25:31] RECOVERY - Citoid on deployment-sca01 is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.027 second response time
[17:39:34] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[17:52:38] * ottomata is looking for the mediawiki train deployment status page link...
[17:54:23] * ottomata found it: https://tools.wmflabs.org/versions/
[18:26:59] Gerrit, Wikimedia-General-or-Unknown, Documentation, Epic, and 3 others: Update Gerrit /r/p/ links to /r/ - https://phabricator.wikimedia.org/T218844 (bd808)
[18:59:10] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:59:55] (PS1) Jforrester: MakeWmfBranch: Support MW_VERSION as well as $wgVersion [tools/release] - https://gerrit.wikimedia.org/r/507112
[20:00:31] (CR) jerkins-bot: [V: -1] MakeWmfBranch: Support MW_VERSION as well as $wgVersion [tools/release] - https://gerrit.wikimedia.org/r/507112 (owner: Jforrester)
[20:02:43] Release-Engineering-Team, MediaWiki-Containers, Operations, Core Platform Team Kanban (Done with CPT), and 4 others: FY2017/18 Program 6 - Outcome 2 - Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456 (mobrovac)
[20:03:09] (PS2) Jforrester: MakeWmfBranch: Support MW_VERSION as well as $wgVersion [tools/release] - https://gerrit.wikimedia.org/r/507112
[20:54:02] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (MaxSem)
[21:14:55] Gerrit, Release-Engineering-Team, VPS-project-codesearch, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (Dzahn) - We are now using G1 GC - The following bug found by Paladox seems relevant: https://bugs.chromium.org/p/gerrit/issues/detail?id=3259#c5...
[21:24:03] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[21:31:01] Gerrit, Release-Engineering-Team, VPS-project-codesearch, Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (thcipriani) I have noticed 2 problems: 1. GC spiral of doom 2. HTTP threads get stuck behind some lock held by a `SendEmail` thread. I had assu...
[21:38:41] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (MaxSem)
[21:44:04] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.028 second response time
[21:50:07] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[21:58:57] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (MaxSem)
[22:05:23] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (Krenair) github invite sent WMCS account is basically a wikitech account as it all comes from LDAP, will merge on the checklist
[22:06:52] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (Krenair)
[22:15:06] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.024 second response time
[22:17:04] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (MaxSem) Thanks, I wasn't sure there were no extra steps due to heightened security.
[22:43:49] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (Dzahn) ` Account created The user account for HMonroy (talk) has been created. 22:42, 29 April 2019 User account HMonroy (talk | contribs | block) was create...
[22:44:19] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (Dzahn)
[23:20:56] Beta-Cluster-Infrastructure, SDC General, Wikidata, serviceops, Services (done): No jobs running on beta cluster - https://phabricator.wikimedia.org/T215339 (MMiller_WMF)
[23:33:59] Release-Engineering-Team (Kanban), Community-Tech: Phab/Wikitech/Gerrit accounts for Harumi Monroy - https://phabricator.wikimedia.org/T222110 (Dzahn) - reset the password to something random in LDAP - confirmed login works - sent personal email with the initial password asking for it to be changed