[00:08:06] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-kafka-jumbo-[12] due to version of a package being missing - https://phabricator.wikimedia.org/T184240#3881725 (Krenair)
[00:13:22] RECOVERY - Puppet errors on deployment-ms-be03 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:18:48] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:27:23] Beta-Cluster-Infrastructure, Operations, media-storage, Puppet: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236#3881727 (Krenair) Looks like the reason is we have an old broken version of https://gerrit.wikimedia.org/r/#/c/3...
[00:35:18] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0]
[00:39:21] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[00:39:49] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[00:43:10] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:54:20] RECOVERY - Puppet errors on deployment-ms-be03 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:54:49] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:57:53] Beta-Cluster-Infrastructure, Operations, media-storage, Patch-For-Review, Puppet: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236#3881742 (Krenair) a:Krenair Found a syntax problem in the latest version of it too (je...
[01:03:31] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<55.56%)
[01:09:10] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[02:43:42] Project beta-scap-eqiad build #189826: FAILURE in 0.39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/189826/
[02:51:44] 02:43:41 LockFailedError: Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "mwdeploy"; reason is "(no justification provided)"
[02:51:44] 02:43:41 02:43:41 scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "mwdeploy"; reason is "(no justification provided)" (duration: 00m 00s)
[02:56:30] Yippee, build fixed!
[02:56:30] Project beta-scap-eqiad build #189827: FIXED in 2 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/189827/
[04:00:07] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<55.56%)
[04:59:29] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[05:22:25] PROBLEM - Puppet staleness on deployment-restbase02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[05:56:36] Continuous-Integration-Config, Wiki-Loves-Monuments-Database, Patch-For-Review: Generate clover.xml for labs/tools/heritage - https://phabricator.wikimedia.org/T179054#3881834 (Legoktm) >>! In T179054#3877791, @Lokal_Profil wrote: > Thanks! > > Possibly a new task: Could we also generate coverage fo...
[07:05:09] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[08:19:29] (PS1) Rafidaslam: Make Change-Id optional in non-gerrit repositories [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/402775 (https://phabricator.wikimedia.org/T179905)
[08:20:34] commit-message-validator, Google-Code-in-2017, Patch-For-Review: Make Change-Id optional - https://phabricator.wikimedia.org/T179905#3881998 (rafidaslam) I think it'd be neat if this tool was using a library like `GitPython` instead of calling `git` from shell. We could get some advantages like incre...
[08:46:11] (CR) Hashar: "I love it. That makes the content easy to discover!" [integration/docroot] - https://gerrit.wikimedia.org/r/402482 (owner: Legoktm)
[09:04:07] (CR) Hashar: "You can probably speed it up by first checking whether there is a .gitreview file pointing to your Gerrit." (1 comment) [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/402775 (https://phabricator.wikimedia.org/T179905) (owner: Rafidaslam)
[09:13:06] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-netbox, looks like it thinks its a prod box - https://phabricator.wikimedia.org/T184242#3882064 (hashar) Seems like deployment-netbox fails to setup the LetsEncrypt certificate because it is coded to use the production URL (netbox.wikimedia...
[09:15:26] commit-message-validator, Google-Code-in-2017, Patch-For-Review: Make Change-Id optional - https://phabricator.wikimedia.org/T179905#3882084 (jayvdb) Just do it.
[09:19:17] Continuous-Integration-Infrastructure, Operations, Traffic: Lower varnish caching length on doc.wikimedia.org - https://phabricator.wikimedia.org/T184255#3877424 (ema) Yes Apache should send the `Cache-Control` header for that purpose. Eg: `Cache-control: s-maxage=3600, must-revalidate, max-age=0`
[09:19:30] Continuous-Integration-Infrastructure, Operations, Traffic: Lower varnish caching length on doc.wikimedia.org - https://phabricator.wikimedia.org/T184255#3882097 (ema) p:Triage>Normal
[09:28:51] Continuous-Integration-Config, Release-Engineering-Team (Kanban), Operations, Patch-For-Review, Puppet: Get rid of "import realm.pp" in manifests/site.pp - https://phabricator.wikimedia.org/T154915#3882111 (hashar) Pending https://gerrit.wikimedia.org/r/#/c/333012/ to have puppet-syntax to f...
[10:29:43] Gerrit: @Eisenhaus335 probably needs some help over at Gerrit - https://phabricator.wikimedia.org/T183797#3882279 (Aklapper) stalled>declined Well, no reply by @Eisenhaus335, unfortunately.
[11:12:26] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3882348 (hashar)
[11:12:28] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3882346 (hashar) Open>...
[11:14:07] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3882351 (MoritzMuehlenhoff) C...
[11:19:05] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mediawiki07, deployment-imagescaler02, deployment-redis06, deployment-videoscaler01 due to prometheus exporter packages being missing in stretch - https://phabricator.wikimedia.org/T184239#3877128 (ArielGlenn) Welp, it...
[11:20:41] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-kafka-jumbo-[12] due to version of a package being missing - https://phabricator.wikimedia.org/T184240#3882395 (Paladox) @Krenair the change was merged now, should we close as resolved? :)
[11:54:57] Continuous-Integration-Config, MediaWiki-General-or-Unknown, Multi-Content-Revisions, User-Addshore: mediawiki tests fail in mediawiki code coverage test - https://phabricator.wikimedia.org/T183777#3863386 (Addshore) Hmmm, what is the difference in the setup for tests / phpunit between the regula...
[12:07:20] Release-Engineering-Team (Watching / External), Wikidata, Patch-For-Review, User-Addshore: Undeploy the Wikidata extension - https://phabricator.wikimedia.org/T181708#3882487 (Addshore) @demon Does it count as undeployed once removed from that script? :)
[12:07:37] Gerrit, Upstream: Gerrit should feature customizable message on Login page (No 'Forgot password' link in the gerrit login page.) - https://phabricator.wikimedia.org/T60205#3882489 (Paladox) I think we could potentially add support for this upstream if we follow how they do it for the error message. We s...
[12:28:59] Continuous-Integration-Config, BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#3882538 (Osnard) Thanks for the hint. I'll take care of it asap.
[12:29:03] PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:31:08] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:32:00] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:32:40] PROBLEM - Puppet errors on deployment-parsoid09 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:32:56] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:34:55] PROBLEM - Puppet errors on deployment-cpjobqueue is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:36:59] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:39:11] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:39:48] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:40:32] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:42:50] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:42:58] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:44:32] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:45:19] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:45:47] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:47:01] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:47:53] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:50:04] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:50:28] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:52:02] PROBLEM - Puppet errors on deployment-eventlog02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:52:42] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:55:29] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:57:35] PROBLEM - Puppet errors on deployment-mediawiki05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:58:50] PROBLEM - Puppet errors on deployment-zotero01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:14:07] (PS1) Robert Vogel: Add dependencies to BlueSpice* extensions [integration/config] - https://gerrit.wikimedia.org/r/402826
[13:15:31] (CR) jerkins-bot: [V: -1] Add dependencies to BlueSpice* extensions [integration/config] - https://gerrit.wikimedia.org/r/402826 (owner: Robert Vogel)
[13:42:50] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team, Wikimedia-log-errors: Move beta cluster ORES to its own machine - https://phabricator.wikimedia.org/T184282#3882712 (awight) It seems we have an ORES Redis node in deployment-prep, which is unnecessary. That role should be fulfilled by th...
[13:45:22] (PS2) Robert Vogel: Add dependencies to BlueSpice* extensions [integration/config] - https://gerrit.wikimedia.org/r/402826
[13:51:19] Continuous-Integration-Config, BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#3882725 (Osnard) @Umherirrender As a first step I've added the dependencies as suggested [1]. I also tried to add "ExtJSBase" as an depenency to "BlueSpiceFoun...
[14:19:24] Continuous-Integration-Config, Operations: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435#3882789 (Volans)
[14:19:33] Continuous-Integration-Config, Operations: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435#3882801 (Volans) p:Triage>Normal
[14:56:11] Release-Engineering-Team (Kanban), VisualEditor, User-Ryasmeen, User-zeljkofilipin: LanguageScreenshotBot trying to edit a non-existent page without signing in - https://phabricator.wikimedia.org/T162454#3882938 (zeljkofilipin) Open>Resolved This has been resolved a long time ago.
[15:10:10] (CR) Legoktm: [C: 2] Fix SpaceyParenthesisSniff comment detection for ignore statements [tools/codesniffer] - https://gerrit.wikimedia.org/r/402605 (owner: Umherirrender)
[15:11:29] (Merged) jenkins-bot: Fix SpaceyParenthesisSniff comment detection for ignore statements [tools/codesniffer] - https://gerrit.wikimedia.org/r/402605 (owner: Umherirrender)
[15:17:00] (CR) jenkins-bot: Fix SpaceyParenthesisSniff comment detection for ignore statements [tools/codesniffer] - https://gerrit.wikimedia.org/r/402605 (owner: Umherirrender)
[16:26:40] (PS3) WMDE-leszek: Only run npm job for changes in data-values/value-view [integration/config] - https://gerrit.wikimedia.org/r/383872 (https://phabricator.wikimedia.org/T178083)
[16:29:20] (CR) Thiemo Kreuz (WMDE): [C: 1] "This currently makes CI for the value-view component fail, see Iba30a82. There is a "composer validate" command executed, and it fails, bu" [integration/config] - https://gerrit.wikimedia.org/r/383872 (https://phabricator.wikimedia.org/T178083) (owner: WMDE-leszek)
[16:30:12] (CR) Addshore: [C: 2] Only run npm job for changes in data-values/value-view [integration/config] - https://gerrit.wikimedia.org/r/383872 (https://phabricator.wikimedia.org/T178083) (owner: WMDE-leszek)
[16:31:15] (Merged) jenkins-bot: Only run npm job for changes in data-values/value-view [integration/config] - https://gerrit.wikimedia.org/r/383872 (https://phabricator.wikimedia.org/T178083) (owner: WMDE-leszek)
[16:32:10] !log reloaded zuul for https://gerrit.wikimedia.org/r/#/c/383872/
[16:32:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:38:54] Continuous-Integration-Infrastructure, Wikidata Query UI, Jenkins, Patch-For-Review: wikidata/query/gui CI job lacks PhantomJS / proper browsers - https://phabricator.wikimedia.org/T183831#3883301 (Lucas_Werkmeister_WMDE) Is there anything left to do on this task or can we close it? I’m not sure...
[16:41:36] Beta-Cluster-Infrastructure, Analytics, Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3883319 (fdans)
[16:42:54] (PS1) WMDE-leszek: Use npm browser job for data-values/value-view [integration/config] - https://gerrit.wikimedia.org/r/402862 (https://phabricator.wikimedia.org/T178083)
[16:45:09] (CR) Addshore: [C: 2] Use npm browser job for data-values/value-view [integration/config] - https://gerrit.wikimedia.org/r/402862 (https://phabricator.wikimedia.org/T178083) (owner: WMDE-leszek)
[16:46:35] (CR) jerkins-bot: [V: -1] Use npm browser job for data-values/value-view [integration/config] - https://gerrit.wikimedia.org/r/402862 (https://phabricator.wikimedia.org/T178083) (owner: WMDE-leszek)
[16:46:35] Release-Engineering-Team (Kanban), Release Pipeline: Installation method for Minikube on CI for k8s testing - https://phabricator.wikimedia.org/T184457#3883381 (thcipriani)
[16:47:31] Release-Engineering-Team (Kanban), Release Pipeline: Verify functionality of the 'production' image in the context of an isolated k8s deployment - https://phabricator.wikimedia.org/T183165#3883395 (thcipriani)
[16:47:33] Release-Engineering-Team (Kanban), Release Pipeline: Installation method for Minikube on CI for k8s testing - https://phabricator.wikimedia.org/T184457#3883394 (thcipriani)
[16:47:51] Release-Engineering-Team (Kanban), Release Pipeline: Installation method for Minikube on CI for k8s testing - https://phabricator.wikimedia.org/T184457#3883381 (thcipriani) p:Triage>Normal
[16:55:39] Continuous-Integration-Config, BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811#3883435 (Umherirrender) >>! In T130811#3882725, @Osnard wrote: > @Umherirrender As a first step I've added the dependencies as suggested [1]. I also tried to a...
[16:56:54] (CR) WMDE-leszek: "Okay, so I wanted to be smart, but obviously am not." [integration/config] - https://gerrit.wikimedia.org/r/402862 (https://phabricator.wikimedia.org/T178083) (owner: WMDE-leszek)
[16:59:35] Release-Engineering-Team (Kanban), Phabricator: Test phabricator translations on phab.wmflabs.org - https://phabricator.wikimedia.org/T184459#3883457 (mmodell) p:Triage>Low
[17:11:13] Release-Engineering-Team (Kanban), User-greg, User-zeljkofilipin: Create #wikimedia-releng-feed and move bots there - https://phabricator.wikimedia.org/T181582#3794965 (mmodell) I used to rely on wikibugs for @mention notifications but now that we have notification popups in phabricator I don't rely...
[17:22:36] (PS1) Umherirrender: Remove direction from @param [tools/codesniffer] - https://gerrit.wikimedia.org/r/402871
[17:28:51] PROBLEM - Puppet errors on deployment-snapshot01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:40:46] MediaWiki-Codesniffer: Validate order of Doxygen annotations in documentation comments - https://phabricator.wikimedia.org/T175374#3883587 (Krinkle)
[17:40:52] MediaWiki-Codesniffer: Validate order of Doxygen annotations in documentation comments - https://phabricator.wikimedia.org/T175374#3591805 (Krinkle) p:Triage>Normal
[17:54:03] Continuous-Integration-Infrastructure (shipyard), Operations: npm 1.4.21 can't use a http proxy - https://phabricator.wikimedia.org/T183569#3883638 (hashar) a:Joe>None Resetting assignee, came from the parent task. Potentially we could rebuild the Jessie package `node-tunnel-agent` with patch h...
[17:54:11] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Operations: npm 1.4.21 can't use a http proxy - https://phabricator.wikimedia.org/T183569#3883640 (hashar)
[18:00:06] Release-Engineering-Team (Watching / External), Operations, Ops-Access-Requests, Patch-For-Review: Allow "releasers-mediawiki" sudo rights to manage Jenkins - https://phabricator.wikimedia.org/T183972#3883662 (RobH) Please note this was approved in the ops meeting (typo to fix in patchset). I'm...
[18:02:43] (PS1) Hashar: docker image for commited node_modules [integration/config] - https://gerrit.wikimedia.org/r/402876
[18:02:56] (CR) Hashar: [C: -2] docker image for commited node_modules [integration/config] - https://gerrit.wikimedia.org/r/402876 (owner: Hashar)
[18:08:42] (CR) Hashar: [C: -2] "That works for cxserver, but would not for services that requires extra packages." [integration/config] - https://gerrit.wikimedia.org/r/402876 (owner: Hashar)
[18:09:19] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3883712 (mmodell) @krenair: that should be fixed as soon as jenkins is finished building https://integration.wikimedia.org/ci/job/phabricator-jessie-commits/896/
[18:11:24] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3883718 (mmodell) @mobrovac: your problem should also be fixed, as of the aforementioned build. Going forward, we will be making some changes to the scap CI proc...
[18:16:03] hashar: hey, I'm not an admin in deployment-prep. Can I be? I thought I was but it seems either I was wrong or it's taken away
[18:17:36] I need it to clean up ores service nodes there
[18:23:19] thcipriani: ^
[18:23:58] Amir1: oh sure, lemme check what the deal is there. I thought you were an admin on that project, too.
[18:24:27] Release-Engineering-Team (Kanban), User-greg, User-zeljkofilipin: Create #wikimedia-releng-feed and move bots there - https://phabricator.wikimedia.org/T181582#3883768 (greg) Open>declined Verdict: After we cleared up the browser test spam (of tests we don't own nor respond to) we're a lot be...
[18:24:32] My guess is accidental removal i do it all the time with other things
[18:25:49] Amir1: you should be an admin now
[18:26:36] thcipriani: thank you very much
[18:27:08] I'm going to make the whole service down for half an hour probably, it has been down for weeks already
[18:27:13] so I guess it's okay
[18:27:49] sounds like it won't be disruptive :)
[18:29:25] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3883809 (mobrovac) Heh, as far as I can see, the build completed, but I'm still stuck with the same problem and the same Scap version on `deployment-tin` as befor...
[18:29:46] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3883810 (mmodell) dependent builds are stuck... I'm working on it
[18:32:32] !log doing https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Troubleshooting#Jenkins_executioner_lock to fix deployment-tin executioner lock stalling postmerge.
[18:32:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:38:22] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3874981 (greg) ``` 18:32:32 +thcipriani | !log doing https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Troubleshooting#Jenkins_executioner_...
[18:53:31] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<11.11%)
[18:54:27] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:56:58] PROBLEM - Puppet staleness on deployment-kafka03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[18:57:23] I'm sorry, I can't finish it today but I didn't bring down anything (yet) so it should be okay overnight, the only thing is that I added a new instance that might reduce the quota but will free a lot by deleting two tomorrow
[18:58:47] Amir1: ack
[19:07:03] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3883893 (Krenair) >>! In T184176#3883712, @mmodell wrote: > @krenair: that should be fixed as soon as jenkins is finished building https://integration.wikimedia.o...
[19:11:53] (CR) Hashar: "The -gate job failed because of:" [integration/config] - https://gerrit.wikimedia.org/r/402862 (https://phabricator.wikimedia.org/T178083) (owner: WMDE-leszek)
[19:18:31] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<44.44%)
[19:52:02] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:52:58] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[19:54:32] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:54:52] legoktm: It looks like https://phabricator.wikimedia.org/T179055 is resolved, or is that not yet committed to git?
[19:55:00] given it now shows up on doc.wm.o
[19:55:03] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[19:55:19] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:55:25] Krinkle: https://www.mediawiki.org/w/index.php?title=User:Legoktm&diff=prev&oldid=2681459 :)
[19:55:33] * greg-g was just going through my watchlist
[19:55:40] greg-g: Thanks
[19:55:49] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:57:41] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:57:49] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[19:58:48] Continuous-Integration-Infrastructure, RemexHtml, Patch-For-Review: Figure out how to speed up RemexHtml coverage runs - https://phabricator.wikimedia.org/T179055#3884111 (Krinkle) @Legoktm RemexHtml now shows up on . Was that a local change on Jenkins, or did it get...
[20:00:27] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:02:03] RECOVERY - Puppet errors on deployment-eventlog02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:02:35] RECOVERY - Puppet errors on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:02:39] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-eventlogging04 due to missing repo on deployment-tin? - https://phabricator.wikimedia.org/T184238#3877100 (mmodell) I replaced the DEPLOY_HEAD file by running `scap deploy` and then I ran into a different error, which I'm fixing, then this...
[20:05:32] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:06:06] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:07:58] RECOVERY - Puppet errors on deployment-mcs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:08:43] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[20:08:51] RECOVERY - Puppet errors on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:09:05] RECOVERY - Puppet errors on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:12:00] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:14:08] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:14:12] RECOVERY - Puppet errors on deployment-cassandra3-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:14:48] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[20:14:55] RECOVERY - Puppet errors on deployment-cpjobqueue is OK: OK: Less than 1.00% above the threshold [0.0]
[20:17:50] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:20:31] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:21:27] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:22:44] :)
[20:24:14] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-eventlogging04 due to missing repo on deployment-tin? - https://phabricator.wikimedia.org/T184238#3884195 (mmodell) Now I get an error because /var/lib/superset does not exist: ``` Notice: /Stage[main]/Superset/Exec[init_superset]/returns:...
[20:24:57] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3884199 (mmodell) That should be resolved now. Package upgraded properly on the nodes I've tested.
[20:29:58] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3884213 (mmodell) ``` twentyafterfour@deployment-tin:/srv/deployment/mathoid/deploy$ scap deploy 20:28:58 Started deploy [mathoid/deploy@c9957ce] (beta) 20:28:58...
[20:30:14] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: Scap not working in Beta - https://phabricator.wikimedia.org/T184176#3884215 (mmodell) Open>Resolved
[20:30:33] twentyafterfour: So the issue with ActiveAbstracts T184177 still isn't fixed in master, so I added it to wmf.16 blockers. Fix is easy: revert 2 master patches in branch.
[20:30:34] T184177: Abstract dumps broken by MW deploy - https://phabricator.wikimedia.org/T184177
[20:32:30] Maybe should be reverted in master but idk it violates PHPCS
[20:33:43] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[20:34:13] special guest appearance, tonight only!
[20:34:24] Welcome apergos! Was just telling twentyafterfour about ActiveAbstracts being busted
[20:34:30] yeah
[20:34:31] So need caution with wmf.16 again
[20:34:39] well it's still broken in master so
[20:34:49] we can't just grab master for wmf.16
[20:35:08] the joys of having a beta testbed for dumps :-P
[20:39:19] Were there changes to backup.inc in core?
[20:39:27] I'm wondering if that new format for --plugin is doing it
[20:39:46] what I remember was that classes were moved into their own file
[20:40:10] $dumper = new BackupDumper( [
[20:40:10] "--plugin=AbstractFilter:$IP/extensions/ActiveAbstract/AbstractFilter.php",
[20:40:10] "--current", "--output=file:" . $fname, "--filter=namespace:NS_MAIN",
[20:40:10] "--filter=noredirect", "--filter=abstract"
[20:40:10] ] );
[20:40:22] Like, how does it know where to find noredirect?
[20:40:26] (so the https://gerrit.wikimedia.org/r/#/c/398629/ noredirectfilter for example)
[20:40:27] and then
[20:40:49] Ah, $dumper->registerFilter( 'noredirect', 'NoredirectFilter' );
[20:40:59] https://gerrit.wikimedia.org/r/#/c/400397/1/AbstractFilter.php
[20:41:02] and there we have it
[20:42:01] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:14] huh
[20:42:27] you know, I didn't even know that there was a bot reporting beta errors in here
[20:42:29] "Autoloading is not available if using PHP in CLI interactive mode."
[20:42:32] That sounds fun ^
[20:42:34] THANKS
[20:42:37] thanks a whole lot
[20:42:38] (but red herring, it's interactive mode)
[20:42:41] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:42:45] ah right
[20:42:47] whew
[20:44:34] I'm curious though if it has to do with dynamic class instantiation.
[20:44:51] Like, "new $foobar" has issues with autoloading?
[20:44:57] I'm finding some weird stuff re: namespacing there
[20:45:14] it's possible
[20:47:47] Using ::class might help
[20:48:37] Like NoredirectFilter::class would trigger autoloader.
[20:48:46] (and protect against namespace refactoring)
[20:48:49] We should test this
[20:49:07] well, I have a dumps command ready to go on snapshot01 in beta
[20:49:22] so all that needs to happen is for that change to get onto php-master on snapshot01
[20:49:40] just let's bear in mind that puppet will write over it when it runs (sync)
[20:50:05] Actually, that....might work?
[20:50:37] well
[20:50:45] you should have access to the instance I think
[20:50:47] Wait.
[20:50:53] RedirectFilter should have a register() method
[20:50:55] do you want to edit the file in place and poke me to run?
[20:50:57] Like GoogleCoopFilter [20:51:00] And AbstractFilter [20:51:02] Should've been moved [20:51:04] With the refactor [20:51:11] moved... where? [20:51:19] When it was moved to its own class [20:51:21] One sec. [20:51:26] k [20:53:03] oh I see, literally the code was just moved to a separe file and that was it [20:53:05] huh [20:54:32] yeah abtractfilter tries to register the noredirect one, it's still left in there [20:57:33] https://gerrit.wikimedia.org/r/#/c/402911/ [20:57:51] Untested, but that's my *thinking* right now [20:58:27] Basically, AbstractFilter registered two filters, but each should register its own? If not: that seems like a violation of isolating your concerns. [21:01:26] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [21:03:00] Can we test master + that change? [21:03:13] self::CLASS, very fancy [21:03:17] yeah why not [21:03:56] let's see if I am permitted to cherry-pick that into /srv/mediawiki on snapshot01 [21:05:45] no because it's not a git repo at that end [21:05:48] grrrrr [21:06:41] PROBLEM - Puppet errors on deployment-ores01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:08:21] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:11:23] Unrecognized filter type 'noredirect' [21:11:44] so that's different than the last time which was Fatal error: Class 'NoredirectFilter' not found in /srv/mediawiki/php-1.31.0-wmf.15/maintenance/backup.inc on line 212 [21:13:23] Ah ok. 
Soooooo [21:13:34] So, we should keep it in ActiveAbstract [21:13:40] But swapping for ::CLASS should fix it [21:13:41] Lemme amend [21:15:31] 10Beta-Cluster-Infrastructure, 10Services, 10Puppet: Puppet disabled for a month on deployment-restbase instances - https://phabricator.wikimedia.org/T184477#3884325 (10Krenair) [21:16:05] 10Beta-Cluster-Infrastructure, 10Puppet, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3884337 (10Krenair) [21:16:08] 10Beta-Cluster-Infrastructure, 10Services, 10Puppet: Puppet disabled for a month on deployment-restbase instances - https://phabricator.wikimedia.org/T184477#3884336 (10Krenair) [21:16:09] apergos: If this works, I'm inclined to bring up a larger discussion of "Dynamic class construction considered harmful" [21:16:11] :( [21:16:26] PS2 posted [21:16:28] 10Beta-Cluster-Infrastructure, 10Services, 10Puppet: Puppet disabled for a month on deployment-restbase0[12] instances - https://phabricator.wikimedia.org/T184477#3884325 (10Krenair) [21:17:20] Well, that we should be using Foo::CLASS to opportunistically trigger the autoloader earlier, rather than waiting for "new $foo" [21:18:25] [ae5194061953f8b0450a75d9] [no req] Error from line 212 of /srv/mediawiki/php-master/maintenance/backup.inc: Class 'NoredirectFilter' not found [21:18:39] let's not get too far ahead of ourselves :-P [21:20:18] any thoughts, no_justification? 
[21:21:06] 10Beta-Cluster-Infrastructure, 10ORES, 10Scoring-platform-team, 10Puppet: Puppet broken on deployment-ores01 due to missing hieradata - https://phabricator.wikimedia.org/T184478#3884352 (10Krenair) p:05Triage>03Normal [21:21:28] 10Beta-Cluster-Infrastructure, 10ORES, 10Scoring-platform-team, 10Puppet: Puppet broken on deployment-ores01 due to missing hieradata - https://phabricator.wikimedia.org/T184478#3884365 (10Krenair) It actually looks like no one but me has logged onto this thing [21:23:13] Hmmmmm [21:23:30] also the patch author left you a message on the new changeset [21:23:40] so at least is available to discuss [21:24:05] He's usually in #-operations [21:24:14] Ah, isn't rn [21:24:16] Must be offline [21:24:31] well I mean just left you a message now [21:24:32] so [21:24:37] :) [21:24:55] this is real stinker [21:25:14] brion was right to get out of the dump business :-P [21:25:17] (who wrote those) [21:25:24] *is a real [21:25:40] heh [21:26:50] as the great bugs bunny said, "ain't i a stinker" [21:26:55] we're fighitng the "each class in it's own file, but plugins! but registration. but blerg" in activeabstracts, if you want to catch up :-P [21:26:59] heh [21:27:05] apergos: Oh, sorta related cuz I was futzing with this area of the code.... https://gerrit.wikimedia.org/r/#/c/402936/ [21:27:05] got another thing for you while you're here apergos. assuming you have time - if not I can just make a task [21:27:07] you know I can totally hear that line in his voice too [21:27:11] (heh, super safe code moves!) [21:27:13] (ironic [21:27:22] Krenair: let's hear it, it might be a task anyways (11:30 pm) [21:27:24] but shoot [21:27:44] apergos, puppet on deployment-snapshot01 [21:27:54] brion: Best I can tell, $foo = 'Bar'; new $foo() has Issues(tm) triggering the autoloader? 
[21:28:01] Maybe slightly more specific to CLI mode [21:28:04] error due to php-wikidiff2 package having unmet dependencies [21:28:09] no_justification: is that part of https://phabricator.wikimedia.org/T182814 ? [21:28:09] hmmmmm, it *ought* to work [21:28:17] which I pinged on and got 0 reply btw [21:28:21] Krenair is that host stretch? [21:28:25] does class_exists( $foo ) still trigger autoload? [21:28:32] paladox, yep [21:28:40] That package dosen't work for me either [21:28:41] brion: It should. I used to have a method in Autoloader to force it in fact [21:28:42] Krenair: puppet on deployment-snap01 is busted til we have a wikidiff package in stretch backports [21:28:46] ok [21:28:47] i think it's due to it needing php5. [21:28:51] is there a task for that apergos ? [21:28:52] So wrapping things in class_exists() should force it. [21:28:57] and we'll have that as soon as moritz digs himself out from all the security updates to add it to the repo [21:29:02] :) [21:29:11] I mean, puppet does run, it just whines about that one thing [21:29:32] yes there's a task, lemme find [21:29:36] true but I don't like puppet to have any errors [21:29:38] php-wikidiff2 : Depends: phpapi- but it is not installable [21:29:54] usually means something is broken in several ways [21:30:21] Krenair: I know hashar has some history here re: wikidiff2 packages. I think we faced something similar in CI instances [21:30:25] Wait, nvm. [21:30:35] (sorry for ping, I was confusing it with libtidy) [21:30:40] np [21:30:45] * no_justification should focus on one thing at a time [21:30:46] https://phabricator.wikimedia.org/T184270 Krenair [21:30:55] Looking around, i see a very old package in ubuntu [21:31:03] lego ktm built the package already [21:31:08] it's just waiting to go to the repo [21:31:09] none in debian. the ubuntu one looks to be virtual and provides php5-common. 
[21:31:19] oh [21:31:40] and once it does, depending on where it is, I might need to update my patch to pin backports for that package too, but that's 5 minutes [21:31:55] is this in backup.inc's "filter" loading that it's failing? [21:32:08] thanks apergos [21:32:10] 10Beta-Cluster-Infrastructure, 10Puppet, 10Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3884405 (10Krenair) -snapshot01 is T184270 (package it wants is missing from stretch, moritz to fix when higher priority things are done) [21:32:13] yw [21:32:16] brion: uh [21:33:07] [8aa441d5af3d953d776f828f] [no req] Error from line 212 of /srv/mediawiki/php-master/maintenance/backup.inc: Class 'NoredirectFilter' not found [21:33:13] it's there exactly [21:33:20] for the currrent incarnation of patch [21:33:22] yep that'll do ya [21:33:37] yeah either something's weird about the data going in or it's not groking the autoloader [21:33:56] no, for master [21:34:06] because puppet just synced over what was there, heh [21:34:45] and NoredirectFilter comes from another ext? or local code outside tree? [21:34:48] but just before puppet overwrote it, the same error, ie with chad's latest patch [21:34:52] heh [21:34:56] same extension [21:35:02] ok lemme look [21:35:04] just been moved to is own file in the extension because [21:35:14] that's the way we do now for php style [21:36:00] Krenair: thanks for checking on these hosts, I was amazed earlier when I saw those tasks [21:36:52] about the amount of problems? [21:37:42] no, about how thorough you are about hunting them all down and following through [21:37:54] oh [21:37:58] tbh I expect at any time that many instances will have puppet issues for one reason or another [21:38:08] huh. 
it looks ok enough in isolation [21:38:14] yeah it's like a little project of it's own [21:38:20] Krenair is certainly thorough :) [21:39:02] it's goo, it means I was able to see a bunch of other instances all with the stretch experimental issue, for example [21:39:27] *good [21:40:14] is there a way to see puppet output in shinken or does it just log which hosts had errors? [21:40:44] twentyafterfour: just log stuff [21:40:58] I think shinken only sees the metrics that those things dump [21:41:01] ssh someinstance sudo tail -n 200 /var/log/puppet.log [21:41:05] not sure though [21:41:09] I used to know [21:41:16] or maybe you can look them up on the puppet master [21:41:30] you can always go to deployment-cumin and "sudo cumin '*' 'puppet agent -tv'" [21:41:32] ok i can repro the bug, lemme . .. hmmmm [21:41:37] if there's a cumin that works on beta you could do that [21:41:43] exactly! [21:41:45] i'm betting the autoloader from the extension.json is not being loaded by the plugin load code [21:41:50] deployment-cumin.deployment-prep.eqiad.wmflabs ! [21:42:04] I got most of them that way (or with salt in times past) [21:42:09] yeah cumin is awesome [21:42:17] still missed a few though, not sure how. possibly pebkac [21:43:23] "pebkac happens" [21:43:28] :) [21:43:57] 10Release-Engineering-Team (Watching / External), 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Allow "releasers-mediawiki" sudo rights to manage Jenkins - https://phabricator.wikimedia.org/T183972#3884457 (10RobH) 05Open>03Resolved a:03RobH merged live [21:46:19] I believe that our cumin setup allows not only deployment-cumin but also some labs-wide system for cloud roots to use [21:46:22] 10Beta-Cluster-Infrastructure, 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring, 10User-fgiunchedi: Move deployment-prep redis instances to stretch - https://phabricator.wikimedia.org/T179371#3722645 (10mmodell) @fgiunchedi sounds good to me! 
Puppet is now broken on the old redis nodes,... [21:46:31] and is less horrible than salt in general to work with [21:47:04] on the other hand, it forces you to go through ssh and co. - which is usually fine, though I used to use it to fix up instances that were very broken to the point of people no longer being able to ssh in [21:47:13] used to use salt* [21:48:07] yeah I had that too [21:48:15] ssh for these, salt for those etc [21:48:27] and a little set of tools to tell me which instances were going to need what fix [21:49:01] people think it's not often that ssh is broken on labs instances [21:49:09] I could tell you stories... [21:50:11] Could someone take a look at this please https://phabricator.wikimedia.org/T182941 its a gerrit repo admin thing all i need is labs/icinga2's group on gerrit to be owned by itself please :) [21:55:09] apergos: also, the puppet-agent send their log to syslog and rsyslog send them to logstash [21:55:43] apergos: so you can get all puppet-agent warnings/errors from there: https://logstash-beta.wmflabs.org/goto/f0c811faad068eceebb7d415da21deb9 [21:55:53] I hadn't checked logstash for those [21:55:57] not bad at all [21:56:04] Krenair: do you use that? ^^ [21:57:00] (we also have lost all dashboards bah :( ) [21:58:01] yes I saw I couldn't get dashboards but you can dig around and construct a decent search string [21:58:18] so how do dashboards get lost, while I'm hangin out in here? 
[21:58:38] no idea [21:58:49] probably the elasticsearch database has disappeared [22:01:05] ouch [22:02:21] apergos, IIRC, every time I've tried to interact with logstash I have gotten discouraged by the UI and given up eventually [22:02:32] that would've been with prod logstash a year ago [22:03:07] I'm usually interested in events on one host so I just go there [22:03:23] but today for example I wanted to see if there were other things in beta with memcache issues [22:03:27] and indeed there were [22:03:48] plus that stuff isn't logged anywhere else in beta, I'm literally forced to go there to see what's going on [22:04:44] did they get rid of our fluorine equivalent? [22:06:49] apergos: i gotta run to an appointment; if that backup plugin is still biting you later i'll see if i can narrow down what's happening [22:07:49] thanks brion for looking at it [22:08:04] Krenair: I didn't know we have one! [22:08:27] IIRC I did the jessie migration for it [22:08:33] great [22:08:36] there it is, deployment-fluorine02 [22:08:41] that would have saved me some grief earlier toda [22:08:42] y [22:08:45] and by jessie migration I mean from like trusty or something [22:08:53] not away from jessie [22:08:56] right [22:09:54] 10Gerrit: Migrate to NoteDb - https://phabricator.wikimedia.org/T174034#3884521 (10Paladox) Upstream's time table puts reviewdb discontinuation at sometime this year. When we do the upgrade to 2.15 it wont migrate everything. groups will be migrated when we do the upgrade to 2.16 / 3.0. Comments are now stored... [22:10:51] this was a long time ago [22:11:47] req-9330ff6a-7798-47dc-8dfa-d79c61cfe1f4 Create 18 Aug 2016, 7:02 p.m. 
krenair [22:13:03] I logged onto a couple instances today, they'd been up for over 500 days [22:13:15] amazing [22:25:32] apergos, doesn't look like any deployment-prep ones are over 201 days [22:27:44] maybe I hallucinated it [22:27:51] and it was only hours or minutes :-D [22:28:05] or it wasn't deployment-prep [22:28:11] maybe it was some prod host [22:28:39] I did hallucinate it [22:30:11] it was 201 days, and then I looked at the run time of the processes in question and completely conflated that in my mind when "recalling" the info later [22:30:13] stupid brain [22:32:58] :D [22:34:02] deployment-dumps-puppetmaster.deployment-prep.eqiad.wmflabs [22:34:03] hmmm [22:34:21] it is a short term hack :) [22:35:13] it'll fit right in then [22:36:01] that puppetmaster exists so I can cherry pick my php7 patches onto it [22:36:06] and run them only on my instance [22:36:18] double hack really: puppet::self doesn't work on stretch either [22:36:19] ah, stuff that is unsafe for other instances to be doing? [22:36:23] pretty sure you wouldn't appreciate them being applied to all stretch instances [22:36:29] correct! [22:36:42] dumps is the very first thing to use stretch/php7, even on beta [22:36:49] we are pioneers [22:37:27] bd808, we still using puppet::self? [22:37:36] no, it's uh [22:37:40] I thought that died a horrible, painful death over a year ago? [22:37:55] role::puppetmaster::standalone [22:37:56] that [22:38:10] well that's recommended, I don' know if anyone *uses* the old bad one [22:38:13] aha right [22:38:17] yeah. 
my brain just thinks of that as puppet::self still I guess ;) [22:38:21] :D [22:39:09] I think a.ndrew's last audit found 4-5 puppet::self hosts still lingering [22:41:08] ewww [22:41:13] but maybe by now they are gone [22:41:19] oh so while you are all here and chatty [22:41:36] I have https://gerrit.wikimedia.org/r/#/c/402803/ [22:42:06] its goal is to allow folks to include that stuff into the appropriate role for their instance [22:42:15] instead of you have to have two roles on anything with mediawiki [22:42:29] plus it reduces puppet style violations a bit [22:42:40] anyone dare to look and approve? :-P [22:43:30] I don't think I was around when profiles became the norm and don't really understand them [22:44:12] this does appear to be doing basically the same thing as before? [22:44:28] profiles can be included in roles [22:44:31] one role per node [22:44:43] profiles may include other profiles though it's not awesome [22:44:50] they must declare classes (no include) [22:45:31] profiles can't be includeed in module classes, nor can roles be included in profiles etc [22:45:44] so the idea is a role is a collection of profiles that go on a node [22:45:57] the profiles, if they have params, read them out of hiera [22:46:01] hiera is used nowhere else [22:46:08] apergos, I was just nosing around your -dumps-puppetmaster and noticed a cherry-pick of https://gerrit.wikimedia.org/r/#/c/372764/ - are you able to review that? [22:46:22] then the module manifests are all the little bits and stuff that get grabbed for different profiles [22:46:28] apergos, I read something about nodes now only being allowed to have one role somewhere [22:46:39] yes, one role per node. 
period [22:46:45] a role is 'here is what this machine does' [22:46:49] not 'it does 5 things' [22:46:54] that would be 5 profiles :-p [22:47:17] so my first one of the cherry picks I needed (there's three plus the php7 one) [22:47:22] some of our instances have multiple roles from hiera [22:47:24] Krenair: and https://wikitech.wikimedia.org/wiki/Puppet_coding#Organization is the long read [22:47:25] that I want to shepherd through is actually [22:47:28] are they going to break? [22:47:42] no, we'll jut nag people and nag them til they fix their instances to comply [22:47:46] uh [22:47:56] anyways the one I want to get through first of those changesets is uh [22:48:10] also joe wrote a puppet-lint plugin that enforce some of those rules [22:48:41] and a jenkins job runs it continuously on https://integration.wikimedia.org/ci/job/operations-puppet-wmf-style-guide/ with a graph and dashboard report [22:48:46] https://gerrit.wikimedia.org/r/#/c/361796/ Krenair [22:48:54] but I have not had time yet [22:49:13] hm, December 2016 [22:49:23] I have: "things broken in beta", "things it would be nice to do for beta" and "things I have to have work for my php7 migration stuff" [22:49:37] I try to get some of 1 and 3 done [22:49:44] 2 is a little farther away just yet [22:49:47] I understand :) [22:50:23] getting the gid fixup would be nice, my dumps repo for example has been there since 2016, from trebuchet [22:50:35] and there may well be others on tin, same story [22:50:56] hm 1 am [22:51:10] I think I've given up on work for today, can still chat/answer qs but [22:51:21] looking at code... eh [22:51:29] get some sleep :) [22:51:44] nah, want a bit of chocolate and wind down [22:51:50] probably sleep in an hour though :-) [22:57:49] 10Release-Engineering-Team (Kanban), 10Scap, 10Wikimedia-Incident: Investigate deployment that caused high error-rate but wasn't prevented by Scap - https://phabricator.wikimedia.org/T183952#3884718 (10thcipriani) >>! 
In T183952#3878049, @zeljkofilipin wrote: > Scap did fail during deployment. Since the comm... [22:58:20] ok, have a good night apergos :) [22:59:07] thanks, see yas! (also checking out of here, channel limit issues with my client) [23:03:37] kind of wish that ops people would review any puppet patches they cherry-pick themselves, there's a reason they have to cherry-pick them instead of them just already being there :/ [23:04:29] 10Release-Engineering-Team (Kanban), 10Scap, 10Wikimedia-Incident: Investigate deployment that caused high error-rate and was prevented from going past canaries by Scap - https://phabricator.wikimedia.org/T183952#3884732 (10thcipriani) [23:07:10] 10Release-Engineering-Team (Kanban), 10Scap, 10Wikimedia-Incident: Investigate deployment that caused high error-rate and was prevented from going past canaries by Scap - https://phabricator.wikimedia.org/T183952#3884737 (10thcipriani) 05Open>03Resolved a:03thcipriani Dug a little deeper on this today... [23:24:51] 10Continuous-Integration-Infrastructure, 10RemexHtml, 10Patch-For-Review: Figure out how to speed up RemexHtml coverage runs - https://phabricator.wikimedia.org/T179055#3884760 (10Legoktm) Sorry, I deployed a change via jjb to test it and never followed-up. I was trying to figure out the difference in the re... [23:51:34] 10Continuous-Integration-Config, 10MediaWiki-General-or-Unknown, 10Multi-Content-Revisions, 10User-Addshore: mediawiki tests fail in mediawiki code coverage test - https://phabricator.wikimedia.org/T183777#3884804 (10Legoktm) SQLite instead of MySQL, PHP 5.6 instead of PHP 5.5/HHVM, runs on a permanent sla...