[01:01:22] ebernhardson, know anything about deployment-elastic08?
[01:01:34] apparently it was in puppet but now:
[01:01:42] ssh: Could not resolve hostname deployment-elastic08.deployment-prep.eqiad.wmflabs: Name or service not known
[01:02:32] gehel, ^
[01:13:02] Krenair wasn't it deleted today
[01:13:16] there was an icinga notification saying it was down earlier
[01:13:37] yep
[01:57:48] hmm, that long? according to mediawiki-config deployment-elastic08 was removed from LabsServices in Oct 2016
[01:58:31] the ticket seems pretty clear it was turned off though. must be something recently spun up?
[02:17:51] Release-Engineering-Team (Watching / External), MediaWiki-Database, Performance-Team, Wikimedia-log-errors: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. - https://phabricator.wikimedia.org/T194403 (Krinkle) Moving to "Production impact". This is...
[06:14:25] PROBLEM - Puppet errors on deployment-deploy01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[06:39:23] RECOVERY - Puppet errors on deployment-deploy01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:59:29] Krenair: it was deleted, I'm experimenting with upgrading elasticsearch to stretch. See T193649
[06:59:30] T193649: migrate elasticsearch to stretch (from jessie) - https://phabricator.wikimedia.org/T193649
[06:59:53] I'm going to do a few iterations with it.
[07:39:21] gehel, okay please remember to run puppet node deactivate and puppet node clean for it
[07:39:23] https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Decommission_an_instance
[07:40:01] Krenair: will do
[07:40:45] isn't there a hook in instance deletion that could automate that? It seems quite error prone to rely on users to remember it...
[07:41:37] you'd have to have the instance deletion stuff figure out what the puppetmaster for the instance was
[07:41:42] and be able to SSH into it
[07:42:30] trickier than you may think considering the sheer number of places puppet data can come from, actually probably not possible to do 100%
[07:43:33] maybe just a warning to the user from the UI?
[07:44:00] I might be the only one who missed that step, so it might not make sense at all to do anything.
[07:45:16] horizon is upstream so that might be non-trivial, but maybe doable
[07:45:27] you're not the only one
[08:43:28] Release-Engineering-Team (Watching / External), MediaWiki-Database, Performance-Team, Wikimedia-log-errors: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. - https://phabricator.wikimedia.org/T194403 (jcrespo) @Krinkle may I ask to update the descri...
[09:07:21] Release-Engineering-Team (Watching / External), MediaWiki-Database, Performance-Team, Wikimedia-log-errors: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. - https://phabricator.wikimedia.org/T194403 (aaron) Given how low server_failure_limit is, it...
[11:21:47] Phabricator, User-Zppix: Automatically add Volunteer badge to users on Phabricator - https://phabricator.wikimedia.org/T164736 (TerraCodes)
[11:38:25] Scap, DBA, Operations: "sql wikishared" doesn't work on mwmaint1001 - https://phabricator.wikimedia.org/T199316 (Amire80)
[11:46:59] Scap, MediaWiki-Platform-Team: "sql wikishared" doesn't work on mwmaint1001 - https://phabricator.wikimedia.org/T199316 (jcrespo)
[12:06:19] Phabricator, Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (Daimona) Agreed that this is truly annoying. Took me a lot of time to file a report due to "concurrent connections".
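
The decommission step discussed above (puppet node deactivate, then puppet node clean) could be wrapped so it is harder to forget. The following is only a minimal sketch: the helper name `decommission_commands` is hypothetical, and whether the commands need sudo or extra options on the deployment-prep puppetmaster is an assumption not covered in the chat; the linked wikitech page remains the authoritative procedure. The sketch only builds the command lines and does not execute them.

```python
import shlex


def decommission_commands(fqdn: str) -> list[list[str]]:
    """Return the two puppet commands named in the chat, in the order given.

    Hypothetical helper: it only builds argument lists; running them on the
    right puppetmaster (and with the right privileges) is left to the operator.
    """
    if not fqdn or any(ch.isspace() for ch in fqdn):
        raise ValueError(f"not a usable host name: {fqdn!r}")
    return [
        ["puppet", "node", "deactivate", fqdn],
        ["puppet", "node", "clean", fqdn],
    ]


if __name__ == "__main__":
    # Host name taken from the log; printed for review rather than executed.
    host = "deployment-elastic08.deployment-prep.eqiad.wmflabs"
    for cmd in decommission_commands(host):
        print(shlex.join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere; as noted in the chat, finding the correct puppetmaster for an instance is the hard part and is not addressed here.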
[12:06:32] Scap, MediaWiki-Platform-Team: "sql wikishared" doesn't work on mwmaint1001 - https://phabricator.wikimedia.org/T199316 (Anomie) The new way to access that be something like `sql wikishared --cluster=extension1`. The old `sql` script had [[https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet...
[12:44:07] Release-Engineering-Team (Watching / External), MediaWiki-Database, Performance-Team, MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), and 2 others: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index... - https://phabricator.wikimedia.org/T194403
[12:55:47] PROBLEM - Puppet errors on deployment-elastic09 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:08:48] (CR) Alexandros Kosiaris: [C: -1] Perform helm deployment in service-pipeline (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) (owner: Dduvall)
[13:20:22] Continuous-Integration-Config, Fundraising-Backlog, MediaWiki-extensions-DonationInterface: Fundraising should fall back to non master - https://phabricator.wikimedia.org/T199130 (hashar)
[13:35:24] !log deployment-prep: launched new instance deployment-maps04 for maps testing on stretch
[13:35:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:45:12] Release-Engineering-Team (Kanban), Scap, Services (blocked): Scap3 doesn't support non us-ascii characters in config templates for service deployments - https://phabricator.wikimedia.org/T198621 (fgiunchedi)
[13:45:15] Release-Engineering-Team (Kanban), Scap, Operations, Patch-For-Review, Services (blocked): Update Debian Package for Scap3 to 3.8.4-1 - https://phabricator.wikimedia.org/T199283 (fgiunchedi) Open>Resolved a:fgiunchedi This is completed
[13:46:30] Release-Engineering-Team (Kanban), Scap, Services (blocked): Scap3 doesn't support non us-ascii characters in config templates for service deployments - https://phabricator.wikimedia.org/T198621 (thcipriani) Open>Resolved a:thcipriani D1077 should have resolved this problem and it is now...
[13:54:03] Continuous-Integration-Config, Fundraising-Backlog, MediaWiki-extensions-DonationInterface: Fundraising should fall back to non master - https://phabricator.wikimedia.org/T199130 (hashar) And eventually I figured out that the issue is when sending a patch to mediawiki/core branch `fundraising/REL1_27...
[13:56:20] (PS1) Hashar: Skip Quibble for fundraising/REL1_27 branch [integration/config] - https://gerrit.wikimedia.org/r/445169
[13:58:59] Release-Engineering-Team (Watching / External), GitHub-Mirrors, Security-Team: People who need to enable 2FA or risk losing access - https://phabricator.wikimedia.org/T199328 (Reedy)
[13:59:25] Release-Engineering-Team (Watching / External), GitHub-Mirrors, Security-Team: People who need to enable 2FA or risk losing access - https://phabricator.wikimedia.org/T199328 (Reedy)
[14:05:46] (PS1) Thcipriani: deploy-promote: fix clean repo assertion [tools/release] - https://gerrit.wikimedia.org/r/445170
[14:05:48] Release-Engineering-Team (Watching / External), GitHub-Mirrors, Security-Team: People who need to enable 2FA or risk losing access - https://phabricator.wikimedia.org/T199328 (Reedy)
[14:06:13] Project-Admins: Create the Wikidata soweego project - https://phabricator.wikimedia.org/T199330 (Hjfocs)
[14:12:06] (CR) Zfilipin: [C: 1] deploy-promote: fix clean repo assertion [tools/release] - https://gerrit.wikimedia.org/r/445170 (owner: Thcipriani)
[14:13:47] hello everybody
[14:14:03] (PS2) Hashar: Use DonationInterface job for mw/core fundraising branch [integration/config] - https://gerrit.wikimedia.org/r/445169 (https://phabricator.wikimedia.org/T199130)
[14:14:08] I'd need to add the analytics_deploy private key to the keyholder on deployment-tin
[14:14:14] Continuous-Integration-Config, Release-Engineering-Team (Kanban), Fundraising-Backlog, MediaWiki-extensions-DonationInterface, Patch-For-Review: Fundraising should fall back to non master - https://phabricator.wikimedia.org/T199130 (hashar) a:hashar
[14:14:35] what is the procedure?
[14:15:09] elukey: https://wikitech.wikimedia.org/wiki/Keyholder lists where the keys should be put on beta
[14:15:23] namely that is in a private git repo on the deployment-prep puppetmaster
[14:15:45] which is deployment-puppetmaster03.deployment-prep.eqiad.wmflabs
[14:15:59] then run puppet
[14:16:12] and /usr/local/sbin/keyholder arm
[14:16:23] fill the passphrase and it should work
[14:17:39] thcipriani: hi! Do you still have interest in ejegg / donation interface CI config ?
[14:17:46] iirc you got involved at some point
[14:17:57] hashar: ahh ok I missed the beta part in there, thanks! So I just generate a keypair, add it on puppetmaster03 and then do the rest?
[14:18:17] Release-Engineering-Team (Kanban), Scap, Services (blocked): Scap3 doesn't support non us-ascii characters in config templates for service deployments - https://phabricator.wikimedia.org/T198621 (Pchelolo) Thank you. Just made a deployment and verified it's working correctly with non-ascii configs
[14:18:29] elukey: yeah seems so
[14:18:49] elukey: on deployment-puppetmaster03 there should be a few local commits in /var/lib/git/labs/private/
[14:23:55] (CR) Hashar: "That should fix it for mediawiki/core . I haven't looked at what would happen with other repositories being used in that job." [integration/config] - https://gerrit.wikimedia.org/r/445169 (https://phabricator.wikimedia.org/T199130) (owner: Hashar)
[14:25:13] Phabricator, VisualEditor, User-Ryasmeen: 404 on VisualEditor workboard (due to custom filter applied which did not exist in database) - https://phabricator.wikimedia.org/T199207 (Deskana) Great. Thanks everyone!
[14:53:08] (CR) Hashar: [C: 2] deploy-promote: fix clean repo assertion [tools/release] - https://gerrit.wikimedia.org/r/445170 (owner: Thcipriani)
[14:53:49] (Merged) jenkins-bot: deploy-promote: fix clean repo assertion [tools/release] - https://gerrit.wikimedia.org/r/445170 (owner: Thcipriani)
[15:05:57] hashar: I subclassed some jenkins job with cwd a couple years ago to make sure zuul-cloner didn't fall back to master, IIRC, but it's been a while
[15:08:08] * cwd vaguely recalls
[15:15:29] (CR) Hashar: [C: 2] "Lets see how it goes" [integration/config] - https://gerrit.wikimedia.org/r/445169 (https://phabricator.wikimedia.org/T199130) (owner: Hashar)
[15:17:42] (Merged) jenkins-bot: Use DonationInterface job for mw/core fundraising branch [integration/config] - https://gerrit.wikimedia.org/r/445169 (https://phabricator.wikimedia.org/T199130) (owner: Hashar)
[15:21:23] thanks hashar!
[15:28:14] ejegg: Reedy yeah I think that is fixed for mediawiki/core fundraising/REL1_27
[15:28:52] mediawiki/vendor probably has the issue as well
[15:31:22] hashar mediawiki/vendor actually DOES have a fundraising/REL1_27 branch
[15:31:29] so that one was fine
[15:31:33] \o/
[16:09:38] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:34:30] Release-Engineering-Team (Kanban), Scap, User-zeljkofilipin: Running smoke tests during deployment - https://phabricator.wikimedia.org/T187733 (thcipriani) p:Triage>Low
[17:20:59] Continuous-Integration-Config, Continuous-Integration-Infrastructure (shipyard), BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811 (hashar)
[17:45:44] PROBLEM - SSH on integration-slave-docker-1016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:50:37] RECOVERY - SSH on integration-slave-docker-1016 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[17:56:55] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MediaWiki-User-preferences, User-zeljkofilipin, Wikimedia-log-errors (Shared Build Failure): Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (Mooeypoo) p:High>Unbreak!...
[18:02:22] PROBLEM - Puppet errors on integration-slave-docker-1016 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:07:56] Phabricator, Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (chasemp) I have been hit too :)
[18:09:44] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MediaWiki-User-preferences, User-zeljkofilipin, Wikimedia-log-errors (Shared Build Failure): Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (hashar) My theory is that the `Rec...
[18:11:04] chasemp: there's a ops/puppet change by twentyafterfour that doubles the limits, but thus far nobody has reviewed/merged it yet
[18:17:33] Krenair: hi - I am getting issues with Beta es.wikibooks certificate
[18:17:40] 'not safe', etc.
[18:17:49] seems it's issued for aa.* instead
[18:32:20] RECOVERY - Puppet errors on integration-slave-docker-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:26] it's now official gwtui is deprecated per release notes https://gerrit-review.googlesource.com/c/homepage/+/188050
[18:39:53] Phabricator, Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (Paladox) >>! In T198974#4416875, @dbarratt wrote: >>>! In T198974#4415162, @jcrespo wrote: >> I belive I think what is the source of the issues, when you write a comment, like this,...
[18:40:57] (CR) Hashar: "Acknowledging following scrum of scrum. There are a few issues I noticed will look at it." (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/442126 (https://phabricator.wikimedia.org/T177896) (owner: Mholloway)
[18:52:18] (CR) Thcipriani: "I think I thought of a way this could work, see inline comments." (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/442126 (https://phabricator.wikimedia.org/T177896) (owner: Mholloway)
[18:56:43] greg-g: twentyafterfour you want me to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/444810 now?
[19:22:46] chasemp: yes please
[19:23:08] chasemp: or https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445145/
[19:23:26] twentyafterfour: I'm hoping for a +1 from greg or aklapper since I'm not in the loop on responding if it goes bad
[19:24:14] or someone else in the know
[19:24:19] it's been causing a lot of trouble and signups require approval so the throttle isn't much benefit right now
[19:24:54] ok I'll merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/444810 now and let's see if https://gerrit.wikimedia.org/r/c/operations/puppet/+/445145/ is needed I guess
[19:24:59] and I'll shoot an email about it
[19:25:05] chasemp: ok cool thanks
[19:33:28] Release-Engineering-Team, MediaWiki-General-or-Unknown, MW-1.29-release: Formalise and Announce REL1_29 EOL - https://phabricator.wikimedia.org/T197669 (Legoktm) +1 to what @Kghbln said. One last maintenance release and then declare it EOL?
[19:34:31] Release-Engineering-Team, MediaWiki-General-or-Unknown, MW-1.29-release: Formalise and Announce REL1_29 EOL - https://phabricator.wikimedia.org/T197669 (Reedy) >>! In T197669#4417090, @Legoktm wrote: > +1 to what @Kghbln said. One last maintenance release and then declare it EOL? That's vaguely what...
[19:39:04] Phabricator, Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (Legoktm) >>! In T198974#4416922, @mmodell wrote: > I submitted a patch to raise the limits. I just need someone from sre to merge. Link?
[19:39:31] Phabricator, Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (Legoktm) Oh, my bad, it's above.
[19:53:25] !log next set of db updates for beta might be a bit slow. Expected!
[19:53:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:54:30] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MediaWiki-User-preferences, Patch-For-Review, and 2 others: Selenium "User should be able to change preferences" test flaky - https://phabricator.wikimedia.org/T198137 (hashar) https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/445222/ set...
[19:58:33] Phabricator, Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (Aklapper) >>! In T198974#4416874, @Paladox wrote: > I think we should disable this as the account approvals is now manual meaning less likely a spammer will get through. That comme...
[19:59:50] Phabricator, Wikibugs, Patch-For-Review: wikibugs hits Phabricator's rate limiting and hence is unreliable - https://phabricator.wikimedia.org/T198915 (Aklapper) Open>Resolved a:Krenair I'd hope that Krenair's patch above plus T198974 will solve this problem. Hence closing.
[20:01:10] Phabricator, Wikibugs, Patch-For-Review: wikibugs hits Phabricator's rate limiting and hence is unreliable - https://phabricator.wikimedia.org/T198915 (mmodell) >>! In T198915#4405600, @Legoktm wrote: > Why are read-only requests being rate limited? That doesn't really make sense. Only write requests...
[20:07:51] !log (beta): Update mobileapps to b5e152d
[20:07:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:22:35] twentyafterfour: chasemp +1'd
[20:48:08] Phabricator, Wikibugs, Patch-For-Review: wikibugs hits Phabricator's rate limiting and hence is unreliable - https://phabricator.wikimedia.org/T198915 (Krenair) Resolved>Open a:Krenair>None My commit above is a crappy client-side workaround, it is not a proper server-side fix.
[20:49:48] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259 (Krenair)
[20:49:51] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244 (Krenair) Open>Resolved
[20:49:54] Beta-Cluster-Infrastructure, Operations, HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006 (Krenair)
[21:13:05] twentyafterfour: greg-g bombs away
[22:02:51] Gerrit: Group description amendment request - https://phabricator.wikimedia.org/T199384 (MarcoAurelio)
[22:13:41] Phabricator, Patch-For-Review: Rate-limit is too harsh and affects human users - https://phabricator.wikimedia.org/T198974 (mmodell) I've disabled the rate limit because even after merging 7562c262da5699d61634ffb8e4ea3aab54a0048d we still saw regular users hitting the limit. I think the rate limiting cod...
[22:22:54] Gerrit: [admin] Group description amendment request - https://phabricator.wikimedia.org/T199384 (MarcoAurelio)
[22:39:37] twentyafterfour, is there no proxy magic we can put in front of phab to limit certain http methods to a maximum per IP per minute or something?
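
The idea raised in the last message (limit certain HTTP methods to a maximum per IP per minute) can be illustrated with a small fixed-window counter. This is a sketch of the concept only, not Phabricator's rate limiter and not any particular proxy's configuration; the class name `PerIpMethodLimiter`, the limits, and the example addresses are all hypothetical.

```python
from collections import defaultdict


class PerIpMethodLimiter:
    """Fixed-window limiter keyed by (client IP, HTTP method).

    Hypothetical illustration: methods without a configured limit are never
    throttled, which matches the suggestion of limiting only certain methods.
    """

    def __init__(self, limits, window_seconds=60.0):
        # limits maps an upper-case method name to the allowed requests per window.
        self.limits = {method.upper(): count for method, count in limits.items()}
        self.window = window_seconds
        # Counters per (ip, method, window index). Old windows are never
        # evicted here, so a real deployment would need to prune them.
        self._counts = defaultdict(int)

    def allow(self, ip, method, now):
        """Return True if the request at time `now` (seconds) may proceed."""
        method = method.upper()
        limit = self.limits.get(method)
        if limit is None:
            return True  # unlimited method, e.g. read-only requests
        key = (ip, method, int(now // self.window))
        if self._counts[key] >= limit:
            return False
        self._counts[key] += 1
        return True


if __name__ == "__main__":
    limiter = PerIpMethodLimiter({"POST": 2})
    for second in (0.0, 1.0, 2.0):
        print(second, limiter.allow("192.0.2.1", "POST", second))
```

A fixed window is the simplest variant and allows short bursts across a window boundary; a sliding window or token bucket would smooth that out at the cost of more state.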
[22:48:32] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (MarcoAurelio)
[22:51:05] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (Reedy) It's because it's not in https://github.com/wikimedia/puppet/blame/production/hieradata/labs/deployment-prep/host/deployment-cache-text04.yaml
[22:51:07] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (Krenair) aa.m.wikipedia is just the first name on the list, there's plenty more unfortunately it does not include es.wikibooks
[22:51:33] Continuous-Integration-Infrastructure, Core-Platform-Team, Test-Coverage: Migrate https://tools.wmflabs.org/coverage/mediawiki/ to CI infrastructure - https://phabricator.wikimedia.org/T182751 (CCicalese_WMF)
[22:51:46] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (Krenair) I'm pretty sure that list is at maximum capacity btw so this is probably blocked by T182927
[22:52:46] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (Reedy) I wonder if any others are missing
[22:52:59] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (Krenair)
[22:53:01] Beta-Cluster-Infrastructure, Patch-For-Review: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (Krenair)
[22:54:50] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (MarcoAurelio) I understand that just adding the es and es.m domains to that list would not be enough and some sudo puppet agent commands would need to be run in order to get the issue fixed... IF a c...
[22:55:40] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (Reedy) >>! In T199387#4417906, @MarcoAurelio wrote: > I understand that just adding the es and es.m domains to that list would not be enough and some sudo puppet agent commands would need to be run i...
[23:09:30] Scap: When scap deploy is aborted it should say so in the log - https://phabricator.wikimedia.org/T199388 (bearND)
[23:11:32] Beta-Cluster-Infrastructure, Patch-For-Review: Get letsencrypt wildcard cert for *.beta.wmflabs.org domains - https://phabricator.wikimedia.org/T182927 (MarcoAurelio)
[23:11:45] Beta-Cluster-Infrastructure: Beta eswikibooks certificate issues - https://phabricator.wikimedia.org/T199387 (MarcoAurelio) Open>stalled
[23:14:40] Scap: When scap deploy is aborted it should say so in the log - https://phabricator.wikimedia.org/T199388 (thcipriani) p:Triage>Normal