[00:27:52] TimStarling: thanks. i'll merge a puppet change for scandium as well that creates a switch between parsoid/JS and parsoid/PHP in Hiera
[00:28:09] (but not actually switch it right now)
[02:16:59] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[02:31:11] o/ hey there. i think the beta cluster has ceased updating. can someone bump the thingy?
[03:17:53] * thcipriani nudges thingy
[03:18:21] !log cancelled stuck beta-scap-eqiad job to unblock postmerge
[03:18:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:04:35] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:25:22] Project-Admins: Requests for addition to the #acl*Project-Admins group (in comments) - https://phabricator.wikimedia.org/T706 (Jony) Hi, I'd like to be added to Project-Admins in to create sprint-projects for the #bengali-sites and #wikidata.
[07:44:05] PROBLEM - Puppet staleness on webperformance is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[07:46:59] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[08:26:08] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[09:32:06] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[09:32:35] (PS2) Ladsgroup: Set cache directory [integration/quibble] - https://gerrit.wikimedia.org/r/528933 (https://phabricator.wikimedia.org/T225730)
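The graphite-based alerts in this log (e.g. "40.00% of data above the critical threshold [10.0]") report what share of recent datapoints crossed a threshold. Below is a minimal Python sketch of that kind of rule, assuming the check simply counts non-null datapoints above the threshold over its window. `percent_above`, `classify` and `critical_percent` are hypothetical names, not the real Icinga plugin's interface.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    # Graphite uses null (None) for empty buckets; skip them rather than count them.
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def classify(datapoints: Sequence[Optional[float]], threshold: float,
             critical_percent: float) -> str:
    """Hypothetical rule: CRITICAL once `critical_percent` of points exceed `threshold`."""
    pct = percent_above(datapoints, threshold)
    if pct >= critical_percent:
        return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
    return f"OK: Less than {critical_percent:.2f}% above the threshold [{threshold}]"


print(classify([5.0, 12.0, 3.0, 15.0, 8.0], threshold=10.0, critical_percent=40.0))
```

With five sample values of which two exceed 10.0, this prints "CRITICAL: 40.00% of data above the critical threshold [10.0]", mirroring the wording of the 02:16:59 alert.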
[09:35:43] Anyone willing to review this ^
[09:39:43] Amir1: I took a look, but felt like I don't have enough understanding of the MediaWiki caches to give an educated opinion.
[09:40:09] Oh hey, nice simplification.
[09:40:43] Krinkle knows better but he's ooo :(
[09:43:45] Amir1: Is TMPDIR set to something different for each job?
[09:44:07] Otherwise we might have conflicts, according to https://www.mediawiki.org/wiki/Manual:$wgCacheDirectory
[09:44:13] > Note that you should not set this directory to '/tmp' else there could be conflicts with other MediaWiki installations on the same server (wiki farms or shared servers).
[09:45:33] awight: but quibble is on docker, all of the tmp directories are isolated from each other
[09:46:07] oho, thanks!
[09:48:05] (CR) Awight: [C: +2] "This seems wholesome, according to my limited understanding. Any conflicts that might happen between tests that e.g. redefine the same me" [integration/quibble] - https://gerrit.wikimedia.org/r/528933 (https://phabricator.wikimedia.org/T225730) (owner: Ladsgroup)
[09:49:03] (Merged) jenkins-bot: Set cache directory [integration/quibble] - https://gerrit.wikimedia.org/r/528933 (https://phabricator.wikimedia.org/T225730) (owner: Ladsgroup)
[09:49:20] Beta-Cluster-Infrastructure, Performance-Team: Move XHGui from tungsten to webperf-002 - https://phabricator.wikimedia.org/T180761 (MoritzMuehlenhoff) > Regarding multi-dc, we have four options I know of: > 3. Or; Push back this problem and migrate from tungsten to webperf1002 first. > ** no standby/fail...
[09:49:37] (CR) jenkins-bot: Set cache directory [integration/quibble] - https://gerrit.wikimedia.org/r/528933 (https://phabricator.wikimedia.org/T225730) (owner: Ladsgroup)
[09:51:56] awight: thanks!
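The TMPDIR question above boils down to whether $wgCacheDirectory is private to one wiki. Below is a rough Python sketch of the two cases discussed, assuming a helper that emits the LocalSettings.php line: inside a per-job Docker container the plain temp directory is already isolated, while on a shared host a unique subdirectory avoids the conflicts the manual warns about. `cache_directory_setting` is hypothetical; the actual Quibble change (528933) is not reproduced here.

```python
import tempfile


def cache_directory_setting(isolated: bool) -> str:
    """Return a LocalSettings.php line for $wgCacheDirectory (hypothetical helper)."""
    if isolated:
        # Per-job container: /tmp (or $TMPDIR) is private to this job, so use it directly.
        path = tempfile.gettempdir()
    else:
        # Shared host: create a fresh, uniquely named directory so other wikis never share it.
        path = tempfile.mkdtemp(prefix="mw-cache-")
    # Escape backslashes and single quotes for a PHP single-quoted string.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    return f"$wgCacheDirectory = '{escaped}';"


if __name__ == "__main__":
    print(cache_directory_setting(isolated=True))
```

The isolated branch reflects the point made at 09:45:33: when each job gets its own container, a shared-looking path like /tmp is not actually shared.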
[09:52:07] now I need someone to deploy it :D
[09:52:36] Good luck, I'm looking forward to seeing another dramatic runtime improvement :-)
[09:53:33] Holler if you feel like talking through the obstacles to parallel browser testing, btw. I think we're in a position to start designing how that might work.
[09:54:13] Sure!
[09:58:10] Amir1: Tmp is fine for quibble indeed. My longer version was to cover all bases in core
[09:58:15] * Krinkle hides again
[10:30:02] James_F|Away: hey, when you have time, can you deploy quibble?
[10:53:52] Phabricator, Developer-Advocacy (Jul-Sep 2019): Re-evaluate our use of Phabricator Conpherence chat - https://phabricator.wikimedia.org/T127640 (Aklapper) Open→Resolved No requests in the last four weeks for data. I now uninstalled Conpherence. Closing as resolved. Thanks everyone.
[10:53:55] Phabricator, Documentation: Re-evaluate installed Wikimedia Phabricator applications - https://phabricator.wikimedia.org/T109186 (Aklapper)
[11:47:50] Your branch is behind 'origin/production' by 833 commits
[11:47:53] great
[12:09:00] Release-Engineering-Team, CPT Initiatives (Session Management Service (CDP2)), Core Platform Team Workboards (Green): Need help to create and deploy Debian-packaged Python 3 app - https://phabricator.wikimedia.org/T229980 (holger.knust)
[12:16:07] Release-Engineering-Team, CPT Initiatives (Session Management Service (CDP2)), Core Platform Team Workboards (Green): Need help to create and deploy Debian-packaged Python 3 app - https://phabricator.wikimedia.org/T229980 (holger.knust)
[13:05:05] (CR) Effie Mouzeli: [C: +1] "Let't try it:)" [tools/scap] - https://gerrit.wikimedia.org/r/528929 (owner: Thcipriani)
[13:22:05] thanks thcipriani :]
[14:09:47] Phabricator: Improve start page window title - https://phabricator.wikimedia.org/T229225 (PerfektesChaos) Yeah, everything fine now.
[14:24:26] Uneducated question: can we turn on sonarqube for wikibase/termbox without much effort? It's a nodejs service; would this still have value? I tried adding a sonar-project.properties and making a PS but I guess there is more to do
[14:31:50] Continuous-Integration-Config, Operations: Fix operations/puppet.git "rebase hell" - https://phabricator.wikimedia.org/T224033 (CDanis) We unfortunately did not discuss this during the SRE summit. Here is my two lepta: * The current situation of ff-only is both not as safe as it seems, and often create...
[14:57:27] (PS1) Giuseppe Lavagetto: Rebuild the operations-puppet docker image [integration/config] - https://gerrit.wikimedia.org/r/529108
[14:59:24] (PS2) Giuseppe Lavagetto: Rebuild the operations-puppet docker image [integration/config] - https://gerrit.wikimedia.org/r/529108
[15:00:26] (CR) CDanis: [C: +1] Rebuild the operations-puppet docker image [integration/config] - https://gerrit.wikimedia.org/r/529108 (owner: Giuseppe Lavagetto)
[15:36:24] Continuous-Integration-Infrastructure, MediaWiki-Installer, Core Platform Team (Needs Cleaning - Security, stability, performance and scalability (TEC1)), Core Platform Team Workboards (Clinic Duty Team), and 4 others: install.php --with-extensions silently... - https://phabricator.wikimedia.org/T225512
[15:48:26] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201908), serviceops-radar: Gerrit: Alert when # of active threads > some threshold - https://phabricator.wikimedia.org/T230138 (thcipriani)
[15:49:21] (CR) Thcipriani: [C: +2] PHPRestart: Create global INSTANCE for pickling [tools/scap] - https://gerrit.wikimedia.org/r/528929 (owner: Thcipriani)
[15:51:25] (Merged) jenkins-bot: PHPRestart: Create global INSTANCE for pickling [tools/scap] - https://gerrit.wikimedia.org/r/528929 (owner: Thcipriani)
[15:52:15] (CR) jenkins-bot: PHPRestart: Create global INSTANCE for pickling [tools/scap] - https://gerrit.wikimedia.org/r/528929 (owner: Thcipriani)
[16:52:10] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Scap, serviceops, and 2 others: Deploy scap 3.12.0-1 to production - https://phabricator.wikimedia.org/T230144 (thcipriani)
[16:52:37] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Scap, serviceops, and 2 others: Deploy scap 3.12.0-1 to production - https://phabricator.wikimedia.org/T230144 (thcipriani) a:thcipriani→None
[17:06:05] <_joe_> hi there!
[17:06:28] <_joe_> I'd need help with deploying https://gerrit.wikimedia.org/r/#/c/integration/config/+/529108/
[17:06:41] <_joe_> it will cut the execution time of puppet in CI by ~ 50%
[17:06:55] nice :)
[17:07:10] <_joe_> oh nothing fancy, just by virtue of refreshing the bundle
[17:07:17] <_joe_> we added some expensive gems recently
[17:07:24] makes sense.
[17:07:27] <_joe_> so now at each run it has to run bundle install of those
[17:07:38] <_joe_> the only caveat is
[17:07:52] <_joe_> we never used the new image with the new script in CI
[17:08:01] <_joe_> we might need to rollback that part if something is wrong
[17:09:00] ah, yeah, I see you were using 5.3 and never bumped to 5.4
[17:09:25] k, lgtm, I'll go ahead and merge and build the new image.
[17:09:52] (CR) Thcipriani: [C: +2] Rebuild the operations-puppet docker image [integration/config] - https://gerrit.wikimedia.org/r/529108 (owner: Giuseppe Lavagetto)
[17:10:45] after the image is built, I'll re-deploy the job. Is there anything specific that needs to be tested for that? Or just recheck a patch and see if something explodes?
[17:13:04] (Merged) jenkins-bot: Rebuild the operations-puppet docker image [integration/config] - https://gerrit.wikimedia.org/r/529108 (owner: Giuseppe Lavagetto)
[17:14:58] !log updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/529108
[17:15:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:28:37] Phabricator: Autofocus on TOTP input field - https://phabricator.wikimedia.org/T229757 (epriestley) This should be resolved upstream by .
[17:31:00] Phabricator: Autofocus on TOTP input field - https://phabricator.wikimedia.org/T229757 (MarcoAurelio) Thanks @epriestley 😹
[17:31:18] Phabricator (Upstream), Upstream: Autofocus on TOTP input field - https://phabricator.wikimedia.org/T229757 (MarcoAurelio)
[17:31:22] _joe_: your operations-puppet-tests-stretch-docker change is live, seems to work in a simple case: https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/19060/console
[17:31:42] <_joe_> thcipriani: thanks!
[18:07:03] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[18:26:56] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[19:12:25] Hi! MaxSem, RoanKattouw or Niharika, would one of you like to help me deploy a couple small config changes on the evening swat?
[19:12:57] https://gerrit.wikimedia.org/r/527183 and https://gerrit.wikimedia.org/r/526756
[19:13:14] Let me just add those to the deployments table...
[19:15:04] added
[19:15:49] ejegg: I'm in Europe this month, so the evening SWAT will be too late for me (1am)
[19:16:09] ok, have fun over there!
[19:21:12] thcipriani: paladox: would now be a good time to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/527596 ?
[19:21:26] (once the train is done)
[19:22:00] cdanis yup!
[19:22:17] cdanis: sure! if it seems like the train is going to be stable :)
[19:22:45] looks like it's live everywhere at least
[19:23:00] brennen: any issues so far?
[19:25:23] cdanis / thcipriani: just filed https://phabricator.wikimedia.org/T230158 - otherwise looking pretty quiet i think
[19:26:10] brennen: fine to take away gerrit for a few minutes?
[19:26:20] thcipriani: yeah, i think so.
[19:26:55] ok cool
[19:27:12] thcipriani: the usual where I do the +2 and puppet-merge and you do the gerrit restart? :)
[19:27:26] cdanis: sounds like a plan
[19:27:31] thank you!
[19:30:10] thcipriani: puppet-merged
[19:30:38] thanks cdanis !
[19:31:14] cdanis: great, thanks
[19:54:20] hopefully third time is the charm? :)
[19:55:01] Phabricator (Upstream), Upstream: Error viewing project feed when many recent changes were in access restricted tasks: "Query (of class "PhabricatorFeedQuery") overheated: examined more than 500 raw rows without finding 50 visible objects" - https://phabricator.wikimedia.org/T230001 (epriestley) This sho...
[19:59:34] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[20:02:32] legoktm it should work this time
[20:02:46] we only reverted it as we had to go back to an older gerrit version due to the threads issue
[20:03:04] but it's mostly been stable (we did have a threads issue a few days ago)
[20:05:46] ejegg: I'm actually doubtful about the pages04 patch without explicit approval from Security
[20:06:15] :D
[20:08:14] Hey RelEng, what do you think about us subscribing our GitHub org for beta of their CI? https://github.blog/2019-08-08-github-actions-now-supports-ci-cd/
[20:24:42] MaxSem: we had security and legal take a look at it a couple years ago when we added the 'remind me later' feature to banners
[20:25:00] this patch just stops breaking the feature in the forced banner preview mode
[20:25:18] i.e. when you show a specific banner using the ?banner= param
[20:30:00] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[20:55:39] MaxSem: that is, the form that can submit to www.pages04.net has been a part of fundraising banners for a couple years now. legal and security were OK with it because it takes a couple of affirmative actions to send any user data to that domain
[20:56:13] and because that domain is controlled by the same entity that WMF fundraising uses to do all our bulk emailing
[20:56:57] The setting changed by this patch is just an extra-strict (i.e. actually blocking, not report-only) CSP added to forced banner previews
[20:57:35] the forced banner previews also add a tiny js module to catch the CSP violation exceptions and display an alert to the user
[20:58:13] this is a safeguard against unintentional privacy violations, like loading an image or a script hosted on a social network
[20:59:10] since www.pages04.net has already been okayed for production use, the fundraising employees who design the banners would like to be able to test the functionality in forced banner preview mode
[21:12:54] MaxSem: also just to note, there are tasks to remove the client-side calls to that domain, eventually they do need to go before site-wide CSP is enforced
[21:13:01] and we are aware it's a less than ideal situation
[21:13:39] in any case, as ejegg mentioned, this change is just to make the banner production process smoother, and doesn't impact what is actually put on the site
[21:21:07] (PS6) 20after4: local-charts: CLI for managing minikube, helm, etc [releng/local-charts] - https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939)
[21:22:52] (PS7) 20after4: local-charts: CLI for managing minikube, helm, etc [releng/local-charts] - https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939)
[21:23:11] (CR) 20after4: local-charts: CLI for managing minikube, helm, etc (3 comments) [releng/local-charts] - https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939) (owner: 20after4)
[21:24:28] (CR) 20after4: local-charts: CLI for managing minikube, helm, etc (3 comments) [releng/local-charts] - https://gerrit.wikimedia.org/r/525563 (https://phabricator.wikimedia.org/T224939) (owner: 20after4)
[21:34:53] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Scap, serviceops, and 2 others: Deploy scap 3.12.0-1 to production - https://phabricator.wikimedia.org/T230144 (Dzahn) debdeploy spec file generated by `generate-debdeploy-spec` ` source: scap comment: https...
[22:09:10] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Scap, serviceops, and 2 others: Deploy scap 3.12.0-1 to production - https://phabricator.wikimedia.org/T230144 (Dzahn) Open→Resolved a:Dzahn ` [cumin1001:~] $ sudo debdeploy deploy -u 2019-08-08-sc...
[22:09:16] Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (201908), Scap, serviceops, and 3 others: Enhance MediaWiki deployments for support of php7.x - https://phabricator.wikimedia.org/T224857 (Dzahn)
[22:09:33] thcipriani: https://debmonitor.wikimedia.org/packages/scap
[22:10:05] in time for evening SWAT
[22:16:02] mutante: <3 !!
[22:16:55] *hopefully* that unblocks the php7 migration :)
[22:17:59] ;)
[22:23:36] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201908), observability, serviceops-radar: Gerrit: Alert when # of active threads > some threshold - https://phabricator.wikimedia.org/T230138 (Dzahn)
[22:35:53] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201908), observability, serviceops-radar: Gerrit: Alert when # of active threads > some threshold - https://phabricator.wikimedia.org/T230138 (CDanis) Aren't we already alerting on Gerrit responding to HTT...
[22:44:04] MaxSem do the explanations above allay your concerns about deploying those config patches in the evening SWAT?
[22:44:53] I would just like to see an explicit yes from security
[22:46:49] OK, we'll take care of it
[22:47:25] fr-tech is confident about security and legal's prior review of the actual scripts
[23:05:40] MaxSem: here's the last word from security on the Remind Me Later functionality: https://phabricator.wikimedia.org/T195260
[23:06:27] Again, the patch under review doesn't change anything about this behavior for readers, just for CentralNotice administrators who are previewing banners
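The distinction drawn in the banner-preview discussion between an enforcing CSP and a report-only one comes down to which response header carries the policy. Below is a minimal Python sketch, assuming a policy that allows form submissions to www.pages04.net via the standard `form-action` directive. `banner_preview_csp` and the directive list are hypothetical, since the actual CentralNotice setting changed by the patch is not shown in this log.

```python
from typing import Iterable, Tuple


def banner_preview_csp(allowed_form_targets: Iterable[str],
                       enforce: bool = True) -> Tuple[str, str]:
    """Return (header name, header value) for a hypothetical banner-preview CSP."""
    directives = [
        # Only same-origin resources by default (illustrative; a real policy would list more).
        "default-src 'self'",
        # form-action restricts where <form> submissions may be sent.
        "form-action " + " ".join(["'self'", *allowed_form_targets]),
    ]
    # An enforcing policy blocks violations; a report-only policy just reports them.
    name = "Content-Security-Policy" if enforce else "Content-Security-Policy-Report-Only"
    return name, "; ".join(directives)


name, value = banner_preview_csp(["https://www.pages04.net"])
print(f"{name}: {value}")
```

With `enforce=False` the same policy is sent as `Content-Security-Policy-Report-Only`, so browsers report violations without blocking them, which is the weaker mode the previews deliberately avoid.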