[00:02:17] no_justification heh upstream had to release two new 2.11 and 2.12 releases
[00:02:23] even though they are no longer supported
[00:02:28] thcipriani, andrewbogott: Step 1 of making commands locale-aware: https://phabricator.wikimedia.org/D988
[00:02:40] due to github breaking support for something which was fixed in a newer release.
[00:12:25] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:12:55] RECOVERY - Puppet errors on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:12:55] RECOVERY - Puppet errors on deployment-cassandra3-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:14:41] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:15:04] RECOVERY - Puppet errors on deployment-cumin is OK: OK: Less than 1.00% above the threshold [0.0]
[00:15:46] RECOVERY - Puppet errors on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[00:16:22] RECOVERY - Puppet errors on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:17:49] RECOVERY - Puppet errors on deployment-kafka-jumbo-2 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:18:55] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:19:07] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:19:31] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:20:12] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:20:26] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:20:50] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:21:06] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[00:22:20] RECOVERY - Puppet errors on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:22:53] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:23:33] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:23:39] RECOVERY - Puppet errors on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:25:41] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[00:31:17] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[00:38:33] Beta-Cluster-Infrastructure, MediaWiki-extensions-WikimediaMaintenance: Error while create a new wiki in beta cluster - https://phabricator.wikimedia.org/T188353#4004499 (awight)
[01:07:09] Beta-Cluster-Infrastructure, MediaWiki-extensions-WikimediaMaintenance: Error while create a new wiki in beta cluster - https://phabricator.wikimedia.org/T188353#4004499 (demon) Well this was [[https://gerrit.wikimedia.org/g/mediawiki/extensions/WikimediaMaintenance/+/64ae4032d183b3ed33a2745bac2a08f99747...
[01:08:07] Release-Engineering-Team (Watching / External), Operations, ops-eqiad, Patch-For-Review: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4004558 (Dzahn) This host fails at the "Partition Disk" step in installer. It is similar but different from bas...
[01:08:45] Release-Engineering-Team (Watching / External), Operations, ops-eqiad, Patch-For-Review: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4004563 (Dzahn) The mgmt password works again since Chris re-enabled the root user.
[02:26:07] Continuous-Integration-Infrastructure (shipyard): composer-package-php70-docker jobs have xdebug enabled by default - https://phabricator.wikimedia.org/T188363#4004733 (Legoktm)
[02:50:36] Continuous-Integration-Infrastructure, Tracking: PHP7 support in CI (tracking) - https://phabricator.wikimedia.org/T144964#4004787 (Legoktm)
[02:50:41] Continuous-Integration-Infrastructure (shipyard), MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#4004785 (Legoktm) stalled>Open
[02:52:28] Continuous-Integration-Infrastructure (shipyard), MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2615580 (Legoktm)
[02:53:40] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<22.22%)
[03:01:26] !log Jenkins slave connection to deployment-tin is broken again. No error. Script console works. Disconnect/Relaunch doesn't resolve. 6 idle executors but jobs are not starting for some reason.
[03:01:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[03:02:27] !log Deleted beta-* related job builds in Jenkins that were stuck >1hr
[03:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[03:08:50] PROBLEM - Puppet staleness on deployment-eventlogging05 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [43200.0]
[03:18:44] (CR) Legoktm: [C: 2] "Thanks :)" [integration/docroot] - https://gerrit.wikimedia.org/r/414210 (owner: Libraryupgrader)
[03:19:26] (CR) jenkins-bot: build: Updating mediawiki/mediawiki-codesniffer to 16.0.1 [integration/docroot] - https://gerrit.wikimedia.org/r/414210 (owner: Libraryupgrader)
[03:22:55] (PS1) Legoktm: Run composer-package-php70-docker for oojs/ui [integration/config] - https://gerrit.wikimedia.org/r/414896
[03:27:47] (PS2) Legoktm: Run composer-package-php70-docker for oojs/ui [integration/config] - https://gerrit.wikimedia.org/r/414896
[03:27:58] (CR) Legoktm: [C: 2] Run composer-package-php70-docker for oojs/ui [integration/config] - https://gerrit.wikimedia.org/r/414896 (owner: Legoktm)
[03:29:05] (CR) Legoktm: "A year ago I think this made sense, but today it will cause more maintenance problems as we add more PHP versions, so I've reverted it in " [integration/config] - https://gerrit.wikimedia.org/r/345286 (https://phabricator.wikimedia.org/T155483) (owner: Prtksxna)
[03:29:13] (Merged) jenkins-bot: Run composer-package-php70-docker for oojs/ui [integration/config] - https://gerrit.wikimedia.org/r/414896 (owner: Legoktm)
[03:29:56] !log deployed https://gerrit.wikimedia.org/r/414896
[03:30:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[03:30:06] Phabricator: Email sometimes not being sent when a task is created - https://phabricator.wikimedia.org/T182549#4004825 (Anomie) >>! In T182549#4003905, @mmodell wrote: > I'm not sure how to debug this without more information. Are other people experiencing similar problems? Not that I've heard of, but I don...
[03:54:09] Yippee, build fixed!
[03:54:10] Project mediawiki-core-code-coverage-php7 build #112: FIXED in 54 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/112/
[04:03:11] (PS1) Chad: Kill beta-scap-eqiad job entirely [integration/config] - https://gerrit.wikimedia.org/r/414923
[04:05:43] (CR) jenkins-bot: build: Updating phpunit/phpunit to 4.8.36 || ^6.5 [tools/codesniffer] - https://gerrit.wikimedia.org/r/414925 (owner: Libraryupgrader)
[04:07:32] (CR) Legoktm: "recheck" [integration/docroot] - https://gerrit.wikimedia.org/r/414922 (owner: Libraryupgrader)
[04:10:39] Beta-Cluster-Infrastructure, Performance-Team: Make MediaWiki profiler in Beta match production - https://phabricator.wikimedia.org/T180766#4004845 (Krinkle)
[04:21:24] Beta-Cluster-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team: Use cron instead of Jenkins for beta deployments - https://phabricator.wikimedia.org/T188367#4004864 (Krinkle)
[04:22:46] Beta-Cluster-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team: Use cron instead of Jenkins for beta deployments - https://phabricator.wikimedia.org/T188367#4004874 (Krinkle)
[04:23:14] (CR) Krinkle: "Wanna tag with T188367? :)" [integration/config] - https://gerrit.wikimedia.org/r/414923 (owner: Chad)
[04:55:14] (CR) Krinkle: [C: 2] build: Updating phpunit/phpunit to 4.8.36 || ^6.5 [integration/docroot] - https://gerrit.wikimedia.org/r/414922 (owner: Libraryupgrader)
[04:55:48] (CR) jenkins-bot: build: Updating phpunit/phpunit to 4.8.36 || ^6.5 [integration/docroot] - https://gerrit.wikimedia.org/r/414922 (owner: Libraryupgrader)
[05:27:48] Continuous-Integration-Config, Security-Team, phan-taint-check-plugin, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review: Make jenkins run phan-taint-check-plugin non-voting and then voting - https://phabricator.wikimedia.org/T182599#4004941 (Legoktm) a:Legoktm>Bawolff...
[05:30:24] Continuous-Integration-Config, Patch-For-Review, User-Addshore: Allow use of phan 0.8.5+ in wikimedia CI - https://phabricator.wikimedia.org/T174339#3558368 (Legoktm) In the repository's composer.json we should add something like: ``` "extra": { "phan-version": "0.8.0", }, ``` But even before th...
[05:57:50] Krinkle: no_justification: sometimes it just gets stuck starting the first job. I cancelled that job in jenkins and it executed all the other jobs on its own
[05:57:56] (re beta update whatever)
[05:58:05] Still doesn't need to be jenkins jobs
[05:58:18] agreed
[06:00:09] Project beta-scap-eqiad build #197369: FAILURE in 1 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197369/
[06:02:38] Hmmm, I bet that's the fault of my LCStore swap earlier
[06:02:41] /usr/local/bin/mwscript mergeMessageFileList.php --wiki="dewiktionary" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.tMACeEOiKV"
[06:04:45] * no_justification manually rebuilds all l10n
[06:05:28] Project beta-scap-eqiad build #197370: STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197370/
[06:11:38] !log manually triggering a bunch of jenkins jobs
[06:11:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[06:12:45] >>> print('\n'.join('zuul-test-repo %s' % repo for repo in l[:50]))
[06:12:55] There are 468 repositories to trigger.
[06:14:04] I'm thinking about creating a low priority PHP7 queue, triggering all 468 jobs, and just letting zuul deal with them, but I feel like that's just going to cause problems somewhere along the line
[06:14:58] Fatal error: Uncaught exception 'MWException' with message 'No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php.' in /srv/mediawiki-staging/php-master/includes/cache/localisation/LocalisationCache.php:476
[06:14:59] Still?
[06:15:01] I hate that error
[06:15:54] Project beta-scap-eqiad build #197371: STILL FAILING in 2 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197371/
[06:21:24] Hmmmm
[06:22:10] mwscript rebuildLocalisationCache.php --wiki=enwiki --lang=en --threads=4 --force
[06:22:12] Reports no failure
[06:22:21] But I see nothing in $IP/cache/
[06:25:30] Project beta-scap-eqiad build #197372: STILL FAILING in 1 min 48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197372/
[06:35:44] Project beta-scap-eqiad build #197373: STILL FAILING in 1 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197373/
[06:36:39] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q4, Patch-For-Review: Migrate leftover Nodepool jobs to Docker - https://phabricator.wikimedia.org/T187797#4005023 (greg) p:Triage>High This is a goal for Q4 (Apr-Jun) this year.
[06:37:09] Continuous-Integration-Infrastructure, Nodepool: Add monitoring and capacity planning for Nodepool - https://phabricator.wikimedia.org/T113806#4005027 (greg) Open>declined We're migrating away (see eg T187797), no need to do this now.
[06:37:38] Continuous-Integration-Infrastructure, Nodepool, Upstream: Nodepool leaks instances and does not gabage collect them - https://phabricator.wikimedia.org/T151949#4005032 (greg) Open>declined We're migrating away (see eg T187797), no need to do this now.
[06:37:44] Continuous-Integration-Scaling, Nodepool: Nodepool should send metrics to statsd - https://phabricator.wikimedia.org/T111496#4005041 (greg)
[06:37:46] Continuous-Integration-Infrastructure, Documentation, Nodepool: Document Nodepool statsd metrics - https://phabricator.wikimedia.org/T111503#4005037 (greg) Open>declined We're migrating away (see eg T187797), no need to do this now.
[06:39:07] Project beta-scap-eqiad build #197374: STILL FAILING in 3 min 19 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197374/
[06:40:39] Ughhhhh
[06:42:56] what is going on?
[06:43:39] Localization cache rebuild ain't working on beta
[06:43:49] I think it's because of the change we made earlier today
[06:43:50] I see
[06:44:02] But A) why didn't it fail then, and B) why can't I seem to rebuild stuff?
[06:44:33] Reverted. But I doubt it'll fix things
[06:45:48] A-ha!
[06:45:52] It was working!
[06:45:57] * no_justification fixes
[06:46:14] heh
[06:48:05] Well, it's doing *something* now
[06:48:19] l10nupdate vs jenkins-deploy fighting over permissions on cache/l10n/
[06:52:32] Project beta-scap-eqiad build #197375: STILL FAILING in 8 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197375/
[07:07:32] Project-Admins, Release-Engineering-Team (Next): Create Trusted Contributors project? - https://phabricator.wikimedia.org/T145832#4005061 (greg)
[07:08:38] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:09:09] Yippee, build fixed!
[07:09:10] Project beta-scap-eqiad build #197376: FIXED in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197376/
[07:09:53] Yay!
[07:10:00] We're running with php-backed l10n in beta now
[07:10:50] PROBLEM - Free space - all mounts on deployment-mediawiki05 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki05.diskspace.root.byte_percentfree (<11.11%)
[07:17:27] Beta-Cluster-Infrastructure, Continuous-Integration-Config, Release-Engineering-Team: Use cron instead of Jenkins for beta deployments - https://phabricator.wikimedia.org/T188367#4004864 (greg) See also {T183164}
[07:22:42] Phabricator, Analytics-Tech-community-metrics, Bugzilla-Migration, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#4005100 (greg) >>! In T107254#4000910, @Aklapper wrote: > For the records, the [[ https://phabricator.wik...
[07:23:51] Project beta-scap-eqiad build #197377: FAILURE in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197377/
[07:25:01] Continuous-Integration-Infrastructure: castor rsync's taking 3-5 minutes for mwgate-npm jobs - https://phabricator.wikimedia.org/T188375#4005106 (Legoktm)
[07:25:18] no_justification: if only it stayed working :P
[07:25:40] Son of a bitch.
[07:25:49] PROBLEM - Free space - all mounts on deployment-mediawiki05 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki05.diskspace.root.byte_percentfree (<22.22%)
[07:25:51] Project beta-scap-eqiad build #197378: STILL FAILING in 1 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197378/
[07:25:52] Its mostly cuz of scap I think
[07:26:28] Bleh I guess I'll revert. Again.
[07:28:05] Reverted. Curious why it took all day to fail
[07:28:27] because all of the jenkins jobs were stuck until I unstuck them :p
[07:28:30] Project beta-scap-eqiad build #197379: STILL FAILING in 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197379/
[07:28:48] PROBLEM - Free space - all mounts on deployment-mediawiki06 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki06.diskspace.root.byte_percentfree (<55.56%)
[07:28:54] I also wonder why it breaks always during the evening
[07:28:59] Like late
[07:29:53] I wonder if l10nupdate is somehow involved in this madness...
[07:30:46] Project beta-scap-eqiad build #197380: STILL FAILING in 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197380/
[07:37:22] (PS1) Giuseppe Lavagetto: Remove debian-glue from conftool CI [integration/config] - https://gerrit.wikimedia.org/r/414957
[07:38:11] (CR) Legoktm: [C: 2] Remove debian-glue from conftool CI [integration/config] - https://gerrit.wikimedia.org/r/414957 (owner: Giuseppe Lavagetto)
[07:38:51] (PS1) Chad: conftool: Swap to debian-glue-non-voting [integration/config] - https://gerrit.wikimedia.org/r/414959
[07:39:18] (Abandoned) Chad: conftool: Swap to debian-glue-non-voting [integration/config] - https://gerrit.wikimedia.org/r/414959 (owner: Chad)
[07:40:09] (Merged) jenkins-bot: Remove debian-glue from conftool CI [integration/config] - https://gerrit.wikimedia.org/r/414957 (owner: Giuseppe Lavagetto)
[07:40:51] !log deployed https://gerrit.wikimedia.org/r/414957
[07:40:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:42:53] tldr: https://phabricator.wikimedia.org/T105683
[07:43:44] > This prevents us from migrating to LCStoreStaticArray.
[07:43:45] nice :D
[07:55:09] Project beta-scap-eqiad build #197381: STILL FAILING in 21 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197381/
[08:00:26] I just threw 50 more jobs into the queue, which should be the last for tonight
[08:01:52] (CR) Legoktm: [C: -1] "Per my comment on the bug, just disabling the job that exposes an issue in the extension won't make the situation any better." [integration/config] - https://gerrit.wikimedia.org/r/414774 (https://phabricator.wikimedia.org/T185697) (owner: Lucas Werkmeister (WMDE))
[08:03:00] (PS1) Legoktm: Run phan for Gadgets [integration/config] - https://gerrit.wikimedia.org/r/414962
[08:03:13] Project beta-scap-eqiad build #197382: STILL FAILING in 7 min 24 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197382/
[08:09:00] Project beta-scap-eqiad build #197383: STILL FAILING in 5 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197383/
[08:18:39] Project beta-scap-eqiad build #197384: STILL FAILING in 5 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197384/
[08:28:40] Project beta-scap-eqiad build #197385: STILL FAILING in 5 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197385/
[08:33:45] Project beta-scap-eqiad build #197386: STILL FAILING in 4 min 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197386/
[08:39:28] Project beta-scap-eqiad build #197387: STILL FAILING in 5 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197387/
[08:44:32] !log deployment-mediawiki06: out of disk space. Ran apt-get clean
[08:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:52:24] Project beta-scap-eqiad build #197388: STILL FAILING in 8 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197388/
[08:53:49] RECOVERY - Free space - all mounts on deployment-mediawiki06 is OK: OK: All targets OK
[08:57:21] (CR) Hashar: [C: -1] "beta-scap-eqiad is triggered by two jobs:" [integration/config] - https://gerrit.wikimedia.org/r/414923 (owner: Chad)
[08:58:41] Project beta-scap-eqiad build #197389: STILL FAILING in 4 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197389/
[09:08:41] Project beta-scap-eqiad build #197390: STILL FAILING in 4 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197390/
[09:10:19] Continuous-Integration-Infrastructure: castor rsync's taking 3-5 minutes for mwgate-npm jobs - https://phabricator.wikimedia.org/T188375#4005106 (hashar) On castor02.integration.eqiad.wmflabs: ``` $ sudo du -s -m /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/mwgate-npm-node-6-docker 249 /srv/j...
[09:12:17] (CR) Hashar: "We will probably need to add more Docker slaves :]" [integration/config] - https://gerrit.wikimedia.org/r/413964 (https://phabricator.wikimedia.org/T144962) (owner: Legoktm)
[09:18:26] Project beta-scap-eqiad build #197391: STILL FAILING in 4 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197391/
[09:22:31] Phabricator, Release-Engineering-Team (Kanban), monitoring, Browser-Tests, User-zeljkofilipin: Develop tests for phabricator search to detect regressions / search quality issues - https://phabricator.wikimedia.org/T182160#4005265 (zeljkofilipin) I'm available for pairing and/or reviews! :)
[09:28:42] Project beta-scap-eqiad build #197392: STILL FAILING in 5 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197392/
[09:29:32] Continuous-Integration-Infrastructure: castor rsync's taking 3-5 minutes for mwgate-npm jobs - https://phabricator.wikimedia.org/T188375#4005272 (hashar) Sounds to me the caching system should probably be more distributed. Maybe using a distributed file system that all slaves would participate in, or brew ou...
[09:35:28] !log deployment-mediawiki05: out of disk space. Ran apt-get clean, cleaned old kernels/packages and dropped hhvm bytecode cache
[09:35:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:41:49] Yippee, build fixed!
[09:41:50] Project beta-scap-eqiad build #197393: FIXED in 8 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/197393/
[09:50:49] RECOVERY - Free space - all mounts on deployment-mediawiki05 is OK: OK: All targets OK
[10:04:49] Phabricator, MediaWiki-extensions-Translate, translatewiki.net, I18n: Improvements for automatic reporting of tasks from translatewiki to Phabricator - https://phabricator.wikimedia.org/T188379#4005297 (Amire80)
[10:05:00] Phabricator, MediaWiki-extensions-Translate, translatewiki.net, I18n: Improvements for automatic reporting of tasks from translatewiki to Phabricator - https://phabricator.wikimedia.org/T188379#4005297 (Amire80)
[10:07:35] (PS1) Zfilipin: Move job from experimental to test and gate-and-submit pipelines for wikimedia/portals/deploy [integration/config] - https://gerrit.wikimedia.org/r/414970 (https://phabricator.wikimedia.org/T180777)
[10:08:07] (CR) Lucas Werkmeister (WMDE): "But that bug doesn’t happen in the regular tests, and I have no idea what’s different about the coverage builds. And it doesn’t look like " [integration/config] - https://gerrit.wikimedia.org/r/414774 (https://phabricator.wikimedia.org/T185697) (owner: Lucas Werkmeister (WMDE))
[10:10:44] (CR) Zfilipin: [C: 2] Move job from experimental to test and gate-and-submit pipelines for wikimedia/portals/deploy [integration/config] - https://gerrit.wikimedia.org/r/414970 (https://phabricator.wikimedia.org/T180777) (owner: Zfilipin)
[10:11:50] PROBLEM - Free space - all mounts on deployment-eventlog02 is CRITICAL: CRITICAL: deployment-prep.deployment-eventlog02.diskspace.root.byte_percentfree (<100.00%)
[10:12:06] (Merged) jenkins-bot: Move job from experimental to test and gate-and-submit pipelines for wikimedia/portals/deploy [integration/config] - https://gerrit.wikimedia.org/r/414970 (https://phabricator.wikimedia.org/T180777) (owner: Zfilipin)
[10:15:01] !log Reloading Zuul to deploy d9ed9d4dded7d646fc9c4b54155613eef99752a9
[10:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:18:25] PROBLEM - Puppet errors on deployment-eventlogging05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[10:20:48] Continuous-Integration-Config, Patch-For-Review, User-Addshore: Allow use of phan 0.8.5+ in wikimedia CI - https://phabricator.wikimedia.org/T174339#4005374 (Addshore) >>! In T174339#4004946, @Legoktm wrote: > But even before that we need to update the docker image to not hardcode a specific version...
[10:23:52] RECOVERY - Puppet staleness on deployment-eventlogging05 is OK: OK: Less than 1.00% above the threshold [3600.0]
[10:33:26] RECOVERY - Puppet errors on deployment-eventlogging05 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:39:02] Release-Engineering-Team (Kanban), Fundraising-Backlog, MediaWiki-extensions-DonationInterface, Browser-Tests, and 2 others: Write browser tests for DonationInterface - https://phabricator.wikimedia.org/T99955#4005404 (zeljkofilipin) Looks like CI is having trouble with DonationInterface. From [[...
[10:43:40] (CR) DCausse: [C: 1] selenium-EXTENSION-jessie Jenkins job e-mail notification [integration/config] - https://gerrit.wikimedia.org/r/412931 (https://phabricator.wikimedia.org/T185315) (owner: Zfilipin)
[10:54:07] PROBLEM - Host deployment-videoscaler01 is DOWN: CRITICAL - Host Unreachable (10.68.19.130)
[10:54:52] PROBLEM - Host deployment-tmh01 is DOWN: CRITICAL - Host Unreachable (10.68.16.211)
[11:04:25] Release-Engineering-Team (Kanban), Fundraising-Backlog, MediaWiki-extensions-DonationInterface, Browser-Tests, and 2 others: Write browser tests for DonationInterface - https://phabricator.wikimedia.org/T99955#4005473 (zeljkofilipin) a:zeljkofilipin>None Please see [[ https://gerrit.wikim...
[11:31:13] (CR) Zfilipin: [C: 2] "With three +1s I think this is safe to be merged and deployed. :)" [integration/config] - https://gerrit.wikimedia.org/r/412931 (https://phabricator.wikimedia.org/T185315) (owner: Zfilipin)
[11:31:45] selfmergeabusezomg!!11
[11:32:44] (Merged) jenkins-bot: selenium-EXTENSION-jessie Jenkins job e-mail notification [integration/config] - https://gerrit.wikimedia.org/r/412931 (https://phabricator.wikimedia.org/T185315) (owner: Zfilipin)
[11:38:34] Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, User-zeljkofilipin: Q3 Selenium framework improvements - https://phabricator.wikimedia.org/T182421#4005539 (zeljkofilipin)
[11:38:37] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: selenium-EXTENSION-jessie Jenkins job should have e-mail notification - https://phabricator.wikimedia.org/T185315#4005537 (zeljkofilipin) Open>Resolved The commit is deployed: ``` $ jenkins-jobs --conf etc/jenkins_jobs....
[11:39:12] (CR) Zfilipin: [C: 2] "The commit is deployed. T185315#4005537" [integration/config] - https://gerrit.wikimedia.org/r/412931 (https://phabricator.wikimedia.org/T185315) (owner: Zfilipin)
[12:30:54] PROBLEM - Puppet errors on deployment-ores01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:39:52] Phabricator, Community-Liaisons, Developer-Relations, Developer-Wishlist (2017), Goal: Consolidate the many tech events calendars in Phabricator's calendar - https://phabricator.wikimedia.org/T1035#4005688 (Qgil) p:Normal>Low a:Qgil>None I keep not having time to even start th...
[12:58:55] Phabricator, MediaWiki-extensions-Translate, translatewiki.net, I18n: Improvements for automatic reporting of tasks from translatewiki to Phabricator - https://phabricator.wikimedia.org/T188379#4005752 (Aklapper) <3. Thanks! > [ ] Create a new tag, for example "translatewiki-support-requests". Th...
[13:05:50] Phabricator, Analytics-Tech-community-metrics, Bugzilla-Migration, DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#4005767 (Aklapper) >>! In T107254#4005100, @greg wrote: > Now we have a good place for the data, do we ha...
[13:12:03] (PS1) Hashar: Skip integration tests in maven site publish job [integration/config] - https://gerrit.wikimedia.org/r/415001
[13:12:05] (PS1) Hashar: Migrate wikidata/query/rdf publish job to Docker [integration/config] - https://gerrit.wikimedia.org/r/415002
[13:22:32] Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, User-zeljkofilipin: Update README file for Selenium tests - https://phabricator.wikimedia.org/T187862#4005830 (zeljkofilipin)
[13:28:35] Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, User-zeljkofilipin: Update README file for Selenium tests - https://phabricator.wikimedia.org/T187862#4005846 (zeljkofilipin)
[13:29:11] Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, User-zeljkofilipin: Update README file for Selenium tests - https://phabricator.wikimedia.org/T187862#4005850 (zeljkofilipin)
[13:29:47] Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, User-zeljkofilipin: Update README file for Selenium tests - https://phabricator.wikimedia.org/T187862#3988462 (zeljkofilipin)
[13:41:16] (CR) Hashar: "The job runs on postmerge with:" [integration/config] - https://gerrit.wikimedia.org/r/415001 (owner: Hashar)
[13:41:35] (CR) Hashar: [C: -1] "The build is broken somehow." [integration/config] - https://gerrit.wikimedia.org/r/415002 (owner: Hashar)
[14:33:23] Release-Engineering-Team (Kanban), Fundraising-Backlog, MediaWiki-extensions-DonationInterface, Browser-Tests, and 2 others: Write browser tests for DonationInterface - https://phabricator.wikimedia.org/T99955#4006066 (hashar) DonationInterface test environment is different than the other jobs. It...
[14:41:13] (CR) DCausse: Skip integration tests in maven site publish job (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/415001 (owner: Hashar)
[14:49:45] Continuous-Integration-Config, ContentTranslation, MediaWiki-extensions-Interwiki, MediaWiki-extensions-TranslationNotifications, WikimediaMessages: Add i18n related extensions to mediawiki-extensions-(php55|php70|hhvm) - https://phabricator.wikimedia.org/T86930#979673 (Amire80) Updating the...
[14:53:15] PROBLEM - Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76)
[15:04:00] (CR) Gehel: "I'm not sure that's what we want, and I am now wondering why we are skipping the unit tests. Since we're not keeping any state from the ma" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/415001 (owner: Hashar)
[15:05:06] dcausse: gehel: thanks for skipITs :]
[15:05:13] damn lowercase letters
[15:06:22] (CR) Gehel: Migrate wikidata/query/rdf publish job to Docker (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/415002 (owner: Hashar)
[15:07:26] hashar: I know you removed as reviewer of ^, but I still read it :)
[15:07:42] hashar: it looks like we use a different base docker image for each project. Why is that?
[15:08:00] Couldn't we share the same image for all java 8 based builds?
[15:08:10] ideally yeah
[15:08:23] but ... ?
[15:08:26] but some java repos require some different Debian packages iirc
[15:08:39] but yeah probably we should unify them as a single container
[15:09:01] for wikidata/query/rdf I gotta try a bit more locally until I have something actually passing
[15:09:20] or at least come up with an easy reproduction case
[15:09:25] as far as I know, all the discovery java projects are self contained and only require maven (not even that, we have mvn wrapper in the repo)
[15:10:00] Well, they obviously require a JDK8 as well. If that's not the case, it is probably an error on our side, which should be fixed
[15:10:10] or at least clearly documented
[15:11:10] Release-Engineering-Team (Kanban), User-zeljkofilipin: WebdriverIO `sync: false` - https://phabricator.wikimedia.org/T182412#4006300 (zeljkofilipin) p:Normal>Low
[15:11:41] Release-Engineering-Team (Kanban), User-zeljkofilipin: Document differences between Ruby and Node.js Selenium frameworks - https://phabricator.wikimedia.org/T182692#4006302 (zeljkofilipin) p:Low>High
[15:12:53] gehel: the difference between images: mjolnir has liblapack3 libgomp1 , query-rdf needs phantomjs and some env variables for npm and phantomjs, xgboost has "cmake gcc g++ make openjdk-8-jdk python-minimal" for node-gyp
[15:13:14] so I guess I went with dedicated containers to better reflect the corner cases
[15:13:24] but yeah probably that could be unified
[15:14:21] yeah, mjolnir is a special case
[15:14:54] not as much a java project as a java / python / scala / c / c++ / etc... bundle of everything
[15:14:59] Release-Engineering-Team (Kanban), User-zeljkofilipin: Sample code in Node.js for repositories that still have Selenium+Ruby tests - https://phabricator.wikimedia.org/T183160#4006326 (zeljkofilipin) p:Normal>Low
[15:15:16] Release-Engineering-Team (Kanban), User-zeljkofilipin: Patches in Gerrit deleting Selenium+Ruby tests for repositories that still have them - https://phabricator.wikimedia.org/T183162#4006328 (zeljkofilipin) p:Normal>Low
[15:15:40] hashar: babysitting time, I'll be back around 5pm
[15:15:53] hashar: let me know if I can help you to unify those containers...
[15:16:17] gehel: yeah I will keep that in mind, but I guess that will be for later :]
[15:22:46] Phabricator, Release-Engineering-Team (Kanban), Phlogiston: Phlogiston reports don't have new data since mid-February - https://phabricator.wikimedia.org/T188149#4006341 (mmodell) So, after much digging I wasn't able to find any changes to edges, however, it appears that the dump stops after the 9999...
[15:38:50] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Update page object pattern in Selenium tests - https://phabricator.wikimedia.org/T185094#4006388 (zeljkofilipin) As far as I can tell, `import()` is not supported yet even in the latest Node.js release....
[15:44:50] (03PS2) 10Hashar: Skip integration tests in maven site publish job [integration/config] - 10https://gerrit.wikimedia.org/r/415001 [15:44:52] (03PS2) 10Hashar: Migrate wikidata/query/rdf publish job to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/415002 [15:45:43] (03CR) 10Hashar: Skip integration tests in maven site publish job (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/415001 (owner: 10Hashar) [15:47:18] (03CR) 10Hashar: "In reply to Gehel," [integration/config] - 10https://gerrit.wikimedia.org/r/415001 (owner: 10Hashar) [15:55:27] (03PS1) 10Zfilipin: WIP Create selenium-core-jessie daily Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/415027 (https://phabricator.wikimedia.org/T185011) [16:04:29] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10Patch-For-Review, 10User-zeljkofilipin, 10Wikimedia-Incident: Create selenium-core-jessie daily Jenkins job - https://phabricator.wikimedia.org/T185011#4006448 (10zeljkofilipin) Created test job, [[ https://integration.wikimedia.org/ci/view/S... [16:12:53] 10Release-Engineering-Team (Kanban), 10Fundraising-Backlog, 10MediaWiki-extensions-DonationInterface, 10Browser-Tests, and 2 others: Write browser tests for DonationInterface - https://phabricator.wikimedia.org/T99955#4006457 (10Ejegg) Thanks so much for all the help! We might have to wait till June for th... 
[16:13:48] Project mediawiki-core-code-coverage build #3352: 15ABORTED in 1 hr 13 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3352/ [16:26:02] (03PS2) 10Zfilipin: WIP Create selenium-core-jessie daily Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/415027 (https://phabricator.wikimedia.org/T185011) [16:33:19] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:50:36] (03PS3) 10Zfilipin: WIP Create selenium-core-jessie daily Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/415027 (https://phabricator.wikimedia.org/T185011) [16:58:59] 10Release-Engineering-Team (Watching / External), 10Epic, 10MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), 10User-notice: Deploy refactored comment storage - https://phabricator.wikimedia.org/T166733#4006641 (10Anomie) [17:00:43] (03CR) 10Gehel: "@Hashar: let me know if I can help. site generation works fine locally for me, so I'm not sure what issue you are running into. Yes, maven" [integration/config] - 10https://gerrit.wikimedia.org/r/415001 (owner: 10Hashar) [17:01:42] 10Release-Engineering-Team (Watching / External), 10Epic, 10MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), 10User-notice: Deploy refactored comment storage - https://phabricator.wikimedia.org/T166733#4006668 (10Anomie) Before we can begin step 3.1, {T181650} needs to be done so we know what exactly to ann... 
[17:01:57] 10Release-Engineering-Team (Watching / External), 10Epic, 10MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), 10User-notice: Deploy refactored comment storage - https://phabricator.wikimedia.org/T166733#4006671 (10Anomie) [17:02:45] 10Release-Engineering-Team (Kanban), 10MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), 10Patch-For-Review, 10User-zeljkofilipin: Update README file for Selenium tests - https://phabricator.wikimedia.org/T187862#4006674 (10zeljkofilipin) [17:13:13] 10Phabricator, 10Wikimedia Phabricator RfC: Configure Phabricator for our needs - https://phabricator.wikimedia.org/T34#4006724 (10thiemowmde) [17:13:16] 10Phabricator (Upstream), 10Wikimedia Phabricator RfC, 10Upstream: Configure the default styling to have a bit bigger font size - https://phabricator.wikimedia.org/T81#4006722 (10thiemowmde) 05Open>03Invalid Unfortunately I did not uploaded a screenshot back then. From todays perspective I can not tell a... [17:13:36] 10Phabricator, 10MediaWiki-extensions-Translate, 10translatewiki.net, 10I18n: Improvements for automatic reporting of tasks from translatewiki to Phabricator - https://phabricator.wikimedia.org/T188379#4006726 (10Nikerabbit) I support these ideas. Would be nice if someone picked this up, I can provide guid... 
[17:18:54] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:21:18] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:21:32] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:21:36] (03PS4) 10Zfilipin: WIP Create selenium-core-jessie daily Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/415027 (https://phabricator.wikimedia.org/T185011) [17:23:50] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36196 bytes in 3.966 second response time [17:26:13] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47743 bytes in 3.642 second response time [17:26:27] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 47134 bytes in 3.827 second response time [17:39:54] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:42:19] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:42:33] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:39] (03PS5) 10Zfilipin: WIP Create selenium-core-jessie daily Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/415027 (https://phabricator.wikimedia.org/T185011) [17:54:47] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36213 bytes in 3.408 second response time [17:56:49] 10Phabricator, 10MediaWiki-extensions-Translate, 10translatewiki.net, 10I18n: Improvements for automatic reporting of tasks from translatewiki to Phabricator - https://phabricator.wikimedia.org/T188379#4006943 (10Nemo_bis) What would a multiplication of 
tags help? [18:00:57] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [18:03:40] it's me ^ [18:04:24] and i'm about to freak out because this is like 5 fixes after the original one which caused the second one and so on [18:04:45] if i cant find the next fix i'll just have to revert ALL THE THINGS and give up [18:04:53] :( [18:06:10] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Jenkins, 10Upstream: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#4006993 (1... [18:06:12] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Jenkins: Move the beta cluster jobs to a dedicated/standalone Jenkins instance - https://phabricator.wikimedia.org/T183164#4006989 (10demon) 05Open>03declined We're gonna go with the other task T188367 instead. [18:07:53] (03CR) 10Chad: "> So the issue to be figured out is how to deal with the race" [integration/config] - 10https://gerrit.wikimedia.org/r/414923 (owner: 10Chad) [18:08:02] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Config, 10Release-Engineering-Team: Use cron instead of Jenkins for beta deployments - https://phabricator.wikimedia.org/T188367#4007009 (10greg) [18:09:22] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Jenkins, 10Upstream: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#4007008 (1... 
[18:32:43] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:34:35] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 hphp_invoke - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 287 bytes in 0.020 second response time [18:54:29] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [19:13:05] ^ that one is not me [19:13:07] The last Puppet run was at Mon Dec 11 18:20:54 UTC 2017 (112371 minutes ago). [19:15:02] but if that has been alerting since then .. [19:15:35] maybe nobody sees it [19:15:40] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Wikidata: Request access to beta cluster for Lucas Werkmeister - https://phabricator.wikimedia.org/T188427#4007356 (10Ladsgroup) [19:26:58] deployment-prometheus01 now fixed [19:27:49] deployment-secured* broken as before and unrelated to what i did [19:30:06] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q4, 10Patch-For-Review: Migrate leftover Nodepool jobs to Docker - https://phabricator.wikimedia.org/T187797#4007449 (10hashar) [19:35:56] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:36:40] deployment-cache-text04: also not me. The last Puppet run was at Tue Feb 13 12:31:52 UTC 2018 (20580 minutes ago). [19:51:41] 10Phabricator, 10Release-Engineering-Team (Kanban), 10Phlogiston: Phlogiston reports don't have new data since mid-February - https://phabricator.wikimedia.org/T188149#4007556 (10JAufrecht) I still see 179180 tasks in the dump, which is presumably all of the public tasks. What does seem to be missing in the... 
[20:22:05] no_justification: btw, not knowing why https://phabricator.wikimedia.org/T181833 is happening is making me scared of deploying anything wmf-config or dblists related. [20:22:48] I'm not even sure if order of changes is insured, e.g. if I deploy a change that removes use of a dblist and a server misses that somehow, and then later a change to remove the dblist, can I even be sure that the server will either miss again or pick up both? [20:22:55] I don't think so, given dblist isn't a php file. [20:24:39] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint): Rewrite mediawiki-core-doxygen-publish Jenkins job to poll scm instead of being triggered by Zuul - https://phabricator.wikimedia.org/T115755#4007698 (10Krinkle) [20:30:32] Project selenium-Wikibase-chrome » chrome,beta,Linux,DebianJessie && contintLabsSlave build #123: 04FAILURE in 43 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase-chrome/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=DebianJessie%20&&%20contintLabsSlave/123/ [20:35:00] Krinkle: symlinks scare me here [20:38:20] 10Phabricator, 10Project-Admins, 10NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) - https://phabricator.wikimedia.org/T188436#4007719 (10Legoktm) [20:43:11] Krinkle: Tbh, does the local file cache for InitialiseSettings actually make a difference in a opcode-cached world? [20:44:48] no_justification: I think it does, yes, but we should test it to be sure. We don't cache wmf-config for caching the parsing of the array or for deciding which keys become global variables. The thing we're caching is the 1) parsing of dblists, 2) elaborate fallback logic from dbname/project-names/tags/default, placeholder values, and complex array_merging strategies, and finally, to export them as global variables indeed. 
[20:45:07] That sounds worthwhile caching, but I also don't know how much time would be spent there to be sure. [20:45:10] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T183962#4007750 (thcipriani) [20:45:14] I suppose we could try APC instead, though. [20:45:16] That might be simpler. [20:45:39] But regardless, we need to figure out why it is failing right now. If the problem is with the cache key (mtime), then switching the store won't help [20:45:42] Yeah, throwing it into apcu or whatever makes more sense imho [20:46:16] Relying on mtimes only results in heartache :( [20:47:01] What if instead of relying on mtimes, the final step of an AbstractSync became "purge cached version of IS" [20:47:52] (in case folks didn't notice, I made touching IS unconditional on basically all syncs quite some time ago, so it obviously hasn't hurt too many folks) [20:48:25] Which implies to me the mtime check isn't infallible. [20:48:38] FWIW, I like the idea of purging IS via scap. Currently there are a lot of weird timing things that could be happening. [20:49:09] That was the idea behind touch-it-every-time [20:49:23] Because you could change a dblist but if IS doesn't get purged then it won't be recomputed. [20:49:45] What if the issue is a race condition with the mtime? [20:49:46] there's the time when scap updates the file, the timing when rsync moves the file out of .~.tmp or whatever, there's the timing of the incoming web request that actually triggers the mtime check. It's weird. 
[20:50:01] Seems like the only thing that would explain it [20:50:15] Yeah [20:50:53] Purging it after all that should be safe, that way at least we won’t purge too soon and make it remain stale [20:51:03] I can work up a patch for scap to this [20:51:06] *to scap for this [20:51:20] But purging after does open the possibility of code running with newer versions of other files [20:51:43] So we’d have to be really strict to use separate changes and syncs for this stuff [20:52:00] also had the thought, we got burned by this in git-fat, rsync could move it before moving CS, then move IS, then web request hits, then scap touches IS all within the same second. This was the problem with .git's index: it only cared about the second some file was touched for cache purposes. [20:52:02] Actually. Current behavior is touch(1) after we've finished rsync [20:52:10] So this is on scap *pull* [20:52:35] which we do currently [20:52:45] That sounds good [20:53:00] So that means mtimes may vary across servers [20:53:04] Which is fine [20:53:09] But....how are we still having a race condition...? [20:53:12] If we touch last? [20:53:16] Right [20:53:39] unclear, I've tried to think through why we still have race conditions, but there are a lot of timing things that could be happening. [20:53:41] I'm looking at tasks.sync_common() btw [20:54:41] Also: remember that I removed --no-touch / --beta-only-change awhile ago too [20:54:44] yeah, it's https://github.com/wikimedia/scap/blob/master/scap/tasks.py#L364-L366 [20:54:44] So can't blame that [20:55:07] I've seen it continue to use old IS code even when scap logs it's been touched. [20:55:35] What does -n for sudo(1) do? 
[20:55:48] Ah non-interactive, got it [20:56:05] afaict it's some race/problem/something between the time a request hits the server and the timing of the sync/rsync/touch [20:56:56] inode number does, likely, factor into this somehow :) [20:57:09] since we use --delay-update with rsync [20:57:26] Um, also `scap pull` isn't inherently the last step of a deploy [20:57:44] this is true [20:57:59] although at the end of that IS should be done being modified [20:58:08] A full scap will also trigger rebuild-json-cdbs-whatever and compile-wikiversions [20:58:17] 10Phabricator, 10Project-Admins, 10NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) - https://phabricator.wikimedia.org/T188436#4007719 (10Jdforrester-WMF) WFM. What scope are we applying? "Just" MediaWiki and bundled extensions? Wikimedia Foundation production... [20:58:27] 10Continuous-Integration-Config, 10Zuul: cleanup outdated tests/dependencies on jenkins' composer.json - https://phabricator.wikimedia.org/T188438#4007792 (10MarcoAurelio) [21:00:30] huh [21:00:43] I wonder if this could be the same problem as git-fat [21:01:00] er https://github.com/git/git/blob/master/Documentation/technical/racy-git.txt [21:01:42] which is, filemtime is returned a unixtimestamp [21:01:49] 10Continuous-Integration-Config, 10Patch-For-Review, 10User-Addshore: Allow use of phan 0.8.5+ in wikimedia CI - https://phabricator.wikimedia.org/T174339#4007828 (10Legoktm) I was thinking of having it install phan at runtime. [21:03:46] so if you have CS.php and IS.php changed, you do a sync-dir wmf-config, IS.php has a lower inode # on the target, --delay-update moves that one *before* CS.php, web request comes in, caches values without CS.php. [21:03:53] 10Phabricator, 10Project-Admins, 10NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) 
- https://phabricator.wikimedia.org/T188436#4007719 (10Smalyshev) I'd say WMF production and all extensions we choose to care about for some specific reason. [21:04:09] then the touch happens the same second of the sync, so no new caching is triggered. [21:06:26] 10Phabricator, 10Project-Admins, 10NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) - https://phabricator.wikimedia.org/T188436#4007847 (10Legoktm) Anything Phab is normally the bug tracker for? :) The only non-MediaWiki core related task I see in there is T180... [21:08:20] (03Draft1) 10MarcoAurelio: Run phan for mediawiki/extensions/CentralLogging [integration/config] - 10https://gerrit.wikimedia.org/r/415099 [21:08:22] (03PS2) 10MarcoAurelio: Run phan for mediawiki/extensions/CentralLogging [integration/config] - 10https://gerrit.wikimedia.org/r/415099 [21:08:23] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:10:16] (03PS1) 10Hashar: Experimental docker based job for oojs/ui [integration/config] - 10https://gerrit.wikimedia.org/r/415102 (https://phabricator.wikimedia.org/T187797) [21:13:12] thcipriani: What if we put the touch somewhere offset a bit? [21:13:27] time.sleep(1) [21:13:29] :) [21:13:39] Heh [21:14:14] for the racy-git problem I modified the timestamp directly to set the file a second into the future IIRC [21:14:23] (03CR) 10Hashar: [C: 032] "Works for me locally!!! :D" [integration/config] - 10https://gerrit.wikimedia.org/r/415102 (https://phabricator.wikimedia.org/T187797) (owner: 10Hashar) [21:14:45] https://github.com/wikimedia/operations-debs-git-fat/commit/0e3abb0c5e8b1e4d81470397ec17138c6d24d9e8 [21:14:56] so effectively time.sleep(1) [21:16:07] I *did* come across https://github.com/git/git/blob/master/templates/hooks--fsmonitor-watchman.sample / core.fsmonitor [21:16:10] Last week [21:16:24] Also! Ævar just contributed a patch to it?! 
https://public-inbox.org/git/CACsJy8CJtW3LZ+4Z_06uM4rJO88FXsNvcw+zzVqdFpsQUKrvrg@mail.gmail.com/T/ [21:16:37] Krinkle: ^^^ !!! [21:16:52] (03Merged) 10jenkins-bot: Experimental docker based job for oojs/ui [integration/config] - 10https://gerrit.wikimedia.org/r/415102 (https://phabricator.wikimedia.org/T187797) (owner: 10Hashar) [21:17:16] thcipriani: fsmonitor seems to be used in git-update-index [21:17:21] !log Building docker image releng/npm-test-oojsui:0.1.0 - https://gerrit.wikimedia.org/r/#/c/415102/ [21:17:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:20:31] thcipriani: Also interesting.... core.ignorestat [21:21:05] nice, did not know that was an option :) [21:21:59] 10Continuous-Integration-Config, 10Zuul: cleanup outdated tests/dependencies on jenkins' composer.json - https://phabricator.wikimedia.org/T188438#4007970 (10Legoktm) We could, I'm not sure if it would make a speed difference one way or another. It's cloning mediawiki/vendor, then removing all the non-core/ext... [21:23:10] Also some optimizations we should look at setting: core.{compression,looseCompression,packedGitWindowSize,packedGitLimit,deltaBaseCacheLimit} so `gc` does the Right Thing [21:23:32] Ooooooh! #til about core.hooksPath [21:23:48] You could setup a ~/.githooks/ or something and set your repos to point there [21:24:06] So no more having to curl the commit-msg one from gerrit ;-) [21:24:59] legoktm: Solved our bootstrap-repos-properly problem: https://git-scm.com/docs/git-init#_template_directory [21:25:19] We can create git repos outside of gerrit's admin UI. 
So if we did our own create page (via our plugin) we could do something like that [21:26:55] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T183962#4008031 (10thcipriani) [21:28:32] Fatal error: write EPIPE pff [21:41:23] thcipriani: https://phabricator.wikimedia.org/D989 should be trivial but useful :) [21:44:40] thcipriani: Wait. I wonder....if the .git directory you generate for all of mw-staging is...involved? [21:44:40] Like, are the changes to /that/ tree fucking with our pile of mtimes [21:44:57] Cuz so far we've got 4 (I think?) mtimes that are competing. [21:45:43] (03PS1) 10Hashar: Fix up oojs-ui Docker job and example [integration/config] - 10https://gerrit.wikimedia.org/r/415162 [21:45:57] 1) Git work tree on tin [21:45:57] 2) general stat times [21:45:58] 3) rsync's time [21:45:58] 4) Touch time [21:45:58] 5) The shared git tree thingie you added's times? [21:46:00] Wait, that's 5?! [21:46:14] no_justification: and hhvm stat cache? but maybe it is disabled :D [21:47:10] True, so one of those 5 ^ is getting compared to a prior stat time (of those 5) which may or may not be cached [21:48:21] no_justification: also from a paste incident https://phabricator.wikimedia.org/T134448 [21:48:53] which was due to https://phabricator.wikimedia.org/rOMWC9330dbd38fc85998d8bdea67224ea4063a60d490 [21:49:02] that replaced touch IS.php by a touch of the config dir [21:49:39] end of story: hhvm was not picking up the mtime change on the dir [21:49:56] and we have: modules/hhvm/manifests/init.pp: stat_cache => true, [21:50:05] so "maybe" it is related to whatever issue you are fighting with [21:53:49] hrm, the generated git directory should be disabled for prod if you're talking about the flattened one, but I'm sure that would confound the issue further. 
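Editor's note: the same-second race suspected in the discussion above can be reproduced in isolation. What follows is only a minimal Python sketch, not scap or MediaWiki code. It assumes a cache that revalidates by comparing whole-second mtimes, and the names MtimeCache, write_at and demo are hypothetical. Timestamps are forced with os.utime so the outcome does not depend on real timing.

```python
import os
import tempfile


class MtimeCache:
    """Hypothetical cache that re-reads a file only when its whole-second mtime changes."""

    def __init__(self, path):
        self.path = path
        self.cached_mtime = None
        self.cached_value = None

    def get(self):
        # int() mimics a stat result reported as a Unix timestamp in seconds.
        mtime = int(os.stat(self.path).st_mtime)
        if mtime != self.cached_mtime:
            with open(self.path) as f:
                self.cached_value = f.read()
            self.cached_mtime = mtime
        return self.cached_value


def write_at(path, content, when):
    """Write content, then force the file's atime/mtime to the epoch second `when`."""
    with open(path, "w") as f:
        f.write(content)
    os.utime(path, (when, when))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings.txt")
        write_at(path, "old", 1000)
        cache = MtimeCache(path)
        first = cache.get()            # initial state gets cached
        write_at(path, "half", 2000)   # the sync lands a partially updated state
        second = cache.get()           # a request arrives and caches that state
        write_at(path, "new", 2000)    # final write/touch within the same second
        stale = cache.get()            # mtime looks unchanged, cache is not refreshed
        write_at(path, "new", 2001)    # same content, mtime one second later
        fresh = cache.get()            # the change is finally noticed
        return first, second, stale, fresh
```

In this sketch the third read still returns the half-updated state because both writes share second 2000. Only the write stamped one second later is picked up, which is the effect the git-fat workaround mentioned in the log relied on.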
[21:53:50] 10Continuous-Integration-Config: cleanup outdated tests/dependencies on jenkins' composer.json - https://phabricator.wikimedia.org/T188438#4008124 (10Legoktm) p:05Triage>03Lowest [22:02:06] https://integration.wikimedia.org/ci/job/oojs-ui-npm-run-jenkins-node-6-docker/4/ SUCCESS ! [22:02:16] I am off to bed. Happy hackings! [22:03:33] hash [22:03:34] Whoops [22:03:38] I was gonna say....that was me :p [22:04:20] thcipriani: Hmmmm, touch being after pull but before recompiling wikiversions *definitely* could lead to stale cache. [22:05:15] I feel like caching on mtime is just gonna keep biting us. We should move to a "cache forever unless we invalidate it" model [22:05:19] (scap would always invalidate it) [22:05:55] 10Phabricator, 10Project-Admins, 10NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) - https://phabricator.wikimedia.org/T188436#4008235 (10Jdforrester-WMF) >>! In T188436#4007847, @Legoktm wrote: > Anything Phab is normally the bug tracker for? :) So this proj... [22:07:09] tools-promtheus-01 also fixed now [22:08:17] 10Phabricator, 10Project-Admins, 10NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) - https://phabricator.wikimedia.org/T188436#4007719 (10demon) Or you can just close the project when it's mostly done. Plenty of tasks have now-greyed-out projects attached. 
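Editor's note: the "cache forever unless we invalidate it" model proposed above can be sketched as follows. This is an illustration under assumptions, not the actual scap or wmf-config design: ExplicitCache, its loader callback and the invalidate() hook are hypothetical names, and the deploy tool is assumed to call invalidate() as its last step.

```python
class ExplicitCache:
    """Hypothetical cache that never checks mtimes; it reloads only after invalidate()."""

    def __init__(self, loader):
        self._loader = loader   # callable that computes the expensive value
        self._value = None
        self._valid = False
        self.loads = 0          # counts how often the loader actually ran

    def get(self):
        if not self._valid:
            self._value = self._loader()
            self._valid = True
            self.loads += 1
        return self._value

    def invalidate(self):
        # Assumed to be called by the deploy tool once every file is in place.
        self._valid = False


def demo():
    source = {"setting": "old"}
    cache = ExplicitCache(lambda: dict(source))
    before = cache.get()["setting"]
    source["setting"] = "new"           # files change on disk mid-deploy
    during = cache.get()["setting"]     # still the cached value, no timing guesswork
    cache.invalidate()                  # deploy finished, purge explicitly
    after = cache.get()["setting"]
    return before, during, after, cache.loads
```

The trade-off matches the concern raised in the log: until the purge happens, requests keep seeing the old value even though newer files may already be on disk.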
[22:20:31] (03CR) 10Hashar: [C: 032] Fix up oojs-ui Docker job and example [integration/config] - 10https://gerrit.wikimedia.org/r/415162 (owner: 10Hashar) [22:21:03] no_justification /me add [22:21:05] oh [22:21:12] wait, that meessage was sent to early [22:21:34] anyways i've added syntax highlighting to docker and less in https://gerrit-review.googlesource.com/c/gerrit/+/162815/ :) [22:23:36] (03Merged) 10jenkins-bot: Fix up oojs-ui Docker job and example [integration/config] - 10https://gerrit.wikimedia.org/r/415162 (owner: 10Hashar) [22:48:17] 10Phabricator, 10Project-Admins, 10NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) - https://phabricator.wikimedia.org/T188436#4008458 (10Jdforrester-WMF) Fine. Done: #php70 #php71 #php72 [22:48:50] paladox: Protip....I'd more information in your commit summary about what version of highlight.js you introduced (maybe with a link to a git log or something). The minified file is basically impossible to review [22:49:02] Also: folks other than WMF use those formats, no need to name drop us :) [22:51:44] 10MediaWiki-Releasing, 10PHP 7.1 support, 10MW-1.27-release, 10MW-1.27-release-notes: Make MediaWiki 1.27 (LTS) compatible with PHP 7.1 - https://phabricator.wikimedia.org/T174262#4008481 (10Jdforrester-WMF) [22:55:17] 10Phabricator, 10Project-Admins, 10[DO NOT USE] NewPHP: Replace #newphp project with specific projects for PHP versions (#php7.0, #php7.1, etc.) 
- https://phabricator.wikimedia.org/T188436#4008495 (10Jdforrester-WMF) 05Open>03Resolved a:03Jdforrester-WMF [23:01:02] no_justification: heh [23:01:11] no_justification: upstream wanted use cases [23:01:31] no_justification: also that repo is built from the master branch [23:02:08] The rep is now unmainted so it is basically just adding more langs , no actual changes except Lang’s [23:03:49] no_justification: I had to manually build that file to add those Lang’s [23:04:02] no_justification: Is there a view as to when 1.31-rc.0 will be cut? After wmf.30? [23:04:47] James_F: Nfc. [23:04:50] paladox: Fair 'nuff [23:05:29] no_justification: I would like them to switch to codemirror [23:05:40] But that seems to be a pretty large project [23:06:52] * James_F nods. [23:08:54] Idea that would simplify make-release: just sign the release *email* with GPG, and include the md5/sha1/sha256 sums for the files [23:09:03] cf: https://groups.google.com/d/msg/repo-discuss/NqSp7MJ11Cs/rrgUZwN7AQAJ [23:09:17] (technically they also sign the maven central uploads, but meh) [23:09:44] Serving over https + providing file hashes is plenty to verify a release [23:09:55] But we can sign the e-mail too so nobody thinks it's been faked. [23:10:10] James_F: I made all of the 1.31-wmf.XX blocker tasks that I expected for this series... /me looks [23:10:25] (forgot a .0) [23:10:38] so yeah, after .30 is what I was thinking: https://phabricator.wikimedia.org/project/board/2770/ [23:10:57] https://phabricator.wikimedia.org/project/view/3011/ [23:10:57] which is cut on April 17th [23:11:11] Yeah, I guess that makes sense. [23:11:15] May/Nov is our usual timeline [23:11:21] greg-g: OK. [23:12:03] greg-g: Sharpens the mind with deadlines for some of our bigger in-flight changes, then. 
:-) [23:12:09] cool [23:16:17] no_justification: debian would much prefer tarballs themselves are signed, because that way the only thing you need to verify the authenticity/integrity is the tarball, signature, and public key...not emails, hashes, etc. [23:16:32] Fair enough [23:16:53] We should/could add sha/md5 sums too [23:19:53] If someone has spare cycles, getting make-release/branch.py in final working order would be great ;-) [23:20:48] (spoiler: it's also designed to work like make-wmf-branch so I can kill that outright) [23:22:42] new permalink :) - https://wikitech.wikimedia.org/wiki/Deployments#!/deploycal/current [23:22:47] for "current" [23:22:53] (might take a minute for js cache to roll over) [23:28:21] Krinkle: nice! Now we just need a gadget that makes editing that page less insane [23:29:01] bd808: Hehe, yeah. Does it currently work with VE? If not, using template data for the templates would make it a bit easier at least. [23:29:05] like a form that gives you a list of future SWAT windows and takes your patch, task, and irc nick [23:30:09] VE sees the whole table as on giant #invoke [23:30:16] Hm.. yeah, right now it's not very VE-compatible. Yeah [23:30:44] two problems: unbalanced templates (there is a "start table" template and an "end table" template, instead of a "table" template taking the rest as parameters), and nested templates. [23:30:51] It would be easier if each day were a section [23:30:56] Right [23:31:13] With the main table boilerplate substituted into the page (just simple table open syntax and a few header labels). [23:31:23] Then possibly templates for entries, which would have a nice form to edit in VE. [23:31:39] So that clicking a row in VE would open the simple template dialog which nice parameters. 
[23:32:08] * bd808 tries not to fall into this rabbithole [23:32:41] if the DOM changes too much we'd need to update jouncebot's parser too, but that should be possible [23:32:53] * bd808 really backs away from the edge of the hole [23:34:35] I toyed with the idea of a public GCalendar [23:34:44] But we'd want a way to approve new additions [23:34:46] Or something [23:34:58] Oh well [23:39:47] you'll find T171940 of your interest [23:39:47] T171940: Create an easier way to add/remove/modify patches for SWAT - https://phabricator.wikimedia.org/T171940 [23:43:10] 10Phabricator, 10Release-Engineering-Team (Kanban), 10Phlogiston: Phlogiston reports don't have new data since mid-February - https://phabricator.wikimedia.org/T188149#4008704 (10mmodell) The keys of the task objects, which are not phids but rather based on the actual `id` field in the database. The last one... [23:58:23] paladox: Weird bug. Open a change's diff. Click on your diff preferences. In the font-size box: delete the digits. Box/label disappear. Closing/reopening the preferences doesn't restore it, have to hard-refresh the overall change. [23:58:47] Doesn't affect tab-width box.