[00:46:16] Any reason the train was skipped today, or is it just because the bot pinged Chad and he was out today? (cc greg-g thcipriani|afk )
[01:59:40] RoanKattouw: think he was on vacation or something
[02:00:10] 22:39 < greg-g> eddiegp: sorry, Chad was going to do it but it appears he was delayed in getting back from his vacation :/ It was too late to start it when we realized this (it's mostly a one-person job,
[02:00:14] so we are used to just seeing it happen as all the others continue to work on their other work)
[02:00:17] 22:39 < greg-g> we'll catch up tomorrow
[02:00:20] from earlier in wikimedia-operations
[02:38:03] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<30.00%)
[04:16:21] Yippee, build fixed!
[04:16:21] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #543: FIXED in 20 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/543/
[04:29:31] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[05:09:32] RECOVERY - Puppet errors on deployment-memc05 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:04:20] thanks aude
[06:57:18] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[07:03:07] Beta-Cluster-Infrastructure, Release-Engineering-Team, MediaWiki-Parser, Readers-Web-Backlog (Tracking): Templates rendering as links on beta cluster - https://phabricator.wikimedia.org/T173576#3674626 (Tgr) Do you have a diff? Probably there are subtle differences in how transcluded content insi...
[07:04:24] Continuous-Integration-Infrastructure: Test Failure Unrelated to Patch - https://phabricator.wikimedia.org/T177905#3674627 (Paladox)
[07:04:51] Continuous-Integration-Infrastructure: Test Failure Unrelated to Patch - https://phabricator.wikimedia.org/T177905#3674474 (Paladox) It’s saying java.nio.file.AccessDeniedException: /srv/jenkins-workspace/workspace/composer-package-php70-docker/src/vendor/symfony/yaml/Tag/TaggedValue.php
[07:13:05] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:48:12] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure: Reenable ssh MAC/KEX hardening on beta cluster and integration labs project - https://phabricator.wikimedia.org/T100518#3674683 (hashar)
[07:48:14] Continuous-Integration-Infrastructure, Operations, Patch-For-Review, WorkType-Maintenance: Jenkins master / client ssh connection fails due to missing ssh algorithm - https://phabricator.wikimedia.org/T100509#3674684 (hashar)
[07:48:17] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Jenkins, Patch-For-Review, and 2 others: Jenkins trilead-ssh2 doesn't support our MAC/KEX algorithms - https://phabricator.wikimedia.org/T103351#3674681 (hashar) Open>Resolved Fixed by upstream and proven to work. T...
[07:49:18] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure: Reenable ssh MAC/KEX hardening on beta cluster and integration labs project - https://phabricator.wikimedia.org/T100518#1314596 (hashar) Done for labs via https://gerrit.wikimedia.org/r/#/c/383120/ And for production: https://gerrit.wik...
[07:50:24] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Patch-For-Review: Reenable ssh MAC/KEX hardening on beta cluster and integration labs project - https://phabricator.wikimedia.org/T100518#3674688 (hashar) Open>Resolved a:hashar
[07:51:04] Continuous-Integration-Infrastructure: mw-ext-php70-phan-jessie complains about PHP temp directory not writable to composer - https://phabricator.wikimedia.org/T167969#3674690 (hashar)
[07:54:33] (PS4) Hashar: Template to factor out common maven jobs [integration/config] - https://gerrit.wikimedia.org/r/383134
[07:54:41] (PS4) Hashar: search: use the generic maven job template [integration/config] - https://gerrit.wikimedia.org/r/383132 (owner: Gehel)
[07:55:35] (CR) Hashar: [C: 2] "NOOP in JJB. That is just introducing a new template." [integration/config] - https://gerrit.wikimedia.org/r/383134 (owner: Hashar)
[07:56:52] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Jenkins, Patch-For-Review, and 2 others: Jenkins trilead-ssh2 doesn't support our MAC/KEX algorithms - https://phabricator.wikimedia.org/T103351#3674691 (Paladox) Your welcome :). Also m1clark implemented the support. I hel...
[07:57:42] (Merged) jenkins-bot: Template to factor out common maven jobs [integration/config] - https://gerrit.wikimedia.org/r/383134 (owner: Hashar)
[07:58:23] (CR) Hashar: [C: 2] search: use the generic maven job template [integration/config] - https://gerrit.wikimedia.org/r/383132 (owner: Gehel)
[07:58:31] gehel: I am merging the changes from monday :)
[07:58:32] Continuous-Integration-Infrastructure (shipyard), Operations, User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3674692 (Joe) >>! In T177276#3673007, @Legoktm wrote: >>>! In T177276#3671190, @Joe wrote: >> * There is no need for cache busters as...
[07:58:34] mavennnnn
[07:59:31] (Merged) jenkins-bot: search: use the generic maven job template [integration/config] - https://gerrit.wikimedia.org/r/383132 (owner: Gehel)
[07:59:47] hashar: thanks!
[08:04:22] (PS11) Hashar: Add jenkins jobs for discovery-parent-pom and -maven-tool-configs [integration/config] - https://gerrit.wikimedia.org/r/383094 (owner: Gehel)
[08:04:27] gehel: and I rebased your change for the other repos
[08:04:31] adding the config in Zuul :)
[08:04:46] * gehel is pretending to understand what all that means...
[08:05:18] ;]]]
[08:05:47] gehel: top of https://gerrit.wikimedia.org/r/#/c/383094/10..11/zuul/layout.yaml is the mapping between a Gerrit project and the jobs being triggered on some gerrit actions
[08:06:04] test: reflects a new patchset being uploaded, that would trigger the job discovery-parent-pom-maven
[08:07:20] PROBLEM - Puppet errors on deployent-cassandra3-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:08:07] (CR) Hashar: [C: 2] Add jenkins jobs for discovery-parent-pom and -maven-tool-configs [integration/config] - https://gerrit.wikimedia.org/r/383094 (owner: Gehel)
[08:08:34] hashar: yep, makes some kind of sense...
[08:09:17] (Merged) jenkins-bot: Add jenkins jobs for discovery-parent-pom and -maven-tool-configs [integration/config] - https://gerrit.wikimedia.org/r/383094 (owner: Gehel)
[08:09:56] PROBLEM - Puppet errors on deployment-cassandra3-02 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0]
[08:10:16] hashar: I've already spent more time than I should have on this, so I'm not going to look into making analytics use the generic maven job, but feel free to have fun!
[08:10:22] and I sent a couple of dummy changes to validate the jobs https://gerrit.wikimedia.org/r/383512 https://gerrit.wikimedia.org/r/383513
[08:10:38] gehel: no worries. Thanks for the patch and the idea of refactoring those jobs!!
[08:10:52] It has been fun!
[08:11:12] discovery-maven-tool-configs works it reported a SUCCESS on https://gerrit.wikimedia.org/r/#/c/383512/
[08:12:57] kool! Thanks!
[08:17:17] (PS2) Hashar: dib: rely on jenkins::comon Jenkins agent jre [integration/config] - https://gerrit.wikimedia.org/r/382425 (https://phabricator.wikimedia.org/T162828)
[08:17:58] (CR) Hashar: [C: 2] dib: rely on jenkins::comon Jenkins agent jre [integration/config] - https://gerrit.wikimedia.org/r/382425 (https://phabricator.wikimedia.org/T162828) (owner: Hashar)
[08:19:39] (Merged) jenkins-bot: dib: rely on jenkins::comon Jenkins agent jre [integration/config] - https://gerrit.wikimedia.org/r/382425 (https://phabricator.wikimedia.org/T162828) (owner: Hashar)
[08:19:44] (PS1) Hashar: dib: remove puppet ordering hack [integration/config] - https://gerrit.wikimedia.org/r/383514
[08:20:28] (CR) Hashar: [C: 2] dib: remove puppet ordering hack [integration/config] - https://gerrit.wikimedia.org/r/383514 (owner: Hashar)
[08:21:38] (Merged) jenkins-bot: dib: remove puppet ordering hack [integration/config] - https://gerrit.wikimedia.org/r/383514 (owner: Hashar)
[08:22:48] !log nodepool: refreshing Jessie snapshot after some puppet patches got merged
[08:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:26:24] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban): mediawiki-core-phpcs-docker jobs running on integration-slave-docker-1001 are running significantly slower than other slaves - https://phabricator.wikimedia.org/T177039#3674759 (hashar) Open>Resolved Looked at sl...
[08:33:28] !log Image snapshot-ci-jessie-1507710117 in wmflabs-eqiad is ready
[08:33:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:41:55] PROBLEM - Host deployent-cassandra3-01 is DOWN: CRITICAL - Host Unreachable (10.68.19.232)
[08:59:16] Continuous-Integration-Infrastructure (shipyard): Test Failure Unrelated to Patch - https://phabricator.wikimedia.org/T177905#3674807 (hashar) Ditto on eg https://gerrit.wikimedia.org/r/#/c/359923/ I am reverting 49e7ef9f9442ace863fab786a9fbcd9bcf93e032 lets follow up on T144961
[09:00:14] Continuous-Integration-Infrastructure (shipyard): Test Failure Unrelated to Patch - https://phabricator.wikimedia.org/T177905#3674812 (hashar)
[09:02:04] (PS1) Hashar: Revert "Move composer-package-php70-docker out of experimental" [integration/config] - https://gerrit.wikimedia.org/r/383517 (https://phabricator.wikimedia.org/T144961)
[09:02:12] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:02:24] (CR) Hashar: [C: 2] Revert "Move composer-package-php70-docker out of experimental" [integration/config] - https://gerrit.wikimedia.org/r/383517 (https://phabricator.wikimedia.org/T144961) (owner: Hashar)
[09:03:21] Continuous-Integration-Infrastructure (shipyard), Patch-For-Review: Test Failure Unrelated to Patch - https://phabricator.wikimedia.org/T177905#3674819 (hashar) Open>Resolved a:hashar I have moved composer-package-php70-docker back to the experimental pipeline.
[09:03:33] (Merged) jenkins-bot: Revert "Move composer-package-php70-docker out of experimental" [integration/config] - https://gerrit.wikimedia.org/r/383517 (https://phabricator.wikimedia.org/T144961) (owner: Hashar)
[09:05:10] PROBLEM - Puppet errors on deployment-cassandra3-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:07:04] (CR) Hashar: "recheck" [tools/codesniffer] - https://gerrit.wikimedia.org/r/383168 (owner: Umherirrender)
[09:14:12] Continuous-Integration-Infrastructure, Tracking: PHP7 support in CI (tracking) - https://phabricator.wikimedia.org/T144964#3674868 (hashar)
[09:14:14] Continuous-Integration-Config, Continuous-Integration-Infrastructure (shipyard), Patch-For-Review: Create composer-php70 job - https://phabricator.wikimedia.org/T144961#3674863 (hashar) Resolved>Open I have moved `composer-package-php70-docker` back to the experimental pipeline due to T177905...
[10:41:07] Browser-Tests-Infrastructure, releng-201718-q1, MediaWiki-General-or-Unknown, Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3675010 (zeljkofilipin)
[10:59:07] Release-Engineering-Team (Kanban), Navigation-Popups: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3675046 (zeljkofilipin)
[10:59:50] Release-Engineering-Team (Kanban), Navigation-Popups, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3675046 (zeljkofilipin) p:Triage>Normal
[11:00:39] Release-Engineering-Team (Kanban), Page-Previews, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3675046 (zeljkofilipin)
[11:06:33] Release-Engineering-Team (Kanban), Page-Previews, Patch-For-Review, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3675075 (zeljkofilipin) I have created a test Jenkins job, [[ https://integration.wikimedia.org/ci/view/Seleni...
[11:37:57] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 392 bytes in 0.005 second response time
[11:37:58] Release-Engineering-Team (Kanban), Page-Previews, Patch-For-Review, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3675148 (zeljkofilipin) @Jdlrobson I am not familiar with Selenium tests in Popups. Please let me know if you...
[11:38:17] Release-Engineering-Team (Kanban), Page-Previews, Patch-For-Review, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3675149 (zeljkofilipin) a:zeljkofilipin>Jdlrobson
[12:10:39] Release-Engineering-Team (Watching / External), Contributors-Team, MobileFrontend, Operations, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3675211 (MoritzMuehlenhoff) >>! In T176637#3672117, @jkroll wrote: > That was...
[12:22:52] Yippee, build fixed!
[12:22:53] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #552: FIXED in 51 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/552/
[13:05:07] (PS1) Hashar: composer-package: git clone inside the container [integration/config] - https://gerrit.wikimedia.org/r/383560 (https://phabricator.wikimedia.org/T144961)
[13:06:21] (CR) jerkins-bot: [V: -1] composer-package: git clone inside the container [integration/config] - https://gerrit.wikimedia.org/r/383560 (https://phabricator.wikimedia.org/T144961) (owner: Hashar)
[13:06:23] (CR) Hashar: "Untested but that would do the git operations inside the container. Gotta check whether the Jenkins job does the git clone as well (it mig" [integration/config] - https://gerrit.wikimedia.org/r/383560 (https://phabricator.wikimedia.org/T144961) (owner: Hashar)
[13:07:50] (PS2) Hashar: composer-package: git clone inside the container [integration/config] - https://gerrit.wikimedia.org/r/383560 (https://phabricator.wikimedia.org/T144961)
[13:34:43] Release-Engineering-Team (Kanban), Readers-Web-Backlog, RelatedArticles, Browser-Tests, and 4 others: Automated browser tests cannot create pages on the Beta Cluster as anonymous user in RelatedArticles tests - https://phabricator.wikimedia.org/T176315#3621074 (Niedzielski) This issue is still pr...
[13:35:38] Release-Engineering-Team (Kanban), Page-Previews, Readers-Web-Backlog, Patch-For-Review, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3675613 (Niedzielski)
[13:58:51] Release-Engineering-Team (Watching / External), Discovery, Discovery-Analysis, Operations: Setup a mirror for R language dependencies (CRAN) - https://phabricator.wikimedia.org/T170995#3675668 (hashar)
[14:00:05] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team: Remove integration-slave-docker-1705.integration.eqiad.wmflabs - https://phabricator.wikimedia.org/T177743#3675678 (hashar) Thanks :)
[14:05:26] Release-Engineering-Team (Kanban), Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499#3675698 (Fjalapeno) @mmodell for iOS there is a tag: #ios-app-bugs Android has a tag: #android-app-bugs I see there is also a generic #bug-report tag. Would these work?
[14:38:52] (CR) Zfilipin: Add Jenkins job for browser tests (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/375377 (https://phabricator.wikimedia.org/T160238) (owner: Jdlrobson)
[14:44:24] hasharAway: any idea of https://integration.wikimedia.org/ci/job/operations-puppet-tests-docker/7056/console is a failure of jenkins or the patch? I think maybe it's a false negative
[14:47:56] (CR) Zfilipin: Add Jenkins job for browser tests (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/375377 (https://phabricator.wikimedia.org/T160238) (owner: Jdlrobson)
[14:49:11] (PS4) Zfilipin: Add Jenkins job for browser tests [integration/config] - https://gerrit.wikimedia.org/r/375377 (https://phabricator.wikimedia.org/T160238) (owner: Jdlrobson)
[14:49:18] (CR) jerkins-bot: [V: -1] Add Jenkins job for browser tests [integration/config] - https://gerrit.wikimedia.org/r/375377 (https://phabricator.wikimedia.org/T160238) (owner: Jdlrobson)
[14:52:11] Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-09-05 (1.30.0-wmf.17)), MW-1.31-release-notes (WMF-deploy-2017-10-03 (1.31.0-wmf.2)), and 2 others: mwext-ruby-jes... - https://phabricator.wikimedia.org/T164479#3675874
[14:53:09] Release-Engineering-Team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-09-05 (1.30.0-wmf.17)), MW-1.31-release-notes (WMF-deploy-2017-10-03 (1.31.0-wmf.2)), Patch-For-Review, User-zeljkofilipin: mwext-ruby-jessie Jenkins job runs all Ruby tas... - https://phabricator.wikimedia.org/T164479#3234955
[15:01:50] I must have forgotten how to read the MediaWiki roadmap, but I’m in https://www.mediawiki.org/wiki/MediaWiki_1.31/Roadmap and from what I see there should be a -wmf3 branch, which is released to mediawiki.org and test.wp, etc. I don’t even see the corresponding git branch.
[15:02:11] Can anyone here confirm that the roadmap is incorrect, or point me to the wmf3 git branch?
[15:08:48] awight: the roadmap is incorrect. wmf.3 will be cut and pushed to group0 and group1 today.
[15:08:50] * thcipriani edits
[15:10:11] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:10:12] roadmap is up-to-date now.
[15:15:09] thcipriani: TY!
[15:21:51] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:22:05] thcipriani: Would you remind me, is there a way I can manually branch an extension to prevent the train from grabbing some sloppy stuff I merged to master yesterday, for example?
[15:22:59] Release-Engineering-Team, Mathoid, Release Pipeline: Add experimental blubber test build/run to mathoid jenkins test pipeline - https://phabricator.wikimedia.org/T177954#3675976 (thcipriani)
[15:24:10] thcipriani: nvm, I reverted for now, which is the better solution anyway.
[15:24:48] awight: hrm...ok that works, I was just staring hard at our branching code: https://github.com/wikimedia/mediawiki-tools-release/blob/master/make-wmf-branch/MakeWmfBranch.php
[15:25:00] lol seriously
[15:25:01] sorry for the inconvenience :(
[15:25:35] not at all an inconvenience. It’s been a while since I worked on an extension included in the train, is all.
[15:26:09] tl;dr, I merged some code yesterday which worked great on my local and in tests, but fails mysteriously (actually even worse: works mysteriously) on beta. The usual.
[15:26:25] Actually, hey—who would I bother with a logstash-beta question?
[15:29:38] I don't know who the lucky person to have last touched it is :) What's the question? I might be able to help or at least point in a somewhat sane direction.
[15:32:20] <_joe_> !log removing deployment-pdf01, T177931
[15:32:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:32:25] T177931: Decommission OCG from production - https://phabricator.wikimedia.org/T177931
[15:33:03] nice :)
[15:33:18] (yeah, fatality!)
[15:34:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%)
[15:34:25] thcipriani: I should have seen a bunch of log messages coming from a MediaWiki extension deployed to beta. I confirmed the code was deployed by checking Special:Version. Its logger is instantiated like LoggerFactory::getInstance( 'ORES' ), and I was logging at INFO level. Didn’t see any messages.
[15:34:33] PROBLEM - Host deployment-pdf01 is DOWN: CRITICAL - Host Unreachable (10.68.16.73)
[15:35:08] OTOH, I also wasn’t able to prove that the new code was running at *all*, in which case there would have been no log messages.
[15:35:49] heh, yeah that might do it :)
[15:37:19] if you want to file a task and add the beta-cluster-infrastructure tag that'd probably get some eyes/provide a common place to dig in and post what we find.
[15:37:35] I can’t imagine how Special:Version would tell me one version of code yet would be running the old version, however.
[15:37:47] thcipriani: kk, thanks for the tag
[15:39:11] yeah, Special:Version if anything may give you an older tag than is actually running because security things, but not a newer tag than is actually running. Unless there is some bug in the disclosable head code.
[15:39:43] The remaining hypothesis is that I’ve completely lost my marbles.
[15:39:48] * awight looks under the couch
[15:40:03] :)
[15:51:11] PROBLEM - Free space - all mounts on deployment-mediawiki05 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki05.diskspace.root.byte_percentfree (<11.11%)
[15:51:44] Release-Engineering-Team, Release Pipeline: Define pipeline failure developer feedback - https://phabricator.wikimedia.org/T177868#3676092 (thcipriani)
[15:56:09] RECOVERY - Free space - all mounts on deployment-mediawiki05 is OK: OK: All targets OK
[15:58:16] Release-Engineering-Team, Release Pipeline: Pipeline image build cleanup - https://phabricator.wikimedia.org/T177867#3676109 (thcipriani) >>! In T177867#3673834, @dduvall wrote: > By default, these commands will only delete "dangling" images which will not include tagged images such as the ones resulting...
[16:09:42] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Jenkins: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676158 (hashar)
[16:10:21] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Jenkins: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676177 (hashar)
[16:10:38] hasharAway hi, we should try to install the blue ocean plugin :).
[16:15:34] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations, Jenkins: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676183 (hashar) Installed on contint1001/contint2001 from http://pkg.jenkins-ci.org/debian-stable/binary/jenkins...
[16:17:17] Yippee, build fixed!
[16:17:17] Project beta-scap-eqiad build #177071: FIXED in 3 min 34 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/177071/
[16:34:32] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), Deployments, WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#3676236 (mmodell) So I've done some experimenting with various ways to handle submodules now that we have a rela...
[16:39:54] Release-Engineering-Team (Kanban), Deployments: Automate the recurring management of wikitech:Deployments and phab:#train_deployments - https://phabricator.wikimedia.org/T114488#1697362 (mmodell) p:High>Normal This has unfortunately been on the back burner as I've been busy with scap and other ph...
[16:43:48] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations, Jenkins: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676158 (hashar) p:Triage>High
[16:46:06] Yippee, build fixed!
[16:46:07] Project beta-scap-eqiad build #177074: FIXED in 2 min 28 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/177074/
[16:51:20] Release-Engineering-Team (Kanban), User-greg: Fill in quarterly review slides for our stuff - https://phabricator.wikimedia.org/T176820#3676278 (greg) Open>Resolved
[16:56:17] Gerrit, Release-Engineering-Team: Update gerrit to 2.15 - https://phabricator.wikimedia.org/T177201#3676308 (Paladox)
[16:57:04] Gerrit, Release-Engineering-Team: Update gerrit to 2.15 - https://phabricator.wikimedia.org/T177201#3650281 (Paladox)
[16:59:32] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), scap2: Eliminate symlinks in mediawiki-config (as much as possible) - https://phabricator.wikimedia.org/T126306#3676335 (mmodell)
[17:03:51] (PS11) Umherirrender: Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790
[17:18:33] Release-Engineering-Team, Release Pipeline: Pipeline image build cleanup - https://phabricator.wikimedia.org/T177867#3676418 (dduvall) >>! In T177867#3676109, @thcipriani wrote: >>>! In T177867#3673834, @dduvall wrote: >> By default, these commands will only delete "dangling" images which will not includ...
[17:27:05] (CR) Mholloway: [C: 1] Set file filter for apps-android-wikipedia-tox-jessie [integration/config] - https://gerrit.wikimedia.org/r/382837 (https://phabricator.wikimedia.org/T177016) (owner: Legoktm)
[17:38:19] (PS1) Mholloway: Bump Android periodic test emulator to android-26 [integration/config] - https://gerrit.wikimedia.org/r/383619
[17:39:13] (PS5) Jdlrobson: Add Jenkins job for browser tests [integration/config] - https://gerrit.wikimedia.org/r/375377 (https://phabricator.wikimedia.org/T160238)
[17:39:20] (CR) Jdlrobson: "Hopefully my new commit message clarifies things." (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/375377 (https://phabricator.wikimedia.org/T160238) (owner: Jdlrobson)
[17:39:30] (CR) Jdlrobson: Add Jenkins job for browser tests (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/375377 (https://phabricator.wikimedia.org/T160238) (owner: Jdlrobson)
[17:50:54] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations, Jenkins: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676542 (MoritzMuehlenhoff) I've uploaded 2.73.2 to apt.wikimedia.org
[18:13:38] any idea why we have hhvm installed on gerrit2001??
[18:13:44] what is the process of updating /var/lib/git/labs/private on puppetmaster in deployment-prep?
[18:13:54] TL;DR whatever the process, i think i just violated it
[18:14:10] it's kind of a mystery, since it's not in puppet, not in root bash_history, not in apt/history.log.. nothing at all.. yet it is installed and it isnt on cobalt
[18:14:52] urandom: o/
[18:15:04] I can probably help for the deployment-prep puppet master :D
[18:15:23] hashar: i added some key material to labs/private.git
[18:15:46] hashar: and then i merged it, but i'm thinking maybe that was wrong :)
[18:15:49] it is cloned on the puppet master in /var/lib/git/labs/private/
[18:15:58] yeah
[18:15:59] there is then a cronjob that automagically rebase it from time to time
[18:16:06] oh i see
[18:16:11] */10 * * * * /usr/local/bin/git-sync-upstream >>/var/log/git-sync-upstream.log 2>&1
[18:16:14] so every 10 minutes
[18:16:21] it runs that script above that is somewhere in puppet.git
[18:16:23] /o\
[18:16:29] and spurts stuff to /var/log/git-sync-upstream.log
[18:16:39] usually tailling that file show up the error
[18:17:00] looks like it breaks on rebasing operations/puppet on one of my patches
[18:17:03] I am gonna rebase
[18:17:31] hashar: i did a merge there
[18:18:05] hashar: maybe that needs to be undone
[18:19:01] !log beta: rebased puppet master due to a conflict with b3c6968b3c
[18:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:19:24] and I am running the cron manually: /usr/local/bin/git-sync-upstream >>/var/log/git-sync-upstream.log 2>&1
[18:19:41] From https://gerrit.wikimedia.org/r/p/labs/private
[18:19:42] = [up to date] master -> origin/master
[18:19:42] Up-to-date: /var/lib/git/labs/private
[18:19:53] oahrh
[18:19:57] urandom: yeah indeed there is some merge :)
[18:20:15] 'o\
[18:20:24] fixed by using "git rebase" :)
[18:20:32] hashar: sorry!
[18:20:36] and thanks
[18:20:37] no worries!
[18:20:56] it is always better to ping about something than to hide it under the carpet
[18:21:15] indeed, but better still to ping *before* :)
[18:21:25] urandom: do note that labs/private.git is readable by anyone having an account on labs
[18:21:29] so that is hmm public
[18:21:31] right
[18:21:43] we tend to stick local commits directly on the puppetmaster
[18:22:05] hashar: ok; why is that?
[18:22:23] oh, because it's the *repo* is public?
[18:22:29] yeah it is semi public
[18:22:32] to obscure it
[18:22:41] well you just need a labs account to access to the repo so that is almost public :]
[18:22:47] labs/private is even readable by anyone. period. no account needed
[18:22:52] oh
[18:23:18] yeah, that's what i was asking, labs/private.git in gerrit is viewable by anyone
[18:23:33] mutante: while you are around. Where you involved in setting up the releases1001 / releases2001 hosts by any chance ?
[18:23:38] /var/lib/git/labs/private is viewable by almost everyone :)
[18:24:04] not sure how much security it gained by keeping it out of the former (in my case)
[18:24:12] urandom: yes, it's known and expected, the name 'private' is misleadiing
[18:24:25] it is not about security at all
[18:24:33] it's just a reference to the _real_ private repo
[18:24:44] and the data in it is all fake passwords and keys
[18:24:55] mutante: i think hashar was saying that local commits are made on puppetmaster to avoid having secrets in labs/private on gerrit
[18:24:55] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:24:58] hashar: yes, i was
[18:25:11] yup
[18:25:34] frankly i dont understand why beta has a separate master
[18:25:34] right; so local commits are made so as to be slightly less public :)
[18:25:36] so if there is a password you dont want to make public, dont send it to labs/private but keep it as a local hack on deployment-puppetmaster
[18:25:37] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2), Deployments, WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#3676632 (Halfak) Awesome to see some progress. I'd like to note {T171758} since it's related and would easily a...
[18:25:43] urandom: 5/5 :)
[18:25:52] hashar: i think we are ok with what i committed
[18:25:53] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:26:45] mutante: would you mind upgrading jenkins on releases1001/releases1002 please? It got a security release earlier today. I have already upgraded contint1001/contint2001
[18:26:48] what kind of passwords do you need to hide?
[18:26:55] mutante: and it is already on apt.wm.o ( https://phabricator.wikimedia.org/T177962#3676542 ) :]
[18:27:37] hashar: yea, i can do that. maybe in half an hour or so?
[18:28:34] mutante: yeah that is not that urgent for the releases hosts. I guess that can wait post lunch but would be nice to have it done today though :D
[18:28:37] (CR) Umherirrender: [C: -1] "Breaks on" [tools/codesniffer] - https://gerrit.wikimedia.org/r/383184 (owner: Umherirrender)
[18:28:42] hashar: also, i have news for you. https://gerrit.wikimedia.org/r/#/c/312523/ is merged, so you can now do the things listed on https://gerrit.wikimedia.org/r/#/c/330412/4
[18:28:56] hashar: yes, ok, today
[18:29:35] mutante: I have noticed ! thank you :)
[18:33:26] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[18:39:22] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[18:40:19] mutante: I have nuked /mnt from the puppet compiler instance. So you can safely merge https://gerrit.wikimedia.org/r/#/c/330412/ now :)
[18:40:30] mutante: and I will do the rest of the needed steps / verify :]
[18:52:22] is deployment-tin the right place to kick off deployments in deployment-prep?
[18:52:30] urandom: yes
[18:54:55] urandom: Depending on what you want to deploy.. You don't actually have to do anything
[18:55:53] *magic*
[18:59:57] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:00:25] * urandom cues a gif
[19:01:05] * urandom sighs
[19:01:40] Exiting; no certificate found and waitforcert is disabled <--- this happened earlier and mobrovac fixed it, are there docs i could be pointed to?
[19:01:47] i guess the cert needs signing?
[19:02:19] urandom: usually but also note it will show the same if not run as root
[19:02:37] i ran puppet as root
[19:03:01] if i didn't know better i'd say it was puppet's purpose to paint my terminal in red
[19:08:24] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[19:09:20] hashar: 330412 is merged now
[19:10:11] RECOVERY - Puppet errors on deployment-cassandra3-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:10:28] (the bot doesn't tell us, was it moved to a 'bot-channel'? heh)
[19:13:25] mutante: danke!
[19:23:51] PROBLEM - jenkins_service_running on releases2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[19:24:43] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Patch-For-Review, Technical-Debt: Migrate CI labs slaves to use /srv instead of /mnt - https://phabricator.wikimedia.org/T146381#3677060 (hashar) Open>Resolved Done! Thank you @Dzahn for the puppet merges.
[19:25:49] ACKNOWLEDGEMENT - jenkins_service_running on releases2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war daniel_zahn upgrade in progress
[19:27:51] RECOVERY - jenkins_service_running on releases2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[19:28:57] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations, Jenkins: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3677098 (hashar) Open>Resolved 21:27:51 <@Dzahn> !log releases2001 - upgraded jenkins to 2.73.2, kept exi...
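The jenkins_service_running alerts above come from a process-count check. Going by the messages, the service goes CRITICAL while no process arguments match the regex (Jenkins stopped for the upgrade) and returns to OK once exactly one does. The output format resembles the monitoring-plugins `check_procs` check with its `--ereg-argument-array` option. What follows is a minimal Python sketch of that logic only. It assumes an allowed range of exactly one matching process and uses Python's `re.search` in place of the plugin's own regex engine. `check_procs` here is a hypothetical helper, not the real plugin, and the sample command lines are illustrative.

```python
import re
from typing import Iterable

# Regex copied verbatim from the alert text in the log.
JENKINS_ARGS_RE = r"^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war"


def check_procs(
    cmdlines: Iterable[str],
    pattern: str,
    min_count: int = 1,
    max_count: int = 1,
) -> str:
    """Hypothetical process-count check (a sketch, not the Icinga plugin).

    Counts command lines matching `pattern` and reports OK when the count
    falls inside [min_count, max_count], CRITICAL otherwise.
    """
    regex = re.compile(pattern)
    count = sum(1 for cmd in cmdlines if regex.search(cmd))
    state = "OK" if min_count <= count <= max_count else "CRITICAL"
    noun = "process" if count == 1 else "processes"
    return f"PROCS {state}: {count} {noun} with regex args {pattern}"


# While Jenkins is stopped for the upgrade, nothing matches:
print(check_procs(["/usr/sbin/sshd -D"], JENKINS_ARGS_RE))
# After the restart, one java process matches:
print(check_procs(
    ["/usr/sbin/sshd -D",
     "/usr/bin/java -Djava.awt.headless=true -jar /usr/share/jenkins/jenkins.war"],
    JENKINS_ARGS_RE,
))
```

The two calls print the CRITICAL and then the OK line, matching the alert and recovery seen on releases2001. The ACKNOWLEDGEMENT in between is an operator action and is not modelled here.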
[19:29:58] RECOVERY - Puppet errors on deployment-cassandra3-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:59:20] bah
[19:59:25] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[19:59:26] the jobrunner is out of date on beta
[19:59:45] !log deployment-prep: deploying jobrunner to catchup with changes.
[19:59:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:02:51] Release-Engineering-Team (Kanban), MediaWiki-Search, Patch-For-Review: Use innodb engine for searchindex table in mysql - https://phabricator.wikimedia.org/T107875#3677288 (Bawolff) Resolved>Open >>! In T107875#3668064, @mmodell wrote: > This is ancient history now. Its not. Issue is still p...
[20:04:01] Release-Engineering-Team (Kanban), MediaWiki-Search, Patch-For-Review: Use innodb engine for searchindex table in mysql - https://phabricator.wikimedia.org/T107875#3677292 (mmodell) a:mmodell>None
[20:09:54] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations, Jenkins: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3677334 (Dzahn) 19:20 mutante: apt: reprepro copy stretch-wikimedia jessie-wikimedia jenkins
[20:10:50] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[20:20:48] Release-Engineering-Team (Kanban), JobRunner-Service, Operations, Beta-Cluster-reproducible, Patch-For-Review: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044#3677405 (hashar) stalled>Resolved Solved. All kudos/credits...
[20:25:48] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[20:30:59] Release-Engineering-Team (Kanban), Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499#3677453 (mmodell) @Fjalapeno Yes but there obviously will be some tasks which aren't covered by the tags. The generic #bug-report tag is archived and no longer used afaik.
[20:38:52] PROBLEM - Free space - all mounts on integration-slave-jessie-1004 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1004.diskspace._srv.byte_percentfree (<40.00%)
[20:47:09] Continuous-Integration-Infrastructure: automatically build and commit mediawiki/vendor (composer) - https://phabricator.wikimedia.org/T101123#3677521 (Krinkle)
[20:51:20] Continuous-Integration-Config, MediaWiki-Core-Tests, MediaWiki-Vendor: Ensure with test that composer.json matches between mediawiki/vendor and mediawiki/core - https://phabricator.wikimedia.org/T113360#3677536 (Krinkle) p:Triage>High
[20:52:03] Continuous-Integration-Config, MediaWiki-Core-Tests, MediaWiki-Vendor: Ensure with test that composer.json matches between mediawiki/vendor and mediawiki/core - https://phabricator.wikimedia.org/T113360#1662899 (Krinkle) Rephrasing this to be about the original problem. Feel free to close if such tes...
[20:52:31] Continuous-Integration-Infrastructure: automatically build and commit mediawiki/vendor (composer) - https://phabricator.wikimedia.org/T101123#3677544 (Krinkle) Open>declined Closing per T105638.
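The "Puppet errors" and "Mediawiki Error Rate" alerts report the share of recent Graphite datapoints above a threshold rather than a single reading. The sketch below reproduces the wording of those messages under stated assumptions, not from the real plugin's source. It assumes a point counts only when it is strictly greater than the threshold and that null points are skipped. It also assumes the check fires once that share reaches a configurable percentage, set to 1% here to match the "Less than 1.00%" recovery text. Finally, it assumes the OK message quotes the warning threshold, which is what the Mediawiki Error Rate alert ([10.0]) and its recovery ([1.0]) suggest. `check_threshold` is a hypothetical helper.

```python
from typing import Optional, Sequence


def check_threshold(
    datapoints: Sequence[Optional[float]],
    warning: float,
    critical: float,
    percentage: float = 1.0,
) -> str:
    """Hypothetical Graphite threshold check (a sketch, not the real plugin).

    A datapoint is "above" a threshold when strictly greater than it; the
    check fires when the share of such points reaches `percentage` percent.
    """
    # Assumption: empty Graphite buckets arrive as None and are skipped.
    points = [p for p in datapoints if p is not None]
    if not points:
        return "UNKNOWN: no valid datapoints"

    def share_above(threshold: float) -> float:
        # Percentage of datapoints strictly above the threshold.
        return 100.0 * sum(1 for p in points if p > threshold) / len(points)

    crit = share_above(critical)
    if crit >= percentage:
        return f"CRITICAL: {crit:.2f}% of data above the critical threshold [{critical}]"
    warn = share_above(warning)
    if warn >= percentage:
        return f"WARNING: {warn:.2f}% of data above the warning threshold [{warning}]"
    # Assumption: the OK message quotes the warning threshold.
    return f"OK: Less than {percentage:.2f}% above the threshold [{warning}]"


# Three of five samples above a 0.0 threshold:
print(check_threshold([0, 0, 3, 5, 2], warning=0.0, critical=0.0))
# CRITICAL: 60.00% of data above the critical threshold [0.0]
```

With both thresholds at 0.0, as in the Puppet errors checks, any non-zero sample counts. This is consistent with hosts flapping between 60% or 22% CRITICAL and "Less than 1.00%" OK as individual Puppet runs fail and recover.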
[20:52:58] Continuous-Integration-Infrastructure: automatically build and commit mediawiki/vendor (composer) - https://phabricator.wikimedia.org/T101123#3677558 (Krinkle)
[20:53:01] Continuous-Integration-Infrastructure, Patch-For-Review: Fetch dependencies using composer instead of cloning mediawiki/vendor for non-wmf branches - https://phabricator.wikimedia.org/T90303#3677557 (Krinkle)
[20:53:16] Continuous-Integration-Infrastructure: automatically build and commit mediawiki/vendor (composer) - https://phabricator.wikimedia.org/T101123#1330244 (Krinkle)
[20:53:18] Continuous-Integration-Config, MediaWiki-Core-Tests, MediaWiki-Vendor: Ensure with test that composer.json matches between mediawiki/vendor and mediawiki/core - https://phabricator.wikimedia.org/T113360#3677559 (Krinkle)
[20:54:52] hi, just wanted to let releng know, someone generated a ssh key under deployment-puppetmaster and uploaded it to gerrit.new.wmflabs.org/r/ under the admin's account. Just wanted to make sure that was ok. As i did not use it nor tend to use it. But everyone has access to the account.
[20:54:53] "puppet@deployment-puppetmaster"
[21:03:05] Continuous-Integration-Infrastructure, Nodepool: Increase Jenkins/Nodepool capacity - https://phabricator.wikimedia.org/T173047#3677593 (hashar) Open>declined For some reason RabbitMQ deadlocks and bring down all OpenStack operations. When the rate of requests goes down, it eventually recover by...
[22:01:31] Continuous-Integration-Infrastructure (shipyard): Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3666829 (hashar) https://wiki.jenkins.io/display/JENKINS/Building+a+software+project has a list of some Jenkins built-in variables. The way variables...
[22:11:15] Release-Engineering-Team (Kanban), Page-Previews, Readers-Web-Backlog, Patch-For-Review, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3677815 (Jdlrobson) >>! In T177924#3675075, @zeljkofilipin wrote: > I have created a...
[22:11:30] Release-Engineering-Team (Kanban), Page-Previews, Readers-Web-Backlog, Patch-For-Review, User-zeljkofilipin: Run Popups Selenium tests daily targeting beta cluster - https://phabricator.wikimedia.org/T177924#3677816 (Jdlrobson) a:Jdlrobson>zeljkofilipin
[22:49:33] Beta-Cluster-Infrastructure, Release-Engineering-Team, MediaWiki-Parser, Parsing-Team, Readers-Web-Backlog (Tracking): Templates rendering as links on beta cluster - https://phabricator.wikimedia.org/T173576#3677975 (Jdlrobson) Diff is too messy to be useful, but the main visual differences a...
[22:51:25] Release-Engineering-Team (Kanban), Readers-Web-Backlog, RelatedArticles, Browser-Tests, and 4 others: Automated browser tests cannot create pages on the Beta Cluster as anonymous user in RelatedArticles tests - https://phabricator.wikimedia.org/T176315#3677990 (Jdlrobson) @zeljkofilipin I've advi...
[22:52:21] Hmm, I’m on tin-beta and am seeing timeouts on just one of my git-ssh.wmo submodules
[22:55:54] (Restored) Mholloway: Provide Android SDK location as an argument to non-periodic test scripts [integration/config] - https://gerrit.wikimedia.org/r/368238 (https://phabricator.wikimedia.org/T171811) (owner: Mholloway)
[23:11:02] (CR) Dbrant: [C: 1] Provide Android SDK location as an argument to non-periodic test scripts [integration/config] - https://gerrit.wikimedia.org/r/368238 (https://phabricator.wikimedia.org/T171811) (owner: Mholloway)
[23:12:45] Continuous-Integration-Infrastructure, Nodepool: Increase Jenkins/Nodepool capacity - https://phabricator.wikimedia.org/T173047#3678047 (bd808) > Middle term, I guess we will look at how we can decouple CI from WMCS and move it to its own infrastructure. This is the only part of the conclusion that seem...
[23:51:31] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Include Blubber metadata in Dockerfile output as labels - https://phabricator.wikimedia.org/T178022#3678181 (dduvall)
[23:52:01] Release-Engineering-Team, Release Pipeline: Pipeline image build cleanup - https://phabricator.wikimedia.org/T177867#3673142 (dduvall)
[23:52:03] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Include Blubber metadata in Dockerfile output as labels - https://phabricator.wikimedia.org/T178022#3678196 (dduvall)
[23:52:24] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Include Blubber metadata in Dockerfile output as labels - https://phabricator.wikimedia.org/T178022#3678181 (dduvall) a:dduvall