[06:20:04] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #329: FAILURE in 1 hr 40 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/329/
[06:46:19] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3177485 (Rammanojpotla) @hasher I am trying to install mediawiki_selenium.git I have generated a gemfile and run 'bundle install' which ins...
[08:07:03] Scap, Patch-For-Review, User-fgiunchedi: Ensure deployment_server is global - https://phabricator.wikimedia.org/T162814#3177616 (fgiunchedi)
[08:30:55] Continuous-Integration-Config, Wikimedia-Site-requests, Patch-For-Review: Enforce our PHP coding standards on operations/mediawiki-config via composer? - https://phabricator.wikimedia.org/T162835#3177627 (hashar)
[08:38:28] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3177632 (hashar) What you have done is define a ruby dependency in your `/var/www/html` directory. The ruby dependency has been installed a...
[09:17:45] Continuous-Integration-Config, Wikimedia-Site-requests, Patch-For-Review: Enforce our PHP coding standards on operations/mediawiki-config via composer? - https://phabricator.wikimedia.org/T162835#3176500 (Dereckson) I concur it's a good idea, I sometimes fix some issues when I see some, and James sub...
[09:53:35] hullo you lovely releng folk
[09:54:21] i //think// i'm seeing unrelated test js unit test failures across different projects but can't track down the source
[09:56:17] wai
[09:56:18] *wait
[09:56:32] i see that we recently migrated to jquery 3
[10:09:46] red herring possibly
[10:31:01] phuedx: found the problem?
[10:31:09] zeljkof: not quite
[10:32:05] I'm not really familiar with quint, and I think hashar is traveling today
[10:32:19] zeljkof: i'm seeing different failures in different contexts:
[10:33:11] mediawiki-extensions-qunit-jessie fails for core, mwext-qunit-jessie for an extension
[10:33:15] different test failures too
[10:33:23] i was hoping at least the test failure would be consistent
[10:34:17] Antoine was working on merging quint and selenium/webdriverio jobs, but there was some trouble and he said he reverted it back
[10:34:38] maybe there is something wrong with the revert
[10:44:00] between ps35 of https://gerrit.wikimedia.org/r/#/c/322812 //may// have introduced the failures
[10:44:07] ps34 built fine
[10:44:22] not trying to point fingers -- i'm just going through the most recent changes to mediawiki core
[10:45:55] hm, that does seem like the most likely thing
[10:46:15] are failures consistent?
[10:46:28] I mean, the same test always fails with the same error?
[10:48:03] reading through the comment thread of that change
[10:48:51] (and remembering recent changes to mobilefrontend, for example)
[10:48:55] maybe Krinkle would know more ^
[10:49:39] it seems as if certain types of error will cascade through the test runs
[10:50:02] e.g. Krinkle mentions wikibase failures
[11:08:03] PROBLEM - Puppet run on integration-c1 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:18:05] * phuedx is currently trying to find the source of the error in his local env
[11:18:17] i'll give up in a little while and write it up in phab
[11:22:16] Release-Engineering-Team, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177934 (phuedx)
[11:22:41] Release-Engineering-Team, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177947 (phuedx) I'll add more detail as I investigate.
[11:25:20] Release-Engineering-Team, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177948 (phuedx)
[11:25:22] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3177949 (Rammanojpotla) Yes I have prevously installed few of the applications in the same manner but the thing what I didn't get is should...
[12:02:48] Release-Engineering-Team, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177985 (Paladox) p:Triage>High
[12:03:06] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177934 (Paladox)
[12:07:04] paladox: if you've got any ideas, then shout out!
[12:07:30] Oh. Could it be related to the jquery 3 upgrade?
[12:07:38] though that is behind a config
[12:08:31] paladox: enabled by default
[12:08:47] not sure.
[12:09:09] oh
[12:09:10] it is
[12:09:11] https://gerrit.wikimedia.org/r/#/c/322812/35/includes/DefaultSettings.php
[12:09:15] phuedx ^^
[12:09:55] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177988 (Paladox) may be related to jQuery being upda...
[12:10:06] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177989 (Paladox) cc @Krinkle
[12:10:20] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177991 (Paladox)
[12:16:05] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177992 (phuedx) Runnin...
[13:15:31] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3178137 (phuedx) a:p...
[13:17:26] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3178142 (phuedx) ^ Sorr...
[13:24:42] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177934 (hashar) The Me...
[13:30:32] Browser-Tests-Infrastructure, Documentation, Easy, Software-Licensing: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3178184 (hashar) This task is about changing the README.md file to link to the licenses files. You dont even need mediawiki/core to do that...
[13:45:54] zeljkof: I am trying to polish up the mediawiki-core-qunit-selenium-jessie job
[13:49:09] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:58:04] Gerrit: REL1_28 checkout of extensions.git is broken - https://phabricator.wikimedia.org/T162884#3178260 (Aklapper) Potential outcome from {T115262}
[14:01:18] hashar: sorry, just saw you ping (late lunching again)
[14:01:22] do you need help?
[14:02:04] 00:03:02.184 None of the test reports contained any result
[14:03:04] AH
[14:03:09] results: 'log/junit*.xml'
[14:03:25] the files are named WDIO.xunit..........xml
[14:03:48] I thought we have fixed that :/
[14:03:54] in the jjb file
[14:05:36] and there is a lot of fun
[14:05:36] https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-selenium-jessie/8/console
[14:05:44] there are 4 failures
[14:05:55] but eventually it finishes with SUCCESS
[14:07:22] I suspect wdio <- grunt <- npm some how does not exit > 0
[14:09:14] * hashar whistles
[14:09:16] and watch https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-selenium-jessie/9/console
[14:10:01] ok, failed
[14:10:06] progress
[14:10:24] 00:02:49.445 Aborted due to warnings.
[14:10:30] but it still run the next command
[14:10:39] so grunt webdriver:test does not exit 1
[14:11:11] ah no
[14:11:18] that is npm run selenium
[14:11:22] it does: grunt webdriver:test; killall chromedriver"
[14:11:31] and the killall chromedriver exit 0
[14:11:40] effectively shallowing the error
[14:32:09] hashar: maybe we need your chromedriver service patch merged
[14:38:29] zeljkof: well or hmm
[14:38:38] I am gonna change the junit result
[14:38:50] and most probably the job can just directly spawn chromedriver && grunt web driver:test
[14:40:15] (PS4) Hashar: Run webdriverio tests for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/347587 (https://phabricator.wikimedia.org/T139740)
[14:40:41] zeljkof: https://gerrit.wikimedia.org/r/#/c/347587/3..4/jjb/mediawiki.yaml :(
[14:40:50] I replace npm run selenium
[14:41:10] and have the jenkins job take care of spawning chromedriver and killing it
[14:41:50] (CR) jerkins-bot: [V: -1] Run webdriverio tests for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/347587 (https://phabricator.wikimedia.org/T139740) (owner: Hashar)
[14:44:15] hashar: does it work?
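The exit-status problem discussed above (`grunt webdriver:test; killall chromedriver` reporting only the status of `killall`) can be illustrated with a minimal shell sketch. The helper names `naive` and `run_then_cleanup` are hypothetical, and the test and cleanup commands are passed in as strings so stand-ins such as `false` and `true` can be used. This is one possible way to keep the failure visible; it is not the change that was actually deployed (the job was switched to spawning and killing chromedriver itself).

```shell
#!/bin/sh
# Hypothetical helpers, not part of the real Jenkins job.

# The pattern from the log: `a; b` exits with b's status, so a failing
# test run followed by a successful cleanup looks like a success.
naive() {
    sh -c "$1; $2"
}

# Capture the test status first, run the cleanup unconditionally, then
# return the captured status so CI sees the real result.
run_then_cleanup() {
    _status=0
    sh -c "$1" || _status=$?
    sh -c "$2" || true   # cleanup failures are deliberately ignored
    return "$_status"
}

# Usage with stand-in commands:
#   naive false true             -> exit status 0 (failure swallowed)
#   run_then_cleanup false true  -> exit status 1 (failure preserved)
```

Capturing the status instead of relying on `set -e` matters in this setup: with `set -e` the script would stop at the failing test and never reach the cleanup, leaving chromedriver running.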
[14:44:31] we can always clean it up
[14:45:41] well yeah
[14:45:44] looks like it works for me
[14:45:55] checking whether the junit plugin pass
[14:46:00] when there are no files found
[14:46:12] https://integration.wikimedia.org/ci/job/junit-fail/1/console
[14:46:14] it pass :-}
[14:46:56] (PS5) Hashar: Run webdriverio tests for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/347587 (https://phabricator.wikimedia.org/T139740)
[14:48:27] (CR) jerkins-bot: [V: -1] Run webdriverio tests for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/347587 (https://phabricator.wikimedia.org/T139740) (owner: Hashar)
[14:58:18] Oh my
[14:58:25] a new release of voluptuous got cut
[15:00:39] (PS1) Hashar: WMF: pin voluptuous<0.10 [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/348083
[15:00:41] pff
[15:01:24] (CR) Hashar: [C: 2] "See https://pypi.org/project/voluptuous/#history for release history." [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/348083 (owner: Hashar)
[15:02:32] (CR) Hashar: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/347587 (https://phabricator.wikimedia.org/T139740) (owner: Hashar)
[15:08:44] so zuul layout validation is fixed
[15:09:30] zeljkof: I am going to deploy it
[15:09:54] (CR) Hashar: [C: 2] "From a quick discussion with Željko lets be bold." [integration/config] - https://gerrit.wikimedia.org/r/347587 (https://phabricator.wikimedia.org/T139740) (owner: Hashar)
[15:10:42] hashar: \o/
[15:12:08] (Merged) jenkins-bot: Run webdriverio tests for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/347587 (https://phabricator.wikimedia.org/T139740) (owner: Hashar)
[15:13:10] zeljkof: deployed
[15:13:18] gonna recheck a change on mediawiki/skins/Vector
[15:14:12] master: https://gerrit.wikimedia.org/r/#/c/323228/
[15:14:26] REL1_28 https://gerrit.wikimedia.org/r/#/c/348087/
[15:15:09] !log Deployed mediawiki-core-qunit-selenium-jessie job (runs qunit + selenium with webdriverio) https://gerrit.wikimedia.org/r/#/c/347587/ - T139740
[15:15:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:15:13] T139740: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740
[15:21:17] the one for master running at https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-selenium-jessie/12/console
[15:23:08] 6 passing
[15:23:13] Finished: SUCCESS
[15:23:33] and tests are properly reported at https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-selenium-jessie/12/testReport/
[15:26:33] zeljkof: and the build for REL1_28 https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-selenium-jessie/13/console
[15:26:34] 00:02:42.775 + '[' -f ./tests/selenium/wdio.conf.js ']'
[15:26:34] 00:02:42.952 [PostBuildScript] - Execution post build scripts.
[15:26:39] 00:02:45.641 Recording test results
[15:26:39] 00:02:45.883 None of the test reports contained any result
[15:26:39] 00:02:45.982 Finished: SUCCESS
[15:26:47] eg selenium/wdio got skipped
[15:26:53] and the build is still a success
[15:29:22] Browser-Tests-Infrastructure, MediaWiki-General-or-Unknown, JavaScript, MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3178713 (hashar) The Jenkins job `mediawiki-core-qunit-selenium-jessi...
[15:29:55] zeljkof: sorry it took so long
[15:31:18] hashar: no problem at all!
[15:31:23] hopefully
[15:31:24] thanks for working on it
[15:34:20] Browser-Tests-Infrastructure, MediaWiki-General-or-Unknown, JavaScript, MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3178757 (zeljkofilipin)
[15:36:59] !log deployment-mediawiki04 clearing /var/cache/hhvm/fcgi.hhbc.sq3
[15:37:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:43:04] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #389: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/389/
[15:52:32] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #389: FAILURE in 30 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/389/
[16:05:09] Release-Engineering-Team, Page-Previews, Reading-Web-Backlog: Generate compiled assets from continuous integration - https://phabricator.wikimedia.org/T158980#3178940 (hashar) Beside the quick chat early in March and the summary at T158980#3087905 , I do not plan to work on this nor do I have any idl...
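Two related gaps show up in the exchange above: the report glob (`log/junit*.xml`) did not match the files wdio actually wrote, and the REL1_28 build, where the `[ -f ./tests/selenium/wdio.conf.js ]` check skipped selenium, still finished with SUCCESS and no results. The following is a hedged sketch of a post-build guard that requires reports only when the wdio config exists. The helper name `require_reports` and the example pattern `WDIO*.xml` are hypothetical (the real report file name is elided in the log), and this is not the job's actual configuration.

```shell
#!/bin/sh
# Hypothetical post-build guard: if CONFIG exists, require at least one
# file matching PATTERN in REPORT_DIR; otherwise skip the check.
require_reports() {
    config=$1 report_dir=$2 pattern=$3
    if [ ! -f "$config" ]; then
        # Same idea as the `[ -f ./tests/selenium/wdio.conf.js ]` test in the
        # job log: no config means selenium was skipped on purpose.
        echo "no $config; skipping report check"
        return 0
    fi
    # Let `find -name` do the matching, so an unmatched pattern is an empty
    # result rather than a literal glob string.
    count=$(find "$report_dir" -maxdepth 1 -type f -name "$pattern" 2>/dev/null | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "$config exists but no reports match $pattern in $report_dir" >&2
        return 1
    fi
    echo "found $count report file(s)"
}

# Example (illustrative pattern):
#   require_reports ./tests/selenium/wdio.conf.js log 'WDIO*.xml'
```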
[16:08:09] will be back around later this evening
[16:33:25] Continuous-Integration-Config, Release-Engineering-Team, Regression: doc.wikimedia.org docs for old releases is actually master - https://phabricator.wikimedia.org/T162506#3179297 (hashar) Open>Resolved We have all the patch releases for 1.23, 1.27 and 1.28. I am dropping the ones for the EO...
[17:28:35] Deployment-Systems, Collaboration-Team-Triage, Flow, I18n: Message Mediawiki:Flow-terms-of-use-edit on nowp is English - https://phabricator.wikimedia.org/T133571#3179519 (jeblad) Open>Resolved
[18:01:59] question: if I accidentally clicked +2 instead of +1, how can I make it not merge the thing?
[18:02:44] quckly set it back to +1
[18:03:00] ah ok so if I quickly remove +2 it won't merge?
[18:03:04] yep
[18:03:04] ok, did so
[18:03:08] paladox: thanks
[18:04:11] paladox: zuul still shows it in the submit queue...
[18:04:44] Yep it will. but wont be able to merge without cr +2
[18:04:54] ah, ok, thanks
[18:05:42] my dexterity is low today :(
[18:05:56] SMalyshev: cold fingers?
[18:08:01] your welcome :)
[18:10:35] greg-g: morning...
[18:10:51] :)
[18:30:07] Continuous-Integration-Infrastructure, Jenkins, Upstream, WorkType-NewFunctionality: Jenkins trilead-ssh2 doesn't support our MAC/KEX algorithms - https://phabricator.wikimedia.org/T103351#3179876 (Paladox) This will require a jenkins update.
[18:30:25] releng-201516-q1, User-greg: [keyresult] Implement the needed features in scap for RESTBase deploys - https://phabricator.wikimedia.org/T113119#3179881 (Pchelolo)
[18:30:26] Scap, scap2, Epic: EPIC: Future Deployment Tooling - https://phabricator.wikimedia.org/T101023#3179882 (Pchelolo)
[18:30:30] Deployment-Systems, Release-Engineering-Team, RESTBase, Goal, Services (next): Create or improve the RESTBase deploy method - https://phabricator.wikimedia.org/T102667#3179878 (Pchelolo) Open>Resolved a:Pchelolo I guess after move to scap3 this ticket is obsolete, so I'm closing i...
[18:30:37] Continuous-Integration-Infrastructure, Jenkins, Upstream, WorkType-NewFunctionality: Jenkins trilead-ssh2 doesn't support our MAC/KEX algorithms - https://phabricator.wikimedia.org/T103351#3179884 (Paladox)
[18:30:40] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upgrade Jenkins from 1.x to latest 2.x - https://phabricator.wikimedia.org/T144106#3179883 (Paladox)
[18:31:21] Deployment-Systems, Operations, RESTBase, Services, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#3179892 (Pchelolo) Open>Resolved a:Pchelolo After moving to scap3 this ticket is obsolete,...
[18:41:50] Browser-Tests-Infrastructure, MediaWiki-extensions-MultimediaViewer: Automated screenshots - https://phabricator.wikimedia.org/T77634#3179999 (MBinder_WMF)
[18:44:42] (PS1) Krinkle: Remove outdated wgIncludejQueryMigrate setting [integration/jenkins] - https://gerrit.wikimedia.org/r/348121
[18:45:02] (CR) Krinkle: [C: 2] Remove outdated wgIncludejQueryMigrate setting [integration/jenkins] - https://gerrit.wikimedia.org/r/348121 (owner: Krinkle)
[18:46:35] (Merged) jenkins-bot: Remove outdated wgIncludejQueryMigrate setting [integration/jenkins] - https://gerrit.wikimedia.org/r/348121 (owner: Krinkle)
[19:15:03] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3180476 (Krinkle) These...
[19:25:54] Release-Engineering-Team, Operations, Phabricator, hardware-requests, ops-eqiad: replacement hardware for iridium (phabricator) - https://phabricator.wikimedia.org/T156970#3180559 (RobH) a:mark>faidon I'm assigning this to @faidon since @mark is away at present. @faidon: Basically I...
[19:31:29] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3180620 (phuedx) @Krink...
[19:40:01] hashar the regression was fixed in trilead now. :)
[19:40:40] hashar also i submitted your systemd change for jenkins upstream with full creds to you.
[19:40:57] https://github.com/jenkinsci/packaging/pull/93
[19:43:19] Yippee, build fixed!
[19:43:19] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #390: FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/390/
[19:44:03] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3177934 (Jdforrester-WMF...
[19:48:13] Release-Engineering-Team, Page-Previews, Reading-Web-Backlog: Generate compiled assets from continuous integration - https://phabricator.wikimedia.org/T158980#3180686 (Jdlrobson)
[19:51:51] Yippee, build fixed!
[19:51:52] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #390: FIXED in 31 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/390/
[19:57:44] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, Page-Previews: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3180702 (Krinkle) >>! I...
[20:14:24] hashar: your aware a ton of instances on CI are offline at the moment right?
[20:14:32] all are jessie
[20:14:55] Zppix that's expected i think (not 100% sure)
[20:15:10] paladox: thats why im asking, ive never seen that before
[20:15:26] ok
[20:16:44] Zppix: the instances are put offline as soon as a build completed on them
[20:17:01] hashar: okay, sorry for pinging its just ive never seen that before.
[20:17:10] then the instance in wmflabs and the configuration in jenkins is garbage collected on a best effort
[20:17:21] so yeah you could see a bunch of offline instances
[20:18:01] if you want to see the state of the pool: https://grafana.wikimedia.org/dashboard/db/nodepool
[20:18:10] top left graph shows an history
[20:18:17] hashar: alright thanks.
[20:18:20] top right pie chart gives the current state (more or less)
[20:18:46] jenkins got restarted an hour or so ago
[20:18:54] so I guess the instances got disposed but not removed from Jenkins
[20:19:12] maybe Nodepool will unconfigure them from jenkins
[20:19:13] hashar: so just ignore them i take it?
[20:54:41] Zppix: yeah you can ignore them. I have manually deleted all those left over slaves
[20:54:44] off to bed *wave*
[20:54:48] night
[21:11:43] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, and 4 others: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3181109 (Jdlrobson) p:Hig...
[21:12:20] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-ResourceLoader, and 4 others: mwext- and mediawiki-extensions-qunit-jessie builds failing in core and other extensions - https://phabricator.wikimedia.org/T162876#3181123 (Jdlrobson) Oh.. jus...
[21:32:39] Scap, Discovery, Wikimedia-Portals, Discovery-Portal-Sprint: Portals deployment failed - https://phabricator.wikimedia.org/T161832#3181243 (debt) Open>Resolved a:debt This was deployed in production on April 11, 2017
[21:45:19] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T160551#3181330 (demon) Open>Resolved
[21:45:33] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T160552#3181331 (demon) Open>Resolved
[21:50:40] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T162954#3181337 (greg)
[21:50:55] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T161733#3141403 (greg)
[22:09:51] Is something happening to/with Jenkins/Zuul
[22:10:04] All queues are stalled
[22:10:13] Only one patch in gate-and-submit has a job running
[22:10:37] oh
[22:10:41] it does seem stalled
[22:10:55] not sure who to ping
[22:11:25] Krinkle: Do you know who can fix Zuul?
[22:11:31] Or should I try it myself based on the docs?
[22:11:33] It isen't zuul
[22:11:36] it's nodepool
[22:13:32] jenkins/nodepool are running a lot of tests right now:
[22:13:34] https://integration.wikimedia.org/ci/
[22:14:02] nthing looks strange here: https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1
[22:14:06] Oh, they are now
[22:14:11] When I complained, only one job was active
[22:14:33] Still, there aren't terribly many workers it looks like
[22:14:34] See https://integration.wikimedia.org/zuul/
[22:15:00] there were some launch errors, apparently (see one of the boxes on that grafan link)
[22:15:13] It looks like a lot of load at /ci , but it's really only 2 patchsets' worth of jobs
[22:15:22] RoanKattouw: to see workers, look at https://integration.wikimedia.org/ci/ or the grafana link, zuul hides most of it
[22:15:22] There are another dozen or so waiting
[22:15:28] greg-g: https://upload.wikimedia.org/wikipedia/en/8/87/Failure_to_Launch.jpg
[22:15:57] I don't see anything wrong
[22:16:07] Oh I see, it's adding more workers
[22:16:32] greg-g: I don't know the internals very well any more since it was ported to nodepool, I just know that normally the system can deal with more than 2 patches in parallel and right now it's failing to do that
[22:16:45] let's not jump to conclusions
[22:17:02] https://integration.wikimedia.org/zuul/ looks pretty wrong in that there are patches that were submitted 15+ mins ago that still haven't had their tests run
[22:17:11] those two jobs are maxing out our pool of vms, so, of course others will wait will gate-and-submit clears
[22:17:25] of course, because of priority of the queues
[22:17:43] tests wait until after merges
[22:18:01] RoanKattouw: it looks okay to me some tests that are ran take a little longer than others (jessie tests have to wait for a new instance to be built due to instance deletion after ever build completion)
[22:18:02] Even the gate-and-submit queue is behind, and there are only three patches in it
[22:18:15] # of patches != amonut of work
[22:18:18] Maybe it's just recovering from those earlier launch errors?
[22:18:30] RoanKattouw you need to click on mediawiki/core changes
[22:18:38] Yeah I know they have a lot of jobs for one patch
[22:18:39] they run alot of tests
[22:18:44] But they're all finished
[22:18:58] Oh now it's started some more
[22:19:09] I think nodepool is doing the right things afaict
[22:20:16] yup
[22:20:21] Yeah
[22:20:34] RoanKattouw: it would be really noticeable with something was really wrong... there would be more jobs queued up then this
[22:20:36] I just wonder why it feels so much slower than normal
[22:20:46] Maybe there were just a lot of patches
[22:21:41] I'm not sure how much the launch errors factored into it, honestly, maybe a lot?
[22:22:04] Also, ultimately the goal here is to process patches
[22:22:25] If it takes more than half an hour to process a patch, than that is itself a problem
[22:22:35] Maybe a temporary one caused by high traffic, sure, but still
[22:23:13] RoanKattouw: if anything if something is urgent someone just can easily v:2 and c:2 and submit the patch theirselves
[22:23:29] RoanKattouw: we know. And we're between a rock and ahard place here. We could up the quota for nodepool but there are reasons not to do that.
[22:23:38] Zppix: please don't encourage that, it makes zuul take longer
[22:23:51] greg-g: hence is why i said if urgent
[22:23:59] Zppix: he knows
[22:24:09] I'm not going to do that except maybe if it interferes with SWAT in an urgent way
[22:24:13] *in a major way
[22:24:33] Like, if there's a deployment and Jenkins is hopelessly broken, I might V+2 wmf branch cherry-picks
[22:24:44] But otherwise I wouldn't do that
[22:24:55] RoanKattouw: are you on qa@lists ?
[22:25:06] Right now it's not hopelessly broken, just slow
[22:25:11] greg-g: I don't think so
[22:25:50] RoanKattouw: s'alright :) But there's a Proof of Concept for using local docker containers to speed up some of the lint-type jobs that Tyler is looking into
[22:26:00] Nice
[22:26:19] (lint-type jobs as a first pass on the easy target)
[22:26:23] Also as much as I like complaining, this is the first time that I have reason to complain about Zuul in many months
[22:26:24] greg-g: is the qa lists open to volunteer relengers?
[22:26:32] Zppix: it's a public list
[22:26:40] RoanKattouw: :)
[22:26:47] ok
[22:26:49] As I was telling JR earlier, it used to be that Zuul/Jenkins became totally useless every day between noon and 1pm
[22:26:53] Those dark days are long past
[22:27:22] that time has now moved to weekends (they are pretty dead) and really late at night UTC-5
[22:28:05] Hmm according to https://integration.wikimedia.org/ci/ there seem to be 5 perpetually idle workers
[22:28:20] Oh, nm
[22:28:24] Those are just the "building" ones
[22:28:35] Their number stays in sync with the number of "building" workers at https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1
[22:28:50] should
[22:28:50] creating a new VM to run each job adds a very significant deloy
[22:29:00] *delay
[22:29:08] welcome to the rant part bd808 :)
[22:29:13] party*
[22:29:18] but this is going to be fixed :)
[22:29:34] welcome to releng therapy :P
[22:30:02] RoanKattouw: you are just several weeks late to the angry rant party. folks are working on a better thing
[22:30:08] Excellent
[22:30:24] Yeah I'm only here because this time it's my patch that 30 minutes after being submitted still hasn't started most of its jobs
[22:30:34] Oh, they just started, yay
[22:31:11] 30 minutes? Pshaw, that's basically lightspeed for nodepool backed CI
[22:31:33] Last week, I waited upwards of 3-4 hours to shove ~30 patches through CI
[22:31:37] * RainbowSprinkles mutters something about lawns
[22:33:56] * bd808 steals some hard candy from the dish on RainbowSprinkles's coffee table and runs
[22:34:19] mmmm, werthers!!
[22:36:37] be nice you too xD
[22:43:29] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[23:28:31] Deployment-Systems, RESTBase, Services: RESTBase deployment process - https://phabricator.wikimedia.org/T103344#3181815 (greg) @mobrovac status of this one?
[23:31:02] Hmm the Rebase If Necessary setting in Gerrit doesn't seem to work so well
[23:31:20] Changes in mw-config fail with "Cannot Merge" all the time, and a simple press of the Rebase button fixes it
[23:31:34] It seems to be almost impossible to +2 two changes at the same time and have both of them merge
[23:35:45] RoanKattouw try depending on the changes you are about to merge
[23:50:16] Release-Engineering-Team, Operations, Phabricator, hardware-requests, ops-eqiad: replacement hardware for iridium (phabricator) - https://phabricator.wikimedia.org/T156970#3181938 (faidon) a:faidon>RobH That allocation sounds temporary, right? Sounds fine.
[23:51:15] Feature request: can we have the canary check catch notices like "Undefined variable $blah"?
[23:51:51] We accidentally deployed a config patch that caused ~10 warnings per second of the form Apr 13 23:44:16 mw1200: #012Notice: Undefined variable: wmgRelatedArticlesFooterBlacklistedSkins in /srv/mediawiki/wmf-config/CommonSettings.php on line 2878
[23:52:07] If canary checked for that, it would have stopped it from being deployed
[23:53:47] RoanKattouw: #REDIRECT phabricator.wikimedia.org
[23:53:48] ;)
[23:53:54] wilco
[23:53:57] roger
[23:54:41] I think Timo has been complaining about that for a while ;)
[23:55:39] Continuous-Integration-Infrastructure, Release-Engineering-Team, Icinga, Operations, Patch-For-Review: remove/fix jenkins icinga monitoring on contint2001 - https://phabricator.wikimedia.org/T162822#3181965 (Dzahn) "jenkins_service_running" is now gone on contint2001 (and fine on contint1001...
[23:57:16] Release-Engineering-Team, Operations, Phabricator, hardware-requests, ops-eqiad: replacement hardware for iridium (phabricator) - https://phabricator.wikimedia.org/T156970#3181972 (RobH) a:RobH>faidon >>! In T156970#3181938, @faidon wrote: > That allocation sounds temporary, right? So...
[23:57:35] Gerrit: REL1_28 checkout of extensions.git is broken - https://phabricator.wikimedia.org/T162884#3181976 (Betacommand) p:Triage>High
[23:58:10] I'm rather confused now, last month when a fatal brought down the cluster, we realised the canary check only checks type:mediawiki in logstash (e.g. php error/warning etc.) but didn't look at hhvm's fatal log. that was fixed.
[23:58:23] But if it also doesn't catch php notices, then that means the former isn't working either.
[23:58:58] Yeah, I was complaining about unconditional fatals not being detected, but I do respect the current thing it does as well, catching increase in warnings is much harder to detect manually than fatals.
[23:59:20] So it's good to have, but if it doesn't catch those warnings currently either, I'm concerned.
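At its core, the canary feature request above means counting PHP notices (along with warnings and fatals) in the canary hosts' logs and aborting when the count passes a threshold. The sketch below shows only that counting logic on log lines read from stdin. The function names, the regular expression, and the fixed threshold are assumptions. It does not reflect how the real canary check works; per the discussion, that check looks at `type:mediawiki` events in logstash and watches for an increase in warnings.

```shell
#!/bin/sh
# Hypothetical sketch: count PHP fatals/warnings/notices in log lines on stdin.
count_php_errors() {
    # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps
    # that from being treated as a failure.
    grep -cE '(Fatal error|Warning|Notice): ' || true
}

# Succeeds if the number of matching lines on stdin is at most $1.
canary_ok() {
    threshold=$1
    n=$(count_php_errors)
    [ "$n" -le "$threshold" ]
}
```

In this sketch, piping the notice line quoted above (`... #012Notice: Undefined variable: wmgRelatedArticlesFooterBlacklistedSkins ...`) into `canary_ok 0` returns a non-zero status, because the line matches `Notice: `.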