[00:02:47] (03CR) 10Krinkle: "@Peter I guess our instance is more stable, so this may not be needed. But figured it might be work experimenting for a few days and see i" [integration/config] - 10https://gerrit.wikimedia.org/r/341461 (owner: 10Krinkle) [00:45:01] Project beta-code-update-eqiad build #146082: 04FAILURE in 2 min 0 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/146082/ [00:49:36] Probably my fault ^ [00:52:23] 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review, 07Regression, 07Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3078308 (10demon) [00:52:26] 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Gerrit: Schedule downtime to migrate all users username to lowercase - https://phabricator.wikimedia.org/T155766#3078306 (10demon) 05Open>03Resolved Downtime was today [00:52:45] PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [00:52:57] 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review, 07Regression, 07Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#2856099 (10demon) 05Open>03Resolved Ok, not only is this issue fixed, but we've finally converted to case-insensitive logins so t... [00:54:36] (03PS1) 10Mholloway: [Android] Capture lint reports from new /build/reports directory [integration/config] - 10https://gerrit.wikimedia.org/r/341473 [00:55:02] Yippee, build fixed! [00:55:02] Project beta-code-update-eqiad build #146083: 09FIXED in 2 min 1 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/146083/ [01:08:07] 10Deployment-Systems: Setup alerts for l10nupdate - https://phabricator.wikimedia.org/T107515#1496925 (10MZMcBride) What does this task mean? Can someone please expand the task description or close this task? 
[01:27:48] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [02:31:10] 06Release-Engineering-Team, 10MediaWiki-Configuration, 13Patch-For-Review, 06Services (watching): Automate WMF wiki creation - https://phabricator.wikimedia.org/T158730#3045487 (10Legoktm) Aren't dblists already in a standard format (newline delimited plain text) that we distribute across the cluster via s... [02:55:44] 06Release-Engineering-Team, 10MediaWiki-Configuration, 13Patch-For-Review, 06Services (watching): Automate WMF wiki creation - https://phabricator.wikimedia.org/T158730#3078564 (10demon) >>! In T158730#3078459, @Legoktm wrote: > Aren't dblists already in a standard format (newline delimited plain text) tha... [02:56:44] 06Release-Engineering-Team, 10MediaWiki-Configuration, 13Patch-For-Review, 06Services (watching): Automate WMF wiki creation - https://phabricator.wikimedia.org/T158730#3078565 (10demon) >>! In T158730#3078564, @demon wrote: >>>! In T158730#3078459, @Legoktm wrote: >> Aren't dblists already in a standard f... [03:31:20] 06Release-Engineering-Team, 10MediaWiki-Configuration, 13Patch-For-Review, 06Services (watching): Automate WMF wiki creation - https://phabricator.wikimedia.org/T158730#3078688 (10tstarling) The point is that updating dblists via gerrit and running scap is one of the avoidable steps in the task description... [04:08:31] Yippee, build fixed! 
[04:08:31] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,BrowserTests build #322: 09FIXED in 12 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=BrowserTests/322/ [04:20:18] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-OAuth, 10Pywikibot-OAuth, 07Pywikibot-network, 07Pywikibot-tests: "Nonce already used" regularly occurring on beta cluster - https://phabricator.wikimedia.org/T109173#3078775 (10Tgr) 05Open>03Resolved a:03Tgr Has not happened for a long time. [05:01:50] (03PS1) 10Aude: Bump wikidata to wmf/1.29.0-wmf.15 [tools/release] - 10https://gerrit.wikimedia.org/r/341489 [05:34:06] 06Release-Engineering-Team, 10MediaWiki-Configuration, 13Patch-For-Review, 06Services (watching): Automate WMF wiki creation - https://phabricator.wikimedia.org/T158730#3078861 (10Joe) >>! In T158730#3078688, @tstarling wrote: > The point is that updating dblists via gerrit and running scap is one of the a... [05:49:34] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<44.44%) [06:44:35] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK [06:47:35] 10Continuous-Integration-Config, 10ContentTranslation, 03Language-2017 Sprint 2, 03Language-2017 Sprint 3, and 4 others: mwext-qunit-jessie test fails on unrelated change - https://phabricator.wikimedia.org/T153038#2867222 (10Arrbee) [08:16:19] !log Pushing new Jessie image: image-jessie-20170306T224719Z.qcow2 [08:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [08:17:57] 06Release-Engineering-Team, 10Page-Previews, 06Reading-Web-Backlog: Generate compiled assets from continuous integration - https://phabricator.wikimedia.org/T158980#3079135 (10zeljkofilipin) Besides selenium/webdriverio I have little experience with npm jenkins jobs. 
I think this is a job for @hashar.
[08:18:18] twentyafterfour: if you are still around. The content translation / qunit failure turned out to be due to chromium version :/
[08:18:53] twentyafterfour: and if interested, I can probably demo how to run qunit
[08:25:42] !log Image snapshot-ci-jessie-1488874660 in wmflabs-eqiad is ready (Chromium 55->56 among others) - T153038
[08:25:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:25:47] T153038: mwext-qunit-jessie test fails on unrelated change - https://phabricator.wikimedia.org/T153038
[08:34:30] zeljkof: good morning
[08:34:41] zeljkof: so somehow apt-get upgrade was stalled on Jessie images since end of December
[08:35:03] zeljkof: and chromium has been updated from 55 to 56 which might cause mwext-mw-selenium browser tests to fail
[08:37:48] hashar: tests should not fail if we have a recent version of chromedriver
[08:38:04] let me know if you notice failures
[08:39:57] Continuous-Integration-Config, ContentTranslation, Language-2017 Sprint 2, Language-2017 Sprint 3, and 4 others: mwext-qunit-jessie test fails on unrelated change - https://phabricator.wikimedia.org/T153038#3079202 (hashar) Open>Resolved Restoring the tests. The root cause is that the tes...
[08:47:34] (03CR) 10Hashar: [C: 032] Bump wikidata to wmf/1.29.0-wmf.15 [tools/release] - 10https://gerrit.wikimedia.org/r/341489 (owner: 10Aude) [08:49:27] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T158996#3053926 (10hashar) Wikidata is bumped as well {nav wmf/1.29.0-wmf.10 > wmf/1.29.0-wmf.15} ( https://gerrit.wikimedia.org/r/#/c/341489/ ) [08:51:52] (03Merged) 10jenkins-bot: Bump wikidata to wmf/1.29.0-wmf.15 [tools/release] - 10https://gerrit.wikimedia.org/r/341489 (owner: 10Aude) [08:53:51] zeljkof: looking at your selenium 3 version bump ( https://gerrit.wikimedia.org/r/#/q/is:open+bug:T158074 ) [08:53:51] T158074: Update Ruby tests to Selenium 3 - https://phabricator.wikimedia.org/T158074 [08:54:09] that picked up selenium 3.1 , should we redo them to use 3.2 or even just drop the Gemfile.lock from the repos [08:54:10] hashar: thanks! [08:54:11] ? [08:54:26] Gemfile.lock should stay [08:54:38] it is recommended by bundler to have it [08:55:01] I think 3.1 is fine [08:55:19] we did not release the new version of the gem that depends on selenium 3.2 yet anyway [08:55:30] ahh [08:55:39] also would be nice to figure out what the new Logger class is about [08:55:44] the important thing is to move from selenium 2 to 3, the rest are details [08:56:02] go ahead ;) [08:56:34] and some repos do not trigger mwext-mw-selenium :/ [08:57:46] yes, it is not configured for all repos [08:58:06] going to add it while I am reviewing the changes :D [08:58:21] go ahead [08:58:40] I am not sure why we do not have that done already :| [09:05:53] (03PS1) 10Hashar: Add experimental mwext-mw-selenium-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/341506 (https://phabricator.wikimedia.org/T55697) [09:06:01] zeljkof: https://gerrit.wikimedia.org/r/341506 [09:06:11] adds it to experimental pipeline for the few repo that do not trigger the job yet [09:06:51] hashar: looks good, 
will +2 [09:09:43] zeljkof: and I guess the ones for Flow should be made voting :} [09:09:46] since that pass on https://gerrit.wikimedia.org/r/#/c/340123/ [09:10:03] (03CR) 10Zfilipin: [C: 032] Add experimental mwext-mw-selenium-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/341506 (https://phabricator.wikimedia.org/T55697) (owner: 10Hashar) [09:10:28] hashar: go for it ;) [09:10:55] gotta check with devs first I guess [09:11:20] push the patch and cc them [09:13:01] (03Merged) 10jenkins-bot: Add experimental mwext-mw-selenium-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/341506 (https://phabricator.wikimedia.org/T55697) (owner: 10Hashar) [09:14:07] deployed ^ [09:38:56] 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review, 07Tracking: [EPIC] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#3079366 (10hashar) Have to make the following extension to trigger the job on patchset proposal: CentralAuth... [10:05:41] (03Abandoned) 10Zfilipin: WIP Increase in failures caused by Saucelabs [selenium] - 10https://gerrit.wikimedia.org/r/338785 (https://phabricator.wikimedia.org/T152963) (owner: 10Zfilipin) [10:15:31] 10Gerrit: Gerrit upload-pack send ALL references causing massive network I/O on common operations - https://phabricator.wikimedia.org/T103990#3079455 (10hashar) + @demon The summary is that `GIT_TRACE_PACKET=1 git fetch` shows an initial advertisement of all of `refs/changes/*` which are usually not needed wh... 
[10:34:47] 06Release-Engineering-Team, 06Operations, 07Beta-Cluster-reproducible, 07HHVM, 15User-Joe: Switch mwscript from Zend PHP5 to default php alternative (egHHVM) - https://phabricator.wikimedia.org/T146285#3079463 (10hashar) [10:36:30] 10Continuous-Integration-Infrastructure, 10Monitoring, 13Patch-For-Review: Alert when Zuul/Gearman queue is stalled - https://phabricator.wikimedia.org/T70113#3079478 (10hashar) 05Open>03Resolved a:03hashar There is a now a basic threshold alerting when there are more than a hundred functions waiting f... [12:51:42] zeljkof: soo been running your wdio patch :-} [12:51:55] hashar: what do you think? [12:52:02] I found a way to start chromedriver automatically [12:52:07] nice! [12:52:10] using the webdriver.io hook onPrepare [12:52:30] I remember trying the hooks but not being able to do it [12:52:35] probably used the wrong hook or somehting [12:57:11] (03PS1) 10Zfilipin: WIP Problem: Can not use --retry option to retry failed tests as part of the same run [selenium] - 10https://gerrit.wikimedia.org/r/341523 (https://phabricator.wikimedia.org/T152963) [12:58:28] * zeljkof is out of lunch [13:15:35] Failed to take screenshot on reject: [13:16:13] "UnexpectedAlertOpen","message":"A modal dialog was open, blocking this operation" [13:16:25] so apparently whenever there is an alert() ... 
we can't screenshot :D [13:17:58] 10Beta-Cluster-Infrastructure, 10Cognate, 10MediaWiki-extensions-InterwikiSorting, 13Patch-For-Review, and 2 others: Create beta hewiktionary for testing InterwikiSorting & Cognate - https://phabricator.wikimedia.org/T158628#3079954 (10Addshore) 05Open>03Resolved [13:27:52] hashar: hm, could be [13:28:00] did not see a popup in a while :) [13:30:29] hashar: whats the task for the mwnext-qunit-jessie error [13:30:34] mwext* [13:39:34] (03PS2) 10Zfilipin: WIP Problem: Can not use --retry option to retry failed tests as part of the same run [selenium] - 10https://gerrit.wikimedia.org/r/341523 (https://phabricator.wikimedia.org/T152963) [13:40:04] Zppix: the one for ContentTranslation I solved it earlier today [13:40:24] Zppix: the task should be in my last entries of the SAL http://ur1.ca/nhui9 [13:40:27] Zppix: https://phabricator.wikimedia.org/T153038 [13:40:38] hashar: ok, any word about the mediawiki-qunit-jessie failing? [13:40:47] zeljkof: thanks! [13:41:28] zeljkof: thanks! [13:41:34] zeljkof: oops sorry [13:42:20] no problem :) [13:42:30] feel free to thank me one more time ;) [13:46:01] zeljkof: https://gerrit.wikimedia.org/r/341528 follows up your change with some tweaks [13:46:09] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #329: 04FAILURE in 2 min 7 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/329/ [13:46:26] zeljkof: lol [13:46:56] hashar: looking [13:48:14] hashar: nice [13:48:47] I will squash in your change the part that adds wdio.conf.default.js [13:48:53] and use it as a reference for jenkins/vagrant env [13:49:08] the auto spawning of chromedriver, I guess that will be a follow up change [13:49:37] do you know if there is a way to detect we are inside mw/vagrant? eg some env variable being set? 
[13:50:06] ah MWV
[13:50:43] hashar: in js can't you do an @REQUIRE or something
[13:52:18] hashar: go for it
[13:52:25] I think the patch is good as is
[13:52:33] and all changes can be made later
[13:52:53] since you have vagrant, can you check whether there is a MWV* env variable set inside it?
[13:52:55] but I don't really care, as long as the commit gets merged
[13:53:01] export|grep MWV_
[13:53:02] hashar: sure
[13:54:03] no MWV
[13:54:07] but this could be useful
[13:54:08] HOME="/home/vagrant"
[13:54:18] LOGNAME="vagrant"
[13:54:41] PWD="/home/vagrant"
[13:54:53] USER="vagrant"
[13:56:45] hashar: for integration repos it would be nice to have a typos file similar to operations repos, it could help prevent simple errors in the future, just putting that out there
[13:59:19] Zppix: one can invoke something like that from npm : https://www.npmjs.com/package/grunt-tyops :}
[13:59:42] hashar: one that has more knowledge of npm than me
[14:04:23] Scap, Parsoid: Check 'depool' exceeded 30.0s timeout - https://phabricator.wikimedia.org/T159387#3080096 (thcipriani) p:Triage>Normal > Should we maybe increase the 30s timeout, or allow for a higher failure limit? Scap supports both, a few config changes can make either happen. Command checks...
[14:10:15] zeljkof: what is your thought about moving the wdio.conf files under tests/selenium/ ?
[14:10:30] hashar: I don't care [14:10:36] I am even tempted to drop the jenkins one and define settings directly inside gruntfile.js :D [14:10:52] (03PS2) 10Mholloway: [Android] Capture lint reports from new /build/reports directory [integration/config] - 10https://gerrit.wikimedia.org/r/341473 [14:11:01] they would probably be there if there was no problems with it [14:11:31] hashar: i'd think that would be a good idea [14:21:56] Don't want to bug you with unnecassary stuff, so I'm asking this first: Is it a usual thing that CI fails but then passes without changes by just doing recheck and I shouldn't care about this or are you guys interested in getting information about such cases to debug? [14:23:02] PROBLEM - Puppet run on deployment-copper is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:34:57] hashar: do you remember, is there a way to rerun a cucumber scenario if it fails from a cucumber hook? [14:35:05] I remember seeing it somewhere, can not find it now [14:38:39] zeljkof: with the Around() hook? [14:38:58] I think so [14:40:11] maybe this, but not sure [14:40:11] https://github.com/cucumber/cucumber-ruby/issues/882#issuecomment-182280246 [14:40:19] but maybe cucumber shallow all exceptions [14:40:41] 10Gerrit, 06Developer-Relations, 10Developer-Wishlist (2017): Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#3080158 (10TheDJ) Small thought. Remember the discussion we had at the devsummit regarding human interaction... with regard to easy tasks @QuimGil a... [14:41:41] zeljkof: pretty sure I suggested Around, but maybe that happens too late [14:41:49] eg after the exception got handled by cucumber itself [14:42:12] I remember seeing some code somewhere that looked promising... 
can not find it now [14:43:19] I give up for today, need to sleep on it [14:43:28] nothing worked, whatever I tried [14:44:01] hm, one more idea [14:50:01] zeljkof: and I polished my follow up patch https://gerrit.wikimedia.org/r/341528 Draft to enhance webdriver.io integration :) [14:53:35] hashar: I am implementing changes you have suggested for webdriverio patch [14:55:37] zeljkof: I did them already [14:55:50] hashar: oh, sorry, did not notice [14:56:16] I don't see your PS in gerrit [14:56:19] did you push it? [14:56:19] you can review my follow up patch [14:56:45] sure, but it's little use of it until the parent one is merged [15:00:01] zeljkof: commented on my https://gerrit.wikimedia.org/r/#/c/341528/2 [15:00:03] why do you check experimental on it ? [15:00:16] ohh [15:00:20] for npm run selenium ... [15:01:22] yes [15:05:06] PROBLEM - Puppet run on deployment-aqs01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [15:05:15] zeljkof: you can have a glance at my follow up patch [15:05:23] if that make sense I will move the bits to your patch [15:05:26] hashar: looking [15:05:31] and I guess we will have something robust :} [15:05:44] spawning chromedriver automagically with the onPrepare hook, I will craft another change for that [15:05:46] it does not run on my machine :( [15:06:00] needs some review by someone who knows about node js [15:06:02] bah [15:06:57] hashar: take a look at my comment [15:07:49] works fine in jenkins: https://integration.wikimedia.org/ci/job/mediawiki-core-selenium-jessie/71/console [15:07:54] bah [15:08:40] fixed in PS3 [15:10:45] hashar: ok, works now, but all the tests fail for me, looks like the default config no longer works for me [15:12:10] under vagrant? [15:12:10] hashar: how do I tell it I want to target local vagrant instance? 
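A minimal sketch of the idea mentioned at 12:52 and 15:05, starting chromedriver from the webdriver.io `onPrepare` hook. This is not the code from the Gerrit change under discussion. The helper name `makeDriverHooks`, the port and the URL base are assumptions chosen for illustration, and a real setup would probably also need to wait until the driver accepts connections before the tests start.

```javascript
'use strict';
const childProcess = require('child_process');

// Hypothetical helper: builds the two wdio hooks around a spawn function,
// so the process handling can be exercised without a real chromedriver binary.
function makeDriverHooks(spawn = childProcess.spawn) {
  let driver = null;
  return {
    // webdriver.io calls onPrepare once, before any worker is launched.
    onPrepare() {
      driver = spawn(
        'chromedriver',
        ['--port=4444', '--url-base=/wd/hub'], // assumed values
        { stdio: 'ignore' }
      );
    },
    // onComplete runs once after all workers have finished.
    onComplete() {
      if (driver) {
        driver.kill();
        driver = null;
      }
    }
  };
}

module.exports = { makeDriverHooks };
```

In a wdio.conf.js the returned hooks could be merged into the exported config, for example `exports.config = Object.assign({ /* specs, capabilities, ... */ }, makeDriverHooks());`.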
[15:12:25] no, I am running from host machine, targeting vagrant vm
[15:12:30] ohhhhhhhhh
[15:12:51] what I did is that the default conf ( tests/selenium/wdio.conf.js ) looks at env variables
[15:13:10] eg: MEDIAWIKI_USERNAME MEDIAWIKI_PASSWORD MW_SERVER MW_SCRIPT_PATH
[15:13:43] the vagrant one is only triggered if your username is vagrant
[15:14:11] so I should do: USERNAME=vagrant npm run selenium
[15:14:37] no
[15:14:41] still fails
[15:15:08] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:15:54] USER
[15:15:59] err
[15:16:05] I think I got a fix :}
[15:18:52] this works for me: USER=vagrant npm run selenium
[15:19:12] yup
[15:19:31] the Gruntfile has a switch to point to the vagrant config file
[15:20:12] and with PS4 I made it so you can run: npm run selenium:vagrant
[15:20:22] which invokes grunt webdriver:vagrant
[15:20:30] https://gerrit.wikimedia.org/r/#/c/341528/3..4/Gruntfile.js
[15:26:30] !log ci-staging, enabling puppet master auto signing ( puppetmaster::autosigner: true )
[15:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:47:34] Beta-Cluster-Infrastructure, Patch-For-Review, User-Ladsgroup, Wikidata-Sprint: Make beta German Wiktionary - https://phabricator.wikimedia.org/T150764#3080397 (WMDE-leszek) Open>Resolved https://de.wiktionary.beta.wmflabs.org/wiki/Hauptseite is there, thus closing the ticket.
[16:03:18] Gerrit, Release-Engineering-Team: Update gerrit to 2.13.6 - https://phabricator.wikimedia.org/T158946#3080461 (Paladox) I think they're planning a .7 release this month.
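The behaviour described between 15:12 and 15:20 can be sketched roughly as follows. The environment variable names and the `USER=vagrant` switch come from the conversation. The function names, the vagrant config file name and the default values are hypothetical and only there to make the sketch self-contained. It is not the content of the actual Gruntfile.js or wdio.conf.js.

```javascript
'use strict';

// Choose a config file from the environment. The log only confirms that
// USER=vagrant selects the vagrant config; the vagrant file name is assumed.
function pickConfigFile(env) {
  return env.USER === 'vagrant'
    ? 'tests/selenium/wdio.conf.vagrant.js' // hypothetical file name
    : 'tests/selenium/wdio.conf.js';
}

// Read the target wiki settings from the environment variables named in
// the log. The fallback values are placeholders, not the real defaults.
function targetFromEnv(env) {
  return {
    username: env.MEDIAWIKI_USERNAME || 'Admin',
    password: env.MEDIAWIKI_PASSWORD || '',
    baseUrl: (env.MW_SERVER || 'http://127.0.0.1:8080') +
      (env.MW_SCRIPT_PATH || '/w')
  };
}

module.exports = { pickConfigFile, targetFromEnv };
```

Under these assumptions, `USERNAME=vagrant` has no effect because only `USER` is inspected, which matches what zeljkof ran into at 15:14.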
[16:04:10] RainbowSprinkles thanks for doing the lower case usernames in gerrit :) [16:43:36] (03CR) 10Niedzielski: [C: 032] [Android] Capture lint reports from new /build/reports directory [integration/config] - 10https://gerrit.wikimedia.org/r/341473 (owner: 10Mholloway) [16:45:59] (03Merged) 10jenkins-bot: [Android] Capture lint reports from new /build/reports directory [integration/config] - 10https://gerrit.wikimedia.org/r/341473 (owner: 10Mholloway) [16:56:09] RainbowSprinkles i figured out another fix for the prefix problem that should hopefully fix it one and for all :), the user of the server should create a js file under prefixed url like /r/ <-- thats the dir on the disk. then do var polygerrit_baseurl = '/r'; [16:56:19] which would then be loaded in polygerrit. [16:57:05] That could also fix it so we wont need rewrites but i would need to test something else for that :) [17:37:55] do the wikis 'ee_prototypewiki', 'en_rtlwiki', or 'deploymentwiki' mean anything with regards to beta cluster? I have some old search indexes for them on the deployment-prep elasticsearch servers but `expanddblist all` from deployment-tin doesn't think they exist [17:39:42] ebernhardson: https://en-rtl.wikipedia.beta.wmflabs.org/wiki/Main_Page [17:40:06] https://deployment.wikipedia.beta.wmflabs.org/wiki/Main_Page [17:40:17] hmm, wonder where they come from if not in `expanddblist all` :S [17:40:26] https://noc.wikimedia.org/conf/highlight.php?file=all-labs.dblist [17:40:46] I've got a feeling that ee_prototype is dead [17:41:00] oh i completely forgot, beat uses all-labs and not all. Thanks! 
[17:41:04] heh [18:16:00] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0] [18:24:48] 10Deployment-Systems: Setup alerting for l10nupdate - https://phabricator.wikimedia.org/T107515#3081136 (10greg) [18:25:06] 10Deployment-Systems, 10Monitoring: Setup alerting for l10nupdate - https://phabricator.wikimedia.org/T107515#1496925 (10greg) [18:28:25] 10Continuous-Integration-Config, 06Wikipedia-Android-App-Backlog, 07Technical-Debt: Figure out the right way to keep the Android SDK up to date in CI - https://phabricator.wikimedia.org/T158456#3081168 (10Mholloway) `integration-slave-jessie-100[1,2]` both still have an old version of the platform-tools, v.... [18:31:53] 10Scap: Add a delay configuration option to checks - https://phabricator.wikimedia.org/T159867#3081181 (10thcipriani) [18:35:37] 10Scap, 06Services (later), 15User-mobrovac: Delay repooling trending service after a restart - https://phabricator.wikimedia.org/T156687#3081211 (10thcipriani) >>! In T156687#3071520, @Fjalapeno wrote: > @mobrovac what is the ETA on this? Just curious for planing FWIW, the ability to use the work-around i... [18:42:05] 10Continuous-Integration-Config, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review, 07Technical-Debt: Figure out the right way to keep the Android SDK up to date in CI - https://phabricator.wikimedia.org/T158456#3081237 (10Mholloway) [18:44:21] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta cluster scap job ( beta-scap-eqiad ) fails due to puppet erasing /etc/ssh/ssh_known_hosts - https://phabricator.wikimedia.org/T159332#3081251 (10thcipriani) 05Open>03stalled So I put a dumb fix in place and copied a good version of the `known_... [19:35:33] 10Scap, 10Parsoid: Check 'depool' exceeded 30.0s timeout - https://phabricator.wikimedia.org/T159387#3081502 (10Arlolra) @thcipriani Thanks for the pointers. We'll discuss what's appropriate as a team. 
Please have a look at D588 though, since I got, ``` 21:40:51 [wtp1014.eqiad.wmnet] Check 'repool' exceeded... [19:37:14] 10Gerrit: git review -d fails with 404 on https://gerrit.wikimedia.org/changes/ endpoint - https://phabricator.wikimedia.org/T159869#3081507 (10EBernhardson) [19:39:29] 10Gerrit: On Dashboard, add column with reviewer/reviewcount - https://phabricator.wikimedia.org/T50827#3081539 (10demon) 05Open>03Resolved a:03demon This is mostly ok imho [19:43:55] (03CR) 10Phedenskog: "Yes we should try that, we haven't done any testing since we increased the size of the agent. I think we should make it 7 runs to make sim" [integration/config] - 10https://gerrit.wikimedia.org/r/341461 (owner: 10Krinkle) [19:44:58] 10Gerrit: git review -d fails with 404 on https://gerrit.wikimedia.org/changes/ endpoint - https://phabricator.wikimedia.org/T159869#3081507 (10Paladox) We could just do a rewrite? @demon ? Since duplicate tasks keep getting created. [19:50:04] (03CR) 10Phedenskog: "We could also drastically increase the number, not the magic 21 (https://calendar.perfplanet.com/2016/measuring-webpagetest-precision/) bu" [integration/config] - 10https://gerrit.wikimedia.org/r/341461 (owner: 10Krinkle) [19:56:08] 10Gerrit: git review -d fails with 404 on https://gerrit.wikimedia.org/changes/ endpoint - https://phabricator.wikimedia.org/T159869#3081637 (10demon) I'm not rewriting/patching/maintaining `git-review` [20:01:01] (03CR) 10Krinkle: "@Peter: Sounds good. I'm worried about job execution time though." [integration/config] - 10https://gerrit.wikimedia.org/r/341461 (owner: 10Krinkle) [20:35:06] 06Release-Engineering-Team, 06Operations, 10Phabricator, 13Patch-For-Review: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3081786 (10Paladox) I believe that we now have full support for debian jessie as far as i can tell.... [20:38:22] thcipriani: o/ any success? 
:]
[20:39:57] hashar: not particularly, trying to run gradle build. unable to find tools.jar...still working on that one. Slowly working through error messages.
[20:40:23] but! I think that's the last step
[20:40:31] well.
[20:40:39] last step in building .hpi anyway
[20:41:04] I realized that building locally used a much newer version of the jdk, so jenkins was unhappy
[20:41:54] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #326: FAILURE in 52 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/326/
[20:43:22] thcipriani: I have hit that tools.jar issue a bunch of times
[20:43:48] $ dpkg -S /usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar
[20:43:48] openjdk-7-jdk:amd64: /usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar
[20:44:02] thcipriani: maybe you need that jdk package installed?
[20:44:28] and sometimes build scripts fail to find it. Might want to set JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
[20:44:51] or who knows which env variable
[20:46:30] hrm
[20:47:05] so I have java-7-openjdk installed
[20:47:20] I don't have the lib directory in /usr/lib/jvm/java-7-openjdk
[20:47:42] libmaven3-core-java : Breaks: gradle (< 2.7-4) but 1.5-2 is to be installed
[20:47:45] F********
[20:48:08] I had to use backports to get a new enough version of gradle fwiw
[20:48:29] sudo apt-get install -t jessie-backports gradle
[20:48:30] yeah
[20:48:37] looks like it is downloading the internet
[20:49:01] seems to be par for the course in my experience so far
[20:50:44] Release-Engineering-Team, Operations, Phabricator, Patch-For-Review: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3081822 (Dzahn) >>! In T158434#3081786, @Paladox wrote: > I believe that we now have full support...
[20:52:20] E_NOSPACELEFT
[20:52:24] wth!
[20:53:38] BUILD SUCCESSFUL -- Total time: 3 mins 47.687 secs
[20:53:52] where are you doing this?
[20:53:57] on my machine!!
[20:54:21] but maybe I have a gradle alias that always causes the build to pass
[20:54:26] it is on a Jessie box
[20:54:42] do you have openjdk-7-jdk installed?
[20:55:12] locally I'm on stretch
[20:55:34] maybe try pointing JAVA_HOME
[20:55:35] so I'm doing this on the ci-staging-jenkins-master01 box in my home dir at the moment :)
[20:55:40] the installs are under /usr/lib/jvm
[20:55:47] convenient
[20:56:02] and yeah that has openjdk-7-jdk
[20:56:19] cp: cannot create regular file ‘../jenkins-bootstrap/docker_build/plugins/’: No such file or directory
[20:56:20] bah
[20:56:25] (on my machine)
[20:56:51] yeah, I got that locally as well
[20:57:29] find /usr/lib/jvm -name tools.jar
[20:57:33] yields nothing on that instance
[20:57:53] openjdk-7-jdk:
[20:57:53] Installed: (none)
[20:57:58] there are several openjdk packages
[20:58:13] the instance only has the JRE one
[20:58:16] and you need the JDK
[20:58:25] (don't ask me what the difference is, I have absolutely no idea)
[20:58:42] so in theory: sudo apt-get install openjdk-7-jdk
[20:58:44] gradle
[20:59:14] (it looks like I am a genius on that one, but I must have wasted half a day fighting the same thing at some point)
[20:59:47] yeah, I realized I was misreading apt-cache policy output just now
[20:59:50] installing
[21:00:20] and you will need https://github.com/pearsontechnology/jenkins-bootstrap in the parent dir
[21:00:33] errr
[21:00:38] Cloning into 'jenkins-bootstrap'...
[21:00:38] Username for 'https://github.com': [21:01:28] yeah maybe that is just to create a docker that has jenkins + the plugin + everything [21:02:44] and apparently one can give it a version with VERSION [21:03:33] they also branch their versions instead of tagging looks weird [21:05:16] heh that is weird *cough* https://github.com/wikimedia/mediawiki/tree/wmf/1.29.0-wmf.14 *cough* [21:07:41] :D [21:11:08] so once you have the hpi [21:11:18] you can upload it via the jenkins webui in the plugin manager page [21:11:22] might have to reboot [21:11:35] and I guess you will need rkt installed [21:15:36] well [21:15:51] I see the deployenvironment builder at least [21:16:06] I guess that means it's installed for some value of "installed" [21:16:55] neat [21:17:18] :) [21:17:25] what is the url of the jenkins master already? [21:17:39] http://ci-staging-jenkins.wmflabs.org/ci/ [21:18:04] there is a neat thing to debug plugins [21:18:08] which is to add some loggers in http://ci-staging-jenkins.wmflabs.org/ci/log/ [21:18:17] that is based on I think log4j [21:18:27] which in python world would be 'logging' [21:18:29] you can define a logger [21:19:19] that would capture some key. [21:19:28] but the plugins need to define a namespace [21:29:47] 10Scap, 06Services (later), 15User-mobrovac: Delay repooling trending service after a restart - https://phabricator.wikimedia.org/T156687#3081937 (10Fjalapeno) @mobrovac thanks for the update! [21:38:06] 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.29.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T158996#3081986 (10mmodell) [22:00:47] 06Release-Engineering-Team, 06Operations, 10Phabricator, 10hardware-requests, 10ops-eqiad: replacement hardware for iridium (phabricator) - https://phabricator.wikimedia.org/T156970#3082031 (10Paladox) Bump, any update on this please? 
[22:09:46] 10Gerrit: git review -d fails with 404 on https://gerrit.wikimedia.org/changes/ endpoint - https://phabricator.wikimedia.org/T159869#3082098 (10Aklapper) See T154760 / T100987 [22:09:58] 10Gerrit: git review -d fails with 404 on https://gerrit.wikimedia.org/changes/ endpoint - https://phabricator.wikimedia.org/T159869#3082104 (10Aklapper) [22:10:01] 10Gerrit, 07Upstream: "git review -d XXX" doesn't work for http gerrit - https://phabricator.wikimedia.org/T100987#1325879 (10Aklapper) [22:10:44] (03CR) 10Hashar: "Would it help to split the two jobs in small chunks? Eg mobile vs desktop jobs." [integration/config] - 10https://gerrit.wikimedia.org/r/341461 (owner: 10Krinkle) [22:12:14] (03CR) 10Krinkle: "@Hashar: The bottleneck is the wpt agent (wpt.wmftest.org), not the Jenkins job. The job submits multiple runs at once already to reduce l" [integration/config] - 10https://gerrit.wikimedia.org/r/341461 (owner: 10Krinkle) [22:13:12] (03CR) 10Krinkle: "It's on AWS https://wikitech.wikimedia.org/wiki/WebPageTest and requires Windows." [integration/config] - 10https://gerrit.wikimedia.org/r/341461 (owner: 10Krinkle) [22:26:53] thcipriani: and I have bootstrapped a dummy zuul on ci-staging :) [22:26:58] https://horizon.wikimedia.org/project/instances/57071603-001e-4fd5-a195-26bf8629a5ba/ [22:29:31] fancy :) [22:29:55] RainbowSprinkles here's my other fix for polygerrit https://gerrit-review.googlesource.com/#/c/99004/ [22:29:56] hashar: its not nice to call zuul dumb xD [22:30:09] it's more of the fix i was trying to do in the first place + works too [22:30:22] it's a js config file [22:30:42] Zppix: DUMMY !! 
[22:30:53] thcipriani: will would need a gerrit at some point
[22:31:12] hashar: :/
[22:31:21] safari is having some caching problems at the moment
[22:31:28] thcipriani: also for k8s on ci-staging ( https://phabricator.wikimedia.org/T159864 ) we will probably need to bump the labs project quota
[22:31:35] thcipriani: but I have no idea how many instances are needed :/
[22:31:36] as in my testing forceing it to throw an error dosent work
[22:31:37] but does in chrome
[22:32:22] As long as i can still do my stuff from safari i dont mind
[22:32:55] yeah, unsure what the quota should be. we'll probably need to raise quota, but we'll file that when we know more, I guess.
[22:36:57] paladox: nice :)
[22:37:12] yep
[22:37:13] paladox: then I am not sure when or even whether we would turn on polygerrit
[22:37:17] not anytime soon for sure
[22:37:36] Ok
[22:37:37] I fixed that in https://gerrit-review.googlesource.com/#/c/99004/
[22:37:57] ideally polygerrit should have a prefix option so one could run it under something like review.example.org/poly/
[22:38:29] yeh
[22:38:31] thats what this https://gerrit-review.googlesource.com/#/c/99004/
[22:38:32] does
[22:38:37] i managed to fix it.
[22:38:57] beat
[22:39:01] neat
[22:39:11] i created a new js file for polygerrit (similar to gerrit.config but for polygerrit as you can't access java code in javascript)
[22:39:36] so what it means is we would create a js file under the home dir we set for the website in apache.
[22:39:50] we would then define var polygerrit_baseurl = '/r';
[22:39:57] !log upgrading jenkins02.ci-staging to jenkins 2.x
[22:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:40:20] thcipriani: and a second jenkins on jenkins02.ci-staging :]
[22:40:40] and then polygerrit will start working (of course needs rewrites still since i haven't got as far as using javascript in writing the html out)
[22:42:12] I noticed that in the instance list. A jenkins 2.x eh?
[22:43:05] Ooh jenkins 2 can i break it hashar ? No in all seriousness is this going to be a good update?
[22:43:35] so great
[22:43:37] you will love it
[22:43:48] best update, and I know updates
[22:43:57] I sense sarcasm
[22:44:07] :)
[22:44:11] it doesnt come with cold beer though :/
[22:44:17] i have jenkins 2.x running on https://gerrit-jenkins.wmflabs.org/
[22:44:21] hashar: i do !!!
[22:44:28] if I am lucky enough I will try an upgrade next week or the week after
[22:44:31] Oh its that :/ dang
[22:44:36] depends on how many crazy stuff happens
[22:45:24] I feel like i need a disclaimer stating im not responsible for any damages
[22:45:36] hashar here's the file
[22:45:37] http://gerrit-new.wmflabs.org/polygerrit_baseurl.js
[22:45:38] or how it would be defined
[22:50:00] Is there a decent easy to understand dashboard for all of jenkins besides integration.wikimedia.org
[22:51:07] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[22:59:47] thcipriani: http://ci-staging-jenkins02.wmflabs.org/ci/ !!! :]
[23:01:19] wow, weird
[23:02:00] that is to prevent folks from having a jenkins that is wide open on the internet I guess
[23:02:03] Weird
[23:02:12] the temp password is somewhere in /var/lib/jenkins
[23:02:27] will check tomorrow how we can hook it with lda
[23:02:28] p
[23:03:01] The same way with the other jenkins id assume
[23:03:04] yeah, I was able to login, takes you through some fancy setup wizard
[23:03:11] hashar: I hope you're not too fond of your children
[23:03:40] Reedy: I will get time with them tomorrow morning :]
[23:03:58] hashar: More, that dealing with ldap usually means sacrificing your first born
[23:03:59] ;)
[23:04:41] Lol
[23:05:16] thcipriani: tip.
I have added the whole /var/lib/jenkins in a git config
[23:05:32] so to rollback: stop jenkins, git reset, start jenkins :]
[23:05:42] did that on jenkins02
[23:05:55] heh
[23:05:59] and that is probably going to be my rollback plan for the prod upgrade
[23:06:18] Certainly worse ways of going about it
[23:18:13] * hashar waves
[23:26:06] RECOVERY - Puppet run on deployment-phab02 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:31:16] wo woop i have managed to get index.html working properly in polygerrit
[23:31:20] RainbowSprinkles ^^
[23:32:01] gerrit restart to adjust heap_limit ?
[23:32:54] Second fix https://gerrit-review.googlesource.com/#/c/99510/
[23:34:55] paladox: poly-gerrit needs poly-fixes
[23:35:05] lol, im fixing it
[23:35:36] i found a very good solution, yet i have no idea why google never came up with it
[23:35:45] only took me 4-5 months to figure out this solution
[23:36:01] merged gerrit config change to lower heap size
[23:57:32] Scap, Patch-For-Review, Services (later), User-mobrovac: Delay repooling trending service after a restart - https://phabricator.wikimedia.org/T156687#3082413 (mobrovac) Open>stalled Deployed and tested, setting as stalled until {T159867} is resolved so that we can remove the work-around.