[00:19:27] bd808: ugh, thanks
[00:20:27] greg-g: Just needed some weeding in our beautiful garden. :)
[00:20:40] :)
[00:37:18] Yippee, build fixed!
[00:37:19] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #246: FIXED in 45 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/246/
[02:21:35] PROBLEM - CI: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: integration.integration-dev-trusty.puppetagent.failed_events.value (33.33%)
[03:13:41] * Helder is excited about TDD after giving http://exercism.io a try
[03:40:25] Wikimedia Labs / deployment-prep (beta): Template:Artwork does not contain templatedata - https://bugzilla.wikimedia.org/71340 (dan) NEW p:Unprio s:normal a:None steps to reproduce ------------------ http://commons.wikimedia.beta.wmflabs.org/w/api.php?action=templatedata&titles=Template:Art...
[04:18:32] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #247: FAILURE in 45 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/247/
[04:34:37] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #16: FAILURE in 38 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/16/
[04:44:28] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #82: FAILURE in 9 min 49 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/82/
[05:06:23] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #28: FAILURE in 46 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/28/
[06:01:29] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #228: FAILURE in 46 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/228/
[06:20:24] Wikimedia Labs / deployment-prep (beta): wikidata beta (item pages, etc.) inaccessible with 503 errors - https://bugzilla.wikimedia.org/69708 (Ori Livneh) NEW>RESO/WOR
[06:22:39] Wikimedia Labs / deployment-prep (beta): HHVM crash logs need to go somewhere more visible than /tmp on the apache hosts - https://bugzilla.wikimedia.org/68459#c2 (Ori Livneh) NEW>RESO/FIX We're sending the logs to syslog now and forwarding them to fluorine and its beta equivalent. The log files...
[06:24:53] Wikimedia Labs / deployment-prep (beta): beta: ResourceLoader CSS URL gzipped twice, causing skins to be broken - https://bugzilla.wikimedia.org/68720#c3 (Ori Livneh) NEW>RESO/FIX Fixed in 4bdfea04cea2e54e6076a1c6b955ba1e1b51209f.
[06:49:29] Yippee, build fixed!
[06:49:29] Project browsertests-MobileFrontend-test2.m.wikipedia.org-linux-firefox-sauce build #191: FIXED in 46 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-test2.m.wikipedia.org-linux-firefox-sauce/191/
[07:14:24] Project beta-scap-eqiad build #23210: FAILURE in 30 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/23210/
[07:16:33] doh 30 mins
[07:18:10] Yippee, build fixed!
[07:18:10] Project beta-scap-eqiad build #23211: FIXED in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/23211/
[07:32:21] Wikimedia / Continuous integration: (Voting) -npm pipeline broken for MediaWiki-core, VisualEditor, (? other) repos - https://bugzilla.wikimedia.org/71314#c2 (Antoine "hashar" Musso) Timo wrote: > Hi Antoine, > > https://integration.wikimedia.org/ci/job/mediawiki-core-npm/ > > As of build 2463[1], i...
[07:41:49] !log Updated our Jenkins Job Builder fork 2d74b16..686265a
[07:41:52] Logged the message, Master
[07:49:19] (PS1) Hashar: Migrate mediawiki-core-npm to Zuul cloner [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163119 (https://bugzilla.wikimedia.org/71314)
[07:50:39] !log Pooled back integration-slave1006, was removed because of {{bug|71314}}
[07:50:42] Logged the message, Master
[07:52:39] (PS2) Hashar: Migrate mediawiki-core-npm to Zuul cloner [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163119 (https://bugzilla.wikimedia.org/71314)
[07:54:06] hashar: Special:Preferences on beta seems broken.
[07:54:20] kart_: some developer broke it so :]
[07:54:29] first step is to reproduce the issue, then file a bug :]
[07:54:36] http://es.wikipedia.beta.wmflabs.org/wiki/Especial:Preferencias
[07:54:51] ah. Just anything you're about to fix ;)
[07:57:49] https://bugzilla.wikimedia.org/show_bug.cgi?id=71345
[07:58:37] Wikimedia / Continuous integration: (Voting) -npm pipeline broken for MediaWiki-core, VisualEditor, (? other) repos - https://bugzilla.wikimedia.org/71314#c4 (Antoine "hashar" Musso) PATC>RESO/FIX Fixed by migrating the job to Zuul cloner which does not fetch submodules.
[07:58:57] (CR) Hashar: [C: 2] "tested/works :)" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163119 (https://bugzilla.wikimedia.org/71314) (owner: Hashar)
[07:59:58] kart_: that is a bug in WikimediaEvents
[08:00:07] (most probably)
[08:00:41] or some mw/core API interface broke
[08:02:24] (Merged) jenkins-bot: Migrate mediawiki-core-npm to Zuul cloner [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163119 (https://bugzilla.wikimedia.org/71314) (owner: Hashar)
[08:11:23] kart_: the patch is deployed in prod but apparently doesn't cause any issue
[08:12:31] so I have no clue
[08:22:41] hashar: Thanks for comments there.
[08:25:55] zeljkof: hi
[08:26:08] aharoni: morning :)
[08:26:32] iirc, yesterday you mentioned that it's possible to run the screenshots job manually with one custom language code.
[08:26:41] am I remembering correctly?
[08:27:13] (And by the way, Japanese works perfectly!)
[08:28:32] aharoni: it should be possible; it is not available at the moment, but it should not be hard to change the Jenkins job to support it
[08:29:01] Yippee, build fixed!
[08:29:01] Project browsertests-VisualEditor-test2.wikipedia.org-linux-firefox-sauce build #205: FIXED in 1 hr 1 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-test2.wikipedia.org-linux-firefox-sauce/205/
[08:29:15] zeljkof: the reason I'm asking is that I'd like to try it with Persian (fa) and Arabic (ar)
[08:29:47] both have over 90% translation
[08:29:49] aharoni: should I make the change to the job today?
[08:30:07] If it doesn't take too much of your time, it would be nice.
[08:30:13] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #183: FAILURE in 9 min 2 sec: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/183/
[08:30:59] aharoni: I should be able to do it in 30 minutes or so
[08:31:07] zeljkof: I would just add these languages to the list in job-templates-browsertests, but first I'd like to make sure that the fonts are not broken.
[08:31:19] aharoni: sure, will try first
[08:32:22] GAAAA, the ULS testing job failed because of the WikimediaEvents bug.
[08:40:04] kart_: fixed by ori :-]
[08:41:28] kart_: a deployed patch had some issue and the fix wasn't in the master branch
[08:41:31] http://es.wikipedia.beta.wmflabs.org/wiki/Especial:Preferencias works \O/
[08:41:46] hashar: kart_ - thanks for handling this
[08:41:56] oh ori fixed it up :]
[08:42:22] the trick was to look at the stacktrace which mentions WikimediaEvents, then look at recently merged patches
[08:42:23] Now I can change my UI language from Kyrgyz to something ... just as exotic.
[08:44:08] And since I'm here in the QA channel, I'd like to extend my huge thanks - YET AGAIN - to zeljkof and hashar and kart_ for the help with the VisualEditor translated screenshots GSoC project.
[08:44:32] aharoni: you are welcome :)
[08:44:39] and happy new year!
[08:44:52] https://www.mediawiki.org/wiki/Help:VisualEditor/User_guide/ja
[08:45:13] Maybe you don't know Japanese (I don't either), but you can see the translated screenshots.
[08:45:24] cool
[08:45:33] Now it's done with zero effort, thanks to your help with setting this up.
[09:02:52] (PS1) Tobias Gritschacher: Move wikidata-performance browsertests job to WMF Jenkins [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163129
[09:03:23] (CR) jenkins-bot: [V: -1] Move wikidata-performance browsertests job to WMF Jenkins [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163129 (owner: Tobias Gritschacher)
[09:08:28] (PS2) Tobias Gritschacher: Move wikidata-performance browsertests job to WMF Jenkins [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163129
[09:12:30] hashar: any chance that we could have https://wiki.jenkins-ci.org/display/JENKINS/Performance+Plugin on the WMF Jenkins?
[09:12:31] :)
[09:12:34] guten morgen
[09:13:44] hashar: the reason is, I want to move the last Wikidata job from our Jenkins instance to WMF Jenkins, and this is the UI performance browsertest
[09:14:09] and that would require this plugin, to get the nice graph you can see here:
[09:14:10] http://wdjenkins.wmflabs.org/ci/job/wikibase-build-browsertests-ui-performance/
[09:15:17] Tobi_WMDE_SWE: oh I was not aware of that plugin
[09:15:39] Tobi_WMDE_SWE: will probably need to add support for it in Jenkins Job Builder
[09:16:26] I am not sure the dashboard will be of much help since the Gerrit patchsets being tested are not a series
[09:16:34] hashar: it has support I guess, we used Jenkins Job Builder as well
[09:16:41] \O/
[09:16:46] see https://gerrit.wikimedia.org/r/#/c/163129/
[09:17:01] line 598
[09:17:08] Tobi_WMDE_SWE: can you file a bug for it under Wikimedia -> Continuous integration?
[09:17:19] I am unlikely to be able to craft that today, but definitely next week
[09:17:24] hashar: for installing the plugin? yes sure
[09:21:08] (Abandoned) Zfilipin: Environment variable MEDIAWIKI_API_UPLOAD_URL can now be set using build parameter in language screenshot job [integration/jenkins-job-builder-config] (cloudbees) - https://gerrit.wikimedia.org/r/156138 (owner: Vikassy)
[09:23:41] Wikimedia / Continuous integration: Install Jenkins PerformancePlugin - https://bugzilla.wikimedia.org/71347 (tobias.gritschacher) NEW p:Unprio s:normal a:None Wikidata has a browsertest job that tests the overall performance of its UI. Basically the time the test needs is measured and show...
[09:23:49] hashar: https://bugzilla.wikimedia.org/show_bug.cgi?id=71347
[09:25:09] Tobi_WMDE_SWE: Danke!
[09:25:23] Wikimedia / Continuous integration: Install Jenkins PerformancePlugin - https://bugzilla.wikimedia.org/71347#c1 (Antoine "hashar" Musso) Plugin: https://wiki.jenkins-ci.org/display/JENKINS/Performance+Plugin Apparently already supported by Jenkins Job Builder.
[12:01:34] (CR) Zfilipin: [C: 1] "I will leave it up to Antoine to merge this or not. It looks good to me and it reminds me that we need to do some serious refactoring of t" [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163129 (owner: Tobias Gritschacher)
[12:30:33] hashar: around?
[12:44:25] hashar: is it possible to make the list of languages an optional parameter of this job?
[12:44:40] https://integration.wikimedia.org/ci/view/BrowserTests/view/-All/job/browsertests-VisualEditor-language-screenshot-linux-firefox/build?delay=0sec
[12:44:51] like MEDIAWIKI_API_UPLOAD_URL is set up to be
[12:45:01] something similar to https://gerrit.wikimedia.org/r/#/c/157049/8/job-templates-browsertests.yaml,unified
[12:45:04] * zeljkof brb
[13:25:57] hashar: around?
[13:26:39] zeljkof: yeahhh
[13:26:48] zeljkof: sorry, net is flappy and I was finishing up some script
[13:27:01] hashar: did you see my earlier question?
[13:27:04] yeah
[13:27:08] zeljkof: I have no idea :]
[13:27:15] hashar: :)
[13:27:29] I was playing with it and it was not clear to me how to do it
[13:27:29] the list of languages is an axis, it might be able to take a default value
[13:27:52] gotta test with a job
[13:27:59] gotta try with a test job
[13:28:08] hashar: that was my idea too
[13:31:55] zeljkof: https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+Dynamic+Parameter+Plug-in
[13:32:01] zeljkof: might do
[13:32:11] hashar: will try
[13:32:15] and it is supported by jjb via parameters: http://ci.openstack.org/jenkins-job-builder/parameters.html#parameters.dynamic-choice
[13:32:45] the plugin is quite old though :/
[13:33:27] zeljkof: there is also https://wiki.jenkins-ci.org/display/JENKINS/Extensible+Choice+Parameter+plugin
[13:34:34] zeljkof: and https://wiki.jenkins-ci.org/display/JENKINS/DynamicAxis+Plugin ahaha
[13:35:44] http://ci.openstack.org/jenkins-job-builder/project_matrix.html <-- grep for dynamic
[13:35:49] - axis:
[13:35:49]     type: dynamic
[13:35:49]     name: config
[13:35:49]     values:
[13:35:49]       - config_list
[13:44:34] zeljkof: https://integration.wikimedia.org/ci/job/test-hashar-opt-axis/
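zeljkof's question above is whether the screenshot job's language list can become an optional build parameter with a default, the way MEDIAWIKI_API_UPLOAD_URL can be overridden per build. The Python sketch below covers only the parsing side of that idea. The parameter name `LANGUAGES`, the placeholder default list and the language-code shape check are assumptions for illustration; they are not taken from the Jenkins job or from hashar's test-hashar-opt-axis job.

```python
import os
import re

# Placeholder default list (assumption); the real one lives in job-templates-browsertests.
DEFAULT_LANGUAGES = ["en", "ja"]

# Loose shape check for codes such as "fa", "ar", "zh-hans" or "be-tarask".
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]+)*$")


def resolve_languages(raw, default=DEFAULT_LANGUAGES):
    """Turn an optional "fa, ar"-style parameter into a list of language codes.

    An unset or blank parameter means "run every default language".
    Codes are lower-cased and de-duplicated, keeping their first position.
    """
    if raw is None or not raw.strip():
        return list(default)
    codes = []
    for token in re.split(r"[,\s]+", raw.strip()):
        if not token:
            continue  # e.g. a leading comma
        code = token.lower()
        if not LANGUAGE_CODE.match(code):
            raise ValueError(f"not a language code: {token!r}")
        if code not in codes:
            codes.append(code)
    return codes


def languages_from_environment(name="LANGUAGES"):
    """Read the hypothetical build parameter, which Jenkins exports as an env var."""
    return resolve_languages(os.environ.get(name))
```

With this sketch, starting a build with `LANGUAGES="fa ar"` would cover only Persian and Arabic, while leaving the field empty keeps the full default list. Rejecting malformed codes early keeps a typo from silently producing an empty matrix.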
[14:41:26] \O/
[14:41:46] I am trying to figure out how to get the template that vikasyaligar created to support the Chinese traditional/simplified variants ( https://www.mediawiki.org/wiki/Template:Language_screenshot_URL ).
[14:42:01] Templates are PAIN IN THE BUTTOCKS.
[14:42:44] haha !
[14:44:53] zeljkof: how do you test whether all the scenarios are successful in Wikimedia Jenkins? If I run the language screenshot job, it will run for all the languages. But all I want to check is whether it works in Jenkins for a particular language. I want to run the language screenshot job only when I am sure that all Cucumber tests are passing.
[14:45:21] vikasyaligar: I am working on that :)
[14:45:21] zeljkof: all I can think of is to create a new job
[14:45:26] zeljkof: great !
[14:45:38] vikasyaligar: yes, creating a test job with just one language is the way to go now
[14:46:21] Yippee, build fixed!
[14:46:21] zeljkof: and let the user enter one language through the Jenkins web interface, correct?
[14:46:22] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #83: FIXED in 9 min 17 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/83/
[14:46:38] aharoni: yes
[14:46:44] zeljkof: perfect
[14:46:50] hashar helped me get started with that
[14:47:00] it was slightly more difficult than I thought it would be
[14:49:50] sounds awesome! zeljkof, can you cc me (or add me as a reviewer) when you are adding the patch?
[14:51:29] vikasyaligar: sure
[14:51:41] aharoni, vikasyaligar: this is what hashar has done as a test https://gerrit.wikimedia.org/r/#/c/163159/
[14:51:56] the job: https://integration.wikimedia.org/ci/job/test-hashar-opt-axis/
[14:52:40] PROBLEM - CI: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: integration.integration-dev-trusty.puppetagent.failed_events.value (100.00%)
[14:56:20] wow ! never thought we could use build parameters like that :)
[14:58:07] Wikimedia / Continuous integration: (Voting) -npm pipeline broken for MediaWiki-core, VisualEditor, (? other) repos - https://bugzilla.wikimedia.org/71314#c6 (Bartosz Dziewoński) RESO/FIX>REOP This is still broken today, but in a different way now. https://gerrit.wikimedia.org/r/#/c/163168/ http...
[14:59:43] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #248: ABORTED in 22 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/248/
[14:59:51] hashar: is ^ known?
[15:01:36] hola YuviPanda !
[15:01:40] long time no chat
[15:01:42] heyo aharoni
[15:01:43] indeed :)
[15:01:52] zeljkof: - is that actually usable?
[15:01:54] aharoni: how're you doing?
[15:02:13] Can I run it with a particular language code?
[15:02:38] aharoni: sorry, in a meeting, halfway through the patch
[15:02:45] zeljkof: not urgent
[15:02:54] YuviPanda: life is good
[15:03:24] YuviPanda: what is known? :D
[15:03:33] hashar: puppet failing in integration-dev-trusty
[15:04:02] (PS1) Zfilipin: WIP pick a language [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163175
[15:04:31] hashar: npm jobs are failing on slave1006 again -- https://integration.wikimedia.org/ci/job/mediawiki-core-npm/2484/console
[15:05:48] Jenkins says permission denied when I try to view the build history on that node to see if all npm jobs die there or just some
[15:05:51] YuviPanda: yeah I have shut down puppet on all instances (see -labs)
[15:06:09] hashar: yeah, but this one particular one has *failed events*, and has had them for a day or so, I think?
[15:06:10] YuviPanda: andrewbogott is using the integration project to test an LDAP switch
[15:06:15] bd808: :-(
[15:06:23] hashar: yea, this is unrelated :)
[15:06:28] bd808: at least that is not git related :]
[15:06:53] bd808: I have dropped the Jenkins git plugin in favor of using Zuul cloner. So it just clones mediawiki/core without any submodules
[15:07:35] maybe REL1_24 is broken? :(
[15:08:11] "Error: Cannot find module 'coffee-script'" looks like missing packages?
[15:08:35] * bd808 has not looked into the root cause
[15:12:07] That hhvm regression is annoying. I guess someone should make an upstream patch to declare all of the missing curl constants.
[15:12:16] i got sh: grunt: command not found
[15:12:16] :(
[15:12:41] so continued node sadness there
[15:13:17] Is there a missing puppet role or something? Or maybe just some kind of local corruption
[15:13:37] bd808: also puppet agent is disabled on all integration boxes on purpose
[15:13:46] andrew b is using the project to test an LDAP switch
[15:13:55] the testing for the new certs, yeah
[15:14:18] the npm issue is unrelated to the bug imho
[15:14:19] * bd808 caught up on most backscroll this morning
[15:14:45] Is that the only 14.04 slave?
[15:15:11] * bd808 sees that 07 and 08 are trusty too
[15:16:05] hmmm... when I ssh into 06 and type `grunt` it works. I get the "no gruntfile" message
[15:16:44] $ type -a grunt -- grunt is /usr/local/bin/grunt
[15:17:07] bd808: we have 3 Trusty slaves: 1006 to 1008
[15:17:20] which have a Jenkins label UbuntuTrusty
[15:17:31] and Timo made the npm job use that label, so the jobs roam on all three nodes
[15:17:34] *nod*
[15:17:42] grunt might have been installed globally :/
[15:19:00] It looks to be. /usr/local/bin/grunt is a symlink to /usr/local/lib/node_modules/grunt-cli/bin/grunt
[15:19:11] yeah doesn't work on my laptop huhu
[15:19:20] timo probably ran npm -g install grunt
[15:20:43] and that would conflict with the jobs?
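bd808's manual check on slave1006 (`type -a grunt`, then following the /usr/local/bin/grunt symlink into /usr/local/lib/node_modules/) can be scripted with the Python standard library. This is a minimal sketch. Treating "resolves to somewhere under lib/node_modules" as meaning "installed with npm -g" is a heuristic based on the paths quoted in the log, not a rule npm guarantees.

```python
import os
import shutil


def resolve_command(name, path=None):
    """Return (path found on PATH, fully resolved target), or None if absent.

    Roughly what `type -a` plus following the symlink does by hand.
    """
    found = shutil.which(name, path=path)
    if found is None:
        return None
    return found, os.path.realpath(found)


def is_global_npm_module(resolved_target):
    """Heuristic (assumption): global npm packages live under .../lib/node_modules/."""
    parts = os.path.normpath(resolved_target).split(os.sep)
    return any(a == "lib" and b == "node_modules" for a, b in zip(parts, parts[1:]))
```

Running `resolve_command("grunt")` on a slave and then checking `is_global_npm_module` on the second element would flag the kind of globally installed grunt that Krinkle later says should stay grandfathered rather than spread.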
[15:21:22] I am trying to reproduce
[15:21:24] on slave1006
[15:21:56] maybe npm has some trouble dealing with dependencies :/
[15:22:19] PROBLEM - CI: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: integration.integration-dev-trusty.puppetagent.failed_events.value (100.00%)
[15:27:01] (PS2) Zfilipin: WIP cleaning up core [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/162910
[15:34:38] Wikimedia / Continuous integration: (Voting) -npm pipeline broken for MediaWiki-core, VisualEditor, (? other) repos - https://bugzilla.wikimedia.org/71314#c7 (Antoine "hashar" Musso) Seems to be an entirely different issue: 00:00:03.045 [mediawiki-core-npm] $ /bin/bash -xe /tmp/hudson65266686831012425...
[15:37:53] bd808: I am blaming npm for not installing dependencies :]
[15:39:13] Sounds about right. Why would an install tool resolve the package dependency chain? :/
[15:40:26] I'd vote for trying your idea of deleting the existing .npm cache
[15:41:40] bd808: will give Timo a chance to have a look at it
[15:42:08] Should we depool that slave again meanwhile?
[15:43:17] bd808: yeah probably
[15:43:41] bd808: or I can just drop the npm cache *evil*
[15:44:23] either works for me. If it's a real problem it will come back :)
[15:44:34] Yippee, build fixed!
[15:44:34] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #249: FIXED in 44 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/249/
[15:51:19] (PS3) Zfilipin: Core browser tests no longer have site specific Cucumber tags [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/162910 (https://bugzilla.wikimedia.org/67616)
[15:52:03] (PS4) Zfilipin: Core browser tests no longer have site specific Cucumber tags [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/162910 (https://bugzilla.wikimedia.org/67616)
[16:24:46] Project beta-scap-eqiad build #23264: FAILURE in 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/23264/
[16:33:03] Project beta-code-update-eqiad build #25869: FAILURE in 1.9 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/25869/
[16:46:52] Wikimedia Labs / deployment-prep (beta): Template:Artwork does not contain templatedata - https://bugzilla.wikimedia.org/71340 (Greg Grossmeier) p:Unprio>Normal
[16:49:53] Wikimedia Labs / deployment-prep (beta): Template:Artwork does not contain templatedata - https://bugzilla.wikimedia.org/71340#c1 (Greg Grossmeier) Works now? What'd you do?! ;)
[16:53:37] Wikimedia Labs / deployment-prep (beta): Template:Artwork does not contain templatedata - https://bugzilla.wikimedia.org/71340#c2 (James Forrester) NEW>RESO/WOR This sounds like it was an artefact of bug 50372.
[17:20:06] Project beta-update-databases-eqiad build #4319: FAILURE in 5.2 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/4319/
[17:43:17] bd808: I'm not sure what the context of that ping is, but if you're setting up a new instance, don't; if you do, read the documentation. Integration is not 100% in puppet. A clean instance must not be pooled.
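The "Cannot find module 'coffee-script'" failure on slave1006 was narrowed down in the discussion to npm not installing dependencies. The Python sketch below is a quick way to see which packages named in a project's package.json have no installed copy under node_modules. It is an assumption-laden check: it looks only at *direct* dependencies. A transitive package, which coffee-script may well have been here, would need the same check repeated inside each installed package.

```python
import json
import os


def missing_dependencies(project_dir, include_dev=True):
    """Return sorted package.json dependency names with no node_modules/<name>/package.json."""
    with open(os.path.join(project_dir, "package.json"), encoding="utf-8") as fh:
        manifest = json.load(fh)
    names = set(manifest.get("dependencies", {}))
    if include_dev:
        names |= set(manifest.get("devDependencies", {}))
    missing = []
    for name in sorted(names):
        # Scoped names such as "@scope/pkg" map to node_modules/@scope/pkg.
        installed = os.path.join(project_dir, "node_modules", *name.split("/"), "package.json")
        if not os.path.isfile(installed):
            missing.append(name)
    return missing
```

An empty result combined with a runtime "Cannot find module" error points away from package.json and toward something like the cache corruption identified later in the log.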
[17:43:44] https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup
[17:44:49] Krinkle: integration-slave1006 seems to have npm issues. I depooled yesterday. hashar repooled after fixing some git related problems. npm still causing problems there.
[17:46:48] hashar thought that deleting /mnt/home/jenkins-deploy/.npm/ might fix it but wanted to leave it for you to look at -- https://bugzilla.wikimedia.org/show_bug.cgi?id=71314#c7
[17:47:49] Yippee, build fixed!
[17:47:50] Project beta-code-update-eqiad build #25875: FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/25875/
[18:08:38] Wikimedia / Continuous integration: Jenkins: *-npm jobs broken for mediawiki-core and extensions - https://bugzilla.wikimedia.org/71314 (Krinkle)
[18:20:39] Yippee, build fixed!
[18:20:40] Project beta-update-databases-eqiad build #4320: FIXED in 38 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/4320/
[18:31:10] what happened this morning (UTC) to Beta Cluster? http://people.wikimedia.org/~gjg/betacluster/graphs.html
[18:35:45] greg-g: looks like a spike on mediawiki03
[18:35:58] security scanning perhaps?
[18:36:38] marxarelli: looks like only 01 and 02 to me...
[18:36:43] green/blue
[18:37:43] * marxarelli rubs his crusty eyeballs
[18:37:58] yeah, you're right. i misread the network io graph
[18:38:06] * greg-g nods
[18:38:08] :)
[18:52:46] (PS1) Krinkle: Use 'git clean -ff' instead of 'git clean -f' (force removal of submodules) [integration/jenkins] - https://gerrit.wikimedia.org/r/163221
[18:55:39] Wikimedia / Continuous integration: Jenkins: *-npm jobs broken for mediawiki-core and extensions - https://bugzilla.wikimedia.org/71314#c8 (Krinkle) REOP>RESO/FIX p:Unprio>Normal I've purged the .npm cache. The coffee-scripts dependency seems to have been a result of cache corruption. http...
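Purging the npm cache, the fix hashar proposed and Krinkle eventually applied, amounts to removing the cache directory so that npm downloads fresh copies on the next install. Here is a hedged Python sketch of that step. The default path is the one quoted in the log for the jenkins-deploy user. The basename guard and the dry-run default are safety additions of this sketch and are not known to be part of any CI script.

```python
import os
import shutil

# Path quoted in the log for integration-slave1006; treat it as an example.
NPM_CACHE = "/mnt/home/jenkins-deploy/.npm"


def purge_npm_cache(cache_dir=NPM_CACHE, dry_run=True):
    """Recursively delete an npm cache directory; return True if it existed.

    Refuses any path whose last component is not ".npm", so a typo
    cannot wipe a whole home directory. With dry_run=True nothing is removed.
    """
    cache_dir = os.path.abspath(cache_dir)
    if os.path.basename(cache_dir) != ".npm":
        raise ValueError(f"refusing to delete {cache_dir!r}: not an .npm directory")
    if not os.path.isdir(cache_dir):
        return False
    if not dry_run:
        shutil.rmtree(cache_dir)
    return True
```

Calling it first with the default `dry_run=True` confirms that the path exists before anything is deleted. Depooling the slave while doing this, as bd808 suggested, avoids pulling the cache out from under a running job.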
[19:00:24] Wikimedia / Continuous integration: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://bugzilla.wikimedia.org/70068 (Krinkle)
[19:01:17] (CR) Krinkle: [C: 2] Use 'git clean -ff' instead of 'git clean -f' (force removal of submodules) [integration/jenkins] - https://gerrit.wikimedia.org/r/163221 (owner: Krinkle)
[19:01:20] (Merged) jenkins-bot: Use 'git clean -ff' instead of 'git clean -f' (force removal of submodules) [integration/jenkins] - https://gerrit.wikimedia.org/r/163221 (owner: Krinkle)
[19:15:40] (CR) Krinkle: "The dir includes the workspace name. Those are not shared between parallel builds (that would be a disaster). I think the issue is somethi" [integration/jenkins] - https://gerrit.wikimedia.org/r/102149 (owner: Hashar)
[19:16:30] Krinkle: a lot has changed over the course of 10 months :]
[19:16:42] hashar: Hey
[19:17:01] hashar: Where is the bug about jobs failing in Trusty between wmf branches and master and leaving submodules behind?
[19:17:07] I can't find the bug or see an example of that failure
[19:17:19] I want to test if that patch fixes it, or whether we need to fix the git-clean that is inside Zuul/merge.py
[19:17:45] Krinkle: https://bugzilla.wikimedia.org/show_bug.cgi?id=71314
[19:17:58] Krinkle: your email is in comment #2
[19:17:59] got fixed
[19:18:07] then reopened because of the npm cache corruption
[19:18:27] Oh.., right
[19:18:50] hashar: Do you know what happened to /srv/deployment? Why is puppet no longer creating that?
[19:18:54] https://gist.githubusercontent.com/Krinkle/b6e5301c6279f1797c65/raw
[19:19:03] I created a new instance etc. but it's failing horribly
[19:19:12] the other ones I created 2 weeks ago were fine
[19:20:26] /srv/deployment ?
[19:20:29] on beta you mean?
[19:20:31] No
[19:20:33] integration
[19:20:56] some puppet manifest got refactored I guess
[19:21:22] or git::clone got refactored
[19:21:28] which host is that?
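The reason for Krinkle's `git clean -ff` change: according to the git-clean documentation, git refuses to remove untracked nested repositories (directories with their own .git) unless -f is given twice, and it only descends into untracked directories when -d is given. That same behaviour underlies hashar's concern that, with Zuul cloner putting mediawiki/vendor and extensions under /src, a `git clean -ff` could remove them. The Python sketch below lists nested repositories in a workspace and builds the clean command. Whether the CI scripts also pass -d is not stated in the log, so that flag is an assumption. Note that tracked submodules also contain a .git entry, so the listing shows candidates rather than only leftovers.

```python
import os


def nested_repositories(root):
    """Return relative paths of directories below root that contain a .git entry.

    Both a .git directory and a .git file (the pointer used by submodules)
    count as a nested repository.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath != root and (".git" in dirnames or ".git" in filenames):
            found.append(os.path.relpath(dirpath, root))
            dirnames[:] = []  # do not descend into the nested repository
            continue
        if ".git" in dirnames:
            dirnames.remove(".git")  # skip the workspace's own .git directory
    return sorted(found)


def clean_command(remove_nested_repos=True, recurse_untracked_dirs=True):
    """Build a git clean argv; run it with subprocess.run(argv, cwd=workspace)."""
    argv = ["git", "clean"]
    if recurse_untracked_dirs:
        argv.append("-d")  # assumption: the log only mentions -f and -ff
    argv.append("-ff" if remove_nested_repos else "-f")
    return argv
```

Listing the nested repositories before running the clean makes it visible which of them a `-ff` run could remove, such as an extension clone under /src, before anything is deleted.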
[19:22:04] I am pretty sure git::clone() used to create the parent dir
[19:22:21] I deleted it and re-created it in case it was an intermittent issue (per ops advice)
[19:22:26] I'll know in a minute
[19:22:51] regarding git clean -ff I am not sure :-]
[19:23:08] with Zuul cloner, mediawiki/core is cloned under /src/
[19:23:10] well, since you made it use zuul-merger, I guess that's no longer an issue
[19:23:17] and mediawiki/vendor and extensions are somewhere under /src
[19:23:30] so a git clean -ff would remove everything under /src/ and just keep mw/core
[19:24:11] over the next couple of weeks I will trigger a new job that tests several extensions together
[19:24:18] and the job would be shared by those extensions
[19:24:23] for example Mantle / MobileFrontend
[19:25:14] the job will keep the extensions cloned, but the list of extensions to be loaded will be generated (i.e.: the job will not blindly load everything under extensions/* but a selected list of them)
[19:26:07] need to motivate myself to write some doc/RFC and submit it for approval by devs
[19:26:38] hashar: Where are we on job execution in standby/throwaway VMs?
[19:26:50] Is the plan approved? Are we going with labs-backend?
[19:27:01] no approval
[19:27:11] been busy working on Zuul cloner / fixing up random stuff
[19:27:16] and browsertests
[19:27:34] I did talk with coren last month
[19:28:04] seems labs can handle the use case. We might need to add some hardware to the platform, but not much to worry about
[19:28:27] we agreed I would have to write a technical document explaining the architecture / workflow and we can then implement it
[19:28:46] Yeah, for the longest time I also thought labs is the best approach
[19:29:22] Krinkle: /srv/deployment is not defined apparently : ((( fgrep '/srv/deployment]' /var/lib/puppet/state/resources.txt yields nothing
[19:29:29] however I'm increasingly concerned based on my experience as a volunteer. Basically every single damn time I log in to labs to do anything, everything feels broken all the time. It's always that one thing I need isn't there and doesn't work for whatever reason.
[19:29:41] That's quite frustrating to experience for ~10 months, every week.
[19:29:53] it used to be wayyyyy worse, trust me :]
[19:30:06] you could not log in for some reason
[19:30:13] GlusterFS would go mad twice per week or so
[19:30:24] I know there's more stability and more features (as staff, I know those things), but as a user I'm not seeing any of that
[19:30:38] I just know it's always broken when I need it
[19:30:53] ops breaking puppet every other week isn't helping either
[19:31:31] Like yesterday I set out to do some experiments with local unit testing on integration slaves to use ubuntu/chromium headless instead of phantomjs.
[19:31:50] The idea was: I set up an instance with plain trusty and contint-slave roles from puppetmaster, and then go and experiment
[19:32:02] but instead it turns out our manifests are completely broken and don't even work on a clean instance.
[19:32:17] We have 0 tests for manifests in general.
[19:32:18] :-(((((((((((((((
[19:32:33] though you managed to set up the integration-slave Trusty without too much trouble
[19:32:40] In most cases people test changes before merging, but I'm not sure where integration/kss came from
[19:32:41] but yeah, puppet definitely needs integration tests
[19:32:51] which we can't really do unless they run in isolated VMs
[19:32:58] Yep
[19:33:04] integration/kss is a repo I created
[19:33:15] lamely based on the same idea as integration/phpunit
[19:33:23] i.e. have a copy of kss in a repo we maintain
[19:33:26] k
[19:33:31] to avoid having to download from npm
[19:33:36] I don't think we'll use that though
[19:33:38] but that was before we had labs slaves :D
[19:33:52] yeah integration/kss should be dropped. But I am not sure how to delete a repo
[19:33:59] I recommended the engineers working on it (s and prateek) to use Grunt and a local npm-install in the workspace that generates the documentation
[19:34:04] OK.
[19:34:48] yeah James F told me you were pushing folks to use grunt in their local repos
[19:34:57] So just in general: I don't want *any* new global npm modules. The only ones grandfathered are grunt and jshint (and csslint). Everything else via local package.json. The migration from global modules is extremely painful and error-prone. It just doesn't work and makes it hard for developers to test locally since it's a different version etc.
[19:35:02] so on CI side we just `npm test` and delegate the rest to devs
[19:35:09] especially for kss, because we already have npm for mediawiki-core
[19:35:18] +1 :]
[19:35:35] though we could just upgrade the global module, announce it and let devs fix their repo to match the new Vision ™
[19:35:40] but yeah local is fine
[19:35:45] We'd probably set up something like mediawiki-core-doc-publish (+kss?) which does 'npm install && grunt kss' and then sync to doc.wm.o
[19:36:03] ah yeah hmm
[19:36:06] hashar: doesn't work well with different projects. You can't atomically have all projects upgrade code in the same second.
[19:36:14] I can't remember when / with whom I was talking about doc generation
[19:36:25] the idea was merely to define a well-known entry point for which we would define jobs
[19:36:31] E.g. jshint 2 -> 3; there are changes that are required to be made. It's impossible to upgrade at once. Can't be done without causing people's projects to break master in a confusing way.
[19:36:42] such as: tox -e doc || bundle exec doc || or npm run-script doc
[19:36:59] So I'm recommending we instead announce the change, and people will update their own package.json and update any code at the same time.
[19:37:12] jscs is passing everywhere it is enabled.
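hashar's idea of a well-known documentation entry point means that CI calls one fixed command and each repository decides what it does, writing output to a known directory such as $WORKSPACE/doc/. The Python sketch below shows one way a job could pick the entry point, trying candidates in the order he listed them. The detection rules are assumptions: tox counts only if tox.ini defines a `testenv:doc` section, npm only if package.json has a `doc` script, and `bundle exec doc` is copied as phrased in the chat and only inferred from the presence of a Gemfile.

```python
import configparser
import json
import os


def doc_command(repo):
    """Return the argv of the first documentation entry point found in repo, else None."""
    tox_ini = os.path.join(repo, "tox.ini")
    if os.path.isfile(tox_ini):
        # interpolation=None: tox.ini values may contain "%" or "{...}" placeholders.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read(tox_ini)
        if parser.has_section("testenv:doc"):
            return ["tox", "-e", "doc"]
    if os.path.isfile(os.path.join(repo, "Gemfile")):
        return ["bundle", "exec", "doc"]  # as phrased in the log; heuristic only
    package_json = os.path.join(repo, "package.json")
    if os.path.isfile(package_json):
        with open(package_json, encoding="utf-8") as fh:
            scripts = json.load(fh).get("scripts", {})
        if "doc" in scripts:
            return ["npm", "run-script", "doc"]
    return None
```

This matches Krinkle's preference for local package.json tooling: a repository with `"doc": "grunt kss"` in its scripts would be documented through `npm run-script doc` without any new global npm module on the slaves.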
[19:37:14] and it will be up to dev to implement that entry point and have the doc generated to a well-known directory (such as $WORKSPACE/doc/) [19:37:38] I expect phpcs will never ever pass everywhere because 1) config is not in the local repo, so people don't use it locally, 2) even when it passes, any change we make to it will cause it to fail in some repos again [19:38:37] It's not at all a priority, but I hope to move the phpcs rules into the local repo (like jshintrc and jscsrc) so you can use plugins for your text editors that highlight things, and it'll allow people to update the settings atomically in the same commit [19:38:53] hashar: OK. What are your priorities for next quarter in CI? [19:39:10] Yippee, build fixed! [19:39:11] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #17: FIXED in 41 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/17/ [19:39:20] Do you think we should try and get VM testing to work for at least 1 job? [19:39:58] Krinkle: PHP CS might be nice to have, but the software/linter is bloated anyway [19:40:07] Krinkle: it is not that easy to define rules nor to install or even run it [19:40:15] Krinkle: don't waste too much time on phpcs imho [19:40:28] It's complicated the way we use it, it doesn't have to be. [19:40:33] Krinkle: my priorities are now set in the Release/QA group (greg-g) though I have some kind of liberty [19:40:38] I use it for dozens of projects on github, not complicated at all [19:40:51] if you make phpcs easier to use definitely go for it :] [19:40:55] hashar: Is CI now part of Release Eng? If so, we can push our priorities there too [19:41:04] Krinkle: not really [19:41:09] It has to fall under something. We can't keep running CI as a volunteer/20% effort. [19:41:18] Krinkle: I am attached to Release/QA but CI is not formally part of it :] [19:41:22] OK [19:41:40] Then I'll push elsewhere.
I refuse to keep it like this. Seriously. It's unworkable. [19:42:13] (I'll ask greg as well, it's not a pressing issue in the short term, but it really has to change) [19:42:22] Krinkle: my point is that CI should be a cross team effort with more folks involved [19:42:38] fire fighting is fine if the fires are predictable and limited, but I feel it's getting increasingly more stable because we're unable to deal with our tech debt right now [19:42:48] more unstable* [19:42:55] Krinkle: whenever we switch to Phabricator, Release/QA will be among the first to use it. So it will be easier to show CI related stuff to the rest of the team and train / assign them tasks [19:44:45] Krinkle: and as I used to say, CI is a 2.1 men project [19:44:59] 1 being me, the other 1 you and 0.1 the rest :-] [19:46:18] Krinkle: anyway for my CI related prio: 1) job that runs PHPUnit tests of different extensions cloned together [19:46:31] will probably migrate the qunit job to that system as well [19:46:58] 2) tech doc describing the sandboxed VM architecture and get it down [19:47:00] done [19:47:34] and among that set up a nightly cluster, the equivalent of beta cluster but with code/puppet deployed only once per day instead of continuously. But that one is going to be a team effort [19:52:08] Wikimedia / Continuous integration: Jenkins: Fail on BOM in submitted files - https://bugzilla.wikimedia.org/38233 (Krinkle) p:Normal>Low [19:52:32] Project browsertests-VisualEditor-test2.wikipedia.org-linux-chrome-sauce build #209: FAILURE in 59 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-test2.wikipedia.org-linux-chrome-sauce/209/ [19:52:58] hashar: https://bugzilla.wikimedia.org/show_bug.cgi?id=42961 is fixed, right?
[19:54:22] (PS1) Amire80: WIP: Add languages with translation over 90% [integration/jenkins-job-builder-config] - https://gerrit.wikimedia.org/r/163241 [19:57:08] Wikimedia / Continuous integration: split phpcs in voting and non-voting sniffs - https://bugzilla.wikimedia.org/46500#c7 (Krinkle) PATC>RESO/WON As part of simplification effort for phpcs, I'm closing this. The sniffer config should contain the errors and warnings we want to enforce. The rest i... [20:04:16] Yippee, build fixed! [20:04:17] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #29: FIXED in 45 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/29/ [20:06:53] Wikimedia / Continuous integration: Jenkins: Set up JavaScript code coverage analysis (e.g. JSCover) - https://bugzilla.wikimedia.org/48365#c6 (Krinkle) REOP>RESO/WOR JSCover didn't work out very well. I've standardised on istanbul[1] for the moment. There are qunit[2] and karma[3] integrations.... [20:07:08] Wikimedia / Continuous integration: Jenkins: Set up JavaScript code coverage analysis - https://bugzilla.wikimedia.org/48365 (Krinkle) p:Low>Normal a:Krinkle [20:09:18] Krinkle|detached: I have marked https://bugzilla.wikimedia.org/show_bug.cgi?id=42961 fixed :D [20:09:21] Wikimedia / Continuous integration: Jenkins: Clean workspace for mediawiki extension jobs - https://bugzilla.wikimedia.org/42961#c5 (Antoine "hashar" Musso) NEW>RESO/WOR That is no more an issue / got fixed :-) [20:16:18] hashar: Krinkle|detached: we (robla and I) were just talking about CI and ownership, actually :) [20:17:16] hashar: Krinkle|detached summary: let's be more explicit about the needs you two have so we can deal with it correctly.
It's been under the radar(ish) for too long [20:17:49] * robla catches up on backlog [20:19:06] Wikimedia / Continuous integration: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://bugzilla.wikimedia.org/70068 (Greg Grossmeier) [20:19:09] robla: TL;DR Timo has tons of super nice ideas, we don't have that much processing bandwidth since we keep firefighting :] [20:19:54] and I feel guilty about it. [20:21:36] hashar: Krinkle|detached we should get those ideas written down so others are aware of them/can help/can help prioritize [20:22:48] greg-g: such as a todo list ? [20:22:55] or a wall of cards :-D [20:23:00] that'd be a good start [20:28:25] greg-g: left myself a note for monday, will look at writing such doc :] [20:28:36] or Timo beat me to it and I amend his [20:30:11] hashar: sounds like he should write it? (just based on how you characterized it "timo has some really nice ideas") [20:30:19] * greg-g goes to next meetin [20:30:22] g [20:47:26] greg-g: So why would RelEng *not* own CI? [20:53:21] I guess nobody really owned CI besides Rob having me in his team :] [20:54:13] we have more and more synergy with QA at least [20:54:26] since they have the browser tests on the CI jenkins and use the same JJB recipes [20:55:59] seems like everyone should "own" CI [20:56:31] Yippee, build fixed!
[20:56:31] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #229: FIXED in 44 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/229/ [20:58:40] maybe we can set up a weekly CI check-in for devs to discuss / propose features / ask questions and so on [20:59:27] hashar: that seems like something awjr and teampractices should lead [20:59:47] that is a good idea [21:00:28] hashar: I found out just this week that Arthur has actually done Test Driven Development in PHP, for real [21:02:02] yeah that would be nice [21:02:07] Wikimedia / Continuous integration: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://bugzilla.wikimedia.org/70068#c3 (Antoine "hashar" Musso) I am just noticing this bug :-/ The apache configuration got moved from the standalone repository operations/apache-config.git to... [21:02:38] but that needs a culture shift among the org/community since we barely write tests for MediaWiki [21:03:32] hashar: that is why I think awjr and teampractices are the ones to drive the bus for Jenkins use [21:05:57] I need to relocate to Quebec [21:06:07] the timezone difference sucks [21:06:37] that being said, sleep I need [21:07:36] bd808: it doesn't not, it's just that Timo does a bunch of the work and that is opaque to me (not blaming him, blaming me really) [21:12:30] I guess most of Timo's work is related to feature needs and his love for Javascript [21:13:17] the scary part: https://bugzilla.wikimedia.org/buglist.cgi?component=Continuous%20integration&list_id=347346&product=Wikimedia&resolution=--- [21:13:24] 124 bugs found [21:13:33] that's just open CI bugs [21:14:09] scary part to me, as Release Team Manager [21:14:10] ;) [21:14:25] I get that it's a collaborative thing (like everything at WMF) but somebody should be staffing and budgeting for the test runner environment.
But I'm probably not telling anyone anything new. [21:14:43] right [21:14:49] agreed 100% [21:15:48] chrismcmahon: I have done real-life TDD with php too, but I personally hate TDD culture. (not practitioners, but the general vibe of the larger TDD community) [21:18:22] bd808: not sure what "TDD community" is, and I am a crummy programmer, but I've watched paired in TDD sessions and seen it work. PHP seems crippled though. [21:18:33] I do however value unit tests and integration tests and automated testing in general [21:18:58] crippled? phpunit not shiny enough? [21:19:16] php is lacking great bdd tools [21:19:54] There are a couple of attempts at them but without meta programming bdd frameworks can feel limited [21:20:01] but bdd !== tdd [21:20:20] At least to me tdd means test-first development [21:20:29] yeah, I think the highest use of Jenkins would be for TDD [21:20:32] write a test, watch it fail, make it pass [21:20:33] yes [21:20:50] red - green - refactor [21:20:58] * bd808 nods [21:21:53] My problem with strict tdd is that in my opinion, good architecture is not an emergent property of the development process [21:22:06] Project beta-update-databases-eqiad build #4323: FAILURE in 2 min 5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/4323/ [21:22:17] bd808: some say not to design? [21:22:19] I think rails proves this point well enough in and of itself [21:22:30] hashar: Yeah, so it consistently fails. I just tried again on a new instance [21:22:34] hashar: https://gist.github.com/Krinkle/c030d940a9a7d99d921f [21:23:13] Krinkle: will you be in the office on Monday? [21:23:21] Krinkle: something must have changed so :/ [21:23:31] Krinkle: I poked around in the puppet repo history a bit and can't see how /srv/deployment and the project specific dirs were ever created from the code there [21:23:52] bd808: Rails wasn't built with TDD. DHH caused a small shitstorm a few weeks ago by saying that automated UI tests trump unit tests.
The founder of SauceLabs was pleased. [21:24:05] At least without an initial trebuchet deploy [21:24:27] greg-g: hashar: I agree with hashar's ideal that CI should mentally be owned by everyone. Similar to code integrity, quality, performance and security (day-to-day). But for infrastructure, it needs maintenance. [21:24:28] chrismcmahon: ughhh [21:24:39] Krinkle: agreed, yeah. [21:24:55] and as such people that have dedicated hours to that maintenance (not on a good faith basis silently compromising other obligations/expectations) [21:25:03] the maintenance has grown out of control since the CI architecture is way more complicated than when I started [21:25:22] Yeah, we keep having to patch it, but every now and then, tech debt has to be paid. [21:25:27] indeed [21:25:33] 3 years ago it was a single machine with a 4-executor Jenkins, with maybe 10 jobs and almost no puppet conf / scripts [21:25:42] greg-g: I'll be in later today (after 6pm ish) and all of monday. [21:25:52] I have stopped counting the number of slaves, and have like 4k jobs :] [21:25:53] and I think before we can take on any large refactor/change in tooling/whatever, we need to get a good feel for where we are going. [21:26:27] spanning two ubuntu versions and all the possible software languages we use (js, perl, python, php, hhvm, c++, android, c, Apple iOS ..) [21:26:30] that "vision" needs to be owned by RelEng, or it's effectively not owned by anyone [21:27:38] bd808: Hashar suspected maybe git::clone used to auto-create the dirs? [21:27:50] I dump a large number of untracked hours into beta as well every week. And I don't even want to admit how much of my brain is consumed by mediawiki-vagrant. Platforms like this need care and feeding. [21:27:56] or maybe some base class that just created the dir? [21:28:09] Krinkle: I wondered the same thing but I couldn't find where that would have happened. [21:28:20] Krinkle: most probably.
Maybe a mkdir -p got removed [21:28:23] bd808: But in favour of getting 'it done'. Where should the dir be created right now? [21:28:36] bd808: and for that I'm both deeply indebted and thankful. [21:28:40] git::clone that is. Puppet does not grok recursive directory creation. [21:29:06] should we have the ci-slave role do a mkdir for that dir and have the rest depend on that File[''] ? [21:29:20] bd808: yeah kudos on all the things you did for beta and the scap rewrite [21:29:36] bd808: (and to be clear: I want to rely on you less, I really really really do) [21:29:40] Krinkle: In the role that is trying to fake being trebuchet I guess. You will probably need a common role to make /srv/deployment and then make the parent dir for each thing you want to place in there [21:29:45] bd808: the self puppetmaster alone was a huge improvement. [21:30:09] aw thanks guys. :) [21:30:27] RECOVERY - BetaLabs: Puppet failure events on labmon1001 is OK: OK: All targets OK [21:30:30] I just tried to put plywood over broken windows [21:30:50] bd808: I'm in this weird in-between state where I don't want to thank you too much for all the little things because I don't want you to think I expect it of you :) [21:30:52] maybe someday the glass man will show up and fix them properly [21:31:31] greg-g: Just back me up when rob-la is mad because I slipped on something else ;) [21:32:17] back in the early days of beta, we wanted it to be some real staging area [21:32:24] so folks would deploy on it and assess it works [21:32:33] then pass it to the actual production deployers [21:32:44] bd808: will do.
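bd808 notes that Puppet does not create missing parent directories for `git::clone`. That means every ancestor of a clone target needs its own `file` resource. The clean run later in this log creates exactly those: `File[/srv/deployment]` and `File[/srv/deployment/integration]`. The bookkeeping is sketched below in Python. This is not the patch bd808 submitted. The clone targets and both helper names are hypothetical, and `/srv` is assumed to exist already.

```python
from pathlib import PurePosixPath


def parent_dirs_to_declare(targets, root="/srv"):
    """List every directory strictly below `root` that must exist before the
    given clone targets can be created, shallowest first, without duplicates."""
    root_path = PurePosixPath(root)
    dirs = set()
    for target in targets:
        for parent in PurePosixPath(target).parents:
            # Keep only ancestors underneath root; root itself is assumed
            # to exist and is not declared.
            if root_path in parent.parents:
                dirs.add(parent)
    # Sorting by depth lists each directory after its own parent.
    ordered = sorted(dirs, key=lambda p: (len(p.parts), str(p)))
    return [str(p) for p in ordered]


def puppet_file_resources(dirs):
    """Render one Puppet `file` resource with `ensure => directory` per path."""
    return "\n".join(f"file {{ '{d}':\n  ensure => directory,\n}}" for d in dirs)


if __name__ == "__main__":
    # Hypothetical clone targets, for illustration only.
    targets = [
        "/srv/deployment/integration/slave-scripts",
        "/srv/deployment/integration/phpcs",
    ]
    print(puppet_file_resources(parent_dirs_to_declare(targets)))
```

For the two example targets, this yields `/srv/deployment` followed by `/srv/deployment/integration`. That is the same pair the later clean puppet run reports as created.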
[21:34:13] so this time I am really going to bed, greg probably has already lost count of the number of times I said so tonight :D [21:34:20] :P [21:34:22] g'night [21:34:58] let me just finish that little refactoring [21:35:01] The system I built at Kount was 4 stages: vms for devs, an automatically updated environment for Jenkins/manual integration testing, a staging environment that was updated with the exact deployment processes for production where we did final acceptance testing of each release, and finally production [21:35:45] beta has always been patched in response to failure. it was never designed, we just filed bugs and people (mostly Antoine, with a lot of help from folks like Mobile) until it did what people wanted [21:35:52] 3 out of 4 here (ish) [21:35:58] devs were not allowed to directly touch the last 2 environments (stage and prod) [21:36:55] but ops would not deploy anything without a built package and a detailed deployment plan that were both dev responsibilities [21:36:59] bd808: +2 on that [21:37:29] "detailed deployment plan" [21:37:31] heh [21:37:52] now, "built package".... squashfs hhvm instance? :) [21:37:54] I wrote 90% of them. [21:38:18] we did tarball deploys but I was moving to deb packages [21:38:26] my previous job had even three stages: devs (produce doc + package) -> integration (test package, report, produce more doc) -> prod (receive package, are basically the god and can say NO to anything) [21:38:49] wait, am I being stupid on a Friday afternoon, or does RA HHVM with squashfs or whatever give us "binary build" equivalents? [21:38:58] it does [21:39:11] but people will hate it [21:39:20] and make backdoors [21:39:27] eval.php... [21:39:48] because how could we not just keep incrementally patching until things work? ;) [21:40:03] Obvious... [21:40:05] We don't. [21:40:30] man [21:40:31] In the world of packaged deploys you must be able to roll back very quickly [21:40:44] yeah, we talked about that...
[21:41:18] and you must be able to triage the impact of a failure to determine if rollback is necessary very quickly [21:41:30] aka: rolling upgrads [21:41:32] +e [21:41:40] which means you need good production metrics that actually tell you about the health of the environment [21:41:47] PROBLEM - BetaLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki04.puppetagent.failed_events.value (44.44%) [21:41:47] yeah, that [21:41:50] oh look at that [21:41:53] good timing [21:42:06] Can I just kill that vm? [21:42:12] yeah [21:42:19] that's jeremyb's mystery host [21:42:27] * bd808 boldly does [21:42:38] if he wants it back he can make it again [21:43:19] {{done}} [21:43:27] Project browsertests-MobileFrontend-test2.m.wikipedia.org-linux-firefox-sauce build #192: FAILURE in 45 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-test2.m.wikipedia.org-linux-firefox-sauce/192/ [21:43:46] hopefully that test wasn't talking to that host ;) [21:43:47] Ed Keyes 2007 "Sufficiently Advanced Monitoring is Indistinguishable from Testing" https://www.youtube.com/watch?v=uSo8i1N18oc [21:43:50] greg-g: twentyafterfour probably has a lot to say as well regarding deployment [21:44:18] chrismcmahon: +1 [21:44:30] hashar: definitely. I want him to get the time to work on it/think about it. [21:44:31] ello [21:44:35] stupid phabricator [21:44:38] hashar: go to bed! [21:44:48] can't sleep [21:44:50] (for the record, I love phab) [21:45:01] I am looking at python functional programming [21:45:09] I just want to let twentyafterfour loose on deployment things :) [21:45:12] hashar: That should make you sleepy [21:45:23] and will probably take some course to learn Haskell [21:45:37] cause we all know nobody wants to see Objective CAML on the cluster :D [21:46:20] The problem with cool languages is having enough depth to maintain the apps if the original author leaves [21:46:27] Learn You a Haskell for Great Good!
[21:46:54] * hashar writes note that twentyafterfour is willing to review Haskell [21:47:11] Yahoo Stores was rewritten from lisp to php & c because Yahoo couldn't find enough lisp hackers to keep it running [21:47:13] hahah [21:47:33] bd808: that is probably why ASP disappeared and PHP took over the world [21:47:48] But Paul Graham got his $$ out of it I guess [21:47:54] or that any big company mostly uses J2EE [21:47:57] bd808: interesting, you know google flight info? that was a LISP project they acquired [21:48:29] ASP disappeared because M$ killed it. M$ likes to do that about every 6 years [21:48:35] and Sketchup is (was?) all Ruby. Google acquired them because of their awesome use of the Google Earth API [21:48:37] https://www.google.com/flights/ [21:49:58] sketchup was written in ruby? wow [21:50:17] GOOG probably has the pull to hire enough lisp geeks. Yahoo didn't 15 years ago. [21:51:49] anywho, reproducible builds, packaged builds, build pipeline,.... I've got some ideas now ;) (but no time) so, where are the fires to put out? [21:53:43] well, and there is education too. I started refactoring a browser tests repo today, found a step "I wait for the page to re-render". It is implemented as "sleep 10" [21:53:56] it is called 17 times in the suite [21:54:29] "I go make a pot of coffee" [21:54:53] yeah, we (marxarelli and I (and robla)) were talking about that over lunch as well [21:55:06] greg-g: I seriously just replaced a step called "I get a cup of coffee" implemented as "sleep 7". My work is cut out for me. [21:55:15] :) [21:56:20] I make fun, but I really do sympathize, this shit is hard. [22:00:50] yeah, the conversation over lunch was around treating cucumber tests as contracts between product owners and devs, not as devs writing implementation-specific tests (which turn into basically regression tests) [22:01:15] ATDD then.
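The "sleep 10" and "sleep 7" steps chrismcmahon describes always wait the full time, whether the page is ready or not. The usual alternative is to poll a condition until a deadline. That is also the shape of the "timed out after 5 seconds, waiting for ... to become present" failure quoted a few minutes later. The suites themselves are Ruby/Cucumber, and the browser libraries they use provide their own waiting helpers. What follows is only a library-agnostic Python sketch, and `wait_until` with its parameters is a hypothetical name.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition()` until it returns a truthy value or `timeout`
    seconds pass. Returns that value, or raises TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep only a short interval, so a ready page is noticed quickly.
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical stand-in for "the page has re-rendered": true after ~0.3 s.
    ready_at = time.monotonic() + 0.3
    print(wait_until(lambda: time.monotonic() >= ready_at, timeout=5))  # True
```

With this approach a fast page costs a fraction of a second rather than a fixed ten. A genuinely broken page still fails, but with an explicit timeout instead of a later, confusing assertion.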
[22:02:27] yeah, for the cucumber part at least [22:02:45] cucumber-level part, that is [22:03:14] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #233: FAILURE in 53 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/233/ [22:03:24] :( [22:05:23] hashar: since you're still online: what do you think of editing the bugs in the CI bugzilla component with [Jenkins] or [Zuul..... oh well [22:05:46] failure was "timed out after 5 seconds, waiting for {:class=>"ve-ce-branchNode", :tag_name=>"div"} to become present" [22:05:51] Wikimedia / Continuous integration: Jenkins: Install PerformancePlugin - https://bugzilla.wikimedia.org/71347 (Greg Grossmeier) p:Unprio>Normal s:normal>enhanc [22:08:10] stupid timeouts [22:08:11] Wikimedia / Continuous integration: Prevent the addition of files with names that aren't supported on Windows - https://bugzilla.wikimedia.org/65140#c3 (Greg Grossmeier) p:Unprio>Low This could be something added to Arcanist when we switch to Phabricator for code review. [22:09:22] Wikimedia / Continuous integration: Allow tests to specify what extensions and or what order things are loaded in - https://bugzilla.wikimedia.org/70250 (Greg Grossmeier) p:Unprio>Normal s:normal>enhanc [22:09:53] Wikimedia / Continuous integration: Jenkins: point TMP/TEMP to workspace and delete it after build completion - https://bugzilla.wikimedia.org/68563 (Greg Grossmeier) p:Unprio>Low s:normal>enhanc [22:11:03] (sorry for the bug spam here...
going through the CI bugs) [22:11:08] Wikimedia / Continuous integration: Write job to ensure Parsoid settings on beta cluster is sane - https://bugzilla.wikimedia.org/68532 (Greg Grossmeier) p:Unprio>Normal [22:11:52] Wikimedia / Continuous integration: Zuul: scale merge operations (tracking) - https://bugzilla.wikimedia.org/68480 (Greg Grossmeier) p:Unprio>Normal [22:12:22] Wikimedia / Continuous integration: Jenkins: Add jobs for SwiftMailer extension - https://bugzilla.wikimedia.org/67722 (Greg Grossmeier) p:Unprio>Normal [22:12:56] oh, chrismcmahon, how much do we use headless Fx vs depending on sauce for fx? [22:14:08] Wikimedia / Continuous integration: Augment jsonlint test with a test for duplicate keys - https://bugzilla.wikimedia.org/71284#c9 (Kunal Mehta (Legoktm)) (In reply to Antoine "hashar" Musso from comment #8) > What I really meant is that neither PHP json_decode() or python json decoder > let us detects... [22:14:52] Wikimedia / Continuous integration: Jenkins: browser test host performance issue for timed builds - https://bugzilla.wikimedia.org/66449#c1 (Greg Grossmeier) p:Unprio>Normal Chris: Is this still occuring for headless Fx after the throttling Antoine imposed? [22:16:22] greg-g: zero. we tried headless firefox and brought the Jenkins host to its knees.
we are 100% SauceLabs at this point [22:16:34] !log Deleted deployment-mediawiki04 (i-000005ba.eqiad.wmflabs) and removed from salt and trebuchet [22:16:34] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce build #177: FAILURE in 40 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-sauce/177/ [22:16:36] Logged the message, Master [22:17:19] greg-g: xvfb takes a lot of resources, and multiple xvfb's just don't seem feasible [22:17:23] Wikimedia / Continuous integration: Jenkins: Zuul should not run jenkins-bot on changes for refs/meta/* - https://bugzilla.wikimedia.org/50389 (Greg Grossmeier) [22:17:23] Wikimedia / Continuous integration: Jenkins: jenkins-bot reports spurious merge error when pushing changes to one of the gerrit config branches - https://bugzilla.wikimedia.org/64678 (Greg Grossmeier) p:Unprio>Normal [22:17:40] chrismcmahon: gotcha, then my question in that comment on bug 66449 doesn't make sense :) [22:17:49] yep [22:18:29] greg-g: Sauce just compares favorably to every other alternative [22:18:45] yeah [22:18:52] Wikimedia / Continuous integration: Jenkins: browser test host performance issue for timed builds - https://bugzilla.wikimedia.org/66449#c2 (Greg Grossmeier) from Chris on IRC: "we tried headless firefox and brought the Jenkins host to its knees. we are 100% SauceLabs at this point" My question is mo... [22:19:52] Wikimedia / Continuous integration: Update CodeSniffer rules automatically on machines running tests - https://bugzilla.wikimedia.org/64371 (Greg Grossmeier) p:Unprio>Low [22:20:08] Wikimedia / Continuous integration: Figure out paths that needs to be backed up on gallium - https://bugzilla.wikimedia.org/63938 (Greg Grossmeier) p:Unprio>High [22:22:06] Yippee, build fixed!
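This note concerns the jsonlint duplicate-key bug (71284). The comment quoted there says that neither PHP `json_decode()` nor Python's JSON decoder lets us detect duplicate keys. Python's standard `json.loads` does, however, accept an `object_pairs_hook`. That hook receives every key/value pair of each JSON object in document order, before the pairs are collapsed into a dict, so a lint step can reject repeats. The sketch below shows the idea. `lint_json` and `reject_duplicate_keys` are hypothetical names, not the existing jsonlint job.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook: build the dict, but fail on a repeated key."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


def lint_json(text):
    """Parse `text`, raising ValueError on bad syntax or duplicate keys."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


if __name__ == "__main__":
    print(json.loads('{"a": 1, "a": 2}'))  # default decoder keeps the last value
    try:
        lint_json('{"a": 1, "a": 2}')
    except ValueError as err:
        print(err)
```

The hook runs for nested objects too, so a duplicate deep inside a file is caught. Because `json.JSONDecodeError` subclasses `ValueError`, a single `except ValueError` covers both syntax errors and duplicates.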
[22:22:07] Project beta-update-databases-eqiad build #4324: FIXED in 2 min 5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/4324/ [22:22:15] legoktm: hey, wanna try what you just linked on that jsonlint bug? [22:22:17] :) [22:22:35] lol [22:25:49] that was just a case of https://meta.wikimedia.org/wiki/Cunningham%27s_Law [22:26:54] Wikimedia / Continuous integration: Add a Gerrit check for file line endings - https://bugzilla.wikimedia.org/51754 (Greg Grossmeier) p:Normal>Low s:normal>enhanc [22:27:22] legoktm: :P [22:44:34] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #188: FAILURE in 0.38 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/188/ [22:44:38] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #131: FAILURE in 0.25 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/131/ [22:46:56] Krinkle: Did you get your puppet mess untangled? If not how can I help? [22:47:10] bd808: Got distracted, but also not sure how to approach it yeah [22:47:54] What are the problematic roles/classes? [22:48:10] wat. git/gerrit is down [22:48:27] and I don't feel so good myself [22:48:32] chrismcmahon: web ui wfm [22:50:08] bd808: "ERROR: Problem fetching from origin / origin - could be unavailable. Continuing anyway" https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/131/console [22:50:47] bd808: this late on a Friday I don't really care much [22:58:46] Krinkle: Does the integration project have a local puppetmaster? If so we can cherry-pick that patch I just submitted and test it [23:02:01] Yep, doing that [23:02:12] Or maybe you wanna do it instead? [23:02:18] Cool.
I just looked it up in wikitech :) [23:02:20] I mean, I'm sure you can, but would be good to see it all work out [23:02:30] integration-puppetmaster, /var/lib/git/operations/puppet [23:02:43] I can if you'd like. What host can I test it on that is failing now? [23:02:43] there's a few local patches already, so I tend to copy the cherry-pick command from gerrit [23:02:57] yup. that's good practice [23:03:00] then after picking on the puppetmaster, run puppet on integration-dev-trusty [23:03:09] ok. [23:03:09] integratin-dev * [23:03:37] bd808: btw, typo pacakges [23:04:02] bah. typing is hard [23:08:16] w00t. clean puppet run [23:08:32] Notice: /Stage[main]/Contint::Slave-scripts/File[/srv/deployment]/ensure: created [23:08:38] Notice: /Stage[main]/Contint::Slave-scripts/File[/srv/deployment/integration]/ensure: created [23:08:51] Followed by the various git clones [23:08:51] nice [23:11:07] Krinkle: If you wanted it, I could set up trebuchet in the integration project instead of that hack too. But I can see the ease of having puppet update everything instead of having to push the changes with git-deploy. [23:11:19] yeah [23:11:29] I'd rather have it deploy master [23:11:38] in prod we also have jenkins slaves, those are deployed from tin [23:11:43] (gallium and lanthanum) [23:13:27] Krinkle: andrew.bogott merged for me. [23:15:07] Krinkle: One more offer of help. I can add the puppet class to the integration puppetmaster that I wrote for beta to keep the /var/lib/git/operations/puppet repo updated with the production branch. It does a fetch + rebase every hour, writes a log to /var/log and rolls everything back if the rebase doesn't merge cleanly. [23:15:35] I'm using it in a couple of small projects besides beta and it's pretty handy [23:16:02] * bd808 should actually move it out of the beta module probably
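bd808 describes his beta puppetmaster helper as an hourly fetch + rebase of the local checkout onto the production branch. It logs the outcome and rolls back when the rebase does not apply cleanly. The puppet class itself is not shown in this log. Below is a hypothetical Python sketch of that loop. It assumes the remote is `origin`, the upstream branch is `origin/production`, and something like cron supplies the hourly schedule. `git rebase --abort` is what restores the pre-rebase state, so the locally cherry-picked patches survive a conflict.

```python
import logging
import subprocess

log = logging.getLogger("puppet-repo-update")


def update_checkout(repo, upstream="origin/production", run=subprocess.run):
    """Fetch, then rebase local cherry-picks onto `upstream`.

    Returns "updated", "fetch-failed" or "rolled-back". `run` is injectable
    so the control flow can be exercised without a real repository.
    """
    def git(*args):
        return run(["git", "-C", repo, *args], capture_output=True, text=True)

    fetch = git("fetch", "origin")
    if fetch.returncode != 0:
        log.error("fetch failed: %s", fetch.stderr.strip())
        return "fetch-failed"

    rebase = git("rebase", upstream)
    if rebase.returncode != 0:
        # Conflict: undo the half-applied rebase so local patches stay intact.
        git("rebase", "--abort")
        log.error("rebase onto %s failed, rolled back: %s",
                  upstream, rebase.stderr.strip())
        return "rolled-back"

    log.info("rebased %s onto %s", repo, upstream)
    return "updated"
```

Pointed at `/var/lib/git/operations/puppet` and run hourly (for example from a crontab line starting `0 * * * *`), this approximates the behaviour described above. Routing the `logging` output to a file under `/var/log` is left to the caller.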