[02:30:21] PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[03:10:21] RECOVERY - Puppet run on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:02:11] (PS1) DCausse: Use java 8 for elastic plugins jobs [integration/config] - https://gerrit.wikimedia.org/r/322616
[09:14:09] (CR) Hashar: [C: 1] "For context: "Ubuntu - OpenJdk 8" refers to a JDK path defined globally in Jenkins ( https://integration.wikimedia.org/ci/configure ) whi" [integration/config] - https://gerrit.wikimedia.org/r/322616 (owner: DCausse)
[09:15:22] zeljkof: can you possibly babysit https://gerrit.wikimedia.org/r/#/c/322616/1 for dcausse please ?
[09:15:22] that switches a bunch of their ElasticSearch jobs to java 8
[09:15:22] I am off, meeting starting
[09:16:17] thanks
[09:17:20] dcausse: if you feel adventurous you can deploy the jobs by using JJB ( https://www.mediawiki.org/wiki/Continuous_integration/Jenkins_job_builder )
[09:17:49] hashar: ok, will have a look
[09:18:39] or poke Zeljko about it . He should show up soonish :}
[09:19:29] sure :)
[09:25:59] halfak: dcausse: I'm here, looking...
[09:26:20] zeljkof: o/
[09:28:57] dcausse: ok, what do I need to do with 322616? +2 and deploy the jobs?
[09:29:29] zeljkof: I think so
[09:31:54] dcausse: on it. can you test the jobs once they are deployed?
[09:32:19] zeljkof: yes, I think I could rebuild an old patch
[09:32:25] or I'll upload a new one
[09:32:58] dcausse: ok, great, it's better to know if something is wrong sooner rather than later, it is easy to revert
[09:33:12] zeljkof: makes sense
[09:36:07] (CR) Zfilipin: [C: 2] Use java 8 for elastic plugins jobs [integration/config] - https://gerrit.wikimedia.org/r/322616 (owner: DCausse)
[09:41:06] (PS2) Zfilipin: Use java 8 for elastic plugins jobs [integration/config] - https://gerrit.wikimedia.org/r/322616 (owner: DCausse)
[09:41:49] dcausse: apologies for the delay, it has been a while since I have deployed jenkins jobs, forgot to rebase the patch before +2 :|
[09:42:03] zeljkof: no worries
[09:42:53] (CR) Zfilipin: Use java 8 for elastic plugins jobs [integration/config] - https://gerrit.wikimedia.org/r/322616 (owner: DCausse)
[09:42:55] (CR) Zfilipin: [C: 2] Use java 8 for elastic plugins jobs [integration/config] - https://gerrit.wikimedia.org/r/322616 (owner: DCausse)
[09:43:59] (Merged) jenkins-bot: Use java 8 for elastic plugins jobs [integration/config] - https://gerrit.wikimedia.org/r/322616 (owner: DCausse)
[09:51:34] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:52:04] dcausse: merged, but now jenkins-jobs says it does not know about search-extra job (and the rest) :|
[09:52:10] investigating...
[09:52:32] ok, thanks for looking into this zeljkof !
[09:53:25] I'm doing something wrong...
[10:01:13] dcausse: apologies for the delay, had to debug my setup, well, I was pointing the tool to the wrong folder :/
[10:01:23] anyway, the jobs are deployed now, please test
[10:02:04] (CR) Zfilipin: "Deployed jobs:" [integration/config] - https://gerrit.wikimedia.org/r/322616 (owner: DCausse)
[10:03:27] zeljkof: testing, thanks!
[10:16:50] zeljkof: it works like a charm (https://gerrit.wikimedia.org/r/#/c/322625/), thanks!!
[10:17:58] dcausse: great!
[10:18:14] sorry it took so long, my jjb-fu is a bit rusty
[10:30:21] np :)
[11:35:16] Yippee, build fixed!
[11:35:17] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #226: FIXED in 6 min 50 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/226/
[12:18:09] back and operational
[12:21:50] (CR) Hashar: [C: -1] Update test-requirements.txt to use jessie and not precise (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/322500 (owner: Paladox)
[12:45:24] Beta-Cluster-Infrastructure, Release-Engineering-Team, Patch-For-Review: deployment-fluorine02 puppet broken - https://phabricator.wikimedia.org/T151169#2810445 (hashar) a:hashar Had to use `$::standard::has_ganglia`. I have cherry picked the patch on the beta cluster puppetmaster and that unlock...
[12:46:21] !log beta: Cherry picked puppet fix for udp2log https://gerrit.wikimedia.org/r/#/c/322639/ T151169
[12:46:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:49:54] Continuous-Integration-Infrastructure, Discovery, Discovery-Search, Elasticsearch: ElasticSearch 2.3.5 + plugins 2.3.5 raise jar hell - https://phabricator.wikimedia.org/T151128#2810456 (dcausse) I think we use git-deploy to deploy plugins on `deployment-prep` (via `deployment-tin.deployment-prep...
[12:53:12] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2810463 (hashar) Open>Resolved a:hashar udp2log-mw was not spawning for whatever reason and puppet was broken (T151169). I got puppet fixed, then stopped and restarted the service en...
[12:53:55] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2810469 (hashar) Puppet log showing the service could not start: ``` Debug: Service[udp2log-mw](provider=debian): Executing 'ps -ef' Debug: Executing '/bin/systemctl show -pSourcePath udp2log-mw'...
[12:56:55] RECOVERY - Puppet run on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:00:31] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2810477 (Krenair) Isn't it just going to fail again soon? We can't just keep having puppet restart it, it needs to be stable
[13:04:35] Yippee, build fixed!
[13:04:36] Project selenium-Math » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #216: FIXED in 34 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/216/
[13:04:36] Yippee, build fixed!
[13:04:37] Project selenium-Math » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #216: FIXED in 35 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/216/
[13:07:20] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2810508 (hashar) I have no idea what went wrong but puppet is fixed (was T151169) and service seems stable: Puppet find the service running: ``` root@deployment-fluorine02:~# puppet agent -tv In...
[13:09:08] Continuous-Integration-Infrastructure, Discovery, Discovery-Search, Elasticsearch: ElasticSearch 2.3.5 + plugins 2.3.5 raise jar hell - https://phabricator.wikimedia.org/T151128#2810510 (hashar) `git-deploy` is my understanding of how the plugins are deployed on beta cluster. `integration` is a...
[13:09:26] dcausse: thanks for your reply about the ElasticSearch plugin deployment
[13:09:35] java hell confused me for a good 2 or 3 hours
[13:09:50] until I realized the issue was the .jar being git-fat pointers bah
[13:13:30] Scap3, ContentTranslation-CXserver, MediaWiki-extensions-ContentTranslation, Language-Engineering October-December 2016, and 4 others: Enable Scap3 config deploys for CXServer - https://phabricator.wikimedia.org/T147634#2810526 (KartikMistry) @akosiaris @mobrovac Should we deploy this tomorrow/la...
[13:14:26] hashar: yes, I often get confused by git-fat too :/
[13:14:44] all of that being me doing a weird experimentation :D
[13:14:50] :)
[13:53:43] Scap3, ContentTranslation-CXserver, MediaWiki-extensions-ContentTranslation, Language-Engineering October-December 2016, and 4 others: Enable Scap3 config deploys for CXServer - https://phabricator.wikimedia.org/T147634#2810593 (akosiaris) @KartikMistry. It's thanksgiving week. Releng has a freez...
[13:59:49] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2810596 (Ottomata) > No, we need it until there is some other mechanism to have the logs written to plain log files on disk. [[ https://github.com/wikimedia/operations-puppet/blob/production/mod...
[14:34:03] Yippee, build fixed!
[14:34:03] Project selenium-WikiLove » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #216: FIXED in 2 min 2 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/216/
[14:45:47] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[14:48:08] PROBLEM - Host logstash is DOWN: CRITICAL - Host Unreachable (10.68.20.48)
[14:50:56] logstash is me
[14:50:56] I have deleted it
[14:53:10] PROBLEM - Host deployment-pdf02 is DOWN: CRITICAL - Host Unreachable (10.68.16.129)
[14:54:28] PROBLEM - Host deployment-conftool is DOWN: CRITICAL - Host Unreachable (10.68.20.30)
[15:01:59] $ host buildlog.wmflabs.org
[15:01:59] buildlog.wmflabs.org has address 208.80.155.156
[15:02:00] Host buildlog.wmflabs.org not found: 3(NXDOMAIN)
[15:02:05] DNS never ceases to amaze me
[15:06:05] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:35:34] hashar, was that from your computer or a wikimedia host?
[15:39:48] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:40:56] Krenair: my computer
[15:41:03] I gave up and used a different hostname :D
[15:41:33] yeah I wouldn't trust my ISP's DNS with that sort of thing
[15:43:58] Yippee, build fixed!
[15:43:58] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #235: FIXED in 21 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/235/
[15:51:03] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:56:23] hashar you know the problem we had a few days ago where tests just stopped testing because a patch had too many depends-on: i believe this https://gerrit.googlesource.com/plugins/zuul/ will help prevent that problem in the future?
[15:56:24] :)
[15:56:56] ;-}
[15:57:09] the author Khai Do is their Gerrit expert :D
[15:57:29] Oh
[15:57:40] A REST endpoint to allow other clients to retrieve CRD info.
[15:57:47] and built-in dependency cycle, sounds nie
[15:57:49] nice
[15:57:52] Yeh
[15:57:56] good find
[15:58:03] Yep
[15:58:24] hashar it will show the needed-by reference too on the change screen
[15:58:35] making it easier to see patches that it is needed by
[15:59:17] hashar: what a reason to be off-work (the work place burned down) :o all the best to them with the cleanup
[15:59:18] I guess we will need to wait for either gerrit 2.13 or master to use the plugin as it doesn't look like it supports gerrit 2.12
[16:00:15] ping zeljkof
[16:02:30] Yippee, build fixed!
[16:02:31] Project selenium-CentralNotice » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #219: FIXED in 1 min 30 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/219/
[16:02:36] Yippee, build fixed!
[16:02:37] Project selenium-CentralNotice » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #219: FIXED in 1 min 36 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/219/
[16:04:12] Yippee, build fixed!
[16:04:13] Project selenium-CentralNotice » chrome,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #219: FIXED in 3 min 12 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/219/
[16:05:02] Yippee, build fixed!
[16:05:02] Project selenium-CentralNotice » firefox,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #219: FIXED in 4 min 1 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/219/
[16:05:17] Yippee, build fixed!
[16:05:18] Project selenium-CentralNotice » chrome,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #219: FIXED in 4 min 17 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/219/
[16:06:50] Yippee, build fixed!
[16:06:51] Project selenium-CentralNotice » firefox,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #219: FIXED in 5 min 50 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/219/
[16:14:32] hashar https://gerrit.googlesource.com/plugins/zuul/+/master/src/main/resources/Documentation/about.md
[16:14:52] zeljkof: joining now
[16:14:57] I wonder if the needed-by will work even if the patch that depends on that patch doesn't use Depends-On
[16:14:59] ?
[16:16:41] I have no idea
[16:16:52] we would have to look at it eventually
[16:19:03] Ok
[16:19:04] :)
[16:34:50] hashar i updated the plugin here https://gerrit-review.googlesource.com/#/c/91874/ :)
[16:50:54] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2811023 (Krenair) >>! In T146723#2810596, @Ottomata wrote: >> No, we need it until there is some other mechanism to have the logs written to plain log files on disk. > > [[ https://github.com/wi...
[16:51:23] who does "xff AT wikimedia.org"
[16:51:42] mutante: hello. Could it be aliased/sent to OTRS ?
[16:51:43] and maintains the "trusted XFF" list
[16:52:11] i don't know
[16:52:47] iirc there is a mediawiki extension for that
[16:52:50] Tim
[16:52:56] with a list of trusted ips
[16:52:58] and probably also Tim
[16:53:07] Reedy might know as well
[16:53:17] I haven't touched that list in a decade or so
[16:53:22] thanks
[16:55:35] * hashar away/audio
[18:37:24] PROBLEM - Host deployment-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.63)
[18:42:33] Beta-Cluster-Infrastructure, MediaWiki-extensions-CentralAuth, MobileFrontend, Reading-Web-Backlog: beta cluster: Notice: Undefined variable: wmgMFUseCentralAuthToken in /srv/mediawiki-staging/wmf-config/mobile-labs.php on line 20 - https://phabricator.wikimedia.org/T146945#2811507 (Jdlrobson) 0...
[19:11:23] hm, is there a chicken/egg issue with scap in prod? or am I doing something wrong?
[19:11:45] ottomata: Care to be any more vague? ;)
[19:11:50] in order for puppet to run on a target, the deploy needs to succeed first, but in order for a deploy to succeed, puppet has to have run on the target, so it has a systemd service unit defined in order to reload the target
[19:11:59] (was typing! :) )
[19:12:35] so you gotta do: scap deploy, fails because systemd unit can't be reloaded. then run puppet on target now that the repo is in place, it will create systemd unit. then run scap deploy again
[19:13:13] (i'm talking about scap3, btw)
[19:14:59] can you unpack what you mean when you say "for puppet to run on a target, the deploy needs to succeed first"
[19:15:01] ?
[19:15:29] as far as I know, there must be a /srv/deployment/[repo]/.git/DEPLOY_HEAD on tin for puppet to run on a target
[19:15:36] but that's all that's required
[19:15:37] HMMMM, i suppose ok, i think i see where i went wrong. ok
[19:16:17] is there anything about scap/puppet that we can make better so this problem is easier to avoid?
[19:16:22] hm, yeah, my first puppet run on the target failed though....didn't look into why. i did have a typo in my scap.cfg, but it wasn't unparseable. and the target shouldn't care about scap.cfg when running puppet
[19:16:48] but, yeah, i see that if the repo is present on tin, the first puppet run on the target should pull it in, and then create the systemd unit, and then restart the service
[19:17:46] thcipriani: , sorry, i think it works, user error.
[19:17:54] but, hm, i did have one problem
[19:18:11] puppet on tin created the repo in such a way, that i couldn't run scap deploy-log or scap deploy
[19:18:17] maybe that's my fault too?
[19:18:32] i guess i should chmod g+w scap/ on my repo and commit that
[19:18:33] that's not good :\
[19:19:04] also, things in .git were not group writeable by my user, they weren't group owned by wikidev
[19:19:15] i had to sudo chgrp -R wikidev .git/ ./
[19:19:30] and i think i also had to sudo chmod -R g+w .git
[19:20:02] and the new repo on tin was setup by puppet?
[19:20:06] we should fix that :\
[19:20:15] thcipriani: yes
[19:20:26] could you file a task?
[19:20:29] k
[19:20:30] so we don't lose that
[19:20:35] thank you!
[19:24:31] Release-Engineering-Team, Scap3: scap3 repos permission errors after cloning by puppet in production. - https://phabricator.wikimedia.org/T151231#2811641 (Ottomata)
[19:24:51] thcipriani: ^
[19:25:09] ottomata: perfect, thank you!
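The permission problem described above (files under .git not group-writable after puppet cloned the repo) can be checked for before reaching for chgrp/chmod. What follows is only a minimal Python sketch under stated assumptions: the function name find_not_group_writable and the throwaway demo directory are hypothetical and are not part of scap or puppet. It walks a directory tree and reports every path that lacks the group-write bit; it does not check group ownership and it changes nothing.

```python
import os
import stat
import tempfile


def find_not_group_writable(root):
    """Return the sorted paths under root (root included) lacking the group-write bit.

    Hypothetical helper for illustration only; symlinks are skipped and
    nothing on disk is modified.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            if not mode & stat.S_IWGRP:
                missing.append(path)
    return sorted(missing)


if __name__ == "__main__":
    # Demo on a throwaway directory rather than a real deployment repo.
    with tempfile.TemporaryDirectory() as repo:
        os.chmod(repo, 0o775)
        writable = os.path.join(repo, "config")
        locked = os.path.join(repo, "HEAD")
        for path, mode in ((writable, 0o664), (locked, 0o644)):
            open(path, "w").close()
            os.chmod(path, mode)
        for path in find_not_group_writable(repo):
            print("not group-writable:", path)
```

Reporting instead of fixing is a deliberate choice here: whether the right remedy is the recursive chgrp/chmod quoted above or a change to how puppet creates the clone is what the filed task is meant to settle.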
[19:41:35] Release-Engineering-Team, Scap3: scap3 repos permission errors after cloning by puppet in production. - https://phabricator.wikimedia.org/T151231#2811743 (thcipriani) p:Triage>Normal
[20:42:35] Yippee, build fixed!
[20:42:35] Yippee, build fixed!
[20:42:36] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #218: FIXED in 1 min 34 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/218/
[20:42:36] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #218: FIXED in 1 min 34 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/218/
[22:06:08] Continuous-Integration-Config, MediaWiki-extensions-MultimediaViewer, Reading-Web-Backlog, Patch-For-Review: Qunit tests for MultimediaViewer fail with the currently submitted code - https://phabricator.wikimedia.org/T150575#2812421 (Tgr) Open>Resolved
[22:21:41] It seems the tests on nodepool are slow again