[00:13:44] !log Debugging a fatal in betalabs, might cause syncs to fail
[00:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[00:20:01] Project beta-update-databases-eqiad build #9717: FAILURE in 0.54 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/9717/
[00:27:02] !log Initialized ORES on all wikis where it's enabled, was causing job failures
[00:27:09] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[00:29:38] now please somebody trigger a logging fatal :P
[00:32:24] MaxSem: just deploy https://www.mediawiki.org/wiki/Extension:Buggy in beta cluster for great profit
[00:32:48] that actually might be useful there actually
[00:32:53] (double actually!)
[00:33:06] nope
[00:33:25] I'm looking for a piece of code that makes a call with wrong parameters
[00:50:29] !log Rebooting deployment-logstash3.eqiad.wmflabs; console full of hung process messages from kernel
[00:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[01:12:59] !log Leaving my hacks for the night to collect data, if needed revert with cd /srv/mediawiki-staging/php-master/vendor && sudo git reset --hard HEAD && sudo chown -hR jenkins-deploy:wikidev .
[01:13:06] Release-Engineering-Team: Add a European mid-day SWAT window - https://phabricator.wikimedia.org/T137970#2435839 (greg) @krinkle sure, let's look into that, but it's tangential to this task.
[01:13:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[01:14:01] !log Restarted logstash on deployment-logstash2
[01:14:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:01:28] hashar hi it seems i am having zuul/status.json: proxy error
[08:01:30] on zuul
[08:01:41] I doint know how i did that.
[08:07:55] It seems http://localhost:8001 is not responding when doing curl
[08:24:20] paladox: good morning
[08:24:23] paladox: is zuul running?
[08:24:30] ps -u f|grep zuul
[08:24:30] and you too and yes zuul is running
[08:24:35] wrong command
[08:24:46] Oh
[08:24:47] you can look for process listening on a tcp port
[08:24:53] sudo netstat -tlnp
[08:24:56] Thanks
[08:25:17] sudo netstat -tlnp
[08:25:22] tcp 0 0 0.0.0.0:8001 0.0.0.0:* LISTEN
[08:25:34] tcp 0 0 0.0.0.0:8001 0.0.0.0:* LISTEN 4060/python
[08:25:38] yeah that is the python process
[08:25:42] Oh
[08:25:52] and indeed it does not respond
[08:26:04] yep, i think may be related to me trying to build
[08:26:07] zuul as a deb
[08:26:09] yesturday
[08:26:09] you can then lok at the zuul debug logs in /var/log/zuul/debug.log
[08:26:24] DEBUG zuul.Gearman: Looking for lost builds
[08:26:31] Yep
[08:26:33] which is a recurring / harmless process
[08:26:38] Oh
[08:26:44] then
[08:26:54] I wonder what i did to break zuul
[08:27:00] look for process being run by the zuul user: ps -u zuul f
[08:27:04] Ok
[08:27:12] there are a lot of zuul-server process!
[08:27:19] ps -u zuul f
[08:27:22] Oh yes
[08:27:29] and systemctl status zuul
[08:27:33] shows the same stuff
[08:27:35] not sure what is happening
[08:27:36] Yep
[08:27:47] maybe systemd though the process is not running and keep spawning it over and over
[08:27:53] Yep
[08:28:55] try to stop it? sudo systemctl stop zuul
[08:28:59] and see whether that kills all process
[08:29:00] Ok
[08:29:10] Ok ive stoped it now
[08:29:29] and you will have to kill all the left over process now
[08:29:42] Oh, how do i kill it please
[08:30:05] ps -u zuul
[08:30:12] gives you the list of process
[08:30:18] Yep all defunct
[08:30:30] ps -u zuul -o pid
[08:30:34] get only the pid number
[08:30:41] which one can then pass to kill
[08:30:46] Yep alot of numbers
[08:30:53] I ran: ps -u zuul -o pid|xargs kill
[08:30:58] Oh :)
[08:31:16] they are zombies process :(
[08:31:34] Oh, would that be the reason
[08:31:38] for proxy erroring out
[08:31:43] all gone
[08:31:48] Oh thanks
[08:31:58] well if port 8001 yields nothing, surely that would cause the proxy error
[08:32:06] since Apache times out trying to reach the Zuul web service
[08:32:12] Works now
[08:32:14] try to start it again?
[08:32:20] Well status unaviable
[08:32:28] I have to go and edit the zuul*.conf
[08:32:31] files again
[08:32:31] sudo systemctl start zuul
[08:32:35] ah yeah :(
[08:32:37] damn puppet
[08:32:52] yay works now
[08:32:54] :)
[08:33:01] /var/log/puppet.log has the log of changes it does
[08:33:04] Actually i re added the role zuul server
[08:33:14] thinking it was that and it overwrote the things
[08:33:21] But ive un added it now
[08:34:03] then check the service: ps -u zuul f
[08:34:15] 32098 ? Sl 0:04 /usr/share/python/zuul/bin/python /usr/bin/zuul-server -c /etc/zuul/zuul-server.conf
[08:34:15] 32102 ? Z 0:00 \_ [zuul-server]
[08:34:23] the second process (pid 32102) is Zombie
[08:34:32] that is the Gearman embeded service
[08:34:41] Oh
[08:34:57] which fails for whatever reason
[08:35:13] Oh
[08:35:23] I wonder why they didnt shut down properly
[08:35:30] gotta kill it
[08:35:37] then spawn the process manually
[08:36:06] sudo su - zuul -s /bin/bash
[08:36:06] /usr/share/python/zuul/bin/python /usr/bin/zuul-server -c /etc/zuul/zuul-server.conf -d
[08:36:17] Oh
[08:36:19] (-d is to keep it in foreground)
[08:36:25] Oh
[08:36:27] which might bump errors to stdout/stderr
[08:36:31] Oh
[08:37:37] I dont know where the process stdout is logged by systemd :(
[08:37:44] Oh
[08:38:02] maybe it is just discarded
[08:38:18] Yep
[08:38:25] I am almost finished editing
[08:38:29] the zuul*.conf files
[08:38:57] gear yields: Exception: Could not open socket
[08:38:59] :(
[08:39:03] Oh
[08:39:45] Im going to do now
[08:39:46] zuul-server -c /etc/zuul/zuul.conf -l /etc/zuul/wikimedia/zuul/layout.yaml -t
[08:40:09] OHHH
[08:40:15] $ sudo netstat -tlnp|grep 4730
[08:40:15] tcp 0 0 127.0.0.1:4730 0.0.0.0:* LISTEN 6356/gearmand
[08:40:18] Yep to apply the new conf's
[08:40:20] Oh
[08:40:22] there is a gearman daemon already listening on 4730
[08:40:27] that makes no sense
[08:41:00] 2016-07-04 13:42:19 install gearman-job-server:amd64 1.0.6-5
[08:41:07] it should be dropped
[08:41:13] hashar
[08:41:13] root@gerrit-test:/etc/zuul# sudo service zuul-merger
[08:41:14] [....] Zuul Merger: /etc/default/zuul-merger is not set to START_DAEMON=1: exiti[FA failed!
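Side note on the port conflict above: gear reported "Could not open socket" while gearmand was already listening on 127.0.0.1:4730. A minimal Python sketch of the check that `netstat -tlnp | grep 4730` does by hand. The helper name `port_in_use` is hypothetical and is not part of Zuul or gear. It only tells you that something accepts TCP connections on the port, not which process owns it.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 4730 is the Gearman port seen in the netstat output in the log.
    if port_in_use("127.0.0.1", 4730):
        print("port 4730 is already taken, a second Gearman server cannot bind it")
    else:
        print("port 4730 is free")
```

Running it before starting zuul-server would have pointed at the stray gearman-job-server package sooner; netstat is still needed to see the owning PID.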
[08:41:15] Oh
[08:41:37] I did: apt-get remove --purge gearman-job-server
[08:41:43] Oh thanks :)
[08:41:50] then we can re install
[08:41:57] and Zuul should be running now
[08:42:01] Thanks
[08:42:02] no gearman-job-server is not needed
[08:42:10] that is a daemon written in C
[08:42:17] Zuul has its own embedded Gearman server
[08:42:32] I get this error http://gerrit-test.wmflabs.org/gerrit/#/c/6/2
[08:42:36] Cannot merge
[08:42:38] Oh
[08:42:46] Didnt know that
[08:42:47] thanks
[08:43:12] and now it refuses to start for a random reason :(
[08:43:33] Oh
[08:44:16] root@gerrit-test:/etc/zuul# sudo service zuul-merger status
[08:44:16] ● zuul-merger.service - LSB: Zuul Merger
[08:44:16] Loaded: loaded (/etc/init.d/zuul-merger)
[08:44:16] Active: active (exited) since Thu 2016-07-07 08:43:59 UTC; 9s ago
[08:44:20] Shows that ^^
[08:44:26] hashar it shows exited
[08:44:39] zuul-server should be fine now
[08:44:43] Ok
[08:44:45] Thanks
[08:45:02] systemctl status zuul-merger
[08:45:03] Jul 07 08:43:59 gerrit-test zuul-merger[731]: Zuul Merger: /etc/default/zuul-merger is not set to START_DAEMON=1: exiting: failed!
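Side note on the error above: the init script exits because /etc/default/zuul-merger does not set START_DAEMON=1. A small Python sketch for checking such a defaults file. The helper name `daemon_enabled` is hypothetical, and the parsing is an assumption: it only understands plain `START_DAEMON=value` lines, whereas the real init script sources the file as shell.

```python
import re
from pathlib import Path


def daemon_enabled(text: str) -> bool:
    """Return True if the last uncommented START_DAEMON assignment is 1."""
    value = None
    for line in text.splitlines():
        # Commented-out lines start with '#' and therefore do not match.
        match = re.match(r"""\s*START_DAEMON=["']?([^"'\s#]*)""", line)
        if match:
            value = match.group(1)  # later assignments override earlier ones
    return value == "1"


if __name__ == "__main__":
    # Path taken from the error message in the log.
    defaults = Path("/etc/default/zuul-merger")
    if defaults.exists():
        print("enabled" if daemon_enabled(defaults.read_text()) else "disabled")
    else:
        print(f"{defaults} does not exist")
```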
[08:45:11] it is disabled by default
[08:45:15] Yep and oh
[08:45:21] Do we edit the file
[08:45:34] the role classes are supposed to reate /etc/default/zuul-merger and /etc/default/zuul-server with START_DAEMON=1
[08:45:35] Seems it is now stuck in que https://gerrit-zuul.wmflabs.org/
[08:45:39] Oh
[08:46:38] hashar /etc/default/zuul-server shows as a new file
[08:47:44] hashar it shows zuul/status.json: Bad Gateway
[08:47:46] again
[08:47:51] https://gerrit-zuul.wmflabs.org/
[08:47:54] Oh wait
[08:48:01] i think apache was rolled back again
[08:48:20] Yep it was rolled back
[08:50:06] Still shows zuul-merger as exited
[08:56:49] hashar ^^
[08:57:09] ● zuul-merger.service - LSB: Zuul Merger
[08:57:09] Loaded: loaded (/etc/init.d/zuul-merger)
[08:57:09] Active: active (exited) since Thu 2016-07-07 08:56:29 UTC; 13s ago
[08:57:09] Process: 1989 ExecStop=/etc/init.d/zuul-merger stop (code=exited, status=0/SUCCESS)
[08:57:09] Process: 2024 ExecStart=/etc/init.d/zuul-merger start (code=exited, status=0/SUCCESS)
[08:57:36] Release-Engineering-Team: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2436448 (hashar) The European SWAT idea got triggered following a conversation with mobile devs @Jhernandez and @phuedx . So I guess they will be happy to propose patches :] The...
[08:58:47] paladox: check the logs ?
[08:59:06] 2170 ? Sl 0:00 /usr/share/python/zuul/bin/python /usr/bin/zuul-merger -c /etc/zuul/zuul-merger.conf
[08:59:08] at least it is around
[08:59:11] I have in the merger one and shows nothing that would cause the problem
[08:59:26] It's running now
[08:59:49] audio time for me
[08:59:49] hashar http://gerrit-jenkins.wmflabs.org/job/test-gerrit/5/
[08:59:57] Release-Engineering-Team: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2436452 (zeljkofilipin) I would also like to contribute, but I need training.
[09:00:13] Release-Engineering-Team: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2436453 (phuedx) If @hashar could host a hangout (on air?) whenever he deploys, that'd be great. It's been a while since I pushed bits out to the servers so I need to get my sea...
[09:01:27] Oh
[09:01:28] Ok
[09:14:33] hashar yay it works http://gerrit-jenkins.wmflabs.org/job/test-gerrit/9/console
[09:14:42] but the icon dosent go green it goes blie
[09:14:43] blue
[09:27:53] It also merges too
[09:27:55] :)
[09:39:03] paladox: for the green icon, you need a plugin. Greenball plugin
[09:39:05] iirc
[09:39:11] Oh thanks
[09:39:28] hashar also it seems to not color success in gerrit
[09:39:31] not sure why
[09:39:47] that bits need a proper Gerrit conifg
[09:39:56] Oh
[09:40:41] what do i do gerrit config please
[09:40:47] see ./modules/gerrit/templates/gerrit.config.erb
[09:40:57] [commentlink "ci-test-result"]
[09:40:57] match = "([^ ]+) ()[^<]+ : ([a-zA-Z_]+)([^<]*)"
[09:40:57] html = "$2$1 $3$4"
[09:41:15] copy paste the [commentlink "ci-test-result"] section :)
[09:41:36] Ok
[09:41:40] Oh
[09:41:46] Thanks
[09:48:04] hashar ive looked and also applied that and works but dosent show green.
[09:48:15] but it hides the link under text now
[09:48:16] :)
[09:57:10] \O/
[09:57:19] maybe you gotta restart jenkins
[09:58:01] Ok
[09:58:10] Iv'e restarted jenkins now and am going to recheck
[10:02:43] hashar it shows green on jenkins
[10:02:48] but still shows plan color
[10:02:50] on gerrit
[10:02:52] http://gerrit-test.wmflabs.org/gerrit/#/c/11/2
[10:02:56] http://gerrit-jenkins.wmflabs.org/job/test-gerrit/18/console
[10:07:36] That's because it is in css
[10:07:37] file
[10:08:15] How do you apply css
[10:08:17] please
[10:08:27] https://github.com/wikimedia/operations-puppet/blob/production/modules/gerrit/files/skin/GerritSite.css
[10:11:03] Release-Engineering-Team: Add a European mid-day SWAT window - https://phabricator.wikimedia.org/T137970#2436623 (JanZerebecki) I would be interested in doing this. But it probably takes at least a week for me to be able find the time (need to switch my computer among other things) and it is unclear at this...
[10:18:21] Release-Engineering-Team: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2436652 (JanZerebecki) Seems I posted this to the wrong task: >>! In T137970#2436623, @JanZerebecki wrote: > I would be interested in doing this. But it probably takes at least a...
[10:20:48] hashar works now
[10:20:49] :)
[10:30:55] Deployment-Systems, scap, MediaWiki-API, Scap3 (Scap3-MediaWiki-MVP), and 2 others: Create a script to run test requests for the MediaWiki service - https://phabricator.wikimedia.org/T136839#2436704 (mobrovac) p:Triage>Normal Ok, so to start off here, we need to agree on the way that Medi...
[10:40:00] hashar there's a gerrit 2.12.3 update
[10:40:00] https://gerrit-documentation.storage.googleapis.com/ReleaseNotes/ReleaseNotes-2.12.3.html
[10:40:03] :)
[11:23:26] hashar i got color working
[11:23:27] :)
[11:23:34] i also upgraded gerrit to 2.12.3
[11:23:36] :)
[11:39:03] Browser-Tests-Infrastructure, User-zeljkofilipin: Update mediawiki_selenium to use Marionette - https://phabricator.wikimedia.org/T137540#2436915 (zeljkofilipin) a:zeljkofilipin>None
[11:45:36] hi, phab question: I'd like to create a custom tag to group a set of tickets so that I can access them quickly when working a feature that should help to fix them
[11:46:21] better in #wikimedia-devtools , but anyway: https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects
[11:46:39] andre__: thanks
[11:46:41] if you just want your very personal own watchlist of bookmarks though, there are flags in Phabricator for that
[11:48:53] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-Translate, Release, Wikimedia-log-errors: Notice: Undefined index: 0 in /srv/mediawiki/php-1.28.0-wmf.9/extensions/Translate/tag/TranslatablePage.php on line XXX - https://phabricator.wikimedia.org/T139447#2436970 (Nikerabbit) p...
[11:49:06] andre__: that's not very personal but more a way to add categorization that's not really parent/subtask, maybe Epic would be similar to what I'd like to have
[12:23:32] * paladox testing gerrit test install with testing/test and using zuul clone for mediawiki/core with tests test-gerrit composer-gerrit-test :)
[12:23:34] :)
[12:25:09] Release-Engineering-Team: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2437134 (Nikerabbit) I am interested to help, though I won't be available every day. It has been a long time since I did deployments so I too would benefit from training.
[12:26:15] Release-Engineering-Team: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2437138 (Dereckson) >>! In T137970#2385898, @Dereckson wrote: > If such window is acceptable from an ops point of view, I can be available during this time slot.
[13:09:51] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Drop some Trusty permanent slaves from integration labs project - https://phabricator.wikimedia.org/T139535#2437291 (hashar) a:hashar The Jenkins graph above is average so it does not accomodate for spikes :( I created some more g...
[13:10:38] !log deleting integration-slave-trusty-1024 and integration-slave-trusty-1025 to free up some RAM. We have enough permanent Trusty slaves. T139535
[13:10:39] T139535: Drop some Trusty permanent slaves from integration labs project - https://phabricator.wikimedia.org/T139535
[13:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:16:58] PROBLEM - Host integration-slave-trusty-1025 is DOWN: CRITICAL - Host Unreachable (10.68.18.198)
[13:18:54] PROBLEM - Host integration-slave-trusty-1024 is DOWN: CRITICAL - Host Unreachable (10.68.22.196)
[13:26:15] hashar i created the composer-gerrit-test (Yes the name is horrible but im testing for now and will clear up later including possibly adding a section in integration/config for the repos that will be used on the test install)
[13:26:28] http://gerrit-jenkins.wmflabs.org/job/composer-gerrit-test/3/console
[13:26:40] ^^ i just carn't seem to get composer to run.
[13:37:10] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Drop some Trusty permanent slaves from integration labs project - https://phabricator.wikimedia.org/T139535#2437394 (hashar) They were both on labvirt1010 which recovered 16GBytes of memory :-] {F4249228 size=full}
[13:39:49] paladox: [[: not found
[13:39:55] paladox: seems you run /bin/sh
[13:40:02] Oh
[13:40:27] paladox: in https://integration.wikimedia.org/ci/configure
[13:40:34] Yep
[13:40:38] lookup for "Shell executable" and switch to /bin/bash
[13:40:50] Ok
[13:40:54] Thanks
[13:41:06] alternatively in the job you could: #!/bin/bash
[13:41:33] Oh
[13:42:27] hashar yay it worked
[13:42:27] http://gerrit-jenkins.wmflabs.org/job/composer-gerrit-test/7/console
[13:42:37] thankyou
[13:43:21] I also cloned mediawiki/core into it
[13:45:27] I am going to work on clean it up now and merging it into integration/config for anyway to test.
[13:46:05] hashar also one question how can you ssh into other instances like integration does for testing
[13:46:09] using mutiple instances
[13:46:19] Luke081515 wanted to know how to do that too.
[13:46:22] please
[13:49:00] paladox: create an instance with contint::slave::common
[13:49:08] Oh
[13:49:11] generate a ssh key pair for the jenkins user
[13:49:18] thanks im going to try that and oh
[13:49:20] maybe create the user
[13:49:25] Yep
[13:49:37] then on the Jenkins master add the instance has a ssh slve -you will need the ssh slave plugin-
[13:49:54] https://wiki.jenkins-ci.org/display/JENKINS/Distributed+builds#Distributedbuilds-Havemasterlaunchslaveagentviassh
[13:49:57] https://wiki.jenkins-ci.org/display/JENKINS/SSH+Slaves+plugin
[13:50:02] Yep
[13:50:06] Ok thanks
[13:50:11] and https://wiki.jenkins-ci.org/display/JENKINS/Step+by+step+guide+to+set+up+master+and+slave+machines
[13:50:32] and for prod our guide is https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup
[13:50:33] I think i will call it gerrit-slave-1
[13:50:41] Browser-Tests-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review, User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2437464 (zeljkofilipin)
[13:50:41] thanks
[13:50:47] jenkins-slave-01 :]
[13:50:49] and thankyou for helping me :)
[13:50:52] Ok
[13:50:53] :)
[13:51:01] i will do jenkins-slave-01
[13:51:45] hashar should i make it debian
[13:51:48] or ubuntu
[13:51:56] depends on what you want to do ?
[13:52:09] Zend 5.5 are on Trusty
[13:52:23] most of the jobs are on Jessie (eg python / ruby / javascript)
[13:52:34] the rake / tox / npm jobs are tied to Jessie slaves
[13:52:40] maybe you can get one of each ?
[13:52:52] Deployment-Systems, scap, MediaWiki-API, Scap3 (Scap3-MediaWiki-MVP), and 2 others: Create a script to run test requests for the MediaWiki service - https://phabricator.wikimedia.org/T136839#2349461 (Anomie) >>! In T136839#2436704, @mobrovac wrote: > One way I see that might be easy to implement...
[13:53:00] on the slaves you can fill in labels to group slaves sharing a common property
[13:53:07] eg have a label "UbuntuTrusty"
[13:53:15] Oh
[13:53:18] yes we can test
[13:53:19] then you can restrict a job to run on a specific label
[13:53:23] mutiple php on one slave
[13:53:42] in the job page there is something like: [x] Restrict where this job can run: [UbuntuTrusty__________]
[13:53:43] Good place to test then roll it out to the main production ci.
[13:53:47] Oh yep
[13:54:21] I will probaly install debian jessie then i can test php 5.3 and 5.5 on the same serer
[13:54:23] server
[13:54:29] or do you want ubuntu.
[13:54:51] hashar ^^
[13:56:31] Well i presume we want two slaves one with jessie and one with ubuntu
[14:00:59] hashar it says class contint::slave::common does not exist
[14:01:19] Would it be this one role::ci::slave::labs::common
[14:01:46] paladox: role::ci::slave::labs::common probably
[14:01:53] Ok thanks
[14:03:42] hashar how do i create the user jenkins on the slave please
[14:03:55] and do i generate the key on the slave too
[14:08:15] Theres the jenkins-deploy user
[14:09:27] I am in audio right now
[14:15:27] Ok
[15:12:23] hashar what is jenkins-deploy password please
[15:12:39] greg-g: so sad to see that someone argued for the removal of the humour. more fun is now illegal. :(
[15:13:35] jzerebecki what do you mean is now illegal
[15:14:29] jzerebecki: I actually just did it to clean it up, I then thought about putting it back, just at the very bottom, as I was going to sleep. feel free to bet me :)
[15:20:24] paladox: create anotther user
[15:20:30] Ok
[15:20:31] paladox: jenkins-deploy is for CI / beta cluster
[15:20:33] I have
[15:20:38] i have created jenkins
[15:20:39] :)
[15:23:30] hashar doing this
[15:23:31] jenkins@gerrit-test:~$ ssh jenkins@10.68.18.181
[15:23:31] Password:
[15:23:31] Permission denied (publickey,keyboard-interactive).
[15:23:37] brings up the permission denied
[15:23:49] i entered the password correctly but it just isent ssh into the instance
[15:28:15] hashar i can build dpkg on windows using wpkg (Windows) which runs both on debian since it is heavly dependant on that so building it on windows will work in debian and will work with windows
[15:31:51] Browser-Tests-Infrastructure, Continuous-Integration-Config, Upstream, User-zeljkofilipin: Firefox v47 breaks mediawiki_selenium - https://phabricator.wikimedia.org/T137561#2437712 (zeljkofilipin) Running `bundle update selenium-webdriver` in a repository will update version of `selenium-webdrive...
[15:36:24] hashar i created the user but ssh wont work
[15:37:58] paladox: https://en.wiktionary.org/wiki/illegal#English meaning 1
[15:38:10] Oh thanks for replying
[15:45:11] well i found out where i do it.
[15:45:16] I do it in wikitecgh
[15:45:19] wikitech
[15:48:32] thcipriani: are you done with SWAT ?
[15:48:54] thcipriani: got some writing for the varnish routing of doc.wm.o and the various integration.wm.o URLS at https://phabricator.wikimedia.org/T139620 :D
[15:49:14] seems we just have to migrate coverage reports out of integration.wm.o
[15:49:33] and then integration.wm.o can be routed to scandium (it would then just serve the homepage + zuul + jenkins )
[15:49:43] :)
[15:49:53] ostriches: if you can cookie lick https://phabricator.wikimedia.org/T139620 !
[15:51:39] hashar: yup, SWAT is complete
[15:51:59] have some minutes to revisit the varnish url routing task before I leave?
[15:52:37] I think it is going to be fairly easy to split the generated stuff from jenkins/zuul. Just have to move /cover/ (and break the URL unfortunately)
[15:54:03] hashar: got a few for a hangout before you have to leave (I'm a slow typist :P)
[15:54:48] I'm unclear why we're moving coverage from integration to doc?
[15:55:24] paladox: no idea why ssh fails. On jenkins-slave-01 /var/log/auth.log has: error: AuthorizedKeysCommand /usr/sbin/ssh-key-ldap-lookup returned status 1
[15:55:32] Works now
[15:55:38] you have to create it in wikitech
[15:55:44] and add the ssh key there.
[15:56:03] I am adding the user to jenkins now
[15:56:05] to test
[15:56:35] paladox: there is something going on with pam / ssh apparently. Maybe #wikimedia-labs can help
[15:56:45] Oh
[15:56:50] i managed to ssh into it
[15:56:51] now
[15:56:59] thcipriani: the coverage reports can not be hosted on scandium. But they are under integration.wm.o/cover/
[15:57:22] thcipriani: so if we move integration.wm.o to be routed to scandium (for Zuul / Jenkins) we would miss the coverage report
[15:57:25] hashar: cannot be hosted on scandium because, like docs, coverage needs some kind of cgi?
[15:57:35] thcipriani: so I think /cover/ should be moved to the same future machine that will host the doc
[15:57:50] yeah I am pretty sure coverage reports has some php script. Can be checked though
[15:58:10] or we can state that coverage reports have to be plain text :)
[15:58:30] kk, that makes sense. We'd likely want a redirect from integration though, correct?
[15:58:50] or is that break acceptable?
[16:01:05] https://phabricator.wikimedia.org/T139620#2437809
[16:01:24] all static beside the couple php file we have to generate the browsing page at https://integration.wikimedia.org/cover/
[16:02:11] we could setup some wildcard permanent redirect from https://integration.wikimedia.org/cover/(.*) to https://whatever_new_host/cover/$1 yeah
[16:02:32] yup, that sounds easy and would probably save a lot of headache.
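Side note on the wildcard redirect proposed above: a Python sketch of the URL mapping it describes, from https://integration.wikimedia.org/cover/(.*) to https://whatever_new_host/cover/$1. This only illustrates the rewrite rule; the real redirect would live in the web server or Varnish config. The function name `redirect_target` is hypothetical, and `whatever_new_host` is the placeholder from the chat, since no host had been chosen.

```python
import re
from typing import Optional

# Pattern and placeholder host as written in the chat.
OLD_COVER = re.compile(r"^https://integration\.wikimedia\.org/cover/(.*)$")
NEW_HOST = "whatever_new_host"


def redirect_target(url: str) -> Optional[str]:
    """Return the new location for a /cover/ URL, or None if it stays put."""
    match = OLD_COVER.match(url)
    if match is None:
        return None
    # Equivalent of $1 in the redirect rule: keep everything after /cover/.
    return f"https://{NEW_HOST}/cover/{match.group(1)}"


if __name__ == "__main__":
    # The sub-path below is a made-up example, not a known report URL.
    print(redirect_target("https://integration.wikimedia.org/cover/example/report/"))
    print(redirect_target("https://integration.wikimedia.org/zuul/"))
```

Only /cover/ URLs are rewritten, so the homepage, /zuul/ and the Jenkins proxy routes would stay on integration.wikimedia.org as discussed.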
[16:02:59] then we will know that scandium has only the basic page https://integration.wikimedia.org/
[16:03:08] the HTML glue for Zuul status https://integration.wikimedia.org/zuul/
[16:03:22] and the mod_proxy routes to reach Zuul and Jenkins daemons
[16:04:03] then we can get the generated doc/coverage migrated either before or after we migrate stuff to scandium
[16:04:22] which leave us with the question: where to? :(
[16:05:10] yeah, still no good answer there. Didn't seem like a ganeti vm was outside the realm of possibility. More a question of policy rather than possibility.
[16:05:19] yeah looks like
[16:05:27] will poke eu folks tomorrow
[16:05:52] and draw yet another layout!
[16:06:03] hashar I tryed ssh in jenkins
[16:06:28] that will be for tomorrow. Rushing out to catch up with kids then I am out for local hacker group meeting.
[16:06:32] thcipriani: thx tyler!
[16:06:34] Ok
[16:06:50] paladox: that is really some labs / ssh issue. Not sure what is going on , I have no good idea :(
[16:06:58] Oh
[16:07:07] probably pam/ssh refusing to honor /home/jenkins/.ssh/authorized_keys somehow
[16:07:08] But ssh works on the machine
[16:07:15] or the files in .ssh dir having the wrong permission
[16:07:16] s
[16:07:17] just not in jenkins gui
[16:07:22] I am out!
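Side note on the permissions guess above: with sshd's StrictModes option (on by default), key authentication is refused when the home directory, ~/.ssh or authorized_keys is writable by group or others. A rough Python sketch of that one check. The helper name `loose_paths` is hypothetical, and it deliberately ignores ownership, which sshd also verifies. Nothing in the log confirms that permissions were the actual cause here.

```python
import stat
from pathlib import Path
from typing import List


def loose_paths(home: Path) -> List[str]:
    """Return the paths under home that are writable by group or others."""
    candidates = [home, home / ".ssh", home / ".ssh" / "authorized_keys"]
    flagged = []
    for path in candidates:
        if not path.exists():
            continue  # a missing file is a different problem, skip it here
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            flagged.append(str(path))
    return flagged


if __name__ == "__main__":
    # /home/jenkins is the home directory mentioned in the log.
    for path in loose_paths(Path("/home/jenkins")):
        print(f"too permissive: {path}")
```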
[16:07:22] Oh maybe
[16:07:24] Ok bye
[16:07:31] #wikimedia-labs can help I am sure
[16:07:36] Ok
[16:07:46] Not with jenkins
[16:07:50] ive already asked
[16:29:39] Browser-Tests, MobileFrontend, Reading-Web-Backlog, Technical-Debt: Refactor MobileFrontend browser tests (language, nearby) - https://phabricator.wikimedia.org/T109464#2437939 (Jhernandez)
[17:02:48] ostriches just to let you know gerrit was updated today to 2.12.3 and includes a fix for plugins download link
[17:02:55] and also i think may have a security fix
[18:13:19] hashar yay i got it working remotly now
[18:13:20] http://gerrit-jenkins.wmflabs.org/job/composer-gerrit-test/13/console
[18:15:20] username should be hasharAway
[19:06:46] Gerrit-Migration, releng-201516-q4: Phase 1 repository migrations to Differential (goal - end of June 2016) - https://phabricator.wikimedia.org/T130418#2438726 (greg)
[19:06:49] Gerrit-Migration, releng-201516-q4, Wikimedia-Wikimania-Scholarships: Migrate wikimedia-wikimania-scholarships to Differential - https://phabricator.wikimedia.org/T132173#2438724 (greg) Open>Resolved donezors
[20:25:36] Release-Engineering-Team (Deployment-Blockers), Patch-For-Review, Release: MW-1.28.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T138555#2439040 (mmodell)
[21:02:20] Beta-Cluster-Infrastructure: Setup a Swift cluster on beta-cluster to match production - https://phabricator.wikimedia.org/T64835#670674 (AlexMonk-WMF) >>! In T64835#2431784, @hashar wrote: > From some past discussions (and maybe it is recorded on a task): we will want to clean up mass of crap that is in upl...
[21:04:29] Release-Engineering-Team (Deployment-Blockers), Release, WMF-deploy-2016-07-05_(1.28.0-wmf.9): MW-1.28.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T138555#2439257 (mmodell)
[21:09:01] Beta-Cluster-Infrastructure: Setup a Swift cluster on beta-cluster to match production - https://phabricator.wikimedia.org/T64835#2439297 (AlexMonk-WMF) So, we need to: * clean up those files above * migrate all the remaining files by adding the swift filebackend to filebackend-labs.php and running maintenan...
[21:14:24] Project beta-code-update-eqiad build #111858: FAILURE in 1 min 22 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/111858/
[21:15:37] Gerrit-Migration, MediaWiki-Vagrant: Migrate mediawiki-vagrant to Differential - https://phabricator.wikimedia.org/T131419#2439362 (greg) We missed this one for Q4, but we still hope to migrate it soon. Here's the plan (from a call today with Mukunda, Chad, Tyler, and myself): timeline: * announce we'r...
[21:16:36] Gerrit-Migration, releng-201516-q4, User-greg: Phase 1 repository migrations to Differential (goal - end of June 2016) - https://phabricator.wikimedia.org/T130418#2439385 (greg) Open>Resolved a:greg Done, though we didn't get to migrating MW-V.
[21:20:05] Project beta-update-databases-eqiad build #9736: FAILURE in 4.2 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/9736/
[21:24:22] Project beta-code-update-eqiad build #111859: STILL FAILING in 1 min 21 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/111859/
[21:34:20] Project beta-code-update-eqiad build #111860: STILL FAILING in 1 min 19 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/111860/
[21:35:17] :( ^
[21:36:31] looks like messed up perms: 21:33:02 error: insufficient permission for adding an object to repository database .git/objects
[21:37:15] MaxSem: are you working on deployment-tin currently?
[21:37:46] yup, .git is chowned as root:wikidev
[21:38:05] yup
[21:38:16] lemme see if I fished something
[21:38:31] W000000T
[21:39:06] why is vendor/.git/index owned by root:root?
[21:39:24] cause I was debgging stuff
[21:39:44] anyway, I finally have what I was looking for
[21:39:54] good
[21:39:56] :)
[21:41:24] !log Chowned php-master/vendor back to jenkins-deploy
[21:41:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:44:37] Yippee, build fixed!
[21:44:38] Project beta-code-update-eqiad build #111861: FIXED in 1 min 36 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/111861/
[21:44:49] https://phabricator.wikimedia.org/T139469
[21:45:14] nice!
[21:48:51] Release-Engineering-Team, Developer-Relations (Jul-Sep-2016): Write blog post highlighting recent Phabricator improvements - https://phabricator.wikimedia.org/T137727#2439718 (Qgil)
[22:05:01] greg-g, thcipriani - I'm going to push fixes for that and another VE bug that's spamming teh logs - objections?
[22:06:17] MaxSem: no objections from me. No deployments are happening right now. Logspam has been a bit over-the-top recently.
[22:12:32] +1
[22:17:30] mhm, jenkins doesn't update the submodule...
[22:18:52] MaxSem: Yeah, you need to make one manually for VE.
[22:18:58] :O
[22:20:25] Isn't tech debt fun?
[22:20:37] Yippee, build fixed!
[22:20:38] Project beta-update-databases-eqiad build #9737: FIXED in 36 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/9737/
[22:20:50] I have a file with all the pre-written commands ready to alter/copy/paste when I do it.
[22:22:15] https://wikitech.wikimedia.org/wiki/How_to_deploy_code/Core_submodule_update is helpful wrt to ve
[22:24:43] I had a file like that but I uploaded it to wikitech ^ so everyone else could benefit
[22:25:11] I still remember it, but meeeeeh
[22:27:33] I remember when there was confusion over whether we actually wanted Gerrit to perform this automatically for us
[22:36:13] James_F, deployed
[23:31:04] MaxSem: Did the resulting log errors drop to 0?
[23:31:27] aha
[23:43:13] Continuous-Integration-Infrastructure, Zuul: Investigate Zuul 2.1.0-151-g30a433b that stops processing Gerrit events - https://phabricator.wikimedia.org/T137525#2440174 (Paladox) @hashar could you try reinstalling it on the server as https://phabricator.wikimedia.org/rOPUP29188f7ea628ff8148544923957069bb...