[00:03:59] <greg-g>	 neat https://phabricator.wikimedia.org/T143536#2632502
[00:04:06] <greg-g>	 (ops-monitoring-bot)
[00:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11326: 04STILL FAILING in 0.63 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11326/
[00:54:32] <bd808>	 greg-g: we're doing chatops now :)
[00:54:36] <bd808>	 so trendy
[00:54:48] <greg-g>	 hah
[00:55:14] <greg-g>	 next is "jouncebot deploy 1.28-wmf.20 group0"
[00:55:34] <bd808>	 you know....
[00:55:36] <bd808>	 :)
[00:56:04] <bd808>	 I've actually thought about adding something like that to stashbot
[00:56:34] <bd808>	 mostly just to see how hard it would be to get working well
[01:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11327: 04STILL FAILING in 0.63 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11327/
[02:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11328: 04STILL FAILING in 0.81 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11328/
[02:29:49] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-JsonConfig, 10MediaWiki-extensions-ZeroBanner, 06Reading-Web-Backlog, and 3 others: Zero phpunit test failure (blocks merges to MobileFrontend) - https://phabricator.wikimedia.org/T145227#2635594 (10jhobs) Just chiming in to say I've seen the change y...
[02:32:39] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-JsonConfig, 10MediaWiki-extensions-ZeroBanner, 06Reading-Web-Backlog, and 3 others: Zero phpunit test failure (blocks merges to MobileFrontend) - https://phabricator.wikimedia.org/T145227#2635596 (10Yurik) Investigating.  ZeroPortal should be dependen...
[02:42:01] <wmf-insecte>	 Project selenium-CirrusSearch » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #149: 04FAILURE in 1 min 0 sec: https://integration.wikimedia.org/ci/job/selenium-CirrusSearch/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/149/
[02:44:31] <wmf-insecte>	 Project beta-code-update-eqiad build #121372: 04FAILURE in 1 min 30 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121372/
[02:45:53] <shinken-wm>	 PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[02:46:29] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[02:54:36] <wmf-insecte>	 Yippee, build fixed!
[02:54:37] <wmf-insecte>	 Project beta-code-update-eqiad build #121373: 09FIXED in 1 min 35 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/121373/
[03:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11329: 04STILL FAILING in 0.62 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11329/
[03:21:29] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:25:51] <shinken-wm>	 RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:33:18] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-JsonConfig, 10MediaWiki-extensions-ZeroBanner, 06Reading-Web-Backlog, and 3 others: Zero phpunit test failure (blocks merges to MobileFrontend) - https://phabricator.wikimedia.org/T145227#2635626 (10Yurik) Ok, first to clarify how JsonConfig handles r...
[04:19:03] <wmf-insecte>	 Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #141: 04FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/141/
[04:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11330: 04STILL FAILING in 0.76 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11330/
[04:50:20] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-JsonConfig, 10MediaWiki-extensions-ZeroBanner, 06Reading-Web-Backlog, and 3 others: Zero phpunit test failure (blocks merges to MobileFrontend) - https://phabricator.wikimedia.org/T145227#2635678 (10jhobs) Thanks for the detailed explanation Yuri!
[05:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11331: 04STILL FAILING in 0.63 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11331/
[05:34:56] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-db03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [43200.0]
[06:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11332: 04STILL FAILING in 0.75 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11332/
[06:56:58] <shinken-wm>	 PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:14:37] <shinken-wm>	 PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:17:19] <legoktm>	 !log mysql just died on a bunch of slaves (trusty-1013, 1012, 1001)
[07:17:23] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[07:19:45] <legoktm>	 !log sudo salt '*trusty*' cmd.run 'service mysql start', it was down on all trusty salves
[07:19:46] <legoktm>	 slaves*
[07:19:49] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[07:20:01] <wmf-insecte>	 Project beta-update-databases-eqiad build #11333: 04STILL FAILING in 0.69 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11333/
[07:23:34] <legoktm>	 I self-merged the MF reverts that will fix beta
[07:31:57] <shinken-wm>	 RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:48:00] <elukey>	 morning! I'd need to nuke mediawiki01 if nobody oppose
[07:48:09] <elukey>	 ah hashar still not here, will wait
[07:54:37] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:55:36] <hashar>	 elukey: I am probably going to delete the Trusty  deployment-mediawiki01
[07:55:43] <hashar>	 seems the jessie one mediawiki04 works fine
[07:56:09] <elukey>	 hashar: o/
[07:56:25] <elukey>	 this would be great so I will be able to create mediawiki05 :P
[07:57:11] <hashar>	 I am digging in logstash logs
[07:58:05] <hashar>	 with stuff like "unknown: Unknown error:60" 
[07:58:46] <hashar>	 elukey: also moritzm is willing to recreate mira/deployment-tin as jessie hosts
[07:59:09] <elukey>	 afaik the keyholder is now able to work with systemd
[07:59:17] <elukey>	 that was the major blocker
[07:59:34] <shinken-wm>	 PROBLEM - Puppet run on deployment-mediawiki02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[08:02:06] <elukey>	 mediawiki02 feels the upcoming kill :D
[08:02:33] <elukey>	 hashar: are these problems appeared after the mediawiki01/04 replacement?
[08:02:47] <elukey>	 s/are/have/
[08:02:58] <moritzm>	 yeah, I can recreate mira/tin as jessie hosts in deployment-prep
[08:03:03] <moritzm>	 was the quota bumped yet?
[08:03:15] <elukey>	 nope..
[08:03:55] <moritzm>	 meta bug seems to be here: https://phabricator.wikimedia.org/T140904
[08:05:17] <elukey>	 I think that releng wants to wait a bit to cleanup not used resources
[08:05:45] <wikibugs>	 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 06Discovery-Search: On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 - https://phabricator.wikimedia.org/T145609#2635904 (10hashar)
[08:05:51] <elukey>	 and two db hosts should get nuked soon to free VCPUs and ram
[08:06:01] <hashar>	 ah so the db hosts
[08:06:12] <hashar>	 the migration was to happen yesterday
[08:06:19] <hashar>	 but one of the production database "exploded" yesterday
[08:06:24] <hashar>	 so Jaime had to context switch
[08:06:49] <hashar>	 so Dan is going to reschedule for next week.  The good news is that he apparently got enough information from Jaime that he is feeling confortable doing it by himsel
[08:06:50] <hashar>	 f
[08:07:17] <hashar>	 also gotta up to MariaDB 10  (vs 5.6 or something that comes from Jessie)
[08:07:44] <elukey>	 okok
[08:08:50] <wikibugs>	 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 06Discovery-Search: On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 - https://phabricator.wikimedia.org/T145609#2635917 (10hashar) All messages come from deployment-mediawiki04  which is a new MW App we have added yesterday and run...
[08:09:02] <hashar>	 poor mw04 does not talk to ElasticSearch somehow :(
[08:09:06] <elukey>	 if you guys are able to nuke some resources (other than dbs) in the next days it would be great to speed up the mw debian rollout 
[08:09:12] <elukey>	 ah really?
[08:09:17] <elukey>	 :/
[08:10:06] <elukey>	 checking
[08:11:07] <hashar>	 moritzm: you might ask to ask for a quota raise. Will be faster than us deleting / migrating instances
[08:11:37] <hashar>	 moritzm: with the reasoning being that  clean up is being done  so the quota raise is rather temporarily.  Then I am not sure how much capacity labs has and apparently it is still pretty crowed
[08:12:26] <moritzm>	 yeah, I'll make a Phab task now, we really shouldn't limit outselves with a quota for a resource like deployment-prep
[08:13:22] <elukey>	 I can see a lot of Sep 14 08:03:07 deployment-mediawiki04 puppet-agent[16304]: Failed to load library 'msgpack' for feature 'msgpack' on mw04 but probably not related
[08:13:32] <elukey>	 puppet works fine afaics
[08:14:05] <hashar>	 yeah that load library 'msgpack' is unrelated.   That is puppet complaining it can't find a ruby module
[08:14:08] <elukey>	 maybe something new from the 3.8 migration?
[08:14:12] <moritzm>	 which kind of quota did we bump into? CPU only or also RAM?
[08:14:14] <elukey>	 ah okok
[08:14:16] <hashar>	 that implements the msgpack format so it can use that instead of json
[08:14:22] <moritzm>	 the task template wants to know
[08:14:24] <hashar>	 msgpack being a kind of binary json
[08:14:33] <shinken-wm>	 RECOVERY - Puppet run on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:14:48] <elukey>	 moritzm: atm we have Used 352,256 of 358,400 (GB)
[08:15:00] <elukey>	 so I'd say both VCPU and ram
[08:15:11] <hashar>	 last time I went with a table showing the metrics https://phabricator.wikimedia.org/T133911
[08:15:35] <hashar>	 eg please bump the quota to allow for  4 m1.large instances,  then list out the amount of cpu/ram/disk needed to be added to the quota
[08:15:54] <moritzm>	 ok, how many of the CPUs are we currently using?
[08:16:16] <wikibugs>	 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2635935 (10hashar)
[08:18:12] <wikibugs>	 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2248624 (10hashar)
[08:18:14] <wikibugs>	 10Continuous-Integration-Config, 13Patch-For-Review: Create composer-php70 job - https://phabricator.wikimedia.org/T144961#2635937 (10hashar)
[08:19:42] <moritzm>	 https://phabricator.wikimedia.org/T145611
[08:21:07] <wmf-insecte>	 Yippee, build fixed!
[08:21:08] <wmf-insecte>	 Project beta-update-databases-eqiad build #11334: 09FIXED in 1 min 6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11334/
[08:24:03] <hashar>	 moritzm: I am tweaking the task thx :)
[08:25:30] <elukey>	 gehel: o/ if you have 5 mins would you mind to check https://phabricator.wikimedia.org/T145609#2635917 ?
[08:25:42] <gehel>	 elukey: sure
[08:25:50] <elukey>	 thankssss
[08:26:27] <elukey>	 so basically we created a new Debian instance to replace mediawiki01
[08:26:39] <elukey>	 is there anything (configuration wise) that we are missing?
[08:26:48] <gehel>	 elukey: I know mostly nothing about the mediawiki side of CIrrus, but dcausse might have more insight
[08:26:58] <elukey>	 ah okok! 
[08:27:14] <elukey>	 afaik this is a mw extension to talk directly with ES right?
[08:27:16] <moritzm>	 hashar: ok :-)
[08:27:20] <elukey>	 not sure for what though :)
[08:28:18] <gehel>	 elukey: yes, the cirrus extension talks to elasticsearch
[08:29:12] <elukey>	 gehel: if you could get some insight from your team later on during the day it would help a ton
[08:30:50] <gehel>	 dcausse is looking right now
[08:32:07] <hashar>	 that is only on deployment-mediawiki04
[08:32:16] <hashar>	 which we have added to the pool yesterday and is running JEssie
[08:32:22] <hashar>	 other app servers are on Trusty
[08:36:19] <dcausse>	 gehel: it's possible that it's due to https certificates
[08:36:36] <dcausse>	 from mw04 I can't curl https://deployment-elastic06.deployment-prep.eqiad.wmflabs:9243
[08:36:54] <gehel>	 that might explain...
[08:37:14] <gehel>	 though I would have expected a more explicit error message from curl.
[08:38:36] <dcausse>	 yes curl error codes are not very explicit: https://curl.haxx.se/libcurl/c/libcurl-errors.html
[08:38:44] <dcausse>	 60 seems to be CURLE_SSL_CACERT
[08:38:46] <gehel>	 elukey: as far as I know, we do deploy the puppet CA certificate on all node. With all the changes around puppet, could that be related?
[08:39:27] <gehel>	 need to go afk for 5 minutes, I'm getting oil delivered for our furnace...
[08:45:15] <gehel>	 back
[08:45:19] <dcausse>	 elukey: it seems that /etc/ssl/certs/7abfb60b.0 should be generated and it's not
[08:45:47] <dcausse>	 on trusty this file exists but not on debian
[08:47:38] <gehel>	 that file *should* be generated by "update-ca-certificates"
[08:47:43] <hashar>	 ohhh
[08:47:55] <gehel>	 the puppet cert itself is present in /etc/ssl/certs/Puppet_Internal_CA.pem
[08:48:30] <hashar>	 dont we get some puppet class to deal/manage the puppet certs?
[08:48:33] <gehel>	 dcausse: I now remember we had the same issue on Vagrant...
[08:49:15] <gehel>	 hashar: we "should*. And since we do get the puppet cert copied to /etc/ssl/certs, it seem that we have at least part of the solution in place
[08:49:37] <hashar>	 at least on Jessie there is:  /etc/ssl/certs/Puppet_Internal_CA.pem -> /usr/local/share/ca-certificates/Puppet_Internal_CA.crt
[08:50:56] <hashar>	 dcausse commented about it  when relforge instances got created:  https://phabricator.wikimedia.org/T142558#2544603
[08:51:08] <gehel>	 the "sslcert::ca" defined type seems to notify "Exec['update-ca-certificates']", so it *should* work, except that it does not...
[08:51:10] <dcausse>	 gehel: yes we had exactly the same issue, but I have no idea how to solve this problem properly, on that vagrant box I just created the link manually
[08:52:30] <wikibugs>	 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 06Discovery-Search: On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 - https://phabricator.wikimedia.org/T145609#2636088 (10hashar) [8:38:39]  <dcausse> yes curl error codes are not very explicit: https://curl.haxx.se/libcurl/c/libc...
[08:52:33] <gehel>	 elukey: do you know anything about update-ca-certificates?
[08:52:42] <hashar>	 the 10 dollars question is: why doesn't it happen in production?
[08:53:11] * gehel whishes he had 10 dollars...
[08:53:37] <hashar>	 at least there is modules/base/manifests/certificates.pp:25:    sslcert::ca { 'Puppet_Internal_CA':
[08:54:04] <dcausse>	 hashar: do you know a production app server that's running jessie?
[08:54:54] * elukey reads the backlog sorry
[08:55:19] <hashar>	 dcausse: I thought joe started reimaging a bunch of them.
[08:55:54] <dcausse>	 ok, I'm curious to see if this file is generated or if curl options are just slightly different
[08:56:07] <wikibugs>	 10Browser-Tests-Infrastructure, 06Release-Engineering-Team, 07Epic, 07Tracking: [EPIC] trigger browser tests from Gerrit (tracking) - https://phabricator.wikimedia.org/T55697#2636095 (10phuedx)
[08:56:10] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-RelatedArticles, 06Reading-Web-Backlog, 07Browser-Tests, and 3 others: RelatedArticles browser tests should run on a commit basis - https://phabricator.wikimedia.org/T120715#2636093 (10phuedx) 05Open>03Resolved Per T120715#2634871.
[08:56:54] <hashar>	 dcausse: at least mw2200.codfw.wmnet  is jessie
[08:57:05] <dcausse>	 hashar: thanks, looking
[08:57:14] <hashar>	 and it has the link /etc/ssl/certs/7abfb60b.0 -> Puppet_Internal_CA.pem
[08:57:19] <wikibugs>	 10Continuous-Integration-Config, 10MediaWiki-extensions-RelatedArticles, 06Reading-Web-Backlog, 07Browser-Tests, and 3 others: RelatedArticles browser tests should run on a commit basis - https://phabricator.wikimedia.org/T120715#2636102 (10phuedx)
[08:57:19] <hashar>	 so guess it is some oddity on beta :(
[08:57:22] <elukey>	 dcausse: all the codfw DC is jessie now :)
[08:57:37] <elukey>	 and we have a lot of jessies in eqiad too
[08:58:21] <elukey>	 https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&c=Application+servers+eqiad&h=&tab=m&vn=&hide-hf=false&m=os_release&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
[08:58:32] <elukey>	 all the ones with 4.4.0-1-amd64
[08:58:37] <dcausse>	 elukey: thanks
[08:58:57] <elukey>	 and API
[08:58:57] <elukey>	 https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&c=API+application+servers+eqiad&h=&tab=m&vn=&hide-hf=false&m=os_release&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
[08:59:02] <elukey>	 same thing :)
[08:59:05] <elukey>	 let me know if I can help 
[08:59:20] <elukey>	 gehel: no idea bout update-ca-certificates sorry :(
[08:59:32] * gehel is reading man pages...
[09:00:18] <hashar>	 from a puppet debug:
[09:00:32] <hashar>	 Debug: /Stage[main]/Base::Certificates/Sslcert::Ca[Puppet_Internal_CA]
[09:00:34] <hashar>	 subscribes to Exec[update-ca-certificates]
[09:01:33] <gehel>	 update-ca-certificates is the one creating the links in /etc/ssl/certs. It should also call "c_rehash" to generate the hash symlinks, but it seems that it did not
[09:01:33] <elukey>	 elukey@deployment-mediawiki04:~$ ls -l /etc/ssl/certs/ | grep -i puppet
[09:01:36] <elukey>	 lrwxrwxrwx 1 root root     22 Sep 12 13:50 a861a8e4.0 -> Puppet_Internal_CA.pem
[09:01:39] <elukey>	 lrwxrwxrwx 1 root root     22 Sep 12 13:50 c934af94.0 -> Puppet_Internal_CA.pem
[09:01:42] <elukey>	 lrwxrwxrwx 1 root root     55 Sep 12 13:50 Puppet_Internal_CA.pem -> /usr/local/share/ca-certificates/Puppet_Internal_CA.crt
[09:02:12] <hashar>	 oh
[09:02:40] <gehel>	 Oh, so we have different hashes?
[09:02:48] * gehel did not even check, thanks elukey !
[09:03:20] <hashar>	 curl doing  stat("/etc/ssl/certs/7abfb60b.0", 0x7ffe7ec372a0) = -1 ENOENT (No such file or directory)
[09:04:57] <gehel>	 puppet master for deployment-mediawiki04 is not the same puppet master as the one who signed the elasticsearch certs
[09:05:21] <gehel>	 so our CA is broken. Why does it work for other servers?
[09:05:46] <wikibugs>	 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 06Discovery-Search: On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 - https://phabricator.wikimedia.org/T145609#2636112 (10hashar) Copy pasting from IRC investigations with dcausse / gehel / elukey. To repro:  ``` root@deployment-m...
[09:06:04] <hashar>	 beta uses it is own puppetmaster on deployment-puppetmater
[09:06:25] <hashar>	 maybe the CA cert got changed in between?
[09:07:47] <gehel>	 even stranger:
[09:07:51] <gehel>	 https://www.irccloud.com/pastebin/gXOfqA29/
[09:08:12] <gehel>	 certificate seems to be the same on wm03 and 04
[09:08:21] <hashar>	 the md5sum of /usr/local/share/ca-certificates/Puppet_Internal_CA.crt'  is the same on all servers 
[09:09:09] <gehel>	 but the hashes are different:
[09:09:11] <gehel>	 https://www.irccloud.com/pastebin/eKMl4Nca/
[09:09:49] * gehel is lost
[09:11:19] <gehel>	 hashar: on which server do you see a different Puppet_Internal_CA.crt?
[09:11:42] <hashar>	 it is the same everywhere
[09:11:50] <hashar>	 let me double check via salt though ( deployment-saltmaster )
[09:12:00] <hashar>	 err deployment-salt02.deployment-prep.eqiad.wmflabs
[09:12:26] <hashar>	 yeah that is the same everywhere
[09:12:37] <hashar>	 so I am pretty sure I have encountered that exact same issue (hash mismatch) years ago
[09:12:43] <hashar>	 digging in Phabriator for clues
[09:13:52] <gehel>	 The Puppet_Internal_CA.crt on deployment-mediawiki is not the same as production (obvious). So how can it work on some hosts...
[09:17:19] <gehel>	 hashar: the Puppet_Internal_CA.pem symlink was created on Dec 2 2015, but the hash symlinks only on Feb 25 2016 (for deployment-mediawiki0[23])
[09:17:58] <hashar>	 the regular file IGC_A.pem  is from that date
[09:18:02] <hashar>	 so I guess some new CA got added
[09:18:06] <hashar>	 and the symlinks refreshed
[09:18:22] <hashar>	 maybe we can try:  update-ca-certificates --verbose --fresh
[09:18:24] <gehel>	 might be...
[09:18:26] <hashar>	 seems to redo all symlinks
[09:18:48] <elukey>	 let's try
[09:19:01] <gehel>	 hashar: and do it on both mw03 AND 04...
[09:19:08] <elukey>	 doing it 
[09:19:12] <gehel>	 see if we fix everything or break everything...
[09:19:30] <elukey>	 now curl works on 04 :)
[09:19:32] <hashar>	 on mw04 I have  saved  output  of  ls -lrta  to  old.txt
[09:19:34] <hashar>	 ahah
[09:19:35] <hashar>	 magic
[09:19:54] <elukey>	 elukey@deployment-mediawiki04:/etc/ssl/certs$ ls -l /etc/ssl/certs/ | grep -i puppet
[09:19:56] <elukey>	 lrwxrwxrwx 1 root root     22 Sep 14 09:19 7abfb60b.0 -> Puppet_Internal_CA.pem
[09:19:59] <elukey>	 lrwxrwxrwx 1 root root     22 Sep 14 09:19 9bfff5bf.0 -> Puppet_Internal_CA.pem
[09:20:03] <elukey>	 lrwxrwxrwx 1 root root     55 Sep 12 13:50 Puppet_Internal_CA.pem -> /usr/local/share/ca-certificates/Puppet_Internal_CA.crt
[09:20:31] <hashar>	 magic
[09:20:40] <dcausse>	 \o/
[09:20:40] <gehel>	 so update-ca-certificates was not run by puppet? Or not at the right time?
[09:20:42] <hashar>	 so I am wondering what got to be fixed in puppet
[09:21:00] <dcausse>	 the hash can be detected with: openssl x509 -subject_hash  -noout < /etc/ssl/certs/Puppet_Internal_CA.pem
[09:21:10] <hashar>	 and I have no idea how the hash are generated.  But whatever cause them to change should probably trigger a   update-ca-certificates --fresh
[09:22:25] <hashar>	 maybe Brandon would know. Seem he has some experience with the sslcert puppet module
[09:23:00] <elukey>	     if $ensure == 'absent' {
[09:23:00] <elukey>	         # clean up manually -- update-ca-certificates leaves stale symlinks
[09:23:03] <elukey>	         file { "/etc/ssl/certs/${title}.pem":
[09:23:05] <elukey>	             ensure => $ensure,
[09:23:08] <elukey>	             before => File["/usr/local/share/ca-certificates/${title}.crt"],
[09:23:11] <elukey>	         }
[09:23:13] <elukey>	     }
[09:23:18] <gehel>	 currently in puppet update-ca-certificates is called without "--fresh"
[09:23:19] <hashar>	 eeek
[09:23:29] <hashar>	 but Brandon actually had the removal of cert to trigger the update  ( https://gerrit.wikimedia.org/r/#/c/224639/1/modules/sslcert/manifests/ca.pp )
[09:23:33] <hashar>	 guess that got reverted at some point
[09:24:02] <elukey>	 well we can ask also to moritzm :)
[09:24:48] <hashar>	 then the motifygot removed in https://gerrit.wikimedia.org/r/#/c/224828/2/modules/sslcert/manifests/ca.pp
[09:25:49] <gehel>	 hashar, elukey: which one of us takes the task? At this point, it probably does not make sense that the 3 of us continue digging into this
[09:25:58] <wikibugs>	 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 06Discovery-Search: On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 - https://phabricator.wikimedia.org/T145609#2636230 (10hashar)
[09:26:00] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2636229 (10hashar)
[09:26:08] <elukey>	 gehel: we'll keep going thanks for the help!
[09:26:19] <elukey>	 hashar: we are now probably unblocked to nuke mw10
[09:26:22] <elukey>	 *mw01
[09:26:23] <gehel>	 elukey: thanks to you!
[09:26:57] <hashar>	 I have updated the deploy doc
[09:26:58] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Operations, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2586022 (10hashar)
[09:27:02] <hashar>	 yeah mw01 can go now. I am deleting it
[09:27:07] <gehel>	 elukey: on a completely unrelated subject, it seems that you have quite a few commits on varnish kafka. If you have time at some point to explain what it is, I'm interested!
[09:27:49] <wmf-insecte>	 Yippee, build fixed!
[09:27:50] <wmf-insecte>	 Project selenium-CirrusSearch » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #151: 09FIXED in 43 sec: https://integration.wikimedia.org/ci/job/selenium-CirrusSearch/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/151/
[09:27:50] <hashar>	 !log Deleting deployment-mediawiki01 , replaced by deployment-mediawiki04  T144006
[09:27:54] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:28:01] <elukey>	 \o/
[09:28:23] <hashar>	 zeljkof: so CirrusSearch selenium jobs on beta were broken due to a mw app server not being able to reach the ElasticSearch (ssl cert issue).  
[09:28:32] <hashar>	 browser tests have a purpose sometime :}}}
[09:28:42] <dcausse>	 :)
[09:28:42] <zeljkof>	 hashar: :D
[09:28:46] <zeljkof>	 sometimes...
[09:28:54] <hashar>	 gehel: dcausse: elukey: thank you very much!  I have update the doc to switch to jessie servers and mention   update-ca-certificate --verbose --fresh
[09:29:03] <dcausse>	 hashar: thanks!
[09:29:16] <hashar>	 will want to track what is going on maybe. Then if production is not impacted it is probably not so urgent
[09:29:26] <gehel>	 note: the puppet code does run update-ca-certificate, but without --fresh
[09:29:47] <hashar>	 well somehow it seems to create a wrong hash for the Puppet_CA_Internal 
[09:29:59] <hashar>	 maybe because the ca is generated very early on in the initial puppet provisionning
[09:30:09] <hashar>	 then something get updated/upgraded later which cause the hash to be different
[09:30:24] <hashar>	 then I have no idea how those hashes vary :(
[09:30:44] <hashar>	 some upgrade / package install seems to miss a --fresh :(
[09:32:12] <hashar>	 elukey: moritzm:  got some quota freed up on beta allowing for one m1.large (4CPU, 8GB RAM).  So either we get a new mira for moritzm or a new mw app server for elukey :D
[09:35:44] <wikibugs>	 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 06Discovery-Search: On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 - https://phabricator.wikimedia.org/T145609#2636256 (10hashar)
[09:37:10] <elukey>	 I am going afk for a bit, if nobody takes the extra space I'll spin up mediawiki05 :)
[09:42:06] <moritzm>	 hashar: the new mira can easily be a m1.medium, I don't see a reason for an m1.large one?
[09:42:11] <moritzm>	 same for the new tin
[09:45:00] <hashar>	 not sure, maybe it needs ton of CPU to refresh the cdb files
[09:45:11] <hashar>	 we can try with a m1.medium and see what happens
[09:56:22] <wikibugs>	 10Beta-Cluster-Infrastructure, 10CirrusSearch, 06Discovery, 06Discovery-Search: On Beta cluster, CirrusSearch yields: unknown: Unknown error:60 - https://phabricator.wikimedia.org/T145609#2636308 (10hashar) From the syslog on deployment-mediawiki04  `update-ca-certificate` is run, then the Puppet_Internal_...
[09:57:35] <hashar>	 dcausse: gehel: elukey: I think I got a good explanation and commented on the task.   I suspect that update-ca-certificates does not detect that the Puppet_Internal_CA.crt file has been changed and does not refresh its symlink.   Would need a new puppet Exec to run it with --fresh.  I have cced Moritz/Brandon to the task
[09:57:42] <hashar>	 thank you again
[09:58:40] <gehel>	 hashar: that seems like a good explanation!
[09:58:58] <hashar>	 with puppet not doing any distinction between new and changed file event
[09:59:06] <hashar>	 so for a new file update-ca-certificates  works
[09:59:11] <hashar>	 but on a change, that does not :}
[09:59:52] <moritzm>	 hashar, elukey: any naming bikeshedding/prefs for the new instance namem, shall I make that mira2 or mira-jessie?
[10:00:00] <hashar>	 would need something like :  notify_new => Exec[update-ca-certificate]  ,  notify_changed => Exec[update-ca-certificates-fresh]
[10:00:36] <hashar>	 moritzm: I dont think we ever added the distro in the name.  I would go with mira2 :)
[10:01:31] <moritzm>	 ok
[10:02:13] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Pywikibot-core, 07Jenkins, and 2 others: pyflakes-py3 and pyflakes-pypy commands fails since 1 June, blocking merges in pywikibot - https://phabricator.wikimedia.org/T137628#2636318 (10Xqt) 05Open>03Resolved a:03Xqt
[10:26:16] <wikibugs>	 06Release-Engineering-Team (Long-Lived-Branches), 03Scap3, 13Patch-For-Review: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2636375 (10mmodell)
[10:29:43] <hashar>	 zeljkof: ho and Mobile team got Selenium tests to trigger from Gerrit for the RelatedArticles mw extension
[10:29:49] <hashar>	 it is all green
[10:29:55] <hashar>	 and involved patches from at least 3 team members
[10:29:56] <hashar>	 \O/
[10:36:18] <wikibugs>	 10Continuous-Integration-Infrastructure, 10SpamBlacklist, 07Documentation: Figure out a system to override default settings when in test context - https://phabricator.wikimedia.org/T89096#2636408 (10hashar) So looks like this has been implemented a year ago via rCIJE377e45b17decdc78e2bfa80fd541cf05b9625779...
[10:36:44] <shinken-wm>	 PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[10:37:07] <wikibugs>	 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10SpamBlacklist, 07Documentation: Figure out a system to override default settings when in test context - https://phabricator.wikimedia.org/T89096#1027386 (10hashar)
[10:37:42] <hashar>	 zeljkof: also found https://phabricator.wikimedia.org/T89096  which was to request a tests/browser/LocalSettings.php to be taken in account by the Jenkins job
[10:38:03] <hashar>	 zeljkof: it is implemented / used.  I think it just needs to be documented somewhere on mw.org .  So I have added it to the Browser-Test-Infrastructure "Documentation" column
[11:04:45] <zeljkof>	 hashar: sorry, on the phone, back in 30 minutes or so
[11:05:55] <shinken-wm>	 PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[11:11:18] <elukey>	 moritzm: is it ok if I spin up mediawiki05 ?
[11:11:43] <shinken-wm>	 RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:15:40] <moritzm>	 I've added mira02, should be fine if there's sufficient quota left
[11:17:23] <elukey>	 ah nice there are still 4 VCPUs, I thought mira02 was going to use it
[11:18:36] <elukey>	 snap I didn't refresh the horizon page
[11:18:44] <elukey>	 only two cores left so I can't spin up the instance :D
[11:18:49] <elukey>	 will wait for the quota
[11:18:56] <jan_drewniak>	 Hey releng folks, quick gerrit question here: is it ok if I change a repo from "Fast Forward Only" to "Merge if Necessary"? this repo: https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/portals
[11:19:19] <wikibugs>	 10Beta-Cluster-Infrastructure, 07Puppet: Puppet runs fails randomly on deployment-prep / beta cluster hosts - https://phabricator.wikimedia.org/T145631#2636458 (10hashar)
[11:19:29] <hashar>	 jan_drewniak: up to you I guess :}
[11:19:42] <hashar>	 jan_drewniak: that will cause Gerrit to craft a merge commit
[11:19:48] <hashar>	 and thus the git history will no more be linear
[11:20:12] <hashar>	 (gotta do something like:  git log --oneline --first-parent )  to get a linear history
[11:20:56] <jan_drewniak>	 will it always create a merge commit? I have a few commits with no conflicts, but gerrit wants me to rebase them all anyway
[11:21:45] <hashar>	 only if needed
[11:21:55] <hashar>	 you can also force it to always do a merge commit :D
[11:22:19] <hashar>	 let me dig in the gerrit conf history
[11:23:28] <hashar>	 the repositories configuration are kept inside git
[11:23:28] <hashar>	 git fetch  origin refs/meta/config && git checkout FETCH_HEAD
[11:23:42] <hashar>	 * eeb279e - Modified project settings (Wed Aug 3 22:02:54 2016 +0000) <MarcoAurelio>| 
[11:23:47] <hashar>	 [submit]
[11:23:48] <hashar>	 | +     action = fast forward only
[11:23:56] <hashar>	 no clue why that has been done though
[11:25:16] <jan_drewniak>	 Well it's making it a pain to merge the 5 one-line patches I have :P so I'm gonna change it, but maybe "Rebase if  Necessary" is what I'm looking for
[11:25:34] <hashar>	 Marco Aurelio having full rights on wikimedia/portals
[11:26:30] <jan_drewniak>	 don't know why either...
[11:26:35] <hashar>	 jan_drewniak: the doc being at https://gerrit.wikimedia.org/r/Documentation/intro-project-owner.html#submit-type
[11:27:10] <hashar>	 for what it is worth,  CI always tests the changes merge against the tip of the branch
[11:27:20] <hashar>	 and if CR+2  several changes such A then B
[11:27:27] <hashar>	 CI tests A with  tip of branch + A
[11:27:42] <hashar>	 and test B as if A already got merged, so it test:  tip of branch + A + B
[11:28:04] <hashar>	 so merge if needed will be fine, but would have to get devs to remember that the repo might have changed in between
[11:28:26] <hashar>	 also the history is a bit cluttered with all the merges commits.  But  git log --first-parent   usually solves that
[11:29:14] <hashar>	 other submits types are on https://gerrit.wikimedia.org/r/Documentation/project-configuration.html#submit_type
[11:29:30] <moritzm>	 elukey: or you can kill deployment-imagescaler01 temporarily. it's using an old version of the jessie image and I'm not sure it's up-to-date
[11:29:42] <hashar>	 looks like "Rebase If Necessary" will keep the history sane jan_drewniak :}
[11:29:52] <moritzm>	 anyway, labs people should be up soon, moght be resolved in a few hours anyway
[11:30:16] <elukey>	 yeah I'll wait
[11:30:28] <elukey>	 plenty of analytics things to do, I will not be bored :D
[11:30:37] <jan_drewniak>	 hashar: Thanks for the doc link. I personally don't mind the merge commits so much. Also, I think it's standard practice to do a git pull origin master before starting any new work right? 
[11:30:49] <hashar>	 eyah
[11:31:08] <hashar>	 jan_drewniak: but then by the time your change is reviewed and get a CR+2  origin/master might have advanced
[11:31:16] <hashar>	 so CI always tests against tip of branch
[11:31:42] <hashar>	 if there are no tests,   there is a slight change that your change would break due to the tip of branch having advanced
[11:31:54] <hashar>	 so some enforce fast-forward only to remember folks to rebase / verify manually and then +2
[11:32:11] <hashar>	 but you should be fine with Merge as necessary  or maybe try Rebase as necessary
[11:32:51] <hashar>	 The cherry-pick submit type is a bit cumbersome.   Gerrit ALWAYS cherry pick the change + add a bunch of metadata in the commit message 
[11:33:07] <hashar>	 so on submit, each change get a new patchset / different sha1
[11:33:37] <hashar>	 and that ignore the chain of dependencies. So probably better to not use it :D
[11:36:31] <jan_drewniak>	 hashar: gotcha, thanks for all the info :) I'll change it to merge if necessary but I'm submit it for review in case anyone has any objections
[11:37:10] <godog>	 elukey: is there a task for quota increase on deployment-prep? I'd need something similar for prometheus
[11:41:31] <moritzm>	 godog: create a sub task from https://phabricator.wikimedia.org/T140904
[11:45:16] <elukey>	 godog: don't skip the queue!! :P
[11:45:53] <shinken-wm>	 RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:16:58] <wikibugs>	 10Gerrit, 06Repository-Ownership-Requests, 10grrrit-wm: Give paladox a +2 in labs/tools/grrrit - https://phabricator.wikimedia.org/T145416#2636543 (10Paladox)
[12:18:15] <wikibugs>	 10Gerrit: Update gerrit to 2.12.4 - https://phabricator.wikimedia.org/T143089#2636558 (10Paladox) @demon hi, any update on this please?
[12:20:14] <wikibugs>	 10Gerrit, 07Upstream: Unable to see which patch is the parent/dependency of specific patch - https://phabricator.wikimedia.org/T141947#2636561 (10Paladox) I believe how they did it was they did it in order with the patch in the centre and the parent ones at the top and the sub ones at the bottom.
[12:22:20] <godog>	 hahah no I won't elukey, thanks moritzm !
[12:23:31] <wikibugs>	 10Gerrit: Changing the commit description creates out-dated patch. - https://phabricator.wikimedia.org/T54292#578616 (10Paladox) Hi I carnt be 100% sure but I think this was fixed in the upgrade to gerrit 2.12. Please could you re try to see if it is fixed for you please?
[12:28:25] <wikibugs>	 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10SpamBlacklist, 07Documentation: Figure out a system to override default settings when in test context - https://phabricator.wikimedia.org/T89096#2636594 (10zeljkofilipin) In context of browser tests? Some of the pages from **Run tes...
[12:53:22] <wikibugs>	 10Gerrit: Update gerrit to 2.12.4 - https://phabricator.wikimedia.org/T143089#2636646 (10Aklapper) p:05Normal>03Low Lowering priority; not going to happen soon, I'd say.
[12:54:07] <wikibugs>	 10Gerrit, 07Documentation: Update Gerrit docs on mw.org after 2.12 upgrade - https://phabricator.wikimedia.org/T140272#2636651 (10Paladox)  >>! In T140272#2491219, @Aklapper wrote: > Heh, I cannot access https://gerrit.wikimedia.org/r/#/c/9332/ to potentially update screenshots: >> Code Review - Error >> The p...
[13:28:40] <gehel>	 !log upgrading deployment-logstash2 to elasticsearch 2.3.5 - T145404
[13:28:43] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:41:01] <wikibugs>	 10Gerrit, 06Developer-Relations: Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#2636709 (10Aklapper) Ah, great! Thanks for getting another issue (license) out of the way. :)  Does the text in T73357#2629661 look okay? Any changes? Or have we already arrived a...
[13:51:46] <shinken-wm>	 PROBLEM - Puppet run on mira02 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0]
[14:21:36] <ottomata>	 heya hashar, do I need to do anything special to run a service on a port in a jenkins job?
[14:21:54] <ottomata>	 https://integration.wikimedia.org/ci/job/phabricator-jessie-diffs/106/console
[14:22:13] <ottomata>	 https://phabricator.wikimedia.org/diffusion/WKSK/browse/master/test/utils/start_kafka.sh
[14:24:22] <hashar>	 eek
[14:24:46] <hashar>	 let me check on the slave ottomata 
[14:25:08] <hashar>	 tcp6       0      0 :::2181                 :::*                    LISTEN      16256/java      
[14:25:11] <hashar>	 something listening
[14:25:49] <hashar>	 though that process does not respond apparently
[14:26:11] <hashar>	 java    16256 jenkins-deploy   81u  IPv6           22432958      0t0      TCP *:2181 (LISTEN)
[14:26:22] <hashar>	 ottomata: some java is listening
[14:26:38] <ottomata>	 hm ok
[14:26:40] <ottomata>	 so its running
[14:26:43] <ottomata>	 just the check isn't wokring
[14:26:53] <hashar>	 but  "nc 127.0.0.1 2181"  does not yield anything
[14:27:06] <hashar>	 at the top of the console output you have the slave names
[14:27:19] <hashar>	 so you can most probably ssh to it ( integration-slave-jessie-1001.integration.eqiad.wmflabs )
[14:27:22] <hashar>	 and debug there?
[14:28:01] <ottomata>	 hmm, it tries but doesn't let me in
[14:28:08] <ottomata>	 debug1: Offering RSA public key: /Users/otto/.ssh/id_rsa-wmflabs
[14:28:08] <ottomata>	 debug1: Server accepts key: pkalg ssh-rsa blen 279
[14:28:09] <ottomata>	 Connection closed by UNKNOWN
[14:28:24] <hashar>	 who knows :D
[14:28:35] <hashar>	 I think ops have to connect through a specific bastion
[14:28:41] <hashar>	 or maybe you are not added to the project
[14:28:42] <ottomata>	 maybe i have to be in that project?
[14:28:43] <ottomata>	 yeah
[14:28:48] <hashar>	 but should be able to ssh as root via the labs restricted bastion
[14:28:51] <ottomata>	 oh ja?
[14:28:52] <ottomata>	 checking
[14:28:53] <ottomata>	 ...
[14:29:08] <ottomata>	 restricted.bastion.wmflabs.org
[14:29:09] <ottomata>	 ?
[14:29:11] <ottomata>	 that's the one i use
[14:29:23] <hashar>	 yeah
[14:29:33] <hashar>	 I guess you are not in the project :D
[14:29:36] <ottomata>	 debug1: Executing proxy command: exec ssh -A -W integration-slave-jessie-1001.integration.eqiad.wmflabs:22 restricted.bastion.wmflabs.org
[14:29:48] <ottomata>	 can you add me hashar?
[14:30:05] * hashar waits for a bribe
[14:30:13] <ottomata>	 oh i think I can add myself!
[14:30:20] <hashar>	 should be good now
[14:30:47] <hashar>	 $ ldaplist -l group project-integration|grep otto
[14:30:47] <hashar>	 	member: uid=otto,ou=people,dc=wikimedia,dc=org
[14:30:55] <ottomata>	 HMmmM
[14:31:05] <ottomata>	 same thing though
[14:31:06] <ottomata>	 debug1: Offering RSA public key: /Users/otto/.ssh/id_rsa-wmflabs
[14:31:06] <ottomata>	 debug1: Server accepts key: pkalg ssh-rsa blen 279
[14:31:06] <ottomata>	 Connection closed by UNKNOWN
[14:31:09] <hashar>	 fatal: Access denied for user otto by PAM account configuration [preauth]
[14:31:20] <ottomata>	 maybe it takes a sec?
[14:31:25] <ottomata>	 maybe puppet's gotta run?
[14:31:26] <ottomata>	 heh
[14:31:30] <hashar>	 !log Added otto to integration labs project
[14:31:34] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:31:36] <hashar>	 na should be all dynamic
[14:31:42] <hashar>	 with ssh using pam_ldap 
[14:31:44] <ottomata>	 OH
[14:31:45] <ottomata>	 in now
[14:31:45] <hashar>	 or something like that
[14:31:47] <ottomata>	 just took a sec  
[14:31:48] <hashar>	 ahh
[14:31:51] <hashar>	 ;-)
[14:31:57] <hashar>	 do you get root ?
[14:31:58] <hashar>	 sudo su -
[14:32:03] <ottomata>	 yes!
[14:32:16] <hashar>	 you are such a hacker
[14:32:17] <hashar>	 :D
[14:32:22] <ottomata>	 haha
[14:32:41] <hashar>	 so java listening on the port 2181
[14:32:47] <hashar>	 but there is nothing replying
[14:32:58] <hashar>	 maybe you need to enable some debug log on Zookeeper side
[14:33:08] <ottomata>	 ja gonna play
[14:33:16] <ottomata>	 also, now that I see how this works, i think installing kafka into ../kafka is probaby not good
[14:33:20] <ottomata>	 gonna do /tmp
[14:33:50] <hashar>	 or we install kafka on the CI slaves ? 
[14:34:03] <ottomata>	 naw i want to be able to wipe it for tests
[14:34:08] <ottomata>	 should be separate
[14:35:33] <ottomata>	 hashar:  hm, weird
[14:35:48] <ottomata>	 i wouldn't have expected my repo to be installed directly in /mnt/jenkins-workspace/workspace/phabricator-jessie-diffs
[14:36:21] <ottomata>	 how does that work if another repo makes a arc diff  to phab?
[14:46:27] <ottomata>	 hashar:  i think these builds will not finish
[14:46:30] <ottomata>	 how do I kill them?
[14:46:33] <ottomata>	 e.g. https://integration.wikimedia.org/ci/job/phabricator-jessie-diffs/106/
[14:50:02] <hashar>	 ottomata: 
[14:50:11] <hashar>	 login Jenkins using your labs account
[14:50:16] <hashar>	 then on the build page at https://integration.wikimedia.org/ci/job/phabricator-jessie-diffs/106/
[14:50:19] <ottomata>	 OYH
[14:50:20] <ottomata>	 wait
[14:50:21] <ottomata>	 i was logged in
[14:50:23] <ottomata>	 but now i'm not
[14:50:23] <hashar>	 next to the red progress bar you will get a [X]
[14:50:27] <hashar>	 magic
[14:50:32] <ottomata>	 oh yes I am
[14:50:35] <ottomata>	 AHHH
[14:50:36] <ottomata>	 the X!
[14:50:37] <hashar>	 or you hit https://integration.wikimedia.org/ci/job/phabricator-jessie-diffs/106/stop :}
[14:50:38] <ottomata>	 thanks
[14:50:50] <hashar>	 on the side bar
[14:50:54] <hashar>	 there is also a Rebuild  link
[14:51:03] <hashar>	 which sometime does the right thing
[14:51:13] <hashar>	 (though sometime some parameters of the job are not reinjected properly)
[14:51:48] <hashar>	 also
[14:51:49] <ottomata>	 yeha, i'm committing changes so those submit new jobs
[14:51:54] <ottomata>	 thanks, sorry i didn't see that 
[14:51:55] <ottomata>	 x
[14:51:59] <hashar>	 the job would archive whatever is under the log/  directory
[14:52:13] <hashar>	 so you can put whatever debug log to be written to  log/   and they will be captured
[14:52:30] <hashar>	 that is relative the the job workspace
[14:52:41] <hashar>	 $WORKSPACE/log   (with $WORKSPACE being set by Jenkins)
[14:53:11] <hashar>	 so in theory you an get zookeeper configured to send extensive log to there,  and you will get them captured and attached to the build page once the job is complete
[14:53:25] <ottomata>	 HMm ok cool
[14:53:26] <ottomata>	 good to know
[14:53:31] <ottomata>	 i think i can make that happen
[14:53:39] <ottomata>	 hashar:  is this $WORKSPACE cleaned up for every build?
[14:53:48] <ottomata>	 i don't understand how it could run multiple builds in parallel
[14:53:58] <ottomata>	 this seems to be workspace
[14:54:00] <ottomata>	  /mnt/jenkins-workspace/workspace/phabricator-jessie-commits
[14:54:30] <ottomata>	 sorry
[14:54:32] <ottomata>	 jessie-diffs
[14:54:58] <hashar>	 yeah
[14:55:14] <hashar>	 if builds are run in parallel on the same node
[14:55:23] <hashar>	 jenkins append @2  , @3 ...
[14:55:26] <ottomata>	 ohhhh
[14:55:27] <ottomata>	 weird ok
[14:55:28] <ottomata>	 got it
[14:55:36] <hashar>	 eg the second build running concurrently will have /mnt/jenkins-workspace/workspace/phabricator-jessie-commits@2
[14:56:11] <hashar>	 which is the case for this job
[14:56:17] <hashar>	 it can run concurrently on the same node
[14:57:13] <ottomata>	 ok cool
[14:57:16] <ottomata>	 makes more sense now
[15:02:03] <hashar>	 moritzm: looks like mira02 is not so happy :(
[15:02:12] <hashar>	 Service unit keyholder-proxy has an upstart script but nothing useful for systemd 
[15:04:08] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Analytics-Kanban, 10Differential, 10EventBus, 10Wikimedia-Stream: Run Kasocki tests in Jenkins via Differential commits - https://phabricator.wikimedia.org/T145140#2636952 (10Ottomata)
[15:04:24] <moritzm>	 hashar: that was fixed in https://gerrit.wikimedia.org/r/310550
[15:04:28] <moritzm>	 puppet run worked fine now
[15:04:45] <moritzm>	 still needs a reboot for the cgroup to take effect, doing that now
[15:05:00] <shinken-wm>	 PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:06:01] <ottomata>	 hashar:  is $WORKSPACE an exported env var that is availabe to my scripts?
[15:10:16] <hashar>	 ottomata: yeah
[15:10:36] <hashar>	 full list at https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables
[15:10:59] <hashar>	 and if you want to check whether your code is running in Jenkins, maybe look whether JENKINS_URL env variable is set
[15:11:07] <hashar>	 then you can use $WORKSPACE
[15:12:17] <grrrit-wm>	 (03PS1) 10Paladox: [CookieWarning] Add Jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/310561 
[15:15:10] <hashar>	 zeljkof: joakino is willing to know whether we capture the browser  javascript console
[15:15:22] <hashar>	 seems that it is a browser capability
[15:15:33] <hashar>	 not sure which of the component would handle the capture though
[15:15:52] <hashar>	 or which piece between mediawiki_selenium, selenium or watir needs to be hacked
[15:16:23] <joakino>	 hey
[15:16:41] <joakino>	 yeah it would be very interesting to log JS erros on the page in the log (if any)
[15:16:44] <zeljkof>	 joakino, hashar: uh, I remember playing with that, and it used to be hard
[15:17:07] <zeljkof>	 it is a selenium feature, or it would be, but I do not think selenium had it
[15:17:27] <zeljkof>	 it might be added since the last time I have looked
[15:18:05] <hashar>	 also I noticed page-object got some new releases but we are still at 1.0 :D
[15:18:10] <zeljkof>	 there used to be a phab ticket for it, let me try finding it
[15:18:25] <zeljkof>	 hashar: uh, really, nobody updated? :D
[15:19:03] <zeljkof>	 one of selenium contributors: http://jimevansmusic.blogspot.hr/2013/09/capturing-javascript-errors-in.html
[15:19:11] <zeljkof>	 but from 2013
[15:19:40] <zeljkof>	 stack overflow: http://stackoverflow.com/questions/4189312/capturing-javascript-error-in-selenium
[15:20:30] <paladox>	 hashar Im wondering do wikimedia host any production like website in the european union? Since im wondering doint we have to use https://www.mediawiki.org/wiki/Extension:CookieWarning on the wiki's like en.wikipedia.org?
[15:20:38] <hashar>	 zeljkof: would it be handled by watir-webdriver ?
[15:20:52] <hashar>	 cause I am still very confused by the stack of page-object / watir / selenium
[15:20:58] <zeljkof>	 hashar: no, it has to be done on selenium level
[15:21:25] <zeljkof>	 watir is just an nicer api on top of selenium, no real additional features
[15:21:42] <hashar>	 paladox: Wikimedia does not host anything in EU
[15:21:48] <zeljkof>	 I am working on the docs, will document the stack in the next few weeks
[15:21:49] <paladox>	 Ok
[15:21:59] <hashar>	 paladox: it is all in the US dc: Ashburn and Dallas
[15:22:05] <paladox>	 Oh ok
[15:22:19] <paladox>	 thanks for explaning
[15:22:22] <hashar>	 paladox: but I am not a lawyer :}
[15:22:28] <paladox>	 Ok :)
[15:23:11] <hashar>	 paladox: you can try filling a task for #WMF-Legal  https://phabricator.wikimedia.org/project/view/28/
[15:23:27] <hashar>	 should have the lawyers of the WMF 
[15:23:43] <grrrit-wm>	 (03PS1) 10Paladox: [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 
[15:23:43] <joakino>	 zeljkof: this one has info too, but with the java api http://stackoverflow.com/questions/25431380/capturing-browser-logs-with-selenium
[15:23:47] <joakino>	 2014
[15:23:47] <paladox>	 Oh
[15:23:54] <paladox>	 zeljkof ^^
[15:23:59] <paladox>	 Removed the rake test
[15:24:28] <grrrit-wm>	 (03PS2) 10Paladox: [ArticleFeedbackv5] Remove the rake test [integration/config] - 10https://gerrit.wikimedia.org/r/310567 (https://phabricator.wikimedia.org/T63588) 
[15:24:33] <zeljkof>	 joakino: yeah, the additional confusion is which features are available for which language bindings :|
[15:24:45] <zeljkof>	 paladox: sorry, I lack context
[15:24:50] <joakino>	 yeah, i imagine
[15:24:53] <hashar>	 well
[15:24:56] <paladox>	 Oh
[15:25:21] <paladox>	 zeljkof from https://gerrit.wikimedia.org/r/#/c/262358/
[15:25:34] <hashar>	 zeljkof: the stack is already documented in mediawiki_selenium README.md :} I just had to rtfm
[15:25:34] <paladox>	 you said to "
[15:25:35] <paladox>	 Agreed, if nobody volunteers to own Selenium tests for this repository, the code should be deleted. It is probably broken anyway.
[15:25:35] <paladox>	 "
[15:25:59] <zeljkof>	 paladox: I see
[15:26:20] <zeljkof>	 and I have deleted some files already? (It has been a long time since I touched that repo.)
[15:26:27] <paladox>	 Oh
[15:26:31] <paladox>	 I will delete the rest
[15:26:32] <paladox>	 now
[15:27:29] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:29:20] <godog>	 that's me ^ ditto for -upload
[15:30:45] <godog>	 what's the best way to refresh an existing cherry-picked patch in the puppetmaster, e.g. when a new PS is out?
[15:32:06] <zeljkof>	 joakino, hashar: can not find the task in phab :|
[15:32:20] <grrrit-wm>	 (03PS1) 10Florianschmidtwelzow: Add CookieWarning tests to CI [integration/config] - 10https://gerrit.wikimedia.org/r/310571 
[15:32:20] <joakino>	 :(
[15:33:21] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Add CookieWarning tests to CI [integration/config] - 10https://gerrit.wikimedia.org/r/310571 (owner: 10Florianschmidtwelzow)
[15:33:56] <grrrit-wm>	 (03CR) 10Paladox: "Already done here https://gerrit.wikimedia.org/r/#/c/310561/" [integration/config] - 10https://gerrit.wikimedia.org/r/310571 (owner: 10Florianschmidtwelzow)
[15:34:08] <grrrit-wm>	 (03CR) 10Paladox: [C: 04-1] Add CookieWarning tests to CI [integration/config] - 10https://gerrit.wikimedia.org/r/310571 (owner: 10Florianschmidtwelzow)
[15:34:26] <grrrit-wm>	 (03Abandoned) 10Florianschmidtwelzow: Add CookieWarning tests to CI [integration/config] - 10https://gerrit.wikimedia.org/r/310571 (owner: 10Florianschmidtwelzow)
[15:34:58] <hashar>	 godog: I do a  git rebase -i
[15:35:07] <hashar>	 godog: drop the old patch (remove the line in the editor)
[15:35:13] <hashar>	 then cherry pick
[15:35:24] <grrrit-wm>	 (03CR) 10Florianschmidtwelzow: [CookieWarning] Add Jenkins tests (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox)
[15:35:53] <hashar>	 godog: if not familiar with  git rebase -i ,  I dont mind giving you a crash course
[15:36:03] <godog>	 hashar: thanks, yeah git rebase -i will do it
[15:36:13] <grrrit-wm>	 (03CR) 10Paladox: [CookieWarning] Add Jenkins tests (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox)
[15:36:13] <godog>	 heh I'm sadly familiar with it -- but thanks!
[15:36:20] <hashar>	 ;-D
[15:36:27] <godog>	 ok sadly is unfair, it is actually quite nice
[15:36:35] <hoo>	 hashar: https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-php55/973/console
[15:36:45] <hoo>	 Can you help me with that segfault?
[15:36:55] <hashar>	 php is borked!!
[15:37:03] <grrrit-wm>	 (03CR) 10Paladox: "All jshint and jsonlint tests exist, we have deprecated them. But use them for users that aren't whitelisted yet. We will remove them one " [integration/config] - 10https://gerrit.wikimedia.org/r/310561 (owner: 10Paladox)
[15:37:43] <shinken-wm>	 PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:38:17] <hashar>	 hoo: I am trying a rebuild https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-php55/
[15:38:41] <hashar>	 I dont think we get much traces on segfaults though :(
[15:38:58] <paladox>	 hashar, maybe we could migrate that test to nodepool trusty?
[15:38:59] <hoo>	 sometime even phpunit's --debug is helpful
[15:39:06] <hoo>	 * times
[15:39:22] <paladox>	 Since it would be a little different to having a static instance. Such as maybe more resources for the test?
[15:39:37] <hashar>	 [Wed Sep 14 11:29:13 2016] php5[3944]: segfault at 3 ip 000000000070c450 sp 00007ffe88060438 error 6 in php5[400000+7f3000]
[15:39:38] <hashar>	 [Wed Sep 14 15:29:22 2016] php5[27542]: segfault at 3 ip 000000000070c450 sp 00007ffc0206fd08 error 6 in php5[400000+7f3000]
[15:39:50] <hashar>	 (from dmesg
[15:40:04] <hashar>	 which is not helpful
[15:40:51] <hashar>	 there are some core files caused by lua
[15:40:53] <hashar>	 but that is about it
[15:42:24] <paladox>	 Could the problem be comming back from before?
[15:44:59] <shinken-wm>	 RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:45:57] <paladox>	 legoktm looks like composer-php70 is a success, no failures :)
[15:47:28] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:47:44] <shinken-wm>	 RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:49:45] <hashar>	 hoo not much clues sorry :(
[15:49:51] <hashar>	 maybe we want to enable something to take core
[15:49:53] <hashar>	 then hook in gdb
[15:55:12] <hashar>	 oh
[15:55:16] <hashar>	 I have managed to run chrome :}
[15:56:26] <hashar>	 and I get autocompletion in vim with:  bundle exec vim foo.rb
[15:56:27] <hashar>	 bah
[15:59:16] <hoo>	 hashar: So… how should we proceed with that change? Shall I try to disable the tests added?
[16:01:48] <wmf-insecte>	 Yippee, build fixed!
[16:01:49] <wmf-insecte>	 Project selenium-CentralNotice » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #148: 09FIXED in 48 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/148/
[16:11:37] <hashar>	 hoo: if you land that change, all other changes are going to break equally
[16:12:00] <hoo>	 hashar: Yeah, I know
[16:12:02] <hashar>	 hoo: maybe you can try sending another change
[16:12:11] <hashar>	 that deletes all the tests but the one you introduced
[16:12:18] <hashar>	 see whether it is that specific one that is segfaulting
[16:12:49] <paladox>	 hashar So a US website with UK visitors ought to be asking for consent from those UK visitors according to the UK legislation. 
[16:13:00] <hashar>	 paladox: ask Legal :D
[16:13:01] <paladox>	 https://www.cookielaw.org/faq/
[16:13:04] <paladox>	 Ok
[16:13:45] <hashar>	 hoo: do fill a task anyway.  Would be interesting to figure out with ops how to get a core file 
[16:13:47] <hashar>	 and get it analyzed
[16:14:00] <hashar>	 but at this point, we need an easy way to reproduce  / narrow down the issue
[16:14:28] <hashar>	 hoo: also it passed a couple days ago
[16:22:00] <hoo>	 hashar: Yes, that's weird
[16:23:45] <hashar>	 hoo: gotta leave sorry. But get a task filled and try to narrow it down
[16:23:57] <hoo>	 hashar: With https://gerrit.wikimedia.org/r/#/c/309972/5/repo/tests/phpunit/includes/Notifications/JobQueueChangeNotificationSenderTest.php it passes
[16:24:16] <hoo>	 So it's definitely that test that makes it go boom, but no idea why
[16:24:49] <hashar>	 try to craft a change that removes all tests
[16:24:52] <hashar>	 and add only that file
[16:24:56] <hashar>	 this way it will run faster
[16:25:07] <hashar>	 and if it core dump, that is a good way to easily reproducce
[16:25:17] <hashar>	 else we will want to add bunch of --debug
[16:25:24] <hashar>	 and figure out a way to generate the core file 
[16:29:45] <hashar>	 moauahaha
[16:29:53] <hashar>	 zeljkof: I found out how to retrieve the selenium logs :}
[16:29:57] <hashar>	 and maybe even the browser console 
[16:30:08] <hashar>	 that is all in Selenium::WebDriver::Logs
[16:31:03] <zeljkof>	 :D
[16:32:01] <shinken-wm>	 PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:32:11] <hashar>	 zeljkof: that is https://phabricator.wikimedia.org/T94577 :}
[16:35:03] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-db2 is OK: OK: Less than 1.00% above the threshold [3600.0]
[16:49:13] <grrrit-wm>	 (03PS1) 10Hashar: (WIP) Demo to dump browser and selenium logs (WIP) [selenium] - 10https://gerrit.wikimedia.org/r/310583 (https://phabricator.wikimedia.org/T94577) 
[16:51:11] <grrrit-wm>	 (03CR) 10Hashar: "On test failure it would be quite great to grab the browser console log and archive them in the job / dump them to stdout. That would sur" [selenium] - 10https://gerrit.wikimedia.org/r/310583 (https://phabricator.wikimedia.org/T94577) (owner: 10Hashar)
[16:51:50] <hashar>	 marxarelli: ^^^^   seems we can retrieve the Selenium and Browser logs with web driver \O/
[16:51:55] <hashar>	 I am off, but that looks interesting
[16:51:57] <hashar>	 ;_D
[16:52:31] <hashar>	 will come back for the train
[16:53:09] <wikibugs>	 10Browser-Tests-Infrastructure, 13Patch-For-Review: mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser. - https://phabricator.wikimedia.org/T94577#1167350 (10Jhernandez) @zeljkofilipin Could we use this task too for tracking capturing the page errors / js logs too? Or shou...
[16:53:22] <shinken-wm>	 RECOVERY - Puppet staleness on deployment-db1 is OK: OK: Less than 1.00% above the threshold [3600.0]
[16:53:57] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] (WIP) Demo to dump browser and selenium logs (WIP) [selenium] - 10https://gerrit.wikimedia.org/r/310583 (https://phabricator.wikimedia.org/T94577) (owner: 10Hashar)
[16:55:17] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team: Beta puppetmaster cherry-pick process - https://phabricator.wikimedia.org/T135427#2637478 (10fgiunchedi) >>! In T135427#2628704, @thcipriani wrote: > As I've been working through this and refining I've realized that this proposal would undoubtedly lead...
[17:00:27] <ottomata>	 hey hashar, can I use this embeddable build status for my repo?
[17:00:33] <ottomata>	 https://integration.wikimedia.org/ci/job/phabricator-jessie-diffs/119/badge/
[17:00:42] <ottomata>	 all the links look relative to phabricator-jessie-diffs
[17:00:47] <ottomata>	 instead of my project
[17:00:57] <ottomata>	 and also to the specific build number
[17:01:56] <ottomata>	 oh hash ar is off
[17:02:14] <ottomata>	 hmm, twentyafterfour do you know?
[17:02:58] <ottomata>	 ok, here they are not relative to build number
[17:02:59] <ottomata>	 https://integration.wikimedia.org/ci/job/phabricator-jessie-diffs/badge/
[17:03:11] <ottomata>	 but, still, is phabricator-jessie-diffs
[17:03:29] <ottomata>	 which seems like it would be the build icon for any differential patch
[17:03:38] <ottomata>	 in any reo
[17:03:39] <ottomata>	 repo
[17:11:59] <shinken-wm>	 RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0]
[17:29:42] <twentyafterfour>	 ottomata: right it's gonna be the same job for any build, shared amongs all differential patches
[17:31:12] <ottomata>	 twentyafterfour: that's a shame, so i can't use that build icon, eh?
[17:31:29] <ottomata>	 a secondary q: i'm trying to run coveralls as part of the build
[17:31:30] <ottomata>	 i can do it
[17:31:41] <ottomata>	 but the job needs a private coveralls repo token
[17:31:45] <ottomata>	 that i shouldn't commit to the repo
[17:31:53] <ottomata>	 https://coveralls.zendesk.com/hc/en-us/articles/201347419-Coveralls-currently-supports
[17:32:26] <ottomata>	 is there a way to configure the job in phab or jenkins to provide that to the job, but also keep it secret?
[17:34:09] <wikibugs>	 10Beta-Cluster-Infrastructure, 07Puppet: Puppet runs fails randomly on deployment-prep / beta cluster hosts - https://phabricator.wikimedia.org/T145631#2636458 (10AlexMonk-WMF) I'm guessing most of these would be T131946 (my 9th April 18:49 UTC comment through to my 12th September 05:48 UTC comment, see also P...
[17:36:58] <wikibugs>	 10Continuous-Integration-Infrastructure, 10MediaWiki-Unit-tests, 13Patch-For-Review, 07Regression: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2637675 (10hoo) I also faced segfaults today in https://gerrit.wikimedia.org/r/#/c/30997...
[17:38:58] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Labs: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2637678 (10AlexMonk-WMF)
[17:39:06] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Labs: Request increased quota for deployment-prep labs project - https://phabricator.wikimedia.org/T145636#2637679 (10AlexMonk-WMF)
[17:40:08] <twentyafterfour>	 ottomata: good question. There isn't any _good_ way to do that but it is possible to configure secrets in jenkins and have them provided in an environment variable
[17:40:23] <wikibugs>	 10Beta-Cluster-Infrastructure, 06Labs: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2635940 (10AlexMonk-WMF) From a deployment-prep admin PoV, I'd prefer the quota bump include the full VCPU count and RAM of the instances you'd like to create, rather than leaving us with 0...
[17:40:26] <twentyafterfour>	 the thing is, that can wind up exposed in the build log if you aren't careful
[17:40:43] <ottomata>	 twentyafterfour:  hm, that would work, coveralls takes an env i think
[17:41:07] <twentyafterfour>	 that would be best since the procedure of passing it as a cli arg is how it gets exposed to the log
[17:41:09] <ottomata>	 but, twentyafterfour woudln't that be set for all phab jessie builds then?
[17:41:13] <ottomata>	 since all repos share the same job?
[17:41:29] <twentyafterfour>	 ottomata: yes, though I think we could make a new separate job without too much work
[17:41:38] <twentyafterfour>	 which would get you the badge, also
[17:41:41] <ottomata>	 hm
[17:41:43] <ottomata>	 i guess that's fine
[17:41:51] <ottomata>	 woudl be better if that was how it was set up for all repos though, no?
[17:42:18] <bd808>	 not really. we would have a billion jobs
[17:42:54] <ottomata>	 hm, yeah, but, you want to special case everybody who wants to customize their job?
[17:43:00] <bd808>	 jenkins isn't the greatest thing for scaling. It gets really really slow to startup when you have too many jobs
[17:43:21] <bd808>	 we are already at the "really slow" stage :)
[17:43:35] <paladox>	 twentyafterfour we could do something https://gerrit.wikimedia.org/r/#/c/295396/
[17:43:41] <paladox>	 bd808 that's because of nodepool
[17:43:42] <shinken-wm>	 PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[17:44:07] <paladox>	 but yeh we really need to create a gearmon type plugin for differential so it will equal out the tests
[17:44:11] <twentyafterfour>	 bd808: good point. That's why I tried to make it share a single job definition
[17:44:12] <paladox>	 and not put it in a que
[17:44:22] <bd808>	 paladox: no, it's not. Its because of the way that jenkins stores and manages job state
[17:44:29] <paladox>	 Oh
[17:44:39] <paladox>	 bd808 maybe jenkins 2.x
[17:44:41] <paladox>	 ?
[17:44:59] <twentyafterfour>	 paladox: nodepool is part of the issue but bd808 is talking about startup time not job queue time
[17:45:11] <twentyafterfour>	 part of _a different_ issue
[17:45:13] <twentyafterfour>	 sorry
[17:45:14] <paladox>	 also twentyafterfour if we merge https://gerrit.wikimedia.org/r/#/c/295396/ we could extend it for ottomata to be able to do the custom things.
[17:45:16] <paladox>	 and yep
[17:45:40] <bd808>	 ottomata: if you can live with post-merge coveralls updates then you can jsut run them via travis from the github mirror of the repo
[17:45:44] <twentyafterfour>	 I'm not sure how we can have something custom like credentials / secrets, without defining separate jobs though
[17:46:41] <twentyafterfour>	 ottomata: there is also https://github.com/uber/uberalls
[17:46:42] <bd808>	 or you can rig up the thing that ocg does and push to testing branches on github pre-merge
[17:46:49] <twentyafterfour>	 that's something we should probably look into
[17:47:08] <paladox>	 There's a plugin for gerrit that can bring in pull commits
[17:47:24] <twentyafterfour>	 uberalls = "Track code coverage metrics with Jenkins and Phabricator"
[17:47:40] <twentyafterfour>	 "Code coverage metric storage service. Provide coverage metrics on differentials with Phabricator and Jenkins, just like Coveralls does for GitHub and TravisCI."
[17:47:48] <paladox>	 twentyafterfour we can use pipeline's it seems in jenkins 2.x https://jenkins.io/2.0/
[17:47:59] <twentyafterfour>	 we already have the jenkins part of coveralls installed
[17:48:00] <paladox>	 not sure how to use it though, but i have a test jenkins 2.x up and running
[17:48:03] <ottomata>	 oh looking at uberalls
[17:48:14] <twentyafterfour>	 I mean the jenkins uberalls
[17:49:01] <ottomata>	 oh ja?
[17:49:03] <ottomata>	 so hm
[17:49:05] <ottomata>	 hm
[17:50:01] <ottomata>	 twentyafterfour:  so uberalls service is running?
[17:50:08] <ottomata>	 how do I set access/ set it up for my repo?
[17:50:42] <ottomata>	 does it just work with Cobertura?
[17:51:44] <paladox>	 twentyafterfour why doint we do something similar to travis by parsing a .wikimedia-jenkins.yaml file
[17:52:01] <paladox>	 which can define for example weather to run npm, unit tests for extensions
[17:52:09] <paladox>	 for each repo
[18:00:27] <wikibugs>	 10Beta-Cluster-Infrastructure, 07Puppet: Puppet runs fails randomly on deployment-prep / beta cluster hosts - https://phabricator.wikimedia.org/T145631#2637713 (10Krinkle)
[18:08:34] <paladox>	 twentyafterfour There's a new interface for jenkins https://jenkins.io/blog/2016/05/26/introducing-blue-ocean/
[18:08:56] <paladox>	 More modern
[18:11:45] <shinken-wm>	 RECOVERY - Host deployment-parsoid05 is UP: PING OK - Packet loss = 0%, RTA = 0.84 ms
[18:14:01] <paladox>	 hashar these https://jenkins-contrib-themes.github.io/jenkins-material-theme/ and https://jenkins.io/blog/2016/05/26/introducing-blue-ocean/ look nice for jenkins
[18:14:05] <paladox>	 nice and modern themes
[18:14:07] <paladox>	 :)
[18:17:27] <shinken-wm>	 PROBLEM - Host deployment-parsoid05 is DOWN: CRITICAL - Host Unreachable (10.68.16.120)
[18:23:42] <shinken-wm>	 RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[18:27:19] <wikibugs>	 10Gerrit, 13Patch-For-Review: Gerrit bug search not (naively) working with Phabricator tasks - https://phabricator.wikimedia.org/T85002#2637801 (10Paladox) Compare https://gerrit.wikimedia.org/r/#/q/bug:T85002 to  http://gerrit-test.wmflabs.org/gerrit/#/q/bug:T85002
[18:36:15] <twentyafterfour>	 ottomata: we don't have a service running, just the jenkins plugin installed
[18:36:43] <twentyafterfour>	 paladox: it would be cool to do something similar to travis, with a test config file, indeed.
[18:37:06] <twentyafterfour>	 ottomata: I'm not sure about cobertura
[18:37:59] <twentyafterfour>	 ottomata: https://github.com/uber/phabricator-jenkins-plugin/blob/master/changelog.md  looks like it works with or without cobertura
[18:39:33] <twentyafterfour>	 from https://github.com/uber/phabricator-jenkins-plugin   "If you have Uberalls enabled, enter a path to scan for Cobertura reports."  it seems there may not be too much involved in setting it up but it's not well documented anywhere
[18:43:04] <ottomata>	 hm ok cool
[18:43:29] <twentyafterfour>	 ottomata: so yeah, it seems I need to install the go dependencies on the jenkins machine, presumably the jenkins master but I am not sure... That might not be too easy to get going since I don't have access to the ci infra.
[18:43:51] <twentyafterfour>	 it would need to be packaged and puppetized, and properly monitored and such
[18:44:16] <ottomata>	 aye
[18:44:33] <ottomata>	 twentyafterfour:  it sounds like this all should be a part of a larger project
[18:44:43] <ottomata>	 to make builds customizable and support things like auto coverage generation
[18:44:49] <twentyafterfour>	 it's possible to send coverage to phabricator from cobertura without using uberalls, I think...but then you don't see the _change_ in coverage %
[18:45:24] <hashar>	 paladox: yeah Jenkins 2 has a new ui which is easier to customize. Good to see efforts going on that front
[18:45:43] <twentyafterfour>	 ottomata: indeed, it's not currently in scope for out current work but it is desirable and something I would advocate for
[18:45:53] <twentyafterfour>	 our current work
[18:48:19] <twentyafterfour>	 ottomata or paladox, would one of you like to make a task to suggest creating a wmf-ci.yaml that does something similar to .travis.yaml?  It seems like a good idea to me to move to having a single jenkins job that is very customizable based on a config file in the repo
[18:50:21] <twentyafterfour>	 we should probably look into supporting uberalls as well
[18:52:07] <paladox>	 hashar yeh, but im not sure if we could use one of those on our jenkins? Since it links to google, but i guess we can strip that part out since it is just for fonts
[18:52:12] <paladox>	 But overall nice theme
[18:52:18] <paladox>	 twentyafterfour yeh i can do that
[18:59:21] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2637873 (10Paladox)
[18:59:27] <paladox>	 twentyafterfour hashar https://phabricator.wikimedia.org/T145669 :)
[19:05:17] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2637926 (10Paladox)
[19:14:48] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2637933 (10Paladox)
[19:14:50] <hashar>	 thcipriani: twentyafterfour:  moritz got a new mira on beta cluster
[19:14:58] <hashar>	 using  a Jessie instance named mira02
[19:15:10] <hashar>	 the keyholder is harnessed with systemd :D  
[19:15:14] <thcipriani>	 oh, and keyholder is work...cool
[19:15:18] <hashar>	 bad news: there is now trebuchet package on Jessie
[19:15:24] <hashar>	 s/now/no/
[19:15:25] <thcipriani>	 :((
[19:15:35] <hashar>	 which leads me to
[19:15:42] <hashar>	 wth is still using Trebuchet?
[19:16:38] <thcipriani>	 metrics collectors, restbase (nominally, but I don't think they use it), a bunch of others.../me digs for task
[19:17:16] <thcipriani>	 https://phabricator.wikimedia.org/T129290
[19:20:54] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2637947 (10hashar) [19:14:44] <logmsgbot> !log hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.19  Easy. Nothing is h...
[19:21:36] <paladox>	 hashar twentyafterfour https://github.com/travis-ci/travis-build
[19:21:55] <wikibugs>	 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2637949 (10Paladox) https://github.com/travis-ci/travis-build
[19:29:11] <wikibugs>	 03Scap3, 06Services, 15User-mobrovac: Allow per-environment scap.cfg overrides - https://phabricator.wikimedia.org/T134156#2637978 (10thcipriani)
[19:32:52] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2637997 (10Matanya)
[19:33:59] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2637999 (10hashar) Reverted due to T145673
[19:40:05] <wikibugs>	 06Release-Engineering-Team (Deployment-Blockers), 13Patch-For-Review, 05Release: MW-1.28.0-wmf.19 deployment blockers - https://phabricator.wikimedia.org/T143328#2638069 (10Legoktm)
[19:47:08] <hashar>	 and roll backed
[19:47:16] <hashar>	 due to RTL wikis being screwed  up  https://phabricator.wikimedia.org/T145673  :(
[19:47:37] <greg-g>	 blugh, and that rename issue :/
[19:48:45] <hashar>	 rename issue ?
[19:49:49] <greg-g>	 the other blocker
[19:49:55] <greg-g>	 https://phabricator.wikimedia.org/T145596
[19:50:01] <greg-g>	 Renames getting stuck on mediawiki.org (Sept 13, 2016)
[19:50:50] <hashar>	 how did I miss that ? :((
[21:13:13] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 07Regression: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2638470 (10hashar) p:05High>03Unbreak! So that happens on a bunch of other p...
[21:13:55] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 3 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2638474 (10hashar) + wikibase / wikidata...
[21:26:54] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 3 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2638519 (10hashar) T64623 was the previou...
[21:36:04] <wikibugs>	 10Continuous-Integration-Infrastructure: Disable core dumps generation on CI labs slaves - https://phabricator.wikimedia.org/T96025#2638559 (10hashar) 05Resolved>03stalled I have put it back for T142158
[21:36:26] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 3 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2525383 (10hashar)
[21:36:28] <wikibugs>	 10Continuous-Integration-Infrastructure: Disable core dumps generation on CI labs slaves - https://phabricator.wikimedia.org/T96025#2638563 (10hashar)
[21:36:46] <grrrit-wm>	 (03PS1) 10Hashar: Reapply "Dump corefiles on segfaults" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/310673 (https://phabricator.wikimedia.org/T142158) 
[21:37:27] <hashar>	 !log integration: setting "ulimit -c 2097152" on all slaves due to Zend PHP segfaulting  T142158 
[21:37:31] <qa-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:37:43] <hashar>	 so I get traces overnight
[21:37:50] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] Reapply "Dump corefiles on segfaults" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/310673 (https://phabricator.wikimedia.org/T142158) (owner: 10Hashar)
[21:38:19] <grrrit-wm>	 (03Merged) 10jenkins-bot: Reapply "Dump corefiles on segfaults" [integration/jenkins] - 10https://gerrit.wikimedia.org/r/310673 (https://phabricator.wikimedia.org/T142158) (owner: 10Hashar)
[21:38:57] <hashar>	 salt -v '*' cmd.run 'cd /srv/deployment/integraton/slave-scripts/ && git pull'
[21:38:58] <hashar>	 magic
[21:39:29] <hashar>	 and the segfault debugging guide is in https://phabricator.wikimedia.org/T64623 : D
[21:45:23] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2638603 (10hashar) 2GBytes core dump file...
[21:45:33] <hashar>	 bed crash
[21:45:40] <hashar>	 hope we get some core file tomorrow
[21:46:56] <twentyafterfour>	 paladox: travis-build looks interesting
[21:47:05] <paladox>	 Yep
[21:47:41] <paladox>	 twentyafterfour but we will need to fork it and update it to do mutiple jobs
[21:47:45] <paladox>	 instead of one
[21:47:50] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2638621 (10hashar) Triggered a few builds...
[21:49:04] <hashar>	 twentyafterfour: paladox: I am totally for us to reuse travis-build :]
[21:49:10] <hashar>	 if that is reusable :(
[21:49:10] <paladox>	 Yep :)
[21:49:27] <paladox>	 hashar it's open source just need's changing to do things that we want for wikimedia
[21:49:50] <hashar>	 it might be opensource but not really reusable for us
[21:49:54] <paladox>	 Oh
[21:49:59] <paladox>	 It is done in ruby
[21:50:02] <paladox>	 .rb
[21:50:20] <paladox>	 i have no idea how or where it parses the .travis-ci file in there though
[21:50:51] <paladox>	 twentyafterfour i partially fixed https://phabricator.wikimedia.org/T137354 in gerrit
[21:50:52] <paladox>	 :)
[21:51:19] <paladox>	 I did a workaround, just go to the ,access page click diffusion link, it takes you to the history page and then click browse
[21:51:21] <paladox>	 :)
[21:51:52] <hashar>	  /var/tmp/core/core.integration-slave-trusty-1011.php5.21391.1473889766
[21:51:56] <hashar>	 that was fast :]
[21:51:59] <paladox>	 LOL
[21:52:25] <paladox>	 hashar it is probaly a bug in php5.5 since we use 5.5.9 and could possibly fixed in a newer 5.5 release
[21:54:39] <paladox>	 i will be updating my pc in a min
[21:56:53] * paladox upgrade pc to windows 10 build 14926 https://blogs.windows.com/windowsexperience/2016/09/14/announcing-windows-10-insider-preview-build-14926-for-pc-and-mobile/#UDpmvr5wHF9F06RT.97 https://msdn.microsoft.com/en-gb/commandline/wsl/release_notes
[22:00:52] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2638672 (10hashar) Quickly looked at one...
[22:03:16] <MaxSem>	 aaaaaaa
[22:03:43] <MaxSem>	 I added 2 phones to 2FA, then deleted the wrong one :P
[22:04:30] <wikibugs>	 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 10MediaWiki-Unit-tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2638684 (10hashar) Might be worth trying...
[22:04:44] <hashar>	 MaxSem: how that is easy 
[22:04:59] <hashar>	 MaxSem: just change your identity to meSxaM
[22:05:00] <hashar>	 solved!
[22:05:06] <hashar>	 gotta rebuild all your karma though :(
[22:05:29] <MaxSem>	 hashar, ...and get blocked as an impostor? :P
[22:05:33] <hashar>	 I hope you will get all your accounts recovered :(
[22:05:51] <MaxSem>	 I still have the old phone, just not on me
[22:07:21] <hashar>	 nyway gotta sleep
[22:07:31] <hashar>	 and will try to not dream of gdb and zend stacktraces :D
[23:23:09] <paladox>	 Im back
[23:23:11] <paladox>	 :)
[23:28:54] <wmf-insecte>	 Project performance-webpagetest-wmf build #2572: 04FAILURE in 1 hr 45 min: https://integration.wikimedia.org/ci/job/performance-webpagetest-wmf/2572/
[23:45:58] <grrrit-wm>	 (03PS1) 10Krinkle: mediawiki: Merge parsertests job back into main phpunit job [integration/config] - 10https://gerrit.wikimedia.org/r/310701 
[23:46:35] <grrrit-wm>	 (03PS2) 10Krinkle: mediawiki: Merge parsertests job back into main phpunit job [integration/config] - 10https://gerrit.wikimedia.org/r/310701