[00:37:20] Yippee, build fixed! [00:37:20] Project selenium-Flow » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #6: 09FIXED in 21 min: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/6/ [01:51:59] 06Release-Engineering-Team, 06Labs, 06Operations, 10wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T133968#2250550 (10Dzahn) [03:04:20] PROBLEM - Puppet run on integration-slave-trusty-1024 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [03:05:26] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [03:05:38] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [03:08:13] PROBLEM - Puppet run on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [03:12:24] Yippee, build fixed! 
[03:12:25] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #1060: 09FIXED in 30 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/1060/ [03:43:08] RECOVERY - Puppet run on integration-slave-trusty-1018 is OK: OK: Less than 1.00% above the threshold [0.0] [03:44:26] RECOVERY - Puppet run on integration-slave-trusty-1024 is OK: OK: Less than 1.00% above the threshold [0.0] [03:45:32] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [06:12:47] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [08:45:12] 03Scap3, 10scap, 13Patch-For-Review: scap::target shouldn't allow users to redefine the user's key - https://phabricator.wikimedia.org/T132747#2250766 (10mobrovac) [08:45:14] 10Beta-Cluster-Infrastructure, 03Scap3, 10EventBus, 06Services, and 2 others: Set up change-propagation in BetaCluster - https://phabricator.wikimedia.org/T133908#2250765 (10mobrovac) [08:45:37] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [08:46:29] 10Beta-Cluster-Infrastructure, 03Scap3, 10EventBus, 06Services, and 2 others: Set up change-propagation in BetaCluster - https://phabricator.wikimedia.org/T133908#2248554 (10mobrovac) Indeed, now all's good! Thank you @thcipriani and @mmodell. [09:15:11] hashar: ! :) [09:15:24] addshore: hello [09:15:40] could you look at https://gerrit.wikimedia.org/r/#/c/285287/ and https://gerrit.wikimedia.org/r/#/c/285300/ please? :) [09:15:48] been puzzling with https://wikimedia.github.io/ :D [09:15:56] oooooohhh [09:16:27] been suggested on wikitech-l , so I have just forked the one from Twitter did a mass replaces s/twitter/wikimedia/ pushed, done. 
[09:16:41] going to grab a coffee and I will look at the Gerrit changes above [09:16:45] aweosme [09:17:15] addshore: for https://gerrit.wikimedia.org/r/#/c/285287/2/zuul/layout.yaml,cm [09:17:20] addshore: jshint should really be added to the npm test entry point [09:17:29] (03PS3) 10Hashar: Enable JSHint check job on RevisionSlide [integration/config] - 10https://gerrit.wikimedia.org/r/285287 (https://phabricator.wikimedia.org/T133282) (owner: 10Addshore) [09:17:36] yep, that was already a comment, but then it ont run on check right? [09:17:55] (03CR) 10Hashar: [C: 032] "Will run some outdated JSHint version solely for non whitelisted people." [integration/config] - 10https://gerrit.wikimedia.org/r/285287 (https://phabricator.wikimedia.org/T133282) (owner: 10Addshore) [09:18:03] yeah [09:18:03] =] [09:18:18] for qunit [09:18:25] that is straight forward :-} [09:18:34] hopefully it will pass just fine [09:18:57] (03Merged) 10jenkins-bot: Enable JSHint check job on RevisionSlide [integration/config] - 10https://gerrit.wikimedia.org/r/285287 (https://phabricator.wikimedia.org/T133282) (owner: 10Addshore) [09:19:20] (03PS2) 10Hashar: Enable qunit test template on RevisionSlider [integration/config] - 10https://gerrit.wikimedia.org/r/285300 (owner: 10Addshore) [09:19:28] (03CR) 10Hashar: [C: 032] Enable qunit test template on RevisionSlider [integration/config] - 10https://gerrit.wikimedia.org/r/285300 (owner: 10Addshore) [09:20:24] (03Merged) 10jenkins-bot: Enable qunit test template on RevisionSlider [integration/config] - 10https://gerrit.wikimedia.org/r/285300 (owner: 10Addshore) [09:21:15] addshore: done and deployed [09:21:21] addshore: I am out for a coffee, be back in ~ 10 minutes or so [09:28:55] back [09:32:32] :D [09:32:46] hashar: the big question of the day is, how many lines of codes for test are too many lines of code? :d [09:33:23] addshore: and should we have tests covering our test suite? 
[09:33:29] xD [09:33:47] so, the WatchedItem & WatchedItemStore stuff we have been working on, we have been writing unit and integration tests for [09:33:58] essentially 100% coverage [09:34:29] https://integration.wikimedia.org/cover/mediawiki-core/master/php/includes/WatchedItem.php.html [09:34:32] https://integration.wikimedia.org/cover/mediawiki-core/master/php/includes/WatchedItemStore.php.html [09:34:49] holy shit [09:34:53] that is ... green ! [09:35:00] ;) [09:35:18] it would be very nice if you could write a summary of how you went to achieve 100% coverage [09:35:36] that might trick people in doing the same for other classes [09:35:42] hashar: its already been done. I think it will be published somewhere next week [09:35:49] 205+134 lines of tests for WatchedItem [09:35:53] neaaaaat [09:35:58] with unit and integration tests split into seperat files [09:36:03] same for WatchedItemStore [09:36:19] a month or so ago we have been talking here about splitting the tests [09:36:24] 2282+195 for those tests [09:36:30] one of the issue we have right now is the huge mediawiki-extensions-* tests [09:36:33] that ends up running everything [09:36:37] which is probably a bit lame / slow [09:36:42] hashar: yeh, so I just went ahead and started the split ;) [09:37:12] the unit tests all only extends PHPUnit_Framework_TestCase (not the mediawiki one) [09:37:15] and for mediawiki I had some patch that extracted true unit tests so one can run them by just running phpunit from the root of the repo [09:37:19] with out even having an installed mw [09:37:20] and in theroy could even be run without setting up mw etc [09:37:25] yup [09:37:25] ;) [09:37:46] so yeh, I went down the filename suffix route so *UnitTest and *IntegrationTest [09:37:59] addshore: lame proof https://gerrit.wikimedia.org/r/#/c/178696/ [09:38:19] for PHPUnit an issue we have is that all of them are loaded via a single hook UnitTestsList [09:38:36] *looks* [09:38:45] but then I guess we can post process the 
lists based on file name (the convention you are using) [09:39:21] so one can phpunit --filter UnitTest (would be used for jobs involving a single repos) or phpunit --filter IntegrationTest (for jobs having several extensions cloned) [09:39:50] :D [09:42:11] Yippee, build fixed! [09:42:11] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #795: 09FIXED in 21 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/795/ [09:47:45] yumm [09:48:15] addshore: cloning and checking out MediaWiki core in less than 10 seconds !!! https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-trusty/1/consoleFull [09:48:18] (using a local cache) [09:48:20] WOO [09:48:25] also https://usercontent.irccloud-cdn.com/file/CIzyagWp/ [09:48:38] ohhh [09:48:47] that should be included in whatever mail announce you are crafting :-} [09:50:59] haha :p [09:53:38] and phpcs is way too long :/ [10:01:58] 12 minutes grmblbl [10:03:26] 10Deployment-Systems, 10Monitoring, 06Operations: [ops] Monitor that LVS config and mw_install are in sync - https://phabricator.wikimedia.org/T25662#2250906 (10fgiunchedi) 05Open>03Invalid we have icinga checks for hosts missing in dsh groups for scap, plus pybal talking to etcd, conftool, and all the rest [10:10:33] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 03releng-201516-q4, and 2 others: [keyresult] Migrate php (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2250921 (10hashar) [10:13:24] 05Continuous-Integration-Scaling, 07WorkType-NewFunctionality: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976#2250926 (10hashar) [10:15:32] (03PS1) 10Hashar: [mediawiki] experiment phpcs on Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/286137 (https://phabricator.wikimedia.org/T133976) [10:16:19] 05Continuous-Integration-Scaling, 
13Patch-For-Review, 07WorkType-NewFunctionality: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976#2250949 (10hashar) I gave it a try on https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-trusty/ but it is twice slower for so... [10:19:47] (03PS2) 10Hashar: [mediawiki] experiment phpcs on Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/286137 (https://phabricator.wikimedia.org/T133976) [10:20:02] found i [10:20:02] hashar: yeh :/ [10:20:15] I may look at trying to optimize some of the custom sniffs perhaps [10:20:15] Zend 5.5 is twice slower than HHVM -} [10:21:15] addshore: years ago I did run code sniffer with profiling and did notice some oddities that were affecting it [10:21:20] might be worth a try [10:21:32] but the low hanging fruit is to have it run solely on files changed in HEAD [10:21:46] so instead of reprocessing everything, you just lint the few files that are altered [10:22:46] 05Continuous-Integration-Scaling, 13Patch-For-Review, 07WorkType-NewFunctionality: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976#2250974 (10hashar) `mediawiki-core-phpcs` is being run under HHVM via a Zuul parameter. [10:23:14] (03CR) 10Hashar: [C: 032] "Adjusted to have Zuul inject PHP_BIN=hhvm which should cut the runtime in half." 
[integration/config] - 10https://gerrit.wikimedia.org/r/286137 (https://phabricator.wikimedia.org/T133976) (owner: 10Hashar) [10:24:04] 05Continuous-Integration-Scaling, 10OOjs-UI, 13Patch-For-Review, 07WorkType-NewFunctionality: Migrate OOjs UI npm, npm-run-doc and npm-run-demos CI jobs to Nodepool - https://phabricator.wikimedia.org/T128091#2250975 (10hashar) [10:24:13] (03Merged) 10jenkins-bot: [mediawiki] experiment phpcs on Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/286137 (https://phabricator.wikimedia.org/T133976) (owner: 10Hashar) [10:24:56] 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2250978 (10hashar) [10:24:58] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 03releng-201516-q4, and 2 others: [keyresult] Migrate php (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2250977 (10hashar) [10:39:01] 05Continuous-Integration-Scaling, 13Patch-For-Review, 07WorkType-NewFunctionality: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976#2251014 (10hashar) Gave it a try on https://gerrit.wikimedia.org/r/#/c/285389/ . HHVM did the trick and build time went from 13 minutes t... [10:39:09] 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2248624 (10hashar) [10:58:44] 10Beta-Cluster-Infrastructure, 03Scap3, 10EventBus, 06Services, and 2 others: Set up change-propagation in BetaCluster - https://phabricator.wikimedia.org/T133908#2251069 (10mobrovac) [11:53:15] hello [11:53:34] just wanted to confirm that phab doesn't really have much in the way of localization, right? I see a few languages but not that many. 
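A rough sketch of the 10:21 suggestion above (run phpcs only on the files changed in HEAD rather than on the whole tree). This is not how the mediawiki-core-phpcs job is actually implemented; the function names are hypothetical, and it assumes `git` and a `phpcs` binary are on the PATH and that HEAD has a parent commit.

```python
import subprocess
from typing import List


def changed_in_head(repo_dir: str = ".") -> List[str]:
    """Paths added, copied, modified or renamed by HEAD relative to its first parent."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", "HEAD^", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def php_files(paths: List[str]) -> List[str]:
    """Keep only the paths that look like PHP sources (assumption: .php suffix)."""
    return [p for p in paths if p.endswith(".php")]


def phpcs_command(files: List[str], phpcs_bin: str = "phpcs") -> List[str]:
    """Build the phpcs argument list; an empty list means there is nothing to lint."""
    return [phpcs_bin, *files] if files else []


# Possible use inside a job (not executed here):
#   cmd = phpcs_command(php_files(changed_in_head()))
#   if cmd:
#       subprocess.run(cmd, check=True)
```

Deleted files are filtered out by `--diff-filter=ACMR`, since there is nothing left to lint for them. Returning an empty command when no PHP file changed lets the job exit early instead of invoking phpcs with no arguments.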
[11:53:52] am helping run a tech workshop along with Asaf in Tamil Nadu tomorrow, and we're doing a session on Phab, so just checking :) [12:15:43] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:35:50] 10scap, 06Operations: Decide on /var/lib vs /home as locations of homedir for mwdeploy - https://phabricator.wikimedia.org/T86971#2251172 (10fgiunchedi) looping in #scap since it also belongs there [12:50:50] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [12:54:00] Yippee, build fixed! [12:54:01] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #1034: 09FIXED in 21 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/1034/ [13:14:43] (03PS43) 10Zfilipin: WIP Migration of browsertests* Jenkins jobs to selenium* jobs [integration/config] - 10https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190) [13:14:53] (03PS44) 10Zfilipin: WIP Migration of browsertests* Jenkins jobs to selenium* jobs [integration/config] - 10https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190) [13:26:01] (03PS45) 10Zfilipin: Migration of browsertests* Jenkins jobs to selenium* jobs [integration/config] - 10https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190) [13:52:35] (03PS46) 10Zfilipin: Migration of browsertests* Jenkins jobs to selenium* jobs [integration/config] - 10https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190) [13:59:07] 10Browser-Tests-Infrastructure, 05MW-1.27-release-notes, 13Patch-For-Review, 15User-zeljkofilipin, and 2 others: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2251323 (10zeljkofilipin) [14:01:08] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 
33.33% of data above the critical threshold [0.0] [14:06:09] (03CR) 10Hashar: [C: 031] "Been following the progress all time long. This is good to go, lets make them official!" [integration/config] - 10https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190) (owner: 10Zfilipin) [14:06:54] (03CR) 10Zfilipin: [C: 032] Migration of browsertests* Jenkins jobs to selenium* jobs [integration/config] - 10https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190) (owner: 10Zfilipin) [14:07:43] PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [14:07:57] PROBLEM - Puppet run on deployment-elastic05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:08:14] (03Merged) 10jenkins-bot: Migration of browsertests* Jenkins jobs to selenium* jobs [integration/config] - 10https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190) (owner: 10Zfilipin) [14:10:08] 10Browser-Tests-Infrastructure, 05MW-1.27-release-notes, 13Patch-For-Review, 15User-zeljkofilipin, and 2 others: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2251332 (10zeljkofilipin) [14:11:07] 10Browser-Tests-Infrastructure, 05MW-1.27-release-notes, 13Patch-For-Review, 15User-zeljkofilipin, and 2 others: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2150917 (10zeljkofilipin) [14:13:51] PROBLEM - Puppet run on deployment-mediawiki01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [14:23:25] 10Browser-Tests-Infrastructure, 05MW-1.27-release-notes, 13Patch-For-Review, 15User-zeljkofilipin, and 2 others: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2251353 (10zeljkofilipin) [14:24:48] 10Browser-Tests-Infrastructure, 05MW-1.27-release-notes, 13Patch-For-Review, 
15User-zeljkofilipin, and 2 others: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2157364 (10zeljkofilipin) [14:36:10] 10Beta-Cluster-Infrastructure, 10scap, 10Analytics, 06Services, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (10elukey) Quick note: we are getting the following failure while trying to deploy: ``` elukey@deployment-tin:/srv/deployment/analytics/aqs/deploy$ deploy 14... [14:36:34] 10Browser-Tests-Infrastructure, 05MW-1.27-release-notes, 13Patch-For-Review, 15User-zeljkofilipin, and 2 others: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2251377 (10zeljkofilipin) [14:42:40] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [14:43:04] RECOVERY - Puppet run on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [14:53:51] RECOVERY - Puppet run on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:59:19] 10Browser-Tests-Infrastructure, 05MW-1.27-release-notes, 13Patch-For-Review, 15User-zeljkofilipin, and 2 others: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2251430 (10zeljkofilipin) [14:59:56] 10Browser-Tests-Infrastructure, 15User-zeljkofilipin: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2251431 (10hashar) [15:00:03] zeljkof: I have dropped a bunch of tag from that task [15:00:08] most auto added by bots [15:00:14] hashar: thanks [15:00:28] still gathering some data I need for the e-mail :| [15:00:40] will probably finish it today, but send on Monday morning our time [15:01:25] well given you deleted the old jobs [15:01:25] you want to send it now ;-} [15:01:27] so folks know [15:01:36] lets polish it up now? 
[15:01:41] does not need to be long [15:03:32] ok, will send it today [15:03:48] I need 10-20 more minutes to finish some data gathering [15:14:23] !log integration: created 'cache-rsync' and 'integration-trusty-1026' , attempting to have Shinken to deprovision them [15:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [15:17:04] thcipriani: hashar would you guys mind putting up a task for appropriate contint roots? we had some CI issues (yesterday?) late hours and realizing hashar is special cased as the only releng person who can poke was eye opening [15:17:12] maybe tyler and mukunda make sense to add? [15:17:13] 10Beta-Cluster-Infrastructure, 10scap, 10Analytics, 06Services, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2251441 (10thcipriani) Blerg. We really need to automate `known_hosts` for scap targets in beta. When connecting to a server for the first time, a fingerprint of the... [15:17:40] chasemp: all of releng is supposed to be in contint-admins though [15:17:44] oh root [15:17:45] ... [15:17:53] let me look again then maybe I'm confused :) [15:18:20] 10Browser-Tests-Infrastructure, 15User-zeljkofilipin: Migration of browsertests* Jenkins jobs to selenium* jobs - https://phabricator.wikimedia.org/T128190#2251446 (10zeljkofilipin) [15:18:25] hmm. In the particular incident 2 days ago, I had the permissions I needed to restart nodepool; however, the problem was nova-compute, which we couldn't do anything about. [15:18:52] evidently the rabbitmq service needed to be restarted for OpenStack. 
[15:18:58] agreed it wasn't the fix but it still seems like hashar as the only member of contint-roots makes no sense [15:19:22] we should be able to handle everything with the service users [15:19:31] but yeah root is a nice convenience to have on gallium [15:19:40] for strace / tcpdump that sort of things [15:19:48] i was thinking thcipriani is leaning into the CI stuff more and at least him [15:19:52] thta may be wrong [15:19:52] that's true, the one thing I couldn't run was a strace on the process. [15:19:55] but I wanted to mention it [15:20:26] so yeah +1 on adding roar roots :-} [15:21:02] nodepool is a big of a first generation of ops/other responsibility demarcation so it's all novel still [15:21:02] my assumption from the debug output was that it wasn't getting a correct response from nova, but it was making a tcp connection correctly, but I couldn't see any further into the process. [15:21:23] well, I guess think on it and if you put someting up I would +1 :) [15:21:24] yeah we lack access to the openstack infra [15:21:40] maybe we could be granted some lurking access to some of the labs servers [15:21:49] 10Beta-Cluster-Infrastructure, 10scap, 10Analytics, 06Services, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (10mmodell) {T72792} [15:21:50] being able to tail some log would ease diagnostic [15:22:05] ^ [15:22:28] thcipriani: would you mind filling tasks ? :-D [15:22:42] the root access would probably need discussion in ops meeting [15:22:55] since there is some kind of Kabal approval needed [15:23:14] such privileges escalations are usually handled by ops on their monday meeting [15:23:22] hashar: can do. I'll file one for root on gallium for releng, one for some access to labs servers for tailing logs. 
[15:23:30] yeah [15:23:35] kk [15:23:38] sounds good [15:23:47] the puppet bit for contint-root is super trivial [15:23:55] the second part I'm less sure on, not opposed just want to see what your thinking [15:24:08] but thcipriani to contint-roots I feel like is a no brainer [15:24:22] having some lurk access on the labs hosts is a different can of worms anyway [15:24:38] ideally we would send all the openstack logs to some central place like a logstash ;-) [15:24:42] yes that would be new territory [15:24:44] also true [15:25:05] The last Puppet run was at Fri Apr 29 13:16:52 UTC 2016 (127 minutes ago). Puppet is disabled. reason not specified [15:25:09] on deployment-changeprop [15:25:24] but I dont think logstash has anything to finely tune who has access to which logs [15:25:42] Krenair: being setup by mobrovac wip [15:25:48] Krenair: he created it yesterday [15:26:11] yeah that's me [15:26:15] will add a note there [15:26:18] you need to set a reason [15:27:09] the access control bits that work with newer versions of kabana are commercial only. :/ elastic's offerings are open core and access control is a pay to play bit [15:27:17] fyi if you disable w/ no reason and then just disable w/ a reason it will won't take [15:27:24] you have to re-enable puppet and then re-disable [15:27:29] i find it annoying [15:27:59] bd808: eekkkk [15:28:11] If you disable without a reason you may find it gets reenabled for no reason :) [15:32:30] hashar: i would like to deploy https://gerrit.wikimedia.org/r/#/c/286109/ today but it's failing on unrelated test failure in echo [15:33:00] think we added echo as a dependency for wikibase + wikidata, then removed it but it still seems to be added (even on master Wikidata) [15:34:18] aude: that is odd [15:34:33] 00:00:00.601 INFO:zuul.CloneMapper: mediawiki/extensions/Echo -> src/extensions/Echo [15:34:36] still injected [15:35:25] yeah [15:35:29] * 0993349 - Remove generic dependency for Wikibase on Echo. 
(Wed Apr 27 15:02:55 2016 +0200) [15:35:29] * 81a1f1a - Remove Echo from Wikidata dependencies (Tue Apr 26 22:13:56 2016 +0200) [15:35:30] * eb480d8 - Add dependency for Wikibase on Echo. (Tue Apr 26 21:16:38 2016 +0200) [15:35:48] i would rather not overrule jenkins here [15:36:21] e.g. https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/10722/consoleFull [15:36:39] aude: so Wikidata depends on ContentTranslation which depends on Echo ... [15:36:45] oh [15:36:55] that's evil :P [15:36:57] yeah [15:37:00] content translation depends on wikidata [15:37:05] that is becoming unmanageable [15:37:07] and wikidata depends on it now [15:37:08] 10Beta-Cluster-Infrastructure, 07Puppet: deployment-puppetmaster puppet fails due to "Could not render to Puppet::Network::Format[msgpack]: undefined method `to_msgpack' for #" - https://phabricator.wikimedia.org/T133989#2251508 (10Krenair) [15:37:17] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:37:35] of course, we could try to fix the test but i can't reproduce it locally [15:37:45] of course :-} [15:37:48] that would be too easy [15:38:18] we need to rethink all that crap [15:38:43] this means content translation is blocked also [15:38:50] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Add RelEng to contint-roots - https://phabricator.wikimedia.org/T133990#2251528 (10thcipriani) [15:38:55] at least mediawiki-extensions-hhvm pass [15:38:55] it has all repos [15:39:03] ah [15:39:07] so maybe we can just drop mwext-testextension-hhvm [15:39:20] thcipriani: Hi! Do you have a minute for T116206? [15:39:20] T116206: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206 [15:39:22] which is Wikidata + a subset of extensions injected by Zuul [15:39:34] while mediawiki-extensions* have a larger set of extensions [15:39:45] elukey: sure, what's happening? 
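A minimal sketch of why Echo still gets cloned after the direct Wikidata dependency was removed: dependencies are followed transitively, and Wikidata and ContentTranslation point at each other. The `DEPENDENCIES` map and the `resolve` function are hypothetical and contain only the relations stated in the conversation above; the real Zuul configuration may be structured differently.

```python
from typing import Dict, List, Set

# Hypothetical map, restricted to the relations mentioned in the log.
DEPENDENCIES: Dict[str, List[str]] = {
    "Wikidata": ["ContentTranslation"],
    "ContentTranslation": ["Echo", "Wikidata"],
}


def resolve(ext: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every extension reachable from ext, tolerating dependency cycles."""
    seen: Set[str] = set()
    stack = [ext]
    while stack:
        current = stack.pop()
        for dep in deps.get(current, []):
            # Skip the starting extension and anything already visited,
            # otherwise the Wikidata <-> ContentTranslation cycle never ends.
            if dep != ext and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Echo is pulled in for Wikidata through ContentTranslation.
wikidata_deps = resolve("Wikidata", DEPENDENCIES)
```

With this map, `wikidata_deps` contains both `ContentTranslation` and `Echo`, which matches the `zuul.CloneMapper` line showing Echo still being injected.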
[15:39:47] so maybe the echo tests fails because of a missing extension [15:39:55] also curiosu it's only on hhvm that it fails [15:40:16] and it might be a flappy test :( [15:40:24] yeah :( [15:40:54] aude: there is also https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/10706/artifact/log/mw-ratelimit.log/*view*/ [15:40:55] some rate limiting [15:41:14] oh i see [15:41:26] i was just running the echo tests alone [15:41:29] 10Beta-Cluster-Infrastructure, 07Puppet: Setup puppet exported resources to collect ssh host keys for beta - https://phabricator.wikimedia.org/T72792#2251546 (10mmodell) Generating the keys on the puppetmaster and distributing them as regular files would be easy enough. No need for exported resources. E.g. th... [15:41:41] if i ran Wikibase + Echo, maybe i would get that [15:42:06] + the few other extensions https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/10706/consoleFull [15:45:01] !log integration: deleting integration-trusty-1026 and cache-rsync . Maybe that will clear them up from Shinken [15:45:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [15:46:12] i am trying the tests [15:50:43] i think the rate limit thing is not a problem here. it's for a test that tests the rate limit works correctly in wikibase [15:52:20] 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2251564 (10hashar) We have 14 Trusty instances on the historical CI https://integration.wikimedia.org/ci/label/UbuntuTrusty... 
[15:53:38] aude: gotta escape myself sorry :( if that is an urgent patch you can just force merge [15:53:48] aude: then send another test one to figure out the issue [15:53:57] but maybe it is a legit failure [15:54:25] ok [15:54:39] aude: the recheck failed on the Zend 5 version this time https://integration.wikimedia.org/ci/job/mwext-testextension-php55/9132/console [15:54:42] definitely a flappy test [15:54:46] :( [15:55:04] i can spend some time to try to fix [15:55:10] both varient shows the ratelimiting [15:55:19] i think that comes from wikibase [15:55:24] maybe you can try comparing the debug log between both build [15:55:28] https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/10745/ [15:55:30] ok [15:55:33] https://integration.wikimedia.org/ci/job/mwext-testextension-php55/9132/ [15:55:33] good idea [15:55:39] sometime that has the key [15:55:42] (sometime) [15:56:06] rate limit is in both [15:56:09] and maybe EchoUserLocatorTest::testLocateArticleCreator.testLocateArticleCreator is just flappy / wrong / whatever [15:56:15] probably [15:57:03] though it has been around for a while [16:00:37] aude: if all fail poke Erik B. ;) [16:00:46] I am off ! 
gotta rush back home [16:01:45] ok [16:07:11] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [16:19:42] (03PS1) 10JanZerebecki: Revert "add Wikidata dependency on ContentTranslation" [integration/config] - 10https://gerrit.wikimedia.org/r/286174 [16:20:00] (03CR) 10jenkins-bot: [V: 04-1] Revert "add Wikidata dependency on ContentTranslation" [integration/config] - 10https://gerrit.wikimedia.org/r/286174 (owner: 10JanZerebecki) [16:23:18] (03PS2) 10JanZerebecki: Revert "add Wikidata dependency on ContentTranslation" [integration/config] - 10https://gerrit.wikimedia.org/r/286174 (https://phabricator.wikimedia.org/T133774) [16:23:39] (03CR) 10jenkins-bot: [V: 04-1] Revert "add Wikidata dependency on ContentTranslation" [integration/config] - 10https://gerrit.wikimedia.org/r/286174 (https://phabricator.wikimedia.org/T133774) (owner: 10JanZerebecki) [16:28:38] (03PS3) 10JanZerebecki: Revert "add Wikidata dependency on ContentTranslation" [integration/config] - 10https://gerrit.wikimedia.org/r/286174 (https://phabricator.wikimedia.org/T133774) [16:32:28] (03CR) 10JanZerebecki: [C: 032] Revert "add Wikidata dependency on ContentTranslation" [integration/config] - 10https://gerrit.wikimedia.org/r/286174 (https://phabricator.wikimedia.org/T133774) (owner: 10JanZerebecki) [16:33:08] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [16:33:09] (03Merged) 10jenkins-bot: Revert "add Wikidata dependency on ContentTranslation" [integration/config] - 10https://gerrit.wikimedia.org/r/286174 (https://phabricator.wikimedia.org/T133774) (owner: 10JanZerebecki) [16:36:04] 10Beta-Cluster-Infrastructure, 10Flow, 03Collab-Team-2016-Q4: Set up second External Store cluster on Beta - https://phabricator.wikimedia.org/T128417#2251625 (10jmatazzoni) [16:36:07] 10Beta-Cluster-Infrastructure, 10Flow, 03Collab-Team-2016-Q4: Run Flow External Store migration in dry-run 
mode on Beta - https://phabricator.wikimedia.org/T119567#2251626 (10jmatazzoni) [16:36:10] 10Beta-Cluster-Infrastructure, 10Staging, 10DBA, 03Collab-Archive-2015-2016, and 2 others: Use External Store on Beta Cluster - https://phabricator.wikimedia.org/T95871#2251624 (10jmatazzoni) 05Open>03Resolved [16:37:46] !log restarting zuul for 4e9d180..ebb191f [16:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:38:09] s/restarting/reloading/ [16:51:51] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Allow RelEng nova log access - https://phabricator.wikimedia.org/T133992#2251644 (10thcipriani) [16:53:10] 10Beta-Cluster-Infrastructure, 10scap, 10Analytics, 06Services, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2251658 (10elukey) Adding some info after the chat with @thcipriani - to avoid "Agent admitted failure to sign key" we had to add myself and @joal to the deploy-servi... [17:03:09] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [17:07:32] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Add RelEng to contint-roots - https://phabricator.wikimedia.org/T133990#2251528 (10JanZerebecki) The service has no restart defined that is better than start and stop, but it should be added to the sud... 
[17:08:13] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: add nodepool restart to contint-admins - https://phabricator.wikimedia.org/T133990#2251707 (10JanZerebecki) [18:01:27] (03PS1) 10JanZerebecki: Update Wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/286190 [18:01:40] (03PS2) 10JanZerebecki: Update Wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/286190 [18:01:51] (03CR) 10JanZerebecki: [C: 032] Update Wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/286190 (owner: 10JanZerebecki) [18:05:20] (03Merged) 10jenkins-bot: Update Wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/286190 (owner: 10JanZerebecki) [18:22:51] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: add nodepool restart to contint-admins - https://phabricator.wikimedia.org/T133990#2251871 (10hashar) Nodepool has the ability to dump a stack dump. One sends `SIGUSR2` and that spurts trace to `/var/l... [18:25:10] thcipriani: kudos on the scap glob thing :-} [18:25:55] I was really wondering while glob.glob() would not work thus looked at its implementation and it turned out to be very similar to what you wrote ;} [18:45:53] ostriches: Would it be OK for me to deploy https://gerrit.wikimedia.org/r/#/c/286126/ today (on a Friday)? [18:48:29] RoanKattouw: Sure have fun [18:49:40] AndyRussG: ---^^ I'll deploy in a few minutes once Jenkins merges the cherry-pick [19:03:56] PROBLEM - Puppet run on deployment-salt is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [19:16:41] matt_flaschen RoanKattouw Hi could you review https://gerrit.wikimedia.org/r/#/c/279957/ please. [19:16:49] Its todo with visualeditor and flow. [19:17:21] RoanKattouw: hi! oops had to be afk for a bit! [19:17:25] I'm here now tho [19:17:40] And oops on my end too, I forgot to follow through [19:17:40] thx!!!! 
[19:18:07] Going to deploy now
[19:18:20] RoanKattouw: did you merge to wmf_deploy?
[19:18:25] Or cherry-pick rather?
[19:18:37] Yes
[19:18:44] Ah K yeah see it now :)
[19:18:48] But while waiting for Jenkins to merge it I got distracted
[19:19:07] Ah yeah that delay is a feature, not a bug :)
[19:19:08] Hmm, and I guess wmf_deploy merges aren't automatically brought into wmf22?
[19:19:14] Nope
[19:19:19] It seems to work in some branches but not others
[19:19:35] Like, some previous wmf.Ns did have that happen automatically
[19:19:35] I think there have been varying configs at different times in the recent past, for CN
[19:19:41] OK, submodule update time then
[19:19:49] :) thx!!
[19:19:54] paladox, yes, I reviewed it about a week ago. I'll re-review when I can.
[19:20:18] like in the old days....
[19:20:54] matt_flaschen: Thanks, im not sure if i should have moved somes of the resources but i added an if and else in an array
[19:21:04] Im also not sure about the js
[19:25:08] paladox, make sure you fix your indentation. Otherwise, it's confusing what's inside the if statement. What do you mean "not sure about the js"?
[19:25:33] The indentation is wrong in Resources.php.
[19:25:37] Oh
[19:26:09] paladox, sorry, I mean Hooks.php
[19:26:17] Ok
[19:38:57] RECOVERY - Puppet run on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[19:42:46] matt_flaschen ive fixed the indentation in Hooks.php.
[19:43:09] (PS1) Hashar: (WIP) Zuul deployment with scap? (WIP) [integration/config] - https://gerrit.wikimedia.org/r/286207 (https://phabricator.wikimedia.org/T129357)
[19:43:28] But with js im not sure if i followed your comments properly.
[19:44:07] Im also not sure if you can do ( \ExtensionRegistry::getInstance()->isLoaded( 'VisualEditor' ) ? 'ext.flow.ui.visualeditor' ),
[19:44:13] without the :
[19:44:57] (CR) Hashar: "First try at building a scap config file for Zuul configuration.
I thought about deploying the Zuul source code but that should really be" [integration/config] - https://gerrit.wikimedia.org/r/286207 (https://phabricator.wikimedia.org/T129357) (owner: Hashar)
[20:00:42] (PS2) Hashar: (WIP) Zuul deployment with scap? (WIP) [integration/config] - https://gerrit.wikimedia.org/r/286207 (https://phabricator.wikimedia.org/T129357)
[20:03:15] thcipriani: been looking at scap doc finally to deploy zuul config file :}
[20:03:39] thcipriani: I will need scap to learn about reloading in that specific case (deploying solely config change, not the actual code) eek ;}
[20:05:26] scap, RESTBase-Cassandra, cassandra, Scap3 (Scap3-Adoption-Phase1): Deploy Cassandra with scap3 - https://phabricator.wikimedia.org/T116340#2252051 (Eevans)
[20:06:37] Scap3, scap: scap to reload a service instead of restart - https://phabricator.wikimedia.org/T134001#2252054 (hashar)
[20:09:25] oh boy.
[20:10:37] that should be possible without too much difficulty (if it isn't already)
[20:10:40] the pieces are there.
[20:20:16] thcipriani: I can imagine yeah
[20:20:25] not much of a priority anyway
[20:20:50] I am just sinking time on a friday evening :D
[20:21:12] hashar: :D you could maybe cook something up with: https://doc.wikimedia.org/mw-tools-scap/scap3/quickstart/setup.html#command-checks
[20:21:47] yeah hmm
[20:21:55] I would rather not abuse the checks command
[20:22:09] another thing that struck me is scap.cfg being configParser when it could really be yaml
[20:22:36] :) That is on the roadmap
[20:22:52] https://phabricator.wikimedia.org/T120410
[20:23:07] and for zuul code, the same code is used by two different services zuul-merger and zuul-scheduler which are not on the same devices
[20:23:13] but I guess I come up with different scap.cfg files
[20:23:33] that is a very common case, actually
[20:23:59] the zuul setup has zuul-server + zuul-merger on gallium
[20:24:05] but solely zuul-merger on scandiumm
[20:24:07] oh my
[20:24:10] the latest thinking is expanding the checks.yaml/making that more robust so that it doesn't feel like abusing it for this.
[20:24:36] twentyafterfour is working on some debt clean-up that will make this easier.
[20:24:41] as usual
[20:25:18] and I am wondering if we could add the stage layer in check
[20:25:25] let me write an example
[20:28:34] hmm no
[20:28:36] too lazy
[20:28:40] but something like:
[20:28:43] checks:
[20:29:00] stage: promote
[20:29:02] commands:
[20:29:08] - nrpe: check_service
[20:29:17] - shell: curl http://foo
[20:29:25] stage: fetch
[20:29:26] commands:
[20:29:27] ...
[20:29:43] but that is really just cosmetic
[20:30:02] the online doc is quite nice ;-}
[20:30:08] that seems like a reasonable extension of what we have in place.
[20:30:27] I'm glad it's useful!
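Editor's sketch of the "stage layer" idea floated at 20:25-20:29 above. This is not scap code and not the real checks.yaml schema: the list-of-groups layout, the `CHECKS` name and the `commands_for_stage` helper are all assumptions made for illustration. The only parts taken from the chat are the `stage` and `commands` keys and the two example commands; the `fetch` commands were elided ("...") in the chat and are left empty here.

```python
# Hypothetical data shape: the chat only sketches "stage" + "commands";
# grouping them as a list of dicts is an assumption, not scap's format.
CHECKS = [
    {
        "stage": "promote",
        "commands": [
            {"nrpe": "check_service"},
            {"shell": "curl http://foo"},
        ],
    },
    # The chat elides the fetch commands ("..."), so none are listed.
    {"stage": "fetch", "commands": []},
]


def commands_for_stage(checks, stage):
    """Return (kind, argument) pairs for every group tagged with `stage`."""
    selected = []
    for group in checks:
        if group["stage"] != stage:
            continue
        for command in group["commands"]:
            # Each command is a one-entry mapping, e.g. {"shell": "curl http://foo"}.
            for kind, argument in command.items():
                selected.append((kind, argument))
    return selected


if __name__ == "__main__":
    for kind, argument in commands_for_stage(CHECKS, "promote"):
        print(kind, argument)
```

Grouping commands under a stage, as proposed in the chat, would let a deploy tool run only the checks relevant to the stage it has just finished; as hashar notes, the change is mostly cosmetic relative to tagging each check individually.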
[20:37:40] and since I like over engineering I thought about versioning the scap.cfg schema :d
[20:37:47] but really I am overthinking
[20:55:12] movie time *wave* and have a good week-end
[20:57:29] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2252240 (hashar)
[20:57:32] Beta-Cluster-Infrastructure, Puppet: deployment-puppetmaster puppet fails due to "Could not render to Puppet::Network::Format[msgpack]: undefined method `to_msgpack' for #" - https://phabricator.wikimedia.org/T133989#2252236 (hashar) Open>Resolved a:hasha...
[21:50:51] (CR) Jforrester: "Title really fills me with confidence. ;-)" [integration/config] - https://gerrit.wikimedia.org/r/286207 (https://phabricator.wikimedia.org/T129357) (owner: Hashar)
[22:11:19] Beta-Cluster-Infrastructure: deployment-tmh01.deployment-prep.eqiad.wmflabs refuses mwdeploy ssh connection - https://phabricator.wikimedia.org/T133769#2252544 (mmodell) Open>Resolved a:mmodell
[22:57:35] Beta-Cluster-Infrastructure, Puppet: deployment-puppetmaster puppet fails due to "Could not render to Puppet::Network::Format[msgpack]: undefined method `to_msgpack' for #" - https://phabricator.wikimedia.org/T133989#2252795 (Krenair) I had to uninstall the ruby-msg...
[22:59:58] PROBLEM - Puppet run on deployment-ms-be01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:03:27] Beta-Cluster-Infrastructure, Scap3, EventBus, Services, User-mobrovac: Set up change-propagation in BetaCluster - https://phabricator.wikimedia.org/T133908#2252840 (mobrovac) Open>Resolved After cherry-picking [Gerrit 286153](https://gerrit.wikimedia.org/r/#/c/286153/) on beta and man...
[23:33:17] Deployment-Systems, Scap3, WorkType-NewFunctionality: Grab git-rev from config - https://phabricator.wikimedia.org/T133572#2252887 (thcipriani)
[23:39:59] RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:45:36] Beta-Cluster-Infrastructure, Patch-For-Review: deployment-puppetmaster cherrypicked commits not reporting to graphite regularly enough? - https://phabricator.wikimedia.org/T132997#2252891 (Krenair) a:Krenair
[23:49:33] Beta-Cluster-Infrastructure, Patch-For-Review: deployment-puppetmaster cherrypicked commits not reporting to graphite regularly enough? - https://phabricator.wikimedia.org/T132997#2252892 (Krenair) Patch is cherry-picked on deployment-puppetmaster: https://graphite.wmflabs.org/render/?width=586&height=30...