[00:20:06] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [00:22:55] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [01:07:55] 10Deployment-Systems, 10Scap (Scap3-MediaWiki-MVP), 10scap2, 10MediaWiki-API, and 4 others: Create a script to run test requests for the MediaWiki service - https://phabricator.wikimedia.org/T136839#3196342 (10thcipriani) It seems like all that's missing to incorporate this into deployments and monitoring... [01:31:09] PROBLEM - Puppet run on deployment-ircd is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [04:07:29] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,BrowserTests build #367: 04FAILURE in 11 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=BrowserTests/367/ [04:17:35] Yippee, build fixed! [04:17:36] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #367: 09FIXED in 21 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/367/ [06:30:37] Yippee, build fixed! [06:30:37] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #336: 09FIXED in 1 hr 50 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/336/ [08:20:19] 10Gerrit, 06Developer-Relations, 07Documentation: [[mw:Gerrit/Tutorial]] is way too much information for new contributors - https://phabricator.wikimedia.org/T161901#3196711 (10Qgil) [09:42:07] hashar: can you give me some advice for https://gerrit.wikimedia.org/r/#/q/topic:librarize-testing-access-wrapper ? 
[09:42:51] I'm trying to load a class via composer but it seems the extension tests completely ignore the extension's composer.json [09:55:23] hashar: can you have two patches each Depends-On each other? [09:56:29] it seems like that would solve it, extension tests need the core composer.json change for some reason, and the core patch needs the extension patches because of textextensions [10:05:34] tgr: that would cause a dependency cycle [10:05:44] and that is silently ignored by Zuul [10:07:22] tgr: I guess you need a change that add the library to mediawiki/vendor ? [10:08:52] tgr: and I dont think you have to add the library to the extensions require-dev [10:09:07] if it is in mediawiki/core require-dev that will be available to extensions as well [10:14:36] tgr: i commented on the change [11:05:46] hashar: the instructions for mediawiki/vendor say it should be used with --no-dev though [11:06:59] https://github.com/wikimedia/mediawiki-vendor/commit/4b25c9c61c89b7b5b98bf52e856b03d53483c0ac [11:11:11] 10Browser-Tests-Infrastructure, 07Documentation, 07Easy: mediawiki_selenium should document SauceLabs usage - https://phabricator.wikimedia.org/T98331#3197207 (10Rammanojpotla) I have submitted a patch set at https://gerrit.wikimedia.org/r/#/c/348935/ can anyone review it please? [11:26:43] 10Browser-Tests-Infrastructure, 07Documentation, 07Easy, 07Software-Licensing, 15User-zeljkofilipin: Ruby gem documentation should state license - https://phabricator.wikimedia.org/T94001#3197244 (10Rammanojpotla) I have pushed a patch set to https://gerrit.wikimedia.org/r/#/c/348222/7 can anyone review... [11:31:57] tgr: my bad sorry :( [11:33:07] hashar: so i should be OK if I just make all extension patches depend on the core patch, and force the core patch through after evereything else has +2, right? 
[11:36:25] tgr: na that will break for sure [11:36:40] seems the lib is not added to the autoloader : https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm-jessie/11297/artifact/log/composer.autoload_files.php.txt/*view*/ [11:36:48] or maybe I am mistaken [11:37:47] I am all confused sorry :( [11:38:10] that autoloader seems to be generated by black magic from something that's not the extension's composer.json [11:39:18] black magic being https://github.com/wikimedia/integration-jenkins/blob/master/bin/mw-fetch-composer-dev.sh [11:40:10] in any case, when I make an extension patch depend on the core patch, it seems to pass [11:40:40] except I get the "This change depends on a change that failed to merge." message now [11:41:20] I suppose I need to hand-remove the -2 from the core patch first [11:42:19] I also have no clue what to do about https://gerrit.wikimedia.org/r/#/c/349092/ which does not fail for some reason, but it's tested with 1.27 core so the core patch does not help [11:42:39] I guess I could just backport it to 1.27 [11:43:27] that is because DonationInterface master branch is really targeting 1.27 [11:44:25] tgr: could the MediaWiki core change still define \TestingAccessWrapper [11:44:42] that would end up being a backward-compatibility class loading \Wikimedia\TestingAccessWrapper ? 
[11:45:01] so this way you dont have to immediately fix up all the extensions [11:45:31] I could add a class alias, no idea how well that works with tooling though [11:48:14] the Phab patch notifier bot is still dead) [11:53:43] I guess I can just split adding \Wikimedia\TAW and removing \TAW into separate patches [11:58:26] RECOVERY - Puppet staleness on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [3600.0] [12:06:30] tgr: yeah back compatibility is probably the easiest path [12:06:42] you will get less headaches and most probably end up merging the patches faster overall [12:22:51] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #372: 04FAILURE in 49 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/372/ [12:36:27] 10Browser-Tests-Infrastructure, 07Documentation, 07Easy: audit/update headers in files - https://phabricator.wikimedia.org/T69141#3197384 (10Rammanojpotla) @zeljkofilipin as mentioned above in mediawiki-PdfHandler have a license in COPYING file(i.e GNU GPL license) so I have to modify it with the license in... [12:52:27] hashar: o/ - do we have graphite for deployment-prep hosts ? [12:52:37] elukey: yes ! [12:52:57] because I discovered that we might have TCP metrics to double check for the RST change :) [12:53:08] in diamond? [12:53:13] yep [12:53:26] so they are sent to http://graphite.wmflabs.org hopefull [12:53:34] assuming the statsd host is properly set in puppet [12:53:47] for mediawiki metrics are logged under BetaMediaWiki.* [12:53:55] and diamond metrics should be under deployment-prep.* [12:55:10] elukey: and potentially the graphite labs store is reachable from https://grafana-labs.wikimedia.org/ [12:55:17] which is a grafana specially for labs metrics [12:56:29] elukey: ae you referring to servers.XXX.network.connections.{TIME_WAIT,ESTABLISHED} etc? 
[12:56:38] I am not sure they are enabled on prod [12:57:22] deployment-prep.deployment-jobrunner02.tcp.* [12:57:25] there you go :) [12:57:29] https://graphite-labs.wikimedia.org/render/?width=800&height=600&target=deployment-prep.deployment-jobrunner02.network.connections.ESTABLISHED&target=deployment-prep.deployment-jobrunner02.network.connections.TIME_WAIT [12:57:44] yours is better :) [12:57:48] ahh TCP [12:58:17] I am pretty sure I looked for those network.connections.TIME_WAIT metrics on some prod host [12:58:19] and the metric ended up being empty [12:58:36] yeah the diamond collector is not everywhere :( [12:59:01] guess because that generates too many stats [12:59:24] ah snap but I need to check the metric on the rdb hosts -.- [13:00:51] nope estabreset doesn't show anything [13:01:29] I am wondering though why so many TIME_WAITs are listed for deployment-jobrunner02 [13:02:12] I mean, I expected fewer [13:04:36] elukey: deployment-jobrunner02:~$ ss -n --all|grep -c TIME-WAIT [13:04:36] 170 [13:05:21] breakdown by destination port is roughly: [13:05:28] 49 to port 3306 (mysql) [13:05:37] 27 for redis 6379 [13:05:51] 27 for each of 9000 and 9005 which seem to be redis related [13:05:56] yep yep I thought there were fewer... 
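The per-port breakdown quoted above can be reproduced from `ss -n --all` output with a few lines of Python. This is only a sketch: it assumes the peer address is the last column (true when `ss` is run without `-p`), and the addresses in the sample are invented documentation addresses, not taken from deployment-jobrunner02.

```python
from collections import Counter


def time_wait_by_peer_port(ss_output):
    """Count TIME-WAIT sockets per destination port in `ss -n` output."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if "TIME-WAIT" not in fields:
            continue  # skips the header and sockets in other states
        # Assumption: without -p, the peer "address:port" is the last column.
        port = fields[-1].rpartition(":")[2]
        if port.isdigit():
            counts[int(port)] += 1
    return counts


# Invented sample in the shape printed by `ss -n --all`.
SAMPLE = """\
Netid State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
tcp   TIME-WAIT  0      0      192.0.2.10:47512     192.0.2.20:3306
tcp   TIME-WAIT  0      0      192.0.2.10:47514     192.0.2.20:3306
tcp   ESTAB      0      0      192.0.2.10:47600     192.0.2.30:6379
tcp   TIME-WAIT  0      0      192.0.2.10:47602     192.0.2.30:6379
"""

if __name__ == "__main__":
    for port, count in time_wait_by_peer_port(SAMPLE).most_common():
        print(port, count)
```

On a real host the input could come from `subprocess.check_output(["ss", "-n", "--all"])`, decoded to text, instead of the sample string.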
[13:06:11] and some 26 more for 11212 bah [13:06:19] this might be REALLY good to test in deployment-prep persistent connections for deployment-jobrunner02 [13:06:38] the thing is persistent connection got disabled [13:06:50] but we have no idea why / how or what it fixed [13:07:05] me too [13:07:18] I have lost track of all the mess [13:07:29] but given the server reuses time waiting connections [13:07:34] I dont think it is much an issue [13:07:41] nono the client do it [13:08:11] (ah port 11212 is nutcracker) [13:08:59] and google pointed me at TCP Fast Open ( https://wikitech.wikimedia.org/wiki/TCP_Fast_Open ) what ever it can be [13:09:36] I see two issues 1) Redis needs to handle a lot of sockets, one for each command 2) Redis' tcp listen has a maximum backlog of 511, so open a socket for each command means that we can risk to overflow it [13:10:14] 3) even if we reuse timewaits we are at risk of exhausting (potentially) tcp local ports used on the clients (like jobrunners) [13:12:31] tell me more about "Redis' tcp listen backlog of 511" :-} [13:13:11] brb in ~1 hour that I have to upgrade Piwik :) [13:16:27] elukey: and https://tweaked.io/guide/kernel/ has mooooaar tunables :D [13:23:49] (03PS1) 10Volans: [operations/software/etcd-mirror] add tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/349215 [13:33:30] (03CR) 10Hashar: [C: 032] [operations/software/etcd-mirror] add tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/349215 (owner: 10Volans) [13:35:26] (03Merged) 10jenkins-bot: [operations/software/etcd-mirror] add tox-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/349215 (owner: 10Volans) [13:46:32] Yippee, build fixed! 
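Points 1) to 3) above can be put into rough numbers. The sketch below rests on two general Linux behaviours rather than anything measured on these hosts: the backlog passed to `listen()` is capped by `net.core.somaxconn`, and a closed client socket stays in TIME_WAIT for 60 seconds. The `somaxconn` value and port range used in the example are assumed, commonly seen defaults; the real values would have to be read with `sysctl` on the jobrunner and rdb hosts.

```python
def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually uses: listen() is capped by somaxconn."""
    return min(requested, somaxconn)


def max_new_connections_per_second(port_low, port_high, time_wait_seconds=60):
    """Rough ceiling on new connections per second to ONE destination ip:port.

    Assumes TIME_WAIT sockets are not reused, so each local port is
    unavailable for time_wait_seconds after its connection closes.
    """
    ports = port_high - port_low + 1
    return ports / time_wait_seconds


if __name__ == "__main__":
    # 511 is the Redis backlog mentioned in the log; 128 is an assumed somaxconn.
    print(effective_backlog(511, 128))
    # 32768-61000 is an assumed net.ipv4.ip_local_port_range.
    print(int(max_new_connections_per_second(32768, 61000)))
```

With those assumed values, the configured backlog of 511 would be cut down to 128, and one client could open roughly 470 short-lived connections per second to a single Redis port before running out of local ports. Reusing TIME_WAIT sockets or keeping connections persistent raises or removes that ceiling.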
[13:46:32] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #373: 09FIXED in 2 min 31 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/373/ [13:57:34] tgr: could tests/phpunit/includes/TestingAccessWrapper.php be made to extend \Wikimedia\TestingAccessWrapper ? [13:57:43] I guess I can try in a child change and see what happens [13:59:50] hashar: it could, but the way the patches are now works as well [14:00:17] (add library -> switch extensions -> remove core class) [14:04:35] yup [14:05:10] though I would rather make it obvious the class is gone by changing the class in core to : class TestingAccessWrapper extends \Wikimedia\TestingAccessWrapper { } [14:05:28] the good thing is now your change is back compatible at least [14:05:47] at the price of code duplication between the library and core built-in class (which the one-liner above would solve) [14:15:58] (03PS3) 10Hashar: Add dependency for SimpleSAMLphp on PluggableAuth. [integration/config] - 10https://gerrit.wikimedia.org/r/349017 (owner: 10Cicalese) [14:16:53] (03CR) 10Hashar: [C: 032] "Yup that looks about right. Later on the job will end up installing directly based on extension.json list of requirements." [integration/config] - 10https://gerrit.wikimedia.org/r/349017 (owner: 10Cicalese) [14:18:09] (03Merged) 10jenkins-bot: Add dependency for SimpleSAMLphp on PluggableAuth. [integration/config] - 10https://gerrit.wikimedia.org/r/349017 (owner: 10Cicalese) [14:31:06] (03CR) 10Cicalese: "Great! Thank you!" 
[integration/config] - 10https://gerrit.wikimedia.org/r/349017 (owner: 10Cicalese) [14:38:48] (03CR) 10Thcipriani: [C: 04-1] "There are a few alternatives here:" (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/336960 (owner: 10Ejegg) [15:17:00] (03PS6) 10Hashar: Decouple repos from mediawiki gate queue [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) [15:21:26] (03CR) 10jerkins-bot: [V: 04-1] Decouple repos from mediawiki gate queue [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) (owner: 10Hashar) [15:22:23] (03PS7) 10Hashar: Decouple repos from mediawiki gate queue [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) [15:25:26] (03CR) 10Hashar: "So I think I got it "right" now. The mediawiki repositories have more specific jobs 'mwgate-*' in gate-and-submit which isolated those re" [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) (owner: 10Hashar) [15:38:00] (03PS8) 10Hashar: Decouple repos from mediawiki gate queue [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) [15:38:34] (03CR) 10Hashar: "Sorry for the spam Timo. I have fixed a few mistakes that happened during the rebase. Will review the layout diff once more." 
[integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) (owner: 10Hashar) [15:53:20] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #397: 04FAILURE in 31 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/397/ [15:55:36] Project beta-scap-eqiad build #151777: 04FAILURE in 1 min 52 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151777/ [15:59:29] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:05:31] Project beta-scap-eqiad build #151778: 04STILL FAILING in 1 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151778/ [16:05:47] (03PS9) 10Hashar: Decouple repos from mediawiki gate queue [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) [16:12:13] thcipriani: twentyafterfour the poor scap on beta fails with : bash: /srv/deployment/scap/scap/bin/scap: No such file or directory [16:12:24] eg: [16:12:25] on deployment-mediawiki04.deployment-prep.eqiad.wmflabs returned [127]: bash: /srv/deployment/scap/scap/bin/scap: No such file or directory [16:12:50] heh, well that's not the correct script path, should be /usr/bin/scap [16:13:26] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0] [16:13:30] weird ok [16:13:34] * twentyafterfour will fix [16:15:00] hashar: sorry got lost in upgrades :) [16:15:22] any chance that we could turn on persistent connection for a couple of days on jobrunner02? 
[16:15:28] Project beta-scap-eqiad build #151779: 04STILL FAILING in 1 min 47 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151779/ [16:15:31] *connections [16:15:55] and then observe https://graphite-labs.wikimedia.org/render/?width=800&height=600&target=deployment-prep.deployment-jobrunner02.network.connections.TIME_WAIT [16:21:33] (03CR) 10Ejegg: "Thanks for the review thcipriani! We're hoping to avoid having to maintain a separate repo at all, and just get the latest buildkit (i.e.," [integration/config] - 10https://gerrit.wikimedia.org/r/336960 (owner: 10Ejegg) [16:22:22] (03CR) 10Hashar: "I will review the diff against later. I am using the zuul-layout-diff files before.txt and current.txt and replace s/mwgate-// in current" [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) (owner: 10Hashar) [16:22:28] Project beta-scap-eqiad build #151780: 04STILL FAILING in 1 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151780/ [16:25:24] Project beta-scap-eqiad build #151781: 04STILL FAILING in 1 min 45 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151781/ [16:32:31] qq - can I temporaly disable puppet on deployment-tin to remove jobrunner02 from the scap dsh? [16:32:40] or is it a bad moment ? :) [16:35:36] Yippee, build fixed! [16:35:37] Project beta-scap-eqiad build #151782: 09FIXED in 1 min 51 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151782/ [16:36:46] elukey: should be fine, don't think anyone is doing anything crazy right now [16:40:47] thanks :) [16:41:44] !log temporary disable puppet on deployment-tin to remove jobrunner02 from scap dsh; manually enable persistent connection between it and rdb redis hosts [16:41:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:52:22] does CI support java 8? 
[17:03:05] mmm my hack does not really work properly [17:03:37] I edited /srv/mediawiki/wmf-config/jobqueue.php [17:05:25] SMalyshev: we install openjdk-8-jdk on jessie instances [17:05:39] PROBLEM - Puppet errors on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:05:44] PROBLEM - Puppet errors on swift is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:06:32] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:06:44] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:07:20] PROBLEM - Puppet errors on swift-storage-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:07:24] PROBLEM - Puppet errors on deployment-ircd is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:07:48] thcipriani: so I'm getting: 00:20:19 javac: invalid target release: 1.8 [17:07:59] thcipriani: is there anything I need to enable somewhere? [17:08:37] ah, I think this one: jdk: 'Ubuntu - OpenJdk 7' [17:08:38] SMalyshev: hrm I'm not aware of anything that needs to be enabled...where are you seeing that? [17:08:45] PROBLEM - Puppet errors on integration-publishing is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:08:46] PROBLEM - Puppet errors on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:08:53] thcipriani: e.g. 
https://integration.wikimedia.org/ci/job/wikidata-query-rdf/1314/console [17:08:58] * thcipriani looks [17:09:06] PROBLEM - Puppet errors on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:09:21] !log reverted hack on deployment-tin (apparently no effects on the jobrunner) [17:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:09:29] thcipriani: should I change that from 7 to 8 in jjb/wikidata.yaml? or maybe it could be both? [17:13:07] SMalyshev i doint think java 8 is installed on the slaves [17:13:32] SMalyshev: in looking at the job it looks like you can choose from "(System)" "Ubuntu - OpenJdk 7" "Ubuntu - OpenJdk 8" or "Debian - OpenJdk 8" [17:14:17] paladox: seems to be for debian https://github.com/wikimedia/puppet/blob/production/modules/contint/manifests/packages/java.pp#L5-L7 ? [17:14:23] er jessie rather [17:14:27] Oh [17:14:57] ah [17:14:58] require_package('openjdk-7-jdk') [17:15:10] thcipriani as require_package('openjdk-7-jdk') that is installed first i think that will make java 7 the default. [17:17:59] paladox: sure, but looks like maven jobs let you specify a specific version to use and then it just calls from the full path. Making that assumption based on the "JDK" line having a bunch of different options including "(System)" in the maven job as well as the job linked above (1314) using the full path to java, i.e. /usr/lib/jvm/java-7-openjdk-amd64//bin/java [17:18:22] Oh, thanks. [17:18:36] I wonder is java 7 even needed anymore on jessie? [17:18:58] Can we bump to java 8? [17:19:36] ¯\_(ツ)_/¯ not sure, honestly would have to check what all the projects are using. [17:20:12] ok [17:21:45] thcipriani: so should I put it as Debian? because jesse is debian as it seems. The old ubuntu one works, but I'm not sure which one to use [17:21:50] thcipriani i created this https://phabricator.wikimedia.org/T162828 task. 
All the android tests have been migrated to java 8 according to one of the developers of the app. [17:22:17] Theres no java 8 on trusty. Theres a ppa that provides openjdk-8 [17:22:49] SMalyshev: yeah, I'd try "Debian - OpenJdk 8". I can deploy that and try to rerun the job and if it doesn't work a rollback should be pretty easy. [17:23:12] I have java8 deployed on my production, so I'd like to do java 8 tests too... esp. as I start actually using java 8 code (lambdas are nice :) [17:23:31] heh, livin' in the future :) [17:24:10] java 8 was release in what, 2014? We're finally getting there! ;) [17:25:00] java 9 is only a few months away :0 [17:25:03] :) [17:25:04] SMalyshev: ok, lemme try changing the version in jenkins and then we'll retrigger that build. [17:25:22] (03PS1) 10Smalyshev: Move WDQS job to JDK 8 [integration/config] - 10https://gerrit.wikimedia.org/r/349257 [17:25:28] thcipriani: ^^ [17:27:26] SMalyshev: ok so once 1318 gets an instance https://integration.wikimedia.org/ci/job/wikidata-query-rdf/ we'll see if it works :) [17:27:40] (just rebuilt the patch you showed me) [17:28:45] RECOVERY - Puppet errors on integration-publishing is OK: OK: Less than 1.00% above the threshold [0.0] [17:29:05] (this may take a minute judging from https://integration.wikimedia.org/zuul/) [17:35:45] https://integration.wikimedia.org/ci/job/wikidata-query-rdf/1318/console [17:36:31] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [17:39:04] ^ SMalyshev seems to have worked [17:39:45] PROBLEM - Puppet errors on integration-publishing is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:42:07] SMalyshev: assuming that you're going to merge the pom.xml soon I can make this update permanent? Or should I revert for the time being? 
[17:46:22] (03CR) 10Krinkle: Decouple repos from mediawiki gate queue (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) (owner: 10Hashar) [17:51:30] thcipriani: well, this particular one is probably a week or two from merge, but I'd like to run tests on java 8 anyway, given that we use it in production [17:51:45] so it makes sense to test on that too [17:52:06] SMalyshev: ok, so sounds like it makes sense to go ahead and merge your patch? [17:52:10] thcipriani: yes [17:52:15] ok, doing [17:52:37] thcipriani: I'll check with existing code base that it doesn't break anything, if anything weird pops up I'll tell [17:52:51] SMalyshev: ok, sounds good :) [17:54:40] (03CR) 10Thcipriani: [C: 032] Move WDQS job to JDK 8 [integration/config] - 10https://gerrit.wikimedia.org/r/349257 (owner: 10Smalyshev) [17:56:24] (03Merged) 10jenkins-bot: Move WDQS job to JDK 8 [integration/config] - 10https://gerrit.wikimedia.org/r/349257 (owner: 10Smalyshev) [17:56:37] thcipriani: thanks! [17:59:43] RECOVERY - Puppet errors on integration-publishing is OK: OK: Less than 1.00% above the threshold [0.0] [18:00:07] np :) [18:10:44] PROBLEM - Puppet errors on integration-publishing is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [18:13:49] greg-g: are we still creating phab projects per incident these days? 
[18:14:32] no [18:14:56] just use #wikimedia-incident and then put the follow-ups in the "follow-up/actionables" column [18:30:43] RECOVERY - Puppet errors on integration-publishing is OK: OK: Less than 1.00% above the threshold [0.0] [18:31:27] greg-g: kk [18:35:40] Project beta-scap-eqiad build #151794: 04FAILURE in 1 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151794/ [18:42:23] twentyafterfour: ^ "ImportError: No module named diff" [18:42:38] wtf [18:43:10] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [18:44:16] /usr/lib/python2.7/dist-packages/pygments/lexers/diff.py exists [18:45:02] Hello. [18:45:23] heh, well I guess I'm glad that was what was on my clipboard [18:45:34] LOL [18:45:42] tried to middle click that link :P [18:45:43] Project beta-scap-eqiad build #151795: 04STILL FAILING in 1 min 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151795/ [18:48:41] looks like trusty boxes have pygments 1.6 and jessie boxes have 2.0 [18:49:07] lovely [18:49:16] Project beta-scap-eqiad build #151796: 04STILL FAILING in 1 min 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151796/ [18:55:38] Project beta-scap-eqiad build #151797: 04STILL FAILING in 1 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151797/ [19:00:06] Project beta-scap-eqiad build #151798: 04STILL FAILING in 1 min 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151798/ [19:02:42] Yippee, build fixed! [19:02:42] Project beta-scap-eqiad build #151799: 09FIXED in 2 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/151799/ [19:18:10] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [20:42:05] Yippee, build fixed! 
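The "ImportError: No module named diff" above fits a module that exists in pygments 2.0 (jessie) but not in 1.6 (trusty). One common way to tolerate such a split is to try several module paths in order. The helper below is a generic sketch, not the fix that was applied to scap. The fallback path `pygments.lexers.text` in the comment is an assumption about where `DiffLexer` lived in 1.6.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    )


# Hypothetical usage for the case in the log (requires pygments):
#   lexers = import_first("pygments.lexers.diff", "pygments.lexers.text")
#   DiffLexer = lexers.DiffLexer
```

Asking pygments for the lexer by name, for example `pygments.lexers.get_lexer_by_name("diff")`, avoids depending on the module layout at all.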
[20:42:05] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #370: 09FIXED in 1 min 3 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/370/ [20:42:10] Yippee, build fixed! [20:42:11] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #370: 09FIXED in 1 min 9 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/370/ [21:22:53] ejegg: I can merge https://gerrit.wikimedia.org/r/#/c/336960/6 if you're around. I started pulling on the buildkit thread and decided to back away :) [21:23:55] hehe, yeah, it would be a real pain to make that all work without github [21:24:14] If you're available to deploy it, that would be great! [21:25:06] We upstreamed our changes and killed off our fork of a queue wrapper just yesterday. Dropping two repos in a week would be superb [21:26:16] * thcipriani does [21:26:52] (03PS7) 10Thcipriani: Use upstream civicrm-buildkit [integration/config] - 10https://gerrit.wikimedia.org/r/336960 (owner: 10Ejegg) [21:28:06] (03CR) 10Thcipriani: [C: 032] "> buildkit already downloads a bunch of other tools from there" [integration/config] - 10https://gerrit.wikimedia.org/r/336960 (owner: 10Ejegg) [21:29:36] (03Merged) 10jenkins-bot: Use upstream civicrm-buildkit [integration/config] - 10https://gerrit.wikimedia.org/r/336960 (owner: 10Ejegg) [21:30:15] Rockin! I'll clear out the old buildkit dirs [21:32:14] cool, I was just about to figure out where they were :P [21:34:39] I can't get onto slave-trusty-1002 - is it down right now? [21:36:20] https://integration.wikimedia.org/ci/label/UbuntuTrusty/ looks like it should just be 100{1,3,4,6} [21:36:25] to keep you on your toes [21:36:26] :) [21:37:43] heh, k, got them all cleared out [21:37:46] trying a build [21:38:02] cool [21:40:41] dang, it's still complaining about no mcrypt! 
[21:40:55] I thought I got that installed on the slaves [21:41:05] lemme see if that config patch got reverted [21:43:22] hrm, I see php5-mcrypt installed [21:43:30] on some random trusty box [21:44:09] Is it installed on the nodepool? ie ci-trusty and ci-jessie? [21:44:23] the failure I just saw was on slave 1004 [21:45:39] d'oh, not showing up in php --info there [21:45:48] I guess I should have verified that [21:46:54] doesn't look like it's in conf.d for cli or apache2 [21:46:59] but it is available, FWIW [21:47:18] ahh, ok, I just missed one part [21:48:09] making another puppet patch [21:54:04] if I can figure out where... [21:56:26] probably slip something in the contint module somewheres. Looks like mediawiki::php_enmod does the magic, but probably better to just make an exec in contint to avoid cross-module deps. [21:58:40] Doint forget what you do in puppet you must do a change for nodepool in integration/config since nodepool dosent use puppet and refreshes the image daily i think. [22:01:01] so confused though, none of the other php extensions have special instructions for linking the conf file to cli/conf.d [22:01:13] Guessing the deb does that in most cases [22:01:55] seems like it should be fine to add it into contint::packages::php considering the that's run for both nodepool and for permanent instances afaict. 
https://github.com/wikimedia/integration-config/blob/master/dib/puppet/ciimage.pp#L38 [22:04:40] thcipriani: I added it to the require_package list a little while back: [22:04:44] https://github.com/wikimedia/puppet/blob/production/modules/contint/manifests/packages/php.pp [22:05:06] and the ini file is now in mods-available, just not linked in cli/conf.d [22:05:58] I'm having a hard time finding anything else in the puppet repo that enables the other extensions - looks like require_package is enough for the other ones [22:08:14] there's only one mention of php5/cli in the whole repo, and it's the full path to php5/cli/php.ini [22:18:17] ejegg: this is interesting. I just grabbed the ubuntu debs from: http://packages.ubuntu.com/trusty/amd64/php5-curl/download vs http://packages.ubuntu.com/trusty/amd64/php5-mcrypt/download after pulling them apart the php5-curl package has a postinst control file that is missing from php5-mcrypt [22:18:20] this file: https://gist.github.com/thcipriani/731c66ca1c976dfd2d6244a84e01e190 [22:18:38] noteworthy line: php5_invoke enmod ALL ${dsoname} [22:19:08] which would explain why it's not enabled on trusty, I guess [22:19:30] oho, maybe because mcrypt is deprecated? [22:19:39] ¯\_(ツ)_/¯ [22:19:44] it's not like that in debian :P [22:19:48] anyway, I think this might do it: https://gerrit.wikimedia.org/r/349343 [22:19:53] does that look right? 
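The state described above (ini file present in mods-available, not linked in cli/conf.d) can be checked with a small script before and after the puppet change. This is a sketch under assumptions: the `/etc/php5` root and the `NN-module.ini` naming of enabled entries follow the usual Debian/Ubuntu php5 layout and are not confirmed by the log.

```python
import os


def module_enabled(module, conf_dir):
    """True if conf_dir holds an ini entry for module (e.g. 20-mcrypt.ini)."""
    suffix = "-" + module + ".ini"
    try:
        entries = os.listdir(conf_dir)
    except FileNotFoundError:
        return False
    return any(e == module + ".ini" or e.endswith(suffix) for e in entries)


def available_but_disabled(module, php_root="/etc/php5", sapi="cli"):
    """True if the module ini is installed but not enabled for the given SAPI."""
    ini = os.path.join(php_root, "mods-available", module + ".ini")
    conf_dir = os.path.join(php_root, sapi, "conf.d")
    return os.path.isfile(ini) and not module_enabled(module, conf_dir)


if __name__ == "__main__":
    for sapi in ("cli", "apache2"):
        print(sapi, available_but_disabled("mcrypt", sapi=sapi))
```

A `True` result for `cli` would match what `php --info` showed on slave 1004: the package is installed but the extension is not loaded.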
[22:20:25] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T162954#3199998 (10greg) a:03mmodell [22:20:32] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T163512#3199999 (10greg) [22:20:44] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T162954#3181337 (10greg) a:05mmodell>03None [22:47:20] 06Release-Engineering-Team, 15User-greg: Publish WMF code-hosting exception policy - https://phabricator.wikimedia.org/T109919#3200065 (10greg) [22:47:44] 06Release-Engineering-Team, 15User-greg: Publish WMF code-hosting exception policy - https://phabricator.wikimedia.org/T109919#1562809 (10greg) 05Open>03declined [22:48:31] 06Release-Engineering-Team, 15User-greg: Review team ownership of projects/things listed on the Developers/Maintainers page - https://phabricator.wikimedia.org/T106751#3200067 (10greg) 05Open>03Resolved a:03greg Not sure why this wasn't a sub-task of the other task on this where I was making all the upda... [22:50:10] (03CR) 1020after4: [C: 032] Remove delete-stale-branch: use scap clean instead [tools/release] - 10https://gerrit.wikimedia.org/r/348870 (owner: 10Chad) [22:50:51] (03Merged) 10jenkins-bot: Remove delete-stale-branch: use scap clean instead [tools/release] - 10https://gerrit.wikimedia.org/r/348870 (owner: 10Chad) [23:02:46] Krinkle, bblack, does the data center switchover affect Labs at all? [23:02:49] I'm getting: [23:03:05] Notice: Undefined index: codfw in /srv/mediawiki-staging/wmf-config/LabsServices.php on line 61 [23:03:11] on deployment-tin. [23:03:45] Should $wmfMasterDatacenter for Labs still be eqiad? [23:06:19] ejegg: cherry-picked your puppet patch on the integration puppetmaster and it applied fine. Should be good to test. [23:08:28] thanks thcipriani, here goes! 
[23:10:31] matt_flaschen: I suppose so yes. [23:10:50] LabsServices.php clears and redefines wmfAllServices [23:11:00] so there's nothing there that would point to prod eqiad [23:11:12] Krinkle, that's what I thought, surprised no one caught this already. [23:11:48] matt_flaschen: set it in commonsettings-labs I think [23:12:10] actually, might not be in time [23:12:18] commonsettings loads labsservices very early [23:12:53] probably just ignore wmfMasterDatacenter in LabsServices.php [23:13:16] and perhaps also not use the key 'eqiad' at all, but 'labs' or 'beta' instead. That's a separate issue though (use of the same dc keys) [23:13:36] eqiad-beta or something [23:13:50] Thanks thcipriani, looks like mcrypt is no longer a blocker! Still got one failure, but I think it's with our buildkit site config. I'll puzzle that one out! [23:14:25] ejegg: word. Good luck :) [23:18:54] "probably just ignore wmfMasterDatacenter in LabsServices.php". Krinkle, yeah, that's what I was thinking. Patch in a second. [23:23:37] Krinkle, https://gerrit.wikimedia.org/r/349349 .
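The "Undefined index: codfw" notice and the proposed fix can be modelled in a few lines. The real code is PHP in LabsServices.php and is not shown in the log, so this Python sketch only mirrors the shape of the problem: the service map, its single key and the hostname are hypothetical.

```python
# Hypothetical stand-in for the map that LabsServices.php redefines.
# The key and the contents are illustrative, not copied from the real file.
LABS_SERVICES = {"eqiad": {"redis": "redis.example.invalid:6379"}}
LABS_DATACENTER = "eqiad"


def services_by_master(master_dc):
    """Mirrors the failing lookup: breaks once master_dc is not a defined key."""
    return LABS_SERVICES[master_dc]


def services_for_labs(master_dc):
    """Mirrors the proposed fix: ignore master_dc, labs has one datacenter."""
    return LABS_SERVICES[LABS_DATACENTER]


if __name__ == "__main__":
    print(services_for_labs("codfw"))  # works whatever production switches to
    try:
        services_by_master("codfw")
    except KeyError as missing:
        print("undefined index:", missing)
```

Renaming the key to something like 'labs' or 'beta', as suggested above, would make the separation from the production datacenter names explicit.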