[00:20:06] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [00:22:54] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [03:43:20] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#1973302 (10aude) now tests are failing: ``` 03:32:45 53) LuaStandalone: LuaWikibaseEntityLibraryTests[39]: mw.wikibase.entity.getDescriptio... [06:25:58] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3139285 (10Joe) [06:27:06] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3139297 (10Joe) [06:29:04] Yippee, build fixed! [06:29:05] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #314: 09FIXED in 1 hr 49 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/314/ [06:47:20] Yippee, build fixed! [06:47:21] Project selenium-Wikibase » chrome,beta,Linux,BrowserTests build #314: 09FIXED in 2 hr 7 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/314/ [07:34:50] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<10.00%) [07:52:07] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10OOjs-UI, 13Patch-For-Review: Speed up oojs/ui Jenkins jobs - https://phabricator.wikimedia.org/T155483#3139483 (10Prtksxna) I tried to make changes in `jjb/oojs.yaml` and `zuul/layout.yaml` as well, but couldn... [08:02:03] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10OOjs-UI, 13Patch-For-Review: Speed up oojs/ui Jenkins jobs - https://phabricator.wikimedia.org/T155483#3139508 (10hashar) Adding the `composer test` step "only" adds 44 seconds: ``` 00:11:16.835 Running "exec:... 
[08:04:28] (03PS1) 10Prtksxna: oojs/ui: Stop using `composer-test-package` template [integration/config] - 10https://gerrit.wikimedia.org/r/345286 (https://phabricator.wikimedia.org/T155483) [08:05:14] (03PS2) 10Prtksxna: oojs/ui: Stop using `composer-test-package` template [integration/config] - 10https://gerrit.wikimedia.org/r/345286 (https://phabricator.wikimedia.org/T155483) [08:11:04] (03PS3) 10Hashar: oojs/ui: Stop using `composer-test-package` template [integration/config] - 10https://gerrit.wikimedia.org/r/345286 (https://phabricator.wikimedia.org/T155483) (owner: 10Prtksxna) [08:12:03] (03CR) 10Hashar: [C: 032] oojs/ui: Stop using `composer-test-package` template [integration/config] - 10https://gerrit.wikimedia.org/r/345286 (https://phabricator.wikimedia.org/T155483) (owner: 10Prtksxna) [08:12:50] (03Merged) 10jenkins-bot: oojs/ui: Stop using `composer-test-package` template [integration/config] - 10https://gerrit.wikimedia.org/r/345286 (https://phabricator.wikimedia.org/T155483) (owner: 10Prtksxna) [08:23:36] (03CR) 10Zfilipin: "@Hashar: I am not sure how to do that. Rubocop is executed via Rake." [integration/config] - 10https://gerrit.wikimedia.org/r/343848 (owner: 10Zfilipin) [09:07:40] 10Continuous-Integration-Config: Castor: mediawiki-core-qunit-jessie node_modules cache ineffective - https://phabricator.wikimedia.org/T159591#3139591 (10hashar) 05declined>03Open p:05Triage>03Normal That came up when looking at oojs/ui npm job ( T155483 ). It takes roughly 12 minutes, 5:30 minutes bein... [09:32:29] 10Continuous-Integration-Infrastructure, 07Puppet: Need a better way of testing puppet patches for contint/integration stuff - https://phabricator.wikimedia.org/T126370#3139613 (10hashar) There is a related task to add Puppet environments to #beta-cluster : {T161675} [09:41:04] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3139285 (10hashar) > We might want to have all nodes in this environment derive from a base node that includes all the labs b... [09:50:02] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3139659 (10Joe) @hashar the point is to have something that resembles production, including the role-based hiera lookup. It... [11:18:13] hashar: around? [11:18:22] aude: yes [11:18:44] all wikibase-related gate and submit jobs are failing because of issues with scribunto [11:18:50] probably related to https://phabricator.wikimedia.org/T125050 [11:19:04] e.g. 
https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer-trusty/151/console [11:19:10] proc_open(): fork failed - Cannot allocate memory [11:19:24] i am not sure how to fix [11:20:44] bah [11:21:17] other than reverting of course, but would be good if we could have the scribunto tests run [11:21:26] looks like Timo did some change in https://phabricator.wikimedia.org/T126670#3138907 [11:21:38] yeah [11:21:41] iirc the issue is that all the Scribunto tests ends up running [11:23:26] blblbl [11:23:40] aude: the wfDebug log shows 1.8GBytes of ram being used [11:23:41] https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer-trusty/160/artifact/log/mw-debug-cli.log/*view*/ [11:23:45] (at the bottom [11:24:04] 1016.5270 1799.0M Scribunto_LuaStandaloneInterpreter::terminate: terminating [11:24:34] o_O [11:26:03] so I guess there is some memory limit kicking in [11:26:59] probably [11:27:11] i can make a task for this to investigate [11:27:22] if this is something we can fix [11:28:09] and the HHVM one shows 610MBytes [11:28:33] when the cloud instances have 4GBytes of ram [11:29:13] ok [11:29:17] makes sense [11:30:48] and we have /etc/php5/cli/php.ini with memory_limit = -1 [11:30:53] I verified on a trusty instance [11:31:03] 10Browser-Tests-Infrastructure, 07JavaScript, 15User-zeljkofilipin: Run WebdriverIO tests using Firefox - https://phabricator.wikimedia.org/T161697#3139916 (10zeljkofilipin) [11:31:19] the idle trusty instance I am on has 3368MBytes free [11:34:13] and the XX question is why it tries to fork() [11:34:27] when the the setting is apparently to use the standalone php extension https://phabricator.wikimedia.org/rCIJEf3516f7e51faa6cb59a99b06e721f9cef2108f15 [11:34:52] i'm seeing if hoo can help [11:35:00] he knows a lot more about scribunto [11:36:54] i have hhvm (or php7) locally so can't easily run the tests to look [11:37:03] oh [11:37:18] i'm sure i could get php56 in vagrant or something though [11:38:39] i have an idea... [11:38:51] also the luastandalone process is invoked with ulimit -v 48828 (size of maximum virtual memory) [11:38:59] if the tests run ok in hhvm, then maybe we could have the setting just for hhvm (at least for now) [11:39:05] aude: Do you know if beta cluster has dispatching or not? [11:39:18] we could put a check around the config [11:39:28] Amir1: i don't think so [11:39:46] it's a cron job + whatever config needed [11:40:10] aude: what is a bit lame is that a patch to Wikibase has a Jenkins job that runs every single tests [11:40:25] including Scribunto tests for the various lua backend [11:41:00] yeah :/ [11:41:08] it is for gate and submit though [11:41:23] yup php5 jobs are only in gate-and-submit [11:43:38] LuaStandalone: Scribunto_LuaUriLibraryTests 1 mn 24 s [11:43:44] LuaStandalone: Scribunto_LuaUstringLibraryTests 43 s [11:43:55] (from https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer-trusty/160/testReport/(root)/ ) [11:44:17] aude: maybe there are forked lua process that keep accumulating [11:44:32] could be [11:44:37] filling up the memory on top of the 1.7GB reported by the wfDebug() [11:44:42] I guess we want to revert [11:44:48] but I need to get some food first [11:44:55] i'm sure wikibase uses a bit of memory for loading entities [11:45:04] and if it leaks/accumulates.... 
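The 1.8 GB figure hashar quotes above comes from the memory column that wfDebug prints into the mw-debug-cli.log artifact (e.g. "1016.5270 1799.0M Scribunto_LuaStandaloneInterpreter::terminate: terminating"). A rough sketch of scanning such a log for the lines with the highest reported memory; the "elapsed memory message" line format is assumed from the excerpt quoted above, not taken from MediaWiki documentation:

```python
import re
import sys

# Matches the wfDebug CLI prefix seen above: "<elapsed-seconds> <memory>M <message>"
# (format assumed from the quoted log line, not verified against MediaWiki itself).
LINE_RE = re.compile(r'^\s*(?P<elapsed>\d+\.\d+)\s+(?P<mem>\d+(?:\.\d+)?)M\s+(?P<msg>.*)$')

def peak_memory(path, top=10):
    """Return the `top` log lines with the largest reported memory, in MB."""
    hits = []
    with open(path, encoding='utf-8', errors='replace') as fh:
        for line in fh:
            m = LINE_RE.match(line)
            if m:
                hits.append((float(m.group('mem')), m.group('elapsed'), m.group('msg')))
    return sorted(hits, reverse=True)[:top]

if __name__ == '__main__':
    # e.g. python peak_mem.py mw-debug-cli.log
    for mem, elapsed, msg in peak_memory(sys.argv[1]):
        print(f'{mem:9.1f}M  t={elapsed}s  {msg}')
```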
[11:45:07] ok [11:45:18] and we need to split tests in various groups [11:45:40] there is no point in running all Scribunto tests for a Wikibase patch [11:45:56] yeah [12:56:14] (03PS1) 10Hashar: (WIP) ensure all jobs in Zuul are in JJB [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) [12:57:41] 10Continuous-Integration-Config, 13Patch-For-Review: Write a test to ensure all jobs in Zuul are defined in JJB - https://phabricator.wikimedia.org/T103847#3140131 (10hashar) Gave it a try with the patch above: tox -e py27 -- tests/test_integration.py:IntegrationTests.test_jjb_zuul_jobs FAILURE: Job analyti... [12:59:51] (03CR) 10jerkins-bot: [V: 04-1] (WIP) ensure all jobs in Zuul are in JJB [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) (owner: 10Hashar) [13:13:54] (03PS2) 10Hashar: (WIP) ensure all jobs in Zuul are in JJB [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) [13:20:05] Project beta-update-databases-eqiad build #15998: 04FAILURE in 4.4 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/15998/ [13:23:51] (03CR) 10jerkins-bot: [V: 04-1] (WIP) ensure all jobs in Zuul are in JJB [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) (owner: 10Hashar) [13:27:25] 10Continuous-Integration-Config, 07Technical-Debt: Migrate "analytics-*" jobs to Jenkins Job Builder - https://phabricator.wikimedia.org/T97514#3140233 (10hashar) | Repo | Last change |--|-- | [[ https://gerrit.wikimedia.org/r/#/q/project:analytics/libanon | analytics/libanon ]] | Dec 2012 | [[ https://gerrit.... [13:30:44] (03PS1) 10Hashar: Drop obsolete analytics repositories from Zuul [integration/config] - 10https://gerrit.wikimedia.org/r/345334 (https://phabricator.wikimedia.org/T97514) [13:32:15] 10Continuous-Integration-Infrastructure, 06Operations: (Nodepool) CI is really slow tonight - https://phabricator.wikimedia.org/T155444#3140242 (10hashar) [13:32:17] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10OOjs-UI, 13Patch-For-Review: Speed up oojs/ui Jenkins jobs - https://phabricator.wikimedia.org/T155483#3140240 (10hashar) 05Open>03Resolved Mostly solved. There are still potential optimization to be done... [13:46:39] Yippee, build fixed! 
[13:46:39] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #351: 09FIXED in 2 min 39 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/351/ [13:46:59] (03CR) 10Hashar: [C: 032] Drop obsolete analytics repositories from Zuul [integration/config] - 10https://gerrit.wikimedia.org/r/345334 (https://phabricator.wikimedia.org/T97514) (owner: 10Hashar) [13:50:36] (03Merged) 10jenkins-bot: Drop obsolete analytics repositories from Zuul [integration/config] - 10https://gerrit.wikimedia.org/r/345334 (https://phabricator.wikimedia.org/T97514) (owner: 10Hashar) [13:59:43] (03PS1) 10Hashar: Import analytics-wikistats job [integration/config] - 10https://gerrit.wikimedia.org/r/345343 (https://phabricator.wikimedia.org/T97514) [14:01:43] (03CR) 10Hashar: [C: 032] Import analytics-wikistats job [integration/config] - 10https://gerrit.wikimedia.org/r/345343 (https://phabricator.wikimedia.org/T97514) (owner: 10Hashar) [14:04:42] (03Merged) 10jenkins-bot: Import analytics-wikistats job [integration/config] - 10https://gerrit.wikimedia.org/r/345343 (https://phabricator.wikimedia.org/T97514) (owner: 10Hashar) [14:05:02] 10Continuous-Integration-Config, 13Patch-For-Review: Write a test to ensure all jobs in Zuul are defined in JJB - https://phabricator.wikimedia.org/T103847#3140328 (10hashar) [14:05:04] 10Continuous-Integration-Infrastructure, 07Technical-Debt: Delete old jobs not (or no longer) managed by JJB - https://phabricator.wikimedia.org/T91410#3140329 (10hashar) [14:05:06] 10Continuous-Integration-Config, 13Patch-For-Review, 07Technical-Debt: Migrate "analytics-*" jobs to Jenkins Job Builder - https://phabricator.wikimedia.org/T97514#3140326 (10hashar) 05Open>03Resolved a:03hashar [14:05:14] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) (owner: 10Hashar) [14:13:49] (03PS2) 10Hashar: Android: remove lint job [integration/config] - 10https://gerrit.wikimedia.org/r/345122 (https://phabricator.wikimedia.org/T161305) [14:14:17] (03PS2) 10Hashar: Android: generate lint report on build page [integration/config] - 10https://gerrit.wikimedia.org/r/345127 (https://phabricator.wikimedia.org/T161305) [14:20:54] (03PS3) 10Hashar: Android: generate lint report on build page [integration/config] - 10https://gerrit.wikimedia.org/r/345127 (https://phabricator.wikimedia.org/T161305) [14:21:51] Yippee, build fixed! [14:21:52] Project beta-update-databases-eqiad build #15999: 09FIXED in 1 min 51 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/15999/ [14:25:40] (03CR) 10Hashar: [C: 032] Android: remove lint job [integration/config] - 10https://gerrit.wikimedia.org/r/345122 (https://phabricator.wikimedia.org/T161305) (owner: 10Hashar) [14:27:01] (03Merged) 10jenkins-bot: Android: remove lint job [integration/config] - 10https://gerrit.wikimedia.org/r/345122 (https://phabricator.wikimedia.org/T161305) (owner: 10Hashar) [14:27:27] (03CR) 10Hashar: [C: 032] "The job now always tries to generate a lint report, even when the build as failed. 
Tested on the dummy change https://gerrit.wikimedia.org" [integration/config] - 10https://gerrit.wikimedia.org/r/345127 (https://phabricator.wikimedia.org/T161305) (owner: 10Hashar) [14:28:46] (03Merged) 10jenkins-bot: Android: generate lint report on build page [integration/config] - 10https://gerrit.wikimedia.org/r/345127 (https://phabricator.wikimedia.org/T161305) (owner: 10Hashar) [14:42:59] 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10Android-app-Bugs, 06Wikipedia-Android-App-Backlog, 13Patch-For-Review: Merge apps/android/wikipedia Jenkins jobs lint and test - https://phabricator.wikimedia.org/T161305#3140419 (10hashar) 05Open>03Resolved The lint job has been merged i... [14:44:10] 10Continuous-Integration-Infrastructure (Little Steps Sprint): Create "High Priority" test pipeline - https://phabricator.wikimedia.org/T160667#3140422 (10hashar) 05Open>03Resolved My todo above is a similar problem as the gate-and-submit. Namely prioritize wmf/ branches . So lets address it with T160668 [14:47:08] au [14:51:06] Hey thcipriani|afk, just checking in on the puppet work for 3d2png, I have exactly no idea where the definitions for the thumbnailers are in puppet, I'm happy to wander around looking for them, but if you can point me in the right direction I'd be grateful [14:58:54] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#1973302 (10hashar) So nothing specific has changed. T126670 has been closed as a cleanup task since we have `$wgScribuntoDefaultEngine = 'l... [15:05:43] thcipriani: I guess let me know if https://gerrit.wikimedia.org/r/345377 is right [15:07:09] marktraceur: hey, so the hieradata deployment server piece seems correct, the other part we'll have to find a place for it to live in a module or a role [15:07:28] so it looks like imagescalers all get the role https://github.com/wikimedia/puppet/blob/production/manifests/site.pp#L2109 [15:07:51] Yeah, I think that's where I put the scap part [15:08:34] At least, as far as I understand puppet, which is practically not. [15:11:19] we'll need to put scap::target in a puppet file, the hieradata files just feed configuration to the puppet modules, so we could put it here https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/mediawiki/scaler.pp [15:11:38] trying to think of the "correct" place to put it based on https://wikitech.wikimedia.org/wiki/Puppet_coding#Organization [15:11:47] thcipriani: scaler, or imagescaler? [15:12:19] right, imagescaler, sorry, digging through github was different than digging through vim :) [15:12:30] Heh [15:12:32] https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/mediawiki/imagescaler.pp [15:12:40] OK, I've got that set up... [15:13:10] thcipriani: Check now? [15:18:16] !log Delete a 32GB instance integration-ci - T161006 [15:18:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:18:20] T161006: Convince nova-scheduler to pay attention to CPU metrics - https://phabricator.wikimedia.org/T161006 [15:18:32] marktraceur: that seems functionally correct, but may need to be moved for organization purposes. I'll see if I can find a better spot for it. Also now that I'm digging through these roles it seems like beta images scalers may have a different setup than prod. [15:18:55] OK. 
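For the "where do the thumbnailer definitions live" hunt above (site.pp, modules/role/manifests/mediawiki/*.pp, the hieradata files), a crude sketch of the grep-the-repo approach against a local ops/puppet checkout. The checkout path and file locations are assumptions based on the paths linked above, and the search is plain text matching, nothing Puppet-aware:

```python
import pathlib
import sys

def find_role(repo, role):
    """Yield (path, line_no, line) for plain-text occurrences of a role/class name
    in the places roles usually appear: manifests/site.pp, modules/role/ and hieradata/."""
    repo = pathlib.Path(repo)
    candidates = [repo / 'manifests' / 'site.pp']
    for sub in ('modules/role', 'hieradata'):
        base = repo / sub
        if base.is_dir():
            candidates.extend(p for p in base.rglob('*') if p.suffix in ('.pp', '.yaml'))
    for path in candidates:
        if not path.is_file():
            continue
        for no, line in enumerate(path.read_text(errors='replace').splitlines(), 1):
            if role in line:
                yield path.relative_to(repo), no, line.strip()

if __name__ == '__main__':
    # Usage (paths are placeholders):
    #   python find_role.py ~/operations-puppet role::mediawiki::imagescaler
    for path, no, line in find_role(sys.argv[1], sys.argv[2]):
        print(f'{path}:{no}: {line}')
```

This only answers "where is the role defined or mentioned in the repo"; as thcipriani notes further down, finding which labs instances actually carry a role now means Horizon plus the wikitech hiera pages.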
[15:19:02] like: the deployment-imagescaler01 doesn't have mediawiki on it but the prod roles here seem like they do? Is that right? [15:19:15] Hmm [15:19:34] thcipriani: Maybe someone switched beta over to thumbor all stealthy-like? [15:19:46] In summary, no, that doesn't seem right, but I could be wrong [15:19:50] :) [15:19:56] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#1973302 (10Anomie) >>! In T125050#3140483, @hashar wrote: > For Wikibase tests, the job mwext-testextension-php55-composer-trusty fails. The... [15:20:09] I did see an /srv/thumbor dir yesterday iirc [15:23:19] 10Continuous-Integration-Config, 10MediaWiki-extensions-Scribunto, 10Wikidata: [Task] Add Scribunto to extension-gate in CI - https://phabricator.wikimedia.org/T125050#3140588 (10Anomie) [15:23:46] huh, well, it's looking at the puppetmaster in the thumbor project so I would assume it's running thumbor :) [15:26:39] marktraceur: is the plan to deploy this to the imagescalers like mw1293 or to imagescalers like thumbor1001? [15:26:39] thcipriani: Well, that's going to make things a little hairier for our beta deployment... [15:26:57] thcipriani: Right now I don't think we have a way to run it through thumbor, but I guess that's my next stop [15:27:44] We have to have scalers still for video on beta, though, right? Those still go through MediaWiki AFAIK [15:27:59] * thcipriani digs [15:28:16] Maybe gilles should be part of this conversation since both of these are his babies. [15:33:03] hrm. I'm lost in horizon trying to figure out what runs what role. I wonder if there's an easy way to figure out what machines have a particular role, ldapsearch used to work. Having trouble finding imagescalers/videoscalers in deployment-prep. [15:35:01] There a puppet role screen. [15:35:45] You click on the instance in the horizion screen. Then you click the puppet tab [15:35:54] deployment-imagescaler01 has thumbor shadow-serve thumbnail traffic, but the response is still provided by mediawiki [15:36:01] for now [15:36:36] the puppetmaster was overridden to test puppet changes without having to change things on the default puppetmaster [15:36:50] since I've had to do a substantial amount of puppet work for thumbor [15:37:40] anyway all of that is my parallel reality, marktraceur you can focus on mediawiki and I'll figure out how to get 3d2png deployed on the prod thumbor hosts as well once you've done it for image scalers [15:37:43] twentyafterfour, hi i am now getting this error [15:37:44] well this explains why ldapsearch is confused about me looking for host entries :) https://wikitech.wikimedia.org/wiki/Ldap_hosts#Done_Labs_ldap_host_entries_are_now_a_thing_of_the_past. [15:37:44] {"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406} [15:37:58] for beta it's the same machines, so nothing special to do, if you deploy it for mediawiki, it'll be available for thumbor [15:38:04] with elasticsearch (searching works but it's failing one of the checks) [15:38:10] https://phab-01.wmflabs.org/config/issue/elastic.misconfigured/ [15:38:53] gilles: so my dumb question is which machines in beta are handing image scaling? [15:39:21] deployment-imagescaler* [15:39:34] I don't know how many there are. 2, maybe? 
[15:39:50] or could be just the one [15:40:44] I think there's just deployment-imagescaler01 which is a thumbor box [15:41:07] it also runs mediawiki [15:41:16] which is what actually serves the thumbnails back to swift [15:42:00] if it doesnt then I forgot that we did that [15:42:10] erm, I don't see /srv/mediawiki directory on that box [15:42:30] ok, let me double check on actual beta images [15:42:32] twentyafterfour i think that may be because i upgraded elasticsearch to 5.3 last night. So phab may need to be updated for 5.3 support too. [15:43:21] cool thanks [15:47:33] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3139285 (10bd808) > on the deployment-prep puppetmaster, define a disk-based hiera hierarchy to mimic 1:1 what we have in pro... [15:48:52] thcipriani: You, Chad, and anyone else that did a lot of work on the integration project should probably take a look at _joe_'s ideas in T161675. Seems like it might be a good direction to take deployment-prep in [15:48:52] T161675: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675 [15:49:49] > There is no greppable way to find out where that role is applied on within deployment-prep [15:49:52] so about that [15:50:05] ldapsearch used to work, but now https://wikitech.wikimedia.org/wiki/Ldap_hosts#Done_Labs_ldap_host_entries_are_now_a_thing_of_the_past [15:50:13] is there really no way to get this data now? [15:50:20] other than horizon [15:51:03] thcipriani: it looks like the app servers are serving the thumbs, eg. deployment-mediawiki05 [15:51:28] don't remove deployment-imagescaler01 from deploy destinations for 3d2png, though, we'll need that eventually [15:51:36] Yippee, build fixed! [15:51:37] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #374: 09FIXED in 29 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/374/ [15:51:57] thcipriani: yeah, not in ldap anymore and things aren't all in horizon either. it's horizon + wikitech hiera page + ops/puppet labs/... now [15:52:09] at bit of a dogs breakfast [15:52:22] gilles: So if we're serving 3d2png thumbnails from the app servers in beta, it's not quite production-like for testing purposes... [15:52:36] in fact I'm doing a bunch of requests and it's only deployment-mediawiki05 showing up. I'd like to track down the swift config for beta, though, that'll give us the definite answer [15:52:49] killing the wikitech page has been -1'd for now because the horizon changes don't give any version history and that makes many wiki people sad [15:53:35] thcipriani: good morning :d I eventually closed a few tasks from the Little Steps Sprint [15:54:02] marktraceur: it's possible that deployment-mediawiki05 is provisioned like an app server but only serves thumb traffic, I don't know. I'd like firs tto find where the config variables that define where swift sends thumb traffic is defined for beta [15:54:03] thcipriani: oojs/ui and apps/wikiedia/android have been optimized. 
And I guess the next one is going to be Wikibase [15:54:52] gilles: yeah, in looking at deployment-mediawiki05 the appserver role is the only thing applied https://horizon.wikimedia.org/project/instances/7e18c953-42d1-4624-a527-6b7077c1cd7e/ (aside from some firewall stuff which is the beta role you see there) [15:55:13] I've found the hiera config, deployment-mediawiki05 is the only host configured for thumbnails. but that's not the only thing it does [15:55:21] it also does API requests, for example [15:55:49] the balancing is done by varnish and configured in hieradata/labs.yaml cache::* variables [15:56:10] so at least mediawiki06 is dedicated to security audit since they tend to overload Hhvm [15:56:38] marktraceur: 05 does thumbs and APIs as far as I can see, 04 acts as the appserver [15:56:47] hashar: awesome about the Little Steps stuff! It'll be nice to see these jobs streamlined to not use so many instances. [15:56:58] that's in hieradata/labs.yaml in puppet [15:57:03] yeah that is the idea yea [15:57:14] thcipriani: mediawiki/core consumes like 10-12 instances per CR+2 patches :( [15:57:14] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3140702 (10Joe) >>! In T161675#3140664, @bd808 wrote: >> on the deployment-prep puppetmaster, define a disk-based hiera hiera... [15:57:39] so I'd say for 3d2png on beta, deploy it to deployment-mediawiki05.deployment-prep.eqiad.wmflabs for mediawiki deployment-mediawiki05.deployment-prep.eqiad.wmflabs to deployment-imagescaler01.deployment-prep.eqiad.wmflabs for thumbor [15:58:00] hashar: I've noticed. 2 core patches takes up the whole nodepool allowance :( [15:58:04] ignore the wild paste in the middle of my sentence... [15:58:27] thcipriani: yeah and we had discussion to merge those jobs a while ago. I Guess it is time to do it now [15:58:40] thcipriani: I will probably try to write a test runner to make it easier [15:59:15] marktraceur: by deploying to 05 it won't be on the designated app server (04) so you'll still get a similar environment as prod where the command is only available to the image scaler (and the API, but that shouldn't matter) [15:59:45] should be enough to *make sure that shit works* (tm) [15:59:47] gilles: OK, cool. And 05 is already getting the mediawiki::imagescaler role, so my puppet change should apply? [16:00:08] gilles: so it needs to be on deployment-mediawiki05 and deployment-imagescaler01 in beta? [16:00:21] I don't see that it gets that role, maybe it's inherited from role::beta::mediawiki or role::mediawiki::appserver [16:00:22] marktraceur: 05 does not get the mediawiki::imagescaler role afaict [16:00:26] I'm not sure what's up with that [16:02:00] I see the name "rendering" both in hieradata/labs.yaml and in the imagescaler.yaml for hiera... [16:02:26] right [16:02:51] it could be an oversight and happen to work provisioned that way and assigned as a rendering host [16:03:16] Worst case scenario, it doesn't get deployed and we get a bunch of broken thumbnails on beta, right? [16:03:35] Then it's back to the drawing board but at least we've learned something [16:05:00] so on deployment-mediawiki05 this turns up: grep -i 'image' /var/lib/puppet/state/classes.txt imagemagick::install and packages::imagemagick so that's...something ¯\_(ツ)_/¯ [16:05:03] oh beta. 
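The `grep -i 'image' /var/lib/puppet/state/classes.txt` check above is the quick way to see what Puppet actually applied on a node. A small sketch of the same check, meant to be run on the instance itself; the classes.txt path comes straight from the command quoted above, and the filter argument is optional:

```python
import pathlib
import sys

CLASSES = pathlib.Path('/var/lib/puppet/state/classes.txt')  # path as used in the grep above

def applied_classes(pattern=None):
    """Return the Puppet classes recorded for the last agent run, optionally
    filtered by a case-insensitive substring (mirrors `grep -i image classes.txt`)."""
    classes = [c.strip() for c in CLASSES.read_text().splitlines() if c.strip()]
    if pattern:
        classes = [c for c in classes if pattern.lower() in c.lower()]
    return classes

if __name__ == '__main__':
    # e.g. run on deployment-mediawiki05:  python applied_classes.py image
    for cls in applied_classes(sys.argv[1] if len(sys.argv) > 1 else None):
        print(cls)
```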
[16:05:36] right, I imagine the needed things are installed by chance [16:06:13] the app servers should include mediawiki::packages [16:06:19] which includes mediawiki::packages::multimedia [16:06:23] and other such sub classes [16:06:25] there you go [16:06:35] we use the same puppet class on CI nodes [16:06:55] probably worth trying to apply the imagescaler role to 05 [16:07:15] modules/mediawiki/manifests/packages.pp: include ::imagemagick::install [16:07:33] that has the firejail profiles as well [16:07:58] gilles: yeah, I think adding the imagescaler role to 05 would be the right thing if we're using it as an image scaler + we could test the scap deploy of 3d2png that way which is how we went down this dark path :) [16:08:47] * thcipriani does that [16:09:40] for what it is worth we also have deployment-imagescaler01.deployment-prep.eqiad.wmflabs [16:10:25] right, which is used for thumbor [16:10:52] maybe it was repurposed to just have thumbor, which might explain the cirrent situation. I can't remember, I believe godog set that up [16:11:29] alright, applied to role, running puppet on 05 now [16:11:40] good luck! ;-} [16:12:15] I don't remember reimaging / repurposing imagescaler01, but it is possible! [16:12:40] IOW thumbor was added but mediawiki was left in place IIRC [16:12:55] cool. so now deployment-mediawiki05 has role::mediawiki::imagescaler [16:13:17] it still renders thumbnails fine [16:13:28] awesome [16:13:29] So now we should be fine with the current state of the patch, and whenever we can deploy it it'll be good [16:13:44] I just need to add something to mediawiki-config to tell it where to find the script [16:14:01] (this is made even hairier by the fact that I am literally leaving town on Saturday morning) [16:14:11] ok, I'm still not clear: we don't need it on deployment-imagescaler01 just on mediawiki05, correct? [16:14:29] put it on both, we'll need it for thumbor as well [16:14:36] I'll take care of that integration [16:15:05] ok, I think we need to add scap::target in two different parts of puppet so that scap is installed on both [16:15:22] but don't provision the imagescaler role on imagescaler01 :) if you need that into a role, I'll add it to the thumbor role [16:15:39] so in addition to having it in role::mediawiki::imagescaler we'll need it in the thumbor role..yeah :) [16:16:20] is that just role::thumbor::mediawiki ? [16:16:30] yeah [16:17:39] marktraceur: mind if I update your patch? [16:17:45] thcipriani: Not at all :) [16:17:50] * thcipriani does [16:18:31] what will be the full path of 3d2png on the target machines? [16:19:16] 10Continuous-Integration-Config: confusing Jenkins failure on phpcs warning from php-composer-test. - https://phabricator.wikimedia.org/T112159#3140828 (10Krinkle) 05Open>03declined [16:22:20] gilles: full path will be /srv/deployment/3d2png/deploy [16:22:29] thanks [16:22:45] well, that'll be the path to the base of the repo anyway [16:22:59] right, I'll figure it out from there [16:29:47] OK, so I have a clear path forward now...assuming the pre-built stuff in node_modules works as expected [16:30:10] (03PS1) 10Krinkle: qunit: Remove obsolete 'tac|tac' hack [integration/config] - 10https://gerrit.wikimedia.org/r/345392 (https://phabricator.wikimedia.org/T153597) [16:30:28] I guess my only question is for greg-g - can we deploy something to beta tomorrow even though I'm going off the grid next week? 
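Once scap::target is in both roles, one quick way to confirm the deploy actually landed is to test for the path gilles asked about on each target host. A rough sketch using plain ssh via subprocess; the host names and the /srv/deployment/3d2png/deploy path are the ones named in the conversation above, and the assumption is that whoever runs it has non-interactive ssh access to the instances:

```python
import subprocess

# Targets and path discussed above; adjust to whatever ends up in the scap config.
TARGETS = [
    'deployment-mediawiki05.deployment-prep.eqiad.wmflabs',
    'deployment-imagescaler01.deployment-prep.eqiad.wmflabs',
]
DEPLOY_PATH = '/srv/deployment/3d2png/deploy'

def check(host, path=DEPLOY_PATH):
    """Return True if `path` exists as a directory on `host` (BatchMode avoids password prompts)."""
    result = subprocess.run(
        ['ssh', '-o', 'BatchMode=yes', host, 'test', '-d', path],
        capture_output=True,
    )
    return result.returncode == 0

if __name__ == '__main__':
    for host in TARGETS:
        print(f'{host}: {"present" if check(host) else "MISSING"} {DEPLOY_PATH}')
```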
[16:36:43] marktraceur: I think this is what we need https://gerrit.wikimedia.org/r/#/c/345377/ [16:47:53] 10Continuous-Integration-Config, 10Pywikibot-core, 13Patch-For-Review: Jenkins output for pywikibot job is hard to read - https://phabricator.wikimedia.org/T117570#3141001 (10Krinkle) [16:51:30] greg-g: So, I hear a rumor that we're going to have deployment restrictions in late April due to https://wikitech.wikimedia.org/wiki/Switch_Datacenter , do you have any more info about that? [16:52:33] Never mind, it was literally just announced to our department list [16:54:18] 10Continuous-Integration-Config, 10Fundraising-Backlog, 13Patch-For-Review: symfony-polyfill54 is breaking CI - https://phabricator.wikimedia.org/T143598#2573044 (10Krinkle) Looks like this is still failing on recent commits, . [16:55:11] Hi, there's an issue with https://gerrit.wikimedia.org/r/#/c/345266/ and we don't know what exactly goes wrong [16:57:36] Volker_E: InvalidArgumentException from line 79 of /home/jenkins/workspace/mediawiki-extensions-qunit-jessie/src/includes/resourceloader/ResourceLoaderImage.php: File type for different image files of 'help' not the same [16:58:15] Volker_E: https://paste.fedoraproject.org/paste/FotKA3WaPoNSOjcyI4ovGl5M1UNdIGYhyRLivL9gydE=/raw is the full exception [16:58:24] 10Continuous-Integration-Config, 10Librarization: Run a phpunit coverage job pre-merge for libraries - https://phabricator.wikimedia.org/T147093#3141042 (10Krinkle) This task is just about changing phpunit to run with `--coverage` for smaller repos like libraries. Similar to how we run JSDuck and Doxygen on va... [16:59:53] Volker_E: That error usually means that the file for that image doesn't exist at all [17:00:59] Project mediawiki-core-code-coverage-jessie build #3: 15ABORTED in 38 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-jessie/3/ [17:11:55] legoktm: RoanKattouw I don't see an issue, help icon is there in both directions… [17:12:20] Volker_E: I can help you debug after thi smeeting [17:12:24] idk what the error means, I'm just telling you what it is ;P [17:13:04] RoanKattouw: I'd appreciate it [17:13:08] legoktm: ;) [17:33:51] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10OOjs-UI, 13Patch-For-Review: Speed up oojs/ui Jenkins jobs - https://phabricator.wikimedia.org/T155483#3141134 (10Prtksxna) \o/ @hashar Is it possible to have a number on how many seconds/minutes we are savi... [17:34:32] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10OOjs-UI (OOjs-UI-0.20.1), 13Patch-For-Review: Speed up oojs/ui Jenkins jobs - https://phabricator.wikimedia.org/T155483#3141137 (10Prtksxna) [17:34:40] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10OOjs-UI (OOjs-UI-0.20.1): Speed up oojs/ui Jenkins jobs - https://phabricator.wikimedia.org/T155483#2944459 (10Prtksxna) [17:36:51] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3139285 (10thcipriani) > * on the deployment-prep puppetmaster, configure a 'staging' environment for puppet, with its own si... 
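The InvalidArgumentException quoted above ("File type for different image files of 'help' not the same", from ResourceLoaderImage.php) fires when the files registered for a single icon do not all share one file type; the thread below tracks it down to a deprecated 'help' alias with language/direction variants. As a rough illustration of the invariant being enforced (not the actual PHP implementation), a small checker over an icon-name → variant-files mapping:

```python
import pathlib

def mismatched_icons(icons):
    """Given {icon_name: [file paths for its ltr/rtl/language variants]},
    return the icons whose variants do not all share one file extension --
    the condition ResourceLoaderImage rejects."""
    bad = {}
    for name, files in icons.items():
        exts = {pathlib.PurePath(f).suffix.lower() for f in files}
        if len(exts) > 1:
            bad[name] = sorted(exts)
    return bad

if __name__ == '__main__':
    # Hypothetical data shaped like the 'help' icon under discussion.
    icons = {
        'help': ['images/icons/help-ltr.svg', 'images/icons/help-rtl.svg'],
        'broken': ['images/icons/broken-ltr.svg', 'images/icons/broken-rtl.png'],
    }
    print(mismatched_icons(icons))   # -> {'broken': ['.png', '.svg']}
```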
[17:45:43] 10Browser-Tests-Infrastructure, 10MediaWiki-General-or-Unknown, 07JavaScript, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3141177 (10zeljkofilipin) [17:45:45] 10Browser-Tests-Infrastructure, 07JavaScript, 13Patch-For-Review, 15User-zeljkofilipin: Write documentation on Selenium tests in Node.js - https://phabricator.wikimedia.org/T161103#3141175 (10zeljkofilipin) 05Open>03Resolved The documentation is good enough™. [17:47:30] 10Browser-Tests-Infrastructure, 10MediaWiki-General-or-Unknown, 07JavaScript, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#2441243 (10zeljkofilipin) [17:48:06] 10Browser-Tests-Infrastructure, 10MediaWiki-General-or-Unknown, 07JavaScript, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 4 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3141197 (10zeljkofilipin) [17:54:13] 10Continuous-Integration-Config, 10Fundraising-Backlog, 13Patch-For-Review: symfony-polyfill54 is breaking CI - https://phabricator.wikimedia.org/T143598#3141208 (10awight) @Krinkle Thanks for pointing out that the workaround no longer works :-) Looks like the php5lint job should be disabled for that repo. [17:55:02] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3141211 (10bd808) >>! In T161675#3141154, @thcipriani wrote: >>>! In T161675#3140664, @bd808 wrote: >> I think I would sugges... [18:02:56] 10Continuous-Integration-Config: Castor: mediawiki-core-qunit-jessie node_modules cache ineffective - https://phabricator.wikimedia.org/T159591#3141254 (10Prtksxna) >>! In T159591#3139591, @hashar wrote: > I am willing to try caching `node_modules`, as a trial on the few repositories that have long install time... [18:16:45] 10Beta-Cluster-Infrastructure, 06Release-Engineering-Team, 06Labs, 07Puppet, 15User-Joe: Re-think puppet management for deployment-prep - https://phabricator.wikimedia.org/T161675#3141291 (10Joe) >>! In T161675#3141154, @thcipriani wrote: > Will this repo //just// be a different `site.pp` for beta node d... [18:22:20] RoanKattouw: when do you think you've got time? 12? [18:22:53] Volker_E: Now-ish [18:24:43] RoanKattouw: so the help icon is def there [18:24:49] both -ltr/-rtl [18:24:50] 10Continuous-Integration-Config: Castor: mediawiki-core-qunit-jessie node_modules cache ineffective - https://phabricator.wikimedia.org/T159591#3141306 (10Krinkle) >>! In T159591#3141254, @Prtksxna wrote: >>! From the **description**: >> • Draft a change that changes `rm -rf node_modules; npm install` to `npm pr... [18:24:52] right [18:24:57] Downloading your patch now [18:25:17] could the error point to the following icon? [18:25:17] Moriel saw this error locally too and it confused us [18:25:21] Maybe [18:25:23] what [18:25:25] Looking locally now [18:25:27] interesting [18:25:35] We just thought it was her doing weird things with her OOUI build [18:25:43] Because she was doing somewhat strange things [18:26:57] OK I can repro [18:28:12] It comes from the oojs-ui.styles.icons module [18:28:20] I patched the exception to give me the module name, I'll upstream that later [18:29:54] Volker_E: So, where is help-rtl.svg ? 
[18:30:14] Oh, it's there [18:30:15] Never mind [18:30:46] in both apex/mediawiki [18:30:58] images/icons/help* [18:31:33] Well, here we go [18:31:37] https://www.irccloud.com/pastebin/9k8Vwz7T/ [18:31:42] I blame James_F [18:32:45] oh [18:32:47] Although he claims that the deprecated strings are older [18:33:02] Anyway, I'll patch this code so it ignores them [18:33:30] RainbowSprinkles hi, looks like they are going with soy as the way to creating index.html https://bugs.chromium.org/p/gerrit/issues/detail?id=5845 [18:33:31] This is probably the first deprecated icon that has language variants [18:33:40] But they said in the process of doing that they will fix the prefixed issue at the same time :) [18:34:55] RoanKattouw: yeah [18:44:01] Volker_E: OK, patch inbound [18:50:46] RoanKattouw: ah yes https://gerrit.wikimedia.org/r/#/c/345407/1 – thanks! [18:51:24] Sorry was about to ping that link to you [18:51:35] If you rebase your OOUI update commit on top of that, it should stop failing [18:52:25] RoanKattouw: I haven't updated the calendar yet nor sent an email (Faidon and I were just confirming the plan yesterday over email). tl;dr: no deploys the weeks of the switches, but the in-between week is normal. [18:52:35] * greg-g is on a bus to SFO [18:55:21] greg-g: OK cool, I did just get an announcement internally from Whatamidoing about it, who said you'd post an announcement to wikitech-l later [18:55:58] heh, OK, but yes [19:06:42] Project beta-scap-eqiad build #148585: 04FAILURE in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/148585/ [19:08:45] 10Continuous-Integration-Config: Castor: mediawiki-core-qunit-jessie node_modules cache ineffective - https://phabricator.wikimedia.org/T159591#3141402 (10Prtksxna) @hashar I am curious to know if such a change could be implemented on per-repository basis. [19:10:44] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T161733#3141403 (10greg) [19:11:01] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T160552#3103176 (10greg) [19:12:15] Project beta-scap-eqiad build #148586: 04STILL FAILING in 1 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/148586/ [19:13:01] RoanKattouw: you saw it here first: https://wikitech.wikimedia.org/wiki/Deployments#Week_of_April_17th ;) [19:16:48] Project beta-scap-eqiad build #148587: 04STILL FAILING in 1 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/148587/ [19:20:02] Yippee, build fixed! [19:20:03] Project mediawiki-core-code-coverage-jessie build #4: 09FIXED in 2 hr 18 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-jessie/4/ [19:26:18] 10Continuous-Integration-Config, 10MediaWiki-General-or-Unknown, 10MediaWiki-Unit-tests, 07Tracking: Let ApiDocumentationTest structure test pass on all repos - https://phabricator.wikimedia.org/T154838#3141522 (10Umherirrender) [19:26:39] Project beta-scap-eqiad build #148588: 04STILL FAILING in 1 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/148588/ [19:27:57] hrm, logstash broken in beta? scap thinks so... [19:28:49] deployment-logstash2.deployment-prep.eqiad.wmflabs [10.68.16.147] 9200 (?) : Connection refused [19:29:10] Unable to connect to Elasticsearch at http://localhost:9200. 
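The beta-scap failures above bottom out in a plain TCP "Connection refused" on port 9200 while the deployment-logstash2 Elasticsearch was being rebuilt. A minimal connectivity probe of the kind scap's canary check is reporting; the host names are the ones in the log and only the standard library is used:

```python
import socket

HOSTS = [
    ('deployment-logstash2.deployment-prep.eqiad.wmflabs', 9200),
    ('localhost', 9200),
]

def probe(host, port, timeout=3.0):
    """Return None if a TCP connection succeeds, else the error string (e.g. 'Connection refused')."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return str(exc)

if __name__ == '__main__':
    for host, port in HOSTS:
        err = probe(host, port)
        print(f'{host}:{port} -> {"ok" if err is None else err}')
```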
[19:32:09] thcipriani: i'm in the middle of deploying a new elasticsearch+kibana to it, but it's now going right [19:32:10] ebernhardson sent an email about elastic search being upgraded [19:32:14] this is why we do it first in labs :) [19:32:21] speaking of the genius... :] [19:32:40] s/now going/not going/ [19:33:00] ebernhardson: ah, ok, /me stops flailing about on beta :) [19:33:52] speaking of beta Giuseppe had to fight with the multiple hiera levels being applied on labs [19:34:09] and filled a nice proposal today to eventually overhaul puppet config / hiera [19:34:16] https://phabricator.wikimedia.org/T161675 "Re-think puppet management for deployment-prep" [19:36:38] Project beta-scap-eqiad build #148589: 04STILL FAILING in 1 min 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/148589/ [19:37:34] yeah, it's a good proposal, putting time into it at the expense of some other project is where it gets tricky :) [19:38:00] which is hmm a familiar problem ! [19:40:43] hashar my bugfix for ios 10.3 and mac os 10.12.4 was merged in gerrit :) [19:40:51] \o/ [19:41:07] * hashar waits for Apple to release: iOS "Paladox" 10.4 [19:41:13] lol [19:41:43] hashar just to let you know they are going to fix the prefixed issue a different way. They are fixing it with https://bugs.chromium.org/p/gerrit/issues/detail?id=5845 :) [19:42:17] hmm Gerrit bug tracker moved under chromium? [19:42:22] No [19:42:29] It's just under the chromium domain [19:42:46] aah I guess they keep using Google Code internally so [19:42:52] yep [19:43:25] I belive there is a memory leak in gerrit. They are trying to debug it. [19:43:47] I have no idea if it's in 10.3 But it's deffitly in 2.14 but shows more signs of the bug in master. [19:44:10] 10.3 -> 2.13 [19:44:14] https://groups.google.com/forum/#!topic/repo-discuss/G-SSkoj9uV8 [19:45:01] hashar also gerrit got alot more securer yesturday. My patch for eddsa support was merged. I tested it locally on a mac using the eddsa key and worked. [19:45:09] will be supported in 2.14+ [19:47:10] Project beta-scap-eqiad build #148590: 04STILL FAILING in 1 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/148590/ [19:49:15] thcipriani: logstash should be back up now on beta [19:51:23] ebernhardson: hrm, I can hit it from deployment-tin, but I guess the response format changed [19:51:45] getting a key error for 'aggregations' here https://github.com/wikimedia/puppet/blob/production/modules/service/files/logstash_checker.py#L247 [19:51:53] paladox: that is the newish ssh algorithms right? Pretty sure we have a task for that [19:52:01] Yep [19:52:08] thcipriani: hmm, the response format was same for cirrus but we don't use aggregations. looking [19:52:58] thcipriani: should still be aggregations, just checked a test request [19:53:05] thcipriani: could you pastie the response somewhere? [19:53:34] * thcipriani digs [19:54:23] (03PS3) 10Hashar: (WIP) ensure all jobs in Zuul are in JJB [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) [19:56:21] ebernhardson: hrm. something about the query changed I guess https://gist.github.com/thcipriani/c2f455d809719de019635fae5be30dac [19:56:46] thcipriani: ahh, filtered has been deprecated since 2.0 and is removed in 5.0 [19:56:50] Yippee, build fixed! 
[19:56:50] Project beta-scap-eqiad build #148591: 09FIXED in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/148591/ [19:56:51] thcipriani: sec [19:57:15] ah, well that'll explain it https://github.com/wikimedia/puppet/blob/production/modules/service/files/logstash_checker.py#L191 [19:57:31] i'll update my puppet patch and re-cherry pick out to beta [19:57:58] !log added --force flag for scap in beta-scap-eqiad temporarily [19:58:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:58:52] 10Continuous-Integration-Config, 13Patch-For-Review: Write a test to ensure all jobs in Zuul are defined in JJB - https://phabricator.wikimedia.org/T103847#3141649 (10hashar) Test work. I used it to find out some bit rotting analytics jobs T97514 that were defined in Zuul but were gone from JJB. That will pre... [20:04:50] (03CR) 10Hashar: [C: 032] (WIP) ensure all jobs in Zuul are in JJB [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) (owner: 10Hashar) [20:05:52] 10Continuous-Integration-Config, 13Patch-For-Review: Write a test to ensure all jobs in Zuul are defined in JJB - https://phabricator.wikimedia.org/T103847#3141659 (10hashar) 05Open>03Resolved a:03hashar Can be run with: ``` tox -e py27 -- --logging-level=ERROR tests/test_integration.py ``` [20:06:23] (03Merged) 10jenkins-bot: (WIP) ensure all jobs in Zuul are in JJB [integration/config] - 10https://gerrit.wikimedia.org/r/345321 (https://phabricator.wikimedia.org/T103847) (owner: 10Hashar) [20:12:32] PROBLEM - Puppet run on deployment-logstash2 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [20:14:57] ^ me, free to ignore [20:21:09] RainbowSprinkles how do we find out more about a commit if they are giving out errors like this https://phabricator.wikimedia.org/T161206 one? [20:22:29] RECOVERY - Puppet run on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [20:24:00] The error is brining me to https://gerrit.googlesource.com/gerrit/+/refs/heads/stable-2.13/gerrit-server/src/main/java/com/google/gerrit/server/change/ChangeKindCacheImpl.java#331 [20:24:06] brining = bringing [20:44:16] (03CR) 10Hashar: [C: 031] "Yup looks good :)" [integration/config] - 10https://gerrit.wikimedia.org/r/345392 (https://phabricator.wikimedia.org/T153597) (owner: 10Krinkle) [20:50:01] thcipriani: is there something special about how scap runs the logstash_checker? I ran 'usr/local/bin/logstash_checker.py --service-name mediawiki --host deployment-mediawiki05 --delay 5 --logstash-host deployment-logstash2:9200' which checked fine, but a test scap still fails with the aggregation error so i must be hitting the wrong code path [20:51:25] hrm, --service-name mwdeploy may make a difference, that creates a different query [20:51:37] ahh, ok [20:51:40] ebernhardson Hi, im wondering would the elasticsearch 5.3.0 release be causing this error [20:51:41] {"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406} [20:51:45] in phabricator? [20:52:19] searching works fine. But phabricator checking the status of elasticsearch is failing. [20:52:34] ebernhardson: well done on Kibana blue/pink color scheme. I like it :] [20:52:42] paladox: no clue, we released 5.1.2 to production because 5.2 changes the tokenstream api from vectors to graphs, and we found some problems with it. 
Generally though elasticsearch doesn't accept anything form urlencoded, it just does json [20:52:51] but nothing really other than that: here's where scap does its business https://github.com/wikimedia/scap/blob/master/scap/tasks.py#L81 [20:53:09] oh [20:53:18] thcipriani: mwdeploy does trigger the error, thanks! [20:53:24] i found elasticsearch 5.2 worked. But upgrading to 5.3.0 causes that warnning. [20:53:27] * ebernhardson randomly guessed at the names [20:53:27] cool :) [20:53:47] paladox: elasticsearch is on a big kick about making everything fail faster, which is nice. Sometimes annoying though when you realize you were sending invalid things [20:53:58] oh [20:54:29] paladox: i would try leaving out content-type, or setting it to application/json [20:54:46] Oh, i doint think we send that. (not sure though) [20:55:23] here https://github.com/wikimedia/phabricator/blob/b800d11cead2ee6f0d9d2f166199c2cd904c85ad/src/applications/search/fulltextstorage/PhabricatorElasticFulltextStorageEngine.php#L310 [20:58:34] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T160550#3142007 (10thcipriani) [20:58:49] paladox: you'll have to dig into HTTPSFuture, it does it somewhere [20:59:09] oh [21:02:37] there was this update https://github.com/wikimedia/phabricator/commit/e41c25de5050d69b720424dadbe3d8680362ceaf [21:04:36] * ebernhardson desparately wants to re-format the queries in logstash_checker.py to be normal and sane :P [21:09:29] ah [21:09:49] twentyafterfour generating index through bin/search init fails with this error [21:10:20] twentyafterfour ebernhardson https://phabricator.wikimedia.org/P5158 [21:10:40] ebernhardson: if you have any query protips please share, these queries were me smashing my brain into docs for a handful of hours. :) [21:10:46] i think it may be a bug introduced in latest update to https://github.com/wikimedia/phabricator/commit/e41c25de5050d69b720424dadbe3d8680362ceaf [21:10:54] paladox: looking [21:11:09] thanks [21:11:29] it's only with 5.3? we use 5.2 in prod though? [21:12:23] I can revert back to 5.2 to see if it's with 5.3 [21:14:17] twentyafterfour the bug is with 5.3 [21:14:26] 5.2 works [21:14:41] thcipriani: :) i'm cleaning them up, and making things more explicit by dropping the query_string [21:14:50] query_string is just a parser on the elasticsearch side that generates the other queries [21:15:41] This https://github.com/elastic/elasticsearch/pull/21964 may have been the cause (1% chance) [21:20:34] paladox: you're looking for https://github.com/elastic/elasticsearch/pull/22691 [21:20:49] off by default in 5.x though, so perhaps somehow it got turned on in 5.3? [21:21:09] ah thanks [21:21:10] hmm [21:21:48] 10Continuous-Integration-Config, 10Fundraising-Backlog, 13Patch-For-Review: symfony-polyfill54 is breaking CI on wikimedia/fundraising/crm/vendor - https://phabricator.wikimedia.org/T143598#3142071 (10hashar) [21:23:13] ebernhardson i think they made it the default in elasticsearch [21:23:35] doing http.content_type.required: false in the elasticsearch config file fixes it. [21:25:58] 10Continuous-Integration-Config, 10Fundraising-Backlog, 13Patch-For-Review: symfony-polyfill54 is breaking CI on wikimedia/fundraising/crm/vendor - https://phabricator.wikimedia.org/T143598#3142093 (10hashar) That looks like the same problem we had on mediawiki/vendor . composer 1.1 generates a file meant fo... 
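Two Elasticsearch 5.x migration gotchas surface in this stretch of the log: the top-level `filtered` query that logstash_checker.py was still sending is gone in 5.0 (it becomes a `bool` query with a `filter` clause), and a node with strict content-type checking enabled rejects bodies not declared as application/json (the 406 paladox hit). A sketch of a request that avoids both; the URL, index pattern and query fields are illustrative, not the checker's actual query:

```python
import json
from urllib import request

# Host from the log; adjust as needed.
ES_URL = 'http://deployment-logstash2.deployment-prep.eqiad.wmflabs:9200'

# ES 5.x form: what used to be {"filtered": {"query": ..., "filter": ...}}
# becomes a bool query with must/filter clauses.
query = {
    'size': 0,
    'query': {
        'bool': {
            'must': {'query_string': {'query': 'type:mediawiki'}},          # illustrative
            'filter': {'range': {'@timestamp': {'gte': 'now-10m'}}},
        }
    },
    'aggs': {'levels': {'terms': {'field': 'level'}}},                       # illustrative field
}

req = request.Request(
    f'{ES_URL}/logstash-*/_search',
    data=json.dumps(query).encode('utf-8'),
    # Declaring JSON explicitly keeps strict content-type checking happy.
    headers={'Content-Type': 'application/json'},
)
with request.urlopen(req, timeout=10) as resp:
    body = json.load(resp)

# Guard the lookup that raised KeyError('aggregations') in the checker.
buckets = body.get('aggregations', {}).get('levels', {}).get('buckets', [])
for bucket in buckets:
    print(bucket['key'], bucket['doc_count'])
```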
[21:25:59] ebernhardson anyways the config will be on by default in 6.x [21:29:57] thcipriani: decided it was simpler to leave query_string in than muck with it much. https://gerrit.wikimedia.org/r/#/c/344965/15/modules/service/files/logstash_checker.py [21:30:09] thcipriani: anything else you might know of that does direct queries to logstash clusteR? :) [21:30:30] ebernhardson https://github.com/elastic/elasticsearch/blob/ee802ad63c0f21d697a5095dd05dc6f94626ee4d/core/src/main/java/org/elasticsearch/http/HttpTransportSettings.java#L73 [21:30:41] looks like they made it true for rolling upgrades. [21:30:59] paladox: generally, i think the right approach is just going to be sending appropriate content types from phabricator [21:31:10] yep [21:31:24] ebernhardson: awesome :) hrm, there may be some kind of nagios alert that queries directly... [21:31:26] * thcipriani digs [21:32:45] ebernhardson: ah nevermind, just looks at graphite stuff. Thanks all I know of that looks at logstash. Thanks for your help! [21:33:01] er, That's all I know of, rather :) [21:33:20] thcipriani: great! [22:39:21] thcipriani: Hey, I was looking at T161737 to see if it's our fault (or if I can fix it either way), but the logstash URI 502s for me. [22:39:22] T161737: Catchable fatal error: Argument 1 passed to Title::equals() must be an instance of Title, null given - https://phabricator.wikimedia.org/T161737 [22:40:43] James_F: hrm, I've noticed that happen recently, not sure what the deal is. Lemme see if I can get a better linke [22:40:44] *link [22:42:53] ah that one's better https://logstash.wikimedia.org/goto/36a617c3c724a50b5fc2c20de5e4785c [22:42:56] * thcipriani updates task [22:52:49] thcipriani: Any stack traces? [22:53:06] thcipriani: 'Cos at this point it's a bit challenging. [22:53:40] indeed. Unfortunately there is not a lot of information afaict in the hhvm output. [22:56:34] Yeah. :-( [23:06:01] 10Browser-Tests-Infrastructure, 06Reading-Web-Backlog, 07Jenkins, 07Ruby, 15User-zeljkofilipin: MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job - https://phabricator.wikimedia.org/T144912#3142462 (10phuedx) [23:14:55] thcipriani: could you just paste the stacktrace onto the bugs themselves in the future? that would make debugging a lot easier if I didn't need to debug logstash + my browser first -.- [23:15:29] legoktm: fair enough, yeah, I can do that. [23:16:07] I still haven't figured out why, but noscript really doesn't like logstash [23:17:25] It's also better for longterm searching/prosperity [23:19:03] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T160550#3142484 (10thcipriani) [23:20:54] 10Continuous-Integration-Config, 06Release-Engineering-Team, 13Patch-For-Review: Switch MediaWiki coverage job from Trusty/Zend PHP 5.5 to Jessie/Zend PHP 7.0 - https://phabricator.wikimedia.org/T147778#3142493 (10Krinkle) >>! In T147778#3111710, @Krinkle wrote: > I've got a build in progress at (03PS1) 10Krinkle: Move mw coverage from Trusty (PHP 5.5) to Jessie (PHP 5.6) [integration/config] - 10https://gerrit.wikimedia.org/r/345475 (https://phabricator.wikimedia.org/T147778) [23:26:48] I used to have a problem with noscript and logstash for those shorturls. [23:27:16] can't remember how I fixed it, but I'm sure I'm less secure for having done so. [23:28:05] (03CR) 10Krinkle: [C: 032] "Pushed new version of 'mediawiki-core-code-coverage' to Jenkins." 
[integration/config] - 10https://gerrit.wikimedia.org/r/345475 (https://phabricator.wikimedia.org/T147778) (owner: 10Krinkle) [23:29:28] (03Merged) 10jenkins-bot: Move mw coverage from Trusty (PHP 5.5) to Jessie (PHP 5.6) [integration/config] - 10https://gerrit.wikimedia.org/r/345475 (https://phabricator.wikimedia.org/T147778) (owner: 10Krinkle) [23:34:46] 10Continuous-Integration-Config, 06Release-Engineering-Team, 13Patch-For-Review: Switch MediaWiki coverage job from Trusty/Zend PHP 5.5 to Jessie/Zend PHP 7.0 - https://phabricator.wikimedia.org/T147778#3142517 (10Krinkle) * Before: //Generated by PHP_CodeCo... [23:39:50] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<50.00%) [23:51:18] !log Free up space on integration-slave-jessie-1001 by removing old /srv/jenkins-workspace and /srv/pbuilder dirs [23:51:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [23:56:04] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T160550#3142569 (10Jdforrester-WMF) [23:56:43] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T160550#3103136 (10Jdforrester-WMF) Removed T161735 as it's now fixed in wmf.18 so doesn't block the train (but isn't Resolved yet).
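The "Free space - all mounts" alerts in this log are driven by a per-mount percent-free metric (e.g. _srv.byte_percentfree dropping below the threshold on integration-slave-jessie-1001). A small local equivalent using only the standard library; the mount list and the threshold style are taken from the alerts above and would differ per host:

```python
import shutil

MOUNTS = ['/srv', '/mnt', '/']      # mounts named in the alerts above
THRESHOLD_PCT = 10.0                # alert when percent free drops below this

def percent_free(mount):
    """Percent of bytes free on the filesystem holding `mount`."""
    usage = shutil.disk_usage(mount)
    return 100.0 * usage.free / usage.total

if __name__ == '__main__':
    for mount in MOUNTS:
        try:
            free = percent_free(mount)
        except OSError as exc:
            print(f'{mount}: no valid datapoints ({exc})')
            continue
        state = 'CRITICAL' if free < THRESHOLD_PCT else 'OK'
        print(f'{state}: {mount} {free:.2f}% free')
```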