[00:08:32] 10Browser-Tests: [Spike] Decouple MW-Selenium from Cucumber - https://phabricator.wikimedia.org/T108273#1516828 (10dduvall) 3NEW a:3dduvall [00:08:56] 10Browser-Tests: [Spike] Decouple MW-Selenium from Cucumber - https://phabricator.wikimedia.org/T108273#1516836 (10dduvall) [00:09:44] 10Browser-Tests: [Spike] Decouple MW-Selenium from Cucumber - https://phabricator.wikimedia.org/T108273#1516828 (10dduvall) [00:11:20] marxarelli: hah, you fixed the typo before I could [00:11:59] greg-g: wataaaah! [00:12:27] so quick those reflexes [00:12:28] (Bruce Lee's favorite beverage) [00:12:38] *groan* [00:13:01] i think you mean *zing!* [00:13:34] :D [00:18:29] 10Deployment-Systems, 10RESTBase, 6Services, 6operations, 5Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1516845 (10thcipriani) > I hope to be wrong about this, but my fear is that things that aren't explicitly c... [00:38:44] 10Deployment-Systems, 10RESTBase, 6Services, 6operations, 5Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1516868 (10GWicke) > a general way to have a puppet run be part of the deploy process As well as disablin... [01:06:39] Project browsertests-Wikidata-PerformanceTests-linux-firefox-sauce build #334: FAILURE in 38 sec: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-PerformanceTests-linux-firefox-sauce/334/ [01:15:04] 10Deployment-Systems, 6Release-Engineering, 6Services, 6operations: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1516944 (10GWicke) [03:18:45] Yippee, build fixed! 
[03:18:45] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #778: FIXED in 36 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/778/ [03:25:28] 6Release-Engineering, 10Continuous-Integration-Config, 3Mobile-App-Sprint-63-Android-Europium, 5Patch-For-Review, 3Wikipedia-Android-App: Create jenkins slave instance dedicated to Android runs - https://phabricator.wikimedia.org/T107336#1516983 (10thcipriani) I threw everything we had at it: 8 cores and... [04:16:30] 6Release-Engineering, 10Continuous-Integration-Config, 3Mobile-App-Sprint-63-Android-Europium, 5Patch-For-Review, 3Wikipedia-Android-App: Create jenkins slave instance dedicated to Android runs - https://phabricator.wikimedia.org/T107336#1517040 (10Niedzielski) @thcipriani, thanks for trying that out! It... [04:31:09] Yippee, build fixed! [04:31:09] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #523: FIXED in 39 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/523/ [05:24:29] 10Deployment-Systems, 10RESTBase, 6Services, 6operations, 5Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1517098 (10mmodell) >>! In T107532#1516868, @GWicke wrote: >> a general way to have a puppet run be part of... 
[06:31:40] 6Release-Engineering, 6Performance-Team, 6operations, 7Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1517180 (10mmodell) [07:51:12] (03CR) 10Spage: "Puzzled" (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/226542 (owner: 10Paladox) [08:45:44] 6Release-Engineering, 6Commons, 10MediaWiki-File-management, 10MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1517417 (10Tau) Which is the easiest way to restore the files `includes/HttpFunctions.php` and `includes/filerepo/Fo... [09:19:53] (03PS16) 10Paladox: Add jenkins test for BoilerPlate [integration/config] - 10https://gerrit.wikimedia.org/r/226680 [09:20:16] (03CR) 10Paladox: "Can somebody merge this all patches have been merged." [integration/config] - 10https://gerrit.wikimedia.org/r/226680 (owner: 10Paladox) [09:57:03] (03CR) 10Paladox: Update Blueprint tests (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/226542 (owner: 10Paladox) [09:57:08] (03PS6) 10Paladox: Update Blueprint tests [integration/config] - 10https://gerrit.wikimedia.org/r/226542 [10:28:19] 6Release-Engineering, 6operations: Try out hack ( 10Continuous-Integration-Infrastructure, 6Multimedia, 6operations, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1517731 (10brion) Patch https://gerrit.wikimedia.org/r/#/c/230078/ switches MW-Vagrant's TMH from... [12:32:31] 10Continuous-Integration-Infrastructure, 6Multimedia, 6operations, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1517796 (10brion) I'm also unable to produce VP9 WebM output with this ffmpeg build... TMH patch... 
[12:38:20] 10Continuous-Integration-Infrastructure, 6Multimedia, 6operations, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1517801 (10MoritzMuehlenhoff) I had a quick look at the PPA and the build embeds a local copy of l... [12:46:58] Yippee, build fixed! [12:46:59] Project browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce build #599: FIXED in 57 sec: https://integration.wikimedia.org/ci/job/browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce/599/ [12:54:20] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #557: FAILURE in 19 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/557/ [13:03:53] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #744: FAILURE in 31 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/744/ [13:33:38] 6Release-Engineering, 6Zero, 6operations, 7Mobile, 7Technical-Debt: Pull WikipediaMobileFirefoxOS from mediawiki-config - https://phabricator.wikimedia.org/T107172#1517904 (10akosiaris) I am removing the operations project, as I am not seeing anything the ops can help with/have to do. Fee free to readd [13:33:48] 6Release-Engineering, 6Zero, 7Mobile, 7Technical-Debt: Pull WikipediaMobileFirefoxOS from mediawiki-config - https://phabricator.wikimedia.org/T107172#1517905 (10akosiaris) [13:57:50] 10Beta-Cluster, 10Continuous-Integration-Infrastructure, 10MediaWiki-API, 7Pywikibot-tests: prevent modules with broken paraminfo being deployed to production wikis - https://phabricator.wikimedia.org/T108322#1518013 (10jayvdb) [13:58:32] Anyone around to deploy quick fix? 
[14:35:36] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #220: FAILURE in 7 min 35 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/220/ [15:18:41] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #141: FAILURE in 39 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/141/ [15:18:42] 6Release-Engineering, 6operations: [Spike] Try out hack ( 6Release-Engineering, 6operations: [Spike] Try out hack (>! In T91590#1517624, @Joe wrote: > Do we really want to vendor-lock us into HHVM? For what gain? > > Do we really, really have a compelling reason to use hack in me... [15:26:07] (03PS5) 10Paladox: Archive Mantle extension [integration/config] - 10https://gerrit.wikimedia.org/r/228877 [15:26:25] (03PS4) 10Paladox: Add npm to Translate extension [integration/config] - 10https://gerrit.wikimedia.org/r/228872 [15:27:08] (03PS13) 10Paladox: Allow skins to also be tested like extensions can [integration/config] - 10https://gerrit.wikimedia.org/r/228470 (https://phabricator.wikimedia.org/T107748) [15:30:33] (03PS4) 10Paladox: Fix tests by adding MobileFrontend as dependency to TimedMediaHandler [integration/config] - 10https://gerrit.wikimedia.org/r/228803 [15:36:15] 10Deployment-Systems, 10RESTBase, 6Services, 6operations, 5Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1518303 (10mmodell) >>! In T107532#1517098, @mmodell wrote: > One approach that I've been thinking about a... [15:37:27] thcipriani: around? [15:37:48] kart_: yup [15:38:07] thcipriani: Need deploy of https://gerrit.wikimedia.org/r/#/c/230101/ [15:38:10] :) [15:40:52] 10Deployment-Systems, 10RESTBase, 6Services, 6operations, 5Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? 
- https://phabricator.wikimedia.org/T107532#1518323 (10GWicke) @mmodell, would this get along with a running puppet agent for general system configurat... [15:41:19] thcipriani: possible to do? [15:41:39] kart_: yeah, just checking if 17 will auto update or I need a submodule bump [15:41:56] ok :) [15:42:15] If not, let me know. I'll submit patch for core. [15:42:19] 10Deployment-Systems, 10RESTBase, 6Services, 6operations, 5Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1518331 (10mmodell) @gwicke: yes it would, as long as the rules applied by `puppet apply` don't try to chan... [15:43:02] kart_: looks like it should auto bump the submodule, go ahead and merge that patch and I'll push it out. [15:43:21] twentyafterfour: also, how about puppet-private? [15:43:46] and, would this have access to prod hiera? [15:45:11] gwicke: It probably wouldn't have access to -private, hiera I'm not sure about, but we might be able to make that work by copying the hiera data to each node via a manifest on the puppet master [15:46:35] if that can be made nicely self-contained, easy to set up in labs, easier to test than prod puppet etc, then that could work for me [15:46:45] it's up against two lines per template, though ;) [15:47:06] gwicke: two lines per template? [15:47:43] I lied; it's actually 8: https://github.com/gwicke/ansible-deploy/blob/labs_support/roles/restbase/tasks/config.yml [15:47:57] can be made two though, when using the arg syntax [15:48:44] that is quite nice and concise [15:49:00] to me the --check --diff feature is the real killer [15:49:01] it [15:49:11] it basically provides the functionality of the puppet compiler [15:49:45] thcipriani: merging.. [15:49:47] shows you a diff of all the changes it *would* apply to a given node, without applying them [15:49:51] kart_: kk [15:49:56] gwicke: nice [15:50:42] thcipriani: merged. 
[15:50:52] twentyafterfour: in any case, I think we should get together and talk with ops, services & releng about how we are going to make config deploys less painful [15:51:03] and less dangerous [15:51:06] kart_: should be on tin momentarily, I'll sync it out when I see it. [15:51:08] yeah, indeed [15:51:59] gwicke: from my perspective the important part is that the config should live in the same repo as the code, so that you can have atomic changes to both config and code that depends on that config [15:52:22] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Allow skins to also be tested - https://phabricator.wikimedia.org/T107748#1518354 (10Florian) >>! In T107748#1502801, @Paladox wrote: > Ok done. ??? I think we misunderstood :) That was a question :P What tests should run for skins? Skins itself (at... [15:52:43] otherwise you have to coordinate commits in two places and everything is just twice as complex [15:54:18] whether the config is managed by ansible, puppet, or whatever, doesn't concern me as much, but ideally there would be a way to run the config management tasks from the deployment system instead of waiting for puppet and coordinating commits to the operations/puppet repo [15:54:21] twentyafterfour: well, the deployment of both should be driven by the same system [15:54:28] that doesn't necessarily need to be in the code repo [15:54:36] or rather, should probably not be [15:54:50] as it's common to deploy different versions to different environments [15:55:15] for example, we deploy hash X to production, but test hash Y in staging in preparation for the next deploy [15:55:28] gwicke: but the config is version specific right? 
or _can be_ version specific [15:55:51] yeah, can be [15:55:52] like if you add a new feature that requires a new config option [15:56:08] then the code and the config that includes the new option can be in sync easily [15:56:17] it's also environment-specific, though [15:56:36] the variables are definitely different in labs vs. staging vs. prod [15:57:13] right but couldn't you just have a config-$env.yaml, one file for each environment? [15:57:24] in a deployment system repository that has the variables and templates, you can commit all of those atomically [15:58:19] that will actually get harder with etcd [15:58:22] right but then you have to be sure that you deploy the right version of config for the corresponding version of code, and rollbacks have to roll back two places [15:58:48] no, code version and config change are done in the same repo, same commit [15:59:06] with etcd that will no longer be the case [15:59:17] how does etcd change things? [15:59:17] 15:57 < moritzm> headsup: gerrit restart in 5 minutes [15:59:31] twentyafterfour: a code roll-back won't roll back etcd [15:59:43] it's harder to atomically update a repo *and* etcd [15:59:49] yeah [16:00:06] not impossible by any means, just not as easy as 'git checkout xxx' [16:00:11] but the deployment tooling could write to etcd when deploying a config [16:00:26] with the config supplying the values for etcd [16:00:27] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Allow skins to also be tested - https://phabricator.wikimedia.org/T107748#1518358 (10Paladox) Running the mwext-testextension isent for unit tests it test the extension like all the code to make sure it works. This should do it on the skin too to make... 
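A rough sketch of the "repo is authoritative, the deploy tooling writes the values out" idea from the exchange above, with a dry-run step in the spirit of the --check --diff behaviour mentioned earlier. This is only an illustration under assumptions: the key-value store is modelled as a plain Python dict rather than a real etcd client, and the function names and keys (plan_changes, apply_plan, "service/port") are hypothetical.

```python
def plan_changes(desired, current):
    """Return what a deploy would change in the store, without changing it.

    `desired` is the config taken from the repo checkout; `current` is the
    store's present contents. Both are plain dicts in this sketch.
    """
    to_set = {
        key: value
        for key, value in desired.items()
        if key not in current or current[key] != value
    }
    to_delete = sorted(key for key in current if key not in desired)
    return {"set": to_set, "delete": to_delete}


def apply_plan(store, plan):
    """Apply a plan produced by plan_changes to the store (mutates it)."""
    for key, value in plan["set"].items():
        store[key] = value
    for key in plan["delete"]:
        del store[key]


# Hypothetical values, for illustration only.
repo_config = {"service/port": "8080", "service/workers": "8"}
store = {"service/port": "8080", "service/workers": "4", "service/old": "x"}

plan = plan_changes(repo_config, store)  # dry run: nothing is written yet
apply_plan(store, plan)                  # the store now follows the repo
```

Because the plan is computed from the repo checkout, rolling back the code and re-running the same two steps would also roll back the store, which is the property the discussion is after.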
[16:01:08] twentyafterfour: yeah, but then you don't really need etcd [16:01:14] as you have the info already [16:01:39] it's really about where the authoritative copy is [16:01:43] I'm not entirely clear on how etcd fits into the bigger picture [16:01:53] if the repo is the master for everything, then it shouldn't be too bad [16:01:57] etcd would just follow [16:02:31] etcd allows for quicker changes, if it's the authoritative copy, right? But it seems like a bad place for the authoritative info since it's not as persistent [16:03:13] yeah [16:03:27] it's nice for dynamic state [16:03:39] like maintaining the set of active nodes [16:03:53] fortunately, that's mostly orthogonal to deploys [16:04:24] I liked the idea of doing pull deployments, instead of push, which we discussed a while back in another channel ... and putting the version in etcd, then having the nodes take care of themself. But getting to that point seems like a large divergence from where we are currently headed [16:04:46] it's a pain in the a** to build orchestration around that [16:05:23] even something that's simple in a push system like hitting ctrl-c half-way through to terminate the deploy becomes really hard [16:05:36] yeah [16:06:20] but it would be a good way to deal with nodes coming back online after maintenance / downtime - they could run through a full deployment with the target version supplied by etcd [16:06:31] to achieve your 'eventually consistent' requirement [16:06:47] yeah, but they would need to do so in a coordinated manner [16:07:11] an alternative would be to simply run a cron job from the deploy host [16:07:33] ansible for example doesn't restart services if nothing changed [16:07:39] what needs to be coordinated? 
[16:08:08] rolling deploys, for example [16:08:38] as an example, we only want to take out one cassandra node at a time, as taking out several would lead to an outage [16:08:58] cirrus needs that too :) [16:09:08] or elasticsearch, actually [16:09:50] twentyafterfour: it's the same issue with puppet, which is why we can't use it to automatically restart services [16:12:00] it might be solvable in a pull model given enough effort, but you are getting into all the typical distributed-consensus-with-failures issues [16:12:51] definitely simpler to set up a cron job on a deploy host, and set up a lock file so that the cron job skips if a manual deploy is underway [16:23:51] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Allow skins to also be tested - https://phabricator.wikimedia.org/T107748#1518468 (10Florian) >>! In T107748#1518358, @Paladox wrote: > Running the mwext-testextension isent for unit tests it test the extension like all the code to make sure it works.... [16:25:46] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Allow skins to also be tested - https://phabricator.wikimedia.org/T107748#1518472 (10Paladox) Hum I am not sure then because I see extensions that doint even have test folder running the mwext-testextension test and they work. 
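The cron-job-plus-lock-file approach and the one-node-at-a-time constraint from the deploy discussion above could be sketched roughly as follows. This is a minimal sketch, not anyone's actual tooling: the function names, the lock path, and the deploy_one / is_healthy callbacks are all hypothetical, and it assumes a POSIX host where fcntl.flock is available.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock; return the fd or None."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)  # someone else (e.g. a manual deploy) holds the lock
        return None
    return fd


def unlock(fd):
    """Release the lock taken by try_lock."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def rolling_deploy(nodes, deploy_one, is_healthy):
    """Deploy to one node at a time; stop at the first unhealthy node.

    Returns the list of nodes that were deployed and came back healthy.
    """
    done = []
    for node in nodes:
        deploy_one(node)
        if not is_healthy(node):
            break  # never take out a second node while one is down
        done.append(node)
    return done


def cron_deploy(lock_path, nodes, deploy_one, is_healthy):
    """Entry point for the cron job: skip the run if a deploy is underway."""
    fd = try_lock(lock_path)
    if fd is None:
        return None  # skipped
    try:
        return rolling_deploy(nodes, deploy_one, is_healthy)
    finally:
        unlock(fd)
```

A manual deploy would take the same lock through try_lock before starting, so the periodic run returns None instead of overlapping with it.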
[16:31:54] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Allow skins to also be tested - https://phabricator.wikimedia.org/T107748#1518561 (10Paladox) See https://git.wikimedia.org/summary/mediawiki%2Fextensions%2Fexamples which isn't running any unit tests, yet it runs the mwext test [16:41:00] 5Continuous-Integration-Isolation, 6operations, 7Blocked-on-Operations: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1518585 (10Andrew) p:5Triage>3Normal [17:11:43] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Allow skins to also be tested - https://phabricator.wikimedia.org/T107748#1518692 (10Florian) Sure, the test group extensions runs some generic tests, mostly structure tests, see: https://github.com/wikimedia/mediawiki/blob/master/tests/phpunit/suite.... [17:37:51] 10Browser-Tests, 5Patch-For-Review, 3Reading-Web: Failed Jenkins job sets Sauce Labs job to passed - https://phabricator.wikimedia.org/T105589#1518855 (10Jdlrobson) This seems to be becoming more and more of a problem. I can't debug any of the failures quickly in: https://integration.wikimedia.org/ci/job/bro... 
[17:47:38] (03PS1) 10Dduvall: Decouple `Environment#test_name` [selenium] - 10https://gerrit.wikimedia.org/r/230139 (https://phabricator.wikimedia.org/T108273) [17:47:40] (03PS1) 10Dduvall: Decouple check for MediaWiki extension dependencies [selenium] - 10https://gerrit.wikimedia.org/r/230140 (https://phabricator.wikimedia.org/T108273) [17:52:51] !log Removed stale cherry-pick of Ib10deb5b4e42d440c5deff0897e714174f3e38fe that was breaking puppet rebase [17:52:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [17:55:40] !log logstash-beta.wmflabs.org working again; broken since Ib10deb5 was merged [17:55:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:01:57] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Allow skins to also be tested - https://phabricator.wikimedia.org/T107748#1518968 (10Paladox) Ok then skins should to test that all the code works. [18:04:39] so beta labs is down right? [18:05:53] beta cluster is loading fine for me: http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page [18:06:04] hmm, maybe not [18:06:21] loads for me [18:06:29] hit random [18:06:52] I got http://en.wikipedia.beta.wmflabs.org/wiki/0.1973438081580008 and it doesn't load [18:06:54] yeah no good [18:07:08] http://en.wikipedia.beta.wmflabs.org/wiki/UserMergetktzvm no load [18:07:31] jdlrobson: looks like a code breakage rather than a beta issue? [18:07:56] it's just the main page that seems to be okay [18:08:07] our browser tests just exploded so that's how i noticed. haven't had time to debug [18:08:33] 10Beta-Cluster: Incident: Beta Cluster not loading random pages - https://phabricator.wikimedia.org/T108356#1518989 (10greg) 3NEW [18:08:41] 10Beta-Cluster: Incident: Beta Cluster not loading random pages - https://phabricator.wikimedia.org/T108356#1518996 (10greg) p:5Triage>3Unbreak! 
[18:11:12] well fatal monitor seems to have exploded in beta [18:11:21] https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor [18:13:05] wtf [18:14:16] 10Beta-Cluster: Incident: Beta Cluster not loading random pages - https://phabricator.wikimedia.org/T108356#1519012 (10greg) Fatal monitor exploded in the last 15 minutes: https://logstash-beta.wmflabs.org/#dashboard/temp/AU8JXo9l3FJIeRcqruq0 [18:20:38] hmm lots of "Catchable fatal error: Argument 2 passed to Wikibase\Client\Hooks\ParserLimitHookHandlers::__construct()" https://gerrit.wikimedia.org/r/#/c/225474/ merged yesterday... [18:21:14] 6Release-Engineering, 10Continuous-Integration-Config, 3Mobile-App-Sprint-63-Android-Europium, 5Patch-For-Review, 3Wikipedia-Android-App: Create jenkins slave instance dedicated to Android runs - https://phabricator.wikimedia.org/T107336#1519054 (10Niedzielski) @thcipriani, any chance we keep those extra... [18:22:30] 10Beta-Cluster: Incident: Beta Cluster not loading random pages - https://phabricator.wikimedia.org/T108356#1519075 (10greg) ``` 18:09 < greg-g> bblack: dumb question, any caching things change recently that would cause this to happen: Beta Cluster main page loads fine, but any other rando... [18:23:54] thcipriani: worth a test revert? [18:24:53] so, FWIW, it's the mediawiki backend, mediawiki01 has the 500 error same as varnish, so I don't think it's varnish [18:26:00] this is also weird: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/64779/consoleFull [18:28:45] greg-g: test revert? [18:29:04] I mean, test the theory of cause by reverting that change? [18:29:20] but, you're in charge :) [18:29:38] weiiird [18:29:55] that seems like it should have been a failed run [18:30:12] right? 
[18:30:20] 18:25 < bblack> 18 FetchError c Junk after gzip data [18:30:20] 18:26 < bblack> ^ is kiilling pages on beta [18:30:25] 18:26 < bblack> I don't see anything cherrypicked to deployment that would cause that obviously, though [18:30:52] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #402: FAILURE in 51 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/402/ [18:39:21] so, mediawiki03 doesn't have any traffic, so doing curl -I -H 'host: en.wikipedia.beta.wmflabs.org' localhost/wiki/UserMergefrtiwn and then checking the error_log shows the only error is that wikibase one [18:42:50] 10Beta-Cluster, 10MediaWiki-extensions-WikibaseClient, 10Wikidata: Incident: Beta Cluster not loading random pages - https://phabricator.wikimedia.org/T108356#1519206 (10greg) ``` 18:20 <+thciprian> hmm lots of "Catchable fatal error: Argument 2 passed to Wikibase\Client\Hooks\ParserLimit... [18:44:10] thcipriani: I pinged hoo in #wikidata, he said he's looking now [18:44:36] kk, may not be the thing, but certainly yelling loudly [18:45:22] yah [18:46:01] 18:45 < hoo> ah crap [18:46:02] 18:45 < hoo> I so hate stub ojects [18:46:03] :) [18:46:27] thcipriani: that spam in the beta-scap-eqiad job output is my cherry-picked cross-dc replication patch. Harmless but noisy. 
I'll unpick it [18:50:28] !log updated scap: removed cherry-pick of I3d2b4e7 and updated to latest HEAD [18:50:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:54:02] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #403: FAILURE in 1 min 1 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/403/ [18:55:03] thcipriani, greg-g: scap spam gone now -- https://integration.wikimedia.org/ci/job/beta-scap-eqiad/64782/console [19:01:26] 10Beta-Cluster, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 5Patch-For-Review: Incident: Beta Cluster not loading random pages - https://phabricator.wikimedia.org/T108356#1519292 (10greg) a:3hoo [19:02:10] alright, assuming that'll pan out, I'm going to go find lunch [19:43:18] 10Beta-Cluster, 10MediaWiki-extensions-WikibaseClient, 10Wikidata: Incident: Beta Cluster not loading random pages - https://phabricator.wikimedia.org/T108356#1519373 (10hoo) 5Open>3Resolved [19:44:47] jdlrobson: in case you didn't see, beta cluster looks fixed now [20:07:47] 10Beta-Cluster: beta fatal.log getting flooded with nonsense warnings about mediawikiwiki - https://phabricator.wikimedia.org/T108375#1519477 (10Krenair) 3NEW [20:28:44] 10Beta-Cluster: beta fatal.log getting flooded with nonsense warnings about mediawikiwiki - https://phabricator.wikimedia.org/T108375#1519529 (10Krenair) `/etc/init.d/mw-job-runner stop` does not stop the errors, but `service jobrunner stop` does. [20:32:19] greg-g: thanks a bunch! [20:34:36] 10Beta-Cluster: beta fatal.log getting flooded with nonsense warnings about mediawikiwiki - https://phabricator.wikimedia.org/T108375#1519534 (10Krenair) I did make it dump `$argv` when it triggers that error though. `["runJobs.php","--wiki=mediawikiwiki","--type=CentralAuthCreateLocalAccountJob","--maxtime=60",... 
[20:35:09] 10Beta-Cluster, 10MediaWiki-JobRunner: beta fatal.log getting flooded with nonsense warnings about mediawikiwiki - https://phabricator.wikimedia.org/T108375#1519538 (10greg) >>! In T108375, @Krenair wrote: > We don't have a mediawikiwiki in beta... Do we need one? Or should the jobrunner/maint script be agnos... [20:36:19] I wonder if that ^ is from the OAuth patch that tries to autocreate accounts... [20:36:49] tgr: ^ [20:36:56] 10Beta-Cluster, 10MediaWiki-JobRunner: beta fatal.log getting flooded with nonsense warnings about mediawikiwiki - https://phabricator.wikimedia.org/T108375#1519549 (10Krenair) The argv I posted above is the standard precise jobrunner dispatch command pattern (see `modules/mediawiki/templates/jobrunner/dispatc... [20:37:22] tgr: Would OAuth in beta cluster try to make an account on mediawikiwiki for a new user? [20:37:51] wmf-config/CommonSettings.php: $wgCentralAuthAutoCreateWikis = array( 'loginwiki', 'metawiki', 'mediawikiwiki' ); ? [20:37:58] :) [20:38:12] well there you go [20:38:23] wmf-config/CommonSettings.php: $wgMWOAuthCentralWiki = 'mediawikiwiki'; [20:39:11] Let's see what happens if I make that remove mediawikiwiki on labs and try to kill off the existing jobs [20:39:15] A completely separate question would be why deployment-videoscaler01 is running account creation jobs [20:39:27] one thing at a time :p [20:40:29] haha [20:41:03] ostriches: no email to mediawiki-announce? [20:41:47] Yippee, build fixed! 
[20:41:47] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #221: FIXED in 9 min 9 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/221/ [20:45:10] krenair@deployment-bastion:/srv/mediawiki-staging$ redis-cli -a zomgsecretlabsredispasswdhere -h deployment-redis01 "keys" "*" | grep mediawikiwiki [20:45:10] mediawikiwiki:jobqueue:CentralAuthCreateLocalAccountJob:l-unclaimed [20:45:10] mediawikiwiki:jobqueue:CentralAuthCreateLocalAccountJob:h-data [20:47:27] bd808, deleted those, sync'd a live hack to only add mediawikiwiki if in prod, restarted jobrunner [20:47:28] no luck [20:48:19] Krenair: hmmm [20:49:20] I think ... there may be more places that jobrunner hides the list of wikis [20:49:51] I vaguely remember purging something like this in prod in January [20:49:58] let me look in phab [20:50:42] Krenair: https://phabricator.wikimedia.org/T87360#991136 [20:51:11] We purged something very horrible from GWToolset jobs on commons in production IIRC [20:53:31] krenair@deployment-bastion:/srv/mediawiki-staging$ redis-cli -a $password -h deployment-redis01 hgetall jobqueue:aggregator:h-ready-queues:v2 | grep mediawikiwiki [20:53:31] CentralAuthCreateLocalAccountJob/mediawikiwiki [20:54:26] ran "hdel jobqueue:aggregator:h-ready-queues:v2 CentralAuthCreateLocalAccountJob/mediawikiwiki" [20:54:46] bd808, looks good! [20:55:09] shiny! chrome! V-8! [20:55:30] what? [20:55:41] going to commit this config change and deploy it properly [20:55:45] http://madmax.wikia.com/wiki/War_Boys [21:03:47] 10Beta-Cluster, 10MediaWiki-JobRunner, 5Patch-For-Review: beta fatal.log getting flooded with nonsense warnings about mediawikiwiki - https://phabricator.wikimedia.org/T108375#1519644 (10Krenair) Also ran the following commands in redis: * del mediawikiwiki:jobqueue:CentralAuthCreateLocalAccountJob:l-unclaim... 
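The manual cleanup above (find the wiki's jobqueue keys, delete them, then remove the wiki's entry from the aggregator's ready-queues hash) could be scripted along these lines. This is a hedged sketch only: purge_wiki_jobs is a hypothetical helper, the client is assumed to behave like a redis-py client created with decode_responses=True, and the "<job type>/<wiki>" field layout is inferred from the single entry shown in the log. FakeRedis is an in-memory stand-in so the example runs without a server.

```python
import fnmatch

# Hash name as it appears in the log above.
READY_QUEUES = "jobqueue:aggregator:h-ready-queues:v2"


def purge_wiki_jobs(client, wiki, ready_hash=READY_QUEUES):
    """Delete <wiki>:jobqueue:* keys and the wiki's ready-queue fields.

    Assumption: `client` offers redis-py style scan_iter, delete, hkeys
    and hdel, and returns str (not bytes) keys.
    """
    keys = sorted(client.scan_iter(match=f"{wiki}:jobqueue:*"))
    if keys:
        client.delete(*keys)
    fields = sorted(
        field for field in client.hkeys(ready_hash)
        if field.endswith(f"/{wiki}")
    )
    if fields:
        client.hdel(ready_hash, *fields)
    return keys, fields


class FakeRedis:
    """In-memory stand-in for demonstration only; not a real Redis client."""

    def __init__(self, keys, hashes):
        self.store = dict(keys)
        self.hashes = {name: dict(h) for name, h in hashes.items()}

    def scan_iter(self, match="*"):
        return (k for k in list(self.store) if fnmatch.fnmatchcase(k, match))

    def delete(self, *names):
        for name in names:
            self.store.pop(name, None)

    def hkeys(self, name):
        return list(self.hashes.get(name, {}))

    def hdel(self, name, *fields):
        for field in fields:
            self.hashes.get(name, {}).pop(field, None)
```

Using scan_iter rather than "keys *" avoids blocking the server on a large keyspace; on a small labs instance either would do.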
[21:03:52] 10Beta-Cluster, 10MediaWiki-JobRunner, 5Patch-For-Review: beta fatal.log getting flooded with nonsense warnings about mediawikiwiki - https://phabricator.wikimedia.org/T108375#1519645 (10Krenair) 5Open>3Resolved a:3Krenair [21:21:09] (03PS2) 10Dduvall: Decouple `Environment#test_name` [selenium] - 10https://gerrit.wikimedia.org/r/230139 (https://phabricator.wikimedia.org/T108273) [21:21:11] (03PS2) 10Dduvall: Decouple check for MediaWiki extension dependencies [selenium] - 10https://gerrit.wikimedia.org/r/230140 (https://phabricator.wikimedia.org/T108273) [21:21:13] (03PS1) 10Dduvall: Decouple screenshot-ing and artifacts from Cucumber hooks [selenium] - 10https://gerrit.wikimedia.org/r/230230 (https://phabricator.wikimedia.org/T108273) [21:22:11] (03CR) 10jenkins-bot: [V: 04-1] Decouple check for MediaWiki extension dependencies [selenium] - 10https://gerrit.wikimedia.org/r/230140 (https://phabricator.wikimedia.org/T108273) (owner: 10Dduvall) [21:22:27] (03CR) 10jenkins-bot: [V: 04-1] Decouple `Environment#test_name` [selenium] - 10https://gerrit.wikimedia.org/r/230139 (https://phabricator.wikimedia.org/T108273) (owner: 10Dduvall) [21:22:33] (03CR) 10jenkins-bot: [V: 04-1] Decouple screenshot-ing and artifacts from Cucumber hooks [selenium] - 10https://gerrit.wikimedia.org/r/230230 (https://phabricator.wikimedia.org/T108273) (owner: 10Dduvall) [21:22:53] (03PS2) 10Dduvall: Decouple screenshot-ing and artifacts from Cucumber hooks [selenium] - 10https://gerrit.wikimedia.org/r/230230 (https://phabricator.wikimedia.org/T108273) [21:25:07] (03CR) 10jenkins-bot: [V: 04-1] Decouple screenshot-ing and artifacts from Cucumber hooks [selenium] - 10https://gerrit.wikimedia.org/r/230230 (https://phabricator.wikimedia.org/T108273) (owner: 10Dduvall) [21:33:35] greg-g: Magic secret re. story points: It doesn't matter. [21:34:42] greg-g: (As in, in my experience "number of Phabricator tasks" approximates to work. 
There are always tickets that are trivial and tickets that are major, but it normally doesn't skew much. [21:38:34] James_F: good point [21:39:47] greg-g: We actually automatically classify tasks we haven't estimated as '5's, which works out to about the average of all the ones we have, less a bit (because the bigger the work the more likely I'll have plonked some numbers on it). [21:46:20] (03PS1) 10Dduvall: Decouple session annotation from Cucumber [selenium] - 10https://gerrit.wikimedia.org/r/230234 (https://phabricator.wikimedia.org/T108273) [21:47:26] James_F: gotcha [21:47:39] greg-g: Worth replying as such on-list? [21:48:02] James_F: yeah, especially the "in the end, assume all tasks are equal isn't that far off" part [21:48:09] assuming* [21:48:26] Kk. [21:49:32] ok, my headache is too much now, I'm going to lay down for a bit [22:59:43] 10Beta-Cluster: fix nutcracker config in Beta - https://phabricator.wikimedia.org/T107538#1520086 (10thcipriani) 5Open>3Resolved a:3thcipriani Nutcracker was seemingly removed from deployment-bastion when it was removed from the `mediawiki::init` manifest and moved into the `role::mediawiki::common` manife... [23:06:17] 6Release-Engineering, 10Continuous-Integration-Config, 3Mobile-App-Sprint-63-Android-Europium, 5Patch-For-Review, 3Wikipedia-Android-App: Create jenkins slave instance dedicated to Android runs - https://phabricator.wikimedia.org/T107336#1520103 (10Niedzielski) @thcipriani, one more question. It looks li... 
[23:39:37] (03PS1) 10Niedzielski: Change apps-android-wikipedia-gradlew patterns [integration/config] - 10https://gerrit.wikimedia.org/r/230255 (https://phabricator.wikimedia.org/T107716) [23:43:50] (03PS2) 10Niedzielski: Change apps-android-wikipedia-gradlew patterns [integration/config] - 10https://gerrit.wikimedia.org/r/230255 (https://phabricator.wikimedia.org/T107716) [23:47:48] 10Beta-Cluster, 6Release-Engineering, 10Wikimedia-Logstash: Make logstash in beta public - https://phabricator.wikimedia.org/T76784#1520266 (10bd808) >>! In T76784#837010, @coren wrote: > Please remember to add the disclaimer from [[ https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use#If_my_tools_... [23:49:38] 10Deployment-Systems, 10RESTBase, 6Services, 6operations, 5Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1520271 (10GWicke) [23:51:02] 10Beta-Cluster, 6Release-Engineering, 10Wikimedia-Logstash: Make logstash in beta public - https://phabricator.wikimedia.org/T76784#1520273 (10bd808) @greg What do you think, can we flip this switch? I have all the bits in puppet now (at least in proposed patches) to let me change the vhost easily to no long... [23:55:05] (03PS1) 10Niedzielski: Update apps-android-wikipedia-gradlew-lint patterns [integration/config] - 10https://gerrit.wikimedia.org/r/230256 (https://phabricator.wikimedia.org/T99112)