[00:01:19] <RainbowSprinkles>	 I wonder how aawiki got broken to begin with
[00:01:23] <RainbowSprinkles>	 But, should be easily fixed now
[00:02:28] <wmf-insecte>	 Project beta-update-databases-eqiad build #17327: STILL FAILING in 4.4 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17327/
[00:03:00] <RainbowSprinkles>	 Ok, aawiki fixed, but arwiki busted now
[00:06:14] <RainbowSprinkles>	 RoanKattouw: I'm running update.php in foreachwiki
[00:06:25] <RainbowSprinkles>	 Surprisingly, there's a bunch of DB drift that's getting fixed/logged
[00:06:29] <RoanKattouw>	 That often fails in vagrant too
[00:06:41] <RoanKattouw>	 vagrant provision runs foreachwiki update.php and it often fails on one of my wikis
[00:07:01] <RainbowSprinkles>	 Ahhhhh, I know how things got confused.
[00:07:04] <RainbowSprinkles>	 Makes sense.
[00:07:23] <RainbowSprinkles>	 Ok: so the other day, we were running out of disk space on the beta databases. updatelogs were huge, for obvious reasons.
[00:07:27] <RainbowSprinkles>	 I truncated them
[00:07:40] <RainbowSprinkles>	 And so now running flow patches on *already correct* dbs breaks
[00:07:53] <RainbowSprinkles>	 That patch doesn't apply nicely if it's already applied, so something like a DROP ... IF EXISTS would be best
[00:19:28] <RainbowSprinkles>	 RoanKattouw: Ideally, schema updates should detect if they've been done and/or be written in ways that are forward-compat and can re-run on an up-to-date schema.
[00:19:38] <RainbowSprinkles>	 This combination of updatelog + bad sql writing is bad :(
[00:19:46] <RoanKattouw>	 Yeah :(
[00:20:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17328: STILL FAILING in 26 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17328/
[00:20:54] <RainbowSprinkles>	 Oh shut up jenkins
[00:20:55] <RainbowSprinkles>	 We know
[00:21:07] <RoanKattouw>	 So maybe modifyExtensionField() should be discouraged then
[00:21:22] <RoanKattouw>	 It doesn't let you specify enough information to have it self-detect whether the schema change has happened
[00:21:31] <RoanKattouw>	 That said, I suppose updates passed to it are supposed to be idempotent
[00:22:14] <RoanKattouw>	 So perhaps we should use DROP INDEX IF EXISTS
[00:22:20] <RoanKattouw>	 I checked and that is not a syntax error in SQLite
[00:24:00] <RainbowSprinkles>	 Yeah
[00:24:46] <RainbowSprinkles>	 The places we have nicer wrappers addTable(), dropTable(), etc etc etc
[00:24:57] <RainbowSprinkles>	 Are nice because we can wrap those in tableExists() checks and so forth
[00:26:03] <RoanKattouw>	 Here's the weird thing though, the file is still supposed to be idempotent
[00:26:10] <RoanKattouw>	 It drops the index but then creates the same index again
[00:26:16] <RoanKattouw>	 So something already went wrong if it doesn't exist
[00:26:26] <RainbowSprinkles>	 Yeah. Something got weird ~24h ago
[00:26:29] <RoanKattouw>	 But since reality doesn't appear to make sense, I'll add IF EXISTS anyway
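The re-runnable drop-then-create pattern discussed above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the actual Flow patch: the index and table names come from the discussion, while the column names col_a and col_b are hypothetical. It relies on SQLite accepting DROP INDEX IF EXISTS; other database engines may not.

```python
import sqlite3

# Hypothetical stand-in for the patch file: drop the index if present,
# then recreate it. The column names are made up for illustration.
PATCH = """
DROP INDEX IF EXISTS flow_ext_ref_idx_v2;
CREATE INDEX flow_ext_ref_idx_v2 ON flow_ext_ref (col_a, col_b);
"""


def apply_patch(conn):
    """Run the patch script; safe to call whether or not the index exists."""
    conn.executescript(PATCH)


def index_names(conn, table):
    """Return the sorted names of all indexes defined on `table`."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flow_ext_ref (col_a TEXT, col_b TEXT)")

apply_patch(conn)  # index missing: the DROP is a no-op, the CREATE succeeds
apply_patch(conn)  # index present: the DROP removes it, the CREATE restores it
print(index_names(conn, "flow_ext_ref"))  # ['flow_ext_ref_idx_v2']
```

Without the IF EXISTS clause, the first call would fail on a table that lacks the index, which matches the failure mode described in the conversation.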
[00:27:25] <RoanKattouw>	 OK I've edited Paladox's patch to add IF EXISTS and cleaned up the commit summary
[00:28:11] <RainbowSprinkles>	 flow_ext_ref_revision_v2
[00:28:16] <RainbowSprinkles>	 Is what exists on busted wikis
[00:28:25] <RainbowSprinkles>	 Not flow_ext_ref_idx_v2
[00:28:49] <RoanKattouw>	 Hmm
[00:29:21] <RainbowSprinkles>	 Should it have both? Neither? Who knows?!
[00:29:21] <RoanKattouw>	 That's a different schema change I think
[00:29:22] <RainbowSprinkles>	 :p
[00:29:32] <RainbowSprinkles>	 Super-alike names are confusing ;-)
[00:29:39] <RoanKattouw>	 Both, according to flow.sql
[00:30:16] <RainbowSprinkles>	 So, easy fix is IF EXISTS
[00:30:26] <RainbowSprinkles>	 But really: question as to how we got here? None of this changed recently.
[00:31:04] <RainbowSprinkles>	 This schema upgrade path for Flow is *very* fragile
[00:36:24] <wikibugs>	 Beta-Cluster-Infrastructure, Collaboration-Team-Triage, Flow, MediaWiki-Database, Patch-For-Review: Beta update.php fails with Can't DROP 'flow_ext_ref_idx_v2'; - https://phabricator.wikimedia.org/T166266#3290883 (Paladox) Fails at Warning: fopen(/tmp/mw-UIDGenerator-UID-88): failed to open...
[00:41:14] <wikibugs>	 Beta-Cluster-Infrastructure, Collaboration-Team-Triage, Flow, MediaWiki-Database, Patch-For-Review: Beta update.php fails with Can't DROP 'flow_ext_ref_idx_v2'; - https://phabricator.wikimedia.org/T166266#3290885 (demon) So that's unrelated, and I'm fixing it ^
[00:42:05] <wikibugs>	 Beta-Cluster-Infrastructure, Collaboration-Team-Triage, Flow, MediaWiki-Database, Patch-For-Review: Beta update.php fails with Can't DROP 'flow_ext_ref_idx_v2'; - https://phabricator.wikimedia.org/T166266#3290886 (Mattflaschen-WMF) >>! In T166266#3290883, @Paladox wrote: > Fails at >  > Warni...
[00:56:59] <RainbowSprinkles>	 enwiki hanging on FlowUpdateUserWiki :\
[00:59:01] <RainbowSprinkles>	 Doesn't print any kind of progress markers, heh
[00:59:12] <RoanKattouw>	 Yeah there have been various bugs in the Flow schema upgrade path
[00:59:51] <RoanKattouw>	 I think they're probably all fixed now, but I wouldn't be surprised if you found one, and also the past bugs may have done strange things to your schema that the current code can't recover from
[01:00:39] <RainbowSprinkles>	 Yeah, that's my figuring
[01:00:47] <RainbowSprinkles>	 Basically: I think what set this off was me pruning updatelog
[01:01:04] <RoanKattouw>	 Yeah
[01:01:06] <RainbowSprinkles>	 And then the patches tried reapplying themselves
[01:01:23] <RainbowSprinkles>	 Well, some of them do, because others properly detect that they're done
[01:01:36] <RoanKattouw>	 Over in our team channel, Matt expressed skepticism at the safety/wisdom of pruning updatelog 
[01:01:58] <RainbowSprinkles>	 Tbh, it's always been a hacky table
[01:02:02] <RainbowSprinkles>	 And shouldn't be *relied* on
[01:02:13] <RainbowSprinkles>	 It was mostly a "we can probably skip this update because we know we've done it"
[01:02:20] <RainbowSprinkles>	 But updates, as you said, should be idempotent.
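The interaction described above, where a log table lets an update be skipped and truncating that table makes a non-idempotent patch fail on re-run, can be sketched like this. It is a simplified model in Python with sqlite3, not MediaWiki's updater: the run_once helper, the table t, the index t_a, and the patch key are all hypothetical, and the updatelog table here is only loosely modeled on the one discussed.

```python
import sqlite3


def run_once(conn, key, sql):
    """Apply `sql` unless `key` is already recorded in updatelog."""
    seen = conn.execute(
        "SELECT 1 FROM updatelog WHERE ul_key = ?", (key,)
    ).fetchone()
    if seen:
        return "skipped"
    conn.executescript(sql)
    conn.execute("INSERT INTO updatelog (ul_key) VALUES (?)", (key,))
    return "applied"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE updatelog (ul_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE t (a TEXT)")

# A fragile patch: it errors out if the index already exists.
fragile = "CREATE INDEX t_a ON t (a);"

first = run_once(conn, "patch-t_a", fragile)   # nothing logged yet
second = run_once(conn, "patch-t_a", fragile)  # the log says it is done

conn.execute("DELETE FROM updatelog")          # simulate truncating the log
try:
    third = run_once(conn, "patch-t_a", fragile)
except sqlite3.OperationalError:
    third = "failed"                           # schema was already up to date

print(first, second, third)  # applied skipped failed
```

Writing the patch as CREATE INDEX IF NOT EXISTS would make the third run succeed, which is the sense in which the log should be an optimization and not something the updates rely on.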
[01:06:26] <RainbowSprinkles>	 I truncated because they were all about 1g each
[01:08:52] <RainbowSprinkles>	 Would be nice if FlowUpdateUserWiki printed progress of any sort
[01:08:58] <RainbowSprinkles>	 show processlist tells me it is
[01:09:04] <RainbowSprinkles>	 :)
[01:20:33] <wmf-insecte>	 Project beta-update-databases-eqiad build #17329: STILL FAILING in 32 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17329/
[02:20:19] <wmf-insecte>	 Project beta-update-databases-eqiad build #17330: STILL FAILING in 18 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17330/
[02:28:33] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<44.44%)
[02:43:42] <wmf-insecte>	 Project beta-scap-eqiad build #156770: FAILURE in 0.48 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/156770/
[02:46:53] <RainbowSprinkles>	 !log running `mwscript extensions/Flow/maintenance/FlowUpdateUserWiki.php --wiki=enwiki` in a screen on deployment-tin, probably going to take all night
[02:46:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[02:55:49] <wmf-insecte>	 Yippee, build fixed!
[02:55:50] <wmf-insecte>	 Project beta-scap-eqiad build #156771: FIXED in 2 min 6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/156771/
[03:20:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17331: STILL FAILING in 26 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17331/
[04:17:32] <wmf-insecte>	 Yippee, build fixed!
[04:17:32] <wmf-insecte>	 Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #402: FIXED in 21 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/402/
[04:20:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17332: STILL FAILING in 26 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17332/
[05:20:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17333: STILL FAILING in 26 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17333/
[06:20:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17334: STILL FAILING in 26 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17334/
[06:24:13] <wmf-insecte>	 Yippee, build fixed!
[06:24:14] <wmf-insecte>	 Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #371: FIXED in 1 hr 44 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/371/
[06:49:09] <wmf-insecte>	 Yippee, build fixed!
[06:49:09] <wmf-insecte>	 Project selenium-Wikibase » chrome,beta,Linux,BrowserTests build #371: FIXED in 2 hr 9 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/371/
[07:08:37] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:20:23] <wmf-insecte>	 Project beta-update-databases-eqiad build #17335: STILL FAILING in 22 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17335/
[08:20:27] <wmf-insecte>	 Project beta-update-databases-eqiad build #17336: STILL FAILING in 26 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17336/
[08:42:43] <wikibugs>	 Beta-Cluster-Infrastructure: Provide a version of frwiki on Beta Cluster / staging - https://phabricator.wikimedia.org/T166290#3291171 (Jdforrester-WMF)
[09:20:23] <wmf-insecte>	 Project beta-update-databases-eqiad build #17337: STILL FAILING in 22 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17337/
[10:20:22] <wmf-insecte>	 Project beta-update-databases-eqiad build #17338: STILL FAILING in 22 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17338/
[10:33:59] <elukey>	 !log manual install of hhvm_3.18.2+dfsg-1+wmf4+exp1_amd64.deb on jobrunner02 to test a fix for the Redis.php lib
[10:34:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:39:16] <elukey>	 now if everything works the TCP time waits of jobrunner02 should go down
[10:51:06] <elukey>	 don't see anything changed, will leave it running for a bit, ping me if needed
[11:20:25] <wmf-insecte>	 Project beta-update-databases-eqiad build #17339: STILL FAILING in 25 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17339/
[11:57:35] <prtksxna>	 Is there a way to undo a change that I've already landed? https://phabricator.wikimedia.org/D661
[11:57:45] <prtksxna>	 Should I re-open the revision?
[12:20:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17340: STILL FAILING in 25 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17340/
[12:57:56] <wikibugs>	 Release-Engineering-Team (Kanban), VisualEditor, User-Ryasmeen, User-zeljkofilipin: LanguageScreenshotBot trying to edit a non-existent page without signing in - https://phabricator.wikimedia.org/T162454#3291710 (Krinkle)
[13:10:11] <shinken-wm>	 PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:12:12] <shinken-wm>	 PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[13:15:34] <shinken-wm>	 PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[13:20:27] <wmf-insecte>	 Project beta-update-databases-eqiad build #17341: STILL FAILING in 27 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17341/
[13:31:39] <wikibugs>	 Gitblit-Deprecate, MW-1.30-release-notes (WMF-deploy-2017-05-23_(1.30.0-wmf.2)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3291795 (Krinkle)
[13:32:22] <shinken-wm>	 PROBLEM - Puppet errors on deployment-restbase02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:32:27] <shinken-wm>	 PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[13:34:15] <wikibugs>	 Browser-Tests-Infrastructure, Release-Engineering-Team (Backlog), MW-1.29-release-notes, Patch-For-Review, and 2 others: Update Ruby tests to Selenium 3 - https://phabricator.wikimedia.org/T158074#3291818 (Krinkle)
[13:52:37] <wikibugs>	 Browser-Tests-Infrastructure, Release-Engineering-Team, MediaWiki-General-or-Unknown, Epic, and 5 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3291879 (Krinkle)
[14:17:10] <wikibugs>	 Continuous-Integration-Config, Goal, I18n, Patch-For-Review: Configure banana checker for i18n files to run on all MediaWiki extensions and skins - https://phabricator.wikimedia.org/T94547#3292029 (Krinkle)
[14:20:22] <wmf-insecte>	 Project beta-update-databases-eqiad build #17342: STILL FAILING in 22 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17342/
[14:26:42] <Krinkle>	 7 tasks left at https://phabricator.wikimedia.org/tag/mw-1.29-release/
[14:29:37] <shinken-wm>	 PROBLEM - Host deployment-phab02 is DOWN: CRITICAL - Host Unreachable (10.68.19.232)
[14:41:37] <wikibugs>	 (Abandoned) Merlijn van Deen: Add fpm-based debian packaging [integration/composer] - https://gerrit.wikimedia.org/r/240451 (owner: Merlijn van Deen)
[14:41:41] <wikibugs>	 (CR) Thcipriani: quibble: test running the container (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/355560 (owner: Hashar)
[14:45:03] <elukey>	 Krinkle: o/ - any chance that you have 5 mins for a question about https://github.com/facebook/hhvm/issues/7854 ? (Redis and persistent conn)
[14:49:49] <elukey>	 what I'd like to verify is if https://gerrit.wikimedia.org/r/#/c/353247/ is sufficient to hit that code
[14:50:13] <elukey>	 I deployed the new hhvm on jobrunner02 
[15:00:25] <wikibugs>	 Release-Engineering-Team (Kanban), Scap: scap did not catch `Notice: Undefined variable: wmgRelatedArticlesShowInSidebar in /srv/mediawiki/wmf-config/CommonSettings.php on line 2893` - https://phabricator.wikimedia.org/T164754#3292167 (zeljkofilipin) >>! In T164754#3250270, @bd808 wrote: > This third att...
[15:08:23] <Krinkle>	 elukey: I don't know , you'd have to ask AaronSchulz 
[15:08:49] <Krinkle>	 It might, and probably should, but it's been a while and job runners have changed a fair bit since we last used this (RPC etc.)
[15:20:27] <wmf-insecte>	 Project beta-update-databases-eqiad build #17343: STILL FAILING in 27 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17343/
[15:22:03] <elukey>	 Krinkle: sure sure, Aaron probably hates me at this stage :) What I'd like to know is how many requests to Redis a jobrunner does at each run, I might just hack RunJobs.php and add some logs
[15:22:36] <elukey>	 wmf-config/jobqueue* config are much different in prod/labs
[15:22:44] <elukey>	 anyhow, thanks :)
[15:22:48] * Krinkle is also catching up on /,e 
[15:22:52] <Krinkle>	 T163337 *
[15:22:53] <stashbot>	 T163337: Job queue corruption after codfw switch over (Queue worth, duplicate runs) - https://phabricator.wikimedia.org/T163337
[15:23:01] <elukey>	 oh yes that one too
[15:23:08] <elukey>	 I think we got a good lead
[15:24:11] <elukey>	 but if we are right the solution is to restart periodically all the redis slaves to catch up (IIRC there is no command to force them to do a complete sync from the master)
[15:26:28] <elukey>	 let me know what you think about it
[15:27:00] <wikibugs>	 (PS8) Zfilipin: WIP Problem: Can not use --retry option to retry failed tests as part of the same run [selenium] - https://gerrit.wikimedia.org/r/341523 (https://phabricator.wikimedia.org/T160086)
[15:27:00] * RainbowSprinkles hears redis job queue, gets triggered
[15:27:04] * RainbowSprinkles spasms
[15:28:50] * elukey hides the broken lua replication problem from RainbowSprinkles to calm him 
[15:29:56] <RainbowSprinkles>	 I'm mostly sad about the constant timeouts that litter my logs
[15:30:45] <elukey>	 RainbowSprinkles: I am working on it! If you want to help I am trying to test https://github.com/facebook/hhvm/issues/7854, that might be the main problem why I am not able to test persistent conns between the jobrunners and redis 
[15:30:53] <elukey>	 I suspect that this is one of the main issues behind the timeouts
[15:31:04] <elukey>	 we open a tcp conn to redis for each command executed
[15:31:18] <RainbowSprinkles>	 Yeah, we don't use persistent right now
[15:32:40] <elukey>	 RainbowSprinkles: I have https://gerrit.wikimedia.org/r/#/c/351854/ ready to go.. Manually hacked mw1161 in prod but didn't see any reduction in TCP metrics (like time waits)
[15:33:35] <RainbowSprinkles>	 Hmm, could we try in beta too?
[15:33:46] <RainbowSprinkles>	 Just to give us wider testing
[15:34:25] <elukey>	 RainbowSprinkles: I already did :( - https://gerrit.wikimedia.org/r/#/c/353247/
[15:34:32] <elukey>	 no changes in the TCP metrics
[15:34:47] <RainbowSprinkles>	 Hmm
[15:34:51] <elukey>	 so I started to dig and I filed the issue report to hhvm (the one linked above)
[15:35:06] <elukey>	 I am wondering if that code review is sufficient to re-enable persistent conns
[15:35:40] <elukey>	 (about mw1161 - a prod jobrunner  https://grafana.wikimedia.org/dashboard/db/prometheus-apache-hhvm-dc-stats?panelId=22&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=jobrunner&var-instance=mw1161)
[15:35:50] <elukey>	 (time-waits are a huge number)
[15:37:04] <elukey>	 I believe that we hit the code I mentioned in the github issue after https://github.com/wikimedia/mediawiki/commit/0d89c642bde78be0b1093945d69e21bd5e3c6fff
[15:37:22] <elukey>	 but wmf-config/jobqueue* is much different in labs
[15:38:00] <elukey>	 (jobrunner02 in deployment-prep has a new hhvm version built by Moritz to include a fix for the issue that I am testing without any luck)
[15:42:02] <RainbowSprinkles>	 Boo :\
[15:47:15] <elukey>	 RainbowSprinkles: we'll manage to fix it eventually
[15:47:22] <RainbowSprinkles>	 Yeah, one can hope :)
[15:47:22] <elukey>	 I think we are close but something is missing
[15:48:04] <elukey>	 plus my experience with the mediawiki code is zero and each time my brain takes a huge amount of time to read php code
[15:49:43] <RainbowSprinkles>	 Considering I reviewed *most* of Aaron's work on the JobQueue when he refactored it ~2-3y ago, I remember surprisingly little of that area.
[16:03:31] <bearND>	 !log Update mobileapps to 946fe1f
[16:03:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:11:07] <wikibugs>	 Browser-Tests-Infrastructure, Release-Engineering-Team, MediaWiki-General-or-Unknown, Epic, and 5 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3292498 (zeljkofilipin)
[16:14:49] <elukey>	 I just realized that to see if we are using persistent conns I just need to do sudo tcpdump -n host 10.68.16.177 on jobrunner02 (the ip is redis)
[16:15:03] <elukey>	 if I don't see Fin/syns then we are reusing conns
[16:16:15] <elukey>	 and I can see the commands since it is not encrypted and in plain text
[16:18:01] <elukey>	 it seems to me that we are re-using conns
[16:18:41] <elukey>	 but sometimes we have a storm of syns
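The tcpdump check described above (few SYNs means connections are being reused, a burst of SYNs means new ones are being opened) can be approximated by counting SYN-only packets in captured output. This is a rough sketch in Python: the count_new_connections helper is hypothetical, the sample lines are invented for illustration in the usual tcpdump -n text layout, the client address is made up, and port 6379 is assumed to be the Redis port since the log does not state it.

```python
def count_new_connections(lines, redis_ip, redis_port=6379):
    """Count client-to-Redis packets that carry only the SYN flag.

    Each such packet marks a new TCP connection attempt, so a low count
    over many commands suggests connections are being reused.
    """
    destination = f"> {redis_ip}.{redis_port}:"
    return sum(
        1
        for line in lines
        if destination in line and "Flags [S]," in line
    )


# Invented sample in tcpdump's text format (not captured from a real host).
SAMPLE = [
    "16:18:41.000001 IP 10.68.17.50.41234 > 10.68.16.177.6379: Flags [S], seq 1, win 29200, length 0",
    "16:18:41.000002 IP 10.68.16.177.6379 > 10.68.17.50.41234: Flags [S.], seq 9, ack 2, win 28960, length 0",
    "16:18:41.000003 IP 10.68.17.50.41234 > 10.68.16.177.6379: Flags [P.], seq 2:30, ack 10, win 229, length 28",
    "16:18:41.000004 IP 10.68.17.50.41234 > 10.68.16.177.6379: Flags [F.], seq 30, ack 10, win 229, length 0",
    "16:18:42.000001 IP 10.68.17.50.41235 > 10.68.16.177.6379: Flags [S], seq 5, win 29200, length 0",
]

print(count_new_connections(SAMPLE, "10.68.16.177"))  # 2
```

The SYN-ACK from the Redis side is excluded by the destination check, so each connection is counted once.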
[16:19:18] <RainbowSprinkles>	 Ah, so it is working (to some degree)
[16:20:28] <wmf-insecte>	 Project beta-update-databases-eqiad build #17344: STILL FAILING in 28 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17344/
[16:21:23] <RainbowSprinkles>	 Promise I'm close to fixing that ^
[16:24:37] <elukey>	 seems so, it could be that the jobrunner runs a lot of RunJobs.php, each one (re)using a single tcpconn
[16:26:07] <elukey>	 ok I have apache logs and tcpdump opened to filter syns
[16:26:25] <elukey>	 yes got it
[16:26:57] <elukey>	 need to count but I am pretty sure there is a match
[16:27:13] <elukey>	 yes confirmed
[16:29:23] <elukey>	 so I might have been testing the wrong thing all this time
[16:31:13] <elukey>	 ok I'll revert hhvm on jobrunner02 to apply this test and see if it re-uses conns
[16:41:56] <wikibugs>	 (PS4) Zfilipin: WIP Run WebdriverIO tests in CI for extensions [integration/config] - https://gerrit.wikimedia.org/r/352602 (https://phabricator.wikimedia.org/T164721)
[16:42:49] <shinken-wm>	 PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[16:44:20] <elukey>	 this is probably me
[16:44:29] <elukey>	 !log restored hhvm on jobrunner02
[16:44:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:47:50] <shinken-wm>	 RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:53:07] <elukey>	 no, something is weird with the tcpdump method, it seems to work even with persistent not set
[16:53:59] <elukey>	 (Just commented /srv/mediawiki/wmf-config/jobqueue-labs.php and it keeps working fine)
[16:56:55] <wikibugs>	 (CR) Zfilipin: "Apologies for the super late review. I am cleaning up my pending reviews. Would you still like to see this merged? If so, I can resolve th" [ruby/api] - https://gerrit.wikimedia.org/r/304331 (https://phabricator.wikimedia.org/T142600) (owner: Gergő Tisza)
[17:01:08] <shinken-wm>	 PROBLEM - Puppet errors on integration-slave-docker-1000 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[17:03:10] <thcipriani>	 ^ this one is me (fiddling with that machine now)
[17:07:42] <wikibugs>	 Release-Engineering-Team (Kanban), Release Pipeline: Ensure Docker 17.05 on integration-slave-docker-1000 - https://phabricator.wikimedia.org/T164962#3292686 (thcipriani) Open>Resolved a:thcipriani
[17:09:54] <wikibugs>	 (CR) Zfilipin: "Apologies for the super late review. I am cleaning up my pending reviews. Would you still like to see this merged? If so, I can resolve th" [selenium] - https://gerrit.wikimedia.org/r/304332 (https://phabricator.wikimedia.org/T142600) (owner: Gergő Tisza)
[17:11:11] <shinken-wm>	 RECOVERY - Puppet errors on integration-slave-docker-1000 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:17:59] <elukey>	 so I need to figure out another way to test this
[17:18:05] <elukey>	 it is getting super confusing
[17:19:20] <elukey>	 but one of the main issues with the current jobrunners is that too many runjobs.php are executed, that generates timewaits
[17:20:28] <wmf-insecte>	 Project beta-update-databases-eqiad build #17345: STILL FAILING in 28 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17345/
[17:39:06] <wikibugs>	 Release-Engineering-Team, Wikimedia-Site-requests: Consider to switch frrwiki from group2 to group1 per Proofreadpage - https://phabricator.wikimedia.org/T166263#3292767 (Dereckson) a:Dereckson>None
[17:41:09] <wikibugs>	 Release-Engineering-Team, Wikimedia-Site-requests: Consider to switch frrwiki from group2 to group1 per Proofreadpage - https://phabricator.wikimedia.org/T166263#3290314 (Zppix) Why temp?
[18:20:25] <wmf-insecte>	 Project beta-update-databases-eqiad build #17346: STILL FAILING in 25 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17346/
[19:16:12] <wikibugs>	 Release-Engineering-Team (Kanban), Reading-Admin, Deployment Blockers, Release: MW-1.30.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T163512#3293005 (matmarex)
[19:20:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17347: STILL FAILING in 26 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17347/
[19:46:26] <hashar>	 !log deployment-tin manually cleaning disk space
[19:46:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:52:42] <shinken-wm>	 PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:20:29] <wmf-insecte>	 Project beta-update-databases-eqiad build #17348: STILL FAILING in 28 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17348/
[20:32:43] <shinken-wm>	 RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[20:53:48] <wmf-insecte>	 Project beta-code-update-eqiad build #157126: FAILURE in 48 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/157126/
[20:55:38] <wikibugs>	 Release-Engineering-Team (Kanban), Reading-Admin, Deployment Blockers, Release: MW-1.30.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T163512#3293315 (thcipriani)
[21:03:51] <wmf-insecte>	 Project beta-code-update-eqiad build #157127: STILL FAILING in 50 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/157127/
[21:13:44] <wmf-insecte>	 Project beta-code-update-eqiad build #157128: STILL FAILING in 44 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/157128/
[21:22:26] <wmf-insecte>	 Project beta-update-databases-eqiad build #17349: STILL FAILING in 27 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17349/
[21:25:19] <wmf-insecte>	 Yippee, build fixed!
[21:25:19] <wmf-insecte>	 Project beta-code-update-eqiad build #157129: FIXED in 47 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/157129/
[21:29:44] <paladox>	 You fixed it RainbowSprinkles :)
[21:30:38] <RainbowSprinkles>	 That's code update
[21:30:43] <RainbowSprinkles>	 Not databases update
[21:31:51] <wikibugs>	 Release-Engineering-Team (Kanban), Reading-Admin, Deployment Blockers, Release: MW-1.30.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T163512#3293399 (greg) From the newly created blocking sub-task:  >>! In T166345#3293389, @greg wrote: >>>! In T166345#3293316, @thcipriani wrote...
[21:32:19] <RainbowSprinkles>	 Someone had an uncommitted file it looks like
[21:32:33] <paladox>	 oh
[21:33:21] <paladox>	 let's merge https://gerrit.wikimedia.org/r/#/c/355448/
[21:34:44] <paladox>	 thanks
[21:40:14] <wikibugs>	 (CR) Gergő Tisza: "I don't need it anymore since I'm not doing any selenium debugging these days. Feel free to abandon it if you don't think it's useful (I g" [ruby/api] - https://gerrit.wikimedia.org/r/304331 (https://phabricator.wikimedia.org/T142600) (owner: Gergő Tisza)
[21:40:56] <wikibugs>	 (CR) Gergő Tisza: "I don't need it anymore since I'm not doing any selenium debugging these days. Feel free to abandon it if you don't think it's useful (I g" [selenium] - https://gerrit.wikimedia.org/r/304332 (https://phabricator.wikimedia.org/T142600) (owner: Gergő Tisza)
[21:42:39] <paladox>	 lol, of course, there's no drop index if exists
[21:42:40] <paladox>	 https://stackoverflow.com/questions/39849002/how-to-make-drop-index-if-exists-for-mysql
[21:42:45] <paladox>	 RainbowSprinkles ^^
[21:45:42] <wikibugs>	 Release-Engineering-Team (Kanban), Reading-Admin, Deployment Blockers, Release: MW-1.30.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T163512#3293438 (kaldari)
[21:46:03] <paladox>	 But it exists in mariadb
[21:46:03] <paladox>	 https://mariadb.com/kb/en/mariadb/drop-index/#if-exists
[21:46:28] <wikibugs>	 Release-Engineering-Team, Wikimedia-Site-requests: Consider to switch frrwiki from group2 to group1 per Proofreadpage - https://phabricator.wikimedia.org/T166263#3290314 (greg) @Dereckson are updates for the extension harder due to it being used by wikis in different groups? I guess I don't understand th...
[21:48:03] <Reedy>	 Only in >= 10.1.4
[21:50:32] <paladox>	 yep
[21:50:41] <paladox>	 Reedy would you know how to do a drop index?
[21:50:51] <paladox>	 Or do we have to somehow use php magic to find it out
[21:50:56] <Reedy>	 eh?
[21:51:22] <paladox>	 Reedy i meant, php magic as in query the db and then if the index doesn't exist, don't run this drop index command
[21:51:53] <wikibugs>	 Release-Engineering-Team, Wikimedia-Site-requests: Consider to switch frrwiki from group2 to group1 per Proofreadpage - https://phabricator.wikimedia.org/T166263#3293464 (Dereckson) @Tpt can develop, but there are updates to come in ProofreadPage easier to handle if all wikisource switch as the same time...
[21:52:01] <Reedy>	 Well, that's what dropIndex does in the updater
[21:52:13] <Reedy>	 Checks for index existence first
[21:52:16] <paladox>	 Oh
[21:52:25] <paladox>	 What function is that?
[21:52:39] <Reedy>	 it's called drop index
[21:52:57] <Reedy>	 			[ 'dropIndex', 'user_groups', 'ug_user_group', 'patch-user_groups-primary-key.sql' ],
[21:53:09] <paladox>	 thank you
[21:53:10] * paladox changes patch
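The check-then-drop approach described above, for engines that lack DROP INDEX IF EXISTS, can be sketched as follows. This is an illustration in Python with sqlite3, not the MediaWiki updater's implementation: the index_exists and drop_index helpers are hypothetical, the table and index names are borrowed from the example line quoted in the conversation, and the column names are assumed for the demo.

```python
import sqlite3


def index_exists(conn, table, index):
    """Return True if `index` is defined on `table`."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? AND name = ?",
        (table, index),
    ).fetchone()
    return row is not None


def drop_index(conn, table, index):
    """Drop `index` only if it exists; return True if it was dropped."""
    if not index_exists(conn, table, index):
        return False  # already gone, so re-running is harmless
    # Identifiers cannot be bound as parameters, so the name is quoted inline.
    conn.execute(f'DROP INDEX "{index}"')
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT)")
conn.execute("CREATE INDEX ug_user_group ON user_groups (ug_user, ug_group)")

first = drop_index(conn, "user_groups", "ug_user_group")   # index present
second = drop_index(conn, "user_groups", "ug_user_group")  # index absent
print(first, second)  # True False
```

Because the existence check lives in the calling code, the same guard works regardless of whether the underlying engine supports an IF EXISTS clause.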
[21:55:53] <RoanKattouw>	 Welp, sorry, I swear I found SQLite docs that said it was valid
[21:56:13] <paladox>	 RoanKattouw but not valid in mysql.
[21:56:21] <RoanKattouw>	 WTF
[21:56:34] <RoanKattouw>	 That's the first time I've seen something be sensible in SQLite and not in MySQL :/
[21:56:42] <paladox>	 But valid for mariadb 10.1.4+
[21:57:00] <Reedy>	 RoanKattouw: See also xkcd on "standards"
[21:57:01] <Reedy>	 ;D
[22:09:09] <paladox>	 Done, https://gerrit.wikimedia.org/r/#/c/355448/
[22:09:17] <paladox>	 Reedy RainbowSprinkles RoanKattouw ^^ :)
[22:20:34] <wmf-insecte>	 Project beta-update-databases-eqiad build #17350: STILL FAILING in 33 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17350/
[22:26:30] <paladox>	 Reedy, fixed it :). All works now. Saw the updater log in the test
[22:26:30] <paladox>	 :)
[22:26:42] <paladox>	 Dropping flow_ext_ref_idx_v2 index from table flow_ext_ref ...done.
[22:54:00] <RoanKattouw>	 Thanks paladox 
[22:54:08] <paladox>	 You're welcome :)
[22:54:14] <RoanKattouw>	 That patch has so many people's fingerprints on it now that I'll ask Matt to review it, since he hasn't touched it
[22:54:22] <paladox>	 thanks :)
[23:18:22] <wikibugs>	 Release-Engineering-Team, Wikimedia-Site-requests: Consider to switch frrwiki from group2 to group1 per Proofreadpage - https://phabricator.wikimedia.org/T166263#3293610 (greg) Understood. So, a follow-on question with a bit of preamble:  Most other extensions (modulo CentralNotice and Wikidata) handle b...
[23:19:25] <wikibugs>	 Beta-Cluster-Infrastructure, Release-Engineering-Team: Provide a version of frwiki on Beta Cluster / staging - https://phabricator.wikimedia.org/T166290#3293613 (greg)
[23:20:27] <wmf-insecte>	 Project beta-update-databases-eqiad build #17351: STILL FAILING in 27 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/17351/
[23:47:32] <wikibugs>	 Release-Engineering-Team (Kanban), Release Pipeline: Fix Blubber variant expansion for boolean/int config properties - https://phabricator.wikimedia.org/T166353#3293628 (dduvall)
[23:49:20] <wikibugs>	 Release-Engineering-Team (Kanban), Release Pipeline: Fix Blubber variant expansion for boolean/int config properties - https://phabricator.wikimedia.org/T166353#3293641 (dduvall)