[03:05:20] PROBLEM - Puppet staleness on deployment-eventlog05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [03:14:54] PROBLEM - Puppet staleness on deployment-maps03 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [43200.0] [07:29:09] zeljkof: good morning! I talked with Timo yesterday, we can switch webdriver.io to headless mode https://gerrit.wikimedia.org/r/#/c/358019/5/tests/selenium/wdio.conf.js ;D [07:32:17] hashar: cool [07:32:50] That would make it impossible to record video [07:32:53] I am amending the change [07:32:55] :| [07:35:02] zeljkof: chrome has some support to create screencast https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-startScreencast :D [07:36:33] Saw that [07:36:43] but yeah that stream .png files for each frame [07:36:48] that is surely low perf :] [07:36:49] Not sure if it would work for us [07:37:37] an alternative [07:37:50] if DISPLAY is not set pass '--headless', '--disable-gpu' [07:37:53] else skip it :] [07:38:09] so whenever there is no display/x server available that would use the headless mode [07:48:17] 10Release-Engineering-Team (Watching / External), 10DBA: Certain wiki databases missing from replicas? - https://phabricator.wikimedia.org/T132838#4087090 (10jcrespo) p:05High>03Low This is arguably not an issue- those databases are deleted, so of course they are missing from public/analytics replicas. Low... [07:58:39] zeljkof: so I tweaked your change https://gerrit.wikimedia.org/r/#/c/358019/6/tests/selenium/wdio.conf.js [07:58:46] so that when DISPLAY is available it runs as usual [07:58:58] but when DISPLAY is not set, that appends --headless --disable-gpu [08:20:35] hey twentyafterfour and no_justification and really anyone else who might be around right now [08:20:48] apergos: hello [08:20:53] we need to make very very sure that no updates using icu stuff with php7 happen [08:21:03] hey, glad to see you [08:21:14] icu stuff with php7? 
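A minimal sketch of the DISPLAY fallback proposed at 07:37 and applied at 07:58. This is not the code from the Gerrit change: the helper name `chromeArgs` and its parameters are hypothetical, and only the two flags come from the discussion.

```javascript
// Hypothetical helper: decide which extra Chrome flags to pass,
// based on whether an X server is advertised through DISPLAY.
function chromeArgs(env, baseArgs = []) {
  // An unset or empty DISPLAY is treated as "no display / X server available".
  const hasDisplay = typeof env.DISPLAY === 'string' && env.DISPLAY !== '';
  if (hasDisplay) {
    // Run as usual, so a visible browser can still be recorded on video.
    return [...baseArgs];
  }
  // No display: append the flags named in the discussion.
  return [...baseArgs, '--headless', '--disable-gpu'];
}

// Example calls with explicit environments (in a config file one would
// presumably pass process.env instead).
console.log(chromeArgs({}));
console.log(chromeArgs({ DISPLAY: ':99' }));
```

Passing the environment in as an argument keeps the check easy to exercise without touching the real environment.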
[08:21:16] this is something I forgot about last night (well at 1am) [08:22:07] php7 has not been rebuilt with the specific libicu version that we have everywhere incl hhvm using [08:22:15] ah [08:22:36] so ordering of things (think: categories) will be different, so that's not so nice [08:22:43] well for scap it's only used for linting php files so probably no icu use there? [08:23:08] so I looked at the l10update script, there's a bunch of interesting pieces in there [08:23:40] /usr/local/bin/mwscript extensions/LocalisationUpdate/update.php that's one [08:23:43] no idea what that does [08:23:44] hmm.. l10nupdate yeah does it use icu? hmm [08:23:59] /usr/local/bin/mwscript rebuildLocalisationCache.php same [08:24:49] /usr/local/bin/foreachwiki extensions/WikimediaMaintenance/refreshMessageBlobs.php ? [08:24:54] and btw there's stuff broken too [08:25:31] foreachwiki calls /usr/local/bin/foreachwikiindblist which has php5 hardcoded in it [08:25:44] so I don't know how that was ever going to work, maybe that's a blessing right now [08:26:20] I mentioned foreachwiki last night .. I don't know how much it's used, if any [08:26:31] well it's used right there in the l10update [08:26:52] those commands I pasted above, they're all from that [08:27:08] so do we need to go back to using tin for now? [08:27:16] well that's what we need to figure out [08:27:20] or can we fic libicu in php7 [08:27:29] *fix [08:27:46] there's not going to be a fix instantly, there's a build *in testing* as I understand it [08:27:54] rushing that would be a bad idea [08:28:06] yeah [08:28:47] so we can either look at these LocalisationUpdate/update.php and WikimediaMaintenance/refreshMessageBlobs.php scripts and see if they do anything bad [08:28:51] or yeah we can move back [08:29:26] well, see if they do anyhing bad plus ensure no one runs anything from deployment1001 that uses icu stuff, without realizing it [08:30:24] what would I need to look for? 
specific icu calls or do we just need to run them and compare the output? [08:30:58] well it's already run (parts of it have run, whatever is not hardcoded to use php5) once [08:31:12] so I guess we could look at category ordering and see if it's changed [08:31:14] on wiki [08:32:28] is there somewhere that describes the difference between the libicu versions? [08:32:34] I don't know what to look for [08:32:35] we're comparing libicu52 to libicu57 [08:32:50] i'm asking google right now [08:37:22] is this relevant? http://cldr.unicode.org/index/downloads/cldr-29#TOC-Migration [08:37:51] it doesn't help me any [08:38:01] we specifically need the collation ordering I guess [08:38:22] so everywhere else uses libicu 52 and php7 now uses icu 57? [08:40:38] 10Phabricator (Upstream), 10Upstream: Provide comment preview while batch editing tasks - https://phabricator.wikimedia.org/T133653#4087176 (10Aklapper) The latest Task Bulk Editor supports line breaks when mass-adding comments. So at least less bad surprises, though that does not make this request invalid. :) [08:42:00] yes [08:42:22] that's exactly it (sorry, was in another window trying to get google to cough up anything useful) [08:42:59] google has gotten significantly less useful in recent months [08:43:12] it's more and more crap, less and less relevant results [08:46:17] ok, reading up on https://www.mediawiki.org/wiki/Manual:$wgCategoryCollation [08:48:25] it doesn't seem to me that the newer icu version is necessarily worse? may be actually improved? 
[08:48:40] https://phabricator.wikimedia.org/P4286 [08:49:42] yes, the issue is just compatibility [08:49:55] see https://phabricator.wikimedia.org/T189295 [08:50:18] (03PS2) 10Hashar: Unhardcode Zuul --git-cache dir [integration/quibble] - 10https://gerrit.wikimedia.org/r/422224 [08:50:33] anyways as I look at the cron job, it looks like there's local rebuild of the cdb file, staging it for deployment (none of those touch collation in mw so far obviously) [08:50:54] right [08:51:18] (03CR) 10Hashar: [C: 032] "In a Docker container:" [integration/quibble] - 10https://gerrit.wikimedia.org/r/422224 (owner: 10Hashar) [08:51:41] (03Merged) 10jenkins-bot: Unhardcode Zuul --git-cache dir [integration/quibble] - 10https://gerrit.wikimedia.org/r/422224 (owner: 10Hashar) [08:51:44] then the script failed on scap-cdb-rebuild on all hosts [08:51:49] because the command wasn't there, [08:52:04] so whatever that might have done or not done, it didnt' do it [08:52:07] so we're ok for now [08:52:10] SO [08:53:04] there's no cron jobs on deploy1001 to worry about, that was the only one [08:53:15] we do have the fact that this job is broken right now [08:54:30] and we don't want anyone to run something from the command line that might aciddentally write something to the db *from deployment1001* using the new collation order [08:54:41] thoughts? [08:55:23] I don't really have a good idea of what could possibly touch category collation. 
It seems like l10n is not going to touch the db [08:55:42] collation shouldn't really be an issue in the translation cdbs [08:55:52] ok [08:56:26] I haven't inspected that code very closely but from a high level understanding of it I think it's probably safe [08:57:05] I would rather not be proven wrong by breaking things though [08:58:05] I think you're right because it's only updating either the cdb files in isolation, or removing some cache keys [08:58:08] that seems safe enough [08:58:46] there might be some other maintenance scripts that someone could run manually that I am unaware of [08:58:57] I have almost zero experience with the maintenance scripts [08:59:15] in theory they should not run them from the deployment hosts but who would prevent it [08:59:58] well if that isn't normal practice then it's probably not a huge concern. we could maybe just disable foreachwiki on the deployment host for good measure? [09:00:48] it's already broken :-D see: uses foreachwikiindblist -> uses php5 -> ain't no php5 there [09:00:55] right [09:00:58] obviously that needs to be fixed [09:01:19] I mean, we could make it output something like "don't do that" instead of having it break [09:01:23] right [09:01:39] this does mean though that l10update continues to be broken on deployment1001 [09:01:40] * twentyafterfour creates a task for this [09:01:55] until... when? [09:02:01] since it also uses that script [09:02:05] well...hmm [09:02:26] nvm then we should probably fix it to use php7 and test l10nupdate to be sure it's working [09:02:45] ugh. 
I wish we could just leave it broken forever [09:02:49] if we fix the script to use php7, then the script is available for people to run and break things [09:02:53] and there's the rub [09:02:53] right [09:03:08] * twentyafterfour doesn't like l10nupdate [09:03:20] it's a fragile and unfriendly thing to maintain [09:03:34] you are not the first one to menton this [09:03:41] yeah ;) [09:04:08] not my soap box, it's no.justification's dept. [09:04:16] heh [09:07:47] is no justification liable to have something insightful on making l10nupdate work, while keeping people from running $bad_things by mistake? [09:08:04] maybe [09:08:07] I really really hate the idea of the deploy1001 migration being blocked on the icu migration [09:08:33] 10Release-Engineering-Team (Kanban), 10Analytics-Tech-community-metrics, 10Code-Health: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites. - https://phabricator.wikimedia.org/T190891#4087229 (10Aklapper) [09:08:39] yeah, especially since git-lfs is blocked on the deploy1001 migration [09:09:10] well: we can put the foreachwiki wrapper temporarily, note that the l10n task is broken (ticket I guess), and get his input this evening my time [09:09:17] (and ores is blocked on git-lfs) [09:09:20] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban): Prepare CI for REL1_31 - https://phabricator.wikimedia.org/T190879#4087230 (10Aklapper) [09:09:20] yuck [09:10:06] ok sounds like a plan. I'll make a task [09:10:08] ok [09:10:49] I'll see what moritz has to say about timeframes and etc, when he shows up [09:12:11] I've asked if he would drop in here when he's back around [09:15:53] Ugh. 
[09:16:08] Fucking l10nupdate [09:16:54] huh see email thread (hashar), just sent a bit go [09:16:58] *ago [09:17:13] I've given it far too many fucks over the years [09:18:17] welcome [09:18:30] it must be ridiculously ridiculous o clock there by now [09:18:38] https://phabricator.wikimedia.org/T190909 [09:18:57] 10Release-Engineering-Team (Next): php5 is missing on deploy1001 which breaks foreachwiki & l10nupdate - https://phabricator.wikimedia.org/T190909#4087263 (10mmodell) [09:19:48] 10Deployments, 10Release-Engineering-Team (Next), 10MediaWiki-Maintenance-scripts, 10PHP 7.0 support: php5 is missing on deploy1001 which breaks foreachwiki & l10nupdate - https://phabricator.wikimedia.org/T190909#4087254 (10mmodell) [09:19:56] which of the things in hashar's list are issues Right Now (i.e. instanves in beta that use stretch, that would be affected, ci, etc)? [09:21:21] It would be 2:21am, yes. [09:22:18] 10Beta-Cluster-Infrastructure, 10User-zeljkofilipin, 10WorkType-NewFunctionality: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#4087267 (10zeljkofilipin) >>! In T116027#4086444, @zhuyifei1999 wrote: > You mean @legoktm ? It's possible. :) It was a few months ag... [09:24:31] your call: leave the job broken for now until you're awake later and can think things through? go back to tin for now? or, it's clear there's not a safe way to prevent people from maybe running a script that would touch category collation and so go back to tin until icu migration is done? [09:25:42] no_justification: ^^ [09:25:49] Go back to tin [09:25:53] sold [09:25:56] go to sleep [09:26:05] thank you [09:26:17] twentyafterfour: what needs to happen to move back? 
[09:26:18] heh it's 4:20 here [09:26:24] nice, real nice [09:26:34] Should be just swapping deploy_server back to tin [09:26:58] Er, deployment_server [09:27:01] common.yaml [09:27:03] to move back, well yeah just revert the patches that mut*ante merged last night [09:27:04] is this a puppet change I'll need to merge? [09:27:05] ic [09:27:11] yeah [09:27:14] Just the final one [09:27:21] We can still leave it as a co-master a-la naos [09:27:23] set it p and I'll merge it then [09:27:31] https://gerrit.wikimedia.org/r/#/c/420914/ [09:30:53] 10Scap, 10Operations, 10Packaging, 10Patch-For-Review: Install git-lfs client (at least on scap targets & masters) - https://phabricator.wikimedia.org/T180628#4087305 (10mmodell) [09:49:48] 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10Patch-For-Review, 10User-zeljkofilipin: WebdriverIO should run Chrome headlessly - https://phabricator.wikimedia.org/T167507#4087351 (10zeljkofilipin) [09:54:48] Hey, for T190780 I need to verify a schema change has successfully been applied to beta. Can someone run a "DESCRIBE site_stats;" there for me (for all of the beta wikis at best, I'm currently looking how to easily iterate through those). 
[09:54:48] T190780: Schema changes to site_stats - https://phabricator.wikimedia.org/T190780 [10:13:19] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: Cannot find module 'bluebird' - https://phabricator.wikimedia.org/T190914#4087460 (zeljkofilipin) [10:13:29] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: Cannot find module 'bluebird' - https://phabricator.wikimedia.org/T190914#4087472 (zeljkofilipin) p:Triage>Normal [10:16:47] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: Cannot find module 'bluebird' - https://phabricator.wikimedia.org/T190914#4087492 (zeljkofilipin) [10:18:07] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: Cannot find module 'bluebird' - https://phabricator.wikimedia.org/T190914#4087460 (zeljkofilipin) [10:18:38] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki-Vagrant does not install bluebird package - https://phabricator.wikimedia.org/T190914#4087460 (zeljkofilipin) [10:23:36] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki-Vagrant does not install bluebird package - https://phabricator.wikimedia.org/T190914#4087552 (zeljkofilipin) [10:30:24] twentyafterfour: you still there? 
hoping for a test on deploy1001 before we call everything good [10:37:10] Differential, Phabricator, Developer-Wishlist (2017): Automatically add "patch-for-review" tag when `arc diff` - https://phabricator.wikimedia.org/T150510#4087563 (Aklapper) p:Low>Lowest [10:54:08] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki-Vagrant does not install bluebird package - https://phabricator.wikimedia.org/T190914#4087588 (zeljkofilipin) [10:56:12] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki-Vagrant does not install bluebird package - https://phabricator.wikimedia.org/T190914#4087591 (zeljkofilipin) [11:13:31] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, User-zeljkofilipin: MediaWiki-Vagrant does not install bluebird package - https://phabricator.wikimedia.org/T190914#4087625 (zeljkofilipin) [11:37:14] (PS1) Hashar: (WIP) Port mw-apply-settings and drop slave-scripts [integration/quibble] - https://gerrit.wikimedia.org/r/422390 [11:39:12] apergos: sorry, I'm here now [11:39:18] ok [11:39:42] shall I run a single-file sync? [11:40:09] what do you do typically when switching hosts? 
[11:40:45] I'm not sure I've only done it just yesterday for the first time ;) [11:40:53] :-D [11:41:20] ok well let's denifitely do the one file at least [11:42:24] I can follow up with a full scap to test more stuff [11:43:15] apergos: Failed to acquire lock "/var/lock/scap-global-lock" [11:43:23] that file is root-owned and must be manually removed [11:43:34] 10Beta-Cluster-Infrastructure: Request for shell access on deployment-prep - https://phabricator.wikimedia.org/T190925#4087718 (10EddieGP) [11:44:24] done [11:45:15] (03PS2) 10Hashar: (WIP) Port mw-apply-settings and drop slave-scripts [integration/quibble] - 10https://gerrit.wikimedia.org/r/422390 [11:45:25] (03PS3) 10Hashar: Port mw-apply-settings and drop slave-scripts [integration/quibble] - 10https://gerrit.wikimedia.org/r/422390 [11:45:25] good, that file is in place on deploy1001 [11:45:37] syncing. I don't see the usual log message in -operations [11:48:45] I saw it [11:49:24] (02:48:08 μμ) logmsgbot: !log twentyafterfour@tin Synchronized README: test deploy from tin.eqiad.wmnet (duration: 03m 35s) [11:49:27] apergos: yeah looks like it's all good [11:49:33] I'm running a full scap now [11:49:36] ok! 
[11:50:02] that one takes a long time to run [11:50:08] yep [11:50:22] we'll get the completion log message and see [12:13:57] it's even slower than I remember [12:21:29] Release-Engineering-Team, Epic, Tracking, User-zeljkofilipin: Selenium framework improvements - https://phabricator.wikimedia.org/T182986#4087814 (zeljkofilipin) [12:21:33] Release-Engineering-Team, Epic, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), Patch-For-Review, User-zeljkofilipin: Q3 Selenium framework improvements - https://phabricator.wikimedia.org/T182421#4087815 (zeljkofilipin) [12:21:35] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: WebdriverIO should run Chrome headlessly - https://phabricator.wikimedia.org/T167507#4087813 (zeljkofilipin) [12:22:13] apergos: having three deployment servers seems to be making it slower, among other things [12:22:25] groan [12:22:45] it's 60% through the last step [12:23:08] sigh [12:23:25] long as it finishes before the eu swat :-/ [12:23:40] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: WebdriverIO should run Chrome headlessly - https://phabricator.wikimedia.org/T167507#4087844 (zeljkofilipin) a:zeljkofilipin [12:40:09] Beta-Cluster-Infrastructure, Privacy, User-MarcoAurelio: Disable the collection of private information on abusefilter log for Beta Cluster wikis - https://phabricator.wikimedia.org/T188862#4087892 (MarcoAurelio) Open>Resolved This works now. [12:54:19] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/srv/deployment-charts] [13:04:40] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/srv/deployment-charts] [13:16:05] Project mwext-phpunit-coverage-publish build #2675: FAILURE in 2.5 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/2675/ [13:18:23] 13:16:05 java.lang.NullPointerException: no workspace from node hudson.slaves.DumbSlave[ci-jessie-wikimedia-1011689] which is computer hudson.slaves.SlaveComputer@60289d36 and has channel null [13:18:28] hashar ^^ [13:18:52] Yippee, build fixed! [13:18:52] Project mwext-phpunit-coverage-publish build #2676: FIXED in 2 min 46 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/2676/ [13:19:19] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [13:22:50] (CR) Hashar: [C: 2] Port mw-apply-settings and drop slave-scripts [integration/quibble] - https://gerrit.wikimedia.org/r/422390 (owner: Hashar) [13:23:14] (Merged) jenkins-bot: Port mw-apply-settings and drop slave-scripts [integration/quibble] - https://gerrit.wikimedia.org/r/422390 (owner: Hashar) [13:29:40] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:30:57] Continuous-Integration-Infrastructure (shipyard), MediaWiki-Maintenance-scripts, Patch-For-Review: MediaWiki PHP based built-in server does not output log requests for index.php queries - https://phabricator.wikimedia.org/T190503#4088025 (hashar) [13:37:29] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4088047 (hashar) [13:37:33] Continuous-Integration-Infrastructure (shipyard), MediaWiki-Maintenance-scripts, Patch-For-Review: MediaWiki PHP based built-in server does not output log requests for index.php queries - 
https://phabricator.wikimedia.org/T190503#4088045 (hashar) Open>Resolved a:hashar [13:41:55] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: WebdriverIO should run Chrome headlessly - https://phabricator.wikimedia.org/T167507#4088051 (zeljkofilipin) [13:54:05] Release-Engineering-Team (Kanban), MediaWiki-Vagrant, Patch-For-Review, User-zeljkofilipin: MediaWiki-Vagrant does not install bluebird package - https://phabricator.wikimedia.org/T190914#4087460 (zeljkofilipin) Open>Resolved [13:54:39] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: WebdriverIO should run Chrome headlessly - https://phabricator.wikimedia.org/T167507#4088086 (zeljkofilipin) [14:08:52] Deployments, Release-Engineering-Team (Next), MediaWiki-Maintenance-scripts, PHP 7.0 support, Patch-For-Review: php5 is missing on deploy1001 which breaks foreachwiki & l10nupdate - https://phabricator.wikimedia.org/T190909#4087254 (Anomie) Related: {T146285} [14:46:12] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team (Current), User-Ladsgroup: ores-beta grafana is broken - https://phabricator.wikimedia.org/T190075#4088225 (Halfak) a:Ladsgroup [14:46:19] twentyafterfour: no_justification : my suggestion: reinstall deploy1001 with jessie, use that. remove tin [14:46:27] then at least you dont have "slower because 3 servers" [14:46:35] and we also solved "tin old hardware" [14:46:46] and jessie->stretch can be done separately.. if you like [14:47:02] my original motiviation was just the part that tin is so old.. [14:47:27] the hardware part [14:47:44] let's talk later today [14:49:14] Beta-Cluster-Infrastructure, ORES, Scoring-platform-team (Current), User-Ladsgroup: Beta: Could not find class role::ores::worker - https://phabricator.wikimedia.org/T188316#4088239 (Halfak) a:Ladsgroup [14:54:22] mutante: I'm also OK with that. 
[14:58:38] reminder: I'm going to shut down nodepool for a few minutes, in a few minutes [15:14:33] all done — CI should be back to normal in a moment if it isn't already [15:24:20] (03PS1) 10Jforrester: [avro-php] Convert to composer-test-package now we're supporting PHP7 [integration/config] - 10https://gerrit.wikimedia.org/r/422418 [15:24:33] Project mwext-phpunit-coverage-publish build #2678: 04FAILURE in 3.4 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/2678/ [15:24:36] Project mwext-phpunit-coverage-publish build #2679: 04STILL FAILING in 1 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/2679/ [15:26:01] Yippee, build fixed! [15:26:01] Project mwext-phpunit-coverage-publish build #2680: 09FIXED in 1 min 21 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/2680/ [15:29:39] thanks andrewbogott ! [15:30:30] (03CR) 10Jforrester: "Support for PHP7 being added in I159203ac7629" [integration/config] - 10https://gerrit.wikimedia.org/r/422418 (owner: 10Jforrester) [15:32:45] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4088336 (10zeljkofilipin) [15:55:31] Oh i found some bugs in gr-group-audit-log lol [15:55:37] the component i created heh [15:55:46] https://bugs.chromium.org/p/gerrit/issues/detail?id=8644 [15:57:47] no_justification i fixed another py3 issue [15:57:48] https://gerrit-review.googlesource.com/c/gerrit/+/168951 [16:04:15] !log nodepool: deleting 4 instances that are no more used but that Nodepool failed to detect as no omre used (due to some reboots in the openstack infra) [16:04:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:04:38] andrewbogott: nodepool is all happy. Thank you! 
[16:11:20] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4088464 (10zeljkofilipin) ``` Xvfb :99 & export DISPLAY=:99 npm run selenium & ffmpeg -f x11grab -video_size 1280x1024 -i :99 -codec:v libx264 -r 12 log/1.mp4 ``` [16:17:20] no_justification oh we can configure motd based on ip [16:26:29] Yeah, but that seems kinda useless for us :p [16:28:50] yeh [16:28:56] i was thinking same thing :) [16:31:23] now this https://gerrit-review.googlesource.com/c/gerrit/+/168955 looks like a much better fix heh [16:31:34] i have now idea why i did not do that in the first better [17:27:35] (03PS1) 10Legoktm: Release 17.0.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/422441 [17:43:37] (03CR) 10Thiemo Kreuz (WMDE): Release 17.0.0 (031 comment) [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/422441 (owner: 10Legoktm) [17:44:08] (03PS2) 10Thiemo Kreuz (WMDE): Release 17.0.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/422441 (owner: 10Legoktm) [17:45:51] (03CR) 10Thiemo Kreuz (WMDE): [C: 031] Release 17.0.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/422441 (owner: 10Legoktm) [17:45:53] (03CR) 10Legoktm: Release 17.0.0 (031 comment) [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/422441 (owner: 10Legoktm) [17:46:54] (03CR) 10Thiemo Kreuz (WMDE): Release 17.0.0 (031 comment) [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/422441 (owner: 10Legoktm) [18:02:49] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.27 deployment blockers - https://phabricator.wikimedia.org/T183966#4088848 (10mmodell) a:03mmodell [18:03:10] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.28 deployment blockers - https://phabricator.wikimedia.org/T183967#4088851 (10mmodell) a:03mmodell [18:10:27] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.24 deployment 
blockers - https://phabricator.wikimedia.org/T183963#4088875 (greg) Open>Resolved whoopsies [18:12:02] Wow logspam is pretty bad in 1.31.0-wmf.26 [18:16:26] :( [18:16:37] (CR) Legoktm: [C: 2] Release 17.0.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/422441 (owner: Legoktm) [18:17:26] (Merged) jenkins-bot: Release 17.0.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/422441 (owner: Legoktm) [18:18:16] (CR) jenkins-bot: Release 17.0.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/422441 (owner: Legoktm) [18:35:21] no_justification well 2.15 was released finally https://gerrit-review.googlesource.com/c/gerrit/+/169070 bring on the bug fixes now :) [18:44:20] Deployments, Release-Engineering-Team (Kanban), Operations, Beta-Cluster-reproducible, and 2 others: Switch mwscript from Zend PHP5 to default php alternative (e.g. HHVM or PHP7) - https://phabricator.wikimedia.org/T146285#4089001 (mmodell) [18:44:25] Deployments, Release-Engineering-Team (Next), MediaWiki-Maintenance-scripts, PHP 7.0 support, Patch-For-Review: php5 is missing on deploy1001 which breaks foreachwiki & l10nupdate - https://phabricator.wikimedia.org/T190909#4089000 (mmodell) [18:52:02] Is integration-slave-docker-1003.integration.eqiad.wmflabs still meant to be a going concern? It's all out of disk space so probably not working at all. 
[18:54:30] https://groups.google.com/forum/#!topic/repo-discuss/05s_xY123aE [18:55:39] Gerrit, Release-Engineering-Team (Someday): Update gerrit to 2.15 - https://phabricator.wikimedia.org/T177201#4089051 (Paladox) p:Lowest>Low [18:59:11] Deployments, MediaWiki-Debug-Logger, User-Tgr: Capture PHP warnings with stacktraces in MediaWiki and save to logstash - https://phabricator.wikimedia.org/T45086#4089056 (Tgr) [19:00:57] Gerrit, Upstream: Polygerrit search dropdown does not list all projects - https://phabricator.wikimedia.org/T188842#4089058 (Paladox) [19:00:59] Gerrit: Enable Gerrit feature to add comment when people add reviewers to a patch - https://phabricator.wikimedia.org/T168030#4089059 (Paladox) [19:01:01] Gerrit, Release-Engineering-Team (Someday): Update gerrit to 2.15 - https://phabricator.wikimedia.org/T177201#4089057 (Paladox) [19:05:48] Mar 26, 2018 3:34:42 AM [19:05:48] Disconnected by bryandavis : Root partition is full and I don't know what to delete [19:07:08] !log legoktm@integration-slave-docker-1003:/srv/jenkins-workspace/workspace$ sudo rm -rf * # full disk [19:07:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:08:56] oh, root is full [19:08:57] ugh [19:10:15] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.27 deployment blockers - https://phabricator.wikimedia.org/T183966#4089080 (mmodell) [19:12:05] hi [19:12:07] apergos: [19:12:12] hey ho [19:12:25] I guess we want a no_justification too [19:12:27] bd808: there was a abandoned docker container (docker-registry.wikimedia.org/releng/npm-test:0.5.0) that was taking up the disk space somehow I guess [19:12:30] my suggestion is that i just reinstall deploy1001 with jessie [19:12:34] which wont be much work [19:12:38] and then we can use that [19:12:44] bd808: stopping that container freed up 6.3GB [19:12:44] and we solved the following things: [19:12:51] That's what we discussed yeah 
[19:12:53] this means you'll redo it again in a couple weeks; do you mind doing that? [19:12:59] Also, I guess the "fix" to mwscript needs reverting for now? [19:13:01] (to go to stretch) [19:13:01] - deployment server is not on old hardware anymore [19:13:12] Although, might impact dumps migration [19:13:12] - hardwawre refresh goal is solved [19:13:15] * no_justification sighs [19:13:23] and "upgrade to stretch" is anotehr ticket [19:13:27] (should have been) [19:13:48] this also fixes "it's slower because we have 3 servers" [19:13:55] !log killed stuck docker container on 1003 to free up root partition, and then deleted old/all images to free up the rest of the space [19:13:55] as long as there's an hhvm build with my memcached fix, it should be fine, no_justification [19:13:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:13:58] and doesnt leave the 2 deployment servers on different versions [19:14:17] andrewbogott: integration-slave-docker-1003.integration.eqiad.wmflabs is fixed now [19:14:25] we can talk it through (the impact on dumps) but I don't even need the icu crap, I'm only waitign for it because that build is already in testing [19:14:30] legoktm: thanks! [19:14:35] and we cna't just throw another patch on top of it [19:14:46] we should probably revert the mwscript change [19:14:50] but not everything [19:15:13] we don't use mwscript though we use some of the multiversion stuff [19:15:15] we can still remove tin [19:15:39] everything I do has a config setting which we explicitly set for php (thank goodness) [19:16:05] how does this mpact the git-lfs stuff etc? [19:16:26] all i saw about that is .. 
that on stretch it's an available package [19:16:33] but puppet did not install it [19:16:49] twentyafterfour was talking about this with me [19:16:58] dunno if he is around now [19:19:48] he's here but busy with a deploy, let's get him in on this so we have a plan everyone agrees to [19:19:53] i should have had 2 tickets from the beginning [19:19:57] one to replace the hardware [19:20:01] and one to upgrade OS [19:20:16] you didn't forsee this mess [19:20:24] it's really not that related [19:20:38] and one doesnt have to block the other [19:20:42] apergos: I'm around [19:21:06] so folks are proposing that deployment1001 be reinstalled with jessie [19:21:17] so tht tin (out of warranty) can be decommissioned [19:21:27] I'm ok with that, though I really want to get stretch on deploy masters to unblock git-lfs [19:21:27] and the rest (stretch etc) put off until proper work can be done [19:21:39] yeah that makes sense [19:21:44] so the icu stuff goes around april 9 [19:22:03] can they wait that long? (if not it's gonna be a mess, pure and simple) [19:22:08] also, it's a good thing if naos is on the same version as the one in eqiad, right? [19:22:17] unless we _want_ to test the newer version on the inaactive one [19:22:34] apergos: I can wait, it's ores who are blocked [19:22:36] that was my other question.. so i count "eqiad and codfw are both jessie" as good [19:22:45] * halfak perks up [19:22:48] mutante: yeah that makes sense [19:23:08] btw, I just rolled back the train to wmf.26 [19:23:25] halfak: I hear ores needs git-lfs, that depends on stretch on the deployment hosts, we liely can't deliver that until icu library updates for stretch go around [19:23:32] that's apr 9 plus maybe a few days [19:23:45] then there's time frame of installing etc obviously [19:23:48] Gotcha. If that's the case, we can wait. [19:23:51] ok [19:23:54] We're sad, but that's life. [19:23:56] yep [19:24:10] It's really nice to have a date. 
[19:24:11] I'm sad too but that's how it is
[19:24:15] that date could slip
[19:24:17] Do you foresee any other major blockers?
[19:24:20] but it's a rough something
[19:24:35] that I don't know, no_justification and others will have that answer
[19:24:37] Lots of unknowns :\
[19:24:44] OK.
[19:24:45] I hope there won't be, but I fear there will
[19:24:52] * halfak wishes he wasn't such a trail blazer.
[19:24:55] heh
[19:25:00] I like boring engineering in interesting contexts
[19:25:09] Not interesting engineering. Someone else can do that.
[19:25:16] halfak: I'll do my best to help work through whatever issues come up
[19:25:37] Thanks twentyafterfour, no_justification, & apergos. I'll go update the task.
[19:25:44] we got the interesting life part right here, it's in your j
[19:25:44] d
[19:25:55] Do the ores hosts have stretch yet?
[19:25:55] ;)
[19:26:01] ok so I sign off on: deployment1001->jessie, tin -> decomm for now
[19:26:04] We'll make damn sure we're buttoned up by April 9th so we're ready.
[19:26:05] iirc, we didn't *require* git lfs on the master?
[19:26:10] no_justification, yeah. We're all stretch now
[19:26:15] Ah
[19:26:33] no_justification, I don't have enough scap-fu to say either way.
[19:27:00] awight and I did some proof of concept work on beta cluster. I think we are close to having all the issues worked out
[19:27:38] Oh good. He's AFK this week, but we can pick that back up on Monday
[19:27:45] no_justification: technically no, but it would be difficult to debug things in production without git-lfs
[19:28:34] the only way I was able to get anything to work in beta was by sshing to the targets and poking around manually with git-lfs commands
[19:28:57] ewwww
[19:29:17] twentyafterfour: That's targets, not master....
[19:29:24] (10:26:01 PM) apergos: ok so I sign off on: deployment1001->jessie, tin -> decomm for now everyone else on board yes?
[19:29:50] I understand wanting git-lfs on masters, I just wanna understand what the strict blocker is
[19:30:18] ok, thanks everybody, looks like consensus and i'll just do that and reinstall deploy1001 now
[19:30:45] can't break anything while tin is active
[19:32:01] mutante, is the idea to re-image deploy1001 to stretch once the ICU library updates are ready?
[19:32:18] Forgive me if I've asked something stupid :|
[19:32:42] paladox: Plugin dependencies are hard :(
[19:32:48] You can't do a standalone build with them!
[19:34:03] (like its-phab can't be standalone!)
[19:34:10] External dependencies are easy
[19:34:20] halfak: I presume it will be like that but there's a bit more that needs to be worked out (some script wrappers and such)
[19:36:25] halfak: yes, it is. though it might then be deploy1002
[19:36:39] OK gotcha. Makes sense.
[19:37:40] also once we have deploy100x on stretch, we would want to do the same with deploy200x in codfw to replace naos
[19:38:35] and not have them on different versions in case one breaks
[19:40:01] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): Prepare CI for REL1_31 - https://phabricator.wikimedia.org/T190879#4089163 (greg)
[19:40:31] apergos: yes I'm on board. sorry dealing with train
[19:40:47] yeah i figured, we took silence as consent :-
[19:40:49] D
[19:55:21] (Draft2) Zoranzoki21: Add mediawiki/extension/RandomArea in zuul [integration/config] - https://gerrit.wikimedia.org/r/422482
[20:03:39] Can anyone deploy https://gerrit.wikimedia.org/r/#/c/422482/
[20:08:09] (CR) Jayprakash12345: [C: 1] Add mediawiki/extension/RandomArea in zuul [integration/config] - https://gerrit.wikimedia.org/r/422482 (owner: Zoranzoki21)
[20:09:27] no_justification oh
[20:09:54] no_justification maybe we could come up with something to support this, though that can wait for now :)
[20:12:22] did the mwscript changes get reverted yet btw?
[20:12:27] * apergos just lobs that out there
[20:16:32] Nope
[20:16:37] Er, well I didn't see it go by?
[20:17:03] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.31.0-wmf.27 deployment blockers - https://phabricator.wikimedia.org/T183966#4089265 (mmodell)
[20:17:26] I guess while we are at ubn on the train maybe other stuff can wait
[20:17:26] Release-Engineering-Team: search-mjolnir-tox-docker job in status ABORT somewhat regularly - https://phabricator.wikimedia.org/T190963#4089266 (EBernhardson)
[20:17:37] no_justification groups are no longer in the db now heh
[20:17:38] i think that will improve performance
[20:17:39] as it won't query through the db now
[20:24:12] no_justification and guess what, you'll be glad to know that GerritSite.css is replaced with gerrit-theme.html
[20:24:20] but that file will only affect polygerrit
[20:37:50] (PS1) Hashar: Add Xvfb support and use it for Webdriver.io tests [integration/quibble] - https://gerrit.wikimedia.org/r/422544
[20:45:24] !log Update mobileapps to a5833a0 on BC
[20:45:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:49:33] Release-Engineering-Team: search-mjolnir-tox-docker job in status ABORT somewhat regularly - https://phabricator.wikimedia.org/T190963#4089321 (EBernhardson) My initial counts of hosts didn't look close enough, those are strictly failures. Narrowing down to timeouts installing the mjolnir package into flake8...
[20:50:47] paladox: https://gerrit.wikimedia.org/r/#/c/422469/ and https://gerrit.wikimedia.org/r/#/c/422429/
[20:51:01] yay :)
[20:51:25] for this https://gerrit.wikimedia.org/r/c/422429/ whoopeee
[20:52:15] Gah, I did it again
[20:52:26] Added submodule in wrong location
[20:52:47] I'm just glad `git mv` does the Right Thing
[20:52:56] heh
[20:53:05] ah i see
[20:53:09] it's pure luck when submodules do the right thing
[20:53:11] :)
[20:53:15] lol
[20:53:32] ebernhardson: In this ONE case, they do!
[20:53:40] `git submodule add somerepo foo`
[20:53:46] Oh crap, I put that in the wrong place!
[20:53:55] `git mv foo bar` WORKS AS YOU WOULD THINK!
[20:54:13] It updates .gitmodules, .git/config, .git/modules/* and the actual path all in one!
[20:54:14] <3
[20:54:55] lol
[20:55:29] Ahhh, would be good to use stable-2.14
[20:55:54] https://gerrit.wikimedia.org/r/422550
[20:56:26] +1'ed :)
[21:01:53] (PS1) Hashar: Add mediawiki/skins/Vector [integration/quibble] - https://gerrit.wikimedia.org/r/422553
[21:02:17] (CR) jerkins-bot: [V: -1] Add mediawiki/skins/Vector [integration/quibble] - https://gerrit.wikimedia.org/r/422553 (owner: Hashar)
[21:04:50] (PS2) Hashar: Add mediawiki/skins/Vector [integration/quibble] - https://gerrit.wikimedia.org/r/422553
[21:30:31] (CR) Hashar: [C: 2] "That can be optimized later on." [integration/quibble] - https://gerrit.wikimedia.org/r/422553 (owner: Hashar)
[21:30:44] (CR) Hashar: [C: 2] Add Xvfb support and use it for Webdriver.io tests [integration/quibble] - https://gerrit.wikimedia.org/r/422544 (owner: Hashar)
[21:31:10] (Merged) jenkins-bot: Add Xvfb support and use it for Webdriver.io tests [integration/quibble] - https://gerrit.wikimedia.org/r/422544 (owner: Hashar)
[21:31:12] (Merged) jenkins-bot: Add mediawiki/skins/Vector [integration/quibble] - https://gerrit.wikimedia.org/r/422553 (owner: Hashar)
[21:45:47] (PS1) Legoktm: Run extension-coverage for UrlShortener [integration/config] - https://gerrit.wikimedia.org/r/422566
[21:50:28] Beta-Cluster-Infrastructure, Operations, User-Ladsgroup: Remove uca-fa from beta cluster - https://phabricator.wikimedia.org/T190965#4089423 (Ladsgroup) p:Triage>High
[22:16:39] no_justification: Hm. somewhat unclear to me what the issue is with using hhvm for mwscript. I thought we fixed most of the issues, and I thought we already fixed the JIT slowness (e.g. disabled, or persists in its own JIT cache somewhere).
[22:17:03] I don't think we can realistically make our timeline if we keep between php5/php7 only, given we need to be on php7/hhvm only in not much time.
[22:17:12] I have no idea.
[22:17:22] And php7 is unlikely to be ready for maintenance scripts by that time, not for real prod stuff anyway.
[22:17:23] I just know it was one speed
[22:17:30] Swapped to hhvm
[22:17:32] Got slow
[22:17:39] How much slower are we talking about?
[22:17:40] Went back to PHP5
[22:18:01] Ummm.... Many minutes slower. Lemme find it in my history
[22:19:55] I don't want to defend hhvm or slow performance, but it might seem... tolerable/necessary to have a few scripts and human actions that normally take 2 minutes, say, take 5 minutes or so. For a little while.
[22:20:20] It was something like 6 minutes vs 40 minutes.
[22:20:22] I just really want to get ahead of the php7 thing and start finding problems that we can deal with, right now those resources are wasted on doing other things because we've already fixed all the issues we knew about.
[22:20:24] it was more like 2 to 40, yeah, that
[22:20:26] and we keep going back to php5 :(
[22:20:49] For which script?
[22:21:34] * paladox wonders how gwtui manages to delete singleusergroup groups lol
[22:21:39] Krinkle: https://logstash.wikimedia.org/goto/7c97c58ad2fcc978e625e395d63c4c27
[22:21:43] somehow i think it is doing something special now
[22:23:52] no_justification: Krinkle for the record, that's during scap, not the nightly l10nupdate (which wouldn't matter significantly)
[22:25:06] Yeah, no problem.
[22:25:15] I suppose we can figure out why it's slow and make it work, shouldn't be too hard.
[22:25:27] But I'd prefer that maybe in that case we find a way to just opt-out l10n-update only for now
[22:25:40] instead of all mwscript uses staying on php5 indefinitely. So we can start finding other problems.
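
A rough sketch of the "opt out one caller only" idea, not the actual mwscript code. It assumes the wrapper picks its interpreter from a `PHP` environment variable and otherwise falls back to a default; `build_mwscript_command`, `DEFAULT_PHP` and the `php7` default are hypothetical names chosen for illustration.

```python
import os

# Hypothetical default interpreter; the real wrapper's default is not stated in the log.
DEFAULT_PHP = "php7"


def build_mwscript_command(script, args=(), env=None):
    """Return the argv for running a maintenance script.

    The interpreter is taken from the PHP environment variable when it is
    set and non-empty, so one caller can opt out without changing the default.
    """
    env = os.environ if env is None else env
    interpreter = env.get("PHP") or DEFAULT_PHP
    return [interpreter, script, *args]


# A caller with no override gets the default interpreter...
print(build_mwscript_command("rebuildLocalisationCache.php", env={}))
# ...while a single job can pin the old interpreter for itself only.
print(build_mwscript_command("rebuildLocalisationCache.php", env={"PHP": "php5"}))
```

With this shape only the slow job carries the override, and every other caller keeps exercising the newer interpreter.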
[22:28:14] l10nupdate-1's call could use PHP=php5 probably
[22:28:44] Dropping cdb would improve performance too ;-)
[22:55:39] Release-Engineering-Team (Watching / External), Operations, hardware-requests: eqiad: replacement tin/deployment server - https://phabricator.wikimedia.org/T174452#4089715 (Dzahn)
[22:55:47] Release-Engineering-Team (Watching / External), Operations, Patch-For-Review: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4089714 (Dzahn) Resolved>Open
[22:58:23] (PS2) Legoktm: [avro-php] Convert to composer-test-package now we're supporting PHP7 [integration/config] - https://gerrit.wikimedia.org/r/422418 (owner: Jforrester)
[22:58:32] (CR) Legoktm: [C: 2] Run extension-coverage for UrlShortener [integration/config] - https://gerrit.wikimedia.org/r/422566 (owner: Legoktm)
[22:58:38] (CR) Legoktm: [C: 2] [avro-php] Convert to composer-test-package now we're supporting PHP7 [integration/config] - https://gerrit.wikimedia.org/r/422418 (owner: Jforrester)
[22:59:43] (Merged) jenkins-bot: Run extension-coverage for UrlShortener [integration/config] - https://gerrit.wikimedia.org/r/422566 (owner: Legoktm)
[22:59:50] (Merged) jenkins-bot: [avro-php] Convert to composer-test-package now we're supporting PHP7 [integration/config] - https://gerrit.wikimedia.org/r/422418 (owner: Jforrester)
[23:00:36] !log deployed https://gerrit.wikimedia.org/r/422566 https://gerrit.wikimedia.org/r/422418
[23:00:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:27:09] (PS1) Legoktm: Make a bunch of extension tests non-voting [integration/config] - https://gerrit.wikimedia.org/r/422590
[23:28:16] (CR) jerkins-bot: [V: -1] Make a bunch of extension tests non-voting [integration/config] - https://gerrit.wikimedia.org/r/422590 (owner: Legoktm)
[23:37:14] (PS2) Legoktm: Make a bunch of extension tests
non-voting [integration/config] - https://gerrit.wikimedia.org/r/422590
[23:39:14] (CR) Legoktm: [C: 2] Make a bunch of extension tests non-voting [integration/config] - https://gerrit.wikimedia.org/r/422590 (owner: Legoktm)
[23:40:29] (Merged) jenkins-bot: Make a bunch of extension tests non-voting [integration/config] - https://gerrit.wikimedia.org/r/422590 (owner: Legoktm)
[23:42:31] !log deployed https://gerrit.wikimedia.org/r/422590
[23:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL