[00:00:53] Deployment-Systems: [Trebuchet] Automatically log trebuchet actions to Server admin log and Graphite - https://phabricator.wikimedia.org/T63997#1443630 (Krinkle)
[00:02:05] bd808: Is https://github.com/wikimedia/mediawiki-tools-scap/commit/303e72e1b2313d6719392d042128c13d1afea970 deployed?
[00:02:22] Krinkle: yes
[00:03:02] aliasByNode(deploy.{sync-file,sync-dir,scap}.count,-2)
[00:03:08] Ah, that explains why it kept not showing up
[00:03:17] * Krinkle uses all instead
[00:04:23] I apparently forgot to !log when I did it?
[00:04:53] It's stupid that trebuchet doesn't !log automatically
[00:04:59] No, it's fine
[00:05:10] I changed the grafana annotation to use this instead:
[00:05:10] exclude(aliasByNode(deploy.*.count,-2),"all")
[00:05:15] Now it shows up
[00:05:53] bd808: useful, -> http://i.imgur.com/HanuL24.png
[00:06:13] nice
[00:06:17] https://grafana.wikimedia.org/#/dashboard/db/resourceloader
[00:06:20] last 12 hours
[00:08:00] Krinkle: grafana is sooo last week. https://tessera.wikimedia.org is the new hotness ;)
[00:08:27] bd808: yeah, but the interface to set up a dashboard reminds me of FORTRAN
[00:08:36] I'll look at it again some day but too much work atm
[00:08:40] heh. I haven't tried yet
[00:09:34] it doesn't support interactive zooming and panning around in data. So essentially "last X" only. Which means you can't focus with the detail of "last hour" on something last week.
[00:09:39] which is useful when finding problems
[00:09:59] But I think tessera will be a good one for non-devops presentation of information
[00:10:04] more higher level monitoring and show casind
[00:10:08] showcasing
[00:10:15] like the analytics report cards
[00:10:22] but whilst still using graphite directly
[00:10:24] underneath
[00:10:30] yeah. it looks like a status dashboard tool
[00:26:59] (PS1) Legoktm: Configure 'npm' for generator-wikimedia-php-library [integration/config] - https://gerrit.wikimedia.org/r/223992
[00:27:16] (CR) Legoktm: [C: 2] Configure 'npm' for generator-wikimedia-php-library [integration/config] - https://gerrit.wikimedia.org/r/223992 (owner: Legoktm)
[00:33:00] (CR) Legoktm: Configure 'npm' for generator-wikimedia-php-library [integration/config] - https://gerrit.wikimedia.org/r/223992 (owner: Legoktm)
[01:59:19] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<44.44%)
[02:19:21] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[02:50:35] (CR) Legoktm: [C: 2] Configure 'npm' for generator-wikimedia-php-library [integration/config] - https://gerrit.wikimedia.org/r/223992 (owner: Legoktm)
[02:52:22] (Merged) jenkins-bot: Configure 'npm' for generator-wikimedia-php-library [integration/config] - https://gerrit.wikimedia.org/r/223992 (owner: Legoktm)
[03:01:00] !log deploying https://gerrit.wikimedia.org/r/223992
[03:01:03] Logged the message, Master
[06:50:30] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<40.00%)
[06:59:03] PROBLEM - Puppet failure on integration-zuul-server is CRITICAL 100.00% of data above the critical threshold [0.0]
[07:05:33] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[07:45:12] Continuous-Integration-Isolation, Nodepool: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1444128 (hashar)
[07:45:13] Continuous-Integration-Isolation, Patch-For-Review: nodepool users should have OpenStack env variables set on login - https://phabricator.wikimedia.org/T103673#1444127 (hashar)
[07:45:28] Continuous-Integration-Isolation, Patch-For-Review: nodepool users should have OpenStack env variables set on login - https://phabricator.wikimedia.org/T103673#1444129 (hashar) Open>stalled Pending puppet to run on labnodepool. Currently blocked by T105406
[07:53:18] Continuous-Integration-Isolation, Nodepool: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1444136 (hashar)
[07:54:21] Continuous-Integration-Isolation, Nodepool: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1443220 (hashar)
[08:11:47] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[08:12:38] Continuous-Integration-Infrastructure, Patch-For-Review: Updating repository to a new version of RuboCop does not run RuboCop for the commit - https://phabricator.wikimedia.org/T105178#1444158 (hashar) For repositories beside mediawiki/core we confirmed rubocop job is properly triggered on Gemfile.lock ch...
[08:16:09] Continuous-Integration-Isolation, Nodepool: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1444160 (Joe) so, creating a simple test file ``` class role::nodepool {...
[08:21:25] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #638: FAILURE in 1 min 23 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/638/
[08:46:16] (PS9) Paladox: Add jsonlint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592
[08:47:04] (CR) Paladox: "@Krinkle please review this. This checks json since it isn't currently being checked." [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[08:52:42] Continuous-Integration-Isolation, Nodepool: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1444229 (Joe) a:Joe
[09:29:17] PROBLEM - Puppet failure on nodepool-t105406 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:32:43] (PS1) Amire80: Delete the ULS job [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158)
[09:44:27] Yippee, build fixed!
[09:44:27] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #542: FIXED in 7 min 26 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/542/
[09:54:35] zeljkof-meeting: you were right from the start
[09:54:39] the regex does not match the branch
[09:54:45] hashar_: :D
[09:54:46] ie ^(?!REL1_23|REL1_24|fundraising/REL.*)$
[09:54:50] does not match master
[09:59:16] RECOVERY - Puppet failure on nodepool-t105406 is OK Less than 1.00% above the threshold [0.0]
[10:01:06] Continuous-Integration-Infrastructure, Patch-For-Review: Updating repository to a new version of RuboCop does not run RuboCop for the commit - https://phabricator.wikimedia.org/T105178#1444333 (hashar) The job is not matched because of the branch filter. `master` is never matched. That the exact same iss...
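The Grafana annotation target that ended up working at 00:05 chains two Graphite functions. The toy emulation below works on plain metric-name strings rather than real Graphite series, so treat it as an illustration of the naming behaviour only, not Graphite's implementation. The first four metric names come from the log; `deploy.hypothetical-tool.count` is a made-up stand-in for any deploy type that the explicit `{sync-file,sync-dir,scap}` list would presumably have missed.

```python
import fnmatch
import re
from typing import List

# Metric names from the log, plus one hypothetical extra deploy type.
METRICS = [
    "deploy.sync-file.count",
    "deploy.sync-dir.count",
    "deploy.scap.count",
    "deploy.all.count",
    "deploy.hypothetical-tool.count",
]

def expand(pattern: str, metrics: List[str]) -> List[str]:
    """Toy glob: '*' matches within one dot-separated node only.

    fnmatch on the whole name would let '*' span dots, so compare node by node.
    """
    parts = pattern.split(".")
    matched = []
    for name in metrics:
        nodes = name.split(".")
        if len(nodes) == len(parts) and all(
            fnmatch.fnmatchcase(node, part) for node, part in zip(nodes, parts)
        ):
            matched.append(name)
    return matched

def alias_by_node(names: List[str], node: int) -> List[str]:
    """Rename each series to one node of its path; negative indexes count from the end."""
    return [name.split(".")[node] for name in names]

def exclude(names: List[str], pattern: str) -> List[str]:
    """Drop series whose (already aliased) name matches the regex anywhere."""
    return [name for name in names if not re.search(pattern, name)]

annotation_series = exclude(alias_by_node(expand("deploy.*.count", METRICS), -2), "all")
```

In this sketch `exclude` does an unanchored regex search, so `"all"` would also drop a node such as `install`; a pattern like `^all$` avoids that if it ever matters.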
[10:20:20] Browser-Tests, VisualEditor, Wikimania-Hackathon-2015: add true internationalized content support to auto-screenshots - https://phabricator.wikimedia.org/T105466#1444372 (Amire80) NEW
[10:22:40] (PS1) Hashar: Fix mediawiki-core-bundle-rubocop branch filter [integration/config] - https://gerrit.wikimedia.org/r/224040 (https://phabricator.wikimedia.org/T105178)
[10:23:16] (CR) Hashar: "Tested using:" [integration/config] - https://gerrit.wikimedia.org/r/224040 (https://phabricator.wikimedia.org/T105178) (owner: Hashar)
[10:23:19] (PS2) Hashar: Fix mediawiki-core-bundle-rubocop branch filter [integration/config] - https://gerrit.wikimedia.org/r/224040 (https://phabricator.wikimedia.org/T105178)
[10:29:39] Continuous-Integration-Isolation, operations, Nodepool, Patch-For-Review, Puppet: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1444425 (Joe)
[10:29:45] (CR) Hashar: [C: 2] Fix mediawiki-core-bundle-rubocop branch filter [integration/config] - https://gerrit.wikimedia.org/r/224040 (https://phabricator.wikimedia.org/T105178) (owner: Hashar)
[10:30:39] Continuous-Integration-Isolation, operations, Nodepool, Patch-For-Review, Puppet: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1444428 (Joe) Problem solved in production, now labnodepool reports ```...
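Why the branch filter quoted at 09:54 never matched `master`: a negative lookahead is zero-width, so after `^(?!...)` nothing has been consumed and `$` can only succeed on an empty string. The sketch below uses Python's `re` as a stand-in for whatever engine the CI configuration actually uses (lookahead behaves the same way in the common engines, but that is an assumption about the real matcher). `FIXED` is one possible correction, not necessarily what the merged change does.

```python
import re

# Branch filter from the log: the lookahead consumes nothing, so '$' must
# match immediately after '^', i.e. only the empty string is accepted.
BROKEN = r"^(?!REL1_23|REL1_24|fundraising/REL.*)$"

# One possible correction (an assumption): once the lookahead has ruled out
# the excluded prefixes, let '.*' consume the rest of the branch name.
FIXED = r"^(?!REL1_23|REL1_24|fundraising/REL.*).*$"

def branch_matches(pattern: str, branch: str) -> bool:
    """True if the filter pattern accepts the whole branch name."""
    return re.match(pattern, branch) is not None
```

With these definitions `branch_matches(BROKEN, "master")` is false and `branch_matches(FIXED, "master")` is true, while `REL1_23` and `fundraising/REL*` branches stay excluded. Because the lookahead is a prefix test, a hypothetical `REL1_230` branch would be excluded as well.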
[10:31:57] (Merged) jenkins-bot: Fix mediawiki-core-bundle-rubocop branch filter [integration/config] - https://gerrit.wikimedia.org/r/224040 (https://phabricator.wikimedia.org/T105178) (owner: Hashar)
[10:36:40] Continuous-Integration-Infrastructure, Patch-For-Review: Updating repository to a new version of RuboCop does not run RuboCop for the commit - https://phabricator.wikimedia.org/T105178#1444442 (hashar) Open>Resolved REL1_24 does not trigger it https://gerrit.wikimedia.org/r/#/c/224044/ master did h...
[10:36:59] zeljkof: fixed!
[10:37:07] zeljkof: now https://gerrit.wikimedia.org/r/#/c/223556/ fails rubocop
[10:37:11] hashar: yeah!
[10:37:41] zeljkof: there is a few rubocop errors though
[10:37:45] (PS10) Krinkle: Add jsonlint check for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[10:37:51] (CR) Krinkle: [C: 2] Add jsonlint check for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[10:38:01] great, that is what I have noticed locally (because of rubocop upgrade) but then I was surprised that gerrit did not complain
[10:38:01] zeljkof: not sure what happens with current master
[10:38:55] what do you mean?
[10:38:59] hashar: ^
[10:39:04] well
[10:39:07] on mediawiki/core
[10:39:12] running: bundle install && bundle rubocop
[10:39:16] that yields 34 offenses :-(
[10:39:26] (CR) Paladox: "Thanks." [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[10:39:56] maybe I should fix them before bumping rubocop
[10:40:01] that is straightforward
[10:40:31] ah no bunch of issues are unrelated
[10:42:00] (PS11) Paladox: Add jsonlint check for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592
[10:42:28] (CR) Paladox: [C: 1] "Needs +2 again please." [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[10:42:42] zeljkof: https://gerrit.wikimedia.org/r/224046 will fix mediawiki/core
[10:43:39] Continuous-Integration-Isolation, Patch-For-Review: nodepool users should have OpenStack env variables set on login - https://phabricator.wikimedia.org/T103673#1444449 (Joe)
[10:43:41] Continuous-Integration-Isolation, operations, Nodepool, Patch-For-Review, Puppet: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1444447 (Joe) Open>Resolved p:Triage>High
[10:45:10] hashar: ok, will merge as soon as jenkins reports back to gerrit
[10:45:23] zeljkof: seems the job hasn't been triggered since March or so :(
[10:48:00] hashar: rubocop on core
[10:48:01] ?
[10:48:24] yeah
[10:48:51] zeljkof: https://gerrit.wikimedia.org/r/#/c/224046/ fix the single offense and also ignore node_modules (created by 'npm install')
[11:02:06] (CR) Paladox: Add jsonlint check for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[11:02:15] (CR) Paladox: [C: 1] Add jsonlint check for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[11:07:01] hashar: zend failed here https://gerrit.wikimedia.org/r/#/c/224046/
[11:07:38] https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/6422/testReport/
[11:07:39] no clue
[11:07:46] seems like a race condition of some sort
[11:08:33] yeah
[11:08:33] "msgBlobMtime":"1436525996"}
[11:08:38] "msgBlobMtime":"1436525997"}
[11:08:42] different time
[11:08:43] that is racy
[11:09:01] that one is worth filing a bug
[11:09:14] zeljkof: meanwhile you can remove your CR+2 vote and revote +2 :}
[11:10:10] hashar: will do
[11:11:29] will file the bug for ResourceLoader
[11:14:37] (CR) Paladox: "@Krinkle needs +2 again because it said needed rebasing." [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[11:15:35] zeljkof: race condition filed at https://phabricator.wikimedia.org/T105476
[11:17:07] hashar: thanks!
[11:17:15] I am in the middle of something else
[11:21:31] hashar: merged that and rebased this https://gerrit.wikimedia.org/r/#/c/200767/
[11:22:46] Browser-Tests, MediaWiki-extensions-UniversalLanguageSelector, Patch-For-Review: Fix failed UniversalLanguageSelector browsertests Jenkins job - https://phabricator.wikimedia.org/T94158#1444552 (zeljkofilipin) Open>stalled a:Amire80>None
[11:22:48] Release-Engineering, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week (tracking) - https://phabricator.wikimedia.org/T94150#1444554 (zeljkofilipin)
[11:23:13] Browser-Tests, MediaWiki-extensions-UniversalLanguageSelector, Patch-For-Review: Fix failed UniversalLanguageSelector browsertests Jenkins job - https://phabricator.wikimedia.org/T94158#1156479 (zeljkofilipin) I have talked with @amire80, he is not working on this.
[11:36:03] (CR) Paladox: [C: 1] Add jsonlint check for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592 (owner: Paladox)
[12:07:18] PROBLEM - Puppet failure on integration-slave-jessie-1001 is CRITICAL 100.00% of data above the critical threshold [0.0]
[12:45:26] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: It takes about 20 seconds just to start a Sauce Labs browser - https://phabricator.wikimedia.org/T92613#1444658 (zeljkofilipin) >>! In T92613#1438762, @dduvall wrote: > Did you check to see if the SauceLabs session is being queued...
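The mediawiki-phpunit-zend failure at 11:07 came down to two `msgBlobMtime` values one second apart. The sketch below shows, deterministically, how a test that reads the clock twice can straddle a second boundary, and one common mitigation: inject a frozen clock. `build_blob`, `ticking_clock` and `frozen_clock` are hypothetical helpers written for this illustration. They are not ResourceLoader code, and this is not necessarily how T105476 was resolved.

```python
import itertools
from typing import Callable, Dict

Clock = Callable[[], int]

def build_blob(clock: Clock) -> Dict[str, str]:
    """Hypothetical stand-in for code that stamps its output with the current time."""
    return {"msgBlobMtime": str(clock())}

def ticking_clock(start: int) -> Clock:
    """Simulates the wall clock advancing one second between successive reads."""
    counter = itertools.count(start)
    return lambda: next(counter)

def frozen_clock(value: int) -> Clock:
    """Always returns the same timestamp, so repeated reads agree."""
    return lambda: value

# Racy: expected and actual each read the clock and land in different seconds.
tick = ticking_clock(1436525996)
racy_expected, racy_actual = build_blob(tick), build_blob(tick)

# Deterministic: both sides share one frozen timestamp.
frozen = frozen_clock(1436525996)
stable_expected, stable_actual = build_blob(frozen), build_blob(frozen)
```

Here `racy_expected` and `racy_actual` reproduce the 1436525996/1436525997 mismatch from the log, while the frozen pair always compares equal.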
[12:51:32] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<20.00%)
[12:54:44] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #529: FAILURE in 43 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/529/
[13:36:24] (Abandoned) Hashar: Add job template for doxygen publishing [integration/config] - https://gerrit.wikimedia.org/r/174416 (owner: Hashar)
[14:08:09] $ gem install puppet==3.7.2
[14:08:10] ERROR: Could not find a valid gem 'puppet==3.7.2' (>= 0) in any repository
[14:08:12] seriously
[14:08:28] one day
[14:08:44] I will write a world wide RFC to standardize all the package managers
[14:28:20] hashar or thcipriani|afk : an unrelated question… I see that beta has a special memcached definition named mc-labs.php. Can you tell me where that file is referenced?
[14:29:40] that’s regarding https://gerrit.wikimedia.org/r/#/c/224082/
[14:31:38] andrewbogott: looks like there's a reference to wmf-config/CommonSettings.php:189
[14:32:22] … where does the -labs come from? Does getRealmSpecificFilename do that somehow?
[14:33:35] yeah, it checks if the following files exist in order: filename-{realm}-{datacenter}, filename-{realm}, filename-{datacenter}, filename
[14:33:52] we have a specific function yeah
[14:33:53] to vary
[14:34:00] that is more or less described in the README file
[14:34:05] think about it as a poor man hiera()
[14:34:10] dang
[14:34:13] so looks for wmf-config/mc-labs-eqiad.php first, then wmf-config/mc-labs.php, and so on.
[14:34:15] maybe I’ll just make a bug for this instead
[14:34:42] is -labs- a hack that just beta does? Or does anything running on labs do that?
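The lookup order given at 14:33:35 can be sketched as a first-match search over candidate names. This is a Python approximation, not the PHP `getRealmSpecificFilename()` itself. It assumes that the suffix goes in front of the file extension, which is what the `mc.php` → `mc-labs-eqiad.php` / `mc-labs.php` naming in the log suggests. It also assumes that the plain name is returned when nothing matches. The function names are hypothetical, and the `exists` parameter is only there so the sketch can be exercised without touching the filesystem.

```python
import os
from typing import Callable, List

def realm_candidates(filename: str, realm: str, datacenter: str) -> List[str]:
    """Candidate names, most specific first, with the suffix before the extension."""
    base, ext = os.path.splitext(filename)  # "wmf-config/mc.php" -> ("wmf-config/mc", ".php")
    return [
        f"{base}-{realm}-{datacenter}{ext}",
        f"{base}-{realm}{ext}",
        f"{base}-{datacenter}{ext}",
        filename,
    ]

def realm_specific_filename(
    filename: str,
    realm: str,
    datacenter: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Return the first candidate that exists; fall back to the plain name (an assumption)."""
    for candidate in realm_candidates(filename, realm, datacenter):
        if exists(candidate):
            return candidate
    return filename
```

For example, on a labs host in eqiad where only `wmf-config/mc-labs.php` and `wmf-config/mc.php` exist, the search skips `mc-labs-eqiad.php` and settles on `mc-labs.php`. A production host skips all three suffixed names and ends up at `mc.php`.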
[14:34:58] I can't remember how we determine the realm
[14:35:05] potentially based on /etc/wmf-realm
[14:35:31] multiversion/MWRealm.php:if( file_exists( '/etc/wikimedia-realm' ) ) {
[14:35:31] yeah
[14:35:40] andrewbogott: so that should switch whenever you are on a labs instance
[14:35:47] ah, ok.
[14:35:54] the code is under multiversion/MWRealm.php
[14:35:57] Hm, can’t decide if that is less crazy or more crazy
[14:36:01] but I will stay away from it for now
[14:36:09] the idea is that beside beta, there is no good use case for using operations/mediawiki-config.git ;}
[14:36:40] I don't think we support labs project names
[14:37:45] looks like the /etc/wikimedia-realm is set on both staging and deployment-prep and is set to labs, fwiw
[14:42:32] ok, this is less terrible than I thought.
[14:42:34] thanks
[14:42:34] yeah that is populated by puppet
[14:42:45] something like file { '/etc/wikimedia-realm': content => $::realm }
[14:42:56] that is the hack I found out at the time to leak puppet context
[14:43:01] there might be something nicer nowadays
[15:02:28] !logerrors is https://phabricator.wikimedia.org/tag/wikimedia-log-errors/board/
[15:02:29] This key already exist - remove it, if you want to change it
[15:02:34] !logerrors
[15:02:34] https://phabricator.wikimedia.org/maniphest/query/3nBU5bwR5HqG/#R
[15:02:53] !logerrors del
[15:02:53] Successfully removed logerrors
[15:02:55] !logerrors is https://phabricator.wikimedia.org/tag/wikimedia-log-errors/board/
[15:02:56] Key was added
[15:03:04] Continuous-Integration-Isolation, Ops-Access-Requests, operations: Get Dan Duvall TEMP root to labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T102133#1445172 (hashar)
[15:03:17] Continuous-Integration-Infrastructure, Continuous-Integration-Isolation, operations, Nodepool: Use systemd for Nodepool - https://phabricator.wikimedia.org/T96867#1445175 (hashar)
[15:03:39] Continuous-Integration-Infrastructure, Continuous-Integration-Isolation, operations, Nodepool: Use systemd for Nodepool - https://phabricator.wikimedia.org/T96867#1445178 (hashar) a:hashar Let's give this a try. Will do it in operations/puppet.git for now then "upstream" it in the .deb package.
[15:13:52] Continuous-Integration-Isolation, Nodepool: Nodepool Debian package should create /var/run/nodepool directory - https://phabricator.wikimedia.org/T105501#1445210 (hashar) NEW
[15:14:12] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[16:12:40] !log nodepool puppitization going on :-D
[16:12:43] Logged the message, Master
[16:12:47] off for the week-end
[16:19:54] Deployment-Systems, Analytics-Backlog, Performance-Team, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1445428 (Milimetric) Hey guys, so we are trying to get away from servicing ad-hoc data requests, but we...
[16:24:07] Deployment-Systems, Patch-For-Review, Wikimedia-log-errors: Cant unserialize( ) - https://phabricator.wikimedia.org/T103744#1445446 (demon) a:demon
[16:38:28] bd808_: You might enjoy that one ^ :)
[16:38:33] I wrote a patch for it: https://gerrit.wikimedia.org/r/#/c/224107/
[16:40:32] ostriches: file_put_contents() with the LOCK_EX flag should basically do all that dance shouldn't it?
[16:41:04] #til file_put_contents() supports LOCK_EX
[16:42:05] Although, technically you can still race where you truncate prior to getting the lock I think.
[16:42:16] If it opens + truncates then locks, then writes
[16:42:30] Which is what happens with 'w' on fopen()
[16:42:40] hmmm possibly
[16:44:00] To the source!
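Per the 14:35 exchange, the realm is read from `/etc/wikimedia-realm`, which Puppet populates (set to `labs` on both staging and deployment-prep). A rough Python analogue of that check follows. Two details are assumptions about `multiversion/MWRealm.php`, not quotes from it: the `production` default when the file is missing or empty, and the whitespace trimming. The path is a parameter only so the sketch can be tested.

```python
from pathlib import Path

# Assumption: the realm used when the file is absent or empty.
DEFAULT_REALM = "production"

def detect_realm(path: str = "/etc/wikimedia-realm") -> str:
    """Return the realm name written by Puppet, or the assumed default."""
    realm_file = Path(path)
    if realm_file.is_file():
        # Puppet writes the bare realm name; strip any trailing newline.
        value = realm_file.read_text().strip()
        if value:
            return value
    return DEFAULT_REALM
```

On a deployment-prep instance this would return `labs`, which would then feed the filename lookup described at 14:33.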
[16:45:55] hhvm opens in wb mode, locks, and then writes
[16:46:34] https://github.com/facebook/hhvm/blob/f4f799a566d8890efb2818f9b81b9562420088eb/hphp/runtime/ext/std/ext_std_file.cpp#L587-L616
[16:47:18] Yep
[16:47:22] Just looked at that
[16:47:46] Which is probably what php-src does too (based on what's documented)
[16:48:03] So yeah, the race condition with truncation prior to locking is still possible with file_put_contents() + LOCK_EX
[16:48:10] I would guess so, but who cares about crappy old PHP5? ;)
[16:48:39] does w truncate? I thought it just left the file pointer at pos 0?
[16:48:54] 'w': Open for writing only; place the file pointer at the beginning of the file and truncate the file to zero length. If the file does not exist, attempt to create it.
[16:49:06] TIL
[16:49:19] 'c': Open the file for writing only. If the file does not exist, it is created. If it exists, it is neither truncated (as opposed to 'w'), nor the call to this function fails (as is the case with 'x'). The file pointer is positioned on the beginning of the file. This may be useful if it's desired to get an advisory lock (see flock()) before attempting to modify the file, as using 'w' could truncate the file before the lock was obtained (if truncation is desired, ftruncate() can be used after the lock is requested).
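The `'c'`-mode recipe quoted at 16:49:19 has three steps: open without truncating, take the lock, then truncate. The sketch below translates that sequence into POSIX-only Python using `os.open`, `fcntl.flock` and `os.ftruncate`. It is an analogue of the PHP pattern, not PHP code, and `locked_overwrite` is a hypothetical name. As the discussion goes on to note, this only coordinates writers that take the same advisory lock. A reader that does not lock can still see a truncated or half-written file.

```python
import fcntl
import os

def locked_overwrite(path: str, data: bytes) -> None:
    """Replace a file's contents, truncating only after the exclusive lock is held."""
    # O_CREAT without O_TRUNC: like PHP's 'c' mode, opening never truncates.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Block until this process holds the exclusive advisory lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        # Only now discard the old contents; the file offset is still 0.
        os.ftruncate(fd, 0)
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        # Closing the descriptor also releases the flock() lock.
        os.close(fd)
```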
[16:49:43] So this allows us to create/open, then lock, then truncate, then write and close out
[16:50:06] so that locks the writes but won't lock for read so you could still get a partial read I think
[16:50:26] better for atomic operation would be to write tmp and rename
[16:50:54] so all readers would get the old or the new but never the new in unwritten or just truncated state
[16:51:40] the problem isn't really with multiple writes, it is with concurrent reads and writes
[16:52:03] and we don't want to make the read exclusive
[16:52:11] true
[16:54:06] So what you really want is something like tempnam() in the right dir; file_put_contents() and then rename()
[16:56:58] Or, you know, a solution that doesn't involve writing serialized PHP variables to a file and then tossing it in extract() to setup our ENV
[16:57:06] But hey, details details
[16:57:06] :p
[16:57:12] well there is that for sure
[16:57:49] I actually thought a bit about how to make it write out php source files to include so the bytecode cache would kick in
[16:57:59] and then I started thinking about doing that on tin
[16:58:09] and then I decided I was way in the weeds :)
[17:04:25] bd808: tempnam + file_put_contents + rename + unlink the tmp file is way cleaner.
[17:04:28] Patch incoming
[17:07:34] It's kind of funny that file_put_contents messes up the LOCK_EX semantics. Somebody should fix that
[17:07:52] what the hell is the point of locking after truncation?
[17:11:13] Hehe
[17:12:44] this is the problem with working on FLOSS projects all the time. You start to think it's your job to fix all the broken crap everywhere
[17:13:13] "normal" devs just shrug and decide to make a class they use to hide the ugly
[17:21:41] PS2 up with the new way of renaming
[17:24:29] looks pretty good to me
[17:24:53] less lines, more safety, such awesome
[17:25:17] PROBLEM - Puppet failure on nodepool-t105406 is CRITICAL 66.67% of data above the critical threshold [0.0]
[17:29:50] bd808: rogue friday deploy?
[17:29:51] :)
[17:36:36] Deployment-Systems, Patch-For-Review, Wikimedia-log-errors: Cant unserialize( ) - https://phabricator.wikimedia.org/T103744#1445847 (demon) Open>Resolved Should be fixed now. It should've been pretty rare anyway so it's hard to see when it happened last, but if we see it h...
[17:37:31] bd808: On the subject of ^, the workboard for log-errors is a Nicely Groomed Backlog
[17:39:22] Deployment-Systems, Patch-For-Review, Wikimedia-log-errors: Cant unserialize( ) - https://phabricator.wikimedia.org/T103744#1445866 (bd808) Just for posterity, @demon and I chatted about this on irc and realized that the problem wasn't really with the write operation (although i...
[17:39:59] so many LQT bugs
[17:40:11] Yeah, which is why they got their own column
[17:40:13] lol
[17:40:31] "crap ain't nobody gonna fix"
[17:42:41] That being said, we've got a long tail of those 64 bugs that have nobody working on them
[17:42:53] But are ostensibly impacting prod to some degree
[17:43:08] hackathon next week...
[17:43:19] * ostriches isn't there :p
[17:43:32] Thoughts on https://phabricator.wikimedia.org/T102144, btw?
[17:44:18] I bet it is only for JIT/interpreter errors
[17:45:05] Also Aggregator is no longer a runtime option unless grep is failing me
[17:45:22] A lot of their wiki docs are old crap
[17:45:59] No hits for SleepSeconds and only hits for Aggregator are unrelated
[17:46:33] Yeah, not seeing it.
[17:46:54] PROBLEM - Puppet failure on integration-slave-trusty-1012 is CRITICAL 40.00% of data above the critical threshold [0.0]
[17:47:05] Beta-Cluster, HHVM, Wikimedia-log-errors: Investigate HHVM Aggregator functions that writes errors to a database. - https://phabricator.wikimedia.org/T102144#1445890 (bd808) Grepping the current HHVM codebase makes me think this is old config that has been removed from the product.
[17:48:04] Beta-Cluster, HHVM, Wikimedia-log-errors: Investigate HHVM Aggregator functions that writes errors to a database. - https://phabricator.wikimedia.org/T102144#1445894 (demon) Open>Invalid a:demon Agreed, we can't do this.
[17:48:23] ostriches: if you want to get elbow deep in hhvm, we need to backport https://github.com/wikimedia/mediawiki-php-wmerrors
[17:48:40] Or add equivalent functionality into hhvm core
[17:48:52] I thought we mostly had what we needed out of hhvm already. What's missing?
[17:49:13] they raise errors on fatals but by the time the handler is called the stack is unwound
[17:49:39] Ah gotcha
[17:49:50] there are a small percentage of cases where the stack is still good but most of the time it is just main -> errorhandler
[17:50:40] and of course the error handling code is completely different so "backport" is a euphemism
[18:07:07] ostriches: Is it you I ask for permission to do an emergency Friday deploy with greg-g out? VisualEditor urgent fix to work around an epic bug in Firefox 39 (since reverted in FF 40)…
[18:09:20] One of us...
[18:09:40] Link to gerrit change? I'll probably say yes but just want to glance at what you're doin' :)
[18:12:56] Browser-Tests, MediaWiki-extensions-OAuth: Add tests against beta to catch OAuth integration issues - https://phabricator.wikimedia.org/T78314#1446051 (Tgr) T105387 is another good example of something we should have a smoke test for.
[18:13:34] * ostriches keeps going on a log-errors patch spree
[18:16:54] RECOVERY - Puppet failure on integration-slave-trusty-1012 is OK Less than 1.00% above the threshold [0.0]
[18:27:19] ostriches: https://gerrit.wikimedia.org/r/#/c/224122/
[18:27:45] ostriches: Actual VE-core patch: https://gerrit.wikimedia.org/r/#/c/224054/
[18:36:49] looks fine, you guys deploying it?
[18:37:02] Yeah, Krenair can if you're OK with it.
[18:46:23] Beta-Cluster, Release-Engineering, Regression: Beta cluster logo broken (//static/images/project-logos 404 Not Found) - https://phabricator.wikimedia.org/T105541#1446146 (Krinkle) NEW
[18:48:53] Beta-Cluster, Release-Engineering, Regression: Beta cluster logo broken (/static/images/project-logos 404 Not Found) - https://phabricator.wikimedia.org/T105541#1446155 (Krinkle)
[18:58:14] James_F: Yeah no objections. I'm doing other shiz so if you guys are deploying that's even better :p
[18:59:37] Beta-Cluster: http://en.wikipedia.beta.wmflabs.org/static/images/project-logos/betawiki.png is missing (404) - https://phabricator.wikimedia.org/T105543#1446181 (Legoktm) NEW
[19:03:02] legoktm, isn't that what Krinkle just filed?
[19:30:20] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%)
[19:43:31] Krenair: ....yes.
[19:43:48] Beta-Cluster: http://en.wikipedia.beta.wmflabs.org/static/images/project-logos/betawiki.png is missing (404) - https://phabricator.wikimedia.org/T105543#1446256 (Legoktm)
[19:43:49] Beta-Cluster, Release-Engineering, Regression: Beta cluster logo broken (/static/images/project-logos 404 Not Found) - https://phabricator.wikimedia.org/T105541#1446257 (Legoktm)
[19:43:55] > These tasks will be merged into the current task and then closed. The current task will grow stronger.
[19:50:18] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[20:21:25] Beta-Cluster, HHVM, Wikimedia-log-errors: Investigate HHVM Aggregator functions that writes errors to a database. - https://phabricator.wikimedia.org/T102144#1446303 (hashar) So that is a left over from hiphop, the additional loggers got removed [[ https://github.com/facebook/hhvm/commit/0323a843a74137...
[20:28:54] (PS1) Dduvall: Create and cleanup TMPDIR for mw-selenium builder [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039)
[21:45:48] (CR) Hashar: [C: -1] "Yeah that would do :-) When preparing, you want to remove the tmp content entirely just to be sure, it might not always be garbage collec" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[21:46:54] (PS2) Hashar: Delete the ULS job [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158) (owner: Amire80)
[21:47:25] (CR) Hashar: [C: 2] "The job has already been deleted browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce" [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158) (owner: Amire80)
[21:47:50] Release-Engineering, Epic, Patch-For-Review, Tracking: Fix or delete browsertests* Jenkins jobs that are failing for more than a week (tracking) - https://phabricator.wikimedia.org/T94150#1446477 (hashar)
[21:47:52] Browser-Tests, MediaWiki-extensions-UniversalLanguageSelector, Patch-For-Review: Fix failed UniversalLanguageSelector browsertests Jenkins job - https://phabricator.wikimedia.org/T94158#1446474 (hashar) stalled>Resolved a:hashar Amir deleted the Jenkins job browsertests-UniversalLanguageSelec...
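Hashar's review note at 21:45 says to remove existing temp content entirely when preparing, because it is not always garbage collected. That fits a simple prepare/teardown pair. The actual builder lives in Jenkins Job Builder configuration and a slave script. This Python version is a hypothetical illustration only, and `prepare_tmpdir`, `teardown_tmpdir` and any paths passed to them are made-up names.

```python
import os
import shutil

def prepare_tmpdir(path: str) -> str:
    """Start each build from an empty directory, wiping leftovers from earlier runs."""
    # A previous build may have died before cleaning up, so remove everything first.
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path

def teardown_tmpdir(path: str) -> None:
    """Best-effort removal once the build has finished."""
    shutil.rmtree(path, ignore_errors=True)
```

A caller would export the returned path as `TMPDIR` for the test run (for example via `dict(os.environ, TMPDIR=...)` when spawning the browser tests) and call `teardown_tmpdir` in a `finally` block. Wiping on prepare as well as on teardown means a crashed build cannot leak state into the next one.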
[21:48:47] (CR) jenkins-bot: [V: -1] Delete the ULS job [integration/config] - https://gerrit.wikimedia.org/r/224034 (https://phabricator.wikimedia.org/T94158) (owner: Amire80)
[21:50:24] (PS2) Dduvall: Create and cleanup TMPDIR for mw-selenium builder [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039)
[21:52:18] (CR) jenkins-bot: [V: -1] Create and cleanup TMPDIR for mw-selenium builder [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[22:01:20] (PS1) Dduvall: Move teardown of MW-Selenium TMPDIR to slave script [integration/jenkins] - https://gerrit.wikimedia.org/r/224192
[22:04:43] (CR) Dduvall: [C: 2] Move teardown of MW-Selenium TMPDIR to slave script [integration/jenkins] - https://gerrit.wikimedia.org/r/224192 (owner: Dduvall)
[22:05:15] (Merged) jenkins-bot: Move teardown of MW-Selenium TMPDIR to slave script [integration/jenkins] - https://gerrit.wikimedia.org/r/224192 (owner: Dduvall)
[22:10:12] ostriches: did you see Aaron's comment on the atomic write patch? I looked into it and he's right that PHP has done a better job with the LOCK_EX handling since 5.2.5. It makes one wonder about other copy-pasta bits of the hhvm internals
[22:11:50] Quite a bit :p
[22:11:52] (PS3) Dduvall: Create and cleanup TMPDIR for mw-selenium builder [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039)
[22:13:07] Browser-Tests, Continuous-Integration-Infrastructure, Patch-For-Review: Define JJB builder for running a subset of integration MW-Selenium tests - https://phabricator.wikimedia.org/T103039#1446525 (dduvall) Running great with Chrome headless as well. https://integration.wikimedia.org/ci/job/mwext-mw-...
[22:13:44] (CR) jenkins-bot: [V: -1] Create and cleanup TMPDIR for mw-selenium builder [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[22:14:52] (CR) Dduvall: "I've updated the job and it works well." [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[22:19:42] * bd808 filed https://github.com/facebook/hhvm/issues/5657
[22:29:11] so say I wanted to test noc changes using noc.wikimedia.beta.wmflabs.org
[22:29:21] I know it serves from /srv/mediawiki/docroot/noc
[22:30:11] Which host should it be on, and how do I make labs point the domain to the right place?
[22:34:12] I was wondering if deployment-bastion would be acceptable - I think in pmtpa, noc.wm.o used to run from fenari?
[22:34:24] It clearly needs to be somewhere with a mw install
[22:36:34] https://phabricator.wikimedia.org/T83524 - yep, it was on fenari
[22:39:14] Krenair: deployment-bastion imho is the best suited host that does duplicate production
[22:39:34] as the only other option is mediawiki0[1-3] :)
[22:42:36] Actually I think it would need puppet changes :(
[22:51:05] :/
[22:59:21] Krenair: make a new host for it and we can easily add it to scap. I think it only needs MW to read the files right? It should be fine on a small instance
[22:59:40] yeah, just the files
[22:59:50] piling lots of crap on to the same vm just makes a mess for later
[23:00:30] Krenair: or you could add it to deployment-fluorine
[23:00:30] bd808: perfectly replicates production then :)
[23:00:40] I think we'd need to change operations/puppet to get it to work
[23:00:52] and get a new public ip for deployment-prep? :/
[23:11:48] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #191: FAILURE in 14 min: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/191/
[23:11:49] Krenair: wouldn't it just need varnish routing?
[23:17:34] bd808, I don't know
[23:18:11] beta cluster's vcl is a bit of a mess as I recall
[23:18:18] where is it?
[23:19:15] it should be in modules/varnish somewhere
[23:20:09] in operations/puppet?
[23:20:49] or on some local deployment-prep clone of operations/puppet?
[23:28:26] bd808,
[23:28:28] krenair@deployment-salt:/var/lib/git/operations/puppet$ grep beta.wmflabs.org modules/varnish/* -R
[23:28:28] krenair@deployment-salt:/var/lib/git/operations/puppet$
[23:30:47] Beta-Cluster, Release-Engineering, Regression: Beta cluster logo broken (/static/images/project-logos 404 Not Found) - https://phabricator.wikimedia.org/T105541#1446779 (Luke081515) I think all wikis are affected, at deployment is the favicon also broken.
[23:31:44] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[23:36:38] (CR) Dduvall: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/224179 (https://phabricator.wikimedia.org/T103039) (owner: Dduvall)
[23:37:23] Krenair: hmmm... I'd have to poke around a bit to remember how it is wired together.
[23:37:49] role::tlsproxy::ssl::beta is part of it
[23:38:20] but we don't even... sigh, ok
[23:38:29] heh yeah
[23:38:34] but we used to!
[23:40:09] and varnish is stuff like role::cache::text
[23:48:55] ugh
[23:48:58] the docs are completely outdate
[23:49:00] outdated*