[00:00:42] mediawiki/tools/schroot?
[00:01:19] operations/software/schroot, if it's not mediawiki-specific
[00:01:43] it's mediawiki specific
[00:02:09] it's basically the same as vagrant
[00:02:20] except in 150 lines of shell script
[00:02:45] oh, neat. yeah, mediawiki/tools, i guess
[00:03:30] I think my first priority for today is to fix enwiki which is apparently totally broken due to namespace corruption
[00:03:49] second is probably something to do with HHVM and XML
[00:04:13] T88361 isn't urgent is it?
[00:04:45] no, erik just sent an email saying early march is fine for this
[00:04:54] (i hadn't read it when i pinged you)
[00:39:59] so it looks like none of the invalid titles on enwiki can be resolved automatically
[00:40:33] is there a bug for this issue?
[00:40:54] never mind, got it
[02:54:50] apparently UDP logging doesn't work at all under HHVM?
[02:54:55] is this a known issue?
[02:55:48] we've had basically no API logs since Jan 18, and no XFF logs since January 25
[02:59:01] *rolleyes* "Disabled as spammy. Not often needed"
[03:00:16] not often... if you do happen to need it though, whoopsie!
[03:00:19] wtf does "spammy" mean?
[03:01:15] I mean, it's collecting a very specific kind of information and putting it into its own file, it doesn't force you to read it
[03:03:23] right, so ori made api.log be sampled
[03:03:45] I mean, we basically bought this server for storing unsampled API logs, sized it appropriately, but whatever
[03:07:31] looks like xff was https://gerrit.wikimedia.org/r/#/c/185923/ ... Reedy
[03:08:34] yes, I know
[03:17:31] TimStarling: hm? If you're referring to fluorine, it wasn't sized appropriately, because we kept getting disk usage alerts.
[03:17:57] you can read about it in phabricator in a minute
[03:18:34] OK
[03:20:40] IIRC, api.log would balloon to ~150-200 GB in size before getting rotated and compressed, and it happened on several occasions that logrotate was killed mid-compression, which left the uncompressed file in /a/mw-log/archive
[03:21:40] e.g.:
[03:21:42] 22:51 icinga-wm: PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 76224 MB (3% inode=99%):
[03:21:42] 22:53 _joe_: disk on fluorine again?
[03:21:42] 22:53 ori: we never merged the 1:1000 patch
[03:21:44] 22:53 _joe_: oh, my
[03:21:46] 22:53 _joe_: let's do it
[03:24:14] anyhow, at 1:1000 it's actually much more amenable to analysis using shell pipelines, but *shrug*
[03:24:30] * ori waits for phab.
[03:25:53] there's also:
[03:26:11] I think you could have each element in $wgDebugLogGroups be an associative array
[03:26:11] e.g.
array( 'sample' => 1000, 'target' => "$host:$port" )
[03:26:11] that would allow for more features to be added to it in future
[03:28:39] MediaWiki-Core-Team, operations: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1010498 (tstarling) NEW
[03:29:07] there you go
[03:29:35] MediaWiki-Core-Team, operations: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1010506 (tstarling)
[03:36:33] TimStarling: so, to recap: "we basically bought this server for storing unsampled API logs, sized it appropriately, but whatever" -- the procurement request explicitly says "we can make do with sampled logs"; the server was sized for 2MB/s at peak, whereas we regularly exceed that even with 1:1000 for api.log
[03:37:35] no, it was not sized for 2 MB/s
[03:38:15] there was no need to make do with sampled logs since we had adequate hardware for unsampled logs
[03:38:27] and we still do have such hardware, right?
[03:38:28] at one point in time
[03:38:37] you just saw a disk usage alert and disabled it
[03:38:47] not "disk on fluorine *again*"
[03:38:50] *note
[03:39:08] several additional log buckets have been added since then, some of them quite verbose
[03:39:11] you are saying that reducing the retention time is not possible?
[03:39:41] are you saying that the server was at 100% disk utilisation?
[03:39:49] disk I/O utilisation, I mean?
[03:39:58] no, it wasn't
[03:40:03] so what is the problem?
[03:40:25] are you saying that API logs are not useful?
[03:40:28] i'm saying that your description above paints a picture of me glibly turning off useful features for no rhyme or reason, whereas the reality is a little bit more nuanced
[03:40:53] do you have any ideas for isolating T87645 without API logs?
[03:41:43] * ori reads
[03:43:25] TimStarling: oxygen:/a/log/webrequest/edits.tsv.log
[03:43:53] not sampled. you're welcome! :)
[03:44:07] * ori goes home.
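A minimal sketch, in PHP, of what Tim's $wgDebugLogGroups suggestion above (03:26) could look like. The 'sample' and 'target' keys come straight from his example; the udp:// destinations, the group names, and the consuming function are made-up illustrations, not MediaWiki's actual wfDebugLog() implementation:

```php
<?php
// Hypothetical $wgDebugLogGroups structure: a plain string keeps the old
// behaviour, while an associative array allows per-group options.
$wgDebugLogGroups = array(
    // Unsampled group, written to a UDP log collector (e.g. fluorine).
    'exception' => 'udp://10.0.0.1:8420/exception',
    // Sampled group: keep roughly 1 in 1000 entries.
    'api' => array(
        'sample' => 1000,
        'target' => 'udp://10.0.0.1:8420/api',
    ),
);

/**
 * Illustrative consumer for the structure above (not the real wfDebugLog()):
 * resolves the destination and applies sampling before delivery.
 */
function debugLogSketch( $group, $message ) {
    global $wgDebugLogGroups;
    if ( !isset( $wgDebugLogGroups[$group] ) ) {
        return;
    }
    $conf = $wgDebugLogGroups[$group];
    if ( is_array( $conf ) ) {
        if ( isset( $conf['sample'] ) && mt_rand( 1, $conf['sample'] ) !== 1 ) {
            return; // dropped by sampling
        }
        $target = $conf['target'];
    } else {
        $target = $conf; // legacy plain-string config
    }
    // Delivery to $target (UDP packet, file append, ...) would go here.
}
```

Keeping plain strings working makes the change backwards compatible, which is presumably the appeal of the associative-array form: new per-group keys can be added later without touching existing configuration.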
[03:46:39] MediaWiki-Core-Team: Memcached class with relayed delete() and daemon - https://phabricator.wikimedia.org/T88340#1010539 (aaron) See https://gerrit.wikimedia.org/r/#/c/187074/ and https://github.com/AaronSchulz/python-memcached-relay/blob/master/mcrelayd.py
[04:19:23] ori: only API edits were affected, and only maybe 10% of them
[04:19:42] the theory is that there was a single API server that was affected
[04:20:11] to confirm that, it would be nice to know what the backends were for each of the affected edits
[04:20:19] frontend request logs do not have that information
[04:21:10] also that edit.tsv.log only has log entries where action=edit appeared in the URL, but most bots POST all their parameters, including action
[04:38:15] TimStarling: IIRC individual api servers should have unfiltered apache request logs in /var/log/apache2.log{.1,.2,..}
[04:39:47] they have /var/log/apache2/other_vhosts_access.log
[04:41:08] but it is still only URLs, not POST data
[04:41:45] oh, and the hostname is omitted, heh
[04:42:10] makes things interesting
[04:46:42] we have binlogs
[04:49:45] but they don't have server names
[05:03:51] there's nothing characteristic in hhvm.log as far as I can see
[05:05:16] MediaWiki-Core-Team, MediaWiki-Configuration, MediaWiki-Vagrant: Update MediaWiki-Vagrant's role system to use extension registration - https://phabricator.wikimedia.org/T86990#1010606 (Legoktm) p:Triage>High
[05:05:45] MediaWiki-Core-Team, Wikimedia-General-or-Unknown: Existed pages without ability to reach and obviously wrong namespace - https://phabricator.wikimedia.org/T87645#1010607 (tstarling) a:hashar>tstarling
[05:41:21] anyone want to comment on https://gerrit.wikimedia.org/r/#/c/188304 or should I just self merge it?
[05:41:27] e.g. ori
[05:42:11] would it be useful to have a dedicated log bucket for this, at least for the time being?
[05:42:50] you mean log and also throw an exception?
[05:43:12] I don't want to let the request continue, the idea is to stop the data loss
[05:43:38] I thought it would be easy enough to grep the exception log
[05:43:41] yeah. it has a slightly better chance of catching someone's eye than a few lines in exception.log. anyways, up to you.
[05:43:41] right.
[05:43:58] anyways, reviewing
[05:49:29] do you have any theory as to what is causing it?
[05:50:26] Jenkins is completely bonkers
[05:51:46] well, if Language::lc() returned garbage then that would probably be sufficient
[05:52:32] it's not CDB, because the namespaces are often canonical
[05:52:45] and canonical namespaces are configured, not loaded from CDB
[06:00:44] MediaWiki-Core-Team, operations: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1010638 (Joe) Honestly I don't see the point in creating a godzilla 130 GB file every day. A correct way to tackle this is probably rotating the file more often than daily, and keep 7 days of retentio...
[06:01:41] I could spend at least another 15 minutes looking at this without getting bored, so if you don't mind waiting, please do. But it looks sane enough if you want something in place quickly. The only things I can think of are to trigger_error or throw a plain exception, just in case there is some try / catch clownishness going around, and to potentially include some additional datapoints in the log message to make the issue easier to reconstruct
[06:01:42] (for example, the value of Title::newFromText( 'User:Foo' )->getNamespace())
[06:25:54] MediaWiki-Core-Team, operations: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1010663 (tstarling) >>! In T88393#1010638, @Joe wrote: > Honestly I don't see the point in creating a godzilla 130 GB file every day. If we knew what HHVM server(s) were involved in T87645, that would...
[11:42:34] MediaWiki-Core-Team, Wikimedia-General-or-Unknown: Existed pages without ability to reach and obviously wrong namespace - https://phabricator.wikimedia.org/T87645#1011122 (TTO)
[14:44:39] MediaWiki-Core-Team, Wikimedia-General-or-Unknown: Existed pages without ability to reach and obviously wrong namespace - https://phabricator.wikimedia.org/T87645#1011488 (Anomie) >>! In T87645#1010605, @tstarling wrote: > enwiki rev_id 645295812 (the last in your list) happened on Feb 2 at 12:08, so it may b...
[15:03:07] MediaWiki-API, MediaWiki-Core-Team: redirecttitle in API:Search returns title object instead of text - https://phabricator.wikimedia.org/T88397#1011519 (Anomie) a:Anomie
[15:24:53] MediaWiki-API, MediaWiki-Core-Team: redirecttitle in API:Search returns title object instead of text - https://phabricator.wikimedia.org/T88397#1011581 (Fomafix) The code was added in ce9bd769 in Nov 1 2010. Certainly it was intended to return only the title instead of the entire object. But for the last 4 ye...
[16:48:09] anomie: I've got a honey-do for you from mobile if you have some time. Review https://gerrit.wikimedia.org/r/#/c/185323/ as a possible solution for https://phabricator.wikimedia.org/T86955
[16:50:45] * anomie will look in a minute
[16:50:58] cool. not a drop everything sort of deal
[16:51:12] more a "if you can get to it this week"
[17:03:59] gerrit dashboard of doooom! -- http://tinyurl.com/core-gerrit
[17:23:19] <_joe_> legoktm: can one user start the process of account merging from testwiki?
[17:23:48] <_joe_> if so, or if he/she installs the chrome extension ori created, the debug host has the new package
[17:24:03] <_joe_> so you can verify if the XOR patch really fixed the problem
[17:25:26] MediaWiki-API, MediaWiki-Core-Team: redirecttitle in API:Search returns title object instead of text - https://phabricator.wikimedia.org/T88397#1011856 (Umherirrender) FYI: This does not affect format=xml because it calls toString for objects (internal reasons).
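A sketch of the kind of guard discussed above at 05:42-06:01: log extra datapoints, then stop the request rather than let a corrupted namespace cause further data loss. The actual change is the Gerrit 188304 patch; everything here except the core MediaWiki calls (Title::newFromText(), MWNamespace::exists(), wfDebugLog()) is hypothetical, including the function name and the 'badnamespace' log group:

```php
<?php
// Hypothetical guard illustrating the discussion above; not the real patch.
function assertSaneNamespace( Title $title ) {
    $ns = $title->getNamespace();
    if ( !MWNamespace::exists( $ns ) ) {
        // Extra datapoint to make the corruption easier to reconstruct:
        // what does a known-good title resolve to right now?
        $probe = Title::newFromText( 'User:Foo' )->getNamespace();
        wfDebugLog( 'badnamespace', "Invalid namespace $ns for '" .
            $title->getPrefixedText() . "'; User:Foo resolves to $probe" );
        // Throw a plain SPL exception rather than MWException, in case of
        // try/catch clownishness higher up; the point is to stop the write.
        throw new RuntimeException( "Invalid namespace $ns" );
    }
}
```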
[17:28:22] MediaWiki-API, MediaWiki-Core-Team: redirecttitle in API:Search returns title object instead of text - https://phabricator.wikimedia.org/T88397#1011859 (Legoktm) Open>Resolved
[17:30:15] MediaWiki-Core-Team: MediaWiki multi-datacenter investigation and work - https://phabricator.wikimedia.org/T88445#1011867 (aaron) NEW a:aaron
[17:34:14] MediaWiki-Core-Team: MediaWiki multi-datacenter investigation and work - https://phabricator.wikimedia.org/T88445#1011896 (bd808)
[17:34:38] MediaWiki-Core-Team: MediaWiki multi-datacenter investigation and work - https://phabricator.wikimedia.org/T88445#1011899 (bd808) p:Triage>Normal
[17:35:30] _joe_: thanks, I'll test it in a bit
[17:38:51] MediaWiki-API, MediaWiki-Core-Team: redirecttitle in API:Search returns title object instead of text - https://phabricator.wikimedia.org/T88397#1011912 (Anomie) Announced to mediawiki-api-announce, see https://lists.wikimedia.org/pipermail/mediawiki-api-announce/2015-February/000077.html
[17:43:50] MediaWiki-API, MediaWiki-Core-Team: redirecttitle in API:Search returns title object instead of text - https://phabricator.wikimedia.org/T88397#1011928 (Krenair) Sounds like a similar thing to T45518 - but with a Title object rather than another User.
[17:47:58] geez, legoktm, you have a lot of open patches
[17:49:13] :/
[17:52:43] bd808: so for the uber-optimized-autoloader, we first need to patch multiversion and then we can update the rest of them?
[17:53:35] I think so. patch multiversion, then phpunit, then mw/vendor, then start using it
[17:53:54] this is a pain in the ass part of how the composer autoloader works
[17:54:16] MediaWiki-API, MediaWiki-Core-Team: redirecttitle in API:Search returns title object instead of text - https://phabricator.wikimedia.org/T88397#1011960 (Anomie) Fortunately the Title object output doesn't contain sensitive data.
[17:54:28] Every vendor repo ships the base class and the first one loaded at runtime wins
[17:55:13] I guess I should have guarded things with a check to see if the base class supported the method
[18:00:51] heh, mid air conflict
[18:04:25] bd808: bahhhhhh https://integration.wikimedia.org/ci/job/operations-mw-config-tests/12734/console
[18:04:34] I think phpunit needs to be first
[18:07:59] or we could just convert that repo to use composer based testing...
[19:25:22] MediaWiki-Core-Team, Librarization: Make MWException handle non-MW exceptions better - https://phabricator.wikimedia.org/T76652#1012311 (Nemo_bis) Hm. https://gerrit.wikimedia.org/r/#/q/topic:kill-mwexception,n,z
[19:25:41] MediaWiki-Core-Team, Librarization: Make MWException handle non-MW exceptions better - https://phabricator.wikimedia.org/T76652#1012313 (Nemo_bis)
[21:27:53] MediaWiki-Core-Team, MediaWiki-extensions-CentralAuth, SUL-Finalization: Expose users_to_rename table publicly - https://phabricator.wikimedia.org/T76774#1012795 (Legoktm) a:Legoktm
[21:40:06] AaronS: i would like to review your python redis pub -> memcached proxy, what's the best way to do that? github issues / pull requests?
[21:40:53] I prolly added a bunch of bugs with my last commit, I need to test that at the office
[21:41:14] * AaronS planned on being in by now
[21:41:15] also, you know about the udp proto, right? http://www.facebook.com/note.php?note_id=39391378919
[21:41:19] * ori|away too
[21:41:25] ok see you in the office then :P
[21:41:54] * AaronS want to wrap up the first past of a cassandra stash class
[21:41:58] sessions!
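A hedged illustration of the autoloader problem bd808 describes above at 17:54-17:55: every vendored repo (multiversion, phpunit, mw/vendor, ...) ships its own copy of Composer's ClassLoader base class, and whichever copy is require'd first at runtime wins, so it may predate a method a caller wants. Guarding with method_exists() avoids a fatal. The file layout is an assumption, and setClassMapAuthoritative() stands in for whatever newer method was actually at issue:

```php
<?php
// vendor/autoload.php returns the ClassLoader instance it registers.
// If several repos each bundle their own copy of the class, the first
// one loaded at runtime defines it for everyone.
$loader = require __DIR__ . '/vendor/autoload.php';

// Guard: the (possibly older) base class that won may not have this
// method yet, so check before calling it.
if ( method_exists( $loader, 'setClassMapAuthoritative' ) ) {
    $loader->setClassMapAuthoritative( true );
}
```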
[21:43:10] past/pass
[21:43:34] ori|away: you can look at my non-wip MW core comments
[21:43:42] some bagostuff cleanups
[22:25:01] * bd808 is trying to remember why we have a public and a members only mailing list
[22:25:28] * bd808 and why he apparently randomly chooses a different one each time he sends mail
[22:30:15] mw
[22:31:58] If I type "core" the public one is chosen and if I type "mwcore" the private one is
[22:32:09] * bd808 shakes fist at gmail
[22:35:34] You were tipped 0.13 XPM for your commit on Project wikimedia/mediawiki. Please, log in and tell us your primecoin address to get it.
[22:35:34] Your current balance is 0.56 XPM. If you don't enter a primecoin address your tips will be returned to the project in 30 days.
[22:35:36] fuuuuuuuuuuuuuuuuu
[22:35:53] _joe_: ^^
[22:36:28] <_joe_> shit, for realz?
[22:36:45] more primecoin nonsense?
[22:36:49] <_joe_> you got 0.56 funbux, Reedy, cheers!
[22:37:16] srsly
[22:37:24] Reedy: only 0.56? I had more than 2 before I made them stop emailing me
[22:37:33] :P
[22:37:42] https://github.com/sigmike/prime4commit
[22:37:47] hmm, can't open issues
[22:38:07] there should be a "stop emailing me" link in the email you got
[22:39:12] yeah
[22:39:25] but we should get wmf projects removed
[22:39:32] http://prime4commit.com/projects/208/tips
[22:39:41] or just make more commits
[22:39:47] lol
[22:39:49] * Reedy slaps bd808
[22:40:48] could you dos them by making too many commits?
[22:41:41] probably not. interestingly they claim each commit gets 1% of the remaining balance
[23:00:22] Reedy: One of the designers is working on new 404 & 500 page templates. He's asking for help figuring out how to test his designs. Any tips on how to "easily" set up a wiki using multiversion and our docroot for testing?
[23:01:21] I think 404's are handled by w/404.php and wmf-config/missing.php, right?
[23:01:44] can't you just view them in a browser?
[23:02:51] hmm... that shouldn't be too hard to do for w/404.php at least
[23:02:58] I'm a huge fan of https://s3.amazonaws.com/f.cl.ly/items/313T0Z342P360I0J1L0d/Wikipedia%20is%20Down!.jpg
[23:03:10] bahha
[23:03:12] bahaha
[23:03:27] https://phabricator.wikimedia.org/M15
[23:03:56] I do like the "roan's on the case" text a lot
[23:15:43] MediaWiki-Core-Team: Devise caching (memcached) strategy for multi-DC mediawiki - https://phabricator.wikimedia.org/T88492#1013160 (aaron) NEW a:aaron
[23:17:19] MediaWiki-Core-Team: Devise stashing strategy for multi-DC mediawiki - https://phabricator.wikimedia.org/T88493#1013167 (aaron) NEW a:aaron
[23:20:49] seems like every 2-3 years, we go through a cycle of:
[23:20:51] MediaWiki-Core-Team, Wikimedia-General-or-Unknown: Existed pages without ability to reach and obviously wrong namespace - https://phabricator.wikimedia.org/T87645#1013187 (tstarling) One possibility for a cause would be Language::lc() or another function in the same area of the code returning garbage, say due...
[23:20:59] 1. Replace the 500 error page
[23:21:13] 2. Have a 500 storm
[23:21:55] 3. Find out that the new 500 page causes the 500 storm to be way worse
[23:23:21] I don't recall the last time this happened, but it may be long enough ago that the institutional knowledge of what's fine and what's bad might not be there.
[23:25:26] it should be possible to use images on the error page
[23:25:56] as long as they use upload.wikimedia.org as a source, not data: URIs
[23:26:15] if upload is down, the images won't load, which is fine
[23:26:54] you can't really use linked CSS, or have a large style tag in the page, so that is a limitation
[23:27:43] similarly with JS, it's possible to load code from the bottom of the document, but not really from the <head>
[23:49:39] Are data URIs bad because of the increased bandwidth?
[23:50:27] These are generally probably good tips that should be given on https://phabricator.wikimedia.org/T76560 and the subsequent code reviews for proposed changes
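To make those constraints concrete, here is a minimal sketch of a self-contained error page in the spirit of wmf-config/missing.php; the file name, wording, and image path are placeholders, not the real templates being designed for T76560. Styles stay inline and small, the only external resource is an image on upload.wikimedia.org (no data: URIs), and any script would go at the bottom of the document rather than in the <head>:

```php
<?php
// Hypothetical standalone error page. Everything is inline so the page
// stays small and has no dependency on the (possibly broken) application
// servers; if upload.wikimedia.org is also down, the image simply won't
// load, which is fine.
http_response_code( 500 );
header( 'Content-Type: text/html; charset=utf-8' );
?>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Wikimedia Error</title>
<!-- Small inline style block only: no linked stylesheets. -->
<style>body { font-family: sans-serif; text-align: center; }</style>
</head>
<body>
<!-- Placeholder image path; the real page would reference an actual logo. -->
<img src="https://upload.wikimedia.org/path/to/logo.png" alt="Wikimedia">
<h1>Error</h1>
<p>Our servers are currently experiencing a technical problem.
   Please try again in a few minutes.</p>
<!-- Scripts, if any, belong here at the bottom, not in the head. -->
</body>
</html>
```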