[00:04:47] AaronSchulz: could i ask you to look into https://phabricator.wikimedia.org/T117679 "Link tables not being updated"? link table updates seem to have become much less reliable recently (oldest example on the bug is ~2 weeks).
[00:58:14] bd808: re: MatmaRex_'s request above, how would you feel about assuming responsibility for an initial investigation?
[00:58:58] ori: Like I had no idea where to start looking
[00:59:56] I know, but we have to break out of that eventually, right? The problem is self-perpetuating. I don't know -> Aaron would know -> delegate to Aaron -> Aaron learns a bit more, the rest of us stay clueless
[01:00:39] easy, clone Aaron
[01:00:43] I've got my "learn new things" queue full of hadoop and hive right now
[01:00:44] If you get stuck at least we know what needs to be documented better
[01:01:02] bd808: right, but you probably want to mix in something actually useful to the projects, right? ;)
[01:01:10] lol
[01:01:19] what's my KPI for that?
[01:01:45] IOUs from performance team
[01:01:54] we'd owe you one :P
[01:03:22] Right now I'm going to get offline and out of the chair I've been in for 9 hours
[01:03:34] fair, long day for everyone
[01:11:54] MatmaRex_: it's on the todo list, lots of stuff to get to. I didn't see any "article not found" errors in kibana for titles in T117332, so it's not just simple slave lag making the revision not be found for those new files.
[01:13:15] null edits fix it though, which just enqueues another job...so the problem must be with the original job not working somehow
[01:14:16] hmm, haven't seen that task before. is it likely to be the same issue? (would it be helpful to merge them?)
[01:15:20] probably the same
[01:15:39] there is some low hanging slave-lag fix "fruit" I fix though...not sure it would help but it should be done
[01:15:52] (e.g. switching to READ_LATEST if the expected rev is not there)
[01:16:00] *I can fix
[01:18:29] AaronSchulz: okay, i'll merge. mind if i copy what you said to the task?
[01:20:41] sure
[01:32:28] thanks
[04:31:05] bd808: Hm.. so anything going to plain wfDebug(), that is not logged in prod right now, right? except for testwiki
[04:32:03] Krinkle: correct
[04:32:21] Hm... then we better not use it for logging error/warning level stuff in old code
[04:32:32] yeah
[04:32:34] Noticed a few of them in MessageBlobStore.php which I'm refactoring
[04:32:42] and wondering if they never happen in prod, or just ignored
[04:32:48] I think the db layer has a bunch too
[04:33:26] I think we should actually try to get rid of all wfDebug() calls
[04:34:02] I'd like to get rid of wfDebugLog too but that is generally a bit easier
[04:44:24] TimStarling: I'd like to get in https://gerrit.wikimedia.org/r/#/c/252107/1 first then make a few tweaks (the override header and such) to the other patch
[04:45:39] ok
[04:48:28] * AaronSchulz spots a doc spelling typo
[04:49:59] fixed
[05:01:41] just figuring out your theory on why the cached value should increase monotonically
[05:03:45] I suppose it will, assuming threads are woken in order of the position they are waiting for
[05:07:41] * Krinkle may have found a bug that's been in MW since the initial SVN revision
[05:08:00] impressive
[05:08:16] https://github.com/wikimedia/mediawiki/blob/73d13883bfe0114c2451fb2ec31e203ec93efdd9/includes/diff/DairikiDiff.php#L742-L751
[05:08:25] TimStarling: I was gonna ask you. I can't seem to figure out a way to hit this code path.
[05:08:32] That is looping over an empty array, no?
[05:08:59] yep
[05:09:01] https://github.com/wikimedia/mediawiki/commit/d82c14fb4fbac288b42ca5918b0a72f33ecb1e69#diff-955cbc35eb632b1873d5de1c5f4e25e3R634
[05:09:04] but nothing ever calls it
[05:09:05] 2003 Lee Crocker
[05:09:12] yeah, it predates Lee though
[05:09:12] (import)
[05:09:41] you have read the credits? it is called DairikiDiff because it was by Geoffrey Dairiki
[05:10:00] Copyright © 2000, 2001 Geoffrey T. Dairiki
[05:10:12] so the bug probably dates from 2000 or 2001
[05:10:26] Hehe
[05:11:25] I'm also not sure if the logic in general holds up. Reversing the array doesn't seem like that would have the effect of reversing a diff.
[05:11:42] Though it does call reverse()
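
For reference, a minimal sketch of the pattern being discussed, assuming the linked DairikiDiff.php range is the Diff::reverse() method; this is a simplified stand-in, not the exact MediaWiki source:

```php
<?php
// Simplified stand-in for the Diff::reverse() pattern linked above
// (assumption: the linked lines look roughly like this).
class Diff {
	/** @var array list of DiffOp-like edit objects */
	public $edits;

	public function __construct( array $edits ) {
		$this->edits = $edits;
	}

	public function reverse() {
		// In PHP 5, "$rev = $this" copies the object handle, so $rev and
		// $this refer to the same instance...
		$rev = $this;
		// ...which means this line also empties $this->edits...
		$rev->edits = array();
		// ...so this foreach iterates over an empty array and the
		// reversed edits are never built.
		foreach ( $this->edits as $edit ) {
			$rev->edits[] = $edit->reverse();
		}
		return $rev;
	}
}
```

Cloning the object (or building a fresh Diff) before clearing the list would avoid the aliasing; since nothing ever calls the method, as noted above, the breakage has no visible effect.
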
[05:11:47] you know I ported this algorithm to C++ (probably not including reverse())
[05:11:51] and I did coverage tests on it
[05:11:59] wikidiff3?
[05:12:04] and there were a few functions in there that were apparently never reached
[05:12:05] Oh the C extension
[05:12:25] https://github.com/wikimedia/mediawiki-php-wikidiff2
[05:12:29] yeah, that
[05:12:59] Yeah, any reverse logic happens within mediawiki
[05:13:04] wikidiff2 goes straight to html serialisation
[05:13:07] maybe shiftBoundaries?
[05:13:30] some significant part of it anyway
[05:13:44] https://phabricator.wikimedia.org/T56328
[05:14:09] I've been looking to find a way to leverage the intraline level diffing from it into non-html diff formats.
[05:14:55] As I imagine it's still very much beneficial to reach out to C for diffing.
[05:15:01] should be easy enough
[05:15:45] the formatting code is of course separate from the diffing code
[05:16:18] Right. TableDiff.cpp extends Wikidiff2.cpp
[05:16:46] As I imagine it's still very much beneficial to reach out to C for diffing.
[05:16:48] it may not be
[05:17:04] you do get other advantages by being in C++
[05:17:07] i mean, C will be faster, but maybe not so much faster that it's worth maintaining an extension
[05:17:11] specifically libthai integration
[05:17:29] I'm somewhat confused about wikidiff2 vs wikidiff3 though.
[05:17:36] one being cpp, the other being in MW
[05:17:44] Is it just about gnu-diff3?
[05:18:46] names are hard
[05:19:29] wikidiff3 is not derived from wikidiff2, or anything else
[05:19:48] well, he says it is based on eclipse
[05:19:56] Wikidiff3 isn't even an instance of Diff
[05:20:21] he probably should have thought of a better name for it
[05:20:40] my wikidiff2 is in fact derived from wikidiff
[05:20:51] so that was a more sensible name
[05:21:13] although wikidiff to start with was a bit too vague
[05:21:36] Right. wikidiff and wikidiff2 are both C extensions to PHP.
[05:23:03] the interfaces appear compatible in how wikidiff (1) and wikidiff2 are called from MW
[05:24:09] TimStarling: Would you be interested in maybe (once the task settles a bit) on helping with wikidiff2 and exposing the word diff functionality through an array-based model of sorts?
[05:25:11] MobileFrontend seems to have re-implemented it in PHP for their MobileDiff special page.
[05:25:19] Aye..
[05:25:28] I guess I could
[05:26:47] class InlineDifferenceEngine extends DifferenceEngine {
[05:27:18] if ( function_exists( 'wikidiff2_inline_diff' ) ) {
[05:27:19] $text = wikidiff2_inline_diff( $otext, $ntext, 2 );
[05:28:08] so they haven't reimplemented it in PHP, MaxSem (I assume) has extended the C++ extension to support them
[05:31:29] Interesting https://github.com/wikimedia/mediawiki-php-wikidiff2/commit/6dedd2d77dccf8b81911e45cce065059920bf4a0
[05:31:53] That commit message focusses on two column vs one column
[05:33:23] But it seems like this is also HTML based
[05:34:12] I'm hoping maybe we can reduce it to a simple model and move the html wrapping back to PHP. I hope the iteration over the lines and wrapping is simple enough to do fast enough in plain PHP
[05:36:17] yeah, it is faster if you format on the C++ side
[05:37:04] g2g
[16:07:48] Need to do the same thing to 6 documents.... breaking up lines
[16:07:52] Fuck doing it manually
[16:39:46] Reedy: possibly relevant -- https://xkcd.com/1319/
[16:40:06] hahah
[16:40:15] I'm done with it now.. I think
[16:40:19] Unfortunately GIGO
[16:40:26] So we have missing data to begin with
[16:40:40] Because, you know, who needs part numbers on production drawings for parts used for gas meters
[16:41:28] And then, we won't be consistent anyway with descriptions
[16:45:55] I've got an UBN! logging issue that could use code review -- https://gerrit.wikimedia.org/r/#/c/251679/
[16:46:01] ori, Krinkle: ^
[16:47:50] Checking
[16:48:05] thanks
[16:48:10] Aye, it changed
[16:48:46] yeah I got fancier than PS1 proposed
[16:49:36] I started working on a Filter to make things UTF-8 safe and then realized that it was an edge case better handled in the Formatter on failure
[16:50:52] I don't feel good merging, but that's because I don't know the area very well.
[16:50:55] Looks good though.
[16:53:07] Krinkle: thanks. I can probably get ori or legoktm to merge if they agree
[16:53:47] also, you ask a lot of logging questions so I should try to make you feel more comfortable with the stack we use
[16:54:35] I suppose a good place to start would be to finally get around to documenting how WMF uses Monolog in prod
[16:56:55] Reedy: What was the outcome of enabling error channel (w stacks)
[16:57:06] I've not had chance
[16:57:08] Been busy most of the day
[16:57:31] Oh, I recall you having a patch
[16:58:37] Yeah
[16:58:45] Not had chance to enable it, monitor for 5 or so and disable again
[17:00:01] Reedy: Could also start by enabling it for mw1017 dynamically and leave it there (fluorine only)
[17:00:39] https://wikitech.wikimedia.org/wiki/Debugging_in_production#Accessing_mw1017_with_a_magic_header
[17:03:32] Or even let it go to logstash, why not. Something like channel error = array( enable => wfHostname === .. )
[17:57:24] bd808: reviewing
[17:57:40] oh, I saw this yesterday
[17:57:48] I still don't understand what about this problem is unique to mediawiki
[17:58:03] in other words, why is it not an upstream issue that should be fixed upstream?
[17:58:11] upstream made it a runtime error on purpose
[17:58:21] I'm not entirely sure why
[17:58:42] i suspect that they just don't have our volume of traffic so they don't run into crazy encoding issues as reliably as we do
[17:59:31] i'm not opposed to having a temporary workaround in our codebase if we can agree that it's temporary and at least start a convo with upstream
[17:59:52] csteipp: Ping for basic status on the SessionManager security review.
[17:59:55] or maybe this problem is specific to us or specific to mediawiki in a way i'm not seeing, in which case please explain
[18:00:17] ori: well, see https://github.com/Seldaek/monolog/issues/545
[18:00:32] they did it on purpose
[18:00:58] yeah, but that doesn't mean they thought it through
[18:01:03] "Would it perhaps not be better to throw an error in toJson()?"
[18:01:21] "no, it wouldn't, because such issues exist in the wild and there isn't much you can do about them; they are not an indication of an application error."
[18:01:27] * YuviPanda nervously waits for talk
[18:01:35] "deciding on a way to normalize them so they can be serialized would be a better approach"
[18:02:51] errors should be actionable, right? I don't see value in emitting errors because a remote user has a shitty browser
[18:03:44] again, my style is to come out swinging, but i'm not gung ho about this, if i am not seeing something here please let me know
[18:03:51] anomie: Should be done this week
[18:03:54] ok
[18:04:18] ori: I'm going to poke them about the decision upstream
[18:04:29] but I think we should fix now if we can
[18:04:29] thanks
[18:04:35] OK
[18:04:56] let me look it over one more time to see if there are any silly errors and i'll merge it if it looks good
[18:07:31] it's a shame json doesn't have a bytes type
[18:07:52] json is a shame
[18:08:11] you've been hanging out with tim too much :P
[18:08:59] lack of comments was a dumb design decision
[18:52:39] imma kill sync-dblist.
[18:52:46] It's useless now that we have them all in a single dir.
[18:53:19] jfdi
[18:53:37] I wrote a patch, just needs someone to accept
[18:54:07] https://phabricator.wikimedia.org/D43
[18:56:03] ewww, Differential sucks
[18:56:41] repository name is somewhere below and you need to remember all those crazy acronyms
[18:57:05] Those acronyms are going away soon
[18:57:19] TAaGAS
[18:58:06] (I like Differential, FWIW)
[19:40:06] ori, bd808: I wonder if we could move CDB development to Phab. It's not updated often and it'd be good to get at least one MW library in that pipeline soon.
[19:41:54] fine by me. AhoCorasick would be fine too -- I kept it around because it could be useful for people, but we didn't end up using it ourselves
[19:42:00] RunningStat too
[19:42:12] RunningStat we are using, but I'd be fine with it in Phab
[20:18:53] ostriches: sure. I'm not sure that CDB is going to see a lot of development though really
[20:20:42] ori: Upstream at monolog they think we should just wrap everything up in https://github.com/Seldaek/monolog/blob/master/src/Monolog/Handler/WhatFailureGroupHandler.php
[20:21:00] it's an error suppressing handler
[20:21:54] More thought about getting something moved so we can get composer and the vendor submodule, etc all working right with Phab as needed.
[20:23:07] bd808: seems Seldaek still hasn't considered your solution (to normalize/escape the encoding)
[20:23:14] propose that maybe?
[20:23:48] I guess that means I need to get it to work on php5.3 first :/
[20:24:09] * bd808 blames iconv integration bugs
[20:24:20] could just propose it and see if he thinks it's a good idea first
[20:49:01] ori: the other scrubbing thing I was thinking about was more along the lines of what the less command does. When it sees a byte that is invalid UTF-8 it outputs '<XX>' where XX is the hex value of the offending byte.
[20:49:29] that would be more reliable than the iconv stuff seems to be
[20:50:33] I think I could whip up a small library that just did that with reasonable performance
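
For reference, a rough sketch of the scrubbing idea described above: keep valid UTF-8 sequences and render every other byte as a visible hex escape, roughly the way less shows such bytes as <XX>. The function name and exact escape format are illustrative only, not an existing library:

```php
<?php
/**
 * Replace any byte that is not part of a valid UTF-8 sequence with a
 * visible "\xXX" escape, similar to how less renders such bytes as <XX>.
 * Illustrative sketch only; the function name is made up.
 */
function scrubInvalidUtf8( $s ) {
	// First alternative: one valid UTF-8 sequence (captured).
	// Second alternative: any single byte that did not fit the first.
	// No /u modifier, since the whole point is that the input may not
	// be valid UTF-8.
	$pattern = '/(
		  [\x00-\x7F]                        # ASCII, always valid
		| [\xC2-\xDF][\x80-\xBF]             # 2-byte sequence
		| \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
		| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte
		| \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
		| \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
		| [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
		| \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
	)|(.)/xs';

	return preg_replace_callback( $pattern, function ( $m ) {
		// Valid sequence: pass through unchanged. Stray byte: hex escape.
		return $m[1] !== '' ? $m[1] : sprintf( '\x%02X', ord( $m[2] ) );
	}, $s );
}

// echo scrubInvalidUtf8( "caf\xE9" ); // prints caf\xE9 (the bad Latin-1
// byte becomes four literal characters instead of breaking json_encode)
```

Escaping rather than dropping keeps log lines greppable, and unlike the iconv-based approach it does not depend on platform-specific iconv behaviour.
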
[22:06:37] AaronSchulz, hoo: do you know why there would be queued LocalRenameUserJob for over 24h that still haven't run yet? https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Zaher.Kadour is stuck, I manually ran the arwikisource one just now and it was fine...
[22:06:58] legoktm@mw1152:~$ mwscript showJobs.php --wiki=aswikisource --group
[22:06:58] LocalRenameUserJob: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
[22:07:50] hm
[22:07:59] will have a quick look
[22:08:37] I didn't see anything in exception logs that indicated that they had run once and failed
[22:08:57] (and by manually run, I mean runJobs.php --type=LocalRenameUserJob)
[22:11:31] the job has attempts=1
[22:11:37] So it failed before
[22:14:07] o.O
[22:14:25] hoo: where do you see that?
[22:14:42] legoktm: showJobs.php --list
[22:14:55] aha.
[22:15:03] okay, but shouldn't it have been re-queued then?
[22:15:27] It has
[22:15:35] Well, it's still in the queue
[22:17:21] so...the job queue is just behind? :/
[22:17:45] It will pick it up again at some point
[22:17:49] might take a bit though
[22:17:55] I'm not that into the details
[22:18:01] mwscript extensions/WikimediaMaintenance/getJobQueueLengths.php | grep aswiki
[22:18:03] gives nothing
[22:18:10] so it's not in the "queues with jobs" hash
[22:18:47] Oh, I see
[22:19:07] csteipp: https://gerrit.wikimedia.org/r/252339
[22:19:19] in theory, that eventually gets rebuilt by jobchron, maybe it could just be made less racey to begin with though instead of relying on that, which can take a while (though it shouldn't take days either)
[22:22:11] is it alright if I run a foreachwiki loop to clear out the individual LocalRenameUserJob queues? or should I leave it alone for debugging?
[22:23:18] AaronSchulz: ^
[22:36:52] legoktm: https://gerrit.wikimedia.org/r/#/c/252342/
[22:39:00] * legoktm learns about redis's WATCH
[22:41:16] AaronSchulz: I assume that won't retroactively fix this situation? You said jobchron was supposed to rebuild it though...
[22:41:33] I'm trying to rebuild the hash but now it's stuck empty
[22:41:37] * AaronSchulz will have to hack around more
[22:41:52] ok
[22:41:59] * legoktm posts a quick update on-wiki
[22:43:22] Thanks :)
[22:44:36] * AaronSchulz manually deletes the :lock key and rebuilds
[22:47:15] legoktm: ok, done, aswikisource is there now
[22:47:25] yay
[22:47:44] https://grafana-admin.wikimedia.org/dashboard/db/job-queue-health
[22:47:50] haha, the 0 was me, it will go up again ;)
[22:47:51] will the other wikis begin to just work, or will it require manual fixing?
[22:48:51] they should have their jobs popped now
[22:49:33] > There are no renames in progress for Zaher.Kadour. They may have already finished.
[22:49:35] yay, thanks :D
[22:50:08] cursive: ^
[22:51:33] Great, thanks AaronSchulz et al.
[22:56:03] legoktm: I bet lots of jobs on rdb3 (1007) didn't end up in the hash since jobchron had to be restarted manually today
[22:57:31] so the background script never added them (not sure why the notifyNonEmpty calls didn't work though, maybe the tried to pop() before mw was aware of 1007 since workers run for a min or two before exiting)
[22:57:38] *they tried
[22:57:56] ahhh
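
For reference, the failure mode debugged above is that a wiki's own queue holds a job while the cross-wiki aggregator (the "queues with jobs" hash) has no entry for it, so the runners never visit that wiki. A hedged sketch of how that might be confirmed from a maintenance or eval.php context, assuming the MediaWiki 1.26-era JobQueueGroup and JobQueueAggregator APIs:

```php
<?php
// Sketch only (assumes the ~MediaWiki 1.26 JobQueueGroup/JobQueueAggregator
// APIs as deployed at the time): compare what the wiki's own queue reports
// with what the cross-wiki aggregator believes, for the stuck job type.
$wiki = 'aswikisource'; // example wiki from the log above

$queue = JobQueueGroup::singleton( $wiki )->get( 'LocalRenameUserJob' );

// What is actually sitting in this wiki's queue.
var_dump( $queue->getSize() );          // queued jobs, e.g. int(1)
var_dump( $queue->getAcquiredCount() ); // claimed but not yet finished
var_dump( $queue->getAbandonedCount() );

// What the job runners consult when deciding which wikis to visit.
// If no (LocalRenameUserJob, aswikisource) entry shows up here while the
// size above is non-zero, the job sits unrun until the hash is rebuilt.
var_dump( JobQueueAggregator::singleton()->getAllReadyWikiQueues() );
```

A non-zero queue size with no matching aggregator entry matches the symptom above: nothing is broken in the job itself, and the job runs as soon as the hash is rebuilt (by jobchron or by the manual rebuild described in the log).
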