[00:02:15] tgr: can you try a https://meta.m.wikimedia.org login on mw1017. anomie and I have authed successfully [00:02:30] hashar_: uh, good luck :( [00:02:37] outage debugging at 1AM is not fun [00:03:54] bd808: anomie: yup works now [00:04:06] although I tried commons [01:48:02] TimStarling: Sigh. The xbox server can be used in a process-local fashion (in which case the 'server' is simply another thread), or the client and server can be discrete servers that pass messages using HTTP. The former works fine with or without the output streaming patch. The latter segfaults with or without the output streaming patch. [01:52:33] ahhhh nope! Got it! [02:13:58] TimStarling: OK, so failure is reproducible. Try running /srv/output_streaming/run-test.sh on hhvm-test.hhvm.eqiad.wmflabs . Repro code also posted at . [02:14:27] do you think you'd have time to look into that? [02:15:18] yes, I can look at it [02:16:00] yay [04:48:26] bd808: can I make logstash use UTC? [04:48:47] where by logstash I mean kibana [04:48:51] well... no [04:49:06] it's a bug in the way they use javascript in the time pickers [04:49:29] things will show in UTC but when you set a date it's set in the browser's tz [04:49:47] really f'ing annoying [04:51:59] I guess you can by setting your laptop to UTC :/ [04:56:10] is it possible to magic a magnifier icon onto the uid field which looks for all messages with that uid? [04:56:36] btw is it worth the effort to report kibana issues / write patches? [04:56:43] how far are we behind master? 
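The kibana time-picker bug described above comes down to a timezone mismatch: timestamps are displayed in UTC, but a date typed into the picker is interpreted in the browser's local timezone. A minimal Python sketch of the conversion involved (function name and offset handling are illustrative, not Kibana code):

```python
from datetime import datetime, timezone, timedelta

def to_utc(naive_local, utc_offset_hours):
    """Interpret a naive timestamp in the user's local timezone (as the
    buggy picker does) and convert it to UTC, which the log store uses."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=local_tz).astimezone(timezone.utc)

# A user at UTC-8 types "2016-02-11 00:00" into the picker; the query
# actually starts at 08:00 UTC, not at midnight UTC as displayed.
picked = datetime(2016, 2, 11, 0, 0)
print(to_utc(picked, -8))
```

Setting the laptop to UTC, as suggested above, works precisely because it makes the two interpretations coincide.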
[04:56:50] we are using an ancient version [04:56:56] like really ancient [04:57:05] the modern ui is totally different [04:57:22] and the last time I tried it, it was really broken, but that was months ago [04:58:14] if you open a log entry so you can see the uid field there should be a magnifier picker next to it [04:58:34] which is really just adding "uid:" to the query [04:59:53] yeah, but ideally it should add that and drop everything else [05:00:18] that's not very ideal for refining a query [05:00:25] but I get your use case [05:03:00] current upstream is 4.4.0 (just released) and we are on 3.1.1 + maintenance patches from the 3.0 branch [05:05:10] while we are at it, thanks for all your work to improve logging :) I like to nag but I don't even want to think about how the same workflow would look with a 10G logfile on fluorine [05:05:27] :) yw [05:05:45] I think we are making some progress [05:07:08] tgr: a lot faster [05:07:42] flatfiles are a lot faster? [05:07:59] RoanKattouw_away: re: getting logged out on officewiki, can you check if you have an officewikiToken cookie set? [05:10:41] I think so; I think kibana's interface has a long way to go before it is as flexible and powerful as shell pipelines and grep. [05:11:05] If you have some very specific values you need to extract from a large log, ElasticSearch will retrieve it quicker, because a flat file is not indexed. But most log analysis does not involve indexed operations. It involves filtering. [05:12:02] ori: to an extent it probably depends on how proficient you are with the shell [05:12:13] agreed. no ui is going to beat a good user of awk/grep and a shell pipeline. Kibana is more accessible. 
Like an IDE for logs [05:13:01] but even really simple things like "filter for log lines containing word X" are very painful as soon as the data you log is even slightly complex [05:13:05] but I don't see a giant shared collection of python and awk scripts to do common log filtering and analysis [05:13:09] as in, can contain line breaks [05:13:16] usage took off in the last year, I see a lot of people using it [05:13:45] tgr: grep '\bword\b' [05:14:06] that gives you line 13 of a 35-line log message [05:14:16] grep -C10 '\bword\b' [05:14:31] that will cut off both ends of the data [05:15:07] yeah but you're not copying values into excel, you just want to see the entry [05:15:10] plus it will be full of garbage from other log events if the number of lines per filtered log event is not constant [05:15:24] indeed [05:15:36] ori: what would it take to get a "small" go program deployed in prod? [05:15:44] seeing 20 lines of a 30-line var_dump is not guaranteed to be helpful [05:15:45] it would take a "small" miracle [05:15:50] heh [05:15:55] shouldn't be too hard [05:16:07] is this your logstash CLI thing? [05:16:10] that looked really handy [05:16:10] yeah [05:16:24] here's something I did with it recently -- https://phabricator.wikimedia.org/P2544 [05:17:04] ggml? good god, much log? [05:17:12] go get my logs [05:17:29] this is HN material, you should post it [05:17:36] it looks slick [05:17:55] I'm sure a real go dev will puke all over it [05:19:10] I'd like to get tailing working with it better, but E_TOOMANYPROJECTS [05:19:28] it's funny how the exact same features that Google baked into Go to make it easy to deploy code [05:19:33] make it complicated to deploy here [05:19:50] yeah. 
static linking can be gross [05:20:02] so are grumpy debian people [05:20:07] <_< [05:20:27] and the way that go does(n't) handle dependency pinning worth a damn is gross too [05:20:31] it's a real problem, i guess [05:20:47] (static linking) [05:20:59] it certainly can be [05:21:33] one nasty bug in a widely used lib and you have to recompile your whole cluster [05:22:09] anyways, it is generally annoying that a python utility script can be just checked into the puppet repository and deployed in minutes, while an equivalent Go program has to go through packaging purgatory [05:22:20] this actually looks useful enough to package [05:23:24] _joe_ packaged etcd IIRC [05:23:34] so maybe there's a good template to follow [14:41:09] bd808, tgr: Heh. If you look at https://graphite.wikimedia.org/render/?width=1000&height=600&_salt=1455047738.237&lineMode=connected&target=MediaWiki.edit.failures.session_loss.count you can see exactly when elukey adds or removes servers from the pools as recorded on https://wikitech.wikimedia.org/wiki/Server_Admin_Log. [18:24:18] anomie: this graph is climbing fast -- https://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Memcached+eqiad&h=mc1001.eqiad.wmnet&jr=&js=&v=523764096.000000&m=used_memory&vl=bytes&ti=used_memory [18:24:31] seems like it's probably not us but worth figuring out [18:24:36] tgr: ^ [18:24:46] ask on _sec [18:24:55] in case elukey can explain it in terms of the work he has been doing [18:25:30] bd808: I'm not sure I'd call it fast, and it might be due to all the redis messing-around that goes with upgrading all the servers that has been going on today. [18:26:03] yeah. poking elukey [18:30:31] bd808: or https://gerrit.wikimedia.org/r/#/c/263566/ which was just deployed? [18:30:47] at a glance it does not use memcached but I haven't looked closely [19:05:14] SMalyshev: https://www.mediawiki.org/wiki/Manual:External_libraries [19:17:40] bd808: Re the redis memory graph, maybe this one is more useful? 
https://ganglia.wikimedia.org/latest/stacked.php?m=used_memory&c=Memcached%20eqiad&st=1455218004&host_regex=&r=week [19:18:05] at least it's more colorful ;) [19:18:18] taste the rainbow! [19:18:30] * bd808 gets lunch [19:44:20] anomie: good point about things other than strings/ints in session data. Any shiny idea about how to handle that reasonably? [19:44:57] If we were logging right there instead of later on the logger would take care of that [19:45:10] maybe that would be easier to implement? [19:45:25] Yeah, maybe we should just do that. [19:45:32] we'd end up double logging possibly but that doesn't seem horrible [19:46:08] Could always set an expando on the exception to say "I already logged this" [19:48:15] Does SessionProvider know what Session is being worked with? [19:48:29] Not unless it's passed in [19:50:04] bd808: Another option would be to just do $exception->logContext = array( ... ) or something like that [19:50:14] hmmm... [19:51:08] That would be less clunky with a custom subclass [19:51:33] easy enough to add [19:51:53] I think I like that better than passing a session around needlessly [19:58:14] bd808: just use NormalizerFormatter::format? [19:58:28] a slight abuse of class responsibilities but meh [19:59:01] it's not guaranteed to exist [19:59:07] monolog is optional [19:59:22] oh yeah, the whole legacy thing still exists [19:59:38] yeah. I was going to kill that off in 1.28 [19:59:42] (wouldn't it be about time to kill it?) [19:59:49] ah, ok [20:00:27] make monolog the default PSR3 logger and revamp the config system for it [20:00:33] not a trivial project [20:01:02] * bd808 should actually write up a task/rfc on that [20:01:37] now that we have traits we could have a LoggerAwareException trait for exceptions which take $msg, $context instead of just $msg [20:01:44] and then add that to all the things [20:02:04] Most of the things are PHP's built-in exceptions, though. Can you add a trait to them? 
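The $exception->logContext idea being discussed, an exception that carries structured context so the catch-site logger can emit it, translates readily to other languages. A hedged Python analogue (class and attribute names are invented for illustration; this is not MediaWiki code):

```python
class ContextualError(Exception):
    """An exception that carries structured context for the eventual
    logger, analogous to the $exception->logContext idea above."""

    def __init__(self, msg, context=None):
        super().__init__(msg)
        self.log_context = context or {}

def log_record(exc):
    """Build a log record at the catch site. Plain exceptions still
    work; they just contribute no extra context."""
    record = {"message": str(exc)}
    record.update(getattr(exc, "log_context", {}))
    return record

try:
    raise ContextualError("session metadata merge failed",
                          {"session_id": "abc123", "provider": "CookieSessionProvider"})
except Exception as e:
    print(log_record(e))
```

Keeping the context on the exception rather than passing a Session around mirrors the preference voiced above, and typed catch statements keep working because it is still an ordinary exception subclass.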
[20:02:14] a trait would have to add a new method [20:02:25] well, if you throw them you can subclass then [20:02:29] *them* [20:02:43] if something else throws them, you won't have context anyway [20:04:33] bd808: traits can define static methods so something like MyExtension::throwWithContext( $msg, $context )? [20:04:51] fancy [20:04:57] although Daniel would probably demand an exception factory for that [20:05:22] he does like his deep hierarchies and builders [20:06:25] could make a generic "AnnotatedException" class that wraps another exception and adds stuff too [20:06:42] but that won't play nicely with typed catch statements [20:06:58] (python could do it with meta programming magic) [20:07:07] in the weeds though [20:07:15] making MetadataMergeException [20:07:33] * anomie notes you could do it in objective C too, as useless as noting that is [20:08:30] you can totally do it in PHP [20:08:42] just eval classes on demand like phpunit does [20:11:55] train's out; do we wanna backport the IP logging stuff? [20:13:09] have you been able to test it "for real" yet? [20:13:30] no, beta is down [20:13:30] beta cluster being broken is not fun today [20:13:41] mw1017 it? ;) [20:14:59] +1 [20:15:33] * bd808 blames PHP5.5 for all of the world's problems [20:17:22] * anomie checks php.ini for the "global warming" setting [20:20:10] I have been blaming Zend 5.5 for the last 30 hours or so as well :-D [20:20:50] :(( sorry hashar [20:21:21] seems like we may not have made a proper list of blockers before +2'ing some things [20:21:32] well [20:21:55] I totally missed the announcement myself or I would have '''boldly''' asked for a week delay [20:22:06] but hey, we will catch up ;-} [20:27:07] blame jetlag. 
legoktm wasn't thinking about all the moving parts in beta cluster and ci [20:27:26] didn't dawn on me until things started melting either [20:28:57] wmf.13 spiked the heck out of this graph -- https://graphite.wikimedia.org/render/?width=1000&height=600&_salt=1455047738.237&lineMode=connected&target=MediaWiki.edit.failures.session_loss.count [20:29:09] seems to have settled right down [20:30:31] Probably everyone on enwiki; the switch from non-SessionManager to SessionManager loses the backend session data [20:31:15] Non-CentralAuth sessions without "remember me" checked would also have been logged out, if we still have non-CentralAuth people around. [20:31:24] tgr: can you check loginwiki and see if there is anything funky about a local user named "HERBBBBB1"? Lots of "autoCreateUser: failed with message Username entered already in use. Please choose a different name." for that account [20:32:07] I think those are common on loginwiki [20:32:29] bd808: Is that the thing at the bottom of our wmf.13 etherpad? [20:32:30] looking closer there are a bunch of them [20:32:49] bd808: https://phabricator.wikimedia.org/T70012 ? [20:32:53] anomie: looks like it is, yes [20:33:25] bd808: looks like your average enwiki registration [20:33:47] tgr: *nod* seems to match https://phabricator.wikimedia.org/T70012 [20:34:21] here's a view of them happening -- https://logstash.wikimedia.org/#dashboard/temp/AVLSCnrcptxhN1Xa_ut4 [20:34:46] autocreated by script, then all the edge login threads try to autocreate it, probably [20:34:49] *by job* [20:35:30] that normalize_message idea of yours was genius by the way tgr [20:35:54] trending boards are so much clearer for high variance messages [20:36:25] it's harmless but in theory failed autocreations should blacklist further autocreations in the session [20:37:21] of course that's a different session for each autologin wiki... 
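The normalize_message idea credited to tgr above, collapsing the variable parts of a message so that high-variance log lines group together on a trending board, can be sketched in a few lines of Python. The placeholder spellings below are my own guess, not the actual implementation:

```python
import re

def normalize_message(msg):
    """Replace variable fragments (hex ids first, then decimal numbers)
    with fixed placeholders so messages aggregate by their template."""
    msg = re.sub(r'\b0x[0-9a-fA-F]+\b', '<hex>', msg)
    msg = re.sub(r'\b\d+\b', '<num>', msg)
    return msg

print(normalize_message("autocreate failed for uid 52273 after 3 tries"))
# -> autocreate failed for uid <num> after <num> tries
```

With structured PSR-3 logging the message template and its parameters arrive separately, which makes this kind of retroactive regex cleanup unnecessary, as noted in the exchange that follows.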
[20:38:13] It's probably racing past that just like it's racing past the "User::idFromName( $user->getName(), User::READ_LATEST )" check, anyway. [20:38:51] heh, I spent some time in $job-2 trying to do that retroactively [20:39:36] with regexes like "\w+" -> and \d+ -> [20:40:01] that was before PSR3 got popular [20:40:41] yeah. I had a very long set of log message transforms for analysis at $DAYJOB-1 too [20:42:21] we could just disable autocreation on loginwiki [20:42:35] as there is a job for it anyway [20:43:25] but that would probably blow something up with CA autologins [20:43:50] does this error cause an autologin fail? [20:46:29] maybe on a user save failure we should just see if that account is attached and treat it as a successful autocreation in that case? [20:52:47] thanks bd808 :) [20:52:51] Since the User object returned by the session would still have an ID of zero (unless User::addToDatabase() changes it?), the User isn't going to be seen as logged in and so the Special:CentralAutoLogin/checkLoggedIn is probably going to fail. But then when they go visit the wiki the CA auto-login script should trigger and log them in, so it works out in the end. For pre-SessionManager, it certainly worked like that and no one has thought T70012 
[20:53:01] Reedy: yw :) [20:54:47] tgr, anomie: this graph is diverging again -- https://graphite.wikimedia.org/render/?width=1000&height=600&from=-5days&target=timeShift%28MediaWiki.site.users.rate%2C%2228d%22%29&target=MediaWiki.site.users.rate [20:55:41] it may be a time shift artifact, but it may not be [20:57:48] https://graphite.wikimedia.org/render/?width=1000&height=600&from=-5days&target=timeShift%28MediaWiki.site.users.rate%2C%2242d%22%29&target=MediaWiki.site.users.rate [20:57:56] this is a pretty good match [20:58:16] 49d, too [20:58:38] some thursdays seem to have lower account creation numbers [20:58:49] there is no such variance in the other days [21:00:45] that said, going back 15 weeks there are maybe 3 with a similar trend than today, so I wouldn't bet on it being an artifcat [21:01:24] kk. just being super mindful :) [21:03:07] the bump on https://ganglia.wikimedia.org/latest/stacked.php?m=used_memory&c=Memcached%20eqiad&st=1455218004&host_regex=&r=week seems to be a normal thursday thing too when you zoom out to month view [21:06:11] anomie: the change in account creation counts suggests that this [autocreate on 1x1 edge login] did not happen before [21:06:21] I agree it's not a huge problem, if that's all [21:09:42] ori: to confirm, we're going to mass convert all the array() -> [] in one patch? if so, can you +2 https://gerrit.wikimedia.org/r/#/c/269612/ ? [21:19:45] bd808: One doc question on https://gerrit.wikimedia.org/r/#/c/270003/ [21:20:35] grr, yes [21:20:43] legoktm, do you have any other concerns in https://gerrit.wikimedia.org/r/#/c/267391/ ? 
[21:21:26] anomie: amended [21:22:23] MaxSem: you probably want to add the .gitattributes file that prevents the useless files from being exported by github [21:23:22] yes, please [21:23:46] * bd808 checks to see if that is well documents [21:23:48] *ed [21:32:03] {{done}} [21:43:05] legoktm: I added a note about .gitattributes to https://www.mediawiki.org/wiki/Manual:Developing_libraries#Packagist_guidelines [21:43:18] please fix up any lies I may have included [21:45:19] bd808: I added a few more rules https://www.mediawiki.org/w/index.php?title=Manual%3ADeveloping_libraries&type=revision&diff=2047265&oldid=2047262 [21:45:33] all libraries except for cdb have "tests" [21:49:39] oh that wacky cdb [21:51:27] anomie, tgr: reports of Huggle3 problems. no details yet [21:51:53] bd808: Looking like connectivity issues from parts of Europe rather than anything MW-related? [21:53:21] I think they are 2 separate issues. [21:53:35] didn't we have complaints about huggle3 on the first roll out? [21:53:46] like maybe they needed to clear cookies or something? [21:53:53] * bd808 looks for notes [21:54:20] Huggle is a big pile of .NET stuff right? [21:55:17] No idea. [21:55:51] I think it was rewritten [21:55:57] C++ maybe [21:56:19] I do know petan is quick to jump to wrong conclusions at times, FWIW. [21:56:29] https://en.wikipedia.org/wiki/Wikipedia:Huggle [21:56:33] C++, python, apparently [21:58:04] bd808: old Huggle had problems, current one didn't [21:58:08] "ERROR: api.php responded with unknown result: WrongToken" [21:58:16] the .NET one is AWB I think [21:58:20] huggle3 users are getting that [21:58:24] http://wm-bot.wmflabs.org/logs/%23huggle/20160211.txt [22:01:28] huggle3 is cpp? [22:02:50] We all know Petr likes rewriting things [22:04:24] WRONG_TOKEN is a session token mismatch error [22:04:42] maybe just dump cookies and start over? 
[22:06:16] The awful, awful, awful code at https://github.com/huggle/huggle3-qt-lx/blob/master/huggle/login.cpp#L1133-L1151 shouldn't be broken by anything in wmf.13... [22:08:16] < petan> bd808: Fetching a token via action=login is deprecated. Use action=query&meta=tokens&type=login instead. XML sucks [22:11:36] bd808, why blacklist composer.json? what if the package has a dependency? [22:12:07] MaxSem: that's handled outside of the zip file that is downloaded [22:12:37] a copy of your composer.json with a version number inserted comes from Packagist [22:19:06] * bd808 tries to build huggle3 locally [22:19:57] MY TRIGGER FINGER IS ITCHYYYYYYY! https://gerrit.wikimedia.org/r/#/c/269612/ [22:54:47] bd808: Sigh. https://phabricator.wikimedia.org/T126680#2020894 [22:55:09] *bam*! [22:56:05] uh, I wanted to flag that but totally forgot :( [22:56:08] I'm testing that fix locally [22:56:08] heh, I read earlier this week [22:56:10] "After an hour of reading through docs and researching URL encoding specs, it dawns on me. + is an encoded whitespace." [22:56:30] after spending some minutes figuring out how to do a curl test with the new tokens [23:00:11] it's purposefully built to trip up bots which don't do URL encoding exactly per spec [23:00:25] that makes sense for edits but for e.g. logins not so much [23:01:09] https://i.imgflip.com/z1fpb.jpg [23:02:33] ori: lol, and that's not Haskell's fault in any way [23:02:42] well, which urlencode? [23:02:51] What tgr said is right, our edit tokens have both + and \ in them for this exact reason [23:03:10] e.g. 
the JS 'escape' function turns it into +%5C [23:04:09] Well exactly, and if you then use that in POST data, it breaks [23:04:23] indeed [23:04:42] but that level of strictness is only necessary for roundtripping wikitext [23:04:52] not for CSRF tokens in general [23:05:11] well, we should at least send out an announcement [23:06:40] tgr: yeah that sounds like a good idea [23:06:59] I threw up a PR for huggle3 -- https://github.com/huggle/huggle3-qt-lx/pull/172 [23:07:14] anomie, fixedthatforyou https://i.imgflip.com/z1ga0.jpg [23:07:23] hahahaha [23:07:25] well done [23:07:40] bd808, tgr: I note it was already mentioned that the tokens now contain "+" and "\" in https://lists.wikimedia.org/pipermail/mediawiki-api-announce/2016-January/000102.html [23:07:49] * anomie wanted to type "\\" there [23:08:37] anomie: it's there but that doesn't shout at people "Y U NO USE URLENCODE!?" [23:08:58] tgr: As for "which urlencode", https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1 in this case. [23:09:14] quiddity: ha! [23:09:27] anomie: I know [23:09:36] question is, how many API users don't? [23:09:59] About as many as don't know how to handle deleted cookies? [23:10:21] possibly [23:10:36] so IMO it doesn't hurt to spell it out in another email [23:11:28] * anomie is out for the evening, or at least for a little bit [23:11:37] o/ anomie [23:17:59] I wonder what's up with all the 'Session "..." requested with mismatched UserID and UserName cookies.' messages [23:18:37] they are all SUL renamed users, but shouldn't their cookies have been corrected long since? [23:19:29] my suspicion is the old system allowed having a post-SUL ID and pre-SUL username cookie and the new one does not [23:22:37] legoktm: do you have a SUL-renamed test account maybe? [23:27:25] tgr: yes, one sec [23:27:59] tgr: Lego-test~testwiki, do you want me to log in as it? 
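The WrongToken failures being diagnosed here reduce to exactly this encoding trap: the new tokens contain '+' and '\', and in application/x-www-form-urlencoded data a literal '+' decodes to a space unless the client percent-encodes it. A quick Python demonstration (the token value is made up):

```python
from urllib.parse import quote, unquote_plus

token = "abc+123\\"  # hypothetical token containing the troublesome '+' and '\'

# Correct client: percent-encode the value before putting it in POST data,
# so the server-side form decoding round-trips it intact.
encoded = quote(token, safe="")
print(encoded)
assert unquote_plus(encoded) == token

# Broken client: concatenates the raw token into the form body. The server
# decodes '+' as a space, the token no longer matches, hence WrongToken.
assert unquote_plus(token) == "abc 123\\"
```

This is the same trap the JS 'escape' function falls into, since escape() leaves '+' unencoded while percent-encoding '\'.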
or I can give you the password; I have it set up so that one password logs in as the global Lego-test, and the other password triggers the login as ~testwiki [23:36:27] legoktm: can you check whether you can log in with it using Lego-test as the username? [23:36:52] or you can give me the password so I won't bother you with requests :) [23:37:33] tgr: I logged in as Lego-test, and it logged me in as Lego-test~testwiki and sent me to Special:SulRenameWarning [23:40:54] that seems to work fine [23:43:50] bd808: where do logs go by default on deployment-fluorine? wfDebug.log? [23:48:02] tgr: ... probably? [23:48:46] that's the giant file there so I'd say yes [23:49:14] ah, yes, good point