[00:00:30] you export the data in some structured way in core/some extension and log it in WikimediaEvents [00:00:36] so there is a schema [00:01:29] tgr: https://dpaste.de/xBkN ? [00:03:02] *nod* I see the utility in what you are proposing tgr, but I wonder if it will ever really be used well. The $topics you picked in various patches look to be pretty indicative of what you were expecting to do with the data [00:03:06] ori: you can check the full set of patches at https://gerrit.wikimedia.org/r/#/q/status:open+branch:master+topic:authmetrics,n,z [00:03:46] and I'm thinking more and more that all we really need are some whisper files of trends [00:03:51] implementing that with a bunch of local stats calls which then should be kept in sync across projects is rather messy [00:04:10] rather than rich data in mysql for EL [00:04:29] I think bd808 is right, tbh. [00:04:32] which is why I did not use EL [00:05:23] what do you want, at the end of the day? how are you planning on using this data? 
[00:05:46] we want to have before/after data for changing AuthPlugin to AuthManager [00:05:58] make a bunch of graphs which show changes in the frequency of various auth events and get alerts when they drop [00:06:03] so we can ensure that if we break shit we notice and react [00:06:11] getContext()->getStats() [00:06:13] that's what you want [00:06:28] again, that's what https://gerrit.wikimedia.org/r/#/c/205869/ does [00:06:32] there are readymade puppet resources for alerting when graphite metrics deviate [00:07:20] yeah but it depends on adding an abstraction to core which we're not sure about [00:07:35] and regardless of the status of that patch, we've been instrumenting such events in core, and would want to instrument logins anyhow [00:07:37] I just don't like making those calls directly from core because at the application level you have somewhat structured data and it should be the logging framework and not the application which makes the call whether to destructure it [00:08:02] well, that's why you have hooks! [00:08:13] but hooks give the logging framework more power than is appropriate, in your view [00:08:22] which i sympathize with, and why i suggest the mapEvent thing above [00:08:46] hooks are very hard to change, in part because they are implemented poorly [00:09:00] what information to log typically changes a lot [00:09:21] you can usually get a context object from one of the variables you are passed [00:09:29] and collect whatever bits of state are interesting to you [00:09:43] plus adding a new hook adds to the mental burden of anyone who will try to find out which hook is right for them [00:10:04] but adding a whole new abstraction doesn't add to the cognitive burden of core users / developers?! [00:10:54] not much as long as it has a clear purpose and it is easy to decide whether you are interested in it [00:11:30] I personally like the explicit over implicit approach for this.
If we want/need structured data then I think what I built is the least hideous thing. If we just need increment() then we should just use the stats object directly [00:12:13] it sounds like today all we really need are stats calls [00:12:30] and my patches can wait for a day when that is not the case [00:13:10] tgr is arguing for design beauty in a swamp of expedient solutions [00:14:06] I think ori is arguing YAGNI and I just want to go eat dinner [00:14:09] I think this might be all you need: https://dpaste.de/SYFC [00:14:16] https://gerrit.wikimedia.org/r/#/c/205869/3/WikimediaEventsHooks.php has view logic in it [00:14:28] how to derive key names from events [00:14:43] that would get some data but not all I think [00:14:48] core seems like an awkward place for that [00:15:19] counting all hook usage would be nice [00:15:31] it used to happen and it was removed a couple of months ago [00:15:34] yeah, would you mind if i submit that regardless of this conversation? [00:15:38] I assumed that was for perf reasons [00:15:52] I would be glad to see it happening again [00:16:08] but it would not cover most of the auth metrics use cases [00:16:31] some things don't have hooks, some things have hooks but the arguments matter [00:17:07] ori: my last app stack instrumented practically every method call to graphite so + from me if the backend can handle it [00:17:11] (on that note, having counters again for API module usage would also be very nice) [00:17:16] I think that your reservations are largely on-point [00:17:45] but I don't think the alternatives you are proposing match your standard of coherence and minimalism [00:18:10] at least that's my drive-by view, I can legitimately be called out for flip-flopping on this [00:20:18] one possible argument for wfTrack() is that we could figure out how to defer calling the hook until post render [00:20:48] so that whatever happens under the hood doesn't impact time to first byte (by much) [00:23:30] getStats()->...
processing is already deferred [00:23:50] *nod* but EL hooks aren't [00:23:50] that's a valid point but a weak one IMO [00:23:56] yeah but the wrapper could get that [00:24:20] EventLogging::mapHook() would ensure the callback: [00:24:27] * gets params by copy rather than by reference [00:24:32] * does not have its return value propagated to the hook [00:24:37] * gets executed at request shutdown [00:26:54] when a hook gives you all the data you need, you just have to subscribe to that hook, no need for a new abstraction layer in that case [00:27:23] EL::mapHook would be a nice convenience function to avoid boilerplate subscription code, in that case [00:27:40] and it is a lot easier to change something that is not allowed to influence the behavior of the code; worst case you break some stats. Reasoning about the effect of changing a hook tends to be much harder [00:27:46] and would solve for that ^ as well [00:28:25] well, that's not really a problem when a hook already exists [00:28:28] not if you need new data inside the hook handler which I think was his point [00:29:49] fair point [00:30:08] when there is no existing hook and we want to avoid hook proliferation, or there is a hook but it doesn't pass all the data you need, or is not called reliably, then we either use an abstraction or hardwire what kind of logging we are going to do into the code [00:30:19] an abstraction seems natural to me [00:30:25] well, or add a hook [00:30:53] granted, getStats() is already an abstraction, but not necessarily the right one [00:31:21] I'm not totally persuaded but I don't think you're crazy or anything [00:31:23] what you say makes sense [00:32:09] I just think the ugliness of the hook system is the product of extensive usage by lots of people over a number of years, so of course alternative abstractions look cleaner and more specific by comparison [00:32:25] for example, if a small third-party wiki wants to log this stuff, they probably want to put it into the DB as
statsd/graphite is an extra effort to set up [00:32:28] but talk to me after they get used and abused for a decade and see if it's any cleaner than the system you were hoping to supplant [00:33:15] that's a fair point [00:34:06] I think the hook system should be revamped at some point and wfTrack should be killed then as well [00:34:29] but we needed something that could be done within a short time [00:35:41] personally I would still prefer abusing the logging system over abusing the hook system for stats like these :) [00:35:59] that's possible [00:36:27] our logstash setup could turn well structured log messages into graphite stats [00:36:38] but it would be a bit of a pain to configure [00:36:46] as the logstash config is in puppet [00:36:54] we just don't have a good way for extensions to interact with the logging setup [00:37:45] yeah adding a handler to a log channel is core functionality [00:38:20] if we switched to monolog as the core abstraction we could change that [00:38:24] well, it doesn't necessarily have to go into logstash, monolog could handle it, but then the configuration would go to operations/mediawiki-config which is still not so great [00:38:28] but at the cost of other flexibility [00:39:07] monolog really wouldn't know how to do the "view logic" that you did with https://gerrit.wikimedia.org/r/#/c/205869/3/WikimediaEventsHooks.php [00:39:21] that's a log formatter [00:39:28] it would require a custom log handler [00:39:50] which could live in WikimediaEvents which would be fine if WME could add log handlers/formatters [00:40:32] that wouldn't be too hard to add to MonologSpi [00:40:49] IMO that would be the cleanest; have some convention to blackhole channels with, say, a certain prefix and then extensions can override [00:40:59] ++ [00:42:23] if blackholing is clever enough to have a very low overhead, and we encourage developers to use $log->info generously, that could be a nice setup [00:42:59] in prod our default handler is a null handler
so the overhead is pretty low [00:43:31] we could probably look at the flame graphs to see that [00:44:02] we'd know if it wasn't because Ori would have come after me with a big stick when I switched everything over to monolog [00:44:35] the monolog layer processes everything wfDebug() and wfDebugLog() see [00:46:44] ok, so new and improved game plan is to make MonologSpi easier to interact with programmatically I guess [00:47:04] sounds right to me [00:47:57] and then we can use LoggerFactory::getInstance( 'something' )->info( 'blah', array( .... ) ); to record interesting events [00:48:48] and use WME or wmf-config to attach handlers to those channels that feed graphite [00:49:03] Nice [00:55:18] I'll work on the MonologSpi changes tomorrow. They should be pretty trivial actually [00:55:57] and we already have an accessor to grab the Spi via LoggerFactory::getProvider() [00:59:47] o/ night all [01:00:02] good night [07:38:14] TimStarling: hi! do you have any opinions on https://gerrit.wikimedia.org/r/#/c/221306/ since you were involved in the original discussion? [10:52:05] I'll have a look [13:39:48] Ghah, Phab's boards turned ugly. And broke my greasemonkey script that fixes the columns. [15:16:39] anomie: .aphront-multi-column-fixed .phui-workpanel-view & .phui-workpanel-view .phui-header-shell seem to be the new magic selectors needed to squish the columns [15:18:57] bd808: Yeah, I figured that out too [15:25:40] squish? [15:26:10] make the columns narrower [15:27:02] for me 220px vs the 300px default [15:27:43] https://github.com/bd808/userscripts/blob/gh-pages/phabricator.user.css [15:29:02] * greg-g has to install greasemonkey, will do later [15:29:06] (requires restart) [15:30:29] greasemonkey, ABP and noscript are the triumvirate of making the web usable [15:48:30] bd808: I've found it works well enough just setting the widths to 100%, makes it fit to the window.
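The game plan settled on above — log events through a named channel, blackhole everything by default, and let WikimediaEvents or wmf-config attach a real handler that feeds graphite — can be sketched outside MediaWiki in plain PHP. `ChannelRegistry` and its method names below are hypothetical stand-ins for what MonologSpi/LoggerFactory would provide, not actual MediaWiki APIs:

```php
<?php
// Sketch: channels are blackholed (null handler) by default, so calling
// $log->info() generously is cheap; an extension overrides one channel
// with a handler that actually records the event.

class ChannelRegistry {
    private $handlers = [];

    public function setHandler( $channel, callable $handler ) {
        $this->handlers[$channel] = $handler;
    }

    public function info( $channel, $message, array $context = [] ) {
        $handler = $this->handlers[$channel] ?? null;
        if ( $handler === null ) {
            return; // blackhole: no handler attached, overhead stays low
        }
        $handler( $message, $context );
    }
}

$registry = new ChannelRegistry();
$lines = [];

// WikimediaEvents-style override for a single channel:
$registry->setHandler( 'authevents', function ( $msg, $ctx ) use ( &$lines ) {
    $lines[] = $msg . ' ' . json_encode( $ctx );
} );

$registry->info( 'authevents', 'login', [ 'status' => 'success' ] );
$registry->info( 'somethingelse', 'ignored', [] ); // no handler, dropped
```

The point of the default-null design is the one made in the log: the cost of an unconfigured channel is a hash lookup and an early return.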
[16:31:55] SMalyshev: http://facebook.github.io/graphql/ [17:01:40] hoo|away: do the stewards have a policy set up for using the global merge tool? [17:05:46] Niharika: hangout? [17:53:41] legoktm: Not as far as I'm aware [17:53:46] We talked about that [17:53:58] and people had some ideas [17:54:12] but nothing that would qualify as policy or even best practice came out, I think [17:55:26] hoo: ok, I talked with DerHexer and we're going to do a production test of Yuvipanda & YuviPanda at 20:00 UTC, will you be around at that time? and then i'll turn it off afterwards [17:56:04] I probably will be [17:56:08] but only with one eye [17:56:16] am not at home atm, but still watching the train [18:45:50] 144 Unknown HTTP status code 200 OK in /srv/mediawiki/php-1.26wmf9/includes/libs/HttpStatus.php on line 100 [18:45:57] bd808: wmf9? [18:46:15] demon@tin ~$ mwversionsinuse [18:46:15] 1.26wmf12 [18:46:16] uhhh wtf? [18:46:32] that's in fatalmonitor I take it? [18:46:37] Yep [18:47:34] ostriches: from mw1111 according to /a/mw-log/hhvm.log [18:47:44] Jun 12 10:02:53 mw1111: message repeated 143 times: [ #012Warning: Unknown HTTP status code 200 OK in /srv/mediawiki/php-1.26wmf9/includes/libs/HttpStatus.php on line 100] [18:52:27] Hmm [18:52:30] Out of sync? [18:52:56] ostriches: I'll run sync-common there [18:52:58] isn't wikidata or something on an old branch? [18:53:08] Krenair: SMW [18:53:09] oh, all from that server? [18:54:23] wikidata is on wmf12 anyway, ignore me: https://github.com/wikimedia/mediawiki-tools-release/blob/master/make-wmf-branch/config.json#L166 [18:55:53] ostriches: forcing the sync updated wikiversions.cdb [18:56:03] not sure what's up with that [19:37:12] ooooh, RobLa's in the office [19:37:22] so great to see him [20:09:14] OuKB: Awesome. [20:53:36] Keegan: we tried global merge, it blew up. [20:53:48] :( [20:54:08] legoktm: Do we know why yet? [21:04:35] Keegan: yes, two main issues.
1) CentralNotice tables don't exist on most wikis, which is patched and I'm going to backport 2) some crazy transaction thing that Aaron is looking into [21:05:34] All righty [21:09:17] bd808, ostriches: there's a bunch of 115 Notice: Array to string conversion in /srv/mediawiki/php-1.26wmf11/extensions/BetaFeatures/includes/HTMLFeatureField.php on line 171, which I fixed in wmf12...but aren't all wikis already on wmf12? [21:09:40] legoktm: Bunch right now? Or just a bunch in the last day? [21:11:27] bunch showing up in fatalmonitor [21:12:59] also https://gerrit.wikimedia.org/r/222486 is spamming logs [21:32:38] legoktm: hmmm... we had a host that had rsynced incredibly badly earlier [21:32:55] bd808: I looked at hhvm.log and it appeared to be multiple hosts [21:34:34] legoktm: what pathc was supposed to have fixed that? [21:34:39] *patch [21:34:59] bd808: https://gerrit.wikimedia.org/r/#/c/221778/ in ContentTranslation [21:35:03] which is in wmf12 [21:37:21] legoktm: could there be another beta feature with the same problem? [21:37:43] * legoktm greps [21:37:55] but we haven't had any other new BFs in a while so I doubt it [21:39:25] what is that horrible data structure supposed to be? [21:40:15] dict of arrays of arrays? [21:41:53] 'blacklist' is supposed to be a dict of arrays [21:41:57] er, dict of lists [21:42:08] {'msie': ['<', '12']} [21:42:41] there is another list in the extension.json { "msie": [ [ "<", 10 ] ] } [21:43:19] but that seems to be what the code expects [21:43:25] ugh [21:43:34] right, that seems correct [21:43:41] words are hard. [21:43:48] $browser => $versions and the $versions as $version [21:43:51] the extension.json looks correct. [21:44:25] I'm mainly confused why the warnings are referencing the wmf11 patch [21:44:27] path* [21:45:28] mwversionsinuse says only 1.26wmf12 [21:45:35] on tin [21:46:02] and terbium [21:47:00] are the log events new or just hanging out in fatalmonitor because nothing has pushed them off yet?
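The blacklist shape being puzzled over above — `{ "msie": [ [ "<", 10 ] ] }`, i.e. a dict of browser names, each mapping to a list of [ operator, version ] constraints — can be read like this. The function name is hypothetical; this is only a sketch of the intended semantics, not the actual BetaFeatures code:

```php
<?php
// Sketch: a browser is blacklisted if it matches any of its listed
// [ operator, version ] constraints. Uses PHP's built-in version_compare(),
// whose third argument accepts operators like '<', '>=', '=='.

function isBlacklisted( $browser, $version, array $blacklist ) {
    if ( !isset( $blacklist[$browser] ) ) {
        return false;
    }
    foreach ( $blacklist[$browser] as $constraint ) {
        list( $op, $limit ) = $constraint;
        if ( version_compare( (string)$version, (string)$limit, $op ) ) {
            return true;
        }
    }
    return false;
}

$blacklist = [ 'msie' => [ [ '<', 10 ] ] ];
var_dump( isBlacklisted( 'msie', 9, $blacklist ) );  // bool(true)
var_dump( isBlacklisted( 'msie', 11, $blacklist ) ); // bool(false)
```

The extra level of nesting (a list of constraint pairs rather than a single pair) is what allows several constraints per browser, which is why `{'msie': ['<', '12']}` and `{ "msie": [ [ "<", 10 ] ] }` are not the same shape.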
[21:47:15] hmm, might be that [21:47:20] the count is slowly going down [21:48:07] Jun 27 15:20:45 is the newest one and 18:30 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf12 [21:48:24] you just got tricked by the crappy ass fatalmonitor script [21:48:33] use logstash! [21:48:37] :| [21:48:39] also, [21:48:49] https://gerrit.wikimedia.org/r/222486 fixes the interwiki warning [21:49:08] yeah, tail and grep are your friend with those logs, don't trust fatalmonitor [21:50:15] thanks [21:50:26] fatalmonitor is just a summary of the last 1000 lines of /a/mw-log/hhvm.log [21:50:34] huh, are we really at the point where stuff gets pushed out of fatalmonitor within seconds? :P [21:50:37] that used to turn over pretty quickly [21:50:56] *not* getting pushed [21:50:58] Then we started dealing with errors in there? :P [21:51:04] yup [21:51:10] boooring [21:52:25] OuKB: it's holding about 2 hours of events these days in the last 1k lines [21:53:44] legoktm: https://gerrit.wikimedia.org/r/#/c/222486/ might be worth a backport [21:54:00] bd808: already deployed :) [21:54:05] heh [21:56:37] legoktm: :((( Jul 2 17:41:21 mw1072: message repeated 2 times: [ #012Notice: Array to string conversion in /srv/mediawiki/php-1.26wmf11/extensions/BetaFeatures/includes/HTMLFeatureField.php on line 171] [21:56:53] that just hit the log [21:57:02] but timestamp from long ago [21:57:31] some kind of nasty syslog buffering [21:57:37] or something [21:59:15] bah [22:43:22] those phabricator kids really can't decide how they want to format dates. There are 2 different date formats on my dashboard and neither of them matches my preference settings for showing a date. [22:44:19] "Mon, Jun 29, 23:25" and "May 27 2015, 22:33" shown and my pref says "ISO 8601 (2000-02-28)"
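The fatalmonitor pitfall described above — a fixed window over the last 1000 lines of hhvm.log, so stale errors linger until enough new ones push them out — can be sketched as a tally over a sliding window. `summarize` and the sample lines are made up for illustration; a real run would feed it `file()` output from the log:

```php
<?php
// Sketch: fatalmonitor-style summary of the most recent $window log lines.
// Because the window is line-count based, not time based, an old error
// stays in the summary until 1000 newer lines have arrived.

function summarize( array $lines, $window = 1000 ) {
    $recent = array_slice( $lines, -$window );
    $counts = array_count_values( $recent );
    arsort( $counts ); // most frequent message first, keys preserved
    return $counts;
}

$sample = [
    'Warning: Unknown HTTP status code 200 OK',
    'Notice: Array to string conversion',
    'Warning: Unknown HTTP status code 200 OK',
];
print_r( summarize( $sample ) );
```

This is why "the count is slowly going down" is consistent with the bug already being fixed: the entries are not new, they just have not been displaced from the window yet.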