[02:48:27] Analytics-Kanban, Analytics-Wikimetrics: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1127355 (kevinator) Ah, I see the bug now. I had to download the file and open it in a text editor. My JSON viewer plugin in chrome dis... [14:02:03] ottomata: standup [14:02:08] sry :) [14:04:26] (CR) Milimetric: [C: 2] Add config to run funnel_failure_rates_by_type [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/197318 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [14:04:33] (CR) Milimetric: [V: 2] Add config to run funnel_failure_rates_by_type [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/197318 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [14:10:08] nuria: https://github.com/wikimedia/operations-debs-logster/blob/master/logster/parsers/SampleLogster.py [14:10:13] SampleLogster is a bad name, i guess [14:11:43] ottomata: got that, command runs fine like: /usr/bin/logster -o statsd --statsd-host=labmon1001.eqiad.wmnet:8125 --metric-prefix='wikimetrics' LineCountLogster /var/log/apache2/access.wikimetrics.log [14:11:52] ottomata: on wikimetrics staging [14:12:01] ottomata: but where are metrics going? [14:12:06] aye, just wondering if you'd want to report based on status code too, e.g. you probably only want 200s? [14:12:28] nuria: if you run that with -o stdout too, do you see the metrics? [14:13:19] ottomata: yes , [14:13:32] https://www.irccloud.com/pastebin/Wvr0STPM [14:14:20] ok, i guess check to see that the packets are going out then, or run witih the --debug flag and see what happens [14:14:22] you could do [14:14:29] sudo tcpdump port 8125 [14:29:49] Analytics-Dashiki, Analytics-Kanban, Patch-For-Review: Pageviews not loading in Vital Signs - https://phabricator.wikimedia.org/T90742#1128630 (mforns) To update private repo: sudo GIT_SSH=/var/lib/git/ssh git pull --rebase [14:31:45] Analytics, MediaWiki-extensions-ConfirmEdit-(CAPTCHA-extension): Provide a log of actions which trigger the CAPTCHA - https://phabricator.wikimedia.org/T43522#1128634 (He7d3r) [14:45:02] ottomata: ok, see stuff (with delay) on tcpdump... now, how do isee those metrics on graphite? [14:45:03] https://graphite.wmflabs.org/ [14:45:31] ottomata: do you know? cause they do not *seem* to appear [14:50:02] Analytics-Cluster, Analytics-Kanban, Performance: Cluster report that looks at x-Analytics header and extracts the date to calculate uniques. - https://phabricator.wikimedia.org/T92977#1128699 (kevinator) Is this task the same as {T88814}? [14:50:33] nuria: i dunno how it works in labs [14:50:48] but, afaik, statsd collects and aggregates stats over a minute and sends things to graphite [14:50:55] so, i guess, check if they are in statsd? [14:51:05] or, you could try sending them to graphite directly with the -o graphite flag [14:51:07] instead of statsd? [14:54:23] Analytics-Cluster, Analytics-Kanban, Performance: Cluster report that looks at x-Analytics header and extracts the date to calculate uniques. - https://phabricator.wikimedia.org/T92977#1128715 (Nuria) Let's see: Task T888814 has two parts: Part #1 VCL changes, code & deploy (https://phabricator.wikime... [14:55:47] ottomata: ok let me ask yuvi cause I assumed statsd is sending every one metric to graphite [14:56:05] Yuvi|reallyFood: lemme know when you rae back ... 
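A quick aside on the debugging above: what should show up in that tcpdump capture is the plain-text statsd line protocol, one metric per UDP datagram, in the form "name:value|type". Below is a hypothetical Java smoke test (not part of logster) that fires a single hand-rolled counter at the statsd host mentioned above so the network path can be confirmed independently of logster; the metric name is made up.

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;

    // Hypothetical smoke test: send one statsd counter by hand and watch it
    // leave the box with "sudo tcpdump -A port 8125".
    public class StatsdSmokeTest {
        public static void main(String[] args) throws Exception {
            // Made-up metric name; logster builds its own names from --metric-prefix.
            String payload = "wikimetrics.access_log.line_count:1|c";
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            InetAddress statsd = InetAddress.getByName("labmon1001.eqiad.wmnet");
            DatagramSocket socket = new DatagramSocket();
            try {
                socket.send(new DatagramPacket(bytes, bytes.length, statsd, 8125));
                System.out.println("sent: " + payload);
            } finally {
                socket.close();
            }
        }
    }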
[14:57:35] i don't have access to the statsd or graphite instances in labs :/ [14:57:38] not sure how to get to them [14:57:44] so i'm not sure how to debug further [14:59:03] Analytics-Cluster, Analytics-Kanban, Performance: Cluster report that looks at x-Analytics header and extracts the date to calculate uniques. - https://phabricator.wikimedia.org/T92977#1128737 (kevinator) This will change a little bit once the UA map is in the refined tables. For now, you can use the... [15:03:10] ottomata: no worries , i will ask YuviPanda [15:05:05] (PS6) Milimetric: Add Sunburst Visualizer [analytics/dashiki] - https://gerrit.wikimedia.org/r/197234 [15:05:08] (PS2) Milimetric: [Review but DO NOT MERGE] Begin funnel layout [analytics/dashiki] - https://gerrit.wikimedia.org/r/196489 [15:05:09] (PS1) Milimetric: Add rickshaw timeseries graph [analytics/dashiki] - https://gerrit.wikimedia.org/r/197590 [15:07:09] qchris: are we sure hdfs is the proper user to use for guard? [15:10:57] ottomata: It's the only user that we know that exists at that point. [15:11:04] Which one to use instead? [15:11:07] milimetric, the merge you did, did not work, because the gerrit changeset had a "false" non-merged dependency [15:11:21] checking [15:11:23] milimetric, I'm going to remove it and I ping you [15:11:44] I first had the stats user, but that does not ... oh ... now with the added dependency on statistics ... it does exist. [15:11:49] Would you prefer the stats user? [15:11:55] (PS3) Milimetric: Query and visualization for failure vs user analysis [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/195436 (https://phabricator.wikimedia.org/T91123) (owner: Mforns) [15:12:08] mforns: no don't [15:12:10] i'll just rebase [15:12:14] But the "refinery::" role doing things as "stats" ... that looks wrong too. [15:12:20] but mforns is the stacked bar chart thing all set? [15:12:21] milimetric, ok [15:12:28] milimetric, no [15:12:39] it is still blocked by the wrong data [15:12:40] right? [15:12:55] refinery doing things as hdfs also seems wrong [15:12:56] mforns: that's fine, let's merge it. Ultimately all the scripts are bad, because all the data is bad [15:12:57] milimetric, so, that's why I said that I would remove the dependency, [15:13:04] milimetric, ok [15:13:09] k, i'll do that [15:13:11] hm [15:13:11] but [15:13:12] hm [15:13:20] so, role::analytics::refinery [15:13:23] milimetric, thx [15:13:24] does have an explicit dependency on client [15:13:25] Class['role::analytics::hadoop::client'] -> Class['role::analytics::refinery'] [15:13:32] hadoop client [15:13:42] (CR) Milimetric: [C: 2 V: 2] "Merging even though we know the script is wrong. At this point, due to the problems we found with the data, all the scripts need re-worki" [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/195436 (https://phabricator.wikimedia.org/T91123) (owner: Mforns) [15:13:48] meh whatever, hdfs is fine [15:13:50] (PS3) Milimetric: Add config to run funnel_failure_rates_by_type [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/197318 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [15:13:55] (CR) Milimetric: [V: 2] Add config to run funnel_failure_rates_by_type [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/197318 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [15:14:14] I was hesitant with 'hdfs' too, so I guess we both do not like it :-/ [15:14:20] But other things run as hdfs too. 
[15:14:37] well, the other things that run as hdfs do stuff with hadoop [15:14:43] like, camus, dropping partitions, etc. [15:14:46] milimetric, in an hour I'll have a look at stat1003 generated data, to see if reportupdater worked as expected [15:14:49] so, they work with files in hdfs [15:15:05] mforns: thanks [15:15:15] refinery-source and refinery guard don't really need hdfs at all [15:15:27] True. [15:15:28] i don't think you even need that require role::analytics::refinery, do you? [15:15:39] It's for the existence of the hdfs user. [15:15:44] right. [15:15:53] If we switch the user, we can drop it. [15:16:04] hm, [15:16:10] Argh. Right I forgot about the comment about the dependency. Thanks. [15:16:21] well, qchris [15:16:24] if we switch to stats user [15:16:31] than there is a real dependency [15:16:35] then* [15:16:55] which, i think i'm ok with [15:17:00] working path and user? sure. [15:17:02] what do you think? [15:17:10] then you can run it as $::statistics::user::username [15:17:41] Ok. Then I'll translate the whole thing to the stats user. [15:17:44] ok [15:17:44] milimetric, oh, I forgot [15:17:59] so, ja, if you do that, you don't need a comment about that statistics module dependency :) [15:18:03] milimetric, there's also this change, which should have been a dependency: [15:18:09] https://gerrit.wikimedia.org/r/#/c/197319/1 [15:18:24] ottomata: Is it ok if I'll keep the classes in refinery.pp, as it's more refinery than statistics? [15:18:44] yes [15:18:46] k [15:18:49] Thanks. [15:19:34] (CR) Milimetric: [C: 2] Make row assignable [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/197319 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [15:19:44] thanks milimetric :] [15:36:49] (PS2) Milimetric: Add rickshaw timeseries graph [analytics/dashiki] - https://gerrit.wikimedia.org/r/197590 [15:38:08] Analytics-EventLogging, Analytics-Kanban, operations: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1128881 (Nuria) We are going to enable our event-ingesting pipeline to use varnishkafka rather than varnishncsa.... [15:40:35] YuviPanda: hola, have time for labs question about graphite? [15:40:54] nuria: I am back :) [15:40:55] and yes [15:41:18] YuviPanda: ok, so i am trying to send metrics to labs graphite via logster and statsd, like: [15:42:00] YuviPanda: /usr/bin/logster --debug -o statsd --statsd-host=labmon1001.eqiad.wmnet:8125 --metric-prefix='wikimetrics' LineCountLogster /var/log/apache2/access.wikimetrics.log [15:42:29] YuviPanda: If i run that command with stdout i see the execution and metric: [15:42:46] https://www.irccloud.com/pastebin/vNHFSvUs [15:43:02] but i do not see anything on graphite on labs... [15:43:29] YuviPanda: That is: https://graphite.wmflabs.org/... am i missing something about how to send metrics there? [15:44:02] nuria: so graphite.wmflabs.org seems down atm, coren is investigating. txstatsd should still be running, however. [15:45:01] YuviPanda: so (when graphite is up), should i send metrics there using statsd? [15:45:12] nuria: you are sending it to the correct place, yeah. [15:45:21] nuria: but I can’t help debug until graphite is back up... [15:45:32] nuria: what’s the ‘-o stdout’ mean? [15:46:04] YuviPanda: that is just debugging so it prints to screen [15:46:19] nuria: ah, is it printing to the screen exactly what it is sending to statsd?
[15:46:21] -o statsd is the "real" command [15:46:27] YuviPanda: I think so [15:46:36] hmm, that doesn’t look like proper statsd format [15:46:38] * YuviPanda looks for the dosc [15:47:04] YuviPanda: plus execution times i think (last number) [15:47:39] nuria: right, so https://github.com/etsy/statsd/blob/master/docs/metric_types.md is the spec. I assume you would want ‘gauges’. and the output doesn’t look like valid statsd format at all [15:47:51] YuviPanda: since logster is used in prod i assume formats are right, even if printout looks odd [15:48:14] nuria: right. so I guess the way to debug this is to look at tcpdump and see what exactly is being sent... [15:48:26] and make sure it matches what txstatsd expects [15:48:37] nuria: are you sure it’s being used in prod? afaik ottomata said it was written but never used... [15:49:15] mmm.. ottomata : does varnishkafka use logster in prod? cc YuviPanda [15:49:43] yes [15:49:44] it does [15:50:09] ottomata: sending data to statsd? cc YuviPanda [15:50:16] hmm, ok. [15:50:18] YuviPanda: logster sends from all varnishkafka instances to a local txstatsd on each cache node, and those forward to central statsds [15:50:27] ottomata: these are txstatsd as well? [15:50:32] afaik [15:50:34] yes [15:50:51] nuria: hmm, so I’m not sure, but I can help debug once graphite is back up :) [15:51:13] YuviPanda: https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/cache.pp#L417 [15:51:25] YuviPanda: ok, thank you. [15:51:40] Ironholds, hi [15:51:42] and alos [15:51:43] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/cache.pp#L543 [15:52:06] nuria: cool :) [16:01:54] hey mforns [16:02:02] hey Ironholds :] [16:02:22] I'm starting: https://phabricator.wikimedia.org/T86535 [16:02:29] and have 2 questions [16:04:00] Ironholds, when you say "If there is only one event in a session, it should not be reported." I understand one-pv sessions should not count in calculating the means, min, max, quantiles, right? [16:04:53] I'm in a meeting [16:04:59] let's talk about this when I'm back [16:05:03] *? [16:05:03] oh, ok, np [16:05:18] sure Ironholds, ping me when you have time [16:07:39] qchris: i am running guard! [16:07:49] wohoo \o/ [16:15:09] mforns, okay, I'm around. WHat's up? [16:15:15] hey Ironholds [16:15:27] so I'm looking at: https://phabricator.wikimedia.org/T86535 [16:15:47] when you say "If there is only one event in a session, it should not be reported." I understand one-pv sessions should not count in calculating the means, min, max, quantiles, right? [16:15:57] yep [16:16:03] ok, right [16:16:18] but they should count as a session in session counts, right? [16:16:54] and also count in the computation of events per session? [16:19:18] err [16:19:22] * Ironholds goes to check [16:20:42] mforns, yes and yes [16:20:46] so, what I did, as a heuristic? [16:21:04] well, not heuristic. identifier. [16:21:25] I had the session length calculation output -1 in the case of sessions with only one event [16:21:40] which means it's trivial to filter them out but you still retain the data - meaning you can calculate, e.g., bounce rate, trivially. [16:21:42] Ironholds, aha [16:22:09] ok, makes sense [16:22:14] mforns, I've actually implemented the entire set in highly speedy C++, so if you'd find it useful I'm happy to point you to the source code [16:22:27] that would be great :] [16:22:31] of "this is how we separate streams of timestamps into sessions, session length calculation uses N value" [16:22:32] cool! 
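To make the heuristic Ironholds describes above concrete: a rough Java sketch of splitting one user's sorted timestamps into sessions on an inactivity timeout, with the -1 sentinel for one-event sessions so bounce rate stays recoverable downstream. This is not a port of the reconstructr C++; class and method names are made up, and the timeout is left to the caller.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the sessioniser described above: timestamps (in seconds,
    // already sorted for one user) are split wherever the gap exceeds an
    // inactivity timeout; one-event sessions report a length of -1.
    public class Sessionizer {

        /** Split one user's sorted timestamps into sessions. */
        public static List<List<Long>> sessionize(List<Long> timestamps, long timeoutSeconds) {
            List<List<Long>> sessions = new ArrayList<>();
            List<Long> current = new ArrayList<>();
            for (long ts : timestamps) {
                if (!current.isEmpty() && ts - current.get(current.size() - 1) > timeoutSeconds) {
                    sessions.add(current);
                    current = new ArrayList<>();
                }
                current.add(ts);
            }
            if (!current.isEmpty()) {
                sessions.add(current);
            }
            return sessions;
        }

        /** Session length in seconds, or -1 for a one-event session. */
        public static long sessionLength(List<Long> session) {
            if (session.size() < 2) {
                return -1L;
            }
            return session.get(session.size() - 1) - session.get(0);
        }
    }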
[16:23:05] https://github.com/Ironholds/reconstructr/tree/master/src you want session_metrics and sessionise, I think [16:23:22] ok, thanks! [16:23:27] (I apologise for it being a product of its API and thus using lists and vectors for damn near everything ;p) [16:24:42] np at all [16:25:08] the other question was about quantiles. which ones do we need? 0.25, 0.5 and 0.75? [16:28:33] Analytics-EventLogging, Analytics-Kanban, operations, Patch-For-Review: Eventlogging JS client should warn users when serialized event is more than "N" chars long and not sent the event [8 pts] - https://phabricator.wikimedia.org/T91918#1129163 (mforns) Open>Resolved [16:28:34] Analytics-EventLogging, Analytics-Kanban, operations: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1129164 (mforns) [16:29:48] (CR) Nuria: "mmm... isn't Utilities class missing from patch?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/197296 (owner: OliverKeyes) [16:31:47] mforns, I'd check with Deskana|Away ; Howie wanted 0.1:1.0, plus 0.99 [16:32:34] nuria, bahahahahaha [16:32:41] ...I am an idiot who forgot what -a did [16:32:44] Ironholds, what do you mean with 0.1:1.0 ? [16:32:48] one moment while I submit a valid patch to you [16:32:56] mforns, 0.1, 0.2, 0.3, 0.4,0.5, 0.6... ;p [16:32:58] Ironholds: that a*ahem* has NEVER happen to me ... [16:33:14] well of course not! You are a professional! I am a professional! We are all professionals! [16:33:15] Ironholds, ok [16:33:25] who said impostor syndrome? I HEARD YOU, VOICE IN MY HEAD [16:33:27] Ironholds: no, it's more like every SINGLE time man. [16:33:38] hehehe [16:33:46] nuria, thank you for again demonstrating why I love working with y'all [16:33:47] Ironholds: it's one of those mistakes that you are like .. ah no, did i do that again? [16:33:57] I can think of WMF employees whose response would've been "ugh, you forgot to do X, god." [16:34:08] "I forget to do X all the time" makes all the difference when you're a noob :) [16:35:57] Analytics-EventLogging: Adapt eventlogging intake to use Kafka - https://phabricator.wikimedia.org/T93096#1129191 (Ottomata) NEW a:Ottomata [16:38:26] (PS2) OliverKeyes: De-static-everything [analytics/refinery/source] - https://gerrit.wikimedia.org/r/197296 [16:53:59] Ironholds: ok, looked at patch. Then: if we want to not have static classes but those are such that we only want 1 instance of the object, they should be "application scoped" singletons, meaning that they are instantiated once for the life of the app, which i guess in this case is the java that runs your hive query. [16:54:17] https://www.irccloud.com/pastebin/iwP1LIoG [16:54:46] aha! [16:54:52] so, throw that into the UDF defs? [16:54:56] nuria, ^ [16:55:19] Ironholds: no, into the classes that are singletons themselves [16:55:31] Ironholds: let me give you more precise docs [16:55:59] nuria: i'm going to work on some other stuff for a bit after SoS, would you look over this deployment plan? 
[16:56:00] http://etherpad.wikimedia.org/p/Analytics-Nuria [16:56:06] Ironholds: http://www.javaworld.com/article/2074979/java-concurrency/double-checked-locking--clever--but-broken.html?page=2 [16:56:07] i will do that to betalabs first [16:56:32] ottomata: will do, after 'second breakfast' [16:57:04] nuria, awesome; thanks :) [16:57:28] Ironholds: some useful info also here: https://en.wikipedia.org/wiki/Singleton_pattern [16:57:39] nuria, so just throw that in as the constructor, at which point when you try to instantiate a second time, the constructor will reference the initial instantiation? [16:57:45] if I'm understanding correctly [16:58:11] Ironholds: and this book is like the best reference evah: http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601 [16:58:32] perfect! [16:58:41] Ironholds: right, constructor is private now [16:59:02] Ironholds: do some tryouts outside the hive environment and you shall see it perhaps more clearly [16:59:12] nuria, gotcha :). It makes sense! [17:01:18] Ironholds: ok, caller classes would do: Singleton.getInstance() [17:01:30] Ironholds: let me know if this doesn't make sense [17:03:23] nuria, it mostly does! I'll read the docs more thoroughly and put together a dummy example just to be sure :) [17:10:12] nuria, okay, so it looks like what I want for full optimisation is (1) the singleton approach as you're suggesting [17:10:34] but (2) to lazily rather than eagerly create it (so we avoid unnecessary clutter, since the entire point of this patch is avoiding unnecessary clutter) [17:11:22] does that sound right? [17:16:14] nuria, is it not possible to do the getInstance() inside of the Resource (in your paste) instead of in MySingleton [17:16:30] and have Resource have a protected (private?) static member [17:16:39] that gets initialized when getInstance is called? [17:16:47] that way you avoid more classes? [17:17:20] ottomata, it looks like that's the case from my googling [17:17:30] but I'm going for the simplest approach, which is test it and see if it explodes :D [17:17:57] lazy instantiated singletons in the existing classes: test coming right up [17:18:07] class Resource { [17:18:07] protected static Resource resourceInstance; [17:18:07] public static Resource getInstance() { [17:18:07] resourceInstance = new Resource(); [17:18:07] return resourceInstance; [17:18:07] } [17:18:08] } [17:18:08] something like that? [17:18:11] idunno [17:18:24] would that avoid the public static getInstance()* [17:19:15] ottomata, yeah, although I'm going for: [17:19:32] well, that but resourceInstance = null [17:19:42] and then if(resourceInstance == null){ [17:20:00] resourceInstance = new Resource(); [17:20:04] return resourceInstance; [17:20:14] or, something like that [17:20:21] aye [17:20:24] righto [17:20:25] the point is to create lazily rather than eagerly so you don't create it whether or not you need it [17:20:38] * Ironholds is skim-reading all the things and still disconcerted I can have this conversation [17:20:53] okay, implementation test comin' up [17:21:07] Ironholds: "This is just a convenience method that also makes sure that arguments are not null.", i cannot explain this well, but this is something scala is really good at [17:21:13] starting with Pageviews because that's a one-public-method class [17:21:20] it auto guards against NPEs with some fancy type wrappers [17:21:24] ottomata, nice! [17:21:29] Scala is on my list, don't worry!
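Spelling out the pattern sketched above: a minimal lazily initialized singleton (with the "=" vs "==" slip in the chat sketch fixed), next to the eager variant nuria ends up recommending later in the log because it needs no synchronization to be thread safe. "Resource" is the placeholder name from the chat, not a real refinery class; the actual patch applies this inside the existing classes such as Pageview.

    // "Resource" is the placeholder name from the chat above, not a real class.
    public class Resource {

        // Eager variant: built once at class-load time, thread safe with no
        // synchronization (the construct nuria recommends later in the log).
        private static final Resource instance = new Resource();

        // Private constructor: callers can only go through getInstance().
        private Resource() {
        }

        public static Resource getInstance() {
            return instance;
        }
    }

    // Lazy variant from the chat sketch, with the "=" vs "==" slip fixed.
    // Unsynchronized, so only safe if a single thread calls it; otherwise
    // prefer the eager field above.
    class LazyResource {
        private static LazyResource instance = null;

        private LazyResource() {
        }

        static LazyResource getInstance() {
            if (instance == null) {
                instance = new LazyResource();
            }
            return instance;
        }
    }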
[17:21:35] heheh, still on mine too :) [17:21:39] When I am proficient in Java and C++ and Python I will dig into scala :D [17:21:52] well, "proficient" in C++ == "can make most things work with this one version of gcc" [17:22:01] i read this part of the scala book [17:22:05] and didn't buy it [17:22:13] it seemed like it wasn't really gaining anything, why not just check for nulls? [17:22:22] but, then I used it in that spark streaming thing, and I got it. [17:22:39] i still don't fully understand, but it was really nice [17:22:51] yay! [17:24:04] http://alvinalexander.com/scala/using-scala-option-some-none-idiom-function-java-null [17:30:36] ottomata, am I allowed to stick a very subtle reference to The Lonely Island's "Jack Sparrow" in the documentation if I can make it relevant. [17:32:55] ottomata: yes, there should not be any additional classes [17:33:49] wheee it works! [17:34:02] nuria, I think I got it to work, lazily instantiating so it doesn't create it from the get-go [17:34:28] I'm going to finish making it work for the pageviews class specifically, then would you like me to throw it up to the gerrit patch so you can check I haven't fubared it before I do it for every other class? [17:34:42] also, can I just say: singletons are REALLY COOL. [17:34:47] I need to find out how to do this in C++ [17:35:09] ...oh god I just said I liked Java. End times. [17:35:26] ottomata, Ironholds : singletons are far from ideal and they have their own sort of troubles, but at least: 1) they allow for mocking when testing 2) they can implement an interface. So for an application that will have several classes w/o any deps management I think is a preferable alternative to all being static and thus not OO. [17:35:37] yup [17:35:45] and also preferrable to "a new instance with every row" [17:36:13] Ironholds: ya, ya, hive udfs give us this method that is executed once per udf [17:36:15] Ironholds: singleton objects (aka companion objects) are first level citizens in scala :p [17:36:18] http://tutorials.jenkov.com/scala/singleton-and-companion-objects.html [17:36:32] Ironholds: that is where our instantiations should go always [17:36:45] *thumbs up* [17:37:00] ottomata: looking at deployment plan [17:37:01] okay, I've got it working for one class. You want to check now or should I implement everywhere and --amend then? [17:39:31] Ironholds, quick question about APP metrics job, why do we need to have a sample of uuids? [17:39:48] mforns, as opposed to grabbing all events from all UUIDs? [17:39:52] yes [17:40:04] we don't, it was just easier [17:40:24] remember the requirements come from a point where the processing and metric generation happened outside hadoop [17:40:29] which means no MR and no distributed computing [17:40:33] aha [17:40:34] so we took a sample and said: that'll have to do. [17:41:12] ok, so now we really do not need these uuid sample, right? we'll run the job over all webrequest logs [17:41:16] if you can do it efficiently with all UUIDs (which I don't doubt! After all, you already have to open all the files, and once the first-stage reducers have sorted the requests by {uuid, timestamp} it's just a streaming problem to tokenise them) do it [17:41:39] my only useful thing there would be: tokenise first, calculate metrics later. Which sounds very duh, but. 
[17:41:56] life is easier when you can provide a list of vectors, each vector representing a session (or, the java equivalent to vectors) [17:42:05] as opposed to, for each metric, having to sessionise [17:42:26] Ironholds, aha [17:42:37] ok, makes sense [17:42:42] *thumbs up* [17:42:52] thanx! [17:42:53] mforns, oh, and you've worked out the genius of mapping for this, right? [17:43:16] what? :] [17:43:21] maps produce {key, value} tuples. All we need as input data for session reconstruction is the UUID, which is a key, and the timestamp, which is the value [17:43:23] *jazz hands* [17:43:47] I'm super excited to see what you come out with because this system is...like, it would be hard to generate a system more optimised for session reconstruction, than hadoop is :) [17:43:56] so best of luck and let me know if you have further problems [17:44:01] ...questions, even [17:44:07] sorry, I'm doing 30 things at once; p [17:44:36] Ironholds, yes I have some experience with MR, but all your comments will help a lot and are welcome! [17:44:51] oh, totally. You know more about it than I do! [17:45:03] I'm just doing my "I love this thing! This thing is so awesome. Let's squee at how perfect it is" thing ;p [17:45:04] ok, I'll ping you no doubt if I have more questions [17:45:28] mforns: any desire to try to do this in spark instead of hive? you can use python. :p [17:45:29] no, for sure you know more on this specific case than I do! [17:45:37] it *might* be more efficient...it might be less... [17:45:42] actually, i take that back. [17:45:53] hive and parquet work great together right now, stick with it :p [17:46:00] hehe, ok [17:46:31] ottomata: ok, looked at plan, sounds good and i can help testing on vanadium wheever [17:47:11] mforns: but if you sample the job will be faster right? [17:47:21] nuria, yes [17:47:35] oh, hive? :/ [17:47:40] hive is going to be horrifyingly inefficient [17:47:58] unless you want to build out an entire class of UDFs just for this, I guess [17:48:05] mforns: Then there is a good reason as refined tables already have hive partitioning that allows for random sampling [17:48:15] Analytics-Cluster: Make spark work well with webrequest Parquet data - https://phabricator.wikimedia.org/T93105#1129465 (Ottomata) NEW a:Ottomata [17:48:18] Ironholds: ah, so map & reduce code you mean [17:48:26] Ironholds: ya, that might be the case [17:48:35] nuria, yeah, a straight MR job [17:48:42] oh iunno mforns, Ironholds, dunno much about this task [17:48:44] do wahtever you need :) [17:48:52] because (1) the data format (uuid, timestamp) is perfectly optimised for MR [17:48:53] oh if you do aMR job, i think you wil have the same problems that hive will. [17:49:05] Ironholds: thos come from x analytics header, right? [17:49:17] and (2) if we sort before plugging it into the sessioniser we can take advantage of streaming [17:49:20] Ironholds: i think it might come to that, we can probably try hive first and see how things look [17:49:23] sorry, same problems that spark will * [17:49:26] which means that we get to avoid clogging the rest of the system with...all the things. [17:49:40] ottomata, yeah, I think so? I think technically you have to check both that and URL at the moment, because legacy app versions. [17:49:49] why is count + group by bad? isn't that all you are doing? 
[17:50:01] nope [17:50:11] select uuid, timestamp, ORDER BY uuid, timestamp DESC [17:50:17] then convert timestamp into seconds [17:50:29] then stream each uuid's timestamps into a tokeniser [17:50:32] (coudl do that in the select) [17:50:38] tokenisserrrrrr [17:50:49] ? [17:50:53] well, "sessioniser" makes Aaron cry [17:50:58] ottomata, okay, I've got 20 timestamps from userX [17:51:18] I want to work out how many sessions they had, and also how long these sessions were, and also how many pages were in each session, right? [17:51:33] all of these metrics are dependent on reconstructing sessions from stream_of_timestamps [17:52:07] so instead of including that logic in each metric's calculator, you have a dedicated sessioniser that accepts a pile of timestamps and produces a list with each entry consisting of one "session" and the timestamps within that session [17:52:43] then you plug the list into how_many_sessions. list.length() done. How many pages? map(list,length). How long these sessions were? well, more complicated, but for the sake of this example, last_timestamp - first_timestamp [17:53:48] i see ok [17:53:49] so if Hive is the method we're looking at; how well does hive handle throwing multiple values in? Because we'll need to sort, and then divide up by UUID value, and then sessionise, and then calculate metrics for each of those users' sessions, and then calculate (mean, median, quantiles, whatever) for each metric [17:54:07] mforns: you should use spark :p [17:54:09] haha [17:54:12] like, Hive would work well for data retrieval (it used to be slow but from what nuria is saying it's probably a lot more reliable now, because the partitioning is optimised for this) [17:54:22] but it's not a great model for the processing, as I understand it [17:54:24] aye [17:54:28] ottomata, aa [17:54:29] aha [17:54:38] i jsut made this ticket: [17:54:38] https://phabricator.wikimedia.org/T93105 [17:54:43] ottomata, wait, did I just get a thing right? Truly, today is a weird day ;p [17:54:55] Ironholds: i am not 100% sure, but it does sound complicated to do with hive [17:55:12] but, either way [17:55:15] if it is MR or if it is Spark [17:55:23] either'd work! [17:55:28] i think there will have to be some work around making Parquet work with a column projection of some kind [17:55:35] hive does this for us [17:55:35] MR is easiest just because, well, if we're gonna have to maintain a thing, maintaining it in Java would seem to make sense. [17:55:40] in my mind [17:55:48] disagree there :/ [17:55:58] really? hrm. What would you suggest? [17:56:06] like, what's the rationale? [17:56:09] i would suggest the best tool for the job that we also want to support [17:56:31] java would be fine. but spark code is much easier to iterate on, and to debug and to read. and potential is faster (but mabye not!) [17:56:50] easier to iterate on than MR code* [17:56:59] *nods* [17:57:00] makes sense! [17:57:06] Java MR is very verbose and has a lot of parts. implementing mappers, reducers, etc. [17:57:13] i'm not opposed to that at all [17:57:16] if mforns wants to do that [17:57:18] no objection here. [17:57:19] yeah, that's a good point. I reviewed Bob West's streaming code and just that made my eyes bug [17:57:36] but you raise an excellent point of "mforns gets to decide", so on that note I'm gonna get back to singleton generation :D [17:57:41] haha, k :) [17:58:00] ottomata, but there's work to do still to get spark working right? 
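Stepping back to the "tokenise first, calculate metrics later" pipeline Ironholds walks through a few messages up: once each user's timestamps have been split into sessions, every metric is a short pass over that list. A hypothetical Java sketch building on the Sessionizer sketch earlier; class and method names are illustrative only.

    import java.util.ArrayList;
    import java.util.List;

    // "Tokenise first, calculate metrics later": given one user's sessions,
    // each metric is a trivial pass over the session list.
    public class SessionMetrics {

        /** How many sessions the user had. */
        public static int sessionCount(List<List<Long>> sessions) {
            return sessions.size();
        }

        /** Pages (events) per session. */
        public static List<Integer> pagesPerSession(List<List<Long>> sessions) {
            List<Integer> pages = new ArrayList<>();
            for (List<Long> session : sessions) {
                pages.add(session.size());
            }
            return pages;
        }

        /**
         * Session lengths in seconds, skipping one-event sessions (the -1
         * sentinel), so means and quantiles are not distorted by bounces.
         */
        public static List<Long> sessionLengths(List<List<Long>> sessions) {
            List<Long> lengths = new ArrayList<>();
            for (List<Long> session : sessions) {
                if (session.size() > 1) {
                    lengths.add(session.get(session.size() - 1) - session.get(0));
                }
            }
            return lengths;
        }
    }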
[17:58:06] ottomata,Ironholds , mforns : seems that there are two things here 1) processing of data so complex metrics can be calculated 2) complex metric calculations [17:58:07] naw, spark works now [17:58:18] ottomata, and the task you created with parquet? [17:58:22] nuria, yep. The second is trivial. Like, if you want, I am happy to implement the calculators. [17:58:25] the part that doesn't work: is parquet schema projection [17:58:27] so [17:58:33] parquet is a columnar based store, right? [17:58:37] the hard bit is retrieving and sorting and sessionising the data, imo [17:58:38] aha [17:58:41] so, you should only have to read the columns you want from disk. [17:58:45] but that might just be because my expertise is metrics calculation not MR :D [17:58:45] hive has this built in [17:58:58] i think we need some custom work to make that happen, if we are not going to use Hive [17:59:00] but! [17:59:08] ottomata: right, because of the sql-> java mapping [17:59:09] you can still use Parquet with either [17:59:11] spark or MR [17:59:18] rigiht now [17:59:25] you just wont' get the performance benifits of parquet [17:59:30] aha [17:59:31] because by default you will read all of the columns [17:59:48] I see [17:59:48] i have done some work on this already, so I know approximately what nees to happen [18:00:09] i know more of how to make this work in java, but I'm pretty sure that the work to do so will apply for spark as well...although possibly not spark-python [18:00:13] you might have to use scala :/ [18:00:14] ottomata: actually we can do that now w/o modifications if we create a partition that "only" has the columns we need (not-so-great-workarround) [18:00:15] or java! [18:00:18] spark has a java API too [18:00:54] ok... [18:02:01] nuria: ? [18:02:03] well lots of decisions to make, in which I have not much experience [18:02:22] mforns: your decision is: [18:02:33] xD [18:02:35] your choices are: [18:02:58] - java MR [18:02:58] - java spark [18:02:58] - scala spark [18:02:58] - hive [18:02:58] - something else? probably not at this time :) [18:03:05] oh, and MAYBE [18:03:07] - python spark [18:03:12] you can start with python spark if you want [18:03:30] it is very easy to translate from python spark to scala spark if we get there, up to you. [18:03:36] whatever you choose, we will make this work. [18:03:42] ok [18:03:42] this is something i've been meaning to do for a while [18:03:55] and, the work to make it work applies to all of those options (except for maybe python spark and for hive) [18:04:14] and, in the meantime, your code will be the same eithe rway [18:04:15] ottomata: i do not think we should add python spark to the mix [18:04:18] haha [18:04:25] well, people here like python! [18:04:26] :) [18:04:35] i would encourage scala as well, but that is very new and I don't want to force it [18:05:12] well, spark is also very new to me [18:05:16] i think it would not be hard folks to use python at first, and if we run into limitations, convert to scala. the logic of code is very similar [18:05:21] mforns: you will not have a problem, i am sure [18:05:27] especially if you start with the python stuff [18:05:29] it is very easy [18:05:34] ok [18:05:38] ottomata: i disagree, we have a bunch of utility functions in java already, let's plis [18:05:44] not duplicate those in python [18:05:51] mforns: https://gist.github.com/ottomata/adcb200b99ac1c9d5941 [18:05:54] java spark then? 
[18:05:54] ottomata: I think we should support reserachers using python [18:05:59] hm, that is true, python won't let you use the java stuff [18:06:06] scala spark I mean [18:06:11] ottomata: but they mostly do 1-offs, not recurrent jobs [18:06:11] mforns: you can use the java classes from java or scala [18:06:14] aye [18:06:16] makes sense actually. [18:06:35] mforns: if you want to try spark, i would recommend playing with the python api first, just to get a feel for it [18:06:37] ottomata: so for dev team let's keep it to java or scala and make our utility code be java [18:06:44] maybe you will not want to implement this in python though [18:06:51] as nuria says, she is probably rigiht [18:07:18] (PS3) OliverKeyes: De-static-everything [analytics/refinery/source] - https://gerrit.wikimedia.org/r/197296 [18:07:18] ok [18:07:37] If I have problems starting right away with scala, I'll try python first [18:07:43] nuria, thrown in an example of how I'm handling singletons in a lazy way ^ - let me know if I'm doing it right when you have the time/spoons? [18:07:44] sure. [18:07:57] mforns: it is really fun to play with spark on the repl CLI [18:07:58] but it seems it will be scala-spark [18:08:04] do: [18:08:09] spark-shell [18:08:21] Ironholds: in your gerrit patch? [18:08:23] that will give you a local spark instance that can also access hdfs [18:08:27] spark shell instance* [18:08:34] nuria, yep! It's the "Pageview" class and associated tests/UDFs [18:08:35] then you can put scala code in and do fun stuff [18:08:37] like ipython! [18:08:38] its fun! [18:08:44] ottomata, ok [18:08:49] (I should probably have specified that. "It's somewhere in these half-dozen jars, have fun!" :D) [18:08:50] or, pyspark [18:08:51] if you want to do that [18:08:58] i should write up a spark tutorial... [18:08:59] :p [18:09:03] adding a task! :) [18:09:09] ottomata, our local spark dealer [18:09:12] haha [18:09:12] the first map job is free [18:09:15] after that, you have to pay [18:09:20] xD [18:09:25] all the new techs go like this [18:09:35] i've been a dealer in things people don't want to try for 3 years at wmf now :p [18:10:39] ottomata, the code for the wikipedia top-visited tag cloud, is scala-spark right? [18:10:40] Analytics-Cluster: Write wikitech spark tutorial - https://phabricator.wikimedia.org/T93111#1129598 (Ottomata) NEW a:Ottomata [18:11:20] ottomata: mmm... not to be teh party pooper but how will scala spark integrate with oozie? [18:11:27] ottomata: is that possible? [18:14:57] nuria: I've got a weird thing with event logging, mind taking a look with me? [18:16:29] mforns: ja [18:16:54] nuria: shoudl work fine, no? oozie is flexible [18:16:55] milimetric: sure, give me a sec [18:16:58] ottomata, oozie does not trigger spark jobs? [18:16:58] its just a hadoop job [18:17:05] oh [18:17:49] ottomata: so it is executed as an external command to oozie? [18:17:57] nuria: also [18:17:57] https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd [18:18:19] ottomata: ah, ok, so it knows about spark [18:18:27] i see at least some version does [18:18:30] ottomata: good, cause we need the scheduler [18:18:36] or we could just shell out to submit a job, i'm sure it will work somehow [18:19:30] ottomata: ya, i was trying to avoid the shell approach [18:20:15] ottomata: if oozie initializes it likely, just like it does for java, it can load all our utility code (udfs.. 
etc) [18:20:33] ottomata: or rather, run on top of the jvm where all that is alredy initialized [18:21:22] if spark action is not avialable, i'm sure there is a way to just launch it as a hadoop action or something? it is just a jar that takes a main class name in the end [18:21:30] spark-submit somewhere along the lines probbly does hadoop jar... [18:21:33] i would think anyway [18:22:31] i dunno, i'm sure it is possible :) [18:24:39] ottomata: ok, sounds like it is. [18:25:29] ottomata, nuria: I feel like spark + scala + bending oozie to call spark is quite a lot of new things for me. I will need some assistance with that. [18:26:26] mforns: I would 1st: try to do what needs to be done in hive, see problems with it, try it (on the dry) on spark [18:26:33] mforns: no oozie yet [18:26:43] mforns: in a subset of mobile data say an hour [18:26:48] aha [18:27:23] ok [18:27:45] mforns: once you have an idea of issues/complexities it is likely that spark is a better choice. if so great, me, like ottomata loves scala [18:27:49] nuria, ottomata: do you have a more or less similar example of a scala spark job? [18:28:14] mforns: not me [18:28:38] the top 10 pageviews was done in scala, mforns [18:28:39] ottomata, the top wikipedia articles tag cloud was scala spark? [18:28:47] yes, thanks milimetric [18:28:49] it was done in spark streaming, so it is a little different [18:28:55] I see [18:29:01] but, i'll do one like th gist i sent you [18:29:03] in scala [18:29:04] it's not exactly similar, but the scala code should be a lot simpler for what you need [18:29:06] ottomata: but this one should be simpler, no real time component [18:29:06] gimme a few mins [18:29:20] milimetric: batcave? EL? [18:29:26] batcave, yes [18:29:37] ok, thanks guys [18:30:15] nuria: I be in da cave [18:30:27] milimetric:wait, sounds like chrome re-start [18:49:23] ooo, mforns, i think spark scala does work with these parquet files real nicely. i'm still making an example, stay tuned.. [18:49:37] ottomata, ok :] [19:27:59] ok, mforns, batcave? [19:28:01] https://gist.github.com/ottomata/a045770ce75f065268ea [19:28:07] ottomata, sure [19:28:07] want a walk through? [19:28:22] actually, brb.... [19:32:15] (CR) Nuria: De-static-everything (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/197296 (owner: OliverKeyes) [19:32:40] nuria, thanks for the CR! [19:32:59] I'll fix and implement for the other classes, yeah? [19:34:21] IH|postoffice: sounds good, let's go with the simpler eager initialization construct as it does not require synchronization so is per-se thread safe [19:50:24] nuria, makes sense! [19:51:03] Ironholds: k [20:07:46] milimetric: something else to talk to VE team about will be the rate of events, now is ~70 per sec, which is huge, soon mysql tables will be too large to get any data [20:09:56] nuria: yeah, but because they're doing sessions it's impossible to sample [20:10:12] milimetric: you can sample within the session just fine [20:10:33] as a session either sends events or not (every single one of them) [20:10:53] I'm not sure what you mean [20:11:00] milimetric: so you do not sample events but rather sample sessions [20:11:18] can't do that though - how do I know if I'm sampling equally for all types of events? 
[20:11:19] milimetric: so every other session sends events, that would be 50% sampling [20:11:36] yeah, but you could introduce a bias pretty easily if you're not careful with that [20:11:55] milimetric: no, if session ids are random [20:12:06] milimetric: and they are as random as they can be [20:12:08] mi [20:12:40] milimetric: you will be skewed toward users that use the tool more frequently, but for any overall stats that is happening already [20:13:03] not really, for example, the user type analysis [20:13:17] I think we just need to get a better way to analyze the data [20:13:38] using mysql for analysis is like using a car engine as a fishing rod [20:15:04] milimetric: even that type of data you could sample (provided bucket sizes are within a magnitude). More data doesn't necessarily mean more precise results for the level of precision we are reporting. [20:16:00] nuria: graphite is back up and should be stable now [20:16:22] YuviPanda: ok, thank you! testing now [20:16:26] I agree we could sample, but in light of all the other work we have, it seems like an unnecessary battle. Worst case I'll just sqoop all their data into hdfs and do the analysis there [20:16:51] milimetric: agree, it is concern #2 after others [20:29:01] * Ironholds checks email [20:29:05] * Ironholds reaches for his good gin [20:29:54] Ironholds, do you have time for 1 more question? [20:30:43] is it an unceasing stream of "what the hell nooo" at wmfall? [20:30:48] (sure, hit me) [20:32:39] Ironholds, where is the apps uuid data in the webrequest table? [20:32:42] x_analytics? [20:32:54] ah. ahahah. ahahahahahahahahahahsahfkhgthkmg*chokes* [20:33:00] xD [20:33:13] so, it used to be in the URL, but they realised that this meant caching was impossible because every user had their own set of URLs [20:33:18] so now it's in x_analytics [20:33:22] buuut some people don't upgrade! [20:33:56] ok, and how do I know a pageview comes from apps? [20:34:26] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Pageview.java#L122-L139 [20:34:38] if you're planning on using hive and ergo the UDFs I can make that public as part of my current patch if you want [20:35:07] ha why shouldn't that be public, eh?! [20:35:21] because we've never needed to use it on its own! :P [20:35:35] I'll make it public (will have to write a set of unit tests. Eh, fine. Is tomorrow okay?) [20:35:43] so, on the UUIDs, you'll need to use url_parse(concat(uri_host,uri_path)...) to grab the UUID and, if it's NULL, I wrote xAnalyticsExtract, where you pass it a parameter name and it passes the value back out (or NULL) [20:35:52] this is assuming UDF usage [20:36:45] also, fyi, there is a task that will soon parse x_analytics into a Map and store it as such in the refined table [20:36:49] so you will be able to do [20:36:50] yay! [20:36:54] select x_analytics['uuid'] [20:36:56] or whatever [20:36:56] x_analytics['uuid'] [20:36:57] snap [20:37:19] aha [20:37:51] Ironholds, ottomata: can I use UDFs from spark?
[20:38:07] nooo idea ;p [20:38:25] I mean, all my UDFs have the logic abstracted out to a pure java class [20:38:30] so, they're not UDF dependant [20:38:32] Analytics-Kanban, Analytics-Wikimetrics: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1130173 (Fhocutt) a:Fhocutt [20:38:38] so even if you can't you could grab the underlying java class and use that [20:39:20] all right [20:39:27] mforns: not UDFs, but java ja [20:39:32] (PS1) Milimetric: Update for March (Yes, it's late) [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/197736 [20:39:37] mforns: i did in my streaming job for isPageview [20:39:46] since the json data i was consuming doesn't have that as a refined field [20:39:47] (CR) Milimetric: [C: 2 V: 2] Update for March (Yes, it's late) [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/197736 (owner: Milimetric) [20:39:48] so ja, like this: [20:40:10] yes, I see, didn't know that UDFs could also be called from java [20:40:11] import org.wikimedia.analytics.refinery.core.Pageview [20:40:11] Pageview.isPageview(...) [20:40:24] ok ok [20:40:27] well, UDF is the wrong term there really [20:40:30] UDF is for Hive [20:40:46] the UDFs we code for Hive are very thin wrappers [20:40:51] for things usually in refinery.core [20:41:06] Analytics-Kanban, Analytics-Wikimetrics: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1130189 (Fhocutt) I'll look into this--looks like I didn't handle encoding properly if this is happening. [20:41:08] so, if you are using other languages, you can call the refinery.core class methods [20:41:46] ottomata, aha [20:42:16] the more I listen to Moonlight Sonata the more I think it's about exponential progress and impending doom at the hands of artificial intelligent god machines [20:43:02] ottomata, when I import org.wikimedia.analytics.refinery.core.Pageview from spark-shell it does not work: error: object wikimedia is not a member of package org [20:43:43] aye you need to have the jars loaded in your classpath [20:43:47] uMmmMmMM [20:43:56] this is easier from inside of a compiled context, lemme see... [20:45:05] ah easy [20:45:06] mforns: [20:45:06] spark-shell --jars /srv/deployment/analytics/refinery/artifacts/refinery-core.jar [20:45:12] then [20:45:18] import org.wikimedia.analytics.refinery.core.Pageview [20:46:05] mforns: i want to say: just because spark is cool does not mean we ahve to use it for this [20:46:18] if hive actually does work, and isn't too roundabout, then you might want to stick with it [20:46:57] spark and scala would be very new for us, and introducing this for unique counting might not be the best decision. That doesn't mean that it isn't though! I leave that up to y'all :) [20:46:59] well I'll try, I agree it is better to use spark :] [20:47:11] ok [20:47:19] thanks for the help :] [21:17:30] nuria, the apps uuid comes partly from the x_analytics header [21:19:28] evening halfak [21:19:51] o/ Ironholds [21:19:56] how goes? [21:20:11] Not bad. Just finishing up the last paper session at CSCW [21:20:18] Looking at meme diffusion patterns. [21:20:28] neat! [21:20:35] I'm uh. Writing java. [21:21:44] UDFs or something else? [21:23:22] architectural changes! 
[21:23:38] reorganising our code so it's non-static and instead uses eagerly-instantiated singletons [21:24:05] (I would go for lazily-instantiated but nuria tells me there are thread-safety problems with that, which is why we pay her software engineer money and me...whateverIamnow money ;p) [21:29:28] Analytics-Kanban, Analytics-Wikimetrics: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1130300 (kevinator) @fhocutt, @nuria is also looking into this presently. We have run into may encoding issues in the past and she thin... [21:31:11] Analytics-Kanban, Analytics-Wikimetrics: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1130302 (Fhocutt) Ok, great! If she fixes it I'll see how she did it and know for next time. [21:31:23] Analytics-Kanban, Analytics-Wikimetrics: Utf-8 names on json reports appear as unicode code points: "\u0623\u0645\u064a\u0646" - https://phabricator.wikimedia.org/T93023#1130303 (Fhocutt) a:Fhocutt>None [21:55:27] Ironholds, how do singletons make the code non-static? [21:55:31] Analytics, MediaWiki-General-or-Unknown, Services, Wikidata, and 4 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#1130400 (GWicke) See also: @aaron is working on a cache update service at https://github.com/AaronSchulz/python-m emcached-relay [21:55:38] Analytics-Cluster, Analytics-Kanban, Easy: Mobile Apps PM has monthly report from oozie about apps uniques [8 pts] - https://phabricator.wikimedia.org/T88308#1130402 (kevinator) @deskana was asking: is this report run monthly on a month of data? [21:55:53] halfak, rationale goes; if we make it static, everything is everywhere. messy. if we make it dynamic, we have to explicitly instantiate it [21:56:04] buuut if we have to explicitly instantiate it in the UDF, we're instantiating a new instance every row [21:56:14] What I'm missing here is your meaning of "static" and "dynamic" [21:56:29] so we use singletons to ensure we get to avoid cluttering the namespaces until we need the class, but only have to instantiate it once when we do need it [21:56:42] halfak, literally "public static string extractThingFromOtherThing" [21:56:54] as opposed to "public string extractThingFromOtherThing" [21:58:40] Break time! [21:58:40] o/ [22:54:22] Analytics, Analytics-Kanban: Turn off Zero - Limn dashboards & put up a "moved sign" - https://phabricator.wikimedia.org/T92920#1130547 (kevinator) Spoke to Dan and this isn't as simple as dropping in an index.html where the current dashboards are. Easiest solution might be to configure Apache to redirec... [23:00:13] Analytics, Analytics-Kanban: Turn off Zero - Limn dashboards & put up a "moved sign" - https://phabricator.wikimedia.org/T92920#1130579 (kevinator) [23:02:32] Analytics, Analytics-Kanban: Turn off WP Zero's Limn-Dashboards & put up a "moved sign" - https://phabricator.wikimedia.org/T92920#1130583 (kevinator) [23:03:58] (PS4) OliverKeyes: De-static-everything [analytics/refinery/source] - https://gerrit.wikimedia.org/r/197296 [23:04:49] (CR) OliverKeyes: "Nurieta, a singleton for you!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/197296 (owner: OliverKeyes) [23:09:05] ori: yt?
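Circling back to the sampling exchange between nuria and milimetric earlier in the log: sampling whole sessions rather than individual events can be done deterministically, so that every event carrying the same session id gets the same keep/drop decision. A hypothetical Java sketch, assuming a random session id is present on every event; the names, the hash-bucket approach, and the 50% rate are illustrative, not what EventLogging actually does.

    // Sample sessions, not events: hash the session id into 100 buckets and
    // keep only the sessions that fall below the sampling percentage.
    public class SessionSampler {

        /**
         * @param sessionId       random id shared by all events in a session (assumed)
         * @param samplingPercent e.g. 50 keeps roughly half of all sessions
         */
        public static boolean isSampled(String sessionId, int samplingPercent) {
            // Math.floorMod keeps the bucket non-negative even when hashCode() is negative.
            int bucket = Math.floorMod(sessionId.hashCode(), 100);
            return bucket < samplingPercent;
        }

        public static void main(String[] args) {
            // Made-up session id, 50% sampling: same id always gives the same answer.
            System.out.println(isSampled("0e8e2b54-4a1b-4c5d-9e1f-2a3b4c5d6e7f", 50));
        }
    }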