[05:36:17] Analytics-Cluster, operations, Interdatacenter-IPsec: Secure inter-datacenter web request log (Kafka) traffic - https://phabricator.wikimedia.org/T92602#1115779 (Gage) NEW [08:57:56] Analytics-Tech-community-metrics, ECT-March-2015: KPI pages in korma need horizontal margins - https://phabricator.wikimedia.org/T88670#1115975 (Qgil) Ok, this new layout with margins is more usable and looks a lot better. Thank you! There is still a chance that some tooltips are cut, and not even using... [13:27:54] Analytics-Tech-community-metrics, ECT-March-2015, Patch-For-Review: "Volume of open changesets" graph should show reviews pending every month - https://phabricator.wikimedia.org/T72278#1116526 (Qgil) p:Low>Normal [13:27:56] Analytics-Tech-community-metrics, ECT-March-2015, Patch-For-Review: "Age of unreviewed changesets by affiliation" shows negative number of changesets - https://phabricator.wikimedia.org/T72600#1116528 (Qgil) p:Low>Normal [13:32:25] Analytics-Tech-community-metrics, ECT-March-2015, Patch-For-Review: "Volume of open changesets" graph should show reviews pending every month - https://phabricator.wikimedia.org/T72278#1116543 (Qgil) a:Qgil>Dicortazar >>! In T72278#745488, @Acs wrote: > Quim, we plan to add the Verified -1 and -2... [13:40:41] Analytics-Tech-community-metrics, ECT-March-2015, Patch-For-Review: "Age of unreviewed changesets by affiliation" shows negative number of changesets - https://phabricator.wikimedia.org/T72600#1116558 (Qgil) a:Qgil>Dicortazar Ok, here the suspicion was that the values of previous months changed a... [13:41:12] Analytics-Tech-community-metrics, ECT-March-2015: Key performance indicator: Gerrit review queue - https://phabricator.wikimedia.org/T39463#1116562 (Qgil) p:Low>Normal [14:02:22] ottomata: standupppp [14:02:45] (CR) Milimetric: "Just some cleanup comments, looks good to merge if you fix the flake8 stuff. 
I confess I wasn't able to test it but I trust you that it w" (5 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [14:03:12] ohmygoodness standup [14:08:14] Analytics-EventLogging, Analytics-Kanban, operations: Eventlogging JS client should warn users when serialized event is more than "N" chars long and not sent the event [8 pts] - https://phabricator.wikimedia.org/T91918#1116583 (kevinator) Implementation notes from our tasking meeting: Client - adding m... [14:13:29] ottomata: I can't get that "we keep 1 PB of data in kafka buffers" thing from the article [14:15:56] they have a lot of kafka brokers [14:16:00] wait, linkedin? [14:16:08] the jay kreps lambda article? [14:42:07] qchris: can I ask you for some git/gerrit advice? [14:42:14] Sure. [14:42:18] it is very simple,and something you are good at. [14:42:20] so [14:42:25] i have a patch in for review [14:42:27] but i want to keep working on it [14:42:34] probaly will submit a new patch on top of it [14:42:53] but, i am not sure of the best way to still be able to edit the first patch if I get review comments [14:43:01] if i have a commit on top of it also in for review [14:43:29] how do you do it? [14:43:44] So you have a commit A1 that is parent of B1. [14:43:52] A1 is in gerrit and getitng reviewed, [14:44:05] while you privately hack on B1? [14:44:05] yes [14:44:16] for now, but i would like to put B1 up too [14:44:19] while A1 is still in review [14:45:03] Whenever you update A1 to A2, A3, A4, ... I'd also rebase B1 on the current A$N. [14:45:14] So you'd end up with A1 being parent of B2; [14:45:19] Grr. wrong. [14:45:20] Sorry. [14:45:23] So you'd end up with A1 being parent of B1; [14:45:31] A2 being parent of B2. [14:45:36] A3 being parent of B3. [14:45:38] And so on. [14:45:53] It's not super-nice, but I do not know of a simpler thing. [14:46:33] Do the rebasing/cherry-picking in your local checkout. 
Git is typically better at that than gerrit is. [14:46:33] locally how do you hack on A1? [14:46:36] check it out in its own branch, hack, and then submit [14:46:46] and then after that rebase B1? [14:47:38] I typically create a C1 (which is a child of B1). [14:47:45] Then I go "git rebase -i master" [14:48:05] and reorder A1 <- B1 <- C1 to A1 <- C1 <- B1 [14:48:20] (and squash C1 into A1). [14:48:41] That creates A2 (from A1 and C1), and B2 (just rebased B1). [14:49:01] This workflow assumes that A1 and B1 are not too intertwined. [14:49:21] Are A1 and B1 really intertwined for what you are trying to achieve!? [14:49:24] s/!// [14:49:26] not really [14:49:31] oh man sounds complicated though [14:49:38] reorder? [14:49:49] It only sounds complicated. [14:50:10] Reorder ... yes. Did you notice the "-i" switch in the rebase command? [14:50:15] That is for "interactive" rebase. [14:50:31] Git opens up an editor, with a list of commits that you are about to rebase. [14:50:38] ah no, ok, going to try it. [14:50:51] Swapping the lines of B1 and C1, then saving, swaps the order of the two commits. [14:50:59] so, in your example, C1 is what you want to turn into A2, yes? [14:51:09] A2 := A1 + C1. Yes. [14:51:11] ok [14:51:41] "git rebase -i master" will show you a file in your editor that looks like [14:52:00] pick $HASH_OF_A1 $COMMIT_MSG_OF_A1 [14:52:03] pick $HASH_OF_B1 $COMMIT_MSG_OF_B1 [14:52:09] pick $HASH_OF_C1 $COMMIT_MSG_OF_C1 [14:52:13] turn that into [14:52:21] pick $HASH_OF_A1 $COMMIT_MSG_OF_A1 [14:52:28] f $HASH_OF_C1 $COMMIT_MSG_OF_C1 [14:52:35] pick $HASH_OF_B1 $COMMIT_MSG_OF_B1 [14:52:38] and save. [14:53:06] in your case, are you using a working topic branch? [14:53:08] when you rebase? [14:53:10] (So order of B1 and C1 got swapped. and "pick" of B1 got turned into "f" which means "fix-up") [14:53:26] yes, but that is not so important. [14:53:30] so that rebase -i master chooses an earlier commit?
[14:53:35] i just did a test in a new git repo [14:53:39] with three commits [14:53:42] If you're not on a topic branch, you just end up with a detached HEAD: [14:53:45] rebase -i master [14:53:46] says [14:53:47] noop [14:53:51] noop [14:53:51] # Rebase 623654a..623654a onto 623654a [14:54:00] Is master at the parent of A1? [14:54:07] (It should be) [14:54:08] master is at heaed [14:54:10] head [14:54:12] Ah! [14:54:12] so, C2 [14:54:14] sorry [14:54:15] C1 [14:54:26] so that's why you'd need another branch [14:54:28] then try "git rebase -i HEAD~3" [14:54:30] k [14:55:10] (But typically I'd recommend to not work on "master" branch. Either work detached, or use a topic branch.) [14:55:20] (Topic branches are less error-prone) [14:55:36] aye [14:55:49] ok, that worked. [14:55:50] pffft. magnet and a steady hand. [14:55:55] how do I squash A1 and C2? [14:56:16] The change from "pick" to "f" in the rebase does the trick. [14:56:28] oh [14:56:33] "fix-up" is a "squash" without asking you about the commit message. [14:56:55] It's a short-cut that's really useful in those situations. [14:57:16] So one cannot pick the wrong Change-Id when squashing them. [14:57:19] hm Cannot 'fixup' without a previous commit [14:57:24] maybe i did something weird [14:57:27] trying again [14:57:28] from scratch [14:57:45] Yup. See scrollback. [14:57:49] A1 is "pick" [14:57:53] C1 is "f" [14:58:27] Let me boot the google machine... [14:59:10] yes i think my problem is in my trial i didn't have an A1, it was just the initial commit [14:59:15] trying again with an A1 [15:00:05] that worked qchris [15:00:21] ha, this is pretty annoying [15:00:25] hm, ok, at least I know how now [15:00:27] thank you! [15:05:20] yw :-) [15:17:03] hey nuria, EL q for you if you got a sec [15:17:45] or for ori too, but maybe he is not awake yet? [15:21:20] ottomata: what's up? [15:21:41] oh, it's kinda an EL architecture q, about the codebase.
[15:21:54] kinda a: do you think I should do this, or this, question [15:24:19] sorry, just caught up with your git/gerrit question above - I know how to do that reordering too, there's also another way that I do it [15:24:51] you can ask, I know the EL code fairly well [15:26:35] oh ok [15:26:36] ok [15:26:41] ok [15:27:24] so, both forwarder and processor have a similar problem, that is, even though they take input and output URIs [15:27:33] they are specific to zeromq [15:28:00] i'm wondering if they could take advantage of the handlers and just use get_reader and get_writer [15:28:07] and then potentially be able to work with any input or output [15:28:21] example [15:28:25] processor does [15:28:26] pub = pub_socket(args.output) [15:28:29] whereas, it could do [15:28:36] output_stream = get_reader(args.output) [15:28:41] sorry [15:28:44] get_writer(...) [15:29:36] hm, my opinion is it'd be better to keep these things single-purpose [15:29:39] i'm not sure about the unicode stuff though [15:29:50] so instead of that, just make a kafka_writer [15:29:57] whatever_writer [15:30:03] you mean an eventlogging-kafka-processor? [15:30:14] i already have kafka_reader and kafka_writer [15:30:17] that works fine [15:30:22] that's why the kafka URIs work [15:31:21] so you want to change the zeromq_writer to be more generic?
[15:31:27] no [15:31:34] i want the bin/ scripts to work with any reader or writer [15:31:44] oh, sorry, bin/scripts, duh [15:32:26] eventlogging has this fanciness to infer reader iterator streams and writer streams from URIs [15:32:30] it's really awesome [15:32:32] and works for kafka [15:32:42] but, the bin scripts are special cased and don't use the get_reader or get_writer methods (mostly) [15:32:50] -consumer does [15:32:52] and it works great with kafka [15:33:20] I think that idea makes sense, yea [15:33:53] so, i'm looking at processor now, to see how hard it would be to process events directly from kafka, rather than using 0mq [15:34:03] the reader is actually generic [15:34:07] oh HM [15:34:08] actually [15:34:10] lemme try that... [15:34:44] ottomata: sorry i missed your ping. did you get your EL question answered? [15:35:15] ah, no it isn't [15:35:18] sorry, that was my own code [15:35:21] ha, no [15:35:21] so [15:35:27] i'm looking at eventlogging-processor [15:35:40] and wondering if I can make it use generic get_reader and get_writer from input and output args [15:35:44] rather than doing the specific [15:35:51] sub_socket(args.input, identity=args.sid) [15:35:52] and [15:35:57] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/bin/eventlogging-processor#L53 [15:35:58] pub = pub_socket(args.output) [15:36:06] yeah, exactly [15:36:10] for that [15:36:11] why not do [15:36:19] output_stream = get_writer(args.output) [15:36:20] get_writer(args.output) [15:37:11] ha, the --sid flag can be encoded in the url if you are using 0mq get_reader [15:37:12] where's get_writer defined again? [15:37:23] tcp://*:8600?socked_id=abc [15:37:25] in utils i think [15:37:48] no, factory.py [15:37:48] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/factory.py#L66 [15:38:07] ottomata: the problem is that the processor assumes zeromq streams and you want to make it more generic, correct?
[15:38:22] yes [15:38:26] ottomata: did I understand that well? [15:38:29] same for forwarder too [15:38:31] sorta [15:38:47] the main issue is the iter_unicode stuff [15:38:55] i think those methods assume a file handle? [15:39:02] not just a generic iterator [15:40:15] file or zmq stream? [15:40:16] not sure what [15:40:21] if hasattr(stream, 'recv_unicode'): [15:40:21] return iter(stream.recv_unicode, None) [15:40:21] is for [15:40:27] i assume recv_unicode is a zmq method? [15:40:38] ottomata: lemme look [15:40:58] ja it is [15:41:04] ok, so yeah, the various writers have that decorator that registers them in factory.py, which then dispatches the appropriate writer depending on the URI scheme. I think get_writer should work fine [15:41:31] milimetric: it does work well, because consumer uses it [15:42:00] and it makes sense to make processor more generic, too [15:42:03] hmm, yeah, actually i think forwarder might be very easy to make generic [15:42:14] processor looks harder, because of the unicode stuff, [15:42:17] hm, i guess [15:42:31] those methods in stream.py already take conditional action based on the type of stream [15:42:32] I don't see where the consumer does it, or maybe it's not merged? [15:42:32] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/bin/eventlogging-consumer [15:42:36] if hasattr(stream, 'recv_unicode' [15:42:37] etc. [15:42:42] i could add another conditional for kafka [15:42:51] consumer calls .drive() [15:43:18] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/factory.py#L82 [15:44:15] seems that the current stream abstraction doesn't work with kafka [15:44:19] oooh [15:44:21] :) [15:44:42] ay ay [15:45:11] it could! [15:45:15] Adding conditionals there doesn't make the forwarder/consumer more generic though, right?
[15:45:27] it does, because then you could call iter_unicode(kafka_stream) [15:45:27] you just move one level below the "if" [15:45:39] i'd still have to make it use get_reader and get_writer [15:45:46] but i could pass those to iter_unicode [15:46:07] for raw_event in iter_unicode(get_reader(args.input)): [15:47:20] hm, yeah [15:47:23] actually, that is what I should do. [15:47:28] make the stream abstraction work with kafka [15:47:32] then everything else will fall into place [15:47:43] if this were OO i'd say [15:47:59] streams should be a factory that returns a stream handler based on input args [15:48:41] stream handler will be of type zeromq or kafka depending on args passed to forwarder (if those two are substantially different) [15:49:35] now... this all being python modules i imagine you can add the needed logic to manage kafka streams to streams (seems that it belongs there) [15:49:42] yeah, as is it's functional without pattern matching. But with get_writer and get_reader, your problem only starts here: https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/bin/eventlogging-processor#L66 [15:50:45] (and btw, apply_safe in factory is kinda like pattern matching) [15:51:10] Probably what we want to change is the fact that the forwarder/processor assumes there is only one possible stream [15:51:57] hmm, milimetric, yeah [15:52:01] that is something I don't understand [15:52:06] why all the special unicode handling? [15:52:50] so "make stream abstraction work with kafka" sounds good and also "make forwarder work with stream abstraction rather than call out methods directly" [15:54:09] I think it's just being careful to keep everything unicode because python 2 sucks at handling that kind of thing unless you're explicit [15:54:46] do you mean just calling pub.send_unicode or other stuff?
[15:55:02] other stuff too [15:55:10] the stream abstraction does iter_unicode pretty much everywhere [15:55:20] which means that it either calls zmq's recv_unicode [15:55:21] or [15:55:26] ottomata: because sys.setdefaultencoding('utf-8') [15:55:31] wraps a file object with a utf conversion [15:55:35] ottomata: which is not the default in python [15:56:05] milimetric: https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/streams.py#L77 [15:56:17] i wonder, if i change iter_file to be a more generic utf8 wrapper [15:56:22] to work with files, or iterators of any kind [15:56:41] like, instead of implicitly using io.open, just wrap another generator, hm. [15:56:42] Hmmm [15:57:53] nuria: to do that, would I just make sure each message does unicode(message, "utf-8") [15:58:01] or [15:58:02] decode('utf-8', 'ignore') [15:58:03] ? [15:58:07] (i'm looking at SO :) ) [15:59:20] ok, but note that setting sys.setdefaultencoding('utf-8') on forwarder is affecting all code that gets executed from that moment on in that process [15:59:28] it seems to me like this line means iter_unicode only handles files and things with recv_unicode (zeromq): https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/streams.py#L90 [15:59:37] yeah, milimetric, i think this might work: [15:59:49] https://gist.github.com/ottomata/4b62022150d52300a7fa [15:59:58] maybe [15:59:59] ? [16:00:23] instead of ignore, why not 'replace' like he uses? [16:00:54] but that sounds ok... [16:01:20] oh is that a thing? [16:01:33] aye yeah [16:05:17] ooo that works! cool, i can now use stream() on a generic iterator! [16:06:41] ottomata: yes, I was referring to the Jay Kreps article before. 1 Petabyte!! We should totally think that through when we start work this next quarter.
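The "generic utf8 wrapper" idea above — wrap any iterator rather than implicitly opening a file, and use 'replace' so undecodable bytes become U+FFFD instead of raising — might look like this. This is a sketch of the approach, not the gist's or the merged code:

```python
def iter_unicode(stream):
    """Yield unicode from any iterable of bytes/str: a file object,
    a socket wrapper, a kafka consumer, or a plain list."""
    for message in stream:
        if isinstance(message, bytes):
            # 'replace' swaps undecodable bytes for U+FFFD instead of raising
            yield message.decode('utf-8', 'replace')
        else:
            yield message


# works on any generator or list, no io.open() required:
decoded = list(iter_unicode([b'caf\xc3\xa9', 'already unicode', b'\xffoops']))
```

The `'replace'` handler is what makes this safe against the occasional corrupt byte in a raw log stream, at the cost of silently mangling it.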
[16:07:14] also - here's my worse but maybe simpler way to do what you asked about gerrit: [16:08:16] git checkout -b A [16:08:16] work work [16:08:16] git commit [16:08:16] git review [16:08:16] git checkout -b B [16:08:16] work work [16:08:17] git commit [16:08:17] git review [16:08:17] -- this is when you want to address changes to A, right? [16:08:27] yes [16:08:45] git checkout A [16:08:45] work work [16:08:45] git commit --amend [16:08:46] git checkout B [16:08:46] git rebase A [16:08:46] git review [16:09:05] that will submit A2 and B2 both [16:09:18] because you brought A2 into your B topic branch [16:09:22] by rebasing [16:09:24] ah so just multiple branches, aye, but. hmm, gerrit doesn't mind if your topic branch B depends on an unmerged commit in topic branch A [16:09:24] ? [16:09:41] no, it doesn't care [16:09:58] I suck at synchronizing my local topic branch names with the gerrit ones, but I don't think anyone cares about any of that [16:10:16] well, I take that back, I don't think people like me care, I'm sure Christian cares :) [16:11:10] anyway, I think Christian's way is more correct, but this might be easier to reason about / remember. At some point I'll start doing it Christian's way. Also, I'd say use squash instead of fix-up [16:11:29] until you're more comfy with the whole thing anyway [16:12:20] hm, k [16:12:21] thanks [16:12:38] ok, milimetric, here's an annoying thing about using uris all the time [16:12:50] see my comment here? [16:12:50] https://gerrit.wikimedia.org/r/#/c/196073/3/server/bin/eventlogging-kafka-forwarder [16:13:03] basically, in places where raw input is required [16:13:12] i need to make sure that the URI has raw=True on it [16:13:54] whereas, because these scripts don't use get_reader() for example, but call things like sub_socket directly, there is no json conversion. [16:14:18] i guess, hm, get_reader and get_writer are assuming that you are working with json only.
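The raw=True-on-the-URI behaviour being described could be sketched like so; the parameter parsing and function names are assumptions for illustration, not the actual streams.py code:

```python
# Sketch: a reader yields raw strings when the URI carries raw=True,
# otherwise each message is json.loads()-ed (the default being discussed).
import json
from urllib.parse import urlparse, parse_qs


def reader_for(uri, stream):
    """Return an iterator over `stream`, raw or JSON-decoded per the URI."""
    params = parse_qs(urlparse(uri).query)
    raw = params.get('raw', ['False'])[0] == 'True'
    if raw:
        return iter(stream)            # hand back the strings untouched
    return (json.loads(m) for m in stream)


lines = ['{"event": 1}', '{"event": 2}']
raw_events = list(reader_for('tcp://localhost:8600?raw=True', lines))
events = list(reader_for('tcp://localhost:8600', lines))
```

This also shows why the processor needs raw=True: its input lines are not pure JSON, so the default `json.loads` path would blow up on them.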
[16:14:23] whereas raw input might not be json [16:14:33] e.g. from varnishncsa it is not, there are extra fields [16:14:42] that's why processor exists, to convert it to json. [16:14:43] hm [16:16:26] hm, i could make the methods in factory.py take and pass through a raw argument, instead of relying on it being in the uri? [16:16:48] ottomata: that is why there are two processors though, watch out [16:17:32] yes, but still, in both cases those processors' input is raw [16:17:45] ottomata: one for client side one for server side, the process to convert both to valid json is different [16:17:48] ottomata: ah yes [16:18:46] so i mean, it isn't that big of a deal, in the bin scripts themselves [16:18:50] to make sure the input url has raw=True on it [16:18:54] but, it is a little messy [16:18:58] thoughts? [16:19:09] sorry, what's "raw" mean? [16:19:14] raw means [16:19:15] ok [16:19:18] if raw=False [16:19:20] which is the default [16:19:30] the stream returned will automatically convert each message to a json object [16:19:35] rather than just give you the string [16:20:02] processor needs the raw string, because, it assumes that the input will not be a json object [16:20:27] the whole point of processor is to take a raw stream of strings and parse out the json field, and then convert that to a json object stream [16:20:50] so, in order for processor to work, it has to iterate over a raw stream, [16:20:54] if i don't set raw=True [16:21:09] the iterators in streams.py will call json.loads() on the string [16:21:33] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/server/eventlogging/streams.py#L103 [16:23:24] ottomata: that seems a good thing to me, distinguishing between raw events and json-ified events [16:24:54] yes, it is good [16:24:56] ottomata: yeah, it seems like a fine thing to do in the forwarder, because it wouldn't ever be different for that forwarder [16:25:07] right, i would also do it in processor [16:25:28] maybe just add a short
explanation above [16:25:32] if i wanted multiplexer to work with kafka, i'd have to do it there too [17:08:00] Analytics-EventLogging, operations: Delete vanadium:/srv/eventlogging - https://phabricator.wikimedia.org/T75084#1116914 (RobH) a:Ottomata @ottomata: is this task something you would handle? I'd think so, since the parent is also assigned to you. If not, please feel free to remove yourself. [17:34:18] Analytics-EventLogging, operations: Delete vanadium:/srv/eventlogging - https://phabricator.wikimedia.org/T75084#1117037 (Ottomata) Ja can do. [17:35:39] Analytics-EventLogging, operations: Delete vanadium:/srv/eventlogging - https://phabricator.wikimedia.org/T75084#1117041 (Ottomata) Open>Resolved [17:55:03] mforns: did you test the EL code with the multimedia events? [17:55:16] nuria, yes [17:55:51] mforns: sounds great [17:55:52] you mean in beta? [17:56:08] I tested in vagrant [17:56:13] mforns: no you cannot test it in beta cause there we only control changes to server, not client [17:56:14] not in beta? [17:56:19] ok [18:35:45] milimetric, I tried to find how to mock module (not instance) methods with MagicMock and I didn't find anything, do you have an idea if that is possible? [18:36:25] mocking the module method seems to have no effect [18:36:38] mforns: I might be naive, but I thought it was just module.method = Mock(...) [18:36:39] and the code that uses utcnow is not in a class [18:36:50] this has no effect [18:37:28] huh, weird, lemme mess with it for a second.
It doesn't matter anyway, I'm not concerned about that test not passing :) [18:37:29] milimetric, it also happens if you try to mock a module method by hand, it seems python does not let you assign to modules [18:37:42] if I can't figure something out in the next five - ten minutes, then you can just ignore my comment [18:37:48] I could change reportupdater.py to be a class [18:37:58] nah, that's fine, no need [18:38:10] but it would be lots of lines of code (tests) [18:38:23] ok [18:39:31] mforns: batcave? [18:39:34] sure [18:57:34] milimetric, https://phabricator.wikimedia.org/T92596 is about VE only, right? [18:57:46] it doesn't include any of my wikieditor instrumentation [19:03:46] Krenair: no, it doesn't , the issue is present on client sie data [19:03:50] *client side [19:05:24] nuria, do you know why that bug says rev_id rather than rev_user? [19:05:37] Krenair: lemme see [19:06:48] I wonder what "James seems to think this is because mw.user (which is used to get .isAnon()) was changed to be async at some point. " meant [19:07:37] you can get logged in/logged out in the middle of edits, due to other tabs, etc. I wonder if that's related [19:07:59] Krenair: it means that the user (user.id, user.class) in most VE sessions (client side data) [19:08:15] Krenair: is listed as being anonymous when it is really not [19:08:53] Krenair: James_F thought it might have something to do with the fact that mw.user it loaded async and -in some instances, according to james- [19:08:59] not present when VE is loaded [19:09:28] Krenair: "you can get logged in/logged out in the middle of edits, due to other tabs, etc"-> for the majority of our data? no, i do not think so [19:10:06] Krenair: I think there is some instrumentation failure there, otherwise we will have a small portion of data be wrong, not the majority of it [19:10:23] you think that case is a minority and a much larger part of the data is just wrong? 
[19:10:27] Krenair: wrong, when it comes to identifying anonymous users, that is. not wrong overall [19:10:45] Krenair: no, let me explain. [19:11:20] Krenair: The "majority" of the data incorrectly identifies logged in users as anonymous. That is the only incorrect part. [19:11:49] ok [19:11:50] Krenair: That is separate from whether "userX" went through steps 1 -> 2 -> 3 of the funnel [19:11:55] that's what I thought you were saying before :) [19:12:33] Krenair: What we know is that we cannot do any analysis of user types with the data we have. [19:13:30] Krenair: ah, ok, "wrong data" is kind of too broad. "dataset does not correctly label anonymous users" is more precise [19:13:54] right [19:13:59] wrong user class data is what I meant [19:14:43] sorry I missed your ping, Krenair [19:15:08] one clarification [19:15:16] it wasn't that long ago, a lot of people take much longer to respond [19:15:31] :) [19:15:46] k, so rev_id may be correct, I can't say with 100% certainty, but it looks ok [19:16:01] the user.id is definitely not matching the revision table's rev_user column though [19:16:15] (if merging by event_page.revid = revision.rev_id) [19:16:41] so my point was that most likely it's just event_user.id that's wrong, but it could very well also be event_page.revid, not sure [19:17:00] and yes, this only applies to VE, but James told me to ping you about it first, even though Roan was the one who wrote it [19:17:53] Krenair: let me know if I can clear up anything else [19:27:16] mforns, milimetric, I think I'm trying to do the same thing (mock utcnow, not used in a class) [19:27:24] did you figure out something that works? [19:27:38] fhocutt: yea [19:27:45] Marcel's about to push another patch [19:28:00] and that will have the example, will that work? [19:28:04] I think he's pushing soon-ish [19:28:21] I'll take a look, can one of you link me when it's out? [19:28:27] thanks!
[19:28:28] yes fhocutt, in 10 minutes [19:28:48] I'm doing a last testing round [19:29:09] sure, I'll paste the link [19:40:10] milimetric, so mediawiki.user is loaded at the same time as ve.init.mw.trackSubscriber.js [19:41:50] (PS13) Mforns: Add reportupdater: a more reliable scheduler [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) [19:41:50] Krenair: at the same time meaning the same request? [19:42:00] (CR) jenkins-bot: [V: -1] Add reportupdater: a more reliable scheduler [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [19:42:26] nuria, mediawiki.user is a dependency of the module which contains the instrumentation code [19:42:32] but I don't think that's the issue [19:42:44] milimetric, I pushed the change for fhocutt to see, but I still will fix flake8, so let me ping you [19:42:52] mw.user.isAnon() effectively returns mw.config.get( 'wgUserName' ) === null [19:43:14] thanks, mforns. Which test was the one with the mock? [19:43:20] Krenair: ok, that sounds good, sounds like deps are right. [19:43:33] that comes from MW core: [19:43:34] 'wgUserName' => $user->isAnon() ? null : $user->getName(), [19:44:29] fhocutt, https://gerrit.wikimedia.org/r/#/c/192319/13/reportupdater/test/reportupdater_test.py line 42 [19:44:50] great! thanks very much. [19:45:39] the catch is... the utcnow wrapper function has to be implemented in the same module that will call it [19:46:09] Krenair: how is VE executing the PHP code though, if you use VE only you are hitting only cached pages.. or not? [19:46:10] I implemented it inside https://gerrit.wikimedia.org/r/#/c/192319/13/reportupdater/reportupdater.py line 168 [19:46:16] fhocutt, ^ [19:46:49] oh, interesting [19:46:49] which is the module that calls it [19:46:51] ok.
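The wrapper trick mforns describes — datetime.datetime is a C type whose attributes cannot be patched directly, so the module under test defines its own utcnow() wrapper and tests patch that module attribute instead — might look like the sketch below. The function names are illustrative; reportupdater's actual code differs.

```python
# Sketch of mocking a module-level function (no class involved), per the
# discussion above. The wrapper lives in the same module that calls it.
import datetime
import sys
from unittest import mock


def utcnow():
    """Thin wrapper so tests can patch this module attribute instead of
    datetime.datetime.utcnow (which lives on an unpatchable C type)."""
    return datetime.datetime.utcnow()


def is_late(deadline):
    # utcnow is resolved through module globals at call time,
    # so a patched attribute is picked up
    return utcnow() > deadline


this_module = sys.modules[__name__]
with mock.patch.object(this_module, 'utcnow',
                       return_value=datetime.datetime(2015, 3, 1)):
    late = is_late(datetime.datetime(2015, 6, 1))
```

The key point, matching "[19:45:39] the catch is...": if `is_late` called `datetime.datetime.utcnow()` directly, or imported `utcnow` from another module with `from x import utcnow`, patching would not take effect.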
[19:46:57] nuria, VE doesn't execute that code [19:47:04] it comes from when you initially load the page from the server [19:47:19] I'll make that change to the code I'm testing, then [19:47:25] Krenair: right, i know, but if you are anon using VE, are you [19:47:28] thanks again. [19:47:32] Krenair: still loading from server? [19:47:39] fhocutt, if you need help, ping me :] [19:47:49] you are loading VE JS modules from the server, and some data about the page from the server [19:48:03] mforns, will do. I'm trying to figure out mocks in general, I haven't worked with them before [19:48:17] ok [19:49:19] Krenair: So the VE JS Module is not cached at all? [19:49:56] nuria, what? [19:50:01] it is cached [19:50:13] it's not relevant at all to the determination of whether the user is logged in or not [19:50:32] Krenair: ah, ok, i misread, i think, "some" data comes from server means that if you use VE you are not hitting varnish, is that right? [19:50:34] as I said, that information comes from the initial page load, from MediaWiki core itself [19:52:56] Krenair: then those values will be correct if js is loaded after all info from the main document is available, correct? [19:53:47] nuria, should be [19:54:22] It would be interesting to log the contents of wgUserName and compare it to the value that comes out in rev_user_text [19:54:28] So the javascript "needs to wait" for the main document to be loaded to have all vars available, mmmmm [19:54:53] Krenair: so, if those values are read only once (upon ve js loading) there is your problem right? [19:55:20] Krenair: if js is read and executed before all values on main document are available those values will be null for the rest of the session [19:55:40] nuria, you know how VE actually loads in-page, right?
you have to load the page (which sets wgUserName), then open VE, which checks wgUserName while logging events [19:56:01] You're not making much sense [19:56:37] Krenair: sorry, i do not know how VE is loaded, but if it's loaded always after main document then those values should be OK [19:56:59] Krenair: given that they are not maybe something is not working as it should, correct? [19:58:12] We should be thinking about this as: What does wgUserName say they're logged in as while logging events, vs. who is their edit actually attributed to in the end? [19:59:06] Krenair: so wgUserName is coming on as null [20:00:56] while their edit is getting saved as a logged in user, yes [20:03:10] right, Krenair, I think the fact that ve code depends on the mw.user module rules out the problem that James thought was happening here [20:03:53] if ResourceLoader is to be trusted anyway, which I feel is an easy assumption to make [20:04:30] to me, the fact that the percentage of anons stays relatively constant with time also rules out that possibility. [20:05:51] Krenair: maybe talk to Roan about it. I'm going to go work on my pretty dashboards and when I'm done sometime next week I'll come help again. But until this problem is resolved, we can't analyze users by type or pages by size. [20:10:34] milimetric, nuria: I don't think either of you understand what is happening here. [20:10:57] Krenair: How is js loaded? cause mmm... that is not that hard [20:11:13] :) ok, let me explain real quick, just to make sure [20:11:17] Krenair: or what exactly? 
[20:11:20] mediawiki.user is definitely loaded [20:11:27] nuria, Krenair: hold on please [20:11:34] I'll run through what I think is happening [20:11:37] if it was not, you would never be able to get user.class=IP set [20:11:38] and you can tell me where I'm wrong [20:11:52] mediawiki.user does not send data to the client about whether it is logged in or not [20:11:57] Krenair: one second please [20:12:09] its isAnon function effectively returns mw.config.get( 'wgUserName' ) === null [20:12:13] you said I didn't understand, allow me to explain what I understand, please [20:12:56] That variable (wgUserName) is set up when you initially load the page for viewing, and is set by MediaWiki core as a JS global [20:13:00] page loads (wgUserName is a global variable set by the server, set to the actual user name if the user is logged in) [20:13:13] Krenair: I asked nicely, please shut up now [20:13:15] k, thx [20:14:00] after page load, the VE code is initialized [20:14:20] as part of this initialization, it requests access to the mw.user module, which defines mw.user.isAnon() [20:14:44] the initialization doesn't happen until that module is loaded, and this is ensured by the way ResourceLoader works [20:15:22] therefore, when mw.user is used from VE code, it's certain to be loaded correctly, and return the correct value as far as wgUserName is concerned [20:15:35] Krenair: now I am done, thanks for listening, now you can explain what I'm getting wrong [20:15:54] that part sounds fine [20:16:20] nuria: this is getting good! [20:16:21] https://gerrit.wikimedia.org/r/#/c/196073/ [20:16:21] https://gerrit.wikimedia.org/r/#/c/196637/ [20:16:24] ok, that's all I understand, and as far as I know that's all that influences what the event is emitting [20:16:39] is that not true? [20:16:50] That sounds true to me [20:17:32] ottomata: all right! let's make sure to test this thoroughly before deploying.
[20:17:56] Krenair: ok, then we're on the same page, the bad data is still a mystery :) [20:18:05] ottomata: will CR in detail after i finish a CR I have for Hovercards [20:18:33] indeed. [20:18:59] since I rolled this change into the existing forwarder, the existing config files will have to change [20:19:04] they now need to specify URIs instead of just ports [20:19:16] ottomata: I would love for changes to be backwards compatible though [20:19:27] ottomata: i think that is a requirement for any change to a service [20:19:30] they are, it's just the one format [20:19:34] what do you mean? [20:19:40] we can change the config file pretty easily [20:19:52] everything else should work exactly the same [20:20:24] milimetric, https://phabricator.wikimedia.org/diffusion/EVED/browse/master/modules/ve-mw/init/ve.init.mw.Target.js;9dd64e3f8462e9d3c764263a9021ed0b295accc0$609-629 seems relevant [20:21:05] ottomata: but we shouldn't have to change puppet + service at the same time for things to work [20:21:10] ottomata: config is on puppet [20:21:24] though I wonder what happens if the user's login status changes before they attempt to save [20:21:45] yeah... hm [20:21:54] you are saying i should make it figure out what to do if it has just a port... [20:21:56] ok, not a bad idea [20:21:57] i can do that [20:22:50] hmm [20:23:15] ottomata: because runs of puppet are decoupled from service deploys (not a bad thing), config is in a different system than service code [20:23:22] nuria: i dunno, it is actually pretty inelegant to do that with argparse [20:23:24] yes, that is good. [20:23:27] Krenair: indeed that could cause different users throughout a session, good find [20:23:36] but, we can change config without having eventlogging-forwarders restarted, no? [20:23:43] I suppose then, the only change in users should come after a saveFailure event?
[20:23:46] or, we can do it manually first, deploy, restart forwarders [20:23:49] then make sure puppet is cool [20:24:14] ottomata: mmm, in theory yes, in reality processes die and upstart restarts them [20:24:18] milimetric, a user's login status can change in the background and we can't do much about that [20:24:28] why would they die? [20:24:31] until they save, at which point their old token is now invalid so we have to re-request [20:25:02] ottomata: because a runtime exception will kill the python process [20:25:06] 1. disable puppet [20:25:06] 2. deploy new code [20:25:06] 3. edit config files [20:25:06] 4. restart forwarders [20:25:06] 5. enable puppet with new changes [20:25:18] why would there be a runtime exception? [20:25:25] (PS14) Mforns: Add reportupdater: a more reliable scheduler [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) [20:25:47] ottomata: ya, i get it, what i was trying to avoid when i said "backwards" compatible is that we should not have to "stop" puppet [20:26:13] ha, naw, stopping puppet is how you do migrations like this, i do it all zee time [20:26:30] although, sure. [20:26:48] ottomata: well, there are runtime exceptions happening not all the time, but frequently enough that if you look at the processes of EL on vanadium [20:27:05] ottomata: they have different uptimes, when they were all started at the same time [20:27:13] haha [20:27:17] hookaayyy [20:27:25] ottomata: which is fine, for a tier-2 service that is [20:27:26] nuria: does EL run in beta? [20:27:36] ottomata: yes, see: [20:28:21] ottomata: https://wikitech.wikimedia.org/wiki/EventLogging/Testing/BetaLabs [20:28:39] ottomata: we fixed it last quarter so it runs clean on beta [20:28:52] ottomata: you need sudo on that box to see logs/db etc [20:29:12] oh cool, i could deploy a varnishkafka instance in betalabs to test the kafka processor then?
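The backward-compatibility idea discussed above — make the forwarder figure out what to do if the config has just a port instead of a full URI — could be sketched roughly like this. The `normalize_endpoint` helper and the `tcp://localhost` default are assumptions for illustration, not the actual eventlogging forwarder code:

```python
from urllib.parse import urlparse

def normalize_endpoint(value, default_scheme='tcp'):
    """Accept either an old-style bare port ('8600') or a new-style URI
    ('tcp://host:8600') and always return a URI string.
    Hypothetical helper sketching the backward-compatible config
    handling discussed above; names and defaults are assumptions."""
    if value.isdigit():
        # old-style config: just a port number, assume localhost
        return '%s://localhost:%s' % (default_scheme, value)
    # new-style config: already a URI; check that it actually parses
    parsed = urlparse(value)
    if not parsed.scheme or not parsed.port:
        raise ValueError('expected a port or scheme://host:port URI: %r' % value)
    return value

print(normalize_endpoint('8600'))                # -> tcp://localhost:8600
print(normalize_endpoint('tcp://10.0.0.1:8600')) # -> tcp://10.0.0.1:8600
```

Doing this normalization after parsing (rather than inside argparse itself) sidesteps the inelegance nuria mentions: argparse just sees a string, and the forwarder decides how to interpret it.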
[20:29:41] (PS15) Mforns: Add reportupdater: a more reliable scheduler [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) [20:29:41] ottomata: right, right, yes, that is what i mean when i said "test on beta" [20:30:01] (CR) Mforns: Add reportupdater: a more reliable scheduler (5 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [20:30:28] coool [20:30:39] milimetric, now it is ready: https://gerrit.wikimedia.org/r/#/c/192319/ [20:30:59] mforns: ok, awesome, I will merge at the end of the day today [20:31:11] milimetric, no rush from my side [20:31:19] thanks! [20:31:31] ori:yt? [20:31:36] ori: can you merge? https://gerrit.wikimedia.org/r/#/c/191231/ [20:32:10] Krenair: yeah, totally, the user can change, and that's fine. The edge cases don't bother me here, it's more the fact that we're seeing 78% of events from anonymous users, and everything we know about our site tells us that this is very likely to be wrong [20:32:32] one idea was that maybe most people start out logged out and they only log in to save [20:32:48] but then we would have less than 58% anonymous on the saveSuccess event, which we don't [20:32:57] nuria: what is the beta labs project name? [20:33:22] deployment-prep? [20:33:23] milimetric, what about just saveSuccess events? [20:33:23] that it? [20:33:25] i think i'm in that [20:33:30] according to the enwiki.revision.rev_user=0 definition of "anonymous" we have much fewer than that many anonymous editors [20:33:32] yes! [20:33:39] ottomata: https://gerrit.wikimedia.org/r/#/c/191231/ [20:33:41] how closely do those match up to their resulting rev_user? 
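nuria's point above — an uncaught runtime exception kills the Python process, upstart respawns it, and that is why the EL processes show different uptimes despite being started together — can be sketched as follows. Both function names are hypothetical; this illustrates the failure mode and one defensive alternative, not the actual EventLogging code:

```python
import logging
import time

def forward_events(source):
    """Hypothetical forwarding loop: any uncaught exception here kills
    the whole process; upstart's respawn then brings it back, resetting
    its uptime (the effect observed on vanadium)."""
    for event in source:
        if event is None:
            raise RuntimeError('malformed event')  # simulated runtime exception
        # ... forward the event downstream ...

def run_forever(source_factory, retry_delay=1.0):
    """Defensive alternative: catch, log, and retry in-process instead
    of relying on the supervisor to restart a fresh process."""
    while True:
        try:
            forward_events(source_factory())
            break  # source exhausted cleanly
        except Exception:
            logging.exception('forwarder crashed; retrying')
            time.sleep(retry_delay)
```

Whether the in-process retry is worth it is a judgment call; as noted in the discussion, supervisor respawns are arguably fine for a tier-2 service.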
[20:33:42] ottomata: SORRY [20:33:45] Krenair: yeah, that's what I was saying here: 16:32:47 but then we would have less than 58% anonymous on the saveSuccess event, which we don't [20:33:50] ottomata: http://en.m.wikipedia.beta.wmflabs.org/wiki/Main_Page [20:33:51] ah right [20:35:32] Krenair: I didn't have time to look at that matchup in depth, I'm really sorry about that, I have to focus on finishing up the Q3 deliverables [20:36:02] but I will return to the analysis once that's done (maybe end of next week?) Until then, hopefully you guys can talk about this and look it over a bit [20:36:11] maybe there's some simple explanation [20:36:20] yeah I'm going to talk to Roan about it [20:50:32] thanks, mforns, that works for me [20:50:49] ok fhocutt good [21:15:08] (CR) Milimetric: [C: 2] Add reportupdater: a more reliable scheduler [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) (owner: Mforns) [21:59:06] nuria, as we spoke in tasking, I'm looking at https://phabricator.wikimedia.org/T90742 [21:59:30] I see there is already a changeset in gerrit for that task [22:00:10] nuria, I'd like to know what is the current status? [22:08:04] y'all have a nice weekend, bye [23:12:07] have a nice weekend y'all [23:45:30] Analytics-Dashiki, Analytics-Kanban, Patch-For-Review: Pageviews not loading in Vital Signs - https://phabricator.wikimedia.org/T90742#1118389 (Nuria) Change should not be needed according to @ottomata, but I think it might, that is what it needs to be checked out, whether -via puppet- we need to set u...