[00:22:01] greg-g: I cannot think of any existing source of data for logins over HTTPS and as a function of country [00:22:37] IceWeasel beats Chrome with Google Chart support, wow [00:23:31] yeah, who would've thunk it [00:23:53] DarTar: so, ori is working on getting that info out of the centralauth.log file [00:24:10] ah! [00:24:14] yeppers :) [00:24:30] awesome [07:59:30] (CR) Faidon: "I don't have a full grasp of the issue either but that's what Stefan said above and my understanding so far, yes." [analytics/dclass] (wikimedia) - https://gerrit.wikimedia.org/r/79453 (owner: Stefan.petrea) [13:01:19] morning [13:06:03] Morning :-) [13:07:58] morning [13:09:15] moriniiiing! [14:20:12] hey qchris: can i move 1077 to done? [14:20:17] and what about 1025? [14:20:27] ottomata: can i move 385 to done? [14:20:40] and what about 933? [14:20:56] milimetric: 823 moves to the next sprint? [14:21:16] drdee: 1077 I did not yet receive a confirmation that from the customer that it works for her. [14:21:41] drdee: 1025: I am not sure. Did you read the comment on the card? What do you think? [14:22:06] yeah, drdee, I think so. I'm only at 79% [14:22:08] 1077: jessie just emailed us [14:22:17] but the last few tests should be much easier [14:22:28] 1077: If she's fine, we can move it to done. [14:22:32] 933 will be a slow work in progress drdee [14:22:41] thx ottomata [14:22:44] but, it is 'set up' so if you want to move it to done, and then create a new card for tweaks [14:22:47] that would be fine [14:22:59] 1025: well it's merged (https://gerrit.wikimedia.org/r/#/c/79584/) [14:23:10] i will delete stuff from stat1 now, so 385 will be done [14:23:30] qchris so i think it's done [14:23:36] drdee: ok. [14:23:41] drdee: Cool :-) [14:24:13] qchris: 1023 is still in progress, right? [14:24:52] milimetric: do you want to demo 822? [14:24:58] drdee: Shouldn't we postpone such discussions to standup? [14:25:07] no standup on sprint days :) [14:25:11] on days we have sprint demo we don' do standup [14:25:13] Oh ... :-) [14:25:18] so that's why i do it like this :) [14:25:19] sure, I can demo the aggregation [14:25:27] i am prepping the slidedeck [14:25:39] how short on points are we so far? [14:25:48] what do you mean? [14:25:53] I could... I suppose, try to rush 823 just to get more coverage and skip any hard stuff [14:26:05] nah [14:26:11] k, cool [14:26:35] we've got 44 points in two sprints so it's about half our normal velocity [14:26:38] drdee: 1023: running the jobs outside of Hadoop is done. The files required by the card are available. But they lie in my home directory as suggested by the card. [14:26:45] but most of us were on holiday of course [14:26:57] drdee: 1023: We do not yet know where to put them. [14:27:17] why not the same space as the w0 stuff? [14:27:24] drdee: 1023: Where should they go? (The http URL probably will be turned off soonish) [14:27:43] drdee: The Wikipedia Zero stuff goes straight into dashboard computations. [14:27:43] stats.wikimedia.org/public-kraken/ [14:27:52] ottomata: ^^ agree ? [14:28:03] How to get data in there once the cluster is turned off? [14:28:07] stats.wikimedia.org/public-kraken/ is hosted on stat1001 [14:28:20] so an rsync should work [14:29:20] milimetric, qchris: is there something else we want to demo? [14:29:29] 824? [14:29:43] drdee: There is nothing I can demo. [14:29:46] and maybe 1072 [14:30:02] I can quickly click on 824 while I'm waiting for the aggregate results, sure [14:30:12] drdee: There is nothing to demo on 1072 :-) [14:30:24] maybe you can explain it a bit? [14:30:46] yeah, totally 1072 [14:30:51] ? [14:30:57] Ok. If you want. [14:31:01] i mean it saved our ass already once ;0 [14:31:03] so what you say is: [14:31:13] there was a configuration mistake made [14:31:17] we saw weird numbers [14:31:25] did an analysis, which led to the fix [14:31:31] and show people the graphs [14:31:32] ok, let's demo 822, 824 and explain 1072 [14:31:34] people love the graphs :) [14:31:36] don't use public-kraken [14:31:41] buut uuuse [14:31:48] http://stat1001.wikimedia.org/public-datasets/ [14:31:49] eh? [14:31:55] er? [14:32:30] ottomata: How can I get data into that directory? I am neither root, nor in the ww-data group. Is it fetched from somewhere else? [14:32:39] hmmm, right now it is synced from stat1 [14:32:43] /a/public-datasets [14:32:47] but that is annoying for you, right? [14:32:50] since you are working on stat1002? [14:32:59] i think we can set up the rsync from stat1002 too...... [14:33:03] if that is easier? [14:33:35] ottomata: I can just copy it over for the time being. [14:33:40] ottomata: That's ok [14:33:57] ok thanks [14:33:58] yeah so [14:34:07] if you put files in stat1:/a/public-datasets [14:34:12] they will eventually show up at that stat1001 url [14:34:32] it is synced every 30 mins [14:35:26] ottomata: /a/public-datasets contains mostly wiki directoryies. So I just add a kraken directory in there? Or a mobile? What structure do you prefer? [14:35:55] hmm, whatever you want is fine, maybe not 'kraken' [14:36:04] maybe an analytics/ [14:36:05] subdir? [14:36:25] Sure. �analytics/mobile� ? [14:36:26] analytics/mobile is cool? [14:36:27] yeah [14:36:41] Ok. I'll use that one then. [14:36:45] ottomata: Thanks [14:38:19] ottomata: what's left to be done for 933? [14:38:38] 933 is camus? [14:38:40] yup [14:38:41] it is kind of undefined i think [14:38:45] :( [14:38:48] well that's not good [14:38:50] will my pull requests be merged upstream? [14:38:52] is that important? [14:39:03] Snaps and I have been talking about using timestamp in the kafka key [14:39:11] do we want to do that instead of parse the json for timestamp value? [14:39:16] i think upstream merge is nice to have, not required [14:39:27] timestamp in kafka, could that be a separate card? [14:39:41] will Snaps put a newline at the end of each json object, or do I need to support that in the recordwriter? (he might have already done this) [14:40:01] i think Snaps will put a newline in each json objet [14:40:05] ottomata: there wont be a newline [14:40:08] HAHA [14:40:09] haha [14:40:14] i am WRONG [14:40:16] dawww, no, why not? can we? [14:40:16] YES! [14:40:16] :) [14:40:24] just after the object? [14:40:27] of course we can, but why? [14:40:48] readability, slightly easier parsability (but not really) [14:40:52] if it isn't in the kafka message [14:40:58] i ahve to insert it during etl phase [14:41:17] and it seems wrong to manually insert a newline, especially if i am trying to get this class upstreamed in camus [14:41:27] so, then I'd have to support arbitrary delimiters [14:41:28] etc. [14:42:07] the text log output has newlines at the end, right? :) [14:47:11] ottomata: timestamp in kafka key, could/should that be a separate card? [14:47:53] yeah [14:47:59] basically [14:48:05] i think we can move that card into done if you want [14:48:08] everything that it says is done [14:48:13] i just might work on it more [14:48:30] make it better, etc. [14:48:36] i know i know ;) [14:56:42] I thought you were gonna try to keep bandwidth usage down. [14:56:55] (Sorry, got sidetracked) [14:57:40] Snaps: eh? [14:58:03] what are you referring to? the addition of a newline? [14:58:06] yes [14:58:22] I'll have to talk to mark about this, dont like where this is going [14:58:51] haha [14:58:51] ok [14:58:56] which part, the newline or more than that? [14:59:19] seriously. newline, not a problem. [14:59:29] stuff in kafka message key: a bit of work, but not a problem. [14:59:45] the message key thing we aren't sure about, that def needs thought and discussion [15:00:24] Snaps: which part are you worried about? [15:00:59] none, I was just kidding about the newline. [15:01:24] oh haha, ok [15:01:30] phew! [15:02:03] i think i'm fine if you hardcode that in, but we could also put it in the output format string if you like [15:02:11] %x %t…..\n [15:02:12] ? [15:02:24] up to you [15:02:24] yeah, I can add a separate format stirng for the key [15:02:47] oh, yeah, no i mean the newline after the json object [15:02:51] which is very flexible, and also opens up for using the key for various partitioners. [15:02:52] right now we specify all the fields in the json object [15:03:01] but for key, yeah, that would be awesome too [15:03:09] I can add support for literal '\n' in the format string. [15:03:16] awesooome [15:03:36] up to you though. i know that might be kind of awkward in the json format string as is [15:03:55] since the json isn't necessarily ordered, and its not like we also enclose the object in the format string [15:04:00] it isn't like: [15:04:27] format = { %x %{t@timestamp} … \n } [15:04:29] oops [15:04:35] format = { %x %{t@timestamp} … }\n [15:04:47] but, up to you [15:04:54] yeah, it would be a special case for newlines at the end of format.json = [15:04:57] aye ok [15:05:04] but I guess it would be quite a common use-case [15:05:04] cool [15:05:08] yeah i think so [15:27:22] sorry guys - my wireless is going CRAZY [15:27:36] I'll be on/off line until I sort it out :( [15:35:52] ugh... they gave me a new laptop with the SAME OLD REALTEK CRAP CARD [15:36:19] welp [15:36:20] qchris, any idea if lshw && lspci could possible be wrong? [15:36:38] I took the hard drive from my old machine and installed it on the new machine [15:36:55] that's my last hope, that the driver is forcing lspci to tell me I have the wrong card [15:37:12] otherwise I'm sending this machine back and just buying one myself. [15:46:07] ottomata: could you do an email describing what you want? else I'll forget [15:47:52] sure [15:48:48] qchris: is 1023 now finished with syncing the data to http://stat1001.wikimedia.org/public-datasets/ ? [16:00:53] hi [16:01:02] just woke up [16:04:25] anyone wanna talk 704 ? [16:04:56] milimetric: ^^ [16:10:19] milimetric, average wants to talk about 704 [16:17:23] ok, I'll jump in the standup average & drdee [16:17:32] ah [16:17:32] nvm [16:17:34] it's taken [16:17:37] average, what's up [16:20:26] hangout ? [16:25:51] milimetric: btw, found a way to run less tests, but can't suppress output for the other ones. [16:25:56] rm .coverage test.db enwiki.db ; find -name "*.pyc" | xargs rm ; nosetests --cover-erase -e models tests/test_controllers/test_home.py:TestHomeController [16:26:02] average the hangout is taken up by Toby and Christian atm [16:26:06] milimetric: Sorry. I've been in a meeting. [16:26:13] no prob, I know [16:26:16] Oh. The hangout should be free again :-) [16:26:21] cool, thanks [16:26:23] average, hangout? [16:26:31] About lspci ... I never heard of lspci being wrong ... [16:26:49] heya Snaps [16:26:54] is this: [16:26:55] # Required number of acks [16:26:55] topic.request.required.acks = 1 [16:27:03] in that [16:27:10] is 'request' a topic name? [16:27:21] milimetric: I am not sure about the concrete problems. But the ids of lspci should work. [16:27:30] ok [16:27:43] no worries, thanks. I'll just fight through it. [16:27:49] I get really really frustrated with Linux :) [16:27:57] <-- windows guy [16:28:01] milimetric: :-D [16:28:03] ottomata: nopes. its just part of the property name. those properties are identical to the apache kafka counterparts. [16:28:13] ottomata: and there is only topic in varnishkafka. [16:28:22] ok [16:28:24] right [16:28:37] milimetric: what's the concrete problem? Your computer not seeing a realtex card? [16:28:41] according to this the property name is just 'request.required.acks' [16:28:41] http://kafka.apache.org/08/configuration.html [16:28:45] milimetric: Network card? [16:28:55] Snaps, is there a way to set sync vs async? [16:28:58] or is it always async? [16:29:23] ottomata: yeah, but I prepended the topic part since it is set per topic in librdkafka, not globally. [16:29:34] ok cool [16:29:37] that's fine [16:29:53] Snaps: its always async. the application calls the produce API of librdkafka, which is non-blocking, and recieves a callbakc when the message is either delivered, or failed delivery. [16:30:11] snaps->ottomata [16:30:12] a callback? [16:30:20] is that based on acks? [16:30:23] what if acks is 0? [16:30:29] ottomata: yeah. [16:30:42] acks=0 i guess there is just no callback [16:30:45] eh? [16:30:53] ottomata: so if required.acks = 0 it will not call a callback (actually, it will, for letting the application free the payload, if configured to do so) [16:31:00] oh ok, hm [16:31:02] interseting [16:31:09] ok so, here's my q [16:31:09] so if you are more into performance than message delivery reliability, you do required.acks = 0 [16:31:13] it will probably save some cycles [16:31:22] i'm playing with controlled broker shutdown and new leader election [16:31:30] as far as I can tell, as long as acks != 0 [16:31:44] when a new leader is elected, varnishkafka should be notified, right? [16:31:49] and start producing to the new leader? [16:32:30] ottomata: yes. but it needs to be improved in librdkafka (theres an open issue for it) [16:32:45] hm ok, i just tried it with acks=1, and it didkn't seem to work [16:33:00] i had to restart varnishkafka for producers to find the new leader [16:33:04] lemme try a few more things to confirm [16:33:20] that was pushed yesterday to librdkafka, so you are probably on an old version. [16:33:23] oh! [16:33:24] haha, ok [16:33:25] awesome [16:33:29] but its not enough [16:33:51] so what it will do is poll the broker's for updated topic+partition (toppar) -> broker mappings at regular intervals. because right now it will poll the brokers right away when it sees that the current broker is no longer leader. [16:34:05] But if the new leader has not been elected yet, then it will just stall [16:34:20] until a connection is re-established with any broker, at which time the metadata list is updated. [16:34:29] So I'll add that regular thingie tonight. [16:34:45] But, it needs to be quite aggressive [16:35:06] hmmmm [16:35:15] don't brokers notify connected producers? [16:35:17] when things change? [16:35:19] do you have to poll? [16:35:22] So you're producing 100kmsgs/s, then you dont want to poll the brokerlist every two seconds or so, because thats alot of messages queueing up and maybe dropped (if thresholds are configured too low) [16:35:48] so I think I'll use some backoff mechanism, start to poll right away, then back off by 100ms each time. [16:36:06] No, unfortunately it does not. It needs to be polled [16:36:16] and that should really be fixed in kafka. [16:36:24] hmm, in 0.7 they were def notififed [16:36:27] because the producers used zookeeper [16:36:30] i did testing with that [16:36:36] yeah, with zookeeper its probably another story [16:36:40] but that requirement is gone in 0.8 [16:36:43] i could crash a broker and producers would reconfigure themselves real fast [16:36:43] right [16:36:44] hm [16:36:47] that doesn't sound good [16:36:56] And its always problematic to get "authoritative" information from two different paths, they're bound to be incorrect [16:37:19] so binning zookeeper for the producers and adding proper notifications would be the proper way to go [16:38:19] snaps, coudln't you just poll if there is an ack error? [16:38:48] until either librdkafka supports zookeeper or kafka protocol supports notifications I'll add a setting for varnishkafka telling it how many messages it may buffer when the broker is down. That in combination with aggresssive polling will hopefully lead to no dropped messages in case of broker failure. [16:38:51] Thats what I do [16:38:54] oh ok [16:38:58] so you aren't polling all the time [16:39:00] just when there's an error [16:39:09] the problem is that new leader probably hasnt been elected when I poll after an error [16:39:11] yeah [16:39:14] rigiht yeah [16:39:17] so you have to poll until that happens [16:39:18] but thats what Im adding, an all the time poller. [16:40:05] https://github.com/edenhill/librdkafka/issues/14 [16:40:07] thats the issue [16:42:40] hmm, k reading [16:43:02] so, should I try to recompile librdkafka + varnishkafka now and try again? or should I wait for your future fixes? [16:43:48] you will probably need the fix for #14 [16:44:06] Im not really sure how fast leader-reelections are, but most likely not as fast as the poll-on-err request [16:44:18] but I'll try to get that sorted tonight [16:44:44] k, much appreciated [16:45:45] Snaps, looks like they are trying to deal with the acks=0 case in the Scala producer code too: [16:45:46] https://issues.apache.org/jira/browse/KAFKA-955 [16:47:40] thats a phony fix [16:47:58] closing the socket will break communication for all topics that still work with that broker. [16:48:11] why not just send a notification saying the broker list is updated [16:49:26] yeah, true, but i think they are mainly trying to handle the controlled shutdown case [16:49:43] where all partitions on a broker get a new leader [16:50:07] that would be better though [16:50:15] maybe they are trying to not mess with the broker code [16:50:18] and only do this in the producer [16:51:23] dealing with linux, brb in time for the demo I hope [16:51:56] ottomata: but it will happen if any toppar gets a new leader. [16:52:00] yeah [16:52:10] does the broker even keep a list of producers? [16:58:15] it knows that something is connected, but not what [16:58:29] but it would send metadata update to all connected clients. [16:58:46] I commented that on the KAFKA-955 issue [16:59:01] so lets see what (probably wont) happen [17:00:11] ok cool, danke. i was about to comment too, but I think you understand the issue better than I do :) [17:00:45] :) [17:21:12] Snaps, this issue seems relevant too: [17:21:14] https://issues.apache.org/jira/browse/KAFKA-691 [17:21:39] seems like the same problem: making sure producers can produce when brokers change [17:57:09] what's the live url of wikimetrics ? [17:57:45] metrics.wmflabs.org [17:58:18] drdee: thanks [18:10:10] milimetric: /me likes the configure output bit [18:10:23] cool, glad to hear [19:02:30] ottomata: mmkay [19:46:17] milimetric: does flake8 check for compilation errors ? [19:46:35] is there a way to check all files for compilation errors before actually running [19:46:48] don't think so [19:47:08] flake8 does check [19:47:11] but only small stuff [19:47:24] since python is interpreted, some things only show up at runtime [19:50:42] average ^ [19:51:50] ok [20:26:33] milimetric: can I ask a q ? [20:26:42] how difficult is it to deploy wikimetrics ? [20:26:44] sure [20:26:49] not hard [20:26:56] git push & restart stuff ? [20:26:57] depends if there's a database change atm [20:27:03] oh, ok [20:27:08] no db changes for 704 I think [20:27:18] yeah, once stuff's in gerrit, I have a script that does the deployment [20:27:31] it's up on wikimetrics.pmtpa.wmflabs [20:27:43] but you can try it locally first [20:27:49] and I can deploy if you're all set [20:28:02] ok [20:28:05] once you finish the metric, you can try writing some unit tests for it [20:28:16] that should get pretty far into testing it [20:30:20] 18:25 < average> rm .coverage test.db enwiki.db ; find -name "*.pyc" | xargs rm ; nosetests --cover-erase -e models tests/test_controllers/test_home.py:TestHomeController [20:30:29] milimetric: can you try this out ? ^^ [20:30:47] milimetric: it works faster for me, but for some reason it still shows the report for other tests [20:31:58] hey that's great average! [20:32:01] I think that does it [20:32:28] yeah, but it doesn't suppress output for the other ones [20:32:32] the reason it shows the coverage report, I think, is because coverage is enabled in the setup.cfg [20:32:35] which is fine [20:32:40] it's not output, it's the coverage analysis [20:33:01] if we took off the coverage analysis from the setup.cfg, it wouldn't show [20:33:08] but it's fine, doesn't cost anything other than screen space [20:33:30] coverage is not bad, but I was expecting only to get it for what test_home.py tests [20:33:45] but yeah [20:45:28] milimetric: I need to test this [20:45:43] test what? [20:45:44] milimetric: uhm, first just as a user would, from my browser [20:45:49] ok [20:45:50] uhm, testing this new metric I'm adding [20:45:53] wikimetrics --mode web [20:46:00] yeah, it's open that one [20:46:02] and in another terminal: wikimetrics --mode celery [20:46:06] I need an example cohort [20:46:27] I've used tests/csvs_cohorts/for_enwiki.csv [20:46:34] not sure if I should use that one [20:46:37] ok, to give yourself some demo cohorts, visit: https://localhost:5000/demo/create/cohorts [20:47:08] cool, got some cohorts now [20:47:45] milimetric: how can I make the new metric show up in "Pick Metrics" ? [20:47:58] i like that question average! [20:48:03] that sounds like UI work [20:48:16] the real CSV cohorts won't work because the users don't exist in the test enwiki.db, dewiki.db, etc. [20:48:19] oh [20:48:37] ok, so to make it show up in PickMetrics, you'd have to make sure you import it everywhere, then set show_in_ui to True [20:49:32] *import it everywhere other metrics are imported [20:49:52] wikimetrics/metrics/__init__.py ? [20:49:54] oh, average, you can jump into the hangout if it's not working [20:49:56] yeah [20:55:13] so I basically copied namespace_edits.py [20:55:17] and then renamed [20:55:21] and then I created the demo cohorts [20:55:26] and when I tried to create a report [20:55:29] I got in celery (OperationalError) no such table: revision_userindex u'SELECT revision_userindex.rev_user AS revision_userindex_rev_user, count(revision_userindex.rev_id) AS count_1 \nFROM revision_userindex JOIN page ON page.page_id = revision_userindex.rev_page \nWHERE page.page_namespace IN (?) AND revision_userindex.rev_user IN (?, ?, ?) GROUP BY revision_userindex.rev_user' (0, 9, 10, 11): (OperationalError) no such [20:55:36]