[00:01:44] (PS1) Milimetric: Add downloads query and config [analytics/limn-extdist-data] - https://gerrit.wikimedia.org/r/221799 (https://phabricator.wikimedia.org/T101194)
[00:02:06] (CR) Milimetric: [C: 2 V: 2] Add downloads query and config [analytics/limn-extdist-data] - https://gerrit.wikimedia.org/r/221799 (https://phabricator.wikimedia.org/T101194) (owner: Milimetric)
[00:20:39] Analytics-Kanban, MediaWiki-extensions-ExtensionDistributor, Patch-For-Review: Set up graphs and dumps for ExtensionDistributor download statistics {frog} [3 pts] - https://phabricator.wikimedia.org/T101194#1412345 (Milimetric) I've done everything on my end. There's a problem with wikitech so I can't...
[00:31:41] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1412355 (Milimetric) I see the dashboard says you're sampling at 0.1%. In that case, the jump to 5% can't be done right now. It would simply overwhelm mysql with too many events and...
[00:35:32] Analytics-Kanban, Reading-Web: Cron on stat1003 for mobile data is causing an avalanche of queries on dbstore1002 - https://phabricator.wikimedia.org/T103798#1412358 (Milimetric) @Yuvipanda, generic action metrics would be lovely but it would continue this rift where people have to go to multiple places to...
[00:51:55] Analytics, Analytics-Backlog, Performance-Team: Collect HTTP statistics about load.php requests - https://phabricator.wikimedia.org/T104277#1412430 (Krinkle) NEW
[02:40:25] (PS1) Milimetric: Fix type-o [analytics/limn-extdist-data] - https://gerrit.wikimedia.org/r/221826
[02:40:39] (CR) Milimetric: [C: 2 V: 2] Fix type-o [analytics/limn-extdist-data] - https://gerrit.wikimedia.org/r/221826 (owner: Milimetric)
[10:54:22] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413194 (valhallasw)
[11:07:33] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413228 (valhallasw) > Disable querry, as it goes against labs usage policy (allowing arbitrary queries to non lab users) Note that this is a Tool Labs policy, not a generic Labs policy.
[11:16:06] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413264 (yuvipanda) So the measures that Quarry has in place to prevent abuse are: # Tied to user accounts on SUL, and the queries themselves have the username attached. We can trout people wh...
[11:17:58] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413266 (yuvipanda) I'll note that the only person who seems to have run toxic queries has been you :) http://quarry.wmflabs.org/JCrespo%20(WMF) has broken Quarry as well.
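A loose sketch of the per-user concurrent-query cap being discussed here (jcrespo's "limited to 1 per account" suggestion and the task filed just below as T104316). Nothing in it is Quarry's actual code; the cap of 2 and all the names are illustrative assumptions:

```python
from collections import defaultdict
from threading import Lock

MAX_CONCURRENT = 2            # illustrative cap; T104316 left the number open
_running = defaultdict(int)   # queries currently executing, per SUL username
_lock = Lock()

def try_start_query(username):
    """Reserve a slot for a new query; False means the user is at the cap."""
    with _lock:
        if _running[username] >= MAX_CONCURRENT:
            return False
        _running[username] += 1
        return True

def finish_query(username):
    """Release the slot when the query finishes (or is killed)."""
    with _lock:
        _running[username] = max(0, _running[username] - 1)
```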
[11:20:29] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413276 (yuvipanda) ('broken' in the sense that it is reporting non-running queries (they were killed too) as running, not that Quarry itself isn't working anymore)
[11:20:51] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413277 (jcrespo) My suggestions: * queries should be limited to 1 per account * There should be a throttling on new account creation
[11:21:29] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413278 (jcrespo)
[11:21:53] Quarry: Limit number of concurrent queries one user can run at a time - https://phabricator.wikimedia.org/T104316#1413282 (yuvipanda) NEW
[11:23:31] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413290 (yuvipanda) I've filed T104316 for #1. Can you explain what you mean by #2? Quarry's logins are just Wikimedia SUL accounts... I think '1' is too stringent a limit - Quarry has never c...
[11:28:13] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413316 (jcrespo) Accounts probably should be autoconfirmed, or have a verified email to get in contact with them. Or maybe have a default small query limit (1 min), make it longer for older a...
[11:30:46] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413318 (yuvipanda) I think we should be as permissible as possible and do restrictions based only on actual abuse than imagined. Autoconfirmed does not work - autoconfirmed on which wiki? meta...
[11:49:15] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413354 (jcrespo) Open>Invalid
[12:26:42] Quarry: Have an easy way to ban users from Quarry - https://phabricator.wikimedia.org/T104322#1413399 (yuvipanda) NEW
[12:26:53] Quarry, Labs: Quarry is a tested way to perform a DOS against labsdb - https://phabricator.wikimedia.org/T104308#1413409 (yuvipanda) Filed T104322 as well as follow up.
[13:20:58] Hi ottomata :)
[13:21:10] ottomata: Any wrong cron email ?
[13:21:39] hello! good morning! am checking emails now!
[13:21:53] Cool :)
[13:21:54] Thx
[13:27:33] good morning guys :]
[13:27:40] mornin
[13:27:46] good afternoon, too
[13:27:57] Good afternoon mforns :)
[13:32:12] joal: i got no obvious cron email, but i don't get them!
[13:32:17] on purpose! let's check the job
[13:32:29] seems the job has not run on new folder :(
[13:33:26] so the log says not much, eh?
[13:33:30] Reading /mnt/hdfs/wmf/data/archive/projectview/webstatcollector/hourly/2015/2015-06/projectcounts-20150629-010000
[13:33:31] ...
[13:34:23] so it ran
[13:34:25] joal / ottomata: you can merge the wikimetrics role change that sets up the new link, I +1-ed it
[13:34:27] oh
[13:34:35] Thx milimetric :)
[13:34:35] hm
[13:39:24] hm joal
[13:39:30] RuntimeError: '/mnt/hdfs/wmf/data/archive/projectview/webstatcollector/hourly/2015/2015-06/projectcounts-20150629-020000' is not an existing file
[13:39:41] i think we should direct stderr to our log file :)
[13:42:05] joal: is it possible oozie job is backed up?
[13:42:23] ottomata: possible, I'll check
[13:42:59] ottomata: one failure in coordinator
[13:44:35] ottomata: Trying to re-run it now
[13:53:59] ottomata: madhuvishy's job is starving the cluster ...
[13:54:03] Shall we kill it ?
[13:54:57] it is so far along!
[13:55:01] halfak: bad news, there are some good changes to be made for old InputFile
[13:55:04] :(
[13:55:13] it is in default queue, it should be preempted eventually by prod stuff, ja?
[13:55:35] should, but kafka output has been slow for the past 2 hours
[13:55:50] ottomata: --^
[13:59:27] looking (doing buncha things at once! :D )
[14:00:29] hm ja that is kinda bad. wouldn't have thought default queue could do this..
[14:00:29] hm
[14:00:57] ottomata: neither do I, something must be wrong somewhere, don't you think?
[14:01:10] 242 reduces running
[14:01:14] can we just kill some of those?
[14:01:43] can try
[14:01:55] already 820 killed though
[14:02:24] yea not one successful, that doesn't look good
[14:02:44] nope
[14:03:59] halfak: have a minute ?
[14:04:08] ottomata: what do you suggest ?
[14:04:28] still looking
[14:04:30] Sure joal
[14:04:30] k
[14:04:41] halfak: batcave for a minute ?
[14:04:42] joal btw: https://gerrit.wikimedia.org/r/221858
[14:04:45] kk
[14:12:20] joal: i think we can kill it :(
[14:12:30] yeah, nothing better I guess :(
[14:12:40] here is the tail of the logs of one of the running reducers
[14:12:56] https://gist.github.com/ottomata/00dee21b3d844be70cce
[14:12:59] Analytics-Cluster, Analytics-Kanban: Add Pageview aggregation to Python {musk} [13 pts] - https://phabricator.wikimedia.org/T95339#1413740 (mforns) a:mforns
[14:14:05] joal: do you know what this job is?
[14:14:12] no i don't
[14:14:33] k
[14:14:36] killing
[14:14:38] will send email
[14:14:44] thx ottomata
[14:26:15] ottomata: action re-run successfully
[14:26:32] ottomata: Will wait tomorrow to see if aggregated data shows up :)
[14:28:54] k
[14:29:13] ottomata: that logging will mess with the python one, no ?
[14:36:09] how?
[14:36:23] oh need 2 >> :)?
[14:37:03] hum, the file will be used inside python to log in
[14:37:40] yes
[14:38:01] but the stderr will just be appended if i had done it right. i looked for easy way to use python logging to send stderr to file
[14:38:02] but it was annoying
[14:38:57] do you know of a better way?
[14:39:01] k, if that doesn't mess too much, that shouldn't be any issue ;)
[14:39:06] No, I don't actually
[14:39:54] joal, I claimed this task to do next, is it OK with you? https://phabricator.wikimedia.org/T95339
[14:40:13] mforns: Sure !
[14:40:17] Please go :)
[14:40:18] cool
[14:41:18] joal, do you know if this https://github.com/wikimedia/analytics-aggregator is the codebase to modify in this task?
[14:41:36] it is mforns :)
[14:42:18] But I think most of the job could be done in Hive actually
[14:42:26] joal, mmmm
[14:42:40] mforns: let's discuss that with ottomata in a few minutes, if you wish
[14:43:09] joal, that's the task that we discussed for a long time in tasking like 2 or 3 weeks ago, right?
[14:55:27] mforns: batcave ?
[14:55:33] joal, sure@
[14:55:34] !
[14:57:25] joal, it all validates!
[14:57:28] :)
[14:57:33] awesome :)
[14:58:32] Analytics-Cluster, hardware-requests, operations: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1413865 (Cmjohnson) I received the servers but do not have any place to put most of them. 4 are destined to go to Row D2 but the others I am not sure. I know Andrew B sa...
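On the 14:36-14:39 exchange about getting stderr into the same file the Python logger writes to: a minimal sketch of one way to do it from inside the script, so no shell-level `2>>` redirection is needed. The path is hypothetical and this is not the actual aggregator code:

```python
import logging
import sys

LOG_PATH = '/tmp/aggregator.log'  # hypothetical; the real job logs elsewhere

# Regular log lines go to the file through basicConfig's FileHandler...
logging.basicConfig(filename=LOG_PATH, level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

# ...and re-pointing sys.stderr at the same file (append mode, line-buffered)
# means uncaught tracebacks, like the RuntimeError quoted at 13:39, land in
# the log too.
sys.stderr = open(LOG_PATH, 'a', 1)

logging.info('aggregation run starting')
```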
[15:01:28] Analytics-Cluster, hardware-requests, operations: Hadoop worker node procurement - 2015 - https://phabricator.wikimedia.org/T100442#1413869 (Ottomata) Hm. We don't have any plans to replace any analytics nodes, but we might be able to move some of them around, as some will be repurposed as non distri...
[15:10:15] halfak: some good news among the difficulties ;)
[15:32:47] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Productionize reporting of app session metrics {hawk} [8pts] - https://phabricator.wikimedia.org/T97876#1413992 (kevinator) Open>Resolved
[15:36:54] Analytics-Kanban, Patch-For-Review: Link to new projectcounts data and serve via wikimetrics {Musk} [5 pts] - https://phabricator.wikimedia.org/T104003#1413999 (ggellerman) a:JAllemandou
[15:37:52] joal: I was wrong, I don't have enough info to update https://phabricator.wikimedia.org/T101118, could you do it please?
[15:38:02] Will do :)
[15:40:40] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1414010 (kevinator)
[15:40:43] Analytics-EventLogging, Analytics-Kanban: Load Test Event Logging {oryx} [8 pts] - https://phabricator.wikimedia.org/T100667#1414009 (kevinator) Open>Resolved
[15:40:58] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Create current-definition/projectcounts {musk} [13 pts] - https://phabricator.wikimedia.org/T101118#1414011 (JAllemandou)
[15:44:43] ottomata: kafka is happier, ate a big breakfast !
[15:50:24] :)
[15:50:47] halfak: can't make it today with Altiscale
[15:50:58] sorry for late notice :S
[15:57:28] ottomata: Will you have the time to review puppet wikimetrics pageviews ?
[15:57:34] sure
[15:57:50] also, I reiterate, if you think I can help on jmx, let me know
[15:58:51] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1414056 (Aklapper)
[16:00:06] halfak: have you noticed my previous comment ?
[16:00:08] Analytics-Tech-community-metrics: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1414075 (Aklapper)
[16:00:15] Analytics-Tech-community-metrics, ECT-June-2015: Most basic Tech Community metrics are published and up to date - https://phabricator.wikimedia.org/T94578#1414072 (Aklapper) Open>Resolved All data we wanted (as defined in the task description) is available in public places now, so I'm closing this t...
[16:00:22] halfak: Since you didn't answer, I wonder :)
[16:00:36] ottomata: commuting to office now, will ping when I'm there :)
[16:00:57] joal, gotcha.
[16:01:02] Was AFK for a bit.
[16:01:15] Just joining the call now. It's likely to be brief then.
[16:01:18] milimetric: updated https://phabricator.wikimedia.org/T101118, is it better ?
[16:01:20] Anything you'd like me to bring up?
[16:01:36] nope, didn't have enough time to work more
[16:01:40] halfak: --^
[16:01:43] OK
[16:01:55] Thx mate
[16:12:23] Analytics, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1414186 (Neil_P._Quinn_WMF)
[16:12:51] Analytics, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1412040 (Neil_P._Quinn_WMF)
[16:17:39] joal that looks good, but I think at some point we should put this on dumps. Or maybe not if we get the API out fast enough... Hm, not sure
[16:21:41] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1414227 (ksmith) @Milimetric: To be clear, desktop and mobile web are currently(ish) sampling at .1%. Mobile apps are at 1% (at least, that's how I read it). If that changes your anal...
[16:26:00] hi milimetric. did Limn ever have the plots for the total number of active editors per month? I would use Dashiki except that I want to look a bit more back, for example, look at the data for the past year. In Limn, I see numbers for all projects for those who have done 100+
[16:27:53] leila: yeah, on the reportcard, isn't this good: http://reportcard.wmflabs.org/graphs/active_editors
[16:28:39] that's the old active editor definition
[16:28:54] no rolling or fancy stuff, just 5 per month, but de-duped across projects
[16:30:12] milimetric: let's not put that on dumps and focus on API ;)
[16:30:20] perfect. thanks, milimetric. Yeah, fancy stuff are all in Dashiki. :-)
[16:30:30] Analytics-Tech-community-metrics, ECT-June-2015: Fine tune "Code Review overview" metrics page in Korma - https://phabricator.wikimedia.org/T97118#1414259 (Qgil) a:Aklapper
[16:30:32] milimetric: kidding, if you want, we can add the things to dumps :)
[16:30:47] leila: Hi, you were looking for me yesterday
[16:30:50] no, that's good - focus!
[16:30:59] milimetric: I like that as well :)
[16:31:00] ow yes, joal. hi. :-)
[16:31:21] joal: I was wondering if we can afford to make a copy of a whole month of request logs. :-\
[16:31:45] leila: that's big ...
[16:31:49] there was a bug introduced to App that was later fixed, and the data for that period can be used for a natural experiment
[16:32:13] we can only keep Apps data, but keeping Desktop and mobile data with it can help us make comparisons across devices.
[16:32:18] leila, various concerns here: our retention policy, and storage
[16:32:19] I understand joal.
[16:32:55] leila: depends as well on which columns, but I guess you'd want most of them
[16:32:55] I've talked to Michelle about retention policy. She is happy with 90 days. If we keep it beyond that, she needs to add it as an exception to the data retention
[16:33:10] storage right now is my biggest concern.
[16:33:11] leila: that's what I mean
[16:33:25] leila: yes, the cluster is already quite full
[16:33:51] joal: we can try to keep a copy of specific columns, especially the columns generated by UDFs we don't need to keep
[16:33:57] would that help, joal?
[16:34:30] leila: not really, the core data is the big part of it (less easy to compress using dictionary)
[16:35:25] I understand. joal: is it fair to say that we don't have space for it?
[16:35:44] I'd like to communicate this with Dario so we can figure out what we want to do in this case, and in the future.
[16:36:09] leila: let's bring ottomata into this
[16:36:22] hi ottomata. :-)
[16:36:32] leila, can you filter down the requests for what you need?
[16:36:33] leila: in my opinion, we shouldn't do it: barely enough space
[16:36:38] e.g. maybe only text requests?
[16:36:52] And maybe drop fields too?
[16:37:00] leila: only mobile requests would be really more affordable
[16:37:07] ^ that too
[16:37:17] depends on what you're after leila
[16:37:23] joal: I completely understand if there are space limitations. I'd like to communicate this with Dario so we can plan better for the future. Thanks for your help. :-)
[16:37:42] hellOOoOo?
[16:37:56] so, currently cluster is about 70% full
[16:37:57] Analytics, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1414283 (Neil_P._Quinn_WMF)
[16:38:19] halfak: the main fields we need to keep cannot be compressed enough. we can go with keeping the bare minimum, but then if we need to do any desktop/mobile comparisons for that month, we cannot. I prefer to leave more room for exploration for that month if we want to spend time on it.
[16:38:43] joal: how much would adding a month change that number?
[16:38:58] (not pushing here, joal. just trying to get all the data points I need. :-) )
[16:39:04] Let's talk about text and mobile data (no upload nor bits nor misc)
[16:39:11] ok ?
[16:39:17] for june
[16:40:34] leila: ok for no uploads ?
[16:40:40] joal: let me do this. I'll talk to Dario about this general problem. We will get the bare minimum we need for this one not to affect the cluster too much.
[16:40:47] joal: yeah, I don't think we need uploads.
[16:40:53] (making a note to exclude it)
[16:41:10] Can filter out an awful lot by only looking at requests that represent pageviews.
[16:41:24] in the next Dev and R&D meeting, we can discuss how we can do this in the future (I'm assuming we will need more resources)
[16:41:39] halfak: that makes sense. make a note of that, too
[16:42:19] So with refined table, this would be about 22Tb useful, so 66Tb real --> about 80% full
[16:42:32] got it, joal. thanks!
[16:42:34] so, the month of may is about 50T
[16:42:42] june
[16:42:52] may ;-)
[16:42:56] may already has data gone I think
[16:43:11] no we keep 60 days, right? if it is gone only a little is gone
[16:43:12] but sure
[16:43:18] june is currently
[16:43:22] yeah, 60 days
[16:43:42] 128.9T?!?!?
[16:43:44] what the
[16:43:50] joal
[16:43:50] 86.5 G 259.6 G /wmf/data/wmf/webrequest/webrequest_source=bits/year=2015/month=6
[16:43:55] 1.8 T 5.5 T /wmf/data/wmf/webrequest/webrequest_source=bits/year=2015/month=5
[16:44:00] NOW what is happening with bits?! :p
[16:44:05] :D
[16:44:16] ok well, leila, you don't need bits
[16:44:17] Analytics-Backlog, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1414296 (Milimetric)
[16:44:19] or upload, right?
[16:44:20] ottomata: I think it goes into text ;)
[16:44:24] correct, ottomata.
[16:44:39] May is still here, but not for long ;)
[16:44:41] so yeah, just text and mobile about 20+T
[16:44:42] thanks guys, I think I have a plan of action. this was helpful, as usual. :-)
[16:44:50] x3 is 60T
[16:44:58] leila: how long do you want to keep it?
[16:45:07] ottomata: when should we receive new hardware ?
[16:45:08] ottomata: 2-3 months, at most.
[16:45:12] Analytics-Backlog: EventLogging cleanup debrief - https://phabricator.wikimedia.org/T104351#1414298 (mforns) NEW
[16:45:16] i think we can deal with that, we could also reduce replication factor
[16:45:20] maybe to 2 for your saved data
[16:45:27] and if we only save some fields, yeah data size is smaller
[16:45:39] joal: in june, both bits and text are huge
[16:45:39] makes sense ottomata. will do that.
[16:45:50] OHH
[16:45:50] ottomata: weird !
[16:45:51] whoops
[16:45:53] no it's not
[16:45:54] G
[16:45:54] not T
[16:45:55] :p
[16:45:56] phew !
[16:45:57] :)
[16:46:01] bits is getting smaller
[16:46:06] good.
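For anyone puzzled by the two numbers per line in the HDFS usage output quoted above: the first column is the raw data size, the second is the on-disk footprint after HDFS replication (3x here). That is also the arithmetic behind "22Tb useful, so 66Tb real" and "x3 is 60T". A trivial sketch with the thread's own estimates:

```python
# Storage math from the thread: raw size * replication = real disk used.
replication = 3   # HDFS default on the cluster
useful_tb = 22    # refined text + mobile data for the month (joal's estimate)

print(useful_tb * replication)  # 66 TB consumed at the default factor
print(useful_tb * 2)            # 44 TB if the saved copy is kept at 2x,
                                # as ottomata suggests above
```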
[16:46:06] haha
[16:46:09] And text bigger, I noticed as well
[16:46:11] yes
[16:46:12] that is fine
[16:46:13] ok phew
[16:49:59] madhuvishy: ok, i am getting lunch, but here at this cafe for a bit, then will run home
[16:50:06] just finished up with that annoying jmxtrans stuff
[16:50:11] how can I help right now?
[16:59:59] milimetric: i could only hear a bit of what you were saying earlier about kafka and el throughput
[17:00:07] did you get to benchmark processor consuming from kafka?
[17:00:17] yes
[17:00:25] one sec and I'll catch you up
[17:00:43] k
[17:03:28] madhuvishy: do we have a wikitech page describing the mobile app session metrics table ?
[17:07:43] ok ottomata, so the summary is:
[17:07:58] kafkacat -> file -> processor: 1400 events per second
[17:08:17] processor straight from kafka, reading from beginning of topic: 1000 events per second
[17:08:34] from what I can tell, invalid events cause a lot of the variance in performance
[17:08:41] when there are more invalids, it takes much longer
[17:08:58] but I haven't tested that
[17:09:37] hm
[17:09:41] for comparison, test_load.py (with 4 schemas and no invalids) -> processor: 1700 events per second
[17:09:52] so, that is about the same?
[17:09:59] as 0mq?
[17:10:13] which isn't surprising.
[17:10:19] I haven't tried 0mq, that's not set up on an04 right?
[17:10:50] sure, you can set it up
[17:10:58] just run a processor like you do, but make it read from a tcp:// uri
[17:11:08] and then have something else pipe it in
[17:11:10] the penalty for grabbing from kafka instead of a file seems to be about 400 events per second
[17:11:11] a forwarder maybe
[17:11:21] in short, python sucks :)
[17:11:25] so, hm, kafka does have a bit of a ramp up time
[17:11:37] once it starts running though it shouldn't have much overhead
[17:11:40] but more than a file, for sure
[17:11:46] it does, I did run the test over 4 different durations though to try and tease out that 5 second startup lag
[17:12:11] same approximate throughput not considering that handshake lag?
[17:12:25] lemme double check the per-run time for the different times
[17:12:25] ottomata: since we are CPU bound for JSON decoding, same rate for 0mq and kafka is expected, right?
[17:12:30] ok, welp, so. that means we have to parallelize! :)
[17:12:30] yes
[17:12:30] milimetric: --^
[17:13:08] ottomata, milimetric : then it also sounds normal that errors in schemas reduce performance (more work as JSON decoding)
[17:14:19] guys, check this variance out though:
[17:14:49] run 1: 1006 e/s, run 2: 992 e/s, run 3: 1208 e/s
[17:15:13] if I take out 5 seconds for supposedly constant startup time:
[17:15:38] run 1: 1156 e/s, run 2: 1059 e/s, run 3: 1331 e/s
[17:16:03] so these numbers are too unstable to count on really
[17:17:07] we're right around 1000 e/s with normal load but what if we have a spike of invalid events. I think that's what we should test. I'll modify the processor to add invalid data and test again, one sec
[17:17:31] milimetric: very good idea :)
[17:21:12] ottomata: around?
[17:21:23] joal: no there's no wiki page - I'll add one :)
[17:21:33] awesome madhuvishy :)
[17:22:45] madhuvishy: yes!
[17:23:10] madhuvishy: i am going to build a package for python-kafka 0.9.4, so we can use multiprocess consumer
[17:23:20] it'd be cool to get that in so we can bench that now that milimetric has some tests we can use
[17:23:50] ottomata: aah cool!
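The run 1-3 rates above are plain events divided by wall time, optionally discounting the ~5-second kafka startup lag first. A sketch of that calculation; the event count and duration below are reverse-engineered to match run 1, not actual measurements:

```python
STARTUP_LAG_S = 5.0  # approximate kafka consumer handshake time

def events_per_second(n_events, duration_s, startup_s=0.0):
    """Throughput, optionally discounting a fixed startup allowance."""
    return n_events / (duration_s - startup_s)

n, t = 38764, 38.53  # hypothetical run 1: events consumed, seconds elapsed
print(int(round(events_per_second(n, t))))                 # ~1006 e/s raw
print(int(round(events_per_second(n, t, STARTUP_LAG_S))))  # ~1156 e/s adjusted
```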
[17:23:53] ottomata: I was following the EL dev setup guide on https://www.mediawiki.org/wiki/Extension:EventLogging#Developer_setup
[17:23:59] I ran the dev server
[17:24:14] but tried sending an event from JS console
[17:24:19] eef, i have never done this! errrrrr
[17:24:27] ottomata: aah
[17:24:39] although that would be nice!
[17:24:40] :)
[17:24:44] do you not use EL on vagrant?
[17:24:49] naw, i test in labs
[17:24:55] but vagrant would probably be better
[17:24:55] aah
[17:25:00] joal: we are back from our meeting... do you want to continue the discussion?
[17:25:08] sure kevinator
[17:25:33] ottomata: I expect to see some events being printed on python console - but i get JS errors
[17:25:39] madhuvishy: milimetric maybe can help you with this?
[17:25:43] can you join the same meeting video as before joal?
[17:25:50] currently trying
[17:25:54] kevinator: --^
[17:26:32] madhuvishy: yes, I can help, maybe, but I have a meeting in 5
[17:26:39] ottomata and joal: weird!!!!
[17:26:39] https://wikitech.wikimedia.org/wiki/EventLogging/Performance#Event_Logging_load_test
[17:26:47] check out Processor take 4 and take 5
[17:26:54] ottomata: milimetric this is what I have on console - http://i.imgur.com/oEavwPY.png
[17:27:08] take 4: all invalid events (I changed the code to add a random property to the event dictionary): 700 events per second
[17:27:35] take 5: I commented out all logging in the error handlers, same exact test as take 4: 2600 events per second!!!
[17:27:55] because the logger was writing to the console, long known to be the most efficient thing a computer can do apparently
[17:28:15] oh nice
[17:28:15] haha
[17:28:27] I have to run to a meeting, bbl
[17:28:36] k
[17:28:43] madhuvishy: forget the console :p
[17:28:55] madhuvishy: you have eventlogging code checked out in vagrant?
[17:29:01] ottomata: yes
[17:29:21] what does the dev server do? i don't even know...respond to http requests?
[17:29:21] hm
[17:29:24] hm
[17:29:37] in meeting
[17:29:53] ottomata: yeah.. that's what i thought.
[17:30:03] madhuvishy: since you will only be developing processor and consumer, i think we can get you a test file to work with, and then you can read from stdin...and then kafka
[17:30:11] so, let's try that
[17:30:38] ottomata: okay!
[17:30:50] I also need to set up kafka on vagrant
[17:31:17] yes
[17:31:24] for that you should just use confluent's packages
[17:31:25] it will be easier
[17:31:32] but also maybe more annoying for some things, but lets do that
[17:31:55] kevinator: are you making it to this?
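The take 4 to take 5 jump (700 to 2600 events per second just from silencing the error logging) is the classic cost of synchronous terminal writes. A minimal illustration of the cheaper alternatives, assuming a stock `logging` setup; this is not the actual eventlogging-processor code:

```python
import logging

log = logging.getLogger('processor-demo')  # hypothetical logger name
log.setLevel(logging.ERROR)

# Slow: a StreamHandler writing every invalid event to the terminal is
# what capped "take 4" at ~700 events/second.
# log.addHandler(logging.StreamHandler())

# Cheaper: writes to a plain file cost far less than terminal writes...
log.addHandler(logging.FileHandler('/tmp/invalid-events.log'))

# ...and discarding the messages entirely is what "take 5" effectively did:
# log.handlers = [logging.NullHandler()]

for i in range(3):
    log.error('Unable to process event %d', i)
```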
[17:32:10] no, still meeting with joal
[17:32:16] k, i'll go alone
[17:32:19] thanks
[17:32:28] ping me if you need me
[17:32:33] I'm with Joal / Chris S
[17:32:38] madhuvishy: here is a file
[17:32:39] https://gist.githubusercontent.com/ottomata/720cb1773d174c016c45/raw/28334a2bc02d1209340198ad290e812e1694b3a7/client-side-events-raw.100.log
[17:32:43] got that from beta
[17:32:51] dl that to your vagrant
[17:33:05] btw, when you get to installing kafka
[17:33:06] do this
[17:33:07] http://confluent.io/docs/current/installation.html#installation-apt
[17:33:20] and then
[17:33:25] sudo apt-get install confluent-kafka-2.10.4
[17:33:43] ottomata: yes doing that
[17:33:46] madhuvishy: let me know when you have dled that file, let's just try to pipe that through processor
[17:33:51] i have to run home in just a few mins
[17:34:14] ottomata: yeah i did that
[17:34:20] ok
[17:34:21] theeeeen
[17:34:38] hm, install some deps, lets do them this way:
[17:35:22] sudo apt-get install python-jsonschema python-kafka python-mysqldb python-pygments python-pymongo python-six python-sqlalchemy python-zmq,
[17:35:36] no comma at the end :p
[17:35:41] sudo apt-get install python-jsonschema python-kafka python-mysqldb python-pygments python-pymongo python-six python-sqlalchemy python-zmq
[17:35:47] madhuvishy: lemme know
[17:35:49] then do
[17:35:53] yup alright done
[17:35:58] export PYTHONPATH=/path/to/EventLogging/server
[17:36:10] and
[17:36:15] cd /path/to/EventLogging/server
[17:36:20] (replace with your path, obvi)
[17:36:32] ottomata: yup done
[17:36:42] ok now
[17:36:46] let's try to process the file
[17:37:21] cat path/to/client-side-events-raw.100.log | python bin/eventlogging-processor '%q %{recvFrom}s %{seqId}d %t %h %{userAgent}ij' stdin:// stdout://
[17:37:37] you should get json output
[17:38:26] how go?
[17:38:29] ottomata: hmmm
[17:38:31] I get
[17:38:34] https://www.irccloud.com/pastebin/2HxJ0fQA/
[17:39:47] hmmm
[17:40:54] hm madhuvishy is eventlogging already installed on your vagrant?
[17:40:57] some older version maybe?
[17:41:01] grr this part is annoying
[17:41:07] ottomata: no I updated it
[17:41:10] oh?
[17:41:21] you sure?
[17:41:24] do
[17:41:28] sudo python setup.py install
[17:41:32] and try again
[17:41:35] btw, i don't like this part
[17:41:47] i prefer to NOT have eventlogging installed when doing dev
[17:41:55] because python will by default choose its installed stuff rather than your local dir
[17:42:05] you can override this, but you have to edit the bin/ file
[17:42:08] that's what I do sometimes
[17:42:17] sys.path.insert(0, '/home/otto/EventLogging/server')
[17:42:18] :(
[17:42:37] ottomata: okay I did that
[17:42:40] hmmm
[17:42:49] and ran processor again
[17:42:52] now i get
[17:42:59] https://www.irccloud.com/pastebin/OY8XsxrR/
[17:43:14] ottomata: and a big list of Unable to process: ... messages
[17:43:26] ok that is better!
[17:43:30] yup
[17:43:37] does it say why?
[17:43:57] ottomata: nope
[17:44:34] Analytics, Analytics-Backlog, Varnish: https://wikitech.wikimedia.org/beacon/statsv 404 Not Found - https://phabricator.wikimedia.org/T104359#1414509 (Krinkle) NEW
[17:46:35] hm, ok, madhuvishy, not sure, but getting there! at least you can run eventlogging! :)
[17:46:40] i have to run home, be back shortly
[17:46:48] coo
[17:46:50] cool
[18:01:37] ok madhuvishy am here
[18:02:26] ottomata: eating tacos and chips at the travel briefing
[18:03:06] guys, do you know if all youtube streaming links can be watched after some time, or just live?
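For readers trying to follow the processor invocations: my reading of the fields in that format string, modeled on varnishncsa-style specifiers. Treat it as an educated gloss rather than documentation; the gist linked at 17:32 has real sample lines (and the trailing `ij` above turns out to be a paste typo, corrected at 19:35):

```python
# '%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i', field by field,
# as I read it:
FORMAT_FIELDS = {
    '%q':            'urlencoded query string carrying the event JSON',
    '%{recvFrom}s':  'hostname of the server that received the request',
    '%{seqId}d':     'per-host sequence number (integer)',
    '%t':            'timestamp of the request',
    '%h':            'client host/IP',
    '%{userAgent}i': 'user agent string',
}
for spec, meaning in FORMAT_FIELDS.items():
    print(spec, '->', meaning)
```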
[18:03:12] madhuvishy: are you on the 5th floor?
[18:03:21] kevinator: ya
[18:03:31] haha i'm there too
[18:03:35] k
[18:03:42] madhuvishy, ^
[18:04:08] mforns: hmmm i think they can be watched later unless they take it down
[18:04:10] travel briefing for wikimania starts soon milimetric mforns ottomata joal
[19:08:14] ottomata: done :)
[19:09:31] k so
[19:09:34] status?
[19:19:43] ottomata: the same
[19:20:07] ottomata: the error message doesn't say anything
[19:20:11] hm k
[19:20:32] ok i can reproduce
[19:20:33] ummm
[19:22:59] madhuvishy: while i figure this out, go ahead and install kafka
[19:23:16] ottomata: I did - looking at how to get it running now
[19:23:19] ok cool
[19:28:30] ok madhuvishy
[19:28:36] i think that file's format was dumb
[19:28:39] use this one
[19:28:40] https://gist.github.com/ottomata/720cb1773d174c016c45
[19:28:44] here
[19:28:44] https://gist.githubusercontent.com/ottomata/720cb1773d174c016c45/raw/015552587e1ab9c3c2de11d89b70b62fa7510bc4/client-side-events-raw.100.log
[19:33:02] ottomata: hmmm, weird i get the same error
[19:33:37] really?
[19:33:37] hm
[19:33:47] this one worked for me
[19:33:54] ottomata: oh
[19:34:28] madhuvishy:
[19:34:28] cat client-side-events-raw.100.log.1 | eventlogging-processor '%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i' stdin:// stdout://
[19:34:32] oop, no .1
[19:34:33] probably
[19:35:01] cat client-side-events-raw.100.log | eventlogging-processor '%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i' stdin:// stdout://
[19:35:14] yeah, oh madhuvishy, i think there was a paste typo in the format string in the command I gave you before
[19:35:18] try that
[19:36:08] ottomata: okay yes this works
[19:38:48] COOOL
[19:38:55] ok, now kafka
[19:38:57] yes
[19:39:01] got it running?
[19:39:09] its running I think
[19:39:36] ok, lets make a test topic
[19:39:39] ummm
[19:40:07] http://confluent.io/docs/current/kafka/post-deployment.html#adding-and-removing-topics
[19:40:08] something like
[19:40:10] Yes it runs
[19:40:16] oh zk, you have zk up, yes?
[19:40:28] ottomata: yes
[19:40:29] http://confluent.io/docs/current/quickstart.html
[19:40:47] I was able to do 5 and 6
[19:40:57] bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic eventlogging-client-side --partitions 4 --replication-factor 1
[19:41:13] oh great, ok
[19:41:20] not sure where kafka-topics.sh gets installed
[19:41:24] but try running that command
[19:42:33] it's in /usr/bin
[19:42:37] ottomata: that worked!
[19:44:20] https://www.irccloud.com/pastebin/cBPG3g1N/
[19:44:26] great!
[19:44:33] ok, now lets try putting those raw events into kafka
[19:44:36] via eventlogging
[19:44:43] alright!
[19:45:06] cat client-side-events-raw.100.log | eventlogging-forwarder stdin:// 'kafka:///localhost?topic=eventlogging-client-side'
[19:45:43] and then just confirm that they are there:
[19:45:52] um, something like
[19:46:25] hm, is there a kafka-console-consumer.sh?
[19:46:28] or something like that?
[19:46:59] ottomata: yup
[19:48:10] ottomata: got it
[19:48:22] vagrant@mediawiki-vagrant:/usr/bin$ ./kafka-console-consumer --zookeeper localhost:2181 --topic eventlogging-client-side --from-beginning
[19:49:21] ottomata: hmmm it's not reading anything - just waiting
[19:51:03] yea!
[19:51:18] ok, did the first thing work? did we produce the raw events to kafka?
[19:51:22] with forwarder
[19:51:23] ?
[19:51:25] madhuvishy: ^
[19:51:30] ottomata: it said, 2015-06-30 19:39:33,691 (MainThread) Forwarding stdin://?raw=True => kafka:///localhost?topic=eventlogging-client-side&raw=True...
[19:51:50] do it again!
[19:51:55] leave the consumer running
[19:51:57] see if you get anything
[19:52:10] nope, nothing
[19:52:25] oh
[19:52:28] madhuvishy: cat the file
[19:52:34] oh you did that?
[19:52:35] right?
[19:52:57] ottomata: this - cat client-side-events-raw.100.log | eventlogging-forwarder stdin:// 'kafka:///localhost?topic=eventlogging-client-side' - Yes
[19:53:00] yes
[19:53:01] hm
[19:53:06] ok weird, no errors?
[19:53:11] nope
[19:54:08] oooook, lets come back to this and try the processor from file
[19:54:57] oh, hm, do you need a port on the kafka url?
[19:54:58] not sure
[19:55:04] madhu, is your kafka broker listening on 9092?
[19:55:12] cat client-side-events-raw.100.log | eventlogging-processor '%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i' stdin:// 'kafka:///localhost:9092'
[19:55:15] what's that do?
[19:55:57] ottomata: 2015-06-30 19:46:15,889 (MainThread) Publishing valid JSON events to kafka:///localhost:9092.
[19:57:06] yes its listening on 9092
[19:58:54] okkkk
[19:58:59] do the list topic thing again
[19:59:02] do you have more topics?
[19:59:05] eventlogging_..
[19:59:05] ?
[19:59:11] madhuvishy: ^
[19:59:43] ottomata: no.. just the one
[19:59:46] rather
[19:59:51] i have the other test one
[20:00:04] i can read data from that
[20:00:21] oh?
[20:00:33] the raw client side topic has data in it now madhuvishy?
[20:00:39] ottomata: yeah - that's this one - http://confluent.io/docs/current/quickstart.html
[20:00:40] no
[20:00:42] oh
[20:00:51] ok, hm lets go more direct then
[20:00:59] i think you have kafka-console-producer.sh
[20:00:59] yes?
[20:01:02] yes
[20:01:07] cat the log file into that
[20:01:10] --topic eventlogging-client-side
[20:01:40] madhuvishy: hm, maybe kafka isn't bound to localhost
[20:01:51] try your vms IP instead
[20:02:11] hostname --ip-address
[20:02:19] Missing required argument "[broker-list]"
[20:02:20] Analytics-Backlog: Clean up mobile-reportcard dashboards - https://phabricator.wikimedia.org/T104379#1415097 (Milimetric) NEW
[20:03:32] ottomata: nope - same. and running the producer directly said missing arg - broker-list
[20:04:12] Analytics-Backlog: Clean up mobile-reportcard dashboards {frog} - https://phabricator.wikimedia.org/T104379#1415114 (Milimetric)
[20:04:55] oh
[20:04:55] add
[20:05:07] --broker-list $(hostname --ip-address):9092
[20:05:13] to producer directly
[20:05:46] Analytics-Kanban, Reading-Web: Cron on stat1003 for mobile data is causing an avalanche of queries on dbstore1002 - https://phabricator.wikimedia.org/T103798#1415124 (Milimetric) Open>Resolved a:Milimetric Closing this issue because we decided what to do with each graph here: T104379. I will re-e...
[20:05:56] ottomata: yeah did that
[20:06:01] https://www.irccloud.com/pastebin/lMOz4s5d/
[20:06:42] Analytics-Kanban, Reading-Web: Cron on stat1003 for mobile data is causing an avalanche of queries on dbstore1002 - https://phabricator.wikimedia.org/T103798#1415132 (Milimetric) Resolved>Open
[20:06:49] oh, madhuvishy, do --help on it
[20:06:51] see what the args are
[20:06:52] iunno
[20:06:57] confluent's is different :)
[20:07:03] ottomata: no, it is topic
[20:07:16] https://www.irccloud.com/pastebin/crvtfzFe/
[20:07:34] madhuvishy: hm, 127.0.1.1 is not right
[20:07:36] ifconfig
[20:07:38] what's that say?
[20:08:08] ottomata: inet addr:127.0.0.1 Mask:255.0.0.0
[20:08:20] gotta be something else, no?
[20:08:24] some 192 or 10. addy?
[20:08:45] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1415136 (Milimetric) thanks @ksmith, I didn't realize that. The bulk of our traffic comes from desktop and mobile, so the 1% sampling on apps doesn't help much. I'll wait for Oliver...
[20:09:26] ottomata: I ran it with 127.0.0.1
[20:10:20] https://www.irccloud.com/pastebin/1sa3y6xO/
[20:10:34] gah
[20:10:45] my bad
[20:10:47] extra .
[20:10:49] Analytics-EventLogging, Analytics-Kanban: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1415144 (Milimetric) FYI, 150 events per second would mean over a billion rows in those tables assuming we truncate them after 90 days as is usual with our other tables. If we can tr...
[20:10:49] but same
[20:10:59] Property topic is not valid
[20:11:26] hm
[20:11:27] weird
[20:11:32] madhu
[20:11:39] let's batcave in 5 mins, gotta get a snack...
[20:12:38] ottomata: yup okay
[20:15:08] madhuvishy: ah! I forgot to ping you
[20:15:37] got problems with vagrant still?
[20:17:41] ok batcave!
[20:19:01] good end of the day everyone! see you tomorrow :]
[20:21:13] laters!~
[20:25:27] I'm done lads !
[20:25:30] Later :)
[20:28:49] python bin/eventlogging-processor '%q %{recvFrom}s %{seqId}d %t %h %{userAgent}i' 'kafka:///localhost:9092?topic=eventlogging-client-side' stdout://
[20:28:53] madhuvishy: ^
[20:34:30] milimetric: Andrew helped me get it working through sample files
[20:35:05] but i can't use the dev server successfully
[21:02:26] kevinator: I'm on fifth in one of the pink couches that hide you
[21:03:57] madhuvishy: ok :) let's talk after your 1/1 (just had mine)
[21:09:55] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1415360 (Dicortazar) I've updated the current list with the command provided by @Aklapper, and I'...
[21:22:51] Analytics-Backlog, Analytics-Dashiki: Improve the edit analysis dashboard - https://phabricator.wikimedia.org/T104261#1415411 (Neil_P._Quinn_WMF)
[21:55:54] Analytics-Kanban: rubicon - https://phabricator.wikimedia.org/T104390#1415512 (kevinator) NEW
[21:56:10] Analytics-Kanban: rubicon - https://phabricator.wikimedia.org/T104390#1415520 (kevinator) p:Triage>Low
[21:56:31] Analytics-Kanban: ---- RUBICON ---- - https://phabricator.wikimedia.org/T104390#1415523 (kevinator)
[22:31:14] milimetric: I forgot to ping earlier. lets figure it out tomorrow. it's late for you
[23:07:57] madhuvishy: nah, it's cool, we can chat
[23:08:06] milimetric: okayy..
[23:08:29] so you're trying to setup the EL dev environment, or..
[23:08:38] so when i tried to run the devserver, and test it from js console - http://i.imgur.com/oEavwPY.png happened
[23:09:02] but let me try again
[23:09:10] madhuvishy: nah, that makes sense
[23:09:23] the dev server would never see the events because there's no varnish to pass them on
[23:09:42] milimetric: aah
[23:09:47] that's missing from the vagrant setup so you can't get something similar to prod on your local machine sadly
[23:09:57] I think it's because varnish is way too hard to do in vagrant or something
[23:10:00] okay, so this is normal?
[23:10:02] yes
[23:10:12] also, though, you wouldn't see anything on your console
[23:10:21] logEvent just sends the event and gets back a 204
[23:10:26] (which is a no-content OK)
[23:11:15] aah
[23:11:15] doing some varnish server in mw-v wouldn't be too hard but running the prod varnish config there is basically unpossible
[23:11:18] also, there's something weird going on with the schemas lately (or maybe always but the error logging is new) where if you don't have the appropriate PHP configuration, your schema won't be loaded properly
[23:11:27] we looked at it in Zürich and then everyone lost interest
[23:11:33] madhuvishy: but! you can test the devserver through the magic that is EL awesomeness
[23:11:55] one way is to run the separate workers with inputs and outputs.
[23:12:05] milimetric: yes I'm able to do all that :)
[23:12:13] ok, good, in vagrant too, right?
[23:12:21] and process and consume stuff from kafka too
[23:12:25] great, good
[23:12:27] yes, in vagrant
[23:12:35] yeah, then that's pretty much all you can hope for with EL
[23:12:39] madhuvishy: when you have a few min, can you ping?
[23:12:53] milimetric: okay then, i'm good for now
[23:12:56] if you want to test the full pipeline, you can do so on the beta cluster, you know about that?
[23:13:08] deployment-eventlogging02.eqiad.wmflabs
[23:13:29] milimetric: yeah, I haven't tried to test anything there - don't need it right now I guess
[23:13:35] leila: sure
[23:13:44] we use that machine to try out different things in a "prod" environment. This wiki logs events to the mysql instance on that machine: https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
[23:13:56] yep, you don't need that right now, just FYI it's there if you want
[23:14:06] k, nite!
[23:14:31] milimetric: yes, that's great. thanks :) I'll poke again when I get there
[23:14:43] good night!
[23:16:28] leila: Hey :)
[23:16:35] hellooo madhuvishy :-)
[23:16:43] so, I'm looking at your spreadsheet
[23:16:51] leila: okay..
[23:16:59] is column D last access uniques on App?
[23:17:14] madhuvishy, ^
[23:17:27] leila: looking
[23:18:48] leila: la_uniques, la_repeats - are based on last access cookie - daily counts for access_method='mobile app' on enwiki
[23:19:08] app_uniques is using the mobile apps unique counting - based on uuids
[23:20:59] so, madhuvishy, I'm looking at an email on May 8 which says la is not set for apps.
[23:21:18] I'm now confused what that email says and what your numbers are. can you look into it?
[23:21:29] subject: Uniques report!
[23:23:10] leila: hmmm
[23:23:32] madhuvishy: :-)
[23:24:10] leila: it's confusing for me too. that's what I thought too, but there are requests with access_method='mobile app' and Last access set
[23:24:24] these are the counts of such requests
[23:24:30] do you want to follow up on that thread madhuvishy?
[23:25:18] leila: sure. I will find out what exactly is going on.
[23:26:29] so, madhuvishy: just to be clear: if la_uniques is really for app, then we expect la_uniques to be equal to app_uniques, right?
[23:26:42] leila: yeah
[23:27:35] the real value is 28% higher on average, madhuvishy
[23:28:03] leila: yes, which is puzzling. finding out how these cookies get set might be the place to start
[23:28:24] yeah, makes sense, madhuvishy. okay, I'll wait for that conversation to converge then? :-)
[23:28:55] leila: okay :) I'll also do some poking to figure it out. Thanks for pointing this out!
[23:29:21] np. sorry that I didn't notice that thread earlier.
[23:39:43] leila: I just sent the email. Let's see
[23:39:49] thanks, madhuvishy
[23:41:07] madhuvishy: I'm reading my emails but won't be in IRC for the next 1.5 hours. will come back then.
[23:41:23] leila: no problem :)
[23:53:26] Analytics-Wikistats: Wikistats list of Wikipedias incomplete - https://phabricator.wikimedia.org/T99677#1416095 (Dzahn) wrong project - fixing
[23:54:27] Analytics-Wikistats, Design: stats.wikimedia.org is visually unattractive - https://phabricator.wikimedia.org/T28353#1416099 (Dzahn) Can we please fix https://phabricator.wikimedia.org/T93702#1146743 ? If the files were in the repo that would make it possible to also help with this and other open tickets.
[23:59:00] Analytics-Wikistats: statistics for Wikidata API usage - https://phabricator.wikimedia.org/T64873#1416125 (Dzahn) What does that Mingle link actually tell us? All i see is a link back into Bugzilla and a lot of fields with "not set" value.
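A closing aside for anyone retracing the day's kafka debugging: the "did the events actually land in the topic?" check done by hand with kafka-console-consumer can also be scripted. A sketch with kafka-python's newer KafkaProducer/KafkaConsumer API (the python-kafka 0.9.4 package mentioned at 17:23 shipped an older SimpleProducer/SimpleConsumer interface); the topic and broker match the thread, everything else is illustrative:

```python
from kafka import KafkaConsumer, KafkaProducer

TOPIC = 'eventlogging-client-side'
BROKER = 'localhost:9092'

# Produce one test line, flushing so it is actually sent before we read back.
producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(TOPIC, b'test raw event line')
producer.flush()

# Read everything from the beginning; time out instead of blocking forever,
# which is what made the silent console-consumer hang above so confusing.
consumer = KafkaConsumer(TOPIC,
                         bootstrap_servers=BROKER,
                         auto_offset_reset='earliest',
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.offset, message.value)
```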