[13:35:58] Analytics-Tech-community-metrics: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1502866 (Aklapper) Interesting - that "Maniphest" section is gone again now. So only "Tickets" left (which is Bugzilla and shows outdated data). There is...
[14:33:50] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[14:35:51] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[14:44:23] joal: hey
[14:45:19] I tried to figure out how to get your RESTBase working with the Cassandra docker but couldn't get it. Wanna put me out of my misery? :)
[14:45:38] Hi milimetric !
[14:45:45] Let's get that stuff sorted :)
[14:46:19] Batcave !
[14:56:10] Analytics, Engineering-Community, MediaWiki-API, Research-and-Data, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1503132 (Qgil)
[15:12:21] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[15:14:31] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[15:19:06] Analytics-Kanban, RESTBase-API: create first RESTBase endpoint [8 pts] {slug} - https://phabricator.wikimedia.org/T107053#1485466 (Milimetric)
[15:19:40] Analytics-Kanban, RESTBase-API: create first RESTBase endpoint [8 pts] {slug} - https://phabricator.wikimedia.org/T107053#1503176 (Milimetric) a:JAllemandou
[15:33:25] Analytics-Kanban: Solve aggregator bug - https://phabricator.wikimedia.org/T107764#1503214 (JAllemandou) NEW a:JAllemandou
[15:45:13] Analytics-Kanban: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1485486 (Milimetric)
[15:45:30] Analytics-Kanban: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1503259 (Milimetric) a:Milimetric
[16:07:02] Analytics-Cluster, operations, Interdatacenter-IPsec: Secure inter-datacenter web request log (Kafka) traffic - https://phabricator.wikimedia.org/T92602#1503299 (BBlack)
[16:11:58] Analytics-Kanban: Solve aggregator bug {wren} - https://phabricator.wikimedia.org/T107764#1503328 (kevinator)
[16:14:41] Analytics-Cluster, Analytics-Kanban: Solve aggregator bug {wren} - https://phabricator.wikimedia.org/T107764#1503341 (kevinator)
[16:17:39] Analytics-Cluster, Analytics-Kanban: Solve aggregator bug {wren} [5 pts] - https://phabricator.wikimedia.org/T107764#1503355 (kevinator)
[16:39:10] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[16:45:37] ottomata: did you and nuria and oliver? look into auto-updating UAParser as new updates come out?
[16:47:11] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[16:47:20] milimetric: I did not
[16:47:25] ok, cool
[16:47:26] thx
[16:54:49] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic - https://phabricator.wikimedia.org/T106134#1503474 (kevinator) p:High>Normal
[16:58:15] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic {hawk} - https://phabricator.wikimedia.org/T106134#1503478 (kevinator)
[16:58:47] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1503483 (Milimetric) Hold please, clarifying with Erik separately :) In any case, any new work on this kind of thing will be fully modular and we wouldn't combine traffic reports with dumps analysis in...
[17:04:52] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic {hawk} - https://phabricator.wikimedia.org/T106134#1503522 (kevinator) @dr0ptp4kt is this urgent? If so, we'll look into manually updating UA parser to the latest codebase an...
[17:05:38] * milimetric lunching
[17:14:39] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic {hawk} - https://phabricator.wikimedia.org/T106134#1503546 (dr0ptp4kt) @kevinator, I don't think it's urgent urgent. The only thing I would say is it may make sense to spot ch...
[17:19:03] joal: i am about to try to launch a big partition replica reassignment
[17:19:06] you wanted to see, ja?
[17:21:08] there it goes, that is gonna flood networking! eee! :)
[17:26:52] ottomata, what do you know about x_analytics as a field and how it's assigned? Because I had a crazy idea.
[17:32:04] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1503614 (yuvipanda)
[17:37:04] ottomata: Have I missed everything, or is it not completely finished yet?
[17:56:03] seems like I have missed the thing.
[17:56:08] Will go to dinner then :)
[18:09:10] Analytics, Security, operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1503756 (leila) @Multichill: who from GLAM can we talk to to figure out what data needs to be kept and in what form it will be potentially useful in the future? I'm proposing...
[18:39:50] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1503837 (yuvipanda) We'll need to have strict and clear guidelines to make sure we don't leak private data. P...
[18:50:12] Analytics-Engineering: VE-related data for the Galician Wikipedia - https://phabricator.wikimedia.org/T86944#1503934 (Neil_P._Quinn_WMF) a:Neil_P._Quinn_WMF
[19:01:05] sorry joal, was at lunch and then in the thick of it, i think i have a problem.
[19:01:08] very unforeseen
[19:01:11] varnishkafka is not happy
[19:27:00] Analytics-Tech-community-metrics, ECT-August-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1504128 (Aklapper) a:Aklapper I think both numbers about code changes and numbers about code review users still suffer to some extent from T103292 (e.g. I have...
[19:43:06] ottomata: you ok? need any help?
[19:43:31] (re: varnishkafka not being happy)
[19:47:27] dunno yet!
[19:47:46] things are looking more ok...but not great....i think i am about at an equilibrium where I will be ok to do the meeting
[19:48:00] gonna have a data outage for sure.
[19:48:06] but i think it should be coming in now
[19:54:24] ok, well, you can always postpone or I can run through your slides and record the meeting and we can go over it later
[19:54:33] milimetric: i still don't have a full handle on what is going on. naw, i want to do the meeting.
[19:54:36] there are options, in case it's bad
[19:54:39] k
[19:55:02] k, you brain bouncing with other ops folks?
[19:55:08] no
[19:55:12] haven't had time
[19:55:19] if you wanna talk it over, I'm happy to
[19:55:26] I was just reading the Kreps article :)
[19:55:33] ok, good news is
[19:55:38] data is rolling into hadoop still just fine
[19:56:00] i think that vk is having periodic trouble during metadata requests
[19:56:16] which is going to cause it to drop messages here and there.
[19:56:20] so, today is going to be a very lossy day
[19:56:27] hm, weird
[19:56:44] but, it isn't totally down. i need to do some testing with vk in a clean setup in labs to make sure it will work 100% with 0.8.2.1
[19:57:00] i'm also having some weird snappy exceptions on the one new broker that had started to be the leader for real data
[19:57:10] there are a lot of replicas on new brokers now
[19:57:15] but only a couple of bits partitions on leaders
[19:57:24] those partitions are dropping all data. i'm trying to move them back to old brokers
[19:57:56] ah good, i have succeeded there.
[19:58:08] ok, so yeah, i still have some weird metadata errors
[19:58:12] but most things are working.
[19:58:17] so less of an emergency atm
[19:58:22] interesting, so yeah, seems like some kind of incompatibility between vk and the new metadata API?
[19:58:34] maybe, shouldn't be.
[19:58:40] i'll look through the changelog, just in case
[19:58:48] something weird with snappy too that i don't understand
[19:58:51] two issues to figure out
[19:59:27] gotcha, did snappy versions change too?
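(Editor's note: the partition move ottomata describes, shifting the troubled partitions back onto the old brokers, is done with Kafka's `kafka-reassign-partitions.sh` tool, which consumes a JSON plan. A minimal sketch of generating such a plan; the topic, partition, and broker ids below are made-up illustrations, not taken from the log.)

```python
import json

def build_reassignment_plan(topic, partition_to_brokers):
    """Build a JSON plan for kafka-reassign-partitions.sh.

    partition_to_brokers maps partition number -> list of broker ids
    that should hold its replicas (the first id is the preferred leader).
    """
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": brokers}
            for p, brokers in sorted(partition_to_brokers.items())
        ],
    }

# Hypothetical example: move partitions 0 and 1 of "webrequest_bits"
# back onto old brokers 12 and 13.
plan = build_reassignment_plan("webrequest_bits", {0: [12, 13], 1: [13, 12]})
print(json.dumps(plan, indent=2))
```

The resulting JSON would then be passed to `kafka-reassign-partitions.sh --reassignment-json-file plan.json --execute` on a broker host.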
[20:02:00] (from changelog of 8.1: [KAFKA-1164] - kafka should depend on snappy 1.0.5 (instead of 1.0.4.1))
[20:02:22] https://archive.apache.org/dist/kafka/0.8.1/RELEASE_NOTES.html
[20:03:50] hmm, maybe! seems mac related though
[20:04:26] and then they seemed to update it to snappy 1.1 in kafka 8.2: https://issues.apache.org/jira/browse/KAFKA-1369
[20:05:01] that one looks like it happened on Linux
[20:05:44] ottomata, hey! I won't make it to your meeting, sorry. Next week I'll ask for it or read docs...
[20:06:26] k, thanks mforns np!
[20:06:37] bye!
[20:06:41] ah that is def it milimetric!
[20:07:01] poopy ok
[20:07:09] will need to rebuild deb. will do after meeting
[20:08:12] as far as metadata / broker / zookeeper type stuff, there are like a million bugfixes and changes since 0.8.0, sorry
[20:08:36] milimetric: we are upgrading from 0.8.1.1
[20:08:39] to 0.8.2.1
[20:08:41] oh, good
[20:43:08] ottomata: just read the backlog
[20:43:18] Sorry for not having been there to help :S
[20:43:28] Thanks milimetric !
[20:43:58] I'm off for now, let me know (email or irc) if there is anything urgent I should monitor tomorrow morning
[20:44:00] :: shrug :: I blindly point to things, sometimes I lucky, opsy things magical
[20:44:04] have a good night joal
[20:44:37] milimetric: you being so humble doesn't make me foolish enough to think it's only luck ;)
[20:44:53] :O
[20:45:00] Have a good end of day !
[20:45:01] I'll tell my wife you think I'm humble, she'll laugh so hard
[20:45:03] you too :)
[20:45:06] :D
[21:28:43] Analytics-Kanban, Patch-For-Review: Clean up mobile-reportcard dashboards {frog} [13 pts] - https://phabricator.wikimedia.org/T104379#1415097 (Milimetric) This cleanup is not 100% done, because some post-deployment cleanup still has to happen next week. But in the meantime, the new cleaner dashboard is ru...
[22:01:27] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1504586 (awight)
[22:22:19] milimetric: yt?
[22:22:22] batcave w me?
[22:42:12] hey ottomata, I'm here
[22:56:55] hi! quick question: during the event logging meeting, it was mentioned that adding a new schema is expensive... Is it very much so? Should I try to avoid as much as possible ever having to tweak the new banner history schema?
[22:57:37] I'm pretty sure I heard once that every new schema means adding a new table, but I don't know if that causes a subsequent performance draw on the whole system...
[22:57:49] milimetric: ottomata: Ironholds: ^ ?
[22:57:51] AndyRussG, you heard right, but I was mostly complaining about it for research reasons
[22:58:14] that is, I have to write all my scripts to be agnostic to the table name and easily backfill from varying schemas and that's a PITA ;)
[22:58:59] AndyRussG: you should check with us, because if too many new events per second come in, we could go up against the bottlenecks in the currently non-scalable system
[22:59:16] so, because you have to check with us, it's annoying, nobody wants to check with us :)
[23:00:03] Ironholds: milimetric: ahh K thx :)
[23:00:36] milimetric: Heh in our case we did check that we won't be flooding the system (at least not at the start) ;p
[23:01:01] yep, I know, most people do check with us as unpleasant as it is :)
[23:02:09] milimetric: when you have a minute, could you tell me how you'd test the statsd stuff for the consumer on analytics1004?
[23:02:31] madhuvishy: sure, I'm helping Andrew for a bit now
[23:02:42] milimetric: yup no problem, take your time
[23:02:54] but yes, sorry I forgot to do that earlier
[23:03:13] you should ping me if I forget, I'm forgetful
[23:03:26] milimetric: I think that was my only question... for the record, I certainly enjoyed checking with you, both times :)
[23:03:27] no no, I was busy too, just got to ask you :)
[23:04:03] milimetric: I'm finally gonna write and deploy a schema, first to labs, though...
[23:04:27] AndyRussG: cool, good luck, ping if you're stuck
[23:04:38] milimetric: thanks much!!
[23:09:15] madhuvishy: ok, so I'm not sure about the statsd part. Can you test that the right messages are going out without actually sending anything to statsd?
[23:09:38] milimetric: ya forget statsd
[23:09:47] ok, so ssh into an04
[23:09:51] i just want to test the mysql consumer
[23:10:03] milimetric: okay done
[23:10:06] I rsynced eventlogging to my home directory, you can either copy that (but it's out of date) or get your own
[23:10:19] mine's in /home/milimetric/EventLogging
[23:10:46] milimetric: okay got it
[23:10:59] and then you can run the consumer like this:
[23:11:00] time cat ~/out.load.2 | python eventlogging-consumer stdin:// mysql://root:root@127.0.0.1/log?charset=utf8
[23:11:27] that'll take the file as input and write to the local mysql instance
[23:11:28] oh, wait
[23:11:35] you need to cd into server/bin
[23:11:47] and do the equivalent of this: export PYTHONPATH=/home/milimetric/EventLogging/server
[23:11:51] for whatever path you're on
[23:12:00] milimetric: yeah okay
[23:12:53] milimetric: okay this is cool, I can work with this, thanks a lot :)
[23:12:58] np
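(Editor's note: the "every new schema means adding a new table" point above can be sketched concretely. EventLogging's MySQL consumer names tables `SchemaName_revision`; everything else below, the helper functions, the `BannerHistory` schema, and its fields, is a hypothetical illustration and not actual EventLogging code.)

```python
def table_name(schema, revision):
    """Each (schema, revision) pair gets its own table, e.g. Edit_11448630."""
    return "{}_{}".format(schema, revision)

def insert_sql(event):
    """Build a parameterized INSERT for one capsule-style event.

    `event` is assumed to carry the schema name, its revision, and the
    event fields themselves, which is why a brand-new schema (or a new
    revision of one) implies a brand-new destination table.
    """
    table = table_name(event["schema"], event["revision"])
    cols = sorted(event["event"])
    placeholders = ", ".join("%s" for _ in cols)
    sql = "INSERT INTO `{}` ({}) VALUES ({})".format(
        table, ", ".join("`{}`".format(c) for c in cols), placeholders)
    values = [event["event"][c] for c in cols]
    return sql, values

# Hypothetical banner-history event:
event = {"schema": "BannerHistory", "revision": 1234567,
         "event": {"banner": "fundraising2015", "impressions": 3}}
sql, values = insert_sql(event)
print(sql)     # INSERT INTO `BannerHistory_1234567` (`banner`, `impressions`) VALUES (%s, %s)
print(values)  # ['fundraising2015', 3]
```

This also shows why backfilling scripts must be "agnostic to the table name" as Ironholds says: tweaking a schema bumps the revision, which changes the table the events land in.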