[13:35:58] Analytics-Tech-community-metrics: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1502866 (Aklapper) Interesting - that "Maniphest" section is gone again now. So only "Tickets" left (which is Bugzilla and shows outdated data). There is...
[14:33:50] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[14:35:51] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[14:44:23] joal: hey
[14:45:19] I tried to figure out how to get your RESTBase working with the Cassandra docker but couldn't get it. Wanna put me out of my misery? :)
[14:45:38] Hi milimetric !
[14:45:45] Let's get that stuff sorted :)
[14:46:19] Batcave !
[14:56:10] Analytics, Engineering-Community, MediaWiki-API, Research-and-Data, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1503132 (Qgil)
[15:12:21] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[15:14:31] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[15:19:06] Analytics-Kanban, RESTBase-API: create first RESTBase endpoint [8 pts] {slug} - https://phabricator.wikimedia.org/T107053#1485466 (Milimetric)
[15:19:40] Analytics-Kanban, RESTBase-API: create first RESTBase endpoint [8 pts] {slug} - https://phabricator.wikimedia.org/T107053#1503176 (Milimetric) a:JAllemandou
[15:33:25] Analytics-Kanban: Solve aggregator bug - https://phabricator.wikimedia.org/T107764#1503214 (JAllemandou) NEW a:JAllemandou
[15:45:13] Analytics-Kanban: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1485486 (Milimetric)
[15:45:30] Analytics-Kanban: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1503259 (Milimetric) a:Milimetric
[16:07:02] Analytics-Cluster, operations, Interdatacenter-IPsec: Secure inter-datacenter web request log (Kafka) traffic - https://phabricator.wikimedia.org/T92602#1503299 (BBlack)
[16:11:58] Analytics-Kanban: Solve aggregator bug {wren} - https://phabricator.wikimedia.org/T107764#1503328 (kevinator)
[16:14:41] Analytics-Cluster, Analytics-Kanban: Solve aggregator bug {wren} - https://phabricator.wikimedia.org/T107764#1503341 (kevinator)
[16:17:39] Analytics-Cluster, Analytics-Kanban: Solve aggregator bug {wren} [5 pts] - https://phabricator.wikimedia.org/T107764#1503355 (kevinator)
[16:39:10] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[16:45:37] ottomata: did you and nuria and oliver? look into auto-updating UAParser as new updates come out?
[16:47:11] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[16:47:20] milimetric: I did not
[16:47:25] ok, cool
[16:47:26] thx
[16:54:49] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic - https://phabricator.wikimedia.org/T106134#1503474 (kevinator) p:High>Normal
[16:58:15] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic {hawk} - https://phabricator.wikimedia.org/T106134#1503478 (kevinator)
[16:58:47] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1503483 (Milimetric) Hold please, clarifying with Erik separately :) In any case, any new work on this kind of thing will be fully modular and we wouldn't combine traffic reports with dumps analysis in...
[17:04:52] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic {hawk} - https://phabricator.wikimedia.org/T106134#1503522 (kevinator) @dr0ptp4kt is this urgent? If so, we'll look into manually updating UA parser to the latest codebase an...
[17:05:38] * milimetric lunching
[17:14:39] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: Classification of Bing robots as spider traffic instead of user traffic {hawk} - https://phabricator.wikimedia.org/T106134#1503546 (dr0ptp4kt) @kevinator, I don't think it's urgent urgent. The only thing I would say is it may make sense to spot ch...
[17:19:03] joal: i am about to try to launch a big partition replica reassignment
[17:19:06] you wanted to see, ja?
[17:21:08] there it goes, that is gonna flood networking! eee! :)
[17:26:52] ottomata, what do you know about x_analytics as a field and how it's assigned? Because I had a crazy idea.
[17:32:04] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1503614 (yuvipanda)
[17:37:04] ottomata: Have I missed everything, or is it not completely finished yet?
[17:56:03] seems like I have missed the thing.
[17:56:08] Will go to dinner then :)
[18:09:10] Analytics, Security, operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1503756 (leila) @Multichill: who from GLAM can we talk to to figure out what data needs to be kept and in what form it will be potentially useful in the future? I'm proposing...
[18:39:50] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1503837 (yuvipanda) We'll need to have strict and clear guidelines to make sure we don't leak private data. P...
[18:50:12] Analytics-Engineering: VE-related data for the Galician Wikipedia - https://phabricator.wikimedia.org/T86944#1503934 (Neil_P._Quinn_WMF) a:Neil_P._Quinn_WMF
[19:01:05] sorry joal, was at lunch and then in the thick of it, i think i have a problem.
[19:01:08] very unforeseen
[19:01:11] varnishkafka is not happy
[19:27:00] Analytics-Tech-community-metrics, ECT-August-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1504128 (Aklapper) a:Aklapper I think both numbers about code changes and numbers about code review users still suffer to some extent from T103292 (e.g. I have...
[19:43:06] ottomata: you ok? need any help?
[19:43:31] (re: varnishkafka not being happy)
[19:47:27] dunno yet!
[19:47:46] things are looking more ok...but not great....i think i am about at an equilibrium where I will be ok to do the meeting
[19:48:00] gonna have a data outage for sure.
[19:48:06] but i think it should be coming in now
[19:54:24] ok, well, you can always postpone or I can run through your slides and record the meeting and we can go over it later
[19:54:33] milimetric: i still don't have a full handle on what is going on. naw, i want to do the meeting.
[19:54:36] there are options, in case it's bad
[19:54:39] k
[19:55:02] k, you brain bouncing with other ops folks?
[19:55:08] no
[19:55:12] haven't had time
[19:55:19] if you wanna talk it over, I'm happy to
[19:55:26] I was just reading the Kreps article :)
[19:55:33] ok, good news is
[19:55:38] data is rolling into hadoop still just fine
[19:56:00] i think that vk is having periodic trouble during metadata requests
[19:56:16] which is going to cause it to drop messages here and there.
[19:56:20] so, today is going to be a very lossy day
[19:56:27] hm, weird
[19:56:44] but, it isn't totally down. i need to do some testing with vk in a clean setup in labs to make sure it will work 100% with 0.8.2.1
[19:57:00] i'm also having some weird snappy exceptions on the one new broker that had started to be the leader for real data
[19:57:10] there are a lot of replicas on new brokers now
[19:57:15] but only a couple of bits partitions on leaders
[19:57:24] those partitions are dropping all data. i'm trying to move them back to old brokers
[19:57:56] ah good, i have succeeded there.
[19:58:08] ok, so yeah, i still have some weird metadata errors
[19:58:12] but most things are working.
[19:58:17] so less of an emergency atm
[19:58:22] interesting, so yeah, seems like some kind of incompatibility between vk and the new metadata API?
[19:58:34] maybe, shouldn't be.
[19:58:40] i'll look through the changelog, just in case
[19:58:48] something weird with snappy too that i don't understand
[19:58:51] two issues to figure out
[19:59:27] gotcha, did snappy versions change too?
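(Editor's note: the partition move ottomata describes, shifting the troubled partitions back onto the old brokers, is done with Kafka's `kafka-reassign-partitions.sh` tool, which consumes a JSON plan. A minimal sketch of generating such a plan; the topic, partition, and broker ids below are made-up illustrations, not taken from the log.)

```python
import json

def build_reassignment_plan(topic, partition_to_brokers):
    """Build a JSON plan for kafka-reassign-partitions.sh.

    partition_to_brokers maps partition number -> list of broker ids
    that should hold its replicas (the first id is the preferred leader).
    """
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": brokers}
            for p, brokers in sorted(partition_to_brokers.items())
        ],
    }

# Hypothetical example: move partitions 0 and 1 of "webrequest_bits"
# back onto old brokers 12 and 13.
plan = build_reassignment_plan("webrequest_bits", {0: [12, 13], 1: [13, 12]})
print(json.dumps(plan, indent=2))
```

The resulting JSON would then be passed to `kafka-reassign-partitions.sh --reassignment-json-file plan.json --execute` on a broker host.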
[20:02:00] (from changelog of 8.1: [KAFKA-1164] - kafka should depend on snappy 1.0.5 (instead of 1.0.4.1))
[20:02:22] https://archive.apache.org/dist/kafka/0.8.1/RELEASE_NOTES.html
[20:03:50] hmm, maybe! seems mac related though
[20:04:26] and then they seemed to update it to snappy 1.1 in kafka 8.2: https://issues.apache.org/jira/browse/KAFKA-1369
[20:05:01] that one looks like it happened on Linux
[20:05:44] ottomata, hey! I won't make it to your meeting, sorry. Next week I'll ask for it or read docs...
[20:06:26] k, thanks mforns np!
[20:06:37] bye!
[20:06:41] ah that is def it milimetric!
[20:07:01] poopy ok
[20:07:09] will need to rebuild deb. will do after meeting
[20:08:12] as far as metadata / broker / zookeeper type stuff, there are like a million bugfixes and changes since 0.8.0, sorry
[20:08:36] milimetric: we are upgrading from 0.8.1.1
[20:08:39] to 0.8.2.1
[20:08:41] oh, good
[20:43:08] ottomata: just read the backlog
[20:43:18] Sorry for not having been there to help :S
[20:43:28] Thanks milimetric !
[20:43:58] I'm off for now, let me know (email or irc) if there is anything urgent I should monitor tomorrow morning
[20:44:00] :: shrug :: I blindly point to things, sometimes I lucky, opsy things magical
[20:44:04] have a good night joal
[20:44:37] milimetric: you being so humble doesn't make me foolish enough to think it's only luck ;)
[20:44:53] :O
[20:45:00] Have a good end of day !
[20:45:01] I'll tell my wife you think I'm humble, she'll laugh so hard
[20:45:03] you too :)
[20:45:06] :D
[21:28:43] Analytics-Kanban, Patch-For-Review: Clean up mobile-reportcard dashboards {frog} [13 pts] - https://phabricator.wikimedia.org/T104379#1415097 (Milimetric) This cleanup is not 100% done, because some post-deployment cleanup still has to happen next week. But in the meantime, the new cleaner dashboard is ru...
[22:01:27] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1504586 (awight)
[22:22:19] milimetric: yt?
[22:22:22] batcave w me?
[22:42:12] hey ottomata, I'm here
[22:56:55] hi! quick question: during the event logging meeting, it was mentioned that adding a new schema is expensive... Is it very much so? Should I try to avoid as much as possible ever having to tweak the new banner history schema?
[22:57:37] I'm pretty sure I heard once that every new schema means adding a new table, but I don't know if that causes a subsequent performance draw on the whole system...
[22:57:49] milimetric: ottomata: Ironholds: ^ ?
[22:57:51] AndyRussG, you heard right, but I was mostly complaining about it for research reasons
[22:58:14] that is, I have to write all my scripts to be agnostic to the table name and easily backfill from varying schemas and that's a PITA ;)
[22:58:59] AndyRussG: you should check with us, because if too many new events per second come in, we could go up against the bottlenecks in the currently non-scalable system
[22:59:16] so, because you have to check with us, it's annoying, nobody wants to check with us :)
[23:00:03] Ironholds: milimetric: ahh K thx :)
[23:00:36] milimetric: Heh in our case we did check that we won't be flooding the system (at least not at the start) ;p
[23:01:01] yep, I know, most people do check with us as unpleasant as it is :)
[23:02:09] milimetric: when you have a minute, could you tell me how you'd test the statsd stuff for the consumer on analytics1004?
[23:02:31] madhuvishy: sure, I'm helping Andrew for a bit now
[23:02:42] milimetric: yup no problem, take your time
[23:02:54] but yes, sorry I forgot to do that earlier
[23:03:13] you should ping me if I forget, I'm forgetful
[23:03:26] milimetric: I think that was my only question... for the record, I certainly enjoyed checking with you, both times :)
[23:03:27] no no, I was busy too, just got to ask you :)
[23:04:03] milimetric: I'm finally gonna write and deploy a schema, first to labs, though...
[23:04:27] AndyRussG: cool, good luck, ping if you're stuck
[23:04:38] milimetric: thanks much!!
[23:09:15] madhuvishy: ok, so I'm not sure about the statsd part. Can you test that the right messages are going out without actually sending anything to statsd?
[23:09:38] milimetric: ya forget statsd
[23:09:47] ok, so ssh into an04
[23:09:51] i just want to test the mysql consumer
[23:10:03] milimetric: okay done
[23:10:06] I rsynced eventlogging to my home directory, you can either copy that (but it's out of date) or get your own
[23:10:19] mine's in /home/milimetric/EventLogging
[23:10:46] milimetric: okay got it
[23:10:59] and then you can run the consumer like this:
[23:11:00] time cat ~/out.load.2 | python eventlogging-consumer stdin:// mysql://root:root@127.0.0.1/log?charset=utf8
[23:11:27] that'll take the file as input and write to the local mysql instance
[23:11:28] oh, wait
[23:11:35] you need to cd into server/bin
[23:11:47] and do the equivalent of this: export PYTHONPATH=/home/milimetric/EventLogging/server
[23:11:51] for whatever path you're on
[23:12:00] milimetric: yeah okay
[23:12:53] milimetric: okay this is cool, I can work with this, thanks a lot :)
[23:12:58] np
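(Editor's note: the "every new schema means adding a new table" point above can be sketched concretely. EventLogging's MySQL consumer names tables `SchemaName_revision`; everything else below, the helper functions, the `BannerHistory` schema, and its fields, is a hypothetical illustration and not actual EventLogging code.)

```python
def table_name(schema, revision):
    """Each (schema, revision) pair gets its own table, e.g. Edit_11448630."""
    return "{}_{}".format(schema, revision)

def insert_sql(event):
    """Build a parameterized INSERT for one capsule-style event.

    `event` is assumed to carry the schema name, its revision, and the
    event fields themselves, which is why a brand-new schema (or a new
    revision of one) implies a brand-new destination table.
    """
    table = table_name(event["schema"], event["revision"])
    cols = sorted(event["event"])
    placeholders = ", ".join("%s" for _ in cols)
    sql = "INSERT INTO `{}` ({}) VALUES ({})".format(
        table, ", ".join("`{}`".format(c) for c in cols), placeholders)
    values = [event["event"][c] for c in cols]
    return sql, values

# Hypothetical banner-history event:
event = {"schema": "BannerHistory", "revision": 1234567,
         "event": {"banner": "fundraising2015", "impressions": 3}}
sql, values = insert_sql(event)
print(sql)     # INSERT INTO `BannerHistory_1234567` (`banner`, `impressions`) VALUES (%s, %s)
print(values)  # ['fundraising2015', 3]
```

This also shows why backfilling scripts must be "agnostic to the table name" as Ironholds says: tweaking a schema bumps the revision, which changes the table the events land in.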