[00:04:07] hey, kafka in prod is a little funky atm
[00:04:12] i have auto create topics turned off
[00:04:18] so you won't be able to use eventlogging there right now
[00:04:26] i gotta run thanks allLlLl
[01:50:49] Analytics: Generalize useful pageview tools - https://phabricator.wikimedia.org/T107831#1505287 (Tnegrin) NEW a:kaldari
[12:20:12] Analytics: Generalize useful pageview tools - https://phabricator.wikimedia.org/T107831#1506075 (Doc_James) Data by WikiProject by month has the longest history and what I propose we do that. We can use the EN categorization of articles by WikiProject and then use the Wikidata links to form corresponding li...
[13:01:09] o/ joal
[13:19:55] Hey halfak !
[13:20:22] Didn't expect you today :)
[13:20:31] Yo. milimetric and I are in call
[13:21:15] I had see
[13:21:24] hm
[13:21:52] So, I thought you had cancelled the thing
[13:21:57] I'll join now
[13:25:57] just realized the cancellation was from another meeting
[14:14:32] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[14:16:51] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[14:21:39] sorry milimetric
[14:21:42] batcave ?
[14:21:43] sok :)
[14:21:48] no, i was gonna take a quick break
[14:21:56] ok
[14:22:07] same for me :)
[14:22:08] and I was just gonna say I'm still mucking around with the code, but so far not stuck at all
[14:22:22] awesome :)
[14:22:24] ok, ping me when you're back if you wanna catch up
[14:22:28] You'll show me later ?
[14:22:32] yup
[14:39:32] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[14:41:52] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[15:24:35] Analytics-Tech-community-metrics, ECT-August-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1506474 (Aklapper)
[15:24:38] (PS1) Milimetric: Update for August Meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/229138
[15:24:47] (CR) Milimetric: [C: 2 V: 2] Update for August Meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/229138 (owner: Milimetric)
[15:25:20] Analytics-Tech-community-metrics, ECT-August-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1498048 (Aklapper) (added charts and data for July month to initial task description)
[15:34:24] Analytics-Kanban: POC RestBase with cassandra in labs on test data [8 pts] {slug} - https://phabricator.wikimedia.org/T106821#1506507 (ggellerman)
[15:39:05] Analytics-Kanban, RESTBase-API: create first RESTBase endpoint [8 pts] {slug} - https://phabricator.wikimedia.org/T107053#1506517 (ggellerman) a:JAllemandou>Milimetric
[15:56:14] Analytics-Backlog, Analytics-Cluster: Test Impalla operationally - https://phabricator.wikimedia.org/T96331#1506577 (ggellerman)
[16:01:47] Analytics-Kanban, Patch-For-Review: Processor writes valid and invalid events to separate Kafka topics {stag} [13 points] - https://phabricator.wikimedia.org/T98781#1506591 (kevinator) work for this task is done and deployed... but code not being used right now. Marking as done as we have moved on.
[16:01:53] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Prep work for Eventlogging on Kafka {stag} - https://phabricator.wikimedia.org/T102831#1506593 (kevinator)
[16:01:55] Analytics-Kanban, Patch-For-Review: Processor writes valid and invalid events to separate Kafka topics {stag} [13 points] - https://phabricator.wikimedia.org/T98781#1506592 (kevinator) Open>Resolved
[16:01:56] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1506594 (kevinator)
[16:05:14] kevinator: lemme know when you want to talk about those screening qs
[16:05:53] tomorrow after standup?
[16:10:18] k
[16:11:07] Analytics-Kanban: Analyze webrequest data issue on August 3/4 - https://phabricator.wikimedia.org/T107893#1506633 (JAllemandou) NEW
[16:11:49] Analytics-Kanban: Analyze webrequest data issue on August 3/4 [?pts] {hawk} - https://phabricator.wikimedia.org/T107893#1506644 (JAllemandou)
[16:14:48] Analytics-Cluster, Analytics-Kanban, operations, Patch-For-Review: Build new latest stable (0.8.2.1?) Kafka package and upgrade Kafka brokers - https://phabricator.wikimedia.org/T106581#1506647 (Ottomata) Oof, had some problems yesterday :( Incident documentation here: https://wikitech.wikimedia.or...
[16:24:22] madhuvishy: do we need to task swapping out the kafka producer to use pykafka?
[16:27:45] ottomata: yes
[16:28:01] ottomata: I'll add a task for it and work on it
[16:29:06] ottomata: do you know what the statsd host is? I thought it was labmon1001.eqiad.wmnet, port 8125
[16:29:52] uHhH
[16:30:31] that looks right madhuvishy
[16:30:35] udp or tcp? dunno
[16:30:36] both?
[16:31:01] I thought statsd used udp
[16:32:15] https://www.irccloud.com/pastebin/6Zsol9fn/
[16:32:21] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[16:32:41] ottomata: this is all i tried to do. but i dont see anything show up on graphite
[16:32:50] i'm trying this on analytics1004
[16:33:29] ottomata, I hope you like...
[16:33:33] PRAISE IN FRONT OF OTHER PEOPLE!
[16:33:37] * Ironholds makes scary face, hits send
[16:34:31] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[16:34:31] ah
[16:34:33] madhuvishy:
[16:34:37] you can't reach labs statsd from prod :)
[16:35:05] in prod it is
[16:35:10] ottomata: what is the hostname in prod?
[16:35:10] statsd.eqiad.wmnet
[16:35:14] ottomata: aaah
[16:35:25] but also, are you sure you want to test this in prod?
[16:35:32] sending stats to graphite? are you sending under a test metric name?
[16:36:08] ottomata: I could send under a test name. i think i can also test in labs
[16:37:43] madhuvishy: do it in deployment-prep's eventlogging instance?
[16:37:55] Analytics-Kanban, Reading-Admin, Research-and-Data, Research consulting: request for data: sites traffic by topics/ subject areas and geographies - https://phabricator.wikimedia.org/T107613#1506743 (Tbayer) It might be useful to get a sense of what kind of information is already out there. To start w...
[16:38:09] joal: if you want to fix the graphite EL alerts, i bet we could adjust threshold for alert somehow
[16:38:20] ottomata: hmmm, okay, will try that
[16:39:03] madhuvishy: or in prod, i think it would be fine to send to a test metric name
[16:45:30] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1506752 (Ottomata) Oof, am looking at all that. I really am not excited about setting up another cron job +...
[16:59:08] joal you gone for the day?
[16:59:17] nope, working on stuff :)
[16:59:21] ottomata: -^
[16:59:49] May I help ottomata ?
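
Editor's note on the statsd exchange (16:29 to 16:39): the pastebin contents are not in this log, so the following is only a minimal sketch of sending one counter to StatsD over UDP under a test metric name, as suggested above. The host and port are the ones mentioned in the channel; the function names and the metric name in the usage comment are hypothetical, and `name:value|c` is the plain StatsD counter format.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter datagram of the form b'<name>:<value>|c'."""
    return "{}:{}|c".format(name, value).encode("ascii")


def send_counter(name, value=1, host="statsd.eqiad.wmnet", port=8125):
    """Send one counter increment over UDP and return the bytes sent.

    host/port default to the values quoted in the channel; UDP is
    fire-and-forget, so a successful send does not prove the metric
    reached graphite.
    """
    payload = format_counter(name, value)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return payload


# Hypothetical usage with a clearly test-only metric name:
# send_counter("test.eventlogging.statsd_check")
```

Because UDP gives no delivery feedback, sending to a host that is unreachable from the current network (labs statsd from prod, as in this case) fails silently, which matches the "nothing shows up on graphite" symptom.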
[17:01:45] ottomata: I have seen your message on alerts, but still wonder if threshold is the right thing --> the concern seems to come from time difference in graphite updates
[17:03:51] ottomata: just saw your message on operations chan
[17:03:55] will monitor
[17:14:28] ah
[17:14:41] aye, am not sure either
[17:14:43] about thresholds
[17:16:51] ottomata: in a meeting now, let's batcave in 1/2 hour ?
[17:17:01] I can monitor stuff, but not really talk :)
[17:18:45] k no worries, we wait for moritz anyway, it should be totally fine ™
[17:26:06] ottomata: --^ Muhahaha :)
[17:27:36] ottomata: moritzm deploy will be tomorrow
[17:31:26] Analytics-Kanban, Reading-Admin, Research-and-Data, Research consulting: request for data: sites traffic by topics/ subject areas and geographies - https://phabricator.wikimedia.org/T107613#1506905 (DarTar) notes: https://etherpad.wikimedia.org/p/T107613
[17:36:39] Analytics-Tech-community-metrics, ECT-August-2015: Ranking of repositories in Korma's code review page should update more often - https://phabricator.wikimedia.org/T102112#1506923 (Aklapper)
[17:42:32] Analytics-Kanban, Reading-Admin, Research-and-Data, Research consulting: request for data: sites traffic by topics/ subject areas and geographies - https://phabricator.wikimedia.org/T107613#1506946 (SVentura) @Tbayer, @DarTar thank you!
[17:45:57] Analytics-Backlog, Research-and-Data, Research collaborations: Meet with Felipe Hoffa: Google BigQuery + Wikimedia PV data - https://phabricator.wikimedia.org/T107911#1506955 (DarTar) NEW a:DarTar
[18:09:14] joal: still around?
[18:09:19] i need a java classpath brain bounce thing
[18:09:19] ottomata: yup !
[18:09:23] i am stumped
[18:09:24] batcave?
[18:09:25] batcave ?
[18:09:30] :)
[18:14:52] * joal is gone for dinner !
[18:34:02] Analytics-Kanban, Reading-Admin, Research-and-Data, Research consulting: request for data: sites traffic by topics/ subject areas and geographies - https://phabricator.wikimedia.org/T107613#1507191 (DarTar) @ezachte: we met today and discussed the scope of this ask and it looks like the work you've d...
[18:50:59] Analytics-Kanban, Reading-Admin, Research-and-Data, Research consulting: request for data: sites traffic by topics/ subject areas and geographies - https://phabricator.wikimedia.org/T107613#1507343 (SVentura) @ezachte, good to meet you here, would you have time for a quick call/google hangout tomorro...
[19:07:18] ottomata, is the eqiad analytics web proxy down?
[19:07:26] unable to resolve 'webproxy.eqiad.wmnet'
[19:08:37] where are you doing that from?
[19:11:28] Analytics-Kanban, Reading-Admin, Research-and-Data, Research consulting: request for data: sites traffic by topics/ subject areas and geographies - https://phabricator.wikimedia.org/T107613#1507480 (Tbayer) One more link - this World Bank document shows what kind of stats they used previously: "[[ h...
[19:22:41] ottomata, nevermind, my bug :/
[19:22:44] I hadn't SSHd in
[19:22:49] "why can't [my laptop] find this"
[19:22:56] hehe
[19:28:53] Tomorrow guys !
[19:34:31] ottomata, any idea when we get an R upgrade, btw? ;)
[19:34:39] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[19:35:07] R upgrade!? when we get a OS upgrade that has it? does debian jessie have what you want?
[19:35:36] it does!
[19:35:59] 3.1.1-1
[19:35:59] ?
[19:36:28] welp, i guess we need to upgrade stat boxes, then, woo! or, even easier, Ironholds when we get around to adding a new stat box or two, it will be Jessie.
[19:36:39] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[19:48:40] milimetric: need brain bounce, yt?
[19:49:11] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[19:49:20] hey ottomata, yes
[19:49:23] cave?
[19:52:29] ottomata, awesome!
[19:52:41] any idea when either of those will happen?
[19:54:51] there are a few things that are blocked on finding more room for our new hadoop nodes in eqiad
[19:54:57] we can then repurpose more older dells
[19:55:29] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[19:56:37] sweet
[19:56:46] LMK if you need people to march around going "we need this"
[20:12:18] hello milimetric
[20:12:23] I've a question!
[20:18:51] hi YuviPanda
[20:18:53] what's up
[20:19:09] milimetric: just wanted to ask how recurring reports were built in wikimetrics
[20:19:14] been planning on building that for quarry
[20:19:23] right, sure
[20:19:29] so it's fairly manual
[20:19:39] I have a "parent" report that has a recurrent flag set
[20:19:45] it also has a start date
[20:20:20] then I just generate two lists: all reports that should have run between start_date and today
[20:20:31] and all reports that actually did run and stored success in the db
[20:20:45] I do the diff between those and re-run them
[20:20:57] code is ... hang on
[20:21:24] here-ish: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/models/report_nodes/run_report.py#L161
[20:21:49] so that returns a generator of the reports that should be run
[20:22:07] hmmm I see
[20:22:11] and then this celery beat daemon does the actual running: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/schedules/daily.py#L25
[20:22:27] notice the different time limits
[20:22:46] there were some gotchas when we did it. We ran into some recursion problems when trying to use fancier celery constructs like chords
[20:23:05] so we just make all of them into separate celery tasks
[20:23:36] mmm... one possibility would be to add an "ad-hoc" report type to wikimetrics and just re-use the code
[20:23:43] that'd probably be better than rewriting it
[20:23:49] hmm so what I was thinking of doing
[20:23:55] was allowing people to say 'weekly' or 'daily' or 'monthly'
[20:24:07] and then I'll make a db entry after picking an appropriate time
[20:24:18] and then just have celery beat run every 5 mins, check if anything needs to be run, and start those tasks
[20:24:24] if they fail they fail and are run next time
[20:24:40] fair enough, if you can tolerate missed runs
[20:24:54] we built this to run vital signs so missed runs are not ok
[20:25:11] of course, this makes the whole system vomit constantly because labsdb can't keep up with some of these metrics
[20:25:24] but it's smart and doesn't retry too many at a time, etc.
[20:26:41] but yeah, celery beat and schedules are easy to work with. If you're not building any guarantees on top of a normal cron, you could just use celerybeat and update its schedule file
[20:26:45] then you don't need to implement much
[20:27:25] milimetric: right, so if this misses a run it'll still have old stale data
[20:27:36] YuviPanda: another problem: are you trying to concatenate the results?
[20:27:56] what do you mean by 'concatenate'?
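
Editor's note on the recurring-report scheme described at 20:19 to 20:21: the approach is "everything that should have run since start_date, minus everything recorded as successful". A minimal sketch of that diff follows. It is not the wikimetrics code linked above; the function names, the daily step, and the choice to include today in the expected range are all assumptions made for illustration.

```python
from datetime import date, timedelta


def expected_runs(start_date, today, step=timedelta(days=1)):
    """Yield every date on which the recurring report should have run.

    Assumption: the range is inclusive of both start_date and today.
    """
    current = start_date
    while current <= today:
        yield current
        current += step


def missing_runs(start_date, today, succeeded, step=timedelta(days=1)):
    """Return a generator of expected run dates with no recorded success.

    `succeeded` stands in for the successful runs stored in the db.
    """
    done = set(succeeded)
    return (d for d in expected_runs(start_date, today, step) if d not in done)


# Hypothetical usage: two of four daily runs succeeded, so two get re-run.
already_ok = [date(2015, 8, 1), date(2015, 8, 3)]
to_rerun = list(missing_runs(date(2015, 8, 1), date(2015, 8, 4), already_ok))
```

Returning a generator mirrors the remark that the linked function "returns a generator of the reports that should be run"; each missing date would then become its own task rather than part of a chord.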
[20:27:57] we have logic that formats both timeseries and non-timeseries data so it can be concatted together
[20:28:09] like, I have July 1st, July 2nd, July 3rd and I want all of the data in one file
[20:28:18] ah, no
[20:28:26] people will ask for that :)
[20:28:28] this is just the equivalent of someone clicking the 'run' button
[20:28:41] since I know at least 3 people who basically are doing the recurring runs by hand now :D
[20:29:01] yeah, what about farming it out to wikimetrics so you don't re-invent the wheel
[20:29:04] milimetric: but I store historical data forever, so you can just ask for $current-1 version
[20:29:26] it feels like different problems and that quarry is going to be far more lightweight...
[20:30:30] milimetric: I'll liberally steal code if I need to tho
[20:30:31] cool. Yeah, so my advice is then just use celery beat to do most of the work
[20:30:43] yeah cool!
[20:30:46] I'll do that :)
[20:42:31] Analytics-Kanban, Reading-Admin, Research-and-Data, Research consulting: request for data: sites traffic by topics/ subject areas and geographies - https://phabricator.wikimedia.org/T107613#1507872 (SVentura) thanks for the link @Tbayer. this time around we are working with World Bank's data scientis...
[21:12:42] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1507951 (yuvipanda) @Ottomata it isn't super hard to do, I can write it up if you'd like :) I think the 'one...
[21:57:18] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[21:59:29] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[22:01:45] Hey eventlogging folks. If a user submits an event twice will it be recorded with the same "uuid" value in the database?
[22:18:35] bmansurov: the uuid has a unique constraint on it, so it shouldn't be possible to insert the duplicate
[22:19:02] but if I remember correctly, yes, the uuid would be the same if all the event data is exactly the same and it's coming from the same IP, etc.
[22:19:19] but the timestamp would have to be the same too
[22:19:46] gtg for a bit
[22:22:27] milimetric, thanks
[22:32:12] Analytics-Kanban: Project Flea - https://phabricator.wikimedia.org/T107955#1508261 (ggellerman)
[22:33:15] Analytics-Kanban: {Flea} Teaching people to fish - https://phabricator.wikimedia.org/T107955#1508272 (ggellerman)
[22:34:19] Analytics-Kanban: {Flea} Teaching people to fish - https://phabricator.wikimedia.org/T107955#1508277 (ggellerman)
[22:35:10] Analytics-Backlog: Provide the Wikimedia DE folks with Hive access/training {flea} - https://phabricator.wikimedia.org/T106042#1508285 (ggellerman)
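
Editor's note on the uuid question (22:01 to 22:19): the answer given is hedged ("if I remember correctly"), and this log does not show how EventLogging actually derives its uuid. The sketch below only illustrates the general idea being described: an id computed deterministically from the event contents, origin, and timestamp, plus a unique constraint that rejects the second insert. The function names, field names, table layout, and the use of `uuid5` over canonical JSON are all assumptions.

```python
import json
import sqlite3
import uuid


def event_uuid(event, client_ip, timestamp):
    """Derive a deterministic id from the event data, origin, and timestamp.

    Hypothetical scheme: identical inputs always produce the same id.
    """
    canonical = json.dumps(
        {"event": event, "ip": client_ip, "timestamp": timestamp},
        sort_keys=True,
    )
    return uuid.uuid5(uuid.NAMESPACE_URL, canonical).hex


def insert_once(conn, event_id):
    """Insert the id; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO events (uuid) VALUES (?)", (event_id,))
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical usage with an in-memory table standing in for the real db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (uuid TEXT UNIQUE)")
first = event_uuid({"action": "click"}, "192.0.2.1", 1438725705)
second = event_uuid({"action": "click"}, "192.0.2.1", 1438725705)
```

Under these assumptions a resubmission with the same data, IP, and timestamp maps to the same id and is dropped by the constraint, while any change to the timestamp yields a different id and a second row.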