[00:00:09] mforns: no, ummm the 2015 beirut bombings was only created on Nov 12 [00:00:48] i probably would expect the garissa university page to plot all the data, and only 4 days for the beirut bombings [00:00:59] i dont know why that's not happening though [00:01:08] do you think since 2105-11-15, no-one visited it? may be... [00:01:25] ummm what? [00:01:40] hehe [00:01:50] i have to assume at least the person who created it looked at it once :P [00:02:36] mforns: nov 12th was when the Beirut page was created, since then for 3 days it seems to have views [00:02:53] the other page was created earlier though, and should have data for all of october [00:03:08] madhuvishy, yes, it's weird though that after 3 days it goes from 30000 view to 0 in one day [00:03:10] what i'm saying is i cannot reproduce the bug exactly as it was reported [00:03:30] aha [00:03:30] mforns: oh i don't see that [00:03:56] mforns: i was able to reproduce it 5 minutes back [00:04:02] but now it behaves differently [00:04:04] madhuvishy, if you add the garissa university college attack page [00:04:05] dont know [00:04:25] and select the full month of october, you'll see it I think [00:05:40] i did [00:05:50] madhuvishy, mmmm that's a nasty bug [00:06:08] because the pageviews that were originally for 12 november [00:06:24] after adding the second article, are plotted at 1st october [00:06:29] yeah [00:06:33] that's the reported bug [00:06:41] all i can see now is http://i.imgur.com/wfD5vLK.png [00:06:45] dont know why [00:06:52] this is not correct either [00:07:05] mforns: ^ [00:07:42] madhuvishy, I think this is a data computation bug, because the article has edits for the 16th, so it should have views as well [00:08:10] but there is another bug, a visualization one, originated because of the caveat in the API [00:08:15] yeah [00:08:24] that 0 values are returned as 'empty' [00:08:44] pfffff [00:08:48] something [00:09:24] i'm very confused :P i do know that the bug that's reported is something on the viz side though - the november data points are plotted in october [00:09:33] why i cant reproduce it i dont know [00:10:13] madhuvishy, try to rechoose the date range [00:10:20] just open the selector and click ok [00:11:22] mforns: oof [00:11:26] ya [00:11:28] i see it now [00:11:31] ok [00:11:36] that behaves differently than [00:11:46] deliberately choosing the date range [00:11:49] huh [00:11:55] yea... weird... bad [00:13:05] madhuvishy, I think there are 3 bugs: 1) data computation (missing pageviews) 2) data holes crashing linechart 3) chart.js bug in chart update [00:13:19] yup [00:13:44] mforns: you should sleep though! this can wait [00:14:02] i'll add some notes to the ticket [00:14:05] I guess solving 2 can solve 3 [00:15:00] yes, I didn't want to add that code to the sample, but it'll be necessary [00:15:22] hmm [00:15:28] guys it's still the 16th, so the data's just not ready yet :) [00:15:38] it'll only be available on the 17th [00:15:56] and then it has to wait like 2-3 hours after the 16th is over to make it through the pipeline [00:16:04] milimetric, of course! 
so, no bug in computing data [00:16:07] milimetric: true [00:16:22] the viz bug looks to me like the dates are not aligning when adding two pages that don't both have data for the full range [00:16:24] filling in gaps should do the trick [00:16:28] right [00:16:37] mforns: go to sleep, I'll do that later tonight [00:16:39] I'll do that [00:16:46] you guys are too quick to blame yourselves [00:17:15] I'll do that in a sec, and you review it milimetric :] [00:20:54] :) it's a demo, hopefully nobody's feeling too bad about it not being perfect. As a matter of fact, I'd argue it should be imperfect to motivate others to make a production version [00:21:07] it's fun to say - hey! maybe I can do better than these WMF people :) [00:21:34] yea :] for sure [00:22:14] milimetric: yup! but the bug was reported like - we have missing data :) people who aren't checking the actual api think that the demo is the reflection of absolute truth [00:22:33] Analytics-Backlog: Weirdnesses in top_articles - https://phabricator.wikimedia.org/T117343#1809615 (Tbayer) See also the top 200 list at T117945 : These three pages are in the list of most viewed pages in `pageview_hourly` even when restricting it the list to `agent_type = "user"`. [00:23:30] ah! I just saw the pageview api announcement. congratulations, mforns, nuria, madhuvishy, joal, milimetric, and ottomata. :-) [00:23:34] This is sooo awesome. [00:23:37] leila: :D [00:23:40] :] [00:23:57] no, seriously. :D [00:23:59] madhuvishy, mforns , milimetric : a TON of bugs filed from people regarding pageview APi will be awesome, that is a sign of success [00:24:21] did nuria just say I'm a bug? :D [00:24:24] nuria, yes sure! [00:24:25] let's sit back and not try to fix everything too soon until we have gotten some traction [00:24:27] leila: he he [00:24:52] this really calls for an in-person celebration. :-) [00:24:57] enjoy it, y'all. [00:25:02] mforns, milimetric , madhuvishy : let's bake things for a few days [00:25:16] leila: just fyi, we have this awesome thing where you can use a-team to ping us all here! just in case it's tiring to type all names :) [00:25:20] leila: old fashion food & drinks event i would say [00:25:30] yeah! [00:25:41] nuria: totally. I'm getting ready. [00:25:43] woohoo [00:25:50] thx leila :D [00:25:55] madhuvishy: good that you said it. I was looking on the staff page to make sure I'm not missing anyone. :D [00:25:59] nuria: I agree, let's bake cupcakes!! [00:26:03] leila: :D [00:38:13] I can bring cupcakes at the API talk at the mediawiki dev summit :-) [00:38:36] ::nom nom:: CUPCAKES :) [00:38:49] :P [00:40:25] https://upload.wikimedia.org/wikipedia/commons/0/09/Bridal_Boudoir_Affair_%284280075406%29.jpg [00:43:31] * milimetric turns into a cartoon wolf and drops jaw [00:47:46] :] [01:08:35] milimetric, I finished a patch for the fill-in of zero values [01:08:47] do you think I put it in the gist? [01:09:15] you can put it anywhere, but I have changes to the title and text and comments and stuff on the deployed one [01:09:21] also I switched the cdnjs etc. [01:09:33] so if you put it in your gist I'll wget it and do the diff and apply that way [01:10:51] sorry, mforns ^ [01:11:17] milimetric, I'd like to have your comments and changes in the gist [01:11:37] but the cdnjs should be kept in it [01:11:46] right... [01:11:50] I'll update both [01:12:03] with your and my latest changes [01:12:07] ok. 
you can download the deployed file and update that, then push it to both places [01:12:16] ok [01:12:57] oh then the cdnjs stuff will be overwritten so I guess copy that over [01:13:12] aha [01:19:14] milimetric, I added nulls instead of 0s [01:19:37] so that the chart won't infere false 0 [01:19:56] do you want to review it before deploying? [01:20:17] eh... easier to just deploy and fix :) [01:20:26] ok [01:23:01] milimetric, it's on air [01:25:28] sweet, looking [01:27:09] looks good mforns. Also, there's a disturbing spike in Cat pageviews on October 2012. Weiird :) [01:27:20] *October 12th [01:27:27] milimetric, I know xD [01:27:28] I need sleep... geez [01:27:39] ok, see you tomorrow, I'm going too [01:27:42] bye! [01:27:42] nite@! [01:27:44] thx [01:28:00] Analytics-Backlog: Missing Pageview API data for one article - https://phabricator.wikimedia.org/T118785#1809867 (mforns) Thanks @Ainali and @madhuvishy for spotting this bug. The problem was not missing data, but formatting of the timeseries: a problem caused by one of the caveats of the API. It should be fi... [01:28:23] Analytics-Kanban: Missing Pageview API data for one article - https://phabricator.wikimedia.org/T118785#1809870 (mforns) p:Triage>High a:mforns [01:29:03] Analytics-Kanban: Missing Pageview API data for one article - https://phabricator.wikimedia.org/T118785#1809878 (Milimetric) (the caveat is that the API does not return data if there are no pageviews for a particular timestamp or date. That means you'll have to fill in with nulls depending on how your charti... [01:35:09] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1809895 (Milimetric) Ocaasi, whereas what you're proposing would be tricky with the *current* API endpoints, I could see a new endpoint that would look like: p... [01:36:32] Analytics-Kanban, RESTBase, Services, RESTBase-API: configure RESTBase pageview proxy to Analytics' cluster {slug} [34 pts] - https://phabricator.wikimedia.org/T114830#1809906 (Milimetric) I think we should open up a new task for that, in the sake of marking that some progress was made. But I'll le... [02:07:50] PROBLEM - Check status of defined EventLogging jobs on eventlog2001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: consumer/server-side-events-log consumer/mysql-m4-master consumer/client-side-events-log consumer/all-events-log processor/server-side-0 processor/client-side-0 forwarder/server-side-raw forwarder/legacy-zmq [04:25:03] Analytics, Services: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810033 (MZMcBride) NEW [04:59:45] *sigh* again: "ERROR 2013 (HY000): Lost connection to MySQL server during query" while querying via research@analytics-store.eqiad.wmnet from stat1003 [05:19:49] Analytics, Services: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810052 (GWicke) [06:29:29] Analytics-Kanban: Missing Pageview API data for one article - https://phabricator.wikimedia.org/T118785#1810104 (Ainali) >>! In T118785#1809549, @madhuvishy wrote: > @Ainali can you check if you are still seeing this bug? No, it has been fixed. 
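The caveat milimetric notes on T118785 above (the API simply omits timestamps that have no pageviews) is why the chart needs gap-filling before plotting. mforns's actual patch was to the JavaScript demo; the idea itself is language-agnostic, so here is only a rough sketch in Python against the per-article response shape (an `items` list with `timestamp` and `views`). Missing days become None/null rather than 0, so a line chart shows a hole instead of a false zero.

```python
from datetime import date, timedelta

def fill_gaps(items, start, end):
    """Return one (day, views) pair per day in [start, end]; None where the
    API returned nothing, since it omits days with zero pageviews entirely."""
    # Index the sparse API response by day: timestamps look like "2015111200".
    by_day = {item['timestamp'][:8]: item['views'] for item in items}
    series = []
    day = start
    while day <= end:
        key = day.strftime('%Y%m%d')
        # Use None (null) rather than 0 so the chart shows a gap
        # instead of inventing a false zero.
        series.append((key, by_day.get(key)))
        day += timedelta(days=1)
    return series

# Example: an article that only has data from Nov 12 onwards.
sparse = [
    {'timestamp': '2015111200', 'views': 30000},
    {'timestamp': '2015111300', 'views': 21000},
]
print(fill_gaps(sparse, date(2015, 11, 10), date(2015, 11, 15)))
```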
[08:36:02] (PS1) Addshore: Remove total_views to example cron [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253565 [08:36:16] (CR) Addshore: [C: 2 V: 2] Remove total_views to example cron [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253565 (owner: Addshore) [08:36:52] Analytics, Services: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810244 (mobrovac) p:Triage>High This is really strange. I can reproduce this in the exact way as outlined in the desc when I issue requests from my machine, but cannot reproduce... [08:37:21] joal: fyi ^^ [08:48:12] Analytics, Services: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810259 (mobrovac) Switching to a new terminal window gives me 200 OK for the first time and then 503's. The same cannot be reproduced for other domains, only `wikimedia.org`. [08:48:38] Analytics, Services, operations: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810260 (mobrovac) [08:49:38] (PS1) Addshore: Add dispatch tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253566 [08:49:54] (CR) Addshore: [C: 2 V: 2] Add dispatch tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253566 (owner: Addshore) [09:32:43] Analytics-Kanban: Missing Pageview API data for one article - https://phabricator.wikimedia.org/T118785#1810312 (JAllemandou) Not fixed for me :( [09:36:14] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1810321 (Symac) Thanks fot the API, that is great! I have been playing with views data to build this small tool http://www.wikifamo.us and at the moment I am pr... [09:41:40] hey mobrovac [09:41:50] Thanks for yesterday's deploy and for pointing the bug [09:42:11] I guess having assigned to ops is the right thing do, isn't it mobrovac ? [09:45:45] joal: i'm working with ops to resolve this as we speak [09:45:55] ops channel I guess [09:46:07] thanks mobrovac [09:46:14] HAve any idea from where it comes from ? [09:46:51] joal: it seems that in our infra the wikimedia.org domain is dealt with differently than others, so we are trying to find out what exactly is causing this [09:47:01] right [09:47:10] Thanks a milion mobrovac for taking care of this [09:47:25] np joal :) [09:51:32] And also mobrovac: Thanks a lot for the help on setting up the API :) IT'S ALIVE !!!!! [09:51:56] hehe [09:52:02] it was my pleasure joal! 
[09:52:08] i'm so happy it's up there :) [09:54:24] (PS1) Addshore: Social metrics to graphite [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253571 (https://phabricator.wikimedia.org/T117735) [09:54:27] (PS1) Addshore: Convert site_stats to graphite [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253572 (https://phabricator.wikimedia.org/T117735) [09:54:30] (PS1) Addshore: Convert getclaims stats to graphite [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253573 (https://phabricator.wikimedia.org/T117735) [10:02:16] (PS1) Addshore: Fix paths to config for social stats [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253576 [10:07:58] (PS1) Addshore: +x .sh files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253577 [10:16:29] (CR) Addshore: [C: 2 V: 2] Social metrics to graphite [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253571 (https://phabricator.wikimedia.org/T117735) (owner: Addshore) [10:16:32] (CR) Addshore: [C: 2 V: 2] Convert site_stats to graphite [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253572 (https://phabricator.wikimedia.org/T117735) (owner: Addshore) [10:16:35] (CR) Addshore: [C: 2 V: 2] Convert getclaims stats to graphite [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253573 (https://phabricator.wikimedia.org/T117735) (owner: Addshore) [10:16:38] (CR) Addshore: [C: 2 V: 2] Fix paths to config for social stats [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253576 (owner: Addshore) [10:16:41] (CR) Addshore: [C: 2 V: 2] +x .sh files [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253577 (owner: Addshore) [10:16:57] (Merged) jenkins-bot: Social metrics to graphite [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253571 (https://phabricator.wikimedia.org/T117735) (owner: Addshore) [10:17:26] (CR) jenkins-bot: [V: -1] Fix paths to config for social stats [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253576 (owner: Addshore) [10:17:35] (CR) jenkins-bot: [V: -1] Fix paths to config for social stats [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253576 (owner: Addshore) [10:18:12] (CR) Addshore: "recheck" [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253576 (owner: Addshore) [10:20:17] Analytics-Tech-community-metrics, Developer-Relations, DevRel-November-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1810431 (Aklapper) Daniel says there is some more work to do here to update these lists. 
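The limn-wikidata-data scripts being merged above push Wikidata metrics into Graphite; the scripts themselves are not quoted in the log, so the following is only a generic sketch of the standard Graphite plaintext protocol (one `path value timestamp` line over TCP, port 2003 by default), with a placeholder host and metric name.

```python
import socket
import time

def send_to_graphite(path, value, host='graphite-in.example.org', port=2003):
    """Push a single datapoint using Graphite's plaintext protocol:
    '<metric.path> <value> <unix-timestamp>\n' over TCP."""
    line = '%s %s %d\n' % (path, value, int(time.time()))
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode('utf-8'))
    finally:
        sock.close()

# e.g. a daily social-followers count (metric path invented for illustration)
send_to_graphite('wikidata.social.twitter.followers', 12345)
```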
[10:27:36] Analytics-Backlog: Write hive code doing pageview data anonimisation with two tables [13 pts] {hawk} - https://phabricator.wikimedia.org/T118838#1810450 (JAllemandou) NEW [10:28:42] Analytics-Backlog: Productionize hive code with Oozie job and refinery inclusion - https://phabricator.wikimedia.org/T118839#1810457 (JAllemandou) NEW [10:28:58] Analytics-Backlog: Productionize hive code with Oozie job and refinery inclusion [8 pts] {hawk} - https://phabricator.wikimedia.org/T118839#1810457 (JAllemandou) [10:29:46] Analytics-Backlog: Productionize hive code with Oozie job and refinery inclusion [8 pts] {hawk} - https://phabricator.wikimedia.org/T118839#1810463 (JAllemandou) [10:29:47] Analytics-Backlog: Write hive code doing pageview data anonimisation with two tables [13 pts] {hawk} - https://phabricator.wikimedia.org/T118838#1810473 (JAllemandou) [10:32:34] Analytics-Backlog: Deploy pageview sanitization and start ongoing process [5 pts] {hawk} - https://phabricator.wikimedia.org/T118841#1810476 (JAllemandou) NEW [10:32:52] Analytics-Backlog: Deploy pageview sanitization and start ongoing process [5 pts] {hawk} - https://phabricator.wikimedia.org/T118841#1810483 (JAllemandou) [10:32:53] Analytics-Backlog: Productionize hive code with Oozie job and refinery inclusion [8 pts] {hawk} - https://phabricator.wikimedia.org/T118839#1810484 (JAllemandou) [10:36:53] (PS3) DCausse: Test do not merge [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253393 [10:37:45] (PS4) DCausse: Test do not merge [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253393 [10:38:38] Analytics-Backlog: Backfill pageview_hourly sanitization - 1 month - [8 pts] {hawk} - DUPLICATE THIS TASK FOR EACH MONTH TO BACKFILL - https://phabricator.wikimedia.org/T118842#1810488 (JAllemandou) NEW [10:40:39] Analytics-Backlog: Sanitize pageview_hourly - subtasked [0 pts] {hawk} - https://phabricator.wikimedia.org/T114675#1810499 (JAllemandou) a:JAllemandou>None [10:41:14] Analytics-Backlog: Deploy pageview sanitization and start ongoing process [5 pts] {hawk} - https://phabricator.wikimedia.org/T118841#1810476 (JAllemandou) [10:41:15] Analytics-Backlog: Productionize hive code with Oozie job and refinery inclusion [8 pts] {hawk} - https://phabricator.wikimedia.org/T118839#1810463 (JAllemandou) [10:41:17] Analytics-Backlog: Backfill pageview_hourly sanitization - 1 month - [8 pts] {hawk} - DUPLICATE THIS TASK FOR EACH MONTH TO BACKFILL - https://phabricator.wikimedia.org/T118842#1810488 (JAllemandou) [10:41:19] Analytics-Backlog: Write hive code doing pageview data anonimisation with two tables [13 pts] {hawk} - https://phabricator.wikimedia.org/T118838#1810450 (JAllemandou) [10:41:46] Analytics-Kanban: Sanitize pageview_hourly - subtasked [0 pts] {hawk} - https://phabricator.wikimedia.org/T114675#1702699 (JAllemandou) [10:58:50] hein dis dis dis ? 
[10:58:53] oops: ) [11:02:00] Analytics-Tech-community-metrics, Gerrit-Migration: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#1810527 (Aklapper) [11:05:33] Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Wikidata exports - https://phabricator.wikimedia.org/T64874#1810535 (Addshore) [11:06:51] Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Wikidata exports - https://phabricator.wikimedia.org/T64874#676394 (Addshore) > Can you get us an internal overview from hadoop? Yes [11:23:20] Analytics-Backlog: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1810556 (mforns) [11:23:21] Analytics-EventLogging, Analytics-Kanban: {tick} Schema Audit - https://phabricator.wikimedia.org/T102224#1810555 (mforns) [11:23:56] Analytics, Services, operations: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810559 (akosiaris) curl with `--compressed` is succeeding every single time. curl with `--compressed` will set **Accept-Encoding: deflate, gzip ** whereas this does... [11:37:15] hi a-team ] [11:37:20] hey mforns :) [11:37:22] *:] [11:37:33] howdy ? [11:37:36] hey joal, I just read your comment on the api sample bug-fix [11:37:41] late work yesterday I have seen [11:37:43] good, thx, you? [11:37:48] good :) [11:38:07] does the bug really persist in your browser? [11:38:37] nope, bug is fixed, I was just looking at the wrong URL (the old one, in labs) [11:38:46] ah! ok :] [11:38:50] That's why I think we should remove that one ! [11:39:05] yes, you're totally right, will do that now [11:39:09] Just to prevent eedjits like me to fill in fake bugs :) [11:39:21] sure [11:40:19] joal, done [11:40:33] awesome :) [11:42:20] joal: update on the ticket, it seems there's a problem in the interaction between rb and aqs [11:42:36] will keep you updated as the situation progresses [11:42:50] mobrovac: you mean an issue between rb and varnish, right ? [11:44:24] no joal, for some unknown reason, content gets gzipped either in rb or in aqs when it shouldn't [11:44:41] weirdo mobrovac [11:44:50] yup [11:44:53] I thought it would have been expected to send gzip [11:45:41] joal: that should happen only iff the client sends accept-encoding: deflate, gzip [11:46:14] mobrovac: hum, maybe rb sends aqs a request with those headers set ? [11:46:46] according to the code, it shouldn't [11:47:01] mwaaarf :( [11:47:05] joal: i'm exploring some options, will keep you up to date [11:47:23] np mobrovac, maybe some tcpdump could also help clearing what happend [11:48:55] joal: weird new fact: the first req succeeds because the content-encoding: gzip header is not set [11:49:00] grrrrrrrrrrrrrrrr [11:49:24] well, at least that looks normal, doesn't it ? [11:50:02] the first one, yes [11:50:07] the others don't [11:50:10] wth??????????? 
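akosiaris's finding on T118817, that `curl --compressed` (which sends Accept-Encoding) always succeeds while plain curl intermittently gets 503s, suggests a quick way to reproduce the difference from any client. Below is a rough sketch with `requests`, using one of the per-article URLs from the discussion as an example; note that `requests` adds Accept-Encoding by default, so the failing case has to drop the header explicitly.

```python
import requests

URL = ('https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/'
       'en.wikipedia/all-access/all-agents/Cat/daily/20151101/20151115')

# requests sends "Accept-Encoding: gzip, deflate" by default, matching the
# working "curl --compressed" case; passing None removes the header, which
# approximates the failing plain-curl case described in T118817.
for label, headers in (('with Accept-Encoding', {}),
                       ('without Accept-Encoding', {'Accept-Encoding': None})):
    for attempt in range(3):
        status = requests.get(URL, headers=headers).status_code
        print(label, 'attempt', attempt + 1, '->', status)
```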
[11:50:48] ¯\_(ツ)_/¯ [11:50:56] hehehe [12:05:56] Analytics-Kanban: Backfill cassandra pageview data - August [5 pts] {slug} - https://phabricator.wikimedia.org/T118845#1810595 (JAllemandou) NEW a:JAllemandou [12:26:47] Analytics-Backlog, Analytics-Dashiki, Google-Code-In-2015, Need-volunteer: Vital-signs layout is broken - https://phabricator.wikimedia.org/T118846#1810619 (mforns) NEW [12:33:55] Analytics, Services, operations: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810628 (mobrovac) >>! In T118817#1810559, @akosiaris wrote: > curl with `--compressed` is succeeding every single time. curl with `--compressed` will set > > **Accept... [12:37:26] a-team I'm away for an hour roughly [13:30:36] Analytics, Services, operations: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810701 (mobrovac) a:mobrovac We have debugged this further and hopefully found the root cause: `preq` (the lib used by RESTBase to issue external requests) forces gz... [13:30:56] joal: found the problem ^^ [13:31:04] will fix it and deploy today [13:31:32] joal: it is likely you will not need to do a deploy of AQS for this, this should be a RB-problem only [13:34:25] Analytics, CirrusSearch, Discovery, operations, audits-data-retention: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1810713 (ArielGlenn) Adding @Ottomata and a link to T84618 which is still pending with a number of open... [13:55:37] Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Wikidata exports - https://phabricator.wikimedia.org/T64874#1810734 (Addshore) Just as a sample here is for a single hour > > 6 text/html; charset=UTF-8 > 2460 application/rdf+xml; charset=... [13:58:18] awesome mobrovac :) [13:58:21] Thanks again ! [13:58:35] np! [14:03:31] o/ joal & milimetric. [14:03:34] Sorry I'm late [14:06:31] Analytics, CirrusSearch, Discovery, operations, audits-data-retention: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1810745 (Addshore) [14:21:04] (PS1) Addshore: Fix metric name for getclaims property use [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253604 [14:21:12] (CR) Addshore: [C: 2 V: 2] Fix metric name for getclaims property use [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253604 (owner: Addshore) [14:25:12] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [14:27:03] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [14:55:35] ottomata: hi, I've read some tickets about the EventBus, do you plan to use avro binary with it? [14:57:55] dcausse: no, not at first anyway. we may do some avro in the future, it wouldnt' be too hard to support i think. [14:58:07] ok [14:58:09] but, the event-schemas repo that we just created for event bus is where the avro schemas you are working with should live [14:58:27] is it ready? 
[14:58:36] there's nothing there afaik yet [14:58:36] https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/event-schemas [14:58:40] but, we can iterate on it [14:58:44] submit a patch :) [14:58:46] :) [14:58:48] make an avro/ dir [14:58:57] is there some guidelines to handle schema evolution ? [14:58:59] and i dunno what the hierarchy shoudl be yet, i guess you should make that be java style [14:59:08] for avro or for jsonschema? [14:59:11] avro [14:59:35] avro should be the usual rules, and we will likely use jenkins to reject changes that don't do it right [14:59:38] I'm running into an issue where the consumer needs to know both schema [14:59:43] buuuuut, not anytime soon [15:00:03] right, you were saying that yesterday, did the union + default thing not work? [15:00:12] no [15:00:14] rats. [15:00:17] http://mail-archives.apache.org/mod_mbox/avro-user/201502.mbox/%3CCAGHyZ6KhssPCq%3DjCoDRHKbCzaAzvMVY2UMWLu%3DHCXVPprkS-0g%40mail.gmail.com%3E [15:00:52] they suggest a solution where you encode the schema rev id in the message [15:01:22] this implies that the consumer must have access to all schema revisions [15:01:24] interesting, dcausse, and this is a problem always? or just for when you are trying to use a union? [15:01:34] ottomata: whenever you update the schema [15:01:57] hm, [15:02:06] rats, that is strange. hm [15:02:11] lemme look around at some code [15:02:18] i mean, kinda makes sense, but is totally annoying [15:02:33] see comments here: https://phabricator.wikimedia.org/T118570 [15:02:59] I was wondering why hive works, but in fact the schema is stored with each records [15:03:13] that's not the case with our kafka topic [15:03:33] right [15:08:41] well, crapo. dcausse [15:08:42] https://issues.apache.org/jira/browse/AVRO-1661 [15:08:46] you are right, and this won't work. [15:09:00] our hopes of not needing to encode a schema id in the kafka message it hink won't work. [15:09:19] so. what to do... [15:09:38] dcausse: i wonder if we should just use confluent stuff for avro.... [15:09:47] stand up their schema registry and use their camus. [15:09:59] nuria: ^^ [15:10:07] for mediawiki it can be annoying [15:10:14] confluent? or what? [15:10:29] having a schema an external schema repo [15:10:29] writing the int id in the message? [15:10:34] no [15:10:44] who will deploy the new schema? [15:10:46] no, i think you don't need that, all you need mw to do is be sure that it writes the message with the proper schema id [15:10:52] yeah [15:10:56] i think that is also not ideal [15:10:57] hm. [15:11:12] wel, i mean, the stuff we are buidling for eventbus makes eventlogging act as a schema registry... [15:11:17] but, we don't have good unique ids [15:11:19] atm [15:11:25] and no avro support ety [15:11:25] yet [15:11:56] We can store all schema revs in the jar for now [15:12:12] yeah, but how will you ID a given kafka message to a schema? [15:12:22] manually :/ [15:12:49] I mean I change the format, add small int in the body [15:12:49] how? camus needs to know the two schemas when writing, right? [15:12:53] oh [15:13:04] ok so add the int id in the message? [15:13:24] but then camus needs to know how to map that to the schemas [15:13:31] or does just hive? [15:13:36] i guess camus doesn't at all [15:13:39] it only needs the writer schema [15:13:51] then on camus side I have CirrusSearchRequestSet.1.avsc CirrusSearchRequestSet.2.avsc ... 
[15:13:58] in src/resources [15:14:01] yeah, but that is not a unique id [15:14:07] topic + rev [15:14:14] hmm, MMmMmMMMMmmM [15:14:37] this is hackish I know :) [15:14:44] yeah, we do have a topic config that specifies the schema [15:15:09] SchemaRegistry is meant to do that [15:15:13] but i'm experiencing what all noobs experience when they try to do things a different way...realization that smart people have already done this and they did it their way for a reason [15:15:46] dcausse: we could make something generatea unique id for you when you evolve the schema maybe [15:15:59] CirrusSearchRequestSet.4451324355322334.avsc [15:16:03] hm, woudl be nice if it was incrementing though [15:16:09] could do timestamp + iteration [15:16:15] but I need to know what is the latest schema :) [15:16:42] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite, Patch-For-Review: Enable retention of daily metrics for longer periods of time in Graphite - https://phabricator.wikimedia.org/T117402#1810894 (Addshore) Open>Resolved [15:16:48] Schema.REVTS.avsc [15:16:58] sounds good [15:16:58] TS=12345646 [15:16:58] REV=2 [15:17:03] dunno [15:17:55] ottomata: I deployed your CI configuration change for /eventlogging.git [15:18:04] cool thank you! [15:18:08] though the service branch fails flake8 in some commented out code [15:18:18] yeah... [15:18:25] how do I make this recheck again? [15:18:25] https://gerrit.wikimedia.org/r/#/c/253547/ [15:18:27] it'll def fail [15:18:32] recheck? [15:18:33] in comment [15:19:02] dcausse: i don't have time to work on avro support in this now...........hm. [15:19:29] ottomata: ok, will check with erik what we should do then [15:19:33] but, what do we need? we need: shared unique ids for avro schemas, [15:19:45] those IDs embedded in kafka messages [15:19:52] and camus needs to know how to map that Id to a schema [15:19:58] that's it, right? [15:20:04] yes sort of, and write a custom schemaregistry in refinery-camus [15:20:06] yes [15:20:22] yeah, i think it will have the schema repo checked out and just load it or something, but yeah.... [15:20:39] not sure what other analytics devs are up to, maybe madhu can help you with that. [15:20:53] but, i see no reason why you can't submit patches to mediawiki/event-schemas now [15:21:04] maybe an avro/ dir with the java style path and your schemas [15:21:11] ok [15:21:14] and...a hacky bash script to auto evolve a new file with id? [15:21:27] like, given a schema file, it will create a copy with the new name for you? [15:21:28] why not [15:22:14] if you do write that, see if you can make it generic and not avro specific [15:22:22] :) [15:22:26] the jsonschema files can be revisioned the same way [15:22:45] ok [15:22:48] sorry this isn't smoother! [15:23:12] np [15:27:29] joal: how do I know whether or not I'm killing the cluster? Oh... I think Christian posted a how-to-know that on wiki somewhere... [15:27:56] Analytics, CirrusSearch, Discovery, operations, audits-data-retention: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1810931 (Ottomata) I don't know much about the mw-logs. Maybe @bd808 knows more, or who to ask? [15:28:03] Analytics-General-or-Unknown, WMDE-Analytics-Engineering, Wikidata, Story: [Story] Statistics for Wikidata exports - https://phabricator.wikimedia.org/T64874#1810930 (JAllemandou) Hi, quick questions on that: Is the need regular, or would one shots make it ? Also, what level of aggregation ? Daily... 
[15:28:09] aha: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load [15:28:11] milimetric: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load [15:28:19] Ah ... you beat me :-) [15:28:20] I found it! [15:28:26] :) [15:28:49] But I think some (all?) parts of it went stale. [15:28:54] thanks qchris__ and hey, btw, not sure if you saw -we released the pageview API [15:29:00] yeah, I'll update the doc as needed [15:29:20] Congrats! [15:29:24] Analytics, Services, operations: Wikimedia pageview API intermittently throwing HTTP 503s - https://phabricator.wikimedia.org/T118817#1810938 (mobrovac) Open>Resolved [preq PR #9](https://github.com/wikimedia/preq/pull/9) fixed this issue entirely. IT has been deployed and now everything works as... [15:29:45] I noticed yesterday late evening, but I did not find time to go through my emails. [15:29:49] joal: milimetric: ^^ [15:30:06] you rock mobrovac :) [15:31:43] Analytics-Cluster, Database: Replicate Echo tables to analytics-store - https://phabricator.wikimedia.org/T115275#1810941 (jcrespo) I do not think this is possible right now. I would like you to request a different approach- if you want echo on analytics-store, something else has to go (like a core produc... [15:31:56] joal: yw :) [15:32:41] woah, tricky issue, nice fix, thx [15:43:08] ottomata: do we have a good graph / dashboard on CPU usage for the Hadoop cluster? The load monitoring article covers memory and Bytes Out: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load [15:43:35] milimetric: ganglia: http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Analytics+cluster+eqiad&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name [15:49:30] Analytics-General-or-Unknown: IPv6 GeoIP databases do not automatically update - https://phabricator.wikimedia.org/T56191#1810983 (faidon) Open>Resolved a:faidon This has been solved a looong time ago. [15:49:56] heya hashar, how do get around this flake8 error? [15:49:57] 15:42:21 ./eventlogging/service.py:17:1: F403 'from _codecs import *' used; unable to detect undefined nam [15:50:14] it won't let me import the actual type I want... [15:50:16] frmo _codes import UnicodeError [15:50:21] naw, it wouldnt do it [15:50:25] hoping that is the only use [15:50:25] but i will try some other things [15:50:29] that is [15:50:36] or you can disable the check [15:50:42] suffix the line with: ' # noqa [15:50:44] err [15:50:46] # noqa [15:50:55] that instruct flake8 to ignore the line [15:51:19] ok cool [15:51:32] well, it doesn't look like I'm killing the cluster. But someone is definitely running a monster on stat1003. It's nice-d, so probably no cause for alarm [15:54:18] yeah i can't import it in any other way, don't know why [15:56:09] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: More solid Eventlogging alarms for raw/validated {oryx} [8 pts] - https://phabricator.wikimedia.org/T116035#1810999 (mforns) a:mforns [15:59:14] Analytics, CirrusSearch, Discovery, operations, audits-data-retention: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1811003 (EBernhardson) I believe the medawiki logs were rsync'd over at @ironholds request. They are not... [16:01:59] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1811006 (Milimetric) > 1. 
I might have been missing something but at the moment I don't think it is possible to get already computed monthly statistics for an a... [16:03:15] kevinator: argh, on batcave [16:03:21] kevinator: lemme switch [16:03:40] milimetric, re "monster on stat1003": "top" shows a lot of processes by halfak at near 100% ... [16:03:56] ...and as mentioned here yesterday, i've been getting a lot of "ERROR 2006 (HY000): MySQL server has gone away" when accessing analytics-store.eqiad.wmnet from stat1003. [16:03:57] Hey HaeB [16:04:05] ...they might all be niced though [16:04:06] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1650918 (mforns) @bmansurov made this interesting point in the API release email thread: "I just have a question. Is this an evolving thing in a sense that mor... [16:04:10] I don't think my processes are affecting you. [16:04:14] ,,,and not related [16:04:19] I run jobs like this every day :) [16:04:53] halfak: sure, i mainly wanted to connect that to what dan said ;) [16:08:11] Ahh. gotcha [16:09:24] halfak: is it possible to find out the load of the database server itself (analytics-store.eqiad.wmnet)? [16:10:21] (i also had some queries taking a - to me- surprisingly long time; but that might just have been because of the particular table involved)' [16:10:37] HaeB, can you share the query [16:10:39] ? [16:12:05] HaeB, can you run SHOW PROCESSLIST on the mysql server (you might not have perms) [16:12:36] if so, that is an easy way to see long running sql processes [16:12:36] and how many there are [16:12:36] ottomata, anyone can, but that doesn't really show you load. [16:12:36] ottomata, do you know the real name of analytics-store.eqiad.wmnet? [16:12:36] dbstore1001? [16:12:43] i don't, i never remember that [16:12:53] got it! dbstore1002 [16:13:01] analytics-store.eqiad.wmnet. 300 IN CNAME dbstore1002.eqiad.wmnet. [16:13:02] :) [16:13:31] HaeB: http://ganglia.wikimedia.org/latest/?c=MySQL%20eqiad&h=dbstore1002.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [16:13:32] HaeB, you can use http://graphite.wikimedia.org/ to check on the system stats. [16:13:35] or that! [16:13:42] I like ottomata's better [16:14:03] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Move EventLogging/server to its own repo and set up CI - https://phabricator.wikimedia.org/T118761#1811042 (Ottomata) Open>Resolved [16:14:06] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1811043 (Ottomata) [16:14:16] ottomata, is there a good way to check disk activity? [16:14:20] Maybe IO wait [16:15:11] ja that is a good way :) [16:15:12] ganglia is usually useful for the standard system stats, graphite for everything else [16:15:19] halfak: this one took 1 hour and 6 min: [16:15:20] mysql:research@analytics-store.eqiad.wmnet [log]> SELECT SUM(IF(event_hasServiceWorkerSupport =1,1,0))/SUM(1) FROM MobileWebSectionUsage_14321266 WHERE timestamp LIKE '20151115%' AND wiki = 'enwiki'; [16:15:55] on the other hand, this one only took 3 seconds: [16:15:59] mysql:research@analytics-store.eqiad.wmnet [log]> SELECT COUNT(*) FROM MobileWebSectionUsage_14321266 WHERE timestamp LIKE '20151115%'; [16:16:33] Yeah. Those are very different queries [16:16:40] The latter just uses the index. 
[16:16:45] (and another one with the same table restricted to the same day took over 2 hours, would need to find that in the log) [16:16:45] It doesn't even need to read the table. [16:17:33] the whole table is about 12GB right now [16:17:47] Looks like the first query does a scan over 1.7 million rows. [16:17:52] halfak: so there is an index for the timestamp? [16:17:56] Yes [16:18:12] With those restrictions, it looks like you will still need to scan 1.7 million rows. [16:18:27] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1811065 (Symac) >> 2. Other thing (for my specific use case) is that it would be great to be able to request statistics for different pages with a unique call t... [16:18:58] but including the "LIKE" for the timestamp should still serve to restrict the number of rows scanned, right? [16:19:07] Yes [16:19:25] The query optimizer thinks that it will restrict the data to 1.7 million rows. [16:20:08] (i know, i should probably dig into this with "explain" .. i've still been more concerned about the connection errors) [16:20:19] thanks, good to know [16:22:13] (need to go offline now, will check out the system stats link later) [16:22:30] o/ [16:23:49] (PS1) Addshore: Add entity usage tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253625 [16:24:33] (PS2) Addshore: Add entity usage tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253625 [16:40:59] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1811161 (Magnus) Update: My treeviews tool ( https://tools.wmflabs.org/glamtools/treeviews/ ), and potentially some others, not use the new API instead of stats... [16:51:25] Analytics-EventLogging, Analytics-Kanban, EventBus: Deploy eventlogging from new repository. - https://phabricator.wikimedia.org/T118863#1811200 (Ottomata) NEW a:Ottomata [16:58:10] Hi! quick update anyone on the percentage of our users with access to LocalStorage? (nuria?) :) thanks! [17:01:25] a-team: typing e-scrum as i have a conflicting meeting on tuesdays [17:01:48] ok nuria [17:01:56] (CR) BryanDavis: "I developed and tested the patch on stat1002. Here's an example run -- https://phabricator.wikimedia.org/P2322" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253046 (https://phabricator.wikimedia.org/T118592) (owner: BryanDavis) [17:04:14] Analytics-General-or-Unknown, The-Wikipedia-Library: Category based-pageview collection for non-Article space, via Treeviews or similar - https://phabricator.wikimedia.org/T112157#1811254 (Sadads) @Magnus Discovered I could use PagePile to hack the tool: https://tools.wmflabs.org/glamtools/treeviews/?q=%... 
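Several people in this log (and on T112956) ask for monthly numbers; until a monthly granularity exists, a client can sum the daily per-article endpoint itself. This is a sketch assuming the endpoint shape from the announcement, `per-article/{project}/{access}/{agent}/{article}/daily/{start}/{end}`, and single URI-encoding of article titles as settled in T118403.

```python
from collections import defaultdict
from urllib.parse import quote

import requests

API = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article'

def monthly_views(project, title, start, end):
    """Sum the daily per-article series into per-month totals.
    Titles are URI-encoded exactly once; spaces become underscores."""
    article = quote(title.replace(' ', '_'), safe='')
    url = '%s/%s/all-access/all-agents/%s/daily/%s/%s' % (
        API, project, article, start, end)
    items = requests.get(url).json().get('items', [])
    totals = defaultdict(int)
    for item in items:
        totals[item['timestamp'][:6]] += item['views']  # key by YYYYMM
    return dict(totals)

print(monthly_views('en.wikipedia', 'Garissa University College attack',
                    '20151001', '20151130'))
```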
[17:09:29] Analytics-Backlog: Blogpost Pageview API - https://phabricator.wikimedia.org/T118866#1811265 (Nuria) NEW [17:16:04] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: Send HTTP stats about eventlogging-service to statsd - https://phabricator.wikimedia.org/T118869#1811301 (Ottomata) NEW a:Ottomata [17:16:20] Analytics-EventLogging, Analytics-Kanban, EventBus: Send HTTP stats about eventlogging-service to statsd - https://phabricator.wikimedia.org/T118869#1811311 (Ottomata) [17:22:34] Analytics-Kanban: Missing Pageview API data for one article {slug} [3 pts] - https://phabricator.wikimedia.org/T118785#1811335 (mforns) [17:23:14] Analytics-Kanban: Pageview API documentation for end users {slug} [8 pts] - https://phabricator.wikimedia.org/T117226#1811339 (mforns) [17:23:16] Analytics-Kanban: Pageview API Press release {slug} [2 pts] - https://phabricator.wikimedia.org/T117225#1811340 (Milimetric) [17:23:56] Analytics-Kanban: Troubleshoot Hebrew characters in Wikimetrics {dove} [2 pts] - https://phabricator.wikimedia.org/T118574#1811342 (mforns) [17:28:34] Analytics-Kanban: AQS should expect article names uriencoded just once {slug} - https://phabricator.wikimedia.org/T118403#1811346 (Ragesoss) Open>Resolved [17:40:52] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: More solid Eventlogging alarms for raw/validated {oryx} [8 pts] - https://phabricator.wikimedia.org/T116035#1811404 (mforns) a:mforns>None [17:41:12] joal, I put the EL alarms task for grabs [17:47:59] (PS1) Addshore: entityUsage - retry and report failed db calls [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253642 [17:49:48] mforns: as you want, there is plenty to be done in sanitization so you tell me if you prefer to keep it :) [17:50:09] joal, no it's ok, grab it :] [17:53:27] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: More solid Eventlogging alarms for raw/validated {oryx} [8 pts] - https://phabricator.wikimedia.org/T116035#1811473 (JAllemandou) a:JAllemandou [17:58:50] wikimedia/mediawiki-extensions-EventLogging#513 (wmf/1.27.0-wmf.7 - 596a0a8 : Mukunda Modell): The build has errored. [17:58:50] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/596a0a863ff5 [17:58:50] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/91646346 [18:15:18] (CR) Addshore: [C: 2 V: 2] entityUsage - retry and report failed db calls [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253642 (owner: Addshore) [18:15:27] (CR) Addshore: [C: 2 V: 2] Add entity usage tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253625 (owner: Addshore) [18:15:45] (Merged) jenkins-bot: Add entity usage tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253625 (owner: Addshore) [18:15:48] (Merged) jenkins-bot: entityUsage - retry and report failed db calls [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253642 (owner: Addshore) [18:16:15] Analytics-Cluster, Database: Replicate Echo tables to analytics-store - https://phabricator.wikimedia.org/T115275#1811557 (jcrespo) As an alternative, it would be easier to provide you a subset of the tables, or CONNECT tables that are virtual tables (so joins would be slow). 
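T118869 above is about sending HTTP stats from eventlogging-service to statsd. The ticket does not include the implementation, so this is only a minimal sketch of the statsd wire format (UDP datagrams such as `name:1|c` for counters and `name:12|ms` for timers); the address and metric names are placeholders, not the production configuration.

```python
import socket

STATSD = ('statsd.eqiad.wmnet', 8125)  # assumed address, verify before use
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def incr(name):
    """Count one occurrence (statsd counter)."""
    sock.sendto(('%s:1|c' % name).encode('ascii'), STATSD)

def timing(name, ms):
    """Record a request duration in milliseconds (statsd timer)."""
    sock.sendto(('%s:%d|ms' % (name, ms)).encode('ascii'), STATSD)

# e.g. one request handled in 12 ms (hypothetical metric names)
incr('eventlogging.service.requests.201')
timing('eventlogging.service.request_time', 12)
```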
[18:17:40] (PS1) Addshore: Fix entity usage example cron [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253648 [18:17:48] (CR) Addshore: [C: 2 V: 2] Fix entity usage example cron [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253648 (owner: Addshore) [18:18:48] dcausse: yt? [18:18:56] nuria: yes [18:19:02] wanted to catch up on avro [18:19:36] dcausse: i still do not see why you need more schemas beyond latests if changes are backwards compatible [18:20:36] dcausse: consumer does not need both schemas if schema only adds new fields, i edit your unit test to *try* to explain this [18:20:53] nuria: if you can that would be awesome :) [18:21:01] dcausse: i did already [18:21:04] dcausse: yesterday [18:21:08] in fact no :/ [18:21:13] what? [18:21:16] you used the same schema to write and read [18:21:17] sorry [18:21:36] it's because my test was a bit messy [18:21:36] right, cause the new schema can be used for both in our use case [18:21:44] you do not need both [18:22:01] provided that schema specoifies new topics with union defaults [18:22:20] not new topics, sorry dcausse , new "fields" [18:23:20] nuria: in the I uploaded there's a binary file. This file was generated without the "newField", I can't find a way to load this data without the exact same schema used to write it [18:23:38] s/in the/in the patch / [18:24:19] and eveything I read about avro seems to confirm this problem [18:26:31] mforns: you here ? [18:26:37] nuria: http://mail-archives.apache.org/mod_mbox/avro-user/201502.mbox/%3CCAGHyZ6KhssPCq%3DjCoDRHKbCzaAzvMVY2UMWLu%3DHCXVPprkS-0g%40mail.gmail.com%3E [18:33:34] dcausse: reading [18:34:48] dcausse: given that we control decoders I am not sure if what says there is true, you do not have that problem on hive data do you? [18:35:42] nuria: it's because hive store the schema with each records [18:36:05] http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop.hive/hive-serde/0.7.1-cdh3u5/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java [18:36:45] ya nuria, writer schema is stored with each avro file [18:37:14] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1811657 (Milimetric) >>! In T112956#1811161, @Magnus wrote: > Update: My treeviews tool ( https://tools.wmflabs.org/glamtools/treeviews/ ), and potentially some... [18:37:30] dcausse: we ahve done no real research into this, but for eventbus, we are only thinking about using jsonschema in kafka atm [18:37:45] later, we will likely figure out the best way to import json data in kafka into ahdoo [18:37:52] which may include avro conversion at that time [18:38:11] if we do, then we don't have to deal with the streaming avro schema problem [18:38:12] dcausse: ok, let me work on this for a bit, need to recharge laptop so i will be offline for a bit. will work on your example, plus do a bit more reserach cc ottomata [18:38:30] nuria: thanks! [18:38:31] ottomata: here ? [18:39:16] joal, sorry I was eating something [18:39:22] np mforns :) [18:39:28] yes [18:39:38] what's up? [18:39:45] I can't recall where to find the scripts for graphite check - any of mforns / ottomata :) [18:39:52] :) [18:39:53] in puppet? [18:39:54] ... 
[18:39:57] right [18:40:51] joal, as puppet is small, you'll find them quickly ;P [18:41:01] pff mforns [18:41:06] hehehe [18:41:11] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1811690 (Milimetric) > Don't know if it is something that is suitable with the way API as it is constructed but the idea would be to get data for different arti... [18:41:18] I've been searching for eventlo1001, but no luck [18:41:50] https://github.com/wikimedia/operations-puppet/blob/production/modules/eventlogging/manifests/monitoring/graphite.pp [18:42:18] was looking for it also, thanks ottomata :] [18:42:39] man ... I'd have take hours before finding that [18:42:44] Thanks a lot both of you :) [18:42:49] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1811693 (Ragesoss) >>! In T112956#1811657, @Milimetric wrote: >>>! In T112956#1811161, @Magnus wrote: >> Update: My treeviews tool ( https://tools.wmflabs.org/g... [18:43:00] joal: a way to find would be to get some text from the icinga check and grep for it in puppet repo [18:43:01] like [18:43:13] eventlogging_difference_raw_validated [18:43:13] or something [18:43:18] aha [18:43:22] makes sense ottomata [18:43:26] I'll remember that :) [18:44:04] :) sorry ragesoss, I edited my comment to triple-important [18:45:00] a-team I'm off for tonight (setup ready for tomorrow :) [18:45:05] :) [18:45:07] Have a good end of day ! [18:45:19] laters! [18:45:34] nuria: since you all are takinga slightly different approach with eventbus, i think what we are going to do is turn off mediawiki side logging, delete the old data, update schemas on both sides and deploy this new schema. then turn it back on. having generic map fields basically allows us to work around not being able to change the schema and will be "good enough" until eventbus is fully ready for us to switch over [18:45:52] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1811714 (ezachte) Also Magnus and I both pleaded for monthly stats earlier, each for different use cases Erik Zachte. [18:46:15] ebernhardson: ok, but schema evolution will happen with either system [18:46:22] ebernhardson: what do you think about just using jsonschema for now? [18:46:23] milimetric: I rebased my branch that adds Pageviews API support, and all of a sudden a bunch of tests were breaking, and I was like AARRRRGGGGHHHHH. But then I figured out that itwas because the fix for the double-encoding issue had gone live. [18:46:25] ebernhardson: that is a fact, and we have to have a story for it [18:46:28] just an idea [18:46:39] And then I was happy again. [18:46:56] ragesoss: right, we might have some instability in these early days, but hopefully not [18:47:24] nuria: sure, but basically we wanted to add new fields for tests we want to put into production this week (thats why we looked into adding the map). i'd rather unblock things in the easiest way possible for the "right now" and upgrade the rest as things become ready [18:47:30] yeah. I'm just pleased that my tests worked. 
[18:47:31] :) [18:47:55] ebernhardson: if you do not care about old data deleting +starting is easy [18:48:17] ebernhardson: i would still like to work with dcausse on getting details of avro schema evolution straight [18:48:55] ebernhardson: you can do that right now, I can merge patch but we will not deploy it right now [18:49:55] nuria: for the schema migration, i think json schema will end up having a completely different solution than avro. the sugested method for avro seems to be to pack a binary value before the message, whereas json schema will be something different (not sure what) [18:49:56] ebernhardson: we can deploy it as soon as next deployment happens [18:50:37] nuria: sure, i'll turn off the cirrus logging in this afternoon's swat then [18:52:11] i suppose we don't even have to delete the data in hdfs, since it has the schema prepended, mostly we just need to turn off mediawiki logging before updating the camus schema so it doesn't spew errors on 3k messages/sec [18:53:32] ebernhardson: right, we just need to delete the data from kafka [18:54:13] failure will be on avro [18:54:21] on camus sorry [18:54:31] ebernhardson: right, for historical data, json is very bad for evolution [18:54:51] we can code some stuff in front to 'resolve' it. if our rules are only ever add new fields with default values [18:54:54] never make any other changes [18:55:08] however, in the future, we may use json/jsonschema as canonical schemas and in kafka [18:55:18] but maintain corresponding avro schemas [18:55:29] and convert to avro when we import to hadoop [18:55:43] and never use avro binary in kafka? [18:56:15] yea, after putting this pipeline together and seeing the issues i realized writing binary avro to kafka was not the easiest solution. If i had known more two months ago wouldn't have suggested it [18:56:53] avro is certainly the best codec to limit the message size [18:57:20] but it comes at a cost with this writerSchema and readerSchema :/ [18:57:26] indeed, the uncompressed avro files are 1/3 the size of our compressed text logs [18:58:05] yeah, ebernhardson, dcausse, our experience with avro so far is: wow its great! but woah, it is really hard to use [18:58:21] yes :( [18:58:39] i think i warned a little bit when yall started, but I am happy to see us pushing this and learning these things [18:59:07] but, i mean, if we do things the way confluent et. al. do, we should do the int schema id in message [18:59:10] and just figure that out [18:59:16] i think it would be nice to be able to do avro in kafka [18:59:19] but. maybe just hard right now [18:59:35] dunno [19:04:24] i don't know enough about java/camus, but at least in php it would be super easy to define a map that points to the schemas and unpack an integer from the begining [19:05:45] the php side would be something like: pack( 'n', $version ) . $message; splitting it back apart would be $version = unpack( 'n', substr( $envelope, 0, 2 ) ); $message = substr( $envelope, 2 ); [19:05:59] yes [19:06:13] ebernhardson: camus does this already too [19:06:28] but, the harder part is associating that id with a schema on the camus side. [19:06:29] madhuvishy, . yt? [19:06:35] we don't have a remote schema registry like confluent does (although i guess we could) [19:06:38] mforns: yup! [19:06:56] we could build support somehow into our event-schemas repo to get schema from unique id [19:07:06] and make camus know how to load that [19:07:23] madhuvishy, I was going to take a wikimetrics task [19:07:37] hmm, we could use an md5. 
Then when camus boots up it just has to hash the schemas it knows about and hold a map in memory [19:07:38] do you think https://phabricator.wikimedia.org/T117287 can be done in parallel with yours? [19:07:51] packing and md5 of the schema that we produce with would be easy too [19:07:54] s/and/an/ [19:08:20] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0] [19:08:33] mforns: ah! yeah I think so - it's a standalone function right? [19:08:58] ebernhardson: we need an incrementing id [19:09:01] madhuvishy, ah OK [19:09:08] cool, I'll take that one [19:09:10] in order to know which schema is latest [19:09:17] mforns: cool, let me know if you need something [19:09:17] thx madhuvishy [19:09:19] ottomata: ahh, that makes sense [19:09:20] ok [19:09:23] ;] [19:09:30] dcausse talked this morning about a revid concat with a timestamp on new schema creation [19:09:32] like [19:09:43] ts=12345678 [19:09:47] rev=5 [19:09:47] filename is [19:09:53] Schema.512345678.avsc [19:10:05] that way you are pretty sure it is unique, and you know its incremental [19:10:17] ottomata: we also need a away to flag the latest schema rev [19:10:29] just sort, no? [19:10:46] the EventBus stuff I'm doing can get you that [19:10:46] ottomata: right sorry [19:11:51] but, i'd like to not have to tie it to the http service [19:12:03] you shoudl be able to get what you need using local scripts or something, i think... [19:12:03] :/ [19:12:09] really though, the more and more we talk about this [19:12:19] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [19:12:31] it seems like anything simple still ends up being a good amount of work that will eventually be done in eventbus anyways :) [19:12:59] the more it seems we should be using tools that are already built [19:12:59] madhuvishy: linked me to this earlier today [19:12:59] https://github.com/schema-repo/schema-repo [19:12:59] it has rudimentary file based repo support [19:13:02] ebernhardson: yeah..>.>... well, eventually [19:13:17] keep in mind, eventlogging service != event-schema repo [19:13:31] event-schema repo should be useable independent of the service [19:13:37] and, i was telling dcausse, we can work on avro stuff in event-schemas repo now [19:13:39] i dont' have much time for it [19:13:51] ottomata: i can work on it next week [19:13:55] but yall are the ones really pushing it at the moment, so, up to you [19:14:06] that would becool [19:14:14] ebernhardson: we made the schema repo yesterday: mediawiki/event-schemas [19:14:16] ottomata: what's the plan to expose json schema from the repo (REST service)? [19:14:20] but i am not sure what timeline ebernhardson is on [19:14:27] and the team [19:14:48] dcausse: i'm trying to find a source link to send you........buuuuuut i think i did not set up gerrit -> github replication properly or something [19:14:54] well, we can just continue not really using the data in hive :) we still have all this same stuff logging to text files that are just more annoying to process. So we don't need to force a speedy timeline [19:15:23] but dcausse that is still to be worked out, but currently you can grab schemas by scid (name, revision), or just schema_name to get the latest one [19:16:13] ebernhardson: okay! 
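ebernhardson's PHP sketch above (`pack( 'n', $version ) . $message`) translates directly; below is the same framing idea in Python: a 2-byte big-endian schema revision prepended to the Avro payload, plus the consumer-side revision-to-schema map that Camus would need. All names are illustrative, not an actual refinery-camus implementation.

```python
import struct

# Consumer-side registry: revision -> parsed schema. Placeholder values here;
# in the chat this was imagined as CirrusSearchRequestSet.<rev>.avsc files
# shipped in src/resources or looked up from a schema repo.
SCHEMAS = {1: 'schema rev 1 (placeholder)', 2: 'schema rev 2 (placeholder)'}

def frame(rev, payload):
    """Prepend a 2-byte big-endian schema revision, like PHP's pack('n', ...)."""
    return struct.pack('>H', rev) + payload

def unframe(message):
    """Split a framed Kafka message back into (writer schema, payload)."""
    (rev,) = struct.unpack('>H', message[:2])
    return SCHEMAS[rev], message[2:]

msg = frame(2, b'<avro-encoded record bytes>')
schema, payload = unframe(msg)
print(schema, payload)
```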
ottomata lets talk about it tomorrow and may be make some tasks [19:16:26] i will have a working mic :P [19:16:57] ok, madhuvishy, i'd like to involve ebernhardson and/or dcausse. yall wanna set up a meeting to flesh this out? [19:17:05] yeah of course [19:17:09] sure [19:17:09] cool [19:17:11] nuria too if she can join [19:17:38] aye [19:17:48] madhuvishy: i'm not working on friday, so before then, or next week [19:18:07] ottomata: sure, anytime. https://issues.apache.org/jira/browse/AVRO-1124 could be suggested reading [19:18:18] ja [19:19:32] ebernhardson: an option dan suggested: just use meta.wm.org like evnetlogging does for jsonschema. that'll give us unique rev ids automatically, and an http service automatically [19:20:57] ottomata: seems fine to me, the only annoyance would be that i think the Schema namespace enforces EL type schemas? but we can put them somewhere else (but schema makes most sense) [19:21:41] yeah, had talked with ori before about adapting it to be more flexible, but since the avscs are just json, it should be pretty easy [19:21:58] we lose the ability to source control and review and CI them in our usual ways by doing it there [19:22:42] it's perhaps not the best location, but we still have to CR actually updating the code to produce with a new schema version [19:22:48] yea [19:22:49] it's a bit of a proxy, but might be "good enough" [19:25:07] i think its kind of a cool idea. [19:25:29] we also had talked about making the Schema: namespace know how to show scheams from a cloned file repo [19:25:40] and make them read only on the wiki [19:26:20] that sounds much harder :) [19:27:37] Analytics-Backlog: Allow metrics to roll up results by user across projects {kudu} [5 pts] - https://phabricator.wikimedia.org/T117287#1811873 (mforns) a:mforns [19:27:48] Analytics-Kanban: Allow metrics to roll up results by user across projects {kudu} [5 pts] - https://phabricator.wikimedia.org/T117287#1770440 (mforns) [19:28:20] madhuvishy: back [19:31:12] ottomata: let's not invent a schema registry rightttt? [19:31:27] :) [19:31:34] agree, but we are kinda already doing it... :/ [19:31:47] we already have one [19:31:51] that wmf is using [19:43:53] another thing we will want to consider, anomie's code to do api logging (which is going to end up in the kafka -> camus -> hadoop pipeline) was just merged today. This doesn't yet turn it on, but means the code will be deployed and ready to turn on [19:46:12] aye [19:46:30] (PS1) Addshore: Remove empty string from EU db list [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253659 [19:46:37] well, hm, i guess as long as they never change it it will work as is? :/ [19:46:43] (CR) Addshore: [C: 2 V: 2] Remove empty string from EU db list [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/253659 (owner: Addshore) [19:46:47] :( [19:47:27] ottomata: yup, if the schema doesn't change [19:47:30] it's fine [19:47:38] which is kinda lame [19:47:43] but yeah [19:49:39] madhuvishy: the schema will always change though [19:49:48] yup yup [19:49:54] madhuvishy: even if changes are only due to typos [19:50:06] no dispute there [19:50:22] just saying, in the current setup it would cause trouble [19:51:12] Hi all! Thanks for all the bikeshedding and help with graphite etc over the past weeks ;) [19:51:57] Here is some of the stuff that has come out of it so far in grafana! 
https://grafana.wikimedia.org/dashboard/db/wikidata-api-wbgetclaims https://grafana.wikimedia.org/dashboard/db/wikidata-dispatch https://grafana.wikimedia.org/dashboard/db/wikidata-social-followers https://grafana.wikimedia.org/dashboard/db/wikidata-entity-usage [19:52:02] Many thanks! :D [19:53:16] COOL! [19:56:37] ;0 [19:57:41] dcausse: still there? [19:57:53] nuria: yes [19:58:26] dcause: let me understand your test [19:59:00] dcausse: you write a record with CirrusSearchRequestSet schema [19:59:15] the code that writes is @Ignore [19:59:25] it was just used to generate the binary file [19:59:34] then I updated the schema [19:59:38] dcausse: ah hhhh [19:59:43] and I tried to read the binary file again [19:59:47] dcausse: ok, now i get it sorry [19:59:55] dcausse: and you added teh "newField" [20:00:01] yes [20:00:43] I tried different variation ("null", "string" with null as default and "string", "null" with "" as default) [20:02:21] if it is going to work, it has to be with your current version, [20:02:24] dcausse: that is [20:02:39] dcausse: "null", "string" with null as default [20:03:28] what do you mean? [20:05:10] dcausse: that for a default to be null it has to be defined ["null", "string" ] [20:05:35] yes you're right, it's what I've done I think [20:13:45] dcausse: right, right [20:21:02] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1812149 (Halfak) Quick +1 for monthly datasets. I'd even take yearly! For a lot of my work, I just need to compare the relative rate of views for article over... [20:28:59] halfak: this explains far better what I tried to explain you about the repesentation for words and how it's generated: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/ [20:35:33] ottomata (& halfak): thanks again for the info about monitoring load, i added it here https://wikitech.wikimedia.org/w/index.php?title=Analytics/Data_access&diff=204584&oldid=195253 [20:36:03] HaeB, \o/ [20:37:04] also, if either of you has immediate thought about this : https://wikitech.wikimedia.org/wiki/Talk:Analytics/Data_access - accessing analytics-store from stat1002 (instead of stat1003) doesn't work as advertised [20:37:41] ...not really urgent for me personally, but i thought it shoudl be corrected [20:42:11] HaeB, what you do mean "doesn't work"? [20:42:26] Oh I see. you linked to a talk post [20:42:42] Why is the file not in the same place on stat1002!? [20:43:02] it is there, but has different permissions [20:43:23] I see. [20:43:57] halfak: HaeB I added two lines to the page [20:44:16] HaeB: I din't remove your comment, feel free to remove it if you see fit [20:44:34] on stat1002, it's analytics-research-client.cnf [20:44:43] madhuvishy: can you change it at https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Analytics_slaves instead? [20:44:49] (that's where i quoted from) [20:45:01] on the talk page people won't see it [20:45:01] madhuvishy, seems like it should be the same. Is there a good reason it is different? [20:45:12] I imagine that there are duplicate roles in puppet [20:45:27] halfak: yeah i'm not sure what the reason is [20:45:39] (also it's a wiki taboo to change other people's comments, but don't worry, i won't report you to the admins ;) [20:45:55] WHAT! CALL JIMMY! 
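For reference, the two added-field variants dcausse describes in the test above would look roughly like this in the .avsc ("newField" is the name used in the chat; the enclosing record is the CirrusSearchRequestSet schema). Per the Avro spec, a field's default must match the first branch of the union, which is why a null default needs "null" listed first:

    {"name": "newField", "type": ["null", "string"], "default": null}

    {"name": "newField", "type": ["string", "null"], "default": ""}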
[20:45:56] HaeB: sorry I din't even realize it was a talk page [20:46:05] ;) [20:46:17] thought that was the actual page, I'll change it in the original [20:46:23] ;) [20:46:40] also, ottomata do you know why we have different conf files to access mysql-store on stat1002 and 1003? [20:47:04] on 1003 is research-client.cnf, and 1002 it's analytics-research-client.cnf [20:47:19] yes, different pws [20:47:23] right? [20:47:32] ottomata, same DB/user [20:47:36] So probably not? [20:47:38] i think different groups? yeah, because, we need research pw on both boxes [20:47:44] Yeah. Different groups. [20:47:46] and control access to the files via posix groups [20:47:53] but, posix groups also control access to boxes [20:48:02] Why not use the same groups? [20:48:10] so, if we wanted to chgrp to researchers group on stat1002 [20:48:11] madhuvishy: cool, that works [20:48:22] that would give access to stat1002 to everyone in the researchers group [20:48:44] HaeB: great, also fixed the wiki page [20:48:50] ottomata, should have a group for access to the DB and a group for access to stat3-like boxes and another for access to stat2-like boxes. [20:49:00] But I imagine that's a refactoring job that is low priority. [20:49:01] we don' thave a way to make a group exist on a box, without also making those users that puppet sees in that group have access on thta box [20:49:15] Oh.... Weird. [20:49:15] yes [20:49:20] we do have that, ssorta [20:49:32] 'researchers' is access to db via research-client.cnf, but also gives access to stat1002 [20:49:43] sorry* [20:49:45] to stat1003* [20:49:51] it is very confusing [20:49:53] i know. [20:49:58] but yes, refactoring is low priority [20:50:58] yo milimetric et al, how do I use pageviews awesomeness with mediawiki? Based on API's " For projects like commons without language codes, use commons.wikimedia" it sounds like "mediawiki.wikimedia" would work, but no. [20:51:22] I mean www.mediawiki.org pages [20:51:34] ah! [20:51:36] right [20:51:38] hm.... [20:52:24] no worries. thanks for the clarifications ottomata [20:53:08] AIUI the db is mediawikiwiki, so www.mediawikiwiki.wikiwikimediapedia ? :-) [20:54:47] :) [20:54:56] I'm confuzzled too, /me looking [20:55:11] (gotta run a quick hive query to figure it out) [20:57:11] spagewmf: just mediawiki? [20:58:14] madhuvishy: yay, that worked! The API help string is misleading. [20:58:26] spagewmf: cool [20:58:34] if you wanna find out you can run something like [20:58:37] select normalized_host from wmf.webrequest where uri_host='www.mediawiki.org' and year=2015 and month=11 and day=13 and hour=14 limit 1; [20:59:15] madhuvishy: where do I file the bug about Pageviews [20:59:35] spagewmf: for the API help? Analytics-Backlog [21:00:02] spagewmf: https://gist.github.com/milimetric/0c4746306419f225921d [21:00:18] ottomata: is https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Access_data_in_Hadoop something you would recommend trying for executing large eventlogging queries faster? [21:00:23] that's a list of all distinct wikis. mediawiki indeed appears there, but I had tried that and it didn't work in the per-project aggregate [21:00:26] weirdddd. 
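A small sketch of using the per-host defaults file mentioned above to reach analytics-store. The full paths, the hostname and the pymysql client are assumptions for illustration (the chat only gives the file base names), so check the Data_access page for the canonical values.

    import pymysql

    # stat1003: /etc/mysql/conf.d/research-client.cnf
    # stat1002: /etc/mysql/conf.d/analytics-research-client.cnf  (same DB user, different group/password)
    conn = pymysql.connect(
        read_default_file='/etc/mysql/conf.d/analytics-research-client.cnf',  # assumed path
        host='analytics-store.eqiad.wmnet',                                    # assumed host
        database='log',
        charset='utf8',
    )
    with conn.cursor() as cur:
        cur.execute('SELECT 1')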
[21:00:41] (distinct projects taken from the data that feeds the API I mean) [21:00:48] milimetric: spagewmf just said it worked [21:00:54] i saw yea [21:01:54] oh I see, it works https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/mediawiki/all-access/all-agents/daily/2015110100/2015110200 [21:02:03] we just don't have monthly granularity for that endpoint yet [21:02:08] cool! [21:06:12] dcausse: still there? [21:06:22] dcausse: still there? [21:08:07] HaeB: i do not think you will find all your data there, persistance is not teh same than db [21:08:50] *the [21:10:06] nuria: the linked documentation already says "NOTE: EventLogging data in HDFS is auto purged after 90 days.", is that what you mean? [21:10:21] HaeB: yeah, we do not have a lot of data there - it will be fairly recent [21:10:26] (already saw that, it wouldn't be a concern in my case) [21:10:41] the purging after 90 days happens in both mysql-store and hadoop [21:11:49] HaeB: data in Hadoop starts from Aug 28th [21:13:13] HaeB: as of the speed i would not assume is faster, i was just caution about amount of data. [21:13:42] dcausse, ottomata , madhuvishy , looked at avro binary encoding a bunch [21:14:10] https://www.irccloud.com/pastebin/8YuPGZ1B/ [21:14:33] second record is same data with an additional field (last one) that can be null, see \0 [21:14:38] (PS1) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the Global API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) [21:14:46] no worries regarding the amount of data (the table i would be interested in was started after aug 28 and all the partitions seem present in the corresponding folder at /mnt/hdfs/wmf/data/raw/eventlogging/... ) [21:15:11] HaeB: cool then! [21:15:16] nuria: ok, so no speed benefits from using hadoop/mapreduce instead of mariadb? [21:15:25] (CR) jenkins-bot: [V: -1] [WIP] Setup celery task workflow to handle running reports for the Global API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) (owner: Madhuvishy) [21:16:07] (+ ottomata ^ ) [21:16:44] ? [21:17:07] HaeB: hadoop will only be faster than mysql for large datasets [21:17:12] for small ones, mysql is probably faster [21:17:21] there's a lot of overhead in doing hadoop stuff, and it is low latency, but distributed [21:17:27] nuria: ok... [21:17:30] what's that mean? 
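The working per-project aggregate call from above, as a short sketch; it uses requests for illustration and assumes the usual items/views shape of the response. Note the project string is plain "mediawiki", and at the time only daily granularity was available on this endpoint.

    import requests

    url = ('https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/'
           'mediawiki/all-access/all-agents/daily/2015110100/2015110200')
    resp = requests.get(url)
    resp.raise_for_status()
    for item in resp.json().get('items', []):
        print(item['timestamp'], item['views'])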
[21:17:34] HaeB: I guess you would need to try it, i wouldn't assume so, hadoop is not fast , it can just tackle a lot of data at once [21:18:07] ottomata: it means that avro ALWAYS expects a byte to be present for a field [21:18:15] ottomata: that can be null [21:18:39] ottomata: let me explain better, something needs to be there, the field cannot be not present altogether [21:18:50] milimetric: do you have 5 minutes to talk real quick, I have some questions on getting these wikimetrics reports results in one place [21:18:58] i'm looking at the MobileWebSectionUsage schema/table which already is 12GB or so [21:19:18] milimetric: I have a mic now :) [21:19:32] one query over one day of data took over 2h, so that's why i thought it might benefit from partitions/parallelization [21:19:47] HaeB: you are welcome to just compare and try :) [21:19:49] https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Access_data_in_Hadoop [21:19:58] i find spark a little easier to work with json data right away [21:20:04] HaeB: ya I'd expect a query over 1 day to be a lot faster than 2 hours [21:20:08] in hadoop [21:20:12] but dunno [21:20:38] ottomata: by total coincidence, that's where i got the idea from in the first place ;) [21:20:39] https://www.irccloud.com/pastebin/iImQ4l1v/ [21:20:42] ("[13:00] ottomata: is https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Access_data_in_Hadoop something you would recommend trying for executing large eventlogging queries faster?") [21:20:51] thanks all, might try that later [21:21:05] :) [21:21:13] ottomata, madhuvishy : if we add a field to schema like: [21:21:21] https://www.irccloud.com/pastebin/ebwRxB2e/ [21:21:49] the newer schema will not be able to load old data because that field being null means [21:22:30] ottomata, madhuvishy that avro expects a "null" placeholder on the data [21:22:36] cc dcausse [21:22:39] that's lame [21:23:07] what is the point of all the union and default values then [21:23:13] madhuvishy: and not obvious at all [21:23:16] madhuvishy: right [21:23:17] yeah [21:23:44] madhuvishy: i am going to spend a bit more looking at this but yes, it no work [21:24:25] okay [21:25:15] this is what makes avro hard to work with :/ i don't know what good reason exists behind this restriction [21:26:12] yeah [21:28:07] nuria: yes :( [21:28:48] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Make eventlogging logs configurable via python config file - https://phabricator.wikimedia.org/T118903#1812283 (Ottomata) NEW a:Ottomata [21:28:59] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0] [21:28:59] madhuvishy: here's another quick and easy task that would be very helpful [21:28:59] https://phabricator.wikimedia.org/T118903?workflow=118780 [21:29:16] nuria: ottomata we should just change the avro libraries to decode this sorta thing :P [21:29:23] heh uh [21:31:08] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0] [21:34:17] ottomata: cool i'll see if i can do that later today [21:34:59] mforns: btw my super initial patch is here - https://gerrit.wikimedia.org/r/#/c/253750/ [21:35:10] madhuvishy, ok :] [21:35:19] it's a bit hacky in parts but if you go to /reports/global/create [21:35:30] ok [21:35:40] and put in stuff, and watch the queue logs - it will upload and validate the cohort and launch four 
reports [21:35:44] all of which fail [21:35:46] but whatever [21:35:54] cool, I'll have a look, probably tomorrow [21:36:15] anytime, just wanted to let you know if you if it'd help with your patch [21:36:23] sure! thanks for sharing :] [21:36:40] I found a place for my code, I hope I'm in the right path [21:37:06] mforns: okay great, we can always refactor if needed :) [21:37:20] ok! [21:37:43] dcausse: cause I never give up i am going to spend a bit more looking at this but I think you are right, it is build in that to add a field is NOT backwards compatible if you encode in binary [21:38:30] nuria: it's fine with json encoding? [21:46:12] Analytics-EventLogging, Analytics-Kanban, EventBus, Patch-For-Review: Deploy eventlogging from new repository. - https://phabricator.wikimedia.org/T118863#1812318 (Ottomata) Deployment works. Next up: sudo python setup.py install from master on deployment-eventlogging03 and restart eventlogging an... [22:01:21] kevinator: I'm not at office so joined hangout for our 1-1 [22:01:23] madhuvishy: sorry, I'm not getting pings for some reason. I can talk, you still wanna talk? [22:01:35] oh - after your 1/1 [22:01:37] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1812342 (Legoktm) [22:02:54] milimetric: 1-1 now, yup! I'll ping you [22:06:55] a-team, see you tomorrow! [22:07:07] nite mforns [22:11:33] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1812375 (Legoktm) This shouldn't be too difficult to do. Does EventLogging have a max limit on the "string"... [22:16:33] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1812384 (Sadads) If it does, I can't think of a reason why we would need supper long urls: there is a point... [22:25:25] milimetric: now? [22:25:35] omw [22:32:18] Analytics-Backlog, Discovery, Reading-Infrastructure-Team: Determine proper encoding for structured log data sent to Kafka by MediaWiki - https://phabricator.wikimedia.org/T114733#1812430 (bd808) @EBernhardson you ended up using binary Avro for T103505 right? Is that working well enough to say that it i... [22:44:19] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1812446 (Legoktm) a:Legoktm [23:08:49] Analytics-Tech-community-metrics: What is contributors.html for in korma? - https://phabricator.wikimedia.org/T118522#1812459 (Aklapper) > the last quarter covered is 2015 Q1. It is 2015 Q4 now. To be fixed / updated in https://github.com/Bitergia/mediawiki-dashboard/pull/72 [23:09:36] Analytics-Backlog, Discovery, Reading-Infrastructure-Team: Determine proper encoding for structured log data sent to Kafka by MediaWiki - https://phabricator.wikimedia.org/T114733#1812461 (EBernhardson) Avro is pretty awesome for some things (especially data size), but we are still working out issues re... 
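A sketch of the resolution behaviour nuria and dcausse are hitting, using fastavro and a deliberately simplified two-field record for illustration (the real CirrusSearchRequestSet has many more fields, and the chat is about the Java tooling, but the binary-encoding rules are the same): old bytes decode fine when the writer schema is supplied alongside the new reader schema, but decoding them as if they had been written with the new schema makes the reader look for the union byte of newField that was never written.

    import io
    import fastavro

    OLD = fastavro.parse_schema({
        'type': 'record', 'name': 'CirrusSearchRequestSet',
        'fields': [{'name': 'query', 'type': 'string'}],
    })
    NEW = fastavro.parse_schema({
        'type': 'record', 'name': 'CirrusSearchRequestSet',
        'fields': [
            {'name': 'query', 'type': 'string'},
            {'name': 'newField', 'type': ['null', 'string'], 'default': None},
        ],
    })

    buf = io.BytesIO()
    fastavro.schemaless_writer(buf, OLD, {'query': 'foo'})
    old_bytes = buf.getvalue()

    # Works: decode with the writer schema, resolve against the reader schema;
    # newField comes back filled with its default.
    rec = fastavro.schemaless_reader(io.BytesIO(old_bytes), OLD, NEW)
    print(rec)  # {'query': 'foo', 'newField': None}

    # Breaks: pretending the old bytes were written with the new schema, which is
    # effectively what reading historical data with only the updated .avsc does.
    # fastavro.schemaless_reader(io.BytesIO(old_bytes), NEW)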
[23:26:41] madhuvishy: with json encoding we will not have this issue, correct [23:27:03] hmmm [23:27:15] wonder if the search team can switch to json [23:33:02] (CR) Nuria: [C: 2 V: 2] Add UDF for network origin [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253046 (https://phabricator.wikimedia.org/T118592) (owner: BryanDavis) [23:33:16] (CR) Nuria: [C: 2 V: 2] Rename ipAddressMatcherCache -> trustedProxiesCache [analytics/refinery/source] - https://gerrit.wikimedia.org/r/253045 (https://phabricator.wikimedia.org/T118592) (owner: BryanDavis) [23:51:21] btw, hexdump -c is our friend [23:55:23] nuria: I missed updating a test for the enum change. Working on a patch to fix it now
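On the "with json encoding we will not have this issue" point above: a trivial sketch (not the EventLogging or Avro JSON encoder itself) of why JSON-style records sidestep the problem. Field names travel with the data, so a consumer can fall back to a default when an older event simply lacks the new field, instead of mis-reading bytes.

    import json

    old_event = json.loads('{"query": "foo"}')                      # produced before newField existed
    new_event = json.loads('{"query": "bar", "newField": null}')

    for event in (old_event, new_event):
        print(event.get('newField'))                                # None for the old event, by default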