[00:10:58] Analytics, Design: Collect font preference metrics - https://phabricator.wikimedia.org/T108884#1724067 (Tgr) #quicksurveys seem like the perfect tool for this task.
[03:40:00] (PS1) Milimetric: [WIP] Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[11:42:07] Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1724792 (mobrovac) >>! In T114830#1723426, @Milimetric wrote: > 1. The AQS `per-article` endpoint should be available publicly at...
[11:53:03] Analytics-Kanban: run job using oozie {slug} [13 pts] - https://phabricator.wikimedia.org/T115355#1724801 (JAllemandou) a:JAllemandou
[12:36:10] (CR) Joal: [C: 2 V: 2] "LGTM ! Thanks marcel :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/245921 (https://phabricator.wikimedia.org/T113255) (owner: Mforns)
[12:40:25] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Improve daily webrequest partition report {hawk} [5 pts] - https://phabricator.wikimedia.org/T113255#1724902 (JAllemandou) Just merged the code. Another change is needed there: https://github.com/wikimedia/operations-puppet/blob/production/manifest...
[13:40:58] joal: do you have a sec? I wanted to test this oozie change
[13:41:02] and I've never done that :)
[13:41:08] hey milimetric
[13:41:11] I do have a sec :)
[13:41:17] cave ?
[13:41:17] ok, I'll join the cave
[13:49:48] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [30.0]
[13:51:08] (PS2) Milimetric: [WIP] Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[13:53:19] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:18:51] joal: yt?
[14:18:55] Hey nuria
[14:19:02] I am testing your project whitelist :)
[14:19:20] ah ok, want to talk about why i did not include country
[14:19:22] wassup nuria ?
[14:19:27] sure, batcave ?
[14:19:36] let me grab battery for my headphones, 1min
[14:19:52] np
[14:21:37] batcave?
[14:21:50] There already
[14:21:55] or alternate one maybe ?
[14:27:05] Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1725187 (Dzahn)
[14:27:07] Analytics, Services: restbase is not listening on port 7231 on aqs* - https://phabricator.wikimedia.org/T114742#1725185 (Dzahn) Resolved>Open We still have 3 CRITs in Icinga for "Restbase endpoints health" on aqs and there was a comment next to them linking to this ticket. https://icinga.wikimedia...
[14:28:34] (PS3) Milimetric: Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[14:48:10] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [30.0]
[14:51:31] joal: you looking at this EL stuff? How come insertAttempted is 0 on the dashboard?
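The PROBLEM/RECOVERY pairs above do not alert on a single value: they report what share of the recent datapoints for the raw-vs-valid difference sits above a threshold. Below is a minimal Python sketch of that style of check. The 30.0 critical and 20.0 warning thresholds come from the alert text; the 25% cutoff is only inferred from the "Less than 25.00%" wording, and the function name `graphite_check` is hypothetical, not the actual Icinga plugin.

```python
def graphite_check(datapoints, warning=20.0, critical=30.0, percent=25.0):
    """Classify a metric series by the share of datapoints above each threshold.

    warning/critical mirror the [20.0]/[30.0] in the alert text; the 25% cutoff
    is an assumption inferred from "Less than 25.00% above the threshold".
    """
    points = [p for p in datapoints if p is not None]  # Graphite reports gaps as null (None)
    if not points:
        return "UNKNOWN", 0.0
    pct_critical = 100.0 * sum(p > critical for p in points) / len(points)
    pct_warning = 100.0 * sum(p > warning for p in points) / len(points)
    if pct_critical > percent:
        return "CRITICAL", pct_critical
    if pct_warning > percent:
        return "WARNING", pct_warning
    return "OK", pct_warning


# 6 of 15 points above 30.0 -> ("CRITICAL", 40.0), the same figure as the 13:49 alert
print(graphite_check([35.0] * 6 + [5.0] * 9))
```

Since a check like this only looks at the difference between two message rates, it presumably says nothing about whether the MySQL consumer is inserting, which fits the later observation that the flapping alerts were hiding the missing inserts.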
[14:51:45] milimetric: I am not after it, no
[14:51:50] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:51:53] But I can if needed
[14:53:10] hm... it's just weird
[14:53:20] i'll take a look in a bit
[14:58:49] Analytics-Kanban, Patch-For-Review: Add a 'Guard' job for pageviews {hawk} [13 pts] - https://phabricator.wikimedia.org/T109739#1725259 (Nuria) cc @ironholds, we will not be including country/continent on whitelist, just project, access_method and agent_type
[15:02:40] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[15:04:30] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[15:05:20] yeah, something's definitely not right with Event logging
[15:05:44] those alerts and recoveries are hiding the fact that no data has been inserted since around 06:00 today
[15:07:17] yep, nothing in the db since then
[15:08:01] milimetric: wow that's wrong
[15:08:10] milimetric: anything I can do to help?
[15:08:15] i have to go eat :(
[15:08:19] but I'll look at it more after
[15:08:29] i think everything's in kafka, but it'll run out of space eventually
[15:08:37] and also the pressure on mysql when it starts again will be rough
[15:08:54] hm, not afraid about kafka, but mysql, yes
[15:09:27] hiya
[15:09:55] yeah, just noticing this now too
[15:10:30] statsv is picking up metrics following a manual restart
[15:12:46] but other consumers need to be restarted too, presumably
[15:12:58] so the kafka event subscriber code doesn't handle this gracefully
[15:17:33] mysql consumer died with this
[15:17:33] https://gist.github.com/ottomata/39aa0f4b3dda1a60ed49
[15:17:43] but, other consumers are continuing to run.
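Nuria's note on T109739 limits the pageview guard's whitelist to three dimensions: project, access_method and agent_type. Below is a minimal sketch of such a check over pageview_hourly-style rows. The allowed values in `WHITELIST` are illustrative placeholders, and the function name and row format (dicts carrying a `view_count`) are assumptions, not the refinery job's actual code.

```python
from collections import Counter

# Illustrative whitelist: only the three dimensions named on T109739 are checked.
# The allowed values are placeholders, not the production list.
WHITELIST = {
    "project": {"en.wikipedia", "de.wikipedia"},
    "access_method": {"desktop", "mobile web", "mobile app"},
    "agent_type": {"user", "spider"},
}


def unexpected_values(rows, whitelist=WHITELIST):
    """Sum view_count per (dimension, value) pair that is not whitelisted.

    Dimensions absent from the whitelist (country, continent, ...) are ignored.
    """
    counts = Counter()
    for row in rows:
        for dimension, allowed in whitelist.items():
            value = row.get(dimension)
            if value not in allowed:
                counts[(dimension, value)] += row.get("view_count", 1)
    return counts
```

Dimensions that are left off the whitelist, such as country, are never inspected, so they cannot trigger a report; a scheduled job could then email any non-empty result.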
[15:18:45] ottomata, ou
[15:18:59] for how long is it stopped
[15:19:46] mforns: last 8 hours ish, i'm trying to discover what it is doing
[15:19:49] it is running
[15:19:55] it started back up immediately after it died
[15:19:59] oh ok
[15:20:08] and inserted data
[15:20:14] but, the graphs show no data being inserted
[15:20:22] so i'm investigating, trying to see what exactly is going on
[15:21:55] cool, mforns cool, it looks like it is not consuming, and keeping track of that fact.
[15:22:03] so, i believe when I restart the consumer now, it should pick back up from where it left off
[15:22:11] !log restarting lagging eventlogging mysql consumer
[15:22:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[15:22:25] aha
[15:23:09] we should set up burrow
[15:23:09] https://engineering.linkedin.com/apache-kafka/burrow-kafka-consumer-monitoring-reinvented
[15:23:57] k, it's inserting, mforns can you check that data in mysql is being filled in?
[15:24:11] ottomata, sure
[15:25:21] ottomata, btw, I'm modifying the puppet code adding the --percent-lost flag to the refinery-dump-status-webrequest-partitions call
[15:25:39] joal, asked in the CR if someone else should receive the emails?
[15:25:50] right mforns
[15:26:00] joal, ottomata, do you want me to add any other email?
[15:26:05] hi joal :]
[15:26:09] :)
[15:26:46] whoever wants them is fine
[15:27:05] ok
[15:29:53] ottomata: can you file a task for that?
[15:29:55] for burrow
[15:29:59] and consumer monitoring in general
[15:30:47] yes
[15:31:06] hm. i'm worried that my restart of eventlogging is causing it to reconsume a lot of stuff...possibly from all kafka consumers.
[15:31:13] eventlogging kafka consumers.
[15:31:32] i'm seeing a huge increase in events produced to everything
[15:31:55] why do you think it's re-consuming?
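Burrow, linked above, watches Kafka consumer lag: the gap between the newest offset in each partition and the offset the consumer group has committed. That committed offset is also what lets a restarted consumer "pick back up from where it left off". Below is a simplified Python sketch of the lag calculation; the offsets are plain dicts standing in for what a Kafka client would report, and the "falling behind" rule is a deliberately crude stand-in, not Burrow's actual evaluation logic.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = log-end offset - committed offset.

    Both arguments map partition id -> offset. A partition with no committed
    offset is treated as entirely unread (an assumption: a real consumer may be
    configured to start from the latest offset instead).
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


def is_falling_behind(total_lag_samples):
    """Crude rule: total lag grew between every pair of consecutive samples.

    A simplified stand-in, not Burrow's evaluation rules.
    """
    return len(total_lag_samples) >= 2 and all(
        later > earlier
        for earlier, later in zip(total_lag_samples, total_lag_samples[1:])
    )


# Hypothetical offsets for a 2-partition topic
print(consumer_lag({0: 1200, 1: 800}, {0: 1000, 1: 800}))  # {0: 200, 1: 0}
```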
[15:32:04] it's presumably just consuming the backlog of events that haven't been consumed
[15:32:11] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [30.0]
[15:32:15] or producing, rather
[15:32:52] well, except from what I was looking at in grafana, everything was fine except for zmq and mysql consumers
[15:32:57] mmmm
[15:33:02] those consume from the eventlogging-valid-mixed stream
[15:33:17] the schema based topics
[15:33:20] all looked normal
[15:33:24] but now those have many more events in them
[15:34:17] yeah hmmm
[15:35:07] ottomata, it makes sense for the mysql consumer to have a higher throughput right now, no?
[15:35:16] normally it consumes events as they come in
[15:35:47] now it is going max speed
[15:36:19] yes
[15:36:21] what I fear is the mysql consumer is not prepared to work at this rate, I guess it is queueing all events for insertion
[15:36:48] the queue size may be getting BIG
[15:36:56] what queue size? mysql events?
[15:37:03] yes
[15:37:03] shouldn't, remember, there isn't an in-memory zmq queue any more
[15:37:13] if you top eventlog1001 you'll see
[15:37:31] it shouldn't die, it might slow for a while, but i think it should just finish and then go back to normal, right?
[15:37:32] mysql consumer is using 43% of memory
[15:37:34] mysql insertion might slow
[15:37:46] and going up
[15:37:46] hmmmm
[15:37:54] remembering how this works...
[15:38:00] the eventlogging consumer sticks into a queue for mysql
[15:38:02] there are 2 threads
[15:38:06] does the queue have a max size that would cause it to block?
[15:38:10] one reads from kafka and queues for insertion
[15:38:22] the other one pops from the queue and inserts
[15:38:32] insertion is slower than queueing
[15:38:38] so the queue is getting big
[15:39:04] hm, mforns if this doesn't exist, the queue should probably have a max size
[15:39:13] with which we could cause the inserting thread to block
[15:39:14] ottomata, I agree
[15:39:46] and if its size is bigger than X, consuming from kafka should sleep
[15:40:05] 50%
[15:41:09] mforns: i am hoping that it will settle itself out
[15:41:14] ottomata, on the other side, insertions are going pretty fast, if I don't get it wrong, the database is updated until 14:15:00 UTC
[15:41:45] lag is going down...
[15:41:55] but there are occasional lags in the mysql consumer logs
[15:41:58] it already updated almost all the events?
[15:43:30] not sure.
[15:43:56] I'm querying the db for max(timestamp) and it says 15h UTC
[15:46:24] mforns: i think it is not done, and i think that's what i'm worried about
[15:46:31] mmm
[15:46:41] i think it is re-processing client side raw stuff and redoing them. although, hm, i'm really not sure.
[15:47:09] ottomata, it's strange that we have a validation gap
[15:47:19] ?
[15:47:33] the last grah in grafana
[15:47:39] *graph
[15:47:55] shows a raw-valid difference
[15:47:57] why?
[15:48:44] i'm confused about that too, maybe i've got the parameters mixed up?
[15:49:06] ok, it may be something in the metric
[15:51:02] Analytics-Backlog, Analytics-Cluster: Investigate (and remove?) spamy pageviews on pageview_hourly - https://phabricator.wikimedia.org/T115477#1725482 (Aklapper)
[15:52:57] ottomata, I've checked the db more closely, it is not done, events are being inserted still
[15:53:03] yeah
[15:53:17] hmm, yeah, and mforns the mysql handler should guard against dups, right?
[15:53:17] hm
[15:53:26] we don't see dup errors anymore in the log
[15:53:28] ottomata, does consumption from kafka ensure order?
[15:53:29] oh, we do
[15:53:31] oh!
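The design described here is a producer/consumer pair: one thread reads from Kafka and enqueues, the other dequeues and writes to MySQL. With an unbounded queue and a slower writer, the queue keeps growing, which is consistent with the rising memory use on eventlog1001. Below is a minimal sketch using Python's `queue.Queue(maxsize=...)`. With a bounded queue it is the Kafka-reading thread that blocks in `put()` once the queue is full, which gives the "consuming from kafka should sleep" behaviour mforns describes. `events` and `insert` are placeholders, not EventLogging's actual handler code.

```python
import queue
import threading

SENTINEL = object()  # tells the inserter thread to stop


def run_pipeline(events, insert, maxsize=1000):
    """Reader/inserter threads joined by a bounded queue; returns the peak queue size.

    `events` stands in for the Kafka stream and `insert` for the MySQL write.
    Error handling is omitted: if `insert` raises, the reader would block forever.
    """
    q = queue.Queue(maxsize=maxsize)
    peak = [0]

    def reader():
        for event in events:
            q.put(event)  # blocks while the queue is full (backpressure)
            peak[0] = max(peak[0], q.qsize())
        q.put(SENTINEL)

    def inserter():
        while True:
            event = q.get()
            if event is SENTINEL:
                break
            insert(event)

    threads = [threading.Thread(target=reader), threading.Thread(target=inserter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak[0]


inserted = []
peak = run_pipeline(range(10_000), inserted.append, maxsize=100)
print(len(inserted), peak <= 100)  # 10000 True
```

Memory then stays bounded by `maxsize` events no matter how large the Kafka backlog is; the backlog simply stays in Kafka until the writer catches up.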
[15:53:33] ottomata, duplicates: yes
[15:53:34] and i'm seeing a lot of them
[15:53:44] Warning: Duplicate entry 'e5bdf19537f65765a7c724aba5a90238' for key 'ix_MediaViewer_10867062_uuid'
[15:53:55] mmm
[15:56:34] If I'm seeing it right, it has backfilled around 20%
[15:56:42] in 20 mins
[15:57:11] mforns: backfilled?
[15:57:13] 20%?
[15:57:33] yes, it's filling the hole back
[15:57:49] ottomata, ^
[15:57:56] right, so the mysql stuff is doing the right thing, although, do you actually see new events coming in?
[15:58:06] i would think they would all be duplicates right now.
[15:58:50] ottomata, yes events are coming in
[15:59:50] hm
[16:00:03] can you get an approximate rate of events inserted?
[16:01:01] mmm let me try
[16:01:34] having problems joining standup
[16:01:43] ottomata: standup? do you guys want to join and we can talk about EL cc mforns milimetric
[16:01:51] I can't get into the hangout
[16:01:55] yeah, same as marcel
[16:01:57] me neither!
[16:02:00] me too
[16:02:02] :)
[16:02:03] can't join
[16:02:05] bluejeans?
[16:02:18] can't in ops meetings
[16:02:44] milimetric: can you guys try again, it worked for me just now
[16:03:08] nope
[16:03:10] yeah, it's not great :)
[16:03:16] i guess it works in seattle and france
[16:03:18] haha
[16:03:34] ok, how does one set up a bluejeans conf?
[16:03:35] * mforns tries
[16:03:43] milimetric: let me get a bluejeans meeting
[16:04:30] meanwhile a-team, can you try getting into this? https://plus.google.com/hangouts/_/wikimedia.org/a-batcave1
[16:04:47] madhuvishy: we are slowly getting in there
[16:04:49] a-team, https://plus.google.com/hangouts/_/wikimedia.org/a-batcave-2 worked for me...
[16:04:54] he he
[16:05:16] can madhuvishy and milimetric try again regular batcave?
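The `Duplicate entry ... for key 'ix_MediaViewer_10867062_uuid'` warnings show the table's unique index on `uuid` absorbing replayed events, which is what makes re-consuming the backlog safe. Below is a minimal sketch of that idempotent-insert pattern, using Python's built-in `sqlite3` as a stand-in for MySQL; the table, index and column names are hypothetical.

```python
import sqlite3


def make_table(conn):
    # Hypothetical schema; only the unique index on uuid matters here.
    conn.execute("CREATE TABLE events (uuid TEXT NOT NULL, timestamp TEXT, payload TEXT)")
    conn.execute("CREATE UNIQUE INDEX ix_events_uuid ON events (uuid)")


def insert_events(conn, events):
    """Insert events one by one; a repeated uuid is counted and skipped.

    Returns (inserted, duplicates).
    """
    inserted = duplicates = 0
    for event in events:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO events (uuid, timestamp, payload) VALUES (?, ?, ?)",
                    (event["uuid"], event["timestamp"], event["payload"]),
                )
            inserted += 1
        except sqlite3.IntegrityError:
            # sqlite's counterpart of MySQL's "Duplicate entry ... for key" message
            duplicates += 1
    return inserted, duplicates


conn = sqlite3.connect(":memory:")
make_table(conn)
batch = [{"uuid": f"e{i}", "timestamp": "20151014060000", "payload": "{}"} for i in range(3)]
print(insert_events(conn, batch))  # (3, 0)
print(insert_events(conn, batch))  # (0, 3): the replay adds no rows
```

Row-by-row inserts keep the example simple; with batched writes, MySQL's `INSERT IGNORE` is one way to get the same skip-on-duplicate behaviour.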
[16:05:31] yeah, give the regular cave one more shot
[16:05:40] ok
[16:06:01] ....no can join
[16:06:10] can't get in
[16:06:14] one more shot in batcave, otherwise i have bluejeans ready
[16:06:19] mforns: you are the only one not in there
[16:06:26] :[
[16:06:33] ok, so let's try bluejeans
[16:06:34] retry ?
[16:06:40] I'm retrying
[16:06:42] no, it seems both are having trouble
[16:06:55] nuria - link?
[16:07:02] i sent you an invite
[16:07:33] * mforns tries nuria's link
[16:07:48] I'm in
[16:08:04] mm not yey
[16:08:09] yet
[16:08:52] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[16:09:52] nuria: everyone's in and we need you to start things I think
[16:09:56] nuria: i got in, but it says you aren't there
[16:10:10] man i cannot get in...wait a sec
[16:10:46] :) ok, let's give this 5 more minutes and then just have an irc meeting, a-team
[16:10:59] yea
[16:11:06] good for me milimetric
[16:13:05] mforns: bluejeans?
[16:13:20] I am in and out, in and out of it
[16:13:29] mforns: using browser?
[16:15:09] Analytics-Kanban, Patch-For-Review: Create Hadoop Job to load data into cassandra [34 pts] {slug} - https://phabricator.wikimedia.org/T108174#1725578 (ggellerman)
[16:18:42] Analytics-Kanban: Write in-depth dashiki documentation {crow} [3 pts] - https://phabricator.wikimedia.org/T112685#1725599 (Nuria) Open>Resolved
[17:52:39] hey bd808 we hear you're blocked on us finishing monolog work
[17:52:44] Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1725902 (mobrovac) Also, if we do end up with exposing everything under the global domain, but also have per-project and per-arti...
[17:53:12] bd808: I think this is related: https://phabricator.wikimedia.org/T113521 but I'm not super familiar so you can follow up with nuria and madhuvishy
[17:53:30] (this is in response to a Scrum of Scrums blocked on analytics notice from anomie)
[17:55:27] bd808: hello, any questions?
[17:57:07] nuria, milimetric: cool. I'm mostly waiting on ebernhardson to actually start using all the plumbing I guess and verify that it actually works
[17:57:31] bd808: we did verify it works on fri
[17:57:59] bd808: using search schema
[17:58:20] excellent. My excuses for not moving forward are quickly dropping to 0
[17:58:22] :)
[17:58:35] jaja
[17:58:49] ops is out for the week
[17:59:15] and until they come back we cannot set up new topics and such
[17:59:24] mforns_brb: if you have any details from ottomata about the EL outage, mind sending them over? I've gotta write the incident report and I'm trying to get information together. So far I know nothing :)
[17:59:39] ok
[17:59:45] milimetric, ok
[18:00:35] nuria: right. I need to apply for hive access too I suppose so I can see the data once we get it piped in. I'll make some tickets today/tomorrow
[18:00:59] joal: let me know when you have time to catch up on the cassandra stuff
[18:02:11] hey madhuvishy
[18:02:20] In meeting now, after :)
[18:02:55] bd808: Make sure to request stat1002 access and to be added to analytics-privatedata-users group :) There've been confusions before where people got added to the wrong group.
[18:02:59] joal: sure
[18:02:59] milimetric: pageview api meeting?
[18:03:10] sigh
[18:05:04] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[18:06:05] Analytics-Tech-community-metrics, DevRel-October-2015: Patches with Verified -1 should not be counted as open in our code review metrics - https://phabricator.wikimedia.org/T108507#1725941 (jgbarah) I guess we've worked on that in the past. It seems and I think it is already fixed. Checking the code in G...
[18:06:33] Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1725942 (Nuria)
[18:09:00] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[18:17:31] Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1725965 (JAllemandou) First Ideas - Presentation 20 mins - 20 mins questions - Presentation content first draft: - How is it built ?...
[18:17:49] milimetric, about eventlogging
[18:17:51] madhuvishy: Got some time now :)
[18:18:31] if you look at https://grafana.wikimedia.org/dashboard/db/eventlogging
[18:18:40] (PS10) Joal: [WIP] Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[18:19:14] you'll see that today at 7am utc, several graphs stopped showing stats
[18:19:34] in particular, the inserts and insertAttempts
[18:19:43] joal: batcave?
[18:19:50] sure !
[18:20:19] so Andrew looked at it and he found that the mysql consumer had died with: https://gist.github.com/ottomata/39aa0f4b3dda1a60ed49
[18:20:49] I checked the db, and in fact the data since 7am was missing
[18:21:25] Andrew restarted the mysql consumer at 15:22h utc
[18:22:17] and it started consuming events from kafka
[18:23:41] milimetric, however, I suspect that we missed some events, because if you do a: select left(timestamp, 10), count(*) from Edit_13457736 where timestamp >= '20151014000000' group by 1;
[18:23:54] you'll see the incomplete data
[18:24:34] for a couple of hours, the memory consumption of the mysql consumer has been really high, like 50%
[18:24:48] I wonder if the mysql consumer got restarted
[18:25:02] and that is why we lost data?
[18:25:35] BTW, a reminder: the mysql consumer has 2 threads
[18:25:48] one that pulls from kafka and queues events for insertion
[18:26:01] and the other that pops from that queue and inserts into mysql
[18:26:31] mforns: milimetric i'm sorry i can't help look into this right now
[18:26:35] the first one is a lot faster, so the queue gets big and that's why the memory consumption got high
[18:26:43] i've got like my only moment to talk eventbus with ops people today :)
[18:26:43] np ottomata
[18:27:06] milimetric, I have to go now... I hope this is useful
[18:27:25] I'll be back in 1.5 hours
[18:31:08] (PS8) Nuria: Add pageview quality check to pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/240099 (https://phabricator.wikimedia.org/T109739) (owner: Joal)
[18:32:52] joal: does pageview quality need to be tested or have you done that already?
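mforns's query groups rows by `left(timestamp, 10)`, i.e. by the `YYYYMMDDHH` prefix of MediaWiki-style timestamps, to expose hours with missing data. Below is a minimal Python sketch of the same idea. It also fills in hours that have no rows at all (which a plain `GROUP BY` silently omits) and flags hours well below the typical count; the 0.5 cutoff and the function names are hypothetical choices.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps):
    """Count MediaWiki-style timestamps (YYYYMMDDHHMMSS) per hour, i.e. per
    10-character prefix, as left(timestamp, 10) ... GROUP BY 1 does."""
    return Counter(ts[:10] for ts in timestamps)


def sparse_hours(counts, start, end, min_ratio=0.5):
    """Return the hours in [start, end) whose count is below
    min_ratio * median hourly count. start/end are YYYYMMDDHH strings.
    Hours with no rows at all are included with a count of 0.
    """
    hours = []
    t = datetime.strptime(start, "%Y%m%d%H")
    stop = datetime.strptime(end, "%Y%m%d%H")
    while t < stop:
        hours.append(t.strftime("%Y%m%d%H"))
        t += timedelta(hours=1)
    values = sorted(counts.get(h, 0) for h in hours)
    median = values[len(values) // 2] if values else 0
    return [h for h in hours if counts.get(h, 0) < min_ratio * median]


# Synthetic day: 10 rows per hour, except 07:00-15:59, which have none
ts = ["20151014%02d3000" % h for h in range(24) if h < 7 or h >= 16 for _ in range(10)]
# flags the nine hours 2015101407 through 2015101415
print(sparse_hours(hourly_counts(ts), "2015101400", "2015101500"))
```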
[18:34:12] milimetric: let me know if you want to look at EL together, will look at metrics
[18:35:29] (sorry, had someone drop by for a visit, thanks Marcel, I'll take a look in a bit, my queue is huge and growing today :))
[18:39:01] Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1726031 (Milimetric) > https://wm.org/api/rest_v1/metrics/{project} > -- per-project # or aggregate > -- per-article > --...
[18:40:22] Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1726032 (Milimetric) @kaldari, if you think of future use cases for the pageview API while you implement tools that use it, let us know and...
[18:43:55] (CR) Nuria: "Looks good. if I understand we are changing the format to be wsc for wikistats to consume it.. are there any other consumers?" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[18:45:31] (CR) Milimetric: "yes, other consumers of wsc data include the folks who publish the top 5000 pages, top pages per wiki project, etc. Some of these use cas" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[18:48:25] (PS4) Milimetric: Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[18:57:25] joal: let me know when you have a few secs
[19:10:56] Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1726066 (mobrovac) >>! In T114830#1726031, @Milimetric wrote: > It seems to me that the global domain is much easier to get conse...
[19:13:37] Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1726069 (Milimetric) >>>! In T114830#1723426, @Milimetric wrote: >> 3. The `{project}` parameter should come before `/top` and `/...
[19:19:49] (PS11) Joal: [WIP] Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[19:20:21] nuria: yup sorry
[19:20:31] nuria: was in full test with madhuvishy
[19:20:50] I tested the oozie stuff about pageview quality, so it should be good to go :)
[19:20:54] joal: is there testing that needs to happen on the whitelist stuff or have you tested everything there?
[19:20:57] ah ok, sorry
[19:21:01] np :)
[19:21:18] I have tested the mechanism, but not with real data is all :)
[19:21:30] I think ...
[19:21:42] can't recall everything but I think I have tested that :)
[19:21:46] nuria: --^
[19:21:52] Going to dinner !
[19:22:00] see you guys tomorrow :)
[19:22:07] joal: ok, should I merge then or test some more?
[19:22:28] nuria: since it only sends an email, please have a test run (doesn't cost that much ;)
[19:22:34] joal: k
[19:22:38] Thanks a lot
[19:22:52] * joal is gone !
[19:39:01] this oozie job that I'm trying to run keeps dying on me nuria
[19:39:10] am I bad at debugging or is this error meaningless: https://hue.wikimedia.org/oozie/list_oozie_workflow/0034743-150922143436497-oozie-oozi-W/?coordinator_job_id=0034742-150922143436497-oozie-oozi-C
[19:39:28] (I click on Actions there and the KILLED one just has a very generic error message)
[19:40:04] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: Eventlogging monitoring of consumers (process nanny) - https://phabricator.wikimedia.org/T115495#1726136 (Aklapper)
[19:41:10] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1726139 (Eevans) I've expanded upon @gwicke's prototype a bit, progress here: https://github.com/wikimedia/restevent
[19:41:35] milimetric: org.apache.hadoop.hive.ql.parse.SemanticException: Line 10:34 Invalid table alias or column reference 'page_title': (possible column names are: qualifier, view_count)
[19:41:41] have you seen this?
[19:42:00] what?! :) where'd you find that
[19:42:04] :)
[19:42:24] I think I need glasses actually, in all seriousness, I can't see anymore
[19:42:41] https://hue.wikimedia.org/jobbrowser/jobs/job_1441303822549_100128/single_logs
[19:42:57] milimetric: scroll up
[19:43:43] I see, thx, how did you get to that log? I was trying to click on everything from the coordinator down
[19:44:36] milimetric: Actions tab
[19:44:42] and you see the failed action
[19:44:54] leftmost has an icon that links to logs
[19:46:01] ha! I missed that little link, I just found the same logs if you click through but they're hidden behind a SUCCESS message
[19:46:02] thx
[19:46:36] milimetric: yeah, also yarn logs -applicationId application_123..
[19:47:41] I think last time I submitted an oozie job was almost two years ago :)
[19:47:57] and mapred job -logs job_1441303822549_100128
[19:47:58] :)
[19:48:00] hue is a million times nicer than how we were getting errors before
[19:48:12] yup, but hue flakes out sometimes
[19:48:22] it won't load and you will be very very sad
[19:49:06] yes, indeed. Well, it makes me happy for now :)
[19:50:51] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: Eventlogging monitoring of consumers (process nanny) - https://phabricator.wikimedia.org/T115495#1726147 (Ottomata) This isn't true. The consumer did die, but was started back up. There is already alerting in place to notify if the process...
[19:50:53] (PS5) Milimetric: Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[19:51:25] Analytics-Engineering, Analytics-EventLogging, Need-volunteer, Patch-For-Review: EventLogging calling deprecated SyntaxHighlight_GeSHi::buildHeadItem - https://phabricator.wikimedia.org/T71328#1726150 (Legoktm) a:Legoktm
[19:54:47] milimetric: in a meeting, free in 30 mins?
[19:55:11] nuria: no worries, madhu helped me, I love hue now, everything's ok :)
[19:57:08] Analytics-EventLogging, I18n, Patch-For-Review, RTL: Headings in json-schema-code-samples have content language direction instead of user language - https://phabricator.wikimedia.org/T62233#1726166 (Legoktm) a:Legoktm Above patches move the code samples out of the content area and into indicators...
[20:06:34] milimetric, I'm back, let me know if you want more explanations
[20:07:11] thx marcel, still finishing up other stuff, no worries, I'll start the report tonight and look in the db some more, we can add more tomorrow
[20:07:20] ok
[20:25:04] oh no! hue is giving 500s left and right. When madhu said this would happen I thought it was going to be sometime in the future when I'm in my 60s and almost retired. But it happened now!! So saddd
[20:26:46] ok it's fine now, crisis averted
[20:29:16] (CR) Milimetric: [C: 1] "Self +1 with supporting evidence. Whoever merges this should talk to me first about the TODO regarding renaming "projectcounts" outputs t" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[20:32:38] milimetric: :P
[20:32:47] it happened to me a few minutes back
[20:32:55] before i told you
[20:33:29] :) it's an emotional roller coaster but at least I don't have to create ssh tunnels, a task to which I have lost too many years of my life
[20:34:04] yeah, the yarn logs and mapred job logs are helpful too
[20:36:20] milimetric: emotional rollercoaster.. jaja
[20:37:26] milimetric: i have not used hue for oozie jobs because really, it is down every time i look at it but logs have worked thus far
[20:37:55] sounds awesome
[20:41:08] :P
[20:47:08] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726337 (Tbayer) >>! In T108925#1719740, @Nuria wrote: >>This is a new and quite important finding. I should say that for the purposes of the Reading team (such...
[20:51:17] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Improve daily webrequest partition report {hawk} [5 pts] - https://phabricator.wikimedia.org/T113255#1726365 (mforns) @joal There it is. I added also all Analytics' engineers to receive tha daily email.
[21:01:50] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726413 (Tbayer) @JAllemandou : I see you moved this task to "done" on the board, but left its status as open. In our October 6 IRC discussion ([[http://bots.wmf...
[21:10:44] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726450 (Milimetric) @Tbayer, our tasks just remain open as a matter of process, the product owner closes them as resolved to acknowledge that it's completed to...
[21:32:13] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726530 (Nuria) @Tbayer Sorry but team does not agree that the differences between the old R code and the current pageviews need to be keep up to date in a wiki...
[21:51:58] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726599 (atgo) Ping - what do we need to do to get 1:10 into pgheres?
[21:57:32] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726618 (Jgreen) >>! In T97676#1726599, @atgo wrote: > Ping - what do we need to do to get 1:10 into pgheres?...
[22:03:54] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726630 (Jgreen) One comment on the change from 1:100 to 1:10--this means we're collecting something like 10X t...
[22:17:44] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726675 (ellery) 1:10 is already much better. Pgheres has a campaign field which is crucial. Are you able to ge...
[22:21:43] Analytics-EventLogging, I18n, Patch-For-Review, RTL, WMF-deploy-2015-10-20_(1.27.0-wmf.4): Headings in json-schema-code-samples have content language direction instead of user language - https://phabricator.wikimedia.org/T62233#1726695 (Legoktm) Open>Resolved
[22:26:47] Analytics-Backlog, Analytics-Cluster: Investigate (and remove?) spamy pageviews on pageview_hourly - https://phabricator.wikimedia.org/T115477#1726716 (Tbayer) https://en.wikipedia.org/wiki/!!!Fuck_You!!! and https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some are real pages, both redirect to https...
[22:35:07] night a-team, see you tomorrow!