[00:10:58] <wikibugs>	 Analytics, Design: Collect font preference metrics - https://phabricator.wikimedia.org/T108884#1724067 (Tgr) #quicksurveys seem like the perfect tool for this task.
[03:40:00] <krrrit-wm>	 (PS1) Milimetric: [WIP] Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[11:42:07] <wikibugs>	 Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1724792 (mobrovac) >>! In T114830#1723426, @Milimetric wrote: > 1. The AQS `per-article` endpoint should be available publicly at...
[11:53:03] <wikibugs>	 Analytics-Kanban: run job using oozie <mapreduce> {slug} [13 pts] - https://phabricator.wikimedia.org/T115355#1724801 (JAllemandou) a:JAllemandou
[12:36:10] <krrrit-wm>	 (CR) Joal: [C: 2 V: 2] "LGTM ! Thanks marcel :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/245921 (https://phabricator.wikimedia.org/T113255) (owner: Mforns)
[12:40:25] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Improve daily webrequest partition report {hawk} [5 pts] - https://phabricator.wikimedia.org/T113255#1724902 (JAllemandou) Just merged the code. Another change is needed there: https://github.com/wikimedia/operations-puppet/blob/production/manifest...
[13:40:58] <milimetric>	 joal: do you have a sec?  I wanted to test this oozie change
[13:41:02] <milimetric>	 and I've never done that :)
[13:41:08] <joal>	 hey milimetric
[13:41:11] <joal>	 I do have a sec :)
[13:41:17] <joal>	 cave ?
[13:41:17] <milimetric>	 ok, I'll join the cave
[13:49:48] <icinga-wm>	 PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [30.0]
[13:51:08] <krrrit-wm>	 (PS2) Milimetric: [WIP] Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[13:53:19] <icinga-wm>	 RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:18:51] <nuria>	 joal:yt?
[14:18:55] <joal>	 Hey nuria
[14:19:02] <joal>	 I am testing your project whitelist :)
[14:19:20] <nuria>	 ah ok, want to talk about why i did not include country
[14:19:22] <joal>	 wassup nuria ?
[14:19:27] <joal>	 sure, batcave ?
[14:19:36] <nuria>	 let me grab battery for my headphones,  1min
[14:19:52] <joal>	 np
[14:21:37] <nuria>	 batcave?
[14:21:50] <joal>	 There already
[14:21:55] <joal>	 or alternate one maybe ?
[14:27:05] <wikibugs>	 Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1725187 (Dzahn)
[14:27:07] <wikibugs>	 Analytics, Services: restbase is not listening on port 7231 on aqs* - https://phabricator.wikimedia.org/T114742#1725185 (Dzahn) Resolved>Open We still have 3 CRITs in Icinga for "Restbase endpoints health" on aqs and there was a comment next to them linking to this ticket.   https://icinga.wikimedia...
[14:28:34] <krrrit-wm>	 (PS3) Milimetric: Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[14:48:10] <icinga-wm>	 PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [30.0]
[14:51:31] <milimetric>	 joal: you looking at this EL stuff?  How come insertAttempted is 0 on the dashboard?
[14:51:45] <joal>	 milimetric: I am not after it, no
[14:51:50] <icinga-wm>	 RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:51:53] <joal>	 But I can if needed
[14:53:10] <milimetric>	 hm... it's just weird
[14:53:20] <milimetric>	 i'll take a look in a bit
[14:58:49] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Add a 'Guard' job for pageviews {hawk} [13 pts] - https://phabricator.wikimedia.org/T109739#1725259 (Nuria) cc @ironholds, we will not be including country/continent on whitelist, just project, access_method and agent_type
[15:02:40] <icinga-wm>	 PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[15:04:30] <icinga-wm>	 RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[15:05:20] <milimetric>	 yeah, something's definitely not right with Event logging
[15:05:44] <milimetric>	 those alerts and recoveries are hiding the fact that no data has been inserted since around 06:00 today
[15:07:17] <milimetric>	 yep, nothing in the db since then
[15:08:01] <joal>	 milimetric: wow that's wrong
[15:08:10] <joal>	 milimetric: anything I can do to help?
[15:08:15] <milimetric>	 i have to go eat :(
[15:08:19] <milimetric>	 but I'll look at it more after
[15:08:29] <milimetric>	 i think everything's in kafka, but it'll run out of space eventually
[15:08:37] <milimetric>	 and also the pressure on mysql when it starts again will be rough
[15:08:54] <joal>	 hm, not afraid about kafka, but mysql, yes
[15:09:27] <ottomata>	 hiya
[15:09:55] <ottomata>	 yeah, just noticing this now too
[15:10:30] <ori>	 statsv is picking up metrics following a manual restart
[15:12:46] <ori>	 but other consumers need to be restarted too, presumably
[15:12:58] <ori>	 so the kafka event subscriber code doesn't handle this gracefully
[15:17:33] <ottomata>	 mysql consumer died with this
[15:17:33] <ottomata>	 https://gist.github.com/ottomata/39aa0f4b3dda1a60ed49
[15:17:43] <ottomata>	 but, other consumers are continuing to run.
[15:18:45] <mforns>	 ottomata, ou
[15:18:59] <mforns>	 for how long is it stopped
[15:19:46] <ottomata>	 mforns:  last 8 hours ish, i'm trying to discover what it is doing
[15:19:49] <ottomata>	 it is running
[15:19:55] <ottomata>	 it started back up immediately after it died
[15:19:59] <mforns>	 oh ok
[15:20:08] <ottomata>	 and inserted data
[15:20:14] <ottomata>	 but, the graphs show no data being inserted
[15:20:22] <ottomata>	 so i'm investigating, trying to see what exactly is going on
[15:21:55] <ottomata>	 cool, mforns cool, it looks like it is not consuming, and keeping track of that fact.
[15:22:03] <ottomata>	 so, i believe when I restart the consumer now, it should pick back up from where it left off
[15:22:11] <ottomata>	 !log restarting lagging eventlogging mysql consumer
[15:22:14] <analytics-logbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[15:22:25] <mforns>	 aha
[15:23:09] <ottomata>	 we should set up burrow
[15:23:09] <ottomata>	 https://engineering.linkedin.com/apache-kafka/burrow-kafka-consumer-monitoring-reinvented
[15:23:57] <ottomata>	 k, it's inserting, mforns can you check that data in mysql is being filled in?
[15:24:11] <mforns>	 ottomata, sure
[15:25:21] <mforns>	 ottomata, btw, I'm modifying the puppet code adding the --percent-lost flag to the refinery-dump-status-webrequest-partitions call
[15:25:39] <mforns>	 joal, asked in the CR if someone else should receive the emails?
[15:25:50] <joal>	 right mforns
[15:26:00] <mforns>	 joal, ottomata, do you want me to add any other email?
[15:26:05] <mforns>	 hi joal :]
[15:26:09] <joal>	 :)
[15:26:46] <ottomata>	 whoever wants them is fine
[15:27:05] <mforns>	 ok
[15:29:53] <ori>	 ottomata: can you file a task for that?
[15:29:55] <ori>	 for burrow
[15:29:59] <ori>	 and consumer monitoring in general
[15:30:47] <ottomata>	 yes
[15:31:06] <ottomata>	 hm. i'm worried that my restart of eventlogging is causing it to reconsume a lot of stuff...possibly from all kafka consumers.
[15:31:13] <ottomata>	 eventlogging kafka consumers.
[15:31:32] <ottomata>	 i'm seeing a huge increase in events produced to everything
[15:31:55] <ori>	 why do you think it's re-consuming?
[15:32:04] <ori>	 it's presumably just consuming the backlog of events that haven't been consumed
[15:32:11] <icinga-wm>	 PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [30.0]
[15:32:15] <ori>	 or producing, rather
[15:32:52] <ottomata>	 well, except from what I was looking at in grafana, everything was fine except for zmq and mysql consumers
[15:32:57] <mforns>	 mmmm
[15:33:02] <ottomata>	 those consume from the eventlogging-valid-mixed stream
[15:33:17] <ottomata>	 the schema based topics
[15:33:20] <ottomata>	 all looked normal
[15:33:24] <ottomata>	 but now those have many more events in them
[15:34:17] <ottomata>	 yeah hmmm
[15:35:07] <mforns>	 ottomata, it makes sense for the mysql consumer to have a higher throughput right now, no?
[15:35:16] <mforns>	 normally it consumes events as they come in
[15:35:47] <mforns>	 now it is going max speed
[15:36:19] <ottomata>	 yes
[15:36:21] <mforns>	 what I fear is the mysql consumer is not prepared to work at this rate, I guess it is queueing all events for insertion
[15:36:48] <mforns>	 the queue size may be getting BIG
[15:36:56] <ottomata>	 what queue size?  mysql events?
[15:37:03] <mforns>	 yes
[15:37:03] <ottomata>	 shouldn't, remember, there isn't an in-memory zmq queue any more
[15:37:13] <mforns>	 if you top eventlog1001 you'll see
[15:37:31] <ottomata>	 it shouldn't die, it might slow for a while, but i think it should just finish and then go back to normal, right?
[15:37:32] <mforns>	 mysql consumer is using 43% of memory
[15:37:34] <ottomata>	 mysql insertion might slow
[15:37:46] <mforns>	 and going up
[15:37:46] <ottomata>	 hmmmm
[15:37:54] <ottomata>	 remembering how this works...
[15:38:00] <ottomata>	 the eventlogging consumer sticks into a queue for mysql
[15:38:02] <mforns>	 there are 2 threads
[15:38:06] <ottomata>	 does the queue have a max size that would cause it to block?
[15:38:10] <mforns>	 one reads from kafka and queues for insertion
[15:38:22] <mforns>	 the other one, pops from the queue and inserts
[15:38:32] <mforns>	 insertion is slower than queueing
[15:38:38] <mforns>	 so the queue is getting big
[15:39:04] <ottomata>	 hm, mforns if this doesn't exist, the queue should probably have a max size
[15:39:13] <ottomata>	 with which we could cause the inserting thread to block
[15:39:14] <mforns>	 ottomata, I agree
[15:39:46] <mforns>	 and if its size is bigger than X, consuming from kafka should sleep
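The bounded-queue idea discussed above can be sketched as follows. This is a minimal illustration and not the EventLogging consumer code: `MAX_PENDING`, `reader`, `inserter` and `run` are hypothetical names, and a plain iterable stands in for the kafka stream. With a `maxsize` set, `Queue.put` blocks the reading thread whenever the inserting thread falls behind, so memory stays capped without an explicit sleep.

```python
import queue
import threading

MAX_PENDING = 1000   # hypothetical cap on events waiting for insertion
_DONE = object()     # sentinel telling the inserter that the stream has ended


def reader(events, pending):
    """Read events (stand-in for the kafka stream) and queue them for insertion."""
    for event in events:
        # put() blocks while the queue is full, which throttles reading
        # to the speed of the inserting thread.
        pending.put(event)
    pending.put(_DONE)


def inserter(pending, insert):
    """Pop events from the queue and hand each one to the insert callable."""
    while True:
        event = pending.get()
        if event is _DONE:
            break
        insert(event)


def run(events, insert, max_pending=MAX_PENDING):
    """Run the reader in a background thread and the inserter in this thread."""
    pending = queue.Queue(maxsize=max_pending)
    thread = threading.Thread(target=reader, args=(events, pending), daemon=True)
    thread.start()
    inserter(pending, insert)
    thread.join()
```

For example, `run(range(100000), some_slow_insert, max_pending=10)` never holds more than 10 pending events, however fast the reader is.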
[15:40:05] <mforns>	 50%
[15:41:09] <ottomata>	 mforns:  i am hoping that it will settle itself out
[15:41:14] <mforns>	 ottomata, on the other side, insertions are going pretty fast, If I don't get it wrong, the database is updated until 14:15:00 UTC
[15:41:45] <ottomata>	 lag is going down...
[15:41:55] <ottomata>	 but there are occasional lags in the mysql consumer logs
[15:41:58] <mforns>	 it already updated almost all the events?
[15:43:30] <ottomata>	 not sure.
[15:43:56] <mforns>	 I'm querying the db for max(timestamp) and it says 15h UTC
[15:46:24] <ottomata>	 mforns:  i think it is not done, and i think that's what i'm worried about
[15:46:31] <mforns>	 mmm
[15:46:41] <ottomata>	 i think it is re-processing client side raw stuff and redoing them.  although, hm, i'm really not sure.
[15:47:09] <mforns>	 ottomata, it's strange that we have a validation gap
[15:47:19] <ottomata>	 ?
[15:47:33] <mforns>	 the last grah in grafana
[15:47:39] <mforns>	 *graph
[15:47:55] <mforns>	 shows a raw-valid difference
[15:47:57] <mforns>	 why?
[15:48:44] <ottomata>	 i'm confused about that too, maybe i've got the parameters mixed up?
[15:49:06] <mforns>	 ok, it may be something in the metric
[15:51:02] <wikibugs>	 Analytics-Backlog, Analytics-Cluster: Investigate (and remove?) spamy pageviews on pageview_hourly - https://phabricator.wikimedia.org/T115477#1725482 (Aklapper)
[15:52:57] <mforns>	 ottomata, I've checked the db more closely, it is not done, events are being inserted still
[15:53:03] <ottomata>	 yeah
[15:53:17] <ottomata>	 hmm, yeah, and mforns the mysql handler should guard against dups, right?
[15:53:17] <ottomata>	 hm
[15:53:26] <ottomata>	 we don't see dup errors anymore in the log
[15:53:28] <mforns>	 ottomata, does consumption from kafka ensure order?
[15:53:29] <ottomata>	 i oh we do
[15:53:31] <ottomata>	 oh!
[15:53:33] <mforns>	 ottomata, duplicates: yes
[15:53:34] <ottomata>	 and i'm seeing a lot of them
[15:53:44] <ottomata>	 Warning: Duplicate entry 'e5bdf19537f65765a7c724aba5a90238' for key 'ix_MediaViewer_10867062_uuid'
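The duplicate warning above comes from a unique index on the event uuid, which is what makes re-consuming the backlog safe. A minimal sketch of that guard, assuming an in-memory SQLite table as a stand-in for the MySQL schema tables (the `events` table, `open_store` and `insert_events` are hypothetical, and this is not the actual EventLogging mysql handler):

```python
import sqlite3


def open_store():
    """Create an in-memory table with a unique index on the event uuid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (uuid TEXT NOT NULL UNIQUE, payload TEXT)")
    return conn


def insert_events(conn, events):
    """Insert (uuid, payload) pairs, skipping known uuids; return rows added."""
    before = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    # OR IGNORE drops rows that would violate the unique uuid index,
    # so replaying already-inserted events is a no-op.
    conn.executemany(
        "INSERT OR IGNORE INTO events (uuid, payload) VALUES (?, ?)", events
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return after - before
```

Replaying a batch that overlaps with earlier inserts then only adds the events that are actually new.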
[15:53:55] <mforns>	 mmm
[15:56:34] <mforns>	 If I'm seeing it right, it has backfilled around 20%
[15:56:42] <mforns>	 in 20 mins
[15:57:11] <ottomata>	 mforns:  backfilled?
[15:57:13] <ottomata>	 20%?
[15:57:33] <mforns>	 yes, it's filling the hole back
[15:57:49] <mforns>	 ottomata, ^
[15:57:56] <ottomata>	 right, so the mysql stuff is doing the right thing, although, do you actually see new events coming in?
[15:58:06] <ottomata>	 i would think they would all be duplicates right now.
[15:58:50] <mforns>	 ottomata, yes events are coming in
[15:59:50] <ottomata>	 hm
[16:00:03] <ottomata>	 can you get an approximate rate of events inserted?
[16:01:01] <mforns>	 mmm let me try
[16:01:34] <mforns>	 having problems to join standup
[16:01:43] <nuria>	 ottomata: standup? do you guys want to join and we can talk about EL cc mforns milimetric
[16:01:51] <milimetric>	 I can't get into the hangout
[16:01:55] <milimetric>	 yeah, same as marcel
[16:01:57] <mforns>	 me neither!
[16:02:00] <madhuvishy>	 me too
[16:02:02] <milimetric>	 :)
[16:02:03] <madhuvishy>	 can't join
[16:02:05] <milimetric>	 bluejeans?
[16:02:18] <ottomata>	 can't in ops meetings
[16:02:44] <nuria>	 milimetric: can you guys try again, it worked for me just now
[16:03:08] <madhuvishy>	 nope
[16:03:10] <milimetric>	 yeah, it's not great :)
[16:03:16] <milimetric>	 i guess it works in seattle and france
[16:03:18] <milimetric>	 haha
[16:03:34] <milimetric>	 ok, how does one set up a bluejeans conf?
[16:03:35] * mforns tries
[16:03:43] <nuria>	 milimetric: let me get a bluejeans meeting
[16:04:30] <madhuvishy>	 meanwhile a-team, can you try getting into this? https://plus.google.com/hangouts/_/wikimedia.org/a-batcave1
[16:04:47] <joal>	 madhuvishy: we are slowly getting in there
[16:04:49] <mforns>	 a-team, https://plus.google.com/hangouts/_/wikimedia.org/a-batcave-2 worked for me...
[16:04:54] <madhuvishy>	 he he
[16:05:16] <nuria>	 can madhuvishy and milimetric try again regular batcave?
[16:05:31] <milimetric>	 yeah, give the regular cave one more shot
[16:05:40] <mforns>	 ok
[16:06:01] <mforns>	 ....no can join
[16:06:10] <madhuvishy>	 can't get in
[16:06:14] <nuria>	 one more shot in batcave, otherwise i have bluejeans ready
[16:06:19] <joal>	 mforns: you are the only one not in there
[16:06:26] <mforns>	 :[
[16:06:33] <milimetric>	 ok, so let's try bluejeans
[16:06:34] <joal>	 retry ?
[16:06:40] <mforns>	 I'm retrying
[16:06:42] <milimetric>	 no, it seems both are having trouble
[16:06:55] <milimetric>	 nuria - link?
[16:07:02] <nuria>	 i sent you an invite
[16:07:33] * mforns tries nuria's link
[16:07:48] <mforns>	 I'm in
[16:08:04] <mforns>	 mm not yey
[16:08:09] <mforns>	 yet
[16:08:52] <icinga-wm>	 RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[16:09:52] <milimetric>	 nuria: everyone's in and we need you to start things I think
[16:09:56] <madhuvishy>	 nuria: i got in, but it says you aren't there
[16:10:10] <nuria>	 man i cannot get in...wait a sec
[16:10:46] <milimetric>	 :) ok, let's give this 5 more minutes and then just have an irc meeting, a-team
[16:10:59] <mforns>	 yea
[16:11:06] <joal>	 good for me milimetric
[16:13:05] <joal>	 mforns: bluejeans?
[16:13:20] <mforns>	 I am in and out, in and out of it
[16:13:29] <nuria>	 mforns: using browser?
[16:15:09] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Create Hadoop Job to load data into cassandra [34 pts] {slug} - https://phabricator.wikimedia.org/T108174#1725578 (ggellerman)
[16:18:42] <wikibugs>	 Analytics-Kanban: Write in-depth dashiki documentation {crow} [3 pts] - https://phabricator.wikimedia.org/T112685#1725599 (Nuria) Open>Resolved
[17:52:39] <milimetric>	 hey bd808 we hear you're blocked on us finishing monolog work
[17:52:44] <wikibugs>	 Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1725902 (mobrovac) Also, if we do end up with exposing everything under the global domain, but also have per-project and per-arti...
[17:53:12] <milimetric>	 bd808: I think this is related: https://phabricator.wikimedia.org/T113521 but I'm not super familiar so you can follow up with nuria and madhuvishy
[17:53:30] <milimetric>	 (this is in response to a Scrum of Scrums blocked on analytics notice from anomie)
[17:55:27] <nuria>	 bd808: hello, any questions?
[17:57:07] <bd808>	 nuria, milimetric:  cool. I'm mostly waiting on ebernhardson to actually start using all the plumbing I guess and verify that it actually works
[17:57:31] <nuria>	 bd808: we did verify it works on fri
[17:57:59] <nuria>	 bd808: using search schema
[17:58:20] <bd808>	 excellent. My excuses for not moving forward are quickly dropping to 0
[17:58:22] <bd808>	  :)
[17:58:35] <nuria>	 haha
[17:58:49] <nuria>	 ops is out for the week
[17:59:15] <nuria>	 and until they come back we cannot set up new topics and such
[17:59:24] <milimetric>	 mforns_brb: if you have any details from ottomata about the EL outage, mind sending them over?  I've gotta write the incident report and I'm trying to get information together.  So far I know nothing :)
[17:59:39] <mforns>	 ok
[17:59:45] <mforns>	 milimetric, ok
[18:00:35] <bd808>	 nuria: right. I need to apply for hive access too I suppose so I can see the data once we get it piped in. I'll make some tickets today/tomorrow
[18:00:59] <madhuvishy>	 joal: let me know when you have time to catch up on the cassandra stuff
[18:02:11] <joal>	 hey madhuvishy
[18:02:20] <joal>	 In meeting now, after :)
[18:02:55] <madhuvishy>	 bd808: Make sure to request for stat1002 access and to be added to analytics-privatedata-users group :) There've been confusions before where people got added to the wrong group.
[18:02:59] <madhuvishy>	 joal: sure
[18:02:59] <nuria>	 milimetric: pageview api meeting?
[18:03:10] <milimetric>	 sigh
[18:05:04] <icinga-wm>	 PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[18:06:05] <wikibugs>	 Analytics-Tech-community-metrics, DevRel-October-2015: Patches with Verified -1 should not be counted as open in our code review metrics - https://phabricator.wikimedia.org/T108507#1725941 (jgbarah) I guess we've worked on that in the past. It seems  and I think it is already fixed. Checking the code in G...
[18:06:33] <wikibugs>	 Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session:  Pageview API  overview - https://phabricator.wikimedia.org/T112956#1725942 (Nuria)
[18:09:00] <icinga-wm>	 RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[18:17:31] <wikibugs>	 Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session:  Pageview API  overview - https://phabricator.wikimedia.org/T112956#1725965 (JAllemandou) First Ideas  - Presentation 20 mins  - 20 mins questions  - Presentation content first draft:     - How is it built ?...
[18:17:49] <mforns>	 milimetric, about eventlogging
[18:17:51] <joal>	 madhuvishy: Got some time now :)
[18:18:31] <mforns>	 if you look at https://grafana.wikimedia.org/dashboard/db/eventlogging
[18:18:40] <krrrit-wm>	 (PS10) Joal: [WIP] Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[18:19:14] <mforns>	 you'll see that today at 7am utc, several graphs stopped showing stats
[18:19:34] <mforns>	 in particular, the inserts and insertAttempts
[18:19:43] <madhuvishy>	 joal: batcave?
[18:19:50] <joal>	 sure !
[18:20:19] <mforns>	 so Andrew looked at it and he found that the mysql consumer had died with: https://gist.github.com/ottomata/39aa0f4b3dda1a60ed49
[18:20:49] <mforns>	 I checked the db, and in fact the data since 7am was missing
[18:21:25] <mforns>	 Andrew restarted the mysql consumer at 15:22h utc
[18:22:17] <mforns>	 and it started consuming events from kafka
[18:23:41] <mforns>	 milimetric, however, I suspect that we missed some events, because if you do a: select left(timestamp, 10), count(*) from Edit_13457736 where timestamp >= '20151014000000' group by 1;
[18:23:54] <mforns>	 you'll see the incomplete data
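The query above buckets rows by the first 10 characters of the timestamp, i.e. `YYYYMMDDHH`. The same check can be sketched in Python, assuming timestamps are the 14-digit strings used in the query; `hourly_counts` and `empty_hours` are hypothetical helpers for spotting hours with no rows at all, which the `group by` output would silently omit:

```python
from collections import Counter
from datetime import datetime, timedelta

HOUR_FORMAT = "%Y%m%d%H"


def hourly_counts(timestamps):
    """Count events per hour, keyed by the YYYYMMDDHH prefix of each timestamp."""
    return Counter(ts[:10] for ts in timestamps)


def empty_hours(counts, first_hour, last_hour):
    """List every hour in [first_hour, last_hour] that has no events."""
    hour = datetime.strptime(first_hour, HOUR_FORMAT)
    end = datetime.strptime(last_hour, HOUR_FORMAT)
    gaps = []
    while hour <= end:
        key = hour.strftime(HOUR_FORMAT)
        if counts.get(key, 0) == 0:
            gaps.append(key)
        hour += timedelta(hours=1)
    return gaps
```

Hours that are present but unusually low still need a comparison against a normal day; this sketch only flags hours that are entirely missing.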
[18:24:34] <mforns>	 during a couple hours, the memory consumption of the mysql consumer has been really high, like 50%
[18:24:48] <mforns>	 I wonder if the mysql consumer got restarted
[18:25:02] <mforns>	 and that is why we lost data?
[18:25:35] <mforns>	 BTW, a reminder: the mysql consumer has 2 threads
[18:25:48] <mforns>	 one that pulls from kafka and queues events for insertion
[18:26:01] <mforns>	 and the other that pops from that queue and inserts into mysql
[18:26:31] <ottomata>	 mforns:  milimetric i'm sorry i can't help look into this right now
[18:26:35] <mforns>	 the first one is a lot faster, so the queue gets big and that's why the memory consumption got high
[18:26:43] <ottomata>	 i've got like my only moment to talk eventbus with ops people today :)
[18:26:43] <mforns>	 np ottomata
[18:27:06] <mforns>	 milimetric, I have to go now... I hope this is useful
[18:27:25] <mforns>	 I'll be back in 1.5 hours
[18:31:08] <krrrit-wm>	 (PS8) Nuria: Add pageview quality check to pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/240099 (https://phabricator.wikimedia.org/T109739) (owner: Joal)
[18:32:52] <nuria>	 joal: does pageview quality need to be tested or you have done that already?
[18:34:12] <nuria>	 milimetric: let me know if you want to look at EL together, will look at metrics
[18:35:29] <milimetric>	 (sorry, had someone drop by for a visit, thanks Marcel, I'll take a look in a bit, my queue is huge and growing today :))
[18:39:01] <wikibugs>	 Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1726031 (Milimetric) > https://wm.org/api/rest_v1/metrics/{project} >   -- per-project  # or aggregate >   -- per-article >   --...
[18:40:22] <wikibugs>	 Analytics-Backlog, Analytics-Kanban, Wikimedia-Developer-Summit-2016: Developer summit session:  Pageview API  overview - https://phabricator.wikimedia.org/T112956#1726032 (Milimetric) @kaldari, if you think of future use cases for the pageview API while you implement tools that use it, let us know and...
[18:43:55] <krrrit-wm>	 (CR) Nuria: "Looks good. if I understand we are changing the format to be wsc for wikistats to consume it.. are there any other consumers?" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[18:45:31] <krrrit-wm>	 (CR) Milimetric: "yes, other consumers of wsc data include the folks who publish the top 5000 pages, top pages per wiki project, etc. Some of these use cas" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[18:48:25] <krrrit-wm>	 (PS4) Milimetric: Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[18:57:25] <nuria>	 joal: let me know when you have a few secs
[19:10:56] <wikibugs>	 Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1726066 (mobrovac) >>! In T114830#1726031, @Milimetric wrote: > It seems to me that the global domain is much easier to get conse...
[19:13:37] <wikibugs>	 Analytics-Kanban, RESTBase, Services, Patch-For-Review: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1726069 (Milimetric) >>>! In T114830#1723426, @Milimetric wrote: >> 3. The `{project}` parameter should come before `/top` and `/...
[19:19:49] <krrrit-wm>	 (PS11) Joal: [WIP] Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[19:20:21] <joal>	 nuria: yup sorry
[19:20:31] <joal>	 nuria: was in full test with madhuvishy
[19:20:50] <joal>	 I tested the oozie stuff about pageview quality, so it should be good to go :)
[19:20:54] <nuria>	 joal: is there testing that needs to happen on the whitelist stuff or you have tested everything there?
[19:20:57] <nuria>	 ah ok, sorry
[19:21:01] <joal>	 np :)
[19:21:18] <joal>	 I have tested the mechanism, but not with real data is all :)
[19:21:30] <joal>	 I think ...
[19:21:42] <joal>	 can't recall everything but I think I have tested that :)
[19:21:46] <joal>	 nuria: --^
[19:21:52] <joal>	 Going to dinner !
[19:22:00] <joal>	 see you guys tomorrow :)
[19:22:07] <nuria>	 joal: ok, should I merge then or test some more?
[19:22:28] <joal>	 nuria: since it only sends an email, please have a test run (doesn't cost that much ;0
[19:22:34] <nuria>	 joal: k
[19:22:38] <joal>	 Thanks a lot
[19:22:52] * joal is gone !
[19:39:01] <milimetric>	 this oozie job that I'm trying to run keeps dying on me nuria
[19:39:10] <milimetric>	 am I bad at debugging or is this error meaningless: https://hue.wikimedia.org/oozie/list_oozie_workflow/0034743-150922143436497-oozie-oozi-W/?coordinator_job_id=0034742-150922143436497-oozie-oozi-C
[19:39:28] <milimetric>	 (I click on Actions there and the KILLED one just has a very generic error message)
[19:40:04] <wikibugs>	 Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: Eventlogging monitoring of consumers (process nanny) - https://phabricator.wikimedia.org/T115495#1726136 (Aklapper)
[19:41:10] <wikibugs>	 Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1726139 (Eevans) I've expanded upon @gwicke's prototype a bit, progress here: https://github.com/wikimedia/restevent
[19:41:35] <madhuvishy>	 milimetric:  org.apache.hadoop.hive.ql.parse.SemanticException: Line 10:34 Invalid table alias or column reference 'page_title': (possible column names are: qualifier, view_count)
[19:41:41] <madhuvishy>	 have you seen this?
[19:42:00] <milimetric>	 what?! :) where'd you find that
[19:42:04] <madhuvishy>	 :)
[19:42:24] <milimetric>	 I think I need glasses actually, in all seriousness, I can't see anymore
[19:42:41] <madhuvishy>	 https://hue.wikimedia.org/jobbrowser/jobs/job_1441303822549_100128/single_logs
[19:42:57] <madhuvishy>	 milimetric: scroll up
[19:43:43] <milimetric>	 I see, thx, how did you get to that log?  I was trying to click on everything from the coordinator down
[19:44:36] <madhuvishy>	 milimetric: Actions tab
[19:44:42] <madhuvishy>	 and you see the failed action
[19:44:54] <madhuvishy>	 left most has an icon that links to logs
[19:46:01] <milimetric>	 ha! I missed that little link, I just found the same logs if you click through but they're hidden behind a SUCCESS message
[19:46:02] <milimetric>	 thx
[19:46:36] <madhuvishy>	 milimetric: yeah, also yarn logs -applicationId application_123..
[19:47:41] <milimetric>	 I think last time I submitted an oozie job was almost two years ago :)
[19:47:57] <madhuvishy>	 and mapred job -logs job_1441303822549_100128
[19:47:58] <madhuvishy>	 :)
[19:48:00] <milimetric>	 hue is a million times nicer than how we were getting errors before
[19:48:12] <madhuvishy>	 yup, but hue flakes out sometimes
[19:48:22] <madhuvishy>	 it won't load and you will be very very sad
[19:49:06] <milimetric>	 yes, indeed.  Well, it makes me happy for now :)
[19:50:51] <wikibugs>	 Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: Eventlogging monitoring of consumers (process nanny) - https://phabricator.wikimedia.org/T115495#1726147 (Ottomata) This isn't true.  The consumer did die, but was started back up.  There is already alerting in place to notify if the process...
[19:50:53] <krrrit-wm>	 (PS5) Milimetric: Archive hourly pageviews by article in wsc format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[19:51:25] <wikibugs>	 Analytics-Engineering, Analytics-EventLogging, Need-volunteer, Patch-For-Review: EventLogging calling deprecated SyntaxHighlight_GeSHi::buildHeadItem - https://phabricator.wikimedia.org/T71328#1726150 (Legoktm) a:Legoktm
[19:54:47] <nuria>	 milimetric: on meeting, free in 30 mins?
[19:55:11] <milimetric>	 nuria: no worries, madhu helped me, I love hue now, everything's ok :)
[19:57:08] <wikibugs>	 Analytics-EventLogging, I18n, Patch-For-Review, RTL: Headings in json-schema-code-samples have content language direction instead of user language - https://phabricator.wikimedia.org/T62233#1726166 (Legoktm) a:Legoktm Above patches move the code samples out of the content area and into indicators...
[20:06:34] <mforns>	 milimetric, I'm back, let me know if you want more explanations
[20:07:11] <milimetric>	 thx marcel, still finishing up other stuff, no worries, I'll start the report tonight and look in the db some more, we can add more tomorrow
[20:07:20] <mforns>	 ok
[20:25:04] <milimetric>	 oh no!  hue is giving 500s left and right.  When madhu said this would happen I thought it was going to be sometime in the future when I'm in my 60s and almost retired.  But it happened now!!  So saddd
[20:26:46] <milimetric>	 ok it's fine now, crisis averted
[20:29:16] <krrrit-wm>	 (CR) Milimetric: [C: 1] "Self +1 with supporting evidence. Whoever merges this should talk to me first about the TODO regarding renaming "projectcounts" outputs t" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[20:32:38] <madhuvishy>	 milimetric: :P
[20:32:47] <madhuvishy>	 it happened to me a few minutes back
[20:32:55] <madhuvishy>	 before i told you
[20:33:29] <milimetric>	 :) it's an emotional roller coaster but at least I don't have to create ssh tunnels, a task to which I have lost too many years of my life
[20:34:04] <madhuvishy>	 yeah, the yarn logs and mapred job logs are helpful too
[20:36:20] <nuria>	 milimetric: emotional rollercoaster.. haha
[20:37:26] <nuria>	 milimetric: i have not used hue for oozie jobs because really, it is down every time i look at it, but logs have worked thus far
[20:37:55] <ori>	 sounds awesome
[20:41:08] <milimetric>	 :P
[20:47:08] <wikibugs>	 Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726337 (Tbayer) >>! In T108925#1719740, @Nuria wrote: >>This is a new and quite important finding. I should say that for the purposes of the Reading team (such...
[20:51:17] <wikibugs>	 Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Improve daily webrequest partition report {hawk} [5 pts] - https://phabricator.wikimedia.org/T113255#1726365 (mforns) @joal There it is. I added also all Analytics' engineers to receive tha daily email.
[21:01:50] <wikibugs>	 Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726413 (Tbayer) @JAllemandou : I see you moved this task to "done" on the board, but left its status as open. In our October 6 IRC discussion ([[http://bots.wmf...
[21:10:44] <wikibugs>	 Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726450 (Milimetric) @Tbayer, our tasks just remain open as a matter of process, the product owner closes them as resolved to acknowledge that it's completed to...
[21:32:13] <wikibugs>	 Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1726530 (Nuria) @Tbayer  Sorry but team does not agree that the differences between the old R code and the current pageviews need to be keep up to date in a wiki...
[21:51:58] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726599 (atgo) Ping - what do we need to do to get 1:10 into pgheres?
[21:57:32] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726618 (Jgreen) >>! In T97676#1726599, @atgo wrote: > Ping - what do we need to do to get 1:10 into pgheres?...
[22:03:54] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726630 (Jgreen) One comment on the change from 1:100 to 1:10--this means we're collecting something like 10X t...
[22:17:44] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1726675 (ellery) 1:10 is already much better. Pgheres has a campaign field which is crucial. Are you able to ge...
[22:21:43] <wikibugs>	 Analytics-EventLogging, I18n, Patch-For-Review, RTL, WMF-deploy-2015-10-20_(1.27.0-wmf.4): Headings in json-schema-code-samples have content language direction instead of user language - https://phabricator.wikimedia.org/T62233#1726695 (Legoktm) Open>Resolved
[22:26:47] <wikibugs>	 Analytics-Backlog, Analytics-Cluster: Investigate (and remove?) spamy pageviews on pageview_hourly - https://phabricator.wikimedia.org/T115477#1726716 (Tbayer) https://en.wikipedia.org/wiki/!!!Fuck_You!!! and https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some are real pages, both redirect to https...
[22:35:07] <mforns>	 night a-team, see you tomorrow!