[03:39:18] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10MediaWiki-Vagrant, 13Patch-For-Review, 15User-bd808: Replace upstart with systemd unit in eventlogging::devserver and eventlogging::service - https://phabricator.wikimedia.org/T154265#2988769 (10bd808) 05Open>03Resolved
[06:50:32] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 13Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2989010 (10Marostegui) >>! In T125135#2987577, @Ottomata wrote: >> This is key, is that somehow doable from the application side? It...
[07:08:40] <wikibugs>	 10Analytics, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2989026 (10Marostegui) >>! In T124307#2987587, @Ottomata wrote: > @Marostegui ok!  So the T125135 auto-increment thing is a very small piece of this larger issue. >  > Let's see if we can...
[07:47:12] <elukey>	 morning!
[07:47:17] <elukey>	 aqs1008-a is bootstrapping :)
[08:51:51] <mforns>	 hi team! joal yt? the monthly job is still generating the data, so... I guess better deploy without
[08:52:15] <mforns>	 will come back in a bit to recheck
[09:45:48] <joal>	 Thanks mforns_away for checking this :)
[09:46:01] <joal>	 elukey: yay, mar bootstapz
[10:03:56] <joal>	 elukey: quick question: have you run cleanup on aqs1007-b?
[10:05:39] <elukey>	 nope, it didn't need it
[10:05:57] <joal>	 elukey: yeaaah, in theory it didn't :-P
[10:06:00] <elukey>	 the only weird thing that I am seeing is https://grafana.wikimedia.org/dashboard/db/aqs-elukey?panelId=14&fullscreen&from=now-24h&to=now
[10:06:17] <elukey>	 well we can run it anytime
[10:06:32] <elukey>	 so P99 spikes in latency
[10:06:37] <joal>	 Do you mind going for it (as you said, shoud have no effect)
[10:07:19] <joal>	 hm
[10:08:19] <elukey>	 joal: done, finished in 20 secs
[10:08:24] <joal>	 thanks mate :)
[10:08:42] <joal>	 elukey: it makes me feal better ;)
[10:08:48] <joal>	 feel sorry
[10:09:04] <joal>	 the p99 spikes is weird indeed
[10:09:11] <elukey>	 it is not a huge deal
[10:09:56] <joal>	 elukey: when looking at last 30 days, it doesn't seem to be a real tendency yet
[10:10:05] <joal>	 let's keep it in mind though
[10:10:26] <elukey>	 yeah I was about to say that
[10:10:36] <elukey>	 anyhow, better double checking to be sure :)
[10:10:41] <joal>	 for sure :)
[10:11:02] <joal>	 elukey: deploy?
[10:12:53] <elukey>	 sure! Let me check the docs, didn't have time up to now
[10:13:16] <joal>	 no prob elukey - Idea would be: you do it by the docs, I'm here in case :)
[10:14:56] <elukey>	 first question joal - do we need to deploy refinery source ?
[10:15:26] <joal>	 elukey: good question! tickets in "Ready to Deploy" should tell you that :)
[10:16:15] <elukey>	 this is a good point
[10:16:20] <joal>	 elukey: T156629 has a patch on refinery source, so yes, we should :)
[10:16:20] <stashbot>	 T156629: Better explanation on pageview definition for edit actions - https://phabricator.wikimedia.org/T156629
[10:22:25] <elukey>	 ok I am a already a bit confused
[10:22:26] <elukey>	 :D
[10:22:47] <elukey>	 I thought that refinery 0.40 was built by Madhu's automation in Jenkins (also checked Gerrit)
[10:22:56] <elukey>	 but then in the wiki I  read
[10:22:56] <elukey>	 If you need to trigger a build (not a release) you can do that hereː https://integration.wikimedia.org/ci/job/analytics-refinery-release/build?delay=0sec
[10:23:25] <elukey>	 and then "Maven project analytics-refinery-release"
[10:23:26] <elukey>	 :D
[10:23:32] <joal>	 Ah elukey: this is no more explanation on why triggering a build could be needed on docs?
[10:24:35] <elukey>	 not a lot - https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Deploy/Refinery-source#How_to_deploy_with_Jenkins_.28and_related_steps.29
[10:24:40] <elukey>	 but maybe I am missing something
[10:26:42] <joal>	 elukey: Reason to trigger a build before the release is for jenkins to pick up automatically the correct release versions
[10:27:00] <joal>	 elukey: To correctly prefill values in some release page
[10:27:21] <joal>	 elukey: It's needed only if, when preparing a release job, values prefilled are incorrect
[10:28:33] <elukey>	 ahh ok got it
[10:29:42] <joal>	 elukey: docs are not yet precise enough :)
[10:30:00] <elukey>	 I am adding info :)
[10:30:12] <joal>	 Thanks elukey !
[10:32:34] <elukey>	 ah yes I was going to ask about the changelog since https://github.com/wikimedia/analytics-refinery-source/commits/master shows periodic version bumps in changelog
[10:32:45] <elukey>	 but it is buried into "If the maven release job failed (step 3)"
[10:33:02] <elukey>	 that makes sense but it might be better to add some preconditions
[10:33:21] <joal>	 elukey: please, please, please, pleaaaaaase :)
[10:34:19] * elukey takes notes
[10:34:59] <elukey>	 ah snap "Update the changelog.md file at the root of the repository with changes that are going to be deployed - commit and merge this change."
[10:35:06] * elukey goes in the corner of shame
[10:35:27] <elukey>	 anyhow, I'll make it straightforward for dumb people like me :D
[10:39:38] <wikibugs>	 (03PS1) 10Elukey: Changelog v0.0.40 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/335423
[10:41:21] <elukey>	 joal: --^ all good?
[10:41:59] <elukey>	 afaics the maven release plugin will take care of the pom.xml version bump
[10:42:17] <joal>	 elukey: commit message could be just a bit more explicit, but content of changelog is good :)
[10:43:29] <elukey>	 ah I followed the trend of the last releases
[10:44:18] <joal>	 nevermind elukey, I'm too picky :)
[10:44:25] <joal>	 elukey: merging that version :)
[10:44:50] <wikibugs>	 (03PS2) 10Elukey: Add v0.0.40 to the Changelog [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/335423
[10:44:53] <elukey>	 ah I just updated it
[10:44:54] <elukey>	 :P
[10:44:57] <joal>	 hehe :)
[10:45:05] <joal>	 ok, merging the noew one than ;)
[10:45:24] <wikibugs>	 (03CR) 10Joal: [C: 032] "LGTM :)" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/335423 (owner: 10Elukey)
[10:45:37] <wikibugs>	 (03CR) 10Elukey: [V: 032] Add v0.0.40 to the Changelog [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/335423 (owner: 10Elukey)
[10:45:42] <joal>	 elukey: let's wait a few minutes for jenkins to merge it
[10:46:08] <elukey>	 ehm sure
[10:46:15] <elukey>	 I haven't merged it already
[10:46:19] <joal>	 :D
[10:46:20] <elukey>	 it is an illusion joal
[10:46:25] <elukey>	 don't look to wikibugs
[10:47:47] * elukey writes down to wait
[10:48:35] <joal>	 elukey: That's no problem
[10:54:34] <elukey>	 so https://integration.wikimedia.org/ci/job/analytics-refinery-release/m2release/ gives to me 0.0.39 as release version that looks wrong
[10:54:49] <elukey>	 so I should https://integration.wikimedia.org/ci/job/analytics-refinery-release/build?delay=0sec and then redo
[10:55:01] <joal>	 That's indeed the idea elukey 
[10:55:25] <joal>	 elukey: You don't even have o wait for the build to finish, you can kill it a few seconds after it has started
[10:56:48] <elukey>	 other weird thing
[10:57:20] <elukey>	 git tag --list shows all vx.y.z but then I can see 0.0.39
[10:57:22] <elukey>	 mmm
[10:58:17] <joal>	 elukey: errors was made at deploy for version 0.039 :)
[10:58:47] <joal>	 elukey: see Change refinery-x.y.z to vx.y.z in the "SCM tag" input textbox and update the number -- We should put the 'v' in bold :)
[10:59:49] <elukey>	 yep yep
[10:59:52] <elukey>	 will do
[11:09:56] <wikibugs>	 06Analytics-Kanban: Document the difference in aggregate data on wikistats and wikistats 2.0 - https://phabricator.wikimedia.org/T150963#2802816 (10Elitre) I am afraid that the answers were not direct enough, so I am still unclear about what it is that you think you need from us and when.  I can help review spec...
[11:15:43] <moritzm>	 elukey, joal: there's a new privilege escalation vulnerability in ntfs-3g, which gets installed by Ubuntu on trusty systems. I'll simply deinstall it from the Hadoop cluster, I doubt anyone/anything uses NTFS?
[11:16:04] <joal>	 Hi moritzm
[11:16:20] <joal>	 moritzm: I don't think hadoop nodes are used for anything else than hadoop
[11:16:34] <joal>	 moritzm: clients node (stat1002,4) might though
[11:18:37] <moritzm>	 but none of the users of stat100[24] has physical access to these servers, so the typical use case of USB media is moot
[11:19:19] <moritzm>	 I'll doublecheck via salt cluster-wide
[11:19:35] <joal>	 awesome (for analytics machines, I'm pretty sure it's not used)
[11:19:47] <joal>	 for stat, don't know
[11:21:31] <moritzm>	 I doublechecked, none of stat100[24] has the fuse kernel module loaded, so it can't have been used since these were rebooted the last time
[11:21:46] <moritzm>	 I'll drop it from there as well, seems sage
[11:21:48] <moritzm>	 I'll drop it from there as well, seems safe
[11:21:52] <joal>	 thanks moritzm 
[11:24:27] <elukey>	 +1, sorry just seen the ping
[11:25:28] <HaeB>	 I have a Hive query repeatedly failing in the reduce stage with "Timed out after 600 secs" errors https://yarn.wikimedia.org/jobhistory/attempts/job_1485458133961_26593/r/FAILED
[11:25:41] <HaeB>	 any ideas on why this is happening, and how to avoid it?
[11:26:50] <HaeB>	 https://www.irccloud.com/pastebin/huI3EcJU/Hive%20query%20which%20is%20failing%20with%20timeout%20in%20reduce%20stage
[11:28:54] <joal>	 HaeB: reading
[11:29:22] <HaeB>	 it's working ok with a smaller LIMIT (e.g. 100) in the inner query, so may have something to do with the windowing (OVER...)
[11:30:20] <elukey>	 joal: updated https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Deploy/Refinery-source
[11:32:22] <joal>	 HaeB: ordering means a single reducer, so while it doesn't take a lot of parallel resource, it can possibly take a very long time :)
[11:32:53] <HaeB>	 i'm happy to wait longer than 600 seconds though ;)
[11:32:53] <joal>	 HaeB: One thing: no need to order in the inner subquery - will be done by windi
[11:33:54] <HaeB>	 ah, i thought so (that was a leftover from an earlier version)
[11:35:35] <joal>	 also, just to be sure I understand: you're willing to get page_titles with views and cumulative views for page ranks that are mutiple of 10k, and rank smaller than  100M, right?
[11:35:49] <joal>	 HaeB: --^
[11:35:55] <HaeB>	 yes
[11:35:59] <joal>	 k :)
[11:36:24] <HaeB>	 100M as hypothetical limit, IIRC it's actually less than 10M 
[11:36:46] <joal>	 probably, for a single hour
[11:37:14] <elukey>	 joal: last step for refinery source would be to run https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/build with 0.0.40
[11:37:15] <HaeB>	 no, that's the number of pages of the entire wiki
[11:37:21] <elukey>	 (without the 'v')
[11:37:43] <HaeB>	 (because of the GROUP BY page_title)
[11:37:43] <joal>	 HaeB: counting redirects?
[11:38:35] <joal>	 elukey: looks correct :)
[11:39:03] <HaeB>	 https://www.irccloud.com/pastebin/Q8B7ozrf/%23%20of%20distinct%20pages%20viewed%20on%20eswiki%20in%20December%202016
[11:39:12] <HaeB>	 joal: ^
[11:39:55] <joal>	 HaeB: running a slightly modifed version of your query (very slightly) - as expected, windowing step makes having a single reducer
[11:40:11] <joal>	 HaeB: ok, thanks for the check :)
[11:41:30] <joal>	 elukey: docs are way better than before :) I'll correct some typos, but it looks good !
[11:41:41] <joal>	 Thanks elukey for that !
[11:45:38] <joal>	 HaeB: Can you confirm issue happens on stage 3 of your request?
[11:45:52] <joal>	 HaeB: This seems to be what happens to me
[11:46:37] <HaeB>	 yes
[11:55:08] <joal>	 HaeB: looking at EXPLAIN teels us that the sorting actually happens in stage 3
[11:56:23] <HaeB>	 joal: ok - you mean the sorting for the windowing terms, right?
[11:56:36] <joal>	 correct - That's why this stage takes so long
[11:56:49] <joal>	 However there is something I don't understand in the explain statement
[11:57:23] <joal>	 HaeB: trying something slightly different to see if results happen
[11:57:39] <HaeB>	 (the inner query does not seem resource intensive, i have run for an entire year recently for two other wikis)
[11:57:50] <joal>	 HaeB: indeed
[11:59:05] <elukey>	 joal: I checked https://github.com/wikimedia/analytics-refinery/commits/master and it seems to me that we'll only need to restat the druid related coordinator after the refinery deployment right?
[11:59:22] <joal>	 elukey: checking
[11:59:55] <joal>	 elukey: yessir, looks correct to me :)
[12:00:02] <joal>	 pageview jobs (both daily and monthly
[12:01:47] <elukey>	 all right so I'll deploy from tin to stat1002, follow the instructions and then we'll restart them
[12:01:51] <elukey>	 sounds good?
[12:01:58] <elukey>	 should take 5 mins
[12:02:09] <joal>	 elukey: deploy everything alright, then restart:)
[12:07:38] <elukey>	 joal: should I run sudo -u hdfs /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run only on one host right? And it could be done after the scap deployment
[12:08:03] <elukey>	 not super clear from the docs
[12:08:20] <joal>	 elukey: indeed, this command deploys last version to HDFS - no need to do it from multiple places :)
[12:08:35] <joal>	 And it indeed should be done after scap, to copy the last deployed version
[12:17:42] <elukey>	 !log deployed Refinery via scap and then executed the hdfs copies on stat1002
[12:17:46] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:18:31] <wikibugs>	 (03PS2) 10Nschaaf: (in progress) Store sanitized data for WDQS [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335211 (https://phabricator.wikimedia.org/T146915)
[12:20:21] <elukey>	 joal: all good, the last step is to restart the coordinators
[12:20:33] <wikibugs>	 (03CR) 10Nschaaf: [C: 04-1] "I've updated the naming to better reflect what is happening to the data. Could Analytics comment on exactly what data is considered PII an" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335211 (https://phabricator.wikimedia.org/T146915) (owner: 10Nschaaf)
[12:23:50] <joal>	 elukey: let me know when you want to do so :)
[12:27:35] <elukey>	 joal: anytime, even now
[12:27:51] <joal>	 let's go :)
[12:28:29] <elukey>	 all right let me try to explain what I'd do
[12:28:54] <joal>	 HaeB: I don't have better ideas than boosting timeout :(
[12:29:10] <joal>	 HaeB: Not very good, but can't think of better solution
[12:29:19] <elukey>	 THEORETICALLY pageview-druid-monthly-coord and pageview-druid-daily-coord should be changed
[12:30:04] <joal>	 HaeB: I think this should work: SET mapreduce.task.timeout = 1800000; (1800 seconds instead of 600, could even be more)
[12:30:34] <elukey>	 ah snap https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie/Administration is super long :P
[12:31:06] <joal>	 Oh elukey, an interesting thought came to my mind: why haven't we changed jar version in refine job?
[12:32:46] <elukey>	 joal: is it a tricky question ? :D
[12:33:00] <joal>	 elukey: A bit :)
[12:33:16] <elukey>	 well you changed comments afaics
[12:33:29] <elukey>	 so theoretically it doesn't matter much to bump it
[12:33:37] <elukey>	 but the .properties has been changed
[12:33:47] <elukey>	 so the coordinator needs a restart
[12:33:47] <joal>	 elukey: correct :) We even could have not deployed refinery-source
[12:33:53] <elukey>	 I knowwww
[12:33:55] <elukey>	 :P
[12:35:24] <joal>	 elukey: My point was to make sure we did not update jar version anywhere on purpose (we sometimes forget to do it when it's needed)
[12:36:41] <elukey>	 +1
[12:38:58] <elukey>	 so joal, should I just kill the two coordinators and restart them?
[12:54:06] <joal>	 excuse me elukey, got a phone call
[12:54:24] <joal>	 elukey: That's the plan, kill, restart (with correct start time)
[13:02:52] <elukey>	 joal: me too :)
[13:03:09] <elukey>	 so pageview-druid-monthly-coord and pageview-druid-daily-coord right?
[13:04:03] <joal>	 elukey: correct
[13:12:58] <elukey>	 !log restarted pageview-druid-monthly-coord and pageview-druid-daily-coord oozie coordinators after deployment
[13:13:01] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:13:03] <elukey>	 joal: --^ 
[13:13:08] <elukey>	 hope that they are fine
[13:13:23] <elukey>	 just checked in hue and they seem good
[13:16:13] <joal>	 indeed it seems good elukey :)
[13:16:19] <joal>	 Thanks a lot elukey for having done that :)
[13:16:25] <elukey>	 awesome docs for oozie!
[13:16:42] <elukey>	 thanks for helping me, finally I managed to do my first deployment :)
[13:18:05] <joal>	 elukey: I did nothing !
[13:51:53] * elukey afk for a bit!
[13:57:35] <wikibugs>	 10Analytics, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2989663 (10Ottomata) @mforns, can you comment about large DELETES?  Do they happen often?  How large are they when it happens?  Would LOAD DATA actually help replication?
[14:01:14] <milimetric>	 halfak / joal: running 5 minutes late, bank problem
[14:02:34] <halfak>	 no worries milimetric.  I'm a little late myself. 
[14:15:00] <moritzm>	 elukey: I'm upgrading ca-certificates-java on the kafka* hosts, already done on the other jessie/java hosts, so should cause no problems
[14:15:43] <elukey>	 moritzm: sure - should we restart the brokers for openjdk later on?
[14:15:43] <moritzm>	 it was required for the openjdk-8 updates, and since kafka uses java 7 it's strictly not needed, but better to have all the versions in sync across the fleet
[14:15:50] <elukey>	 ah ok
[14:15:52] <moritzm>	 no, not needed for this one
[14:15:54] <elukey>	 already answered :)
[14:30:31] <wikibugs>	 10Analytics, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2989770 (10Marostegui) >>! In T124307#2989663, @Ottomata wrote: >  > @Marostegui , Would LOAD DATA actually help replication?  If you need to do massive data imports into the DB, it will h...
[14:36:35] * fdans goes out for some 4pm lunch
[14:38:40] <wikibugs>	 (03PS3) 10Fdans: Adds map visualizer to Dashiki [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/333922 (https://phabricator.wikimedia.org/T153921)
[14:39:02] <milimetric>	 (will review after meeting, fdans )
[14:39:13] <wikibugs>	 10Analytics, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2989845 (10Ottomata) EventLogging is a stream of data.  We can do batching because the data is consumed from Kafka, and then inserted into MySQL via a python MySQL client.  So we could con...
[14:39:15] <fdans>	 milimetric: I just need to alter the tests a bit, but it's ready for review
[14:39:21] <fdans>	 thank you :)
[14:56:08] <wikibugs>	 10Analytics, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2989873 (10Marostegui) LOAD DATA is a lot faster to bulk lots of data in the DB, there is a lot less overhead in parsing SQL statements and all the processes around that parsing.  This is...
[15:07:31] <milimetric>	 fdans: gonna review now, if you want to watch me mumble through it, I can jump in the batcave
[15:14:33] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1053 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[15:14:52] <joal>	 yeah, was about to say: cluster is under heavy pressure
[15:16:04] <joal>	 analytics 32 has weird behavior elukey: a lot of system CPU sued compared to others :(
[15:16:48] <elukey>	 checking
[15:16:53] <joal>	 thanks
[15:17:33] <joal>	 first day of the month is always a bad day for cluster ...
[15:18:07] <elukey>	 so 53 gives me OK: YARN NodeManager analytics1053.eqiad.wmnet:8041 Node-State: RUNNING
[15:18:20] <elukey>	 and I believe that it is the bug of the script checking node manager state
[15:18:29] <joal>	 k elukey 
[15:18:50] <elukey>	 WHAT
[15:18:50] <elukey>	  * Hadoop nodemanager is dead and pid file exists
[15:18:59] <elukey>	 ok now I am confused :D
[15:19:01] <elukey>	 ahahaha
[15:19:05] <joal>	 wow
[15:19:06] <elukey>	 the script is definitely weird
[15:20:06] <elukey>	 java.lang.OutOfMemoryError: Java heap space
[15:20:14] <elukey>	 this is the memleak
[15:21:10] <elukey>	 joal: I think that we need a rolling restart of the cluster :(
[15:21:19] <joal>	 elukey: :(
[15:22:45] <elukey>	 some nodes are fine, others are not - https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=17&fullscreen
[15:23:20] <joal>	 elukey: most are not
[15:23:33] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1053 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[15:23:47] <elukey>	 close to the heap limit, and these metrics are not super fine grained to show small spikes
[15:24:01] <elukey>	 to yes I need to start restarting Yarn daemons now
[15:24:03] <elukey>	 :)
[15:24:23] <joal>	 elukey: Mwarf
[15:24:38] <joal>	 elukey: Let's be carefull, there are some jobs I'd rather not kill
[15:24:50] <wikibugs>	 10Analytics, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#2989939 (10Ottomata)
[15:25:19] <elukey>	 joal: sure, the last time I tried to poke the App Masters from the Resource Manager being super careful in not killing them, but it didn't work super well
[15:25:46] <elukey>	 this mess might push us to upgrade the cluster to the new cdh
[15:25:56] <joal>	 correct
[15:27:01] <joal>	 elukey: let's give it another hour: huge monthly stuff is being absorbed
[15:27:07] <joal>	 elukey: ok for you?
[15:28:00] <elukey>	 sure
[15:28:12] <elukey>	 udo -u hdfs /usr/bin/yarn application -appStates RUNNING -list | egrep -o 'analytics10[0-9][0-9].eqiad.wmnet' | sort | uniq -c is what I used to check the appmasters
[15:28:15] <elukey>	 *sudo
[15:28:36] <joal>	 checking elukey 
[15:29:08] <chasemp>	 joal: seems like you're busy, no need to respond :) small fyi https://phabricator.wikimedia.org/T153743 is the place to watch for remaining shards on new labsdbs
[15:29:42] <joal>	 Thanks chasemp, will look after cluster gets quieter
[15:31:19] <joal>	 atually elukey, no good - my faith was incorrect
[15:48:38] <wikibugs>	 10Analytics, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#1952524 (10Nuria) @ottomata: we do not delete data from eventlogging (other than the purging that it should happen after 90 days)  the system just inserts  batches of records.
[15:53:47] <wikibugs>	 10Analytics, 10DBA, 06Operations: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#2990018 (10jcrespo) > purging that it should happen after 90 days  How do you implement purging? That surely must run deletes or some kind of updates?
[15:56:23] <joal>	 ottomata, elukey: spark is 1.6.0 in CDH 5.7+
[15:56:28] <wikibugs>	 (03CR) 10Nuria: "Maybe it is worth taking about this in person? As far as I can see there is no sanitization despite of naming.  We do not retain long term" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335211 (https://phabricator.wikimedia.org/T146915) (owner: 10Nschaaf)
[15:56:35] <joal>	 mwahh, not even bug correction version :(
[15:56:39] <wikibugs>	 (03CR) 10Nuria: [V: 04-1 C: 04-1] (in progress) Store sanitized data for WDQS [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335211 (https://phabricator.wikimedia.org/T146915) (owner: 10Nschaaf)
[16:01:46] <ottomata>	 nuria:  standup wooo
[16:01:53] <ottomata>	 mforns:  too
[16:01:53] <ottomata>	 :)
[16:02:02] <nuria>	 ottomata: indeeed!!!!
[16:02:07] <mforns>	 oooh! coming!
[16:07:42] <ottomata>	 elukey:  fyi, you asked me how: basically I do this: https://wikitech.wikimedia.org/wiki/Git-buildpackage#How_to_build_a_Python_deb_package_using_git-buildpackage
[16:07:45] <ottomata>	 for new python debs
[16:08:02] <elukey>	 thanks!
[16:09:36] <wikibugs>	 06Analytics-Kanban: Update montly 'unique computation' jobs for better resource management - https://phabricator.wikimedia.org/T156921#2990037 (10JAllemandou)
[16:10:10] <wikibugs>	 (03PS1) 10Joal: Update montly unique jobs for better resource mgt [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335459 (https://phabricator.wikimedia.org/T156921)
[16:29:18] <wikibugs>	 10Analytics: CDH upgrade. Value proposition: new spark for edit reconstruction - https://phabricator.wikimedia.org/T152714#2857818 (10Ottomata) In Analytics Ops meeting today, we decided we should upgrade to CDH 5.10 now that it is out, even though it doesn't have Spark 2.x like we had hoped.  - Mediawiki Histor...
[16:31:47] <joal>	 elukey: have you started to restart nodemanagers:
[16:31:48] <joal>	 ?
[16:32:23] <elukey>	 joal: yep, 1028/29/30 done
[16:32:37] <joal>	 hm, killed a monthly job :(
[16:32:44] <elukey>	 ah snap
[16:33:07] <elukey>	 so definitely the script does not tell us the whole picture
[16:33:15] <elukey>	 or I am missing something
[16:33:25] <joal>	 there is something I don't understand either
[16:34:43] <elukey>	 another thing that is really annoying is "nodemanager did not stop gracefully after 5 seconds: killing with kill -9"
[16:34:59] <elukey>	 IIRC I tracked down a bug that was still open for this
[16:35:35] <joal>	 mforns: right elukey 
[16:35:39] <joal>	 oop sorrry mforns 
[16:35:53] <mforns>	 hehe np :]
[16:37:40] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1045 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:39:20] <elukey>	 :(
[16:39:40] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1045 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:39:46] <joal>	 elukey: I think your script doesn't give you correct info :( another spark job I had runnign has died because of restarts
[16:39:56] <joal>	 elukey: I'll have a look at the script again
[16:41:56] <elukey>	 I am also wondering if there is a graceful restart procedure
[16:42:22] <joal>	 elukey: makes no sense why my job died if you've not killed the master :(
[16:42:29] <elukey>	 https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html
[16:43:36] <joal>	 looks good elukey!!
[16:43:55] <elukey>	 https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html is for our version
[16:44:00] <elukey>	 so it might be something to check
[16:44:01] <joal>	 elukey: my spark job died because of 4 retries over a node that was not present - makes sense
[16:45:19] <elukey>	 joal: a node that wasn't present? :O
[16:45:54] <joal>	 elukey: a node that had died
[16:45:57] <joal>	 (or restarted)
[16:47:37] <joal>	 marcel: your job dies anyway ;)
[16:47:42] <joal>	 mforns: --^
[16:47:48] <mforns>	 joal, :[
[16:47:50] <mforns>	 ok
[16:49:12] <joal>	 elukey: let me know when you're done with the NodeManagers, I'll restart jobs at that moment
[16:49:37] <joal>	 mforns: I'll let you restart your job (maybe with the patch I suggest on mappers)
[16:49:54] <mforns>	 joal, yes will do!
[16:50:00] <joal>	 mforns: not now though :)
[16:50:17] <elukey>	 joal: did only spark job failed by any chance?
[16:50:23] <mforns>	 ok
[16:50:28] <joal>	 elukey: nope, others too
[16:50:50] <joal>	 don't bother elukey, let's move on with restarting everything, I'll restart what's needed
[16:50:58] <elukey>	 because I can see application_1480065021448_201730BannerImpressionsStream               SPARK      joal root.default           RUNNING         UNDEFINED            10%            http://10.64.5.104:4040
[16:51:10] <joal>	 elukey: you can kill it, I'll restart it
[16:51:14] <elukey>	 and the IP is stat1004
[16:51:24] <joal>	 elukey: fun !
[16:51:30] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1054 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:51:35] <joal>	 elukey: Please go ahead, we'll see if it dies :)
[16:52:30] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1054 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:54:50] <elukey>	 weird I can see that your spark job is running on 1035
[16:54:51] <elukey>	 mmmm
[16:56:20] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1042 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:58:20] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1042 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[17:02:43] <wikibugs>	 (03PS2) 10Joal: Update montly unique jobs for better resource mgt [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335459 (https://phabricator.wikimedia.org/T156921)
[17:02:58] <joal>	 mforns: updated the patch with some delay in coordinator, please have a look --^
[17:03:13] <mforns>	 joal, oh! thanks :]
[17:03:26] <joal>	 mforns: so, 2 things: start mappers only, + delay
[17:03:33] <mforns>	 ok
[17:03:51] <joal>	 mforns: I think the banners job could be delayed by a day - data is present thanks to daily jobs, so no real rush on that one, agreed?
[17:04:21] <mforns>	 joal, I think it could be delayed by more days, like 5?
[17:04:36] <joal>	 mforns: if you want, no big deal
[17:05:30] <joal>	 elukey: doing good on restarts?
[17:07:01] <elukey>	 joal: so I am missing 1035 (that is running your banner impression job), 1040 and 1052->1055
[17:07:04] <elukey>	 the rest is done
[17:07:44] <joal>	 elukey: great - please go ahead with everything, don't bother about jobs anymore (let's do it, then restart - done :)
[17:10:48] <wikibugs>	 06Analytics-Kanban, 10DBA, 13Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2990284 (10Nuria)
[17:11:21] <wikibugs>	 10Analytics, 10DBA, 13Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1532166 (10Nuria)
[17:14:00] <wikibugs>	 06Analytics-Kanban: Investigate if Node Managers can be restarted without impacting running containers - https://phabricator.wikimedia.org/T156932#2990310 (10elukey)
[17:14:19] <wikibugs>	 10Analytics-Cluster, 06Analytics-Kanban: Investigate if Node Managers can be restarted without impacting running containers - https://phabricator.wikimedia.org/T156932#2990323 (10elukey) p:05Triage>03High
[17:14:22] <elukey>	 joal: --^
[17:14:28] <elukey>	 all restarts are done
[17:14:51] <wikibugs>	 10Analytics: Create purging script for analytics-slave data - https://phabricator.wikimedia.org/T156933#2990326 (10Nuria)
[17:15:01] <elukey>	 oozie is complaining a lot
[17:15:33] <ottomata>	 milimetric:  which db hosts are these two?
[17:15:34] <ottomata>	 https://phabricator.wikimedia.org/T156844
[17:15:39] <ottomata>	 or
[17:15:41] <ottomata>	 mforns: db1046, db1047
[17:15:42] <ottomata>	 ?
[17:15:58] <mforns>	 ottomata, ???
[17:16:02] <elukey>	 joal: can I restart some of the jobs?
[17:16:05] <elukey>	 or are you doing it?
[17:16:12] <ottomata>	 analytics-store, etc. are special names
[17:16:18] <ottomata>	 if you don't know i'll look
[17:16:41] <mforns>	 ottomata, I don't know... can look as well
[17:17:04] <mforns>	 db1046 is m2-master
[17:17:33] <ottomata>	 that is an analytics mysql box?
[17:18:07] <wikibugs>	 10Analytics, 10DBA, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2990353 (10Ottomata)
[17:19:15] <mforns>	 db1047 is a slave
[17:19:19] <mforns>	 ottomata, not sure
[17:19:31] <ottomata>	 k
[17:19:40] <wikibugs>	 10Analytics, 10DBA, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2987618 (10Ottomata) @Marostegui, @jcrespo, we talked about this today.  What is your timeline for replacing these boxes?  We want to try to ween people off of EventLogging My...
[17:19:43] <joal>	 sorry got hooked in baby mode
[17:20:07] <joal>	 elukey: I'll do it (I'd like to patch some if mforns and nuria agree for merging)
[17:21:00] <elukey>	 super
[17:21:02] <mforns>	 joal, sure I'm reviewing it
[17:23:20] <mforns>	 joal, I see, so you make the job depend on other datasets, that's clever
[17:23:43] <mforns>	 now I'm not sure if you said you were going to modify the banner job or I can do it?
[17:23:50] <joal>	 mforns: It's actually the same dataset, but different dependency :)
[17:23:58] <joal>	 mforns: please go for it
[17:24:03] <mforns>	 joal, ok
[17:24:38] <joal>	 mforns: this trick has been first by qchris__, we owe him a lot :)
[17:24:54] <mforns>	 :] I see
[17:27:06] <joal>	 !log Restarting 2 webrequest-load text jobs that failed during NM restart (2016-02-01T11:00 and T13:00)
[17:27:09] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:27:10] <stashbot>	 T13: Plan to migrate everything to Phabricator - https://phabricator.wikimedia.org/T13
[17:27:43] <joal>	 hehe, managed to confuse stashbot :)
[17:28:14] <joal>	 mforns: if you think the patch for monthly is good enough, let's merge, I'll deploy and restart jobs
[17:28:22] <wikibugs>	 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 13Patch-For-Review, 06Services (watching): Prepare eventstreams (with KafkaSSE) for deployment - https://phabricator.wikimedia.org/T148779#2990398 (10Nuria)
[17:28:26] <wikibugs>	 06Analytics-Kanban, 10EventBus, 06Operations, 10Traffic, and 2 others: Productionize and deploy Public EventStreams - https://phabricator.wikimedia.org/T143925#2990397 (10Nuria) 05Open>03Resolved
[17:28:39] <wikibugs>	 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 07Documentation: EventStreams documentation - https://phabricator.wikimedia.org/T153117#2990399 (10Nuria) 05Open>03Resolved
[17:28:46] <wikibugs>	 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 06Services (watching), 15User-mobrovac: Public Event Streams - https://phabricator.wikimedia.org/T130651#2990400 (10Nuria)
[17:28:54] <wikibugs>	 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 13Patch-For-Review, and 2 others: RecentChanges in Kafka - https://phabricator.wikimedia.org/T152030#2990427 (10Nuria) 05Open>03Resolved
[17:28:57] <wikibugs>	 06Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 06Services (watching), 15User-mobrovac: Public Event Streams - https://phabricator.wikimedia.org/T130651#2611852 (10Nuria)
[17:29:08] <wikibugs>	 06Analytics-Kanban: Pageview Jobs: Make workflows easier to maintain using a variable instead of repeating some complex value accross the files - https://phabricator.wikimedia.org/T156668#2990430 (10Nuria) 05Open>03Resolved
[17:29:22] <wikibugs>	 06Analytics-Kanban, 13Patch-For-Review: Follow naming convention on druid jobs: ts for long unix timestamps, dt for ISO. - https://phabricator.wikimedia.org/T156170#2990432 (10Nuria) 05Open>03Resolved
[17:29:33] <wikibugs>	 06Analytics-Kanban, 13Patch-For-Review: Better explanation on pageview definition for edit actions - https://phabricator.wikimedia.org/T156629#2990433 (10Nuria) 05Open>03Resolved
[17:29:45] <wikibugs>	 06Analytics-Kanban, 13Patch-For-Review: Improve AQS deployment - https://phabricator.wikimedia.org/T156049#2990437 (10Nuria) 05Open>03Resolved
[17:30:23] <HaeB>	 joal: thanks, will try that out
[17:30:41] <joal>	 no prob HaeB, sorry for not having something better
[17:30:48] <wikibugs>	 06Analytics-Kanban, 06Operations, 10ops-eqiad, 13Patch-For-Review, 15User-Elukey: rack and set up aqs100[7-9] - https://phabricator.wikimedia.org/T155654#2990439 (10Nuria)
[17:30:51] <wikibugs>	 10Analytics: Add hardware capacity to AQS - https://phabricator.wikimedia.org/T144833#2990438 (10Nuria)
[17:32:00] <mforns>	 joal, you mean banner monthly? or monthly uniques?
[17:32:27] <joal>	 mforns: was thinking on monthly uniques
[17:33:12] <wikibugs>	 (03CR) 10Mforns: [C: 032] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335459 (https://phabricator.wikimedia.org/T156921) (owner: 10Joal)
[17:33:21] <mforns>	 joal, ^
[17:33:28] <joal>	 thanks mforns 
[17:41:55] <wikibugs>	 10Analytics, 10DBA, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2990444 (10jcrespo) > What is your timeline for replacing these boxes?  The constraint, more than the decommission, is the budget for replacements. I do not know what is the d...
[17:44:45] <wikibugs>	 10Analytics, 10DBA, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2990448 (10jcrespo) > What is your timeline for replacing these boxes?  BTW, I forgot to answer literally your question, the deadline for replacement is January 2014 (not a ty...
[17:55:54] <elukey>	 all right going afk team!
[17:56:00] <elukey>	 talk with you tomorrow :)
[17:56:01] <elukey>	 byeee
[18:14:59] <wikibugs>	 10Analytics, 10Pageviews-API: Pageview API: Better filtering of bot traffic on top enpoints - https://phabricator.wikimedia.org/T123442#2990549 (10MusikAnimal) I have a very buggy new version of Topviews that I'm working on that shows the percentage of mobile views each page receives. See http://tools.wmflabs....
[18:18:06] <joal>	 bearloga: Hi !
[18:44:49] <wikibugs>	 (03CR) 10Joal: [V: 032] "Merging to deploy." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335459 (https://phabricator.wikimedia.org/T156921) (owner: 10Joal)
[18:47:14] <joal>	 !log Deploy refinery for uniques monthly patches
[18:47:17] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:01:00] <joal>	 !log Killed-Restarted Mobile apps Uniques monthly jobs to pick up new config - 0096638-161121120201437-oozie-oozi-C
[19:01:01] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:07:16] <bearloga>	 joal: hi! Good morning! Sorry I missed your ping earlier. Are you still here? What's up?
[19:07:26] <joal>	 hey bearloga :)
[19:08:19] <joal>	 bearloga: I was willing to spend some time with you discussing if productionisation could be interesting for some of your hive queries
[19:08:31] <joal>	 bearloga: now might not be the best moment though :)
[19:08:49] <joal>	 And you possibly have already discussed this with mforns and milimetric
[19:10:36] <bearloga>	 joal: Maybe? Depends which queries. I'm currently testing our Reportupdater-based code base which has a lot of Hive queries.
[19:10:46] <joal>	 ah :)
[19:11:12] <joal>	 bearloga: I'm asking since I've noticed you are a 'regular' user of the cluster ;)
[19:12:43] <bearloga>	 I certainly hope I'm a "regular" user of the cluster! ;D analyzing data is literally my primary job :P
[19:12:53] <joal>	 hehe :)
[19:14:18] <joal>	 If you;re in the process of report-updater, it's definitely already on the move :)
[19:14:24] <joal>	 Thanks for ligh
[19:14:48] <joal>	 thanks for lighting some of my bulbs :)
[19:15:57] <joal>	 Taking my leave now, see y'all a-team and others tomorrow :)
[19:16:04] <bearloga>	 joal: Have fun! :)
[19:45:29] <nuria>	 bearloga: jajaja
[19:47:10] <nuria>	 ottomata|afk: milimetric will be able to mmake data mapping meeting, feel free to skip if you have other important things to attend to
[19:48:21] <wikibugs>	 10Analytics, 10Wikimedia-General-or-Unknown: Browser and platform stats for logged-in vs. anon users for security and product support decisions - https://phabricator.wikimedia.org/T58575#2991001 (10Nuria) > as a) browser support is likely to differ between anonymous & authenticated users,  Browser stats for th...
[19:50:12] <milimetric>	 (I'll be there nuria)
[19:54:35] <wikibugs>	 10Analytics, 10Wikimedia-General-or-Unknown: Browser and platform stats for logged-in vs. anon users for security and product support decisions - https://phabricator.wikimedia.org/T58575#2991043 (10Nuria) Correcting last post. From data on hadoop we cannot differentiate between logged in and not logged in user...
[19:58:25] <nuria>	 bearloga: is there an e-mail list for analysts at wmf?
[19:58:54] <bearloga>	 nuria: not to my knowledge, no
[20:01:49] <wikibugs>	 10Analytics-EventLogging, 06Analytics-Kanban, 13Patch-For-Review: Change userAgent field to user_agent_map in EventCapsule - https://phabricator.wikimedia.org/T153207#2991061 (10Nuria)
[20:29:24] <ottomata>	 phew oh man
[20:29:41] <ottomata>	 i locked my keys in my car, luckily a nice dude at the storage/parking place had some serious wire hanger skills
[20:44:42] <leila>	 ottomata: I need your help. :D
[20:45:21] <ottomata>	 leila:  hiii
[20:45:23] <ottomata>	 in a meeting
[20:45:25] <ottomata>	 but, what's up?
[20:45:40] <leila>	 we're choosing a name for a domain that will be used to serve recommendation APIs that we offer: think readMore recommendations (for readers), recommendation about what article to create, what article to translate, which hyperlinks to add to articles, which articles to expand, etc. 
[20:45:46] <leila>	 my question is: what should be the name?
[20:45:47] <leila>	 :D
[20:45:57] <leila>	 you can even say it https://phabricator.wikimedia.org/T147420 ottomata
[20:46:48] <ottomata>	 OH I LOVE NAMING
[20:46:48] <ottomata>	 hehehhe
[21:01:18] <leila>	 ottomata: I don't ask you a task that you don't love. :D
[21:02:56] <wikibugs>	 10Analytics: Remove user_agent_map from pageview_hourly long term - https://phabricator.wikimedia.org/T156965#2991318 (10Nuria)
[21:07:14] <HaeB>	 joal: BTW, interesting to see https://gerrit.wikimedia.org/r/#/c/335459/ - are these perfomance tricks something that we could potentially also use in the other last-access queries we discussed with zareen recently?
[21:07:35] <wikibugs>	 10Analytics: Remove user_agent_map from pageview_hourly long term - https://phabricator.wikimedia.org/T156965#2991322 (10Nuria) Browser data is been useful to many teams on druid.  - For detail data we can delete after 90 days - We can load (to see browser trends) our browser dataset over time
[21:08:33] <nuria>	 leila: "RECOMENDATOR"....obviously
[21:08:40] <leila>	 nuria: :D
[21:10:00] <nuria>	 HaeB: delay is just so two big jobs do not run at the same time, it doesn't make resources available
[21:10:54] <nuria>	 HaeB: for the other setting i am not sure
[21:10:59] <HaeB>	 nuria: i was more thinking about the other part (SET mapreduce.job.reduce.slowstart.completedmaps=0.99;) 
[21:12:23] <nuria>	 HaeB: we can consult with joal tomorrow those are useful if there is sorting 
[21:12:57] <HaeB>	 yeah, not urgent
[21:44:30] <wikibugs>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 2 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#2991436 (10Jdlrobson) I'm not sure whether intentional is the right word.. but it's something I observed....
[21:50:33] <wikibugs>	 10Analytics, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#2991446 (10Ottomata)
[21:59:55] <wikibugs>	 10Analytics, 10Pageviews-API: Pageview API: Better filtering of bot traffic on top enpoints - https://phabricator.wikimedia.org/T123442#2991467 (10Milimetric) That's great insight, thank you @MusikAnimal
[22:05:15] <wikibugs>	 10Analytics, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#2991493 (10Ottomata)
[22:05:17] <wikibugs>	 06Analytics-Kanban: Document the difference in aggregate data on wikistats and wikistats 2.0 - https://phabricator.wikimedia.org/T150963#2991494 (10Milimetric) Maybe a meeting would be easier then?  Maybe our request is just getting lost in too much documentation?
[22:12:20] <wikibugs>	 10Analytics-Dashiki, 06Analytics-Kanban, 13Patch-For-Review: Add extension and category  (ala Eventlogging) for DashikiConfigs - https://phabricator.wikimedia.org/T125403#1986718 (10Milimetric) After a positive discussion on meta's Babel page, created T156971 to track deployment to prod.
[22:15:00] <wikibugs>	 10Analytics, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#2991532 (10Ottomata)
[22:16:13] <wikibugs>	 10Analytics, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#2989939 (10Ottomata)
[22:23:05] <wikibugs>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 2 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#2991561 (10mobrovac) Adding the extra parameter shouldn't be a problem from the caching perspective, as we...
[22:28:11] <wikibugs>	 (03CR) 10Milimetric: [C: 04-1] "cool, looks good.  Couple of nits and one or two ideas." (0312 comments) [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/333922 (https://phabricator.wikimedia.org/T153921) (owner: 10Fdans)
[22:33:29] <wikibugs>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 2 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#2991592 (10Jdlrobson) Clients wouldn't specify the value of max_age in config. They would specify a period...
[22:35:48] <wikibugs>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 2 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#2991599 (10mobrovac) Yup, lapsus linguae, but the concern still stands.
[22:41:12] <wikibugs>	 (03CR) 10Ottomata: "Hm, you have external Hive partitions on this data, right?  Are you sure you don't need to drop those too?" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/335158 (owner: 10EBernhardson)
[22:51:16] <wikibugs>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 2 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#2991693 (10Jdlrobson) Right now I envisioned this as an integer but we could use an enumerator instead to...
[23:10:11] <wikibugs>	 10Analytics, 10EventBus, 10Reading-Web-Trending-Service, 13Patch-For-Review, and 2 others: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411#2991716 (10mobrovac) Fixed integers in the (0,24] range would be better. the more options we introduce, th...
[23:44:19] <wikibugs>	 10Analytics, 06Reading-analysis, 06Research-and-Data, 10Research-consulting: Propose metrics along with qualifiers for the press kit - https://phabricator.wikimedia.org/T144639#2991803 (10Neil_P._Quinn_WMF)