[00:01:23] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3141307 (Pchelolo) >>! In T161731#3814596, @Smalyshev wrote: > @Ottomata thanks, I can connect to the hosts above, but still not sure how to control...
[00:23:20] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3814801 (Smalyshev) @Pchelolo thanks for the pointer, this is very helpful! Indeed, kafkacat for example supports it [[ https://github.com/edenhill...
[00:35:12] Analytics, Services (watching): Update to latest kafkacat - https://phabricator.wikimedia.org/T182163#3814846 (Pchelolo)
[00:35:29] Analytics, Services (watching): Update to latest kafkacat - https://phabricator.wikimedia.org/T182163#3814862 (Pchelolo)
[00:36:36] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3814868 (Pchelolo) Indeed. Created T182163 for updating `kafkacat`, but it is a bit more complex than simply installing a new version - it depends on ne...
[00:47:54] Analytics, Services (watching): Update to latest kafkacat - https://phabricator.wikimedia.org/T182163#3814898 (Smalyshev) Looks like stretch-backports [[ https://packages.debian.org/stretch-backports/librdkafka1 | has 0.11 ]], and there's a 1.3.1 package [[ https://packages.debian.org/buster/kafkacat | fo...
[00:48:34] Analytics, EventBus, MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3814900 (Pchelolo) After the deployment of the above patches, we haven't seen any `Attribute...
[09:31:54] Heya elukey - Let me know when you want to talk about streaming-jobs in puppet
[09:32:24] joal: o/
[09:32:34] do you mean your code review?
[09:33:29] Yessir
[09:37:28] so I added some suggestions to the code review
[09:37:40] let me explain the idea in here so we can chat about it
[09:38:39] we are transitioning from a model in which a host in site.pp holds multiple roles to a new one in which every host holds one single role, which in turn includes profiles
[09:39:12] profiles are basically what we used as "roles" before, stored in a different path (modules -> profiles -> manifests -> etc.)
[09:39:49] profiles should get their parameters from hiera, but it is not a problem to have defaults
[09:40:23] elukey: I think I understand
[09:40:27] in this case, you probably want to run the check only on analytics1003 right?
[09:40:34] elukey: correct !
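(An aside on the kafkacat thread at the top of the log: the exact feature Smalyshev references is truncated here, but it is presumably librdkafka's timestamp-based offset lookup, which newer kafkacat exposes via -o s@<timestamp> and which would explain the upgrade request in T182163. A minimal sketch under that assumption, with illustrative broker and topic names:

    # Consume starting from a wall-clock moment; -o s@<ms since epoch> asks
    # the broker for the first offset at/after that timestamp. Needs a recent
    # kafkacat (1.x) built against a librdkafka with offsetsForTimes support.
    kafkacat -C -b kafka1012.eqiad.wmnet:9092 \
        -t eqiad.mediawiki.revision-create \
        -o s@1512518400000    # 2017-12-06T00:00:00Z

This lets a consumer resume a change stream from "roughly when I last looked" instead of tracking exact offsets.)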
[09:40:51] elukey: I basically copy-pasted from existing crons
[09:40:56] if you check site.pp now that host runs analytics_cluster::coordinator
[09:41:10] elukey: I hear that this is going to change
[09:41:11] (I know that name might not be the best but I had to pick one)
[09:41:28] maybe the name of the role, but not the role itself
[09:41:46] we can call it "batman" or whatever, but the point is that we'll need a single role for each host :)
[09:42:15] I do understand - I'd actually find it funnier to name analytics1003 batman
[09:42:18] :)
[09:42:42] Sorry, I got distracted :)
[09:43:59] I would love to call it batman but I can't sneak in such a name in site.pp :D
[09:44:05] :D
[09:44:13] there are watchers ;)
[09:44:27] actually it would make sense, since it is our dark knight checking the cluster and establishing order
[09:44:30] Actually, given we are the a-team, could be named Looping
[09:48:22] joal: as an example, this is a refactoring that I should merge soonish https://gerrit.wikimedia.org/r/#/c/395527/
[09:48:30] to move oozie/hive roles to profiles
[09:48:34] just to give you an idea
[09:52:08] elukey: ok, will try to follow the pattern
[09:52:39] elukey: apart from that code-architecture aspect, does the idea of making a temporary HTTP call to druid work for you?
[09:56:15] joal: it might be good as a temporary measure, but I don't like it as a generic alarming procedure for druid to be honest
[09:57:07] makes sense elukey - we should go for realtime prometheus metrics - but I don't like it because it's more work on you
[09:58:06] we could also alarm if the streaming job fails for some reason
[09:58:24] elukey: thing is, the streaming job doesn't fail
[09:58:36] even if no task is created on druid
[09:58:58] so it is tranquility acting weirdly?
[09:59:12] I'm not sure if it is weird or expected actually
[09:59:32] But yes, tranquility continues to work even if no task exists on druid
[10:00:39] well if no task is present in druid it seems to me that tranquility is not really doing its job no?
[10:00:51] I agree
[10:01:19] the sad part is that we don't have logs (or maybe we don't have a procedure to retrieve them yet)
[10:02:01] elukey: the app is running in yarn, so as long as the app runs, logs are only available on every machine, not aggregated on HDFS
[10:02:28] yep this is why I mentioned that we don't have a procedure
[10:02:42] it is a bit cumbersome to find those logs iirc
[10:03:02] It is
[10:03:27] I am wondering if we should have logs in logstash
[10:03:58] elukey: I think it's too big of a burden given we have them in HDFS
[10:04:07] this streaming case is a very specific one
[10:06:05] I am checking http://druid.io/docs/0.9.2/operations/metrics.html
[10:08:15] joal: are the banner jobs down again?
[10:09:20] elukey: depends what you mean by down - the job runs, but it is in a status that prevents it from working
[10:09:27] I have no clue what's happening :(
[10:10:16] elukey: There are some spark tasks that don't finish (with micro-batches of one minute, some have been running for 7h)
[10:11:02] Those tasks are on the same host: analytics1033
[10:11:05] maybe tranquility doesn't like it much when we reboot nodes while it operates
[10:11:37] but iirc you managed to restart realtime indexing yesterday right?
[10:11:56] and then it stopped again 7 hours ago? Did I get it correctly?
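(For readers following along, here is a minimal sketch of the role/profile layout elukey describes above; the class names, parameter and hiera key are illustrative, not the actual WMF manifests, while role() is the real site.pp helper in WMF puppet:

    # modules/profile/manifests/analytics/cluster/streaming_check.pp
    # Profiles take their parameters from hiera; a default is fine too.
    class profile::analytics::cluster::streaming_check (
        $contact_group = hiera('profile::analytics::cluster::streaming_check::contact_group', 'analytics'),
    ) {
        # cron / monitoring resources for the streaming-job check go here
    }

    # modules/role/manifests/analytics_cluster/coordinator.pp
    # The one role a host gets; it does nothing but include profiles.
    class role::analytics_cluster::coordinator {
        include ::profile::analytics::cluster::streaming_check
        # ... other profiles (hive, oozie, ...)
    }

    # manifests/site.pp -- one single role per host
    node 'analytics1003.eqiad.wmnet' {
        role(analytics_cluster::coordinator)
    }

The point of the pattern is that hosts differ only by which single role they get, and roles differ only by which profiles they include.)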
[10:11:57] I did restart yesterday
[10:13:52] elukey: I don't understand :( When looking at logs of the tasks that are still running, nothing special is mentioned
[10:19:34] elukey: I'll kill / restart the streaming job, but that is not satisfactory at all :(
[10:24:06] weird..
[10:24:25] elukey: looking at logs from the yarn UI with an ssh tunnel, I see some errors happened, but the job recovered fine, and then at one point, after the same error again, it didn't recover
[10:24:59] ah is there a yarn ui for each node manager?
[10:26:39] elukey: batcave?
[10:27:02] joal: need to fix one thing in puppet but I'll come in a sec!
[10:32:03] joal: I am in the cave
[10:52:03] joal: network issues?
[10:52:36] in the meantime I am trying to figure out how to add RealtimeMetricsMonitor
[10:52:50] Back in da cave elukey - using ff
[11:21:31] joal: do you mind if I drain the middle manager on druid1001 and then restart it with the RealtimeMonitoring config?
[11:21:37] to test if my understanding is right
[11:21:39] please go elukey
[11:22:27] !log disabled temporarily druid's middlemanager on druid1001 to test the Real Time monitor setting
[11:22:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:35:17] hiii team
[11:35:26] Hi mforns :)
[11:35:42] hey joal :]
[11:36:07] What's up mforns? Is India as welcoming as expected?
[11:59:53] joal: found this in the middlemanager's metrics log even before activating the realtime monitoring
[11:59:56] Event [{"feed":"metrics","timestamp":"2017-12-06T11:56:14.745Z","service":"druid/peon","host":"druid1001.eqiad.wmnet:8101","metric":"ingest/events/processed","value":153,"dataSource":"banner_activity_minutely","taskId":["index_realtime_banner_activity_minutely_2017-12-06T11:00:00.000Z_2_0"]}]
[12:00:16] Yay !
[12:02:28] it also has other metrics like Event [{"feed":"metrics","timestamp":"2017-12-05T18:12:27.073Z","service":"druid/peon","host":"druid1001.eqiad.wmnet:8100","metric":"query/segment/time","value":5,"dataSource":"banner_activity_minutely","duration":"PT720S"
[12:02:46] that are historical ones afaics, but logged by the realtime indexer?
[12:03:33] elukey: realtime nodes are queryable, so there could be the same metrics as in historical
[12:03:47] ah this part is new to me
[12:04:16] how come they are queryable??
[12:13:22] elukey: in order for realtime data to be queryable, well, realtime nodes are queryable :)
[12:14:05] Queries on a realtime node are only on currently-indexing segments (for us, an hour of data)
[12:15:36] for the current day's previous hours, segments are sent to historical, but for the currently-indexing hour, data is queryable on the realtime node
[12:15:42] elukey: --^
[12:15:45] Makes sense
[12:15:52] ah so they are able to query that data, before it gets handed off to historicals
[12:15:59] indeed
[12:16:00] *to be queryable for
[12:17:14] sorry for the orthography :(
[12:21:34] joal: I have a patch + tests that seem to work for the druid_exporter :)
[12:21:47] Man - You're fast :)
[12:31:24] joal, sorry missed your message, India has always been welcoming for us :] although always something unexpected, hehe
[12:39:54] (PS3) Joal: Update mediawiki-history-reduced job [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504)
[12:40:34] joal: code reviews out! Need to better test it but it should be ok
[12:40:43] https://gerrit.wikimedia.org/r/395727
[12:40:55] also removed an unused feature https://gerrit.wikimedia.org/r/395728
[12:42:43] Thanks a lot elukey! This will allow me to remove my temp part in my patch
[12:43:10] elukey: Currently putting the last effort into wks2 data, I'll probably modify the patch later on tonight
[12:51:27] super
[12:51:40] I also added support for query/time|bytes for the peons
[12:54:30] :D
[12:56:28] all right, now that I think about it, it is 2PM and I haven't get lunch :D
[12:56:32] *got
[12:56:40] * elukey lunch!
[13:05:17] (PS1) Joal: Update mediawiki-history endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/395732 (https://phabricator.wikimedia.org/T178478)
[13:05:26] milimetric: --^ when you have time :)
[13:05:34] Gone for my break now a-team
[13:05:51] will do
[13:06:22] milimetric: The last patch for mediawiki-history-reduced is also good to go I think
[13:06:34] milimetric: https://gerrit.wikimedia.org/r/389496
[13:06:53] great
[13:07:00] I left a bug in computing reduced, currently reindexing - with the AQS patch, I think we'll be good
[13:07:12] gone now ;)
[13:36:50] fdans, I managed to build the file with edits per wiki for all wikis on 2015-05-05
[13:36:55] sent you an email with it
[13:51:56] mforns: wonderful, I got the flattened per-country data, I'm going to try and join them
[13:52:30] fdans, I'm trying to transform db names into project names in an easy way now
[13:54:38] mforns: you didn't send me an email in the end right?
[13:54:49] mmm yes...
[13:54:50] wai
[13:54:52] t
[13:55:17] fdans, yes I did! :o
[13:56:48] (CR) Fdans: [V: 2 C: 2] "Great change! We were discussing just this the other day during standup, but we were unaware that our current CI infrastructure had Firefo" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/395537 (owner: Hashar)
[13:57:39] mf wow this is weird
[13:58:14] fdans, sending it again
[13:58:19] nonono i have it
[13:58:26] but look:
[13:58:42] https://usercontent.irccloud-cdn.com/file/HAAIG8IM/Screen%20Shot%202017-12-06%20at%2014.57.53.png
[13:58:45] (now)
[13:58:53] https://usercontent.irccloud-cdn.com/file/hfGU6Wsk/Screen%20Shot%202017-12-06%20at%2014.58.01.png
[13:59:04] (same email, also now)
[13:59:40] ???
[13:59:43] don't get it
[14:00:39] mforns: the times diverge even though they're the same email
[14:00:47] fdans, oh ok
[14:01:05] must be some time zone bug
[14:02:34] mforns: imma run your data through the sitematrix before joining 💤 💤 💤 💤
[14:03:19] fdans, yes I was trying that, how would you do that?
[14:03:52] mforns: we can do it together in the cave
[14:03:58] k! omw
[14:33:12] joal: almost ready to reboot an1003
[14:42:55] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3816132 (Ottomata) You can easily '[[ https://github.com/edenhill/kafkacat#quick-build | quickbuild ]]' kafkacat with a statically linked librdkafka....
[14:43:17] (CR) Milimetric: "I'm just not remembering what we talked about with the redirects, everything else looks good." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504) (owner: Joal)
[14:47:54] (CR) Milimetric: Update mediawiki-history-reduced job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504) (owner: Joal)
[14:48:35] PROBLEM - Oozie Server on analytics1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.catalina.startup.Bootstrap
[14:49:22] what the hell, oozie auto-restarts?
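(An aside on the dbname-to-project mapping mforns and fdans were working through above: the public sitematrix API exposes exactly that join. A minimal sketch, assuming the requests library; the output format and the User-Agent string are illustrative, not their actual script:

    import requests

    # Fetch the sitematrix: every wiki with its dbname, URL and family.
    matrix = requests.get(
        'https://meta.wikimedia.org/w/api.php',
        params={'action': 'sitematrix', 'format': 'json'},
        headers={'User-Agent': 'dbname-mapper-sketch'},
    ).json()['sitematrix']

    dbname_to_project = {}
    for key, group in matrix.items():
        if key == 'count':
            continue
        # Language groups keep their wikis under 'site'; the 'specials'
        # group (commons, metawiki, ...) is already a plain list of sites.
        sites = group if key == 'specials' else group.get('site', [])
        for site in sites:
            # e.g. 'enwiki' -> 'en.wikipedia.org'
            dbname_to_project[site['dbname']] = site['url'].replace('https://', '')

    print(dbname_to_project['enwiki'])

One fetch gives the whole mapping, so the per-country data can be joined offline without hitting the API per wiki.)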
[14:50:09] (CR) Milimetric: "looks great, you can +2 whenever you're ready, I'm scared to :)" [analytics/aqs] - https://gerrit.wikimedia.org/r/395732 (https://phabricator.wikimedia.org/T178478) (owner: Joal)
[14:51:33] ah snap it does
[14:54:47] of course hue didn't like it
[14:55:44] !log oozie server accidentally restarted due to a puppet change (the service auto-restarts)
[14:55:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:55:55] !log restart hue to pick up the new oozie server
[14:56:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:57:56] PROBLEM - Hue Server on thorium is CRITICAL: PROCS CRITICAL: 2 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[14:59:14] ok this is weird
[14:59:56] RECOVERY - Hue Server on thorium is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[15:00:48] so oozie is running, but port 11000 on an1003 is not good
[15:00:48] hiiiii
[15:00:49] :)
[15:01:08] hiiiii
[15:01:34] oozie auto restarts?!
[15:01:37] but whY?!
[15:01:42] why would I do that? I wouldn't do that!
[15:02:02] subscribe => [
[15:02:02]     File["${config_directory}/oozie-site.xml"],
[15:02:02]     File["${config_directory}/oozie-env.sh"]
[15:02:02] ],
[15:02:05] what the heckaayy why
[15:02:06] haha
[15:02:43] I think it is still not up
[15:03:12] it's just always been that way... weird past me
[15:03:15] why did you do that?!
[15:03:45] what do you mean?
[15:03:58] ahhh sorry
[15:04:03] didn't read all :D
[15:04:28] I'm talking to the past me who did that!
[15:05:00] ahhh FATAL ERROR in native method: processing of -javaagent failed
[15:05:14] Caused by: java.lang.IllegalArgumentException: Collector already registered that provides name: jmx_scrape_duration_seconds
[15:05:56] oooo
[15:06:11] first time that happens, rolling back manually and then puppet
[15:08:35] RECOVERY - Oozie Server on analytics1003 is OK: PROCS OK: 1 process with command name java, args org.apache.catalina.startup.Bootstrap
[15:08:47] hello oozie
[15:17:52] ottomata: https://gerrit.wikimedia.org/r/#/c/395746/1/manifests/oozie/server.pp - wdyt?
[15:20:21] ah there you go https://github.com/prometheus/client_java/issues/279
[15:21:12] or something similar
[15:21:20] elukey: move them to require instead of subscribe
[15:21:45] sure, makes sense
[15:21:58] elukey: Heya I'm back if needed
[15:22:04] done ottomata
[15:22:19] elukey: You got ottomata - you're actually way safer than with me :)
[15:29:44] joal, ottomata - ops sync in batcave-2, fdans stole the cave! :P
[15:33:36] Analytics, Analytics-Wikimetrics, Software-Licensing: Add a license file to wikimetrics - https://phabricator.wikimedia.org/T60753#3816306 (MarcoAurelio)
[15:57:13] elukey: [translated from Odia] Wiktionary is a sister project of Wikipedia. The Odia Wiktionary currently has 1,08,590 words. These words exist in Odia as well as many other languages, but their meanings and explanations are in Odia. For example: Odia meanings of Odia words; Odia meanings of Bengali words; Odia meanings of English words, and so on.
[15:57:14] This dictionary is mainly built by volunteers. Anyone can edit it and use it. No fee has to be paid for this.
[15:57:55] whhhaaaattt
[16:00:42] oh a-team, I think nuria and I have another meeting scheduled right now for lawsuit stuff
[16:00:43] ?
[16:01:03] Francisco will take over
[16:01:05] :D
[16:45:57] hi all. soo, do we have a ticket somewhere that you know of about providing HTML dumps, on top of the XML dumps?
[16:46:52] we're running into a problem for quite a few projects where we need the HTML version of the files, and the workaround is to hit the API, which is not really good for the servers and for us.
[16:50:25] Hi leila - I've not heard of that need before
[16:51:08] leila: we know there are some interesting cases for HTML, but those dumps would be bigger than the text ones
[16:52:14] got it joal. let me create a ticket for this and expand on it. This comes up over and over in our interactions with outside researchers and the work we do. And I just realized we have never created a ticket for this. ;)
[16:53:57] hi! just a quick check, these are also maintenance outages? Thanks!!!!! :D https://goo.gl/PfSDc4
[16:54:28] AndyRussG: there are problems with the job currently, we're on it but haven't found the problem yet
[16:58:38] joal: ah ok gotcha, thanks! pls lmk if I can help with anything :)
[17:28:27] joal: deployed the new version of the druid exporter on druid1001, and also disabled the middlemanager via the API
[17:29:10] once drained, I am going to add the config to the middle manager to push metrics via the http endpoint
[17:29:15] and restart it
[17:29:19] let's see how it goes
[17:30:36] Analytics-Kanban, Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3816768 (Nuria) @milimetric can you suggest the wording for the link to wikistats 2.0 on stats.wikimedia.org, talked to @ezachte and he will be happy to change text
[17:34:48] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3816790 (madhuvishy) @elukey I recommend copying home directories on notebook1002, backing them up somewhere on notebook1001, and sending a note to analytics and research-l...
[17:36:38] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017), Upstream: Understand difference between author_name and name in gerrit - https://phabricator.wikimedia.org/T177890#3816810 (Aklapper)
[17:38:08] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017), Upstream: Understand difference between author_name and name in gerrit - https://phabricator.wikimedia.org/T177890#3674048 (Aklapper) Open>Resolved `author_name` is the global field name; the other two names are data source...
[17:38:42] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017): Understand difference between author_name and name in gerrit - https://phabricator.wikimedia.org/T177890#3816820 (Aklapper)
[18:16:52] Analytics-Kanban, Analytics-Wikistats: Handle error due to lack of data - https://phabricator.wikimedia.org/T182224#3817033 (Milimetric)
[18:17:03] Analytics-Kanban, Analytics-Wikistats: Handle error due to lack of data - https://phabricator.wikimedia.org/T182224#3817045 (Milimetric) a: Milimetric
[18:17:18] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 4 others: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023#3817046 (mobrovac)
[18:24:17] joal:
[18:24:32] yes elukey
[18:24:33] just restarted the druid1001's middlemanager (was drained) and set the http emitter config
[18:24:47] eventually it should produce the new metrics
[18:24:59] if tomorrow all is good, I'll deploy them on all the nodes
[18:25:04] and make everything permanent
[18:27:56] <3
[18:31:36] hoping that the peon will pick up the middlemanager's settings
[18:35:42] (PS2) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:37:05] ottomata: had a chat with the Traffic team, they are ok with the idea of a vk test instance on cache misc with TLS
[18:38:04] I am wondering if we could reboot analytics1003
[18:38:31] great
[18:38:44] elukey: we'd have to schedule a little downtime i think
[18:38:47] since hive etc. will be offline
[18:39:06] yeah better to send an email first
[18:39:14] I can schedule it for tomorrow morning EU time
[18:39:19] I'll do it with joal
[18:39:19] (PS3) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:39:34] (PS4) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:39:39] ok great sounds good
[18:40:54] (PS5) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:41:44] (CR) Ottomata: [V: 2 C: 2] Deploy repo for superset (python) (1 comment) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[18:43:46] (PS1) Ottomata: Adding artifacts for jessie superset==0.20.6 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395804 (https://phabricator.wikimedia.org/T166689)
[18:44:06] sure elukey
[18:44:16] (CR) Ottomata: [V: 2 C: 2] Adding artifacts for jessie superset==0.20.6 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395804 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[18:48:06] email sent
[18:50:14] elukey: i'm working on puppetizing superset
[18:50:17] we will run it on thorium, like pivot
[18:50:31] do you think i should puppetize it like pivot, with stuff in statistics::sites::superset
[18:50:32] OR
[18:50:34] do a profile
[18:50:40] and include that profile from role::statistics::web
[18:53:18] I think we can do as for pivot, and then decide later on how to refactor all that code?
[18:53:26] ok cool
[18:54:06] hmmm, but hmm. superset has more configuration than pivot does
[18:54:19] i need class params somewhere and hiera somewhere.
[18:54:33] i could put the hiera for it in role/common/statistics/web.yaml
[18:54:36] hmm
[18:54:53] if you think that a profile is ok let's do it
[18:55:07] hmmm
[18:55:20] ok i'm going to try profile first, might make it easier, not sure
[18:58:32] (PS4) Joal: Update mediawiki-history-reduced job [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504)
[19:04:39] (PS1) Ottomata: Use -e instead of -d for checking for deploy dir, in case it is a symlink [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395807
[19:04:39] (CR) Ottomata: [V: 2 C: 2] Use -e instead of -d for checking for deploy dir, in case it is a symlink [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395807 (owner: Ottomata)
[19:05:00] ok also sent an email to analytics@ to announce the notebook1002 -> kafka1023 reimage
[19:05:31] ottomata: if you have time today would you mind copying notebook1002's home dirs to notebook1001's /srv ?
[19:05:37] should be a few megs
[19:05:41] but just to be sure
[19:06:28] o/ fdans
[19:06:58] elukey: sure
[19:09:57] elukey: notebook1001:/srv/notebook1002_home_dirs
[19:10:38] super
[19:10:41] joal: https://grafana.wikimedia.org/dashboard/db/prometheus-druid?orgId=1&panelId=41&fullscreen
[19:10:51] ottomata: --^
[19:10:57] this is only from druid1001 atm
[19:11:04] buuuut tomorrow I'll deploy it everywhere
[19:11:05] \o/
[19:12:40] all right people I am going home
[19:12:46] talk with you tomorrow!
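(For reference, the middleManager drain and metrics-emitter change elukey describes above correspond, in Druid 0.9.x terms, to roughly the following; the worker API endpoints and emitter properties are from the Druid docs, while the exporter port is an assumption:

    # Drain: tell the middleManager to take no new tasks, wait for running
    # peons to finish, and re-enable afterwards with .../enable.
    curl -X POST http://druid1001.eqiad.wmnet:8091/druid/worker/v1/disable
    curl http://druid1001.eqiad.wmnet:8091/druid/worker/v1/enabled

    # middleManager runtime.properties additions; peons inherit these at
    # task start (druid.indexer.fork.property.* can override them per-task).
    # The recipient URL points at the druid_exporter's HTTP listener
    # (port 8000 is an assumption).
    druid.emitter=http
    druid.emitter.http.recipientBaseUrl=http://localhost:8000/
    druid.monitoring.monitors=["io.druid.segment.realtime.RealtimeMetricsMonitor"]

With the http emitter in place, each peon POSTs metric events like the ingest/events/processed JSON seen earlier, which the exporter can then translate into the Prometheus metrics on that grafana dashboard.)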
[19:12:47] o/
[19:12:52] * elukey off
[19:37:14] Analytics-Kanban, Pageviews-API, Services (watching): Endpoints that 404 no longer have the "Access-Control-Allow-Origin" header - https://phabricator.wikimedia.org/T179113#3817600 (Pchelolo) @Milimetric sorry for the late response. Seems like it's a bug in RESTBase and it's happening for all the service...
[19:39:19] elukey: niiiice!
[19:41:07] Analytics-Kanban, Pageviews-API, Services (watching): Endpoints that 404 no longer have the "Access-Control-Allow-Origin" header - https://phabricator.wikimedia.org/T179113#3817623 (Milimetric) @Nuria I'm going to mark this done from our side, and when you close it, if you remember, just merge it int...
[19:51:32] Analytics-Kanban, Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3817686 (Milimetric) @ezachte you should say anything you like here, but here's how I see it: 2017-12: Wikistats has been redesigned for simplicity and faster data proc...
[19:58:58] (PS1) Ottomata: Move common vars to profile.sh, add init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395822
[19:59:27] (CR) Ottomata: [V: 2 C: 2] Move common vars to profile.sh, add init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395822 (owner: Ottomata)
[20:02:01] (PS1) Ottomata: chmod 755 init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395824
[20:02:08] (CR) Ottomata: [V: 2 C: 2] chmod 755 init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395824 (owner: Ottomata)
[20:13:36] nuria_: updated the docs with redirect data: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats
[20:27:58] Analytics, Analytics-Cluster: Unconfigured Hue's use of webhdfs/httpfs - https://phabricator.wikimedia.org/T182242#3817825 (Ottomata)
[20:28:04] Analytics, Analytics-Cluster: Unconfigure Hue's use of webhdfs/httpfs - https://phabricator.wikimedia.org/T182242#3817835 (Ottomata)
[21:32:11] AndyRussG: I killed the realtime job, it's failing without us understanding why - I'll investigate more tomorrow
[21:51:05] Analytics, Analytics-EventLogging: Unit tests for Event Logging - https://phabricator.wikimedia.org/T86543#3818162 (ggellerman)
[22:07:00] Analytics, EventBus, MediaWiki-JobQueue: Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259#3818223 (Pchelolo)
[22:07:26] Analytics, EventBus, MediaWiki-JobQueue, Services (doing): Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259#3818238 (Pchelolo)
[22:07:44] Analytics, EventBus, MediaWiki-JobQueue, Services (doing): Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259#3818223 (Pchelolo)
[22:32:50] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3818312 (Jdlrobson) Are there plans to expose this in https://stream.wikimedia.org/v2/stream/recentchange ?
[22:34:02] by chance does anyone know what yarn actually checks to see what the memory usage of a 'container' is, to decide if it should kill it for overrunning?
[23:13:59] ebernhardson: old but probably of help: https://issues.apache.org/jira/browse/YARN-2984
[23:23:08] digging through, it looks like it's using rss from /proc/$pid/stat, will start logging that with my jobs. Trying to track down why I have to massively over-allocate memory to keep our model training from getting killed by yarn
[23:24:08] well, that along with a process tree implementation that sums up rss across child processes
[23:33:13] Analytics-Kanban, Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3809460 (Erik_Zachte) Here is a first draft: https://stats.wikimedia.org/wikinews/EN/draft/TablesWikipediaES.htm Please comment. In this example the deep-link to some...
[23:35:47] (PS1) Ottomata: Improvements to deployment scripts [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395887 (https://phabricator.wikimedia.org/T166689)
[23:35:56] (CR) Ottomata: [V: 2 C: 2] Improvements to deployment scripts [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395887 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[23:58:42] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3818583 (Ottomata) Yes! It's just not prioritized, so I only get to work on it when I have a bit of headroom...
[23:58:57] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3818584 (Ottomata) Wait, in recentchange? No, it will be its own stream.
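(To close the loop on the YARN memory question above: the NodeManager's ProcfsBasedProcessTree periodically walks /proc, sums RSS over the container's entire process tree, and kills the container when the total exceeds its allocation; the behavior is governed by yarn.nodemanager.pmem-check-enabled, yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.vmem-pmem-ratio. A rough sketch of the same tree-wide RSS sum, useful for logging alongside a job as ebernhardson plans; this mirrors, but is not, YARN's implementation, and it relies on /proc/<pid>/task/<tid>/children (Linux 3.5+):

    import os

    PAGE_SIZE = os.sysconf('SC_PAGE_SIZE')  # rss in /proc/<pid>/stat is in pages

    def children(pid):
        """Child PIDs of pid, collected from every thread's children file."""
        pids = set()
        try:
            tids = os.listdir('/proc/%d/task' % pid)
        except OSError:
            return pids  # process already exited
        for tid in tids:
            try:
                with open('/proc/%d/task/%s/children' % (pid, tid)) as f:
                    pids.update(int(c) for c in f.read().split())
            except OSError:
                pass  # thread exited while we were looking
        return pids

    def tree_rss_bytes(pid):
        """Sum RSS over pid and all of its descendants, YARN-style."""
        total, stack = 0, [pid]
        while stack:
            p = stack.pop()
            try:
                with open('/proc/%d/stat' % p) as f:
                    # comm (field 2) may contain spaces, so split off
                    # everything through the closing paren first; rss is
                    # field 24 overall, i.e. index 21 after the split.
                    fields = f.read().rsplit(')', 1)[1].split()
                    total += int(fields[21]) * PAGE_SIZE
            except OSError:
                continue
            stack.extend(children(p))
        return total

    if __name__ == '__main__':
        print(tree_rss_bytes(os.getpid()) / (1024.0 * 1024), 'MiB')

Logging this next to the training loop shows whether a forked child, rather than the JVM heap, is what pushes the container over its limit.)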