[00:01:23] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3141307 (Pchelolo) >>! In T161731#3814596, @Smalyshev wrote: > @Ottomata thanks, I can connect to the hosts above, but still not sure how to control...
[00:23:20] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3814801 (Smalyshev) @Pchelolo thanks for the pointer, this is very helpful! Indeed, kafkacat for example supports it [[ https://github.com/edenhill...
[00:35:12] Analytics, Services (watching): Update to latest kafkacat - https://phabricator.wikimedia.org/T182163#3814846 (Pchelolo)
[00:35:29] Analytics, Services (watching): Update to latest kafkacat - https://phabricator.wikimedia.org/T182163#3814862 (Pchelolo)
[00:36:36] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3814868 (Pchelolo) Indeed. Created T182163 for updating `kafkacat`, but it is a bit more complex than simply installing a new version - it depends on ne...
[00:47:54] Analytics, Services (watching): Update to latest kafkacat - https://phabricator.wikimedia.org/T182163#3814898 (Smalyshev) Looks like stretch-backports [[ https://packages.debian.org/stretch-backports/librdkafka1 | has 0.11 ]], and there's a 1.3.1 package [[ https://packages.debian.org/buster/kafkacat | fo...
[00:48:34] Analytics, EventBus, MW-1.31-release-notes (WMF-deploy-2017-11-14 (1.31.0-wmf.8)), Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3814900 (Pchelolo) After the deployment of the above patches, we haven't seen any `Attribute...
[09:31:54] Heya elukey - Let me know when you want to talk about streaming-jobs in puppet
[09:32:24] joal: o/
[09:32:34] do you mean your code review?
[09:33:29] Yessir
[09:37:28] so I added some suggestions to the code review
[09:37:40] let me explain the idea in here so we can chat about it
[09:38:39] we are transitioning from a model in which a host in site.pp holds multiple roles to a new one in which every host holds one single role, which in turn includes profiles
[09:39:12] profiles are basically what we used as "roles" before, stored in a different path (modules -> profiles -> manifests -> etc.)
[09:39:49] profiles should get their parameters from hiera, but it is not a problem to have defaults
[09:40:23] elukey: I think I understand
[09:40:27] in this case, you probably want to run the check only on analytics1003 right?
[09:40:34] elukey: correct !
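(An aside on the kafkacat thread at the top of the log: the exact feature Smalyshev references is truncated here, but it is presumably librdkafka's timestamp-based offset lookup, which newer kafkacat exposes via -o s@<timestamp> and which would explain the upgrade request in T182163. A minimal sketch under that assumption, with illustrative broker and topic names:

    # Consume starting from a wall-clock moment; -o s@<ms since epoch> asks
    # the broker for the first offset at/after that timestamp. Needs a recent
    # kafkacat (1.x) built against a librdkafka with offsetsForTimes support.
    kafkacat -C -b kafka1012.eqiad.wmnet:9092 \
        -t eqiad.mediawiki.revision-create \
        -o s@1512518400000    # 2017-12-06T00:00:00Z

This lets a consumer resume a change stream from "roughly when I last looked" instead of tracking exact offsets.)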
[09:40:51] elukey: I basically copy-pasted from existing crons
[09:40:56] if you check site.pp now that host runs analytics_cluster::coordinator
[09:41:10] elukey: I hear that this is going to change
[09:41:11] (I know that name might not be the best but I had to pick one)
[09:41:28] maybe the name of the role, but not the role itself
[09:41:46] we can call it "batman" or whatever, but the point is that we'll need a single role for each host :)
[09:42:15] I do understand - I'd actually find it funnier to name analytics1003 batman
[09:42:18] :)
[09:42:42] Sorry, I got distracted :)
[09:43:59] I would love to call it batman but I can't sneak in such a name in site.pp :D
[09:44:05] :D
[09:44:13] there are watchers ;)
[09:44:27] actually it would make sense, since it is our dark knight checking the cluster and establishing order
[09:44:30] Actually, given we are the a-team, could be named Looping
[09:48:22] joal: as an example, this is a refactoring that I should merge soonish https://gerrit.wikimedia.org/r/#/c/395527/
[09:48:30] to move oozie/hive roles to profiles
[09:48:34] just to give you an idea
[09:52:08] elukey: ok, will try to follow the pattern
[09:52:39] elukey: apart from that code-architecture aspect, does the idea of making a temporary HTTP call to druid work for you?
[09:56:15] joal: it might be good as a temporary measure, but I don't like it as a generic alarming procedure for druid to be honest
[09:57:07] makes sense elukey - we should go for realtime prometheus metrics - but I don't like it because it's more work on you
[09:58:06] we could also alarm if the streaming job fails for some reason
[09:58:24] elukey: thing is, the streaming job doesn't fail
[09:58:36] even if no task is created on druid
[09:58:58] so it is tranquility acting weirdly?
[09:59:12] I'm not sure if it is weird or expected actually
[09:59:32] But yes, tranquility continues to work even if no task exists on druid
[10:00:39] well if no task is present in druid it seems to me that tranquility is not really doing its job no?
[10:00:51] I agree
[10:01:19] the sad part is that we don't have logs (or maybe we don't have a procedure to retrieve them yet)
[10:02:01] elukey: the app is running in yarn, so as long as the app runs, logs are only available on every machine, not aggregated on HDFS
[10:02:28] yep this is why I mentioned that we don't have a procedure
[10:02:42] it is a bit cumbersome to find those logs iirc
[10:03:02] It is
[10:03:27] I am wondering if we should have logs in logstash
[10:03:58] elukey: I think it's too big of a burden given we have them in HDFS
[10:04:07] this streaming case is a very specific one
[10:06:05] I am checking http://druid.io/docs/0.9.2/operations/metrics.html
[10:08:15] joal: are the banner jobs down again?
[10:09:20] elukey: depends what you mean by down - the job runs, but it is in a status that prevents it from working
[10:09:27] I have no clue what's happening :(
[10:10:16] elukey: There are some spark tasks that don't finish (with micro-batches of one minute, some have been running for 7h)
[10:11:02] Those tasks are on the same host: analytics1033
[10:11:05] maybe tranquility doesn't like it much when we reboot nodes while it operates
[10:11:37] but iirc you managed to restart realtime indexing yesterday right?
[10:11:56] and then it stopped again 7 hours ago? Did I get it correctly?
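(For readers following along, here is a minimal sketch of the role/profile layout elukey describes above; the class names, parameter and hiera key are illustrative, not the actual WMF manifests, while role() is the real site.pp helper in WMF puppet:

    # modules/profile/manifests/analytics/cluster/streaming_check.pp
    # Profiles take their parameters from hiera; a default is fine too.
    class profile::analytics::cluster::streaming_check (
        $contact_group = hiera('profile::analytics::cluster::streaming_check::contact_group', 'analytics'),
    ) {
        # cron / monitoring resources for the streaming-job check go here
    }

    # modules/role/manifests/analytics_cluster/coordinator.pp
    # The one role a host gets; it does nothing but include profiles.
    class role::analytics_cluster::coordinator {
        include ::profile::analytics::cluster::streaming_check
        # ... other profiles (hive, oozie, ...)
    }

    # manifests/site.pp -- one single role per host
    node 'analytics1003.eqiad.wmnet' {
        role(analytics_cluster::coordinator)
    }

The point of the pattern is that hosts differ only by which single role they get, and roles differ only by which profiles they include.)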
[10:11:57] I did restart yesterday
[10:13:52] elukey: I don't understand :( When looking at logs of the tasks that are still running, nothing special is mentioned
[10:19:34] elukey: I'll kill / restart the streaming job, but that is not satisfactory at all :(
[10:24:06] weird..
[10:24:25] elukey: looking at logs from the yarn UI with an ssh tunnel, I see some errors happened, but the job recovered fine, and then at one point, after the same error again, it didn't recover
[10:24:59] ah is there a yarn ui for each node manager?
[10:26:39] elukey: batcave?
[10:27:02] joal: need to fix one thing in puppet but I'll come in a sec!
[10:32:03] joal: I am in the cave
[10:52:03] joal: network issues?
[10:52:36] in the meantime I am trying to figure out how to add RealtimeMetricsMonitor
[10:52:50] Back in da cave elukey - using ff
[11:21:31] joal: do you mind if I drain the middle manager on druid1001 and then restart it with the RealtimeMonitoring config?
[11:21:37] to test if my understanding is right
[11:21:39] please go elukey
[11:22:27] !log disabled temporarily druid's middlemanager on druid1001 to test the Real Time monitor setting
[11:22:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:35:17] hiii team
[11:35:26] Hi mforns :)
[11:35:42] hey joal :]
[11:36:07] What's up mforns? Is India as welcoming as expected?
[11:59:53] joal: found this in the middlemanager's metrics log even before activating the realtime monitoring
[11:59:56] Event [{"feed":"metrics","timestamp":"2017-12-06T11:56:14.745Z","service":"druid/peon","host":"druid1001.eqiad.wmnet:8101","metric":"ingest/events/processed","value":153,"dataSource":"banner_activity_minutely","taskId":["index_realtime_banner_activity_minutely_2017-12-06T11:00:00.000Z_2_0"]}]
[12:00:16] Yay !
[12:02:28] it also has other metrics like Event [{"feed":"metrics","timestamp":"2017-12-05T18:12:27.073Z","service":"druid/peon","host":"druid1001.eqiad.wmnet:8100","metric":"query/segment/time","value":5,"dataSource":"banner_activity_minutely","duration":"PT720S"
[12:02:46] that are historical ones afaics, but logged by the realtime indexer?
[12:03:33] elukey: realtime nodes are queryable, so there could be the same metrics as in historical
[12:03:47] ah this part is new to me
[12:04:16] how come they are queryable??
[12:13:22] elukey: in order for realtime data to be queryable, well, realtime nodes are queryable :)
[12:14:05] Queries on a realtime node are only on currently-indexing segments (for us, an hour of data)
[12:15:36] for the current day's previous hours, segments are sent to historical, but for the currently-indexing hour, data is queryable on the realtime node
[12:15:42] elukey: --^
[12:15:45] Makes sense
[12:15:52] ah so they are able to query that data, before it gets handed off to historicals
[12:15:59] indeed
[12:16:00] *to be queryable for
[12:17:14] sorry for the orthography :(
[12:21:34] joal: I have a patch + tests that seem to work for the druid_exporter :)
[12:21:47] Man - You're fast :)
[12:31:24] joal, sorry missed your message, India has always been welcoming for us :] although always something unexpected, hehe
[12:39:54] (PS3) Joal: Update mediawiki-history-reduced job [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504)
[12:40:34] joal: code reviews out! Need to better test it but it should be ok
[12:40:43] https://gerrit.wikimedia.org/r/395727
[12:40:55] also removed an unused feature https://gerrit.wikimedia.org/r/395728
[12:42:43] Thanks a lot elukey! This will allow me to remove my temp part in my patch
[12:43:10] elukey: Currently putting the last effort into wks2 data, I'll probably modify the patch later on tonight
[12:51:27] super
[12:51:40] I also added support for query/time|bytes for the peons
[12:54:30] :D
[12:56:28] all right, now that I think about it, it is 2PM and I haven't get lunch :D
[12:56:32] *got
[12:56:40] * elukey lunch!
[13:05:17] (PS1) Joal: Update mediawiki-history endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/395732 (https://phabricator.wikimedia.org/T178478)
[13:05:26] milimetric: --^ when you have time :)
[13:05:34] Gone for my break now a-team
[13:05:51] will do
[13:06:22] milimetric: The last patch for mediawiki-history-reduced is also good to go I think
[13:06:34] milimetric: https://gerrit.wikimedia.org/r/389496
[13:06:53] great
[13:07:00] I left a bug in computing reduced, currently reindexing - with the AQS patch, I think we'll be good
[13:07:12] gone now ;)
[13:36:50] fdans, I managed to build the file with edits per wiki for all wikis on 2015-05-05
[13:36:55] sent you an email with it
[13:51:56] mforns: wonderful, I got the flattened per-country data, I'm going to try and join them
[13:52:30] fdans, I'm trying to transform db names into project names in an easy way now
[13:54:38] mforns: you didn't send me an email in the end right?
[13:54:49] mmm yes...
[13:54:50] wai
[13:54:52] t
[13:55:17] fdans, yes I did! :o
[13:56:48] (CR) Fdans: [V: 2 C: 2] "Great change! We were discussing just this the other day during standup, but we were unaware that our current CI infrastructure had Firefo" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/395537 (owner: Hashar)
[13:57:39] mf wow this is weird
[13:58:14] fdans, sending it again
[13:58:19] nonono i have it
[13:58:26] but look:
[13:58:42] https://usercontent.irccloud-cdn.com/file/HAAIG8IM/Screen%20Shot%202017-12-06%20at%2014.57.53.png
[13:58:45] (now)
[13:58:53] https://usercontent.irccloud-cdn.com/file/hfGU6Wsk/Screen%20Shot%202017-12-06%20at%2014.58.01.png
[13:59:04] (same email, also now)
[13:59:40] ???
[13:59:43] don't get it
[14:00:39] mforns: the times diverge even though they're the same email
[14:00:47] fdans, oh ok
[14:01:05] must be some time zone bug
[14:02:34] mforns: imma run your data through the sitematrix before joining 💤 💤 💤 💤
[14:03:19] fdans, yes I was trying that, how would you do that?
[14:03:52] mforns: we can do it together in the cave
[14:03:58] k! omw
[14:33:12] joal: almost ready to reboot an1003
[14:42:55] Analytics, Discovery, EventBus, Wikidata, and 4 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3816132 (Ottomata) You can easily '[[ https://github.com/edenhill/kafkacat#quick-build | quickbuild ]]' kafkacat with a statically linked librdkafka....
[14:43:17] (CR) Milimetric: "I'm just not remembering what we talked about with the redirects, everything else looks good." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504) (owner: Joal)
[14:47:54] (CR) Milimetric: Update mediawiki-history-reduced job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504) (owner: Joal)
[14:48:35] PROBLEM - Oozie Server on analytics1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.catalina.startup.Bootstrap
[14:49:22] what the hell, oozie auto-restarts?
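(An aside on the dbname-to-project mapping mforns and fdans were working through above: the public sitematrix API exposes exactly that join. A minimal sketch, assuming the requests library; the output format and the User-Agent string are illustrative, not their actual script:

    import requests

    # Fetch the sitematrix: every wiki with its dbname, URL and family.
    matrix = requests.get(
        'https://meta.wikimedia.org/w/api.php',
        params={'action': 'sitematrix', 'format': 'json'},
        headers={'User-Agent': 'dbname-mapper-sketch'},
    ).json()['sitematrix']

    dbname_to_project = {}
    for key, group in matrix.items():
        if key == 'count':
            continue
        # Language groups keep their wikis under 'site'; the 'specials'
        # group (commons, metawiki, ...) is already a plain list of sites.
        sites = group if key == 'specials' else group.get('site', [])
        for site in sites:
            # e.g. 'enwiki' -> 'en.wikipedia.org'
            dbname_to_project[site['dbname']] = site['url'].replace('https://', '')

    print(dbname_to_project['enwiki'])

One fetch gives the whole mapping, so the per-country data can be joined offline without hitting the API per wiki.)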
[14:50:09] (CR) Milimetric: "looks great, you can +2 whenever you're ready, I'm scared to :)" [analytics/aqs] - https://gerrit.wikimedia.org/r/395732 (https://phabricator.wikimedia.org/T178478) (owner: Joal)
[14:51:33] ah snap it does
[14:54:47] of course hue didn't like it
[14:55:44] !log oozie server accidentally restarted due to a puppet change (the service auto-restarts)
[14:55:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:55:55] !log restart hue to pick up the new oozie server
[14:56:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:57:56] PROBLEM - Hue Server on thorium is CRITICAL: PROCS CRITICAL: 2 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[14:59:14] ok this is weird
[14:59:56] RECOVERY - Hue Server on thorium is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[15:00:48] so oozie is running, but port 11000 on an1003 is not good
[15:00:48] hiiiii
[15:00:49] :)
[15:01:08] hiiiii
[15:01:34] oozie auto restarts?!
[15:01:37] but whY?!
[15:01:42] why would I do that? I wouldn't do that!
[15:02:02] subscribe => [
[15:02:02]     File["${config_directory}/oozie-site.xml"],
[15:02:02]     File["${config_directory}/oozie-env.sh"]
[15:02:02] ],
[15:02:05] what the heckaayy why
[15:02:06] haha
[15:02:43] I think it is still not up
[15:03:12] it's just always been that way... weird past me
[15:03:15] why did you do that?!
[15:03:45] what do you mean?
[15:03:58] ahhh sorry
[15:04:03] didn't read all :D
[15:04:28] I'm talking to the past me who did that!
[15:05:00] ahhh FATAL ERROR in native method: processing of -javaagent failed
[15:05:14] Caused by: java.lang.IllegalArgumentException: Collector already registered that provides name: jmx_scrape_duration_seconds
[15:05:56] oooo
[15:06:11] first time that happens, rolling back manually and then puppet
[15:08:35] RECOVERY - Oozie Server on analytics1003 is OK: PROCS OK: 1 process with command name java, args org.apache.catalina.startup.Bootstrap
[15:08:47] hello oozie
[15:17:52] ottomata: https://gerrit.wikimedia.org/r/#/c/395746/1/manifests/oozie/server.pp - wdyt?
[15:20:21] ah there you go https://github.com/prometheus/client_java/issues/279
[15:21:12] or something similar
[15:21:20] elukey: move them to require instead of subscribe
[15:21:45] sure, makes sense
[15:21:58] elukey: Heya I'm back if needed
[15:22:04] done ottomata
[15:22:19] elukey: You got ottomata - you're actually way safer than with me :)
[15:29:44] joal, ottomata - ops sync in batcave-2, fdans stole the cave! :P
[15:33:36] Analytics, Analytics-Wikimetrics, Software-Licensing: Add a license file to wikimetrics - https://phabricator.wikimedia.org/T60753#3816306 (MarcoAurelio)
[15:57:13] elukey: [translated from Odia] Wiktionary is a sister project of Wikipedia. The Odia Wiktionary currently has 1,08,590 words. These words exist in Odia as well as many other languages, but their meanings and explanations are in Odia. For example: Odia meanings of Odia words; Odia meanings of Bengali words; Odia meanings of English words, and so on.
[15:57:14] This dictionary is mainly built by volunteers. Anyone can edit it and use it. No fee has to be paid for this.
[15:57:55] whhhaaaattt
[16:00:42] oh a-team, I think nuria and I have another meeting scheduled right now for lawsuit stuff
[16:00:43] ?
[16:01:03] Francisco will take over
[16:01:05] :D
[16:45:57] hi all. soo, do we have a ticket somewhere that you know of about providing HTML dumps, on top of the XML dumps?
[16:46:52] we're running into a problem for quite a few projects where we need the HTML version of the files, and the workaround is to hit the API, which is not really good for the servers and for us.
[16:50:25] Hi leila - I've not heard of that need before
[16:51:08] leila: we know there are some interesting cases for HTML, but those dumps would be bigger than the text ones
[16:52:14] got it joal. let me create a ticket for this and expand on it. This comes up over and over in our interactions with outside researchers and the work we do. And I just realized we have never created a ticket for this. ;)
[16:53:57] hi! just a quick check, these are also maintenance outages? Thanks!!!!! :D https://goo.gl/PfSDc4
[16:54:28] AndyRussG: there are problems with the job currently, we're on it but haven't found the problem yet
[16:58:38] joal: ah ok gotcha, thanks! pls lmk if I can help with anything :)
[17:28:27] joal: deployed the new version of the druid exporter on druid1001, and also disabled the middlemanager via the API
[17:29:10] once drained, I am going to add the config to the middle manager to push metrics via the http endpoint
[17:29:15] and restart it
[17:29:19] let's see how it goes
[17:30:36] Analytics-Kanban, Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3816768 (Nuria) @milimetric can you suggest the wording for the link to wikistats 2.0 on stats.wikimedia.org, talked to @ezachte and he will be happy to change text
[17:34:48] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3816790 (madhuvishy) @elukey I recommend copying home directories on notebook1002, backing them up somewhere on notebook1001, and sending a note to analytics and research-l...
[17:36:38] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017), Upstream: Understand difference between author_name and name in gerrit - https://phabricator.wikimedia.org/T177890#3816810 (Aklapper)
[17:38:08] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017), Upstream: Understand difference between author_name and name in gerrit - https://phabricator.wikimedia.org/T177890#3674048 (Aklapper) Open>Resolved `author_name` is the global field name; the other two names are data source...
[17:38:42] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017): Understand difference between author_name and name in gerrit - https://phabricator.wikimedia.org/T177890#3816820 (Aklapper)
[18:16:52] Analytics-Kanban, Analytics-Wikistats: Handle error due to lack of data - https://phabricator.wikimedia.org/T182224#3817033 (Milimetric)
[18:17:03] Analytics-Kanban, Analytics-Wikistats: Handle error due to lack of data - https://phabricator.wikimedia.org/T182224#3817045 (Milimetric) a: Milimetric
[18:17:18] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 4 others: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023#3817046 (mobrovac)
[18:24:17] joal:
[18:24:32] yes elukey
[18:24:33] just restarted the druid1001's middlemanager (was drained) and set the http emitter config
[18:24:47] eventually it should produce the new metrics
[18:24:59] if tomorrow all is good, I'll deploy them on all the nodes
[18:25:04] and make everything permanent
[18:27:56] <3
[18:31:36] hoping that the peon will pick up the middlemanager's settings
[18:35:42] (PS2) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:37:05] ottomata: had a chat with the Traffic team, they are ok with the idea of a vk test instance on cache misc with TLS
[18:38:04] I am wondering if we could reboot analytics1003
[18:38:31] great
[18:38:44] elukey: we'd have to schedule a little downtime i think
[18:38:47] since hive etc. will be offline
[18:39:06] yeah better to send an email first
[18:39:14] I can schedule it for tomorrow morning EU time
[18:39:19] I'll do it with joal
[18:39:19] (PS3) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:39:34] (PS4) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:39:39] ok great sounds good
[18:40:54] (PS5) Ottomata: Deploy repo for superset (python) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689)
[18:41:44] (CR) Ottomata: [V: 2 C: 2] Deploy repo for superset (python) (1 comment) [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/392933 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[18:43:46] (PS1) Ottomata: Adding artifacts for jessie superset==0.20.6 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395804 (https://phabricator.wikimedia.org/T166689)
[18:44:06] sure elukey
[18:44:16] (CR) Ottomata: [V: 2 C: 2] Adding artifacts for jessie superset==0.20.6 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395804 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[18:48:06] email sent
[18:50:14] elukey: i'm working on puppetizing superset
[18:50:17] we will run it on thorium, like pivot
[18:50:31] do you think i should puppetize it like pivot, with stuff in statistics::sites::superset
[18:50:32] OR
[18:50:34] do a profile
[18:50:40] and include that profile from role::statistics::web
[18:53:18] I think we can do as for pivot, and then decide later on how to refactor all that code?
[18:53:26] ok cool
[18:54:06] hmmm, but hmm. superset has more configuration than pivot does
[18:54:19] i need class params somewhere and hiera somewhere.
[18:54:33] i could put the hiera for it in role/common/statistics/web.yaml
[18:54:36] hmm
[18:54:53] if you think that a profile is ok let's do it
[18:55:07] hmmm
[18:55:20] ok i'm going to try profile first, might make it easier, not sure
[18:58:32] (PS4) Joal: Update mediawiki-history-reduced job [analytics/refinery] - https://gerrit.wikimedia.org/r/389496 (https://phabricator.wikimedia.org/T178504)
[19:04:39] (PS1) Ottomata: Use -e instead of -d for checking for deploy dir, in case it is a symlink [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395807
[19:04:39] (CR) Ottomata: [V: 2 C: 2] Use -e instead of -d for checking for deploy dir, in case it is a symlink [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395807 (owner: Ottomata)
[19:05:00] ok also sent an email to analytics@ to announce the notebook1002 -> kafka1023 reimage
[19:05:31] ottomata: if you have time today would you mind copying notebook1002's home dirs to notebook1001's /srv ?
[19:05:37] should be a few megs
[19:05:41] but just to be sure
[19:06:28] o/ fdans
[19:06:58] elukey: sure
[19:09:57] elukey: notebook1001:/srv/notebook1002_home_dirs
[19:10:38] super
[19:10:41] joal: https://grafana.wikimedia.org/dashboard/db/prometheus-druid?orgId=1&panelId=41&fullscreen
[19:10:51] ottomata: --^
[19:10:57] this is only from druid1001 atm
[19:11:04] buuuut tomorrow I'll deploy it everywhere
[19:11:05] \o/
[19:12:40] all right people I am going home
[19:12:46] talk with you tomorrow!
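(For reference, the middleManager drain and metrics-emitter change elukey describes above correspond, in Druid 0.9.x terms, to roughly the following; the worker API endpoints and emitter properties are from the Druid docs, while the exporter port is an assumption:

    # Drain: tell the middleManager to take no new tasks, wait for running
    # peons to finish, and re-enable afterwards with .../enable.
    curl -X POST http://druid1001.eqiad.wmnet:8091/druid/worker/v1/disable
    curl http://druid1001.eqiad.wmnet:8091/druid/worker/v1/enabled

    # middleManager runtime.properties additions; peons inherit these at
    # task start (druid.indexer.fork.property.* can override them per-task).
    # The recipient URL points at the druid_exporter's HTTP listener
    # (port 8000 is an assumption).
    druid.emitter=http
    druid.emitter.http.recipientBaseUrl=http://localhost:8000/
    druid.monitoring.monitors=["io.druid.segment.realtime.RealtimeMetricsMonitor"]

With the http emitter in place, each peon POSTs metric events like the ingest/events/processed JSON seen earlier, which the exporter can then translate into the Prometheus metrics on that grafana dashboard.)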
[19:12:47] o/
[19:12:52] * elukey off
[19:37:14] Analytics-Kanban, Pageviews-API, Services (watching): Endpoints that 404 no longer have the "Access-Control-Allow-Origin" header - https://phabricator.wikimedia.org/T179113#3817600 (Pchelolo) @Milimetric sorry for the late response. Seems like it's a bug in RESTBase and it's happening for all the service...
[19:39:19] elukey: niiiice!
[19:41:07] Analytics-Kanban, Pageviews-API, Services (watching): Endpoints that 404 no longer have the "Access-Control-Allow-Origin" header - https://phabricator.wikimedia.org/T179113#3817623 (Milimetric) @Nuria I'm going to mark this done from our side, and when you close it, if you remember, just merge it int...
[19:51:32] Analytics-Kanban, Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3817686 (Milimetric) @ezachte you should say anything you like here, but here's how I see it: 2017-12: Wikistats has been redesigned for simplicity and faster data proc...
[19:58:58] (PS1) Ottomata: Move common vars to profile.sh, add init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395822
[19:59:27] (CR) Ottomata: [V: 2 C: 2] Move common vars to profile.sh, add init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395822 (owner: Ottomata)
[20:02:01] (PS1) Ottomata: chmod 755 init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395824
[20:02:08] (CR) Ottomata: [V: 2 C: 2] chmod 755 init_superset.sh [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395824 (owner: Ottomata)
[20:13:36] nuria_: updated the docs with redirect data: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats
[20:27:58] Analytics, Analytics-Cluster: Unconfigured Hue's use of webhdfs/httpfs - https://phabricator.wikimedia.org/T182242#3817825 (Ottomata)
[20:28:04] Analytics, Analytics-Cluster: Unconfigure Hue's use of webhdfs/httpfs - https://phabricator.wikimedia.org/T182242#3817835 (Ottomata)
[21:32:11] AndyRussG: I killed the realtime job, it's failing without us understanding why - I'll investigate more tomorrow
[21:51:05] Analytics, Analytics-EventLogging: Unit tests for Event Logging - https://phabricator.wikimedia.org/T86543#3818162 (ggellerman)
[22:07:00] Analytics, EventBus, MediaWiki-JobQueue: Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259#3818223 (Pchelolo)
[22:07:26] Analytics, EventBus, MediaWiki-JobQueue, Services (doing): Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259#3818238 (Pchelolo)
[22:07:44] Analytics, EventBus, MediaWiki-JobQueue, Services (doing): Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259#3818223 (Pchelolo)
[22:32:50] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3818312 (Jdlrobson) Are there plans to expose this in https://stream.wikimedia.org/v2/stream/recentchange ?
[22:34:02] by chance does anyone know what yarn actually checks to see what the memory usage of a 'container' is, to decide if it should kill it for overrunning?
[23:13:59] ebernhardson: old but probably of help: https://issues.apache.org/jira/browse/YARN-2984
[23:23:08] digging through, it looks like it's using rss from /proc/$pid/stat, will start logging that with my jobs. Trying to track down why I have to massively over-allocate memory to keep our model training from getting killed by yarn
[23:24:08] well, that along with a process tree implementation that sums up rss across child processes
[23:33:13] Analytics-Kanban, Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3809460 (Erik_Zachte) Here is a first draft: https://stats.wikimedia.org/wikinews/EN/draft/TablesWikipediaES.htm Please comment. In this example the deep-link to some...
[23:35:47] (PS1) Ottomata: Improvements to deployment scripts [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395887 (https://phabricator.wikimedia.org/T166689)
[23:35:56] (CR) Ottomata: [V: 2 C: 2] Improvements to deployment scripts [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/395887 (https://phabricator.wikimedia.org/T166689) (owner: Ottomata)
[23:58:42] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3818583 (Ottomata) Yes! It's just not prioritized, so I only get to work on it when I have a bit of headroom...
[23:58:57] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3818584 (Ottomata) Wait, in recentchange? No, it will be its own stream.
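(To close the loop on the YARN memory question above: the NodeManager's ProcfsBasedProcessTree periodically walks /proc, sums RSS over the container's entire process tree, and kills the container when the total exceeds its allocation; the behavior is governed by yarn.nodemanager.pmem-check-enabled, yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.vmem-pmem-ratio. A rough sketch of the same tree-wide RSS sum, useful for logging alongside a job as ebernhardson plans; this mirrors, but is not, YARN's implementation, and it relies on /proc/<pid>/task/<tid>/children (Linux 3.5+):

    import os

    PAGE_SIZE = os.sysconf('SC_PAGE_SIZE')  # rss in /proc/<pid>/stat is in pages

    def children(pid):
        """Child PIDs of pid, collected from every thread's children file."""
        pids = set()
        try:
            tids = os.listdir('/proc/%d/task' % pid)
        except OSError:
            return pids  # process already exited
        for tid in tids:
            try:
                with open('/proc/%d/task/%s/children' % (pid, tid)) as f:
                    pids.update(int(c) for c in f.read().split())
            except OSError:
                pass  # thread exited while we were looking
        return pids

    def tree_rss_bytes(pid):
        """Sum RSS over pid and all of its descendants, YARN-style."""
        total, stack = 0, [pid]
        while stack:
            p = stack.pop()
            try:
                with open('/proc/%d/stat' % p) as f:
                    # comm (field 2) may contain spaces, so split off
                    # everything through the closing paren first; rss is
                    # field 24 overall, i.e. index 21 after the split.
                    fields = f.read().rsplit(')', 1)[1].split()
                    total += int(fields[21]) * PAGE_SIZE
            except OSError:
                continue
            stack.extend(children(p))
        return total

    if __name__ == '__main__':
        print(tree_rss_bytes(os.getpid()) / (1024.0 * 1024), 'MiB')

Logging this next to the training loop shows whether a forked child, rather than the JVM heap, is what pushes the container over its limit.)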