[00:16:12] 10Analytics, 10EventBus, 10MediaWiki-JobQueue, 10Services (later): Enable EventBus on all wikis - https://phabricator.wikimedia.org/T185170#3908258 (10Pchelolo) [00:22:52] nuria_: i'd use spark2-shell now whenever possible [01:37:05] 10Analytics-Tech-community-metrics, 10Possible-Tech-Projects, 10ECT-June-2015, 10ECT-March-2015, and 2 others: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1215552 (10srishakatux) @jgbarah @Dicortazar @Acs Would there be any interest in... [07:11:21] !log re-run webrequest-load-wf-misc-2018-1-18-3 via Hue [07:11:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:42:38] Hi elukey - I have some time now - Is there anything urgent you wish me to help with? [08:44:23] hi joal! How are things!?? [08:44:56] elukey: better! Mélissa is back seeing and Naé doesn't vomit anymore :) [08:44:57] all good, yesterday druid1002's reboot caused a bit of weirdness for Druid, had to restart overlords/middlemanagers to get things back in shape [08:45:04] nice! :) [08:45:27] elukey: even with the procedure we have? [08:46:45] yep, and this time also the "batch" indexing jobs were acting weirdly [08:46:53] maaaan [08:46:58] elukey: :( [08:47:16] elukey: and from my reading of the chan, it seemed the reboot worked in the end, right? [08:47:23] I think that this version of druid is not tolerant of zookeeper/overlord changes at the same time [08:47:37] need to do druid1001, the only one left :D [08:47:50] if you have time we can do it now [08:48:06] then the reboots will be completed [08:48:15] elukey: Let's go ! I'll feel a bit useful ;) [08:50:39] all right! [08:51:00] elukey, thanks for: re-run webrequest-load-wf-misc-2018-1-18-3 via Hue [08:51:16] so the druid overlord commander is druid1002 [08:51:17] :) [08:51:37] I didn't get exactly why it failed though, I checked briefly but it was before coffee :D [08:51:59] 0001008-180117090906503-oozie-oozi-W was the id [08:52:45] !log disable druid1001's middlemanager as prep step for reboot [08:52:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:53:17] !log temporarily point pivot's configuration to druid1002 (druid1001 needs to be rebooted) [08:53:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:54:41] I imagine that superset also uses druid1001 only [08:54:57] elukey: I think so yes [08:58:23] !log temporarily set druid1002 in superset's druid cluster config (via UI) [08:58:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:00:24] !log suspended hourly druid batch index jobs via Hue [09:00:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:02:23] all right going to wait for druid1001's middlemanager to completely drain [09:02:43] elukey: about 1 hour of waiting - hourly jobs just started :( [09:03:01] joal: analytics1003's reboot went super smooth yesterday, one good thing [09:03:13] awesome :) [09:03:20] argh so I haven't stopped them in time? [09:03:49] elukey: there is no "in time" thing - You stopped them just after the hour instead of just before :) [09:04:15] elukey: I'm talking about the realtime jobs [09:05:11] Oh elukey !
Actually, new jobs are indexing on druid100[23] only - We should be good to go in the next 10 minutes :) [09:05:26] excuse me elukey - Didn't look precisely enough :) [09:07:48] ahhh okok [09:12:58] joal: another thing that I wanted to ask - yesterday I tried to kill the banner streaming job to get it restarted, in an attempt to figure out why real time indexers were not running.. All was good except the fact that indexers were spawned only after the hour, not before. Now I am *sure* that you already told me why 100 times, would you be so kind to redo it for the 101st? [09:13:41] huhuhu elukey :) [09:14:02] elukey: this problem is due to tranquility realtime-tasks naming scheme [09:14:46] elukey: in order to be able to reconnect to existing tasks, it creates them with a naming convention using dates [09:15:34] elukey: However when those tasks get killed (druid master overlord restart, or manual kill), tranquility tries to re-instantiate them with the same name, and druid doesn't allow it [09:17:26] elukey: does it make more sense? [09:18:05] it does, yes [09:18:07] <3 [09:18:52] joal: do you think that I can do the netflow scala task ? [09:19:17] elukey: I think you can - only issue is that data is not present anymore :( [09:19:26] it probably takes 30 mins for you and 3 days for me but I'd learn a lot :D [09:19:36] it is in jumbo! [09:20:00] I restarted the pmacct daemon a while ago [09:20:13] so afaik data is pushed to jumbo as we speak [09:20:39] in the meantime, druid1001 is ready to be rebooted [09:21:19] !log reboot druid1001 for kernel upgrades [09:21:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:25:08] not seeing any data in kafkacat -C -b kafka-jumbo1006.eqiad.wmnet -t netflow -p0 [09:25:11] mmmmm [09:25:28] maybe I need to ask Faidon what is the status of pmacct, he is working on it these days [09:25:31] elukey: I was about to say that I don't see any netflow topic in jumbo from grafana [09:25:42] ah no just seen some events [09:25:55] {"tag": 0, "as_dst": 0, "as_path": "", "peer_as_dst": 0, "stamp_inserted": "2018-01-18 09:09:00", "stamp_updated": "2018-01-18 09:25:02", "packets": 51000, "bytes": 74460000} [09:26:18] now I have no idea how to read the data [09:26:18] elukey: weird ! No topic available in grafana :( [09:27:21] I suspect that kafka by-topic metrics have not been ported to prometheus [09:27:38] Ah! makes sense [09:27:41] ok :) [09:27:58] then, if we haz dataz, you canz go for netflow :) [09:33:05] \o/ [09:33:21] druid1001 is up with the new kernel [09:33:27] awesome elukey :) [09:37:18] !log resumed druid hourly index jobs via hue and restored pivot's configuration [09:37:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:37:31] now I need to reboot thorium [09:38:43] !log reboot thorium (analytics webserver) for security upgrade - This maintenance will cause temporary unavailability of the Analytics websites [09:38:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:39:52] rebooting in a minute, all websites down now [09:43:00] all good! [09:43:18] joal: do you want to know something funny? [09:43:43] there is going to be soon another jvm upgrade to do :P [09:43:45] elukey: tell me !
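To make joal's tranquility explanation above concrete, here is a purely illustrative Scala sketch of a date-based task-naming scheme. This is not Tranquility's actual source; the name format is an assumption modeled on the index_realtime_* task ids visible in the Druid console, and the datasource name is hypothetical.

```scala
import java.time.format.DateTimeFormatter
import java.time.{ZoneOffset, ZonedDateTime}

// Deterministic names let tranquility reconnect to a running task after a
// restart, but they also mean a killed task's replacement for the same hour
// maps to the id Druid already knows, so the overlord rejects it.
def realtimeTaskName(dataSource: String, intervalStart: ZonedDateTime, partition: Int): String = {
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX").withZone(ZoneOffset.UTC)
  s"index_realtime_${dataSource}_${fmt.format(intervalStart)}_${partition}_0"
}

val hour = ZonedDateTime.of(2018, 1, 18, 9, 0, 0, 0, ZoneOffset.UTC)
realtimeTaskName("banner_activity_minutely", hour, 0)
// Both the original task and its post-kill replacement yield
// "index_realtime_banner_activity_minutely_2018-01-18T09:00:00Z_0_0",
// which is why fresh indexers only appear once the next hour starts.
```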
[09:44:00] Bwaaahhhhahhhahah ;( [09:44:45] but since Moritz is merciful he told us to couple it with the java 8 upgrade [09:45:05] so in theory we'll do everything in one go [09:45:22] great elukey [09:45:34] * joal bows to moritzm's mercy [09:45:46] elukey: about j8 tests - anything new on the labs cluster? [09:46:16] I was about to say that, we can definitely work on it before SF and make everything work [09:46:31] Andrew agrees with us that doing it before would be too aggressive :D [09:46:54] so I propose to work on it (depending on your work schedule in these days) up to Tue [09:47:04] and then write down a procedure for the prod cluster [09:47:31] elukey: works for me [09:48:25] elukey: I'll be on-and-off today, as Naé is still home, but should be full-time tomorrow, can make time this weekend and full-time monday (with late start) [09:52:52] joal: ack, I'll try to gently help the labs cluster understand that it needs to work [09:53:31] if you want to laugh a bit https://phabricator.wikimedia.org/T184794 [09:53:45] elukey: Thing is - I'm not sure what we need - first is to make sure spark1 and 2 work with Hive, ok [09:54:00] elukey: Then, testing upgrade? [09:55:05] joal: I am going to set via Puppet the JAVA_HOME to openjdk 8 (at the moment it is only set manually, with puppet disabled) [09:55:16] but we should already be good with upgrade testing no? [09:55:25] I mean, everything now has been working with java 8 [09:55:37] elukey: should we try to have spark2 working with hive with j7, and test upgrade again? [09:55:50] elukey: or let's just fix it with j8, works for me as well [09:58:51] elukey: just read the oozie/hive prometheus thing - Man there's something wrong in there [09:58:52] 10Analytics-Kanban, 10Operations, 10Patch-For-Review: Puppet admin module should support adding system users to managed groups - https://phabricator.wikimedia.org/T174465#3562875 (10MoritzMuehlenhoff) This is solely for T174110 or are we anticipating other use cases? The high level implementation idea seem... [10:04:14] joal: I wanted your opinion to share my inner pain, I felt really bad while checking those bash scripts [10:04:21] there is something horrible in those [10:04:37] now I understand why those packages do not have any systemd unit [10:04:45] elukey: I'm actually fairly sure there are MANY things horrible in those [10:04:50] hhahahaah [10:05:02] the symlink was really weird [10:06:21] I also have no idea who is the owner of the debian packaging [10:06:32] I guess cloudera ? [10:08:15] elukey: hm, I have no clue [10:08:38] elukey: if so, this might actually be a good argument to try hortonworks ;) [10:08:43] :D [10:08:54] * joal runs and hides [10:16:09] if they have proper systemd units I'd be happy to test it :D [10:16:15] I mean, it would be a really nice pro [10:19:05] 10Analytics-Kanban, 10Operations, 10monitoring, 10netops, 10User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3908967 (10elukey) @faidon whenever you have time do you mind to explain a bit what data is currently pushed to the netflow...
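On the netflow task just filed above: elukey's kafkacat check earlier showed pmacct emitting flat JSON records, and "now I have no idea how to read the data". As a hedged starting point - not the eventual T181036 job - here is a Spark Structured Streaming sketch that reads the topic and parses only the fields visible in that one sample message. It assumes the spark-sql-kafka package is on the classpath, the standard broker port, and that the real feed may carry more fields than the sample shows.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("netflow-peek").getOrCreate()

// Only the fields visible in the sample record from the log.
val netflowSchema = StructType(Seq(
  StructField("tag", LongType),
  StructField("as_dst", LongType),
  StructField("as_path", StringType),
  StructField("peer_as_dst", LongType),
  StructField("stamp_inserted", StringType),
  StructField("stamp_updated", StringType),
  StructField("packets", LongType),
  StructField("bytes", LongType)
))

val netflow = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka-jumbo1006.eqiad.wmnet:9092")
  .option("subscribe", "netflow")
  .load()
  .select(from_json(col("value").cast("string"), netflowSchema).as("event"))
  .select("event.*")

// Dump parsed records to the console while eyeballing the feed.
netflow.writeStream.format("console").start().awaitTermination()
```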
[10:22:00] elukey: quickly scanning https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/ch_getting_ready_chapter.html [10:22:09] elukey: some things make me tick [10:25:48] elukey: And it seems .sh scripts are used to start/stop hadoop all along the doc [10:25:51] :( [10:26:29] I suspect that it is a mess that upstream needs to fix first [10:26:43] very much probable elukey [10:27:35] 10Analytics-Kanban, 10Operations, 10Patch-For-Review: Puppet admin module should support adding system users to managed groups - https://phabricator.wikimedia.org/T174465#3562875 (10akosiaris) I'll echo Moritz on this one. It does look like adding system users to the admin module adds some complexity and do... [10:32:07] 10Analytics-Kanban, 10RESTBase-API, 10Patch-For-Review, 10Services (done): Update AQS pageview-top definition - https://phabricator.wikimedia.org/T184541#3909087 (10mobrovac) The RESTBase side of things has been deployed. [10:42:31] going for an errand + early lunch, ttl! [10:42:34] * elukey afk! [10:43:03] (reachable via hangouts/phone of course if needed) [10:47:42] 10Analytics-Tech-community-metrics, 10Possible-Tech-Projects, 10ECT-June-2015, 10ECT-March-2015, and 2 others: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#3909164 (10Aklapper) Note: The task description is very outdated and links to tec... [11:23:40] 10Analytics, 10TCB-Team, 10Documentation: Where should keys used for stats in an extension be documented? - https://phabricator.wikimedia.org/T185111#3909240 (10Legoktm) Just use `README` or `docs/metrics.txt` or wherever you want? :) [12:40:47] !log set piwik in readonly mode and stopped mysql on bohrium (prep step for reboot) [12:40:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:43:22] !log piwik on bohrium re-enabled [12:43:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:44:35] aaand bohrium also updated [12:44:43] last one is probably archiva [12:47:16] elukey: something just hit my mind: when updating the cluster to j8, it means we need to upgrade druid as well, no? [12:50:41] joal: not sure, can we do it later on or does it need to be done in parallel? [12:51:23] elukey: We failed to upgrade druid last time because of java version discrepancies - this is why I'm wondering [12:51:47] yep I remember, new druid requires java8 [12:52:11] My question would then be: if the cluster is java8, would druid under j7 work with it? [12:53:11] do you remember how druid was failing last time? I mean, what specifically was failing [12:54:38] https://phabricator.wikimedia.org/T164008 [12:54:53] "Druid 0.10 requires Java 8, which is fine. But the Analytics Hadoop cluster runs Java 7, and Hadoop indexing tasks were failing. See: https://groups.google.com/forum/#!topic/druid-user/aTGQlnF1KLk" [12:57:44] so it is possible that indexing jobs may fail, not sure what could happen [12:57:53] but afaik we'd be ready tomorrow to upgrade druid [12:58:07] so we can think about coupling the two [12:59:56] but I'd prefer to keep them separate [13:00:21] I can see two issues: [13:00:30] sounds good to me that way elukey :) [13:00:56] 1) HDFS read/write impaired on druid due to java mismatch -> big issue since it would affect wikistats [13:01:27] 2) only index jobs failing -> not a big problem for wikistats, but some problems for pivot/superset/etc..
[13:04:52] elukey: very much agreed [13:05:07] elukey: I wonder if it would be possible to test issue 1 in labs [13:05:42] pretty sure we can, I am currently working on the last settings for hadoop in labs, after that we can spin up a druid cluster [13:05:49] it should be easy (famous last words) [13:06:12] elukey: those are my TM :) [13:48:26] joal: now yarn nodemanagers in labs should be healthy [13:54:26] and also I deployed the new puppet changes for JAVA_HOME set via hiera, all good [14:10:12] joal: also configured the coordinator not to run the crons for the moment [14:10:21] without the need to disable puppet [14:10:32] now if we need stuff I'll add it via horizon ui [14:10:42] like camus etc.. [14:17:13] spark2 still acting a bit weirdly [14:30:57] joal: whenever you can shall we reload per country data? [14:32:15] 10Analytics-Kanban, 10Puppet, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3909685 (10elukey) Fixed all except j1.analytics.eqiad.wmflabs - @Ottomata do we still need this? It seems running superset, and puppet is broken in there.. [14:34:12] (also joal I wanted to ask your opinion about minor inconsistencies in rank order between wikistats and aqs by country) [14:37:54] fdans, joal: I will deploy refinery now [14:38:14] mforns: can I shadow you in the cave? [14:38:34] fdans, sure, gimme 5 mins to change rooms [14:39:18] cool! [14:43:41] fdans, omw to bc [14:48:02] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): Explain decrease in number of patchset authors for same time span when accessed 3 months later - https://phabricator.wikimedia.org/T184427#3909716 (10Aklapper) p:05High>03Normal I saved the CSV data for Dec2017 and Oct-Dec2017 locall... [14:57:06] milimetric, this change https://gerrit.wikimedia.org/r/#/c/403946/ can be deployed as is? or needs some manual table maintenance? [14:58:34] mforns: yes, can be deployed as-is, it's just updating the table definition, I already dropped and re-created the table [14:58:43] milimetric, ok! [14:58:45] so this is more just for reference and in the future if we drop the table again [14:59:02] thx for checking, sorry I wasn't more clear in the comment, we should start adding a Deployment: section [14:59:57] fdans: makes sense about the aggregates on the top pages. I was thinking like maybe averaging out the buckets for per-country, adding up the view_counts for top articles, but that gets too complicated, it's fine to leave it out for now [15:02:24] milimetric: maybe we need yet another property at the config detailing whether the metric is bucketed or not [15:02:27] Hey elukey / fdans - Was away, will be again [15:02:46] we're in the batcave deploying refinery joal :) [15:02:55] fdans: great :) [15:03:27] fdans: Will restart the jobs later tonight if ok [15:03:58] yess great joal :) [15:04:08] cool :) [15:04:10] Gone again ! [15:04:18] elukey, joal, the repo in deployment.eqiad.wmnet:/srv/deployment/analytics/refinery/scap has uncommitted modifications, do you know something about that?
[15:04:22] elukey: Will also investigate spark2 this evening [15:04:34] mforns: aouch - no idea [15:04:38] ok [15:06:40] elukey, joal, solved :] [15:07:29] !log starting refinery deployment [15:07:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:08:52] mforns: just seen the ping sorry [15:09:13] np :] [15:21:09] !log refinery deployment using scap and then deploying onto hdfs finished [15:21:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:41:16] 10Analytics-Kanban, 10Puppet, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3909876 (10Ottomata) j1 deleted! [15:42:05] (03CR) 10Milimetric: Map component and Pageviews by Country metric (037 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: 10Fdans) [15:42:37] 10Analytics-Kanban, 10Puppet, 10User-Elukey: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3909884 (10elukey) 05Open>03Resolved Closing the task since puppet should be ok now, please re-open otherwise! [15:44:23] 10Analytics-Cluster, 10Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3909891 (10Jgreen) >>! In T185136#3907424, @Ottomata wrote: > @Jgreen FYI, we'll need to coordinate this soon :) No problem. I tried a little puppetspelunking... [15:45:11] ottomata: hiiiiiiiii [15:45:38] whenever you have 10 mins can we check why in labs spark2-shell behaves weirdly? [15:46:01] it may be a config that needs to be applied somewhere that I am not aware of [15:46:18] 10Analytics-Cluster, 10Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3909912 (10Ottomata) > I'm guessing we're talking about a new pool of kafka hosts Yup! Mostly just changing settings and bouncing the kafkatee instances, but... [15:46:25] if we try to use the hive context it doesn't work (tested with show databases, spark-shell works) [15:46:38] hiii [15:46:49] hmm [15:50:41] elukey: how are you testing? [15:50:43] I'm trying [15:51:27] spark.sql("SHOW DATABASES").collect() [15:51:34] but i don't get any results [15:51:36] it works though [15:52:01] hmm [15:52:42] OK [15:52:47] hive-site.xml is supposed to be symlinked [15:52:50] in /etc/spark2/conf [15:52:51] hmmm [15:52:57] i'd think that was puppetized... [15:52:58] looking [15:54:28] ah elukey on install, the .deb package does the symlink [15:54:32] if hive-site.xml exists [15:54:35] maybe there was a race condition [15:54:59] I've added the symlink [15:57:31] (03PS14) 10Fdans: Map component and Pageviews by Country metric [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) [15:57:48] milimetric: pong :) [15:57:59] (03CR) 10Fdans: Map component and Pageviews by Country metric (035 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: 10Fdans) [15:59:10] looking :) [16:01:50] ottomata: sorry I was doing another thing, checking now! [16:02:08] it worked for me but no result was returned [16:03:03] 10Analytics-Cluster, 10Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3909987 (10Jgreen) >>!
In T185136#3909912, @Ottomata wrote: >> I'm guessing we're talking about a new pool of kafka hosts > Yup! Mostly just changing settings... [16:03:59] ottomata: workssssss \o/ [16:04:00] you rock [16:04:14] (03CR) 10Milimetric: [C: 032] Map component and Pageviews by Country metric [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: 10Fdans) [16:04:27] awyisss [16:04:28] so /etc/spark2/conf has the same trick as /etc/hadoop/conf etc.. [16:04:43] joal: spark2 works with java 8 :) [16:05:02] yessssssssssssssssssssssss [16:05:21] :) nice fdans, you're definitely meant for greatness, I'm just meant for laziness :) [16:05:45] milimetric: haha I didn't intend to merge just yet though [16:06:04] I still gotta uncomment those couple lines once the iso codes are live on the endpoint [16:06:11] ahh no hive-site.xml -> /etc/hive/conf.analytics-hadoop-labs/hive-site.xml [16:06:19] so only hive-site is symlinked [16:06:23] okok got it now [16:06:41] fdans: oh, sorry, just self-merge follow-ups [16:06:51] maybe in puppet we need to add the File to the requires of spark2 pkg [16:06:58] yup, and because the deb package does it on install, if the hive/conf/hive-site.xml file doesn't exist (yet), it won't symlink it [16:07:08] super [16:07:53] ottomata: no idea why spark2 picks up java8 straight away without any config, buuut everything works [16:08:06] I applied in labs the JAVA_HOME hiera change to force it to java8 [16:08:15] all good as far as I can see [16:08:55] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Provision new Kafka cluster(s) with security features - https://phabricator.wikimedia.org/T152015#3910001 (10Ottomata) [16:09:01] 10Analytics, 10Beta-Cluster-Infrastructure, 10Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3909999 (10Ottomata) 05Open>03Resolved deployment-kafka03 has been deleted. [16:09:08] 10Analytics-Kanban, 10Beta-Cluster-Infrastructure: deployment-kafka01 - disk is full - https://phabricator.wikimedia.org/T174742#3910003 (10Ottomata) 05Open>03Resolved a:03Ottomata deployment-kafka01 has been deleted. [16:15:39] 10Analytics, 10Analytics-Cluster: Move EventStreams to new jumbo cluster. - https://phabricator.wikimedia.org/T185225#3910019 (10Ottomata) p:05Triage>03Normal [16:15:42] elukey: it has to [16:15:50] spark2 doesn't work with java 7 [16:16:10] this is a good explanation :D [16:17:34] ottomata: I think that require_package('spark2') in profile::hadoop::worker comes before include ::profile::hive::client [16:17:41] so the symlink doesn't get installed [16:17:59] aye, maybe we can add a dependency [16:18:03] or manually ensure the symlink exists [16:18:15] we'll probably need to make a spark2 class or profile or something if we do either [16:18:24] eventually [16:22:37] ottomata: another question that we might need to test - do we need to upgrade druid as well? [16:22:55] or maybe move it to 0.10 [16:27:12] elukey: we should probably test a loading job if we can? but i'm fairly sure it will work with hadoop running java 8 [16:27:23] if we can upgrade druid to java 8 at the same time or soon, that would be good [16:27:32] we should do the actual druid upgrade as a separate piece [16:28:08] yep I'd really like to keep things separate.. [16:28:25] so you are suggesting to create a small cluster in labs and then test a load/indexing job? [16:32:28] elukey: maybe? hm. [16:32:47] don't have one right now, eh?
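The SHOW DATABASES check above is the right smoke test for the hive-site.xml symlink; a slightly fuller spark2-shell version is sketched below. Hedged: "wmf" and "webrequest" are the production names and the labs cluster may differ, but the key point stands - with hive-site.xml missing, Spark typically falls back to its built-in catalog and reports only "default", so inspecting the database list is more telling than waiting for an error.

```scala
// Runnable in spark2-shell, where `spark` is predefined.
val dbs = spark.sql("SHOW DATABASES").collect().map(_.getString(0))
println(dbs.mkString(", "))
assert(dbs.contains("wmf"), "Hive metastore not wired up - check the hive-site.xml symlink in /etc/spark2/conf")

// And a real metastore round-trip (table name is just an example):
spark.sql("DESCRIBE wmf.webrequest").show(5, truncate = false)
```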
[16:32:53] elukey: a single node cluster would be fine...i'll help [16:32:55] i'll make one now :) [16:33:12] if you have time sure, otherwise I can do it tomorrow :) [16:33:53] i'll at least spawn up the cluster now [16:34:02] then maybe you and/or jo al can test a loading job tomorrow [16:44:42] 10Analytics-Cluster, 10Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3910130 (10Ottomata) > first step on the frack side is to whitelist the new hosts at the firewalls, can you point me to the list and I'll add a phabricator tas... [16:46:59] 10Analytics-Kanban, 10Discovery-Analysis, 10MobileApp, 10Wikipedia-Android-App-Backlog: Bug behavior of QTree[Long] for quantileBounds - https://phabricator.wikimedia.org/T184768#3910132 (10Nuria) a:03Nuria [16:52:05] heya ottomata and elukey - Just here for a minute - Awesome catch on hive for spark2 :) [16:52:27] I'll try a loading job later tonight with the new 1-node cluster [16:59:45] 10Analytics, 10DBA, 10Patch-For-Review, 10User-Elukey: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#3910160 (10Nuria) [16:59:48] 10Analytics-Kanban, 10DBA: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414#3910159 (10Nuria) 05Open>03Resolved [16:59:57] 10Analytics: Private geo wiki data in new analytics stack - https://phabricator.wikimedia.org/T176996#3910162 (10Nuria) [16:59:59] 10Analytics-Kanban: Read the python code and design the Hadoop version - https://phabricator.wikimedia.org/T182944#3910161 (10Nuria) 05Open>03Resolved [17:00:14] 10Analytics-Kanban, 10Patch-For-Review: Make superset more scalable - https://phabricator.wikimedia.org/T182688#3910163 (10Nuria) 05Open>03Resolved [17:00:26] ping fdans [17:00:45] ping joal [17:02:04] 10Analytics-EventLogging, 10Analytics-Kanban: {tick} Schema Audit - https://phabricator.wikimedia.org/T102224#3910168 (10elukey) [17:02:06] 10Analytics-Kanban: Enforce policy for each schema: Sanitize {tick} [8 pts] - https://phabricator.wikimedia.org/T104877#3910167 (10elukey) [17:02:09] 10Analytics, 10DBA, 10Patch-For-Review, 10User-Elukey: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#3910166 (10elukey) 05Open>03Resolved [17:02:17] 10Analytics, 10DBA, 10Patch-For-Review, 10User-Elukey: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1618887 (10elukey) [17:16:07] (03PS10) 10Mforns: Improve WikiSelector [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) [17:23:06] 10Analytics: Investigate why data was missing - https://phabricator.wikimedia.org/T185229#3910241 (10Milimetric) [17:30:54] 10Analytics: Investigate why data was missing from mediawiki events around January 3rd - https://phabricator.wikimedia.org/T185229#3910283 (10Nuria) [17:35:44] ping ottomata [17:35:48] groskinnn [17:36:31] 10Analytics-Kanban: Investigate why data was missing from mediawiki events around January 3rd - https://phabricator.wikimedia.org/T185229#3910305 (10fdans) [17:36:57] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move EventStreams to new jumbo cluster. 
- https://phabricator.wikimedia.org/T185225#3910306 (10fdans) [17:40:52] 10Analytics-Kanban: Eventlogging of the Future - https://phabricator.wikimedia.org/T185233#3910345 (10fdans) [17:41:38] 10Analytics-Kanban: Eventlogging of the Future - https://phabricator.wikimedia.org/T185233#3910355 (10fdans) [17:41:41] 10Analytics, 10TCB-Team, 10Documentation: Where should keys used for stats in an extension be documented? - https://phabricator.wikimedia.org/T185111#3910356 (10fdans) [17:57:09] 10Analytics, 10MediaWiki-Releasing: Create dashboard showing MediaWiki tarball download statistics - https://phabricator.wikimedia.org/T119772#3910415 (10fdans) p:05Normal>03High [18:11:31] 10Analytics-EventLogging, 10Analytics-Kanban: Lookout for duplicates in EL refine - https://phabricator.wikimedia.org/T185237#3910470 (10Nuria) p:05Triage>03Normal [18:11:42] 10Analytics, 10Analytics-EventLogging: Lookout for duplicates in EL refine - https://phabricator.wikimedia.org/T185237#3910470 (10Nuria) [18:15:48] 10Analytics, 10Analytics-EventLogging: Implement purging scheme for eventlogging data on top of eventlogging refine - https://phabricator.wikimedia.org/T176426#3910499 (10Nuria) One idea is to refine incoming data so we have, say, popups popups_refined according to whitelist and those two tables are fille... [18:19:59] 10Analytics, 10Analytics-EventLogging: Lookout for duplicates in EL refine - https://phabricator.wikimedia.org/T185237#3910511 (10Ottomata) a:03Ottomata [18:22:29] btw, nuria_, milimetric, i forget, we have a page draft with eventlogging + druid schema guidelines, right? [18:22:35] i'm writing a reply to this page previews email [18:22:39] ottomata: yes [18:22:40] we do [18:22:44] ottomata: one sec [18:22:52] oh i found [18:22:53] https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines [18:22:57] i shoulda just wikitech searched [18:22:58] https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines [18:22:58] thanks nuria_ [18:23:00] beat me to it [18:23:03] yeah, google has everything [18:31:02] 10Analytics-Kanban, 10ChangeProp, 10EventBus, 10Services (watching), 10User-Elukey: Export burrow metrics to prometheus - https://phabricator.wikimedia.org/T180442#3910556 (10fdans) [18:32:30] 10Analytics: Configure superset to query mysql slaves - https://phabricator.wikimedia.org/T167427#3910559 (10fdans) 05Open>03declined [18:32:33] 10Analytics-Kanban, 10Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3910560 (10fdans) [18:33:19] 10Analytics-Kanban, 10Operations, 10Traffic, 10User-Elukey: Refactor kafka_config.rb and and kafka_cluster_name.rb in puppet to avoid explicit hiera calls - https://phabricator.wikimedia.org/T177927#3910563 (10fdans) [18:37:35] 10Analytics, 10Operations, 10Traffic, 10User-Elukey: Refactor kafka_config.rb and and kafka_cluster_name.rb in puppet to avoid explicit hiera calls - https://phabricator.wikimedia.org/T177927#3910569 (10fdans) [18:41:08] 10Analytics-Kanban, 10Operations, 10User-Elukey: Tune Varnishkafka delivery errors to be more sensitive - https://phabricator.wikimedia.org/T173492#3910612 (10fdans) [18:44:00] oh, nuria_ / fdans: I have to also disable the per-country metric, right? [18:44:01] joal: when you're back let me know if you're going to reload tonight :) [18:44:11] fdans: Here I am :) [18:44:16] milimetric: if you're going to deploy yes [18:44:20] fdans: Do you wish me to restart the jobs? [18:44:42] joal: that includes backfilling right? 
[18:44:44] fdans: another question: should we truncate first? [18:44:47] fdans: correct [18:44:58] joal: yeah I think we agreed to truncate [18:45:02] fdans: I'll actually restart the jobs with a start-end at the beginning of backfilling [18:45:38] fdans: ok, doing that: Truncate table then kill-restart loading jobs since beginning of existing data [18:46:13] you're the bestest! [18:47:07] I need to be afk for a couple hours but i'll be back later tonight [18:47:19] np fdans :) [18:47:32] ottomata: did you see all ksql_transient_.. topics in jumbo? [18:48:12] milimetric: Can you tell me the plan for metric-disabling? Is that in WKS2 front end? [18:50:27] joal: yeah, just changing the config: https://github.com/wikimedia/analytics-wikistats2/blob/master/src/config/metrics/reading.js#L27 [18:50:51] k milimetric - just triple checking [18:51:04] milimetric: Am I safe to truncate the table now or should I wait? [18:52:46] 10Analytics-Kanban, 10Operations, 10monitoring, 10netops, 10User-Elukey: Pull netflow data in realtime from Kafka via Tranquillity/Spark - https://phabricator.wikimedia.org/T181036#3910670 (10elukey) Had an interesting chat with @ayounsi and for the moment it seems that the only format expected in the ne... [18:53:25] joal: totally safe, yea [18:53:29] * elukey off! [18:53:36] great - Thanks milimetric [18:53:44] Bye elukey ! [18:56:18] !log Truncating table "local_group_default_T_top_bycountry"."data" in cassandra before reload [18:56:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:10:37] !log Add fake data to cassandra to silence alarms (Thanks again ema) [19:10:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:10:45] :) [19:11:35] !log Kill-Restart coord_pageviews_top_bycountry_monthly oozie job from 2015-05 [19:11:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:26:32] (03CR) 10Joal: [C: 031] "LGTM :)" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/403916 (https://phabricator.wikimedia.org/T167907) (owner: 10Joal) [19:27:20] (03CR) 10Joal: [C: 031] "Good for me as well :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/404662 (https://phabricator.wikimedia.org/T185100) (owner: 10Mforns) [19:29:13] Backfilling of top-by-country confirmed [19:35:37] ottomata: in case you missed it - Everything in a single place ;) https://docs.confluent.io/current/tutorials.html [19:47:41] joal cool! [19:56:49] joal: saw your comment about abstractdataclass, need to remove and will do. gerrit is working on/off today for me so i probably cannot take care of that today [19:56:59] joal: this code still needs to be tested on cluster though [19:57:08] nuria_: I can test if you wish [19:57:26] joal: either way, i can do it later on today if you have not gotten there yet, [19:57:41] nuria_: Was updatin [19:57:45] again ...
[19:58:08] nuria_: was updating druid loading to prepare for WKS2 data loading v2 [19:58:09] joal: gerrit is not loading but they will fix that eventually, saw your comments on e-mail [19:58:52] nuria_: maxmind code is better than it was for sure :) [19:58:58] nuria_: Thanks for that [20:00:14] joal: ok, i'm glad you like it [20:00:33] nuria_: I would have been lazy to get there though ;) [20:00:41] jaja [20:06:22] (03PS1) 10Joal: Add optional datasource to druid loading workflow [analytics/refinery] - 10https://gerrit.wikimedia.org/r/405053 [20:17:51] joal: btw, druid running on d-1 in analytics labs [20:18:05] ottomata: noted ! [20:18:08] if you have some mins to test a druid loading job from hadoop [20:18:09] ottomata: will test [20:18:17] gr8 [20:19:28] ottomata: can we relaunch camus for some time? No more data in cluster :*( [20:21:49] i think the kafka cluster is busted [20:21:50] looking [20:22:47] gonna try to wipe it.. [20:25:28] ottomata: I can also generate a bunch of fake json, but it would be interesting to do it with fake data that looks real :) [20:26:06] like my animals!? [20:30:29] :D [20:30:38] ottomata: real webrequest-style data [20:30:50] ottomata: I can also fake some if your cluster is gone [20:32:34] pssh i dunno what is up with this cluster, i'm totally wiping it, time for new! [20:35:07] ottomata: shall I generate some data, or do you prefer us testing your new cluster with camus as well? [20:35:43] oh joal, we kinda already tested that I think? i dunno, i think i'll set it up anyway... [20:36:03] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3911039 (10zhuyifei1999) Could be related to the load increase of the servers as a result of the recent *.labsdb redirection to the analytics database servers. [20:36:37] ottomata: we tested indeed [20:36:58] since java 8 upgrade? [20:37:39] hm, I think so, ya? no? [20:37:57] ottomata: not since Luca pushed java8 with puppet [20:38:03] but with j8 enabled, yes [20:38:09] let's try it again :) [20:38:12] to be sure [20:38:16] ok' [20:46:03] (03PS1) 10Milimetric: Fix blocking problems with map component [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/405061 [20:46:32] (03CR) 10Milimetric: [V: 032 C: 032] "fdans, take a look at this when you're back, just going to self-merge for now" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/405061 (owner: 10Milimetric) [20:47:04] (03PS11) 10Milimetric: Improve WikiSelector [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) (owner: 10Mforns) [20:53:43] (03CR) 10Milimetric: [C: 032] Improve WikiSelector [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) (owner: 10Mforns) [20:54:06] allrighty joal [20:54:09] data rolling back in [20:54:14] camus running [20:54:42] 10Analytics-Cluster, 10Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3911114 (10Jgreen) >>! In T185136#3910130, @Ottomata wrote: >> first step on the frack side is to whitelist the new hosts at the firewalls, can you point me to... [20:54:47] ottomata: awesome, many thanks ! [20:54:53] (03CR) 10Milimetric: [C: 032] "I'm going to merge this, we'll look at the chain thing together later. The code does look cleaner, but it was already good.
Another thin" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) (owner: 10Mforns) [20:55:17] ottomata: I think to prevent the issue we already had about disk being full, maybe we can stop it later on tonight? [20:56:02] ottomata: I wanted to link people to your thoughts on Stream Data Platform, or the best thing to read about it for the summit. I feel like that's the best project to describe what our team does and can do [20:56:29] (03CR) 10Milimetric: [V: 032 C: 032] Improve WikiSelector [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) (owner: 10Mforns) [20:57:03] ottomata: this is the task where I'd link to it: https://phabricator.wikimedia.org/T183320 [20:58:18] 10Analytics-Cluster, 10Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3911119 (10Ottomata) > Another question, does kafka use a different port for TLS service? Yes, :9093. > As long as we can create certs for the two frack hosts... [20:59:22] joal: hmm, it's one req per second [20:59:22] but [20:59:27] which disk ? [20:59:29] kafka? [20:59:45] ottomata: it can wait tomorrow morning, but hadoop got full pretty fast (like a couple days) [20:59:53] ah ok [21:00:07] hm, well if we stop my curl loop, it'll stop [21:00:07] i can stop whenever you like [21:00:18] milimetric: sure! [21:00:26] we are still wordsmithing i think [21:00:27] but [21:00:28] https://wikitech.wikimedia.org/wiki/User:Ottomata/Stream_Data_Platform [21:00:40] this is kinda more than 'analytics' though [21:01:06] but, i'd love it if you disseminated it [21:01:55] (03PS1) 10Milimetric: Release 2.1.5 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/405065 [21:02:16] ottomata: yeah, it's more than Analytics, and that's exactly the point I'm trying to make [21:02:32] like, if we are to hit this extremely ambitious strategy by 2030, we need to start working together [21:02:44] and this is the perfect example for that [21:04:42] (03CR) 10Milimetric: [C: 032] Release 2.1.5 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/405065 (owner: 10Milimetric) [21:04:51] (03CR) 10Milimetric: [V: 032 C: 032] Release 2.1.5 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/405065 (owner: 10Milimetric) [21:06:45] oh nice [21:06:48] +1 milimetric [21:07:05] yeah man, put all ur knowledge events into stream data platform [21:07:15] then everyone can put them wherever they want!
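Earlier, joal offered to generate fake webrequest-style JSON if the test kafka cluster stayed empty. A hedged sketch of what such a generator could look like - the field subset below is illustrative, not the full webrequest schema, and the host/agent values are made up:

```scala
import java.time.LocalDateTime
import scala.util.Random

// Emit webrequest-flavoured JSON records for feeding a test topic.
def fakeWebrequest(i: Int): String = {
  val hosts  = Seq("en.wikipedia.org", "fr.wikipedia.org", "commons.wikimedia.org")
  val agents = Seq("Mozilla/5.0 (fake)", "WikipediaApp/fake")
  val ts = LocalDateTime.of(2018, 1, 18, Random.nextInt(24), Random.nextInt(60), Random.nextInt(60))
  s"""{"hostname":"cp-fake${i % 10}","sequence":$i,"dt":"$ts","uri_host":"${hosts(Random.nextInt(hosts.size))}","uri_path":"/wiki/Page_${Random.nextInt(1000)}","http_status":"200","response_size":${Random.nextInt(100000)},"user_agent":"${agents(Random.nextInt(agents.size))}"}"""
}

// Print a few records; the output could then be piped into e.g.
// kafkacat -P -t <test-topic> to populate the cluster.
(1 to 5).map(fakeWebrequest).foreach(println)
```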
[21:07:22] KNOWLEDGE AS A SERVICE [21:10:36] (03PS1) 10Milimetric: Release 2.1.5 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/405067 [21:10:43] (03CR) 10jerkins-bot: [V: 04-1] Release 2.1.5 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/405067 (owner: 10Milimetric) [21:11:37] (03Abandoned) 10Milimetric: Release 2.1.5 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/405067 (owner: 10Milimetric) [21:14:07] (03PS1) 10Milimetric: Release 2.1.5 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/405069 [21:15:10] (03CR) 10Milimetric: [V: 032 C: 032] Release 2.1.5 [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/405069 (owner: 10Milimetric) [21:17:05] ottomata: that's even more ambitious than the 2030 strategy :) [21:20:40] :) [21:58:14] 10Analytics-Cluster, 10Analytics-Kanban: Add IPv6 addresses for kafka-jumbo hosts - https://phabricator.wikimedia.org/T185262#3911265 (10Ottomata) p:05Triage>03Normal [22:01:01] hey ottomata - had you stopped your loop, or camus? [22:06:54] have not [22:07:17] ottomata: I only have a single file in hdfs :( [22:07:24] looks like something is wrong [22:07:38] looking [22:09:18] ottomata: it's as if camus was thinking it is running, while there is no job [22:09:49] ottomata: Actually, camus IS running, but no yarn job [22:09:53] :( [22:10:47] weird joal [22:10:50] i just ran it manually [22:10:53] it seemed to work? [22:11:26] dunno why it was failing in cron [22:11:44] ottomata: Seems related to an existing stale process [22:16:31] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3911330 (10IKhitron) Can't understand, but nevermind. Is this permanent? [22:17:19] oh there's a running yarn camus now? [22:17:45] not in yarn no, but as a java process [22:17:50] oh hm [22:17:51] weird ok [22:17:55] ok now though? [22:18:10] ottomata: very much [22:18:18] ottomata: you can kill your loop, I have data [22:18:21] ottomata: many thanks :) [22:18:33] ottomata: let's not fill disks with animals :) [22:19:00] ok, will kill loop :) [22:19:59] refine ongoing ottomata - Will test druid loading when finished [22:21:29] grrr8 [22:49:55] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3911404 (10zhuyifei1999) > Is this permanent? Yes. [22:52:01] ottomata: not working - probably network issue [22:52:16] ottomata: oozie doesn't even manage to launch the indexation job [22:54:02] Arf, actually no, nevermind ottomata - sorry for the noise [22:56:58] ? oook:) [22:57:19] still not working, but looking for why now :) [22:58:04] ottomata: looks like the indexation failed on druid side [22:58:07] going over there :) [22:58:29] Ohhh ! I think I know :) [22:58:46] oh? [22:58:57] parameters in oozie templates [22:59:27] actually, no [22:59:44] ottomata: is yarn master configured correctly in d-1? [23:03:39] ottomata: in druid middlemanager logs: java.net.UnknownHostException: analytics-hadoop-labs [23:04:22] OH i think i know [23:04:24] luca had this problem... [23:04:25] hmmm [23:04:54] ottomata: I had it as well on cluster for oozie, but now seems solved - What magic have you done? [23:05:11] i did nothing! [23:06:21] it's working joal? [23:06:43] in oozie yes, but from druid seems no [23:07:41] it's weird ottomata, cause hdfs command works from d-1, but it seems that middlemanager (maybe peons?) can't find it by name [23:17:21] joal i'm not sure where it's getting that hostname from...
analytics-hadoop-labs [23:17:26] ... [23:17:31] that's not in oozie or something somewhere? [23:17:47] I passed it as hostname (it's defined as-is in hdfs-site.xml) [23:18:09] OHHH [23:18:10] OH [23:18:11] i think i know... [23:19:14] misconfiguration on my part...luca's hadoop configs only apply to nodes that start with hadoop* [23:19:19] so i had to copy paste some [23:19:29] could be ottomata :) [23:21:26] joal try now [23:21:29] sure [23:22:51] wait wah [23:22:53] sorry joal [23:22:58] it didn't change what I thought it did... what's wrong here.. [23:24:04] ottomata: At least druid launched an indexation [23:24:45] ottomata: seems to have worked (another issue, but indexation job launched) [23:27:32] it didn't? [23:28:18] ottomata: druid indexation succeeded :) [23:28:21] Awesome [23:28:35] ottomata: is d-1 running j7? [23:28:48] AH i know [23:28:50] yes [23:28:54] but the hostname is still not right [23:29:05] because of more copy paste problems... [23:29:06] on it [23:29:20] i should have named this node hadoop-druid1 [23:29:24] then everything would just work [23:29:30] ottomata: I have no clue why it succeeded then :) [23:32:43] ottomata: Got results from druid - This is a success :) [23:35:14] ottomata: Gone to bed for tonight [23:35:19] ottomata: Many thanks for your help :) [23:35:27] ottomata: See you tomorrow [23:39:36] great!
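On the UnknownHostException chased above: "analytics-hadoop-labs" is the HDFS HA nameservice name, not a resolvable host, so any JVM without the matching dfs.* properties in its hdfs-site.xml - like the druid middlemanager peons before ottomata's copy-paste fix - fails exactly this way. A hedged sketch for checking what a JVM on the node actually sees, using the standard Hadoop client API; the config paths are the usual defaults and may differ on the labs instance:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// Explicitly load the node's Hadoop client config (assumed locations).
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"))

println(conf.get("fs.defaultFS"))                           // e.g. hdfs://analytics-hadoop-labs
println(conf.get("dfs.nameservices"))                       // should include analytics-hadoop-labs
println(conf.get("dfs.ha.namenodes.analytics-hadoop-labs")) // should list the namenode ids

// Without the dfs.* HA properties on its classpath, a client treats the
// nameservice name as a plain hostname, falls back to DNS, and throws
// java.net.UnknownHostException: analytics-hadoop-labs right here:
val fs = FileSystem.get(conf)
println(fs.exists(new Path("/")))
```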