[06:25:06] (CR) Elukey: "My test coordinator finished successfully, so I have compared timings. The current job takes ~1h, meanwhile mine took more than 7..." [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[06:38:57] testing also hive.exec.submit.local.task.via.child = false
[06:39:00] looks promising
[06:39:09] for edit hourly I mean
[06:39:22] anyway, need to go to the doctor this morning, will start a little later!
[08:14:45] (PS2) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[08:15:44] (CR) Elukey: "With this new configuration, everything works as expected in the same amount of time. I checked heap usage of HiveServer2 while executing " [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[08:25:57] Analytics, Patch-For-Review, User-Elukey: Move refinery to hive 2 actions - https://phabricator.wikimedia.org/T227257 (elukey) We encountered two issues when after the migration to hive2 actions: * T229669 * https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/528167/
[08:26:09] mforns: o/
[08:26:11] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/528167/ is interesting
[08:26:17] let me know your thoughts
[08:30:55] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (elukey) >>! In T229347#5394507, @Ottomata wrote: > Another gotcha: > > ` > Exception: Python in worker has different version 3.5 than that in driver 3.7, PySpark cannot run wi...
[09:00:42] (PS8) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[09:03:33] (PS7) WMDE-leszek: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:05:06] (CR) jerkins-bot: [V: -1] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:18:14] (PS1) Elukey: Revert "Revert "edit-hourly: move oozie coordinator to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/528407
[09:30:13] (PS9) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[09:30:52] (PS8) WMDE-leszek: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:33:19] (CR) jerkins-bot: [V: -1] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:38:45] (CR) WMDE-leszek: "recheck" [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:42:12] (CR) jerkins-bot: [V: -1] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:46:12] (CR) WMDE-leszek: "tests failures are pretty odd, given the class loader issue in the maven plugin is fixed in the parent change, also the failing JUnit test" [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
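A note on the hive.exec.submit.local.task.via.child change discussed above (gerrit 528167): the property is a standard Hive setting and is normally applied with a SET statement at the top of the .hql file. A minimal, illustrative sketch; the table, partition variable and query below are assumptions, not the actual edit-hourly query from the refinery:

    -- Illustrative only: the real query lives in the edit-hourly .hql in analytics/refinery.
    -- Run the map-join local task inside the submitting JVM instead of spawning a child JVM.
    SET hive.exec.submit.local.task.via.child=false;

    SELECT wiki_db, COUNT(*) AS edits
    FROM wmf.edit_hourly
    WHERE snapshot = '${snapshot}'
    GROUP BY wiki_db;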
[09:52:40] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (elukey) I was finally able to test on analytic1030 Refine. The missing bit was `--conf spark.executorEnv.LD_LIBRARY_PATH=/usr/lib/hadoop/li...
[10:48:06] * elukey lunch!
[12:43:24] fdans: o/
[12:44:12] I have created https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528445/ to update the AQS's druid config
[12:45:16] IIRC nuria told us to deploy it as final step after the generation of the new snapshot
[12:46:22] elukey: yesss awesome, but this is part of the train right?
[12:47:02] fdans: I have always done it with Joseph as separate step, but we can add it to the train yes
[12:47:14] anytime you want ping me :)
[12:53:23] Analytics, Analytics-Kanban, Patch-For-Review: Roll restart all openjdk-8 jvms in Analytics - https://phabricator.wikimedia.org/T229003 (elukey)
[12:53:53] \o/
[12:53:58] finally done
[13:02:22] Analytics, Analytics-Kanban: Add more dimensions to netflow's druid ingestion specs - https://phabricator.wikimedia.org/T229682 (elukey) Current config: ` profile::analytics::refinery::job::eventlogging_to_druid_job { 'netflow': job_config => { database => 'wmf',...
[13:23:09] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Oh, cool! This makes all 3 of those py3 versions available? Yeah for sure we should do this. As long as default python3 stays the same, this is safe! Awesome!
[13:25:26] Analytics, Operations, Core Platform Team Legacy (Watching / External), Patch-For-Review, and 2 others: Replace and expand codfw kafka main hosts (kafka200[123]) with kafka-main200[12345] - https://phabricator.wikimedia.org/T225005 (Ottomata) In addition to the steps in https://phabricator.wikime...
[13:29:02] (CR) Ottomata: [C: +1] "Interesting! I think this is worth a try, especially if this job seems work and be fine. This could cause problems on the hive-server2 J" [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[13:29:04] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (elukey) @Ottomata there is a caveat though: all the python libraries have only the version in debian, so changing the version of the interpreter might be a problem.. We'll need...
[13:31:46] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Hm indeed. For the Spark case, we might need to also include all the binary deps of pyarrow (e.g. numpy) for the python version in the deb package. I.e. include bot...
[13:39:50] elukey: i bet your refine job failed because I stopped using an30 for testing.
[13:39:55] am using an-tool1006
[13:39:59] so you prob had an old jar
[13:40:12] am trying now on 1006
[13:41:23] first trying without kinit... :)
[13:45:11] Analytics, Research: Recommend the best format to release public data lake as a dump - https://phabricator.wikimedia.org/T224459 (Halfak) Hey folks, I thought it might be useful if I shared some thoughts here. When I first started analyzing Wikipedia edit histories, the first thing I did was figure out...
[13:54:27] o/
[13:54:35] sorry I was afk
[13:54:51] ah okok I thought something was going on, the null pointer was really weird :)
[13:55:55] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) I've been using an-tool1006, so the most recent .jar is there. Without kinit, I get stuff like: ` 19/08/06 13:41:21 WARN Client...
[13:58:47] ottomata: about --^ did you turn off rcp encryption/auth?
[13:59:12] because it is set in the yarn config, so I believe that it must be there
[13:59:34] I can remove all that stuff for the moment in theory
[13:59:59] (coffee brb)
[14:01:32] elukey: it was in yarn mode, should the rcp auth work there?
[14:02:18] trying with rpc auth disabled...
[14:05:36] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) If I disable RPC auth with `--conf "spark.authenticate=false" --conf "spark.shuffle.service.enabled=false" --conf "spark.dynamicA...
[14:07:35] mmm nope it should have been the opposite, namely working with auth enabled
[14:07:39] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) Ah, but those confs disable dynamicAllocation and shuffle service...which we don't really want to do. If I only do `--conf "spar...
[14:08:50] elukey: i think this isn't working then :(
[14:09:43] yesterday we had considered disabling the RPC auth to let us turn on kerberos with spark 2.3, right?
[14:10:04] but, from what I can tell, the shuffle service needs RPC auth to work? and dynamic allocation needs the shuffle service.
[14:10:06] sooooo
[14:10:15] we need a way to get RPC auth working in Yarn mode?
[14:10:17] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (elukey) The last failure is expected, since in the testing cluster the shuffler wants authentication via spark RPC native or SASL, and if d...
[14:10:17] right?
[14:10:44] nono the shuffler wants that because there are settings in yarn-site.xml
[14:10:48] if I remove them it will work fine
[14:11:00] oh
[14:11:08] in theory it should work now in yarn mode with authentication + encryption
[14:11:15] ah
[14:11:18] adding the library path etc..
[14:11:19] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) ` /usr/bin/spark2-submit \ --name otto_refine0 \ --master yarn \ --class org.wikimedia.analytics.refinery.job.refine.Refine \ --c...
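The spark2-submit command quoted at 14:11:19 above is truncated in the log. As a hedged illustration only, not the actual Refine invocation from T228291, the RPC-auth flags mentioned at 14:05 are typically passed like this; the class and jar are placeholders:

    # Illustrative only; not the literal command from T228291.
    # With spark.authenticate=false, the external shuffle service (and therefore
    # dynamic allocation) must also be disabled, as noted in the discussion above.
    spark2-submit \
      --master yarn \
      --conf spark.authenticate=false \
      --conf spark.shuffle.service.enabled=false \
      --conf spark.dynamicAllocation.enabled=false \
      --class org.example.PlaceholderJob \
      /path/to/placeholder.jar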
[14:11:27] ok ^ has my command with rpc auth disabled
[14:11:27] so if it failes there might be something going on
[14:11:31] you can remove that part
[14:11:32] and try
[14:11:38] super will try to see what it is happening
[14:11:52] but I'll also disable auth+encryption and re-test everything
[14:12:08] we already know that local mode doesn't work
[14:12:11] aye
[14:12:21] it is really a pity though to let everything un encrypted after kerberos
[14:12:27] aye
[14:12:51] i dunno, if yarn RPC should work...maybe we can just patch the spark2-submit stuff to always disable RPC auth in local mode
[14:13:05] i don't love it but it would be an ok hold over i guess
[14:13:33] nah we can wait 2.4, hopefully it should not come that far in the future :)
[14:13:40] e.g. maybe a wrapper that al;ways passes a custom --properties-file
[14:13:42] aye ok.
[14:13:48] disable RPC auth in kerberos until 2.4 upgrade
[14:14:21] elukey: ok then, so we don't need to test RPC auth, right? we just need to disable it in yarn-site and in spark-defaults, and then test?
[14:14:39] I think so yes, lemme remove settings now
[14:15:01] elukey: btw, the LD_LIBRARY_PATH stuff is not needed in local mode
[14:16:24] it is exported in spark-env.sh
[14:16:29] so i guess that makes sense
[14:16:32] since it is all local
[14:16:55] dunno why it wouldn't be set for remote executors
[14:17:13] super weird
[14:25:42] ok ottomata RCP should be plain now
[14:25:45] if you want to test
[14:25:48] k
[14:27:40] do i still need to disablle in --conf cli?
[14:28:47] nono
[14:29:01] it should work as it is
[14:29:11] modulo the LD_LIBRARY_PATH thing
[14:30:47] elukey: am getting
[14:30:47] Unable to create executor due to Unable to register with external shuffle server due to : java.lang.IllegalStateException: Expected SaslMessage, received something else (maybe your client does not have SASL enabled?)
[14:30:53] do the nodemanagers need restarted?
[14:31:01] don't really remember how shuffle service works
[14:31:55] I already restarted them mmm
[14:32:11] lemme check
[14:32:20] (CR) Ottomata: [V: +2 C: +2] swift_upload.py to handle upload and event emitting [analytics/refinery] - https://gerrit.wikimedia.org/r/525435 (https://phabricator.wikimedia.org/T227896) (owner: Ottomata)
[14:32:40] sorry didn't do a proper restart, the cumin command was wrong
[14:32:44] now it should work
[14:32:52] ottomata: --^
[14:33:31] kj
[14:34:21] elukey: let's go ahead and puppetize the spark.executorEnv.LD_LIBRARY_PATH in spark-defaults too eh?
[14:36:50] ottomata: today I have done some tests, it seems effective (for yarn mode) only if set by the client
[14:37:01] I added it first to spark defaults on the workers but nothing
[14:37:15] anyway, if it works it seems ok to add yes!
[14:37:17] !
[14:37:30] well we'd just set it everywhere in spark-defaults anyway. hm
[14:38:00] elukey:
[14:38:01] yarn-site.xml has
[14:38:07]
[14:38:07] yarn.app.mapreduce.am.env
[14:38:07] LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
[14:38:07]
[14:38:49] i wonder what would happen if we set spark.executorEnv.LD_LIBRARY_PATH in yarn-site.xml
[14:38:57] hmm, nawwww
[14:39:03] the shuffle service isn't reading the snappy stuff.
[14:39:52] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) After https://gerrit.wikimedia.org/r/c/operations/puppet/+/528483, yarn client, yarn cluster and local mode all work great!
[14:41:19] nice! --^
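The yarn-site.xml snippet pasted at 14:38 above lost its XML tags in the log; only the property name and value survived. With standard Hadoop configuration syntax it would look like this:

    <!-- Reconstructed with standard yarn-site.xml property syntax; the tags were stripped from the log above. -->
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
    </property>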
[14:41:34] ottomata: I think that we can merge the change then, so it will get deployed this week
[14:41:43] k
[14:41:58] (CR) Ottomata: [C: +2] Refine: infer hiveServerUrl from config [analytics/refinery/source] - https://gerrit.wikimedia.org/r/526670 (https://phabricator.wikimedia.org/T228291) (owner: Ottomata)
[14:47:22] elukey: any objections to installying pyall on cluster now?
[14:47:30] it'll help me test stuff from stat1005
[14:49:59] nope!
[14:52:04] elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/528492/1/modules/profile/manifests/analytics/cluster/packages/common.pp
[14:53:01] ottomata: isn't pyall a component to add via apt config to then require python37 etc...?
[14:53:17] doh doh doh
[14:53:29] that makes way more sense elukey :p
[14:53:37] haha
[14:54:41] :)
[15:05:59] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: TLS certificates for Analytics origin servers - https://phabricator.wikimedia.org/T227860 (elukey)
[15:06:32] * elukey afk for a bit!
[15:23:59] (CR) WMDE-leszek: "meh, tests are not really skipped/ignored, @Ignore annotation does not work with the JUnit version used" [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:29:06] (CR) WMDE-leszek: [C: -2] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:32:50] (PS10) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[15:41:19] (CR) jerkins-bot: [V: -1] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:46:46] (PS11) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[15:53:18] (CR) Mforns: [V: +2 C: +2] "Two +1s, merging!" [analytics/refinery] - https://gerrit.wikimedia.org/r/527583 (https://phabricator.wikimedia.org/T229669) (owner: Mforns)
[15:53:48] (PS2) Mforns: Revert "Revert "cassandra: move oozie bundle to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/527616
[15:53:57] (CR) jerkins-bot: [V: -1] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:54:15] (CR) Mforns: [V: +2 C: +2] "Merging to reenable hhive2 actions for fixed cassandra jobs." [analytics/refinery] - https://gerrit.wikimedia.org/r/527616 (owner: Mforns)
[15:57:18] (PS1) Mforns: [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612)
[15:57:39] (CR) Mforns: [C: -2] "Still WIP" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[15:57:51] elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/528492/2/modules/profile/manifests/analytics/cluster/packages/common.pp
[15:59:36] ottomata: looks good, not sure how require_package behaves in this case (since it should come before the surrounding class, but it is required by the apt stuff..)
[15:59:55] i could change it before the class?
[15:59:56] package { 'etc.': } is probably less convoluted, but I am not sure
[16:00:04] but i think the class should declare the package resource too
[16:00:08] so iexpectd it to work
[16:00:13] super
[16:00:15] looks good
[16:00:48] k trying
[16:00:53] ... well after meetings
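On puppetizing spark.executorEnv.LD_LIBRARY_PATH in spark-defaults, discussed around 14:34 and 14:52 above: spark-defaults.conf takes simple whitespace-separated key/value lines. A hypothetical sketch, assuming the same native-library path quoted earlier; the file path is an assumption, not the deployed config:

    # Hypothetical spark-defaults.conf excerpt (e.g. /etc/spark2/conf/spark-defaults.conf).
    # Make the Hadoop native libraries (snappy, etc.) visible to YARN executors.
    spark.executorEnv.LD_LIBRARY_PATH    /usr/lib/hadoop/lib/native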
[16:02:21] (CR) jerkins-bot: [V: -1] [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[16:02:41] (PS12) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[16:11:42] (PS13) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[16:13:21] what? non-ancient python? this is a debian shop we only run stable 5 year old releases :P
[16:15:14] haha, well buster is available now!
[16:15:20] gotta figure out how to get to it
[16:16:19] you are welcome to figure out how to use python 3.7 with your stuff after this;
[16:16:28] we'll be eventually upgrading it to the default anyway
[16:16:47] the installed debian python packages might not work tho; since they are meant for 3.5
[16:16:54] but if you ship all your deps to the cluster anyway
[16:16:55] it might work
[16:18:26] i might, i think the main thing 3.7 brings over 3.5 that we've been using is some more optional typing support, and better async, nifty but probably not super important
[16:18:37] of course there is probably a wealth of other things i dont know that maybe i do want :)
[16:19:03] and f strings?
[16:19:53] oh, i guess those are nice
[16:20:13] (PS9) WMDE-leszek: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[16:21:22] (CR) WMDE-leszek: [C: +2] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[16:21:39] (CR) WMDE-leszek: [C: +2] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[16:21:59] (Merged) jenkins-bot: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[16:22:16] (Merged) jenkins-bot: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[16:26:21] (PS3) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[16:30:26] (PS4) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[16:30:35] mforns: hope that --^ is clearer
[16:32:32] elukey, you want me to merge?
[16:33:37] or just +1?
[16:37:13] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[16:37:30] ok, you have another +1, so you can merge :]
[16:38:37] thanks!!
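On the Python 3.5 vs 3.7 aside around 16:18 above: f-strings, added in Python 3.6, are the most visible syntax difference. A trivial comparison:

    # str.format works on 3.5; the f-string line is a SyntaxError on 3.5 and fine on 3.6/3.7.
    job, hour = "refine_eventlogging", 14
    print("job={} hour={}".format(job, hour))
    print(f"job={job} hour={hour}")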
[16:38:44] (PS5) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[16:38:48] (CR) Elukey: [V: +2 C: +2] edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[16:39:10] (PS2) Elukey: Revert "Revert "edit-hourly: move oozie coordinator to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/528407
[16:39:13] (CR) Elukey: [V: +2 C: +2] Revert "Revert "edit-hourly: move oozie coordinator to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/528407 (owner: Elukey)
[16:42:01] logging off earlier o/
[16:49:19] o/
[18:27:53] ottomata, I'm getting an inconsistency between the mediawiki history job and the data in Hive, it's regarding the page_artificial_id, it's in the MediawikiEvent schema, but not in the hive table... does this ring a bell to you?
[19:02:34] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) //Writing this down for future me:// I have 2 goals. 1. Get Spark scala and pyspark 2.3.0 with pyarrow to work in a heterogeneous Stretch + Buster environment. 2....
[19:03:09] mforns: MediawikiEvent schema?
[19:03:19] ottomata, yes
[19:03:36] that is part of mediawiki history?
[19:04:01] the one in the file: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/denormalized/MediawikiEvent.scala
[19:04:23] it has the page_artificial_id, but the table in Hive does not
[19:04:34] hm, mforns no i don't know what this is
[19:05:01] I think the hive table never had the page_artificial_id, but not sure how that comes to be
[19:05:21] aye
[19:06:14] I might not be able to use the MediawikiEvent... it seems tied to the history reconstruction process. :[
[19:06:25] I'll have to repeat a bunch of code..
[19:08:48] hmnmm
[19:08:57] mforns: sorry i gotta run for a bit, not really sure if i cna help anyway :/
[19:09:09] gotta go get an internatinoal drivers permit! and pick up some malaria meds
[19:09:12] back in a little while...
[19:09:12] no problemo, will look more
[19:09:16] k
[19:22:06] I don't understand how the current code does not write the page_artificial_id o.O, the method toRow from MediawikiEvent is used just before writing the denormalized data to HDFS, so confused...
[20:56:24] (back)
[21:05:27] hey ottomata, I haven't found an explanation yet... :C
[21:06:15] I'm going to go ahead and try to modify my current mediawiki history dump job to not use MediawikiEvent.
[21:07:05] unless you want to pair with me and look into the crazinesssss
[21:44:24] Analytics, Anti-Harassment (The Letter Song), Patch-For-Review: Instrument Special:Mute - https://phabricator.wikimedia.org/T224958 (nettrom_WMF) >>! In T224958#5394397, @dmaza wrote: > @Niharika I think we might need approval from someone on the analytics team. Do you know who can we ping here? I'v...
[22:27:37] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Hm, more thoughts. Even if I figure out ^, I'm not sure it allow for an incremental upgrade to Buster. If we upgrade workers first, the Buster workers will only wor...
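For the page_artificial_id mismatch discussed from 18:27 onward: one way to see which fields exist in the Scala MediawikiEvent case class but not in the Hive table is to compare the two schemas in spark2-shell. A hedged sketch, assuming the refinery-job jar is on the classpath, that MediawikiEvent is a Spark-encodable case class, and that the table is wmf.mediawiki_history:

    // Hypothetical spark2-shell diagnostic, not code from the refinery itself.
    import org.apache.spark.sql.Encoders
    import org.wikimedia.analytics.refinery.job.mediawikihistory.denormalized.MediawikiEvent

    val caseClassFields = Encoders.product[MediawikiEvent].schema.fieldNames.toSet
    val hiveFields      = spark.table("wmf.mediawiki_history").schema.fieldNames.toSet

    // Fields present in the Scala schema but missing from the Hive table (e.g. page_artificial_id).
    println(caseClassFields diff hiveFields)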
[22:32:06] mforns: sorry I missed your ping!
[23:29:59] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) > mayyyybe I can get something working where the python3.7 dependencies are always shipped along with the spark job to workers JAW DROP...I got this to work. 1. Get...
[23:39:28] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Whoa wait a minute. If I can do ^, I can upgrade to Spark 2.4.3 before buster. I'm not relying on the Debian packaged python dependencies anymore...I might even be a...
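The numbered steps in the 23:29 comment above are cut off in the log. The general technique being described, shipping a self-contained Python 3.7 environment to the YARN workers together with the job, usually looks something like the sketch below. This is illustrative only and not necessarily what was actually done for T229347; the environment name, packed archive and job script are placeholders, and conda-pack is just one way to build the archive.

    # 1. Build and pack a self-contained Python 3.7 env (pyarrow, numpy, ...) on the launching host.
    #    Requires the conda-pack tool; a zipped virtualenv works similarly.
    conda create -y -n py37 python=3.7 pyarrow numpy
    conda pack -n py37 -o py37.tar.gz

    # 2. Ship the archive with the job; YARN unpacks it next to each container as ./env.
    spark2-submit \
      --master yarn \
      --deploy-mode cluster \
      --archives py37.tar.gz#env \
      --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./env/bin/python \
      --conf spark.executorEnv.PYSPARK_PYTHON=./env/bin/python \
      my_job.py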