[06:25:06] (CR) Elukey: "My test coordinator finished successfully, so I have compared timings. The current job takes ~1h, meanwhile mine took more than 7..." [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[06:38:57] testing also hive.exec.submit.local.task.via.child = false
[06:39:00] looks promising
[06:39:09] for edit hourly I mean
[06:39:22] anyway, need to go to the doctor this morning, will start a little later!
[08:14:45] (PS2) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[08:15:44] (CR) Elukey: "With this new configuration, everything works as expected in the same amount of time. I checked heap usage of HiveServer2 while executing " [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[08:25:57] Analytics, Patch-For-Review, User-Elukey: Move refinery to hive 2 actions - https://phabricator.wikimedia.org/T227257 (elukey) We encountered two issues when after the migration to hive2 actions: * T229669 * https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/528167/
[08:26:09] mforns: o/
[08:26:11] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/528167/ is interesting
[08:26:17] let me know your thoughts
[08:30:55] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (elukey) >>! In T229347#5394507, @Ottomata wrote: > Another gotcha: > > ` > Exception: Python in worker has different version 3.5 than that in driver 3.7, PySpark cannot run wi...
[09:00:42] (PS8) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[09:03:33] (PS7) WMDE-leszek: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:05:06] (CR) jerkins-bot: [V: -1] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:18:14] (PS1) Elukey: Revert "Revert "edit-hourly: move oozie coordinator to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/528407
[09:30:13] (PS9) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[09:30:52] (PS8) WMDE-leszek: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:33:19] (CR) jerkins-bot: [V: -1] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:38:45] (CR) WMDE-leszek: "recheck" [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:42:12] (CR) jerkins-bot: [V: -1] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[09:46:12] (CR) WMDE-leszek: "tests failures are pretty odd, given the class loader issue in the maven plugin is fixed in the parent change, also the failing JUnit test" [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
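A note on the hive.exec.submit.local.task.via.child change discussed above (gerrit 528167): the property is a standard Hive setting and is normally applied with a SET statement at the top of the .hql file. A minimal, illustrative sketch; the table, partition variable and query below are assumptions, not the actual edit-hourly query from the refinery:

    -- Illustrative only: the real query lives in the edit-hourly .hql in analytics/refinery.
    -- Run the map-join local task inside the submitting JVM instead of spawning a child JVM.
    SET hive.exec.submit.local.task.via.child=false;

    SELECT wiki_db, COUNT(*) AS edits
    FROM wmf.edit_hourly
    WHERE snapshot = '${snapshot}'
    GROUP BY wiki_db;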
[09:52:40] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (elukey) I was finally able to test on analytic1030 Refine. The missing bit was `--conf spark.executorEnv.LD_LIBRARY_PATH=/usr/lib/hadoop/li...
[10:48:06] * elukey lunch!
[12:43:24] fdans: o/
[12:44:12] I have created https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528445/ to update the AQS's druid config
[12:45:16] IIRC nuria told us to deploy it as final step after the generation of the new snapshot
[12:46:22] elukey: yesss awesome, but this is part of the train right?
[12:47:02] fdans: I have always done it with Joseph as separate step, but we can add it to the train yes
[12:47:14] anytime you want ping me :)
[12:53:23] Analytics, Analytics-Kanban, Patch-For-Review: Roll restart all openjdk-8 jvms in Analytics - https://phabricator.wikimedia.org/T229003 (elukey)
[12:53:53] \o/
[12:53:58] finally done
[13:02:22] Analytics, Analytics-Kanban: Add more dimensions to netflow's druid ingestion specs - https://phabricator.wikimedia.org/T229682 (elukey) Current config: ` profile::analytics::refinery::job::eventlogging_to_druid_job { 'netflow': job_config => { database => 'wmf',...
[13:23:09] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Oh, cool! This makes all 3 of those py3 versions available? Yeah for sure we should do this. As long as default python3 stays the same, this is safe! Awesome!
[13:25:26] Analytics, Operations, Core Platform Team Legacy (Watching / External), Patch-For-Review, and 2 others: Replace and expand codfw kafka main hosts (kafka200[123]) with kafka-main200[12345] - https://phabricator.wikimedia.org/T225005 (Ottomata) In addition to the steps in https://phabricator.wikime...
[13:29:02] (CR) Ottomata: [C: +1] "Interesting! I think this is worth a try, especially if this job seems work and be fine. This could cause problems on the hive-server2 J" [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[13:29:04] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (elukey) @Ottomata there is a caveat though: all the python libraries have only the version in debian, so changing the version of the interpreter might be a problem.. We'll need...
[13:31:46] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Hm indeed. For the Spark case, we might need to also include all the binary deps of pyarrow (e.g. numpy) for the python version in the deb package. I.e. include bot...
[13:39:50] elukey: i bet your refine job failed because I stopped using an30 for testing.
[13:39:55] am using an-tool1006
[13:39:59] so you prob had an old jar
[13:40:12] am trying now on 1006
[13:41:23] first trying without kinit... :)
[13:45:11] Analytics, Research: Recommend the best format to release public data lake as a dump - https://phabricator.wikimedia.org/T224459 (Halfak) Hey folks, I thought it might be useful if I shared some thoughts here. When I first started analyzing Wikipedia edit histories, the first thing I did was figure out...
[13:54:27] o/
[13:54:35] sorry I was afk
[13:54:51] ah okok I thought something was going on, the null pointer was really weird :)
[13:55:55] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) I've been using an-tool1006, so the most recent .jar is there. Without kinit, I get stuff like: ` 19/08/06 13:41:21 WARN Client...
[13:58:47] ottomata: about --^ did you turn off rcp encryption/auth?
[13:59:12] because it is set in the yarn config, so I believe that it must be there
[13:59:34] I can remove all that stuff for the moment in theory
[13:59:59] (coffee brb)
[14:01:32] elukey: it was in yarn mode, should the rcp auth work there?
[14:02:18] trying with rpc auth disabled...
[14:05:36] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) If I disable RPC auth with `--conf "spark.authenticate=false" --conf "spark.shuffle.service.enabled=false" --conf "spark.dynamicA...
[14:07:35] mmm nope it should have been the opposite, namely working with auth enabled
[14:07:39] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) Ah, but those confs disable dynamicAllocation and shuffle service...which we don't really want to do. If I only do `--conf "spar...
[14:08:50] elukey: i think this isn't working then :(
[14:09:43] yesterday we had considered disabling the RPC auth to let us turn on kerberos with spark 2.3, right?
[14:10:04] but, from what I can tell, the shuffle service needs RPC auth to work? and dynamic allocation needs the shuffle service.
[14:10:06] sooooo
[14:10:15] we need a way to get RPC auth working in Yarn mode?
[14:10:17] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (elukey) The last failure is expected, since in the testing cluster the shuffler wants authentication via spark RPC native or SASL, and if d...
[14:10:17] right?
[14:10:44] nono the shuffler wants that because there are settings in yarn-site.xml
[14:10:48] if I remove them it will work fine
[14:11:00] oh
[14:11:08] in theory it should work now in yarn mode with authentication + encryption
[14:11:15] ah
[14:11:18] adding the library path etc..
[14:11:19] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) ` /usr/bin/spark2-submit \ --name otto_refine0 \ --master yarn \ --class org.wikimedia.analytics.refinery.job.refine.Refine \ --c...
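The spark2-submit command quoted at 14:11:19 above is truncated in the log. As a hedged illustration only, not the actual Refine invocation from T228291, the RPC-auth flags mentioned at 14:05 are typically passed like this; the class and jar are placeholders:

    # Illustrative only; not the literal command from T228291.
    # With spark.authenticate=false, the external shuffle service (and therefore
    # dynamic allocation) must also be disabled, as noted in the discussion above.
    spark2-submit \
      --master yarn \
      --conf spark.authenticate=false \
      --conf spark.shuffle.service.enabled=false \
      --conf spark.dynamicAllocation.enabled=false \
      --class org.example.PlaceholderJob \
      /path/to/placeholder.jar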
[14:11:27] ok ^ has my command with rpc auth disabled
[14:11:27] so if it failes there might be something going on
[14:11:31] you can remove that part
[14:11:32] and try
[14:11:38] super will try to see what it is happening
[14:11:52] but I'll also disable auth+encryption and re-test everything
[14:12:08] we already know that local mode doesn't work
[14:12:11] aye
[14:12:21] it is really a pity though to let everything un encrypted after kerberos
[14:12:27] aye
[14:12:51] i dunno, if yarn RPC should work...maybe we can just patch the spark2-submit stuff to always disable RPC auth in local mode
[14:13:05] i don't love it but it would be an ok hold over i guess
[14:13:33] nah we can wait 2.4, hopefully it should not come that far in the future :)
[14:13:40] e.g. maybe a wrapper that al;ways passes a custom --properties-file
[14:13:42] aye ok.
[14:13:48] disable RPC auth in kerberos until 2.4 upgrade
[14:14:21] elukey: ok then, so we don't need to test RPC auth, right? we just need to disable it in yarn-site and in spark-defaults, and then test?
[14:14:39] I think so yes, lemme remove settings now
[14:15:01] elukey: btw, the LD_LIBRARY_PATH stuff is not needed in local mode
[14:16:24] it is exported in spark-env.sh
[14:16:29] so i guess that makes sense
[14:16:32] since it is all local
[14:16:55] dunno why it wouldn't be set for remote executors
[14:17:13] super weird
[14:25:42] ok ottomata RCP should be plain now
[14:25:45] if you want to test
[14:25:48] k
[14:27:40] do i still need to disablle in --conf cli?
[14:28:47] nono
[14:29:01] it should work as it is
[14:29:11] modulo the LD_LIBRARY_PATH thing
[14:30:47] elukey: am getting
[14:30:47] Unable to create executor due to Unable to register with external shuffle server due to : java.lang.IllegalStateException: Expected SaslMessage, received something else (maybe your client does not have SASL enabled?)
[14:30:53] do the nodemanagers need restarted?
[14:31:01] don't really remember how shuffle service works
[14:31:55] I already restarted them mmm
[14:32:11] lemme check
[14:32:20] (CR) Ottomata: [V: +2 C: +2] swift_upload.py to handle upload and event emitting [analytics/refinery] - https://gerrit.wikimedia.org/r/525435 (https://phabricator.wikimedia.org/T227896) (owner: Ottomata)
[14:32:40] sorry didn't do a proper restart, the cumin command was wrong
[14:32:44] now it should work
[14:32:52] ottomata: --^
[14:33:31] kj
[14:34:21] elukey: let's go ahead and puppetize the spark.executorEnv.LD_LIBRARY_PATH in spark-defaults too eh?
[14:36:50] ottomata: today I have done some tests, it seems effective (for yarn mode) only if set by the client
[14:37:01] I added it first to spark defaults on the workers but nothing
[14:37:15] anyway, if it works it seems ok to add yes!
[14:37:17] !
[14:37:30] well we'd just set it everywhere in spark-defaults anyway. hm
[14:38:00] elukey:
[14:38:01] yarn-site.xml has
[14:38:07]
[14:38:07] yarn.app.mapreduce.am.env
[14:38:07] LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
[14:38:07]
[14:38:49] i wonder what would happen if we set spark.executorEnv.LD_LIBRARY_PATH in yarn-site.xml
[14:38:57] hmm, nawwww
[14:39:03] the shuffle service isn't reading the snappy stuff.
[14:39:52] Analytics, Analytics-Kanban, Patch-For-Review: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata) After https://gerrit.wikimedia.org/r/c/operations/puppet/+/528483, yarn client, yarn cluster and local mode all work great!
[14:41:19] nice! --^
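The yarn-site.xml snippet pasted at 14:38 above lost its XML tags in the log; only the property name and value survived. With standard Hadoop configuration syntax it would look like this:

    <!-- Reconstructed with standard yarn-site.xml property syntax; the tags were stripped from the log above. -->
    <property>
      <name>yarn.app.mapreduce.am.env</name>
      <value>LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native</value>
    </property>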
[14:41:34] ottomata: I think that we can merge the change then, so it will get deployed this week
[14:41:43] k
[14:41:58] (CR) Ottomata: [C: +2] Refine: infer hiveServerUrl from config [analytics/refinery/source] - https://gerrit.wikimedia.org/r/526670 (https://phabricator.wikimedia.org/T228291) (owner: Ottomata)
[14:47:22] elukey: any objections to installying pyall on cluster now?
[14:47:30] it'll help me test stuff from stat1005
[14:49:59] nope!
[14:52:04] elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/528492/1/modules/profile/manifests/analytics/cluster/packages/common.pp
[14:53:01] ottomata: isn't pyall a component to add via apt config to then require python37 etc...?
[14:53:17] doh doh doh
[14:53:29] that makes way more sense elukey :p
[14:53:37] haha
[14:54:41] :)
[15:05:59] Analytics, Analytics-Kanban, Operations, Traffic, and 2 others: TLS certificates for Analytics origin servers - https://phabricator.wikimedia.org/T227860 (elukey)
[15:06:32] * elukey afk for a bit!
[15:23:59] (CR) WMDE-leszek: "meh, tests are not really skipped/ignored, @Ignore annotation does not work with the JUnit version used" [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:29:06] (CR) WMDE-leszek: [C: -2] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:32:50] (PS10) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[15:41:19] (CR) jerkins-bot: [V: -1] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:46:46] (PS11) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[15:53:18] (CR) Mforns: [V: +2 C: +2] "Two +1s, merging!" [analytics/refinery] - https://gerrit.wikimedia.org/r/527583 (https://phabricator.wikimedia.org/T229669) (owner: Mforns)
[15:53:48] (PS2) Mforns: Revert "Revert "cassandra: move oozie bundle to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/527616
[15:53:57] (CR) jerkins-bot: [V: -1] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[15:54:15] (CR) Mforns: [V: +2 C: +2] "Merging to reenable hhive2 actions for fixed cassandra jobs." [analytics/refinery] - https://gerrit.wikimedia.org/r/527616 (owner: Mforns)
[15:57:18] (PS1) Mforns: [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612)
[15:57:39] (CR) Mforns: [C: -2] "Still WIP" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[15:57:51] elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/528492/2/modules/profile/manifests/analytics/cluster/packages/common.pp
[15:59:36] ottomata: looks good, not sure how require_package behaves in this case (since it should come before the surrounding class, but it is required by the apt stuff..)
[15:59:55] i could change it before the class?
[15:59:56] package { 'etc.': } is probably less convoluted, but I am not sure
[16:00:04] but i think the class should declare the package resource too
[16:00:08] so iexpectd it to work
[16:00:13] super
[16:00:15] looks good
[16:00:48] k trying
[16:00:53] ... well after meetings
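On puppetizing spark.executorEnv.LD_LIBRARY_PATH in spark-defaults, discussed around 14:34 and 14:52 above: spark-defaults.conf takes simple whitespace-separated key/value lines. A hypothetical sketch, assuming the same native-library path quoted earlier; the file path is an assumption, not the deployed config:

    # Hypothetical spark-defaults.conf excerpt (e.g. /etc/spark2/conf/spark-defaults.conf).
    # Make the Hadoop native libraries (snappy, etc.) visible to YARN executors.
    spark.executorEnv.LD_LIBRARY_PATH    /usr/lib/hadoop/lib/native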
[16:02:21] (CR) jerkins-bot: [V: -1] [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[16:02:41] (PS12) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[16:11:42] (PS13) WMDE-leszek: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180
[16:13:21] what? non-ancient python? this is a debian shop we only run stable 5 year old releases :P
[16:15:14] haha, well buster is available now!
[16:15:20] gotta figure out how to get to it
[16:16:19] you are welcome to figure out how to use python 3.7 with your stuff after this;
[16:16:28] we'll be eventually upgrading it to the default anyway
[16:16:47] the installed debian python packages might not work tho; since they are meant for 3.5
[16:16:54] but if you ship all your deps to the cluster anyway
[16:16:55] it might work
[16:18:26] i might, i think the main thing 3.7 brings over 3.5 that we've been using is some more optional typing support, and better async, nifty but probably not super important
[16:18:37] of course there is probably a wealth of other things i dont know that maybe i do want :)
[16:19:03] and f strings?
[16:19:53] oh, i guess those are nice
[16:20:13] (PS9) WMDE-leszek: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[16:21:22] (CR) WMDE-leszek: [C: +2] Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[16:21:39] (CR) WMDE-leszek: [C: +2] Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[16:21:59] (Merged) jenkins-bot: Fixed CI build [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/528180 (owner: WMDE-leszek)
[16:22:16] (Merged) jenkins-bot: Use the internal WDQS endpoint instead [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/526471 (https://phabricator.wikimedia.org/T214894) (owner: Ladsgroup)
[16:26:21] (PS3) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[16:30:26] (PS4) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[16:30:35] mforns: hope that --^ is clearer
[16:32:32] elukey, you want me to merge?
[16:33:37] or just +1?
[16:37:13] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[16:37:30] ok, you have another +1, so you can merge :]
[16:38:37] thanks!!
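On the Python 3.5 vs 3.7 aside around 16:18 above: f-strings, added in Python 3.6, are the most visible syntax difference. A trivial comparison:

    # str.format works on 3.5; the f-string line is a SyntaxError on 3.5 and fine on 3.6/3.7.
    job, hour = "refine_eventlogging", 14
    print("job={} hour={}".format(job, hour))
    print(f"job={job} hour={hour}")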
[16:38:44] (PS5) Elukey: edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257)
[16:38:48] (CR) Elukey: [V: +2 C: +2] edit: set hive.exec.submit.local.task.via.child = false in the .hql file [analytics/refinery] - https://gerrit.wikimedia.org/r/528167 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[16:39:10] (PS2) Elukey: Revert "Revert "edit-hourly: move oozie coordinator to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/528407
[16:39:13] (CR) Elukey: [V: +2 C: +2] Revert "Revert "edit-hourly: move oozie coordinator to hive2 actions"" [analytics/refinery] - https://gerrit.wikimedia.org/r/528407 (owner: Elukey)
[16:42:01] logging off earlier o/
[16:49:19] o/
[18:27:53] ottomata, I'm getting an inconsistency between the mediawiki history job and the data in Hive, it's regarding the page_artificial_id, it's in the MediawikiEvent schema, but not in the hive table... does this ring a bell to you?
[19:02:34] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) //Writing this down for future me:// I have 2 goals. 1. Get Spark scala and pyspark 2.3.0 with pyarrow to work in a heterogeneous Stretch + Buster environment. 2....
[19:03:09] mforns: MediawikiEvent schema?
[19:03:19] ottomata, yes
[19:03:36] that is part of mediawiki history?
[19:04:01] the one in the file: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/denormalized/MediawikiEvent.scala
[19:04:23] it has the page_artificial_id, but the table in Hive does not
[19:04:34] hm, mforns no i don't know what this is
[19:05:01] I think the hive table never had the page_artificial_id, but not sure how that comes to be
[19:05:21] aye
[19:06:14] I might not be able to use the MediawikiEvent... it seems tied to the history reconstruction process. :[
[19:06:25] I'll have to repeat a bunch of code..
[19:08:48] hmnmm
[19:08:57] mforns: sorry i gotta run for a bit, not really sure if i cna help anyway :/
[19:09:09] gotta go get an internatinoal drivers permit! and pick up some malaria meds
[19:09:12] back in a little while...
[19:09:12] no problemo, will look more
[19:09:16] k
[19:22:06] I don't understand how the current code does not write the page_artificial_id o.O, the method toRow from MediawikiEvent is used just before writing the denormalized data to HDFS, so confused...
[20:56:24] (back)
[21:05:27] hey ottomata, I haven't found an explanation yet... :C
[21:06:15] I'm going to go ahead and try to modify my current mediawiki history dump job to not use MediawikiEvent.
[21:07:05] unless you want to pair with me and look into the crazinesssss
[21:44:24] Analytics, Anti-Harassment (The Letter Song), Patch-For-Review: Instrument Special:Mute - https://phabricator.wikimedia.org/T224958 (nettrom_WMF) >>! In T224958#5394397, @dmaza wrote: > @Niharika I think we might need approval from someone on the analytics team. Do you know who can we ping here? I'v...
[22:27:37] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Hm, more thoughts. Even if I figure out ^, I'm not sure it allow for an incremental upgrade to Buster. If we upgrade workers first, the Buster workers will only wor...
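For the page_artificial_id mismatch discussed from 18:27 onward: one way to see which fields exist in the Scala MediawikiEvent case class but not in the Hive table is to compare the two schemas in spark2-shell. A hedged sketch, assuming the refinery-job jar is on the classpath, that MediawikiEvent is a Spark-encodable case class, and that the table is wmf.mediawiki_history:

    // Hypothetical spark2-shell diagnostic, not code from the refinery itself.
    import org.apache.spark.sql.Encoders
    import org.wikimedia.analytics.refinery.job.mediawikihistory.denormalized.MediawikiEvent

    val caseClassFields = Encoders.product[MediawikiEvent].schema.fieldNames.toSet
    val hiveFields      = spark.table("wmf.mediawiki_history").schema.fieldNames.toSet

    // Fields present in the Scala schema but missing from the Hive table (e.g. page_artificial_id).
    println(caseClassFields diff hiveFields)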
[22:32:06] mforns: sorry I missed your ping!
[23:29:59] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) > mayyyybe I can get something working where the python3.7 dependencies are always shipped along with the spark job to workers JAW DROP...I got this to work. 1. Get...
[23:39:28] Analytics, Analytics-Kanban, Patch-For-Review: Rebuild spark2 for Debian Buster - https://phabricator.wikimedia.org/T229347 (Ottomata) Whoa wait a minute. If I can do ^, I can upgrade to Spark 2.4.3 before buster. I'm not relying on the Debian packaged python dependencies anymore...I might even be a...
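The numbered steps in the 23:29 comment above are cut off in the log. The general technique being described, shipping a self-contained Python 3.7 environment to the YARN workers together with the job, usually looks something like the sketch below. This is illustrative only and not necessarily what was actually done for T229347; the environment name, packed archive and job script are placeholders, and conda-pack is just one way to build the archive.

    # 1. Build and pack a self-contained Python 3.7 env (pyarrow, numpy, ...) on the launching host.
    #    Requires the conda-pack tool; a zipped virtualenv works similarly.
    conda create -y -n py37 python=3.7 pyarrow numpy
    conda pack -n py37 -o py37.tar.gz

    # 2. Ship the archive with the job; YARN unpacks it next to each container as ./env.
    spark2-submit \
      --master yarn \
      --deploy-mode cluster \
      --archives py37.tar.gz#env \
      --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./env/bin/python \
      --conf spark.executorEnv.PYSPARK_PYTHON=./env/bin/python \
      my_job.py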