[06:02:30] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10elukey) Hi Goran!  We moved the hive server nodes to Debian Buster recently (T231067) and we had a problem with Hive and Mariadb,...
[06:24:41] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10GoranSMilovanovic) @elukey Thank your for a prompt response, Luca!  > The first error is unrelated, it is due to the fact that you...
[06:37:00] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10GoranSMilovanovic) @elukey   > As a quick workaround you should be able to unblock your queries just removing --driver org.mariadb...
[06:37:26] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10GoranSMilovanovic) p:05High→03Low
[06:47:31] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10elukey) Nice! Let's keep it open since I want to understand if we need to use `--driver com.mysql.jdbc.Driver` or not, it will hav...
[08:00:17] <wikibugs>	 10Analytics, 10WMCZ-Stats: Review request: New datasets for WMCZ published under analytics.wikimedia.org - https://phabricator.wikimedia.org/T279567 (10Urbanecm) Hello @JAllemandou, thanks for your reply. After a discussion with @Ottomata, it seems the Analytics team is okay with me publishing the data using t...
[08:28:29] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Add edit count bucketing to all metrics - https://phabricator.wikimedia.org/T269986 (10awight)
[08:28:34] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-02-03): Adjust edit count bucketing for CodeMirror - https://phabricator.wikimedia.org/T273471 (10awight) 05Open→03Resolved
[08:28:36] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Add edit count bucketing to all metrics - https://phabricator.wikimedia.org/T269986 (10awight)
[08:28:43] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-02-03): Adjust edit count bucketing for TemplateData - https://phabricator.wikimedia.org/T272569 (10awight) 05Open→03Resolved a:03awight
[08:29:11] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Add edit count bucketing to all metrics - https://phabricator.wikimedia.org/T269986 (10awight)
[08:29:15] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), 10Patch-For-Review, and 2 others: Adjust edit count bucketing for VisualEditor, segment all metrics - https://phabricator.wikimedia.org/T273474 (10awight) 05Open→03Resolved
[08:29:27] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-02-03): Adjust edit count bucketing for TemplateWizard, segment all metrics - https://phabricator.wikimedia.org/T273475 (10awight) 05Open→03Resolved a:03awight
[08:29:31] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Add edit count bucketing to all metrics - https://phabricator.wikimedia.org/T269986 (10awight)
[08:58:16] <wikibugs>	 (03PS5) 10Kosta Harlan: [WIP] Create structuredtask/article/link-recommendation schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[09:21:59] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10GoranSMilovanovic) @elukey No worries. Let me know if you need any external tests performed.
[09:27:55] <wikibugs>	 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-01-20): Add edit count bucketing to all metrics - https://phabricator.wikimedia.org/T269986 (10awight) 05Open→03Resolved a:03awight
[09:57:51] <mforns>	 beep(sound=1)
[09:57:55] <mforns>	 oops!
[10:36:02] * elukey lunch!
[10:46:18] <wikibugs>	 (03CR) 10Hnowlan: "> Patch Set 1: Code-Review+1" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/682933 (https://phabricator.wikimedia.org/T278701) (owner: 10Hnowlan)
[12:07:23] <wikibugs>	 10Analytics-Clusters: Re-create deployment-aqs cluster - https://phabricator.wikimedia.org/T272722 (10hnowlan) >>! In T272722#7024151, @Ottomata wrote: > Would it be worth moving AQS to deployment pipeline?  Even if you don't use it in prod k8s , having the docker image would allow you to use [[ https://github.c...
[12:37:27] <ottomata>	 elukey:  o/
[12:37:39] <ottomata>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/683053 should be ok right/
[12:37:40] <ottomata>	 ?
[12:37:45] <ottomata>	 adding data purge jobs in test cluster
[12:46:23] <ottomata>	 merged ^
[12:47:14] <ottomata>	 milimetric:  no train yet, right?
[12:50:06] <ottomata>	 !log applied data_purge jobs in analytics test cluster; old data will now be dropped there - T273789
[12:50:10] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:50:10] <stashbot>	 T273789: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789
[12:57:32] <elukey>	 ottomata: just seen the ping sorry
[12:57:54] <ottomata>	 elukey:  np! :)
[13:12:17] <wikibugs>	 (03PS1) 10Ottomata: RefineSanitizeMonitor - pass keep_all_enabled to SanitizeTransformation.loadAllowlist [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683275 (https://phabricator.wikimedia.org/T273789)
[13:12:27] <ottomata>	 milimetric:  i have anotoher refinery-source fix i'd like to go out
[13:12:42] <ottomata>	 i think you have already done refinery-source release, but not yet refinery deploy, right?
[13:12:49] <ottomata>	 i'll go ahead and do another refinery-source release 
[13:23:26] <milimetric>	 ottomata: it failed twice last night, some weird error trying to fetch some dependencies
[13:23:51] <milimetric>	 https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/81/
[13:24:10] <milimetric>	 I'm just starting now, first thing I was gonna try and figure out what's going on with that
[13:26:10] <elukey>	 milimetric: o/ when you have a moment I'd ask a favor, namely if we could check on an-launcher1002 that sqoop runs fine. With the reimage of an-coord100[1,2] to Buster + the fix to use the com.mysql.jdbc.Driver instead of the mariadb one I want to make sure that everything works :)
[13:26:25] <elukey>	 we can do it even tomorrow
[13:27:38] <milimetric>	 elukey: as soon as I can figure out what's wrong with the jenkins / maven build, I have to deploy the grouped_wikis file change, so I was going to test sqoop anyway.  Hopefully I can do that soon 'cause we only got today and tomorrow
[13:27:48] <milimetric>	 this build error is the weirdest...
[13:28:03] <elukey>	 <3 sure
[13:28:23] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] RefineSanitizeMonitor - pass keep_all_enabled to SanitizeTransformation.loadAllowlist [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683275 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata)
[13:28:46] <wikibugs>	 (03PS1) 10Ottomata: Changelog entry for 0.1.8 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683279
[13:28:47] <milimetric>	 ... maybe I'll just try again since it looks like maven was getting all kinds of strange things like "[INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.9.1:deploy (default-deploy) on project refinery: Wagon protocol 'https' doesn't support directory copying -> [Help 1]"
[13:29:06] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Changelog entry for 0.1.8 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683279 (owner: 10Ottomata)
[13:30:19] <ottomata>	 oh milimetric  i'm still merging!
[13:30:22] <ottomata>	 for 0.1.8
[13:30:25] <ottomata>	 just mered ^^
[13:31:14] <ottomata>	 huh intersting though
[13:31:18] <ottomata>	 if 0.1.7 didn't go out
[13:31:19] <ottomata>	 ...
[13:31:19] <milimetric>	 ottomata: it's still 0.1.7 technically
[13:31:36] <ottomata>	 hm i've got the 0.1.7 tag locally
[13:31:52] <milimetric>	 ...hm, can we skip a version?
[13:32:02] <milimetric>	 like if I just do another changelog, will it just skip it?
[13:32:03] <ottomata>	 the tag also exists in gerrit
[13:32:15] <ottomata>	 i'm pretty sure if you do the release it will do 0.1.8 now
[13:32:25] <ottomata>	 it iis baed on whats in git, not archiva
[13:32:50] <milimetric>	 ok, so I'll just do another changelog change and see what happens
[13:32:55] <milimetric>	 then we can troubleshoot the build
[13:32:56] <ottomata>	 milimetric:  already done :)
[13:33:04] <ottomata>	 i was about to push the button and then saw your messages
[13:33:08] <milimetric>	 ah
[13:33:18] <milimetric>	 ok, I'm pushing the button
[13:33:20] <ottomata>	 so um, i guess go ahead
[13:33:21] <ottomata>	 yeah ok
[13:41:43] <mforns>	 helloooo teammm
[13:44:17] <ottomata>	 yoohoo
[13:44:36] <ottomata>	 milimetric:  failed?
[13:44:47] <milimetric>	 ya
[13:45:29] <ottomata>	 hmm i bet its relatted to the new cI stuff
[13:45:30] <ottomata>	 aded
[13:45:42] <ottomata>	 Wagon protocol 'https' doesn't support directory copying
[13:45:43] <ottomata>	 ?
[13:45:48] <ottomata>	 gehel: yt?
[13:45:59] <milimetric>	 the first failure yesterday was throwing a bunch of style errors, so I ran it on my local and it did the same but succeeded
[13:46:01] <gehel>	 yep
[13:46:05] <gehel>	 context?
[13:46:18] <ottomata>	 trying to release a new refinery source version
[13:46:20] <milimetric>	 so I looked closer and it also threw some weird network error trying to fetch dependencies
[13:46:30] <ottomata>	 maven release or deploy is failing in jenkins
[13:46:36] <ottomata>	 https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/83/console
[13:46:36] <milimetric>	 so then I did another build and that one didn't have any style errors but had a different fetch error
[13:46:41] <ottomata>	 not quite sure why
[13:46:52] <milimetric>	 https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/80/console
[13:46:52] <ottomata>	 but assuming something has changed with the patches we merged last friday
[13:46:55] <milimetric>	 https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/81/console
[13:47:03] <milimetric>	 but the build works fine locally
[13:47:32] <milimetric>	 at least 83 is consistent with 81, they both say "[ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.9.1:deploy (default-deploy) on project refinery: Wagon protocol 'https' doesn't support directory copying -> [Help 1]"
[13:47:34] <gehel>	 strange, looks like the site is trying to be released
[13:47:44] <ottomata>	 the site...dcos?
[13:47:46] <ottomata>	 docs*?
[13:47:46] <gehel>	 needs some time looking at it
[13:47:48] <gehel>	 yep
[13:49:00] <ottomata>	 gehel:  this one?
[13:49:01] <ottomata>	 https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/681989
[13:49:24] <isaacj>	 milimetric: could you give me read permissions for `hdfs:/tmp/for_isaac/referrals-for-2021-03-04.tsv`
[13:49:32] <gehel>	 yeah, but just adding site endpoint should not enable the relase
[13:49:36] <ottomata>	 oh 
[13:49:38] <ottomata>	 is that because
[13:49:40] <ottomata>	 https://gerrit.wikimedia.org/r/c/integration/config/+/681988
[13:49:47] <ottomata>	 is not merged yet?
[13:50:11] <gehel>	 nope, unrelated, that would create another job, should not impact the release job
[13:50:21] <milimetric>	 ok, isaacj, try now?
[13:50:24] <ottomata>	 ok, gehel  can/should I revert https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/681989 for now?
[13:50:27] <milimetric>	 I thought everyone could read everything in /tmp
[13:51:02] <ottomata>	 milimetric: if a afile is created there it might have a umask that allowws that, if you move a file there it will keep its previous permissions (pretty sure)
[13:51:19] <milimetric>	 ah, you're right, that's what happened, my bad
[13:52:11] <gehel>	 ottomata: can you revert that one and see if it works better?
[13:52:18] <gehel>	 meeting in 5', I'll disapear
[13:52:28] <ottomata>	 ok
[13:52:38] <wikibugs>	 (03PS1) 10Ottomata: Revert "Ensure that maven site generation works." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683139
[13:52:43] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Revert "Ensure that maven site generation works." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683139 (owner: 10Ottomata)
[13:53:47] <wikibugs>	 (03PS1) 10Ottomata: Update changelog for 0.1.9 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683287
[13:53:54] <wikibugs>	 (03PS2) 10Ottomata: Update changelog for 0.1.9 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683287
[13:54:03] <wikibugs>	 (03CR) 10Ottomata: [V: 03+2 C: 03+2] Update changelog for 0.1.9 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683287 (owner: 10Ottomata)
[13:54:13] <ottomata>	 milimetric:  gehel  going to try to release again
[13:54:31] * milimetric reads console output
[13:54:33] <ottomata>	 https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/84/console
[13:54:42] <isaacj>	 https://www.irccloud.com/pastebin/75akj3q7/
[13:56:17] <milimetric>	 ok isaacj, I chowned it to you (sorry I thought just adding +rx would solve it but apparently not)
[13:56:43] <milimetric>	 (and chowning these days means I gotta log into a different server with the hdfs keytab and apparently I'm extremely lazy :))
[13:57:30] <ottomata>	 mforns:  would you some time this morning to pair with me on a data copy?  i'm going to copy tables into event_sanitized
[13:57:58] <ottomata>	 simliar to the table rename we did before...but less dangerous :)
[13:58:30] <mforns>	 ottomata: yes, ofc, in an hour is good for you?
[13:58:52] <isaacj>	 milimetric: still getting the same error and showing you own the file... delay possibly or am i somehow looking at the wrong file?
[13:58:53] <ottomata>	 mforns:  i think that should be fine, we have pa sync in 1.5 hours
[13:59:03] <ottomata>	 you can at least go over with me my procedure and double check it
[13:59:11] <ottomata>	 the actual copies and repair tables will take a while
[13:59:26] <milimetric>	 https://www.irccloud.com/pastebin/nYrlz0e9/
[13:59:31] <milimetric>	 isaacj: ^
[13:59:49] <gehel>	 ottomata: http://maven.apache.org/maven-release/maven-release-plugin/perform-mojo.html#goals
[13:59:55] <isaacj>	 so weird...let me switch to stat1004 in case that somehow matters
[14:00:08] <gehel>	 we overwrite that property for other projects, so it does not interfere
[14:00:15] <gehel>	 I'll push another attempt later on
[14:01:07] <isaacj>	 https://www.irccloud.com/pastebin/d5LicV1M/
[14:01:37] <isaacj>	 no idea how we have different states of ownership on the same file
[14:02:08] <isaacj>	 unless does the whole folder need to be chowned over?
[14:02:12] <isaacj>	 milimetric: ^
[14:02:36] <milimetric>	 doh!  you're right, I'm so bad
[14:03:32] <milimetric>	 ok isaacj, now?
[14:03:53] <isaacj>	 yep! thanks!
[14:04:08] <isaacj>	 i feel like i should have known that about file permissions but actually didn't
[14:06:44] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) To prep for moving historical event data to event_sanitized, we need to copy the tables we'll be keeping for...
[14:07:12] <milimetric>	 ottomata: yay
[14:07:14] <ottomata>	 mforns: draft procedure: https://phabricator.wikimedia.org/T273789#7042025
[14:07:27] <milimetric>	 ottomata: shall I continue the train then?
[14:07:30] <ottomata>	 milimetric:  great!  gehel  yeah that was it, release suceeded, thank you
[14:07:32] <ottomata>	 milimetric:  yes please!
[14:07:45] <ottomata>	 i'll handle any pppet changes i need after that's done
[14:07:50] <gehel>	 ottomata: sorry for the mess! I'll do some cleanup!
[14:08:00] <milimetric>	 k, that includes merging ra-zzi's change otto
[14:08:03] <ottomata>	 np thanks for it all!
[14:08:06] <ottomata>	 oh
[14:08:11] <ottomata>	 well milimetric  i wwon't do that :p
[14:08:17] <ottomata>	 i'll let him do that, i meant anything not in the train doc yet
[14:08:25] <ottomata>	 that has to do with the refine sanitize stuff i'm working on
[14:08:29] <milimetric>	 ottomata: I think you have to otherwise the jobs will fail
[14:08:32] <ottomata>	 oh...
[14:08:34] <ottomata>	 ok?
[14:08:49] <milimetric>	 it's just the rename of the wikis file, it's used by the siteinfo dumps timer
[14:08:58] <ottomata>	 ok https://gerrit.wikimedia.org/r/c/operations/puppet/+/682791 after the deploy?
[14:08:59] <mforns>	 ottomata: do you want me to read that now, and then pair at the end of the hour, or is that for when we meet?
[14:09:01] <milimetric>	 anyway, I'll just point you to the patch you gotta +2
[14:09:13] <ottomata>	 mforns:  just putting there for your convenince, that's what we'll talk about in an hour
[14:09:24] <mforns>	 ottomata: cool
[14:10:12] <wikibugs>	 (03PS1) 10Maven-release-user: Add refinery-source jars for v0.1.9 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683292
[14:10:48] <wikibugs>	 (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add refinery-source jars for v0.1.9 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683292 (owner: 10Maven-release-user)
[14:13:14] <milimetric>	 hm, looks like no jobs to restart so I'll just do the refinery deploy and sync (in 5 min to give the git fat stuff a chance to catch up)
[14:14:25] <ottomata>	 great
[14:23:16] <milimetric>	 elukey: this might have something to do with the Buster reimage on the coords:
[14:23:19] <milimetric>	 https://www.irccloud.com/pastebin/uQhcAbsQ/
[14:28:54] <elukey>	 milimetric: yes definitely
[14:29:52] <elukey>	 milimetric: should be fixed now if you want to retry
[14:30:13] <elukey>	 !log chown -R analytics-deploy:analytics-deploy /srv/deployment/analytics on an-coord1001
[14:30:15] <milimetric>	 ok, the rest are finishing up, I'll redo coord after
[14:30:16] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:30:22] <milimetric>	 thanks!!
[14:31:40] <elukey>	 hope that it does the trick
[14:34:34] <milimetric>	 elukey: success
[14:42:22] <milimetric>	 !log deployed refinery with 0.1.9 jars and synced to hdfs
[14:42:24] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:42:34] <milimetric>	 ok, ottomata, sync done, you can +2 https://gerrit.wikimedia.org/r/c/operations/puppet/+/682791
[14:42:36] <milimetric>	 and I think that's it
[14:42:54] <ottomata>	 ok done ty
[14:54:13] <ottomata>	 fkaelin:  is this cool? https://analytics-zoo.readthedocs.io/en/latest/doc/UseCase/spark-dataframe.html
[14:54:16] <ottomata>	 i cannot tell :)
[14:58:44] <mforns>	 ottomata: I'm ready, whenever you're ok :]
[15:05:49] <ottomata>	 ok bc!
[15:06:31] <wikibugs>	 (03PS1) 10Gehel: Ensure that maven site generation works. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/683351
[15:07:00] <gehel>	 ottomata: ^ this should fix the issue
[15:07:13] <gehel>	 I haven't actually tried doing a relase
[15:07:39] <gehel>	 s/relase/release/
[15:12:50] <icinga-wm>	 PROBLEM - Check unit status of refine_event_sanitized_analytics_immediate on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refine_event_sanitized_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:23:32] <razzi>	 Hi team, g'day
[15:34:27] <ottomata>	 yoohoo
[15:34:36] <ottomata>	 razzi: fyi i merged your grouped wikis patch after dan deployed
[15:55:59] <fkaelin>	 ottomata analytics-zoo does look interesting, more ambitious overall than https://github.com/criteo/tf-yarn, though the latter has gpu support. analytics-zoo does open the door for running the popular deep learning frameworks on yarn, the largest compute resource available to us. e.g. if there is an pretrained pytorch model that somebody wants to use for inference on wmf data, they could use analytics-zoo to run on yarn.
[15:59:36] <ottomata>	  also fkaelin  FYI, elukey and joal just added support for targeting hadoop nodes with GPUs via labels
[15:59:40] <ottomata>	 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop 
[15:59:42] <ottomata>	 see GPU support
[15:59:53] <joal>	 fkaelin: I think miriam_ already knows it :)
[16:00:09] <isaacj>	 milimetric: 
[16:00:27] <isaacj>	 the data looks good. is the code that generates the TSV somewhere in the patch or elsewhere?
[16:01:14] <milimetric>	 isaacj: yep, its https://gerrit.wikimedia.org/r/c/analytics/refinery/+/655804/14/oozie/referrer/daily/archive.hql
[16:01:48] <milimetric>	 and thanks for looking!  I'll start it later today and see if it fares better in production than it did in my testing (I suspect it was some permissions nonsense but couldn't find the error)
[16:04:27] <isaacj>	 oh excellent - i somehow missed that. if it's easy to update still, my only comment would be to add the header and concatenate year, month, day into a single column called time with the format YYYY-MM-DD. i'm not sure how to trigger the header but the time column would be  `CONCAT(LPAD(year, 4, "0"), LPAD(month, 2, "0"), LPAD(day, 2, "0")) AS time` (turnilo has weird requirements around the time column)
[16:05:03] <isaacj>	 and thanks milimetric -- excited to get this data up!
[16:05:17] <fkaelin>	 ottomata joal, that is nice. Miriam and Aiko are probably already using that label, these idle gpus have been in their sight for a while.
[16:06:24] <milimetric>	 isaacj: we usually just put a README with header info in the same folder, would that be ok?
[16:07:28] <isaacj>	 ahh -- if that's more standard, go for it. i suspect i'll have to make some tweaks for turnilo anyways so it's easy to handle the header then
[16:20:33] <wikibugs>	 10Analytics, 10Analytics-Kanban: Consolidate labs / production sqoop lists to a single list - https://phabricator.wikimedia.org/T280549 (10razzi) This has been re-deployed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/682791
[16:30:07] <elukey>	 razzi: I am ok if you want to reimage tomorrow but are you going to add the plan to https://phabricator.wikimedia.org/T278423 ?
[16:30:13] <elukey>	 (didn't get it during standup)
[16:30:43] <elukey>	 the interesting bit is to think about what could go wrong and contingency plans etc..
[16:30:59] <elukey>	 the an-master nodes are critical for the whole infra
[16:32:18] <razzi>	 elukey: right, I'll add what I've thought about so far there; and we can discuss tomorrow before deciding to proceed or not
[16:33:34] <elukey>	 razzi: let's first talk about the plan in detail, with all the steps etc.., then we schedule the maintenance
[16:45:46] <razzi>	 elukey: sounds good
[16:52:09] <icinga-wm>	 RECOVERY - Check unit status of refine_event_sanitized_analytics_immediate on an-launcher1002 is OK: OK: Status of the systemd unit refine_event_sanitized_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:53:56] <hnowlan>	 !log stopping deployment-eventlog05 in deployment-prep
[16:53:59] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:54:56] <wikibugs>	 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (10phuedx) a:03phuedx
[17:08:09] <hnowlan>	 just fyi razzi and I are gonna make eventlog1003 in a few minutes
[17:08:24] <hnowlan>	 make it live as an eventlogging host that is
[17:08:40] <hnowlan>	 shutting down the old host in deployment-prep went fine, the new host took over no bother 
[17:26:25] <razzi>	 !log manually change /srv/deployment/eventlogging/analytics/.git/DEPLOY_HEAD to deployment1002 on deployment1002 to fix puppet scap error
[17:26:27] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:31:12] <razzi>	 !log remove deployment cache on eventlogging1003: sudo rm -fr /srv/deployment/eventlogging/analytics-cache/
[17:31:15] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:36:06] <razzi>	 !log sudo mkdir /srv/log/eventlogging and sudo chown eventlogging:eventlogging /srv/log/eventlogging to workaround missing directory puppet error (to be puppetized later)
[17:36:09] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:46:53] <hnowlan>	 !log eventlog1003 joined to groups successfully
[17:46:54] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:47:40] <hnowlan>	 for the time being eventlog1002 is still doing all the work - we're gonna experiment with switching traffic over to eventlog1003 tomorrow 
[17:57:51] <elukey>	 hnowlan: in theory the 1003's consumers should get added to the consumer group, and split the work with 1002
[17:57:58] <elukey>	 have you folks stopped the consumers on 1003?
[17:58:07] <elukey>	 otherwise they will take traffic automatically
[17:58:32] <elukey>	 razzi: --^
[18:01:36] <elukey>	 ok so the consumers are up
[18:11:20] <elukey>	 but it seems zero partitions assignment to eventlog1003
[18:14:26] <razzi>	 elukey: my understanding was that since there's only 1 partition of eventlogging_client_side, 1003 will not do anything until 1002 is down
[18:15:00] <elukey>	 razzi: nono we have 12, this is why I was asking
[18:15:20] <elukey>	 I expected 1003 to take say 6
[18:16:36] <elukey>	 razzi: ahh okok so there are two topics, with - and _
[18:16:43] <elukey>	 eventlogging_client_side has indeed 1 partition
[18:16:49] <elukey>	 but the one with - has 12
[18:16:55] <elukey>	 and we pull from it
[18:17:03] <razzi>	 oh ok confusing
[18:17:42] <elukey>	 yes definitely
[18:17:48] <elukey>	 even more that the group didn't rebalance :D
[18:18:25] <elukey>	 razzi: on any kafka jumbo you can run
[18:18:26] <elukey>	 kafka consumer-groups --describe --group eventlogging_processor_client_side_00
[18:18:29] <elukey>	 to get the current assignments
[18:18:32] <elukey>	 all 1002
[18:18:37] <elukey>	 ottomata: do you have any idea?
[18:24:35] <elukey>	 going to dinner, ttl!
[19:02:49] <wikibugs>	 10Analytics, 10Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (10razzi) I believe this can be resolved by switching from the legacy druid connector to druid tables:  > A native connector to Druid ships with Superset (behind the DRUID_IS_ACTIVE...
[19:15:17] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (10Ottomata) @Addshore can you help this ticket find its way to the right people at WMDE? Thank you!
[19:17:15] <wikibugs>	 (03PS1) 10Ottomata: Update event_sanitized_main_allowlist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683421 (https://phabricator.wikimedia.org/T273789)
[19:21:56] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) For reference, the current sizes of the tables to move to event_sanitized are: ` 9.3 M  /wmf/data/event/medi...
[19:22:50] <ottomata>	 hm elukey  razzi , no i'd expect 1003 to get half the partitions too
[19:23:26] <ottomata>	 mforns:  i'm going to copy over the two centralnotice campaign tables
[19:23:27] <ottomata>	 they are small
[19:23:38] <ottomata>	 also, resource_change is very large, 4T, maybe we don't need to keep taht indefinitely?
[19:23:50] <ottomata>	 probably not that useful, it only has notification about things that have changed, but none of their state
[19:23:52] <ottomata>	 dunno
[19:23:56] <mforns>	 ottomata: I was just going to take a break before my meeting at the end of the hour
[19:24:07] <ottomata>	 mforns:  no worries, i will do these, if i have problems i'll ask ya
[19:24:14] <ottomata>	 i think after our session earlier i can begin
[19:24:27] <ottomata>	 Pchelolo:  q, is there any value in keeping resource_change events forever in hadoop?
[19:24:36] <mforns>	 ottomata: no context on what's in resource_change...
[19:24:51] <ottomata>	 its basically use restbase urls that need rerendering
[19:25:00] <ottomata>	 but is meant to be more abstract than that
[19:30:15] <AndyRussG>	 hi ottomata mforns lmk if there are any centralnotice-y topics I could potentially help with btw
[19:31:05] <ottomata>	 oh AndyRussG  we are just going to be sanitizing and retaining the mediawiki state change event data in the event_sanitiized database
[19:31:06] <AndyRussG>	 also thanks for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralNotice/+/677015 (event logging banner history patch) -- I'll just smoke test that a bit later, looks totally fine
[19:31:15] <ottomata>	 which includes mediawiki_centralnotice_campaign_change and mediawiki_centralnotice_campaign_create
[19:31:17] <Pchelolo>	 not as I know of ottomata
[19:31:26] <ottomata>	 so we need to copy it there, and then set up purge jobs for the data in the event database
[19:31:29] <ottomata>	 ok Pchelolo  i will not keep it around then
[19:31:30] <ottomata>	 thank you
[19:31:37] <ottomata>	 great AndyRussG  thank you
[19:31:42] <ottomata>	 looking forward to that one :)
[19:32:07] <AndyRussG>	 ottomata heheh yeah really sorry for the delay on that......... 8p thank u! :)
[19:33:16] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (10Addshore) Will do (sending email now)
[19:33:19] <AndyRussG>	 ah also thx for the explanation about state data ^
[19:33:30] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (10Addshore)
[19:59:14] <wikibugs>	 10Analytics, 10Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (10Esanders) I now get this error: {F34432536,size=full}
[19:59:32] <wikibugs>	 10Analytics, 10WMDE-Analytics-Engineering, 10WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (10Ottomata) Thanks! :)
[20:02:44] <wikibugs>	 (03PS2) 10Ottomata: Update event_sanitized_main_allowlist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683421 (https://phabricator.wikimedia.org/T273789)
[20:04:57] <ottomata>	 mforns:  FYI i changed the procedure a bit
[20:04:57] <ottomata>	 https://phabricator.wikimedia.org/T273789#7042025
[20:05:06] <ottomata>	 using distcp hopefully wlil be a little more efficient
[20:05:13] <ottomata>	 so i am copying all data upfront now
[20:06:16] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) FYI: We are not going to sanitize and keep resource_change
[21:23:34] <wikibugs>	 10Analytics, 10Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (10cchen) Thanks for the information @razzi! Is there any way that we can show the druid tables only in the dropdown?  @Esanders I recreate your chart with the new data source, looks...
[21:47:34] <wikibugs>	 10Analytics, 10Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (10razzi) @cchen do you mean when switching data sources / creating new charts? We've already [disabled the legacy druid connection](https://gerrit.wikimedia.org/r/plugins/gitiles/op...
[22:51:13] <wikibugs>	 10Analytics-Kanban, 10SRE, 10ops-eqiad: Degraded RAID on an-worker1100 - https://phabricator.wikimedia.org/T280132 (10razzi)
[22:58:17] <wikibugs>	 10Analytics-Clusters, 10Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (10razzi)
[23:05:18] <wikibugs>	 10Analytics-Clusters, 10Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (10razzi) @elukey I was following the instructions at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Swapping_broken_disk but I got a nonzero exit code.  Based...