[00:53:08] (PS2) Milimetric: [WIP] Use page move events to improve joining to entity [analytics/refinery/source] - https://gerrit.wikimedia.org/r/594428 (https://phabricator.wikimedia.org/T249773)
[00:54:02] (CR) Milimetric: "still have to test, so still [WIP], but I appreciate the early catches." (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/594428 (https://phabricator.wikimedia.org/T249773) (owner: Milimetric)
[04:20:55] RECOVERY - Check the last execution of monitor_refine_sanitize_eventlogging_analytics_delayed on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_sanitize_eventlogging_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:23:27] RECOVERY - Check the last execution of monitor_refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:50:20] Analytics, Operations: Analytics1060 unresponsive - https://phabricator.wikimedia.org/T251973 (Marostegui)
[04:55:30] ebernhardson: tried to change ownership with anayltics user but couldn't maybe the hdfs user is needed here (cc elukey )
[06:03:23] nuria: correct, hdfs is needed
[06:03:53] I see that Erik also owns all the subdirs
[06:05:03] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_delayed on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:05:17] !log execute hdfs dfs -chown -R analytics-search:analytics-search-users /wmf/data/discovery/search_satisfaction/daily/year=2019
[06:05:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:05:31] ebernhardson: done!
[06:07:17] Going to be afk for most of the morning, but available on the phone if needed o/
[06:52:53] Analytics: Javascript-less Wikistats - https://phabricator.wikimedia.org/T251979 (fdans)
[07:39:29] Quarry, DBA, Data-Services: Unable to use force index on replicas (Key 'PRIMARY' doesn't exist in table 'page') - https://phabricator.wikimedia.org/T251980 (RhinosF1)
[07:40:57] Quarry, DBA, Data-Services: Unable to use force index on replicas (Key 'PRIMARY' doesn't exist in table 'page') - https://phabricator.wikimedia.org/T251980 (Marostegui) >>! In T251980#6111712, @Akeron wrote: > I used https://quarry.wmflabs.org to test those queries on enwiki_p. > > It is very penali...
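For reference, a minimal sketch of the ownership change discussed between 04:55 and 06:05: the analytics user cannot chown files it does not own, so the command has to run as the hdfs superuser. The sudo wrapper below is an assumption about how that is invoked on the kerberized cluster, not a command quoted from the log.

```bash
# Run the recursive chown as the hdfs superuser; on a kerberized cluster this
# also requires valid hdfs credentials (keytab or ticket), assumed to be in place.
sudo -u hdfs hdfs dfs -chown -R analytics-search:analytics-search-users \
    /wmf/data/discovery/search_satisfaction/daily/year=2019

# Verify the new ownership afterwards.
hdfs dfs -ls /wmf/data/discovery/search_satisfaction/daily
```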
[07:54:56] Quarry, Data-Services: Unable to use force index on replicas (Key 'PRIMARY' doesn't exist in table 'page') - https://phabricator.wikimedia.org/T251980 (Marostegui)
[08:50:16] interesting JA008: File does not exist: hdfs://analytics-hadoop/user/oozie/share/lib/lib_20200204183338/hive2/libfb303-0.9.3.jar
[08:51:07] this is the pageview hourly coord
[08:52:30] elukey@stat1005:~$ ls -l /mnt/hdfs/user/oozie/share/lib/
[08:52:30] total 4
[08:52:31] drwxr-xr-x 13 99 hadoop 4096 Apr 29 07:24 lib_20200429072322
[09:01:13] that comes from
[09:01:13] /var/log/puppet.log.7.gz:224:Apr 29 07:24:24 an-coord1001 puppet-agent[169992]: (/Stage[main]/Cdh::Oozie::Server/Kerberos::Exec[oozie_sharelib_install]/Exec[oozie_sharelib_install]/returns) executed successfully
[09:04:37] !log execute oozie admin -sharelibupdate on an-coord1001
[09:04:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:05:26] !log re-run pageview-hourly coordinator 2020-5-6-6 after oozie shared lib update
[09:05:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:07:59] !log re-run data quality coordinators for 2020-5-6-5/6 after oozie shared lib update
[09:08:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:08:54] !log re-run mediarequest coordinator for 2020-5-6-7 after oozie shared lib update
[09:08:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:09:44] !log re-run mediacounts coordinator for 2020-5-6-7 after oozie shared lib update
[09:09:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:10:21] !log re-run aqs-hourly coordinator for 2020-5-6-7 after oozie shared lib update
[09:10:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:11:06] hi elukey is wdqs an analytics service?
[09:11:09] !log re-run learning features actor coordinator for 2020-5-6-7 after oozie shared lib update
[09:11:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:11:16] jbond42: nope! Search's
[09:11:20] ack thanks
[09:11:22] is it exploding?
[09:12:02] no i just noticed that wdqs-updater is failing to start in wdqs1009
[09:12:31] massive stack trace with some spark-query at the begining
[09:12:42] jbond42: ahhh okok there was a problem yesterday, that is a test host so nothing horrible, but worth to follow up with Search
[09:13:11] !log re-run apis coordinator for 2020-5-6-7 after oozie shared lib update
[09:13:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:13:47] ahh ok thanks ill ping them
[09:22:58] Analytics, Operations, Traffic, Patch-For-Review: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993 (fgiunchedi) Chiming in with two cents and my Prometheus hat: I agree with @ema that none of the options are great unfortunately. Rephrasing to make sure I understand...
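A sketch of the sharelib check and refresh performed between 08:50 and 09:04, run on a host where the oozie CLI is configured (an-coord1001 here); the -shareliblist verification step is an assumption about what the installed Oozie version supports.

```bash
# The JA008 error means jobs still reference a lib_<timestamp> directory that
# no longer exists on HDFS; list what is actually there.
hdfs dfs -ls /user/oozie/share/lib

# Point the running Oozie server at the newest lib_<timestamp> directory
# (the command logged at 09:04).
oozie admin -sharelibupdate

# Optionally confirm which jars the server now resolves for the hive2 action.
oozie admin -shareliblist hive2
```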
[09:24:30] !log re-run virtualpageview coordinator for 2020-5-6-5 after oozie shared lib update
[09:24:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:25:18] !log re-run projectview coordinator for 2020-5-6-5 after oozie shared lib update
[09:25:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:25:50] hopefully all coords restarted
[11:20:32] Analytics, Operations, Security-Team, CAS-SSO, and 2 others: Log / alert on too many failing logins / Throttling login attempts - https://phabricator.wikimedia.org/T233944 (MoritzMuehlenhoff)
[11:22:38] Analytics, CAS-SSO, User-Elukey: Secure Hue/Superset/Turnilo with CAS (and possibly 2FA) - https://phabricator.wikimedia.org/T159584 (MoritzMuehlenhoff)
[11:22:40] back!
[11:23:25] Hi :)
[11:27:31] Wow - Thanks elukey for the restarts and all
[11:28:06] elukey: any idea how we ended up with a corruipted oozie sharelib?
[11:28:29] joal: not corrupted, the dir got re-created and the old one dropped
[11:28:42] no idea why, we had this problem before :(
[11:28:53] so oozie was freaking out
[11:30:09] hm - second time we see sharelib dropped/recreated without us being at the action button
[11:30:13] !log use /run/user as kerberos credential cache for stat1005
[11:30:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:30:33] elukey: If ok for you, I think we should investigate (one more thing :(
[11:33:07] definitely
[11:34:36] the main issue is trying to figure out who/what drops the lib
[11:34:46] because in theory it shouldn't be the oozie shlib script
[11:34:52] since it executed on the 29th
[11:39:18] I checked in the trash of hdfs, oozie, analytics and can't find it
[11:46:02] :(
[11:47:17] interestung
[11:47:18] elukey@stat1005:~$ ls -dl /mnt/hdfs/user/oozie/share/lib
[11:47:18] drwxr-xr-x 3 99 hadoop 4096 May 6 08:19 /mnt/hdfs/user/oozie/share/lib
[11:47:35] so the dir's mtime is today at 8:19 UTC, when the issue started
[11:49:37] so something has deleted the dir right before oozie freaked out
[11:50:30] elukey: I can't imagine it's not puppet
[11:50:38] hey teamm
[11:50:51] uou, alarms
[11:53:04] hey mforns
[11:53:15] joal: not sure, I can't really find any trace of it
[11:54:05] hm
[11:54:07] ah but maybe the hdfs-audit logs have it
[11:54:11] checking
[11:56:13] 2020-05-06 08:19:59,109 INFO FSNamesystem.audit: allowed=true ugi=oozie/an-coord1001.eqiad.wmnet@WIKIMEDIA (auth:KERBEROS) ip=/10.64.21.104 cmd=delete src=/user/oozie/share/lib/lib_20200204183338 dst=null perm=null proto=rpc
[11:56:17] loooool
[11:56:36] it's oozie itself!
[11:56:38] aahahaah
[11:57:10] ok wait too weird
[11:57:33] lemme check when I have executed the shlib update command just in ase
[11:57:36] *case
[11:58:49] that was around an hour later (UTC)
[11:59:43] Oozie will automatically clean up old ShareLib lib_ directories based on the following rules:
[11:59:46] After ShareLibService.temp.sharelib.retention.days days (default: 7)
[11:59:49] Will always keep the latest 2
[11:59:51] * elukey cries in a corner
[11:59:53] joal: --^
[12:00:07] 29th -> 6th
[12:00:09] one week
[12:00:13] sharp troubleshooting O.o
[12:00:42] WATTTTT !
[12:00:56] elukey: kudos for hdfs-log digging!
[12:01:17] elukey: so oozie recreates it's sharelib every week ???
[12:01:27] nono I think that this happened
[12:01:55] 1) the puppet exec triggered for some reason (like the "unless" being false due to a network glitch)
[12:02:05] 2) the new shlib gets created on the 29th
[12:02:19] 3) after a week (today) oozie decides to clean up
[12:02:45] the assumption that oozie makes is that it can safely delete stale dirs
[12:03:03] joal: --^
[12:03:35] elukey: there still is something I don't understand - on the 29th, puppet recreates an oozie sharelib - So we have 2, correct?
[12:03:45] or does it drop the previous one?
[12:04:08] the former
[12:04:35] from the hdfs-lob I assume we have 2 - the one 20200204 and the one created the 29th (20200429)
[12:04:48] after 2) yes
[12:04:52] the exec is
[12:04:55] kerberos::exec { 'oozie_sharelib_install':
[12:04:55] command => "/usr/bin/oozie-setup sharelib create -fs ${hdfs_uri} -locallib ${oozie_sharelib_archive}",
[12:04:58] unless => '/usr/bin/hdfs dfs -ls /user/oozie | grep -q /user/oozie/share',
[12:05:01] user => 'oozie',
[12:05:02] see the 'unless' ?
[12:05:05] require => [Cdh::Hadoop::Directory['/user/oozie'], File['/usr/bin/oozie-setup']]
[12:05:08] }
[12:05:21] if that fails for a network issue for example a new dir gets created
[12:05:46] right elukey - so in case of network error, or cred error (yesterday pinging today ...)
[12:05:57] We have a new dir
[12:06:12] BUT - that new dir is not the one used by oozie ?
[12:06:32] exactly, since no shared lib upgrade is executed
[12:06:58] so the old one is still used
[12:07:04] right - but oozie feels free to drop the folder that is still in use - MAAAAN that last sentence makes me feel so bad
[12:07:11] https://issues.apache.org/jira/browse/OOZIE-1783
[12:07:23] it used to be doable only during oozie startup
[12:07:32] but they thought to make it more interactive
[12:07:34] to please people
[12:07:40] so now oozie does it live
[12:08:07] doing seppuku basically
[12:08:15] xD
[12:08:38] I'm in the middle of /o\ and :D
[12:08:49] I kinda don't know where to be
[12:09:10] I get the feeling
[12:09:12] shareLib clean deletes the lib in use ????? I can't imagi
[12:09:57] I am trying to see if there is way to tell oozie "PLEASE DON'T DO ANYTHING MATE"
[12:10:46] mforns: can you please double check failed/restarted jobs?
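A rough reconstruction of the audit-log digging that identified the deleter at 11:56; the log path below is the usual location on the active NameNode and is an assumption, not quoted from the channel.

```bash
# On the active NameNode: who issued a delete under the Oozie sharelib path?
grep 'cmd=delete' /var/log/hadoop-hdfs/hdfs-audit.log | grep 'src=/user/oozie/share/lib'

# The matching entry shows ugi=oozie/an-coord1001..., i.e. the Oozie server itself
# purged lib_20200204183338 according to its sharelib retention rules.
```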
[12:10:57] joal: sure
[12:11:09] Thanks mate :)
[12:14:41] so I propose something like oozie.service.ShareLibService.temp.sharelib.retention.days=365
[12:15:01] or even maxint :D
[12:15:20] in the meantime, updating the email thread
[12:16:28] elukey: https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/set_up_oozie_configuration_files.html
[12:16:57] ahahha 1000
[12:17:13] I like it
[12:17:35] elukey: let's not forget the 'interval' prop
[12:17:59] joal: I think that even 1000 days alone would be safe :D
[12:18:05] :)
[12:18:11] I hope to not have oozie in 1000 days :D
[12:18:24] elukey: oozie will remind us 1000 days after its restart ;)
[12:28:52] !log re-run pageview-druid-hourly-coord for 2020-05-06T06:00:00 after oozie shared lib update
[12:28:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:33:00] ah snap I missed one sorry :(
[12:34:09] no problemo, joal asked me to re-check them, last one is running
[12:34:20] Thanks both of you :)
[12:36:23] code review in https://gerrit.wikimedia.org/r/#/c/594703/
[12:36:28] if you guys want to check
[12:37:13] lukin
[12:49:22] !log restart oozie on an-coord1001 to pick up the new shlib retention changes
[12:49:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:49:43] thanks mforns
[12:54:38] all right oozie should be fixed
[12:54:46] \o/
[12:54:52] Thanks again elukey
[12:55:18] yay
[12:56:59] elukey: also - Should we tell oozie to upgrade sharelib if we create a new folder?
[12:59:05] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819 (dcausse)
[13:03:35] joal: not sure, let's discuss with Andrew.. ideally it shouldn't happen :)
[13:33:43] joal: whenever you have time I'd need to chat about kerberos credential cache :)
[13:34:21] when you wish elukey
[13:36:39] joal: lemme fix one thing since I am stupid
[13:36:46] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (mforns) This idea is probably naive and far from what we have now, but maybe: Could we have...
[13:37:15] elukey: please fix, and please stop tell me lies :)
[13:37:44] ok bc??
[13:38:19] sure elukey
[13:47:54] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (Ottomata) We could, but I'm not sure what that would gain us! :) We still need a way to ide...
[13:53:13] crazy find elukey. I don't even understand the purpose of that feature, are we supposed to be updating shared libs more often or something?
[13:58:30] milimetric: I have no idea :)
[13:59:34] right... like... someone *wanted* this at some point. I want to meet that person
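A sketch of how the retention change discussed between 12:14 and 12:49 can be verified on an-coord1001 once puppet has applied it; the oozie-site.xml path and the purge-interval property name are assumptions based on the Cloudera page linked at 12:16.

```bash
# Confirm the new values in the server config (standard CDH path assumed).
grep -B1 -A2 'ShareLibService' /etc/oozie/conf/oozie-site.xml
# Expected properties, per the docs linked above:
#   oozie.service.ShareLibService.temp.sharelib.retention.days  (e.g. 365 or 1000)
#   oozie.service.ShareLibService.purge.interval                (how often the purge runs)

# Restart Oozie so the running server picks the settings up (done at 12:49).
sudo systemctl restart oozie.service
```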
[14:15:02] https://github.com/openjdk/jdk/blob/master/src/java.security.jgss/share/classes/sun/security/krb5/internal/ccache/FileCredentialsCache.java#L448-L456
[14:15:05] joal: --^
[14:15:07] * elukey cries
[14:15:27] /o/
[14:15:31] (PS1) Milimetric: Use new page move incremental updates [analytics/refinery] - https://gerrit.wikimedia.org/r/594719 (https://phabricator.wikimedia.org/T249773)
[14:15:46] elukey: I guess we're gonna need to use the env var :S
[14:18:09] Analytics, Performance-Team (Radar), Vue.js (Vue.js-Search): Revise schema and performance dashboards for Vue.js search - https://phabricator.wikimedia.org/T250336 (Niedzielski)
[14:18:44] joal: that is way more invasive sigh
[14:33:12] hiyaaaa my laptop is not booting, am on an old computer with less ability to log into things atm...
[14:44:54] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (mforns) Yes, I imagined it would be easier to do it as soon as possible in the pipeline (Kaf...
[14:49:40] Analytics, Product-Analytics: Can't publish my draft dashboard on superset - https://phabricator.wikimedia.org/T248904 (mforns) I can see the dashboard mentioned in the description is published. @Esanders Is this issue solved then? Thanks :]
[14:55:08] Analytics: Check home/HDFS leftovers of anomie - https://phabricator.wikimedia.org/T250167 (mforns) @AMooney ping? :-)
[15:26:26] Analytics, Operations: Analytics1060 unresponsive - https://phabricator.wikimedia.org/T251973 (colewhite) p:Triage→Medium
[15:30:26] Analytics, Operations: Analytics1060 unresponsive - https://phabricator.wikimedia.org/T251973 (elukey) Open→Resolved a:elukey
[15:46:25] joal: the hdfs-rsync that handles mediawiki-history-dumps is the java one? I thought it was the one you wrote...
[15:47:29] Analytics, Analytics-Kanban: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (mforns) a:JAllemandou→mforns
[15:52:15] hi! I am using the archiva ci credentials mentioned here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Deploy/Refinery-source#Changing_the_archiva-ci_password and I was wondering if the password has expired
[15:52:32] Analytics: Check home/HDFS leftovers of anomie - https://phabricator.wikimedia.org/T250167 (AMooney) @mforns, thanks for the ping. I am checking around for someone with access.
[16:06:24] maryum: hey! do you see a password in there?
[16:06:57] elukey: I don't have permission....I know it was there because our job was working but now it's failing so I was wondering if the password has expired like mentioned on that page
[16:07:34] ahhh ok because I can't see it too
[16:08:26] so the pw in password store (only for sres) seems not working
[16:08:37] ottomata: did you change the pass of archiva-ci recently?
[16:09:15] hmmm i might have...
[16:09:20] beacuse it expired
[16:09:53] yargh but
[16:09:59] hm
[16:10:05] i am never able to update pwstore
[16:10:20] because of expired user keys
[16:10:24] i might have given up
[16:10:40] and now, my main computer is on the fritz and is restoring from backup atm
[16:10:52] and i can't get the pw out of my pw manager until it boots... :/
[16:11:16] it would have been a couple of months ago though
[16:11:24] maryum: how long has your job been failing?
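Going back to the credential-cache discussion at 14:15: the JDK code linked there only looks at the KRB5CCNAME environment variable (falling back to its hard-coded /tmp/krb5cc_<uid> default), which is why moving the cache under /run/user needs the variable exported. The exact cache file name below is an assumption.

```bash
# Point both the MIT tools and Java clients (Spark, Hive, ...) at the per-user
# cache location used on stat1005; the file name is illustrative.
export KRB5CCNAME="FILE:/run/user/$(id -u)/krb5cc"

kinit   # obtain a ticket into the new cache location
klist   # confirm which cache path will be used
```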
[16:12:06] i think i was able to disable the archiva-ci password expiration too
[16:12:08] not sure though
[16:12:20] ottomata: it was passing on Monday and failed earlier today. we don't run it all the time
[16:12:37] I don't think I would even be able to get access to pwstore as I'm not an SRE
[16:13:01] aye
[16:13:04] yeah i haven't changed it since monday
[16:13:16] oh okay....then we have some other issue....super strange. thanks!
[16:13:30] maybe the pw did expire in archiva?
[16:13:32] it is possible
[16:14:15] I can't log into archiva either to check
[16:14:24] well not as the admin user
[16:15:26] I can as admin but I don't see if it is expired
[16:17:26] elukey: hmm okay, maybe the password is okay then. difficult to tell
[16:19:19] maryum: is it blocking you right now? (I guess yes)
[16:19:42] ottomata: one thing that I could do is to generate a new pw and then update jenkins/archiva
[16:19:52] and try to save it on pwstore
[16:20:25] elukey: it's not an immediate blocker but we plan to use this job once a week for deploys
[16:20:53] ottomata: that would be helpful, and then if the job is still failing then it must be something else
[16:21:16] elukey: +1
[16:21:31] maryum: ack I'll try to regenerated later on, is it ok?
[16:21:39] (need to run some errands sorry)
[16:21:44] elukey: yes that is fine, no rush
[16:21:49] super thanks :)
[16:21:54] * elukey errand for a while
[16:54:38] Analytics, Analytics-Kanban, Research, Patch-For-Review: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (Milimetric) @Isaac, I was finally able to run this successfully. I'm vetting the data a little bit now, basically j...
[17:00:49] nuria: i'm pretty sure ssh access is needed to use superset? elukey right?
[17:01:01] the accounts need to exist on the namenode still, no?
[17:10:29] nuria: to use presto yes
[17:10:38] since it checks credentials to access hdfs files
[17:13:43] (logging off, will check later :)
[17:13:53] err: ottomata: --^
[17:14:17] laters!
[17:15:08] Analytics, LDAP-Access-Requests, Operations, Patch-For-Review: LDAP access to the wmf group for Antonino Hemmer (superset, turnilo, hue) - https://phabricator.wikimedia.org/T251123 (colewhite) Open→Resolved ah212 added to `wmf` ldap group. Please feel free to reopen if you encounter any...
[17:32:09] Analytics: Check home/HDFS leftovers of anomie - https://phabricator.wikimedia.org/T250167 (AMooney) @tstarling, Do you have ssh access, so that you can access these files and copy them to your home dir? I'd like to ensure that we do not need them.
[17:51:11] ottomata: can you do nested string interpolation in puppet? like: $a = 'Blah' $b = "Hello, ${a}" $c = '!' $d = "${b}${c}"
[17:54:10] I guess you can, asking because maybe there's some weird double escaping that needs to be done?
[18:08:58] yes
[18:09:07] that shoul be fine
[18:09:17] it isn't working for you mforns ?
[18:09:46] no no, just checkig that there wasn't any weird thing, i.e. with backslashes or sth
[18:09:49] thx ottomata
[18:33:12] Analytics, Analytics-Kanban: Make anomaly detection correctly handle holes in time-series - https://phabricator.wikimedia.org/T251542 (mforns) a:mforns
[18:40:05] mforns: sorry missed your ping and then left for diner - hdfs-rsync is the tool I wrote in scala, launched by java
[18:40:37] joal: aaah... sorry
[18:40:50] np mforns :)
[18:41:20] joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594773/
[18:42:10] mforns: I like it very much the way you did it :)
[18:42:24] :D
[18:42:25] mforns: hdfs-rsync is supposed to fail when the source is missing
[18:42:26] :)
[18:43:05] ok, yea I thought to pass a new parameter to it, so that it could choose to fail or not, but if you like it in puppet, I'm happy :]
[18:43:14] Thanks a lot mforns! The puppet is also extremely good looking (I'd have gone for the hard-coded version)
[18:43:36] * mforns blushes
[20:38:59] Analytics, Analytics-Kanban, Research, Patch-For-Review: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (Isaac) > Question: is that ok? I can easily regenerate the 2020-03-02 snapshot as if it was generated by the old log...
[20:50:47] Analytics, Analytics-Kanban, Research, Patch-For-Review: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (Isaac) Quick context for snapshot ranges -- I checked via this query (I assume the spillover to April is unpredictab...
[21:06:25] Analytics, Growth-Team, Product-Analytics (Kanban): Hash edit session ID in EditAttemptStep and VisualEditorFeatureUse whitelisting - https://phabricator.wikimedia.org/T244931 (nettrom_WMF) Open→Declined After discussing this with Analytics Engineering, I think it's clear that we don't want t...
[21:22:31] Analytics, DC-Ops, Operations, ops-eqiad: Degraded RAID on analytics1055 - https://phabricator.wikimedia.org/T252070 (colewhite) p:Triage→Medium
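A rough illustration of the guard idea behind the puppet change linked at 18:41 (gerrit 594773): only launch the sync once the source snapshot exists, instead of letting hdfs-rsync fail on a missing source. The snapshot path and the hdfs-rsync invocation are placeholders, not the job's real configuration.

```bash
# Hypothetical monthly snapshot directory; the real sources are defined in puppet.
SRC='/wmf/data/archive/mediawiki/history/2020-04'

# `hdfs dfs -test -d` exits non-zero while the snapshot has not been produced yet.
if hdfs dfs -test -d "$SRC"; then
    echo "source present, running the sync"
    # hdfs-rsync "$SRC" <destination>    # placeholder invocation
else
    echo "source not produced yet, skipping this run"
fi
```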