[04:25:38] PROBLEM - Check the last execution of reportupdater-interlanguage on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:26:24] PROBLEM - Check the last execution of reportupdater-published_cx2_translations on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:28:20] PROBLEM - Check the last execution of reportupdater-browser on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:36:14] RECOVERY - Check the last execution of reportupdater-interlanguage on stat1007 is OK: OK: Status of the systemd unit reportupdater-interlanguage https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:37:00] RECOVERY - Check the last execution of reportupdater-published_cx2_translations on stat1007 is OK: OK: Status of the systemd unit reportupdater-published_cx2_translations https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:38:54] RECOVERY - Check the last execution of reportupdater-browser on stat1007 is OK: OK: Status of the systemd unit reportupdater-browser https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:19:04] good morning elukey - Can I help with the p3 thing?
[07:27:12] joal: morning :)
[07:27:29] if you want yes, I am just debugging what may have happened in prod
[07:27:33] so far no repro in labs
[07:27:36] that is a bit strange
[07:27:40] indeed
[07:28:42] how do you want us to proceed? Shall I get my beaked-yellow suit and meet you in da cave?
[07:31:02] there is no urgency, if you have other things to do please go ahead, we'll be out next week so we'll likely attempt another deploy in some days
[07:31:17] ack :)
[08:00:37] joal: I think I have found a possible fix, but it is all very confusing
[08:01:05] it works very well if I assume that deployment prep is not representative of prod traffic at all
[08:01:44] elukey: I think that's a fair assumption: deployment-prep has lower traffic, and a lot less event variability
[08:02:07] joal: but not even one event that triggered this problem?
[08:02:15] I can't explain it
[08:02:59] elukey: isn't the issue related to error-only events, and deployment-prep don't have them?
[08:04:16] we do have them
[08:04:30] at least from what I can see
[08:25:38] thanks for the review!
[09:22:24] hello joal, if you could take a quick look at this I can start backfilling :)
[09:22:24] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/543837/
[09:24:09] fdans: I have not double checked the values of the map, but except from that it looks good :) thanks!
[10:30:54] * elukey lunch!
[10:39:09] (CR) Joal: "Before going into detailed CR, I'd like to discuss the approach. Here you're listing all folders and files older-than-X, and then recursiv" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543897 (https://phabricator.wikimedia.org/T235200) (owner: Ottomata)
[13:16:47] (CR) Ottomata: "Would the way to use RemoteIterator be with FileSystem listLocatedStatus? If so, I can do that, as there is a method that takes a PathFil" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543897 (https://phabricator.wikimedia.org/T235200) (owner: Ottomata)
[13:17:27] (CR) Ottomata: "I'm fine with putting this in refinery core if you think that is better." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543897 (https://phabricator.wikimedia.org/T235200) (owner: Ottomata)
[13:40:07] (CR) Joal: "I was thinking of using listStatusIterator(Path p), and apply filtering manually. The decision to make is whether applying PathFilter retu" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543897 (https://phabricator.wikimedia.org/T235200) (owner: Ottomata)
[13:50:35] (CR) Ottomata: "> I was thinking of using listStatusIterator(Path p), and apply filtering manually." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543897 (https://phabricator.wikimedia.org/T235200) (owner: Ottomata)
[13:51:14] Heya ottomata - just wanted to be sure we're on the same page about the HdfsCleaner
[14:05:56] joal: ya i think so
[14:06:13] just use listStatusIterator and check mtime and delete?
[14:06:44] is using a PathFIlter even helpful?
[14:06:51] vs just checking a conditional directly?
[14:07:02] if it is helpful, I could use listLocatedStatus
[14:07:09] since that also returns a remote iterator
[14:07:16] its the same thing as the FileStatus, but with block info?
[14:07:21] i don't need the block info
[14:07:55] i guess, I could drop in listLocatedStatus right now with existent code in place of listStatus and everythign would just work i think
[14:08:00] and it would use a RemoteIterator
[14:08:03] ?
[14:45:00] !log backfilling 2015-1-1 for mediarequests per file, proceeding with all days until 2019-05-17 if successful
[14:45:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:32:48] * fdans this takes ages
[16:18:12] nuria: mforns: dsaez: I updated the page. please let me know if you need more information about the events. the IR one is particularly interesting as it deviated from the regular traffic pattern even though it was for a short period of time
[16:41:11] Team - I'm gone to Paris! See you next week :)
[16:41:18] byeeeee!
[16:59:26] Analytics, Analytics-Kanban, LDAP-Access-Requests, Operations, and 2 others: Analytics Access for Grant (groups cn=wmf and analytics-privatedata-users) - https://phabricator.wikimedia.org/T235260 (Nuria) Please @gsingers test whether you can access https://turnilo.wikimedia.org and https://super...
[17:01:56] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Nuria) And also we need to add lex to nda group for access to turnilo and superset
[17:15:30] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Dzahn) @RStallman-legalteam Hi, do you have NDA on file for Lex Nasser?
[17:15:34] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Dzahn) @Nuria Thank you for the groups, will prepare a change and move forward with this asap. @lexnasser Thank you, yes, i see the signature looks good!
[17:26:32] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Dzahn) @Nuria Is Lex going to get a @wikimedia.org email address? I was wondering because i need to specify one in the code change and it looks like it doesn't exist un...
[17:33:24] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (RStallman-legalteam) @Dzahn I do not have an NDA on file for Lex Nasser, but it is possible that the paperwork for Lex's internship was completed through HR. @Nuria can...
[18:02:07] milimetric, do you know how to insert named_structs in Hive without losing the casing of the sub-field name?
[18:03:02] mforns: iiuc, doing anything in hive might lose casing
[18:03:11] since it is case insensitive
[18:03:13] sort of?
[18:03:17] hm, maybe in structs it isnt' sort of?
[18:03:30] ottomata, no, structs are not case insensitive
[18:03:54] really? even in a regular hive query?
[18:04:03] e.g. event.key is different than event.Key?
[18:04:04] I wonder if I insert an unnamed struct into a named_struct column it will insert the subfields by order of appearance?
[18:04:12] yes
[18:04:19] mforns: sounds like you are getting into spark land'
[18:04:29] not 100% sure, but I remember having problems in the past with that
[18:04:37] ffffff...
[18:06:03] ok, thanks! will try to insert the value and see what happens...
[18:07:35] ottomata, oh! no, I just checked and struct subfields are indeed case insensitive!
[18:07:51] the problems I had in the past must have been with the EL sanitization whitelist
[18:08:20] and the sanitization algorithm not being case insensitive
[18:08:31] thanks!
[18:21:12] hm !
[18:33:23] sorry mforns no idea either, I remember vaguely trying it once
[18:33:45] milimetric, no problemo, I was confused, it does work indeed
[20:12:34] (CR) Zhuyifei1999: [C: +1] "I kinda wanna fix this, but sure" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/543923 (https://phabricator.wikimedia.org/T205214) (owner: Framawiki)
[20:43:41] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Nuria) @RStallman-legalteam it was done through HR, yes, he probably needs to sign an NDA as well?
[20:55:51] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (RStallman-legalteam) Ok, probably best for me to just create one since it looks like shell access is needed. @lexnasser could you email your physical (snail mail) addre...
[21:07:31] Analytics, Operations, SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (lexnasser) @RStallman-legalteam Just sent an email
[21:51:04] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (Nuria) Left comments on patch Given the > if isinstance(event_string, Event): The Event is a dictionary (always, otherwise we do not serialize) but it might...