[00:07:09] PROBLEM - Check the last execution of refinery-sqoop-whole-mediawiki on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refinery-sqoop-whole-mediawiki https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:29:32] Jan 01 00:02:58 an-launcher1002 kerberos-run-command[32432]: File "/srv/deployment/analytics/refinery/python/refinery/sqoop.py", line 54, in __init__
[11:29:35] Jan 01 00:02:58 an-launcher1002 kerberos-run-command[32432]: self.split_by = queries[table]['split-by']
[11:29:38] Jan 01 00:02:58 an-launcher1002 kerberos-run-command[32432]: KeyError: 'split-by'
[11:33:56] elukey: queries[table].keys() should show you what keys do exist
[11:35:06] RhinosF1: good morning, yep I know but re-running the script only for that might not be feasible, it might be due to some changes that happened before the holidays
[11:35:14] these timers start the first day of the month
[11:35:39] elukey: ah, that's unfortunate.
[11:37:41] RhinosF1: basically we have a list of things to get from the wiki databases, and the config contains some keys, like "split-by", surely we are missing something :)
[11:37:52] it is not a huge deal, it is holiday, we can check it later o
[11:37:53] *on
[11:39:38] elukey: something must have gone missing ye, I've never looked properly at the analytics python code.
[11:39:47] https://gerrit.wikimedia.org/r/c/analytics/refinery/+/647681/2/python/refinery/sqoop.py is missing the split-by, and it happened in dec
[11:41:26] elukey: that's likely it.
[11:41:34] anyway, going to lunch, have a good day!
[11:41:53] * RhinosF1 wonders if you can make pytest fail if things like that are missed
[11:42:35] elukey: after you have lunch of course but I can look at ^
[11:43:19] nah that's fine, I just sent an email to the team, we'll fix the script and re-launch during the next days, nothing urgent! Thanks for the help :)
[11:44:25] elukey: cool, I'm happy to add it to a list of to dos though. It's no bother.
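On the pytest idea from 11:41:53, a minimal sketch, assuming the sqoop config is a dict mapping each table name to a dict of settings, as `queries[table]['split-by']` in the traceback suggests. The helper `find_missing_keys`, the `REQUIRED_KEYS` set, the `query` key and the sample tables are hypothetical and not taken from refinery's sqoop.py; only `split-by` comes from the log.

```python
# Hypothetical set of keys every table config must define.
# Only "split-by" is confirmed by the traceback; "query" is an assumption.
REQUIRED_KEYS = frozenset({"query", "split-by"})


def find_missing_keys(queries, required=REQUIRED_KEYS):
    """Return {table: [missing keys]} for each table config lacking a required key."""
    missing = {}
    for table, config in queries.items():
        absent = sorted(required - set(config))
        if absent:
            missing[table] = absent
    return missing


def test_all_tables_have_required_keys():
    # Sample data for illustration; a real test would load the actual config.
    queries = {
        "revision": {"query": "select ...", "split-by": "rev_id"},
        "page": {"query": "select ...", "split-by": "page_id"},
    }
    assert find_missing_keys(queries) == {}
```

A check like this would fail in CI on the table name and the missing key, rather than with a `KeyError` when the timer fires on the first day of the month.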
[13:51:06] (PS1) Milimetric: Hotfix sqoop handling of split-by [analytics/refinery] - https://gerrit.wikimedia.org/r/653063
[14:14:12] (test is running on 3 small wikis with all the tables, sqooping into my home dir
[14:14:31] if all goes well I'll deploy and reset the failed timer
[14:14:32] )
[14:32:40] works, deploying
[14:37:08] (CR) Milimetric: [V: +2 C: +2] "tested, works" [analytics/refinery] - https://gerrit.wikimedia.org/r/653063 (owner: Milimetric)
[14:54:42] !log deployed refinery hotfix for sqoop problem, after testing on three small wikis
[14:54:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:57:03] currently running the main production job manually, and will reset the timer when it's finished and successful
[15:27:53] miriam: o/
[15:27:57] err milimetric :)
[15:28:24] thanks for the fix! So we don't restart the timer right?
[15:30:52] no, I will check on the manual run later and if it's successful I'll just reset the timer failure
[15:31:35] ah okok, we could have re-run it just to check that it worked fine
[15:32:21] good good, it seems that we are on track, lemme know if I can do anything (will check later on too just in case :)
[15:32:44] I ran the same command the timer runs, it's going in a screen on launcher1002
[15:33:07] no worries Luca, I got it. Happy New Year!
[15:33:33] and thanks for the email :)
[15:33:51] <3 happy ny to you and fam too :)
[17:58:02] Analytics, Operations, ops-eqiad, Patch-For-Review: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T270768 (elukey) My great ignorance in sw-RAID setups forced me to step on a mine, namely T215183. The failed disk is the one containing the grub partition table, since it was not...
[18:45:38] Hi, and happy new year! A volunteer completed the translation for stats.wikimedia.org into Catalan a while ago, is there an ETA for this translation to go live?
[22:41:24] quick update: job is running, I think it takes ~ 24h so I'll check on it tomorrow. It did the big wikis fine, so I think we're ok as far as code-correctness is concerned, just a half day or so behind our normal pace.
[23:02:03] (PS1) Gerrit maintenance bot: Add uz.wikimedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/653209 (https://phabricator.wikimedia.org/T270987)
[23:11:19] (CR) Urbanecm: [C: +1] "LGTM" [analytics/refinery] - https://gerrit.wikimedia.org/r/653209 (https://phabricator.wikimedia.org/T270987) (owner: Gerrit maintenance bot)