[01:25:12] RECOVERY - Check the last execution of monitor_refine_event on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:54:43] PROBLEM - Check the last execution of monitor_refine_event_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:47:49] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:55:42] Does anyone here have experience using a hardware 2FA key for stat machine Kerberos?
[07:56:06] Maybe that's redundant once I have my Wikimedia production SSH key on the hardware dongle...
[07:58:30] hi awight! What do you mean 2fa with kerberos?
[07:59:07] what I have been using so far is a yubikey to ssh to stat100x hosts, then in there one needs to kinit etc..
[07:59:21] I don't think it is possible to automate the step via a dongle
[07:59:36] (not sure if I got your use case correctly)
[08:36:35] elukey: Exactly, it sounds like we're on the same page. The passphrase doesn't seem to add much security if any--I type it into a web browser and shell repeatedly, it lives in an encrypted file somewhere... If my laptop is compromised, then the passphrase is as well. If there were a Kerberos method such as Yubi OTP (or any of the many other protocols), it seems like it would add a proper
[08:36:42] layer of security.
[08:39:56] with compromised you mean if somebody runs a keylogger on your laptop?
[08:50:15] elukey: exactly--a highly paranoid scenario but it breaks laptop-only 2FA
[08:50:45] ah yes yes :)
[08:52:07] Still, once I have hardware 2FA for ssh I guess it's not an issue any more? Maybe I'm missing the justification for Kerberos passwords, I'll read through the original tasks perhaps...
[08:53:53] krb passwords are needed as an extra step, for example if a laptop is stolen (and the ssh keys are not protected with a password etc..) you'll need to prove your identity on kinit with another password (which, if not written on a post-it on the same laptop, should shield our data)
[08:56:42] Cool, I can see that giving some small margin of protection in that scenario. Stolen laptop is one of the least likely attacks but does happen.
[08:58:29] there are keytabs that one can use for recurrent jobs etc.. that don't need a password, so in theory somebody could bypass this step as well
[08:59:19] the alternative setting would be that every user had a kerberos keytab for the user/host combination (say all stat boxes), that could be used automatically for jobs etc..
[08:59:28] but the automation work to do it is a lot :)
[09:06:28] Hardware 2FA is no help if it's sticking out of the stolen laptop, anyway ;-)
[09:07:04] Thanks for talking me through this, I think my paranoia attack is winding down now!
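For readers following the kinit/keytab thread above: a minimal sketch of the two authentication paths being discussed, assuming the standard MIT Kerberos client tools; the keytab path, principal, and realm below are illustrative placeholders, not values from this log.

    # Interactive use on a stat100x host: obtain a ticket with the Kerberos password,
    # then Hadoop/Hive/Spark commands work until the ticket expires.
    kinit            # prompts for the Kerberos password
    klist            # show the ticket cache and its expiry

    # Recurrent jobs skip the password by authenticating with a keytab instead
    # (this is the "bypass" elukey mentions); all values below are placeholders.
    kinit -kt /path/to/user.keytab user/host.example.org@EXAMPLE.REALM

The trade-off in the conversation is exactly this: the password prompt on kinit adds an extra factor for humans, while keytabs remove it for automation.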
[09:07:26] I wish more people were affected by this paranoia about security :)
[09:07:55] * awight credits Amir1 with educating a few of us recently
[09:10:27] * awight adjusts tin foil hat to look just like his
[09:48:27] all morning spent debugging hue and python3
[09:48:29] lovely
[10:07:02] looking at the resource_purge event refine failures
[10:12:38] it might be something that Andrew is working on, that null pointer exception is weird
[10:18:23] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) After the last change, I have tested (from an-tool1006): - pyspark and spark shells - refine and druid indexation timers (all spark...
[10:38:25] Analytics-Clusters, Discovery, Discovery-Search: mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (elukey) Adding also another note. The configuration of the msearch response topic is the following: ` Topic:mjolnir...
[10:57:54] * elukey lunch!
[12:49:45] Analytics-Clusters, Operations, vm-requests: Create 4 new VMs to replace schema[12 - https://phabricator.wikimedia.org/T260347 (elukey)
[12:51:13] Analytics-Clusters, Operations, vm-requests: Create 4 new VMs to replace schema[12]00[12] - https://phabricator.wikimedia.org/T260347 (elukey) p:Triage→Medium a:elukey
[12:54:07] Analytics-Clusters, Operations, vm-requests: Create 4 new VMs to replace schema[12]00[12] - https://phabricator.wikimedia.org/T260347 (elukey)
[12:54:11] Analytics-Clusters: Upgrade schema[12]00[12] to Debian Buster - https://phabricator.wikimedia.org/T255026 (elukey)
[13:00:33] fdans: am here, checking email, lemme know if you found any resource_purge stuff.....hmmmmm
[13:00:42] oh i betcha we just started refining it for the first time yesterday
[13:00:52] my changes to camus probably picked it up
[13:00:55] and it started being refined
[13:01:00] Analytics-Clusters, Operations, vm-requests: Create 4 new VMs to replace schema[12]00[12] - https://phabricator.wikimedia.org/T260347 (elukey) As described in T255026#6276301: >>! In T255026#6276301, @MoritzMuehlenhoff wrote: > When you do that, please use row B/D in eqiad and row C/D in codfw to be...
[13:03:40] ottomata: morninggg
[13:04:03] ottomata: when you have a moment, I am working on https://phabricator.wikimedia.org/T260347
[13:04:18] the idea is to create the new buster vms, then we'll switch
[13:04:22] elukey: hello sounds good
[13:04:23] lemme know if it is ok
[13:04:25] super :)
[13:06:06] for sure
[13:06:08] will be very easy too
[13:06:16] can add the VMs, add them to the LVS pool
[13:06:22] then depool the old ones and decomm
[13:07:33] exactly yes
[13:07:50] created https://gerrit.wikimedia.org/r/c/operations/dns/+/619998
[13:13:31] (PS1) Awight: [WIP] Explore mystery conflicts [analytics/wmde/TW/edit-conflicts] - https://gerrit.wikimedia.org/r/620000 (https://phabricator.wikimedia.org/T246439)
[13:19:51] fdans: FYI resource_purge is failing because the events in codfw are created by restbase/changeprop, producing directly to kafka, but the code isn't properly setting the $schema URI
[13:19:56] so refine can't find the schema
[13:20:09] ohhhhh
[13:26:26] petr is going to fix
[13:40:07] a-team it's train time
[13:40:22] super
[13:41:09] nice
[13:42:52] ottomata: do you have anything pending deployment? i only have two items by elukey and nuria
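On the resource_purge refine failures mentioned at 13:19: Refine resolves each event's schema from the $schema URI carried inside the event itself, so events produced without that field cannot be refined. A quick check one could run (a sketch assuming kafkacat and jq are available; the broker address and the <dc>.<stream> topic name below are illustrative, not taken from the log):

    # Consume one event from the codfw topic and inspect its $schema field.
    kafkacat -C -b KAFKA_BROKER:9092 -t codfw.resource_purge -c 1 -e | jq '."$schema"'
    # null here means the producer (restbase/changeprop in this case) is not setting
    # the $schema URI, so Refine cannot look up the JSONSchema for the event.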
[13:43:40] fdans: there is a refine change I made but it is irrelevant unless we fix that bug i was talking about
[13:43:43] it is safe to go out though
[13:43:48] nothing otherwise
[13:44:06] cool!
[13:49:53] (PS1) Fdans: Update changelog.md for 0.0.133 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/620013
[13:51:14] (CR) Fdans: [V: +2 C: +2] Update changelog.md for 0.0.133 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/620013 (owner: Fdans)
[13:52:46] Starting build #56 for job analytics-refinery-maven-release-docker
[14:02:00] Project analytics-refinery-maven-release-docker build #56: SUCCESS in 9 min 13 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/56/
[14:02:45] very nice, 9 minutes to build refinery
[14:03:11] I think that we could get even better with the patch for nginx+sendfile, will try to get it done
[14:06:43] ottomata: when you have a moment https://gerrit.wikimedia.org/r/620016
[14:09:58] anyway seems good to merge, I'll have to do all the bootstrap vm so in case something is not ok I'll fix it
[14:13:33] elukey: this new thing is freakin awesome
[14:13:58] !log updating refinery source symlinks
[14:13:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:14:09] fdans: what new thing?? :D
[14:14:49] elukey: the jar generation job is less confusing no?
[14:15:18] fdans: ah not sure about that part, I think that joseph did the magic in changing the repos etc..
[14:15:27] didn't know about other changes
[14:15:27] Starting build #23 for job analytics-refinery-update-jars-docker
[14:15:38] ahhhh sorry you mean the CI stuff
[14:15:44] (PS1) Maven-release-user: Add refinery-source jars for v0.0.133 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/620018
[14:15:44] Project analytics-refinery-update-jars-docker build #23: SUCCESS in 17 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/23/
[14:15:52] Antoine and Joseph worked on it yes, really near
[14:15:53] *neat
[14:42:58] (CR) Fdans: [V: +2 C: +2] Add refinery-source jars for v0.0.133 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/620018 (owner: Maven-release-user)
[14:44:30] !log deploying refinery
[14:44:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:44:48] (PS1) Awight: [WIP] Refresh "new article" conflicts for 2020 [analytics/wmde/TW/edit-conflicts] - https://gerrit.wikimedia.org/r/620048 (https://phabricator.wikimedia.org/T246439)
[14:49:23] ottomata: schema1003 is ready to test whenever you want
[14:49:33] I am completing the other ones now
[14:50:08] nice
[14:50:58] looks fine elukey
[14:52:34] super
[15:02:55] fdans, ottomata - standuuppp
[15:03:12] OHH
[15:04:42] Analytics-Clusters, Operations, vm-requests: Create 4 new VMs to replace schema[12]00[12] - https://phabricator.wikimedia.org/T260347 (elukey) ` elukey@cumin1001:~$ sudo cookbook sre.ganeti.makevm eqiad_B schema1003.eqiad.wmnet --vcpus 2 --memory 2 --disk 10 START - Cookbook sre.ganeti.makevm /usr/li...
[15:04:59] Analytics-Clusters: Upgrade schema[12]00[12] to Debian Buster - https://phabricator.wikimedia.org/T255026 (elukey)
[15:05:01] Analytics-Clusters, Operations, vm-requests: Create 4 new VMs to replace schema[12]00[12] - https://phabricator.wikimedia.org/T260347 (elukey) Open→Resolved All vms created and bootstrapped!
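For context on the two CI jobs above: analytics-refinery-maven-release-docker builds and releases refinery-source to Archiva, and analytics-refinery-update-jars-docker then commits the new jar symlinks to analytics/refinery. A rough sketch of how one might verify and ship the result, assuming the usual artifacts/ layout in the refinery repo and a scap-based deploy; paths and the deploy invocation are illustrative, not copied from the log:

    # In a checkout of analytics/refinery, confirm the automated commit landed.
    git log --oneline -3 -- artifacts/
    find artifacts/ -name "*0.0.133*"    # symlinks for the newly released jars

    # Deploying refinery to the cluster is then done with scap from the deployment
    # host (the message shown is a placeholder).
    scap deploy "Analytics weekly train: refinery with refinery-source v0.0.133"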
[15:08:30] Analytics-Clusters: Upgrade schema[12]00[12] to Debian Buster - https://phabricator.wikimedia.org/T255026 (elukey) Vms created, next steps: [] check that vms are working fine etc.. (I checked that specs are correct but @Ottomata should verify that all the envoy services are good on all VMs just to be sure)...
[15:15:25] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:17:02] fdans: please get back for post standups
[15:17:11] hehe
[16:13:13] !log restarting webrequest bundle
[16:13:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:30:29] fdans: I see two webrequest_load bundles in hue, expected?
[16:33:21] anyway, I was checking if my change caused damages, looks like it is working fine
[16:33:31] (didn't expect much troubles but better safe than sorry :)
[16:41:09] mediarequest failed, but it seems due to refinery-hive
[16:41:38] 0.98 is surely not in archiva anymore
[16:44:53] yep we have 115+ https://archiva.wikimedia.org/#artifact~releases/org.wikimedia.analytics.refinery.hive/refinery-hive
[16:45:00] but I thought we had updated all of those
[16:46:27] (PS1) Elukey: oozie: fix refinery-hive version for mediarequest coordinator [analytics/refinery] - https://gerrit.wikimedia.org/r/620070
[16:48:16] (PS2) Elukey: oozie: fix refinery-hive version for mediarequest coordinator [analytics/refinery] - https://gerrit.wikimedia.org/r/620070
[16:48:25] using 0.0.115 since it is widely used in refinery, safer
[16:51:29] elukey: oh i totally didn't bump up the versions
[16:53:44] Analytics, Analytics-EventLogging, Event-Platform: Integrate Wikimedia Event Utilities with discovery-parent-pom - https://phabricator.wikimedia.org/T260375 (Gehel)
[16:54:31] fdans: there are also two webrequest-load upload coords active right now, soon the next hour will kick off
[16:54:46] (and two concurrent jobs will run etc..)
[16:54:51] on it
[16:55:18] all right, stepping away for a bit! ttl
[16:55:19] :)
[16:58:10] elukey: I got this, bumping and restarting, good night!
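On the mediarequest failure above: refinery oozie coordinators pin a refinery-hive jar version through a job property, so when the pinned version is no longer in Archiva the fix is to point the coordinator at a version that still exists (0.0.115 here) and restart it. A sketch of that kind of restart, assuming the Oozie CLI; the properties path and the property name refinery_hive_jar_version are illustrative placeholders:

    # Kill the failed coordinator, then resubmit it with the bumped jar version.
    oozie job -oozie $OOZIE_URL -kill <coordinator-id>
    oozie job -oozie $OOZIE_URL \
        -config path/to/mediarequest/coordinator.properties \
        -Drefinery_hive_jar_version=0.0.115 \
        -run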
[17:01:41] (PS1) Fdans: Bump up jar versions of mediarequest and webreques bundle [analytics/refinery] - https://gerrit.wikimedia.org/r/620073
[17:35:14] (CR) Elukey: Bump up jar versions of mediarequest and webreques bundle (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/620073 (owner: Fdans)
[17:35:52] (Abandoned) Elukey: oozie: fix refinery-hive version for mediarequest coordinator [analytics/refinery] - https://gerrit.wikimedia.org/r/620070 (owner: Elukey)
[17:36:07] just added a couple of comments, feel free to ignore them and proceed :)
[18:03:57] elukey: see notes on etherpad
[18:03:57] https://etherpad.wikimedia.org/p/analytics-weekly-train
[18:04:17] ohhhhh damn I see
[18:04:50] there's no need for a bump, it's just that the mediarequests failed because archiva didn't have that jar
[18:12:19] (PS2) Fdans: Bump up jar versions of mediarequest and webreques bundle [analytics/refinery] - https://gerrit.wikimedia.org/r/620073
[18:12:44] (CR) Fdans: [V: +2 C: +2] "both comments addressed, self merging" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/620073 (owner: Fdans)
[18:13:03] (PS3) Fdans: Bump up jar versions of mediarequest and webreques bundle [analytics/refinery] - https://gerrit.wikimedia.org/r/620073
[18:13:10] (CR) Fdans: [V: +2 C: +2] Bump up jar versions of mediarequest and webreques bundle [analytics/refinery] - https://gerrit.wikimedia.org/r/620073 (owner: Fdans)
[18:14:43] fdans: yes exactly
[18:16:28] fdans: so IIUC the jar bump is for puppet refine right?
[18:16:53] elukey: yep, I'm pushing the patch right now
[18:28:31] (need to go out for dinner, will check later if needed :)
[19:04:29] Analytics-Clusters, Discovery, Discovery-Search (Current work): mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (CBogen)
[19:13:25] (PS2) Mforns: Add editors by country data to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/593660 (https://phabricator.wikimedia.org/T238365) (owner: Lex Nasser)
[19:14:04] (CR) Mforns: [C: -2] "OK, this passes unit tests, but still need to test with real data!" [analytics/aqs] - https://gerrit.wikimedia.org/r/593660 (https://phabricator.wikimedia.org/T238365) (owner: Lex Nasser)
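On "archiva didn't have that jar": whether a given refinery-source artifact is still downloadable can be checked directly against Archiva, since it serves a standard Maven repository layout (the releases repository path below is an assumption based on that layout, not a URL taken from the log):

    # HTTP 200 means the jar is there for Oozie/Refine to fetch; a 404 reproduces the
    # failure the mediarequest coordinator hit with the old refinery-hive version.
    curl -sI https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/hive/refinery-hive/0.0.115/refinery-hive-0.0.115.jar | head -n 1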
[19:15:17] Analytics, Analytics-EventLogging, Discovery-Search, Event-Platform: Integrate Wikimedia Event Utilities with discovery-parent-pom - https://phabricator.wikimedia.org/T260375 (Gehel)
[19:19:12] Analytics, Analytics-EventLogging, Discovery-Search, Event-Platform: Integrate Wikimedia Event Utilities with discovery-parent-pom - https://phabricator.wikimedia.org/T260375 (Gehel)
[19:19:25] Analytics, Analytics-EventLogging, Event-Platform, Discovery-Search (Current work): Integrate Wikimedia Event Utilities with discovery-parent-pom - https://phabricator.wikimedia.org/T260375 (CBogen)
[19:25:52] Analytics: Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386 (Ottomata)
[19:32:35] Analytics, CAS-SSO: Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386 (Peachey88)
[19:52:41] Analytics-Clusters, Discovery, Discovery-Search (Current work): mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (EBernhardson) Other bits i forgot: repository: search/MjoLniR deploy repository: search/MjoLniR-depl...
[20:12:26] Analytics, Data-Services, cloud-services-team (Kanban): Rethink Cloud DB replicas - https://phabricator.wikimedia.org/T215858 (Bstorm)
[21:42:27] Analytics-Radar, Dumps-Generation, Okapi, Platform Engineering: HTML Dumps - June/2020 - https://phabricator.wikimedia.org/T254275 (RBrounley_WMF)
[22:04:51] Analytics, Analytics-Kanban, Patch-For-Review: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (marcmiquel) Sorry to bother, but I am using the dumps and I see the same problem on gnwiki. This is a really small wikipedia which is all in one file. The rows...