[00:36:33] PROBLEM - Check the last execution of drop-el-unsanitized-events on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit drop-el-unsanitized-events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:30:31] bearloga: the first 12 characters are the device ID. I wouldn't trust the device ID to stay secret, on principle: any service that you use with your yubikey will have it. The yubikey can also generate a static password if you configure it to (on a longer hit of the button).
[07:26:55] RECOVERY - Check the last execution of drop-el-unsanitized-events on an-coord1001 is OK: OK: Status of the systemd unit drop-el-unsanitized-events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:37:38] !log systemctl restart drop-el-unsanitized-events on an-coord1001
[07:37:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:38:06] !log re-run virtualpageviews-druid-daily 09/01/2020 via Hue
[07:38:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:47:39] Analytics, GLOW, User-Elukey: Access to DataGrip refused - https://phabricator.wikimedia.org/T241170 (elukey) Added a note to https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FSystems%2FCluster%2FAccess&type=revision&diff=1850233&oldid=1838749
[08:03:13] good morning team
[08:08:01] bonjour!
[08:08:07] How are you elukey ?
[08:10:26] good :) what about on the other side of the Alps? :)
[08:10:56] Good! Not raining this morning, which makes it an exception for this week :)
[08:11:16] this is definitely a good start then!
[08:55:35] (CR) Joal: "Naming and abstraction discussion, small comments :)" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/562597 (https://phabricator.wikimedia.org/T192474) (owner: Milimetric)
[10:06:31] Analytics, User-Elukey: No queries run in Hue - https://phabricator.wikimedia.org/T242306 (elukey) I can see the following in the an-coord1001's db: ` MariaDB [hive_metastore]> select count(*) from DELEGATION_TOKENS; +----------+ | count(*) | +----------+ | 1927 | +----------+ 1 row in set (0.00 sec...
[10:10:58] ok I found a way to repro in hue, it seems that if there is some hive cache (like the db list etc..) then Hue keeps working
[10:11:12] I refreshed the list of dbs and now I get the error for my user as well
[10:15:20] elukey: I don't understand what the hive cache is
[10:15:51] joal: I think that hue keeps a cache of hive results
[10:15:54] like the db list
[10:16:04] that would make sense
[10:16:57] I agree
[10:17:09] Now how does that break tokens?
[10:18:08] not sure if it is related to tokens or not, it is strange that a restart doesn't force hue to request other tokens
[10:18:25] :(
[10:18:32] hue probably stores them in the db
[10:18:50] IIRC there was a mention of deleting rows in the DB manually (session rows for hue IIRC)
[10:19:31] I tried to check that table but it is not clear how to drop data..
[10:19:47] hm
[10:20:03] MAYBE dropping all the rows would force a flush, but who knows..
[10:20:10] my token expires on the 14th afaics
[10:20:25] so I guess that Hue will keep doing this dance for the next days
[10:20:38] pfffff
[10:22:45] elukey: I don't understand if the problem is related to restarts or not
[10:24:59] ah!
[10:25:11] there is a way to close a hive session
[10:25:15] it works now
[10:25:19] (in hue's ui I mean)
[10:25:22] ?
[10:26:01] but in Hue 4
[10:26:05] can't see it in Hue 3
[10:26:36] I think I found it!!
[10:26:42] when you access the hive editor via hue 4, at the top right there is a 3-dots menu
[10:26:50] in which you can delete/recreate
[10:26:50] yup
[10:26:54] yessir
[10:27:15] now I have no idea if it will keep working or not
[10:27:31] do you see it with hue 3?
[10:28:22] ah yes I can see it
[10:28:27] yup
[10:28:27] it is in a different place
[10:28:33] Should work the same
[10:28:45] I'll take a screenshot and add it to the task
[10:28:54] \o/
[10:33:16] Analytics, User-Elukey: No queries run in Hue - https://phabricator.wikimedia.org/T242306 (elukey) @MMiller_WMF I found a way in Hue's UI to force the re-creation of the Hive session, that seemed to work for me (I was able to repro): {F31507268} You'd need to find the 3 dots at the top-right corner of...
[10:34:18] ok so there seems to be a workaround
[10:54:56] I created https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/563414/ for oozie and spark, joal
[11:11:50] I tried a pyspark2-shell spark.sql query and eventually it succeeds
[11:15:43] I am wondering if it is a hadoop test oozie job that somehow causes the cluster to be full
[11:15:52] and it affects when we test
[11:16:05] because now the test cluster is free, and I can't repro anymore
[11:16:46] ah no I can
[11:16:54] but I had to re-create a spark session
[11:18:09] what do you mean by re-create a spark session?
[11:18:25] it still fails for me elukey
[11:18:37] log off and launch pyspark2-shell again
[11:18:44] ah ok
[11:18:54] eventually it worked, and then re-issuing queries was fine
[11:19:42] elukey: It seems that if you leave spark alone for a long enough time, it manages to connect, and runs correctly
[11:21:14] joal: yes, and also interestingly it keeps working
[11:21:32] elukey: after a day for instance, without token renewal?
[11:21:55] joal: no no, I mean that subsequent queries work
[11:22:00] in the same shell
[11:22:01] yes yes
[12:06:18] * elukey lunch!
[12:48:29] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Create and use new schema repositories - https://phabricator.wikimedia.org/T240985 (akosiaris) I've gone ahead and added some basic canary support in the mathoid chart (since this is the one I feel the most famili...
[13:14:11] hi elukey - would you be back by any chance?
[13:25:25] joal: I am now :)
[13:25:34] Hellooooo :)
[13:25:50] elukey: Would you have some time to test a few command lines of hdfs-rsync?
[13:26:08] elukey: dry-run only obviously :)
[13:26:28] elukey: I'm asking cause the thing would be to run them on labstore1003 for instance (or another labstore)
[13:27:41] * joal hopes that diverting elukey from fixing stuff can be a nice idea on a friday afternoon ...
[13:32:34] joal: yes sure!
[13:32:45] Yay :)
[13:33:16] elukey: will we be testing on labstore1003?
[13:34:38] if so elukey, I have devised those commands: https://gist.github.com/jobar/8585610849c0c86f61067057ba821e4a
[13:34:53] elukey: if ok for you, we could batcave when you want, so that I also see the results?
[13:38:59] joal: sure, joining bc now
[13:39:51] sorry elukey - signing in
[14:30:32] !log restart oozie with new settings to instruct it to pick up spark-defaults.conf settings from /etc/spark2/conf
[14:30:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:32:51] heya teammm
[14:33:02] o/
[14:36:15] this google login everyday thing is really annoying!
[14:36:24] i have 4 different apps I log in with!
[14:36:43] it is very annoying
[14:37:48] +10
[14:38:24] all right, oozie on an-coord1001 has picked up the new spark-defaults config
[14:40:25] (CR) Ottomata: Encode to pagecounts-ez format with a UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/562597 (https://phabricator.wikimedia.org/T192474) (owner: Milimetric)
[14:42:02] Analytics, Analytics-Kanban, Patch-For-Review: Enable encryption in Spark 2.4 by default - https://phabricator.wikimedia.org/T240934 (elukey) Just restarted oozie, it picked up the new config so in theory we'll not need to change any workflows when enabling spark encryption.
[14:49:12] elukey: it was a BUG!
[14:50:10] !!
[14:52:46] elukey: and finding this allowed me to realize that my tests were somehow not well set up
[14:57:18] it is a complex tool!
[15:01:07] elukey: generating files is so fast that they have the same timestamp!!!!
[15:01:18] Bad Joseph
[15:01:27] in test I mean
[15:01:35] Ok, gone for kids, back for standup
[15:27:07] (PS2) Mforns: Add anomaly detection to data quality stats workflow [analytics/refinery] - https://gerrit.wikimedia.org/r/563200 (https://phabricator.wikimedia.org/T235486)
[15:28:44] (CR) Mforns: [V: +2] "Tested that it works by launching the bundle on test databases." [analytics/refinery] - https://gerrit.wikimedia.org/r/563200 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[15:29:16] (CR) Mforns: [V: +2] "Tested that it works by running the oozie bundle and vetting results." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[15:49:31] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Create and use new schema repositories - https://phabricator.wikimedia.org/T240985 (Ottomata) Some other ideas about how to name and use canary labels: https://etherpad.wikimedia.org/p/k8s-canary
[16:25:30] Analytics: Spike to plan what is needed for newpyter project - https://phabricator.wikimedia.org/T241186 (Ottomata)
[16:25:53] Analytics: Spike to plan what is needed for newpyter project - https://phabricator.wikimedia.org/T241186 (Ottomata)
[16:25:55] Analytics: Newpyter - First Class Jupyter Notebook system - https://phabricator.wikimedia.org/T224658 (Ottomata)
[16:29:36] elukey: just had a thought about hadoop hardware and leases and upgrade, etc. we also have those 5 presto nodes that can be new hadoop hw. if we decide to do a new cluster for the upgrade, those could be used for the new cluster
[16:29:42] and we could colocate presto like we have talked about too
[16:31:23] ottomata: hello
[16:31:24] :)
[16:31:28] yes it could be an idea!
[16:48:27] oh ebernhardson, i remember now, i had remembered it the wrong way around:
[16:48:32] only python 3.7 is on buster
[16:48:38] but we have python3.7 also on stretch
[16:49:00] so if you use a 3.7 venv
[16:49:04] for dependencies, etc.
[16:49:06] everything should just work
[16:52:39] ottomata: hmm, i'll try a 3.7 build and see how it goes
[17:01:49] ottomata: standup?
[17:01:53] elukey: standup?
[17:03:16] OH coming sorry
[17:22:10] ottomata: https://github.com/wikimedia/hdfs-tools/pull/6
[17:22:13] meh
[17:41:01] lol ottomata
[17:42:31] hehe
[17:43:34] merged joal
[17:44:50] ottomata: fdans: https://youtu.be/EWzt1bI8AZ0
[17:44:56] oh wait, lemme link you direct
[17:45:04] https://youtu.be/EWzt1bI8AZ0?t=74
[17:48:24] much shorter one: https://www.youtube.com/watch?v=GCtTtKKAhyE
[17:57:10] Thanks ottomata - I'll deploy on monday :)
[17:59:46] Analytics, User-Elukey: No queries run in Hue - https://phabricator.wikimedia.org/T242306 (MMiller_WMF) @elukey -- clicking "recreate" makes it work! Thank you.
[18:03:09] Analytics, Analytics-SWAP, Product-Analytics: Provide Python 3.6+ on SWAP - https://phabricator.wikimedia.org/T212591 (Neil_P._Quinn_WMF) It looks like Pandas 1.0, which should be released in the next month or so, [will require Python 3.6.1 or higher](https://pandas.pydata.org/pandas-docs/version/1.0...
[18:16:10] Analytics, User-Elukey: No queries run in Hue - https://phabricator.wikimedia.org/T242306 (elukey) >>! In T242306#5793182, @MMiller_WMF wrote: > @elukey -- clicking "recreate" makes it work! Thank you. Great, please report in this task if the problem re-occurs! I hope it will not, but not sure..
[18:34:36] Analytics, Analytics-Kanban, Patch-For-Review: Enable encryption in Spark 2.4 by default - https://phabricator.wikimedia.org/T240934 (elukey) All the symptoms of the last issue to solve are highlighted in https://issues.apache.org/jira/browse/SPARK-19528 The application master in yarn seems to be cr...
[18:34:44] ok declaring myself defeated by Spark (again)
[18:34:48] going afk, o/
[18:34:51] have a good weened!
[18:34:54] *weekend!
[18:36:27] aww maaan
[18:36:30] bye luca!
[19:08:06] Analytics, GLOW: Particular MariaDB queries not working on SWAP - https://phabricator.wikimedia.org/T242448 (Iflorez)
[19:25:20] Analytics, Analytics-Wikistats: Wikistats Bug - Can't find stats for number of "Very Active" editors - https://phabricator.wikimedia.org/T242451 (Clayoquot)
[19:59:14] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Services (watching): Add examples to all event schemas - https://phabricator.wikimedia.org/T242454 (Ottomata)
[20:05:13] random comment: the latest gnome is interesting, you need to install an extension to get "permanent" notifications... in the default setup the notifications disappear ...
[21:12:40] nuria: question about the request for images
[21:14:07] Analytics, GLOW: Particular MariaDB queries not working on SWAP - https://phabricator.wikimedia.org/T242448 (Ottomata) A weird error, something is wrong when using charset=binary to convert whatever result mysql gives back into python: ` ~/venv/lib/python3.5/site-packages/mysql/connector/conversio...
[21:18:27] nuria: I'm not sure if I should count for all-history or last-revision-per-page (1 count per page with the last known revision for the dump)
[21:46:20] joal: i think last revision is plenty, no?
[21:46:34] nuria: can we batcave around this?
[21:46:42] ya
[21:50:38] joal, nuria, when you're done, can we review the results of the anomalies? to see if you think that is deployable (as a first step)?
[21:50:54] I think it will take like 10 mins
[21:54:34] Analytics: Newpyter - First Class Jupyter Notebook system - https://phabricator.wikimedia.org/T224658 (Ottomata) @elukey FYI I've started taking notes and adding questions at https://etherpad.wikimedia.org/p/newpyter.
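A side note on the charset=binary error quoted from T242448 above: one way to sidestep the connector's built-in result conversion (which is where that truncated traceback points) is to fetch raw bytes and decode them explicitly in Python. This is only a hypothetical sketch, not the fix recorded in the task; the host, credentials and query below are placeholders.

```python
# Hypothetical workaround sketch for the charset=binary conversion error:
# fetch raw results (bypassing mysql.connector's converters) and decode
# the bytes ourselves. Host, credentials and query are placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="analytics-replica.example",  # placeholder host
    database="enwiki",
    user="research",
    password="********",
    use_pure=True,  # pure-Python implementation of the connector
)

cur = conn.cursor(raw=True)  # raw cursor: values come back as bytearray, unconverted
cur.execute("SELECT page_title FROM page LIMIT 5")

def to_text(value):
    """Decode bytes/bytearray cells; pass other types through unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8", errors="replace")
    return value

rows = [tuple(to_text(v) for v in row) for row in cur.fetchall()]
print(rows)

cur.close()
conn.close()
```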
[21:58:53] mforns: since it is a bit late for joal, let's do it mon?
[21:58:59] nuria, sure :]
[22:02:04] Thanks mforns - my brain is not in a very good state that late ;)
[22:02:17] no no, didn't even notice it was so late
[22:02:20] sorry
[22:02:26] have a great weekend!!
[22:02:31] see you mforns :)
[22:14:53] nuria: it's actually more complicated than we thought :S
[22:36:05] joal: yesss?
[22:36:36] nuria: images can be stored on commons, or not, and be linked to through File: or Image:
[22:37:10] joal: then, let's simplify and get all images under File or Image
[22:37:12] example: https://wikitech.wikimedia.org/wiki/File:Editswikipedia200220182.png - has a right tab: 'Export to Wikimedia Commons'
[22:37:22] therefore not on commons
[22:37:53] https://en.wikipedia.org/wiki/File:Pi-unrolled-720.gif has a right tab 'View on commons' --> therefore on commons
[22:38:04] and we have no easy way to distinguish them at parsing time
[22:38:31] I'm currently building a query to join to our page-history on the commons project
[22:43:07] nuria: it seems to work :)
[22:44:51] nuria: by far the most used image on arwiki: https://commons.wikimedia.org/wiki/File:Insert-signature.png
[22:45:20] 3/4 of all commons image usage )
[22:46:15] joal: man....
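On the commons-vs-local distinction discussed at the end: the same [[File:...]] or [[Image:...]] link can point either to a local upload or to a Commons file, and only a join against Commons pages tells them apart, which is the join joal mentions building. A rough PySpark sketch of that idea follows. It is not his actual query; the data lake table and column names (wmf.mediawiki_wikitext_current, wmf.mediawiki_page_history, snapshot, page_namespace) and the snapshot/wiki values are assumptions for illustration only.

```python
# Rough sketch: extract image links from the last known revision of each page
# (one count per page, as discussed above), then keep only images whose File
# page exists on Commons (File namespace = 6). Table/column names and snapshot
# values are assumptions, not verified schemas.
import re
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.appName("image-usage-sketch").getOrCreate()

IMAGE_RE = re.compile(r"\[\[(?:File|Image):([^|\]]+)", re.IGNORECASE)

@F.udf(T.ArrayType(T.StringType()))
def extract_images(text):
    # Pull the target of every [[File:...]] / [[Image:...]] link out of wikitext,
    # normalizing spaces to underscores to match page titles.
    return [m.strip().replace(" ", "_") for m in IMAGE_RE.findall(text or "")]

wikitext = spark.table("wmf.mediawiki_wikitext_current") \
    .where("snapshot = '2019-12' AND wiki_db = 'arwiki'")

images = wikitext.select(
    "page_id",
    F.explode(extract_images("revision_text")).alias("image_title"),
).distinct()

# File pages on Commons; an image that joins here is hosted on Commons,
# anything else is a local upload.
commons_files = spark.table("wmf.mediawiki_page_history") \
    .where("snapshot = '2019-12' AND wiki_db = 'commonswiki' AND page_namespace = 6") \
    .select(F.col("page_title").alias("image_title")).distinct()

usage = (images.join(commons_files, "image_title", "left_semi")
               .groupBy("image_title").count()
               .orderBy(F.desc("count")))

usage.show(20, truncate=False)
```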