[00:13:15] Analytics, Product-Analytics, Editing-team (Tracking): Add MariaDB replicas to Superset - https://phabricator.wikimedia.org/T291195 (ppelberg) >>! In T291195#7396005, @Milimetric wrote: > Looking forward to the meeting. Until then the team's general consensus is that we can't drive this, it just won...
[01:14:48] (PS1) Tim Starling: Update PHP query, adding 8.0, 8.1 and 8.2 [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/730389
[01:15:45] (CR) Tim Starling: "Untested" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/730389 (owner: Tim Starling)
[08:50:29] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) The fourth snapshot has finished loading: ` progress: [/10.64.32.128]0:1996/1996 100% [/10.64.48.65]0:2172/2172 100...
[09:02:22] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Reclaiming the space from this snapshot: ` root@aqs1011:/srv/cassandra-b/tmp# df -h /srv/cassandra-b/ Filesystem...
[09:05:57] Analytics, Analytics-Kanban, Data-Engineering: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (BTullis) Reload operation five of twelve under way now. ` ### Moving table data in keyspace local_group_default_T_pageviews_...
[09:21:55] Analytics, Analytics-Kanban, Patch-For-Review: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (BTullis) That ecs_170 patch has definitely worked to some extent. I have seen my first message in logstash, but not yet as many as I was expecting. {F34686969} I...
[09:30:08] Analytics: Kerberos identity for kcv-wikimf - https://phabricator.wikimedia.org/T293189 (KCVelaga_WMF)
[09:31:07] Analytics: Kerberos identity for kcv-wikimf - https://phabricator.wikimedia.org/T293189 (KCVelaga_WMF)
[10:14:00] Analytics-Data-Quality, WMDE-TechWish, WMDE-Templates-FocusArea, WMDE-TechWish-Sprint-2021-09-29: Check whether VE template dialog and Template Wizard metrics are healthy - https://phabricator.wikimedia.org/T292045 (awight) Open→Resolved
[11:30:02] Analytics, Analytics-Kanban, Patch-For-Review: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (BTullis) I have created a basic dashboard for these JupyterHub user notebook logs in Kibana. [[https://logstash.wikimedia.org/app/dashboards#/view/af2a44f0-2c0e-...
[11:58:58] I've added a new JupyterHub dashboard in Kibana where users' logs can be visualized and searched: https://logstash.wikimedia.org/app/dashboards#/view/af2a44f0-2c0e-11ec-81e9-e1226573bad4
[11:59:20] Hopefully it will be useful. I'll work out how to filter per-user soon.
[12:14:08] Analytics, Analytics-Kanban, Patch-For-Review: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (BTullis) I have updated wikitech here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Viewing_JupyterHub_logs ...with details of the new dashboard...
[13:19:18] Analytics, Data-Engineering: Allow users to differentiate their JupyterHub logs in Logstash - https://phabricator.wikimedia.org/T293243 (BTullis)
[13:38:45] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-MoritzMuehlenhoff: Improve user experience for Kerberos by creating automatic token renewal service - https://phabricator.wikimedia.org/T268985 (BTullis) I have merged that change to increase the maximum renewable lifetime to 14 days....
[13:39:25] Analytics, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (BTullis)
[14:54:54] Analytics, Data-Engineering, Event-Platform, Wikidata, Wikidata-Query-Service: Add MCR slot information to revision-create events - https://phabricator.wikimedia.org/T293195 (Ottomata)
[16:34:05] btullis, ottomata, razzi - I may be late 10 mins, ping me if you need me sooner :)
[16:34:31] elukey: We're at the offsite, so can't make sync. Sorry.
[16:34:51] ahhh ack thanks!
[16:44:59] (PS2) Cicalese: Update PHP and version queries [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/730389 (owner: Tim Starling)
[16:47:14] (CR) Cicalese: "I made the corresponding change to the php_drilldown query and while doing that added the missing newer MediaWiki versions. I also am unab" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/730389 (owner: Tim Starling)
[17:31:15] Analytics-Clusters, SRE, ops-eqiad: analytics1069 mgmt interface intermittently goes up and down - https://phabricator.wikimedia.org/T291732 (BTullis) Open→Resolved
[17:34:27] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (Dzahn)
[17:35:08] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (Dzahn) https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=stat1008&service=Disk+space
[17:54:51] !log removed /tmp/spark-* files belonging to aikochou on stat1008
[17:54:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:56:50] do we not clean /tmp?
[17:56:54] there are large files there from may
[17:58:10] !log deleting files on stat1008 in /tmp older than 10 days and larger than 20M sudo find /tmp -mtime +10 -size +20M | xargs sudo rm -rfv
[17:58:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:58:33] Maybe we could install tmpreaper: https://packages.debian.org/buster/tmpreaper
[17:59:00] Analytics, Analytics-Kanban, SRE: ~1 request/minute to intake-logging.wikimedia.org times out at the traffic/service interface - https://phabricator.wikimedia.org/T264021 (CDanis) Just a quick update that, after a year, this is still happening, on both [[ https://logstash.wikimedia.org/goto/a35e4125b...
[17:59:11] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (Ottomata) Oh just saw this ticket. Have removed a lot of files in tmp. sudo find /tmp -mtime +10 -size +20M | xargs sudo rm -rfv
[17:59:49] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (GoranSMilovanovic) @Dzahn > Most of the space is used by /tmp and there are always users running CPU-intensive processes (R, java, others?) here. I frequently run R jobs from stat1008 (every hour, from crontab)...
[18:01:45] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (Dzahn) @GoranSMilovanovic Moving files out of your home should not matter since they are already on a separate and large partition mounted on /srv. This is just about / and the /tmp inside it. I may have misint...
[18:03:39] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (GoranSMilovanovic) @Dzahn > I may have misinterpreted the CPU usage but looked like an R process was using 100% at that time. Ok, I am getting in touch because I typically use R. I always constrain the number...
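Editor's note on the /tmp cleanup logged at 17:58:10: the command `sudo find /tmp -mtime +10 -size +20M | xargs sudo rm -rfv` deletes as it goes. A minimal Python sketch of a dry run that only lists candidates might look like the following. The function name `find_stale_files` and its parameters are hypothetical and are not part of any tool mentioned in the log. The age and size comparisons are plain thresholds and only approximate the rounding rules `find` applies to `-mtime` and `-size`.

```python
import os
import stat
import time
from pathlib import Path


def find_stale_files(root, min_age_days=10, min_size_bytes=20 * 1024 * 1024, now=None):
    """List regular files under root that are both old and large.

    Hypothetical helper: it never deletes anything, it only reports.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # modification times before this count as old
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets and other special files
            if info.st_mtime < cutoff and info.st_size > min_size_bytes:
                matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Print candidates under /tmp for review before anything is removed.
    for candidate in find_stale_files("/tmp"):
        print(candidate)
```

Reviewing the printed list first makes it easier to spot files that belong to a running job before they are removed.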
[18:03:47] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (Dzahn) Open→Resolved a:Dzahn rescheduled Icinga check: <+icinga-wm> RECOVERY - Disk space on stat1008 is OK: DISK OK
[18:04:57] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (Dzahn) a:Dzahn→Ottomata
[18:39:22] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (BTullis) Maybe we could encourage users to set the configuration item `spark.local.dir` to be `~/tmp/` so that any large files are generated under `/srv/`? {F34687508} @Ottomata - Do you think that this would hel...
[19:25:04] Analytics-Clusters, SRE: stat1008 - low on disk - https://phabricator.wikimedia.org/T293283 (Dzahn) It would definitely help if those tmp files could be under /srv/tmp instead of /tmp. That should fix it if the config change isn't a problem. Nice
[19:29:31] Analytics-Radar, SecTeam-Processed, Security, User-Elukey: Establish a process to periodically review and approve access for hadoop/hue users - https://phabricator.wikimedia.org/T121136 (Mstyles)
[19:49:46] !log re-ran cassandra-daily-coord-local_group_default_T_pageviews_per_article_flat for 2021-10-12 successfully
[19:49:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:47:19] (CR) Jirislav: "I've looked up some of your active CR, just out of curiosity, to get the feel of your workflow, so I hope it's not inappropriate of me aft" [analytics/refinery] - https://gerrit.wikimedia.org/r/673648 (https://phabricator.wikimedia.org/T265765) (owner: Milimetric)
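Editor's note on the `spark.local.dir` suggestion made on T293283 at 18:39:22: one way a user could point Spark scratch space at `~/tmp` is sketched below. The helper names `spark_local_dir_conf` and `as_submit_args` are hypothetical. `spark.local.dir` and the `--conf key=value` form of `spark-submit` are standard Spark features, but whether the setting takes effect on a given cluster is an assumption, since a cluster manager may override it.

```python
from pathlib import Path


def spark_local_dir_conf(home=None):
    """Return a Spark conf mapping that puts scratch files under <home>/tmp.

    Hypothetical helper. It creates the directory if it is missing so that
    Spark has somewhere to write.
    """
    home_dir = Path(home) if home is not None else Path.home()
    local_dir = home_dir / "tmp"
    local_dir.mkdir(parents=True, exist_ok=True)
    return {"spark.local.dir": str(local_dir)}


def as_submit_args(conf):
    """Turn a conf mapping into spark-submit style '--conf key=value' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments that could be appended to a spark-submit command line.
    print(" ".join(as_submit_args(spark_local_dir_conf())))
```

The same mapping could instead be passed to a session builder in a notebook. Either way, the point of the suggestion in the ticket is that large temporary files would land under the user's home on `/srv` rather than in `/tmp` on the root partition.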