[01:45:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[01:49:10] Data-Engineering (Q2 2024 October 1st - December 31th), Data Pipelines, Data-Catalog: Integrate Spark with DataHub with lineage - https://phabricator.wikimedia.org/T306896#10350453 (tchin) I went through all the dags in the analytics instance for a surface-level evaluation on whether we can apply spa...
[03:10:15] RESOLVED: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[04:02:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[04:22:15] RESOLVED: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[06:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[10:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[11:58:49] Data-Engineering, Data-Platform-SRE: Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674 (BTullis) NEW
[11:58:59] Data-Engineering, Data-Platform-SRE: Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674#10350654 (BTullis) p:Triage→Unbreak!
[12:03:39] Data-Engineering, Data-Platform-SRE: Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674#10350659 (BTullis) I have looked into the options for increasing the limits on the number of files....
[12:40:57] Data-Engineering, Data-Platform-SRE: Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674#10350679 (BTullis) This didn't work. I've now removed the `/etc/hadoop/conf/hdfs-default.xml` file...
[12:41:18] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674#10350680 (BTullis)
[14:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[16:55:00] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674#10350876 (BTullis) p:Unbreak!→High I had to bump the Java heap on...
[16:57:34] Data-Engineering, Data-Platform-SRE (2024.11.09 - 2024.11.29): Log aggregation is failing for the analytics user due to too many files in /var/log/hadoop-yarn/apps/analytics/logs on HDFS - https://phabricator.wikimedia.org/T380674#10350881 (BTullis) In the meantime, the number of directory entries in `/...
[18:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent
[22:28:15] FIRING: HdfsCapacityRemainingPercent: Alarmingly low free space on the analytics-hadoop HDFS cluster. - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts#HDFS_Capacity_Remaining - https://grafana.wikimedia.org/d/000000585/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=106&fullscreen - https://alerts.wikimedia.org/?q=alertname%3DHdfsCapacityRemainingPercent