[00:16:14] Analytics, Product-Analytics: Metrics request on portal namespace usage - https://phabricator.wikimedia.org/T205681 (AfroThundr3007730) Thanks for looking into this. We're still in the process of discussing the construction of the RfC and the criteria for the guidelines, so we have time. We want to be th...
[00:17:43] Analytics, Product-Analytics: Metrics request on portal namespace usage - https://phabricator.wikimedia.org/T205681 (AfroThundr3007730)
[00:23:42] Analytics, Product-Analytics: Metrics request on portal namespace usage - https://phabricator.wikimedia.org/T205681 (AfroThundr3007730)
[06:00:47] Analytics-Kanban, Operations, SRE-Access-Requests, Patch-For-Review: Add Tilman to analytics-admins - https://phabricator.wikimedia.org/T178802 (Tbayer) >>! In T178802#4641742, @Ottomata wrote: > @HaeB do you still need this? Can we roll this back? Yes, until the end of January it looks like (se...
[08:17:24] Analytics, Product-Analytics: Metrics request on portal namespace usage - https://phabricator.wikimedia.org/T205681 (Pbsouthwood) I agree with AfroThundr3007730 on the probable usefulness and scope of this request. Cheers, P
[08:21:34] (CR) Elukey: Add python script importing xml dumps onto hdfs (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/456654 (https://phabricator.wikimedia.org/T202489) (owner: Joal)
[08:24:40] (PS13) Joal: Add python script importing xml dumps onto hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/456654 (https://phabricator.wikimedia.org/T202489)
[08:24:56] (CR) Joal: Add python script importing xml dumps onto hdfs (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/456654 (https://phabricator.wikimedia.org/T202489) (owner: Joal)
[08:25:02] Thanks elukey for the review :)
[09:39:01] np! Didn't do a complete one but I can do it on Monday if you want (even if Andrew already +1ed it)
[09:44:09] interested elukey :) Thanks !
[11:53:30] (PS1) Framawiki: base.html: Add link to sql-optimizer tool [analytics/quarry/web] - https://gerrit.wikimedia.org/r/464947
[12:57:56] PROBLEM - Webrequests Varnishkafka log producer on cp1076 is CRITICAL: Return code of 255 is out of bounds
[13:18:36] PROBLEM - Webrequests Varnishkafka log producer on cp1076 is CRITICAL: Return code of 255 is out of bounds
[13:22:56] RECOVERY - Webrequests Varnishkafka log producer on cp1076 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf
[14:16:02] cp1076 had troubles with disk --^
[15:48:18] Is https://yarn.wikimedia.org/cluster/scheduler a broken link for anyone else?
[16:18:06] (CR) Zhuyifei1999: [C: 1] base.html: Add link to sql-optimizer tool [analytics/quarry/web] - https://gerrit.wikimedia.org/r/464947 (owner: Framawiki)
[16:24:49] (CR) Framawiki: [C: 2] base.html: Add link to sql-optimizer tool [analytics/quarry/web] - https://gerrit.wikimedia.org/r/464947 (owner: Framawiki)
[16:26:08] (Merged) jenkins-bot: base.html: Add link to sql-optimizer tool [analytics/quarry/web] - https://gerrit.wikimedia.org/r/464947 (owner: Framawiki)
[17:58:34] broken for me too (redirects to http://an-master1002.eqiad.wmnet:8088/cluster/scheduler )
[18:07:10] for some reason an-master1001 is not the yarn master :(
[18:07:11] 2018-10-06 05:26:46,513 WARN org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: State-store fenced ! Transitioning RM to standby
[18:10:07] !log restart Yarn Resource Manager on an-master1002 to force an-master1001 to take the active role back (failed over due to a zk conn issue)
[18:10:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:10:45] HaeB, groceryheist - thanks a lot for the ping! For some reason the an-master1001 was not the master, a failover of the Yarn Resource Manager happened due to zk conn issue
[18:10:49] really weird
[18:11:13] I think that there is the need for an alarm to catch these situations
[18:13:42] I am going to check again tomorrow morning EU time, I suspect that some connection issue with zookeeper and the new master nodes might be happening
[18:13:53] Cc: joal --^
[18:14:00] but it might also be a sporadic one time thing
[19:43:24] thanks elukey! (i know it's a weekend, so...)
[19:50:25] thanks elukey