[00:48:20] PROBLEM - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:28:13] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:51:38] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) Instructions to upgrade the coordinator (or where hive/oozie run): ` dpkg -l | awk '/+cdh/ {print $2}' | tr '\n' ' ' > /root/cdh_pac...
[11:32:43] * elukey lunch!
[12:19:16] (PS1) Awight: Filter out "#"- lines from the dblist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619458 (https://phabricator.wikimedia.org/T260122)
[13:11:27] heyall!
[13:23:59] (PS1) Elukey: hive: quote all usages of percent/range words [analytics/refinery] - https://gerrit.wikimedia.org/r/619466 (https://phabricator.wikimedia.org/T244499)
[13:24:46] mforns: hola!
[13:27:58] (CR) Elukey: "As far as I can see from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select backticks in select statements are support" [analytics/refinery] - https://gerrit.wikimedia.org/r/619466 (https://phabricator.wikimedia.org/T244499) (owner: Elukey)
[13:29:39] (PS2) Elukey: hive: quote all usages of percent/range words [analytics/refinery] - https://gerrit.wikimedia.org/r/619466 (https://phabricator.wikimedia.org/T244499)
[13:54:27] (PS1) Ottomata: Add constraints parameter when using bin/camus with EventStreamConfig API [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935)
[13:55:08] Analytics: Use MaxMind DB in piwik geo-location - https://phabricator.wikimedia.org/T213741 (elukey) @Aklapper the description is too generic, what we meant was "good first task for new Analytics SREs" since we are onboarding two very soon. We'll avoid using the tag sorry!
[13:56:08] (PS2) Ottomata: Add constraints parameter when using bin/camus with EventStreamConfig API [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935)
[14:01:08] mforns: if you could review ^ real quick
[14:16:29] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) ` elukey@analytics1028:~$ sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -finalizeUpgrade Finalize upgrade successful for analy...
[14:19:20] finalized hdfs on the test cluster, nothing exploded
[14:19:52] ottomata: lookin
[14:28:25] (CR) Mforns: [C: +1] "See 2 comments for typos, but otherwise, LGTM!"
(2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[14:49:27] (CR) Ottomata: Add constraints parameter when using bin/camus with EventStreamConfig API (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[14:49:31] (CR) Ottomata: [C: +2] Add constraints parameter when using bin/camus with EventStreamConfig API [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[14:49:36] (CR) Ottomata: [V: +2 C: +2] Add constraints parameter when using bin/camus with EventStreamConfig API [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[14:49:56] (PS3) Ottomata: Add constraints parameter when using bin/camus with EventStreamConfig API [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935)
[14:50:30] (CR) Ottomata: [V: +2 C: +2] Add constraints parameter when using bin/camus with EventStreamConfig API [analytics/refinery] - https://gerrit.wikimedia.org/r/619469 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[14:51:32] (CR) Nuria: [C: +2] "Let's merge and note on train etherpad what jobs need to be restarted this week, we need to restart refine for other changes so that is al" [analytics/refinery] - https://gerrit.wikimedia.org/r/619466 (https://phabricator.wikimedia.org/T244499) (owner: Elukey)
[14:52:12] !log scap deploy refinery to an-launcher1002 to get camus wrapper script changes
[14:52:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:21:40] Analytics: Use MaxMind DB in piwik geo-location - https://phabricator.wikimedia.org/T213741 (Nuria) * For an entry level task this probably needs to happen in
labs first
[15:26:55] Analytics, ContentTranslation, Language-Team (Language-2020-Focus-Sprint): Test Performance of Marian NMT translation in stat cluster - https://phabricator.wikimedia.org/T247245 (Nuria) Resolved→Open
[15:41:22] Analytics, Analytics-Kanban, Patch-For-Review: HiveExtensions.convertToSchema does not properly convert arrays of structs - https://phabricator.wikimedia.org/T259924 (EBernhardson) Can you use spark higher order functions, particularly transform(array, function): array ? This effectively...
[15:42:27] ottomata : yt?
[15:42:40] ottomata: have 10 minutes to talk about SRE onboarding?
[16:00:50] ping mforns ottomata
[16:13:11] (CR) Nuria: [V: +2 C: +2] hive: quote all usages of percent/range words [analytics/refinery] - https://gerrit.wikimedia.org/r/619466 (https://phabricator.wikimedia.org/T244499) (owner: Elukey)
[16:23:56] Analytics, Analytics-Kanban, Platform Team Workboards (Initiatives): reportupdater Pingback reports are broken and need to be refactored - https://phabricator.wikimedia.org/T246154 (Nuria) a:mforns
[16:42:23] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Infrastructure-Data, and 3 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (jlinehan) >>! In T248987#6352594, @Nuria wrote: >>I don't mind, but maybe I don't understand what is confusing?...
[17:11:44] (CR) Ladsgroup: Change metric for TwoColConflict disables (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/611279 (https://phabricator.wikimedia.org/T257577) (owner: Awight)
[17:11:57] (PS1) Ladsgroup: Change metric for TwoColConflict disables [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619378 (https://phabricator.wikimedia.org/T257577)
[17:12:10] (CR) Ladsgroup: [C: +2] Change metric for TwoColConflict disables [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619378 (https://phabricator.wikimedia.org/T257577) (owner: Ladsgroup)
[17:12:49] (Merged) jenkins-bot: Change metric for TwoColConflict disables [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619378 (https://phabricator.wikimedia.org/T257577) (owner: Ladsgroup)
[17:15:01] (CR) Awight: "Thank you!" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619378 (https://phabricator.wikimedia.org/T257577) (owner: Ladsgroup)
[17:15:21] (Merged) jenkins-bot: Filter out "#"- lines from the dblist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619458 (https://phabricator.wikimedia.org/T260122) (owner: Awight)
[17:15:25] (Merged) jenkins-bot: Filter out "#"- lines from the dblist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619379 (https://phabricator.wikimedia.org/T260122) (owner: Ladsgroup)
[17:16:12] (CR) Ladsgroup: "> Patch Set 1:" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619378 (https://phabricator.wikimedia.org/T257577) (owner: Ladsgroup)
[17:17:20] * elukey afk!
[17:18:19] (CR) Awight: Change metric for TwoColConflict disables (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/611279 (https://phabricator.wikimedia.org/T257577) (owner: Awight)
[17:33:43] Analytics, Analytics-Kanban, Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (Volker_E) @Nuria The site will live at http://design.wikimedia.org/strategy/
[17:36:04] !log refine with refinery-source 0.0.132 and merge_with_hive_schema_before_read=true - T255818
[17:36:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:36:07] T255818: Refine drops $schema field values - https://phabricator.wikimedia.org/T255818
[17:56:46] Analytics, Analytics-Kanban, Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (Nuria) That site (design.wikimedia.org) already has piwik, no additional setup is needed. You will see hits on /strategy on your dashboard when those h...
[18:42:41] (CR) Awight: "Thanks again :-)" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/619379 (https://phabricator.wikimedia.org/T260122) (owner: Ladsgroup)
[18:58:20] it seems i need ACLs to get kafka admin apis now, could someone check what the ttl is on mjolnir.* topics in jumbo? I think they are 24h, but can't remember
[18:58:50] (at least, my attempts to use the python kafka admin apis were rejected)
[18:59:25] ?
[18:59:30] ebernhardson: i think that is probably the ferm rules
[18:59:40] it is probably dependent on where you are executing from
[18:59:45] we haven't enabled any ACLs
[18:59:48] just firewall rules iiuc
[18:59:50] ottomata: ahh, maybe.
[19:00:01] can you run the same commands from the mjolnir hosts?
[19:00:03] or stat boxes?
[19:00:07] trying, sec
[19:05:41] ottomata: same; doesn't seem like it, from stat1007: https://phabricator.wikimedia.org/P12213
[19:11:13] hm
[19:12:46] ebernhardson: i've never had to use the admin api like that
[19:12:56] kafkacat -L -b kafka-jumbo1001 -t mjolnir.msearch-prod-response
[19:12:58] works for me from stat1007
[19:13:25] ottomata: that doesn't list the topic config like ttl though
[19:14:34] mostly i'm poking at these apis because i couldn't find any public tools installed that would query the topic configuration
[19:14:55] just checked, no ACLs for that
[19:15:09] there must be something else going on with the ferm rules and the admin cli
[19:15:18] does it use a different port? no right?
[19:15:45] it doesn't seem like ferm, because it gets a real kafka response, kafka just says err 29 (TOPIC_AUTHORIZATION_FAILED). very odd
[19:16:13] yeah seems like it, but maybe it would say that if it couldn't connect to do the authorization?
[19:16:19] hmm, maybe
[19:23:03] ebernhardson: are you trying to change the setting?
[19:23:14] oh you are just checking
[19:23:25] ebernhardson:
[19:23:28] https://www.irccloud.com/pastebin/IMXCCQcR/
[19:28:12] ottomata: just trying to verify, thanks!
[19:34:18] as FYI no ferm rules have been activated for jumbo yet :)
[19:34:32] I am waiting for the confirmation that search-loader hosts are ok
[19:45:44] elukey: still not looking great :( The run finished yesterday, but even with the repaired event inputs it's still showing reduced output, still investigating
[23:51:52] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: EventGate validation errors should be visible in logstash - https://phabricator.wikimedia.org/T116719 (Krinkle) Regarding query, I meant Logstash query. I've turned your query into a dashboard that engineering tea...
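
For readers following the 18:58-19:28 thread about checking the "ttl" on the mjolnir.* topics: a minimal sketch of how a topic's retention could be read with the kafka-python admin client. This is not the script from the paste linked in the log. It assumes that "ttl" means the topic-level `retention.ms` setting, that kafka-python 2.x is installed, and that the broker listens on port 9092 (the port is not stated in the log). The function names `retention_hours` and `fetch_topic_config` are hypothetical, and the layout of the `describe_configs` response can differ between kafka-python versions.

```python
MS_PER_HOUR = 60 * 60 * 1000


def retention_hours(config_entries):
    """Return a topic's retention in hours from (name, value, ...) config entries.

    Returns None if retention.ms is absent and float("inf") if it is negative,
    which Kafka treats as "no time limit".
    """
    for entry in config_entries:
        name, value = entry[0], entry[1]
        if name == "retention.ms":
            ms = int(value)
            return float("inf") if ms < 0 else ms / MS_PER_HOUR
    return None


def fetch_topic_config(bootstrap_servers, topic):
    """Fetch the raw config entries of one topic (needs a reachable broker)."""
    # Imported lazily so retention_hours() works without kafka-python installed.
    from kafka.admin import ConfigResource, ConfigResourceType, KafkaAdminClient

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        responses = admin.describe_configs(
            config_resources=[ConfigResource(ConfigResourceType.TOPIC, topic)]
        )
    finally:
        admin.close()

    entries = []
    for response in responses:
        # Assumed tuple layout per resource:
        # (error_code, error_message, resource_type, resource_name, config_entries)
        for resource in response.resources:
            error_code, config_entries = resource[0], resource[4]
            if error_code != 0:
                # Error 29 is TOPIC_AUTHORIZATION_FAILED, the code quoted in the log.
                raise RuntimeError(f"describe_configs failed with error {error_code}")
            entries.extend(config_entries)
    return entries
```

From a host that can reach the cluster, `retention_hours(fetch_topic_config("kafka-jumbo1001:9092", "mjolnir.msearch-prod-response"))` would report the retention in hours. The topic and host names are the ones used with kafkacat in the log; whether the call is authorized from a given host is exactly the open question in the thread.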