[00:31:38] (03CR) 10Jenniferwang: "> Patch Set 5:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/628237 (https://phabricator.wikimedia.org/T262496) (owner: 10Jenniferwang) [01:50:33] 10Analytics-Radar, 10Product-Analytics, 10Growth-Team (Current Sprint), 10Patch-For-Review: HomepageVisit schema validation errors - https://phabricator.wikimedia.org/T269966 (10Tgr) >>! In T269966#6692835, @nettrom_WMF wrote: > I'm for keeping the name, because as far as I can tell from T258008 it continu... [01:53:46] 10Analytics-Radar, 10Product-Analytics, 10Growth-Team (Current Sprint), 10Patch-For-Review: HomepageVisit schema validation errors - https://phabricator.wikimedia.org/T269966 (10Tgr) So, the patch above will fix this issue. More generally, we could do one or both of two things: * include eventlogging error... [03:20:52] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): WikimediaEvents\Tests\PageViewsTest::testLog HTTP request blocked: - https://phabricator.wikimedia.org/T270226 (10Reedy) [03:21:10] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): WikimediaEvents\Tests\PageViewsTest::testLog HTTP request blocked: - https://phabricator.wikimedia.org/T270226 (10Reedy) [03:21:22] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): WikimediaEvents\Tests\PageViewsTest::testLog HTTP request blocked - https://phabricator.wikimedia.org/T270226 (10Reedy) [03:24:02] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): WikimediaEvents\Tests\PageViewsTest::testLog HTTP request blocked - https://phabricator.wikimedia.org/T270226 (10Reedy) [03:25:01] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. 
- https://phabricator.wikimedia.org/T270226 (10Reedy) [03:25:14] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. - https://phabricator.wikimedia.org/T270226 (10Reedy) [03:25:39] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. - https://phabricator.wikimedia.org/T270226 (10Reedy) [03:26:56] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. - https://phabricator.wikimedia.org/T270226 (10Reedy) [06:23:31] good morning [07:40:01] isaacj: o/ - I guess nothing horrible happened on stat1004 right? [08:07:58] Ah! 
I missed 2 patches for new pageview sites, therefore alerts still present - Merging them now and updating the prod folder [08:08:05] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/642894 (https://phabricator.wikimedia.org/T268410) (owner: 10Gerrit maintenance bot) [08:08:21] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/643025 (https://phabricator.wikimedia.org/T268448) (owner: 10Gerrit maintenance bot) [08:11:53] (03PS2) 10Joal: Add skr.wikipedia to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/642894 (https://phabricator.wikimedia.org/T268410) (owner: 10Gerrit maintenance bot) [08:12:50] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging after manual rebase" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/642894 (https://phabricator.wikimedia.org/T268410) (owner: 10Gerrit maintenance bot) [08:13:56] !log Manually push updated pageview whitelist to HDFS [08:13:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:21:35] joal: bonjour [08:22:17] today I was wondering if a jump from cdh to bigtop 1.5 is even feasible, but the 2.6 -> 2.10 jump is the bit that I am afraid of [08:22:32] even if it could only be a mental thing :D [08:26:17] I need to restore cdh on the test cluster (that involves starting hdfs from scratch, since I have finalized the hdfs upgrade for bigtop 1.5) so I could test a rollout/rollback of 1.5 [08:27:37] we'd recover some time lost waiting for the backup cluster, plus we'd be able to start reimaging workers for buster right after it [08:28:18] (this time it will be nice, 60+24 workers + coordinators + masters) [08:53:30] elukey: I don't see reasons not to try - The jump would be a bit bigger, but it would prevent jumping again :) [08:53:47] exactly yes [08:53:52] ok I'll test it :) [10:05:26] 10Analytics-Clusters, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) 
rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (10elukey) @Cmjohnson what do you think about the last proposal? [10:05:26] elukey: looks like kafka-test1009.eqiad.wmnet was half way through a reimage. i have fixed the issue m,entioned earlier and manually kicked of a puppet run [10:09:56] jbond42: thanks a lot! cc: razzi that is working on it :) [10:36:39] (03CR) 10Lucas Werkmeister (WMDE): "No, it’s nothing really serious. But I don’t know how Grafana handles combinations of metrics that arrive at different timestamps (can I d" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/649676 (owner: 10Lucas Werkmeister (WMDE)) [10:46:50] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Deprecate the 'researchers' posix group - https://phabricator.wikimedia.org/T268801 (10elukey) Just sent another email as heads up. I added Monday 21st as deadline, if I don't hear anything from people I'll proceed with the removal of the username... [10:48:14] 10Analytics-Clusters, 10Analytics-Kanban: Deprecate the anaytics-users POSIX group - https://phabricator.wikimedia.org/T269150 (10elukey) >>! In T269150#6660663, @elukey wrote: > @brion @Mhurd @SantoshiWiki Hi! My team is trying to consolidate POSIX groups, and you are the last ones remaining in `analytics-use... [11:34:59] 10Analytics-Kanban, 10Patch-For-Review: Test the Bigtop 1.5 RC release on the Hadoop test cluster - https://phabricator.wikimedia.org/T269919 (10elukey) The Bigtop 1.5 release vote passed, we'll soon have a new Bigtop version officially available from upstream :) The remaining (big) step to do in this task is... [11:40:00] * elukey lunch! [12:01:32] 10Analytics, 10Analytics-EventLogging, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. 
- https://phabricator.wikimedia.org/T270226 (10Michael) [12:06:26] elukey: yep, everything seems to be working normally still [12:07:23] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. - https://phabricator.wikimedia.org/T270226 (1... [12:08:16] good morning folks :) [12:10:06] (03PS1) 10Andrew-WMDE: Process EventLogging events for TemplateData [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649861 (https://phabricator.wikimedia.org/T270246) [12:12:05] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. - https://phabricator.wikimedia.org/T270226 (1... [12:48:26] (03PS1) 10Gerrit maintenance bot: Add diq.wiktionary to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/649869 (https://phabricator.wikimedia.org/T270275) [12:48:58] (03PS1) 10Gerrit maintenance bot: Add bcl.wiktionary to pageview whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/649871 (https://phabricator.wikimedia.org/T270274) [13:05:10] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. - https://phabricator.wikimedia.org/T270226 (1... [13:07:23] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. Use MockHttpTrait. 
- https://phabricator.wikimedia.org/T270226 (1... [13:48:36] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [13:50:01] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [13:55:19] hola fdans [13:57:16] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [14:12:35] (03PS1) 10Fdans: AQS: add configuration for timeout and max retries to Druit requests [analytics/aqs] - 10https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) [14:24:20] (03CR) 10Elukey: "Thanks a lot for working on this, added some questions to kick of the conversation!" 
(033 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) (owner: 10Fdans) [14:33:28] (03PS4) 10Fdans: Add Active Editors per Country metric to Wikistats [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/647792 (https://phabricator.wikimedia.org/T188859) [14:33:32] (03PS2) 10Awight: [WIP] Aggregate TemplateWizard metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649351 (https://phabricator.wikimedia.org/T262209) [14:35:59] (03PS1) 10Awight: Include a bucket for anonymous editors [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649888 (https://phabricator.wikimedia.org/T262209) [14:41:44] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats 2.0: Add statistics for the geographical origin of the contributors - https://phabricator.wikimedia.org/T188859 (10fdans) p:05Medium→03High [14:43:23] https://bigtop.apache.org/download.html#releases [14:43:28] The latest release of Apache Bigtop software framework [14:43:29] Bigtop 1.5.0 (pgp sha256 sha512) [14:43:32] yesssssss [14:43:43] it is out! [14:44:17] hehe! [14:44:55] so happy [14:45:18] (03PS2) 10Fdans: AQS: add configuration for timeout and max retries to Druid requests [analytics/aqs] - 10https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) [14:45:26] the main sad thing is that debian 11 is around the corner :D [14:45:36] (codename "bullseye") [14:45:45] (03CR) 10Fdans: "Thank you for the comments @elukey !" 
(033 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) (owner: 10Fdans) [14:47:26] (03CR) 10Elukey: AQS: add configuration for timeout and max retries to Druid requests (031 comment) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) (owner: 10Fdans) [14:55:06] joal: also tested druid and bigtop 1.5, navtiming is indexing fine, no issue over there [14:56:07] 10Analytics-Clusters, 10Analytics-Kanban: Refactor puppet profiles to reduce hiera pollution - https://phabricator.wikimedia.org/T268220 (10elukey) we decided that it was enough for this task moving to done! [14:56:38] 10Analytics-Clusters, 10Analytics-Kanban: Refactor puppet profiles to reduce hiera pollution - https://phabricator.wikimedia.org/T268220 (10elukey) [15:01:19] (03CR) 10Joal: "A bunch of comments, mostly on code organization" (036 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/647092 (https://phabricator.wikimedia.org/T266872) (owner: 10Ottomata) [15:11:03] elukey: So about kafkacat. I think we have the following options: a) just copy the binary onto our machines, no pkg management b) copy the deb onto the machines and use dpkg to install it over the existing version c) make the deb available through WMF's apt infrastructure, limited to analytics hosts (and whoever else wants to opt in). [15:11:45] I think they're in rough order of ascending complexity and desirablity. c) is the largest amount of work right now, but is more sustainable [15:14:01] klausman: hi! Yes I'd say that c) is the best course of action, we could create a component in our apt to host the new kafkacat, or just include it to buster-wikimedia and let others know etc.. [15:15:20] heya teammm! [15:15:35] elukey: can I ask questionnn? [15:16:11] we could take https://github.com/edenhill/kafkacat/tree/debian and build on deneb.. 
I also see that https://github.com/edenhill/kafkacat/blob/debian/debian/control lists librdkafka-dev (>= 0.9.0) [15:16:16] mforns: hola! sure [15:16:20] :] [15:16:48] elukey: all reportupdater jobs (hive and mysql) run from an-launcher1002 now, right? [15:17:05] correct yes [15:17:15] in the past we split reportupdater configs, because the hive jobs would run from one machine and the mysql ones from another [15:17:25] now, is it still necessary to split jobs? [15:17:52] in theory no, they are all running as 'analytics' on launcher [15:17:54] I saw that mysql jobs have the use_kerberos => false flag in puppet, but not sure if they will still work with kerberos => true [15:18:02] that >=0.9.0 dep I patched out in my tests, and I observed no adverse effects [15:18:15] As in: I patched out the version requirement, not the whole dep [15:18:46] Let me see if I can find a rationale for it in that github history [15:19:02] because that branch should be the one used by debian upstream [15:19:36] we can get that or directly the one on debian, check it out on deneb, tweak the control and rebuild with say +wmf1 or +deb10u1 etc.. [15:19:52] and then we should be done [15:20:05] ` * d/control: ensure librdkafka can handle consumer mode.` [15:20:12] Is that relevant to us? [15:21:22] Hurm. Nevermind. We have a newer version (0.11) from some WMF repo [15:23:53] why did I have to patch that then... 
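A side note on the `librdkafka-dev (>= 0.9.0)` constraint discussed above: the 0.11 mentioned at 15:21:22 does satisfy it when the parts are compared numerically, even though a plain string comparison says otherwise. Below is a minimal sketch, assuming purely numeric dotted versions. The helper names are hypothetical, and this is not dpkg's real comparison algorithm, which also handles epochs, tildes and Debian revisions.

```python
def version_tuple(version):
    """Split a dotted numeric version such as '0.9.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if `installed` is numerically >= `minimum`.

    Simplification: shorter versions are padded with zeros, so '0.11'
    is treated as '0.11.0'. Suffixes such as '-1' or '+wmf1' are not supported.
    """
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print(satisfies_minimum("0.11", "0.9.0"))  # True: 11 > 9 numerically
    print("0.11" >= "0.9.0")                   # False: string comparison is misleading
```

This only illustrates the numeric comparison. It does not explain why the version requirement had to be patched out in the tests mentioned above.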
[15:25:27] Well, looks like the only change from the `debian` branch we'd need is for debhelper-compat, if deneb is still on 12 [15:27:34] ah that part is not clear to me, we need to check [15:31:36] (03CR) 10Andrew-WMDE: [C: 03+1] Include a bucket for anonymous editors [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649888 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [15:37:18] (03PS3) 10Awight: [WIP] Aggregate TemplateWizard metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649351 (https://phabricator.wikimedia.org/T262209) [15:39:36] elukey: so, do you think a mysql reportupdater job with use_kerberos => true will work fine? [15:40:41] elukey: Deneb is indeed on 12. Minor change, really. [15:46:11] mforns: sorry I am not getting what you need to do, why do you need use_kerberos => true ? [15:46:24] (we can bc if you have time) [15:46:30] elukey: ok! omw [15:48:34] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [15:49:02] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [15:49:30] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [15:53:27] (03CR) 10Andrew-WMDE: "Thanks!" 
(037 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649861 (https://phabricator.wikimedia.org/T270246) (owner: 10Andrew-WMDE) [15:53:59] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [15:58:42] (03CR) 10Mforns: Include a bucket for anonymous editors (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649888 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [16:00:42] bbiab, need to do some grocery shopping. Should be back well before standup [16:01:18] ack! [16:04:23] mforns: ok to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/649661 ? [16:04:36] otherwise I'll ask 5 euros to awight [16:04:38] elukey: yes! [16:04:41] ahh snap [16:04:45] merging :D [16:04:48] hehehe [16:05:02] thanks! [16:17:41] !log dropping and re-creating superset staging database [16:17:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:17:59] elukey@an-launcher1002:~$ sudo systemctl list-timers | grep codemirror [16:18:02] Wed 2020-12-16 17:00:00 UTC 42min left n/a n/a reportupdater-codemirror.timer reportupdater-codemirror.service [16:18:06] awight: --^ [16:31:10] * elukey coffee [16:49:31] (03CR) 10Mforns: "> Patch Set 5:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/628237 (https://phabricator.wikimedia.org/T262496) (owner: 10Jenniferwang) [16:57:48] (03CR) 10Mforns: "@JenniferWang, I'm sorry, for some reason we missed this patch last week, and we did not deploy it." 
[analytics/refinery] - 10https://gerrit.wikimedia.org/r/628237 (https://phabricator.wikimedia.org/T262496) (owner: 10Jenniferwang) [17:01:52] fdans: standuuuupppp [17:02:01] mforns: --^ [17:02:09] uoooo [17:02:14] get up standup, don't give up the fight [17:04:33] PROBLEM - Check the last execution of reportupdater-codemirror on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-codemirror https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:04:48] ahhh [17:09:17] 10Analytics, 10serviceops, 10User-jijiki: Clarify multi-service instance concepts in helm charts and enable canary releases - https://phabricator.wikimedia.org/T242861 (10jijiki) [17:21:39] (03PS1) 10Elukey: codemirror: fix config layout [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649939 [17:24:37] (03PS2) 10Elukey: codemirror: fix config layout and permissions [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649939 [17:25:05] RECOVERY - Check the last execution of reportupdater-codemirror on an-launcher1002 is OK: OK: Status of the systemd unit reportupdater-codemirror https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:29:45] (03PS3) 10Elukey: codemirror: fix configs [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649939 [17:30:58] mforns: --^ : [17:30:59] :) [17:31:27] elukey / joal I was going to bring this up, I have no idea how we do it and if we create explicit drop and broadcast rules, sorry to be ignorant: https://druid.apache.org/docs/latest/operations/rule-configuration.html [17:31:56] but it seems that if you're asking on a forum, that would be the context to ask about, ie the broadcast rules not being flexible enough [17:32:06] (seems like they should allow configuring a delay) [17:32:36] uau elukey you ninja-fixed that [17:32:51] (03CR) 10Andrew-WMDE: [C: 03+1] "Thanks for taking a closer look!" 
(031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649888 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [17:32:51] * milimetric lunch [17:33:47] I don't think we use broadcast rules milimetric - We use load and drop, onl, and for the given conf we use loadForever so no drop [17:35:46] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649939 (owner: 10Elukey) [17:36:10] thanks elukey :] [17:42:15] \o/ [17:42:17] (03CR) 10Mforns: "Code looks good to me! Can you please make sure that the Hive script files are executable? Thanks :]" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649861 (https://phabricator.wikimedia.org/T270246) (owner: 10Andrew-WMDE) [17:42:36] mforns: going to restart the codemirror job [17:44:00] (03CR) 10Mforns: "Sorry for the confusion." (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649888 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [17:49:49] (03PS4) 10Awight: [WIP] Aggregate TemplateWizard metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649351 (https://phabricator.wikimedia.org/T262209) [17:50:22] (03PS5) 10Awight: [WIP] Aggregate TemplateWizard metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649351 (https://phabricator.wikimedia.org/T262209) [17:52:21] elukey: I'm having problem interacting with oozie API :( [17:52:31] I think it's kerberos related [17:52:37] (03CR) 10Awight: "Thanks for the fixes!" (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649939 (owner: 10Elukey) [17:54:31] (03PS1) 10Awight: Make scripts executable [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649946 [17:58:28] Just to confirm, reportupdater still runs on an-launcher1002.eqiad.wmnet, and proles like me don't have access to read the error logs? 
[18:00:18] (03CR) 10Awight: "> Can you please make sure the hive script file is executable?" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649888 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [18:00:41] awight: reportupdater logs are under /srv/reportupdater/logs [18:00:54] not sure if you have access... [18:02:09] Not to that machine. Fine by me, I was asking mostly to see if I can help monitor my team's scripts. [18:02:40] awight: yeah for the moment that host is only for us, since it runs our jobs etc.. [18:03:05] we have some thoughts about creating a dedicated VM for these team-specific requirements [18:03:15] I'll keep you in the loop :) [18:03:18] (03CR) 10Mforns: [V: 03+2 C: 03+2] "OK, LGTM!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649888 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [18:03:36] This is one of those things where I'm happy to not be driving ;-) [18:05:00] basically everything was on stat100x, we unified the clients to be pure clients (with stat1007 as a small exception) and consolidated jobs in various places [18:05:04] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649946 (owner: 10Awight) [18:05:11] but now more requests have been made etc.. so we have to adjust :) [18:05:55] mforns: ok to restart the code mirror report job? [18:06:04] elukey: yes please! [18:06:35] elukey: just merged, maybe let puppet agent pull the code, or force it (you probably already did...) 
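The missing executable bit that the "Make scripts executable" patch addresses can be checked in a local checkout before pushing. Below is a minimal sketch, assuming that a script can be recognized by a leading `#!` shebang. That is a heuristic chosen for illustration, not how reportupdater itself identifies scripts, and the function name is hypothetical.

```python
import stat
from pathlib import Path


def scripts_missing_exec_bit(root):
    """Return files under `root` that start with a shebang but are not
    executable by their owner.

    Assumption: "script" means "file whose first two bytes are '#!'".
    """
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            has_shebang = handle.read(2) == b"#!"
        # S_IXUSR is the owner-execute permission bit.
        if has_shebang and not (path.stat().st_mode & stat.S_IXUSR):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Check the current directory, e.g. a checkout of a queries repository.
    for script in scripts_missing_exec_bit("."):
        print(f"not executable: {script}")
```

Any file it reports can then be fixed with `chmod +x` and committed, since git tracks the executable bit.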
[18:07:18] ah snap in my change I didn't add the exec bits [18:07:18] sigh [18:07:27] yes yes and it failed again [18:07:54] sorry I didn't notice it [18:08:34] ah and also I didn't add the .sql stuff [18:08:37] what a disaster Luca [18:09:48] ah no sorry that part was done ok [18:09:55] fiuuu [18:10:17] so re-running puppet again, the changes for the +x were not picked up [18:12:20] ohh nice it runs :) [18:14:23] (03PS3) 10Andrew-WMDE: Process EventLogging events for TemplateData [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/649861 (https://phabricator.wikimedia.org/T270246) [18:17:22] awight: all good, logs are fine now :) [18:17:28] thanks for fixing the extra +x [18:21:03] 10Analytics: Update Spicerack cookbooks to follow the new class API conventions - https://phabricator.wikimedia.org/T269925 (10elukey) [18:28:15] mforns: I'm gonna manually restart the data-quality jobs - anything special about them? [18:28:44] joal: the only thing is you have to specify the granularity [18:29:52] ack mforns - I wondered about emails for alerts but I have just seen they are encoded in bundle.xml - all good :) [18:30:05] ok [18:31:45] !log Kill-rerun data-quality bundles [18:31:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:37:32] !log Kill-restart wikidata-entity, wikidata-item_page_link and mobile_apps-session_metrics oozie jobs [18:37:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:52:27] !log Kill-restart cassandra loading oozie jobs [18:52:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:54:42] joal is on fire [18:54:53] elukey: doing the manual ones first [18:55:23] elukey: I'll be able to automate (more or less) for many, but still have a few to do manually [18:55:25] joal: going afk for some mins but I'll be back if you need me and/or if you want to swear at me because I broke jobs :D [18:55:46] also elukey I think you missed 
my ping earlier: I have not managed to have oozie API working :() [18:56:02] elukey: I think it's kerberos related, and went back to good ol' CLI [18:56:03] ah snap :( [18:56:57] * elukey afk! bbiab [19:35:22] !log Kill-restart all oozie jobs belonging to analytics except mediawiki-wikitext-history-coord [19:35:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:49:04] awight: ok if I merge https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TemplateWizard/+/649594 ? [19:58:46] joal: wow all done? [19:59:10] elukey: launching commands gently 5 at a time [19:59:17] elukey: but gently moving [19:59:54] very nice :) [19:59:56] thanks a lot [20:00:19] tomorrow morning we should be able to switch an-coord1001 to analytics-hive and test a failover [20:00:26] elukey: now it'll be about monitoring alerts :) [20:01:30] all right I'll keep an eye! [20:01:37] So will I [20:02:26] * joal is happy to have enforced naming convention for oozie jobs :) [21:00:32] mmmmmm [21:00:34] some errors [21:03:24] ok for virtual page viewis [21:03:27] *views [21:03:28] org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Failed to read external resource hdfs://analytics-hadoop/wmf/refinery/2020-12-15T19.36.10+00.00--scap_sync_2020-12-15_0001-dirty/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.58.jar [21:03:42] * elukey plays sad_trombone.wav [21:04:19] up, just saw that elukey [21:04:26] elukey: patching now [21:04:41] joal: <3 [21:04:49] Same problem for projectview-geo [21:05:01] ah yes yes [21:05:21] Here's the plan: restart jobs with updated value for jar version manually, and send a patch with the change [21:05:34] +1 [21:06:57] !log Kill-restart virtualpageview-hourly-coord and projectview-geo-coord with manually updated jar versions (old versions in conf) [21:07:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [21:11:26] thanks joal :) [21:13:13] fixed elukey - email sent 
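The failure above came from job configuration still pointing at an old jar version. One way to catch this before restarting jobs is to scan configuration text for versioned jar names and flag any that differ from the version that is actually deployed. Below is a minimal sketch. The function name, the regular expression and the `expected` mapping are all hypothetical, and nothing here reflects the real layout of the refinery job properties.

```python
import re

# Matches names such as "refinery-hive-0.0.58.jar" and captures the
# artifact ("refinery-hive") and the dotted version ("0.0.58").
JAR_PATTERN = re.compile(
    r"(?P<artifact>[A-Za-z][\w-]*?)-(?P<version>\d+(?:\.\d+)+)\.jar"
)


def stale_jar_references(text, expected):
    """Return (artifact, version) pairs found in `text` whose version
    differs from the one listed for that artifact in `expected`.

    `expected` maps artifact name -> deployed version. Artifacts that
    are not listed in it are ignored.
    """
    stale = []
    for match in JAR_PATTERN.finditer(text):
        artifact = match.group("artifact")
        version = match.group("version")
        if artifact in expected and version != expected[artifact]:
            stale.append((artifact, version))
    return stale


if __name__ == "__main__":
    conf = "jar = hdfs://analytics-hadoop/some/path/refinery-hive-0.0.58.jar"
    # "X.Y.Z" is a placeholder for whichever version is really deployed.
    print(stale_jar_references(conf, {"refinery-hive": "X.Y.Z"}))
```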
[21:13:42] elukey: we might expect some failures with the round of daily jobs (same with monthly, but we'll have forgotten :) [21:14:29] yep yep I agree [21:21:39] ok - I'm not gonna wait for those jobs to fail :) [21:21:45] See you tomorrow team [21:43:15] 10Analytics, 10Analytics-EventLogging, 10Wikidata, 10Patch-For-Review, 10ci-test-error (WMF-deployed Build Failure): HTTP request blocked: https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=18910134&formatversion=2&format=json by RemoteSchema::httpGet. ... - https://phabricator.wikimedia.org/T270226 [21:57:36] elukey: would anything destroy my kinit ticket beyond it expiring after 2 days or me explicitly calling kdestroy?
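The log ends before the kinit question gets an answer. Independently of what might remove a ticket, a script can check whether the cached ticket is still usable before relying on it. Below is a minimal sketch, assuming the MIT Kerberos `klist` client is on the PATH. Its `-s` flag runs silently and exits non-zero when the credentials cache cannot be read or has expired. The function name and the injectable `run` parameter are hypothetical conveniences for testing.

```python
import subprocess


def has_valid_ticket(run=subprocess.run):
    """Return True if `klist -s` reports a readable, unexpired ticket cache.

    `run` defaults to subprocess.run and is injectable so the check can
    be exercised without a Kerberos installation.
    """
    try:
        result = run(["klist", "-s"], check=False)
    except FileNotFoundError:
        # klist is not installed, so there is no usable ticket from here.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if has_valid_ticket():
        print("Kerberos ticket looks valid")
    else:
        print("No valid Kerberos ticket; run kinit")
```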