[00:03:14] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:15:54] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:06:36] PROBLEM - Check the last execution of reportupdater-published_cx2_translations_mysql on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit reportupdater-published_cx2_translations_mysql https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:00:44] RECOVERY - Check the last execution of reportupdater-published_cx2_translations_mysql on an-launcher1001 is OK: OK: Status of the systemd unit reportupdater-published_cx2_translations_mysql https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:35:10] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on an-coord1001 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[03:41:52] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on an-airflow1001 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[03:41:52] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on an-tool1006 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[03:41:52] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1006 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[03:41:54] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1005 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[04:03:10] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:05:58] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1003 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[04:15:46] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:19:04] PROBLEM - Check the last execution of monitor_refine_sanitize_eventlogging_analytics_delayed on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_sanitize_eventlogging_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:20:44] PROBLEM - Check the last execution of monitor_refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:03:20] PROBLEM - Check the last execution of analytics-dumps-fetch-mediawiki_history_dumps on labstore1007 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-mediawiki_history_dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:10:10] PROBLEM - Check the last execution of analytics-dumps-fetch-mediawiki_history_dumps on labstore1006 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-mediawiki_history_dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:39:10] 10Analytics, 10I18n, 10RTL: Support right-to-left languages - https://phabricator.wikimedia.org/T251376 (10fdans) @Huji thank you for the feedback. Repeating translations is indeed annoying. In the case of dimension values like "access site" luckily TranslateWiki usually suggests the previous translation used...
[05:41:54] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[05:42:57] gooooo morning
[05:47:25] hellooo
[05:47:31] ok so the mountpoint check is not really behaving :)
[05:47:33] hola fdans
[05:50:16] 10Analytics, 10Analytics-Wikistats: Include locale string jsons as webpack chunks so that only the required language is bundled - https://phabricator.wikimedia.org/T240618 (10fdans) 05Open→03Resolved @Piramidion I can't really speak for translations since I'm not the one doing them. But if you feel they ar...
[05:50:19] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Add multilanguage ability to Wikistats - https://phabricator.wikimedia.org/T238752 (10fdans)
[06:03:32] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Experiment with Druid and SqlAlchemy - https://phabricator.wikimedia.org/T249681 (10cchen)
[06:04:02] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:07:28] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_delayed on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:09:43] 10Analytics: Mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10elukey) p:05Triage→03Medium
[06:16:42] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:19:04] PROBLEM - Check the last execution of monitor_refine_sanitize_eventlogging_analytics_delayed_failure_flags on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_sanitize_eventlogging_analytics_delayed_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:22:30] PROBLEM - Check the last execution of monitor_refine_sanitize_eventlogging_analytics_immediate_failure_flags on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_sanitize_eventlogging_analytics_immediate_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:23:09] yes yes
[06:23:38] this one is probably too noisy, maybe we should simply absent it for the moment --^
[06:32:42] 10Analytics, 10Dumps-Generation, 10Core Platform Team Workboards (Clinic Duty Team), 10Patch-For-Review: page_restrictions field incomplete in current and historical dumps - https://phabricator.wikimedia.org/T251411 (10ArielGlenn) Making the command decision to block this task on T218446 / T35334.
[06:33:18] 10Analytics, 10Dumps-Generation, 10Core Platform Team Workboards (Clinic Duty Team), 10Patch-For-Review: page_restrictions field incomplete in current and historical dumps - https://phabricator.wikimedia.org/T251411 (10ArielGlenn)
[07:22:48] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Create anaconda .deb package with stacked conda user envs - https://phabricator.wikimedia.org/T251006 (10MoritzMuehlenhoff) > ` > dpkg-source: error: cannot represent change to anaconda/bin/configurable-http-proxy: > dpkg-source: error: new version is sy...
[07:56:06] (03CR) 10Gilles: [V: 03+2] LayoutJank schema is deprecated, now LayoutShift [analytics/refinery] - 10https://gerrit.wikimedia.org/r/594135 (https://phabricator.wikimedia.org/T216594) (owner: 10Gilles)
[08:28:27] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Experiment with Druid and SqlAlchemy - https://phabricator.wikimedia.org/T249681 (10cchen) Tested 4 tables and 2 dashboards with Druid tables: - Features: features are all working with Druid tables; - Load time: longer load time in refreshing ch...
[08:39:57] 10Analytics: Creation of canonical pageview dumps for users to download - https://phabricator.wikimedia.org/T251777 (10fdans)
[08:42:31] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Migrate pagecounts-ez generation to hadoop - https://phabricator.wikimedia.org/T192474 (10fdans)
[08:42:33] 10Analytics: Creation of canonical pageview dumps for users to download - https://phabricator.wikimedia.org/T251777 (10fdans)
[08:52:00] 10Analytics: Creation of canonical pageview dumps for users to download - https://phabricator.wikimedia.org/T251777 (10fdans)
[09:03:43] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-Vagrant: eventlogging vagrant role: 'ParsedRequirement' object has no attribute 'req' - https://phabricator.wikimedia.org/T251864 (10Tgr)
[09:13:39] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on an-coord1001 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[09:18:44] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Druid access for view on event.editeventattempt - https://phabricator.wikimedia.org/T249945 (10fdans) Just ran the ingestion job for data from Feb 20 to Apr 18.
[09:29:23] (03PS1) 10Milimetric: [WIP] Use page move events to improve joining to entity [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594428 (https://phabricator.wikimedia.org/T249773)
[09:29:31] 10Analytics, 10Analytics-Kanban, 10Research, 10Patch-For-Review: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (10Milimetric) Quick update, the patch I'm about to send shows some promise. I am compiling and testing on the full da...
[11:04:19] * elukey lunch!
[11:24:51] (03CR) 10Joal: [C: 03+1] "LGTM (weird though!)" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594283 (https://phabricator.wikimedia.org/T251794) (owner: 10Mforns)
[11:51:53] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1005 is OK: [Errno 5] Input/output error: /mnt/hdfs https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[11:53:08] I am still not able to figure out how to make nagios and the fuse mountpoint work with kerberos-run-command
[11:53:16] :S
[11:55:22] very weird, it seems as if it wasn't able to read credentials doing ls /mnt/hdfs
[11:55:36] meanwhile stuff like hdfs dfs -ls etc.. inside the same nagios check works
[11:55:50] weird :(
[11:56:17] (03CR) 10Joal: "2 minor things and 1 more complex. The approach is great :)" (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594428 (https://phabricator.wikimedia.org/T249773) (owner: 10Milimetric)
[11:56:17] it seems something peculiar to the fuse mountpoint, and nagios using a private tmp
[11:56:52] elukey: this is gonna force us to get rid of fuse (how dramatically sad)
[12:03:32] joal: I wish, I am not sure if possible, too many people using it..
[12:04:05] elukey: I hear that - If it's not maintainable, we'll have to nonetheless I guess
[12:04:36] (03CR) 10Joal: [WIP] Use page move events to improve joining to entity (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594428 (https://phabricator.wikimedia.org/T249773) (owner: 10Milimetric)
[12:07:08] joal: well we could simply decide to stop monitoring it, should be fine
[12:07:57] joal: also, did you see my task about hdfs-rsync?
[12:08:14] elukey: I have a question - That many errors mean the thing is broken, isn't it? If it's broken, people can't use it (right?) - And nobody complained so far!
[12:08:14] not sure if there is an option to make it more lenient now (to reduce the alarms)
[12:08:31] elukey: I have seen it - It's in my list to comment on :)
[12:08:45] joal: nono the errors are only nagios failing to ls on the mountpoints due to credentials, but everything works
[12:08:47] elukey: you shall have a ping from me soon :)
[12:08:51] ack :)
[12:09:02] ah ok I understand elukey
[12:09:49] Thanks joal, I’ll add 1 month to the history snapshot date and 1 week to the entity one
[12:10:05] milimetric: let's also add a comment :)
[12:10:19] ofc
[12:21:24] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1004 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[12:23:08] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1005 is CRITICAL: CRITICAL https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[12:25:19] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-Vagrant: eventlogging vagrant role: 'ParsedRequirement' object has no attribute 'req' - https://phabricator.wikimedia.org/T251864 (10Tgr) The issue is with the [[https://gerrit.wikimedia.org/g/eventlogging|eventlogging]] Python library, which uses this hack...
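A minimal sketch of the two cases elukey compares in the fuse debugging above ([11:53]-[12:08]); the wrapper usage, the nagios user, and the NRPE unit name are assumptions from memory rather than quotes from this log:

    # Assumed reproduction of the failing check versus the working one.
    # The fuse path read fails inside the nagios check:
    sudo -u nagios kerberos-run-command nagios ls /mnt/hdfs
    # ...while a plain HDFS client call with the same Kerberos wrapper works:
    sudo -u nagios kerberos-run-command nagios hdfs dfs -ls /
    # One hypothesis, matching the "private tmp" remark: the credential cache
    # lives under /tmp, and the NRPE service runs with PrivateTmp=true, so the
    # long-running fuse daemon never sees the ticket the wrapper just obtained.
    systemctl show nagios-nrpe-server.service -p PrivateTmp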
[12:39:41] 10Analytics, 10Fundraising-Backlog, 10fundraising-tech-ops, 10Patch-For-Review: Install superset on front end server for analytics - https://phabricator.wikimedia.org/T245755 (10ayounsi)
[12:47:50] I really love when I do another pass to superset dashboards thinking "this time is the right one, I fixed all bugs"
[12:47:57] and then I realize that a chart is broken
[12:48:01] * elukey cries in a corner
[12:48:06] :(
[12:54:27] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Add new dimensions to druid's pageview_hourly datasource - https://phabricator.wikimedia.org/T243090 (10JAllemandou) >>! In T243090#6106290, @Nuria wrote: > I think turnilo needs a re-start cause the dimensions are not available We...
[12:55:03] 10Analytics: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10JAllemandou)
[12:56:07] 10Analytics: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10JAllemandou) Should we update the job to look at new month only after a few days? Or shall we add a hdfs call to only launch hdfs-rsync if the source is present?
[12:57:25] ah no wait it might be an issue with settigns
[12:57:30] *settings
[13:02:50] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1007 is OK: OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration%23Fixing_HDFS_mount_at_/mnt/hdfs
[13:06:49] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (10JAllemandou) Thoughts on https://phabricator.wikimedia.org/T251609#6100233: 1 - I'd be ok to hard-code a fake dat...
[13:15:00] !log Dropping unavailable mediawiki-history-reduced datasources from superset
[13:15:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:24:16] elukey: I think this should stop the published_cx_translations alarms: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594169/
[13:24:29] too hacky?
[13:27:37] mforns: what is the current problem?
[13:30:46] elukey: oh sorry, I thought the description was on the task, but there's no task :], it was in the email thread: https://pastebin.com/uk6WjyD3
[13:32:13] 10Analytics, 10Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (10BerndFiedlerWMDE) Thanks guys!
[13:32:15] joal: can I help or do something with the fetch-mediawiki_history_dumps alerts?
[13:32:26] Hi mforns :)
[13:32:31] heyyy :]
[13:32:47] mforns: I have commented on the task elukey created, asking for a preferred approach
[13:32:57] oh ok
[13:33:29] mforns: I think that the interval is fine
[13:33:50] joal: this one right? https://phabricator.wikimedia.org/T251858
[13:33:59] elukey: cool :]
[13:34:14] correct mforns
[13:34:49] ok thx
[13:36:48] mforns: merged and deployed
[13:36:56] elukey: yay, thanks!
[13:37:04] let's see if alarms cease
[13:39:03] 10Analytics: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10elukey) >>! In T251858#6108422, @JAllemandou wrote: > Should we update the job to look at new month only after a few days? Or shall we add a hdfs call to only launch hdfs-rsync...
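The fix being weighed in T251858 above (only launch the copy once the new month's source exists) could take roughly the following shape; the source path and the wrapped fetch command are placeholders, not the real job definition:

    # Hypothetical guard: skip quietly when the new snapshot is not on HDFS
    # yet, instead of letting the systemd unit fail and page.
    SRC='hdfs:///wmf/data/archive/mediawiki/history/2020-04'   # placeholder path
    if hdfs dfs -test -d "$SRC"; then
        /usr/local/bin/analytics-dumps-fetch-mediawiki_history_dumps   # placeholder for the real fetch command
    else
        echo "source ${SRC} not present yet, skipping this run"
    fi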
[13:39:54] ACKNOWLEDGEMENT - Check the last execution of analytics-dumps-fetch-mediawiki_history_dumps on labstore1006 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-mediawiki_history_dumps Elukey T251858 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:39:54] ACKNOWLEDGEMENT - Check the last execution of analytics-dumps-fetch-mediawiki_history_dumps on labstore1007 is CRITICAL: CRITICAL: Status of the systemd unit analytics-dumps-fetch-mediawiki_history_dumps Elukey T251858 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:41:28] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Experiment with Druid and SqlAlchemy - https://phabricator.wikimedia.org/T249681 (10JAllemandou) > - Data: discrepancy using the same data and same filters, see following examples. This is due to the semantics in time-range querying being diffe...
[13:41:30] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Druid access for view on event.editeventattempt - https://phabricator.wikimedia.org/T249945 (10dr0ptp4kt) Thank you!
[13:45:47] nice joal --^
[13:46:46] 10Analytics: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10JAllemandou) Makes sense @elukey - this also means that we'll never get an error if no data shows up. I assume it's ok, as we have SLA emails on the production side.
[13:47:39] 10Analytics: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10elukey) >>! In T251858#6108699, @JAllemandou wrote: > Makes sense @elukey - this also means that we'll never get an error if no data shows up. I assume it's ok, as we have SLA...
[13:52:09] 10Analytics: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10JAllemandou) Well here we are, decision is made :) Task description will be updated with the desired solution.
[13:53:13] 10Analytics: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858 (10JAllemandou)
[13:53:36] mforns: do you want to handle T251858 or do you wish me to do it?
[13:53:37] T251858: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858
[13:56:45] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (10Ottomata) > However the idea of having a single job for all is not appealing if that job is not resilient to some-
[14:03:46] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:16:24] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[14:18:23] mforns: should we absent the timers --^ ?
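For the repeated refine_sanitize_* alerts above, manual triage on an-launcher1001 looks roughly like the following ("absenting" the timer, as suggested at [06:23] and [14:18], would instead be a puppet change removing the unit). The commands are standard systemd; the unit name comes from the alerts and is assumed to be a timer-triggered .service:

    # Inspect why the last run of the unit behind the icinga check failed.
    systemctl status refine_sanitize_eventlogging_analytics_immediate.service
    journalctl -u refine_sanitize_eventlogging_analytics_immediate.service --since today
    # Once the failure is understood (or judged to be noise), clear the failed
    # state so the check recovers until the next scheduled run.
    sudo systemctl reset-failed refine_sanitize_eventlogging_analytics_immediate.service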
[14:25:16] joal: I can work on T251858 if it's ok, my ops week :]
[14:25:17] T251858: hdfs-rsync of mediawiki history dumps fails due to source not present (yet) - https://phabricator.wikimedia.org/T251858
[14:25:45] elukey: whatever you prefer, I'm testing the fix right now, and then I'll proceed to deployment train
[14:25:49] works for me mforns - If there is too much on your plate please let me know (busy ops week!)
[14:25:49] so maybe not super needed
[14:27:33] mforns: ah are you deploying today?
[14:29:03] elukey: yes
[14:29:13] mforns: all right then, <3
[14:29:19] cool :]
[14:31:56] Hey! I got service! I’m at the doctor for vertigo, will miss meetings I think
[14:40:43] 10Analytics, 10Analytics-EventLogging, 10Product-Analytics: EditAttemptStep sent event with "ready_timing": -18446744073709543000 - https://phabricator.wikimedia.org/T251772 (10mpopov) >>! In T251772#6106468, @DLynch wrote: > ` > logEditEvent( 'ready', { > timing: readyTime - window.performance.timing.navi...
[14:45:29] 10Analytics, 10Pageviews-Anomaly: Topviews Analysis of the Hungarian Wikipedia is flooded with spam - https://phabricator.wikimedia.org/T237282 (10Nuria) The bot detection has been deployed, see: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection Closing this ticket as there are no a...
[14:45:37] 10Analytics, 10Pageviews-Anomaly: Topviews Analysis of the Hungarian Wikipedia is flooded with spam - https://phabricator.wikimedia.org/T237282 (10Nuria) 05Open→03Resolved
[14:45:39] 10Analytics, 10Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (10Nuria)
[14:54:29] elukey: please take a look when you have time: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594272/
[14:55:15] nuria: sure, already tested etc..?
[14:57:14] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Druid access for view on event.editeventattempt - https://phabricator.wikimedia.org/T249945 (10Nuria) 05Open→03Resolved
[14:59:31] elukey: i need to test, will test that one and this one as well: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594472/
[15:02:48] nuria: ok so no merge for the moment right
[15:02:49] ?
[15:03:05] elukey: right, will test after standup
[15:33:39] is there any information about recording spark UI output from automated runs for later analysis? Basically, do we have a spark history server?
[15:34:05] hm ebernhar1son no
[15:34:09] we have yarn logs from spark job
[15:34:13] but that's it for history
[15:34:28] right, but the yarn logs from my jobs are > 1M lines, i want to look in the ui and see a thing that says (9 retries) :)
[15:34:37] aye
[15:34:44] i'll see how much work it would be to setup, not sure
[15:52:08] ebernhardson: not sure if this can help but I have tried https://github.com/criteo/babar and found interesting stuff
[15:54:59] joal: interesting, that might help! it turns out one of my jobs has 9 hours of standard deviation in its weekly runs since feb...a little profiling and looking at what it's actually doing is called for :)
[15:55:26] (it took 3 hours once a week when initially deployed, so the current runtime is quite surprising)
[15:57:29] ack ebernhardson
[15:59:56] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Experiment with Druid and SqlAlchemy - https://phabricator.wikimedia.org/T249681 (10cchen) @JAllemandou Thanks for the explanation!
[16:01:01] milimetric: nuria standup
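On the Spark UI question above ([15:33]-[15:57]): even without a history server on the cluster, a job can be asked to persist the event log that backs the UI so it can be replayed later. These are standard Spark configuration knobs, not WMF-specific; the event-log directory, the script name and the application id below are hypothetical examples:

    # Persist the Spark UI event log to HDFS for a scheduled run (sketch only).
    spark2-submit \
        --master yarn \
        --conf spark.eventLog.enabled=true \
        --conf spark.eventLog.dir=hdfs:///user/ebernhardson/spark-event-logs \
        my_job.py
    # The raw container logs joal mentions remain available per application:
    yarn logs -applicationId application_1588000000000_12345 > app.log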
[16:02:24] (03CR) 10Mforns: [V: 03+2] "I tested this in the cluster and works!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594283 (https://phabricator.wikimedia.org/T251794) (owner: 10Mforns)
[16:06:34] 10Quarry, 10Data-Services, 10cloud-services-team (Kanban): Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag - https://phabricator.wikimedia.org/T251719 (10JHedden)
[16:12:34] (03CR) 10Nuria: [C: 03+2] Adapt EventLoggingSanitization to snakeyaml version 1.20 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594283 (https://phabricator.wikimedia.org/T251794) (owner: 10Mforns)
[16:42:23] 10Analytics, 10Fundraising-Backlog, 10fundraising-tech-ops, 10Patch-For-Review: Install superset on front end server for analytics - https://phabricator.wikimedia.org/T245755 (10Jgreen) 05Open→03Resolved This is done.
[16:58:38] 10Quarry, 10Data-Services, 10cloud-services-team (Kanban): Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag - https://phabricator.wikimedia.org/T251719 (10bd808) From https://mariadb.com/kb/en/innodb-purge/ > The old versions are kept at least until all transactions older than the t...
[17:02:35] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[17:05:56] * elukey afk for a bit!
[17:14:03] Gone for dinner - back after
[17:15:08] 10Quarry, 10Data-Services, 10cloud-services-team (Kanban): Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag - https://phabricator.wikimedia.org/T251719 (10jcrespo) The ideal solution, (other than modify client query patterns, which we are likely cannot) is not `innodb_max_purge_lag`...
[17:15:09] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[17:30:55] joal: I moved your sqoop task from Ready to deploy to Done in the Kanban, I think you already deployed it yesterday. please move back if I messed up
[17:37:12] nuria: what do you think of changing the recipients of the traffic entropy data quality alarms to: just you, sukhbir and me (for now).
[17:39:14] fdans: would you like me to deploy something Wikistats today?
[17:44:02] (03PS1) 10Mforns: Update changelog for version 0.0.124 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594534
[17:44:40] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Merging for deployment" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/594534 (owner: 10Mforns)
[17:44:55] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (10Ottomata) Ok assuming we had 1. and 2. We could then have. ==== Camus Job 'event' All active stream's topics fou...
[17:47:19] Starting build #42 for job analytics-refinery-maven-release-docker
[17:50:31] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata)
[17:51:01] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata)
[17:51:05] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Refactor EventBus mediawiki configuration - https://phabricator.wikimedia.org/T229863 (10Ottomata)
[17:51:17] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata) {T229863} is relevant too.
[17:57:20] Project analytics-refinery-maven-release-docker build #42: 09SUCCESS in 10 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/42/
[18:00:51] Starting build #14 for job analytics-refinery-update-jars-docker
[18:01:09] Project analytics-refinery-update-jars-docker build #14: 09SUCCESS in 18 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/14/
[18:01:09] (03PS1) 10Maven-release-user: Add refinery-source jars for v0.0.124 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/594536
[18:02:00] !log Deployed refinery-source using the awesome new jenkins jobs :]
[18:02:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:05:28] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata) Hm, but then again, right now static stream config is manually declared for each EventGate instance. eventgate-main onl...
[18:08:35] mforns: you want me to merge that refine jar bump?
[18:08:43] :D
[18:08:58] ottomata: wait
[18:09:01] k
[18:09:44] ottomata: I need to deploy refinery for that jar to be accessible everywhere, right?
[18:10:38] yes
[18:10:57] yea, so I'm still deploying refinery, will ping you when finished :]
[18:13:25] !log Deploying refinery using scap, then deploying onto hdfs
[18:13:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:19:35] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935 (10Ottomata) Let's keep discussion of this on T251609. Moving some of ^ there.
[18:19:46] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (10Ottomata) > I'll work on getting 1. and 2. done. I just created {T251935} for 2, but upon starting to implement th...
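For context on the !log entries above, the refinery deploy that follows the jenkins release is roughly the following two steps; the command names and flags are recalled from the wikitech deploy docs and should be treated as assumptions, not quotes from this log:

    # On the deployment host: ship analytics/refinery to its targets with scap.
    cd /srv/deployment/analytics/refinery
    scap deploy "Analytics weekly train: refinery-source 0.0.124"
    # Then, on the Hadoop side, sync the refinery artifacts to HDFS
    # (assumed helper script name and flags):
    sudo -u analytics kerberos-run-command analytics \
        /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run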
[18:27:10] thanks mforns for moving the task :)
[18:30:22] mforns: I'd rather take a deeper look at the alarms now, i doubt that the thresholds and timeseries we have now are really working cause alarms are being triggered a few times a day, seems more noise than signal
[18:30:46] mforns: Also joal and i are here (on our 1 on 1) if you need help with refinery deploy
[18:30:56] indeed :)
[18:32:08] 10Analytics, 10Analytics-Cluster: Monitoring GPU Usage on stat Machines - https://phabricator.wikimedia.org/T251938 (10Aroraakhil)
[18:39:56] thanks nuria and joal all good for now
[18:47:58] !log Finished deploying refinery using scap, then deploying onto hdfs
[18:47:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:48:43] ottomata: ok :] can you please merge the bump up patch, thxxx
[18:51:10] ok
[18:51:29] mforns: what's this?
[18:51:29] https://gerrit.wikimedia.org/r/c/analytics/refinery/+/594536
[18:51:48] looking
[18:52:25] oh... it seems the patch created by jenkins!
[18:52:37] did that need to be merged before refinery deploy?
[18:52:39] it's not merged...
[18:52:59] yes! but I thought it was merged automatically by jenkins no? it was before
[18:53:26] yeah? dunno
[18:54:29] wow, I never merged such a change when deploying
[18:54:34] is it sth new?
[18:56:47] well, yes...joal refactored the deploy system
[18:56:49] but i haven't used it yet
[18:56:59] so dunno if that is a new step
[18:58:25] k, will ask him when meeting finished
[18:58:36] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (10Ottomata) None of these solutions are great. Perhaps we can just mostly keep the status quo and aim for a slight...
[19:00:39] here I am mforns
[19:00:44] hey joal :]
[19:01:28] mforns: so, here is the thing - the new docker-job sends a CR instead of directly pushing the change as it was before - It allows us to review
[19:01:42] ah! ok no problemo
[19:01:49] Here as you can see, we have the new jars in their correct places, and the link updates
[19:01:54] looks good to me :)
[19:02:10] You can merge that, and the new jars are ready to get deployed :)
[19:02:21] I missed that, yea change looks good
[19:02:33] also will add to the docs that this needs to be manually merged
[19:03:03] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Merging for deployment." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/594536 (owner: 10Maven-release-user)
[19:03:18] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:05:39] ok, re-deploying refinery
[19:08:38] mforns: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FSystems%2FCluster%2FDeploy%2FRefinery-source&type=revision&diff=1865163&oldid=1865161
[19:09:17] joal: thanks :D
[19:09:33] np mforns - thanks for pointing out the lack of docs ;)
[19:09:58] I was also changing them, I think we edited at the same time!
[19:10:06] will remove my changes, because your comments are better
[19:15:57] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[19:28:11] quick (maybe? or just tell me to go to phab or analytics listserv) gutcheck from anyone who is around and might know the answer to this: certain sites/apps provide a wprov= parameter in Wikipedia links so we can see when someone uses that link (https://wikitech.wikimedia.org/wiki/Provenance). when that link is redirected though, the wprov parameter is dropped from the URL and not retained in the x_analytics data either. the
[19:28:11] most common case seems to be that someone shares a desktop link and it is redirected to mobile such as en.wikipedia.org/wiki/?wprov= -> en.m.wikipedia.org/wiki/. is this a bug or is there a good reason why this happens and if I want to count pageviews from that wprov, I need to account for it somehow in my hive query?
[19:37:07] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (10jwang) ` communities with less than 5 technical contributors ` If technical contributor means bot editors, then we can count bot edito...
[19:38:05] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (10Ottomata) Or, I think I just thought of a way to keep things as they are now with all the various Camus and Refine...
[19:49:48] !log Finished re-deploying refinery using scap, then re-deploying onto hdfs
[19:49:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:50:18] hey ottomata, if you're still here, can you merge the patch? I think this time all is set up
[19:51:43] bbiab
[19:58:20] yes
[20:06:00] 10Analytics, 10LDAP-Access-Requests, 10Operations: LDAP access to the wmf group for Antonino Hemmer (superset, turnilo, hue) - https://phabricator.wikimedia.org/T251123 (10colewhite) p:05Triage→03Medium a:03colewhite
[20:07:03] mforns_brb: done
[20:09:58] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-launcher1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:13:42] ottomata and joal, thanks for the help with the deployment train
[20:23:51] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (10Ottomata) I did just ^ in https://gerrit.wikimedia.org/r/c/operations/puppet/+/594565. I th...
[20:37:26] logging off team, byee :]
[20:44:25] 10Analytics, 10Analytics-Kanban, 10Research, 10Patch-For-Review: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (10Isaac) @Milimetric that looks great! Only comment: I assume that the `wmf.mediawiki_page_history` snapshot will alwa...
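On the wprov question asked at [19:28]: until the redirect behaviour is clarified, one way to count those pageviews from a stat box is to look for the parameter both in X-Analytics and in the raw query string. The field names are the usual wmf.webrequest ones; the date and the exact output shape are placeholders, and whether uri_query still carries wprov after the mobile redirect is precisely the open question, so treat this as a hedged sketch:

    kinit   # Kerberos ticket needed before querying Hive on the cluster
    hive -e "
      SELECT x_analytics_map['wprov']                         AS wprov_x_analytics,
             regexp_extract(uri_query, 'wprov=([^&]*)', 1)    AS wprov_uri_query,
             COUNT(*)                                         AS pageview_count
      FROM wmf.webrequest
      WHERE webrequest_source = 'text'
        AND year = 2020 AND month = 5 AND day = 5
        AND is_pageview
        AND (uri_query LIKE '%wprov=%' OR x_analytics_map['wprov'] IS NOT NULL)
      GROUP BY x_analytics_map['wprov'],
               regexp_extract(uri_query, 'wprov=([^&]*)', 1)"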
[23:08:19] 10Analytics, 10LDAP-Access-Requests, 10Operations: LDAP access to the wmf group for Antonino Hemmer (superset, turnilo, hue) - https://phabricator.wikimedia.org/T251123 (10colewhite)
[23:08:57] 10Analytics, 10LDAP-Access-Requests, 10Operations: LDAP access to the wmf group for Antonino Hemmer (superset, turnilo, hue) - https://phabricator.wikimedia.org/T251123 (10colewhite) This can be completed once the patch T251122 has been deployed.
[23:14:19] if anyone is still around, can i get a hdfs chown on /wmf/data/discovery/search_satisfaction/daily/year=2019 to analytics-search? Not sure how it ended up owned by me
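For the chown request just above, the command an admin would run as the HDFS superuser is roughly the following; the kerberos-run-command wrapper usage is an assumption, the chown itself is the standard HDFS client call:

    # Recursively hand the partition back to the analytics-search user.
    sudo -u hdfs kerberos-run-command hdfs \
        hdfs dfs -chown -R analytics-search \
        /wmf/data/discovery/search_satisfaction/daily/year=2019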