[00:14:40] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (Nuria) @JoeWalsh I think we might have a few more problems: - the number of agents doing > 2000 pageviews a day (which was about...
[01:02:36] Analytics, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (JoeWalsh)
[01:03:01] Analytics, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (JoeWalsh)
[01:22:00] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (Nuria) @JoeWalsh I think we might have a few more problems: - the number of agents doing > 2000 pageviews a day (which was about...
[01:24:48] Analytics, Analytics-EventLogging, Release-Engineering-Team, dev-images, Patch-For-Review: EventLogging dev image should have verbose output enabled - https://phabricator.wikimedia.org/T257378 (dbarratt)
[01:26:52] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (Nuria) Will post some more updates tomorrow.
[01:29:52] Analytics, Analytics-EventLogging, Release-Engineering-Team, dev-images, Patch-For-Review: EventLogging dev image should have verbose output enabled - https://phabricator.wikimedia.org/T257378 (dbarratt)
[01:50:31] Analytics-Radar, Operations, Traffic, Privacy: Add request_id to webrequest logs as well as other event records ingested into Hadoop - https://phabricator.wikimedia.org/T113817 (Ottomata) :)
[03:42:42] (CR) Joal: Correct unique-devices per-project-family bug (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/610151 (https://phabricator.wikimedia.org/T257358) (owner: Joal)
[03:47:54] (CR) Joal: "> I have a hard time CR this cause I do not know how to best test it, maybe you can show me how can it be tested (besides running the deno" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/609465 (https://phabricator.wikimedia.org/T255548) (owner: Joal)
[03:50:39] (PS2) Joal: Correct unique-devices per-project-family bug [analytics/refinery] - https://gerrit.wikimedia.org/r/610151 (https://phabricator.wikimedia.org/T257358)
[03:53:54] Analytics, Analytics-Kanban: Fix geoeditors job heisenbug - https://phabricator.wikimedia.org/T257397 (JAllemandou)
[03:53:56] Analytics, Analytics-Kanban: Fix geoeditors job heisenbug - https://phabricator.wikimedia.org/T257397 (JAllemandou) a:JAllemandou
[03:54:48] (PS2) Joal: Update geoeditors daily to not use map-join [analytics/refinery] - https://gerrit.wikimedia.org/r/609510 (https://phabricator.wikimedia.org/T257397)
[04:09:03] (PS2) Joal: Rename pageview_actor_hourly to pageview_actor [analytics/refinery] - https://gerrit.wikimedia.org/r/610159 (https://phabricator.wikimedia.org/T256415)
[06:44:15] bonjour
[07:17:44] Analytics-Clusters, Operations, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (elukey) Some high level issues that came up while talking about netflow on eventgate: * `pmacct` (the daemon that collects netflow data from routers) se...
[07:31:11] Analytics-Clusters, Operations, procurement: RAM expansion for an-master100[1,2] nodes - https://phabricator.wikimedia.org/T257403 (elukey)
[07:46:22] I am a little bit worried about an-coord1001's memory usage
[07:47:03] we'd need I think +32G of ram in theory
[07:47:37] luckily we moved all the jobs and sqoop out some time ago
[07:51:09] and it is not clear yet what is our failover plan if an-coord1001 goes down for hw failure
[08:06:07] Analytics-Clusters, Operations, procurement: RAM expansion for an-master100[1,2] nodes - https://phabricator.wikimedia.org/T257403 (Peachey88) @elukey This should be moved from {S1} to {S4}.
[08:12:36] Analytics-Clusters, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (elukey) In https://github.com/apache/druid/releases/tag/druid-0.16.0-incubating I see that Middlemanagers/Peons can be replaced with the "indexer", a new multi-thr...
[08:16:24] Analytics-Clusters, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (elukey) In https://github.com/apache/druid/releases/tag/druid-0.17.0 the batch ingestion code has been reworked, we might need to change our JSON configs (to be ch...
[08:17:59] Analytics-Clusters, User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (elukey) In https://github.com/apache/druid/releases/tag/druid-0.18.0 there is full support for join operators and initial support for Java 11.
[08:44:14] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Move Archiva to Debian Buster - https://phabricator.wikimedia.org/T252767 (elukey) a:elukey
[08:51:20] Analytics-Clusters: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (elukey)
[10:00:57] Analytics, Analytics-Wikistats: Wikistats New Feature - https://phabricator.wikimedia.org/T257071 (Aklapper) @A455bcd9: Please edit the task summary to summarize **which** "new feature" this task is about, otherwise we have dozens of tasks which all only say "New Feature" in their summaries - see https:/...
[10:03:45] Analytics, Analytics-Wikistats: Wikistats New Feature - https://phabricator.wikimedia.org/T257071 (Aklapper) >>! In T257071#6279217, @A455bcd9 wrote: > FYI on [[ https://stats.wikimedia.org/#/all-projects | Wikimedia Statistics ]], "New feature" directly points to Wikimedia Phabricator. It may be better...
[10:05:38] Analytics, Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (Aklapper) duplicate→Open
[10:05:41] Analytics, Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (Aklapper)
[10:34:42] * elukey lunch!
[10:51:20] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (JoeWalsh) Hi @Nuria Thank you for flagging these issues before re-processing the data and for providing the additional cont...
[10:57:01] Quarry, I18n: Internationalize Quarry UI - https://phabricator.wikimedia.org/T151104 (Tacsipacsi)
[11:47:53] Hi elukey
[12:51:06] o/
[12:57:38] another round of upgrade/rollback for the test cluster
[12:59:49] this is the magic of automation :)
[13:01:37] I am currently wondering what libraries that we use from the cloudera repo are not in maven central
[13:01:42] probably only super old ones
[13:18:36] Analytics-Clusters, Operations, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (Ottomata) > doesn't support emitting HTTP POST to an endpoint (like eventgate). Well if it doesn't support HTTP POST then you won't be moving it to Event...
[13:45:15] o/
[13:46:02] joal: wanna chat before the meeting today?
[13:46:14] did you and ottomata get a chance to go over the doc yet?
[13:47:42] Quarry: Quarry's download as "Excel xlsx" does not seem to work last 2 days - https://phabricator.wikimedia.org/T257453 (Jarekt)
[13:50:32] milimetric: we're in meeting with Andrew, will ping you once finished :)
[14:00:39] milimetric: ready!
[14:01:24] joal: omw cave
[14:06:43] Quarry: Quarry's download as "Excel xlsx" creates an empty file last 2 days - https://phabricator.wikimedia.org/T257453 (Aklapper)
[14:07:35] milimetric: o/ - can you try the superset sqllab presto query again to see if the timeout is 60s now? (I am testing an option)
[14:22:56] (CR) Nuria: [C: +1] "Changes look good, please merge once tested" [analytics/refinery] - https://gerrit.wikimedia.org/r/610151 (https://phabricator.wikimedia.org/T257358) (owner: Joal)
[14:28:09] Analytics, Product-Analytics: Calculate impact of missing mobile app pageviews to high-level metrics - https://phabricator.wikimedia.org/T257373 (Nuria) Please see issues listed at {T256508}
[15:00:55] so for some weird reason, that I don't understand, when doing rollback the journalnodes can get into a weird state in which they start as if their cluster id is zero
[15:01:12] so when the namenode tries to start (in rollback mode) they refuse to collaborate
[15:01:33] meh :( this seems like super bizarre!
[15:01:35] I thought that a restart "fixed" this inconsistency, but apparently more than one restart is needed
[15:01:43] pff :)
[15:02:01] then the journalnode realizes that it holds a valid journal and starts
[15:02:29] elukey: possibly we could do as we do for archiva: give journal-nodes 5 minutes before restarting?
[15:02:46] joal: what do you mean?
[15:02:55] I wait one minute after each restart
[15:03:05] ok
[15:03:45] Analytics-Radar, Better Use Of Data, Product-Analytics, Epic, and 2 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (jlinehan) Adding some long overdue updates here for further discussion if it's necessary. I built 3 different versions of this instrum...
[15:04:00] elukey: any lead I could help in following?
[15:04:18] Hi daniram - welcome to the analytics chan :)
[15:04:32] thank you @joal :)
[15:04:40] joal: no thanks, I'll try to see if I find anything in the logs, really strange
[15:04:45] it happens once in a while
[15:05:03] pretty sure it will happen in prod
[15:05:27] daniram: I have an ask for you - could you please limit your number of concurrent spark-jobs to 2 (1 if large)?
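A minimal sketch of how the "at most 2 concurrent spark-jobs" ask above could be self-checked. It assumes a payload shaped like the YARN ResourceManager REST API response for `/ws/v1/cluster/apps`; the helper names, the limit constant and the sample data are hypothetical, and fetching the payload from the cluster (with authentication) is deliberately left out.

```python
# Sketch only: counts a user's RUNNING Spark applications in a payload shaped
# like the YARN ResourceManager REST API response for /ws/v1/cluster/apps.
# Helper names and sample data are made up for illustration.

MAX_CONCURRENT_SPARK_JOBS = 2  # the limit asked for in the channel


def running_spark_apps(apps_response, user):
    """Return the RUNNING Spark applications owned by `user`."""
    # The API reports {"apps": null} when there are no applications at all.
    apps = (apps_response.get("apps") or {}).get("app") or []
    return [
        app
        for app in apps
        if app.get("user") == user
        and app.get("state") == "RUNNING"
        and app.get("applicationType") == "SPARK"
    ]


def over_limit(apps_response, user, limit=MAX_CONCURRENT_SPARK_JOBS):
    """True when `user` runs more Spark applications than `limit`."""
    return len(running_spark_apps(apps_response, user)) > limit


# Hypothetical payload: three running Spark jobs for example_user, plus noise.
sample = {
    "apps": {
        "app": [
            {"id": "app_1", "user": "example_user", "state": "RUNNING", "applicationType": "SPARK"},
            {"id": "app_2", "user": "example_user", "state": "RUNNING", "applicationType": "SPARK"},
            {"id": "app_3", "user": "example_user", "state": "RUNNING", "applicationType": "SPARK"},
            {"id": "app_4", "user": "example_user", "state": "FINISHED", "applicationType": "SPARK"},
            {"id": "app_5", "user": "example_user", "state": "RUNNING", "applicationType": "MAPREDUCE"},
            {"id": "app_6", "user": "other_user", "state": "RUNNING", "applicationType": "SPARK"},
        ]
    }
}

print(len(running_spark_apps(sample, "example_user")), over_limit(sample, "example_user"))
```

Since the cluster does not enforce a hard limit, a check like this would only be a reminder for the user, not a scheduler-side guard.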
[15:07:56] joal: sure, how do I check the number of spark-jobs running?
[15:08:09] daniram: https://yarn.wikimedia.org/cluster/scheduler
[15:08:18] you should use your shell username + pass
[15:10:45] daniram: you currently have 5 spark jobs running for instance :)
[15:14:08] joal: ok I see :) thank you for the warning
[15:17:15] np daniram - we don't enforce strong limitations on the cluster, we prefer to ask users for self-awareness :)
[15:18:02] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen)
[15:24:06] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (elukey) Hi! We are trying to move users away from Hue if possible, what is your use case? Have you tried, by any chance, https://superset.wikimedia.org/superset/sqllab ?
[15:24:25] Analytics-EventLogging, Analytics-Kanban, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[15:28:45] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) The cookbooks seem to run fine, but sometimes during rollback I get instances of the following problems on journalnodes: ` 2020-07-0...
[15:34:43] so in --^ there seems to be an indication of what is wrong with journals, but no idea why yet
[15:57:22] (PS1) Ebernhardson: Deploy analytics refinery to airflow instances [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/610313
[16:10:16] ebernhardson: o/
[16:10:31] do you have a min to chat?
[16:11:37] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) My use case is to request data from event.WMDEBanner* tables. I'm using beeline right now, but find it hard to read at times. We're planning to request ingestion of Hive data to Druid for the...
[16:16:13] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (elukey) >>! In T257466#6290096, @kai.nissen wrote: > My use case is to request data from event.WMDEBanner* tables. I'm using beeline right now, but find it hard to read at times. > > We're planning to re...
[16:28:46] elukey: sure
[16:29:51] ebernhardson: added a comment in gerrit, basically I fear that we'll have some disk space issue on an-airflow if we deploy refinery
[16:31:29] elukey: hmm, i didn't realize it was that big! But i see on stat1007 that history of deploys is up to 18G. hmm
[16:31:47] really we only need the python part of the package, we could copy it out of hdfs but that seemed more hacky than deploying the repo. hmm
[16:31:53] ah!
[16:31:57] then this is good
[16:32:17] if you see in the refinery's environment there should be a lighter deployment
[16:32:23] that basically does not git-fat
[16:32:31] we used it for notebooks
[16:33:08] ok, i'll look around for how that works and update the patch
[16:33:14] the environment is 'thin'
[16:33:46] [global]
[16:33:47] git_binary_manager: None
[16:33:47] cache_revs: 1
[16:34:02] if you add an-airflow in there we should be good
[16:34:15] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) Great, that's all I need! Sorry, I missed the documentation update. Does that mean, I can also just add `event.wmdebanner*` tables as described in [Analytics/Systems/Superset#Druid_datasource...
[16:35:01] (PS2) Ebernhardson: Deploy analytics refinery to airflow instances [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/610313
[16:35:10] ok i think thats the appropriate change
[16:37:02] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (elukey) >>! In T257466#6290255, @kai.nissen wrote: > Great, that's all I need! Sorry, I missed the documentation update. > > Does that mean, I can also just add `event.wmdebanner*` tables as described in...
[16:37:49] (CR) Elukey: [V: +2 C: +2] Deploy analytics refinery to airflow instances [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/610313 (owner: Ebernhardson)
[16:40:57] elukey@an-airflow1001:~$ ls /srv/deployment/analytics/refinery
[16:40:58] artifacts bin diagrams druid HACKING.md hive oozie python README.md setup.cfg static_data
[16:41:01] ebernhardson: --^
[16:41:03] done :)
[16:41:29] sweet!! thanks, this will help get the code that creates data in the same place that deletes it, so we can ensure when _+2'ing a patch it will clean up and doesn't need followups
[16:51:01] Analytics-Radar, Core Platform Team, Dumps-Generation: HTML Dumps - June/2020 - https://phabricator.wikimedia.org/T254275 (RBrounley_WMF)
[17:07:36] Analytics-EventLogging, Analytics-Kanban, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[17:08:22] Analytics-EventLogging, Analytics-Kanban, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[17:09:54] Analytics-Radar, Better Use Of Data, Product-Analytics, Epic, and 2 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (Nuria) I am a bit lost here as to why do we need clock synchronization at all. If we can assess what tab is visible with the page visib...
[17:17:39] Analytics, Event-Platform, Technical-blog-posts: Story idea for Blog: Wikimedia's Event Platform - https://phabricator.wikimedia.org/T253649 (Ottomata)
[17:17:57] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) Open→Resolved a:kai.nissen Yes, it works fine! I was already going beyond and trying > The documentation that you pointed out is related to druid datasources, so basically after h...
[17:18:08] Analytics, Operations: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) a:kai.nissen→None
[17:19:32] Analytics-Radar, Better Use Of Data, Product-Analytics, Epic, and 2 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (jlinehan) >>! In T248987#6290395, @Nuria wrote: > I am a bit lost here as to why do we need clock synchronization at all. No, we aren...
[17:28:51] Analytics-EventLogging, Analytics-Kanban, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[17:58:29] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) I tried again an upgrade and checked one of the journal nodes, finding a `previous` state: ` elukey@analytics1031:~$ ls /var/lib/had...
[18:13:44] Analytics-EventLogging, Analytics-Kanban, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[18:24:36] * elukey off!
[19:13:10] Quarry: Quarry's download as "Excel xlsx" creates an empty file last 2 days - https://phabricator.wikimedia.org/T257453 (Framawiki) Caused by {T238375}.
[19:23:58] Quarry: Quarry's download as "Excel xlsx" creates an empty file last 2 days - https://phabricator.wikimedia.org/T257453 (Framawiki) Open→Resolved a:Framawiki It's back now.
[19:28:35] Quarry: quarry-web-01 leaks files in /tmp - https://phabricator.wikimedia.org/T238375 (Framawiki) Per @bd808 [[ https://manpages.ubuntu.com/manpages/precise/man8/tmpreaper.8.html | tmpreaper ]] tool can be used to delete old files, like what find command did. It is already used somewhere in puppet files (::...
[19:28:50] Quarry: Quarry's download as "Excel xlsx" creates an empty file last 2 days - https://phabricator.wikimedia.org/T257453 (Framawiki) Btw thanks for the report @Jarekt :)
[19:34:43] Quarry: quarry-web-01 leaks files in /tmp - https://phabricator.wikimedia.org/T238375 (bd808) The [[https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/tmpreaper/manifests/init.pp|::tmpreaper module]] would probably be safe for these instances. Toolforge has to do something s...
[19:43:33] a-team: back in business
[19:45:45] Analytics, Analytics-Kanban, Core Platform Team Workboards (Initiatives): Design Document that proposes an alternative architecture for historic data endpoints - https://phabricator.wikimedia.org/T241184 (Xinbenlv) Thank you this is important work~ Kudos to the team. Can we make https://docs.google....
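For the /tmp leak discussed in T238375 above, a rough Python sketch of the general idea behind an age-based cleanup (what a `find`-based job or tmpreaper does in spirit). This is not tmpreaper's actual behaviour or the puppet module's configuration: the function name is hypothetical, and using modification time with a fixed age threshold is a simplifying assumption.

```python
# Sketch only: delete regular files older than a threshold from one directory.
# Not tmpreaper itself; the age test on mtime is an assumption of this example.
import os
import tempfile
import time
from pathlib import Path


def reap_old_files(directory, max_age_seconds, dry_run=True, now=None):
    """Return files in `directory` older than `max_age_seconds`.

    Files are only unlinked when dry_run is False. Symlinks and
    subdirectories are skipped, and the scan is not recursive.
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_symlink() or not path.is_file():
            continue
        if now - path.stat().st_mtime > max_age_seconds:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return stale


# Demo in a throwaway directory: one file aged a week, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp, "old-result.xlsx")
    new_file = Path(tmp, "new-result.xlsx")
    old_file.write_text("stale")
    new_file.write_text("fresh")
    week_ago = time.time() - 7 * 24 * 3600
    os.utime(old_file, (week_ago, week_ago))

    removed = reap_old_files(tmp, max_age_seconds=24 * 3600, dry_run=False)
    print([p.name for p in removed], sorted(os.listdir(tmp)))
```

Defaulting to `dry_run=True` is a deliberate choice for the sketch, so that pointing it at a real directory lists candidates before anything is deleted.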
[20:09:29] (CR) Joal: "Tested - ready" [analytics/refinery] - https://gerrit.wikimedia.org/r/610151 (https://phabricator.wikimedia.org/T257358) (owner: Joal)
[20:49:15] I was in business on hexchat and remembered all the things I hated about IRC clients on Linux, glad to be back :)
[21:50:08] (CR) Nuria: [C: +2] Correct unique-devices per-project-family bug [analytics/refinery] - https://gerrit.wikimedia.org/r/610151 (https://phabricator.wikimedia.org/T257358) (owner: Joal)
[22:59:16] Analytics: wikistats UI: language menu unusable after making language selection on mobile - https://phabricator.wikimedia.org/T257529 (Nuria)
[23:00:56] Analytics: Wikistats UI: legend for bar graph not visible on mobile UI - https://phabricator.wikimedia.org/T257530 (Nuria)
[23:26:44] Analytics: wikistats UI: language menu not usable after making language selection on mobile - https://phabricator.wikimedia.org/T257529 (Nuria)