[06:54:54] Analytics: Upgrade pandas in spark SWAP notebooks - https://phabricator.wikimedia.org/T222301 (elukey) >>! In T222301#5153342, @Groceryheist wrote: > I see, for Python packages I usually use pip instead of Debian since python tends to move much faster than Debian. Of course, I'm just managing this for mysel...
[07:05:54] Analytics: Upgrade pandas in spark SWAP notebooks - https://phabricator.wikimedia.org/T222301 (Groceryheist) > Having said this, Andrew is planning to work on the Spark 2.4.2 upgrade and he will take a look if pandas could be upgraded as well :) Sweet! Thank you I appreciate it! I understand your answer...
[07:08:16] Analytics, Analytics-Kanban: Mandatory success_email_to parameter in mediawiki_history_check coordinator - https://phabricator.wikimedia.org/T222422 (JAllemandou)
[07:08:31] Analytics, Analytics-Kanban: Mandatory success_email_to parameter in mediawiki_history_check coordinator - https://phabricator.wikimedia.org/T222422 (JAllemandou) a:JAllemandou
[07:08:56] (PS1) Joal: Mandatory success_email_to at coordinator level [analytics/refinery] - https://gerrit.wikimedia.org/r/507914 (https://phabricator.wikimedia.org/T222422)
[07:13:41] Analytics, Analytics-Kanban, Patch-For-Review: Refactor druid data deletion script - https://phabricator.wikimedia.org/T220111 (JAllemandou) Ping @fdans - The patch is not listed here (don't know why), but still needs your review please: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/502858
[07:13:48] Analytics: Upgrade pandas in spark SWAP notebooks - https://phabricator.wikimedia.org/T222301 (elukey) It would be a security problem since keeping track of dependencies for those venvs would require a lot of effort or some automation that we currently don't have.. I'd prefer to wait for common use cases (an...
[07:14:28] !log Restarting mediawiki-history-check_denormalize-coord with missing parameter (patch provided to prevent the coord to start without it)
[07:14:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:24:58] Analytics: Upgrade pandas in spark SWAP notebooks - https://phabricator.wikimedia.org/T222301 (Groceryheist) Ok I see. A hostile dependency could be a big problem. I'm not looking to argue, just sincerely curious. I get involved managing a sort of ad-hoc spark setup on the UW cluster, so maybe I can learn s...
[07:28:04] Analytics: Upgrade pandas in spark SWAP notebooks - https://phabricator.wikimedia.org/T222301 (elukey) We are also open to suggestions, it is a new thing for everybody, so please feel free to create tasks and follow up with us, we don't think you are looking to argue, on the contrary!
[07:29:09] joal: bonjour :)
[07:29:15] Good morning elukey :)
[07:29:20] I still have no explanation why we have analytics:hdfs
[07:29:42] :S
[07:29:51] This is no good :(
[07:29:59] acls are disabled, parent dirs don't have grp hdfs afaics
[07:30:20] but there might be some exceptions that we are hitting
[07:30:24] that could be legitimate?
[07:30:26] elukey: could it be that the analytics user is not present/in specific group on workers/an-coord?
[07:30:48] elukey: I'm pretty sure it's a corner-case of user not being in groups or something like that
[07:33:26] I tried from stat1004 and an-coord1001 to
[07:33:37] 1) create /user/analytics/test
[07:33:45] 2) chown with analytics:hadoop
[07:33:48] 3) touchz a file
[07:33:51] -rw-r--r-- 3 analytics hadoop 0 2019-05-03 07:32 /user/analytics/test/test.txt
[07:33:55] that is good
[07:34:28] joal: would it be an issue to create a "test" dir under the aqs path?
[07:34:32] to see if we can repro
[07:35:03] elukey: oozie doesn't run from stat1004, maybe it's the issue?
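The manual check at 07:33 (create /user/analytics/test, chown it to analytics:hadoop, touchz a file) is easy to repeat from several hosts if it is scripted. A minimal sketch in Python, assuming the standard `hdfs dfs` subcommands; it only builds and prints the command lines by default. The trailing `-ls` step and the dry-run switch are additions for illustration, and which user runs the commands is left to the caller.

```python
import shlex
import subprocess
from typing import List


def repro_commands(base_dir: str, owner: str = "analytics", group: str = "hadoop") -> List[List[str]]:
    """Build the hdfs dfs commands for the manual group-inheritance check.

    Mirrors the 07:33 steps (mkdir, chown owner:group, touchz a file),
    plus an -ls (an assumption) to inspect the resulting owner/group.
    """
    test_file = f"{base_dir}/test.txt"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", base_dir],
        ["hdfs", "dfs", "-chown", f"{owner}:{group}", base_dir],
        ["hdfs", "dfs", "-touchz", test_file],
        ["hdfs", "dfs", "-ls", base_dir],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step (e.g. a permission error).
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(repro_commands("/user/analytics/test"))
```

Keeping dry-run as the default means the script can be pasted onto a host and inspected before anything touches HDFS.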
[07:35:22] elukey: no issue creating a folder in aqs path :)
[07:35:46] elukey: create a folder, touch file/do some testing, clean remove the things :)
[07:36:26] joal: I also tried from an-coord1001
[07:36:30] hm
[07:36:43] The last one thing: the hive job is run from workers
[07:41:23] the user is not on those, but neither all the others (like analytics-search or any other "real" user)
[07:43:19] drwxr-xr-x - analytics hadoop 0 2019-05-03 07:42 /wmf/data/wmf/aqs/hourly/year=2019/month=5/day=2/test_elukey
[07:43:26] this is from stat1004
[07:43:34] Analytics, Analytics-Kanban: Fix jobs after mediawiki-history refactor - https://phabricator.wikimedia.org/T222425 (JAllemandou)
[07:43:35] removing and also trying from an-coord1001
[07:43:58] :/
[07:44:23] Analytics, Analytics-Kanban: Fix jobs after mediawiki-history refactor - https://phabricator.wikimedia.org/T222425 (JAllemandou) a:JAllemandou
[07:44:57] same thing
[07:45:06] pffff
[07:45:22] must be oozie related
[07:45:38] those dirs are created by hive when partitioning right?
[07:45:45] correct
[07:46:04] (PS1) Joal: Fix wikidata-coeditor job after MWH-refactor [analytics/refinery/source] - https://gerrit.wikimedia.org/r/507916 (https://phabricator.wikimedia.org/T222425)
[07:48:45] and https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive seems to confirm that Hive doesn't do anything weird
[07:49:42] but the behavior paragraph is interesting
[07:50:41] "Hence, user that hive runs as (either 'hive' or the logged-in user in case of impersonation), must be super-user or owner of the file whose security properties are going to be changed"
[07:57:06] https://github.com/dpgaspar/Flask-AppBuilder/commit/174b07e1e18da0a8e84bba1f0e275d279a837c6a
[07:57:09] \o/
[07:59:32] You rock elukey :)
[08:00:36] if he releases a new FAB version I'll include it into our version of superset
[08:00:48] because it will take a huge amount of time before hitting superset
[08:00:52] (upstream)
[08:00:55] sigh
[08:01:09] I have two pull requests opened that are sitting there doing nothing
[08:01:28] not even sure how to drag attention anymore
[08:01:34] not an healthy way to manage community
[08:01:34] (CR) Joal: [V: +1] "Checked on cluster" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/507916 (https://phabricator.wikimedia.org/T222425) (owner: Joal)
[08:01:46] anyway, back to hive
[08:10:01] so IIUC hdfs basic inheritance for groups work fine
[08:10:35] in this case, the only explanation is that we are hitting a hive corner case
[08:11:06] so we could try to create partitions on a temp dir with group 'hadoop' and see if we can repro
[08:22:01] Let's try that - Fake table on tmp folder
[08:24:03] never done it though :)
[08:24:16] we don't need data right?
[08:24:22] I can do it elukey, except for me not being able to sudo as analytics :(
[08:24:24] just a directory and then create table from bla bla
[08:24:36] yep I can do that part
[08:24:43] maybe we can use a gist?
[08:25:14] elukey: create a folder with correct perms: /user/analytics/test_hive - I'll have a gist for you in minutes :)
[08:29:31] elukey: https://gist.github.com/jobar/055584333706e894860d91bfa9457f96
[08:29:31] drwxr-xr-x - analytics hadoop 0 2019-05-03 08:28 /user/analytics/test_hive
[08:30:00] elukey: you should run hive being analytics obviously
[08:30:14] yep yep
[08:30:48] Line 4:8 Expression not in GROUP BY key 'cache_status'
[08:31:05] Arf - sorry
[08:31:19] Updated
[08:31:30] too fast of a copy-paste :)
[08:31:51] ok I am seeing https://yarn.wikimedia.org/cluster/app/application_1555511316215_60333
[08:32:30] Loading data to table analytics.test_aqs_hourly partition (year=2019, month=5, day=1, hour=0)
[08:32:33] chgrp: changing ownership of 'hdfs://analytics-hadoop/user/analytics/test_hive/year=2019/month=5/day=1/hour=0': User does not belong to hadoop
[08:32:36] ta daaan
[08:32:45] there you go :)
[08:33:04] elukey: I'd be interested to make the same test with any of our local user
[08:33:18] I think it's a classical error message I often see
[08:34:20] very interesting
[08:34:21] drwxr-xr-x - analytics hadoop 0 2019-05-03 08:32 /user/analytics/test_hive/year=2019/month=5/day=1
[08:34:25] right
[08:34:27] drwxr-xr-x - analytics hdfs 0 2019-05-03 08:32 /user/analytics/test_hive/year=2019/month=5/day=1/hour=0
[08:34:36] wow
[08:34:41] that I don't explain :D
[08:34:44] even moar funky
[08:35:05] but it is what the error message says
[08:35:33] https://issues.apache.org/jira/browse/HIVE-6648
[08:36:01] not sure if strictly related but..
[08:36:30] even if we have 1.1
[08:37:50] it seems like the dirs are created via regular hdfs mkdir
[08:38:09] but then the last leaf is chgrp()
[08:38:24] that fails since 'analytics' is not in hadoop
[08:38:34] and it remains 'hdfs'
[08:39:58] ok
[08:40:06] does it make sense?
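The explanation reached at 08:37 to 08:38 (the leaf hour=0 directory keeps group hdfs because the follow-up chgrp to hadoop fails, as analytics is not in that group) matches how HDFS checks a group change: a non-superuser must own the path and be a member of the target group. The sketch below models only that check in Python. It is not HDFS code; the superuser name 'hdfs' and the group memberships are assumptions taken from the conversation, and it does not try to explain why the leaf was created with group hdfs in the first place.

```python
from dataclasses import dataclass
from typing import Dict, Set

SUPERUSER = "hdfs"  # assumption: 'hdfs' is the HDFS superuser account


class AccessControlError(Exception):
    """Stand-in for the permission error returned by the NameNode."""


@dataclass
class Inode:
    path: str
    owner: str
    group: str


def chgrp(inode: Inode, caller: str, new_group: str, groups_of: Dict[str, Set[str]]) -> None:
    """Simplified model of the HDFS rule for changing a path's group."""
    if caller == SUPERUSER:
        # The superuser may set any group.
        inode.group = new_group
        return
    if caller != inode.owner:
        raise AccessControlError(f"Permission denied: {caller} is not the owner of {inode.path}")
    if new_group not in groups_of.get(caller, set()):
        # Owner, but not a member of the target group: the case Hive hit.
        raise AccessControlError(f"User {caller} does not belong to {new_group}")
    inode.group = new_group


if __name__ == "__main__":
    # Hypothetical memberships: 'analytics' is not in 'hadoop' (per the 08:38 message).
    groups_of = {"analytics": {"analytics"}}
    leaf = Inode("/user/analytics/test_hive/year=2019/month=5/day=1/hour=0", owner="analytics", group="hdfs")
    try:
        chgrp(leaf, "analytics", "hadoop", groups_of)
    except AccessControlError as err:
        print(err)  # prints: User analytics does not belong to hadoop
    print(leaf.group)  # prints: hdfs
```

Under this model the two ways out are the ones the thread goes on to weigh: put the job user in the target group, or pick a group the job user already belongs to.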
[08:40:31] it kinda does - But the case is probably everywhere in hdfs :S
[08:41:45] (PS1) Joal: Fix edit_hourly job after MWH refactor [analytics/refinery] - https://gerrit.wikimedia.org/r/507923 (https://phabricator.wikimedia.org/T222425)
[08:42:08] (PS1) Joal: Fix geoeditors job after MWH refactor [analytics/refinery] - https://gerrit.wikimedia.org/r/507924 (https://phabricator.wikimedia.org/T222425)
[08:42:47] Analytics, Analytics-Kanban, Patch-For-Review: Fix jobs after mediawiki-history refactor - https://phabricator.wikimedia.org/T222425 (JAllemandou)
[08:44:27] (CR) Joal: [V: +1] "Query checked on cluster." [analytics/refinery] - https://gerrit.wikimedia.org/r/507923 (https://phabricator.wikimedia.org/T222425) (owner: Joal)
[08:44:40] (CR) Joal: [V: +1] "Query checked on cluster." [analytics/refinery] - https://gerrit.wikimedia.org/r/507924 (https://phabricator.wikimedia.org/T222425) (owner: Joal)
[08:54:22] Analytics, Analytics-Kanban: Move the three sqoop jobs to oozie to ease administration and manual runs - https://phabricator.wikimedia.org/T222378 (JAllemandou) > we should check that and decline the task if the migration is complete. In that case we can just reschedule all sqoops to start on the 1st of...
[08:56:02] brb
[08:57:57] Analytics, Analytics-Kanban: Move the three sqoop jobs to oozie to ease administration and manual runs - https://phabricator.wikimedia.org/T222378 (JAllemandou) One suggestion to keep things easier than having to use oozie is to make a single timer instead of 2, launching the 2 jobs in sequence with an &...
[08:59:16] Analytics-Cluster, Analytics-Kanban: refinery-sqoop-mediawiki-production on an-coord1001 needed restart - https://phabricator.wikimedia.org/T222294 (JAllemandou) Closing this task as a duplicate of the one we created to describe the issue.
[08:59:54] Analytics-Cluster, Analytics-Kanban: refinery-sqoop-mediawiki-production on an-coord1001 needed restart - https://phabricator.wikimedia.org/T222294 (JAllemandou)
[08:59:58] Analytics, Analytics-Kanban: Move the three sqoop jobs to oozie to ease administration and manual runs - https://phabricator.wikimedia.org/T222378 (JAllemandou)
[09:28:33] mmmm also chgrp: changing ownership of... seems to be related to the cli command
[09:28:44] that is probably used
[09:29:05] the main question is why hive tries to set the perms
[09:35:14] (PS1) Joal: Remove wikipedia-zero as program is over [analytics/refinery] - https://gerrit.wikimedia.org/r/507933 (https://phabricator.wikimedia.org/T213770)
[09:42:53] other main question is - do we need to have analytics:hadoop ? or :hdfs is sufficient?
[09:45:51] elukey: I actually have no clue if there is a strong difference :(
[09:46:35] the only thing that I can think of is that yarn also belongs to 'hadoop', so sometimes it might be useful ?
[09:48:16] but afaics we don't actively leverage 'hadoop'
[09:48:40] because we either set perms for user:group and no other (like hdfs:analytics-private-data)
[09:48:43] or other is rw
[09:48:47] err rx
[09:50:35] (PS2) Joal: Fix wikidata-coeditor job after MWH-refactor [analytics/refinery/source] - https://gerrit.wikimedia.org/r/507916 (https://phabricator.wikimedia.org/T222425)
[09:50:41] elukey: I feel helpless on the topic :(
[09:59:40] joal: me too, I don't have a lot of context, but I guess that we have been using 'hadoop' as generic group (like a placeholder)
[10:00:09] probably elukey - Maybe Andrew knows more?
[10:00:20] actually - I hope Andrew knows more :)
[10:00:52] in hadoop we have yarn/mapred/hdfs
[10:25:46] joal: setting hive.warehouse.subdir.inherit.perms = false; in the client seems working
[10:25:59] hoooooo
[10:26:05] I didn't know about that param
[10:27:24] I am trying to double check if not using it yields back analytics:hdfs
[10:28:09] ah no sorry false alarm, the param has no effect
[10:28:11] sigh
[10:28:24] :(
[10:29:13] I was checking the 'day' dir perms, but it is the hour one that gets borked
[10:38:55] the only thing that changes with the param is that I don't see the "chgrp: changing ownership of..." anymore
[10:38:58] but perms are no ok
[10:39:00] *not o
[10:39:02] ufff
[10:39:54] https://jira.apache.org/jira/browse/HIVE-15433
[10:39:59] setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in hive configuration
[10:41:21] so we could test it in hive-site.xml
[10:42:37] Analytics, Continuous-Integration-Config, Release-Engineering-Team (Watching / External): Status of analytics/limn-*-data git repositories? - https://phabricator.wikimedia.org/T221064 (hashar) Open→Resolved Thank you Dan! I just thought all those repositories got forgotten and did not serve...
[10:46:06] Analytics, EventBus, Growth-Team, MediaWiki-extensions-WikimediaEvents, and 2 others: Edits to Flow pages result in a page-links-change event with no performer - https://phabricator.wikimedia.org/T216726 (Cirdan)
[10:46:30] I think at this point that hive.warehouse.subdir.inherit.perms is the likely culprit joal
[10:47:45] reading more https://issues.apache.org/jira/browse/HIVE-6892
[10:47:51] okey
[10:48:30] so setting it in hive-site.xml should work, but what could be the side effect ?
[10:57:03] Analytics, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (fgiunchedi) >>! In T196066#5153205, @Ottomata wrote: > Hm but also, whatever we replace varnishkafka with...
[11:09:05] Analytics, Analytics-Kanban, Patch-For-Review: Clickstream dataset for Persian Wikipedia only includes external values - https://phabricator.wikimedia.org/T191964 (JAllemandou) Hi @Ladsgroup - I'm extremely sorry for having taken the time to answer you faster :( I've quickly tested your patch and it...
[11:09:13] (CR) Joal: [C: +2] "Tested, looks good." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/472700 (https://phabricator.wikimedia.org/T191964) (owner: Ladsgroup)
[11:14:44] (Merged) jenkins-bot: ClickstreamBuilder: Decode refferer url to utf-8 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/472700 (https://phabricator.wikimedia.org/T191964) (owner: Ladsgroup)
[11:23:40] Analytics, Analytics-Kanban, Patch-For-Review: Clickstream dataset for Persian Wikipedia only includes external values - https://phabricator.wikimedia.org/T191964 (JAllemandou) a:Nuria→JAllemandou
[11:41:38] joal: tried to make a summary in https://phabricator.wikimedia.org/T220971#5155593
[11:41:51] I'd say to wait for Andrew's opinion and then see how to proceed
[11:43:10] Many thanks elukey - awesome work as always :)
[11:44:12] well you did a big chunk of it :)
[11:44:36] I really hope that this is the last blocker :D
[11:44:58] going off for lunch break!
[11:45:00] ttl :)
[11:45:14] later :)
[12:57:44] Analytics, WMDE-Analytics-Engineering: Install Gensim for Python3 on stat1007 - https://phabricator.wikimedia.org/T222359 (GoranSMilovanovic) @Ottomata Hey, Otto thank you for a lighting fast response on this one. Here is the situation: - the [[ http://wmdeanalytics.wmflabs.org/ | Wikidata Concepts Mon...
[13:14:45] mornin
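Per HIVE-15433 (linked at 10:39), setting hive.warehouse.subdir.inherit.perms in a Hive session does not override the value from the Hive configuration, which is why the thread moves to testing it in hive-site.xml. When trying that, it helps to confirm what the file actually contains. A minimal Python sketch, assuming the usual Hadoop-style configuration/property/name/value XML layout; the sample XML and its value are illustrative only, not the cluster's real config, and the log does not say where hive-site.xml lives on the hosts.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> called `name`, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


# Illustrative hive-site.xml fragment (hypothetical, not copied from the cluster).
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>false</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(get_property(SAMPLE_HIVE_SITE, "hive.warehouse.subdir.inherit.perms"))
```

A missing property returns None rather than a default, so "not set" and "set to false" stay distinguishable.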
[13:14:45] mornin [13:22:37] o/ [13:39:50] 10Analytics, 10WMDE-Analytics-Engineering: Install Gensim for Python3 on stat1007 - https://phabricator.wikimedia.org/T222359 (10elukey) @GoranSMilovanovic have you tried to use https://wikitech.wikimedia.org/wiki/SWAP and install the package in there? pip is available and you could start using it straight awa... [13:52:02] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "no this is entirely my bad, I'll do a code search now" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507924 (https://phabricator.wikimedia.org/T222425) (owner: 10Joal) [13:52:40] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fix edit_hourly job after MWH refactor [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507923 (https://phabricator.wikimedia.org/T222425) (owner: 10Joal) [13:53:15] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Mandatory success_email_to at coordinator level [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507914 (https://phabricator.wikimedia.org/T222422) (owner: 10Joal) [13:59:16] joal: just trying to keep things straight, event_user_creation_timestamp is not necessarily the earliest, right? The earliest one is usually the first edit timestamp? [13:59:28] or no... 
it's mixed, right [13:59:28] correct milimetric [14:00:00] in searching for other possible problems with the new schema I found metrics that now I'm not sure how they should be computed [14:00:13] We should use MIN(MIN(user_registration_timestamp, user_creation_timestamp), user_first_edit_timestamp) [14:00:17] but I guess event_user_creation_timestamp is a good timestamp to use for now, maybe we can do min, yea [14:00:58] milimetric: given metrics is not used as of today (not that I know at least), and it's not broken, I didn't made the effort to fix [14:01:00] oof, that's a mouthful, I'd hate to add another column but it'd be nice to have user_first_seen or something like that [14:01:07] yeah, that's fine [14:01:17] milimetric: yeah, was thinking about it while writing [14:01:57] milimetric: need to go catch kids see you at standup - Thanks for the reviews :) [14:02:06] 'course [14:06:07] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "double checked all changes to the schema from this commit: https://github.com/wikimedia/analytics-refinery/commit/4301e0148400896968a0d68d" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/507924 (https://phabricator.wikimedia.org/T222425) (owner: 10Joal) [14:07:24] (03CR) 10Milimetric: [C: 03+2] Fix wikidata-coeditor job after MWH-refactor [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507916 (https://phabricator.wikimedia.org/T222425) (owner: 10Joal) [14:12:25] (03Merged) 10jenkins-bot: Fix wikidata-coeditor job after MWH-refactor [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/507916 (https://phabricator.wikimedia.org/T222425) (owner: 10Joal) [14:22:19] 10Analytics, 10WMDE-Analytics-Engineering: Install Gensim for Python3 on stat1007 - https://phabricator.wikimedia.org/T222359 (10GoranSMilovanovic) @elukey @Ottomata > ... to create your own Python virtual environment on stat1007 and pip install the package ... Done. Thank you for your support. 
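At 14:00 the suggestion is to take the earliest of user_registration_timestamp, user_creation_timestamp and user_first_edit_timestamp, possibly exposed as a user_first_seen column. Written as MIN(MIN(a, b), c) it is really a per-row minimum over three fields, any of which may be missing for a given user, so nulls need explicit handling. A minimal Python sketch of that logic that skips missing values; the function name is hypothetical, and this is not how the mediawiki_history job computes anything today.

```python
from datetime import datetime
from typing import Optional


def user_first_seen(
    registration: Optional[datetime],
    creation: Optional[datetime],
    first_edit: Optional[datetime],
) -> Optional[datetime]:
    """Earliest of the three user timestamps, ignoring missing (None) values.

    Returns None only when all three are missing.
    """
    candidates = [ts for ts in (registration, creation, first_edit) if ts is not None]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical user: no registration timestamp, first edit before the creation event.
    print(user_first_seen(None, datetime(2010, 3, 2), datetime(2010, 3, 1)))  # 2010-03-01 00:00:00
```

Filtering out missing values first keeps one absent field from hiding the other two, which is the "it's mixed" case raised at 13:59.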
[14:22:41] Analytics, WMDE-Analytics-Engineering: Install Gensim for Python3 on stat1007 - https://phabricator.wikimedia.org/T222359 (GoranSMilovanovic)
[14:22:55] Analytics, WMDE-Analytics-Engineering: Install Gensim for Python3 on stat1007 - https://phabricator.wikimedia.org/T222359 (GoranSMilovanovic) Open→Resolved a:GoranSMilovanovic
[14:32:27] Analytics, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (Ottomata) I don't think Magnus would build it into librdkafka itself. But, a 3rd party C library could b...
[15:00:41] superset is not on IRC but on slack
[15:00:48] (the project devs I mean)
[15:01:11] so nice that once I move between chrome tabs the only one hanging and stopping is the slack one
[15:01:13] sigh
[15:02:22] elukey: you are on mac ya?
[15:02:25] try out franz!
[15:02:29] i'm using it with slack and irc
[15:02:47] actually its on linux too i think
[15:05:39] I'll try it :)
[15:08:50] ottomata: (if you have time) is analytics-users a good group for the hdfs files? It doesn't seem used a lot, buut probably better than 'analytics', I like it but it shouldn't be used that much
[15:09:32] just repsonding on phab!
[15:09:35] ah sorry you just answered in the phab!
[15:09:35] responded*
[15:09:36] :)
[15:09:37] :)
[15:09:45] no idea what it is best
[15:09:53] ya me neither, analytics is fine with me
[15:09:54] probably simpler
[15:10:07] maybe we'd have a reason to restrict other perms for analytics-users?
[15:10:10] but i don't think we would.
[15:10:19] but we agree that 'hadoop' is more a placeholder than something that we use right?
[15:10:19] if we did we'd chown group to hdfs or something
[15:10:54] yeah i think that maaybe hadoop is more for hadoop/yarn system stuff rather than owning data files
[15:10:59] 'hdfs' might be something agnostic like 'hadoop', good for all use cases?
[15:10:59] haven't thought deeply about it though
[15:11:08] hdfs sounds better for owning files in hdfs
[15:11:13] group or user if we had to
[15:11:15] but ya
[15:11:41] not sure if i this the way it is but
[15:11:48] maybe all hadoop system users would be in the hadoop group
[15:11:52] e.g. hdfs, yarn, etc.
[15:12:02] hdfs is hdfs super user
[15:12:09] and if we need to we can group own files in hdfs as hdfs
[15:12:16] but, for our regular data
[15:12:25] analytics owner, with appropriate group ownership is best
[15:12:40] either privatedata-users or 'analytics' (or analytics-users, whichever you prefer)
[15:12:45] makes sense
[15:12:52] as long as we are consistent i think either are fine
[15:12:58] maybe 'analytics' is better
[15:13:02] the more we shift away from using hadoop-system-users the better
[15:13:07] because we can use it in places where 'analytics-users' might not exist
[15:13:10] agree
[15:13:20] let's use analytics then
[15:13:22] k
[15:13:28] I'll send an email to the team so we can decided
[15:13:33] *decide
[15:25:34] franz is working better on linux, it had trouble with irccloud initially, but I don't think it has other ways of connecting to IRC other than IRCcloud
[15:31:22] Analytics, Analytics-Kanban, EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (Ottomata) Looking a bit better! ` SELECT count(*) from wmf_raw.ApiAction where year = 2019 AND month = 5 AND day = 3 AND hour BETWEEN 9 AND 12 AND ts >= 1556...
[16:00:51] milimetric, joal : standduppp
[16:00:55] ottomata: standduppp
[16:14:47] elukey, hey, I've been wondering how big a job it would be to migrate deployment-zookeeper02 from jessie to stretch. I'm not familiar with zookeeper
[16:16:25] Krenair: o/ - IIRC we have a single host in deployment-prep, so some downtime is surely needed. We could create a new instance and move everything to it, starting from a fresh state. It shouldn't be a huge deal for deployment-prep, but some testing might be needed
[16:16:56] if you want we can open a task and then list what is needed in there
[16:17:19] Analytics: 15.wikipedia.org missclassified as a pageview, same for query.wikidata.org - https://phabricator.wikimedia.org/T222460 (Nuria)
[16:18:08] Analytics: 15.wikipedia.org missclassified as a pageview, same for query.wikidata.org - https://phabricator.wikimedia.org/T222460 (Nuria) a:Joalpe
[16:18:17] Analytics: 15.wikipedia.org missclassified as a pageview, same for query.wikidata.org - https://phabricator.wikimedia.org/T222460 (Nuria) a:Joalpe→JAllemandou
[16:23:02] elukey, that would be helpful. it's not urgent or anything (jessie still has a year), I've just noticed that prod is already ahead on all of this
[16:23:28] yeah makes sense, count me in when you want to do it
[16:23:38] I'll not have a ton of time to dedicate but I can help for sure
[16:25:01] I'll make a task, thanks
[16:35:38] https://phabricator.wikimedia.org/T222461
[16:48:27] nice :)
[16:58:45] Analytics, MinervaNeue, Product-Analytics, Readers-Web-Backlog, and 2 others: [Spike ??hrs] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva) Open→Stalled p:Normal→Low dropping as low until the project is scheduled
[17:07:09] Hey team, anything needed from me before I leave for tonight?
[17:14:28] not from me!
[17:19:16] Ok then, gone for tonight :)
[17:19:26] Have a good weekend team
[17:24:09] you too!
[17:29:12] * elukey off!
[17:37:01] joal: when snapshot is done will e-mail product analytics right?
[18:05:19] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Set up automated email to report completion of mediawiki_history snapshot and Druid loading - https://phabricator.wikimedia.org/T206894 (Neil_P._Quinn_WMF) We just got the email announcing the April snapshot—thank you! One request...
[18:39:11] Analytics, Analytics-Kanban, EventBus: Port usage of mediawiki_ApiAction to mediawiki_api_request - https://phabricator.wikimedia.org/T222267 (Ottomata) @bd808 @Tgr https://github.com/bd808/action-api-analytics/pull/1 I tested the selects on theses queries, I think they all work as expected. It'd...
[18:45:26] Analytics, Analytics-EventLogging, Analytics-Kanban, Fundraising-Backlog, and 3 others: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata) Ok @chelsyx, our fix is out. Can you try changing the schema again?
[19:13:09] (PS1) Ottomata: Fix EventLoggingSchemaLoader to properly set useragent is_bot and is_mediawiki fields as booleans [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508007 (https://phabricator.wikimedia.org/T215442)
[19:13:57] yar nuria, found simple a bug in eventloggingschemaloader: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/508007/1/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/jsonschema/EventLoggingSchemaLoader.java
[19:16:41] (PS1) Ottomata: EventLoggingSchemaLoader - Don't include unused timestamp field in EventCapsule [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508008 (https://phabricator.wikimedia.org/T215442)
[19:35:37] (PS2) Milimetric: [WIP] Change projection of world map to eckert 3 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/504429 (https://phabricator.wikimedia.org/T218045)
[19:43:37] Analytics, Research-management: Test GPUs with an end-to-end training task (Photo vs Graphics image classifier) - https://phabricator.wikimedia.org/T221761 (dr0ptp4kt) As an example, https://upload.wikimedia.org/wikipedia/commons/thumb/7/7f/NY_308_in_Rhinebeck_4.jpg/800px-NY_308_in_Rhinebeck_4.jpg maps t...
[19:50:22] (CR) Nuria: [C: +2] Fix EventLoggingSchemaLoader to properly set useragent is_bot and is_mediawiki fields as booleans (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508007 (https://phabricator.wikimedia.org/T215442) (owner: Ottomata)
[19:52:37] ottomata: bad /me for not having seen that before
[19:54:19] (CR) Nuria: [C: +2] EventLoggingSchemaLoader - Don't include unused timestamp field in EventCapsule [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508008 (https://phabricator.wikimedia.org/T215442) (owner: Ottomata)
[19:55:59] (Merged) jenkins-bot: Fix EventLoggingSchemaLoader to properly set useragent is_bot and is_mediawiki fields as booleans [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508007 (https://phabricator.wikimedia.org/T215442) (owner: Ottomata)
[20:01:17] (Merged) jenkins-bot: EventLoggingSchemaLoader - Don't include unused timestamp field in EventCapsule [analytics/refinery/source] - https://gerrit.wikimedia.org/r/508008 (https://phabricator.wikimedia.org/T215442) (owner: Ottomata)
[20:59:47] Analytics, Analytics-EventLogging, Analytics-Kanban, Fundraising-Backlog, and 3 others: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata)
[21:03:06] Analytics, Analytics-Kanban, EventBus, Patch-For-Review: Make Refine use JSONSchemas of event data to support Map types and proper types for integers vs decimals - https://phabricator.wikimedia.org/T215442 (Ottomata) I just ran a Refine test with JSONSchema on existing Hive table schemas Most wo...
[23:10:18] (CR) Nuria: "Nice, thanks for doing these changes, please see my comments on druid indexing." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/507933 (https://phabricator.wikimedia.org/T213770) (owner: Joal)