[00:06:13] Analytics, Product-Analytics: Hash all pageTokens or temporary identifiers from the EL Sanitization white-list as needed for iOS - https://phabricator.wikimedia.org/T226849 (Aklapper) a:chelsyx→None
[05:48:41] Analytics, Cleanup: Deletion of limn-flow-data repository - https://phabricator.wikimedia.org/T228981 (fdans) @EBernhardson thanks! Adding cleanup project tag since it's safe to proceed.
[05:51:22] Analytics, Editing-team: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (fdans) @Neil_P._Quinn_WMF anything else you think should be saved here? https://github.com/wikimedia/analytics-limn-edit-data
[05:51:33] goooood morning
[07:41:17] o/
[07:55:27] (PS2) Elukey: unique_devices: move oozie coordinators to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529381 (https://phabricator.wikimedia.org/T227257)
[08:32:03] (PS1) Fdans: Add literal transcoding field to media info UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/529713 (https://phabricator.wikimedia.org/T230312)
[08:32:04] Analytics, Analytics-Kanban, StructuredDataOnCommons, Tool-Pageviews: Add literal transcoding to media file properties UDF - https://phabricator.wikimedia.org/T230312 (fdans)
[09:01:42] (PS3) Fdans: Add creation query for new nediarequests dataset [analytics/refinery] - https://gerrit.wikimedia.org/r/528134 (https://phabricator.wikimedia.org/T229817)
[09:01:49] (CR) Fdans: Add creation query for new nediarequests dataset (5 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/528134 (https://phabricator.wikimedia.org/T229817) (owner: Fdans)
[09:04:02] (CR) Fdans: [C: +1] "Scanned visually every file, nothing looks weird!" [analytics/refinery] - https://gerrit.wikimedia.org/r/529381 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[09:55:08] Analytics, Product-Analytics, Readers-Web-Backlog (Needs Product Owner Decisions): Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (phuedx) I'm not against disabling the instrumentation for now (but keeping the dataset per T229042#5370440), given the… a...
[11:09:02] * elukey lunch!
[12:53:57] fdans: you there?
[12:54:12] need to ask a n00b question about wikistats
[13:00:19] elukey: helloooo
[13:00:39] :)
[13:01:04] so my question is - what does a wikistats deploy usually change?
[13:01:17] I am trying to figure out the caching settings that we were discussing last week
[13:01:43] elukey: the contents of the dist dir
[13:02:02] index.html stays with the same name
[13:02:17] but the css and ja have hashes in their names
[13:02:21] js*
[13:21:50] fdans: ah ok, thanks!
[13:22:21] so after a chat with the Traffic team, I validated our assumptions - varnish caches for 24h by default any object without caching headers
[13:22:28] but due to the current layering
[13:22:58] it might happen that the cache gets up to 4d
[13:23:16] so we need to explicitly set headers for .html files in my opinion
[13:24:09] fdans: afaics from thorium, we have only index.html right?
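The split described above (a stable index.html name versus hashed css/js names) can be sketched as a small header policy. This is only an illustration under assumptions: the function name and the max-age value are hypothetical placeholders, and the real change would live in the web server or puppet configuration rather than in Python.

```python
from typing import Optional

# Placeholder value (an assumption for illustration): a short lifetime so a
# freshly deployed index.html is picked up quickly by the caches.
HTML_MAX_AGE_SECONDS = 10


def cache_control_for(path: str) -> Optional[str]:
    """Return an explicit Cache-Control value for a file in dist/, or None.

    Only .html files get an explicit header: their name never changes, so
    without a header a cache may keep serving an old copy. Hashed css/js
    files get a new name on every deploy, so None (keep the cache's
    default behaviour) is enough for them in this sketch.
    """
    if path.lower().endswith(".html"):
        return f"max-age={HTML_MAX_AGE_SECONDS}"
    return None
```

For example, `cache_control_for("index.html")` yields a short max-age, while a hashed asset name such as `app.3f9a1c.js` (a made-up name) yields None.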
[13:27:12] elukey: hmmm yes, wikistats is a single-page application, so only index.html
[13:45:09] (PS4) Fdans: Add creation query for new nediarequests dataset [analytics/refinery] - https://gerrit.wikimedia.org/r/528134 (https://phabricator.wikimedia.org/T229817)
[13:46:29] (PS5) Fdans: Add creation query for new nediarequests dataset [analytics/refinery] - https://gerrit.wikimedia.org/r/528134 (https://phabricator.wikimedia.org/T229817)
[13:52:26] fdans: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/529795/
[13:52:40] whenever you have time let me know if it makes sense
[13:58:44] elukey: max-age is in seconds right?
[14:02:33] fdans: correct
[14:02:47] we can put any amount of seconds, I thought that 10s is reasonable
[14:03:02] yeah, sounds good to me
[14:04:12] elukey: the browser aggressively caches both css and js, so for non-dev usage it won't really make a difference
[14:04:30] (except for deployment problems like the other day)
[14:05:41] (just changed the commit msg, it referenced no-cache)
[14:11:54] hi teamm :]
[14:12:33] no alarms yay
[14:12:50] touch wood
[14:18:04] (CR) Mforns: [C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/529315 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:21:46] (CR) Mforns: [V: +2 C: +2] mediawiki: move oozie coordinators to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529315 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:22:24] (CR) Mforns: [V: +2 C: +2] projectview: move oozie coordinator to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529377 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:26:00] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/529384 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:28:54] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/529387 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:32:21] mforns: o/
[14:32:22] thanks!
[14:32:32] hey elukey :]
[14:33:09] how was the morning
[14:33:10] ?
[14:33:44] all quiet :)
[14:40:58] hey I was just about to take the day off but it turns out it's a holiday today :)
[14:41:06] I ran into all sorts of mess with my contractor
[14:41:24] so I'll go sort it out
[14:41:47] feel free to IRC me if you need me, I'm just running around like a crazy person
[14:41:59] (PS3) Mforns: unique_devices: move oozie coordinators to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529381 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:43:03] (CR) Mforns: [V: +2 C: +2] "LGTM! I just corrected the indentation of one line in one properties file." [analytics/refinery] - https://gerrit.wikimedia.org/r/529381 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:43:47] (PS2) Mforns: Move oozie's Hive utils workflows to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529384 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:44:01] (CR) Mforns: [V: +2 C: +2] Move oozie's Hive utils workflows to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529384 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:45:08] (PS2) Mforns: virtualpageview: move oozie coords to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529385 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:46:32] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/529385 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:47:02] (PS3) Mforns: webrequest: move oozie coords to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529387 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:47:09] (CR) Mforns: [V: +2 C: +2] webrequest: move oozie coords to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/529387 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[15:02:47] mforns, fdans - standup?
[15:02:55] elukey, uuuups
[15:02:58] joining
[15:23:19] Analytics, Analytics-Kanban, StructuredDataOnCommons, Tool-Pageviews, Patch-For-Review: Add literal transcoding to media file properties UDF - https://phabricator.wikimedia.org/T230312 (fdans) p:Triage→High
[15:23:49] fdans, you were saying about the review right?
[15:23:57] will look
[15:24:04] mforns: yes the udf one
[15:24:11] k
[15:24:14] i also replied to the create query one
[15:45:41] (PS2) Mforns: [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612)
[16:05:52] (CR) Mforns: "Just one nit-picky comment" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/529713 (https://phabricator.wikimedia.org/T230312) (owner: Fdans)
[16:06:31] fdans, I just added one quite nitpicky comment, but looks good to me!
[16:06:40] fdans, have you tested it in Hive?
[16:09:34] Analytics, Analytics-Kanban, Patch-For-Review: Tune Wikistats 2 Varnish caching - https://phabricator.wikimedia.org/T230136 (elukey) a:Milimetric→elukey
[16:35:27] (PS2) Fdans: Add literal transcoding field to media info UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/529713 (https://phabricator.wikimedia.org/T230312)
[16:35:47] (CR) Fdans: Add literal transcoding field to media info UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/529713 (https://phabricator.wikimedia.org/T230312) (owner: Fdans)
[16:36:33] (CR) Mforns: Add creation query for new nediarequests dataset (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/528134 (https://phabricator.wikimedia.org/T229817) (owner: Fdans)
[16:39:06] mforns: https://medium.com/@ntnmathur/cluster-by-and-clustered-by-in-spark-sql-9af7f8b80978
[16:39:28] mforns: I guess it makes a difference, I just added it because it was in the mediacounts one
[16:45:36] mforns: going afk for dinner with friends, is it ok?
[16:45:48] I can take care of restarting the coords tomorrow morning
[16:46:00] if you want to leave them
[16:46:07] anyway, thanks a lot!
[16:46:08] o/
[16:46:11] elukey, sure, no no, will restart them slowly today, if I don't finish, we can continue tomorrow
[16:46:18] :)
[16:46:22] :]
[16:50:47] fdans, AFAICS, the CLUSTERED BY (blah) INTO 64 BUCKETS command forces the number of files of each leaf partition directory, as opposed to letting HDFS distribute the data as it pleases.
[16:51:15] mforns: let's remove it then?
[16:51:21] if you run e.g. hdfs dfs -ls /wmf/data/wmf/mediacounts/year=2019/month=8/day=7/hour=4
[16:51:32] you can see that there are 64 files
[16:53:44] fdans, each file is about 4MB
[16:53:58] if you look at pageviews though
[16:54:15] with e.g.: hdfs dfs -ls /wmf/data/wmf/pageview/hourly/year=2019/month=8/day=6/hour=2
[16:54:21] you can see fewer files, but bigger
[16:54:40] about 56MB each
[16:55:20] don't know what is better! but it seems the mediacounts job is the only one in refinery that has that clustered thing, I'd say we can remove, no?
[17:43:03] mforns: agreed
[17:43:11] let's merge the udf change tho?
[17:44:10] fdans, have you tested it in Hive?
[17:44:36] mforns: yesss fdans.mediarequests
[17:44:44] there's an hour loaded there
[17:46:43] fdans, with the last code? I think it's missing the transcodingClassification = blah
[17:48:44] mforns: not sure what you mean
[17:49:09] oh
[17:49:25] transcodingClassification is an instance variable
[17:49:39] right
[17:49:51] mforns: ah yes, I replaced the calls to the getter with the field
[17:50:04] yes yes, my bad
[17:50:15] not sure it makes much of a difference, but it's less verbose
[17:50:33] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/529713 (https://phabricator.wikimedia.org/T230312) (owner: Fdans)
[17:50:36] mforns: I always appreciate your thoroughness :)
[17:50:46] xD
[18:33:40] !log Starting deployment of analytics-refinery up to 5418d3be5f65f7325324d0c15c51b3ca722dde1c
[18:33:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:32:57] !log Finished deployment of analytics-refinery up to 5418d3be5f65f7325324d0c15c51b3ca722dde1c
[19:32:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:00:57] team, I'm going to go ahead and restart some oozie jobs after deployment. we might see some SLA false alerts
[21:06:28] !log restarted Webrequest bundle in oozie
[21:06:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:14:00] !log restarted Webrequest druid coordinators in oozie
[21:14:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:28:06] !log restarted all Virtualpageview coordinators in oozie
[21:28:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:55:48] !log restarted all Unique Devices coordinators (except cassandra ones) in oozie
[21:55:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:00:09] !log restarted projectview geo coordinator in oozie
[22:00:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:26:59] I want to read a small set of revisions to a number of deleted pages on various Wikimedia wikis
[22:27:34] specifically, revisions to "Mediawiki:Undo-summary"
[22:27:49] is there a database I can access to get this??