[07:07:18] morning! I'm dealing with the failed cron, no worries -.-
[07:08:29] fdans: o/
[07:08:45] I have received an email for /user/fdans/etc.. on root@
[07:09:04] with yesterday's failure
[07:09:11] you may have it also in your crontab
[07:09:40] also, do you want to have the emails sent to analytics-alerts@ or do you prefer your email?
[07:10:57] elukey: analytics-alerts is prob better, I just need to test this once more and make sure it runs correctly
[07:12:31] fdans: ack, I just commented out the crontab for your user on stat1007
[08:10:33] Morning team :)
[08:13:44] o/
[09:21:18] (CR) Joal: [C: -1] "This should work for backfilling but not for regular daily jobs: jobs are materialized at the beginning of the day (coord.actualTime), and" [analytics/refinery] - https://gerrit.wikimedia.org/r/549876 (owner: Fdans)
[09:29:56] (CR) Joal: Add python oozie lib and oozie-dumper script (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/549861 (https://phabricator.wikimedia.org/T237271) (owner: Joal)
[09:30:20] (PS3) Joal: Add python oozie lib and oozie-dumper script [analytics/refinery] - https://gerrit.wikimedia.org/r/549861 (https://phabricator.wikimedia.org/T237271)
[09:31:08] Wow elukey - I can tell stat1004 is kerberized :)
[09:35:33] (PS4) Joal: Add python oozie lib and oozie-dumper script [analytics/refinery] - https://gerrit.wikimedia.org/r/549861 (https://phabricator.wikimedia.org/T237271)
[09:36:47] joal: hahahaha
[09:36:55] is it too much or understandable?
[09:37:01] I also created a user guide
[09:38:05] This is great elukey - Very noticeable (that's the whole point), and understandable
[09:38:31] elukey: possibly we wait for kerberos to be deployed before updating all our hosts with the message?
[09:38:36] maybe not though :0
[09:42:35] now is the right time, all the Kerberos tools are available and working, so people can already familiarise
[09:44:29] yep exactly, this was my thought as well
[09:44:54] hosts are all kerberized, and in the UserGuide I added a note about the fact that it is not enabled yet in prod
[09:45:22] ok :)
[09:54:54] * joal can feel Kerberos' fetid breath on his neck
[09:59:52] omg joal
[10:01:09] yes fdans?
[10:01:31] joal: nah the breath comment left an impression of me :D
[10:01:34] D:
[10:01:47] impression on me*
[10:47:22] ah joal, little problem: the oozie dump script doesn't work with kerberos
[10:47:34] http needs to be instructed to use spnego probably
[10:47:50] that is not a problem for the migration
[10:48:38] the oozie http api will require kerberos (good since it will properly authenticate api requests)
[10:48:45] reading
[10:48:52] about spnego elukey :)
[10:49:03] Will update the script - thanks for pointing it out!
[10:49:56] maybe we can add a flag like --use-kerberos or similar
[10:49:59] np :)
[10:50:05] the rest looks perfect
[10:50:16] thanks for the review :)
[11:27:42] PROBLEM - Check the last execution of eventlogging_db_sanitization on db1107 is CRITICAL: NRPE: Command check_check_eventlogging_db_sanitization_status not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:28:11] this is me --^
[11:34:18] PROBLEM - Check the last execution of eventlogging_db_sanitization on db1108 is CRITICAL: NRPE: Command check_check_eventlogging_db_sanitization_status not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:35:07] same thing :)
[11:42:10] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Rerun sanitization before archiving eventlogging mysql data - https://phabricator.wikimedia.org/T236818 (elukey)
[11:43:51] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (elukey) Sanitization done! The remaining step seems to be to take a mysql dump of both log databases (on db1107 and db1108) a...
[11:47:48] * elukey lunch!
[13:51:44] elukey: I've been reading a bit on spnego - Would you have a minute to talk with me about that?
[13:55:31] joal: sure
[13:55:40] in the cave or here?
[13:55:50] elukey: cave seems easier if you have time
[13:56:04] sure
[14:43:33] elukey: The refinery rsync function has worked great with kerb - I assume we need a wrapper to be able to use it as a script, right?
[14:43:45] \o/
[14:44:24] joal: does it need all of refinery to work? Because deploying it to labstore nodes might be problematic
[14:44:36] elukey: I tried it both ways: local-to-hdfs and hdfs-to-local - There are interesting details: hidden files are not copied
[14:44:48] elukey: it needs refinery-python :(
[14:45:19] elukey: it feels like we will want to separate refinery-jars from refinery-python - But some of it is intertwined
[14:45:30] yeah I can imagine
[14:45:45] For instance the camus wrapper uses a jar ...
[14:51:21] elukey: can scap deploy parts of repos? or could we have hierarchical scap projects (subprojects depending on parent ones)?
[14:54:38] Also elukey - Shall I use that to implement spnego for the script? https://github.com/pythongssapi/requests-gssapi
[14:56:14] yep seems good!
[14:56:32] no idea about scap.. maybe we could have git submodules, and deploy them separately if needed
[14:58:32] dropping for kids - back for standup
[15:26:01] elukey: do you recall anything about deploying wikistats 1?
[15:26:12] would just running the ol' puppet on thorium do the trick?
[15:26:27] I want to deploy a lil change in index.html
[15:27:24] fdans: good question, I don't recall
[15:27:43] but we can try with puppet
[15:28:04] elukey: yesss let's do it
[15:35:28] fdans: how should we proceed? do you have a change ready or is it still in progress?
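
For context on the SPNEGO discussion above: a minimal sketch, in Python, of how the oozie-dumper script could call the Oozie REST API through requests-gssapi behind an opt-in flag. The --use-kerberos flag name, the default Oozie URL and the endpoint shown here are illustrative assumptions, not the actual refinery code; with Kerberos enabled, running it would also require a valid ticket (kinit) on the client host.

    # Sketch only: SPNEGO (Kerberos) auth for Oozie REST calls via requests-gssapi.
    # The URL, flag name and endpoint are assumptions for illustration.
    import argparse
    import requests
    from requests_gssapi import HTTPSPNEGOAuth, OPTIONAL

    def fetch_oozie_job(oozie_url, job_id, use_kerberos=False):
        """Return the JSON description of an Oozie job, optionally authenticating with SPNEGO."""
        auth = HTTPSPNEGOAuth(mutual_authentication=OPTIONAL) if use_kerberos else None
        resp = requests.get(
            '{}/v2/job/{}'.format(oozie_url.rstrip('/'), job_id),
            params={'show': 'info'},
            auth=auth,
        )
        resp.raise_for_status()
        return resp.json()

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='Dump an Oozie job definition (sketch).')
        parser.add_argument('job_id')
        parser.add_argument('--oozie-url', default='http://an-coord1001.eqiad.wmnet:11000/oozie')
        parser.add_argument('--use-kerberos', action='store_true',
                            help='Authenticate with SPNEGO (requires a valid kinit ticket).')
        args = parser.parse_args()
        print(fetch_oozie_job(args.oozie_url, args.job_id, args.use_kerberos))
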
[15:35:37] (just to understand if you are waiting for me or not)
[15:35:43] elukey: already merged :)
[15:35:49] ah ok
[15:36:53] elukey: I also had the devious idea of just replacing the file on thorium manually, but I thought I'd try the nice way first
[15:36:56] fdans: forced the puppet run but I didn't see any git pull
[15:37:13] so there is probably another way
[15:37:36] elukey: I feel like it's rsynced, because the folder on thorium isn't a git repo
[15:38:28] yep
[15:39:42] ah yes it is
[15:39:51] lemme find the info in puppet
[15:40:22] elukey: could it be that Erik just uploaded the files?
[15:40:38] fdans: it seems that Erik was able to rsync before, but now we have disabled it
[15:41:04] elukey: ok, do I have permission to just overwrite the file?
[15:41:22] * fdans smiles nervously
[15:41:27] fdans: +1, log it in here though
[15:41:38] thank you elukey
[15:42:10] !log manually overwriting index.html in Wikistats 1 to apply patch https://gerrit.wikimedia.org/r/#/c/analytics/wikistats/+/550338/
[15:42:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:43:27] be back in a bit :)
[15:47:07] (PS1) Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484)
[15:47:43] (CR) Mforns: [V: +2] Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[15:57:40] mforns: o/
[15:57:56] sanitization completed on both log databases, I have removed all our work :(
[16:06:53] (CR) Nuria: [C: -1] "Before merging these code and queries let's please study that the data we are pulling has a signal, for which we probably need to run the " [analytics/refinery] - https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: Mforns)
[16:08:04] elukey, \o/
[16:08:28] eventlogging_cleaner will always exist in our hearts
[16:10:49] (CR) Nuria: [C: -1] Add query to track WDQS updater hitting Special:EntityData (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: Ladsgroup)
[16:15:53] mforns: INDEED
[16:16:19] heh
[16:16:48] mforns: for the quality alarms, we still do not have the code that would alarm on sudden changes of entropy, right?
[16:17:03] mforns: for teh UA
[16:17:05] *the
[16:17:18] nuria, no no
[16:17:27] that would be another job
[16:17:33] on the same workflow
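
As background for the entropy alarming idea mentioned above, here is a hypothetical sketch of what such a check could look like: compute the Shannon entropy of each day's user-agent distribution and alarm when it deviates sharply from a trailing baseline. The column names, the 30-day window and the 3-sigma threshold are assumptions for illustration, not the eventual data_quality job.

    # Hypothetical sketch: alarm on sudden changes in the entropy of a user-agent
    # distribution. Column names and thresholds are assumptions only.
    import math
    import pandas as pd

    def shannon_entropy(counts):
        """Shannon entropy (in bits) of an iterable of counts."""
        total = sum(counts)
        probs = [c / total for c in counts if c > 0]
        return -sum(p * math.log2(p) for p in probs)

    def entropy_by_day(df):
        """Series of daily entropies of the user-agent distribution."""
        return df.groupby('day')['view_count'].apply(shannon_entropy)

    def alarms(daily_entropy, window=30, n_sigmas=3):
        """Days whose entropy deviates more than n_sigmas from the trailing mean."""
        mean = daily_entropy.rolling(window).mean().shift(1)
        std = daily_entropy.rolling(window).std().shift(1)
        deviation = (daily_entropy - mean).abs()
        return daily_entropy[deviation > n_sigmas * std]

    if __name__ == '__main__':
        # Tiny illustrative input: per-day view counts per user-agent family.
        df = pd.DataFrame({
            'day': ['2019-11-12'] * 3 + ['2019-11-13'] * 3,
            'user_agent': ['Chrome', 'Firefox', 'Safari'] * 2,
            'view_count': [50, 30, 20, 90, 5, 5],
        })
        print(entropy_by_day(df))
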
[16:19:03] mforns: let's do that before jumping into pageviews per country, let me know if you disagree. I think we can try the idea of removing the top 20% of pages, but still the underlying data series will be very seasonal (my bet), so I think we need to study a bit what threshold of pages to remove, if any, and consult with jelel
[16:19:34] (CR) Ladsgroup: Add query to track WDQS updater hitting Special:EntityData (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: Ladsgroup)
[16:22:38] just added a pic to the Kerberos user guide https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide#High_level_overview
[16:22:47] will add some words about it
[16:27:34] elukey: ooohhh
[16:27:39] nuria, I already finished the query with: 1) top 20% articles ignored and 2) pageviews per country normalized by their continent's pageviews, which will reduce seasonality
[16:28:25] mforns: I think we should determine empirically whether 20% is a good threshold
[16:28:30] nuria, you can take a look at the metric in Superset: https://superset.wikimedia.org/superset/dashboard/73/
[16:28:36] nuria, sure!
[16:29:38] the dashboard for traffic_per_country is being backfilled, will take a bit
[16:30:00] mforns: and normalizing by continent also requires proof that it works, I do not see that it would for a country like Cuba
[16:31:08] mforns: I think we should test the assumptions we made in Jupyter and use a known event (of censorship or traffic decline) to see how things look, so at this time I feel the Superset dashboard is a bit premature
[16:31:31] nuria, yes yes I saw your CR, makes sense
[16:32:08] a-team: so now we're officially retiring Wikistats 1 in January 2020: https://stats.wikimedia.org/
[16:32:21] yes, normalizing Cuba by continent would not be ideal, but in general I thought that it would be a better normalization than global
[16:32:36] fdans, O.o!
[16:32:39] mforns: so let's do two things: 1) let's work with these events and our timeseries: https://office.wikimedia.org/wiki/Analytics/MonitoringWikipediaAccessibilityAroundTheWorld
[16:33:22] and see what we get in terms of metrics, and 2) let's work on the alarming scripts for UA entropy, which we know has a signal we can alarm on
[16:34:25] mforns: my hunch is that normalizing by continent is not effective, but you can prove me wrong
[16:34:47] nuria, yea I'm also not super happy with that, let's see
[16:35:07] you think it would be better to normalize by global?
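
To make the metric under discussion concrete, here is a rough pandas sketch of the two steps mforns describes (ignore the top 20% of articles by views, then normalize each country's daily pageviews by its continent's total). The column names are assumptions; the actual change lives in analytics/refinery (https://gerrit.wikimedia.org/r/550498), not in this snippet.

    # Hypothetical sketch of the traffic-per-country metric discussed above:
    # 1) drop the top 20% of articles by pageviews, 2) normalize each country's
    # daily pageviews by its continent's total. Column names are assumptions.
    import pandas as pd

    def traffic_per_country(df, top_articles_fraction=0.20):
        """df has columns: day, country, continent, article, views."""
        # 1) Ignore the most-viewed articles (they drive much of the seasonality).
        totals = df.groupby('article')['views'].sum().sort_values(ascending=False)
        top_articles = set(totals.head(int(len(totals) * top_articles_fraction)).index)
        filtered = df[~df['article'].isin(top_articles)]

        # 2) Aggregate per day/country and normalize by the continent's daily total.
        per_country = (filtered.groupby(['day', 'continent', 'country'])['views']
                       .sum().reset_index())
        continent_totals = (per_country.groupby(['day', 'continent'])['views']
                            .transform('sum'))
        per_country['share_of_continent'] = per_country['views'] / continent_totals
        return per_country
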
[16:35:16] mforns: let's work in Jupyter first in this case, we have all the data we need so we can try out different things there faster
[16:35:30] mforns: for entropy we needed to gather the data, but in this case we have it
[16:36:02] sure
[16:42:57] (CR) Nuria: [C: -1] Add query to track WDQS updater hitting Special:EntityData (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: Ladsgroup)
[16:50:26] fdans: I am on Chrome and I don't see any banner indicating that
[16:50:31] might be that I have a page cached
[16:51:09] ah there you go, after some refreshes it popped up
[16:52:17] !log forced a purge in Varnish for the stats.wikimedia.org front page to pick up the new deprecation banner
[16:52:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:52:20] fdans: --^
[16:52:38] oh thanks elukey :)
[17:01:26] ping ottomata, mforns standdduppp
[17:20:57] Analytics: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (Danielsberger) Thanks @lexnasser, that's very helpful. Is there a way to check how frequently submit queries happen for hosts other than upload?
[17:43:16] Analytics: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (lexnasser) @Danielsberger I'm not sure if there's a public-facing way to check the frequency of submit queries. Will have to defer to @Nuria about that. That said, I believe that *....
[17:48:52] Analytics, Analytics-Wikistats: Wikistats data discrepancy for India page views from hive data pull - https://phabricator.wikimedia.org/T237579 (Iflorez) This is helpful clarity as we work with various data systems. I will read the AQS endpoints documentation. Thank you!
[17:51:11] Analytics, Analytics-Wikistats: Wikistats data discrepancy for India page views from hive data pull - https://phabricator.wikimedia.org/T237579 (Iflorez) @pdas This is the documentation for Wikistats 2: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats_2 I will also be reading through this...
[18:05:41] Analytics, Desktop Improvements, Event-Platform, Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (Jdrewniak) a: Jdrewniak
[18:36:12] #wikimedia-analytics: need to perform maintenance on analytics1062, would like some assistance depooling the host
[18:37:11] Analytics, DBA: Repurpose db1107 as a generic database - https://phabricator.wikimedia.org/T238113 (elukey)
[18:37:21] nuria: --^ :)
[18:45:40] Analytics, Analytics-Wikistats: Wikistats data discrepancy for India page views from hive data pull - https://phabricator.wikimedia.org/T237579 (Nuria) Wikistats is a limited UI over a more capable API (AQS the analytics query service) so what wikistats offers and AQS can do are different things. Data so...
[18:47:33] elukey: ya? db1107?
[18:48:25] elukey: tears come to my eyes when thinking of getting rid of those boxes, not that we do not LOVE every one of our boxes
[18:49:35] :)
[19:03:02] msg ottomata MEP meeting?
[19:49:30] (PS1) Joal: Add hdfs-rsync script based on Hdfs python lib [analytics/refinery] - https://gerrit.wikimedia.org/r/550536 (https://phabricator.wikimedia.org/T234229)
[19:53:39] ok team - leaving for dinner
[20:07:26] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, Epic: Vertical: Virtualpageview datastream on MEP - https://phabricator.wikimedia.org/T238138 (Nuria)
[20:07:40] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, Epic: Vertical: Virtualpageview datastream on MEP - https://phabricator.wikimedia.org/T238138 (Nuria) Helpful notes on etherpad: https://etherpad.wikimedia.org/p/event-platform
[20:09:58] ottomata: please modify as needed: https://phabricator.wikimedia.org/T238138
[20:12:00] Analytics, Analytics-Cluster, Operations, ops-eqiad: analytics1062 lost one of its power supplies - https://phabricator.wikimedia.org/T237133 (ops-monitoring-bot) Icinga downtime for 2:00:00 set by otto@cumin1001 on 1 host(s) and their services with reason: analytics1062 lost one of its power sup...
[20:59:59] (CR) Ladsgroup: Add query to track WDQS updater hitting Special:EntityData (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: Ladsgroup)
[21:46:37] Analytics-EventLogging, Analytics-Kanban, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[22:20:09] mforns: apple copying your ideas: https://www.apple.com/uk/privacy/
[22:20:23] O.o
[23:08:21] Analytics, Growth-Team, Product-Analytics: Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (Nuria)
[23:22:56] fdans: wikistats2 getting closer to 2000 uniques a day!
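
The hdfs-rsync change above appears to be the script wrapper around the refinery rsync code that joal and elukey discussed earlier in the day. As a rough illustration of the idea only (the real patch at https://gerrit.wikimedia.org/r/550536 builds on the refinery python Hdfs lib instead), a minimal one-way local-to-HDFS sync can be sketched with the standard `hdfs dfs` CLI:

    #!/usr/bin/env python3
    # Hypothetical sketch of an hdfs-rsync style script: one-way sync of a local
    # directory into HDFS by shelling out to the standard `hdfs dfs` CLI.
    import argparse
    import os
    import subprocess

    def hdfs_exists(path):
        """True if the path exists on HDFS (`hdfs dfs -test -e` returns 0)."""
        return subprocess.call(['hdfs', 'dfs', '-test', '-e', path]) == 0

    def sync_local_to_hdfs(local_dir, hdfs_dir, dry_run=False):
        """Copy files missing on HDFS; skips hidden files, like the behaviour noted above."""
        for name in sorted(os.listdir(local_dir)):
            if name.startswith('.'):
                continue  # hidden files are not copied
            src = os.path.join(local_dir, name)
            dst = '{}/{}'.format(hdfs_dir.rstrip('/'), name)
            if hdfs_exists(dst):
                continue
            print('copying {} -> {}'.format(src, dst))
            if not dry_run:
                subprocess.check_call(['hdfs', 'dfs', '-put', src, dst])

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='Minimal local-to-HDFS sync sketch.')
        parser.add_argument('local_dir')
        parser.add_argument('hdfs_dir')
        parser.add_argument('--dry-run', action='store_true')
        args = parser.parse_args()
        # With Kerberos enabled, this relies on a valid ticket (kinit) or a keytab.
        sync_local_to_hdfs(args.local_dir, args.hdfs_dir, args.dry_run)
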