[01:06:07] 10Analytics, 10Product-Analytics, 10VisualEditor: Hash all pageTokens or temporary identifiers from the EL Sanitization white-list - https://phabricator.wikimedia.org/T220410 (10ppelberg) [06:14:23] wow https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&from=1569964002384&to=1569974708471 [06:15:45] the winner is... Andrew :D [06:15:51] I am checking the hdfs audit log [06:21:08] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10elukey) Andrew the script to back fill might have been a bit aggressive, the HDFS RPC queue jumped like cr... [07:39:43] (03PS1) 10Elukey: oozie: add hive2_jdbc parameter to mediawiki history check_denormalize coord [analytics/refinery] - 10https://gerrit.wikimedia.org/r/540356 [07:43:40] (03PS2) 10Elukey: oozie: add hive2_jdbc parameter to mediawiki history check_denormalize coord [analytics/refinery] - 10https://gerrit.wikimedia.org/r/540356 [08:31:42] !log kill/restart mw check denormalize with hive2_jdbc parameter [08:31:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:45:06] 10Analytics, 10MinervaNeue, 10Readers-Web-Backlog (Kanbanana-2019-20-Q2): MinervaClientError sends malformed events - https://phabricator.wikimedia.org/T234344 (10ovasileva) [09:26:57] ah! I may have found the issue with druid_loader.py [09:27:15] since we haven't restarted the coords using it, it is still referencing the hdfs version [09:27:25] namely the one with the #! 
python [09:27:38] so when executed on worker nodes, it tries to use python-yaml [09:27:39] and fails [09:30:39] so I now need to restart all the coordinators using druid_loader.py [09:31:01] or better, only the ones that haven't been restarted before my change was merged/deployed [09:31:06] lovely [09:34:06] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move the Analytics Refinery to Python 3 - https://phabricator.wikimedia.org/T204735 (10elukey) The explanation should be the following: the oozie shell action uses the druid_loader.py stored in HDFS, not the ones where oozie runs (of course). So since we h... [09:38:54] !log restart banner activity druid daily/monthly coords to pick up new hdfs version of druid_loader.py [09:38:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:41:40] !log restart edit druid hourly coord to pick up new hdfs version of druid_loader.py [09:41:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:45:18] !log restart mw geoeditors druid coord to pick up new hdfs version of druid_loader.py [09:45:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:48:51] !log restart pageview druid hourly/daily/monthly coords to pick up new hdfs version of druid_loader.py [09:48:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:49:49] ah! [09:49:50] Error: E0738 : E0738: The following 3 parameters are required but were not defined and no default values are available: hive2_jdbc_url, druid_datasource, sla_alert_contact [09:49:59] this is the pageview druid monthly [09:50:56] no I am simply stupid, nevermind [09:59:05] hi team, I'm starting now since I'm feeling better compared to yesterday :) [09:59:12] good morning :) [09:59:19] hellooo elukey [10:00:25] elukey: I see you're restarting jobs, if there's anything you need me to look at, I'm happy to help [10:06:50] fdans: there is one thing!
the mediawiki history check denormalize coord failed two times this morning [10:07:01] the first one was due to a missing parameter (hive2_jdbc) [10:07:23] I restarted it with the parameter, but then it failed again, this time there seem to be problems with the data [10:08:22] elukey: I'll have a look now [10:15:36] !log re-run unique devices druid daily 28/09/2019 - failed but possibly no alert was fired to analytics-alerts@ [10:15:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:15:46] this needs to be checked as well --^ [10:16:58] nope it did send an alarm [10:24:49] !log restart unique dev per domain druid daily_agg_monthly/daily/monthly coords to pick up new hdfs version of druid_loader.py [10:24:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:25:14] fdans: going to stop now with the restart, will finish the work later on in the afternoon [10:25:20] sorry for the noise in alerts@ [10:25:30] but there shouldn't be anything to take care of other than the check denorm thing [10:31:10] taking it as yes, lunch! Ping me on the phone if needed [10:36:01] elukey: sorry luca, was looking at the errors, I'll ask joal or milimetric as soon as they are in, I've never seen this before [12:48:36] fdans: o/ [12:48:40] resuming the restart job stuff [12:48:52] cooool! [12:54:04] !log kill/start oozie unique devices per family druid daily/daily_agg_mon/monthly coords to pick up new druid_loader.py version [12:54:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:04:56] !log kill/start oozie virtualpageview druid daily/monthly coords to pick up new druid_loader.py version [13:04:56] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10Ottomata) Iinnnteresting.
And the job failed too (and deleted all the temporary output files it created).... [13:04:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:08:19] !log kill/start oozie webrequest druid daily/hourly coords to pick up new druid_loader.py version [13:08:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:10:21] ok I should have restarted all the ones using druid_loader.py [13:12:44] !log remove python-request from all the hadoop workers (shouldn't be needed anymore) [13:12:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:16:12] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move the Analytics Refinery to Python 3 - https://phabricator.wikimedia.org/T204735 (10elukey) All the druid-related coordinators restarted, and I removed python-request from all the hadoop workers. If everything goes according to plan, no issue should ari... [13:24:04] ottomata: I am trying to find the dataset on search queries (https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Cirrus) could you help me with a pointer how to access? [13:27:19] mgerlach: I think that the table should be in the event database [13:27:30] (hive -> use event; -> show tables.. etc..) [13:28:29] mmm maybe, I am trying to find it [13:29:22] so the schema mentions avro, I am wondering if those are the old ones [13:29:32] that got migrated to eventgate? [13:29:34] super ignorant [13:29:40] yes the doc is obsolete :( [13:30:13] yes yes https://phabricator.wikimedia.org/T214080 :) [13:30:49] dcausse: do you know where to find the new events in hive? [13:30:54] so we can help mgerlach [13:31:00] sure [13:31:14] lemme find some code snippet [13:31:24] this data is not super trivial to manipulate [13:35:55] elukey: thanks. i can see the tables. perhaps mediawiki_cirrussearch_request ? [13:37:27] mgerlach: yes that's it: https://phabricator.wikimedia.org/P9228 [13:38:07] dcausse: great. 
thanks a lot. [13:38:10] super [13:40:49] dcausse: btw, I dont think we have met. I am martin from research (just joined so trying to understand the different datasets). thanks again. [13:42:13] mgerlach: great to meet you, I'm David working on search :) [13:42:51] if you have questions regarding this data I can help a bit, but ebernhardson will definitely be the best source of knowledge for it [13:43:17] cool, good to know. will probably have more questions ; ) [13:48:05] (03CR) 10Fdans: [V: 03+2] Correct parameters in mediarequest cassandra jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/537936 (owner: 10Fdans) [13:50:03] dcausse: for your code example I get the following exception (on swap, using PySpark - Yarn): [13:50:03] AnalysisException: 'No such struct field suggestion in database, query;' [13:50:03] Is this on my side? didnt understand all the fields right away [13:52:54] mgerlach: my bad [13:54:40] (03PS10) 10Fdans: Add oozie job to load top mediarequests data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538880 (https://phabricator.wikimedia.org/T233717) [13:56:28] mgerlach: paste updated (https://phabricator.wikimedia.org/P9228) [13:57:31] dcausse: merci [13:57:40] yw! [13:58:25] works now [14:00:29] elukey: one quick question, the restarts you're doing right now, do they include the cassandra bundle? [14:02:35] fdans: nope [14:02:40] I already finished them [14:02:52] elukey: ah cool, I'll restart it as part of the train [14:02:54] they were all the coords using the druid_loader.py [14:03:05] I see [14:03:31] if you deploy, can you review/merge https://gerrit.wikimedia.org/r/#/c/540356/ ? [14:09:39] (03CR) 10Fdans: "This looks good but I think you have to add the property to coordinator.xml in the same directory? 
Tell me if this doesn't make sense" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/540356 (owner: 10Elukey) [14:14:33] (03PS3) 10Elukey: oozie: add hive2_jdbc parameter to mediawiki history check_denormalize coord [analytics/refinery] - 10https://gerrit.wikimedia.org/r/540356 [14:14:50] fdans: you are right, good catch, updated --^ [14:18:35] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10Ottomata) Ok, now doing half a month at a time, and writing directly to parquet files rather than going th... [14:33:25] (03CR) 10Fdans: [V: 03+2 C: 03+2] oozie: add hive2_jdbc parameter to mediawiki history check_denormalize coord [analytics/refinery] - 10https://gerrit.wikimedia.org/r/540356 (owner: 10Elukey) [15:02:08] I see data in my druid tables in turnilo (test_search_satisfaction_hourly), but in superset it says 'No data', any ideas? [15:07:45] ebernhardson: is there a datasource about it in superset? [15:07:52] if not you have to define it [15:10:59] elukey: yes, it has a datasource and a dashboard i built a couple months ago [15:11:10] elukey: but now the dashboard says 'no data' on all fields [15:15:35] (03PS4) 10Milimetric: [WIP] Publish monthly geoeditor numbers [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) [15:16:09] (03CR) 10Milimetric: [WIP] Publish monthly geoeditor numbers (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: 10Milimetric) [15:16:37] ebernhardson: ah, checking [15:18:53] ebernhardson: was it working before and it doesn't anymore? 
(trying to understand the problem) [15:19:00] I don't see horrible errors in the logs [15:19:27] elukey: yes, when i built this it was working great, and now i shipped an AB test and was looking at it for intermediate results but it says no data :) [15:20:16] elukey: so basically the dashboard "Search Query Suggestions" used to show breakdowns of the query suggestion funnel, but it basically says "no data" for everything. Turnilo against same druid datasource shows data, and trying to compare the queries issued i don't see anything obviously different [15:25:28] ebernhardson: did the measures/etc.. change? I am wondering if the datasource dims/measures are not reflecting the current data in druid [15:25:59] from the logs it seems that the query returns an empty result [15:26:01] weird [15:26:02] elukey: hmm, they shouldn't. These are all based on an eventlogging schema that hasn't changed [15:26:27] elukey: even creating a new table in superset and pointing it at the datasource gives no results, without choosing special metrics or anything :S [15:28:27] ebernhardson: have some meeting but I'll keep checking.. do you recall the last time that it worked? I am wondering if it was before or after the superset upgrade [15:29:27] elukey: i'm pretty sure it was just after, but lemme double check [15:30:15] superset was upgraded ~aug 21, i built this about ~aug 10th, so before.
[15:32:09] i've exported the table config to yaml, i suppose could try delete/re-add [15:45:09] (03PS1) 10Fdans: Add mediarequests tops metric endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/540433 [15:45:46] (03CR) 10jerkins-bot: [V: 04-1] Add mediarequests tops metric endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/540433 (owner: 10Fdans) [15:46:23] (03PS2) 10Fdans: Add mediarequests tops metric endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/540433 [16:00:51] (03CR) 10Fdans: [V: 03+1] "Tested successfully in beta" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/540433 (owner: 10Fdans) [16:04:50] ping fdans standduppp? [16:05:27] 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Nuria) a:05mforns→03Milimetric [16:06:43] (03PS11) 10Fdans: Add oozie job to load top mediarequests data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538880 (https://phabricator.wikimedia.org/T233717) [16:18:47] !log kill duplicate of oozie pageview-druid-hourly coord and start the wrongly killed oozie pageview-hourly-coord (causing jobs to wait for data) [16:18:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:26:20] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10elukey) Ping @Bstorm :) [16:28:53] sukhe: hi! 
I'm working on a task that I would love your thoughts about: https://phabricator.wikimedia.org/T131280 We read your censorship/surveillance paper and are trying to use insights from there to determine a country blacklist: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/530878/3/static_data/mediawiki/geoeditors/blacklist/country_codes.tsv [16:29:17] basically, the blacklist would be countries where we think it's not totally safe to publish the dataset [16:30:14] the dataset is: (month, wiki, country, activity level of editors, count of editors at this level) with counts bucketed in intervals of 10 [16:31:43] we talked about this data a bunch and we figure it's generally safe, but we want to be careful not to make it easily accessible to regimes that are censoring Wikis [16:32:42] sukhe: I can explain more in a video call, let us know if/when you have some time [16:32:53] hi milimetric! [16:33:15] (to be clear, I am not the author of the paper, I just reference it a lot :) [16:33:30] but yes, happy to help in any way. let's talk about it if that's easier [16:34:37] mforns: ok to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/540421/? [16:34:48] elukey, yes :] [16:34:59] ack, doing it :) [16:35:23] milimetric: tomorrow is a bit tight but Friday works [16:36:19] sukhe: ok great, I’ll try to get something on the calendar Friday or early next week [16:36:30] thanks! [16:36:51] I am in EDT (Toronto), if that helps [16:39:16] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10Ottomata) Woo hoo! ` hive -e 'show partitions event.mediawiki_revision_score' | wc -l 10897 hdfs dfs -du... [16:40:20] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10Ottomata) And!
`lang=sql select page_title, scores['goodfaith'].prediction[1] as prediction, scores['good... [16:46:13] milimetric: can you coordinate with fdans regarding mediawiki history? [16:46:28] mforns: done! [16:46:34] milimetric: i have moved our 1 on 1 to later today [16:46:39] thanks elukey :] [16:46:51] fdans: yeah let’s work on that, I have a bit now then I gotta get lunch [16:46:58] milimetric: if you want to bc now I'm here! [16:47:55] fdans: k, in cave [16:48:52] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10Ottomata) [16:52:16] ottomata, I launched the hdfs dfs -get /wmf/data/archive/mediawiki/history/* with ionice -c 3 and it seems to be running fine. It's not super fast though, by my calculations it will finish in around 12 hours. [16:53:18] ottomata, just to be sure, I'm using my user to execute it, I imagine there will be no problems with that, if needed we can chown -R later right? [16:54:29] and ottomata, do we need to stop the actual rsync? should I prepare a puppet patch? [16:57:58] ebernhardson: if you could try delete/re-add just as test it would be great.. I'll check tomorrow morning first thing what is happening and report back [16:58:08] (I hope it is not super urgent, otherwise please let me know) [16:58:19] * elukey off! 
(will read later) [16:59:25] mforns: i think to stop rsync you are going to need a puppet patch but also sudo so ottomata would probably need to intervene [16:59:26] ya [16:59:28] mforns: sounds good [16:59:34] mforns: yeah we should [16:59:45] ok, will prepare patch [17:05:45] it's here: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/540442/ [17:05:50] still pending jenkins +2 [17:06:13] ok, donw [17:06:15] done [17:16:33] done, had to add an $ensure param to the web::fetches::job too :) [17:32:53] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Scoring-platform-team, 10Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (10Ottomata) Just merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/475818 too! https://stream.wik... [17:45:02] milimetric: ping me when you are bcak [17:45:05] *back [17:54:03] (03CR) 10Nuria: [WIP] Publish monthly geoeditor numbers (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: 10Milimetric) [17:59:13] 10Analytics, 10Analytics-EventLogging, 10Wikimedia-Portals: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461 (10Pcoombe) [18:08:50] (03CR) 10Nuria: [C: 04-1] Add mediarequests tops metric endpoint (032 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/540433 (owner: 10Fdans) [18:10:14] nuria: see the job for this endpoint: we only use agent = user [18:10:35] fdans: yaya, i know, it is food for thought for later [18:11:07] fdans: for pageviews we are going to have another marker "automated" that would "remove" some of the user requests [18:11:14] nuria: oh i see what you mean tho [18:11:15] yea yea [18:11:18] fdans: but we do not have a way to do this for mediarequests [18:11:25] fdans: it is a note-to-self [18:12:11] nuria: we have to keep in mind that mediarequests is flawed by design both by unmarked bots and prefetches
[18:12:14] elukey: exported dashboards and table, deleted. Running 'scan new datasources' detected the table, but still no data :( [18:12:24] the solution here will be to have a mediaviews dataset [18:12:32] elukey: so odd that turnilo sees it though..suggesting the data is actually there [18:12:44] done with events, with proper instrumentation that qualifies human views [18:12:52] 10Analytics, 10EventBus, 10Product-Analytics: Review draft Modern Event Platform schema guidelines - https://phabricator.wikimedia.org/T233329 (10Ottomata) Neil good points, I think you are right here. I will change this page to be more about using our schema repositories. Right now we just have the one for... [18:13:03] ebernhardson: let me help you with that cause e.lukey has left for the day [18:13:09] fdans: ya, +1 [18:13:30] fdans: but we will still get crawlers that trigger js events [18:13:38] fdans: we see those on eventlogging data [18:13:44] fdans: so it "mitigates" the problem [18:13:47] nuria: essentially, test_search_satisfaction_hourly in turnilo has data, but superset says "no data". [18:13:52] fdans: but does not disappear [18:14:09] ebernhardson: cause queries are timing out? or rather returning empty? [18:14:10] nuria: it's not a timeout, superset says sub-ms response times [18:14:24] nuria: hmm, but in that case we can develop a solution that encompasses all events, not a solution specific to media metrics, right?
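[Editor's note] The agent marking discussed above (only `agent = user` is counted today; a new "automated" marker would remove some of those) can be sketched as a simple breakdown. The field name `agent_type` and its values are assumptions based on the conversation, not a verified schema:

```python
from collections import Counter

def agent_breakdown(requests):
    """Count requests per agent_type and report the 'user' share.

    Hypothetical field/values ('user', 'spider', 'automated') following
    the marking discussed in the log; 'automated' is the new marker that
    would remove some requests currently counted as 'user'.
    """
    counts = Counter(r["agent_type"] for r in requests)
    total = sum(counts.values())
    user_share = counts.get("user", 0) / total if total else 0.0
    return counts, user_share

# Toy sample: three user hits, one marked crawler, one automated client.
sample = [
    {"agent_type": "user"}, {"agent_type": "user"}, {"agent_type": "user"},
    {"agent_type": "spider"}, {"agent_type": "automated"},
]
counts, share = agent_breakdown(sample)
```

Unmarked bots would still land in the 'user' bucket, which is the "flawed by design" caveat raised above.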
[18:14:36] milimetric: when you have some time would love to bb some mep schema things with ya [18:14:41] fdans: ya, that is the idea [18:14:50] ebernhardson: ok, let me look at turnilo [18:16:18] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10Bstorm) My main concern would be if we were mounting labstore NFS and moving the data, which would surely cause proble... [18:16:32] ebernhardson: data in turnilo shows up to october 1st right? [18:16:40] nuria: right, it's a daily import at 00:00 [18:16:44] https://usercontent.irccloud-cdn.com/file/tMBCKvTG/Screen%20Shot%202019-10-02%20at%2011.16.13%20AM.png [18:16:57] ebernhardson: ok, let me look at superset datasources [18:19:47] ebernhardson: i see the "no data" on the datasource yes, is there a dashboard on top of it or can i delete datasource and re-list it? [18:20:02] nuria: there was a dashboard, but i've exported and deleted it [18:20:06] nuria: so go ahead and do again [18:20:12] ebernhardson: ok [18:22:10] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10Ottomata) > If we are talking about running a sync to labstore1006/7 from an hdfs system Yup, that's what we want! >... [18:23:03] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing the dump hosts - https://phabricator.wikimedia.org/T234229 (10elukey) >>! In T234229#5541936, @Bstorm wrote: > My main concern would be if we were mounting labstore NFS and moving...
[18:26:34] ebernhardson: what i see on console: [18:26:36] ebernhardson: [18:26:39] https://www.irccloud.com/pastebin/T0e1963P/ [18:27:22] ebernhardson: not very hopeful as the datasource DOES exist (superset itself found it) [18:27:24] nuria: hmm, so deleting doesn't seem to be working? [18:27:34] yea [18:28:17] 10Analytics, 10EventBus, 10CPT Initiatives (Modern Event Platform (TEC2)), 10Patch-For-Review, 10Services (next): mediawiki/recentchange event should not use fields with polymorphic types - https://phabricator.wikimedia.org/T216567 (10Ottomata) Hm, I'm not so sure if we can do this. recentchange is pret... [18:31:17] ebernhardson: i think i get it [18:31:27] ebernhardson: the time intervals selected by default are misleading [18:32:16] ebernhardson: see "Nuria testing erik's datasource" on charts [18:32:44] nuria: hmm, ok indeed i see data with a static time range [18:32:46] ebernhardson: the default time interval of last week was coming up to 2019-09-02 [18:32:59] ebernhardson: for which there is no data [18:33:43] ebernhardson: i think it is a UX fail more than anything [18:33:56] ebernhardson: do try your dashboard with a "hardcoded" time range [18:33:57] hmm, there should be data. turnilo shows data going back to at least july 4 [18:34:12] ebernhardson: ya, it was the end date [18:34:32] ebernhardson: not the start date, or so it seemed, let me change times again [18:34:41] huh, interesting [18:34:50] milimetric: any chance with the mediawiki-history problem? [18:35:55] ebernhardson: ya, changed timeranges and things work (do try it and let me know if that is not the case) [18:36:16] nuria: lemme delete that and re-import my dashboard, see if i can get it going [18:36:22] (03CR) 10Ottomata: "Excited to merge this...
:D" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/540159 (https://phabricator.wikimedia.org/T223414) (owner: 10Mforns) [18:36:31] ebernhardson: ok, if it does not work ping me [18:37:01] :] [18:37:22] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Change HDFS balancer threshold - https://phabricator.wikimedia.org/T231828 (10Ottomata) p:05Triage→03Low a:03Ottomata [18:43:06] 10Analytics: Use Snakebite instead of subprocess.Popen in HdfsUtils - https://phabricator.wikimedia.org/T200904 (10Ottomata) 05Open→03Declined Snakebite is python2 only. [18:44:33] 10Analytics: Throttling /request dispatcher that fronts eventlogging /MEP endpoint - https://phabricator.wikimedia.org/T212853 (10Ottomata) [18:44:40] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (10Ottomata) [18:46:05] joal: I'm still looking at it [18:46:34] ok fdans - I guess it's nothing simple if you're still on it :( [18:46:47] nuria: nice! So it was a simple UX fail for time range? :( [18:46:55] joal: or that I'm kinda clueless, but let's go with your option :) [18:47:09] the sqoop seems fine [18:47:24] but some wikis show wild decreases in page events [18:47:43] :/ [18:48:58] joal: there has been no change at all in the history codebase right? [18:49:10] Not that we know [18:49:32] hmmm I'm going to try something [18:55:09] yeah I’m still in traffic, nuria I may not make it back in time [18:55:35] milimetric: ping me when you are back [18:56:08] elukey: I think so ya, also strange cause you would expect if data for 10/02 is not present it will return data up to 10/01 but no [18:56:20] elukey: so ux fail but also something else cc ebernhardson [19:01:41] nuria: asking superset to include time in the groupby, it seems superset isn't seeing any data after 9-19 ? [19:02:18] ebernhardson: can you point me to the dashboard?
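[Editor's note] The "no data" symptom being chased above boils down to a relative time window: a default "last week" range computed from today ends up entirely past the last loaded day, so the dashboard shows nothing even though older data exists. A minimal sketch, using the dates from the log (data stopping around 09/19, checked on 10/02):

```python
from datetime import date, timedelta

def window_overlaps_data(today, last_data_day, days=7):
    """Return the default relative window and whether it overlaps the data.

    Mirrors the UX failure discussed above: a 'last N days' window
    computed from 'today' misses everything once ingestion stalls
    before the window starts.
    """
    start = today - timedelta(days=days)
    return (start, today), last_data_day >= start

# Data stops on 2019-09-19; a "last week" window on 2019-10-02 misses it.
window, ok = window_overlaps_data(date(2019, 10, 2), date(2019, 9, 19))
```

A hardcoded (absolute) time range sidesteps this, which is why nuria's test chart with fixed dates showed data.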
[19:03:12] nuria: it's not imported yet (the import failed ... so i have to re-do from the json file) [19:03:44] ebernhardson: if this data is from eventlogging we have the job that can do the import regularly [19:03:52] ebernhardson: so you do not have to do it [19:05:05] nuria: it's from eventlogging, but it's post-processed and joined with backend logging [19:05:22] ebernhardson: ah, i see, with your clickthrough data right? [19:05:27] ebernhardson: ok [19:05:28] nuria: right [19:05:51] ebernhardson: it seems that once again a lovely oozie job might be the thing here [19:06:03] nuria: we have an oozie job (two, actually) [19:06:30] nuria: that's this one: https://github.com/wikimedia/wikimedia-discovery-analytics/tree/master/oozie/search_satisfaction [19:06:33] ebernhardson: ok is superset not seeing data that was successfully imported by the job? [19:06:55] nuria: right, turnilo sees the data. superset isn't seeing past 9-19 in your test dashboard [19:08:14] ebernhardson: ok, let me look again [19:13:02] Disconnecting for tonight [19:15:21] ebernhardson: ok, i updated dashboard to show time and last record is 09/19, which now makes the "no data" for last week make more sense. I suspect this is widespread to other datasources, one sec [19:15:57] nuria, do you recall why the wiki field in the MediaWikiPingback schema is hashed (and seems rotated daily as well)?
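[Editor's note] On the hashed-and-daily-rotated field asked about above: the general technique is to salt the hash with a value that changes each day, so an identifier is linkable within a day but not across days. A hedged sketch of that idea only; the salt scheme and secret below are invented and are not what MediaWikiPingback actually does:

```python
import hashlib
from datetime import date

def daily_hash(value, day, secret="not-the-real-salt"):
    """Hash an identifier with a salt that rotates daily.

    Illustrative only: the real MediaWikiPingback hashing scheme is
    not described in the log, so 'secret' and the salt format here
    are assumptions.
    """
    salt = f"{secret}:{day.isoformat()}"
    return hashlib.sha256((salt + value).encode()).hexdigest()

# Same input hashes identically within a day, differently across days.
a = daily_hash("enwiki", date(2019, 10, 2))
b = daily_hash("enwiki", date(2019, 10, 2))
c = daily_hash("enwiki", date(2019, 10, 3))
```

This is also the shape of the "hash all pageTokens or temporary identifiers" request in T220410 at the top of the log.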
[19:17:31] 10Analytics: superset now showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) [19:19:16] ebernhardson: see ticket and other dashboard i created for navigation timing [19:20:28] elukey: i was wrong about superset , ayayaya [19:20:39] elukey: let's talk tomorrow [19:21:39] 10Analytics: superset now showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) Example dashboard with issue using eventlogging navigation timing data: https://bit.ly/2pt2qhf Note that this data is on turnilo (search satisfaction is having the same issue) : {F30525... [19:23:10] nuria: ouch, ok wider spread issue :( [19:23:49] ebernhardson: well, EXPLAINS the "no data last week" at least [19:23:54] indeed [19:23:55] ebernhardson: but it is a bigger deal [19:24:14] ebernhardson: i need to look at druid ui and segments and SAL and try to piece it together [19:28:06] 10Analytics: superset now showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) {F30525251} Number of segments look OK [19:28:23] 10Analytics, 10Analytics-Kanban: superset now showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) [19:31:27] 10Analytics, 10Analytics-Kanban: superset now showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) According to SAL: 2019-09-17 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates... given that last visible navigation timing data is... 
[19:33:40] if anyone is wondering, I'm still looking at the mediawiki history shenanigans [19:37:54] fdans: good [19:38:01] fdans: i am looking at superset issue [19:39:34] nuria: kinda mysterious both aug and sept snapshots have similar sizes [19:39:47] as expected sep is slightly bigger than aug [19:40:09] but sep has remarkably fewer move events than aug [19:40:10] fdans: that seems right but events have decreased (and that is why checker failed?) [19:40:31] nuria: yeayea that's what I was going for [19:40:49] september has half the move events [19:41:12] fdans: did you check whether events for all wikis have decreased or "some" wikis [19:41:37] fdans: also events for "bots?" like "user group=Bot"? [19:41:39] nuria: that's what I'm waiting for now [19:45:16] I'm gonna get dinner while the query is running [20:03:07] 10Analytics, 10Analytics-Kanban: superset not showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10EBernhardson) [21:21:45] !log restarting superset [21:21:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [21:35:14] nuria: how's superset going? [21:43:04] 10Analytics: Add anomaly detection alarm to detect traffic blockage per country - https://phabricator.wikimedia.org/T234484 (10Nuria) [21:43:49] 10Analytics: Add anomaly detection alarm to detect traffic blockage per country - https://phabricator.wikimedia.org/T234484 (10Nuria) And also traffic increases, is this a good usage of entropy or do we want to go with a simple normal + std deviation distribution? [21:44:19] fdans: not superwell [21:44:36] fdans: with meetings and all, i know the what, the when it started but not the why [21:44:54] ah yes [21:45:04] nuria: "the what but not the why" is today's theme [21:45:31] fdans: what were the results of the query on reduction of events?
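[Editor's note] The anomaly-detection alarm nuria files above (T234484) mentions a "simple normal + std deviation distribution" as one option. A minimal z-score sketch of that option, with made-up counts; this is an illustration, not the alarm's implementation:

```python
from statistics import mean, stdev

def anomalous(history, latest, z=3.0):
    """Flag `latest` if it deviates more than `z` standard deviations
    from the historical mean -- the simple normal + std-deviation
    option mentioned in T234484. Works for both drops (blockage) and
    spikes, since it uses the absolute deviation.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z

# Made-up daily per-country request counts.
history = [1000, 1020, 980, 1010, 995, 1005]
```

The entropy-based alternative mentioned in the same comment would instead look at how the whole per-country distribution shifts, rather than one country's count in isolation.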
[21:46:22] nuria: I'm pretty sure all those missing move events are create events in the latest snapshot [21:46:52] but they appear to be coming as move events in the raw logging table [21:47:23] fdans: for all wikis? [21:48:05] nuria: pretty sure this is wiki-independent yes [21:48:49] fdans: let's double check that cause it narrows down the issue, are move events missing in all wikis? [21:50:49] 10Analytics, 10Analytics-Kanban: superset not showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) I see errors like: io.druid.java.util.common.RE: Failure getting results for query[7e7c183d-4d16-430e-8e53-ed88fd072125] url[http://druid1003.eqiad.wmnet:8200/druid... [22:02:54] 10Analytics, 10Analytics-Kanban: superset not showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) I think this segment metadata is not available to druid somehow. Handy script to get segment metadata: curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'Content-Ty... [22:17:25] nuria: here's a chart of move events per wiki [22:17:38] https://usercontent.irccloud-cdn.com/file/rrgy3hCG/Screen%20Shot%202019-10-03%20at%2012.17.03%20AM.png [22:19:05] the craziest case is commons [22:19:50] which in the september snapshot had 1949210 move events, and only 17426 in the october one [22:34:00] fdans: do the move events that have disappeared (for commons) have something in common? like they were done by a bot, pages got deleted and thus those move events disappeared? [22:47:08] nuria: I'm not sure but I'm looking at the raw logging table, comparing the count of logs with log_action='move' and there doesn't seem to be anything weird there [22:47:43] fdans: were a big number of pages deleted from commons?
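[Editor's note] The per-wiki comparison fdans is running above (september vs october snapshot move-event counts) amounts to flagging wikis with an outsized drop between snapshots. A minimal sketch, seeded with the commons numbers quoted in the log (the enwiki counts are made up for contrast):

```python
def flag_event_drops(prev, curr, threshold=0.5):
    """Flag wikis whose event count dropped by more than `threshold`
    (as a fraction) between two snapshots."""
    flagged = {}
    for wiki, before in prev.items():
        after = curr.get(wiki, 0)
        if before and (before - after) / before > threshold:
            flagged[wiki] = (before, after)
    return flagged

# Move-event counts: commons figures are from the discussion above,
# enwiki is a hypothetical stable wiki.
sept = {"commonswiki": 1949210, "enwiki": 100000}
octo = {"commonswiki": 17426, "enwiki": 98000}
drops = flag_event_drops(sept, octo)
```

Commons' ~99% drop trips the threshold immediately, which is what made it "the craziest case" in the chart.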
[22:48:25] nuria: hmmm let me do an event group by on commons [22:50:07] select event_type, snapshot, count(*) from mediawiki_history where snapshot > '2019-07' and wiki_db='commons_wiki' group by event_type, snapshot; [22:55:18] 10Analytics, 10Analytics-Kanban: superset not showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) The query that does not return data on superset returns data on druid: nuria@druid1001:~$ more test-query-navigationtiming.sh curl -X POST 'http://localhost:8082/d... [23:01:12] nuria: delete events increased in a regular way [23:01:28] fdans: so no delete spike? [23:01:38] I'm pretty sure that all those move events are being turned to create events [23:02:18] https://www.irccloud.com/pastebin/vBdgULQ0/ [23:05:56] fdans: without another snapshot to compare (like aug to july) it is hard to see if differences are too large, from the checker i am gusessing they are [23:06:00] *guessing [23:06:11] nuria: yeah that's what i was just checking [23:06:14] with july [23:08:03] nuria: anyway, I'm going to leave it for tonight, I'm too tired, will continue tomorrow [23:08:18] fdans: Sounds good, thanks for your work! [23:27:37] 10Analytics: Druid monthly indexing of eventlogging segments - https://phabricator.wikimedia.org/T234493 (10Nuria) [23:33:53] 10Analytics, 10Analytics-Kanban: superset not showing data after 09/16 for some datasources - https://phabricator.wikimedia.org/T234471 (10Nuria) I think also the daily rollup of eventlogging data might not be happening since 09/29? (indexing might be happening hourly but needs to happen daily too) [23:48:16] 10Analytics, 10Analytics-Kanban: Eventlogging to druid daily timers not executing? - https://phabricator.wikimedia.org/T234494 (10Nuria) [23:59:31] 10Analytics, 10Operations, 10User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (10Cmjohnson)
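[Editor's note] Both druid debugging threads above POST hand-written JSON to the broker at http://localhost:8082/druid/v2/ with curl (the segment-metadata check in T234471 and test-query-navigationtiming.sh). A sketch of building the segmentMetadata query body in code; `segmentMetadata` is a standard Druid native query type, but the datasource and interval below are taken from this log's context, and actually sending the request needs broker access, so only the payload is built here:

```python
import json

def segment_metadata_query(datasource, interval):
    """Build the Druid segmentMetadata query body that the log's
    `curl -X POST 'http://localhost:8082/druid/v2/?pretty'` examples
    send to the broker."""
    return json.dumps({
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
    })

body = segment_metadata_query("test_search_satisfaction_hourly",
                              "2019-09-15/2019-10-02")
```

Comparing the broker's answer to this query with what superset renders is exactly the cross-check nuria performs above: the broker returning data while superset shows none points the finger away from druid itself.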