[04:31:32] (CR) Nuria: [V: +2 C: +2] Add EditAttemptStep's bucket field to EventLogging whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/500524 (owner: Neil P. Quinn-WMF) [04:31:59] (Abandoned) Nuria: whitelist: add hi.wikisource [analytics/refinery] - https://gerrit.wikimedia.org/r/499462 (https://phabricator.wikimedia.org/T218155) (owner: MarcoAurelio) [04:36:48] (Restored) Nuria: whitelist: add hi.wikisource [analytics/refinery] - https://gerrit.wikimedia.org/r/499462 (https://phabricator.wikimedia.org/T218155) (owner: MarcoAurelio) [04:39:06] (CR) Nuria: Add ExternalGuidance event logging table to whitelist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/499270 (https://phabricator.wikimedia.org/T218838) (owner: Chelsyx) [06:54:18] morning! [06:54:21] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) tensorflow-rocm 1.13.1 available for Python 3.7 on PyPi! https... [06:54:29] tensorflow-rocm available for python 3.7! [07:26:15] Analytics, Analytics-Kanban, Product-Analytics: Ingest data from PrefUpdate EventLogging schema into Druid - https://phabricator.wikimedia.org/T218964 (fdans) Oh my bad @Tbayer the test load actually only spans Jan 15, granularity should be fine but I've loaded three days now for good measure :) Her... [07:32:16] morning elukey [07:32:33] o/ [08:15:16] elukey: shall we deploy AQS? [08:15:43] sure! [08:16:21] I'll let you know when aqs1004 is ready [08:16:29] Hi tizianop - I have seen your message from yesterday - I'm interested to understand the use case of your partitionBy with buckets :) [08:16:42] Thanks elukey [08:20:22] joal: did you see https://etherpad.wikimedia.org/p/analytics-cdh5.16.1 [08:20:25] ? 
[08:20:26] still incomplete [08:20:32] but way better than before [08:21:33] joal: I often have to group by session_id, and if I understand correctly, this should reduce the shuffling of the data across different nodes [08:23:17] joal: aqs1004 ready [08:23:56] testing elukey [08:31:05] elukey: all good (sorry for the delay) [08:31:23] elukey: Let's full deploy, then I'll double check on WKS2-UI [08:31:44] tizianop: I think you misunderstand repartitioning and partitionBy :) [08:31:52] ack! [08:32:45] this is jumbo 1004's tx bandwidth [08:32:45] 20:21:47 666764.4 [08:32:45] 20:21:48 765245.4 [08:32:46] 20:21:49 885008.4 [08:32:46] 20:21:50 682260.6 [08:32:48] 21:05:34 628015.8 [08:32:54] pfff :( [08:32:57] yeah :( [08:33:00] we need 10G [08:33:02] yeah [08:33:11] and/or more nodz [08:33:21] moar POWA [08:33:51] elukey: I'm sorry my brain is not in powerful enough mode to review the etherpad :) I'll do it next week :) [08:34:07] I've opened it, read 3 lines, and realized I had just seen letters [08:34:39] joal: oh no :( I'll read the documentation again [08:34:54] joal: ah sorry I should have told you what's changed - basically all steps are done via cumin or SRE's automation [08:35:01] from cumin1001 [08:35:09] so basically way more automated [08:35:42] elukey: I had understood that, but still can't make my way around it :) [08:35:45] tizianop: https://stackoverflow.com/questions/40416357/spark-sql-difference-between-df-repartition-and-dataframewriter-partitionby [08:36:24] tizianop: partitionBy is about reading more or less data based on a data-value (in your example, session_id % 200) [08:37:15] tizianop: repartition(COL) organises the data so that same values for COL end up on the same worker (that's what you want) [08:37:20] I've uploaded the new nodejs 10 yesterday, production aqs can be upgraded to the new version [08:37:45] moritzm: I think it has already been restarted by elukey this morning for another reason, so done it must be :) [08:38:21] PROBLEM - 
aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:38:21] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:38:23] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:38:27] tizianop: However, wow [08:38:30] sorry [08:38:33] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:38:35] not cool [08:39:11] I repooled aqs1004 for a bit and all was good [08:39:31] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:39:31] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:39:33] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:39:43] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [08:39:45] ??? 
[08:39:46] it is probably druid joal https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&orgId=1 [08:40:01] caching misses [08:40:11] ok [08:41:02] elukey: we need the patch nuria made on restbase - fast if we can [08:41:10] Arrf, no deploy on friday :( [08:41:37] joal: I don't think this is related to that, we'd probably need to somehow warm druid's cache [08:41:40] now it seems all fine [08:42:11] elukey: yes, but the query count at historical level is still high [08:42:14] first we need to upgrade the nodejs debs on aqs*, though :-) [08:42:27] joal: yeah basically I wanted the executor to read only "local" data. I guess the trick is to use repartition() before writing. Thanks! :) [08:43:21] tizianop: something else to notice: when you reuse the partitioned data, spark has no way to know that it is already partitioned, so the shuffle stage will still happen [08:43:30] joal: mmm it doesn't seem huge to me [08:43:37] maybe I am not seeing the right metric [08:44:13] number of rps is ok for broker, for historical, it's a lot higher (long timespan queries) [08:44:16] moritzm: we can do it on monday if you are ok [08:44:34] elukey: restart was not enough? [08:44:57] joal: it wasn't deployed as far as I know [08:45:06] ah ok you mean the baseline of rps to historicals [08:45:09] yes yes [08:45:20] joal: uhm, ok. But the data is not moved over network, right? [08:45:26] Ahhh - my bad moritzm - sorry - I shouldn't talk on topics I don't fully understand ;) [08:45:34] tizianop: it will :) [08:46:02] tizianop: possibly less, but some shuffling will happen [08:46:31] joal: aaaargh... 
ok, not much to do at this point :) [08:47:02] tizianop: if you want to optimize at that level, you need to use RDDs and the mapPartitions function [08:47:23] tizianop: meaning you need to go lower-level in terms of parallelization functions [08:48:43] tizianop: another approach - If you are sure the length of sessions is small enough, you can collect the session data in a single row: (session_id, Array[session_items]) [08:49:14] tizianop: But then, you'll end up iterating over session items as arrays, which is not necessarily simpler depending on what you have to do [08:49:54] joal: oook! Thank you very much! [08:50:21] np tizianop - Don't hesitate to ask for help ;) [08:53:28] elukey: Checked the UI, nothing wrong spotted - Thanks for the deploy [08:54:10] Analytics, Analytics-Kanban, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), and 2 others: [Post-mortem] Kafka Jumbo cluster cannot accept connections - https://phabricator.wikimedia.org/T219842 (elukey) Some info from ifstat (time / tx bandwid... [09:33:10] Analytics, Operations, netops: Allow swift https access from analytics to prod - https://phabricator.wikimedia.org/T220081 (fgiunchedi) @ayounsi this is good to go, unless there's signoff needed? [10:01:58] fdans: heya - would you have a minute for a python brainbounce? [10:02:18] joal: let's go batcave! [10:02:35] OMW! [10:18:49] ebernhardson: o/ - while working on granting ssh access to stat1005 to Gilles/Miriam (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/501156/) I started to wonder whether the group that you are in (gpu-testers) is still needed (namely if you need sudo on stat1005). If not I could add you to the new 'gpu-users' group, so in this way we could start moving stat1005 from a "hacky/testing" [10:18:55] environment to a more prod one :) [10:18:57] let me know your thoughts! 
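Note on the repartition / partitionBy thread (08:21 to 08:49): the sketch below is a toy model in plain Python, not Spark code. It only illustrates the idea joal describes for repartition(COL): rows are routed by key so that every row with the same session_id lands in the same partition, after which each partition can build (session_id, [items]) rows on its own. The function names, the sample rows, and the use of `session_id % num_partitions` as the routing rule are assumptions made for illustration; Spark's actual partitioner hashes the column values.

```python
from collections import defaultdict


def repartition_by_key(rows, key, num_partitions):
    """Toy stand-in for repartition(COL): route each row by key(row) % num_partitions."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key(row) % num_partitions].append(row)
    return dict(partitions)


def group_sessions(partition):
    """Collect one partition's rows into {session_id: [items]} using only local data."""
    sessions = defaultdict(list)
    for session_id, item in partition:
        sessions[session_id].append(item)
    return dict(sessions)


# Hypothetical (session_id, item) rows.
rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (5, "f")]

parts = repartition_by_key(rows, key=lambda r: r[0], num_partitions=2)
grouped = {pid: group_sessions(p) for pid, p in parts.items()}
```

Because a session_id never spans two partitions here, the per-partition grouping needs no data from elsewhere. As joal points out in the thread, Spark does not remember this layout once the data has been written and read back, so a later group-by can still shuffle.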
[10:19:38] my final goal is to allow the research team to use the host [10:19:46] (when ready of course) [10:26:54] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:26:58] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:27:20] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:27:28] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:27:47] Oh crap [10:28:09] this me having deleted datasources on druid-public [10:28:18] I wouldn't have expected those issues :( [10:29:26] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:29:26] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: 
/analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:30:14] actually query count on historical nodes is even higher on the new datasource than it was on the previous one :( [10:33:34] something is wrong with druid - elukey - HELP :) [10:37:21] I am here sorry [10:37:26] np elukey [10:37:36] druid didn't like me removing old datasources :( [10:38:31] exceptions in broker log on druid1004 [10:39:50] elukey: can we try bumping brokers on druid machines? [10:40:35] bumping? [10:40:41] restart [10:40:46] I just restarted druid1004's broker [10:40:46] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:40:48] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:40:54] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:40:58] Ooook - Here we go [10:41:21] but WHY ???? [10:42:25] I am doing a rolling restart of all the brokers [10:42:29] Thank you a lot [10:42:34] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:43:24] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:44:00] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [10:45:11] the spikes on 'host overview' dashboard for druid1004 at the moment I dropped the datasources do not look healthy [10:46:33] joal: I was about to go afk for an errand, ~1:30h, is it ok if I go? [10:46:38] or do you prefer me to remain? 
[10:47:06] elukey: All good now, I won't do anything more until you're here, just looking at graphs :) [10:47:13] Thanks a lot for the quick actions ! [10:48:08] np! I am not sure why I am seeing queries for the 07_2018 snapshot [10:48:19] WAT? [10:48:23] ah no wait similar colors [10:48:24] no no [10:48:26] my bad [10:48:28] :) [10:48:33] Aouch - I can breathe again [10:48:37] :) [10:48:47] all right see you later!! [10:48:50] ciao [11:05:29] (PS1) GoranSMilovanovic: initial [analytics/wmde/WD/WD_identifierLandscape] - https://gerrit.wikimedia.org/r/501543 [11:06:11] (CR) GoranSMilovanovic: [V: +2 C: +2] initial [analytics/wmde/WD/WD_identifierLandscape] - https://gerrit.wikimedia.org/r/501543 (owner: GoranSMilovanovic) [11:11:05] (PS1) GoranSMilovanovic: github links [analytics/wmde/WD/WD_identifierLandscape] - https://gerrit.wikimedia.org/r/501545 [11:11:30] (CR) GoranSMilovanovic: [V: +2 C: +2] github links [analytics/wmde/WD/WD_identifierLandscape] - https://gerrit.wikimedia.org/r/501545 (owner: GoranSMilovanovic) [11:38:17] (CR) Hoo man: [C: -1] WIP: count number of Wikidata edits by namespace (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) (owner: Lucas Werkmeister (WMDE)) [11:38:29] (PS3) Lucas Werkmeister (WMDE): WIP: count number of Wikidata edits by namespace [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) [11:46:33] (CR) Hoo man: Track number of links to Wikidata entity namespaces (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) (owner: Michael Große) [11:56:48] (PS4) Lucas Werkmeister (WMDE): Count number of Wikidata edits by namespace [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) [11:59:21] Analytics, Analytics-Kanban: 
Fix druid-public drop-snapshot script - https://phabricator.wikimedia.org/T220111 (JAllemandou) a:JAllemandou [11:59:37] (PS1) Joal: [WIP] update script dropping old druid datasources [analytics/refinery] - https://gerrit.wikimedia.org/r/501551 [12:01:33] (PS2) Michael Große: Track number of links to Wikidata entity namespaces [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) [12:03:58] (CR) Lucas Werkmeister (WMDE): [C: +1] Track number of links to Wikidata entity namespaces [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) (owner: Michael Große) [12:55:56] joal: o/ [12:56:13] do we need to do any follow up on the druid brokers outage other than your email? [13:17:20] Analytics, Research, Article-Recommendation, Patch-For-Review: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (Dzahn) [13:20:26] elukey: I don't think so [13:21:09] elukey: What the outage seems to mean IMO is that too-busy is not good for brokers, but I don't think we can easily do anything for that [13:21:56] joal: too busy due to dropping segments you mean? [13:22:09] I only see exceptions in brokers, all related to timeouts [13:22:13] really strange [13:24:48] elukey: The current strategy sends 1 DELETE request per segment, and if druid is trying to parallelize everything, well it could just be overwhelmed? [13:28:14] joal: sure I wanted to get some proof from metrics or logs.. IIUC the extra DELETE load caused historicals to be busy in dropping segments causing timeouts for brokers? [13:33:02] hey joal, you free? 
[13:37:07] sorry I probably caught you late [13:38:52] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) @EBernhardson question for you - while working on https://gerr... [13:55:23] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (EBernhardson) Doesn't seem to be needed anymore, feel free to start mo... [13:58:19] Analytics, Analytics-Kanban, User-Elukey: Change permissions for daily traffic anomaly reports on stat1007 - https://phabricator.wikimedia.org/T219546 (elukey) @ssingh let's do things properly and move crons and repositories in one place. The dependencies that were discussed in the related task are b... [14:12:38] heya team [14:12:40] :] [14:13:32] o/ [14:17:52] (PS9) Bmansurov: Oozie: add article recommender [analytics/refinery] - https://gerrit.wikimedia.org/r/496885 (https://phabricator.wikimedia.org/T210844) [14:25:18] (CR) Mforns: [V: +2 C: +2] "This looks fine to me then." [analytics/refinery] - https://gerrit.wikimedia.org/r/501045 (https://phabricator.wikimedia.org/T220033) (owner: Nettrom) [14:32:23] Analytics, Analytics-Kanban, Patch-For-Review: Set up edit_hourly data set in Hive - https://phabricator.wikimedia.org/T220092 (mforns) Thanks @Neil_P._Quinn_WMF! Forgot to do that. As you see, we had a slight change of plans in the implementation. We encountered an issue in Druid, which does not al... 
[14:35:15] (CR) Mforns: [C: -2] Add edit_hourly oozie job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/501197 (https://phabricator.wikimedia.org/T220092) (owner: Mforns) [15:06:47] (CR) Mforns: [C: -2] Add edit_hourly oozie job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/501197 (https://phabricator.wikimedia.org/T220092) (owner: Mforns) [15:29:05] (CR) Michael Große: [C: +1] "This looks good to me, but I don't have the rights to deploy/verify this." (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) (owner: Lucas Werkmeister (WMDE)) [15:30:19] (CR) Lucas Werkmeister (WMDE): [C: +1] Track number of links to Wikidata entity namespaces (2 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) (owner: Michael Große) [15:37:30] what a WEEK! [15:38:23] (PS1) Mforns: Add oozie job to load edit_hourly to druid [analytics/refinery] - https://gerrit.wikimedia.org/r/501607 (https://phabricator.wikimedia.org/T211173) [15:41:41] (CR) Mforns: [C: -2] "Still testing!" 
[analytics/refinery] - https://gerrit.wikimedia.org/r/501607 (https://phabricator.wikimedia.org/T211173) (owner: Mforns) [15:49:22] (PS5) Lucas Werkmeister (WMDE): Count number of Wikidata edits by namespace [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) [15:51:27] (PS6) Mforns: Add edit_hourly oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/501197 (https://phabricator.wikimedia.org/T220092) [15:52:20] (PS3) Mforns: Add edit_hourly to list of tables to be purged of old snapshots [analytics/refinery] - https://gerrit.wikimedia.org/r/501328 (https://phabricator.wikimedia.org/T220092) [15:52:32] (PS2) Mforns: Add oozie job to load edit_hourly to druid [analytics/refinery] - https://gerrit.wikimedia.org/r/501607 (https://phabricator.wikimedia.org/T211173) [15:53:11] joal: you mention that the method to delete segments is deprecated, but it is for the latest druid version, which we don't have yet right? 
[15:53:31] as far as i can remember this was the suggested way of deleting for our version of druid [15:54:18] fdans: I think it's not deprecated yet in our version, but it has not been the suggested way to delete from deep-storage - Kill-task was already the suggested way [15:55:10] (CR) Michael Große: Track number of links to Wikidata entity namespaces (3 comments) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) (owner: Michael Große) [15:55:12] fdans: more important than deprecated or not, if kill-tasks allow for not having brokers freaking out, I'll take it for good :) [15:55:27] (PS3) Michael Große: Track number of links to Wikidata entity namespaces [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) [15:55:54] yeayea, totally [15:56:08] still needs to be tested though :) [16:05:43] (CR) Lucas Werkmeister (WMDE): Count number of Wikidata edits by namespace (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) (owner: Lucas Werkmeister (WMDE)) [16:08:32] Analytics, Analytics-Kanban, Product-Analytics: Ingest data from PrefUpdate EventLogging schema into Druid - https://phabricator.wikimedia.org/T218964 (Tbayer) Thanks! To me this looks good to go now, except perhaps that the x-axis coordinates seem a bit weird (each day appears twice - "Wed 16 Wed 1... 
[16:14:04] (CR) Lucas Werkmeister (WMDE): Track number of links to Wikidata entity namespaces (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) (owner: Michael Große) [16:31:50] (CR) Michael Große: Count number of Wikidata edits by namespace (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) (owner: Lucas Werkmeister (WMDE)) [16:33:56] (CR) Lucas Werkmeister (WMDE): Count number of Wikidata edits by namespace (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) (owner: Lucas Werkmeister (WMDE)) [16:34:18] (PS6) Lucas Werkmeister (WMDE): Count number of Wikidata edits by namespace [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500752 (https://phabricator.wikimedia.org/T218901) [16:35:56] (PS4) Michael Große: Track number of links to Wikidata entity namespaces [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) [16:46:37] Analytics, Analytics-Kanban: Refactor druid data deletion script - https://phabricator.wikimedia.org/T220111 (JAllemandou) [16:46:48] nuria: is the description ok --^ ? [16:47:24] joal: can you add the problem caused when you tried to delete? [16:47:30] yes [16:47:41] nuria: I actually deleted [16:48:54] Analytics, Analytics-Kanban: Refactor druid data deletion script - https://phabricator.wikimedia.org/T220111 (JAllemandou) [16:49:25] joal: i thought alarms got fired? 
[16:50:02] yes - druid-brokers went unresponsive after the deletion - elukey had to restart them [16:50:41] (CR) Lucas Werkmeister (WMDE): [C: +2] Track number of links to Wikidata entity namespaces [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/500966 (https://phabricator.wikimedia.org/T218903) (owner: Michael Große) [16:53:29] joal i think that is the important bit of why we should prioritize this work right? [16:55:04] nuria: correct, but I'll test (on monday, with Luca around) a deletion using a kill-task to make sure it solves the issue (could also be related to the historical being too busy dropping segments to respond to brokers, making them freak out) [16:55:09] Will test on monday [16:55:16] joal: k [16:56:22] Gone for dinner team [16:59:11] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) The long list of patches above was needed to allow to deploy t... [16:59:54] logging off folks, see you on Monday! [17:16:45] Analytics, Operations, netops: Allow swift https access from analytics to prod - https://phabricator.wikimedia.org/T220081 (ayounsi) a:ayounsi [18:33:57] mforns_: if you are there, in the edits_hourly what are the count and "edit count" differences? [18:45:38] (PS1) Zhuyifei1999: SECURITY: escape CSV injections [analytics/quarry/web] - https://gerrit.wikimedia.org/r/501662 (https://phabricator.wikimedia.org/T209226) [18:46:04] sorry team, my internet is really bad today... [18:46:16] (CR) Zhuyifei1999: [C: +2] SECURITY: escape CSV injections [analytics/quarry/web] - https://gerrit.wikimedia.org/r/501662 (https://phabricator.wikimedia.org/T209226) (owner: Zhuyifei1999) [18:46:22] now, edits_hourly is in druid! 
https://turnilo.wikimedia.org/#edits_hourly [18:49:22] (PS3) Mforns: Add oozie job to load edit_hourly to druid [analytics/refinery] - https://gerrit.wikimedia.org/r/501607 (https://phabricator.wikimedia.org/T211173) [19:06:54] Analytics, Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (mforns) Hey all! The first test of edits_hourly is in Druid: https://turnilo.wikime... [19:12:38] Analytics, Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (Nuria) The hourly granularity on display looks a bit strange (see screenshot), could... [19:18:17] Analytics, Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (mforns) Wow, that's weird. What I see is that all pm hours are not there... I can s... [19:24:32] Quarry, Security: Quarry can be affected by CSV Injection - https://phabricator.wikimedia.org/T209226 (sbassett) [19:38:41] sorry team, both my internet connections are flapping today, will log off [19:41:12] (PS3) Chelsyx: Add ExternalGuidance event logging table to whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/499270 (https://phabricator.wikimedia.org/T218838) [20:26:53] (PS1) HaeB: Add MobileWebShareButton schema to EventLogging whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/501715 (https://phabricator.wikimedia.org/T207280) [21:53:58] Anyone from analytics about? 
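Note on the kill-task idea from the 15:53 to 16:55 discussion: a minimal sketch, assuming the standard Druid "kill" task spec (fields `type`, `dataSource`, `interval`), of building one task for a whole datasource and interval instead of sending one DELETE request per segment. The datasource name and interval below are hypothetical, and the snippet only builds the JSON body; to my understanding such a spec is POSTed to the Overlord's task endpoint (`/druid/indexer/v1/task`) and only removes segments already marked unused, but both points should be checked against the documentation for the deployed Druid version. This is not the refinery script itself.

```python
import json


def build_kill_task(datasource, start, end):
    """Build the JSON spec of a Druid 'kill' task for segments in [start, end)."""
    return {
        "type": "kill",
        "dataSource": datasource,
        # Druid intervals are ISO-8601 'start/end' strings.
        "interval": f"{start}/{end}",
    }


# Hypothetical datasource name and interval, for illustration only.
task = build_kill_task("example_snapshot_2018_07", "2001-01-01", "2019-01-01")

# Serialized request body; submitting it is deliberately left out of this sketch.
payload = json.dumps(task, sort_keys=True)
```

One task per datasource keeps the number of requests constant regardless of how many segments the snapshot holds, which is the property joal wants to test on Monday.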
[22:07:05] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:08:15] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:09:09] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:09:11] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:09:17] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:10:19] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:10:21] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:15:35] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:17:23] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: 
/analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:18:17] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:18:39] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:20:43] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:26:01] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:26:31] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:27:21] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:27:47] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: 
/analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:28:31] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:29:05] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:30:21] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:33:47] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:33:52] Analytics, Analytics-Cluster, Discovery-Search (Current work), Patch-For-Review: Implementation and testing of glent m0 integration with wmf infra - https://phabricator.wikimedia.org/T218164 (debt) Open→Resolved [22:34:59] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:35:01] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:36:13] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:37:35] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:40:11] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:40:47] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:42:01] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:42:47] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:42:51] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:42:51] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs 
[22:43:15] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:43:19] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:43:57] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:44:01] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:44:01] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:44:05] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:48:03] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:49:15] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:49:17] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:53:09] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:53:47] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get 
daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:54:33] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:55:49] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:56:23] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:57:03] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:57:05] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:57:07] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [22:58:23] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [23:00:39] Seddon: yes [23:01:03] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received 
https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [23:02:11] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [23:21:07] (CR) Nuria: [C: +2] Add ExternalGuidance event logging table to whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/499270 (https://phabricator.wikimedia.org/T218838) (owner: Chelsyx)