[00:16:35] Analytics, Analytics-Cluster, Operations, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (diego)
[01:55:52] Analytics: Add awawiki to anlaytics whitelist - https://phabricator.wikimedia.org/T253225 (Ladsgroup) Open→Resolved a:Milimetric It was added in this commit: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/597566/1/static_data/pageview/whitelist/whitelist.tsv
[01:56:44] Analytics, User-Urbanecm: Add gomwiktionary to analytics whitelist - https://phabricator.wikimedia.org/T253227 (Ladsgroup) Open→Resolved a:Milimetric It was added in this commit: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/597566/1/static_data/pageview/whitelist/whitelist.tsv
[02:09:54] Analytics, Analytics-Cluster, Operations, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (Reedy)
[02:10:06] Analytics, Analytics-Cluster, Operations, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (Reedy)
[04:49:34] (CR) Nuria: "Just looked at overall idea, real nice." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[04:58:05] Analytics, Analytics-Kanban, Patch-For-Review: Create job that backfills Pagecounts-EZ (2011 - 2016) data via hadoop correcting issues - https://phabricator.wikimedia.org/T252857 (Nuria) Is our plan to maintain the data in hadoop tables as well? So the data ingested will remain?
[05:06:38] Analytics, Analytics-Kanban, LDAP-Access-Requests, Operations: LDAP access to the wmf group for Segun Oworu (superset, turnilo, hue) - https://phabricator.wikimedia.org/T252703 (Nuria) I think all access needed is granted as @soworu is able to access turnilo and superset. I have to say that this...
[05:09:19] Analytics: Grant not able to access superset - https://phabricator.wikimedia.org/T253281 (Nuria)
[05:39:27] Analytics, Analytics-Kanban, Patch-For-Review: Create job that backfills Pagecounts-EZ (2011 - 2016) data via hadoop correcting issues - https://phabricator.wikimedia.org/T252857 (fdans) I think unless we have a good reason to keep it, data should only be kept as long as it's useful to generate the d...
[05:47:10] Analytics, Analytics-Cluster, Operations, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (elukey) Open→Resolved a:elukey @diego any user in analytics-privatedata-users can have access to the GPUs by default (since a few months...
[05:49:28] Analytics, Analytics-Cluster, Operations, SRE-Access-Requests: Giving Access to gpu-testers to Rodolfo - https://phabricator.wikimedia.org/T253274 (elukey) Also another detail: see the versions of tensorflow supported - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/AMD_GPU#Use_ten...
[06:22:27] Analytics, Inuka-Team, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (fdans) +1 to what Nuria said, but I'm moving this to incoming, since the completion of bot detection means we should probably reprioritize t...
[06:45:17] Analytics, Analytics-Kanban: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771 (elukey) Had to revert, this error popped up: ` May 20 17:12:36 an-druid1001 druid[19141]: 12) Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing....
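Note on the "Not enough direct memory" error above: the Druid basic cluster tuning guide gives the direct memory a process needs for its processing buffers as druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1). A minimal sketch of that arithmetic follows. The function name and the sample values are hypothetical, chosen only for illustration; they are not the settings that were applied on an-druid1001.

```python
def required_direct_memory(buffer_size_bytes: int,
                           num_threads: int,
                           num_merge_buffers: int) -> int:
    """Return the direct memory (in bytes) needed for Druid processing buffers.

    Mirrors the sizing rule from the Druid tuning guide:
    sizeBytes * (numThreads + numMergeBuffers + 1).
    """
    if min(buffer_size_bytes, num_threads, num_merge_buffers) < 0:
        raise ValueError("all settings must be non-negative")
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


if __name__ == "__main__":
    # Hypothetical settings: 500 MiB buffers, 15 processing threads, 4 merge buffers.
    needed = required_direct_memory(500 * 1024**2, 15, 4)
    # -XX:MaxDirectMemorySize must be at least this large for the process to start.
    print(f"needed direct memory: {needed / 1024**3:.2f} GiB")
```

If the printed value exceeds the JVM's -XX:MaxDirectMemorySize, either that limit has to go up or one of the three druid.processing settings has to come down.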
[07:25:41] (PS2) Fdans: Add special explode UDTF that turns EZ-style hourly strings into rows [analytics/refinery/source] - https://gerrit.wikimedia.org/r/596605 (https://phabricator.wikimedia.org/T252857)
[07:26:19] (CR) Fdans: "Addressed most of the problems, a couple replies below" (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/596605 (https://phabricator.wikimedia.org/T252857) (owner: Fdans)
[07:30:08] (PS1) Fdans: Add UDF that transforms Pagecounts-EZ projects into standard [analytics/refinery/source] - https://gerrit.wikimedia.org/r/597740 (https://phabricator.wikimedia.org/T252857)
[08:15:19] !log roll restart of all druid historicals in the analytics cluster to pick up new settings
[08:15:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:32:43] (PS8) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[08:36:22] brb
[08:57:31] (PS9) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:00:59] (PS10) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:04:00] (PS11) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:04:24] Analytics: Upgrade turnilo to latest upstream - https://phabricator.wikimedia.org/T253294 (elukey) p:Triage→Medium
[09:05:14] !log move turnilo to an-druid1001 (beefier host)
[09:05:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:05:39] I cannot change the superset broker field for druid analytics
[09:05:40] weird
[09:06:04] fdans: do you have a min to try?
[09:06:33] elukey: sure! try what?
[09:07:27] so I am trying to move superset's druid-analytics config to an-druid1002
[09:07:44] you can go in Superset -> Druid Clusters and then edit the analytics entry
[09:07:54] swap druid1003 with an-druid1002
[09:08:08] the main issue for me is that "save" completes, but I still see the old value
[09:13:48] elukey: looking
[09:15:25] elukey: same thing happens for me
[09:15:39] what about deleting the record and creating a new one with the changed value?
[09:16:01] !log move Druid Analytics SQL in Superset to druid://an-druid1001.eqiad.wmnet:8082/druid/v2/sql/
[09:16:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:16:14] the sqlalchemy table works fine
[09:16:53] fdans: I fear that if we delete/recreate and there is a new id for the druid cluster then charts will be broken
[09:16:59] this is a speculation of course
[09:17:03] we can try in staging and see
[09:19:13] joal: confirming that loading an external table with files in hdfs and then dropping the table removes the files
[09:19:40] i thought my problem was that I wasn't setting a location, but I set one and the issue is still there
[09:37:57] "A 1:15 ratio of Brokers to Historicals is a reasonable starting point (this is not a hard rule)."
[09:38:00] ahahhha sure
[09:38:12] https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html
[09:44:01] ew, check out the very gross https://hue.wikimedia.org/oozie/list_oozie_coordinator/0018287-200507064132789-oozie-oozi-C/
[09:44:14] this is what coordinators that need to pull in hourly events data for 30 days look like
[09:44:15] ew
[09:46:47] Analytics, Analytics-Kanban, Research, Patch-For-Review: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (Milimetric) This is deployed, should be processing the 5/18 snapshot sometime soon.
[09:47:13] hello milimetric :)
[09:47:19] howdy :)
[09:47:29] need to run some errands, bbiab, available on the phone if needed
[09:47:33] * elukey afk
[09:48:45] oh wow, 1/15 ratio would be so awesome. Yeah, though, see, this is what I'm usually saying, our cluster's basically tiny. Does make me think, would we get better performance if we had fewer brokers somehow?
[09:49:22] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Milimetric) p:Triage→High
[09:51:13] (PS12) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[10:24:28] (PS13) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[10:28:00] (PS14) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[10:50:36] milimetric: one thing that we could do in the future is split hosts between historicals and broker+overlord+etc..
[10:51:26] right, would certainly make figuring out where bottlenecks are easier, no?
[10:51:36] or can you monitor resource use per process pretty easily now
[10:55:22] atm the resource usage on host is not that much, our historicals are not even using half of the heap
[10:55:48] so in theory for our volume of traffic we should be ok..
[10:56:10] the new hosts are beefier, 128G of ram and 64 vcores
[10:56:43] before getting the bottleneck in those we'll need a lot more of queries :D
[10:57:34] I am reviewing now again all parameters that we set, Druid has really too many
[11:14:26] quick lunch!
[11:16:25] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Milimetric) I made the python script I was talking about, you pass it output from information_schema and it makes queries that look like they should wor...
[11:56:33] Hello! When I query mediawiki_wikitext_current, I get unexpected results. basically plenty of NULL rows, sometimes even conflicting with my predicates. (using Hive on Hue)
[11:59:17] djellel: hi! I'd suggest to open a task with the query that you are executing and what is the issue, so we can repro and provide feedback
[11:59:36] (just add the Analytics tag and we'll read it)
[11:59:52] also, aside from the data, have you tried superset sqllab?
[12:00:12] https://superset.wikimedia.org/superset/sqllab
[12:00:16] Ok. but just to confirm, is mediawiki_wikitext_current the table that contains current wiki
[12:00:57] I am a little bit ignorant about it so cannot say, will let others to chime in
[12:01:01] fdans: are you around?
[12:03:22] elukey: hello
[12:03:56] fdans: hola, if you have time can you check djellel's qs? --^ :)
[12:05:58] djellel: have you read this page? https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Content/XMLDumps/Mediawiki_wikitext_current
[12:06:13] `ye
[12:06:21] yes i did
[12:07:24] the description is relevant, but there's something off when I query the table.
[12:09:45] I will open a ticket when I gather more information
[12:10:26] djellel: thank you!
[12:10:40] can't help much I'm afraid, I'm not familiar with this dataset
[12:15:02] djellel: as follow up question, what do you use Hue for generally?
[12:15:14] because I'd love to deprecate it in the future
[12:22:55] a-team: I am going to do some invasive maintenance to the druid public cluster, I need to apply new settings and it will mean taking down one node at the time. I'll depool them etc.. but please let me know if anything looks weird in the meantime
[12:35:12] druid1004 is now recovering
[12:35:31] elukey: ad-hoc queries. I use it extensively.
[12:36:33] djellel: if you have time, I'd like you to try superset's sqllab (that uses Presto behind the scenes) to see if it can be a replacement
[12:37:02] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#SQL_Lab
[12:37:31] it should also be way quicker than Hue since it doesn't query Hive directly
[12:41:00] druid1004 done and repooled
[12:41:12] will wait 5/10 mins and proceed with druid1005
[12:41:37] ok, i can check it out
[12:43:34] djellel: it changed my life :)
[12:44:57] milimetric: if I can write sql and do regex, I am happy :)
[12:48:06] proceeding with druid1005
[12:52:19] (PS15) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[13:01:31] Quarry, Data-Services, cloud-services-team (Kanban): Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag - https://phabricator.wikimedia.org/T251719 (Marostegui) This is a proof on how much CPU intensive the Analytics role is. This is a CPU graph from labsdb1010 as soon as I de...
[13:04:34] Quarry, Data-Services, cloud-services-team (Kanban): Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag - https://phabricator.wikimedia.org/T251719 (Marostegui) And the effect on purge and lag is huge too after it got rid of that role: {F31835696} {F31835695}
[13:09:57] druid1006 done
[13:13:14] !log stop druid-daemons on druid100[1-3] (one at the time) to move the druid partition from /srv/druid to /srv (didn't think about it before) - T252771
[13:13:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:13:17] T252771: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771
[13:13:30] so since I am stupid I'll do the same with druid100[1,3]
[13:17:03] !log kill wmf_netflow druid supervisor for maintenance
[13:17:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:42:56] (PS16) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[13:53:30] !log restart druid-historical on an-druid100[1,2] to pick up new settings
[13:53:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:02:43] !log restart druid kafka supervisor for wmf_netflow after maintenance
[14:02:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:14:52] (PS17) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[14:18:12] * elukey afk for a bit before meetings
[14:37:45] (PS18) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[14:39:08] me everytime I test oozie jobs: https://i.imgur.com/79XYx1L.gif
[14:56:18] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[14:59:21] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Nuria) Before calling it good let's make sure we can scoop a bunch of tables and that data in those tables is queryable for all columns, it is happen be...
[16:20:01] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Milimetric) The following tables will not be imported because they have 0 rows: MobileWikiAppiOSSessions_18064102, ServerSideAccountCreation_5014296, H...
[16:44:34] !log roll restart druid historical nodes on druid100[4-6] (public cluster) to pick up new settings - T252771
[16:44:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:44:36] T252771: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771
[17:24:30] !log add druid100[7,8] to the druid public cluster (not serving load balancer traffic for the moment, only joining the cluster) - T252771
[17:24:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:24:33] T252771: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771
[17:39:10] a-team: I am adding two new nodes to the Druid public cluster, for the moment not serving LB traffic from AQS (but only sharing historical traffic etc..)
[17:39:17] please let me know if anything is weird
[17:42:51] very nice I can see the new nodes in the coordinator's ui
[17:43:16] segments are not balanced of course, the coordinator will try to do it slowly
[17:45:12] going to step away for a bit, will check later, but so far all good!
[17:51:22] Analytics, Operations, Traffic: Publishing project anomaly data for censorship researchers. Evaluate privacy threats - https://phabricator.wikimedia.org/T183990 (RLazarus) p:Triage→Medium a:ssingh Trying to route this -- @ssingh, should this be assigned to you?
[17:53:36] Analytics, Operations, Traffic: Publishing project anomaly data for censorship researchers. Evaluate privacy threats - https://phabricator.wikimedia.org/T183990 (ssingh) Hi, yes that's fine for now. The privacy threats will be more suited for the Security team but I will triage it again when required...
[18:25:41] a-team: atlas, requires (do not cry) hbase
[18:31:09] Analytics, Operations, Readers-Web-Backlog, Traffic: Mobile redirects drop provenance parameters - https://phabricator.wikimedia.org/T252227 (BBlack)
[19:03:06] just chiming in quickly nuria - atlas uses Titan (actually janus, the open-source fork of it) as backend, which is backend agnostic and can be set to cassandra
[19:38:02] just re-checked the druid clusters, so far all good
[19:38:58] tomorrow I'll add the two new nodes to the public's load balancer VIP
[19:39:07] ttl :)
[20:17:39] Analytics, Analytics-Kanban, Product-Analytics: EventLogging data missing from event_sanitized schemas - https://phabricator.wikimedia.org/T253182 (mforns) a:Milimetric→mforns
[20:31:40] Analytics, Analytics-Kanban, Product-Analytics: EventLogging data missing from event_sanitized schemas - https://phabricator.wikimedia.org/T253182 (mforns) @MMiller_WMF I just left the sanitization back-fill running for 2020-04-29 -> 2020-05-05 (issue period). It will take a couple hours. Tomorrow I...
[21:38:03] Hi @nuria, wanted to confirm that the new fields we plan to add to VisualEditorFeatureUse Schema will be handled automatically in Eventlogging and no changes will be required by Analytics. https://phabricator.wikimedia.org/T244498#6149508
[23:02:28] Analytics, Product-Analytics: [Spike] Should EventLogging support DNT? - https://phabricator.wikimedia.org/T252438 (JFishback_WMF) Sorry for my delayed response here - I wanted to spend time really reviewing the disparate viewpoints and refining my own thoughts on the subject, so thank you for your patie...