[04:21:11] 10Analytics, 10Data-release, 10Privacy Engineering, 10Research, 10Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (10Nuria) Nice @Isaac need to get back to this now that https://phabricator.wikimedia.org/T... [06:49:28] good morning [07:46:29] Good morning [07:50:49] bonjour [07:52:50] o/ [08:08:03] elukey: morning! this module is not used in production but kept intentionally for the cloud: https://gerrit.wikimedia.org/r/c/operations/puppet/+/657045 [08:08:15] I don't know where it's used in the cloud though [08:12:38] Amir1: morning! Let's keep it for the moment, we might need it, and we'll do the clean up when EL will be decommed! Thanks! (merged) [08:12:56] Thanks ^^ [08:15:03] I don't think there's much left in analytics. Just some in refinery [08:16:14] wow really nice, thanks a lot :) [08:33:30] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Update Spicerack cookbooks to follow the new class API conventions - https://phabricator.wikimedia.org/T269925 (10elukey) [08:34:44] going to take a little break, bbl! [09:13:39] (03CR) 10Awight: [C: 03+1] "Yep, looks right. CodeMirror is enabled on all wikis, so the full list of dbs should be queried. 
Might be good to comment that these met" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/655886 (https://phabricator.wikimedia.org/T271894) (owner: 10WMDE-Fisch) [09:28:27] (03PS1) 10Joal: [WIP] Fix wikitext history job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/657053 (https://phabricator.wikimedia.org/T269032) [09:31:54] (03PS2) 10WMDE-Fisch: Update schema with core bucket labels [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/656901 (https://phabricator.wikimedia.org/T269986) [09:33:12] (03CR) 10jerkins-bot: [V: 04-1] Update schema with core bucket labels [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/656901 (https://phabricator.wikimedia.org/T269986) (owner: 10WMDE-Fisch) [09:38:39] (03PS3) 10WMDE-Fisch: Update schema with core bucket labels [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/656901 (https://phabricator.wikimedia.org/T269986) [09:39:43] (03CR) 10jerkins-bot: [V: 04-1] Update schema with core bucket labels [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/656901 (https://phabricator.wikimedia.org/T269986) (owner: 10WMDE-Fisch) [09:53:29] 10Analytics, 10Data-Persistence-Backup: Matomo backup size got halved, normally pointing to a backup or underlying data issue - https://phabricator.wikimedia.org/T272344 (10jcrespo) [09:54:11] 10Analytics, 10Data-Persistence-Backup: Matomo backup size got halved, normally pointing to a backup or underlying data issue - https://phabricator.wikimedia.org/T272344 (10jcrespo) Luca may know matomo the most? 
[10:03:01] joal: I created https://gerrit.wikimedia.org/r/c/operations/puppet/+/635751 as initial config for the 18 worker nodes [10:03:20] it is very simple and it doesn't add the datanodes on the masters yet, just to keep the bootstrap simple [10:03:42] today I'll follow up with dcops to get the 2 nodes that are not connected to the switch fixed [10:03:51] and tomorrow I think we'll be able to bootstrap the cluster [10:04:13] going to do the prep-work (kerberos keytabs etc..) now [10:04:33] 10Analytics, 10Data-Persistence-Backup: Matomo backup size doubled, we should check this is normal operation - https://phabricator.wikimedia.org/T272344 (10jcrespo) [10:05:18] 10Analytics, 10Data-Persistence-Backup: Matomo backup size doubled, we should check this is normal operation - https://phabricator.wikimedia.org/T272344 (10jcrespo) p:05Triage→03Low I just realized it doubled, not halved, which would be a way more common operation to happen. [10:09:04] elukey: do you wish me to review anything special in that conf or is it standard?
[10:10:02] 10Analytics, 10Data-Persistence-Backup: Matomo backup size doubled, we should check this is normal operation - https://phabricator.wikimedia.org/T272344 (10jcrespo) piwik log tables seem to be mainly responsible for the growth: {P13830} [10:14:57] 10Analytics, 10Data-Persistence-Backup: Matomo database backup size doubled, we should check this is normal operation - https://phabricator.wikimedia.org/T272344 (10jcrespo) [10:19:20] joal: it is standard for the moment, nothing really big changing [10:19:28] ack elukey [10:19:58] joal: I am thinking about the future, and the fact that we may need a permanent backup cluster in codfw [10:20:02] or a similar solution [10:20:09] makes sense elukey [10:20:25] Bigtop is going to release 3.x during the next months, likely with hadoop 3.2.x [10:20:34] and there is already a task for 3.3.0 [10:20:37] this is very cool :) [10:21:16] plus Bullseye will only be supported on 3.2.x, so I think that we'll upgrade at a different pace during the next couple of years :D [10:21:37] but if every time we have to get crazy to save the data it will become a burden [10:26:24] joal: IIUC our current need is backing up ~350 (unreplicated) TBs [10:26:28] possibly growing in the future [10:26:34] (surely growing :P) [10:28:01] elukey: I don't remember how much precisely - will do a check [10:35:54] elukey: I find 391TB [10:36:22] ah nice even more :D [10:36:28] :( [10:36:32] ok so around 400TB [10:36:39] elukey: we have some ever-growing data [10:36:55] so the more we wait, the bigger it gets :( [10:37:28] Yes yes this is also to follow up with Jaime, to see if any other backup solution for this huge amount of data could be envisioned [10:37:50] elukey: something to consider in that discussion [10:38:09] elukey: The data we're talking about to backup is not trimmed at all [10:38:46] another thing to know would be how much we expect it to grow in say a year [10:38:48] elukey: If there are backup solutions, we should be able to define
levels of need [10:38:50] 50TB? 100TB? [10:39:16] For instance, we have to backup pageview data [10:39:25] and, we maybe should backup user data [10:39:59] and, we probably don't want to backup webrequest data, even if in our case we do because we wish to keep the cluster usable for analysis even in case of problems [10:40:00] okok I am going to open a task [10:40:29] well I'd strive for a solution that would work for upgrades and normal day-to-day [10:40:40] without the risk of us getting paranoid :D [10:47:17] elukey: I hear your point - Our current backup strategy is too bold IMO - It'll be easier (feasible might be the correct word) to maintain a strategy of what to backup when we'll have a data-governance tool (or so I hope) [10:48:32] joal: definitely yes [11:19:25] (03CR) 10ZPapierski: [C: 03+1] "LGTM, with a very small nit (with which I'm fine if it's ignored)." (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/647723 (https://phabricator.wikimedia.org/T269619) (owner: 10DCausse) [12:13:00] (03PS3) 10WMDE-Fisch: Collect metrics of all wikis [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/655886 (https://phabricator.wikimedia.org/T271894) [12:13:19] (03CR) 10WMDE-Fisch: "> Patch Set 2: Code-Review+1" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/655886 (https://phabricator.wikimedia.org/T271894) (owner: 10WMDE-Fisch) [12:25:02] (03CR) 10Awight: [C: 03+1] Collect metrics of all wikis [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/655886 (https://phabricator.wikimedia.org/T271894) (owner: 10WMDE-Fisch) [12:31:33] elukey: +1'd your backup cluster change (635751) once more :) [12:32:20] klausman: thanks! :D [12:32:52] I need to follow up with dcops for a couple of nodes, but the rest look good! [12:47:03] * elukey lunch! [14:02:08] hellooo teamm [14:02:40] yoohoo [14:03:41] hey ottomata :] I wanted to confirm with you that the growth server-side schemas can be productionized?
[14:04:29] nope not yet! [14:04:36] ok, np [14:04:52] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventLogging/+/655999 [14:05:04] looking [14:05:31] ok, will follow this one [14:07:20] gonna follow up on that as soon as I'm done with emails this morn [14:13:28] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Platform Team Workboards (Clinic Duty Team): Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (10AMooney) [14:15:13] (03CR) 10Ottomata: ":)" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/656897 (owner: 10Joal) [14:19:40] ottomata: no rush at all! I just thought that you asked me to move on with the php schema migrations, and wanted to be sure.. [14:26:10] 10Analytics, 10Event-Platform: Some refined events folders contain no data while they should - https://phabricator.wikimedia.org/T272177 (10Ottomata) > : the remove-canary-event function drops any event not having the meta.dt You mean `meta.domain`? `remove_canary_events` should only remove events that explic... [14:26:44] hello folks :) [14:28:53] hello! [15:00:27] Heya folks - Gone for kids, back for standup [15:17:36] 10Analytics, 10Data-Persistence-Backup: Matomo database backup size doubled, we should check this is normal operation - https://phabricator.wikimedia.org/T272344 (10elukey) @jcrespo Thanks a lot for the ping, I'll review the data with @razzi and we'll get back to you asap. Really great alert! I like it :) [15:18:04] razzi: hello!
When you are around, let's check https://phabricator.wikimedia.org/T272344, I think it is a good way to learn a bit about the matomo + backups part of the infra :) [15:41:47] 10Analytics: Upgrade to Superset 1.0 - https://phabricator.wikimedia.org/T272390 (10elukey) [15:42:00] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Superset Updates - https://phabricator.wikimedia.org/T211706 (10elukey) [16:06:04] mforns, joal, milimetric: howdy! right now, the endpoint I'm working on is intended to be named `top-articles-by-country`, but I was wondering if `top-pages-per-country` (or `top-articles-per-country`) would be better [16:07:01] I was thinking this because: 1) not all items returned by the endpoint are articles (e.g., home page), and 2) per country signifies that the data is not split across all countries (such as editors/by-country or pageviews/top-by-country), but rather filtered by a single country (such as pageviews/per-article) [16:08:01] lexnasser: per-country rather than by-country makes sense to me! [16:09:14] re. page vs article, I also like page more, and in wikistats page is widely used, so I'm all for page [16:11:25] mforns: my only hesitation about using `top-pages` over `top-articles` is that the pageviews/per-article endpoint returns data for pages like the home page, so we might want to stick with `top-articles` just for convention. do you think this is important enough to not switch? [16:16:00] just ran the cookbooks to upgrade from cdh to bigtop 1.5, all good [16:16:18] I am going to wait a bit to let things stabilize, and then I'll test the rollback [16:21:15] lexnasser: maybe yes, let's see what others say? [16:22:46] lexnasser: so the URL could be pageviews/top-per-country, sidestepping the issue and being consistent with pageviews/top and pageviews/per-article [16:22:54]
[16:23:40] but top-pages-per-country looks like a name for something other than the URL, what am I forgetting? [16:24:50] milimetric: I like that! [16:25:24] k :) [16:25:40] maybe our cassandra keyspace can finally fit into the size constraint! [16:25:45] mforns: what do you think [16:25:46] hahaha [16:26:49] lexnasser, milimetric: also like that :] [16:30:22] great! I'll wait to see if joal has any reservations and then make the necessary changes if not [16:32:37] 10Analytics-Clusters, 10DC-Ops, 10SRE, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445 (10RobH) a:05RobH→03Cmjohnson I'm unsubscribing myself from this, as its been taken over by the subteam, and its causing a lot of noise in my phabricator... [16:32:48] Hi team! [16:32:49] elukey: I saw your message about matomo backups, is now a good time to look at it? [16:56:19] a-team: today's tech mgmt meeting is 30min longer so I won't be in standup, but I'll join for retro (pls start without me if needed) [17:02:19] razzi: sorry I was in a meeting! [17:22:20] 10Analytics, 10Analytics-Kanban, 10serviceops, 10Patch-For-Review, 10User-jijiki: Mechanism to flag webrequests as "debug" - https://phabricator.wikimedia.org/T263683 (10Milimetric) ping @jijiki on the above question ^. In the meantime we have another request to add client source port to this data, so I... [17:23:53] 10Analytics, 10Analytics-Kanban: Add client TCP source port to webrequest - https://phabricator.wikimedia.org/T271953 (10Milimetric) Hi! We'd like to add this to the X-Analytics header, if that's ok with everyone. This way we don't have to add a new field. Here's another task that's adding a debug flag this... 
[17:30:02] joal: [17:30:02] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/TransformFunctions.scala#L591 [17:30:06] keep everything [17:30:16] where meta.domain is null or meta.domain != canary [17:30:21] so it should keep meta.domain === NULL [17:30:37] ack ottomata [17:30:42] it is something else then :( [17:32:18] joal: what are your thoughts on the `top-per-country` naming as discussed above [17:32:53] pageviews/top-per-country sounds great to me :) [17:33:00] also we only just created that filter and it shouldn't have run on the data back in october [17:33:23] joal: cool! will get that changed right away [17:33:26] ok ottomata - probably unrelated then - sorry for the noise [17:33:31] joal: i lost link to task [17:33:34] you have? [17:33:42] https://phabricator.wikimedia.org/T272177 [17:33:45] ottomata: -^ [17:33:56] hm joal legacy eventlogging data runs through [17:34:08] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/TransformFunctions.scala#L486-L510 [17:34:32] i think this is the same webHost that dan and I saw when we had this problem last week: login.wikimedia.org [17:37:02] this should be matched by the UDF :( [17:38:37] yeah [17:38:57] so don't know why that data wasn't refined [17:39:04] but, refine_sanitize shouldn't fail if no data I think [17:40:55] 10Analytics, 10Event-Platform, 10Product-Infrastructure-Data: PrefUpdate Event Platform Migration - https://phabricator.wikimedia.org/T267348 (10Ottomata) According to @nettrom_WMF and @MNeisler, client_ip and geocoded data can be removed [18:25:02] ottomata: let me know when you have time for the refine failure issues [18:29:22] joal: got 30 mins til next meeting [18:29:27] bc?
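The canary rule ottomata quotes above — keep an event when meta.domain is null or differs from the canary marker — can be sketched as a plain predicate outside Spark; the dict layout and the literal "canary" value are illustrative assumptions, not the actual refinery remove_canary_events implementation:

```python
# Sketch of the rule described above: keep an event when meta.domain is
# null OR meta.domain != the canary marker. The event shape and the
# literal "canary" value are illustrative assumptions.
CANARY_DOMAIN = "canary"

def is_kept(event: dict) -> bool:
    domain = (event.get("meta") or {}).get("domain")
    return domain is None or domain != CANARY_DOMAIN

events = [
    {"meta": {"domain": "en.wikipedia.org"}},  # normal event: kept
    {"meta": {"domain": None}},                # null domain: kept
    {"meta": {"domain": "canary"}},            # canary event: dropped
    {"meta": {}},                              # missing domain: kept
]
kept = [e for e in events if is_kept(e)]
```

Note the predicate only drops events that explicitly carry the canary domain, which is why a null meta.domain surviving (as ottomata expects) points the investigation elsewhere.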
[18:30:15] sure - tardis ottomata [18:30:23] https://meet.google.com/kti-iybt-ekv [18:30:59] ottomata: --^ [18:36:33] elukey: I'm in the bc [18:36:44] razzi: just got to the laptop, joining [19:01:06] joal: uou, my meeting moved 15 mins, wanna go back to tardis? [19:01:11] sure [19:10:13] razzi: cannot hear you anymore :( [19:17:17] (03PS1) 10Joal: Fix DataFrameExtension.convertToSchema repartition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/657171 (https://phabricator.wikimedia.org/T272177) [19:17:27] ottomata: --^ [19:17:36] mforns: as well --^ [19:27:27] (03CR) 10Gergő Tisza: "Elasticsearch is switching to a non-free license, which might or might not be a problem for using ECS. See T272238." [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/647025 (owner: 10Jason Linehan) [19:34:16] (03CR) 10Ottomata: [C: 03+1] Fix DataFrameExtension.convertToSchema repartition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/657171 (https://phabricator.wikimedia.org/T272177) (owner: 10Joal) [19:38:37] joal: cool, meeting over. back into looking at why no data for that hour [19:38:50] ottomata: nailed it - patch in minutes [19:39:58] oh COOL! ok great, actually going for quick walk outside before the sun goes down [19:40:00] back shortly [19:45:07] actually no patch was needed ottomata - The reason for the failure is the update of the webrequest.isWMFHostname function [19:47:40] joal: just tested the rollback bigtop 1.5 -> cdh, no issue raised [19:47:46] \o/ [19:47:57] This is super awesome elukey :) [19:48:02] so happy, the cookbooks with the new logic work very well [19:48:54] * joal bows to elukey-san [19:48:58] <3 [19:49:00] ottomata: https://github.com/wikimedia/analytics-refinery-source/commit/50960033a66c109b77ca792635f8feb46d6356c2#diff-6f19664b24f5e6f30d3c97f8d2ff85738a5787b805e1fac15c37546e3b2bd0f5L492 [19:49:14] ok I am calling this the end of the day, see you tomorrow! 
o/ [19:49:19] see you elukey [19:50:34] ottomata: we must have run refine with the filter_allowed_domains transform function while it was not ready - Or we have backfilled with old code but new settings [19:51:26] ottomata: I actually think this is why we get the same problem when using the manual refine launcher you gave me: the config we rely on is current (in /etc...) [19:51:39] While maybe it was different in terms of transform functions [19:52:10] mforns: --^ if you're interested as well [19:53:05] I'm gonna call it end of day for me as well - mforns, ottomata, I'm leaving config-changes investigation and related impact investigation to you if you may [19:53:41] also a-team - I have not deployed today - I will try to have my patches for /tmp perms updated for tomorrow evening and deploy tomorrow [19:54:49] joal: I can help deploy tomorrow [19:55:41] Ack mforns - Let's sync tomorrow then :) It'll be kids day, so mostly off, but I'll try to chime in for patches updates for /tmp [19:58:35] back [19:59:11] but joal we only merged that 29 days ago [19:59:16] heya ottomata - I'll stay a couple of minutes if you have questions, I pasted my findings above --^ and then I'll leave :) [19:59:18] version 0.0.143 [19:59:51] ottomata: this is the thing with software-as-config: IIRC the usage of the transform function has been removed from config [20:00:08] And then reinstalled, but we need to check for dates [20:00:23] but, this refine job ran in october [20:00:29] the q is [20:00:39] why does the 0.0.133 version of the jar remove these 2 events [20:00:55] ottomata: because it applies a faulty transform function [20:01:13] The patch merged in 0.0.143 makes the transform use the new def you patched [20:01:13] isWikimediaHost(login.wikimedia.org) == false? [20:01:15] false? [20:01:16] 
[20:01:27] yessir - no pageview [20:01:31] ahhHHHhhh [20:01:43] ok then, that is fine and known and had been a problem for a long tim ethen [20:02:12] I thought we had removed the function from conf, but eh, you know better :) [20:02:21] At least, we have an explanation :) [20:02:30] no, the list of function sis still the same [20:02:51] both eventlogging_analyltics and eventloggging_legacy use filter_allowed_domains [20:02:53] did then and do now. [20:03:04] ok [20:03:07] its just the implementation if isWikimediaDomain i guess that has changed [20:03:21] refine_events (which does not do 'legacy' eventlogging stuff at all) [20:03:22] makes sense then [20:03:25] does not filter_allowed_domains [20:03:31] it just tags is_wmf_domain [20:03:48] joal: i'll update task with findings [20:03:51] have a good night! [20:03:57] ack ottomata thanks [20:04:03] reading scrollback [20:04:24] I assume then that next steps will be to merge and dpeloy the patch fixing sanitization, and rerun [20:04:50] ya that sounds right [20:04:54] i think we don't need to fix the old data [20:05:10] works for me - great - Leaving then :) See y'all tomorrow [20:07:48] 10Analytics, 10Event-Platform, 10Patch-For-Review: Some refined events folders contain no data while they should - https://phabricator.wikimedia.org/T272177 (10Ottomata) Ok! Answers! > 1. refine_sanitize shouldn't fail if no data. https://gerrit.wikimedia.org/r/657171 should fix this. > 2. 10/30/21 should... 
[20:57:50] 10Analytics, 10Analytics-Kanban: Add client TCP source port to webrequest - https://phabricator.wikimedia.org/T271953 (10Ladsgroup) It's okay for me, as long as I have a way to obtain the data, it works for me :D [21:10:54] 10Analytics, 10Add-Link, 10Growth-Structured-Tasks, 10Growth-Team (Current Sprint), 10Patch-For-Review: Add Link engineering: Pipeline for moving MySQL database(s) from stats1008 to production MySQL server - https://phabricator.wikimedia.org/T266826 (10Tgr) [21:11:41] 10Analytics, 10Add-Link, 10Growth-Structured-Tasks, 10Growth-Team (Current Sprint), 10Patch-For-Review: Add Link engineering: Pipeline for moving MySQL database(s) from stats1008 to production MySQL server - https://phabricator.wikimedia.org/T266826 (10Tgr) Filed {T272419} about the MediaWiki part. [22:12:38] ottomata: still there? [22:45:55] ya in bc with razzi [22:45:59] or no, in meeting [22:46:04] practicing some kafka stuff [23:06:55] mforns: hello bout to sign off [23:07:02] sorry! you still there? can I help ya? [23:09:51] ok signing off byebyeyyyy