[00:07:44] (PS14) Sharvaniharan: Image recommendations table for android [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244
[00:08:49] (CR) jerkins-bot: [V: -1] Image recommendations table for android [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[00:10:08] (PS15) Sharvaniharan: Image recommendations table for android [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244
[01:30:55] (PS5) Razzi: Upgrade superset to 1.0.1 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/665130 (https://phabricator.wikimedia.org/T272390)
[01:30:57] Analytics, Analytics-Kanban, Growth-Team, Product-Analytics: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (Milimetric) (tl;dr; no simple answers) As this is becoming a pivotal issue I wanted to make sure to exhaust simple explanations. My query is...
[02:14:54] (PS6) Razzi: Upgrade superset to 1.0.1 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/665130 (https://phabricator.wikimedia.org/T272390)
[03:59:10] (PS7) Razzi: Upgrade superset to 1.0.1 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/665130 (https://phabricator.wikimedia.org/T272390)
[06:23:38] good morning
[06:24:20] I get proxy error when using superset
[06:27:06] mmm /srv/deployment/analytics/superset/venv is not there on an-tool1010, very weird
[06:27:48] and we didn't get an alert
[06:28:00] (I saw a puppet failure for an-tool1010 in icinga)
[06:28:53] PWD=/home/razzi ; USER=root ; COMMAND=/usr/bin/rm -r /srv/deployment/analytics/superset/venv/
[06:29:02] ok this is the issue
[06:29:14] I suspect that it was done on the wrong host
[06:32:27] !log force a manual run of create_virtualenv.sh on an-tool1010 - superset down
[06:32:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:33:57] superset up afaics
[06:34:24] razzi: please be more careful next time
[06:36:14] now, why didn't we receive an alert?
[06:39:29] so superset (the process) was up, and apparently the alert didn't fire
[06:39:54] since maybe it was answering to the health check correctly
[06:40:34] yeah the check is simply that the tcp port is open
[06:49:48] same thing for turnilo
[06:53:44] Analytics: Add better monitoring for Analytics UIs - https://phabricator.wikimedia.org/T277729 (elukey) p:Triage→High
[07:27:12] Good morning elukey
[07:27:24] elukey: should we have a more robust check for superset?
[07:29:19] joal: yep I opened a task
[07:29:38] bonjour :)
[07:29:38] Ah sorry, I didn't notice - thanks for that
[07:30:14] * joal should have known elukey is faster :)
[07:30:29] How are you today elukey? anything I can help with?
[07:33:06] all good :)
[07:55:30] elukey: thanks a lot for all the hard work on capacity-scheduler integration in puppet - the CR thread speaks for itself <3
[07:59:45] also elukey, while checking the new AQS endpoint and oozie job yesterday I realized our cassandra hosts were not super clean
[08:00:00] elukey: shall we plan on doing some manual clean up today or tomorrow?
[08:01:25] joal: I am going to add your last suggestions to the config today now that we have a good way forward :)
[08:01:29] (for capacity scheduler)
[08:01:55] joal: what kind of cleanup for cassandra?
[08:02:10] (we should also discuss with Hugh the plan for Cassandra 3, we are getting close)
[08:04:01] super happy to talk about cassandra 3 plans :)
[08:04:22] clean up is about test keyspaces, renamed (unused) keyspaces
[08:04:26] and data leftover
[08:04:28] ah yes yes
[08:04:51] elukey: shall I create a task, or do we move without?
[08:05:11] joal: a task is better, let's also add Hugh among the subscribers
[08:06:01] ack elukey - will do
[08:20:11] Analytics, Research: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (JAllemandou) patch sent, but I have questions: - If we accept `wmflabs.org`, should we also accept `toolsforge.org` ? - Given the 'non-official-production' status of cod...
[08:21:20] Analytics, Research: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (Majavah) Note that `wmflabs.org` is slowly being phased out in favor of `wmcloud.org`.
[08:25:17] Analytics, Research: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (JAllemandou) >>! In T277536#6924002, @Majavah wrote: > Note that `wmflabs.org` is slowly being [[ https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS#*.wmcloud....
[08:27:23] (PS1) Joal: Add wmflabs.org and wmcloud.org in WMF domain list [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673202 (https://phabricator.wikimedia.org/T277536)
[08:29:13] joal: do you have a min for an IRC brainbounce?
[08:29:31] sure elukey
[08:30:11] I am still extremely puzzled by the druid broker query cache
[08:30:26] it seems that the cache is used
[08:30:35] for example, see the analytics cluster
[08:30:36] https://grafana.wikimedia.org/d/000000538/druid?viewPanel=11&orgId=1&refresh=1m&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_analytics&var-druid_datasource=All
[08:31:12] so that high hitrate is, I think, related to the turnilo's druid datasource refreshes or something similar
[08:31:25] but it is a proof, in theory, that something is cached
[08:31:30] elukey: broker-cache should be used for overall queries IIRC
[08:31:41] yes in theory this is my understanding
[08:31:46] but for the rest it is not used
[08:32:14] so I was wondering if for some reason the queries were not cachable, or if some override somewhere is preventing whole-query cache
[08:32:16] I don't understand what 'the rest' means
[08:32:29] Ah I get it
[08:33:04] (CR) jerkins-bot: [V: -1] Add wmflabs.org and wmcloud.org in WMF domain list [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673202 (https://phabricator.wikimedia.org/T277536) (owner: Joal)
[08:33:08] yes sorry I mean the other queries, most of the ones from turnilo in analytics and from aqs in public
[08:33:37] I sent an email to users@druid but no answer
[08:35:46] hm
[08:56:05] joal: do we use group-by v2 for druid public?
[08:56:18] elukey: we don't
[08:56:41] elukey: we don't use group-by queries as they are too slow for us
[08:56:52] because I found https://github.com/apache/druid/issues/10344 that could have explained, all right nothing :)
[08:56:57] elukey: I however wish to test again, now that the caching layer has changed
[08:58:02] joal: I am wondering if topN follows the same problem of not being cachable
[08:58:11] (we use on aqs right?)
[08:58:13] I don't know elukey!
[08:58:19] elukey: yes we do
[09:05:10] I left a comment in the issue, we use topn v2 IIUC that I suspect follows the same rule as group by for query caching
[09:07:03] elukey: possible :(
[09:07:24] elukey: I'm reading, and couldn't find any specifics about queries not being cached at broker level
[09:07:47] In the issue it is written " I did not see a reason why it does not (it is documented alright)"
[09:07:50] maybe in the code?
[09:08:30] elukey: could it be that the queries issued by the UIs are always different (only true for analytics cluster - public one definitely have almost always the same queries)
[09:09:13] And actually, cache hit-rate for brokers on public says 0
[09:09:14] :(
[09:12:20] and also the cache size is 0
[09:12:27] this is what is weird
[09:12:37] yup
[09:12:44] the only explanation that I can give is that Druid avoids to cache
[09:12:46] could it be a typo in settings?
[09:13:03] I checked 100 times but I don't see any
[09:13:06] elukey: druid was caching when using segments in brokers
[09:13:19] it was yes
[09:13:45] elukey: I'm gonna check default query parameters (populateCache)
[09:15:21] joal: +1
[09:17:24] elukey: default query parameters seem ok
[09:19:21] joal: one test that we could do is to pick a query for druid public, and hit the broker to the v1 endpoint (if it offers topn)
[09:20:28] elukey: note that we also do timeseries in public
[09:20:43] elukey: so v2 is not enough to explain
[09:21:10] joal: ah ok I thought only topn
[09:21:21] nah, actually mostly timeseries
[09:21:37] joal: but we use v2 only right?
[09:21:54] I mean maybe for some obscure reason v2 don't have any result level cache
[09:22:09] elukey: We don't specify anything in queries, so if druid is configured for v2 we use that
[09:22:15] maybe :(
[09:22:29] joal: the URI path on aqs code is /druid/v2 afaics
[09:22:39] Ah! yes it is :)
[09:26:39] elukey: quick question on debian versions: you have bumped stats machine to buster recently, right?
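(Editorial note on the broker cache thread above.) One way to rule out per-query defaults, as joal does when checking `populateCache`, is to set the cache flags explicitly in the query context and compare broker cache metrics before and after. The sketch below only builds the JSON body of a native timeseries query; it does not contact any broker. The datasource name, interval, and helper name are hypothetical, and whether explicit flags change anything on this cluster is an assumption that the log does not confirm.

```python
import json


def build_timeseries_query(datasource, interval, use_cache=True, populate_cache=True):
    """Build a Druid native timeseries query body with explicit cache flags.

    `datasource` and `interval` are placeholders chosen by the caller;
    nothing here is taken from the actual AQS or Turnilo queries.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "rows"}],
        # Query context flags: read from and write to the cache explicitly,
        # instead of relying on whatever the cluster defaults are.
        "context": {
            "useCache": use_cache,
            "populateCache": populate_cache,
        },
    }


# Hypothetical datasource and interval, for illustration only.
query = build_timeseries_query("example_datasource", "2021-03-01/2021-03-08")
print(json.dumps(query, indent=2))
```

The resulting body would be POSTed to the broker's /druid/v2 path mentioned in the conversation. Sending the same body twice and watching the hit-rate panel would show whether anything is cached at all for that query type.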
[09:26:58] joal: months ago yes
[09:27:02] with Tobias
[09:27:15] Ack! Will answer Erin - thanks
[09:30:10] (CR) Hashar: "We have the parent repository analytics/wmde which can be used to set the proper rights on all repositories: https://gerrit.wikimedia.org/" (1 comment) [analytics/wmde/scripts] (refs/meta/config) - https://gerrit.wikimedia.org/r/672824 (owner: WMDE-leszek)
[09:33:56] Analytics: ssh access to stat100x machines - https://phabricator.wikimedia.org/T277721 (JAllemandou) Hi @EYener, Machines identification change when their OS is updated, and they have (more or less) recently been updated (Debian Strech --> debian Buster). You indeed need to remove lines in your `/.ssh/known_...
[09:37:44] Analytics: ssh access to stat100x machines - https://phabricator.wikimedia.org/T277721 (elukey) Adding a suggestion on top of Joseph's one: you can remove the entry in your ssh config, and use https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/stat1005.eqiad.wmnet to check that the new fingerprint is...
[10:01:42] Analytics, Analytics-Kanban: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (elukey) a:elukey→razzi
[10:02:02] Analytics, Analytics-Kanban: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (elukey) Some updates: @razzi is going to take over the work during the next quarter :)
[10:04:27] * elukey bbiab
[10:05:20] Analytics-Radar, Product-Analytics, Structured-Data-Backlog: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (Cparle)
[10:05:22] Analytics-Radar, Datasets-General-or-Unknown, Dumps-Generation, Product-Analytics, Structured-Data-Backlog (Current Work): Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (Cparle) Open→Resolved Nothing left to do afaik. Hooray! I'll cl...
[11:42:19] Analytics-Radar, Dumps-Generation, Okapi, Platform Engineering: HTML Dumps - June/2020 - https://phabricator.wikimedia.org/T254275 (kostajh) Is there a rough estimate of when these HTML dumps will be available? Is it dependent on when https://enterprise.wikimedia.com/ goes live?
[11:45:24] https://docs.tecton.ai/overviews/feature_store.html
[11:53:23] joal: added some changes to https://gerrit.wikimedia.org/r/c/operations/puppet/+/672373/14/modules/profile/manifests/analytics/cluster/hadoop/yarn_capacity_scheduler.pp
[12:00:25] * elukey lunch!
[12:02:33] Analytics: ssh access to stat100x machines - https://phabricator.wikimedia.org/T277721 (EYener) Open→Resolved a:EYener Super! Thank you both. I have logged in successfully. I'll resolve the task!
[12:27:15] Analytics-Radar, Dumps-Generation, Okapi, Platform Engineering: HTML Dumps - June/2020 - https://phabricator.wikimedia.org/T254275 (ArielGlenn) >>! In T254275#6924530, @kostajh wrote: > Is there a rough estimate of when these HTML dumps will be available? Is it dependent on when https://enterpris...
[13:27:09] Analytics-Radar, Readers-Web-Backlog, Performance-Team (Radar), Vue.js (Vue.js Search Experience (Vector modern)): Revise schema and performance dashboards for Vue.js search - https://phabricator.wikimedia.org/T250336 (phuedx)
[13:30:00] Analytics, Research, Patch-For-Review: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (Ottomata) Hm good question. For legacy EventLogging, this will cause these events to show up in Refine, which is vital for the TranslationRecommenda...
[13:30:11] (CR) Ottomata: [C: +1] Add wmflabs.org and wmcloud.org in WMF domain list [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673202 (https://phabricator.wikimedia.org/T277536) (owner: Joal)
[13:30:16] (CR) Ottomata: [C: +1] "Thank you!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673202 (https://phabricator.wikimedia.org/T277536) (owner: Joal)
[13:32:12] Analytics, Research, Patch-For-Review: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (JAllemandou) > For new schemas, it will only cause the is_wmf_domain field to be true. I think these are wmf domains, so this seems correct to me. Le...
[13:33:16] hm elukey , if a user wants to use Hive in Hue, do they need ssh access?
[13:33:17] i guess not eh?
[13:33:43] should they get it anyway? that is not a case covered in our "What access should I request?" docs
[13:38:48] ottomata: morning! In theory they don't need ssh access, hue acts as proxy IIRC, but I hoped it wasn't a use case to cover
[13:39:03] I'd love to move people away from hue, but superset+presto are not ready yet
[13:39:11] yeah
[13:42:50] ok edited the doc to mention that case
[13:42:51] ty
[13:43:35] ack thanks!
[13:45:21] Analytics, Research, Patch-For-Review: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (Ottomata) Oh, hm. Yes let's do it.
[14:01:46] elukey: do we need krb: present?
[14:01:53] likely not, right? they won't be kinit-ing?
[14:02:05] ottomata: only if you have the ssh access
[14:02:09] right
[14:16:11] (PS2) Joal: Update WMF domain list with Cloud and toolforge [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673202 (https://phabricator.wikimedia.org/T277536)
[14:23:41] (CR) jerkins-bot: [V: -1] Update WMF domain list with Cloud and toolforge [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673202 (https://phabricator.wikimedia.org/T277536) (owner: Joal)
[14:25:38] interesting failure --^
[14:27:22] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > How would it handle if the replica goes down for a long time or even forever (ie...
[14:27:24] gehel: Hi! I think this is sonar related --^ The error is: Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path /usr/lib/jvm/java-11-openjdk-amd64/../lib/tools.jar
[14:27:54] I don't understand, as I think we set java 1.8 in our main pom.xml
[14:28:40] damn :/
[14:28:59] gehel: sorry for the error-ping
[14:29:05] gehel: I wish I knew more :S
[14:29:36] we set source and target, but that's only about the bytecode version, the sonar analysis still runs on JDK 11
[14:29:55] the main build still passed, so you can consider that as a green build
[14:30:14] ack :)
[14:30:14] I'll post a patch to disable whatever is causing problems during the sonar analysis
[14:30:57] Thanks gehel :)
[14:31:00] in related news, I'm making some progress on T264873, which means that we'll get real branch analysis with sonar at some point (well, probably, I still don't understand everything)
[14:31:02] T264873: Ensure that SonarQube is commenting on gerrit code reviews of the Search Platform team - https://phabricator.wikimedia.org/T264873
[14:31:24] \o/
[14:31:38] Zuul is a mess!
[14:32:02] gehel: who you gonna call? (sorry for the reference)
[14:32:12] * gehel missed the reference
[14:32:45] joal: do you have a link to the build log?
[14:33:22] yessir: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-java8-docker/1509/console
[14:33:34] thx!
[14:34:38] gehel: https://www.youtube.com/watch?v=y3ns6H5dHrU
[14:35:09] https://ghostbusters.fandom.com/wiki/Zuul
[14:36:01] that explains a lot!
[14:42:08] gehel: o/ got a sec to discuss 'data ownership' ? :p
[14:42:21] sure
[14:42:27] yesterday desiree coordinated a meeting to start a process on how to get things done re. a 'data as a service' platform
[14:42:32] but nobody from your team was there, which was noted
[14:42:36] i told them i'd sync up with you
[14:42:51] impromptu video hangout ok?
[14:42:53] or just IRC?
[14:43:28] I'm here https://meet.google.com/rxb-bjxn-nip if you want, otherwise IRC is fine
[14:43:29] ottomata: also, I sent Desiree the technical prez about the current analytics stuff
[14:43:35] joal: which one?
[14:44:10] ottomata: that one: https://docs.google.com/presentation/d/1Bp9VpVfFpRZJQZA9e-WLURvNfyoCMzBBHhFD5KZyQlI/edit
[14:44:20] OHh cool!
[14:44:49] ottomata: it's not about future, but I think it can help to define present :)
[14:51:48] joal: thanks for the review :)
[14:52:05] elukey: do my comments make sense?
[14:52:13] also elukey: thanks for the changes !
[14:55:11] joal: they do yes! I'll apply the changes and re-send :)
[15:02:29] Gone for kids - back in a while
[15:04:30] Analytics, Patch-For-Review: Prep for replacing jupyter conda migration - https://phabricator.wikimedia.org/T262847 (Ottomata)
[15:11:02] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata) > Because of the bug above, I updated the Jupyter page on Wikitech to remove the recommendation that all users switch to Newpyter. I look forward to continuing to test...
[15:14:39] (PS1) Gehel: Fix failing sonar analysis due to JDK11 removing tools.jar [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673290
[15:14:54] joal: with a little bit of luck, this should fix your build ^
[15:16:18] Good morning, thanks elukey for cleaning up after me on an-tool1010, it was indeed the wrong host
[15:19:27] Analytics, Performance-Team: Navigation timing per-host dasahboard/metrics died around 2021-01-28 - https://phabricator.wikimedia.org/T277764 (Gilles)
[15:19:38] Analytics-EventLogging, Analytics-Radar, Front-end-Standards-Group, MediaWiki-extensions-WikimediaEvents, and 5 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (Lena_WMDE)
[15:22:03] Analytics, Performance-Team: Navigation timing per-host dashboard/metrics died around 2021-01-28 - https://phabricator.wikimedia.org/T277764 (Gilles)
[15:28:27] Analytics, Performance-Team: Navigation timing per-host dashboard/metrics died around 2021-01-28 - https://phabricator.wikimedia.org/T277764 (Ottomata) Ah, yup not supported. The events aren't emitted from a cache frontend anymore. They are posted directly to EventGate, which is running in Kubernetes.
[15:32:29] Analytics, Performance-Team: Navigation timing per-host dashboard/metrics died around 2021-01-28 - https://phabricator.wikimedia.org/T277764 (Gilles) Open→Resolved OK, then I guess the workaround would be to get the information from the client and pass it as schema fields. I'll file a task to tha...
[15:38:17] hi a-team, I will take the day off today as well, still feeling ill, see you soon
[15:38:26] mforns: <3
[15:38:37] razzi: np, but let's be more careful next time :)
[15:39:19] (In this case we'll need to improve alarming, ideally you should have seen a CRITICAL right after the rm)
[15:44:58] mforns: is this ok to merge?
[15:44:59] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/673075
[15:45:14] feel free to ignore!
[15:45:26] (just hoping to unblock that, from what I can tell the schemas you were working on are ready for this)
[15:45:36] (no hurry though!)
[16:01:33] Analytics-Clusters, Analytics-Kanban, observability: Modify Kafka max replica lag alert to only alert if increasing - https://phabricator.wikimedia.org/T273702 (fdans) Open→Resolved
[16:01:35] Analytics, Analytics-Kanban, Event-Platform: Rematerialize all event schemas with enforceNumericBounds - https://phabricator.wikimedia.org/T273069 (fdans) Open→Resolved
[16:01:37] Analytics, Analytics-Kanban, Patch-For-Review: jsonschema-tools should have option to materialize schemas with default max/min validation for e.g. max long, max double, etc. - https://phabricator.wikimedia.org/T258659 (fdans)
[16:01:41] Analytics, Analytics-Kanban, Patch-For-Review: Add client TCP source port to webrequest - https://phabricator.wikimedia.org/T271953 (fdans) Open→Resolved
[16:01:43] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: "Active editors" panel keeps flashing on stats.wikimedia.org - https://phabricator.wikimedia.org/T262725 (fdans) Open→Resolved
[16:01:46] Analytics, Analytics-Kanban, Anti-Harassment, Event-Platform, and 2 others: Migrate Anti-Harassment EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T268517 (fdans) Open→Resolved
[16:01:47] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: MEP Client MediaWiki PHP - https://phabricator.wikimedia.org/T253121 (fdans)
[16:01:50] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (fdans)
[16:02:01] Analytics, Analytics-Kanban: Bug: Active Editors showing July numbers - https://phabricator.wikimedia.org/T273470 (fdans) Open→Resolved
[16:02:03] Analytics, Analytics-Kanban: Filter out webrequest where debug=1 from pageview - https://phabricator.wikimedia.org/T273083 (fdans) Open→Resolved
[16:02:05] Analytics, Analytics-Kanban: Upgrade UA Parser to 1.5.1+ - https://phabricator.wikimedia.org/T272926 (fdans) Open→Resolved
[16:02:08] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Uncaught TypeError: navigator.sendBeacon is not a function - https://phabricator.wikimedia.org/T273374 (fdans) Open→Resolved
[16:02:10] Analytics, Analytics-Kanban, serviceops, User-jijiki: Mechanism to flag webrequests as "debug" - https://phabricator.wikimedia.org/T263683 (fdans)
[16:02:13] Analytics, Analytics-Kanban: Backup HDFS data before BigTop upgrade - https://phabricator.wikimedia.org/T272846 (fdans) Open→Resolved
[16:02:15] Analytics, Analytics-Kanban, Event-Platform: Some refined events folders contain no data while they should - https://phabricator.wikimedia.org/T272177 (fdans) Open→Resolved
[16:02:17] Analytics, Analytics-Kanban, Product-Analytics: Update Image usage metric - https://phabricator.wikimedia.org/T271571 (fdans) Open→Resolved
[16:02:20] Analytics, Analytics-Kanban, SRE, Traffic: Traffic anomalies: Factor out list of countries into a dedicated Hive table - https://phabricator.wikimedia.org/T272052 (fdans) Open→Resolved
[16:02:23] Analytics, Patch-For-Review: Upgrade the Analytics Hadoop cluster to Apache Bigtop - https://phabricator.wikimedia.org/T273711 (fdans)
[16:02:25] Analytics-Kanban, Patch-For-Review: Test the Bigtop 1.5 RC release on the Hadoop test cluster - https://phabricator.wikimedia.org/T269919 (fdans) Open→Resolved
[16:02:27] Analytics, Analytics-Kanban: Follow up on Druid alarms not firing when Druid indexations were failing due to permission issues - https://phabricator.wikimedia.org/T271568 (fdans) Open→Resolved
[16:02:28] a-team standup!
[16:02:31] Analytics, Analytics-Kanban, Patch-For-Review: Follow up on hdfs:///tmp perms issues after umask change on HDFS - https://phabricator.wikimedia.org/T271560 (fdans) Open→Resolved
[16:02:33] Analytics, Analytics-Kanban: Wikistats map's choropleth shows the same color for 0 and minimum nonzero value - https://phabricator.wikimedia.org/T269883 (fdans) Open→Resolved
[16:02:35] Analytics, Analytics-Kanban, Patch-For-Review: mediawiki-wikitext-history-2020-10 failed - https://phabricator.wikimedia.org/T269032 (fdans) Open→Resolved
[16:02:39] Analytics, Analytics-Kanban, Patch-For-Review: Alter table for navigation timing errors out in Hadoop test - https://phabricator.wikimedia.org/T268733 (fdans) Open→Resolved
[16:02:41] Analytics, Analytics-Kanban: Test hudi and Iceberg as an incremental update system using 2 mediawiki-history snapshots - https://phabricator.wikimedia.org/T262256 (fdans) Open→Resolved
[16:02:43] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade AMD ROCm drivers/tools to latest upstream - https://phabricator.wikimedia.org/T264408 (fdans) Open→Resolved
[16:02:45] Analytics, Analytics-Kanban, Patch-For-Review: [SPIKE] Prototype of incremental updates for mediawiki history for simplewiki , including reverts using apache hudi - https://phabricator.wikimedia.org/T258532 (fdans)
[16:02:49] Analytics, Analytics-Kanban: Add folder creation for sqoop initial installation in puppet - https://phabricator.wikimedia.org/T251788 (fdans) Open→Resolved
[16:02:52] Analytics, Event-Platform, Services: EventGate should use recent service-runner (^2.8.1) with Prometheus support - https://phabricator.wikimedia.org/T272714 (fdans)
[16:02:54] Analytics, Analytics-Kanban: Establish what data must be backed up before the HDFS upgrade - https://phabricator.wikimedia.org/T260409 (fdans) Open→Resolved
[16:02:56] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: eventgate-wikimedia should emit metrics about validation errors - https://phabricator.wikimedia.org/T257237 (fdans) Open→Resolved
[16:02:58] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (fdans) Open→Resolved
[16:03:00] Analytics, Patch-For-Review: Upgrade the Analytics Hadoop cluster to Apache Bigtop - https://phabricator.wikimedia.org/T273711 (fdans)
[16:03:02] Analytics, Patch-For-Review: Upgrade the Analytics Hadoop cluster to Apache Bigtop - https://phabricator.wikimedia.org/T273711 (fdans)
[16:03:04] Analytics-Kanban: Neflow data pipeline - https://phabricator.wikimedia.org/T257554 (fdans)
[16:03:06] Analytics, Analytics-Kanban: Update refinery-core Webrequest.isWikimediaHost - https://phabricator.wikimedia.org/T256674 (fdans) Open→Resolved
[16:03:08] Analytics, Analytics-Kanban, Patch-For-Review: Set up automatic deletion/snitization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (fdans) Open→Resolved
[16:03:10] Analytics, Analytics-Kanban, Patch-For-Review: Dropping data from druid takes down aqs hosts - part 2 - https://phabricator.wikimedia.org/T270173 (fdans) Open→Resolved
[16:03:16] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 5 others: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (fdans) Open→Resolved
[16:03:18] Analytics, Analytics-Kanban: Default hive table creation to parquet - needs hive 2.3.0 - https://phabricator.wikimedia.org/T168554 (fdans) Open→Resolved
[16:03:20] Analytics, Analytics-Kanban, Patch-For-Review: Add urlshortener button to Turnilo - https://phabricator.wikimedia.org/T233336 (fdans) Open→Resolved
[16:05:56] Analytics, Analytics-Kanban, Chinese-Sites: MediaWiki history dumps have some events in 2025 - https://phabricator.wikimedia.org/T235269 (Samwalton9) Is this data expected to no longer be present? I just did a query on `mediawiki_history` and found some results in 2025. `SELECT count(*) FROM mediawi...
[16:12:15] Analytics, Patch-For-Review: Newpytyer python spark kernels - https://phabricator.wikimedia.org/T272313 (Ottomata) Open→Resolved This task ended up being a little bit more than Fabian's original bug report, but I think things are looking ok here. Feel free to reopen.
[16:40:36] Analytics, Analytics-Kanban, Chinese-Sites: MediaWiki history dumps have some events in 2025 - https://phabricator.wikimedia.org/T235269 (nshahquinn-wmf) Resolved→Open Just to make sure @Samwalton9's question gets looked at 😊
[16:47:03] !log rebalance kafka partitions for webrequest_text partition 0
[16:47:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:47:49] Analytics-Radar, SRE: Upgrade to Kafka MirrorMaker 2 - https://phabricator.wikimedia.org/T277467 (fdans)
[16:51:16] Analytics, Inuka-Team, Product-Analytics: Superset timeouts for KaiOS dashboard - https://phabricator.wikimedia.org/T277320 (fdans) cc @Milimetric @razzi
[16:54:10] Analytics: Add better monitoring for Analytics UIs - https://phabricator.wikimedia.org/T277729 (fdans) a:razzi
[16:54:55] Analytics-Radar, PM: Fix Analytics workflow for #Analytics-EventLogging tasks - https://phabricator.wikimedia.org/T274490 (fdans)
[16:56:23] Analytics, Event-Platform, Product-Data-Infrastructure, Product-Analytics (Kanban): [MEP] [BUG] Timestamp format changed in migrated client-side EventLogging schemas - https://phabricator.wikimedia.org/T277253 (fdans) p:Triage→High
[16:57:03] Analytics, Event-Platform, Product-Analytics, Product-Data-Infrastructure, and 2 others: [MEP] [BUG] dt field in migrated client-side EventLogging schemas is not set to meta.dt - https://phabricator.wikimedia.org/T277330 (fdans) p:Triage→High
[16:58:13] Analytics, Analytics-Kanban, Better Use Of Data: Optimize intermediate session length data set and dashboard - https://phabricator.wikimedia.org/T277512 (fdans)
[16:59:54] Analytics, Analytics-Kanban, Research, Patch-For-Review: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (fdans)
[17:02:47] Analytics, SRE, Traffic: varnishkafka / ATSkafka should support setting the kafka message timestamp - https://phabricator.wikimedia.org/T277553 (fdans) p:Triage→Medium a:razzi cc @ema
[17:03:36] PROBLEM - AQS root url on aqs1010 is CRITICAL: connect to address 10.64.0.40 and port 7232: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS%23Monitoring
[17:03:53] this is a new host, downtime expired
[17:10:12] Analytics, Product-Analytics: Hive table neilpquinn.toledo_pageviews missing almost all data - https://phabricator.wikimedia.org/T277781 (nshahquinn-wmf)
[17:11:05] Analytics, Product-Analytics: Hive table neilpquinn.toledo_pageviews missing almost all data - https://phabricator.wikimedia.org/T277781 (nshahquinn-wmf) p:Triage→High This seems high priority to me; please change if you disagree.
[17:27:17] fdans: this is assigned to you and is in ops week
[17:27:18] https://phabricator.wikimedia.org/T238243
[17:27:20] should it go elsewhere?
[17:28:05] ottomata: no it's fine as it is
[17:28:13] fdans: but...ops week?
[17:28:25] doesn't that mean we on ops week should be doing something about it?
[17:48:21] (CR) Jdlrobson: [C: +1] "Sam: Feel free to self-merge this patch. I don't have +2 rights in this repo but can vouch for the change." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: Phuedx)
[17:49:46] * elukey afk!
[17:55:03] (CR) Mholloway: "> Patch Set 9: Code-Review+1" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: Phuedx)
[17:59:53] Analytics: Add "did edit" field to pageview_actor - https://phabricator.wikimedia.org/T277785 (Isaac)
[18:18:19] (CR) Ottomata: "It is hard for me to see from this patch, but what is backwards incompatible about it? Are any fields being removed/renamed and/or do any" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: Phuedx)
[18:48:24] (CR) Sharvaniharan: Image recommendations table for android (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[18:49:36] hi analytics team. I'd like to launch a long running spark thingy over the weekend. I'd expect it to run for ~40 hours, using the "regular size" job spec from wmfdata. I wanted to give you an heads up; Holler if there's any concern.
[18:51:15] Analytics, Product-Analytics: Hive table neilpquinn.toledo_pageviews missing almost all data - https://phabricator.wikimedia.org/T277781 (Ottomata) Info from a Slack convo: This is a single file table, data.gz, for which > Oozie job copies, appends to, and replaces that file. I think that approach was...
[18:55:45] 10Analytics, 10Product-Analytics (Kanban): Hive table neilpquinn.toledo_pageviews missing almost all data - https://phabricator.wikimedia.org/T277781 (10nshahquinn-wmf) [18:58:49] (03CR) 10Dbrant: [C: 03+1] Image recommendations table for android (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668244 (owner: 10Sharvaniharan) [19:02:12] (03CR) 10Sharvaniharan: Image recommendations table for android (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668244 (owner: 10Sharvaniharan) [19:02:32] 10Analytics-Clusters, 10Analytics-Kanban: /wmf/data/raw should be readable by analytics-privatedata-users - https://phabricator.wikimedia.org/T275396 (10Ottomata) ` hdfs dfs -chgrp -R analytics-privatedata-users /wmf/camus ` Could also do the files in /wmf/data/raw, but the directories there are all correct a... [19:02:43] !log hdfs dfs -chgrp -R analytics-privatedata-users /wmf/camus - T275396 [19:02:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:02:47] T275396: /wmf/data/raw should be readable by analytics-privatedata-users - https://phabricator.wikimedia.org/T275396 [19:03:09] ok gmodena thanks! [19:03:26] gmodena: if you like, I'm pretty sure you can !log just like I did in here [19:03:37] might be good for things like that [19:03:45] your message will go to https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:03:46] ottomata TIL [19:04:01] and if there is a problem one of us might check that first, to see if anything important changed [19:04:04] ottomata i'll do from now on. Thanks for the pointer! [19:04:22] cool! doesn't hurt to ping us too [19:04:56] oh another pointer! 
we've been using the a-team as a keyword ping (sorry for the ping team!), so if you use that it will ping us all [19:05:49] gmodena: also if you add a phab ticket ID in your !log message, it will add a comment like this to your phab ticket [19:05:50] https://phabricator.wikimedia.org/T275396#6926402 [19:06:31] ottomata love the alias :D [19:06:46] :) [19:10:18] (03CR) 10Ottomata: [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata) [19:10:54] (03PS4) 10Ottomata: [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) [19:12:37] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata) [19:13:29] (03CR) 10Phuedx: "> Patch Set 9:" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: 10Phuedx) [19:15:21] (03CR) 10Ottomata: "> The previously-required token property has been removed as it's superseded by the web_session_id property from the web_identifiers fragm" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: 10Phuedx) [19:16:11] (03PS5) 10Ottomata: [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) [19:19:24] (03CR) 10Ottomata: "> Patch Set 9:" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: 10Phuedx) [19:19:50] (03CR) 10Ottomata: "> You can remove it from 
the list of required fields." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: 10Phuedx) [19:20:46] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata) [19:29:10] !log temporarily rename /usr/lib/python2.7/dist-packages/cqlshlib/copyutil.so on aqs1004 to fix https://issues.apache.org/jira/browse/CASSANDRA-11574 [19:29:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:30:54] !log rename /usr/lib/python2.7/dist-packages/cqlshlib/copyutil.so back [19:30:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:48:49] 10Analytics, 10Product-Analytics (Kanban): Hive table neilpquinn.toledo_pageviews missing almost all data - https://phabricator.wikimedia.org/T277781 (10JAllemandou) Hi Neil, Here is what I have: - HDFS tells us the data was touched last time on 2020-12-22 at 00:27, and it weighs less than 800Kb (data for 20...
[20:07:42] (03PS3) 10Milimetric: [WIP] Update mysql resolver to work with cloud replicas [analytics/refinery] - 10https://gerrit.wikimedia.org/r/666209 (https://phabricator.wikimedia.org/T274690) [20:54:38] (03PS8) 10Razzi: Upgrade superset to 1.0.1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/665130 (https://phabricator.wikimedia.org/T272390) [21:11:48] PROBLEM - Check unit status of refine_sanitize_eventlogging_analytics_immediate on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [21:22:56] RECOVERY - Check unit status of refine_sanitize_eventlogging_analytics_immediate on an-launcher1002 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [21:34:13] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) Oof. Today I discovered that `RefineTarget.find` will not work as on paths that have more than just datetim... [21:58:54] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) Hmm, maybe something like: `lang=scala val q = spark.table("event.mediawiki_revision_create").where("year... [21:59:21] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) ping @JAllemandou for thoughts. 
[23:20:39] 10Analytics, 10Product-Analytics, 10Structured-Data-Backlog: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (10nettrom_WMF) p:05Low→03Triage Moving this back to Analytics now that the dump exists, and changed the priority so the team...