[01:33:43] PROBLEM - Hadoop NodeManager on an-worker1131 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[01:56:13] RECOVERY - Hadoop NodeManager on an-worker1131 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[05:28:22] Analytics: Reset Kerberos password for nahidunlimited - https://phabricator.wikimedia.org/T282077 (Nahid)
[05:56:25] goood morning
[05:57:27] Analytics, Platform Team Workboards (Image Suggestion API): AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (gmodena)
[06:05:45] Analytics-Clusters, Analytics-Kanban: Migrate eventlog1002 to buster - https://phabricator.wikimedia.org/T278137 (elukey) +1
[06:45:03] Good morning
[06:45:48] Morning!
[06:46:10] How are you tanny411 ?
[06:47:11] joal: Yep good, stuck with nt being able to creating jars, haha. WIll have a talk today right?
[06:47:35] tanny411: I assume:(
[06:47:53] tanny411: sorry - Yes, talk today, and I'll try our patch before that
[06:48:04] tanny411: the spark notebook works now, right?
[06:48:17] Yes, notebooks work <3
[06:48:52] \o/
[06:49:28] tanny411: we'll also bootstrap the analysis of the whole triple set, even if the task with the details of the analysis has not been created yet
[06:50:49] joal: Sound good
[08:14:05] Analytics, Analytics-Kanban, Event-Platform, Growth-Team, and 3 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (Protsack.stephan) Hey, sorry for late reply. We do use `revision-create` but don't use `rev_is_revert` field, so removing that fiel...
[09:14:35] Analytics, Platform Team Workboards (Image Suggestion API): AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (ArielGlenn) You know there is another airflow-common-usage task around, here it is: T237361
[09:19:26] !log starting decommission of eventlog1002
[09:19:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:20:10] hnowlan: \o/
[09:40:06] Analytics-Clusters, Data-Persistence-Backup: Evaluate possible solutions to backup Analytics Hadoop's HDFS data - https://phabricator.wikimedia.org/T277015 (elukey) @jcrespo quick question - if we want to move forward with this, do we need hardware planned for next fiscal? I know that the use case is ver...
[09:49:13] elukey: I have a very quick little review before I hit go if you have a sec https://gerrit.wikimedia.org/r/c/operations/puppet/+/685746/
[09:49:16] just to prevent noise
[09:49:21] Analytics-Clusters, Data-Persistence-Backup: Evaluate possible solutions to backup Analytics Hadoop's HDFS data - https://phabricator.wikimedia.org/T277015 (jcrespo) > do we need hardware planned for next fiscal Absolutely yes. I thought that clear, and something you were handling on your own or with my...
[09:56:53] milimetric: Hi! I am trying to gather data about spambots that have been blocked globally, so I have been parsing the monthly pages of stewards requests (e.g. https://meta.wikimedia.org/wiki/Steward_requests/Global/2021-04)... is there a table where I could get this information directly from? Thanks :)
[10:11:24] hnowlan: looks good, but are the dhcp/netboot entries removed in another patch?
[10:13:24] left a comment in the patch :
[10:13:25] :)
[10:13:33] * elukey lunch!
[10:50:47] elaragon: ok, so what are spambots? Users in the group "bot" that spam? Or something else?
[11:38:24] huh, eventlog1002 had a bad PTR this whole time
[11:38:42] only for its mgmt interface though
[11:38:53] -80 1H IN PTR eventlog1002.mgmt.eqiad.wmnet.
[11:38:54] 80 1H IN PTR wmf4751.mgmt.eqiad.wmnet.
[11:39:42] good that it didn't cause problems :)
[11:40:26] elukey: let me know when you're back from lunch :)
[11:46:46] I am!
[11:47:22] o/
[11:48:00] elukey: I'd like a brain bounce on the thread/cassandra problem - I think they are related and would like to discuss
[11:48:51] on meet??
[11:50:53] if possible elukey
[11:51:59] sure
[11:52:19] I'm in the cave
[12:11:17] milimetric: I am defining spambots as users globally blocked by a steward and also marked as "spam-only account: spambot", e.g., https://meta.wikimedia.org/wiki/Special:CentralAuth/AnonymousRebellion
[12:15:01] !log changed eventlogging CNAME to point to eventlog1003
[12:15:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:27:21] joal: testing the change in hadoop test, for the moment no complains from the RMs
[12:29:05] joal: all good, +1 to proceed in prod?
[12:29:10] (coffee first)
[12:29:11] please elukey!
[12:29:13] sure :)
[12:38:11] (PS15) Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[12:39:28] !log restart Yarn RMs to apply the dominant resource calculator setting - T281792
[12:39:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:39:32] T281792: Yarn NM stopping due to failures while creating native threads - https://phabricator.wikimedia.org/T281792
[12:40:09] * joal watches yarn UI :)
[12:43:27] joal: done :)
[12:44:20] elukey: doing some checks
[12:45:39] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx) >>! In T279382#7062430, @mforns wrote: > And this is not a problem for the Ev...
[12:46:20] something else I have noticed elukey: since we moved to CapSch the time to launch any job is longer
[12:46:39] joal: minutes longer or seconds longer?
[12:46:46] seconds longer
[12:46:52] ah okok fiuuu
[12:47:02] it took between 1 and 2 seconds, it's now between 5 and 10
[12:47:15] interesting
[12:47:50] Resource negociation is more expensive I think
[12:48:14] elukey: I confirm the UI is back to what I'm used to <3
[12:48:26] * joal sees correct VCPUs anew \o/
[12:48:35] niceeeeee
[12:48:46] perfect one thing fixed
[12:48:53] joal: can you kick off the test to see if it breaks?
[12:49:06] elukey: Indeed i was about to tell ou I'm doing so :)
[12:49:11] <3
[12:49:58] Job started
[12:50:00] Monitoring
[12:53:34] still failing elukey
[12:54:01] will check to see if errors are the same
[12:55:46] Same problem again elukey
[12:56:29] elukey: Now the thing I don't understand is why some cassandra3 jobs generate the problem and some don't :(
[12:57:10] joal: it is part of the fun
[12:57:19] :)
[12:59:28] joal: can I see the app_id?
[12:59:32] curious about the logs
[12:59:34] sure
[12:59:46] sudo -u analytics kerberos-run-command analytics yarn logs --applicationId application_1620304990193_0015 | less
[12:59:49] elukey: --^
[13:02:25] joal: very interesting that the java.lang.OutOfMemory happened at the container level, but nothing is registered in NM logs
[13:02:30] (the worker is an-worker1082.eqiad.wmnet)
[13:02:34] yup
[13:03:54] elukey: on logs for c3-jobs not having the problem, many of the the lines about "Using native clock to generate timestamps." and "You are creating too many HashedWheelTimer instances" are not present
[13:04:23] I wonder if the driver things it needs to instanciate some timer for this job while it doesn't do so for others - werid
[13:06:39] (PS16) Kosta Harlan: Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[13:06:54] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (Ottomata) > deploy a partial revert of your change to re-introduce the URL length lim...
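The comparison joal describes at 13:03:54 (which marker lines show up in the logs of failing jobs but not in the logs of healthy ones) could be sketched roughly as follows. This is a minimal illustration, assuming the output of the `yarn logs` command above has already been saved as text; the function name `count_markers` and the sample text are hypothetical, and only the marker strings come from the conversation.

```python
# Marker strings quoted in the chat; "java.lang.OutOfMemory" is kept as a
# prefix so it also matches java.lang.OutOfMemoryError.
MARKERS = (
    "java.lang.OutOfMemory",
    "Using native clock to generate timestamps.",
    "You are creating too many HashedWheelTimer instances",
)


def count_markers(log_text, markers=MARKERS):
    """Return {marker: number of log lines containing that marker}."""
    counts = {marker: 0 for marker in markers}
    for line in log_text.splitlines():
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical excerpt, for illustration only (not real job output).
    sample = (
        "INFO Using native clock to generate timestamps.\n"
        "ERROR You are creating too many HashedWheelTimer instances\n"
        "ERROR You are creating too many HashedWheelTimer instances\n"
    )
    for marker, count in count_markers(sample).items():
        print(f"{count:4d}  {marker}")
```

Running it once on the log of a failing application and once on the log of a healthy one would give two count tables that can be compared side by side.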
[13:09:13] joal: ok step two https://gerrit.wikimedia.org/r/c/operations/puppet/+/685314
[13:10:16] this is a little bit more delicate, need to test it in hadoop test
[13:10:56] elukey: I imagine this won't have effect on my cassandra problem :)
[13:12:07] joal: in theory this raises the limits for threads in the yarn nm cgroup, worth to try
[13:12:22] +1
[13:12:28] also it makes it uniform across nodes
[13:20:18] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx) >>! In T279382#7065711, @Ottomata wrote: >> deploy a partial revert of your c...
[13:20:56] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (Ottomata) I see, so forward port only if we have to?
[13:26:04] ok I have a systemd config that works, I missed some settings
[13:26:08] works in hadoop test
[13:26:14] I am testing it on an-worker1080
[13:26:16] \o/
[13:27:50] looks good, extending it to analytics107x
[13:29:19] !log roll restart of hadoop yarn nodemanagers to pick up TasksMax=26214
[13:29:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:38:15] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (Ottomata)
[13:39:26] joal: started a very slow roll restart of all NMs, it will probably take some time
[13:39:40] No problem elukey - Thanks for doing that
[13:39:43] like 30 mins
[13:39:47] then we can re-test
[13:40:00] elukey: I'll be gone for kids soon, we'll retest after (sorry for that)
[13:40:20] actually elukey, you can check it: https://hue.wikimedia.org/hue/jobbrowser/#!id=0011324-210426062240701-oozie-oozi-C
[13:40:56] tick the box of the first failed job and click the rerun button - Don't touch anything on the popup, just confirm - And you have a restart
[13:41:36] joal: sure
[13:43:10] hola teamm
[13:43:42] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (Ottomata)
[13:43:43] hola marcel
[13:43:48] hi mforns
[13:45:34] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (Ottomata) I've created a new task to track down the EventLogging usage: {T282131}
[13:49:26] Analytics, Patch-For-Review: Yarn NM stopping due to failures while creating native threads - https://phabricator.wikimedia.org/T281792 (elukey) After a chat with Joseph we decided to proceed one change at the time: * Enable Dominant Resource Calculator for Yarn (to take vcores into account, not only me...
[13:50:48] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (CorinnaHillebrand_WMDE) >>! In T281300#7063143, @Ottomata wrote: > @gabriel-wmde @CorinnaHillebrand_WMDE @Tim_WMDE thanks for the ping, @Ottom...
[13:51:08] taking a little break
[13:51:24] ottomata: o/ I am stepping away for a bit, the Yarn NM are restarting, but nothing on fire :D
[13:51:42] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (mforns) @phuedx Is the new code in production already? @Ottomata What do you mean wi...
[13:54:24] Analytics, Event-Platform: WikidataCompletionSearchClicks Event Platform Migration - https://phabricator.wikimedia.org/T282140 (Ottomata)
[13:54:56] ok elukey
[13:54:57] :)
[13:58:19] (PS1) Ottomata: Add WikidataCompletionSearchClicks legacy schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/685807 (https://phabricator.wikimedia.org/T282140)
[14:00:48] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[14:00:56] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[14:05:43] joal: is there anything I can provide to dig into those import failures on the new cluster? are they consistently happening?
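To check how close a NodeManager is to the `TasksMax=26214` limit from the 13:29:19 log entry, the properties printed by `systemctl show -p TasksMax -p TasksCurrent <unit>` can be parsed as in the sketch below. This is only an illustration: the unit name, the helper names, and the `TasksCurrent` value in the sample are assumptions, not values taken from the cluster.

```python
def parse_systemd_properties(text):
    """Parse KEY=VALUE lines, as printed by `systemctl show`, into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def task_headroom(props):
    """Return TasksMax - TasksCurrent, or None if either is not a number.

    systemd reports TasksMax as "infinity" when no limit is set, and
    TasksCurrent may be "[not set]" for inactive units.
    """
    limit = props.get("TasksMax", "")
    current = props.get("TasksCurrent", "")
    if not (limit.isdigit() and current.isdigit()):
        return None
    return int(limit) - int(current)


if __name__ == "__main__":
    # Illustrative text only: TasksMax is the value from the log entry,
    # TasksCurrent is a made-up number. On a real host the text would come
    # from something like:
    #   systemctl show -p TasksMax -p TasksCurrent hadoop-yarn-nodemanager.service
    # (the unit name is an assumption).
    sample = "TasksMax=26214\nTasksCurrent=1200\n"
    print(task_headroom(parse_systemd_properties(sample)))
```

Returning `None` rather than raising keeps the helper usable in a loop over many workers where some units may be stopped during the roll restart.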
[14:06:27] Analytics, Analytics-Kanban: Analytics Presto improvements - https://phabricator.wikimedia.org/T266639 (Ottomata)
[14:06:29] Analytics-Clusters: Co-locate Presto with Hadoop worker nodes - https://phabricator.wikimedia.org/T256108 (Ottomata) Stalled→Declined Not going to colocate after all :)
[14:06:51] Analytics-Clusters: Configure Yarn to be able to locate nodes with a GPU - https://phabricator.wikimedia.org/T264401 (Ottomata) a:elukey
[14:09:28] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata) Cat herding being tracked in {T282131}
[14:09:52] Analytics, Event-Platform, Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) Before migrating, we want to see if we can find an actual owner for this implementation. https://phabricator.wikimedia.org/T262433#7062838
[14:21:13] mforns: o/ yt?
[14:21:23] yep! sup?
[14:21:25] q about https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_retention#Purging_Strategies
[14:21:27] is that right?
[14:21:30] what is minimal purge?
[14:21:37] lookin
[14:23:42] ottomata: I think this is somewhat old: when I was doing EL audits, this was a way of discussing allowlisted fields in a simple way. Currently schema/stream owners are more aware of how sanitization works, and this is not needed I think!
[14:23:43] Analytics, Platform Team Workboards (Image Suggestion API): AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (Milimetric) @ArielGlenn oh yeah, I remember that. At this point we've pretty much decided to go forward with AirFlow. I read T237361#5636979 and it seems to m...
[14:23:53] ok great thanks!
[14:24:08] ottomata: want me to remove that?
[14:24:11] i'm edting now
[14:24:13] k
[14:24:13] i'll remove that section
[14:24:15] ?
[14:24:18] ok ok
[14:24:26] righit/ the whole section?
[14:25:01] ottomata: yeaaaa
[14:28:35] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx) >>! In T279382#7066066, @mforns wrote: > @phuedx Is the new code in productio...
[14:30:12] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (Ottomata) @mforns I guess I mean what Sam was saying, if we get a ton of errors relat...
[14:30:22] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (Ottomata) > It'll be live on all Wikipedias by EOD today. Oh my, we should prep the m...
[14:34:03] Analytics, Event-Platform, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), and 2 others: extensions/EventBus - Use UserGroupManager instead of User group methods - https://phabricator.wikimedia.org/T281825 (Vlad.shapik)
[14:35:37] joal: I re-run one job (application_1620304990193_0232) and I don't see the OutOfMemory problem anymore, but it fails anyway for other things
[14:37:37] Analytics, Data-release, Privacy Engineering, Research, Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (Milimetric) p:Medium→High I'm making this high so we can try and pick it up, but it's still behind lots of other work. The pro...
[14:40:35] (CR) Hnowlan: "> Patch Set 1:" [analytics/aqs] - https://gerrit.wikimedia.org/r/682933 (https://phabricator.wikimedia.org/T278701) (owner: Hnowlan)
[14:40:38] (PS2) Hnowlan: Add grants and schema CQL [analytics/aqs] - https://gerrit.wikimedia.org/r/682933 (https://phabricator.wikimedia.org/T278701)
[14:44:33] ok mforns i updated the two retention and sanitization pages to make them apply to the changes and make them not eventlogging specific
[14:44:35] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Event_Sanitization
[14:44:37] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Event_Data_retention
[14:44:43] (^ was moved)
[14:45:08] I'm going to send an FYI email to give folks a heads up about the changes
[14:47:21] Analytics, Analytics-Dashiki, Analytics-Kanban: npm install gives Verification failed while extracting mediawiki-storage@https://github.com/wikimedia/analytics-mediawiki-storage/archive/master.tar.gz - https://phabricator.wikimedia.org/T278982 (Milimetric) Thanks, good point, I added a note at the be...
[14:47:32] (CR) Hnowlan: [C: +2] Add grants and schema CQL [analytics/aqs] - https://gerrit.wikimedia.org/r/682933 (https://phabricator.wikimedia.org/T278701) (owner: Hnowlan)
[14:48:39] (Merged) jenkins-bot: Add grants and schema CQL [analytics/aqs] - https://gerrit.wikimedia.org/r/682933 (https://phabricator.wikimedia.org/T278701) (owner: Hnowlan)
[14:49:45] (PS2) Hnowlan: Use Cassandra 3 syntax in schema [analytics/aqs] - https://gerrit.wikimedia.org/r/682934 (https://phabricator.wikimedia.org/T278701)
[14:58:12] (CR) Hnowlan: [C: +2] Use Cassandra 3 syntax in schema [analytics/aqs] - https://gerrit.wikimedia.org/r/682934 (https://phabricator.wikimedia.org/T278701) (owner: Hnowlan)
[14:59:04] Analytics, Event-Platform, Patch-For-Review: WikidataCompletionSearchClicks Event Platform Migration - https://phabricator.wikimedia.org/T282140 (Ottomata) @EBernhardson said > hmm, most likely it doesn't need ips/geo. The stats are language based rather than region Will proceed with this migration
[14:59:08] (Merged) jenkins-bot: Use Cassandra 3 syntax in schema [analytics/aqs] - https://gerrit.wikimedia.org/r/682934 (https://phabricator.wikimedia.org/T278701) (owner: Hnowlan)
[15:04:39] Analytics-Clusters, Analytics-Kanban, Cassandra, Patch-For-Review: Set up a testing environment for the AQS Cassandra 3 migration - https://phabricator.wikimedia.org/T257572 (hnowlan) a:elukey→hnowlan
[15:05:10] (PS3) Hnowlan: Add docker-compose environment with cassandra [analytics/aqs] - https://gerrit.wikimedia.org/r/679295 (https://phabricator.wikimedia.org/T257572)
[15:06:01] Analytics-Clusters, Analytics-Kanban, Cassandra, Patch-For-Review: Set up a testing environment for the AQS Cassandra 3 migration - https://phabricator.wikimedia.org/T257572 (hnowlan) In lieu of having a test environment or a WMCS cluster, we are planning on pursuing a docker-compose environment...
[15:08:12] Analytics-Clusters, Analytics-Kanban: Migrate eventlog1002 to buster - https://phabricator.wikimedia.org/T278137 (hnowlan) Open→Resolved
[15:10:25] Analytics, Analytics-Kanban: Crunch and delete many old dumps logs - https://phabricator.wikimedia.org/T280678 (Milimetric) The format looks like [[ https://en.wikipedia.org/wiki/Common_Log_Format | Common Log Format ]] with two additional fields, "full URI requested" and "user agent" If it's just a sim...
[15:26:33] Analytics, Data-release, Privacy Engineering, Research, Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (Isaac) > I'm making this high so we can try and pick it up, but it's still behind lots of other work. The proof of concept is very usef...
[15:32:19] joal: I tried another re-run, application_1620304990193_0354, and it seems failing for too many open files
[15:32:29] like
[15:32:29] Caused by: java.io.FileNotFoundException: /var/lib/hadoop/data/l/yarn/local/filecache/5617/refinery-cassandra-0.1.10.jar/com/google/common/base/Throwables.class (Too many open files)
[15:33:11] that looks a bit weir
[15:33:14] *weird
[15:45:07] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:57:28] elukey: very weird indeed!
[15:57:34] will look into that as well
[15:57:37] :S
[16:00:07] Analytics, Event-Platform, Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (EYener) @Ottomata when would this migration need to happen? If we have a deadline, we might be able to prioritize it and help find an owner.
[16:00:35] joal: but it is progress!
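The "(Too many open files)" error at 15:32:29 is what a process sees when it reaches its open-file-descriptor limit (RLIMIT_NOFILE). As a rough illustration of how that limit and the current usage can be inspected, the sketch below uses Python's standard `resource` module on the current process. It does not look inside the YARN container itself, and the `/proc/self/fd` lookup assumes a Linux host; the function name is hypothetical.

```python
import os
import resource


def open_file_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    open_fds is None when /proc/self/fd is unavailable (non-Linux hosts).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        used = len(os.listdir("/proc/self/fd"))
    except OSError:
        used = None
    return used, soft, hard


if __name__ == "__main__":
    used, soft, hard = open_file_usage()
    print(f"open fds: {used}, soft limit: {soft}, hard limit: {hard}")
```

For a running container, the same numbers are available from `/proc/<pid>/limits` and `/proc/<pid>/fd` on the worker, given the container's process id.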
[16:00:52] indeed elukey - different error is progress for sure :)
[16:27:29] Analytics, Event-Platform, Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) The sooner the better, but there isn't yet a deadline. We need to either migrate or decom all legacy EventLogging streams in order to turn off the old eventlog...
[16:33:54] Analytics-Clusters, Analytics-Kanban: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (Ottomata) a:hnowlan
[16:35:04] Analytics, Platform Team Workboards (Image Suggestion API): AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (ArielGlenn) >>! In T282033#7066226, @Milimetric wrote: > > So where are you at in the process? You can follow this task until we have something that's...
[16:45:19] Analytics-Radar, Event-Platform, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), and 2 others: extensions/EventBus - Use UserGroupManager instead of User group methods - https://phabricator.wikimedia.org/T281825 (Ottomata)
[17:02:27] Analytics-Radar, Dumps-Generation: Temp files left around in wikistats_1/ ? - https://phabricator.wikimedia.org/T280311 (Milimetric) So it looks like the https://dumps.wikimedia.org/other/wikistats_1.0/ folder is empty, so that can be deleted. The https://dumps.wikimedia.org/other/wikistats_1 folder con...
[17:10:38] Analytics, Event-Platform, Platform Engineering: Add expiry info to mediawiki.page-restrictions-change stream - https://phabricator.wikimedia.org/T282057 (Ottomata) p:Triage→Low
[17:11:17] Analytics: Reset Kerberos password for nahidunlimited - https://phabricator.wikimedia.org/T282077 (Ottomata) a:razzi
[17:12:45] Analytics: Add wikitech (labswiki) to the sqoop list - https://phabricator.wikimedia.org/T217792 (Ottomata) p:Triage→Low
[17:14:03] Analytics, Analytics-Wikistats: Wikistats New Feature - bot edits / new articles - https://phabricator.wikimedia.org/T241922 (Ottomata) p:Triage→Medium
[17:16:01] Analytics, Analytics-Wikistats, good first task: Wikistats Bug - easy to understand language for pageviews - https://phabricator.wikimedia.org/T263973 (Ottomata) p:Triage→High
[17:17:04] Analytics: Investigate lowering "per-article" resolution data in AQS - https://phabricator.wikimedia.org/T144837 (Ottomata) p:Triage→Low
[17:23:15] Analytics: Mediarequests: Remove icon images from top lists - https://phabricator.wikimedia.org/T242033 (Ottomata) Open→Declined p:Triage→Low
[17:24:26] Analytics, Pageviews-API: Pageviews API should allow specifying a country - https://phabricator.wikimedia.org/T245968 (Ottomata) Open→Declined Reopen if needed
[17:33:14] Quarry, cloud-services-team (Kanban): quarry-web-01 out of disk space - https://phabricator.wikimedia.org/T282171 (Bstorm) p:Triage→High
[17:33:41] olja: you want us to add links to the deep dive doc under the different sections?
[17:36:35] Quarry, cloud-services-team (Kanban): quarry-web-01 out of disk space - https://phabricator.wikimedia.org/T282171 (Bstorm) It's being caused by /tmp filling up.
[17:38:01] Quarry, cloud-services-team (Kanban): quarry-web-01 out of disk space - https://phabricator.wikimedia.org/T282171 (Bstorm) All files are owned by www-data, so this is a quarry bug.
[17:50:43] Quarry, cloud-services-team (Kanban): quarry-web-01 out of disk space - https://phabricator.wikimedia.org/T282171 (Bstorm) `lang=shell-session bstorm@quarry-web-01:/tmp$ df -h . Filesystem Size Used Avail Use% Mounted on /dev/vda3 19G 3.5G 15G 20% / ` Now I am curious about the why here...
[17:53:52] Quarry, cloud-services-team (Kanban): quarry-web-01 out of disk space - https://phabricator.wikimedia.org/T282171 (Bstorm) p:High→Medium
[17:58:51] Quarry, cloud-services-team (Kanban): quarry-web-01 out of disk space - https://phabricator.wikimedia.org/T282171 (Bstorm) Open→Resolved It does not seem to be currently leaking temp files. I'm not sure why the temp files filed the disk. Closing this for now.
[18:02:24] * elukey afk!
[18:09:01] Hi mforns - do you wish to spend a minute talking about ops-weekd sync?
[18:09:11] joal: please :]
[18:09:16] mforns: cave?
[18:09:18] ok!
[18:10:54] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (Ottomata)
[18:13:02] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (Ottomata)
[18:35:30] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (mforns) @Merle_von_Wittich_WMDE & @GoranSMilovanovic, regarding deletion of historical data: > @Merle_von_Wittich_WMDE I don't think so. All the...
[18:37:19] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 3 others: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned - https://phabricator.wikimedia.org/T282131 (Ottomata)
[18:45:51] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (Ottomata) > Can you confirm, then, that we can delete data older than 90 days? :-) And/or can we just stop collecting this data altogether? :)
[19:11:25] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (GoranSMilovanovic) @mforns @Ottomata @Merle_von_Wittich_WMDE - We have all our Campaign reports for the WMDE New Editors team already rendered...
[19:29:46] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (srodlund) @elukey I am finished going through this. The post itself is great and super interesting, so I don'...
[19:34:36] Analytics: Article missing from the Clickstream dataset - https://phabricator.wikimedia.org/T282178 (diego)
[19:36:12] (CR) Gergő Tisza: [C: +2] helppanel: Add machineSuggestion as valid editor_interface [schemas/event/secondary] - https://gerrit.wikimedia.org/r/684821 (https://phabricator.wikimedia.org/T280564) (owner: Kosta Harlan)
[19:36:46] (Merged) jenkins-bot: helppanel: Add machineSuggestion as valid editor_interface [schemas/event/secondary] - https://gerrit.wikimedia.org/r/684821 (https://phabricator.wikimedia.org/T280564) (owner: Kosta Harlan)
[19:42:15] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (elukey) @srodlund thanks a lot for the review! I accepted all the comments and left two questions for you, fe...
[19:44:27] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (Ottomata) I just google image searched 'migrating elephants', saw one that looked good, and then saw that the...
[19:46:38] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: WikidataCompletionSearchClicks Event Platform Migration - https://phabricator.wikimedia.org/T282140 (Ottomata)
[19:47:16] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (elukey) @Ottomata the image looks so good! :(
[19:48:28] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (elukey) @srodlund if you have ideas we all all ears, any elephant-related image could be ok :)
[20:35:05] (PS1) Mforns: Add VirtualPageView schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/685919 (https://phabricator.wikimedia.org/T238138)
[20:35:10] (PS17) Kosta Harlan: Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[20:35:56] (CR) Mforns: [V: +2 C: +2] "Self-merging for EP VirtualPageView migration." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/685919 (https://phabricator.wikimedia.org/T238138) (owner: Mforns)
[20:36:31] (Merged) jenkins-bot: Add VirtualPageView schema to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/685919 (https://phabricator.wikimedia.org/T238138) (owner: Mforns)
[20:36:47] ottomata: hm, I don't have permits to merge this change: https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/685919
[20:37:01] oops, it just got merged by jenkins
[20:37:06] never mind!
[20:50:16] Analytics, Analytics-Kanban: Add password reset to kerberos manage_principals.py - https://phabricator.wikimedia.org/T282185 (razzi)
[20:57:02] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (srodlund) Great! I have resolved the questions, and love the photo! I will publish this post tomorrow! I lef...
[21:08:38] ottomata: I prepared the VirtualPageView migration up to step 6: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/685928
[21:08:47] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 4 others: VirtualPageView Event Platform Migration - https://phabricator.wikimedia.org/T238138 (mforns)
[21:09:23] once the instrumentation changes are fully deployed, I will deploy this in the backport window to testwiki and check it works. And then same to prod.
[21:27:31] !log sudo manage_principals.py reset-password nahidunlimited --email_address=nsultan@wikimedia.org
[21:27:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:28:44] Analytics: Reset Kerberos password for nahidunlimited - https://phabricator.wikimedia.org/T282077 (razzi) This should be all set; check your email for your reset password.
[22:08:05] Analytics, Event-Platform, Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (EYener) Fair enough, @Ottomata! Is Q1 too late for this self-imposed deadline?
[22:59:03] Analytics, Data-release, Privacy Engineering, Research, Privacy: ApacheBeam prototype for DP noise addition with pageview privacy units on top of Spark - https://phabricator.wikimedia.org/T282195 (Nuria)
[23:48:57] Analytics, Event-Platform, Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) That'd be great we can work with that! Thank you.