[00:06:17] RECOVERY - Hadoop NodeManager on an-worker1130 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [00:27:21] PROBLEM - Hadoop NodeManager on an-worker1115 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [00:27:59] PROBLEM - Hadoop NodeManager on an-worker1106 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [00:30:01] RECOVERY - Hadoop NodeManager on an-worker1115 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [00:30:41] RECOVERY - Hadoop NodeManager on an-worker1106 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [00:43:05] PROBLEM - Check unit status of monitor_refine_eventlogging_legacy on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [00:51:55] PROBLEM - Hadoop NodeManager on an-worker1135 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [01:09:13] RECOVERY - Hadoop NodeManager on an-worker1135 is OK: PROCS OK: 
1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [03:18:05] PROBLEM - Hadoop NodeManager on an-worker1130 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [03:30:53] RECOVERY - Hadoop NodeManager on an-worker1130 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [05:51:58] good morning [05:56:35] another big query caused the errors on 1130 [05:56:36] https://yarn.wikimedia.org/jobhistory/attempts/job_1619507802557_51897/m/KILLED [05:56:43] checking the others [06:02:07] another one seems to be application_1619507802557_48904 but I don't find it in yarn [06:02:50] but this time user analytics [06:19:28] I have a theory on this, not sure what it is causing it though [06:19:44] let's pick an-worker1130 [06:20:26] elukey@an-worker1130:~$ sudo systemctl status hadoop-yarn-nodemanager.service | grep Task Tasks: 618 (limit: 11059) [06:21:03] and the processes running as yarn are currently 5 [06:21:49] the "tasks" are counting, IIUC, both userland threads and also kernel threads [06:22:21] the limit is set using some heuristic depending on kernel + hw, but in our case it probably too low [06:22:40] as soon as the containers start to increase, the threads raise and we hit a ceiling [06:22:43] (this is my theory) [06:23:00] so we could raise the limit, it should be very safe [06:23:11] the defaults from the kernel are probably not ok for our use case [06:23:30] (maybe this is a side effect of the capacity scheduler? different container allocation etc..) 
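The `Tasks: 618 (limit: 11059)` figure quoted at 06:20:26 can be turned into a quick headroom check. What follows is a minimal Python sketch, assuming the `Tasks: N (limit: M)` format shown in that `systemctl status` output; the helper names `parse_tasks_line` and `tasks_usage` are hypothetical and not part of any cluster tooling. The systemd directive that sets this ceiling is spelled `TasksMax=`.

```python
import re

# Matches the line printed by `systemctl status`, e.g. "Tasks: 618 (limit: 11059)".
TASKS_RE = re.compile(r"Tasks:\s*(\d+)\s*\(limit:\s*(\d+)\)")


def parse_tasks_line(status_output):
    """Return (current, limit) as ints, or None if no numeric limit is shown."""
    match = TASKS_RE.search(status_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def tasks_usage(status_output):
    """Fraction of the unit's task limit currently in use."""
    parsed = parse_tasks_line(status_output)
    if parsed is None:
        raise ValueError("no 'Tasks: N (limit: M)' line found")
    current, limit = parsed
    return current / limit


# Sample copied from the log line at 06:20:26.
sample = "Tasks: 618 (limit: 11059)"
current, limit = parse_tasks_line(sample)
print(f"{current}/{limit} tasks in use ({tasks_usage(sample):.1%})")
```

Sampling this while containers ramp up would show whether the count really approaches the limit before a NodeManager dies, which is the theory being tested here.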
[06:39:48] so we could apply a systemd override to control the TaskMax parameter [07:46:34] wow - cool analysis elukey! [07:49:13] elukey, joal: Hi, I am not able to connect to scala spark kernel in jupyter lab. It shows connecting for a while and then disconnected. Thanks to joal, I made sure to kinit. Python3 kernel works fine though. [07:49:13] joal: To use arq in terminal do i have to install arq, scala separately? [07:49:34] Hi tanny411 [07:49:59] joal: bonjour :) started https://gerrit.wikimedia.org/r/c/operations/puppet/+/685314 [07:50:02] elukey: we checked yesterday and it seems tanny411 cannot look at her kernel logs - could you help with that? [07:50:12] of course [07:50:16] Good morning elukey :) [07:50:24] tanny411: on what node? [07:50:40] Hi also :) [07:50:50] stat1008 [07:51:15] tanny411: for you to be able to use arq on the repl, the jars need to be on the classpath so you can import the needed classes [07:51:29] tanny411: also how are you connecting to stat1008? [07:52:54] tanny411: to add the jar to the running kernel I know 2 ways: The %AddJar magic of toree (see https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb) [07:53:46] tanny411: Or, adding your built jar to your notebook kernel by adding "--jars PATH_OF_JAR" in the spark_opts list [07:53:54] this last version is my prefered one :) [07:54:23] joal: so that actually needs the project to be built properly. getting errors on that. [07:54:23] elukey: I ssh into it. 
ssh stat1008.eqiad.wmnet -L 8880:127.0.0.1:8880 to be exact [07:54:33] joal: oh okay [07:55:05] ok tanny411 - let's try to make that work [07:55:29] tanny411: I assume the last version of my patch (I corrected some stuff yesterday, it passes jenkins now) [07:56:52] joal: one thing at the time, otherwise I don't understand :) [07:57:31] so tanny411 can see the jupyterhub UI login page and use it, but cannot create a spark notebook [07:57:33] tanny411: The error I see from jenkins on your patch was due to problems in my code - I fixed that [07:57:40] sure elukey - stopping the second thread [07:58:03] what is tanny411's username on stat1008? [07:58:14] elukey: akhatun [07:58:31] ack perfect, checking in the logs [07:59:29] I see stuff like [07:59:30] /srv/home/akhatun/.local/share/jupyter/kernels/scala_spark_scala/bin/run.sh: line 45: /usr/local/spark/bin/spark-submit: No such file or directory [08:00:29] we have /usr/bin/spark2-submit on the host [08:00:36] for that error --^ I think you have not configured the scala-spark home correctly tanny411 [08:01:07] reading the docs [08:01:32] As expected, the problem comes from me forgetting a line in the docs [08:01:36] I'm sorry for that [08:01:41] tanny411, elukey --^ [08:01:46] Correcting :S [08:02:24] always Joseph's fault [08:02:26] :D :D :D [08:02:40] indeed elukey - That's why I ask for help! [08:04:01] ok, corrected - I'm sorry tanny411 - With the "--spark_home="/usr/lib/spark2/" parameter at kernel creation it should work [08:04:15] docs corrected here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Scala-Spark_or_Spark-SQL_using_Toree [08:04:55] Great! [08:05:33] supe [08:05:35] *super [08:10:40] thanks elukey :) [08:10:56] gone back to kids [08:23:03] elukey, joal : sorry to bother again, but the kernel still seems to disconnect. Started a fresh ssh connection just to be sure. [08:23:44] tanny411: different error this time! 
[08:23:45] May 05 08:22:06 stat1008 bash[21411]: Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/srv/home/akhatun/Notebooks/spark.sql.shuffle.partitions=256 [08:23:49] May 05 08:22:06 stat1008 bash[21411]: at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657) [08:24:40] ah I found a missing --conf [08:25:14] tanny411: I have updated the cmd on the wiki, can you retry? [08:25:23] Altight [08:25:27] alright* [08:27:20] elukey: Worked! Thanks a lot! [08:27:26] gooood!! :) [08:27:31] thank you for the patience :) [08:27:47] :D [08:54:32] 10Analytics-Clusters, 10Analytics-Kanban, 10Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (10elukey) @srodlund draft ready! I shared the gdoc with you and the Analytics team :) [08:55:33] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (10elukey) Almost forgot - the procedure should also include T231067#6863800 :) [08:59:51] 10Analytics-Clusters: Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet - https://phabricator.wikimedia.org/T281917 (10elukey) @razzi each check has its own interval, check_puppet_run_changes might run every X hours so it may be slow to update. If you want to get fresh results you can force... [10:47:19] joal, razzi, ottomata - we just got a page in SRE due to a heavy job saturating network pipes, namely GPU training :( [10:47:36] I killed the job and alerted Miriam/Aiko, but if it re-happens the job is easy to spot [10:48:55] * elukey lunch! 
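The `Cannot load main class from JAR` error at 08:23:45 came from a `key=value` Spark setting that was not preceded by `--conf`, so spark-submit treated it as the application JAR. Below is a minimal Python sketch for spotting that mistake in an option list; the helper `find_bare_conf` and both sample lists are hypothetical and are not the command from the wiki.

```python
def find_bare_conf(args):
    """Return spark.* key=value tokens that are not directly preceded by --conf."""
    bare = []
    for i, token in enumerate(args):
        looks_like_conf = token.startswith("spark.") and "=" in token
        preceded_by_conf = i > 0 and args[i - 1] == "--conf"
        if looks_like_conf and not preceded_by_conf:
            bare.append(token)
    return bare


# Hypothetical option lists illustrating the broken and corrected forms.
broken = ["--master", "yarn", "spark.sql.shuffle.partitions=256"]
fixed = ["--master", "yarn", "--conf", "spark.sql.shuffle.partitions=256"]

print(find_bare_conf(broken))
print(find_bare_conf(fixed))
```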
[11:37:39] Given that the dual loading is working okay now, I might truncate the tables and take the snapshot for the migration to the 3.11 cluster this afternoon [11:47:12] Hi hnowlan - please give me some time before starting, we got alerts for some failing jobs I'd like t investigate first [11:47:43] joal: ack, I'll hold! thanks for that [11:53:05] (03PS1) 10Joal: Correct referrer_daily job's SLA [analytics/refinery] - 10https://gerrit.wikimedia.org/r/685414 [11:55:04] Thanks a lot elukey for the correction of the spark config :S [12:02:57] !log rerun cassandra-daily-wf-local_group_default_T_top_percountry-2021-5-4 [12:03:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:04:32] elukey: I have a fun feeling about the nodemanager issue we're experiencing and capacity-scheduler strange behavior on used CPUs [12:22:27] RECOVERY - Check unit status of monitor_refine_eventlogging_legacy on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [12:22:52] !log Reset monitor_refine_eventlogging_legacy after manual rerun of failed job [12:22:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:39:04] hnowlan: I have 2 jobs failing with weird memory errors - Could it be that keyspace/table configuration could be different for them? [12:39:55] keyspaces are: "local_group_default_T_top_percountry" and "local_group_default_T_unique_devices" [12:43:49] joal: hmm, they shouldn't be much different to the other keyspaces :( all keyspaces had a change to their caching key because the syntax is different in cassandra 3 but it shouldn't affect insertions and definitely shouldn't be different per table (these are the diffs between versions https://gerrit.wikimedia.org/r/c/analytics/aqs/+/682934) [12:44:04] What do the memory errors look like? 
I doubt I'll be much help but I'm curious [12:46:16] joal: o/ [12:46:24] what kind of feeling do you have for capacity? [12:46:26] hnowlan: My understanding is that the driver creates too many io.netty.util.HashedWheelTimer in the failing cases, while in the non-failing cases it doesn't even report on creating some [12:46:34] hi elukey [12:46:51] elukey: capcity UI currently doesn't report correctly on the number of CPU used per resource [12:47:31] joal: IIRC it doesn't take cpus into account at all with the basic settings, only memory [12:47:53] elukey: Example - https://yarn.wikimedia.org/cluster/scheduler --> application_1619507802557_58723 [12:48:12] This reports 33 containers (correct) with 33 VCPUs [12:48:19] yes yes [12:48:50] https://yarn.wikimedia.org/proxy/application_1619507802557_58723/ shows that we're using 128 tasks, meaning that each worker-container uses 4 VCPUs, not one [12:49:00] joal: so yarn.scheduler.capacity.resource-calculator says [12:49:05] The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses Dominant-resource to compare multi-dimensional resources such as Memory, CPU etc. A Java ResourceCalculator class name is expected. [12:49:29] elukey: FairScheduler was using RAM-only resource allocation, and was reporting correctly on used CPUs [12:49:37] 10Analytics-Clusters, 10Analytics-Kanban: Migrate eventlog1002 to buster - https://phabricator.wikimedia.org/T278137 (10Ottomata) +1 [12:49:46] elukey: That's why I'm asking [12:50:12] joal: no idea [12:50:16] Could it be that the node manager allocates for 1CPU, and 4 are used, and therefore limits of containers are not prepared correctly? [12:50:21] elukey: --^ [12:50:22] ? 
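The mismatch joal describes (33 containers reported as 33 VCPUs, but 128 running tasks) can be worked through numerically. This is a rough Python sketch under stated assumptions: one of the 33 containers is the application master using 1 vcore, the other 32 are executors, and each task occupies one core. None of that split is confirmed in the log, and the function names are hypothetical.

```python
def cores_per_executor(containers, running_tasks, am_containers=1):
    """Concurrent tasks per executor, assuming tasks spread evenly over executors."""
    executors = containers - am_containers
    if executors <= 0 or running_tasks % executors != 0:
        raise ValueError("tasks do not divide evenly over the assumed executors")
    return running_tasks // executors


def estimated_vcores(containers, running_tasks, am_containers=1, am_vcores=1):
    """Vcores actually in use under the assumptions above."""
    per_executor = cores_per_executor(containers, running_tasks, am_containers)
    executors = containers - am_containers
    return executors * per_executor + am_containers * am_vcores


# Figures quoted for application_1619507802557_58723.
containers, reported_vcores, running_tasks = 33, 33, 128

print(cores_per_executor(containers, running_tasks))
print(estimated_vcores(containers, running_tasks), "estimated vs", reported_vcores, "reported")
```

Under these assumptions the scheduler UI would be under-reporting CPU use by roughly a factor of four, which matches the "4 VCPUs, not one" reading at 12:48:50.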
[12:50:33] * joal is having ideas way over his head [12:51:11] joal: not sure we should probably check in more depth [12:51:24] right elukey [12:51:26] the band aid of having more threads allowed should give us more time to investigate [12:51:28] will do that later [12:51:33] 10Analytics-Clusters: Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet - https://phabricator.wikimedia.org/T281917 (10Ottomata) Hey sorry yall! I thought I had done a code search and removed all occurrences...must not have noticed this on an-test-client somehow. Thank you. [12:51:34] ack [12:52:07] heya teammm [12:53:07] 10Analytics, 10Analytics-EventLogging, 10dev-images, 10Patch-For-Review: EventLogging dev image should have verbose output enabled - https://phabricator.wikimedia.org/T257378 (10hashar) It seems to be solely for the #analytics team. [12:53:08] joal: but I'd be curious to see if it is the resource-calculator implementation, maybe the fair one handles vcores assigned correctly even if memory is the only thing that matters, and the capacity doesn't [12:57:50] I'm learning about the mediawiki.revision-create kafka topic. If I understand it correctly, those events do not contain the revision payload. Is there a canonical way for accessing the revision content? My use case is along the lines of: given a wikipedia article at times t_1 and t_2, I would like to diff t_2 and t_1 in a real-time data pipeline and check if "an image was added at t_2, which was not present at t_1". [12:59:52] gmodena: you'll need to get the content from the api for that [13:02:24] joal the Action API? Is the preferred access method also for internal use case (potentially high throughput)? [13:03:19] gmodena: The throuput needs to be discussed with your team I think :) And about getting content, I think the new API is the one you should use [13:03:49] joal awesome sauce. 
Thanks for the pointer :) [13:04:45] gmodena: this type of use case (getting content of revisions and working with them streaming way) are appearing more and more - we should collaborate on providing streams of useful pre-computed content-related info - that would be awesome (in addition to move to the direction of solving more use-cases :) [13:06:40] joal i'd be happy to join forces! Right now I'm justing playing around with a little spike for learning about kafka. Happy to touch base and bounce ideas around in our next chat :) [13:07:12] (03CR) 10Ottomata: "> That seems like a really weird requirement." [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/680798 (https://phabricator.wikimedia.org/T254891) (owner: 10Neil P. Quinn-WMF) [13:52:43] Hi! I am trying to gather data about spambots that have been blocked globally. To do this, I am parsing the monthly pages of stewards requests (e.g. https://meta.wikimedia.org/wiki/Steward_requests/Global/2021-04) and then examining individually which of these users have been flagged as spambots (e.g., https://meta.wikimedia.org/wiki/Special:CentralAuth/AnonymousRebellion)... is there a table where I could [13:52:43] get this information directly from? 
[14:40:17] (PS8) Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[14:44:59] (CR) jerkins-bot: [V: -1] [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[14:47:09] (PS9) Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[14:48:11] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, Patch-For-Review: EventGate idea: use presence of schema properties in http.(request|response)_headers to automatically set header values in event data - https://phabricator.wikimedia.org/T263466 (Ottomata) Open→...
[15:41:41] (PS2) Hnowlan: Add docker-compose environment with cassandra [analytics/aqs] - https://gerrit.wikimedia.org/r/679295
[15:42:47] hnowlan: <3
[15:43:42] (CR) Kosta Harlan: "> Patch Set 7:" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[15:43:55] :D
[15:44:07] (PS10) Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[15:44:19] the above now works (for real I promise) and is ready for review if anyone wants to take it for a spin
[15:47:09] Analytics, Event-Platform: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata)
[15:47:21] Analytics, Event-Platform: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) a:Ottomata
[15:47:31] Analytics, Event-Platform: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata)
[15:49:35] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:49:38] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[15:50:59] (CR) Ottomata: [C: +1] [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[15:54:42] Analytics, Analytics-Kanban: Add logic to purging scripts that requires admin action if it's about to delete a lot of data - https://phabricator.wikimedia.org/T270433 (mforns)
a:mforns
[15:55:20] (PS1) Ottomata: Add WikipediaPortal to analytics/legacy [schemas/event/secondary] - https://gerrit.wikimedia.org/r/685513 (https://phabricator.wikimedia.org/T282012)
[16:03:07] Analytics, Event-Platform, Patch-For-Review: WikipediaPortal Event Platform Migration - https://phabricator.wikimedia.org/T282012 (Ottomata) @mpopov do you know who maintains [[ https://gerrit.wikimedia.org/r/admin/repos/wikimedia%2Fportals | wikimedia/portals ]]? It looks like it has a [[ https://g...
[16:04:09] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (mforns) Hi! I just realized that, when this change is deployed to prod, we'll be miss...
[16:11:34] mforns: Just to highlight my latest obstacle, https://phabricator.wikimedia.org/T273748#7051951
[16:11:49] Otherwise, the new metrics seem perfectly healthy!
[16:19:08] (PS11) Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[16:22:00] (CR) Milimetric: [V: +2 C: +2] "My bad, this was all copy pasta from an hourly job, and I tried to fix most of it but keep finding things I missed. Yaaaay oozie...
:(" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/685414 (owner: 10Joal) [16:23:20] (03PS12) 10Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) [16:25:24] (03PS13) 10Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) [16:35:33] elukey: have updated the number of tasks on hadoop or not yet? [16:35:53] (03PS14) 10Kosta Harlan: [WIP] Create structured_task/article/link_suggestion_interaction schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) [16:36:37] joal: not yet, still in code review [16:36:55] ack elukey - it looks like my cassandra issue could be related [16:37:16] There still is some behavior from cassandra driver I don't understand though [16:42:00] elukey: https://issues.apache.org/jira/browse/YARN-9839 ? [16:43:56] joal: yeah I have it open as well, but I'd like to test the TaskMax first [16:44:05] yeah [16:44:12] elukey: The root cause of this issue was an OS level configuration which was not letting OS to overcommit virtual memory. [16:44:21] there is a feel of virtual-memory [16:45:59] joal: that was the problem of one single person reporting it, it may be it but we shouldn't trust that solution blindly [16:46:10] yeah true [16:51:19] 10Analytics-Clusters, 10Analytics-Kanban: Migrate eventlog1002 to buster - https://phabricator.wikimedia.org/T278137 (10hnowlan) [16:57:17] gonna start the decom of eventlog1002 [16:59:50] actually before I do - there is a large folder in /srv/home/nuria/T219842_kafka_jumbo_outage. Is that worth saving? cc ottomata elukey razzi [17:00:47] elukey: I wonder - shall we move to DominantResourceCalculator instead of DefaultResourceCalculator? 
[17:01:22] joal: this is a good question, maybe tomorrow with coffee ? :)
[17:01:35] sure elukey :)
[17:01:38] <#
[17:01:40] <3
[17:04:20] (CR) Nettrom: "Responding to Gergő and Kosta's discussion on generalizing recommendations" (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[17:11:37] elukey: hey, for when you have time T281809 would be amazing
[17:11:39] T281809: Requesting a kerberos identity for user sihe - https://phabricator.wikimedia.org/T281809
[17:11:44] it's blocking a colleague
[17:13:34] * joal is stuck in cassandra darkness again :(
[17:36:14] !log create principal for sihe: sudo manage_principals.py create sihe --email_address=silvan.heintze@wikimedia.de
[17:36:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:37:26] Analytics, Patch-For-Review: Requesting a kerberos identity for user sihe - https://phabricator.wikimedia.org/T281809 (razzi) a:razzi
[17:38:16] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (Merle_von_Wittich_WMDE) hey @GoranSMilovanovic I am wondering if the raw data mentioned above is relevant for your old reports?
[17:41:19] Amir1: I was in a meeting but razzi was faster :)
[17:41:37] Awesome. Thank you both!
[17:47:37] Analytics, Patch-For-Review: Requesting a kerberos identity for user sihe - https://phabricator.wikimedia.org/T281809 (razzi) Should be all set; email was sent to silvan.heintze@wikimedia.de.
[17:48:37] * elukey afk!
[17:52:43] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (GoranSMilovanovic) @Merle_von_Wittich_WMDE I don't think so. All the datasets that we need to re-render the old reports in R markdown should stil...
[17:58:01] Analytics, Analytics-Kanban, Event-Platform, Growth-Team, and 3 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (Ottomata) It's been a couple of weeks since I sent an email asking if anyone needed or used revert info in mediawiki.revision-creat...
[18:05:01] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (Ottomata) A related q as we are figuring this out. Are these used at all? If not, we would like to stop collecting them as part of {T259163}....
[18:05:45] Analytics, SRE, Traffic, Patch-For-Review: Add Traffic's notion of "from public cloud" to Analytics webrequest data - https://phabricator.wikimedia.org/T279380 (CDanis)
[18:06:41] Analytics, SRE, Traffic, Patch-For-Review: Add Traffic's notion of "from public cloud" to Analytics webrequest data - https://phabricator.wikimedia.org/T279380 (CDanis) @fdans @JAllemandou New map entry should be ready for Analytics to set up in Turnilo :)
[18:06:57] Analytics, WMDE-Analytics-Engineering, WMDE-New-Editors-Banner-Campaigns: Drop old WMDEBanner events from Hive - https://phabricator.wikimedia.org/T281300 (Ottomata) @gabriel-wmde @CorinnaHillebrand_WMDE @Tim_WMDE
[18:10:27] Analytics, Platform Engineering: AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (Milimetric)
[18:10:46] Analytics, Platform Engineering: AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (Milimetric)
[18:10:49] Analytics, Analytics-Kanban: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (Milimetric)
[18:12:23] gmodena / clarakosi: I made this parent task: https://phabricator.wikimedia.org/T282033, sorry for the triple ping, just making sure you see it.
Feel free to add/change anything you like, I'll try and track any ongoing work there. [18:15:27] 10Analytics, 10Platform Engineering: Catalog, Categorize, and Templetize existing scheduled workflows - https://phabricator.wikimedia.org/T282035 (10Milimetric) [18:18:01] 10Analytics: Requesting a kerberos identity for user sihe - https://phabricator.wikimedia.org/T281809 (10razzi) 05Open→03Resolved @Silvan_WMDE Read the user guide at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide and comment here or chat in #wikimedia-analytics on IRC if you run int... [18:22:58] 10Analytics-Clusters: Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet - https://phabricator.wikimedia.org/T281917 (10razzi) 05Open→03Resolved Ok, sure enough, the alert has removed an-test-client from its erroring nodes. [18:23:41] fdans: yt? [18:23:49] is EL schema AutomatedRequest used? [18:23:50] looks like no data [18:23:52] and you created it [18:23:52] :) [18:23:58] https://meta.wikimedia.org/wiki/Schema:AutomatedRequest [18:27:03] wat [18:27:18] i created no such thing [18:28:00] fdans: [18:28:00] https://meta.wikimedia.org/w/index.php?title=Schema:AutomatedRequest&action=history [18:28:01] :) [18:28:04] oh I guess I did [18:28:10] I have no memory of this [18:28:19] your memory sounds like it works like mine [18:28:23] LRU purging [18:28:37] ok, then i will mark it to decomission [18:31:27] I remember around that time I was doing the report on weird requests coming from middle east countries on IE11 [18:31:59] so that's kinda related but I have no idea why would I create a new schema for anything like that [18:32:05] oh well [18:33:10] awight: hi! [18:33:13] milimetric terrific, thanks for this! [18:33:16] is EditDebugging schema used at all? 
[18:33:19] https://meta.wikimedia.org/w/index.php?title=Schema:EditDebugging&action=history
[18:33:29] you created it long ago, and it isn't receiving any traffic
[18:34:08] same q for EditLifecycle
[18:34:49] Analytics, Platform Team Workboards (Image Suggestion API): AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (Clarakosi)
[18:44:30] Analytics, Platform Team Workboards (Image Suggestion API): AirFlow collaboration between PE and DE - https://phabricator.wikimedia.org/T282033 (Clarakosi)
[19:30:46] AndyRussG: yt?
[19:56:23] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata) Ok, I've started sorting through the long tail of schemas in the [[ https://docs.google.com/spreadsheets/...
[20:26:04] Analytics, Product-Analytics: Add timestamps of important revision events to mediawiki_history - https://phabricator.wikimedia.org/T266375 (Isaac) @Ottomata thanks for the ping. Yeah, I'm aware of the table but the challenge has always been whether you can reconstruct the page restrictions on a page at a...
[20:27:45] Analytics-Clusters, Analytics-Kanban, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (srodlund) @elukey Awesome!!! I will take a pass at this tomorrow!
[20:30:16] Analytics, Product-Analytics: Add timestamps of important revision events to mediawiki_history - https://phabricator.wikimedia.org/T266375 (Ottomata) Oh interesting. Perhaps we should capture the expiry in that stream too!
[20:32:08] Analytics, Product-Analytics: Default table creation settings results in warnings when querying - https://phabricator.wikimedia.org/T277822 (Milimetric) Open→Resolved a:Milimetric That particular warning seems to be gone, and what's left are the log4j warnings, which I'm looking into. Please...
[20:38:54] Analytics, Product-Analytics: Add timestamps of important revision events to mediawiki_history - https://phabricator.wikimedia.org/T266375 (Isaac) > Oh interesting. Perhaps we should capture the expiry in that stream too! Yeah, if it's straightforward, that'd be appreciated! I actually had a use-case for...
[20:54:04] Analytics, Event-Platform, Platform Engineering: Add expiry info to mediawiki.page-restrictions-change stream - https://phabricator.wikimedia.org/T282057 (Ottomata)
[20:55:27] thanks ottomata !
[21:12:54] Analytics, Analytics-Kanban, Event-Platform, Growth-Team, and 3 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (RBrounley_WMF) @Protsack.stephan - do we use the reverts in revision-create? sorry i didn't respond to your email, I actually had f...
[21:36:53] ottomata: Thanks for the ping - those two schemas can be removed with great haste :-). I started to introduce them as a volunteer-time thing, back before we had introduced similar events which did this better. I don't believe any of my experimental patches were ever merged.
[21:40:09] Unrelatedly, can someone with an-runner access let us know whether this job is throwing any errors? We don't know why it seems to be failing: https://phabricator.wikimedia.org/T273748#7051951