[00:39:53] Analytics, Analytics-Kanban, Event-Platform, Privacy Engineering, and 4 others: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (CDanis) >>! In T262626#6564427, @Ottomata wrote: > @CDanis @Krinkle this does leave `http.client_ip` in the w3c/...
[02:37:53] !log re-run webrequest-load-wf-{text,upload}-2020-10-21-{19,20} oozie jobs after they timed out waiting for data due to camus misconfiguration (fixed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/635678)
[02:37:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[03:42:11] hare:https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests al
[03:42:46] There we go, thank you
[06:06:17] !log execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics /wmf/data/archive/geoip" on an-launcher1002 - permission issues for 'analytics' and /wmf/data/archive/geoip
[06:06:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:31:11] !log restart turnilo to apply new settings for wmf_netflow
[06:31:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:38:43] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (elukey) There is a major problem in the plan of having two coordinators, namely the fact that we hardcode an-coord1001 all over oozie properties in refinery. This means that a...
[06:49:49] Analytics-Clusters: Ensure Puppet checks types as part of the build - https://phabricator.wikimedia.org/T261693 (elukey) @razzi is this something that can be closed in favor of T166066? If not, let's add proper tags to this task so it is visible to others (for example, puppet-compiler, operations, etc..)
[06:50:12] Analytics-Clusters, Operations, ops-eqiad, User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (elukey) ping :)
[06:52:01] mmmmm this is piwik.wikimedia.org
[06:52:02] upstream connect error or disconnect/reset before headers. reset reason: connection termination
[06:54:11] !log restart httpd on matomo1002, errors while connecting
[06:54:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:57:31] Analytics, Patch-For-Review, User-Elukey: Move https termination from nginx to envoy (if possible) - https://phabricator.wikimedia.org/T240439 (elukey) Very strange, when I tested piwik.wikimedia.org I get a white-background screen with `upstream connect error or disconnect/reset before headers. rese...
[06:58:40] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Create the new Hadoop test cluster - https://phabricator.wikimedia.org/T255139 (elukey) Created all the webrequest tables from the hive scripts in refinery.
[07:03:57] !log decom analytics1057 from the Hadoop cluster
[07:03:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:18:43] 1057 is the last host, then the backup cluster will be ready
[07:48:17] Analytics, Analytics-Kanban: Update Wikidata usage metric - https://phabricator.wikimedia.org/T264945 (mforns) This is done!
https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/structured-data/
[08:52:19] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Create the new Hadoop test cluster - https://phabricator.wikimedia.org/T255139 (elukey)
[09:00:41] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Create the new Hadoop test cluster - https://phabricator.wikimedia.org/T255139 (elukey) Remaining steps: * Replace an-tool1006 with a buster VM (already in progress) * Create a new VM for Druid * Create a new VM for Hue
[09:11:01] elukey: Morning and thanks :D
[09:13:22] GoranSM: ping? You have a big R job running on stat1005. Is it restartable? I have to reboot stat1005 sometime later today.
[09:22:09] :)
[09:46:00] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (elukey) After reading https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_ha_hiveserver2.html I had an idea about a possible way forward, that may also be a...
[10:02:38] Analytics-Radar, Anti-Harassment, CheckUser, Privacy Engineering, and 2 others: Deal with Google Chrome User-Agent deprecation - https://phabricator.wikimedia.org/T242825 (Instance) Well, safari does the same.
[10:02:50] elukey: the rocm38 kernel module does not compile against the kernel we have on stat1005
[10:04:02] whatt
[10:04:22] do they expect a 5.x kernel?
[10:04:45] I don't know yet
[10:04:48] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade AMD ROCm drivers/tools to latest upstream - https://phabricator.wikimedia.org/T264408 (klausman) Unfortunately, the rocm38 kernel module does not compile against our current Buster kernel (4.19.0-12): ` CC [M] /var/lib/dkms/amdgpu/3....
[10:05:00] ^^^ has a c&p of the compile error
[10:07:06] But yeah, it looks like we need something newer than 4.19 for rocm 3.8
[10:07:13] I can give 3.7 a go
[10:08:32] yes it could be a good option
[10:09:06] Cc: moritzm: --^
[10:09:47] TL;DR: we are trying to deploy the new dkms amd drivers on stat1005 and we got a compile error :(
[10:10:13] klausman: let's try 3.7 then, and see
[10:12:07] CL for adding 3.7 to apt incoming
[10:15:37] https://github.com/RadeonOpenCompute/ROCm/tree/e0361edcf8c982a7660a664af3427917e494becd#Supported-Operating-Systems I suspect it'll work, since some 4.18 distros are supported, but who knows what patchsets they have.
[10:18:53] klausman: reviewed, I think we are missing a few other configs for apt
[10:20:15] You're right of course. Do you want me to fold the change to the host override into this one or make it separate?
[10:22:11] let's keep them separate if possible
[10:23:02] Roger
[10:24:45] +1!
[10:24:56] I am going afk in a bit, feel free to proceed with stat1005
[10:34:21] Aye aye, cap'n
[10:38:50] * elukey afk!
[10:52:57] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade AMD ROCm drivers/tools to latest upstream - https://phabricator.wikimedia.org/T264408 (klausman) And 3.7 has the same problem: ` LD [M] /var/lib/dkms/amdgpu/3.7-20/build/amd/amdkcl/amdkcl.o CC [M] /var/lib/dkms/amdgpu/3.7-20/build...
[11:22:00] good morning!
[11:22:02] look who's up early
[11:33:02] God, I hate doing kernel module stuff on the stat machines. Since the machines are completely overloaded, everything takes like 10x the time it should, and then puppet snipes me every time I am almost making progress
[11:36:58] Analytics, Analytics-Kanban, Event-Platform, Privacy Engineering, and 4 others: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (Ottomata) Alternatively, instead of producing these to eventgate-logging-external and to logstash, we could prod...
[11:43:39] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (Ottomata) I like! Could we do the same idea, but instead of using DNS CNAMEs, use LVS instead? Might be easier to do failovers with LVS than waiting for DNS TTLs to expire....
[11:44:32] elukey: also, bad news: even 3.3 does not want to compile against the latest kernel we have (4.19.0-12). So we got ninja'd by a kernel upgrade. 4.19.0-11 works fine.
[11:44:49] moritzm: ^^^
[11:46:29] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade AMD ROCm drivers/tools to latest upstream - https://phabricator.wikimedia.org/T264408 (klausman) After some more experimenting, I have found that at least rocm33 compiles fine against 4.19.0-11, but fails with 4.19.0-12, with the above e...
[11:47:14] klausman: you can disable puppet while you work
[11:47:24] But I need the agent to set up apt etc right
[11:47:30] ah
[11:48:04] so I am doing manual runs, but the scheduled ones snipe me when I am trying to do it "interactively" so I can see what's going on.
[11:48:51] At any rate, I have found the problem (4.19.0-12 vs. -11), and will now have lunch. stat1005 is back in working state.
[11:51:13] !log camus-eventgate-analytics-external now uses EventStreamConfig to discover topics to ingest and canary topics to monitor
[11:51:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:41:12] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (elukey) @Ottomata LVS inside the analytics vlan is problematic :( +1 for naming, I don't have any particular preference
[12:49:32] klausman: nice work!
Really sad about it :(
[12:53:44] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (Ottomata) > LVS inside the analytics vlan is problematic :( Oh right we wanted to do that for druid long ago but couldn't. But why should it be! I don't remember why it didn...
[13:00:25] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (elukey) >>! In T257412#6571309, @Ottomata wrote: >> LVS inside the analytics vlan is problematic :( > Oh right we wanted to do that for druid long ago but couldn't. But why s...
[13:01:41] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (Ottomata) Sure!
[13:03:42] !log restart turnilo to pick up new wmf_netflow settings
[13:03:42] elukey: So, kernel 5.9 when? ;)
[13:03:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:04:15] klausman: I am a bit confused about why the module fails to compile with a higher kernel version
[13:04:27] !log camus-eventgate-analytics_events now uses EventStreamConfig to discover topics to ingest and canary topics to monitor - T251609
[13:04:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:04:32] T251609: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609
[13:06:17] elukey: AIUI, a kernel symbol got renamed/moved between -11 and -12 (kinda weird doing this on a minor version bump)
[13:06:32] Moritz might know more
[13:07:39] klausman: because Debian started to import AMD rocm, so if we encountered a bug in their kernels it might be good to report it
[13:08:47] Are we using strictly upstream (Debian) kernels?
[13:08:54] yep
[13:12:46] So when you say they imported rocm, do you mean they are re-distributing the same debs we have been using? Or do they roll their own?
[13:15:45] I think the latter, but for the moment IIRC it is only an initial work in sid
[13:15:58] a little bit too soon for us
[13:17:30] interesting https://wiki.debian.org/AMDGPUDriverOnStretchAndBuster2
[13:17:35] it talks about a patch for 4.19
[13:18:08] anyway, let's put everything in the task and then we can ping Moritz, so we can decide a common plan
[13:18:46] in theory if we switch to 5.x we shouldn't need the dkms stuff
[13:19:04] but at the same time it is a backported kernel yada yada
[13:23:44] Yeah.
[13:25:56] Hm, neither of the patches there mentions the mission symbols
[13:25:59] missing*
[13:35:12] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade AMD ROCm drivers/tools to latest upstream - https://phabricator.wikimedia.org/T264408 (klausman) https://wiki.debian.org/AMDGPUDriverOnStretchAndBuster2 indicates that some people are experimenting with rocm on Debian. The page mentions...
[13:55:40] PROBLEM - Throughput of EventLogging EventError events on alert1001 is CRITICAL: 65.51 ge 30 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1
[13:56:12] !log camus-eventgate-main_events now uses EventStreamConfig to discover topics to ingest, but still uses regex to find topics to monitor - T251609
[13:56:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:56:15] T251609: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609
[14:02:04] hey teammm!
[14:03:38] hola hola!
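Editor's sketch: the kernel findings reported earlier in the log (rocm33 builds against 4.19.0-11, while rocm33, 3.7 and rocm38 all fail against 4.19.0-12) could be recorded as a small lookup that a pre-upgrade check consults. This is only an illustration; the function names, the table, and the assumption that the running kernel string looks like `4.19.0-12-amd64` are hypothetical and not taken from any tooling mentioned in the channel.

```python
import re

# Build results as reported in the discussion; kernels not listed are unknown.
KNOWN_BUILD_RESULTS = {
    "4.19.0-11": True,   # rocm33 reported to compile fine
    "4.19.0-12": False,  # rocm33, 3.7 and rocm38 reported to fail
}

# Matches the "<version>-<abi>" prefix of a kernel release string.
RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)")


def debian_abi(release: str) -> str:
    """Return e.g. '4.19.0-12' from a release string such as '4.19.0-12-amd64'."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return f"{match.group(1)}-{match.group(2)}"


def dkms_build_expected(release: str):
    """True or False when a result was reported for this kernel, None if unknown."""
    return KNOWN_BUILD_RESULTS.get(debian_abi(release))


if __name__ == "__main__":
    for release in ("4.19.0-11-amd64", "4.19.0-12-amd64", "5.9.0-1-amd64"):
        print(release, dkms_build_expected(release))
```

Returning None for unlisted kernels keeps the check honest: nothing in the log says whether a 5.x kernel would build the module, only that it might make the DKMS package unnecessary.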
[14:04:10] RECOVERY - Throughput of EventLogging EventError events on alert1001 is OK: (C)30 ge (W)20 ge 1.662 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1
[14:05:44] !log bump camus version to wmf12 for all camus jobs. should be no-op now. - T251609
[14:05:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:05:47] T251609: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609
[14:19:12] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Add sbisson to analytics-privatedata-users and create a kerberos identity - https://phabricator.wikimedia.org/T265969 (elukey) Open→Resolved ` elukey@krb1001:~$ sudo manage_principals.py create sbisson --email_address=sbisson@w...
[14:19:26] Analytics, Event-Platform: Q2 goal. Deploy the canary event monitoring for some event streams - https://phabricator.wikimedia.org/T263696 (Ottomata) Status: Canary events are enabled for all eventgate-analytics-external stream and most eventgate-analytics streams. Stream topics for ingestion are discov...
[14:19:36] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata) Status: Canary events are enabled for al...
[14:19:59] Analytics, Analytics-Kanban, Event-Platform: Q2 goal.
Deploy the canary event monitoring for some event streams - https://phabricator.wikimedia.org/T263696 (Ottomata)
[14:26:14] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata)
[14:28:48] (PS1) Ottomata: bin/camus - Remove support for 'dynamic-stream-config' [analytics/refinery] - https://gerrit.wikimedia.org/r/635846 (https://phabricator.wikimedia.org/T251609)
[14:33:42] (PS2) Ottomata: bin/camus - Remove support for 'dynamic-stream-config' [analytics/refinery] - https://gerrit.wikimedia.org/r/635846 (https://phabricator.wikimedia.org/T251609)
[14:37:44] mforns: i'm pretty sure that archive maxmind thing is razzi's new/changed job
[14:37:54] not sure what the status is, but you should check with him when he gets on
[14:37:57] ottomata: ah!
[14:38:02] ok, thanks
[14:38:25] mforns: I just answered to the email, fixed it this morning
[14:38:36] I didn't see the email in the middle of all the ones in alerts@ sorry
[14:38:40] elukey: ah ok, I couldn't see any logs
[14:38:45] no no
[14:38:55] is it normal that I can not see any logs?
[14:39:05] sudo journalctl -u archive-maxmind-geoip-database --since '2020-10-21 00:00:00'
[14:39:15] should show me sth no?
[14:39:48] it is something that we should investigate more, IIUC for the timers that don't have syslog logging to a file we depend only on journald, that periodically rotates logs
[14:40:08] I am not sure what the retention of those logs is though, because when it rotates they disappear
[14:41:07] elukey: even if I ask to start at 2020-10-21 00:00:00, it only starts at 2020-10-22 10:33:39, about 8 hours buffer IIUC
[14:41:28] mforns: Here's what I get from journalctl:
[14:41:28] ```razzi@an-launcher1002:~$ sudo journalctl -u archive-maxmind-geoip-database.service
[14:41:28] -- Logs begin at Thu 2020-10-22 10:33:39 UTC, end at Thu 2020-10-22 14:39:54 UTC. --
[14:41:28] -- No entries --
[14:41:28] ```
[14:41:40] razzi: exactly, same
[14:42:04] What happened is that I deployed a maxmind archiving change, tested it, and rolled it back. But I didn't clean up the service it created
[14:42:11] When I debug the issue, I'll redeploy it
[14:42:26] razzi: I already re-deployed, all good :)
[14:42:30] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata)
[14:42:40] I logged in the chan what I did, it was a quick chown of the dirs
[14:43:18] (I fixed the perm issue before seeing your revert, then I thought to just rollout again)
[14:43:33] I logged the command used in the SAL
[14:45:31] Analytics, Analytics-Kanban: Refine event pipeline at this time refines data in hourly partitions without knowing if the partition is complete - https://phabricator.wikimedia.org/T252585 (Ottomata) This is now possible for any stream that sets `canary_events_enabled: true` in EventStreamConfig.
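Editor's sketch: the journalctl exchange above turns on the "Logs begin at ..." header, which shows how far back the journal currently reaches. A minimal way to check whether a requested `--since` timestamp is still covered is to parse that header, as below. The helper names are hypothetical, and the regular expression assumes the header format exactly as pasted in the channel (UTC timestamps); other systemd versions or locales may print it differently.

```python
import re
from datetime import datetime

# Header format as pasted in the channel; other setups may differ.
HEADER_RE = re.compile(
    r"-- Logs begin at \w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC, "
    r"end at \w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\. --"
)
FMT = "%Y-%m-%d %H:%M:%S"


def journal_window(header: str):
    """Return (begin, end) datetimes parsed from a journalctl header line."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError("not a journalctl 'Logs begin at' header")
    begin, end = (datetime.strptime(group, FMT) for group in match.groups())
    return begin, end


def since_is_covered(header: str, since: str) -> bool:
    """True if the requested --since timestamp is not older than the journal start."""
    begin, _ = journal_window(header)
    return datetime.strptime(since, FMT) >= begin


header = (
    "-- Logs begin at Thu 2020-10-22 10:33:39 UTC, "
    "end at Thu 2020-10-22 14:39:54 UTC. --"
)
begin, end = journal_window(header)
print(end - begin)                                      # 4:06:15
print(since_is_covered(header, "2020-10-21 00:00:00"))  # False
```

For the header pasted above, the journal held only about four hours of entries, so a query starting at 2020-10-21 00:00:00 could not return the earlier run regardless of the unit name.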
[14:45:36] Analytics, Analytics-Kanban: Refine event pipeline at this time refines data in hourly partitions without knowing if the partition is complete - https://phabricator.wikimedia.org/T252585 (Ottomata) a:Ottomata
[14:46:53] mforns: 2 simple changes for you to review:
[14:47:05] looking
[14:47:08] - https://gerrit.wikimedia.org/r/c/analytics/refinery/+/628349
[14:47:09] and
[14:47:13] https://gerrit.wikimedia.org/r/c/analytics/refinery/+/635846
[14:47:34] the first is just an update to the python eventstreamconfig stuff (which...is now unused)
[14:47:43] the second is removing the support for using it from the camus wrapper
[14:58:55] (CR) Mforns: [C: +1] "LGTM! Left a comment for a potential shortening of code?" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/628349 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:01:17] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Add sbisson to analytics-privatedata-users and create a kerberos identity - https://phabricator.wikimedia.org/T265969 (SBisson) @elukey All good. Thanks!
[15:02:17] (CR) Mforns: [C: +1] "LGTM!!"
[analytics/refinery] - https://gerrit.wikimedia.org/r/635846 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:04:07] (CR) Ottomata: [C: +2] bin/camus - Remove support for 'dynamic-stream-config' [analytics/refinery] - https://gerrit.wikimedia.org/r/635846 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:04:09] (CR) Ottomata: [V: +2 C: +2] bin/camus - Remove support for 'dynamic-stream-config' [analytics/refinery] - https://gerrit.wikimedia.org/r/635846 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:04:23] (PS4) Ottomata: eventstreamconfig.py - remove custom logic for computing topic lists [analytics/refinery] - https://gerrit.wikimedia.org/r/628349 (https://phabricator.wikimedia.org/T251609)
[15:04:32] (CR) Ottomata: eventstreamconfig.py - remove custom logic for computing topic lists (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/628349 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:06:36] (CR) Mforns: [C: +1] "LGTM!"
[analytics/refinery] - https://gerrit.wikimedia.org/r/628349 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:06:50] (CR) Ottomata: [C: +2] eventstreamconfig.py - remove custom logic for computing topic lists [analytics/refinery] - https://gerrit.wikimedia.org/r/628349 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:06:52] (CR) Ottomata: [V: +2 C: +2] eventstreamconfig.py - remove custom logic for computing topic lists [analytics/refinery] - https://gerrit.wikimedia.org/r/628349 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[15:08:23] (PS2) Ottomata: [WIP] Spark JsonSchemaConverter - additionalProperties with schema is always a MapType [analytics/refinery/source] - https://gerrit.wikimedia.org/r/629406 (https://phabricator.wikimedia.org/T263466)
[15:14:51] big puppet refactoring for hive settings in https://gerrit.wikimedia.org/r/c/operations/puppet/+/635844
[15:14:59] let me know if you like the idea
[15:15:08] if so I'd like to do the same for oozie
[15:15:21] so we have a more centralized config
[15:23:20] elukey: want to create the keytab for the ganeti vm before meetings?
[15:23:43] yep gimme 10 mins
[15:23:49] cool
[15:23:51] can you read the docs in the meantime?
[15:24:26] Good idea :)
[15:32:11] elukey: qq when deleting leftovers, should I also delete the actual user folder, or just leave it empty? in hdfs too?
[15:32:35] mforns: nuke everything :)
[15:32:43] ok :]
[15:33:06] Analytics: Check home/HDFS leftovers of jkumarah - https://phabricator.wikimedia.org/T263715 (mforns) ==== stat1005 === mforns@stat1005:~$ ls /home/jkumalah/ ==== stat1006 === mforns@stat1006:~$ ls /home/jkumalah/ ==== stat1007 === mforns@stat1007:~$ ls /home/jkumalah/ **banner_history_2019_11_25.out** ==...
[15:36:14] elukey: I can only delete files in hdfs, no permissions to delete them from stat* machines...
[15:37:40] Analytics: Check home/HDFS leftovers of jkumarah - https://phabricator.wikimedia.org/T263715 (mforns) Deleted /user/jkumalah from HDFS. Could not delete folders in stat* boxes, no permissions...
[15:38:46] mforns: yep I'll do it
[15:39:19] razzi: going to the bc
[15:46:48] Analytics: Check home/HDFS leftovers of joewalsh - https://phabricator.wikimedia.org/T265447 (mforns) ==== stat1005 mforns@stat1005:~$ ls -lh /home/joewalsh/ total 0 ==== stat1006 mforns@stat1006:~$ ls -lh /home/joewalsh/ total 4.0K -rw-rw-r-- 1 14815 wikidev 0 Mar 13 2020 file_name.tsv -rw-rw-r-- 1 1481...
[15:47:09] mforns ops week handoff?
[15:49:48] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Infrastructure-Data, and 4 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (Mholloway) >>! In T248987#6506486, @Krinkle wrote: > Be sure to enable it on Beta Cluster first as well Is the...
[15:54:32] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (mforns) @leila, T264255 is now resolved (I believe a tarball with all required files was copied over to a public location). Please, can you confirm that we can proceed to delete the data in stat100* machines and HDFS?
[15:57:15] Analytics: Check home/HDFS leftovers of shiladsen - https://phabricator.wikimedia.org/T264269 (mforns) @Shilad ping? :]
[16:01:17] mforns: will be testing entropy job (cc fdans ) and hopefully not making as big of a mess
[16:01:34] ok nuria no problemo
[16:26:53] Analytics: Check home/HDFS leftovers of shiladsen - https://phabricator.wikimedia.org/T264269 (Shilad) You can trash everything! Sorry for the delayed response.
[16:27:14] Analytics, Operations, procurement: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (elukey)
[16:29:31] this had the wrong visibility tag sigh --^
[16:30:32] Analytics: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (elukey)
[16:52:55] Analytics, Analytics-Kanban: Filter non-mediawiki hostnames at ingestion time - https://phabricator.wikimedia.org/T266130 (razzi) p:Triage→High Temporarily going to block this user agent; hoping to deprecate this system eventually.
[16:53:21] Analytics, Analytics-Kanban: Filter non-mediawiki hostnames at ingestion time - https://phabricator.wikimedia.org/T266130 (razzi) a:fdans
[16:57:16] Analytics, Analytics-EventLogging, Event-Platform, JavaScript, Wikimedia-production-error: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (razzi) a:Ottomata
[16:57:18] Analytics: [refinery-source] Add encoding config to surefire plugin to avoid building issues - https://phabricator.wikimedia.org/T265058 (razzi) Open→Resolved
[16:58:36] Analytics: Check home/HDFS leftovers of rush - https://phabricator.wikimedia.org/T265121 (razzi) p:Triage→Medium
[17:02:06] Analytics, Analytics-Wikistats: Wikistats active editors metric reporting unrealistic numbers - https://phabricator.wikimedia.org/T265322 (razzi) p:Triage→High a:Milimetric
[17:03:08] Analytics, Analytics-EventLogging, Event-Platform, Product-Infrastructure-Data, and 2 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (Ottomata) Am trying to read the stack trace a bit, am I wron...
[17:05:34] Analytics, Analytics-Wikistats: pagecounts-ez uploads stopped after 9/24 - https://phabricator.wikimedia.org/T265378 (razzi) a:Milimetric We are aware; see https://phabricator.wikimedia.org/T251777. The new data is available at https://dumps.wikimedia.org/other/pageview_complete/.
[17:06:43] Analytics, Analytics-Wikistats: pagecounts-ez uploads stopped after 9/24 - https://phabricator.wikimedia.org/T265378 (Milimetric) p:Triage→High Documentation for the new data is coming up, and we'll send communication on the mailing lists once we vet the new data and it's ready.
[17:09:18] Analytics, Analytics-EventLogging, Event-Platform, Product-Infrastructure-Data, and 2 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (Jdlrobson) >>! In T263041#6572213, @Ottomata wrote: > Am try...
[17:14:51] Analytics: Add cache to MaxMindDB setup - https://phabricator.wikimedia.org/T265516 (razzi) p:Triage→Medium
[17:27:27] Analytics: Add editors per country data for non-Wikipedia projects - https://phabricator.wikimedia.org/T265510 (Milimetric) Open→Declined We're open to discussing this, so please don't take my closing the task as any kind of "final word", @Pamputt, and thank you for raising the issue. So, in general...
[17:27:29] Analytics, Analytics-Wikistats: pagecounts-ez uploads stopped after 9/24 - https://phabricator.wikimedia.org/T265378 (S1magreene) oh cool, thank you!
[17:27:44] Analytics-EventLogging, Analytics-Radar, Event-Platform, Product-Infrastructure-Data, and 2 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (Milimetric) Feels radar-ey for us. If folks decide to...
[17:27:48] Analytics: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (razzi) To do: backup archive directory to hdfs and delete from this node.
[17:27:54] Analytics-Radar, MediaWiki-General, Platform Team Workboards (Clinic Duty Team): Proposal: drop kafka-php dependency from MediaWiki - https://phabricator.wikimedia.org/T265966 (Ottomata) > Neither of these features were ever used to my knowledge. They were! But they are not anymore. Yes please let...
[17:27:57] Analytics-Radar, MediaWiki-General, Platform Team Workboards (Clinic Duty Team): Proposal: drop kafka-php dependency from MediaWiki - https://phabricator.wikimedia.org/T265966 (Ottomata)
[17:28:19] Analytics: Create monthly job for canonical pageviews - https://phabricator.wikimedia.org/T265732 (razzi) p:Triage→High
[17:28:21] Analytics: Quick data exploration CLI - https://phabricator.wikimedia.org/T265765 (razzi) p:Triage→Low
[17:54:32] ottomata: I was trying to figure out how these _SUCCESS flags get here, and I have no idea!
[17:54:32] https://github.com/wikimedia/analytics-refinery/blob/master/oozie/events/datasets.xml#L25
[17:54:43] that dataset ^ is relying on them being there
[17:55:18] so that's why we can't rely on just the canaries, and I thought the flow was refine -> _REFINED -> add partitions to hive metadata store -> _SUCCESS
[17:55:40] but usually we do that from a "load" oozie job and I can't find something like that for mediawiki_page_move
[17:57:51] * elukey afk!
[18:09:48] milimetric: yeah load is only for webrequest
[18:09:51] raw
[18:10:40] milimetric: dunno what would put _SUCCESS there
[18:36:08] no we have a big load job for the sqooped tables too. But yeah... this must be happening somewhere....
I’ve gotta run out for a bit and will look more
[18:59:41] Analytics-Clusters, Operations: Switch Zookeeper to profile::java - https://phabricator.wikimedia.org/T264176 (Ottomata) a:razzi
[19:04:30] Analytics-Clusters: Improve logging for HDFS Namenodes - https://phabricator.wikimedia.org/T265126 (razzi) a:razzi
[19:28:54] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (Cmjohnson) I have not received the PDUs yet
[19:56:03] Analytics-Radar, Operations, ops-eqiad: an-presto1004 down - https://phabricator.wikimedia.org/T253438 (Cmjohnson) I am not sure why this is not here yet. I am calling Dell to follow up
[19:56:56] Analytics-Clusters, Operations, ops-eqiad, User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (Cmjohnson) @elukey I am sorry but I have to push these off to the first week of November. Let's coordinate a schedule next week.
[20:17:51] i'm setting up an (initially very hacky) integration environment that provides yarn/hive/hdfs(/eventgate/druid/elasticsearch/mediawiki/...) to our job scheduling. Finding it very tedious to generate and load sample data into hive, not expecting any magic but wondering if there are pre-existing examples from analytics work?
[20:45:22] Analytics, Analytics-Kanban, Event-Platform, Privacy Engineering, and 4 others: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (Krinkle) p:Low→Medium //Raising priority per parent task.// This task is about the default and by exten...
[20:56:50] Analytics, Analytics-Kanban, Event-Platform, Privacy Engineering, and 4 others: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (Jdlrobson) The value of `mw.user.sessionId()` would work for my needs.
[20:59:32] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (leila) @mforns Please go ahead and purge.
[22:57:30] Analytics-EventLogging, Analytics-Radar, Event-Platform, Product-Infrastructure-Data, and 2 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (Krinkle)
[22:59:32] Analytics-EventLogging, Analytics-Radar, Event-Platform, Product-Infrastructure-Data, and 2 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (Krinkle) What does the spec say? Why does it happen? I...
[23:00:59] Analytics-EventLogging, Analytics-Radar, Event-Platform, Product-Infrastructure-Data, and 2 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (Nuria) if this is not super urgent i can work on it on...