[02:31:04] (03PS1) 10Milimetric: Fix referrer job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683468 [02:38:48] (03PS2) 10Milimetric: Fix referrer job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683468 [02:43:07] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "successful test at https://hue.wikimedia.org/hue/jobbrowser#!id=0003730-210426062240701-oozie-oozi-W" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683468 (owner: 10Milimetric) [02:59:17] !log deployed hotfix for referrer job [02:59:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [03:14:21] for Luca when he wakes up: sqoop test was fine except... one very strange error. One of the tables, "content", seems to not want to sqoop. It's there in clouddb, I checked, but somehow we don't see it and logs say "skipped" [05:25:26] good morning! [05:26:02] weird Dan :( [05:52:56] 10Analytics-Clusters, 10Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (10elukey) @razzi you have the wrong slot, it is the 10th :) ` Enclosure Device ID: 32 Slot Number: 10 Enclosure position: 1 Device Id: 10 WWN: 5000c500c9829a03 Sequence Number: 7 Media Error Cou... 
[06:03:58] hnowlan: o/ [06:25:56] Good morning [06:27:47] bonjour [06:37:20] For Dan when he gets back, and elukey: I can answer on sqoop and the content table - The 'SKIPPED' message comes from https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/sqoop.py#L396 [06:38:04] Sending a patch in minutes so that the log-line is more explicit (currently says: "SKIPPING: Table content is not available for database XXXX") [06:38:04] joal: perfect :) [06:38:28] git up [06:38:31] woops [06:39:28] (03PS2) 10Joal: Update cassandra jobs for double loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/681678 (https://phabricator.wikimedia.org/T280649) [06:39:30] (03PS2) 10Joal: Cleanup cassandra double loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/681682 (https://phabricator.wikimedia.org/T280649) [06:43:13] git review [06:43:39] mwarf [06:43:54] * joal should stop sending git commands to analytics folks [06:44:04] (03PS1) 10Joal: Better logging for sqoop skipped tables [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683485 [06:49:09] (03CR) 10Joal: [C: 03+1] "> > Maybe we should also add the 'test-data' to this folder, instead of in aqs-deploy repo? <-- comments welcome :)" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/682933 (https://phabricator.wikimedia.org/T278701) (owner: 10Hnowlan) [06:51:15] 10Analytics, 10WMCZ-Stats: Review request: New datasets for WMCZ published under analytics.wikimedia.org - https://phabricator.wikimedia.org/T279567 (10JAllemandou) > You say I will need to "start reworking some of your script to Airflow" – are there any help materials about what needs to be done? Unfortunate... 
[07:54:15] 10Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (10JAllemandou) [07:54:17] 10Analytics, 10Product-Analytics, 10Pageviews-Anomaly: Too many views to Skathi (moon) on enwiki - https://phabricator.wikimedia.org/T280844 (10JAllemandou) [08:00:04] 10Analytics, 10Product-Analytics, 10Pageviews-Anomaly: Too many views to Skathi (moon) on enwiki - https://phabricator.wikimedia.org/T280844 (10JAllemandou) @kzimmerman : I added the task as a subtask of T280565. I did some further analysis: - Constant distinct IPs and user-agents hourly over a day (~180 i... [09:19:00] Interesting!!! 10:04:36 -!- aikoChou24 [6fff2a52@111-255-42-82.dynamic-ip.hinet.net] has quit [Client Quit] [09:19:03] oops [09:19:04] Again [09:19:13] https://flokkr.github.io/ [09:27:35] I think it was the one done by the Cloudera's Principal that worked on Ozone [09:27:49] but I am a little skeptical about the "kerberos" part :) [09:27:56] :) [09:28:13] ah joal they answered me for the feature store [09:28:22] https://community.hopsworks.ai/t/feature-store-hardware-requirements/466 [09:28:59] as I suspected they run a separate Hive server with Hudi, but it seems that they don't support kerberos for external tables (sigh) [09:53:04] (03PS1) 10Gehel: Use discovery-parent-pom in wikidata toolkit analyzer. [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/683569 [09:59:40] (03CR) 10Gehel: Use discovery-parent-pom in wikidata toolkit analyzer. (031 comment) [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/683569 (owner: 10Gehel) [10:20:58] * elukey lunch! [10:37:25] elukey: yo! gotta run out for an appointment in a few but I'm caught up on scrollback. [10:37:35] so weird that it worked fine in deployment-prep but not in prod [10:37:50] I wonder if we stop the processes on eventlog1002 will 1003 immediately take over? 
[10:37:57] I know that's not how it's supposed to operate ofc [10:38:05] but might be worth trying [11:06:20] (03CR) 10Addshore: [C: 03+2] Use discovery-parent-pom in wikidata toolkit analyzer. [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/683569 (owner: 10Gehel) [11:07:00] addshore: thanks ! [11:09:10] (03Merged) 10jenkins-bot: Use discovery-parent-pom in wikidata toolkit analyzer. [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/683569 (owner: 10Gehel) [11:27:27] hnowlan: I am wondering if restarting eventlog on 1003 could cause a rebalance, maybe the firewall rules were not in place when it tried to connect? I don't see errors indicating that in the logs, but I can't explain this either [11:42:06] elukey: I'll give it a go [11:43:39] same again `Setting newly assigned partitions set() for group eventlogging_processor_client_side_00` [11:46:19] hnowlan: another test could be to manually stop only one processor of eventlog1002 [11:46:31] (we can add some mins of downtime first) [11:46:56] one topic partition will be freed, I am interested to see if it will be picked up by eventlog1002 or 1003 [11:50:27] just added 900s, testing :) [11:50:50] !log manual stop of one of the eventlog processors on eventlog1002 to see if 1003 takes it over [11:50:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:52:14] ah! it worked! [11:52:17] hnowlan: --^ [11:53:08] elukey: nice! 
[11:53:11] and weird [11:54:24] there must be some heuristic in the rebalance that we don't know [11:54:52] hnowlan: on 1003 eventlogging-processor@client-side-08.service took the new partition, logs are good [11:55:29] I disabled puppet on 1002, I think that we could slowly move everything to 1003 [11:55:44] if you want to do it now, otherwise later on is fine [11:55:50] I'll add more downtime to 1002 in case [11:56:38] now works for me [11:58:45] perfect :) [11:59:18] hnowlan: added an hour of downtime to 1002 now [11:59:41] thanks! I'll start stopping them in order, will log each one [12:00:15] you can do a single !log if you want, it will suffice [12:00:34] let's make sure that nothing explodes in the logs, and that the volume of events is the same [12:00:39] (03CR) 10Framawiki: [C: 03+2] Fix Docker compatibility with multiinstance [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/682321 (owner: 10Framawiki) [12:00:45] can't think about anything else [12:00:54] ah also host-overview metrics for 1003 [12:01:04] just to make sure that the vm is sized correctly etc.. [12:01:43] ack [12:02:09] <3 [12:02:18] !log stopping processors on eventlog1002 to migrate to eventlog1003 [12:02:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:03:48] (03PS1) 10Milimetric: Add yue.wikibooks to the allow list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683585 [12:04:02] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add yue.wikibooks to the allow list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683585 (owner: 10Milimetric) [12:05:15] (03Merged) 10jenkins-bot: Fix Docker compatibility with multiinstance [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/682321 (owner: 10Framawiki) [12:24:22] Right, everything is migrated [12:26:19] eventlog1003 is... 
coping but it could probably do with some more CPU [12:28:40] so I wonder whether turning eventlog1002 consumers back on will flip stuff back [12:29:42] huh, yep, it did. [12:31:01] joal: ah, I should've known, thanks for fixing. I knew you had made the sqoopable db, but forgot [12:35:19] !log restarting 2 processors on eventlog1002 [12:35:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:35:38] gonna take lunch, letting some of the workers on eventlog1002 pick up some of the slack while I'm afk [12:36:43] extended downtime on eventlog1002 [12:40:57] super [12:57:10] * elukey bbiab! [13:08:38] Hi milimetric - Would you have manually updated the pageview allowlist during a pageview run by any chance? [13:09:09] milimetric: I'm asking to be sure of the reason for which the pageview-hourly job failed [13:09:20] joal: I did! I had no idea that was potentially bad, we should take it out of the ops_week procedures if it is [13:09:33] milimetric: no problem, I wanted to triple check [13:09:42] milimetric: rerunning the job (I have it all set up) [13:09:47] ok [13:09:55] !log Rerun failed pageview-hourly-wf-2021-4-29-11 [13:09:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:10:16] joal: so should we not do that? [13:10:19] I've been doing that for years [13:10:47] milimetric: updating the file manually only causes a problem if you do it at the moment a job reads it :) [13:10:48] hm... I guess it doesn't matter how you sync it, whether with put -f or running hdfs-sync [13:11:02] yeah, so it's just bad luck... ok [13:11:14] funny though, I'll add a note [13:12:19] thanks milimetric :) [13:13:35] milimetric: I think I'll be hitting the same problem you had yesterday with rerunning and perms - Let's try to figure out a solution for that [13:13:54] joal: yea, wanna brainbounce? [13:15:20] sure [13:15:44] hm... 
I think we should just write a script for it, to find and rerun command line, where we can sudo -u whoever [13:16:53] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) distcp complete, application_id: application_1619507802557_6586 Moving on to creating and repairing hive ta... [13:17:17] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Update event_sanitized_main_allowlist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683421 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata) [13:34:42] milimetric: ok if I do a refinery sync to get https://gerrit.wikimedia.org/r/c/analytics/refinery/+/683421 out? [13:34:46] I really only need it on an-launcher [13:34:48] so I'll just sync it there [13:34:55] milimetric: I think our problem is permissions [13:35:00] ottomata: I'm fine with that [13:35:14] joal: oh, ok, I'll check the allow list permissions [13:36:13] hi teammmm [13:36:26] (03PS1) 10Ottomata: Update static_data/eventlogging/whitelist.yaml symlink [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683608 [13:36:30] mornin! 
[13:36:34] :) [13:36:48] ottomata: looking at the changes you pinged me with last night [13:37:01] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Update static_data/eventlogging/whitelist.yaml symlink [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683608 (owner: 10Ottomata) [13:37:51] PROBLEM - Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-processor@client-side-11 eventlogging-processor@client-side-10 eventlogging-processor@client-side-09 eventlogging-processor@client-side-08 eventlogging-processor@client-side-07 eventlogging-processor@client-side-06 eventlogging-processor@client-side-05 eventlogging-processor@client-side-04 eventlogging [13:37:51] t-side-03 eventlogging-processor@client-side-02 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging [13:38:23] joal: indeed, for some reason it didn't have read permissions at the group level [13:38:34] WEIRD! [13:38:39] that's changed from how it would work in the past... I wonder why, directory permissions are the same [13:38:45] anyway I +r-ed the file [13:38:56] milimetric: for ALL, or for group? [13:39:12] milimetric: I confirm it works for me now :) [13:39:26] uh... sorry I'm the worst, it's rw-r--r-- now, and it was rw-r----- before [13:43:35] ok milimetric - I confirm that '+or' was the correct thing to do - all good :) Thanks! [13:53:00] milimetric: Shall I rerun the 2 failed pageview-hourly jobs? [13:53:02] 10Analytics, 10Event-Platform: mediawiki/page/properties-change schema should use map type for added and removed page properties - https://phabricator.wikimedia.org/T281483 (10Ottomata) [13:53:30] !log Rerun failed pageview-hourly-wf-2021-4-29-11 and pageview-hourly-wf-2021-4-29-12 [13:53:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:54:08] thx joal [13:54:27] no prob - thanks for the allowlist fix! 
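(Editor's note on the permission change discussed above.) The allowlist file already had group read; the bit that was missing was read for "others". A minimal Python sketch of that change follows. It assumes POSIX symbolic-mode semantics, where a bare `+r` applies to all classes except bits masked by the umask; HDFS `chmod` may not behave identically. The helper names `chmod_o_plus_r`, `chmod_plus_r` and `show` are hypothetical and exist only for this illustration.

```python
import stat


def chmod_o_plus_r(mode: int) -> int:
    """Effect of `chmod o+r`: set the read bit for 'others' only."""
    return mode | stat.S_IROTH


def chmod_plus_r(mode: int, umask: int) -> int:
    """Effect of a bare `chmod +r` under POSIX rules (assumption):
    set read for user, group and others, skipping bits masked by umask."""
    return mode | (0o444 & ~umask)


def show(mode: int) -> str:
    """Render permission bits the way `ls -l` does for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


before = 0o640  # rw-r-----: group can read, others cannot

print(show(before), "->", show(chmod_o_plus_r(before)))
print("+r with umask 022:", show(chmod_plus_r(before, 0o022)))
print("+r with umask 027:", show(chmod_plus_r(before, 0o027)))
```

Under a umask of 022 both forms end at rw-r--r--, which is why they can look interchangeable; under a stricter umask such as 027 a bare `+r` would leave "others" without read access.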
[13:55:07] I'm so confused between +or o+r and +r, they seem to do the same thing :) [13:55:22] (in this case, obviously not in general) [13:56:18] (03PS1) 10Ottomata: Add comment in allowlist about mediawiki_page_properties_change [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683620 (https://phabricator.wikimedia.org/T273789) [13:56:52] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Add comment in allowlist about mediawiki_page_properties_change [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683620 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata) [14:05:23] elukey: think I'll bump eventlog1003 up to 6 CPUs just to give us some room, that seem sensible? [14:08:17] Gone for kids [14:08:45] hnowlan: +1 yes, didn't check graphs but sounds good! [14:09:22] joal: https://docs.google.com/document/d/1-DLugMuUEFu8f3MyZVVEQJYv4SKBHN6rdgREPALoaKY/edit#heading=h.4rhrqlz74n1s :) [14:11:27] oh, you've seen it ! :) great [14:21:02] !log bump eventlog1003 CPUs to 6 [14:21:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:21:12] gonna swing all eventlogging back to eventlog1002 for a reboto [14:21:21] *reboot of eventlog1003 [14:31:47] RECOVERY - Check status of defined EventLogging jobs on eventlog1002 is OK: OK: All defined EventLogging jobs are runnning. https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging [14:32:33] ^ expected [14:37:03] joal: ok, I pushed the topic detection stuff, minimal testing as I came to the conclusion it's not really doing anything [14:37:26] (it's on milimetric/wmf on top of what I think is your latest from joal/wmf) [14:44:14] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (10phuedx) This should be deployed to the Beta Cluster any moment now. 
[14:44:48] !log restored all eventlogging jobs to eventlog1003 [14:44:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:47:59] ACKNOWLEDGEMENT - Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-processor@client-side-11 eventlogging-processor@client-side-10 eventlogging-processor@client-side-09 eventlogging-processor@client-side-08 eventlogging-processor@client-side-07 eventlogging-processor@client-side-06 eventlogging-processor@client-side-05 eventlogging-processor@client-side-04 even [14:47:59] or@client-side-03 eventlogging-processor@client-side-02 eventlogging-processor@client-side-01 eventlogging-processor@client-side-00 Hnowlan eventlog1003 now handles these jobs. https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging [14:52:42] 10Analytics-Clusters, 10Analytics-Kanban: Migrate eventlog1002 to buster - https://phabricator.wikimedia.org/T278137 (10hnowlan) eventlog1003 is now handling all eventlogging jobs. [[ https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=eventlog1003&var-datasource=thanos&var-cl... [14:53:35] hnowlan: nice! 
[14:55:59] 10Analytics, 10Analytics-Kanban, 10Event-Platform: Rename event_sanitized partition directories to lowercase - https://phabricator.wikimedia.org/T280813 (10fdans) 05Open→03Resolved [14:56:02] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10fdans) [14:56:10] 10Analytics, 10Analytics-Kanban: Duplicate wikitext entries for a bunch of wikis in 2021-02 snapshot - https://phabricator.wikimedia.org/T278551 (10fdans) 05Open→03Resolved [14:56:16] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Upgrade the Hadoop coordinators to Debian Buster - https://phabricator.wikimedia.org/T278424 (10fdans) 05Open→03Resolved [14:56:19] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10fdans) [14:56:30] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Upgrade furud/flerovium to Debian Buster - https://phabricator.wikimedia.org/T278421 (10fdans) 05Open→03Resolved [14:56:34] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10fdans) [14:56:43] 10Analytics-Clusters, 10Analytics-Kanban: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (10fdans) 05Open→03Resolved [14:56:47] 10Analytics, 10Analytics-Dashiki, 10Analytics-Kanban: npm install gives Verification failed while extracting mediawiki-storage@https://github.com/wikimedia/analytics-mediawiki-storage/archive/master.tar.gz - https://phabricator.wikimedia.org/T278982 (10fdans) 05Open→03Resolved [14:56:52] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add better monitoring for Analytics UIs - https://phabricator.wikimedia.org/T277729 (10fdans) 05Open→03Resolved [14:56:54] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Configure 
the HDFS Namenodes to use the log4j rolling gzip appender - https://phabricator.wikimedia.org/T276906 (10fdans) 05Open→03Resolved [14:57:31] !log run mysql_upgrade on an-coord1001 to complete the buster upgrade - T278424 [14:57:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:57:35] T278424: Upgrade the Hadoop coordinators to Debian Buster - https://phabricator.wikimedia.org/T278424 [14:57:40] now done done done [15:02:31] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Data-Infrastructure, and 3 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (10phuedx) [15:03:06] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Data-Infrastructure, and 3 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (10phuedx) [15:09:37] ottomata: o/ - about hw stuff, I just realized something that might be worth following up. an-launcher1002 and an-coord1002 were previously notebook100[3,4], that we bought in Jan 2018 from what I can see. The will complete their 5y life during next FY, and I fear that dcops didn't track them after the relabel [15:09:53] *They will [15:10:47] I know that we discussed moving hive to vms etc.., but given the fact that VMs bigger than 16g are not recommended I'd think about replacing an-coord1002 [15:34:30] elukey: aye yeah not using VM makes sense [15:34:58] elukey: will they be OOW next FY or the one after? [15:35:19] early 2022, so next FY [15:35:47] elukey: how'd you find that out? phab or some DC ops doc? 
[15:35:49] or maybe late 2022 [15:35:53] mmm [15:36:00] we racked them in Jan 2018 afaics [15:36:13] (03PS1) 10Milimetric: Fix referrer input events [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683671 [15:36:27] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fix referrer input events [analytics/refinery] - 10https://gerrit.wikimedia.org/r/683671 (owner: 10Milimetric) [15:36:37] in theory we'd have 2022 too, maybe not really next fiscal, but we'd be in the grey area of no spare parts etc.. [15:37:03] (we already had a disk issue with an-coord1001, Chris added a spare one but we couldn't get any warranty for the rest) [15:37:14] so if the hosts break badly we cannot replace them [15:37:42] an-coord1001 must be older, no? [15:38:10] !log enabling event_sanitized_main jobs - T273789 [15:38:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:38:14] T273789: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 [15:38:32] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) [15:38:38] ottomata: I was wondering about it as well, it may be around its replacement date as well [15:38:51] lets ask in dcops [15:46:14] Hi elukey, quick question for the an-worker1100 disk: I formatted the disk, but when I mount it using `sudo mount -v /dev/sdl1` it gets unmounted immediately by systemd. Looks like `systemctl daemon-reexec` might fix it, is that a safe command to run? [15:47:18] 10Analytics, 10Privacy Engineering, 10Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (10Isaac) [15:48:22] razzi: hi! have you updated /etc/fstab accordingly? 
You should add the new UUID, save the file and then simply mount -a [15:48:45] if you don't do it, the OS will not re-mount the partition when you reboot [15:50:25] Yeah, I updated fstab, and the disk mounts when I run `mount -a`: [15:50:25] `/var/lib/hadoop/data/k : successfully mounted` [15:50:25] But systemd unmounts it: [15:50:25] `var-lib-hadoop-data-k.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-7bcd4c25\x2da157\x2d4023\x2da346\x2d924d4ccee5a0.device. Stopping, too.` [15:50:25] `Unmounting /var/lib/hadoop/data/k...` [15:51:53] you can try with a systemctl daemon-reload [15:52:44] it is graceful and may help [15:53:17] elukey: cool, that fixed it! [15:53:29] ah nice! [15:55:44] !log restart hadoop-yarn-nodemanager and hadoop-hdfs-datanode on an-worker1100 for hadoop to recognize new disk /dev/sdl [15:55:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:59:37] oh elukey servers ordered in 2018...will be up for refresh in 2023, no? [16:01:02] (03CR) 10Mforns: [C: 03+1] "Left a comment just in case, but LGTM +1" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/681678 (https://phabricator.wikimedia.org/T280649) (owner: 10Joal) [16:03:23] ottomata: I think at this point, but we are outside the warranty, and we have only one an-launcher [16:03:30] so say that it breaks, it may be an issue [16:10:46] 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10elukey) 05Open→03Resolved a:03elukey No issues from our side, going to close, please reopen if necessary! [16:17:53] hmm hmm, if launcher breaks...we can use a vm for that? [16:18:10] ottomata: not when sqoop runs :( [16:20:39] hm [16:20:56] sqoop heavy stuff can't run in a yarn app master? 
[16:21:32] this is outside my knowledge, it is really havy on launcher in its current state [16:21:41] *heavy [16:25:33] 10Analytics, 10Analytics-Kanban, 10Event-Platform, 10Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) Backfilling since yesterday's distcp was started. `sudo -u analytics kerberos-run-command analytics /usr/lo... [16:29:03] milimetric: sqoop q above ^^^ [16:30:57] /win 11 [16:30:59] ufff [16:36:51] elukey: isn't it win-10? [16:36:54] :-P [16:48:59] * elukey runs away [16:51:42] 10Analytics, 10Product-Analytics: Top read repeats - https://phabricator.wikimedia.org/T280011 (10kzimmerman) @JAllemandou sorry I missed your comment last week! Is this something that still needs looking into? Tagging with #product-analytics so we can discuss this in our next board triage. T270784 suggests t... [16:57:39] 10Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (10kzimmerman) @JAllemandou There are a couple of other tickets (T270784, T274823) that might be resolved if the automated traffic detection heuristics are improved; should I add them as subtasks? [16:58:52] 10Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (10JAllemandou) >>! In T280565#7046384, @kzimmerman wrote: > @JAllemandou There are a couple of other tickets (T270784, T274823) that might be resolved if the automated traffic detection heuristics ar... 
[17:01:26] 10Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (10kzimmerman) [17:01:58] 10Analytics-Radar, 10Product-Analytics (Kanban): Big increase in traffic for projects except 'wikipedia' family since Feb 14th - https://phabricator.wikimedia.org/T274823 (10kzimmerman) [17:02:00] 10Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (10kzimmerman) [17:08:28] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (10phuedx) a:05phuedx→03None [17:32:03] 10Analytics-Radar, 10Anti-Harassment, 10CheckUser, 10Privacy Engineering, and 2 others: Deal with Google Chrome User-Agent deprecation - https://phabricator.wikimedia.org/T242825 (10Niharika) [17:46:33] (03CR) 10Mforns: [C: 03+1] "LGTM! +1" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/681682 (https://phabricator.wikimedia.org/T280649) (owner: 10Joal) [17:49:27] 10Analytics, 10Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (10cchen) @razzi I mean when switching data sources, in some charts, i am not able to see the druid table in the dropdown list. For example, [[ https://superset.wikimedia.org/superse... [17:59:27] 10Analytics-Clusters, 10Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (10razzi) Ok, it looks like everything is working here, but disk usage is still at 0%: ` NAME FSTYPE LABEL UUID FSAVAIL FSUSE% MOU... [18:05:52] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10Product-Data-Infrastructure, and 2 others: Replace usages of Linker::link() and Linker::linkKnown() in extension EventLogging - https://phabricator.wikimedia.org/T279328 (10Mholloway) @fdans, Thanks for the heads-up. 
Do you know who would be the r... [18:07:46] (03CR) 10Joal: "Thanks for the review @mforms :)" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/681678 (https://phabricator.wikimedia.org/T280649) (owner: 10Joal) [18:34:17] * elukey afk! [18:36:06] 10Analytics-Clusters, 10Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (10elukey) Yep it takes a bit! If the datanode got the new config you'll see more data in the upcoming days :) [18:51:59] * razzi lunch [18:54:19] Gone for tonight - see you folks tomorrow [18:55:20] byyyyeeeeee :] [19:16:00] 10Analytics, 10Product-Analytics: Top read repeats - https://phabricator.wikimedia.org/T280011 (10Astinson) @kzimmerman my partner set up her voice search for her new android phone -- I suspect it has all of the vowels that are common in spanish. [20:32:26] 10Analytics, 10Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (10razzi) @cchen I'm not sure what you mean, what do you expect to have happen? [21:29:23] 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10GoranSMilovanovic) 05Resolved→03Open @elukey Let's take a close look at this, if you agree. [22:05:38] PROBLEM - Hadoop NodeManager on an-worker1131 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process [22:21:31] 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (10GoranSMilovanovic) @WMDE-leszek @elukey I would like to learn from this. The following argument to `/usr/bin/sqoop` > --driver... 
[22:26:44] RECOVERY - Hadoop NodeManager on an-worker1131 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process