[02:31:04] <wikibugs>	 (PS1) Milimetric: Fix referrer job [analytics/refinery] - https://gerrit.wikimedia.org/r/683468
[02:38:48] <wikibugs>	 (PS2) Milimetric: Fix referrer job [analytics/refinery] - https://gerrit.wikimedia.org/r/683468
[02:43:07] <wikibugs>	 (CR) Milimetric: [V: +2 C: +2] "successful test at https://hue.wikimedia.org/hue/jobbrowser#!id=0003730-210426062240701-oozie-oozi-W" [analytics/refinery] - https://gerrit.wikimedia.org/r/683468 (owner: Milimetric)
[02:59:17] <milimetric>	 !log deployed hotfix for referrer job
[02:59:19] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[03:14:21] <milimetric>	 for Luca when he wakes up: sqoop test was fine except... one very strange error.  One of the tables, "content", seems to not want to sqoop.  It's there in clouddb, I checked, but somehow we don't see it and logs say "skipped"
[05:25:26] <elukey>	 good morning!
[05:26:02] <elukey>	 weird Dan :(
[05:52:56] <wikibugs>	 Analytics-Clusters, Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (elukey) @razzi you have the wrong slot, it is the 10th :)  ` Enclosure Device ID: 32 Slot Number: 10 Enclosure position: 1 Device Id: 10 WWN: 5000c500c9829a03 Sequence Number: 7 Media Error Cou...
[06:03:58] <elukey>	 hnowlan: o/
[06:25:56] <joal>	 Good morning
[06:27:47] <elukey>	 bonjour
[06:37:20] <joal>	 For Dan when he gets back, and elukey: I can answer on sqoop and the content table - The 'SKIPPED' message comes from https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/sqoop.py#L396
[06:38:04] <joal>	 Sending a patch in minutes so that the log-line is more explicit (currently says: "SKIPPING: Table content is not available for database XXXX")
[06:38:04] <elukey>	 joal: perfect :)
[06:38:28] <joal>	 git up
[06:38:31] <joal>	 woops
[06:39:28] <wikibugs>	 (PS2) Joal: Update cassandra jobs for double loading [analytics/refinery] - https://gerrit.wikimedia.org/r/681678 (https://phabricator.wikimedia.org/T280649)
[06:39:30] <wikibugs>	 (PS2) Joal: Cleanup cassandra double loading [analytics/refinery] - https://gerrit.wikimedia.org/r/681682 (https://phabricator.wikimedia.org/T280649)
[06:43:13] <joal>	 git review
[06:43:39] <joal>	 mwarf
[06:43:54] * joal should stop sending git commands to analytics folks
[06:44:04] <wikibugs>	 (PS1) Joal: Better logging for sqoop skipped tables [analytics/refinery] - https://gerrit.wikimedia.org/r/683485
[06:49:09] <wikibugs>	 (CR) Joal: [C: +1] "> > Maybe we should also add the 'test-data' to this folder, instead of in aqs-deploy repo? <-- comments welcome :)" [analytics/aqs] - https://gerrit.wikimedia.org/r/682933 (https://phabricator.wikimedia.org/T278701) (owner: Hnowlan)
[06:51:15] <wikibugs>	 Analytics, WMCZ-Stats: Review request: New datasets for WMCZ published under analytics.wikimedia.org - https://phabricator.wikimedia.org/T279567 (JAllemandou) > You say I will need to "start reworking some of your script to Airflow" – are there any help materials about what needs to be done?  Unfortunate...
[07:54:15] <wikibugs>	 Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (JAllemandou)
[07:54:17] <wikibugs>	 Analytics, Product-Analytics, Pageviews-Anomaly: Too many views to Skathi (moon) on enwiki - https://phabricator.wikimedia.org/T280844 (JAllemandou)
[08:00:04] <wikibugs>	 Analytics, Product-Analytics, Pageviews-Anomaly: Too many views to Skathi (moon) on enwiki - https://phabricator.wikimedia.org/T280844 (JAllemandou) @kzimmerman  : I added the task as a subtask of T280565. I did some further analysis:  - Constant distinct IPs and user-agents hourly over a day (~180 i...
[09:19:00] <joal>	 Interesting!!! 10:04:36 -!- aikoChou24 [6fff2a52@111-255-42-82.dynamic-ip.hinet.net] has quit [Client Quit]
[09:19:03] <joal>	 oops
[09:19:04] <joal>	 Again
[09:19:13] <joal>	 https://flokkr.github.io/
[09:27:35] <elukey>	 I think it was the one done by the Cloudera's Principal that worked on Ozone
[09:27:49] <elukey>	 but I am a little skeptical about the "kerberos" part :)
[09:27:56] <joal>	 :)
[09:28:13] <elukey>	 ah joal they answered me for the feature store
[09:28:22] <elukey>	 https://community.hopsworks.ai/t/feature-store-hardware-requirements/466
[09:28:59] <elukey>	 as I suspected they run a separate Hive server with Hudi, but it seems that they don't support kerberos for external tables (sigh)
[09:53:04] <wikibugs>	 (PS1) Gehel: Use discovery-parent-pom in wikidata toolkit analyzer. [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/683569
[09:59:40] <wikibugs>	 (CR) Gehel: Use discovery-parent-pom in wikidata toolkit analyzer. (1 comment) [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/683569 (owner: Gehel)
[10:20:58] * elukey lunch!
[10:37:25] <hnowlan>	 elukey: yo! gotta run out for an appointment in a few but I'm caught up on scrollback. 
[10:37:35] <hnowlan>	 so weird that it worked fine in deployment-prep but not in prod
[10:37:50] <hnowlan>	 I wonder, if we stop the processes on eventlog1002, will 1003 immediately take over?
[10:37:57] <hnowlan>	 I know that's not how it's supposed to operate ofc 
[10:38:05] <hnowlan>	 but might be worth trying
[11:06:20] <wikibugs>	 (CR) Addshore: [C: +2] Use discovery-parent-pom in wikidata toolkit analyzer. [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/683569 (owner: Gehel)
[11:07:00] <gehel>	 addshore: thanks !
[11:09:10] <wikibugs>	 (Merged) jenkins-bot: Use discovery-parent-pom in wikidata toolkit analyzer. [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/683569 (owner: Gehel)
[11:27:27] <elukey>	 hnowlan: I am wondering if restarting eventlog on 1003 could cause a rebalance, maybe the firewall rules were not in place when it tried to connect? I don't see errors indicating that in the logs, but I can't explain this either
[11:42:06] <hnowlan>	 elukey: I'll give it a go 
[11:43:39] <hnowlan>	 same again `Setting newly assigned partitions set() for group eventlogging_processor_client_side_00`
[11:46:19] <elukey>	 hnowlan: another test could be to manually stop only one processor of eventlog1002
[11:46:31] <elukey>	 (we can add some mins of downtime first)
[11:46:56] <elukey>	 one topic partition will be freed, I am interested to see if it will be picked up by eventlog1002 or 1003
[11:50:27] <elukey>	 just added 900s, testing :)
[11:50:50] <elukey>	 !log manual stop of one of the eventlog processors on eventlog1002 to see if 1003 takes it over
[11:50:52] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:52:14] <elukey>	 ah! it worked!
[11:52:17] <elukey>	 hnowlan: --^
[11:53:08] <hnowlan>	 elukey: nice! 
[11:53:11] <hnowlan>	 and weird
[11:54:24] <elukey>	 there must be some heuristic in the rebalance that we don't know about
[11:54:52] <elukey>	 hnowlan: on 1003 eventlogging-processor@client-side-08.service took the new partition, logs are good
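One possible reading of the behaviour above, which nothing in this log confirms: if the topic has exactly as many partitions as there are processors already running on eventlog1002, a range-style assignment leaves the extra consumers on eventlog1003 with an empty set until one of the existing consumers leaves. The toy model below assumes 12 partitions and uses hypothetical member ids. It is not Kafka's assignor code, and because real member ids are generated, the real ordering may differ.

```python
def range_assign(partitions, members):
    """Toy range-style assignment: members are sorted, then each gets a
    contiguous slice; when members outnumber partitions, the tail gets nothing."""
    ordered = sorted(members)
    per_member, extra = divmod(len(partitions), len(ordered))
    assignment = {}
    start = 0
    for index, member in enumerate(ordered):
        count = per_member + (1 if index < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


# Assumption: 12 partitions, one per client-side-NN processor.
partitions = list(range(12))
# Hypothetical member ids, chosen so that eventlog1002 sorts first.
on_1002 = [f"eventlog1002-client-side-{n:02d}" for n in range(12)]
on_1003 = [f"eventlog1003-client-side-{n:02d}" for n in range(12)]

# Both hosts running: every consumer on eventlog1003 ends up with no partition.
both_hosts = range_assign(partitions, on_1002 + on_1003)
idle_on_1003 = [m for m in on_1003 if not both_hosts[m]]

# Stop one processor on eventlog1002: the freed partition goes to the next
# member in sort order, which in this model lives on eventlog1003.
one_stopped = range_assign(partitions, on_1002[1:] + on_1003)
picked_up = [m for m in on_1003 if one_stopped[m]]

print(len(idle_on_1003), picked_up)
```

In this model a consumer that sorts earlier always wins a partition back when it rejoins, so member ordering would matter more than which host started first.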
[11:55:29] <elukey>	 I disabled puppet on 1002, I think that we could slowly move everything to 1003
[11:55:44] <elukey>	 if you want to do it now, otherwise later on is fine
[11:55:50] <elukey>	 I'll add more downtime to 1002 in case
[11:56:38] <hnowlan>	 now works for me 
[11:58:45] <elukey>	 perfect :)
[11:59:18] <elukey>	 hnowlan: added an hour of downtime to 1002 now
[11:59:41] <hnowlan>	 thanks! I'll start stopping them in order, will log each one
[12:00:15] <elukey>	 you can do a single !log if you want, it will suffice
[12:00:34] <elukey>	 let's make sure that nothing explodes in the logs, and that the volume of events is the same
[12:00:39] <wikibugs>	 (CR) Framawiki: [C: +2] Fix Docker compatibility with multiinstance [analytics/quarry/web] - https://gerrit.wikimedia.org/r/682321 (owner: Framawiki)
[12:00:45] <elukey>	 can't think about anything else
[12:00:54] <elukey>	 ah also host-overview metrics for 1003
[12:01:04] <elukey>	 just to make sure that the vm is sized correctly etc..
[12:01:43] <hnowlan>	 ack 
[12:02:09] <elukey>	 <3
[12:02:18] <hnowlan>	 !log stopping processors on eventlog1002 to migrate to eventlog1003 
[12:02:19] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:03:48] <wikibugs>	 (PS1) Milimetric: Add yue.wikibooks to the allow list [analytics/refinery] - https://gerrit.wikimedia.org/r/683585
[12:04:02] <wikibugs>	 (CR) Milimetric: [V: +2 C: +2] Add yue.wikibooks to the allow list [analytics/refinery] - https://gerrit.wikimedia.org/r/683585 (owner: Milimetric)
[12:05:15] <wikibugs>	 (Merged) jenkins-bot: Fix Docker compatibility with multiinstance [analytics/quarry/web] - https://gerrit.wikimedia.org/r/682321 (owner: Framawiki)
[12:24:22] <hnowlan>	 Right, everything is migrated 
[12:26:19] <hnowlan>	 eventlog1003 is... coping but it could probably do with some more CPU 
[12:28:40] <hnowlan>	 so I wonder whether turning eventlog1002 consumers back on will flip stuff back 
[12:29:42] <hnowlan>	 huh, yep, it did. 
[12:31:01] <milimetric>	 joal: ah, I should've known, thanks for fixing.  I knew you had made the sqoopable db, but forgot
[12:35:19] <hnowlan>	 !log restarting 2 processors on eventlog1002
[12:35:21] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:35:38] <hnowlan>	 gonna take lunch, letting some of the workers on eventlog1002 pick up some of the slack while I'm afk 
[12:36:43] <hnowlan>	 extended downtime on eventlog1002
[12:40:57] <elukey>	 super
[12:57:10] * elukey bbiab!
[13:08:38] <joal>	 Hi milimetric - Would you have manually updated the pageview allowlist during a pageview run by any chance?
[13:09:09] <joal>	 milimetric: I'm asking to be sure of the reason for which the pageview-hourly job failed
[13:09:20] <milimetric>	 joal: I did!  I had no idea that was potentially bad, we should take it out of the ops_week procedures if it is
[13:09:33] <joal>	 milimetric: no problem, I wanted to triple check
[13:09:42] <joal>	 milimetric: rerunning the job (I have it all set up)
[13:09:47] <milimetric>	 ok
[13:09:55] <joal>	 !log Rerun failed pageview-hourly-wf-2021-4-29-11
[13:09:57] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:10:16] <milimetric>	 joal: so should we not do that?
[13:10:19] <milimetric>	 I've been doing that for years
[13:10:47] <joal>	 milimetric: updating the file manually only causes problems if you do it at the moment a job reads it :)
[13:10:48] <milimetric>	 hm... I guess it doesn't matter how you sync it, whether with put -f or running hdfs-sync
[13:11:02] <milimetric>	 yeah, so it's just bad luck... ok
[13:11:14] <milimetric>	 funny though, I'll add a note
[13:12:19] <joal>	 thanks milimetric :)
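joal's point is that the failure needs a reader to open the file while it is being rewritten. On a local POSIX filesystem, a common way to close that window is to write a temporary file and rename it over the target, so readers see either the old or the new content. The sketch below shows that pattern only as a local analogy: the file name is hypothetical, and the log does not say whether `put -f` or hdfs-sync behave this way on HDFS.

```python
import os
import tempfile


def atomic_write(path, text, mode=0o644):
    """Write text to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # mkstemp creates the file as 0600; widen it so group/other can read.
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "allowlist.tsv")  # hypothetical file name
    atomic_write(target, "en.wikipedia\n")
    atomic_write(target, "en.wikipedia\nyue.wikibooks\n")
    with open(target) as handle:
        final_text = handle.read()
    leftovers = sorted(os.listdir(workdir))
```

The temporary file is created in the target's directory because a rename is only atomic within a single filesystem.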
[13:13:35] <joal>	 milimetric: I think I'll be hitting the same problem you had yesterday with rerunning and perms - Let's try to figure out a solution for that
[13:13:54] <milimetric>	 joal: yea, wanna brainbounce?
[13:15:20] <joal>	 sure
[13:15:44] <milimetric>	 hm... I think we should just write a script for it, to find and rerun from the command line, where we can sudo -u whoever
[13:16:53] <wikibugs>	 Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (Ottomata) distcp complete, application_id: application_1619507802557_6586  Moving on to creating and repairing hive ta...
[13:17:17] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] Update event_sanitized_main_allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/683421 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[13:34:42] <ottomata>	 milimetric:  ok if i do a refinery sync to get https://gerrit.wikimedia.org/r/c/analytics/refinery/+/683421 out?
[13:34:46] <ottomata>	 i really only need it on an-launcher
[13:34:48] <ottomata>	 so I'll just sync it there
[13:34:55] <joal>	 milimetric: I think our problem is permissions
[13:35:00] <milimetric>	 ottomata: I'm fine with that
[13:35:14] <milimetric>	 joal: oh, ok, I'll check the allow list permissions
[13:36:13] <mforns>	 hi teammmm
[13:36:26] <wikibugs>	 (PS1) Ottomata: Update static_data/eventlogging/whitelist.yaml symlink [analytics/refinery] - https://gerrit.wikimedia.org/r/683608
[13:36:30] <ottomata>	 mornin!
[13:36:34] <ottomata>	 :)
[13:36:48] <mforns>	 ottomata: looking at the changes you pinged me with last night
[13:37:01] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] Update static_data/eventlogging/whitelist.yaml symlink [analytics/refinery] - https://gerrit.wikimedia.org/r/683608 (owner: Ottomata)
[13:37:51] <icinga-wm>	 PROBLEM - Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-processor@client-side-11 eventlogging-processor@client-side-10 eventlogging-processor@client-side-09 eventlogging-processor@client-side-08 eventlogging-processor@client-side-07 eventlogging-processor@client-side-06 eventlogging-processor@client-side-05 eventlogging-processor@client-side-04 eventlogging
[13:37:51] <icinga-wm>	 t-side-03 eventlogging-processor@client-side-02 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging
[13:38:23] <milimetric>	 joal: indeed, for some reason it didn't have read permissions at the group level
[13:38:34] <joal>	 WEIRD!
[13:38:39] <milimetric>	 that's changed from how it would work in the past... I wonder why, directory permissions are the same
[13:38:45] <milimetric>	 anyway I +r-ed the file
[13:38:56] <joal>	 milimetric: for ALL, or for group?
[13:39:12] <joal>	 milimetric: I confirm it works for me now :)
[13:39:26] <milimetric>	 uh... sorry I'm the worst, it's rw-r--r-- now, and it was rw-r----- before
[13:43:35] <joal>	 ok milimetric - I confirm that '+or' was the correct thing to do - all good :) Thanks!
[13:53:00] <joal>	 milimetric: Shall I rerun the 2 failed pageview-hourly jobs?
[13:53:02] <wikibugs>	 Analytics, Event-Platform: mediawiki/page/properties-change schema should use map type for added and removed page properties - https://phabricator.wikimedia.org/T281483 (Ottomata)
[13:53:30] <joal>	 !log Rerun failed pageview-hourly-wf-2021-4-29-11 and pageview-hourly-wf-2021-4-29-12
[13:53:39] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:54:08] <milimetric>	 thx joal
[13:54:27] <joal>	 no prob - thanks for the allowlist fix!
[13:55:07] <milimetric>	 I'm so confused between +or o+r and +r, they seem to do the same thing :)
[13:55:22] <milimetric>	 (in this case, obviously not in general)
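The reason `o+r` and `+r` agree here but not in general can be worked through on the mode bits. The sketch below models POSIX `chmod` symbolic modes for the read bit only: `o+r` touches just the "other" class, while a bare `+r` touches every class except bits masked by the umask. The helper name and the umask of 022 are assumptions for illustration, `+or` is deliberately not modelled, and HDFS `chmod` may not follow these rules exactly.

```python
import stat

READ_BITS = {"u": stat.S_IRUSR, "g": stat.S_IRGRP, "o": stat.S_IROTH}
ALL_READ = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def add_read(mode, who="", umask=0o022):
    """Model `chmod <who>+r`: an empty who means all classes, minus umask bits."""
    if who == "":
        wanted = ALL_READ & ~umask
    elif who == "a":
        wanted = ALL_READ
    else:
        wanted = 0
        for cls in who:
            wanted |= READ_BITS[cls]
    return mode | wanted


def show(mode):
    """Render permission bits the way `ls -l` does, for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


before = 0o640                      # rw-r-----, as in the conversation
same_case = (show(add_read(before, "o")), show(add_read(before, "")))

other_start = 0o200                 # -w-------, where the spellings diverge
different_case = (show(add_read(other_start, "o")), show(add_read(other_start, "")))

print(same_case, different_case)
```

Starting from rw-r----- only the "other" read bit is missing, so both spellings land on rw-r--r--. Starting from a mode that also lacks user or group read, `+r` fills those in and `o+r` does not.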
[13:56:18] <wikibugs>	 (PS1) Ottomata: Add comment in allowlist about  mediawiki_page_properties_change [analytics/refinery] - https://gerrit.wikimedia.org/r/683620 (https://phabricator.wikimedia.org/T273789)
[13:56:52] <wikibugs>	 (CR) Ottomata: [V: +2 C: +2] Add comment in allowlist about  mediawiki_page_properties_change [analytics/refinery] - https://gerrit.wikimedia.org/r/683620 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[14:05:23] <hnowlan>	 elukey: think I'll bump eventlog1003 up to 6 CPUs just to give us some room, that seem sensible? 
[14:08:17] <joal>	 Gone for kids
[14:08:45] <elukey>	 hnowlan: +1 yes, didn't check graphs but sounds good!
[14:09:22] <ottomata>	 joal:  https://docs.google.com/document/d/1-DLugMuUEFu8f3MyZVVEQJYv4SKBHN6rdgREPALoaKY/edit#heading=h.4rhrqlz74n1s :)
[14:11:27] <ottomata>	 oh, you've seen it ! :) great
[14:21:02] <hnowlan>	 !log bump eventlog1003 CPUs to 6
[14:21:03] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:21:12] <hnowlan>	 gonna swing all eventlogging back to eventlog1002 for a reboto
[14:21:21] <hnowlan>	 *reboot of eventlog1003 
[14:31:47] <icinga-wm>	 RECOVERY - Check status of defined EventLogging jobs on eventlog1002 is OK: OK: All defined EventLogging jobs are runnning. https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging
[14:32:33] <hnowlan>	 ^ expected 
[14:37:03] <milimetric>	 joal: ok, I pushed the topic detection stuff, minimal testing as I came to the conclusion it's not really doing anything
[14:37:26] <milimetric>	 (it's on milimetric/wmf on top of what I think is your latest from joal/wmf)
[14:44:14] <wikibugs>	 Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx) This should be deployed to the Beta Cluster any moment now.
[14:44:48] <hnowlan>	 !log restored all eventlogging jobs to eventlog1003
[14:44:49] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:47:59] <icinga-wm>	 ACKNOWLEDGEMENT - Check status of defined EventLogging jobs on eventlog1002 is CRITICAL: CRITICAL: Stopped EventLogging jobs: eventlogging-processor@client-side-11 eventlogging-processor@client-side-10 eventlogging-processor@client-side-09 eventlogging-processor@client-side-08 eventlogging-processor@client-side-07 eventlogging-processor@client-side-06 eventlogging-processor@client-side-05 eventlogging-processor@client-side-04 even
[14:47:59] <icinga-wm>	 or@client-side-03 eventlogging-processor@client-side-02 eventlogging-processor@client-side-01 eventlogging-processor@client-side-00 Hnowlan eventlog1003 now handles these jobs. https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging
[14:52:42] <wikibugs>	 Analytics-Clusters, Analytics-Kanban: Migrate eventlog1002 to buster - https://phabricator.wikimedia.org/T278137 (hnowlan) eventlog1003 is now handling all eventlogging jobs. [[ https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=eventlog1003&var-datasource=thanos&var-cl...
[14:53:35] <elukey>	 hnowlan: nice!
[14:55:59] <wikibugs>	 Analytics, Analytics-Kanban, Event-Platform: Rename event_sanitized partition directories to lowercase - https://phabricator.wikimedia.org/T280813 (fdans) Open→Resolved
[14:56:02] <wikibugs>	 Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (fdans)
[14:56:10] <wikibugs>	 Analytics, Analytics-Kanban: Duplicate wikitext entries for a bunch of wikis in 2021-02 snapshot - https://phabricator.wikimedia.org/T278551 (fdans) Open→Resolved
[14:56:16] <wikibugs>	 Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop coordinators to Debian Buster - https://phabricator.wikimedia.org/T278424 (fdans) Open→Resolved
[14:56:19] <wikibugs>	 Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (fdans)
[14:56:30] <wikibugs>	 Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade furud/flerovium to Debian Buster - https://phabricator.wikimedia.org/T278421 (fdans) Open→Resolved
[14:56:34] <wikibugs>	 Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (fdans)
[14:56:43] <wikibugs>	 Analytics-Clusters, Analytics-Kanban: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (fdans) Open→Resolved
[14:56:47] <wikibugs>	 Analytics, Analytics-Dashiki, Analytics-Kanban: npm install gives Verification failed while extracting mediawiki-storage@https://github.com/wikimedia/analytics-mediawiki-storage/archive/master.tar.gz - https://phabricator.wikimedia.org/T278982 (fdans) Open→Resolved
[14:56:52] <wikibugs>	 Analytics, Analytics-Kanban, Patch-For-Review: Add better monitoring for Analytics UIs - https://phabricator.wikimedia.org/T277729 (fdans) Open→Resolved
[14:56:54] <wikibugs>	 Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Configure the HDFS Namenodes to use the log4j rolling gzip appender - https://phabricator.wikimedia.org/T276906 (fdans) Open→Resolved
[14:57:31] <elukey>	 !log run mysql_upgrade on an-coord1001 to complete the buster upgrade - T278424
[14:57:35] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:57:35] <stashbot>	 T278424: Upgrade the Hadoop coordinators to Debian Buster - https://phabricator.wikimedia.org/T278424
[14:57:40] <elukey>	 now done done done
[15:02:31] <wikibugs>	 Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 3 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx)
[15:03:06] <wikibugs>	 Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 3 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx)
[15:09:37] <elukey>	 ottomata: o/ - about hw stuff, I just realized something that might be worth to follow up. an-launcher1002 and an-coord1002 were previously notebook100[3,4], that we bought in Jan 2018 from what I can see. The will complete their 5y life during next FY, and I fear that dcops didn't track them after the relabel
[15:09:53] <elukey>	 *They will
[15:10:47] <elukey>	 I know that we discussed moving hive to vms etc.., but given the fact that VMs bigger than 16g are not recommended I'd think about replacing an-coord1002
[15:34:30] <ottomata>	 elukey:  aye yeah not using VM makes sense
[15:34:58] <ottomata>	 elukey:  will they be OOW next FY or the one after?
[15:35:19] <elukey>	 early 2022, so next FY
[15:35:47] <ottomata>	 elukey:  how'd you find that out? phab or some DC ops doc?
[15:35:49] <elukey>	 or maybe late 2022
[15:35:53] <elukey>	 mmm
[15:36:00] <elukey>	 we racked them in Jan 2018 afaics
[15:36:13] <wikibugs>	 (PS1) Milimetric: Fix referrer input events [analytics/refinery] - https://gerrit.wikimedia.org/r/683671
[15:36:27] <wikibugs>	 (CR) Milimetric: [V: +2 C: +2] Fix referrer input events [analytics/refinery] - https://gerrit.wikimedia.org/r/683671 (owner: Milimetric)
[15:36:37] <elukey>	 in theory we'd have 2022 too, maybe not really next fiscal, but we'd be in the grey area of no spare parts etc..
[15:37:03] <elukey>	 (we already had a disk issue with an-coord1001, Chris added a spare one but we couldn't get any warranty for the rest)
[15:37:14] <elukey>	 so if the hosts break badly we cannot replace them
[15:37:42] <ottomata>	 an-coord1001 must be older, no?
[15:38:10] <ottomata>	 !log enabling event_sanitized_main jobs - T273789
[15:38:14] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:38:14] <stashbot>	 T273789: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789
[15:38:32] <wikibugs>	 Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (Ottomata)
[15:38:38] <elukey>	 ottomata: I was wondering about it as well, it may be around its replacement date as well
[15:38:51] <ottomata>	 lets ask in dcops
[15:46:14] <razzi>	 Hi elukey, quick question for the an-worker1100 disk: I formatted the disk, but when I mount it using `sudo mount -v /dev/sdl1` it gets unmounted immediately by systemd. Looks like `systemctl daemon-reexec` might fix it, is that a safe command to run?
[15:47:18] <wikibugs>	 Analytics, Privacy Engineering, Research: Release dataset on top search engine referrers by country, device, and language - https://phabricator.wikimedia.org/T270140 (Isaac)
[15:48:22] <elukey>	 razzi: hi! have you updated /etc/fstab accordingly? You should add the new UUID, save the file and then simply mount -a
[15:48:45] <elukey>	 if you don't do it when you reboot the os will not re-mount the partition
[15:50:25] <razzi>	 Yeah, I updated fstab, and the disk mounts when I run `mount -a`: 
[15:50:25] <razzi>	 `/var/lib/hadoop/data/k   : successfully mounted`
[15:50:25] <razzi>	 But systemd unmounts it:
[15:50:25] <razzi>	 `var-lib-hadoop-data-k.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-7bcd4c25\x2da157\x2d4023\x2da346\x2d924d4ccee5a0.device. Stopping, too.`
[15:50:25] <razzi>	 `Unmounting /var/lib/hadoop/data/k...`
[15:51:53] <elukey>	 you can try with a systemctl daemon-reload
[15:52:44] <elukey>	 it is graceful and may help
[15:53:17] <razzi>	 elukey: cool, that fixed it!
[15:53:29] <elukey>	 ah nice!
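The unit names in razzi's paste are escaped paths: systemd turns `/` into `-` and writes other special characters such as a literal `-` as `\x2d`. The sketch below is a simplified reimplementation of that path escaping (what `systemd-escape --path` does); it does not collapse repeated slashes and is meant only to make the names readable. A likely explanation for the unmount, not confirmed in the log, is that systemd's mount unit was still generated from the old fstab and bound to the old UUID's device unit until the reload regenerated it.

```python
def systemd_escape_path(path):
    """Simplified systemd path escaping: '/' -> '-', other specials -> \\xXX."""
    trimmed = path.strip("/")
    if not trimmed:
        return "-"  # the root directory is special-cased
    out = []
    for index, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and index > 0):
            out.append(ch)
        else:
            # Everything else, including '-' itself, is hex-escaped byte by byte.
            out.extend(f"\\x{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out)


mount_unit = systemd_escape_path("/var/lib/hadoop/data/k") + ".mount"
device_unit = systemd_escape_path(
    "/dev/disk/by-uuid/7bcd4c25-a157-4023-a346-924d4ccee5a0"
) + ".device"

print(mount_unit)
print(device_unit)
```

Decoding the device unit this way shows which UUID the mount unit was waiting on, which can then be compared with the UUID in /etc/fstab.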
[15:55:44] <razzi>	 !log restart hadoop-yarn-nodemanager and hadoop-hdfs-datanode on an-worker1100 for hadoop to recognize new disk /dev/sdl
[15:55:47] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:59:37] <ottomata>	 oh elukey  servers ordered in 2018...will be up for refresh in 2023, no?
[16:01:02] <wikibugs>	 (CR) Mforns: [C: +1] "Left a comment just in case, but LGTM +1" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/681678 (https://phabricator.wikimedia.org/T280649) (owner: Joal)
[16:03:23] <elukey>	 ottomata: I think at this point, but we are outside the warranty, and we have only one an-launcher
[16:03:30] <elukey>	 so say that it breaks, it may be an issue
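The "next FY or the one after" question comes down to date arithmetic. The sketch below works it through under stated assumptions: a fiscal year starting on 1 July, a racking date of 1 January 2018 (the log only says "Jan 2018"), and the 5-year life elukey mentions. The conversation date is taken from the pageview-hourly-wf-2021-4-29 job name earlier in the log.

```python
from datetime import date


def fiscal_year_label(day, start_month=7):
    """Label the fiscal year containing `day`, assuming it starts on the 1st of start_month."""
    start_year = day.year if day.month >= start_month else day.year - 1
    return f"FY{start_year}-{start_year + 1}"


racked = date(2018, 1, 1)           # assumption: exact day is not in the log
lifetime_years = 5                  # "5y life" from the conversation
end_of_life = racked.replace(year=racked.year + lifetime_years)

conversation_day = date(2021, 4, 29)
current_fy = fiscal_year_label(conversation_day)
end_of_life_fy = fiscal_year_label(end_of_life)

print(end_of_life, current_fy, end_of_life_fy)
```

Under these assumptions the 5-year mark falls in January 2023, two fiscal years after the one in which the conversation takes place, which matches ottomata's "up for refresh in 2023".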
[16:10:46] <wikibugs>	 Analytics, WMDE-Analytics-Engineering, Wikidata, User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (elukey) Open→Resolved a:elukey No issues from our side, going to close, please reopen if necessary!
[16:17:53] <ottomata>	 hmm hmm, if launcher breaks...we can use a vm for that?
[16:18:10] <elukey>	 ottomata: not when sqoop runs :(
[16:20:39] <ottomata>	 hm
[16:20:56] <ottomata>	 sqoop heavy stuff can't run in a yarn app master?
[16:21:32] <elukey>	 this is outside my knowledge, it is really havy on launcher in its current state
[16:21:41] <elukey>	 *heavy
[16:25:33] <wikibugs>	 Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (Ottomata) Backfilling since yesterday's distcp was started.  `sudo -u analytics kerberos-run-command analytics /usr/lo...
[16:29:03] <ottomata>	 milimetric:  sqoop q above ^^^
[16:30:57] <elukey>	  /win 11
[16:30:59] <elukey>	 ufff
[16:36:51] <joal>	 elukey: isn't it win-10?
[16:36:54] <joal>	 :-P
[16:48:59] * elukey runs away
[16:51:42] <wikibugs>	 Analytics, Product-Analytics: Top read repeats - https://phabricator.wikimedia.org/T280011 (kzimmerman) @JAllemandou sorry I missed your comment last week! Is this something that still needs looking into? Tagging with #product-analytics so we can discuss this in our next board triage.  T270784 suggests t...
[16:57:39] <wikibugs>	 Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (kzimmerman) @JAllemandou There are a couple of other tickets (T270784, T274823) that might be resolved if the automated traffic detection heuristics are improved; should I add them as subtasks?
[16:58:52] <wikibugs>	 Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (JAllemandou) >>! In T280565#7046384, @kzimmerman wrote: > @JAllemandou There are a couple of other tickets (T270784, T274823) that might be resolved if the automated traffic detection heuristics ar...
[17:01:26] <wikibugs>	 Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (kzimmerman)
[17:01:58] <wikibugs>	 Analytics-Radar, Product-Analytics (Kanban): Big increase in traffic for projects except 'wikipedia' family since Feb 14th - https://phabricator.wikimedia.org/T274823 (kzimmerman)
[17:02:00] <wikibugs>	 Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (kzimmerman)
[17:08:28] <wikibugs>	 Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, and 2 others: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (phuedx) a:phuedx→None
[17:32:03] <wikibugs>	 Analytics-Radar, Anti-Harassment, CheckUser, Privacy Engineering, and 2 others: Deal with Google Chrome User-Agent deprecation - https://phabricator.wikimedia.org/T242825 (Niharika)
[17:46:33] <wikibugs>	 (CR) Mforns: [C: +1] "LGTM! +1" [analytics/refinery] - https://gerrit.wikimedia.org/r/681682 (https://phabricator.wikimedia.org/T280649) (owner: Joal)
[17:49:27] <wikibugs>	 Analytics, Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (cchen) @razzi I mean when switching data sources, in some charts, i am not able to see the druid table in the dropdown list. For example, [[ https://superset.wikimedia.org/superse...
[17:59:27] <wikibugs>	 Analytics-Clusters, Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (razzi) Ok, it looks like everything is working here, but disk usage is still at 0%: ` NAME            FSTYPE            LABEL           UUID                                   FSAVAIL FSUSE% MOU...
[18:05:52] <wikibugs>	 Analytics, Analytics-EventLogging, Event-Platform, Product-Data-Infrastructure, and 2 others: Replace usages of Linker::link() and Linker::linkKnown() in extension EventLogging - https://phabricator.wikimedia.org/T279328 (Mholloway) @fdans, Thanks for the heads-up.  Do you know who would be the r...
[18:07:46] <wikibugs>	 (CR) Joal: "Thanks for the review @mforms :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/681678 (https://phabricator.wikimedia.org/T280649) (owner: Joal)
[18:34:17] * elukey afk!
[18:36:06] <wikibugs>	 Analytics-Clusters, Analytics-Kanban: Re-add disk to an-worker1100 - https://phabricator.wikimedia.org/T281427 (elukey) Yep it takes a bit! If the datanode got the new config you'll see more data in the upcoming days :)
[18:51:59] * razzi lunch
[18:54:19] <joal>	 Gone for tonight - see you folks tomorrow
[18:55:20] <mforns>	 byyyyeeeeee :]
[19:16:00] <wikibugs>	 Analytics, Product-Analytics: Top read repeats - https://phabricator.wikimedia.org/T280011 (Astinson) @kzimmerman my partner set up her voice search for her new android phone -- I suspect it has all of the vowels that are common in spanish.
[20:32:26] <wikibugs>	 Analytics, Product-Analytics: Aggregate table not working after superset upgrade - https://phabricator.wikimedia.org/T280784 (razzi) @cchen I'm not sure what you mean, what do you expect to have happen?
[21:29:23] <wikibugs>	 Analytics, WMDE-Analytics-Engineering, Wikidata, User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (GoranSMilovanovic) Resolved→Open @elukey Let's take a close look at this, if you agree.
[22:05:38] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1131 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[22:21:31] <wikibugs>	 Analytics, WMDE-Analytics-Engineering, Wikidata, User-GoranSMilovanovic: WDCM_Sqoop_Clients.R fails from stat1004 (again) - https://phabricator.wikimedia.org/T281316 (GoranSMilovanovic) @WMDE-leszek @elukey I would like to learn from this.   The following argument to `/usr/bin/sqoop`  > --driver...
[22:26:44] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1131 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process