[06:44:34] Analytics, Operations, ops-eqiad, Patch-For-Review: Broken disk on analytics1072 - https://phabricator.wikimedia.org/T226467 (elukey) Open→Resolved @Cmjohnson thanks a lot! I had to reboot again to be able to configure the new PD, not really sure why (the megacli commands were failing bef...
[06:45:16] analytics1072 back in shape with a new disk
[07:19:21] Analytics, Tool-Pageviews: The mediacounts dataset doesn't have a project dimension - https://phabricator.wikimedia.org/T228151 (fdans) @Tgr agreed, we can definitely get the project from the referer, but the main value of this dataset is its historical aspect. We can't backfill project using referer fur...
[07:25:38] dcausse: o/
[07:25:52] elukey: o/
[07:26:57] morninggg - I am wondering who usually takes care of oozie stuff in your team
[07:27:05] for https://gerrit.wikimedia.org/r/#/c/wikimedia/discovery/analytics/+/523212/
[07:27:17] because the coordinators, if we merge, should be restarted
[07:28:14] elukey: Erik usually does
[07:29:52] ahh okok
[07:29:57] will ping him :)
[07:33:32] Analytics, Tool-Pageviews: The mediacounts dataset doesn't have a project dimension - https://phabricator.wikimedia.org/T228151 (Tgr) People will just have to live with that IMO. Which project a file was uploaded to is certainly useful information, but using it as the project field seems pretty misleading.
[07:52:49] Analytics, Tool-Pageviews: The mediacounts dataset doesn't have a project dimension - https://phabricator.wikimedia.org/T228151 (Tgr) > `/wikipedia/{language}/*` Note that that can also be a project name for a multilingual/language-less project, not just a language. `commons`, `meta`, `mediawiki`, `foun...
[08:02:24] Analytics, Operations, hardware-requests, User-Elukey: codfw: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227425 (MoritzMuehlenhoff) Ack, this looks good to me!
[08:02:54] Analytics, Operations, hardware-requests, User-Elukey: eqiad: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227288 (MoritzMuehlenhoff) Also followed up on the codfw task, but adding here for completeness as well: This looks good to me!
[08:05:19] Analytics, Operations, hardware-requests, User-Elukey: codfw: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227425 (elukey) a:elukey→RobH
[08:05:43] Analytics, Operations, hardware-requests, User-Elukey: eqiad: 1 misc node for the Kerberos KDC service - https://phabricator.wikimedia.org/T227288 (elukey) a:elukey→RobH
[08:41:16] Analytics, Patch-For-Review, User-Elukey: Move refinery to hive 2 actions - https://phabricator.wikimedia.org/T227257 (elukey) First annoying problems when testing beeline: * It seems that `beeline -f script.hql --database something`, I got my tables created in the default db. Using the hive tool it...
[08:56:38] Analytics, Patch-For-Review, User-Elukey: Move refinery to hive 2 actions - https://phabricator.wikimedia.org/T227257 (elukey) https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-LoggingConfiguration could help, going to test in the testing cluster and see if...
[10:31:23] * elukey lunch!
[11:08:34] (PS1) Fdans: Add UDF to get wiki project from referer string [analytics/refinery/source] - https://gerrit.wikimedia.org/r/523903 (https://phabricator.wikimedia.org/T228151)
[13:38:52] Analytics, Patch-For-Review, User-Elukey: Move refinery to hive 2 actions - https://phabricator.wikimedia.org/T227257 (elukey) In https://oozie.apache.org/docs/4.2.0/DG_Hive2ActionExtension.html I found this: > The argument element, if present, contains arguments to be passed as-is to Beeline. So I...
[13:48:09] \o/ \o/ \o/ ---^
[14:07:17] :)
[14:09:17] Hey analytics team - I had a question about using analytics-mysql - should I expect some data inconsistency between those replicas and any random prod db host?
[14:12:11] sbassett: hi! in theory no, did you find anything strange?
[14:14:54] elukey: No, was just wondering if there was any expected delay in replicating data to them. Admittedly, I don't know much about that process even after reviewing wikitech:/Analytics/Data_access.
[14:20:07] sbassett: in theory the dbstore hosts (we have three) are way more robust than the previous dbstore1002, and they should be regular replicas of the prod dbs
[14:20:30] in dbstore1002 we had a lot of data inconsistencies due to outages, lag, etc..
[14:26:08] (CR) Elukey: [V: +2] "The job runs fine under my username, I added one hour of pageviews to my database. The following has been verified:" [analytics/refinery] - https://gerrit.wikimedia.org/r/523200 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:26:08] elukey: Ok, good to know. Thanks!
[14:26:32] :)
[14:30:06] (PS2) Elukey: pageview: move the oozie coordinator to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/523200 (https://phabricator.wikimedia.org/T227257)
[14:30:08] (PS1) Elukey: Add verbose argument to pageview/projectview oozie coordinators [analytics/refinery] - https://gerrit.wikimedia.org/r/523934 (https://phabricator.wikimedia.org/T227257)
[14:33:03] (CR) Elukey: [C: +2] "Re-add the +2 since it was lost after the rebase." [analytics/refinery] - https://gerrit.wikimedia.org/r/523200 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:33:14] (CR) Elukey: [V: +2] pageview: move the oozie coordinator to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/523200 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[14:41:34] (PS1) Addshore: Create script tracking number of slots on wikibase repos [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/523938 (https://phabricator.wikimedia.org/T68025)
[14:41:44] * elukey sees addshore and waves o/
[14:41:50] Analytics, Analytics-Kanban, Operations, netops, and 2 others: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by otto@cumin1001 for hosts: `cloudvirtan[1001-...
[14:42:55] (CR) jerkins-bot: [V: -1] Create script tracking number of slots on wikibase repos [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/523938 (https://phabricator.wikimedia.org/T68025) (owner: Addshore)
[15:03:54] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) Reporting some info from https://github.com/ROCmSoftwarePlatfo...
[15:04:32] Analytics, Analytics-Kanban, Operations, netops, and 2 others: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Ottomata) Alright, nodes are role spare::system and decommed/downtimed in icinga.
[15:04:40] Analytics, Analytics-Kanban, Operations, netops, and 2 others: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Ottomata)
[15:06:39] Analytics, Analytics-Kanban, Operations, netops, and 2 others: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Ottomata) @cmjohnson back atcha :)
[15:14:23] Analytics, Better Use Of Data, Reading-Infrastructure-Team-Backlog, Epic: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (jlinehan)
[15:17:53] (CR) Nuria: [C: -1] Add UDF to get wiki project from referer string (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/523903 (https://phabricator.wikimedia.org/T228151) (owner: Fdans)
[15:22:00] (CR) Nuria: [C: +1] "Nice, virtual +2 for when we are ready to merge." [analytics/refinery] - https://gerrit.wikimedia.org/r/523934 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[15:31:01] Analytics, Analytics-Kanban, Patch-For-Review: Move reportupdater queries from limn-* repositories to reportupdater-queries - https://phabricator.wikimedia.org/T222739 (Nuria) I think that before we delete queries from repos we probably want to make sure this puppet change is deployed and working wel...
[15:34:34] (CR) ArielGlenn: Create script tracking number of slots on wikibase repos (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/523938 (https://phabricator.wikimedia.org/T68025) (owner: Addshore)
[15:43:40] Analytics: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata)
[15:43:43] Analytics: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (Ottomata)
[15:55:46] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Allow all Analytics tools to work with Kerberos auth - https://phabricator.wikimedia.org/T226698 (elukey)
[15:55:48] Analytics: Refine should accept principal name for hive2 jdbc connection for DDL - https://phabricator.wikimedia.org/T228291 (elukey)
[16:01:44] ping ottomata
[16:01:48] standdduppppppp
[16:08:55] Analytics, EventBus, Growth-Team, Notifications, Wikimedia-production-error: Database error "Duplicate entry" for PRIMARY key (from EchoNotificationMapper::insert) - https://phabricator.wikimedia.org/T217079 (Krinkle) Doesn't appear in the last 30 days in Logstash.
[16:17:38] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar): EventLogging needs to enque events to avoid draining users' battery on mobile - https://phabricator.wikimedia.org/T225578 (Milimetric) @Krinkle so I'm looking at this and feeling a little uncomfortable about basically...
[16:23:40] ottomata: https://aws.amazon.com/blogs/aws/amazon-eventbridge-event-driven-aws-integration-for-your-saas-applications/
[16:42:58] (CR) Milimetric: [C: +2] "refinery doesn't merge by itself, so feel free to +2 when you're ready" [analytics/refinery] - https://gerrit.wikimedia.org/r/523934 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[16:43:12] (CR) Milimetric: [C: +2] pageview: move the oozie coordinator to hive2 actions [analytics/refinery] - https://gerrit.wikimedia.org/r/523200 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[16:44:17] (CR) Elukey: [V: +2] Add verbose argument to pageview/projectview oozie coordinators [analytics/refinery] - https://gerrit.wikimedia.org/r/523934 (https://phabricator.wikimedia.org/T227257) (owner: Elukey)
[16:44:43] thanks milimetric !
[16:44:51] np, looks good
[16:48:48] Analytics, EventBus, MediaWiki-JobQueue, Core Platform Team (Services Operations): Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259 (Pchelolo) p:Normal→Low
[16:52:34] * elukey off!
[16:53:04] Analytics, Analytics-Kanban, EventBus, Core Platform Team Backlog (Watching / External), and 2 others: Factor out eventgate-wikimedia factory into its own gerrit repo and use it for deployment pipeline - https://phabricator.wikimedia.org/T226668 (Jdforrester-WMF)
[16:53:08] I am going out for dinner and I won't be around for the deployment sorry :(
[16:54:58] Analytics, Core Platform Team, MediaWiki-API, RESTBase-API: Top API user agents stats - https://phabricator.wikimedia.org/T142139 (Pchelolo) I don't think this is relevant anymore and any work should be done, but it contains some good info, so moving to Icebox.
[17:03:19] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 3 others: Separate ChangeProp and JobQueue Redis - https://phabricator.wikimedia.org/T183586 (Pchelolo)
[17:14:58] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar): EventLogging needs to enque events to avoid draining users' battery on mobile - https://phabricator.wikimedia.org/T225578 (Krinkle) Yes, a common interface for this would make sense. I'd go further and actually also ma...
[17:48:13] Analytics, Analytics-EventLogging, EventBus, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Workboards (Clinic Duty Team): Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (Pchelolo)
[17:49:33] hehe elukey re eventbridge...yuppers
[17:51:44] (PS11) Nuria: Add file extension and media type classification to media files UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/517641 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[17:52:28] fdans: ay, i think my last push doesn't have the test you added but will get it and add it again (re: mediatypes code)
[17:52:33] fdans: sorry about that
[17:52:39] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar): EventLogging needs to enque events to avoid draining users' battery on mobile - https://phabricator.wikimedia.org/T225578 (Milimetric) Hm, if it's a singleton then clients would have to namespace events. Because the s...
[17:52:55] (CR) jerkins-bot: [V: -1] Add file extension and media type classification to media files UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/517641 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[17:53:48] Analytics-EventLogging, Analytics-Kanban, EventBus, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Workboards (Clinic Duty Team): Migrate RESTBase/ChangeProp produced events to eventgate - https://phabricator.wikimedia.org/T228318 (Pchelolo)
[17:54:01] (PS12) Nuria: Add file extension and media type classification to media files UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/517641 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[17:54:55] (CR) jerkins-bot: [V: -1] Add file extension and media type classification to media files UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/517641 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[17:57:03] (PS1) Nuria: Adding changes to changelog.md for release [analytics/refinery/source] - https://gerrit.wikimedia.org/r/523979
[17:57:28] (CR) Nuria: [C: +2] "Self merging changelog for upcoming release" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/523979 (owner: Nuria)
[18:06:14] (Merged) jenkins-bot: Adding changes to changelog.md for release [analytics/refinery/source] - https://gerrit.wikimedia.org/r/523979 (owner: Nuria)
[18:06:55] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar): EventLogging needs to enque events to avoid draining users' battery on mobile - https://phabricator.wikimedia.org/T225578 (Nuria) If it is a singleton there would be 1 queue but in this case the statsd queue and EL que...
[18:10:37] !log starting build of new refinery-source 0.0.94
[18:10:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:12:32] ottomata: when i deploy next jar for refine scala jobs i just need to bump up the jar version on puppet right?
[18:24:25] yup but only if you intend to
[18:24:32] for refine...yes we want that :)
[19:28:09] milimetric: nuria more learnings about ResourceLoader
[19:30:10] load.php always returns a JavaScript snippet for mediawiki JS clients
[19:30:12] to eval
[19:30:21] won't work for mobile apps
[19:33:39] so
[19:33:51] we'll need to make an api endpoint in EventLogging to get config
[19:36:43] ottomata: eventgate , right?
[19:37:35] ?
[19:37:45] nuria: ^
[19:48:54] nuria: I edited design doc to suggest MW action API endpoint
[20:03:06] ottomata: yes sorry, mw action api endpoint to surface config ya, +1
[20:19:38] Analytics, Analytics-EventLogging, EventBus, Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform: Stream Configuration - https://phabricator.wikimedia.org/T205319 (Ottomata)
[20:19:50] ottomata: mmm.. i do not understand on building refinery why it says it build 0.0.93 https://integration.wikimedia.org/ci/job/analytics-refinery-release/lastBuild/org.wikimedia.analytics.refinery$refinery/
[20:20:02] ottomata: when it should be 0.0.94?
[20:20:33] nuria: i think those are entered in the jenkins release form, no?
[20:20:59] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Deploy/Refinery-source
[20:21:02] If the versions are not the ones that you expect, you can refresh them with https://integration.wikimedia.org/ci/job/analytics-refinery-release/build?delay=0sec This is known to refresh the values that you'll see in the next step, but usually they should be pre-filled.
[20:21:07] ottomata: ya, i did enter 0.0.94
[20:21:10] hm
[20:21:15] ottomata: let me see
[20:22:13] hm
[20:22:29] it does look like 0.0.94 is what you entered, i see the tag, pom changed, etc.
[20:23:06] nuria: the build artifacts say refinery-0.0.94-SNAPSHOT.pom
[20:23:24] ottomata: ya, they do
[20:23:39] OH
[20:23:49] hmm
[20:23:55] oh nuria that is a commit message
[20:23:58] from a previous commit
[20:24:17] i think maybe it is just the commit that jenkins made during the previous release
[20:24:28] somehow, the plugin includes it in this release?
[20:24:33] ottomata: oohhhh, RIGHTTTTTT
[20:25:03] dunno why but ¯\_(ツ)_/¯
[20:25:30] ottomata: k, continuing
[20:29:04] Analytics-EventLogging, Analytics-Kanban, EventBus, Core Platform Team Backlog (Watching / External), and 3 others: Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main - https://phabricator.wikimedia.org/T211248 (Ottomata)
[20:30:54] Analytics-EventLogging, Analytics-Kanban, EventBus, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Workboards (Clinic Duty Team): Use new event schema format for change-prop events - https://phabricator.wikimedia.org/T228318 (Ottomata) p:Triage→Normal
[20:47:55] Analytics, Wikimedia-Stream, Documentation: stream.wikimedia.org/?doc returns an error page - https://phabricator.wikimedia.org/T227958 (WDoranWMF) a:Ottomata Untagging CPT and adding @Ottomata, please let me know if this is incorrect.
[20:58:15] !Log deploying refinery 0.0.94
[21:11:22] Analytics, Wikimedia-Stream, Documentation: stream.wikimedia.org/?doc returns an error page - https://phabricator.wikimedia.org/T227958 (Ottomata) Thank you! Fixed. This was caused by an update to the service-template-node version we used. spec.yaml used by ?doc was not updated properly.
[21:11:27] Analytics, Analytics-Kanban, Wikimedia-Stream, Documentation: stream.wikimedia.org/?doc returns an error page - https://phabricator.wikimedia.org/T227958 (Ottomata)
[21:37:04] ottomata: still cannot deploy to an-coord1001.eqiad.wmnet
[21:39:15] https://www.irccloud.com/pastebin/ru1VoZQg/
[22:19:11] ottomata: or wait might be analytics1030.eqiad.wmnet
[22:21:50] Analytics: deployments to analytics1030 failing - https://phabricator.wikimedia.org/T228347 (Nuria)
[22:25:45] Analytics: deployments to analytics1030 failing - https://phabricator.wikimedia.org/T228347 (Nuria) *i think* the srv partition on analytics1030 is full and the deployment of today (and the one dan did last week) have failed. ` df: /mnt/hdfs: Input/output error Filesystem Size Us...
[22:38:10] Analytics: deployments to analytics1030 failing - https://phabricator.wikimedia.org/T228347 (Nuria) State of cache on an-coord1001 ` nuria@an-coord1001:/srv/deployment/analytics/refinery-cache/revs$ ls -la total 20 drwxr-xr-x 5 analytics-deploy analytics-deploy 4096 Jul 17 21:06 . drwxr-xr-x 4 analytics-de...
[22:40:39] Analytics, Analytics-Kanban, Wikimedia-Stream, Documentation: stream.wikimedia.org/?doc returns an error page - https://phabricator.wikimedia.org/T227958 (Ottomata) Hm, I think varnish still has /?spec cached on some frontends. Not sure how long it takes for that to get purged, but I assume it e...
[22:41:11] Analytics: deployments to analytics1030 failing - https://phabricator.wikimedia.org/T228347 (Nuria) I think we need to remove b8a496b174bfb8965090ceaa3ad0ef48db5eec61 from analytics1030 to be able to deploy but i cannot do it cause i do not have sudo, ping @Ottomata @elukey so either can remove and we can co...
[22:41:28] Analytics, Analytics-Kanban: deployments to analytics1030 failing - https://phabricator.wikimedia.org/T228347 (Nuria)
[22:42:17] Analytics, Analytics-Kanban: deployments to analytics1030 failing - https://phabricator.wikimedia.org/T228347 (Ottomata) Done.
[22:50:20] (PS13) Nuria: Add file extension and media type classification to media files UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/517641 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[23:30:03] (PS14) Nuria: Add file extension and media type classification to media files UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/517641 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[23:30:29] (CR) Nuria: "I have incorporated your new test to this changeset." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/517641 (https://phabricator.wikimedia.org/T225911) (owner: Fdans)
[23:38:20] Analytics, EventBus, Core Platform Team Backlog (Watching / External), Patch-For-Review, Services (later): Modern Event Platform: Stream Intake Service: Migrate change-prop events to new (EventGate) style schemas - https://phabricator.wikimedia.org/T226522 (Pchelolo) Ok, I've seem to have bee...
[23:39:48] Analytics-EventLogging, Analytics-Kanban, EventBus, Core Platform Team Workboards (Clinic Duty Team), and 2 others: Modern Event Platform: Stream Intake Service: Migrate eventlogging-service-eventbus events to eventgate-main - https://phabricator.wikimedia.org/T211248 (Pchelolo)