[00:01:02] Analytics, Analytics-Kanban, Patch-For-Review: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (marcmiquel) By the way, I just found that in cawiki files for years 2011 and 2016 (I've just checked this one now) rows are not sorted. The file starts with tim...
[00:42:55] Analytics-Radar, Technical-blog-posts: Story idea for Blog: The Best Dataset on Wikimedia Content and Contributors - https://phabricator.wikimedia.org/T259559 (Milimetric) >>! In T259559#6394856, @srodlund wrote: > @Milimetric I will take a first pass at it (asynch), and then if we want to meet (synch) w...
[00:45:52] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:54:17] Analytics-Clusters, Operations, ops-eqiad: (Need By: TBD) upgrade ram in an-master100[12] - https://phabricator.wikimedia.org/T259162 (Jclark-ctr) @elukey. ram has arrived how long will it take to drain host? and how long can they be down?
[10:49:31] Querying the wmf.pageview_hourly table in hive feels incredibly expensive. What data source is pageviews.toolforge.org using instead?
[10:50:59] Or maybe there's currently some sort of contention with another user's 28-cpu job on stat1005 :-)
[10:59:24] The tool is using https://wikimedia.org/api/rest_v1/metrics/pageviews/, which seems to be based on Hadoop? So I should be able to make the same queries from my notebook...
[11:38:40] No commits to analytics/pageview-api in 8 years? This repo should be deprecated...
[11:49:35] Okay I found how restbase forwards to the Analytics Query Service...
[11:50:46] cassandra / druid. So I guess I should make API requests from my notebook, works for me!
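(Editor's note) The plan reached at 11:50:46, making API requests from a notebook instead of querying Hive, could look roughly like the sketch below. Only the base URL comes from the log. The `per-article` route layout, the default `access`/`agent`/`granularity` values, and the `User-Agent` string are assumptions to check against the API documentation, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path as quoted in the log; everything appended below is an assumption.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(project, title, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper).

    Titles use underscores instead of spaces and must be percent-encoded,
    including any slashes, so they stay a single path segment.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{BASE}/per-article/{project}/{access}/{agent}/"
            f"{article}/{granularity}/{start}/{end}")


def fetch_views(url, user_agent="notebook-example/0.1 (contact: you@example.org)"):
    """Fetch the URL and return the parsed JSON body (performs a network call)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    url = per_article_url("ca.wikipedia", "Barcelona", "20200801", "20200807")
    print(url)
    # data = fetch_views(url)  # uncomment in a notebook with network access
```

The URL builder is kept separate from the network call so the request can be inspected before it is sent. Note that this route needs an exact page title for every request.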
[13:10:21] awight: :) yeah AQS is backed by precomputed data in druid and cassandra, so the queries are much faster
[13:20:46] Starting build #1 for job wikimedia-event-utilities-maven-release-docker
[13:21:33] Project wikimedia-event-utilities-maven-release-docker build #1: FAILURE in 47 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/1/
[13:23:09] ottomata: That's an understatement! Thanks for the confirmation :-)
[13:28:43] Starting build #2 for job wikimedia-event-utilities-maven-release-docker
[13:29:49] Yippee, build fixed!
[13:29:50] Project wikimedia-event-utilities-maven-release-docker build #2: FIXED in 1 min 7 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/2/
[13:32:06] Analytics, Discovery-Search, Event-Platform: swift_upload.py events not making it into kafka - https://phabricator.wikimedia.org/T260743 (Ottomata) Ah! eventgate-analytics emits to the Kafka jumbo cluster, not the Kafka main clusters. I also don't see any recent events in that topic in Kafka jumbo,...
[13:43:55] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (hashar) @mforns and I paired the creation of the Je...
[13:45:03] awight: only downside of the API is you need to know page-titles
[14:24:42] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (mforns) Thanks for your help, @hashar :]
[14:34:53] anyone know if there's the equivalent of rsync (https://wikitech.wikimedia.org/wiki/Analytics/Systems/Clients#Rsync_between_clients) between the analytics clients and Cloud VPS system?
i suspect they're totally separate but wanted to see if there was a direct-ish way to move large files from stat100x to Cloud VPS instances
[14:38:40] Hi isaacj - I don't think there is anything
[14:39:27] The way I've done transfers is through file publication on stat machines, and download on cloudVPS - Not optimal :(
[14:40:28] joal: thanks, martin and i were talking about it and we suspected as much. thanks for the reminder re: file publication. we were thinking we had to use our local laptops as the go-between but you're right that for public data (these are largely just largeish machine learning models), it'd be reasonable to use that system
[14:41:09] mgerlach: ^^ https://wikitech.wikimedia.org/wiki/Analytics/Web_publication
[14:41:21] isaacj: right - also, the fact that the data is to be used on cloudVPS means it will be public, therefore can be published from stat machines
[14:42:51] yes, good point as well
[14:53:09] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Ottomata) YESSSSSS THANK YOU SO SO MUCH!
[15:00:57] ottomata, fdans: standup?
[15:01:07] omw
[15:05:45] ACHHG!
[15:05:46] OOPS
[15:11:53] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) Open→Declined I'm closing this ticket because it's no longer necessar...
[15:12:00] Analytics, Analytics-EventLogging, Event-Platform, Goal, Services (watching): Modern Event Platform: Stream Configuration - https://phabricator.wikimedia.org/T205319 (mpopov)
[16:06:11] milimetric: language q for you
[16:06:15] In both cases, we need to be able to find the schema of an event.
Confluent does this with opaque schema IDs which are used to look up schemas in their remote Schema Registry service. We do this for JSONSchemas with versioned URIs.
[16:06:18] in ^
[16:06:23] when I say 'opaque schema IDs'
[16:06:26] do you know what I mean?
[16:07:36] or mforns ^ ?
[16:08:25] ottomata: I know what you mean but I think that's because I know what you're talking about
[16:08:34] well ok
[16:08:38] more specifically
[16:08:45] i think i know what the definition of an 'opaque id' is
[16:08:47] if I didn't, maybe a little aside like "by opaque schema IDs, I mean IDs that aren't tied to ... "
[16:08:49] but i am trying to make sure i'm not wrong
[16:10:38] let's see, ya maybe i ask someone else...hmmm
[16:10:44] wait, so by opaque schema IDs I assume you mean that the schema IDs are created/managed by their central schema repository, right?
[16:10:52] yes
[16:10:54] (as in, not standalone)
[16:11:06] meaning they are just an int in the schema registry
[16:11:16] yeah, so why leave it at "opaque" in hope that people know that? Just spell it out
[16:11:21] which doesn't have meaning outside of that particular schema registry install
[16:11:26] hmm
[16:11:27] ok
[16:38:57] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (Ottomata) Declined→Open Ah, I'd like to keep this open! It is a real bug and...
[16:39:00] Analytics, Analytics-EventLogging, Event-Platform, Goal, Services (watching): Modern Event Platform: Stream Configuration - https://phabricator.wikimedia.org/T205319 (Ottomata)
[16:43:26] Analytics, Analytics-Kanban, Event-Platform, Product-Infrastructure-Data: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API - https://phabricator.wikimedia.org/T259917 (mpopov) p:High→Low Good call/catch!
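(Editor's note) The distinction discussed at 16:06 to 16:11 between an opaque schema ID and a versioned schema URI can be illustrated with a small sketch. It does not use Confluent's client or any WMF code. The in-memory `REGISTRY`, the example ID `42`, the `$schema` field name, and the URI `/test/event/1.0.0` are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-in for a remote schema registry: integer ID -> schema.
# The integer says nothing about the schema on its own; it only resolves
# inside this one registry install, which is what "opaque" means here.
REGISTRY = {42: {"title": "test/event", "version": "1.0.0"}}


def lookup_by_opaque_id(schema_id, registry):
    """Resolve an opaque ID; impossible without access to the registry."""
    return registry[schema_id]


def parse_versioned_uri(uri):
    """Split a versioned schema URI into (schema name, version).

    Unlike an opaque ID, the URI itself carries the name and version,
    so it stays meaningful outside any particular registry.
    """
    name, _, version = uri.strip("/").rpartition("/")
    return name, version


if __name__ == "__main__":
    # Event that references its schema through an opaque integer ID.
    opaque_event = {"schema_id": 42, "payload": {"x": 1}}
    print(lookup_by_opaque_id(opaque_event["schema_id"], REGISTRY))

    # Event that references its schema through a versioned URI
    # ("$schema" as the field name is an assumption for this sketch).
    uri_event = {"$schema": "/test/event/1.0.0", "x": 1}
    print(parse_versioned_uri(uri_event["$schema"]))
```

The URI form lets a reader or tool tell which schema and version an event claims without contacting a registry, while the integer form is only a key into one registry.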
[16:49:40] Analytics, MediaWiki-REST-API, Platform Team Sprints Board (Sprint 1), Platform Team Workboards (Green), Story: Figure out how to remove null values from access logs - https://phabricator.wikimedia.org/T260820 (Pchelolo)
[17:34:52] Analytics-Radar, Technical-blog-posts: Story idea for Blog: The Best Dataset on Wikimedia Content and Contributors - https://phabricator.wikimedia.org/T259559 (srodlund) @Milimetric Okay, I did a first pass of this. Please take a look and let me know if you have questions or want to synch up!
[17:47:39] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Data, Goal: BUOD-KR1-Q3: Require that all new schema/instruments are created with the MEP system - https://phabricator.wikimedia.org/T259157 (jlinehan)
[17:49:51] Analytics-Radar, Product-Analytics, Product-Infrastructure-Data, Wikipedia-Android-App-Backlog, and 2 others: [EPIC] Count unique iOS & Android users precisely and in a privacy conscious manner that does not require opt in to send data - https://phabricator.wikimedia.org/T202664 (jlinehan)
[17:59:06] ottomata: I wouldn't have known what it means initially, but seems a good word for the meaning anyway
[18:03:59] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install an-worker10[18-40] - https://phabricator.wikimedia.org/T260445 (wiki_willy) a:Cmjohnson→Jclark-ctr
[18:04:23] a-team, does anybody want to attend PA retro?
[18:04:53] it's just myself on the AE side...
[18:07:52] milimetric, otto ^ ?
[18:16:47] Analytics, MediaWiki-REST-API, Platform Team Sprints Board (Sprint 1), Platform Team Workboards (Green), Story: Figure out how to remove null values from access logs - https://phabricator.wikimedia.org/T260820 (Pchelolo) I have [[ https://gerrit.wikimedia.org/r/c/operations/deployment-charts/...
[18:17:20] mforns: i usually skip those, i've been to one or two and i don't have enough interaction with PA to make it worthwhile
[18:19:11] Analytics, Event-Platform: Eventgate crashes on invalid event - https://phabricator.wikimedia.org/T260839 (Pchelolo)
[18:23:10] Analytics, Discovery-Search, Event-Platform: swift_upload.py events not making it into kafka - https://phabricator.wikimedia.org/T260743 (EBernhardson) Open→Invalid Doh, indeed it's the jumbo instance. I incorrectly assumed the events weren't being produced. Also, for some reason setting off...
[18:23:12] Analytics, MediaWiki-REST-API, Patch-For-Review, Platform Team Sprints Board (Sprint 1), and 2 others: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812 (Pchelolo) Boom! ` kafkacat -b kafka-jumbo1001.eqiad.wmnet -t staging.api-gateway.request -p 0 -c 1...
[18:25:52] Analytics, Discovery-Search, Event-Platform: swift_upload.py events not making it into kafka - https://phabricator.wikimedia.org/T260743 (Ottomata) > Also, for some reason setting offset to 0 in kafkacat seems to give nothing, Weird! Did not expect that. You can also give negative offsets to -o, to...
[18:40:44] sorry mforns I had to run to the mechanic with the car, should’ve left a note
[18:46:00] Analytics: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (Ottomata)
[18:59:27] milimetric: no problemo!
[19:58:37] Analytics, MediaWiki-REST-API, Platform Team Sprints Board (Sprint 1), Platform Team Workboards (Green), Story: Figure out how to remove null values from access logs - https://phabricator.wikimedia.org/T260820 (Pchelolo) Filed a proposal for filtering nulls out in envoy https://github.com/env...
[20:14:59] a-team, am working on blog post about eventgate
[20:15:41] what do you think is more interesting to tech blog audience: how generic eventgate library works and can be used outside of WMF, or how WMF does things, like stream config, etc.?
[20:15:54] i think i'm going to write about both, but not sure which to start with
[20:29:10] ottomata: I think if you write from WMF's point of view, you can mention what pieces are stand-alone, and that would be interesting
[20:29:43] because EventGate is not yet widely used enough to be interesting on its own (hopefully will be soon!)
[20:49:27] well, the blog post is kinda meant to encourage that
[20:49:28] maybe.
[21:38:26] Analytics, Analytics-Kanban, Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (jwang) Summarized the findings and recommendations in [[ https://docs.google.com/document/d/1NgNjhAKwik9sw-APHtN2RtcNZT8PdopexZQgpPUQO...
[22:27:04] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (hashar) Forgot to capture, but from the discussion...
[23:05:18] Analytics-Radar, Product-Analytics (Kanban): Identify next steps for dealing with missing mobile app pageview counts - https://phabricator.wikimedia.org/T256804 (SNowick_WMF) [] Write incident report, stored in Product Analytics shared drive 'Data Incident Reports' [] Documenting fix process in ticket (t...