[00:51:30] Analytics, Product-Infrastructure-Data, Wikimedia-Logstash, observability: Create a separate logstash ElasticSearch index for schemaed events - https://phabricator.wikimedia.org/T265938 (Krinkle) @herron For context, this relates to what we did with mediawiki exception/error messages, which have...
[05:04:35] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:15:15] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:28:24] good morning
[06:28:38] after months and months, an-presto1004 is back in service!
[08:31:59] Wow - welcome back an-presto1004 :)
[08:34:56] (CR) Conniecc1: "> Patch Set 4:" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/607361 (https://phabricator.wikimedia.org/T256050) (owner: Conniecc1)
[09:16:03] (CR) Joal: [C: -1] "Comments inline" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/607361 (https://phabricator.wikimedia.org/T256050) (owner: Conniecc1)
[09:20:02] !log upgrade hue to 4.8.0 on hue-next
[09:20:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:48:44] Analytics, Analytics-Wikistats: stats.wikimedia.org not available in Finnish - https://phabricator.wikimedia.org/T266974 (Jnovikov) All right! Now Finnish version is 88 % complete.
[11:00:14] Hi, I am struggling to use gerrit from stat1008. I am trying to connect via ssh but get a time-out (it works from my local machine). I am checking `ssh -p 29418 -v mgerlach@gerrit.wikimedia.org` following the instructions here https://www.mediawiki.org/wiki/Gerrit/Tutorial#Add_SSH_Private_key_to_use_with_Git
[11:00:14] Is there something I am missing? thanks for any help.
[11:09:55] mgerlach: hi! there is a firewall that controls traffic going outside the analytics vlan, so that ssh is not working, have you tried https?
[11:10:07] there should already be a proxy configured for git
[11:10:39] please also don't push any private material on stat100x, not safe
[11:13:44] elukey: ah ok got it. yes, I can clone via http but wanted to push patches. I will try from my local machine then
[11:14:15] mgerlach: you can push patches via https with your user password in theory
[11:14:40] it will be asked when pushing, but best if you do it from your local machine yes
[11:14:50] elukey: thanks
[11:26:15] hey joal! Thanks for the code and especially for the idea of working with month and intermediary dataset, saved my life.
[11:27:42] hi elukey, how are you?
[11:28:47] dsaez: all good! How about you?
[11:29:47] good good.
[11:30:41] I'm having this problem with jupyter notebook failing to start. I had the same problem on other machines, and you gave me a pointer to solve it, but I didn't save it. It was something like deleting/resetting the venv. Can you please send me that pointer again?
[11:30:45] dsaez: before you ask, https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Resetting_user_virtualenvs
[11:30:52] ahahah okok yes --^
[11:30:53] hahaha
[11:37:07] thanks
[11:37:11] :)
[11:37:19] going afk for lunch!
[14:00:06] Analytics-EventLogging, Analytics-Radar, Event-Platform, Product-Infrastructure-Data, and 2 others: OperationError: The operation failed for an operation-specific reason in generateRandomSessionId - https://phabricator.wikimedia.org/T263041 (jlinehan)
[14:05:45] elukey: could you take a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/639050/
[14:06:12] jbond42: only because it is you
[14:06:23] <3
[14:07:31] jbond42: +1, do you want me to roll it out?
[14:08:11] elukey: no it's alright, it needs to be pushed at the same time as the earlier one in the change so I'll babysit and shout if I see issues, thanks
[14:09:54] ack!
[14:23:57] elukey: applied to stat1004 was no-op
[14:24:11] super
[14:31:48] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[14:31:50] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[14:39:52] heya team
[14:40:45] Hi mforns
[14:41:01] :]
[14:42:40] o/
[14:42:59] joal: if you want we can do the druid datasource bump
[14:59:20] very interesting to see how memory changes for an-coord1001 (after the reboot)
[14:59:23] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=4&orgId=1&refresh=5m&var-server=an-coord1001&var-datasource=thanos&var-cluster=analytics&from=now-24h&to=now
[14:59:37] cached memory increases really slowly, I expected something a bit different
[15:00:08] also used memory was higher before the reboot, weird
[15:14:05] ottomata: when you have a moment can we chat about https://phabricator.wikimedia.org/T266826 ?
[15:14:32] IIUC they are going to use the staging db on the dbstores, and I am a bit worried about disk space usage on those nodes
[15:17:13] elukey: i told him to ask the DBAs about that but I'd be fine with it
[15:17:23] if there is a problem then ya for sure speak up!
[15:17:52] elukey: i was told those tables would be around 1 or 2 GB
[15:17:55] too much?
[15:18:48] ah ok I didn't see any estimate and I was wondering about the size, and whether it would grow over time
[15:18:56] if it is 1-2 GB it's ok
[15:18:59] not that I know of?
[15:19:07] okok perfect
[15:19:11] i think each time they regenerate the data and test it there
[15:19:20] once they like it, they export it to prod
[15:20:11] okok +1
[15:20:30] as long as it is considered as testing/scratchpad it is fine
[15:32:47] * elukey throws hue out of the window and gets a coffee
[15:38:22] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[15:44:56] Analytics-Radar, Platform Engineering Roadmap Decision Making, Epic, MW-1.35-notes (1.35.0-wmf.32; 2020-05-12), and 2 others: Remove revision_comment_temp and revision_actor_temp - https://phabricator.wikimedia.org/T215466 (Kormat)
[15:46:29] Analytics, Product-Infrastructure-Data, Wikimedia-Logstash, observability: Create a separate logstash ElasticSearch index for schemaed events - https://phabricator.wikimedia.org/T265938 (herron) >>! In T265938#6601674, @Krinkle wrote: > @herron For context, this relates to what we did with mediaw...
[15:58:58] Analytics, Product-Infrastructure-Data, Wikimedia-Logstash, observability: Create a separate logstash ElasticSearch index for schemaed events - https://phabricator.wikimedia.org/T265938 (Ottomata) > Happy to discuss further! FWIW we do have an "o11y office hours" open slot on Mondays at 11:30 Eas...
[16:23:18] Analytics-Clusters, Operations: Segfault for systemd-sysusers.service on stat1007 - https://phabricator.wikimedia.org/T256098 (elukey) Haven't seen the issue for a while, maybe it is worth closing since there is already an upstream bug opened for Debian Buster. Thoughts?
[16:24:09] Analytics-Clusters, Operations, ops-eqiad, Patch-For-Review, User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (elukey) Reporting in here a chat with Chris - the maintenance is postponed to tomorrow (5th)
[16:25:25] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (elukey) >>! In T260445#6504248, @elukey wrote: > @Cmjohnson I checked the items listed in the package slip but I don't see the quantity, only the fa...
[17:39:49] Analytics-Radar, Better Use Of Data, Event-Platform, Product-Infrastructure-Data: mw.user.generateRandomSessionId should return a UUID - https://phabricator.wikimedia.org/T266813 (razzi)
[17:40:32] Analytics, Product-Analytics, Structured-Data-Backlog: Add image table to monthly sqoop list - https://phabricator.wikimedia.org/T266077 (razzi) a:JAllemandou
[17:41:41] Analytics, Analytics-Kanban, Product-Analytics, Structured-Data-Backlog: Add image table to monthly sqoop list - https://phabricator.wikimedia.org/T266077 (razzi)
[17:44:10] Analytics, Better Use Of Data, Product-Analytics: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T266834 (razzi) @mpopov we already have https://phabricator.wikimedia.org/T253393 with a similar scope; should we merge these?
[17:45:04] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[17:47:56] Analytics: Improve Refine - https://phabricator.wikimedia.org/T266872 (razzi) p:Triage→High
[17:50:52] Analytics: Improve Refine - https://phabricator.wikimedia.org/T266872 (JAllemandou) On using `count` to check how many dropped-rows: we could forget that the dataset also goes through deduplication and rows could be dropped for that reason as well.
[17:51:19] Analytics, Operations: Augment NEL reports with a computed timestamp-of-generation - https://phabricator.wikimedia.org/T266886 (razzi) @Ottomata could you take a look?
[17:53:17] Analytics, Analytics-Wikistats: stats.wikimedia.org not available in Finnish - https://phabricator.wikimedia.org/T266974 (razzi) p:Triage→High a:Milimetric
[17:56:12] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[18:01:39] (PS1) Dave Pifke: Make compatible with Python 3 [analytics/statsv] - https://gerrit.wikimedia.org/r/639223
[18:07:23] Analytics, Analytics-Kanban: analytics.wikimedia.org TLC - https://phabricator.wikimedia.org/T253393 (mpopov)
[18:07:30] Analytics, Better Use Of Data, Product-Analytics: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T266834 (mpopov)
[18:07:38] Analytics, Better Use Of Data, Product-Analytics: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T266834 (mpopov) @razzi: Good call, thanks!
[18:10:03] Analytics, Analytics-Kanban, Product-Analytics: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T253393 (mpopov)
[18:10:30] Analytics, Analytics-Kanban, Product-Analytics: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T253393 (mpopov)
[18:12:29] Analytics, Analytics-Kanban, Product-Analytics: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T253393 (mpopov)
[18:15:00] Analytics, Analytics-Kanban, Product-Analytics: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T253393 (mpopov)
[18:16:08] Analytics, Product-Analytics: Content for analytics.wikimedia.org - https://phabricator.wikimedia.org/T267254 (mpopov)
[18:17:06] Analytics, Analytics-Kanban, Product-Analytics, Epic: Revamp analytics.wikimedia.org data portal & landing page - https://phabricator.wikimedia.org/T253393 (mpopov)
[18:19:37] Analytics, Operations: Augment NEL reports with a computed timestamp-of-generation - https://phabricator.wikimedia.org/T266886 (Ottomata) @Cdanis and I need to discuss whether or not these events should ultimately go to Logstash or to Hive. I think this would be possible in either, but in Hive you could...
[18:43:49] mforns: wanna do a schema migration together? or at least start?
[18:44:18] ottomata: sure!
[18:44:20] bc?
[18:44:23] ya!
[18:44:26] omw
[18:59:04] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:01:49] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:04:13] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:06:11] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:09:29] (PS4) Joal: Improve webrequest-refine query [analytics/refinery] - https://gerrit.wikimedia.org/r/638086 (https://phabricator.wikimedia.org/T267008)
[19:12:43] ottomata: heya - I have a fun idea :) would you be available now, or should I wait until tomorrow to share it with you?
[19:17:12] @anyone: which of these would be closest to number of unique pageviews (unique people who've seen a page): COUNT(DISTINCT actor_signature) or COUNT(DISTINCT user_agent, ip) or something else
[19:18:57] lexnasser: We can't say "unique people", as the signatures (whether actor_signature or user_agent+ip or something else) are always devices, not users :)
[19:20:00] joal: got it, which do you think would be most relevant to targeting the privacy threats that concern the API?
[19:20:08] lexnasser: With that precision, I think the actor_signature is the closest thing we have - One thing though, is that it is NOT cross project (see https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/ActorSignatureGenerator.java)
[19:21:04] lexnasser: I also think this one https://phabricator.wikimedia.org/T258101 is relevant
[19:23:47] lexnasser: As for the relevance, I'm not really sure if it makes any difference (except for the mobile-apps)
[19:24:20] joal: by not cross project, do u just mean that if one device were to visit both french and english wikipedia, then that would be counted twice?
[19:24:37] correct lexnasser
[19:24:57] uri_host is part of the hash used as the signature
[19:25:18] joal: in bc with mforns
[19:25:29] ok joining
[19:26:04] joal: got it, thanks for clarifying, it really helped. I think that should be fine for this use case because we're only counting unique views per page, not per project.
[19:27:19] makes sense - and pages are not cross-project lexnasser :)
[19:32:20] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (wiki_willy) I followed up with Dell (during my regular meeting with them) about the status of the PSUs, and they said it was delivered on Nov...
[19:33:10] * elukey afk!
[19:33:27] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (wiki_willy) Tracking #935433832396
[19:39:43] Gone for tonight - see you tomorrow :)
[20:40:28] (CR) Ottomata: [C: +1] Improve webrequest-refine query [analytics/refinery] - https://gerrit.wikimedia.org/r/638086 (https://phabricator.wikimedia.org/T267008) (owner: Joal)
[21:08:23] (PS2) Dave Pifke: Make compatible with Python 3 [analytics/statsv] - https://gerrit.wikimedia.org/r/639223 (https://phabricator.wikimedia.org/T267269)
[21:30:15] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[21:57:12] (PS16) Bstorm: multiinstance: Attempt to make quarry work with multiinstance replicas [analytics/quarry/web] - https://gerrit.wikimedia.org/r/632804 (https://phabricator.wikimedia.org/T264254)
[21:57:46] (CR) Bstorm: multiinstance: Attempt to make quarry work with multiinstance replicas (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/632804 (https://phabricator.wikimedia.org/T264254) (owner: Bstorm)
[22:36:56] (PS17) Bstorm: multiinstance: Attempt to make quarry work with multiinstance replicas [analytics/quarry/web] - https://gerrit.wikimedia.org/r/632804 (https://phabricator.wikimedia.org/T264254)
[22:38:11] (CR) Bstorm: multiinstance: Attempt to make quarry work with multiinstance replicas (2 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/632804 (https://phabricator.wikimedia.org/T264254) (owner: Bstorm)
[22:45:40] Analytics-Radar, MediaWiki-extensions-WikibaseRepository, Wikidata, wdwb-tech-focus: ApiAction log in data lake doesn't record Wikibase API actions - https://phabricator.wikimedia.org/T174474 (Aklapper)
[22:45:46] Analytics-Radar, Developer-Advocacy, MediaWiki-API, Product-Infrastructure-Team-Backlog, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079 (Aklapper)
[22:46:35] Analytics-Radar, MediaWiki-API, Patch-For-Review, Platform Team Initiatives (Modern Event Platform (TEC2)), User-Addshore: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321 (Aklapper) Stalled→Open The previous comments don't e...
[23:16:34] PROBLEM - Disk space on Hadoop worker on an-worker1113 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/e 16 GB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration