[00:25:28] (PS6) Paul Kernfeld: reader.py: Get all report keys from defaults [analytics/reportupdater] - https://gerrit.wikimedia.org/r/623060 (https://phabricator.wikimedia.org/T193171)
[06:15:15] Analytics-Clusters: Upgrade Kafka Brokers to Debian Buster - https://phabricator.wikimedia.org/T255123 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['kafka-jumbo1005.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202008310615_el...
[06:15:56] good morning!
[06:16:02] I am reimaging jumbo1005 to buster
[06:49:52] Analytics-Clusters: Upgrade Kafka Brokers to Debian Buster - https://phabricator.wikimedia.org/T255123 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['kafka-jumbo1005.eqiad.wmnet'] ` and were **ALL** successful.
[07:13:24] !log run kafka preferred-replica-election on Jumbo after jumbo1005's reimage
[07:13:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:18:28] Analytics-Clusters: hue.wikimedia.org throws an exception when trying to log in with a non-ASCII username - https://phabricator.wikimedia.org/T260929 (elukey) Hi, thanks for the report! We are trying to package the latest upstream version of Hue in T258768 as part of the migration to Buster, this issue might...
[07:30:25] Analytics-Clusters, Jupyter-Hub: Timeout during relaunch Jupyterhub server - https://phabricator.wikimedia.org/T258087 (elukey) @Rvvalentim are you still having issues?
[07:31:43] Analytics-Clusters, Operations, ops-eqiad, User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (elukey) @Jclark-ctr should we sync about this to schedule the first host (when you have time of course)?
[07:59:45] a-team: please welcome Tobias to this chan :) (nick: klausman)
[08:00:12] good morning klausman !!
welcome :)
[08:00:15] (as anticipated welcomes will pop up during the day :)
[08:00:18] *wave* hello everyone
[10:17:14] I noticed that the analytics mariadb replicas are read-only. What would I do for e.g. `create table awight.X as select ...`?
[10:17:39] awight: hello! There is a database called "staging" in which you can create tables
[10:18:00] elukey: magic! Thanks--I'll add that to https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB if that makes sense.
[10:18:04] it is on one node only iirc though
[10:18:13] O_O kk good hint
[10:18:27] if there is the need for more, we can add one for each dbstore
[10:18:34] lemme check
[10:19:11] awight: yes it is listed in https://wikitech.wikimedia.org/wiki/Analytics/Systems/MariaDB (look for "staging")
[10:19:13] Whenever I'm slightly outside the "normal" band of use cases, I suspect it's me who's doing something wrong.
[10:19:39] elukey: excellent, thanks again!
[10:20:08] awight: nono you just need some scratchpad, it makes sense! The main issue that I am thinking of is if you need to select * from something on host X and the staging db is on host Y
[10:20:28] +1 just running into that next question myself ;-)
[10:20:54] we'd probably need more staging dbs
[10:20:59] I can query into Python if necessary, just that it's sort of a heavy query so I'd love persistence and stuff.
[10:21:23] Maybe I can push more of my logic into sql so the back-and-forth of a pipeline goes away.
[10:22:43] as a general concern, since we don't have a ton of space on the dbstores, we'd prefer not to store big tables as scratchpad
[10:22:56] but it makes sense as use case (to have staging on all)
[10:23:12] we can work on it but we'll need to schedule some time
[10:28:51] elukey: Don't worry--this chat was very helpful and it's clear how I proceed now. Much better than me just wondering on my own!
[10:29:33] please ping us anytime!
[10:29:38] I'll just minimize the output, then store back to hadoop.
[10:29:45] super
[10:31:07] elukey: :-) I do! In fact, I secretly use your team as my go-to example of extreme support and generosity. Just let me know whenever my pestering is too much, always happy to wait in the queue e.g. on Phabricator.
[10:33:38] nah you are always welcome :)
[10:37:45] Keeping the dbs safely read-only is probably a good way to keep guests from wearing out their welcome.
[10:54:26] Great, this mariadb has common table expressions, so I don't have to get all subselecty.
[10:58:26] super
[11:47:24] Analytics, Operations, Traffic: Package varnish 6.0.x - https://phabricator.wikimedia.org/T261632 (ema)
[11:47:52] Analytics, Operations, Traffic: Package varnish 6.0.x - https://phabricator.wikimedia.org/T261632 (ema)
[11:48:22] Analytics, Operations, Traffic: Package varnish 6.0.x - https://phabricator.wikimedia.org/T261632 (ema) p:Triage→Medium
[11:53:23] I am going to reimage another kafka broker!
[11:53:32] Hi folks :)
[11:53:57] awight: question for you - why not use DBs dumped on the cluster?
[11:54:15] awight: I have potential answers in mind, your view is of interest :)
[12:01:10] Analytics-Clusters, Patch-For-Review: Upgrade Kafka Brokers to Debian Buster - https://phabricator.wikimedia.org/T255123 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['kafka-jumbo1001.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-r...
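awight's 10:54 note that this MariaDB has common table expressions (CTEs) is about readability. A CTE gives an intermediate result a name instead of nesting it as a subselect, and it exists only for the statement that defines it, so nothing is written to the read-only replica. The sketch below expresses one hypothetical aggregation both ways. Python's built-in `sqlite3` stands in for MariaDB so the example runs without a server, and the `edits` table and its columns are invented for illustration. What matters is the SQL shape, not the engine.

```python
import sqlite3

# Hypothetical toy table; sqlite3 stands in for MariaDB so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE edits (wiki TEXT, user_id INTEGER, conflict INTEGER);
    INSERT INTO edits VALUES
        ('dewiki', 1, 1), ('dewiki', 1, 0), ('dewiki', 2, 1),
        ('frwiki', 3, 0), ('frwiki', 3, 0);
    """
)

# Subselect style: the per-user aggregate is nested inside FROM.
NESTED = """
    SELECT wiki, COUNT(*) AS users_with_conflicts
    FROM (
        SELECT wiki, user_id, SUM(conflict) AS n
        FROM edits
        GROUP BY wiki, user_id
    ) AS per_user
    WHERE n > 0
    GROUP BY wiki
    ORDER BY wiki
"""

# CTE style: the same intermediate result is named up front and read top-down.
# It exists only while this one statement runs; nothing is persisted.
WITH_CTE = """
    WITH per_user AS (
        SELECT wiki, user_id, SUM(conflict) AS n
        FROM edits
        GROUP BY wiki, user_id
    )
    SELECT wiki, COUNT(*) AS users_with_conflicts
    FROM per_user
    WHERE n > 0
    GROUP BY wiki
    ORDER BY wiki
"""

nested_rows = conn.execute(NESTED).fetchall()
cte_rows = conn.execute(WITH_CTE).fetchall()
print(nested_rows, cte_rows)
```

Both queries return the same rows. The CTE form just reads in the order the logic runs.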
[12:04:25] (PS1) Awight: Summarize beta opt-out situation on small default wikis [analytics/wmde/TW/edit-conflicts] - https://gerrit.wikimedia.org/r/623338 (https://phabricator.wikimedia.org/T261491)
[12:07:40] (PS2) Awight: Summarize beta opt-out situation on small default wikis [analytics/wmde/TW/edit-conflicts] - https://gerrit.wikimedia.org/r/623338 (https://phabricator.wikimedia.org/T261491)
[12:26:24] Analytics-Clusters: Upgrade Kafka Brokers to Debian Buster - https://phabricator.wikimedia.org/T255123 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['kafka-jumbo1001.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202008311226_el...
[12:51:36] Analytics-Clusters: Upgrade Kafka Brokers to Debian Buster - https://phabricator.wikimedia.org/T255123 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['kafka-jumbo1001.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202008311251_el...
[12:56:58] Analytics, Operations, Traffic: Package varnish 6.0.x - https://phabricator.wikimedia.org/T261632 (ema) When it comes to varnish-modules, our current version (0.12.1-1+wmf2) does not build against 6.0.x, and same goes for varnish-modules 0.16.0 currently in testing. Luckily though, with a few changes...
[13:31:22] (CR) Joal: Chopping timeseries for noise detection (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/612454 (https://phabricator.wikimedia.org/T257691) (owner: Nuria)
[13:43:28] !log run kafka preferred-replica-election on Jumbo after jumbo1001's reimage
[13:43:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:20:15] joal: You mean, process /mnt/data/xmldatadumps ?
In my case there was a much simpler way to get the numbers I wanted, so the encouragement not to create a bunch of temporary tables was helpful: https://gitlab.com/adamwight/conflict-query/-/blob/optout/reports/Opt-out%20modeling.ipynb
[14:20:58] The CTEs are temporary tables, actually. But there's no need to persist them.
[14:21:11] awight: in meeting, will read after
[14:21:16] +1
[14:23:44] ottomata: helloooo just pushed another commit to the jsonschematools pr
[14:24:12] oo k ty
[14:24:14] will look
[14:42:54] hi klausman! I'm getting a late start today, but excited to have you onboard!
[14:43:22] \o thanks for the warm welcome
[14:55:59] awight: thanks for the snippet :) I meant using the sqooped versions of mysql tables we have on the cluster - from the snippet, we are missing user_properties though
[15:01:21] Hey klausman - we all are in another room - https://meet.google.com/rxb-bjxn-nip
[15:01:53] klausman: we move@!
[15:03:07] ping razzi
[15:26:43] Analytics, Patch-For-Review: Fix TLS certificate location and expire for Hadoop/Presto/etc.. and add alarms on TLS cert expiry - https://phabricator.wikimedia.org/T253957 (jbond) >>! In T253957#6351773, @elukey wrote: > I recently discovered that we have `base::expose_puppet_certs` in puppet. The class i...
[15:38:00] razzi: did you get invite
[15:38:11] Yep!
[15:38:13] razzi: https://meet.google.com/rxb-bjxn-nip
[15:40:29] ping joal , standup?
[15:40:53] nuria: I'm with chris talking about hiring
[15:41:01] sent e-scrum nuria
[15:41:10] joal: k
[15:48:39] leila: Hi!
I just met with chris and we updated our part of the interview doc - we agreed on the strategy of start-question leading to technical discussion with questions deepening the aspects we want to check - Please let us know if that fits for you
[15:49:14] Analytics-Clusters: AMD ROCm kernel drivers on stat1005/stat1008 don't support some features - https://phabricator.wikimedia.org/T260442 (Nuria) a:klausman
[16:01:19] a-team will be slightly late to grosking, gotta attend a call sorry
[16:07:33] joal: Ooh very cool to learn about, thanks! Yeah my little task is finished, but I'll keep the imported tables in mind. Maybe worth mentioning on the "Data access" wiki page...
[16:11:24] joal: coming to groskin?
[16:16:24] Analytics, Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team: Invalid EventLogging messages for NavigationTiming topic - https://phabricator.wikimedia.org/T261665 (awight)
[16:17:26] Analytics, EventStreams: KafkaSSE: Cannot write SSE event, the response is already finished - https://phabricator.wikimedia.org/T261556 (Ottomata) I am pretty sure this is normal; this happens when the HTTP connection is closed (either by the client or the server timeout), but the KafkaSSE consume loop h...
[16:18:07] Analytics, Analytics-Wikistats: Wikistats Bug - View numbers contradictory - https://phabricator.wikimedia.org/T261565 (Milimetric) p:Triage→Low
[16:21:37] Analytics-Radar, MediaWiki-extensions-NavigationTiming, Performance-Team: Invalid EventLogging messages for NavigationTiming topic - https://phabricator.wikimedia.org/T261665 (Milimetric) ping @Gilles, you were asking about this last week, here are some more examples of validation errors.
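The two `!log` entries above (07:13 and 13:43) record a preferred-replica election run after each broker reimage. In Kafka, a partition's preferred leader is the first broker in its replica list. While a broker is down, other replicas take over leadership, and after it comes back that leadership may stay put until a rebalance happens. Running the election moves it back right away. The sketch below only models which partitions such an election would affect. It does not talk to Kafka, and the topic names, broker ids and state are made up.

```python
from typing import Dict, List


def partitions_needing_election(
    replicas: Dict[str, List[int]],
    leaders: Dict[str, int],
    isr: Dict[str, List[int]],
) -> List[str]:
    """Partitions whose leader is not the preferred replica but could be moved back.

    Kafka treats the first broker in a partition's replica list as its
    preferred leader; leadership can only move to it if it is in the ISR
    (in-sync replica set).
    """
    result = []
    for partition in sorted(replicas):
        preferred = replicas[partition][0]
        if leaders[partition] != preferred and preferred in isr[partition]:
            result.append(partition)
    return result


# Made-up example: broker 1005 was reimaged, so other replicas took over leadership.
replicas = {
    "topicA-0": [1005, 1001, 1002],
    "topicA-1": [1001, 1005, 1003],
    "topicA-2": [1005, 1002, 1004],
}
leaders = {"topicA-0": 1001, "topicA-1": 1001, "topicA-2": 1002}
isr = {
    "topicA-0": [1005, 1001, 1002],  # 1005 has caught up here
    "topicA-1": [1001, 1005, 1003],
    "topicA-2": [1002, 1004],        # 1005 is not back in sync yet
}
print(partitions_needing_election(replicas, leaders, isr))  # ['topicA-0']
```

topicA-2 is skipped until 1005 rejoins its ISR. This is presumably why the election is run after a reimage finishes and the broker has caught up, not immediately after boot.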
[16:24:46] Analytics, Analytics-Kanban, EventStreams: KafkaSSE: Cannot write SSE event, the response is already finished - https://phabricator.wikimedia.org/T261556 (Milimetric) p:Triage→High a:Milimetric Maybe we should change the level of this to warn.
[16:25:58] Analytics, Analytics-Wikistats, I18n: Add link to translatewiki.net in wikistats footer - https://phabricator.wikimedia.org/T261502 (Milimetric) p:Triage→High
[16:26:16] Analytics-Radar, MediaWiki-extensions-NavigationTiming, Performance-Team: Invalid EventLogging messages for NavigationTiming topic - https://phabricator.wikimedia.org/T261665 (Ottomata) This is actually good news. This is a consequence of a fix for {T254606}. Browsers are nasty and send bad data som...
[16:30:31] Analytics-Radar, Operations, Traffic, Patch-For-Review: Package varnish 6.0.x - https://phabricator.wikimedia.org/T261632 (Milimetric)
[16:33:29] Analytics: Statement of work for new designer in wikistats - https://phabricator.wikimedia.org/T223478 (Milimetric)
[16:34:39] Analytics, Analytics-Kanban: Statement of work for new designer in wikistats - https://phabricator.wikimedia.org/T223478 (Milimetric) Open→Resolved
[16:36:12] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Add dimensions to editors_daily dataset - https://phabricator.wikimedia.org/T256050 (Milimetric) a:Milimetric→Nuria
[16:39:48] Analytics: Add wikitech (labswiki) to the sqoop list - https://phabricator.wikimedia.org/T217792 (Milimetric) Future us: this parent/child graph is backwards, the blocking task of moving wikitech to a normal DB cluster is not done.
[16:41:34] Analytics-Radar, Commons, Epic: Provide download statistics of files on Wikimedia Commons - https://phabricator.wikimedia.org/T218076 (Milimetric) Moving this to radar until we see some update to instrumentation and we can revisit.
[16:46:01] razzi: https://meet.google.com/erk-ijvf-uqh
[16:46:31] Analytics, Analytics-Wikistats, I18n: Add link to translatewiki.net in wikistats footer - https://phabricator.wikimedia.org/T261502 (Milimetric)
[16:46:41] Analytics, Analytics-Wikistats, I18n, good first task: Add link to translatewiki.net in wikistats footer - https://phabricator.wikimedia.org/T261502 (Milimetric)
[16:47:46] Analytics, Analytics-Dashiki: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996 (Milimetric) Open→Declined (better done in Superset at this point)
[16:48:08] Analytics: Edit analysis dashboard Failures by User Type chart does not update correctly - https://phabricator.wikimedia.org/T148656 (Milimetric) Open→Declined (no longer relevant with that dashboard undeployed)
[16:50:43] Analytics-Radar, Commons, Structured-Data-Backlog, Epic: Provide download statistics of files on Wikimedia Commons - https://phabricator.wikimedia.org/T218076 (CBogen)
[16:55:09] Analytics, Analytics-Dashiki: Dashiki Cleanup - https://phabricator.wikimedia.org/T168573 (Milimetric)
[16:55:30] Analytics, Analytics-Dashiki: Dashiki Cleanup - https://phabricator.wikimedia.org/T168573 (Milimetric) p:Triage→Low
[16:55:48] Analytics-Clusters: AMD ROCm kernel drivers on stat1005/stat1008 don't support some features - https://phabricator.wikimedia.org/T260442 (elukey) Background info for @klausman: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/AMD_GPU We have two hosts with one AMD GPU each (stat1005.eqiad.wmnet...
[16:56:06] Analytics, Analytics-Dashiki: Switch to fetch away from jquery - https://phabricator.wikimedia.org/T148053 (Milimetric) Open→Declined nah
[16:56:37] Analytics, Analytics-Dashiki: dashiki should execute tests on jenkins - https://phabricator.wikimedia.org/T156657 (Milimetric) Open→Declined
[17:01:59] Analytics, Cloud-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (Milimetric) p:High→Low This needs further consideration.
[17:02:13] Analytics: Fix EventLogging editCountBucket fields historically - https://phabricator.wikimedia.org/T169674 (Milimetric) Open→Declined unneeded, work already done in other ways
[17:03:09] nuria: just came back from bringing Naé to school integration - I should have mentioned in chan - sorry
[17:03:33] Analytics, Cloud-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (Milimetric)
[17:03:47] Analytics: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (Milimetric) Open→Declined Presto may not work here, unless we just poke a hole to a production cluster. We'll have it for...
[17:04:32] Analytics-Kanban: Create a set of celery tasks that can handle the global metric API input {kudu} [0 pts] - https://phabricator.wikimedia.org/T117288 (Milimetric)
[17:04:39] Analytics: Expose the results of the global metric at a public link, that's available immediately for the API {kudu} - https://phabricator.wikimedia.org/T118310 (Milimetric) Open→Declined outdated, no further work on wikimetrics
[17:05:49] Analytics, Analytics-Wikistats, Patch-For-Review: Wikistats2: Linting - https://phabricator.wikimedia.org/T208697 (Milimetric) Open→Declined outdated now, though I still have the patches and I'll try to honor their spirit in a later refactor.
[17:09:28] Analytics, Operations, Research, WMF-Legal, User-Elukey: Enable layered data-access and sharing for a new form of collaboration - https://phabricator.wikimedia.org/T245833 (Milimetric) We have to think more about how to accomplish this, taking into account all the security implications we've...
[17:09:37] Analytics: Investigate AQS cassandra schema hash warninga - https://phabricator.wikimedia.org/T178832 (Milimetric) Open→Declined Haven't seen this in a while, maybe it went away!
[17:11:16] Analytics: Webrequest tagging and distribution. Measuring non-pageview requests - https://phabricator.wikimedia.org/T164019 (Milimetric) Open→Declined we can/should do this in the event way
[17:11:53] Analytics-Radar, MediaWiki-extensions-NavigationTiming, Performance-Team: Invalid EventLogging messages for NavigationTiming topic - https://phabricator.wikimedia.org/T261665 (Gilles) Oh yes, we've known for years that browsers send junk values for these APIs regularly. There's not much we can do abo...
[17:13:15] Analytics: Increase topojson resolution: Singapore does not appear on wikistats map - https://phabricator.wikimedia.org/T199571 (Milimetric) p:Medium→Low (one idea would be to download an increased-resolution version)
[17:15:01] Analytics, Analytics-Wikistats, Patch-For-Review: Improve scoping of CSS - https://phabricator.wikimedia.org/T190915 (Milimetric) p:Medium→Low
[17:18:41] Analytics: Report updater should support Graphite mapping plugins - https://phabricator.wikimedia.org/T152257 (Milimetric) Open→Declined outdated
[17:19:13] Analytics, Beta-Cluster-Infrastructure: Set up a fake Pageview API endpoint for the beta cluster - https://phabricator.wikimedia.org/T150483 (Milimetric) Open→Declined
[17:21:32] Analytics: Make sure pageview API limits are well documented - https://phabricator.wikimedia.org/T261681 (Milimetric)
[17:28:10] Analytics, Analytics-Dashiki: Breakdowns should be bookmarkeable - https://phabricator.wikimedia.org/T136127 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:28:16] Analytics, Analytics-Dashiki: Simplify readiness checking by making a ready computed - https://phabricator.wikimedia.org/T136025 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:28:27] Analytics, Analytics-Dashiki: Clean up property passing in dashiki - https://phabricator.wikimedia.org/T132691 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:28:44] Analytics, Analytics-Dashiki, good first task: Detect bad hash in tabs layout - https://phabricator.wikimedia.org/T219235 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:28:47] Analytics, Analytics-Dashiki, CX-analytics: Dashiki: CX2 translations fails to load - https://phabricator.wikimedia.org/T217506 (Milimetric)
[17:29:28] Analytics, Analytics-Dashiki: Optionally do not sort columns in table-timeseries alphabetically - https://phabricator.wikimedia.org/T189125 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:29:33] Analytics, Analytics-Dashiki: Add a legend for annotations - https://phabricator.wikimedia.org/T189164 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:29:35] Analytics, Analytics-Dashiki, good first task: Add annotationsMetric option to tabs layout - https://phabricator.wikimedia.org/T189159 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:29:37] Analytics, Analytics-Dashiki: Make it possible to suppress the box in the bottom left of dygraphs-timeseries graphs - https://phabricator.wikimedia.org/T189069 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:29:39] Analytics, Analytics-Dashiki: publish mediawiki deployments as a metric tsv - https://phabricator.wikimedia.org/T189156 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:29:47] Analytics, Analytics-Dashiki: Clean up remaining Dashiki configs on meta - https://phabricator.wikimedia.org/T159269 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:31:08] Analytics, Analytics-Dashiki, Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233 (Milimetric) Open→Declined Do follow-up if somehow this is still relevant
[17:31:39] Analytics, Analytics-Dashiki: Dashboards working on mobile - https://phabricator.wikimedia.org/T144299 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:31:57] Analytics, Analytics-Dashiki, good first task: Add external link to tabs layout - https://phabricator.wikimedia.org/T146774 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:32:00] Analytics, Analytics-Dashiki: Improve initial load performance for dashiki dashboards - https://phabricator.wikimedia.org/T142395 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:32:01] * elukey going offline! ttl
[17:32:25] Analytics, Analytics-Dashiki, good first task: Add error component to Dashiki - https://phabricator.wikimedia.org/T157697 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:32:26] Analytics, Analytics-Dashiki: Create a Universal Layout for Dashiki for staging / testing config purposes - https://phabricator.wikimedia.org/T147009 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:32:29] Analytics, Analytics-Dashiki: Just an idea: poly-graph - https://phabricator.wikimedia.org/T148469 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:32:30] Analytics, Analytics-Dashiki: Allow clicking on links in annotations - https://phabricator.wikimedia.org/T110459 (Milimetric) Open→Declined manually merged into T168573, to make it easier to triage and decide on the future of Dashiki.
[17:33:18] Analytics-Clusters: AMD ROCm kernel drivers on stat1005/stat1008 don't support some features - https://phabricator.wikimedia.org/T260442 (Nuria) cc-ing @calbon
[17:34:53] Analytics, Analytics-Dashiki: Dashiki Cleanup - https://phabricator.wikimedia.org/T168573 (Milimetric)
[17:35:46] (most of that nonsense was merging all dashiki tasks into that last one ^)
[17:43:51] Analytics-Clusters: hue.wikimedia.org throws an exception when trying to log in with a non-ASCII username - https://phabricator.wikimedia.org/T260929 (matmarex) I don't need Hue access right now, so it's not really high priority, but I might need it in the future. Also, I'm not sure if Hue expects the wiki...
[17:53:44] Analytics-Clusters: AMD ROCm kernel drivers on stat1005/stat1008 don't support some features - https://phabricator.wikimedia.org/T260442 (MoritzMuehlenhoff) >>! In T260442#6424247, @elukey wrote: > > 1) add the http://repo.radeon.com/rocm/apt/3.3/pool/main/r/rock-dkms/ package to our internal apt's reposito...
[18:32:18] Analytics-Radar, MediaWiki-extensions-NavigationTiming, Performance-Team (Radar): Invalid EventLogging messages for NavigationTiming topic - https://phabricator.wikimedia.org/T261665 (Gilles)
[19:49:09] nuria: helllOooo! if you have a moment would not mind another brain bounce on event ingestion stuff; got a q about how you'd do a thing with builder pattern
[19:49:27] ottomata: free in 15 mins
[19:49:31] k pfct
[19:55:09] ottomata: question about kafka and jobrunner if you have a minute (cc ryankemper)
[19:55:18] sure!
[19:55:26] https://grafana.wikimedia.org/explore?orgId=1&left=%5B%22now-12h%22,%22now%22,%22codfw%20prometheus%2Fops%22,%7B%22expr%22:%22kafka_burrow_partition_lag%7B%20%20%20%20group%3D%5C%22cpjobqueue-cirrusSearchElasticaWrite%5C%22,%20%20%20%20topic%3D~%5C%22%5B%5B:alpha:%5D%5D*.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite%5C%22%7D%5Cn%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D
[19:55:27] i'm only kinda familiar with jobrunner
[19:55:40] since the DC switch, we see activity on codfw (as expected)
[19:56:00] k
[19:56:10] we don't see activity on eqiad anymore
[19:56:10] https://grafana.wikimedia.org/explore?orgId=1&left=%5B%22now-12h%22,%22now%22,%22eqiad%20prometheus%2Fops%22,%7B%22expr%22:%22kafka_burrow_partition_lag%7B%20%20%20%20group%3D%5C%22cpjobqueue-cirrusSearchElasticaWrite%5C%22,%20%20%20%20topic%3D~%5C%22%5B%5B:alpha:%5D%5D*.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite%5C%22%7D%5Cn%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D
[19:56:21] BUT that graph has not gone down to zero
[19:56:57] I'm understanding "kafka_burrow_partition_lag" as the consumer lag, so a measure of how much behind the consumers are (am I correct)
[19:58:00] yes
[19:58:26] So we no longer have a consumer in eqiad processing that queue, correct?
[19:58:30] the lag is calculated from the latest offset - the last committed consumer offset
[19:58:41] so that means that we have non consumed stuff in that queue, that will get processed when we switch back to eqiad?
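Here is the 19:58:30 definition written out per partition: lag = latest (log-end) offset minus last committed consumer offset. The offsets below are illustrative and are not taken from the linked graphs. The sketch also shows why the graph can flatten at a nonzero value instead of draining. Once eqiad stops receiving new jobs, the log-end offsets stop moving, so the lag changes only if the consumer group commits further offsets.

```python
from typing import Dict


def partition_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: latest (log-end) offset minus last committed offset.

    Assumes every partition in log_end already has a committed offset; a real
    exporter has to decide how to report partitions with no commit yet.
    """
    return {p: log_end[p] - committed[p] for p in log_end}


# Illustrative numbers only (not taken from the graphs in the log).
log_end = {0: 10_000, 1: 12_000}
committed = {0: 8_500, 1: 11_300}

lag = partition_lag(log_end, committed)
total = sum(lag.values())
print(lag, total)

# With no new messages, log_end no longer moves; the lag can only drop if the
# consumer group commits further offsets (here: everything consumed).
caught_up = partition_lag(log_end, {0: 10_000, 1: 12_000})
print(caught_up)  # {0: 0, 1: 0}
```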
[19:59:07] i believe so, although, i'm curious as to why it would stop consuming in eqiad
[19:59:16] i think that for jobqueue stuff
[19:59:22] that should not be an issue in itself, but it is annoying, because we use that metric during cluster restarts to wait until all updates have been processed
[19:59:27] the workers aren't switched over
[19:59:34] it's just new job submission that is
[19:59:38] so new jobs will be created only in codfw
[19:59:45] but the workers in both DCs should keep running
[19:59:58] and i'd expect the eqiad ones to finish in eqiad
[20:00:05] that would make a lot of sense! but then, why is that offset stuck at 4k?
[20:00:11] maybe there is some weird consumer batching?
[20:00:15] is that possible?
[20:00:20] we probably should ask Pchelolo
[20:01:01] lemme read the backscroll
[20:01:55] we've seen this weirdness before when lag just freezes
[20:02:01] * gehel needs to stop working, but ryankemper will follow up
[20:02:08] * gehel will read the backlog tomorrow
[20:02:36] I guess one place to start is, where do these consumers live so I can start poking at them
[20:02:38] I've investigated it a bit in relation to mediamoderation jobs that got stuck too
[20:02:51] and I concluded it was a bug in burrow
[20:03:10] oh and as far as the weirdness you mentioned Pchelolo, are you imagining the consumer itself being stuck, or rather that the consumer is still processing events but the offset never got updated?
[20:03:35] ottomata: bc?
[20:03:36] I do not think consumer is stuck
[20:03:55] mmmm... I have an idea, gimme a moment
[20:04:07] * nuria wonders why i cannot see those graphs
[20:04:28] are you logged into grafana?
[20:04:52] nuria: coming
[20:06:31] no, never mind about the idea..
[20:07:14] I think the consumers are not stuck, I think burrow is being the problem
[20:07:33] joal: thanks for the update re the interview questions. I will do a pass this afternoon PST and you should have my comments in for when you wake up tomorrow.
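On the "weird consumer batching" question: later in the log (20:15 to 20:17), Pchelolo explains that jobqueue change-prop does not commit after every message. Jobs finish out of order, so finished offsets are collected in memory and a commit watermark is computed periodically. Below is a minimal sketch of that idea for a single partition. It is not change-prop's implementation, and the class and method names are hypothetical. It shows how the committed offset can stay put even though later messages are already done.

```python
from typing import Set


class WatermarkCommitter:
    """Hypothetical per-partition tracker for out-of-order completions.

    Kafka convention: the committed offset is the *next* offset to read,
    so once offsets 100..102 are all done the value to commit is 103.
    """

    def __init__(self, next_offset: int) -> None:
        self.next_offset = next_offset  # lowest offset not yet known to be finished
        self.finished: Set[int] = set()  # finished offsets above the watermark

    def mark_finished(self, offset: int) -> None:
        if offset >= self.next_offset:
            self.finished.add(offset)

    def advance(self) -> int:
        """Run periodically (the log mentions every 500 ms); returns the offset to commit."""
        while self.next_offset in self.finished:
            self.finished.remove(self.next_offset)
            self.next_offset += 1
        return self.next_offset


committer = WatermarkCommitter(next_offset=100)
committer.mark_finished(101)
committer.mark_finished(102)
print(committer.advance())  # 100: offset 100 is still in flight, nothing new to commit
committer.mark_finished(100)
print(committer.advance())  # 103: 100..102 are contiguous and done
committer.mark_finished(104)
print(committer.advance())  # 103: 104 is done, but 103 is not
```

If the periodic commit is only rescheduled when a message finishes, a watermark computed after the last job could go unwritten once traffic stops. That is one of the hypotheses raised later in the thread, weighed against a Burrow bug. The sketch does not settle which was the cause.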
[20:10:32] Pchelolo: I'm not super familiar with how burrow works under the hood, but from some googling it sounds like it's basically tailing the `__consumer_offsets` special queue?
[20:12:24] hm.. interesting. kafka consumer-groups --describe --group cpjobqueue-cirrusSearchElasticaWrite agrees with burrow
[20:13:59] Is there a better metric we could check to know if the elastic write queue is lagging ?
[20:15:56] hm.. so the commits in jobqueue change-prop are not done for each message processed since the messages can be processed out of order
[20:16:27] so the finished message offsets are accumulated in memory, and then once every 500ms we find the right watermark to commit
[20:16:55] and the timeout is set by messages which finished processing
[20:17:42] So I could imagine that if we suddenly stop having jobs in the topic, we could maybe somehow lose the last commit batch
[20:17:56] but 5000 messages are way too much to accumulate over 500ms
[20:21:11] I guess I'll file a task
[20:28:43] nuria: https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/623413/5/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/WikimediaEventStreamFactory.java
[20:28:47] ty!
[20:33:07] ottomata: k
[20:44:41] Every now and then, I feel trolled when a search intended as being technical gives decidedly non-technical results https://usercontent.irccloud-cdn.com/file/XPTjmAik/puppet-types.png
[20:44:52] hahah
[20:45:02] razzi: juas
[20:45:47] ottomata: and what if the factory.builder().build() method (no calling of settings) ALWAYS returned a wmf-stream-factory
[20:46:04] ottomata: but
[20:46:09] https://www.irccloud.com/pastebin/nLn0Y3A4/
[20:46:26] if you want something else you call it like that
[20:46:29] like:
[20:46:58] EventStreamFactory f = EventStreamFactory.builder().
build() => will build WMFStreamfactory by default
[20:47:02] mmmm, not sure
[20:48:02] nuria: i had that before too, i also like that, but it means that wmf production defaults will be in EventStreamFactory
[20:48:03] actually
[20:48:10] nuria: there are two things EventStreamFactory needs
[20:48:17] EventSchemaLoader and EventStreamConfig
[20:48:27] the defaults really are specific to each of those
[20:48:34] there are wmf production default schema_base_uris
[20:48:45] and wmf production default event_stream_config_uri and event_service_to_uri_map
[20:48:56] i could even put those defaults in their relevant classes
[20:49:04] then you could even instantiate those classes individually
[20:49:09] with wmf production defaults
[20:49:31] i had something like this originally, but again, jo al thought it was weird to put the wmf prod defaults hardcoded into the generic classe
[20:49:32] s
[20:49:45] even if they are only used as defaults
[20:50:01] i really don't mind either way, this isn't some library that is intended for use outside of wmf prod anyway
[20:51:35] ottomata: our default case is that we use wmf defaults ya?
[20:52:02] if we want it to be, we can make it that way
[20:57:50] ottomata: I think something like: EventStreamFactory.build() => returns StreamFactory with WMF defaults
[20:58:21] ottomata: I think something like: EventStreamFactory.builder().setSome().build() => returns stream factory with some other defaults
[20:58:28] what about
[20:58:33] EventStreamFactory.builder().build()
[20:58:35] ?
[20:58:43] since .build() is a method on EventStreamFactoryBuilder
[20:58:45] ottomata: that too
[20:58:49] ok
[20:58:54] will update patch...
[20:59:28] ottomata: is that satisfactory or does it feel something is missing?
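The two agree on a shape: `EventStreamFactory.builder().build()` with no setters returns a factory configured with WMF production defaults, and setters override individual pieces. Below is a minimal sketch of that shape. It is written in Python, not the project's Java, and the method names mirror the discussion but are hypothetical. Of the default values, only the eventgate-main URI is quoted from the log; the rest are placeholders, not the real configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Defaults kept as named module-level constants. Only the eventgate-main URI is
# quoted from the log; the other values are hypothetical placeholders.
SCHEMA_BASE_URIS_DEFAULT = ["https://schema.example.org/repo/jsonschema"]
EVENT_STREAM_CONFIG_URI_DEFAULT = "https://stream-config.example.org/api"
EVENT_SERVICE_TO_URI_MAP_DEFAULT = {
    "eventgate-main": "https://eventgate-main.discovery.wmnet:4492/v1/events",
}


@dataclass(frozen=True)
class EventStreamFactory:
    schema_base_uris: List[str]
    event_stream_config_uri: str
    event_service_to_uri_map: Dict[str, str]

    @staticmethod
    def builder() -> "EventStreamFactoryBuilder":
        return EventStreamFactoryBuilder()


class EventStreamFactoryBuilder:
    """Every setter is optional; anything left unset falls back to the defaults."""

    def __init__(self) -> None:
        self._schema_base_uris: Optional[List[str]] = None
        self._event_stream_config_uri: Optional[str] = None
        self._event_service_to_uri_map: Optional[Dict[str, str]] = None

    def set_schema_base_uris(self, uris: List[str]) -> "EventStreamFactoryBuilder":
        self._schema_base_uris = list(uris)
        return self  # returning self allows chained calls

    def set_event_stream_config_uri(self, uri: str) -> "EventStreamFactoryBuilder":
        self._event_stream_config_uri = uri
        return self

    def set_event_service_to_uri_map(self, mapping: Dict[str, str]) -> "EventStreamFactoryBuilder":
        self._event_service_to_uri_map = dict(mapping)
        return self

    def build(self) -> EventStreamFactory:
        # Copy the defaults so callers cannot mutate the shared constants.
        return EventStreamFactory(
            schema_base_uris=(
                self._schema_base_uris
                if self._schema_base_uris is not None
                else list(SCHEMA_BASE_URIS_DEFAULT)
            ),
            event_stream_config_uri=(
                self._event_stream_config_uri
                if self._event_stream_config_uri is not None
                else EVENT_STREAM_CONFIG_URI_DEFAULT
            ),
            event_service_to_uri_map=(
                self._event_service_to_uri_map
                if self._event_service_to_uri_map is not None
                else dict(EVENT_SERVICE_TO_URI_MAP_DEFAULT)
            ),
        )


# No setters called: production-style defaults.
prod = EventStreamFactory.builder().build()

# Override a single piece, e.g. for a unit test; the rest keep their defaults.
test_factory = (
    EventStreamFactory.builder()
    .set_event_service_to_uri_map({"eventgate-main": "http://localhost:8192/v1/events"})
    .build()
)
```

Because the defaults are named constants, each one can be referenced on its own, which is the concern ottomata raises about putting defaults near the classes they belong to. The `localhost` URI in the test override is also made up.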
[20:59:44] Analytics: Ensure Puppet checks types as part of the build - https://phabricator.wikimedia.org/T261693 (razzi)
[20:59:47] i mean, it does put those WMF production defaults hardcoded into e.g EventStreamFactory
[20:59:53] but i personally don't care that much
[21:00:01] i think that is cleaner, fewer classes
[21:00:38] Analytics: Ensure Puppet checks types as part of the build - https://phabricator.wikimedia.org/T261693 (razzi)
[21:00:52] they can be on another class and just referenced by their CONSTANT_1_NAME
[21:01:31] could but if we do that i might as well put the build method there too?
[21:01:35] i guess
[21:01:48] i could make a EventStreamFactoryBuilder.buildWmf() method?
[21:01:53] that just uses those values?
[21:02:11] ottomata: I would put all public static final CONSTANTs
[21:02:18] in one file with no logic
[21:02:47] nuria: i guess, if we do that, it is kinda weird to expect them to be the generic defaults
[21:03:03] ottomata: *i think* as a private method EventStreamFactoryBuilder.buildWmf() makes sense
[21:03:08] oh
[21:03:17] if we have a buildWmf method it would be obvious what is happening
[21:03:27] kind of the same as having a separate WMF specific class tho
[21:03:42] but, if we have a buildWmf method, we might as well keep the wmf prod values in the same class
[21:04:04] and, would buildWmf return a singleton? :p
[21:05:45] ottomata: if we do EventStreamFactory.build() => returns StreamFactory with WMF defaults
[21:05:52] what about
[21:06:02] EventStreamFactory.getWmfInstance()
[21:06:03] ?
[21:06:06] EventStreamFactory.builder().setSome().build() => returns stream factory with some other defaults
[21:06:38] ottomata: it is anti-factory no?
[21:07:19] ottomata: the thing is that it does not matter that is a wmf instance right?
[21:07:43] ottomata: it just matters that is the default of the builder()
[21:08:01] ottomata: no?
[21:08:17] nuria:
[21:08:17] EventStreamFactory.
build() [21:08:19] do you mean [21:08:23] EventStreamFactory.builder().build() [21:08:23] ? [21:08:26] ottomata: I can do changes on top of your patch (unless you hate it) [21:08:32] ottomata: either [21:08:45] already started, some, lemme push and see whay ou thikn, i think it is what you are saying [21:08:49] ottomata: but ya EventStreamFactory.builder().build() [21:08:55] ottomata: works just fine [21:12:04] ok nuria [21:12:05] https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/623413/7/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/EventStreamFactory.java#159 [21:14:11] ottomata: I think that works, for teh hash [21:14:21] I would change: [21:14:23] put("eventgate-main", URI.create("https://eventgate-main.discovery.wmnet:4492/v1/events")); [21:15:04] put("eventgate-main", URI.create(EVENTGATE_MAIN_URL)); [21:15:11] oof [21:15:12] so there is less clutter [21:15:21] there are 3 URIs for that [21:15:35] total of 12 right now for all endpoints [21:15:43] you want me to make constants for all of them? [21:15:44] but for unit testing it works as we can build the instances for testing with: EventStreamFactory. 
builder().setSome().build() [21:15:49] ah [21:15:50] ottomata: ahem [21:15:57] for unit testing i don't even use the default [21:16:02] i load it from a file [21:16:09] ottomata: k [21:16:12] https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/623413/9/eventutilities/src/test/resources/event_service_to_uri.yaml [21:16:31] that's actually supported by the builder [21:17:02] ottomata: i see, ok [21:17:03] https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/623413/9/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/EventStreamConfig.java#128 [21:17:09] setEventServiceToUriMap(String eventServiceConfigUri) [21:18:03] ottomata: k [21:20:48] ottomata: but i really think all those constants shoudl be public static finals defined at the beginning of that file or on a different file [21:23:21] nuria: i'm fine with that too, it just seemed that maybe they belonged in the Builder class [21:23:31] i could make the Builder class a separate file instead of an inner class [21:23:32] ? [21:23:34] don't love that [21:23:43] jaja [21:23:45] i also don't care, i can make them public static finals on EventStreamFacgory [21:23:57] tehy do not have to be public, sorry [21:24:09] just static final [21:25:44] I am fine with them being all in the same file for now, ideally they would not be in the code at all and come from config "a la mediawiki config" [21:26:17] well, they will for the prod job [21:26:25] this is mostly just for easy use by humans [21:26:52] ok, i'll make them top level [21:35:29] nuria: [21:35:35] should I put the defaults into the classes they pertain to [21:35:36] e.g. [21:35:46] SCHEMA_BASE_URIS_DEFAULT could go in EventSchemaLoader [21:38:30] ottomata: i would put all the constants in a class called Config.java [21:42:33] nuria: then in Builder.build() e.g. 
[21:42:33] if (eventSchemaLoader == null) {
[21:42:33] setEventSchemaLoader(WikimediaEventPlatformConfig.SCHEMA_BASE_URIS);
[21:42:34] }
[21:42:58] i guess so
[21:44:42] ottomata: ya, no? (the other options i can think about seem unnecessarily complicated)
[21:44:48] ok
[21:53:11] ottomata: ok, will review as soon as i commit my changes for joseph to review
[21:53:37] ok cool
[21:53:39] https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/623413/12/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/WikimediaDefaults.java
[21:53:54] that's actually kind of nice
[21:54:02] because I also have an EventStreamConfigBuilder
[21:54:06] that can now use the same default
[21:54:09] in the same way
[21:54:27] like
[21:54:27] https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/623413/13/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/EventStreamConfig.java#131
[21:54:42] ottomata: oohh seee
[21:55:58] nuria i've added you as reviewer to 2 refinery-source changes too
[21:55:59] that use this
[21:56:12] ottomata: ok, will do!
[21:56:19] the first one just gets rid of the duplicate classes; e.g. moving EventSchemaLoader to event-utilities
[21:56:27] the second is the ProduceCanaryEvents job
[22:04:18] ottomata: hiya! do you have any remaining reservations/hesitations +2-ing https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventStreamConfig/+/622881/ ?
[22:20:00] bearloga: merged!
[22:23:16] ottomata: (cc razzi) are there any 'puppet for newbies' resources for yours truly that you would recommend?
[22:24:38] I'm going to give the official-looking https://learn.puppet.com/course/puppet-basics/ a go
[22:25:13] ottomata: also for the eventutilities changes, how are we doing the local development, are we pushing changes to archiva and sourcing those to refinery after or ..?
[22:25:37] nuria: haha, razzi just asked me the same thing, i started learning puppet in 2007 so even if i could remember any tutorials I did they would be woefully out of date
[22:25:46] but ya razzi found one that looks good
[22:26:08] nuria: eventutilities is released to archiva
[22:26:12] we could push snapshots
[22:26:15] but i've just been doing local mvn install
[22:26:18] to use it in refinery-source
[22:47:10] fdans: FYI just responded on the jsonschema-tools PR
[22:57:04] ottomata: k, will cr with that in mind
[23:17:32] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Data, and 2 others: MEP Client MediaWiki PHP - https://phabricator.wikimedia.org/T253121 (Mholloway) TODO: Sampling config handling
[23:30:42] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (Jclark-ctr)
[23:30:52] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (Jclark-ctr) received memory placed in storage room
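The design that emerged around 21:42 keeps the production defaults in a separate, logic-free constants class and has each builder fall back to them with a null check, which is what lets EventStreamConfigBuilder reuse the same defaults. A hedged sketch of that pattern follows. The class names SchemaLoaderSettingsBuilder and StreamConfigSettingsBuilder are hypothetical, and the example.org URIs are placeholders rather than the real values in WikimediaDefaults.java.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical constants-only holder modeled on the idea behind the
 * WikimediaDefaults class linked in the chat. Values are placeholders,
 * not the real WMF production settings.
 */
final class WikimediaDefaults {
    private WikimediaDefaults() {
        // constants only; never instantiated
    }

    static final List<String> SCHEMA_BASE_URIS = Collections.unmodifiableList(
        Arrays.asList("https://schemas.example.org/primary/",     // placeholder
                      "https://schemas.example.org/secondary/")); // placeholder

    static final String EVENT_STREAM_CONFIG_URI =
        "https://streamconfig.example.org/api"; // placeholder
}

/** Hypothetical builder for schema-loader settings. */
final class SchemaLoaderSettingsBuilder {
    private List<String> schemaBaseUris; // null = caller did not set it

    SchemaLoaderSettingsBuilder setSchemaBaseUris(List<String> schemaBaseUris) {
        this.schemaBaseUris = schemaBaseUris;
        return this;
    }

    /** Same shape as the `if (eventSchemaLoader == null)` fallback from the chat. */
    List<String> build() {
        if (schemaBaseUris == null) {
            return WikimediaDefaults.SCHEMA_BASE_URIS;
        }
        return Collections.unmodifiableList(schemaBaseUris);
    }
}

/** Hypothetical second builder that reuses the same shared defaults. */
final class StreamConfigSettingsBuilder {
    private String streamConfigUri; // null = caller did not set it

    StreamConfigSettingsBuilder setStreamConfigUri(String streamConfigUri) {
        this.streamConfigUri = streamConfigUri;
        return this;
    }

    String build() {
        return streamConfigUri != null
            ? streamConfigUri
            : WikimediaDefaults.EVENT_STREAM_CONFIG_URI;
    }
}
```

Because the constants live in one class with no logic, any number of builders can share them without depending on each other.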