[02:23:59] Analytics, Event-Platform: jsonschema-tools should ensure schema examples exist - https://phabricator.wikimedia.org/T270134 (Ottomata)
[02:24:34] Analytics, Event-Platform: jsonschema-tools should ensure schema examples exist - https://phabricator.wikimedia.org/T270134 (Ottomata) https://github.com/wikimedia/jsonschema-tools/pull/23
[04:22:21] Analytics-Radar, MediaWiki-Authentication-and-authorization, Privacy Engineering, Performance-Team (Radar): Clear site data on MediaWiki log out - https://phabricator.wikimedia.org/T179752 (Krinkle)
[04:23:20] Analytics-Radar, MediaWiki-Authentication-and-authorization, Privacy Engineering, Performance-Team (Radar): Clear site data on MediaWiki log out - https://phabricator.wikimedia.org/T179752 (Krinkle) Tagging Privacy-Engineering as FYI. This may be worth looking into and get into our planning. See...
[07:42:06] Bonjour
[07:42:08] (PS1) Joal: Fix oozie conf errors found during all-restart [analytics/refinery] - https://gerrit.wikimedia.org/r/650034 (https://phabricator.wikimedia.org/T257412)
[07:49:21] bonjour!
[07:49:46] joal: just to double check - does the naming convention involve no "hourly" anymore?
[07:50:47] ahah no - naming convention reflects folder organization in coordinator name, facilitating checking/building jobs path out of their coord names
[07:53:46] (CR) Elukey: [C: +1] Fix oozie conf errors found during all-restart [analytics/refinery] - https://gerrit.wikimedia.org/r/650034 (https://phabricator.wikimedia.org/T257412) (owner: Joal)
[07:54:11] sounds good, it is a little strange to still see hourly coords with and without "hourly" in their name
[07:54:47] elukey: we can move the druid folder in the hourly one?
[07:57:33] joal: nono the convention makes sense, it reflects the path structure, I was only wondering why we have this difference..
[07:58:12] yeah - That's why I was suggesting changing the path structure to make for a more menainful name :)
[07:59:01] I fear it would become a big burden for you to review all job names :D
[07:59:08] for this particular fix it makes sense
[07:59:46] We have 2 job classes that don't follow the convention and can't be restarted out of automatically creating commands: data_quality and cassandra
[08:00:28] then feel free to go ahead :)
[08:04:44] o/
[08:05:01] o/
[08:05:22] is there a way to list the jvm args of all the processes running on the workers behind a yarn application?
[08:06:02] more precisely I'd like to know the -Xmx arg of the jvms on the worker nodes
[08:06:53] yarn UI reports the total but I'm curious what's the actual Xmx value
[08:07:23] dcausse: what is the app id?
[08:07:33] elukey: application_1605880843685_20679
[08:09:17] dcausse: I see -Xmx1073741824 -Xms1073741824
[08:09:44] and also -Xmx3597035049 -Xms3597035049
[08:10:33] elukey: thanks! (should be 3 at 3G and 1 at 1G)
[08:10:42] ah checking
[08:12:00] dcausse: yep
[08:12:11] great thank you! :)
[09:05:28] joal: so nothing seems exploding so far :)
[09:06:05] I just sent an email to analytics-announce@ to schedule the move of an-coord1001 to analytics-hive, since I realized that it would impact stat boxes etc..
[09:06:21] so since I need to reboot them, I'll also add some new configs :)
[09:06:31] (including the shared kerberos cache)
[09:32:47] * elukey bbiab
[10:04:30] Analytics, Wikidata, Wikidata-Query-Service: wikimedia-event-utilities should provide tools for JVM based apps producing directly to kafka - https://phabricator.wikimedia.org/T270371 (dcausse)
[11:02:03] (CR) Joal: [V: +2 C: +2] "Merging" [analytics/refinery] - https://gerrit.wikimedia.org/r/649871 (https://phabricator.wikimedia.org/T270274) (owner: Gerrit maintenance bot)
[11:04:12] Analytics-Kanban, Patch-For-Review: Test the Bigtop 1.5 RC release on the Hadoop test cluster - https://phabricator.wikimedia.org/T269919 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-master1001.eqiad.wmnet', 'an-test-master1002.eqiad....
[11:04:54] !log wipe/reimage the hadoop test cluster to start clean for CDH (and then test the upgrade to bigtop 1.5)
[11:04:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:09:29] (CR) Awight: [C: +1] Process EventLogging events for TemplateData [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/649861 (https://phabricator.wikimedia.org/T270246) (owner: Andrew-WMDE)
[11:34:00] Analytics-Kanban, Patch-For-Review: Test the Bigtop 1.5 RC release on the Hadoop test cluster - https://phabricator.wikimedia.org/T269919 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-master1002.eqiad.wmnet', 'an-test-master1001.eqiad.wmnet', 'an-test-worker1001.eqiad.wmnet', 'an-test...
[11:54:33] elukey: so about kfakacat. Would we make a new section like for amd-rocm for the backported package? I just want to avoid pushing kafkacat 1.5 to people who prefer (for whatever reason) the Buster version
[12:00:15] klausman: hello!
[12:01:35] Heya :)
[12:01:38] klausman: I'd be 100% in favor of uploading that to buster-wikimedia, in my opinion it is a win for everybody, since the updated version offers new things keeping the rest stable
[12:01:49] Alright.
[12:02:12] https://wikitech.wikimedia.org/wiki/Reprepro#Importing_packages So this proces,, right?
[12:02:22] then if people complains (I see this extremely unlikely) we can move stuff to a separate component
[12:02:48] (that page is.... not well-organized)
[12:02:57] yes exactly seems up to date
[12:03:21] reprepro -C main include buster-wikimedia kafkacat-yadayada.changes
[12:03:36] you can also include only the deb, but if src etc.. are available it is better
[12:03:49] if this is a one off I'd make it clear also in the changes file
[12:03:51] Ok, I'll see how much havoc I can wreak
[12:03:55] so people know etc..
[12:03:59] perfect :)
[12:03:59] (Dogs of War are on standby)
[12:09:05] Analytics-Kanban, Patch-For-Review: Test the Bigtop 1.5 RC release on the Hadoop test cluster - https://phabricator.wikimedia.org/T269919 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-coord1001.eqiad.wmnet'] ` The log can be found in `...
[12:15:58] quick lunch!
[12:31:13] Analytics-Kanban, Patch-For-Review: Test the Bigtop 1.5 RC release on the Hadoop test cluster - https://phabricator.wikimedia.org/T269919 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-coord1001.eqiad.wmnet'] ` and were **ALL** successful.
[12:38:46] (CR) Joal: "Adding a note on retry backoff" (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) (owner: Fdans)
[13:23:42] joal: regarding the retries, do you think we should keep the timeout at 10 seconds then?
[13:25:12] Hi fdans - Let's confirm with elukey what he thinks about that
[13:26:32] I think that even 5s could be good, but no strong opinions
[13:26:36] we can refine if needed
[13:26:55] the 3x retry part is what concerns me, I fear amplification of queries under load
[13:27:59] right elukey - I was thinking to make timeout long enough so that amplification when timing-out is slowed down
[13:29:14] joal: but then we keep risking of piling up connections on the aqs side, we could possibly even avoid retries for the time being, with a 10s or 5s timeout
[13:29:30] and then we see how it goes
[13:30:01] I am not a big fan in general of retries for apis, the client does it if needed
[13:30:26] works for me elukey
[13:35:07] fdans: does it make sense for you too? So just a 10s timeout for starter, with some comments about retries etc.. (so we know how to enable them if needed etc..)
[13:35:54] elukey: joal sounds good
[13:36:10] Cool!
[13:37:15] super thanks :)
[13:41:29] (PS3) Fdans: AQS: add configuration for timeout to Druid requests [analytics/aqs] - https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809)
[13:42:47] (CR) jerkins-bot: [V: -1] AQS: add configuration for timeout to Druid requests [analytics/aqs] - https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) (owner: Fdans)
[13:43:36] 🙄
[13:43:52] (PS4) Fdans: AQS: add configuration for timeout to Druid requests [analytics/aqs] - https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809)
[13:57:24] elukey: so for versioning, the Buster package is at 1.6.0-1. Would we use something like 1.6.0-1.1wmf1 then? I think that's the pattern what the WMF-specific libkafka uses:
[13:57:26] $ apt-cache show librdkafka1|grep Version
[13:57:28] Version: 0.11.6-1.1wmf1
[13:59:44] the buster package is 1.3.1-1 right?
[13:59:49] (I am getting confused otherwise :D)
[14:00:40] I think so
[14:00:52] we can use something like +wmf1 for sure, I don't think that we have a precise convention.. or even 1.6.0-1+deb10u1
[14:01:03] $ apt-cache show kafkacat
[14:01:05] Package: kafkacat
[14:01:07] Version: 1.3.1-1
[14:01:19] That's buster
[14:01:25] On Bullseye, it's 1.6.0-1
[14:01:28] yep yep i was confused by "so for versioning, the Buster package is at 1.6.0-1"
[14:01:37] Sorr, my bad
[14:01:53] yes yes my brain faulted for a sec but it recovered :D
[14:01:56] Buster/Bullseye are just too close in naming, what were Debian thinking?
[14:02:03] ahahahah
[14:02:24] for example see curl's version https://packages.debian.org/buster/curl
[14:02:50] So +deb10u1 means it's a backport?
[14:03:00] I thought they used "bpo" or sth like that
[14:03:24] Then again, it's not really a port. Just compiling it on Buster with the WMF-provided librdkafka
[14:04:28] nono the +deb10u1 is IIUC to state "version for buster"
[14:04:44] but we might want to indicate that it is our version, so +wmf1 seems fine
[14:04:48] simpler
[14:04:56] what do you think?
[14:07:28] WMF
[14:07:31] wer WFM
[14:07:49] I am checking on reprepro and we have a wide variety of conventions
[14:07:49] mah brain very gud today %-)
[14:08:48] reprepro list buster-wikimedia | egrep '*wmf*'
[14:08:53] ahahahah
[14:09:56] Hrm.
[14:10:15] wait... you're using a glob with grep, that won't work
[14:10:43] with egrep?
[14:10:53] egrep is still regexp
[14:11:10] That first '*' is likely interpreted literally, since it has no preceding class
[14:11:25] It should be '.*wmf.*' or just 'wmf'
[14:11:58] '*wmf*' would work with the shell
[14:12:02] strange it doesn't interpret the * as literal
[14:12:18] I get what you mean yes
[14:13:10] so the last * is probably zero or more time f
[14:13:18] the former doesn't really count I think
[14:13:22] Yes, which is likely not what you meant :)
[14:13:23] ANYWAY
[14:13:45] | grep wmf
[14:13:55] my point was to give you a quick command to check stuff :)
[14:15:09] I see a lot of ~wmf1, +wmf1, ~wmf1+deb10u1
[14:15:11] etc..
[14:15:36] Ah, I thought you meant that grep cam back *empty*
[14:16:10] Aren't we two genuises today
[14:16:37] rocket scientists :D
[14:17:21] to summarize, please go ahead with any version that you prefer, there is not really a guidance
[14:17:41] the extra bits should matter when apt tries to figure out what version comes first and what after
[14:17:52] but we are super fine in this case
[14:20:14] Aye
[14:44:32] stat1008 ~ $ sudo apt-cache show kafkacat
[14:44:34] Package: kafkacat
[14:44:36] Version: 1.6.0-1.1wmf1
[14:44:38] wheeee
[14:44:59] niceeee
[14:45:12] feel free to test/upgrade the stats :)
[14:45:46] (PS5) Fdans: AQS: add configuration for timeout to Druid requests [analytics/aqs] - https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809)
[14:54:00] !log Updated all stat100x machines to now sport kafkacat 1.6.0, backported from Bullseye
[14:54:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:54:27] Should I do any kind of wider announcement? Or just mention it during Standup?
[14:54:43] the latter is fine!
[14:54:46] thanks a lot :)
[14:55:22] Roger. Will also update the ticket
[14:55:59] Analytics-Clusters, Operations: Backport kafkacat 1.6.0 from bullseye to buster-backports or buster-wikimedia - https://phabricator.wikimedia.org/T268936 (klausman) Package is backported and uploaded to reprepro/aptx00y and updated on all stats100x machines.
[14:56:26] Now to find something shaped like a very late lunch :)
[14:59:56] (CR) Fdans: [V: +2 C: +2] "Tested, matched with legacy pageviews correctly for December 2007 and January 2008. Merging." [analytics/refinery] - https://gerrit.wikimedia.org/r/640146 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[15:00:47] * elukey quick coffee break
[15:15:17] helloooo
[15:19:13] holaaa
[15:27:19] Analytics, Event-Platform, Wikidata, Wikidata-Query-Service, Patch-For-Review: wikimedia-event-utilities should provide tools for JVM based apps producing directly to kafka - https://phabricator.wikimedia.org/T270371 (Ottomata)
[15:27:37] Analytics-Clusters, Operations: Backport kafkacat 1.6.0 from bullseye to buster-backports or buster-wikimedia - https://phabricator.wikimedia.org/T268936 (Ottomata) Yahoo!
[15:53:47] hi team
[15:53:53] good morning :)
[15:54:17] good evening :)
[15:56:01] Quick question for you elukey: when I try to rsync from krb1001 to puppetmaster1001, it prompts for a password; how do I deal with that?
[15:56:22] elukey: qq
[15:56:30] oh! razzi beat me :P
[15:57:10] razzi: it is password protected, the user:pass is under /srv/kerberos
[15:57:33] mforns: sure :)
[15:57:49] ah ok elukey now I that in the doc. Thanks!
[15:57:56] np!
[15:58:14] elukey: thinking of traffic anomalies job generating a file for icinga to alarm
[15:58:31] elukey: at which interval does icinga check for the file existence?
[16:00:30] mforns: so the nagios nrpe daemon doesn't check for file existence, it checks if a systemd unit return non zero, etc.. there are also custom checks, but creating a file may not be the best (then it needs to be cleaned etc..)
[16:00:34] what is it the use case?
[16:01:01] can we just return non-zero ? (asking because of ignorance, not sure how the whole thing works)
[16:04:40] hi all. sukhe and mforms, are we meeting today?
[16:05:34] dsaez: yes!
[16:05:35] same link
[16:19:53] (PS3) Fdans: Wikistats testing framework: Replace Karma with Jest [analytics/wikistats2] - https://gerrit.wikimedia.org/r/648376
[16:21:05] (CR) jerkins-bot: [V: -1] Wikistats testing framework: Replace Karma with Jest [analytics/wikistats2] - https://gerrit.wikimedia.org/r/648376 (owner: Fdans)
[16:22:00] (PS4) Fdans: Wikistats testing framework: Replace Karma with Jest [analytics/wikistats2] - https://gerrit.wikimedia.org/r/648376
[16:22:54] (CR) jerkins-bot: [V: -1] Wikistats testing framework: Replace Karma with Jest [analytics/wikistats2] - https://gerrit.wikimedia.org/r/648376 (owner: Fdans)
[16:26:28] (PS5) Fdans: Wikistats testing framework: Replace Karma with Jest [analytics/wikistats2] - https://gerrit.wikimedia.org/r/648376
[16:32:37] (PS2) Fdans: Upgrade Webpack from 2 to 5 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/649311 (https://phabricator.wikimedia.org/T188759)
[16:33:06] (CR) jerkins-bot: [V: -1] Upgrade Webpack from 2 to 5 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/649311 (https://phabricator.wikimedia.org/T188759) (owner: Fdans)
[16:33:26] elukey: +2'd the copytruncate change
[16:33:42] klausman: ack!
[16:33:46] thanks :)
[16:35:36] elukey: sorry was in meeting! Yes, of course, I remember...
[16:35:38] thanks
[16:42:01] Analytics, Product-Analytics, Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (nshahquinn-wmf) One minor issue I noticed: because we can't alter the user agent for `sendBeacon` calls, we aren't able to set the `wmf_app_version` field. We are stil...
[16:42:30] (PS3) Fdans: Upgrade Webpack from 2 to 5 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/649311 (https://phabricator.wikimedia.org/T188759)
[16:43:27] Analytics, Product-Analytics, Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (nshahquinn-wmf)
[16:48:05] (PS2) Fdans: Show status of translations not included in build [analytics/wikistats2] - https://gerrit.wikimedia.org/r/614850 (owner: Milimetric)
[16:52:07] (CR) Joal: [C: +1] "LGTM! Thanks fdans :)" [analytics/aqs] - https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) (owner: Fdans)
[16:55:12] (CR) Elukey: [C: +1] "LGTM, but we should briefly test it somewhere before deploying.." [analytics/aqs] - https://gerrit.wikimedia.org/r/649884 (https://phabricator.wikimedia.org/T268809) (owner: Fdans)
[17:06:31] elukey: for when you have time: https://gerrit.wikimedia.org/r/c/operations/puppet/+/649721 :D
[17:09:17] Analytics, Event-Platform, Wikidata, Wikidata-Query-Service, Patch-For-Review: wikimedia-event-utilities should provide tools for JVM based apps producing directly to kafka - https://phabricator.wikimedia.org/T270371 (Ottomata)
[17:11:23] Analytics, Product-Analytics, Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (Ottomata) If you use Event Platform, your client can explicitly set `http.request_headers['user-agent']`.
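The `egrep '*wmf*'` exchange between 14:08 and 14:13 comes down to `*` meaning "any run of characters" in a shell glob but "zero or more of the preceding atom" in a regular expression. Below is a minimal Python sketch of that difference. It uses the standard-library `fnmatch` (glob) and `re` (regex) modules on the version strings quoted in this log. It does not run grep, so it says nothing about how GNU `egrep` treats a leading `*`; Python's `re` simply rejects it.

```python
import fnmatch
import re

# Version strings quoted in this log
VERSIONS = ["1.6.0-1.1wmf1", "0.11.6-1.1wmf1", "1.3.1-1", "1.6.0-1"]


def glob_matches(pattern: str) -> list:
    """Shell-style glob: '*' stands for any run of characters."""
    return [v for v in VERSIONS if fnmatch.fnmatchcase(v, pattern)]


def regex_matches(pattern: str) -> list:
    """Unanchored regex search, like grep: '*' repeats the atom before it."""
    return [v for v in VERSIONS if re.search(pattern, v)]


def leading_star_rejected() -> bool:
    """Python's re refuses a pattern whose first '*' has nothing to repeat."""
    try:
        re.compile("*wmf*")
    except re.error:
        return True
    return False


if __name__ == "__main__":
    print(glob_matches("*wmf*"))     # ['1.6.0-1.1wmf1', '0.11.6-1.1wmf1']
    print(regex_matches(".*wmf.*"))  # same two entries
    print(regex_matches("wmf"))      # same two entries
    # 'wmf*' is 'wm' plus zero or more 'f', so it also matches a bare 'wm'
    print(bool(re.search("wmf*", "wm")))  # True
    print(leading_star_rejected())        # True
```

The plain `wmf` suggested at 14:13:45 is the simplest pattern that does what the glob intended. An unanchored search already allows anything on either side.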
[17:12:11] Analytics, Product-Analytics, Inuka-Team (Kanban): Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (Ottomata) Oh, I see this is for VirtualPageView, which makes what I said more complicated. :)
[17:27:45] Amir1: thanks!
[17:28:12] Nah, it was tiny :D
[17:28:54] * elukey reprases to please Amir
[17:29:16] Amir1: how dare you sending a tiny patch instead of a huge one? This time I am merging, but it is the last! :D
[17:29:35] *rephrases
[17:29:52] in any case, thanks for the effort, really appreciated :)
[17:29:58] be careful what you wish for, my next patch will be 200 lines :P
[17:30:21] 290 left, 100-ish is analytics
[17:30:25] Amir1: it would be painful only if it was related to mediawiki's internals :D
[17:30:50] (that I know basically zero about)
[17:31:10] no one really knows about it either
[17:31:23] seriously, it grew so much
[17:32:16] the part that matters is that its logo is going to change
[17:32:38] boy I hate the old logo
[17:37:19] :)
[18:25:12] (PS1) Gerrit maintenance bot: Add nia.wiktionary to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/650218 (https://phabricator.wikimedia.org/T270409)
[18:25:58] (PS1) Gerrit maintenance bot: Add nia.wikipedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/650221 (https://phabricator.wikimedia.org/T270408)
[18:28:28] joal: anything to hand off from the ops week??
[18:28:37] so ... wait ... does that mean nobody volunteers to pair with me? :(
[18:28:46] hm - nothing that I know mforns - All alerts have been taken care of
[18:28:54] Thanks for asking :)
[18:29:03] ok, thanks! I take the turn now, then!
[18:29:23] milimetric: I pair!
[18:29:28] batcave?
[18:29:30] ok, to the batcave
[18:30:22] a-team check out these personal goals from 2012 that i somehow stumbed upon while looking for our goas doc
[18:30:23] https://office.wikimedia.org/wiki/Archive:Goals/2012-2013/Engineering/Andrew_Otto
[18:30:36] "Show that Analytics team is cool"
[19:01:45] ahahahah
[19:01:58] ottomata: was it around the time that the SRE team built the great vlan firewall?? :D
[19:04:35] hi, I ran into a bogus CI error on https://gerrit.wikimedia.org/r/c/schemas/event/secondary/+/650031 ?
[19:04:46] any thoughts on how to handle that?
[19:05:38] Analytics, Analytics-Kanban, Privacy Engineering, Product-Analytics, and 3 others: Drop data from Prefupdate schema that is older than 90 days - https://phabricator.wikimedia.org/T250049 (Milimetric) Open→Resolved Done. sudo -u analytics hdfs dfs -rm -r -skipTrash /wmf/data/archive/bac...
[19:13:08] * elukey afk!
[20:06:26] haha elukey i'm not sure which came first :p
[20:14:56] Analytics: Switch off skipTrash for data purging - https://phabricator.wikimedia.org/T270431 (fdans)
[20:18:58] Analytics: Add logic to purging scripts that requires admin action if it's about to delete a lot of data - https://phabricator.wikimedia.org/T270433 (fdans)
[20:27:08] razzi: yoohoo, i'd like to sit in on the architecture office hours that starts in 33 minutes, are you avail to do our sync now?
[20:27:40] ottomata: yeah, 1 minute
[20:28:14] 2 minutes!
[20:28:37] ok :)
[20:30:38] ok!
[21:01:26] (PS1) Fdans: Add Catalan and Greek to Wikistats languages [analytics/wikistats2] - https://gerrit.wikimedia.org/r/650259
[21:13:27] Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production - https://phabricator.wikimedia.org/T120242 (Ottomata)
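At 14:17:41 someone notes that the extra suffix bits "should matter when apt tries to figure out what version comes first". The sketch below makes that concrete. It is an illustrative Python re-implementation of the ordering rules Debian Policy gives for the `Version` field, not dpkg itself. Versions are compared by epoch first, then upstream version, then revision. Each part is compared as alternating non-digit and digit runs. `~` sorts before everything, even the end of the string, and letters sort before other symbols. The `~wmf1` and `+wmf1` versions in the usage lines are hypothetical examples of the suffix styles listed at 14:15:09. On a real host, `dpkg --compare-versions A lt B` gives the authoritative answer.

```python
import string


def _order(ch: str) -> int:
    """Sort weight of one character inside a non-digit run."""
    if ch == "~":
        return -1          # '~' sorts before everything, even end of string
    if ch in string.digits:
        return 0
    if ch in string.ascii_letters:
        return ord(ch)     # letters sort before other symbols
    return ord(ch) + 256   # '+', '.', '-', ... sort after letters


def _compare_part(a: str, b: str) -> int:
    """Compare an upstream version or revision: <0, 0 or >0."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run, character by character; end of string weighs 0
        while (i < len(a) and a[i] not in string.digits) or (
            j < len(b) and b[j] not in string.digits
        ):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run, compared numerically (an empty run counts as 0)
        start_i, start_j = i, j
        while i < len(a) and a[i] in string.digits:
            i += 1
        while j < len(b) and b[j] in string.digits:
            j += 1
        na, nb = int(a[start_i:i] or "0"), int(b[start_j:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version: str) -> tuple:
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return <0 if a sorts before b, 0 if equal, >0 if a sorts after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


if __name__ == "__main__":
    print(compare_versions("1.6.0-1.1wmf1", "1.3.1-1") > 0)  # True
    print(compare_versions("1.6.0-1.1wmf1", "1.6.0-1") > 0)  # True
    print(compare_versions("1.6.0-1~wmf1", "1.6.0-1") < 0)   # True
    print(compare_versions("1.6.0-1+wmf1", "1.6.0-1") > 0)   # True
```

Under these rules, the uploaded `1.6.0-1.1wmf1` sorts after Buster's `1.3.1-1` and also after Bullseye's `1.6.0-1`. A `~wmf1` suffix would instead have sorted before the unsuffixed version.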