[06:30:50] !log re-run cassandra-coord-pageview-per-article-daily 29/10/2019
[06:30:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:52:10] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Neil_P._Quinn_WMF) @Ottomata thank you! This looks like a great plan 😁
[06:52:39] fdans: o/
[06:53:01] when you are online, let's check cassandra compactions
[06:53:07] they are back to high levels
[07:28:12] (CR) Elukey: graphite.py: encode a text string before socket.send (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/546971 (https://phabricator.wikimedia.org/T204736) (owner: Awight)
[08:24:33] Thanks for that elukey
[08:24:51] and hello- obviously :)
[08:25:41] elukey: I had this intuition yesterday that backfill-loading at the same time as running the big day-to-day loading jobs would be a bad idea
[08:29:21] elukey: with your permission I'll rerun the failed jobs for day-to-day mediarequest-per-file - There are 3 of them, compaction is back to a low-ish level
[08:30:14] joal: bonjour! You don't really need my permission :)
[08:30:24] ah when you are done, can we batcave a min for airflow?
[08:30:38] elukey: give me a minute restarting those jobs, and I'll join
[08:31:29] I'm installing PHP sec updates on matomo1001, let me know if anything is odd
[08:31:33] !log Rerun failed cassandra-daily-coord-local_group_default_T_mediarequest_per_file days: 2019-10-26, 2019-10-23 and 2019-10-22
[08:31:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:31:59] I'll get moar coffee
[08:32:19] elukey: ping me when you have enough, we can airflow :)
[08:39:29] joal: I am ready!
[08:39:37] To the cave !
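Note on the graphite.py patch reviewed at 07:28 ("encode a text string before socket.send"): in Python 3, `socket.send` accepts bytes rather than `str`, so a text metric line has to be encoded before it is sent. A minimal sketch of the idea, assuming Graphite's plaintext `path value timestamp` line format; the metric name and the `format_metric` helper are hypothetical and are not taken from reportupdater:

```python
import socket


def format_metric(path, value, timestamp):
    """Build one plaintext metric line and encode it to bytes for socket.send.

    Hypothetical helper for illustration only, not the reportupdater code.
    """
    line = "{} {} {}\n".format(path, value, timestamp)
    return line.encode("utf-8")


# Use a local socket pair so the example needs no real Graphite server.
sender, receiver = socket.socketpair()
payload = format_metric("reportupdater.example.metric", 42, 1572345600)
sender.sendall(payload)
received = receiver.recv(1024)

# Sending the un-encoded text string is rejected on Python 3.
try:
    sender.send("reportupdater.example.metric 42 1572345600\n")
    raised_type_error = False
except TypeError:
    raised_type_error = True
finally:
    sender.close()
    receiver.close()

print(received)
print(raised_type_error)
```

Encoding at the point where the line is built keeps the rest of the code working with ordinary text strings.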
[09:11:11] (CR) Awight: graphite.py: encode a text string before socket.send (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/546971 (https://phabricator.wikimedia.org/T204736) (owner: Awight)
[09:12:11] Oh and by the way elukey - I managed to have jenkins not reject my puppet CR :D
[09:12:18] Took me 7 tries, but hey :)
[09:13:44] :D
[09:13:49] I can check it in a sec
[09:21:40] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Vgutierrez) @JAllemandou it's currently split like this: ` - VCL_Log CP-TLS-Version: TLSv1.2 - VCL_Log CP-TLS-Sess...
[09:24:26] (CR) Awight: graphite.py: encode a text string before socket.send (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/546971 (https://phabricator.wikimedia.org/T204736) (owner: Awight)
[09:29:05] (PS3) Awight: graphite.py: encode a text string before socket.send [analytics/reportupdater] - https://gerrit.wikimedia.org/r/546971 (https://phabricator.wikimedia.org/T204736)
[09:38:18] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (JAllemandou) Thanks @Vgutierrez - I think representing those values in a map (or an array) is probably the easiest and most flexibl...
[09:46:02] (CR) Elukey: [C: +2] graphite.py: encode a text string before socket.send [analytics/reportupdater] - https://gerrit.wikimedia.org/r/546971 (https://phabricator.wikimedia.org/T204736) (owner: Awight)
[09:47:14] elukey: Thanks! ^ btw, those tests are very slow, maybe something is making actual network connections?
[09:47:52] awight_: thank you for the work!
I think that the gate/submit job started right after my +2, should be fixed soon
[09:48:31] yeah something fishy is happening with Zuul, but the slowness happens from the commandline as well. I'll take a quick look.
[09:48:49] there seems to be a big backlog sigh
[09:51:08] All the test suites are nearly instantaneous, except for test/reportupdater_test.py
[09:55:21] 10-30s per test case in that class. I think I can fix it...
[09:59:55] * elukey likes awight being nerd-sniped so easily to fix Analytics stuff :D
[10:00:08] * awight adjusts glasses
[10:00:15] ahahaah
[10:00:36] :)
[10:01:22] joal: is https://puppet-compiler.wmflabs.org/compiler1001/19134/stat1007.eqiad.wmnet/ ok for you?
[10:03:59] elukey: looks like there is an issue with script path (/srv/deployment/analytics/refinery/bin/import-mediawiki-dumps --> /bin/import-mediawiki-dumps)
[10:04:07] apart from that it looks as expected
[10:04:43] elukey: also I'd like to include data-purge changes to reflect the changes I have made here
[10:05:00] --verbose :D
[10:05:12] ah ok ok now I get it yes
[10:05:14] sure sure
[10:06:26] the issue with the path is strange
[10:06:36] That's what I was thinking as well :)
[10:06:40] the define might not work as we expect
[10:08:53] ah ok
[10:09:05] see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/546966/8/modules/profile/templates/analytics/refinery/job/refinery-import-mediawiki-dumps.sh.erb ?
[10:09:11] line 5
[10:09:27] there is a variable not present when the define is evaluated
[10:09:40] $refinery_path = $profile::analytics::refinery::path
[10:09:51] joal: --^
[10:09:58] Ahhhhhh - we don't have that anymore since it's not defined in context - I thought parent context would have been passed
[10:10:27] well you require refinery in the define, you can add it in there
[10:10:36] Makes sense
[10:10:38] My bad
[10:11:55] also another thing
[10:11:58] in the profile
[10:12:00] log_file => "${::profile::analytics::refinery::log_dir}/import_siteinfo_dumps.log",
[10:12:14] the main issue is that you don't require refinery in there anymore
[10:13:06] so that logic could be moved to the define
[10:13:08] as well
[10:22:39] new try elukey :)
[10:26:30] joal: better! https://puppet-compiler.wmflabs.org/compiler1002/19137/stat1007.eqiad.wmnet/
[10:26:38] - --output-base /wmf/data/raw/mediawiki/xmldumps \
[10:26:38] + --output-base /wmf/data/raw/mediawiki/dumps \
[10:26:41] expected ?
[10:26:51] I guess so but triple checking
[10:27:07] elukey: yes - siteinfo dumps are not XML for instance
[10:27:46] elukey: I don't mind using xmldumps as global path since it's the well-known name, but maybe using dumps is enough and less misleading?
[10:32:11] nono I didn't want to discuss the choice, just verify the diff :)
[10:33:57] np elukey - discussing the choice is actually a good idea :)
[10:36:54] elukey: About the data-purge update, shall I go for the same pattern?
[10:41:28] should be ok yes
[10:41:53] elukey: I'll look more broadly at usage of the deletion script and see if the pattern can be generalized
[10:42:40] in the meantime I am looking into varnishkafka for TLS
[10:43:02] elukey: Thanks! But you should try to teach me!
[10:43:03] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (elukey) Sure!
The JSON format of what we collect from Varnish for webrequest is in `profile::cache::kafka::webrequest`: ` format...
[10:43:21] joal: I am adding all the info in the task
[10:43:25] don't worry
[10:43:26] \o/
[10:48:22] Ok - loading of previously-failed jobs has finished, now cassandra needs time for compaction to recover
[11:02:29] (Merged) jenkins-bot: graphite.py: encode a text string before socket.send [analytics/reportupdater] - https://gerrit.wikimedia.org/r/546971 (https://phabricator.wikimedia.org/T204736) (owner: Awight)
[11:03:58] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (elukey) I set up a test webrequest.conf on cp2001, and confirmed that the solution works! Side note - varnishkafka set with output...
[11:04:08] joal: --^
[11:04:13] let me know if you have doubts
[11:05:17] Thanks elukey - Will look into it
[11:18:42] (PS1) Awight: Use mock.patch to override external method [analytics/reportupdater] - https://gerrit.wikimedia.org/r/547168
[11:21:58] (PS1) Joal: Update oozie datasets to match dumps import change [analytics/refinery] - https://gerrit.wikimedia.org/r/547169 (https://phabricator.wikimedia.org/T234333)
[11:22:57] (PS2) Joal: [WIP] Update oozie datasets to match dumps import change [analytics/refinery] - https://gerrit.wikimedia.org/r/547169 (https://phabricator.wikimedia.org/T234333)
[11:24:13] (CR) jerkins-bot: [V: -1] Use mock.patch to override external method [analytics/reportupdater] - https://gerrit.wikimedia.org/r/547168 (owner: Awight)
[11:39:31] (PS2) Awight: Use mock.patch to override external methods [analytics/reportupdater] - https://gerrit.wikimedia.org/r/547168
[11:44:52] (CR) jerkins-bot: [V: -1] Use mock.patch to override external methods [analytics/reportupdater] - https://gerrit.wikimedia.org/r/547168 (owner: Awight)
[11:47:22] (PS3) Awight:
Use mock.patch to override external methods [analytics/reportupdater] - https://gerrit.wikimedia.org/r/547168
[12:03:58] hahaha I found the slowness. We're trying to run every report since the start_date of when the code was written.
[12:04:16] wow
[12:05:44] thanks a lot for all the work awight
[12:06:47] It's fun! Sort of fits into my official work tasks at the moment, don't worry that I might be volunteering ;-)
[12:07:18] awight: you definitely have some kind of the same fun I do :)
[12:09:23] lol, fun like learning a new language or something
[12:09:26] (PS1) Joal: Move wikitext_history from parquet to avro [analytics/refinery] - https://gerrit.wikimedia.org/r/547184 (https://phabricator.wikimedia.org/T236687)
[12:14:43] * elukey lunch break!
[12:30:46] Yo fdans - You here?
[12:46:39] fdans: Depending on when you read this, I suggest we start backfilling per-file for 10 days
[12:47:14] fdans: Given how long it takes to load and then recover compacting, 10 days is what we want if we start now
[12:49:30] fdans: I also think that we should automate launching those types of jobs on a daily basis at 05:00am, for 20 days
[12:50:52] (PS1) Awight: [WIP] Mock date to prevent massive backfilling [analytics/reportupdater] - https://gerrit.wikimedia.org/r/547190
[12:51:25] (CR) jerkins-bot: [V: -1] [WIP] Mock date to prevent massive backfilling [analytics/reportupdater] - https://gerrit.wikimedia.org/r/547190 (owner: Awight)
[12:55:45] joal: sorry joal, my wife has terrible back pain today so I'm back and forth between her and the computer
[12:56:00] No prob fdans :)
[12:56:08] joal: I agree with what you're saying
[12:56:18] I wish her (and you) fast recovery :S
[12:56:20] joal: 20 days won't be enough though
[12:56:49] ?
[12:57:17] according to my napkin calculations, this backfill, run continuously, will take a month and a half
[12:57:44] right
[12:57:46] ( 4.4 years to backfill @ 40 min per day)
[12:59:08] I'm not sure I understand what you mean
[13:09:09] (CR) Ottomata: [C: +1] Move wikitext_history from parquet to avro [analytics/refinery] - https://gerrit.wikimedia.org/r/547184 (https://phabricator.wikimedia.org/T236687) (owner: Joal)
[13:26:50] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog, Wikimedia-Logstash, and 3 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (fgiunchedi)
[13:48:37] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Addshore) This is "wikimedia/analytics/refinery/job/WikidataArticlePlaceholderMetrics.scala" Will poke the wm...
[13:49:08] Hi A Team!
[13:49:14] Any idea why this oozie job stopped? https://phabricator.wikimedia.org/T236895
[13:56:08] Analytics, Wikidata, wikidata-tech-focus, User-Jonas: [Trailblaze] Create recommendation system prototype for property suggestions - https://phabricator.wikimedia.org/T201168 (Addshore) Open→Declined Declining as this is not a planned trailblaze right now
[14:05:59] phew sorry meant to get online earlier but my computer took forever to do an update!
[14:06:43] proceeding with https://phabricator.wikimedia.org/T235494#5617556
[14:06:49] to rename published-datasets to datasets
[14:06:51] sorry
[14:06:52] to 'published'
[14:07:46] ottomata: o/
[14:07:51] hellloooo
[14:08:06] I checked the procedure this morning and it seemed ok, I didn't review it in depth due to cergen rabbithole
[14:08:11] I can do it now if you want
[14:08:18] np!
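Note on the napkin figure quoted at 12:57 (4.4 years to backfill at 40 min per day): it can be checked with a few lines of Python. This is only a sketch. The 365-day year is an assumption, and so is the idea that the job runs back to back with no pauses for Cassandra compaction to recover.

```python
# Figures quoted in the log.
YEARS_TO_BACKFILL = 4.4
MINUTES_PER_DAY_OF_DATA = 40

# Assumption: a plain 365-day year, good enough for a napkin estimate.
DAYS_PER_YEAR = 365

days_of_data = YEARS_TO_BACKFILL * DAYS_PER_YEAR
total_minutes = days_of_data * MINUTES_PER_DAY_OF_DATA
total_days = total_minutes / (60 * 24)

print(round(total_days, 1))  # 44.6
```

About 45 days of continuous loading matches the "month and a half" estimate, and it is well over the 20 days proposed earlier in the log.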
[14:08:22] it is pretty simple
[14:08:26] mostly moving some files around and creating symlinks
[14:08:45] just had to write it all down somewhere to keep it straight in my head
[14:10:17] it looks og
[14:10:19] good indeed
[14:10:33] +1
[14:24:32] addshore: o/
[14:24:39] nice to hear from you!
[14:27:10] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (elukey) Hello! Do you have a coordinator that I can check in Hue? https://hue.wikimedia.org/oozie/list_oozie_c...
[14:27:14] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Ottomata) Woo hoo, done! Everything is now /srv/published, with the previous /srv/published-datasets now at /srv/published/datas...
[14:27:28] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Ottomata)
[14:30:29] Hi addshore - I think the problem comes from this: T226730
[14:30:57] addshore: almost no more `Special:` pages are pageviews
[14:31:34] Analytics, Analytics-Cluster, Analytics-Kanban: Create HDFS /tmp/ cleaner - https://phabricator.wikimedia.org/T235200 (Ottomata)
[14:36:00] all right! Back in business
[14:42:13] milimetric: o/
[14:42:17] how are you feeling?
[14:42:31] a LOT better, but still a little weak
[14:42:59] I thought I'd be better yesterday and turned out I couldn't sit upright for more than like 10 minutes, so that's no good
[14:45:07] :(
[14:56:54] dawww
[14:58:21] ottomata: for cergen, I guess that now I have to create the debian change etc.. right?
[14:59:44] hehe yup!
[15:00:05] !log disabling eventlogging-consumer mysql on eventlog1002
[15:00:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:00:52] woooooooowwww
[15:00:55] is it the time??
[15:01:38] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (JAllemandou) I think this problem could be related to T226730 (preventing most `Special:XXX` pages to be flagg...
[15:01:56] yup!
[15:02:09] didn't do it yesterday mostly because i was out and about at cafes :)
[15:02:16] once consumer is stopped we can sanitize and archive
[15:02:29] mforns: yt? can you (help?) with the sanitize of data in mysql
[15:02:35] oh, elukey were there tables we could drop first?
[15:02:54] elukey: https://phabricator.wikimedia.org/T233891#5612965
[15:03:24] hey a-team
[15:03:34] you know what is maybe better than google hangouts and slack?
[15:03:38] discord
[15:03:38] !
[15:04:26] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Disable production EventLogging analytics MySQL consumers - https://phabricator.wikimedia.org/T232349 (Ottomata)
[15:04:39] Would that thing sow the seeds of discord?
[15:04:55] har har
[15:04:59] it could eh!?
[15:05:05] i wouldn't want us to use it for chat over IRC
[15:05:11] but it might be good for batcave
[15:05:19] the room stays alive, anybody can jump in voice chat...
[15:05:20] hmmmm
[15:05:23] it has video
[15:05:28] not sure about group video though hmmm
[15:05:45] ahhh nm i think it doesn't
[15:06:40] yeah, basically nothing does group video well, that's how come you have to either surrender your sanity, privacy, or $$$ to get it
[15:06:40] ottomata: sanitization is pretty quick, we can do it even without dropping table first
[15:07:54] elukey: question for you - In data_purge.pp, most data-deletion timers are using the timer-definition - Would you prefer me building a define using file+timer, or timer only, or keep it the way it is and remove the currently existing template for deleting dumps?
[15:09:35] elukey: let's drop tables!
[15:09:36] :D
[15:11:12] ottomata: sure but it needs to be done with care, I'd prefer to triple check, there is no rush no?
[15:11:52] yar yar yar
[15:11:56] i just want to be done with this :p
[15:12:01] ok
[15:12:07] i will let you drop them
[15:12:30] Analytics, Analytics-EventLogging, Analytics-Kanban, Performance-Team (Radar): Drop Navigationtiming data entirely from mysql storage? - https://phabricator.wikimedia.org/T233891 (Ottomata) a:Ottomata→elukey
[15:12:41] you can drop them too, I just didn't have the time to triple check, I am a bit scared when dropping data :)
[15:13:25] aye
[15:13:28] hmm, actually
[15:13:41] hm ya
[15:14:03] joal: didn't get your point sorry, do you need to have a separate file for each timer?
[15:14:36] elukey: I don't - Keeping them out is definitely possible
[15:14:57] elukey: one already exists, but since I'm going to change it, I can remove it easily
[15:15:57] joal: let's try to use what we already have, but if you have ideas about generalizing some stuff please do!
[15:16:19] sounds good elukey
[15:18:02] ottomata: I noticed that there is also a changelog in cergen..
but it seems not up to date with the last commits
[15:18:15] fdans: wow, this was just merged: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaMessages/+/544086/
[15:18:43] elukey: debian/changelog?
[15:18:52] ottomata: nono sorry CHANGELOG.md
[15:18:54] in master
[15:18:55] oh
[15:18:57] nuria: I KNOW SO EXCITING
[15:19:06] not sure, i haven't really updated cergen myself in a while...have I?
[15:19:09] ottomata: basically https://gerrit.wikimedia.org/r/#/c/cergen/+/547238
[15:19:10] maybe i forgot to update it?
[15:19:17] joal: did you figure out the avro/parquet disparity on results?
[15:19:19] * joal claps for fdans !!!
[15:19:20] ohhh
[15:19:20] yes
[15:19:26] elukey makes sense, we didn't tag
[15:19:37] i think some of those were just patches that were low priority
[15:19:43] ottomata: also when merging from master to debian there are moar things than mine, just wanted to make sure it was ok to release
[15:19:48] yeah it is
[15:19:52] ahh okok
[15:19:54] super
[15:19:54] we were just waiting for a reason to release i think
[15:20:01] thank you
[15:20:13] nuria: Not a disparity on parquet/avro, but a disparity between the spark and hive ways of computing 'like' string patterns
[15:21:22] joal: thanks for the ping
[15:21:38] elukey: im back :D
[15:21:47] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Nuria) Ya, +1 to joseph, Special:blah urls (other than Special:Search) should not have been counted as pagevie...
[15:23:57] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Nuria) So this query needs to remove the is_pageview=true line: https://github.com/wikimedia/analytics-refiner...
[15:24:12] Analytics, Wikimedia-General-or-Unknown, WikimediaMessages: Add link from wikis' footer to Wikistats 2 - https://phabricator.wikimedia.org/T235803 (MarcoAurelio) Open→Resolved a:Jdforrester-WMF
[15:24:14] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Addshore) Right then, so we need to: # Remove this condition https://github.com/wikimedia/analytics-refinery-...
[15:24:15] joal: WHAT!
[15:24:25] joal: civilization is coming to an end!
[15:24:35] nuria: I guess pageview_info["project"] also needs changing
[15:24:37] indeed nuria - This feels wrong :)
[15:24:39] * addshore looks at the schema again
[15:24:40] Analytics, Wikimedia-General-or-Unknown, WikimediaMessages, MW-1.35-notes (1.35.0-wmf.4; 2019-10-29): Add link from wikis' footer to Wikistats 2 - https://phabricator.wikimedia.org/T235803 (Jdforrester-WMF) This should begin roll out to production with the train tomorrow.
[15:26:50] I guess we use normalized_host and project + project_family ?
[15:27:08] Analytics, Analytics-EventLogging, Event-Platform, Operations, and 4 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (akosiaris)
[15:27:24] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Nuria) yes, you can use https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/m...
[15:28:11] addshore: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/GetHostPropertiesUDF.java
[15:28:19] thanks, just saw the phab comment :)
[15:28:19] Analytics, Wikimedia-General-or-Unknown, WikimediaMessages, MW-1.35-notes (1.35.0-wmf.5; 2019-11-05): Add link from wikis' footer to Wikistats 2 - https://phabricator.wikimedia.org/T235803 (Jdforrester-WMF) Ahem. Obviously, no it won't, because today isn't Monday.
[15:29:22] addshore: so changes to query are small, given that the x-analytics filter is also reducing your resultset i do not think queries would take much longer
[15:29:41] amazing, just going to write everything in there and try to get someone other than me to pick it up
[15:29:49] Analytics, Analytics-EventLogging, Event-Platform, Operations, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (akosiaris) @Ottomata, I'd say so. I would ask for review from #Traffic or #ServiceOps, but unless it was under the goals o...
[15:29:53] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Addshore) indeed, project + project_family can be used, so something like: get_host_properties('en.m.zero.wik...
[15:30:04] Analytics, Analytics-EventLogging, Event-Platform, Operations, and 4 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (Ottomata) > I'm inclined to just use the existent eventgate-analytics backend endpoint for now. Recent discussions about m...
[15:30:26] Analytics, Analytics-EventLogging, Event-Platform, Operations, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (Ottomata) Ok, I will make patches then.
[15:31:17] elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/547239
[15:31:22] OH we have a meeting rn!
[15:31:48] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Addshore)
[15:31:55] thanks team!
[15:40:16] mforns / ottomata: added you to the gerrit patch for rsyncing geoeditors, sorry I forgot last week. It's ready for review, let me know what you think
[15:44:23] k
[16:16:48] Analytics: Spike [2019-2020 work] Airflow Study - https://phabricator.wikimedia.org/T217059 (Ottomata) FYI, RelEng is considering using Argo for CI in Kubernetes. Argo looks like it has some similarities with Airflow: https://github.com/argoproj/argo/issues/849
[16:20:12] Analytics: Spike [2019-2020 work] Airflow Study - https://phabricator.wikimedia.org/T217059 (Ottomata) Also: https://www.pachyderm.io/
[16:21:59] Analytics, Analytics-Kanban, Multimedia, Tool-Pageviews: Make job to backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data - https://phabricator.wikimedia.org/T234591 (Nuria)
[16:32:35] Analytics: Spike [2019-2020 work] Oozie Replacement study (Airflow, Argo, Pachyderm, Kubernetes, etc.) - https://phabricator.wikimedia.org/T217059 (Ottomata)
[16:33:03] Analytics: Spike [2019-2020 work] Ozie Replacement. Airflow Study / Argo Study - https://phabricator.wikimedia.org/T217059 (Nuria)
[16:35:51] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Neil_P._Quinn_WMF) Thanks, @Ottomata! Currently, https://analytics.wikimedia.org/published-datasets/ is returning 404. Any idea...
[16:40:53] Analytics, Analytics-Kanban, Multimedia, Tool-Pageviews: Make job to backfill data from mediacounts into mediarequests tables in cassandra so as to have historical mediarequest data - https://phabricator.wikimedia.org/T234591 (Nuria) Per the many issues we have seen recently with cassandra not be...
[16:58:01] Analytics, Analytics-Kanban, Patch-For-Review: Check Avro as potential better file format for wikitext-history - https://phabricator.wikimedia.org/T236687 (mforns) @JAllemandou I think the rlike comparison does not require the matched string to start with the pattern, unless you use ^. The like com...
[16:58:52] Analytics, Analytics-EventLogging, QuickSurveys, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (phuedx) >>! In T220627#5607061, @Isaac wrote: > As I go to do this analysis, what UTC day/ho...
[17:00:50] vetting now, nuria, data looks good overall, will dig deeper into a couple of specific examples
[17:01:02] milimetric: nice, thank you
[17:14:13] Analytics, Operations, Core Platform Team Legacy (Watching / External), Patch-For-Review, and 2 others: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (Dzahn) Could you please merge/amend/remove the missing cumin ali...
[17:38:08] Analytics: Spike [2019-2020 work] Oozie Replacement. Airflow Study / Argo Study - https://phabricator.wikimedia.org/T217059 (Ottomata)
[17:38:37] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Ottomata) https://analytics.wikimedia.org/published-datasets/ was never a URL, it was always https://analytics.wikimedia.org/dat...
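Note on the T236687 comment quoted at 16:58: it says an rlike comparison does not need the string to start with the pattern unless `^` is used. The difference between that kind of unanchored regex search and a SQL-style `like`, which has to match the whole string, can be illustrated in plain Python. This is an analogy under stated assumptions, not Hive or Spark code: `re.search` stands in for rlike, and the hypothetical `sql_like` helper translates the `%` and `_` wildcards into a regex and requires a full match.

```python
import re


def rlike(text, pattern):
    """Unanchored regex search: the pattern may match anywhere in text."""
    return re.search(pattern, text) is not None


def sql_like(text, pattern):
    """SQL-style LIKE: '%' is any run of characters, '_' is one character,
    and the whole string has to match. Hypothetical helper for illustration."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


text = "foo bar baz"

print(rlike(text, "bar"))       # True: found in the middle of the string
print(rlike(text, "^bar"))      # False: '^' anchors the match at the start
print(sql_like(text, "bar"))    # False: no wildcards, so the whole string must equal 'bar'
print(sql_like(text, "%bar%"))  # True: wildcards allow text on both sides
```

A query that behaves differently between two engines is worth checking for exactly this kind of anchoring or wildcard difference.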
[17:47:42] ottomata: new cergen built and uploaded to buster-wikimedia, waiting for SRE before upgrading puppet masters but should be almost done
[17:48:05] nice
[17:48:14] thanks luccaaa
[17:54:42] ok will upgrade tomorrow, John and others seem to be already off :)
[17:54:48] going off too! o/
[17:56:04] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (Ottomata) Ok, eventlogging mysql data has been turned off. No new data will flow in. Next steps in order: 1. {T233891} - @elukey - {T236818} - @ottomat...
[17:56:16] ok laters luca!
[17:58:10] Analytics, Operations, Traffic, User-jbond: Fix geoip updaters for new MaxMind hashed keys by 2019-08-15 - https://phabricator.wikimedia.org/T228533 (jbond)
[18:01:08] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Neil_P._Quinn_WMF) >>! In T235494#5620453, @Ottomata wrote: > https://analytics.wikimedia.org/published-datasets/ was never a UR...
[18:01:52] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Ottomata) Must be because I get a dir listing! :)
[18:25:46] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Create a reports directory under analytics.wikimedia.org - https://phabricator.wikimedia.org/T235494 (Neil_P._Quinn_WMF) >>! In T235494#5620618, @Ottomata wrote: > Must be because I get a dir listing! :) Yeah, now I do too. I'm ju...
[18:39:01] Analytics, Operations, Core Platform Team Legacy (Watching / External), Patch-For-Review, and 2 others: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (herron)
[18:54:29] Analytics: Dashiki: Read multiple wikis from single file - https://phabricator.wikimedia.org/T236941 (Milimetric)
[19:03:41] Analytics: Dashiki: Read multiple wikis from single file - https://phabricator.wikimedia.org/T236941 (Nuria) cc-ing @srishakatux as she might be interested to know when this is done
[19:06:17] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Optimize archiva git-fat symlink script - https://phabricator.wikimedia.org/T235668 (Ottomata)
[19:14:25] Analytics, Cloud-Services, Developer-Advocacy (Oct-Dec 2019): Setup Config:Dashiki:WMCSEdits on meta wiki - https://phabricator.wikimedia.org/T236223 (srishakatux) Thank you both for your helpful reply! We discussed the two possible layouts in our meeting and agreed that: * Ratios/percentages are wh...
[19:53:20] Gone for tonight
[20:22:06] Analytics, Wikimedia-General-or-Unknown, WikimediaMessages, MW-1.35-notes (1.35.0-wmf.5; 2019-11-05): Add link from wikis' footer to Wikistats 2 - https://phabricator.wikimedia.org/T235803 (fdans) @Jdforrester-WMF can't thank you enough for doing this. I'm beyond thrilled to see the increase in a...
[20:26:17] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Ottomata) client/error schema has been merged. eventgate-logging.wmflabs.org has been updated to...
[20:33:26] Analytics, Analytics-Wikistats: [Wikistats v2] Default selection for (active) editors is confusing for inexperienced users - https://phabricator.wikimedia.org/T213800 (Nemo_bis)
[21:00:56] Analytics-Kanban, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, and 4 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (Ottomata)
[21:01:07] Analytics-Kanban, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, and 4 others: Create new eventgate-logging deployment in k8s with helmfile - https://phabricator.wikimedia.org/T236386 (Ottomata) a:elukey→Ottomata
[21:11:09] Analytics, Wikimedia-General-or-Unknown, WikimediaMessages, MW-1.35-notes (1.35.0-wmf.5; 2019-11-05): Add link from wikis' footer to Wikistats 2 - https://phabricator.wikimedia.org/T235803 (Jdforrester-WMF) Happy to help, especially to highlight the excellent work you all have done!
[21:44:46] (PS1) Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486)
[21:47:16] (PS2) Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486)
[21:48:33] (CR) Mforns: [C: -2] "Still testing" [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[21:50:03] (Abandoned) Mforns: Add report generation for data quality oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/546212 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[21:51:07] (Abandoned) Mforns: Add spark job to generate a data quality report [analytics/refinery/source] - https://gerrit.wikimedia.org/r/541557 (https://phabricator.wikimedia.org/T215863) (owner: Mforns)
[22:22:43] I've got a problem trying to use a modern version of R with SWAP
[22:22:56] I'm on notebook1004
[22:23:25] I installed R 3.6 and the IRkernel following https://github.com/IRkernel/IRkernel
[22:24:16] I can start jupyter console using the R 3.6 kernel, it shows up in the jupyter-notebook dropdown menu, but when I try to connect to it from the notebook I get "kernel died"
[23:03:03] (CR) MNeisler: Add the MobileWebUIActionsTracking schema to EventLogging whitelist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: MNeisler)
[23:12:39] (CR) Nuria: Add the MobileWebUIActionsTracking schema to EventLogging whitelist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: MNeisler)
[23:27:12] Analytics, Wikimedia-General-or-Unknown, WikimediaMessages, MW-1.35-notes (1.35.0-wmf.5; 2019-11-05): Add link from wikis'
footer to Wikistats 2 - https://phabricator.wikimedia.org/T235803 (Nuria) @Jdforrester-WMF super thanks. REALLY.
[23:36:04] Analytics, Analytics-Kanban: logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (Nuria) soo.. changing ` mapreduce.map.log.level INFO ...
[23:51:03] Analytics, Analytics-Kanban: logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (Nuria) Also tried to add -Dyarn.app.mapreduce.am.log.level=ERROR to the command that runs the job, that did not work either