[02:03:04] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Milimetric) @Ijon I'm working on a blacklist, and wanted to check with you to see how it would impact the usefulness of the datase...
[02:27:59] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 4 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (Nuria) > It can be delayed for an arbitrary amount of time; events (with a timestamp and all their data) wi...
[04:18:37] PROBLEM - Check the last execution of monitor_refine_mediawiki_events on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_mediawiki_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:14:37] good morning camus!
[06:15:54] from the error logs it seems that the mapreduce job suffered a heap problem
[06:16:12] so I have manually added a more generous setting and restarted
[06:16:36] brb
[06:22:07] worked, and refine is now running
[06:22:10] let's see
[06:35:00] RECOVERY - Check the last execution of monitor_refine_mediawiki_events on an-coord1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:35:55] goooood
[07:04:12] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) I am inclined to mark this as done given T234826
[08:11:42] awight: you broke my heart :(
[08:11:56] (kidding, thanks a ton for the el testing!)
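The "more generous setting" elukey mentions for the heap problem is not spelled out in the log. A common Hadoop rule of thumb is to give the JVM heap roughly 80% of the container's memory allotment; a hypothetical sketch of that sizing (the actual property names and values used on an-coord1001 are an assumption, not in the log):

```python
# Hypothetical helper illustrating a common MapReduce sizing rule of thumb:
# the -Xmx heap gets ~80% of the mapreduce.{map,reduce}.memory.mb container
# allotment, leaving headroom for non-heap JVM memory. Values are made up.
def heap_opts(container_mb: int, ratio: float = 0.8) -> str:
    """Return a JVM -Xmx flag sized as a fraction of the YARN container."""
    return f"-Xmx{int(container_mb * ratio)}m"

# e.g. a "more generous" 4 GiB container would pair with:
print(heap_opts(4096))  # -Xmx3276m
```

This is only a sketch of the heuristic, not the actual change applied that morning.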
[08:41:21] Analytics, Discovery, Event-Platform, Wikidata, and 3 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (dcausse) @Ottomata absolutely this is for analysis purposes
[08:43:53] Analytics, Discovery, Event-Platform, Wikidata, and 3 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (dcausse)
[08:48:23] Analytics, Discovery, Event-Platform, Wikidata, and 3 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (dcausse) @Ottomata I updated the task description to indicate the steps needed to make this happen, I probably missed some...
[09:34:28] hey, I have a quick question about wdqs_extract (don't worry, I don't want to bring it back)
[09:35:55] just in case you remember what it was doing: was it just a subset of the webrequest logs, or did it have some advanced transformations that I could reuse to extract the sparql queries our wdqs hosts receive today?
[09:47:42] no idea dcausse :(
[09:48:48] Analytics, Analytics-Kanban, User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (elukey)
[10:17:36] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) @Ottomata do you think that it would be ok to remove the current HDFS backup (to remove one thing that blocks the krb deployment) given T2...
[10:49:02] * elukey lunch!
[11:51:40] Hi team - I'm here while the kids sleep and then will be back for standup
[11:52:16] joal: helloooo good sir
[12:21:24] elukey: which camus job failed?
[12:21:41] ...wait i will check email....
[13:23:55] dcausse: I'll check the git history and let you know
[13:24:59] milimetric: thanks! but don't waste too much time on this, I'm close to having something working in a SWAP notebook
[13:25:38] dcausse: 2 minutes :)
[13:25:48] :)
[13:25:50] dcausse: just a straight copy: https://github.com/wikimedia/analytics-refinery/commit/bdd566ada8a797b5670208987cd53994775f8f87#diff-fae5910d5b2d0fa42bd76cd2ddb95c20
[13:25:54] no transformations
[13:26:34] milimetric: great, thanks!
[13:27:53] just discovered the tags field on webrequest, and thankfully you add a 'sparql' tag somewhere in the refinery process; this is super handy for my use case
[14:29:44] hive2druid working now in hadoop test!
[14:56:39] cool!
[15:10:28] dcausse: stas added that tag for, i think, your same use case
[15:47:50] Analytics, Analytics-EventLogging, Analytics-Kanban: drop CitatitionUsage data on mysql - https://phabricator.wikimedia.org/T233893 (Nuria) Open→Resolved
[15:47:52] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (Nuria)
[15:48:12] Analytics, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar), User-Elukey: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (Nuria) Open→Resolved
[15:48:27] Analytics, Analytics-EventLogging, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), Services (watching): Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Nuria)
[15:48:29] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: CI Support for Schema Registry - https://phabricator.wikimedia.org/T206814 (Nuria) Open→Resolved
[15:48:40] Analytics, Analytics-Kanban: Add more dimensions to netflow's druid ingestion specs - https://phabricator.wikimedia.org/T229682 (Nuria) Open→Resolved
[15:49:00] Analytics, Analytics-Kanban, Services (watching): Mediarequests: Add endpoint for agreggated counts per file type per project - https://phabricator.wikimedia.org/T231589 (Nuria) Open→Resolved
[15:49:03] Analytics, Patch-For-Review, Services (watching): Add mediacounts data to AQS and, from there, Restbase - https://phabricator.wikimedia.org/T207208 (Nuria)
[15:49:18] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Allow all Analytics tools to work with Kerberos auth - https://phabricator.wikimedia.org/T226698 (Nuria)
[15:49:20] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Move refinery to hive 2 actions - https://phabricator.wikimedia.org/T227257 (Nuria) Open→Resolved
[15:49:32] Analytics-Kanban: Deprecate Python 2 software from the Analytics infrastructure - https://phabricator.wikimedia.org/T204734 (Nuria)
[15:49:34] Analytics, Analytics-Kanban: Verify what Python 2 packages deployed to Analytics hosts are needed - https://phabricator.wikimedia.org/T204737 (Nuria) Open→Resolved
[15:49:51] Analytics, Analytics-Kanban: Move Analytics Report Updater to Python 3 - https://phabricator.wikimedia.org/T204736 (Nuria) Open→Resolved
[15:50:10] Analytics, Analytics-Kanban, Event-Platform, Scoring-platform-team, Patch-For-Review: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (Nuria) Open→Resolved
[15:50:30] Analytics, Analytics-Kanban, Patch-For-Review: Move the Analytics Refinery to Python 3 - https://phabricator.wikimedia.org/T204735 (Nuria) Open→Resolved
[15:50:32] Analytics-Kanban: Deprecate Python 2 software from the Analytics infrastructure - https://phabricator.wikimedia.org/T204734 (Nuria)
[15:51:24] Analytics, Analytics-Kanban, Event-Platform: Clean up descriptions of fields in included common schemas in mediawiki/event-schemas repository - https://phabricator.wikimedia.org/T233057 (Nuria) Open→Resolved
[15:51:26] Analytics, Analytics-EventLogging, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), Services (watching): Modern Event Platform: Schema Registry: Implementation - https://phabricator.wikimedia.org/T206789 (Nuria)
[15:53:18] (CR) Nuria: Add spark job to generate a data quality report (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/541557 (https://phabricator.wikimedia.org/T215863) (owner: Mforns)
[15:59:48] (CR) Nuria: [C: +2] "Let's merge this code if we have tested the job." [analytics/refinery] - https://gerrit.wikimedia.org/r/538613 (https://phabricator.wikimedia.org/T233504) (owner: Joal)
[16:10:56] (PS2) Fdans: (wip) Add backfill queries for mediarequest metrics [analytics/refinery] - https://gerrit.wikimedia.org/r/541817
[16:33:35] milimetric: o/
[16:33:37] sorry elukey, was running to another meeting
[16:33:39] elukey: I joined back, but you left just before :)
[16:33:44] you were gonna say about the aqs patch?
[16:33:46] aahaha sorry guys
[16:34:13] milimetric: yes, merged and puppet run on the aqs host; we can restart aqs on aqs1004 and test when you have time
[16:34:18] (the apply to all)
[16:34:33] depool 1004, restart aqs, test, repool, apply all
[16:34:42] usually this is what I do with Joseph
[16:35:00] yep, I can test now elukey
[16:35:13] ah ok, lemme depool 1004 then
[16:36:03] milimetric: 1004 depooled and ready
[16:37:20] Analytics, Desktop Improvements, Event-Platform, Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (ovasileva)
[16:37:57] Analytics: dumps.wikimedia.org/other/mediawiki_history is missing some files - https://phabricator.wikimedia.org/T235112 (mforns)
[16:38:11] elukey: uh... weird... not getting data, hang on, gotta reboot my brain
[16:39:49] elukey: yeah, confirmed, aqs1004 is broken somehow, aqs1005 is fine
[16:40:17] proof: this returns no results: curl http://localhost:7232/analytics.wikimedia.org/v1/edits/aggregate/all-projects/all-editor-types/all-page-types/monthly/2017060100/2019100500
[16:40:24] like none at all, not even the older ones
[16:49:54] ouch
[16:54:12] milimetric: mmm does it work with curl -X GET --header 'Accept: application/json; charset=utf-8'?
[16:55:05] I just tunneled and curled Druid directly and mediawiki_history_reduced_2019_09 works fine from druid1004
[16:55:06] ah no, zero results for that
[16:55:25] but not fine from druid1005!
[16:55:30] which is ... weird
[16:56:02] what do you mean, not fine?
[16:56:05] elukey: steps to repro:
[16:56:11] https://www.irccloud.com/pastebin/FGQiX9pn/
[16:56:25] this works if I'm doing:
[16:56:25] ssh -N druid1004.eqiad.wmnet -L 8082:druid-public-broker.svc.eqiad.wmnet:8082
[16:56:30] anyway, with your curl, if you swap localhost with aqs1004 it works
[16:56:30] and it doesn't work if I'm doing:
[16:56:33] ssh -N druid1005.eqiad.wmnet -L 8082:druid-public-broker.svc.eqiad.wmnet:8082
[16:57:35] ok, 'cause it's rebalancing me (really silly how it does that), but the above is still weird, no?
[16:59:13] I am trying to repro now
[16:59:56] yeah, there's something still weird going on with the druid cluster, like it somehow hasn't distributed that datasource yet. Because this doesn't work:
[16:59:56] curl http://aqs1004.eqiad.wmnet:7232/analytics.wikimedia.org/v1/editors/aggregate/en.wikipedia.org/all-editor-types/all-page-types/all-activity-levels/monthly/2017090100/2019100900
[17:02:40] (but it does work with aqs1005, as you'd expect it to)
[17:02:42] the curl in the irccloud paste yields me an error for unclosed braces or similar
[17:06:37] ok, I can repro, druid1005 seems weird
[17:11:30] !log restart druid-broker on druid100[5-6] - not serving data correctly
[17:11:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:11:57] milimetric: now they work
[17:11:59] super weird
[17:12:51] indeed, the druids seem to have agreed to collaborate :)
[17:13:19] hammer ftw :)
[17:13:31] deploy away, best not to ask too many questions
[17:13:38] (said Luca ... never)
[17:13:39] I don't like this though
[17:13:52] it seems like a druid broker bug
[17:13:59] like they were stuck in some state
[17:14:01] sigh
[17:14:02] sorry, I'm in a meeting, yeah, I'd be happy to brainbounce after
[17:14:45] milimetric: ok to complete the aqs roll restart then?
[17:15:40] just repooled aqs1004, looks good
[17:15:48] elukey: I'm out of the meeting if you want to think about it more
[17:15:56] but it seemed like a hiccup where that datasource wasn't replicated
[17:16:03] agree that it's weird and that druid shouldn't do that
[17:17:52] milimetric: I think that the historicals all had the data correctly in place (otherwise we wouldn't have seen such a quick recovery) but the brokers were stuck in some weird state
[17:18:22] now I am wondering if they were returning empty responses to clients when hitting 1005/1006
[17:18:24] makes sense. Then yeah, roll deploy and we'll keep an eye on it
[17:18:38] I can always roll back with andrew if it keeps being weird
[17:19:07] it was just the _09 datasource elukey, the _08 one wasn't affected
[17:19:22] so the weirdness never surfaced to the public
[17:19:37] ah ok, better :)
[17:25:48] Analytics, MinervaNeue, Readers-Web-Backlog (Kanbanana-2019-20-Q2): MinervaClientError sends malformed events - https://phabricator.wikimedia.org/T234344 (Krinkle) @Jdrewniak If our statsv client is producing request urls with multiple query strings, that's a bug indeed. Feel free to use this task fo...
[17:27:15] milimetric: all done
[17:27:24] Analytics, MinervaNeue, Performance-Team (Radar), Readers-Web-Backlog (Kanbanana-2019-20-Q2): MinervaClientError sends malformed events - https://phabricator.wikimedia.org/T234344 (Krinkle)
[17:27:59] kk, will test
[17:28:39] looks great, thanks elukey
[17:30:33] I think that next Q we should upgrade Druid
[17:30:45] with the hope of a more stable thing
[17:31:32] will triple check later, going to dinner now!
[17:31:33] o/
[17:47:25] Analytics, Analytics-EventLogging, QuickSurveys, Readers-Web-Backlog (Kanbanana-2019-20-Q2): QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Jdlrobson)
[18:02:52] milimetric, joal: is there any scooping that happens from the labs db that is not mw history? (we scoop geoeditors from prod so i cannot think of anything else)
[18:03:42] nuria: yeah, tables like pagelinks that folks have requested
[18:03:56] joal, I'm seeing that Hive.py does not support partition values that have dots (.) in them, is that needed for security reasons? or could it change?
[18:04:05] milimetric: and those we scoop monthly, correct?
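For reference, the AQS endpoints curled during the afternoon's Druid debugging follow a fixed REST path scheme. A small sketch that rebuilds the edits/aggregate URL from its segments (host and port 7232 are taken from the log; the helper name and defaults are my own):

```python
# Sketch: rebuild the AQS edits/aggregate URL used in the debugging above.
# Path segments mirror the public Wikimedia REST API layout; 7232 is the
# internal AQS port seen in the log. Helper name is hypothetical.
def aqs_edits_url(host, project, start, end,
                  editor_type="all-editor-types",
                  page_type="all-page-types",
                  granularity="monthly"):
    return (
        f"http://{host}:7232/analytics.wikimedia.org/v1/edits/aggregate/"
        f"{project}/{editor_type}/{page_type}/{granularity}/{start}/{end}"
    )

# Reproduces the curl target from the log:
print(aqs_edits_url("localhost", "all-projects", "2017060100", "2019100500"))
```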
[18:04:07] but I don't think we process and publish anything else that we sqoop
[18:04:21] yes, all sqoops are monthly
[18:04:28] k
[18:27:12] Analytics, Cloud-Services, Developer-Advocacy (Oct-Dec 2019): Develop a tool or integrate feature in existing one to visualize WMCS edits data - https://phabricator.wikimedia.org/T226663 (Milimetric) >>! In T226663#5542873, @bd808 wrote: > @Milimetric I don't quite understand what happened here, so...
[18:31:42] (CR) Milimetric: [V: +2] Add network-origin to the geoeditors-daily table [analytics/refinery] - https://gerrit.wikimedia.org/r/538613 (https://phabricator.wikimedia.org/T233504) (owner: Joal)
[18:31:52] mforns_: want to look at jupyter if it is not too late?
[18:32:27] nuria, sure!
[18:32:31] batcave?
[18:33:03] mforns_: yessir
[18:40:21] Analytics, Analytics-Kanban, Cloud-Services, Developer-Advocacy (Jul-Sep 2019), Patch-For-Review: add whether an edit happened on cloud VPS to geoeditors-daily dataset - https://phabricator.wikimedia.org/T233504 (Milimetric) The column has been added and I'm restarting the job so it will be f...
[18:42:44] mforns_: anything left to deploy on https://phabricator.wikimedia.org/T223414? should I just move it to Done?
[18:44:56] a-team: gonna deploy refinery source and refinery now, don't see much on the deployment etherpad so ping me if you need to add something
[18:45:53] k!
[18:46:03] milimetric: what I merged doesn't need any actionables right now, deploying is enough
[18:46:13] k
[18:50:45] (PS1) Milimetric: Update changelog for 0.0.102 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/541892
[18:51:00] (CR) Milimetric: [V: +2 C: +2] Update changelog for 0.0.102 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/541892 (owner: Milimetric)
[18:56:58] oh ebernhardson, is that supposed to be a queue inside a queue?
[18:57:01] that's not a thing, is it?
[18:59:35] anyway, i 'fixed' it
[18:59:38] fifo
[18:59:39] oops
[18:59:43] https://gerrit.wikimedia.org/r/c/operations/puppet/+/541895
[18:59:57] i think yarn resourcemanager needs to be restarted for that to take effect
[19:00:04] i can do it if/when you need it, or we can wait until the next time it happens
[19:00:09] for maintenance reasons
[19:10:58] Analytics, Analytics-Kanban, Cloud-Services, Developer-Advocacy (Jul-Sep 2019), Patch-For-Review: add whether an edit happened on cloud VPS to geoeditors-daily dataset - https://phabricator.wikimedia.org/T233504 (Nuria) I do not think that is needed as #cloud-services team has that data from...
[19:41:29] ottomata: scap deploy failed on stat1007 canary:
[19:41:33] https://www.irccloud.com/pastebin/c8bkABsR/
[19:41:48] yeahhhhh
[19:42:02] hm, oh, this isn't merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/541775
[19:42:06] so not the problem
[19:42:39] you know milimetric, that change made me realize: the git fat sha symlink to the jar in archiva is not immediate.
[19:42:48] it is created by a cron that runs every 5 minutes
[19:42:52] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Ijon) Fine by me.
[19:43:08] maybe if you attempt to deploy too soon after the refinery-source release happens
[19:43:18] it'll fail.
[19:43:18] hm
[19:43:29] ottomata: are you telling me to wait 5 minutes and try again? :P
[19:43:38] possibly!
[19:43:51] ok, then this is what I will do
[19:45:18] (I was just kidding, what you say makes sense, but I want to try unplugging and plugging something in after this)
[19:47:18] ya milimetric, i think i should fix up that git-fat link script
[19:47:27] it is too heavy with so much stuff in archiva
[19:47:43] it computes shasums of every artifact in archiva every 5 minutes
[19:47:52] so, i can see the process running now
[19:47:56] started 3 mins ago, not done yet.
[19:48:03] canary worked this time but it failed on all the other targets, it looks like
[19:48:39] it worked? seems surprising, i don't see that sha yet
[19:54:33] it just finished failing again, I'm trying everything again, but yeah, the second time the canary worked really fast... which seems weird
[19:55:16] ottomata: really weird, this time it just flew through the scap deploy and finished
[19:55:38] !log refinery ... probably? deployed with errors like "No such file or directory (2)\nrsync error"
[19:55:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:56:45] uhhh hm
[19:56:50] ottomata: I did something wrong... the refinery commit says vv0.0.102
[19:56:56] but I removed that when I ran the build!
[19:57:03] (the leading v)
[19:57:23] ya milimetric, something is wrong i think
[19:57:35] artifacts/org/wikimedia/analytics/refinery/refinery-cassandra-v0.0.102.jar
[19:57:47] all the 102 jars are v0.0.102
[19:57:50] and their shas are wrong
[19:57:51] da39a3ee5e6b4b0d3255bfef95601890afd80709
[19:57:55] each has the same
[19:58:25] arghhh
[19:59:15] hm, is wikitech down for you?
[19:59:51] ah, it's back.
[20:00:58] milimetric: assuming the previous version value was bad, you might be able to just run https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/build again
[20:01:02] with the 'v'-less version
[20:01:41] I swear I didn't type a v in that stupid box, just like last time this happened
[20:02:29] ok, I'll run that and run scap again after... I guess
[20:02:33] ya
[20:06:03] PROBLEM - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:06:21] PROBLEM - Check the last execution of eventlogging_to_druid_netflow_hourly on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_netflow_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:08:49] PROBLEM - Check the last execution of eventlogging_to_druid_navigationtiming_hourly on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:13:58] ottomata: same thing this time as far as scap is concerned - it takes a while and fails, then if you rerun it, it finishes right away
[20:14:12] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 4 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (Ottomata) > We should probably not ask mobile clients to download a bunch of kilobytes that they will proba...
[20:14:30] milimetric: that makes sense i think; git fat pull isn't happening the second time around
[20:14:54] the git pull of the refinery repo works; git fat shas are updated, git fat fails.
[20:14:55] yeah, but it doesn't matter how long I wait before running the first time, it still fails
[20:15:01] next scap, refinery git doesn't change versions
[20:15:03] so no git fat is run
[20:15:05] right
[20:15:30] ok, the shas look better
[20:15:32] let me check in archiva
[20:15:52] milimetric: what is the git fat failure?
[20:16:10] same as what I pasted above, the no such file or directory
[20:16:28] ya, but which sha?
[20:16:45] (that will tell me which file)
[20:16:55] i checked a few of the ones you just added
[20:16:59] they exist in archiva;
[20:17:51] hm, now refinery-deploy-to-hdfs fails
[20:18:25] oh sorry, I closed that terminal
[20:19:03] https://www.irccloud.com/pastebin/dPYihif2/
[20:20:15] milimetric: makes sense
[20:20:16] git fat failed
[20:20:23] but we need to know which files failed.
[20:20:28] i will try git fat pull on the deploy host
[20:21:03] /git-fat/da39a3ee5e6b4b0d3255bfef95601890afd80709
[20:21:04] oh
[20:21:04] milimetric:
[20:21:17] this is because the vXXX files that were around from before are still there
[20:21:20] we need to remove them manually
[20:21:23] will do
[20:21:52] thanks, appreciate it
[20:23:14] (PS1) Ottomata: Removing bad jars accidentally added by jenkins during release [analytics/refinery] - https://gerrit.wikimedia.org/r/541909
[20:23:31] (CR) Ottomata: [V: +2 C: +2] Removing bad jars accidentally added by jenkins during release [analytics/refinery] - https://gerrit.wikimedia.org/r/541909 (owner: Ottomata)
[20:23:34] PROBLEM - Check the last execution of refine_mediawiki_events on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_mediawiki_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:23:40] will look ^
[20:24:30] ?
[20:24:43] jobs are failing because of missing jars?
[20:24:50] but....the old versions?
[20:25:27] dunno what's up there, expect those to be fixed after this scap deploy
[20:27:56] ottomata: k
[20:27:59] maybe something's referencing current or something?
[20:28:08] maybe? but not all of those.
[20:28:20] milimetric: my scap deploy is hanging at
[20:28:20] analytics/refinery: fetch stage(s): 14% (ok: 1; fail: 0; left: 6)
[20:28:31] canary was successful
[20:28:38] oh
[20:28:40] it just moved
[20:28:41] ok
[20:28:44] probably taking a while to pull jars?
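The sha that keeps coming up, da39a3ee5e6b4b0d3255bfef95601890afd80709, is itself diagnostic: it is the SHA-1 digest of zero bytes, meaning git-fat hashed empty files for every 0.0.102 jar. A quick check:

```python
import hashlib

# The sha recorded for every bad 0.0.102 jar is the SHA-1 of the empty
# byte string, i.e. the artifacts git-fat hashed were zero bytes long.
empty_sha1 = hashlib.sha1(b"").hexdigest()
print(empty_sha1)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```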
[20:30:21] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 4 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (Ottomata) Ok, https://github.com/ottomata/mediawiki-extensions-ConfigExports now supports filtering on conf...
[20:31:37] yeah, when it works it does take a really long time
[20:32:55] Analytics, Analytics-Kanban, Event-Platform, ORES, and 5 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180 (Ottomata) FYI, we fixed the hairy problems by supporting map type fields, which the revisions-score stream uses! mediawik...
[20:33:00] PROBLEM - Check the last execution of refine_eventlogging_analytics on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:33:00] PROBLEM - Check the last execution of refine_mediawiki_job_events on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refine_mediawiki_job_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[20:33:44] Analytics, Event-Platform, Core Platform Team Legacy (Watching / External), Services (watching): log-events topic emitted in EventBus - https://phabricator.wikimedia.org/T155804 (Ottomata) Open→Declined Don't think this will go anywhere, declining! Feel free to reopen.
[20:34:47] 1 left...
[20:35:44] Analytics, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), Patch-For-Review, Services (next): mediawiki/recentchange event should not use fields with polymorphic types - https://phabricator.wikimedia.org/T216567 (Ottomata) Open→Declined
[20:35:49] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 3 others: CI Support for Schema Registry - https://phabricator.wikimedia.org/T206814 (Ottomata)
[20:36:38] Analytics, Event-Platform, Core Platform Team Legacy (Watching / External), Services (watching): Failure in EventBus schema for mediawiki/revision/visibility-change - https://phabricator.wikimedia.org/T187362 (Ottomata) Open→Stalled
[20:41:46] milimetric: still doing the scap deploy?
[20:42:13] nuria: Andrew was fixing it, a few minutes ago he said one host was left
[20:43:38] ottomata: thanks for updating yarn, i'll try it out today
[20:44:08] ebernhardson: i haven't restarted resourcemanager
[20:44:19] nuria: yeah, it is still doing one host
[20:44:21] not sure which one
[20:44:32] milimetric: you should be able to run deploy to hdfs though
[20:44:34] go ahead and do that
[20:44:38] from stat1007 i guess?
[20:44:47] ottomata: pushing?
[20:44:48] ok
[20:44:58] nuria: i think it is git fat pulling on some target node
[20:45:03] analytics/refinery: fetch stage(s): 85% (ok: 6; fail: 0; left: 1)
[20:45:04] yeah, it's working now
[20:45:43] ottomata: hmm, yarn.wikimedia.org reports the queue, not sure
[20:46:17] oh
[20:46:20] then great
[20:46:21] no restart needed
[20:46:23] go ahead
[20:46:26] ok :)
[20:52:54] !log deploy of refinery and refinery-source 0.0.102 finally seems to have finished
[20:52:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:53:10] ottomata: if that other host is done, then nuria you can go ahead and bump up the version and restart
[20:53:36] nuria: I wanted to do that but I'm in a meeting now, I can do it in a couple of hours
[20:54:13] milimetric: let's see that the alarms go to ok before doing anything, i can bump the jar and backfill after that
[20:54:49] ok, I'll do the geoeditors_daily job
[20:54:58] (also when alarms are ok)
[20:57:56] milimetric: right, i do not understand why the alarms triggered (cc ottomata) for jobs like refine that have the older jars there
[20:58:04] milimetric: do you understand that?
[20:58:11] not at all
[20:58:29] there's a whole layer of magic around how git fat works, and all I know is that sometimes it doesn't and everything explodes
[20:59:22] nuria: i don't know why, i think perhaps that when the git fat pull failed, the old git-fat jars must have been left in a bad state?
[20:59:25] but i am not totally sure.
[21:00:21] ya, on an-coord right now even the old jars don't have real content
[21:00:29] ottomata: whatata?
[21:00:59] i think it is in the middle of the scap deploy
[21:01:04] that's the remaining host
[21:01:09] dunno why it's taking so long...
[21:01:11] ottomata: i do not understand, git fat should pull with a sha just the new jars, right?
[21:01:56] not 100% sure with scap, because scap does some symlink swapping when it deploys new versions
[21:02:04] so it might have to re-pull each git fat jar every time
[21:02:05] not sure.
[21:02:18] going to ctrl-c the scap deploy and try an-coord again
[21:03:03] ok, looks good now
[21:03:09] dunno why it didn't finish
[21:03:16] but i see real jar files now
[21:03:23] so, expecting recovery for the next set of scheduled refines
[21:09:27] ottomata: k
[21:25:57] milimetric: the changelog is missing the addition of the new column to geoeditors daily; we cannot change it now as it is pushed, but FYI
[21:26:02] RECOVERY - Check the last execution of refine_mediawiki_job_events on an-coord1001 is OK: OK: Status of the systemd unit refine_mediawiki_job_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[21:26:42] nuria: the column was just added to refinery, not refinery-source
[21:27:00] RECOVERY - Check the last execution of refine_mediawiki_events on an-coord1001 is OK: OK: Status of the systemd unit refine_mediawiki_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[21:35:29] milimetric: ah yes, that's right
[21:36:40] RECOVERY - Check the last execution of refine_eventlogging_analytics on an-coord1001 is OK: OK: Status of the systemd unit refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[21:37:08] gotta run, byeyaa
[21:41:27] milimetric: ahem, given that nobody is around to merge the jar bump for refine: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/541929/ I will backfill tomorrow
[21:41:54] ok nuria, sounds good
[21:56:20] Analytics, Analytics-Kanban, Cloud-Services, Developer-Advocacy (Jul-Sep 2019), Patch-For-Review: add whether an edit happened on cloud VPS to geoeditors-daily dataset - https://phabricator.wikimedia.org/T233504 (Milimetric) ok, I restarted the monthly job and this column will be populated go...
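One way to tell "real jar files" apart from unpulled placeholders: git-fat replaces a managed file with a short text stub that starts with the magic string `#$# git-fat ` followed by the SHA-1 and size, so inspecting a file's first bytes distinguishes the two. A sketch (the stub magic is git-fat's format; the sample contents and helper name are made up):

```python
# Sketch: distinguish a real artifact from an unpulled git-fat stub.
# git-fat stubs are short text placeholders beginning with this magic,
# followed by the object's SHA-1 and size.
GIT_FAT_MAGIC = b"#$# git-fat "

def is_git_fat_stub(first_bytes: bytes) -> bool:
    """Return True if the given leading bytes look like a git-fat stub."""
    return first_bytes.startswith(GIT_FAT_MAGIC)

# Made-up sample contents for illustration:
stub = b"#$# git-fat 0123456789abcdef0123456789abcdef01234567 12345\n"
real_jar = b"PK\x03\x04"  # jars are zip archives, so they start with "PK"
print(is_git_fat_stub(stub), is_git_fat_stub(real_jar))  # True False
```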
[22:04:32] RECOVERY - Check the last execution of eventlogging_to_druid_navigationtiming_hourly on an-coord1001 is OK: OK: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:05:40] (PS1) MNeisler: Add the MobileWebUIActionsTracking schema to EventLogging whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563)
[22:06:00] RECOVERY - Check the last execution of eventlogging_to_druid_netflow_hourly on an-coord1001 is OK: OK: Status of the systemd unit eventlogging_to_druid_netflow_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:11:54] RECOVERY - Check the last execution of refine_sanitize_eventlogging_analytics_immediate on an-coord1001 is OK: OK: Status of the systemd unit refine_sanitize_eventlogging_analytics_immediate https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:26:37] (CR) Nuria: [C: -1] Add the MobileWebUIActionsTracking schema to EventLogging whitelist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: MNeisler)
[23:11:23] Analytics, Analytics-Kanban: Enable geoeditors_daily deletion - https://phabricator.wikimedia.org/T234238 (Nuria)
[23:12:28] Analytics: dumps.wikimedia.org/other/mediawiki_history is missing some files - https://phabricator.wikimedia.org/T235112 (Nuria) a: Ottomata
[23:12:42] Analytics: dumps.wikimedia.org/other/mediawiki_history is missing some files - https://phabricator.wikimedia.org/T235112 (Nuria) I think @Ottomata did this sync by hand
[23:15:52] Analytics, Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (Nuria) @MGerlach Logged-in users have a different pattern through the site; it will be worth checking that all their requests are served via varnish, if they are not (which might be the case) you have an...
[23:19:11] Analytics, Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (Nuria) Ok, so i confirmed the data will indeed be there for all logged-in users, just pageview times will be longer on repeated pageviews cause those pages are not cached
[23:23:13] Analytics, Analytics-Kanban: Superset not able to load a reading dashboard - https://phabricator.wikimedia.org/T234684 (Nuria) Pinging @JAllemandou in case he has other ideas
[23:24:58] Analytics-EventLogging, Analytics-Kanban: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (Nuria)
[23:25:43] Analytics, Analytics-EventLogging, Analytics-Kanban: Drop page create event data on mysql - https://phabricator.wikimedia.org/T233892 (Nuria) Open→Resolved
[23:25:45] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (Nuria)
[23:27:45] Analytics, Analytics-Kanban, Patch-For-Review: Coarse alarm on data quality for refined data based on entrophy calculations - https://phabricator.wikimedia.org/T215863 (Nuria) p: Normal→High
[23:28:35] Analytics, Research: Recommend the best format to release public data lake as a dump - https://phabricator.wikimedia.org/T224459 (Nuria) Open→Resolved
[23:28:38] Analytics, Analytics-Kanban, Research-Backlog, Patch-For-Review: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (Nuria)
[23:29:43] Analytics, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria) Ping @bblack to give us some priorities around this work
[23:32:15] Analytics, Analytics-Kanban, Cloud-Services, Developer-Advocacy (Oct-Dec 2019): Explore importing geoeditors_daily data (aggregated edits per namespace per country per wiki) into druid - https://phabricator.wikimedia.org/T234281 (Nuria) Open→Declined
[23:32:18] Analytics, Cloud-Services, Developer-Advocacy (Oct-Dec 2019): Develop a tool or integrate feature in existing one to visualize WMCS edits data - https://phabricator.wikimedia.org/T226663 (Nuria)
[23:34:25] Analytics: Use virtual image views to filter mediacounts - https://phabricator.wikimedia.org/T211030 (Nuria) Pinging @Tgr, but i think the logic to report mediacounts to the varnish endpoint was dismantled at some point.
[23:36:21] Analytics: add agent-type dimension to pageviews per country endpoint - https://phabricator.wikimedia.org/T233238 (Nuria) Declining, we decided a while back these endpoints are for tagged "user" traffic
[23:36:30] Analytics: add agent-type dimension to pageviews per country endpoint - https://phabricator.wikimedia.org/T233238 (Nuria) Open→Resolved
[23:36:39] Analytics: add agent-type dimension to pageviews per country endpoint - https://phabricator.wikimedia.org/T233238 (Nuria) Resolved→Declined
[23:39:28] Analytics, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria) a: JAllemandou
[23:39:51] Analytics, Analytics-Kanban, Operations, Traffic, observability: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria)