[00:01:37] <wikibugs>	 (03PS9) 10Nuria: Usage of commons files for tech tunning session metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/606734 (https://phabricator.wikimedia.org/T247417)
[00:05:05] <wikibugs>	 (03CR) 10Nuria: [C: 03+2] Update sqoop list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607092 (https://phabricator.wikimedia.org/T256013) (owner: 10Joal)
[00:05:18] <wikibugs>	 (03CR) 10Nuria: [V: 03+2 C: 03+2] Update sqoop list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607092 (https://phabricator.wikimedia.org/T256013) (owner: 10Joal)
[00:21:59] <icinga-wm>	 PROBLEM - Check the last execution of monitor_refine_eventlogging_analytics on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:47:25] <wikibugs>	 10Analytics, 10Research, 10WMDE-Analytics-Engineering: Please upgrade R on stat100** servers - https://phabricator.wikimedia.org/T256188 (10GoranSMilovanovic)
[02:26:07] <wikibugs>	 10Analytics-Radar, 10AbuseFilter, 10Cognate, 10ConfirmEdit (CAPTCHA extension), and 28 others: Replace PageContent(Insert|Save)Complete hooks - https://phabricator.wikimedia.org/T250566 (10DannyS712)
[06:23:37] <elukey>	 GoranSM: hi! The RAM on the stat100x boxes is there to be used, so temporary spikes in usage are allowed. If it is a sustained clogging for hours/days then it might be something to follow up.. 
[06:23:59] <elukey>	 but currently the host seems ok as far as I can see
[06:45:42] <elukey>	 sigh already too much space used for an-launcher1001, the 100G partition is not really enough
[06:45:48] <elukey>	 RU consumes a lot of GBs
[06:47:50] <elukey>	 whattt it is the RU's log directory
[06:48:12] <elukey>	 elukey@an-launcher1001:/srv/reportupdater/log/reportupdater-ee-beta-features$ du -hs *
[06:48:15] <elukey>	 48G syslog.log
[06:48:18] <elukey>	 mforns: --^ :D :D :D 
[06:48:58] <elukey>	 there are a ton of logs for
[06:48:59] <elukey>	 RuntimeError: pymysql can not execute query ((1146, "Table 'dewiki.user_properties' doesn't exist")).
[06:49:02] <elukey>	 etc..
[06:50:20] <elukey>	 !log truncate /srv/reportupdater/log/reportupdater-ee-beta-features from 43G to 1G on an-launcher1001 (disk space issues)
[06:50:21] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:58:12] <wikibugs>	 10Analytics: RU reportupdater-ee-beta-features keeps logging a lot of daily errors to its logs - https://phabricator.wikimedia.org/T256195 (10elukey) p:05Triage→03High
[07:05:47] <wikibugs>	 10Analytics, 10Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (10Nemo_bis) >>! In T210313#5957328, @Nuria wrote: > @BerndFiedlerWMDE Yes, it is the quotes and it is a known problem. moved issue to a different ticket {T247333}  Should this task st...
[07:21:46] <icinga-wm>	 RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:27:07] <wikibugs>	 (03PS2) 10Conniecc1: Add dimensions to editors_daily dataset [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607361 (https://phabricator.wikimedia.org/T256050)
[08:53:30] <wikibugs>	 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (10elukey) I was about to reimage and then I remembered about the `staging` database. I checked and there are ~270 tables o...
[08:53:42] <elukey>	 sigh --^
[09:13:33] <GoranSM>	 elukey: With due respect, if "...GoranSM: hi! The RAM on the stat100x boxes is there to be used, so temporary spikes in usage are allowed...", why was I criticized in the autumn 2019 for using 20 - 40 GB (for a rather critical update)? Current usage *by a single user* on stat1008 is: 85987, since yesterday.
[09:14:12] <wikibugs>	 10Analytics, 10Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (10BerndFiedlerWMDE) seems to be working now.
[09:21:27] <elukey>	 GoranSM: my mistake, I quickly checked at top and didn't see that it was still ongoing. Will follow up with the user, let's see if he/she needs all the RAM.
[09:22:19] <elukey>	 GoranSM: a comment related to "criticized" - it seems a strong word, we asked politely to review your usage of the machine, since there were alarms and issues related to it
[09:23:06] <elukey>	 and what we did as follow up to improve the daily work experience on stat100x of all our users has been
[09:23:12] <elukey>	 1) standardize stat100x configurations
[09:23:48] <elukey>	 2) apply automatic cgroups limitations to kill big processes before they could cause OOMs and alerts to the main SRE team
[09:24:07] <elukey>	 3) add nodes with more RAM (like stat1008 that has 128)
[09:24:10] <GoranSM>	 elukey: I don't want anyone stopping their jobs on the stat100* machines for me. It would be useful - for all of us who use the stat100* machines, I think - if people could come up with some estimates on how long would such heavy resource-dependent jobs take to run.
[09:24:29] <GoranSM>	 elukey: Your team's work on the stat100* machines is awesome, no question about that. 
[09:25:21] <elukey>	 GoranSM: my point is that we asked a lot from all of you people, like tuning your jobs etc.., because we were in a stressful situation and we needed to refactor our client nodes to avoid issues with other teams. 
[09:25:31] <GoranSM>	 elukey: "... a comment related to "criticized" - it seems a strong word" - sorry, but in spite of that fact that I've lived two years in the States it seems that I do not know how to use English anymore. Ok: you have asked politely to review my usage of the machine (which you really did). Better?
[09:25:36] <elukey>	 but we also worked a lot in the meantime :)
[09:27:17] <elukey>	 GoranSM: I didn't take any offence, I just wanted to follow up with you to understand why you were feeling "criticized", it is important for us to keep a good work relationship with all of you. If we fail to do so then it is really bad, so better to be very clear and sincere when there is ambiguity (also my understanding of english is far from ideal, so I could have misinterpreted the usage of 
[09:27:23] <elukey>	 "criticized", in case apologies)
[09:28:01] <GoranSM>	 elukey: the situation on the stat100* is superfine. I am not "criticizing" (ooops) here. I am just trying to say that it would be good if anyone needs to use as much as 80+ Gb RAM for something, which is perfectly legitimate in Data Science, and plans to do so for a prolonged period of time - everyone else should know. 
[09:29:02] <elukey>	 GoranSM: completely agree, but this is something that users should hopefully self manage. If you have ideas about how we (as Analytics) could help in facilitating please tell us
[09:29:15] <GoranSM>	 elukey: :) Of course
[09:37:25] <GoranSM>	 elukey: Idea: maybe we could have a page where the users could (a) describe their long running and demanding jobs, and then (b) provide estimates - from experience - on how much resources they are expected to consume and for how long? 
[09:40:52] <elukey>	 GoranSM: I think it could be doable, but most of the times people might not know in advance, or maybe may not be familiar with resource consumption. We could start following up with big users and ask to review their jobs, if we all do it then it should eventually be sufficient in my opinion
[09:42:58] <elukey>	 but if you want to create a page in wikitech and propose the procedure to analytics@ etc.. please go ahead!
[09:43:22] <elukey>	 anything that facilitates/coordinates users is surely good
[09:45:37] <GoranSM>	 elukey: Thank you. I think having a page like that would be a good idea.
[09:46:52] <GoranSM>	 elukey: I will do it at soon as I find some time, and then the Analytics-Engineering team can take a look at it and see if it could be of any help.
[09:59:27] <elukey>	 super
[10:33:28] * elukey lunch!
[11:27:09] <GoranSM>	 elukey: I was in touch with the user whose processes were consuming >80Gb RAM on stat1008: they seem to have not terminated many of their previously running Jupyter processes, resulting in an excessive memory use. A very kind person, responded to my inquiry in no time and took care of the problem. 
[12:19:49] <bearloga>	 ottomata: ? about how streams in wmf-config get merged. say I have streams X (w/ config A) & Y in 'default' and streams X (w/ config B) & Z in 'enwiki'
[12:20:01] <bearloga>	 ottomata: then enwiki will have streams: X (w/ config B, **NOT** A), Y, and Z
[12:20:18] <bearloga>	 ottomata: right?
[12:20:26] <GoranSM>	 @everyone I am now consuming ~27Gb of RAM out of 64Gb on stat1005: it will be over soon. I am trying to optimize the code for some Jaccard matrix computation exactly to be able to run it without excessive RAM usage; experimenting with various R functions in that respect now. Thank you for your understanding.
[12:24:41] <wikibugs>	 10Analytics, 10Better Use Of Data, 10Event-Platform: Stream cc map should not be generated on every pageload - https://phabricator.wikimedia.org/T256169 (10mpopov)
[12:42:13] <wikibugs>	 10Analytics, 10Better Use Of Data, 10Event-Platform: Stream cc map should not be generated on every pageload - https://phabricator.wikimedia.org/T256169 (10mpopov) @Ottomata: do you think we can incorporate that algorithm into EventLogging in a way that the cc map isn't generated client-side on every pageloa...
[12:48:16] <elukey>	 GoranSM: good!
[12:48:31] <elukey>	 (for the jupyter resources freed)
[13:12:08] <wikibugs>	 (03CR) 10Milimetric: "Looks good. Not sure about the repeated tests, like Marcel.  Maybe just test the UDF?  It's testing the underlying function anyway.  But a" (034 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/597740 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[13:12:20] <milimetric>	 sorry so late on that review fdans
[13:19:48] <wikibugs>	 (03CR) 10Milimetric: [C: 03+2] Add special explode UDTF that turns EZ-style hourly strings into rows [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/596605 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[13:24:16] <wikibugs>	 (03Merged) 10jenkins-bot: Add special explode UDTF that turns EZ-style hourly strings into rows [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/596605 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[13:31:14] <ottomata>	 bearloga:  I think that is right
[13:34:10] <bearloga>	 ottomata: ok cool, thanks! :) also we should probably verify that at some point on beta cluster once the sampling config patch is merged
[13:35:04] <fdans>	 milimetric: thank youuuu for the reviews
[13:35:56] <bearloga>	 actually we can do that any time without waiting for that, since we can just make up configs and see which values show up when we query the api for the stream config
[13:37:48] <ottomata>	   ya, also mediawiki-config changes have a CI step that shows the config diff for each wiki
[13:38:33] <bearloga>	 oh that's neat and good to know!
[13:53:34] <mforns>	 hey elukey, just saw your ping this morning...
[13:54:04] <elukey>	 hola hola marcelo
[13:54:14] <mforns>	 :]
[13:54:26] <mforns>	 so logs, huh
[13:55:09] <mforns>	 this makes me think you were right, and we should have alerts whenever a RU query fails?
[13:56:20] <elukey>	 no idea, maybe there are some specific cases in the logs that could be helpful to fail fast
[13:57:05] <elukey>	 I pinged more to check if the specific report need to be fixed and/or re-run etc..
[13:58:04] <mforns>	 elukey: but 48G is a lot! how come that went so far!
[13:59:06] <elukey>	 mforns: after days and days of long stacktraces logged :D
[13:59:48] <mforns>	 still, RU runs every hour, even if every hour we get a couple stack-traces... it would take years to get to that point no?
[14:00:17] <elukey>	 mforns: it is not a couple of stack traces marcel I think it was failing for multiple things for a single report
[14:00:18] <mforns>	 looking into that
[14:00:21] <elukey>	 did you check logs?
[14:00:26] <mforns>	 looking
[14:08:17] <wikibugs>	 10Analytics, 10Better Use Of Data, 10Event-Platform: Stream cc map should not be generated on every pageload - https://phabricator.wikimedia.org/T256169 (10Ottomata) I don't fully grasp exactly what you need, but putting this logic in EventLogging ext makes the most sense.  When I first saw this ticket title...
[14:12:45] <wikibugs>	 (03CR) 10Milimetric: [V: 03+2 C: 03+2] "omg this makes me so happy.  I had no idea how deep my need for efficiency was..." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607096 (https://phabricator.wikimedia.org/T256049) (owner: 10Joal)
[14:17:47] <elukey>	 mforns: when you have time let's also chat about RU jobs using db1108
[14:18:16] <mforns>	 elukey: One peculiarity (one could say design flaw too) of RU is that if a query fails, it retries it the next time.
[14:18:52] <mforns>	 so if for some reason, one query always fails, its executions will pile up. Today you execute 1, tomorrow 2, next day 3, and so on
[14:19:35] <mforns>	 the last successful execution of that query was Feb 27, so right now everytime that RU runs, it tries to run the query around 120 times, one per missing date
[14:20:11] <mforns>	 plus, that particular query is exploded by wiki, there are 50 wikis in its whitelist
[14:20:28] <mforns>	 plus, RU runs every hour, even if the report is daily
[14:20:47] <mforns>	 so: 120 * 50 * 24 = 50k stack-traces per day
[14:20:55] <mforns>	 :D
[14:24:28] <elukey>	 looool
[14:24:29] <mforns>	 thing is, those queries have not been modified since 2019-06-14, so the user_properties table must have been removed?
[14:25:31] <elukey>	 no idea
[14:26:46] <elukey>	 maybe we could add a use case that returns non-zero (aborting the overall RU job) if too many query retry are piled up
[14:26:56] <elukey>	 not sure if easy or not
[14:28:54] <milimetric>	 joal: how come the updated clickstream job has to filter out referer is null now?  Doesn't that change the logic?
[14:30:15] <wikibugs>	 (03CR) 10Milimetric: "you missed a couple updates I made to the commit message in previous patches." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606449 (owner: 10Joal)
[14:31:38] <wikibugs>	 (03CR) 10Milimetric: "https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/606449/5..6//COMMIT_MSG" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606449 (owner: 10Joal)
[14:34:39] <mforns>	 elukey: or we could return non-zero code, whenever there's a single error, but!: before we do, we finish all other jobs that do work.
[14:35:09] <mforns>	 this way we get the alert, but we also compute everything that is computable
[14:36:04] <elukey>	 sounds good as well!
[14:44:35] <elukey>	 mmm mforns I just realized that page-creation in RU queries reads from the log database on db1108? I guess that the job is either not needed or completely broken by now
[14:46:46] <mforns>	 uou
[14:47:18] <mforns>	 but is this one active?
[14:47:19] <elukey>	 I was about to wipe db1108 today and I discovered it while doing the last checks
[14:47:44] <elukey>	 mforns: there are logs from today, so I guess it runs
[14:48:06] <elukey>	 elukey@an-launcher1001:/srv/reportupdater/log/reportupdater-page-creation$ sudo systemctl list-timers | grep page-creation
[14:48:09] <elukey>	 Wed 2020-06-24 15:00:00 UTC  11min left          Wed 2020-06-24 14:00:01 UTC  47min ago          reportupdater-page-creation.timer                                reportupdater-page-creation.service
[14:49:13] <elukey>	 I am checking /srv/reportupdater/output/metrics/page-creation/pagecreations/enwiki.tsv, and I see all zeros for the past months
[14:50:16] <mforns>	 elukey: I see that job is absented in puppet
[14:50:59] <elukey>	 ah!
[14:51:51] <mforns>	 wait
[14:52:22] <mforns>	 elukey:  no, it's enabled on the hive side
[14:53:37] <elukey>	 mforns: do you have a min for bc before standup?
[14:53:45] <mforns>	 sure!
[14:53:46] <mforns>	 omw
[14:59:03] <joal>	 hi team - milimetric good catch! I added that (referer null removal) assuming it was the reason for the job failing while it was the bug in host-normaliztion - removing (and updating commit message)
[15:01:48] <joal>	 milimetric: the test shouldn't change anything as there are other filters, but minimal-diff is better
[15:03:08] <wikibugs>	 (03PS7) 10Joal: Update clickstream and interlanguage jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606449 (https://phabricator.wikimedia.org/T255779)
[15:20:17] <wikibugs>	 (03CR) 10Milimetric: [C: 03+2] Update clickstream and interlanguage jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606449 (https://phabricator.wikimedia.org/T255779) (owner: 10Joal)
[15:20:19] <wikibugs>	 (03CR) 10Milimetric: [V: 03+2 C: 03+2] Update clickstream and interlanguage jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606449 (https://phabricator.wikimedia.org/T255779) (owner: 10Joal)
[15:29:06] <wikibugs>	 (03PS6) 10Fdans: Add UDF that transforms Pagecounts-EZ projects into standard [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/597740 (https://phabricator.wikimedia.org/T252857)
[15:29:18] <fdans>	 milimetric:  changes applied :)
[15:36:38] <wikibugs>	 (03CR) 10Joal: [C: 03+2] "Merging for deploy" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/606162 (https://phabricator.wikimedia.org/T255660) (owner: 10Joal)
[15:40:47] <wikibugs>	 (03Merged) 10jenkins-bot: Make ActorSignatureGenerator non-singleton [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/606162 (https://phabricator.wikimedia.org/T255660) (owner: 10Joal)
[15:54:00] <joal>	 fdans: Shall I merge the refinery-source patch?
[15:54:40] <fdans>	 joal: please and thank you!
[15:54:53] <wikibugs>	 (03CR) 10Joal: [C: 03+2] "Merging for deploy!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/597740 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[15:55:24] <joal>	 also fdans, can you point me to the oozie change that needs to be merged for backfilling?
[15:55:30] <joal>	 please
[15:57:49] <joal>	 nevermind fdans - found it :)
[15:59:37] <wikibugs>	 (03Merged) 10jenkins-bot: Add UDF that transforms Pagecounts-EZ projects into standard [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/597740 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[15:59:39] <wikibugs>	 (03CR) 10Joal: "One last thing - I can do it as well!" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[15:59:57] <joal>	 fdans: we need to bump the jar version to 0.0.128 - Do you wish me to do it?
[16:00:24] <fdans>	 joal if you could do it i’d be eternally grateful (making dinner)
[16:00:39] <joal>	 no need for eternity fdans - doing it now :)
[16:01:28] <wikibugs>	 (03PS1) 10Joal: Bump changelog.md to v0.0.128 for deploy [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607548
[16:02:21] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606127 (https://phabricator.wikimedia.org/T255467) (owner: 10Joal)
[16:02:39] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606233 (https://phabricator.wikimedia.org/T250744) (owner: 10Joal)
[16:02:56] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/606449 (https://phabricator.wikimedia.org/T255779) (owner: 10Joal)
[16:03:33] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607096 (https://phabricator.wikimedia.org/T256049) (owner: 10Joal)
[16:05:01] <wikibugs>	 (03PS5) 10Joal: Aggregate pageview_hourly from pageview_actor_hourly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607096 (https://phabricator.wikimedia.org/T256049)
[16:05:06] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] Aggregate pageview_hourly from pageview_actor_hourly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607096 (https://phabricator.wikimedia.org/T256049) (owner: 10Joal)
[16:07:39] <ottomata>	 mforns:  yt?
[16:07:49] <ottomata>	 need brain bounce on weird eventlogging behavior
[16:09:16] <wikibugs>	 (03PS67) 10Joal: Add pageview historical dumps backfilling job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[16:10:01] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857) (owner: 10Fdans)
[16:11:22] <joal>	 a-team - I'm about to deploy having merged all the patches discussed at standup - last call before train starts :)
[16:11:30] <ottomata>	 go for it!
[16:11:48] <ottomata>	 actually mforns in an hour + would be good, gonna lunch and listen to better use of data staekholder meeting
[16:12:51] <wikibugs>	 (03CR) 10Joal: [C: 03+2] "Merging for deploy" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607548 (owner: 10Joal)
[16:17:04] <wikibugs>	 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['db1108.eqiad....
[16:18:30] <wikibugs>	 (03Merged) 10jenkins-bot: Bump changelog.md to v0.0.128 for deploy [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607548 (owner: 10Joal)
[16:18:32] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] Bump changelog.md to v0.0.128 for deploy [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607548 (owner: 10Joal)
[16:20:35] <joal>	 !log Releasing refinery-source 0.0.128 to archiva
[16:20:36] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:20:56] <wmf-insecte>	 Starting build #46 for job analytics-refinery-maven-release-docker
[16:26:21] <joal>	 Leaving for diner, back to continue deployment after
[16:28:42] <wmf-insecte>	 Project analytics-refinery-maven-release-docker build #46: 04FAILURE in 7 min 46 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/46/
[16:29:55] <elukey>	 ah this might be due to the change in password
[16:30:51] <elukey>	 yep Return code is: 401, ReasonPhrase: Unauthorized
[16:33:55] <elukey>	 now how to change it in jenkins?
[16:34:27] <wikibugs>	 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1108.eqiad.wmnet'] `  and were **ALL** successful.
[16:53:15] <elukey>	 there are some info in https://phabricator.wikimedia.org/T210271#5994895
[16:53:22] <elukey>	 can't find anybody from releng around
[16:57:01] <elukey>	 ok I changed a password, not sure if it is the right one
[16:58:28] <mforns>	 ottomata: ping me whenever!
[16:58:35] <elukey>	 tried to kick off a manual rebuild
[16:58:47] <ottomata>	 elukey:  are you talking about the archiva pw in jenkins?
[16:59:09] <ottomata>	 if so
[16:59:11] <ottomata>	 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Deploy/Refinery-source#Changing_the_archiva-ci_password
[16:59:22] <elukey>	 ottomata: yes correct, I changed it recently and it is not in pwstore yet (a key is expired)
[16:59:46] <elukey>	 ottomata: I think I got it right then, let's see if the build works
[16:59:57] <wmf-insecte>	 Starting build #47 for job analytics-refinery-maven-release-docker
[17:05:50] <wikibugs>	 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Analytics: EventLogging MEP Upgrade Phase 3 (Stream cc-ing) - https://phabricator.wikimedia.org/T256165 (10mpopov) That's fair, plus THE use case for this feature is Growth team's experiments and their issues with EditAttemptStep.
[17:06:49] <elukey>	 ottomata: if you have a sec https://gerrit.wikimedia.org/r/#/c/operations/dns/+/607569/
[17:07:01] <wmf-insecte>	 Project analytics-refinery-maven-release-docker build #47: 04STILL FAILING in 7 min 3 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/47/
[17:07:16] <elukey>	 ufffff
[17:11:41] <joal>	 aouch - sorry for having leaved at the wrong moment elukey :S
[17:11:47] <joal>	 elukey: how my I help?
[17:17:01] <elukey>	 joal: I modified the archiva-deploy-user creds on jenkins, as indicated by the doc above, but the build fails
[17:17:08] <elukey>	 I mean, I tried to rebuild from jenkins
[17:17:18] <elukey>	 not sure if kicking off a new one is different
[17:17:30] <joal>	 hm
[17:17:45] <joal>	 not sure how the new docker jobs gets its password
[17:18:40] <elukey>	 ah wait there is archiva-deploy and archiva-ci
[17:19:04] <joal>	 Could very well be elukey! the doc is about archiva-ci
[17:19:41] <elukey>	 lemme retry
[17:19:50] <elukey>	 I may have changed the -deploy one
[17:20:00] <wmf-insecte>	 Starting build #48 for job analytics-refinery-maven-release-docker
[17:21:56] <elukey>	 joal: completely unrelated, but https://archiva-new.wikimedia.org/repository/mirrored/ seems to be working (from a manual GET etc..) like its counterpart in archiva.wikimedia.org
[17:22:11] <elukey>	 but it is a repository group, backed by two separate mirrors (maven central and cloudera)
[17:22:19] <joal>	 elukey: it wasn't?
[17:22:53] <elukey>	 joal: in archiva.wikimedia.org it is a repository by itself
[17:23:01] <elukey>	 in archiva-new it is a repository group
[17:23:13] <joal>	 Ahhh - that change! the repo-group - sorry, / mwe needs to concentrate
[17:23:18] <joal>	 That's 
[17:23:27] <elukey>	 hope that it will work transparently!
[17:23:37] <joal>	 great elukey - I assume we can mirror various different sites?
[17:23:56] <joal>	 I also assume it now is configures to mirror central, and possibly more to come?
[17:24:30] <ottomata>	 ok mforns yt?
[17:24:34] <elukey>	 so I added central as repository (proxied), since the discovery team might want to use it
[17:24:38] <mforns>	 ottomata: yes bc?
[17:24:44] <ottomata>	 ya
[17:24:49] <elukey>	 but we can add as much as we want in theory
[17:25:20] <joal>	 \o/
[17:30:46] <wmf-insecte>	 Yippee, build fixed!
[17:30:46] <wmf-insecte>	 Project analytics-refinery-maven-release-docker build #48: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/48/
[17:31:05] <joal>	 elukey: I like that message --^ :)
[17:31:12] <joal>	 elukey: thanks a milion for the fix!
[17:31:31] <elukey>	 gooooood
[17:32:52] <joal>	 Arf elukey - Didn't thought about that - We released v0.0.130 while we were supposed to release v0.0.128
[17:33:05] <joal>	 Will clean up
[17:34:28] <joal>	 also elukey: There still is a ton of refinery-hive jars in archiva-relesase repo - refinery-job is cleaner, but not hive
[17:34:52] <elukey>	 we can delete them
[17:35:10] <joal>	 same for camus
[17:35:22] <joal>	 ok will do elukey 
[17:36:23] <elukey>	 I didn't find a good scripted way other than manually dropping
[17:36:35] <elukey>	 I can take care of one, you do the other?
[17:36:51] <joal>	 elukey: I want to check which camus jar we use first :)
[17:37:06] <elukey>	 nah it is not needed :P
[17:37:15] <joal>	 :)
[17:38:35] <elukey>	 db1108 finally reimaged!
[17:38:40] <joal>	 \o/
[17:40:59] <joal>	 elukey: https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/camus.pp#L89
[17:40:59] <wikibugs>	 10Analytics-Radar, 10Core Platform Team, 10Dumps-Generation: HTML Dumps - June/2020 - https://phabricator.wikimedia.org/T254275 (10RBrounley_WMF) Yep, sorry about the delay here @Sj. @Kelson Interesting, learning about this is interesting. I’d love to learn more about your work and how we might best collabor...
[17:41:23] <joal>	 elukey: shall we drop until 0.0.90, or update puppet?
[17:43:11] <joal>	 !log Reseting refinery-source to v0.0.128 for clean release after jenkins-archiva password fix
[17:43:12] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:46:16] <elukey>	 joal: we can update puppet and use the last one probably
[17:46:23] <joal>	 works for me elukey 
[17:50:21] <wikibugs>	 (03PS1) 10Joal: Reset poms to v0.0.128-SNAPSHOT to clean release issues [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607576
[17:50:23] <joal>	 elukey: shall I let you do that?
[17:50:27] <joal>	 also elukey --^
[17:53:33] <elukey>	 what sorry??
[17:53:50] <joal>	 what what? :)
[17:54:52] <elukey>	 the > shall I let you do that?
[17:55:05] <elukey>	 the code review or camus?
[17:55:11] <joal>	 ah - elukey: change the camus jar in puppet
[17:55:28] <elukey>	 do you prefer it done now?
[17:55:39] <joal>	 as you wish - can be done tomorrow
[17:56:09] <elukey>	 yes let's do it tomorrow, taking a note
[17:56:17] <joal>	 ack elukey :)
[17:56:41] <joal>	 elukey: I would however need your review on the above patch please (just a double check)
[18:02:32] <joal>	 looks like I'm alone :) Will self-merge
[18:02:38] <elukey>	 joal: it looks good, I am wondering why that happened?
[18:02:41] * elukey ignorant
[18:02:50] <elukey>	 was it me doing more builds?
[18:02:59] <wikibugs>	 (03CR) 10Elukey: [C: 03+1] Reset poms to v0.0.128-SNAPSHOT to clean release issues [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607576 (owner: 10Joal)
[18:02:59] <joal>	 Ahh - the release job happens in multiple steps (maven steps)
[18:03:05] <joal>	 elukey: -^
[18:03:22] <joal>	 elukey: merging, starting deploy, then explaining more :)
[18:03:33] <elukey>	 all right, no problem, we can sync tomorrow :)
[18:03:34] <wikibugs>	 (03CR) 10Joal: [C: 03+2] "Merging for deploy" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607576 (owner: 10Joal)
[18:03:59] <elukey>	 going off!
[18:04:07] <elukey>	 if needed ping me on the phone :)
[18:04:21] <joal>	 elukey: the 1st step is to bump the version in POMs (prepare release v0.0.X)
[18:04:35] <joal>	 bye elukey - we'll see tomorrow :)
[18:09:24] <wikibugs>	 (03Merged) 10jenkins-bot: Reset poms to v0.0.128-SNAPSHOT to clean release issues [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/607576 (owner: 10Joal)
[18:15:58] <joal>	 !log launching a new jenkins release  after cleanup
[18:15:59] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:16:04] <wmf-insecte>	 Starting build #49 for job analytics-refinery-maven-release-docker
[18:28:24] <wmf-insecte>	 Project analytics-refinery-maven-release-docker build #49: 09SUCCESS in 12 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/49/
[18:44:07] <joal>	 here we are :)
[18:44:31] <wmf-insecte>	 Starting build #18 for job analytics-refinery-update-jars-docker
[18:44:51] <wikibugs>	 (03PS1) 10Maven-release-user: Add refinery-source jars for v0.0.128 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607594
[18:44:57] <wmf-insecte>	 Project analytics-refinery-update-jars-docker build #18: 09SUCCESS in 26 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/18/
[18:45:24] <wikibugs>	 (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for dpeloy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607594 (owner: 10Maven-release-user)
[18:47:14] <joal>	 !log Deploying refinery using scap
[18:47:15] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:47:41] <joal>	 !log clean archiva from refinery-hive (up to 0.0.115)
[18:47:43] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:49:12] <joal>	 Wow - deploying canary took less than a minute :)n
[18:54:48] <wikibugs>	 10Analytics, 10Better Use Of Data, 10Event-Platform: Stream cc map should not be generated on every pageload - https://phabricator.wikimedia.org/T256169 (10mpopov) >>! In T256169#6252704, @Ottomata wrote: > Alternatively, could you just make a config setting that indicated the parent stream for a 'CCed' chil...
[18:54:57] <joal>	 !log Deploying refinery onto HDFS
[18:54:57] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:57:43] <joal>	 !log Clean archiva refinery-camus except 0.0.90
[18:57:45] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:06:22] <joal>	 !log Create pageview_actor_hourly after deploy to start new jobs
[19:06:23] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:08:04] <joal>	 !log Start pageview_actor_hourly oozie job
[19:08:05] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:14:19] <joal>	 The heisenbug again!!!!!
[19:14:27] <joal>	 I'm doomed :S
[19:16:12] <joal>	 !log Restarting unique-devices jobs to use pageview_actor_hourly instead of webrequest (4 jobs)
[19:16:14] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:22:13] <joal>	 Kill-restart clickstream and interlanguage jobs to read from pageview_actor_hourly
[19:28:02] <leila>	 nuria, do you have a few min to join a budget meeting?
[19:28:44] <joal>	 !log Cleaning refinery-tools from archiva (up to 0.0.115)
[19:28:45] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:36:30] <joal>	 !log Cleaning refinery-spark from archiva (up to 0.0.115)
[19:36:31] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:41:34] <joal>	 Wow lots of heisenbugs
[19:57:31] <wikibugs>	 10Analytics-Radar, 10Better Use Of Data, 10Product-Analytics: session_tick stream configs - https://phabricator.wikimedia.org/T256311 (10mpopov)
[19:57:57] <wikibugs>	 10Analytics-Radar, 10Better Use Of Data, 10Product-Analytics: session_tick stream configs - https://phabricator.wikimedia.org/T256311 (10mpopov) a:03mpopov
[19:59:40] <wikibugs>	 10Analytics-Radar, 10Better Use Of Data, 10Product-Analytics: session_tick stream configs - https://phabricator.wikimedia.org/T256311 (10mpopov)
[20:16:19] <wikibugs>	 10Analytics-Radar, 10Growth-Team (Current Sprint), 10Product-Analytics (Kanban): Newcomer tasks: update schema whitelist for Guidance - https://phabricator.wikimedia.org/T255501 (10nettrom_WMF)
[20:17:08] <wikibugs>	 10Analytics-Radar, 10Growth-Team (Current Sprint), 10Product-Analytics (Kanban): Newcomer tasks: update schema whitelist for Guidance - https://phabricator.wikimedia.org/T255501 (10nettrom_WMF) Updated the task description to reflect that this work should also remove the hashing of short-term tokens. As disc...
[20:21:32] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Vertical: Migrate SearchSatisfaction EventLogging event stream to Event Platform - https://phabricator.wikimedia.org/T249261 (10Ottomata) Alright, Today I deployed config to migrate SearchSatisfaction to EventGate on...
[20:21:41] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Vertical: Migrate SearchSatisfaction EventLogging event stream to Event Platform - https://phabricator.wikimedia.org/T249261 (10Ottomata) CC @Krinkle if you find some time to help. Thank you!
[20:21:43] <joal>	 Gone for tonight
[20:22:54] <ottomata>	 mforns: 
[20:22:54] <ottomata>	 https://phabricator.wikimedia.org/T249261#6254554
[20:23:02] <ottomata>	 let me know if that sounds right, feel free to add other thoughts you might have
[20:30:43] <wikibugs>	 (03PS1) 10Nettrom: Update whitelisting of Growth Team's schemas [analytics/refinery] - 10https://gerrit.wikimedia.org/r/607615 (https://phabricator.wikimedia.org/T255501)
[20:43:07] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (10Ottomata) Something I've overlooked:  Camus's eventlogging job uses the `dt` field for hourly partitioni...
[21:07:28] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (10Milimetric) Ooof, but you can easily have outliers with offline features and buffered events sent in bat...
[21:09:02] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (10Ottomata) Ah, for the most part, we won't be using the client's time for partitioning, its only during t...
[21:09:43] <wikibugs>	 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (10Ottomata) See discussion in {T240460}
[21:27:45] <mforns>	 ottomata: the comment looks great to me
[21:28:51] <ottomata>	 ty
[21:38:52] <wikibugs>	 10Analytics, 10Better Use Of Data, 10Event-Platform: Stream cc map should not be generated on every pageload - https://phabricator.wikimedia.org/T256169 (10Ottomata) > For example, suppose you have streams "edit.growth" and "edit.mobile" in the stream config but not "edit". What happens if an instrumentation...