[00:29:22] <ottomata>	 Amir1:  what do you want? i think so
[00:29:35] <ottomata>	 not all jobs, since the data is not really schemaed well
[00:29:54] <Amir1>	 ottomata: I want to get the parameters of the jobs that were run
[00:30:03] <Amir1>	 query them basically
[00:30:12] <ottomata>	 see the event.mediawiki_job_* tables in hive
[00:30:23] <ottomata>	 not all jobs are refineable, but many are!
[00:32:32] <Amir1>	 oh thanks
[00:47:05] <wikibugs>	 (03PS8) 10Nuria: [WIP] Table and workflow for features computations per session [analytics/refinery] - 10https://gerrit.wikimedia.org/r/552943 (https://phabricator.wikimedia.org/T238360)
[02:27:05] <wikibugs>	 10Analytics, 10Research-Backlog, 10Wikidata: Copy Wikidata dumps to HDFS - https://phabricator.wikimedia.org/T209655 (10GoranSMilovanovic) @JAllemandou Do you think it would be possible to produce a new version of this data set? The latest update seems to be:  `2019-10-03 09:29 /user/joal/wmf/data/wmf/mediaw...
[04:19:04] <wikibugs>	 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10leila) a:05leila→03None
[04:19:11] <wikibugs>	 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10leila) >>! In T182351#5710173, @ArielGlenn wrote: >>>! In T182351#5709123, @leila wrote: >> @tizianopiccardi thanks for the update and great to see that you're there. :) Please make...
[07:57:43] <elukey>	 joal: bonjour!
[07:58:00] <elukey>	 so spark code logs as DEBUG hive token stuff -.-
[07:58:02] <elukey>	 19/12/04 07:55:52 DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for analytics/analytics1030.eqiad.wmnet@WIKIMEDIA against hive/_HOST@WIKIMEDIA at thrift://analytics1030.eqiad.wmnet:9083
[07:58:13] <elukey>	 this is for the metaas
[07:58:17] <elukey>	 *metastore
[07:58:47] <elukey>	 19/12/04 07:55:52 DEBUG HiveDelegationTokenProvider: Get Token from hive metastore: Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 2d 61 6e 61 6c 79 74 6
[08:05:02] <elukey>	 and I don't see hive server 2 in https://github.com/eBay/Spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
[08:10:22] <elukey>	 but there myst be somewhere
[08:14:25] <elukey>	 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/README.md
[08:14:31] <elukey>	 https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/README.md
[08:32:02] <elukey>	 in fact, https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/security/HiveDelegationTokenProvider.scala mentions only metastore
[09:06:49] <elukey>	 so to summarize
[09:07:26] <elukey>	 1) the DataFrameToHive code is triggered only when create/alter statements need to be execute, otherwise it doesn't
[09:08:33] <elukey>	 2) it seems that the JDBC connection would need a special handling for credentials, the code doesn't retrieve any delegation token so it may be our responsibility to do so, or just provide a valid keytab 
[09:57:23] <elukey>	 ok I just merged https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554473/
[09:57:36] <elukey>	 and I am backfilling in the testing cluster refine navtiming
[11:13:24] <elukey>	 ok all good, works like a charm
[11:36:44] <elukey>	 !log restart mariadb on analytics1030 (hadoop test coordinator) to test explicit_defaults_for_timestamp - T236180
[11:36:46] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:36:47] <stashbot>	 T236180: Deploy search platform airflow service - https://phabricator.wikimedia.org/T236180
[11:36:49] * elukey lunch!
[11:48:11] <wikibugs>	 10Analytics, 10Event-Platform, 10serviceops: conntrack -L - https://phabricator.wikimedia.org/T239795 (10Aklapper)
[11:55:49] <wikibugs>	 10Analytics, 10Event-Platform, 10serviceops: Connection tracking on kubernetes hosts alerts - https://phabricator.wikimedia.org/T239795 (10akosiaris) 05Open→03Resolved p:05Triage→03Normal
[12:01:16] <wikibugs>	 10Analytics, 10Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Danielsberger) Hi @lexnasser ,   Thank you for setting up the detailed description on wikitech. This is really great!  One thing that I'm wondering about is that...
[12:02:00] <wikibugs>	 10Analytics, 10Event-Platform, 10serviceops, 10Patch-For-Review: Connection tracking on kubernetes hosts alerts - https://phabricator.wikimedia.org/T239795 (10akosiaris)
[12:59:38] <wikibugs>	 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10mforns) @Nuria, I believe, for now, that would be OK for them. @nettrom_WMF explained that they are aiming to make short term analyses of 270 days, and th...
[14:49:49] <wikibugs>	 10Analytics, 10Event-Platform, 10Operations, 10Wikimedia-Logstash, 10observability: Move eventgate logs to new logging infrastructure - https://phabricator.wikimedia.org/T225129 (10Ottomata) 05Open→03Resolved
[15:04:36] <wikibugs>	 (03PS1) 10Mforns: Correct minor details in wmcs queries [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/554527 (https://phabricator.wikimedia.org/T232671)
[15:12:57] <wikibugs>	 (03CR) 10Mforns: "@srishakatux I was troubleshooting the queries, and I realized I had overlooked a couple details in our previous code review. Sorry for th" (0311 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/554527 (https://phabricator.wikimedia.org/T232671) (owner: 10Mforns)
[15:13:26] <wikibugs>	 (03CR) 10Mforns: [V: 03+2 C: 03+2] "Self-merging to unbreak production." [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/554527 (https://phabricator.wikimedia.org/T232671) (owner: 10Mforns)
[15:24:50] <joal>	 Hi team
[15:26:36] <mforns>	 hey joal :]
[15:27:06] <joal>	 elukey: I have found some info on HiveServer2 JDBC connection yesterday, but didn't manage to make it work
[15:27:34] <joal>	 elukey: You were right in thinking JDBC-Kerb needs some settings we don't provide as of now in code
[15:27:41] <joal>	 Hi mforns - all good?
[15:28:28] <elukey>	 joal: o/ - I think it is easier to just pass the keytab, it works in test
[15:29:45] <mforns>	 all goooood :]
[15:35:25] <joal>	 elukey: awesome :)
[15:42:51] <wikibugs>	 10Analytics, 10Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Nuria) @Danielsberger It is the other way arround, the volume of requests to upload is a lot highter. Think that a web page is one made by one text document and m...
[15:46:40] <joal>	 Hi milimetric
[15:47:05] <milimetric>	 hi joal 
[15:47:11] <milimetric>	 arg! sorry forgot to say hello today
[15:48:18] <joal>	 milimetric: I'm answering the email about alerts
[15:48:26] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Upgrade to Superset 0.35 - https://phabricator.wikimedia.org/T236690 (10elukey) The fix for https://github.com/apache/incubator-superset/issues/8676 has been merged to upstream, so it will likely be included in the next release. I...
[15:48:53] <joal>	 milimetric: we should rerun the failed jobs not in sync - The problem is due to overlaoding cassandra when loading the 2 big jobs at the same time.
[15:49:35] <joal>	 milimetric: Then a patch to move mediarequest one hour later is probably enough and not too much disruptive in term of data availability?
[15:50:42] <milimetric>	 joal: I'm not totally up to speed with the other cassandra loading, is it temporary or is it going to keep happening for the next few months?
[15:50:59] <milimetric>	 if it's semi-permanent, yeah, let's move it one hour later
[15:51:35] <milimetric>	 and I'm not sure when to rerun those two failed jobs then, when is cassandra not overloaded?
[15:51:37] <joal>	 milimetric: the problem is unrelated to backfilling - errors are from prod-jobs of yesterday :)
[15:52:43] <milimetric>	 oh I get it now
[15:52:55] <milimetric>	 sending patch / rerunning, thanks!
[15:53:12] <joal>	 milimetric: errors occurs when refining and extracting last hour of the day finishes at similar time for "upload+mediarequest" and "text+pageview"
[15:53:21] <milimetric>	 your email was clear, I got confused with your IRC even though it's almost exactly the same message - interesting
[15:53:50] <joal>	 :D
[15:54:02] <joal>	 I mentionned midnight in the email - could help :)
[15:56:27] <joal>	 milimetric: I look at this graphs to check for cassandra backfiling state: 
[15:56:30] <joal>	 https://grafana.wikimedia.org/d/000000418/cassandra?orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=aqs&var-keyspace=local_group_default_T_mediarequest_per_file&var-table=data&var-quantile=99p
[15:57:14] <joal>	 Noticeably, you can see backfilling load jobs in "write rate" graph, and we are interested now about "pending compactions"
[15:58:16] <joal>	 As you can see, pending compactions are almost back to 0 after having gone high while loading was happening - This means the system is back to almost normal state: you restart one of the two jobs (and the other one when the first is done)
[15:58:24] <joal>	 milimetric: --^
[15:58:24] <wikibugs>	 (03PS1) 10Elukey: build_wheels.sh: use the system pip instead of the virtualenv's one [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/554541 (https://phabricator.wikimedia.org/T236690)
[15:58:30] <joal>	 milimetric: makes sense?
[15:59:28] <milimetric>	 joal: yeah, definitely, I just have to groom the tech com backlog now and will restart after
[16:01:57] <joal>	 Thanks a lot milimetric :)
[16:05:07] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] build_wheels.sh: use the system pip instead of the virtualenv's one [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/554541 (https://phabricator.wikimedia.org/T236690) (owner: 10Elukey)
[16:07:07] <wikibugs>	 (03CR) 10Elukey: [V: 03+2 C: 03+2] build_wheels.sh: use the system pip instead of the virtualenv's one [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/554541 (https://phabricator.wikimedia.org/T236690) (owner: 10Elukey)
[16:07:52] <wikibugs>	 (03Abandoned) 10Elukey: WIP - Superset 0.35.1 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/552238 (owner: 10Elukey)
[16:08:37] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics, 10Patch-For-Review: Upgrade to Superset 0.35 - https://phabricator.wikimedia.org/T236690 (10elukey)
[16:09:04] <wikibugs>	 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Superset Updates - https://phabricator.wikimedia.org/T211706 (10elukey)
[16:13:22] <wikibugs>	 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Superset Updates - https://phabricator.wikimedia.org/T211706 (10elukey) As FYI to everybody interested, we have tested 0.35.1 (latest upstream) in T236690, but we ended up in a bug that broke a dashboard: https://github.com/apache/incubator-super...
[16:13:46] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10SDC General, 10Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (10mpopov) >>! In T239565#5706854, @Milimetric wrote: > Yay, I get to work with @mpopov :)  Aw, I feel likewise! :D  > * how...
[16:33:29] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10SDC General, 10Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (10Nuria) Some alternatives: superset can source data from other places than druid and we have couple dashboards on top of so...
[16:43:34] <wikibugs>	 (03PS1) 10Mforns: Escape dollar sign in hive script for wmcs [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/554564 (https://phabricator.wikimedia.org/T232671)
[16:46:27] <wikibugs>	 (03CR) 10Mforns: [V: 03+2 C: 03+2] "Self-merging to unbreak production." [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/554564 (https://phabricator.wikimedia.org/T232671) (owner: 10Mforns)
[17:02:11] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10AndyRussG) Hi! Here's my request for the new creds for stat100* and notebook100*, please. Username: andyrussg. Thanks so much for working on this!!!!! :)
[17:15:28] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10Ejegg) Hi! here's my request for Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is ejegg.
[17:32:11] <nuria>	 milimetric: today we need to deploy , ping us when you are back from meeting cc joal 
[17:32:39] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10elukey) >>! In T237605#5712992, @Ejegg wrote: > Hi! here's my request for Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is ejegg.  ` elukey@krb1001:~$ sudo man...
[17:35:05] <milimetric>	 nuria: yep, was planning on deploying in 1.5 hours, at 14:00 EST
[17:35:29] <milimetric>	 I'll do everything in the etherpad, let me know if there are any exceptions/additions
[17:35:43] <nuria>	 milimetric: see: https://etherpad.wikimedia.org/p/analytics-weekly-train
[17:36:13] <nuria>	 joal: i think moving the cassandra job and restart also needs to be added to train etherpad
[17:39:59] <joal>	 yup, will do nuria 
[17:41:51] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10elukey) >>! In T237605#5712943, @AndyRussG wrote: > Hi! Here's my request for the new creds for stat100* and notebook100*, please. Username: andyrussg. Thanks so much for working on this!!!!! :)...
[17:43:25] <wikibugs>	 10Analytics, 10Analytics-Kanban: Delay cassandra mediarequest-per-file daily job one hour so that it doesn't colide with pageview-per-article - https://phabricator.wikimedia.org/T239848 (10JAllemandou)
[17:43:39] <wikibugs>	 10Analytics, 10Analytics-Kanban: Delay cassandra mediarequest-per-file daily job one hour so that it doesn't colide with pageview-per-article - https://phabricator.wikimedia.org/T239848 (10JAllemandou) a:03JAllemandou
[17:43:51] <lexnasser>	 nuria: lmk if there’s anything I could do to get started with my next project. I’m thinking of playing around with Gerrit, but if there’s something more particular you think would be helpful, I’m all ears
[17:44:17] <ottomata>	 lexnasser: did you need some help today copying some stuff somewhere?
[17:46:21] <lexnasser>	 ottomata: Yes, the privacy manager is writing up a risk assessment right now, and I think I’ll need your help to move the files sometime today. nuria: should I wait on James’ risk assessment completion before release?
[17:46:42] <ottomata>	 (we are both in a meeting atm, she'll respond more in a bit i think)
[17:48:11] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10SDC General, 10Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (10Abit) Bless y'all's hearts for setting this up for us ♥♥♥
[17:50:31] <nuria>	 lexnasser: i was thinking this one for your next task:  
[17:50:38] <nuria>	 lexnasser: ta-ta-channnnn
[17:51:19] <nuria>	 lexnasser: https://phabricator.wikimedia.org/T239625
[17:51:45] <nuria>	 lexnasser: take a look and let me know,  involves programming in java (little) and adding tests/running queries
[17:54:43] <wikibugs>	 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10nettrom_WMF) @Nuria : I can confirm what @mforns mentions. During my conversations with him yesterday, it became clear to me that how the Growth team is u...
[17:57:58] <wikibugs>	 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10Nuria) @nettrom_WMF which are the schemas subjected to this 270  window?
[18:00:41] <wikibugs>	 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10nettrom_WMF) @Nuria : [[ https://meta.wikimedia.org/wiki/Schema:HelpPanel | HelpPanel ]], [[ https://meta.wikimedia.org/wiki/Schema:HomepageVisit | Homepa...
[18:02:07] <lexnasser>	 nuria: Tentatively, it sounds fitting. I'll read more into it and the referenced tickets
[18:04:49] <wikibugs>	 10Analytics, 10Performance-Team, 10Research, 10Security-Team, and 2 others: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (10JFishback_WMF) Due to the low impact of harm, and low opportunity and probability of malicious use of this d...
[18:05:22] <wikibugs>	 10Analytics, 10Performance-Team, 10Research, 10Security-Team, and 2 others: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (10JFishback_WMF) #wmf-legal can we please get someone to sign off on this?
[18:20:14] * elukey off!
[18:23:04] <wikibugs>	 (03PS1) 10Mforns: Add funnel parameter to wmcs queries that return multiple rows [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/554585 (https://phabricator.wikimedia.org/T232671)
[18:24:36] <wikibugs>	 (03CR) 10Mforns: [V: 03+2 C: 03+2] "Self-merging to unbreak production." (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/554585 (https://phabricator.wikimedia.org/T232671) (owner: 10Mforns)
[18:58:24] <wikibugs>	 10Analytics: Add pertinent wdqs_external_sparql_query metrics and wdqs_internal_sparql_query  to a superset dashboard - https://phabricator.wikimedia.org/T239852 (10Nuria)
[19:13:20] <wikibugs>	 10Analytics, 10Research-Backlog, 10Wikidata: Copy Wikidata dumps to HDFS - https://phabricator.wikimedia.org/T209655 (10JAllemandou) New dataset available @GoranSMilovanovic. Pinging @Groceryheist  as I also generated the items per page. ` hdfs dfs -ls /user/joal/wmf/data/wmf/mediawiki/wikidata_parquet | tai...
[19:17:09] <milimetric>	 a-team: deployment etherpad for today says "patch is here:" but doesn't link to a patch (https://etherpad.wikimedia.org/p/analytics-weekly-train), was that supposed to be the patch that delays one of the cassandra daily workflows?  If so, I can do that, but just wanted to check
[19:17:46] <joal>	 milimetric: indeed! currently writing the commi message
[19:18:16] <milimetric>	 k, cool
[19:18:26] <milimetric>	 I'll review/merge and deploy after then
[19:19:20] <wikibugs>	 (03PS1) 10Joal: Add delay to cassandra oozie loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554593 (https://phabricator.wikimedia.org/T239848)
[19:19:29] <milimetric>	 got it
[19:20:24] <joal>	 milimetric: I'm unhappy with my commit message and have not tested yet
[19:27:39] <wikibugs>	 (03PS2) 10Joal: Add delay to cassandra oozie loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554593 (https://phabricator.wikimedia.org/T239848)
[19:30:38] <joal>	 milimetric: tests are succefull for the patch - would you mind reviewing for my poor english (and other less important code related stuff ;)
[19:30:56] <wikibugs>	 (03CR) 10Joal: [V: 03+2] "Validates on cluster" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554593 (https://phabricator.wikimedia.org/T239848) (owner: 10Joal)
[19:31:00] <milimetric>	 hm, joal what about the typo here: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/554593/1/oozie/cassandra/daily/coordinator.xml
[19:31:09] <milimetric>	 <instance>...</start-instance>
[19:31:55] <joal>	 milimetric: yeqh, corrected that in patch 2 :)
[19:32:00] <milimetric>	 ah ok
[19:32:05] <joal>	 only that in the 3 coords
[19:32:20] <joal>	 milimetric: --^ you can keep your ongoing review and comments :)
[19:33:36] <milimetric>	 my only comment is that the comments make it seem like you can use monthly/daily/hourly granularity and the delay will apply at the right granularity, but it's only implemented for day/hour for now
[19:33:56] <milimetric>	 I'd update the comments to mention day/hour joal, otherwise looks good
[19:34:30] <milimetric>	 oh neever mind, I missed the third coord
[19:34:33] <milimetric>	 it's good!
[19:34:34] <milimetric>	 nvm
[19:34:56] <wikibugs>	 (03CR) 10Milimetric: [C: 03+2] Add delay to cassandra oozie loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554593 (https://phabricator.wikimedia.org/T239848) (owner: 10Joal)
[19:35:05] <joal>	 also milimetric, about kill/restart, I suggest killing only the mediarequest-per-file coord (not the bundle), and restart this one only, to prevent 
[19:35:06] <wikibugs>	 (03CR) 10Milimetric: [V: 03+2 C: 03+2] Add delay to cassandra oozie loading [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554593 (https://phabricator.wikimedia.org/T239848) (owner: 10Joal)
[19:35:33] <joal>	  reloading 3 days in cassandra
[19:35:34] <milimetric>	 joal: I just reran the specific instance of that coord only, didn't kill anything
[19:35:51] <milimetric>	 and then I'll kill and restart after deploy
[19:36:17] <joal>	 milimetric: let's not forget to rerun the failed pageview-per-article as well
[19:36:17] <milimetric>	 but yeah, I was just going to kill the coord not more
[19:36:31] <joal>	 cool milimetric
[19:36:39] <milimetric>	 joal: of course, when mediarequests finishes, it's on load-cassandra now
[19:36:41] <joal>	 the bundle is nice but super cumbersome
[19:37:00] <milimetric>	 wonder how all this would look like in flyte/airflow
[19:37:04] <joal>	 Thanks a lot milimetric -
[19:37:08] <joal>	 
[19:37:18] <joal>	 huhu
[19:37:25] <joal>	 Gone for diner team, back once done
[19:39:02] <milimetric>	 duh, of course forgot refinery-source
[19:43:04] <milimetric>	 mforns: did you do this on purpose (update the 106 notes instead of adding 107 notes) or should I split the 107 notes out?
[19:43:05] <milimetric>	 https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/552256/1/changelog.md
[19:44:06] <mforns>	 milimetric, hmmm.. wasn't my last deployment 106?
[19:44:27] <milimetric>	 it says 107 in the git log
[19:44:31] <mforns>	 milimetric, oh... yea
[19:44:34] <mforns>	 strange
[19:45:07] <mforns>	 maybe we didn't update the changelog for 106, and I just added the docs by incrementing without looking at the number...
[19:45:09] <mforns>	 sorry
[19:45:15] <mforns>	 was not on purpose
[19:45:21] <mforns>	 can you correct please?
[19:45:25] <milimetric>	 ok, np, ofc
[19:45:39] <wikibugs>	 (03PS1) 10Milimetric: Update changelog.md for v0.0.108 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/554601
[19:45:48] <mforns>	 thanks :]
[19:46:08] <wikibugs>	 (03CR) 10Milimetric: [V: 03+2 C: 03+2] Update changelog.md for v0.0.108 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/554601 (owner: 10Milimetric)
[19:56:30] <nuria>	 joal: i have a question if you may for your patch for cassandra 
[19:56:49] <nuria>	 joal data_input_delay here: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/554593/2/oozie/cassandra/hourly/coordinator.xml
[19:56:54] <nuria>	 joal: is used where?
[20:08:12] <milimetric>	 !log deployed refinery source
[20:08:18] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:14:37] <milimetric>	 ottomata: getting those same errors scap deploying refinery
[20:14:43] <milimetric>	 shall I ignore and retry as before?
[20:15:27] <milimetric>	 a list of things like (failed: No such file or directory (2)\nrsync: link_stat "/git-fat/72853206967f23d1980d53a9c90fc6139092c878" )
[20:15:50] <milimetric>	 rolled back
[20:16:34] <ottomata>	 hmm
[20:16:43] <ottomata>	 how come it always happens to you dan?
[20:16:44] <ottomata>	 heheh
[20:17:07] <ottomata>	 milimetric:  so the first thign I do here i checkt o see if that sha is actually a real jar
[20:18:23] <ottomata>	 do you know which file that is trying to get?
[20:18:45] <ottomata>	 ok it is artifacts//org/wikimedia/analytics/refinery/refinery-job-0.0.108.jar
[20:19:08] <milimetric>	 uh... I'm like a snail but yes, that and I'm guessing all the other 108 jars I just built
[20:20:06] <ottomata>	 http://127.0.0.1:8080/repository/releases/org/wikimedia/analytics/refinery/job/refinery-job/0.0.108/refinery-job-0.0.108.jar
[20:20:08] <ottomata>	 oops
[20:20:09] <ottomata>	 sorry
[20:20:52] <ottomata>	 ok it is, and the sha is correct
[20:20:57] <ottomata>	 so
[20:21:00] <ottomata>	 ok so things are ok
[20:21:06] <ottomata>	 i think maybe try again dan?
[20:21:10] <milimetric>	 ...
[20:21:15] <milimetric>	 :)
[20:21:23] <milimetric>	 I hate whatever is happening here
[20:21:29] <ottomata>	 might have needed more time between your deploy and the jenkins job finishing
[20:21:39] <ottomata>	 there is a cron that runs on archiva host to create the git-fat symlink
[20:21:53] <milimetric>	 is everyone else like cooking dinner in between these things?  I'm not going particularly fast or anything...
[20:22:04] <ottomata>	 hmm let me make sure the symlink exists
[20:22:24] <ottomata>	 yup it does
[20:22:33] <milimetric>	 like last time, it finished in like 10 seconds this time
[20:22:37] <milimetric>	 (for the canary)
[20:22:49] <ottomata>	 it was created 12 minutes ago
[20:22:56] <milimetric>	 makes sense, when I built
[20:23:06] <ottomata>	 so at :10 past the hour
[20:23:18] <ottomata>	 hm, no, the jar was uploaded 15 minutes agho
[20:23:21] <ottomata>	 at :05 past the hour
[20:23:34] <ottomata>	 the git-fat sha symlink was created 5 minutes later
[20:23:50] <milimetric>	 ok, so I'll add "wait 5 minutes" in the deploy instructions :)
[20:24:05] <ottomata>	 ya the cron runs on every 5 minutes
[20:25:31] <wikibugs>	 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10Nuria) I see, +1 to @mforns idea
[20:34:51] <wikibugs>	 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10mforns) @nettrom_WMF  OK, then!  I will implement a deletion timer specific to those 3 schemas, that will delete all their data from the event_sanitized d...
[20:45:24] <joal>	 nuria: data_input_delay is not used, but it needs to be named (therefore that name) - It is needed to enforce some (or no) delay in jobs
[20:46:02] <nuria>	 joal: the part about "it needs to be named" is what i do understand
[20:46:09] <nuria>	 *do not understand
[20:46:44] <nuria>	 lexnasser: for that work you will need java 1.8 and mvn, ping me if you need help with that
[20:48:59] <joal>	 nuria: input-event sections need a name in oozie (https://github.com/apache/oozie/blob/master/client/src/main/resources/oozie-coordinator-0.4.xsd#L105)
[20:51:07] <milimetric>	 ottomata: now it's complaining about git fat on stat1007:
[20:51:08] <milimetric>	 https://www.irccloud.com/pastebin/df9VgjAi/
[20:51:16] <milimetric>	 when I do the hdfs sync
[20:51:23] <milimetric>	 (sudo -u hdfs /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run)
[20:52:11] <milimetric>	 did it on 1004 and it worked fine
[20:52:19] <milimetric>	 so I guess something's wrong on 1007
[20:54:53] <milimetric>	 yeah, there's something really weird on 1007, git status doesn't even work, it throws permission denied errors
[20:56:36] <milimetric>	 looks like that failed deploy left some bad status and references to temp files that are no longer there...
[20:57:25] <milimetric>	 !log finished refinery-deploy-to-hdfs from stat1004 but something's broken on stat1007 in the /srv/deployment/analytics/refinery repo
[20:57:27] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:00:28] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10Halfak) I need creds for stat100*.  My username is `halfak`
[21:01:58] <wikibugs>	 10Analytics, 10Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (10Halfak) I'd like to request creds for the engineers on my team as we'll (hopefully) be using hadoop a lot more soon.  Usernames: `accraze` and `kevinbazira`  Ping @kevinbazira and @ACraze
[21:27:04] <nuria>	 milimetric: we need to fix the issues at /srv/deployment/analytics/refinery  in stat1007 right?
[21:27:11] <nuria>	 cc ottomata 
[21:27:46] <nuria>	 ottomata: there is an aerror in permissions
[21:27:49] <nuria>	 https://www.irccloud.com/pastebin/Uy8MhBCE/
[21:33:45] <ottomata>	 hm
[21:34:24] <ottomata>	 oh sorry milimetric  missed your ping!
[21:39:55] <ottomata>	 checked out master and git fat pulled
[21:40:47] <ottomata>	 oh need the scap sync branch
[21:40:47] <ottomata>	 doing that
[21:41:20] <ottomata>	 fixed
[21:42:18] <nuria>	 ottomata: shouldn't branch be master?
[21:42:22] <nuria>	 ottomata: cause now
[21:42:23] <nuria>	 https://www.irccloud.com/pastebin/21oa5hGt/
[21:42:30] <ottomata>	 that's right
[21:42:32] <ottomata>	 not master
[21:42:38] <ottomata>	 scap creates a branch at deploy
[21:42:40] <milimetric>	 right
[21:42:43] <milimetric>	 ok, thanks ottomata 
[21:43:02] <milimetric>	 so the bad deploy caused this, right?
[21:43:08] <nuria>	 ottomata: did you needed sudo to fix it?
[21:43:10] <milimetric>	 it's weird that it doesn't fix it when it says the deploy succeeds
[21:43:26] <ottomata>	 i did sudo -u analytics-deploy, but i think i could have fixed frrom deploy server with a scap redeploy
[21:43:31] <ottomata>	 i didn't do any schmod stuff
[21:44:04] <nuria>	 ottomata: k, i saw there was a permission  error, but after looked and all dirs were analytics-deploy owned
[21:45:35] <milimetric>	 (added to docs the importance of waiting 5 minutes and the potential consequences of not)
[21:51:43] <nuria>	 milimetric: for  every thing in life, really
[21:51:56] <milimetric>	 lol
[21:52:04] <milimetric>	 ok, adding to docs that we should wait 5 minutes in general
[22:01:32] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad: Degraded RAID on an-worker1089 - https://phabricator.wikimedia.org/T239365 (10Jclark-ctr) Disk arrived
[22:12:09] <nuria>	 milimetric, joal : delaying the cassandra job is fine but we should not delay the period we rae computing
[22:12:11] <nuria>	 *are
[22:12:27] <nuria>	 milimetric: so we should be computing  0..23 hours just like we were before
[22:12:36] <joal>	 nuria: this is the way it's done
[22:12:51] <joal>	 nuria: base data computation is not affected
[22:12:59] <milimetric>	 I see, just the input events
[22:13:15] <milimetric>	 it's confusing though, until now all jobs had the same input events as actual input
[22:13:17] <joal>	 the cassandra job was waiting for 0..23 hours to be available before starting
[22:13:41] <milimetric>	 maybe we should wait for 0..0(next) instead
[22:14:01] <nuria>	 milimetric: that would not stagger jobs though
[22:14:03] <milimetric>	 anyway, it's late, we can talk tomorrow, nothing that can't be undone
[22:14:04] <joal>	 now it'll wait for 0..23 hours, AND hour 23+1 (meaning a delay of & hour)
[22:14:19] <milimetric>	 it would stagger, 'cause it would still wait for 0(next)
[22:14:50] <nuria>	 joal: but the period yer/month/day calculated is the same correct?
[22:15:00] <joal>	 milimetric: actually not all jobs have same events as input - some have hourly, others daily
[22:15:18] <joal>	 nuria: I don't understand your question
[22:15:50] <nuria>	 joal: sorry, that this did not affect the boundaries of data we are loading
[22:16:09] <nuria>	 joal: like the job starts one hour later but  it is still loading the same data inytervals
[22:16:13] <milimetric>	 nuria: yeah, this was my misunderstanding - joal go to sleep I got it :)
[22:16:14] <joal>	 nope - The hive query doesn't change 
[22:17:02] <joal>	 the boundaries of the hive queries don't change - However the available-data needed for the job to start is more than the data it actually uses (1 hour moe in our example)
[22:17:10] <milimetric>	 nuria: here's where my misunderstanding was, looking at the change makes sense: https://github.com/wikimedia/analytics-refinery/commit/c8de2abf5d70c6a93d737217bfc2595df5ae6f88#diff-eea685b5ab8d58b971f597afcb26e58cR117
[22:17:24] <milimetric>	 nuria: so there are two input-events
[22:17:28] <milimetric>	 one is data_input
[22:17:32] <milimetric>	 and one is data_input_delay
[22:17:56] <milimetric>	 so oozie will wait for the union of those two
[22:18:01] <nuria>	 milimetric: ya, if you see that i asked this same question to joal earlier and i guess i totally missed his answer
[22:18:04] <milimetric>	 data_input is 0..23
[22:18:11] <milimetric>	 data_input_delay is 1..0(next)
[22:18:49] <milimetric>	 so when you look at the coordinator instance in hue, you'll see 1..0(next) because most likely 0(today) is already there so you won't see it listed in the "missing"
[22:19:02] <milimetric>	 so it gives a slightly misleading impression that computation is happening one hour staggered
[22:19:37] <milimetric>	 but really computation is happening based on data_input, and data_input_delay is merely a blocker to force the job to wait some arbitrary amount
[22:20:24] <milimetric>	 clever but slightly confusing if (like me) you're not thinking of the details when looking at the job in hue
[22:21:44] <joal>	 milimetric, nuria - We are very much used to having oozie only rely on data being present because it needs it (which makes sense) - This is not needed by construction, and we take advantage of that in that case
[22:22:10] <milimetric>	 go to sleep, don't make me come to France :)
[22:35:19] <joal>	 I could actually not go to sleep only for the pleasure of having you in France milimetric - But you're right, it's late :)
[22:35:27] <joal>	 See you tomorrow team
[22:36:46] <milimetric>	 yeah, it would've been a win/win for me too :)
[22:40:43] <wikibugs>	 (03PS1) 10Milimetric: Add better usage commands [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554647
[22:41:30] <wikibugs>	 (03CR) 10Milimetric: [V: 03+2 C: 03+2] "also a bump to the correct version for the wikidata spark jar" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554647 (owner: 10Milimetric)
[22:44:14] <wikibugs>	 (03PS2) 10Milimetric: Fix spark jar version, add better usage commands [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554647
[22:44:24] <wikibugs>	 (03CR) 10Milimetric: [V: 03+2 C: 03+2] Fix spark jar version, add better usage commands [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554647 (owner: 10Milimetric)
[22:56:25] <wikibugs>	 (03CR) 10Nuria: "Nice, thanks for doing these" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554647 (owner: 10Milimetric)
[23:41:52] <wikibugs>	 (03CR) 10Nuria: "Changes look good but please add  a ticket number" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 (owner: 10Joal)