[00:49:02] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (10Jclark-ctr) @elukey Received cards please message me on irc and we can start scheduling replacement [01:08:29] 10Analytics, 10Operations, 10hardware-requests, 10ops-eqiad, 10User-Elukey: Upgrade kafka-jumbo100[1-6] to 10G NICs (if possible) - https://phabricator.wikimedia.org/T220700 (10Jclark-ctr) [02:50:54] milimetric (cc mforns ) should it be /features/hourly/automata or /features/automata/hourly? [02:51:57] nuria: we usually keep the granularity last, like daily/hourly [06:38:39] 10Analytics, 10Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (10Volker_E) [06:39:00] 10Analytics, 10Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (10Volker_E) [06:39:07] 10Analytics, 10Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (10Volker_E) p:05Triage→03Normal [06:58:20] Goat Morning! [07:29:43] morning! [07:29:47] addshore: o/ [07:33:03] joal: bonjour! When you are online let's check alerts together [07:46:13] o/ [08:03:16] Good morning [08:04:26] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10matthiasmullie) @Milimetric it wasn't being used to omit deleted pages (see inline comments about archive table)... [08:04:41] elukey: ready when you are! [08:05:45] hello! I saw your email, just wanted to know if anything was needed or if it was all part of the restart.. I noticed some alerts after your last email [08:06:43] elukey: all is good - the 2 big jobs failed for their day to rerun (December 1st), I think it was because of cassandra being overwhelmed (too much loading at once) [08:06:56] elukey: I reran each of them separately and everything is fine [08:07:14] super [08:07:16] thanks :) [08:07:20] I just checked jobs this morning: nothing has failed, jobs are faster, I'm gonna confirm log size and I think we can call it done :) [08:07:47] Thanks a lot for double checking elukey :) [08:08:02] elukey: I'm ready for sparkerb when you want [08:08:41] joal: need 10mins since icinga is like a christmas tree now (not due to analytics alerts) [08:09:00] elukey: when you want :) [08:09:53] * joal needs to think of a present to put under the icinga tree for elukey [08:15:55] elukey: for big jobs, the cassandra loader log-change has reduced the size big a ratio of ~50k :) [08:16:00] \o/ [08:17:57] that is wonderful! [08:20:51] stat1007 again with low disk space sigh [08:22:05] :( [08:23:13] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10akosiaris) >>! In T236386#5706868, @Ottomata wrote: > @akosiaris I merged and applied https://gerrit.wikimedia.org/r/c/opera... [08:31:39] second offending home belongs to Mr Andrew Otto :) [08:34:51] huhuhu :) [08:36:34] * joal hopes nobody will look at home sizes on hdfs ... 
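A side note on the disk-space exchange above - a minimal sketch of how to find the offending home directories, assuming /home on the stat host and /user on HDFS are the paths of interest (run the HDFS command with a valid Kerberos ticket where required):

  # largest local home directories on e.g. stat1007
  sudo du -sh /home/* 2>/dev/null | sort -rh | head
  # largest HDFS home directories (columns: size, size with replication, path)
  hdfs dfs -du /user | sort -rn | head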
[08:37:27] joal: aqs1004 depooled and ready for a test [08:37:32] \o/ [08:39:07] elukey: also, we can go for me deploying using scap deploy if you prefer [08:40:34] joal: I have a cumin cookbook now to roll restart aqs [08:40:40] Ah - ok :0 [08:40:58] but yes eventually we should move to a scap config [08:41:10] to avoid using puppet at all [08:41:43] makes sense [08:41:49] elukey: DEPLOY ! [08:43:03] ack! [08:47:23] ok joal about spark-hive [08:47:34] yes [08:47:57] my understanding was that spark was aware of the need to use hive, and it would have gathered a delegation token for it upon start [08:48:08] that could be the case for regular sessions [08:48:17] but maybe not for our jdbc refine use [08:48:53] elukey: I agree spark must gather hive token [08:49:07] the error seems clear: from a worker, there is no kerberos analytics credentials to use [08:49:20] now the question is, how is it used when quering hive through jdbc [08:49:21] when passing the keytab, there is an explicit auth [08:49:23] so all works [08:49:40] (aqs rolled restart btw) [08:49:46] \o/ [08:50:10] elukey: maybe asking spark to use implicit creds to connect to hive when available? [08:51:26] joal: not sure if it has some, from the yarn logs it seems that there are none [08:52:04] elukey: it must have one, otherwise it wouldn't manage to connect to hive in spark-shell mode, no ? [08:52:04] it started happening right after your deploy (that touched also analytics1030) but no idea what triggered this behavior [08:52:35] elukey: I didn't understand that - behavior has changed? [08:52:53] joal: if it uses hive1 protocol, only hdfs delegation token is required to read data.. when you have to alter tables, then the problem might arise [08:53:08] (in which jdbc -> hive2 protocol is used) [08:53:55] elukey: it also depends how hive tables are read: if using the hive metastore as base to get data info, then some hive creds must be available - if using parquet directly, then hdfs only [08:56:16] joal: we cannot really test alter tables with spark.sql() right? Due to the bug that Andrew is working on [08:56:59] elukey: we can test using spark-shell, or spark-submit, but we can't have refine rely on that [08:59:11] joal: ah ok I am reading again https://github.com/apache/spark/pull/21012, the bug is only for some alter table use cases [08:59:20] yes let's test it and see if it works or not [08:59:32] in yarn mode of course, we need to trigger the command from a worker [09:00:00] Ah - elukey - this is tricky [09:00:09] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10daniel) > I was thinking it made sense to exclude rev-deleted revisions: if a page only has (SDC) revisions that... [09:00:33] elukey: we need to write a spark job for that, we can't use spark-shell [09:03:49] https://issues.apache.org/jira/browse/HIVE-5155 is interesting [09:04:59] elukey: not related to our case I think, as spark talks to metastore directly IIRC [09:06:13] joal: we use jdbc to hive-server in refine no? 
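To make the distinction in this exchange concrete: a plain Spark SQL read needs the Hive metastore (thrift) for table metadata plus HDFS for the data files, while Refine's DDL path talks to HiveServer2 over JDBC, the same endpoint beeline uses. A rough sketch of the two paths, reusing the host name and JDBC URL quoted later in this log and assuming the usual wmf.webrequest partition columns - neither line is the actual Refine code:

  # metastore + HDFS path (what a plain Spark SQL read uses)
  spark2-shell --master yarn
  scala> spark.sql("SELECT COUNT(*) FROM wmf.webrequest WHERE year=2019 AND month=12 AND day=2 AND hour=0 AND webrequest_source='text'").show()

  # HiveServer2 JDBC path (what DataFrameToHive uses for CREATE/ALTER), same endpoint as:
  beeline -u 'jdbc:hive2://analytics1030.eqiad.wmnet:10000/default;principal=hive/_HOST@WIKIMEDIA' -e 'SHOW DATABASES;'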
[09:07:00] elukey: I can;t recall, but you're probably right - So spark uses metastore, and jdbc hive-server, makes sense [09:48:39] !log Kill restart mediawiki-history-load-coord after sqoop re-import of missing tables [09:48:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:56:23] (03PS1) 10Joal: Fix sqoop after new tables added [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 [10:03:39] (03PS2) 10Joal: Fix sqoop after new tables added [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 [11:07:50] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10matthiasmullie) > If you want to count //pages//, then you are only looking at the current revision, right? And... [11:12:02] joal: sorry got distracted by a mediawiki issue, back to the problem [11:12:21] np elukey - I figured out you were not talking to me anymore ;) [11:12:33] in my mind it shouldn't be a problem to just explicitly pass keytab and principal to spark-submit [11:12:36] :) [11:13:13] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10daniel) >>! In T238878#5708511, @matthiasmullie wrote: > So page_latest can never point to a revdel'ed revision?... [11:13:29] the doc about kerberos in spark is not really complete [11:15:36] elukey: I think passing creds to spark submit will only be needed for special cases like the one for refine (hive jdbc), and the ones for long running jobs (wikitext-conversion) [11:15:50] For others, I don't think it'll be needed [11:16:33] yep [11:16:55] I can add those parameters in puppet for our spark jobs [11:17:35] elukey: Now the reason why explicit creds are needed for hive-jdbc and not for spark-hive-metastore is kinda mysterious :( [11:19:19] joal: did we test spark hive metastore in yarn mode? [11:20:19] elukey: not sure what 'yarn mode' means here - I have tested spark actions trough oozie, and spark-shell (executors in arn) [11:20:50] joal: yarn mode is --master yarn, it shouldn't work in theory [11:21:03] elukey: it does work with that [11:21:27] elukey: or should I say my previous tests have passed, as well as todays [11:22:07] we can re-do the tests and see yarn logs to figure out how it authenticates, and see what differs [11:22:38] elukey: I think I have an idea as to why refine needs explicit creds: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-spark/src/main/scala/org/wikimedia/analytics/refinery/spark/connectors/DataFrameToHive.scala#L330 [11:22:51] elukey: logging onto an-tool6 to test [11:23:50] * joal loves to receive kerberos errors :) [11:23:55] joal: I think that the hive principal is not analytics in that case [11:24:11] but the one that we use to construct the jdbc url [11:24:16] like we do in beeline [11:25:16] elukey@analytics1030:~$ sudo grep -A 1 hive.server2.authentication.kerberos.principal /etc/hive/conf/hive-site.xml [11:25:19] hive.server2.authentication.kerberos.principal [11:25:21] joal: --^ [11:25:23] hive/_HOST@WIKIMEDIA [11:25:34] hm [11:25:54] elukey: Just ran an sql query in arn mode having just kinit [11:26:10] can you give me the app id? 
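For reference, the "explicitly pass keytab and principal to spark-submit" option mentioned a few lines up would look roughly like the sketch below; --principal/--keytab let Spark log in on its own, so the driver/AM can authenticate (and renew tokens for long-running jobs) without a local ticket cache. The principal, keytab path and job arguments are placeholders, not the real puppet-managed values:

  spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --principal analytics/an-coord1001.eqiad.wmnet@WIKIMEDIA \
    --keytab /etc/security/keytabs/analytics.keytab \
    <refine-or-other-job-jar-class-and-options>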
[11:26:21] currently ongoing elukey [11:26:37] elukey: application_1573209619170_21817 [11:26:53] not closed by the way, so no logs yet [11:27:01] I guess you want me to close it? [11:27:18] joal: if you are done yes please [11:27:26] sure, done [11:33:29] joal: did you manage to have a look at that sqoop? ;D [11:34:18] addshore: yes, it's currently gathering data we wer missing, and once done it's your turn (I should say wikidata turn :) [11:34:58] joal: when you have a min can you re-run the job with https://spark.apache.org/docs/latest/running-on-yarn.html#troubleshooting-kerberos ? [11:35:11] because I don't see anything mentioned in the logs about an-coord or metastore [11:36:40] joal: thanks! [11:36:53] elukey: trying [11:38:40] elukey: seems working - application_1573209619170_21820 [11:39:19] elukey: ticket used is from cache logs say [11:45:01] joal: mmm I can see that the ticket is not found in the cache no? [11:45:13] elukey: found in driver I think [11:46:22] elukey: first lines of the log [11:48:03] joal: can you paste the lines? [11:48:23] because I have [11:48:25] Container: container_e03_1573209619170_21820_01_000023 on analytics1031.eqiad.wmnet_8041 [11:48:28] etc.. [11:48:53] First lines even before that [11:49:16] I don't have them [11:49:55] I am doing [11:49:57] sudo kerberos-run-command hdfs yarn logs -applicationId application_1573209619170_21820 --appOwner joal [11:50:04] AH! I get it - those lines must be my debug mode! [11:50:09] ah! [11:50:21] not related then :) [11:50:34] BUT - I saw those same lines when launching my spark app [11:50:39] pasting in a gist [11:50:44] super thanks [12:02:30] jo-al, please ping me if you notice the resqoop is done =] [13:13:40] * elukey lunch! [13:54:19] 10Analytics, 10Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Ottomata) Ok! Find me on IRC let's do it! [14:24:21] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10Ottomata) [14:36:51] joal: did the sqoop by any chance finish? [14:36:59] It did !!! [14:37:10] addshore: 5 minutes ago roughly [14:37:15] Let me repair the tables [14:37:18] addshore: --^ [14:37:21] okay! :D [14:39:16] done addshore - Can you confirm this is ok? [14:41:51] I'll run my queries and check! thanks! [14:48:25] joal: still the 2019-10 snapshot? [14:48:33] addshore: nope, 2019-11 :) [14:48:38] AAahhhhh :P [14:48:41] * addshore updates queries [14:48:44] stuff got sqooped early this month [14:49:04] aaah, only 11 for that one table though? ;) [14:49:10] also addshore - data has been sqooped into "prod" tables: wnf_raw database instead of joal [14:49:20] hahahaa, okay [14:49:25] addshore: all tables [14:49:28] :) [14:54:52] joal: all looks good! [14:54:59] \o/ :) [14:55:06] addshore: results are as expected? [15:01:10] Gone for kids [15:06:02] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10leila) @tizianopiccardi thanks for the update and great to see that you're there. :) Please make sure you advertise for it on wiki-research-l when the right time arrives. @ArielGle... [15:27:44] ottomata: hi, could we deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/548764 ? [15:28:25] YES! [15:28:36] nice :) [16:24:06] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" 
[analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/551690 (https://phabricator.wikimedia.org/T232671) (owner: 10Srishakatux) [16:28:19] 10Analytics, 10Analytics-Kanban, 10Wikidata, 10User-Addshore, 10Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Sqoop wikidata terms tables into hadoop - https://phabricator.wikimedia.org/T239471 (10Addshore) 05Open→03Resolved These things now exist in wmf_raw.wikibase_wbt_item_terms for example =... [16:34:24] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10Ottomata) Sigh, it turns out SRE wants us to not rewrite paths. So if we use path based routing, the app needs to handle wh... [16:36:01] 10Analytics, 10Analytics-Kanban, 10Wikidata, 10User-Addshore, 10Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Sqoop wikidata terms tables into hadoop - https://phabricator.wikimedia.org/T239471 (10Addshore) 05Resolved→03Open [16:36:39] joal ;_; I made a mistake [16:37:17] need to alter 1 more field in 1 of the tables and re-sqoop one of them [16:41:14] (03PS1) 10Addshore: sqoop, wb_terms, use term_full_entity_id not term_entity_id [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554329 (https://phabricator.wikimedia.org/T239471) [16:42:52] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10Ottomata) How about: - events-logging.wikimedia.org/v1/events - events-analytics.wikimedia.org/v1/events ? Is events-logg... [16:44:04] sukhe: o/ [16:44:17] ottomata: webrequest_sampled_128 [16:44:22] ah k [16:44:29] yeah there is a hive table [16:44:32] wmf.webrequest [16:44:39] I am looking to get: top five or ten ASNs for a country split by time 30 days [16:44:41] should have mostly the same data but is unsampled [16:45:03] you can sample though [16:45:16] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling#LanguageManualSampling-SamplingBucketizedTable [16:45:17] ottomata: yeah sample is completely fine! [16:45:42] so you'd just write a SQL query against the wmf.webrequest table that computes what you need [16:45:48] have you used hive yet/before? [16:46:35] ottomata: yes! I have and while I can use simple queries, I was wondering if there is a way to translate a Turnilo dashboard to a Hive query [16:46:44] I was currently inspecting the network traffic haha [16:47:00] (looking at HTTP requests to see if I could get something out of that) [16:47:05] aye. turnilo comes from semi-preaggregated data in druid [16:47:12] and the query language is pretty different [16:47:14] than hive [16:47:19] oh but [16:47:20] ah [16:47:24] sukhe: you could make a superset dashboard [16:47:26] that I didn't know :) [16:47:27] that uses data in druid [16:47:40] are you just trying to make a monthly dashboard? [16:48:07] ottomata: pretty much, I just need to get the top ASNs for a country say in the last 30-day period... 
and I need to run this once a month for a certain list of countries [16:49:17] https://superset.wikimedia.org/superset/welcome [16:49:37] is probably what you want [16:49:44] turnilo is mostly for ad hoc exploration [16:49:50] superset is better for creating dashboards [16:50:50] https://superset.wikimedia.org/druiddatasourcemodelview/list/ [16:50:55] webrequest_sampled_128 is there [16:51:06] i think you can explore querying against that druid datasource [16:51:52] https://bit.ly/2sHSSQX [16:51:52] ottomata: looking at it, thanks very much; I had no idea [16:51:53] maybe takes you there? [16:52:00] ah nope [16:54:21] ottomata: possibly dumb question but how do I get a Druid query from a dashboard? [16:57:49] a-team: i will be late for standup today as we have another leveling metting running late , please start w/o me [16:58:41] sukhe: actually my superset fu is basically nil [16:58:48] Connie Chen is our pro [16:58:49] sukhe: do take a look at superset docs: [16:58:51] what's her IRC.... [16:58:58] oh cchen_ ^^ [16:59:08] sukhe: https://superset.apache.org/ [16:59:33] nuria: thanks :D [17:00:04] sukhe: trying things is fine and docs explain alot [17:00:50] nuria: yeah, I wasn't even aware of this till ottomata told me today. I thought Turnilo was the thing [17:01:26] sukhe: now the way you go about creating a dashboard is not cuting and pasting druid queries, is a bit more friendly, teh dasboard will be on top of teh webrequest_128 datasource [17:01:32] sukhe: there are other examples [17:04:06] nuria: yeah this is great [17:05:56] (03CR) 10Milimetric: Fix sqoop after new tables added (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 (owner: 10Joal) [17:09:31] 10Analytics, 10Product-Analytics: Many special pages missing from pageview_hourly dataset starting on July 23, 2019 - https://phabricator.wikimedia.org/T239672 (10MNeisler) [18:08:43] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging and applying - Will be deployed on the next train" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554329 (https://phabricator.wikimedia.org/T239471) (owner: 10Addshore) [18:08:54] (03CR) 10Joal: Fix sqoop after new tables added (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 (owner: 10Joal) [18:14:18] (03PS3) 10Joal: Fix sqoop after new tables added [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 [18:15:02] joal: really interesting [18:15:03] --deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or [18:15:07] on one of the worker machines inside the cluster ("cluster") [18:15:10] (Default: client). [18:15:20] the driver is running on the client with only --master-yarn [18:15:25] elukey: indeed - I was testing spark-submit in deploy-mode cluster [18:18:23] joal: and in refine's logs on analytics1030 I can see hdfs delegation token requested, but not hive ones [18:18:39] meanwhile they should be listed if retrieved [18:18:54] it connects to the metastore though [18:18:59] elukey: the bizarre thing is that I have seen the same on m test job, and my hive query hasn't failed [18:19:19] my spark-query that needed hive metastore to e precise --^ [18:20:12] joal: that is not bizarre I think, since the driver did it with the local credential cache, not the app master in yarn.. is it possible? 
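The distinction at play here is the generic Spark-on-YARN one rather than anything Refine-specific: in the default client deploy mode the driver runs on the submitting host and can use the local kinit ticket cache, while in cluster mode the driver runs inside the YARN application master, which has no ticket cache and only sees whatever delegation tokens (or keytab) were shipped at submit time. A trivial sketch, with <job> as a placeholder:

  # driver on the submitting host, local TGT from kinit visible to it
  spark2-submit --master yarn --deploy-mode client <job>
  # driver inside the YARN AM, only shipped tokens/keytab available
  spark2-submit --master yarn --deploy-mode cluster <job>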
[18:20:15] it would explain it [18:20:49] elukey: nope - I ran a job using deploy-mode cluster, this job successfully did the querying [18:22:15] joal: is there anywhere in the driver's yarn log that it connected to the metastore? [18:24:12] elukey: I have seen that log line from the client, looking in application logs [18:25:59] elukey: it is present in the app-log, yes [18:26:33] elukey: 19/12/03 12:06:02 INFO metastore: Trying to connect to metastore with URI thrift://analytics1030.eqiad.wmnet:9083 [18:26:36] 19/12/03 12:06:02 INFO metastore: Connected to metastore. [18:26:48] Those lines are in the AM of my job succeeding in querying [18:27:00] But no mention of tickets or tokens [18:31:44] so it uses SASL for sure to authenticate [18:32:10] elukey: are our logs the correct ones for hive tokens? [18:35:04] joal: ? [18:35:43] elukey: I wonder if the logs we enable trying to get info actually show us hive tokens (they do show hdfs tokens, but maybe another setting is needed for hive?) [18:36:30] in theory no afaik [18:37:01] I am trying to understand what it means in concrete to have hive in spark's classpath [18:37:12] because that seems the trigger to get delegation tokens [18:37:47] but can't explain the metastore thing [18:42:07] that probably means having hive-site.xml in spark's classpath? [18:42:25] I think so but it is very generic [18:53:55] elukey: I actually think spark can access hive through HDFS-token as long as it only queries: querying is equivalent to reading only, therefore filesystem auth is enough [18:54:35] hm, that is likely true; spark will talk to the metastore to get file locations though, right? [18:55:02] ottomata: yes, and metastore needs HDFS-token to grant access- but reading only doesn't require more than HDFS token [18:55:07] aye [18:55:13] makes sense [18:55:16] at this point yes it is a plausible explanation [18:55:20] trying to triple check my view [18:56:49] with --conf spark.security.credentials.hiveserver2.enabled=true to spark-submit it seems that I don't get anymore the failure [18:57:57] but no traces in the logs about hive tokens [18:59:42] but of course if I remove it everything works [19:00:06] I guess I need to drop the table in hive to verify [19:00:49] elukey: you can also just alter the table and drop a column [19:00:56] instead of dropping whole table if you prefer [19:01:05] yes also good suggestion! [19:01:56] I managed to create a hive table using spark without any cred addition [19:04:07] joal: via jdbc? or via metastore? [19:04:14] elukey: via metastore [19:05:03] elukey: from a stack-overflow answer: The Spark "client" uses a Kerberos ticket to retrieve various tokens, then make these tokens available to the driver and the executors [19:05:33] So tokens are gathered locally, then dispatched to every remote worker (including driver) [19:05:56] joal: but there is no trace of this [19:06:08] this is the issue [19:06:20] I know !!!
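The submit line being tested just above would look roughly like this - a sketch only: the option name is taken verbatim from the log, everything else is a placeholder, and whether the flag really ships a usable HiveServer2 delegation token is exactly what was still being verified in this conversation:

  spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.security.credentials.hiveserver2.enabled=true \
    <job-jar-class-and-options>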
[19:06:32] We have trace for HDFS-token, but not for metastore : [19:06:34] :( [19:07:25] ohh ok I dropped the table and finally the error again [19:07:33] 19/12/03 19:06:38 INFO DataFrameToHive: Connecting to Hive over JDBC at jdbc:hive2://analytics1030.eqiad.wmnet:10000/default;principal=hive/_HOST@WIKIMEDIA [19:07:36] 19/12/03 19:06:38 INFO Utils: Supplied authorities: analytics1030.eqiad.wmnet:10000 [19:07:40] 19/12/03 19:06:38 INFO Utils: Resolved authority: analytics1030.eqiad.wmnet:10000 [19:07:42] 19/12/03 19:06:38 ERROR TSaslTransport: SASL negotiation failure [19:07:45] javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] [19:08:00] principal=hive [19:08:01] ? [19:08:27] it is the same thing that we put in beeline [19:09:08] your code grabs that value from hive-site.xml and adds it to the jdbc string [19:09:44] dumb question then: how do we provide creds for hive principal? [19:10:57] joal: I think it should be read as "expect this principal from the other side of the connection" if I got it correctly [19:11:22] elukey: ok, but a ticket is also needed, for that principal I guess? [19:12:06] joal: well no, a ticket is needed for you to present to the service [19:12:38] so, if I have a ticket, I can say I'm hive - weird :) [19:12:47] no no [19:13:25] elukey: i think that value is not in hive-site.xml [19:13:37] I stil don't get it elukey :) [19:13:58] ottomata: it is in theory, it is the hive's principal [19:14:12] that is the default value of it? [19:14:58] https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2 [19:15:11] do you think we also need to set hive.server2.authentication=KERBEROS [19:15:12] ? [19:15:16] Ah! principal = service name [19:15:19] ok [19:16:16] ottomata: it is set [19:16:38] elukey, ottomata : for reference - https://github.com/eBay/Spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L157 (and after) [19:16:39] joal: yes it is weird but the principal is the target service's principal [19:16:53] elukey: but in the jdbc connetion uri? [19:17:10] elukey: I need to write that down to make sure I don't ask you again (probably the n-th time I do already ... sorry) [19:17:38] ottomata: ahhh sorry! [19:18:24] ottomata: is it needed? I don't find it [19:19:06] joal: please keep asking, I am learning things as well, during all this time I have probably taken things for granted since they were working without questioning them in dept [19:19:35] ottomata: for beeline we have [19:19:36] Connecting to jdbc:hive2://analytics1030.eqiad.wmnet:10000/default;principal=hive/_HOST@WIKIMEDIA [19:21:31] elukey: i don't know! [19:21:47] https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2 seems to say maybe? [19:21:56] hm [19:22:20] hm, maybe that is just server side [19:28:33] ok I am declaring myself defeated for today, mediawiki and spark killed my brain :D [19:28:39] I'll restart tomorrow morning [19:28:44] thanks a lot for the brain bounce! [19:28:54] o/ [19:32:56] o/ [19:40:33] ottomata: still here with a minute? [19:41:15] mforns: if you are there can we talk about the pageview anomaliesd? [19:42:04] milimetric: also, we need to deploy aqs to flip to the new snapshot right? [19:42:28] nuria: done this morning [19:42:39] ok ! 
cc milimetric [19:42:44] thnak you joal [19:42:47] *thank [19:45:41] (03CR) 10Nuria: Add data quality metric: traffic variations per country (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: 10Mforns) [19:54:49] nuria, yea makes sense, sth like the other doc page we made for user agent entropy, right? [19:55:15] nuria, was there sth else you wanted to discuss? [20:00:42] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10ArielGlenn) >>! In T182351#5709123, @leila wrote: > @tizianopiccardi thanks for the update and great to see that you're there. :) Please make sure you advertise for it on wiki-resea... [20:07:11] 10Analytics, 10User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (10elukey) Faidon improved his code in https://gist.github.com/paravoid/3419e0b5ae1f24b6ea21906a142f2f47, the python irc prototype is now able to do everything that is should be ne... [20:32:05] ottomata: ping? [20:32:19] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10tizianopiccardi) Yes, we used MediaWiki. Technically we extracted the function used for the parsing and we intercepted the database calls to return the version of the templates avai... [20:33:10] joal: o/ [20:33:16] heya [20:33:38] wassssuuppp? [20:34:03] I'm trying to reproduce the jdbc error and I'm stuck on getting access to files in spark-shell - Any idea? [20:35:26] from where joal? [20:36:28] I'm on an-tool1006, launching a spark-shell with --files hive-site.xml - But then hadoopConf doesn't have it :( [20:37:31] hm [20:37:54] I vaguely recall files not being accessible in shell-mode, but couldn't find doc [20:38:29] hmm that doesn't seem right [20:39:38] joal: what are you trying to do? load hive-site.xml manually? [20:39:57] reproduce what happens in refine, hive-jdbc only [20:40:01] aye [20:40:05] meaning: loading conf [20:40:11] so [20:40:14] on an-tool1006 [20:40:18] i just did [20:40:22] spark2-shell (no --files) [20:40:39] OH sorry, yeah, got null [20:41:36] ah i know [20:41:56] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Refine#Running_Refine_in_local_or_yarn_client_mode [20:42:55] ? [20:42:58] hmmmm why is there no --files there... [20:44:24] hm i thnk you shouldn't need --files in local mode [20:44:27] or yarn client mode [20:44:27] ottomata: hardcoding conf, I managed to have something working (failing to be precise), but I can't get access to conf [20:44:30] for hive-site.xml [20:44:46] but, i get null in hadoopConf too.. [20:45:05] ottomata: hadoopConf contains all hadoop conf except hive [20:46:37] hmmm [20:49:07] yeah shouldn't need joal, hive-site.xml is in symlinked spark conf already [20:49:10] trying too tho [20:52:45] ah [20:52:46] joal [20:52:54] you need to interact with hive before the configs are loaded [20:53:02] ahhhh [20:53:04] pf [20:53:06] ok :) [21:14:10] mforns: there still? [21:14:23] nuria, yes [21:14:29] mforns: bc? [21:14:33] sure [21:14:39] mforns: to talk about pageview anomalies? [21:33:48] mforns: internet went kaput but no worries! thank you for your work [21:34:49] nuria, I connected secondary internet [21:35:09] mforns: no worries i think we talked about everything , unless you had anything else? [21:38:26] nuria, using second internet now wanna continue? 
[21:38:51] 10Analytics-Cluster, 10Analytics-Kanban: Sanitize pageview_hourly - subtasked {mole} - https://phabricator.wikimedia.org/T114675 (10Aklapper) [21:38:57] mforns: nah, it is late for ya, and I think we got everything, we are good [21:39:07] ok, cool [22:10:44] I tried to manually run jdbc only code i [22:10:59] in spark - no chance - Will try again tomorrow [22:38:48] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10mforns) After discussing with @nettrom_WMF we concluded that the solution described above is not a fit. The raw (unsanitized) data does not have the appro... [22:42:27] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10Nuria) >that would remove the data specific to the affected growth schemas from the event_sanitized database after 270 days (sliding window). This would r... [22:50:52] hey, is there I can about jobs? [22:54:47] 10Analytics, 10Operations, 10ops-eqiad: analytics1057's BBU is faulty - https://phabricator.wikimedia.org/T239045 (10Jclark-ctr) @elukey unsure if this is same bbu it is diffrent models. 720xd vs 730xd [22:55:57] I meant is there a way that I query job in the data lake, like hadoop [22:56:30] *jobs [22:56:36] ** mediawiki jobs
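The last question goes unanswered in this log. For what it's worth, one place to start looking is the refined EventBus job events in Hive - a heavily hedged sketch, since the layout assumed below (one table per job type under the event database, partitioned by year/month/day) is a guess to verify rather than a known fact:

  # from a stat/notebook host, with a Kerberos ticket
  beeline -e "SHOW TABLES IN event LIKE 'mediawiki_job*';"
  # then, for one (hypothetical) job-type table found above:
  beeline -e "SELECT meta.dt, meta.domain FROM event.mediawiki_job_refreshlinks WHERE year=2019 AND month=12 AND day=3 LIMIT 10;"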