[00:49:02] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (10Jclark-ctr) @elukey Received cards please message me on irc and we can start scheduling replacement [01:08:29] 10Analytics, 10Operations, 10hardware-requests, 10ops-eqiad, 10User-Elukey: Upgrade kafka-jumbo100[1-6] to 10G NICs (if possible) - https://phabricator.wikimedia.org/T220700 (10Jclark-ctr) [02:50:54] milimetric (cc mforns ) should it be /features/hourly/automata or /features/automata/hourly? [02:51:57] nuria: we usually keep the granularity last, like daily/hourly [06:38:39] 10Analytics, 10Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (10Volker_E) [06:39:00] 10Analytics, 10Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (10Volker_E) [06:39:07] 10Analytics, 10Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (10Volker_E) p:05Triage→03Normal [06:58:20] Goat Morning! [07:29:43] morning! [07:29:47] addshore: o/ [07:33:03] joal: bonjour! When you are online let's check alerts together [07:46:13] o/ [08:03:16] Good morning [08:04:26] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10matthiasmullie) @Milimetric it wasn't being used to omit deleted pages (see inline comments about archive table)... [08:04:41] elukey: ready when you are! [08:05:45] hello! I saw your email, just wanted to know if anything was needed or if it was all part of the restart.. I noticed some alerts after your last email [08:06:43] elukey: all is good - the 2 big jobs failed for their day to rerun (December 1st), I think it was because of cassandra being overwhelmed (too much loading at once) [08:06:56] elukey: I reran each of them separately and everything is fine [08:07:14] super [08:07:16] thanks :) [08:07:20] I just checked jobs this morning: nothing has failed, jobs are faster, I'm gonna confirm log size and I think we can call it done :) [08:07:47] Thanks a lot for double checking elukey :) [08:08:02] elukey: I'm ready for sparkerb when you want [08:08:41] joal: need 10mins since icinga is like a christmas tree now (not due to analytics alerts) [08:09:00] elukey: when you want :) [08:09:53] * joal needs to think of a present to put under the icinga tree for elukey [08:15:55] elukey: for big jobs, the cassandra loader log-change has reduced the size big a ratio of ~50k :) [08:16:00] \o/ [08:17:57] that is wonderful! [08:20:51] stat1007 again with low disk space sigh [08:22:05] :( [08:23:13] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10akosiaris) >>! In T236386#5706868, @Ottomata wrote: > @akosiaris I merged and applied https://gerrit.wikimedia.org/r/c/opera... [08:31:39] second offending home belongs to Mr Andrew Otto :) [08:34:51] huhuhu :) [08:36:34] * joal hopes nobody will look at home sizes on hdfs ... 
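A side note on the disk-space exchange above - a minimal sketch of how to find the offending home directories, assuming /home on the stat host and /user on HDFS are the paths of interest (run the HDFS command with a valid Kerberos ticket where required):

  # largest local home directories on e.g. stat1007
  sudo du -sh /home/* 2>/dev/null | sort -rh | head
  # largest HDFS home directories (columns: size, size with replication, path)
  hdfs dfs -du /user | sort -rn | head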
[08:37:27] joal: aqs1004 depooled and ready for a test [08:37:32] \o/ [08:39:07] elukey: also, we can go for me deploying using scap deploy if you prefer [08:40:34] joal: I have a cumin cookbook now to roll restart aqs [08:40:40] Ah - ok :0 [08:40:58] but yes eventually we should move to a scap config [08:41:10] to avoid using puppet at all [08:41:43] makes sense [08:41:49] elukey: DEPLOY ! [08:43:03] ack! [08:47:23] ok joal about spark-hive [08:47:34] yes [08:47:57] my understanding was that spark was aware of the need to use hive, and it would have gathered a delegation token for it upon start [08:48:08] that could be the case for regular sessions [08:48:17] but maybe not for our jdbc refine use [08:48:53] elukey: I agree spark must gather hive token [08:49:07] the error seems clear: from a worker, there is no kerberos analytics credentials to use [08:49:20] now the question is, how is it used when quering hive through jdbc [08:49:21] when passing the keytab, there is an explicit auth [08:49:23] so all works [08:49:40] (aqs rolled restart btw) [08:49:46] \o/ [08:50:10] elukey: maybe asking spark to use implicit creds to connect to hive when available? [08:51:26] joal: not sure if it has some, from the yarn logs it seems that there are none [08:52:04] elukey: it must have one, otherwise it wouldn't manage to connect to hive in spark-shell mode, no ? [08:52:04] it started happening right after your deploy (that touched also analytics1030) but no idea what triggered this behavior [08:52:35] elukey: I didn't understand that - behavior has changed? [08:52:53] joal: if it uses hive1 protocol, only hdfs delegation token is required to read data.. when you have to alter tables, then the problem might arise [08:53:08] (in which jdbc -> hive2 protocol is used) [08:53:55] elukey: it also depends how hive tables are read: if using the hive metastore as base to get data info, then some hive creds must be available - if using parquet directly, then hdfs only [08:56:16] joal: we cannot really test alter tables with spark.sql() right? Due to the bug that Andrew is working on [08:56:59] elukey: we can test using spark-shell, or spark-submit, but we can't have refine rely on that [08:59:11] joal: ah ok I am reading again https://github.com/apache/spark/pull/21012, the bug is only for some alter table use cases [08:59:20] yes let's test it and see if it works or not [08:59:32] in yarn mode of course, we need to trigger the command from a worker [09:00:00] Ah - elukey - this is tricky [09:00:09] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10daniel) > I was thinking it made sense to exclude rev-deleted revisions: if a page only has (SDC) revisions that... [09:00:33] elukey: we need to write a spark job for that, we can't use spark-shell [09:03:49] https://issues.apache.org/jira/browse/HIVE-5155 is interesting [09:04:59] elukey: not related to our case I think, as spark talks to metastore directly IIRC [09:06:13] joal: we use jdbc to hive-server in refine no? 
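To make the distinction in this exchange concrete: a plain Spark SQL read needs the Hive metastore (thrift) for table metadata plus HDFS for the data files, while Refine's DDL path talks to HiveServer2 over JDBC, the same endpoint beeline uses. A rough sketch of the two paths, reusing the host name and JDBC URL quoted later in this log and assuming the usual wmf.webrequest partition columns - neither line is the actual Refine code:

  # metastore + HDFS path (what a plain Spark SQL read uses)
  spark2-shell --master yarn
  scala> spark.sql("SELECT COUNT(*) FROM wmf.webrequest WHERE year=2019 AND month=12 AND day=2 AND hour=0 AND webrequest_source='text'").show()

  # HiveServer2 JDBC path (what DataFrameToHive uses for CREATE/ALTER), same endpoint as:
  beeline -u 'jdbc:hive2://analytics1030.eqiad.wmnet:10000/default;principal=hive/_HOST@WIKIMEDIA' -e 'SHOW DATABASES;'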
[09:07:00] elukey: I can;t recall, but you're probably right - So spark uses metastore, and jdbc hive-server, makes sense [09:48:39] !log Kill restart mediawiki-history-load-coord after sqoop re-import of missing tables [09:48:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:56:23] (03PS1) 10Joal: Fix sqoop after new tables added [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 [10:03:39] (03PS2) 10Joal: Fix sqoop after new tables added [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 [11:07:50] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10matthiasmullie) > If you want to count //pages//, then you are only looking at the current revision, right? And... [11:12:02] joal: sorry got distracted by a mediawiki issue, back to the problem [11:12:21] np elukey - I figured out you were not talking to me anymore ;) [11:12:33] in my mind it shouldn't be a problem to just explicitly pass keytab and principal to spark-submit [11:12:36] :) [11:13:13] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10daniel) >>! In T238878#5708511, @matthiasmullie wrote: > So page_latest can never point to a revdel'ed revision?... [11:13:29] the doc about kerberos in spark is not really complete [11:15:36] elukey: I think passing creds to spark submit will only be needed for special cases like the one for refine (hive jdbc), and the ones for long running jobs (wikitext-conversion) [11:15:50] For others, I don't think it'll be needed [11:16:33] yep [11:16:55] I can add those parameters in puppet for our spark jobs [11:17:35] elukey: Now the reason why explicit creds are needed for hive-jdbc and not for spark-hive-metastore is kinda mysterious :( [11:19:19] joal: did we test spark hive metastore in yarn mode? [11:20:19] elukey: not sure what 'yarn mode' means here - I have tested spark actions trough oozie, and spark-shell (executors in arn) [11:20:50] joal: yarn mode is --master yarn, it shouldn't work in theory [11:21:03] elukey: it does work with that [11:21:27] elukey: or should I say my previous tests have passed, as well as todays [11:22:07] we can re-do the tests and see yarn logs to figure out how it authenticates, and see what differs [11:22:38] elukey: I think I have an idea as to why refine needs explicit creds: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-spark/src/main/scala/org/wikimedia/analytics/refinery/spark/connectors/DataFrameToHive.scala#L330 [11:22:51] elukey: logging onto an-tool6 to test [11:23:50] * joal loves to receive kerberos errors :) [11:23:55] joal: I think that the hive principal is not analytics in that case [11:24:11] but the one that we use to construct the jdbc url [11:24:16] like we do in beeline [11:25:16] elukey@analytics1030:~$ sudo grep -A 1 hive.server2.authentication.kerberos.principal /etc/hive/conf/hive-site.xml [11:25:19] hive.server2.authentication.kerberos.principal [11:25:21] joal: --^ [11:25:23] hive/_HOST@WIKIMEDIA [11:25:34] hm [11:25:54] elukey: Just ran an sql query in arn mode having just kinit [11:26:10] can you give me the app id? 
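For reference, the "explicitly pass keytab and principal to spark-submit" option mentioned a few lines up would look roughly like the sketch below; --principal/--keytab let Spark log in on its own, so the driver/AM can authenticate (and renew tokens for long-running jobs) without a local ticket cache. The principal, keytab path and job arguments are placeholders, not the real puppet-managed values:

  spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --principal analytics/an-coord1001.eqiad.wmnet@WIKIMEDIA \
    --keytab /etc/security/keytabs/analytics.keytab \
    <refine-or-other-job-jar-class-and-options>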
[11:26:21] currently ongoing elukey [11:26:37] elukey: application_1573209619170_21817 [11:26:53] not closed by the way, so no logs yet [11:27:01] I guess you want me to close it? [11:27:18] joal: if you are done yes please [11:27:26] sure, done [11:33:29] joal: did you manage to have a look at that sqoop? ;D [11:34:18] addshore: yes, it's currently gathering data we wer missing, and once done it's your turn (I should say wikidata turn :) [11:34:58] joal: when you have a min can you re-run the job with https://spark.apache.org/docs/latest/running-on-yarn.html#troubleshooting-kerberos ? [11:35:11] because I don't see anything mentioned in the logs about an-coord or metastore [11:36:40] joal: thanks! [11:36:53] elukey: trying [11:38:40] elukey: seems working - application_1573209619170_21820 [11:39:19] elukey: ticket used is from cache logs say [11:45:01] joal: mmm I can see that the ticket is not found in the cache no? [11:45:13] elukey: found in driver I think [11:46:22] elukey: first lines of the log [11:48:03] joal: can you paste the lines? [11:48:23] because I have [11:48:25] Container: container_e03_1573209619170_21820_01_000023 on analytics1031.eqiad.wmnet_8041 [11:48:28] etc.. [11:48:53] First lines even before that [11:49:16] I don't have them [11:49:55] I am doing [11:49:57] sudo kerberos-run-command hdfs yarn logs -applicationId application_1573209619170_21820 --appOwner joal [11:50:04] AH! I get it - those lines must be my debug mode! [11:50:09] ah! [11:50:21] not related then :) [11:50:34] BUT - I saw those same lines when launching my spark app [11:50:39] pasting in a gist [11:50:44] super thanks [12:02:30] jo-al, please ping me if you notice the resqoop is done =] [13:13:40] * elukey lunch! [13:54:19] 10Analytics, 10Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Ottomata) Ok! Find me on IRC let's do it! [14:24:21] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10Ottomata) [14:36:51] joal: did the sqoop by any chance finish? [14:36:59] It did !!! [14:37:10] addshore: 5 minutes ago roughly [14:37:15] Let me repair the tables [14:37:18] addshore: --^ [14:37:21] okay! :D [14:39:16] done addshore - Can you confirm this is ok? [14:41:51] I'll run my queries and check! thanks! [14:48:25] joal: still the 2019-10 snapshot? [14:48:33] addshore: nope, 2019-11 :) [14:48:38] AAahhhhh :P [14:48:41] * addshore updates queries [14:48:44] stuff got sqooped early this month [14:49:04] aaah, only 11 for that one table though? ;) [14:49:10] also addshore - data has been sqooped into "prod" tables: wnf_raw database instead of joal [14:49:20] hahahaa, okay [14:49:25] addshore: all tables [14:49:28] :) [14:54:52] joal: all looks good! [14:54:59] \o/ :) [14:55:06] addshore: results are as expected? [15:01:10] Gone for kids [15:06:02] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10leila) @tizianopiccardi thanks for the update and great to see that you're there. :) Please make sure you advertise for it on wiki-research-l when the right time arrives. @ArielGle... [15:27:44] ottomata: hi, could we deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/548764 ? [15:28:25] YES! [15:28:36] nice :) [16:24:06] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" 
[analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/551690 (https://phabricator.wikimedia.org/T232671) (owner: 10Srishakatux) [16:28:19] 10Analytics, 10Analytics-Kanban, 10Wikidata, 10User-Addshore, 10Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Sqoop wikidata terms tables into hadoop - https://phabricator.wikimedia.org/T239471 (10Addshore) 05Open→03Resolved These things now exist in wmf_raw.wikibase_wbt_item_terms for example =... [16:34:24] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10Ottomata) Sigh, it turns out SRE wants us to not rewrite paths. So if we use path based routing, the app needs to handle wh... [16:36:01] 10Analytics, 10Analytics-Kanban, 10Wikidata, 10User-Addshore, 10Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Sqoop wikidata terms tables into hadoop - https://phabricator.wikimedia.org/T239471 (10Addshore) 05Resolved→03Open [16:36:39] joal ;_; I made a mistake [16:37:17] need to alter 1 more field in 1 of the tables and re-sqoop one of them [16:41:14] (03PS1) 10Addshore: sqoop, wb_terms, use term_full_entity_id not term_entity_id [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554329 (https://phabricator.wikimedia.org/T239471) [16:42:52] 10Analytics-Kanban, 10Better Use Of Data, 10Event-Platform, 10Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (10Ottomata) How about: - events-logging.wikimedia.org/v1/events - events-analytics.wikimedia.org/v1/events ? Is events-logg... [16:44:04] sukhe: o/ [16:44:17] ottomata: webrequest_sampled_128 [16:44:22] ah k [16:44:29] yeah there is a hive table [16:44:32] wmf.webrequest [16:44:39] I am looking to get: top five or ten ASNs for a country split by time 30 days [16:44:41] should have mostly the same data but is unsampled [16:45:03] you can sample though [16:45:16] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling#LanguageManualSampling-SamplingBucketizedTable [16:45:17] ottomata: yeah sample is completely fine! [16:45:42] so you'd just write a SQL query against the wmf.webrequest table that computes what you need [16:45:48] have you used hive yet/before? [16:46:35] ottomata: yes! I have and while I can use simple queries, I was wondering if there is a way to translate a Turnilo dashboard to a Hive query [16:46:44] I was currently inspecting the network traffic haha [16:47:00] (looking at HTTP requests to see if I could get something out of that) [16:47:05] aye. turnilo comes from semi-preaggregated data in druid [16:47:12] and the query language is pretty different [16:47:14] than hive [16:47:19] oh but [16:47:20] ah [16:47:24] sukhe: you could make a superset dashboard [16:47:26] that I didn't know :) [16:47:27] that uses data in druid [16:47:40] are you just trying to make a monthly dashboard? [16:48:07] ottomata: pretty much, I just need to get the top ASNs for a country say in the last 30-day period... 
and I need to run this once a month for a certain list of countries [16:49:17] https://superset.wikimedia.org/superset/welcome [16:49:37] is probably what you want [16:49:44] turnilo is mostly for ad hoc exploration [16:49:50] superset is better for creating dashboards [16:50:50] https://superset.wikimedia.org/druiddatasourcemodelview/list/ [16:50:55] webrequest_sampled_128 is there [16:51:06] i think you can explore querying against that druid datasource [16:51:52] https://bit.ly/2sHSSQX [16:51:52] ottomata: looking at it, thanks very much; I had no idea [16:51:53] maybe takes you there? [16:52:00] ah nope [16:54:21] ottomata: possibly dumb question but how do I get a Druid query from a dashboard? [16:57:49] a-team: i will be late for standup today as we have another leveling metting running late , please start w/o me [16:58:41] sukhe: actually my superset fu is basically nil [16:58:48] Connie Chen is our pro [16:58:49] sukhe: do take a look at superset docs: [16:58:51] what's her IRC.... [16:58:58] oh cchen_ ^^ [16:59:08] sukhe: https://superset.apache.org/ [16:59:33] nuria: thanks :D [17:00:04] sukhe: trying things is fine and docs explain alot [17:00:50] nuria: yeah, I wasn't even aware of this till ottomata told me today. I thought Turnilo was the thing [17:01:26] sukhe: now the way you go about creating a dashboard is not cuting and pasting druid queries, is a bit more friendly, teh dasboard will be on top of teh webrequest_128 datasource [17:01:32] sukhe: there are other examples [17:04:06] nuria: yeah this is great [17:05:56] (03CR) 10Milimetric: Fix sqoop after new tables added (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 (owner: 10Joal) [17:09:31] 10Analytics, 10Product-Analytics: Many special pages missing from pageview_hourly dataset starting on July 23, 2019 - https://phabricator.wikimedia.org/T239672 (10MNeisler) [18:08:43] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging and applying - Will be deployed on the next train" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554329 (https://phabricator.wikimedia.org/T239471) (owner: 10Addshore) [18:08:54] (03CR) 10Joal: Fix sqoop after new tables added (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 (owner: 10Joal) [18:14:18] (03PS3) 10Joal: Fix sqoop after new tables added [analytics/refinery] - 10https://gerrit.wikimedia.org/r/554255 [18:15:02] joal: really interesting [18:15:03] --deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or [18:15:07] on one of the worker machines inside the cluster ("cluster") [18:15:10] (Default: client). [18:15:20] the driver is running on the client with only --master-yarn [18:15:25] elukey: indeed - I was testing spark-submit in deploy-mode cluster [18:18:23] joal: and in refine's logs on analytics1030 I can see hdfs delegation token requested, but not hive ones [18:18:39] meanwhile they should be listed if retrieved [18:18:54] it connects to the metastore though [18:18:59] elukey: the bizarre thing is that I have seen the same on m test job, and my hive query hasn't failed [18:19:19] my spark-query that needed hive metastore to e precise --^ [18:20:12] joal: that is not bizarre I think, since the driver did it with the local credential cache, not the app master in yarn.. is it possible? 
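The distinction at play here is the generic Spark-on-YARN one rather than anything Refine-specific: in the default client deploy mode the driver runs on the submitting host and can use the local kinit ticket cache, while in cluster mode the driver runs inside the YARN application master, which has no ticket cache and only sees whatever delegation tokens (or keytab) were shipped at submit time. A trivial sketch, with <job> as a placeholder:

  # driver on the submitting host, local TGT from kinit visible to it
  spark2-submit --master yarn --deploy-mode client <job>
  # driver inside the YARN AM, only shipped tokens/keytab available
  spark2-submit --master yarn --deploy-mode cluster <job>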
[18:20:15] it would explain it [18:20:49] elukey: nope - I ran a job using deploy-mode cluster, this job successfully did the querying [18:22:15] joal: is there anywhere in the driver's yarn log that it connected to the metastore? [18:24:12] elukey: I have seen that log line from the client, looking in application logs [18:25:59] elukey: it is present in the app-log, yes [18:26:33] elukey: 19/12/03 12:06:02 INFO metastore: Trying to connect to metastore with URI thrift://analytics1030.eqiad.wmnet:9083 [18:26:36] 19/12/03 12:06:02 INFO metastore: Connected to metastore. [18:26:48] Those lines are in the AM of my job succeeding in querying [18:27:00] But no mention of tickets or tokens [18:31:44] so it uses SASL for sure to authenticate [18:32:10] elukey: are our logs the correct ones for hive tokens? [18:35:04] joal: ? [18:35:43] elukey: I wonder if the logs we enable trying to get info actually show us hive tokens (they do show hdfs tokens, but maybe another setting is needed for hive?) [18:36:30] in theory no afaik [18:37:01] I am trying to understand what it means in concrete to have hive in spark's classpath [18:37:12] because that seems the trigger to get delegation tokens [18:37:47] but can't explain the metastore thing [18:42:07] that probably means having hive-site.xml in spark's classpath? [18:42:25] I think so but it is very generic [18:53:55] elukey: I actually think spark can access hive through HDFS-token as long as it only queries: querying is equivalent to reading only, therefore filesystem auth is enough [18:54:35] hm, that is likely true; spark will talk to the metastore to get file locations though, right? [18:55:02] ottomata: yes, and metastore needs HDFS-token to grant access- but reading only doesn't require more than HDFS token [18:55:07] aye [18:55:13] makes sense [18:55:16] at this point yes it is a plausible explanation [18:55:20] trying to triple check my view [18:56:49] with --conf spark.security.credentials.hiveserver2.enabled=true to spark-submit it seems that I don't get anymore the failure [18:57:57] but no traces in the logs about hive tokens [18:59:42] but of course if I remove it everything works [19:00:06] I guess I need to drop the table in hive to verify [19:00:49] elukey: you can also just alter the table and drop a column [19:00:56] instead of dropping whole table if you prefer [19:01:05] yes also good suggestion! [19:01:56] I managed to create a hive table using spark without any cred addition [19:04:07] joal: via jdbc? or via metastore? [19:04:14] elukey: via metastore [19:05:03] elukey: from a stack-overflow answer: The Spark "client" uses a Kerberos ticket to retrieve various tokens, then make these tokens available to the driver and the executors [19:05:33] So tokens are gathered locally, then dispatched to every remote worker (including driver) [19:05:56] joal: but there is no trace of this [19:06:08] this is the issue [19:06:20] I know !!!
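The submit line being tested just above would look roughly like this - a sketch only: the option name is taken verbatim from the log, everything else is a placeholder, and whether the flag really ships a usable HiveServer2 delegation token is exactly what was still being verified in this conversation:

  spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.security.credentials.hiveserver2.enabled=true \
    <job-jar-class-and-options>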
[19:06:32] We have trace for HDFS-token, but not for metastore : [19:06:34] :( [19:07:25] ohh ok I dropped the table and finally the error again [19:07:33] 19/12/03 19:06:38 INFO DataFrameToHive: Connecting to Hive over JDBC at jdbc:hive2://analytics1030.eqiad.wmnet:10000/default;principal=hive/_HOST@WIKIMEDIA [19:07:36] 19/12/03 19:06:38 INFO Utils: Supplied authorities: analytics1030.eqiad.wmnet:10000 [19:07:40] 19/12/03 19:06:38 INFO Utils: Resolved authority: analytics1030.eqiad.wmnet:10000 [19:07:42] 19/12/03 19:06:38 ERROR TSaslTransport: SASL negotiation failure [19:07:45] javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] [19:08:00] principal=hive [19:08:01] ? [19:08:27] it is the same thing that we put in beeline [19:09:08] your code grabs that value from hive-site.xml and adds it to the jdbc string [19:09:44] dumb question then: how do we provide creds for hive principal? [19:10:57] joal: I think it should be read as "expect this principal from the other side of the connection" if I got it correctly [19:11:22] elukey: ok, but a ticket is also needed, for that principal I guess? [19:12:06] joal: well no, a ticket is needed for you to present to the service [19:12:38] so, if I have a ticket, I can say I'm hive - weird :) [19:12:47] no no [19:13:25] elukey: i think that value is not in hive-site.xml [19:13:37] I stil don't get it elukey :) [19:13:58] ottomata: it is in theory, it is the hive's principal [19:14:12] that is the default value of it? [19:14:58] https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2 [19:15:11] do you think we also need to set hive.server2.authentication=KERBEROS [19:15:12] ? [19:15:16] Ah! principal = service name [19:15:19] ok [19:16:16] ottomata: it is set [19:16:38] elukey, ottomata : for reference - https://github.com/eBay/Spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L157 (and after) [19:16:39] joal: yes it is weird but the principal is the target service's principal [19:16:53] elukey: but in the jdbc connetion uri? [19:17:10] elukey: I need to write that down to make sure I don't ask you again (probably the n-th time I do already ... sorry) [19:17:38] ottomata: ahhh sorry! [19:18:24] ottomata: is it needed? I don't find it [19:19:06] joal: please keep asking, I am learning things as well, during all this time I have probably taken things for granted since they were working without questioning them in dept [19:19:35] ottomata: for beeline we have [19:19:36] Connecting to jdbc:hive2://analytics1030.eqiad.wmnet:10000/default;principal=hive/_HOST@WIKIMEDIA [19:21:31] elukey: i don't know! [19:21:47] https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2 seems to say maybe? [19:21:56] hm [19:22:20] hm, maybe that is just server side [19:28:33] ok I am declaring myself defeated for today, mediawiki and spark killed my brain :D [19:28:39] I'll restart tomorrow morning [19:28:44] thanks a lot for the brain bounce! [19:28:54] o/ [19:32:56] o/ [19:40:33] ottomata: still here with a minute? [19:41:15] mforns: if you are there can we talk about the pageview anomaliesd? [19:42:04] milimetric: also, we need to deploy aqs to flip to the new snapshot right? [19:42:28] nuria: done this morning [19:42:39] ok ! 
cc milimetric [19:42:44] thnak you joal [19:42:47] *thank [19:45:41] (03CR) 10Nuria: Add data quality metric: traffic variations per country (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: 10Mforns) [19:54:49] nuria, yea makes sense, sth like the other doc page we made for user agent entropy, right? [19:55:15] nuria, was there sth else you wanted to discuss? [20:00:42] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10ArielGlenn) >>! In T182351#5709123, @leila wrote: > @tizianopiccardi thanks for the update and great to see that you're there. :) Please make sure you advertise for it on wiki-resea... [20:07:11] 10Analytics, 10User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (10elukey) Faidon improved his code in https://gist.github.com/paravoid/3419e0b5ae1f24b6ea21906a142f2f47, the python irc prototype is now able to do everything that is should be ne... [20:32:05] ottomata: ping? [20:32:19] 10Analytics, 10Datasets-Archiving, 10Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (10tizianopiccardi) Yes, we used MediaWiki. Technically we extracted the function used for the parsing and we intercepted the database calls to return the version of the templates avai... [20:33:10] joal: o/ [20:33:16] heya [20:33:38] wassssuuppp? [20:34:03] I'm trying to reproduce the jdbc error and I'm stuck on getting access to files in spark-shell - Any idea? [20:35:26] from where joal? [20:36:28] I'm on an-tool1006, launching a spark-shell with --files hive-site.xml - But then hadoopConf doesn't have it :( [20:37:31] hm [20:37:54] I vaguely recall files not being accessible in shell-mode, but couldn't find doc [20:38:29] hmm that doesn't seem right [20:39:38] joal: what are you trying to do? load hive-site.xml manually? [20:39:57] reproduce what happens in refine, hive-jdbc only [20:40:01] aye [20:40:05] meaning: loading conf [20:40:11] so [20:40:14] on an-tool1006 [20:40:18] i just did [20:40:22] spark2-shell (no --files) [20:40:39] OH sorry, yeah, got null [20:41:36] ah i know [20:41:56] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Refine#Running_Refine_in_local_or_yarn_client_mode [20:42:55] ? [20:42:58] hmmmm why is there no --files there... [20:44:24] hm i thnk you shouldn't need --files in local mode [20:44:27] or yarn client mode [20:44:27] ottomata: hardcoding conf, I managed to have something working (failing to be precise), but I can't get access to conf [20:44:30] for hive-site.xml [20:44:46] but, i get null in hadoopConf too.. [20:45:05] ottomata: hadoopConf contains all hadoop conf except hive [20:46:37] hmmm [20:49:07] yeah shouldn't need joal, hive-site.xml is in symlinked spark conf already [20:49:10] trying too tho [20:52:45] ah [20:52:46] joal [20:52:54] you need to interact with hive before the configs are loaded [20:53:02] ahhhh [20:53:04] pf [20:53:06] ok :) [21:14:10] mforns: there still? [21:14:23] nuria, yes [21:14:29] mforns: bc? [21:14:33] sure [21:14:39] mforns: to talk about pageview anomalies? [21:33:48] mforns: internet went kaput but no worries! thank you for your work [21:34:49] nuria, I connected secondary internet [21:35:09] mforns: no worries i think we talked about everything , unless you had anything else? [21:38:26] nuria, using second internet now wanna continue? 
[21:38:51] 10Analytics-Cluster, 10Analytics-Kanban: Sanitize pageview_hourly - subtasked {mole} - https://phabricator.wikimedia.org/T114675 (10Aklapper) [21:38:57] mforns: nah, it is late for ya, and I think we got everything, we are good [21:39:07] ok, cool [22:10:44] I tried to manually run jdbc only code i [22:10:59] in spark - no chance - Will try again tomorrow [22:38:48] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10mforns) After discussing with @nettrom_WMF we concluded that the solution described above is not a fit. The raw (unsanitized) data does not have the appro... [22:42:27] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (10Nuria) >that would remove the data specific to the affected growth schemas from the event_sanitized database after 270 days (sliding window). This would r... [22:50:52] hey, is there I can about jobs? [22:54:47] 10Analytics, 10Operations, 10ops-eqiad: analytics1057's BBU is faulty - https://phabricator.wikimedia.org/T239045 (10Jclark-ctr) @elukey unsure if this is same bbu it is diffrent models. 720xd vs 730xd [22:55:57] I meant is there a way that I query job in the data lake, like hadoop [22:56:30] *jobs [22:56:36] ** mediawiki jobs
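The last question goes unanswered in this log. For what it's worth, one place to start looking is the refined EventBus job events in Hive - a heavily hedged sketch, since the layout assumed below (one table per job type under the event database, partitioned by year/month/day) is a guess to verify rather than a known fact:

  # from a stat/notebook host, with a Kerberos ticket
  beeline -e "SHOW TABLES IN event LIKE 'mediawiki_job*';"
  # then, for one (hypothetical) job-type table found above:
  beeline -e "SELECT meta.dt, meta.domain FROM event.mediawiki_job_refreshlinks WHERE year=2019 AND month=12 AND day=3 LIMIT 10;"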