[00:47:11] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10Prtksxna) Same as @atgo, I am able to see the outlink actions in the individual report, but the [[ https://piwik.wikimedia.org/index.php?module=CoreHome&action=index&idSite=18&period=day...
[03:10:55] <icinga-wm>	 PROBLEM - Check the last execution of refinery-import-page-history-dumps on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-import-page-history-dumps
[05:47:20] <wikibugs>	 10Analytics, 10Product-Analytics: MobileWebSectionUsage schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209049 (10Tbayer) Discussed with @ovasileva today - we are going to remove the session IDs and keep the page names. I will submit a patch soon.
[05:48:44] <wikibugs>	 10Analytics, 10Readers-Web-Backlog: Print schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209050 (10Tbayer) Discussed with @ovasileva today - we are going to remove the page IDs and keep the session IDs. I will submit a patch soon.
[05:50:18] <wikibugs>	 10Analytics: ReadingDepth schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209051 (10Tbayer) a:03Tbayer Still need to look into this with @ovasileva and possibly @Groceryheist .
[07:18:35] <elukey>	 morning!
[07:18:45] <elukey>	 so refinery-import-page-history-dumps seems complaining about the parameters
[07:19:09] <elukey>	 it says: Usage: etc..
[07:19:15] <elukey>	 but afaics it seems correct
[07:19:52] <elukey>	 ahhhh the log gile
[07:19:54] <elukey>	 *file
[07:19:57] <elukey>	 is not there
[07:44:08] <elukey>	 so now it says
[07:44:09] <elukey>	 WARNING Input-base /mnt/data/xmldatadumps/public doesn't contain a project folder for
[07:44:19] <elukey>	 that makes sense, there is not /mnt/data
[07:44:24] <icinga-wm>	 RECOVERY - Check the last execution of refinery-import-page-history-dumps on an-coord1001 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps
[07:46:58] <elukey>	 ah snap I am reading https://phabricator.wikimedia.org/T202489
[07:47:08] <elukey>	 so this needs to be moved to stat1007 I suppose
[08:01:38] <elukey>	 joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/473161 :)
[08:01:55] <elukey>	 ready to move import_wikitext_dumps to stat1007
[08:08:13] <joal>	 Morning elukey - Thanks for the patch :)
[08:08:26] <joal>	 elukey: What did I do wrong about the logging file?
[08:09:26] <joal>	 Ah elukey - Wrong variable name :(
[08:09:29] <joal>	 sorry for that
[08:11:54] <elukey>	 nah all good, we need to move it anyway :)
[08:13:02] <elukey>	 joal: bonjour! :)
[08:13:16] <elukey>	 ok to move the timer to stat1007? (and start it)
[08:13:20] <elukey>	 I'll clean up on an-coord1001
[08:13:28] <elukey>	 then afaics there are some spark refine issues :(
[08:14:20] <joal>	 Ah crappy crap
[08:14:27] <joal>	 Inndeed let's move the time
[08:14:29] <joal>	 timer sorry
[08:16:55] <elukey>	 merging!
[08:28:01] <wikibugs>	 10Analytics, 10Analytics-Kanban: Long term solution for sqooping comments - https://phabricator.wikimedia.org/T209178 (10JAllemandou)
[08:28:04] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10JAllemandou)
[08:31:44] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10JAllemandou)  ping @Bstorm for the above comment, as she's the one having setup views. Thanks
[08:31:52] <joal>	 elukey: can we manually try a run of the wikitext-importer?
[08:32:07] <wikibugs>	 10Analytics, 10Analytics-Kanban: Long term solution for sqooping comments - https://phabricator.wikimedia.org/T209178 (10JAllemandou)
[08:33:05] <wikibugs>	 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965 (10JAllemandou) Ping @Nuria  - Can we close this parent task?
[08:36:21] <elukey>	 joal: I just tried
[08:36:28] <elukey>	 but /usr/bin/python3: can't open file '/srv/deployment/analytics/refinery/bin/import-mediawiki-dumps': [Errno 2] No such file or directory
[08:36:48] <joal>	 WAT?
[08:37:05] <elukey>	 yeah :D
[08:37:18] <elukey>	 refinery is there
[08:38:17] <elukey>	 let's bring a fresh copy of refinery
[08:38:30] <elukey>	 puppet running
[08:38:59] <joal>	 elukey: file present on stat1004
[08:41:31] <elukey>	 all right now it is there
[08:41:44] <elukey>	 Nov 13 08:41:36 stat1007 systemd[1]: Started Schedules daily an incremental import of the current month of page-history xmldumps into Hadoop.
[08:41:54] <joal>	 \o/ !!
[08:42:07] <joal>	 looking at logs
[08:42:32] * joal is hugely happy
[08:42:51] * joal sends hugs to elukey for the help :)
[08:44:06] <elukey>	 \o/
[08:48:28] <elukey>	 the eventlogging refine spark issue seems wierd
[08:48:33] <elukey>	 it is now complaining about Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
[08:49:39] <joal>	 hm
[08:52:32] <elukey>	 it seems another jar issue
[08:55:04] <elukey>	 first one that failed was https://yarn.wikimedia.org/cluster/app/application_1542030691525_0044
[08:55:15] <elukey>	 yesterday at around 15:52 UTC
[08:59:44] <joal>	 elukey: must be related to the hive-jar we pass in to spark-refine
[09:00:00] <elukey>	 yeah
[09:00:09] <elukey>	 it is the only one complaining though
[09:00:25] <joal>	 elukey: possibly other schemas didn't have to uswe it
[09:00:35] <joal>	 elukey: seems related to applying changes to schemas
[09:01:10] <joal>	 elukey: --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar
[09:01:27] <joal>	 elukey: and IIRC we have updated hive-jdbc yesterday
[09:01:44] <elukey>	 /usr/lib/hive/lib/hive-jdbc.jar -> hive-jdbc-1.1.0-cdh5.15.0.jar
[09:02:34] <joal>	 elukey: everything seems correctly linked :(
[09:02:54] <elukey>	 I am wondering if now that class is into another har
[09:02:55] <elukey>	 *har
[09:02:58] <elukey>	 aaaaaahhh 
[09:03:00] <elukey>	 *jar
[09:05:46] <joal>	 elukey: seems very probably
[09:06:03] <joal>	 elukey: hive-common
[09:06:33] <joal>	 elukey: can we try to add hive-common.jar to the refinement script?
[09:06:42] <joal>	 elukey: /usr/lib/hive/lib/hive-common.jar
[09:07:51] <elukey>	 elukey@an-coord1001:~$ jar tvf /usr/lib/hive/lib/hive-common.jar   | grep -i authutil
[09:07:54] <elukey>	 Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 5869 Thu May 24 04:20:24 UTC 2018 org/apache/hadoop/hive/common/auth/HiveAuthUtils.class
[09:08:07] <joal>	 indeed
[09:09:37] <elukey>	 added to /usr/local/bin/refine_eventlogging_analytics
[09:09:49] <joal>	 ok let's check if it changes something
[09:09:59] <elukey>	 I can manually run it
[09:10:02] <joal>	 +1
[09:11:15] <elukey>	 it's running
[09:11:23] <elukey>	 while we wait, did you check hue?
[09:11:30] <joal>	 I have not
[09:11:34] <joal>	 anything wrong?
[09:11:51] <elukey>	 no no.. yesterday I force a (sort of) db upgrade
[09:12:06] <elukey>	 that seemed to have done a positive effect
[09:12:18] <elukey>	 but it is still super slow
[09:12:40] <elukey>	 mmmm not that much
[09:12:50] <joal>	 yes - first load is slow, then ok
[09:12:51] <elukey>	 joal: so basically you can force the Hue 3 view
[09:12:58] <joal>	 yup I have seen that
[09:14:46] <joal>	 elukey: the "scheduler" app fails dramatically for me
[09:15:11] <joal>	 elukey: I think the easiest is to use the old version for oozie
[09:15:38] <elukey>	          diagnostics: User class threw exception: java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
[09:15:41] <elukey>	         at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:195)
[09:15:44] <elukey>	 :D
[09:16:02] <joal>	 refine --^ ?
[09:16:14] <elukey>	 yeah
[09:16:14] <elukey>	 https://issues.apache.org/jira/browse/SPARK-14492
[09:16:46] <joal>	 But refine is using spark2
[09:16:49] <joal>	 :S
[09:17:22] <elukey>	 so that Hive upgrade was on paper 1.1 (same version)
[09:17:39] <elukey>	 but I suspect that they have backported stuff
[09:19:48] <joal>	 elukey: trying to rebuild a refinery-source version with correct packages and see
[09:33:32] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10daniel) >>! In T209031#4732318, @Nuria wrote: > @Krenair: we are looking how to best import the public dataset from labs, we h...
[09:46:59] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10Krenair) >>! In T209031#4740771, @JAllemandou wrote: >   - **Access to underlying tables** - We could query the underlying tab...
[09:54:29] <joal>	 elukey: I'm gonna provide an updated refinery-job jar on an-ccord1001, to test if it changes anything (I don't expect so much, but it's worth trying)
[09:56:05] <elukey>	 ack
[10:06:38] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10JAllemandou) >>! In T209031#4741603, @Krenair wrote: >>>! In T209031#4740771, @JAllemandou wrote: >>   - **Access to underlyin...
[10:08:30] <joal>	 elukey: jar available on an-coord1001: /home/joal/code/refinery-source/refinery-job/target/refinery-job-0.0.81-SNAPSHOT.jar
[11:02:20] <elukey>	 joal: have you tested it by any chance?
[11:32:08] * elukey lunch + errand!
[12:09:48] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10daniel) >>! In T209031#4741603, @Krenair wrote: >>>! In T209031#4741574, @daniel wrote: >> In any case, I'm wondering: doen't...
[13:36:35] <milimetric>	 joal: can I jump in to clarify what we're all talking about on that thread?  Or at least try :)  It looks a bit disjointed now
[13:36:58] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10JAllemandou) @daniel : We were using sanitized comment view (named `comment`), not the compat one (named `revision_compat`). W...
[13:36:59] <joal>	 milimetric: please please :)
[13:37:09] <joal>	 milimetric: just commented, to give precisions
[13:37:29] <joal>	 milimetric: moar, merrier :)
[13:37:38] <milimetric>	 joal: :) just saw your comment, that's the first thing I was going to say too
[13:38:23] <milimetric>	 ok, I'll leave that for now, and follow up if there's still confusion
[13:38:25] <joal>	 milimetric: I try to answer any question asked, as I think it helps readers to understand better where the issues are
[13:39:26] <joal>	 milimetric: I managed to have a quick chat with Manu (the DBA) - He's coming back from holidays and has obviously plenty on his plate - Not sure they'll be able to chime in :(
[13:39:46] <milimetric>	 ok, makes sense, I thought as much
[13:40:44] <joal>	 milimetric: the other thing this teaches me is that the sanitization from prod doesn't seem that complicated - While it represent code duplication and therefore error-prone stuff, I really wonder if it wouldn't be easiest to sanitize in hadoop
[13:40:46] <milimetric>	 joal: do you think the compat views join to the underlying comment table or the comment view?
[13:40:58] * joal hides very far
[13:41:28] <joal>	 milimetric: I'm sure it does: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/labs/db/views/maintain-views.yaml#658
[13:41:28] <milimetric>	 lol, yeah, you may be right, the second misunderstanding might be what sanitization is actually happening
[13:53:31] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10daniel) >>! In T209031#4742129, @JAllemandou wrote: > @daniel : We were using sanitized comment view (named `comment`), not th...
[13:54:40] <elukey>	 joal: do you have some time to test spark?
[13:54:51] <joal>	 elukey: currently doing so
[13:54:55] <elukey>	 ah!
[13:54:56] <elukey>	 :)
[13:55:00] <joal>	 ;)
[13:55:02] <elukey>	 anything that I can help with
[13:55:03] <elukey>	 ?
[13:55:17] <joal>	 elukey: also, moritzm dropped a handful of logs on druid-labs
[13:55:36] <joal>	 elukey: not sure what is implied, but that'd be great if we had an automated rule doing so
[13:55:57] <moritzm>	 deleted the Oct files from /var/log/druid/ as the root part was full, the entries from the last week of Oct were 500M per day 
[13:58:34] <elukey>	 ah lovely
[13:58:36] <elukey>	 thanks!
[13:59:41] <elukey>	 joal: do you have some sql query examples for druid that we can show to Moritz?
[14:02:41] <elukey>	 I am pretty sure that druid self rotates its logs
[14:03:44] <moritzm>	 he sent me some before
[14:03:56] <moritzm>	 https://phabricator.wikimedia.org/P7798
[14:04:46] <moritzm>	 there's no logrotate rule for druid, only for pivot (on d-1), but maybe it's rotating logs internally
[14:07:02] <elukey>	 super! Yes I think it is self rotating logs, will check later about how to tune them in labs
[14:17:18] <mforns>	 heyaa
[14:18:10] <joal>	 Hi mforns :)
[14:55:19] <ottomata>	 o/
[14:55:32] <ottomata>	 hiya has anyone looked into the refined datasets missing email yet?
[14:55:37] <ottomata>	 i'm about to just wanted to check
[14:55:39] <ottomata>	 elukey/joal?
[14:56:56] <elukey>	 ottomata: o/ joal is looking into it
[14:57:19] <elukey>	 there seems to be a class that moved to another jar (hive-commons)
[14:57:33] <elukey>	 so we tried to add it to the spark conf manually
[14:57:42] <elukey>	 and then another error popped up
[14:57:47] <elukey>	 lemme copy/paste them in here
[14:57:54] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Cmjohnson) @ottomata Lets go with cloudvirtan1xxx.
[14:58:19] <elukey>	 so org/apache/hadoop/hive/common/auth/HiveAuthUtils.class is now in /usr/lib/hive/lib/hive-common.jar
[14:58:31] <elukey>	 that seems one of spark's complains in the logs 
[14:58:53] <elukey>	 but adding the jar to the spark conf is not enough since it returns
[14:59:03] <elukey>	 diagnostics: User class threw exception: java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
[14:59:17] <elukey>	 and we stopped in here a couple of hours ago
[14:59:19] <ottomata>	 ok that's what i'm looking at now
[14:59:41] <elukey>	 IIUC joseph was working on it but I am not sure if he found the issue yet
[15:06:42] <joal>	 elukey, ottomata - Issue not found :(
[15:06:58] <ottomata>	 joal assume you saw this? https://issues.apache.org/jira/browse/SPARK-14492
[15:07:44] <joal>	 ottomata: I have - But how come it worked previsouly then?
[15:08:20] <ottomata>	 good q
[15:08:36] <ottomata>	 do we set spark.sql.hive.metastore.version ? 
[15:08:40] <ottomata>	 (q for myself, looking)
[15:11:36] <ottomata>	 joal:  so to fix the first errors
[15:11:48] <ottomata>	 you guys added /usr/lib/hive/lib/hive-common.jar to --conf spark.driver.extraClassPath ?
[15:11:51] <ottomata>	 what did that replace?
[15:12:11] <elukey>	 I just added it, didn't replace anything (tried a manual run)
[15:12:21] <joal>	 ottomata: first error is java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/auth/HiveAuthUtils
[15:12:30] <ottomata>	 do you know what jar HiveAuthUtils.class was in before?
[15:12:34] <joal>	 ottomata: And that class is made available in hive-common.jar
[15:12:49] <joal>	 no
[15:12:52] <ottomata>	 k
[15:13:07] <joal>	 ottomata: Now when using hive-common, we hit the bug you describe
[15:13:31] <joal>	 ottomata: I even wonder if the class was used before
[15:16:22] <joal>	 Need to drop to catch Naé from the creche - see you at standup
[15:21:14] <ottomata>	 k
[15:30:27] <ottomata>	 oh, btw spark 2.4.0 is out! https://spark.apache.org/releases/spark-release-2-4-0.html
[15:30:41] <elukey>	 ottomata: let's deprecate spark 1.6!
[15:30:43] <elukey>	 :D
[15:30:47] <ottomata>	 agree!
[15:35:22] <ottomata>	 joal:  i have no idea why this wasn't happening before!
[15:35:28] <ottomata>	 i don't see how it wasn't!
[15:35:49] <elukey>	 ottomata: the other times you did the upgrades and the hadoop cluster was pleased, this time I did it
[15:35:52] <ottomata>	 the code really does look like it trieds to access a 'METASTORE_CLIENT_SOCKET_LIFETIME' independent of what the configuration of spark.sql.hive.metastore.version is
[15:35:57] <ottomata>	 haha
[15:35:57] <elukey>	 it seems the most plausible explanation
[15:36:03] <ottomata>	 but the hive version didn't even change!
[15:36:04] <ottomata>	 HMMM
[15:36:06] <ottomata>	 HMMMMM
[15:36:10] <ottomata>	 HMMM
[15:36:16] <ottomata>	 but we did add this jar
[15:36:17] <ottomata>	 hmmm
[15:36:23] <ottomata>	 oh hmmm
[15:36:27] <elukey>	 maybe the cdh people backported some stuff from 1.2 ?
[15:36:29] <ottomata>	 suspicsion...
[15:36:31] <ottomata>	 hang on
[15:38:26] <ottomata>	 elukey:  isn't  already in the  --conf spark.driver.extraClassPath we use?
[15:38:42] <ottomata>	 hm ya it is:
[15:38:43] <ottomata>	 --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/hive/lib/hive-common.jar
[15:39:01] <ottomata>	 i want to repro the error you were seeing about missing HiveAuthUtils class
[15:39:03] <ottomata>	 how do I do that?
[15:39:36] <elukey>	 just run eventlogging refine
[15:39:40] <elukey>	 it is in its logs
[15:45:37] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Research: Cannot connect to Spark with Jupyter notebook on stat1007 - https://phabricator.wikimedia.org/T208896 (10fdans) Hey Baho! You may have been following out of date documentation for jupyter notebooks. Right now standard procedure is to use one of the notebook machin...
[15:54:08] <joal>	 back
[15:57:30] <ottomata>	 its def a class loading order problem
[15:57:35] <ottomata>	 something changed in the order classes were loaded
[15:57:50] <ottomata>	 spark itself has a version of HiveConf (1.2.1)
[15:57:53] <ottomata>	 which has the opt.
[15:58:00] <ottomata>	 but hive-common also has HiveConf, which doesn't have that field 
[15:58:20] <ottomata>	 so, we need spark to use its own version oh hmmm
[15:58:25] <ottomata>	 maybe i can use spark hive-common...
[15:58:27] <ottomata>	 trying
[15:59:07] <joal>	 ottomata: The thing I don't get is - Why does the curent of refine need hive-common if spark brings one
[15:59:29] <ottomata>	 i think for HiveAuthUtils...actually i do'nt see hive-common at all in the spark lib jars
[15:59:35] <wikibugs>	 (03CR) 10Mforns: [C: 031] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/471722 (https://phabricator.wikimedia.org/T164020) (owner: 10Joal)
[15:59:52] <ottomata>	 gotta find out where it is getting HiveCOnf
[16:00:12] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Research: Cannot connect to Spark with Jupyter notebook on stat1007 - https://phabricator.wikimedia.org/T208896 (10bmansurov) @fdans thanks for looking into this issue. I need to use spark because my work is a little heavy on computation side and would adversely affect othe...
[16:00:38] <ottomata>	 OH
[16:00:52] <ottomata>	 in hive 1.2.1 HiveConf movied into hive-exec.jar
[16:00:58] <joal>	 Ahhh !
[16:01:08] <joal>	 So we'd need hive-exec?
[16:01:23] <ottomata>	 we need hive-exec from spark, which is has
[16:01:31] <ottomata>	 the problem is for some reason we need HiveAuthUtils
[16:01:34] <ottomata>	 which is in hive-common
[16:01:41] <ottomata>	 which in our 1.1.0 version ALSO has HiveConf
[16:01:49] <ottomata>	 so spark is using our 1.1.0 HiveConf
[16:01:57] <ottomata>	 which doesn't have the METASTORE_CLIENT_SOCKET_LIFETIME field
[16:02:01] <ottomata>	 and is erroring
[16:02:43] <joal>	 right
[16:02:47] <joal>	 Mwarf :(
[16:13:03] <wikibugs>	 (03CR) 10Ottomata: [C: 031] Add bin/refinery-drop-older-than [analytics/refinery] - 10https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836) (owner: 10Mforns)
[16:17:50] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata) Ok!
[16:19:28] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Ottomata)
[16:20:11] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata) @Cmjohnson  I updated {T207194} to reflect the new naming.  Please proceed and then assign to Cloud VPS folks f...
[16:57:18] <nuria>	 fdans: (cc ottomata ) if refine process is temporarily stalled , let's send e-mail to analytics@
[16:58:06] <ottomata>	 k hold on!  am close i feel
[17:04:24] <ottomata>	 rawr ok not close.
[17:04:38] <ottomata>	 joal:  i think the best thing to do would be for me to fix up this patch and get it merged https://github.com/apache/spark/pull/21012
[17:04:43] <nuria>	 ottomata, joal: what i do not understand: why did we not get this alarm yesterday? me no compredou, things were refined ok yesterday weren't they?
[17:04:45] <ottomata>	 our use of hive jdbc directly
[17:05:10] <ottomata>	 nuria:  it is only going to fire for things that require an alter table to add a column
[17:05:52] <nuria>	 ottomata: oooooohhh, and how did we figued that out ?
[17:06:20] <ottomata>	 our cdh hive jars are only used by refine when an alter needs to happen
[17:06:25] <ottomata>	 otherwise everything is via spark
[17:06:31] <ottomata>	 i do not know why this didn't break before the upgrade
[17:06:45] <ottomata>	 likely some subtle change somewhere  in the way classpaths are loaded 
[17:11:15] <ottomata>	 nuria:  as far as I can tell the only tables affected are the ones that were sent in that email
[17:11:22] <ottomata>	 SearchSatisfaction and TestSearchSatisfaction2
[17:11:24] <ottomata>	 so far anyway
[17:11:59] <ottomata>	 looks like there was some change in the data/schema for those things starting around 09:00 utc yesterday
[17:12:29] <ottomata>	 the monitor refine job only runs once a day
[17:12:33] <ottomata>	 which is why we've only seen the one email
[17:12:44] <nuria>	 ottomata:  on meeting but this is happen before, those schemas have not modified in a while so why would a column addition be needed? https://meta.wikimedia.org/w/index.php?title=Schema:SearchSatisfaction&action=history
[17:12:59] <ottomata>	 nuria:  it doesn't necessarily have to do with schemas at all
[17:13:02] <ottomata>	 refine doesn't look at schema
[17:13:12] <ottomata>	 it has to do with what is in data for those hours, vs what is already in hive table
[17:13:29] <nuria>	 ottomata: right it could be data for 1 column that was not present before, yes
[17:13:29] <ottomata>	 maybe they just started sending some field that has always been in the schema but wasn't sent before
[17:13:32] <ottomata>	 right
[17:13:53] <ottomata>	 i betcha the client that sends this is the same for both of those schemas (based on the way they are named)
[17:14:00] <ottomata>	 and they just made  some client change to send some new field
[17:14:03] <nuria>	 ottomata: ok, i would triple check  that after meeting
[17:14:07] <ottomata>	 which is why both of those schemas reported at the same time
[17:14:20] <ottomata>	 ok nuria, i just ran refine monitor
[17:14:27] <ottomata>	 there are more schemas missing data 
[17:14:42] <ottomata>	 enough to warrant an email
[17:14:43] <ottomata>	 sending
[17:20:02] <elukey>	 fdans: shall I merge your puppet patch?
[17:20:14] <fdans>	 yessss
[17:20:18] <fdans>	 thank you elukey 
[17:24:21] <ottomata>	 hm actually there are quite a few tables affected
[17:25:03] <elukey>	                        
[17:27:36] <elukey>	 (---^ was my cat)
[17:29:19] <ottomata>	 joal:  yarrrrr, i'm really not sure how to solve this!
[17:29:27] <ottomata>	 the problem is our use of JDBC -> hive
[17:29:35] <ottomata>	 its clashing with spark hive versions
[17:30:04] <ottomata>	 we could shell out to a hive command or JDBC thing so as to not have jar versions clash
[17:30:07] <ottomata>	 but that gets uglier and uglier
[17:30:09] <ottomata>	 yar
[17:33:26] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (10elukey) @mforns created https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/prometheus-varnishkafka-exporter
[17:35:42] <ottomata>	 joal:  if you are around...brain bounce plz!
[17:42:56] <ottomata>	 (brb)
[17:47:38] <wikibugs>	 (03PS1) 10Milimetric: [WIP] DO NOT MERGE testing speed of sqoop against labs with different approaches [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473256
[17:52:00] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10Bstorm)  >   - **Specialized views** -  Views for comments from each of revision, archive, and logging, separately.  We have t...
[18:00:24] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10Nuria) Once data comes in you can see how you instrumentation is working @Prtksxna
[18:00:35] <joal>	 Hey ottomata - here now, after staff?
[18:02:28] <ottomata>	 ya
[18:02:50] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10Anomie) >>! In T209031#4743469, @Bstorm wrote: > Materialized views in mariadb requires a plugin that basically is adding some...
[18:03:09] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10atgo) @Prtksxna I just adjusted the time range and it works: https://piwik.wikimedia.org/index.php?module=CoreHome&action=index&idSite=18&period=day&date=yesterday#?idSite=18&period=mont...
[18:04:34] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10Bstorm) Note: I'm not done reading back yet--but yeah, that's what I was thinking of.  There's a lot here.
[18:33:30] <milimetric>	 elukey: it's late for you so I feel bad, but I am interested
[18:33:36] <milimetric>	 I did systemctl list-timers --all on an-coord1001
[18:33:44] <milimetric>	 and so I see the refinery-sqoop-mediawiki.timer job
[18:33:49] <milimetric>	 just wanted to see what command it was running
[18:34:33] <elukey>	 (meeting but I'll be available in max 1h! If you have patience I'll explain! :)
[18:34:55] <elukey>	 but the gist of it is that there is a .service unit that the .timer calls
[18:35:04] <elukey>	 you can check it via systemctl cat refinery-sqoop-mediawiki.timer
[18:35:11] <elukey>	 systemctl cat refinery-sqoop-mediawiki.service
[18:35:22] <elukey>	 this --^ has an ExecStart= line
[18:35:40] <elukey>	 that calls a script (we did it to not mess with escaping etc..)
[18:35:51] <elukey>	 in that script there are all the parameters (it is managed by puppet)
[18:37:46] <milimetric>	 aha! thank you, got it
[18:37:48] <milimetric>	 ExecStart=/usr/local/bin/refinery-sqoop-mediawiki
[18:37:55] <milimetric>	 sudo -u hdfs cat /usr/local/bin/refinery-sqoop-mediawiki
[18:38:27] <milimetric>	 the last part about puppet managing the file was what I didn't understand, thought it was pointing to something fake to trick me :)
[18:41:11] <milimetric>	 added to docs for clueless people like me (probably just future me): https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FTeam%2FOncall&type=revision&diff=1808485&oldid=1808074
[18:51:14] <elukey>	 milimetric: I am looking forward for suggestions/etc.. about how to improve the whole thing, is pretty new for me too
[18:52:00] <elukey>	 but the systemctl list-timers command always make me feel happier than a crontab :D
[18:52:06] <milimetric>	 no it makes lots of sense, I might add a quick example at the top for getting this specific command I just did, but only if someone else gets confused again
[18:52:15] <milimetric>	 yeah, it's great, much neater
[18:53:11] <milimetric>	 joal: 12 minutes for frwiki.logging, 1 minute 38 seconds for simplewiki.logging
[18:53:18] <milimetric>	 seems fast?
[18:53:27] <joal>	 Fast enough I think :)
[18:53:40] <joal>	 milimetric: can you go for a full sqoop of logging table onl please?
[18:53:55] <milimetric>	 doing
[18:54:06] <joal>	 milimetric: I think it'll take a couple hours, and that's fine
[18:54:27] <joal>	 Wow great - This makes me fell we have an easy and workable solution :)
[18:54:34] <joal>	 Might even work for revision
[18:55:39] <milimetric>	 I can do revision for these test wikis and I'll do all wikis for just logging
[18:56:08] <joal>	 Super fine milimetric 
[18:57:11] <joal>	 milimetric: updating the task
[18:58:04] <milimetric>	 man, that FUSE mount is working like 10% of the time I try to use it...
[18:58:31] <joal>	 milimetric: it re-fuse ?
[18:58:41] * joal hides again
[18:58:44] <milimetric>	 :) yes, it refuses to work
[18:58:45] <milimetric>	 lol
[19:06:46] * elukey off!
[19:06:52] <joal>	 Bye elukey :)
[19:07:06] <elukey>	 fuse mount on stat1004 was dead
[19:07:11] <elukey>	 are you guys working on it?
[19:07:25] <elukey>	 (I just force a remound)
[19:07:29] <elukey>	 *remount
[19:07:56] <elukey>	 anyway, not a big deal, and we'll need to change it in the near future
[19:08:03] <elukey>	 (hopefully for something better!)
[19:09:57] <wikibugs>	 10Analytics-Kanban, 10MediaWiki-General-or-Unknown, 10Tool-Pageviews: Check abnormal pageviews for some pages on itwiki - https://phabricator.wikimedia.org/T209404 (10Daimona)
[19:18:24] <wikibugs>	 10Analytics-Kanban: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (10Ottomata)
[19:19:06] <wikibugs>	 (03PS1) 10Ottomata: Add Hive 1.1.0 jars from CDH 5.10.0 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473268 (https://phabricator.wikimedia.org/T209407)
[19:19:39] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (10Ottomata) Ok:  When ALTERing Hive tables, DataFrameToHive uses a manual JDBC connection to Hive, rather than Spark SQL.  This is a work around for https://issues.ap...
[19:23:58] <wikibugs>	 (03PS2) 10Ottomata: Add Hive 1.1.0 jars from CDH 5.10.0 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473268 (https://phabricator.wikimedia.org/T209407)
[19:24:26] <wikibugs>	 (03CR) 10Ottomata: [V: 032 C: 032] Add Hive 1.1.0 jars from CDH 5.10.0 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473268 (https://phabricator.wikimedia.org/T209407) (owner: 10Ottomata)
[19:45:05] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (10Cmjohnson) a:05Cmjohnson>03RobH Robh can you do the installs please.
[19:50:31] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Cmjohnson) @ottomata can these go in any row or does it need to be row B?
[19:51:31] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Ottomata) IIUC it has to be row B for them to be used as Cloud Virts.  @Andrew to confirm.  If they can go any row, then they should be spread out as even...
[19:52:37] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10JAllemandou) Thanks again a lot @Anomie, @daniel, @Bstorm for having chimed in, your questions and suggestions have helped us...
[19:53:41] <milimetric>	 elukey: I tried to copy something via fuse and it died/was dead.  You can almost disable it as far as I'm concerned, I wonder if there are any jobs using it
[19:54:09] <milimetric>	 the time it saves me from going directly to hdfs it also costs me in hung processes
[20:01:47] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10chasemp) I don't want to muddy the waters as I have not been involved here :). But worth noting there is more than the views a...
[20:09:51] <ottomata>	 a-team i think we fixed Refine, am re-refining failed stuff over the last 48 hours now
[20:10:03] <mforns>	 ack
[20:10:04] <joal>	 \o/ ottomata :)
[20:10:08] <ottomata>	 https://phabricator.wikimedia.org/T209407
[20:10:10] <ottomata>	 info there ^
[20:11:54] <wikibugs>	 10Analytics, 10Operations, 10Wikimedia-Logstash, 10Core Platform Team Backlog (Watching / External), and 2 others: Review and make librdkafka-0.11.6 installable from stretch-wikimedia - https://phabricator.wikimedia.org/T209300 (10herron) 05Open>03Resolved a:03herron
[20:12:41] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10JAllemandou) Thanks @chasemp  for raising the point. We are aware of the 2 steps for sanitization.  For us, code for sanitariu...
[20:15:45] <joal>	 Gone for tonight team - see you tomorrow evening
[20:48:16] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10DBA, 10Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10chasemp) Makes sense, again sorry for the drive by comment.  Let's me know if I can be helpful :)
[20:57:24] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (10Ottomata)
[20:57:37] <wikibugs>	 10Analytics-Kanban, 10Patch-For-Review: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (10Ottomata) Woot!    ` 18/11/13 20:52:34 INFO RefineMonitor: No dataset targets in /wmf/data/raw/eventlogging between 2018-11-11T20:50:26.600Z and 2018-11-13T16:50:26...
[21:26:32] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10atgo) @Dzahn sometimes I'm getting a Bugzilla page instead of the Bienvendia page. See this screenshot: {F27213617}  Right now that's the only page I'm getting. Have tested across multip...
[21:28:15] <wikibugs>	 10Analytics: stats.wikimedia.org home page should link to wikistats 2 - https://phabricator.wikimedia.org/T191555 (10Milimetric)
[21:28:19] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Deprecate reportcard: https://analytics.wikimedia.org/dashboards/reportcard/ - https://phabricator.wikimedia.org/T203128 (10Milimetric)
[21:42:41] <wikibugs>	 10Analytics, 10Analytics-Kanban: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (10Milimetric) a:03Milimetric
[21:43:16] <wikibugs>	 10Analytics, 10Analytics-Kanban: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (10Milimetric) I'm so sorry I delayed so long, will work on this next week if I can.
[21:45:17] <milimetric>	 https://phabricator.wikimedia.org/T202592#4744207 !!!!!!!
[22:37:15] <wikibugs>	 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Andrew) Yep, Row B.  @Cmjohnson, these are to be handled like any other cloudvirt server.  For network details it might be best to consult with @ayounsi
[23:09:49] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10Prtksxna) >>! In T202592#4743532, @atgo wrote: > @Prtksxna I just adjusted the time range and it works: https://piwik.wikimedia.org/index.php?module=CoreHome&action=index&idSite=18&perio...
[23:22:43] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10Dzahn) @atgo Try again now. It should be better.
[23:25:45] <wikibugs>	 (03PS1) 10QChris: Add .gitreview [analytics/wmde/WDCM-WikipediaSemantics-Dashboard] - 10https://gerrit.wikimedia.org/r/473297
[23:25:48] <wikibugs>	 (03CR) 10QChris: [V: 032 C: 032] Add .gitreview [analytics/wmde/WDCM-WikipediaSemantics-Dashboard] - 10https://gerrit.wikimedia.org/r/473297 (owner: 10QChris)
[23:26:32] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10atgo) @dzahn still not working after clearing cache on my machine. Any chance it's office wifi that's caching and causing the problem? I could follow up with IT.
[23:56:19] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10Dzahn) fixed for real after bblack restarted a backend server and also purged the cached contents
[23:57:27] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10atgo) 05Open>03Resolved Thank you!