[00:47:11] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Prtksxna) Same as @atgo, I am able to see the outlink actions in the individual report, but the [[ https://piwik.wikimedia.org/index.php?module=CoreHome&action=index&idSite=18&period=day...
[03:10:55] PROBLEM - Check the last execution of refinery-import-page-history-dumps on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-import-page-history-dumps
[05:47:20] Analytics, Product-Analytics: MobileWebSectionUsage schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209049 (Tbayer) Discussed with @ovasileva today - we are going to remove the session IDs and keep the page names. I will submit a patch soon.
[05:48:44] Analytics, Readers-Web-Backlog: Print schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209050 (Tbayer) Discussed with @ovasileva today - we are going to remove the page IDs and keep the session IDs. I will submit a patch soon.
[05:50:18] Analytics: ReadingDepth schema is whitelisting both session ids and page ids - https://phabricator.wikimedia.org/T209051 (Tbayer) a: Tbayer Still need to look into this with @ovasileva and possibly @Groceryheist.
[07:18:35] morning!
[07:18:45] so refinery-import-page-history-dumps seems to be complaining about the parameters
[07:19:09] it says: Usage: etc..
[07:19:15] but afaics it seems correct
[07:19:52] ahhhh the log gile
[07:19:54] *file
[07:19:57] is not there
[07:44:08] so now it says
[07:44:09] WARNING Input-base /mnt/data/xmldatadumps/public doesn't contain a project folder for
[07:44:19] that makes sense, there is no /mnt/data
[07:44:24] RECOVERY - Check the last execution of refinery-import-page-history-dumps on an-coord1001 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps
[07:46:58] ah snap I am reading https://phabricator.wikimedia.org/T202489
[07:47:08] so this needs to be moved to stat1007 I suppose
[08:01:38] joal: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/473161 :)
[08:01:55] ready to move import_wikitext_dumps to stat1007
[08:08:13] Morning elukey - Thanks for the patch :)
[08:08:26] elukey: What did I do wrong about the logging file?
[08:09:26] Ah elukey - Wrong variable name :(
[08:09:29] sorry for that
[08:11:54] nah all good, we need to move it anyway :)
[08:13:02] joal: bonjour! :)
[08:13:16] ok to move the timer to stat1007? (and start it)
[08:13:20] I'll clean up on an-coord1001
[08:13:28] then afaics there are some spark refine issues :(
[08:14:20] Ah crappy crap
[08:14:27] Indeed let's move the time
[08:14:29] timer sorry
[08:16:55] merging!
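(For anyone retracing this kind of triage: the checks elukey ran above boil down to a few standard systemd commands. A minimal sketch, using the unit name from the alert; the output will obviously differ.)

```bash
# Result of the unit's last run: exit status, start/stop times.
systemctl status refinery-import-page-history-dumps.service

# Recent logs for the unit, e.g. the "Usage: ..." and missing-log-file errors above.
journalctl -u refinery-import-page-history-dumps.service --since today

# When the timer last fired and when it fires next.
systemctl list-timers refinery-import-page-history-dumps.timer
```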
[08:28:01] Analytics, Analytics-Kanban: Long term solution for sqooping comments - https://phabricator.wikimedia.org/T209178 (JAllemandou)
[08:28:04] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (JAllemandou)
[08:31:44] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (JAllemandou) ping @Bstorm for the above comment, as she's the one who set up the views. Thanks
[08:31:52] elukey: can we manually try a run of the wikitext-importer?
[08:32:07] Analytics, Analytics-Kanban: Long term solution for sqooping comments - https://phabricator.wikimedia.org/T209178 (JAllemandou)
[08:33:05] Analytics-Kanban, Analytics-Wikistats: Wikistats 2 Backend: Resiliency, Rollback and Deployment of Data - https://phabricator.wikimedia.org/T177965 (JAllemandou) Ping @Nuria - Can we close this parent task?
[08:36:21] joal: I just tried
[08:36:28] but /usr/bin/python3: can't open file '/srv/deployment/analytics/refinery/bin/import-mediawiki-dumps': [Errno 2] No such file or directory
[08:36:48] WAT?
[08:37:05] yeah :D
[08:37:18] refinery is there
[08:38:17] let's bring a fresh copy of refinery
[08:38:30] puppet running
[08:38:59] elukey: file present on stat1004
[08:41:31] all right now it is there
[08:41:44] Nov 13 08:41:36 stat1007 systemd[1]: Started Schedules daily an incremental import of the current month of page-history xmldumps into Hadoop.
[08:41:54] \o/ !!
[08:42:07] looking at logs
[08:42:32] * joal is hugely happy
[08:42:51] * joal sends hugs to elukey for the help :)
[08:44:06] \o/
[08:48:28] the eventlogging refine spark issue seems weird
[08:48:33] it is now complaining about Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.auth.HiveAuthUtils
[08:49:39] hm
[08:52:32] it seems another jar issue
[08:55:04] first one that failed was https://yarn.wikimedia.org/cluster/app/application_1542030691525_0044
[08:55:15] yesterday at around 15:52 UTC
[08:59:44] elukey: must be related to the hive-jar we pass in to spark-refine
[09:00:00] yeah
[09:00:09] it is the only one complaining though
[09:00:25] elukey: possibly other schemas didn't have to use it
[09:00:35] elukey: seems related to applying changes to schemas
[09:01:10] elukey: --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar
[09:01:27] elukey: and IIRC we updated hive-jdbc yesterday
[09:01:44] /usr/lib/hive/lib/hive-jdbc.jar -> hive-jdbc-1.1.0-cdh5.15.0.jar
[09:02:34] elukey: everything seems correctly linked :(
[09:02:54] I am wondering if now that class is into another har
[09:02:55] *har
[09:02:58] aaaaaahhh
[09:03:00] *jar
[09:05:46] elukey: seems very probable
[09:06:03] elukey: hive-common
[09:06:33] elukey: can we try to add hive-common.jar to the refinement script?
[09:06:42] elukey: /usr/lib/hive/lib/hive-common.jar
[09:07:51] elukey@an-coord1001:~$ jar tvf /usr/lib/hive/lib/hive-common.jar | grep -i authutil
[09:07:54] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 5869 Thu May 24 04:20:24 UTC 2018 org/apache/hadoop/hive/common/auth/HiveAuthUtils.class
[09:08:07] indeed
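(The `jar tvf | grep` check above is the standard way to find which jar ships a class. A sketch that scans the whole Hive lib directory, with the class name from the stack trace; the paths are the ones quoted in the session.)

```bash
# Print every jar under /usr/lib/hive/lib that contains the missing class.
for j in /usr/lib/hive/lib/*.jar; do
  if jar tvf "$j" | grep -q 'org/apache/hadoop/hive/common/auth/HiveAuthUtils.class'; then
    echo "found in: $j"
  fi
done
```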
[09:09:37] added to /usr/local/bin/refine_eventlogging_analytics
[09:09:49] ok let's check if it changes something
[09:09:59] I can manually run it
[09:10:02] +1
[09:11:15] it's running
[09:11:23] while we wait, did you check hue?
[09:11:30] I have not
[09:11:34] anything wrong?
[09:11:51] no no.. yesterday I forced a (sort of) db upgrade
[09:12:06] that seemed to have had a positive effect
[09:12:18] but it is still super slow
[09:12:40] mmmm not that much
[09:12:50] yes - first load is slow, then ok
[09:12:51] joal: so basically you can force the Hue 3 view
[09:12:58] yup I have seen that
[09:14:46] elukey: the "scheduler" app fails dramatically for me
[09:15:11] elukey: I think the easiest is to use the old version for oozie
[09:15:38] diagnostics: User class threw exception: java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
[09:15:41] at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:195)
[09:15:44] :D
[09:16:02] refine --^ ?
[09:16:14] yeah
[09:16:14] https://issues.apache.org/jira/browse/SPARK-14492
[09:16:46] But refine is using spark2
[09:16:49] :S
[09:17:22] so that Hive upgrade was on paper 1.1 (same version)
[09:17:39] but I suspect that they have backported stuff
[09:19:48] elukey: trying to rebuild a refinery-source version with correct packages and see
[09:33:32] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (daniel) >>! In T209031#4732318, @Nuria wrote: > @Krenair: we are looking how to best import the public dataset from labs, we h...
[09:46:59] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Krenair) >>! In T209031#4740771, @JAllemandou wrote: > - **Access to underlying tables** - We could query the underlying tab...
[09:54:29] elukey: I'm gonna provide an updated refinery-job jar on an-coord1001, to test if it changes anything (I don't expect much, but it's worth trying)
[09:56:05] ack
[10:06:38] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (JAllemandou) >>! In T209031#4741603, @Krenair wrote: >>>! In T209031#4740771, @JAllemandou wrote: >> - **Access to underlyin...
[10:08:30] elukey: jar available on an-coord1001: /home/joal/code/refinery-source/refinery-job/target/refinery-job-0.0.81-SNAPSHOT.jar
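(Joal's test jar above presumably comes from a local maven build of refinery-source. A sketch of what that build might look like — the exact flags are an assumption; only the module and target path match the jar location quoted above.)

```bash
# Assumption: standard multi-module maven build; -pl restricts to refinery-job,
# -am also builds the modules it depends on.
cd ~/code/refinery-source
mvn -pl refinery-job -am -DskipTests clean package
ls refinery-job/target/refinery-job-*-SNAPSHOT.jar
```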
[11:02:20] joal: have you tested it by any chance?
[11:32:08] * elukey lunch + errand!
[12:09:48] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (daniel) >>! In T209031#4741603, @Krenair wrote: >>>! In T209031#4741574, @daniel wrote: >> In any case, I'm wondering: doen't...
[13:36:35] joal: can I jump in to clarify what we're all talking about on that thread? Or at least try :) It looks a bit disjointed now
[13:36:58] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (JAllemandou) @daniel : We were using sanitized comment view (named `comment`), not the compat one (named `revision_compat`). W...
[13:36:59] milimetric: please please :)
[13:37:09] milimetric: just commented, to give some precisions
[13:37:29] milimetric: moar, merrier :)
[13:37:38] joal: :) just saw your comment, that's the first thing I was going to say too
[13:38:23] ok, I'll leave that for now, and follow up if there's still confusion
[13:38:25] milimetric: I try to answer any question asked, as I think it helps readers understand better where the issues are
[13:39:26] milimetric: I managed to have a quick chat with Manu (the DBA) - He's coming back from holidays and has obviously plenty on his plate - Not sure they'll be able to chime in :(
[13:39:46] ok, makes sense, I thought as much
[13:40:44] milimetric: the other thing this teaches me is that the sanitization from prod doesn't seem that complicated - While it represents code duplication and is therefore error-prone, I really wonder if it wouldn't be easier to sanitize in hadoop
[13:40:46] joal: do you think the compat views join to the underlying comment table or the comment view?
[13:40:58] * joal hides very far
[13:41:28] milimetric: I'm sure it does: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/labs/db/views/maintain-views.yaml#658
[13:41:28] lol, yeah, you may be right, the second misunderstanding might be about what sanitization is actually happening
[13:53:31] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (daniel) >>! In T209031#4742129, @JAllemandou wrote: > @daniel : We were using sanitized comment view (named `comment`), not th...
[13:54:40] joal: do you have some time to test spark?
[13:54:51] elukey: currently doing so
[13:54:55] ah!
[13:54:56] :)
[13:55:00] ;)
[13:55:02] anything that I can help with
[13:55:03] ?
[13:55:17] elukey: also, moritzm dropped a handful of logs on druid-labs
[13:55:36] elukey: not sure what is implied, but that'd be great if we had an automated rule doing so
[13:55:57] deleted the Oct files from /var/log/druid/ as the root part was full, the entries from the last week of Oct were 500M per day
[13:58:34] ah lovely
[13:58:36] thanks!
[13:59:41] joal: do you have some sql query examples for druid that we can show to Moritz?
[14:02:41] I am pretty sure that druid self-rotates its logs
[14:03:44] he sent me some before
[14:03:56] https://phabricator.wikimedia.org/P7798
[14:04:46] there's no logrotate rule for druid, only for pivot (on d-1), but maybe it's rotating logs internally
[14:07:02] super! Yes I think it is self-rotating its logs, will check later how to tune them in labs
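(The manual cleanup moritzm did could be scripted along these lines. A sketch, assuming the /var/log/druid path mentioned above; the 30-day retention is an arbitrary illustrative choice, not an agreed policy.)

```bash
# See what is eating the root partition under the druid log directory.
du -sh /var/log/druid/*

# Remove druid log files older than 30 days; check the -print output
# before trusting the -delete.
find /var/log/druid/ -type f -mtime +30 -print -delete
```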
[14:17:18] heyaa
[14:18:10] Hi mforns :)
[14:55:19] o/
[14:55:32] hiya, has anyone looked into the "refined datasets missing" email yet?
[14:55:37] i'm about to, just wanted to check
[14:55:39] elukey/joal?
[14:56:56] ottomata: o/ joal is looking into it
[14:57:19] there seems to be a class that moved to another jar (hive-common)
[14:57:33] so we tried to add it to the spark conf manually
[14:57:42] and then another error popped up
[14:57:47] lemme copy/paste them in here
[14:57:54] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Cmjohnson) @ottomata Lets go with cloudvirtan1xxx.
[14:58:19] so org/apache/hadoop/hive/common/auth/HiveAuthUtils.class is now in /usr/lib/hive/lib/hive-common.jar
[14:58:31] that seems to be one of spark's complaints in the logs
[14:58:53] but adding the jar to the spark conf is not enough since it returns
[14:59:03] diagnostics: User class threw exception: java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
[14:59:17] and we stopped here a couple of hours ago
[14:59:19] ok that's what i'm looking at now
[14:59:41] IIUC joseph was working on it but I am not sure if he found the issue yet
[15:06:42] elukey, ottomata - Issue not found :(
[15:06:58] joal assume you saw this? https://issues.apache.org/jira/browse/SPARK-14492
[15:07:44] ottomata: I have - But how come it worked previously then?
[15:08:20] good q
[15:08:36] do we set spark.sql.hive.metastore.version ?
[15:08:40] (q for myself, looking)
[15:11:36] joal: so to fix the first errors
[15:11:48] you guys added /usr/lib/hive/lib/hive-common.jar to --conf spark.driver.extraClassPath ?
[15:11:51] what did that replace?
[15:12:11] I just added it, didn't replace anything (tried a manual run)
[15:12:21] ottomata: first error is java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/auth/HiveAuthUtils
[15:12:30] do you know what jar HiveAuthUtils.class was in before?
[15:12:34] ottomata: And that class is made available in hive-common.jar
[15:12:49] no
[15:12:52] k
[15:13:07] ottomata: Now when using hive-common, we hit the bug you describe
[15:13:31] ottomata: I even wonder if the class was used before
[15:16:22] Need to drop to catch Naé from the creche - see you at standup
[15:21:14] k
[15:30:27] oh, btw spark 2.4.0 is out! https://spark.apache.org/releases/spark-release-2-4-0.html
[15:30:41] ottomata: let's deprecate spark 1.6!
[15:30:43] :D
[15:30:47] agree!
[15:35:22] joal: i have no idea why this wasn't happening before!
[15:35:28] i don't see how it wasn't!
[15:35:49] ottomata: the other times you did the upgrades and the hadoop cluster was pleased, this time I did it
[15:35:52] the code really does look like it tries to access 'METASTORE_CLIENT_SOCKET_LIFETIME' independently of what the configuration of spark.sql.hive.metastore.version is
[15:35:57] haha
[15:35:57] it seems the most plausible explanation
[15:36:03] but the hive version didn't even change!
[15:36:04] HMMM
[15:36:06] HMMMMM
[15:36:10] HMMM
[15:36:16] but we did add this jar
[15:36:17] hmmm
[15:36:23] oh hmmm
[15:36:27] maybe the cdh people backported some stuff from 1.2 ?
[15:36:29] suspicion...
[15:36:31] hang on
[15:38:26] elukey: isn't it already in the --conf spark.driver.extraClassPath we use?
[15:38:42] hm ya it is:
[15:38:43] --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/hive/lib/hive-common.jar
[15:39:01] i want to repro the error you were seeing about missing HiveAuthUtils class
[15:39:03] how do I do that?
[15:39:36] just run eventlogging refine
[15:39:40] it is in its logs
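(To make the classpath discussion concrete: the refine job is launched roughly like this. Only the --conf value is verbatim from the log; the launcher, main class, and jar path are illustrative assumptions. Jars on spark.driver.extraClassPath are prepended to the driver classpath, which is how the CDH 1.1.0 HiveConf ends up shadowing the one Spark bundles.)

```bash
# Illustrative sketch of the refine launch, not the real wrapper contents.
spark2-submit \
  --conf spark.driver.extraClassPath=/usr/lib/hive/lib/hive-jdbc.jar:/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/lib/hive/lib/hive-service.jar:/usr/lib/hive/lib/hive-common.jar \
  --class org.wikimedia.analytics.refinery.job.refine.Refine \
  /srv/deployment/analytics/refinery/artifacts/refinery-job.jar
```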
[15:45:37] Analytics, Analytics-Kanban, Research: Cannot connect to Spark with Jupyter notebook on stat1007 - https://phabricator.wikimedia.org/T208896 (fdans) Hey Baho! You may have been following out of date documentation for jupyter notebooks. Right now standard procedure is to use one of the notebook machin...
[15:54:08] back
[15:57:30] it's def a class loading order problem
[15:57:35] something changed in the order classes were loaded
[15:57:50] spark itself has a version of HiveConf (1.2.1)
[15:57:53] which has that option.
[15:58:00] but hive-common also has HiveConf, which doesn't have that field
[15:58:20] so, we need spark to use its own version oh hmmm
[15:58:25] maybe i can use spark hive-common... trying
[15:59:07] ottomata: The thing I don't get is - Why does the current version of refine need hive-common if spark brings one
[15:59:29] i think for HiveAuthUtils... actually i don't see hive-common at all in the spark lib jars
[15:59:35] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/471722 (https://phabricator.wikimedia.org/T164020) (owner: Joal)
[15:59:52] gotta find out where it is getting HiveConf
[16:00:12] Analytics, Analytics-Kanban, Research: Cannot connect to Spark with Jupyter notebook on stat1007 - https://phabricator.wikimedia.org/T208896 (bmansurov) @fdans thanks for looking into this issue. I need to use spark because my work is a little heavy on computation side and would adversely affect othe...
[16:00:38] OH
[16:00:52] in hive 1.2.1 HiveConf moved into hive-exec.jar
[16:00:58] Ahhh !
[16:01:08] So we'd need hive-exec?
[16:01:23] we need hive-exec from spark, which it has
[16:01:31] the problem is for some reason we need HiveAuthUtils
[16:01:34] which is in hive-common
[16:01:41] which in our 1.1.0 version ALSO has HiveConf
[16:01:49] so spark is using our 1.1.0 HiveConf
[16:01:57] which doesn't have the METASTORE_CLIENT_SOCKET_LIFETIME field
[16:02:01] and is erroring
[16:02:43] right
[16:02:47] Mwarf :(
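(The two-HiveConf diagnosis above can be double-checked from a shell with javap. A sketch — the NoSuchFieldError refers to a constant on the HiveConf$ConfVars enum, and the spark2 jar directory is an assumed path.)

```bash
# Does the CDH 1.1.0 ConfVars define the constant? Expect no output here.
javap -classpath /usr/lib/hive/lib/hive-common.jar \
    'org.apache.hadoop.hive.conf.HiveConf$ConfVars' \
  | grep METASTORE_CLIENT_SOCKET_LIFETIME

# Which jars in the spark2 install ship their own HiveConf?
for j in /usr/lib/spark2/jars/*.jar; do
  jar tvf "$j" | grep -q 'org/apache/hadoop/hive/conf/HiveConf.class' && echo "$j"
done
```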
[16:13:03] (CR) Ottomata: [C: +1] Add bin/refinery-drop-older-than [analytics/refinery] - https://gerrit.wikimedia.org/r/471279 (https://phabricator.wikimedia.org/T199836) (owner: Mforns)
[16:17:50] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Ottomata) Ok!
[16:19:28] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (Ottomata)
[16:20:11] Analytics, Analytics-Kanban, Operations, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Ottomata) @Cmjohnson I updated {T207194} to reflect the new naming. Please proceed and then assign to Cloud VPS folks f...
[16:57:18] fdans: (cc ottomata ) if the refine process is temporarily stalled, let's send an e-mail to analytics@
[16:58:06] k hold on! am close i feel
[17:04:24] rawr ok not close.
[17:04:38] joal: i think the best thing to do would be for me to fix up this patch and get it merged https://github.com/apache/spark/pull/21012
[17:04:43] ottomata, joal: what i do not understand: why did we not get this alarm yesterday? me no compredou, things were refined ok yesterday weren't they?
[17:04:45] our use of hive jdbc directly
[17:05:10] nuria: it is only going to fire for things that require an alter table to add a column
[17:05:52] ottomata: oooooohhh, and how did we figure that out?
[17:06:20] our cdh hive jars are only used by refine when an alter needs to happen
[17:06:25] otherwise everything is via spark
[17:06:31] i do not know why this didn't break before the upgrade
[17:06:45] likely some subtle change somewhere in the way classpaths are loaded
[17:11:15] nuria: as far as I can tell the only tables affected are the ones that were sent in that email
[17:11:22] SearchSatisfaction and TestSearchSatisfaction2
[17:11:24] so far anyway
[17:11:59] looks like there was some change in the data/schema for those things starting around 09:00 utc yesterday
[17:12:29] the monitor refine job only runs once a day
[17:12:33] which is why we've only seen the one email
[17:12:44] ottomata: in a meeting, but this has happened before; those schemas have not been modified in a while, so why would a column addition be needed? https://meta.wikimedia.org/w/index.php?title=Schema:SearchSatisfaction&action=history
[17:12:59] nuria: it doesn't necessarily have to do with schemas at all
[17:13:02] refine doesn't look at the schema
[17:13:12] it has to do with what is in the data for those hours, vs what is already in the hive table
[17:13:29] ottomata: right it could be data for 1 column that was not present before, yes
[17:13:29] maybe they just started sending some field that has always been in the schema but wasn't sent before
[17:13:32] right
[17:13:53] i betcha the client that sends this is the same for both of those schemas (based on the way they are named)
[17:14:00] and they just made some client change to send some new field
[17:14:03] ottomata: ok, i would triple check that after the meeting
[17:14:07] which is why both of those schemas reported at the same time
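(To illustrate the 'alter' code path being discussed: when Refine finds a field in the incoming data that the Hive table lacks, it issues an ALTER TABLE over its JDBC connection — roughly the equivalent of the statement below. The table and column names are made up for illustration.)

```bash
# Hypothetical: a client starts sending a new field, so the table needs
# a matching column before the hour can be refined.
hive -e 'ALTER TABLE event.searchsatisfaction ADD COLUMNS (new_field string);'
```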
[17:14:20] ok nuria, i just ran refine monitor
[17:14:27] there are more schemas missing data
[17:14:42] enough to warrant an email
[17:14:43] sending
[17:20:02] fdans: shall I merge your puppet patch?
[17:20:14] yessss
[17:20:18] thank you elukey
[17:24:21] hm actually there are quite a few tables affected
[17:25:03]
[17:27:36] (---^ was my cat)
[17:29:19] joal: yarrrrr, i'm really not sure how to solve this!
[17:29:27] the problem is our use of JDBC -> hive
[17:29:35] it's clashing with spark hive versions
[17:30:04] we could shell out to a hive command or JDBC thing so as to not have jar versions clash
[17:30:07] but that gets uglier and uglier
[17:30:09] yar
[17:33:26] Analytics, Analytics-Kanban, Operations, Traffic: Add prometheus metrics for varnishkafka instances running on caching hosts - https://phabricator.wikimedia.org/T196066 (elukey) @mforns created https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/prometheus-varnishkafka-exporter
[17:35:42] joal: if you are around... brain bounce plz!
[17:42:56] (brb)
[17:47:38] (PS1) Milimetric: [WIP] DO NOT MERGE testing speed of sqoop against labs with different approaches [analytics/refinery] - https://gerrit.wikimedia.org/r/473256
[17:52:00] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Bstorm) > - **Specialized views** - Views for comments from each of revision, archive, and logging, separately. We have t...
[18:00:24] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Nuria) Once data comes in you can see how your instrumentation is working @Prtksxna
[18:00:35] Hey ottomata - here now, after staff?
[18:02:28] ya
[18:02:50] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Anomie) >>! In T209031#4743469, @Bstorm wrote: > Materialized views in mariadb requires a plugin that basically is adding some...
[18:03:09] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (atgo) @Prtksxna I just adjusted the time range and it works: https://piwik.wikimedia.org/index.php?module=CoreHome&action=index&idSite=18&period=day&date=yesterday#?idSite=18&period=mont...
[18:04:34] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (Bstorm) Note: I'm not done reading back yet--but yeah, that's what I was thinking of. There's a lot here.
[18:33:30] elukey: it's late for you so I feel bad, but I am interested
[18:33:36] I did systemctl list-timers --all on an-coord1001
[18:33:44] and so I see the refinery-sqoop-mediawiki.timer job
[18:33:49] just wanted to see what command it was running
[18:34:33] (meeting but I'll be available in max 1h! If you have patience I'll explain! :)
[18:34:55] but the gist of it is that there is a .service unit that the .timer calls
[18:35:04] you can check it via systemctl cat refinery-sqoop-mediawiki.timer
[18:35:11] systemctl cat refinery-sqoop-mediawiki.service
[18:35:22] this --^ has an ExecStart= line
[18:35:40] that calls a script (we did it to not mess with escaping etc..)
[18:35:51] in that script there are all the parameters (it is managed by puppet)
[18:37:46] aha! thank you, got it
[18:37:48] ExecStart=/usr/local/bin/refinery-sqoop-mediawiki
[18:37:55] sudo -u hdfs cat /usr/local/bin/refinery-sqoop-mediawiki
[18:38:27] the last part about puppet managing the file was what I didn't understand, thought it was pointing to something fake to trick me :)
[18:41:11] added to docs for clueless people like me (probably just future me): https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FTeam%2FOncall&type=revision&diff=1808485&oldid=1808074
[18:51:14] milimetric: I am looking forward to suggestions etc. about how to improve the whole thing, it is pretty new for me too
[18:52:00] but the systemctl list-timers command always makes me feel happier than a crontab :D
[18:52:06] no it makes lots of sense, I might add a quick example at the top for getting this specific command I just did, but only if someone else gets confused again
[18:52:15] yeah, it's great, much neater
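(Putting elukey's walkthrough together, the full chain from timer to the actual sqoop invocation can be traced with the commands quoted above, in order:)

```bash
# 1. Find the timer and its schedule.
systemctl list-timers --all | grep refinery-sqoop-mediawiki

# 2. Dump the units: the .timer activates the .service of the same name.
systemctl cat refinery-sqoop-mediawiki.timer
systemctl cat refinery-sqoop-mediawiki.service   # note the ExecStart= line

# 3. ExecStart points at a puppet-managed wrapper that holds all parameters.
sudo -u hdfs cat /usr/local/bin/refinery-sqoop-mediawiki
```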
[18:53:11] joal: 12 minutes for frwiki.logging, 1 minute 38 seconds for simplewiki.logging
[18:53:18] seems fast?
[18:53:27] Fast enough I think :)
[18:53:40] milimetric: can you go for a full sqoop of the logging table only please?
[18:53:55] doing
[18:54:06] milimetric: I think it'll take a couple hours, and that's fine
[18:54:27] Wow great - This makes me feel we have an easy and workable solution :)
[18:54:34] Might even work for revision
[18:55:39] I can do revision for these test wikis and I'll do all wikis for just logging
[18:56:08] Super fine milimetric
[18:57:11] milimetric: updating the task
[18:58:04] man, that FUSE mount is working like 10% of the time I try to use it...
[18:58:31] milimetric: it re-fuse ?
[18:58:41] * joal hides again
[18:58:44] :) yes, it refuses to work
[18:58:45] lol
[19:06:46] * elukey off!
[19:06:52] Bye elukey :)
[19:07:06] fuse mount on stat1004 was dead
[19:07:11] are you guys working on it?
[19:07:25] (I just forced a remound)
[19:07:29] *remount
[19:07:56] anyway, not a big deal, and we'll need to change it in the near future
[19:08:03] (hopefully for something better!)
[19:09:57] Analytics-Kanban, MediaWiki-General-or-Unknown, Tool-Pageviews: Check abnormal pageviews for some pages on itwiki - https://phabricator.wikimedia.org/T209404 (Daimona)
[19:18:24] Analytics-Kanban: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (Ottomata)
[19:19:06] (PS1) Ottomata: Add Hive 1.1.0 jars from CDH 5.10.0 [analytics/refinery] - https://gerrit.wikimedia.org/r/473268 (https://phabricator.wikimedia.org/T209407)
[19:19:39] Analytics-Kanban, Patch-For-Review: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (Ottomata) Ok: When ALTERing Hive tables, DataFrameToHive uses a manual JDBC connection to Hive, rather than Spark SQL. This is a work around for https://issues.ap...
[19:23:58] (PS2) Ottomata: Add Hive 1.1.0 jars from CDH 5.10.0 [analytics/refinery] - https://gerrit.wikimedia.org/r/473268 (https://phabricator.wikimedia.org/T209407)
[19:24:26] (CR) Ottomata: [V: +2 C: +2] Add Hive 1.1.0 jars from CDH 5.10.0 [analytics/refinery] - https://gerrit.wikimedia.org/r/473268 (https://phabricator.wikimedia.org/T209407) (owner: Ottomata)
[19:45:05] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (Cmjohnson) a: Cmjohnson→RobH Robh can you do the installs please.
[19:50:31] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (Cmjohnson) @ottomata can these go in any row or does it need to be row B?
[19:51:31] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (Ottomata) IIUC it has to be row B for them to be used as Cloud Virts. @Andrew to confirm. If they can go any row, then they should be spread out as even...
[19:52:37] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (JAllemandou) Thanks again a lot @Anomie, @daniel, @Bstorm for having chimed in, your questions and suggestions have helped us...
[19:53:41] elukey: I tried to copy something via fuse and it died/was dead. You can almost disable it as far as I'm concerned, I wonder if there are any jobs using it
[19:54:09] the time it saves me over going directly to hdfs, it costs me back in hung processes
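(For reference, the "forced remount" elukey mentioned earlier is typically something like the following. A sketch: /mnt/hdfs as the mount point is an assumption, and the commands need root.)

```bash
# Lazy-unmount the wedged fuse filesystem, then remount it from fstab.
sudo umount -l /mnt/hdfs
sudo mount /mnt/hdfs

# Quick health check: if this hangs past the timeout, the mount is dead again.
timeout 10 ls /mnt/hdfs/wmf/data >/dev/null && echo "mount OK"
```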
[20:01:47] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (chasemp) I don't want to muddy the waters as I have not been involved here :). But worth noting there is more than the views a...
[20:09:51] a-team i think we fixed Refine, am re-refining failed stuff over the last 48 hours now
[20:10:03] ack
[20:10:04] \o/ ottomata :)
[20:10:08] https://phabricator.wikimedia.org/T209407
[20:10:10] info there ^
[20:11:54] Analytics, Operations, Wikimedia-Logstash, Core Platform Team Backlog (Watching / External), and 2 others: Review and make librdkafka-0.11.6 installable from stretch-wikimedia - https://phabricator.wikimedia.org/T209300 (herron) Open→Resolved a: herron
[20:12:41] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (JAllemandou) Thanks @chasemp for raising the point. We are aware of the 2 steps for sanitization. For us, code for sanitariu...
[20:15:45] Gone for tonight team - see you tomorrow evening
[20:48:16] Analytics, Analytics-Kanban, DBA, Data-Services: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (chasemp) Makes sense, again sorry for the drive by comment. Let me know if I can be helpful :)
[20:57:24] Analytics-Kanban, Patch-For-Review: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (Ottomata)
[20:57:37] Analytics-Kanban, Patch-For-Review: EventLogging Hive Refine broken after upgrade to CDH 5.15.0 - https://phabricator.wikimedia.org/T209407 (Ottomata) Woot! ` 18/11/13 20:52:34 INFO RefineMonitor: No dataset targets in /wmf/data/raw/eventlogging between 2018-11-11T20:50:26.600Z and 2018-11-13T16:50:26...
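(A quick way to sanity-check RefineMonitor's verdict above is to compare raw input partitions with refined output in HDFS. A sketch: the raw base path is quoted from the log, while the per-schema layout and the refined path are assumptions for illustration.)

```bash
# Raw eventlogging data for one schema and day (layout assumed).
hdfs dfs -ls /wmf/data/raw/eventlogging/eventlogging_SearchSatisfaction/hourly/2018/11/13

# Corresponding refined Hive table partitions (path assumed).
hdfs dfs -ls /wmf/data/event/SearchSatisfaction/year=2018/month=11/day=13
```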
[21:26:32] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (atgo) @Dzahn sometimes I'm getting a Bugzilla page instead of the Bienvendia page. See this screenshot: {F27213617} Right now that's the only page I'm getting. Have tested across multip...
[21:28:15] Analytics: stats.wikimedia.org home page should link to wikistats 2 - https://phabricator.wikimedia.org/T191555 (Milimetric)
[21:28:19] Analytics, Analytics-Kanban, Patch-For-Review: Deprecate reportcard: https://analytics.wikimedia.org/dashboards/reportcard/ - https://phabricator.wikimedia.org/T203128 (Milimetric)
[21:42:41] Analytics, Analytics-Kanban: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (Milimetric) a: Milimetric
[21:43:16] Analytics, Analytics-Kanban: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (Milimetric) I'm so sorry I delayed so long, will work on this next week if I can.
[21:45:17] https://phabricator.wikimedia.org/T202592#4744207 !!!!!!!
[22:37:15] Analytics, Operations, ops-eqiad, User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (Andrew) Yep, Row B. @Cmjohnson, these are to be handled like any other cloudvirt server. For network details it might be best to consult with @ayounsi
[23:09:49] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Prtksxna) >>! In T202592#4743532, @atgo wrote: > @Prtksxna I just adjusted the time range and it works: https://piwik.wikimedia.org/index.php?module=CoreHome&action=index&idSite=18&perio...
[23:22:43] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Dzahn) @atgo Try again now. It should be better.
[23:25:45] (PS1) QChris: Add .gitreview [analytics/wmde/WDCM-WikipediaSemantics-Dashboard] - https://gerrit.wikimedia.org/r/473297
[23:25:48] (CR) QChris: [V: +2 C: +2] Add .gitreview [analytics/wmde/WDCM-WikipediaSemantics-Dashboard] - https://gerrit.wikimedia.org/r/473297 (owner: QChris)
[23:26:32] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (atgo) @dzahn still not working after clearing cache on my machine. Any chance it's office wifi that's caching and causing the problem? I could follow up with IT.
[23:56:19] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Dzahn) fixed for real after bblack restarted a backend server and also purged the cached contents
[23:57:27] Analytics, Analytics-Kanban, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (atgo) Open→Resolved Thank you!