[00:15:27] 10Quarry, 10Patch-For-Review: Excel does not recognize Quarry CSV output as UTF-8 - https://phabricator.wikimedia.org/T76126#3484849 (10IKhitron) @zhuyifei1999, thank you very much! I don't have Excel today, so I asked somebody to check. He is absolutely happy.
[00:21:12] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#1338373 (10zhuyifei1999) https://github.com/codemirror/CodeMirror/issues/1942 suggests using `CodeMirror.extendMode("sql", {electricChars: ")"});`
[00:26:55] (03PS1) 10Zhuyifei1999: view.js: Set electricChars: ')' for SQL [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368603 (https://phabricator.wikimedia.org/T101424)
[00:31:24] 10Quarry: Add an option to export result in Wikilist - https://phabricator.wikimedia.org/T137268#3484860 (10zhuyifei1999) >>! In T137268#2808805, @Dvorapa wrote: > I know this method, but it is super complicated to do it this way, then export results e.g. in excel format, then get everything from excel format in...
[00:31:40] 10Quarry: Add an option to export result in Wikilist - https://phabricator.wikimedia.org/T137268#3484861 (10zhuyifei1999) p:05Triage>03Low
[00:37:51] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3091603 (10zhuyifei1999) This task is essentially "Invalid" unless some clear steps-to-reproduce are provided.
[00:40:07] 10Quarry: Investigate redash.io (open source query and report system) - https://phabricator.wikimedia.org/T131651#3484867 (10zhuyifei1999)
[00:40:11] 10Quarry, 10Cloud-Services: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3484870 (10zhuyifei1999)
[00:48:54] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3484872 (10IKhitron) How can I do this? I know that it happens, but can't bring you the same run before and after without a time machine.
[00:49:40] (03CR) 10Huji: [C: 032] view.js: Set electricChars: ')' for SQL [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368603 (https://phabricator.wikimedia.org/T101424) (owner: 10Zhuyifei1999)
[00:49:57] (03Merged) 10jenkins-bot: view.js: Set electricChars: ')' for SQL [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368603 (https://phabricator.wikimedia.org/T101424) (owner: 10Zhuyifei1999)
[00:56:22] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484877 (10Huji) 05Open>03Resolved a:03Huji
[01:01:00] 10Quarry: Quarry should store the query runtime along with the results - https://phabricator.wikimedia.org/T172082#3484883 (10Huji)
[01:04:03] 10Quarry: Quarry should store the query runtime along with the results - https://phabricator.wikimedia.org/T172082#3484904 (10zhuyifei1999)
[01:04:07] 10Quarry: Include query execution time - https://phabricator.wikimedia.org/T126888#3484907 (10zhuyifei1999)
[01:07:28] 10Quarry: Include query execution time - https://phabricator.wikimedia.org/T126888#3484909 (10Huji)
[01:09:20] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484910 (10Huji) 05Resolved>03Open @zhuyifei1999 unsure if it should take effect in minutes or not, but I just checked now and it didn't solve the problem on https://quarry.wmflabs.org/ yet. Re-op...
[01:24:35] 10Quarry: Some querries cannot be 'unstarred' - https://phabricator.wikimedia.org/T165169#3258930 (10zhuyifei1999) Issues: * The table `star` allows duplicates * On duplicate, [[https://github.com/wikimedia/analytics-quarry-web/blob/7dd8c60973fd03692491877fea2bec8f9acb2987/quarry/web/app.py#L159|the button will...
[01:50:22] (03PS1) 10Zhuyifei1999: Handle duplicates in table 'star' [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368604 (https://phabricator.wikimedia.org/T165169)
[01:51:07] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484937 (10zhuyifei1999) Works for me. Have you cleared your browser cache?
[01:52:17] (03PS2) 10Zhuyifei1999: Handle duplicates in table 'star' [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368604 (https://phabricator.wikimedia.org/T165169)
[01:53:59] (03CR) 10Zhuyifei1999: "The relevant alter-table is `ALTER IGNORE TABLE star ADD UNIQUE INDEX star_user_query_index (user_id, query_id);`" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368604 (https://phabricator.wikimedia.org/T165169) (owner: 10Zhuyifei1999)
[01:58:33] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484938 (10zhuyifei1999) My test is: press `(`, enter, `)`
[02:01:49] (03CR) 10Zhuyifei1999: "> What's the status of this? Can/should I pick this up?" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/286094 (https://phabricator.wikimedia.org/T76466) (owner: 10Alex Monk)
[03:07:05] 10Quarry: Some long queries give no results - https://phabricator.wikimedia.org/T109016#1537760 (10zhuyifei1999) Probably because result loading takes time.
[03:48:54] Beeline fails to establish a Hive connection for me (both on stat1005 and stat1004):
[03:49:16] https://www.irccloud.com/pastebin/kwORty2I/Beeline%20connetion%20fail%20on%20stat1005
[03:49:59] hive CLI still seems to work
[03:51:25] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432141 (10zhuyifei1999) The logs are unfortunately lost :(
[07:56:59] 10Analytics, 10Analytics-EventLogging, 10Community-Tech, 10DBA, 10User-Elukey: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883#3485104 (10Marostegui) Will #analytics handle this? I would suggest renaming the tables on all the hosts and leave them with a differe...
[07:57:08] 10Analytics, 10Analytics-EventLogging, 10Community-Tech, 10DBA, 10User-Elukey: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883#3485106 (10Marostegui) p:05Triage>03Normal
[08:00:51] 10Analytics, 10Analytics-EventLogging, 10Community-Tech, 10DBA, 10User-Elukey: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883#3479233 (10elukey) Sure we can handle it, renaming sounds good. As far as I can understand dropping the renamed table after some days se...
[08:02:34] 10Analytics, 10Analytics-EventLogging, 10Community-Tech, 10DBA, 10User-Elukey: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883#3485112 (10Marostegui) >>! In T171883#3485110, @elukey wrote: > Sure we can handle it, renaming sounds good. As far as I can understand...
[09:06:48] 10Analytics-Kanban, 10User-Elukey: Upgrade AQS to node 6.11 - https://phabricator.wikimedia.org/T170790#3485238 (10elukey) Done!
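A minimal sketch of the deduplication behind zhuyifei1999's review above, against a hypothetical local copy of the `quarry` database (the `ALTER IGNORE` statement is quoted verbatim from the code review; on MariaDB/older MySQL it drops the duplicate rows while building the index):
```
# list the (user_id, query_id) pairs that the 'star' table currently duplicates
mysql quarry -e "
  SELECT user_id, query_id, COUNT(*) AS n
  FROM star
  GROUP BY user_id, query_id
  HAVING n > 1;"

# deduplicate and prevent future duplicates in one step
mysql quarry -e "
  ALTER IGNORE TABLE star
  ADD UNIQUE INDEX star_user_query_index (user_id, query_id);"
```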
[09:08:24] FYI aqs is running nodejs 6.11 now
[09:44:36] 10Quarry: Gigantic query results cause a SIGKILL and the query status do not update - https://phabricator.wikimedia.org/T172086#3485418 (10zhuyifei1999) Oops, sorry forgot to add the tags
[10:24:06] 10Quarry: Gigantic query results cause a SIGKILL and the query status do not update - https://phabricator.wikimedia.org/T172086#3485554 (10zhuyifei1999) Recent OOMs: ``` zhuyifei1999@quarry-runner-01:~$ zcat /var/log/messages*.gz | cat /var/log/messages* - | grep oom | grep python2.7 Jul 21 05:11:05 quarry-runne...
[10:27:21] I have been running long HiveQL queries for an update of the Wikidata Concepts Monitor system from beeline, working on stat1005, since last night.
[10:27:48] Recently, the following error started to occur:
[10:27:50] Unexpected end of file when reading from HS2 server. The root cause might be too many concurrent connections. Please ask the administrator to check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
[10:27:50] Error: Could not establish connection to jdbc:hive2://analytics1003.eqiad.wmnet:10000: null (state=08S01,code=0)
[10:27:51] No current connection
[10:28:39] Anyone who knows about the cluster and Hive/Hadoop settings: obviously, there are some constraints. If the constraints are documented, please point to the documentation. If not, please advise. Thanks a lot.
[10:30:57] And it seems like I cannot connect to beeline from stat1005 now...
[10:31:20] hey, it was reported by another user this morning, I am currently checking the issue
[10:32:21] 10Quarry: Gigantic query results cause a SIGKILL and the query status do not update - https://phabricator.wikimedia.org/T172086#3485563 (10zhuyifei1999) Uh, ``` zhuyifei1999@quarry-runner-01:~$ zcat /var/log/messages*.gz | cat /var/log/messages* - | grep oom | grep invoked Jul 27 12:34:04 quarry-runner-01 kernel...
[10:32:48] elukey: Hi elukey, thanks a lot. Just to let you, in that case: I have been running some HiveQL queries from an R script on stat1005 - I can't really estimate that precisely, but I would say they were quite demanding. So that could be a cause of the problem, in case the cluster has some constraints that were violated. I will certainly need to switch to batch processing.
[10:33:02] elukey: *to let you *know*, typo, sorry
[10:33:14] super thanks
[10:33:39] one thing that I discovered today is that the java heap sizes of hive* daemons are not correct
[10:34:33] !log restart hive-server on an1003 - beeline not connecting, thrift errors
[10:34:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:35:55] GoranSM,HaeB - now I can connect with beeline to hive, need to do a bit of investigation
[10:36:19] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3485566 (10zhuyifei1999) Probably not an OOM as in T172086. ``` MariaDB [quarry]> select * from query where id = 18832; +-------+---------+-------------------+---------------+---------------------+-----------+-----------+--------...
[10:36:43] elukey: I've just failed connecting to beeline; do you want me to open a Phab ticket and paste the error message so that you can have a more thorough insight?
[10:37:36] GoranSM: I can connect, can you retry now?
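A quick way to run the check elukey is doing here, assuming shell access on a stat box; the JDBC URL is the one quoted in the errors above, and the trivial statement is only a connectivity probe:
```
# exits non-zero if HiveServer2 refuses the connection
beeline -u 'jdbc:hive2://analytics1003.eqiad.wmnet:10000' -e 'SHOW DATABASES;' \
  && echo 'HS2 reachable' \
  || echo 'HS2 connection failed'
```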
[10:37:48] I am going to open one don't worry, will give you the link
[10:37:50] elukey: let me see, trying from stat1005
[10:37:52] still need to grab some data
[10:38:19] 2017-07-31 00:47:19,471 WARN thrift.ThriftCLIService (ThriftCLIService.java:ExecuteStatement(508)) - Error executing statement:
[10:38:22] org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.OutOfMemoryError: Java heap space
[10:38:25] there you go
[10:38:32] this one was probably your query :D
[10:38:51] elukey: I can get to the beeline prompt, but "use gorasm" says: No current connection.
[10:39:34] elukey: I told you it was one of my queries. I can't believe that I will have to do batches on Hadoop. A bunch of data on Wikidata usage... sorry. I couldn't have estimated that it would cause problems.
[10:39:58] nono you unveiled an issue with the configuration of hive
[10:40:06] so it is good :)
[10:40:07] 0: jdbc:hive2://analytics1003.eqiad.wmnet:100> use gorasm;
[10:40:07] Error: Error while compiling statement: FAILED: SemanticException [Error 10072]: Database does not exist: gorasm (state=42000,code=10072)
[10:40:25] elukey: What?!
[10:40:44] elukey: it's goransm, not gorasm.
[10:41:04] elukey: Oh don't do that to me please :'(
[10:41:05] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3485572 (10zhuyifei1999) Ok, I forked the query to https://quarry.wmflabs.org/query/20623 and reproduced the issue: ``` zhuyifei1999@quarry-runner-01:~$ grep 33b3c878-4f00-46a1-ad8c-a39ef615872f /var/log/syslog -B 1 -A 11 Jul 31...
[10:41:06] well I copy/pasted what you wrote
[10:41:12] elukey: Sorry...
[10:41:44] works for me now :)
[10:41:57] elukey: It's official, I am looking at my logs, yes some of my queries were reporting: Error: Error running query: java.lang.OutOfMemoryError: Java heap space (state=,code=0)
[10:43:08] elukey: The HiveQL query has WHERE something IN (millions-of-things), so I guess that has caused the problem. I will have to cut those millions-of-things into batches and run several queries instead.
[10:43:24] 10Quarry: Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result - https://phabricator.wikimedia.org/T170464#3485577 (10zhuyifei1999)
[10:44:50] elukey: Confirming that beeline works for me now, no problem with any connections.
[10:45:16] HaeB: Hey please try to connect to beeline, do something, and let us know whether it works for you too.
[10:45:29] elukey: thanks a lot.
[10:45:52] GoranSM: super.. so I am going to lunch in a bit but next steps are: 1) if you could batch your giant query that would be great :) 2) I'll follow up on the JVM heap sizes and open a phab task
[10:46:43] elukey: I will certainly introduce batch processing into the R script that orchestrates these queries, no worries. Thanks for support.
[10:51:16] 10Quarry: Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result - https://phabricator.wikimedia.org/T170464#3485627 (10zhuyifei1999) a:05awight>03zhuyifei1999 Reproduced on vagrant with: ```lang=sql SELECT query.id, query_revision.id FROM query, query_revision WH...
[10:55:48] * elukey lunch!
[11:08:35] 10Quarry: Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result - https://phabricator.wikimedia.org/T170464#3485676 (10zhuyifei1999) ``` MariaDB [quarry]> SELECT query.id, query_revision.id -> FROM query, query_revision -> WHERE query.id = 1 -> AND query_...
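For context on the reproduction above: Quarry stores result sets in SQLite tables (as noted a bit further down in the log), and SQLite refuses to create a table with two identically named columns, which is why `query.id` and `query_revision.id` cannot both be stored as `id`. A minimal illustration with a throwaway database file:
```
# fails with "duplicate column name: id", mirroring the Quarry bug
sqlite3 /tmp/t170464_demo.sqlite 'CREATE TABLE resultset (id, id);'

# the usual query-side workaround is to alias the columns apart:
#   SELECT query.id AS query_id, query_revision.id AS revision_id ...
```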
[11:26:15] 10Quarry: Quarry cannot store results with identical column names - https://phabricator.wikimedia.org/T170464#3485732 (10zhuyifei1999)
[11:44:24] hellooooo
[11:44:30] elukey, yt?
[11:44:47] 10Quarry: Quarry cannot store results with identical column names - https://phabricator.wikimedia.org/T170464#3485792 (10zhuyifei1999) a:05zhuyifei1999>03None Some background for anyone who can cleanly resolve this: Results are stored in SQLite tables.
[12:06:30] mforns: o/
[12:08:25] hey elukey !
[12:08:26] :]
[12:08:38] qq: is the purging script running for 2015 still?
[12:09:02] mforns: afaik yes, but lemme check
[12:09:15] elukey, I'd like to introduce some changes to the white-list before we run 2016
[12:09:19] is that possible?
[12:09:27] sure!
[12:09:32] ok :]
[12:09:33] still running on 2015 data though :(
[12:09:38] ok ok
[12:09:51] precisely on Edit_13457736_15423246
[12:10:09] will push changes to the list before stand-up and add you as a reviewer
[12:10:15] it takes ~10m for a batch of 100k records
[12:10:31] oh, you bumped it to 100k?
[12:11:22] nothing against, I mean
[12:11:37] leaving for 20 mins for lunch
[12:11:48] see you in a bit
[12:12:08] yep yep
[12:34:45] 10Analytics-Kanban, 10User-Elukey: Wrong JVM heap size set for Hive* daemons - https://phabricator.wikimedia.org/T172107#3485953 (10elukey)
[12:38:41] baack
[12:39:16] (03PS1) 10Zhuyifei1999: Set session.permanent = True when user is logged in [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368745 (https://phabricator.wikimedia.org/T164390)
[12:40:35] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3485978 (10Huji) 05Open>03Resolved It was a cache issue. Good catch! (No pun intended).
[12:41:59] 10Quarry: Make a Quarry automatically refresh on a set time interval - https://phabricator.wikimedia.org/T141698#3485985 (10zhuyifei1999)
[12:42:03] 10Quarry: Recurring queries - https://phabricator.wikimedia.org/T101835#3485988 (10zhuyifei1999)
[13:35:34] 10Quarry: Include query execution time - https://phabricator.wikimedia.org/T126888#3486103 (10zhuyifei1999) See also T77941
[13:36:03] Hey folks. I'm looking at /srv on stat1006 and it's pretty bare. Is that on purpose?
[13:36:19] Sorry I meant /srv/published-datasets
[13:36:37] What would it take to keep that in sync with the file server?
[13:37:00] It's very confusing and error-prone to have to create deep, empty directories in order to copy files.
[13:37:13] Also OMG I hate waiting for rsync, but that's unrelated.
[13:41:07] 10Analytics-Kanban: Add QuickSurvey schemas to EventLogging white-list - https://phabricator.wikimedia.org/T172112#3486127 (10mforns)
[13:43:58] 10Quarry: Add SHOW EXPLAIN support to Quarry - https://phabricator.wikimedia.org/T146483#3486149 (10zhuyifei1999) a:03zhuyifei1999 I'll try the button implementation, and show the button when the status is "running", and not store the results. Sometimes when you show explain too early the query plans of subque...
[13:44:18] halfak: hm, I don't know why it doesn't have a couple of the things from stat1003.
[13:44:23] I can copy them over now.
[13:44:24] but
[13:44:40] files with the same names in published-datasets on different stat boxes will conflict
[13:44:46] since they all get synced to the same /datasets/ location
[13:45:07] each stat box is distinct, but the destination of all published-datasets locations is the same
[13:45:12] ottomata, right. I understand that.
[13:45:20] what are you missing?
[13:45:38] This is what is confusing. Maybe the dir could be rsync'd back to the stat boxes so each stat box has a full picture.
[13:45:43] !log suspended webrequest-load-bundle as prep step to restart hive metastore/server
[13:45:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:45:47] Identical files wouldn't conflict, would they?
[13:45:56] halfak: that would get even more confusing
[13:45:56] as the rsync is done with --delete
[13:46:01] so you can remove files from /datasets
[13:46:05] by deleting from a stat box published-datasets
[13:46:32] ottomata, right now, I just created a string of directories.
[13:46:33] "mkdir -p /srv/published-datasets/archive/public-datasets/all/wp10/20170701/"
[13:46:45] Just to get a file in the right spot. It's really easy to miss.
[13:46:52] Maybe there's nothing that can be done.
[13:47:24] halfak: not sure I understand, maybe
[13:47:28] My file still isn't rsync'd >:(
[13:47:33] maybe we should rsync archive from stat1003 -> stat1006
[13:47:36] and remove it from stat1003?
[13:48:17] Sure! That would be somewhat helpful.
[13:48:52] How insane would it be to have a truly common disk between the two boxes where the shared files are ... shared.
[13:48:53] ?
[13:49:06] you mean the dreaded NFS?
[13:49:07] :)
[13:49:30] I guess. So the answer is "insane, because dread"
[13:49:45] ha, ya
[13:49:56] i had to write https://github.com/wikimedia/puppet/blob/fe9a43d4098901f00219a92efa700022565b7fbd/modules/statistics/files/hardsync.sh just to make this multi-source rsync with --delete work
[13:50:04] https://en.wikipedia.org/wiki/Judge_Dredd
[13:50:13] PLEASE NO NFS PLEASE NO NFS PLEASE NO NFS
[13:50:21] automatic answer from Luca
[13:50:22] :D
[13:50:28] lol
[13:50:43] I do dread this rsync stuff. Could it cycle faster?
[13:51:18] I've been waiting 35 minutes
[13:51:37] hm, i wonder if we could make a little push script that you could run manually
[13:51:48] Could it run every 5 minutes?
[13:52:00] Do you think that would be too much?
[13:53:13] you don't wanna push halfak? :)
[13:53:34] halfak: should I move all archive/ to stat1006 and remove from 1003?
[13:53:35] I could push. That'd be cool too :)
[13:53:44] ottomata, sure!
[13:54:03] ok, doing that first
[13:54:08] https://en.wikipedia.org/wiki/Push_It_(Salt-n-Pepa_song)
[13:54:15] * halfak trolls with wikipedia articles
[13:55:04] duh duh duh dun dun, dundundundundun
[13:56:26] (03PS4) 10Mforns: Add script to purge old mediawiki data snapshots [analytics/refinery] - 10https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034)
[13:56:46] If you could name the utility push_it_real_good.sh, that'd be cool.
[14:00:34] GoranSM: o/ - I am seeing via Yarn.w.o that you are doing hive related inserts, would you mind stopping after the current one? I'd need to restart the hive daemons and it might cause your queries to fail
[14:01:02] mforns: can we use the same tab len for all the tables? I just noticed that it varies a lot :)
[14:02:52] elukey, it's just 1 tab char
[14:03:18] sometimes it aligns to the next char, sometimes to later on, but it's only a tab
[14:03:37] we can not add more tabs, otherwise the TSV format will break no?
[14:03:44] all right, so it is gerrit showing me weird things.. I somehow confused it with nspaces tab
[14:03:50] yes
[14:03:52] super
[14:03:53] :)
[14:04:04] hehe, agree with you it's ugly
[14:04:20] ottomata, a bunch of stuff just got deleted from public datasets
[14:04:26] lol
[14:05:05] deleted?!
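A rough sketch of what the hardsync.sh script linked above does, with illustrative (not production) paths: each stat box's published-datasets tree is rsynced to the web host, and the merged /datasets tree is rebuilt from hardlinks, so no file data is copied twice. As the discussion notes, same-named files across sources conflict; with -f the last source simply wins here:
```
dest=/srv/datasets.tmp
mkdir -p "$dest"
# merge every rsynced source tree into one destination, as hardlinks (-l)
for src in /srv/published-datasets-rsynced/*; do
  cp -alf "$src/." "$dest/"
done
# swap the rebuilt tree into place, then drop the old one
mv /srv/datasets /srv/datasets.old
mv "$dest" /srv/datasets
rm -rf /srv/datasets.old
```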
[14:05:06] oh
[14:05:09] :P
[14:05:11] yea that's ok
[14:05:15] from the published ones?
[14:05:21] mforns: the code review can't be rebased on operations/production, it seems conflicting with MediaWikiInstallPingback
[14:05:22] sorry from /datasets ?
[14:05:26] Looking at https://analytics.wikimedia.org/datasets/archive/public-datasets/all/wp10
[14:05:28] yeah
[14:05:30] There should be 3 folders
[14:05:34] sorry about that, the other rsync is taking longer
[14:05:39] i deleted something else on thorium too soon
[14:05:43] didn't realize so much data was there
[14:05:48] gotcha.
[14:05:58] it'll come back, i'll force a run once it gets over to stat1006
[14:05:58] elukey, yes, we should merge the other one first, but wait, before merging those, please wait for research, sorry I forgot to tell you!
[14:06:03] PUSH IT GOOD
[14:06:05] ;)
[14:06:06] actually, halfak i can put it back real quick
[14:06:10] if you are worried about it
[14:06:14] i didn't delete it, i just moved it out of the way
[14:06:33] or we can wait...
[14:06:35] It's probably OK. Trying to share a couple datasets with someone right now but I think they already got the other one ^_^
[14:06:38] ok
[14:07:24] mforns: ahh okok, it depends on the other one and you have them one on top of the other, will wait :
[14:07:28] :)
[14:08:58] thanks!
[14:10:55] halfak: errrrrgh, pushing is going to be harder than I thought, and running more frequently won't help that much.
[14:10:58] there are two crons
[14:11:02] the rsync from the source stat box
[14:11:06] and also the hardsync script
[14:11:12] i think we shouldn't run the hardsync script too often
[14:11:21] as it basically recreates the whole /datasets/ directory every time
[14:11:33] (it doesn't copy files around, it just recreates the whole dir structure with hardlinks)
[14:11:35] 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis: Add purge info for Kartographer schema - https://phabricator.wikimedia.org/T171622#3486286 (10mforns) Hi @mpopov ! I have looked at the schema and I have one question: Is the userToken field a persistent token? Or does it expire after the session finishes?
[14:11:38] ottomata, oh. That does seem like a lit.
[14:11:41] *alot
[14:12:00] so, its pretty fast to run, but the more often we run it, the more often we might interrupt something
[14:12:02] https://en.wiktionary.org/wiki/alot
[14:12:02] liiiiiike, i dunno
[14:12:12] i dunno what happens if someone is downloading a file and hardsync runs
[14:12:24] i suppose it would keep working, since we aren't deleting the actual file
[14:12:29] hm
[14:12:42] but, i will at least make it easier to push the original stuff...
[14:12:45] Hmm... OK. Well this isn't terrible, but it is very frustrating. Maybe we should stick this on a backlog somewhere.
[14:18:46] ottomata: tried to restart hive-server but xmx didn't change..
[14:19:31] ottomata, what do you think about figuring out a way to make it easier to copy files from stat6 to 5 and just having one directory get rsync'd? Would that make the push/setup easier?
[14:21:17] maybe but that could also get confusing halfak. what if you want to delete a file from stat1006, and you don't have access to stat1005?
[14:21:22] but the files source is stat1005?
[14:21:27] you'd delete it, but then it would come back!
[14:21:32] elukey: hmMmMm
[14:21:51] I'm imagining that you'd just not be able to push datasets to the public server without stat1005 access.
[14:22:09] Or we could make stat1006 the one that hosts the One True Public Datasets Directory(TM)
[14:22:13] w/e
[14:22:41] then you'd have to manually sync your stuff to stat1006 if you want it published
[14:22:49] Right
[14:22:55] Way easier from my point of view
[14:22:56] also, not all folks that have access to stat1005 have access to stat1006 :/
[14:23:08] there's no security reason they don't, they just don't
[14:23:18] Assuming I can get the damn dataset online rather than blocking work for 30+ minutes.
[14:23:18] mostly to reduce the number of servers folks have access to
[14:29:46] brb
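A one-liner for the symptom elukey hits above ("tried to restart hive-server but xmx didn't change"): inspect the heap flags of the live JVM rather than trusting the config. Assumes a host where HiveServer2 actually runs:
```
# print the -Xmx/-Xms flags of the running HiveServer2 process
# ([H] keeps grep from matching itself)
ps aux | grep '[H]iveServer2' | tr ' ' '\n' | grep -i '^-Xm'
```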
[14:22:09] Or we could make stat1006 the one that hosts the One True Public Datasets Directory(TM) [14:22:13] w/e [14:22:41] then you'd have to manually sync your stuff to stat1006 if you want it published [14:22:49] Right [14:22:55] Way easier from my point of view [14:22:56] also, not all folks that have access to stat1005 have access to stat1006 :/ [14:23:08] there's not security reason they don't, they just don't [14:23:18] Assuming I can get the damn dataset online rather than blocking work for 30+ minutes. [14:23:18] mostly to reduce the number of servers folks have acess to [14:29:46] brb [14:41:24] elukey: Again: Unexpected end of file when reading from HS2 server. The root cause might be too many concurrent connections. Please ask the administrator to check the number of active connections, and adj [14:41:24] ust hive.server2.thrift.max.worker.threads if applicable. [14:41:24] Error: Error while cleaning up the server resources (state=,code=0) [14:41:45] I am working on it [14:41:52] told you above :) [14:42:05] O:-) I've done batches [14:43:02] elukey: Sorry, I've missed this one: GoranSM: o/ - I am seeing via Yarn.w.o that you are doing hive related inserts, would you mind to stop after the current one? I'd need to restart the hive daemons and it might cause your queries to fail [14:53:43] ottomata: I've created manually /etc/hive/../hive-env.sh with HADOOP_OPTS=-Xmx10g and it works [14:53:46] (on an1003) [14:54:03] but it is weird to configure since the file is used by metastore and server [14:54:51] hmm, [14:55:02] HADOOP_OPTS is the only way? [14:55:12] check out the hive-metastore and hive-server scripts that get launched by the init scripts [14:55:19] I did [14:55:22] hm [14:55:41] so the init.d checks out the default files [14:55:59] but it doesn't seem that it passes them to the hive script that it calls in start() [14:56:15] I thought that maybe exporting HADOOP_OPTS would have been ok [14:56:19] but it doesn't work [14:56:41] and the hive script explicitly looks for hive-env.sh [14:59:42] elukey: HIVE_METASTORE_HADOOP_OPTS doesn't work? [15:00:37] ping mforns holaaa [15:00:52] ottomata: didn't try it, wasn't mentioned in the docs... but this one shouldn't be for the server right? [15:00:52] export HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS $HADOOP_OPTS" [15:00:55] in [15:01:02] cominnnggg [15:01:05] /usr/lib/hive/bin/ext/metastore.sh [15:01:27] right for server [15:01:41] oh hm [15:01:54] server has HIVE_OPTS [15:01:59] not sure if that happens to metastore [15:02:01] hm [15:03:25] also, it looks like the init scripts should source default files [15:03:26] so [15:03:28] i think [15:03:33] in default/metastore [15:03:38] use HIVE_METASTORE_HADOOP_OPTS [15:03:48] in default/hive-server2 [15:03:52] use HIVE_OPTS [15:04:48] 10Analytics, 10Analytics-Dashiki, 10User-MarcoAurelio: Convert Extension:Dashiki to use extension registration - https://phabricator.wikimedia.org/T171884#3486461 (10Milimetric) Yes, Special:Version@meta links to https://www.mediawiki.org/wiki/Extension:Dashiki. The auto-added link to phabricator is broken... [15:06:55] yeah but $HADOOP_OPTS is also mentioned in there [15:07:11] so if it doesn't work now I doubt that HIVE_*_HADOOP_OPTS will [15:07:14] elukey: ottomata: If this helps: In step (1) I create an EXTERNAL table, then (2) I run several queries to populated (SELECT INTO). 
[15:07:51] elukey: ottomata; I don't know whether what I have just described has anything to do with the metastore problems that you're inspecting.
[15:09:00] elukey: i see HADOOP_OPTS for metastore for sure
[15:09:02] but maybe not for hive server 2
[15:09:19] ottomata: ah so you are saying that it might work for the metastore?
[15:09:33] in the server's init.d I can see exec_env="HADOOP_OPTS=\"-Dhive.log.dir=`dirname $LOG_FILE` -Dhive.log.file=${DAEMON}.log -Dhive.log.threshold=INFO\""
[15:09:40] elukey: i'm just reading the scripts that launch the process :)
[15:09:52] ah!
[15:10:01] elukey: if that is so, it looks like the init script is setting HADOOP_OPTS
[15:10:04] so you can't use it in default
[15:10:07] since it will be overwritten
[15:10:11] yeah :(
[15:10:23] but, it doesn't overwrite HIVE_METASTORE_HADOOP_OPTS
[15:10:24] probably they want people to use hive-env.sh
[15:10:29] which will get prepended
[15:10:39] i think use HIVE_METASTORE_HADOOP_OPTS and HIVE_OPTS in the respective default/ files
[15:10:46] lemme try
[15:11:31] ottomata: something like export HIVE_OPTS="$HIVE_OPTS -Xmx6144" ?
[15:11:47] ya should be fine
[15:11:51] that's for hive-server2
[15:14:35] ottomata: sure, now I see the hive-server unit working but no process in ps
[15:14:38] wtf
[15:15:02] PROBLEM - Hive Server on analytics1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hive.service.server.HiveServer2
[15:15:53] ok need to test it in labs
[15:16:03] RECOVERY - Hive Server on analytics1003 is OK: PROCS OK: 1 process with command name java, args org.apache.hive.service.server.HiveServer2
[15:21:05] 10Analytics, 10Contributors-Analysis, 10DBA, 10Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3486496 (10Nuria) Ping @Marostegui: do we have an ETA on when these wikis will be available on labs new db hosts? Is it still end of Q1? @Neil_P._Quinn_WMF T...
[15:22:46] 10Analytics, 10Contributors-Analysis, 10DBA, 10Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3486500 (10Marostegui) >>! In T165233#3486496, @Nuria wrote: > Ping @Marostegui: do we have an ETA on when these wikis will be available on labs new db hosts?...
[15:23:51] (03PS1) 10Zhuyifei1999: Add explain button, and store the connection ID when running [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368805 (https://phabricator.wikimedia.org/T146483)
[15:24:45] 10Analytics, 10Contributors-Analysis, 10DBA, 10Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3486505 (10Nuria) Excellent, so we can count on all being available by end of this quarter!
[15:25:21] 10Analytics, 10Contributors-Analysis, 10DBA, 10Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3486506 (10Marostegui) >>! In T165233#3486505, @Nuria wrote: > Excellent, so we can count on all being available by end of this quarter! We are hoping to hav...
[15:28:45] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3486549 (10mforns)
[15:28:48] 10Analytics: Pagecounts-ez not generating - https://phabricator.wikimedia.org/T172032#3486548 (10mforns)
[15:29:33] 10Analytics-Cluster, 10Analytics-Kanban: Cannot request more than 4 cores per spark executor - https://phabricator.wikimedia.org/T172018#3486551 (10mforns)
[15:30:18] 10Analytics-Kanban: Per Family Unique Devices Counts - https://phabricator.wikimedia.org/T143927#3486552 (10Nuria)
[15:31:43] 10Analytics, 10Analytics-Cluster, 10Operations, 10User-Elukey: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3486558 (10mforns) a:03Ottomata
[15:32:13] 10Analytics-Cluster, 10Analytics-Kanban, 10Operations, 10User-Elukey: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3480324 (10mforns)
[15:33:16] 10Analytics-Cluster, 10Analytics-Kanban: Cannot request more than 4 cores per spark executor - https://phabricator.wikimedia.org/T172018#3486579 (10Ottomata) a:03Ottomata
[15:33:18] 10Analytics-Kanban: Vet Analysis on June 2017 Data - https://phabricator.wikimedia.org/T171914#3486581 (10mforns) a:03Tbayer
[15:35:29] 10Analytics-EventLogging, 10Analytics-Kanban, 10Community-Tech, 10DBA, 10User-Elukey: Drop CookieBlock* tables from EventLogging DB - https://phabricator.wikimedia.org/T171883#3486618 (10mforns)
[15:37:24] 10Analytics, 10Research: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#3486622 (10mforns)
[15:37:26] 10Analytics, 10Pageviews-API: Provide weekly top pageviews stats - https://phabricator.wikimedia.org/T133575#3486621 (10mforns)
[15:38:31] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Cannot request more than 4 cores per spark executor - https://phabricator.wikimedia.org/T172018#3486627 (10Ottomata) Hm the default should be 32, not sure why you are seeing 4. Anyway, just merged ^. We'll have to wait for a cluster restart...
[15:41:26] 10Analytics-Cluster, 10Analytics-Kanban, 10Operations, 10Traffic, 10User-Elukey: Encrypt Kafka traffic, and restrict access via ACLs - https://phabricator.wikimedia.org/T121561#3486631 (10mforns)
[15:41:59] 10Analytics-Cluster, 10Analytics-Kanban: Provision new Kafka cluster(s) with security features - https://phabricator.wikimedia.org/T152015#3486635 (10Nuria)
[15:43:42] (03CR) 10Steinsplitter: [C: 031] "Looks good to me" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368805 (https://phabricator.wikimedia.org/T146483) (owner: 10Zhuyifei1999)
[15:49:46] 10Analytics, 10Easy: Fix layout of the daily email that sends pageview dataset status - https://phabricator.wikimedia.org/T116578#1752766 (10mforns) 05Open>03declined Not a great benefit
[15:52:54] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Cannot request more than 4 cores per spark executor - https://phabricator.wikimedia.org/T172018#3486700 (10EBernhardson) Not super urgent, everything certainly works now with 4 cores; I was just doing some measurements to see if there was a sweet sp...
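For reference on T172018 above: the executor core count is requested at submit time, and what was silently capping it at 4 appears (from ottomata's "the default should be 32" comment) to be a cluster-side maximum that the merged patch raises. A hypothetical submit line with illustrative values:
```
spark-submit --master yarn \
  --num-executors 16 \
  --executor-cores 8 \
  --executor-memory 4g \
  my_job.py
```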
[15:57:24] 10Analytics, 10Analytics-Cluster: Refactor webrequest_source partitions and oozie jobs - https://phabricator.wikimedia.org/T116387#3486720 (10mforns) p:05Normal>03Low
[16:07:58] 10Analytics: Provide API for sampling pageviews - https://phabricator.wikimedia.org/T126290#2010185 (10mforns) From backlog grooming meeting team discussion: It looks as if this use case is not suited for a scalable API. It would be better solved by hive queries. Now, a better way to join the webrequest data and the...
[16:08:42] 10Analytics: Improve joining mechanism between webrequest data and edit data for i.e. sampling pageviews - https://phabricator.wikimedia.org/T126290#3486775 (10mforns)
[16:10:49] 10Analytics, 10Discovery: Look into encrypting logs sent between mediawiki app servers and kafka - https://phabricator.wikimedia.org/T126494#3486778 (10mforns)
[16:10:51] 10Analytics-Cluster, 10Analytics-Kanban, 10Operations, 10Traffic, 10User-Elukey: Encrypt Kafka traffic, and restrict access via ACLs - https://phabricator.wikimedia.org/T121561#3486779 (10mforns)
[16:11:12] while HaeB is not around, does anyone on this channel know if we store section requests as part of webrequest logs? (I'm trying to figure out if there is a way for us to see how often sections are read)
[16:12:06] (03CR) 10Zhuyifei1999: [C: 032] Add explain button, and store the connection ID when running [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368805 (https://phabricator.wikimedia.org/T146483) (owner: 10Zhuyifei1999)
[16:12:22] (03Merged) 10jenkins-bot: Add explain button, and store the connection ID when running [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/368805 (https://phabricator.wikimedia.org/T146483) (owner: 10Zhuyifei1999)
[16:24:36] 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis: Add purge info for Kartographer schema - https://phabricator.wikimedia.org/T171622#3486818 (10mpopov) Based on [[ https://phabricator.wikimedia.org/diffusion/EWMV/browse/master/modules/ext.wikimediaEvents.kartographer.js;df9e69fe78f556cd4d291d5781a85a900...
[16:35:56] leila: do you mean the https://en.wikipedia.org/wiki/Fragment_identifier (hash part) of the URL? yes, that's stored in webrequest
[16:36:02] 10Quarry, 10Patch-For-Review: Add SHOW EXPLAIN support to Quarry - https://phabricator.wikimedia.org/T146483#3486857 (10zhuyifei1999) 05Open>03Resolved Should be fixed. After cleaning up browser cache there should be an "Explain" button just after "This query is currently executing...".
[16:36:05] (like https://en.wikipedia.org/wiki/Nawaz_Sharif#2016_Panama_Papers_leak vs. https://en.wikipedia.org/wiki/Nawaz_Sharif )
[16:37:03] it won't quite tell you which sections are being read beyond the linked one though
[16:37:27] i have been compiling some information about such questions here https://meta.wikimedia.org/wiki/Research:Which_parts_of_an_article_do_readers_read
[16:38:56] that makes sense, HaeB. thanks.
[16:39:29] (HaeB: yes, I meant the hashed part)
[16:54:33] (03CR) 10Nuria: [C: 031] "There must be a puppet change companion to this one that executes this code keeping 6 months back per our agreement on standup." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) (owner: 10Mforns)
[16:56:33] halfak: i'm still syncing archive stat1003/6 stuff
[16:56:33] but
[16:56:37] there is now a new script on stat boxes
[16:56:42] that you should be able to execute
[16:56:43] called
[16:56:50] published-datasets-sync
[16:56:52] if you run that
[16:56:56] it will just do the same thing the cron does
[16:56:59] (03CR) 10Nuria: [C: 031] "Have we tested this against some local database? setting up fake snapshots?" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) (owner: 10Mforns)
[16:57:11] and rsync the published-datasets dir to thorium
[16:57:16] ottomata: do you have a min to discuss what to do with hive?
[16:57:16] thorium still has to run hardsync on its own
[16:57:26] i also upped the frequency of both to 15 mins
[16:57:27] instead of 30
[16:57:30] elukey: hmmm
[16:57:32] i have 2 minutes!
[16:57:34] then have a meeting
[16:58:10] ah! ok so we can skip don't worry
[16:58:18] I wanted to find a way to use hive-env.sh
[16:58:39] elukey: why not default?
[16:58:44] i think default would be better, if possible
[16:59:16] exec_env="HADOOP_OPTS=\"-Dhive.log.dir=`dirname $LOG_FILE` -Dhive.log.file=${DAEMON}.log -Dhive.log.threshold=INFO\""
[16:59:20] su -s /bin/bash $SVC_USER -c "$exec_env nohup nice -n 0 \
[16:59:22] $EXEC_PATH --service hiveserver2 $PORT \
[16:59:24] ottomata: --^
[16:59:26] > $LOG_FILE 2>&1 < /dev/null & "'echo $! '"> $PIDFILE"
[16:59:29] sleep 3
[16:59:34] 10Analytics: Browser Reports. Break down "Other" bucket a little more? - https://phabricator.wikimedia.org/T131127#3486893 (10Nuria)
[16:59:38] I am not sure how to overcome that thing
[16:59:41] any ideas
[16:59:42] ?
[17:00:14] HIVE_OPTS seems to be for hive parameters, not jvms, adding it causes problems to default
[17:00:20] elukey: don't mess with HADOOP_OPTS
[17:00:22] oh hmmm
[17:00:26] hahahaah
[17:00:31] does the metastore one work though?
[17:00:35] HIVE_METASTORE_HADOOP_OPTS?
[17:00:39] that one should work
[17:00:51] didn't test since I didn't want to make everything explode, but it starts like the above
[17:00:59] so I am not sure if putting HIVE_METASTORE_HADOOP_OPTS in default works
[17:01:05] elukey: it should work
[17:01:15] if i am reading the scripts correctly
[17:01:29] read the stuff in /usr/lib/hive/bin and /usr/lib/hive/bin/ext
[17:01:36] milimetric: I think a good one for the svds team will be the browser work we talked about no? (cc mforns_brb ) splitting underlying tables to have more precise data
[17:01:43] see what env vars they expect to have set, and when they get set or overwritten
[17:02:00] nuria_, milimetric, makes sense
[17:02:09] ottomata: sure but are they passed in the weird code pasted above? I am a big confused
[17:02:16] *bit
[17:02:35] elukey: they are env vars
[17:02:38] they don't need to be 'passed'
[17:03:30] ottomata: sure, I am asking since I saw the /bin/bash with the -c command and variables, HADOOP_OPTS should be passed as well
[17:03:41] ya it is
[17:03:46] but for metastore
[17:03:51] a yes
[17:03:53] but it is OVERWRITTEN
[17:03:55] in the init script
[17:03:59] so whatever you set in default
[17:04:00] won't take
[17:04:07] but HIVE_METASTORE_HADOOP_OPTS is not overwritten
[17:04:19] and is used in the metastore.sh script
[17:06:44] it seems to me that hive-env.sh would be the preferred way, but we can try
[17:06:54] we use it for all the other daemons
[17:07:06] we do?
[17:07:59] hmm, maybe you are right
[17:08:08] yeah like /etc/hadoop/conf.analytics-hadoop/hadoop-env.sh
[17:08:14] ok elukey, and we don't have hive-env.sh puppetized, right?
[17:08:16] elukey: ok
[17:08:17] +1
[17:08:20] yeah looking too
[17:08:46] ottomata: buuuut now there is an issue! 1 hive-env.sh and two daemons and puppet classes (metastore/server)
[17:09:01] this is what I wanted to ask you :)
[17:17:08] elukey: if we can use different vars in the same config file
[17:17:10] that seems fine
[17:17:17] so HIVE_METASTORE_HADOOP_OPTS should work in hive-env.sh
[17:17:25] and, maybe HIVE_OPTS for hive-server2?
[17:17:27] not sure about that one
[17:17:31] actually
[17:17:34] if you are doing it in hive-env.sh
[17:17:40] you could use HADOOP_OPTS
[17:17:47] but that gets tricky because more than hive-server2 might use it?
[17:18:47] ottomata: exactly, both would use it
[17:19:04] plus what class should define the hive-env.sh file? metastore, server?
[17:19:18] I think I got why the defaults are used :)
[17:19:42] HIVE_OPTS seems not to be working so I'd rather not use it
[17:19:53] (seems more related to what hive as a program expects)
[17:20:10] but! We can use the SERVICE variable
[17:20:19] it gets set with metastore or hiveserver2
[17:20:34] problem is what class should be responsible for hive-env.sh
[17:20:52] haha
[17:21:02] elukey: the same class that is responsible for hive-site.xml
[17:21:19] hive.pp was my next proposal :)
[17:21:35] ya
[17:21:46] I like it, changing puppet :)
[17:22:13] elukey: the command that actually launches the hive server process is
[17:22:15] exec $HADOOP jar $JAR $CLASS $HIVE_OPTS "$@"
[17:22:18] so i think using HIVE_OPTS should work
[17:22:30] HMM
[17:22:34] actually
[17:22:34] no
[17:22:45] that won't work, because multiple SERVICEs (like beeline) use that
[17:22:48] ERGH
[17:22:51] this is why default is better :p
[17:23:08] elukey: i don't see any reason why HIVE_OPTS in default wouldn't work
[17:30:16] ottomata: for some reason I didn't see any process with ps aux but the unit looked fine
[17:30:19] no weirdness in the logs
[17:30:38] let's stick with the suggested way of the docs
[17:53:50] ottomata: first attempt in https://gerrit.wikimedia.org/r/#/c/368820/1, I kinda like it but probably we can bikeshed if you don't. Going afk now, will work on it tomorrow :)
[17:54:10] (when we merge the module we also need to tune some values in hiera)
[17:54:18] ohhh if SERVICE
[17:54:21] cool
[17:54:24] reading, still in meeting though...
[17:55:08] sure sure! Will check tomorrow :)
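A hedged sketch of what the hive-env.sh approach in elukey's first attempt (gerrit change 368820 above) could look like: branch on the SERVICE variable, which per the log gets set to `metastore` or `hiveserver2` by the hive launcher, so one file can give each daemon its own heap. Since hive-env.sh is sourced after the init script sets HADOOP_OPTS, appending here takes effect; the Xmx values are illustrative, not the production ones:
```
# /etc/hive/conf/hive-env.sh (sketch; SERVICE is set by /usr/lib/hive/bin/hive)
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -Xmx4096m"
elif [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -Xmx6144m"
fi
```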
[17:55:11] byyyeee o/
[17:55:28] k
[18:34:19] 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis: Add purge info for Kartographer schema - https://phabricator.wikimedia.org/T171622#3487205 (10mforns) Thanks @mpopov, then I think all the fields in the schema can be white-listed and kept indefinitely (except from EventCapsule's userAgent). Will write a...
[18:38:11] 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis: Add purge info for Kartographer schema - https://phabricator.wikimedia.org/T171622#3487212 (10mforns) Oh, wait... @mpopov, the talk page of this schema says "auto-purge after 90 days". I was assuming you wanted to keep the data for longer. Otherwise, the...
[18:50:41] 10Quarry: Weird race condition makes query stuck in queued forever - https://phabricator.wikimedia.org/T172143#3487252 (10zhuyifei1999)
[19:06:29] 10Quarry, 10Data-Services: Long-running Quarry query (querry?) produces strangely incorrect results - https://phabricator.wikimedia.org/T135087#3487373 (10zhuyifei1999)
[19:12:00] Thank y'all so much for hue.wikimedia.org -- it is such a pleasure to write and run hive queries in that interface.
[19:13:04] nice!
[19:13:11] cool didn't know anyone was using it for that bearloga
[19:13:40] how do you think it compares to using jupyter notebook with hive (which I think you can sorta kinda but not really do, so not sure if you've tried that)
[19:13:59] Hi a-team :) A quick hello from holiday-man :)
[19:14:19] heeeelloooooooo holiday-man :D
[19:14:38] hi joal! :)
[19:14:44] how's things?
[19:15:03] Well, except from having a second baby, holidays are awesome :-P
[19:15:39] Naa, joking - everything fine, it's great to have time with Naé and Lino :)
[19:16:23] Naé is gently awakening to the world, and Lino continues to grow :)
[19:17:12] How is it in here? What have I missed?
[19:17:56] joal: Welcome back!
[19:18:27] not yet back GoranSM :)
[19:18:33] Just taking some news :)
[19:18:55] 8-) Enjoy the rest of the summer then
[19:20:33] Thanks GoranSM :)
[19:20:43] GoranSM: Should be back mid-august
[19:21:52] ottomata: I don't think hue can be compared to jupyter. In fact, I write my queries in Hue since it's so easy to iterate + it has autocomplete and once I'm satisfied, I copy the query elsewhere (e.g. our Golden metrics calc codebase, a Jupyter notebook, a RMarkdown report)
[19:22:12] nice
[19:24:56] Anyone: is there *any* way of having mySQL tables created from production, say, in the staging databases, and then replicated on labsdb?
[19:25:49] GoranSM: in general, no. The problem is there is a lot of sensitive information in the dbs so the only way to get dbs from prod to labs is via sanitarium and the standard labs replicas
[19:32:43] ebernhardson: (1) I don't know what a sanitarium is, (2) can a custom set of tables from staging be introduced to standard labs replica (I guess no, but in case I'm developing something that will be used for a long time, by many interested parties..?)
[19:35:21] joal, everything cool here :] missing you as well :D
[19:35:39] * joal hugs mforns
[19:35:48] :]
[19:38:31] GoranSM: (1) wikitech.wikimedia.org is your friend, search for sanitarium. In the end the tl/dr is it's the process that copies dbs from prod to labs and redacts sensitive information. (2) I'm not sure which staging you mean, typically all staging is done outside of production in labs already.
[19:39:33] elukey: ottomata; beeline problems from stat1005 with my queries seem to be resolved (processing in batches now). Thanks for support.
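A sketch of the batching GoranSM mentions here, with hypothetical file, database, and table names: split the value list behind the huge `WHERE ... IN (millions-of-things)` into chunks and run one query per chunk (ids are assumed numeric; string values would need quoting):
```
split -l 10000 item_ids.txt chunk_
for f in chunk_*; do
  ids=$(paste -sd, "$f")   # "123,456,789,..." for this chunk
  beeline -u 'jdbc:hive2://analytics1003.eqiad.wmnet:10000' -e "
    INSERT INTO TABLE goransm.wdcm_usage_batched
    SELECT item_id, wiki, COUNT(*)
    FROM goransm.wdcm_usage
    WHERE item_id IN (${ids})
    GROUP BY item_id, wiki;"
done
```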
[19:40:27] ebernhardson: "Staging" is the name of the MariaDB database on m2 (I think) where researchers/analysts can create their own tables.
[19:40:39] gr8 :)
[19:41:20] GoranSM: you want to create a one off table? or something that updates regularly?
[19:41:34] i doubt you'll be able to do as you say: make a table and have it 'replicated' to labsdb
[19:41:54] but, if it is a one off, you can probably make the data available in labs somehow, maybe on toollabs?
[19:42:05] I don't have much experience there
[19:42:27] ebernhardson: ottomata: Background: I'm running quite voluminous Hadoop search queries, HiveQL, from production, and store the results in hdfs (part of developing a system to track/analyze Wikidata usage across hundreds of sister projects).
[19:43:10] ebernhardson: ottomata: I have Shiny Dashboards running from Labs, supported by their own MariaDB back-end on labsdb. Task: migrate results from hdfs on production to labsdb so that my Dashboards can access them.
[19:44:12] ebernhardson: ottomata: The current design is: produce .tsv files on production, go to /srv/published-datasets (previously: /a/published-datasets), download from the labs instance. It can be done, but it sounds ridiculous.
[19:46:14] ottomata: So yes, something that updates regularly.
[19:47:14] 10Analytics, 10Contributors-Analysis, 10DBA, 10Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3487492 (10Neil_P._Quinn_WMF) >>! In T165233#3486496, @Nuria wrote: > @Neil_P._Quinn_WMF Team agreed to only do prod snapshots ad hoc as we hope they are not...
[19:47:30] GoranSM: I recommend that you put a 1 pager together of the system you are trying to create and ping us before you start coding so we can advise on the best way to accomplish your project before coding starts
[19:48:50] 10Analytics, 10Contributors-Analysis, 10DBA, 10Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3487509 (10Nuria) @Neil_P._Quinn_WMF we are going to be short 2 people in the team in the upcoming days, i cannot promise we can get this done in the next cou...
[19:49:28] nuria_: Working on it. Much more than a one-pager, full technical documentation, but for your team I will deliver a one-pager and explain the critical steps.
[19:50:19] nuria_: The design of the system is, of course, fully modular, so the coding has been under way for some time (months) already. It can always be adapted for new data management solutions, that's the way it has been developed since the beginning.
[19:52:04] nuria_: A prototype is here: http://wdcm.wmflabs.org/WDCM_Dashboard/ (this will be decomposed into several dashboards in the future).
[19:57:29] GoranSM: i don't know how shiny works, but you could host your dashboard on analytics.wikimedia.org, if it is just an html page, etc.,
[19:57:39] by putting in published-datasets
[19:58:04] GoranSM: I think getting feedback on your design before you start coding seems wise
[19:58:31] GoranSM: my 1st question would be why dashboards need to be hosted on labs
[19:59:22] nuria_: Certainly, and do not worry: I am well aware of the fact. BTW thanks for asking: why do dashboards need to be hosted on labs? Is there a place where I can host them so that they can be deployed from production directly?
[20:00:05] 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis: Add purge info for Kartographer schema - https://phabricator.wikimedia.org/T171622#3487538 (10mpopov) >>! In T171622#3487212, @mforns wrote: > I think all the fields in the schema can be white-listed and kept indefinitely (except from EventCapsule's user...
[20:00:53] ottomata: For Shiny to run, you need the RStudio Shiny Server installed. That's the first step. However, developing Shiny without RStudio software is close to impossible, so then we would need to have RStudio Server running on production too. I never wanted to ask for something like that because it all sounds too demanding for the analytics people who already have too much on their plates.
[20:00:57] GoranSM: yes, discovery has shiny dashboards built from analytics data deployed in prod. You might be able to steal some of bearloga's time to find out how that's done
[20:01:36] ebernhardson: I know those Dashboards, but I had no idea that Mikhail runs them from production. Thanks.
[20:01:40] GoranSM: +1, i would invite our data analysts to contribute to your doc
[20:01:43] ebernhardson GoranSM: I'm wrapping up a wm blog post explaining our puppetized dashboards and introducing the shiny server puppet module
[20:02:14] GoranSM: an easy way to get prompt feedback so you don't need to undo your work is to ask for early design reviews and CRs
[20:02:55] GoranSM: and bearloga is the person whose work is most similar to yours
[20:03:30] nuria_: Yes, but bearloga has already provided support to me, I can't go and e-mail him every now and then.
[20:04:00] GoranSM: Code reviews are done via gerrit, are you familiar with this tool?
[20:04:13] GoranSM: https://gerrit.wikimedia.org
[20:04:26] nuria_: Gerrit - not yet. I know about it, but I'm still on GitHub.
[20:04:48] nuria_: Let me have that one-pager written out for your team. And thanks for suggesting that.
[20:05:05] GoranSM: well, that would be something to address, you can have some of the developers help you with that, it is standard developer configuration
[20:06:01] nuria_: of course, addshore is the person who usually introduces me to such things.
[20:06:27] GoranSM: I would get familiar with gerrit, w/o it i doubt you will be able to ship any code to prod, some of our team docs: https://wikitech.wikimedia.org/wiki/Analytics/Team/Onboarding#Gerrit
[20:06:39] GoranSM nuria_: also https://meta.wikimedia.org/wiki/Discovery/Analytics#Version_Control
[20:06:41] GoranSM: but to be clear that is the tool used by the mediawiki community
[20:07:21] nuria_: bearloga: thank
[20:07:24] *thanks
[20:07:39] nuria_: Does the fact that the tool is used by the mediawiki community change anything?
[20:08:20] GoranSM: I've recently updated that page since you may have last seen it when you were first getting onboarded, so check if maybe there's new stuff there that could be helpful
[20:08:49] bearloga: Ok
[20:09:26] GoranSM: I recommend that you work with addshore reviewing your development practices and once you are familiar with tools and processes you ping us again, getting familiar with gerrit and how to get code reviews would be very helpful
[20:10:20] nuria_: That is acknowledged (and necessary, as well; I understand nothing gets deployed here without gerrit)
[20:19:41] 10Analytics, 10Analytics-Dashiki, 10User-MarcoAurelio: Convert Extension:Dashiki to use extension registration - https://phabricator.wikimedia.org/T171884#3487549 (10MarcoAurelio) @Milimetric Thank you very much. You've solved my question. Best regards.
[21:17:41] 10Analytics, 10Contributors-Analysis, 10DBA, 10Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3487721 (10Neil_P._Quinn_WMF) >>! In T165233#3487509, @Nuria wrote: > @Neil_P._Quinn_WMF we are going to be short 2 people in the team in the upcoming days, i...
[21:28:00] mforns: btw, I moved the patch to github
[21:28:18] gwicke, ok
[21:28:26] the primary for RB is there, gerrit is just a mirror
[21:30:38] mforns: I am kind of wondering if it still makes sense to add a global default license note
[21:30:40] 10Analytics-Kanban, 10Analytics-Wikistats, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): Fix Wikistats build in Jenkins - https://phabricator.wikimedia.org/T171599#3487815 (10hashar) I am most probably going to upgrade npm to version 3 this Tuesday. The blocker was to expli...
[21:31:22] gwicke, I see
[21:31:51] Something with "Unless specified in the entrypoint documentation, content available in this API is licensed under CC-by...
[21:32:06] gwicke, yea sounds good to me
[21:32:45] kk, let's wait to see if Zhou is okay with that as well
[21:32:58] gwicke, cool :]
[21:33:00] can then keep the general doc section fairly uniform across projects
[21:33:09] aha
[22:05:18] Should I expect this to fail? > awight@stat1006:~$ mysql -h analytics-store.eqiad.wmnet -A
[22:07:06] ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 2 "No such file or directory"
[22:07:16] Looks like it's trying to connect to the wrong FIFO socket
[22:10:41] nuria_: One-pager on WDCM is ready and shared.
[22:14:30] lol and now mysql works. /me smudges sage to exorcise evil spirits
[22:16:27] nope, broken again. original question stands...
[22:18:35] but my FIFO guess doesn't explain intermittent failures
[23:34:21] What's up with s3-analytics-slave?
[23:34:30] It just took 4 seconds to respond to a DESCRIBE query on a smallish table
[23:35:24] DESCRIBE enwiki.page; just took 15 seconds
[23:43:54] awight: It seems stat1006 is just broken right now, queries take forever there. stat1003 is fine
[23:44:17] RoanKattouw: ty for letting me know I'm not the only crazy one
[23:44:33] It works about 1/6 of the time for me.
[23:44:57] query performance seems fine for what I'm doing, though (~1 minute)
[23:46:51] I complained on the ops list thread where people were told to migrate from 1003 to 1006
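A small probe for the intermittent failures awight and RoanKattouw describe, to separate connection-level errors from slow queries; the host name is from the log, and the attempt count and sleep are arbitrary:
```
for i in $(seq 1 20); do
  if ! mysql -h analytics-store.eqiad.wmnet -e 'SELECT 1;' >/dev/null 2>&1; then
    echo "attempt $i failed at $(date +%T)"
  fi
  sleep 5
done
```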