[00:55:23] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2596549 (Jdforrester-WMF) Please don't kill the scripts for good, but it's OK to disable them for a bit. [01:42:41] Analytics, Pageviews-API, Wikipedia-iOS-App-Backlog, iOS-app-v5.2.0-Honey: filter suspicious TV channels pageviews from Top Read - https://phabricator.wikimedia.org/T144333#2596593 (JMinor) [01:43:46] Analytics, Pageviews-API, Wikipedia-iOS-App-Backlog, iOS-app-v5.2.0-Honey: filter suspicious TV channels pageviews from Top Read - https://phabricator.wikimedia.org/T144333#2596606 (JMinor) p:Triage>High [01:52:09] Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2596611 (Huji) Now it works fine. Could you please submit the patch that fixed it here before you close the task? [01:52:18] Analytics, Pageviews-API, Wikipedia-iOS-App-Backlog, iOS-app-v5.2.0-Honey: filter suspicious TV channels pageviews from Top Read - https://phabricator.wikimedia.org/T144333#2596613 (JMinor) For previous discussion of the Top 25 exclusions and our band-aid solution in the iOS client see https://ph... [02:25:45] Analytics, Operations, LDAP: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2596648 (Peachey88) [02:37:04] Analytics, Operations: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2596656 (Tbayer) [02:37:57] Analytics, Operations: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2596371 (Tbayer) @Peachey88 : As explained in the task description, this issue is specifically *not* about LDAP. [05:22:49] Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2596738 (yuvipanda) I, uh, just restarted redis and flushed out the db :| when bringing it back up after an outage earlier I had started the wrong redis instance... I really should move Quarry... [06:57:49] jdlrobson: Hi ! [06:58:44] jdlrobson: You have many requests running in parallel on the cluster [06:59:12] jdlrobson: While they might finish at some point, resource oversharing is not helping them [06:59:48] jdlrobson: Since each of them is quite big, best practice would be to run them sequentially [07:00:36] there's an icinga alert for the filled-up root partition on stat1001 [07:00:45] Hi moritzm [07:00:48] from jdlrobson ? [07:01:45] the directory /var/www/limn-public-data is 13G big and not under /srv (as aggregate-datasets and public-datasets) [07:02:02] moritzm: ok [07:02:05] (of the 27G root partition) [07:02:19] not sure how recent that is, though [07:02:46] moritzm: I'll take a look, and see if there is anything I can help with [07:04:09] ok moritzm, I know what that is [07:04:27] moritzm: give me a minute triple check communication has not been made [07:04:31] I dropped the apt cache and an unused kernel, but we're still at 0 (the kernels does a bit of overcommitment) [07:04:37] joal: ok [07:05:18] moritzm: nuria has synced a new folder yesterday (or this morning): /var/www/limn-public-data/caching [07:05:25] This represents most of the 13g [07:05:40] ack, that's it [07:05:41] moritzm: This data is available on HDFS, can you please delete the folder [07:05:57] moritzm: I'll communicate with the team [07:06:17] sure, I can also only drop one of the files, though? [07:06:23] it's three files of 4 GB each [07:06:34] moritzm: let's delete all [07:06:39] ok [07:06:47] moritzm: this issue means the approach taken by nuria doesn't work :) [07:06:52] We should find another place [07:07:03] there's plenty of space under /srv [07:07:14] I need to drop for a minute - will be back shortly [07:07:15] 3T remaining [07:07:18] ok [07:07:25] moritzm: 3T remaining ???? [07:07:33] oh yues didn't notice the /srv [07:07:38] great [07:07:57] Thanks moritzm for the heads up, I don't receive cinga alerts for this machine [07:08:12] it's not alerting, it's just a passive check [07:08:32] hopefully we'll be able to provide access to the icinga UI at some point [07:08:57] so that you could have a look at all stat* hosts, e.g. [07:09:11] but the current setup is very unflexible, we're looking into a replacement [07:40:14] o/ [07:41:06] thanks moritzm! [07:42:11] wow weird oozie emails joal, but at least no data errors [07:42:25] for some reason it's now alerting again, but this time for inode usage: DISK CRITICAL - free space: / 50 MB (0% inode=93%): [07:42:52] going to take over and clean it up [07:43:58] mmmm it is back to 100% disk space used [07:45:17] ahhh 13GB in /home [07:45:47] oh my [07:45:52] the winner is... /home [07:45:54] no, the files in /var/www/linmn-public-data/caching are back [07:46:14] moritzm: yes but there is also a duplicate of the home dirs [07:46:20] so 13GB + 13GB [07:46:24] what the heck, I removed these [07:46:42] maybe it is one of the crons syncing data [07:46:45] from other stats [07:46:58] I am almost sure that Nuria uploaded the files somewhere else [07:47:04] and they are rsyncing [07:49:24] so we have something as weird as elukey@stat1001:/home/home/home$ [07:49:40] with ori's home (~3GB) repeated multiple times [07:50:00] and I suspect that I am the one to blame, because I re-imaged months ago the machine [07:50:09] or somebody else messed up with homes [07:50:14] but most probably it is me [07:52:00] !log removed /home/home/home dir from stat1001 to free space [07:52:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [07:52:45] now /home/home is weird [07:53:05] ls -lt shows dirs owned by users up to 2nd of May (IIRC when I reimaged) [07:53:16] but also other weird dirs dated Aug 14th? [07:54:09] ah seems old users [07:54:49] !log removed /home/home dir from stat1001 to free space [07:54:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [07:55:22] goood now free space down to 70% [07:55:24] all good [07:57:48] I think it also makes sense to out linmn-public-data out of the /var partition towards /srv (which has 3T of free disk space) [07:58:02] the other two dirs in /var/www are already symlinked [07:58:05] to /srv [08:02:05] yes it does, I'll have a chat with mr ottomata today [08:20:19] I'm back [08:20:25] Thanks elukey and moritzm ! [08:23:26] moritzm: Email sent to the team [08:24:22] 10:24 PROBLEM - Disk space on stat1001 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=93%) [08:24:25] ahahah [08:24:26] checking [08:24:58] elukey: I assume the rsync is managed by puppet having a cron ... [08:25:41] ah yes but it is not 22GB [08:25:48] I mean the /var/www [08:25:49] mmmm [08:26:02] I thought they were less [08:26:03] sigh [08:31:59] elukey: weird oozie messages are because load jobs are taking longer than expected [08:32:28] elukey: This is due to cluster a bit overwhelmed by user queries (see my comments earlier to jdlrobson) [08:33:07] ahhh okok makes sense [08:33:09] poor oozie [08:33:14] yeah [08:33:38] Well I mean, not really elukey - With what it makes us endure, I think it's a normal return :-P [08:33:40] but at least it seems that the consistency errors are gone [08:33:43] :D [08:33:46] Yay [08:33:50] ahhaah [08:41:53] all right so stat1003 /srv/reportupdater/output$ dir contains the new caching dir that Nuria created [08:42:00] that gets rsynced to limn-publicdata [08:48:49] all right so to unblock the situation for the moment I'll copy the caching dir to my home [08:48:59] and then I'll delete it on stat1003 [08:49:08] even if I am not sure how it gets published in there [08:49:09] elukey: hm [08:49:09] mmmm [08:49:23] elukey: problem is on stat1001, not 3 [08:49:48] joal: I know :) [08:49:55] elukey: I think the rsync should go to a /srv folder, then we can manually (or puppet) a symlink from /var/www [08:49:56] the dir on stat1003 gets rsynced on stat1001 [08:50:33] yes but I want to restore functionality and then fix the issue with some calm [08:50:49] so [08:50:50] elukey: I don't get it then [08:51:08] what is your doubt? [08:51:14] I don't uderstan [08:51:21] copying the data in your homen [08:51:37] I am not sure if nuria has it somewhere [08:51:51] elukey: I'm pretty sure it's oin hdfs [08:52:10] elukey: But, best would be to comment the puppet cron for the moment [08:52:31] elukey: no communication has been made on that data eing available yet [08:53:51] well we could just move the dir somewhere else [08:53:57] outside the scope of the rsync [08:54:05] then it will be automatically deleted on stat1001 [08:54:11] might be better [08:54:13] elukey: as you wish [09:10:22] Analytics-Tech-community-metrics: Deployment of Gerrit Delays panel for engineering - https://phabricator.wikimedia.org/T138752#2597638 (Qgil) Loading the page takes a while, but ok. It is about delays after all. ;) There are metric about MERGED and ABANDONED changesets there. I would expect to see only me... [09:10:36] !log Moved stat1003:/srv/reportupdater/output/caching to /home/elukey/caching as temporary measure to free space on stat1001 [09:10:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [09:11:15] Analytics-Tech-community-metrics: Deployment of Mediawiki panels - https://phabricator.wikimedia.org/T138006#2597639 (Qgil) "No results found". Is this expected? [09:11:45] Analytics-Tech-community-metrics: Deployment of Demography panel - https://phabricator.wikimedia.org/T138757#2597640 (Lcanasdiaz) >>! In T138757#2597618, @Qgil wrote: > Is this about the "[[ https://wikimedia.biterg.io/app/kibana#/dashboard/Git-Demographics?_g=(refreshInterval:(display:Off,pause:!f,value:0),... [09:12:44] mforns: you there? [09:13:45] in two minutes the /usr/bin/rsync -rt /srv/reportupdater/output/* stat1001.eqiad.wmnet::www/limn-public-data/ sync will run and the space will be freed [09:13:48] BUT [09:13:59] I have no idea how the caching data got there [09:14:12] since there are other crons of report updater to create stuff [09:14:26] Analytics-Tech-community-metrics: Deployment of Mediawiki panels - https://phabricator.wikimedia.org/T138006#2597641 (Lcanasdiaz) >>! In T138006#2597639, @Qgil wrote: > "No results found". Is this expected? No :-/ . Working on it .. [09:15:15] but afaiu nuria copied it manually [09:15:58] Analytics-Tech-community-metrics: Deployment of Gerrit Delays panel for engineering - https://phabricator.wikimedia.org/T138752#2597643 (Lcanasdiaz) >>! In T138752#2597638, @Qgil wrote: > Loading the page takes a while, but ok. It is about delays after all. ;) > > There are metric about MERGED and ABANDONE... [09:18:09] !log deleted /var/www/limn-public-data/caching on stat1001 to free space [09:18:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [09:21:14] 11:19 RECOVERY - Disk space on stat1001 is OK: DISK OK [09:27:18] joal: completed the clean up work for port 7000 between Cassandra and hadoop FYI [09:27:30] I am running puppet on aqs100[123] not [09:27:32] *now [09:27:42] if you see anything weird let me know [09:32:21] Analytics-Tech-community-metrics: Deployment of Mediawiki panels - https://phabricator.wikimedia.org/T138006#2597670 (Aklapper) "No results found" should get fixed in the next days; data gathering still in progress afaik. [09:39:06] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2597678 (elukey) >>! In T116206#2595478, @bd808 wrote: >>>! In T116206#2582429, @elukey wrote: >> Thanks for reporting, this is my bad since analytics_hadoop_hosts... [09:50:37] also ready to rolling restart aqs100[123] to check performances joal [09:50:57] k elukey [09:57:14] elukey: question: It seems I don't have access to druid100[123] machines - Is that normal? [09:58:40] if andrew wants to mess with you yes :P [09:58:49] huhuhu :D [09:59:03] kidding, probably he didn't add the admin group in puppet [09:59:05] let me check [10:08:21] yes.. going to figure out where it is best to put the admins data [10:08:46] there is a hieradata/eqiad/druid.yaml but not sure if hieradata/role/druid would be better [10:10:24] elukey: I can't say ... I don't think we're gonna have druid clusters in other DCs [10:12:12] yeah but most of the hiera config for admins is in role so I am going to stick with the convention.. checking the druid role now [10:20:55] joal: you should be able to access now [10:21:26] I added analytics-admins/roots to the druid hosts [10:21:50] elukey: Awesome, testing [10:22:06] elukey: Working :) [10:22:11] elukey: Thanks a lot mate ! [10:23:32] now I am wondering if this should have been an access request [10:23:33] mmmm [10:25:10] ah joal I'd need to revert the change [10:25:13] for two reasons [10:25:23] 1) analytics-roots gives full sudo access to analytics [10:25:37] 2) analytics-admins gives sudo for oozie/hive/etc.. [10:25:43] that are not there :) [10:25:52] elukey: generates failures ? [10:25:55] so the best thing would be to create a druid-admins users [10:26:04] with correct sudo permissions [10:26:06] k [10:26:08] and go through access [10:26:09] ok? [10:26:17] I'll help you if you need anything with access [10:26:22] elukey: I don't need druid sudo, just needed to access the host [11:38:07] joal: I restarted only aqs1001 till now since I say compactions ongoing, but look what happened to latency [11:38:57] elukey: hehe [11:43:07] elukey: what is grafana dashboard showing an instance system metrics? [11:43:13] elukey: I can recall it :( [11:43:25] elukey: must be the tenth time I ask... sorry [11:44:07] server-board? [11:44:28] YESSSSS ! [11:44:31] Thanks :) [11:44:35] :) [11:46:41] all right cluster restarted [11:46:44] all good [11:46:52] I am going to lunch and then I'll double check metrics [11:46:52] great, thanks elukey [11:46:55] k [11:47:16] this one is very interesting https://grafana.wikimedia.org/dashboard/db/aqs-cassandra-system?panelId=7&fullscreen [11:47:53] elukey: just got an idea, we'll discuss that before standup [11:49:09] sure [12:07:47] taking a break a-team, see you in a bit [12:09:06] hallo [12:09:46] I'm trying to run a query of webrequest using beeline on stat1002, and it doesn't seem to do anything after [12:09:53] Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1472219073448_14336 [12:10:31] usually it types map reduce status lines, and takes up to minutes, but now it has been stuck for a much longer time [12:16:52] aharoni: hello! I think that the cluster is a bit overloaded atm, this might explain the problem [12:17:20] elukey: OK, I'll wait patiently [12:17:22] thanks [12:18:31] thank you! :) [12:41:43] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2597981 (elukey) During the last analytics ops sync we decided to test a rolling restart of cassandra on aqs100[123] to double check if last month's performance improvements (latency going dow... [12:49:46] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2597991 (elukey) Read latency followed the same pattern: {F4419667} [13:00:20] ok IOPs for the raid arrays drop right after the cassandra restart [13:00:34] causing a drop in read latency and also in response time [13:24:42] urandom: hi! If you have time I'd need a cassandra consult about --^ [13:25:00] the above phab task contains also some data [13:25:27] this is really weird [13:25:38] but there are 1000 things to check :D [13:26:09] the only big trace left by Cassandra seems to be the disk IOPs [13:51:19] mforns: btw, try not casting to string in the *HistoryRunner sql queries, the sqoops should be already cast [13:51:32] if they're not, I've gotta sqoop with the latest code maybe [13:51:45] (which I should do anyway so we have fresher data when Erik looks) [13:58:12] milimetric: o/ [14:00:56] elu hiii [14:01:01] why no analytics-admins/roots on druid hosts? [14:01:03] i would've done that too [14:02:21] hiiiiiii! [14:02:43] so I started to have tons of doubts [14:02:54] 1) analytics-roots gives full sudo afaiu [14:03:16] 2) analiytcs-admins gives sudo for stuff not running on druid IIRC (oozie, hive, etc..) [14:03:22] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2598171 (DarTar) I paused all the cronjobs for the ee-dashboards and will help @HJiang-WMF and @Milimetric cherry-pick those that need t... [14:03:26] and 3) do we need an access request? [14:03:36] so I reverted waiting for you :) [14:03:45] Analytics, Analytics-EventLogging, DBA: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2598172 (DarTar) [14:04:43] Analytics, Analytics-EventLogging, DBA: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2598178 (jcrespo) Can I kill already-running jobs? [14:05:27] ottomata: do we need to change puppet to rsync correctly for limn-public-data ? [14:05:43] elukey: don't think so, there is a symlink [14:05:55] elukey: so, analytics-roots gives full sudo [14:05:57] that is its intention [14:06:09] analytics-admins does give sudo perms for stuff that isn't ther,e that's true [14:06:15] so maybe druid-admins makes sense [14:06:26] perhaps analytics-admins should be called hadoop-admins, dunno [14:06:26] but [14:06:30] i think that's a little annoying [14:06:44] yeah :( [14:06:47] generally we want the folks in analytics-admins to be able to do admin stuff on all analytics cluster boxes [14:07:08] puppet doesn't have the ability to not grant the sudo perms if they don't make sense, but they also don't hurt to have on the box [14:07:08] well not full sudo [14:07:15] no? [14:07:30] (I trust you guys just asking, don't look me in a bad way :P) [14:07:33] analytics-roots is sudo [14:07:34] right? [14:07:43] full sudo [14:07:46] yeah it gives root perms basically [14:07:58] yeah [14:08:08] haha, oh but nobody is in that group :p [14:08:16] ahahhaha [14:08:40] elukey: i'd add analytics-admins to that node, joseph just needs shell access there [14:08:50] and it doesn't actually give any permissions yet [14:09:03] or hm [14:09:08] maybe druid-admins would be better [14:09:11] elukey: ahh the thing is [14:09:26] i think we are still a little waffly about whether or not druid will be here for the long run. I feel about 90% about it, which is pretty good [14:09:44] and i guess we can always change group names later, buuuut, its kinda nice to just have a catch all analytics-admins that is useful [14:09:47] it will be the same people [14:13:18] ottomata, elukey: In fairness, I don't mind having hive sudo in places where there is no hive (but I understand if you tell me it's bad nonetheless) [14:13:37] elukey: As said before leaving, I have an idea for cassandra restart improvements [14:13:49] me too! [14:13:50] :D [14:14:16] anyhow, I think that the best procedure would be to create either a "real" analytics-admins or just a druid-admins [14:14:19] elukey: All In ! I go first ;) [14:14:24] and then submit sudo request to ops [14:14:42] BUT I am always the picky one so don't pay attention to me :P [14:15:57] elukey: there is a real analtyics-admins [14:15:59] you mean real analytics-roots [14:15:59] ? [14:18:51] ottomata: didn't we say that analytics-admins is in reality hadoop-admins? [14:19:11] oh [14:19:16] i mean, right now it is i guess ja [14:19:22] sudo capability wise I meant sorry :) [14:19:40] yes, but i would probably just expand its meaning [14:19:43] i'm fine with either [14:19:51] but it seems convenient and fine to me to have it mean more than just hadoop-admins [14:20:24] yep agreed.. a lot clearer [14:20:31] we could make a little refactor [14:20:57] wait, haha, i'm suggesting we leave it as analytics-admins [14:21:02] and just expand what it can do [14:21:27] so, i think you should just add analytics-admins to druid nodes for now. since folks don't actually need any special druid perms atm. [14:21:33] they just need access to the boxes [14:24:35] that would require a phab task :P [14:26:11] re: analytics-admins - does it make sense to have only one? because having one sudoers per "cluster" is handy since you can limit a lot what you can do [14:26:25] I'd refactor analytics-admins to hadoop-admins [14:26:30] and create druid-admins [14:26:40] finally removing analytics-roots [14:26:54] it is painful but more granular imho [14:29:14] yeah it is, but it seems unlikely that we would need that, and will just be more annoying to maintain [14:29:20] but elukey, ja i agree that that is good too [14:29:24] so if you prefer that i'm cool with that [14:29:27] not strongly opinioned here [14:30:56] maybe we can ask this to the team [14:31:00] and check their opinion [14:35:24] aye [14:37:00] milimetric, cool, I will add a patch to the history runners to remove the casts [14:39:39] elukey: hi [14:40:14] o/ [14:44:40] (CR) Joal: "If the reconstruction is scala only, it so far uses files paths, which mean we could go without hive tables (I don't mind having them thou" [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476) (owner: Milimetric) [14:44:54] elukey: yt? [14:45:22] nuria_: saw your message from yesterday about spark failure [14:45:30] nuria_: have you managed to have ti working? [14:45:46] joal: no, but i just run the query in hive [14:45:53] joal: was about to check results [14:46:04] nuria_: That's weird [14:46:31] nuria_: The errors you posted yesterday about workers being lost are fake errors (expected behavior) [14:47:27] joal: ah, ok, but still manipulating data on spark was talking forever [14:48:28] nuria_: Supposedly it's faster than in hive, that's why I use it for this kind of analysis, but nevermind, I assume it also depends the level of familiarity with the tool [14:48:39] joal: maybe i was doing something wrong but basically i just selected from the table created and called take(10) [14:48:49] joal: and that was minutes and minutes [14:48:53] hm [14:49:06] nuria_: first run is expensive: need to extract a month of pageviews [14:49:37] nuria_: then, if you cache the temp table (not done in my script, my bad), next queries should be really faster [14:50:04] joal: teh cached temp table persists across spark-shell restarts? [14:50:25] nuria_: o/ [14:50:30] also nuria_, cluster is stalled for users from yesterday, I'm waiting for jdlrobson to come online to discuss with him [14:50:43] elukey: hola! let's talk about caching directory in standup, i get there are some space issues? [14:50:49] nuria_: nope, if you want the table to persist, you need to save the data and then read from there [14:50:55] joal: you can kill jdlrobson query [14:50:57] ottomata already fixed the issue [14:51:04] I'll restore the data now [14:51:10] I backupped it in my home dir [14:51:16] nuria_: there are like 20 of them I think, that's actually the issue [14:51:31] joal: he is learning hive so was mentioning that was going through a lot of data and was trying to figure out how to make his pass samller [14:52:00] hm . I don't like to kill people queries nuria_. [14:52:17] joal: I talked with him about these just yesterday though [14:52:48] joal: about how he needed to reduce his data size but those were his 1st hive queries so he wasn't familiar with partitions and such [14:53:30] nuria_: partitionning was not optimal, but most problematic thing is launching a lot queries in parallel [14:53:39] jdlrobson: for when you come online --^ [14:54:20] joal: can we sandbox those even more so they do not affect other cluster business? I remember we reduced resources for user space [14:54:58] nuria_: Each query reads one day of webrequest data, and having them in parallel means they all have a very small amount of resources because they share it, meaning, at some point, it'll finish, but in the meantime there is no available resource for other regular users [14:55:25] nuria_: No impact on prod business, impact on other users only (like amir for instance) [14:55:48] joal: i see, and we cannot further reduce resources by user? [14:56:14] nuria_: resource quota management is really overkill for our use cases (difficult to set up and maintain) [14:56:30] I think teaching our users is the best scenario :) [14:57:47] nuria_: Big data tools tend to abstract computation cost to users - We should make sure they understand the resource cost of what they are doing [15:00:48] elukey: standduppp [15:27:58] elukey: can we move the data into /aggregate-datasets? that way i can delete it from 1002 [15:28:27] nuria_: sure I am looking into that, but I thought you put them on stat1003 [15:28:52] or are you saying that you have that also in stat1002? [15:29:03] (sorry too many rsyncs running between stat* :P) [15:31:02] basically what I know is [15:31:03] 15 * * * * /usr/bin/rsync -rt /srv/reportupdater/output/* stat1001.eqiad.wmnet::www/limn-public-data/ [15:31:03] elukey: data comes from 1002 and i rsycn-ed to 1003 so i t woudl made it to 1001 http endpoint [15:31:15] *would [15:31:18] ahhahaha [15:31:22] double jump [15:31:24] didn't know that [15:31:49] elukey: cause data comes from hive [15:31:58] elukey: but 1003 does not have access to hive [15:32:17] elukey: and i cannot put data directly in neither 1001 or 1002 [15:35:20] okok I need to figure out how to put data on aggregate-datasets [15:35:26] reading puppet [15:35:51] ? [15:35:54] there is an rsync module [15:35:58] ::srv [15:36:11] rsync ... stat1001.eqiad.wmnet::srv/aggregate-datasets/ [15:36:37] ahh so brutally copied in there [15:36:46] yup [15:36:49] I admit that the stat relationships confused me [15:36:51] there might be a cron too [15:36:56] that auto copies [15:37:01] yeah man, me too [15:37:26] cron { 'rsync aggregate datasets from stat1002': [15:37:27] command => "/usr/bin/rsync -rt --delete stat1002.eqiad.wmnet::srv/aggregate-datasets/* ${working_path}/aggregate-datasets/", [15:37:32] this guy in here [15:40:53] but I can't find srv/aggregate-datasets/ on stat1002 [15:42:07] ottomata: I get the stat1001 direct rsync but not the one --^ [15:42:16] that runs in cron on stat1001 [15:44:55] Arf elukey, we forgot to sync on logistics for the conf [15:47:03] joal: ah about druid perms? [15:47:09] sigh you are right [15:47:23] elukey: if we don't we'll never go ;) [15:47:55] I can re-join and steal 5 minutes of your meeting [15:48:00] nuria_: executing rsync -rt /srv/reportupdater/output/caching stat1001.eqiad.wmnet::srv/aggregate-datasets/ [15:48:04] (on stat1003 [15:48:19] elukey: yeah.... stat1002 /a vs /srv is the worst.' [15:48:22] i would like to get rid of /a [15:48:26] but, historically it exists [15:48:37] all other stat servers use /srv as the big data partition [15:48:40] but stat1002 uses /a [15:48:44] nooooooooo [15:48:57] :/ [15:48:57] on the other stat boxes, /a is a symlink to /srv [15:49:00] actually stat1003 not sure, checking. [15:49:14] yeah [15:49:18] but stat1002 both /a and /srv exist [15:49:38] and, to make things more transparent [15:49:41] on stat1002 [15:49:48] the ::srv rsync module [15:49:49]