[03:11:56] Quarry: Option to kill task by myself - https://phabricator.wikimedia.org/T137266#2362997 (Dvorapa)
[03:12:32] Quarry: Option to kill task by myself - https://phabricator.wikimedia.org/T137266#2362997 (yuvipanda) If you fix it and re-execute again the previous task will automatically get killed!
[03:49:14] Quarry: Option to kill task by myself - https://phabricator.wikimedia.org/T137266#2363073 (Dvorapa) Open>Invalid Oh, I didn't realize, thanks
[04:03:16] Quarry: Option to kill task by myself - https://phabricator.wikimedia.org/T137266#2363095 (Dvorapa) Invalid>Open But sometimes you just don't want to run it again. E.g. if you find out the result is too large, so you just want to kill it and stop working on it
[04:04:05] Quarry: Option to kill task by myself - https://phabricator.wikimedia.org/T137266#2363097 (Dvorapa)
[04:10:54] Quarry: Add an option to export result in Wikilist - https://phabricator.wikimedia.org/T137268#2363098 (Dvorapa)
[06:05:21] Analytics, Commons, Multimedia, Tabular-Data, and 4 others: Review shared data namespace (tabular data) implementation - https://phabricator.wikimedia.org/T134426#2265305 (pwalsh) Hi, I'm also working on the JSON Table Schema specification (and supporting tooling). It seems quite well suited her...
[06:28:14] elukey: there's a failed RAID on analytics1049
[06:32:31] ah, I already wrote that in -operations two minutes earlier :-)
[06:33:24] moritzm: thankssss! Working on it :)
[06:33:42] the host is a regular hadoop node, I removed it from the cluster and scheduled downtime
[06:56:32] Analytics-Cluster, Operations, ops-eqiad: analyitics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2363239 (elukey)
[06:56:41] moritzm: --^ :)
[07:20:37] * elukey commutes to the office
[07:50:12] joal: aloha!
[07:50:20] Ciao elukey !
[07:50:22] o/
[07:50:31] Bonjour!
[07:50:37] How are you today?
[07:50:45] one of the kafka1022 partitions has only 10% of space left :P
[07:51:03] mmm but it might get purged today, checking
[07:51:06] ggoooooood
[07:51:13] elukey: great :)
[07:51:17] really looking forward to deploying vk
[07:51:19] and you?
[07:51:25] elukey: great that you're good, not that kafka lacks space ;)
[07:51:44] I have been super distant from aqs load testing, sorry, let me know if you need any help or if you are super busy :(
[07:51:47] Good as well, Lino had a difficult night, which means me as well, but the rest is ok :)
[07:52:29] elukey: I can give you a heads-up on aqs if you want, but globally I can handle the rhythm
[07:53:06] nono, I am following what you are doing, just wanted to let you know that I am available if you need to offload things :)
[07:53:55] elukey: Thanks mate :)
[07:54:08] elukey: Currently trying to help others on other fronts as well
[07:54:32] I know! You always do 100 things at a time, not sure how you do it :)
[07:54:49] anyhow, bad news.. on kafka1022 the first timestamp is Jun 3
[07:54:54] elukey: I fail at least 30% of them (remember last week)
[07:55:12] elukey: then manual purging will b
[07:55:15] e the thing I guess
[07:56:48] elukey: I have seen an open bug on the issue in kafka jira, hopefully it will be solved soon?
[07:58:03] joal: I am not that confident, there is no traction from upstream from what I can see.. and we don't have super solid proofs about what is causing it
[07:58:20] anyhow, to add more joy, /var/spool/kafka/b contains only maps and text partitions
[07:58:34] so I'd probably need to purge text this time
[07:59:12] maybe set topic retention days to 4
[07:59:24] for webrequest_text
[07:59:34] should do the trick and free a lot of space
[07:59:57] (brb)
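For readers following along, the kind of check being described here — free space per Kafka data partition and the age of the oldest log segments — would look roughly like this. It is only a sketch: /var/spool/kafka/b is the one mount named above, and the rest of the layout is assumed.

```
# free space on each kafka data log mount
df -h /var/spool/kafka/*

# oldest webrequest segment files and their mtimes -- relevant because the
# retention bug discussed here (T136690 below) resets segment mtimes and
# defeats time-based pruning
ls -ltr /var/spool/kafka/*/webrequest_*/*.log | head
```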
[08:00:11] elukey: I think you're right
[08:00:42] joal: coffee and then I'll prepare the command, do you mind reviewing it?
[08:00:50] I don't :)
[08:00:54] thanks :)
[08:01:00] four eyes are better than two
[08:01:19] Correct, even if I probably count for only one given the night I had ;)
[08:11:26] joal: kafka configs --alter --entity-type topics --entity-name webrequest_text --add-config retention.ms=345600000
[08:13:12] elukey: except for the command itself (kafka-configs IIRC), the rest seems fine (I double checked the value, correct :)
[08:13:50] joal: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Temporarily_Modify_Per_Topic_Retention_Settings - I can't remember those commands :P
[08:14:16] k elukey :)
[08:14:27] super, proceeding :)
[08:17:15] !log lowering down webrequest_text kafka topic retention time from 7 days to 4 days to free disk space
[08:17:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[08:24:06] ok much better now
[08:26:29] thanks for taking care of that elukey :)
[08:30:44] joal: :)
[08:31:16] I was thinking that we could probably apply a maximum retention bytes setting of something like 10TB
[08:31:48] I wonder if there is analytics on beta (using mock data?). I don't have access to the webrequest log (yet) and via beta I could get to know the system already. Also it seems to make sense to be able to test queries before running them anyway :-)
[08:32:00] retention.bytes should correspond to the maximum broker log size afaik
[08:41:41] joal: woooooo somebody found the buuuuggg https://issues.apache.org/jira/browse/KAFKA-3802
[08:42:35] elukey: YAY !
[08:42:39] jand_wmde: Hi! We used to have something in beta but as far as I remember the cluster was unstable and not really able to accommodate queries..
[08:43:18] OK
[08:45:33] !log removed temporary retention override for kafka webrequest_text topic (T136690)
[08:45:34] T136690: Kafka 0.9's partitions rebalance causes data log mtime reset messing up with time based log retention - https://phabricator.wikimedia.org/T136690
[08:45:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[08:46:15] Analytics-Cluster, Analytics-Kanban: Kafka 0.9's partitions rebalance causes data log mtime reset messing up with time based log retention - https://phabricator.wikimedia.org/T136690#2363360 (elukey) Somebody was able to narrow down the problem in https://issues.apache.org/jira/browse/KAFKA-3802 finding...
[09:08:42] joal: I'd propose to set the maximum log size to 10TB with https://gerrit.wikimedia.org/r/#/c/293270/1/modules/role/manifests/kafka/analytics/broker.pp
[09:10:19] elukey: hm, can you remind me how much space there is on the brokers?
[09:15:18] hi team!
[09:17:57] Hi mforns :)
[09:18:04] hey joal!
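The override applied above, spelled out per the wikitech page elukey links — a sketch assuming the kafka-configs.sh tool shipped with Kafka 0.9 and a placeholder $ZK zookeeper connect string:

```
# set a temporary 4-day retention on webrequest_text
# 345600000 ms = 4 days x 24 h x 3600 s x 1000 ms
kafka-configs.sh --zookeeper "$ZK" --alter \
  --entity-type topics --entity-name webrequest_text \
  --add-config retention.ms=345600000

# verify the override is in place
kafka-configs.sh --zookeeper "$ZK" --describe \
  --entity-type topics --entity-name webrequest_text

# drop it again once space has been freed (the "!log removed temporary
# retention override" entry above)
kafka-configs.sh --zookeeper "$ZK" --alter \
  --entity-type topics --entity-name webrequest_text \
  --delete-config retention.ms
```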
[09:23:22] Hi early ottomata :)
[09:23:37] HI!
[09:23:38] :)
[09:23:48] how goes it?!
[09:23:52] you caught me!
[09:24:02] heh, i drove my girlfriend to the airport super early
[09:24:06] and i kinda felt like just staying up
[09:24:09] probably will crash later
[09:24:15] indeed very early for you !
[09:24:19] huhu :)
[09:24:42] As long as it's you and not her ;)
[09:26:45] ottomata: o/
[09:26:53] what time is it in NYC?
[09:27:06] 5 AM?
[09:27:12] yup!
[09:27:19] got up 1.5 hours ago though :O
[09:27:34] :D
[09:27:43] we will see how this goes! I am pretty awake right now...
[09:28:09] if you are super awake, I thought about https://gerrit.wikimedia.org/r/#/c/293270
[09:28:12] :P
[09:28:25] CI is down atm so no possibility to test it with pcc etc..
[09:29:01] joal: re space - it is a bit tricky because we can set the broker log size but not the partition sizes.. and we store them on disk partitions
[09:29:29] I put 10TB because from what I can see it seems the threshold that we can sustain..
[09:31:10] elukey: cool, so, ja
[09:31:11] hm
[09:31:17] just looking at partition sizes
[09:31:25] looks like a webrequest_text is 614G
[09:31:39] so 24 of those is about 15T
[09:33:00] i guess 10 is good because we can be sure it will prune
[09:33:17] but, it sounds like it will probably only prune webrequest_text, right?
[09:33:30] an upload partition is 152G
[09:33:40] which will be around 3.6T
[09:33:44] in a week i guess
[09:33:46] I brutally checked https://grafana.wikimedia.org/dashboard/db/kafka?panelId=17&fullscreen
[09:34:02] oh aye
[09:34:03] cool
[09:34:04] for the past 30 days
[09:34:11] hmmm
[09:34:13] and yeah text will be the target :(
[09:34:22] but at least it should give us some relief
[09:34:24] but, that is for all logs
[09:34:26] hm
[09:34:44] oh that's per broker ja
[09:34:44] hm
[09:34:50] hm, i mean
[09:34:54] sounds like the best we can do
[09:35:19] it is not ideal but I wanted to have a chat with you and Jo about a permanent solution
[09:35:40] elukey: since that will really only affect text, that will give us about 4.5 days of text i think
[09:35:58] hmm elukey i wonder if instead of setting it globally, we can get the retention bytes setting to work for upload specifically
[09:36:05] and just set it low
[09:36:09] would that actually help us?
[09:36:15] i guess only if all disks had an upload partition
[09:36:15] hm.
[09:36:17] and they might now
[09:36:19] not
[09:36:28] ja they won't
[09:36:28] hm
[09:36:29] ahhh
[09:36:52] oh, if things were distributed 100% evenly
[09:36:53] they would
[09:36:58] rep factor is 3
[09:36:59] 24 partitions
[09:37:05] so 72 partition replicas
[09:37:11] spread across 6 nodes would be 12 per node
[09:37:29] buuut, unfortunately partitions are not distributed evenly
[09:37:34] mostly evenly, but not totally
[09:37:41] I found this morning a disk partition with only text and maps :(
[09:37:46] so I had to purge text
[09:37:47] yeah, makes sense
[09:37:50] (4 days)
[09:37:52] aye
[09:38:08] hm, i bet partitions are spread evenly per node
[09:38:09] just not per disk
[09:38:51] elukey: maybe we can be a little more explicit, and instead of setting this globally, just set it on text
[09:38:54] and set the retention to 5 days
[09:38:56] and just leave it
[09:39:12] a 10T retention will only affect text anyway
[09:39:18] so it might be nice to be explicit about it
[09:39:43] oh, but, hm, would that solve the problem? i guess not, since our problem is with mtime and time retention
[09:39:56] yeah...
[09:40:26] for example, kafka1020 has mtime set to yesterday
[09:40:39] jaaa right, yeah
[09:40:40] hm ok
[09:40:42] ok.
[09:40:54] can we set this just on text then instead of changing it globally?
[09:41:00] the bytes retention?
[09:41:22] hmmmm
[09:41:25] hmmm
[09:41:26] I think that we could apply a kafka override and leave it there permanently, but it will not be puppetized..
[09:41:47] hmmm, and also, thinking more, this would affect upload too
[09:41:53] since in 2 weeks we'd grow over 10tb
[09:41:56] even with upload
[09:42:22] wait
[09:42:48] ok elukey let's do it
[09:42:49] yeah
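The back-of-the-envelope numbers behind this decision, spelled out; the figures are the ones quoted in the conversation above, not fresh measurements:

```
echo "$(( 614 * 24 )) GB"   # ~14.7 TB: all webrequest_text partitions, one copy
echo "$(( 152 * 24 )) GB"   # ~3.6 TB: all webrequest_upload partitions, one copy
echo "$(( 24 * 3 / 6 ))"    # 12 partition replicas per broker, if spread evenly
```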
[09:46:57] all right, I'll wait until CI is up and do a pcc run to verify :)
[09:49:12] ha ok, i would just do it! :)
[09:49:23] it's only going to change the server.properties file
[09:49:29] and you can inspect it before you restart any brokers
[09:52:03] don't feel that confident to merge stuff now :P
[09:54:15] ah ottomata, joal: analytics1049 is out of the cluster, disk broken
[09:54:49] dohhh
[09:55:08] elukey: are we already out another one too?
[09:55:10] or was that fixed?
[09:56:55] no no it was fixed (1047 IIRC)
[09:57:08] we are not lucky with disks :(
[09:57:32] I think that Chris could replace kafka1018 and analytics1049 together
[09:57:49] maybe today we can chat with him when he comes online
[09:58:21] new vk uploaded to carbon and installed via apt on cp1047, all good
[09:58:34] will install on the whole maps cluster today
[10:01:19] ja ok
[10:01:22] nice
[10:01:56] elukey: I stepped out of that discussion, in which I wouldn't have had much value ;)
[10:02:12] joal: not true! do you like the idea?
[10:02:41] elukey: I think it's an ok temporary solution, but I don't like the idea of keeping it this way :)
[10:02:55] elukey: temporal retention makes more sense functionally,
[10:03:10] ja especially since it isn't across all topics. kinda funky
[10:03:11] but ja
[10:03:11] but since it's buggy now, let's make sure we don't break things first
[10:03:17] joal, at least the bug i filed is getting a little attention
[10:03:26] https://issues.apache.org/jira/browse/KAFKA-3802
[10:03:35] ottomata: I've seen that, yes, that's great :)
[10:03:55] yep I was about to say that! They have a repro! \o/
[10:04:21] thing is, the fix will probably be in 0.11 (best case)
[10:04:21] more than that: culprit identified, even if no solution yet
[10:04:37] naw, they might do a backport :)
[10:04:38] who knows
[10:04:42] it's kinda critical
[10:04:51] To me this bug is big enough to necessitate a backport, yes
[10:04:52] we can ask for it at least if someone finds a fix
[10:04:54] yeah
[10:04:58] I was about to say that we could add the patch to the deb but no :P
[10:08:17] heh yeah...
[10:22:47] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: Smartctl disk defects on kafka1012 - https://phabricator.wikimedia.org/T136933#2363623 (Ottomata) Let’s do today. Apparently analytics1049 has a bad disk too. Maybe we can do them together! Ping either elukey or I when you are online...
[11:29:15] Analytics: Update mediwiki hooks to generate data for new event-bus schemas - https://phabricator.wikimedia.org/T137287#2363796 (JAllemandou)
[11:30:51] * joal is AFK for a while
[11:32:03] Analytics: Update mediwiki hooks to generate data for new event-bus schemas - https://phabricator.wikimedia.org/T137287#2363824 (Ottomata) p:Triage>Normal a:Ottomata
[11:53:46] heya joal, yt?
[11:56:40] Analytics-Cluster, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2363964 (Danny_B)
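For context on the Kafka change being reviewed: the gerrit patch itself is not quoted in this log, but per the discussion it presumably boils down to one broker setting, sketched below (a hypothetical rendering, not the actual patch). One caveat: upstream documents log.retention.bytes as a limit per partition log rather than a per-broker total, which, if it also applies to the per-topic override tried later in the day, would explain why a 10TB cap ends up pruning nothing.

```
# hypothetical sketch of what gerrit change 293270 renders into
# server.properties -- not the actual patch
log.retention.bytes=10000000000000   # 10 TB, decimal
```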
[11:56:56] ottomata: here but not for long
[11:57:00] what's up ottomata ?
[11:57:22] schema qs
[11:57:27] sure
[11:57:28] in https://gerrit.wikimedia.org/r/#/c/288210/9/jsonschema/mediawiki/revision_create/1.yaml
[11:57:35] rev_bytes
[11:57:42] in mw, this looks like it is called rev_len
[11:57:51] where did we get rev_bytes from, the db?
[11:57:57] I think so yeah
[11:58:14] halfak names it this way, and ezachte as well
[11:58:22] it is rev_len in db too
[11:58:24] just chcked
[11:58:24] checked
[11:58:46] will leave a comment, i guess we ask halfak
[11:58:49] rev_len seems more correct
[11:59:27] ottomata: k
[11:59:36] also a q about title vs page_title
[11:59:39] rev create has page_title
[11:59:43] page_delete has title
[11:59:53] should we use page_title?
[12:00:02] ottomata: good catch ! We should, yes
[12:01:01] same for namespace
[12:01:36] ottomata: I added a comment to the ZK max-conns but not sure if needed, so I'll leave the final choice to you :)
[12:01:42] don
[12:01:48] don't want to block moritzm
[12:01:51] argh
[12:01:53] mobrovac:
[12:02:01] hello Moritz! Wrong ping sorry :)
[12:10:08] elukey: responded inline thanks
[12:10:17] joal: should we do the same for titles on page_move?
[12:13:23] Analytics: Update mediwiki hooks to generate data for new event-bus schemas - https://phabricator.wikimedia.org/T137287#2364032 (Ottomata)
[12:13:25] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2364031 (Ottomata)
[12:16:01] * elukey lunch!
[12:24:13] joal: when you come back, another schema q for you about redirect_page_id in page_move
[12:45:14] Analytics: 20160431 produces "end timestamp is invalid, must be a valid date in YYYYMMDD format" - https://phabricator.wikimedia.org/T135812#2364182 (Nemo_bis) >>! In T135812#2318963, @Milimetric wrote: > @Nemo_bis: the query asks for 20160401 through 20160431, and 20160431 is not a valid date, in case that...
[12:46:42] Analytics: 20160431 produces "end timestamp is invalid, must be a valid date in YYYYMMDD format" - https://phabricator.wikimedia.org/T135812#2364185 (Nemo_bis)
[12:50:20] Analytics, Analytics-Cluster, Operations, Services: Better monitoring for Zookeeper - https://phabricator.wikimedia.org/T137302#2364233 (Ottomata)
[12:51:46] Analytics, Analytics-Cluster, EventBus, Operations, Services: Better monitoring for Zookeeper - https://phabricator.wikimedia.org/T137302#2364255 (mobrovac)
[12:58:42] Analytics, Analytics-Cluster, EventBus, Operations, Services: Better monitoring for Zookeeper - https://phabricator.wikimedia.org/T137302#2364285 (Ottomata) The default client connection in our puppet module is no limit. Once we have alerts we should set a limit, pretty high, maybe 2048.
[13:02:25] ottomata: you there?
[13:03:27] ja
[13:03:33] elukey: fyi i am restarting zks
[13:04:01] sure :)
[13:04:14] I was looking for zk metrics aaand... nothing on graphite
[13:04:35] the jmxtrans module seems in need of love
[13:04:40] ayyyye
[13:07:45] elukey: hi! :)
[13:07:52] o/
[13:08:18] i dunno if you have yet seen my latest on https://gerrit.wikimedia.org/r/#/c/290860/2
[13:08:28] oh
[13:08:30] you have!
[13:08:45] elukey: could you merge that for me?
[13:09:17] as it is, i'm going to need to open an issue for godog to cleanup the metrics introduced when i upgraded the code on the 2.2.6 nodes
[13:09:35] will do it in ~1hr, is it ok?
[13:09:42] I mean, in the next hour :)
[13:09:44] i was supposed to be filtering them out with this gerrit before landing that code, and then all hell broke loose last week and i landed that code :)
[13:09:47] sure!
[13:09:49] super!
[13:10:17] if you have time I'd also like to get your opinion of https://gerrit.wikimedia.org/r/#/c/292568/
[13:10:46] sure, let me have a look
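Returning to the morning's schema-naming thread (rev_bytes vs rev_len, title vs page_title): the verbose convention being converged on would make the event schemas look roughly like the hypothetical fragment below. The field set is illustrative only — the rev_len vs rev_bytes question was still open pending halfak's input.

```
# hypothetical fragment, not the actual revision_create/page_delete schemas
properties:
  page_title:        # "page_" prefix kept even in page-centric events
    type: string
  page_namespace:    # the same treatment suggested for namespace
    type: integer
  rev_len:           # naming still under discussion (rev_bytes vs rev_len)
    type: integer
```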
[14:09:26] Analytics: 20160431 produces "end timestamp is invalid, must be a valid date in YYYYMMDD format" - https://phabricator.wikimedia.org/T135812#2364448 (Danny_B) @Nemo_bis And what would you suggest instead of invalid report?
[14:21:52] Quarry: Add a stop button to halt the query - https://phabricator.wikimedia.org/T71037#2364499 (matej_suchanek)
[14:22:06] Quarry: Option to kill task by myself - https://phabricator.wikimedia.org/T137266#2364500 (matej_suchanek)
[14:22:10] Quarry: Add a stop button to halt the query - https://phabricator.wikimedia.org/T71037#721028 (matej_suchanek)
[14:52:54] Analytics: 20160431 produces "end timestamp is invalid, must be a valid date in YYYYMMDD format" - https://phabricator.wikimedia.org/T135812#2364607 (Nemo_bis) >>! In T135812#2364448, @Danny_B wrote: > @Nemo_bis And what would you suggest instead of invalid report? One option is to simply perform a numerica...
[15:28:33] ottomata: Heya
[15:29:03] ottomata: From your messages, I understand there is inconsistency in names for page related fields
[15:29:28] joal: seems so, but it might have been intentional
[15:29:30] oh let me save my comments
[15:29:50] ottomata: I tried to be careful on that aspect, so I'm a bit angry at myself, but please let's discuss that, it's an important benefit of the new schemas in my opinion
[15:30:10] it seems in page contexts, the page_ prefix is dropped
[15:30:21] for title
[15:30:32] but on revision, it is page_title
[15:30:34] maybe that was on purpose
[15:32:29] ottomata: We didn't discuss that, but in my mind page related events didn't need the page_ prefix - But it actually is not true in the case of revisions, which have the prefix ... So let's be verbose and add the page prefix ?
[15:32:54] +1
[15:40:54] ok ottomata, I'm gonna get back to modifying my schema :)
[15:44:27] Analytics, Developer-Relations, MediaWiki-API, Reading-Admin, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#2364746 (bd808)
[15:49:38] ottomata: I was thinking to try
[15:50:15] kafka configs --alter --entity-type topics --entity-name webrequest_text --add-config retention.bytes=10000000000000
[15:50:16] elukey: you've installed your new vk on all the varnish4 hosts?
[15:50:21] yeppa
[15:50:33] elukey: +1 for trying that :)
[15:51:58] joal: yt?
[15:52:03] I am elukey
[15:52:18] since you are my official reviewer, would you mind checking the above command?
[15:52:24] :)
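For the record, the arithmetic joal checks next — the value in the command above is 10 TB in decimal units, not 10 TiB:

```
echo $(( 10 * 1000 ** 4 ))   # 10000000000000 -> the value in the command (10 TB)
echo $(( 10 * 1024 ** 4 ))   # 10995116277760 -> what 10 TiB would have been
```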
[15:54:58] elukey: works for me if we are considering terabytes and not TiB :)
[15:56:54] Analytics, MediaWiki-API, User-bd808: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321#2364785 (bd808)
[15:57:25] Analytics, MediaWiki-API, User-bd808: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321#2364802 (bd808)
[15:57:30] Analytics, Developer-Relations, MediaWiki-API, Reading-Admin, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#2364801 (bd808)
[15:58:49] joal: yes :)
[15:59:51] elukey: As a (kind of) old computer scientist, I'm not used to seeing round numbers when discussing data volumes :)
[16:02:30] ottomata: standup
[16:02:38] oh thanks
[16:03:09] !log temporary set a 10TB upperbound to the Kafka webrequest_text topic to free space
[16:03:36] trying to join...
[16:10:54] Analytics-Kanban, Patch-For-Review: Update mediwiki hooks to generate data for new event-bus schemas - https://phabricator.wikimedia.org/T137287#2364848 (Ottomata)
[16:25:47] joal: analytics1001 back as rm active now
[16:25:55] thanks ottomata
[17:06:25] Analytics-Cluster, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2363239 (Ottomata) Also: ``` Jun 8 16:58:34 analytics1049 kernel: [7283582.453037] sd 0:2:2:0: [sdc] Jun 8 16:58:34 analytics1049 kernel: [7283582.453043] Result: hostbyte=D...
[17:09:07] Analytics-Cluster, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2365080 (Ottomata) Some output from ``` sudo megacli -AdpEventLog -GetEvents -f events.log -aALL ``` ``` ... seqNum: 0x00005441 Time: Wed Jun 8 06:55:17 2016 Code: 0x0000...
[17:11:02] Analytics-Cluster, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2363239 (Cmjohnson) Enclosure Device ID: 32 Slot Number: 1 Drive's position: DiskGroup: 2, Span: 0, Arm: 0 Enclosure position: 1 Device Id: 1 WWN: 500003964ba801e7 Sequence Num...
[17:41:41] Analytics, MediaWiki-API, User-bd808: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321#2365196 (bd808) The scripts I have been using are available at https://github.com/bd808/action-api-analytics
[17:48:47] a-team, logging off, see you tomorrow!
[17:48:53] :]
[17:49:14] laters!
[18:13:23] !log removed retention.bytes override configuration for kafka webrequest_text (didn't work)
[18:13:32] wah wahhh
[18:13:34] dunno why that doesn't work
[18:13:44] :(
[18:20:38] ottomata: anything that I can do before going offline?
[18:22:41] naw its cool elukey i got it
[18:22:44] have a good eve!
[18:22:45] ttyt!
[18:23:18] you too! byeee! o/
[18:24:16] Analytics-Cluster, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2365341 (Cmjohnson) Replaced the disk. The preserved cache was not able to be cleared using megacli commands. Had to reboot the server and discard it using the raid bios on-site....
[18:25:59] Analytics-Cluster, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2365359 (Cmjohnson) Leave the ticket open until I get the warranty part from Dell.
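A sketch of the kind of post-replacement health check used on a host like analytics1049 — flag spellings vary a little between megacli package versions, so treat this as illustrative rather than the exact commands Chris ran:

```
# physical drives: look for "Firmware state: Online, Spun Up" and zero error counts
sudo megacli -PDList -aALL | grep -E 'Slot Number|Firmware state|Media Error Count'

# logical drives: a rebuilt array should report State: Optimal
sudo megacli -LDInfo -Lall -aALL | grep -E '^State|^Size'
```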
[21:44:27] Analytics: 20160431 produces "end timestamp is invalid, must be a valid date in YYYYMMDD format" - https://phabricator.wikimedia.org/T135812#2366172 (Milimetric) Oh I see, if the request is to not validate dates, then I would personally decline that. We validate parameters to make sure the user is asking fo...
[21:52:06] milimetric: maybe there's a bunch of stuff happening in the driver?
[21:52:23] driver?
[21:52:24] what driver?
[21:52:26] you can bump driver memory with --driver-memory
[21:52:39] so there's a driver, and many executors
[21:52:51] oh... what's the default driver memory?
[21:53:28] all distributed operations will happen in the executors, and the driver manages these, and also if you did something like a grouping or counting - i believe it will have to be processed on the driver
[21:53:36] not sure, maybe 512mb or 1g
[21:54:11] hmmm
[21:54:19] ok, I'll try, thanks
[21:54:53] is 2G too much to ask?
[21:55:23] https://spark.apache.org/docs/1.5.2/configuration.html says 1G
[21:55:48] no that sounds fine
[21:55:55] aha, ok, thx
[22:13:44] indeed it's not dying yet, but it's taking for-ever!! weird
[22:14:52] aah
[22:18:42] ah ok, it died. Something must be wrong with the logic, back to the drawing board
[22:19:28] died with OOM?
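The retry milimetric describes would look something like this — a sketch with a placeholder job script. Per the Spark 1.5.2 configuration page linked above, spark.driver.memory defaults to 1g, and anything the job collects or groups back onto the driver has to fit in that heap, which is why bumping it (or pushing the aggregation into the executors) is the usual fix for this kind of OOM.

```
# bump the driver heap from the 1g default; executor memory is unaffected
spark-submit --master yarn \
  --driver-memory 2g \
  my_job.py                # placeholder for the actual job and its arguments
```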