[13:28:25] does git review remember my user/pass ?
[13:29:04] it just uses your git ssh config
[13:29:13] did you install git-review
[13:29:13] ?
[13:29:23] yes
[13:29:33] sudo apt-get install python-pip
[13:29:39] sudo pip install git-review
[13:29:41] ok
[13:29:45] something like that
[13:29:49] sounds right
[13:30:08] I get this https://gist.github.com/6dba61eb2bb460c7f451
[13:30:34] yes, you need to install the post-commit hook
[13:30:39] but i can't find the URL
[13:30:42] 1 se
[13:30:44] 1 sec
[13:30:50] ok
[13:32:49] run git review -s
[13:33:00] ok
[13:33:08] ran it
[13:33:11] now ?
[13:34:13] now git commit -a --amend to add the change id
[13:34:34] just save and close vim
[13:34:37] then git-review
[13:35:45] worked
[13:36:20] what is the gerrit patchset id?
[13:36:38] https://gerrit.wikimedia.org/r/#/c/24286/
[13:36:45] click that link
[13:36:48] yes
[13:36:59] so you called your branch 'dev'?
[13:37:06] so can we recap together (for myself .. I also gotta write this down somewhere). When a review is needed: 1) git review -s ; 2) git commit -a --amend 3) git-review
[13:37:12] yes I did
[13:37:22] recap: no :)
[13:37:33] that was just the first time to get it installed
[13:37:40] so the workflow is like this
[13:37:51] 1) git branch -b new_branch
[13:37:56] 2) do stuff
[13:38:02] 3) git commit -a
[13:38:06] 4) git-review
[13:38:20] NOW if you want to *fix* an existing patch set then
[13:38:27] 1) do stuff
[13:38:32] 2) git commit -a --amend
[13:38:35] 3) git-review
[13:39:13] makes sense?
[13:39:15] yes
[13:39:29] can I keep my branch for all further development on that git repo ?
[13:39:59] yes, but branches are cheap so best to use a new branch for new stuff
[13:40:20] I suppose I should always branch out of master right ?
[13:40:24] particularly when things are not yet merged in gerrit and you continue working on the same branch
[13:40:29] yeah that's best
[13:41:05] yes
[13:42:34] ok got the review
[13:42:40] now I'm going to "fix" it
[13:42:48] yeah so just add a new line
[13:42:55] then git commit -a --amend
[13:42:57] then git-review
[13:43:27] morning milimettric
[13:43:32] morning milimetric
[13:43:38] morning drdee
[13:44:00] you might want to read up on the conversation, it's about gerrit
[13:44:35] how to install it and how to use it
[13:44:55] drdee: done, please check
[13:44:56] got it
[13:44:57] so visit https://gerrit.wikimedia.org/r/#/c/24286/
[13:45:11] now you see there is a new patch set. patch set 2
[13:45:16] yep, i was following along a little bit though it seems a little bizarre with the whole amending
[13:45:25] drdee: yes
[13:45:29] and so all the changes, feedback are still part of the same changeset
[13:45:45] milimetric: i totally agree, i think amending is abused by gerrit
[13:45:56] but it's a crucial part of the workflow
[13:46:36] average_drifter: so now I will accept your changeset
[13:46:44] and it will get merged into master on origin
[13:47:05] alright
[13:47:08] check https://gerrit.wikimedia.org/r/#/c/24286/
[13:47:15] you see status is 'merged'
[13:47:19] yay :)
[13:47:21] :D
[13:47:27] now run git pull
[13:47:39] on local master
[13:47:42] git pull will pull all the remote changes into each appropriate branch right ?
[13:47:49] no
[13:47:50] because I haven't specified it any branch or remote
[13:47:59] ok, so I should checkout master
[13:48:01] and then git pull ?
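
A consolidated sketch of the workflow walked through above, since average_drifter asked for a recap to write down. Note that the first step in the recap should be `git checkout -b`, not `git branch -b`; the branch name below is a placeholder, and `git review -s` is the piece that installs the gerrit commit hook (which adds the Change-Id line that keeps amended commits attached to the same changeset).

    # one-time setup per clone
    sudo apt-get install python-pip && sudo pip install git-review
    git review -s                     # adds the gerrit remote and the Change-Id commit hook

    # new change
    git checkout master && git pull   # always branch from an up-to-date master
    git checkout -b my_feature        # placeholder branch name
    # ... do stuff ...
    git commit -a
    git review                        # pushes patch set 1 of a new changeset

    # fixing an existing patch set (same Change-Id -> patch set 2, 3, ... on the same changeset)
    # ... do stuff ...
    git commit -a --amend
    git review

    # after the change is merged in gerrit
    git checkout master && git pull
    git branch -d my_feature          # local feature branches are cheap; delete and start fresh
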
[13:48:02] it will merge in master
[13:48:03] yes
[13:48:19] because there is no remote branch 'dev'
[13:48:27] (you could create that if you wanted)
[13:48:40] yes but you said local
[13:48:46] should I create it on the remote as well ?
[13:49:02] brb
[13:49:03] not necessary
[13:49:08] morning ottomata
[13:49:20] git pull okay?
[13:49:24] yes
[13:49:31] worked fine
[13:49:35] mooooorning
[13:49:41] okay so now you can delete your local 'dev' branch
[13:49:45] and that is the whole cycle
[13:50:07] what happens if you checkout dev branch
[13:50:10] and then run git pull
[13:50:18] git branch -d dev
[13:50:28] then you delete it :)
[13:50:40] oh sorry
[13:50:59] so I will make a new local branch for each feature I want to work on
[13:51:04] yes
[13:51:10] and then when I'm done with the gerrit and all, I delete it
[13:51:45] yes, right ottomata? if you have a local feature branch in git and it gets merged then you can delete your local branch, right?
[13:52:28] average_drifter: so let's start pushing some changes to wikistats as well, now for realz :D
[13:52:46] drdee: I'm missing an example2.log (or so I think)
[13:52:55] drdee: it's used by test.sh
[13:53:04] i know, i have to dig it up
[13:53:11] ok
[14:01:32] you wanna push your stuff to wikistats meanwhile?
[14:02:48] yes, I'll be looking at other tasks for the moment
[14:03:09] pls ping me when I can have that log
[14:05:50] ok
[14:08:42] drdee I was talking to dschoon about a possible improvement to the branching strategy
[14:08:44] https://github.com/nvie/gitflow
[14:08:52] have you guys seen/used git flow?
[14:10:15] it lets you do stuff like
[14:10:28] git flow feature start
[14:10:40] git flow feature finish
[14:11:04] so like high level branching and standardization so you think less (TM)
[14:15:58] milimetric: i can see that work for limn for sure, for some other projects it might be a bit too heavy handed
[14:16:35] i am not sure how easy it is to use with gerrit as well but as limn is hosted on github, yeah go for it
[14:16:51] and yes i had seen this model in the past
[14:17:14] so limn doesn't have to use git review?
[14:17:42] no, because it's on github and because dschoon would rather stick needles in his eyes than use gerrit :D
[14:19:19] lol
[14:20:19] hey side question - do you guys have your IRC set to make a sound when someone says something? I turned that off after a while of otto vs. dschoon yesterday :)
[14:20:32] no sound
[14:20:34] no sound
[14:20:38] it blinks if someone types my name
[14:20:55] k, mine beeps if someone types my name. good
[14:21:01] it blinks if one of the analytics team members is mentioned
[14:21:11] ooh, how'd you do that
[14:23:13] nvm, mac only
[14:25:28] average_drifter: pushed example.log to webstatscollector
[14:31:43] ottomata, how do we currently update wikistats on stat1?
[14:31:52] (i mean the source code)
[14:31:57] i don't know
[14:32:00] that's erik z's stuff, right?
[14:32:03] :D
[14:32:05] yes
[14:32:08] ja, dunno
[14:32:18] thanks
[16:52:38] hello hello
[16:52:46] how goes, ottomata?
[16:52:58] hey!
[16:52:59] just now!
[16:53:00] good
[16:53:01] all morning
[16:53:03] ERRRGH
[16:53:10] but I think I figured out what was wrong with cass
[16:53:14] rpc_port was set at 0.0.0.0
[16:53:18] and I think cass didn't like that
[16:53:22] so I set it to the IPs of each machine
[16:53:25] and it is much happier now
[16:53:31] yeah.
[16:53:36] i think it even says that in the docs?
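
For reference, the cassandra.yaml settings being discussed just above — the option that takes an address is rpc_address (rpc_port stays at the DataStax default). The values are the ones mentioned in this conversation, shown only as a sketch:

    # /etc/dse/cassandra/cassandra.yaml
    rpc_address: 10.64.21.102    # this node's own IP, instead of the earlier 0.0.0.0
    rpc_port: 9160               # thrift/rpc port used by cassandra-cli, opscenter and dse hadoop
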
[16:53:37] no luck with opscenter though
[16:53:43] i also think i mentioned this :)
[16:53:45] so
[16:53:50] a thing i realized
[16:54:18] lemme find the link
[16:56:24] i'll keep looking
[16:56:27] actually
[16:56:32] hmmmmmmm, something is not right
[16:56:35] but basically, because opscenter uses 7199 for jmx
[16:56:44] ja?
[16:56:45] if you have another dse node on the same machine, they'll fight over 7199
[16:56:47] and get confused
[16:56:48] ohhh
[16:56:51] interesting
[16:56:52] ok
[16:56:54] good to know
[16:57:07] is that for opscenterd AND opscenter-agent?
[16:57:22] would be weird if the default ports from Datastax don't work
[16:57:36] drdee: got a question about the msgformat in line 163 collector.c
[16:57:38] no, the agent is calling home
[16:57:45] aye ok
[16:57:51] drdee: must it be like very precise, currently it's %llu %127s %llu %llu %1023[^\n]"
[16:57:53] it should be inside the jvm collecting stats
[16:57:54] ok, well i'm not running dse on an01 anymore
[16:57:57] hence jsvc
[16:58:02] *nod*
[16:58:03] just sayin
[16:58:05] aye
[16:58:16] anyway, since I changed rpc_addy
[16:58:22] cass nodes come up real fast and easy
[16:58:26] but
[16:59:27] trying to use hadoop
[16:59:27] 10.64.21.102
[16:59:29] oops
[16:59:33] Connection failed to Cassandra node: analytics1002.eqiad.wmnet:9160 unable to connect to server
[16:59:34] hangout?
[16:59:35] drdee: so the actual problem is that generate-test-data.pl generates lines with 4 fields (number, string, number, string) ; what you told me on skype was [6:38:33 PM EEST] Diederik van Liere: and sends output like
[16:59:36] uh
[16:59:39] [6:38:39 PM EEST] Diederik van Liere: en.mw 1 1213 en
[16:59:47] does this page have content for any of you? http://www.datastax.com/docs/opscenter/online_help/add_cluster/index
[16:59:49] 9160 is the rpc port
[16:59:52] https://plus.google.com/hangouts/_/7f14f09d5b23e6f5ddc282ccf3728bfdae83df93
[16:59:56] yes
[16:59:56] no, many of them don't this morning
[17:00:06] rpc is different from storage is different from jmx
[17:00:14] drdee: so basically you also agree that it sends lines like (number, string, number, string)
[17:00:20] aye
[17:00:21] but
[17:00:27] it looks like hadoop connects via rpc
[17:00:33] and there is no listening rpc port
[17:00:34] yeah, that format should not have changed IIRC
[17:04:42] dse hadoop fs -ls /
[17:13:39] http://nicolargo.github.com/glances/
[17:25:10] http://etherpad.wikimedia.org/AnalyticsMilestonesAndPriorities
[18:33:44] drdee: https://gerrit.wikimedia.org/r/24319
[18:33:46] drdee: please review
[18:33:50] ty
[18:34:14] drdee: still working on filter.c, we'll have another git review soon
[18:34:34] ok
[18:35:34] okay, so maybe it's an idea to add a debug switch to collector and if the debug switch is enabled then #define PERIOD 60
[18:35:37] else #define PERIOD 3600
[18:35:51] because this is asking for trouble if we suddenly forget to revert this
[18:36:30] alright
[18:36:31] fixing
[18:40:59] so i am gonna do a -1 on the review, then don't forget to git commit -a --amend
[18:48:01] drdee: ok, fixed
[18:49:38] ok, merged
[18:50:41] so, the db contains data again?
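
A sketch of the debug switch proposed above for collector.c, gated at compile time so the short test period cannot accidentally ship. The DEBUG macro name is an assumption; the two PERIOD values are the ones from the discussion:

    /* collector.c (sketch) */
    #ifdef DEBUG
    #define PERIOD 60      /* short aggregation window for local testing */
    #else
    #define PERIOD 3600    /* production: dump counts once per hour */
    #endif
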
[18:51:18] now the collector writes data to dumps/, but only from data coming from generate-test-data.pl
[18:52:59] right
[18:53:09] so now let's look at filter.c
[18:56:32] yep
[18:56:34] average_drifter: check line 135 in filter.c
[19:00:01] back
[19:24:07] hey dschoon
[19:24:11] if I have rpc_port set to 9160
[19:24:14] sup
[19:24:22] and all the cass nodes seem to be talking to each other nicely
[19:24:32] shouldn't I see something in netstat listening on that port?
[19:24:55] yes?
[19:25:06] netstat -lnp | grep jsvc
[19:25:20] yeah it's not there
[19:25:21] netstat -lnp | grep -E '(jsvc|java)'
[19:25:29] same
[19:30:44] well poopers dschoon, i mean, ergh
[19:30:50] hm
[19:30:51] hm.
[19:30:54] one sec
[19:30:56] in the middle of something
[19:30:59] going to restart it with 0.0.0.0 and see if that port is open
[19:36:51] you sure this isn't iptables?
[19:37:03] (also, in case anyone finds this useful: https://gist.github.com/3751741 )
[19:46:36] so
[19:46:44] iptables isn't running on the other nodes
[19:46:46] only an01
[19:46:48] because it has a public IP
[19:46:50] hrm.
[19:46:53] yeah, that makes sense.
[19:46:56] ok, i need a lunch
[19:46:59] then i will look?
[19:47:01] k
[19:49:13] drdee: the problem is the chroot
[19:49:22] in collector?
[19:50:16] filter
[19:50:30] collector is ok overall
[19:50:49] oh, ottomata
[19:51:01] root@analytics1002:~# nodetool -h localhost statusthrift
[19:51:01] not running
[19:51:09] that is why.
[19:51:13] average_drifter: what line?
[19:51:13] statusthrift?
[19:51:13] hm
[19:51:26] (run nodetool without args to see the commands)
[19:51:35] but the chroot("/tmp"); in filter.c is causing the rest of the program to switch the filesystem root to /tmp, and that is also what causes /dev/stdin to not be found I presume, and that leads to the while(fgets..){} loop not running at all
[19:51:41] because they're hadoop nodes, they don't run the rpc server
[19:51:48] hm
[19:51:49] brisk uses the internal API
[19:51:50] drdee: 230
[19:51:56] got it
[19:52:05] so you probably can't connect using cassandra-cli either
[19:52:07] brb lunch
[19:52:07] no i can't
[19:52:09] tried that
[19:52:11] word.
[19:52:16] so mystery solved.
[19:52:19] but hadoop uses the rpc port?
[19:52:24] i don't think so.
[19:52:33] dse hadoop fs -ls /
[19:52:33] 12/09/19 19:52:28 WARN util.CassandraProxyClient: Connection failed to Cassandra node: analytics1003.eqiad.wmnet:9160 unable to connect to server
[19:52:47] drdee: can you please tell me more details about that chroot, what is the motivation behind using it ? I can see no purpose for it, but someone must've put it there for a reason
[19:52:50] *shrug*
[19:52:51] dunno.
[19:52:55] hm
[19:52:56] i will look in a bit.
[19:52:57] drdee: we can use git blame or something to find out who and ask him
[19:52:58] i mean
[19:53:01] does hadoop work?
[19:53:06] no
[19:53:16] that's why i'm looking into this
[19:53:19] it *did* work before
[19:53:20] yesterday
[19:53:30] except that the nodes were flapping
[19:53:36] average_drifter: i know who did it, i think it was put in /tmp/ because the db files are temporary
[19:53:54] that has always worked so no need to fix it
[19:53:59] ah.
[19:54:01] well, shit.
[19:54:07] ok, def need food. brb
[19:54:17] not sure if use of /tmp/ and /dev/stdin are related
[19:54:24] hmmmmm
[19:54:25] My recommendation is to use Java 7 runtime for 2.1 datastax.
[19:54:27] ok bye
[19:54:34] We do not recommend java 1.7 for DSE.
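
A minimal illustration of the two chroot problems raised above, using throwaway shell commands rather than the filter source itself: chroot(2) needs root (CAP_SYS_CHROOT), and once filter has chrooted into /tmp every absolute path it opens is resolved under /tmp, which is why /dev/stdin stops existing:

    # as an unprivileged user:
    chroot /tmp            # chroot: cannot change root directory to /tmp: Operation not permitted

    # after a successful chroot("/tmp"), /dev/stdin really means /tmp/dev/stdin:
    ls /tmp/dev/stdin      # No such file or directory
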
[19:54:36] haha
[19:54:40] (we are on 1.6)
[19:54:46] (just pasting forum nonsense now, sorry)
[19:57:41] drdee: I have a question
[19:57:45] drdee: does filter run as root ?
[19:57:45] shoot :)
[19:57:54] drdee: usually if you do chroot you need root privileges
[19:58:01] let's ask ottomata
[19:58:19] ottomata, webstatscollector (both filter and collector) what user runs them on locke/emery?
[19:58:27] yeah, root, pretty sure,
[19:58:30] :D
[19:58:35] udp2log feeds into them, right?
[19:58:39] yes
[19:58:48] ah no
[19:58:51] they are run by the udp2log user
[19:58:55] udp2log feeds into filter
[19:59:13] average_drifter: ^^
[19:59:30] user@garage:~$ chroot /chroot
[19:59:30] chroot: cannot change root directory to /chroot: Operation not permitted
[19:59:38] in /chroot I have a debian set up
[19:59:46] it says "Operation not permitted" with my normal user
[20:00:02] ottomata: does the user doing chroot need to have special privileges in order to use chroot ?
[20:00:13] ottomata: and if so, what are the minimal such privileges
[20:00:38] what is chrooting?
[20:00:42] the filter code?
[20:00:45] yes
[20:00:47] yes
[20:00:52] (from webstatscollector)
[20:00:58] not udp-filters :)
[20:01:06] Only a privileged process (Linux: one with the CAP_SYS_CHROOT capability) may call chroot().
[20:01:31] drdee: when I comment out the 230 line in filter.c all works fine
[20:01:37] drdee: so it must be chroot-related
[20:02:19] you mean the retval clause?
[20:02:45] drdee: on line 230 there's chroot("/tmp")
[20:03:12] got it
[20:08:31] dschoon, lemme know when you are back
[20:08:38] eating atm
[20:08:42] but also fiddling
[20:08:47] aye
[20:08:59] so I installed DSE on my local vm
[20:09:02] thrift is not running either
[20:09:04] but hadoop works
[20:09:14] can't really type much tho, due to Emergency Avocado Actions that must be taken
[20:09:21] yeah, that's kinda what i thought
[20:09:33] i just tried to enable thrift on an03
[20:09:35] but it just hung
[20:10:49] er, why do we have two opscenters running?
[20:11:14] bwer where?
[20:11:16] dsc ~ ❥ dsh -g kk 'sudo netstat -lnp | fgrep 8888'
[20:11:16] an02: tcp 0 0 127.0.0.1:8888 0.0.0.0:* LISTEN 17935/python2.6
[20:11:16] an01: tcp 0 0 208.80.154.154:8888 0.0.0.0:* LISTEN 10827/python2.6
[20:11:32] pssh, dunno, let's stop it on an02
[20:11:52] just did
[20:11:55] DOUBLE THE OPS POWER
[20:11:59] DOUBLE THE CENTRALIZATION
[20:12:10] SO CENTRAL IT IS LIKE A BLACK HOLE
[20:12:19] hehe
[20:14:21] hmmmmmmmmmmmMMM
[20:14:31] actually it's not working on my local VM anymore either!
[20:14:54] also
[20:14:59] also
[20:15:03] where did you get this config we're using?
[20:15:04] since I upgraded DSE
[20:15:08] dsetool stopped working
[20:15:11] auto_bootstrap: false is not documented...
[20:15:15] cassandra.yaml?
[20:15:21] from the deb package
[20:15:32] hm
[20:15:48] here's the orig:
[20:15:55] /home/otto/scr/scr/etc/dse/cassandra/cassandra.yaml
[20:16:05] ty
[20:16:06] i extracted that from the .deb
[20:16:06] will look
[20:17:23] ok hmm
[20:17:24] on local VM
[20:17:30] after enablethrift
[20:17:35] hadoop and cassandra-cli work again
[20:17:48] trying to enablethrift on an02...
[20:17:56] where does the cassandra.yaml that is used by dse live? it's /etc/dse/cassandra/cassandra.yaml, right?
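
The thrift/rpc controls being used above, for reference — statusthrift, enablethrift and disablethrift are standard nodetool commands, and the rpc server they control is the thing cassandra-cli, opscenter and `dse hadoop` connect to on 9160:

    nodetool -h localhost statusthrift    # "not running" here is what explains the 9160 refusals
    nodetool -h localhost enablethrift    # normally starts the rpc server (it hung on these nodes)
    nodetool -h localhost disablethrift   # reproduces the "unable to connect to server" error on demand
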
[20:18:00] yes
[20:18:04] kk
[20:18:57] interesting, so when I enablethrift
[20:18:59] it just hangs, but the logs say
[20:19:11] Starting up Hadoop trackers
[20:19:11] Found CFS filesystem in Hadoop config: cfs
[20:19:11] Found CFS filesystem in Hadoop config: cfs-archive
[20:19:57] opscenter says it can't connect to the cluster.
[20:20:13] i think this is very clearly networking configuration.
[20:20:17] yeah, i haven't been messing with opscenter
[20:20:35] i dunnoooooooo, why?
[20:20:40] we can't even start the thrift service
[20:20:41] right?
[20:20:51] we know that the cassandras can all talk to each other
[20:21:35] the rpc service is just for external API calls.
[20:21:51] yes, which is used by cassandra-cli, hadoop interface, opscenter, etc.
[20:21:52] no?
[20:22:03] the storage_port is used by cassandra for its internal stuff
[20:22:12] right.
[20:22:12] that seems to be working
[20:22:14] correct.
[20:22:25] gimme a few to finish sandwich and read everything.
[20:22:28] ok
[20:22:38] maybe something is keeping cassandra from binding to port 9160
[20:22:45] even though we can't see it in netstat?
[20:22:46] i dunnoooo
[20:22:53] what if I just try changing the port?
[20:22:58] on all machines?
[20:23:30] question.
[20:23:43] normally, when you run nodetool, "DC" is the cluster-name, is it not?
[20:24:03] because it says Analytics
[20:24:04] DC?
[20:24:12] and cluster_name is KrakenAnalytics
[20:24:23] ahh
[20:24:30] nodetool -h localhost ring
[20:24:30] Address DC Rack Status State Load Owns Token
[20:24:30] 167592005115508692058986811016003550253
[20:24:30] no, that is something Datastax does
[20:24:31] 10.64.21.110 Analytics rack1 Up Normal 6.92 KB 5.25% 6383010147265667853861934756551826302
[20:24:33] 10.64.21.103 Analytics rack1 Up Normal 31.51 KB 2.40% 10473255200867680374935687437417441993
[20:24:35] ...
[20:24:45] telling you whether it is a raw Cassandra node, an analytics (hadoop) node, or a search (solr) node
[20:24:50] mk.
[20:25:05] that setting is in /etc/default/dse
[20:25:10] HADOOP_ENABLED=1
[20:25:22] if we set it to 0, it would say something else
[20:25:39] ok, i think we need to figure out why we can't enablethrift
[20:25:45] on my local vm
[20:25:47] if I disablethrift
[20:25:55] I get the same error about connecting to 9160
[20:25:57] if I enablethrift
[20:26:02] hadoop and cassandra-cli work
[20:26:42] (messing with dse on an03)
[20:28:04] i strongly suspect that this is due to the upgrade, and there's now a new step, and we haven't done it
[20:28:14] i think auto_bootstrap:false is the problem
[20:28:16] gimme a sec
[20:28:33] hmmm, yeah maybe
[20:29:09] http://wiki.apache.org/cassandra/StorageConfiguration
[20:29:46] StorageService says there's only one keyspace, 'system'
[20:29:56] which is wrong, iirc. there should be a bunch of CFS metadata
[20:31:42] hm, yeah should be, right?
[20:31:51] it's supposed to initialize all that stuff when it boots the first time
[20:32:06] yes.
[20:32:32] welp, rpc on a diff port still doesn't work
[20:32:34] enablethrift just hangs
[20:37:56] drdee: I'm ready for review
[20:38:06] drdee: new branch or can I do it on the previous one ?
[20:38:07] ok submit it :)
[20:38:14] new branch
[20:38:16] ok
[20:38:22] rules are rules :)
[20:38:25] because the previous one was merged
[20:38:34] oh ok
[20:38:58] ottomata: do we have JNA installed?
[20:39:11] http://www.datastax.com/docs/1.1/install/install_jre#install-jna
[20:40:16] I actually only followed those extra steps on an02!
[20:40:24] maybe that's it?!
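
A sketch of the JNA step from the DataStax page linked above. On Ubuntu the packaged library is libjna-java; the DSE lib path below is an assumption, and the doc itself prefers a newer jna.jar from github, so treat this as the minimal version:

    sudo apt-get install libjna-java                        # installs /usr/share/java/jna.jar
    # make it visible to DSE/Cassandra (directory is an assumption):
    sudo ln -s /usr/share/java/jna.jar /usr/share/dse/cassandra/lib/
    # per the discussion above, JNA only buys memory-locking performance, not functionality
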
[20:40:26] *eyebrow*
[20:40:27] jna is installed
[20:40:30] but not the github version they say
[20:40:31] let us be consistent!
[20:40:37] well, i did that earlier today
[20:40:43] and let us follow instructions!
[20:40:45] when I was trying to figure out why things weren't working
[20:40:47] really
[20:40:48] so that way when it doesn't work, it's not our fault!
[20:40:49] heh
[20:40:51] we should upgrade these to precise
[20:40:52] and not worry about it
[20:40:59] indeed
[20:41:00] but.
[20:41:07] from what I could tell
[20:41:09] do they have any recommendation on that front?
[20:41:12] the jna stuff only is for performance
[20:41:18] not for functionality
[20:41:20] yeah
[20:41:21] but i mean
[20:41:26] consistency
[20:41:29] and, the rpc stuff works on my vm without the extra step
[20:41:30] but yet
[20:41:31] yeah*
[20:43:31] average_drifter: gerrit link?
[20:44:13] dschoon, re: consistency, it looks like I did not do the JNA step on an02…either that or something replaced the .jar I dled
[20:44:21] haha
[20:44:22] I will go ahead and put it in place on all of them
[20:44:24] hokay.
[20:46:35] drdee: https://gerrit.wikimedia.org/r/24375
[20:46:40] ty
[20:46:48] uh.
[20:46:58] that's certainly a big old WTF
[20:47:12] dsc ~/tmp/dse ❥ dsk kkc 'grep rpc_port /etc/dse/cassandra/cassandra.yaml'
[20:47:12] an05: rpc_port: 9160
[20:47:12] an07: rpc_port: 9160
[20:47:14] an09: rpc_port: 9160
[20:47:16] an02: rpc_port: 9160
[20:47:18] an01: rpc_port: 9160
[20:47:20] an03: rpc_port: 9161
[20:47:22] an06: rpc_port: 9160
[20:47:24] an04: rpc_port: 9160
[20:47:26] an08: rpc_port: 9160
[20:47:28] an10: rpc_port: 9160
[20:47:30] sup an03
[20:47:32] how's it goin?
[20:47:39] average_drifter: you should have run git pull first after your previous commit was merged
[20:47:52] now you are submitting your previous change again
[20:48:20] drdee: I did git pull and then branched out of that
[20:48:24] from master
[20:48:32] I must've made a mistake :|
[20:48:39] drdee: I can re-do it
[20:48:44] drdee: should I ?
[20:48:46] mmmm, but your new change set does contain the old patch
[20:49:14] yes make sure that your new branch contains the most recent version
[20:49:23] ok
[20:49:36] sorry about it, I'm a nooblet to this gerrit thing
[20:50:09] that's okay
[20:50:15] we all curse at it
[20:51:42] ...ottomata
[20:51:50] how exactly did we start these nodes?
[20:52:00] because we both have said out loud what the problem is.
[20:52:25] ?
[20:52:34] they're ANALYTICS nodes
[20:52:36] btw, the 9161 on an03 was me manually trying stuff
[20:52:37] not HADOOP nodes.
[20:52:37] yesh
[20:52:41] Analytics == Hadoop
[20:52:46] ...I suspect not.
[20:53:00] let me find you the doc...
[20:54:12] ! maybe
[20:54:22] maybe since changing the rpc IP to the ip of the node
[20:54:28] this
[20:54:28] <property>
[20:54:28]   <name>fs.default.name</name>
[20:54:29]   <value>cfs:///</value>
[20:54:29] </property>
[20:54:31] does not work anymore
[20:54:36] since it probably assumes localhost
[20:54:56] oh, fuck.
[20:55:03] there are several things we need to fix.
[20:55:10] i totally forgot, i even wrote it down last night
[20:55:21] oh?
[20:55:34] drdee: https://gerrit.wikimedia.org/r/24377
[20:55:50] drdee: I did git pull and branched out of it in filter_fix
[20:55:56] nope same problem
[20:56:01] hmm
[20:56:12] we totally need to edit cassandra-env.sh and dse-env.sh to set -Drmi.hostname=$(hostname)
[20:56:23] ?
[20:56:25] oh?
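
The change being proposed just above, as it would look in cassandra-env.sh (and dse-env.sh); note the full property is java.rmi.server.hostname, not rmi.hostname, and as the next exchange shows it did not fix dsetool on its own, so this is a sketch of the idea rather than a verified fix:

    # cassandra-env.sh / dse-env.sh -- anything that starts a java process:
    JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=$(hostname)"
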
[20:56:58] drdee: I can't figure it out, halpz
[20:57:02] halp
[20:57:27] 1 sec
[20:57:33] ok
[20:58:12] -Djava.rmi.server.hostname=
[20:58:37] it won't bind otherwise, and i suspect this prevents dsetool from working
[20:58:43] that's why it throws rmi exceptions
[20:59:02] goddamnit, i even figured this out last night
[20:59:17] you wanna update that in puppet?
[21:00:07] ah ok
[21:00:13] sure, those aren't in puppet yet, but they prob should be
[21:00:26] i'm going to stop the cluster.
[21:01:05] btw, the hostname according to `hostname` does not have .eqiad.wmnet
[21:02:01] i'll note that their docs also have quotes around the seed nodes
[21:02:11] seed_provider:
[21:02:11] - class_name: org.apache.cassandra.locator.SimpleSeedProvider
[21:02:12] parameters:
[21:02:14] - seeds: "110.82.155.0,110.82.155.3"
[21:02:40] ah hmm, ok will add that too
[21:02:47] i have a meeting now.
[21:02:53] (robla) so i'll be back in a bit.
[21:02:56] am I adding this to JVM_OPTS?
[21:03:16] ah i see it, yeah
[21:03:19] in cassandra-env.sh at least
[21:03:19] ok
[21:04:06] yeah, it should be in anything that starts a java process
[21:04:39] average_drifter: back
[21:05:01] drdee: I'm here
[21:07:37] dschoon, fyi, it didn't make a diff for dsetool
[21:07:54] i just manually added it to the /usr/bin/dsetool script (since dsetool doesn't load any opts from its env files)
[21:08:03] same java.rmi.ConnectException: Connection refused to host: analytics1002.eqiad.wmnet;
[21:08:04] though
[21:08:28] dsetool works on my localvm with none of those changes
[21:09:12] dschoon, you know what...
[21:09:19] i kind of want to start over with these
[21:09:28] everything kinda seems a bit unruly at the moment
[21:09:33] and we want to put them on precise anyway
[21:09:35] right?
[21:15:32] drdee: https://gerrit.wikimedia.org/r/24379
[21:20:10] merged
[21:21:58] drdee: ok, now I'll look at asana to study the piece of code you post there
[21:22:06] *posted
[21:22:16] what exactly?
[21:22:51] drdee: https://app.asana.com/0/1891117540465/1491693732176
[21:22:55] drdee: that one
[21:23:29] 1 sec
[21:24:16] dschoon, i'm about, ttyt
[21:25:08] average_drifter: but we didn't finish webstatscollector yet, right?
[21:25:39] it needs support for *.planet.wikimedia.org domains, blog.wikimedia.org and wikimediafoundation.org (in filter.c)
[21:25:43] drdee: we're supposed to add planet.wikimedia yea
[21:25:50] drdee: that's what I was gonna say
[21:25:59] ok I'll add those
[21:26:29] the asana task that you mention is udp-filter, another c project
[21:57:56] all limn tasks have been migrated to github, from now on I'll track it there
[21:59:54] awesome!
[22:00:10] average_drifter: what hours do you work on a typical day?
[22:01:32] back
[22:06:45] drdee: well I'm usually full focus around 1-2 PM, but sometimes I start earlier
[22:07:47] now I'm looking to understand the projects struct and how to add some test log lines to example2.log so that I can confirm *.planet.wikimedia.org passes the filter
[22:07:50] and the other one as well
[22:08:20] sounds good
[22:09:45] so the * is a language code, like 'en' or 'pl'
[22:10:55] yeah
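
A hypothetical smoke test for the filter change being planned above; the exact squid log line format is not shown anywhere in this log, so the only assumption made is that filter keeps reading log lines on stdin once sample lines for the new domains are added to example2.log:

    cat example2.log | ./filter    # hypothetical check after adding the new sample lines
    # expected: output rows for blog.wikimedia.org and the <lang>.planet.wikimedia.org domains,
    # none of whose URLs contain a /wiki/ path
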
[22:31:28] drdee: I have a question about adding blog.wikimedia.org [22:31:46] average_drifter: shoot [22:31:55] later milimetric [22:31:57] average_drifter: no idea, just found out about it from these guys [22:32:00] later drdee [22:32:38] drdee: the first check for a link is it has the first diretory /wiki [22:32:51] drdee: if it doesn't then it's skipped altogether [22:32:57] drdee: but blog.wikimedia.org has no /wiki [22:33:03] drdee: so should it be counted ? [22:33:07] yeah but that does not apply to blog and planet [22:33:21] that only applies to wikimedia project urls [22:33:46] so /wiki/ only applies if the project is a wiki project [22:33:58] the blog is wordpress and so doesn't have /wiki/ as you noticed [22:34:11] so probably best is first to filter by domain names [22:34:27] drdee: ok [22:34:27] and then filter by path [22:34:32] yep [22:35:39] and *.planet.wikimedia.org does not seem to be a wiki either [22:37:11] average_drifter: you can drop quality from the whitelist [22:37:42] and usability as well [22:46:03] ok [23:03:37] dschoon: http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia-search-data-now-available/ [23:03:49] nice [23:12:41] anther dataset liberated :D [23:12:56] drdee: can I ask something in pm ? [23:13:11] yeah [23:54:06] later guys