[00:05:49] (PS2) AzaToth: grrrit: Fix check in repo_config for "repos" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/118028
[02:03:58] petan: I'm getting 502 - Bad gateway for lab-l http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/
[02:28:02] Coren: I increased the dnsmasq cache size on eqiad 4x. Let me know if you notice any change in behavior.
[02:29:00] (pmtpa cache is the same as always)
[02:29:28] ottomata: Same to you: Please let me know if the proxy is either more or less reliable now.
[02:38:34] petan: If you are around, can we talk about your proxy bug?
[02:42:12] petan: ok, commented on the bug instead
[08:58:55] petan: what project is http://wdjenkins2.wmflabs.org in?
[08:59:11] I have no idea what it is
[08:59:30] how can I check that?
[09:00:40] oh, well...
[09:01:02] Sorry, I'm talking about the proxy problems you're having. In order to set up a proxy you must have done it within a project, right?
[09:02:36] Can you give me an example so that I can see the problem you're seeing?
[09:04:21] andrewbogott: http://bots.wmflabs.org/
[09:04:44] ok… so that proxies to instance wm-bot, right?
[09:04:49] yes
[09:05:06] It doesn't look to me like wm-bot is accessible on http ports. Am I mistaken?
[09:07:10] let me check
[09:08:42] now it is
[09:08:48] but it still doesn't work
[09:09:11] I can connect to port 80 from other instances
[09:15:19] strange, nginx says 2014/03/11 09:14:46 [error] 29152#0: *1 wm-bot.eqiad.wmflabs could not be resolved (3: Host not found), client: 218.212.126.111, server: , request: "GET / HTTP/1.1", host: "bots.wmflabs.org"
[09:15:24] and yet I can resolve it from the same box
[09:26:01] andrewbogott: aaah, I might know the solution.
[09:26:09] YuviPanda: yeah?
[09:26:30] When I set this box up it was working, now nginx can't resolve anything
[09:26:39] andrewbogott: line 31, modules/dynamicproxy/templates/proxy.conf
[09:26:45] andrewbogott: specifies a specific DNS server that's hardcoded
[09:26:49] andrewbogott: might need to be changed?
[09:26:56] * andrewbogott looks
[09:26:56] andrewbogott: nginx required it when I first set it up...
[09:27:16] andrewbogott: should probably be made into a parameter of some sort?
[09:28:13] YuviPanda: why does it require a resolver at all? Clearly the system has perfectly reasonable dns...
[09:28:19] could it be localhost?
[09:28:35] andrewbogott: good question. nginx couldn't resolve anything before I put it there, so localhost might just as well work.
[09:28:45] andrewbogott: I didn't try localhost, but it didn't seem to automatically pick up the system's
[09:33:09] YuviPanda: yeah, changing that helps, although setting it to localhost does not.
[09:33:22] andrewbogott: yeah, I dunno what nginx is doing / thinking there.
[09:33:25] Guess I'll write a patch...
[09:33:28] Weird that it's required
[09:34:01] andrewbogott: yeah. might as well parameterize it
[09:35:38] andrewbogott, what TZ are you in?:)
[09:35:57] today I'm in UTC+8
[09:36:05] but returning to North America later this week
[09:49:57] petan, better?
[10:46:55] !log deployment-prep dropping some unused databases from deployment-sql instance.
[10:46:56] Logged the message, Master
[11:25:45] Sigh. I was redirected to http://www.tools-webgrid-01.com:4078/xtools/editsummary/ from tools.wmflabs.org/xtools/editsummary/
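For anyone puzzled by the resolver thread above: nginx only consults /etc/resolv.conf for names it resolves once at startup; a proxy_pass target built from a variable is resolved per request, and that code path needs an explicit resolver directive. A minimal sketch of the pattern, with a made-up resolver IP rather than the real value from proxy.conf:

```nginx
server {
    listen 80;

    # Runtime DNS for proxied backends. This stands in for the value
    # hardcoded on line 31 of modules/dynamicproxy/templates/proxy.conf
    # that andrewbogott parameterized; per the log, "localhost" did not work.
    resolver 10.68.16.1;

    location / {
        # Because the target is held in a variable, nginx defers
        # resolution to request time and uses the resolver above.
        set $backend "wm-bot.eqiad.wmflabs";
        proxy_pass http://$backend;
    }
}
```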
[13:14:43] re
[13:31:48] hashar: Why in /blazes/ would you need a specific gid?
[13:32:04] hashar: Anything that relies on a uid/gid's numeric value is a bug.
[13:32:15] good morning Coren :-]
[13:32:24] hashar: And yes, good morning. :-)
[13:32:31] err: /Stage[main]/Mediawiki::Users::L10nupdate/File[/home/l10nupdate/.ssh]/owner: change from root to l10nupdate failed: Failed to set owner to '10002': Invalid argument - /home/l10nupdate/.ssh
[13:32:35] that is from puppet
[13:32:50] the manifest that installs l10nupdate assigns the GID 10002 :(
[13:32:53] hashar: Then that manifest is broken.
[13:33:16] I don't know the exact details, but iirc the l10n files are fetched from translatewiki each night on tin then synced using dsh
[13:34:32] and we have an admins::group with gid 10002
[13:34:32] That still doesn't justify a manifest with a numeric gid. :-)
[13:36:13] the first commit dates from 2011-09-13 :(
[13:36:24] merely historical, I am not sure how to get it fixed in production nor whom to ask
[13:36:53] so I thought of matching the uid/gid in labs. Sounds easier to me than attempting to fix the tech debt :]
[13:38:18] Coren: is it a total hack to create l10nupdate user and groups in LDAP with fixed id ?
[13:38:24] If that currently exists in production, then it's a bug in production that needs to be fixed; NFS4 will not (ever) allow you to chown to a numeric uid, it idmaps with user/names/ (as any system should). You'll have to make the manifest conditional on labs until the production side is fixed.
[13:38:31] if you dislike it, I would mail engineering list to figure out what needs to be done
[13:38:46] hehe
[13:38:57] will warn on eng list and get folks to fix that
[13:39:09] pretending it is a blocker for the pmtpa -> eqiad migration
[13:39:32] hashar: Wait, the user exists in ldap?
[13:39:50] I don't think so
[13:40:07] on pmtpa there is a /home/l10nupdate though
[13:40:24] puppet can create that directory on the eqiad NFS server but cannot assign the l10nupdate group
[13:40:28] which I guess is working as intended.
[13:41:18] hashar: The /correct/ solution is simple: make sure the group exists in ldap, and refer to it by name.
[13:42:09] yeah I think something on the NFS server is slightly different between pmtpa and eqiad
[13:42:52] Yes, pmtpa had broken 1:1 mapping and ignored usernames, causing all sorts of trouble.
[13:43:03] the mount on pmtpa has the option sec=sloppy
[13:43:16] hashar: hm. Does the l10nupdate group in labs need to be the same as in prod?
[13:43:30] probably not
[13:43:43] hmm no
[13:43:51] 10002 is hardcoded in the puppet manifests
[13:44:06] so got to get that fixed in prod
[13:44:17] Then part (1) of the correct solution ("make sure the group exists in ldap") is as simple as "create a service group in the project" :-)
[13:44:53] part (2) ("remove the stupid hardcoded gid") can be done in puppet with an if $::realm for now.
[13:45:16] service group ???
[13:45:26] like we can create our own labs user groups?
[13:45:41] * hashar digs
[13:46:24] https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup
[13:46:36] awesome
[13:46:49] just found out it is in the sidebar under "Labs User"
[13:47:17] that is going to conflict with the local groups created on the machine with gid 10002 but I can surely get that cleaned out
[13:47:49] Well, not really, because the name won't match nor will the gid.
[13:48:34] ah yeah that creates local-l10nupdate
[13:48:55] (Actually, not anymore. That'll create $projectname.l10nupdate.)
[13:49:19] that is not what the wikitech web interface is showing to me :-]
[13:49:45] But the idea is the same. Now, there is only a simple fix to be made to the manifest to have a group name parameter rather than a hardcoded '10002' and you're all set.
[13:49:58] changing the group is going to be a mess in puppet manifests as well
[13:50:09] The wikitech web interface lies because it still speaks pmtpaish.
[13:50:23] hashar: It should have never been hardcoded to begin with.
[13:50:42] hashar: Lemme give you a hand; what are the manifests in question?
[13:50:51] a lot of them
[13:51:00] there are references to group => l10nupdate in several places
[13:51:10] so having a group deployment-prep.l10nupdate would cause too many changes I am afraid
[13:51:15] I am fine having that group local though
[13:51:15] Ah, by /name/ is okay.
[13:51:31] okk
[13:51:39] Then we can stuff a l10nupdate group in global ldap for you since that is pervasive.
[13:52:30] But I can guarantee its gid is /not/ going to be 10002, so you need to fix that.
[13:52:46] andrewbogott_afk: no it's not better now, it's same
[13:52:49] You can't have a local group. It /needs/ to be in ldap for NFS to work.
[13:53:30] makes sense
[13:53:44] creating the service group gives me a deployment-prep.l10nupdate group:
[13:53:45] # getent group deployment-prep.l10nupdate
[13:53:45] deployment-prep.l10nupdate:*:51784:hashar
[13:53:58] # getent group l10nupdate
[13:53:58] # # not surprised =]
[13:54:22] Well, that's not strictly true. It needs to be on the clients and on all the NFS servers. LDAP is the only way to make that happen easily. :-)
[13:54:53] then I don't see myself maintaining deployment-prep.l10nupdate in our puppet manifests :-]
[13:54:56] that is prone to failure hehe
[13:55:07] I can surely get rid of 10002 though
[13:55:18] No, you're right. Because that lives in prod too that'd be overcomplicated.
[13:55:47] * Coren ponders.
[13:55:47] :-(
[13:56:09] that sort of dilemma has been striking me for 2 years now :/
[13:56:27] You know...
[13:56:32] that is why I came up with the crazy idea of hardcoding stuff in LDAP but that is not an elegant solution
[13:56:37] I can specialcase the service group instead.
[13:57:08] agh generic::systemuser { 'l10nupdate': default_group => 10002 }
[13:57:22] Nah -- that still makes it labs-specific.
[13:57:28] can probably get rid of that one in favor of default_group => l10nupdate and an include group::l10nupdate
[13:57:43] hashar: Yeah, that'll work.
[13:58:05] if we can avoid a hack / specific case that is nicer
[14:01:37] then to have a l10nupdate available on NFS I simply have to create such a user in wikitech right ?
[14:07:07] Coren: is the DNS/proxy issue fixed?
[14:07:09] it's working for us right now
[14:07:37] ottomata: andrewbogott_afk has made a change that may alleviate/fix the issue. Waiting on results.
[14:12:53] ok thanks!
[14:15:45] Coren: and here is my lame patch for l10nupdate https://gerrit.wikimedia.org/r/118071 .
[14:15:56] Coren: that hacks systemuser to accept a UID parameter just like puppet user {}
[14:16:12] let me pass the UID of the l10nupdate user I created on wikitech
[14:26:53] yeah wikitech!
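A minimal sketch of the fix being discussed, using only the resource names that appear in the log (generic::systemuser, default_group, group::l10nupdate); hashar's actual patch is the Gerrit change linked above, so treat this as an illustration rather than the merged code:

```puppet
# Refer to the group by name everywhere; only the production realm keeps
# the historic numeric gid until the hardcoding is removed there.
class group::l10nupdate {
    if $::realm == 'labs' {
        # In labs the group comes from LDAP (a wikitech user / service
        # group), so puppet only asserts its presence.
        group { 'l10nupdate':
            ensure => present,
        }
    } else {
        # Legacy production gid; NFSv4 idmapping refuses numeric chowns,
        # which is what produced the "Invalid argument" error above.
        group { 'l10nupdate':
            ensure => present,
            gid    => 10002,
        }
    }
}

generic::systemuser { 'l10nupdate':
    default_group => 'l10nupdate',   # by name, not 10002
    require       => Class['group::l10nupdate'],
}
```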
[14:34:00] <^d> hashar: Hey, I think I'm getting my slave set up fine :)
[14:38:15] ^d: good morning :-]
[14:38:27] the hhvm slave ?
[14:38:30] <^d> Yep
[14:38:30] <^d> :)
[14:38:53] there is some oddity with the username used to ssh from master to the slave
[14:38:58] we have two users jenkins-slave and jenkins-deploy
[14:39:03] <^d> Puppet's running right now.
[14:39:08] iirc jenkins-deploy is to be used on beta
[14:39:11] <^d> On the slave.
[14:39:13] and jenkins-slave for the ci slaves
[14:39:21] but I am not sure
[14:39:47] hmm https://integration.wikimedia.org/ci/computer/integration-slave02/configure uses jenkins-deploy :-]
[14:40:53] ah that might be jenkins-deploy for labs and jenkins-slave for production
[14:40:55] go figure
[14:40:59] <^d> ssh worked fine.
[14:41:08] <^d> But I didn't have java + jenkins installed yet (forgot to run puppet)
[14:41:13] what do you wanna run on that slave ?
[14:41:44] <^d> Just a job to build hhvm when things get checked into various branches we care about.
[14:41:59] <^d> It's really really resource intensive so I didn't want to bog down the other slaves.
[14:42:13] make sure this slave usage is restricted to run only jobs tied to it
[14:42:20] or jenkins would assign random jobs to it
[14:42:30] such as running mw/core tests there which is prone to failure
[14:42:50] then when creating the job, you can restrict where it is running by giving it the slave name ('hhvm-build')
[14:44:05] also that is the first CI slave created on eqiad
[14:45:20] <^d> I did :)
[14:45:54] <^d> Blah, I want to retry it.
[14:45:59] <^d> It seems hung on master tho: https://integration.wikimedia.org/ci/computer/hhvm-build/log
[14:46:12] :-/
[14:46:14] <^d> "[03/11/14 14:39:34] Launch failed - cleaning up connection"
[14:46:19] <^d> Just spinning endlessly
[14:46:20] see above
[14:46:24] [03/11/14 14:43:19] [SSH] Remote file system root /mnt/jenkins-workspace does not exist. Will try to create it...
[14:46:28] Caused by: com.trilead.ssh2.SFTPException: Permission denied (SSH_FX_PERMISSION_DENIED: The user does not have sufficient permissions to perform the operation.)
[14:46:31] <^d> Ahhh
[14:46:39] <^d> Well yeah, I want it to retry now ;-)
[14:46:44] <^d> All kinds of things failed.
[14:46:45] /mnt belongs to root
[14:46:48] <^d> No java, etc.
[14:47:34] hmm
[14:47:42] /mnt/jenkins-workspace is not created by puppet apparently
[14:47:51] <^d> It is.
[14:47:58] <^d> I just hadn't finished puppet yet ;-)
[14:48:05] <^d> Hence: "I just want it to retry now"
[14:48:05] ahh
[14:48:44] <^d> Meh, I'll just delete & readd
[14:48:46] ah puppet writes its logs to syslog ... how convenient
[14:48:58] I was tailing the wrong file (/var/log/puppet.log)
[14:49:48] ^d: it is busy creating some cow builder image for package building
[14:49:54] ^d: that takes a bunch of time
[14:50:07] <^d> Yeah it's just running, but the jenkins bits should all be done now I think
[14:56:54] ^d: have you ever tried Jenkins Job Builder to define your Jenkins jobs ? :-]
[14:57:07] <^d> Yeah
[14:57:11] <^d> I probably won't here.
[14:57:12] <^d> :p
[14:57:24] for maven jobs it is not that hard
[14:57:35] I mean for specific jobs
[14:57:43] and maven jobs are not any harder than a freestyle one
[14:59:31] !gitweb integration/jenkins-job-builder-config
[14:59:31] https://gerrit.wikimedia.org/r/gitweb?p=integration/jenkins-job-builder-config.git
[15:00:14] ^d: an example for mobile team is https://git.wikimedia.org/blob/integration%2Fjenkins-job-builder-config.git/master/mobile.yaml#L3
[15:00:41] * ^d is still waiting for cowbuilder
[15:00:43] <^d> moo.
[15:02:28] enjoy some first kisses meanwhile http://vimeo.com/88671403
[15:02:40] being spammed by friends right now ..
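In JJB terms, the restriction hashar describes at 14:42 is the job-level node attribute, which maps to Jenkins' "Restrict where this project can be run" setting. A tiny hypothetical example (not a file from integration/jenkins-job-builder-config; the job name and build command are made up):

```yaml
- job:
    name: hhvm-build-smoketest
    node: hhvm-build    # only ever schedule this job on the hhvm-build slave
    builders:
      - shell: 'echo "runs only on the hhvm-build slave"'
```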
[15:09:57] <^d> hashar: Ok, so looking at jenkins job builder. any examples of git scms that have multiple remotes?
[15:10:23] ^d: we might have a few
[15:10:34] if I remember correctly you can pass several -git parameters
[15:10:52] gotta try it out :-]
[15:13:49] !jenkins chad-multigit
[15:13:49] https://integration.wikimedia.org/ci/job/chad-multigit
[15:14:15] ^d: https://gerrit.wikimedia.org/r/118086 creates https://integration.wikimedia.org/ci/job/chad-multigit
[15:14:18] not sure it works though
[15:14:25] building it
[15:14:50] seems to work as expected https://integration.wikimedia.org/ci/job/chad-multigit/ws/ :-]
[15:14:53] <^d> That's not what I want.
[15:15:05] <^d> I want one repo with multiple remotes (this is git :))
[15:15:08] ah multiple remotes!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[15:15:12] <^d> So I can fetch origin/master
[15:15:16] <^d> tstarling/something
[15:15:17] <^d> ext.
[15:15:20] <^d> *etc
[15:15:47] <^d> I think I might need something like https://wiki.jenkins-ci.org/display/JENKINS/Multiple+SCMs+Plugin
[15:16:27] yeah we have that plugin installed
[15:16:42] which lets you create a job with multiple repositories
[15:16:51] not sure it supports multiple remotes in a single checkout though
[15:17:06] <^d> I might have to just do something hacky.
[15:17:23] <^d> Like just have the job shell out :)
[15:18:41] trying with the same basedir but different remote names
[15:19:13] doesn't work :(
[15:19:28] <^d> What about combining it?
[15:19:32] <^d> And giving it two url params?
[15:19:33] https://integration.wikimedia.org/ci/job/chad-multigit/3/console
[15:20:49] https://gerrit.wikimedia.org/r/#/c/118086/2/chad.yaml,unified
[15:21:01] tried to get both fetched at the same place but only the last is kept :-(
[15:21:11] ah it is wiped
[15:21:53] <^d> Yeah, I don't think job builder can figure it out.
[15:22:13] <^d> I'll just clone for origin, then do some prebuild shelling to make sure the repo's set up right.
[15:22:21] it can :-]
[15:23:01] ^d: https://gerrit.wikimedia.org/r/#/c/118086/2..3/chad.yaml,unified
[15:23:07] by default the git plugin wipes the workspace
[15:23:18] so when fetching the first repo to 'multi-remotes' directory it is cleaned up
[15:23:25] and the same happens when the second repo is fetched
[15:23:52] disabling workspace wiping on the second git scm prevents it from removing the remote defined by the first scm
[15:24:11] the build happened on lanthanum.eqiad.wmnet in dir /srv/ssd/jenkins-slave/workspace/chad-multigit/multi-remotes
[15:24:26] $ git remote
[15:24:26] origin-jjb
[15:24:26] origin-zuul
[15:25:00] I think there is a way to merge a branch before running tests
[15:25:05] so you could merge the first repo into the second
[15:25:43] yeah that is under merge options
[15:33:15] no idea
[15:34:50] <^d> Yeah, it's not as urgent, don't worry about it.
[15:34:56] <^d> I'll hack something up as needed.
[15:35:02] <^d> And jjb the rest.
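Patchset 3 of the chad-multigit change boils down to something like the following (option names are from the JJB git module; the repository URLs are stand-ins, not taken from the patch):

```yaml
- job:
    name: chad-multigit
    scm:
      # two git SCMs share one basedir, each under its own remote name
      - git:
          url: 'https://gerrit.wikimedia.org/r/p/integration/jenkins-job-builder-config.git'
          name: origin-jjb            # remote name of the first fetch
          basedir: multi-remotes
      - git:
          url: 'https://gerrit.wikimedia.org/r/p/integration/zuul-config.git'
          name: origin-zuul           # second remote in the same checkout
          basedir: multi-remotes
          wipe-workspace: false       # don't wipe away origin-jjb first
```

With wiping left at its default on the second SCM, each fetch deletes the other's remote, which is exactly the "only the last is kept" symptom above.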
[15:36:31] yeah via shell script maybe
[15:36:38] not sure how git plugin can merge two remotes :(
[15:39:00] <^d> Here's what I've got so far: http://p.defau.lt/?CY4Ja_eO63nVRm_4dDlSZQ
[15:39:14] <^d> I'll add another step to save the binaries (since that's what ori wants)
[15:40:13] <^d> In good news though: "Slave successfully connected and online"
[15:40:16] <^d> :D
[15:51:34] ^d: the git plugin should be able to handle it but nothing straightforward apparently
[15:51:53] ^d: there is a github plugin as well iirc
[15:52:06] might not be in jjb though
[15:53:46] ^d: http://ci.openstack.org/jenkins-job-builder/triggers.html#triggers.github
[15:54:46] ^d: and http://ci.openstack.org/jenkins-job-builder/properties.html#properties.github :D
[15:55:30] <^d> Looks kind of hacky :p
[15:55:32] <^d> I don't need it
[15:55:50] that would probably poll github and build whenever a commit is found
[15:57:53] <^d> Don't really need per-commit builds (they already have travis)
[15:57:59] <^d> We just need daily builds for testing against.
[16:02:35] ^d: there is also a template to build Debian packages on our jenkins
[16:02:41] ^d: something like '{name}-debian-glue'
[16:02:57] is there something I am doing wrong when getting the "no such tool" error when using become and connection timed out when doing ssh?
[16:02:58] should be easy to set up. If it matches git build package conventions that should just work.
[16:03:14] <^d> hashar: I'm not building debs.
[16:03:29] just in case you want to do it one day :-]
[16:03:44] <^d> Debs are black magic, I'll let someone else do those.
[16:07:09] I can look at it
[16:19:09] <^d> hashar: https://gerrit.wikimedia.org/r/#/c/118091/ is up for review + testing
[16:19:11] <^d> :)
[16:19:32] almost :-]
[16:21:41] 00:00:03.545 jenkins_jobs.errors.JenkinsJobsException: Unknown entry point or macro '@daily' for component type: 'trigger'.
[16:22:04] not sure why
[16:22:59] <^d> PS2 got it.
[16:23:02] <^d> I forgot timed:
[16:23:16] <^d> https://gerrit.wikimedia.org/r/#/c/118091/1..2/hhvm.yaml
[16:23:39] nice
[16:24:02] and you got the wrappers in \O/
[16:25:44] ^d: there is a path issue I think https://gerrit.wikimedia.org/r/#/c/118091/2/hhvm.yaml,unified
[16:25:48] you fetch to /hhvm
[16:25:59] your builder should probably cd to it
[16:26:02] cd hhvm
[16:26:07] <^d> Ahh, whoops.
[16:26:15] $WORKSPACE is the current directory so you probably don't need it
[16:26:23] <^d> I can just omit the basedir then
[16:26:41] apart from that sounds good. I usually create the job ( jenkins-jobs --conf jenkins_jobs.ini update config/ hhvm-daily-build )
[16:26:46] then build it once to verify it works
[16:26:51] if that is fine, you can self merge
[16:27:37] <^d> I don't have the scripts installed on this machine
[16:28:31] ahh
[16:28:32] let me deploy it so
[16:29:09] !jenkins hhvm-daily-build
[16:29:09] https://integration.wikimedia.org/ci/job/hhvm-daily-build
[16:29:13] done
[16:29:17] building
[16:29:25] https://integration.wikimedia.org/ci/job/hhvm-daily-build/1/console
[16:29:39] note that if you wipe out the workspace it has to clone the full repo
[16:29:43] might need to clean instead
[16:29:49] it runs something like git clean -xqdf
[16:30:30] <^d> Repo's not big, cloning is fine.
[16:30:36] okk
[16:30:45] stderr: fatal: Refusing to fetch into current branch refs/heads/master of non-bare repository
[16:30:49] some ref spec is wrong I guess
[16:31:10] <^d> fetch -t origin refs/heads/*:refs/heads/*
[16:31:15] <^d> What's wrong with that? :p
[16:31:51] <^d> Eh, it's redundant here.
[16:31:53] <^d> Amending to remove.
[16:32:21] deploying PS3
[16:32:38] <^d> No, PS4.
[16:34:50] ^d: building
[16:34:53] https://integration.wikimedia.org/ci/job/hhvm-daily-build/3/console
[16:35:09] <^d> \o/
[16:36:19] congratulations!
[16:36:25] and it is using JJB which is even better
[16:36:50] it is a bit more painful than clicking / modifying via the gui but I find it easier to review changes and track what is being modified
[16:38:01] <^d> I'm going to take a break and grab something to drink while this builds.
[16:38:12] <^d> I'll need to amend again to copy off the binary to where we want to save it too.
[16:38:53] archive: for the win!
[16:39:03] I usually create a "log" directory
[16:39:07] and put artifacts there
[16:39:15] (wiping log/ before the run begins)
[16:51:18] Hi all, I am getting Permission denied (publickey,hostbased).
[16:51:19] error
[16:51:25] when I try to ssh
[16:51:31] I am logging in for first time
[16:51:57] What's the mistake I am making?
[16:55:53] This is the output I get : OpenSSH_5.9p1 Debian-5ubuntu1.1, OpenSSL 1.0.1 14 Mar 2012
[16:55:53] debug1: Reading configuration data /home/.../.ssh/config
[16:55:53] debug1: Reading configuration data /etc/ssh/ssh_config
[16:55:53] debug1: /etc/ssh/ssh_config line 19: Applying options for *
[16:55:53] debug1: Connecting to tools-login.wmflabs.org [208.80.153.224] port 22.
[16:56:20] tuxnani: Hi! Which server are you trying to reach? Do you have a different username on your own machine and on Labs? In this case you need to specify the Labs shell name, i. e. "ssh username@tools-login-eqiad.wmflabs.org".
[16:56:59] ..
[16:58:56] tuxnani: Have you uploaded your key to wikitech? I don't see your key in the Labs cluster.
[16:59:44] ^d: hhvm build is a success!!!!!!!!!!! https://integration.wikimedia.org/ci/job/hhvm-daily-build/3/console
[16:59:54] <^d> Yepppp :D
[17:05:51] (PS1) John F. Lewis: Add #wmt-ko to the family [labs/tools/WMT] - https://gerrit.wikimedia.org/r/118096
[17:05:52] scfc_de: I added it
[17:05:59] Still I get the same error
[17:06:34] (CR) PiRSquared17: [C: 2 V: 2] Add #wmt-ko to the family [labs/tools/WMT] - https://gerrit.wikimedia.org/r/118096 (owner: John F. Lewis)
[17:07:24] tuxnani: Are you connecting as tuxnani@tools-login-eqiad.wmflabs.org?
[17:07:39] scfc_de: The problem seems to be a username mismatch
[17:07:52] I have a different username on my local machine
[17:07:57] What's the way out?
[17:08:50] Are you using Linux? Then you need to "ssh tuxnani@tools-login-eqiad.wmflabs.org".
[17:09:01] I am using the same command
[17:10:13] <^d> hashar: Amended again to archive the build artifact. Can you test it again? :)
[17:10:34] ^d: sure
[17:10:57] tuxnani: The log on tools-login-eqiad shows that you logged in successfully.
[17:11:10] scfc_de: Let me check
[17:11:40] scfc_de: I am logged in indeed. But I can't log in to the db
[17:11:46] ^d: ahah
[17:11:47] sql enwiki_p
[17:12:02] says I am not allowed to log in to sql
[17:12:13] ^d: we can capture the artifacts and copy them back on the master (gallium) then have the build published under integration.wikimedia.org somewhere
[17:12:23] tuxnani: What's the error message?
[17:12:25] ^d: but for now that is probably good enough
[17:12:40] <^d> I'm setting up a proxy & vhost on the slave too.
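For the username mismatch scfc_de diagnoses above, a ~/.ssh/config entry along these lines (the shell name and key path are examples) lets a bare "ssh tools-login-eqiad.wmflabs.org" use the Labs account:

```
# your Labs shell name, not your local login
Host tools-login-eqiad.wmflabs.org
    User tuxnani
    # the private half of the key pair uploaded to wikitech
    IdentityFile ~/.ssh/id_rsa
```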
[17:12:44] tuxnani@tools-login:~$ sql tewiki_p
[17:12:45] Enter password:
[17:12:45] ERROR 1045 (28000): Access denied for user 'tuxnani'@'10.68.16.7' (using password: YES)
[17:12:48] <^d> So we can fetch them from hhvm-build
[17:13:58] I created a tool account I think and I cannot become it
[17:14:02] tuxnani: Try again, please. (There may be a short (< 5 minutes) delay between first log in and DB access.)
[17:14:06] ^d: refreshing job
[17:14:13] ok
[17:14:17] thank you
[17:14:24] scfc_de: thank you sir
[17:14:38] TBloemink: You need to log out and in again so that the OS recognizes that you are a member of the tool now.
[17:14:42] tuxnani: No problem.
[17:14:51] scfc_de, from ssh you mean?
[17:14:58] <^d> hashar: http://hhvm-build.wmflabs.org/ ;-)
[17:15:44] hehe
[17:15:58] <^d> Hmm, I must've misunderstood basedir.
[17:16:10] no ide
[17:16:11] a
[17:16:24] <^d> Fixing.
[17:16:25] you should set up JJB on your machine
[17:16:39] also if you wanna test you can remove the workspace clean / Comment out the build step
[17:16:43] then rerun the job
[17:16:53] that would let you easily test the publishing step
[17:17:06] TBloemink: Yes. But I see that you logged into tools-login.wmflabs.org, the "old" pmtpa cluster. Coren, do you have any advice for new users on how to proceed, i. e. create a new tool in pmtpa and then migrate, or create in eqiad and let pmtpa be?
[17:17:25] Create in eqiad, let pmtpa die.
[17:17:45] Should probably fix the docs, but it's less than a week left so...
[17:18:13] TBloemink: Then you should log into tools-login-eqiad.wmflabs.org exclusively and not worry about the "old" stuff.
[17:18:17] petan: the bot is dead
[17:24:12] <^d> Got it setup :p
[17:27:49] ^d: there is some more doc at https://www.mediawiki.org/wiki/Continuous_integration/Jenkins_job_builder
[17:28:09] <^d> Yep
[17:28:14] you can update a single job with something like:
[17:28:20] jenkins-jobs --conf etc/jenkins_jobs.ini update config/ hhvm-daily-build
[17:29:21] hashar: i liked fab a lot
[17:33:00] ^d: do you need anything else from me ?
[17:33:24] ^d: if you want to test the publisher, prevent git from wiping the workspace and comment out the build
[17:33:28] ^d: that will be faster :]
[17:34:01] ^d: you can also play with the archive: publisher which would save a copy of the hhvm binary with the build http://ci.openstack.org/jenkins-job-builder/publishers.html#publishers.archive
[17:34:26] <^d> Nope, I think I got it now.
[17:34:29] <^d> Thanks for all your help!
[17:36:28] ^d: you can even rename hhvm to include the build # or gitsha1
[17:36:36] ^d: Jenkins has a few global env vars you can use https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables
[17:36:38] <^d> I probably will :)
[17:37:26] and the git plugin sets a few more https://wiki.jenkins-ci.org/display/JENKINS/Git+Plugin (look for 'environment variables' at the bottom)
[17:37:42] GIT_COMMIT - SHA of the current commit; GIT_BRANCH - Name of the branch currently being used, e.g. "master" or "origin/foo" :-]
[17:39:57] ^d: I am off :-] Ping me by email for follow up
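Pulling the day's JJB threads together, the hhvm.yaml job plausibly ends up looking something like this. This is a reconstruction from the log, not the contents of the Gerrit change; the build commands and binary path are guesses:

```yaml
- job:
    name: hhvm-daily-build
    node: hhvm-build          # restrict to the dedicated slave
    triggers:
      # a bare '@daily' raised "Unknown entry point or macro"; the cron
      # expression has to live under the "timed" trigger (the PS2 fix)
      - timed: '@daily'
    builders:
      - shell: |
          cd hhvm             # the clone lands in $WORKSPACE/hhvm
          mkdir -p "$WORKSPACE/log"
          cmake . && make     # placeholder build commands
          # hypothetical rename using Jenkins/git-plugin env vars
          cp hphp/hhvm/hhvm "$WORKSPACE/log/hhvm-$BUILD_NUMBER-$GIT_COMMIT"
    publishers:
      # the archive: publisher hashar recommends keeps log/* with the build
      - archive:
          artifacts: 'log/*'
```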
[18:01:48] Coren: scfc_de: any infos about the wm-bot? --> http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/
[18:02:34] hedonil: No. Ask petan? Also, check labs-l, I think I recall a recent email about irc logs.
* hedonil checks
[18:06:03] hedonil: My guess: petan pointed bots.wmflabs.org at a new webserver in eqiad that accesses eqiad shared project storage, but the logs are written to pmtpa shared project storage.
[18:08:22] scfc_de: 'k. first it was 502 then 404. so poking petan again...
[18:10:11] petan: poke. wm-bot/logs = 404
[18:12:17] YuviPanda:
[18:13:56] YuviPanda: have you woken up yet?
[18:14:03] AzaToth: yeah, in a meeting
[18:14:07] k
[18:18:58] okay, back. And how exactly should I now drop files on there?
[18:33:28] hmmmm
[18:36:16] could someone assist me with uploading files to the tools labs? /me is quite a noob to tools
[18:37:25] scfc_de, ?:)
[18:37:55] TBloemink: I would recommend using scp to transfer the files
[18:38:35] TBloemink: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Accessing_Tool_Labs_and_managing_your_files
[18:39:10] and as I evilishly assume you are using Windows, see https://wikitech.wikimedia.org/wiki/Help:Access_to_ToolLabs_instances_with_PuTTY_and_WinSCP
[18:39:51] wonderful, it worked
[18:39:53] using winscp
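The scp route scfc_de recommends looks like this in practice (the file names, shell name and tool name are made up for the example):

```bash
# copy a single file into your Labs home directory
scp myscript.py tbloemink@tools-login-eqiad.wmflabs.org:~/
# or a whole directory straight into a tool's project space,
# once you are a member of that tool
scp -r public_html/ tbloemink@tools-login-eqiad.wmflabs.org:/data/project/mytool/
```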
[18:47:21] hedonil: do you have a minute for some questions about access to page counts for English Wikipedia?
[19:09:24] Nettrom: back now. sure
[19:22:27] hedonil: thanks, I just /msg'ed you, hope that's ok
[19:23:53] hey, I'm getting errors when trying to delete instances that we don't need any more: Failed to delete instance parsoid-roundtrip3 (i-000004d8).
[19:25:26] Coren, ^^
[19:26:14] gwicke: Entirely possible, given the current two-headed configuration. Is this an old instance in tampa?
[19:27:24] yes, a bunch of them
[19:27:43] we now have real hardware for parsoid rt testing
[19:27:43] gwicke: Then don't worry about it; just mark them as 'doesn't need migration' in the migration doc and it'll go away when we shut tampa down.
[19:27:51] !migration
[19:27:51] https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress
[19:28:08] Ah, wrong page.
[19:28:25] No, right page. Last section. :-)
[19:28:28] should I add a note at https://wikitech.wikimedia.org/wiki/Nova_Resource:Visualeditor ?
[19:28:44] Second-to-last, I suppose. :-)
[19:29:13] some instances need to be migrated, but not all
[19:29:22] Put it in the migration progress page the bot just linked to, in the 'finished' section.
[19:29:47] Oh, wait, you'll need actual migration? You're not creating afresh?
[19:30:11] no, a few VMs should just be migrated over
[19:30:16] Drop an email to andrewbogott_afk, then; he's the one who'll need to know.
[20:19:02] Coren: Is this expected or something funky with my instance? "chgrp: changing group of `/data/project/': Read-only file system"
[20:19:49] bd808: You might have to reboot the instance, it booted faster than the filesystem could do ACLs. :-)
[20:20:17] I haven't found a clean way to serialize those two things yet.
[20:20:18] Hmmm.. Ok. I just rebooted but I can try that again
[20:20:41] I also tried unmounting and remounting with the same result
[20:21:13] Yeah, I've seen that happen before. I think there is some caching of permissions that takes some time to time out. :-(
[20:21:21] It always self-corrects after a little while.
[20:21:43] Okey doke. As long as I'm not crazy… :)
[20:27:20] Coren: I've tried rebooting 4 times now with no joy; any alternate fix? Or point me to the scripts that are running out of order?
[20:27:28] hello
[20:28:15] hashar: Shouldn't you be doing something more fun than hanging out here? ;)
[20:28:23] bd808: It's not scripts; it's the NFS server's idea of what your access rights are/should be. If you inspect them they agree that you should have rw, but there is a layer that cached your earlier ro.
[20:28:44] bd808: daughter asleep, wife watching some american soap on TV
[20:28:52] bd808: and I took some sun bath this morning :]
[20:29:22] bd808: I'm in the middle of another thing, but if you give me a little bit I'll use you as a guinea pig to figure it out right afterwards.
[20:29:55] Coren: Sounds good. I'll go poke at a different instance in a different project
[20:30:42] Hi all. Just set up a new tool (templatecount), created an index.php and ran webservice start
[20:30:52] (all in eqiad)
[20:30:56] Yet I'm getting 404 erros
[20:30:59] *errors
[20:32:34] Not showing on the list of tools either
[20:34:39] ...but does appear on http://tools-eqiad.wmflabs.org/?list
[20:34:39] Mmm, now working at http://tools-eqiad.wmflabs.org/templatecount/ but not at the regular address.
[20:35:23] (And not working at http://tools-eqiad.wmflabs.org/templatecount ?)
[20:36:35] jarry1250__: Ah, I see the issue. The proxy thinks the tool also exists in pmtpa.
[20:37:04] Coren: What's the indicator that the proxy uses?
[20:37:27] Hm. Actually, it /shouldn't/.
[20:37:44] scfc_de: It uses the absence of a webservice or public_html, but right now neither are there so I'm a little confused.
[20:39:01] Oh, ow. The tool's /home/ must be there though.
[20:39:04] * Coren fixes that.
[20:40:31] fix't
[20:40:40] ok, so I've run into a problem while migrating to eqiad: newly migrated webtools (cgi python) are returning "Four hundred and four!" whereas they used to work in pmtpa
[20:41:10] dungodung: 404 and not 'No webservice'?
[20:41:16] Coren: indeed
[20:41:16] dungodung: URL?
[20:41:20] e.g. http://tools.wmflabs.org/rightstool/cgi-bin/recentlogs
[20:41:30] running the script from CLI returns valid html
[20:41:46] which is quite baffling
[20:41:47] That used to work without webservice, right?
[20:41:58] I suppose
[20:42:11] Move the cgi-bin /inside/ public_html.
[20:42:17] !newweb
[20:42:17] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[20:42:25] ^^ is the scheme in use in eqiad
[20:42:34] ah
[20:43:03] (also needs a config tweak)
[20:43:10] hi, i have set up a labs-vagrant instance (role: uploadwizard), and it is incredibly slow, both on eqiad and pmtpa
[20:43:34] page load takes almost exactly 60 sec so i am guessing something is timing out
[20:43:58] the debug toolbar reports page generation times below 1 sec
[20:44:10] any ideas how i can fix?
[20:47:10] Coren: thanks!
[20:47:26] tgr: Have you tried loading a static URL from the instance to see if that is slow?
[20:48:03] That might help narrow in on proxy vs something in the instance
[20:48:26] Testing with curl from the instance itself might be helpful too
[20:48:36] bd808: assets load normally, only the main page takes long
[20:49:09] So probably not something in the labs proxy setup. Database wait?
[20:50:20] Krinkle: tsintution seems to return me to https://tools-webgrid-01/ instead of https://tools.wmflabs.org when I try to change language
[20:50:32] api calls are fast and those probably use the database
[20:51:09] jarry1250__: Example url?
[20:51:16] from a tool
[20:51:21] plus, the debug toolbar reports very short response times, i imagine database calls would affect that
[20:51:48] Krinkle: http://tools.wmflabs.org/templatecount/ then Change my language!
[20:51:59] select one, and it autoredirects back
[20:52:07] empty page
[20:52:49] Getting a blank page from that url, this one works though: http://tools-eqiad.wmflabs.org/templatecount/
[20:52:50] testing
[20:53:04] Hm.. indeed
[20:53:09] probably something in the proxy causing that to happen
[20:53:18] it used to work fine in pmtpa on the default apache
[20:53:25] now that it has a separate web service..
[20:54:04] jarry1250__: $_SERVER['SERVER_NAME']
[20:54:07] thanks, I'll look into it
[20:54:16] Krinkle: this effect occurs with the new web if a trailing / is missing in the url
[20:54:16] Coren: getting a blank page from tools.wmflabs.org now :( I swear it worked briefly there
[20:54:31] Coren: Specifically http://tools.wmflabs.org/templatecount/
[20:54:55] jarry1250__: ... I see a form with gears in the background.
[20:55:05] jarry1250__: works for me, too
[20:55:33] Coren: Okay, gdgd, must be caching or might be cookies. It's just that it wasn't working for Krinkle either, which confused me
[20:56:40] Coren: Can we get the tools webservice to preserve the Host: header as it is originally?
[20:56:50] http://tools.wmflabs.org/intuition/_server.php
[20:56:59] right now HOST and SERVER_NAME change to tools-webgrid-01:4061
[20:57:11] or wherever it ends up
[20:57:43] bd808: the HTML comment at the very end of the page says "Served in 0.134 secs" - i imagine that includes everything that mediawiki does?
[21:00:07] Krinkle: I don't think that's possible when proxying.
[21:00:16] tgr: In theory. That comes from wfReportTime() comparing microtime() with $wgRequestTime
[21:00:21] Krinkle: Because it has to be a correct HTTP request.
[21:00:30] Coren: I'm pretty sure it is. We do lots of proxying in Wikimedia production, and yet MediaWiki is able to maintain the server name through all that.
[21:00:44] HTTP request target and Host: header don't have to match
[21:02:27] This is among other things what powers bit.wikimedia.org and it proxying back to application land etc. and of course all apaches inside wmf clusters.
[21:02:43] granted, they're probably more complex than this proxy, but it shouldn't be hard to forward the header properly
[21:03:40] tgr: So static assets are fast and php thinks it's fast. Have you looked at the browser debug tools to see if the 30s is spent waiting for the page body or somewhere else?
[21:03:53] waiting, yes
[21:04:14] it only starts receiving data after 60 sec
[21:04:47] this is for normal pages, api.php works normally
[21:04:58] Krinkle: I'll look into it; but I'm honestly not sure why you'd be that interested in it since you normally won't be doing virtual hosting. :-)
[21:04:58] Coren: Sorry to drop this here, but do you think this could be fixed? I'm basically blocked on that for Intuition's localization functioning. I currently use plain Host. I also looked at the more elaborate logic MediaWiki uses (which covers pretty much all reasonable and many unreasonable proxy setups) and that doesn't get it either. The web tools proxy puts it in a 'HTTP_X_FORWARDED_SERVER' header,
[21:04:58] but even MediaWiki doesn't use or need that.
[21:05:09] scfc_de: cron daemon seems not to be running on tools-dev
[21:05:26] Coren: Because I use libraries that aren't written specifically and exclusively for "tools.wmflabs.org".
[21:05:39] for one, tools-eqiad.wmflabs.org is an exception already, full urls need to preserve that
[21:05:40] Krinkle: Ah, and they construct absolute addresses?
[21:05:50] How evil.
[21:06:15] Depends. Needing an absolute url is quite reasonable imho. Relative paths work for lots of things, but not everything by a long shot.
[21:06:21] Just look at a random source output from mediawiki
[21:07:43] tgr: I'm stumped.
[21:09:47] Coren: I'm trying to migrate the 'grouplens' project to eqiad, it doesn't want to finish correctly, though
[21:10:09] Krinkle: And also look at how many hacks there need to be to make this work right in many setups because of that very reason. :-) (Including $wgJunk to override any of it). But okay, I'll look into it.
[21:10:10] project = tool (or whatever you'd like to call it)
[21:10:21] Nettrom: What error are you running into?
[21:10:48] Coren: "That tool doesn't seem to be migrated yet." although I get "Copy complete" on pmtpa
[21:11:02] Nettrom: What tool is this?
[21:11:20] Coren: grouplens, we have it as a shared space for research projects
[21:11:28] Coren: "Unable to create and initialize directory '/home/bd808'" from multiple instances in deployment-prep project in eqiad. Not cleared by reboot. The eqiad NFS server hates me.
[21:11:58] bd808: Clearly. Are those new or migrated instances?
[21:12:11] Coren: New instances.
[21:12:24] Coren: True, for Wikimedia we don't rely on MediaWiki detecting the hostname (we hardcode $wgServer) - however, we do rely on it very much actually, just not in mediawiki-core. wmf-config InitialiseSettings, wgConf, MultiVersion etc. all use SERVER_NAME to determine which wiki we're on.
[21:12:32] and thus rely on all proxies in-between to forward it
[21:13:16] so it goes SERVER_NAME -> wiki db name -> hardcoded wgServer for that wiki db name
[21:13:27] he
[21:15:06] Nettrom: I'm not seeing it as migrated at all. Hmm. Lemme check something.
[21:15:52] Coren: I noticed that the shell complains about my locale settings (I used my Mac to SSH in), not sure if that's got something to do with it
[21:16:03] s/complains/complained/
[21:16:22] Nettrom: That seems unlikely; but I'm looking at it now to see what's up.
[21:16:30] Coren: thanks!
[21:17:55] Coren: Can you point me roughly in the direction of where this proxy is configured / run? Maybe I can submit a patch.
[21:18:03] which software and its config
[21:19:00] Krinkle: There are a couple of proxies in the way, both apaches. Give me a few and I'll be all yours.
[21:19:35] Coren: FYI the NFS ACL problem in the wikimania-support project healed itself
[21:19:50] Cool :)
[21:20:03] or you healed it quietly
[21:20:13] bd808: Yeah, like I said, there's something that's caching it I just haven't tracked down what yet.
[21:20:57] Krinkle: Might not be a solution to your problem, but in my tools (pre-Labs, on Toolserver and everywhere else) I usually follow MediaWiki's example of an absolute URL in the configuration as determining it automatically is indeed troublesome.
[21:21:26] hedonil: Cron on tools-dev(-pmtpa) or tools-dev-eqiad?
[21:21:50] scfc_de: it has been restarted in the meantime
[21:22:06] Nettrom: There was an issue with your copy which I've fixed; you should be able to finish-migration now
[21:22:55] !ping
[21:22:56] !pong
[21:22:57] ok
[21:24:10] YuviPanda: Hey, btw, when do you think you'll be ready to do the yuviproxy things for tools? I'd /love/ to get rid of the apaches for good. :-)
[21:25:18] Coren: hey! Am travelling/working this week, so probably not this week :( I'll try to get to it on the weekend.
[21:25:29] Coren: apps deadline is also approaching, so my ops work has taken a hit
[21:26:06] Coren: last time I looked, it just needed 'webservice' to plug into it.
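As a rough illustration of the chain Krinkle sketches above (made-up lookup tables, not the real wmf-config code):

```php
<?php
// Made-up lookup tables standing in for the real wmf-config data.
$hostToDb = array( 'en.wikipedia.org' => 'enwiki' );
$dbToServer = array( 'enwiki' => '//en.wikipedia.org' );

// Step 1: SERVER_NAME picks the wiki, so it must survive every proxy hop.
$host = $_SERVER['SERVER_NAME'];
$wikiDbName = isset( $hostToDb[$host] ) ? $hostToDb[$host] : false;

// Step 2: $wgServer is then hardcoded per wiki, not detected.
$wgServer = $wikiDbName ? $dbToServer[$wikiDbName] : false;
```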
[21:28:08] Krinkle: I can 'ProxyPreserveHost On'; this will do what you need at the cost of dragging 2616 in an alley and roughing it up a bit.
[21:28:28] Coren: 2616?
[21:28:46] RFC 2616. HTTP/1.1 :-)
[21:28:49] Ah, the RFC
[21:29:20] * Coren tries it.
[21:29:41] What part would it violate? Maybe there's another way? What are the side effects? Are they limited to within the internal requests? I wonder how we do it in production
[21:29:56] * Krinkle hopes it works
[21:31:47] Krinkle: Basically, it makes the server-side proxy behave as a client-side proxy. In /practice/ there shouldn't be any significant issues.
[21:33:09] Krinkle: That should do it.
[21:34:54] Coren: I lost my IRC connection, looks like you fixed the "grouplens" tool migration?
[21:34:57] Change on mediawiki a page Developer access was modified, changed by PleaseStand link https://www.mediawiki.org/w/index.php?diff=926175 edit summary: [+3] fixed button styling
[21:35:18] Nettrom: There was an issue with your copy which I've fixed; you should be able to finish-migration now
[21:36:01] Change on mediawiki a page Developer access was modified, changed by PleaseStand link https://www.mediawiki.org/w/index.php?diff=926178 edit summary: [+3] combine divs
[21:36:33] Coren: thanks! I seem to be having the same problem with my other tool, "suggestbot", does that require your intervention too?
[21:37:55] Nettrom: It probably will; if you did it at the same time or so. Gimme a sec.
[21:38:09] Coren: yep, I did it shortly after, sorry. And no worries, take your time!
[21:39:01] Nettrom: Trivial fix.
[21:40:39] Nettrom: In progress (for real)
[21:42:34] Coren: awesome!
[21:51:46] RE: Migrations -- if I start a new instance will it be created in eqiad? Is there a 'zone' setting, etc?
[21:52:42] duh
[21:52:42] found it
[22:06:49] Coren: Ready for yet another NFS issue in eqiad? I'm trying to use the labs-vagrant role and the vagrant user and its /home/vagrant homedir are created, but /home/vagrant is owned by root:root and trying to change that with chown gives "chown: changing ownership of `/home/vagrant': Invalid argument".
[22:06:51] My wild guess is that this is the NFS server saying it doesn't know who "vagrant" is and the NFS server has idmapd turned on: https://www.novell.com/support/kb/doc.php?id=7014266
[22:08:13] bd808: Your guesses are on the nose. You really should have an LDAP user for anything of the sort, but if you can't do it for some reason I'll add one to the NFS servers.
[22:08:31] bd808: So that's not an issue, it's by design.
[22:09:24] Coren: Ok. I'd be fine with an LDAP vagrant user. This is just a change/regression from pmtpa
[22:09:37] Coren, I'm branching xtools
[22:09:50] The vagrant user is created by the labs_vagrant puppet module
[22:09:57] "branching"?
[22:10:24] Coren, moving each tool onto its own toollabs tool.
[22:10:28] bd808: Sadly, proper NFS security requires a directory service.
[22:10:32] Cyberpower678: Good move.
[22:10:51] bd808: Well, not so much "sadly" as "annoyingly in context"
[22:11:10] Coren, I'm also rewriting the tools to be a little more efficient so they don't clog everything so quickly.
[22:11:20] Cyberpower678: Even better move. :-)
[22:11:39] bd808: Lemme create a user for you in LDAP. 'vagrant' right?
[22:11:45] Coren, I'm finishing with the edit counter core.
[22:12:02] Coren: yes "vagrant", please and thank you
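The underlying mechanics, sketched as a hypothetical shell session: NFSv4 idmapping translates names rather than numbers, so the chown can only succeed once both client and server can resolve 'vagrant' to the same identity.

```bash
# before: 'vagrant' exists only in the instance's local /etc/passwd,
# so the NFS server cannot idmap the name and rejects the chown
sudo chown -R vagrant:vagrant /home/vagrant
# chown: changing ownership of `/home/vagrant': Invalid argument

# after the user and group exist in LDAP, both client and NFS server
# resolve the same name, and the same command succeeds
getent passwd vagrant
getent group vagrant
sudo chown -R vagrant:vagrant /home/vagrant
```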
[22:14:26] bd808: Does it need a group too?
[22:14:45] Coren: group "vagrant" would be nice
[22:16:10] Coren: The puppet class does this: https://git.wikimedia.org/blob/operations%2Fpuppet/db35978959f870c105c754046ea46f8d76736dc2/modules%2Flabs_vagrant%2Fmanifests%2Finit.pp#L2
[22:16:49] Which works to create the instance local user, but then NFS barfs on setting $HOME ownership
[22:18:39] bd808: User and group added in ldap. Remember to remove the local user to avoid nightmares. :-)
[22:18:56] Coren: Thanks. I'll give it a try
[22:19:53] jarry1250___: Does it work now with intuition?
[22:19:55] http://tools.wmflabs.org/intuition/_server.php
[22:20:00] shows SERVER_NAME is good now
[22:20:02] so it should be fine
[22:20:31] Krinkle: Seems to :)
[22:20:58] Thanks!
[22:28:28] Coren: The ldap user works. Thanks again for the help
[22:31:57] bd808: That should really be best practice anyways. Having disparate users on disparate servers is just begging for trouble.
[22:32:19] * bd808 nods
[22:32:38] labs_vagrant is a big bag of nails. Handy but dangerous
[22:33:02] Coren: did the 'krinkle' patch fix the newweb / issue?
[22:33:42] hedonil: Define "issue"?
[22:35:06] Coren: missing trailing slash resolves the link to the webgrid name instead of to the hostname
[22:36:35] unfortunately I have no example at hand to test it
[22:37:38] hedonil: It might, but that'd depend on how lighttpd constructs the implicit redirects. I'm guessing it would though.
[22:44:40] hedonil: I lost track of what has been done, but http://tools-eqiad.wmflabs.org/wikilint/test redirects to http://tools-eqiad.wmflabs.org/wikilint/test/, and previously that failed IIRC.
[22:49:14] scfc_de: you mean the 'krinkle' thingy?
[22:49:44] hedonil: Yep.
[22:50:28] scfc_de: they changed some variables http://tools.wmflabs.org/intuition/_server.php
[22:51:13] scfc_de: s/variables /headers
[22:52:07] scfc_de: maybe that was it (22:28:10) Coren: Krinkle: I can 'ProxyPreserveHost On'; this will do what you need at the cost of dragging 2616 in an alley and roughing it up a bit.
[22:54:58] Coren: Could you please update https://bugzilla.wikimedia.org/show_bug.cgi?id=59926 with what you did where?
[23:12:50] Coren: got my tools and my user account successfully moved to eqiad now, thanks again for the help!
[23:17:59] !log wikimania-support Setup wikimania-scholarships.eqiad.wmflabs serving https://wikimania-scholarships.wmflabs.org via labs-vagrant and the wikimania_scholarships role
[23:18:00] Logged the message, Master
[23:33:10] <^d> !log integration spinning up new general purpose slave in eqiad, integration-slave1001. will replace slave02 (and maybe 03) from pmtpa
[23:33:12] Logged the message, Master
[23:33:55] <^d> !log integration slave hhvm-build added to eqiad earlier today, running hhvm nightly builds for testing
[23:33:56] Logged the message, Master
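To close the loop on Coren's ProxyPreserveHost change: in Apache terms it amounts to something like the following (an assumed vhost, not the actual tools proxy config). With the directive off, Apache rewrites the Host: header to the backend's own name, which is why tools-webgrid-01:4061 was showing up in SERVER_NAME; with it on, the client's original Host: header is passed through to the webgrid.

```apache
<VirtualHost *:80>
    ServerName tools.wmflabs.org
    # pass the client's original Host: header to the backend instead
    # of rewriting it to the backend's own name
    ProxyPreserveHost On
    ProxyPass        / http://tools-webgrid-01:4061/
    ProxyPassReverse / http://tools-webgrid-01:4061/
</VirtualHost>
```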