[00:05:49] (PS2) AzaToth: grrrit: Fix check in repo_config for "repos" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/118028
[02:03:58] petan: I'm getting 502 - Bad gateway for lab-l http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/
[02:28:02] Coren: I increased the dnsmasq cache size on eqiad 4x. Let me know if you notice any change in behavior.
[02:29:00] (pmtpa cache is the same as always)
[02:29:28] ottomata: Same to you: Please let me know if the proxy is either more or less reliable now.
[02:38:34] petan: If you are around, can we talk about your proxy bug?
[02:42:12] petan: ok, commented on the bug instead
[08:58:55] petan: what project is http://wdjenkins2.wmflabs.org in?
[08:59:11] I have no idea what it is
[08:59:30] how can I check that?
[09:00:40] oh, well...
[09:01:02] Sorry, I'm talking about the proxy problems you're having. In order to set up a proxy you must have done it within a project, right?
[09:02:36] Can you give me an example so that I can see the problem you're seeing?
[09:04:21] andrewbogott: http://bots.wmflabs.org/
[09:04:44] ok… so that proxies to instance wm-bot, right?
[09:04:49] yes
[09:05:06] It doesn't look to me like wm-bot is accessible on http ports. Am I mistaken?
[09:07:10] let me check
[09:08:42] now it is
[09:08:48] but it still doesn't work
[09:09:11] I can connect to port 80 from other instances
[09:15:19] strange, nginx says 2014/03/11 09:14:46 [error] 29152#0: *1 wm-bot.eqiad.wmflabs could not be resolved (3: Host not found), client: 218.212.126.111, server: , request: "GET / HTTP/1.1", host: "bots.wmflabs.org"
[09:15:24] and yet I can resolve it from the same box
[09:26:01] andrewbogott: aaah, I might know the solution.
[09:26:09] YuviPanda: yeah?
[09:26:30] When I set this box up it was working, now nginx can't resolve anything
[09:26:39] andrewbogott: line 31, modules/dynamicproxy/templates/proxy.conf
[09:26:45] andrewbogott: specifies a specific DNS server that's hardcoded
[09:26:49] andrewbogott: might need to be changed?
[09:26:56] * andrewbogott looks
[09:26:56] andrewbogott: nginx required it when I first set it up...
[09:27:16] andrewbogott: should probably be made into a parameter of some sort?
[09:28:13] YuviPanda: why does it require a resolver at all? Clearly the system has perfectly reasonable dns...
[09:28:19] could it be localhost?
[09:28:35] andrewbogott: good question. nginx couldn't resolve anything before I put it there, so localhost might just as well work.
[09:28:45] andrewbogott: I didn't try localhost, but it didn't seem to automatically pick up the system's
[09:33:09] YuviPanda: yeah, changing that helps, although setting it to localhost does not.
[09:33:22] andrewbogott: yeah, I dunno what nginx is doing / thinking there.
[09:33:25] Guess I'll write a patch...
[09:33:28] Weird that it's required
[09:34:01] andrewbogott: yeah. might as well parameterize it
[09:35:38] andrewbogott, what TZ are you in?:)
[09:35:57] today I'm in UTC+8
[09:36:05] but returning to North America later this week
[09:49:57] petan, better?
[10:46:55] !log deployment-prep dropping some unused databases from deployment-sql instance.
[10:46:56] Logged the message, Master
[11:25:45] Sigh. I was redirected to http://www.tools-webgrid-01.com:4078/xtools/editsummary/ from tools.wmflabs.org/xtools/editsummary/
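For anyone puzzled by the resolver thread above: nginx only consults /etc/resolv.conf for names it resolves once at startup; a proxy_pass target built from a variable is resolved per request, and that code path needs an explicit resolver directive. A minimal sketch of the pattern, with a made-up resolver IP rather than the real value from proxy.conf:

```nginx
server {
    listen 80;

    # Runtime DNS for proxied backends. This stands in for the value
    # hardcoded on line 31 of modules/dynamicproxy/templates/proxy.conf
    # that andrewbogott parameterized; per the log, "localhost" did not work.
    resolver 10.68.16.1;

    location / {
        # Because the target is held in a variable, nginx defers
        # resolution to request time and uses the resolver above.
        set $backend "wm-bot.eqiad.wmflabs";
        proxy_pass http://$backend;
    }
}
```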
[13:14:43] re
[13:31:48] hashar: Why in /blazes/ would you need a specific gid?
[13:32:04] hashar: Anything that relies on a uid/gid's numeric value is a bug.
[13:32:15] good morning Coren :-]
[13:32:24] hashar: And yes, good morning. :-)
[13:32:31] err: /Stage[main]/Mediawiki::Users::L10nupdate/File[/home/l10nupdate/.ssh]/owner: change from root to l10nupdate failed: Failed to set owner to '10002': Invalid argument - /home/l10nupdate/.ssh
[13:32:35] that is from puppet
[13:32:50] the manifest that installs l10nupdate assigns the GID 10002 :(
[13:32:53] hashar: Then that manifest is broken.
[13:33:16] I don't know the exact details, but iirc the l10n files are fetched from translatewiki each night on tin then synced using dsh
[13:34:32] and we have an admins::group with gid 10002
[13:34:32] That still doesn't justify a manifest with a numeric gid. :-)
[13:36:13] the first commit dates from 2011-09-13 :(
[13:36:24] merely historical, I am not sure how to get it fixed in production nor whom to ask
[13:36:53] so I thought of matching the uid/gid in labs. Sounds easier to me than attempting to fix the tech debt :]
[13:38:18] Coren: is it a total hack to create l10nupdate user and groups in LDAP with fixed id ?
[13:38:24] If that currently exists in production, then it's a bug in production that needs to be fixed; NFS4 will not (ever) allow you to chown to a numeric uid, it idmaps with user/names/ (as any system should). You'll have to make the manifest conditional on labs until the production side is fixed.
[13:38:31] if you dislike it, I would mail engineering list to figure out what needs to be done
[13:38:46] hehe
[13:38:57] will warn on eng list and get folks to fix that
[13:39:09] pretending it is a blocker for the pmtpa -> eqiad migration
[13:39:32] hashar: Wait, the user exists in ldap?
[13:39:50] I don't think so
[13:40:07] on pmtpa there is a /home/l10nupdate though
[13:40:24] puppet can create that directory on the eqiad NFS server but cannot assign the l10nupdate group
[13:40:28] which I guess is working as intended.
[13:41:18] hashar: The /correct/ solution is simple: make sure the group exists in ldap, and refer to it by name.
[13:42:09] yeah I think something on the NFS server is slightly different between pmtpa and eqiad
[13:42:52] Yes, pmtpa had broken 1:1 mapping and ignored usernames, causing all sorts of trouble.
[13:43:03] the mount on pmtpa has the option sec=sloppy
[13:43:16] hashar: hm. Does the l10nupdate group in labs need to be the same as in prod?
[13:43:30] probably not
[13:43:43] hmm no
[13:43:51] 10002 is hardcoded in the puppet manifests
[13:44:06] so got to get that fixed in prod
[13:44:17] Then part (1) of the correct solution ("make sure the group exists in ldap") is as simple as "create a service group in the project" :-)
[13:44:53] part (2) ("remove the stupid hardcoded gid") can be done in puppet with an if $::realm for now.
[13:45:16] service group ???
[13:45:26] like we can create our own labs user groups?
[13:45:41] * hashar digs
[13:46:24] https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup
[13:46:36] awesome
[13:46:49] just found out it is in the sidebar under "Labs User"
[13:47:17] that is going to conflict with the local groups created on the machine with gid 10002 but I can surely get that cleaned out
[13:47:49] Well, not really, because the name won't match nor will the gid.
[13:48:34] ah yeah that creates local-l10nupdate
[13:48:55] (Actually, not anymore. That'll create $projectname.l10nupdate.)
[13:49:19] that is not what the wikitech web interface is showing to me :-]
[13:49:45] But the idea is the same. Now, there is only a simple fix to be made to the manifest to have a group name parameter rather than a hardcoded '10002' and you're all set.
[13:49:58] changing the group is going to be a mess in puppet manifests as well
[13:50:09] The wikitech web interface lies because it still speaks pmtpaish.
[13:50:23] hashar: It should have never been hardcoded to begin with.
[13:50:42] hashar: Lemme give you a hand; what are the manifests in question?
[13:50:51] a lot of them
[13:51:00] there are references to group => l10nupdate in several places
[13:51:10] so having a group deployment-prep.l10nupdate would cause too many changes I am afraid
[13:51:15] I am fine having that group local though
[13:51:15] Ah, by /name/ is okay.
[13:51:31] okk
[13:51:39] Then we can stuff a l10nupdate group in global ldap for you since that is pervasive.
[13:52:30] But I can guarantee its gid is /not/ going to be 10002, so you need to fix that.
[13:52:46] andrewbogott_afk: no it's not better now, it's same
[13:52:49] You can't have a local group. It /needs/ to be in ldap for NFS to work.
[13:53:30] makes sense
[13:53:44] creating the service group gives me a deployment-prep.l10nupdate group:
[13:53:45] # getent group deployment-prep.l10nupdate
[13:53:45] deployment-prep.l10nupdate:*:51784:hashar
[13:53:58] # getent group l10nupdate
[13:53:58] # # not surprised =]
[13:54:22] Well, that's not strictly true. It needs to be on the clients and on all the NFS servers. LDAP is the only way to make that happen easily. :-)
[13:54:53] then I don't see myself maintaining deployment-prep.l10nupdate in our puppet manifests :-]
[13:54:56] that is prone to failure hehe
[13:55:07] I can surely get rid of 10002 though
[13:55:18] No, you're right. Because that lives in prod too that'd be overcomplicated.
[13:55:47] * Coren ponders.
[13:55:47] :-(
[13:56:09] that sort of dilemma has been striking me for 2 years now :/
[13:56:27] You know...
[13:56:32] that is why I came up with the crazy idea of hardcoding stuff in LDAP but that is not an elegant solution
[13:56:37] I can specialcase the service group instead.
[13:57:08] agh generic::systemuser { 'l10nupdate': default_group => 10002 }
[13:57:22] Nah -- that still makes it labs-specific.
[13:57:28] can probably get rid of that one in favor of default_group => l10nupdate and an include group::l10nupdate
[13:57:43] hashar: Yeah, that'll work.
[13:58:05] if we can avoid a hack / specific case that is nicer
[14:01:37] then to have a l10nupdate available on NFS I simply have to create such a user in wikitech right ?
[14:07:07] Coren: is the DNS/proxy issue fixed?
[14:07:09] it's working for us right now
[14:07:37] ottomata: andrewbogott_afk has made a change that may alleviate/fix the issue. Waiting on results.
[14:12:53] ok thanks!
[14:15:45] Coren: and here is my lame patch for l10nupdate https://gerrit.wikimedia.org/r/118071 .
[14:15:56] Coren: that hacks systemuser to accept a UID parameter just like puppet user {}
[14:16:12] let me pass the UID of the l10nupdate user I created on wikitech
[14:26:53] yeah wikitech!
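A minimal sketch of the fix being discussed, using only the resource names that appear in the log (generic::systemuser, default_group, group::l10nupdate); hashar's actual patch is the Gerrit change linked above, so treat this as an illustration rather than the merged code:

```puppet
# Refer to the group by name everywhere; only the production realm keeps
# the historic numeric gid until the hardcoding is removed there.
class group::l10nupdate {
    if $::realm == 'labs' {
        # In labs the group comes from LDAP (a wikitech user / service
        # group), so puppet only asserts its presence.
        group { 'l10nupdate':
            ensure => present,
        }
    } else {
        # Legacy production gid; NFSv4 idmapping refuses numeric chowns,
        # which is what produced the "Invalid argument" error above.
        group { 'l10nupdate':
            ensure => present,
            gid    => 10002,
        }
    }
}

generic::systemuser { 'l10nupdate':
    default_group => 'l10nupdate',   # by name, not 10002
    require       => Class['group::l10nupdate'],
}
```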
[14:34:00] <^d> hashar: Hey, I think I'm getting my slave set up fine :)
[14:38:15] ^d: good morning :-]
[14:38:27] the hhvm slave ?
[14:38:30] <^d> Yep
[14:38:30] <^d> :)
[14:38:53] there is some oddity with the username used to ssh from master to the slave
[14:38:58] we have two users jenkins-slave and jenkins-deploy
[14:39:03] <^d> Puppet's running right now.
[14:39:08] iirc jenkins-deploy is to be used on beta
[14:39:11] <^d> On the slave.
[14:39:13] and jenkins-slave for the ci slaves
[14:39:21] but I am not sure
[14:39:47] hmm https://integration.wikimedia.org/ci/computer/integration-slave02/configure uses jenkins-deploy :-]
[14:40:53] ah that might be jenkins-deploy for labs and jenkins-slave for production
[14:40:55] go figure
[14:40:59] <^d> ssh worked fine.
[14:41:08] <^d> But I didn't have java + jenkins installed yet (forgot to run puppet)
[14:41:13] what do you wanna run on that slave ?
[14:41:44] <^d> Just a job to build hhvm when things get checked into various branches we care about.
[14:41:59] <^d> It's really really resource intensive so I didn't want to bog down the other slaves.
[14:42:13] make sure this slave usage is restricted to run only jobs tied to it
[14:42:20] or jenkins would assign random jobs to it
[14:42:30] such as running mw/core tests there which is prone to failure
[14:42:50] then when creating the job, you can restrict where it is running by giving it the slave name ('hhvm-build')
[14:44:05] also that is the first CI slave created on eqiad
[14:45:20] <^d> I did :)
[14:45:54] <^d> Blah, I want to retry it.
[14:45:59] <^d> It seems hung on master tho: https://integration.wikimedia.org/ci/computer/hhvm-build/log
[14:46:12] :-/
[14:46:14] <^d> "[03/11/14 14:39:34] Launch failed - cleaning up connection"
[14:46:19] <^d> Just spinning endlessly
[14:46:20] see above
[14:46:24] [03/11/14 14:43:19] [SSH] Remote file system root /mnt/jenkins-workspace does not exist. Will try to create it...
[14:46:28] Caused by: com.trilead.ssh2.SFTPException: Permission denied (SSH_FX_PERMISSION_DENIED: The user does not have sufficient permissions to perform the operation.)
[14:46:31] <^d> Ahhh
[14:46:39] <^d> Well yeah, I want it to retry now ;-)
[14:46:44] <^d> All kinds of things failed.
[14:46:45] /mnt belongs to root
[14:46:48] <^d> No java, etc.
[14:47:34] hmm
[14:47:42] /mnt/jenkins-workspace is not created by puppet apparently
[14:47:51] <^d> It is.
[14:47:58] <^d> I just hadn't finished puppet yet ;-)
[14:48:05] <^d> Hence: "I just want it to retry now"
[14:48:05] ahh
[14:48:44] <^d> Meh, I'll just delete & readd
[14:48:46] ah puppet writes its logs to syslog ... how convenient
[14:48:58] I was tailing the wrong file (/var/log/puppet.log)
[14:49:48] ^d: it is busy creating some cow builder image for package building
[14:49:54] ^d: that takes a bunch of time
[14:50:07] <^d> Yeah it's just running, but the jenkins bits should all be done now I think
[14:56:54] ^d: have you ever tried Jenkins Job Builder to define your Jenkins jobs ? :-]
[14:57:07] <^d> Yeah
[14:57:11] <^d> I probably won't here.
[14:57:12] <^d> :p
[14:57:24] for maven jobs it is not that hard
[14:57:35] I mean for specific jobs
[14:57:43] and maven jobs are not any harder than a freestyle one
[14:59:31] !gitweb integration/jenkins-job-builder-config
[14:59:31] https://gerrit.wikimedia.org/r/gitweb?p=integration/jenkins-job-builder-config.git
[15:00:14] ^d: an example for mobile team is https://git.wikimedia.org/blob/integration%2Fjenkins-job-builder-config.git/master/mobile.yaml#L3
[15:00:41] * ^d is still waiting for cowbuilder
[15:00:43] <^d> moo.
[15:02:28] enjoy some first kisses meanwhile http://vimeo.com/88671403
[15:02:40] being spammed by friends right now ..
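In JJB terms, the restriction hashar describes at 14:42 is the job-level node attribute, which maps to Jenkins' "Restrict where this project can be run" setting. A tiny hypothetical example (not a file from integration/jenkins-job-builder-config; the job name and build command are made up):

```yaml
- job:
    name: hhvm-build-smoketest
    node: hhvm-build    # only ever schedule this job on the hhvm-build slave
    builders:
      - shell: 'echo "runs only on the hhvm-build slave"'
```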
[15:09:57] <^d> hashar: Ok, so looking at jenkins job builder. any examples of git scms that have multiple remotes?
[15:10:23] ^d: we might have a few
[15:10:34] if I remember correctly you can pass several -git parameters
[15:10:52] gotta try it out :-]
[15:13:49] !jenkins chad-multigit
[15:13:49] https://integration.wikimedia.org/ci/job/chad-multigit
[15:14:15] ^d: https://gerrit.wikimedia.org/r/118086 creates https://integration.wikimedia.org/ci/job/chad-multigit
[15:14:18] not sure it works though
[15:14:25] building it
[15:14:50] seems to work as expected https://integration.wikimedia.org/ci/job/chad-multigit/ws/ :-]
[15:14:53] <^d> That's not what I want.
[15:15:05] <^d> I want one repo with multiple remotes (this is git :))
[15:15:08] ah multiple remotes!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[15:15:12] <^d> So I can fetch origin/master
[15:15:16] <^d> tstarling/something
[15:15:17] <^d> ext.
[15:15:20] <^d> *etc
[15:15:47] <^d> I think I might need something like https://wiki.jenkins-ci.org/display/JENKINS/Multiple+SCMs+Plugin
[15:16:27] yeah we have that plugin installed
[15:16:42] which lets you create a job with multiple repositories
[15:16:51] not sure it supports multiple remotes in a single checkout though
[15:17:06] <^d> I might have to just do something hacky.
[15:17:23] <^d> Like just have the job shell out :)
[15:18:41] trying with the same basedir but different remote names
[15:19:13] doesn't work :(
[15:19:28] <^d> What about combining it?
[15:19:32] <^d> And giving it two url params?
[15:19:33] https://integration.wikimedia.org/ci/job/chad-multigit/3/console
[15:20:49] https://gerrit.wikimedia.org/r/#/c/118086/2/chad.yaml,unified
[15:21:01] tried to get both fetched at the same place but only the last is kept :-(
[15:21:11] ah it is wiped
[15:21:53] <^d> Yeah, I don't think job builder can figure it out.
[15:22:13] <^d> I'll just clone for origin, then do some prebuild shelling to make sure the repo's set up right.
[15:22:21] it can :-]
[15:23:01] ^d: https://gerrit.wikimedia.org/r/#/c/118086/2..3/chad.yaml,unified
[15:23:07] by default the git plugin wipes the workspace
[15:23:18] so when fetching the first repo to 'multi-remotes' directory it is cleaned up
[15:23:25] and the same happens when the second repo is fetched
[15:23:52] disabling workspace wiping on the second git scm prevents it from removing the remote defined by the first scm
[15:24:11] the build happened on lanthanum.eqiad.wmnet in dir /srv/ssd/jenkins-slave/workspace/chad-multigit/multi-remotes
[15:24:26] $ git remote
[15:24:26] origin-jjb
[15:24:26] origin-zuul
[15:25:00] I think there is a way to merge a branch before running tests
[15:25:05] so you could merge the first repo into the second
[15:25:43] yeah that is under merge options
[15:33:15] no idea
[15:34:50] <^d> Yeah, it's not as urgent, don't worry about it.
[15:34:56] <^d> I'll hack something up as needed.
[15:35:02] <^d> And jjb the rest.
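Patchset 3 of the chad-multigit change boils down to something like the following (option names are from the JJB git module; the repository URLs are stand-ins, not taken from the patch):

```yaml
- job:
    name: chad-multigit
    scm:
      # two git SCMs share one basedir, each under its own remote name
      - git:
          url: 'https://gerrit.wikimedia.org/r/p/integration/jenkins-job-builder-config.git'
          name: origin-jjb            # remote name of the first fetch
          basedir: multi-remotes
      - git:
          url: 'https://gerrit.wikimedia.org/r/p/integration/zuul-config.git'
          name: origin-zuul           # second remote in the same checkout
          basedir: multi-remotes
          wipe-workspace: false       # don't wipe away origin-jjb first
```

With wiping left at its default on the second SCM, each fetch deletes the other's remote, which is exactly the "only the last is kept" symptom above.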
[15:36:31] yeah via shell script maybe
[15:36:38] not sure how git plugin can merge two remotes :(
[15:39:00] <^d> Here's what I've got so far: http://p.defau.lt/?CY4Ja_eO63nVRm_4dDlSZQ
[15:39:14] <^d> I'll add another step to save the binaries (since that's what ori wants)
[15:40:13] <^d> In good news though: "Slave successfully connected and online"
[15:40:16] <^d> :D
[15:51:34] ^d: the git plugin should be able to handle it but nothing straightforward apparently
[15:51:53] ^d: there is a github plugin as well iirc
[15:52:06] might not be in jjb though
[15:53:46] ^d: http://ci.openstack.org/jenkins-job-builder/triggers.html#triggers.github
[15:54:46] ^d: and http://ci.openstack.org/jenkins-job-builder/properties.html#properties.github :D
[15:55:30] <^d> Looks kind of hacky :p
[15:55:32] <^d> I don't need it
[15:55:50] that would probably poll github and build whenever a commit is found
[15:57:53] <^d> Don't really need per-commit builds (they already have travis)
[15:57:59] <^d> We just need daily builds for testing against.
[16:02:35] ^d: there is also a template to build Debian packages on our jenkins
[16:02:41] ^d: something like '{name}-debian-glue'
[16:02:57] is there something I am doing wrong when getting the "no such tool" error when using become and connection timed out when doing ssh?
[16:02:58] should be easy to set up. If it matches git build package conventions that should just work.
[16:03:14] <^d> hashar: I'm not building debs.
[16:03:29] just in case you want to do it one day :-]
[16:03:44] <^d> Debs are black magic, I'll let someone else do those.
[16:07:09] I can look at it
[16:19:09] <^d> hashar: https://gerrit.wikimedia.org/r/#/c/118091/ is up for review + testing
[16:19:11] <^d> :)
[16:19:32] almost :-]
[16:21:41] 00:00:03.545 jenkins_jobs.errors.JenkinsJobsException: Unknown entry point or macro '@daily' for component type: 'trigger'.
[16:22:04] not sure why
[16:22:59] <^d> PS2 got it.
[16:23:02] <^d> I forgot timed:
[16:23:16] <^d> https://gerrit.wikimedia.org/r/#/c/118091/1..2/hhvm.yaml
[16:23:39] nice
[16:24:02] and you got the wrappers in \O/
[16:25:44] ^d: there is a path issue I think https://gerrit.wikimedia.org/r/#/c/118091/2/hhvm.yaml,unified
[16:25:48] you fetch to /hhvm
[16:25:59] your builder should probably cd to it
[16:26:02] cd hhvm
[16:26:07] <^d> Ahh, whoops.
[16:26:15] $WORKSPACE is the current directory so you probably don't need it
[16:26:23] <^d> I can just omit the basedir then
[16:26:41] apart from that sounds good. I usually create the job ( jenkins-jobs --conf jenkins_jobs.ini update config/ hhvm-daily-build )
[16:26:46] then build it once to verify it works
[16:26:51] if that is fine, you can self merge
[16:27:37] <^d> I don't have the scripts installed on this machine
[16:28:31] ahh
[16:28:32] let me deploy it so
[16:29:09] !jenkins hhvm-daily-build
[16:29:09] https://integration.wikimedia.org/ci/job/hhvm-daily-build
[16:29:13] done
[16:29:17] building
[16:29:25] https://integration.wikimedia.org/ci/job/hhvm-daily-build/1/console
[16:29:39] note that if you wipe out the workspace it has to clone the full repo
[16:29:43] might need to clean instead
[16:29:49] it runs something like git clean -xqdf
[16:30:30] <^d> Repo's not big, cloning is fine.
[16:30:36] okk
[16:30:45] stderr: fatal: Refusing to fetch into current branch refs/heads/master of non-bare repository
[16:30:49] some ref spec is wrong I guess
[16:31:10] <^d> fetch -t origin refs/heads/*:refs/heads/*
[16:31:15] <^d> What's wrong with that? :p
[16:31:51] <^d> Eh, it's redundant here.
[16:31:53] <^d> Amending to remove.
[16:32:21] deploying PS3
[16:32:38] <^d> No, PS4.
[16:34:50] ^d: building
[16:34:53] https://integration.wikimedia.org/ci/job/hhvm-daily-build/3/console
[16:35:09] <^d> \o/
[16:36:19] congratulations!
[16:36:25] and it is using JJB which is even better
[16:36:50] it is a bit more painful than clicking / modifying via the gui but I find it easier to review changes and track what is being modified
[16:38:01] <^d> I'm going to take a break and grab something to drink while this builds.
[16:38:12] <^d> I'll need to amend again to copy off the binary to where we want to save it too.
[16:38:53] archive: for the win!
[16:39:03] I usually create a "log" directory
[16:39:07] and put artifacts there
[16:39:15] (wiping log/ before the run begins)
[16:51:18] Hi all, I am getting Permission denied (publickey,hostbased).
[16:51:19] error
[16:51:25] when I try to ssh
[16:51:31] I am logging in for first time
[16:51:57] What's the mistake I am making?
[16:55:53] This is the output I get : OpenSSH_5.9p1 Debian-5ubuntu1.1, OpenSSL 1.0.1 14 Mar 2012
[16:55:53] debug1: Reading configuration data /home/.../.ssh/config
[16:55:53] debug1: Reading configuration data /etc/ssh/ssh_config
[16:55:53] debug1: /etc/ssh/ssh_config line 19: Applying options for *
[16:55:53] debug1: Connecting to tools-login.wmflabs.org [208.80.153.224] port 22.
[16:56:20] tuxnani: Hi! Which server are you trying to reach? Do you have a different username on your own machine and on Labs? In this case you need to specify the Labs shell name, i. e. "ssh username@tools-login-eqiad.wmflabs.org".
[16:56:59] ..
[16:58:56] tuxnani: Have you uploaded your key to wikitech? I don't see your key in the Labs cluster.
[16:59:44] ^d: hhvm build is a success!!!!!!!!!!! https://integration.wikimedia.org/ci/job/hhvm-daily-build/3/console
[16:59:54] <^d> Yepppp :D
[17:05:51] (PS1) John F. Lewis: Add #wmt-ko to the family [labs/tools/WMT] - https://gerrit.wikimedia.org/r/118096
[17:05:52] scfc_de: I added it
[17:05:59] Still I get the same error
[17:06:34] (CR) PiRSquared17: [C: 2 V: 2] Add #wmt-ko to the family [labs/tools/WMT] - https://gerrit.wikimedia.org/r/118096 (owner: John F. Lewis)
[17:07:24] tuxnani: Are you connecting as tuxnani@tools-login-eqiad.wmflabs.org?
[17:07:39] scfc_de: The problem seems to be a username mismatch
[17:07:52] I have a different username on my local machine
[17:07:57] What's the way out?
[17:08:50] Are you using Linux? Then you need to "ssh tuxnani@tools-login-eqiad.wmflabs.org".
[17:09:01] I am using the same command
[17:10:13] <^d> hashar: Amended again to archive the build artifact. Can you test it again? :)
[17:10:34] ^d: sure
[17:10:57] tuxnani: The log on tools-login-eqiad shows that you logged in successfully.
[17:11:10] scfc_de: Let me check
[17:11:40] scfc_de: I am logged in indeed. But I can't log in to the db
[17:11:46] ^d: ahah
[17:11:47] sql enwiki_p
[17:12:02] says I am not allowed to log in to sql
[17:12:13] ^d: we can capture the artifacts and copy them back on the master (gallium) then have the build published under integration.wikimedia.org somewhere
[17:12:23] tuxnani: What's the error message?
[17:12:25] ^d: but for now that is probably good enough
[17:12:40] <^d> I'm setting up a proxy & vhost on the slave too.
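For the username mismatch scfc_de diagnoses above, a ~/.ssh/config entry along these lines (the shell name and key path are examples) lets a bare "ssh tools-login-eqiad.wmflabs.org" use the Labs account:

```
# your Labs shell name, not your local login
Host tools-login-eqiad.wmflabs.org
    User tuxnani
    # the private half of the key pair uploaded to wikitech
    IdentityFile ~/.ssh/id_rsa
```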
[17:12:44] tuxnani@tools-login:~$ sql tewiki_p
[17:12:45] Enter password:
[17:12:45] ERROR 1045 (28000): Access denied for user 'tuxnani'@'10.68.16.7' (using password: YES)
[17:12:48] <^d> So we can fetch them from hhvm-build
[17:13:58] I created a tool account I think and I cannot become it
[17:14:02] tuxnani: Try again, please. (There may be a short (< 5 minutes) delay between first log in and DB access.)
[17:14:06] ^d: refreshing job
[17:14:13] ok
[17:14:17] thank you
[17:14:24] scfc_de: thank you sir
[17:14:38] TBloemink: You need to log out and in again so that the OS recognizes that you are a member of the tool now.
[17:14:42] tuxnani: No problem.
[17:14:51] scfc_de, from ssh you mean?
[17:14:58] <^d> hashar: http://hhvm-build.wmflabs.org/ ;-)
[17:15:44] hehe
[17:15:58] <^d> Hmm, I must've misunderstood basedir.
[17:16:10] no ide
[17:16:11] a
[17:16:24] <^d> Fixing.
[17:16:25] you should set up JJB on your machine
[17:16:39] also if you wanna test you can remove the workspace clean / Comment out the build step
[17:16:43] then rerun the job
[17:16:53] that would let you easily test the publishing step
[17:17:06] TBloemink: Yes. But I see that you logged into tools-login.wmflabs.org, the "old" pmtpa cluster. Coren, do you have any advice for new users on how to proceed, i. e. create a new tool in pmtpa and then migrate, or create in eqiad and let pmtpa be?
[17:17:25] Create in eqiad, let pmtpa die.
[17:17:45] Should probably fix the docs, but it's less than a week left so...
[17:18:13] TBloemink: Then you should log into tools-login-eqiad.wmflabs.org exclusively and not worry about the "old" stuff.
[17:18:17] petan: the bot is dead
[17:24:12] <^d> Got it setup :p
[17:27:49] ^d: there is some more doc at https://www.mediawiki.org/wiki/Continuous_integration/Jenkins_job_builder
[17:28:09] <^d> Yep
[17:28:14] you can update a single job with something like:
[17:28:20] jenkins-jobs --conf etc/jenkins_jobs.ini update config/ hhvm-daily-build
[17:29:21] hashar: i liked fab a lot
[17:33:00] ^d: do you need anything else from me ?
[17:33:24] ^d: if you want to test the publisher, prevent git from wiping the workspace and comment out the build
[17:33:28] ^d: that will be faster :]
[17:34:01] ^d: you can also play with the archive: publisher which would save a copy of the hhvm binary with the build http://ci.openstack.org/jenkins-job-builder/publishers.html#publishers.archive
[17:34:26] <^d> Nope, I think I got it now.
[17:34:29] <^d> Thanks for all your help!
[17:36:28] ^d: you can even rename hhvm to include the build # or gitsha1
[17:36:36] ^d: Jenkins has a few global env vars you can use https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables
[17:36:38] <^d> I probably will :)
[17:37:26] and the git plugin sets a few more https://wiki.jenkins-ci.org/display/JENKINS/Git+Plugin (look for 'environment variables' at the bottom)
[17:37:42] GIT_COMMIT - SHA of the current commit; GIT_BRANCH - Name of the branch currently being used, e.g. "master" or "origin/foo" :-]
[17:39:57] ^d: I am off :-] Ping me by email for follow up
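Pulling the day's JJB threads together, the hhvm.yaml job plausibly ends up looking something like this. This is a reconstruction from the log, not the contents of the Gerrit change; the build commands and binary path are guesses:

```yaml
- job:
    name: hhvm-daily-build
    node: hhvm-build          # restrict to the dedicated slave
    triggers:
      # a bare '@daily' raised "Unknown entry point or macro"; the cron
      # expression has to live under the "timed" trigger (the PS2 fix)
      - timed: '@daily'
    builders:
      - shell: |
          cd hhvm             # the clone lands in $WORKSPACE/hhvm
          mkdir -p "$WORKSPACE/log"
          cmake . && make     # placeholder build commands
          # hypothetical rename using Jenkins/git-plugin env vars
          cp hphp/hhvm/hhvm "$WORKSPACE/log/hhvm-$BUILD_NUMBER-$GIT_COMMIT"
    publishers:
      # the archive: publisher hashar recommends keeps log/* with the build
      - archive:
          artifacts: 'log/*'
```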
[18:01:48] Coren: scfc_de: any infos about the wm-bot? --> http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/
[18:02:34] hedonil: No. Ask petan? Also, check labs-l, I think I recall a recent email about irc logs.
* hedonil checks
[18:06:03] hedonil: My guess: petan pointed bots.wmflabs.org at a new webserver in eqiad that accesses eqiad shared project storage, but the logs are written to pmtpa shared project storage.
[18:08:22] scfc_de: 'k. first it was 502 then 404. so poking petan again...
[18:10:11] petan: poke. wm-bot/logs = 404
[18:12:17] YuviPanda:
[18:13:56] YuviPanda: have you woken up yet?
[18:14:03] AzaToth: yeah, in a meeting
[18:14:07] k
[18:18:58] okay, back. And how exactly should I now drop files on there?
[18:33:28] hmmmm
[18:36:16] could someone assist me with uploading files to the tools labs? /me is quite a noob to tools
[18:37:25] scfc_de, ?:)
[18:37:55] TBloemink: I would recommend using scp to transfer the files
[18:38:35] TBloemink: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Accessing_Tool_Labs_and_managing_your_files
[18:39:10] and as I evilishly assume you are using Windows, see https://wikitech.wikimedia.org/wiki/Help:Access_to_ToolLabs_instances_with_PuTTY_and_WinSCP
[18:39:51] wonderful, it worked
[18:39:53] using winscp
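The scp route scfc_de recommends looks like this in practice (the file names, shell name and tool name are made up for the example):

```bash
# copy a single file into your Labs home directory
scp myscript.py tbloemink@tools-login-eqiad.wmflabs.org:~/
# or a whole directory straight into a tool's project space,
# once you are a member of that tool
scp -r public_html/ tbloemink@tools-login-eqiad.wmflabs.org:/data/project/mytool/
```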
[18:47:21] hedonil: do you have a minute for some questions about access to page counts for English Wikipedia?
[19:09:24] Nettrom: back now. sure
[19:22:27] hedonil: thanks, I just /msg'ed you, hope that's ok
[19:23:53] hey, I'm getting errors when trying to delete instances that we don't need any more: Failed to delete instance parsoid-roundtrip3 (i-000004d8).
[19:25:26] Coren, ^^
[19:26:14] gwicke: Entirely possible, given the current two-headed configuration. Is this an old instance in tampa?
[19:27:24] yes, a bunch of them
[19:27:43] we now have real hardware for parsoid rt testing
[19:27:43] gwicke: Then don't worry about it; just mark them as 'doesn't need migration' in the migration doc and it'll go away when we shut tampa down.
[19:27:51] !migration
[19:27:51] https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress
[19:28:08] Ah, wrong page.
[19:28:25] No, right page. Last section. :-)
[19:28:28] should I add a note at https://wikitech.wikimedia.org/wiki/Nova_Resource:Visualeditor ?
[19:28:44] Second-to-last, I suppose. :-)
[19:29:13] some instances need to be migrated, but not all
[19:29:22] Put it in the migration progress page the bot just linked to, in the 'finished' section.
[19:29:47] Oh, wait, you'll need actual migration? You're not creating afresh?
[19:30:11] no, a few VMs should just be migrated over
[19:30:16] Drop an email to andrewbogott_afk, then; he's the one who'll need to know.
[20:19:02] Coren: Is this expected or something funky with my instance? "chgrp: changing group of `/data/project/': Read-only file system"
[20:19:49] bd808: You might have to reboot the instance, it booted faster than the filesystem could do ACLs. :-)
[20:20:17] I haven't found a clean way to serialize those two things yet.
[20:20:18] Hmmm.. Ok. I just rebooted but I can try that again
[20:20:41] I also tried unmounting and remounting with the same result
[20:21:13] Yeah, I've seen that happen before. I think there is some caching of permissions that takes some time to time out. :-(
[20:21:21] It always self-corrects after a little while.
[20:21:43] Okey doke. As long as I'm not crazy… :)
[20:27:20] Coren: I've tried rebooting 4 times now with no joy; any alternate fix? Or point me to the scripts that are running out of order?
[20:27:28] hello
[20:28:15] hashar: Shouldn't you be doing something more fun than hanging out here? ;)
[20:28:23] bd808: It's not scripts; it's the NFS server's idea of what your access rights are/should be. If you inspect them they agree that you should have rw, but there is a layer that cached your earlier ro.
[20:28:44] bd808: daughter asleep, wife watching some american soap on TV
[20:28:52] bd808: and I took some sun bath this morning :]
[20:29:22] bd808: I'm in the middle of another thing, but if you give me a little bit I'll use you as a guinea pig to figure it out right afterwards.
[20:29:55] Coren: Sounds good. I'll go poke at a different instance in a different project
[20:30:42] Hi all. Just set up a new tool (templatecount), created an index.php and ran webservice start
[20:30:52] (all in eqiad)
[20:30:56] Yet I'm getting 404 erros
[20:30:59] *errors
[20:32:34] Not showing on the list of tools either
[20:34:39] ...but does appear on http://tools-eqiad.wmflabs.org/?list
[20:34:39] Mmm, now working at http://tools-eqiad.wmflabs.org/templatecount/ but not at the regular address.
[20:35:23] (And not working at http://tools-eqiad.wmflabs.org/templatecount ?)
[20:36:35] jarry1250__: Ah, I see the issue. The proxy thinks the tool also exists in pmtpa.
[20:37:04] Coren: What's the indicator that the proxy uses?
[20:37:27] Hm. Actually, it /shouldn't/.
[20:37:44] scfc_de: It uses the absence of a webservice or public_html, but right now neither are there so I'm a little confused.
[20:39:01] Oh, ow. The tool's /home/ must be there though.
[20:39:04] * Coren fixes that.
[20:40:31] fix't
[20:40:40] ok, so I've run into a problem while migrating to eqiad: newly migrated webtools (cgi python) are returning "Four hundred and four!" whereas they used to work in pmtpa
[20:41:10] dungodung: 404 and not 'No webservice'?
[20:41:16] Coren: indeed
[20:41:16] dungodung: URL?
[20:41:20] e.g. http://tools.wmflabs.org/rightstool/cgi-bin/recentlogs
[20:41:30] running the script from CLI returns valid html
[20:41:46] which is quite baffling
[20:41:47] That used to work without webservice, right?
[20:41:58] I suppose
[20:42:11] Move the cgi-bin /inside/ public_html.
[20:42:17] !newweb
[20:42:17] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[20:42:25] ^^ is the scheme in use in eqiad
[20:42:34] ah
[20:43:03] (also needs a config tweak)
[20:43:10] hi, i have set up a labs-vagrant instance (role: uploadwizard), and it is incredibly slow, both on eqiad and pmtpa
[20:43:34] page load takes almost exactly 60 sec so i am guessing something is timing out
[20:43:58] the debug toolbar reports page generation times below 1 sec
[20:44:10] any ideas how i can fix?
[20:47:10] Coren: thanks!
[20:47:26] tgr: Have you tried loading a static URL from the instance to see if that is slow?
[20:48:03] That might help narrow in on proxy vs something in the instance
[20:48:26] Testing with curl from the instance itself might be helpful too
[20:48:36] bd808: assets load normally, only the main page takes long
[20:49:09] So probably not something in the labs proxy setup. Database wait?
[20:50:20] Krinkle: tsintution seems to return me to https://tools-webgrid-01/ instead of https://tools.wmflabs.org when I try to change language
[20:50:32] api calls are fast and those probably use the database
[20:51:09] jarry1250__: Example url?
[20:51:16] from a tool
[20:51:21] plus, the debug toolbar reports very short response times, i imagine database calls would affect that
[20:51:48] Krinkle: http://tools.wmflabs.org/templatecount/ then Change my language!
[20:51:59] select one, and it autoredirects back
[20:52:07] empty page
[20:52:49] Getting a blank page from that url, this one works though: http://tools-eqiad.wmflabs.org/templatecount/
[20:52:50] testing
[20:53:04] Hm.. indeed
[20:53:09] probably something in the proxy causing that to happen
[20:53:18] it used to work fine in pmtpa on the default apache
[20:53:25] now that it has a separate web service..
[20:54:04] jarry1250__: $_SERVER['SERVER_NAME']
[20:54:07] thanks, I'll look into it
[20:54:16] Krinkle: this effect occurs with the new web if a trailing / is missing in the url
[20:54:16] Coren: getting a blank page from tools.wmflabs.org now :( I swear it worked briefly there
[20:54:31] Coren: Specifically http://tools.wmflabs.org/templatecount/
[20:54:55] jarry1250__: ... I see a form with gears in the background.
[20:55:05] jarry1250__: works for me, too
[20:55:33] Coren: Okay, gdgd, must be caching or might be cookies. It's just that it wasn't working for Krinkle either, which confused me
[20:56:40] Coren: Can we get the tools webservice to preserve the Host: header as it is originally?
[20:56:50] http://tools.wmflabs.org/intuition/_server.php
[20:56:59] right now HOST and SERVER_NAME change to tools-webgrid-01:4061
[20:57:11] or wherever it ends up
[20:57:43] bd808: the HTML comment at the very end of the page says "Served in 0.134 secs" - i imagine that includes everything that mediawiki does?
[21:00:07] Krinkle: I don't think that's possible when proxying.
[21:00:16] tgr: In theory. That comes from wfReportTime() comparing microtime() with $wgRequestTime
[21:00:21] Krinkle: Because it has to be a correct HTTP request.
[21:00:30] Coren: I'm pretty sure it is. We do lots of proxying in Wikimedia production, and yet MediaWiki is able to maintain the server name through all that.
[21:00:44] HTTP request target and Host: header don't have to match
[21:02:27] This is among other things what powers bit.wikimedia.org and it proxying back to application land etc. and of course all apaches inside wmf clusters.
[21:02:43] granted, they're probably more complex than this proxy, but it shouldn't be hard to forward the header properly
[21:03:40] tgr: So static assets are fast and php thinks it's fast. Have you looked at the browser debug tools to see if the 30s is spent waiting for the page body or somewhere else?
[21:03:53] waiting, yes
[21:04:14] it only starts receiving data after 60 sec
[21:04:47] this is for normal pages, api.php works normally
[21:04:58] Krinkle: I'll look into it; but I'm honestly not sure why you'd be that interested in it since you normally won't be doing virtual hosting. :-)
[21:04:58] Coren: Sorry to drop this here, but do you think this could be fixed? I'm basically blocked on that for Intuition's localization functioning. I currently use plain Host. I also looked at the more elaborate logic MediaWiki uses (which covers pretty much all reasonable and many unreasonable proxy setups) and that doesn't get it either. The web tools proxy puts it in a 'HTTP_X_FORWARDED_SERVER' header,
[21:04:58] but even MediaWiki doesn't use or need that.
[21:05:09] scfc_de: cron daemon seems not to be running on tools-dev
[21:05:26] Coren: Because I use libraries that aren't written specifically and exclusively for "tools.wmflabs.org".
[21:05:39] for one, tools-eqiad.wmflabs.org is an exception already, full urls need to preserve that
[21:05:40] Krinkle: Ah, and they construct absolute addresses?
[21:05:50] How evil.
[21:06:15] Depends. Needing an absolute url is quite reasonable imho. Relative paths work for lots of things, but not everything by a long shot.
[21:06:21] Just look at a random source output from mediawiki
[21:07:43] tgr: I'm stumped.
[21:09:47] Coren: I'm trying to migrate the 'grouplens' project to eqiad, it doesn't want to finish correctly, though
[21:10:09] Krinkle: And also look at how many hacks there need to be to make this work right in many setups because of that very reason. :-) (Including $wgJunk to override any of it). But okay, I'll look into it.
[21:10:10] project = tool (or whatever you'd like to call it)
[21:10:21] Nettrom: What error are you running into?
[21:10:48] Coren: "That tool doesn't seem to be migrated yet." although I get "Copy complete" on pmtpa
[21:11:02] Nettrom: What tool is this?
[21:11:20] Coren: grouplens, we have it as a shared space for research projects
[21:11:28] Coren: "Unable to create and initialize directory '/home/bd808'" from multiple instances in deployment-prep project in eqiad. Not cleared by reboot. The eqiad NFS server hates me.
[21:11:58] bd808: Clearly. Are those new or migrated instances?
[21:12:11] Coren: New instances.
[21:12:24] Coren: True, for Wikimedia we don't rely on MediaWiki detecting the hostname (we hardcode $wgServer) - however, we do rely on it very much actually, just not in mediawiki-core. wmf-config InitialiseSettings, wgConf, MultiVersion etc. all use SERVER_NAME to determine which wiki we're on.
[21:12:32] and thus rely on all proxies in-between to forward it
[21:13:16] so it goes SERVER_NAME -> wiki db name -> hardcoded wgServer for that wiki db name
[21:13:27] he
[21:15:06] Nettrom: I'm not seeing it as migrated at all. Hmm. Lemme check something.
[21:15:52] Coren: I noticed that the shell complains about my locale settings (I used my Mac to SSH in), not sure if that's got something to do with it
[21:16:03] s/complains/complained/
[21:16:22] Nettrom: That seems unlikely; but I'm looking at it now to see what's up.
[21:16:30] Coren: thanks!
[21:17:55] Coren: Can you point me roughly in the direction of where this proxy is configured / run? Maybe I can submit a patch.
[21:18:03] which software and its config
[21:19:00] Krinkle: There are a couple of proxies in the way, both apaches. Give me a few and I'll be all yours.
[21:19:35] Coren: FYI the NFS ACL problem in the wikimania-support project healed itself
[21:19:50] Cool :)
[21:20:03] or you healed it quietly
[21:20:13] bd808: Yeah, like I said, there's something that's caching it I just haven't tracked down what yet.
[21:20:57] Krinkle: Might not be a solution to your problem, but in my tools (pre-Labs, on Toolserver and everywhere else) I usually follow MediaWiki's example of an absolute URL in the configuration as determining it automatically is indeed troublesome.
[21:21:26] hedonil: Cron on tools-dev(-pmtpa) or tools-dev-eqiad?
[21:21:50] scfc_de: it has been restarted in the meantime
[21:22:06] Nettrom: There was an issue with your copy which I've fixed; you should be able to finish-migration now
[21:22:55] !ping
[21:22:56] !pong
[21:22:57] ok
[21:24:10] YuviPanda: Hey, btw, when do you think you'll be ready to do the yuviproxy things for tools? I'd /love/ to get rid of the apaches for good. :-)
[21:25:18] Coren: hey! Am travelling/working this week, so probably not this week :( I'll try to get to it on the weekend.
[21:25:29] Coren: apps deadline is also approaching, so my ops work has taken a hit
[21:26:06] Coren: last time I looked, it just needed 'webservice' to plug into it.
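As a rough illustration of the chain Krinkle sketches above (made-up lookup tables, not the real wmf-config code):

```php
<?php
// Made-up lookup tables standing in for the real wmf-config data.
$hostToDb = array( 'en.wikipedia.org' => 'enwiki' );
$dbToServer = array( 'enwiki' => '//en.wikipedia.org' );

// Step 1: SERVER_NAME picks the wiki, so it must survive every proxy hop.
$host = $_SERVER['SERVER_NAME'];
$wikiDbName = isset( $hostToDb[$host] ) ? $hostToDb[$host] : false;

// Step 2: $wgServer is then hardcoded per wiki, not detected.
$wgServer = $wikiDbName ? $dbToServer[$wikiDbName] : false;
```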
[21:28:08] Krinkle: I can 'ProxyPreserveHost On'; this will do what you need at the cost of dragging 2616 in an alley and roughing it up a bit.
[21:28:28] Coren: 2616?
[21:28:46] RFC 2616. HTTP/1.1 :-)
[21:28:49] Ah, the RFC
[21:29:20] * Coren tries it.
[21:29:41] What part would it violate? Maybe there's another way? What are the side effects? Are they limited to within the internal requests? I wonder how we do it in production
[21:29:56] * Krinkle hopes it works
[21:31:47] Krinkle: Basically, it makes the server-side proxy behave as a client-side proxy. In /practice/ there shouldn't be any significant issues.
[21:33:09] Krinkle: That should do it.
[21:34:54] Coren: I lost my IRC connection, looks like you fixed the "grouplens" tool migration?
[21:34:57] Change on mediawiki a page Developer access was modified, changed by PleaseStand link https://www.mediawiki.org/w/index.php?diff=926175 edit summary: [+3] fixed button styling
[21:35:18] Nettrom: There was an issue with your copy which I've fixed; you should be able to finish-migration now
[21:36:01] Change on mediawiki a page Developer access was modified, changed by PleaseStand link https://www.mediawiki.org/w/index.php?diff=926178 edit summary: [+3] combine divs
[21:36:33] Coren: thanks! I seem to be having the same problem with my other tool, "suggestbot", does that require your intervention too?
[21:37:55] Nettrom: It probably will; if you did it at the same time or so. Gimme a sec.
[21:38:09] Coren: yep, I did it shortly after, sorry. And no worries, take your time!
[21:39:01] Nettrom: Trivial fix.
[21:40:39] Nettrom: In progress (for real)
[21:42:34] Coren: awesome!
[21:51:46] RE: Migrations -- if I start a new instance will it be created in eqiad? Is there a 'zone' setting, etc?
[21:52:42] duh
[21:52:42] found it
[22:06:49] Coren: Ready for yet another NFS issue in eqiad? I'm trying to use the labs-vagrant role and the vagrant user and its /home/vagrant homedir are created, but /home/vagrant is owned by root:root and trying to change that with chown gives "chown: changing ownership of `/home/vagrant': Invalid argument".
[22:06:51] My wild guess is that this is the NFS server saying it doesn't know who "vagrant" is and the NFS server has idmapd turned on: https://www.novell.com/support/kb/doc.php?id=7014266
[22:08:13] bd808: Your guesses are on the nose. You really should have an LDAP user for anything of the sort, but if you can't do it for some reason I'll add one to the NFS servers.
[22:08:31] bd808: So that's not an issue, it's by design.
[22:09:24] Coren: Ok. I'd be fine with an LDAP vagrant user. This is just a change/regression from pmtpa
[22:09:37] Coren, I'm branching xtools
[22:09:50] The vagrant user is created by the labs_vagrant puppet module
[22:09:57] "branching"?
[22:10:24] Coren, moving each tool onto its own toollabs tool.
[22:10:28] bd808: Sadly, proper NFS security requires a directory service.
[22:10:32] Cyberpower678: Good move.
[22:10:51] bd808: Well, not so much "sadly" as "annoyingly in context"
[22:11:10] Coren, I'm also rewriting the tools to be a little more efficient so they don't clog everything so quickly.
[22:11:20] Cyberpower678: Even better move. :-)
[22:11:39] bd808: Lemme create a user for you in LDAP. 'vagrant' right?
[22:11:45] Coren, I'm finishing with the edit counter core.
[22:12:02] Coren: yes "vagrant", please and thank you
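The underlying mechanics, sketched as a hypothetical shell session: NFSv4 idmapping translates names rather than numbers, so the chown can only succeed once both client and server can resolve 'vagrant' to the same identity.

```bash
# before: 'vagrant' exists only in the instance's local /etc/passwd,
# so the NFS server cannot idmap the name and rejects the chown
sudo chown -R vagrant:vagrant /home/vagrant
# chown: changing ownership of `/home/vagrant': Invalid argument

# after the user and group exist in LDAP, both client and NFS server
# resolve the same name, and the same command succeeds
getent passwd vagrant
getent group vagrant
sudo chown -R vagrant:vagrant /home/vagrant
```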
[22:14:26] bd808: Does it need a group too?
[22:14:45] Coren: group "vagrant" would be nice
[22:16:10] Coren: The puppet class does this: https://git.wikimedia.org/blob/operations%2Fpuppet/db35978959f870c105c754046ea46f8d76736dc2/modules%2Flabs_vagrant%2Fmanifests%2Finit.pp#L2
[22:16:49] Which works to create the instance local user, but then NFS barfs on setting $HOME ownership
[22:18:39] bd808: User and group added in ldap. Remember to remove the local user to avoid nightmares. :-)
[22:18:56] Coren: Thanks. I'll give it a try
[22:19:53] jarry1250___: Does it work now with intuition?
[22:19:55] http://tools.wmflabs.org/intuition/_server.php
[22:20:00] shows SERVER_NAME is good now
[22:20:02] so it should be fine
[22:20:31] Krinkle: Seems to :)
[22:20:58] Thanks!
[22:28:28] Coren: The ldap user works. Thanks again for the help
[22:31:57] bd808: That should really be best practice anyways. Having disparate users on disparate servers is just begging for trouble.
[22:32:19] * bd808 nods
[22:32:38] labs_vagrant is a big bag of nails. Handy but dangerous
[22:33:02] Coren: did the 'krinkle' patch fix the newweb / issue?
[22:33:42] hedonil: Define "issue"?
[22:35:06] Coren: missing trailing slash resolves the link to the webgrid name instead of to the hostname
[22:36:35] unfortunately I have no example at hand to test it
[22:37:38] hedonil: It might, but that'd depend on how lighttpd constructs the implicit redirects. I'm guessing it would though.
[22:44:40] hedonil: I lost track of what has been done, but http://tools-eqiad.wmflabs.org/wikilint/test redirects to http://tools-eqiad.wmflabs.org/wikilint/test/, and previously that failed IIRC.
[22:49:14] scfc_de: you mean the 'krinkle' thingy?
[22:49:44] hedonil: Yep.
[22:50:28] scfc_de: they changed some variables http://tools.wmflabs.org/intuition/_server.php
[22:51:13] scfc_de: s/variables /headers
[22:52:07] scfc_de: maybe that was it (22:28:10) Coren: Krinkle: I can 'ProxyPreserveHost On'; this will do what you need at the cost of dragging 2616 in an alley and roughing it up a bit.
[22:54:58] Coren: Could you please update https://bugzilla.wikimedia.org/show_bug.cgi?id=59926 with what you did where?
[23:12:50] Coren: got my tools and my user account successfully moved to eqiad now, thanks again for the help!
[23:17:59] !log wikimania-support Setup wikimania-scholarships.eqiad.wmflabs serving https://wikimania-scholarships.wmflabs.org via labs-vagrant and the wikimania_scholarships role
[23:18:00] Logged the message, Master
[23:33:10] <^d> !log integration spinning up new general purpose slave in eqiad, integration-slave1001. will replace slave02 (and maybe 03) from pmtpa
[23:33:12] Logged the message, Master
[23:33:55] <^d> !log integration slave hhvm-build added to eqiad earlier today, running hhvm nightly builds for testing
[23:33:56] Logged the message, Master
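To close the loop on Coren's ProxyPreserveHost change: in Apache terms it amounts to something like the following (an assumed vhost, not the actual tools proxy config). With the directive off, Apache rewrites the Host: header to the backend's own name, which is why tools-webgrid-01:4061 was showing up in SERVER_NAME; with it on, the client's original Host: header is passed through to the webgrid.

```apache
<VirtualHost *:80>
    ServerName tools.wmflabs.org
    # pass the client's original Host: header to the backend instead
    # of rewriting it to the backend's own name
    ProxyPreserveHost On
    ProxyPass        / http://tools-webgrid-01:4061/
    ProxyPassReverse / http://tools-webgrid-01:4061/
</VirtualHost>
```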