[12:33:54] using jsub, can I submit a job, but delay its implementation seeing I have whacked a load of other jobs into play?
[12:36:02] oh, is that qsub?
[12:52:21] sDrewth: no, you can't; but you can use jlocal and then submit your jobs
[13:05:25] Labs, Labs-Infrastructure: wikidatawiki_p on Labs lags ~20h behind live site - https://phabricator.wikimedia.org/T105632#1534695 (Magnus) Open>Resolved a:Magnus
[13:06:25] Labs, Labs-Infrastructure: wikidatawiki.labsdb user database is impressively slow - https://phabricator.wikimedia.org/T95276#1534697 (Magnus) Open>declined Moved to different server
[13:18:37] Labs: Setup checkpoint check for private DNS - https://phabricator.wikimedia.org/T107453#1534727 (coren) That's correct; not only is the data source different, but the mechanism is as well.
[13:23:17] Labs: Create a checkpoint check for labs LDAP - https://phabricator.wikimedia.org/T107454#1534762 (coren) If we do that through tools-checker, don't we create a dependency loop? (That is, tools-checker relies on the system we are testing)
[13:42:20] Labs, operations: Investigate whether to use Debian's jessie-backports - https://phabricator.wikimedia.org/T107507#1534816 (coren) Opsen consensus on IRC is that jessie-backports should be disabled fleet-wide and any needed package brought into jessie-wikimedia.
[13:44:00] Labs, operations: Make certain that jessie-backports is disabled fleetwide.
- https://phabricator.wikimedia.org/T108941#1534819 (coren) NEW
[14:48:41] !log tools rescheduling (and in some cases killing) jobs on tools-exec-1203 tools-exec-1210 tools-exec-1214 tools-exec-1402 tools-exec-1405 tools-exec-gift tools-services-01 tools-web-static-02 tools-webgrid-generic-1403 tools-webgrid-lighttpd-1204 tools-webgrid-lighttpd-1209 tools-webgrid-lighttpd-1401 tools-webgrid-lighttpd-1405
[14:48:41] tools-webgrid-lighttpd-1408
[14:48:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, dummy
[14:48:56] !log tools and tools-webgrid-lighttpd-1408
[14:48:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, dummy
[14:50:46] valhallasw`cloud: I’m running the killjobs script from yesterday and now it’s saying denied: host "tools-master.tools.eqiad.wmflabs" is no submit host
[14:50:48] red herring?
[15:00:07] andrewbogott: Did you open a ticket for the issue you had with the bastions yesterday when doing the rebooty?
[15:01:00] um… issue I had with the bastions, refresh my memory?
[15:01:05] You mean last night, the ssh key thing?
[15:01:09] Yes?
[15:01:43] Last message on my log when I came in was you declaring victory and return to status quo. :-)
[15:02:00] (Which is good news, I hoped)
[15:02:52] Coren: ah, that was unrelated to reboots.
[15:02:56] I have a theory, will write a report soon.
[15:03:04] But first I need to reboot labvirt1002 ‘cause it’s on the schedule.
[15:03:17] Oh! Good, that means there's no alterations to the plan or process for the reboots then?
[15:03:24] right
[15:03:36] (Also, Mark wondered why that wasn't on the -109 board)
[15:03:51] valhallasw`cloud: I’m rebooting labvirt1002 now despite scripting failure… please check in when you’re around.
[15:03:56] Coren: which, the reboots?
[15:04:03] * Coren nods.
[15:04:26] Ah, it is, just, the bug is obscurely named.
[15:04:31] It’s the ‘kernel issues’ bug
[15:04:50] Ah, you just forgot to move it from 108 to 109.
:-)
[15:05:15] Labs, operations, Labs-Sprint-107, Labs-Sprint-108, and 2 others: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1535250 (coren)
[15:05:17] did I?
[15:05:20] ok, thanks.
[15:06:19] Hm. I need to take some of those reboots from you for next week too.
[15:06:32] * Coren goes to edit the schedule now before he forgets.
[15:06:39] Yeah, I might be around next week actually. I’ll twiddle with the calendar.
[15:06:51] Is the ordering significant?
[15:07:55] nope, doesn’t matter
[15:08:17] and, btw, valhallasw`cloud and I have worked out a checklist for rebooting. It’s a wip but the phab link is https://phabricator.wikimedia.org/T108669
[15:13:12] bah, rebooting doesn’t help if I don’t install the new kernel first
[15:17:58] Noted.
[15:18:13] And yes, doing the upgrade first is generally considered beneficial. :-)
[15:21:23] ok, so, labvirt1002 is up and the VMs are gradually starting. Beta, then Integration, then Tools, then everything else.
[15:23:38] Labs, operations, Labs-Sprint-107, Labs-Sprint-108, and 2 others: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1535333 (Andrew) labvirt1001, 1002, 1009 done.
[15:56:18] Labs, Multimedia, operations, wikitech.wikimedia.org, and 2 others: Some wikitech.wikimedia.org thumbnails broken (404) - https://phabricator.wikimedia.org/T93041#1535441 (Krenair) I checked several other images and most of them showed the same error.
[16:03:36] hello, I've got a new tool I just created and am trying to install Ruby with rbenv.
I was able to do this with my musikbot tool but for this tool I'm getting `Permission denied @ rb_sysopen - /data/project/musikanimal/.rbenv/versions/2.2.1/lib/ruby/gems/2.2.0/gems/minitest-5.4.3/.autotest (Errno::EACCES)`
[16:03:58] it might be related to https://phabricator.wikimedia.org/T106170
[16:04:27] I can do `ls -al` on the relevant directories and the permissions all look right
[16:04:54] but I was thinking... maybe I could somehow make the musikanimal tool use the same Ruby as musikbot?
[16:05:23] or I could convince one of you admins to install Ruby 2.2+ system-wide...?
[16:08:44] MusikAnimal: that directory doesn't exist
[16:10:12] hmm, you are correct but the installation script should create it
[16:10:25] I'll try making those directories myself
[16:10:53] Coren: fyi https://wikitech.wikimedia.org/wiki/Incident_documentation/20150812-LabsOutage#Summary
[16:11:56] but I still wonder if labs admins would upgrade the Ruby? Coren what do you think? the current system version, 1.9.3, was released nearly 8 years ago and is officially obsolete ;)
[16:13:42] MusikAnimal: Ruby 2.0.0.484-1ubuntu2.2 is packaged for ubuntu, so that could be installed, I think.
[16:14:04] hmm 2.2 is highly preferred, it has lots of performance enhancements
[16:14:07] and security updates
[16:14:30] 2.2 is not packaged in trusty, so no, we can't install that system-wide.
[16:15:44] MusikAnimal: it needs "trusty -> jessie" upgrade of entire servers
[16:15:54] I don't know how that stuff works to be honest, but I did find this: https://www.brightbox.com/blog/2015/01/05/ruby-2-2-0-packages-for-ubuntu/
[16:16:09] because my bot runs on trusty and uses Ruby 2.2.1 as far as I know
[16:16:28] MusikAnimal: yea, we can't just add PPAs
[16:16:45] valhallasw`cloud: you could backport the utopic 2.2 package...?
[16:16:53] possibly
[16:16:55] er, wily*
[16:16:56] http://packages.ubuntu.com/search?keywords=ruby2.2
[16:17:00] we do want the jessie upgrade though
[16:17:13] which would also solve this
[16:17:19] the jessie upgrade for tools is far away for now
[16:17:45] I don't know about that or why, but we do want it
[16:35:24] Jessie only has 2.1, so that doesn't solve the issue
[16:35:32] so we'd always need to backport
[16:44:50] !log tools disabling job queue for tools-exec-1216 tools-exec-1219 tools-exec-1407 tools-mail tools-services-02 tools-webgrid-generic-1401 tools-webgrid-lighttpd-1202 tools-webgrid-lighttpd-1207 tools-webgrid-lighttpd-1210 tools-webgrid-lighttpd-1402 tools-webgrid-lighttpd-1407
[16:44:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, dummy
[17:38:04] MusikAnimal: The problem with an upgrade like this is the support libraries. Do you know what version Jessie supports?
[17:38:13] Coren: 2.1
[17:38:24] so it'd have to be a backport from wily
[17:38:31] It might be possible to use a backport with some effort then.
[17:39:16] well I finally did manage to get Ruby 2.2 installed
[17:40:32] I'm still confused how my bot runs on 2.2 but is apparently also on trusty?
[17:40:43] MusikAnimal: because you installed ruby 2.2 yourself
[17:42:00] so it is possible to have 2.2 on trusty? why can't it be system-wide? sorry for my lack of understanding
[17:44:38] Not that it's a big deal or anything.
I don't think there's many other Ruby tools on labs, mine might even be the only one
[17:49:31] MusikAnimal: basically, we need to make sure the package we install is safe in the sense that it won't take over all of tool labs to, I dunno, mine bitcoins
[17:49:51] but also that it won't break everything that's already running
[17:50:13] if ubuntu packages it, we can be certain of that
[17:50:17] with PPAs, not so much
[17:50:44] I see
[17:51:06] now, when a user installs ruby themselves, they are also in control of upgrading
[17:51:17] so we won't accidentally break their tools
[18:17:45] ok all — I’m about to re-merge the patch that I merged last night right before the outage. I’m confident that that patch was not the cause… but not so confident as to not mention it here
[18:31:25] * valhallasw`cloud eyes grrrit-wm
[18:50:40] !log tools tools-exec-1201/Puppet staleness was critical due to an agent lock (Ignoring stale puppet agent lock for pid
Run of Puppet configuration client already in progress; skipping (/var/lib/puppet/state/agent_catalog_run.lock exists))
[18:50:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[18:51:23] !log tools which was resolved by scfc earlier
[18:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[19:30:16] any xtools people on?
[20:36:07] Hi, is tools labs down?
[20:36:43] I have a server setup on tools-labs and I keep getting a 503 error.
[20:43:00] ankita-ks: depending on the exact error, that either means your webservice is not running, or there's a bug in your code.
[20:43:18] what tool is this?
[20:43:35] languagetool
[20:43:41] But it seems to be working fine now
[20:43:45] I just restarted it
[20:43:46] :/
[20:43:53] I don't know what went wrong
[20:51:53] valhallasw`cloud: did you catch my message earlier about killjobs not working after your rewrite?
[20:52:04] andrewbogott: no....
[20:52:37] andrewbogott: were you running it from tools-master?
[20:52:47] yes
[20:52:49] is that wrong?
[20:52:53] " denied: host "tools-master.tools.eqiad.wmflabs" is no submit host"
[20:53:05] andrewbogott: because (in true SGE logic) you're not allowed to do that -- you're only allowed to run qsub/qdel from submit hosts = the bastions + tools-submit
[20:53:23] haven’t we been doing this stuff from the master historically though?
[20:53:32] I'm actually not sure why not all of our hosts are submit hosts, to be honest
[20:53:32] Or does that mean it was never working?
[20:53:45] Well, anyway, that’s easy enough to do right next time.
thanks
[20:53:50] I always run SGE commands from tools-bastion-01
[20:54:23] but there's various things that are only allowed from certain groups of hosts
[20:54:30] which makes no sense to me, because we already have user auth
[22:29:32] Quarry: Every second attempt to use Quarry to do an SQL query fails - https://phabricator.wikimedia.org/T109014#1537736 (Iislucas) NEW
[22:30:53] Quarry: Some long queries give no results - https://phabricator.wikimedia.org/T109016#1537760 (Iislucas) NEW
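
Note on the submit-host restriction discussed at 20:53: a script such as killjobs could check up front whether the machine it runs on is a submit host, rather than failing on the first qdel. The following is a minimal Python sketch, not the actual killjobs code. It assumes that `qconf -ss` prints one submit host name per line; the function names are hypothetical, and the fully qualified bastion and tools-submit names in the usage are guesses built from the `.tools.eqiad.wmflabs` suffix seen in the error message.

```python
from __future__ import annotations

import socket
import subprocess


def parse_submit_hosts(qconf_output: str) -> set[str]:
    """Parse `qconf -ss` output, assumed to be one host name per line."""
    return {line.strip().lower() for line in qconf_output.splitlines() if line.strip()}


def is_submit_host(hostname: str, submit_hosts: set[str]) -> bool:
    """Return True if hostname is in the submit-host set (case-insensitive)."""
    return hostname.strip().lower() in submit_hosts


def current_host_can_submit() -> bool:
    """Ask Grid Engine for its submit hosts and check this machine against them.

    Raises CalledProcessError if qconf refuses the request, which may
    happen on hosts that are neither admin nor submit hosts.
    """
    result = subprocess.run(
        ["qconf", "-ss"], capture_output=True, text=True, check=True
    )
    return is_submit_host(socket.getfqdn(), parse_submit_hosts(result.stdout))


if __name__ == "__main__":
    # Host names below are assumed FQDNs, for illustration only.
    hosts = parse_submit_hosts(
        "tools-bastion-01.tools.eqiad.wmflabs\ntools-submit.tools.eqiad.wmflabs\n"
    )
    print(is_submit_host("tools-master.tools.eqiad.wmflabs", hosts))
```

Keeping the parsing and the membership test separate from the `qconf` call means the logic can be exercised without a grid; only `current_host_can_submit()` needs a working Grid Engine installation.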