[00:20:23] I give up on the return to for now
[00:20:46] Ryan_Lane: Done otherwise
[00:20:58] sweet
[00:21:05] Change on mediawiki a page Wikimedia Labs/Tool Labs/TODO was modified, changed by MPelletier (WMF) link https://www.mediawiki.org/w/index.php?diff=658287 edit summary: [+159] New use case (connectivity)
[00:22:01] There's probably some code cleanup to do, but I'll do it later on my local machine for eeease
[00:22:48] openstack-foobar? :)
[00:23:30] lol
[00:23:45] I presumed I would've reworked that by now...
[00:24:01] Though, it doesn't actually make any difference
[00:24:14] * Ryan_Lane nods
[00:24:33] fixed
[00:57:11] Ryan_Lane: That OSM revisions should be good to go now
[00:57:20] For the OATHAuth one - https://bugzilla.wikimedia.org/show_bug.cgi?id=40091
[00:57:25] Does it actually need its own tab?
[00:57:33] I don't think so
[00:57:41] I'd like to get the return-to links working, though
[00:57:48] otherwise it's confusing for the end-user
[00:57:57] Would it make sense just to move the links from Special:OATH to Special:Preferences directly
[00:58:04] ie the Enable/Disable/Reset links as appropriate?
[00:58:06] probably so, yeah
[00:58:14] Which removes one click level
[00:58:18] yep
[00:58:24] die clicks die
[00:58:42] Indeed
[00:58:49] And will possibly fix the return to there..
[00:58:52] https://wikitech-test.wmflabs.org/w/index.php?title=Special:OATH&returnto=Special%3APreferences
[01:00:09] as it's in the first url, it's just not carried on further on
[01:00:14] * Reedy starts poking that
[01:00:58] hm. maybe rather than manage oath, it should show enable or reset/disable
[01:01:12] that's another click gone as well
[01:01:25] and it makes it easier to return to the preferences pane
[01:12:06] Ryan_Lane: That's what I was meaning
[01:12:11] ah
[01:12:12] yeah
[01:12:12] https://wikitech-test.wmflabs.org/wiki/Special:Preferences
[01:12:16] ^ Done, I think ;)
[01:12:38] * Ryan_Lane tries
[01:12:50] Damn it. return to is still broken
[01:13:09] heh
[01:13:09] yep
[01:13:24] one step in the right direction
[01:19:19] Ryan_Lane: https://gerrit.wikimedia.org/r/53304
[01:19:37] That improves on it, the returnto is another thing I'll have to poke at separately and see how we normally do it
[01:19:50] * Ryan_Lane nods
[01:20:31] oh. I'm explicitly setting the returnto somewhere
[01:20:43] that's why this isn't working
[01:21:38] Oh, I nuked it. Pffft
[01:22:17] for instance: $out = Linker::link( $this->getTitle(), wfMsgHtml( 'oathauth-backtodisplay' ) );
[01:22:25] it doesn't use returnto at all
[01:22:46] and that parameter isn't being passed through to the htmlform callback, either
[01:24:06] https://wikitech-test.wmflabs.org/w/index.php?title=Special:OATH&action=enable&returnto=Special%3APreferences
[01:24:29] $returnToTitle = Title::newFromText( $this->mReturnTo );
[01:24:54] should just be a matter of passing it through in htmlform, and using the parameter to make the link
[01:25:51] let me see if I can do it really quick
[01:38:28] that may do the trick
[01:39:48] heh
[01:39:51] that sure as hell didn't work
[01:41:00] ah. right. it's missing the parameter
[01:47:07] almost there
[01:48:32] We should probably make it an unlisted special page, and remove the group on Special:SpecialPages
[01:49:25] yep
[01:49:26] extends SpecialPage => extends UnlistedSpecialPage
[01:50:06] 'oathauth-prefs-manage' is unused again too
[01:51:52] let me delete that
[01:52:43] oh
[01:52:45] did you push stuff in?
[01:53:56] Where?
[01:53:57] needed to rebase
[01:55:36] grr
[01:56:11] reset isn't returning properly
[01:56:14] everything else is
[01:58:28] Slightly weird when all the code looks the same
[01:59:43] yeah
[01:59:52] it's the damn DerivativeRequest, I'm betting
[02:02:04] ah
[02:02:05] I see
[02:02:40] I really need to use variables more consistently
[02:04:35] that did it
[02:05:52] ok. I think that change is good to go, now
[02:05:55] aha
[02:06:05] I'll just skim over it
[02:06:50] Yeah, LGTM
[02:06:54] cool
[02:06:56] I'll merge it in
[02:07:58] I may as well deploy it, too
[02:08:08] yeah
[02:10:49] The OSM change should be good to go too
[02:12:10] am I a reviewer on that?
[02:12:20] it's missing from my queue
[02:12:35] found it
[02:12:55] doing a quick review
[02:13:19] https://gerrit.wikimedia.org/r/#/c/53248/ does a bit of cleanup
[02:14:12] AFK back later
[02:14:30] ok
[04:01:59] addshore
[04:02:07] 2030
[04:02:08] load
[04:02:10] :D
[07:05:40] petan, you around?
[07:06:36] I've started linkwatcher again, using your script and "qsub -u long /data/project/beetstra/linkwatcher/linkwatcher.sh -o /data/project/beetstra/linkwatcher/syslog.output -e /data/project/beetstra/linkwatcher/syslog.errors" .. it is in the queue already for at least 30 minutes without starting .. how long does that normally take?
[07:07:39] depends on what else is running
[07:07:40] I've also made the files and started the other 3 bots .. which are also waiting in the queue - 'state -> 'qw' in qstat
[07:08:00] hmm .. what other bots are running?
[07:08:25] well mine have been running since yesterday
[07:08:28] in SGE
[07:08:43] idk if anyone else is using it
[07:09:29] I was suggested to start using that as well .. since linkwatcher is needing more resources
[07:09:41] But those will be running continuously ...
[07:09:55] oh these scripts im running are one time
[07:10:05] they're the interwiki link removers
[07:10:10] actually, linkwatcher was running last night on sg, but I adapted it, and was then suggested to put them in the long queue
[07:10:14] Yeah, I know
[07:10:27] I think we are doomed to tease each other with our bots :-)
[07:12:14] :P
[08:13:36] legoktm .. they sometimes get to 'status 't'' .. but get returned to qw, even in the normal queue
[08:13:50] i forget what t stands for
[08:13:52] * legoktm looks it up
[08:13:59] t = transfer
[08:14:15] t(ransfering)
[08:14:15] T(hreshold)
[08:14:17] ah
[08:14:38] is there a way to see all the jobs that are being run? or do you need to be root for that?
[08:14:50] did not figure that out yet
[08:15:11] new to the qxxx commands
[08:16:05] OMG
[08:16:09] its addshore's fault
[08:16:17] hold on a sec
[08:16:38] http://bots.wmflabs.org/~legoktm/all.txt
[08:17:03] * Beetstra loves it when it is someone elses fault :-)
[08:17:08] though
[08:17:12] i dont see your jobs?
[08:17:24] legoktm@bots-gs:~$ qstat -u '*' > ~/public_html/all.txt
[08:17:58] I submitted 4, killed 2 of them (hoping the other 2 came through)
[08:19:28] http://bots.wmflabs.org/~legoktm/beetstra.txt
[08:20:29] that is curious
[08:20:43] legoktm@bots-gs:~$ qacct -o beetstra -j > ~/public_html/beetstra.txt
[08:20:54] what does that mean? Those jobs are old and I qdel-d them
[08:21:05] thats just your job history
[08:21:18] yeah, I see
[08:21:21] (now)
[08:21:49] qstat -f says now:
[08:21:53] main.q@bots-bnr1.pmtpa.wmflabs BIP 0/7972/20000 176.83 lx26-amd64 a
[08:21:53] 18439 0.26278 unblockbot beetstra t 03/12/2013 08:21:12 1
[08:21:53] 18440 0.26273 xlinkbot.s beetstra t 03/12/2013 08:21:12 1
[08:22:06] ah
[08:22:08] those should be the two 'lighter' bots
[08:22:18] but they likely are back in qw in 5 minutes
[08:22:54] earlier it was:
[08:22:55] 18439 0.26052 unblockbot beetstra qw 03/12/2013 07:34:37 1
[08:22:56] 18440 0.26048 xlinkbot.s beetstra qw 03/12/2013 07:34:47 1
[08:23:02] hmmm
[08:24:57] * Beetstra wants to try something ..
[08:30:16] OK, that test works - if I log into bots-bnr1, and start the bots using my the script, they run
[08:30:35] But submission of the same script into the queue .. stalls.
[08:39:38] nah
[08:41:19] anyway, they're back in 'qw'-state
[08:44:59] hmm
[08:45:01] the bot is don
[08:45:02] down*
[08:45:03] but
[08:45:04] petrb to SAL: disabling addshore's cron for a while
[08:47:56] wow
[08:48:24] addshore: the job i submitted 2 days ago was at 30. now we're at 20262.
[08:49:06] hmm
[08:49:21] petan: if i schedule a job with the same name, isnt it just supposed to ignore it?
[09:07:55] hi
[09:07:57] I am here
[09:08:32] legoktm no
[09:08:34] it will submit it again
[09:08:42] Beetstra, legoktm patience
[09:08:46] :P
[09:08:47] load is not some 8000+
[09:08:50] :P
[09:08:51] * legoktm waits patiently
[09:09:04] we need to wait for that burst of addshore jobs to finish somehow
[09:09:08] I don't want to kill them
[09:09:14] addshore ping
[09:09:17] iirc on the toolserver jobs dont get resubmitted
[09:09:20] do something with that pls
[09:09:27] he should be sleeping now
[09:12:48] !log bots deleting all qw jobs of addshore from queue
[09:12:52] Logged the message, Master
[09:30:01] 20336 0.25679 unblockbot beetstra r 03/12/2013 09:28:23 main.q@bots-bnr1.pmtpa.wmflabs 1
[09:30:06] yay!
[09:30:26] :P
[10:01:37] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=658587 edit summary: [+287] /* When can I migrate my software to Labs? */ info on db replicas and user dbs
[10:12:13] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=658588 edit summary: [+514] /* List of important questions/FAQ */ added section about permissions
[10:14:42] oh man
[10:14:46] "Failed to add jenkins-bot to deployment-prep. This needs the "loginviashell" right."
[10:14:48] seriously
[10:15:10] petan: is OG overflowing with me? :<
[10:15:29] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=658589 edit summary: [+394] /* List of important questions/FAQ */ added section about stewards
[10:16:58] Beetstra: legoktm remember you can set priorities for tasks, I imagine that would affect how quickly they get picked up in the queue :)
[10:17:07] oh how?
[10:17:14] * addshore goes to find the parameter
[10:17:31] * legoktm sets priority to addshore+1 ;)
[10:18:28] qsub -p (priority which is The qsub utility shall accept a value for the priority option-argument that conforms to the syntax for signed decimal integers, and which is not less than -1024 and not greater than 1023.)
[10:19:27] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=658590 edit summary: [+302] /* Table of features needed for current tools */ added link to the list of tools
[10:21:17] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=658591 edit summary: [+13] /* Wikimedia Germany */ completed WMDE's staff list
[10:23:06] Change on mediawiki a page Wikimedia Labs/Migration Of Toolserver Tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=658592 edit summary: [-3] changed disclaimer on top of the page
[10:27:01] Whee!! All 4 running
[10:27:16] And even better, linkwatcher is slowly munching away old backlogs
[10:28:59] hmm .. but bot is not very, very responsive
[10:29:06] Bleh .. lecture .. have to go again
[10:29:59] heh .. cancelled .. :-D
[11:12:03] hashar need some rights on wikitech
[11:12:04] ?
[11:12:11] maybe I can help you
[11:15:25] addshore yes it was
[11:15:41] addshore like 20 000 of your jobs waiting in queue :P
[11:15:54] HAH
[11:16:02] I have restructured my cron a bit :)
[11:16:05] silly OG
[11:16:05] ok
[11:17:57] haha, i would say it is working looking at the loads but I think I just broke my cron ;p
[11:19:35] so, how is this now. Is the box just working on 'full capacity', and the bots have to share that, or do individual bots get a predetermined fraction of 'load' assigned, and they are forced to stay within that?
[11:20:41] its currently anything goes as long as the load stays below (i think 15)
[11:21:30] did you see my message above about priority Beetstra ?
[11:21:46] yeah .. I did not use that
[11:21:54] It just started now
[11:22:10] And I thought I added '-u long' to the list, but I see they are in the main.q
[11:22:47] isnt it -q long?
[11:23:07] * Beetstra copied petan's commandline :-)
[11:23:14] * Beetstra is new to the 'qxxx' commands
[11:23:19] the documentation is so messy
[11:23:52] oh yes it's -q
[11:23:55] [-u user_list]
[11:23:58] hehe
[11:24:00] if I told you -u it was not that
[11:24:00] [-q destination]
[11:24:05] * Beetstra changes script
[11:24:08] xD
[11:24:40] petan: are my jobs being denied from the queues atm or are they just breaking in some other way?
[11:26:05] they are breaking
[11:26:17] :D
[11:26:20] interesting xD
[11:28:16] well at least that cleared out the queue ;p
[11:28:16] OK, 3 of 4 transferred to long queue
[11:34:55] right, just let it be known that OG really cant handle lots of little jobs very well at all :/
[11:42:58] Beetstra do you need instance bots-liwa now?
[11:59:58] okay I wrote some documentation Darkdadaah - https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Using_OGE
[12:00:05] it's not much but better than nothing :P
[12:01:09] petan, let me clean up there first
[12:01:19] Beetstra cleanup what? :o
[12:01:20] but it looks like I am finished with both -liwa and -nr1
[12:01:24] ok
[12:01:37] :-) .. there may be a residual backlog file there .. have to check
[12:01:41] though I think all is clean
[12:01:43] sure just let me know when you wouldn't need them anymore so we can delete them
[12:02:20] bots-liwa can go
[12:02:27] ok
[12:03:03] and I am not using bots-nr1 either anymore
[12:03:06] !log bots deleted bots-liwa
[12:03:08] Logged the message, Master
[12:03:18] (maybe someone else is using bots-nr1?)
[12:03:27] I don't know but I will figure out :)
[12:03:45] :-)
[12:04:14] Beetstra when you have a lot of time, you might consider moving all databases from bots-sql2 to bots-bsql01 - but that doesn't need to be done asap
[12:04:27] it's just bigger and faster sql
[12:04:29] OK
[12:04:36] that is going to take time, indeed
[12:04:41] probably yes
[12:04:52] But may take a couple of weeks before I have serious time for that
[12:04:59] no problem
[12:05:06] Unless you have a quick way of just copying them instead of transferring them
[12:05:46] I was using mysqldump for that, so probably nothing really fast, but... maybe we can figure out some better solution
[12:06:18] Beetstra you still have processes running on -nr1
[12:06:24] huh
[12:06:47] perl LinkSaver
[12:07:04] petan, the servers are configured to use one file per db?
[12:07:12] Platonides only the new one
[12:07:21] Platonides these old sql servers were not I think
[12:07:29] killed them .. sometimes the modules of my bots don't autodie
[12:07:33] I was to suggest rsyncing the innodb files
[12:07:36] Platonides one file per table, not db
[12:07:47] but if it's in one big block, you may not be able to do that properly
[12:07:48] Platonides not possible in this case :(
[12:08:19] but maybe I find a way to quickly convert the one-file to multiple
[12:08:24] like recreating the db online or something like that
[12:08:30] who knows
[12:08:36] maybe there is some tool for that
[12:08:50] but it can wait now
[12:09:32] !log bots deleting -nr1
[12:09:33] Logged the message, Master
[12:11:19] Platonides I fixed that * bug
[12:11:21] :P
[12:11:22] petan: About OGE: -o -e and -j don't work?
[12:11:26] now it tells you invalid name
[12:11:37] Darkdadaah they should but they don't
[12:11:40] and I have no idea why
[12:11:52] every time when I used them, nothing happened
[12:12:16] I mean - the job started, finished and no output :/
[12:12:48] If people want to reuse their scripts, we should try to make those work.
[12:12:58] hm... indeed
[12:13:06] the script I wrote is a workaround for this bug
[12:13:17] not a final solution :o
[12:13:48] Writing >> $logfile at every line is troublesome :(
[12:13:58] I know it's a workaround.
[12:14:22] well, heh it doesn't need to be at every line
[12:14:35] if you want to launch a huge script, you could create a second shell for that
[12:14:50] but it needs to be at 1 line at least
[12:15:07] If you don't add it to one line, Murphy's law predicts that this is the line that will fail.
[12:19:26] petan, which server was a submit host?
[12:20:04] -gs
[12:20:18] gs (grid scheduler)
[12:20:19] *bots-gs
[12:20:28] weird name :P
[12:20:35] but it's short and I <3 short names
[12:22:14] The load seems to be really high on both nodes.
[12:23:16] this is weird, why would it suddenly show the waiting job in a queue and later in none again?
[12:24:08] Because it ended really fast?
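The ">> $logfile" workaround discussed above (one redirect instead of one per line, run in a second shell) could be sketched roughly as follows. This is only an illustration under assumptions: `run_logged` is a hypothetical helper name that does not appear in the log, and the actual wrapper script petan wrote is not shown in this conversation.

```shell
#!/bin/sh
# Hypothetical helper (not from the log): run a command with stdout and
# stderr appended to a single logfile, so the redirect is written once
# instead of after every line of the job script.
run_logged() {
    logfile=$1
    shift
    # The subshell keeps the redirection from leaking into the caller.
    ( "$@" ) >>"$logfile" 2>&1
}
```

A job script could then call something like `run_logged "$HOME/mybot.log" perl mybot.pl` (both names are placeholders). An alternative with the same effect is a single `exec >>"$logfile" 2>&1` at the top of the script.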
[12:24:39] !mail
[12:24:39] we have a mailing list labs-l@lists.wikimedia.org feel free to send a message there, don't forget to subscribe
[12:24:42] ignore me
[12:25:11] Darkdadaah: sorry thats me, just fixing it now
[12:25:20] Platonides: it probably broke
[12:25:39] Darkdadaah. I don't think so
[12:25:44] it's still in qw state
[12:25:59] ahh, thats again me causing things to go slowly :/
[12:26:04] it will go through shortly dont worry!
[12:26:11] Oh. Ok.
[12:26:39] w -> waiting
[12:26:50] not sure what's the q for
[12:26:56] "waiting for a queue" ?
[12:27:29] qw = queued/waiting
[12:27:43] r = running :)
[12:28:58] queued and waiting for resource on a node
[12:28:58] it has been waiting for 7 minutes :(
[12:29:13] ok I completely rewrote the docs, have fun
[12:29:14] all the nodes must be quite overloaded, then
[12:29:21] don't overload box :)
[12:30:31] given that we removed some boxes - we might consider creation of bnr3 but...
[12:30:34] do we need it? :o
[12:30:44] petan: I think with everyone moving onto these yes :/
[12:30:55] * addshore is still cutting back his cron :p
[12:31:12] addshore what about just submitting these jobs less often ;(
[12:31:14] ;)
[12:31:21] thats what I am doing xD
[12:31:24] :)
[12:31:28] petan, just to have an idea .. if you compare bnr1 with -liwa or -nr1, how much bigger is bnr1?
[12:31:30] but i have hundreds of lines of cron xD
[12:31:48] -nr1 was 2gb of ram + swap and 1 cpu
[12:32:01] -bnr1 is 8gb of ram + 20gb of swap and 4 cpu
[12:32:21] Beetstra ^
[12:32:28] addshore I know
[12:32:29] :P
[12:32:31] do note that the linkwatcher was munching quite a bit of -liwa ..
[12:32:43] ok -liwa was just as -nr1
[12:32:44] small
[12:33:05] yeah, but roughly spoken, -bnr1 is 16 times -nr1 ...
[12:33:11] petan: is there a way to clear all waiting jobs for a user?
[12:33:21] addshore yes but hard
[12:33:28] qdel is boring
[12:33:31] addshore I can remove them if you want
[12:33:33] OK ... so we have that x 32 (bnr1 & bnr2) .. hmm ..
[12:33:37] it requires some shell magic
[12:34:04] go for it petan as long as its only the ones waiting ;p
[12:34:39] addshore qdel `qstat -u $user | sed 's/^\s*//' | sed 's/\s.*//'`
[12:34:45] :D
[12:34:48] wait
[12:34:49] no
[12:34:53] that will delete all
[12:35:18] xD
[12:35:56] addshore qdel `qstat -u addshore | grep -E 'addshore\s*qw' | sed 's/^\s*//' | sed 's/\s.*//'`
[12:36:09] you sure? ;p
[12:36:14] addshore no
[12:36:21] addshore echo `qstat -u addshore | grep -E 'addshore\s*qw' | sed 's/^\s*//' | sed 's/\s.*//'`
[12:36:22] :P
[12:36:25] that's safe version
[12:36:32] ill just wait then :P they should have made it to the queue in 1 or 2 more mins
[12:36:33] check what it produces
[12:36:42] I hope it is not possible to delete other people's jobs :/
[12:36:50] Darkdadaah I don't know? :D
[12:36:53] qdel -u user_name doesn't work ?
[12:36:53] but it shouldn't be
[12:37:06] phe: yes it does but he wants to delete only some
[12:37:06] qdel does seem to have -u as a param
[12:37:07] not all
[12:37:58] Platonides: is it running yet? ;p
[12:38:01] iirc you can do also qdel 133-192,257 to del job id 133 to 192 + 257
[12:38:18] phe: !!!!! :O
[12:38:25] i did not know you could do ranges
[12:38:42] addshore or: qdel "please delete all my jobs in qw status"
[12:38:55] but probably not so smart :P
[12:39:09] btw why there is no limit through setrlimit at login on shared ?
[12:39:15] hmm phe ranges dont work
[12:39:15] *shared box
[12:39:31] phe: because bots is work in progress
[12:39:53] there are almost no limits atm
[12:39:58] petan: I think we should create wrappers for each of these commands :/
[12:40:04] addshore yup
[12:40:10] addshore feel free to do so + docs
[12:40:11] they are just ... shit xd
[12:43:34] petan: -o and -e work for me.
[12:44:05] ^those have always worked for me
[12:45:15] -o -e?
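The grep/sed pipeline petan proposed above for picking out only the waiting jobs could be written as a small wrapper along these lines. This is a sketch under assumptions, not a tool from the log: `waiting_job_ids` is a hypothetical name, and it assumes the plain `qstat` column order seen in the pasted output earlier (job ID, priority, name, user, state, ...), with no whitespace inside the job name or user name.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the log): read `qstat` output on stdin
# and print the IDs of the given user's jobs that are in state "qw".
# Assumed columns: 1 = job ID, 4 = user, 5 = state.
waiting_job_ids() {
    awk -v user="$1" '$4 == user && $5 == "qw" { print $1 }'
}

# Dry run first, as suggested in the chat, then delete only if the list looks right:
#   qstat -u addshore | waiting_job_ids addshore
#   qdel $(qstat -u addshore | waiting_job_ids addshore)
```

Matching on exact fields avoids the regex in the original one-liner accidentally matching a job whose name happens to contain the user name. The header and separator lines of `qstat` are skipped because their fifth field is never `qw`.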
[12:45:41] -o output.log -e errors.log
[12:46:39] ahh :)
[12:47:11] * addshore now has 3 waiting jobs :d
[12:47:14] 1
[12:47:27] all gone :)
[12:48:10] Even with no parameters, the files are created in ones home as script_name.e{JobID} and script_name.o{JobID}
[12:48:42] (not sure about the path)
[12:49:52] yeah
[12:49:55] e for errors
[12:49:57] o for output
[12:59:54] :-/
[13:00:08] linkwatcher is getting it harder again :-(
[13:10:03] petan and/or beetstra… still having trouble with sudo?
[13:10:36] andrewbogott .. no, moved bots to another instance, not necessary anymore
[13:10:47] Ok, so… shall I close out https://bugzilla.wikimedia.org/show_bug.cgi?id=45985?
[13:10:50] That bug can be closed, bots-liwa does not exist anymore
[13:11:03] And you're able to sudo elsewhere?
[13:11:05] (forgot to test before the instance was deleted)
[13:11:16] I am now running all bots from bots-sg
[13:11:26]