[03:29:09] Holy fuck.
[03:39:53] Apparently, there is a patron saint of complete morons that was watching over me today.
[03:39:56] * Coren sighs.
[03:40:24] :)
[03:40:29] Is that a good thing or bad? :P
[03:41:33] It's a good thing. I managed to have one of my more important passwords typed on IRC without my noticing for ~5h.
[03:41:59] And yet, no accesses, no breaches, and I managed to change every password.
[03:42:13] Heh
[03:42:22] Security through non-obscurity.
[03:42:41] Yeah, I think I'll pass on taking that (unwitting) gamble again, twvm. :-)
[03:48:01] Did you manage to test the queue change?
[03:49:19] Ah no, let me do that now
[03:49:36] No rush, I only wanted to know if you ran into trouble.
[03:53:07] Does anyone know what the he--ck happened to bots-2?
[03:53:10] I had my code on it. :/
[03:53:23] Searched labs-l archives and didn't see anything. SAL has 12:57 labs-logs-bottie: madman: deleting bots-2
[03:53:26] Which is BIZARRE.
[03:53:42] Because I didn't log that and I certainly don't have authority on that project to delete an instance.
[03:53:45] I think they all got moved into the bnr-*
[03:54:17] Well, that would explain why my cron jobs haven't been running.
[03:54:20] errr
[03:54:25] Data was moved?
[03:54:26] i thought petan migrated all the crons?
[03:54:42] Right but my code depends on PEAR packages that aren't installed.
[03:54:42] AMadman: well all of your data was in /data/project/something right?
[03:54:47] legoktm: Yes.
[03:54:48] hrmmmmm
[03:54:59] * legoktm pokes addshore petan Damianz
[03:55:20] (January 8th: 03:11 labs-logs-bottie: madman: Installed php-pear, HTTP_Request2, Log on bots-2. )
[03:55:44] I need to see if that can be done, puppetize or whatever, to get that working on a bnr.)
[03:56:03] well the problem with bnr is that no one has root except the admins :/
[03:56:09] problem/benefit
[03:56:13] Right.
[03:56:17] I get the costs and the benefits.
[03:56:18] Coren: I was just looking through something else, and it seems I can set a maxsize on the queues.
[03:56:25] I guess I just missed the e-mail.
[03:56:33] And it's BIZARRE that someone else logged a message as me.
[03:56:35] legoktm: Won't that cause the insertions to fail?
[03:57:02] Right, but I can just do a loop/sleep until it goes in or something.
[03:57:05] Someone may have sudoed to my user as scripts containing passwords are mode 600.
[03:57:12] legoktm: And if it doesn't, then I expect it's set on a condition variable that'll wake the main thread as soon as one is taken off, which might be full of pain.
[03:57:40] It'll throw a Queue.Full error
[03:57:46] Actually I'm not even sure that would have required sudoing to my user for a root.
[03:58:16] AMadman: It wouldn't have; but none of your files even would have needed accessing at all if they were in /data/project since that same filesystem is visible from every instance.
[03:58:29] Ah. Didn't know that.
[03:58:33] Which is probably easier for me to catch than setting up locks...
[03:59:02] legoktm: You still need a condition variable otherwise your main thread will have to busy poll. I.e.: bad. :-)
[03:59:16] er, why?
[03:59:22] Then that removes even my working hypothesis. Having messages logged as me kind of decreases my trust in the shared environment, you know?
[03:59:33] cant i just time.sleep(30) or something?
[03:59:37] AMadman: Understandably so.
[03:59:57] legoktm: You could, but that's kind of sucky if your worker threads empty the queue in that interval.
[04:00:14] legoktm: It'd work, though. Kinda the low-tech option. :-)
[04:00:34] Unless I misunderstand what was going on. I just noticed in history it says madman: deleting bots-2 (labs-logs-bottie)
[04:00:43] So it may have been directed at me instead-- ohhhh
[04:00:48] petrb: previous message was supposed to be logged as me (labs-logs-bottie)
[04:01:10] I apparently totally missed that in the SAL but not in the page history.
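A note on the bounded-queue idea discussed above: in Python's standard library, `put()` on a full queue blocks by default, waiting on an internal condition variable until a consumer frees a slot, so the producer neither fails nor busy-polls. `Full` is raised only for a non-blocking put or when a timeout expires. A minimal sketch, written for Python 3 (where the module is `queue`; in Python 2.7 it is `Queue`). The names `worker`, `produce`, `try_put` and the sample pages are illustrative only, not taken from the bot under discussion:

```python
import queue
import threading


def worker(fetched, results):
    """Consume pages until a None sentinel arrives."""
    while True:
        page = fetched.get()
        if page is None:
            break
        results.append(page.upper())  # stand-in for the real processing


def produce(pages, maxsize=2):
    """Feed pages through a bounded queue to a single consumer thread."""
    fetched = queue.Queue(maxsize=maxsize)
    results = []
    consumer = threading.Thread(target=worker, args=(fetched, results))
    consumer.start()
    for page in pages:
        # Default block=True: waits here while the queue is full and is
        # woken as soon as the consumer takes an item off.
        fetched.put(page)
    fetched.put(None)  # tell the consumer to stop
    consumer.join()
    return results


def try_put(q, item):
    """Non-blocking variant: this is the case that raises queue.Full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False
```

With a blocking `put()`, the sleep interval stops mattering: the producer resumes the moment a slot opens, which is the condition-variable behaviour without writing the locking by hand.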
[04:01:18] >.> All is well in the world; I just can't read.
[04:01:23] AMadman: There you go. It seems like it was an error. I'll have a word with Petr though, that's bad practice.
[04:01:57] Coren: I guess. Maybe when I have time I'll properly re-write it :P
[04:02:01] No, I totally understand how it could happen and he corrected it promptly. It was just startling when I searched for bots-2 and got my own name. Didn't see the context.
[04:02:52] AMadman: Incidentally, if your tool is stable, the final tools project is ready for prime time.
[04:04:16] I'll look into it. I'll be totally honest and say I haven't been following discussion on that project on labs-l at all. Any e-mail I've gotten in the past month or two that hasn't been "action required" has been archived for later reading.
[04:04:21] put = False
[04:04:22] while not put:
[04:04:22]     try:
[04:04:22]         fetched.put(page)
[04:04:22]         put = True
[04:04:22]     except Queue.Full:
[04:04:23]         time.sleep(30)
[04:04:27] But my workload's becoming manageable at work (it won't last).
[04:04:57] So I'll have a chance at least for the upcoming week.
[04:06:40] * AMadman sets up local PEAR repository on bnr- as a quick and dirty fix until he can look into Puppet.
[04:08:03] Now I'm just entertained by having read that message without reading the one above it. Me: "I would SWEAR I haven't been drinking!"
[04:09:03] legoktm: It'll work, I think. It's also painful. :-)
[04:09:22] Hm. If I had a crontab, where would it be... *cycles through instances*
[04:09:55] AMadman: I've gotten so confused in the past few months about where my crontab is, so I started storing it on a wiki page
[04:10:09] Yeah, I don't have one on any of the bnr servers.
[04:12:22] Coren: any tips for finding copy/paste sources for things that sound like maybe they weren't originally written onwiki?
[04:12:32] i'm mostly turning up WP mirrors
[04:14:22] jeremyb_: The longer something has been on-wiki, the harder it becomes.
What I usually do is google "some typical phrase" -"view history" which tends to exclude wikis.
[04:15:01] jeremyb_: Or you can use madmanbot's manual check which tries to exclude known mirrors.
[04:15:14] hah, that's a new one: http://www.amazon.com/wiki/$1
[04:15:20] (for me)
[04:15:33] "Shopping-enabled Wikipedia Page"
[04:15:45] That's... just ew.
[04:16:12] http://www.amazon.com/wiki/United_States
[04:16:31] AMadman: Incidentally, I tried CSBot with the monitoring html active on tools. Success. :-)
[04:16:47] the ISBNs link to amazon pages
[04:16:51] i wonder what else they do
[04:16:59] Yeah, my mirrors list is pretty okay, due to the constant people telling me there's a new mirror site, and me adding them without just saying DIY.
[04:17:22] I mean, ahem.
[04:17:35] Coren: Excellent.
[04:17:36] AMadman: Teach a man to fish.
[04:17:55] AMadman: I mean, ahem. :-)
[04:18:02] Yeah, I've been debating on a nice way to say it. Anyhow, doesn't take up that much of my time.
[04:18:14] InterfaceError: (2027, 'Malformed packet', None)
[04:18:20] Coren: Give a man a fire and he'll be warm for a day; set a man on fire and he'll be warm for the rest of his life.
[04:18:21] thats a mysql thing...
[04:18:46] AMadman: http://tools.wmflabs.org/csbot/csb.php
[04:18:48] ...
[04:18:59] I understand why they call you a madman ;)
[04:19:13] http://tools.wmflabs.org/home.html = 404
[04:19:14] :]
[04:19:31] jeremyb_: Yeah, I don't actually have, like, useful content to put there yet. :-)
[04:19:49] you could just redirect to a page onwiki :)
[04:20:00] jeremyb_: Once the new tool interface is in place, that'll be an autogenerated list of tools.
[04:20:13] for now redirect :-)
[04:20:16] so what does https://dev.mysql.com/doc/refman/5.0/en/error-messages-client.html#error_cr_malformed_packet mean?
[04:20:24] anyway, how does one join tools at this point
[04:20:27] ?
[04:20:32] jeremyb_: you ask Coren very nicely ;)
[04:20:52] jeremyb_: Heh.
In fact, to join the project you only need to ask any other member.
[04:21:03] ah
[04:21:15] can you have an onwiki request queue a la shell?
[04:21:22] jeremyb_: Only thing that needs my intervention ATM is creating actual tools; and even then that's only temporary, we have a match to wikitech coming soon to self-serve.
[04:22:03] jeremyb_: I don't know if there is support for it; shells has a magic userright to work. We might be able to work it in though.
[04:22:23] jeremyb_: The obvious question now, of course, is /do/ you want access? :-)
[04:22:42] legoktm: I have /no/ idea.
[04:22:43] you mean you have your own OSM?
[04:23:10] yes please :)
[04:23:28] "OSM"?
[04:23:48] OpenStreetMap?
[04:23:54] Successfully added jeremyb to tools.
[04:24:17] openstack manager
[04:24:19] legoktm: That's the only expansion I know, but if that's what he means, I don't understand in context. :-)
[04:24:29] So, I'm just going to try running it again and hope it doesn't happen again.
[04:24:30] you have a mediawiki extension to manage tool creation?
[04:24:42] re: 31 04:21:22 < Coren> jeremyb_: Only thing that needs my intervention ATM is creating actual tools; and even then that's only temporary, we have a match to wikitech coming soon to self-serve.
[04:25:02] jeremyb_: It's going to be folded into Special:NovaProject
[04:25:27] jeremyb_: The mechanism is more general; it's for service accounts. Its use case is "like tools" :-)
[04:25:45] huh
[04:26:01] local-legobot@tools-login:~$ jsub -N itwiki -mem 1G itwiki_persondata.py
[04:26:01] Your job 210 ("itwiki") has been submitted
[04:26:01] local-legobot@tools-login:~$ qstat
[04:26:01] local-legobot@tools-login:~$ cat itwiki.err
[04:26:01] local-legobot@tools-login:~$
[04:26:10] jeremyb_: Good pointer to get you started: http://tools.wmflabs.org/?Help
[04:26:46] legoktm: that's a bad thing?
[04:27:04] jeremyb_: It means the job isn't running and spitting out any output nor error messages.
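When a job ends this quietly, one low-tech check is to run the script as a child process and look at how it ended: Python's `subprocess` reports a negative return code when the child was killed by a signal, which is what a crash with no traceback tends to look like. A sketch under assumptions (Python 3 on a POSIX system; `describe_exit` and `run_and_describe` are hypothetical helpers, not part of jsub or the grid tooling):

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_and_describe(argv):
    """Run a command, discard its output, and describe how it ended."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return describe_exit(proc.returncode)
```

For example, `run_and_describe([sys.executable, "itwiki_persondata.py"])` would distinguish a clean exit from an ordinary error status or a fatal signal.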
[04:27:21] legoktm: failed 100 : assumedly after job
[04:27:21] The code just failed quietly.
[04:28:04] No traceback? :/
[04:28:19] hrmmmm
[04:28:27] maxvmem 484.324M
[04:28:34] So it's not out of memory
[04:28:40] so, (not me, just thinking...) if you forget your shell name how do you find out what it is?
[04:28:49] Wait but if it only ran for less than a second
[04:28:53] i'm thinking it should be listed on [[special:preferences]]
[04:28:54] In your preferences, on the Openstack tab
[04:28:54] How did it use up that much memory?
[04:29:13] legoktm: Python is a monster just to start
[04:29:19] i don't see such a tab
[04:29:21] :x
[04:29:32] Gah
[04:29:37] ohhh
[04:29:45] helps, if i check the right wiki
[04:29:58] jeremyb_: But I was wrong anyways, it's on the first tab not on Openstack. :-)
[04:30:13] jeremyb_: Just above the i18n section
[04:30:18] yeah. never mind. i was just checking the wrong wiki
[04:30:47] (i didn't realize your link was to mediawikiwiki, was thinking it was wikitech)
[04:32:11] legoktm: You can try starting the job manually to see if it works at all before submitting it to the grid.
[04:32:28] oh good point
[04:32:33] yeah its working
[04:32:45] Hm. How... odd.
[04:33:13] Mind if I switch to your tool account to see if I can see what's up?
[04:33:39] Go for it
[04:34:58] Segmentation fault (core dumped)
[04:35:12] about 5 seconds in; no traceback
[04:35:21] uhhh
[04:35:27] running it normally?
[04:35:36] just $ python itwiki_persondata.py
[04:35:37] right?
[04:35:45] Yep. Complete output:
[04:35:51] starting
[04:35:51] starting imports
[04:35:52] imports ok
[04:35:52] config
[04:35:52] sites
[04:35:52] def create item starts here
[04:35:52] end def
[04:35:53] 100
[04:35:53] 19
[04:35:54] 13
[04:35:54] Segmentation fault (core dumped)
[04:36:03] hmmmmmmmm
[04:36:12] the 100 means 100 pages were loaded
[04:36:33] Python 2.7, right?
[04:36:35] yup
[04:36:42] 19 and 13 mean 2 pages were processed
[04:37:02] except the second page wasnt pushed into the mysql db
[04:37:47] #2 0x00000000004d1c3d in Py_FatalError ()
[04:37:47] #3 0x00000000004b0d6f in ?? ()
[04:37:47] #4 0x00002ad9e5ce1e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
[04:37:47] #5 0x00002ad9e6f2ccbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
[04:37:47] #6 0x0000000000000000 in ?? ()
[04:38:08] I have no clue what that would mean.
[04:38:21] Python itself crashed trying to start a thread.
[04:38:26] Oh hmm
[04:38:48] But....
[04:38:57] And then failed to output an error in Py_FatalError() (it bailed with a call to abort())
[04:38:58] At that point it shouldn't be starting a new thread.
[04:39:16] Unless oursql uses threads internally...
[04:40:28] That seems like a plausible guess.
[04:40:57] * Coren tries something.
[04:41:30] ok
[04:41:41] Something odd with that oursql build.
[04:41:55] I've never had issues with it until just now
[04:42:54] I think I see what goes on. because pip didn't see a system update.
[04:43:08] Yeah, 10:1 that's it.
[04:43:12] Give me a minute.
[04:43:15] ok :)
[04:43:37] The only downside of pip is that there is no "--update-all" feature
[04:43:46] I wrote my own hack for that though
[04:43:57] That was it. Lemme upgrade the rest of the grid.
[04:44:06] yay :D
[04:44:41] (libmysql was updated; apparently oursql had a break in its use of it that didn't track without an update)
[04:44:57] Maybe just a rebuild would have done.
[04:49:33] {{done}}
[04:50:09] Ok, should I just test it with jsub?
[04:52:07] Coren: ^
[04:52:33] You can; either will do. Depends on your confidence level really. :-)
[04:52:47] > Your job 211 ("itwiki") has been submitted
[04:53:28] ughh
[04:53:28] InterfaceError: (2027, 'Malformed packet', None)
[04:55:19] That, I have no idea what it is.
[04:55:33] But it doesn't sound like a good thing.
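For crashes like the segfault above, where the interpreter dies without a Python traceback, the `faulthandler` module can dump the Python-level stack of every thread when a fatal signal arrives. It is part of the standard library from Python 3.3 onward; on Python 2.7 it would have to come from a separately installed backport, which is an assumption about the environment. A minimal sketch (`enable_crash_tracebacks` and the log path are placeholders):

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Dump tracebacks of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL.

    Returns the open log file (or None when writing to stderr). The caller
    must keep that file object alive for as long as the handler is enabled.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling this at the top of a script (or running with `python -X faulthandler` on Python 3) would show which Python call each thread was in at the time of the crash, complementing the C-level backtrace from the core dump.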
[04:55:53] hmmm
[04:55:58] if i run it manually
[04:55:59] Segmentation fault (core dumped)
[04:56:09] so it seems jsub gets farther
[04:57:15] meh
[04:57:31] At this point, it seems like it would be easier to drop all the threads and just have it run in just one thread.
[05:02:21] thanks for all your help though Coren :)
[05:03:05] "Malformed packet" sounds like some sort of heap corruption, which might explain the randomish behaviour.
[05:03:26] But sure. I'm always on-hand for help at need.
[05:05:02] did you guys ever decide about shared primary group?
[05:05:11] (or role primary group)
[05:06:33] Change on mediawiki a page Wikimedia Labs/Tool Labs/Help was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=666804 edit summary: [-46] /* Access */ hardcoded external link -> interwiki!
[05:06:41] * jeremyb_ pats wm-bot
[05:07:45] * Jasper_Deng assumes IPv6 support is in Tool Labs' design
[05:08:03] Jasper_Deng: err?
[05:08:05] it's labs...
[05:08:12] so no ipv6!
[05:08:16] :-)
[05:08:21] jeremyb_: yeah, but labs' roadmap includes it
[05:08:31] yeah...
[05:08:38] but who knows when that will be
[05:08:45] that seems to have been kicked down the road for at least 6 months now
[05:10:42] jeremyb_: but what exactly must be done for IPv6?
[05:10:53] (Ryan said it was something w/ adding a second zone @ eqiad)
[05:12:49] i never heard that
[05:18:03] that was a while ago though
[05:18:09] (in December)
[05:20:51] jeremyb_: and ftr, I was asking Ryan about his comment @ https://bugzilla.wikimedia.org/show_bug.cgi?id=35947
[05:21:44] idk
[05:21:48] i have to sleep
[06:54:25] legoktm@bots-gs:/public/datasets/public$ cd enwiki
[06:54:26] -bash: cd: enwiki: Invalid argument
[06:54:30] petan:
[06:56:36] Coren: same thing happens on tools...
[07:00:02] legoktm: This seems to be a general problem but Ryan_Lane and andrewbogott seem to be offline
[07:00:12] ok
[08:17:32] !log wikiversity-sandbox Set up new frontend instance wikiversity-sandbox-frontend
[08:17:35] Logged the message, Master
[08:18:02] !log wikiversity-sandbox Run "aptitude full-upgrade" on wikiversity-sandbox-frontend
[08:18:04] Logged the message, Master
[08:27:34] !log wikiversity-sandbox Reboot wikiversity-sandbox-frontend to install the updates
[08:27:36] Logged the message, Master
[11:47:23] legoktm hi
[11:47:29] hey petan
[11:47:40] legoktm that is probably some gluster issue, does it work on some other instance
[11:47:55] if not then the server which is hosting that folder is having troubles
[11:48:37] hmm
[11:48:47] my script on bots-bnr1 accessed it fine i think
[11:49:03] and i think like 5 out of my 198 jobs worked
[11:50:29] hmm not any more
[12:04:06] legoktm which folder doesn't work
[12:04:30] /public/datasets/public/enwiki/
[12:04:34] aha
[12:04:37] yes it's broken
[12:04:55] you need Ryan
[12:09:39] @notify Ryan_Lane
[12:09:39] I will notify you, when I see Ryan_Lane around here
[15:36:51] petan: Yeah; that NFS seems broken but I can't seem to find a local cause.
[15:47:24] How... interesting. I almost, but not quite, can get to labstore1 to see what's up. Will have to wait for Ryan_Lane
[17:55:56] !log wikiversity-sandbox Installed Moodle
[17:55:58] Logged the message, Master
[18:19:08] petan, can you comment on https://bugzilla.wikimedia.org/show_bug.cgi?id=46586 when you get a minute please?
[18:19:20] hey
[18:19:46] hey :)
[18:52:01] legoktm did you manage to fix your prob?
[18:52:42] lemme check
[18:53:06] nope
[18:53:06] legoktm@bots-gs:~$ cd /public/datasets/public/enwiki
[18:53:06] -bash: cd: /public/datasets/public/enwiki: Invalid argument
[18:53:08] also what about the memory issues
[18:53:13] mhm
[18:53:17] it's probably still borked
[18:53:36] oh i gave up on the memory thing
[18:53:44] heh
[18:53:53] don't tell me it doesn't work on obts
[18:53:54] bots
[18:54:02] * Damianz pats petan
[18:54:42] i never tried it on bots
[19:11:24] legoktm I am pretty sure it would work there
[19:12:30] * Damianz frowns at debian
[19:13:37] petan: probably, but i already have the rest of it set up on tools, and this is just one part of it, so id rather not move everything else too
[19:14:46] ok
[19:15:00] Damianz? :P
[19:15:05] debian <3
[19:16:38] it's a pain for pxe booting into a single app, exiting then rebooting