[00:28:18] * theopolisme groans because labs is being grumpy [00:28:24] yay there we go [00:28:30] Silly login server [00:28:54] I spoke too soon :/ [00:50:40] Coren: ping? [00:50:41] -rwxr-xr-x 1 yuvipanda svn 1405 Aug 2 00:49 receiver.py [00:50:45] local-suchaserver@tools-login:~$ take receiver.py [00:50:47] receiver.py: You need to share a group with the file [00:50:50] what gives? [00:58:08] YuviPanda: Coren is dining. Shouldn't the group be local-suchaserver? Is the directory +...s? [00:58:19] it's just home [00:58:23] err [00:58:23] hmm [00:58:24] maybe [00:59:41] It is (assuming /data/project/suchaserver = drwxrwsr-x). How did you put receiver.py there? [01:02:05] scfc_de: I scp'd it to ~ for yuvipanda, then cp'd it there [01:04:23] YuviPanda: Hmmm. The latter cp should have honoured the s. [01:04:37] Care to repeat with another file? [01:04:48] yeas, give me a moment [01:05:25] scfc_de: scp'd receiver.py to ~ (yuvipanda) [01:05:32] did a mv [01:05:34] to /data/project [01:05:51] scfc_de: take still teslls me /data/project/suchaserver//receiver.py: You need to share a group with the file [01:07:18] Sure, the file's group is svn. "touch /data/project/suchaserver/test"? [01:07:25] (As yuvipanda.) [01:07:49] touch'd [01:08:24] Aha, cp and mv work differently. [01:08:29] oh? [01:08:49] ah [01:08:50] yes [01:08:51] cp works [01:09:06] Tested it with local-wikilint (touch local-wikilint/abc and touch def && mv def local-wikilint/). Well, fuck :-). [01:11:37] http://www.redhat.com/archives/k12osn/2007-February/msg00253.html suggests standard Unix behaviour? [01:15:32] hmm, alright [01:15:35] I wasn't aware [01:19:28] Neither was I :-). [01:19:46] scfc_de: saw https://wikitech.wikimedia.org/wiki/User:Yuvipanda/Dynamic_http_routing [01:19:47] ? [01:27:00] No, but I listened in on your and Ryan's conversation re that. Sounds fascinating, however for Tools I don't know how "custom" we want individual tools to be. 
If every tool (or its framework) brings its own webserver, I fear a lot of people will want to use that "just 'cause I can" :-). [01:28:12] I'm sortof happy with that, really :P [01:28:21] gevent! gunicorn! ruby! go! rust! assembly! intercal! [01:29:06] it's also a big fuck you to apache and CGI :P [01:32:12] Great minds can accomplish great things :-), however looking for example at how people want their own instances because they think they hit some resource limit, I don't know if the majority of Tools developers should be given that freedom :-). [01:32:48] with enough guidance I bet we can make them do great things :) [01:32:57] apache can be relegated to just serving PHP, in the end [01:33:17] scfc_de: I guess a lot of them still have a 'toolserver' mindset, which I *guess* didn't have as much infrastructure [01:33:34] nor as much flexibility, nor was as open to volunteer participation in the *infrastructure* (puppet? spare instances?) [01:36:27] scfc_de: if I submit a job as myself (and not a tool user) - 1. is that allowed? 2. will that just get my previlages? [01:36:43] &ping [01:36:44] Pinging all local filesystems, hold on [01:36:45] Written and deleted 4 bytes on /tmp in 00:00:00.0006270 [01:36:48] sigh [07:19:27] I don't know; I am not afraid of those who speak Puppet or just know what that is :-). My concern are more people who see "the standard Apache doesn't allow this miniscule option that I don't really need, but I WOULD LIKE TO!!!eleven!!" and then roll out their own webserver. Lots of duplicated effort and wasted resources (*maybe*). [07:19:27] YuviPanda: Yes, I'm not aware of any prohibition, 2. yes. [07:19:27] scfc_de: hmm, that is true. Considering how there's already been an attempt to... rewrite memcached and syslog and the OOM killer and... [07:19:27] Written and deleted 4 bytes on /data/project in 00:01:28.2124880 [07:19:27] :-) [07:19:27] Slow much. [07:19:27] scfc_de: but still, that's okay, perhaps? what's the worst that can happen? 
[07:19:28] they can't run apache on the grid can they? [07:19:28] YuviPanda: Why not? [07:19:28] good point. they probably will. [07:19:28] ugh. [07:19:28] well [07:19:28] they actually can't [07:19:28] since we won't have apache on the exec nodes :P [07:19:28] YuviPanda: Local install :-). [07:19:29] Worst case is that instead of concentrating development on what the *tools* actually do, people will spend more time on the scaffolding -- which IMHO is the job of the Tools project. [07:19:29] true [07:19:29] but then a lot of the 'newer' languages have moved away from the PHP model, which is what apache is suited for [07:19:29] put files somewhere, you'll run things somewhere [07:19:29] Yes, and I see Coren's pain :-). [07:19:29] Coren is probably going to give up on uwsgi tyrant, and I can understand why. Forcing the 'reverse proxy' model into the older 'files in a location' model is hard [07:19:29] CGI is a massive waste of... everything :P [07:19:29] scfc_de: what we can do is to have the power available, but let things be hidden behind wrappers. [07:19:29] scfc_de: run a python job by doing 'websub' with these parameters, for example :P [07:19:29] scfc_de: also apache doesn't support websockets, and that sucks [07:19:29] I'd prefer "we support A, B, C", but we do *that* with a "promise". Re Python & Co., I lack the experience there, so I don't know what's needed for proper setup. [07:19:29] I don't think Coren can setup uwsgi in a way that's satisfactory to him [07:19:29] (BTW, JFTR your project will be valuable for instanceproxy alone.) [07:19:29] neither can we do node, go, scala, ruby, whatever. [07:19:29] scfc_de: true. we'll save precious ipv4 addresses! [07:19:29] And have arbitrary ports! [07:19:30] that too! [07:19:31] 20000 ft view on uwsgi: All apps are persistent "daemons", central distributor that forwards request to those? 
[07:19:31] scfc_de: pretty much, yeah [07:19:31] it's also true for rack (ruby), jack (nodejs), go (built in, IIRC?), jetty (scala / java), etc [07:19:31] even fastcgi runs like that [07:19:31] and supports lots of languages [07:19:31] apache-php is the anomaly here, since php is deeply integrated into apache [07:19:31] I saw the disdain for mod_wsgi, but can't we use that? Split the requests according to tools so not every server must run all apps (we already do that, BTW)? [07:19:31] I think it's not too well maintained, and Coren didn't like it [07:19:31] for other reasons too, IIRC [07:19:31] also nobody who runs python in production uses mod_wsgi anyway. [07:19:32] And what are the problems with uwsgi? [07:19:32] Or is that the wish to have all apps run as grid jobs? [07:19:32] that's just Coren trying to re-create the SGE + routing jobs with tyrant, I believe [07:19:32] :) [07:19:32] scfc_de: indeed, since they are *persistant* deamons, having them run as grid jobs makes sense [07:19:32] (to me, at least) [07:19:32] and to Coren too, since he talked about just using fcgi + spawning them out to the grid [07:19:32] Yes, but if the design stands in the way of success ... :-) [07:19:32] heh :P [07:19:32] let's see if Coren figures out a way [07:19:32] Just because we have a grid doesn't mean we need to run everything there. [07:19:32] scfc_de: sure, and I do not know the details of what issues Coren ran into [07:19:32] I'd be happy if we got native uwsgi support, and i'd be doubly happy if we got rid of apache :) [07:19:32] (Though it would be *really* cool.) [07:19:32] scfc_de: it'll happen! Killing the apaches for anything not php might be my long term goal :) [07:19:32] At Toolserver I had to endure Zeus, and I'm not sure that was an advantage :-). [07:19:32] what was Zeus? [07:19:32] my interactiosn with the toolserver were limited to running a few queries now and then for people :) [07:19:32] *Is*. 
http://en.wikipedia.org/wiki/Zeus_Web_Server [07:19:33] ugh not open source [07:19:33] scfc_de: shouldn't happen, I think. This will be based off nginx (which we use in production), lua (which we use in production / for users too) and redis (which we use in production) [07:19:33] and also puppetized + documented + open source [07:19:33] okay, 'documented' was stretching it :P [07:19:33] but the rest of it is true! [07:19:33] I don't have strong opinions on Apache; the problem on Toolserver was the knowledge people have about Apache and about Zeus :-). If we use something that's deployed in production, I don't see any problems. [07:19:34] yeah [07:19:34] scfc_de: our previous plans was to use hipache, which did make me a bit queasy since it's not in production [07:19:34] *our* production [07:19:34] nginx+lua+redis doesn't have that problem [07:19:34] Yep. [07:19:35] scfc_de: what exactly was zeus used for, btw? [07:19:36] YuviPanda: For http? [07:19:36] scfc_de: like, where did that fit in, with apache being around too? [07:19:36] I think River was a Solaris guy, and choosing Zeus over Apache was a side product of that. Maybe it is better integrated with Solaris (well, I doubt that, Apache should be fine as well), maybe he was more familiar with it. [07:19:36] oh, so it was used *instead* of apache? [07:19:38] Yes. [07:19:39] ow that's bad [07:19:39] (for php, at least) [07:20:17] Does anyone know how I can manually deploy changes on beta labs? [07:20:17] I tried updating /usr/local/apache/common and running scap, but that gives me a segfault in parsekit.so :( [07:20:17] (I need to do this because VisualEditor is broken and stuck on July 25th) [07:20:24] RoanKattouw, I think you just make changes on deployment-bastion and they get instantly deployed via glusterfs [07:20:24] at least that's according to the second-to-last line of https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Overview#Architecture [07:20:27] RoanKattouw, I assume it's fixed now? 
[07:20:27] yes [07:20:27] I ended up manually fixing the extensions.git repo, then triggering the Jenkins job my merging an extension commit [07:20:27] Though now we're having RL caching issues [07:21:05] @notify T13|sleeps [07:21:05] This user is now online in #wikimedia-mobile. I'll let you know when they show some activity (talk, etc.) [07:21:05] wtf [07:21:05] @notify T13|sleeps [07:21:05] :/ [07:21:05] !poof [07:21:05] addhome, T13|sleeps ping [07:21:05] !ping [07:21:05] !pong [07:21:05] !poang [07:21:05] !pang [07:21:05] !pung [07:21:05] !derp [07:21:05] ???????? [07:21:05] petan, what's wrong with wm-bot [07:21:06] *POOF* "Wadda need?" *POOF* "Wadda need?" *POOF* "Wadda need?" [07:21:07] !pong [07:21:08] pang [07:21:09] pung [07:21:10] derp [07:21:11] Don't mess with me. [07:21:12] This user is now online in #wikimedia-mobile. I'll let you know when they show some activity (talk, etc.) [07:21:52] someone restarted it absolutely pointlessly in a wrong moment [07:21:53] !ping [07:21:53] !pong [07:22:05] poor thing [07:22:24] it's back [07:22:24] :O [07:22:50] Pet an pong [07:23:07] He he that's me? XD [07:23:19] Blame t13! :) [07:24:43] lol [07:24:46] what were you doing?? [07:27:14] @notify T13|sleeps [07:27:14] This user is now online in #wikimedia-tech. I'll let you know when they show some activity (talk, etc.) [07:27:32] addshore: what happened?? #wm-bot is full of shit [07:27:37] I'll let t13 tell you :0 [07:27:37] he's not in here [07:27:37] I need to get breakfast ;p [07:27:37] Oh .. [07:27:44] Btb [07:27:51] :) [07:55:20] [bz] (NEW - created by: Addshore, priority: High - enhancement) [Bug 48894] Include pagecounts dumps in datasets - https://bugzilla.wikimedia.org/show_bug.cgi?id=48894 [07:55:31] !ping [07:55:31] !pong [08:20:04] addhome, ping [08:20:40] pong [08:20:56] addhome, you sure know how to flood my inbox. :p [08:21:32] Cyberpower678 ?? [08:21:42] petan, ??
[08:21:50] you just joined #wm-bot, typed 2 non-existing commands and left o.O [08:22:02] petan, yhes. [08:22:15] :0 [08:24:02] addhome, I'm going to work on Peachy some more. I'm almost done the gut work. [08:27:42] :> [08:27:50] Cyberpower678: I want to try and catch you up ;p [08:41:29] addhome: you should migrated to tools-redis at some point [08:41:29] :) [08:44:43] YuviPanda: is it working the sames as -mc? [08:45:00] addhome: yeah, exactly the same... except it's got 7 times as much memory [08:45:02] and 4 cores :P [08:45:04] :D [08:46:06] ill do it in a second [08:46:12] once -login responds [08:48:22] addhome: <3 [08:48:26] &ping [08:48:27] Pinging all local filesystems, hold on [08:48:28] Written and deleted 4 bytes on /tmp in 00:00:00.0007550 [08:48:29] Written and deleted 4 bytes on /data/project in 00:00:00.0097340 [08:48:36] just srated working ;p [08:52:08] YuviPanda: switched [08:52:14] :D [08:52:15] works fine? [08:52:29] seems to be [08:53:04] hehe YuviPanda http://ganglia.wmflabs.org/latest/graph.php?r=20min&z=xlarge&h=tools-redis&m=load_one&s=by+name&mc=2&g=cpu_report&c=tools [08:53:06] \o/ [08:53:50] addhome: heh, looks very overloaded :P [08:54:21] haha :p [08:54:25] cheers YuviPanda ! :) [08:55:46] addhome: :D [08:57:23] addhome, I spent all morning reading my emails. 92% came from you. :p [08:58:34] :O [08:58:44] it was only about 8 commits or somethin xD [08:58:46] *something [08:58:54] grr nickname [08:59:38] Received 39 emails from Addwiki [08:59:50] addshore, ^ [09:00:13] addshore, make that 40 now. :p [09:00:19] addshore: are you coming to wikimani? [09:00:37] YuviPanda, where it is take place this time? 
[09:00:42] Hong Kong [09:15:29] YuviPanda: no worries not going to rewrite apache today [09:15:37] :D me neither [09:16:10] but there is a good point in rewriting memcache (which is pretty simple thing) in c# and making OOM killer as external service, not included in kernel and more flexible [09:16:28] it's not reinventing a stone wheel, it's changing a stone wheel into Pirrelli tires [09:17:28] well, if you think rewriting memcached is 'simple', please read https://www.varnish-cache.org/trac/wiki/ArchitectNotes [09:17:37] Redis is the tires. [09:17:58] well redis is nice, but it's still missing important things [09:18:33] like authentication... also did you check if redis is same fast as memcache? did you run any kind of benchmark before you decided to force all tool developers to move from that to redis? [09:18:57] I'm evil, don't you know? :) [09:19:22] it's possible to secure keys in redis. not possible in memcached. end of story. [09:20:04] and did you even click that link? [09:21:22] it's not possible to secure key in redis neither in standard memcached. that is end of story [09:21:33] redis has exactly same way how to secure keys [09:21:38] which is none [09:22:05] YuviPanda: yes I clicked that link. The guy who wrote that is... 
cool, people should learn from him, especially modern python programmers [09:22:26] anyway, that what he say is not entirely true, you can easily bypass swap and disk caches on linux kernel [09:22:30] :) ok [09:23:01] raw devices == no cache, (swappiness == 0 || no swap) == no problems with swapping [09:23:12] it's really a problem of architecture [09:23:14] have a look on oracle [09:23:30] it does exactly the same what squid and yet is considered the best database system on world [09:23:33] which is likely true [09:23:42] yeah, one that can be solved by writing low level tools in a language with non-deterministic GC + object overheads :) [09:24:13] I don't really know if memcache is a low level tool, but in this you are right [09:24:38] using c++ would be better, but that would mean spending 100 times more time to make it [09:25:20] indeed. [09:25:40] also having any form of required auth up front would kill most of the perf stuff anyway. [09:31:28] not necessarily, you could open a session, auth, and THEN you would do the perf stuff [09:31:40] YuviPanda: no [09:31:46] a w :( [09:31:46] but of course most of python kids prefer to open new session for every single key [09:32:01] because then the source code looks more cute even if it suck [09:32:17] hehe python kids :P [09:32:40] just saying petan. Insulting everyone who writes python code isn't bound to . [09:32:56] help make friends [09:32:59] or do anything productive [09:33:05] I am not insulting everyone, I am just insulting kids :P [09:33:12] yeah, keep doing that [09:33:17] * YuviPanda goes to do something productive instead. [09:34:19] you are taking stuff too much seriously... [09:35:20] why in the world should you open a new connection to redis 100 times if you wanted to insert 100 keys? [09:35:34] that is performance killer as well [09:44:52] anyone know where the repo / source is for http://tools.wmflabs.org/geohack/ ? [09:45:00] oh, wait... 
[10:06:46] there is twitter memcached proxy [10:06:53] Twemproxy is the name [10:07:56] you install it on the frontend server and it open a single connection to each cache server in which the requestes are multiplexed [10:08:02] i think we got it in prod [10:15:37] hashar: yeah, we run twemproxy in prod :) [10:16:50] hashar: I think we also now use Redis for sessions, not memcached. Since REdis does cross datacenter replication [10:19:24] yup sessions are in redis [10:19:33] not sure whether cross dc replication was the killer feature though [10:20:15] lack of problems with restarts + cross dc replication, I suppose? :) [10:20:43] we also use Redis for job queues, and I was talking to ori about the IRC RC changes thing and redis will probably be involved there in some form too [10:20:50] hey jarry1250! Will you be coming to wikimania? [10:22:12] one day I will reengineer our job queue system [10:22:15] it is hoooriiiibbble [10:23:53] hashar: at least we aren't using mysql as a queue :P [10:31:50] YuviPanda: well that is just the datastore [10:32:14] YuviPanda: my complain is more about the evil PHP scripts under maintenance/ and the VERY lame jobrunner shell script :] [10:32:22] hehe true truie [10:32:26] one step at a time, I guess [10:32:29] maybe next year I will attempt a proof of concept [10:32:33] nice! [10:32:39] I need to phase out jenkins first [10:39:34] hashar: phase out? [10:39:44] "get rid of" [10:39:45] :D [10:39:58] and replace it with? :D [10:40:04] the rough idea would be to have Zuul enqueue job request in some system [10:40:12] then have workers fetching the job requests and handling them [10:40:21] docker! 
[10:40:22] using http://gearman.org [10:40:25] a job server [10:40:33] (for isolation, at least0 [10:40:34] ) [10:40:38] yup [10:40:55] and then [10:41:21] rewrite MediaWiki/Wikimedia job system to send jobs requests to gearman [10:41:50] :) [11:00:25] * T13|sleeps yawns [11:05:49] andrewbogott_afk: ping [11:05:54] @notify andrewbogott_afk [11:05:54] This user is now online in #wikimedia-dev. I'll let you know when they show some activity (talk, etc.) [11:42:34] &ping [11:42:34] Pinging all local filesystems, hold on [11:42:35] Written and deleted 4 bytes on /tmp in 00:00:00.0005830 [11:43:52] Written and deleted 4 bytes on /data/project in 00:01:17.8795490 [11:50:31] addshore ping [11:50:34] can you try ssh to bots-labs [11:50:52] or Damianz [11:51:11] I disabled gluster mount for /home there [11:51:18] so I would like to figure out how it works [11:51:23] * whether it works [11:53:12] (PS1) Yuvipanda: Refactor subscriptions.py [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77291 [11:54:44] (PS1) Yuvipanda: Move i18n extensions to #mediawiki-i18n [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77292 [11:57:13] (PS2) Yuvipanda: Move i18n extensions to #mediawiki-i18n [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77292 [11:59:57] (CR) Yuvipanda: [C: 2 V: 2] Refactor subscriptions.py [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77291 (owner: Yuvipanda) [12:00:51] (CR) Yuvipanda: [C: 2 V: 2] "siebrand says okay!" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77292 (owner: Yuvipanda) [12:07:08] (PS1) Yuvipanda: Fix missing comma [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77295 [12:07:09] (CR) Yuvipanda: [C: 2 V: 2] Fix missing comma [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77295 (owner: Yuvipanda) [12:07:16] heh, my shame! 
[12:12:23] I made a number of changes to the caching behaviour of the NFS server that may mitigate the negative effects of controller stalls. Please keep an eye on things and see if the pain is less. [12:13:03] &ping [12:13:03] Pinging all local filesystems, hold on [12:13:04] Written and deleted 4 bytes on /tmp in 00:00:00.0006270 [12:13:05] Written and deleted 4 bytes on /data/project in 00:00:00.0065830 [12:13:26] (i.e., I can't do anything against the controller wedging but I think I made it so that it doesn't affect thigs as much or for quite as long in many cases) [12:14:47] Coren: did the firmware update do anthing? [12:14:50] *any [12:15:31] That requires a long downtime; we're not going to do this until we switch to the backup server. [12:15:38] oh, right [12:16:11] ([bleep bleep] firmware update requires an idle controller and booting off a special image) [12:18:05] man, they should all just come off a 'firmware app store' :P [12:18:07] * yuvipanda runs away [12:20:57] [bz] (NEW - created by: Yuvi Panda, priority: Unprioritized - normal) [Bug 52452] Have public, readonly, up-to-date git repositories available for all tools to use - https://bugzilla.wikimedia.org/show_bug.cgi?id=52452 [12:32:21] petan: [12:32:24] pong [12:32:31] yes [12:32:35] can you try ssh to bots-labs [12:32:49] I disabled gluster there and want to check if keys still work [12:33:03] I also had to umount /public [12:37:59] addshore? [12:38:13] *tries now* [12:38:33] Coren: is one of side effect of your tweaks that it takes horribly slow now for files to sync between servers? [12:38:49] "sync" [12:39:01] Coren: my tool is writting to a file and I see the changes on other server after a minute [12:39:14] petan: That's... not how NFS works. [12:39:20] ? 
[12:39:22] petan: I can get in fine [12:39:27] addshore: cool [12:39:46] naturally possible dns spoofing and remote host id changing :p [12:39:47] (PS1) Yuvipanda: [WIP] Store canonical info about subscriptions in mysql [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77300 [12:39:50] but *ignores* [12:39:54] petan: The buffering happens in your tool, NFS does not, /cannot/ have lag of the sort you describe. [12:40:17] Coren: nope, I did "execute some tool >> logfile" [12:40:22] it produces output in realtime [12:40:25] http://ganglia.wmflabs.org/latest/?c=tools [12:40:30] but logfile is being appended with horrid lag [12:40:32] what the .... happened an hour ago?? [12:41:18] petan: 99.99% of programs's standard output is buffered unless you took great care to not buffer it. [12:41:35] programs'* are* [12:41:51] ok but that is /bin/dash what is buffering it then [12:41:58] meh [12:42:04] why it happens only on tools project? [12:42:05] libc. :-) [12:42:15] I have done a same thing on bots-labs with wm-bot and it works fine [12:42:24] I can tail -f the file and I see it real time [12:42:33] but on tools project it takes a minute to refresh [12:43:00] don't tell me it's because tools project has different libc [12:43:32] petan: The events you describe are not consistent with the way filesystems work, nor NFS. Compare with a tail on the same host where the output is taking place. [12:44:15] yes I am talking about this, I am doing in on localhost on bots-project but not on tools of course [12:44:23] regular users aren't able to ssh to to exec nodes to do that [12:44:35] I see similar behavior on all my logs, I am pretty sure that's just buffering from the logging framework I use [12:44:41] you don't want an fsync after every write [12:44:46] I do [12:44:53] :P [12:45:00] yuvipanda: Well, you might want it in some situations. 
[12:45:07] Coren: sure, but not by default [12:45:20] yuvipanda I don't really care about fsync I want my tool behave same on tools project as it is on bots-project [12:45:23] for *logging* [12:45:49] I am trying to figure out how to accomplish this on tools [12:46:05] it was quite easy on bots... [12:46:09] petan: If you have different behaviour, then you have different circumstances. [12:46:14] ? [12:46:44] only different circumstance is that the project changed... [12:47:02] well, on bots I was using local filesystem not a nfs [12:47:07] hehe [12:47:15] that's another circumstance that changed [12:47:22] and quite significantly worsen stuff [12:47:39] petan: POSIX filesystem semantics have very well defined semantics, if you rely on unspecified behaviour, then you get unspecified results. [12:48:29] petan: That means your code didn't do synchronous I/O. If it doesn't do synchronous I/O, you can't expect its behaviour to be time-invarient in the absence of barriers and proper synchronization (which tail, obviously, never does) [12:48:47] Coren: I sent you pretty much whole my code [12:48:53] I am talking about redirecting output using > [12:48:55] in shell [12:49:07] I have no idea what source code of bourne shell is doing [12:49:18] petan: That has nothing with redirection (which only switches fds around) and everything to do with the program that outputs. [12:49:29] program outputs to terminal [12:49:32] it writes to console [12:49:38] I redirected that text from console to a file [12:49:41] petan: That means nothing. [12:49:50] petan: *HOW* does it write? [12:50:08] (PS2) Yuvipanda: [WIP] Store canonical info about subscriptions in mysql [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77300 [12:50:12] if I don't redirect it to a file, it writes, say 10 lines per seconds, if I redirect it to a file, it does the same, but file refreshes every minute [12:50:17] petan: Has the output been opened O_SYNC? Do you fflush()? 
[12:50:38] it's just a cout << "blabla"; [12:50:46] petan: libc buffers differently depending on whether the output is a terminal (line-buffering by default) or a file (block-buffering) [12:51:20] petan: Right. If you're using stdlib then you can expect the same behaviour. [12:51:23] Coren: whatever, imagine this script: [12:51:37] while [ 0 ]; do [12:51:42] echo "blabla" >> file [12:51:44] done [12:51:54] (PS3) Yuvipanda: [WIP] Store canonical info about subscriptions in mysql [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77300 [12:52:01] that would indefinitely append blabla to a file [12:52:13] if I didn't redirect it I would see it real time [12:52:17] petan: Your comparision is meaningless. the process 'echo', regardless of what it outputs, would have flushed at its end. [12:52:23] if I redirect it on bots project I see it real time as well [12:52:35] if I redirect it on tools project I see it after a minute [12:52:49] petan: Not the script you have just shown, no. [12:52:55] ok let me try it [12:53:18] (PS4) Yuvipanda: [WIP] Store canonical info about subscriptions in mysql [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77300 [12:54:18] (PS5) Yuvipanda: [WIP] Store canonical info about subscriptions in mysql [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77300 [12:54:44] petan: Okay, lemme try to explain this to you again. Your program buffers. Unless you take care to /very specifically and explicitly/ to synchronous I/O with write barriers, you cannot expect any specific timing on when its output reaches the filesystem. If you just use stdlib for output, you'll need some black magic to control that. Just 'cout << "foo"' has undefined behaviour from outside [12:55:27] your process. In other words, the program is *explicitly* allowed to write the file all at once at the end without any intervening output going to the file. 
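The buffering effect Coren describes can be reproduced in a few lines. What follows is only a minimal sketch, written in Python rather than the C++ and shell under discussion, so it exercises Python's own userspace buffer and not glibc's. The file name `demo.log` and the helpers `visible_size` and `demo` are made up for illustration. The principle is the same: data written to a regular file sits in the writing process's buffer until it is flushed, so a second reader such as `tail -f` sees nothing in the meantime.

```python
import os
import tempfile


def visible_size(path):
    """Return the number of bytes any other process would currently see in the file."""
    return os.stat(path).st_size


def demo():
    """Write one short line to a file and record the file size before and after flushing."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.log")   # hypothetical log file
        with open(path, "w") as log:           # regular file: block-buffered by default
            log.write("blabla\n")
            before_flush = visible_size(path)  # the line is still in the userspace buffer
            log.flush()                        # hand the buffered bytes to the kernel
            after_flush = visible_size(path)
            os.fsync(log.fileno())             # additionally ask the kernel to commit to storage
    return before_flush, after_flush


if __name__ == "__main__":
    before, after = demo()
    print("bytes visible before flush:", before)
    print("bytes visible after flush:", after)
```

The size before the flush is zero even though the write call has already returned, which is the same situation as a redirected `cout << "blabla";` that never flushes. In C or C++ the corresponding explicit flushes are `fflush(stdout)` or `std::cout << std::flush`.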
[12:56:21] And, in fact, I *know* that stdlib (which uses glibc) has explicitly *different* buffering behaviour depending on whether the output fd is a file, a socket or a tty. [12:56:41] mhmm [12:57:07] Unless you fflush() /and/ your file was opened with the O_SYNC flag, your program has no control over when what you write to the file hits the disk. [13:12:24] (PS1) Hashar: Jenkins job validation (DO NOT SUBMIT) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77307 [13:12:28] (CR) jenkins-bot: [V: -1] Jenkins job validation (DO NOT SUBMIT) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77307 (owner: Hashar) [13:13:03] hashar: You know you can git review -D which will make not submittable (draft), right? [13:13:21] yeah should probably do that [13:13:31] though I am not sure jenkins reacts on drafts [13:13:49] Coren: does labs not have the daily wikidata dumps? :O [13:15:15] addshore: I thought it did. /public/datasets/public/wikidatawiki/ seems only ~2 weeks though. [13:15:24] :< [13:15:38] dailys would be awesome :> [13:15:57] addshore: I don't think that's under labs control, we're just making a copy of the public dumps server. [13:16:09] hmmm [13:16:17] *goes to find what directory they would be in [13:19:34] Hm. Interestingly enough, proper support of fastcgi implies reasonably good support of wsgi as a side effect; you can simply have your fcgi be an invokation of uwsgi and it'll Just Work™ [13:21:54] (PS1) Hashar: make sure we use default jshint parameters [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77309 [13:22:35] (CR) Hashar: [C: 1] make sure we use default jshint parameters [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77309 (owner: Hashar) [13:22:49] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77307 (owner: Hashar) [13:27:18] lol there is no working library for redis... 
wtf [13:27:45] this is probably a time where I have to reinvent the wheel, because all of wheels are broken [14:01:31] I hate intermittent problems because right now it /looks/ like just changing cache flush behaviour has had more effect than supposed. [14:04:21] but you can never [14:04:22] be sure [14:07:12] (PS1) Nemo bis: Send all MediaWiki updates to #mediawiki-feed too [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77318 [14:10:02] (CR) Nemo bis: "Added the channel +f/+F in case they have objections or they can directly flag it as needed (gerrit-wm was voiced there)." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77318 (owner: Nemo bis) [14:15:23] (PS2) Yuvipanda: make sure we use default jshint parameters [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77309 (owner: Hashar) [14:15:32] (CR) Yuvipanda: [C: 2 V: 2] make sure we use default jshint parameters [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77309 (owner: Hashar) [14:29:29] (PS2) Yuvipanda: Send all MediaWiki updates to #mediawiki-feed too [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77318 (owner: Nemo bis) [14:29:56] (CR) Yuvipanda: [C: 2 V: 2] "Blame Nemo if people in #mediawiki-feed complain" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77318 (owner: Nemo bis) [14:39:45] (PS6) Yuvipanda: Store canonical info about subscriptions in mysql [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77300 [14:39:47] (Restored) Hashar: Jenkins job validation (DO NOT SUBMIT) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77307 (owner: Hashar) [14:39:49] (PS2) Hashar: Jenkins job validation (DO NOT SUBMIT) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77307 [14:40:16] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77307 (owner: Hashar) [14:48:50] (CR) Yuvipanda: [C: 2 V: 2] "Works" 
[labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77300 (owner: Yuvipanda) [16:09:01] Coren: problem with NFS again? [16:09:32] Looks like. Should be back to health in 90 secs or so. [16:10:28] * AzaToth waits for cd tmp/ to retirn [16:11:23] It's annoyingly intermitent at that. Sometimes, no stalls for 2-3 hours, sometimes even 5 consecutive hours. Then it burps a few times in the same hour. [16:11:26] * Coren grumbles. [16:11:44] who to blame?= [16:13:38] I really really hope it's a hardware issue because if it's a regression in the driver there is no clear fix. [16:14:11] Well, there is, but it means loosing snapshots which would suck very much. [16:19:20] [bz] (NEW - created by: Tyler Romeo, priority: Normal - enhancement) [Bug 52354] Run Minion testing instance for security testing - https://bugzilla.wikimedia.org/show_bug.cgi?id=52354 [16:37:24] Coren: Switching all projects to NFS is on hold until the issues are sorted out? [16:40:57] scfc_de: Yep. [16:41:03] I don't get it: http://ganglia.wikimedia.org/latest/graph.php?r=4hr&z=xlarge&h=labstore3.pmtpa.wmnet&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Labs+NFS+cluster+pmtpa [16:44:54] Coren: labstore[1-4] are boxes that are (all?) connected to the same detached disk hardware? [16:45:24] scfc_de: No, 1 and 2 each have a shelf; 3 and 4 share two shelves. [16:47:23] So you have the option to switch between 3 and 4 to rule out issues in the boxes, but you can't catch anything "below" that? [16:51:50] scfc_de: Almost. There are also identical servers in eqiad with their own shelves we can try, but simulating the load will be trickier. [16:54:19] scfc_de: Another possibility is to eschew the shelves entirely and use the server's drives (they have their own enclosure and controlle) [16:54:19] But in the latter case we'd be losing half of the redundancy. 
[16:54:20] Another possibility, given that we have other servers with teh same hardware that have no issues, is that this may be a regression in the driver; rolling back to a 3.2 kernel is a possibility but then we loose snapshots. [16:54:27] None of the options are very much fun, and all of them require some downtime. [16:54:42] Coren: But if it is a problem triggered when the cache becomes saturated, shouldn't that test be "easy" compared to some random access pattern? [16:55:45] (Provided I understand the problem correctly: From time to time, the device stalls as it apparently writes the cache to disk.) [16:56:15] Though you said you switched to write-through, which would make that unlikely. [16:57:06] Hmmm. [16:57:15] coren: Hi. I wanted to push the updated tools home page copy back to the git repository. How should I do that? push or pull request--I don't currently have write access, as far as I know. [16:57:35] kma500: You have a commit in your local clone? [16:57:41] yes [16:57:43] kma500: gerrit is SO MUCH FUN! :-) [16:57:47] kma500: use 'git review' [16:58:01] That's a sorta pull-request that puts the commit up for review. [16:58:11] ah. thanks! I'll try that. [17:04:48] Hey Coren, do you know where would I go to request a new labs project? [17:05:37] csteipp: A project? https://wikitech.wikimedia.org/wiki/Special:FormEdit/New_Project_Request [17:05:43] csteipp: What for, though? [17:06:13] Ah, cool. You know, exploiting wikis ;) [17:07:08] kma500: Did you have any luck with that git review? [17:07:31] not yet. [17:07:55] I'm getting a message that I have to manually create a remote named gerrit and try again [17:08:15] kma500: do 'git review -s' first [17:08:18] then do 'git review' again? [17:08:27] Do a 'git review -s' since its the ... [17:08:30] Ninja'd [17:11:10] scfc_de: Will you be at WM? I'm pretty sure I asked, but I don't remember your answer. [17:11:33] Coren: I don't think you asked, but no, I will be not :-). 
[17:12:05] Thanks coren and YuviPanda_zz. I tried that, but get the same message. [17:12:45] kma500: Perhaps there is no .gitreview (I'll have to fix that). I'll PM you [17:12:54] okay. Thanks! [17:20:24] Actually... Coren, is it possible for labs projects to talk to each other? I was trying to find a good way to run some security scanners against beta, but seems like there's no routing between them.. [17:30:20] (PS1) Yuvipanda: Revert "Send all MediaWiki updates to #mediawiki-feed too" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77353 [17:30:34] (CR) Yuvipanda: [C: 2 V: 2] Revert "Send all MediaWiki updates to #mediawiki-feed too" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/77353 (owner: Yuvipanda) [17:40:16] csteipp: Not by default, you need to adjust the security groups of both projects. [17:42:07] Cool. Thanks! [17:46:06] Hi everybody... Maybe/hopefully someone can help me. I try to access the enwiki database on tools-login.wmflabs.org. It's working fine, except that I cannot access the category table. [17:46:26] The error: ERROR 1356 (HY000): View 'enwiki_p.category' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them [17:47:02] (PS1) Yuvipanda: Use print function rather than the print statement [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77356 [17:47:06] btw, I hope IRC is the right place for this kind of question :). [17:47:16] nette: it is! Coren or scfc_de might be able to help [17:47:25] (CR) Yuvipanda: [C: 2 V: 2] Use print function rather than the print statement [labs/tools/gerrit-to-redis] - https://gerrit.wikimedia.org/r/77356 (owner: Yuvipanda) [17:47:27] nette: Only Coren can help you there :-). [17:48:01] That's a lie! A lie! [17:48:06] hahaha [17:48:32] Hm. category? Lemme see. [17:48:52] yes, it was working yesterday, and then suddenly not anymore [17:49:08] ... that's even odder. 
[17:49:10] some permission issues I guess [17:49:15] * YuviPanda blames glusterfs [17:49:38] * Coren idly wonders if there was a schema change he wasn't aware of. [17:53:49] Aha. They have. [17:53:57] Removed a column, have they! [17:54:07] Tsk, tsk. [17:54:19] bastards!! :) [17:55:15] * Coren makes a fix. [18:14:23] hey Ryan_Lane! proxy.wmflabs.org is the project proxy proxy project instance, and it doesn't have 80 open outwards. is that because I didn't add the 'proxy' service group? [18:14:50] yes [18:15:00] so I need to delete and recreate it again now [18:15:04] yep [18:15:07] stupid openstack [18:15:09] and ec2 [18:15:21] that behavior is simply just copied from ec2's api. so fucking stupid [18:15:25] Ryan_Lane: The NFS server is teh pains. Think we can set a half-day aside at WM to do some experiments to fix it? [18:15:31] Coren: yes [18:15:32] please [18:15:38] because I really want to move off of gluster [18:16:04] I'm going to spend today reducing disk space usage and responding to tweets and emails about ssl [18:16:08] Me no like limping NFS. It /works/, but it's the annoyance from hellz. [18:16:31] nette: category should now be working again (minus the disapeared column) [18:18:08] Ryan_Lane: The part that bugs me is: nobody else (that I can google) has this problem. So it has to be something /here/, or with our setup. [18:18:32] kernel bug maybe? [18:18:47] YuviPanda, can you catch me up about what you're doing proxywise? I might have opinions. [18:19:06] andrewbogott: https://wikitech.wikimedia.org/wiki/User:Yuvipanda/Dynamic%20http%20routing [18:19:13] Ryan_Lane: But then, it'd apply to 3.5 /and/ 3.8 and nobody else would have noticed despite Dell hardware and HX00 SAS raids being, like, in gazillion machine rooms all over? 
[18:19:34] andrewbogott: so you can just have one public IP for the proxy machine, and have it dynamically route to whichever host on whichever port [18:19:34] (That said, the *go boom ever other week bug* is definitely gone) [18:20:09] YuviPanda, how does that differ from how instance-proxy works now? [18:20:18] andrewbogott: it'll be configurable from wikitech [18:20:23] andrewbogott: you can put any domain on it. [18:20:30] so you can add x.wmflabs.org and it'll route properly [18:20:45] Ah, that seems good. [18:20:45] and we could add ssl to it [18:21:03] then we can yank most public IP addresses from projects :) [18:21:15] All I have to add is -- I puppetized and set up the pmtpa-proxy instance and it doesn't work and I'm not sure why. [18:21:30] But there might be some puppet code you can reuse there, if you aren't already doing so [18:21:30] this is going to be nginx+lua+redis [18:21:43] andrewbogott: YuviPanda is going to work on this at wikimania [18:21:46] yeah [18:21:51] cool. [18:21:57] andrewbogott: I'd like to stick an API in front with keystone auth [18:22:08] but alas I'll be stuck doing other things [18:22:40] we don't need an API for now, I guess. because wikitech [18:24:03] Cohen: Yes, I can access it!!! Thanks! Of course, another problem: The missing column is 'cat_hidden' - and I need this column :( [18:24:21] ok, this sounds great, and is much more ambitious than what I was imagining :) I don't have very specific wikimania plans, might be able to work on it a bit as well. [18:25:05] cool :) [18:25:20] andrewbogott: thankfully a lot of the work is already done by hipache [18:25:33] we just need to write config into redis [18:25:45] nette: From http://www.mediawiki.org/wiki/Manual:Category_table: "cat_hidden: Was reserved for future use; apparently no one found a use for it because it was removed in v1.20." [18:25:50] nette: Sure you need it? 
[18:26:10] we can do it directly in wikitech, or we can have an api that sits in front of it that's called from wikitech and writes to redis [18:26:20] I like the latter, but it's more upfront work [18:26:24] andrewbogott: Ryan_Lane owncloud (who wrote hipache) have moved on to https://github.com/samalba/hipache-nginx which is what I'm basing my work on [18:26:36] scfc_de: oh, I thought it refers to categories for administrational work [18:28:30] scfc_de: checked and yes, you are right, all values are 0 [18:28:52] nette: Right, I hope you really didn't need cat_hidden because (a) it was all 0 and (b) it no longer exists to be replicated anyways. :-) [18:29:17] brb [18:29:39] Ryan_Lane: btw, kartik mistry (nginx maintainer on debian/ubuntu) is also coming to wikimania :) [18:29:47] ah, cool [18:30:05] It was actually removed from the schema. Wow. I thought we _never_ removed stuff from the schema. enwiki has /whole tables/ that haven't been used since 2005, and some that were needed for extensions that were never used. :-) [18:32:31] :) ok, then I have another question: It is possible to identify the categories that are used for admin. work (e.g. http://en.wikipedia.org/wiki/Category:Disambiguation_categories) [18:32:58] Can I identify them over these category: http://en.wikipedia.org/wiki/Category:Wikipedia_categories [18:32:59] ? [18:36:38] nette: I don't think those categories have been classified that precisely. [[Category:Tracking Categories]] is probably closest to what you expect. [18:37:05] There's also http://en.wikipedia.org/wiki/Category:Hidden_categories which *sounds* similar to cat_hidden :-). [18:37:11] Also there is [[Category:Hidden Categories]] which might be what you expected cat_hidden to have been? [18:38:28] nette: But if you are going to be looking at category intersection and stuff, you *definitely* want to take a look at https://wikitech.wikimedia.org/wiki/Nova_Resource:Catgraph rather than poke the DB. 
[18:42:17] Coren: Yes, I am looking for category intersection. Catgraph contains the admin. categories as well? [18:43:28] nette: I think catgraph does ALL THE CATEGORIES!!! [18:44:41] Coren: ok, I will have a look! thanks a lot. I know already what is and what is not possible :). wikimedia labs - great work!!! [18:54:11] anomie: we provisioned a newer, bigger redis box (tools-redis), I remember you were migrating some of your tools to use redis. just ar eminder :) [18:54:33] YuviPanda|away: Hopefully I'll have time to finish that migrating this weekend. [18:54:38] sweet [18:54:46] use tools-redis, not tools-mc then :) I have updated the docs too [18:57:42] YuviPanda: BTW, is toolsbeta-mc still in/of use? [18:57:49] scfc_de: kill it! [18:59:08] YuviPanda: On toolsbeta, you can, too! :-) [19:13:44] someone is having problems with pywikipedia complaining about missing beaufifulsoup? [19:21:36] btw, Coren, it seems that the http server is not working as expected (long wait time and then page not available) [19:21:56] (in the mean of timeout error) [19:22:22] fale: Side effect of the occasional NFS stalls. Fixes itself within 2 minutes. We're going to sit down and bang on this during DevCamp. [19:22:46] Coren: cool :) [19:24:25] Coren: if for 2 minutes you mean 120 seconds... it did not happened [19:24:47] Holy! [19:24:52] No, that's not the usual. [19:25:14] Ryan_Lane: http://ganglia.wikimedia.org/latest/graph.php?r=1hr&z=xlarge&h=labstore3.pmtpa.wmnet&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Labs+NFS+cluster+pmtpa [19:25:16] Ryan_Lane: ! [19:25:26] * Ryan_Lane sighs [19:25:30] god damn it [19:25:44] grrr netsplits [19:25:52] Ryan_Lane: god doesn't mind really [19:26:03] heh [19:26:27] is NFS dead again? [19:26:40] does that explain why me trying to ssh into proxy-project-project-proxy is stalled at debug1: Entering interactive session. [19:26:41] ? [19:26:48] it did go past Authenticated to proxy-project.pmtpa.wmflabs (via proxy). 
but [19:27:14] Oh, FFS [19:27:20] YuviSplit: seems worst [19:27:53] Restarted nfs service, see if that will fix. [19:28:14] Ryan_Lane: Yeay! 14 day bug! [19:28:23] -_- [19:28:24] (After 15 though) [19:28:26] this is killing me [19:28:31] and I'm sure you :) [19:28:50] Ryan_Lane: I reboot now, and switch no no snapshots. [19:28:54] (3.2 kernel) [19:28:56] during WM [19:29:06] * Ryan_Lane nods [19:29:10] sounds like a good plan [19:29:43] Coren: why did system incresed so much? [19:29:55] fale: The block layer is dead. [19:30:08] fale: And the driver just spins in place. [19:30:28] man, I shouldn't have added nfs to proxy project project proxy [19:30:29] grr [19:30:46] YuviSplit: The server will be back up in a few minutes. [19:30:57] And it usually works great for several days after a reboot. [19:31:10] ok [19:31:11] Coren: thanks :) [19:31:40] Coren: but in this case, I don't really need NFS, so I guess I can just not include it [19:31:43] at least while testing [19:32:03] so, I was thinking about per-project autofs the other day..... [19:32:18] YuviSplit: That just makes you victimized by gluster instead. :-) [19:32:27] YuviSplit: pm? :) [19:32:34] ^^ and that's why I was thinking about per-project autofs :) [19:32:47] so that there's the option of choosing which filesystem you use [19:32:57] addshore: sure [19:33:15] Coren: really? Can't I just have a standalone instance with just local disk? [19:33:24] is possible that there are no reliable network fs out there? [19:34:15] fale: in this case it isn't the filesystem [19:34:30] fale: it's either a kernel bug, or a raid controller issue [19:34:30] Ryan_Lane: what is it? [19:34:35] I see [19:34:38] YuviSplit: But that works only for one-instance projects? Also, don't the ssh keys come from NFS? [19:35:03] NFS is back. 
[19:35:04] we'll likely be upgrading the controller's firmware soonish [19:35:18] and we'll be reverting to a more stable kernel [19:35:27] scfc_de: SSH keys are on a read only nfs shared, yes [19:35:31] At the cost of a feature that was really nice. [19:35:52] fale: NFS is very reliable. Right now, the /server/ isn't. [19:35:54] scfc_de: i was able to ssh in even before I applied the labs role [19:36:05] Coren: I see :) [19:36:08] Reminds me to upgrade my laptop to 3.10 [19:36:47] Damianz: But that RO NFS is on a different server? [19:36:56] gluster afaik [19:37:06] Unless it moved once nfs came alive [19:37:39] yeah, it's on gluster [19:37:43] gluster can also export NFS [19:37:56] Which is actually a really nice feature [19:38:05] Shame it doesn't support the latest features of nfs though iirc [19:38:06] Ryan_Lane: Is it gonna take a while for a new role to appear on wikitech? [19:38:07] it can't export subvolumes, though [19:38:22] YuviSplit: it doesn't appear there unless you add it [19:38:30] YuviSplit: go to "Manage puppet groups" [19:38:37] Ryan_Lane: Other possible method: convince Ken to use the netapp. :-) [19:38:38] and add it to your project [19:38:53] Coren: I think that boat has sailed [19:38:58] I know, I know. :-) [19:39:00] it's being used for fundraising and backups [19:39:15] Backups? rofl [19:39:27] remember how much I complained about rolling our own NFS server? :) [19:39:33] Ryan_Lane: Methinks this needs to be our #1 job while at devcamp though. [19:39:35] I hate to say I called it [19:39:57] Ryan_Lane: In all fairness, were it not for that hardware/driver issue, it does work sweetly. [19:39:57] I'd still like to see what ceph does with a load of ssds... 
probably over the budget though [19:40:05] Coren: yeah, totally does [19:40:28] http://solidfire.com/ < Could totally buy one of these, openstack has a block driver :D [19:40:40] heh [19:40:46] we need a shared fs [19:40:47] not a block one [19:41:13] and we try not to buy proprietary things [19:41:21] though I'm really considering another netapp purchase now [19:41:28] back in a bit. lunch [19:41:43] netapp is pretty sweet tbf, was impressed with their dedup stuff at a recent demo [19:41:49] enjoy lunching [19:42:00] thanks [19:43:02] (PS1) Kmenger: Modified copy at top of web page. [labs/toollabs] - https://gerrit.wikimedia.org/r/77373 [19:43:17] kma500: Hurray! :-) [19:43:38] Thanks! [19:43:39] woo [19:45:33] coren: have you had a chance to look at the draft (https://www.mediawiki.org/wiki/User:Kmenger/ToolLabsGuide) [19:46:03] kma500: No, but I'll take it offline to read on the plane and annotate. [19:46:31] Great. Thanks! [19:47:23] (CR) coren: [C: 2] "It haz a pretty!" [labs/toollabs] - https://gerrit.wikimedia.org/r/77373 (owner: Kmenger) [19:47:35] (CR) coren: [V: 2] "It haz a pretty!" [labs/toollabs] - https://gerrit.wikimedia.org/r/77373 (owner: Kmenger) [19:53:03] kma500: Pulled. [19:55:25] Ryan_Lane: As usual, and much to my annoyance, simply having rebooted the [beep] server made everything rosy and happy and skipping in the sunny fields of elysium. [19:55:39] Until the gremlins invade again, that is. [19:58:02] &ping [19:58:03] Pinging all local filesystems, hold on [19:58:04] Written and deleted 4 bytes on /tmp in 00:00:00.0005790 [19:58:09] nfs spike! [19:58:11] agaaaaain [19:58:19] Oh joy. [19:58:23] They didn't wait long. [19:58:38] That, OTOH, is a "normal" stall and is about to resolve itself if I trust the graph. 
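[editor's note] The "&ping" replies in this log ("Written and deleted 4 bytes on /tmp in ...") suggest a probe that times a tiny write-and-delete on each filesystem, which is why the reply for /data/project only arrives once an NFS stall clears. A minimal Python sketch of that idea follows. It is not wm-bot's actual code: the function name `probe_filesystem`, the payload, and the use of fsync are assumptions made purely for illustration.

```python
import os
import tempfile
import time


def probe_filesystem(directory, payload=b"ping"):
    """Write `payload` to a temporary file in `directory`, delete the file,
    and return (bytes_written, elapsed_seconds).

    Hypothetical helper: on a stalled NFS mount this call simply blocks
    until the server responds again, so the elapsed time reflects the stall.
    """
    start = time.monotonic()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        written = os.write(fd, payload)
        os.fsync(fd)  # assumption: force the write out instead of leaving it cached
    finally:
        os.close(fd)
        os.unlink(path)
    return written, time.monotonic() - start


if __name__ == "__main__":
    directory = tempfile.gettempdir()
    written, elapsed = probe_filesystem(directory)
    print(f"Written and deleted {written} bytes on {directory} in {elapsed:.7f}s")
```

Pointing the same call at an NFS-backed path would give the slow case seen in the log; the directory to probe is up to the caller.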
[19:59:03] I'm guessing, ~30s [19:59:10] wm-bota should respond back [19:59:14] when it could write itself [19:59:18] and so should my console [19:59:22] Written and deleted 4 bytes on /data/project in 00:01:18.9181590 [20:00:08] There it is. [20:01:07] It's actually easy to see the stalls on the graph, it's the flat area where system drops to 0% (no red at the bottom) [20:02:29] Coren: Different topic: On tools*beta*-puppettestbed "puppetd -tv", I get "WARNING: The following packages cannot be authenticated! mariadb-common libmysqlclient18 libmariadbclient18 libmariadbclient-dev E: There are problems and -y was used without --force-yes". Does that ring any bell? I've looked at /etc/apt/sources.list.d and they look similar. /data/project/.system/deb doesn't have any packages with that name. [20:03:17] I. e. has Tools any not-yet-puppetized changes to package sources? [20:04:36] No, but I've seen things like that when they switch their keys around. Installing it --force-yes manually should make it work indefinitely, and I'll look into importing those in our own repo anyways. [20:05:07] Okay, will install manually, thanks. [20:19:57] 10.4.0.1 is our internal DNS server? [20:19:58] * yuvipanda checks [20:21:23] Coren: Snapshots are still enabled until further notice? [20:22:12] scfc_de: Yes. [20:34:32] Ryan_Lane: can't really setup a wildcard address in labs now, can I? [20:34:33] * yuvipanda checks [20:37:20] Ryan_Lane: I can't add *.proxy [20:37:25] but *.instance-proxy exists [20:37:26] hooow [20:37:36] one sec [20:37:44] the dns code isn't great [20:38:24] Ryan_Lane: and looks like we can do this without needing PPAs or backports \o/ [20:38:30] Ryan_Lane: it'll be abit slower, because no LuaJIT [20:38:32] but that's okay [20:38:36] ah, cool [20:43:39] Ryan_Lane: dns wildcards? [20:43:51] sorry was distracted by ssl stuff [20:43:57] :D [20:44:45] yuvipanda you have to add your domain on the manage dns domains page... 
[20:44:53] yep [20:44:56] that's what I was about to do :) [20:44:56] andrewbogott: yeah, but it doesn't let me add *.proxy [20:45:04] andrewbogott: says I have to 'start with a-z' [20:45:21] yuvipanda: why do you want *.proxy.wmflabs.org anyway? [20:45:29] Bad resource name provided. Resource names start with a-z, and can only contain a-z, 0-9, and - characters. [20:45:33] Ryan_Lane: testing? [20:45:37] ah, right [20:45:39] one sec [20:45:51] Ryan_Lane: I want to replicate the current instanceproxy setup, and then move on to redis based stuff [20:45:57] I bet I can replicate instanceproxy setup by tonight :D [20:46:00] you add the .blah.blahblah part on the dns domains page, and then assign the instance to be *. on the manage addresses page. [20:46:02] iirc [20:46:10] yuvipanda: please delete proxy.wmflabs.org [20:46:30] okay, deleted [20:46:31] now? [20:46:49] yuvipanda: now add * to the proxy.wmflabs.org domain [20:47:24] Ryan_Lane: in 'add host name', under Special:NovaAddress? [20:47:30] still getting Bad resource name provided. Resource names start with a-z, and can only contain a-z, 0-9, and - characters. [20:47:32] yep [20:47:40] just * [20:47:51] ohh [20:47:55] Host name '*' [20:47:59] DNS domain: proxy [20:48:02] yep [20:48:02] is that what you mean? [20:48:33] well, that said 'success' but I see no changes in NovaAddress [20:48:59] oh [20:48:59] no [20:49:00] Failed to add * entry for IP address 208.80.153.190. [20:49:02] is what I get [20:49:04] with that [20:49:07] andrewbogott: Ryan_Lane ^ [20:49:21] associatedDomain: *.proxy.wmflabs.org [20:49:21] dc: * [20:49:21] aRecord: 208.80.153.190 [20:49:56] hmm [20:49:58] so it is there [20:50:01] but just not showing up [20:50:02] in the interface [20:50:06] ? 
[20:50:07] not sure why it doesn't show in the interfacd [20:50:16] as I said, the DNS code sucks :) [20:50:24] I haven't touched it since I wrote it, for the most part [20:51:28] :P [21:07:18] Ryan_Lane: andrewbogott ttp://tools-webproxy.proxy.wmflabs.org/ [21:07:19] :D [21:07:20] err [21:07:21] http://tools-webproxy.proxy.wmflabs.org/ [21:08:11] now to add ports to that [21:08:15] and then I can puppetize that! [21:08:50] Ryan_Lane: I hope this polish is ok re the Labs monthly update https://www.mediawiki.org/w/index.php?title=Wikimedia_engineering_report%2F2013%2FJuly&diff=755506&oldid=755349 [21:10:47] yep [21:10:50] Ryan_Lane: should http://tools-webproxy.proxy.wmflabs.org/?status work for other instances as well? [21:10:53] I was hoping someone would clean it up :) [21:10:58] I can't think of any of the top of my head [21:11:00] *off [21:11:02] I wrote it last night at like midmight [21:11:05] midnight* [21:11:25] yuvipanda: what do you mean? [21:12:05] Ryan_Lane: security groups, basically. project proxy proxy project can access those only if they are allowed by security group policy, right? [21:12:20] I'm assuming 'security group' is just a fancy word for 'firewall rule templates' [21:16:20] oh [21:16:21] right [21:16:29] yes, that's what security groups are [21:16:33] it's ec2 terminology [21:16:52] AzaToth: ping [21:16:57] Ryan_Lane: right. I'm puppetizing what I have so far now [21:17:13] addshore: I've been distracted by puppet and proxies, haven't looked at your etherpad yet :( will do, sorry! [21:17:50] yuvipanda: haha, dont worry now! I managed to implement it! https://github.com/addshore/addwiki/commit/984661462a3e21cf51832ad717de97dbad591b4b [21:18:06] niiice! 
[21:18:23] took a lot of juggling :P [21:22:18] &ping [21:22:18] Pinging all local filesystems, hold on [21:22:19] Written and deleted 4 bytes on /tmp in 00:00:00.0006720 [21:22:44] coren: Now that I've been through gerrit/git once, I thought I should set it up for my tool account and see how that works. But, I'm not sure how to use the request page to ask for this. This is the page I have: https://www.mediawiki.org/wiki/Gerrit/New_repositories. Is this the right pointer? [21:23:46] https://www.mediawiki.org/wiki/Git/New_repositories/Requests [21:23:51] kma500: ^^ [21:24:04] Better instructions: https://www.mediawiki.org/wiki/Gerrit/New_repositories [21:24:16] Hmmm. The load on the exec nodes is >> 100 %, even though on them NFS stalls shouldn't trigger any fork bomblets? [21:24:26] Written and deleted 4 bytes on /data/project in 00:02:08.0519000 [21:24:30] thanks! [21:24:55] yuvipanda: the only question is now, am I sure I want it or not xd [21:25:28] scfc_de: You shouldn't trust load as a measure of anything; most the the programs waiting for disk access will artificially increase it. [21:27:38] Coren: Looks like you're right, the loads is decreasing after NFS became available again. [21:33:01] addshore: pong [21:33:34] Anyone happen to be logged into wikitech and be in the bots project? [21:33:53] AzaToth: I was going to say I think I fixed the bug you talked about the other day, and now I tihnk about it again I realise that the two are totally unrelated xD [21:34:13] hehe [21:34:33] Damianz: yup, why :O [21:34:42] Could you reboot bots-cb? [21:34:56] did it die? :( [21:35:07] It's hung, so can't login and my phone battery is flat :( [21:35:11] :( [21:35:27] *goes to reboot it now* [21:35:46] Rebooted instance 288d2dd1-c191-4629-b464-994145b0b998. 
[21:35:47] [= [21:35:59] Glad I could be of service today :) [21:37:32] andrewbogott or anyone, one of E3's labs Mediawiki_singlenode instances is broken, /srv/mediawiki empty [21:37:40] Thanks :) [21:37:56] * Damianz hops it comes back to life [21:37:58] lol [21:38:17] spagewmf, I'll look as soon as I finish this here sandwich [21:39:00] Yay - processing pages again :) [21:40:41] andrewbogott: Thanks, it's piramido. /var/log/puppet.log is full of "Dependency Exec[git_clone_mediawiki] has failures: true:", for everything [21:52:11] * yuvipanda pokes andrewbogott with https://gerrit.wikimedia.org/r/77454 :P [21:56:55] Is tools redis borked? [21:57:13] Damianz: hmm? [21:57:20] Damianz: tools-mc or tools-redis? [21:57:26] also what do you mean by borked? [21:57:31] a bunch of commands are disabled [21:57:43] Wait... nvm, I was trying to telnet to tools-login rather than tools-redis derp [21:57:48] Is pub/sub enabled? [21:57:50] hehrp [21:57:52] Damianz: yeah, enabled [21:57:58] Cooli [21:58:07] KEYS, DEBUG, CONFIG, FLUSH, RANDOM, and a couple of others [21:58:44] Makes sense [21:59:01] I assume the security is basically prefix with a random key? [21:59:12] Damianz: yeah [21:59:21] !tools-doc [21:59:24] !toolsdoc [21:59:24] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help [21:59:41] spagewmf, better? [21:59:45] I've never read that page lol [22:00:08] Damianz: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Security :P [22:00:15] :) [22:00:16] specifically, openssl rand -base64 32 [22:00:19] which I guess is good enough [22:00:25] 64 is too large, I think [22:00:37] heh [22:00:43] edited [22:00:58] I'd kinda like a rabbitmq node in tools for this tbh, but redis could work nicely [22:01:28] Damianz: only thing that Rabbitmq has that I'd want from redis is full on reliability. [22:01:45] andrewbogott: yes, back. What was the problem? [22:01:50] Tuesday I think. 
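[editor's note] The key-prefix convention discussed around 21:59-22:00 (a random prefix per tool, generated with "openssl rand -base64 32", because KEYS and similar commands are disabled on the shared Redis) can be sketched in Python roughly as below. The helper names `make_key_prefix` and `prefixed`, and the ":" separator, are assumptions for illustration and are not taken from the Tools help page.

```python
import base64
import secrets


def make_key_prefix(num_bytes=32):
    """Return the base64 encoding of `num_bytes` random bytes,
    comparable in spirit to `openssl rand -base64 32`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def prefixed(prefix, key):
    """Namespace `key` under the tool's secret prefix.

    The ':' separator is a hypothetical choice for this sketch.
    """
    return f"{prefix}:{key}"


if __name__ == "__main__":
    prefix = make_key_prefix()  # generate once, then store it privately for the tool
    print(prefixed(prefix, "jobqueue"))
```

The prefix would be generated once and kept somewhere only the tool can read; every key the tool touches then goes through `prefixed`, so other users of the shared instance cannot guess its key names.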
[22:01:51] Damianz: we can sortof get that with lua [22:01:54] redis with disk syncing is pretty reliable tbf [22:01:56] andrewbogott: ah, okay! [22:02:12] But yeah rabbit is more awesome for message reliability end to end [22:02:14] spagewmf: Did that system work at some point in the past? [22:02:17] Damianz: no, I was tlaking about how your tool can pop something, and then crash before it finishes processing it - hence the item popped is 'gone' [22:02:31] Yeah [22:02:40] andrewbogott: thanks for fixing. We lost our orig/LocalSettings.php, which we had in a local .git, is it backed up? [22:02:52] spagewmf: Because there was no mediawiki checkout at all that I could find. [22:03:16] spagewmf, I saved what was there as /srv/mediawikibak [22:03:41] andrewbogott: sure, we've used that machine for months. IIRC I made some changes a few weeks ago when the apache config changed to not use /var/www [22:04:51] andrewbogott: alas no .git or LocalSettings.php in /srv/mediawikibak/orig [22:05:16] yeah, they probably went wherever the original mediawiki repo went. [22:05:27] * andrewbogott searches the system [22:06:39] andrewbogott: is anything on a labs instance backed up? We could symlink orig/LocalSettings.php to a git repository somewhere else. [22:09:11] I don't think we have backups. [22:09:16] I see projectstorage.pmtpa.wmnet:/editor-engagement-project on /data/project containing a "repo", that sounds promising [22:10:10] NSA, pressure cooker bomb requires orig/LocalSettings.php please assist [22:10:45] andrewbogott: no big deal, I can reconstitute from our other server toro. Thanks again [22:10:48] sorry, not dangerous without backpacks [22:10:59] no backups without backpacks! [22:16:50] maybe not the best things to post in public channels before we all travel to a foreign country ;) [22:18:43] Ironically I talk most about explosives when in airports [22:18:59] Ryan_Lane: I doubt the PRC really worries about our low opinion of the NSA. 
:-) [22:19:30] Coren: though the US has this no-fly list thing [22:20:03] Oh, right. Land of the free and all that. :-P [22:20:18] I might have to worry in re the all-staff then. :-) [22:20:22] If you're on the no fly list, can you not have a private pilots license? [22:20:46] * yuvipanda tries to interest someone in https://gerrit.wikimedia.org/r/77454 [22:21:00] it's even a module now! [22:21:24] lua? :( [22:21:33] yeah, nginx! [22:21:38] and... redis! [22:21:42] and... mediawiki! [22:22:21] Why redis? [22:22:36] no I was mentioning the places lua is used [22:22:40] Damianz: hipache [22:22:40] ah [22:22:50] nginx lua is actually amazing - I'm using it to directly upload files, but the lua syntax just grates me [22:23:04] Ryan_Lane: we should stop calling it hipache, since it's just regular nginx + lua now :P [22:23:39] 'Dynamic HTTP Routing' sounds more webscale [22:23:51] hmm, 'Dynamic Realtime HTTP Routing', perhaps [22:24:08] there are going to be coroutines involved, so we could throw that in too! [22:24:15] 'Depends on c++ boost library' < I swear everything relies on boost in the c++ world [22:25:17] Damianz: you're using luajit or just regular lua with nginx? [22:25:48] luajit [22:25:54] Actually playing with using openresty at the moment [22:26:22] Damianz: ah, right. [22:26:33] Damianz: I had to stick to things that were in 12.04, so... no openresty :( [22:27:17] Yeah... I ran into that the other day, not decided what to do packaging wise... there's a load of dep packages that I don't want to manage heh [22:27:23] yeah [22:27:27] same :) [22:27:35] but luajit needs compiling anyway [22:34:20] Damianz: any thoughts on tha tpatch? [22:36:29] I'd want to get the domain out of being hard coded (maybe pass in as a var, set from puppet) so it's reusable. [22:36:47] But then I'm a bit fussy perfectionistic [22:37:05] Damianz: hmm, true. 
I could make that a template [22:37:23] but then I'll have to figure a way out to escape the .s [22:37:25] and stuff [23:20:48] Coren: On what host is /data/project/.snaplist generated? I see /usr/local/bin/snapshots on tools-login, but no cron job. [23:23:08] damn, I've got a screw loose [23:27:25] scfc_de: That comes from the NFS server itself. [23:28:40] Coren: I've not poked you with https://gerrit.wikimedia.org/r/#/c/77454/ have I? :) [23:30:07] no' yet anyways. [23:32:07] <{{Guy}}> Coren can you restart wm-bot3? [23:32:24] Coren: consider yourself poked :) [23:32:29] Coren: also when are you arriving in Hong Kong? [23:32:46] {{Guy}}: I almost certainly /can/, but I don't think I know how. :-) [23:33:02] <{{Guy}}> !reboot [23:33:02] https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots/Documentation/wm-bot [23:33:21] <{{Guy}}> Coren: ^^^ instructions... [23:33:26] Coren: Okay. BTW, /data/project/.snaplist has one extra entry compared to $(/usr/local/bin/snapshots): "diff -u <(/usr/local/bin/snapshots | sort) <(sort /data/project/.snaplist)" = "20130715.2117". Don't know what's special about that. [23:38:24] {{Guy}}: I seem to be missing a password to do it the gentle way. [23:38:48] <{{Guy}}> Don't be gentle... [23:38:55] <{{Guy}}> I don't have it either. [23:39:03] <{{Guy}}> :p [23:39:21] <{{Guy}}> If I was at a computer... [23:40:48] My understanding is that since I don't have the password, I have to restart the whole lot of them. That seems to be... drastic. [23:42:02] (CR) Tim Landscheidt: [C: -1] "(4 comments)" [labs/toollabs] - https://gerrit.wikimedia.org/r/76313 (owner: Platonides) [23:43:37] {{Guy}}: That may have done it. Maybe. [23:43:37] Please to check? [23:44:33] !ping [23:44:33] !pong [23:44:47] Or is that a different wm-bot? 
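[editor's note] On the 22:36-22:37 exchange about taking the hard-coded domain out of the proxy config and escaping the dots: the escaping itself is mechanical. The Python sketch below only illustrates that step and is not the nginx+lua code from the patch under review. It assumes hosts of the form <instance>.<domain>, as in tools-webproxy.proxy.wmflabs.org, and reuses the naming rule quoted earlier in the log (names start with a-z and contain only a-z, 0-9 and -). The function names are hypothetical.

```python
import re

# Naming rule quoted in the log: starts with a-z, then a-z, 0-9 or '-'.
INSTANCE_NAME = r"[a-z][a-z0-9-]*"


def build_host_pattern(domain):
    """Compile a pattern for '<instance>.<domain>' with the domain passed in
    as a parameter; re.escape() makes each '.' in it match literally."""
    return re.compile(rf"(?P<instance>{INSTANCE_NAME})\.{re.escape(domain)}")


def instance_for_host(pattern, host):
    """Return the instance name encoded in `host`, or None if it does not fit."""
    match = pattern.fullmatch(host)
    return match.group("instance") if match else None


if __name__ == "__main__":
    pattern = build_host_pattern("proxy.wmflabs.org")
    print(instance_for_host(pattern, "tools-webproxy.proxy.wmflabs.org"))
```

With the domain supplied as a variable (for example from puppet, as suggested in the chat), the same code serves any proxy domain without editing the pattern by hand.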
[23:45:17] !log tools tools-dev: Installed dialog for testing [23:45:20] Logged the message, Master [23:47:32] scfc_de: There's a bouncer for each, so we wouldn't see them drop off IRC [23:49:43] petan seems to like complicated setups :-). Is an "instance" of wm-bot just a worker bee that resembles any other, or does each perform different tasks? [23:53:52] Went to #wikimedia-labs-beta where according to http://bots.wmflabs.org/~wm-bot/db/systemdata.htm wm-bot3 lives, /query'd it, got an answer. So seems to be working.