500 Internal Server Error

[01:44:56] (03PS1) 10Tim Landscheidt: WIP: Update rmtool [labs/toollabs] - 10https://gerrit.wikimedia.org/r/122274 [01:51:03] (03CR) 10Tim Landscheidt: [C: 04-1] "Work in progress, do not commit." [labs/toollabs] - 10https://gerrit.wikimedia.org/r/122274 (owner: 10Tim Landscheidt) [05:59:58] * Beetstra carefully pokes Coren .. are there still databases to be moved, or do I have to do that myself (after the bots moved) .. ? [06:48:28] Beetstra_ TBH I don't recall I had to move db [06:48:49] so I think that was part of migration kit [06:57:36] sudo: ldap_start_tls_s(): Can't contact LDAP server [06:57:37] sudo: unable to resolve host dumps-stats2 [07:55:41] petan - if I login to tools-db, mine is not in the list .. [07:56:17] petan - so I am wondering where the db is [07:56:38] I dont see it with "mysql --defaults-file=data.cnf -h tools-db" & "show databases; [07:56:57] the number of dbs is however still increasing, maybe still waiting to move? [07:58:13] petan: is the beta cluster migration done ? [09:30:00] how exactly do i submit jobs and check their status? jsub didn't give me a job id [10:09:06] ? [11:30:39] !log deployment-prep Update DNS entries to point to EQIAD instances (aka switching beta cluster to eqiad) [11:30:41] Logged the message, Master [12:07:48] Beetstra: All databases should have been moved, actually. Do you have missing ones? [12:12:28] Coren: good morning:-] I have changed beta cluster DNS entries to point to eqiad ~ 40 minutes. [12:12:47] so I guess in an hour or so I will be able to start deleting instances in pmtpa [12:12:51] hashar: Victory? :-) [12:12:55] hopefully [12:32:45] Coren: I am missing __linkwatcher [12:32:50] (with the number before it) [12:33:41] so: s51230__linkwatcher [12:33:48] (I don't know the old number anymore) [12:35:20] Hmmm. [12:36:16] Coren: I do 'mysql --defaults-file=data.cnf -h tools-db', and then 'show databases;' - that does not list my db [12:36:29] Poop. [12:37:20] Yeah, I see this, and I don't seem to have it in my backups anywhere(!) [12:37:51] I doubt that I can tell you my old login [12:37:58] now that is stranger [12:39:01] And I see your ...DATA.crontab file so your data /was/ migrated, apparently without any databases. :-( [12:39:26] * Beetstra hits himself for not dumping his db-structure ... [12:40:42] I may yet be able to salvage this from the shutdown image, as soon as andrew starts his day. [12:41:10] That would be great. [12:41:31] My apologies for the trouble; there must have been an error dumping that I didn't notice on your account. :-( [12:41:56] There were a few that couldn't be dumped automatically, but I /thought/ I got them all. [12:43:06] Not a big problem (just a nuisance) - I thought they were still moving (as the number of databases in the 'show databases;'-result is still increasing) [12:43:27] Coren: I could use apache:apache user and groups with uid/gid 48 on the NFS server labstore.svc.eqiad.wmnet :D [12:43:29] I just hope that I do not have to rebuild the db - I just did that and forgot to dump the structure [12:43:46] Beetstra, how's the blacklist? [12:43:54] much like you did previously for l10nupdate and mwdeploy user, thought this time I don't think the apache user needs to be added to LDAP. It is a system user provided by Ubuntu [12:44:06] Cyberpower678? How do you mean? [12:44:25] I just blacklisted some long-term stuff which is widely linked, your bot is probably going crazy :-D [12:44:52] Beetstra, small talk regarding the spam-blacklist. I do get a complaint every now and then. [12:46:00] "What do I do", "where are the instructions", "how do I stop the bot", and the occasional someone shutting it down on the run page. Which has now been protected. :p [12:46:21] Beetstra, ^ [12:47:07] Ah well, complaints are fine, I get people yelling at me for blacklisting 'useful links' that still get actively campaign-spammed [12:47:56] People don't like being told that the pages they are editing need some maintenance - that is where Betacommand and Rich Farmbrough burned their fingers [12:47:58] Beetstra, I can develop a new tool where can input a set regexes and the tool will find all currently blacklisted links and match them. [12:48:20] Something like my COIBot is doing :-p [12:48:30] (which is now down because it has no database) [12:48:58] It be great to see what links have been getting caught, and it will allow you to refine regexes, to fix fallse positives. [12:49:12] COIBot has a 'wherelisted' command, which shows where your link is blacklisted, and a 'findrules', which regexes the regexes and reports the matches [12:49:13] Beetstra: what about me? [12:49:25] hey, Betacommand [12:49:51] Betacommand: Smalltalk about bots reporting problems with pages and people yelling about it - Cyberpower678 has such a bot now [12:50:30] Beetstra: he better get used to people screaming at him then [12:50:39] :-D [12:50:40] Betacommand, I already am. [12:50:44] :p [12:53:22] Does somebody killed the beta cluster with the wikis? can't reach it nor ping it (dns problem?) [12:55:37] hashar: 11:30 hashar: Update DNS entries to point to EQIAD instances (aka switching beta cluster to eqiad) <-- finished / Is there a problem? [12:56:53] 28 october 2013 - 31 March 2014 .. 5 months of spamming [13:02:14] Coren: was the migration tool using grid? [13:02:24] Coren: that would explain why mysqldump crashed :P [13:03:39] petan, hi. [13:03:47] hi [13:03:56] how exactly do i submit jobs and check their status? jsub didn't give me a job id [13:04:12] gry: there is a command "qstat" [13:04:24] it's very useful, because it display all (by default your) jobs [13:04:26] that are running [13:04:38] I know. it gave empty output. [13:04:46] that means no job is running [13:04:49] As did jsub. [13:04:59] jsub doesn't give empty output [13:05:14] it should say "your job SS was submitted bla bla" [13:05:16] It did to me, twice. [13:05:20] SS = some string [13:05:24] No, just empty. [13:05:30] ok, that's a bug I think [13:05:44] if you still have the terminal output can you create a ticket and paste it there? [13:06:03] Admittedly I consciously had it process a .pl file instead of .sh, but I'd expect it to be more verbose. It has no terminal output. [13:06:05] Beetstra, so you have a tool that can trace the regex affecting the link. [13:06:11] se4598: hi :-] [13:06:18] se4598: which entry are you trying to ping and what is its IP address? [13:06:20] gry: no output on error = bug [13:06:33] hashar: ping: unknown host en.wikipedia.beta.wmflabs.org [13:06:39] Beetstra, I'm proposing a tool that can find all currently blacklisted links that match a regex inputted. [13:06:41] gry: unless Coren is purposefuly obfuscating his error reports into ivisible chars :P [13:06:41] se4598: IP address ? [13:06:45] Cyberpower678: yes, by just brute-force matching [13:07:01] petan: http://repo.or.cz/w/gpy.git/blob/HEAD:/bot.pl is the file. [13:07:02] se4598: nm , unknown host hehe [13:07:33] gry: .cz? are you a czech? [13:07:59] Cyberpower678, that would be great, but what do you do for reporting all links that match '\bsmackmy\w{1-7}\.com\b' [13:08:12] se4598: fixed it. Will probably need some time for the change to propagate [13:08:24] petan: No, it must be someone who runs that host. [13:08:25] gry: ok but I would rather see the terminal output, eg. how you issued jsub and no output [13:08:29] Beetstra, in the event of a false positive mathes coming from the blacklist. You can take a regex and refine it on the spot. [13:08:36] Beetstra, what do you mean? [13:08:37] petan; "jsub bot.pl", no output. [13:08:54] tools.gpy@tools-login:~/gpy$ jsub bot.pl [13:08:54] tools.gpy@tools-login:~/gpy$ [13:08:54] like that [13:08:59] gry: ok, either create a ticket in bugzilla or I will create it [13:09:03] That's actually quite possible; the job may start correctly then fail while it is running. [13:09:04] go ahead [13:09:22] you did just tell me that it shows me the job id if it starts correctly [13:09:23] that regex possibly matches, eh, 26^7 + 26⁶ + 26⁵ + 26⁴ + 26^3 + 26² + 26¹ links ... [13:09:36] greg-g: What's in your .err and .out files? [13:09:51] (from my experience with qsub, it shows a nice error if used against something other than a .sh file with proper pbs directives) [13:09:54] Oh. Hm. You're not using -quiet? [13:09:59] no [13:10:10] not unless it's on by default [13:10:14] including 'witch', 'mitch', 'coren', 'bum' .. [13:10:27] and there's no new files it created in ~/gpy [13:10:28] It's not. [13:10:48] Beetstra, I'm still not sure what you're asking. [13:10:52] The output files should be created in your tool's home. [13:11:15] Unless you have explicit -o and -e, which you do not have. [13:11:17] Cyberpower678 - how would you know what links are possible false positives if there are a couple of billion of possible links that fit certain patterns [13:11:18] that's better, the nice error is in there [13:11:20] The bot maintains a DB that contains every link on Wikipedia that has been blacklisted. [13:11:28] gry: https://bugzilla.wikimedia.org/show_bug.cgi?id=63298 [13:11:38] hashar: PING en.wikipedia.beta.wmflabs.org (208.80.155.135) is ok from a remote server, on my home pc i think I have to clear the dns cache too [13:11:44] Beetstra, ^^^ [13:11:53] se4598: your DNS resolver probably has the old entry in cache as well. [13:12:00] Coren: but isn't jsub supposed to say something when the job is started? [13:12:02] thanks, petan [13:12:07] Beetstra, the tool would use that database and match regexes to those stored links. [13:12:09] Coren: it used to say "your job BLAbla started" [13:12:23] Ah, that is what you mean - so it can report which links that are blacklisted match the case, it will not know the links that some n00b wants to add, which is accidentaly matching the regex [13:12:23] if it fails, it silently nags into the errorfile, apparently [13:12:25] prtksxna: When it's started /sucessfuly/. In this case, it fails before. [13:12:32] qsub complains to terinal [13:12:32] petan: When it's started /sucessfuly/. In this case, it fails before. [13:12:41] Cyberpower678 - that is useful for quick review, indeed [13:12:48] Coren: You summoned me, master? [13:12:50] :-) [13:13:02] petan: By default, errors are sent to the .err file so that they are not lost -- you can -stderr to get them interactively. [13:13:03] prtksxna, sorry, it's just me being a bit too noisy here [13:13:04] Coren: ok in that case it should probably say that? like "your job wasn't started because it failed before" [13:13:08] prtksxna: Sorry, autocomplete fail. [13:13:18] :P [13:13:24] Coren, if you could spare some time? [13:13:36] Coren: your irc client really suck if it autocompletes gry into prtksxna XD [13:13:37] Coren, could you please alter that default? so it spews error to stderr? [13:13:54] petan, it was petan into it [13:13:54] petan: lol [13:13:54] Cyberpower678 - just have it make a statistics page "rule ### matches 20 times 'a.com', 5 times 'b.com' and 1 time 'c.com'" - then it is quick enough to see whether one of the three is a false positive [13:13:55] gry: Add -stderr to your invokation. [13:14:08] Coren, yes, I mean for the new users [13:14:18] alter default, make it more sane [13:15:00] Anyways, have to go home. Coren, can you ping me whether you found the database back and/or can re-spawn it, or whether I have to rebuild it from scratch? [13:15:01] Beetstra, true, but with the tool you could use an existing regex, and make the rule more refined to eliminate the false positives to check the results. [13:15:04] gry: Not loosing errors or having cron spam you (possibly once per minute) seems more sane to me. Sending errors to stderr is only useful interactively, which is rarely the default. :-) [13:15:22] Beetstra: I'll be able to give you news in a couple hours. [13:15:46] Coren: I don't understand this "jjobflow" so you execute "jsub example" -> jsub submits the job using qstat and do some ?what? check -> based on result of ?what? check it either say "started your job" on success or is silent on failure? or what [13:15:53] I won't be back until tomorrow ~08:00 (UTC+3) [13:16:13] UTC+3? [13:16:22] Saudi Arabia [13:16:22] Beetstra: Noted. [13:16:32] * Cyberpower678 looks [13:17:20] Coren, rarely the default? how so? I thought tools was for writing a tool and starting it at terminal [13:17:20] Bye all! [13:18:08] gry: That's a rare use case; most people have their bots spawned from cron and only start them manually when debugging. Well, except some continuous bots that are jstarted once. [13:18:08] Beetstra, no [13:18:24] I saw your mention. :p [13:19:00] gry: But in that case, the errors are asynchronous -- there is no point at which an error could be returned to your terminal unless 'jstart' just hanged there waiting for errors indefinitely (which would defeat the purpose) [13:19:27] Coren, can I steal your attention for a minute? [13:19:39] Cyberpower678: Don't ask to ask and just ask. [13:20:24] Coren: do you have any clue how to add a security group to an existing instance? I can't find a way to do it on wikitech : [13:20:26] / [13:20:45] Coren, ok. I want to make sure my new edit counter is leaving lingering connections which may cause exactly the issues xtools is having. Can you check, or tell me how to check that? [13:21:07] hashar: You explicitely cannot; for some ridiculous reason, Openstack only lets you set the security groups at creation. [13:21:21] ArGGHGHHHHH [13:21:41] *isn't [13:21:47] guess I have to recreate the instance [13:22:18] Cyberpower678: Simply make sure you close connections explicitly. [13:22:42] !log deployment-prep Creating deployment-cache-upload02 to replace deployment-cache-upload01 which was missing the security group "web" [13:22:45] Logged the message, Master [13:22:51] Cyberpower678: Also, have timeouts. [13:23:17] Coren, I do. But if there are timeouts happening, the connection may never close. [13:24:11] Cyberpower678: What language is this in? [13:24:20] PHP [13:24:24] As usual. [13:24:30] Cyberpower678: http://www.php.net/manual/en/features.connection-handling.php is your friend. [13:27:09] Coren, ooh. Nice. Tons of stuff to guard against lingering connections. [13:28:04] Coren, since a guard against timeouts haven't been implemented yet, has it resulted in any lingering connections so far? [13:28:36] You're the one who can tell me; it'd affect your tool alone since we're all lighttpd. [13:28:53] Coren, how do I look? [13:30:24] Cyberpower678: Keeping an eye on your access log is the best way; or you could add a status cgi for your tool: http://redmine.lighttpd.net/projects/1/wiki/Docs_ModStatus [13:33:41] Coren, I assume the .lighttpd file is modified to achieve this? [13:34:30] Cyberpower678: You assume correctly. [13:34:45] * Cyberpower678 likes getting it right. [13:37:18] !log deployment-prep migrating deployment-cache-upload02.eqiad.Wmflabs to self puppet/salt master [13:37:20] Logged the message, Master [13:40:32] gry: Just so that you don't feel like the one driving the wrong way :-): I totally share your opinion. jsub's behaviour of failing with no output is perpendicular to what 99 % of Unix tools do and thus very surprising, and that has cost me much time. But the nice thing about Tools is that you are not required to use jsub and can use a wrapper that meets your expectations. [13:47:36] !log deployment-prep applying role::cache::upload to role-cache-upload02 [13:47:38] Logged the message, Master [13:49:51] Is there anything against having a terms of service linked on tools' pages that prevent the accessing of certain pages? [13:51:42] a930913: You want to define your own (additional) terms of service? [13:53:21] !log deployment-prep upload varnish cache working :-] [13:53:21] Logged the message, Master [13:56:51] !log deployment-prep Deleted deployment-cache-upload01 , replaced by deployment-cache-upload02 [13:56:56] Logged the message, Master [13:58:36] hashar: to the beta dns issue: Connect to 208.80.155.135 on port 443 ... failed Error 111: Connection refused [13:58:49] Is this the correct ip? [13:59:29] hashar: merged [13:59:32] the IP change [13:59:56] se4598: SSL is not around [14:00:42] hashar: but will soon? :-) [14:03:05] !log deployment-prep shutdowning varnishes instances in pmtpa [14:03:07] Logged the message, Master [14:03:17] Coren, I ran a 'shut down everything that isn't beta' script a couple of times over the weekend, sorry if that messed with you :( [14:04:07] If you restart something now that's fine, I'll check with you before I break things again [14:04:08] andrewbogott: It really shouldn't have; it's a bug that the database wasn't properly copied. [14:04:46] andrewbogott: I'd have told you if I had /expected/ issues. Or not; if I had expected issues I wouldn't have left the db uncopied. :-P [14:05:04] hashar: There's no need for you to actually delete things in pmtpa unless you're feeling compulsive… I'll just shut down and hide pmtpa entirely when you're ready. [14:05:10] !log deployment-prep shutdowning database and apache boxes for now. [14:05:12] Logged the message, Master [14:05:30] andrewbogott: yeah I am manually shut downing them but would like to keep the instances around for a couple days just to be sure [14:05:38] yep, that seems wise. [14:05:53] hashar, everything going OK, generaly? [14:05:58] some browsertests are apparently still passing [14:06:19] so yeah that seems like it is working. I forgot a security rule on the Varnish upload box, had to recreate it [14:09:08] hashar: all the 'beta' related changes are merged i'd say [14:09:19] mutante: great :] [14:13:42] manybubbles: ah hello Nik. I have changed the beta cluster to point to eqiad. Hopefully elastic search is working properly there :° [14:13:56] I can check [14:14:32] Coren: andrewbogott , i'd like to touch glance.pp , but it should be very no-op (i know they all say that, heh) [14:14:48] it provides the whole labs VMs though [14:15:02] mutante: that's ok, just link me to the change... [14:15:13] looks ok [14:15:21] andrewbogott: should I delete the two instances you have setup to emulate puppetmaster on beta? deployment-prep-master and deployment-prep-puppetclient. We have our own master now so I don't think they are still needed. [14:15:46] hashar: yep, that's fine. I think they were just meant as a demonstration anyway. [14:16:05] two less instances [14:16:22] andrewbogott: easier one: https://gerrit.wikimedia.org/r/#/c/121667/1 (just retab, nothing else), a little harder (matanya will rebase on top of mine) https://gerrit.wikimedia.org/r/#/c/122353/ [14:18:15] Yeah, that second one will be a lot easier to read once it's rebased :) [14:18:40] sure, andrewbogott merge mutante's and i'll rebase [14:20:55] !log deployment-prep deleting deployment-parsoidcache01 cache the hardway: stopping varnish, deleting files in /srv/vdb/ , starting varnish [14:20:58] Logged the message, Master [14:21:55] https://gerrit.wikimedia.org/r/#/c/122334/ [14:22:03] https://gerrit.wikimedia.org/r/#/c/122335/ [14:26:28] andrewbogott: so yeah so far beta cluster migration is a success [14:26:47] hashar: great! Let me know when you're ready for pmtpa to disappear from wikitech. [14:27:24] andrewbogott: will probably ping / mail you on wednesday [14:27:39] andrewbogott: I would like to leave a couple days for SF folks to complain :] [14:27:52] the browser tests are still passing apparently [14:27:55] which is a huge ++ [14:28:03] hashar: I'd like to turn off the gui well before I actually delete instances. [14:28:21] So when I remove pmtpa from wikitech, the pmtpa instances and data will still be there, I can turn them on via cmdline [14:29:05] well I would probably need to be able to respawn them myself during my mornings [14:29:13] ah, true. [14:29:13] Ok. [14:29:57] hashar, the thing is I already have downtime scheduled for tomorrow morning, to move the wikitech host over to eqiad. [14:30:05] hashar, is beta in eqiad now? [14:30:13] MaxSem: ahh hello :-] [14:30:14] I'd prefer not to worry about having eqiad access when I do that... [14:30:25] MaxSem: yeah I have made the switch a couple hours ago. Was wondering how to get GeoData updated [14:31:03] andrewbogott: just make sure we can still access the instances somehow :-D [14:31:12] * hashar just being paranoid. [14:31:16] Coren, am I doing something wrong? [14:31:24] hashar, on it [14:31:45] MaxSem: might want to document it on the Extension:GeoData page [14:32:56] hashar, well. first of all, your migration reverted simplewiki content so it doesn't have {{#coordinates}} anywhere [14:34:06] then, since beta GeoData uses Elasticsearch, just follow the instructions for CirrusSearch:P [14:41:16] MaxSem: ah some wiki page was still referring to solr [14:47:43] hashar: Is it ok with you if I kill the logstash instance in pmtpa now? [14:48:02] The logstash for beta in eqiad is at https://logstash-beta.wmflabs.org/ [14:50:24] bd808: can just shut it down [14:50:34] * bd808 nods [14:50:35] bd808: in case you need to recover something from it over the next couple days [14:52:04] I'll reboot it to see if I can fix ssh access. I haven't been able to ssh into it for a couple weeks because the automounts died. [15:04:12] Coren, ping [15:04:46] Cyberpower678: Yes? [15:05:11] I made the modifications, but it doesn't seem to be working. Am I doing something wrong. Supercount [15:09:01] Coren, ^ [15:10:03] !log deployment-prep Rebased puppet repository. Only one hack left: https://gerrit.wikimedia.org/r/#/c/119534/ [15:10:06] Logged the message, Master [15:11:02] Cyberpower678: Well, at least two things: the file should be named .lighttpd.conf, and you need to use "server.modules +=" since you are adding to the configuration. Also, why are you adding mod_auth? [15:12:54] Coren, just because, I want to, [15:12:58] :p [15:32:04] ahhhhhhh [15:33:09] * hedonil 's opus recovered from failure [15:33:23] !failure [15:33:23] Cyberpower678 is responsible! [15:33:25] lol [15:33:43] Coren, I made the modifications. tools.wmflabs.org/supercount/server-status still doesn't work. [15:35:10] status.status-url should almost certainly = "/supercount/server-status" [15:38:13] Coren, still a 404 [15:39:15] tools.steinsplitter@tools-login:~/public_html/qery$ screen [15:39:15] Cannot open your terminal '/dev/pts/62' - please check. [15:39:19] o_O why screen dos not work? [15:39:34] Coren, do I need to reboot the web server? [15:39:53] Cyberpower678: Of course. [15:40:23] Steinsplitter: Screen works, provided you own your pty. (I.e.: use screen /before/ you become your tool) [15:40:42] ah, :):) k thx [15:42:03] Coren, web service won't start [15:42:29] ... and what does the error log tell you then? [15:46:20] Coren, [15:46:21] 2014-03-31 15:40:55: (configfile.c.853) source: /var/run/lighttpd/supercount.conf line: 559 pos: 15 invalid character in variable name [15:46:21] 2014-03-31 15:40:55: (configfile.c.909) configfile parser failed at: ( [15:46:21] 2014-03-31 15:41:40: (configfile.c.853) source: /var/run/lighttpd/supercount.conf line: 559 pos: 15 invalid character in variable name [15:46:21] 2014-03-31 15:41:40: (configfile.c.909) configfile parser failed at: ( [15:47:04] Ah, yes, the line number thing is clearly confusing. [15:47:29] You're using strange non-ASCII quotes in your config file. [15:47:45] Use "mod_status" not ”mod_status” [15:47:54] And so on. [15:48:21] (Line 559 is the first line of your config file) [15:51:12] hi, everybody [15:56:44] !log deployment-prep Cluster slow because some CirrusSearch job is spamming simplewiki . Gotta find a way to throttle the number of jobs being run on jobrunner01 or add more apache boxes . It is transient anyway, might look at limiting the runs tonight [15:56:45] I am off [15:56:47] Logged the message, Master [15:58:13] !log deployment-prep Updated kibana on deployment-logstash1 to e317bc6 [15:58:15] Logged the message, Master [16:00:19] Coren, I'm still getting 404 [16:00:46] Coren, bear in mind that I'm very inexperienced with lighttp [16:00:50] !log deployment-prep Restarted logstash service on deployment-logstash1; no new log events seen since 2014-03-28T10:57 [16:00:52] Logged the message, Master [16:06:14] scfc_de: Yeah. [16:07:59] Also, regarding this proxy blocker thing, it should temporarily give the IP blocking rights, then use the proxy to block itself. If it succeeds, it's an open proxy,and thus it's blocked :) [16:13:13] Coren, ping [16:20:14] You're still missing the initial slash in status-url [16:22:56] a930913: What do you propose would be the consequences of violating your TOS? [16:24:21] Coren, thanks [16:24:31] scfc_de: Highlighting the stupidity of certain laws? :p [16:25:30] a930913: Well, without more context it's hard to comment. [16:28:33] scfc_de: I was reading up on the laws surrounding computers, and found that it was really easy for a webadmin to prosecute people. [16:31:39] a930913: Aha. [16:51:49] Coren: whats the status of the cron reconfig? [16:54:47] Cyberpower678: Coren: you have to add tool's root to the relative url status.status-url = "//server-status" to work [16:54:52] https://tools.wmflabs.org/newwebtest/server-status [17:19:07] Coren: I just wanted to restart a lighttpd instance (because of a configuration change), but it doesn't start at the moment ("queue instance "webgrid-lighttpd@tools-webgrid-01.eqiad.wmflabs" dropped because it is overloaded"). Is the the only webgrid instance? I think this situation is not the best... [17:20:27] updated docs for newweb status & statistics https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Example_configurations [17:21:06] apper: It clearly isn't. Ima add a new webgrid instance now. [17:21:08] apper: Are you sure? qstat shows your webserver now running. [17:21:32] scfc_de: That'd probably be because it's nearly full and limiting by load. [17:21:48] yes, now it's running [17:22:15] but I can't wait five minutes after a configuration change to restart a webservice ;) [17:23:00] Coren: thanks for adding another instance [17:39:30] !log deployment-prep lowering # of jobs spawned by the jobrunner {{gerrit|122436}} [17:39:34] Logged the message, Master [17:40:57] lovely puppet [17:41:04] puppet.conf having: [17:41:04] certname = [17:41:05] [17:41:05] 500 Internal Server Error [17:41:05] :D [17:44:25] updated docs newweb default config https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Default_configuration [17:51:45] hedonil: Will you keep that page up to date? :-) [17:52:58] scfc_de: as long as I can ;) but I'll add a pull date. [17:54:20] scfc_de: I often wondered what the default looks like. now it's on screen... [17:59:50] * hedonil notes that the total request counter of lighttpd(server-status seems to suck - whyever [18:03:08] Cyberpower678: hey CP, solved your prob with /server-status [18:03:22] Yes [18:03:37] but it sucks in some details [18:03:54] hedonil, supercount is still going strong with 18000+ handled requests in <2 weeks [18:04:15] hedonil, feel free to add more. [18:04:31] So whats the status of labs now? The last email I saw said that "most of the functions are working now", however, beta labs seems to be unreachable. Any ETA on it being up and running again? [18:04:35] Cyberpower678: time to add some google-ads :-D [18:04:47] hedonil, for what? [18:04:51] :p [18:05:04] Oh wait. That would bring in quite the revenue. [18:05:54] * Cyberpower678 goes to implement google ads per hedonil [18:07:05] hedonil, what should I added .lighttpd.conf? [18:07:26] * hedonil looks [18:08:49] Cyberpower678: hmm. looks good [18:09:18] hedonil, you just said it's lacking details. :p [18:09:25] kaldari: Did you see http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/76373? I think hashar's gone for the day, though. [18:09:54] Cyberpower678: make some reloads and look at absolute requests counter [18:10:48] hedonil, I noticed that already. I don't quite get the random number. [18:11:55] scfc_de: That URL doesn't show me anything (I assume there's supposed to be an email message there). The last email I saw was hashar's email to wikitech-l "beta cluster migrated to eqiad". [18:12:14] Cyberpower678: yeah. lol, looks like ol' debian RNG. lol ;) [18:12:52] hedonil, what does it mean? [18:13:45] kaldari: *argl* Gmane's down again. Yes, that was the link to the mail you referred to. So, if it doesn't work, file a bug? I don't think anyone besides hashar is deeply involved with Beta. [18:14:05] Cyberpower678: RNG = random number generator, old (notorious) bug in debian [18:14:18] scfc_de: Cool, thanks for the info. Just wanted to make sure it wasn't common knowledge before I started bugging people about it :) [18:14:27] hedonil, I meant the number. I know what RNG means. :p [18:15:39] Cyberpower678: afaik it should tell absolute requests since webservice started [18:15:59] hedonil, I figured, but why isn't it working? [18:18:20] Coren, ^^ [18:18:50] Cyberpower678: but numbers jump like an ADHD kid, lol ;) [18:19:08] I have ADHD. I'm not that hyper. :p [18:19:31] * hedonil laughs his ass off [18:20:00] :p [18:21:14] Cyberpower678: I'd have expexted that. But you're the hell of buddy ! [18:21:20] bug filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=63315 [18:21:22] lol, lol [18:21:49] Coren, so is why the webserver throwing out random numbers rather than actually providing useful data. :/ [18:22:20] hedonil, am I "a hell of a buddy" or "the hell of body"? [18:22:20] YuviPanda: `sudo labs-vagrant enable-role flow; sudo labs-vagrant provision` gives "notice: /Stage[main]/Role::Flow/Mediawiki::Extension[Flow]/Git::Clone[mediawiki/extensions/Flow]/Exec[git clone mediawiki/extensions/Flow]/returns: fatal: could not create work tree dir '/vagrant/mediawiki/extensions/Flow'.: Permission denied" [18:22:37] spagewmf: which instance / project? can you add me? [18:23:20] editor-engagement's ee-flow-extra.eqiad.wmflabs. The extensions dir is owned by vagrant. [18:26:33] hedonil, so how do I fix this bug? [18:27:08] Cyberpower678: no clue! let's ask/blame Coren :P [18:27:29] * Coren is in a meeting atm. [18:27:31] * Cyberpower678 points his finger at Coren . [18:27:46] Coren, eta? [18:29:02] YuviPanda: /vagrant/puppet/modules/git/manifests/clone.pp tries to run as $user = vagrant, so I'm not sure. I'll strace. You're a member of editor-engagement, do you need to be a projectadmin ? [18:34:49] spagewmf: ah, so sudo on labs seems a bit broken [18:35:08] spagewmf: i do 'sudo su', 'su vagrant', 'sudo labs-vagrant enable' [18:35:30] spagewmf: someone else had issues running commands as other users (with postgres, I think). This solution worked there too [18:40:55] YuviPanda: that worked. And I found `sudo su vagrant touch /vagrant/mediawiki/extensions/foo` fails, so sudo and maybe clone.pp are accessing the actual user (spage) instead of the current user4 [18:41:04] I'll file a bug [18:41:22] spagewmf: yeah, I've been meaning to but never found time to hunt down the issue. Seems labs-related since other people were also having it [18:41:31] spagewmf: woo! Can you also mention this in the labs-vagrant page? [18:41:47] YuviPanda: yup [19:00:58] awjr: when you hit en.wikipedia.beta.wmflabs.org, what do you see? [19:01:30] awjr: also, is there a url I can use on my desktop to see the mobile site that you mentioned? [19:01:32]