[01:45:34] anyone knows why "mwscript eval.php --wiki enwiki [01:45:34] " does not seem to function on betalabs?
[01:46:00] $wgSite and $wgTitle are null
[02:06:45] Wikimedia Labs / tools: Some users can't connect to replica DB servers - https://bugzilla.wikimedia.org/69679 (Tim Landscheidt) NEW p:Unprio s:major a:Marc A. Pelletier (Split out from comment #1 of bug #54330.) Some recently created user and tool accounts seem to not be able to connect to...
[02:07:43] Wikimedia Labs / tools: Some users can't connect to replica DB servers - https://bugzilla.wikimedia.org/69679 (Tim Landscheidt)
[02:07:44] Wikimedia Labs / (other): (Tracking) Database replication services - https://bugzilla.wikimedia.org/48930 (Tim Landscheidt)
[02:09:29] Wikimedia Labs / Infrastructure: Some databases missing - https://bugzilla.wikimedia.org/54330#c2 (Tim Landscheidt) NEW>RESO/FIX a:Ryan Lane>Marc A. Pelletier The issue in comment #0 seems to be fixed now; I've filed bug #69679 for the different issue in comment #1.
[02:12:08] Thanks, Marc. This only happens for a small fraction of language editions for me.
[02:19:17] > Failed to create instance.
[02:19:22] thanks for the help :<
[02:20:41] wtfff
[02:20:48] tell me why???
[02:21:58] YuviPanda: can you try creating a new instance for me?
[02:24:17] legoktm: you are probably out of quota?
[02:24:27] there's a limit?
[02:24:46] hm
[02:24:50] there are 15 instances
[21:06:22] and one of them happens to be this, just because i search for "sites-enabled" and such
[21:06:34] mutante: I gotcha
[21:06:41] it seems safe but I'm always scared of puppet :)
[21:07:03] especially because that instance has a bunch of instances of limn, but I can check all of them if you want to run it
[21:07:12] so it's limn1.eqiad.wmflabs
[21:07:16] if you currently look into sites-enabled and sites-available,
[21:07:20] and then reportcard1.eqiad.wmflabs in project reportcard
[21:07:21] you _may_ have a messed up situation a bit
[21:07:27] and have 2 files with different content
[21:07:31] where actually one should be a symlink
[21:07:39] Coren, I'm running a bot process on -login, in case you're wondering.
[21:07:49] but if you could, don't do anything on reportcard1.eqiad.wmflabs, because that's running the "real" reportcard
[21:08:04] !project mailman
[21:08:04] There are multiple keys, refine your input: project-access, project-discuss, projects,
[21:08:07] ok mutante, I'm checking that out now (-available vs. -enabled)
[21:08:14] !project-access mailman
[21:08:14] To request access to a project, use a project's discussion page; see !project-discuss
[21:08:16] milimetric: i recently made a bug about reportcard.wikimedia.org
[21:08:26] that's different mutante
[21:08:35] i know, but it's broken
[21:08:41] reportcard1.eqiad.wmflabs serves reportcard.wmflabs.org
[21:08:47] then i found out it was decided it should not become production
[21:09:12] imho redirect would be nicer than the error message
[21:10:04] mutante: there's nothing in sites-available on limn1
[21:10:04] let me see which roles those instances have really configured
[21:10:15] so I'm not sure what you mean now
[21:10:24] milimetric: ..which is wrong
[21:10:31] the sites should be in sites-available
[21:10:38] and enabled in sites-enabled
[21:10:48] by symlinks
[21:10:49] gotcha
[21:10:51] right
[21:10:53] and is that what your change does?
[21:10:58] yes
[21:11:23] ok, cool, not super important for us (would rather the stability that comes with laziness) but I'm all for cleaner better code
[21:11:33] well, it's more than that
[21:11:35] so I'm happy to help test, and I think it should be pretty low risk
[21:11:44] it's also that almost every Apache setup has slightly different dependencies
[21:11:46] one sec lemme back up those sites-available
[21:11:50] and this makes them do the same stuff
[21:12:12] it's nice that the config is already in a template, so it's not touching that line
[21:12:52] cool, ok mutante, do your worst and let me know when I should test
[21:13:00] hashar: i notice zuul/jenkins fallen over once in awhile ... is there perhaps an easy way i can reset it without having to poke people?
[21:13:00] I'll only be around for another little bit though
[21:13:20] ebernhardson: yeah I need to write the step by step documentation to restart it
[21:13:26] though restarting Jenkins would fix it :-]
[21:13:55] ebernhardson: I have filed a bug for upstream (openstack), but not much traces are available ::(
[21:14:18] milimetric: cool, thanks, let me connect to instances
[21:14:40] hashar: where would i bounce jenkins from? i can't seem to ssh to integration.wikimedia.org directly (although maybe just permissions)
[21:14:48] mutante: you know those are self-hosted and have secret data locally right?
[21:14:59] ebernhardson: few folks have permissions. But ops can give it a poke
[21:15:02] milimetric: denied public key, where i would expect to have root
[21:15:03] using "secret" very loosely
[21:15:09] hashar: ok
[21:15:10] milimetric: no
[21:15:21] one sec mutante, lemme get you access
[21:16:39] ebernhardson: I will probably write a script to unstuck it :-]
[21:17:45] Tool Labs tools / X!'s tools: Checkwiki link on gadget - https://bugzilla.wikimedia.org/69718 (bgwhite) UNCO p:Unprio s:normal a:None Wow, the Xlink tools and gadget are impressive. I did not know about the Checkwiki feature on the gadget. I'm the maintainer of Checkwiki. There is no "b...
[21:25:46] Wikimedia Labs / tools: I can't access ptwiki_p database - https://bugzilla.wikimedia.org/69701 (Tim Landscheidt)
[21:25:46] Wikimedia Labs / tools: Some users can't connect to replica DB servers - https://bugzilla.wikimedia.org/69679 (Tim Landscheidt)
[21:27:19] how fixable is the mariadb puppet error?
[21:28:29] we would like to apply other changes, but are blocked
[21:28:45] Every few days I get a failure when I try to login to beta labs. It comes from http://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralLogin/complete?token=, this time "[91e46ac0] /wiki/Special:CentralLogin/complete?token=393bfc3673da806cbc1c33ebef9b2516 Exception from line 167 of /srv/common-local/php-master/extensions/CentralAuth/specials/SpecialCentralLogin.php: The user account logged into does not exist."
[21:29:37] the workaround is to visit beta-labs Special:UserLogout, then login works.
[21:37:46] somebody testing reboots?
[21:37:54] info: mpt raid status change on testlabs-reboottest6
[21:38:30] we just got like 10 mails similar to that
[21:38:42] should the entire raid status check be removed?
[21:38:49] because virtual
[21:50:56] mutante: What mariadb puppet error?
[21:51:52] Error 400 on SERVER: Could not parse for environment production: Syntax error at '{'; expected '}' at /etc/puppet/manifests/role/mariadb.pp:86
[21:51:56] scfc_de:
[21:54:01] mutante: that's me, sorry.
[21:54:07] Rebooting frantically, trying to trigger a race.
[21:56:21] gotcha, alright
[21:56:42] andrewbogott: you mean the raid status right
[21:57:08] I don't know what 'raid status' means in this context, since testlabs-reboottest6 is a VM...
[21:57:11] But it's probably related.
[21:57:29] 14:40 < mutante> should the entire raid status check be removed?
[21:57:29] 14:40 < mutante> because virtual
[21:57:47] Ah, yes, probably should be removed. I'm surprised that it's there at all.
[21:58:15] subject:info: mpt raid status change
[22:11:43] <^d> bd808: Can you think of any deployment-prep instances we could drop? We're maxed out at 40/40 already :\
[22:12:07] <^d> Maybe could lower elastic from 4 to 3, hmm.
[22:12:17] ^d: Let me take a quick look. We can always ask for more quote
[22:12:21] quota
[22:12:26] labs has room to spare
[22:18:25] ^d: We could kill deployment-apache01 and deployment-apache02 if we had to. They are unused now that we are running hhvm.
[22:19:07] I'm not sure what deployment-sandbox and deployment-fluoride are doing for us either but I didn't build them
[22:19:20] <^d> I think -fluoride is mostly broken.
[22:20:44] probably not related, but deployment-mediawiki01 isn't serving any requests either
[22:20:47] but thats probably a bug :P
[22:21:05] (or maybe leftover from swtaars debugging?)
[22:21:07] <^d> 2014-07-31T22:18:09 was the launch time for -sandbox.
[22:21:15] <^d> No used storage yet.
[22:22:28] <^d> !log deployment-prep dropped apache01/02 instances, unused and need the resources
[22:22:31] Logged the message, Master
[22:23:09] ebernhardson: It's a bug/leftover from debugging.
[22:23:34] I tried to repool it this morning and found out that puppet can't configure the apache vhosts right now
[22:23:54] deployment-mediawiki02 has puppet disabled :/
[22:24:28] wait a minute, please tell me more about "can't configure apache vhosts"
[22:24:33] And the guy I'd bug to fix that is on vacation this week
[22:24:53] mutante: Log into deployment-mediawiki01 and try to start apache
[22:25:08] lots of "oops this is wrong" and then it dies
[22:25:20] sigh, let's see if i can log in
[22:25:23] heh
[22:25:33] deployment-mediawiki02 is running hand coded something from _joe_
[22:26:44] All related to the move of apache config to the puppet repo I imagine. We had a very non-prod config in the apache repo under a separate branch
[22:26:54] apache 25382 0.0 0.1 475312 11960 ? S 21:31 0:00 /usr/sbin/apache2 -D HHVM -k start
[22:26:58] _D HHVM ?
[22:27:05] it's running?
[22:27:16] It is? /me looks again
[22:27:27] starting it = no error
[22:27:40] * Starting web server apache2 *
[22:27:46] but it was already before ?!
[22:27:57] on deployment-mediawiki01?
[22:28:08] 02 ..argg
[22:28:46] yea, 01 is different
[22:29:01] please say it's different roles and not all manual
[22:29:09] all manual
[22:29:14] (98)Address already in use: AH00072: make_sock: could not bind to address [::]:9002
[22:29:14] puppet is disabled on 02
[22:29:18] ok, i'm logging out
[22:29:38] that was my response too. _joe_ will fix it next week
[22:29:55] Until then lets hope that vm doesn't vanish in a puff of smoke
[22:29:56] AH00112: Warning: DocumentRoot [/usr/local/apache/common/docroot/wikispecies.org] does not exist
[22:29:59] AH00112: Warning: DocumentRoot [/mnt/upload7] does not exist
[22:30:02] AH00112: Warning: DocumentRoot [/usr/local/apache/common/docroot/config] does not exist
[22:30:05] we could probably start with that part
[22:30:07] AH00112: Warning: DocumentRoot [/usr/local/apache/common/docroot/ee-prototype] does not exist
[22:30:11] and have the document roots
[22:48:26] !log mailman Deleted corrupt mailman-01 instance
[22:48:28] Logged the message, Master
[22:48:57] !log wm-bot Restarted instance and bot as frozen for almost 24 hrs
[22:48:57] wm-bot is not a valid project.
[22:49:11] !log bots Restarted instance as wm-bot as frozen for almost 24 hrs
[22:49:12] Logged the message, Master
[22:51:55] <^d> Gah, new instance built. SSH'd fine. Exited. Can't ssh now, pubkey denied.
[22:51:57] <^d> What gives.
[22:53:27] ^d: that happened to me for wm-bot
[22:53:32] restart the instance?
[22:54:12] <^d> {{doing}}
[22:57:37] <^d> {{no dice}}
[22:58:47] ^d, what project and instance?
[22:59:09] <^d> deployment-swift01 (i-0000054d.eqiad.wmflabs), deployment-prep.
[23:00:08] Hm, just as soon as my internet starts working again, I'll have a look.
[23:01:13] bah, virt1006
[23:01:20] ^d: that host seems to be rapidly decaying :(
[23:01:30] <^d> :(
[23:01:36] I'm in the process of depooling so it won't get new instances. But instances that are on it… I don't know yet if they can be saved.
[23:01:44] I'll use yours as an experiment in just a moment.
[23:02:00] <^d> Do whatever you'd like to it. It's brand-new from like 15 minutes ago.
[23:04:41] actually… ^d, I have root on that instance, no problems.
[23:04:45] You can ssh to other beta vms?
[23:05:23] <^d> Hmm, swift01 works now
[23:05:30] <^d> Other vms have been fine
[23:05:53] ok, well, lemme try to migrate it anyway, as an exercise. I'll need to shut it down first
[23:15:29] well… ^d, I'm stumped, probably best to just delete that instance and build a fresh one.
[23:15:53] The issue shouldn't recur, I'm just not sure what to do about instances that already live on virt1006 :(
[23:16:13] * bd808 hopes that deployment-mediawiki02 isn't there
[23:17:03] That's the box that I figured out this morning is hand built and the only apache powering all of beta
[23:17:06] nope, it's on 1008
[23:18:42] YuviPanda|brb: is it ok if I 'accidentally' break quarry-web-test?