[00:09:40] Betacommand tyvm [00:10:09] * Betacommand gumbles about ToAruShiroiNeko not writing that down last time [00:11:15] Sorry, I was gullable enough to think it would be fixed instantly :( [00:21:35] Coren: I should probably know this already, but how can I scp files between pmtpa and eqiad? I tried pushing from pmtpa to eqiad but got an "unknown host" error for the eqiad address. Same thing from bastion in pmtpa. bastion-eqiad.wmflabs.org gives "unknown host" for the pmtpa side. [00:24:37] Coren: Never mind. I figured it out. I can pull to the bastion and push from there just not an end to end copy. [00:34:24] bd808: You can also do end-to-end, but DNS won't work for the 'fake' .wmflabs addresses so you have to use IP [00:48:27] Someone already having experience with self-hosted puppetmasters in eqiad? I get a lot of "Error 400 on SERVER: Not authorized to call find on /file_metadata/files/ssl/wmf-ca.pem" and ... labs-puppet-key with that, but too tired to debug this today. [04:59:09] does anyone else have problems with the webproxy for instances on eqiad? [05:00:20] I got bad gateway errors, deleted the proxy entry and recreated it, then it worked for a while and now it is back to throwing 502s again [05:29:57] Cyberpower678: Here they are! [05:30:12] andrewbogott, yes. More than 20. :p [05:30:13] Creepy, right? [05:30:22] That's really wierd. [05:30:46] morebots, are you here too? [05:30:52] Coren, so I get mysqli_connect(): (HY000/2003): Can't connect to MySQL server on 'tools-db' (99) [05:30:55] nope :( [06:32:07] hi, i just migrated my tool to eqiad. I can't find there the source for pywikpedia in the shared directory (my bot depend on pywikipedia). Should I clone it locally? [06:45:48] according to ...MIGRATE STATUS pywikipedia already moved, but probably it isn't accessible (ls doesn't show files but only directories) [06:56:41] (solved: cloned it locally. 
using shared isn't currently possible) [07:19:54] My mwui instance (a simple single node wiki proxied to http://mwui.wmflabs.org/wiki/Main_Page) stopped working again. [07:20:02] It's on eqiad, and I'm getting 502 Bad Gateway. [07:20:09] I used to be able to ssh to it, but now I can't. [07:23:56] Is bastion-eqiad.wmflabs.org the right bastion for me to use? [07:40:29] superm401: that's the right bastion, sure. [07:40:39] superm401: but, the 502 is… well, let me look. [07:40:59] superm401: what is the project name? [07:41:16] editor-engagement [07:41:56] OK, so -- we're talking about two different problems, right? ssh and also http? [07:42:31] The ssh issue is most likely this one: https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Howto#Security_Groups [07:42:36] I can fix that for you if you would like. [07:43:10] superm401: ? [07:43:39] andrewbogott, yeah, both, and both were working before. :) [07:43:59] When you say 'working before'... [07:44:07] you mean with the new eqiad instance? [07:44:12] andrewbogott, yes. [07:44:24] But not with bastion-eqiad. [07:44:40] I think it was bastion2.wmflabs.org [07:44:51] Ah, sure, that's the difference. For that bit at least. [07:45:01] Do you want to read and understand the above link, or just have me fix it? [07:45:07] (This is ssh. The http thing I don't know about yet.) [07:46:06] andrewbogott, which group name do I add it to? [07:46:29] you'll want to ssh to everything, so change the rule in 'default' to allow from 10.0.0.0/8 [07:49:54] andrewbogott, done, how long is it supposed to take? [07:50:04] for ssh? Should work right away. [07:52:10] andrewbogott, got it. My local ssh config also wasn't quite right. [07:52:36] cool. OK, I'm working on the proxy right now anyway so I'll look at your http issue as well... [07:52:50] It says 'System restart required', FWIW. [07:53:16] andrewbogott, I can reboot through the wikitech UI if that doesn't interfere with you. [07:53:35] hang on just a second... 
[07:54:32] superm401: there's a dns issue… I think you've done everything properly with the proxy. [07:54:34] I'm not sure what's up [07:54:41] If you need to reboot for other reasons then that's fine. [07:55:27] superm401: fixed. Although I fear that DNS is dieing off every day or so :( [07:55:31] Anyway, working for you now too? [07:56:01] andrewbogott, yep, thanks for your help. [07:56:11] superm401: do you mind helping me test something? It should be easy... [07:56:18] andrewbogott, sure. [07:56:22] I've just changed wikitech to allow you to use a proxy in eqiad for eqiad instances... [07:56:31] So, want to delete the current proxy for that box and create a new one in eqiad? [07:56:35] Should be pretty obvious how to do it. [07:56:40] Okay, I better not tell anyone it's fixed yet. ;) [07:57:03] Yeah, moving to a different proxy will require dns to propagate again... [07:57:21] I didn't even realize it was proxying between data centers before. [07:57:21] (It can have the same name of course.) [07:57:36] I'll do a fun test first... [07:57:37] Until an hour or so ago there was only one proxy, in tampa. [07:57:40] Now there are two. [07:58:24] And now there are two proxies, both with mwui.wmflabs.org. [07:58:30] Probably shouldn't allow that, right? [08:00:14] ^ andrewbogott [08:00:41] superm401: Huh, it should already prevent that. I'll have a look. [08:05:38] superm401: working ok other than that? [08:06:05] andrewbogott, yep. [08:06:39] cool, thanks for testing! [08:06:47] I'm going to disable creation of pmtpa proxies in just a minute... [08:07:00] No problem, looks like eqiad won, BTW. [08:07:12] I bet it's random :( [08:16:17] isn't round-robin DNS *fun*? [10:29:11] hoi, the backups that exist in /public/datasets/public/wikidatawiki are not present on eqiad [10:29:30] they are run by a user called "backup" [10:29:53] can someone fix this so that the new environment works fine ? 
[11:54:54] I didn't follow updates to Labs / Tool-Lab recently, but what's *OLD* in bash prompt? [11:54:56] local-liangent-py@*OLD*tools-login:~/scripts/updatedyk$ [12:03:22] and regarding migration: is it fine (and actually better) not to touch them if I want my instances archived [12:04:02] so they don't need to keep running but I'll be able to get old data back at some point in the future [13:43:52] andrewbogott_afk: The nova-network dhcp/dns thing seems to be full of fail. [13:44:12] dhproxy. [14:07:37] Hi, I tried to migrate some tools (migrate-tool), but the webservice doesn't seem to restart. Do I need to do something else than "finish-migration"? [14:08:12] Darkdadaah: Well, possibly 'webservice start' [14:08:15] :-) [14:08:33] Ah, I did try that too. [14:08:45] Oh? That /should/ have worked. Let's see. [14:09:12] Oy! [14:09:21] We're already out of slots on webgrid. :-) [14:09:33] (Not long or hard to fix) [14:09:44] Ouf [14:17:34] Darkdadaah: Should be started now. [14:19:51] Ok, it seems to work :) [14:23:19] Coren, what's dhcp/dns doing? [14:23:28] andrewbogott: Sucking. [14:23:35] Could you be more specific? [14:24:03] andrewbogott: We had the same problem in pmtpa, remember? Where the dhrelay thing just drops a fraction of requests to /dev/null when it just vaguely looks like it's being lightly loaded. [14:24:41] I only barely remember this... [14:27:08] …which makes me think I must not have been the one who fixed it. [14:27:16] Coren, Do you remember what the solution was, then? [14:27:21] ok andrewbogott, trying proxy again [14:27:27] getting 404 from nginx right now [14:27:40] inside network direct to instance works [14:27:49] curl http://archiva0.eqiad.wmflabs:8080 [14:28:01] but archiva.wmflabs.org does now [14:28:02] not [14:28:03] * [14:28:21] ottomata, what project? [14:28:21] andrewbogott: There wasn't; or at least I never found one. 
All I did was alleviate the problem by reducing the number of queries by setting up a local DNS in some projects, and adding entries in /etc/hosts in tools. I was /hoping/ that the new version woulld fix this. [14:28:34] analytics [14:30:06] ottomata, archiva.wmflabs.org looks to be working for me. Or, at least, I get 'Apache Archiva' and an icon up top. [14:30:40] Coren: ok… so which components are involved in this? It's not just pdns? [14:31:00] (Sorry to be clueless...) [14:31:46] hm [14:31:49] not working for me [14:31:53] i get 404 [14:31:54] from nginx [14:31:58] Coren: how do I know what is the new name of my database after migration? [14:31:59] andrewbogott: I don't think it's pdns at all; I think it's dnsmasq [14:32:02] is it hitting the wrong IP, maybe my dns is cached? [14:32:23] ottomata: probably you're hitting the old proxy. Or I am. [14:32:43] hmm ok i think it might be my dns, i just curled from another server and it worked, k [14:33:02] Darkdadaah: The prefix will be the same as the username in replica.my.cnf [14:33:18] Darkdadaah: sXXXXX [14:33:40] ottomata: if you get a 208.80.153 IP, that's still in tampa. [14:33:44] .155 is eqiad [14:33:45] Coren: Yes but, what about the suffix? I used to have "localanagrimes" as a name. [14:34:24] yeah i got that locally [14:34:25] k [14:34:29] trying to flush os x cache... [14:34:41] thanks andrewbogott, looks like it is working after all, on it.. [14:34:53] Coren, so… is the dns problem causing immediate failures, or is it a longer-term/capacity issue? [14:35:06] (Because I have an immediate problem which I want to change the subject to :) ) [14:35:24] andrewbogott: I think it's causing infrequent issues at this time; but it'll just increase [14:35:39] Ok, I've certainly seen infrequent dns failure as well :( [14:35:51] same here [14:36:06] But I will have to study up in order to understand what component to blame. 
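The rule of thumb given above (a 208.80.153 address is still the Tampa proxy, .155 is eqiad) can be sketched as a quick check. This is only an illustration: the /24 prefix lengths and the function name `proxy_site` are assumptions, not something stated in the channel.

```python
import ipaddress

# Assumption: the two prefixes mentioned in the channel are treated as /24 networks.
SITES = {
    "pmtpa": ipaddress.ip_network("208.80.153.0/24"),
    "eqiad": ipaddress.ip_network("208.80.155.0/24"),
}


def proxy_site(ip):
    """Return 'pmtpa', 'eqiad', or 'unknown' for a resolved proxy address."""
    addr = ipaddress.ip_address(ip)
    for site, network in SITES.items():
        if addr in network:
            return site
    return "unknown"
```

Feeding it whatever address a local lookup of the proxied hostname returns would show whether a stale DNS cache is still pointing at the old proxy.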
[14:36:12] Darkdadaah: I see 's51045__anagrimes2' [14:37:00] Coren: this is an experimental and empty one that I just created. [14:37:56] (and I can access this one in eqiad) [14:38:19] Darkdadaah: That's the only one the automated system found. You'll have to dump and restore any others you might have had that had different naming. [14:38:51] Ok. I suppose it's because it was created before we used the username for the db name. [14:39:12] Darkdadaah: Probably. You'll also have to edit the dump to fix the name of the database, but that's easy. :-) [14:39:49] andrewbogott: So, what subject did you want to change to? [14:40:20] Coren: I emailed. Migrated images don't have /dev/sdb after I copy them over [14:40:31] andrewbogott: Ah that. [14:40:34] I'm thinking since you've done some recent thinking in that area… [14:40:35] Ok, in the meantime I move back the old public_html to use the working db. [14:41:36] andrewbogott: I didn't think so much as observe that eqiad sets up "large vda" rather than "small vda and vdb for the rest". It's easy to adapt, then. [14:42:01] andrewbogott: But if copying the image over loses the vdb, that's a more complicated matter. [14:42:44] andrewbogott: Do you have a running migrated instance that losts its vdb I can look at? [14:42:47] sure. [14:43:03] wikisource-dev.eqiad.wmflabs [14:45:19] hiya YuviPanda? [14:47:43] andrewbogott: And things were going so wel... as far as I can tell, the method you use to copy images doesn't actually carry the /dev/vdb with it at all. [14:48:45] andrewbogott: All the image seems to have is the small (16G) vda [14:59:52] Coren, sorry, network downage [15:00:22] So… the files that I copied over are… disk.local, 470M and disk, 8.6G [15:00:30] I would expect that that latter is /dev/vdb [15:00:39] Does that seem likely or unlikely? [15:01:13] ... that seems smallish. [15:01:29] 470M in particular is too small to be vda [15:01:34] hm, ok. [15:04:02] hi ottomata. 
[15:04:04] ottomata: sorry, only sporadically available at the moment :( [15:04:39] that's ok, i was just wondering about the ssl proxy you set up that we are using for wikimetrics, i thought for a second it wasn't working in eqiad labs, but I think it is [15:04:45] so, never mind! :) [15:04:55] ottomata: ah :) [15:05:03] ottomata: ok! :) [15:05:03] ottomata: good to know! [15:06:44] well, wait -- coren, it's copy-on-write. So the 470M should just be the diff. [15:07:31] andrewbogott: Hm... good point. [15:07:47] andrewbogott: Yeah, taking cow into account, the sizes would make sense. [15:08:23] I see another instance where 'disk' is 9g and 'disk.local' is 20G [15:08:59] So... [15:09:20] Probably the same reason why new instances didn't have /dev/vdb. Whatever that is. [15:32:35] are the dumps ('datasets') already in eqiad? [15:37:00] gifti: no, not yet [15:37:12] is there an estimate? [15:37:17] i heard it's 95% done [15:37:26] within March i'd say [15:37:30] uh, ok [15:37:34] who's responsible? [15:37:37] apergos [15:37:49] thx [15:37:53] yw [15:38:16] :) [15:44:47] andrewbogott: I don't think it's the /same/ reason; instances in eqiad get created with a variable-sized vda so it /is/ taking the instance size into account, just not the same way as in folsom [15:57:52] andrewbogott_afk: More importantly, we need to figure out a way around/through this. Clearly, creating the instance in eqiad first then overwriting the image has openstack "fix" the image to fit the disks. [15:58:05] (As they were created) [15:58:20] * Coren ponders. [15:58:54] Obviously, people who create new instances won't have issues; they can simply use the labs_lvm thing and get their storage. [15:59:40] (Which means that making it the default might actually make sense then) [16:06:43] gifti: https://bugzilla.wikimedia.org/62296 [16:06:46] ottomata: Have you tried running a self-hosted puppetmaster in eqiad? 
It fails for me with "Error 400 on SERVER: Not authorized to call find on /file_metadata/files/ssl/wmf-ca.pem" & Co. Any ideas? I set the IP subnet for eqiad in master.pp, but that didn't help. [16:07:00] How many instances are we talking about, and what constitutes an instance besides a tar ball of the file system? [16:08:50] hm, i have tried it and ahve not seen that [16:09:01] Coren, I'm stuck. [16:09:47] Coren, I seem to be getting mysqli_connect(): (HY000/2003): Can't connect to MySQL server on 'tools-db' (99) at random since the move to eqiad. [16:10:13] ottomata: One moment, I'll pastebin the log. [16:11:02] scfc_de: are you doing self hosted puppet with no special puppetmaster setting? [16:11:08] i.e. locally hosted self hosted puppet? [16:11:21] (i haven't tried it in eqiad with a remote self hosted puppetmaster yet) [16:11:45] Coren, any ideas? [16:12:31] Oh, sorry, should have been clearer: role::puppet::self with $puppetmaster = FQDN (local for the moment, but planned as a puppetmaster for other instances later). [16:12:41] Let me see if $puppetmaster = empty works. [16:13:09] ah ok, yeah i haven't tried setting that yet [16:13:21] * Coren reads scrollback. [16:14:03] Cyberpower678: "random"? [16:14:06] ottomata: Okay, clearing $puppetmaster on an instance with problems doesn't seem to be a good idea, let me set up puppetmaster3 :-). [16:14:21] Cyberpower678: Like, it works, then fails, then works again? [16:14:22] Coren, it sometimes works and sometimes doesn't [16:15:00] Coren, It doesn't work at all in a different but similar script. 
[16:15:17] $dblocal = mysqli_connect( 'tools-db', $toolserver_username, $toolserver_password, 's51059__cyberbot' ); [16:15:17] $dbwiki = mysqli_connect( 'enwiki.labsdb', $toolserver_username, $toolserver_password, 'enwiki_p' ); [16:15:27] $dblocal fails [16:15:27] scfc_de: worked for me yesterday or so, i put "localhost" in the form [16:15:32] $dbwiki works [16:15:33] and puppet::self worked [16:15:50] ha, yeah scfc_de, i wouldn't recommend that :) [16:16:02] you could get it to work eventually (clearing the setting), but it would be a pain [16:16:22] yeah so, mutante 'localhost' and empty are equivalent [16:16:27] it will infer localhost if you leave it empty [16:16:35] gotcha, ok [16:16:40] Coren, ^^^^^^^ [16:16:46] putting FQDN of a node there allows you to configure other labs instances as puppet clients of a self hosted puppet master [16:16:47] Cyberpower678: Yes, yes, I've read. [16:16:52] Ok. [16:17:00] Cyberpower678: Does dblocal fail always, or intermittently? [16:17:04] if the value there matches $::fqdn when puppet runs [16:17:09] then puppet will set up the node as the puppetmaster [16:17:10] Coren, it seems to be intermittent. [16:17:36] DNS related? [16:17:44] * Coren wishes the error message was less stupid than just 'can't connect' [16:18:05] couldn't the intermittent DNS issue also just cause this, can't resolve = can't connect? [16:18:18] mutante: It could; I wish I could know. [16:18:22] Try it with tools-doesnotexist? [16:19:00] Cyberpower678: if you type "host enwiki.labsdb" on that shell and repeat it a couple times, does it always return an IP? [16:19:23] mutante: enwiki.labsdb is resolved in /etc/hosts. [16:19:45] eh, then "tools-db" [16:19:49] So is tools-db actually. [16:19:49] the one that fails [16:19:53] So no dns. 
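Because tools-db and enwiki.labsdb come from /etc/hosts rather than DNS, a lookup has to go through the system resolver to see them. A minimal Python sketch of such a check follows; `resolve_like_client` is a hypothetical helper name, and it relies on `socket.getaddrinfo` following the host's resolver configuration, so it will normally pick up hosts-file entries.

```python
import socket


def resolve_like_client(name):
    """Resolve a name through the system resolver; return sorted unique addresses.

    Returns an empty list when the name cannot be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_like_client("tools-db")` several times in a row on the affected host would show whether name resolution itself is intermittent, independently of MySQL.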
[16:20:03] ok [16:20:11] (tools-db is an alias for tools.labsdb) [16:20:14] unless nsswitch is set to ask DNS before files [16:20:49] Coren, http://tools.wmflabs.org/cyberbot/botlogs.php?bot=CyberbotII&task=spambot&type=err shows the last 10 MB of the stderr logs generated when in operation. Scroll almost to the bottom. You'll see all of the attempts it made and the other times the bot completed a successful run. [16:20:57] mutante: Good idea. But "hosts: files dns" [16:21:07] hmm, ok [16:21:45] "Can't connect to MySQL server". What a useful, Microsoft-like error message. [16:22:05] "An error has occurred." [16:22:15] :p [16:22:32] Can you reproduce the problem by hand at all? [16:22:41] does the GRANT for that contain a from IP range, and that is just Tampa? [16:22:43] mutante, Coren typing "host tools-db" returns nothing [16:23:07] Cyberpower678: host only checks DNS. [16:23:41] Coren, I just did what mutante asked me to do. [16:23:58] That's because he didn't know that those came from hosts. [16:24:01] Cyberpower678: it was a way to test DNS before i knew you had it in /etc/hosts [16:24:14] that's when i thought those problems could be related [16:24:25] show global status for tools-db says [16:24:27] | Aborted_clients | 505 | [16:24:28] | Aborted_connects | 184 | [16:24:28] mutante: My automatically created grants all have @'%'. So not that either. [16:24:36] A while ago we had some connection issues with the replicas, and the error was in the DB servers being overloaded or something like that. Don't remember the details, but if tools-db is now the same setup, Sean might be of help. [16:25:12] scfc_de: Hitting the connection limit is usually simple to debug because you normally get a specific message for this; but I'll check. [16:25:50] It was something different (network?). IIRC January or so. [16:26:03] Coren, one of my scripts fails to connect to it entirely. 
I tried copying and pasting the mysqli_connect command from the working script to the broken script, but it's still not working. [16:26:33] so, there is documentation for intuition on tools/help. how can i use translatewiki without intuition/php? [16:27:16] i can't detect the underlying mechanism [16:27:18] Coren: is there documentation somewhere about how to set up an app to use our ldap? [16:27:47] Cyberpower678: for me it works fine with the test script https://tools.wmflabs.org/tools-info/?dblist=tools-db [16:28:00] ottomata: I don't know; I don't think so (at least I've never seen it). [16:28:29] he could likely copy from Icinga login [16:28:44] ottomata: [16:28:49] Cyberpower678: repeatedly called w/o connection errors [16:28:55] at least some other config to copy from i suppose [16:29:09] uses same LDAP for auth [16:29:21] Cyberpower678: I've been trying to get the error repeatedly as well, and I'm not getting issues. [16:29:22] hedonil, then perhaps somebody could look at my replica.my.cnf file? [16:29:31] if it uses Apache that is [16:29:42] mutante: ? [16:29:48] oh [16:29:49] LDAP [16:29:50] yeah [16:29:55] ottomata: you want to set up an app to use LDAP for auth, right [16:30:00] It makes no sense for one script to work and another to fail. [16:30:04] yes, trying to see if I can make my archiva setup in labs use it [16:30:06] and it made me think of Icinga setup in puppet [16:30:07] Who knows how many databases are copied at various moments by the migrate-tool script; so there may be peaks. [16:30:14] so that should be the Apache module [16:30:16] for LDAP auth [16:30:23] Cyberpower678: Well, unless you have some bug in the way you read your credentials from the replica.my.cnf? [16:30:27] i'm a little worried about the proxypass [16:30:29] mod_auth_ldap i think [16:30:31] since that comes from private [16:30:36] not sure if I need that [16:30:43] scfc_de: It's one-at-a-time on purpose to avoid exactly that. [16:30:54] Coren, I haven't changed it. 
It's always been the same so far. [16:30:55] Cyberpower678: I use the brand new replica.my.cnf one [16:31:02] hedonil, same [16:31:25] database.inc reads the replica.my.cnf file. [16:32:14] does it work manually without reading any credentials from files? [16:32:14] Cyberpower678: cwd issues? Does it read the file with an absolute path? [16:32:54] mutante: i dunno, i'm not entirely sure what port to connect to [16:32:57] or server [16:33:02] virt1000.wikmedia.org [16:33:06] port 1382? [16:33:09] 1389* [16:33:09] ? [16:33:13] Coren, no [16:33:30] $toolserver_mycnf = parse_ini_file($_SERVER['HOME']."/replica.my.cnf"); [16:33:30] $toolserver_username = $toolserver_mycnf['user']; [16:33:30] $toolserver_password = $toolserver_mycnf['password']; [16:33:30] unset($toolserver_mycnf); [16:35:55] If it was the same issue as some weeks before (db connection aborted due to network issues) we'd have seen another error message (...lost connection during etc) [16:37:29] * Coren tries to reproduce the error. [16:37:35] ottomata: actually i meant Cyberpower678 for that one:) but the LDAP port should be 1389 [16:37:40] opendj is configured to listen on ports 4444, 8989, 1636, 1389. [16:38:54] Cyberpower678: how about mysql -u user -p -h toolsdb enwiki or so [16:39:04] and then interactively entering pass [16:39:17] Cyberpower678: That's not the bug but you should use process_section in your parse_ini_file as there could be more than just the [client] section. [16:39:37] mutante, what good will that do? [16:39:48] * Coren tries to go see if the logs on the DB are any more informative. [16:40:21] Cyberpower678: ruling out possible issues with include the credentials from files etc [16:40:28] just to get closer to the root cause [16:41:43] SQL wokrbench seems to work with those credentials. [16:41:52] hm, mutante, am I allowed to talk to ldap from labs? [16:42:11] Cyberpower678: What's your tool's username? [16:42:22] cyberbot [16:42:29] on eqiad [16:42:34] ... DB username. 
:-) [16:43:38] Oh, seriously? That is SO useful, mysql! [16:43:38] Coren, oh. Be more specific. :p S51059 [16:43:39] ottomata: i think you should be, but maybe not from eqiad instances yet? security group rules or so? [16:43:42] Mar 7 12:57:20 labsdb1005 mysqld: 140307 12:57:20 [Warning] Aborted connection 275787 to db: 's51059__cyberbot' user: 's51059' host: '10.68.16.39' (Unknown error) [16:44:04] "(Unknown error)" [16:44:14] yea, -u [16:44:15] :p [16:44:16] * Coren headdesks. [16:44:51] likes the "unknown" part [16:45:23] * Cyberpower678 wonders how an error could be unknown, but still be recognized as an error. [16:45:41] hm, mutante, i think security groups control incoming connections [16:45:43] not outgoing, [16:45:45] right? [16:45:46] well it knows it's an error it just doesn't know which kind:) [16:45:51] and you are right, I can connect to 1389 from a pmtpa instance [16:46:08] ottomata: but something must come back from LDAP to you incoming as well [16:46:24] established should be allowed, right? [16:46:32] it's not like the ldap server is going to open new sockets to my instance [16:46:33] ? [16:46:46] also, there are no 1389 security policies in pmtpa (in my project) [16:46:49] or in eqiad [16:46:51] unless it's set to allow all the related traffic, like ESTABLISHED,RELATED -j ACCEPT [16:46:53] but it works from pmtpa [16:47:33] well, then, i dunno, something networking related with eqiad [16:48:11] tcpdump? [16:49:19] ottomata: could the LDAP server itself be firewalled to only allow from pmtpa? [16:49:30] which you would not see in security groups in webui [16:49:49] UHHH, now it is letting me telnet [16:49:52] it was not before [16:50:59] Cyberpower678: Google shows that there are some buggy php extensions that can interfere with database streams in some cases. Are you using geophp in those scripts that fail? [16:51:23] Coren, not likely. I'm using mysqli [16:51:58] ottomata: odd, we keep having intermittent issues it seems [16:52:07] What's mysqli? 
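On the earlier point about reading only the [client] section of replica.my.cnf rather than the whole file: a sketch of section-aware parsing in Python. It assumes the file is INI-style with `user` and `password` keys under `[client]`, possibly quoted; the helper name, the extra section, and the sample password are hypothetical.

```python
import configparser


def read_client_credentials(text):
    """Return (user, password) from the [client] section of my.cnf-style text."""
    # interpolation=None so that a '%' inside a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    client = parser["client"]
    # Values may be wrapped in quotes; strip them.
    user = client["user"].strip("'\"")
    password = client["password"].strip("'\"")
    return user, password


# Hypothetical sample: a second section must not leak into the result.
SAMPLE = """
[client]
user = 's51059'
password = 'not-a-real-password'

[mysqldump]
user = someone-else
"""
```

Keeping the lookup scoped to one section means a later section that reuses the `user` key cannot silently override the credentials.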
[16:52:15] yeah, i've noticed my proxies someimtes work and sometimes not [16:52:18] Coren, my script should've just flooded some connections into the log. [16:52:23] Reconnecting to local DB... [16:52:23] Reconnecting to local DB... [16:52:23] Reconnecting to local DB... [16:52:23] Reconnecting to local DB... [16:53:02] mysqli replaced mysql_connect etc [16:53:14] might be worth seeing what happens with the old method..shrug [16:53:35] but when you use that PHP will tell you it's deprecated and you should switch ,,juts warnings though, still WFM [16:54:09] !newtoollabs [16:54:11] Logs actually show three users getting those errors. [16:54:19] !newlabs [16:54:19] The tools project in labs another version of toolserver.  Because people wanted it just like toolserver, an effort was made to create an almost identical environment.  Now users can enjoy replication, similar commands, and bear the burden of instabilities, just like Toolserver. [16:54:21] But only those users. [16:54:57] Coren: So, what do they have in common? [16:54:58] Switching to my phone [16:55:36] hedonil: Absolutely nothing afaict. [16:55:44] I'm an iPhone. :p [16:56:56] CP678|iPhone: admit it, you compiled your own mysqli driver and it doesn't work [16:57:01] Please ping me for my attention. [16:59:23] hedonil: if I knew how to do that, I would've made work over SSH connections so I could debug DB dependent scripts on my computer more easily. [16:59:57] CP678|iPhone: hehe [17:00:39] I have no idea what this could possibly be. The client whines that it doesn't work without details, and the DB simply says that something went wrong without detail. [17:00:44] ottomata: role::puppet::self with empty $puppetmaster works, you're right. I'll file a bug for the multi-instance bug. [17:01:05] ottomata: Re LDAP, if it is the same server as queried by "ldaplist -l passwd", this seems to work from eqiad flawlessly. [17:02:22] CP678|iPhone: Please try from the command line with the mysql client. 
[17:02:28] Coren: makes it a little hard to fix doesn't it. [17:02:44] Coren: no access right now. [17:02:54] CP678|iPhone: so do that when you're back home? [17:03:11] valhallasw`cloud: no time. :/ [17:03:32] or, you know, tomorrow, or next week, or whenever you have time. [17:05:42] Coren: CP678|iPhone: here is an interesting thread http://lists.mysql.com/mysql/204830 [17:06:35] yeah scfc_de, it is working right now, now trying to figure out how to configure my app [17:06:36] ldap [17:06:37] tells about 'running out of available outgoing ports' [17:08:11] Coren: CP678|iPhone: maybe not properly closed connections or use of persistent-connections (although I can't see the latter in the cb-script) [17:08:37] hedonil: Also, there are very few connections atm and no dangling ones; so that's not it. [17:08:41] Coren: tool paste doesn't work for unknown reasons [17:08:53] Coren: since I migrated it the webserver doesn't work [17:09:06] http://tools.wmflabs.org/paste/ [17:09:14] it produces no error log whatsoever [17:10:48] petan: It'd probably help if you turned on request logging. [17:11:16] Eqiad seems to be having unsolvable issues. :p [17:11:40] Coren: what is "request logging" [17:11:52] and how do I turn it on [17:12:04] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb#Enabling_request_logging [17:12:15] petan: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb [17:12:23] * valhallasw`cloud <-- slowpoke [17:12:23] CP678|iPhone: It's not unsolvable if you'd take a minute to help isolate your problem. [17:12:38] CP678|iPhone: my suggestion. in /data/project/cyberbot/bots/cyberbot-ii/externallinks.php replace 'tools-db' with 'p:tools-db' see what happens [17:13:11] hedonil: I can try it as soon as I can access labs again. [17:13:23] CP678|iPhone: so it makes a persistant connection only once [17:13:35] Coren: no computer with Internet = not possible. [17:14:26] Coren: what am I supposed to isolate. 
[17:14:37] Coren: I enabled it, what now? [17:14:43] Coren: I don't see any error logs [17:14:55] petan: Did you restart the webservice? [17:15:00] yes [17:17:40] petan: error_reporting(0); in your php script probably isn't helping either. [17:18:00] Coren: can you convert this .htaccess to lighttpd format? http://etherpad.wikimedia.org/p/paste [17:19:52] Coren: I changed that [17:20:51] btw if anyone wants to become maintainer of that paste tool you are welcome [17:22:52] Coren: btw I enabled error reporting and all that but, still see no php errors in any logfile :( [17:33:26] petan: If I look at ~tools.paste/error.log, I see "PHP Parse error: syntax error, unexpected $end in /data/project/paste/public_html/fail.php on line 4". [17:34:05] etherpad as paste tool to paste config of paste tool [17:34:11] :) [17:37:21] scfc_de: That was a test of mine. :-) [17:38:08] scfc_de: I wanted to make sure that the reason he didn't get error logs wasn't because of the lighttpd setup. [17:42:03] petan: So yeah; php errors /do/ end up in the log, and php gets executed correctly in your public_html [17:45:24] scfc_de: where do you see that? [17:45:29] scfc_de: I don't :( [17:47:24] hm [17:49:26] petan: What contains ~tools.paste/error.log for you? [17:49:45] scfc_de: lot of lines like 2014-03-07 17:47:09: (response.c.720) Path : /data/project/paste/public_html// [17:49:55] however I did find this error using grep... 
[17:50:21] for some reason it takes a long time for errors to be displayed in error.log [17:50:28] maybe some nfs sync lag [17:50:36] I did tail -f on it on -login [17:50:47] it takes a lot of time for the error to display when I refresh in my browser [18:00:48] Aaaahgh, and now my proxy isn't working anymore [18:01:30] Coren: back to CP's problem: mysqli_connect(): (HY000/2003): Can't connect to MySQL server on 'tools-db' (99) [18:01:57] OS error code (99) indicates that the server is running out of resources [18:02:51] if you issue $perror 99 on mysql host it should say something similar [18:03:05] hedonil: ... that's definitely broken. There are 7(!) connections atm; and the box is completely idle. [18:03:33] Load is 0.18, and 24G free ram. [18:04:15] Coren: hmmm 7 connections is not that much ;) [18:04:28] 8 when I'm connecting to make a test. :-) [18:04:53] Coren, scfc_de: 2014-03-07 17:32:14: (mod_fastcgi.c.2701) FastCGI-stderr: PHP Parse error: syntax error, unexpected $end in /data/project/paste/public_html/fail.php on line 4 [18:05:01] this makes no sense, there is no such file [18:05:25] Coren: but JFI what does $perror 99 say on host tools-db (I can't issue that command...) [18:05:25] petan: scfc_de: That was a test of mine. :-) [18:05:42] OS error code 99: Cannot assign requested address [18:05:48] meh [18:06:15] Coren, hmmm why can't I reach 208.80.155.156 (eqiad proxy) from bastion-restricted1? [18:06:18] petan: I wanted to make sure php scripts generate errors without breaking yours. [18:06:26] andrewbogott_afk: as far as you know, is anyone managing to get any result from wikitech search? [18:06:30] Coren: let's see if CP can fix this with a persistent connection [18:06:37] ottomata: You can't reach floating IPs from within the virtual network. 
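The `perror 99` lookup above can be reproduced with Python's standard library. A small sketch; `describe_os_error` is a hypothetical helper, and because errno numbers differ between platforms it uses the symbolic constant rather than hard-coding 99.

```python
import errno
import os


def describe_os_error(code):
    """Return (symbolic name, message) for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# EADDRNOTAVAIL is the symbolic name for the error discussed in the channel.
name, message = describe_os_error(errno.EADDRNOTAVAIL)
print(name, "-", message)
```

On Linux, EADDRNOTAVAIL is error 99, which matches the "Cannot assign requested address" reading quoted in the channel. That error usually points at the connecting side's local address or port selection rather than at load on the database server.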
[18:07:09] ottomata: You have to use the internal ip (tools-webproxy) [18:07:26] Oh, wait, that's the general proxy, not tools' [18:07:30] yeha [18:07:37] just curious, as the proxy is borked again right now [18:07:44] But same idea. You can only reach the outside IPs from outside. You'll have to use the local IP from the bastion. [18:07:48] sometimes it works, sometimes it doesn't [18:07:54] aye [18:07:59] well, i can reach the local instance IP fine [18:08:04] and the web service works just fine [18:08:09] Where's Yuvi when we need him? :-) [18:08:10] but, it does not through the proxy [18:08:26] i can reach the proxy externally though, it just 502s [18:28:41] Coren: is it possible it crash because of memory limit? [18:28:55] Coren: there is no php error but I somewhat found where it die [18:29:13] ... not unless you managed to gobble up 4G of ram with a php script -- and if you do, I don't want it running on our servers. :-) [18:30:06] Coren: I doubt, it was running fine on apache before [18:30:44] What do you think the script is doing when it dies? [18:31:00] it creates new instance of a class [18:31:07] that is last thing that happen [18:31:16] then it just... finish? [18:31:18] wtf [18:32:15] it's extremely hard to debug anything because of slow responses from nfs [18:32:24] I have to wait 2 minutes for errors to get written in error.log [18:32:51] petan: That doesn't make sense. NFS is synchronous. [18:33:09] Coren: do tail -f error.log [18:33:14] refresh the webpage [18:33:22] it doesn't immediately show the error [18:33:29] it takes long time for that to happen [18:33:36] Yeah, it takes a few seconds to show but it's (a) a couple seconds, and (b) lighttpd buffering. 
[18:33:44] maybe [18:33:55] however the last line before it die is: $CI = new $class(); [18:34:18] 308 system/core/CodeIgniter.php [18:34:45] I inserted some syntax nonsense after this line, it never reach it [18:35:02] but if I put syntax error above this line it show in error.log [18:35:12] so this line "fail with no error whatsoever" [18:35:34] it looks like if the webserver just got killed with -9 or something [18:35:45] Or the script just ends in there. [18:35:57] And no, if the webserver got killed, you'd see /that/ in the logs. [18:35:59] it doesn't because otherwise my syntax error would happen [18:36:12] I put the syntax error on next line [18:36:34] Right, so obviously something happens in the class constructor. [18:37:31] seriously, this thing worked perfectly on apache, now it doesn't on lighttpd [18:37:45] this makes me think it's failure of lighttpd [18:38:04] petan: You're not making any sense. lighttpd is a webserver. It doesn't control what PHP does. [18:38:18] That's PHP runnning this script, not lighttpd. [18:38:30] howcome the same php did work on apache then? [18:39:04] It didn't. Apache is also a webserver; it doesn't run php scripts either. (Well, kinda if you use mod_php, but we didn't so it didn't). [18:40:10] You've got some environmental dependency you need to track down; perhaps you're using a php extension that was never puppetized? [18:40:51] You might want to try to ini_set('display_errors', 'On'); [18:41:07] It's not a good idea in production, but you're debugging. [18:41:32] You might also want to try to report E_ALL|E_STRICT instead of just E_ALL [18:46:30] memory_limit is set to 128M, which should be more than enough. [18:53:48] I am having trouble verifying the SSH Fingerprint of bastion-eqiad.wmflabs.org. Are it's fingerprints available on some known good page? [18:55:04] https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bastion-eqiad.wmflabs.org [18:55:46] Coren: Thanks. 
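Verifying a host key against the published fingerprint page comes down to comparing two strings carefully. The following is a small sketch, assuming fingerprints are either colon-separated hex (MD5 style, case-insensitive) or carry a SHA256: prefix (base64, case-sensitive). fp_match is a hypothetical helper, and the fingerprint values are placeholders, not the real bastion-eqiad key.

```shell
#!/bin/sh
# Hypothetical helper: compare a published fingerprint with the one the
# SSH client presents. Returns 0 on match, non-zero otherwise.
fp_match() {
    expected=$(printf '%s' "$1" | tr -d ' \t\r\n')
    actual=$(printf '%s' "$2" | tr -d ' \t\r\n')
    case "$expected" in
        SHA256:*)
            # base64 is case-sensitive: compare as-is
            ;;
        *)
            # colon-separated hex: drop an optional MD5: prefix, fold case
            expected=$(printf '%s' "${expected#MD5:}" | tr 'A-F' 'a-f')
            actual=$(printf '%s' "${actual#MD5:}" | tr 'A-F' 'a-f')
            ;;
    esac
    [ "$expected" = "$actual" ]
}

# Placeholder values for illustration only.
published="00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff"
presented="MD5:00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD:EE:FF"

if fp_match "$published" "$presented"; then
    echo "fingerprints match"
else
    echo "MISMATCH: do not connect"
fi
```

For a host key saved to a file, ssh-keygen -l -f <keyfile> prints the fingerprint to feed into such a comparison.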
[19:42:19] saluton, labsians [19:42:29] I totally read that as something else... [19:42:30] i am trying to migrate one of my tools on tool labs to eqiad [19:42:38] Btw, did someone murder petan in the last few months? [19:42:41] lol Damianz, i did too after i wrote it :p [19:43:00] im trying to 'finish-migration' in eqiad for one of my tools, but i get the following error: [19:43:01] Proceed with finalization? (Yes/No): yes [19:43:01] sudo: sorry, a password is required to run sudo [19:43:10] Coren sent an email about this - 1sec [19:43:13] ah [19:43:37] [Labs-l] Possible (minor) issue in migration of tools [19:43:43] i hadn't gotten to it in my inbox yet [19:43:53] The tool's maintainer list was not synchronized correctly. The fix is simple: simply go to the wikitech interface to make any change to the list of maintainers (such as add or remove a maintainer) and save the change -- you can undo the modification immediately afterwards [19:44:01] aye [19:44:03] thanks Damianz [19:44:39] np [19:45:06] imo a fix-migration-bug-001 command would be better, but meh priorities [19:45:11] Damianz: they tried [19:45:49] Damn you're alive... :P [19:46:15] Don't suppose you saw my thing about icinga or the other thing that I forgot [19:46:24] Damianz: It has to be done from the wikitech interface, so no commandline. [19:46:36] Coren: Buuut it just edits ldap in the backend? [19:47:05] Hmm I just installed Flux again... how do people work with funny colours like this =\ [19:47:25] Damianz: As well as change its own status or whatever. [19:47:50] :( [19:48:22] In that case I re-argue the point of wikitech having a real api until such time that it just becomes a dumb client to openstack and I can just use my details directly [19:48:42] Iz not arguing. [19:49:13] True... but seriously!? ;) [19:49:48] or maybe some sort of, um, "purge" command? [19:50:08] Also... on a seperate note - any chance of tools-login or bastion listening on port 443 for ssh? 
Would make fixing bots from work possible without using corkscrew, which is horrid. [19:51:33] Damianz Coren: for some reason i am not seeing either of my tools (bingle and bugello) on tools.wmflabs.org [19:51:39] they used to be there... [19:52:04] "see on tools"? [19:52:25] Damianz: http://code.kryo.se/iodine/ ? [19:52:46] Damianz: i know it's not the answert, just wanted to add to the list of tools:) [19:53:17] Coren: under the list of 'hosted tools' on tools.wmflabs.org/?list [19:53:33] i need to update the maintainers for bugello since i got the sudo error during migration [19:54:30] awjr: You don't need to go to all that trouble: https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup [19:55:08] mutante: Cool - we block dns too. I mean I could bounce though to the internet via like 4 hosts internally over many restricted networks... but 80/443 just go straight though the proxy fine heh [19:55:55] awjr: Are they on tools-eqiad.wmflabs.org? the proxy works great, but the indexes aren't merged [19:56:19] ah yeah they are Damianz [19:56:35] thanks Coren - that looks better [20:06:28] Coren: what are the possibilities to debug the lighttpd server? For one of my tools it suddenly stopped working, webservice restart does not work either. error.log has no info, access.log has no entries since the problem first occured. Calling a page doesn't give an error page but no page at all [20:06:52] "no page at all?" [20:07:15] apper: That means that whatever script is running isn't giving an error, just no output. PHP? [20:07:28] "Waiting for tools.wmflabs.org..." [20:07:44] http://tools.wmflabs.org/wikihistory/test.php [20:08:12] okay, it's php.... [20:08:15] http://tools.wmflabs.org/wikihistory/style.css works [20:08:35] apper: There is no output because the script is running [20:08:41] test.php is just [20:09:00] ? [20:09:08] apper: missing " ? [20:09:22] it's of course ;) [20:09:23] apper: Well, obviously your lighttpd is currently stuck servicing previous requests. 
[20:09:32] it worked hours ago [20:09:47] Coren: hmm... [20:09:49] Damianz: I barealy see anything. you need to repeat it :P [20:09:58] Damianz: that means "repeat it when I am looking" [20:10:02] I am looking now :3 [20:11:17] Coren: there are a lot of requests, which could run for some time (waiting for a job to be ready). But I have "if (connection_aborted()) exit;" in this waiting loop [20:11:48] Coren: and at the moment all browser windows are closed, so they should be exited... [20:11:54] @notify andre [20:11:54] I'll let you know when I see andre around here [20:11:55] @notify andre_ [20:11:55] I'll let you know when I see andre_ around here [20:11:56] apper: Lemme go see. [20:11:59] @notify andre__ [20:11:59] I'll let you know when I see andre__ around here [20:12:03] :o [20:12:21] that guy really should use less weird nicks [20:12:39] what do you need from Andre__ [20:12:52] mutante: he sent me private message [20:13:02] mutante: I replied and then I figured he is offline [20:13:40] apper: Well, what I can tell you is that right now all five slots of your lighttpd are running scripts. [20:13:41] he is one of those who join irc, ask a question and disappear to mysterious places [20:13:41] petan: got it, you could try memoserv. it will even send a message to him when he joins again [20:13:54] nobody reads memoserv [20:14:10] well. I don't, maybe some people do :) [20:14:24] i agree they should just leave their clients online:) [20:14:43] I answered just 20 minutes late [20:14:56] Coren: hmm... okay... than these scripts do not exit when the browser connection is closed, so connection_aborted() doesnt seem to work. I will check this and rewrite it another way [20:15:02] Coren: thanks [20:15:03] Damianz: now you disappeared to mysterious place [20:15:27] Damianz: I am no longer looking send me a mail :] [20:15:28] apper: IIRC, connection_aborted() only works under limited conditions. 
[20:15:29] petan: maybe wmbot could mail people if it gets email addresses by nickname from somewhere :) [20:15:37] hehe [20:15:45] sounds like an idea [20:16:03] or maybe it could control remote dron that would seek them and slap them with a message [20:16:04] heya, does anyone know [20:16:10] petan: lol [20:16:13] is the svn group / uid 550 special in ldap? [20:16:14] somehow? [20:16:14] It /does/ sound like an idea. A very bad one. [20:16:28] Coren: hence my joke [20:16:32] Internal error [20:16:33] The URI you have requested, http://tools.wmflabs.org/wikihistory/test.php, appears to be non-functional at this time. [20:16:37] Coren: why does "webservice restart" don't kill everything? [20:17:06] apper: because webservice != killall5 :-) [20:17:30] btw don't run killall5 just to figure out what it is pls [20:17:47] petan: then: How can I kill php threads started by lighttpd webservice ;) [20:18:02] apper: use I am almost sure that qdel should kill it [20:18:12] Damianz: here you are meh [20:18:15] petan: I sent you an email :P [20:18:19] aha cool [20:18:24] Like yesterday [20:18:26] but anyway [20:18:32] it didn't arrive [20:18:35] hmm [20:18:37] your e-mail is broken [20:18:37] work email? [20:18:40] probably [20:18:41] Damianz: thanks for the reminder, I always forget about this, but normally I should know ;) [20:18:48] work e-mail? [20:18:55] you even know it? [20:18:56] Could you add me to nagios as a project admin so I can build an instance in eqiad and fix the current broken one [20:19:06] you aren't? o.O [20:19:15] I use to be.... [20:19:19] petr.bena@acc....e.com [20:19:22] but seriously, it's no different from wiki mail [20:19:29] even though i also wasnt serious [20:20:24] Damianz: omg where do people get it from o.O it had to leak somewhere [20:20:26] Damianz: fixed [20:20:31] Thanks [20:20:37] Coren: is there a possibility to manually kill all php processes created by lighttpd webservice? 
[20:20:40] Pretty sure you emailed me from it sometime or something [20:20:42] Damianz: you better use my personal mail [20:20:49] gmail? [20:20:50] Damianz: maybe linkedin [20:20:52] yes [20:20:59] * Damianz updates his contacts for auto complete thing [20:21:05] I think this is on my linkedin maybe [20:21:12] that's where u got it from I guess [20:21:16] apper: 'webservice stop' should. [20:21:22] No idea - it's the default in my contacts for you [20:21:44] Coren: is "webservice stop" + "webservice start" something different than "webservice restart"? [20:22:06] It shouldn't be. [20:22:21] Coren: okay, both seperate didn't work either... [20:22:30] Huh. Interesting. [20:23:12] lemme murder them violently. [20:23:26] Coren: I'm pretty sure it worked in tampa [20:23:46] Coren: I had this problem some weeks ago, and a "webservice restart" did it [20:23:52] apper: Yeah, there's something wrong atm [20:24:08] Your CGI appear indestructible. :-) [20:24:13] hehe [20:24:16] Can you webservice stop for me? [20:24:40] done [20:24:53] apper: Okay, they're all dead now. [20:25:06] okay I'll restart [20:25:07] mmmm murder [20:25:29] thanks, works now [20:25:49] apper: Not sure how your phps could wedge themselves that hard, I had to kill -9 them [20:26:04] uh [20:26:44] (BTW, if that ever happens and I'm not around, you are perfectly allowed to log onto tools-webgrid-01 and kill them) [20:27:36] connection_aborted() doesn't work if no output is done before... [20:27:48] I think that was the problem, I will do some tests [20:27:50] thanks [20:28:43] I knew there were some gotchas with connection_aborted() [20:28:52] I love the MOTDs [20:40:31] Anyone around here an enwiki admin and want to do a quick edit on a restricted page (under a user I have access to) [20:49:44] Coren: Are you dealing with labs puppet stuff or is andrewbogott_afk atm? 
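The escalation Coren describes above (try to stop the processes, then kill -9 whatever survives) can be sketched generically in shell. This is only an illustration, not what webservice stop does internally. The function name stop_then_kill and the default 5-second grace period are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM to a PID, wait up to a grace period,
# then fall back to SIGKILL if the process is still there.
stop_then_kill() {
    pid=$1
    grace=${2:-5}
    # If the signal cannot be delivered, the process is gone (or not ours).
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        # kill -0 only probes for existence; note that an unreaped child
        # of this shell still counts as existing.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Demo: a process that ignores SIGTERM, standing in for a wedged script.
sh -c 'trap "" TERM; exec sleep 60' &
victim=$!
stop_then_kill "$victim" 2
wait "$victim" 2>/dev/null
status=$?
echo "exit status: $status"
```

On the tools grid this would be run as the tool account on the host that actually runs the jobs (tools-webgrid-01 in the log), against the PIDs of the stuck PHP processes.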
[20:56:50] * Damianz sigh [20:59:06] Damianz: i'd say both [20:59:11] git log:) [20:59:25] lol [20:59:31] Well I have a general meh [21:00:08] Refactoring things into modules is great, production manifests, great... puppet classes in wikitech are broken. Are we going to migrate the classes in the db to the new ones, so the interface is right and ldap so puppet is right? [21:00:29] Right now misc::ircecho -> ircecho means puppet doesn't run, which makes me grumpy since I didn't break it and I can't edit the global classes [21:00:39] Damianz: "puppet classes in wikitech"? [21:00:42] So I've got to add a custom class, copy all the settings etc to make ti work, when it should be migrated [21:00:59] Yeah - the 'assign puppet classes to instance' thing [21:01:03] oh, you mean the puppet groups thing [21:01:06] to configure instances [21:01:17] mhm [21:01:17] i wasn't aware they are broken [21:01:23] out of sync with the repo? [21:01:28] or how [21:01:45] because there is just a single one [21:01:51] Yeah - so for example misc::ircecho is now the module ircecho. Which needs fixing in the db (for the interface options), but also ldap (for actual puppet enc) [21:01:53] there isn't labs vs/ prod branch anymore [21:02:12] Just an interface/assignement issue - but annoying because we have this desynced thing [21:02:15] uhm.. that sounds like a real issue [21:02:21] if that is not in sync [21:02:26] with the normal puppet repo [21:02:55] i just did not notice because i was using puppetmaster::self the other day [21:03:02] Basically everytime something is moved, not only do the manifests in the repo need refactoring. But migrations for mysql/ldap need writing and running [21:03:02] i would have ran into it for sure [21:03:30] "something is moved" = any puppet merge? 
[21:03:38] Potentially [21:03:45] Required options changing, things moving, being renamed [21:03:46] well, i wouldn't know how else to explain it [21:03:50] Anything that breaks anything =\ [21:03:55] arg :( [21:04:07] * Damianz doesn't really like the split labs/prod thing... same as monitoring, refactoring is a PITA for keeping labs working [21:04:22] but the thing is , it's not split anymore [21:04:33] it should just be what is being merged [21:04:36] something broke [21:04:42] Well - in git its not... usage is (exported resources, ENC vs nodes) [21:04:51] and i wonder how this can be without others like hashar complaining [21:04:53] in beta [21:05:27] what do you mean by refactoring? [21:05:35] adding a case for $realm ? [21:06:06] monitoring, that is totally unrelated isnt it [21:06:17] because labs nagios has never been prod. nagios [21:06:25] Using $realm for stuff, just makes difference cases and diverges things more [21:06:29] which i keep pointing out as being really unfortunate since before it even existed [21:06:36] I think beta gets away with it using a lot of $realm [21:06:40] uhm... something happens to tools-login? [21:06:54] of course, $realm, but that's in every role class [21:07:00] how else are you going to do it [21:07:09] oh fuck [21:07:21] ircecho now takes a map... I can't pass maps from wikitech IIRC [21:07:29] example $apache_site="something.wikimedia.org" [21:07:36] vs. 
$apache_site="something.wmflabs.org" [21:07:42] you have to do some $realm thing [21:07:54] but that's usually it [21:08:02] Or $domain - yeah, those cases aren't really avoidable [21:08:05] apache site name and maybe certificate [21:08:30] you should just have one of them in the role class [21:08:36] and the module should be independent [21:09:03] the $real case in /mainfests/role/something.pp and set the variables there [21:09:20] then let it call ./modules/something/init.pp [21:09:24] which can be generic [21:09:33] s/$real/$realm [21:10:04] Ideally we should have production and labs hiera files and not use cases in the role files - but that's a different discussion [21:10:10] when configuring instances, just apply role class [21:10:29] yea, maybe, just saying how we all do it [21:10:31] not just beta [21:10:49] so back to the issue, if the puppetmaster is out of sync [21:10:51] that needs fixing [21:10:57] but is it really? are you sure? [21:11:15] because then i really wonder how hashar could work, we merged stuff for him [21:12:40] hi [21:12:41] The puppetmaster is in sync with git [21:12:51] Wikitech isn't in sync with the groups/classes [21:13:04] And some changes made to git are not compatible with labs/wikitech AFAIK [21:13:22] then " Wikitech isn't in sync with the groups/classes" is a bug for sure [21:13:29] So for example, misc::ircecho moving to ircecho broke puppet on labs instances, because the ENC was not updated [21:13:56] yeah because we dont keep back compatibility classes [21:13:57] Also, because ircecho now takes a map for the channels, rather than a log and a channel value as a string... 
I can't use it - the ldap ENC doesn't support maps [21:14:12] in theory this would be the other way around, right [21:14:20] labs is for testing stuff that gets into prod :p [21:14:24] Yeah [21:14:38] But we can't do that - because labs and prod run off the same git branch [21:14:54] they used to be 2 separate branches [21:14:58] Though apparently even when we test stuff it gets into prod broken *hehem varnish issue* [21:14:59] caused even more issues :p [21:15:00] if we had a dashboard/reporting of puppet failures on labs instance, that would help catch up [21:15:02] petan: around? [21:15:07] that's why we stopped that [21:15:14] liangent: no [21:15:40] :/ [21:15:43] petan: add me to http://tools.wmflabs.org/paste/ ? [21:16:03] where does ircecho run? [21:16:20] nagios-icinga [21:16:25] so, in labs [21:16:25] * SamB backs slowly away from whatever mess he just tab-switched into the middle of [21:16:34] (even if it is a tree widget rather than a tab bar!) [21:16:43] done [21:16:48] Damianz: imho the root cause is wrong worklow [21:16:51] liangent: u there [21:16:55] it should be tested in labs, and then run in prod [21:17:05] if it's .. well.. prod [21:17:10] petan: thanks [21:17:17] It should be tested in labs, run in labs then run in prod [21:17:29] But yeah - shits broken *grumpy* [21:17:47] yea, well, git blame the person who touched ircecho [21:17:53] and make them add a $realm case [21:18:34] Won't fix the problem due to the lack of map support in the ldap enc... will need a misc class to handle a single chan/log ircecho instance, which is hacky [21:18:59] if you are saying something is impossible to run in labs [21:19:06] then that's another bug:) [21:19:11] And people changing git, don't have access to wikitech to do real migrations and making them write real migrations is evil workflow... 
I think wikitech needs automated logic to handle that side [21:19:30] why if I link to /blahblah it links to http://tools-webserver-02/blahblah instad of http://tools.wmflabs.org/blahblah? [21:19:35] I might just spam lots of grumpy in bugzilla, then people can email slap me down [21:19:43] automatic logic = fix the sync [21:19:51] and jenkins [21:20:16] fale: using $_SERVER or something? that's the real hostname behind the proxy [21:20:53] Going back to early bitching - if wikitech had an api, then jenkins could spin up instances to test puppet changes and avoid this [21:21:04] Damianz: uhm... is possible that apache adds the $_SERVER? [21:22:42] Sorry - I'm assuming you're using php [21:22:51] Got a real url example? [21:23:47] Damianz: http://tools.wmflabs.org/isbn2tpl/ [21:23:55] Damianz: yep, it's php :( [21:24:18] petan: I can't find anything strange now after local-paste@tools-dev:~$ mv ...DATA.public_html public_html [21:24:41] petan: new test paste http://tools.wmflabs.org/paste/view/73077ec0 [21:25:02] liangent: interesting, can you explain to me why it didn't work before [21:25:24] petan: because there was no public_html ? [21:25:44] oh wait [21:25:53] did you move it there on OLD tools? [21:26:06] because I am trying to migrate this to new tools cluster [21:26:53] petan: new tools cluster? [21:27:02] fale: yes, eqiad one [21:27:04] fale: You have the most complex index.php even... digging though the code [21:27:16] Damianz: I have more complex one [21:27:23] You write shit in c# [21:27:30] :o [21:27:39] I even write shit in assembler when I am bored [21:28:08] recently I rewrote few of my drivers cuz they didn't work as well as I wanted :P [21:29:09] Coren: how do I figure out which webservice is serving the webpage? [21:29:21] fale: What is url set to in app config.php? 
[21:29:22] Coren: if it was lighttpd or old apache from old cluster [21:29:27] petan: I'm assuming they still didn't work as well as you wanted afterwards [21:29:37] That link_to function is using app('url') [21:29:39] or you're blessed with low expectations [21:29:43] So I'm guessing some config is wrong [21:29:46] SamB: they do perfectly work now [21:30:13] or you had some very nice hardware with shoddy drivers [21:30:21] SamB: otherwise I wouldn't be even connected they were wi-fi drivers [21:30:30] SamB: no I have too new hardware for linux :P [21:30:41] huh I was thinking about simply "making it running" [21:30:59] I even haven't read stuff about migration [21:31:09] oh, somehow I'd gotten the impression that you were hand-patching driver binaries ;-) [21:31:13] liangent: yes, but in few weeks the old cluster will be mercilesly deleted and the tool will stop working if we don't migrate it [21:31:18] (or better to say, didn't read anything about ToolLab migration) [21:31:35] liangent: you better read it if you run anythin there [21:31:45] otherwise all your tools are in danger [21:32:11] but everything I saw is about instance migration [21:32:49] liangent: are you even subscribed to labs-l [21:32:52] random thought (sorry about the interjection): but would it be good to have a notice on wikitech.wikimedia.org about the transition? [21:33:34] petan: "[Labs-l] Tool Labs migration instructions" found it :) [21:34:13] liangent: let's just create a new tool paste-test [21:34:22] I will delete it once we make it work [21:34:29] so that /paste works until we find the issue [21:37:21] petan: have you created it? become: no such tool 'paste-test' [21:37:29] sec [21:39:08] it's being created I guess [21:39:28] no more toolwatcher on -login :/ [21:41:55] (is there a canonical "This is what the labs migration means to you" page?) [21:43:23] petan: in meantime... should I try to migrate my own tools first? 
[21:43:33] I don't know [21:43:35] or should I wait for paste-test as a practice for me [21:43:38] you probably should [21:43:40] fale: this kind of url sometimes appears if the trailing / is missing. know issue [21:43:44] but no guarantee it will work [21:44:14] <^d> Any labs opsen about? [21:44:45] greg-g: like, a notice on all user pages, or what? [21:44:59] +talk [21:45:24] SamB: I was thinking a site notice [21:46:10] did everyone who has a tool get emailed about this yet? [21:46:51] SamB: Coren is on that. mail seems not to work yet [21:46:59] * SamB has no tool [21:48:10] no tool? that is like not having a facebook profile [21:48:16] you better got get some [21:48:20] * go [21:50:43] petan: what's the current problem with paste? it's still running in old labs.. and w/o https :( [21:50:59] this is sooo sad [21:51:09] x [21:51:19] hedonil: I migrated today then I rolled it back [21:51:37] hedonil: it doesn't work on new cluster, it doesn't produce any error. it just silently die [21:51:42] petan: hmmm [21:51:48] what's tools-db databases ? [21:51:49] petan: hey, I only even got an account on wikitech because I wanted to use url2commons ... which then refused to work anyway, at least until it had been upgraded to support OAuth [21:52:00] I say it's problem of new cluster, Coren say it's problem of tool. [21:52:21] SamB: no tool, no facebook, poor you [21:52:31] hehe [21:53:37] I was thinking of setting up a copy of url2commons so I could see WTH was going wrong, because all I got was an erorr message that looked vaguely equivalent to 5xx ... [21:54:40] petan: mind, if I join the paste-debug-party? [21:55:44] ok [21:56:20] petan: so then add me to the maintainer's list [21:56:42] petan: Can you add projects? 
[21:56:51] Coren: I created new tools "paste-test" it just doesn't want to get working, I can't even become [21:56:57] Damianz: I can barely login [21:57:06] Damianz: but I can make sushi [21:57:20] Mmmm I'd totally have sushi [21:57:27] it doesn't look well but it taste so great [21:58:00] petan: newly created tools are not sync'd automatically to eqiad. ha a similar issue which Coren fixed by hand [21:58:19] ok I guess I have to wait for Coren because my hand lack powers [21:59:03] I can forcefuly create it's home folder, but I can't setup the mysql etc, new cluster has even more secret db root account and regular admins don't have access to it [21:59:12] I guess Core n have no trust in us :-) [22:00:48] hedonil: I can add you to paste tool as well but it's not easily possible to debug problem in there [22:01:05] hedonil: you would need to migrate the tool again which would break it and I don't like broken tools [22:01:15] so I would prefer setup a clone of this tool [22:01:19] on new cluster [22:01:25] petan: weel if it would have been easy you'd already fixed it ;) [22:01:30] and once the issue is fixed we just move the "production" [22:02:03] paste-test is supposed to be this clone [22:02:15] but it doesn't want to get created [22:02:23] petan: If paste -test will disappear at the end of the test, let's try it by force [22:02:40] so once Coren fix that I will clone the files of paste in there and we can start working on that [22:02:48] hedonil: what you mean by force? [22:03:04] petan: just create a tools folder in eqiad, [22:03:26] hedonil: I have no force. I am just an admin that implies "almost no powers" where user is "no powers" :P [22:03:28] Is there some funky page somehwere to request wikipedia admins to make an edit? [22:03:39] Damianz: edit to what [22:04:00] Damianz: did you ask if I am admin on wikipedia before? 
[22:04:03] cbng fpreport url to move to tools from the bots redirect that's broken [22:04:16] I asked if there was any enwiki admins around that wanted to make a quick edit before [22:04:24] aha I can do that [22:04:26] Damianz: isn't that just {{edit protected}} on the talk page of whatever thing? [22:04:28] maybe [22:04:56] Need https://en.wikipedia.org/wiki/User:ClueBot_NG/Warnings/FPReport changing from report.cluebot.cluenet.org/ to tools.wmflabs.org/cluebot/ [22:05:13] sec [22:05:19] It's rather annoying that I can login as Cluebot_NG, but not edit the page under the user... stupid full protection [22:06:12] Damianz: https://en.wikipedia.org/w/index.php?title=User:ClueBot_NG/Warnings/FPReport&diff=598608741&oldid=402309756 [22:06:25] Damianz: is it correct? I can also remove the protection if you need :P [22:06:36] Yep - perfect. Tyvm [22:06:44] Now the bot will stop reverting pages and linking people toa 404 [22:08:16] good :P [22:08:24] Damianz: you gonna be at hackaton? [22:09:09] I really need to hack into cluebot hehe [22:09:11] I don't suppose pages can have ad-hoc ACL rules, rather than requiring a certain clearance bit of any editor? [22:10:26] SamB: they can't because everybody knows that role based security is better than stupid group-based crap mediawiki uses. And mediawiki always uses the worse solution :P. But just the second worse, so it's still ok XD [22:10:29] I think I've done migration for two of my tools... [22:10:38] the simplest two :p [22:10:45] In Zürich? Hmm hadn't thought about it... will be about due a holiday then. I've got something I have to do on the 7th, but could make it. [22:11:12] Would probably end up massivly refactoring cluebot or strangling someone from ops [22:11:17] Damianz: lot of cookies and beer and stuff. 
maybe some programming too [22:11:38] petan: hey, at when I create pages they aren't marked as owned by a group named SamB [22:11:41] oh yes we eventually do some wikimedia related work there too [22:12:43] SamB: no but you can set up special protection levels that will give certain groups permissions to edit pages under this [22:13:17] for example you can create group "suparpowar" and set up a protection level "suparpowaronly" [22:13:33] if you protect the page using this, nobody except member of suparpowar will be able to edit this [22:13:46] I think [22:15:04] I definitely saw something like that, but in the end, mediawiki developers have this excuse "mediawiki was created for online encyclopedia that anyone can edit, so edit restrictions aren't very good. If you want better edit restrictions you need to use some proper corporate CMS" :P [22:16:11] and yet I don't seem to be able to edit other peoples' /*.{css,js} pages ;-P [22:16:44] btw there are very poor role based security implementations even in linux kernel, so don't expect these in software which doesn't need them, when they aren't even in software that should have them [22:16:48] user pages are user pages [22:17:10] * Damianz xss's SamB [22:17:13] SamB: these are specially restricted using a nasty hack [22:17:14] :P [22:17:24] Damianz: yes I know why, what do you think the ";-P" was for [22:17:34] which is almost as ugly like the hack preventing people from deleting main page on enwp :P [22:17:47] yes I read about that [22:18:15] I thought that was just protected [22:18:22] lol no [22:18:29] there 2 other protection layers [22:18:38] Damianz: no, someone actually deleted because someone else had told them it was impossible [22:18:38] you can't delete main page on enwp, neither move it [22:18:48] your computer would blow up if you tried [22:18:53] and they just had to see it not delete [22:19:23] But they had to be an admin todo that first? 
[22:19:27] Damianz: it's like hook_ondelete() { if ( $wikiname == "en.wikipedia" && $pagename == "Main_Page" ) { die(); } } [22:19:33] o.0 [22:19:34] Damianz XD [22:19:37] * ^d skims the armchair criticism, then goes back to work [22:19:39] Damianz seriusly let me find it [22:20:27] ^d: It's easier to criticise than fix... php [22:20:31] petan: it seems uid and gid of paste-test's home on new login are incorrect? [22:21:16] <^d> Meh, anyway we wouldn't have to have hacks against deleting the main page in wmf-config if people didn't think deleting main pages was a good idea. [22:21:25] <^d> Because here's a hint: it's not :D [22:21:31] SamB: Ctrl+F for cant-delete-main-page in https://noc.wikimedia.org/conf/CommonSettings.php.txt [22:21:34] Can't fix stupid [22:21:57] Damianz: I found it [22:22:18] oh yes it's in that file liangent just sent [22:23:07] I'd just remove the idiots access, that's just ugly [22:23:10] look for "Only use this shitty hack for enwiki. Thanks." [22:23:11] Hmm anyway [22:23:32] ^d: yes, I guess it's pretty much the same as looking down the barrel of gun you think isn't loaded ... [22:23:53] except you only kill the server and/or page, not a human being [22:24:13] <^d> Damianz: Well sadly people like imitating idiots. [22:24:14] no you only kill the page, which isn't even that important in the end [22:24:18] who cares about the main page [22:24:22] <^d> :) [22:24:26] Coren: Since you're sorta lurking - do you have a min to create a new project for me? [22:24:32] <^d> Anyway, deleting a page is easy. [22:24:36] Doesn't everyone get to wikipedia pages via google? [22:24:38] <^d> `mwscript eval.php enwiki` [22:24:41] so it's safe to delete a page with 4999 revs? [22:24:49] the server won't fall over? 
[22:24:50] SamB: of course :P
[22:24:58] <^d> ;-)
[22:25:19] SamB: counting that 5000 revs uses a function called estimate***
[22:25:22] Damianz: that's my point
[22:25:30] Damianz: I haven't visited main page for long
[22:25:36] it might overestimate the number of there're 4999 revs :p
[22:25:39] google gets you where you need
[22:25:41] *if
[22:25:50] it even gets you what you need in some cases :P
[22:25:54] <^d> liangent: That function is misnamed. It shouldn't be called estimateRowCount()
[22:25:58] <^d> Should be bullshitRowCount()
[22:26:02] lol
[22:26:20] by 4999 I mean "as many as it can have and still be deleted"
[22:26:26] the body of function is { return random(); } anyway
[22:26:55] <^d> mt_rand() is the best way to do stats.
[22:26:58] petan: hm what about the uid/gid issue?
[22:27:05] IMO main page exists so that pages can be marked as featured after they've been shown on it
[22:27:10] liangent: where
[22:27:13] petan: it seems uid and gid of paste-test's home on new login are incorrect?
[22:27:15] <^d> We did it in Special:Code for awhile when we didn't know how to count properly and just knew max and min.
[22:27:15] or, well, linked to from it
[22:27:22] <^d> So we mt_rand()'d the number of results.
[22:27:24] liangent: are they even there?
[22:27:28] liangent: they weren't last time
[22:27:38] liangent: like the folder wasn't even created
[22:27:51] wheeeee
[22:27:53] it's there
[22:28:06] what the hell
[22:28:11] it doesn't work
[22:28:13] but who cares
[22:28:18] also, so people can't create a new main page with, say, the goatse image or a video of that Rick Astley video ...
[22:28:28] huh it got fixed automatically?
[22:28:28] petan: so now we have a directory paste-test & new db creds in eqiad. you just need to chown /data/project/paste-test to tools.paste-test
[22:28:41] so http://tools.wmflabs.org/paste-test/ seems working now
[22:28:47] hedonil: no we don't have new db creds, because it's broken
[22:28:54] we don't have any db creds :P
[22:29:11] tools.paste-test@tools-login:~$ mysql
[22:29:12] ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
[22:29:16] wait...
[22:29:18] local? o.O
[22:29:39] tools.paste-test@tools-login:~/public_html$ mysql -h tools-db
[22:29:40] ERROR 1045 (28000): Access denied for user 'petrb'@'10.68.16.7' (using password: NO)
[22:29:47] petan: mysql --defaults-file=replica.my.cnf -h tools-db works fine what issue?
[22:29:49] why petrb? o.O
[22:29:58] oh this
[22:30:09] I thought we created this as .my.cnf as well for some reason
[22:30:13] but that was just my tools
[22:30:42] hedonil: you can type `sql local`
[22:30:45] that is shorter
[22:31:00] petan: yep works
[22:31:00] ok you are right
[22:31:06] I will clone the files now
[22:32:12] ok it's all cloned
[22:32:37] and yes, the problem is replicated
[22:32:42] now we get error 500 again
[22:32:49] hedonil: start debugging!
[22:32:50] XD
[22:33:22] hedonil: btw the cloned files are using db credentials of "paste" you might want to replace them
[22:33:40] petan: on it. btw I did a migration to create the directories/files in eqiad
[22:33:50] aha
[22:33:51] interesting
[22:33:59] I would expect them to be created automatically there
[22:34:08] since new tools should be only on eqiad
[22:36:42] btw hedonil to save you work
[22:36:55] petan: yes?
[22:37:52] the problems start at line 308 system/core/CodeIgniter.php
[22:38:08] hedonil: $CI = new $class(); seems to be where problems start
[22:38:18] until then everything works
[22:38:45] but I didn't get further in debugging because I am lazy :P
[22:39:12] you do the hard work I will take the glory :P
[22:39:30] petan: ok. just checking the basic webs (paths & variables) because that's almost the things that changed http://tools.wmflabs.org/paste-test/index2.php
[22:39:51] I think that all paths should be same
[22:48:01] petan: well I broke paste again by mv public_html ...DATA.public_html, because unless we migrate it again, new pastes will be lost...
[23:14:24] petan: and http://tools.wmflabs.org/paste/ fixed on eqiad
[23:15:29] by setting db connection data and adding rewrite rules
[23:27:57] petan: it's getting verbose now http://tools.wmflabs.org/paste-test/
[23:28:28] petan: is the database already migrated https://tools.wmflabs.org/tools-info/?dblist=tools-db which is paste db?
[23:30:31] petan liangent. already fixed, didn't notice - what was the error
[23:32:38] petan: liangent: but still doesn't work with https (insecure content blocked by browser)
[23:33:47] needs more // ?