[00:00:35] Damianz: I'm reading that you can broadcast to a port, and multiple clients can listen.
[00:01:49] broadcast is to all, multicast is to a group, unicast is to one
[00:01:56] Coren: ok, updated
[00:02:32] Damianz: Yeah, so we bag a port for the relay, and broadcast to it.
[00:02:42] Then anybody who wants to use can tune in.
[00:03:17] ermm
[00:03:21] we'd want multicast
[00:03:25] and yes if labs supports it
[00:03:38] Do you have some example code to read from a redis key?
[00:03:44] apparently none of tools has redis-cli installed :(
[00:04:04] Damianz: Python, import redis.
[00:17:23] Damianz: Would it be worth just making a twisted server to connect to? Or perhaps we're overengineering, and we ought to just have a local IRC?
[00:18:53] Anyone know how to reach eqiad web server internally (e.g. wget)? http://tools-webserver-01/TOOLNAME used to work, but no more?
[00:20:22] a930913: I've got an example client in node that isn't very flexible - anything that can listen on a socket and figure out where to send data would work though
[00:20:29] tools-webproxy-eqiad
[00:21:58] Damianz: "unable to resolve host address `tools-webproxy-eqiad'"
[00:22:41] tools-webproxy.eqiad.wmflabs has address 10.68.16.4
[00:22:56] There isn't really 'the webserver' to access now since it's all just new web stuff
[00:23:23] a930913: So if you subscribe to 'testkey' you'll get json of all edit data
[00:26:25] Damianz: Thanks, the IP seems to do the job. But, there should really be a better way to reach labs tools from within labs tools...
[00:26:44] * magnusmanske_ calls it a day...
[00:27:42] Interesting, tools-webproxy.eqiad.wmflabs isn't resolvable from pmtpa... hmm maybe Coren can fix that
[00:28:06] Coren: So are we supposed to delete everything in pmtpa once we're done moving to eqiad?
[00:28:25] anomie|away: You can, or you can be paranoid and leave it there.
[00:28:37] * anomie|away will be lazy and leave it there
[00:32:10] I removed my xxxG logs from pmtpa
[00:32:32] a930913: If you're interested/feel like hacking php https://github.com/DamianZaremba/cluebotng/commit/35f3768b1a461fddf93a3923bf22a6f1bfd05323
[00:34:58] Damianz: 'testkey'?
[00:35:22] it's a redis 'channel' or w/e it's called
[00:35:36] because I'm awesome at naming things
[00:36:47] Damianz: On tools-redis?
[00:36:53] mhm
[00:37:04] >>> r.exists("testkey")
[00:37:05] False
[00:37:06] ?
[00:37:35] in eqiad?
[00:37:59] Damianz: Wha? On tools-dev.
[00:38:06] Connecting to tools-redis.
[00:38:44] https://gist.github.com/DamianZaremba/d59946bccb69ce63c020
[00:44:38] Damianz: I'm not getting anything :/
[00:45:59] I had it stopped for a min to add extra data... but it's definitely pushing atm.
[00:47:02] Well, I'm not receiving anything using your code. :/
[00:47:38] Where are you running it from?
[00:47:56] Damianz: tools-login atm.
[00:48:10] run it from tools-login-eqiad
[00:48:34] Should there be a difference?
[00:48:48] yeah
[00:48:52] different dc = different box
[00:49:10] apparently you can't even connect cross dc to the redis port for some reason (I tried from a bots server earlier)
[00:49:21] Damianz: Aren't we both connecting to the same server?
[00:50:16] If you're using my code exactly on tools-login, you're connecting to tools-redis.pmtpa.wmflabs. I'm connecting to tools-redis.eqiad.wmflabs
[00:50:27] They aren't clustered/replicated
[00:51:09] Two servers with the same name, in a similar place, why?
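A note on the `r.exists("testkey")` result above: a redis pub/sub channel is not a stored key, so EXISTS reports False even while messages are being published to it. The following is a minimal sketch of a subscriber, not the code from the linked gist. It assumes the third-party redis-py client, that payloads are JSON as described in the discussion, and the default redis port 6379 (the port is an assumption; only the host name comes from the log). The helper name `decode_edits` is hypothetical.

```python
import json


def decode_edits(events):
    """Yield decoded JSON payloads from redis-py pub/sub events.

    `events` is any iterable of dicts shaped like the ones that
    PubSub.listen() yields: {'type': ..., 'channel': ..., 'data': ...}.
    """
    for event in events:
        # Subscribe/unsubscribe confirmations carry a counter in 'data',
        # not a payload, so only 'message' events are decoded.
        if event.get("type") != "message":
            continue
        try:
            yield json.loads(event["data"])
        except ValueError:
            # Not valid JSON (for example a partial record); skip it.
            continue


def main():
    import redis  # third-party client: pip install redis

    # Host name taken from the discussion; port 6379 is an assumption.
    client = redis.StrictRedis(host="tools-redis.eqiad.wmflabs", port=6379)
    pubsub = client.pubsub()
    pubsub.subscribe("testkey")
    for edit in decode_edits(pubsub.listen()):
        print(edit)
```

Calling `main()` only makes sense from a host in the same datacenter as the redis server, since the two `tools-redis` instances are separate and cross-datacenter connections were reported as blocked.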
[00:52:01] same reason tools-webproxy, tools-exec-0{1..8}, tools-login etc are all named the same but have different domains
[00:52:46] We kinda do the same thing at work, but also number them offset depending on which side so you definitely know where you're connecting because dns is silly
[00:53:49] In our case, the primary concern for the same hostnames is to ensure that the scripts will work with as little change as possible during and after migration. The *.pmtpa will just go away at the end.
[00:54:17] Wait, everything's moving?
[00:54:55] ... a930913, it has been announced at regular intervals on labs-l over the past several months, and explained in detail howto this morning.
[00:55:00] I kinda think it would be cool if pmtpa just became eqiad2... still isolated for the purposes of like omfg nfs is broken, lets test ceph
[00:55:26] Damianz: We'd rather physically move the servers to make labs bigger. :-)
[00:56:11] Also, the actual datacenter itself is being decommissioned.
[00:56:25] Coren: Where am I supposed to be signed up to to hear about these things? :p
[00:56:41] labs-l. You should be signed up there anyways.
[00:56:43] As far as the datacenter is concerned good riddance... considering the amount of fibre issues in the last few years
[00:57:35] Though it will sadly mean the death of some instances that have been around since the early years of labs, which are no longer actually used but left hovering
[00:57:53] I'm not sure why that'd be sad. :-)
[00:59:29] Coren: How do I sign up to labs-l?
[00:59:32] !labs-l
[00:59:32] https://lists.wikimedia.org/mailman/listinfo/labs-l
[01:00:10] The salient email being: http://lists.wikimedia.org/pipermail/labs-l/2014-March/002160.html
[01:08:44] * a930913 panics and breaks the glass.
[01:09:03] Coren: "sudo: sorry, a password is required to run sudo" on finish-migration.
[01:09:16] a930913: What tool?
[01:09:22] Coren: referencebot.
[01:09:50] Just so I can track down that infrequent bug, were you only recently added as maintainer to that tool? (Or was it created recently)?
[01:10:05] Coren: Nope, months ago.
[01:10:10] Hm. Odd.
[01:10:45] Doomed?
[01:10:48] All is lost?
[01:13:21] Hardly. All I have to do is poke the maintainer list. :-) You'll have to log off and back on though.
[01:13:48] Probably a stupid question, but how do I access my eqiad account once I run the migration script from my tools-login account?
[01:13:51] For some reason, /some/ service groups have an outdated maintainer list. Simply adding and removing a maintainer fixes it.
[01:14:21] TCN7JM: You can log in from the 'net via tools-login-eqiad.wmflabs.org, or from the old server by simply 'ssh eqiad'
[01:15:43] Alright, got it. Thanks.
[01:17:18] Coren: So what now?
[01:17:36] a930913: ... ? Your issue isn't fixed?
[01:18:03] Coren: I'm just being a muppet :p But it should be fixed without me doing anything more, yes?
[01:18:18] Hardly. All I have to do is poke the maintainer list. :-) You'll have to log off and back on though.
[01:21:13] Coren: LDAP is the authoritative source for service group members?
[01:21:40] Ah, wonderful. I guess we find out tomorrow if it explodes.
[01:21:51] scfc_de: It is, but right now there are two copies due to the migration between local-foo and project.foo
[01:22:04] And for a while, only the local- was being correctly updated.
[01:22:35] k
[01:31:59] Coren: What was the command to start the lighttpd?
[01:32:24] "webservice start"?
[01:32:40] * Coren nods.
[01:37:36] Coren: I thought log files weren't meant to be copied?
[01:37:58] They aren't, by default.
[01:38:41] Oh silly me, they're the new ones :p
[01:39:00] #2AMproblems
[01:42:44] petan: When are the wm-bots migrating?
[01:43:11] When one of their maintainers migrates them. :-)
[01:44:09] I'm running blind until then :p
[02:00:26] UDP I said, as it won't break anything, but no, he had to use TCP, so my code now throws a hissy fit when it can't connect. D:
[02:47:16] andrewbogott: We fixed the proxy stuff. You can now access drmf.wmflabs.org
[02:47:33] Howie_: great! Are you still hitting the timeout, or is that resolved?
[02:48:26] hold on let me test
[02:48:53] yes that worked!
[02:49:00] cool
[02:50:39] You can see ... ? http://drmf.wmflabs.org/wiki/Zeta_and_Related_Functions ?
[02:53:21] Howie_: Yep, looks good.
[02:53:51] If there's a hope of rolling whatever that is into production it'll probably need some performance work :)
[02:54:54] andrewbogott: How would that work?
[02:55:18] I think it's slow just because of the number of equations which need to be rendered. Do you think it could still be made faster somehow?
[02:55:48] Howie_: Um… I have no context for what that page is doing or what it is for. I'd recommend you check in with Ori about performance, he can probably judge whether or not it will be a problem.
[02:56:13] Yeah, could be just the large number of equations.
[02:56:13] In that case, one way of making this faster would be just to have a smaller number of equations on a given page. This is easily doable.
[03:26:37] Hi Coren, still busy with the migration? Remember you installed fastcgi for me a while ago. On eqiad I get "/data/project/zoomviewer/cgi-bin/iipsrv.fcgi: error while loading shared libraries: libfcgi.so.0: cannot open shared object file: No such file or directory" when lighttpd tries to launch my fastcgi program
[03:27:07] I think I used to link it statically (but cannot remember how I did that)
[03:27:28] is libfcgi missing on the web nodes?
[03:27:51] dschwen: It might never have been installed there; if you linked statically you only needed it where you built.
[03:33:50] yeah
[03:34:06] ok, I'll try to get it to link everything statically for now
[03:34:50] but there would be less surprises if the lib was actually installed on the web nodes
[03:36:24] dschwen: That can be done relatively easily -- please open a bugzilla for tracking. I'll be able to hack through install requests in a few days once most people have migrated.
[03:36:42] ok, thx
[03:46:57] Coren, I just migrated orgchart.eqiad.wmflabs to eqiad and I can't access it. Can you?
[03:47:09] I wonder if I'm making some dumb mistake w/the security group...
[03:47:41] lemme check.
[03:48:10] thanks
[03:49:14] andrewbogott: What project is it in?
[03:49:20] 'orgcharts'
[03:51:55] Works from tampa. Maybe it doesn't allow two different security groups for the same port/proto and only picks the one?
[03:52:25] I put 10.0.0.0/8 in for the other ones, I haven't tried two.
[03:52:26] I just added the rule, maybe there's a lag
[03:52:32] although I don't know why...
[03:52:36] But, ok, I'll try that.
[03:52:42] http is working so it's clearly a firewall thing.
[03:52:46] thanks!
[03:53:03] andrewbogott: In other news, tools migration is proceeding smoothly.
[03:53:34] Yeah, so I see! Nice work -- I hope you weren't sleepless all weekend writing those scripts.
[03:53:56] Monday was long. :-)
[03:54:14] Project migration seems to be going OK. There are going to be a TON of orphan projects.
[03:54:23] Or, at least, projects that no one cares about until I break them :)
[03:54:43] andrewbogott: It's not like that's a bad thing really.
[03:54:54] nope, should be fine.
[03:55:48] Coren, good guess. Replacing the rule rather than adding a second solved the problem.
[03:57:06] andrewbogott: We probably want to change the default.
[03:57:41] I think the default is already correct for new projects. But I'll make sure.
[03:59:43] * Coren hasn't checked.
[04:36:34] 77 tools migrated; seemingly without issues. Yeay.
[04:38:55] 77/how many?
[04:39:06] soon to be 78
[04:51:52] andrewbogott: 600-odd, I think. Though I expect that many of those will end up orphaned too.
[04:52:31] Oh, 77/600 is pretty good!
[04:59:03] For a first day, yeah, I'm happy. I was expecting a slower and bumpier ride; but it's fairly smooth sailing to date.
[04:59:52] Coren++
[05:09:06] Whee, the database performance on the new Tool Labs is amazing. <3
[05:09:39] because no one is using it yet ;)
[05:09:48] give it a few weeks
[05:09:57] well, less people are using it anyway
[05:10:00] Pfft. Let's keep it that way then. :p
[05:10:58] But I'd say https://bugzilla.wikimedia.org/show_bug.cgi?id=55929 is resolved now.
[05:21:58] Ryan_Lane: I think he's referring to the fact that the DB is now feet away instead of 0.026 light-seconds away. :-)
[05:22:07] ah, yeah
[05:22:13] that makes a lot of sense ;)
[06:01:57] Coren: for some reason directory URLs that don't end with a slash get redirected to http://tools-webgrid-01:4068/. For example, this URL is broken: http://tools.wmflabs.org/pathoschild-contrib/stewardry but this one is fine: http://tools.wmflabs.org/pathoschild-contrib/stewardry/
[06:02:27] Is this a known issue, or something I'm doing wrong, or should I file a bug?
[06:41:50] Ryan_Lane: (repeat of email question): When you twiddled wikitech login sessions the other day, what did you do exactly?
[06:51:44] Um, crap, got kicked and can't tell if my last message was sent. So, sorry if this is a repeat...
[06:51:52] Ryan_Lane, when you twiddled wikitech login sessions the other day, what did you do exactly?
[07:17:32] andrewbogott: three things
[07:18:36] use labswiki;
[07:18:46] update user set user_token=null;
[07:18:53] truncate openstack_tokens;
[07:19:02] (that's in mysql, of course)
[07:19:09] * andrewbogott nods
[07:19:10] then I purged memcache by restarting it
[07:19:38] your session should work past a single browser session, but only if you opt to stay logged in
[07:20:10] Well, doesn't sound like anything you did could affect future behavior anyway.
[07:20:20] But I'm pretty sure the behavior is different… want to check for yourself?
[07:20:32] Maybe I've varied my behavior in some way that I'm unaware of...
[07:22:29] I logged in with "Keep me logged in", closed my browser, then re-opened it
[07:22:33] still logged in
[07:23:30] hm
[07:23:34] * andrewbogott tries it yet again
[07:23:57] which browser are you using?
[07:24:09] I tried in chrome and firefox
[07:24:32] ff
[07:24:50] yeah, logs me out. I'm sure I ticked the box
[07:25:49] ah. indeed. in safari it's not working properly for me
[07:26:59] Ryan_Lane: you can look at it if it interests you… if not, then not :) Pretty unlikely that you caused the change, now that I know what you did.
[07:27:23] hm. maybe I wasn't supposed to set it to null
[07:28:41] yeah, it's not updating my token
[07:30:21] in fact, it's nulling the token
[07:31:12] my openstack token is being set fine
[07:31:18] but my mediawiki token is not
[07:35:06] So 'user_token' isn't the current token, it's a template of some sort?
[07:35:17] Or is it just that null vs '' is tripping a bug someplace?
[07:39:15] andrewbogott: gixed
[07:39:16] *fixed
[07:39:21] apparently you can't null the tokens
[07:39:27] it won't regenerate them
[07:39:42] there's a resetUserTokens.php maintenance script
[07:39:47] I just ran that
[07:39:55] then I logged out and back in with the option selected
[07:40:19] good thing there's mediawiki devs around to help (aaron in this case :))
[07:40:44] Ryan_Lane: that's a bit obscure, but, great! thank you.
[07:41:04] yeah, so in the future the proper way to handle the user tokens is via that script
[07:41:14] but the truncate of openstack_tokens is still needed
[07:41:16] and the purge of memcache
[07:42:12] * andrewbogott looks on wikitech for a sensible place to note that down
[07:42:24] hm, but first I have to log in again
[07:47:13] https://wikitech.wikimedia.org/wiki/Help:Force_all_users_to_log_in_afresh
[07:51:21] you want to restart memcache last
[07:51:23] * Ryan_Lane edits
[07:51:30] first I have to log in ;)
[07:55:38] OK, I'm out for a bit, will be back tomorrow AM. Thanks again for fixing!
[08:02:35] !ping
[08:02:35] !pong
[11:51:55] hmm Coren in eqiad puppet seems to run with 2 errors on a fresh instance
[11:52:27] https://www.irccloud.com/pastebin/YYgPDRV8
[11:54:11] oh wait, I realise you're not here yet xD
[11:54:47] addshore: puppet might be triggered before /home had an opportunity to be configured/mounted
[11:54:53] ah no
[11:55:02] the NFS server has no home dir for wikidata :]
[11:55:06] labstore.svc.eqiad.wmnet:/project/wikidata-build/home failed, reason given by server:
[11:55:07] No such file or directory
[11:55:28] ;_;
[11:56:24] I assume that should be created on project creation (well before instance creation); could you file a bug?
[11:58:57] (Or for legacy projects that were created before eqiad Labs: By the Labs admins :-).)
[12:02:03] Will file a bug, is there any way you could poke the dir into existence now so I could carry on migration? ;p
[12:04:07] I don't have access to the NFS server; only andrewbogott_afk and Coren probably.
[12:04:21] ahh okay :) that's fine!
[12:04:31] (And other ops if it is documented :-).)
[12:08:27] for your reference https://bugzilla.wikimedia.org/show_bug.cgi?id=62252
[12:12:21] How can I add migrated tools to service groups? They don't exist atm, according to Special:NovaServiceGroup (functionality was broken before anyway, so perhaps best just to wait until pmtpa is RIP?)
[12:39:08] Hi, is it normal for migrate-tool to delete the public_html folder of the tool?
[12:41:02] Second question: is there a "standard" UI framework for the web tools of Tool Labs? I'm using Bootstrap now.
[12:49:23] pietrodn: I think it just moves public_html (could be wrong on that though)
[12:50:07] Okay, other question, when projects get mothballed, does that mean the instance gets mothballed and can be rebooted?
[12:54:24] jarry1250__: yes it will be in a shutdown state but you can always start it up again
[12:54:57] addshore: Okay, cool. It's just that I guess I don't need the instance rebooted for a while, but I will want it rebooted at some point :)
[12:55:29] should be fine then :) might be worth adding a note to https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration/Progress
[13:03:25] !migration is https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration
[13:03:25] This key already exist - remove it, if you want to change it
[13:03:30] !migration
[13:03:31] https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress
[13:03:34] :O
[13:25:18] Damianz: Should I be getting the redis string in chunks, or all at once?
[13:26:38] 'morning labs.
[13:26:45] * Coren reads backscroll.
[13:27:56] addshore: The project was created without the 'shared home' and 'shared project dir' option?
[13:28:15] * addshore does not recall
[13:28:25] Coren: waiting forever in finish-migration
[13:29:22] addshore: Manage projects -> configure -> checkboxes at the top. :-)
[13:29:36] zhuyifei1999_: Do you have databases on tools-db?
[13:29:56] likely yes
[13:30:16] That may take some time then, because it requires a dump and restore.
[13:30:29] lovely Coren :D so tick those 2 boxes and re run puppet and magic things will whir and things shall be fixed? ;p
[13:30:59] coren: it shows "That tool doesn't seem to be migrated yet"
[13:31:02] addshore: There's a delay before those get created though; ~5min as a rule. But yeah.
[13:31:09] awesome :)
[13:31:12] zhuyifei1999_: Which tool?
[13:31:27] yifeibot
[13:31:32] * Coren looks.
[13:31:55] zhuyifei1999_: It's still being copied between the two datacenters.
[13:32:30] Coren: when will it finish?
[13:33:26] That depends only on how much data there is to move. It takes several minutes /GB
[13:33:40] Coren: is it possible to make a copy of /shared/pywikipedia/ on eqiad, without losing the other one, so that bots don't break on migration?
[13:34:23] Alchimista: Yes. I saw the email. That's not a difficulty and in fact I have to do it since it's not 'part of a tool'. I'm going to do it shortly.
[13:35:02] Oh, you have *GOT* to be shitting me.
[13:35:09] XFS decides to break *now*?
[13:35:22] zhuyifei1999_: That's why it stalled.
[13:35:27] thanks Coren, any ETA? i forgot to check if /shared/pywikipedia/ was already on eqiad, so my bots are stopped now
[13:35:53] Alchimista: I fix the pmtpa server, then about ~15m later. So within half an hour.
[13:36:23] Coren: perfect, that way i don't need to install it locally, and then change all back :D
[13:36:28] Lol, wikipedia is going to break because all the bots are going down :p
[13:36:32] intersect-contribs: Scheduled for copy Wed Mar 5 12:28:39 UTC 2014
[13:36:41] It's stalled I think
[13:36:50] no DBs
[13:36:53] XFS broken. Will return shortly.
[13:37:12] Does anybody know redis here?
[13:37:44] a930913: Yuvipanda is the resident expert.
[13:38:08] @notify YuviPanda
[13:38:08] This user is now online in #wikimedia-dev. I'll let you know when they show some activity (talk, etc.)
[13:38:09] * anomie just looks things up at http://redis.io/commands when necessary
[13:38:36] * Coren roars at the pmtpa NFS server. 'Your days are numbered!'
[13:38:57] anomie: Either Damianz is doing something weird, or redis is splitting the messages to me :|
[13:41:41] @notify andrewbogott_afk
[13:41:42] This user is now online in #wikimedia-labs. I'll let you know when they show some activity (talk, etc.)
[13:42:01] petan: wm-bot on eqiad?
[13:42:08] a930913: not yet
[13:42:16] I know.
[13:42:27] a930913: there is this bug https://bugzilla.wikimedia.org/show_bug.cgi?id=62234
[13:42:32] Your TCP idea means BracketBot is now broken :(
[13:42:37] until it's fixed wm-bot doesn't move anywhere :/
[13:42:47] a930913: what do you mean?
[13:42:50] what TCP idea
[13:43:28] @replag
[13:43:28] Replication lag is approximately 00:00:00.6917770
[13:43:44] this one is on eqiad ^
[13:43:48] petan: Sending the, erm, relays?
[13:44:05] * Coren power cycles the stupid pmtpa NFS server.
[13:44:10] a930913: ok I have no idea what you talk about, what is the context?
[13:44:27] petan: @relay I think.
[13:44:29] Coren: xfs again ?
[13:44:30] so far I know you talk about something related to BracketBot (what is it?) and TCP idea
[13:44:43] matanya: Yeah. While in the middle of migration, natch.
[13:44:46] a930913: it perfectly works
[13:44:57] matanya: We're getting rid of that filesystem in <3 weeks.
[13:45:09] petan: Yeah, but it used TCP which means breaking pipes.
[13:45:17] Coren: ext4 instead?
[13:45:18] Silly emoticons making it hard to type less-than-three. :-)
[13:45:23] a930913: I see it's definitely sending in chunks of 8192 bytes of message payload. Do you know where his source is?
[13:45:33] a930913: you just need to relay to i-00000816.pmtpa.wmflabs
[13:45:36] matanya: Ayup. XFS has gotten really flaky under load in modern kernels.
[13:45:46] a930913: what do you mean by "breaking pipes"
[13:46:04] petan: UDP doesn't "break" but TCP does.
[13:46:12] define "break"
[13:46:15] Coren: interesting, rhel 7 ships xfs as default
[13:46:40] while deb based stuck to ext for years
[13:46:41] a930913: the difference between TCP and UDP is that UDP is simple data stream with no verification
[13:46:55] so it's not so reliable, but faster
[13:46:58] i wonder why is it this way
[13:47:01] good for video streaming etc
[13:47:23] not good for any kind of client / server communication where data are never supposed to be malformed
[13:47:32] sending data to IRC using UDP is not a good idea
[13:48:10] a930913: however, neither TCP nor UDP performs any "break" or whatever you mean
[13:48:41] petan: "IOError: [Errno 32] Broken pipe"
[13:48:46] pmtpa NFS on its way back up.
[13:49:08] a930913: what were you trying to do?
[13:49:44] zhuyifei1999_: migrations will resume as soon as it finishes starting up.
[13:51:14] anomie: https://github.com/DamianZaremba/cluebotng/commit/35f3768b1a461fddf93a3923bf22a6f1bfd05323
[13:51:20] Coren: is there an outage?
[13:51:30] Betacommand: XFS went down
[13:51:42] Betacommand: Yeah, pmtpa NFS server needed reboot. It's almost completely back up now.
[13:51:43] pietrodn: ah
[13:51:46] petan: Send to wm-bot.
[13:51:56] a930913: ok but how
[13:52:06] a930913: from which server to which server
[13:52:52] !newweb
[13:52:52] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[13:53:23] petan: eqiad grid to lab-bots and then to the one you mentioned above.
[13:53:46] NFS restarted. Things will unclog shortly.
[13:53:55] a930913: sending to 10.4.0.81 100% works
[13:54:03] a930913: that is what I do now
[13:54:27] @help
[13:54:27] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.0.0.4 my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features
[13:55:05] Is it good or not to embed jQuery code from an external CDN (Google, …) on Tool Labs?
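The "Broken pipe" error quoted above is what a TCP sender sees when it writes to a connection whose other end has gone away; UDP has no connection to break, which is the distinction being argued about. One common way to keep a TCP relay client from failing outright is to drop the dead socket and reconnect on the next send. The sketch below is only an illustration under assumptions: the class name `ReconnectingSender` and the `connect` callable are hypothetical and are not taken from BracketBot or wm-bot, and it targets Python 3, where `BrokenPipeError` and connection refusals are subclasses of `OSError`.

```python
class ReconnectingSender:
    """Send bytes over a connection, reconnecting once if the pipe breaks."""

    def __init__(self, connect):
        # `connect` is any callable returning a connected socket-like object
        # with sendall() and close(), e.g.
        # lambda: socket.create_connection((host, port), timeout=5)
        self._connect = connect
        self._sock = None

    def _drop(self):
        # Forget the current socket so the next send opens a fresh one.
        if self._sock is not None:
            try:
                self._sock.close()
            except OSError:
                pass
            self._sock = None

    def send(self, data):
        """Return True if data was handed to the socket, False otherwise."""
        for _attempt in range(2):
            try:
                if self._sock is None:
                    self._sock = self._connect()
                self._sock.sendall(data)
                return True
            except OSError:
                # Covers BrokenPipeError, ConnectionRefusedError and timeouts.
                self._drop()
        return False
```

A caller can treat a False return as "relay unreachable right now" and carry on, then succeed on a later call once the port is reachable again.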
[13:55:25] ok, maybe not 100% :P
[13:55:29] now it just stopped working
[13:56:16] zhuyifei1999_: Migration copies restarted. Sadly, the order is pretty much random so you ended up at some random spot in the queue. Sorry. :-( But the queue is pretty short. :-)
[13:56:34] coren ... labs seems to be down
[13:56:36] I can SSH now, good
[13:56:41] a930913: Well, I've determined it's something in his code. I just sent a message (manually) between two sessions with 9000 bytes of payload.
[13:56:49] GerardM-: XFS in pmtpa died again; it just restarted.
[13:57:39] thanks
[13:57:43] !log tools petrb: test
[13:57:47] Logged the message, Master
[13:57:55] pietrodn: use http://tools.wmflabs.org/static/
[13:57:56] a930913: yes you are correct
[13:58:04] a930913: that port is firewalled out for whatever reasons
[13:58:25] Coren: /shared/pywikipedia/ is actually currently linked to the pywikibot project, but I don't have the time to fix it now (at a conference this week, and holidays next week). If you could just make a static copy that would be awesome
[13:58:41] Coren: why is port 64834 firewalled pmtpa - eqiad
[13:58:43] valhallasw`cloud: Will do.
[13:58:44] sitic: didn't know of that, thank you!
[13:58:56] Coren: thanks!
[13:59:12] petan: (And there I was, thinking I was going mad. :p )
[13:59:12] a930913: I will eventually change the port... that is probably only solution now
[13:59:29] petan: Random traffic isn't allowed between datacenters; only within projects in the same DV.
[13:59:29] I don't expect any ops to fix it nor explain it
[13:59:32] DC*
[13:59:34] Coren: is there a way to access the status page for eqiad?
[13:59:36] Well, the database on eqiad is just FAST!
[13:59:49] Betacommand: Yes, through http://tools-eqiad.wmflabs.org/
[13:59:54] Coren: but another port works just fine
[14:00:03] Coren: only this one doesn't
[14:00:12] I mean maybe all in this high range
[14:00:21] but 5xxxx ports are fine
[14:00:39] a930913: If I had to guess, I'd guess it's his socket listener is reading buffers of 8192 bytes and then blindly publishing them, rather than trying to accumulate full records before publishing in redis.
[14:02:28] Aha! It looks like the actual XFS problem lives in yifeibot!
[14:02:37] (Or at least "one of")
[14:03:08] A tool is managing to break XFS?
[14:03:39] That's plain evil XD
[14:03:40] anomie: Well, no, the XFS bug is broken somewhere in that part of the filesystem. It's obviously not the tools itself. :-)
[14:03:50] s/broken/triggered/
[14:04:55] Ah, so there's some sort of corruption in the FS, and it just so happens that the corrupted file/block/whatever is accessed by that tool.
[14:05:32] anomie: I don't think there's actual corruption; xfs_repair never sees anything wrong. But there's something around there that makes the driver think so.
[14:05:32] a930913: fixed!
[14:06:06] Coren: after the migration my tool runs well with the OLD sql credentials… is that ok?
[14:06:18] a930913: wait, maybe no :o
[14:06:50] pietrodn: No credentials were removed yet; but you want to switch to the new ones before migration is complete.
[14:06:59] a930913: now it's fixed!
[14:07:05] a930913: same port
[14:07:35] a930913: the IP address will however change once wm-bot is in eqiad
[14:08:39] Holey carps! How many millions of files are there in that directory?!
[14:09:38] petan: \o/ The best part is that it autoconnected. Which means I must have done something right when making it :p
[14:12:09] Coren, ping
[14:12:30] !ask
[14:12:30] Hi, how can we help you? Just ask your question.
[14:12:30] a930913: autoconnected? :)
[14:13:47] Cyberpower678: ping
[14:14:03] petan: As soon as the port opened, it connected by itself.
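The guess at [14:00:39] describes a classic stream-framing problem: TCP delivers a byte stream, so one `recv(8192)` call can return part of a record or several records at once. A minimal sketch of the "accumulate full records before publishing" approach follows. It assumes records are newline-delimited, which the log does not confirm, and the names `RecordBuffer` and `relay` are hypothetical rather than anything from the cluebotng source.

```python
class RecordBuffer:
    """Reassemble delimiter-terminated records from arbitrary byte chunks."""

    def __init__(self, delimiter=b"\n"):
        self._delimiter = delimiter
        self._pending = b""

    def feed(self, chunk):
        """Add one chunk; return the list of records completed by it."""
        self._pending += chunk
        # Everything before the last delimiter is complete; the remainder
        # (possibly empty) is kept until the next chunk arrives.
        *records, self._pending = self._pending.split(self._delimiter)
        return records


def relay(recv, publish, bufsize=8192):
    """Read chunks with recv(bufsize) and publish only whole records.

    `recv` behaves like socket.recv (returns b"" when the peer closes);
    `publish` is any callable taking one record, e.g. a function that
    wraps a redis PUBLISH call.
    """
    buffer = RecordBuffer()
    while True:
        chunk = recv(bufsize)
        if not chunk:
            break
        for record in buffer.feed(chunk):
            publish(record)
```

With this shape, a subscriber receives one JSON document per message regardless of how the sender's writes were split on the wire.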
[14:14:10] aha ok
[14:18:24] BracketBot lives \o/
[14:20:02] Before the migration was announced, I could access my instances without a problem. After the migration was announced, my instances are no longer accessible from bastion with a reported error of Permission denied (publickey). Could this be related or is it just a coincidence?
[14:21:51] * a930913 goes off to uni.
[14:26:22] slevinski: It shouldn't be related; the announcement didn't actually scare the instances as far as I know. :-) What instances/project
[14:26:35] slevinski: It's probably just gluster being broken again.
[14:27:14] the signwriting project, instance i-0000070c and i-00000322
[14:27:34] Coren: virt10 is out of disk space ( 44 GB left )
[14:27:46] DISK CRITICAL - free space: /var/lib/nova/instances 44146 MB (3% inode=99%):
[14:28:18] hashar: They're *all* out of disk space. Not caring, they're getting decommissioned soon and instances are being migrated off. :-)
[14:28:28] ok ok :D
[14:28:30] IT'S SO FAST :D
[14:28:31] hashar: And instance creation is disabled.
[14:28:45] o really! that is nice
[14:28:56] Coren, how do I initiate the migration?
[14:30:09] Cyberpower678: If only there was an email with detailed instructions that was sent to labs-l and wikitech-l, as well as a wiki page with a copy of those instructions...
[14:30:30] slevinski: Yep. Insane gluster. Give me a minute to kick them.
[14:30:42] Coren, I got flooded with so many emails, I don't even know where to look for it.
[14:32:04] slevinski: Should be working now.
[14:32:36] !migration
[14:32:36] Cyberpower678: http://lists.wikimedia.org/pipermail/labs-l/
[14:32:36] https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress
[14:32:42] Coren: is there a way to check overall status of all migrations? how many are in the queue, which one is currently being moved...
[14:32:49] anomie, thank you.
[14:32:52] petan: No.
[14:33:11] petan: Well, yes, but not anywhere reachable by endusers.
[14:33:36] Coren: which job is doing this migration? I would like to figure out somehow how long the queue is :P so that I know how long I need to wait
[14:33:54] that job could probably keep some statistics on html page or something
[14:34:13] petan: There is no way you can get that information; this does not run within labs at all.
[14:34:24] mhm
[14:34:38] Coren: Not working yet. I started a new ssh to bastion, then tried to ssh to the instances. Both still report Permission denied
[14:34:41] petan: And right now, the tool that is being migrated will take some time; there's some 20 million files to move.
[14:34:48] o.O
[14:34:51] 20 millions wtf
[14:34:58] let me guess, it's some python bot
[14:35:58] slevinski: I see nothing wrong with them right now.
[14:36:14] slevinski: And I can log in fine.
[14:39:14] Coren: thanks for looking.
[14:39:55] slevinski: I know it sounds silly, but check that you are using the right ssh key, and that you're forwarding it correctly through bastion? :-)
[14:48:03] Coren: Yes, it might be ssh key related. I can set the ssh key through the wikitech interface. Is there anything I need to worry about after I am connected to bastion?
[14:49:15] https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[14:49:23] slevinski: Check the bit about proxycommand
[14:50:04] Otherwise, check two sections down about agent forwarding.
[14:51:03] Lemme go check your logs.
[14:53:30] ... slevinski, I don't see any /attempt/ at logging into those instances from you.
[14:56:19] Coren: I was using 'ssh i-0000070c.pmtpa.wmflabs'. I just tried an alternate 'ssh i-0000070c'. Can you check the log again?
[14:58:16] slevinski: I don't know what you are trying exactly, but I can tell you you're not actually reaching that instance. Where are you trying this from and can you try with -v and pastebin the result?
[14:59:58] Ah, this time I saw you reach the server, but not attempt to authenticate.
[15:00:13] OpenSSH_5.9p1 Debian-5ubuntu1.1, OpenSSL 1.0.1 14 Mar 2012 debug1: Reading configuration data /etc/ssh/ssh_config debug1: /etc/ssh/ssh_config line 19: Applying options for * debug1: Connecting to i-0000070c.pmtpa.wmflabs [10.4.0.188] port 22. debug1: Connection established. debug1: identity file /home/slevinski/.ssh/id_rsa type -1 debug1: identity file /home/slevinski/.ssh/id_rsa-cert type -1 debug1: identity file /home/sle
[15:01:32] debug1: Authentications that can continue: publickey debug1: Next authentication method: publickey debug1: Trying private key: /home/slevinski/.ssh/id_rsa debug1: Trying private key: /home/slevinski/.ssh/id_dsa debug1: Trying private key: /home/slevinski/.ssh/id_ecdsa debug1: No more authentication methods to try. Permission denied (publickey).
[15:05:14] slevinski: All right, you're trying to use keys on bastion -- that won't work, you shouldn't have keys here. You'll need to either set up agent forwarding or proxycommand like the document I pointed you at suggests. :-)
[15:05:21] https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[15:05:22] ^^
[15:07:01] Hmm. Interesting. I wonder why it broke. Luckily I have time for the migration.
[15:15:02] That tool is ridiculously huge!
[15:16:45] zhuyifei1999_: You're using tens of GB in millions of files. I don't know what exactly your tool does, but that kind of big data stuff is usually better done in a database?
[15:17:27] Oh, bah, he's gone.
[15:19:08] We're still stuck on the 20-million-file-tool? :P
[15:19:50] pietrodn: Yeah; the actual copy. That will... take some time.
[15:20:17] Just curiosity… how fast is the link between the two datacenters?
[15:20:20] should I worry if migrate-tool says "is prepating [sic] migration" since ~18 hours and I see no new files in eqiad? 'watchr', pretty small, no fancy stuff
[15:21:42] sitic: if it says 'preparing' then your run of migrate-tool actually failed.
[15:21:45] Coren, just for clarity, once I use migrate-tool, do I have to wait for the copy or is the copy complete once it says "All set [15:21:47] "? [15:21:56] Cyberpower678: That /queues/ the copy. [15:22:11] Coren, so in other words, I wait. [15:22:11] pietrodn: The link is plenty fast, this is disk I/O bound. [15:22:23] Cyberpower678: you can re-run migrate-tool to check the status of the copy [15:22:40] Scheduled for copy [15:22:44] Cyberpower678: Yes; sadly the queue is clogged until that humongous tool is copied. [15:22:58] Coren: yes, and having millions of little files is not going to speed things up, right? [15:22:59] Coren, what tool is that? [15:23:11] yifeibot [15:23:38] pietrodn: It's about halfway done, as far as I can tell. [15:24:35] sitic: I've reset the migration status; nothing had been done yet. Try the migrate-tool again? [15:25:03] Coren: it says "prepating" but I guess it means preparing ;-) Can I re-shedule or alternatively copy things by hand and ignore it? [15:26:07] thanks, scheduled for copy [15:27:04] I fixed that typo btw. :-) [15:29:23] Coren, Is there an eta to yifeibot completing migration? [15:33:18] Icinga giver bad gateway http://icinga.wmflabs.org/ [15:33:24] *gives [15:33:30] Cyberpower678: I'm guessing around 2h [15:33:48] O.O [15:33:54] Cyberpower678: Judging by the current progress and presuming the rate will remain roughly the same. [15:34:01] Just how big is that tool? [15:34:22] tens of GB over millions of files [15:34:28] WTF? [15:34:51] You'll want to ask the maintainer if you're curious about what it does. [15:35:10] * Cyberpower678 is reminded to clear out xtools caching data before initiating migration. [15:35:41] * Cyberpower678 also decides to wait a bit before scheduling a copy. [15:36:04] Coren: guessing your familiar with puppet and specifically creating users with puppet? 
:) And sorry to jump on you at such a clearly busy time ;p But I have https://github.com/wmde/puppet-builder/blob/master/manifests/builder.pp#L31-36 which creates a user by the homedir is owned by root as a default.. [15:36:15] Coren, I do have some DB questions. [15:36:35] and then I get fun things such as err: /Stage[main]/Wdbuilder::Builder/File[/home/wdbuilder/.ssh]/ensure: change from absent to directory failed: Failed to set owner to '51768': Invalid argument - /home/wdbuilder/.ssh [15:36:35] (if you dont have time ill try to find someone else to poke [=) [15:37:02] 1. Cyberbot used the old DB .my.cnf file before replica.my.cnf was introduced. What changes should I expect? Coren [15:37:41] Cyberpower678: Your tools-db databases won't be migrated automatically; you'll have to dump them and restore them. [15:38:01] Cyberpower678: If you don't have any, then it's not an issue and won't change squat. :-) [15:38:11] Coren, I have two. [15:38:28] And they're extremely to the bot's operations. :p [15:38:49] Coren, I thought migrate-tool dumped them for me. [15:38:57] addshore: You can't do that with a home on NFS. [15:39:35] Cyberpower678: It does, if you use the replicas.my.cnf. Dump them before you start the migration into files in the tools' homes and they'll get copied over. [15:39:35] cant chown a home? :P [15:39:51] addshore: Can't chown a home with to a user that doesn't exist in LDAP. [15:39:59] ahhhh, hmmmmm [15:40:36] * addshore guesses he needs to put this users home dir somewhere else then? ;p [15:40:41] Coren, how do I dump them. I haven't dumped a DB before. [15:40:55] addshore: Also, you want system => true, [15:41:15] addshore: Because you're trying to create a local system user. [15:41:35] okay [15:41:45] would I be able to chose a dir in /data/project? [15:41:49] *chown [15:42:05] Cyberpower678: mysqldump -h tools-db --databases databasename databasename... [15:42:39] addshore: Nope. Also NFS, also needs LDAP. 
Besides, thing of what that would cause: more than one instance trying to manage the home? [15:42:56] Coren: the old MySQL credentials for tools are still valid after the migration [15:42:58] Cyberpower678: That sends to standard output, so you probably want to add >mydumps.sql [15:43:13] pietrodn: By design, during the migration period. [15:43:27] Coren, thank you [15:43:33] Coren: ok, so we should definitely update those [15:43:54] pietrodn: Yep. [15:44:00] Ok, thanks :) [15:44:17] Coren: indeed, guess I have to stick it somewhere not NFSed then :P [15:45:24] matanya, rdwrer, are the new etherpad instances working OK? And, shall I redirect etherpad.wmflabs.org to point at the new eqiad etherpad-lite instance? [15:45:29] Coren: any suggestions for a sensible path? [15:45:57] My god. My DB is huge. [15:46:15] how huge Cyberpower678 ? :P [15:46:30] addshore, 4MB [15:46:37] Bigger than I thought. :p [15:46:56] I thought it was only a few KBs [15:48:59] Coren, successfully dumped DBs [15:49:45] brb [15:51:03] andrewbogott: I no longer have anything but data on that instance, so go nuts :) [15:51:49] rdwrer: Does that mean that etherpad.wmflabs.org is no longer used? Should I just reclaim the address? [15:52:13] andrewbogott: I kept it around for data recovery, but maybe I should send a mail saying "lol your data is gon' die" [15:52:25] I'm not sure what timeline we set for that. [15:53:18] I'll redirect the IP for now. I guess http://etherpad.wikimedia.org/ is officially supported now so the labs install is moot? [15:54:35] gooood morning [15:55:23] Coren: would I get /dev/vdb by adding in puppet " include labs_lvm " ? [15:56:36] hashar: You'll also need to create a volume: labs_lvm::volume { 'somename': mountat => '/mnt' } [15:56:47] ohhhh [15:57:23] (By default, the volume will be created with all the remaining free space) [15:57:23] shouldn't it be the default ? Since when we create an instance we have a bunch of storage available. 
I would rather have it mounted by default when creating an instance [15:57:28] unless that is to save disk space [15:57:57] hashar: It saves disk space, yes, but it's not by default because if you create a "all space" volume then it's too late to create a couple smaller ones. [15:57:59] andrewbogott: seems ok [15:58:07] Coren: make sense :] [15:58:17] Coren: will look at adding some puppet snippets on the role for which I need a /mnt [15:58:31] Coren: there might be no use case for beta cluster instances though. [15:58:34] hashar: Point me at 'em when you're set and I'll +2 [15:58:51] rdwrer: do you have a link handy that you could use to verify that old pads are still availabel now that I've moved everything? [15:59:32] Coren: regarding migrating beta. I will most probably rebuild it from scratch in eqiad. Since everything is puppetize that should not be too much of a problem. [15:59:48] andrewbogott: No, the service is purposefully not running, so no link exists [16:00:10] oh, I see, so you meant literally data preservation, not actual… providing of said data :) [16:00:15] hashar: It's the better solution if you can swing it; it gets you fresh images and avoids the quirks of having old schemes in the new DC [16:00:34] andrewbogott: Correct - I responded to one email about getting a pad off the machine, a long time ago - I think it may be time to retire the ol' girl [16:00:35] So, anyway, I will mark that project as 'done' and stop bugging y'all about it. [16:00:39] bd [16:00:39] hashar: Also, it tests your puppet config. :-) [16:00:54] Coren: they are the one from prod so it should be fine :D [16:07:54] Coren: would need the quotas for deployment-prep on eqiad to be raised. pmtpa currently has allocated 61 cores, 121856 RAM, 26 instances [16:08:12] hashar: public IPs? 
[16:08:15] I guess we could go with eqiad quotas of 70 cores 150000 RAM and 30 instances [16:08:20] public IP I got 5 that is enough [16:08:35] andrew assigned them yesterday and I allocated them on the varnish boxes that need the public IPs. [16:08:42] 10security gorups is enough [16:08:58] oh and I need more than 10 instances in eqiad 30 might be enough [16:09:26] one day we will have to review the number of core / ram / disk being consumed by beta and attempt to optimize the ressources consumption :D [16:10:44] ram 131072 34816 0 [16:10:44] instances 32 10 0 [16:10:44] cores 64 17 0 [16:11:22] Round numbers. :-) [16:13:11] zhuyifei1999_: Your tool is being migrated, but given that it has literally /millions/ of files, it'll take some time before it's done. [16:13:42] Coren: ok [16:13:55] zhuyifei1999_: What /does/ yifeibot do with all those files? Also, wouldn't it be a good idea to consider using a DB for such large amount of data? [16:14:07] coren thank you [16:14:14] I love how 64 and 32 are considered round # [16:16:32] Coren: i saw that /shared/pywikipedia/ is getting the repos, but /shared/pywikipedia/core/ is still missing, it's a work in progress, right? [16:16:58] Alchimista: Yes. [16:17:03] hi, I'm having problems logging into instance staging of mobile project, it simply hangs forever. tried rebooting, didn't help [16:19:13] Coren: maybe using 2 worker processes would speed up this process significantly [16:19:30] petan: I/O bound. [16:19:36] now everyone is waiting for 1 tool to get migrated and tools that were flagged for migration are already stopped [16:19:38] So no. [16:19:53] are you sure that 2 processes would kill the storage? [16:20:02] that makes it look like pretty weak storage [16:20:37] "kill"? No; but it wouldn't be any faster. [16:21:19] petan: Also, after talking with zhuyifei1999_, the copy is now rescheduled without the crazy big directory. 
[16:21:35] Coren: it would be faster because many other small tools would be migrated while waiting for this 1 huge [16:21:59] It sounds like an interesting statistics question, actually. Probably has been answered already, too. [16:22:38] of course, that total time of migration would probably be same, but that is not what I am talking about [16:22:40] it makes no difference for the mean copy time, but it might change the median time [16:22:45] petan: It wouldn't /be/ faster; it'd be slower. Grouped reads are faster than scattered ones. It "would" appear faster for some people, at the expense of slowing everyone down. [16:22:52] I am talking about owners of small tools having to wait for big tools [16:23:16] Coren: yes it would appear faster for owners of small tools and slower for owners of big tools which sounds fair to me [16:25:21] petrb@tools-login:~$ become huggle [16:25:22] sudo: sorry, a password is required to run sudo [16:25:28] o.O [16:25:38] petan: Should I also prioritize according how how 'important' tools are? Perhaps how politically connected the maintainer is? :-) My concern is total time of migration. [16:25:55] Coren: ok is that password known bug? [16:26:05] petan: There is an odd bug; some groups have incorrectly sync'ed the list of maintainers. Add and remove one on wikitech and that'll fix you up. [16:26:20] petan: look it 'getent group tools.huggle' [16:26:46] Also, in re huggle: [16:26:49] huggle/bin/tar: huggle: file changed as we read it [16:27:03] You have stuff modifying your files during migration? [16:28:31] Coren: I don't know o.O it shouldn't happen since all jobs were stopped [16:28:48] Coren: I would rather say it's some weird nfs bug, it happens to me on bots project too when working on /data/project [16:28:59] Coren or andrewbogott, I'm having problems logging into instance staging of mobile project, it simply hangs forever. tried rebooting, didn't help [16:29:04] Yeah, no. NFS doesn't randomly modify files. 
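The queue-order question raised above (small tools waiting behind one huge one) can be explored with a toy model. This is only a sketch: the durations are invented, jobs run strictly one at a time, and it ignores Coren's point that interleaved reads slow the disk down for everyone.

```python
from statistics import mean, median

def completion_times(durations):
    """Finish time of each job when jobs run one after another, in the given order."""
    finished, clock = [], 0
    for duration in durations:
        clock += duration
        finished.append(clock)
    return finished

if __name__ == "__main__":
    # Hypothetical copy times in minutes: one huge tool and four small ones.
    big_first = [120, 1, 1, 1, 1]
    small_first = sorted(big_first)
    for label, order in (("big first", big_first), ("small first", small_first)):
        done = completion_times(order)
        print(label, "total:", done[-1], "mean:", mean(done), "median:", median(done))
```

In this simplified model the total migration time is identical for both orders, while the mean and median finish times both drop when the small jobs go first. Whether that holds on real I/O-bound storage is exactly what the two sides disagree about.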
[16:29:33] MaxSem: I'm in a call, will check shortly. [16:29:34] Coren: idk why it happens then, but it only happens when I am tarring on gluster or nfs... however what you suggested didn't fix the bug [16:29:38] MaxSem: I've got little time to diagnose atm. Emergency level? [16:29:44] Coren: I added Ryan to huggle and removed then, I still can't use become [16:29:47] no [16:30:45] petan: Did you log off and on afterwards? Changes of maintainers. [16:30:54] yes [16:31:08] ha [16:31:10] it works now [16:31:15] maybe it needed some time for resync [16:38:49] Coren, I see I have my cyberbot node. :-) [16:40:00] Cyberpower678: Yep. [16:40:05] And it might even work. [16:40:39] Coren, I just successfully migrated cyberbot. Much less painless than I thought it would be. :-) [16:41:19] Coren, http://tools.wmflabs.org/cyberbot/spambotstatus.php is running of eqiad now. :D [16:43:54] Coren: What's the url for tools ganglia in eqiad? [16:44:49] hedonil1: I... haven't actually paied attention to that aspect yet. :-) [16:45:07] Coren, ganglia for eqiad? [16:45:37] Cyberpower678: there is not even icinga nor web proxy so far [16:45:47] Cyberpower678: I think ganglia has low priority [16:45:55] Ok. [16:45:58] at least it doesn't block migration like webproxy does [16:46:02] Not important for the moment. [16:46:08] webproxy is coming soonish. [16:46:21] good [16:47:06] Coren: just curious. maybe you could add that link to the tools homepage afterwards [16:53:08] Coren, I can't delete the contents of articleinfo/data in xtools. If the transfer starts it might clog. [16:53:26] It's got 225,000 little files in it. [16:53:35] Hah! Just 225k? :-) [16:53:45] (It'd be faster if you could make a tarball of them first though) [16:54:10] If not, don't worry about it. [16:54:18] Coren, can you delete the contents of articleinfo/data within the public_html folder? [16:54:37] Sure. 
[16:54:46] Coren: one think that is is on my wishlist is being able to connect directly as a tool [16:55:08] * Cyberpower678 echoes Betacommand. [16:55:09] would make sftping files much easoer [16:55:18] * Cyberpower678 echoes Betacommand again. [16:55:19] *easier [16:55:49] right now Im force to run every change through a svn repo [16:56:16] which means I have 10-15x more commits than I really need [16:56:40] Why are you using SVN? [16:56:50] Cyberpower678: It works on windows [16:56:58] GitHub does too. :p [16:57:18] Cyberpower678: and I can have non-public repos [16:57:27] meh [16:57:51] Cyberpower678: I am greedy with my code [16:58:03] :p [16:58:17] Id rather not hand AK47's to 8 year olds [16:58:32] Coren, did you get it? [16:58:35] 8-) [16:58:47] Cyberpower678: I've destroyed all the .xmls. [16:58:54] Thank you. [16:59:00] :-) [16:59:34] rm ALL THE FILES! [16:59:57] Coren: any particular reason that we cannot login as a tool? [17:00:18] Coren: still copying? [17:01:45] Betacommand: There are lots of reasons, none of them particular. The primary one being that tools aren't "real" users and don't have ssh keys in wikitech. The second is that this would remove access control and accountability; you need to auth as a maintainer of a tool to access it. If the tool has a key, then anyone who has the key can access it. [17:02:17] And it becomes impossible to know which maintainer did what. [17:02:39] Or even if whoever has the key /is/ a maintainer and therefore has agreed to the TOU. [17:03:05] Coren: however it makes it really annoying to try and work with files [17:03:28] hashar: loh [17:04:04] for example I cannot use WinSCP for file copying/maintenance like I used to on the toolserver [17:04:26] Betacommand: That's because WinSCP is broken. I use scp/sftp a lot to manage my tools. [17:04:47] Coren: what tool do you use? [17:04:54] Coren: that's just a complete non-argument. 
If a widely-used tool cannot use the infrastructure, the issue is with the infra, not with the tool. [17:05:13] it's like saying 'hey, 60% of our users use IE6, but IE6 sucks so we won't support it' [17:05:57] I'm also not sure how one would use scp/sftp, because there is no obvious way to 'become' with that, either. [17:06:28] so, really, the simple option would be to fix the webserver to also accept files that just have the correct group instead of just the correct owner [17:06:59] +1 fixing the webservers [17:07:20] given that now all tools are using own webserver, it's no security breach [17:07:51] !log deployment-prep mwversioninuse gives a wmf branch instead of master. That breaks l10n messages update and the job https://integration.wikimedia.org/ci/job/beta-code-update/ . Root cause is the python based scap. [17:07:53] Logged the message, Master [17:08:11] valhallasw`cloud: That's been the case in forever. :-) [17:08:18] hashar: that was a typo, it should be "python based crap" [17:08:25] yes, it has been broken forever [17:08:31] petan: lighty-based webservice doesn't care what the owner is. [17:08:39] aha [17:08:44] it never has. [17:08:55] ok that fixes it I guess [17:08:57] That was only the apache suphp crap. [17:09:05] on new cluster everyone use lighttp or not? [17:09:22] petan: Nobody's obligated to use it, but there's no apache default. :-P [17:09:33] Coren: still copying? [17:09:35] default or no apache at all? [17:10:02] if there was an apache it would be cool to document how to use it [17:10:04] zhuyifei1999_: Yes, but now it's done with the big dumps you had in public_html [17:10:08] petan: There isn't one. [17:10:30] ok I consider it fixed then given that this issue was with apache only [17:10:32] http://tools-eqiad.wmflabs.org/legobot/ <-- no webservice [17:10:44] if users will jsub their own apaches that's not our issue [17:11:44] Coren, . 
granting control of databases to the new user [17:11:45] ERROR 1044 (42000) at line 1: Access denied for user 'pg50985'@'%' to database 'p50380g50985__%' [17:11:45] ERROR 1044 (42000) at line 1: Access denied for user 'pg50985'@'%' to database 'p50380g50985__%' [17:11:45] ERROR 1044 (42000) at line 1: Access denied for user 'pg50985'@'%' to database 'p50380g50985__%' [17:11:45] ERROR 1044 (42000) at line 1: Access denied for user 'pg50985'@'%' to database 'p50380g50985__%' [17:11:47] ERROR 1044 (42000) at line 1: Access denied for user 'pg50985'@'%' to database 'p50380g50985__%' [17:11:49] ERROR 1044 (42000) at line 1: Access denied for user 'pg50985'@'%' to database 'p50380g50985__%' [17:11:51] ERROR 1044 (42000) at line 1: Access denied for user 'pg50985'@'%' to database 'p50380g50985__%' [17:12:02] Whoops. [17:12:47] Cyberpower678: ... why in hell does your old replica.my.cnf contain an invalid username? [17:13:19] (And it's not an issue unless you had replica databases with those names; your old credentials will keep working so you can tweak permissions) [17:13:54] Coren, I never touhed it. [17:13:57] !migration [17:13:57] https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress [17:14:50] Cyberpower678: Well, the migration script can't really do much about broken configuration files except do its best. We'll fix issues if you run into them. :-) [17:15:08] Ok, thanks. [17:18:22] Sometimes the links in tools.wmflabs.org redirect to URLs like "http://tools-webgrid-01:4007/" which are not valid. See https://tools.wmflabs.org/logs/ [17:19:32] pietrodn: WFM. [17:20:24] Coren: the link to the "ugly list" is broken [17:20:39] Ah, the first link on the page? Missing its trailing / and the actual tool is misconfigured. [17:21:10] Possibly because the directory doesn't exist. [17:21:18] Ah, ok [17:21:33] HTTP demands that /foo/directory redirects to the correct /foo/directory/ [17:22:01] ... 
but in this case, the redirect isn't fixed automatically by the proxy because the destination doesn't exist. [17:22:14] So it doesn't quite know what to do. [17:22:34] interesting :D [17:23:17] Well, all of my tools have migrated to eqiad now. Thanks to all for the help, have a nice day [17:26:13] !log deployment-prep hacked in mwversioninuse to return "master=aawiki". Relaunched l10n job using mwdeploy user and then running mw-update-l10n [17:26:14] Logged the message, Master [17:31:27] Coren, which replica.my.cnf is used in eqiad. The new one or the old one? [17:33:57] (PS1) AzaToth: grrrit: Allow filtering based on branches [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116996 [17:34:03] (PS1) AzaToth: grrrit: Pass betacluster messages to QA [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116997 [17:34:05] hashar: ↑ [17:34:20] untested offcourse [17:37:12] Coren, nevermind that first question. But how's the migration queue looking? [17:39:49] Cyberpower678: it progresses. cyberworm is done. [17:39:49] AzaToth: you are soo fast :-] I guess Yuvi will look at it when he wake up [17:40:10] hashar: was a trivial change [17:40:45] hashar: but to make it easier for me, I threw all possible betacluster branches to QA [17:41:05] (CR) Hashar: [C: 1] "Nice one. I love how any branch named 'betacluster' will end up in qa channel. Thank you very much!" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116997 (owner: AzaToth) [17:41:08] yeah that is lovely [17:41:20] unlikely to be used elsewhere, but we never know :] [17:41:38] Coren, any way of looking at the queue? [17:41:43] From my end? [17:41:47] Cyberpower678: Nope. [17:41:50] hashar: haver not tested it though, as I lack some redis :-P [17:41:53] :< [17:41:59] That takes place in prod. [17:42:10] Still :< [17:42:16] tested the regexpen though [17:43:10] (CR) Hashar: "I would make :.* the default whenever it is not set. That would make the conf more readable.
But some people prefer settings to be set " (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116996 (owner: AzaToth) [17:43:22] AzaToth: can stillb e deployed and test in prod :] [17:43:26] heh [17:43:41] AzaToth: thank you! I am leaving for home now. See you tomorrow! [17:43:44] k [17:43:56] AzaToth: o/ [17:45:37] Coren: I have added you as a reviewer to a bunch of operations/puppet change for beta. Should be harmless for prod. [17:45:45] Coren: not urgent since I am leaving home anyway. [17:46:18] (CR) AzaToth: "Non-trivial to do that (for me at least)" (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116996 (owner: AzaToth) [17:46:57] hashar: not trivial to do I'm afraid [17:47:07] AzaToth: maybe yuvi will figure it out :] [17:47:18] AzaToth: lets delegate merge to him. There is no urgency [17:47:23] k [17:47:31] I guess it can be done with the regex [17:47:36] figuring out how many fields are captured [17:47:42] if there is only one, assume branch=.* [17:47:46] else use branch=$2 [17:47:52] (aka second capture group in the regex) [17:47:55] something like that [17:48:22] though the regex has no capturing groups [17:48:31] * andrewbogott is about to go to sleep... [17:48:35] would need parantesis [17:48:37] Last chance for project migration requests! [17:49:12] (CR) Krinkle: grrrit: Allow filtering based on branches (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116996 (owner: AzaToth) [17:49:50] or timo hehe [17:49:56] anyway I am off *wave* [17:50:03] wave [17:50:23] Krinkle: thought about using a separate array, but it felt clumsy [17:50:52] but true, could check for a single colon [17:51:15] AzaToth: Can I invite you to make a Vada app? [17:51:30] and assume no one ever gets the bright idea to use (?:asdf) [17:51:36] a930913: Vada? [17:52:04] AzaToth: A JS wiki framework I'm making.
[17:52:24] * AzaToth goes adding a non-capturing group to a regex in the config just to foil Krinkle's idea [17:54:10] AzaToth: COuld also use an '@' as separator [17:54:18] : has other meanings in ssh/git [17:54:25] not applicable here, but still [17:54:28] and regex indeed [17:54:48] for example, : could be used later as file path filter [17:54:52] e.g. :/resources [17:57:09] true [17:57:25] could use ⑦ [17:57:30] as separator :-P [17:58:00] Or snowman [17:58:28] <^d> Crazy unicode freaks me out. [17:58:35] <^d> Especially when it appears in logs :p [17:58:46] hehe [17:59:41] ^d: hi, could you find the problem in Wikibase? [18:00:22] <^d> Didn't find it yesterday. [18:00:26] <^d> But I'm not giving up! [18:02:20] If you want to be really evil you could use ZWNJ as separator (U+200C) [18:04:32] AzaToth: Will you take a look? I could do with some feedback. [18:06:14] AzaToth: https://en.wikipedia.org/wiki/Wikipedia:Vada [18:09:58] Coren, finished migration. :D [18:17:45] andrewbogott_afk / Coren, any luck with my problem? [18:18:52] MaxSem: Hadn't gotten to it yet, but I can take a look now. Which instance and project is this? [18:19:07] Coren, staging / mobile [18:21:34] MaxSem: Ah, the usual gluster breakage. [18:21:59] sigh. we still donn't monitor for these? [18:22:30] It's moot; gluster goes away for good in weeks. [18:23:57] MaxSem: You should be okay now. [18:24:24] Coren, "Connection to staging closed." [18:24:53] Coren, works now [18:25:01] took it 5 attempts:) [18:25:09] autofs takes some time to wake up. [18:26:12] thanks! 
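The separator discussion above (default the branch to `.*` when it is not set, and prefer `@` over `:` because a colon can occur inside the regex itself) can be sketched as follows. The `repo-regex@branch-regex` entry format is hypothetical; grrrit's actual configuration is not shown in this log, and Python is used here only for illustration.

```python
import re

def parse_filter(entry, sep="@"):
    """Split 'repo-regex<sep>branch-regex'; a missing branch part means any branch."""
    repo, _, branch = entry.partition(sep)
    return re.compile(repo), re.compile(branch or ".*")

def matches(entry, repo, branch, sep="@"):
    """True if both the repository and the branch match the filter entry."""
    repo_re, branch_re = parse_filter(entry, sep)
    return bool(repo_re.fullmatch(repo) and branch_re.fullmatch(branch))

if __name__ == "__main__":
    # A non-capturing group contains a colon, so ':' cannot safely separate the parts.
    entry = "(?:mediawiki|labs)/.*@betacluster.*"
    print(matches(entry, "labs/tools/grrrit", "betacluster-test"))
    print(matches("operations/puppet", "operations/puppet", "master"))
    try:
        parse_filter("(?:mediawiki|labs)/.*", sep=":")
    except re.error as exc:
        print("colon separator breaks the pattern:", exc)
```

With `:` as the separator, the first split lands inside `(?:`, which is the case AzaToth jokes about; `@` avoids it as long as no repository or branch pattern needs a literal `@`.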
[18:53:25] hi, Python doesn't work with locale.setlocale(locale.LC_ALL, 'ca_AD') [18:53:26] >>> locale.setlocale(locale.LC_ALL, 'ca_AD') [18:53:26] Traceback (most recent call last): [18:53:26] File "<stdin>", line 1, in <module> [18:53:26] File "/usr/lib/python2.7/locale.py", line 539, in setlocale [18:53:26] return _setlocale(category, locale) [18:53:26] locale.Error: unsupported locale setting [18:53:27] I can't run archivebot.py because of this error [18:54:23] coet: I don't remember that locale being installed. Please open a bugzilla requesting it. [19:24:31] Yay... tools hasn't exploded so far [19:54:04] Coren: when opening mutt in eqiad I get the error /var/mail/hedonil: No such file or directory (errno = 2). worked in pmtpa. Am I doing anything wrong? [19:54:36] Hmmm. [19:56:20] Ah-ha. Something that wasn't properly puppetized. [19:56:24] * Coren fixies. [19:58:39] Coren, mkdir( $gitFolder, 2775 ); ends up creating a folder with permission 3305. Any ideas? Seems to have started every since I migrated to eqiad. [19:59:02] Cyberpower678: Your umask is almost certainly wrong. [19:59:15] Coren, what? [19:59:29] Wait, '2775'? That's decimal dude. :-) [19:59:41] You probably want '02775'. Octal. [20:02:10] I blame addshore. He wrote it. [20:02:14] :p [20:02:57] ok Coren [20:03:13] And 2775 decimal -> 5327 which, with a normal umask of 0022 becomes... 3305. :-) [20:03:45] coet: ok what? [20:03:56] [19:55:17] coet: I don't remember that locale being installed. Please open a bugzilla requesting it. [20:04:43] Cyberpower678: whatd did I write? [20:04:49] Ah, yes. :-) [20:06:40] addshore, the new autoupdater, including the mkdir part where you wrote the mode as 2775 [20:06:57] I wrote the mode as 2775? :P [20:07:19] Yes. :p [20:07:36] parsoid-spof in the Visualeditor project seems to hang on glusterfs [20:13:23] * hedonil blames Cyberpower678 for not properly documenting his successful move to eqiad (not mandatory but... ) [20:14:06] hedonil, having trouble?
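The decimal-versus-octal mix-up in the mkdir call above is easy to check by hand. A small sketch of the arithmetic, assuming the umask of 0022 that Coren mentions; it only models the plain bit mask, not how a particular kernel or filesystem treats the setuid, setgid and sticky bits on a new directory:

```python
def effective_mode(requested, umask=0o022):
    """Mode bits left after the umask is applied (plain bit mask only)."""
    return requested & ~umask

if __name__ == "__main__":
    as_written = 2775     # decimal literal, as in mkdir($gitFolder, 2775)
    intended = 0o2775     # octal: setgid plus rwxrwxr-x

    print(oct(as_written))                  # the decimal value seen as octal mode bits
    print(oct(effective_mode(as_written)))  # after masking with 0022
    print(oct(effective_mode(intended)))    # what the octal literal would give
```

The decimal literal 2775 is octal 5327, and masking with 0022 leaves the permission digits 305, matching the last three digits reported in the log. The leading special-bits digit from the plain mask is 5 rather than the 3 reported, which presumably comes from how the system handles those bits on directories. Note that even the octal literal yields 2755 under this umask, since group write is masked off.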
[20:14:35] Cyberpower678: no. just the 2document" fetish [20:14:42] :p [20:14:59] Cyberpower678: https://wikitech.wikimedia.org/wiki/Tool_Labs/Migration_to_eqiad --> add your sign here [20:15:04] Cyberbot is running beautifully on eqiad and the SUL tool is even faster now. [20:15:20] Cyberpower678: yep. very fast now [20:15:30] hedonil, I'm not signing yet. I want to make sure everything works right. [20:15:48] That the new DB connections and setting are correct, etc.. [20:16:01] Cyberpower678: so I withdraw the blame ;) [20:22:20] AzaToth: feel free to self merge and deploy :) [20:22:23] thanks for the patches [20:23:14] Coren, is is possible that projectstorage.pmtpa.wmnet:/visualeditor-project is unhappy? [20:33:46] YuviPanda: dunno if I have access to deploy [20:34:16] AzaToth: you should, if not I can give you [20:35:13] AzaToth: you do [20:38:08] !ping [20:38:08] !pong [20:39:06] !ping [20:39:06] !pong [20:40:57] YuviPanda: when I did "become lolrrit-wm", the host says "local-lolrrit-wm@*OLD*tools-login:~" [20:41:24] AzaToth: i suppose that is part of the eqiad migration [20:41:41] so I'm at the wrong place? [20:42:12] probably not. give it a shot? [20:42:21] should I be at eqiad or at ptmfdsfsdfs [20:42:31] (can never remember the name) [20:43:41] YuviPanda: would be happy if you could review the commits so there's papertrail [20:43:53] AzaToth: I could do that tomorrow only though :( [20:44:09] k [21:15:04] hashy - let me know if you need a hand with beta over the weekend... not that I've jabbed it back into life for a very long time, but did do some of the intial bits long ago iirc... [21:33:10] (CR) Yuvipanda: [C: -1] grrrit: Allow filtering based on branches (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116996 (owner: AzaToth) [21:34:19] (CR) Yuvipanda: [C: 1] "This is nice! But needs fix for prev.
patch before this can be merged" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/116997 (owner: AzaToth) [21:34:26] AzaToth: i responded :) [21:45:22] petan: around? [22:03:16] Just in case I'm going crazy - bot was running fine in pmtpa, but now has an issue after the move [22:03:19] ClueBot III (talk | contribs) (bot) [22:03:25] mw1208 - assertbotfailed - Assertion that the user has the bot right failed [22:03:36] Which means no edits are done... =\ [22:05:50] Damianz: so you got logged out, apparently? Try logging in again. [22:05:52] Damianz: Sanity check: The bot is logged in, right? [22:06:14] (querying meta=userinfo is a good way to check) [22:06:42] Re-loging in is a bot restart... which it has been restarted and has a token.... hmm I'll add some extra debugging code [22:06:59] Literally copied over though, so should work [22:10:01] Login result => Success, but Action 'delete' is not allowed for the current user (same for protect, move, block, unblock, email)... guessing we indeed do have a login issue somewhere [22:10:16] Thanks anomie/valhallasw`cloud [22:54:31] Anyone here? [22:57:43] nope [22:58:05] * ^d hides [23:06:41] I'm sure everyone here is already aware, but per request of 28bytes... http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard#Possible_bot_malfunctioning.3F [23:10:32] I concur... same issue I've just been looking at - 2 bots, same class, same calls, 1 fails a large chunk of the time and the other occasionally... but always due to tokens/auth grrr and yes those 10. are tools/labs boxes [23:11:05] tools*OLD* ? [23:11:09] * Damianz can't for the life of him figure out why this has chosen to break now though... considering the same script that was working in pmtpa last week is also now broken in pmtpa... [23:11:18] huh: pmtpa [23:11:24] Tampa? [23:11:27] mhm [23:11:45] tools-login-eqiad is new tools [23:12:18] Why not just call it tools-login?
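For the sanity check suggested at 22:06 (querying meta=userinfo), a bot can inspect the API's answer before it edits. A minimal sketch, assuming the standard MediaWiki `action=query&meta=userinfo` response shape; the endpoint constant and the sample responses are illustrative, and the request itself (which must carry the bot's session cookies) is left out:

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint, for illustration

def userinfo_url(api=API):
    """URL that asks the API who the current session belongs to."""
    params = {
        "action": "query",
        "meta": "userinfo",
        "uiprop": "groups|rights",
        "format": "json",
    }
    return api + "?" + urlencode(params)

def session_status(response):
    """Return (logged_in, has_bot_right) from a parsed userinfo response."""
    info = response["query"]["userinfo"]
    logged_in = "anon" not in info and info.get("id", 0) != 0
    has_bot_right = "bot" in info.get("rights", [])
    return logged_in, has_bot_right

if __name__ == "__main__":
    # Hand-written sample responses, not captured from a real session.
    anonymous = {"query": {"userinfo": {"id": 0, "name": "10.4.0.1", "anon": ""}}}
    bot = {"query": {"userinfo": {"id": 42, "name": "ExampleBot",
                                  "rights": ["bot", "edit"]}}}
    print(userinfo_url())
    print(session_status(anonymous))
    print(session_status(bot))
```

If the first value comes back false on a request that was sent with cookies from a successful login, the session is being lost somewhere between the bot and the application servers, which is the symptom described above.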
[23:12:22] and make the old one -pmtpa [23:12:56] Because when the servers really move in like 2 weeks everything *-eqiad will become the default and pmtpa will die [23:13:04] Chaging it around now for 2 weeks is more confusing [23:13:17] so will everything on pmtpa be lost? [23:13:40] No... it might be stopped/broken though [23:14:17] How can I copy everything to eqiad? [23:14:38] For tools there's migration scripts, for instances the suggestion is to rebuild them from puppet [23:14:51] Are you on the mailing lists? There's lots of emails/wiki pages around with info [23:15:11] * Damianz pokes Coren in case he has any ideas as to wtf would have changed in tools to break bots auth stuff... even when the calls look ok [23:16:23] Damianz: Oauth-related? [23:17:05] Nah.. just old school api user/pass login (which is successful) and then actions using the session (very unsucessful, but the cookies are being passed correctly) [23:17:44] ... what. Hm... [23:18:10] Wait, I did notice that Wikitech, at least, looses login creds several times a day lately and requires loging back in. [23:18:54] I had /presumed/ that was due to the current work by Andrew and Ryan, but if you say you've had issues with the API on the projects, it may point to something deeper. [23:18:58] It's like a) mostly working - cbng is only missing a portion of actions and b) totally not woking cb3 has missed hours worth... almost like varnish is caching things and stipping the cookie before it hits the app servers [23:19:27] There's been so many things changed it's hard to tell if it's really broken/how broken... but this sorta is saying 'broken' to me right now... somehow [23:19:47] Damianz: That seems rather a big deal. Please open a bugzilla about it. [23:20:09] Damianz: It's definitely not labs-specific if it happens anywhere else than on wikitech. [23:20:25] hmm... well [23:20:45] Is it just me that thinks labs use to be shown as the external ip on enwiki, as it hits the external of the lb? 
[23:21:03] Because now you get the 10. ip... which I have a feeling you didn't use too... which makes it labs smelling for things [23:21:12] But that could also be totally unrelated/I have bad memory [23:21:30] Also OMFG bugzilla has big buttons [23:22:14] It shouldn't be related as long as you hit the external IP on the enwiki side -- routing might have changed a bit but once it hits the caches that's not important. [23:22:37] And I've had multiple reports of lost (user) sessions lately, so I'm guessing it's related. [23:29:41] (CR) Adamw: "Anyone sober yet? :p" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/112311 (owner: Adamw) [23:33:29] Explained in https://bugzilla.wikimedia.org/show_bug.cgi?id=62288 in vaugly strung together english [23:33:44] For now I'll just restart the bot lots, which seems by random chance to make it work for 'a while' [23:33:48] * Damianz sigh [23:39:57] On a randomly different note - does anyone actually understand what all the different categories and lists mean in bugzilla? I swear everytime I try and use it I can never find the right thing to put stuff under [23:54:43] Coren: I can't seem to find my cron stuff after the migration :/ [23:55:14] a930913: It's in your crontab, commented out. Just edit it with crontab -e [23:55:20] And remove applicable comments. [23:57:00] Coren: It's not there. I'm somewhat confident the stuff I'm thinking of was on cron :/ [23:57:49] Hm. Perhaps it was on -dev? If you migrated very early, the script didn't use to back /that/ up yet. [23:58:14] Ah, that's probably it. "Early"? [23:58:48] First 10 hours or so before someone pointed out they sometimes had crons running on -dev. :-) [23:59:32] Coren: If I did, they'd still be there on -dev? [23:59:47] a930913: Yep.