[03:56:20] gwicke: hey, did you deploy today?
[03:57:53] Ryan_Lane, yes I did; third time is the charm
[03:58:01] cool. glad to hear it
[03:58:12] new interface better?
[03:58:19] yeah, a bit cleaner
[03:58:27] didn't try the revert interface yet
[03:58:43] my main pet peeve currently is that salt is still broken
[03:59:12] there is no revert
[03:59:59] ok, just checkout & re-deploy?
[04:00:02] yep
[04:00:11] easiest way
[04:00:13] that's better
[04:00:24] revert is difficult to handle sanely
[04:00:50] yeah, I'm waiting for a new release for the timeout fix
[04:01:34] also learned that salt permissions for non-roots are also just sudo rules
[04:01:58] yep
[04:02:38] that's not completely true, though
[04:02:54] there's peer call permissions that are protected via sudo
[04:03:09] so if you become root on tin you are still limited to what the salt master allows that minion to run
[04:03:46] where are those rights configured?
[04:03:52] on the salt master
[04:04:00] via puppet
[04:04:16] manifests/role/salt
[04:04:28] I'm looking at deployment.pp
[04:04:39] see role::salt::masters::production
[04:04:46] it's not configured in the deployment hash
[04:05:02] it's in role::salt::masters::production
[04:05:13] the salt_peer_run argument in class { "salt::master":
[04:05:14] ah, salt_peer_run I guess
[04:05:20] that's specific to runners
[04:05:25] "tin.eqiad.wmnet" => ['deploy.*'],
[04:05:38] it's also possible to allow tin to call commands directly for minions
[04:06:12] so I was wondering how salt would work for restarting a service on one box only
[04:06:15] http://docs.saltstack.com/ref/configuration/master.html#peer-publish-settings
[04:06:26] http://docs.saltstack.com/ref/configuration/master.html#peer-run
[04:06:33] basically the equivalent of sudo service parsoid restart
[04:07:02] that's what the runner currently does, except it runs it on all targets that match
[04:07:17] we could add another param to the runner to make the call more targeted
[04:07:25] so that's not built in?
[04:07:34] the runner is something I wrote
[04:07:48] I use the runner as an additional means of authorization
[04:08:08] it ensures that tin can only run deploy on hosts with specific grains
[04:08:26] so it's salt calling the runner on all hosts with a specific grain
[04:08:28] we could allow deploy commands directly against the minions and bypass runners
[04:08:41] can salt also be more specific on where to call the runner on?
[04:08:47] yep
[04:08:51] if we add it to the runner
[04:09:08] is my description of salt calling the runner correct?
[04:09:09] right now it targets like: salt -G 'deployment_target:'
[04:09:14] the runner runs on the master
[04:09:19] modules run on the minions
[04:09:20] ahh
[04:09:32] a peer call occurs from tin to the master, which runs the runner
[04:09:40] the runner makes calls to modules on the minions
[04:10:01] I'm really just using the runner as an authorization point
[04:10:46] the runner also makes the calls easier from tin, since tin doesn't need to know how to target, just that it's doing an action on the repo
[04:10:47] so my understanding is that currently everybody can deploy to any trebuchet-deployed service
[04:11:06] yes, we'd limit that by using posix groups and stricter sudo calls
[04:11:43] there hasn't been a need so far so I haven't really prioritized that
[04:12:22] could the same functionality be implemented with a sudo script that directly calls modules via salt?
[04:12:30] without runner
[04:12:44] yep
[04:12:55] the script would fix params in a canned/safe way
[04:12:56] there's a timeout value on that call, too
[04:13:35] so the main users of salt don't call stuff from minions I guess
[04:13:36] we could do the equivalent of what we're doing in the runner directly on tin
[04:14:07] hm. I wonder how batching would work...
[04:14:17] is there a dsh-like mode in salt?
[04:14:26] a 'shell' module so to speak
[04:14:40] what do you mean by dsh?
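[editor's note: the peer-run restriction described above would look roughly like this in the salt master config. The tin.eqiad.wmnet entry and the deploy.* pattern are quoted from the conversation; the surrounding file layout is an assumption, a sketch rather than the actual production config:]

```yaml
# /etc/salt/master -- peer-run settings, as generated via the
# salt_peer_run argument to class { "salt::master": } in puppet.
# This lets the minion tin.eqiad.wmnet publish calls to runner
# functions matching deploy.*, and nothing else -- even root on
# tin is limited to what the master allows here.
peer_run:
  tin.eqiad.wmnet:
    - deploy.*
```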
[04:14:50] the ability to run arbitrary commands?
[04:14:58] yes
[04:15:07] yes, but we shouldn't ever expose that
[04:15:11] how would I do the equivalent of dsh -g parsoid foo with salt?
[04:15:16] roots use it, but it's dangerous
[04:15:21] from a script that would be handy
[04:15:30] or rather, could
[04:15:59] that command would run as root
[04:16:18] sure, but that's often what you want when you install software
[04:16:21] it's possible to have salt drop to another user
[04:16:33] why would you install software via a script?
[04:16:48] a command, not a script
[04:16:50] if there's something that's going to install something, you'd want to make it into a proper module
[04:17:12] like pkg: http://docs.saltstack.com/ref/modules/all/salt.modules.aptpkg.html#module-salt.modules.aptpkg
[04:17:59] in general, though, you want to avoid doing arbitrary commands
[04:18:10] everything should be through a more specific command, or a custom module
[04:18:10] I see, there is some value in the functionality around setting up sources etc
[04:18:24] apt-get install is arguably more complex that way
[04:18:36] more complex which way?
[04:18:39] via pkg?
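[editor's note: a hedged sketch of the dsh equivalents being discussed. The grain name follows the `deployment_target:` pattern quoted above, but the exact target values and commands are illustrative assumptions, not commands from this infrastructure:]

```
# dsh -g parsoid 'service parsoid restart' becomes, roughly,
# grain-based targeting plus the service module:
salt -G 'deployment_target:parsoid' service.restart parsoid

# the "dsh-like mode" (arbitrary commands as root) exists as
# cmd.run, but as noted above it shouldn't be exposed broadly:
salt -G 'deployment_target:parsoid' cmd.run 'uptime'
```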
[04:19:06] although salt '*' pkg.install looks quite ok
[04:19:16] apt-get is actually very hard to automate
[04:19:45] it has lots of weird ways it returns, it can just hang, it has a lot of edge cases
[04:19:58] the pkg module for apt in salt has already worked all of that out
[04:20:28] sounds good
[04:20:50] if you're aiming at a deb deployer, I'd wrap it in trebuchet
[04:20:58] and have it return data to redis
[04:21:07] I'll likely try package-based deploys in our rt test infrastructure
[04:21:28] we're adding a salt master to beta
[04:21:32] I added some docs on how to add one
[04:21:47] https://wikitech.wikimedia.org/wiki/Help:Project_hosted_salt_master
[04:21:50] right now we rsync stuff around there, which is a pain mostly because our git version is also broken wrt submodules
[04:22:07] you could also use trebuchet there
[04:22:14] I'll be adding docs on how to set it up
[04:22:27] it's relatively straightforward
[04:22:30] what would the benefits of that be?
[04:22:47] well, you could extend trebuchet to do deb deployments
[04:22:54] it could be called via jenkins
[04:23:28] would that be easier than using salt directly from a script?
[04:23:48] in the long run? yes. in the short term? probably not.
[04:24:29] which advantage do you see in the longer term?
[04:24:44] everything using the same system for deployment
[04:24:57] with a consistent reporting method
[04:25:01] by name, yeah ;)
[04:25:13] the ui and functionality are completely different though
[04:25:20] same salt calls, same reporting
[04:25:53] and that new deployment method would have a better chance of being used elsewhere
[04:25:56] so is the redis reporting something you added because salt doesn't do it properly itself?
[04:26:14] salt records output on the master
[04:26:21] in jobs
[04:26:39] but it doesn't occur in real time
[04:26:58] and you'd need to make calls to the salt master a lot
[04:27:11] salt has support for returners for exactly this purpose ;)
[04:27:17] http://docs.saltstack.com/ref/returners/
[04:27:42] could those just log to something like graylog2?
[04:27:55] well, in this case it isn't logs
[04:28:04] it's data in exactly the format I want for reporting
[04:28:06] that said...
[04:28:15] http://docs.saltstack.com/ref/configuration/logging/handlers/salt.log.handlers.logstash_mod.html
[04:28:18] graylog is json data
[04:28:32] well, you could write a graylog returner
[04:30:18] I got to go, thanks for all the info!
[04:30:20] the returners just get the returned info from the modules and do what they're going to do
[04:30:20] yw
[04:35:58] bd808|BUFFER: is deployment-scap running puppet properly? I was going to add the trebuchet classes to it
[04:37:25] hm, actually, I'll wait till you've made it a minion
[04:37:32] and I'll just write docs for now
[04:53:35] bd808|BUFFER: https://wikitech.wikimedia.org/wiki/Trebuchet#Using_Trebuchet_in_Labs
[09:46:26] Coren: https://en.wikipedia.org/wiki/User_talk:Coren#Talking.2C_talking.2C_talking...
[09:46:43] Coren: unless I confused you with whoever actually runs the bot...
[09:56:33] andrewbogott_afk: is there any schedule to copy data from bots projects gluster? because I want to delete it whole, so don't copy it
[09:56:50] I am already deleting it
[10:05:51] andrewbogott_afk: 300gb freed :o
[10:07:03] Damianz: is cluebot fully migrated or it still run on bots?
[10:13:03] rschen7754: can I kill your stuff on bots-4?
[10:13:15] rschen7754: is there anything on bots project that you are still using?
[10:13:22] I am going to delete all instances and all data
[13:01:12] <`fox`> hi, i'm migrating stuff from toolserver but I have issues running php scripts on labs. I just added a new project and inserted a file with just but it returns a 500
[13:01:19] <`fox`> what am I missing?
[13:02:14] <`fox`> the script is here http://tools.wmflabs.org/manypedia/test.php
[13:02:35] <`fox`> and I am not getting any log in my home, so dunno how to debug
[13:02:44] That's odd; lemme see.
[13:03:04] <`fox`> thanks Coren ;)
[13:04:49] D'oh! You just got here, so you never heard about the data center migration!
[13:05:24] <`fox`> nope, I started looking at labs now basically
[13:06:30] <`fox`> I requested an account in january, then I had that issue so I stopped working on it as I had other stuff to do, and now that I have free time I would like to finish that work
[13:07:00] Heh. Due to the timing, you set yourself up in the datacenter that's being turned off Monday. :-) Thankfully, the fix is trivial. From your user account on tools-login, type: migrate-tool manypedia
[13:07:36] <`fox`> Coren, lol...anyway I had the same issue in january
[13:07:46] Then move to tools-login-eqiad.wmflabs.org rather than tools-login.
[13:07:48] <`fox`> so dunno if that solves it, let's see
[13:08:51] In January the datacenter didn't exist yet.
[13:10:07] I could explain to you why your tests in the old datacenters didn't work, but that's not very useful since you have to move. :-)
[13:11:08] hi Coren, just an information: after the successful migration (migrate-tool + finish-migration) I found a ...DATA.olduser file in the new home, can I remove it?
[13:11:24] rotpunkt: Yes, you can remove it if you want.
[13:11:28] <`fox`> Coren, ok, thanks ;) well if it's not a long story I could listen to it...but anyway it doesn't really matter
[13:11:33] thanks!
[13:12:34] `fox`: Right, so now you need to 'webservice start' and you're all set.
[13:12:41] <`fox`> Coren, but now is the project still running here http://tools.wmflabs.org/manypedia/test.php ?
[13:12:44] <`fox`> ah ok
[13:12:55] if some scripts need it, I can leave it anyway
[13:13:29] <`fox`> Coren, /usr/bin/webservice: sonet does not appear to be a tool
[13:13:37] rotpunkt: Once migration is complete, it's left around in case you forgot what your old credentials were and you need to copy more databases away.
[13:13:54] thanks for the explanation, I will leave it for some time
[13:13:56] `fox`: You have to do it from your tool ('become manypedia') not your user account.
[13:15:15] <`fox`> Coren, cool! now it works...is there a good documentation somewhere for getting started?
[13:16:29] !newweb
[13:16:29] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[13:16:36] ^^ this explains the webservice thing.
[13:16:58] <`fox`> thanks ;)
[13:17:26] Otherwise, most of the documentation at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help still applies; I just need to merge both once migration is over.
[13:34:19] Coren: bonjour! :-]
[13:34:29] can we finish up the l10nupdate mess for beta please
[13:34:47] you posted a new patchset on https://gerrit.wikimedia.org/r/#/c/118071/
[13:34:58] hashar: Yep. Have you looked it over?
[13:35:11] https://gerrit.wikimedia.org/r/#/c/118071/6/modules/mediawiki/manifests/users/l10nupdate.pp
[13:35:12] yeah
[13:35:28] I am not sure we want puppet to define /home/l10nupdate since I created that user in wikitech
[13:35:38] apart from that, it seems to be fine
[13:36:11] hashar: You do, otherwise it won't be able to put the key in there (creating a user in LDAP doesn't create homes; only actually /logging in/ with that user does).
[13:36:25] And you probably don't want to log in with that user.
[13:36:26] ahhh
[13:36:43] so I guess that is fine to me
[13:36:53] Wanna +1 this so I can +2 and merge?
[13:37:32] did
[13:39:14] hashar: Merged.
[13:39:43] * hashar crosses fingers
[13:40:13] :(
[13:40:14] err: Failed to apply catalog: Parameter before failed: No title provided and "/home/l10nupdate/.ssh" is not a valid resource reference
[13:40:36] o_O?
[13:40:45] Why did Jenkins +2 this?
[13:40:53] cause it does not compile the catalogs
[13:40:57] just does the puppet parser validate
[13:40:58] * Coren tries to understand the error message.
[13:41:14] akosiaris is working on something to compile the catalogs on all nodes which would catch such failures
[13:41:22] D'oh! Moron!
[13:41:34] My before clauses are just plain wrong.
[13:41:37] * Coren fixes.
[13:42:01] should be before => File['....ssh'] isn't it ?
[13:42:24] Yep. I don't know how I could miss that.
[13:42:34] I haven't spotted it either
[13:43:17] https://gerrit.wikimedia.org/r/#/c/118696/
[13:43:50] :-]
[13:44:09] +3
[13:45:57] Fix't
[13:48:13] notice: /Stage[main]/Mediawiki::Sync/Git::Clone[mediawiki/tools/scap]/Exec[git_pull_mediawiki/tools/scap]/returns: executed successfully
[13:48:15] yeahhhhh
[13:48:17] progress!
[13:48:30] Joy!
[13:58:43] now I got to figure out why my two apache instances behave differently although they have the same classes applied grrr
[14:02:08] <`fox`> Coren, is there a db that can be used as "index" for all the db available? like the one on toolserver
[14:03:00] `fox`: Yes; it's mentioned on that help page. meta_p.wiki on all databases.
[14:03:31] There is also meta_p.legacy which has compatible schema with the toolserver's but I recommend using wiki if you can.
[14:04:07] <`fox`> ok thanks
[14:05:47] <`fox`> Coren, on which server is it? it's not mentioned in the doc
[14:06:25] `fox`: All of them. :-) Most people use s3.labsdb because that tends to be one of the less busy ones but you can pick any shard.
[14:16:32] Coren: is there an autoupdated list of migrated tools?
[14:17:27] liangent: Tim made https://tools.wmflabs.org/tools-info/migration-status.php but I don't know if this is live or updated at interval.
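[editor's note: the bug and fix being discussed, in full puppet syntax. The error was a bare path string where puppet expects a resource reference. Only the paths and the `File['...']` reference come from the conversation; the resource bodies are my guess at the shape of the change, not the actual patchset:]

```puppet
# Wrong: a bare path string is not a valid resource reference,
# which produces "Parameter before failed: No title provided":
#   before => '/home/l10nupdate/.ssh',

# Right: reference the resource by type and title.
file { '/home/l10nupdate':
  ensure => directory,
  before => File['/home/l10nupdate/.ssh'],
}

file { '/home/l10nupdate/.ssh':
  ensure => directory,
}
```

[as noted above, `puppet parser validate` (what Jenkins ran) accepts this kind of mistake; only compiling the catalog catches it.]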
[14:19:12] Coren: enough for me. I migrated some of my tools a few days ago but I forgot the list
[14:20:48] liangent: You can check for the presence of a file named ...MIGRATE.REMOTE in a tool's home to confirm whether it has been migrated, also.
[14:25:14] war more madness with mwdeploy user :-]
[14:25:14] err: /Stage[main]/Misc::Deployment::Vars/File[/data/project/apache]/owner: change from root to mwdeploy failed: Failed to set owner to '996': Invalid argument - /data/project/apache
[14:25:15] err: /Stage[main]/Misc::Deployment::Vars/File[/data/project/apache]/group: change from root to mwdeploy failed: Failed to set group to '51789': Invalid argument - /data/project/apache
[14:25:45] hashar: Bleh.
[14:26:12] hashar: Prepare a changeset to do the same as with l10nupdate; I'll create the LDAP user and group
[14:26:23] (Better not create it through Wikitech)
[14:26:29] Coren: another question, will I be able to create databases called p***g***__*** on eqiad replica?
[14:26:56] Coren: sure. Feel free to cleanup l10nupdate user which I created via wikitech. It has my personal email as a contact right now
[14:26:59] liangent: Yes, for a time, but don't rely on it. You should create sXXXXX databases now.
[14:28:30] Coren: but I can't rename old databases :/
[14:29:22] liangent: Yeah, mysql doesn't like renaming databases. If you /want/ you can dump-and-restore, but you will always have access to already existing databases regardless of their names (there were grants made to grandfather them in)
[14:30:18] <`fox`> Coren, is the name of the replica server always the same as the dbname?
[14:30:33] `fox`: Yes.
[14:30:40] <`fox`> ok, perfect
[14:32:07] Coren: mwdeploy hack for labs is https://gerrit.wikimedia.org/r/118699 :-]
[14:32:31] hashar: User created, and I stripped the password and email away from the l10nupdate user.
[14:32:38] awesome!
[14:33:24] Coren: my current tool code doesn't like inconsistent database prefixes
[14:34:17] liangent: Eeew. Yeah, I understand, but there is, literally, no way to rename databases in mysql.
[14:34:32] liangent: If they are fairly small, then a dump-and-restore would do.
[14:34:47] Well, it'd also work if they are large -- just longer. :-)
[14:36:02] Coren: :)
[14:36:21] hashar: Merged.
[14:36:38] Coren: I just listed https://wikitech.wikimedia.org/wiki/Talk:Labs_Eqiad_Tools_Migration . are they considered evil commands :p
[14:38:33] liangent: They are not evil, but I'd really rather people take the time to actually inspect any log files they really want to keep and not just gzip them outright and blindly. A quick inspection shows a few /terabytes/ of logs in pmtpa, most of which clearly useless garbage.
[14:42:27] hm
[14:42:50] uid=113(mwdeploy) gid=120(mwdeploy) groups=603(mwdeploy),120(mwdeploy)
[14:42:50] :-]
[14:42:53] Coren: thank you!
[14:43:28] hashar: Wait, what? That's not the right uid and gid.
[14:43:46] hashar: You have a /local/ user named mwdeploy as well as the LDAP one.
[14:43:57] that is the local group on pmtpa instances
[14:44:04] o_O
[14:44:06] will fix it manually by deleting the local user and group
[14:44:15] Hm. Well, in /theory/ that shouldn't be too problematic.
[14:44:22] But it could lead to oddness.
[14:44:26] and l10nupdate user on labs has a typo in its homedir : https://gerrit.wikimedia.org/r/118701 :-]
[14:44:30] l10update ( missing a n )
[14:47:23] Coren: well I have a habit of keeping log files. users might ask me one day in the future "why did your bot do XXX in the past at ZZZ", though it's fine to keep them gzipped after migration
[14:47:25] !log deployment-prep changing uid/gid of mwdeploy which is now provisioned via LDAP (aka deleting local user and group on all instance + file permissions tweaks)
[14:47:27] Logged the message, Master
[14:48:15] Coren: https://wikitech.wikimedia.org/w/index.php?search=phpmyadmin&title=Special%3ASearch&go=Go search broken on wikitech?
[15:00:35] liangent: What's broken there for you?
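[editor's note: the dump-and-restore "rename" Coren describes might look like this from a tool account. The database names are placeholders following the sXXXXX convention mentioned above; `~/replica.my.cnf` and the host name are the usual tool-database conventions but should be checked against your own setup:]

```
# "rename" OLD_DB to s51693__newname by dump and restore
# (there is no RENAME DATABASE in mysql)
mysqldump --defaults-file=$HOME/replica.my.cnf -h tools-db OLD_DB > OLD_DB.sql
mysql --defaults-file=$HOME/replica.my.cnf -h tools-db \
      -e 'CREATE DATABASE s51693__newname'
mysql --defaults-file=$HOME/replica.my.cnf -h tools-db s51693__newname < OLD_DB.sql
```

[as noted above, this works for large databases too; it just takes longer.]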
[15:18:03] * Coren goes back to his battle with exim.
[15:27:40] scfc_de: "An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later."
[15:28:13] @notify Damianz
[15:28:13] I'll let you know when I see Damianz around here
[15:28:24] @notify rschen7754
[15:28:24] This user is now online in #wikimedia-labs. I'll let you know when they show some activity (talk, etc.)
[15:30:45] Coren: sorry to interrupt your exim battle, but I could use a udp2log user in LDAP just like l10nupdate and mwdeploy :]
[15:31:16] hashar: How about you make sure that's the last one you need? :-)
[15:31:23] I have no clue :]
[15:31:46] Meh. Start on the changeset, I do the user.
[15:31:53] on it :]
[15:32:14] * hashar shoots himself
[15:32:21] the user is created by the debian package hehe
[15:32:55] Wait, the package creates the user?
[15:33:01] * Coren grumbles.
[15:33:29] Why the hell is its home in /home then?
[15:33:50] hmm
[15:33:54] * hashar checks again
[15:34:34] true system users should not live in /home
[15:35:12] I don't even know where the udp2log package is :]
[15:35:35] What is the actual puppet error you are getting?
[15:35:45] err: /Stage[main]/Role::Logging::Mediawiki/Misc::Udp2log::Instance[mw]/File[/data/project/logs]/owner: change from root to udp2log failed: Failed to set owner to '997': Invalid argument - /data/project/logs
[15:35:55] the udp2log logs to the shared dir /data/project
[15:36:16] and /data/project/logs belongs to root
[15:36:24] Well, first off, the owner is set to '997' that's definitely wrong.
[15:36:32] I can't change it to be owned by udp2log because that user does not exist on the NFS server
[15:36:35] It should be to 'udp2log' by name.
[15:36:41] since udp2log user got created on the instance with UID 997
[15:36:47] ahh
[15:36:51] liangent: I get the usual "Create the page "Phpmyadmin" on this wiki! See also the page found with your search." and https://wikitech.wikimedia.org/wiki/Server_admin_log/2008-09 as the search result.
[15:36:52] Numeric user IDs will *never* work on NFS. Ever. By definition.
[15:36:57] \O/
[15:38:11] But yeah, I see why you're trying to make the destination directory owned by it. it kinda sucks, but I see why. I'll work around it another way, then.
[15:38:14] can't find it sorry
[15:38:29] <`fox`> Coren, is there something particular to do to have python cgis running? I added a python script in cgi-bin but I get a 404
[15:38:57] `fox`: cgi-bin does not exist with the new scheme. Check the documentation at:
[15:38:59] !newweb
[15:38:59] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help/NewWeb
[15:39:22] The idea of 'cgi-bin' is kinda silly this century anyways, but there is a workaround there.
[15:39:58] hashar: But, for the record, having a local system user write to a shared directory is just asking for trouble and is a really bad practice.
[15:40:21] Coren: yeah hence why we could get rid of udp2log user on the instance and use a LDAP user instead
[15:40:25] what I have is http://paste.debian.net/plain/87676
[15:41:27] I've added a similarly named system user on the NFS server as a workaround.
[15:41:44] chowns, by name, will work now.
[15:42:21] I know what I'm going to be doing at the ops hackathon for sure now.
[15:42:29] so even with different UID that is going to work ?
[15:43:07] This needs cleanup. That silly thing of creating system users randomly on different servers and hoping the UIDs match is slightly less smart than a sack of rocks.
[15:43:30] hashar: NFS4 speaks usernames, not user ids.
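[editor's note: context for "NFS4 speaks usernames": NFSv4 transmits file owners as `user@domain` strings mapped by rpc.idmapd, so the client and server must agree on the idmap domain and both must know the username — which is why a chown by name works once the NFS server has a similarly named user, even with different UIDs. A minimal sketch; the domain value is a placeholder, not this infrastructure's setting:]

```ini
# /etc/idmapd.conf -- must match on the NFS client and server
[General]
Domain = example.wmnet

[Mapping]
# what unmappable owners fall back to
Nobody-User = nobody
Nobody-Group = nogroup
```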
[15:44:44] good
[15:45:20] (You can /make/ it speak UID, but it requires kernel tweaks, is very brittle, and causes all sorts of problems with group membership)
[15:46:36] na it is fine to me
[15:53:15] arrghhh udp2log is still using Augeas -(
[15:56:11] another super simple change https://gerrit.wikimedia.org/r/118708 to get rid of /home/wikipedia/logs symlink to /data/project/logs
[15:56:15] there is no more /home/wikipedia :-]
[15:56:26] Coren: ^^ :-]
[15:58:09] now I gotta figure out how the hell timidity-daemon ends up being installed on production application servers :]
[15:58:21] o_O?
[15:58:51] Do we have something that renders midis into audio files for previews, maybe?
[15:58:51] modules/mediawiki/manifests/init.pp has a service { 'timidity': ensure => stopped }
[15:59:08] but on my instances I got:
[15:59:09] err: /Stage[main]/Mediawiki/Service[timidity]: Could not evaluate: Could not find init script for 'timidity'
[15:59:21] turns out timidity is installed as a dependency of wikimedia-task-appserver for some reason
[15:59:32] and on precise the timidity package does not provide the init script
[15:59:39] liangent: search is broken because ^d and manybubbles need to regenerate the index.
[15:59:43] it is provided by the package timidity-daemon which is not installed on my instances :]
[15:59:55] which, ^d and manybubbles, consider yourselves nagged twice :)
[16:00:05] we regenerated it _days_ ago
[16:00:36] manybubbles: Ah, sorry, wasn't it regenerated before I upgraded?
[16:00:38] something else is broken but I can't tell what without logs
[16:00:43] And hence the index didn't traverse the broken bits?
[16:00:53] Or did you regenerate again after I updated things?
[16:01:42] manybubbles: would you prefer I start a tracking bug or shall we sort this out here and now?
[16:01:50] <^d> Let's sort it now.
[16:02:09] ok. Sorry about the re-nag, I didn't know that you'd already regenerated.
[16:02:15] <^d> manybubbles: eww. http://p.defau.lt/?1Zpvra3MwkL7VT4htT_3aw
[16:02:19] OK, so, apache logs?
[16:02:23] <^d> that's one of the things I meant by "wonky"
[16:02:51] ^d: hit it with --reindexAndRemoveOk --indexIdentifier now
[16:03:04] that'll rebuild it anyway
[16:03:06] btw, y'all definitely should assume that mw + extension versions are messed up, since I only barely know what I'm doing in that regard.
[16:03:08] but, I see what you mean
[16:03:28] <^d> http://p.defau.lt/?tYq_f8Yat6E9IxnbncM6Zg
[16:03:45] ^d: funky
[16:03:49] I've never seen that locally
[16:03:55] but shit happens
[16:03:55] <^d> Nor have I...locally
[16:04:07] does it only happen with the funky wikitech thing?
[16:04:23] <^d> The "Validating number of shards...is but should be 1...cannot correct!"
[16:04:24] andrewbogott: grab master from both cirrus and elastica....
[16:04:26] <^d> I did see yesterday
[16:04:42] it isn't a happy thing
[16:05:21] <^d> When I was trying to debug yesterday it looked like our number of shards *was* 1.
[16:06:05] manybubbles: ok, cirrus now has a patch from the 13th, elastica from the 10th. recent!
[16:07:29] andrewbogott: so cirrus has maintenance scripts that you should run after an update. ^d is having trouble with them, so we'll see what we can do from here first
[16:07:52] ok. Let me know what/if you want me to run
[16:09:24] ^d: it is totally working for me locally.....
[16:09:32] <^d> Me too.
[16:09:35] andrewbogott: can you try running: php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
[16:10:03] ok, it's running
[16:10:35] seems happy -- want the output?
[16:10:40] sure!
[16:11:11] https://dpaste.de/4kRY
[16:11:13] /status twemproxy' returned 1: status: Unknown job: twemproxy
[16:11:18] * hashar whistles
[16:11:21] moreee dependencies
[16:11:51] manybubbles, ^d, also my default test search ('address') is now working properly!
[16:11:59] liangent, how about you, search looking better?
[16:12:17] ^d: labswiki != wikitechwiki
[16:12:25] I'm not sure about what, but hey
[16:12:42] <^d> I don't think wikitechwiki is a thing.
[16:12:43] maybe one is virt1000 and one is virt0?
[16:13:04] ^d: you tried to use it and it didn't work
[16:13:09] I've tried to use it in the past
[16:13:32] andrewbogott: it works for me now
[16:13:48] woo! Thanks y'all
[16:13:56] andrewbogott: simple answer: when you upgrade cirrus run that thing
[16:14:05] <^d> manybubbles: wikitechwiki isn't a cluster wiki either.
[16:14:16] manybubbles: with reindexAndRemove? Or just the script without args?
[16:14:22] <^d> I'm going to delete the elastic index since it's empty and confuses.
[16:14:38] andrewbogott: best bet is this monster: https://gist.github.com/nik9000/9550932
[16:14:43] do reindex labswiki
[16:14:50] ^d: already did
[16:14:51] ok...
[16:14:55] <^d> ah ok
[16:15:11] andrewbogott: the README has all this, if you had time to read it
[16:15:26] that's the README in the cirrus source?
[16:15:27] <^d> Nobody has time :(
[16:15:29] andrewbogott: actually that reindex thing has mwscript in it, which you can't use
[16:15:38] just replace that wiki php and you should be good
[16:15:48] ok, makes sense
[16:16:03] <^d> manybubbles: We should clean up the README and make it less dense.
[16:16:04] ^d: so, when have you seen the error otherwise? only when using that --basename thing?
[16:16:09] <^d> Move some of our WMF-isms to wikitech
[16:16:11] ^d: always
[16:16:23] most of the WMF-isms are already moved, I think
[16:16:26] but yeah
[16:16:39] andrewbogott: why did it work for scfc_de while it didn't work for me?
[16:17:00] liangent: the index was partially broken so some searches worked and some didn't
[16:17:04] <^d> manybubbles: I did see it yesterday. Wasn't with baseName though.
[16:17:14] funky
[16:17:30] liangent: "foo" was working but "bar" wasn't
[16:17:35] I haven't a clue, really
[16:17:47] we're really careful keeping the one in production current
[16:17:58] but we don't have access to the wikitech one
[16:18:02] not properly, at least
[16:18:06] <^d> In some $magicFuture we'll stop making so many schema updates.
[16:18:46] 9 months or something....
[16:33:57] petan: hi
[16:37:44] rschen7754: can I delete your stuff on bots project?
[16:37:50] petan: yes
[16:40:35] petan: do I have anything left on bots?
[16:40:49] likely not, I am deleting everything
[16:42:59] petan: fine
[16:57:51] <`fox`> Coren, would it be possible to have libGeoIP installed on the labs machines? I need it for a project and it was available on toolserver
[17:00:15] `fox`: It's not going to be very useful to you since you'll not get the actual enduser IP; but you can request it with a bugzilla, I'm going to do a round of package installs next week.
[17:02:07] We could install some GeoIP on the proxy and set an HTTP header accordingly.
[17:03:43] <`fox`> Coren, I will use it for visualizations on wikipedia edits
[17:04:10] <`fox`> so I don't need to process incoming requests
[17:04:28] <`fox`> I have the real IPs of anonymous edits ;)
[17:04:45] Coren: are *.e* and *.o* excluded for migration too?
[17:05:02] e[0-9]+
[17:06:47] I found two files lost in migration, .e442923 and .o442923
[17:10:32] liangent: Yes, as known log files.
[17:10:57] I might have forgotten to enumerate those. You can scp between the DCs though.
[17:11:36] `fox`: Right, just open a bugzilla with it; it's a fairly simple thing to add a package I just don't have time until the middle of next week.
[17:16:11] <`fox`> Coren, https://bugzilla.wikimedia.org/show_bug.cgi?id=62649
[17:16:55] `fox`: Perfect. Sorry for the delay. :-/
[17:17:43] <`fox`> no problem!
[17:18:18] Coren: scp filename tools-login-eqiad.wmflabs.org:/home/liangent doesn't work?
[17:18:23] Coren: Permission denied (publickey).
[17:20:48] liangent: Go the other way around, using the alias 'eqiad', there is HBA
[17:39:37] <`fox`> Coren, why in the replica there is no "user_properties" table? in toolserver it's there
[17:47:04] `fox`: There's an open Bugzilla issue about providing that table IIRC.
[17:48:09] addshore: you'll be migrating the dumpscan tool, I presume?
[17:48:40] <`fox`> scfc_de`, ok, thanks
[17:49:14] addshore: and repi -- what was that again? Some brilliant idea during the hackathon iirc
[17:56:56] who wants to help me do a big search and replace in mysql?
[17:57:14] * andrewbogott is a little bit terrible at mysql stuff
[17:58:56] * addshore waves
[17:59:22] valhallasw: i will probably just let it be migrated in the bulk migration :P
[17:59:42] not got a lot of time the next 3 weeks
[17:59:45] addshore: that's also an option, but that will screw up the directory structure
[18:00:11] petan: is 'bots' now fully migrated? Can I move it to the 'done' section?
[18:00:21] mhhm, all the stuff is in public_html so it shouldn't be hard to resolve :)
[18:00:28] petan: same question about 'huggle'
[18:00:28] andrewbogott: yes
[18:00:39] no, but I will do huggle myself soon
[18:00:44] OK. I'll move bots.
[18:00:45] And, thank you!
[18:01:09] I /think/ that bots instances on pmtpa can be deleted, but I would like to confirm with some people before
[18:01:27] ok -- I'm not going to start deleting stuff until next week. Composing an email about that now.
[18:01:32] I already revoked access to most people and nobody complained so far
[18:01:46] I deleted almost everything from gluster today, more than 300gb of stuff
[18:02:29] except for some people and bots I am not sure about like cluebot and salesbot
[18:02:51] 300G 195M 300G 1% /data/project
[18:03:58] We can still do wholesale migration of instances or files if that turns out to be necessary to avoid killing maybe-but-not-definitely-abandoned bots.
[18:05:56] yes, that's why I kept both instances for these 2 bots
[18:06:07] do you want me to just move them right now?
[18:06:13] instances? no
[18:06:19] ok
[18:06:19] I still believe we delete them
[18:06:40] I just need response from Damianz and Beetstra
[18:06:53] cool
[18:10:51] !log local-gerrit-reviewer-bot Migrated to eqiad; webservice started & crontab restored. Seems to be working.
[18:10:52] Logged the message, Master
[18:12:52] !log local-tsreports Moved -dev over to eqiad; deployment does not work due to different database credentials/database name.
[18:12:53] Logged the message, Master
[18:15:02] Coren: what's HBA?
[18:15:24] YuviPanda: What magic did you do to install a fancy MediaWiki instance on multimedia-alpha?
[18:15:30] and do you mean scp from pmtpa to eqiad on an eqiad host?
[18:16:53] !log local-tsreports dev: adapted database name to s51693__cache. Deployment runs, webservice started.
[18:16:54] Logged the message, Master
[18:19:33] liangent2: HBA is host-based authentication. I. e., user on host1 sshes to host2, and host1 says to host2: "this is $user, you don't have to check his credentials, because you know I am host1." scp on tools-login.wmflabs.org (= pmtpa) to "eqiad:" (= tools-login-eqiad.wmflabs.org) works for me.
[18:20:46] (To be clear: Log into tools-login.wmflabs.org (= pmtpa) and execute there "scp somefile-pmtpa.txt eqiad:somefile-eqiad.txt" works for me.)
[18:21:23] petan: I'm interested in your judgement here… do you feel like the migration progress is going well enough? And that things running in eqiad are acceptably stable?
[18:21:33] scfc_de: it doesn't work for me
[18:21:41] My inclination is to forge ahead with our current deadline, but I don't want to steamroll folks if you think that we're not ready.
[18:22:05] liangent2: just updated the page https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Tools_Migration#Manually_copying_files [18:22:23] I'm moving files in user account instead of tool account [18:22:32] maybe this is the difference [18:23:35] !log local-tsreports dev: due to different username-$HOME relationship, the lighttpd config breaks. Adapted, and tsreports-dev now works. [18:23:36] Logged the message, Master [18:24:44] liangent2: I was doing this as my user account. [18:26:16] hedonil: thank you! [18:27:43] scfc_de: still failed [18:27:48] liangent@tools-login:~$ scp \<* eqiad:/home/liangent [18:28:38] liangent2: are you logged in in pmtpa? [18:29:17] liangent2: as your prompt should look like *OLD*tools-login:~$ [18:29:47] hedonil: I have bashrc customized... [18:30:15] liangent2: so what does the error look like? [18:30:36] permission denied [18:32:15] liangent2: just try again. sometimes it needs 2-3 times to work [18:32:40] petan, need to run; I'll catch you in the backscroll if you respond. [18:33:19] liangent2: this was my current copy. worked at second attempt https://tools.wmflabs.org/paste/view/bd34c222 [18:36:05] YuviPanda: Ping re: mediawiki on labs [18:36:07] this suggestion works :) thx [18:36:49] Else I will just apply the mediawiki-install class and hope for the best [18:38:39] rdwrer: hmm, unsure. either mediawiki_singlenode or mediawiki-vagrant [18:40:33] YuviPanda: Neither of those seem to be enabled on the alpha instance...but mediawiki-install is [18:40:39] I'm gonna go for it [18:40:45] rdwrer: mediawiki-install? :| [18:40:48] If everything is broken forever that's fine [18:40:50] rdwrer: idon't remember what that is. [18:40:59] rdwrer: also I remember setting up multimedia-dragon, not alpha [18:41:12] YuviPanda: I'm pretty sure you did some magic on -alpha [18:41:25] rdwrer: does it have puppetmaster::self? 
[18:41:32] It does [18:41:43] But I had to enable that because I was doing magic with limn [18:41:52] I can not do that now, because limn0 has our stuff on it [18:59:06] !log puppet - added Rush as member and projectadmin [18:59:08] Logged the message, Master [19:06:49] Coren: Failed to add rush to bastion. [19:06:53] what did i forget [19:09:50] mutante: I gave Rush shell rights in February? [19:12:32] scfc_de: he is supposed to use bastion-restricted-eqiad and that tells him he needs membership in project bastion [19:12:40] and when i try to add him to that it fails [19:12:53] while i could just add him to another project, "puppet" without problem [19:13:20] in February he was volunteer, how he is root [19:13:51] now [19:16:21] mutante: He is a member of the project Bastion (cf. https://wikitech.wikimedia.org/wiki/Special:NovaProject or "groups rush"). [19:16:52] scfc_de: chasemp [19:17:09] rush the shell account [19:17:09] now [19:17:12] 12:05 ' you must be a member of the bastion project.' [19:17:20] so once I setup everything to go through the ops bastion etc [19:17:21] I see [19:17:45] :ssh puppet-testing-1 [19:17:45] If you are having access problems, please see: https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances [19:17:45] Permission denied (publickey). [19:17:47] ssh_exchange_identification: Connection closed by remote host [19:17:59] I have changed my key since initial setup in wikitech [19:18:16] is it possible it is now stale, and/or was not added to the ops bastion prior since I was not ops [19:18:28] chasemp: Are you logged in on bastion-restricted and then try "ssh puppet-testing-1" on the command line there? 
[19:19:07] chasemp: your instance should also have a link to "view console output" btw, it's worth checking that to see if it has actually finished creating it as well [19:19:21] sometimes it might still be running puppet when it's new [19:19:49] gotcha, 'ssh rush@bastion-restricted-eqiad.wmflabs.org' == 'Permission denied (publickey)' [19:30:48] scfc_de: he doesn't have a home dir on bastion though [19:38:47] mutante: The home dir is created on the first successful login. Wrong key => no login => no home dir. [19:39:46] scfc_de: ok, that makes sense, he still can't login though [19:40:38] root@bastion-restricted1:/home# tail /var/log/syslog [19:40:39] Mar 14 19:38:11 bastion-restricted1 nslcd[1027]: [f6c55a] error writing to client: Broken pipe [19:40:42] Mar 14 19:38:11 bastion-restricted1 nslcd[1027]: [ca16ca] error writing to client: Broken pipe [19:40:45] hmmm [19:42:02] mutante: Is that related? [19:42:27] mutante: There are no keys in either /public/keys/chasemp or /public/keys/rush, so I think he hasn't uploaded his ssh key yet. [19:43:34] scfc_de: i think we literally just ended up with the same conclusion [19:43:35] yea [19:43:41] https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack [19:43:44] that [19:44:19] scfc_de: probably not related but looked like it potentially could [19:44:38] but no. [19:44:48] welp I'm clearly on the ball, I thought the key was being sourced from somewhere else [19:44:53] Hrm. Can I get a public IP for the multimedia project in eqiad? [19:45:01] I'm switching our multimedia-alpha instance over [19:45:13] We have one in pmtpa that I'll release once it's done [19:49:42] rdwrer: https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Howto#Web_access_in_eqiad suggests filing a bug (if andrewbogott_afk or Coren can't do it immediately). [19:51:19] Ah, [19:51:23] Got it. [19:51:51] Proxies! [19:51:54] Will do. 
[19:59:06] just migrated to eqiad and everything looks quite good :) [20:22:35] Is it known that proxies are like...way slower [20:22:40] 3x slower maybe [20:24:22] rdwrer: in what aspect? [20:24:30] Page loading [20:24:34] throughput should be comparable, but latency could be a bit higher [20:25:51] and could be extra high as I think the proxy is in pmtpa, so there's an extra pmtpa<->eqiad round trip involved [20:26:07] Heh [20:26:11] Wonderful [20:26:23] should be better somewhere next week :-) [20:26:28] <^d> I just created an instance in eqiad, hadn't done anything special and was logging in first time. [20:26:47] <^d> Got a mostly normal login but with "Unable to create and initialize directory '/home/demon'." and then signal killed. [20:28:38] <^d> Coren: ? [20:32:20] ^d: maybe it's the 'creation is too fast for ldap' thingy. reboot may fix it. [20:32:36] * ^d tries that [20:35:44] <^d> Hmm, no luck [20:36:54] ^d: that's the end of my powers ;) (there was a second thing about cache & timeout...) [20:40:37] !log local-tsreports migrate-tool tsreports results in ERROR 1049 (42000): Unknown database 'p50380g50943__cache'. Should not be a huge issue as it's just a cache table. [20:40:39] Logged the message, Master [20:42:03] <^d> And it's all happy now. [20:43:32] ^d: There is some negative caching going on, if you had a fail it takes several minutes before it times out and tries again for real. [20:44:07] <^d> gotcha. [20:44:26] Coren, "User databases owned by this tool on the replicas are not renamed but the new tool credentials have been granted access." [20:44:33] ^ that was not the case for tsreports-dev [20:44:51] ... oh? That's a problem. Which database is that? [20:45:21] err, p50380g50943__cache, now s51693__cache on tools-db [20:45:57] Perhaps my message isn't clear; the database /isn't/ renamed, so s51693 (your new user) should have full control over p50380g50943__cache [20:46:06] mysql doesn't actually allow renaming of databases. 
[20:46:13] (Much to my annoyance) [20:46:13] the database p50380g50943__cache doesn't exist [20:46:39] Coren: try 'show databases' on tools-db [20:46:40] ... what? Then that has nothing to do with migration; none of the databases on the replicas are touched by it. [20:46:43] Oh! [20:46:51] tools-db <-- not a replica database! [20:47:05] tools-db databases /are/ renamed! :-) [20:47:06] the message was talking about 'User databases'. [20:47:29] "User databases owned by this tool *on the replicas* [...]" :-) [20:47:40] .... [20:47:55] I think I get what you mean, but that's just a bizarre inconsistency [20:48:21] that means that if someone had a database with the same name on different servers, those databases might now have different names [20:48:28] Well, in the case of tools-db the actual databases are dumped, then restored in the new infrastructure so the name change is possible. [20:49:16] valhallasw: No; any databases you had on tools-db were named pXXXXXgYYYYY__%, and were renamed sZZZZZ__%. [20:49:27] tools-db [20:49:43] Yes, exactly. So if I had a pXXXXXgYYYYY__% on labsdbX /and/ a pXXXXXgYYYYY__% on tools-db, they would now have different names [20:50:04] in any case, the message as it is now is confusing. Then again, it'll be irrelevant in something like 48 hours anyway [20:50:47] Ah, I see what you mean. I doubt that's a likely scenario; and it's always possible to dump-drop-restore your pXXXXX databases to "rename" them. I just couldn't do it safely automatically. [20:51:37] !log local-tsreports after a git pull and fixing the config file, tsreports itself is also up in eqiad [20:51:38] Logged the message, Master [20:52:01] !log tsreports is the new morebots deployed yet? [20:52:01] tsreports is not a valid project. [20:52:08] apparently not [20:55:58] !log local-gerrit-patch-uploader Copy complete, but ldap permissions are broken. Fixing by adding and removing someone as maintainer. 
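The pXXXXXgYYYYY__% to sZZZZZ__% renaming Coren describes for tools-db amounts to a prefix substitution (the old per-user prefix is replaced by the tool's new service-user id). A sketch of that mapping, assuming the naming pattern quoted above; the function name and error handling are ours:

```python
import re


def migrated_name(old_db, new_uid):
    """Map a pre-migration tools-db database name such as
    'p50380g50943__cache' to its post-migration form, e.g.
    's51693__cache'. new_uid is the tool's new service-user
    id ('sZZZZZ'); only the suffix after '__' is kept."""
    m = re.match(r"p\d+g\d+__(.+)$", old_db)
    if m is None:
        raise ValueError(f"not a pXXXXXgYYYYY__ database name: {old_db}")
    return f"{new_uid}__{m.group(1)}"


# The tsreports case from the log above:
print(migrated_name("p50380g50943__cache", "s51693"))  # s51693__cache
```

As noted in the discussion, this rename only happens on tools-db (via dump and restore); databases on the replica servers keep their old names, with the new credentials granted access.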
[20:55:59] Logged the message, Master [20:57:31] !log local-gerrit-patch-uploader oauth roundtrip is broken. Might be on the mw.o end, though. Investigating. [20:57:32] Logged the message, Master [21:00:19] !log local-gerrit-patch-uploader Apparently oauth-callback forwards to Location:http://tools.wmflabs.org, tools.wmflabs.org/gerrit-patch-uploader/. Restarting the webservice on pmtpa to see if the issue is due to the move. [21:00:20] Logged the message, Master [21:03:05] !log local-gerrit-patch-uploader No issues on pmtpa. Issue with determining the URL base path, or a different Flask version? [21:03:07] Logged the message, Master [21:10:30] !log Upgraded Flask from 0.8 to 0.9, but no change. [21:10:31] Upgraded is not a valid project. [21:11:56] !log local-gerrit-patch-uploader Upgraded Flask from 0.8 to 0.9, but no change. [21:11:57] Logged the message, Master [21:14:32] !log local-gerrit-patch-uploader flask-mwoauth calls redirect('/gerrit-patch-uploader'), so this is an issue somewhere in Flask / the lighttpd configuration. [21:14:33] Logged the message, Master [21:15:50] Coren, any idea why Flask might decide '/gerrit-patch-uploader' should resolve to "http://tools.wmflabs.org, tools.wmflabs.org/gerrit-patch-uploader/"? [21:16:16] This did not happen on pmtpa, but it does happen in eqiad. The Flask versions are now the same, so this suggests some lighttpd config change. [21:17:18] The lighttpd configuration comes from puppet, so that didn't change. You'll probably see why it makes that decision if you turn request logging on in the .lighttpd.conf; that shows all the URL transformations it decides to do and why [21:18:37] So you'll quickly see if that's lighttpd doing something inane. [21:19:05] I'm thinking maybe lighttpd sends different server variables to the fcgi server [21:21:14] request logging doesn't help -- Flask sends back the Location: header (which, according to spec, must be an absolute URL). 
For some reason, Flask thinks the server name is 'http://tools.wmflabs.org, tools.wmflabs.org' [21:21:59] _SERVER["HTTP_X_FORWARDED_HOST"]tools.wmflabs.org, tools.wmflabs.org < is that allowed to contain multiple values? [21:23:21] yeah, this seems to be an issue for multiple web frameworks [21:28:38] !log local-gerrit-patch-uploader Werkzeug<0.9 cannot handle multiple hosts in HTTP_X_FORWARDED_HOST. Upgraded to 0.9.4, and adapted requirements.txt to require >=0.9. Works now. [21:28:39] Logged the message, Master [21:29:21] Coren: This will be an issue for anyone using Werkzeug, so providing 0.8.4 on eqiad is basically useless at the moment. [21:35:07] hello guys. quick question. I'm trying to access the toolserver on a macbook using the "connect to server" function (connection through the terminal is working fine) but i get an error every time. I guess i'm getting the server address wrong. I use (tools-login.wmflabs.org) is this correct? [21:46:47] i am suddenly getting hella cron-spam from tool labs every time one of my jobs gets submitted (which is often) - is there a trivial way to disable that? [21:50:57] awjr: > /dev/null 2>/dev/null [21:51:23] just >/dev/null will stop mails if submission went OK [21:51:46] if you also add 2>/dev/null all mails will be stopped [21:52:06] oh this is actually an output issue from the jsub job [21:52:07] duh [21:52:09] ok thanks valhallasw [21:52:59] !log sugarcrm create eqiad server for transfer, using instance name examplecrms to be more precise than the old one. [21:53:01] Logged the message, Master [22:05:27] awjr: Ah, yes, sorry about that. The information I sent last week about the new mail scheme also has the side effect that there is no longer local email delivery at all, it's /always/ forwarded. [22:06:01] Well, it's not the email that has that side effect, it's the change described in it. :-P [22:06:30] Can I get a floating IP for eqiad (moving server over from pmtpa)? 
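The root cause spelled out: each proxy hop appends its own entry to X-Forwarded-Host, so after the pmtpa-to-eqiad-to-webserver chain the header reads 'tools.wmflabs.org, tools.wmflabs.org', and Werkzeug < 0.9 used the raw value when building the absolute Location: URL. Newer Werkzeug effectively uses only the first entry; a minimal sketch of that behavior (our own function, not Werkzeug's actual code):

```python
def effective_host(x_forwarded_host):
    """Return the host a framework should use when X-Forwarded-Host
    carries one or more comma-separated values, one per proxy hop.
    Using the raw header produced redirects to
    'http://tools.wmflabs.org, tools.wmflabs.org/...'; taking the
    first (client-facing) entry restores the intended host."""
    return x_forwarded_host.split(",")[0].strip()


print(effective_host("tools.wmflabs.org, tools.wmflabs.org"))  # tools.wmflabs.org
```

This is why upgrading Werkzeug to 0.9.4 fixed the gerrit-patch-uploader redirects without any lighttpd change.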
[22:07:10] (sugarcrm project, though I'd like to rename the project too if that's possible ;) ) [22:10:44] Coren: no worries, thanks : [22:10:46] :) [22:14:18] resending : hello guys. quick question. I'm trying to access the toolserver on a macbook using the "connect to server" function (connection through the terminal is working fine) but i get an error every time. I guess i'm getting the server address wrong. I use (tools-login.wmflabs.org) is this correct? [22:19:21] onetimenichname: from the information I can find, "Connect to server" only lets you connect to servers that share filesystems and such, I don't think that applies to Tool Labs [22:20:43] Nettrom : I see, thanks. I thought I could browse the files hosted on fileserver. [22:34:09] onetimenichname: You may, but that will require an scp/sftp client. I'm sure there exists one for MacOS, although I am not familiar enough to be able to point you at one. [22:36:54] Coren : yes, I was lazy to look for one, but i will :) [22:37:57] valhallasw: I think the proxy configuration changed recently in that it now adds some additional header so that the formerly failing "/tool/dir" redirects are now properly resolved, so different issue than pmtpa => eqiad. [22:42:36] scfc_de: I think it's due to the pmtpa -> eqiad -> webserver proxying -- the X_FORWARDED_HOST server variable now contains two values (tools.wmflabs.org, tools.wmflabs.org) instead of just one [22:43:38] in any case, upgrading Werkzeug solved the issue [22:45:31] anyway, bed. [22:47:20] Good night. [23:33:17] Coren: Is there an ETA for PostgreSQL to be accessible by users? [23:33:49] hedonil: It's fairly high on my list of priorities. If I had to make an educated guess, I'd say ~ 2 weeks from now. [23:34:57] ahh fine! just wanted to try something out and noticed, that psql client package is still missing ... ;) [23:35:19] Coren, so....I'm getting jsub emails about my jobs being submitted from the crontab suddenly. 
[23:35:40] AFK [23:36:45] Cyberpower678: Known side effect of the mail system being prepared to go "really" live; there is no longer any local mail delivery. You /used/ to get those email already, and they were likely filling your local inbox. [23:37:27] Uh oh [23:38:04] Coren, can that be suppressed. I don't really feel like having my inbox get flooded with these message. [23:39:26] CP678|AFK: You can use "jsub -quiet" to suppress non-error messages. [23:39:43] scfc_de, thanks. [23:39:47] CP678|AFK: Sez 'man cron': "When executing commands, any output is mailed to the owner of the crontab". Have your cron be quiet and no email will be sent. [23:40:12] Yeah, if you use jsub, '-quiet' does what you'd expect. [23:40:53] CP678|AFK: Coren: Haha. Seems my mail provider is rejecting mails from Labs .... [23:41:15] CP678|AFK: Coren ...To reduce email spam reception, we use popular RBL (Realtime Blackhole List) lists. These lists contain dynamic IP addresses and IP addresses currently known to be spam senders. ;) [23:41:27] hehe [23:41:37] :p [23:42:29] hedonil: I think someone else pointed out that the IP has no reverse DNS. SPF & Co. wouldn't harm probably as well. [23:43:12] scfc_de: I was just kiddin'. you're right: 554 invalid DNS PTR resource record [23:44:04] to silence cron jobs, append to your cron command line: >/dev/null 2>&1 [23:44:28] CP678|Dinner
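Putting the advice above together (jsub's -quiet flag plus cron's output redirection), a crontab entry that generates no cron mail at all might look like this; the job name and script path are hypothetical:

```
# Run every 10 minutes. -quiet suppresses jsub's non-error messages;
# >/dev/null 2>&1 discards any remaining stdout/stderr, so cron has
# nothing to mail to the crontab owner.
*/10 * * * * jsub -quiet -N mytool-job $HOME/run.sh >/dev/null 2>&1
```

Per crontab(5), cron mails the owner whatever a command writes to its output streams, so silencing both streams is what actually stops the mail.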