[00:37:24] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 24% free memory
[01:01:46] Alchimista, do you have the username alchimis on labs?
[01:02:36] addshore: no, my nick is lchimista too
[01:02:50] *alchimista
[01:03:14] hmm, it might just be chopping off a few chars
[01:03:26] are you running a python process on bots4?
[01:03:43] yes, is something wrong with it?
[01:04:16] it's just using rather a lot of resources :P 49% ram and currently 97% cpu
[01:05:01] :O it's an iw bot. i'll kill it and remove it from crontab, i need to check how to manage it on labs
[01:05:20] haha :p
[01:05:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 11% free memory
[01:05:44] I'm guessing it has a mem leak somewhere or something :P
[01:05:58] but weird if you are using the pywiki script :O
[01:06:31] it's usual, after a few days it consumes too much memory. on ts, using cronie we can specify when the bot will be automatically killed, and the maximum amount of ram and cpu it can use
[01:06:40] it's the standard py script
[01:07:06] what you could do is restart the bot each day
[01:07:35] when the bot runs, write the pid to a file and have a cronjob load that file and kill the pid before it then starts the next run on the following day
[01:08:03] right now it's more to get comfortable with the labs environment, so that i can move my bots to labs
[01:08:32] ahhh :p
[01:08:35] i love it :P
[01:08:42] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 175 processes
[01:08:50] the only thing it is missing is replication databasseessss!!!!
[01:09:08] i was playing around with crontab and pgrep, i'm more used to ts cronie and qstat
[01:10:18] the bot was already killed?
[01:10:34] ?
[01:11:05] PID 7122 still running :P 893 mins xD
[01:11:39] there's a reason pywiki interwiki scripts are mostly banned on the toolserver...
[01:11:55] are they greedy? :P I have never run one personally
[01:12:41] Basically. Long live wikidata :P
[01:13:44] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 100 processes
[01:20:48] once in bastion, should it be "ssh bots-4.pmtpa.wmflabs" to enter bots-4?
[01:23:01] yep :)
[01:23:29] nevermind, i forgot the -A flag :S must set up my ssh/config
[01:25:43] well, the bot is off now, and i'm going to bed. i'll have to explore a little more of labs
[04:40:24] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 20% free memory
[04:51:02] PROBLEM Free ram is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Critical: 5% free memory
[05:03:42] PROBLEM Free ram is now: WARNING on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: Warning: 19% free memory
[05:08:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 14% free memory
[06:28:55] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 152 processes
[06:33:52] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 148 processes
[06:48:43] RECOVERY Free ram is now: OK on wordpressbeta-precise.pmtpa.wmflabs 10.4.0.215 output: OK: 49% free memory
[06:50:53] RECOVERY dpkg-check is now: OK on mw1-21beta-lucid.pmtpa.wmflabs 10.4.0.182 output: All packages OK
[07:30:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Unknown
[07:40:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 10% free memory
[08:38:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 24% free memory
[08:41:04] PROBLEM Free ram is now: WARNING on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Warning: 6% free memory
[09:06:05] PROBLEM Free ram is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Critical: 4% free memory
[09:11:23] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 14% free memory
[09:26:04] RECOVERY Free ram is now: OK on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: OK: 97% free memory
[09:50:52] !log bots petrb: updating some configs in apache to fix broken urls
[09:50:54] Logged the message, Master
[12:41:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 20% free memory
[12:41:54] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 20% free memory
[13:09:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 14% free memory
[13:09:53] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 14% free memory
[13:32:44] PROBLEM Free ram is now: WARNING on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: Warning: 19% free memory
[13:58:49] !tunnel
[13:58:50] ssh -f user@bastion.wmflabs.org -L :server: -N Example for sftp "ssh chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N" will open bots-1:22 as localhost:6000
[14:44:03] @labs-resolve bot
[14:44:03] I don't know this instance - aren't you are looking for: I-0000009c (bots-2), I-0000009e (bots-cb), I-000000a9 (bots-1), I-000000af (bots-sql2), I-000000b4 (bots-sql3), I-000000b5 (bots-sql1), I-000000e5 (bots-3), I-000000e8 (bots-4), I-0000015e (bots-labs), I-00000190 (bots-dev),
[16:12:45] !q1
[16:12:45] Damianz where is the ramcheck in puppet?
[16:13:08] I am wondering if Damianz ever read it
[16:14:12] https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=files/nagios/check_ram.sh;h=bd26d5dee05ea17056733b8018a6ddce1dfce59f;hb=refs/heads/production
[16:16:39] can you send me the path - in current HEAD
[16:16:51] puppet sucks too much :/
[16:17:08] I did grep "ram_check" `find .` - no results
[16:17:13] in puppet root
[16:17:24] that means there is not a single file containing "ram_check"
[16:18:36] petanb@server:~/puppet$ ls files/nagios/
[16:18:37] cgi.cfg check_longqueries check_to_check_nagios_paging misccommands.cfg percona
[16:18:39] check_all_memcached.php check_MySQL.php check_udp2log_log_age nagios.cfg purge-nagios-resources.py
[16:18:39] check_bad_apaches check_mysql-replication.pl check_udp2log_procs nagios-init resource.cfg
[16:18:40] check_cert check-ssl-cert contactgroups.cfg nrpe-server-init submit_check_result
[16:18:41] check_dpkg check_stomp.pl gammurc nrpe_udp2log.cfg timeperiods.cfg
[16:18:42] check_job_queue check_subdir_limit migration.cfg page_all
[16:19:11] I don't see check_ram.sh Damianz no idea
[16:30:36] ok, Damianz if u can access that file, can you change it?
[16:33:06] !git
[16:33:07] For more information about git on labs see https://labsconsole.wikimedia.org/wiki/Help:Git
[16:33:11] !search git
[16:33:11] http://bots.wmflabs.org/~wm-bot/searchlog/index.php?action=search&channel=%23wikimedia-labs
[16:33:18] @search git
[16:33:18] Results (Found 9): leslie's-reset, damianz's-reset, account-questions, git, origin/test, git-puppet, gitweb, msys-git, git-branches,
[16:33:26] !origin/test
[16:33:27] git checkout -b test origin/test
[16:33:46] !git-puppet
[16:33:46] git clone ssh://gerrit.wikimedia.org:29418/operations/puppet.git
[16:34:03] fuck
[16:34:08] jeremyb around?
[16:34:21] can you tell me that cheat to push stuff in wikimedia git?
[16:34:27] I can never remember
[16:37:43] RECOVERY Free ram is now: OK on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: OK: 34% free memory
[16:44:52] ok I did it
[17:30:43] PROBLEM Free ram is now: WARNING on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: Warning: 19% free memory
[17:32:41] petan: hi
[17:34:23] PROBLEM host: preilly.pmtpa.wmflabs is DOWN address: 10.4.1.3 CRITICAL - Host Unreachable (10.4.1.3)
[18:45:53] PROBLEM dpkg-check is now: CRITICAL on testing-amf.pmtpa.wmflabs 10.4.1.54 output: DPKG CRITICAL dpkg reports broken packages
[18:57:26] !tunnel
[18:57:26] ssh -f user@bastion.wmflabs.org -L :server: -N Example for sftp "ssh chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N" will open bots-1:22 as localhost:6000
[18:59:53] PROBLEM host: eelwelling.pmtpa.wmflabs is DOWN address: 10.4.1.3 CRITICAL - Host Unreachable (10.4.1.3)
[19:08:52] RECOVERY host: eelwelling.pmtpa.wmflabs is UP address: 10.4.1.3 PING OK - Packet loss = 0%, RTA = 0.71 ms
[19:09:23] PROBLEM Total processes is now: CRITICAL on eelwelling.pmtpa.wmflabs 10.4.1.3 output: Connection refused by host
[19:10:52] PROBLEM Current Load is now: CRITICAL on eelwelling.pmtpa.wmflabs 10.4.1.3 output: Connection refused by host
[19:10:53] PROBLEM dpkg-check is now: CRITICAL on eelwelling.pmtpa.wmflabs 10.4.1.3 output: Connection refused by host
[19:11:32] PROBLEM Current Users is now: CRITICAL on eelwelling.pmtpa.wmflabs 10.4.1.3 output: Connection refused by host
[19:12:12] PROBLEM Disk Space is now: CRITICAL on eelwelling.pmtpa.wmflabs 10.4.1.3 output: Connection refused by host
[19:13:02] PROBLEM Free ram is now: CRITICAL on eelwelling.pmtpa.wmflabs 10.4.1.3 output: Connection refused by host
[19:53:29] andrewbogott: I'm trying to commit my code. when I run "git review", I get an error message: http://justpaste.it/1s4f
[19:54:33] mike_wang: can you paste the output of: git remote -v
[19:54:35] hm… did you ssh -A to your dev instance?
[19:54:48] mike_wang: you also should use git-review -s
[19:55:19] mwang@mwang-dev:~/puppet$ git remote -v
[19:55:21] origin https://gerrit.wikimedia.org/r/p/operations/puppet (fetch)
[19:55:23] origin https://gerrit.wikimedia.org/r/p/operations/puppet (push)
[19:55:24] mwang@mwang-dev:~/puppet$
[19:55:26] that would set up your working copy for you (aka create a remote named "gerrit" and download a script that adds a "Change-Id: 12031240ABCDEF" line in your commit summaries)
[19:55:34] try out 'git-review -s'
[19:55:38] that should fix it up
[19:55:57] <<< We don't know where your gerrit is. Please manually create a remote named "gerrit" and try again. >>>
[19:57:39] andrewbogott: git-review -s error message. http://justpaste.it/1s4h
[19:59:08] hashar: ^^
[19:59:25] doh
[19:59:46] mike_wang: might be an old version of git-review :-D
[19:59:57] you can simply try renaming the remote
[20:00:01] it is currently named "origin"
[20:00:12] you can rename it using: git remote rename origin gerrit
[20:00:38] or create a new one: git remote add gerrit ssh://mwang@gerrit.wikimedia.org:29418/operations/puppet.git
[20:00:45] (but then you get two remotes which is often confusing)
[20:00:51] hashar, mike_wang, I'm wondering about shell username vs. gerrit username. Mike, are yours the same?
[20:00:52] so I would recommend renaming the "origin" one
[20:01:26] andrewbogott: in this case, the issue is in the git-review python scripts. It looks for a git remote named "gerrit" and bails out whenever it can't be found
[20:01:37] oh, ok.
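The remote rename suggested above can be tried safely in a throwaway repository first. This is only a minimal sketch, assuming git is installed: the temporary directory is illustrative, nothing here contacts Gerrit, and the URL is simply the one shown in the `git remote -v` output in the log.

```shell
#!/bin/sh
set -e

# Throwaway repository so a real working copy is left untouched (illustrative path).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Reproduce the situation from the log: a single remote called "origin".
git remote add origin https://gerrit.wikimedia.org/r/p/operations/puppet

# git-review looks for a remote named "gerrit", so rename the existing one
# instead of adding a second remote.
git remote rename origin gerrit

# Show the result: the URL is unchanged, only the remote's name differs.
git remote -v
```

The rename only changes the remote's name. It does not change the protocol, so a remote that was cloned over HTTPS still points at the HTTPS URL afterwards.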
[20:02:20] when we switched to git a year ago, that was a common support request :-]
[20:02:40] mwang@mwang-dev:~/puppet$ git review
[20:02:42] Problems encountered installing commit-msg hook
[20:02:43] The following command failed with exit code 1
[20:02:45] "scp -P None gerrit.wikimedia.org:hooks/commit-msg .git/hooks/commit-msg"
[20:02:47] -----------------------
[20:02:49] Bad port ' None'
[20:02:53] hehe
[20:03:01] stupid git-review
[20:03:17] mike_wang: what's in your .gitreview?
[20:03:46] ah
[20:03:48] https://
[20:03:49] mwang@mwang-dev:~/puppet$ more .gitreview
[20:03:50] no no
[20:03:50] [gerrit]
[20:03:52] host=gerrit.wikimedia.org
[20:03:54] port=29418
[20:03:55] project=operations/puppet.git
[20:03:57] defaultbranch=production
[20:04:10] I guess the git-review version you are using is buggy / too old
[20:04:14] this is https://bugs.launchpad.net/git-review/+bug/1021073
[20:04:24] he uses https I guess?
[20:04:46] origin https://gerrit.wikimedia.org/r/p/operations/puppet (push)
[20:04:47] oh yeah
[20:04:55] saper: you are suuuch a hacker :-]
[20:05:10] and there is https://bugs.launchpad.net/git-review/+bug/1075751
[20:05:18] once you use SSH
[20:06:19] hashar: I propose to change the logic completely here https://bugs.launchpad.net/git-review/+bug/1097278 (try remotes first, then fiddle with the new one/.gitreview)
[20:07:01] I would try .gitreview first though :-]
[20:07:20] I wouldn't
[20:07:21] ah no
[20:07:26] we wanted originally to get rid of it
[20:07:31] you might not have the correct remote indeed
[20:07:38] so you want to guess from the remote yeah
[20:07:41] that makes sense
[20:07:58] if you did "git clone" there is a good chance you have a good remote, no?
[20:08:05] yup
[20:08:11] and this https://bugs.launchpad.net/git-review/+bug/1097278 sux really on first run
[20:08:17] and good chance that you cloned from a gerrit install
[20:08:35] I got a clone script somewhere
[20:08:44] but at this speed of development git-review will be fixed in the 22nd century
[20:08:47] basically do something like: $ clone mediawiki/extension/Foobar
[20:08:55] it changes dir to ~/projects
[20:08:59] mkdir -p
[20:09:08] clone -o gerrit
[20:09:10] done :-]
[20:09:10] https://review.openstack.org/#/q/status:open+project:openstack-infra/git-review,n,z
[20:09:24] yeah I know your "rant" about upstream being slow
[20:09:32] maybe I should review your patches
[20:09:57] they were also talking about a test suite to check for possible regressions
[20:11:41] hashar, saper, I'm following your backscroll but can't tell what you think mike_wang should be doing to fix the problem...
[20:11:50] Could anyone explain to me who can add me to https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots ?
[20:12:00] oh mike hmm
[20:12:08] mike_wang: so you have to change your remote again :-]
[20:12:16] mike_wang: your remote should now be named "gerrit"
[20:12:24] but points to the HTTPS url instead of the SSH one
[20:12:30] you can't push over HTTPS
[20:12:34] must use ssh there
[20:12:36] valhallasw, I can add you; what is your username?
[20:13:01] andrewbogott: Reedy beat you to it, but thanks :-)
[20:13:08] 'k
[20:13:15] however - who can add people and who cannot?
[20:13:16] mike_wang: so you have to change the URL: git remote set-url gerrit ssh://mwang@gerrit.wikimedia.org:29418/operations/puppet.git
[20:13:29] andrewbogott: hashar is in the user list but apparently was unable to
[20:13:31] mike_wang: then attempt to connect using : git fetch gerrit
[20:13:45] valhallasw: I think I am a simple mortal on the bots project.
[20:13:49] valhallasw, yes, one must be a sysadmin to add new members.
[20:13:59] andrewbogott: and how can they be recognised?
[20:14:05] the admin list is empty
[20:14:10] valhallasw: and I have no idea what the policy is for adding new people to the bots project :/ Petan or Damianz would know I guess
[20:14:27] Reedy figured it out :-]
[20:14:41] we need an IRC bot to let us add people to projects
[20:14:50] !add reedy to bots
[20:14:50] https://labsconsole.wikimedia.org/wiki/Help:Addresses
[20:14:55] that would be neat
[20:14:57] valhallasw, not sure… at the moment asking on IRC or sending an email to the labs list is the proper process… pending more automation.
[20:15:03] andrewbogott: OK.
[20:15:59] valhallasw what is your labs id?
[20:16:03] petan: Merlijn van Deen
[20:16:11] with spaces?
[20:16:22] @labs-user Merlijn van Deen
[20:16:22] Merlijn van Deen is member of 1 projects: Bots,
[20:16:26] ok
[20:16:37] you already have access to bots, but not to shell, one sec
[20:17:15] !log bastion giving access to Merlijn van Deen
[20:17:16] Logged the message, Master
[20:17:37] Ryan_Lane: Failed to add Merlijn van Deen to bastion. This needs the "loginviashell" right.
[20:17:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 10.80, 8.26, 6.29
[20:17:55] sec
[20:17:59] hashar: mwang@mwang-dev:~/puppet$ git remote set-url gerrit ssh://mwang@gerrit.wikimedia.org:29418/operations/puppet.git
[20:18:00] mwang@mwang-dev:~/puppet$ git review
[20:18:02] Problems encountered installing commit-msg hook
[20:18:03] The following command failed with exit code 1
[20:18:05] "scp -P 29418 mwang@gerrit.wikimedia.org:hooks/commit-msg .git/hooks/commit-msg"
[20:18:06] -----------------------
[20:18:08] Permission denied (publickey).
[20:18:13] AH
[20:18:15] much better :-]
[20:18:24] so this time you did connect to the gerrit host :-]
[20:18:27] and got denied
[20:18:41] mike_wang: so we are a step ahead :-]
[20:18:45] /now/ my question about ssh -A is relevant.
[20:18:47] mike_wang did you upload your ssh key?
[20:18:53] maybe
[20:19:12] petan: yes I uploaded my ssh key
[20:19:21] mike_wang to gerrit and labsconsole?
[20:19:31] there are 2 places you need to upload to
[20:19:33] petan: both
[20:20:13] k, can you try to do just ssh -p 29418 mwang@gerrit.wikimedia.org
[20:20:24] I guess it rejects your key for some reason
[20:21:08] petan: mwang@mwang-dev:~/puppet$ ssh -p 29418 mwang@gerrit.wikimedia.org
[20:21:10] Permission denied (publickey).
[20:21:21] ok do you have your private key in ~/.ssh
[20:21:28] it needs to have chmod 600
[20:21:42] you should also have your public key there
[20:21:51] although it shouldn't be necessary
[20:22:22] petan: that makes sense. I don't have my private key in .ssh
[20:22:32] alternatively you can do ssh -v -p29418 mwang@gerrit.wikimedia.org in order to debug this issue
[20:22:52] it doesn't need to be in .ssh but that's the most typical path
[20:23:07] on most distributions and systems I have ever seen
[20:23:20] so try to copy it there and give it chmod 600
[20:24:03] on a side note - is it possible to receive mail on labs?
[20:24:22] valhallasw depends - receive should be possible
[20:24:26] sending from labs is harder
[20:24:45] mike_wang, petan: Wait, I want to make sure that we aren't encouraging mike to put his private key in .ssh on a labs instance...
[20:24:58] ok
[20:25:21] mike_wang I am talking about the machine you attempt to use git from
[20:25:25] petan: Mike can clearly ssh to bastion and on to his instance. So there's every reason to think his key is set up correctly.
[20:25:33] And he is accessing git from a labs instance.
[20:25:35] petan: I'm thinking of using mediawiki-commits instead of gerrit's stream for the reviewer bot, and this would be a good reason to move it from the TS
[20:25:58] petan, mike_wang, so the only real possibility is that he isn't forwarding his key to mwang-dev or that gerrit doesn't know about that key.
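The "not forwarding his key" diagnosis above is usually addressed with SSH agent forwarding, so the private key stays on the local machine. The following is a minimal sketch, assuming OpenSSH: the host alias `labs-bastion` and the user `youruser` are hypothetical placeholders, and the config is written to a temporary file rather than to `~/.ssh/config`.

```shell
#!/bin/sh
set -e

# Write the sketch to a temporary file instead of touching ~/.ssh/config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Hypothetical entry: "labs-bastion" and "youruser" are placeholders.
Host labs-bastion
    HostName bastion.wmflabs.org
    User youruser
    # Same effect as passing -A on the command line for this host.
    ForwardAgent yes
EOF
cat "$cfg"

# With such an entry in place, the manual checks would look like this
# (left as comments because they need a real account and network access):
#   ssh-add -l                   # locally: is the key loaded in the agent?
#   ssh -F "$cfg" labs-bastion   # hop to the bastion with forwarding enabled
#   ssh -A mwang-dev             # forwarding must be requested on every hop
#   ssh-add -l                   # on the instance: the same key should be listed
```

Agent forwarding should only be enabled toward hosts you trust, since a user with root on an intermediate host can use the forwarded agent while the session is open.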
[20:26:21] valhallasw yes it's not a problem to move anything there but sending e-mails is not so easy atm, do you need to do that?
[20:26:31] did someone else add sell to valhallasw ?
[20:26:33] petan: nope.
[20:26:35] *shell
[20:27:00] Ryan_Lane: how come it's possible to insert someone to bots and not to bastion when they don't have shell
[20:27:15] that's a good question. should be impossible
[20:27:22] but he is there
[20:27:25] he has shell
[20:27:35] ok can you insert him to bastion?
[20:27:38] maybe it's just me
[20:27:40] oh. wow. weird
[20:27:49] I see a bug
[20:27:53] a caching bug
[20:27:57] aha
[20:28:04] cache is evil
[20:28:08] really
[20:28:21] @labs-user Merlijn van Deen
[20:28:21] Merlijn van Deen is member of 1 projects: Bots,
[20:28:46] valhallasw: what's your labsconsole username?
[20:28:54] Ryan_Lane ^^
[20:28:56] see my query
[20:29:01] * Ryan_Lane nods
[20:29:15] now he's in shell
[20:29:29] it checked the rights of the incorrect user
[20:29:32] andrewbogott: you are right. I did not forward my key to mwang-dev
[20:29:38] that's a nasty bug
[20:29:44] @labs-user Merlijn van Deen
[20:29:44] Merlijn van Deen is member of 1 projects: Bots,
[20:29:48] hmm
[20:29:51] mike_wang: OK, so -- working now?
[20:29:52] anyway you are in bastion now
[20:30:02] valhallasw do you know what bastion is?
[20:30:04] !bastion
[20:30:04] http://en.wikipedia.org/wiki/Bastion_host; lab's specific bastion host is: bastion.wmflabs.org which should resolve to 208.80.153.194; see !access
[20:30:13] petan: yeah, the reverse ssh proxy
[20:30:18] ok
[20:30:20] or, well, proxy, er, login host
[20:30:28] you can ssh there now
[20:30:28] ok. I need to put a bug in for this....
[20:30:46] !botsdocs | valhallasw
[20:30:47] valhallasw: https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation
[20:31:02] on that page you will find some basic information about bots
[20:31:31] in case you need root, you should move your bot to bots-3, but don't forget to !log things
[20:31:41] if you don't then bots-bnr1 is the best instance we have atm
[20:31:47] lots of ram and lots of resources for anything
[20:31:56] OK. I should be fine without.
[20:32:00] also, because root is restricted there, it's safer
[20:32:10] is there any project structure within the resource?
[20:32:29] e.g. unix groups
[20:32:30] https://bugzilla.wikimedia.org/show_bug.cgi?id=43968
[20:32:44] this is likely due to getCanonicalName, the most evil of all mediawiki functions
[20:32:55] hmm, depends, there are some basic groups in ldap, but not really
[20:33:37] basically, I'd like to mirror the current nlwikibots toolserver MMP
[20:33:52] where ~5 users from nlwiki can maintain all three-or-so bots
[20:33:56] ok, you can create a local service user or a group if u need
[20:34:24] https://bugzilla.wikimedia.org/show_bug.cgi?id=40024
[20:34:31] petan: ok.
[20:34:41] for this kind of stuff, sharing the access to multiple members, I recommend using some shared service account - or, we can create a group for you all
[20:34:42] andrewbogott: almost working. http://justpaste.it/1s4x
[20:35:31] sharing stuff between multiple people wasn't yet discussed much, so far we were mostly using instances where everyone has root, so it wasn't hard to switch to someone else in order to access their bots
[20:35:57] mike_wang: That looks correct! Just do a 'git commit --amend' and don't make any changes to the message...
[20:36:00] Which I find frightening, to be honest.
[20:36:06] though we were having a lot of trouble with that...
[20:36:08] git-review will add the changeid and then you should be able to submit it.
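The amend step above works because `git commit --amend` re-runs the `commit-msg` hook on the existing message. The sketch below illustrates that mechanism in a throwaway repository, under loud assumptions: the hook written here is a simplified stand-in and not Gerrit's real `commit-msg` script (which git-review fetches from the Gerrit host, as the `scp` lines in the log show), and the identity values and commit message are placeholders.

```shell
#!/bin/sh
set -e

# Throwaway repository (illustrative path) with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.org"

# A commit made before any hook was installed, so it carries no Change-Id.
echo "test" > file.txt
git add file.txt
git commit -q -m "Add proxy class"

# Stand-in for Gerrit's commit-msg hook. This is NOT the real script: it only
# appends a Change-Id line when the message does not already have one.
mkdir -p .git/hooks
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
if ! grep -q '^Change-Id: ' "$1"; then
    id=$(date +%s | git hash-object --stdin)
    printf '\nChange-Id: I%s\n' "$id" >> "$1"
fi
EOF
chmod +x .git/hooks/commit-msg

# Amending without editing the message runs the hook on the old message.
git commit -q --amend --no-edit

# The last commit message now ends with a Change-Id line.
git log -1 --format=%B
```

Using `--no-edit` matches the advice in the log to leave the message unchanged: the only difference after the amend is the line the hook appended.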
[20:36:33] but OK, I'll take a look
[20:36:54] valhallasw you can always register a special account for your bot in labsconsole, once you create an account in that it's already in ldap
[20:37:35] petan: eh, as in having a separate LDAP user?
[20:37:44] yes, then you'd be able to sudo to it
[20:37:49] yes
[20:38:05] valhallasw: it doesn't need shell access and doesn't need to be added to the project
[20:38:12] if the account exists all instances in labs know about it
[20:38:32] you don't need to add an ssh key for it either
[20:38:34] and how would I configure who can sudo to it?
[20:38:40] or would sudo use the LDAP password?
[20:38:40] I can do that
[20:38:50] OK
[20:38:57] we can create a "sudo group" for all people who would need to access it
[20:39:16] ah, of course. Great.
[20:39:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 20% free memory
[20:39:38] petan: ugh. that's still broken, isn't it?
[20:39:47] I guess it isn't if you don't specify the instances
[20:39:53] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 20% free memory
[20:39:56] It's essentially how toolservers' MMPs work
[20:40:02] valhallasw: yep
[20:40:09] except you can create them yourself
[20:40:20] Ryan_Lane it works so far :/ or it seems to me
[20:40:23] hooray for self-registration
[20:40:27] petan: oh. ok
[20:40:32] petan: andrewbogott may have fixed it
[20:40:42] RECOVERY Free ram is now: OK on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: OK: 34% free memory
[20:40:45] Ryan_Lane: well, sort-of, because you still need a root to configure the sudo rights
[20:41:07] valhallasw: yeah. true. though it's volunteers who manage that as well
[20:41:29] once I finish our new deployment system I'm going to go on a sprint to make some of this stuff easier
[20:41:40] Is there an SSH fingerprint list?
[20:41:48] unfortunately not yet
[20:42:06] I'll be solving that problem soon
[20:42:25] since I have an easy method for doing so now
[20:43:34] Change on mediawiki a page Wikimedia Labs was modified, changed by Ryan lane link https://www.mediawiki.org/w/index.php?diff=628675 edit summary: [+83] /* Proposals */
[20:45:46] ok valhallasw ping me when you need me
[20:45:55] petan: sure! thanks for your help
[20:46:00] you too, Ryan_Lane :-)
[20:46:03] np
[20:46:06] totally welcome
[20:46:42] reconnecting
[20:49:35] can somebody help me with a login issue in labsconsole/gerrit?
[20:49:46] Vulpix yup
[20:49:48] Ryan_Lane: is there a reason the instances are not using host-based auth internally? Setting up an ssh agent is difficult - especially for new users.
[20:50:03] host-based auth?
[20:50:15] well, it's a bit embarrassing. It *seems* I forgot my password. That is, I put what I think is my password but it says it's invalid
[20:50:29] I tried the email me a new password, and received the mail with a temp one
[20:50:30] Vulpix: it should be your labsconsole password
[20:50:31] Vulpix can you login to labsconsole?
[20:50:45] Ryan_Lane: yes. basically, a user logs in to a single host, and that host declares to all other hosts 'the user has authenticated to me'
[20:50:51] you need to reset the password from the web interface
[20:51:03] valhallasw: hm. that would be a little insecure
[20:51:17] Ryan_Lane: it would be as secure as the login server, basically.
[20:51:19] valhallasw or you can create a private key for labs
[20:51:25] valhallasw and upload it to bastion
[20:51:29] I login with the temp password (in labsconsole.wikimedia.org), but when it prompts me to change it, it says there was an error
[20:51:32] petan: that is a practical option, yes
[20:51:36] that would allow you to ssh anywhere from there without an agent
[20:51:42] I'll use that in the guide. Thanks.
[20:52:24] petan: when I change my password it says "There was either an authentication database error or you are not allowed to update your external account"
[20:52:54] wow - I would say make a ticket in bugzilla :P but Ryan_Lane sees it here :D
[20:53:20] Vulpix, any chance you enabled two-factor auth in labs?
[20:53:28] And are not, currently, giving it your second factor?
[20:53:41] I don't know what it is so... I guess no
[20:53:57] I leave the Token field blank
[20:54:49] Ryan_Lane: it's being used on the toolserver, so DaB. and nosy probably know more about it.
[20:55:09] so... I should make a bugzilla about that?
[20:57:04] Vulpix: Maybe, but let me check one more thing...
[20:57:23] hm
[20:57:29] Ok, no problem. I don't need my account for today ;)
[20:57:30] lemme look at the log
[20:58:04] Vulpix: mind trying it for me?
[20:58:09] I'm tailing the log
[20:58:17] okay, I'll try again now
[20:58:36] mediawiki's auth system needs to be rewritten :(
[20:58:38] done, same error
[20:58:40] ah
[20:58:43] which username are you using?
[20:58:48] Martineznovo
[20:59:16] well, I hope it's not because it should be all lowercase...
[20:59:33] andrewbogott: hm. we may want to revert that change you made for uid/username lookups
[20:59:34] well, the temporary password worked, so I guess no
[21:00:00] Vulpix: try now
[21:00:03] Ryan_Lane: Because it works for password reset but not for login, or something?
[21:00:19] andrewbogott: I think it has a memcache bug
[21:00:48] Ryan_Lane: nope, same error
[21:00:59] why do I get logged out from labsconsole every... 30 minutes or so?
[21:01:13] valhallasw: are you using "remember me"?
[21:01:27] valhallasw: I needed to restart memcache a couple times
[21:01:31] eh, probably not.
[21:01:33] ah, I see
[21:01:37] sorry about that
[21:01:43] np.
[21:01:45] I should probably warn people when I do it
[21:02:11] Vulpix: can you have it send a new password reset?
[21:02:18] sure
[21:02:38] for some reason it doesn't see which domain you're using when trying to reset your password
[21:02:42] this is some odd mediawiki bug
[21:02:52] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 13% free memory
[21:03:25] same error :S
[21:03:37] with new password emailed
[21:03:41] yep
[21:03:56] oooohhhhhhh
[21:04:00] that's not your username
[21:04:07] this is: Jesús Martínez Novo
[21:04:08] what
[21:04:48] eh... it's not possible... my gerrit username is martineznovo
[21:04:51] ah
[21:04:58] I know what happened here, then
[21:05:06] the realname bug that lasted for one week
[21:05:08] let me fix this
[21:05:09] Jesús Martínez Novo may be my "real name"
[21:05:18] oh
[21:05:50] ok. it'll work now
[21:05:53] Vulpix there was briefly a bug that resulted in your username being changed if you changed your 'real name' in your account settings.
[21:06:01] Well, I guess I shouldn't have changed my realname there :P
[21:06:08] nah, it's fine now
[21:06:09] Which, there's no reason you should've known that
[21:06:16] it was broken for a very short period of time
[21:06:28] Vulpix: make sure your gerrit web login is working too, please
[21:06:33] if not I'll fix that now as well
[21:06:52] hmmm, so maybe my old password was valid but the login failed because of that :S
[21:06:53] Ryan_Lane: So do you still think there's a caching problem in that patch? I'm looking at it now and, if it interacts with the cache it's too subtle for me to see.
[21:07:24] woohooo, yes! my old password works!!
[21:07:35] many thanks Ryan_Lane ;)
[21:08:01] great :)
[21:08:07] sorry about it being broken
[21:08:23] back in a bit. lunch
[21:12:23] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 14% free memory
[21:37:44] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 7.23, 6.22, 5.41
[21:38:44] PROBLEM Free ram is now: WARNING on newprojectsfeed-bot.pmtpa.wmflabs 10.4.0.232 output: Warning: 19% free memory
[21:39:54] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 7.41, 6.37, 5.42
[22:07:43] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 4.25, 4.19, 4.98
[22:11:04] PROBLEM Free ram is now: WARNING on swift-be3.pmtpa.wmflabs 10.4.0.124 output: Warning: 19% free memory
[22:21:01] Ryan_Lane, could you have a look at Mike's proxy class? https://gerrit.wikimedia.org/r/#/c/43886/
[22:24:04] …partly because I want to talk about how/if/when we put that on a production machine.
[22:25:46] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 4.94, 5.41, 5.12
[22:34:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 4.15, 5.23, 5.08
[22:37:44] PROBLEM dpkg-check is now: CRITICAL on follow01-dev.pmtpa.wmflabs 10.4.0.243 output: DPKG CRITICAL dpkg reports broken packages
[22:39:44] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 5.36, 4.81, 4.89
[22:47:28] Ryan_Lane: we quickly talked about git-deploy during our mediawiki weekly meeting. Chris Steipp might show up to ask you a bit about its status in beta :-]
[22:49:22] hashar: it's the same in beta as in production
[22:49:35] hashar: we should switch all wikis to use the wmf branches for testing
[22:49:40] until we deploy to production
[22:49:53] PROBLEM host: follow01d.pmtpa.wmflabs is DOWN address: 10.4.1.40 CRITICAL - Host Unreachable (10.4.1.40)
[22:50:33] ahh
[22:50:38] I did not understand that, sorry
[22:51:13] Ryan_Lane: that would solve the issue with the recursive submodule update you talked about last week?
[22:55:07] will symlink
[22:56:03] RECOVERY Free ram is now: OK on swift-be3.pmtpa.wmflabs 10.4.0.124 output: OK: 21% free memory
[22:56:36] !log deployment-prep updating mediawiki-config fd29e6a..329113f
[22:56:38] Logged the message, Master
[22:58:41] !log deployment-prep renamed php-1.21wmf{6,7} with a -back prefix. Created symbolic links to the git-deploy slots: ln -s /srv/deployment/mediawiki/slot1 php-1.21wmf6 and /srv/deployment/mediawiki/slot0 php-1.21wmf7
[22:58:41] Logged the message, Master
[22:58:53] RECOVERY host: follow01d.pmtpa.wmflabs is UP address: 10.4.1.40 PING OK - Packet loss = 0%, RTA = 9.51 ms
[22:59:23] PROBLEM Total processes is now: CRITICAL on follow01d.pmtpa.wmflabs 10.4.1.40 output: Connection refused by host
[22:59:58] ahh LocalSettings.php missing :-D
[23:00:53] PROBLEM Current Load is now: CRITICAL on follow01d.pmtpa.wmflabs 10.4.1.40 output: Connection refused by host
[23:00:53] PROBLEM dpkg-check is now: CRITICAL on follow01d.pmtpa.wmflabs 10.4.1.40 output: Connection refused by host
[23:01:33] PROBLEM Current Users is now: CRITICAL on follow01d.pmtpa.wmflabs 10.4.1.40 output: Connection refused by host
[23:02:13] PROBLEM Disk Space is now: CRITICAL on follow01d.pmtpa.wmflabs 10.4.1.40 output: Connection refused by host
[23:03:03] PROBLEM Free ram is now: CRITICAL on follow01d.pmtpa.wmflabs 10.4.1.40 output: Connection refused by host
[23:06:12] hashar: no, but it'll mean we can test deployment before we do it in production
[23:06:18] to make sure paths and such are correct
[23:06:34] Ryan_Lane: that is what csteipp was asking for
[23:06:48] we don't need master branch for that
[23:06:55] indeed
[23:07:19] how would I sync LocalSettings.php which is in the ignore list ?
[23:07:30] should I git add + commit a live hack in the slot ?
[23:07:32] then sync?
[23:08:30] * hashar tries
[23:08:34] good question. no clue
[23:08:40] I thought they were going to add one into the branch
[23:10:52] RECOVERY Current Load is now: OK on follow01d.pmtpa.wmflabs 10.4.1.40 output: OK - load average: 0.48, 0.99, 0.70
[23:10:52] RECOVERY dpkg-check is now: OK on follow01d.pmtpa.wmflabs 10.4.1.40 output: All packages OK
[23:11:32] RECOVERY Current Users is now: OK on follow01d.pmtpa.wmflabs 10.4.1.40 output: USERS OK - 0 users currently logged in
[23:12:12] RECOVERY Disk Space is now: OK on follow01d.pmtpa.wmflabs 10.4.1.40 output: DISK OK
[23:13:02] RECOVERY Free ram is now: OK on follow01d.pmtpa.wmflabs 10.4.1.40 output: OK: 1637% free memory
[23:13:05] that worked
[23:13:27] Ryan_Lane: cd slot1; git deploy start ; git add -f LocalSettings.php; git deploy sync
[23:13:33] ...
[23:13:36] I got LocalSettings.php on one of the minions
[23:13:38] there's already one in the brances
[23:13:44] *branches
[23:14:22] RECOVERY Total processes is now: OK on follow01d.pmtpa.wmflabs 10.4.1.40 output: PROCS OK: 90 processes
[23:15:29] did someone do a pull in slot0 before doing a git deploy start?
[23:15:33] on beta
[23:15:40] or are you deploying there now?
[23:15:46] me yeah
[23:15:59] finish a deployment
[23:16:07] slot1 got me blank page :-]
[23:16:18] did you start a deployment on slot0?
[23:16:18] feel free to reset slot0 [23:16:27] I think so [23:16:32] attempt to deploy a LocalSettings.php [23:16:39] I don't see a lock file, but I see a dirty checkout [23:16:44] sorry if I screwed something up :/ [23:16:57] it's not really a problem [23:17:13] I just reset to the last tag [23:17:19] two extensions add different status [23:17:21] ah [23:17:24] you had started there [23:17:40] hashar: can you do: git deploy abort in slot0? [23:17:56] or finish it up [23:18:08] doing [23:18:13] I'm pretty sure a LocalSettings.php file exists in the branch [23:18:15] that is aborting right now [23:19:02] PROBLEM Free ram is now: WARNING on swift-be3.pmtpa.wmflabs 10.4.0.124 output: Warning: 19% free memory [23:19:21] Ryan_Lane: aborted slot0 [23:19:24] cool [23:19:35] I am heading to bed before I break something [23:19:52] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 3.70, 4.13, 4.85 [23:19:53] hm. maybe there isn't one in the branch [23:20:04] how did it get in slot0 and slot1 on tin? [23:20:23] someone copied them there from the production? [23:20:26] I don't think so [23:20:30] I guess we could have them as a live hack [23:20:36] because it's specific to the new deployment system [23:20:38] yeah [23:20:49] I did that on slot1, that worked [23:20:54] even though it is in the .gitignore file [23:21:01] it should be this: [23:21:01] (though I then had blank pages on the apaches) [23:21:02] include_once( "/srv/deployment/mediawiki/common/wmf-config/CommonSettings.php" ); [23:21:31] I have changed the branch symlinks in /home/wikipedia/common so they now point to the slots [23:23:08] hashar: that was on beta's apaches that you changed the symlinks? [23:23:20] csteipp: yes [23:23:31] I have kept a copy of the wmf branches checkout though [23:23:36] Ok. added LocalSettings.php locally [23:23:57] we might want to warn chrismcmahon about no longer having master on beta :d [23:24:02] hm.
if there's a .gitignore for LocalSettings, doesn't that mean the minions won't pull it? [23:24:09] I think the feature teams are using beta to test out their changes [23:24:16] Ryan_Lane: they do pull it [23:24:20] ah, ok [23:24:24] hashar: I think Rob warned him, but I'll check. [23:24:32] Ryan_Lane: regardless of the gitignore cause it is in the local git tree [23:24:39] hashar: I am ready to think about how else to work with beta. atm only AFTv5 is hosted there seriously. [23:24:43] csteipp: thannnnkkks [23:24:43] hm [23:24:47] l10nupdate-quick didn't run [23:24:55] I wonder if I need to run puppet again [23:25:01] chrismcmahon: well we are using beta to test out git-deploy :-] [23:25:16] chrismcmahon: which involves having some wikis (if not all) to run out of the wmf branches. [23:26:21] hashar: I am not sure that having only master exist on beta is the best way to go. but so far test2wiki is doing most of what I want, at least for right now. [23:26:57] chrismcmahon: i am all open to do whatever you think is the best for the other teams and for QA :-] [23:27:23] chrismcmahon: mobile on beta would be nice anyway [23:27:53] oh. I probably need to update the pillars [23:27:55] hashar: MobileFrontend on beta is under way, and if AFTv5 continues to work, that'll be OK for now [23:28:08] nice [23:29:01] chrismcmahon: should we stick enwiki.beta to the master branch ? [23:29:17] hashar: sure [23:29:28] we might have to have all wikis on wmf branches though :/ [23:29:38] hashar: works for me [23:29:47] nice [23:31:33] sleeping time, back tomorrow [23:31:37] hm. why is the localization stuff not working on beta? :( [23:32:25] mw-update-l10n as mwdeploy user does seem to work [23:32:43] no it doesn't [23:32:58] it's not writing into the l10n-slot0 or l10n-slot1 slots [23:34:06] grr... and anomie is gone for the night too. 
[23:34:21] well, I think I can likely figure it out [23:34:33] mw-update-l10n might have wrong paths [23:34:41] and it runs the scripts as the mwdeploy user [23:34:42] It looks like it: [23:34:47] DEST=/usr/local/apache/common-local [23:34:47] which might not have access to the slots [23:34:50] cause of a perm denied [23:35:14] Update for 1.21wmf6 failed: /usr/local/apache/common-local/l10n-1.21wmf6 does not exist [23:35:24] that's not the DEST [23:35:35] the DEST should be Update for 1.21wmf6 failed: /usr/local/apache/common [23:35:39] err [23:35:40] sorry [23:35:51] should be: /srv/deployment/mediawiki/common [23:36:23] or /usr/local/apache/common-local/php-1.21wmf6/cache/l10n [23:36:29] no [23:36:54] /usr/local/apache is going away [23:37:14] [hashar@fenari(mw-inst):/home/wikipedia/common/php-1.21wmf7/cache/l10n]$ ls |head -n2 [23:37:15] l10n_cache-ab.cdb [23:37:16] l10n_cache-ace.cdb [23:37:36] that's the old deployment system [23:37:51] the new deployment system uses /srv/deployment/mediawiki/common [23:37:55] but then I have set up /home/wikipedia/common/php-1.21wmf7 to point to the slot [23:38:11] we want it to be the same as on production [23:38:20] that's the idea [23:38:28] and mwdeploy user can't write to slot0/cache because mwdeploy is not in the project-deployment-prep group [23:38:38] ugh [23:38:45] and thus can't create the l10n directory [23:38:56] yeah, perm issues gave me a few headaches [23:38:56] that's a separate issue [23:39:34] we don't really have that cron working yet anyway [23:40:04] I got a shell script named wmf-beta-autoupdate in puppet which runs on -bastion [23:40:18] it does a git pull of php-master and runs mw-update-l10n after each git pull of core+extension [23:40:43] that's still a separate issue [23:40:56] we need the configuration to point at the right spot [23:41:05] see /usr/local/lib/mw-deployment-vars.sh on tin [23:41:07] and on bastion [23:41:20] on tin it points to the correct spots for the new system [23:41:26] on
bastion it points to the old spots [23:41:42] so maybe use the "new deploy" branch of mediawiki-config [23:42:00] /usr/local/lib/mw-deployment-vars.sh doesn't come from that [23:42:11] does it? [23:42:12] nope [23:42:13] noooo idea [23:42:16] it doesn't [23:42:27] but newdeploy branch is supposed to represent the new paths [23:42:33] meanwhile in prod we will use symlinks iirc [23:42:34] ...... [23:42:56] Tim or Sam would know better than I [23:43:03] /usr/local/lib/mw-deployment-vars.sh is what points scripts to locations of common [23:44:55] ah. I see [23:45:20] well the mw-deployment-vars.sh is probably not used yet [23:45:29] it absolutely is [23:46:29] I see why [23:46:36] we're using: v [23:46:38] err [23:46:38] misc::deployment::scap_scripts [23:46:45] we should be using: misc::deployment::scripts [23:47:39] hashar: I'll get it worked out :) [23:47:42] hashar: go to bed ;) [23:47:47] I am sure you will :-D [23:48:09] I probably applied the wrong class [23:48:20] I think I changed that last week. SAL might know [23:48:35] or the history for the deployment-bastion wikipage on labsconsole [23:48:49] waking up in 6 hours, I guess I get to sleep for real now. [23:49:48] the wmf-beta-autoupdater is a service [23:49:48] I wrote an upstart config for it [23:49:56] so you should be able to stop it with: stop wmf-beta-autoupdater [23:50:03] logs are in /var/log/wmf-beta-autoupdater [23:50:19] it just does while(1); git pull & mw-update-l10n [23:50:26] on php-master [23:50:32] so I guess you can stop it [23:51:26] ok bed. have a nice afternoon [23:55:45] andrewbogott: thanks for the gluster server log fi [23:55:46] *fix [23:56:24] Ryan_Lane: Sure -- I'll want to check back in a week and make sure it's working properly. Gluster has a built-in log rotation tool which makes me worry there are complications of some sort. [23:56:47] it must not use it :D [23:57:33] I mean, they provide a tool which we could invoke via cron rather than using logrotate.
[23:57:58] It occurs to me that I should look at what it does :)
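
The symlink step from the 22:58:41 !log entry (move the old branch checkout aside, then point its name at a git-deploy slot) could look roughly like the shell sketch below. The function name link_branch_to_slot is hypothetical and does not appear in the log. The log says "-back prefix"; this sketch assumes the marker was appended to the directory name (php-1.21wmf6-back). Refusing to overwrite an existing backup is also an assumption added for safety, not something the log describes.

```shell
# Hypothetical helper (not from the log): point a branch directory name at a
# git-deploy slot, keeping the old checkout under "<name>-back".
link_branch_to_slot() {
    branch_dir=$1   # e.g. php-1.21wmf6
    slot_dir=$2     # e.g. /srv/deployment/mediawiki/slot1

    # Move a real directory aside once; an existing symlink is just replaced.
    if [ -e "$branch_dir" ] && [ ! -L "$branch_dir" ]; then
        if [ -e "${branch_dir}-back" ]; then
            echo "refusing to overwrite ${branch_dir}-back" >&2
            return 1
        fi
        mv "$branch_dir" "${branch_dir}-back" || return 1
    fi

    # -n stops ln from descending into an existing symlinked directory,
    # so re-running the function replaces the link instead of nesting one.
    ln -sfn "$slot_dir" "$branch_dir"
}

# Usage, with the paths quoted in the !log entry:
#   link_branch_to_slot php-1.21wmf6 /srv/deployment/mediawiki/slot1
#   link_branch_to_slot php-1.21wmf7 /srv/deployment/mediawiki/slot0
```

Keeping the old checkout next to the link matches what hashar mentions at 23:23:31 (a copy of the wmf branches checkout was kept), so switching back would only need the link removed and the backup renamed.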