[00:04:37] (CR) Sitic: [C: 2 V: 2] Add option to hide own edits [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/224980 (https://phabricator.wikimedia.org/T105937) (owner: Sitic) [00:04:48] (PS1) Sitic: Add missing aria labels [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/224981 [00:05:07] (CR) Sitic: [C: 2 V: 2] Add missing aria labels [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/224981 (owner: Sitic) [00:45:56] Labs, Sentry: Create labs project for Sentry - https://phabricator.wikimedia.org/T105979#1455907 (Tgr) NEW a:bd808 [00:47:45] Labs, Tracking: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1455921 (bd808) [00:47:46] Labs, Sentry: Create labs project for Sentry - https://phabricator.wikimedia.org/T105979#1455919 (bd808) Open>Resolved https://wikitech.wikimedia.org/wiki/Nova_Resource:Sentry [02:50:41] PROBLEM - Free space - all mounts on tools-webgrid-lighttpd-1404 is CRITICAL tools.tools-webgrid-lighttpd-1404.diskspace.root.byte_percentfree (<10.00%) [03:25:55] Wikibugs: Wikibugs should ignore changes to the security field - https://phabricator.wikimedia.org/T105625#1456156 (mmodell) @legoktm: The other custom fields are story points and bugzilla id. Since those are numeric, you can assume if newValue and oldValue are either null or a string then it is the security... [05:42:19] hullo. can anyone point me at the documentation about cronjobs for tools? 
i'm not getting emails, even though have set mailto in crontab [06:01:12] 6Labs, 10Beta-Cluster, 10Labs-Infrastructure, 7Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#1456340 (10zhuyifei1999) [06:45:41] RECOVERY - Free space - all mounts on tools-webgrid-lighttpd-1404 is OK All targets OK [09:22:41] today ich checked my bot log and they are full of one sql error: Got error 176 "Read page with wrong checksum" from storage engine Aria [09:23:20] since six days. does sb. already have expierence with this problem and know how to solve this? [09:49:29] Merlissimo: hello [09:49:45] Merlissimo: please fill in a task in https://phabricator.wikimedia.org/ [09:50:03] Merlissimo: you want to include the database server / db name / SQL query and any other info [09:50:38] Merlissimo: seems some labs database might be corrupted [09:50:52] hashar: its a table in user database which is dropped and recreated. the error happens on an insert [09:51:58] even google does not know anything about this error ;-) [09:53:18] from some old mysql bug reports MySQL error code 176: File too short; Expected more data in file [09:53:52] who knows really, that probably needs to be investigated on the server [09:54:35] and if you can find a way to reproduce it, that is even better [09:55:55] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 100.00% of data above the critical threshold [0.0] [10:07:32] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 100.00% of data above the critical threshold [0.0] [10:38:30] Is it acceptable to request a labs project for vagrant so I can test various patches I hack up [10:39:14] (currently don't have access to my usual testing server so was trying to think up how I could test things) [10:40:47] GEOFBOT: can't you run vagrant on your local machine? 
:-D [10:41:01] away right now so can't access local machine :/ [10:54:26] (currently on locked-down laptop w/o virtualization) [11:27:40] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 50.00% of data above the critical threshold [0.0] [12:23:11] PROBLEM - Puppet staleness on tools-bastion-01 is CRITICAL 33.33% of data above the critical threshold [43200.0] [12:32:36] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [13:06:16] 10Gerrit-Patch-Uploader, 7Easy: Serve static resources from //tools-static.wmflabs.org or /static/ project - https://phabricator.wikimedia.org/T86354#1456820 (10Aklapper) @valhallasw: Could you answer Krinkle's question? [13:08:08] RECOVERY - Puppet staleness on tools-bastion-01 is OK Less than 1.00% above the threshold [3600.0] [13:43:32] hi guys. who's the db specialist? i opened a ticket and i'd like to tag someone :) [13:55:49] Depends on the nature of the question.. [13:58:35] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 66.67% of data above the critical threshold [0.0] [14:08:48] hi guys. who's the db specialist? i opened a ticket and i'd like to tag someone :) [14:09:45] Depends on the nature of the question.. [14:10:27] well, the performance is much worse than three-four months ago. [14:10:50] my script is not working very well. so it means the database is working *differently* [14:11:55] https://phabricator.wikimedia.org/T105964 [14:13:23] 10Tool-Labs-tools-Database-Queries, 7Database: HELP! Database is getting Slow: A test which took less than 10 min, now it takes 3 hours. I cannot do my experiments. - https://phabricator.wikimedia.org/T105964#1457111 (10marcmiquel) a:3jcrespo [14:17:46] 10Tool-Labs-tools-Database-Queries, 7Database: HELP! Database is getting Slow: A test which took less than 10 min, now it takes 3 hours. I cannot do my experiments. 
- https://phabricator.wikimedia.org/T105964#1457141 (10Reedy) ``` mysql:wikiadmin@db1052 [enwiki]> EXPLAIN SELECT DISTINCT rev_user_text, count(*)... [14:24:41] hi Reedy [14:24:48] i've seen your message [14:25:32] i posted the three queries but the one which is probably costing more is the one which does 'union all'. [14:26:31] well, the union is just doing a load queries, so ofc is going to be slow [14:26:35] query = query + 'SELECT "'+language+'",user_editcount FROM '+language+'_p.user WHERE user_name LIKE %s ' [14:26:38] Why are you using LIKE? [14:27:39] 1 sec. let me obtain an example query [14:27:48] because I posted the code to construct it [14:28:51] https://phabricator.wikimedia.org/P997 [14:37:47] Reedy: i updated with a real query. [14:38:01] i certainly can't remember why i chose like instead of = [14:38:10] if it seemed more approipat [14:38:16] appropiated at the time [14:38:50] it'll certainly be causing extra overhead [14:39:36] ok, Reedy, i change it. [14:39:50] it might not fix the problem, but it's little things that can make a difference [14:39:54] however, the code is the same and the difference is very substantial. [14:40:04] yup, thanks. [14:40:08] what can we do next? [14:40:15] try it again and see? ;) [14:40:32] If you look at that paste, it's evaluating potentially a lot more rows using LIKE 'Foo%' [14:43:45] yes. I saw it. [14:43:57] but the results should be the same shouldn't they? [14:44:34] should be [14:44:39] just the dbserver might do more work [14:44:48] making it more slow [14:44:50] anyway, it's changed now. [14:45:33] who should I tag in the task to see the performance thing? [14:45:52] you already assigned it to a dba [14:46:25] ok. it's fine. [14:52:06] marmick: I'm confused [14:52:09] Just ran that query for my account [14:52:16] which one? [14:52:17] took 0.17 seconds [14:52:42] All the unions [14:52:57] it's an example. it may be longer or shorter [14:53:07] but i do hundreds [14:53:10] ... 
[14:53:23] You can't expect people to help you if you're not giving all the information [14:53:42] Reedy: i'm telling you that the query is generated automatically depending on previous info. [14:53:57] when i say longer it's because the join might take more databases or less [14:54:01] not because i don't want to tell ! :) [14:55:47] Tool-Labs-tools-Database-Queries, Database: HELP! Database is getting Slow: A test which took less than 10 min, now it takes 3 hours. I cannot do my experiments. - https://phabricator.wikimedia.org/T105964#1457229 (Reedy) See also P997 [14:57:09] Reedy: in some cases it might take 150 db at the same union. [14:57:16] Right [14:57:33] I can't see how it'd get magnitudes slower [14:57:55] it's just tens of simple select queries [14:58:12] yes. but imagine that for thousands of users [14:58:28] for thousands of articles [15:08:38] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [15:10:39] hashar: someone reported to me there is a problem with spam on beta cluster, is that actually true? i can't find a single evidence [15:10:55] I suppose anons are not allowed on beta? [15:11:09] there is literally no spam whatsoever which is kind of weird... is there captcha? [15:11:23] I never saw so clean wiki in past [15:11:34] o.o [15:13:47] Reedy: thanks for helping! [15:13:54] i got to go :) bye. 
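Note on the LIKE vs. = exchange above: a minimal sketch of how the per-wiki UNION ALL query could be assembled with an equality match, assuming the same `language_p.user` naming and `%s` placeholder style as the line pasted in the channel. The function name `build_editcount_query` and the name check are hypothetical, added only for illustration; this is not the code from P997.

```python
def build_editcount_query(languages):
    """Build one UNION ALL query over the per-wiki user tables.

    Hypothetical helper: uses '=' instead of LIKE so the server can do an
    exact lookup on user_name instead of evaluating a pattern.
    """
    parts = []
    for language in languages:
        # Database names cannot be passed as bound parameters, so check
        # them before interpolating them into the SQL text.
        if not language.replace("_", "").isalnum():
            raise ValueError("unexpected database name: %r" % (language,))
        parts.append(
            'SELECT "{0}", user_editcount FROM {0}_p.user '
            'WHERE user_name = %s'.format(language)
        )
    return " UNION ALL ".join(parts)


if __name__ == "__main__":
    wikis = ["enwiki", "cawiki"]
    sql = build_editcount_query(wikis)
    # One bound value per SELECT; with a DB-API cursor this would be
    # cursor.execute(sql, params)
    params = ["ExampleUser"] * len(wikis)
    print(sql)
    print(params)
```

The user name stays a bound parameter, so only the database names are spliced into the string. Whether this alone explains the slowdown reported in T105964 is not established by the log.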
[15:54:51] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:55:48] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 20.00% of data above the critical threshold [0.0] [15:56:08] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 22.22% of data above the critical threshold [0.0] [15:56:42] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:57:38] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:57:40] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 50.00% of data above the critical threshold [0.0] [15:57:54] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 60.00% of data above the critical threshold [0.0] [15:57:56] PROBLEM - Puppet failure on tools-master is CRITICAL 20.00% of data above the critical threshold [0.0] [15:58:56] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL 30.00% of data above the critical threshold [0.0] [15:59:13] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL 55.56% of data above the critical threshold [0.0] [15:59:19] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 60.00% of data above the critical threshold [0.0] [16:00:23] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 33.33% of data above the critical threshold [0.0] [16:00:29] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 70.00% of data above the critical threshold [0.0] [16:00:39] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 60.00% of data above the critical threshold [0.0] [16:01:38] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 40.00% of data above the critical threshold [0.0] [16:01:58] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1406 is CRITICAL 60.00% of data above the critical threshold [0.0] [16:02:04] PROBLEM - Puppet failure on 
tools-exec-1204 is CRITICAL 60.00% of data above the critical threshold [0.0] [16:02:21] andrewbogott: ^ [16:02:22] > Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item labs_puppet_master in any Hiera data file and no default supplied at /etc/puppet/manifests/role/salt.pp:96 on node tools-exec-1408.tools.eqiad.wmflabs [16:08:21] Tool-Labs-tools-Other: Edits by user results - https://phabricator.wikimedia.org/T106040#1457544 (Krenair) [16:18:14] AzaToth: did you have a chance to see my proposal for how to implement i18n in TW? [16:20:14] Hey! I want to setup a new server on labs to host a website for the Community Tech team internal use - am I correct in assuming that I need to request a new "Project" for that? [16:20:48] Niharika: what internal use are you going to be doing? [16:20:52] Niharika: and the answer is 'yes', yes [16:21:46] YuviPanda: Reviewing potential tasks and scoring them on some criterias. [16:22:05] isn't that phabricator? :) [16:22:08] but yeah, do request a project [16:22:43] YuviPanda: "Private" tool. :P [16:24:34] Niharika: you can have private projects in phabricator [16:25:27] YuviPanda: Thanks for that information! I'll go back and re-evaluate what we want and see if Phabricator fits. [16:25:34] You can't have a private project. [16:25:35] yw! [16:25:40] You can have a project to track private tasks. [16:26:05] If you can justify putting the tasks in private areas of phabricator. [16:28:37] hey all. i'm trying to get T103192 unblocked. in short, my public key is denied but it seems i should have access. [16:30:59] "In general, where possible, we aim to do much of our work in public, rather than in private, typically on public wikis. " [16:31:16] https://wikimediafoundation.org/wiki/Resolution:Wikimedia_Foundation_Guiding_Principles [16:31:37] niedzielski: i can check that on the server.. [16:31:55] Thanks mutante and Krenair. I'll convey that information over to the team. 
[16:32:00] mutante: thanks! [16:32:12] YuviPanda: that should be fixed by now… is it? [16:34:19] RECOVERY - Puppet failure on tools-exec-1408 is OK Less than 1.00% above the threshold [0.0] [16:34:51] RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [16:34:58] niedzielski: you dont actually have a home directory on that server yet... [16:35:04] awight dduvall ejegg hashar jzerebecki krinkle legoktm ubuntu [16:35:14] oops, didn't mean to ping all of you [16:35:25] but these people have shell on the integration-slave [16:35:31] RECOVERY - Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0] [16:35:46] mutante: how does one obtain shell access? that's what i'm after [16:35:47] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [16:36:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [16:36:20] i'm a bit confused by hashar's comment "I forgot to grant you sudo, that is now granted." .. how did he [16:36:42] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [16:36:58] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [0.0] [16:37:04] RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0] [16:37:38] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] [16:37:40] mutante: i'm not sure. 
i could comment on the thread and hashar may follow up when he has a moment [16:37:40] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [16:37:54] RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [16:37:55] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [16:38:50] niedzielski: i'll comment and check if you are a project member [16:39:00] o/ [16:39:00] RECOVERY - Puppet failure on tools-exec-wmt is OK Less than 1.00% above the threshold [0.0] [16:39:07] mutante: thanks! [16:39:15] RECOVERY - Puppet failure on tools-exec-1404 is OK Less than 1.00% above the threshold [0.0] [16:40:25] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [16:40:41] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [16:41:26] niedzielski is a project member but not a project admin [16:41:39] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [16:41:41] sniedzielski is in project-integration [16:41:53] niedzielski is in project-deployment-prep [16:42:06] sniedzielski is in wmf [16:42:29] Krenair: recently i heard that adminship is not needed for shell [16:42:42] it is not [16:42:49] then why does he not have a home :) [16:43:07] you do, for example [16:43:14] is the home created on login? [16:43:17] and same "project-integration" [16:43:40] is admin needed for sudo? [16:43:44] jzerebecki: yes. so the usual ssh issue [16:44:02] probably [16:44:28] https://wikitech.wikimedia.org/wiki/Special:NovaSudoer says for integration that Sniedzielski is in default sudo policy [16:44:36] jzerebecki: hashar claimed he gave him sudo [16:44:50] yea that means he did [16:45:11] niedzielski: what happens when you try connecting? 
let's try one more time while i watch the log [16:45:28] integration-slave-trusty-1015 [16:45:38] fatal: Access denied for user niedzielski by PAM account configuration [preauth] [16:45:41] hah [16:45:44] so why that [16:45:47] mutante: i tried both sniedzielski and niedzielski accounts [16:46:01] Failed publickey for sniedzielski [16:46:02] mutante: sniedzielski returns public key denied. niedzielski says connection unknown [16:46:16] sniedzielski is the correct one [16:46:25] so then the question is if you have the right key loaded [16:46:35] do you have the "ssh-add" command? [16:46:44] mutante: i do [16:46:48] ssh-add -l [16:46:53] does it list multiple keys? [16:47:03] mutante: no, just the one [16:47:08] which one? [16:47:12] you should have 2 though [16:47:27] is this the right one for sniedzielski? [16:47:42] mutante: i only have one key. it works on other wmf servers [16:47:48] can you paste the line it outputs? [16:49:03] which server does it work on.. i'll compare [16:49:28] mutante: er, sorry it seems my output is actually: [16:49:30] 2048 d7:82:3d:a7:da:ed:38:d1:4f:32:fc:6b:b4:55:6c:08 stephen@niedzielski.com (RSA) [16:49:31] 4096 3f:cf:51:e9:0e:d1:c2:e8:ca:03:05:ef:2e:1e:8e:77 id_rsa_4096 (RSA) [16:51:22] mutante: sorry again, it seems those other servers are all with my niedzielski account. deployment-eventlogging02.eqiad.wmflabs, for example [16:52:10] niedzielski: i suspect in your .ssh/config it's about niedzielski vs. sniedzielski [16:52:44] so... the shell user is called niedzielski [16:52:52] but the correct LDAP user is sniedzielski :p [16:54:46] maybe only use one of the users or is there some reason you want two? [16:54:51] niedzielski: let's copy the part of your ssh config that lets you get on deployment labs hosts, but then change the username to sniedzielski and the hostname to the integration servers [16:55:20] jzerebecki: it's the classic "private vs. 
WMF account" dilemma [16:55:34] we couldnt add his private non-WMF account to the WMF LDAP group [16:55:36] oh ok [16:55:44] but at the same time it was added to labs projects [16:55:50] ..which you could be on as a volunteer too [16:55:59] mh is ldab not publicly reachable anymore? i get a timeout. [16:56:12] jzerebecki: yes, not anymore [16:56:14] only inside labs [16:56:25] oh ok [16:57:05] mutante: do i have to login through bastion? [16:57:22] niedzielski: yes [16:57:39] this will be super confusing. the root cause is kind of that we abuse the WMF group for jenkins permissions [16:59:12] jzerebecki: so i assume you also have permission to access a Jenkins job configuration page? [16:59:20] but via the "nda" group instead [16:59:33] or "wmde" [17:00:07] what we could do is add the non-WMF user to the "nda" group.. after going through the process for volunteer NDA .. shudder [17:01:34] mutante jzerebecki: i do have access to the jenkins job configure page as sniedzielski. for example, https://integration.wikimedia.org/ci/job/test-T62720-android-emulator/configure [17:04:15] mutante jzerebecki: i'm trying to make a scratch ssh config that works for integration-slave-trusty-1015.eqiad.wmflabs. here's what i have that's not working: [17:04:17] Host bastion [17:04:17] HostName bastion.wmflabs.org [17:04:18] User niedzielski [17:04:19] Host wmf-android-jenkins [17:04:19] User sniedzielski [17:04:19] HostName integration-slave-trusty-1015.eqiad.wmflabs [17:04:21] ProxyCommand ssh -W %h:%p bastion [17:04:59] niedzielski: try this https://phabricator.wikimedia.org/P998 [17:05:23] then "ssh integration-slave-trusty-1015" [17:06:17] mutante: no dice :/ [17:06:43] this time i didnt see the "failed public key" [17:07:44] niedzielski: how about just direct connection to bastion-eqiad.wmflabs.org, as both users [17:08:15] i cant connect to that one to check [17:08:19] jzerebecki: can you? 
[17:09:10] mutante: so i can login to niedzielski@bastion-eqiad.wmflabs.org but not sniedzielski. and from there i have to hop as sniedzielski to integration-slave-trusty-1015.eqiad.wmflabs which doesn't work [17:09:46] yet, BOTH users are in project-bastion ... hhhhhhrmmm [17:10:15] mutante: (this is all using the pasted ssh config) [17:11:09] YuviPanda: do you have any ideas why he cant connect to bastion even when his user is in project bastion? [17:11:42] mutante: while i strongly prefer all my accounts be niedzielski, would it be easier to switch them all to sniedzielski? [17:13:37] niedzielski: i'm not sure which is easier, you _could_ do this https://wikitech.wikimedia.org/wiki/Volunteer_NDA [17:13:54] that would make it so that your non-WMF user has the same rights [17:14:08] that WMF users get from the WMF LDAP group [17:14:15] but volunteers get from the nda group [17:14:21] mutante: no problem there. maybe i can get everything switched over niedzielski then! [17:15:10] niedzielski: i'd still like to know why the bastion login doesnt work though,, it seems it should, even with 2 users [17:17:04] andrewbogott: we have 2 users, both are in "project bastion", both use the same SSH key, yet one of them can connect to labs bastion and one can't.. what could it be [17:17:34] mutante: most often this is a user failing to add @ [17:18:07] Also you can always look in auth.log on the bastion to see what’s happening [17:24:25] andrewbogott: i can't, it's the non-ops bastion [17:24:38] surely you have a root key there... [17:25:06] he is already using the ssh config pasted above, incl. user@host [17:25:25] tries the root key thing after unloading other keys [17:25:30] what username? [17:25:47] < niedzielski> mutante: so i can login to niedzielski@bastion-eqiad.wmflabs.org but not sniedzielski. and from there i have to hop as sniedzielski to integration-slave-trusty-1015.eqiad.wmflabs which doesn't work [17:26:15] andrewbogott: niedzielski works, sniedzielski doesn't. 
i checked that both are in bastion project [17:26:41] 'Failed publickey for sniedzielski from 50.170.134.108 port 59339 ssh2: RSA d7:82:3d:a7:da:ed:38:d1:4f:32:fc:6b:b4:55:6c:08' [17:27:31] niedzielski: ^ [17:28:08] maybe you do have 2 separate keys? it would be good anyways [17:28:34] 2048 d7:82:3d:a7:da:ed:38:d1:4f:32:fc:6b:b4:55:6c:08 stephen@niedzielski.com (RSA) [17:28:34] 4096 3f:cf:51:e9:0e:d1:c2:e8:ca:03:05:ef:2e:1e:8e:77 id_rsa_4096 (RSA) [17:28:55] mutante: that's from ssh-add -l ^ [17:29:27] so that matches the key fingerprint [17:29:31] eh... [17:29:47] wait, does it [17:29:47] sorry, I have to go… back soon [17:34:56] niedzielski: so when you login on wikitech as "sniedzielski" and then go to https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack does it have the key? and how does the same look as "niedzielski" compared to it [17:35:08] the "upload your ssh key to labs" part [17:35:25] did that only happen for the other user maybe? [17:35:51] mutante: no key for sniedzielski! no account for niedzielski [17:36:56] mutante: sorry, I am both eating lunch and babysitting appliance installers. I’ll be able to pay attention in a bit [17:36:57] mutante: ok added key for sniedzielski but still fails in the same way [17:37:59] ok, I see a working login for niedzielski, will you try the other account now? [17:38:04] (why two accounts, btw?) [17:38:32] because one is WMF and one is not [17:38:36] ah, sure [17:38:45] so, looks to me like it’s working. I see "Accepted publickey for sniedzielski from…" [17:38:47] and we "abuse" the WMF LDAP group for jenkins access [17:38:53] nice :) [17:38:56] that's something [17:39:00] 6Labs, 10wikitech.wikimedia.org: Cannot select different project in Special:NovaProject - https://phabricator.wikimedia.org/T105945#1458037 (10scfc) I tried deleting all cookies for `*wikitech*` (and logging in afterwards again :-)), but https://wikitech.wikimedia.org/w/index.php?title=Special:NovaProject&acti... 
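Note on the fingerprint comparison above: the colon-separated hex strings in the sshd log and in the `ssh-add -l` output are MD5 fingerprints of the decoded public key blob. A minimal sketch for computing one from a public key line, so the value in auth.log can be compared with the key uploaded on wikitech; the function name `md5_fingerprint` is hypothetical, and the sample key in the usage part is a dummy blob, not a real key.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line):
    """Return the legacy colon-separated MD5 fingerprint of an OpenSSH
    public key line of the form '<type> <base64 blob> [comment]'."""
    # The second whitespace-separated field is the base64-encoded key blob.
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    # Group the 32 hex characters into pairs joined by colons.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


if __name__ == "__main__":
    # Dummy blob for illustration only; a real line comes from ~/.ssh/*.pub
    print(md5_fingerprint("ssh-rsa aGVsbG8= user@example"))
```

If the printed value equals the fingerprint in the "Failed publickey" log line, the client is offering that key and the problem is on the account side, which is what turned out to be the case here.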
[17:39:18] andrewbogott: i just logged sniedzielski into bastion-eqiad.wmflabs.org but can't login to ssh sniedzielski@integration-slave-trusty-1015.eqiad.wmflabs from there [17:39:51] ok, so we’re on to a new problem! Good news :) [17:39:56] You have proxycommand set up for both users? [17:41:43] andrewbogott mutante: new ssh config is here https://phabricator.wikimedia.org/P998 [17:43:07] andrewbogott mutante: nvm. i was logging in to the bastion and then trying to hop from there which didn't work. i just logged into ssh sniedzielski@integration-slave-trusty-1015.eqiad.wmflabs directly from my laptop and that worked! [17:43:16] ah, good. [17:45:01] mutante andrewbogott jzerebecki Krenair: so i think we're good to go! thanks so much for all the help! [17:45:36] niedzielski: great :) [17:51:44] Hi all [17:51:53] andrewbogott: ping :) [17:52:01] CristianCantoro: what’s up? [17:52:38] andrewbogott: I was wondering if you can help me again with the puppet stuff to instantiate storage on my labs machine [17:52:58] sure [17:53:03] the other day I was way too tired to try that [17:53:13] so, go to ‘manage instances’ and click the ‘configure’ link for your instance. [17:53:52] scroll down and find role::labs::lvm::mnt [17:54:01] you can tick that box, or the next one role::labs::lvm::srv [17:54:08] depending on if you want your extra space at /mnt or /srv [17:54:35] in practice what is the difference? [17:54:52] I mean... besides having the extra space mounted at a different location? [17:55:02] that’s it [17:55:13] ok... [17:55:18] both names are pretty arbitrary. /srv is slightly more standard [17:55:44] ok... I was going with /mnt [17:55:54] * andrewbogott shrugs [17:56:12] then I submit [17:56:26] yep [17:56:31] (sorry, brb) [17:59:37] CristianCantoro: now if you just wait 20 minutes the mount should appear on its own. If you want to watch it work, log into the instance and [17:59:41] $ sudo puppet agent -tv [18:00:18] andrewbogott: ok... 
I'm curious [18:00:49] andrewbogott: Notice: Finished catalog run in 15.80 seconds [18:00:52] as long as your output is green, all is well. [18:01:05] so does ‘df’ show you as having a new volume? [18:01:09] andrewbogott: the dissk seems to already be there [18:01:19] great. [18:01:28] puppet refreshes every 20, you must’ve gotten lucky and it ran before you checked. [18:01:30] andrewbogott: I checked with df -H a few minutes ago and it was not there [18:01:41] andrewbogott: nice :) [18:02:54] CristianCantoro: so, unstuck now? [18:03:34] * andrewbogott has to go fuss with plumbing again, back soon [18:09:40] 6Labs, 10wikitech.wikimedia.org: Cannot select different project in Special:NovaProject - https://phabricator.wikimedia.org/T105945#1458117 (10Andrew) Sorry if this is already detailed above -- are you able to change the filter on other pages, just not the 'manage projects' one? [18:10:11] 6Labs, 10wikitech.wikimedia.org: Cannot select different project in Special:NovaProject - https://phabricator.wikimedia.org/T105945#1458118 (10Andrew) (Also, as an example of 'something' I just restarted Keystone. Did it help?) [18:22:33] abartov: I'm around table 22 whenever you want [18:41:01] andrewbogott: ok, now I only need to copy the data from the old server to the new machine via rsync [19:02:00] 10Tool-Labs-tools-Other: Edits by user results - https://phabricator.wikimedia.org/T106040#1458378 (10Krenair) [19:33:09] Krenair: Can you comment and say that I'm actually doing that right now [19:02:15] Yeeeeeeeeeee [19:02:21] so every once in a while, my bash script that gets called by the cronjob will fail saying that it can't load either the Ruby file or can't write to the log file, etc, because of "permission denied" [19:02:32] I did chmod 0777 on the files and no go [19:02:54] I'm convinced this is something environment related, no code changes have happened, it just all of a sudden couldn't execute the file anymore [19:03:10] any ideas? 
this breaks my bot :( [19:03:55] I can run my script directly, but apparently whatever user the cron is can't [19:06:14] Maybe you have to chmod the directory as well [19:06:43] the strange part is the "every once in a while" part [19:06:49] it is, full read/write/execute [19:06:52] yeah [19:07:12] it used to be the log file that it was complaining about, saying it couldn't open it bc it didn't have permission [19:07:12] where are these files , the ruby file and logfiles, located [19:07:18] on an NFS mount maybe? [19:07:43] I did a chmod 0777 on the log file, then maybe a day or two later the rights went back to 0644 [19:07:47] on their own [19:07:58] I eventually deleted the file and recreated it and I think that's what fixed it [19:08:05] MusikAnimal: is this toollabs or labs? [19:08:09] toollabs [19:08:22] I believe it is an NFS mount, yes [19:08:33] all the files are there on my end [19:09:37] does the cronjob itself also log output to a file? [19:09:55] maybe first gather info how often it actually happens / if it still happens [19:09:59] [13intuition] 15kenrick95 opened pull request #49: Updating Raun messages (06master...06master) 02https://github.com/Krinkle/intuition/pull/49 [19:10:20] 6Labs, 6operations: upgrade salt to 2015.5 - https://phabricator.wikimedia.org/T106074#1458390 (10Krenair) [19:11:21] it's happening right now [19:11:42] the cron outputs to null so I don't get emails [19:12:03] but there's an exec.sh file that gets run by the cron, and that has a corresponding exec.err [19:13:16] if it means anything, the cron runs the script on the trusty release [19:13:48] 6Labs, 6operations, 3ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1458395 (10Andrew) So, here's what I'm seeing: - 3.19 doesn't crash with suspend/resume. That's good! - Suspend/resume doesn't work reliably... instances seem to lose some amount... 
[19:14:26] the cron runs `~/exec.sh` which has always worked, going to try `sh exec.sh` [19:15:38] 6Labs, 10Labs-Infrastructure: nfs-exports-daemon hangs, prevents new instances from accessing nfs - https://phabricator.wikimedia.org/T106076#1458401 (10Andrew) 3NEW a:3yuvipanda [19:25:27] 6Labs, 10wikitech.wikimedia.org: Cannot select different project in Special:NovaProject - https://phabricator.wikimedia.org/T105945#1458425 (10scfc) 5Open>3Resolved a:3scfc I didn't test the filter on other pages. Now it works (on Special:NovaInstance), so restarting Keystone may have solved this. What... [19:25:50] 6Labs, 10wikitech.wikimedia.org: Cannot select different project in Special:NovaProject - https://phabricator.wikimedia.org/T105945#1458433 (10scfc) a:5scfc>3Andrew [19:41:23] 6Labs, 3Labs-Sprint-103, 3Labs-Sprint-104, 3Labs-Sprint-105: Upstream: Limit available images on horizon - https://phabricator.wikimedia.org/T91782#1458506 (10Andrew) [19:58:18] 10Tool-Labs-tools-Database-Queries, 7Database: HELP! Database is getting Slow: A test which took less than 10 min, now it takes 3 hours. I cannot do my experiments. - https://phabricator.wikimedia.org/T105964#1458575 (10MZMcBride) Have you tried using `revision_userindex` instead of `revision`? Even though th... [20:00:05] 10Tool-Labs-tools-Database-Queries, 7Database: HELP! Database is getting Slow: A test which took less than 10 min, now it takes 3 hours. I cannot do my experiments. - https://phabricator.wikimedia.org/T105964#1458588 (10MZMcBride) Looking at P997, it seems like this is a matter of counting entries in the revis... [20:17:29] 6Labs, 7Tracking: Sn1per mediawiki testing labs project - https://phabricator.wikimedia.org/T106086#1458648 (10Sn1per) 3NEW [20:30:41] 10Tool-Labs-tools-Database-Queries, 7Database: HELP! Database is getting Slow: A test which took less than 10 min, now it takes 3 hours. I cannot do my experiments. 
- https://phabricator.wikimedia.org/T105964#1458698 (10marcmiquel) Thanks for checking MZMcBride! MZMcBride the most costful operation is the UNI... [20:31:30] 6Labs, 7Database: Replicate sanitized watchlist table - https://phabricator.wikimedia.org/T106089#1458699 (10coren) 3NEW [20:32:00] 6Labs, 7Database: Replicate sanitized watchlist table - https://phabricator.wikimedia.org/T106089#1458707 (10coren) [20:32:01] 6Labs, 7Database, 5Patch-For-Review: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#1458706 (10coren) [20:33:32] 6Labs, 7Database: Replicate sanitized watchlist table - https://phabricator.wikimedia.org/T106089#1458699 (10coren) [20:33:34] 6Labs: Replicate watchlist to labs - https://phabricator.wikimedia.org/T93887#1458710 (10coren) [20:37:56] 6Labs: Replicate watchlist to labs - https://phabricator.wikimedia.org/T93887#1458738 (10coren) a:5Springle>3None Unassigning so that any of our star DBAs can gram [20:39:42] Coren|MX: YuviPanda sorry to bother... did you see my messages above about the cronjob being unable to execute my script due to permissions? everything is chmod 0777.... [20:39:53] this keeps happening, maybe a NFS issue? [20:42:23] the order is: cronjob calls exec.sh (has full rwx), exec.sh sets the PATH and Ruby version, then it runs the Ruby script [20:43:18] the error is about running the Ruby script, so whatever "user" the cron is doesn't have permission, despite granting full rwx to the whole directory [20:45:06] this has happened a few times now since I launched my bot, every time it happens with no change to the code or anything else, it just all of a sudden complains about permissions [20:45:26] and just as mysteriously will start working again [20:45:31] last time it lasted about a day or two [21:09:55] 6Labs, 10Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1458826 (10Danmichaelo) Strange. Same problem today. 
 298 CLOSE_WAIT connections at tools-webgrid-lighttpd-1206.eqiad.wmflabs ``` tcp 1 0 tools-webgrid-lighttpd-1206.tools.eqia... [21:36:47] [13intuition] 15lokal-profil opened pull request #50: Add support for dcatap (06master...06dcatap) 02https://github.com/Krinkle/intuition/pull/50 [21:40:51] !log cleaned up some stale all-project Special:NovaPuppetGroup classes related to deployserver roles [21:40:52] cleaned is not a valid project. [21:41:03] !log wikitech cleaned up some stale all-project Special:NovaPuppetGroup classes related to deployserver roles [21:41:03] wikitech is not a valid project. [21:41:08] meh [21:56:36] hi [21:57:11] a quick question, could someone please remind me how to connect to an instance through bastion? [21:57:22] my ssh tunnel isn't working and I don't mind doing this manually [21:57:37] I have been looking at https://wikitech.wikimedia.org/wiki/Help:Access_to_instances_with_PuTTY_and_WinSCP [22:00:21] 6Labs, 10Tool-Labs: Provide namespace IDs and names in the databases similar to toolserver.namespace - https://phabricator.wikimedia.org/T50625#1458973 (10Ricordisamoa) [22:02:07] 6Labs, 10Tool-Labs, 7Database: Provide replication lag as a database function - https://phabricator.wikimedia.org/T50628#1458991 (10Ricordisamoa) [22:02:37] 6Labs, 10Tool-Labs, 7Database, 3ToolLabs-Goals-Q4: Show replication lags in Graphite - https://phabricator.wikimedia.org/T50694#1458994 (10Ricordisamoa) [22:05:00] is it not just "ssh instance" [22:11:56] White_Cat: in the example screenshots, where it says "pmtpa" you should put "eqiad" instead nowadays [22:18:35] (03PS1) 10BryanDavis: Add empty releases/id_rsa.upload [labs/private] - 10https://gerrit.wikimedia.org/r/225251 [22:20:43] bd808: i saw a patch by hashar earlier that removes that from the role when in labs [22:20:56] cool that would work too [22:20:58] bd808: and i was the one who moved it from nodes into the role...
not expecting issues for beta [22:21:13] It is picked up by role::deployment::server now [22:21:21] all i wanted is enable uploads from the new codfw mira [22:21:24] and we use that in beta cluster and other places [22:21:25] yes [22:21:40] madhuvishy: https://github.com/wiki-ai/wikilabels-wikimedia-config/blob/master/requirements.txt [22:21:46] bd808: https://gerrit.wikimedia.org/r/#/c/225025/ [22:22:07] bd808: your approach might be better though [22:22:34] I just want the puppet noise to stop :) [22:22:34] like i would +1 putting files in labs/private instead of another $realm check [22:23:09] bd808: btw, you have *nice* vagrant packages now [22:23:39] I actually want to split role::deployment up. I want to have trebuchet without MW [22:23:57] bd808: i argued that "being able to uploaded releases should be a feature of a deployment host", that made me do it [22:24:02] YuviPanda: I saw that Alex did deb magic :) [22:24:09] yeah [22:24:14] I saw the deps and wnet 'poop' [22:24:36] YuviPanda: I'm testing an updated patch that uses the deb install now [22:25:06] bd808: i'll merge hashar's change [22:25:15] works for me [22:25:47] bd808: nice! [22:27:11] (03Abandoned) 10BryanDavis: Add empty releases/id_rsa.upload [labs/private] - 10https://gerrit.wikimedia.org/r/225251 (owner: 10BryanDavis) [22:28:14] there is the same issue with backup classes too [22:28:51] no bacula in labs in general? [22:31:41] bd808: there should be less noise now [22:32:01] mutante: thanks [22:32:58] bd808: iegreview moved.. i hacked the admin password to check on it. let me know if you want it back/reset [22:33:26] noticed the salted(!) 
 passwords, very nice compared to before with md5 [22:34:01] and your comment on "how to write a test that tests if random is random". interesting, yes [22:34:17] I'll need access to the admin account at some point but I can hack it much as you did [22:38:03] (03PS1) 10Sitic: Forward API errors to client, fix error handling [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/225258 (https://phabricator.wikimedia.org/T106057) [22:38:38] (03CR) 10Sitic: [C: 032 V: 032] Forward API errors to client, fix error handling [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/225258 (https://phabricator.wikimedia.org/T106057) (owner: 10Sitic) [22:45:58] does labs support trebuchet deployment packages? [22:51:29] SMalyshev: bd808 knows but I guess no [22:51:53] well, 'maybe' perhaps? [22:51:56] * YuviPanda isn't sure at all [22:52:15] ok, thanks, let's see if bd808 can tell [22:52:34] well.... [22:52:47] not out of the box, no [22:53:04] beta cluster has a deploy server setup [22:53:13] and any project can add one of their own [22:53:21] but there is no labs wide one [22:53:51] tgr and I are working through setting such a thing up for the sentry project [22:53:52] I'm not sure I understand all the components yet... so if I make my own deploy server, can I still use the same git repos? [22:54:27] yes, same git repos as sources (gerrit repos) [22:54:46] but you need a salt master and a "deploy server" which can be the same or separate boxes [22:55:23] git-deploy talks to the salt master which then talks to the deploy targets and tells them how to fetch from the deploy server [22:55:32] ah, ok [22:55:52] found this one - so this would work? https://wikitech.wikimedia.org/wiki/Trebuchet#Using_Trebuchet_in_Labs [22:56:10] also, is it related somehow to running your own puppetmaster or are they completely independent things [22:56:15] ?
[22:56:39] self-hosted puppet is not needed I don't think [22:56:48] (bad grammar) [22:56:58] I don't think a self-hosted puppetmaster is needed [22:57:01] well, it's needed for me :) for non-deploy reasons [22:57:09] at least for now [22:57:34] and yeah that page is mostly right, except the role names have changed a bit. Let me update it [22:57:47] cool, thanks! [23:03:43] SMalyshev: I changed the role names on the page. tgr is still working on getting puppet to run cleanly and there may be more changes. [23:03:49] that page was a bit old [23:08:52] on the tool labs index page. we are linking to maintainers and Special:NovaServiceGroup .. but not to actual tool URLs? [23:32:54] bd808: ok, thanks [23:44:06] anybody knows what it means when I get this from puppet: Error: Could not retrieve catalog from remote server: Could not intern from text/pson: Could not intern from data: Could not find relationship target "File[]" [23:44:29] Is there a way to ssh directly as a tool? [23:45:35] SigmaWP: there is "become" to become a tool,if you mean that [23:46:02] Yeah, I mean skip the "become" command entirely, like just ssh blah and you're in [23:46:05] Though it appears not. [23:47:22] SMalyshev: maybe something like "file { "$variable" but $variable is undefined [23:49:55] mutante: can't find anything like this though... does puppet have any setting to force it to tell me where the problem is? [23:51:28] SigmaWP: alias magic="ssh tools-login.wmflabs.org -t 'become morebots'" [23:51:32] magic [23:51:44] true! [23:51:47] tools.morebots@tools-bastion-01:~$ [23:52:05] Hopefully this works with scp and friends. [23:55:55] SMalyshev: assuming you already run it with -v . afraid this is a special case of error where the parser thinks it looks ok [23:56:21] usually it would output file and line number if it was detected by the parser [23:56:34] would it be easy to paste the code ? [23:56:35] mutante: yeah I run it with -v and even with --debug and it's not very useful... 
 it doesn't look like it's the parser [23:57:05] mutante: it's a pretty big chunk, https://gerrit.wikimedia.org/r/#/c/223663/