[00:56:52] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (Negative24) NEW [00:57:22] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428551 (Negative24) a:Negative24 [00:57:43] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (Negative24) [00:57:46] Labs, Gerrit-Migration, Phabricator: Stabilize vcs-user owned files and directories in Phab-02 - https://phabricator.wikimedia.org/T95982#1428552 (Negative24) [00:58:18] Labs, Phabricator, Puppet: Create puppet role for Phabricator hosted repo testing - https://phabricator.wikimedia.org/T104827#1428544 (Negative24) [01:29:55] YuviPanda: I'm trying to add a puppet group in WT (role::phabricator::labs::differential) and it's returning "Bad resource name provided. Resource names start with a-z, and can only contain a-z, 0-9, and - characters." [01:32:12] ^ puppet groups always contain : [01:50:36] ^ nm. Figured it out and updated docs [02:25:11] MediaWiki-extensions-OpenStackManager: Puppet group management interface doesn't allow all characters allowed for puppet variables and classes - https://phabricator.wikimedia.org/T38044#1428580 (Negative24) I updated some confusing documentation [[ https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetm... [07:54:54] Labs, Tool-Labs, Pywikibot-OAuth: Add OAuth to https://tools.wmflabs.org/pywikibot-testwiki/ - https://phabricator.wikimedia.org/T104291#1428775 (jayvdb) [08:14:05] s51053 is abusing his/her access to replica dbs and creating lag for other users. His/her queries are to be terminated. 
[08:20:33] ^lag immediately improved upon performing the termination [08:40:34] jynus: please log such actions, so uses can know what happened to their process [08:40:40] *users [08:43:25] I do not know which project that user belongs to [08:45:13] I need an easy way to do that [08:45:37] jynus: can you see what ip he was coming from ? [08:47:49] no, that is not logged [08:48:13] I see, thanks [08:52:25] I got an IP from my console history, but it does not resolve to any known service [08:54:19] Tool-Labs-tools-Other: Https bug with Contributions by User to a page - https://phabricator.wikimedia.org/T104812#1429008 (Aklapper) p:Unbreak!>Triage Confirming, however the bottom of that page says > Bugs, suggestions, questions? Contact the author at User talk:Scottywong @Rsrikanth05: Did you do tha... [08:56:17] Tool-Labs-tools-Other: usersearch on Tool-Labs lists invalid URLs for Contributions by User to a page - https://phabricator.wikimedia.org/T104812#1429010 (Aklapper) [09:09:54] Labs, Tool-Labs, Database: mixnmatch_p on wikidatawiki.labsdb slow - https://phabricator.wikimedia.org/T104833#1429051 (Nemo_bis) NEW [09:10:48] Tool-Labs-tools-Other: usersearch on Tool-Labs lists invalid URLs for Contributions by User to a page - https://phabricator.wikimedia.org/T104812#1429059 (Rsrikanth05) @Aklapper Scottywong has retired on the English Wikipedia, hence I posted it here. Forgive me for the priority levels. This is the first time... [09:24:16] Labs, Tool-Labs, Database, Easy: mixnmatch_p on wikidatawiki.labsdb slow - https://phabricator.wikimedia.org/T104833#1429083 (jcrespo) p:Triage>Low [09:26:34] ^you, yes, YOU can help! [09:48:38] jynus: ldaplist -l passwd | grep -C 15; ldaplist -l servicegroups | grep -C 15 --> tools.jackbot [09:49:04] !log tools 10:14 s51053 is abusing his/her access to replica dbs and creating lag for other users. His/her queries are to be terminated. 
(= tools.jackbot / user jackpotte) [09:49:08] Logged the message, Master [09:50:09] (the s in front of the uid means it's a service group as well, I think) [09:56:36] will !log tools.jackbot work? [09:57:42] jynus: yeah, but that probably won't reach the owners if they don't use the SAL. I'd just mail to tools.jackbot@tools.wmflabs.org [09:59:10] (that only works for tools users, though) [10:25:18] valhallasw`cloud, https://etherpad.wikimedia.org/p/dear-tool-user [10:53:22] jynus: fixed two typos, changed one sentence (the one on correcting measures), looks OK otherwise [10:53:34] jynus: I also realized a user talk page message might be more practical, as that's logged [10:54:18] https://wikitech.wikimedia.org/wiki/User_talk:JackPotte [10:56:12] jynus: and if it's obvious why the query is causing excessive load, it'd be good to mention that, as most users would not necessarily understand why a certain query is bad [10:56:29] in any case, thanks for taking care of load issues [10:56:36] thank you, valhallasw`cloud [10:56:53] problem is, valhallasw`cloud, is always time [10:57:13] if I could, I would sit a bit with the user and mentor it [10:57:22] *him/her [10:57:39] but I really *cannot* [10:58:15] I cannot even do it with production [10:59:01] of course :-) but I guess by looking at the query you'd be able to say 'this is an issue because it's an insert that takes ages', or 'don't do outer joins', or something like that? 
If there's no obvious cause, it's of course perfectly fine to not put time in it to figure out why the query was an issue [11:00:02] not really, "too many and too long" was the obvious thing here [11:00:30] adding that is already helpful [11:00:55] the only thing that can be done is refer to EXPLAIN, but that is more useless if it is not known by the user [11:08:07] thanks, for the help and suggestions, much appreciated [11:38:52] Negative24: yeah unfortunately [12:36:34] Labs: clean up old ec2id-based salt keys on labs - https://phabricator.wikimedia.org/T103089#1429454 (ArielGlenn) So we have a variety of hosts using the old names. I have a cut and paste list here; these are not salt related issues but should be fixed. +not found: android-build.mobile.eqiad.wmflabs (i-000... [12:37:34] Labs, Labs-Sprint-101, Labs-Sprint-102: Sort out remaining virt1000 salt minions - https://phabricator.wikimedia.org/T103010#1429462 (ArielGlenn) Still three left for whatever reason. android-build.mobile.eqiad.wmflabs mobile-hierator2.mobile.eqiad.wmflabs test-carbon-c-relay.graphite.eqiad.wmflabs [13:22:04] Labs, wikitech.wikimedia.org: Cannot log into wikitech - https://phabricator.wikimedia.org/T103939#1429594 (JanZerebecki) Open>Resolved a:JanZerebecki Works now. [13:34:56] Labs, Tool-Labs, Patch-For-Review: Create process for 'tool labs is down' notifications on tools.wmflabs.org/* - https://phabricator.wikimedia.org/T102971#1429662 (valhallasw) Need to update this to https://s.codepen.io/Krinkle/debug/domOQK?#errorpage [13:49:00] Labs, Incident-20150617-LabsNFSOutage, Labs-Sprint-103, Labs-Sprint-104: Labs: Make a new backup of the Labs storage to codfw - https://phabricator.wikimedia.org/T103356#1429675 (yuvipanda) a:coren>None The broken mount is done, and there's a new copy of a new snapshot going to /srv/backup--*... 
[13:49:22] Labs, Labs-Sprint-104: Recover wikidata.php from magnustools - https://phabricator.wikimedia.org/T104337#1429680 (yuvipanda) a:coren>yuvipanda [13:49:34] Labs, Labs-Sprint-104: Recover files from old corrupted file system (Tracking) - https://phabricator.wikimedia.org/T104334#1429681 (yuvipanda) a:coren>yuvipanda [13:49:45] Labs, Labs-Sprint-104: Recover GND bot from wikidata-todo - https://phabricator.wikimedia.org/T104336#1429685 (yuvipanda) a:coren>yuvipanda [13:52:17] YuviPanda: regarding mail and dns in labs. if i set up a host and give it a dns name, and configure a mail server on it, will i be able to get mail to matanya@wmflabs.org ? [13:54:07] matanya: no, you can't configure MX records from wikitech [13:54:24] i guess so, thanks valhallasw`cloud [13:54:32] *guessed [13:55:25] matanya: what do you need it for? would it be OK to use @tools.wmflabs.org for it? [13:56:17] just thinking out loud. i thought about something like people.wikimedia.org but for labs [13:59:07] "waiting for available socket" what? http://tools.wmflabs.org/glamtools/baglama2/#gid=172&month=201505 [13:59:30] Nemo_bis: {{worksforme}}? 
[13:59:41] maybe browser overload [13:59:50] matanya: there should not be a people.wikimedia.org if wikimedia people can't use it [14:00:02] yeah, I wouldn't expect a message in your browser if that were a server-side wit [14:00:25] Nemo_bis: define wikimedia people :) [14:01:28] PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:01:36] I love it when people think of RTL: https://tools.wmflabs.org/glamtools/baglama2/#gid=172&month=201505&giu=hewiki&server=he.wikipedia.org [14:01:39] matanya: doesn't matter :) [14:01:56] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:03:02] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:03:16] PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:03:18] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:03:40] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:03:48] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:04:18] PROBLEM - Puppet failure on tools-mail is CRITICAL 22.22% of data above the critical threshold [0.0] [14:04:22] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL 30.00% of data above the critical threshold [0.0] [14:04:32] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 40.00% of data above the critical threshold [0.0] [14:04:33] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:04:38] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 20.00% of data above the critical threshold [0.0] [14:04:52] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 30.00% of data above the critical threshold [0.0] 
[14:04:58] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:05:20] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 33.33% of data above the critical threshold [0.0] [14:05:32] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 20.00% of data above the critical threshold [0.0] [14:05:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:06:04] wahhh [14:06:29] No such file or directory - /etc/puppet/modules/labstore/files/projects-nfs-config.yaml [14:06:30] andrewbogott: I'm pretty sure ^ is me [14:06:50] I wasn’t worried :) [14:07:10] andrewbogott: I changed a parser function, and apparently those require a puppetmaster restart in labs but not in prod? [14:07:29] prod uses passenger, so no puppetmaster there. [14:07:34] So, seems normal. [14:07:52] andrewbogott: oh, I see. the labs / prod puppetmasters are htat different? [14:07:54] I guess we could make the auto update detect w/not there was a change and restart the master if there was? [14:07:59] labcontrol1001 vs palladium [14:08:09] Oh — no, those should be the same as far as I know. [14:08:16] And I don’t think either one uses a puppetmaster. [14:08:21] It’s only ::self that does. [14:08:39] PROBLEM - Puppet failure on tools-shadow is CRITICAL 60.00% of data above the critical threshold [0.0] [14:08:55] PROBLEM - Puppet failure on tools-master is CRITICAL 33.33% of data above the critical threshold [0.0] [14:08:57] Is it possible that palladium ‘needs’ it too but there just aren’t any symptoms of it being behind? 
[14:09:01] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:09:04] 6Labs, 10Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1429722 (10valhallasw) [14:09:54] andrewbogott: hmm, but when I made changes to ipresolve function earlier nothing needed restarting... [14:10:04] 6Labs, 10Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1427257 (10valhallasw) [14:10:15] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 11.11% of data above the critical threshold [0.0] [14:10:24] I can never follow when it does and doesn’t need it. [14:10:29] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 60.00% of data above the critical threshold [0.0] [14:10:36] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 30.00% of data above the critical threshold [0.0] [14:11:00] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 60.00% of data above the critical threshold [0.0] [14:11:06] One thing to keep in mind is that puppet-merge doesn’t push to labcontrol1001 — it rebases with a cron. So palladium and labcontrol1001 can be out of sync for a minute or two. never more than that though. [14:11:18] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 66.67% of data above the critical threshold [0.0] [14:11:42] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:12:10] YuviPanda: in any case, looks like recoveries are on the way. [14:12:18] andrewbogott: yes, I rebooted. [14:12:22] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 50.00% of data above the critical threshold [0.0] [14:12:27] andrewbogott: err, restarted apache [14:12:29] not rebooted... [14:16:15] elee: heh. we must be on at different times. still around? 
[14:20:13] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [14:23:27] Negative24: yessir [14:23:31] caught me at a good time [14:24:16] Labs, Incident-20150617-LabsNFSOutage, Labs-Sprint-102, Labs-Sprint-103, Labs-Sprint-104: Audit projects' use of NFS, and remove it where not necessary - https://phabricator.wikimedia.org/T102240#1429772 (yuvipanda) [14:24:18] Labs, Patch-For-Review: Remove NFS mounts from project cephtest - https://phabricator.wikimedia.org/T102381#1429770 (yuvipanda) Open>Resolved Done! [14:24:45] elee: good [14:24:50] Labs: Archive NFS data for projects that no longer have NFS - https://phabricator.wikimedia.org/T104857#1429775 (yuvipanda) NEW [14:25:10] Negative24: I can't think of additional troubleshooting steps so have given up =p [14:25:18] elee: where's your ssh config again? [14:25:32] https://github.com/leee/.dotfiles/blob/master/sshconfig [14:25:36] also: [14:25:51] Labs: Archive NFS data for projects that no longer have NFS - https://phabricator.wikimedia.org/T104857#1429795 (yuvipanda) Also some projects used to have /home and /project and have only one now. Those should be archived as well. 
[14:25:54] http://web.mit.edu/leee/Public/ssh-stawp.txt [14:25:56] and http://web.mit.edu/leee/Public/ssh-stawp2.txt [14:26:00] I have a tiny idea that might work but I don't know why someone else wouldn't have caught it [14:26:06] sure hit me [14:26:17] elee: very copy and pastey :p [14:26:21] ^_^ JohnFLewis [14:26:23] RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0] [14:26:43] yeah I probably should think of a better machine name than the current one [14:27:23] ok so bastion resolves but not tunneling [14:27:25] also for some really funky reason I kept thinking harej was indian [14:27:28] Negative24: correct [14:28:29] elee: fyi this is mine: https://github.com/jdloft/dotfiles/blob/master/link/.ssh/config [14:28:58] it's significantly simpler and everything below ln 38 is shorthand [14:29:33] oh wait what about gerrit? [14:29:41] or do you use gerrit over https [14:29:52] Labs, Labs-Sprint-102, Labs-Sprint-104, Patch-For-Review: Labs: manage-nfs-volumes-daemon rewrite - https://phabricator.wikimedia.org/T102782#1429817 (yuvipanda) Lots of patches later, the stuff left to do is to support new projects: # Create folders under /srv/others # Bind mount that onto /exp I'... [14:30:40] elee: I haven't thought about gerrit. It's always worked on ssh for me [14:30:46] hrm... [14:31:12] I wish wmflabs had a bastion host that people could dev on [14:31:45] elee: https://gist.github.com/JohnFLewis/e65febe9bc15b00b2510 mine if it helps too [14:31:54] hrm should I just rewrite this [14:31:55] RECOVERY - Puppet failure on tools-exec-1403 is OK Less than 1.00% above the threshold [0.0] [14:32:19] oh I've got a really good question for you al [14:32:20] all* [14:32:24] elee: the config? yes as like 99% of the top half is root-only stuff [14:32:26] how do you guys keep keys portable? [14:32:44] physically portable? 
[14:32:51] not necessarily [14:33:01] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [14:33:07] elee: mines encrypted so I don't worry about it as much as other people [14:33:15] like I could always store my private keys in my dotfiles repo [14:33:19] I store by keys in an encrypted usb [14:33:21] but people consider that a Bad Idea [14:33:30] like I see nothing wrong with it though [14:33:32] elee: comment out line 45 for me [14:33:37] no don't do that [14:33:40] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [14:33:42] RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0] [14:33:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [14:33:50] Negative24: roger, standby [14:34:02] oh wait that makes sense [14:34:02] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [14:34:12] (that was to the dotfiles storage comment, still comment out ln 45) [14:34:22] RECOVERY - Puppet failure on tools-exec-catscan is OK Less than 1.00% above the threshold [0.0] [14:34:22] RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0] [14:34:24] oh irc [14:34:32] you make things so clear :P [14:34:32] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [14:34:33] RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0] [14:34:38] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [14:34:52] RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0] [14:34:59] RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0] [14:35:02] but no cigar [14:35:18] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold 
[0.0] [14:35:30] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [14:35:34] RECOVERY - Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [14:35:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [14:35:42] elee: do what I do - just have a host bastion.wmflabs.org definition separately with user and identityfile [14:36:01] wait the problem here is being able to use bastion for what it does [14:36:12] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [14:36:13] see http://web.mit.edu/leee/Public/ssh-stawp.txt [14:36:34] wait hah JohnFLewis != James_F [14:36:55] how did you know? :o [14:37:01] =p [14:37:23] though anyway, do as I said. keep the bastion definition simple [14:37:26] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [14:38:01] Labs: fix labs jessie instances to have correct salt version - https://phabricator.wikimedia.org/T104849#1429845 (Krenair) [14:38:16] RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0] [14:38:17] elee: James_F is James_F [14:38:20] RECOVERY - Puppet failure on tools-exec-1401 is OK Less than 1.00% above the threshold [0.0] [14:38:20] RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0] [14:38:21] =p [14:38:32] okay uh let me just make it simple [14:38:36] I'm also not Negative24. 
[14:38:38] :-) [14:38:43] gasp [14:38:46] WHAT [14:38:54] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [14:39:56] okay hrm [14:40:03] JohnFLewis: I did just do that [14:40:09] Host bastion.wmflabs.org [14:40:15] with the appropriate user and identity file [14:40:16] no cigar [14:40:38] Labs: Archive NFS data for projects that no longer have NFS - https://phabricator.wikimedia.org/T104857#1429864 (Andrew) I periodically run archive-managed-projects. From the comments there: ### puppet:///files/ldap/scripts/archive-managed-projects ###########################################################... [14:40:38] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [14:40:44] Positive24: hold on one sec. I'm making a simplified ssh config that I know works [14:40:47] keep getting denied (publickey)? [14:40:50] WAIT [14:40:52] UH [14:40:55] hold on [14:41:00] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [14:41:09] Quarry: Unicode in query results in strange behavior - https://phabricator.wikimedia.org/T71224#1429869 (Halfak) Open>Resolved [14:41:18] Quarry: Unicode in query results in strange behavior - https://phabricator.wikimedia.org/T71224#742694 (Halfak) Yup. Works for me now. 
[14:41:36] HAHA [14:41:42] oh god that's hilarious [14:41:42] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [14:41:48] okay uh [14:41:51] my dotfiles [14:41:54] my sshconfig rather [14:41:58] look at line 47 [14:42:07] Labs, Tool-Labs, Wikimania-Hackathon-2015: Conduct a research tools workshop at wikimania hackathon 2015 - https://phabricator.wikimedia.org/T91062#1429873 (Halfak) [14:42:09] I define proxycommand, but not my identityfile, let alone my username [14:42:12] that's pure gold [14:42:48] and wait I don't need to define identityfile because it gets automatically asked for [14:42:53] remember people: leee != elee [14:43:13] yeup it works [14:43:37] that was it [14:43:40] just add User elee [14:43:43] wow I'm a dumbass [14:44:38] but uh wow [14:44:41] thanks JohnFLewis and Negative24 [14:44:57] working configuration at https://github.com/leee/.dotfiles/blob/master/sshconfig [14:45:09] I should clean up my dotfiles, hrm. [14:45:13] yeah [14:45:16] heh [14:45:27] I was getting to the same point in my investigation [14:45:43] lines 4 to 40 will likely never be needed for you anyway [14:45:56] yeah exactly [14:46:03] uh what are bastions 1-4 for anyways [14:46:39] which ones? [14:46:48] was looking at your dotfiles [14:46:51] or rather, Negative24's [14:47:16] I do however want to talk to harej at some point about getting a new cluster running [14:47:37] out of curiosity, how large is eqiad? [14:48:59] elee: I'm updating my config [14:49:12] check in a min. [14:49:14] as am I =p [14:49:22] elee: around 500 servers I think. I could be way out of date though. [14:49:40] hrm, that's a lot, I wouldn't be able to ask for that many, let alone 100 [14:49:48] 10? 20? =p [14:49:54] but we do have tons of external IPs [14:50:30] elee: now check/refresh [14:51:25] bastion 2 is er [14:51:26] ? [14:51:38] I remember tools has two as well [14:51:44] the latter being larger at 8GB of ram [14:51:48] and 4 cores methinks? 
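The ssh fix worked out above (a ProxyCommand stanza that was missing its User line) might look roughly like the fragment below. This is only a sketch, not elee's actual file: the `*.eqiad.wmflabs` host pattern, the `ssh -W` form of ProxyCommand, and the key path are assumptions for illustration; only the bastion host name and `User elee` come from the log.

```
# Hypothetical ~/.ssh/config fragment (illustrative only).

# Keep the bastion definition simple: just user and key.
Host bastion.wmflabs.org
    User elee
    # Assumed key path; use whichever key is registered for labs.
    IdentityFile ~/.ssh/id_rsa

# Instances behind the bastion. Without an explicit User here, ssh falls
# back to the local login name, which can fail with "denied (publickey)".
Host *.eqiad.wmflabs
    User elee
    ProxyCommand ssh -W %h:%p bastion.wmflabs.org
```

The User line in the second stanza is the part the log identifies as missing; the rest is scaffolding around it.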
[14:52:00] oh right, login.tools vs dev.toolsw [14:52:03] the second one is for a hot swap [14:52:13] only in outages [14:52:17] reasonable [14:52:32] on tools its different. the second is for big dev stuff such as compiling [14:52:45] since on bastion you don't do any dev stuff [14:56:23] Negative24: are there hosts that one can do dev work on? [14:57:12] elee: generic dev stuff? [14:57:39] yeah - I keep all the things I do on seperate machines and would like to keep wikimedia related stuff elsewhere [14:58:48] andrewbogott: elee: I was bored and well, 668 servers in ganglia for eqiad :) [14:59:16] uh well [14:59:24] jesus christ [14:59:24] you must be very bored [14:59:34] that's fun [14:59:51] I honestly would have written a script to do it if I was bored [14:59:53] Gmetad must totally be happy [15:00:16] elee: which means the other 3 datacenters have 371 collectively [15:00:25] wait, there are other ones? [15:00:36] I thought it was just eqiad and something internal with wmf [15:00:51] elee: labs doesn't really want to just give out machines for misc stuff. In fact a lot of the docs say do everything on your local machine and then push to labs for testing [15:00:59] yeah, eqiad (Ashburn), ulsfo (San Fran), esams (Haarlem) and codfw (Dallas) [15:01:16] eqiad is mostly for labs, right? [15:01:31] reasonable. I have a machine that I remotely do dev stuff on but meh [15:01:35] I need to rethink my workflow [15:01:36] eqiad has the only labvirt machines which there are 8 I think? [15:02:09] wait, kvm machines? 
[15:02:19] I thought labs was an openstack shop [15:02:28] no it's openstack which is kvm under the hood [15:02:29] Negative24: ^ + eqiad is one of the main dcs where all traffic goes though codfw is opening as a second main dc [15:02:46] ah [15:02:57] not opening, slowly being deployed as so [15:03:26] ah [15:03:58] sorry for the lack of expertise with nova [15:04:08] I'm hoping to twiddle with mesos when I have more time over the summer though [15:05:43] just wait until we get horizon. that's a whole different ballpark. [15:06:05] come to think of it, anyone know how to track the progress of horizon [15:06:10] Negative24: labs isn't the only virtualized environment in eqiad too :p [15:06:13] horizon? [15:07:10] elee: openstack horizon [15:07:52] oh wait [15:07:54] this looks great [15:08:27] wait [15:08:30] WHOA [15:09:48] elee: where are you looking? [15:09:54] docs [15:10:12] okay wow uh I wonder if they know about this [15:10:14] ganeti ftw imo ;) [15:10:26] elee: they know. It's installed already [15:10:43] oh no this is for someone else [15:11:03] but wow [15:11:42] JohnFLewis: ganeti is cool but I don't know of anyone (other than Google!) that uses it [15:11:51] Negative24: I do :) [15:11:56] and you [15:11:57] the WMF [15:12:00] I'm playing with it. [15:12:09] JohnFLewis: where? [15:12:15] in production [15:12:28] https://wikitech.wikimedia.org/wiki/Ganeti [15:14:39] I wasn't aware of that [15:16:04] elee: where are you playing with it [15:16:26] Negative24: personal cluster [15:16:44] oh. is this MIT? 
[15:17:05] sure [15:17:15] lets just say its 5 esxi hosts inside a dorm room [15:17:23] rather [15:17:32] I'm seeing if I like esxi over kvm or xen [15:17:35] or something else basically [15:18:22] MIT's great though - literally labs left and right retire machines every two to three years, we go dumpster diving, and find really specced out machines for free [15:18:37] get drives since drives are typically destroyed, and we're good to go [15:19:08] this current cluster are 5 identical poweredges (I think, I haven't seen them for a long time now because summer and stuff) with 128GB each [15:21:30] wow [15:34:03] sorry but I sent emails to wrong addresses date@tools.wmflabs.org, by@tools.wmflabs.org, User-Liangent@tools.wmflabs.org, and@tools.wmflabs.org, User-Liangent-adminbot@tools.wmflabs.org, from@tools.wmflabs.org, edits@tools.wmflabs.org, Template-Dyk@tools.wmflabs.org, -s@tools.wmflabs.org because of bad commands used [15:34:38] liangent: :D they should just bounce, I think [15:34:39] any matched existent user? [15:35:19] from the looks of it probably not [15:36:01] okay [15:56:00] I'm not sure if this is the appropriate place or if this is even an issue or if it is intentional, but the service running on tiles.wmflabs.org is no longer functioning for me. For example, the following URL results in a 500 internal server error: http://a.tiles.wmflabs.org/bw-mapnik/13/2099/3040.png whereas previously this would have returned an image file. Possibly related -- going to http://a.tiles.wmflabs.org/bw-mapnik/ results in a file not [15:56:01] found and http://tiles.wmflabs.org/ just shows a default directory listing as if the server is missing the appropriate website. I'm simply not sure if this is a known issue, if this service was intentionally shut down, or if I'm doing something wrong (although to my knowledge, no changes have been made on my side and these requests previously succeeded). Any information is appreciated. Thanks! 
[16:00:55] Lee-, there was a recent issue, see topic [16:04:52] jynus, I did read that, but I misunderstood it. I took it to mean that the restoration of the June 8th backup was completed on June 19th. Do you happen to know if this project is expected to be restored or if I should seek an alternative / try to host my own? [16:05:48] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [16:06:40] Lee-: it's expected to be restored, but it's nontrivial due to the size [16:07:52] OK thanks for the information. I really appreciate the services offered by the wikimedia labs group. [16:08:45] YuviPanda|afk: new formatting: http://tools-beta.wmflabs.org/errorpage-generic.html and http://tools-beta.wmflabs.org/ :-) [16:11:13] Krinkle: ^ :-) [16:11:14] Lee-: I can't find the relevant phabricator task at the moment, though :/ [16:11:37] YuviPanda|afk: do you know from the top of your head which task restoring tiles/NFS is? [16:15:19] hmm found the bounces [16:16:50] but they're in Spams [16:16:50] it says "Why is this message in Spam? We've found that lots of messages from mail.tools.wmflabs.org are spam." [16:16:51] in Gmail [16:16:51] :/ [16:16:52] people marking cron mail as spam? [16:18:00] * CP678|away is away: This is a manual computer virus. Please copy paste me in your away message. I'm not here right now. [16:18:12] CP678|away: please turn off that script [16:18:23] valhallasw`cloud, what script? [16:18:31] 18:17 — CP678|away is away: This is a manual computer virus. Please copy paste me in your away message. I'm not here right now. [16:19:09] Hmm... I'm not running a script to do that though. [16:20:21] valhallasw`cloud: Hm.. 
seems http://tools.wmflabs.org/err is still weirdly blank [16:20:24] not sure what happened there [16:20:32] been blank for a few days at least in tool labs [16:20:33] error pages [16:20:42] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 822767 bytes in 2.636 second response time [16:20:44] It used to have a more informative page there [16:22:13] Krinkle: I'm not sure what should be there, as there's no tool called 'err' [16:22:24] Yes? [16:22:34] a 404 error? [16:22:42] ohhhh [16:22:57] yeah, I see your point [16:23:23] In fact, to my great frustration, it has error pages even for things it shouldn't (e.g. when you return 500 or 404 from a tool app, it will overwrite it) [16:23:32] and in the small case where it has its own territory, it doesn't. [16:23:41] the one place I'd say it's okay to have its own error page [16:25:01] yeah, it should only do something with 5XX's if there's no content, I think [16:25:02] https://phabricator.wikimedia.org/T66393 is about 404s with content (e.g. mediawiki) [16:25:02] https://phabricator.wikimedia.org/T103662 [16:25:10] anyway, I'm not sure what was the actual error page beforehand [16:29:24] yuvipanda: About? [16:29:47] I seem to be having connectivity issues to labs in general now [16:29:55] Labs, Tool-Labs: 'webservice not available' message no longer shown - https://phabricator.wikimedia.org/T104870#1430484 (valhallasw) NEW [16:32:25] Krinkle: hm, it does work if the user exists, e.g. https://tools.wmflabs.org/abibot/ [16:33:27] Labs, Tool-Labs: 'webservice not available' message no longer shown - https://phabricator.wikimedia.org/T104870#1430515 (valhallasw) [16:35:13] Labs, Tool-Labs: Mails from tools are being marked as 'spam' by gmail - https://phabricator.wikimedia.org/T104871#1430526 (valhallasw) NEW [16:36:26] liangent, could you post the headers of the bounces in that bug? 
[16:38:32] (feel free to leave out your real e-mail address, of course) [16:38:33] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [16:38:48] valhallasw`cloud: beware of relative urls [16:38:52] view-source:http://tools-beta.wmflabs.org/errorpage-generic.html [16:39:11] In a real instance, div.footer should be absent or non-empty [16:39:18] and the image url not relative but absolute [16:39:52] Krinkle: without the div, there's no line, and pages with/without content in the footer look different. What's wrong with the host-relative url? [16:40:09] and specify both width and height for optimal rendering [16:40:38] valhallasw`cloud: The footer is not there to display a line. Our unknown-domain and 404 pages have no footer, for example [16:40:44] ostriches: yup sup [16:40:46] the line is only there for the footer, should not be there without content [16:40:50] valhallasw`cloud: uh is labs down? [16:40:52] Krinkle: hm, okay. [16:40:55] Shinken just reported [16:41:07] yuvipanda: I'm having network issues, but wasn't sure if it was me or labs [16:41:15] I can load it [16:41:18] relative url is a problem because it'll most likely be served through multiple domains potentially. At least the labs generic one. [16:41:28] there's no guruantee the server in question will have such file [16:41:33] yuvipanda: Staging puppet's blowing up because it's trying to make nfs directories we don't want/need. thcipriani is kicking our puppetmaster now tho...might be staleeee [16:41:41] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 822788 bytes in 2.482 second response time [16:41:48] Krinkle: the puppet manifest also provisions the images [16:41:52] so that should be OK [16:41:55] ostriches: yes, update and reboot puppetmasters! [16:41:57] ? [16:41:58] Err [16:41:59] tools.wmflabs, tools-static.wmflabs and (project).wmflabs are all different document roots [16:42:00] Restart? 
[16:42:32] so if labs-proxy or the backend apache of a random project has an issue with a particular request and serves this, you can't just assume hte image will be there [16:42:41] e.g. it's not gonna exist on cvn.wmflabs.org [16:42:44] regardless of puppet [16:43:12] okay, I think the generic error page is giving the wrong impression here [16:43:36] it's there for people who reuse tool labs' dynamicproxy in another instance [16:43:47] so that their error page won't look like it's part of tool labs by default [16:44:21] yuvipanda: giving me the same: /etc/puppet/modules/openstack/files/nfs-mounts-config.yaml [16:45:22] Negative24: restart puppetmasters! [16:45:23] ? [16:45:44] doing [16:46:12] I really want this puppetmaster parade to stop [16:47:09] valhallasw`cloud: which bug? [16:47:34] Krinkle: if it were to be used as front-end to handle e.g. 502s such as http://mathoid.testme.wmflabs.org/ you'd be completely right [16:47:34] you mean the spam> [16:47:37] ? [16:47:41] liangent: https://phabricator.wikimedia.org/T104871#1430526 [16:47:43] yeah [16:48:20] valhallasw`cloud: I don't know tools-beta. Just giving my view. I'm sure you know what you're doing :) [16:50:45] 6Labs, 10Tool-Labs: Mails from tools are being marked as 'spam' by gmail - https://phabricator.wikimedia.org/T104871#1430621 (10liangent) ``` Delivered-To: liangent@gmail.com Received: by 10.194.122.129 with SMTP id ls1csp815676wjb; Mon, 6 Jul 2015 08:30:13 -0700 (PDT) X-Received: by 10.140.217.147 wit... [16:51:30] valhallasw`cloud: done [16:59:07] Krinkle: thanks :-) I'll fix the footer, and I'll consider trying to get the images on static.wm.o. Sounds good from a resource management standpoint as well as a reusability standpoint [16:59:56] yuvipanda: are we not using base anymore. 
my new instance came with it off [17:00:24] labs::instance includes base nown [17:00:25] Now [17:00:41] right [17:21:03] yuvipanda: self puppetmaster client: "Error: Could not request certificate: Connection timed out - connect(2)" firewall? [17:23:31] Negative24: which host? [17:23:54] puppetmaster: phab-pup (what else?) client: phab-pup-test [17:27:16] Negative24: nope, not firewalls (iptables -L on master doesn't say anything) [17:28:51] hmm [17:29:03] certcleaner is enabled on puppetmaster not client [17:29:14] I'm not sure what's wrong... [17:29:21] but we're about to head into a meeting now :( [17:29:38] no worries [17:29:42] I'll poke around [17:29:52] are you in ops? [17:33:24] Labs, Labs-Infrastructure: replica.my.cnf creation broken - https://phabricator.wikimedia.org/T104453#1430863 (yuvipanda) This turned into a rewrite of the tool, stand by. [17:33:40] Labs, Labs-Infrastructure, Labs-Sprint-105: replica.my.cnf creation broken - https://phabricator.wikimedia.org/T104453#1430868 (yuvipanda) [17:34:06] Negative24: yup :) [17:34:24] ah but the labs side of ops :) [17:40:19] Labs, Labs-Sprint-105: Upgrade labs network node to trusty - https://phabricator.wikimedia.org/T90823#1430903 (Andrew) [17:41:20] Labs, Labs-Sprint-104, Labs-Sprint-105, Puppet: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1430907 (yuvipanda) [17:42:12] Labs, Labs-Infrastructure, Continuous-Integration-Isolation, Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1430922 (Andrew) [17:42:28] Labs, Labs-Sprint-103, Labs-Sprint-104, Labs-Sprint-105: In openstack upstream, add project_id to instance metadata - https://phabricator.wikimedia.org/T103384#1430925 (Andrew) [17:42:34] Labs, Labs-Sprint-103, Labs-Sprint-104, Labs-Sprint-105: Limit available images on horizon - 
https://phabricator.wikimedia.org/T91782#1430926 (Andrew) [17:43:03] Labs, Labs-Sprint-105: NFS exports broken for new projects which want NFS - https://phabricator.wikimedia.org/T104881#1430932 (yuvipanda) NEW [17:43:17] Labs, Incident-20150617-LabsNFSOutage, Labs-Sprint-102, Labs-Sprint-103: Labs: rewrite remaining labstore* scripts - https://phabricator.wikimedia.org/T102520#1430946 (yuvipanda) [17:43:17] Labs, Labs-Sprint-102, Labs-Sprint-104, Patch-For-Review: Labs: manage-nfs-volumes-daemon rewrite - https://phabricator.wikimedia.org/T102782#1430942 (yuvipanda) Open>Resolved Marking as done, rest tracked in T104881 [17:44:02] Labs, Labs-Sprint-104, Labs-Sprint-105: Learn about/document NFS and LVM setup - https://phabricator.wikimedia.org/T104220#1430956 (Andrew) [17:44:03] Labs, Labs-Sprint-105: Archive NFS data for projects that no longer have NFS - https://phabricator.wikimedia.org/T104857#1430958 (yuvipanda) [17:46:28] Labs, Labs-Sprint-101, Labs-Sprint-102, Labs-Sprint-105: Sort out remaining virt1000 salt minions - https://phabricator.wikimedia.org/T103010#1430968 (Andrew) [17:47:08] Labs, Labs-Sprint-105: Do a manual backup of labstore1002 - https://phabricator.wikimedia.org/T104882#1430972 (Andrew) NEW a:Andrew [17:48:44] Labs, Labs-Infrastructure: Investigate keystone lockups - https://phabricator.wikimedia.org/T104884#1430994 (Andrew) NEW a:Andrew [17:48:57] Labs, Labs-Infrastructure, Labs-Sprint-105: Investigate keystone lockups - https://phabricator.wikimedia.org/T104884#1431002 (Andrew) [17:49:30] Labs, Incident-20150617-LabsNFSOutage, Labs-Sprint-102, Labs-Sprint-103, and 2 others: Audit projects' use of NFS, and remove it where not necessary - https://phabricator.wikimedia.org/T102240#1431003 (yuvipanda) [17:51:30] Labs: remove nutcracker from wikitech - https://phabricator.wikimedia.org/T102993#1431008 (Andrew) [17:56:28] andrewbogott: 
http://etherpad.wikimedia.org/p/lvm-labstore-backups is the LVM backup thingy [17:56:55] mark: I guess we should rsync to the bakup snapshots than do tar | ssh | tar again? I don't know if we have neough space for tar | ssh |tar now [17:57:11] yes correct [17:57:16] so that involves first making a snapshot of current state [17:57:22] and then running rsync on top of that [17:57:41] and let's see how long the rsync will take [17:57:45] mark: both host and dest or just host? [17:57:46] yuvipanda: thanks [17:57:52] on dest [17:57:59] on source you can remove the snapshot and make a new one [17:58:03] ah, I see. [17:58:04] right [17:58:10] or leave it as a backup, as long as there's space [17:58:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 66.67% of data above the critical threshold [0.0] [17:58:13] but there isn't much right now [17:58:16] I don't think there's space [17:58:17] yeah [17:58:20] but eventually that's useful [17:58:21] yeah [17:59:07] mark: right. [18:08:55] gifti: around? [18:12:18] So I was just added to wikihistory, and apparently it won't let me into the files. What's going on? [18:13:35] addshore, petan ^ [18:14:14] CP678|studying: log out and log back in if you were logged in when it happened [18:14:19] 6Labs, 3Labs-Sprint-105: NFS exports broken for new projects which want NFS - https://phabricator.wikimedia.org/T104881#1431107 (10yuvipanda) a:3yuvipanda [18:14:22] Did that already [18:17:13] CP678|studying: well id you gives; 51512(tools.wikihistory) [18:21:02] 6Labs, 3Labs-Sprint-105: NFS exports broken for new projects which want NFS - https://phabricator.wikimedia.org/T104881#1431156 (10yuvipanda) p:5Triage>3Normal [18:23:11] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [18:23:29] yuvipanda: Resolved the problem. Port 8140 wasn't in security group. Adding to docs [18:23:39] Negative24: hmm, it shouldn't need to be... 
[18:23:56] well that's how puppetmaster talks [18:26:28] Negative24: no but everything in the same project shouldn't need exceptions in security groups [18:26:40] hmm [18:27:00] well it worked when I put in the exception [18:41:24] yuvipanda: want me to submit a task? [18:41:37] Negative24: yes please [18:49:58] 6Labs: Investigate per-project open security group policy - https://phabricator.wikimedia.org/T104894#1431408 (10Negative24) 3NEW [18:50:04] yuvipanda: ^ [18:50:16] Negative24: thanks. andrewbogott probably knows more... [18:50:41] 6Labs: Investigate per-project open security group policy - https://phabricator.wikimedia.org/T104894#1431417 (10Negative24) [18:50:47] CCed [19:04:45] yuvipanda: quick +2 on hiera host change? https://gerrit.wikimedia.org/r/#/c/223081/1 [19:05:54] Negative24: done [19:06:07] thanks [19:07:21] er whoops [19:07:29] that was the wrong hostname [19:07:33] heh [19:08:16] I recreated the instance with a different name [19:09:32] mark: andrewbogott I added what I think are steps for new backups at bottom of http://etherpad.wikimedia.org/p/lvm-labstore-backups - also has steps I followed the other day [19:09:37] mark: can you check the concepts, etc? [19:10:12] ok [19:11:49] added a note [19:11:53] otherwise seems fine [19:14:08] yuvipanda: revert: https://gerrit.wikimedia.org/r/#/c/223084/ [19:14:50] mark: ah, I see. which one do you think will be more efficient? [19:15:14] i think making a snapshot of the current filessytem and then writing to that will create a large diff on the next run [19:15:19] i'd do the latter [19:15:44] it's more straightforward too [19:15:57] the current one has a name with a date on it, tho. [19:16:27] sure, but this one will too once it's finished [19:16:36] and you can put it in the mountpoint as well [19:17:19] * mark dinner [19:18:06] yuvipanda: and the right one :) https://gerrit.wikimedia.org/r/#/c/223088/ [19:18:13] sorry about that [19:18:43] so does all hiera have to be in the main puppetmaster. 
I can't just put this into phab-pup? [19:18:56] you can [19:18:59] you can just cherry pick that patch [19:19:25] into phab-pup [19:19:30] I did and it didn't work [19:19:31] yep [19:19:39] have you tried restarting the puppetmaster? [19:20:00] * Negative24 tries it [19:24:40] yuvipanda: ah well that worked. Abandoning change... [19:24:52] that was a waste of time [19:25:20] :D [19:27:05] 6Labs, 7Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1431664 (10jcrespo) [19:27:08] 6Labs, 10Tool-Labs: Document labsdb replication set up - https://phabricator.wikimedia.org/T85868#1431667 (10jcrespo) [19:29:55] 6Labs, 7Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1431677 (10yuvipanda) @jcrespo sweet! Have you managed to make a commit in the puppet repo yet? [19:31:22] 6Labs, 10Labs-Infrastructure, 3Labs-Sprint-105: replica.my.cnf creation broken - https://phabricator.wikimedia.org/T104453#1431680 (10jcrespo) [19:47:59] yuvipanda: I can’t build instances on labvirt1005. Any idea why? [19:48:06] andrewbogott: oh... no... [19:48:09] andrewbogott: it has the new kernel tho [19:48:12] Alternative question: Do you mind if I rebuild that os? [19:48:23] I need it for network testing. [19:48:28] andrewbogott: nope, but that's the one with the newer kernel. [19:48:42] I would like it to be the same as the others. in other words, working :) [19:48:47] yeah [19:48:57] andrewbogott: do you think it's *not* working because of the new kernel? [19:49:06] andrewbogott: or because it's been disabled in some openstack config, etc? [19:49:26] It’s out of the scheduler pool. [19:49:32] But, if I explicitly schedule... [19:49:32] "internal error: no supported architecture for os type 'hvm'" [19:49:39] oh... [19:51:05] which, if I rebuild it and it still says that, /then/ I will be curious why. But so far I’m not curious. [19:51:13] ok! 
[19:51:15] It won’t mess up Moritz if I wipe out the new kernel will it? [19:51:39] I don't know - I Don't think he was investigating as much? we found the root cause for why the new kernel wasn't working [19:51:42] so it's probably ok [19:52:47] as long as I don’t wipe out the wrong one :) [19:53:33] 6Labs: Disable NFS in puppet3-diffs project - https://phabricator.wikimedia.org/T103760#1431753 (10yuvipanda) They both say it's ok! [19:54:05] 6Labs, 6Phabricator, 7Puppet: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1431755 (10Negative24) 3NEW a:3Negative24 [19:54:47] hm, I cannot connect to mgmt on labvirt1005 [20:00:11] yuvipanda: merging your stuff on puppetmaster [20:00:24] removes NFS from puppet3-diffs [20:00:24] mutante: hah! I merged yours! [20:00:25] :) [20:00:51] ..and i hit "yes" and it told me "already up-to-date" :) ok [20:01:00] yuvipanda: 'k :) [20:01:22] mutante: :) [20:05:31] is there a project in phab in regards to openstack horizon rollout? [20:10:49] Negative24: nope but there is a task [20:11:01] yuvipanda: labstore2001 - disk space - /srv/backup-others-20150703 0% ok ? [20:11:11] uh oh.... [20:11:12] not sure. [20:11:17] is that 0% free? [20:11:20] it seems to be a separate partition [20:11:21] Negative24: probably want to start at https://phabricator.wikimedia.org/T87279 [20:11:22] yea [20:11:29] but it seems to be only this specific path [20:11:54] 8550 MB though [20:12:02] i guess it's so huge that this rounds to 0% [20:12:04] 6Labs, 5Patch-For-Review, 7Tracking: Make OpenStack Horizon useful for production labs - https://phabricator.wikimedia.org/T87279#1431851 (10hashar) [20:12:08] wait. 
checking [20:13:31] hashar & andrewbogott: mind if I create one (for horizon) [20:13:45] yuvipanda: yes, to there are only 8.4G free, but because that is OF 3.3T, that is called "100% use" [20:13:49] /dev/mapper/backup-others--20150703 3.5T 3.3T 8.4G 100% /srv/backup-others-20150703 [20:14:00] mutante: ah, that's probably ok [20:14:01] Negative24: typically that would be the ‘openstack’ project [20:14:04] and it's a separate mount point , not / [20:14:07] it's just a backup [20:14:07] yuvipanda: yep [20:14:11] *nod* [20:14:14] that isn't growing [20:14:36] andrewbogott: is there one. I only see the MW integration [20:15:08] one what? [20:15:19] oh, a phab project? [20:15:21] Sure, go ahead. [20:16:11] oh wait I can't :P [20:16:18] horizon or openstack-horizon ? :D [20:16:24] I'm so used to working on my own test phab instances :) [20:16:28] Negative24: we can just fill a task to request the project creation [20:16:43] hashar: openstack. Horizon is an offshoot of it [20:17:07] unless we want to be more specific [20:17:59] you can fill a task against Phabricator project Project-Creators [20:18:00] cc andrew [20:23:26] hashar: done [20:34:53] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 20.00% of data above the critical threshold [0.0] [20:34:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 20.00% of data above the critical threshold [0.0] [20:36:56] yuvipanda: Syntax error at 'class'; expected '}' at /etc/puppet/manifests/role/labsores.pp:38 <- you? 
[20:37:07] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 33.33% of data above the critical threshold [0.0] [20:37:20] ah, looks like you fixed already :) [20:37:21] andrewbogott: yup, I merged a fix [20:37:22] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 33.33% of data above the critical threshold [0.0] [20:37:29] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 33.33% of data above the critical threshold [0.0] [20:38:12] 6Labs: Investigate per-project open security group policy - https://phabricator.wikimedia.org/T104894#1431971 (10scfc) I don't understand the role structure of Phabricator's Puppet setup at the moment, but I do notice that `role::phabricator::main` calls `ferm::service` which IIRC means "block everything that is... [20:39:39] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 60.00% of data above the critical threshold [0.0] [20:40:21] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL 70.00% of data above the critical threshold [0.0] [20:43:39] 6Labs, 5Patch-For-Review: Disable NFS for dwl project - https://phabricator.wikimedia.org/T103864#1431994 (10Giftpflanze) The instance is now broken. You can't log in, when it tries to create the home directory, it fails. [20:44:14] 6Labs, 5Patch-For-Review: Disable NFS for dwl project - https://phabricator.wikimedia.org/T103864#1431997 (10yuvipanda) Can you reboot the instance? that should fix it. 
I think you had several screen sessions running that prevented /home from being unmounted cleanly [21:04:41] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [21:04:57] RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [21:04:59] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [21:05:23] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK Less than 1.00% above the threshold [0.0] [21:07:08] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [21:07:22] RECOVERY - Puppet failure on tools-exec-1408 is OK Less than 1.00% above the threshold [0.0] [21:07:30] RECOVERY - Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0] [21:17:41] 6Labs, 10Tool-Labs: tools.wmflabs.org landing page should not dump all tool accounts - https://phabricator.wikimedia.org/T104917#1432163 (10Krinkle) 3NEW [21:19:14] is it about time to move past the labs nfs outage in the topic? [21:19:25] and aren't crontabs uncommented now? [21:31:09] Negative24: not sure if they have been uncommented; check with yuvipanda [21:31:38] I think that's one of the things that should have been done ages ago but that slipped through the cracks [21:34:35] yuvipanda: ping [21:35:36] JohnFLewis: 平 [21:35:48] heh? [21:35:50] or was it 邴, hrm... [21:35:55] its the chinese last name ping =p [21:36:03] kay [21:37:08] or valhallasw`cloud since you're a toollab admin anyway [21:40:04] JohnFLewis: {{ask}} :-p [21:40:13] valhallasw`cloud: no [21:40:32] you can have https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools-exec-wmt.tools.eqiad.wmflabs back - no longer need it since we moved to a project and its unused now :) [21:41:15] JohnFLewis: thanks! 
[21:41:20] 6Labs, 10Tool-Labs: Decommission Tools-exec-wmt.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T104919#1432226 (10valhallasw) 3NEW [21:41:53] (I'm just diving into bed, and there's some magic with SGE queues that has to be done that I don't know precisely) [21:41:54] 6Labs, 10Tool-Labs: Decommission Tools-exec-wmt.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T104919#1432236 (10JohnLewis) the nice task paper-trial 'yeah I said this' :) [21:42:02] and there is your paper-trail :) [21:42:21] :) [21:42:24] good night [21:42:40] night [21:53:34] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:56:54] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 50.00% of data above the critical threshold [0.0] [22:24:55] andrewbogott: is labs DNS broken? [22:24:56] tools.extdist@tools-bastion-01:~$ ping www.github.com [22:24:56] ping: unknown host www.github.com [22:25:09] * andrewbogott looks [22:26:11] legoktm: would you say that every other domain except for github resolves? [22:26:38] andrewbogott: nope [22:26:39] legoktm@tools-bastion-01:~$ ping pypi.python.org [22:26:39] ping: unknown host pypi.python.org [22:27:04] man, you’re good at picking them. [22:27:09] google works, bogott.net works [22:28:40] i was having a similar issue in the office in case that's relevant [22:29:04] legoktm: fixed? [22:29:26] andrewbogott: yup [22:29:28] what was it? [22:29:29] dduvall@deployment-bastion:~$ dig +short github.com [22:29:29] 192.30.252.128 [22:29:31] huh [22:38:24] Hey [22:38:36] Does anyone know where the source code for action=sitematrix on the API is? [22:38:49] SiteMatrix extension [22:38:55] https://github.com/wikimedia/mediawiki-extensions-SiteMatrix/blob/master/SiteMatrixApi.php [22:38:59] thanks [22:41:43] what's a fishbowl? 
[22:41:55] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [22:43:34] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [22:43:53] SigmaWP, fishbowl is a wiki set up to be readable by everyone but with editing possible only for registered users, and no open registration [22:44:07] ah. [23:05:36] Labs, Phabricator, Puppet: On labs phabricator references security extension even though it isn't present - https://phabricator.wikimedia.org/T104904#1432451 (mmodell) I don't quite understand why the git repo isn't getting cloned. [23:14:00] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/TaqPol was created, changed by TaqPol link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/TaqPol edit summary: Created page with "{{Tools Access Request |Justification=To help the development of the xTools project. |Completed=false |User Name=TaqPol }}"