[07:52:15] 3Wikimedia-Labs-Infrastructure, Labs-Team: Fix syslog error "nslcd[29117]: error writing to client: Broken pipe" - https://phabricator.wikimedia.org/T78616#849835 (10scfc) Ah! I wondered for ages what that was. The syslog rule seems to be a appropriate workaround, but the upstream bugs about this (for Debian/U... [09:33:21] RECOVERY - Puppet staleness on tools-webproxy is OK: OK: Less than 1.00% above the threshold [3600.0] [10:12:57] andrewbogott: hi, i have sometime now, if you wish me to look at stuff [10:13:34] matanya: I'm about to head out, but may have new VMs for you to look at before then. [10:13:36] How's the little one? [10:14:16] she is fine, thanks :) litttle cutie [10:15:12] Everyone home from the hospital already? [10:15:22] no, in a few days [10:16:27] congrats matanya [10:16:41] thank you mutante ! [10:34:52] matanya: I'm heading out. If you're interested, there are new images building in the 'openstack' project, jessie-bootstrap and jessie-bootstrapsmall. The manifest and scripts and such are here: https://gerrit.wikimedia.org/r/#/c/179765/ [10:35:11] I'm trying to do some crazy lvm acrobatics to get the partitions set up during boot. No telling if it'll work or not. But, you're welcome to submit a revision of that patch if you like. [10:57:41] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<33.33%) [11:07:40] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [13:56:38] 3Wikimedia-Labs-Infrastructure, Continuous-Integration: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#850455 (10hashar) a:3hashar Thanks @yuvipanda. I have deleted the old instance and recreated it (with IP 10.68.17.119)... 
[14:01:02] 3Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#850482 (10yuvipanda) Seems to be failing some more - lots of transient puppet failures because of that. [14:01:49] 3Continuous-Integration, Wikimedia-Labs-Infrastructure: Puppet stalled on fresh Precise instance - https://phabricator.wikimedia.org/T78661#850484 (10hashar) 3NEW a:3hashar [14:02:23] 3Continuous-Integration, Wikimedia-Labs-Infrastructure: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#850493 (10hashar) Puppet choke on newly created Precise instances T78661 :-( [14:03:05] 3Continuous-Integration, Wikimedia-Labs-Infrastructure: Puppet stalled on fresh Precise instance - https://phabricator.wikimedia.org/T78661#850484 (10hashar) [14:03:57] andrewbogott: hmm, lvm steps are failing in puppet runs [14:05:16] YuviPanda: looking... [14:05:20] andrewbogott: possibly related to DNS [14:05:35] andrewbogott: since there was another puppet storm alongside for DNS failures [14:06:22] YuviPanda: do you have an example of an instance that's failing? Or was it transient? [14:06:50] andrewbogott: is transient but was captured in puppetlogs. deployment-cache-bits01 [14:07:45] YuviPanda: you think it was lvm-specific? Or just a puppet failure due to not being able to fetch files? [14:08:31] andrewbogott: does lvm do a DNS lookup at all? [14:08:36] hashar thinks it does [14:08:41] yeah it does [14:08:51] Why would it? [14:08:53] errr [14:09:28] sorry was confused with another issue I had on an instance [14:09:38] which was failing to mount a NFS disk [14:09:41] unrelated:-] [14:11:08] heh :) [14:11:23] hashar: so, lvm was failing in mediawiki02 I think? also transient. 
[14:11:28] deployment-mediawiki02 [14:12:20] havent looked [14:12:26] but /var/log/puppet.log would tell [14:12:27] err [14:12:30] I meant andrewbogott [14:12:31] not hashar [14:12:45] !log integration manually cleaned and re-requested puppet cert for i-0000078a.eqiad.wmflabs [14:12:47] Logged the message, Master [14:13:37] YuviPanda: I'm interested in this but multitasking and also only have 20 mins or so of work time left. [14:13:54] andrewbogott: alright, I'll investigate and see what's up, and put things on the phab task. [14:14:01] I suspect that this is a random symptom of the wikitech DB being overwhelmed (since I saw one of those too-many-connections errors minutes ago) [14:14:06] ah, hmm [14:14:11] So probably we should just focus on that problem for starters. [14:14:18] I should poke springle about it at some point and figure out wtf is happening there [14:14:25] Which, I dunno, getting wikitech off of virt1000 would be a good start :) But that's a big job. [14:14:33] I checked logs during one of those times, and it wasn't remotely being hit hard [14:14:46] Yeah [14:16:12] 3Continuous-Integration, Wikimedia-Labs-Infrastructure: Puppet stalled on fresh Precise instance - https://phabricator.wikimedia.org/T78661#850538 (10yuvipanda) Manually cleaned the old cert and requested a new one and it's alright now, for this instance. let's see if this recurs. [14:22:24] 3Continuous-Integration, Wikimedia-Labs-Infrastructure: Puppet stalled on fresh Precise instance - https://phabricator.wikimedia.org/T78661#850549 (10hashar) 5Open>3Resolved Thanks, per our discussion lets close this and figure out later on when someone create another Precise instance. Might have been a tran... 
[14:22:25] 3Continuous-Integration, Wikimedia-Labs-Infrastructure: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#850551 (10hashar) [14:35:20] 3Tool-Labs: Set up redirect webserver for toolserver.org - https://phabricator.wikimedia.org/T62238#850577 (10coren) *.toolserver.org are all sent to the same place, though it has a landing page by default with a brief explanation and links to documentation rather than automatically redirect (except for existing... [14:36:33] 3Tool-Labs: Transfer domain toolserver.org to WMF - https://phabricator.wikimedia.org/T62864#850580 (10coren) This is now in progress with Legal. [14:39:27] 3Tool-Labs: Set up redirect webserver for toolserver.org - https://phabricator.wikimedia.org/T62238#850586 (10Dzahn) >>! In T62238#799806, @Aklapper wrote: > @scfc: As you removed the Wikimedia-Apache-configuration project, which area is this task in? Apache configuration has been moved from the separate Apache... [14:40:16] 3Tool-Labs: Transfer domain toolserver.org to WMF - https://phabricator.wikimedia.org/T62864#850590 (10Dzahn) >>! In T62864#850580, @coren wrote: > This is now in progress with Legal. any way to check the progress? [14:43:59] 3Tool-Labs: Transfer domain toolserver.org to WMF - https://phabricator.wikimedia.org/T62864#850605 (10coren) Yana from Legal has looped in MarkMonitor two business days ago (they are the ones who handle this for the WMF); I expect we'll have news shortly. [14:44:37] 3Tool-Labs: Set up redirect webserver for toolserver.org - https://phabricator.wikimedia.org/T62238#850606 (10coren) 5Open>3Resolved [14:44:38] 3Tool-Labs: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#850607 (10coren) [14:49:31] 3Tool-Labs: Set up redirect webserver for toolserver.org - https://phabricator.wikimedia.org/T62238#850620 (10coren) [14:58:22] (03CR) 10Hashar: "Seems fine. 
I am merging/deploying the CI change https://gerrit.wikimedia.org/r/#/c/175607/ then we can test the jobs using this change " [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/178571 (owner: 10Legoktm) [15:06:25] (03CR) 10Hashar: "recheck" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/178571 (owner: 10Legoktm) [15:07:06] (03CR) 10Hashar: [C: 032] "Well done \O/" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/178571 (owner: 10Legoktm) [15:34:39] 3Wikimedia-Labs-Infrastructure, Continuous-Integration: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#850672 (10hashar) So I deleted the old blocked instance and created a fresh one. Applied the manifests I needed and did... [15:36:02] hashar: I'm trying to look at your issue with DNS now. [15:36:27] Coren: hi :] [15:36:42] I had the instance integration-slave1001.eqiad.wmflabs borked completely because of dns failure when mounting the NFS mounts [15:36:52] recreated it from scratch but it still has the failure :-( [15:37:03] now it is locked waiting for one to press some key on the console sniff [15:37:59] Yeah; it's not clear wth that happens - I see no issue with DNS anywhere else. I wonder if there is something in the boot sequence that causes an attempt to mount NFS too early (i.e.: before networking is fully set up [15:38:50] From what I read of the error message, you're trying to mount somthing on a directly that actually lies on the remote fs? 
[15:39:08] (This, btw, is the failure that makes the boot sequence stuck, not /home itself) [15:40:22] 3Wikimedia-Labs-Infrastructure, Continuous-Integration: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#850690 (10hashar) [15:40:32] 3Wikimedia-Labs-Infrastructure, Continuous-Integration: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#793736 (10hashar) [Get console output](https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=conso... [15:40:37] Coren: not sure, the console output is https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=integration&instanceid=f6af33a7-b21e-446a-8f25-6bd4a9507e25&region=eqiad [15:41:05] seems it fails to mount the well known /data/project /public/* etc [15:41:08] Yeah, ultimately it's the attempt to mount /mnt/home/jenkins-deploy/tmpfs that causes the failure. [15:41:24] also [15:41:25] An error occurred while mounting /mnt/home/jenkins-deploy/tmpfs. [15:41:33] tmpfs: Bad value 'jenkins-deploy' for mount option 'uid' [15:41:34] :( [15:42:00] WTF!!!!!!!!!! [15:42:24] that user is only in LDAP [15:43:12] Hm. That clearly wouldn't work unless networking is fully set up. I think that's the issue; the boot sequence is trying to use things that live on the network too early. [15:43:32] that is a patch by Timo https://gerrit.wikimedia.org/r/#/c/173512/1/manifests/role/ci.pp [15:43:41] which I commented it is not going to work :/ [15:45:25] It might work, but only if it has 'noauto' in the mount options and is mounted later in the boot sequence. Otherwise, the bootstrap will try to mount it too early. [15:45:44] I.e. before LDAP or NFS is usable. 
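(Annotation) A rough Python sketch of the lookup behind the `uid=jenkins-deploy` failure discussed above: turning a user name into a numeric uid goes through the system's name service, so an account that exists only in LDAP cannot be resolved until LDAP is reachable. The helper name `uid_resolvable` is made up for illustration, and this only demonstrates the name lookup, not the mount itself.

```python
import pwd


def uid_resolvable(username):
    """Return the numeric uid for username, or None if it cannot be resolved."""
    try:
        # getpwnam consults whatever name services the host is configured
        # with (local files, LDAP, ...) and raises KeyError on a miss.
        return pwd.getpwnam(username).pw_uid
    except KeyError:
        return None
```

A late-boot job could call something like this before attempting the mount, which is one way to read the 'noauto' plus "mount it later" suggestion.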
[15:45:59] 3Wikimedia-Labs-Infrastructure, Continuous-Integration: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#850700 (10hashar) From coren, the relevant bits are: ``` tmpfs: Bad value 'jenkins-deploy' for mount option 'uid' ... An... [15:47:21] :-( [15:48:18] do we have any way to tweak the boot order sequence? [15:52:23] hashar: There are a couple, though they all require first that the fstab entry be marked noauto (otherwise the sysinit will try to mount it). After that, it's simple enough to either do an upstart conf to mount it, or even do it in rc.d [15:55:19] yeah hmm i remember a bit about those options from ten years or so ago [15:55:30] not comfortable in figuring out a fix for it though :-( [16:38:45] did beta die or something? [16:39:15] yea, looks broken [16:39:23] bad timing as we are currently running tests. [16:39:29] DB error [16:39:31] (Cannot contact the database server: Can't connect to MySQL server on '10.68.17.94' (4) (10.68.17.94)) [16:39:35] Coren: andrewbogott [16:42:45] Tobi_WMDE_SW: pretty broken to me too [16:44:01] that IP is deployment-db2.eqiad.wmflabs [16:44:01] !log restarting beta's elasticsearch servers to pick up a new version of a plugin. won't interfere with current downtime. [16:44:02] restarting is not a valid project. [16:48:13] lets try that again, bot [16:48:34] !log deployment-prep restarting beta's elasticsearch servers to pick up a new version of a plugin. won't interfere with current downtime. [16:48:37] Logged the message, Master [16:48:42] !log beta deployment-db2 is down [16:48:42] beta is not a valid project. [16:48:54] !log deployment-prep deployment-db2 is down [16:48:56] Logged the message, Master [16:49:19] mutante: deployment-db2 looks entirely dead to me. Try to kick it? [16:52:01] !log deployment-prep elasticsearch restart finished [16:52:04] Logged the message, Master [16:52:23] mutante: on wikitech? 
[16:53:11] Coren: i cant if it means i have to login on wikitech [16:53:14] andrewbogott: ? [16:53:30] mutante: you paged me a few minutes ago, I don't know what about [16:53:45] andrewbogott: (Cannot contact the database server: Can't connect to MySQL server on '10.68.17.94' (4) (10.68.17.94)) [16:53:48] mutante: I can do it for you if you want. [16:53:52] deployment-db2 is down [16:53:57] it's not for me [16:54:01] but please do [16:54:06] mutante: yes, but what is giving you that error? [16:54:09] I just don't randomly reboot deployment-prep instances. :-) [16:54:18] Coren: and this is why we can't have nice things :) [16:54:19] andrewbogott: http://deployment.wikimedia.beta.wmflabs.org/ [16:54:29] andrewbogott: http://beta.wmflabs.org/ [16:54:41] yeah, all of beta cluster is down [16:54:45] i dont think it's random [16:54:52] well, the part that matters (mediawiki barfs) [16:55:50] She us rebooting. [16:55:51] FWIW, I hit beta labs with a funky SQL query earlier today and got a funky result: https://phabricator.wikimedia.org/T78671 [16:56:25] mutante: I didn't say it was random; I just won't reboot deployment-prep instances without being asked to. :-) [16:56:51] Coren: I rescind my "why we can't hav enice things" comment, I missed your "don't" in "randomly reboot" :) [16:57:21] greg-g: :-P [16:58:20] mutante: The instance appears to be rather wedged. Ima try again at a lower level. [16:58:24] :/ [16:58:56] Coren: did it go to "SHUTOFF" state? [16:59:04] * YuviPanda is only kind of here [16:59:15] YuviPanda: No; so not the oom issue. [16:59:27] Ah ok [17:00:24] mutante: Looks like it's back up (the instance); but the DB doesn't seem to be yet. Ima give it another minute then go take a look if it doesn't wake. 
[17:02:18] (03CR) 10Hashar: "JenkinsBot lacked Submit right on labs/tools/ I granted it :-)" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/178571 (owner: 10Legoktm) [17:02:30] (03CR) 10Hashar: "recheck" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/179088 (owner: 10Merlijn van Deen) [17:03:19] (03CR) 10Hashar: [C: 031] T1316: Also report {icon umbrella} projects [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/179088 (owner: 10Merlijn van Deen) [17:04:44] mutante: the mysqld on that box is starting up and being *very* busy doinjg it. I'd guess it's replaying logs. [17:05:31] Coren: you fixed beta.wmflabs.org though [17:05:38] i was just reporting it [17:05:40] thanks [17:06:12] 141216 17:02:52 InnoDB: Database was not shut down normally! [17:06:12] InnoDB: Starting crash recovery. [17:06:19] Not surprising. [17:07:52] Hm. Should the beta site /not/ use the puzzle globe? [17:09:24] confirmed recovery [17:09:27] < shinken-wm> RECOVERY - English Wikipedia Main page on beta-cluster is OK: [17:09:41] (this is reported in -qa channel) [17:10:48] Coren: afaics it's meant not to [17:11:26] Yeah, I'm pretty sure it's supposed not to. [17:11:46] And I seem to recall having seen the labs logo there in the past. Must be a regression. [17:12:57] afair there was a red "beta" on top of the logo or so [17:21:53] en.wp.beta has the red beta logo. Apparently deployment.wm.beta does not [17:41:04] 3Tool-Labs-tools-Other: Fatal error trying to see Commons contributions on tools.wmflabs.org/supercount - https://phabricator.wikimedia.org/T76335#851026 (10Cyberpower678) 5Open>3Resolved [17:41:27] bd808: this can be changed, if needed [17:44:45] Steinsplitter: Yeah. I was just responding to the the exchange by Coren and mutante above. It sounded like Coren expected the beta banner to be present [17:45:27] :) [17:45:54] It really should be, actually. Legal really really doesn't want the project logos in labs. 
[18:10:05] Coren: I have a jessie image mostly working now, but your complicated firstboot lvm magic isn't doing much. Do you have time to poke around in a jessie instance and adapt that code as needed? [18:10:49] (I'm about to go to sleep, so there will only be time for one pass tonight in any case) [18:11:45] Sure thing; it shouldn't be all that hard to adapt in theory. What happens with the existing code; does it just fail or does it just work wrong? [18:12:25] The existing code gets the instances into some kind of maintenance mode on reboot. [18:12:46] The instances that I set up just now were built with a base image with all the lvm bits removed from the firstboot. [18:12:59] Hrm. Allright. Where is it? [18:13:08] (I was hoping that I could log into them and set up lvm by hand to figure out where the breakage is… but that doesn't work at all.) [18:13:26] ... it doesn't? That's what I was hoping to do. :-) [18:13:40] I just built jessie-base and jessie-base2 for you to tinker with [18:14:05] and you can amend https://gerrit.wikimedia.org/r/#/c/179765/ accordingly, if you find a good solution. [18:14:41] Actually, maybe I'll just merge that so you can make a simpler change. It's not like anyone is using that module anyway... [18:14:54] That sounds sane. [18:16:07] The patch I just merged includes the broken lvm code. So it's not the same as the code that built the images you're looking at. [18:16:15] Um… it's the same, except minus the lvm code. [18:16:26] I guess I could've planned this better :) [18:17:01] puppet also won't work so well on those instances, for reasons that I understand but have not yet fixed. [18:17:06] Shouldn't matter for your purposes anyway. [18:17:14] Indeed not. [18:17:15] Can you log in? [18:17:34] Without issues. [18:17:39] great. [18:17:58] In theory those instances have one 10g / partition and 10g of unpartitioned space to play with. [18:19:43] Hm. They appear to, but there's something odd with the disklabel. 
[18:21:02] I think I see what the issue is. the gpt was constructed funny, and parted doesn't like it. [18:21:09] That'll need some investigation. [18:22:28] I believe the partitioning code is here: https://github.com/andsens/bootstrap-vz/blob/master/bootstrapvz/base/fs/partitions/gpt.py [18:23:09] Oh, except, wrong branch. Here's what I'm running: https://github.com/andsens/bootstrap-vz/blob/development/bootstrapvz/base/fs/partitions/gpt.py [18:24:07] I'm already running a custom build of that, so there's no problem if you want to patch it. [18:25:22] Ah, interesting. It uses parted to do it. Which makes parted complaining about the result all that stranger. [18:27:41] OK, sleep time for me. Thanks for looking. [18:37:08] (03CR) 10Merlijn van Deen: [C: 032] T1316: Also report {icon umbrella} projects [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/179088 (owner: 10Merlijn van Deen) [18:37:33] (03Merged) 10jenkins-bot: T1316: Also report {icon umbrella} projects [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/179088 (owner: 10Merlijn van Deen) [19:43:47] (03PS1) 10Merlijn van Deen: Mv config.json to .example [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180239 [19:44:02] (03CR) 10Merlijn van Deen: [C: 032] Mv config.json to .example [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180239 (owner: 10Merlijn van Deen) [19:44:18] (03Merged) 10jenkins-bot: Mv config.json to .example [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180239 (owner: 10Merlijn van Deen) [20:07:49] (03PS1) 10Legoktm: Auto-detect changes to channels.yaml and !log it [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180245 [20:07:58] 3Tool-Labs: program created by proprietary compiler allowed on labs? 
- https://phabricator.wikimedia.org/T74253#851547 (10coren) Luis, as it stands I'm inclined to allow the use of the language on Labs unless your interpretation of our TOU prevents it (or you feel strongly that we should ajust the TOU to make it... [20:14:22] Coren: ping [20:14:38] Coren: I’m thinking of making one tiny change to maintain_replicas that won’t actually have any real effects. [20:15:06] Coren: ipb_deleted in ipblocks is set in the view as ‘0 as ipb_deleted’, which is redundant since the view already has a where limiting it only to rows with ipb_deleted = 0 [20:15:27] Coren: so if I get rid of that 0, then all of the redacted rows are replaced with null, and I can simplify the schema a bit :) [20:17:45] YuviPanda: You don't want to have null there; any code that looks at this column expects to do bit bashing on it. [20:18:04] Coren: it won’t be a null, it just would always be zero [20:18:17] Coren: because there’s already a *where* which replicates rows only if ipb_deleted is 0 [20:18:24] so if it isn’t 0, the row won’t be there at all [20:18:26] YuviPanda: Ah, you mean just letting ipb_deleted through. [20:18:29] so that 0 as is reduandant. [20:18:43] Coren: yeah. [20:18:44] Yeah, it's redundant atm indeed. [20:19:01] Coren: let me submit a patch :) [20:27:30] YuviPanda: hi, only letting you know its fixed now. crated a .sh workaround [20:28:19] Steinsplitter: cool. I also added a line in your code that made it work with jsub for me [20:29:08] :) thanks [20:29:20] yw! [20:33:45] YuviPanda: One thing I've been pondering is to draw a line in the sand and not replicate/don't put in views things that are not (no longer) part of the current schema. [20:34:38] Coren: hmm, so things like aft? [20:34:42] and mark as helpful? [20:35:08] Coren: we still replicate blob and cur from… 2005? 2006? :) [20:35:09] Also old columns. enwiki has dozens. 
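(Annotation) A small sketch of the `ipb_deleted` point made earlier in this exchange, using an in-memory SQLite database as a stand-in for the replica views. The table layout is deliberately simplified and the view names `v_literal` and `v_passthrough` are hypothetical; the real views are generated by maintain_replicas and are not reproduced here. It shows that once the WHERE clause only admits rows with `ipb_deleted = 0`, writing `0 AS ipb_deleted` and passing the column through give the same rows.

```python
import sqlite3


def compare_views():
    """Return rows from a view with a literal 0 and from one passing the column through."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ipblocks (ipb_id INTEGER, ipb_address TEXT, ipb_deleted INTEGER);
        INSERT INTO ipblocks VALUES (1, 'a', 0), (2, 'b', 1), (3, 'c', 0);
        CREATE VIEW v_literal AS
            SELECT ipb_id, ipb_address, 0 AS ipb_deleted
            FROM ipblocks WHERE ipblocks.ipb_deleted = 0;
        CREATE VIEW v_passthrough AS
            SELECT ipb_id, ipb_address, ipb_deleted
            FROM ipblocks WHERE ipblocks.ipb_deleted = 0;
    """)
    literal = conn.execute("SELECT * FROM v_literal ORDER BY ipb_id").fetchall()
    passthrough = conn.execute("SELECT * FROM v_passthrough ORDER BY ipb_id").fetchall()
    conn.close()
    return literal, passthrough
```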
[20:35:11] just don’t expose them as views [20:35:28] Coren: I’m going to try to push for a whitelist based replication strategy, *or* cleanup the dead tables [20:35:41] The counterargument is that there may be value in that old data. [20:35:51] But that increases the burden. [20:36:13] Coren: not by much, no? [20:36:25] Coren: I think I showed you https://phabricator.wikimedia.org/T78269 already [20:36:30] So the fundamental question is "Is the value of that old data potentially greater than the maintenance cost" [20:36:55] YuviPanda: I don't even have rights to see it. :-) [20:37:31] Coren: apparently I can’t properly set security settings... [20:37:59] Coren: try now? [20:38:12] "This object has a custom policy controlling who can take this action." [20:38:27] Coren: even now? [20:38:38] Still no change for me. [20:38:46] Maybe I need to log off and on? [20:38:58] Coren: probably not. can you view security bugs in phab in general? [20:39:28] YuviPanda: I don't think I've tried yet. I used to in bz, certainly. [20:42:49] Coren: ah, you can’t view security bugs in phab: https://phabricator.wikimedia.org/tag/security/ doesn’t have you in it [20:43:18] It's all André's fault! 
[20:43:19] :-) [20:43:20] (03CR) 10Legoktm: [C: 032] Auto-detect changes to channels.yaml and !log it [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180245 (owner: 10Legoktm) [20:43:36] testing wm-bot /valhallasw [20:43:43] (03Merged) 10jenkins-bot: Auto-detect changes to channels.yaml and !log it [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180245 (owner: 10Legoktm) [20:43:58] legoktm: ^ :P [20:44:50] !log tools.wikibugs restarting for https://gerrit.wikimedia.org/r/180245 [20:44:57] Logged the message, Master [20:54:11] tools.wikibugs valhallasw: Deployed 9649aa14cf1b8fd63a0e6efd3ac1aff0c351b141 testing extra parameters [20:54:43] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<11.11%) [20:55:02] Coren: also, does this look ok as a regex for finding tool created databases? https://github.com/wikimedia/operations-software-labsdb-auditor/blob/master/config.yaml#L6 [20:56:32] YuviPanda: It's a bit overly generous, but in practice this will match every user-created db and none of the prod ones. [20:56:43] Coren: can you help me tighten it up? [20:57:35] YuviPanda: It can be made more precise, but at a great complexity cost. In practice, '[pus]\d' suffices. [20:58:18] Coren: hmm, ok. alright then :) [20:59:01] * valhallasw`cloud prods wm-bot [20:59:30] oh wait. ! in bash. [20:59:33] * valhallasw`cloud cries [21:02:50] !log tools.wikibugs valhallasw: Deployed 9649aa14cf1b8fd63a0e6efd3ac1aff0c351b141 wb2-phab [21:02:52] Logged the message, Master [21:02:54] * valhallasw`cloud laughs maniacally [21:03:27] now let me try wb2-irc as well... [21:04:05] .... :( [21:04:40] * valhallasw`cloud waits for wikibugs to return [21:04:41] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [21:05:07] now why did that message not come through... 
*sigh* [21:06:11] !log tools.wikibugs valhallasw: Deployed 9649aa14cf1b8fd63a0e6efd3ac1aff0c351b141 a, b, c [21:06:13] Logged the message, Master [21:06:21] meh! [21:06:46] !log tools.wikibugs valhallasw: Deployed 9649aa14cf1b8fd63a0e6efd3ac1aff0c351b141 wb2-phab, wb2-irc [21:06:48] Logged the message, Master [21:06:49] whatever. [21:09:57] (03PS1) 10Merlijn van Deen: Add fabric runner [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180300 [21:10:20] legoktm: ^ [21:10:29] legoktm: also, I touched config.json but nothing is happening :P [21:11:27] er [21:11:30] touch channels.yaml [21:11:47] and it'll get reloaded after an IRC event [21:11:54] >_< [21:11:59] I feel stupid [21:12:00] er, well right before something is sent to IRC [21:12:28] *nod* [21:12:37] when was with_statement added to python? [21:12:37] I wonder if we can make labs-morebots include a link to the diff [21:12:47] 2.3? ;D I just copied a template [21:12:55] >.< [21:12:56] 2.6, I think, to be serious [21:13:10] 2.5. [21:13:58] !log tools.wikibugs Updated channels.yaml to: 9649aa14cf1b8fd63a0e6efd3ac1aff0c351b141 Auto-detect changes to channels.yaml and !log it [21:14:01] Logged the message, Master [21:14:28] :> [21:14:29] ah there we go [21:14:31] nice! [21:14:36] oh I can do that too [21:15:13] the with cd(code_dir) thing looks useful in general [21:15:22] (03PS2) 10Merlijn van Deen: Add fabric runner [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180300 [21:15:36] that's how fab rolls :-p [21:19:26] legoktm: looks OK? [21:20:15] valhallasw`cloud: think so [21:20:19] I've never used qmod before [21:20:29] I usually qdel ###, and then jstart [...] [21:21:13] bd808, got a min? [21:21:16] Yeah. The advantage of qmod is not having to poll [21:21:29] because jstart will only start a job once the job is actually completely dead [21:21:36] also, it seems wikibugs does not respond well to SIGKILL [21:21:43] KTC: What's up? 
[21:21:44] er SIGTERM [21:21:49] because it's SIGKILLed [21:22:00] that's also why the qdel takes ages [21:22:39] any chance you can give me quick instructions on how to install wikimania-scholarships on a local server so I can test it and help with some code contrib if I can? [21:23:14] install mediawiki-vagrant and then `vagrant roles enable scholarships` [21:23:20] poof you're up and running [21:23:43] bd808, thanks :) [21:24:21] PROBLEM - Puppet staleness on tools-webproxy is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0] [21:26:22] KTC: There is a bit more detail at https://www.mediawiki.org/wiki/Wikimania_Scholarships_app/Developing_using_Vagrant#Getting_started (usernames and passwords, etc) [21:26:52] thanks [21:44:50] (03PS3) 10Legoktm: Add fabric runner [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180300 (owner: 10Merlijn van Deen) [21:45:19] (03CR) 10Legoktm: [C: 032] Add fabric runner [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180300 (owner: 10Merlijn van Deen) [21:45:39] (03Merged) 10jenkins-bot: Add fabric runner [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180300 (owner: 10Merlijn van Deen) [21:46:58] !log tools.wikibugs valhallasw: Deployed 366f1b524cb4aecbdf4825a8b96e9f66524fa727 Add fabric runner wb2-phab [21:47:01] Logged the message, Master [21:47:05] hurrah! [21:51:41] !log tools.wikibugs Updated channels.yaml to: 366f1b524cb4aecbdf4825a8b96e9f66524fa727 Add fabric runner [21:51:43] Logged the message, Master [21:52:53] wat. 
[21:59:24] !log tools.wikibugs legoktm: Deployed 366f1b524cb4aecbdf4825a8b96e9f66524fa727 Add fabric runner wb2-irc [21:59:26] Logged the message, Master [21:59:30] niceeeeee [22:01:03] (03PS1) 10Legoktm: fab: set use_ssh_config = True [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180316 [22:01:12] valhallasw`cloud: ^ [22:01:52] (03CR) 10Merlijn van Deen: [C: 032] fab: set use_ssh_config = True [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180316 (owner: 10Legoktm) [22:02:16] (03Merged) 10jenkins-bot: fab: set use_ssh_config = True [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180316 (owner: 10Legoktm) [22:02:20] * legoktm tries deploying [22:04:51] valhallasw`cloud: no log to IRC... [22:05:11] legoktm: :/ [22:05:19] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True (no jobs restarted) [22:05:21] it seems wm-bot sometimes ignores stuff [22:05:22] Logged the message, Master [22:05:26] .... [22:05:29] >.> [22:05:34] sorry, testing in the meanwhile [22:05:51] but I'm not quite sure why it sometimes ignores messages >_> [22:06:33] (03PS1) 10Merlijn van Deen: change default behavior to just-pull [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180318 [22:06:46] (03CR) 10jenkins-bot: [V: 04-1] change default behavior to just-pull [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180318 (owner: 10Merlijn van Deen) [22:07:01] >_. [22:07:29] (03PS2) 10Merlijn van Deen: change default behavior to just-pull [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180318 [22:07:33] valhallasw`cloud: wm-bot :P [22:08:03] YuviPanda: :> [22:10:20] I can haz message [22:10:29] * valhallasw`cloud cheers [22:15:26] !log tools.wikibugs valhallasw: Deployed b'3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True\n' testing one two three [22:15:28] Logged the message, Master [22:15:37] and the crowd rejoices! 
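(Annotation) The `b'...\n'` wrapper in the deploy message just above is what Python 3 produces when a bytes value is formatted straight into a string. A minimal sketch of the usual fix, assuming the commit summary arrives as bytes (for example from a subprocess call); `format_deploy_message` is a hypothetical helper, not the function in the wikibugs fabfile.

```python
def format_deploy_message(user, commit_line):
    """Build a one-line deploy message from a commit summary that may be bytes."""
    if isinstance(commit_line, bytes):
        # Decode first; otherwise str.format() embeds the bytes repr.
        commit_line = commit_line.decode("utf-8")
    return "{}: Deployed {}".format(user, commit_line.strip())


raw = b"3ec300c fab: set use_ssh_config = True\n"
print("Deployed {}".format(raw))  # bytes formatted directly keep the b'...' repr
print(format_deploy_message("valhallasw", raw))
```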
[22:15:46] valhallasw`cloud: \o/ [22:16:45] (03PS1) 10Merlijn van Deen: Report IRC using Python and Yuvi's ircnotifier [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180322 [22:16:55] YuviPanda: ^ :* [22:17:29] valhallasw`cloud: put the token same place you put your passwords? [22:18:04] sure. [22:20:35] !log tools.wikibugs valhallasw: Deployed b'3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True\n' test [22:20:37] Logged the message, Master [22:20:51] gotta love python 3 [22:20:56] valhallasw`cloud: :D [22:21:02] valhallasw`cloud: ircyall is py3 :D [22:21:07] and debianized and puppetized... [22:21:15] YuviPanda: I mean the b'BLAH' [22:21:20] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True [22:21:21] Logged the message, Master [22:21:24] valhallasw`cloud: ah, heh :D [22:22:35] (03PS2) 10Merlijn van Deen: Report IRC using Python and Yuvi's ircnotifier [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180322 [22:22:37] YuviPanda: ^ bettah? [22:23:19] valhallasw`cloud: <3 [22:26:02] nothing says "a language based on readability and whitespace" like json.load(open(os.path.join(os.path.split(__file__)[0], 'config.json')))['IRCNOTIFIER_KEY'] [22:27:35] legoktm: guilty as charged [22:27:39] valhallasw`cloud: legoktm you can use os.path.dirname instead of split and [0] [22:27:53] the fancy new path lib is only in 3.5, right? [22:28:00] and split that line into like, 3 lines :P [22:28:29] and this is why we have code reviews :P [22:28:35] people like me pulling shit like this [22:28:51] so many closing parens it could be lisp :P [22:29:49] (03PS3) 10Merlijn van Deen: Report IRC using Python and Yuvi's ircnotifier [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/180322 [22:29:54] ^ better now? <3 [22:31:59] valhallasw`cloud: pretty sure you have to open the file? 
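(Annotation) A sketch of the cleanup suggested above for the `json.load(open(...))` one-liner: `os.path.dirname` instead of `split(...)[0]`, the expression spread over a few lines, and the file opened explicitly so it is also closed. The function name `load_config_key` and its parameters are made up for illustration and are not what the merged patch uses.

```python
import json
import os


def load_config_key(key, base_file, filename="config.json"):
    """Read one key from a JSON file that sits next to base_file."""
    config_dir = os.path.dirname(base_file)
    config_path = os.path.join(config_dir, filename)
    # The with-block closes the file even if json.load raises.
    with open(config_path) as handle:
        return json.load(handle)[key]
```

From inside a module this would be called as `load_config_key('IRCNOTIFIER_KEY', __file__)`.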
[22:32:49] also **locals() is cheating :P [22:33:56] I just realised it's also useless [22:36:27] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True [22:36:29] Logged the message, Master [22:36:36] now where did my message go :< [22:36:42] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True [22:36:43] Logged the message, Master [22:38:16] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True [22:38:18] Logged the message, Master [22:38:21] oh, newlines :< [22:38:37] YuviPanda: your bot doesn't support newlines :P [22:38:40] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True test test1 test2 test3 [22:38:42] Logged the message, Master [22:38:47] valhallasw`cloud: what am I supposed to do with newlines in IRC?! [22:38:58] YuviPanda: strip them, send more messages, automatically pastebin them [22:39:05] all of the above [22:39:08] all too rational! [22:39:10] :D [22:39:12] probably *not* send more messages :P [22:39:14] now what I'm wondering... [22:39:38] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True test test1 test2 test3 [22:39:40] Logged the message, Master [22:39:46] YuviPanda: ^ ;-D [22:39:57] definitely *don't* do that :p [22:40:15] valhallasw`cloud: ah, heh :) [22:40:18] valhallasw`cloud: is that with newlines? [22:40:23] yeah [22:40:24] \nJOIN 0 [22:40:29] = part all channels [22:40:44] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True test test1 test2 test3NICK ircspammer [22:40:46] Logged the message, Master [22:40:49] valhallasw`cloud: ouch. 
[22:40:52] oh, I forgot the \n [22:40:53] valhallasw`cloud: at least it rejoined :) [22:41:02] !log tools.wikibugs valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True test test1 test2 test3 [22:41:04] Logged the message, Master [22:41:05] :> [22:41:07] valhallasw`cloud: yeah, I think I’ll just… properly strip newlines and anything starting with / [22:41:16] IRC doesn't care about / [22:41:23] but maybe your irc lib does, dunno [22:42:12] NICK ircnotifier [22:42:15] :< [22:42:23] x [22:42:31] I'll stop playing now :p [22:42:49] this is what we call an IRC injection vector [22:43:55] (PS4) Merlijn van Deen: Report IRC using Python and Yuvi's ircnotifier [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/180322 [22:51:51] valhallasw`cloud: try again? [22:51:55] valhallasw`cloud: IRC injection, that is [22:53:17] JOIN 0 [22:53:25] YuviPanda: très broken [22:53:45] JOIN 0 [22:54:11] how do you do it? o.o [22:55:00] annika_: its like SQL injection, but for IRC! [22:55:54] i got that [22:56:19] annika_: basically, IRC is a protocol where a newline denotes the end of a command and the start of the next command [22:56:26] annika_: so any IRC library should filter out newlines [22:56:57] valhallasw`cloud: try again? [22:57:01] JOIN 0 [22:57:12] YuviPanda: you can test this yourself, you know :P [22:58:13] JOIN 0 [22:58:29] JOIN 0 [22:58:37] valhallasw`cloud: ^ :) [22:58:42] valhallasw`cloud: my code did work \o/ [22:58:42] that's also with \r? [22:58:59] JOIN 0 [22:59:07] told you so :> [22:59:10] JOIN 0 [22:59:15] valhallasw`cloud: lol [23:01:20] The current solution is to designate two characters, CR and LF, as message separators. Empty messages are silently ignored, which permits use of the sequence CR-LF between messages without extra problems. 
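Since both CR and LF end an IRC message, a notifier that relays untrusted text has to neutralise both before building the line it writes to the socket; stripping only \n leaves the \r variant tried above. A minimal sketch under that assumption follows. The function names are made up for illustration and this is not ircnotifier's actual code; the channel name is assumed to come from trusted configuration.

```python
def sanitize_irc_text(text):
    """Collapse text onto a single line so it cannot end an IRC message early."""
    # Replace CR-LF pairs first, then any stray CR or LF, with a space.
    for separator in ("\r\n", "\r", "\n"):
        text = text.replace(separator, " ")
    return text.strip()


def format_privmsg(channel, text):
    """Build one raw PRIVMSG line; only the trailing CR-LF terminates it.

    Assumption: channel is trusted configuration, not user input.
    """
    return "PRIVMSG {} :{}\r\n".format(channel, sanitize_irc_text(text))
```

With this, a payload such as "test3\nJOIN 0" is delivered as ordinary message text ("test3 JOIN 0") instead of being parsed by the server as a second command.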
[23:02:09] JOIN 0 JOIN 0 [23:02:16] vertical tab doesn't do anything (good) [23:02:29] hmmm [23:15:30] (PS1) GWicke: Add #wikimedia-services reporting [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/180345 [23:30:24] (CR) Legoktm: [C: 2] Add #wikimedia-services reporting [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/180345 (owner: GWicke) [23:30:50] (Merged) jenkins-bot: Add #wikimedia-services reporting [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/180345 (owner: GWicke) [23:32:15] * legoktm crosses fingers [23:32:27] * gwicke crosses another pair of fingers [23:32:48] gwicke: we set up some auto-magical deploy stuff earlier today so I'm hoping it works :P [23:34:16] ah [23:34:39] I just edited one, which didn't show up in #wikimedia-services [23:34:48] hm [23:35:06] * legoktm gives it 60 more seconds [23:35:07] does your magical deploy stuff involve trebuchet? [23:35:12] no [23:35:16] it just does mtime checking [23:35:20] kk ;) [23:35:42] !log tools.wikibugs Updated channels.yaml to: 432e66a45273e9798e26a8df08caa5a102eeec97 Add #wikimedia-services reporting [23:35:44] Logged the message, Master [23:35:54] yeee [23:36:08] gwicke: should be all set now :) [23:36:14] legoktm: nice :) [23:36:28] legoktm: also two different bots doing this :P [23:36:37] :P [23:37:24] * gwicke tries an edit [23:39:36] * gwicke does not see a notification yet [23:51:54] gwicke: it didn't notify to the proper place? did the bot join the channel at least? [23:52:15] legoktm: let me check (have joins muted by default) [23:53:29] legoktm: I don't think it joined [23:53:35] hmm [23:53:51] no wikibugs [23:54:18] I'll look in a bit