[01:20:02] Wikimedia Labs / tools: Mail delivery failed: returning message to sender for tfaprotbot's cron jobs - https://bugzilla.wikimedia.org/71632 (Kunal Mehta (Legoktm)) NEW p:Unprio s:normal a:Marc A. Pelletier Created attachment 16668 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16668...
[01:52:26] ok, flood?
[01:53:28] i just got over 40 bounces in 2 mins
[01:53:31] still coming
[01:53:35] andrewbogott_afk:
[01:54:06] RD: ^
[01:54:39] bounces where?
[01:54:50] *from where
[01:55:02] tool labs
[01:55:26] Like...when I created your account?
[01:55:29] jeremyb: What are the dates of the bounced messages? (Not that of the bounces)
[01:56:48] Oh, your account was just created? That's probably piled up then that unclogged when you suddenly started existing. :-)
[01:57:11] Not that
[01:57:17] Ignore me, Coren
[01:57:36] erm? oh, right. no
[01:57:51] Coren: you think my account was just created??? :)
[01:57:59] Coren: mostly first half of yesterday
[01:58:14] Well, it might have been another account for some other purpose. RD confused me. :-)
[01:58:47] yeah, it was an account on a test instance
[01:58:52] Yea, sorry :)
[01:59:12] First half of yesterday there was a fairly bad outage while we attempted to debug a network problem that stalled all of labs -- that piled up a bunch of crap and some dust is still falling.
[01:59:31] hrmmm, ok
[01:59:54] i thought at first that i was seeing timestamps from after my successful test
[01:59:58] but maybe not
[02:00:05] Keep an eye on it, and if you see the flood not subside on its own, do ping me.
[02:00:17] That may just be the timestamp of the bounce itself which you saw.
[02:00:31] no, i don't think so
[02:00:37] Hmmm.
[02:00:42] well it did subside. and came back. and subsided and came back
[02:00:51] but maybe will be done soon
[02:00:55] What are the bounced email about?
[02:00:56] what about the one for merlbot?
[02:01:16] cronspam and stuff from SGE
[02:01:26] idk even how i ended up on the cronspam list
[02:01:54] The network failure caused NFS to fail and a lot of cron jobs to pile up (especially job submission)
[02:02:04] Wikimedia Labs / tools: Mail/LDAP query is broken - https://bugzilla.wikimedia.org/71392#c3 (Tim Landscheidt) *** Bug 71632 has been marked as a duplicate of this bug. ***
[02:02:04] Wikimedia Labs / tools: Mail delivery failed: returning message to sender for tfaprotbot's cron jobs - https://bugzilla.wikimedia.org/71632#c1 (Tim Landscheidt) NEW>RESO/DUP Those are old mails that were stuck being sent out after the LDAP lookup issue was fixed. *** This bug has been marked as...
[02:02:20] Not all tools were affected equally.
[02:03:00] anyway, what about merlbot's mail?
[02:03:21] Missing context: what merbot's mail?
[02:03:26] and shouldn't some of this stuff have gone through after the delay?
[02:03:37] search your mail for 1XYtlA-0005xI-3P
[02:03:55] i wonder if Coren uses notmuch
[02:04:39] I don't, but I have good client-side search tools. :-)
[02:05:18] * Coren is digging now.
[02:05:36] funny having check-raid.py on labs hosts
[02:05:43] s/hosts/guests/
[02:05:58] jeremyb: "Should". exim has some annoyingly conservative exponential backoff and can queue stuff for days in case of failure.
[02:06:33] Coren: no, i mean like if it retries today it should go through?
[02:07:22] I know of no reason why it wouldn't. It's fairly easy to test though - just send email to that same address.
[02:07:35] right
[02:07:48] but then i can't check the queue to see if it went through :P
[02:07:55] have to wait 24 or whatever hours
[02:08:32] No, but if you want I can watch the logs.
[02:09:50] Ah, but: == tools.merlbot@tools-dev.eqiad.wmflabs routing defer (-51): retry time not reached
[02:12:19] There's still mail stuck in the queue, but I see new mail go through. Ima flush the queue forcibly.
[02:15:46] ok, that explains all the spam I just got
[02:15:48] thanks
[02:50:35] Wikimedia Labs / tools: tools-mail uses non-existing hostname in EHLO - https://bugzilla.wikimedia.org/71634 (Tim Landscheidt) ASSI p:Unprio s:normal a:Tim Landscheidt tools-mail uses the hostname relay.tools.wmflabs.org in EHLO which does not exist and causes bounces: | scfc@tools-mail:~$...
[06:19:45] Wikimedia Labs / wikitech-interface: add [[wikitech:Release Engineering/SAL]] to [[wikitech:mediawiki:sidebar]] - https://bugzilla.wikimedia.org/71165#c1 (jeremyb) bump
[06:34:55] So, how well-supported is running Mono jobs on the grid?
[06:57:03] vvv: I'm interested in those too, so I suppose I can help fix any issues you run into :)
[06:57:15] just unsure how well running a webservice would work, but running things on the grid should work
[07:23:31] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Metadata_database did the toolserver's namespace database ever get created as well?
[07:26:29] (the answer is yes, just not documented)
[07:27:52] some people!
[07:31:07] oursql.PermissionsError: (1142, "SELECT command denied to user 'p50380g50440'@'10.68.16.36' for table 'namespace'", None)
[07:32:03] works for me if I do it directly via mysql...hm
[07:32:16] YuviPanda: ^ any ideas?
[07:33:09] the query is https://github.com/mzmcbride/database-reports/blob/a6218f2b731ab2e998f75b7520085ee0efe3f522/reports/general/blankpages.py#L32 with s/toolserver/toolserverdb_p/
[07:33:18] (in the toolserver.namespace line)
[08:22:40] !log tools.dbreps enabling a bunch of reports to run on commonswiki
[08:22:42] Logged the message, Master
[16:44:40] hi all - Brian in Berkeley here.. new today
[19:23:13] ssh: Could not resolve hostname tools-login.wmflabs.org: Name or service not known
[19:41:13] I'm getting external dns failure for iegreview.wmflabs.org which is an instance proxy for ieg-dev.eqiad.wmflabs. Anybody around who can look into Labs DNS health?
[19:50:49] same here
[19:50:59] Coren?
[20:00:27] is Tomasz on this channel ? .. ref from Emily B.
[20:10:41] I have to DNS issues visible from any of the spots I can test from.
[20:11:33] s/to/no/
[20:13:58] iegreview.wmflabs.org resolves now -- did not 30 minutes ago
[20:35:01] Time heals all wounds and wounds all heels.
[20:35:01] jgage restarted pdns about 35 minutes ago
[20:35:09] 22:59 < jgage> !log restarted pdns on virt1000 for ldap config update
[20:35:10] 22:59 < morebots> Logged the message, Master
[20:35:58] Ah. That sounds like something that would have made a constructive difference.
[20:36:05] * bd808 waves at paravoid
[20:36:21] heello
[20:36:46] Are you almost done with your "vacation"?
[20:37:47] yes :)
[20:37:55] in fact I'm starting work on Monday
[20:38:26] still in a bit of turmoil (houses/countries/flights) for about 10 days more
[20:38:45] \o/ awesome news. I hope you get to work on something fun
[20:41:59] Ah, that's why I could see nothing wrong with DNS; paravoid had already swooped in and fixed it before I even had time enough to look. :-)
[20:42:36] I didn't
[20:42:39] jgage did :)
[20:42:49] Ah, misread above.
[20:42:56] inadvertently too I thin
[20:42:57] k
[20:43:15] paravoid: Speaking of fun things, would you care to take a look at https://gerrit.wikimedia.org/r/#/c/163798/ ? It's the next step my my cert cleanup.
[20:44:42] not tonight :)
[20:44:54] paravoid: No rush. :-)
[20:44:58] I'm just around because of the codfw outage
[20:46:12] hi guys - Brian in Berkeley here.. Tim L recently granted me "shell access" and I am here to look into what that might mean..
[20:46:26] .. that ticket looks like a handfull.. all new to me..
[20:46:58] I run two machines in Berkeley for geo-data R&D..
[20:50:21] .. I have this blog post open .. http://ryandlane.com/blog/2014/08/04/moving-away-from-puppet-saltstack-or-ansible/
[20:50:57] and, I see some puppet ref in that ticket..
so I imagine that puppet is current for at least some of this setup.... its not hard to imagine that there is a lot here
[20:51:59] I did two months at Planet-Labs in San Francisco earlier this year.. so I got to see some modern, scale setups in progress
[20:55:06] dbb: I fear I'm missing crucial context to figure out what help you need. :-)
[20:55:52] dbb: As for "shell access" that means that you have permission to connect to Labs' virtual servers in all projects you are a member of (by default, only the bastions)
[20:56:24] ok - thats what I am here to figure out..
[20:56:48] I spoke with one of the Wikimedia folks about geo-data and servers, and I ended up here :-)
[20:57:27] * bd808 scans Ryan_Lane's treatise on migrating of off Puppet
[20:58:05] Well, "here" (Labs) is basically just an infrastructure when you can run stuff. I expect if there are WMF people who pointed you at us, it means you're looking to deploy software. :-)
[20:58:29] there are rumblings of new capacity, but nothing is firm yet
[20:58:52] Now, depending on your needs, you may want the Tool Labs, a project, where all the system administration is handled and you get the infrastructure taken care of.
[20:59:19] Otherwise, you can request a project of your own where you'll be able to create virtual machines to use as you need - and do their administration yourself.
[21:00:58] oh sure.. thank you .. personally, I have a few physical boxes and try to stick to those.. debian/ubuntu.. VMs spawning and deployment is a learning piece for me .. I "get" a few things..
[21:01:19] certainly you have a lot of sophisticated infra here.. so I will try to read up a bit
[21:02:10] dbb: For most purposes, a project is "just a bunch of servers"; if you're looking to replicate some existing setup, this is likely the easiest way to do so.
[21:03:08] as an aside, I have been building a fairly popular Linux "distro" for about four or five years now, with the Open Source Geospatial Foundation..
[21:03:43] ..
Ubuntu based.. more than 50 projects of various sorts have reference installs on it
[21:05:50] ooh Ryan's compared Ansible and Salt
[21:06:19] we used Ansible at the place in SF this year.. it made sense to me..
[21:06:57] I was in data-pipeline, not infra, just so there is no mistake.. but I am curious
[21:07:00] I/we currently use ansible for a wikifarm project.
[21:07:06] ah good
[21:08:24] ansible is relatively slower
[21:08:59] Ryan points out the "no action" time, in particular iir
[21:09:05] an average mediawiki role for us takes about 15 minutes on average
[21:09:18] yeah indeed. It is slower there by a lot.
[21:10:04] we implemented a 'slow' mechanism so we only run the slow mediawiki tasks every hour which is good for fast deploys
[22:34:49] aww, dbb left
[22:35:53] hi YuviPanda
[22:35:58] thats me ;-)
[22:36:02] ah
[22:36:03] hello :)
[22:36:04] and welcome
[22:36:09] darkblue_b: may I ask who pointed you here?
[22:36:14] * YuviPanda suspects it is some of the research folks?
[22:36:33] just a moment, please
[22:38:13] legoktm: I lost scrollback as to what you were pointing to :(
[22:38:19] one sec
[22:38:35] YuviPanda: https://github.com/mzmcbride/database-reports/issues/13
[22:38:52] ah
[22:38:53] right
[22:39:03] seems to be a bug in the code...
[22:39:04] query looks like https://github.com/mzmcbride/database-reports/blob/master/reports/general/blankpages.py#L32 just with "toolserverdb_p.namespace"
[22:39:25] in dbreps code?
[22:39:43] yeah? I guess it should use toolserverdb_p?
[22:39:52] no, with that it throws the error
[22:40:07] oh
[22:40:16] sorry, too early in the day :)
[22:40:33] I don't know who maintains that...
[22:40:42] early? are you in AU/NZ then ?
[22:40:56] darkblue_b: I'm in India :) fighting jetlag, went to sleep at 2pm, woke up at 3AM...
[22:41:02] yow
[22:41:51] ok.. I will be brief.. originally I should thank Ward Cunningham, with whom I have talked a few times, years ago..
nothing involved, just I respect what I heard from him
[22:42:08] ah, yeah, he's nice
[22:42:27] specifically, I met Erik Möller at an event in San Francisco recently, and spoke in detail about geo-data
[22:42:38] ah, nice
[22:42:40] that lead to this.. essentially..
[22:42:49] so how can I help you?
[22:42:56] I got a robo-email from Tim L saying that I now have "shell acess"
[22:43:10] * YuviPanda is labs ops (officially from november, but for all pracitcal purposes already am)
[22:43:11] so, I am here to discover what that might mean ... :-)
[22:43:26] darkblue_b: ah, so that means you can go to wikitech.wikimedia.org, upload a ssh key, and login to bastion.wmflabs.org :)
[22:43:35] pleased to (virtually) meet you YuviPanda
[22:43:45] ok, I will do that then
[22:43:47] darkblue_b: and you can request access to two things: 1. The tools project, which is kind of (very nice) shared hosting, 2. a raw VM
[22:43:55] well, empty VM, rather
[22:44:14] with tools, we take care of the underlying infrastructure, make sure things are running, etc
[22:44:19] I suspect there are many stories in this place.. I will try to read up a bit.. always appreciate pointers
[22:44:21] if you get a VM, that's all your repsonsibilities
[22:44:27] ok
[22:44:38] darkblue_b: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help is the long doc explaining all the features of the tools environment
[22:44:56] darkblue_b: and tools.wmflabs.org lists all the projects from all the people that currently run there
[22:45:24] ok
[22:45:50] darkblue_b: and https://lists.wikimedia.org/mailman/listinfo/labs-l is the mailing list
[22:46:07] I see... I will try to send an mail there now
[22:46:55] cool :)
[22:48:20] .. they say, networking is easy! it works every time.. except the first time
[22:48:34] :)
[22:48:47] for some definition of 'works', 'every time', and 'first time' :)
[22:49:07] I sent an email to abs-l@lists.wikimedia.org bounced 550
[22:49:12] is that right ?
[22:49:29] darkblue_b: labs-l, I think
[22:49:33] darkblue_b: and you need to subscribe first
[22:49:37] oohh the page I cpoied that from, has it the other way..
[22:49:44] ok, subscribing now