[00:48:19] PAWS, pywikibot-core, Patch-For-Review: Install developer requirements into PAWS - https://phabricator.wikimedia.org/T120860#1871821 (jayvdb) Open>Resolved a:yuvipanda
[00:51:18] Labs, Labs-Infrastructure: Groups for project are created in ldap but getent cannot see them on user VMs (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1871826 (Andrew) a:MoritzMuehlenhoff I feel like https://gerrit.wikimedia.org/r/#/c/258355/2 is an improvement regardless. But, ye...
[03:04:57] Labs, MediaWiki-Special-pages, wikitech.wikimedia.org, Regression, Wikimedia-log-errors: Special:MovePage throws MWException "Hook SMWParseData::onTitleMoveComplete has invalid call signature" - https://phabricator.wikimedia.org/T120218#1872062 (bd808) >>! In T120218#1848897, @bd808 wrote: > Lik...
[03:05:08] Labs, MediaWiki-Special-pages, wikitech.wikimedia.org, Regression, Wikimedia-log-errors: Special:MovePage throws MWException "Hook SMWParseData::onTitleMoveComplete has invalid call signature" - https://phabricator.wikimedia.org/T120218#1872063 (bd808) a:ori
[03:05:16] Labs, MediaWiki-Special-pages, wikitech.wikimedia.org, Regression, Wikimedia-log-errors: Special:MovePage throws MWException "Hook SMWParseData::onTitleMoveComplete has invalid call signature" - https://phabricator.wikimedia.org/T120218#1872064 (bd808) Open>Resolved
[03:10:16] Labs, wikitech.wikimedia.org, Wikimedia-log-errors: Hook SMWParseData::onTitleMoveComplete has invalid call signature; Parameter 3 to SMWParseData::onTitleMoveComplete() expected to be a reference, value given - https://phabricator.wikimedia.org/T118649#1872086 (bd808)
[03:10:18] Labs, MediaWiki-Special-pages, wikitech.wikimedia.org, Regression, Wikimedia-log-errors: Special:MovePage throws MWException "Hook SMWParseData::onTitleMoveComplete has invalid call signature" - https://phabricator.wikimedia.org/T120218#1872087 (bd808)
[03:39:55] YuviPanda: Lemme read backscroll (just back from Kendō)
[03:42:36] YuviPanda: The PAM changes shouldn't affect this; that's just libnss-ldap and that wouldn't change. Do you have a specific example I can look at to dig into?
[03:50:55] Coren: hey
[03:51:06] Coren: yes, tools.quarry (service group) and project-mediawiki-vagrant
[03:51:16] * Coren goes and looks.
[03:51:46] Coren: https://phabricator.wikimedia.org/T121064 has more info, but tldr is the groups are there on LDAP but getent and PAM don't see them
[03:52:07] Coren: ldaplist -l project mediawiki-vagrant and getent group | grep mediawiki-vagrant exhibit this clearly
[03:54:22] Ah, I see the confusion.
[03:54:53] ldaplist -l projects gives you the ou=projects,dc=wikimedia,dc=org entry, but that's not where the group is.
[03:55:43] If the group had been created, you'd see it in 'ldaplist -l group project-mediawiki-vagrant'
[03:56:25] Your tools.quarry, otoh, works right - it was created after the patch, right?
[03:57:07] YuviPanda: Check 'ldaplist -l group project-quarry' for comparison.
[03:57:29] So, when the project was created, the /project/ record was created, but the matching /group/ failed.
[03:58:16] ah
[03:58:18] I see
[03:58:21] so it's back on OSM
[03:58:24] rather than openldap
[03:58:26] * Coren nods.
[03:58:32] Coren: can you comment on the ticket to that effect?
[03:59:05] Sure. It's pretty bad handling on OSM, since it doesn't roll back when it hits an error. Failing to create the group should lead to the project being removed, otherwise you're stuck in this zombie state.
[03:59:38] Coren: yeah, there is a 'TODO: Handle failure gracefully' there from years ago
[04:02:55] I can create the group "manually" to fix the half-broken project though.
[04:03:12] But it's not going to fix the failure mode that led there.
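The zombie state described above (project record created, matching group creation fails, nothing rolled back) can be avoided by undoing the first step when the second one fails. A minimal sketch in Python, using an in-memory stand-in for LDAP; `FakeDirectory`, `create_project`, the `fail_on` switch and the `cn=` DN layout are hypothetical illustrations, not OSM's actual code:

```python
class DirectoryError(Exception):
    """Raised by the stand-in directory when an operation is refused."""


class FakeDirectory:
    """In-memory stand-in for an LDAP server (hypothetical, illustration only)."""

    def __init__(self, fail_on=()):
        self.entries = {}
        # Substrings of DNs whose creation should fail, to simulate an error.
        self.fail_on = tuple(fail_on)

    def add(self, dn, attrs):
        if any(marker in dn for marker in self.fail_on):
            raise DirectoryError(f"simulated failure adding {dn}")
        if dn in self.entries:
            raise DirectoryError(f"already exists: {dn}")
        self.entries[dn] = dict(attrs)

    def delete(self, dn):
        if dn not in self.entries:
            raise DirectoryError(f"no such object: {dn}")
        del self.entries[dn]


def create_project(directory, name, members):
    """Create the project record and its group, or leave nothing behind."""
    project_dn = f"cn={name},ou=projects,dc=wikimedia,dc=org"
    group_dn = f"cn=project-{name},ou=groups,dc=wikimedia,dc=org"

    directory.add(project_dn, {"member": list(members)})
    try:
        directory.add(group_dn, {"member": list(members)})
    except DirectoryError:
        # Roll back the half-created project so no zombie record remains.
        directory.delete(project_dn)
        raise
```

With `FakeDirectory(fail_on=("ou=groups",))` the call raises and the directory stays empty; with the default directory both entries are created.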
[04:03:51] Labs, Labs-Infrastructure: Groups for project are created in ldap but getent cannot see them on user VMs (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1872136 (coren) From what I can see, what happened in the case of mediawiki-vagrant is that: * the //project// ldap entry (`cn=mediaw...
[04:10:02] Labs, Labs-Infrastructure: Groups for project are created in ldap but getent cannot see them on user VMs (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1872137 (coren) If we wanted to be //really// robust, what adding or removing a member of a project //should// do is: * grab the list...
[04:17:12] Labs, Labs-Infrastructure: Project entries are created in ldap but not the posix group entry (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1872139 (coren)
[04:18:50] PAWS, Patch-For-Review: PAWS network error: - https://phabricator.wikimedia.org/T120561#1872142 (yuvipanda) This hasn't happened since ^ fixes.
[05:01:48] PAWS, pywikibot-core: "Open in browser" option does not work in PAWS - https://phabricator.wikimedia.org/T120632#1872175 (yuvipanda) Open>Resolved a:yuvipanda Links is installed now
[05:23:03] Labs, Labs-Infrastructure: Project entries are created in ldap but not the posix group entry (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1872186 (chasemp) Well that makes the getent behavior more clear thanks
[05:43:35] Coren: 15s lag to ssh now on tools-worker-02 :'(
[09:04:17] Labs, MediaWiki-extensions-Newsletter: Create a larger newsletter-test instance in labs - https://phabricator.wikimedia.org/T120516#1872447 (01tonythomas) I am sorry bd808, but the result is still negative :( ``` mwvagrant@newsletter-test:/srv/mediawiki-vagrant$ chgrp -R wikidev /srv/mediawiki-vagrant...
[09:08:24] I still want help here with https://phabricator.wikimedia.org/T120516#1872447 !
[09:08:32] someone please look into the same :(
[10:38:36] Tool-Labs-tools-Other: Migrate https://toolserver.org/~bawolff/en-wn-editor-stats.php to Tool Labs - https://phabricator.wikimedia.org/T60867#1872619 (Liuxinyu970226)
[10:38:45] Tool-Labs-tools-Other: Migrate http://toolserver.org/~dispenser/* to Tool Labs - https://phabricator.wikimedia.org/T68868#1872620 (Liuxinyu970226)
[10:38:52] Tool-Labs-tools-Other, Commons: Do something with globalusage account on Toolserver - https://phabricator.wikimedia.org/T63827#1872621 (Liuxinyu970226)
[10:38:56] Tool-Labs-tools-Other, Commons: Move contests from toolserver to tool labs - https://phabricator.wikimedia.org/T63826#1872623 (Liuxinyu970226)
[10:39:34] Tool-Labs-tools-Erwin's-tools: Migrate https://toolserver.org/~erwin85/categorycount.php to Tool Labs - https://phabricator.wikimedia.org/T62869#1872625 (Liuxinyu970226)
[10:39:49] Tool-Labs-tools-Erwin's-tools: Migrate http://tools.wmflabs.org/erwin85/catanalyzer.php to Tool Labs - https://phabricator.wikimedia.org/T62868#1872626 (Liuxinyu970226)
[10:40:03] Tool-Labs-tools-Erwin's-tools: Migrate https://toolserver.org/~erwin85/contribs.php to Tool Labs - https://phabricator.wikimedia.org/T62870#1872628 (Liuxinyu970226)
[10:40:30] Tool-Labs-tools-Erwin's-tools: Migrate https://toolserver.org/~erwin85/randomarticle.php to Tool Labs - https://phabricator.wikimedia.org/T62871#1872630 (Liuxinyu970226)
[10:41:03] Tool-Labs-tools-Erwin's-tools: Migrate https://toolserver.org/~erwin85/relatedchanges.php to Tool Labs - https://phabricator.wikimedia.org/T62872#1872631 (Liuxinyu970226)
[10:41:17] Tool-Labs-tools-Erwin's-tools: Migrate https://toolserver.org/~erwin85/shortpages.php to Tool Labs - https://phabricator.wikimedia.org/T62873#1872633 (Liuxinyu970226)
[10:42:26] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~dungodung/list.html - https://phabricator.wikimedia.org/T63037#1872644 (Liuxinyu970226)
[11:03:13] Coren or YuviPanda: can you delete job 241037? That's a bot running on dewiki that is currently making mass unwanted edits. The operator is currently unavailable, and this way the bot may work again tomorrow, which would not be the case if I blocked it
[11:03:46] https://de.wikipedia.org/wiki/Benutzer_Diskussion:Sitic#Botproblem_bei_Sumdisc_-_Bot_aktualisiert_gerade_in_Dauerschleife
[11:10:52] PAWS, pywikibot-core: svnversion failed - https://phabricator.wikimedia.org/T120268#1872705 (jayvdb) fwiw, the version checking can also be disabled with `config.log_pywiki_repo_version = False` (sensible) and revising the `config.user_agent_format` to not include a dynamic version (not sensible). The `...
[11:56:36] Labs, Wikimedia-Labs-General, DBA, operations, Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#1872856 (jcrespo)
[11:56:39] Labs, Tool-Labs, DBA: labs db inconsistent data - https://phabricator.wikimedia.org/T119841#1872854 (jcrespo) Open>Resolved templatelinks ``` root@iron:~$ mysql -A -h s$SHARD-master $DATABASE -e "SELECT count(*) FROM $TABLE" +----------+ | count(*) | +----------+ | 2398600 | +----------+ roo...
[12:19:26] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mr.Ibrahem was created, changed by Mr.Ibrahem link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Mr.Ibrahem edit summary: Created page with "{{Tools Access Request |Justification=to work in Arabic Wikipedia |Completed=false |User Name=Mr.Ibrahem }}"
[12:24:00] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mr.Ibrahem was modified, changed by Mr.Ibrahem link https://wikitech.wikimedia.org/w/index.php?diff=225370 edit summary:
[12:56:14] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mr.Ibrahem was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=225382 edit summary:
[14:06:57] Coren, chasemp, regarding https://phabricator.wikimedia.org/T121064 - is one of you working on that so I don’t have to?
[14:07:24] andrewbogott: I already offered to take it and I see clearly how to fix it. Can do this morning if you want.
[14:07:46] I’m not in a rush, as long as I know it’s not up to me. Thanks!
[14:07:46] Labs, Labs-Infrastructure: Project entries are created in ldap but not the posix group entry (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1873153 (coren) a:MoritzMuehlenhoff>coren Ima do the changeset.
[14:07:57] andrewbogott: I was looking at it now but am less clear on the exact fix
[14:07:59] :)
[14:08:24] chasemp: I think the fix is pretty much the patch you wrote except in the group entry as well as the project entry
[14:08:39] Labs, Tool-Labs: Implement metrics for tool labs (under NDA?) - https://phabricator.wikimedia.org/T121233#1873155 (valhallasw) NEW
[14:26:11] Coren: you are putting that up then?
[14:36:44] chasemp: Yeah, I'm writing it now.
[14:37:36] kk
[14:39:12] chasemp: Well, I'm trying to write it but not having much luck. Every bit where I thought I needed to make a fix already _has_ the fix.
[14:39:53] chasemp: I'm no longer certain /why/ the group was not created - obviously it wasn't (because it's not there anymore) but afaict the code isn't trying to create the group empty anymore.
[14:40:32] what do you mean by anymore, no changes have been made
[14:40:38] or are you hotfixing it and seeing no change?
[14:41:09] chasemp: No, I'm looking at OpenStackNovaProjectGroup.php...
[14:41:37] Oh! https://gerrit.wikimedia.org/r/#/c/258355/2 was abandoned, but I still had it cherry-picked!
[14:41:44] ok yes
[14:41:51] but the group I made during that patch
[14:41:54] still has no group
[14:42:15] root@terbium:~# ldaplist -l group | grep gtest
[14:42:22] so that couldn't have been the full deal
[14:42:36] Hm. That should have worked. Was debugging turned on so we know what failed?
[14:43:42] No, at the time we switched gears because I didn't understand the group/project-group distinction
[14:43:58] Oh... interesting. I look at the project objects that got created and /they/ don't have members either.
[14:44:30] Oh, no, ignore me. Typo.
[14:45:04] So they definitely have the members; which means the group should have had them when created, which means it should have worked. So there is another issue that prevents the group from being created too.
[14:45:33] I think I may, at some point, upgrade labsdb to mariadb 10.1, that will give us proper role/group support and row-binlog-format-based filtering
[14:50:54] Coren: what is the "they" that has members? (this is horrific nomenclature) Is it the group (which you find by ldaplist using project) or the project group, which you find by ldaplist using group (and which seems to not exist for any of the test groups I tried, with or without the patch)?
[14:51:58] chasemp: Let's make sure we don't get confused anymore. The ldap object which is under 'ou=projects,dc=wikimedia,dc=org', that is an 'extensibleObject', and that is named $project we shall call 'project'
[14:52:26] agreed then
[14:52:45] chasemp: The object under 'ou=groups,dc=wikimedia,dc=org' that is a 'posixGroup' and that is named 'project-$project' we'll call the group. :-)
[14:53:23] the group is never created with or without my test patch yesterday
[14:53:27] agreed?
[14:53:34] It appears so.
[14:53:42] the project is created in both cases
[14:53:49] Also correct.
[14:53:58] !log restarting and configuring S1:codfw mysqls (db2016,34,42,48,55,62,69,70)
[14:53:58] restarting is not a valid project.
[14:54:12] Now, our current working hypothesis was that the /only/ reason the group wasn't getting created is because it had no members at creation.
[14:54:27] But I'm pretty sure the patch fixed /that/.
[14:54:48] oops, wrong channel
[14:54:52] And I can confirm that the project /was/ created with users.
[14:55:14] so our assumption is if the empty group denial was the only issue then https://gerrit.wikimedia.org/r/#/c/258355/2/nova/OpenStackNovaProjectGroup.php would have solved it
[14:56:31] Yes, so long as $user->userDN actually has a valid cn (which I have no reason to think it doesn't in this context)
[14:57:38] So the root cause remains 'the group isn't getting created'; maybe I should try to create a group for one of your gtest*-temp projects with the creds of osm? This way I'll get an actual error message from ldap
[14:58:06] that's a good next step and actually my re-patch this AM was going to have a failure on invalid user-userDN
[14:58:11] so I'll follow through with it I guess
[14:58:15] and we can reorient
[14:58:20] because wtf man
[14:58:34] Lemme do the manual test first so we can catch the real reason it says no.
[14:58:44] please
[15:10:14] chasemp: I added the project-gtest1-temp group with exactly one member (uid=rush,ou=people,dc=wikimedia,dc=org) and ldap was happy to accept it.
[15:10:44] marc@tools-bastion-01:~$ getent group project-gtest1-temp
[15:10:45] project-gtest1-temp:*:52779:rush
[15:13:10] chasemp: `ldaplist -l group project-gtest1-temp` will show you the exact object.
[15:15:41] chasemp: So it looks like the patch doesn't work as intended despite looking like it should - having a member seems to suffice to create the group.
[15:17:42] chasemp: (more data points: done with novaadmin credentials, from silver - so this should be exactly what OSM does)
[15:22:06] chasemp: So, I'm a little nonplussed atm. As far as I can tell, that patch is exactly right and does the very thing I've just confirmed worked. So why didn't it?
[15:26:46] Labs, Labs-Infrastructure: Project entries are created in ldap but not the posix group entry (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1873244 (coren) So now I'm getting a bit confused. Adding the following record to ldap (manually, with novaadmin creds from silver) worked: ```20...
[15:27:13] not sure yet :)
[15:27:50] I added that new info to the task.
[15:28:18] tx
[15:28:20] thoughts on https://gerrit.wikimedia.org/r/#/c/258462/1/nova/OpenStackNovaProjectGroup.php
[15:28:32] * Coren looks
[15:29:45] That's suitably paranoid; and covers the only case I can think of that'd make the group addition fail.
[15:31:36] Can you take a quick look at https://gerrit.wikimedia.org/r/#/c/258448/ while you're around gerrit so we can quiet the silly not-alarms?
[15:32:12] (+1'ed yours)
[15:35:33] is the role in site.pp only applied to one of them or something?
[15:39:10] No, it's applied to all of them but only the active fileserver has the right settings for ldap to work.
[15:49:16] chasemp: I go get me some quick breakfast. Be back shortly.
[15:56:24] !log newsletter Added myself (BryanDavis) to project to debug MediaWiki-Vagrant issues
[15:56:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Newsletter/SAL, Master
[16:16:26] chasemp: Any luck?
[16:41:42] Labs, Patch-For-Review: Create labs baremetal subnet? - https://phabricator.wikimedia.org/T121237#1873345 (Andrew) NEW
[16:43:15] Coren: a bit; I think I got it working but I had a transient failure and I'm not sure why
[16:43:20] andrewbogott: Coren YuviPanda
[16:43:22] https://phabricator.wikimedia.org/diffusion/EOST/browse/master/nova/OpenStackNovaProject.php;34972f145cb96b9b9fd3511de7378d1fb4c3d49d$141-149
[16:43:30] why do a preemptive delete on a group that we know doesn't exist?
[16:43:41] and why depend on that delete operation's success?
[16:44:11] I know it fails as when I debug it I can see that failing and I had a failed project creation I can't duplicate with https://gerrit.wikimedia.org/r/#/c/258462/1/nova/OpenStackNovaProjectGroup.php only
[16:44:42] chasemp: I don’t know, but I would guess that ‘delete’ is a proxy for ‘empty’
[16:44:48] so I'm confused as to the //why// of the heavy handed deletion of what shouldn't exist and the dependence on that deletion to create
[16:44:53] to avoid re-use of existing groups and accidentally providing access.
[16:45:28] Do you know for a fact that the delete fails if the group didn’t exist?
[16:46:54] it returns an exception 'ldap_delete(): Delete: No such object","'
[16:47:00] ok
[16:47:05] seen on line 12628
[16:47:09] in /tmp/wiki.log
[16:47:13] for my testing w/ gtest4
[16:47:17] but here is the nuts part
[16:47:31] I can't recreate the failure now tho it failed...
[16:49:15] chasemp: I think that createProject() used to create the group which was later deleted. But now the creation in createProject() fails (due to empty) so the subsequent delete fails
[16:49:22] I can’t explain why the code is organized that way though
[16:50:25] Did you already patch OpenStackNovaProjectGroup::createProjectGroup to add a member on creation?
[16:52:57] * Coren fails to parse that code flow.
[16:53:11] everything after if ( !$this->projectGroup->loaded or $this->projectGroup->isVirtual() ) {
[16:53:19] is handling weird error cases, and handling them badly
[16:53:20] Labs, MediaWiki-extensions-Newsletter: Create a larger newsletter-test instance in labs - https://phabricator.wikimedia.org/T120516#1873383 (bd808) I added myself to the project and logged in to see if I could figure out what was going on. `vagrant status` showed a created but halted LXC container. `vagra...
[16:53:27] Probably that code has been dormant for years
[16:53:47] Best to figure out why it’s getting traversed now rather than pay attention to its brokenness
[16:54:35] !log newsletter Removed myself (BryanDavis) from project after working on T120516
[16:54:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Newsletter/SAL, Master
[16:54:54] andrewbogott: ok thanks and yeah, sorry had to brb for a sec, kids and blood and chaos for a moment
[16:59:38] andrewbogott: this is what makes sense to me atm https://gerrit.wikimedia.org/r/#/c/258462/2/nova/OpenStackNovaProject.php
[17:02:12] I hotfixed w/ https://gerrit.wikimedia.org/r/#/c/258462/
[17:02:13] now
[17:02:15] and it works for me
[17:03:13] project-gtest7:*:52794:novaadmin,rush
[17:03:44] I can't make heads or tails of that preemptive depend-on-deletion stuff and maybe there is an edge case there that we are naively (imo) guarding against
[17:04:03] but more probably, based on comments etc, it is arranged crufty from changes over time
[17:09:16] chasemp: Yeah, I agree. I can't really make heads or tails of that as it stands.
[17:10:00] hotfix is in if you want to take a swing at it
[17:10:08] I have to be away in a bit for an hour or so fyi
[17:11:40] I'm sure there was once a reason to delete-and-recreate. Maybe the hint is in that "... or is a virtual static group" debug message above.
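One way to stop depending on the preemptive delete is to treat "No such object" as success, since the goal is only that no stale group exists before the new one is created. A minimal Python sketch with a dict standing in for LDAP; `NoSuchObject`, `delete_entry`, `ensure_absent` and `recreate_group` are hypothetical names, and this is not the patch at the Gerrit links above:

```python
class NoSuchObject(Exception):
    """Stand-in for the LDAP 'No such object' error (hypothetical name)."""


def delete_entry(store, dn):
    """Strict delete: fails when the entry is missing."""
    if dn not in store:
        raise NoSuchObject(dn)
    del store[dn]


def ensure_absent(store, dn):
    """Idempotent delete: a missing entry already satisfies the goal."""
    try:
        delete_entry(store, dn)
        return True   # an existing entry was removed
    except NoSuchObject:
        return False  # nothing to remove; not an error


def recreate_group(store, dn, members):
    """Create a group from scratch so stale members are never reused."""
    if not members:
        # Assumption from the discussion: an empty group would be refused.
        raise ValueError("refusing to create a group with no members")
    ensure_absent(store, dn)
    store[dn] = {"member": list(members)}
```

Calling `recreate_group` twice on the same DN replaces the member list rather than merging it, which is the "avoid re-use of existing groups" behaviour the delete seems intended to provide.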
[17:26:16] Coren: I commented inline saying roughly that
[17:27:11] Not that I know what a virtual static group is supposed to /be/ in that context.
[17:55:31] andrewbogott, chasemp: I've tracked down what it was, and isVirtual() can never be true anymore.
[17:55:49] cool, then we can excise that whole code block I think
[17:56:07] Pretty sure s/can/should/ even. :-)
[17:57:40] Gerrit's gonna restart in a few mins, pushing a config change. Plz don't panic.
[18:13:03] chasemp: Are you writing the get-rid-of-isvirtual code or should I?
[18:24:40] andrewbogott: Split into https://gerrit.wikimedia.org/r/#/c/258488/ and https://gerrit.wikimedia.org/r/#/c/258462
[18:30:04] YuviPanda: https://phabricator.wikimedia.org/M128
[18:49:11] andrewbogott: chasemp: i'z lunch. bbiab
[18:52:21] me too
[19:03:04] * YuviPanda reads backscroll
[19:03:10] Merlissimo: did that get sorted?
[19:06:05] Tool-Labs-tools-Other: Migrate https://toolserver.org/~bawolff/en-wn-editor-stats.php to Tool Labs - https://phabricator.wikimedia.org/T60867#1873932 (Bawolff) >>! In T60867#1482788, @Ricordisamoa wrote: >>>! In T60867#606068, @Bawolff wrote: >> Don't worry, its on my todo list. That was several years ago....
[19:38:42] Merlissimo: I see that the job with that number is already gone
[19:41:54] jynus: would <3 if we can upgrade to mariadb 10.1 :D
[19:44:00] YuviPanda, do you have a special interest in it, or just because I mentioned the roles?
[19:44:37] jynus: I've a special interest in getting tools-db to mariadb, since some applications I want to run require mysql 5.6+ or mariadb 10+
[19:45:01] jynus: what I really want for labsdb is 10.2, which has ANALYZE. that'll allow users to actually tune their queries, since EXPLAIN does not work right no
[19:45:03] *now
[19:45:05] tools actually would be the easiest
[19:45:17] its slave is in 10 already, we only need to failover
[19:45:35] 10.2, is that a thing?
[19:45:49] 10.1 is still a bit unstable
[19:46:15] jynus: yeah, it's not a thing yet
[19:46:17] afaict
[19:46:25] and I think EXPLAIN FOR connection would work already
[19:46:38] yea but is hard to do right
[19:46:40] whatever is the mariadb syntax
[19:46:41] wait
[19:46:43] https://mariadb.com/kb/en/mariadb/analyze-statement/
[19:46:45] it's in 10.1
[19:46:47] actually
[19:46:48] yep
[19:46:50] not 10.2!
[19:46:51] so we'll get that with 10.1
[19:47:03] my interest in a 10.1 upgrade just shot up through the roof into the stratosphere
[19:47:35] I do not know if it will work for what you want
[19:47:49] but it will fix the dependency on statement based replication
[19:47:58] and with that 95% of the replication problems
[19:48:21] +1
[19:48:29] how much work do you think it'll take, jynus
[19:48:34] I will have to test it though, I do not trust very new releases
[19:48:43] ^this is the main problem
[19:48:46] seems prudent for things like DBs
[19:48:53] we do not want to go from something that has problems
[19:48:54] jynus: we can upgrade tools-db and use that as our test bed :)
[19:48:57] to something that has more
[19:50:38] since it has no replication
[19:50:40] err
[19:50:43] no incoming replication at least
[19:51:30] that is actually not an issue
[19:51:54] replication from previous versions tends to work well
[19:52:06] we replicate from 5.5 to 10
[19:52:31] it is the compatibility that is risky for the tools, etc.
[19:53:03] but yes, that would be the first place to implement it
[19:53:30] yeah
[19:53:45] I think that's an ok tradeoff. Getting ANALYZE would be worth it
[20:14:25] jynus: Do you have a changelog as far as feature incompatibility?
[20:14:48] And yes, YuviPanda is right that most users would be jumping up and down in joy at a working analyze. :-)
[20:14:50] Coren, no, that is the problem
[20:15:03] obviously you have the official one
[20:15:17] I take it you don't trust it to be complete then?
[20:15:22] which I can search, but I heard some people complaining
[20:15:36] about some bugs and suggesting waiting a bit
[20:16:05] what I would do is do it opt-in at first
[20:16:34] mmm maybe we can add another labsdb replica with 10.2 :D
[20:16:39] err
[20:16:42] 10.
[20:16:44] 11
[20:16:46] 10.1
[20:17:02] if you have the hardware, YuviPanda :-)
[20:17:22] I do not have new hardware for production either
[20:17:26] yeah
[20:17:35] maybe I can scrounge around and get budget for it
[20:17:42] please do
[20:17:50] jynus: if you can think of a spec for what labsdb box would look like, I'll scrounge around for money
[20:18:14] PROBLEM - Puppet failure on tools-worker-08 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:18:16] are you thinking of a replica?
[20:18:44] or tools?
[20:18:48] jynus: replica
[20:19:01] ok, I will go back to that ticket soon
[20:19:17] Coren: since you're root on this channel, can you give me root too? It's using my old typo'd cloak
[20:19:34] err
[20:19:36] op
[20:19:38] not root
[20:19:54] jynus: +1. Once I have a vague spec I can work with robh to get a number and then with a number scrounge around
[20:19:55] YuviPanda: I'm pretty sure I can. I haven't talked to chanserv in ages though, lemme look it up. :-)
[20:21:25] YuviPanda: I think you are magic, now.
[20:21:57] woo thanks
[20:22:22] slash cs op #wikimedia-labs to ask it for the bit
[20:23:47] hmm
[20:23:49] or just
[20:23:50] no dice
[20:23:50] @op
[20:23:52] oh
[20:23:54] yes dice
[20:24:11] yup
[20:24:18] (it also de-ops me in one go too!)
[20:24:59] well if you leave the channel you lose all flags
[20:25:17] btw you can use wm-bot for most of stuff like this
[20:25:18] I think
[20:25:21] @kick petan
[20:25:58] yes, it works :P
[20:26:45] @kick petan
[20:26:56] thanks
[20:26:58] yup! :D
[20:27:11] yw!
[20:27:19] @kick wm-bot
[20:28:10] I was almost sure I made some code that would prevent that
[20:28:15] RECOVERY - Puppet failure on tools-worker-08 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:29:03] it also supports these hard-to-remember syntax quiets
[20:29:07] @q shinken-wm
[20:29:13] @unq shinken-wm
[20:29:29] or join-bans and so on
[20:29:36] @jb petan
[20:30:34] andrewbogott: If you are back from lunch; I put tabs back.
[20:31:16] I'm guessing we want to hot fix and not wait for a deployment train?
[20:32:59] I'm back here, thanks for amending that
[20:33:09] AFA https://gerrit.wikimedia.org/r/#/c/258462/ now
[20:33:20] if you are in a position to deploy I say go
[20:33:33] chasemp: No worries. I ripped out the last of those isVirtual() dangling bits.
[20:33:45] in a separate changeset?
[20:34:21] chasemp: Aye: https://gerrit.wikimedia.org/r/#/c/258488/
[20:34:54] great and you caught https://gerrit.wikimedia.org/r/#/c/258488/2/nova/OpenStackNovaProjectGroup.php too
[20:35:21] But since wikitech is a real wiki now, I thought we needed to do a "real" deploy in a normal window?
[20:35:31] ok that I don't know about
[20:35:43] It used to be it was outside of the deployment train afaik but now
[20:36:01] ask krenair?
[20:36:12] Krenair: ^^ ?
[20:36:14] :-)
[20:36:56] chasemp: Coren we can test it with a cherry-pick and then do a real deployment
[20:37:24] we have tested and I believe it's real deploy time
[20:37:32] we just hot patched it this am tho
[20:37:38] which, if I'm hearing right, is entirely bad manners
[20:39:51] ah ok
[20:39:54] chasemp: I think that qualifies as minor heresy.
[20:40:05] ya should've been a cherry-pick
[20:40:26] no SWAT Today since friday
[20:40:26] chasemp: Or at the very least apostasy. :-)
[20:40:40] so we should poke greg-g and then find someone versed in mediawiki-ese to deploy this for us
[20:40:59] Bit busy at the moment, chasemp, Coren
[20:41:02] greg-g is on vacation, ostriches is the current replacement.
[20:41:08] ok I'll ask
[20:41:09] I'm already asking in -operations.
[20:41:12] ah
[20:47:34] Coren: you want me to do it or you got it?
[20:49:15] chasemp: We gots the all clear; I can do it but I'd have to look up the docs. If you do it, I'll look over your shoulder.
[20:50:08] You +2 mine and I'll +2 yours.
[20:50:11] :-)
[20:50:33] uh I was thinking do both here, ok give me a minute to make sure I'm right that I know what I'm doing
[20:50:36] been a bit
[20:51:22] No, that's okay - I can do it too since it's no fresher in your memory than in mine. I meant "yours...mine" so that we don't end up with self-+2s. :-)
[20:51:40] oh I think in this case it's no worries as andrew gave it a +1
[20:53:25] chasemp: Mergèd.
[20:56:07] Coren: twentyafterfour kindly offered
[20:56:24] That /is/ kind. :-)
[20:57:03] great thanks for separating those commits out, see you in 3
[21:03:42] (PS2) Yuvipanda: Move docker image to directly use Debian Jessie [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/256781
[21:03:46] (CR) Yuvipanda: [C: 2] Move docker image to directly use Debian Jessie [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/256781 (owner: Yuvipanda)
[21:04:46] (Merged) jenkins-bot: Move docker image to directly use Debian Jessie [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/256781 (owner: Yuvipanda)
[21:18:53] andrewbogott: chasemp: Here or some other channel?
[21:18:57] find w/ me
[21:18:59] here’s fine
[21:18:59] fine even
[21:19:07] sorry about my clunky internet access.
[21:19:07] on the dhcp server I'm cool w/ what's simple
[21:19:10] I could hear surprisingly well
[21:19:26] if we can hack in our needed options and have it handle dhcp for the mw sanely I'm good w/ it
[21:19:36] I feel no ideology about it at all
[21:20:01] but I don't understand what context nova-network needs to have to serve dhcp in this case I guess
[21:20:10] it seems like it should be simple maybe it is not
[21:20:13] It doesn’t need to, it just already does
[21:20:29] So, let me try to back up.
[21:20:35] what happens right now if promethium comes up and asks for dhcp?
[21:20:37] Imagine you’re me, and don’t know anything about networking.
[21:20:37] does it get an address
[21:20:47] (Aside: fix deployed to Wikitech and tested to work)
[21:21:00] You have a black box that says NETWORKING really big on it, and when you plug something in, the right thing comes out
[21:21:18] So, I’m reaching out to plug my new thing into that box
[21:21:32] and you’re saying, don’t do that, all that happens in that box is
[21:21:45] so, I believe you. But I already have that black box
[21:21:54] so what’s my motivation to pay attention to
[21:22:01] even if only has three items?
[21:22:32] ok what are the three items?
[21:22:51] 1) serving up the right IP address for the right host to dhcp
[21:23:02] 2) routing and/or natting
[21:23:32] (I don’t know if 2 is one thing or two things or zero things, and neither do any of us I think)
[21:23:46] 3) floating ips which, sure, we don’t have to have them, but the black box already does them
[21:23:46] it's 2 things
[21:23:52] ah, great
[21:23:54] so then 2) routing
[21:23:55] 3) natting
[21:24:01] but they should come together more or less
[21:24:22] alright but assuming we can easily have the existing dhcp serve for this new server
[21:24:36] none of the other two should require any nova-network context as I understand them now
[21:24:53] so I'm asking, what do we get for doing some hacks in nova-network
[21:24:53] that may be so. I know for sure that if nova-network goes down that they stop happening :)
[21:25:03] well yeah sure
[21:25:11] that's not like, relevant to this is it?
[21:25:17] Labs, MediaWiki-Vagrant, User-bd808: Create "mediawiki-vagrant" project - https://phabricator.wikimedia.org/T120982#1874320 (bd808) Resolved>Open Reopening. The project had to be deleted to work around a wikitech LDAP problem that has now been fixed.
[21:25:19] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#1874323 (bd808)
[21:25:36] I mean to be fair I said I wanted a day or two in order to look at this so I could communicate well on what our options seem to be
[21:25:44] sure
[21:25:46] and instead today you kinda dove into it and I just don't know
[21:25:52] but what’s the harm in trying a trivial approach in the meantime?
[21:25:59] what is the trivial approach?
[21:26:30] andrewbogott: My point is, making the metal host work when it sits outside the dhcp pool is a known problem with a very simple solution; and I fear trying to hack it in nova-network has the potential to be brittle. What if openstack decides it needs to update the record when $x happens and we lose the hack?
[21:26:54] shrinking the dhcp pool has to be trivial [21:26:59] I think I've even seen the setting [21:27:04] when we looked at the labnet1001/1002 issue [21:27:14] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#799552 (bd808) [21:29:18] Labs, Attribution-Generator, TCB-Team, User-bd808: Create labs projects for lizenzhinweisgenerator - https://phabricator.wikimedia.org/T120925#1874332 (bd808) Deleted and recreated the project to work around LDAP problems left by T121064. [21:29:43] So, this is moot, since I’m taking Monday off [21:29:53] I would love to be proved wrong, and have the routing/natting issues turn out to be trivial [21:29:55] So, have at! [21:30:54] I think I missed previously, is it that in your mind having dhcp from the existing dhcp server requires a database entry or the pseudo-VM hack? [21:31:11] nah, not necessary, we can hack around it by adding a separate table to dnsmasq [21:31:36] Labs, Labs-Infrastructure, Patch-For-Review, WMF-deploy-2015-12-15_(1.27.0-wmf.9): Project entries are created in ldap but not the posix group entry (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1874344 (bd808) I deleted the `mediawiki-vagrant` project, recreated it and spun up... [21:31:46] ok [21:32:10] so two things, the routing is more or less trivial and not really handled by openstack per se because it's a route on the core routers [21:32:16] and then attached networks for the labnet box [21:32:23] victory! thanks for fixing the ldap project bug guys [21:32:26] I think [21:32:54] so assuming that is true, and the SNAT stuff is just by vlan out of the box so we get it for free that seems like all the integration required to get that far [21:33:14] I agree that if that doesn't work out we are at the make this hw seem like a vm place [21:33:18] so that we aren't reinventing the wheel [21:33:45] bd808: Heh. It was annoying us too, but I'm glad it solved your issue.
:-) [21:34:07] chasemp: it should be easy enough to find out :) [21:34:20] agreed so that's where I'm at mentally [21:35:55] andrewbogott: do we know now if promethium boots and asks for an IP...does it get one? [21:36:12] I don’t know — I’ll try it now [21:36:17] what I mean is, is the openstack dhcp stuff already doing it by pre-assigned MAC from a DB [21:36:44] we could argue actually, especially having our own pxe server [21:36:47] we just want to do dhcp in general [21:38:34] openstack dhcp stuff already doing it by pre-assigned MAC from a DB <- you mean in general or for promethium in general? [21:43:51] * andrewbogott wonders if he fell off the internet again [21:44:52] sorry no had to put in a car seat :) [21:45:21] andrewbogott: we should be able to get root on console [21:45:32] and test a few fo these things out as is before imaging, that had been my thought [21:47:21] No, that actually makes sense: log into the console via mgmt and *try* it with dhclient. See what pops. [21:47:48] our current nova network is 10.68.16.0/21 [21:47:49] that's my thought [21:47:57] It’s not possible to carve off a bit without cutting it in half, right? [21:48:13] well is the entire nova network definition also the dhcp pool scope defintion? [21:48:14] andrewbogott: I'm pretty sure the dhcp pool is a range, not a subnet. [21:48:18] I guess I remember tehm being separate [21:48:31] it would be odd if they were teh same as one determines network config and the other network assignment [21:49:32] I see... [21:49:36] | dhcp_server | 10.68.16.1 | [21:49:36] | dhcp_start | 10.68.16.2 | [21:49:42] but no dhcp_stop [21:49:48] that is interesting [21:50:26] Well, we could dhcp_stop somewhere higher than .2 and it'd end up pretty much the same; but it's odd that it'd imply the end of the subnet as the end of the range. 
:-) [21:51:18] well assuming it just allows teh dhcp server to mange it's own lease pool [21:51:21] it may not be an issue at all [21:51:33] I'm just not sure how specific the dhcp server is with it's clients in this case [21:52:12] I don’t know for sure, but I thought that IPs were assigned by nova and then passed on to dnsmasq [21:52:35] andrewbogott: can you try your root pass on mgmt for promethium? [21:52:40] either I have an old pass or idk [21:52:51] it's "console com2" fyi [21:53:38] 6Labs, 10Tool-Labs, 5Patch-For-Review: Update Java 7 to Java 8 - https://phabricator.wikimedia.org/T121020#1874417 (10valhallasw) 5Open>3declined a:3valhallasw I spent a bit more time on this, but I don't see this working: the packages need more work (dependencies are wrong) and I have no way to easily... [21:53:58] but yeah Coren could just start the valid lease pool higher to the same effect I guess [21:54:04] super weird to have a start and no end entry tho isn't it? [21:54:30] chasemp: Maybe it /accepts/ a dhcp_end too, just defaults to "the whole thing"? [21:54:36] chasemp: Labs instances don’t have root passwords, so probably promethium doesn’t either :( [21:54:49] ....oh :) [21:55:02] We could start the lease pool higher if we didn’t have instances already in the pool. [21:55:04] the image tho came from caron [21:55:14] Yeah, but it was puppetized for labs [21:55:23] I’ll look in the source about lease ranges [21:55:42] andrewbogott: Does something live at .2 now? Just putting the start at .3 would work for testing if nothing else. 
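Editorial sketch of the pool arithmetic discussed above. It assumes, as the conversation guesses but does not confirm, that a `dhcp_start` with no explicit end means the pool runs to the last usable host address of the network. The helper name `implied_pool` is hypothetical and not part of nova-network; only the `10.68.16.0/21` network and the `10.68.16.2` start come from the log.

```python
import ipaddress


def implied_pool(network: str, dhcp_start: str):
    """Return (first, last, count) for a pool running from dhcp_start to the
    last usable host of the network. Assumption: no explicit end is set, so
    the pool is taken to extend to the address just below broadcast."""
    net = ipaddress.ip_network(network)
    start = ipaddress.ip_address(dhcp_start)
    if start not in net:
        raise ValueError(f"{start} is not inside {net}")
    last = net.broadcast_address - 1  # last usable host address
    count = int(last) - int(start) + 1
    return start, last, count


# Values quoted in the log: the nova network and its dhcp_start.
first, last, count = implied_pool("10.68.16.0/21", "10.68.16.2")
print(first, last, count)  # 10.68.16.2 10.68.23.254 2045

# Moving the start up by one frees exactly one address (here .2) for a
# statically configured host, provided nothing already holds a lease on it.
print(implied_pool("10.68.16.0/21", "10.68.16.3")[2])  # 2044
```

Under that assumption, raising the start address only frees addresses at the bottom of the range, which is why the discussion turns to what currently lives at .2, .3 and .4.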
[21:55:54] * andrewbogott checks [21:56:11] or even just bumping that vm out of that space with a pool change and a reup [21:57:44] hm, certainly pinging .2 and .3 and .4 works [21:58:25] 10PAWS: Make the default PS1 more helpful - https://phabricator.wikimedia.org/T120560#1874448 (10yuvipanda) 5Open>3Resolved a:3yuvipanda It's $MWUSERNAME@PAWS:$CWD $ now [21:59:05] ok I'll move promethium back to labs-support so I can set a root password to test out the rest of it [21:59:44] sure [21:59:58] .2 is an instance called ‘nfstest’ built by faidon in testlabs [22:00:19] I think that was the one he used to break into labstore. :-) [22:00:26] pretty good odds that's not a big deal to move ip's for [22:01:03] I still don’t see any public-facing way to change the dhcp range. Still digging [22:01:10] 6Labs, 10Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#1874462 (10valhallasw) 3NEW [22:01:20] ^ I would appreciate input on this [22:01:42] YuviPanda: especially from you on whether k8s is 'ready enough' for this -- probably not? [22:03:01] valhallasw`cloud: so 6. is their own project, but that sucks too [22:03:19] well, that’s upsetting [22:03:34] YuviPanda: right. [22:03:51] valhallasw`cloud: 7. is 'write a small proxy in another language that speaks the SSL Java 1.7 speaks on one side and the SSL that wmf speaks on the other' [22:03:51] silly openjdk :( [22:04:07] that's evil! :D [22:04:10] indeed [22:04:13] mitmproxy to the rescue [22:04:15] but let's not [22:04:17] yeah [22:04:44] chasemp: can you make any sense out of the ‘fixed reserve’ docs here? http://docs.openstack.org/admin-guide-cloud/compute-networking-nova.html [22:05:04] valhallasw`cloud: how about, 1. setup a separate project for this, 2. move to K8s in 3-4 months? [22:05:08] seems like that command would let us change dhcp_start, but… ‘reserve’? what the heck? 
[22:06:20] YuviPanda: yeah, also not a big fan of that :/ [22:06:34] valhallasw`cloud: might be the most workable setup. [22:06:46] definitely don't want to port gridengine to jessie [22:06:50] not sure if merl agrees :-p [22:07:33] the only difference between 6 and 5 is that in (5) the instance lives inside tools and in (6) it does not [22:08:01] andrewbogott: I read that as a) yes vm's are mapped by mac to a specific ip b) you can set a starting point in the network range for dhcp but it assumes it will consume to the end of hte range from there [22:08:09] which does what we are talking but very awkwardly [22:08:26] 6Labs, 10Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#1874483 (10yuvipanda) 6. Put this in its own projects until we can do (4), which is a few months off now. [22:08:28] YuviPanda: yes, which means you have access to tools nfs [22:08:31] yeah, that’s what it reads like to me too. I just think that ‘reserve’ is a preposterous name for it [22:08:36] truly [22:08:40] might not be that important, though [22:08:47] also someone still has to manage that project [22:08:50] valhallasw`cloud: yup. I don't actually know what that bot needs. [22:09:00] (other than 'java 8') [22:09:01] what kind of a dhcp scheme doesn't include specific scope [22:09:10] for example, if it's using qsub, everything becomes a lot more complicated [22:09:58] yeah [22:11:56] 6Labs, 10Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#1874486 (10yuvipanda) What features of tools does merlbot use? I think that's an important consideration in figuring out what to do. Specifically: # Does it use qsub/jsub? # Does it... [22:11:57] valhallasw`cloud: commented [22:12:09] thanks! [22:12:55] valhallasw`cloud: Depending on what is required I might be able to provide support for it on its own project. 
[22:13:00] valhallasw`cloud: until k8s is ready [22:13:19] ok, great :-) [22:13:30] need to figure out what's needed first tho :D [22:13:34] I'm going to go eat food [22:13:36] I'll brb [22:13:37] I wish there were an easy way to statically compile a .deb :( [22:13:49] > easy [22:13:51] > .deb [22:14:01] gotta get hazed first :) [22:15:02] anyway [22:15:04] fooood [22:15:43] ebernhardson: is instance estest1003 defunct or still in use? [22:15:50] (Nothing is wrong, I’m just making space :) ) [22:20:31] andrewbogott: in use, the whole estest100{1,2,3,4} set [22:20:38] ok [22:20:58] to clear up some space, help me convince you all to keep some of elastic1001-16 when they get replaced later this year ;) [22:21:07] (where you all = ops ;) [22:21:24] this year -> ? [22:21:26] 2016? [22:21:28] andrewbogott: yes, kill it [22:21:29] or next few months? [22:21:40] paravoid: ok, thanks [22:21:59] YuviPanda: mark said "They're slated to be replaced by end of this fiscal year (at 4 years of age), which may (or may not) have to be pushed to early start of the next fiscal year (July 2016) if necessary for budget pressure. " [22:22:08] ah :) [22:22:15] ebernhardson: ys, that'd be nice to recycle [22:22:55] YuviPanda: although if we were to use them for a labs replica, they need more disk [22:23:18] hmm [22:23:28] well, maybe. hard to say exactly. they have 493GB each [22:23:40] i think thats after raid0 [22:23:52] or maybe raid1? gah i should know this... 
[22:24:07] it's raid 0 w/ a software raid 1 on the os partition [22:24:14] which is a small part of the disk [22:24:45] ahh ok, that's it :) [22:33:14] chasemp, Coren, ok I removed one solitary IP from the dhcp pool: 10.68.16.2 [22:33:45] sweet -- I moved back promethium and set a root pass now I'm looking to test the console but it's being weird [22:33:55] .3 and .4 are in use [22:35:05] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#1874520 (Merl) Are there any plans to support ubuntu 16.04 as sge execution nodes next year? If so the easiest solution would be to keep the ssl force exception for labs until this i... [22:38:28] andrewbogott: We need just the one for testing anyways. [22:38:55] Coren: yes, although it would be nice to have a plan for getting a second one, someday :) [22:39:09] yeah I figure if this works out we clear out a block of 10 or something? [22:39:12] idk [22:39:36] side note, that patch was only the second time I've written php here in 2 years ...just occurred to me [22:39:39] I think [22:39:53] I don't miss it. :-) [22:40:34] andrewbogott: so I got root going on promethium [22:40:40] anything else you want to do w/ it before we jump back? [22:40:45] Labs, Labs-Infrastructure, operations, ops-eqiad: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#1874524 (coren) @cmjohnson: We need to get our hands on another then; perhaps put in a req for one or is there one in Dallas we could ship across? [22:40:51] I'm going to install a few diagnostic packages [22:41:01] chasemp: nothing specific [22:41:19] oh, btw, Coren and chasemp, when hacking keep in mind that currently the active network node is labnet1002, not labnet1001 [22:41:30] I keep making that mistake and getting upset at how there are no logs [22:41:37] Heh.
[22:41:42] gotcha I actually no joke just did too when we were talking [22:41:43] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#1874526 (Merl) Btw: i am running funtoo/gentoo on my two year old laptop and emerging jdk8 with a clean compile cache uses about 6GB of ram, 7 GB temp space, but completes within 40... [22:41:47] like wtf iptables rules? oh yeah [22:42:30] Labs, Labs-Infrastructure, Patch-For-Review: labcontrol1001 and 1002 running web servers on 80 and 443 open to all - https://phabricator.wikimedia.org/T120449#1874537 (Dzahn) a:Dzahn [22:43:27] Labs, Labs-Infrastructure, Patch-For-Review, WMF-deploy-2015-12-08_(1.27.0-wmf.8), WMF-deploy-2015-12-15_(1.27.0-wmf.9): Project entries are created in ldap but not the posix group entry (disallowing ssh, etc) - https://phabricator.wikimedia.org/T121064#1874539 (coren) Open>Resolved Confirm... [22:52:15] Labs, Labs-Infrastructure, Patch-For-Review: labcontrol1001 and 1002 running web servers on 80 and 443 open to all - https://phabricator.wikimedia.org/T120449#1874550 (Dzahn) fixed by removing the firewall hole for 80/443 that wasn't needed. can't open these URLs anymore now [22:52:27] Labs, Labs-Infrastructure, Patch-For-Review: labcontrol1001 and 1002 running web servers on 80 and 443 open to all - https://phabricator.wikimedia.org/T120449#1874551 (Dzahn) Open>Resolved [23:00:30] YuviPanda: my search skills aren't working out...where were docs for putting up a repository in tool labs via kubernetes (as in, without managing your own server) ? [23:02:11] oh it looks like normal tools are that way, i was just uncertain because it started talking about lighttpd.conf :) [23:05:41] * ebernhardson is a little scared ssh tools-k8s-master-01.tools.eqiad.wmflabs [23:05:46] let him log in...and logs back out [23:05:52] chasemp: here are some dnsmasq facts!
1) dnsmasq is started/stopped by nova-network, if you ‘service dnsmasq start’ yourself you may end up with more dnsmasqs than you want. 2) it’s configured via /etc/dnsmasq-nova (as specified in /etc/nova.conf) 3) which comes from puppet/modules/openstack/templates/kilo/nova/dnsmasq-nova.conf.erb [23:06:14] this all makes sense [23:06:16] Most of the craziness in there is for aliasing public IPs [23:06:31] (sorry if I’m telling you what was already obvious) [23:07:08] * andrewbogott out for now [23:07:40] andrewbogott: are you coming back tonight? if not or if I'm gone, have fun at the garden thing [23:07:45] talk to you tuesday [23:07:59] I’ll be back a bit this evening, but thanks!