[15:22:49] MediaWiki-Core-Team, operations: Deploy multi-lock PoolCounter change - https://phabricator.wikimedia.org/T85071#970891 (fgiunchedi) what: * build and upload a new version of poolcounterd debian package * coordinate the poolcounter deployment, perhaps under SWAT? ** namely this involves upgrading the poolcoun...
[16:09:52] <^d> manybubbles: Any thoughts on the last 2 comments I have on https://gerrit.wikimedia.org/r/#/c/177223/? I can't replicate completely and I'd like to move that chain forward more.
[16:12:24] I lost a lot of willpower on that chain when I merged something that would have totally deleted the content index if we'd run it.
[16:12:46] that's just really really bad, like break search while we rebuild it bad
[16:15:07] <^d> The other option is just merging all of them, pinning deployment rather than using master for a week or two until we shake it all out :p
[16:16:43] <^d> So your thought on my issue is "shouldn't happen unless you're a developer and fucking with things yourself?"
[16:16:51] <^d> If so, I'm happy with that one and will +2
[16:31:10] MediaWiki-Core-Team, MediaWiki-extensions-ContentTranslation, ContentTranslation-Deployments: Content Translation Beta Feature security review - https://phabricator.wikimedia.org/T85686#971047 (Jsahleen) @CSteipp. To reiterate what Santhosh said, the drafts are only visible to the user who created them and th...
[17:14:05] <_joe_> manybubbles: we don't have the wikidata query meetings anymore?
[17:14:23] _joe_: in an hour
[17:14:33] <_joe_> I'm not invited apparently :)
[17:14:51] fixed
[17:14:55] I have no idea how that happened
[17:15:09] <_joe_> yea no big deal
[17:15:26] <_joe_> but I don't want to get too much out of sync with what you guys are doing
[18:10:23] bd808: Do you know about / do we have a bug about the incomplete logging?
[18:10:28] That's pretty annoying
[18:10:44] hoo: which incomplete logging?
[18:11:22] I think all of it is incomplete recently
[18:11:36] But especially annoying is the incomplete fatal.log
[18:12:56] hoo: First I'm hearing of it. I know there have been several logstash problems over the alst couple of weeks but I don't know of any udp2log problems
[18:13:02] *last couple
[18:13:42] If you look into it, it only has fatals from the job runners I think
[18:13:43] I do know that the fatalmonitor shell script is basically useless right now because of the hhvm switch
[18:14:02] I need backtraces for my fatals
[18:14:12] The things that used to go into apache2.log are in hhvm.log now
[18:14:26] yeah, but that's also incomplete
[18:14:41] Jan 12 17:47:13 mw1167: [proxy_fcgi:error] [pid 27774] (70014)End of file found: [client 10.64.32.107:11585] AH01075: Error dispatching request to :, referer: https://meta.wikimedia.org/wiki/Special:MergeAccount
[18:14:51] but no such entry in the hhvm.log
[18:14:55] o.O
[18:15:26] that's an fcgi proxy error. Probably not an hhvm fatal to go with it
[18:16:39] bd808: Wait a second, I'm manually running it against an appserver just now
[18:17:15] * Error while processing content unencoding: invalid code lengths set
[18:17:17] meh
[18:17:28] compression for the error pages is apparently broken
[18:17:42]
[18:17:42] PHP fatal error:
[18:17:42] request has exceeded memory limit
[18:18:03] that's what I get if I run curl without --compress against an appserver directly
[18:18:37] wow, this one shows up in the logs, even
[18:18:50] Something like "Fatal error: request has exceeded memory limit in /srv/mediawiki/php-1.25wmf13/includes/db/DatabaseMysqli.php on line 183"
[18:19:15] Or "Fatal error: request has exceeded memory limit"
[18:19:16] Jan 12 18:16:40 mw1167: #012Fatal error: request has exceeded memory limit
[18:19:18] just that
[18:19:26] fun
[18:19:29] and it only got caught in the logs once
[18:19:36] but I fired the request several times
[18:19:48] They will buffer
[18:19:58] those logs flow through rsyslog
[18:20:15] legoktm: Btw, I have no idea whether user merge even works at all right now
[18:20:38] if it sees 2 events that match quickly it will wait a while and then log something like "last message repeated N times [ message here ]"
[18:20:40] special:mergeaccount or special:globalusermerge?
[18:20:50] legoktm: oh, merge account
[18:21:02] bd808: mh... how long will it wait?
[18:21:31] I'm not sure. I never dug into the rsyslog settings to find that out
[18:21:42] but it's a normal syslog behavior
[18:21:45] legoktm: 2015-01-12 14:05:05 mw1103 ruwiki: password Zgrad@enwiki
[18:22:01] I don't even get such log entries for my test account while it's processing with the current version
[18:22:04] that used to work
[18:23:56] * legoktm looks for what he named his test account
[18:24:21] MediaWiki-Core-Team, MediaWiki-extensions-ContentTranslation, ContentTranslation-Deployments: Content Translation Beta Feature security review - https://phabricator.wikimedia.org/T85686#971270 (csteipp) >>! In T85686#968117, @santhosh wrote: > @csteipp, > > - Drafts are private to the creator. User A canno...
[18:24:41] "Lego-test"
[18:25:48] MediaWiki-Core-Team, MediaWiki-extensions-ContentTranslation, ContentTranslation-Deployments: Content Translation Beta Feature security review - https://phabricator.wikimedia.org/T85686#971271 (csteipp) {F28231}
[18:27:06] MediaWiki-Core-Team, MediaWiki-extensions-ContentTranslation, ContentTranslation-Deployments: Content Translation Beta Feature security review - https://phabricator.wikimedia.org/T85686#971272 (Jsahleen) @CSteipp It looks like the necessary patch is waiting on review from Amir. Should be available shortly.
[18:27:10] hoo: my log entries show up:
[18:27:11] 2015-01-12 18:26:22 mw1254 test2wiki: primary Lego-test@test2wiki
[18:27:11] 2015-01-12 18:26:22 mw1254 test2wiki: unresolvable Lego-test@testwiki
[18:27:11] 2015-01-12 18:26:22 mw1254 test2wiki: unresolvable Lego-test@testwikidatawiki
[18:28:18] yay, that's something
[18:28:41] but wtf is wrong with the actual bug
[18:28:48] it doesn't even seem to come as far as it used to
[18:29:04] although we removed the biggest memory eater that happens in the second loop
[18:30:05] bd808: Could it be that, for some reason, wfDebugLog only makes it through if the request succeeds? Has anything been changed about that today?
[18:31:41] We are using slightly different code for logging since the PSR-3 patches merged. I don't think there should be any new buffering on the MW side but I can take a look to make sure in a bit.
[18:32:38] I think we have a Heisenbug here... the more we do to eliminate it, the more mysterious it gets :P
[18:33:04] On the group0 wikis the logging is using Monolog now too instead of my hand-rolled PSR-3 implementation. Again it should not be adding buffering but I can try to verify that.
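
For anyone reproducing this later: what hoo did from the command line can also be done with PHP's curl extension. A sketch only; the appserver hostname and page are stand-ins taken from the log, not canonical targets, and CURLOPT_ENCODING plays the role of the --compressed CLI flag:

    <?php
    // Sketch: fetch a page from one appserver directly, with and without
    // response decompression, mirroring `curl` vs `curl --compressed`.
    // Hostname and URL are illustrative stand-ins from the log above.
    $url = 'http://mw1167.eqiad.wmnet/wiki/Special:MergeAccount';
    foreach ( array( 'plain' => null, 'compressed' => '' ) as $label => $enc ) {
        $ch = curl_init( $url );
        curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
        // Route by Host header, as when testing a single backend.
        curl_setopt( $ch, CURLOPT_HTTPHEADER, array( 'Host: meta.wikimedia.org' ) );
        if ( $enc !== null ) {
            // "" = accept and decode any encoding curl supports (gzip etc.).
            curl_setopt( $ch, CURLOPT_ENCODING, $enc );
        }
        $body = curl_exec( $ch );
        echo $label . ': ' .
            ( $body === false ? curl_error( $ch ) : strlen( $body ) . ' bytes' ) . "\n";
        curl_close( $ch );
    }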
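
The buffering bd808 describes is rsyslog's repeated-message reduction: with the legacy $RepeatedMsgReduction directive enabled, consecutive identical messages are collapsed into a single "last message repeated N times" line. A sketch of the relevant rsyslog.conf knob; how long a duplicate is held before flushing is version- and configuration-dependent, which matches bd808 not knowing the exact wait:

    # rsyslog.conf (sketch). With reduction on, a duplicate is held until a
    # different message arrives or an internal timeout fires, then emitted
    # as "last message repeated N times". Turning it off logs every
    # occurrence, at the cost of log volume.
    $RepeatedMsgReduction off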
[18:36:29] Trying it from shell on tin now
[18:36:36] Wikimedia-Wikimania-Scholarships, MediaWiki-Core-Team: Deploy updated Scholarships application to production - https://phabricator.wikimedia.org/T86370#971288 (bd808)
[18:36:41] Let's see what that can give me
[18:48:11] csteipp: How long does it approximately take to test a pbkdf2 hash on an appserver?
[18:48:23] 1s? 1.5s?
[18:49:00] hoo... Iirc we were targeting <1s, but I don't remember the exact time for the level we were at.
[18:49:44] > $t = microtime( true ); echo $u->authenticate( 'test' ); echo microtime( true ) - $t;
[18:49:44] bad password0.59468007087708
[18:49:48] on osmoum
[18:49:50] * osmium
[18:52:09] So that's too much for Special:MergeAccount
[18:52:15] :(
[18:52:51] hoo: Yikes
[18:52:52] > $t = microtime( true ); echo $u->authenticate( 'test' ); echo microtime( true ) - $t;
[18:52:53] bad password2.1064610481262
[18:53:01] on mw1017 (testwiki)
[18:53:10] oh :S
[18:53:16] So... 0.5-2 seconds probably
[18:54:57] that's painful
[18:56:13] legoktm: Looks like we need to find a hack here :(
[18:56:29] if PHP were to support multithreading *sigh*
[18:56:42] :/
[18:57:09] so are we running out of time instead of memory?
[18:57:15] can we just increase the time limit?
[18:57:50] legoktm: Apparently we run out of memory... but we would also clearly run out of time
[18:58:20] playing it through on osmium right now, to see how high the actual memory peak would be
[18:58:23] and how much time it would take
[18:58:29] for my large test account
[18:59:35] legoktm: I keep increasing $wgMemoryLimit slowly... :P
[19:00:09] You could look for identical salts, in case they have old style accounts, or they had a global account that was unmerged. Both edge cases.
[19:00:11] Took 263s for my test account on osmium
[19:00:36] A major feature of pbkdf2 is that it is slow. If we are doing things on purpose that require lots and lots of pbkdf2 hash generations... it's going to be slow.
[19:00:50] > echo memory_get_peak_usage();
[19:00:50] 20416392
[19:01:05] legoktm: ^ no idea why hhvm says it runs out of memory
[19:01:07] is that in gigabytes?
[19:01:07] <^d> Reedy: 128MB should be more than enough :)
[19:01:11] We should install a rack of GPUs on the cluster for when we want it to be fast ;)
[19:01:31] for
[19:01:32] $u = new CentralAuthUser( 'Hoo MergeAccount Test' );
[19:01:34] ^d: it's at 330MB :P
[19:01:42] $u->storeAndMigrate( array( '...' ) );
[19:01:44] <^d> should be more than enough
[19:01:58] I haven't looked at OOM count generally
[19:02:06] Oh
[19:02:11] but it didn't work in the end
[19:02:13] * hoo wtfs
[19:02:25] csteipp: We should finish the SUL migration and get rid of the idea of merging accounts based on common passwords :)
[19:02:38] bd808: But apparently logs are buffered
[19:03:03] Oh wait no, we buffer the password stuff, so not even necessary
[19:03:51] We could make the password checking asynchronous, with just a small rework of that page
[19:04:12] csteipp: How would that work? And would that also work for automerge on login?
[19:04:43] hoo: Only for the webpage, most likely
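
A standalone way to get numbers like the 0.59s/2.1s measured above, without a wiki user object: time one derivation with PHP's built-in hash_pbkdf2(). The algorithm, iteration count, and output length here are illustrative guesses, not the values from MediaWiki's password configuration; Special:MergeAccount pays roughly this cost once per unattached local account it checks, so the page cost is about N accounts times one hash:

    <?php
    // Sketch: benchmark a single PBKDF2 derivation. Parameters are
    // illustrative, not MediaWiki's configured ones.
    $salt = openssl_random_pseudo_bytes( 16 );
    $t = microtime( true );
    hash_pbkdf2( 'sha256', 'test', $salt, 64000, 64, true ); // 64k rounds: a guess
    printf( "%.3fs for one hash\n", microtime( true ) - $t );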
[19:06:47] We already keep the list of accounts that we know, and the ones we don't. And we keep all of the plaintext passwords we know in memory (iirc). So limiting the number of attempts to 50 or so, then adding a "keep checking with this password" button...
[19:07:31] Or we add an API that looks at the user's session for which accounts have been checked, and then fills in the UI as we have results
[19:09:12] csteipp: Mh, that's quite some work
[19:09:24] as a stopgap, we could check the email for matches before checking the password
[19:09:27] that's way cheaper
[19:17:09] Wikidata, wikidata-query-service, MediaWiki-Core-Team: Deploy a Wikidata complex query service into production - https://phabricator.wikimedia.org/T85159#971459 (GWicke)
[19:18:29] Wikidata, wikidata-query-service, MediaWiki-Core-Team: Deploy a Wikidata complex query service into production - https://phabricator.wikimedia.org/T85159#940222 (GWicke)
[19:21:56] legoktm: csteipp: https://gerrit.wikimedia.org/r/184407
[19:21:57] untested
[19:29:32] hoo: Definitely test that the operator precedence short circuits out the matchHash...
[19:29:53] AaronSchulz: https://gerrit.wikimedia.org/r/#/q/status:open+topic:kill-mwexception,n,z
[19:31:00] csteipp: In most cases it can't. That will only have an effect if we have an account with a verified email that doesn't belong to our global account
[19:31:06] edge case, I guess
[19:31:12] but worth catching
[19:32:37] Wikidata, wikidata-query-service, MediaWiki-Core-Team: Deploy a Wikidata complex query service into production - https://phabricator.wikimedia.org/T85159#971532 (GWicke)
[19:33:19] ori: did you see brad's comment?
[19:35:59] hoo: Yeah, short circuit works. I haven't tested the whole thing yet, but I like the idea
[19:36:56] AaronSchulz: no
[19:37:18] commit summary could be longer :)
[19:38:16] I could change the arrow from -> to -->
[19:39:09] bbiaf
[19:39:54] MediaWiki-General-or-Unknown, MediaWiki-Core-Team: Skin api - https://phabricator.wikimedia.org/T86570#971567 (Paladox) NEW
[19:55:47] ori: https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/services/jobrunner,n,z
[20:01:33] legoktm: Guess we should SWAT the new patch tonight and then create a bug to get this stuff fixed properly
[20:14:23] anomie: if that one field was made protected, then calling super should work, since parent classes can see protected child fields
[20:14:57] AaronSchulz: Yes
[20:15:50] though $linkInsertions is private
[20:21:50] it's odd that one can call with different titles and get the same results as the first time
[20:26:00] anomie: I'm thinking it would be better to go a different route, perhaps making a LazyDataUpdate class with a resolve() method (to get the real one), subclassing it, and having getSecondaryDataUpdates() resolve any such ones on the fly
[20:26:59] that also avoids references (some circular) in the serialization, which works with serialize() but is still weird
[20:29:59] hasCustomDataUpdates() would just turn into a method to check for non-lazy updates
[20:30:06] AaronSchulz: What would be the difference between a DataUpdate and a LazyDataUpdate?
[20:30:54] most of the input is already in the output, we just need to pass that to the LazyDataUpdate class
[20:30:59] e.g., now we have "$linksUpdate = new LinksUpdate( $title, $this, $recursive );"
[20:31:48] all we need is the logic to do that when needed, which is simple, much simpler than doing it immediately and playing whack-a-mole with DB handles, caches, and private vars
[20:32:38] *the parser output
[20:35:48] So LazyDataUpdate would be a one-method class that ParserOutput::getSecondaryDataUpdates() would call some sort of $lazy->foo( $this ) method to create the actual DataUpdate?
[20:36:54] more or less, subclasses might have extra fields in the class and logic for foo()
[20:38:40] Makes sense to me.
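
To make the shape of that concrete, here is one way the idea could look. This is a sketch of the proposal as discussed, not the eventual implementation (the work is filed later in the log as T86597); the names beyond LazyDataUpdate and resolve() are invented for illustration:

    <?php
    // Sketch of the LazyDataUpdate idea: getSecondaryDataUpdates() carries
    // cheap descriptions of pending work and only builds the real
    // DataUpdate when asked, so no DB handles, caches, or circular
    // references get serialized with the ParserOutput.
    abstract class LazyDataUpdate {
        /**
         * Build the real update from state available at resolve time.
         * @return DataUpdate
         */
        abstract public function resolve( Title $title, ParserOutput $output );
    }

    class LazyLinksUpdate extends LazyDataUpdate {
        private $recursive;

        public function __construct( $recursive = true ) {
            $this->recursive = $recursive;
        }

        public function resolve( Title $title, ParserOutput $output ) {
            // Same call as today's eager "new LinksUpdate( $title, $this,
            // $recursive )", just deferred until the update actually runs.
            return new LinksUpdate( $title, $output, $this->recursive );
        }
    }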
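
And circling back to the Special:MergeAccount stopgap above (checking emails before passwords, change 184407): the point csteipp raised about operator precedence is PHP's short-circuit evaluation of ||. A toy demonstration with invented names, not the actual patch; the cheap confirmed-email test has to sit on the left so the expensive hash check on the right never runs when it is true:

    <?php
    // Toy demo of short-circuiting ||. All names are invented for
    // illustration; the real patch guards CentralAuth's matchHash().
    function expensiveHashCheck( $password ) {
        echo "hash check ran\n"; // ~0.5-2s of PBKDF2 in the real code path
        return false;
    }

    $emailConfirmedAndMatching = true;
    $ok = $emailConfirmedAndMatching || expensiveHashCheck( 'secret' );
    var_dump( $ok ); // bool(true), and "hash check ran" never prints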
[20:47:47] legoktm: thanks for all the reviews you did for the last few weeks. Very helpful :-]
[20:48:48] :)
[21:37:16] ^d: https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/services/jobrunner,n,z
[21:37:25] a few simple bits
[21:52:02] MediaWiki-API, MediaWiki-General-or-Unknown, MediaWiki-Core-Team: Fatal Error: Call to a member function disable() on a non-object (wfHttpError should check, if wgOut is an instance of OutputPage) - https://phabricator.wikimedia.org/T86398#972064 (Anomie)
[22:04:19] hey folks, sorry, the machine in our conference room isn't behaving
[22:17:58] Reedy: I
[22:18:03] err
[22:18:24] Reedy: I'd like to deploy https://gerrit.wikimedia.org/r/#/c/181130/3 tomorrow after the train. A review from you would be awesome.
[22:19:06] woo :)
[22:19:08] legoktm: Could you test and +2 the change I had earlier? I'd like to SWAT that later on
[22:19:20] still too many errors from that special page...
[22:19:27] sure
[22:24:31] Librarization, MediaWiki-Core-Team: Update [[mw:Composer]] to include information about usage with libraries - https://phabricator.wikimedia.org/T85172#972173 (Legoktm) a: Legoktm
[22:24:45] Reedy: Would it make sense for you to just roll that out in the train window?
[22:25:14] Can't see any reason why we couldn't after the bumps and it's settled
[22:25:38] sweet. I'll take my -2 off of it
[22:29:10] MediaWiki-Core-Team: Remove ProfilerStandard - https://phabricator.wikimedia.org/T86055#972184 (Chad) Open>Resolved
[22:30:15] VisualEditor, MediaWiki-Configuration, VisualEditor-MediaWiki, MediaWiki-Core-Team: convertExtensionToRegistration.php does not set defaults for globals, so $wgResourceModules += array(...) and similar cause fatals - https://phabricator.wikimedia.org/T86311#972204 (Legoktm)
[22:30:20] Continuous-Integration, MediaWiki-Core-Team, MediaWiki-Configuration: Update jenkins for extension registration changes - https://phabricator.wikimedia.org/T86359#972205 (Legoktm)
[22:30:28] MediaWiki-Core-Team: New LazyDataUpdate classes - https://phabricator.wikimedia.org/T86597#972207 (aaron) NEW a: aaron
[22:31:47] VisualEditor, MediaWiki-Configuration, VisualEditor-MediaWiki, MediaWiki-Core-Team: convertExtensionToRegistration.php does not set defaults for globals, so $wgResourceModules += array(...) and similar cause fatals - https://phabricator.wikimedia.org/T86311#972219 (Jdforrester-WMF) >>! In T86311#968918, @Lego...
[22:35:03] +» » » » » if ( $passwordConfirmed[$wiki] ) {
[22:35:03] 789 +» » » » » » $passingMail[$local['email']] = true;
[22:35:03] 790 +» » » » » }
[22:35:09] hoo: ^ that part seems... :|
[22:35:34] wait
[22:35:37] you didn't add that part
[22:36:01] where's the problem?
[22:40:12] In my head, I read the code wrong
[22:40:13] +2'd
[22:42:27] <_joe_> TimStarling: would you mind seeing if your PCRE patch can apply to HHVM 3.3.1?
[22:47:00] ok
[22:52:33] greg-g: Already have the tracking bug: https://phabricator.wikimedia.org/T1272
[22:52:40] MediaWiki-Core-Team, MediaWiki-extensions-TitleBlacklist: Title blacklist intermittently failing, allowing users to edit things they shouldn't be able to - https://phabricator.wikimedia.org/T85428#972307 (Magog_the_Ogre) This is now occurring on Commons as well. ``` File:10612668 1469368410002666 509505423...
[22:55:39] anomie: subsribing
[22:56:31] +c
[23:00:07] <_joe_> TimStarling: thanks, this would speed me up a lot
[23:01:19] <_joe_> TimStarling: btw my usual deploy strategy for a package is: deploy on beta and CI, wait a few days to see if disasters happen, deploy to a subset of live hosts (actually 10 appservers and 5 api servers), wait ~3 days, deploy to the rest of the cluster
[23:04:19] ori: We should come up with a hackathon-like collaboration project to work on during the "slack time" in the all-hands/tech summit week. Is there something you've been wanting to do and just not found time poke at?
[23:04:33] *time to poke at
[23:05:24] _joe_: backporting this is lots of fun since I had to move every existing asterisk character in the file to get it to pass code review
[23:05:52] <_joe_> shit.
[23:06:06] bd808: (a) I never finished the work to make PyBal optionally monitor more than one port on a backend host (right now it maintains an idle connection to port 80, but it should do that for port 9000 / FastCGI too).
[23:06:25] I did the cherry-pick; there are quite a lot of conflicts
[23:06:28] anyway, I'll work through it
[23:06:51] <_joe_> ori: I'd love to work on that too btw
[23:07:16] <_joe_> but I already have the "push HHVM to debian" goal of mine
[23:07:38] (b) I'd like to produce flame graphs from exception and fatal traces. Rather than showing the relative time code is on-CPU, as the current graphs show, the exception/fatal graphs would show which code-paths are most error-prone.
[23:08:20] both sound interesting
[23:10:04] getting the exception traces out of logstash might be fun for the flame graphs
[23:10:22] assuming I have pretty traces in logstash by then
[23:11:03] (c) AaronSchulz is out to kill MWException; we could support him in that.
[23:11:26] yeah. That would be awesome too.
[23:13:29] Sounds like there are plenty of ideas. Maybe we can sketch out some details this week and make tasks in phab
[23:14:11] rationale for the latter?
[23:14:24] makes extracting libraries easier
[23:15:16] I think when we looked at this a couple months ago we found that not many things that extended MWException really used the features of that class
[23:17:06] bd808: Not sure I told you... but the logging stuff (wfDebugLog) seems alright (yay)
[23:17:16] but having a fatal.log for all hhvm fatals would be awesome
[23:18:36] we may be able to get that. hhvm calls the installed error handler on most fatals as I recall.
[23:20:11] hoo: Actually it looks like we may be sending all hhvm fatals to the "error" channel now (or at least trying to)
[23:20:42] https://github.com/wikimedia/mediawiki/blob/master/includes/exception/MWExceptionHandler.php#L236-L261
[23:21:05] mh
[23:21:12] why does a class with the name MWExceptionHandler handle fatals? :P
[23:22:07] because when I wanted to add a shutdown function to log errors I put the logic there :)
[23:22:14] InitialiseSettings.php: // 'error' => "udp://$wmfUdp2logDest/error", // Logs warnings, notices and such
[23:22:23] ah, dammit
[23:22:26] why is that?
[23:22:31] do we also throw notices in there?
[23:23:32] Having fatals logged would help me a lot... and I guess I'm not alone there
[23:23:41] The notices part is from Sam and Timo's attempt to get stack traces for notices I think
[23:24:21] I think there may be an outstanding patch or two related to that too.
[23:27:23] I see...
[23:27:29] but well, it's an actual problem
[23:29:13] yeah we can fix it up I think. Easy enough to log to the fatal channel on fatals
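
The pattern behind the MWExceptionHandler link above, reduced to a standalone sketch: PHP (and HHVM, for most fatals, per bd808's recollection) still runs shutdown functions after a fatal, so a shutdown hook can inspect error_get_last() and forward fatals to a dedicated channel. Plain error_log() stands in here for the real routing; the actual code hands the entry to the logging layer instead:

    <?php
    // Sketch: catch fatals in a shutdown function and log them separately.
    register_shutdown_function( function () {
        $fatals = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR | E_USER_ERROR;
        $error = error_get_last();
        if ( $error && ( $error['type'] & $fatals ) ) {
            // Real code: route this to a 'fatal' log channel instead.
            error_log( sprintf(
                '[fatal] %s at %s:%d',
                $error['message'], $error['file'], $error['line']
            ) );
        }
    } );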
[23:29:50] looking on fluorine to see if it is even working as hoped
[23:31:12] ugh there's noise in there from suppressed things too
[23:31:49] actually that may be what Timo has a patch open for that I was supposed to review
[23:33:01] Would be nice to get that done
[23:33:04] manybubbles: cirrusSearchLinksUpdate: 15050363 queued; 1 claimed (1 active, 0 abandoned); 0 delayed
[23:33:11] seems kind of high for enwiki
[23:37:39] MediaWiki-Core-Team: Fix onResponses in VirtualRESTServiceClient - https://phabricator.wikimedia.org/T85016#972445 (bd808) Open>Resolved
[23:37:55] MediaWiki-Core-Team, Wikidata, wikidata-query-service: Validate Java 8 package - https://phabricator.wikimedia.org/T85174#972449 (bd808) Open>Resolved
[23:38:02] MediaWiki-Core-Team, SUL-Finalization, MediaWiki-extensions-Renameuser: Renameuser has no debug logging - https://phabricator.wikimedia.org/T85042#972450 (bd808) Open>Resolved
[23:38:11] MediaWiki-Core-Team, operations: HHVM gets stuck in what seems a deadlock in pcre cache code - https://phabricator.wikimedia.org/T1194#20577 (bd808)
[23:43:26] ori: https://gerrit.wikimedia.org/r/#/c/183999/
[23:54:36] anyone know why mw.format would be undefined in the default vagrant setup?
[23:54:56] this tells me it's available since 1.25: https://doc.wikimedia.org/mediawiki-core/master/js/#!/api/mw
[23:55:12] and Special:Version says vagrant is at 1.25 alpha