[00:00:54] bd808: I'm supposed to bug you about the SoS task for getting wikitech onto the train. Is that something you want?
[00:01:07] Ops isn't really sure about the what/why of it
[00:01:14] ffs
[00:02:29] so wikitech is using hetdeploy today thanks to work by Andrew, Sam and myself, but scap and sync-* don't update it because Andrew was scared to give shell deployers access to virt1000 so that scap can actually be used
[00:03:01] The state of wikitech today is "works as long as a root remembers to run sync-common once in a while"
[00:03:20] this is dumb and it should just be deployed during a normal scap
[00:03:49] or we should take it back out of hetdeploy and just let ops update it "whenever"
[00:05:42] I can say all of this on the phab task I guess. Not sure how much more I can do than that.
[00:07:49] MediaWiki-Core-Team: Investigate memcached-serious error spam that mostly effects HHVM API servers - https://phabricator.wikimedia.org/T75949#977876 (tstarling) Open>Resolved a: tstarling If we changed something and it fixed the problem, that's sufficient confirmation for me.
[00:17:29] pffft. compared to our ability to screw up enwiki, virt1001 sounds sooooo insignificant :P
[00:18:08] speaking of deployment....
[00:20:21] bd808, i had a freak accident yesterday when a routine config change resulted in a flood of warnings that looked like one file was pushed, but not another. it was fixed only after touching everything in wmf-config and resyncing TWICE
[00:22:34] MediaWiki-Core-Team: Investigate master revision query on enwiki edit form - https://phabricator.wikimedia.org/T86862#978027 (aaron) NEW a: aaron
[00:35:29] the_nobodies: yuck. I'm not even sure what touching anything other than InitialiseSettings would do.
[00:35:42] oh, maybe HHBC cache invalidation?
[00:43:35] bd808: so I have a prototype web app that you post your composer.json files to and it spits out a tarball of the vendor directory
[00:54:28] MediaWiki-Core-Team: Investigate master revision query on enwiki edit form - https://phabricator.wikimedia.org/T86862#978264 (aaron)
[01:06:29] legoktm: That sounds neat. Who is the audience?
[01:10:49] bd808: the extension management system that exists in my head. Instead of ExtensionDistributor packaging dependencies, the magical web interface downloads tarballs, checks if they have composer.json files, and if they do it sends them to this web app, which gives it a vendor directory that has everything they need
[01:12:06] Ah. So a web distribution configurator for MW that spits out a single tarball for you to unpack and run, basically?
[01:13:08] yes, except this only handles the vendor stuff for now
[01:13:56] *nod* Do you have it running in labs or just on your laptop?
[01:16:00] MediaWiki-Core-Team: Support a nice sso experience with MediaWiki's OAuth - https://phabricator.wikimedia.org/T86869#978306 (csteipp) NEW
[01:17:29] just on my laptop for now
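A minimal sketch of what such a vendor-building service could look like, assuming a plain node.js implementation; the port, route handling, and composer flags are illustrative guesses, not taken from the actual composer-proxy code:

```javascript
// Hypothetical sketch of the composer.json -> vendor tarball service
// described above. Assumes `composer` and `tar` binaries are on PATH.
var http = require('http');
var fs = require('fs');
var os = require('os');
var path = require('path');
var execFile = require('child_process').execFile;

http.createServer(function (req, res) {
    if (req.method !== 'POST') {
        res.statusCode = 405;
        return res.end();
    }
    var chunks = [];
    req.on('data', function (c) { chunks.push(c); });
    req.on('end', function () {
        // Write the posted composer.json into a scratch directory.
        var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'composer-'));
        fs.writeFileSync(path.join(dir, 'composer.json'), Buffer.concat(chunks));
        // Resolve and download the dependencies into dir/vendor.
        execFile('composer', ['install', '--no-dev'], { cwd: dir }, function (err) {
            if (err) {
                res.statusCode = 400;
                return res.end(String(err));
            }
            // Package the vendor directory and stream it back.
            execFile('tar', ['-czf', 'vendor.tar.gz', 'vendor'], { cwd: dir }, function (tarErr) {
                if (tarErr) {
                    res.statusCode = 500;
                    return res.end(String(tarErr));
                }
                res.setHeader('Content-Type', 'application/gzip');
                fs.createReadStream(path.join(dir, 'vendor.tar.gz')).pipe(res);
            });
        });
    });
}).listen(8080);
```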
[01:18:56] MediaWiki-Core-Team, Librarization: Move content of library guidelines RfC to Manual namespace on mediawiki.org - https://phabricator.wikimedia.org/T86870#978328 (bd808) NEW a: bd808
[01:33:33] anyone have a list of our top-level domains handy?
[01:33:47] ones actually hosting projects (rather than just redirecting / typosquatting)
[01:34:01] ori: Fatalmonitor is full of "Warning: Failed connecting to redis server at fluorine.eqiad.wmnet: Connection timed out" I think it's the profiler
[01:34:21] yeah, sounds like xenon
[01:34:26] i'll check, thanks for flagging
[01:34:40] ~490 per 5 minutes
[01:34:52] i bet it's the iptables stuff faidon pushed out
[01:35:01] I was just thinking the same thing
[01:35:21] udp2log added ferm with a tcp drop today
[01:35:35] yep. redis is up and accepting connections. i can connect from fluorine itself. but not from other hosts.
[01:35:40] it broke scap in beta for a while too
[01:35:58] i already poked aaron about that
[01:36:38] before i git-log, do you have the change # handy?
[01:37:13] I don't have it any more. I had it in my browser earlier today but closed it
[01:37:46] Ia91a6816b6d40e96e801e06981be1638621fb597
[01:37:57] yeah, I just found it too. Thanks
[01:38:42] beta's fix was https://gerrit.wikimedia.org/r/#/c/185085/
[01:38:57] Not redis but the same type of problem
[01:39:04] that's a bit sloppy of faidon, it can't be that hard to run netstat and see which ports a machine is listening on
[01:40:42] bd808, more sync fun: https://gerrit.wikimedia.org/r/#/c/185106/ didn't make it to mw1024
[01:41:52] the_nobodies: o_O did sync-file at least tell you it failed?
[01:45:22] hmm
[01:45:52] i remember it barked about something, but closed that window too soon
[02:03:52] bd808|BUFFER: I put the code up at https://github.com/legoktm/composer-proxy
[02:05:02] legoktm, Krinkle: thanks a ton for jumping in to debug the schema HTML bug
[02:05:48] np
[02:43:55] I didn't realise that TitleBlacklist logging patch of csteipp's was never fully deployed
[02:45:46] By the time I went to deploy it, Brad had confirmed it... but it should have gone out
[02:46:33] I'm just wondering how to confirm the fix
[02:47:24] I think I'll enable your log
[03:05:18] MediaWiki-extensions-TitleBlacklist, MediaWiki-Core-Team: Title blacklist intermittently failing, allowing users to edit things they shouldn't be able to - https://phabricator.wikimedia.org/T85428#978563 (tstarling) It is probably fixed, but I have enabled some extra logging to be sure. We will need 24 hours...
[03:56:40] TimStarling: do you know what the zhwiki:* SERVER ERROR failures in memcached-serious are about?
[03:57:07] maybe a value which is too large
[03:58:14] * ori will poke further.
[03:58:21] (memcached, not you :P)
[04:37:31] the poor observability of memcached has been implicated in several of our most severe bugs
[04:38:00] indirectly, at least. not causing the bug, but allowing it to go undetected.
[05:02:08] MediaWiki-Core-Team, MediaWiki-API: Clean up ApiResult and ApiFormatXml, create format=json2 - https://phabricator.wikimedia.org/T76728#978676 (Mattflaschen)
[05:20:28] node.js is purely cooperative multitasking, right? it has no pre-emption or threading?
[05:21:49] apparently nodejs.org only has an API reference, and a link to some lessons you can take if you don't understand the API reference; I haven't found any architecture discussion except on reddit.com
[06:08:35] TimStarling: basically, yes. There is the cluster module (http://nodejs.org/api/cluster.html) which gives you an API for forking and supervising node.js subprocesses, but it was bolted on as an afterthought
[06:11:35] TimStarling: when nodejs came up on [engineering] I wrote a reply to CScott explaining why I think it's a horrible development environment by looking at one specific and representative example. I never sent it because I thought it wouldn't do anything except fan a flame-war, but here it is, for your reading enjoyment: https://dpaste.de/eEie/raw
[06:14:59] this is also great reading:
[06:15:00] https://code.google.com/p/v8/issues/detail?id=3692#c6
[06:15:17] well, it's an asynchronous paradigm
[06:15:57] the use case is a bit synthetic
[06:16:02] yes, but there's fs.readFileSync(), because there's a difference between nudging you toward async APIs and cramming async down your throat and not giving you an alternative
[06:17:02] it seems like your objection would be resolved if readFileSync() were removed
[06:17:32] then async would be consistently crammed down your throat, instead of inconsistently
[06:17:37] making the dev experience consistently miserable, you mean, rather than having an uneven sprinkling of convenience
[06:17:38] right
[06:19:12] It's not really synthetic, I mean, there's an SO question and the top answer (which recommends the library that busy-loops) has 48 upvotes
[06:23:35] nodejs's author abandoned it; the version of the engine it runs on is no longer supported ("Sometimes, it's possible to cherry-pick patches from upstream but only if they are small and apply cleanly. In general, however, you should not expect any fixes to be forthcoming."); and now the most active contributors have forked the project into io.js, citing irreconcilable differences.
[06:24:53] http://i.imgur.com/leIv0Ah.gif
[06:25:42] I think it's fine for an asynchronous framework to be consistently asynchronous
[06:26:02] the right answer to the question "how do I do synchronous thing X in asynchronous platform Y" should really be "don't do that"
[06:26:42] maybe there is some confusion or overreach about node's use cases
[06:27:06] as long as everyone agrees it's a special-purpose framework rather than a suitably-generic development environment
[06:27:47] yeah, definitely
[06:32:22] well, there are generators now, with yield
[06:33:33] you could possibly transform async I/O to something resembling synchronous I/O using a similar feature
[06:34:05] i.e. say if readFileSync() actually yielded control to the main event loop
[06:34:32] then other events could be processed, then the readFileSync() caller could be resumed once data is available
[06:35:11] of course it would break some data consistency assumptions, most code would not expect variables to randomly change during a readFileSync() call
[06:39:14] maybe some day
[06:41:22] tracking bugs are still allowed, right?
[06:42:23] maybe not, I see T2001 is closed
[06:45:51] tracking bugs that are intended to be indefinitely open are typically converted to projects now
[07:02:58] it says something about our approach to projects
[07:07:10] the problem with 'yield' and using ES6 features in nodejs generally (except trivial ones that can be shimmed) is that it undercuts the one good value proposition nodejs makes, which is the ability to move rapidly between front-end and back-end code without having to context-switch.
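For illustration, the async-to-pseudo-synchronous transformation sketched at [06:33:33]-[06:34:32] could look roughly like this with ES6 generators; the run() driver and the thunk convention are hand-rolled assumptions here, not node APIs (and at the time generators still sat behind a --harmony flag):

```javascript
var fs = require('fs');

// Drive a generator: each yielded value is a function expecting a
// node-style callback; the generator is resumed when that callback fires,
// so the event loop keeps running while I/O is in flight.
function run(genFn) {
    var it = genFn();
    function resume(err, data) {
        var step = err ? it.throw(err) : it.next(data);
        if (!step.done) {
            step.value(resume);
        }
    }
    resume();
}

run(function* () {
    // Reads like readFileSync(), but control returns to the main event
    // loop until the data is available, as described above.
    var data = yield function (cb) {
        fs.readFile('/etc/hosts', 'utf8', cb);
    };
    console.log(data.length);
});
```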
[07:08:05] ES6 is not going to be usable for front-end code any time soon
[07:08:33] we're still avoiding ES5 constructs in mediawiki (though oojs depends on an es5 shim library, so that may change)
[07:18:13] <_joe_> good morning
[07:29:13] hello
[12:01:41] MediaWiki-extensions-CentralAuth, SUL-Finalization, MediaWiki-Core-Team: Cherry-pick and deploy fd41d20010 from facebook/hhvm - https://phabricator.wikimedia.org/T86813#979247 (Joe)
[13:32:09] Wikidata, wikidata-query-service, MediaWiki-Core-Team: Write example queries in different query languages - https://phabricator.wikimedia.org/T86786#979381 (Lydia_Pintscher)
[13:32:10] Wikidata, wikidata-query-service, MediaWiki-Core-Team: Wikidata Query - make unit tests for domain specific language - https://phabricator.wikimedia.org/T86832#979379 (Lydia_Pintscher)
[14:06:29] * manybubbles is removing ants from the tea side of his coffee maker.....
[14:50:36] manybubbles: Is ant tea any good?
[14:50:43] hellish
[15:19:27] * anomie hopes the tree trimming next door doesn't drop a branch to take out his internet
[15:58:13] MediaWiki-ResourceLoader, operations, MediaWiki-Core-Team: Bad cache stuck due to race condition with scap between different web servers - https://phabricator.wikimedia.org/T47877#979615 (Nemo_bis) > So we think this is inherent to the design of our deploy mechanisms? If so, this report should be moved rom M...
[15:59:43] MediaWiki-ResourceLoader, operations, MediaWiki-Core-Team: Bad cache stuck due to race condition with scap between different web servers - https://phabricator.wikimedia.org/T47877#979621 (matmarex)
[16:00:57] MediaWiki-ResourceLoader, operations, MediaWiki-Core-Team: Bad cache stuck due to race condition with scap between different web servers - https://phabricator.wikimedia.org/T47877#979631 (Jdforrester-WMF) It's an over-optimisation in ResourceLoader which creates a bug exposed by the way we deploy.
[17:22:41] MediaWiki-Core-Team: Support a nice sso experience with MediaWiki's OAuth - https://phabricator.wikimedia.org/T86869#979812 (csteipp)
[17:22:51] MediaWiki-Core-Team, MediaWiki-extensions-OAuth: Support a nice sso experience with MediaWiki's OAuth - https://phabricator.wikimedia.org/T86869#978306 (csteipp)
[17:32:28] hah, thanks legoktm
[17:32:46] (for the bump on friendsofphp)
[17:32:58] and merged!
[17:33:15] Now let's test how often sensio deploys code... ;)
[17:41:01] csteipp: it's already deployed :D
[17:41:08] The checker detected 1 package(s) that have known* vulnerabilities in
[17:41:09] your project. We recommend you to check the related security advisories
[17:41:09] and upgrade these dependencies.
[17:41:27] Yep, Jenkins just alerted me :)
[17:43:58] Deployment-Systems, MediaWiki-Core-Team, Librarization: Have a check for reported security issues in dependencies - https://phabricator.wikimedia.org/T74193#979878 (csteipp) After https://github.com/FriendsOfPHP/security-advisories/pull/44 was merged, I reran the jenkins build and was notified about the updat...
[17:47:38] Deployment-Systems, MediaWiki-Core-Team, Librarization: Have a check for reported security issues in dependencies - https://phabricator.wikimedia.org/T74193#979889 (greg) >>! In T74193#979878, @csteipp wrote: > Should I have Jenkins email security@ instead of just me? +1 to redundancy
[17:56:33] csteipp: so now it just needs to be turned into a real jenkins check? or is that already done?
[17:57:31] legoktm: yes, it does... I'm not sure what that's supposed to look like, but yes.
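A hedged sketch of what that "real jenkins check" might look like as a build-step script: shell out to the SensioLabs security-checker against composer.lock and fail the build on any advisory. Treat the exact CLI invocation and its exit-code behavior as assumptions based on the tool's documentation of the era:

```javascript
// Hypothetical Jenkins build step: run the SensioLabs security-checker
// and mark the job failed if it reports known vulnerabilities.
var execFile = require('child_process').execFile;

execFile('security-checker', ['security:check', 'composer.lock'],
    function (err, stdout, stderr) {
        process.stdout.write(stdout);
        if (err) {
            process.stderr.write(stderr);
            process.exit(1); // non-zero exit fails the Jenkins job
        }
    }
);
```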
[18:16:00] suddenly quiet everywhere with the bots being knocked off by labs updates
[18:16:02] good
[18:16:02] time for breaking changes
[18:16:04] Reedy: https://gerrit.wikimedia.org/r/#/c/185210/ would switch all wikis to log via monolog. The data in logstash looks good. The monolog+redis events are actually logging more events per unit time than the udp2log ones
[18:16:04] Reedy: Wondering if today is a good day to switch them all over?
[18:16:04] MatmaRex: ^ :)
[18:16:25] hmmm except not always apparently. more memcached-serious events recorded via udp2log than redis
[18:19:35] hmmm... logstash isn't keeping up with popping the redis queue. backlog of 103K events on logstash1001
[18:19:41] that's not great
[18:19:59] seems to be decreasing but still
[18:20:56] Wikidata, wikidata-query-service, MediaWiki-Core-Team: Write example queries in different query languages - https://phabricator.wikimedia.org/T86786#979959 (JanZerebecki)
[18:20:57] Wikidata, wikidata-query-service, MediaWiki-Core-Team: Write example queries in different query languages - https://phabricator.wikimedia.org/T86786#979960 (Manybubbles) https://www.mediawiki.org/wiki/Wikibase/Indexing/Query_examples
[18:23:24] redis queue depth is 0 on logstash1002 and logstash1003. Likely that things are getting backed up on logstash1001 because it is also processing all of the udp2log events
[18:24:34] I should figure out how to trim those down since quite a few of them are duplicates now (apache2, hhvm) and more will be once we get monolog running everywhere
[18:25:37] backlog down to 65K now, so it clears pretty fast
[19:38:46] "more memcached-serious events recorded via udp2log than redis" duh. redis is not getting wikipedia events yet
[20:00:58] ori: Did you figure out what is broken about the "zhwiki:preprocess-hash:715b4007d719254ecd0ba9a09e83dfe1:1" and "zhwiki:preprocess-hash:b878bc90c624257155bc0aa0e4b4e0c5:1" memcached keys?
[20:02:21] bd808: no. the ones in the log had expired by the time i looked. i meant to tail -f the log so i can catch one live but haven't yet.
[20:06:11] Community-Engagement, MediaWiki-Core-Team, MediaWiki-extensions-General-or-Unknown: Use structured feedback for MediaWiki core's feedback tool - https://phabricator.wikimedia.org/T86956#980164 (Whatamidoing-WMF) NEW
[21:14:54] Librarization, MediaWiki-Core-Team, Deployment-Systems: Have a check for reported security issues in dependencies - https://phabricator.wikimedia.org/T74193#980413 (hashar) Can we get the job defined with Jenkins Job Builder in `integration/config.git`. Multiple people should be able to help doing the convers...
[21:31:11] Deployment-Systems, MediaWiki-Core-Team, operations, Release-Engineering: Update servers in scap rsync proxy pool - https://phabricator.wikimedia.org/T1342#980513 (greg) >>! In T1342#974944, @Reedy wrote: > https://gerrit.wikimedia.org/r/#/c/184817/ was merged yesterday, we good here?
[22:52:46] Tim-away: now that the jobrunners are on HHVM it's easy to see what is keeping them busy:
[22:53:48] also, what was the context for you asking about node.js yesterday? Were you planning on bringing up the issue of development environments for services? If not, I think I will. Mostly I just want there to be an alternative; I'm not really picky about what it is.
[22:54:06] very nice
[22:55:16] I was asking about node because I was planning a solution for T45888
[22:57:43] So the task calls for a thin service that handles requests by making multiple requests to backend services and composing the result into a single response?
[22:59:22] that's what gwicke thinks
[22:59:32] like I said there, I am leaning towards an internal batcher
[23:01:38] batching imageinfo requests doesn't work as well as I had hoped, since without core modifications, it only allows a single urlwidth to be specified for the whole batch
[23:06:55] TimStarling: how much of the per-request overhead is time spent in MediaWiki code, and how much is in lower layers (HTTP and TCP)?
[23:07:42] I haven't measured it with HHVM, but presumably almost all is in MW
[23:09:00] those xenon flame graphs omit file-level code
[23:10:36] oh! Duh. Yeah. I'm stripping include / require. I didn't think they'd be useful but that's clearly wrong.
[23:17:25] was the xbox stuff hopelessly broken? IIRC Facebook uses it in production, so it can't be that unstable. I'm looking at hphp/runtime/server/xbox-server.cpp, and it appears that the underlying protocol is HTTP. So having node.js dispatch xbox requests wouldn't be hard.
[23:18:35] I tried running the HHVM unit tests under an xbox server, and a lot of them crashed for xbox-related reasons
[23:19:51] hmm, or was that pagelet?
[23:19:57] one of those other servers anyway
[23:20:08] I think you tried both and neither worked very well
[23:20:15] <^d> Did you guys try and Xbox 360 or Xbox One?
[23:20:43] swtaarrs: any insights to share on xbox / pagelet? Is it true that you guys use them in production? Is it sane to expect them to be stable for production use?
[23:20:43] <^d> s/and/an/
[23:20:55] yeah we use pagelets pretty heavily in prod
[23:21:01] and lots of scripts use xbox
[23:21:19] what's the case for using one or the other?
[23:21:21] TimStarling: how exactly did you run them?
[23:21:54] my IRC logs probably have the details...
[23:22:12] ori: I'm not really sure what the difference between the two is, but they're roughly equivalent to kicking off an async curl request to yourself
[23:22:16] and doing other stuff while that processes
[23:22:30] we use pagelets to render different parts of the page in parallel
[23:22:31] 17:27 TimStarling: xbox test rig: http://paste.tstarling.com/p/cHnKql.html
[23:22:37] yeah that's the one
[23:22:48] 23:07 TimStarling: xbox turns out to be totally broken for its own reasons
[23:22:49] 23:07 TimStarling: so I set up apache/fastcgi instead, and hit it with siege, it worked just as well
[23:22:49] 23:08 TimStarling: uncovered bugs that were actually bugs in my code rather than in xbox
[23:23:13] 23:31 ori: TimStarling: is the pagelet server any better?
[23:23:13] 23:31 TimStarling: yes, I was just wondering that myself
[23:23:13] 23:31 TimStarling: the pagelet server uses HttpRequestHandler to handle requests, same as fastcgi
[23:23:15] 23:32 TimStarling: there are only two entry points that correctly call requestInit() -- CLI and HttpRequestHandler
[23:23:17] 23:32 TimStarling: so pagelet should be OK since it uses HttpRequestHandler
[23:23:42] 2014-07-03, fwiw
[23:23:55] I think the xbox server is meant to be used with fb_call_user_func_async, which I'm not sure exists in the open source build
[23:24:19] you missed this bit which swtaarrs probably is interested in:
[23:24:31] Jul 04 16:23:01 it is a bit ridiculous, it calls the requestShutdown() hook but not the requestInit() hook
[23:24:31] Jul 04 16:23:31 well, mostly calls requestShutdown() -- about one in every 10 requests, requestShutdown() was omitted and it would just start the next request without calling it
[23:24:52] you could try asking mwilliams in #hhvm-dev, he knows a lot more about it than I do
[23:24:56] heh
[23:30:33] here is the full IRC log: http://paste.tstarling.com/p/ZgecAp.html
[23:32:22] I synced the change to include file and closure scopes in xenon traces, so the next full hourly graph should give some indication of the relative time we spend in those
[23:37:44] so the way this would presumably work is: have mediawiki fully initialize, and then handle xbox/pagelet api requests in a loop
[23:39:13] parsoid would need very few modifications, if any, because token handlers would still be making http requests
[23:39:38] the only issue is, as gwicke says: "Ideally we would also enforce separation between batched requests to avoid non-deterministic behavior for stateful extensions."
[23:40:33] pagelet and xbox requests start with a new, independent state, there's no function exposed to userspace code to accept a pagelet request
[23:41:13] even if there was, you know how terrible HHVM's memory management is
[23:41:36] you'd have to restart the server every few requests to release memory
[23:41:44] xbox_process_message?
[23:42:22] you don't call that, the runtime calls that
[23:42:58] ahh.
[23:43:43] so 'The pagelet server is similar to a CURL call to localhost.' --- how is it any better than just, err, making a CURL call to localhost?
[23:44:10] is the idea that it's a means of having private entrypoints?
[23:45:55] but anyhow, even if pagelet and xbox don't let you do that, there's no reason you couldn't use message-passing to process api requests in an already-initialized mediawiki context
[23:46:22] you're sure there's no reason?
[23:46:43] that sounds a bit scary
[23:47:17] what's the worst that could happen? leakage of private information between requests and horrible memory leaks
[23:47:21] all good fun
[23:47:24] I know I made global state hacks in CentralAuth and I bet I wasn't the first person to do something nasty like that
[23:48:34] of course, it's a reasonable idea, on the face of it
[23:48:37] OK, so now I get why you're interested in the initialization time of MediaWiki
[23:49:00] you know domas wrote a CLI mode HTTP server embedded in MW for this purpose back in about 2006
[23:50:11] there are a number of little gotchas
[23:50:28] like the fact that we can't switch between wikis without starting up MW again
[23:51:06] memory management issues would require short-lived workers
[23:51:07] right.
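To make the message-passing idea concrete: since the observation at [23:17:25] is that the underlying xbox protocol is just HTTP, a node.js dispatcher could POST API requests to a long-lived, pre-initialized MediaWiki worker over local HTTP. The port, the /api path, and the worker itself are invented for this sketch:

```javascript
var http = require('http');

// Hypothetical: a persistent MediaWiki worker listening on localhost:9090
// that accepts api.php-style requests as JSON and returns JSON results.
function dispatch(params, callback) {
    var body = JSON.stringify(params);
    var req = http.request({
        host: '127.0.0.1',
        port: 9090,
        path: '/api',
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Content-Length': Buffer.byteLength(body)
        }
    }, function (res) {
        var chunks = [];
        res.on('data', function (c) { chunks.push(c); });
        res.on('end', function () {
            callback(null, JSON.parse(Buffer.concat(chunks).toString('utf8')));
        });
    });
    req.on('error', callback);
    req.end(body);
}

// Example: one API call without paying MediaWiki's startup cost again.
dispatch({ action: 'query', titles: 'Main Page', prop: 'info' }, function (err, result) {
    if (err) { throw err; }
    console.log(result);
});
```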
[23:52:09] how much time *are* we spending in global config/setup? i guess the xenon graphs will show that soon enough, but what was the figure you had yesterday?
[23:52:13] there is the kind of thing that bd808 mentions, just the long tail of request-local assumptions in class static data etc.
[23:53:15] and there is the issue of poor library/runtime support for this kind of thing
[23:53:33] the figure I had yesterday was 21ms for LocalSettings.php and 11ms for Setup.php
[23:54:06] with xhprof I measured an API request that did essentially nothing
[23:54:42] so it was those two kinds of startup overhead, plus another 10ms or so for ApiMain, output, shutdown, etc.
[23:54:55] ~40ms overall to do basically nothing
[23:55:14] IIRC it was a query request with an empty title list
[23:55:49] * ori nods
[23:56:17] and, lastly, is it really unrealistic to expect that parsoid won't need to make these API calls in our lifetime?
[23:57:09] it's not unrealistic
[23:57:17] my brain just refuses to be ok with the fact that the thing that was supposed to replace the php parser is its biggest user
[23:57:19] batching is pretty easy though, so it shouldn't be a big issue
[23:57:36] nod
[23:57:53] another thing that we can do is provide an API for extension tag expansion
[23:58:17] basically a batch API that calls Parser::extensionSubstitution on structured input
[23:58:47] that sounds like a good idea
[23:58:49] currently parsoid invokes the whole Parser::parse() for every extension tag
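A sketch of what the Parsoid-side batching could look like: queue extension-tag expansions for one tick, then ship the whole batch in a single API request. The action=batchexpand module and the postToApi() helper are invented stand-ins for the proposed batch API and Parsoid's real request code:

```javascript
// Hypothetical internal batcher: collect extension tag expansion requests
// made during one tick and flush them as a single API call.
var pending = [];
var scheduled = false;

function expandExtensionTag(name, attrs, content, callback) {
    pending.push({ name: name, attrs: attrs, content: content, callback: callback });
    if (!scheduled) {
        scheduled = true;
        process.nextTick(flush);
    }
}

function flush() {
    var batch = pending;
    pending = [];
    scheduled = false;
    // One api.php round trip for the whole batch instead of one per tag.
    postToApi({
        action: 'batchexpand', // invented module name, per the discussion above
        tags: JSON.stringify(batch.map(function (t) {
            return { name: t.name, attrs: t.attrs, content: t.content };
        }))
    }, function (err, results) {
        batch.forEach(function (t, i) {
            t.callback(err, err ? null : results[i]);
        });
    });
}

// Stub standing in for an HTTP POST to api.php, so the sketch runs as-is.
function postToApi(params, cb) {
    process.nextTick(function () {
        cb(null, JSON.parse(params.tags).map(function (t) {
            return '<expanded ' + t.name + '>';
        }));
    });
}

// Example: both expansions end up in one batch.
expandExtensionTag('ref', {}, 'a citation', function (err, html) { console.log(html); });
expandExtensionTag('poem', {}, 'some verse', function (err, html) { console.log(html); });
```

With something along these lines, a parse would pay the ~40ms MediaWiki startup overhead quoted above once per batch rather than once per extension tag.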