[00:00:00] ?
[00:00:05] they're much higher
[00:00:12] the scale is different on those graphs
[00:00:47] oh, hm
[00:02:13] it was like 300 => 500
[00:02:32] kind of ironic
[00:03:16] https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1418342580.337&target=MediaWiki.WikiPage.doEditContent.tavg&target=HHVM.WikiPage.doEditContent.tavg&from=-5weeks&logBase=10
[00:06:00] i'm not sure why the zend numbers dip toward the end
[00:06:17] yeah, they actually kind of meet up at 500
[00:07:02] no idea what the increase was, it wasn't anything weekly
[00:25:55] SMalyshev: did the meeting happen today? I had power outage and low smartphone battery issues.
[00:58:37] AaronSchulz/ori: can one of you take a look at https://phabricator.wikimedia.org/T78237 ? it looks like it may be related to the editstash thing you're working on, and anomie thinks it may be replag (which we could fix by having vary-revision pull from DB_MASTER, if that wouldn't hurt performance too much)
[00:59:19] sure, looking.
[01:23:59] jackmcbarn: what makes you suspect the edit stash thing?
[01:26:14] ori: i see it's doing stuff reusing revisions and checking vary-revision for something, and i didn't look closely at its logic for correctness. it's not my most likely suspect. slave lag is
[01:26:27] (is it even on in prod now?)
[01:26:41] it won't cache any output that had vary-revision set, unless there is some bug
[01:26:55] it is, yeah.
[01:27:10] which is based on the output, not the page (or its options cache)
[01:27:49] of course templates with revisionid could change in the meantime, though that is esoteric
[01:28:05] and after a few sec it actually checks for template changes for sanity
[01:29:11] ori: did you deploy the preview thing or just backport it?
[01:29:22] * AaronSchulz doesn't want undeployed changes around
[01:29:42] AaronSchulz: deployed it
[01:32:59] didn't see the log entry
[01:34:01] ori: https://gerrit.wikimedia.org/r/#/c/178787/ might help saving too
[01:34:11] at least when any server is lagged
[01:35:52] yes, there isn't a log entry, but i synced it
[01:35:58] i checked two random servers just now
[01:37:52] 3MediaWiki-Core-Team, MediaWiki-extensions-CentralAuth: No valid null revision produced during global rename - https://phabricator.wikimedia.org/T76975#843298 (10RobLa-WMF) a:3Legoktm
[02:05:41] AaronSchulz: thoughts about having vary-revision pull from DB_MASTER? I don't know what the impact would be
[02:08:24] where?
[02:12:56] where what?
[02:14:02] I don't know what code is being talked about
[02:16:04] oh, i don't know either. i assumed you would. jackmcbarn?
[02:23:33] AaronSchulz: when a page transcludes itself, it sets vary-revision, so that it doesn't transclude the old version of itself on parses during saves
[02:23:47] vary-revision makes it reparse itself during saving, but it's not working on production
[02:24:01] (oh, you knew that)
[02:24:16] what are you asking then?
[02:27:58] 3MediaWiki-Page-editing, MediaWiki-Core-Team: ipblocks query from EditPage unconditionally goes to master - https://phabricator.wikimedia.org/T51419#843344 (10RobLa-WMF)
[02:34:58] actually, i think i may see a way to get rid of vary-revision
[02:44:19] jackmcbarn: it doesn't load anything from the master
[02:44:26] that's why I was confused
[02:44:34] right. i was suggesting we change it so that it should
[02:44:36] or slave
[02:44:43] well, not directly
[02:44:51] a long chain of function calls down, inside the parser, it does
[02:45:09] you mean using an old template?
[02:45:23] an old template?
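(An aside on the stash check mentioned at [01:26:41]: a minimal sketch of what such a guard looks like, assuming ParserOutput's flag API. The variable names are illustrative, not the actual edit-stash code.)

    // If the parse depended on the (not yet assigned) revision ID, the
    // output can't be safely reused once the real revision exists, so
    // refuse to stash it.
    if ( $parserOutput->getFlag( 'vary-revision' ) ) {
        return false; // skip stashing this output
    }
    $cache->set( $key, $parserOutput, $ttl );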
[02:45:40] I mean the template text being stale due to lag
[02:45:43] yes, that
[02:46:07] anyway, the new idea i just thought of to fix this is to use the currentRevisionCallback to ensure it always gets the latest revision
[02:46:10] i'm going to see if it works
[02:46:37] we only have like <= 1 sec of lag normally
[02:46:46] but that's all it takes
[02:46:53] anything more than 0 pretty much does it
[02:47:03] if the templates didn't change much it wouldn't matter
[02:47:16] the problem is pages that look at their own content
[02:47:27] {{foo}}aaa
[02:47:27] is there some template that keeps getting edited all the time that also has pages getting edited all the time that also *need* that to be up to date?
[02:47:41] i change aaa to bbb, and instead of seeing "bbbbbb", i see "aaabbb"
[02:47:45] yeah, I guess self-transcludes would suffer much more
[02:47:52] yes, that's what i noticed the problem with
[02:47:58] i don't think there's a problem at all for non-self-transcludes
[02:47:59] otherwise it would be so unlikely it wouldn't matter
[02:48:09] so you want self-transcludes to load via the master but nothing else?
[02:48:13] yes
[02:48:17] at least on edit
[02:48:19] yes
[02:48:36] that sounds reasonable
[02:48:47] though I hate that parser function :)
[02:48:56] which parser function?
[02:49:37] the revision ID one, though this is a bit more general
[02:49:48] nuking that won't help here
[02:49:55] i'm going to try currentRevisionCallback to see if it will work
[02:54:03] * AaronSchulz goes back to making a hacky WDQ => orient sql parser
[03:37:12] woot, i got it to work with a callback, so no DB_MASTER needed
[04:01:26] AaronSchulz: sorry, missed your msg. Yes, we had a meeting. mostly discussed ops things
[04:13:05] 3MediaWiki-API, MediaWiki-Core-Team: allpages filterlanglinks DBQueryError - https://phabricator.wikimedia.org/T78276#843514 (10jayvdb) There is a very similar, but less frequently occurring case in T73971 : generator imageusage iufilterredir=redirects
[05:54:57] TimStarling: check out (8.4 mb, but worth it)
[05:56:44] that's from an hour of xenon-captured stack traces, with a snapshot taken once every 10 minutes per server
[06:06:03] <_joe_> ori, TimStarling: do you see any possible pitfalls in upgrading the imagescalers to trusty/HHVM on the code side? I have been looking at packages and there is not so much work to do
[06:06:13] ori: very nice
[06:06:20] <_joe_> so, monday could be a good time to try
[06:06:47] i know very little about the image scalers, unfortunately, but we have enough capacity now that we could just image one and try, no?
[06:07:06] _joe_: upgrading imagescalers is generally pretty scary, it would probably be good to notify the community in advance
[06:07:35] <_joe_> ori: that was the plan
[06:07:39] I don't think I would want to try one and see how it goes
[06:07:39] <_joe_> TimStarling: nod
[06:07:48] not without announcing first
[06:07:59] what sort of things are liable to go wrong?
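(For reference, jackmcbarn's callback fix above ([02:46:07], [03:37:12]) would look roughly like this. This is a sketch assuming ParserOptions::setCurrentRevisionCallback and Parser::statelessFetchRevision; $page and $newRevision stand in for whatever the saving code has in hand.)

    $popts->setCurrentRevisionCallback(
        function ( Title $title, $parser = false ) use ( $page, $newRevision ) {
            if ( $title->equals( $page->getTitle() ) ) {
                // Self-transclusion: use the revision being saved rather
                // than whatever a (possibly lagged) slave would return.
                return $newRevision;
            }
            // Everything else goes through the normal lookup.
            return Parser::statelessFetchRevision( $title, $parser );
        }
    );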
[06:08:06] the main thing we worry about is rsvg changing details of how it renders SVGs
[06:08:10] <_joe_> TimStarling: I'm pretty sure puppet is ready
[06:08:16] <_joe_> that ^^
[06:08:44] for example, minor changes in font config can screw up SVG rendering pretty badly
[06:08:52] <_joe_> mediawiki now supports the actual librsvg security model, while we had a different implementation
[06:09:05] <_joe_> but that should be ok
[06:09:17] <_joe_> fonts are scary, yes
[06:09:53] I also have concerns specific to HHVM
[06:10:15] we know that fork/exec is slow on HHVM, especially under concurrent load
[06:10:45] so one possible syndrome would be to do a pilot deployment, and it seems slightly slower than how it was before
[06:10:56] so we do a full deployment, ok, still a bit slower
[06:11:02] <_joe_> TimStarling: I kinda expect it
[06:11:20] then someone uploads 2000 new images and requests Special:Newimages
[06:11:38] and we suddenly discover that our maximum throughput has dropped by a factor of 3 and we can't meet capacity anymore
[06:12:10] <_joe_> we can't meet capacity /now/
[06:12:23] <_joe_> in that case, but I do agree fully
[06:12:25] well, all the more reason not to make things worse
[06:12:44] so my inclination would be to first do a throughput test
[06:12:49] <_joe_> ok
[06:12:51] small shell executions
[06:13:08] converting a smallish image to an array of smallish sizes
[06:13:41] then announce, then do a pilot deployment
[06:13:57] then do the throughput test on the new HHVM server with the same benchmark
[06:14:19] then you know if the site will fall over or not
[06:14:31] btw you may want to try benchmarking with the "light process" feature
[06:14:36] yeah, i was about to ask
[06:14:39] what does that do, exactly?
[06:15:06] <_joe_> sorry brb - I will read the backlog
[06:15:32] it forks in advance
[06:15:49] then subsequent shell executions are forked from the child process
[06:16:25] when you fork/exec, you briefly acquire a lock on the kernel's view of the address space of the whole process
[06:16:58] so if you do your high volume forks from a separate worker process, you avoid acquiring the lock on the main process
[06:18:43] it seems like a good technique. is it common?
[06:21:22] I haven't seen it before
[06:21:37] <_joe_> TimStarling: ok, I'll start by benchmarking a tiny php script that just shells out to convert an image to a randomly chosen size from a set
[06:21:55] sounds good
[06:23:34] <_joe_> then *if* everything goes the way we'd like I'll convert one "normal" appserver to be an HAT imagescaler
[06:23:41] <_joe_> out of the cluster
[06:23:46] <_joe_> so that we can do more tests
[06:23:53] <_joe_> rinse, repeat
[06:23:59] <_joe_> ori: sounds good?
[06:24:26] sure, if TimStarling thinks so.. I really know very little about the image scaler setup.
[06:25:16] if the concerns are (a) performance under high concurrency and (b) correctness, a php script may not be enough
[06:25:48] the script won't reveal subtle rsvg rendering differences, right?
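(A standalone sketch of the throughput test Tim and _joe_ describe at [06:12:44]-[06:21:37]. The file names and sizes are made up; the point is to time shell-outs under both Zend and HHVM, and again with several copies running in parallel to exercise the fork lock — not the exact convert flags.)

    <?php
    // bench-scale.php: time N sequential `convert` shell-outs at random sizes.
    $sizes = array( 120, 220, 320, 640, 800 );
    $n = 500;
    $start = microtime( true );
    for ( $i = 0; $i < $n; $i++ ) {
        $w = $sizes[array_rand( $sizes )];
        shell_exec( 'convert test.jpg -thumbnail ' . $w . 'x /tmp/thumb.jpg' );
    }
    $elapsed = microtime( true ) - $start;
    printf( "%d thumbnails in %.1fs (%.1f/sec)\n", $n, $elapsed, $n / $elapsed );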
[06:26:17] so maybe an intermediate step should be to replay requests from one of the current image scalers and compare output
[06:27:06] <_joe_> ori: nod, the best way to test that is to have a reimaged machine not in prod
[06:27:20] <_joe_> yeah the "script" is useful for testing a)
[06:27:37] <_joe_> not in prod == not in the cluster
[06:27:46] right
[06:30:23] <_joe_> ori: whoa the xenon map is /nice/
[06:30:57] it's xenon -> python script to munge stack traces -> https://github.com/brendangregg/FlameGraph
[06:31:13] i'd like to turn it into a webapp of some kind so that the charts are generated daily (or continuously)
[06:35:23] <_joe_> it would be super cool
[06:35:39] <_joe_> (and of course history trends would be _very_ important)
[07:10:52] oooh flame graphs!
[09:29:38] 3MediaWiki-General-or-Unknown, Phabricator, Project-Creators, MediaWiki-Core-Team: Allow to search tasks about MediaWiki core and core only - https://phabricator.wikimedia.org/T76942#843950 (10Qgil) I think this is how it should be done, and I'm personally happy that no less than the MediaWiki team is willing to...
[14:39:47] vagrant is sad: unable to locate package elasticsearch
[15:14:46] <^d> manybubbles: Do you have any thoughts? https://secure.phabricator.com/D10955#inline-41282
[15:15:42] ^d: checking
[15:27:34] Does anyone happen to know what the correct phab project is for "Why is the database being stupid about optimizing this query?"
[15:29:59] <^d> Hopefully [Database] soon if I get my way.
[15:30:04] <^d> Right now, who knows.
[15:33:32] * anomie will just CC springle on it and hope for the best
[15:35:27] <^d> That's what I'd do
[15:51:05] 3MediaWiki-API, MediaWiki-Core-Team: allpages filterlanglinks DBQueryError - https://phabricator.wikimedia.org/T78276#844362 (10Anomie) >>! In T78276#843514, @jayvdb wrote: > There is a very similar, but less frequently occurring case in T73971 : generator imageusage iufilterredir=redirects Only similar in that...
[15:52:49] 3MediaWiki-extensions-SecurePoll, MediaWiki-Core-Team: When translations are changed on control wiki the change should be updated on jump wikis - https://phabricator.wikimedia.org/T74576#844367 (10Anomie) 5Open>3Resolved
[15:53:12] <_joe_> manybubbles: einstenium will be the Titan server, by monday it will be ready - if anyone beyond stas needs access, please file an RT
[15:53:21] <_joe_> so that on monday I can take care of that
[15:53:24] <_joe_> as well
[15:53:46] 3MediaWiki-extensions-SecurePoll, MediaWiki-Core-Team: Avoid holding many open DB connections when saving SecurePoll edits - https://phabricator.wikimedia.org/T78013#844369 (10Anomie) 5Open>3Resolved
[15:53:47] _joe_: might as well give me, gabriel, and aaron access as well.
[15:53:56] I can file an rt for that if you need it
[15:54:23] <_joe_> manybubbles: ok, for you and aaron I think I need robla approval too. You know, access bureaucracy
[15:54:49] <_joe_> *robla's
[15:55:12] <_joe_> I know how annoying all of this is
[15:55:44] bookkeeping
[15:55:56] it's a good idea
[15:58:11] <^d> manybubbles: By the way, I'm likely joining your query cabal if the quarterly planning goes as...planned.
[15:58:23] ^d: fun
[15:58:44] * bd808 is making no bets at this point
[15:59:47] _joe_: 9078
[15:59:48] I'm pretty sure I got all of my monolog config problems figured out and fixed.
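(The xenon -> FlameGraph pipeline ori describes at [06:30:57], as a shell sketch. The munge script and input file names are hypothetical, but flamegraph.pl really does consume one folded "frame1;frame2;... count" line per stack.)

    # Fold raw xenon stack traces into flamegraph.pl's one-line-per-stack
    # format, then render the interactive SVG.
    ./munge-xenon-stacks.py xenon-traces.log > stacks.folded
    ./FlameGraph/flamegraph.pl stacks.folded > xenon.svg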
[16:00:16] I really wish today was Thursday instead of Friday so I could roll them out
[16:00:27] <^d> I won't tell anyone ;-)
[16:00:56] The new patch chain should only change logging for testwiki
[16:04:09] bd808: you can pretend you relocated to Baker Island. It is Friday 4am there so that is still a Thursday -> Friday overnight deploy
[16:04:34] though in New Zealand it is already Saturday 5am :/
[16:05:25] I can just declare my house to be its own timezone. GMT-24
[16:06:30] https://gerrit.wikimedia.org/r/#/c/179368/ -> https://gerrit.wikimedia.org/r/#/c/179369/ -> https://gerrit.wikimedia.org/r/#/c/179370/
[16:08:28] I figured out that the errors I saw yesterday about undefined vars were due to not touching InitializeSettings.php *before* syncing. scap touches it after everything is pulled to the host. That leaves a small but real period of time that the cached settings are used instead of the new file.
[16:09:47] And then there was the php function I used for the first time ever and didn't read the man page on closely enough. It's gone from the new code in favor of good old mt_rand()
[16:12:13] <^d> manybubbles: Did you have any thoughts on the Phab mapping?
[16:12:21] ^d: writing a response
[16:13:19] <^d> Ah ok, carry on then :)
[16:23:41] replied
[16:23:43] ^d: ^^
[16:23:55] but it's ugly because I wrote it in markdown and didn't preview it
[16:26:43] <^d> Ah, for inline fixed-width text you use `foo`, not ```fooo``` btw.
[16:26:55] <^d> I think that's what tripped you up.
[16:32:41] My fingers just started doing markdown and I never really stopped to check
[16:33:40] <^d> :)
[16:34:08] <^d> Of course all the non-english is theoretical. Phab has shown that i18n isn't exactly high on the priority list :)
[16:39:21] yeah
[16:39:22] fair
[16:39:37] the trigrams thing I think is more important to talk about
[16:40:11] <^d> Yeah I was unsure too. I just copy+pasted @fabe's idea.
[16:42:09] <^d> Ok, home -> office
[16:51:55] _joe_: can you check that elasticsearch is still in our apt? I'm having trouble installing it in vagrant which is just funky.
[16:52:06] I'm not sure how to do it or I'd do it myself
[16:52:32] <_joe_> manybubbles: what distro?
[16:52:35] <_joe_> trusty?
[16:53:03] _joe_: yeah
[16:53:15] <_joe_> manybubbles: still there
[16:53:18] thanks
[16:53:29] funky. I'll shake vagrant some more
[16:53:46] <_joe_> manybubbles: maybe the apt config is broken there?
[16:53:57] i'll check it, yeah
[16:57:08] _joe_: doesn't come back from apt-cache but http://apt.wikimedia.org/wikimedia is in sources.list.d and apt-get update pulls from it....
[16:57:31] <_joe_> apt-get update?
[16:58:43] _joe_: yeah - to pull the lists from the apt sources
[17:00:03] <_joe_> manybubbles: is your box 32 bits?
[17:00:21] <_joe_> the vagrant box I mean
[17:00:24] never was before
[17:01:15] x86_64
[17:01:53] _joe_: should elasticsearch be in http://apt.wikimedia.org/wikimedia/dists/trusty-wikimedia/main/binary-amd64/Packages ?
[17:02:44] <_joe_> no
[17:02:46] <_joe_> in all
[17:03:15] it's in http://apt.wikimedia.org/wikimedia/dists/trusty-wikimedia/thirdparty/binary-amd64/Packages
[17:04:28] <_joe_> and it's here http://apt.wikimedia.org/wikimedia/pool/main/e/elasticsearch/
[17:05:01] <_joe_> looked now, the conf looks ok
[17:10:53] <_joe_> ori: I spent the morning complaining about a performance regression in imagemagick on trusty... just to find out that two labs instances created in sequence landed on two different hosts, thus having completely different I/O profiles. Apart from that, I have a pretty neat isolated test case I can work on. One thing I didn't get to look at is how to use the light processes in HHVM.
[17:11:48] _joe_: set hhvm.server.light_process_count to a nonzero number
[17:11:52] also, cool!
[17:12:05] <_joe_> ori: that's it? meh :P
[17:23:22] <^d> hashar: Is blackfire floss? I can't find the code.
[17:23:29] <^d>
[17:24:56] ^d: na apparently a SaaS :( noticed that after having sent the mail
[17:25:20] they have https://insight.sensiolabs.com as well
[17:25:24] a code audit utility
[17:25:27] but SaaS as well :(
[17:26:44] I am off, good week-end
[17:26:54] <_joe_> ori: seeing quite a few crashes in the api pool
[17:26:57] <_joe_> Dec 12 17:21:47 mw1188 hhvm-fatal:
[17:26:58] <_joe_> Dec 12 17:21:47 mw1188 hhvm-fatal: Failed Assertion: /usr/src/hhvm/hphp/runtime/vm/jit/type.cpp:243: static HPHP::jit::Type::bits_t HPHP::jit::Type::bitsFromDataType(HPHP::DataType, HPHP::DataType): assertion `false && "Unsupported DataType"' failed.
[17:27:13] _joe_: i'll look
[17:27:31] <_joe_> ori: thanks
[17:27:46] <_joe_> it has made API have quite a few crashes, from what I see
[17:27:55] <_joe_> 2/day/server from what I can see now
[17:28:08] <_joe_> but it's still enough to look into it
[17:36:11] <_joe_> ori: also, we're still dumping jemalloc heap profiles, wtf?
[17:36:21] _joe_: ....where?
[17:36:28] <_joe_> salt -G 'cluster:api_appserver' cmd.run 'ls -1 /var/tmp/hhvm/jeprof* | wc -l'
[17:36:40] <_joe_> _everywhere_
[17:36:42] <_joe_> :/
[17:36:48] _joe_: ahhhhhhhhhhhhhhhhhh1!!2319i2310
[17:36:49] wtf
[17:37:10] it's not everywhere
[17:37:33] 70 out of 470 hosts
[17:37:43] <_joe_> 470?
[17:37:50] <_joe_> where did you get that figure?
[17:37:50] err
[17:38:00] how many do we have again? 270?
[17:38:38] <_joe_> yes
[17:38:55] <_joe_> and salt --output=raw -C 'G@cluster:appserver or G@api_appserver' cmd.run 'ls -1 /var/tmp/hhvm/jeprof* | wc -l'
[17:38:59] <_joe_> gives a bad picture
[17:39:11] <_joe_> every appserver but mw1017 (testwiki)
[17:40:20] <_joe_> sorry, salt --output=raw -C 'G@cluster:appserver or G@cluster:api_appserver' cmd.run 'ls -1 /var/tmp/hhvm/jeprof* | wc -l'
[17:40:42] okay, let's again remove the 'on but disabled' thing from the upstart script
[17:40:51] give me a minute
[17:40:57] i want to bundle it with another change
[17:40:57] <_joe_> :/
[17:41:05] <_joe_> it seemed such a brilliant solution
[17:41:12] <_joe_> but the docs clearly lie :P
[17:41:49] <_joe_> I have seen a couple of crashes that happened right at the time of the heap dump :/
[17:42:08] <_joe_> but take your time
[17:42:13] <_joe_> this is not priority 0
[18:50:31] ^d is the only member of the epic project, I think that was on purpose :)
[19:02:27] <^d> No, it adds you automatically when you create a project.
[19:02:50] <^d> No members now.
[19:06:22] ^d: https://gerrit.wikimedia.org/r/179519
[19:08:13] thanks
[19:09:28] <^d> yw
[19:24:34] * ^d looks around for an Aaron
[19:25:06] <^d> bd808: I've got a patch up for one of your index access spamlogs.
[19:25:15] <^d> The one for FlaggedRevs looked easy
[19:28:29] ori: https://gerrit.wikimedia.org/r/#/c/178724/
[19:28:31] <^d> AaronS: https://gerrit.wikimedia.org/r/#/c/179525/
[19:28:47] AaronS: do you want to reply to tyler?
[19:28:55] <^d> Ah I see your response.
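(Per ori's note at [17:11:48], enabling light processes is an ini change. A sketch — the count and file prefix here are assumptions, not the production values; the setting names are HHVM's real ones.)

    ; /etc/hhvm/php.ini
    ; Pre-fork N small helper processes; shell-outs are then forked from
    ; these helpers instead of the huge main process, avoiding the
    ; address-space lock Tim described earlier.
    hhvm.server.light_process_count = 3
    hhvm.server.light_process_file_prefix = /var/run/hhvm/lightprocess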
[19:28:55] anomie: hmm, setWikiPage does call setTitle, but it does getTitle() first to see if the title it's setting is different from the one that already exists...I guess we could just remove that check and always set the title when setting a new wikipage?
[19:30:14] legoktm: Yeah. Or use $this->hasTitle().
[19:30:43] 3MediaWiki-ResourceLoader, MediaWiki-Core-Team: Introduce MediaWiki:Mainpage.css - https://phabricator.wikimedia.org/T78418#844728 (10Jdlrobson) 3NEW
[19:30:53] 3Mobile-Web, MediaWiki-ResourceLoader, MediaWiki-Core-Team: Introduce MediaWiki:Mainpage.css - https://phabricator.wikimedia.org/T78418#844728 (10Jdlrobson)
[19:33:54] ori: you Kevin? done
[19:35:27] AaronS: kevin?
[19:35:38] *you mean
[19:35:48] oh, right.
[19:47:14] anomie: updated https://gerrit.wikimedia.org/r/179524 to just fix it in RequestContext
[19:52:08] 3Mobile-Web, MediaWiki-ResourceLoader, MediaWiki-Core-Team: Introduce MediaWiki:Mainpage.css - https://phabricator.wikimedia.org/T78418#844799 (10matmarex) Doesn't sound unreasonable to me. By the way, the Polish Wikipedia currently has a default-on gadget that implements styles for the main page (which is as mo...
[19:58:46] bd808: I need the job runner exception logs on beta from 1-2 days ago...where should I be looking?
[19:59:10] Files are in /data/project/logs
[19:59:31] Things should be in logstash too
[19:59:42] aha, thanks
[20:23:13] ori: Notice: Undefined variable: stacks in /srv/mediawiki/wmf-config/StartProfiler.php on line 122
[20:41:20] bd808: thanks, i'll fix
[20:41:35] I put up a patch for it that might work
[20:41:50] https://gerrit.wikimedia.org/r/#/c/179539/
[20:48:18] 3MediaWiki-extensions-General-or-Unknown, MediaWiki-Core-Team: Pass-by-reference arguments not passed by reference through StubObject::_call() - https://phabricator.wikimedia.org/T78427#844896 (10bd808) 3NEW
[20:48:34] 3Language-Engineering, MediaWiki-extensions-General-or-Unknown, MediaWiki-Core-Team: Pass-by-reference arguments not passed by reference through StubObject::_call() - https://phabricator.wikimedia.org/T78427#844896 (10bd808)
[20:57:59] bd808: is it bad manners in php to do $myArray['keyThatMayOrMayNotExist'] += 1 ?
[20:58:18] In strict error mode, yes
[20:58:49] At the log level we are running hhvm at it will emit a Notice
[20:58:54] for the undefined key
[20:58:59] what's the proper idiom? $myArray['key'] = isset( $myArray['key'] ) ? 0 : $myArray['key'] + 1 ?
[20:59:50] err, flip the consequents of that ternary
[21:00:04] I'd do `if (!isset($array['key'])) { $array['key'] = 0; }` and then the +=
[21:00:15] which is ugly as hell but safe
[21:00:58] This is not a graceful corner of php
[21:01:17] nothing like the python defaultdict that I'm aware of
[21:08:57] are you guys aware of the warnings the profiler generates in prod?
[21:09:34] MaxSem: yeah Ori is working on a fix
[21:10:08] ^d: https://gerrit.wikimedia.org/r/#/c/173098/2
[21:10:57] <^d> bd808: I wonder if we can drop function-level wfProfile* calls now that we've got hhvm on all apaches.
[21:11:16] <^d> I don't think we need to wait for job runners or image scalers, we don't profile those.
[21:12:03] If we don't then I think the only blocker is making sure we have good docs on xhprof profiling
[21:12:16] which I think you started on at least
[21:12:43] <^d> Yeah, Manual:Profiling and its /Xhprof subpage should be a decent start.
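(The safe counter idiom bd808 and ori settle on at [20:57:59]-[21:00:15], spelled out; `$counts` is just an example name.)

    // Avoids the "undefined index" notice that a bare += would emit.
    if ( !isset( $counts['key'] ) ) {
        $counts['key'] = 0;
    }
    $counts['key'] += 1;

    // Equivalent one-liner (the corrected form of ori's ternary above):
    $counts['key'] = ( isset( $counts['key'] ) ? $counts['key'] : 0 ) + 1;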
[21:12:53] We could always get xhprof built for our crusty old php5 if we needed to
[21:13:03] <^d> AaronS: What kind of error do you want to do on https://gerrit.wikimedia.org/r/#/c/179525/? Throw an exception? Something less fail-y?
[21:13:15] dieUsage I guess
[21:13:21] <^d> Ah, yeah that'd work.
[21:14:45] AaronS: If you get really bored and want an interesting challenge -- https://phabricator.wikimedia.org/T78427 (pass-by-ref and StubObject not playing nice together)
[21:16:19] those methods need refactoring then
[21:16:34] I ran into that problem with WikiPage and the article b/c stuff
[21:16:58] A hack is to subclass the stub and manually enumerate those method proxies
[21:17:06] There are several places in core that call $wgContLang->findVariantLink()
[21:17:25] ah. ok
[21:18:35] bd808: see the last methods in Article.php
[21:18:42] it's a similar problem with _get/_set magic
[21:19:34] 3Language-Engineering, MediaWiki-extensions-General-or-Unknown, MediaWiki-Core-Team: Pass-by-reference arguments not passed by reference through StubObject::_call() - https://phabricator.wikimedia.org/T78427#844962 (10aaron) Either the methods should be changed or the stub class can use a stub subclass that enum...
[21:21:31] <^d> AaronS: https://gerrit.wikimedia.org/r/#/c/179525/ amended to use dieUsage()
[21:29:16] <^d> Aw crud, it was taken.
[21:29:19] <^d> Too good to be true.
[21:30:16] ^d: https://gerrit.wikimedia.org/r/#/c/178591/1/wmf-config/StartProfiler.php every server is on hhvm, so this is fine right?
[21:32:26] <^d> AaronS: Well I guess it prevents us from using --profiler=text on the cli.
[21:32:38] I didn't change that part yet
[21:32:58] AaronS: are you here in the office or wfh today?
[21:33:08] wfh
[21:33:12] <^d> AaronS: Ahh, I see it, you're right.
[21:33:13] * AaronS babysits the jvm
[21:33:15] <^d> Yeah, probably safe then.
[21:33:30] <_joe_> AaronS: eww, it's a bad job :)
[21:34:13] <^d> _joe_: I'd take babysitting the jvm over babysitting a node.js thingie :)
[21:35:11] <_joe_> robla: I have an access request (+ sudo rights) for AaronS and manybubbles, if you manage to state your approval before ops meeting on monday that would speed things up. The ticket is https://rt.wikimedia.org/Ticket/Display.html?id=9078
[21:35:20] approve
[21:35:57] <_joe_> (I need to take any sudo request to ops meeting, alas)
[21:36:27] <_joe_> ^d: well, I'm unsure about that
[21:37:10] <^d> jvm is a known beast for me. I know jack-shit about javascript or hip things like node :)
[21:38:23] <_joe_> I babysat jvm apps for 3 years, node for much less and it's usually failing in less horrible ways
[21:38:51] * bd808 did 7 years as lead at a java shop and still has flashbacks
[21:39:38] <_joe_> I didn't see your GC scars though :)
[21:40:28] A "minor" update changed the GC behavior on the -XX:whatever flags we were using and I had to live hack the whole cluster from my laptop in the hallway at OSCON one year.
[21:41:06] <_joe_> oh, I was taking my first weekend off in barcelona the night of the leap second
[21:41:18] <_joe_> when all java apps went cuckoo
[21:41:30] <^d> oh man, to hell with those leap seconds.
[21:41:34] <^d> That was so freaking retarded.
[21:41:37] *nod* I do not miss carrying a pager
[21:42:23] <_joe_> and the on-call guy woke up to the whole SOA architecture screaming. I spent the night rebooting things :/
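(Returning to T78427 above: the workaround AaronS sketches at [21:16:58] and [21:19:34] is a stub subclass that enumerates the by-reference proxies explicitly, since PHP's __call() cannot forward arguments by reference. The method body below is a hedged guess at the shape, not the actual patch.)

    // Hypothetical sketch: an explicit proxy for a by-ref method on the
    // content-language stub, bypassing StubObject::_call()/__call().
    class StubContLang extends StubObject {
        public function findVariantLink( &$link, &$nt, $ignoreOtherCond = false ) {
            global $wgContLang;
            $this->_unstub( 'findVariantLink' ); // swap in the real object
            return $wgContLang->findVariantLink( $link, $nt, $ignoreOtherCond );
        }
    }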
[21:43:24] <_joe_> as in any java enterprise SOA arch, things won't heal if not rebooted in the right order
[21:44:26] <^d> robla: You asked me what I want more of. I want more Java :)
[21:45:05] <_joe_> and a pager
[21:45:11] 3Mobile-Web, MediaWiki-ResourceLoader, MediaWiki-Core-Team: Introduce MediaWiki:Mainpage.css - https://phabricator.wikimedia.org/T78418#845007 (10Edokter) +9000 I would like some sandbox mechanism though, that allows me to test CSS without having to load it manually via importStylesheet, withCSS or some other h...
[21:45:12] <_joe_> he also wants a pager
[21:45:34] * _joe_ grins
[21:45:44] And TPS reports!
[21:46:05] <_joe_> oh my
[21:46:18] ^d: use the TAI ( https://en.wikipedia.org/wiki/International_Atomic_Time ) it does not depend on earth rotation :]
[21:46:44] can generate the zone info with http://www.bortzmeyer.org/files/gentai.pl
[21:46:53] Nikerabbit: where is the config to change?
[21:46:59] % TZ=Etc/TAI date && date -u
[21:47:00] Fri Jul 6 21:00:16 TAI 2012
[21:47:00] Fri Jul 6 20:59:41 UTC 2012
[21:47:26] hashar: sorry for not re-reviewing your hhvm on contint patch, i'll do so later today (pst)
[21:48:07] ori: tis ok, I worked on it this afternoon and eventually found out some more cruft I have fixed. It only uses Repo.Central.Path set via an env variable now
[21:48:32] ori: have to decide about hhvm ensure => latest still, since you had a concern about it. I think that is the main blocker
[21:49:02] ori: I have hhvm installed on Trusty CI slaves now, will add a mediawiki-phpunit-hhvm job soonish
[21:49:41] _joe_: are you still around? i'd rather not have contint do ensure => latest on hhvm packages, but hashar and Krinkle were concerned that ops wouldn't remember to upgrade the integration slaves whenever the package is updated. can we make sure somehow that it's part of the process?
[21:50:17] might be a single step using salt on virt1001
[21:50:32] bd808: have you ever tried to turn a vagrant vm into a standard vm?
[21:50:36] <^d> _joe_, bd808: What's a pager? Must be from before my time.
[21:50:50] csteipp: "standard"?
[21:51:08] "Something I can give to another person who can't run vagrant on their host"
[21:51:09] bd808: it's this thing that exists outside the php world
[21:51:13] <_joe_> ori: I think ensure => latest would allow us to run unit tests before we deploy, and that is a plus
[21:51:14] j/k
[21:51:33] <^d> csteipp: It's just a normal VM really. You should see them in the list of your VMs in VirtualBox.
[21:51:37] csteipp: oh... can they run Virtualbox?
[21:51:42] <^d> Just with some nice wrappers for puppet & such.
[21:51:49] <_joe_> but well, I do see the reasons for concern - if it crashes, it would grind CI to a halt, right?
[21:52:50] ^d, and MW directory shared from host, so just grabbing the VM image will not work
[21:52:56] _joe_: well, maybe one in every thousand test runs are in preparation for a package upgrade. the other 999 are for php code deployments, and those are better run against the version that they'll be deployed to.
[21:53:12] <_joe_> ori: true, I was thinking of that
[21:53:24] csteipp: I haven't done it with vagrant doing the initial provisioning, but I used to do exported VM images all the time at $DAYJOB-1. But like MaxSem points out there will be a lot of manual config
[21:53:44] Although you could fix that by adding the /vagrant content locally in the vm before exporting
[21:53:48] ori: _joe_: Yeah, plus, that'd be testing in production. Playing with live commits that need testing. If it is useful as a test, surely there's a way to build/install the package outside apt.wikimedia.org on an instance and run some tests on it.
[21:53:50] <_joe_> ori: I'd probably prefer to do ensure => $version everywhere, but that has undesirable consequences
[21:54:05] Which I'm kind of assuming ops is doing already.
[21:54:11] csteipp: then you'd just have to document the port forwards needed
[21:54:11] Test the packages, in some fashion.
[21:54:23] <_joe_> Krinkle: we do try it on beta before deploying usually
[21:54:25] bd808: $wgTranslateTranslationServices ? https://noc.wikimedia.org/conf/CommonSettings.php.txt
[21:54:28] and we run the unit tests on osmium
[21:54:37] Nemo_bis: Thanks.
[21:54:43] <^d> MaxSem: Yeah we just figured that out in person :p
[21:54:49] <_joe_> but, the main concern would be a major upgrade
[21:54:50] MaxSem: Yeah, that was the first problem I hit. I think I can just adjust the mount point, but was hoping for an easier way :)
[21:55:06] _joe_: try, as in updating the package, or installing the package (e.g. push to apt.wm.o, update on beta, then in prod; or build locally in beta, then push to apt.wm.o and prod)
[21:55:31] <_joe_> Krinkle: usually build on a build server => beta => apt
[21:55:44] <_joe_> or apt => beta, depending on the size of the upgrade
[21:56:23] <_joe_> if we're just tweaking the debian scripts, there is no reason to avoid uploading the package to beta
[21:56:48] _joe_: cool
[21:57:31] <_joe_> Krinkle: but well, I'll put up a page on wikitech with the upgrade procedure once we're a bit settled, and I'll ping you and hashar to add details for contint
[21:57:37] _joe_: What would you say is the average lifespan of a package version in apt.wikimedia.org that is not deployed on prod hosts that use that package?
[21:58:06] _joe_: sounds good
[21:58:08] minutes, hours, days, weeks?
[21:58:08] <_joe_> uhm, right now usually 1-2 days, but it's going to be a longer timespan in the future probably
[21:58:25] we could have a Jenkins job that runs the bunch of php tests we have against a proposed hhvm version to ensure the upgrade will go well
[21:58:34] <_joe_> because well, fingers crossed, we're not in "omfg it's on fire" mode
[21:58:37] _joe_: and during those 1-2 days it's essentially taking nodes out of rotation and upgrading them manually over ssh?
[21:59:03] <_joe_> Krinkle: I do that in batches, it doesn't take that long
[21:59:13] _joe_: cool.
[21:59:16] csteipp: The easier way would be to ssh into the vm, copy /vagrant somewhere, and then copy it back to the right place after starting the vm without using shares
[21:59:45] Or bundle /vagrant as a separate disk and mount it there via virtualbox config
[21:59:50] ^d: labs uses hhvm for cli mode right?
[21:59:51] <_joe_> well, now it's on 233 appservers, so it'd take more time (but I'm thinking of ways to automate it)
[22:00:03] _joe_: Have you ever had to abort an upgrade and decided to retract the version from apt.wikimedia.org (or another minor version that undoes it, whatever the way is).
[22:00:22] <_joe_> Krinkle: once
[22:00:38] * Krinkle knows extremely little of debian, aptitude and packages.
[22:00:39] AaronS: hhvm is only on the deployment-mediawiki0[1-3] boxes in beta AFAIK
[22:00:40] <_joe_> or maybe twice, I don't remember. But that happened early on
[22:00:49] _joe_: OK.
[22:01:02] bd808: can you comment on https://gerrit.wikimedia.org/r/#/c/178591/ ?
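(One assumed recipe for bd808's share-copying suggestion at [21:59:16]: snapshot /vagrant into a real in-guest directory, then export a plain VirtualBox appliance. The VM name and paths are placeholders.)

    # Copy the shared checkout to a normal directory inside the guest,
    # then export the machine as an .ova the recipient can import.
    vagrant ssh -c 'sudo rsync -a /vagrant/ /srv/vagrant-copy/'
    vagrant halt
    VBoxManage export mediawiki-vagrant_default -o mediawiki-dev.ova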
[22:01:14] it can wait of course, it's just cleanup
[22:01:33] <_joe_> ori: so yeah I agree. We shouldn't ensure => latest on CI, maybe upgrade it the day before we upgrade prod
[22:01:36] it'll fatal on zend if xhprof isn't installed...
[22:01:38] _joe_: +1
[22:01:48] <_joe_> and do some decent docs
[22:02:13] <_joe_> hhvm docs on wikitech are... well... "suboptimal" right now :P
[22:02:13] csteipp: this seems to work ok:
[22:02:15] mount --move /vagrant /vagrant-temp ; rsync --progress --archive --recursive /vagrant-temp/ /vagrant/ --exclude=/vagrant/logs
[22:03:11] _joe_: While I don't like the idea of 'latest' either (especially when it comes to lower-level dependencies eg. npm or php composer), I'm curious exactly what the fear is. Since apt.wikimedia.org is controlled by us. It's not like upstream would push an update onto us? Or do we let certain things go through?
[22:03:28] Oh and I guess it would cause upgrades during runtime by puppet instead of at a convenient time
[22:03:39] <_joe_> Krinkle: that
[22:03:51] OK :)
[22:04:06] <_joe_> Krinkle: puppet would risk restarting a significant portion of the HHVM services in a short timespan
[22:04:21] <_joe_> and we know we gotta be super-careful too
[22:04:22] _joe_: Yeah, you'd need to orchestrate it properly. To ensure uptime.
[22:04:29] OK.
[22:04:48] <_joe_> also, I'd like to be able to deploy a new package version to a subcluster
[22:05:04] <_joe_> but that's well outside the scope of our discussion :)
[22:05:14] Yeah, I heard about that. Makes perfect sense.
[22:07:07] And I'd very much like for the integration labs instances to become closer part of operations scope when it comes to maintenance and compatibility. While it's hosted in labs, it's not a user-level application. We might as well have a "production virtualisation cluster", and we have, it's just that we re-use labs for it. They're essentially just one of the first services where WMF goes virtual for production. It'd be great to have ops help maintain the puppet classes, be aware of what they do, not break them by accident, and include them in upgrades, as for any other subcluster in prod.
[22:07:14] robla: http://graphite.wikimedia.org/render/?width=1110&height=570&_salt=1418341982.047&target=HHVM.WikiPage.doEditContent.tavg&from=-10days
[22:07:21] 3Mobile-Web, MediaWiki-ResourceLoader, MediaWiki-Core-Team: Introduce MediaWiki:Mainpage.css - https://phabricator.wikimedia.org/T78418#845045 (10Jdlrobson) See also T32405 - this would allow us to deprecate a bunch of code debt we have in MobileFrontend.
[22:14:56] AaronS: http://people.wikimedia.org/~ori/2014-12-12_20.svg
[22:15:29] bd808: in initialize-settings?
[22:15:34] aaaaaaaaaaaaaaaaa
[22:15:35] in translate section
[22:15:37] AaronS: robla: for your next public report you can use movingAverage( metric, X ): http://graphite.wikimedia.org/render/?width=1000&height=600&_salt=1418341982.047&target=movingAverage(HHVM.WikiPage.doEditContent.tavg,200)&from=-10days
[22:15:38] aaaaaaaaaaaaaaa
[22:15:43] aaaaawesome
[22:16:01] <_joe_> AaronS: https://graphite.wikimedia.org/render/?width=800&height=600&_salt=1418422492.314&target=MediaWiki.WikiPage.doEditContent.tp99&target=HHVM.WikiPage.doEditContent.tp99&from=-30days&yMax=15000 this shows tp99, which is even more meaningful
[22:16:11] Nikerabbit: I think I found the config section but I have no idea what to put in there if 0 is the right value for a default cutoff
[22:16:13] hashar: much cleaner looking
[22:16:27] AaronS: ideally setting the left axis to start at zero
[22:16:45] _joe_: 99 isn't useful for what I was looking for
[22:16:46] <_joe_> (the final drop in php is from when we had almost no php servers left and almost no traffic)
[22:16:58] ori: you should blog about those flames. Looks nice
[22:17:01] Nikerabbit: And if there is a "right" value it should be the default in the class shouldn't it?
[22:17:03] <_joe_> AaronS: oh ok.
[22:18:49] _joe_: this is for edit stashing, 99 will bias towards cases where it wasn't used (bots/ve)
[22:19:04] <_joe_> AaronS: yeah got that
[22:19:20] ah, ok :)
[22:19:23] <_joe_> I am still in drag racing mode :)
[22:20:12] <_joe_> AaronS: I got that once you pointed out tp99 wasn't meaningful; I figured you wanted to show something different from "hhvm is faster"
[22:21:05] ori: I wonder if the optimistic html->wikitext stuff in ve could also call the stash api
[22:27:02] ^d: found https://gerrit.wikimedia.org/r/#/c/179562/
[22:27:04] fun fun
[22:27:16] we should have some kind of regression testing for this....
[22:28:02] <^d> Gah, that's no good.
[22:31:01] ^d: dropped for a second.
[22:31:05] ^d: it made testing whatlinkshere's autocomplete a bitch
[22:31:33] I kept having to go up and up and up the return path to find the bug....
[22:31:35] <^d> I'm not a fan of that whatlinkshere.
[22:31:42] ^d: I think it's cute
[22:31:52] <^d> It's harmless, I won't block it.
[22:32:14] something good came of it: i noticed the prefix search for special pages was busted!
[22:32:22] maybe that needs SWAT
[22:33:34] ^d: yeah - it's busted on mw.org
[22:33:50] oh, and enwiki
[22:33:53] <^d> Ouch.
[22:33:54] so broken everywhere!
[22:34:01] weeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
[22:34:11] well, I'm going to go start my weekend!
[22:34:16] have fun!
[22:38:20] <^d> bd808: I'm not getting status codes from ES that I'd like. Thoughts? `curl localhost:9200//_search/` returns normal json I'd like.
[22:38:33] <^d> `curl -I localhost:9200//_search/` gives me a 404 status.
[22:38:36] <^d> What gives?
[22:38:58] ori: https://gerrit.wikimedia.org/r/#/c/174214/1
[22:39:10] ^d: They didn't implement HEAD?
[22:39:19] <^d> Hmm.
[22:39:32] this is java. it's not free
[22:40:05] <^d> I'm trying to debug my PHP with curl.
[22:40:07] <^d> lol.
[22:40:17] meaning that with a php endpoint apache will just toss the body if you make a head request
[22:41:08] <^d> public function indexExists() {
[22:41:08] <^d>   try {
[22:41:08] <^d>     return (bool)$this->executeRequest('/_search/', array());
[22:41:10] <^d>   } catch (HTTPFutureHTTPResponseStatus $e) {
[22:41:11] <^d>     if ($e->getStatusCode() == 404) {
[22:41:13] <^d>       return false;
[22:41:15] <^d>     } else if ($e->getStatusCode() == 400) {
[22:41:17] <^d>       return true;
[22:41:19] <^d>     }
[22:41:21] <^d>     throw $e;
[22:41:23] <^d>   }
[22:41:25] <^d> }
[22:41:27] <^d> I'm curious why I return 400s and 404s :\
[22:41:31] AaronS: have you tested it?
[22:41:46] AaronS: i'm not in the head-space to review it thoroughly, but i can merge it if you have
[22:42:52] bd808: 0.65 should be okay for wmf I think
[22:43:11] Nikerabbit: ok. I'll propose a config patch
[22:43:36] ori: locally, a while back, probably should look at it again
[22:43:52] AaronS: i'll do a proper review
[22:48:40] Nikerabbit: https://gerrit.wikimedia.org/r/#/c/179566/
[22:52:15] where does hhvm put its core dumps on vagrant?
[22:52:36] I don't think we have them enabled do we?
[22:53:27] hmm, well it just segfaulted while I was running tests :/
[22:54:25] core dumps would not be helpful for most mw-v users
[22:54:40] so I don't think we have it set up to capture them
[23:38:14] legoktm, I have questions about SUL finalization
[23:38:20] 1) when?
[23:38:37] 2) will it mean that every user will be present on every wiki?
[23:39:15] ...or at least transparently autocreated on demand?
[23:40:23] MaxSem: 1) April-ish 2) Autocreated on demand
[23:40:35] grr grr
[23:41:03] should I just manually create local CA users by then?
[23:41:29] or wikidata should already have local accounts due to global login?
[23:42:08] why do you need to create local accounts?
[23:42:16] everything should just autocreate
[23:42:30] to submit WikiGrok answers on behalf of these users
[23:43:37] I remember this was a problem a while ago so I even had to write createLocalAccount.php
[23:43:40] is that being done over the API? autocreation should work then. If it's server-side, there's some PHP code you can call to auto-create
[23:44:22] server-side
[23:45:23] https://github.com/wikimedia/mediawiki-extensions-CentralAuth/blob/master/includes/LocalRenameJob/LocalUserMergeJob.php#L60
[23:45:28] MaxSem: OAuth also needs to auto-create all users... might be able to collab on that
[23:48:41] legoktm, that code doesn't seem to update site_stats...
[23:49:09] bah
[23:49:46] good thing it's not running in prod yet :P
[23:51:54] shall I open a bug?
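(For the record, the site_stats gap MaxSem spotted could be closed with something like the following — a hedged sketch assuming core's User and SiteStatsUpdate APIs, not the actual CentralAuth fix.)

    // Server-side local account creation that keeps site_stats in sync.
    $user = User::newFromName( $username );
    if ( $user && $user->getId() === 0 ) {
        $user->addToDatabase();
        // The LocalUserMergeJob snippet linked above apparently skips
        // this step, so ss_users drifts out of date.
        SiteStatsUpdate::factory( array( 'users' => 1 ) )->doUpdate();
    }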