[00:02:17] that's a better one than "what's your mother's maiden name?"
[00:04:39] BASIC!!1
[00:05:39] Tandy Level II BASIC
[00:06:29] omg TIL that my first language was from Microsoft
[00:06:41] https://en.wikipedia.org/wiki/TRS-80#BASIC
[00:38:19] TimStarling: would you be up for reviewing https://gerrit.wikimedia.org/r/#/c/244586/ ?
[17:44:03] anomie: does https://phabricator.wikimedia.org/T97720#1728729 seem sane?
[17:44:09] (before I start writing code)
[17:44:24] legoktm: In a meeting, will look in about 15 minutes.
[17:47:05] legoktm: oh man, doing this for action=upload is going to *suck*, let me tell you
[17:57:41] legoktm: I don't see a problem with a 'tag' type for the API. No idea on the ManualLogEntry thing, although I note that the API doesn't actually create log entries directly very often. Usually it's somewhere on the spectrum from "call a web UI form-submission handler to take the action" to "real backend logic for doing the action", so the tags would need to get passed down somehow.
[18:00:20] hmm right
[18:00:26] ok, I'll start on the tag type first then
[18:01:33] legoktm: https://gerrit.wikimedia.org/r/#/c/228781/11/includes/GadgetDefinitionNamespaceRepo.php,cm can use pcTTL now btw
[18:02:50] AaronSchulz: the docs said that it doesn't cache false?
[18:03:28] so it'll work for getGadgetIds(), but getGadget() should process cache false
[18:08:26] csteipp_afk: A quick review of how I did password resets in https://gerrit.wikimedia.org/r/#/c/247858/ would be much appreciated. I'd like to deploy it tomorrow if it's not horrible
[18:08:26] legoktm: you could have that return null instead of false
[18:09:17] bd808: Yeah, I'll do that
[18:17:29] AaronSchulz: ok, I'll try that
[18:23:43] anomie: hmm, I'll just go with more aggressive values then instead of pushing that down the road
[18:24:33] I was going to end up doing that anyway
[18:26:28] Even 10 for 'max lag' can suck since post-save redirects can block for that long in MASTER_POS_WAIT() for everyone.
[18:26:35] * AaronSchulz wants to push that even lower for wmf
[19:36:35] bd808: What's the email backend for ieg?
[19:37:28] csteipp: it uses phpmailer
[19:37:39] https://github.com/wikimedia/wikimedia-iegreview/blob/master/composer.json#L7
[19:37:55] and this wrapper class -- https://github.com/wikimedia/wikimedia-iegreview/blob/master/src/Mailer.php
[20:54:16] anomie: what do you think of 7/10 on https://gerrit.wikimedia.org/r/#/c/243089/ ?
[20:56:57] AaronSchulz: 7/10 seems like it's probably ok, reliably hitting a lag of 7-9 seems like it could be reasonably difficult.
[20:57:23] anomie: also, I'll follow up with some default ping limiting too
[20:57:39] which should make it harder to create lag by spamming
[20:57:57] though there are still other ways to make lag (EditWatchlist and huge additions/removals)
[20:58:10] * AaronSchulz needs to limit how much stuff can change there at once
[20:58:29] lag spikes often trace back to clearing of watchlists, funnily enough
[21:04:29] anomie: those are the values in the latest PS
[21:23:10] csteipp: I added a 48 hour token expiration and a safe hash comparison to https://gerrit.wikimedia.org/r/#/c/247858/
[21:23:49] bd808: Cool! I'll look at it again once I get out of meetings for the day...
[21:26:27] perfect, and thanks
[22:14:39] AaronSchulz: TimStarling: Updated version of the flowchart https://i.imgur.com/zsihMYT.png
[22:16:23] Don't we have SSL not on nginx?
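Editor's note on the 21:23:10 message: the "48 hour token expiration and a safe hash comparison" can be sketched roughly as below. This is not the code from the Gerrit change, which is not quoted in the log. The function names, the SHA-256 storage scheme, and the token length are all assumptions made for illustration. Only the 48-hour lifetime comes from the conversation.

```python
import hashlib
import hmac
import secrets
import time

# 48 hours, the expiry mentioned in the log.
TOKEN_TTL_SECONDS = 48 * 60 * 60


def issue_token(now=None):
    """Create a reset token (hypothetical helper).

    Returns (plaintext_token, stored_hash, expires_at). Only the hash
    and the expiry would be stored; the plaintext goes to the user.
    """
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    stored_hash = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, stored_hash, now + TOKEN_TTL_SECONDS


def check_token(candidate, stored_hash, expires_at, now=None):
    """Return True only for an unexpired token whose hash matches."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    candidate_hash = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    # compare_digest takes time independent of where the strings differ,
    # which is what a "safe" (timing-attack resistant) comparison means.
    return hmac.compare_digest(candidate_hash, stored_hash)
```

A plain `==` on the two hashes would give the same answer but can leak, through timing, how many leading characters matched, which is the reason for the constant-time comparison.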
[22:17:00] Again, the main context is the flow resulting from making a MediaWiki page view request and everything that cascades from it (until the PHP process ends, kind of). Not all of WMF.
[22:17:32] Reedy: What do you mean?
[22:18:04] I don't think we terminate SSL at all datacentres on nginx
[22:19:32] Hm.. interesting.
[22:19:41] Do we use something else in the cache POPs?
[22:19:54] I thought it was being done on the varnishes
[22:20:01] I can't remember if we replaced nginx fully for ssl termination
[22:20:43] Wait, we're migrating from nginx to varnish for ssl?
[22:20:54] I thought there were issues with varnish's ssl support
[22:20:56] and also SPDY
[22:21:36] Brandon has done a lot of work..
[22:21:38] If anything, I'd expect some leftover clusters to migrate the other way.
[22:23:04] I could be wrong
[22:23:41] I wonder if I'm thinking of nginx being on separate boxes or something
[22:23:43] varnish never terminates SSL, only nginx
[22:23:49] in that context
[22:27:57] chasemp: Reedy: Do we multi-purpose boxes for other services? E.g. I know that varnish f/b are on the same server, and we do the same with apache/memcached.
[22:28:16] Do we do that with any other services? E.g. does nginx have dedicated boxes?
[22:28:24] Memcached is separate
[22:28:28] We have a proxy on the apaches
[22:28:34] f/b?
[22:28:38] frontend backen
[22:28:39] d
[22:28:40] front/back
[22:28:41] in some cases memcached and redis are colocated
[22:29:03] Ah, okay. So it's just a local proxy (twemproxy, right?).
[22:29:13] nutcracker now IIRC
[22:29:16] Right
[22:29:19] twemproxy became nutcracker
[22:29:20] and nginx runs on the same box as varnish in cases I have seen, it used to be different
[22:29:34] chasemp: Ah. That's what I'm thinking of then
[22:29:38] But it's true in the past we did have memcached and apache on the same servers right? So that sometimes it would go to the local server.
[22:29:59] I guess it grew too large?
[22:30:02] They definitely were separate boxes at one point, in some datacentres
[22:30:11] (nginx/ssl termination)
[22:30:24] it was more of an architectural issue than a performance one iirc
[22:30:35] Yeah, in the early days. I guess we changed it to improve latency and scale now that we're more HTTPS-only.
[22:31:05] I think from day one, it was only separate in some datacentres
[22:38:03] TimStarling: any chance you could review https://gerrit.wikimedia.org/r/#/c/244586/ ?
[22:46:47] bd808: do you remember if we keep logs of how many users log in and hit the sul rename warning?
[22:47:01] (and go through the magic ~$wiki stuff)
[22:47:23] ... I think we do, but let me look at the code
[22:49:47] legoktm: https://logstash.wikimedia.org/#dashboard/temp/AVCMl-7eptxhN1XaBE5D
[22:50:14] same as a grep for "CentralAuthMigration: Coercing" in the CentralAuth.log
[22:50:54] We should track that in graphite or something
[22:52:05] bd808: yes please, one of the stewards was curious about it
[22:53:10] (I just showed him the logstash graph for now)
[22:56:22] how do I get my hands on the RequestContext from inside CentralAuthPlugin?
[22:57:25] RequestContext::getMain()
[22:57:45] *nod* I just figured that out
[22:57:52] * bd808 is a slow learner
[23:11:13] TimStarling: should I look at the preprocessor patch if you don't want to?
[23:18:02] bd808: isn't centralauth.migration.check going to just trigger whenever someone enters a bad password?
[23:18:41] Reedy: Updated version http://i.imgur.com/EUTVkED.png
[23:19:11] Krinkle: I forgot to say, it's pretty cool :)
[23:20:18] Thanks. I made it with Cacoo
[23:21:33] What about Swift and file storage? :D
[23:21:45] Reedy: We don't hit that within a page view request afaik.
[23:21:59] Or do we query swift when parsing [[File:]] syntax?
[23:22:18] That's the only abstraction I'm trying to keep. The lifetime of a page view.
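Editor's note on the 22:50:14 to 22:50:54 exchange: counting "CentralAuthMigration: Coercing" lines and turning the count into a metric for graphite might look roughly like this. It is a sketch only. The marker string and the metric name `centralauth.migration.check` appear in the log; the helper names, the host, and the port (8125 is the conventional statsd UDP port) are assumptions, and the real change presumably emits the metric from PHP rather than from a log scan.

```python
import socket

# The string quoted in the log as the thing to grep for.
MARKER = "CentralAuthMigration: Coercing"


def count_matches(lines, marker=MARKER):
    """Count log lines containing the marker (equivalent to grep -c)."""
    return sum(1 for line in lines if marker in line)


def statsd_counter(name, value):
    """Format a statsd counter datagram: '<name>:<value>|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name, value, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_counter(name, value), (host, port))
```

Usage would be along the lines of `send_counter("centralauth.migration.check", count_matches(open(path)))`, with `path` pointing at whichever log file is being scanned.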
[23:22:27] Gotta limit it somewhere or it'll be lines everywhere :P
[23:22:36] heh
[23:22:49] I'm also not including the eventlogging internal structure and kafka. Anywhere you see a cloud, one could link to another flowchart.
[23:23:05] statsd -> statsite -> graphite
[23:26:54] Krinkle: wow, that's awesome
[23:27:40] Krinkle: It is, indeed, very nice.
[23:27:49] OK one more update; http://i.imgur.com/5i4gHC7.png
[23:27:51] Krinkle: Have you done the MW DB SVG export recently?
[23:27:59] James_F: Not since 1.25
[23:28:10] 1.24 *
[23:28:14] Krinkle: I'm late to the conversation. These are great diagrams. What are you using them for?
[23:28:17] Krinkle: Hmm. Might be fun.
[23:28:36] Krinkle: But now you have the public Internet and services like Kafka both with blue clouds, which is odd.
[23:29:31] The idea is that a cloud represents a collection of services with other stuff behind it. E.g. simplified to the concern from MW's perspective.
[23:29:39] But yeah, the color should be unique for the internal ones
[23:30:36] legoktm: that's inside the if that has already decided there is a ~wiki user with the same name I think
[23:31:25] bd808: User::getCanonicalName() just checks for validity
[23:33:08] Hmm. I'll look then. Both the log and the metric should only fire when the second account exists.
[23:33:20] James_F: http://i.imgur.com/EAQovPH.png
[23:33:48] * bd808 is cooking dinner and watching sportsball
[23:34:33] Krinkle: I approve.
[23:34:39] dapatrick: I'm not using it for anything in particular at the moment. It's something I'm doing every once in a while to see if my knowledge is still accurate. And it can also help others to understand things, and for conference talks; and at the moment it may also help figure out how Kafka could fit as MediaWiki's event bus - something we're considering at the moment.
[23:35:34] Krinkle: I see.
[23:37:39] These are helpful to check my understanding of the infrastructure.
[23:37:49] I didn't realize there were Varnish frontends and backends.
[23:38:13] heh
[23:38:14] Or, two layers of Varnish.
[23:38:22] it's 2 instances of varnish
[23:38:46] dapatrick: One layer for the hottest urls (random distribution) and another layer for less hot urls (hash based distribution)
[23:39:06] Ah, I see.
[23:39:33] you can't fit enough pages on a single varnish server to have good cache coverage
[23:39:44] and distributing hash based only would put the load of a popular url on a single server
[23:39:57] hence 2 layers
[23:40:35] At least, that's what I'm told. But hey, I'm not ops. I just parrot what I hear around and try to understand.
[23:42:51] Last update for today: http://i.imgur.com/dZ6DsAM.png
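Editor's note on the 23:38:46 to 23:39:57 explanation: the two distribution strategies can be illustrated with a toy router. This is a sketch of the idea as described in the chat, not Varnish configuration and not WMF's actual setup. The server names and function names are made up, and a plain modulo stands in for whatever hashing scheme the real backends use.

```python
import hashlib
import random

# Hypothetical server pools, for illustration only.
FRONTENDS = ["frontend-1", "frontend-2", "frontend-3"]
BACKENDS = ["backend-1", "backend-2", "backend-3"]


def pick_frontend(rng=random):
    """Random distribution: any frontend may serve any URL.

    A very hot URL ends up cached on every frontend, so its load is
    spread across the whole layer.
    """
    return rng.choice(FRONTENDS)


def pick_backend(url):
    """Hash-based distribution: a given URL always maps to one backend.

    Each backend caches a distinct slice of the URL space, so the layer
    as a whole can hold far more pages than a single server.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(BACKENDS)
    return BACKENDS[index]
```

The trade-off matches the chat: random-only would duplicate every page on every server (poor cache coverage), while hash-only would send all traffic for one popular URL to a single server, hence the two layers.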