[04:36:20] TimStarling: Ping regarding https://phabricator.wikimedia.org/T146285#3827389 - It still has an indirect sub task open about byte cache, but all its direct sub tasks are closed. Not sure if that bytecache issue is a blocker or not for mwscript using php7/hhvm instead of php5.
[04:39:23] you mean T161598? I fixed T145819
[04:39:24] T145819: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesize limited to 512MBytes - https://phabricator.wikimedia.org/T145819
[04:39:24] T161598: Monitor HHVM bytecode cache depletion on mediawiki app servers - https://phabricator.wikimedia.org/T161598
[04:48:43] TimStarling: Yeah
[04:57:25] TimStarling: Thanks, I'll take that as meaning the mwscript switch is unblocked?
[04:57:55] yes, added comment
[04:57:58] yay
[13:55:47] Krinkle, MatmaRex: As far as I can tell, since that test never actually executes the API module it doesn't matter what the query is. ApiResult hasn't taken an ApiMain as its first parameter since MW 1.25, all that code could be cut now.
[13:57:36] addshore: I don't know, but that serializing of huge objects like Article probably needs to stop regardless. Whatever is important there could hopefully be extracted by the maintenance script that cleans things up.
[14:20:59] <_joe_> I am working on taking EtcdConfig to production, and one of my needs is observability; specifically, I need to know which version of the config is currently loaded on an appserver - that will be collected once https://gerrit.wikimedia.org/r/#/c/411286/ is merged. What's the best way to expose the value of that method to be fetchable via HTTP? Add it to meta=siteinfo as TimStarling did for
[14:21:05] <_joe_> wmfMasterDatacenter in https://gerrit.wikimedia.org/r/#/c/351232/ ?
[14:24:33] _joe_: That would probably be the most straightforward way to do it within MediaWiki, yes.
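[Editor's note: a minimal sketch of the consumer side of the siteinfo approach discussed above. The `wmf-config-version` field name is purely hypothetical (the actual field would be defined by the patch under review); `wmfMasterDatacenter` is the existing field referenced in the conversation. The sample payload mimics the shape of an `action=query&meta=siteinfo&format=json` response.]

```python
import json

# Hypothetical API response; "wmf-config-version" is an ASSUMED field name,
# not something MediaWiki actually exposes. wmfMasterDatacenter mirrors the
# field added for the multi-DC work mentioned above.
SAMPLE_RESPONSE = json.loads("""
{
  "batchcomplete": "",
  "query": {
    "general": {
      "sitename": "Wikipedia",
      "wmfMasterDatacenter": "eqiad",
      "wmf-config-version": "1519172345"
    }
  }
}
""")


def get_general_field(siteinfo: dict, field: str):
    """Extract one field from a meta=siteinfo response, or None if absent."""
    return siteinfo.get("query", {}).get("general", {}).get(field)


if __name__ == "__main__":
    print(get_general_field(SAMPLE_RESPONSE, "wmfMasterDatacenter"))
    print(get_general_field(SAMPLE_RESPONSE, "wmf-config-version"))
```

In practice a monitoring probe would fetch `/w/api.php?action=query&meta=siteinfo&format=json` over HTTP and feed the decoded JSON to a helper like this.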
[14:24:45] <_joe_> ok, thanks :)
[18:52:38] Reedy: Forgive me: https://gerrit.wikimedia.org/r/#/c/253969/29..30/includes/resourceloader/ResourceLoaderClientHtml.php
[18:53:05] I don't mind
[18:53:09] It just looked vaguely sane
[18:53:11] :P
[18:53:20] Very vaguely. ;-)
[20:35:39] anomie: Do you think it'd be feasible to have the test not (as much) influenced by MWDebug? E.g. is there a way to disable it or mock it?
[21:57:47] _joe_: did you figure out how to do that?
[22:31:27] TimStarling, anomie any ideas about https://phabricator.wikimedia.org/T187833 ? how would i go investigating that? /cc legoktm
[22:35:43] finding URLs would be a good start
[22:38:34] interesting
[22:39:14] fatal.log on mwlog1001 has a backtrace complete with pretty printed object parameters
[22:39:50] {\"mTitle\":{\"mTextform\":\"Barack Obama\"
[22:42:10] https://graphite.wikimedia.org/render/?width=600&height=400&target=aliasByNode(scale(MediaWiki.errors.fatal.rate,60),0,1,2)&from=now-4d&drawNullAsZero=true
[22:42:21] TimStarling: Nice. I was starting to wonder if https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/HHVMRequestInit.php is broken
[22:42:28] But I guess that doesn't end up in logstash
[22:42:49] * subbu checks if he has access to mwlog1001
[22:43:09] previewing the page appears to work, the log entries are for API action=parse
[22:43:33] i don't.
[22:44:09] Ah, no, they do end up in logstash through a different handler, in type:mediawiki channel:fatal
[22:45:23] And with urls, this I got randomly also from API, this one enwiki:/w/api.php?action=parse&prop=iwlinks|externallinks&format=json&oldid=**
[22:46:58] I've re-saved the Fatal Monitor, now including channel:fatal from mediawiki
[22:47:08] so the context and urls etc are viewable from logstash
[22:48:33] where ** 256170852?
[22:48:48] ah yes, I get a 503 by trying that URL
[22:49:36] "This is an old revision of this page, as edited by Angerdrops (talk | contribs | block) at 14:46, 6 December 2008"
[22:49:45] and I get a 503 previewing that revision
[22:53:26] Indeed, revision 256170852.
[22:53:30] no User-Agent, does that mean there was no User-Agent header in the request?
[22:55:04] there's an IP logged but it's just varnish, no XFF
[22:55:15] TimStarling: Hm.. I don't think there is a user-agent field in our log context by default, only the url and referer are, and we don't add UA for fatal specifically either.
[22:56:53] well, it's only a few per minute, it's not a DoS
[22:57:01] just a broken bot
[22:57:59] TimStarling, Krinkle what log file are you looking at on mwlog1001?
[22:58:11] subbu: I'm not.
[22:58:12] i have access .. i had typoed and assumed i didn't have access.
[22:58:29] I'm using https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor
[22:58:39] and I'm also checking kafka raw webrequest to see if I can sample a raw request to get more details
[22:58:47] IP logging is broken in almost all logs, the XFF lookup was too expensive and got disabled so they just log the varnish IP
[22:59:14] except a few like login failure logs where the XFF check is done manually in the logging code
[23:00:11] ah, fatal.log
[23:01:05] yeah, fatal.log, but it apparently doesn't have as much information as logstash
[23:02:20] >> -security
[23:05:38] TimStarling, https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.02.20/hhvm?id=AWG1dEzF4sAqeJCoFFLY&_g=() from the dashboard doesn't have any more info? what logstash url has more info?
[23:07:35] subbu: https://logstash.wikimedia.org/goto/9f522ce47239a4373bc059024f286f5e
[23:10:46] thanks .. just following along so i get more familiar with where everything is.
[23:11:01] It seems viewing that revision through index.php also triggers a fatal
[23:11:16] Ah, yeah, Tim reported that already.
[23:11:29] (I thought you meant preview as in Page Previews)
[23:11:57] non-fatal: https://en.wikipedia.org/?diff=256170852&diffonly=1&onlydiff=1
[23:13:44] looks like logstash has url, shard, referrer, ip whereas fatal.log doesn't.
[23:16:29] no user-agent in the logs .. aren't bots supposed to identify themselves via a user-agent string?
[23:25:53] tgr: Any interest in blowing off the dust from https://gerrit.wikimedia.org/r/#/c/245581/ and getting Scribunto and SyntaxHighlight to not endlessly spam the deprecation logs?
[23:29:08] James_F: will look but probably not today
[23:29:11] subbu: I've grepped the user-agent out of raw request logs, it's set to "Ruby"
[23:29:30] tgr: Sure, it's been broken since REL1_25, no rush. :-)
[23:29:37] raw IP (not Varnish or WMF IP) is also known, all from the same IP.
[23:30:14] Krinkle, ok. but why isn't the user-agent / raw ip in the fatal log or in logstash?
[23:30:31] logstash i assume because of privacy reasons?
[23:30:35] subbu: The log is written to by MediaWiki, which apparently doesn't have XFF working properly for this.
[23:30:48] And as for User-Agent, it's just not included in the default MediaWiki Monolog context.
[23:31:06] wiki version, and other mw/http-specific fields are included, but not user-agent
[23:31:11] oh, i see you guys already discussed all this above .. let me read the entire scrollback.
[23:31:13] Some specific feature logs, like for the API, do log it.
[23:31:42] I don't think it's privacy related, although we should generally avoid spreading the same information to too many different systems in order to keep access and retention more simple and centralised.
[23:31:49] given Logstash is behind NDA
[23:32:22] So for now, raw IP or request headers require stat access to Hive/Kafka, where the information is available (or as root, by tailing varnishlog and hoping to get lucky)
[23:32:35] got it.
[23:33:30] stat access should be uncontroversial for NDA with manager approval for need/use case. But at least that way it stays in one place.
[23:33:59] That's just me guessing. I don't know if it's explicitly decided not to log in Logstash.
[23:36:04] ok. thanks. i understand this a bit better now.
[23:40:05] For the record, I got the information from stat1004 via a command like this:
[23:40:10] $ kafkacat -C -b kafka1012 -o-1000000 -c2000000 -t webrequest_text | grep 256170852
[23:40:24] which then yielded 4 matching reqs from the last 10 minutes or so.
[23:43:28] Krinkle: XFF lookup in MediaWiki involves initializing the IPSet object with the IP patterns of trusted proxies, which turns out to be surprisingly slow
[23:43:50] or at least that was the case about two years ago when I looked into this
[23:44:12] Right, but I assume we haven't suddenly started to attribute all anon edits to Varnish IPs
[23:44:14] So we still do it, right
[23:44:18] but only on edits or something?
[23:44:26] doing it on every request for logstash would have been something like a 2% response time increase
[23:45:01] yeah, we do it on edits and a few other things like throttling
[23:45:03] and also for any permission check on GETs with a session, presumably
[23:45:11] and any writes.
[23:45:26] But not the anon cache misses that populate varnish
[23:46:12] Hm.. would it make sense to instead log the full list to Logstash and not do the elaborate XFF lookup?
[23:46:16] part of the problem is that there is no easy way to tell if some other part of MediaWiki already did it (or will do it), so logstash can't trivially use the real IP even in those cases
[23:46:22] Seems like that would still be more useful than logging only the first entry (the Varnish IP)
[23:46:37] that would be possible, sure
[23:47:17] Yeah, having it sometimes override for logstash seems icky, I agree.
[23:47:18] I think it's done by hand in some auth-related log channels that needed the real IP
[23:47:30] but could be just added to the generic log processor
[23:47:38] The logging should ideally be consistent for the default meta-data fields.
[23:47:52] Right now that default field contains the Varnish IP
[23:48:06] We could modify that to instead log the xff-list, or log it in addition.
[23:53:47] the related task is T114700
[23:53:48] T114700: Monolog added "ip" is REMOTE_ADDR and does not honor XFF headers - https://phabricator.wikimedia.org/T114700
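[Editor's note: the XFF discussion above boils down to walking the X-Forwarded-For chain from the connecting peer backwards, skipping trusted proxies, until the first untrusted hop. This is a minimal Python sketch of that idea, not MediaWiki's actual IPSet/ProxyLookup code; the trusted ranges and function names are illustrative assumptions.]

```python
import ipaddress

# ASSUMED trusted-proxy ranges for illustration only; the real production
# list is built into an IPSet from MediaWiki's CDN/proxy configuration.
TRUSTED_PROXIES = [
    ipaddress.ip_network(n) for n in ("10.0.0.0/8", "208.80.152.0/22")
]


def is_trusted(ip: str) -> bool:
    """True if the address falls inside any trusted proxy range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_PROXIES)


def resolve_client_ip(remote_addr: str, xff_header: str) -> str:
    """Pick the client IP from REMOTE_ADDR plus the X-Forwarded-For header.

    Walk the chain right to left (nearest hop first), skipping trusted
    proxies; the first untrusted hop is taken as the client. If every hop
    is trusted, fall back to the leftmost entry.
    """
    chain = [h.strip() for h in xff_header.split(",") if h.strip()]
    chain.append(remote_addr)  # the directly connecting peer is last
    for hop in reversed(chain):
        if not is_trusted(hop):
            return hop
    return chain[0]
```

With an empty XFF header this degrades to plain REMOTE_ADDR, which is exactly the behaviour T114700 describes for the Monolog "ip" field; logging the whole `chain` instead, as suggested above, would avoid the per-request lookup cost entirely.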