[00:19:58] Traffic, Operations, RESTBase, RESTBase-API, and 2 others: Expose the PDF rendering service via RESTBase - https://phabricator.wikimedia.org/T143132#2836858 (Addshore)
[00:37:19] Traffic, Operations: Migrate host lists out of cache.pp to reference values in Hiera - https://phabricator.wikimedia.org/T92601#2836900 (fgiunchedi) Open>Invalid I think this was eventually resolved during various refactoring, adding #traffic just in case.
[00:59:02] Traffic, Operations: Support ESI for ResourceLoader - https://phabricator.wikimedia.org/T78963#2836973 (fgiunchedi) Copying Traffic since it'd affect Varnish if we choose to do it
[00:59:05] Traffic, Operations: Support ESI for ResourceLoader - https://phabricator.wikimedia.org/T78963#2836975 (fgiunchedi)
[01:27:52] eeek!
[01:27:55] php > file_get_contents('https://meta.wikimedia.org');
[01:27:55] Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages:
[01:27:55] error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed in php shell code on line 1
[01:28:43] only on mac
[01:33:17] fuck you apple
[01:59:44] Traffic, Operations, Wikimedia-General-or-Unknown: Varnish: Mobile site redirect interferes with OAuth authorization process - https://phabricator.wikimedia.org/T74186#2837108 (fgiunchedi) Adding #traffic for visibility
[12:41:57] so there was an icinga disk space warning for cp4008, apt-get clean rescued some 500 megs
[12:42:34] still we do have some pretty big files under /var/cache/varnishkafka (eg: webrequest.stats.json.1 is 659M)
[12:46:55] ah snap
[12:47:21] not to mention /var/log/daemon.log being 1.1G
[12:47:33] checking vk's logrotate
[12:47:46] that's mostly gmond vsm permission spam
[12:49:57] ema: /var/cache/varnishkafka/webrequest.stats.json states a weekly logrotation + compress, plus rotate 4.. maybe we could move it to rotate daily
[12:50:07] elukey: +1
[12:50:22] it's not only perm spam in daemon.log, something is wrong with /usr/lib/ganglia/python_modules/varnishkafka.py
[12:51:12] KeyError: 'kafka.varnishkafka.time'
[12:52:17] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2837991 (Gilles)
[12:53:41] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2838005 (Gilles)
[12:53:55] elukey: filing a bug for the varnishkafka.py error
[12:58:37] ema: https://gerrit.wikimedia.org/r/#/c/324706/1
[12:59:47] Traffic, Operations: Ganglia varnishkafka python module crashing repeatedly - https://phabricator.wikimedia.org/T152093#2838033 (ema)
[13:00:01] Traffic, Operations: Ganglia varnishkafka python module crashing repeatedly - https://phabricator.wikimedia.org/T152093#2838046 (ema) p:Triage>High
[13:00:54] now the question is.. do we need ganglia stats for vk?
[13:01:00] :)
[13:01:02] especially now that ganglia should go away?
[13:01:11] I have always used graphite metrics
[13:02:37] we can get rid of it as far as I'm concerned
[13:05:10] me too
[13:05:56] for the record, the problem started on Nov 22 00:05:46 on cp4008
[13:09:50] and https://gerrit.wikimedia.org/r/#/c/324708/1
[13:10:11] statsv also uses it, going to check if affected
[13:10:32] on cp* hosts we use logster to read stats.json and push metrics to statsd
[13:10:42] but not for statsv
[13:11:01] anyhow, long live to prometheus :)
[13:13:07] indeed!
[13:22:24] interesting: ./files/varnishkafka_ganglia.py:460:13: F999 dictionary key '2.1' repeated with different values
[13:22:53] this is something that tox-jessie complains about for the vk module change
[13:23:03] that was sitting there waiting for somebody to send a code review
[13:23:06] sigh
[13:50:12] vk stat logrotate files updating now
[13:50:36] ganglia deprecation might need some more +1, I'll ping ottomata
[14:01:35] elukey: speaking of disk space issues, there's an alert for stat1002 too
[14:01:52] sigh
[14:01:55] thanks :)
[14:15:13] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2838193 (BBlack) I've made this argument before. I'm not fond of Commons/upload images/thumbs being hotlinkable. In my mind, Commons exists to serve the multimedia needs of the encyclopedic content, and hotlinking from it...
[14:26:06] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2838225 (Gilles) Except I've looked at the data for 302s and there was no legitimate use. Even on the rare instances of a blog that seemed to have an educative purpose, there was no attribution and the text was probably cop...
[14:34:17] elukey: I've played a little bit with vago https://phabricator.wikimedia.org/P4551
[14:35:13] that's basically the 'varnishlog raw' example but using vago.REQ as the grouping method + cli parameters for the VSL query and instance name
[14:36:29] outputting every matching entry (ie: without commenting out the Printf) the thing is unsurprisingly using 100% CPU on pinkunicorn
[14:36:37] without the printf it's ~60%
[14:37:02] and with a sleep?
[14:37:13] the usual 0.01s
[14:37:32] mmh no, there should be no sleep there
[14:37:52] that's the callback being called once per matching entry
[14:38:46] so the golang route seems fun but there would be quite some work to do
[14:39:14] as far as I can tell from https://github.com/phenomenes/vago/blob/master/log.go#L33 there's no support for VSL_Args (eg: -i RespStatus and such)
[14:39:19] ahhhh sorry I didn't get the syntax
[14:39:30] wow Go is not really for me :P
[14:40:14] is vago the only available one?
[14:40:27] no clue :)
[14:41:13] :)
[14:43:20] ah ok so the v.Log does the VSLQ_Dispatch
[14:43:28] jaaa
[14:43:35] and sleeps time.Sleep(1000)
[14:44:41] yes, which is surely wrong
[14:47:44] I don't see any alternative to vago on github
[14:49:46] (awesome name BTW)
[14:57:17] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2838311 (BBlack) Keep in mind I fundamentally agree with you from personal POV, but I feel the need to play devil's advocate for the existing stance today here: >>! In T152091#2838225, @Gilles wrote: > Except I've looked a...
[14:59:35] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2838318 (Gilles) The 302s are links to thumbnails that have moved because the original was moved. Mediawiki honors those redirects on misses, figuring out what the new thumbnail location is.
[15:09:08] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2838342 (Gilles) The examples you've provided for legitimate use cases aren't compelling examples of us providing a free CDN being a necessity. The examples I've seen on blogspot could host the images there, and it would be...
[15:23:14] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2837991 (faidon) It seems that you're objecting to this feature on two different grounds: one is the legality of how it's being used by users (copyvios, mainly missing attribution when the content's license requests it) and...
[16:35:47] Traffic, Operations: more robust certificate chain creation in puppet - https://phabricator.wikimedia.org/T84543#2838608 (Dzahn)
[16:37:14] Traffic, Operations: more robust certificate chain creation in puppet - https://phabricator.wikimedia.org/T84543#928592 (Dzahn) I think this old ticket imported from RT times can be resolved. But I would say the authority on this should be @BBlack
[16:51:45] bblack: You think we could get to https://gerrit.wikimedia.org/r/#/c/305536/ today?
[17:04:17] netops, Operations: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#2838705 (faidon) So first of all, JTAC said there is no ETA for this fix getting into 14.1 and we should really go with 15.1. So, I tried upgrading to 15.1R...
[17:13:02] netops, Operations: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#2838729 (akosiaris) Unfortunately 15.1R4.6 has not solved the problem. Just managed to reproduce it with the exact same procedure and results. That is enabli...
[17:15:05] netops, Operations: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#2838737 (faidon) I responded to Juniper with the results of the above test, it's back with them now…
[17:27:30] Traffic, Operations, Patch-For-Review: Ganglia varnishkafka python module crashing repeatedly - https://phabricator.wikimedia.org/T152093#2838770 (elukey) Next step is to check if we can use `logster` for `statsv` metrics (and then probably ask the Performance team). Going to work on it tomorrow!
[17:43:46] ostriches: the holdup on that isn't me, I'm all for it. I just don't know how to properly test an apache change for deploy
[17:44:23] I mean, we can't really test a removal of a vhost.
[17:44:35] I mean maybe add the domain back to dns, sync the removal, watch it break
[17:44:40] Then remove dns again
[17:45:26] I'm absolutely certain nothing in puppet/dns/mw-config is calling that vhost anymore. And it's been dead in dns for awhile now, anything to it should be unroutable.
[17:45:48] yeah I'm not worried about breaking bits itself
[17:46:19] but we're supposed to have some generalized test of "unforeseen consequences of seemingly simple apache changes" that runs through a library of standard tests against other hostnames on X-Wikimedia-Debug or something like that
[17:46:23] I thought?
[17:47:37] apache-fast-test?
[17:49:04] maybe? I really have no idea, I don't normally do apache changes
[17:49:33] muta ;-)
[18:01:06] one thing that I am planning to do is adding logging for the default vhost
[18:01:24] so requests not landing to any configured vhost will be logged
[18:01:41] but this will not help a lot in testing before releasing
[18:03:21] and from https://wikitech.wikimedia.org/wiki/Application_servers#Deploying_config I didn't find any testing env..
[18:04:12] in T57857 it was mentioned that we have an operations/apache-config.git but I am not sure how up to date it is
[18:04:12] T57857: Unit tests for apache config/rewrites - https://phabricator.wikimedia.org/T57857
[18:04:25] awesome part would be to have it running in jenkins
[18:04:33] (maybe its up to date version etc..)
[18:10:19] ostriches: if you are not in a hurry we could do it early next week
[18:10:37] I am going to add it in deployment prep now
[18:12:48] Awesome thanks!
[18:12:49] :)
[18:13:28] bblack: if you want to do it first let me know, otherwise I can help :)
[18:14:13] ok it is in deployment-prep now
[18:14:36] so let's leave it in there for a couple of days to see if something comes up
[18:15:13] sounds great :)
[18:18:33] ah snap
[18:18:35] if $::realm == 'labs' {
[18:18:35] include ::mediawiki::web::beta_sites
[18:18:35] } else {
[18:18:35] include ::mediawiki::web::prod_sites
[18:18:37] }
[18:18:50] guess where bits.w.o is :/
[18:18:57] so deployment-prep is a no op
[18:19:10] will try to test it on mwdebug first
[18:19:22] ostriches: let's touch base next week!
[18:27:02] elukey: Sounds great! Docroot sanity++
[18:58:52] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2839200 (fgiunchedi) p:Triage>Normal
[19:10:45] Traffic, Operations: Block hotlinking - https://phabricator.wikimedia.org/T152091#2837991 (valhallasw) >>! In T152091#2838342, @Gilles wrote: > The examples you'e provided for legitimate use cases aren't compelling examples of us providing a free CDN being a necessity. The examples I've seen on blogspot...
[19:13:07] elukey, I have a bunch of commits to kill that realm distinction
[19:13:24] but no one is reviewing them
[22:27:07] elukey: any insight on https://phabricator.wikimedia.org/T152122 ? It's an FR ticket about a mysterious 1h dropoff in banner impressions back at ~08:00 today. But it could be an artificial issue with webrequest data if other reqs are affected too? also there was some pivot deploy around then...
[23:34:17] Wikimedia-Apache-configuration, Discovery, Operations, Mobile, Patch-For-Review: m.wikipedia.org incorrectly redirects to en.m.wikipedia.org - https://phabricator.wikimedia.org/T69015#2840271 (debt) Is there a chance that this will actually be implemented any time soon?