[01:35:27] HTTPS, Traffic, Labs, Operations, and 2 others: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2078533 (Dzahn) works now:) root@tools-proxy-01:~# tail -f /var/log/nginx/access-scheme.log shows first results
[01:40:22] HTTPS, Traffic, Labs, Operations, and 2 others: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2078538 (Dzahn) here's a first list of tools using http, status 200 ``` add-information admin anagrimes anomiebot anomiebot HTTP ~apper apple-touch-icon-12...
[07:51:26] HTTPS, Traffic, Wikipedia-Store: https://store.wikimedia.org doesn't set HSTS header - https://phabricator.wikimedia.org/T128559#2078914 (Chmarkine)
[07:52:18] HTTPS, Traffic, Operations, Wikipedia-Store: shop.wikimedia.org should be HTTPS only - https://phabricator.wikimedia.org/T39790#417984 (Chmarkine)
[07:52:20] HTTPS, Traffic, Operations, Wikipedia-Store: https://store.wikimedia.org doesn't set HSTS header - https://phabricator.wikimedia.org/T128559#2078930 (Chmarkine)
[09:58:19] Traffic, Operations: Install XKey vmod - https://phabricator.wikimedia.org/T122881#2079205 (ema) p:Triage>Normal a:ema
[10:53:47] Traffic, MobileFrontend, Operations, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2079398 (Sjoerddebruin) >>! In T124356#2077122,...
[11:05:26] error: varnishkafka:4 unknown user 'syslog'
[11:05:26] error: found error in /var/log/varnishkafka.log , skipping
[11:05:28] tons of those
[11:05:38] (cronspam)
[11:33:04] ick
[11:33:35] so that's actually coming from logrotate
[11:34:10] looking...
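The `unknown user` errors above are the kind logrotate reports when a `create` directive in one of its config files names an account that does not exist on the host. A minimal Python sketch of scanning config text for such directives follows. The function names, the regex, and the sample config are illustrative assumptions, not anything taken from the Wikimedia puppet repository, and the regex covers only the plain `create [mode] [owner] [group]` form.

```python
import pwd  # Unix-only standard library module
import re

# Matches e.g. "create 0664 syslog adm"; mode, owner and group are all optional.
CREATE_RE = re.compile(
    r"^\s*create(?:\s+(\d{3,4}))?(?:\s+(\S+))?(?:\s+(\S+))?\s*$"
)


def user_exists(name):
    """Return True if the local passwd database knows this user name."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def unknown_create_users(config_text, exists=user_exists):
    """Return (line_number, user) for each create directive naming an unknown user.

    `exists` is injectable so the check can be exercised without touching
    the real passwd database.
    """
    problems = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = CREATE_RE.match(line)
        if match and match.group(2) and not exists(match.group(2)):
            problems.append((lineno, match.group(2)))
    return problems


# Hypothetical config text, made up for illustration only.
SAMPLE = """/var/log/varnishkafka.log {
    daily
    rotate 7
    create 0664 syslog adm
}
"""

if __name__ == "__main__":
    for lineno, user in unknown_create_users(SAMPLE):
        print(f"line {lineno}: unknown user {user!r}")
```

Run on a host that has no `syslog` account, this would flag the `create` line; on a host that has one, it prints nothing.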
[11:35:27] "create 0664 syslog adm" in that logrotate comes from 2014, so I don't think it was a recent change to the logrotate itself
[11:36:36] seems to have started for caches on Feb 26, but has been happening for other random hosts further back than that
[11:41:53] e.g. kafkatee on oxygen has been cronspamming this for almost a year:
[11:41:54] /etc/cron.daily/logrotate:
[11:41:54] error: kafkatee:4 unknown user 'syslog'
[11:41:54] error: found error in /var/log/kafkatee.log , skipping
[11:42:30] I'm guessing user 'syslog' must have existed in the past, now doesn't
[11:42:40] although I don't know how that changed on Feb 26 for caches...
[11:45:23] on an example cache, apt logs for that day only show:
[11:45:23] Unpacking nginx-full (1.9.4-1+wmf2) over (1.9.4-1+wmf1) ...
[11:45:23] Processing triggers for systemd (215-17+deb8u3) ...
[11:45:24] Processing triggers for man-db (2.7.0.2-5) ...
[11:45:42] (upgrade of nginx package, but running actions for systemd could maybe be involved)
[11:46:36] maybe they've been spamming this forever, and their outbound mail was fixed on the 26th heh
[11:46:44] I donno :P
[11:47:35] ok got it
[11:47:36] Feb 25 08:19:20 cp1065 puppet-agent[1055]: (/Stage[main]/Varnishkafka/File[/etc/logrotate.d/varnishkafka]/owner) owner changed '998' to 'root'
[11:47:40] Feb 25 08:19:20 cp1065 puppet-agent[1055]: (/Stage[main]/Varnishkafka/File[/etc/logrotate.d/varnishkafka]/group) group changed 'ganglia' to 'root'
[11:47:43] Feb 25 08:19:20 cp1065 puppet-agent[1055]: (/Stage[main]/Varnishkafka/File[/etc/logrotate.d/varnishkafka]/mode) mode changed '0664' to '0444'
[11:48:09] so, we had that big logrotate fiasco recently, where we discovered that logrotate disliked/ignored a lot of its config files due to bad ownership/mode
[11:48:50] now that the perms on the logrotate file itself are correct, logrotate is trying to parse/use them for the first time, and so now errors within the file that are much much older are finally apparent
[11:49:06] that explains the timing. now just have to pick an appropriate user to fix the files with
[11:50:27] also, varnishkafka being a submodule still sucks
[11:51:21] modules/hhvm/files/hhvm.logrotate: create 0640 syslog deployment
[11:51:21] modules/ocg/files/logrotate: create 644 syslog adm
[11:51:22] modules/varnishkafka/files/varnishkafka_logrotate: create 0664 syslog adm
[11:51:27] 3x uses I can find
[11:52:14] but the other two (hhvm, ocg) are on trusty, which has a user named syslog
[12:02:54] anyways, fixed: https://gerrit.wikimedia.org/r/#/c/274368/ -> https://gerrit.wikimedia.org/r/#/c/274370/
[12:44:07] wooo thanks
[12:44:12] sorry to drop this on you and disappear
[12:46:45] also ick on the logrotate perms issue :(
[13:09:00] bblack: very low priority, only if you have time :) - ori brought up an interesting question a couple of days ago, namely if there is a reason why http://dumps.wikimedia.org does not redirect to https (I checked for fun redirects.dat
[13:11:51] Making this change might break people with automation in place that grabs the http url, but with the adequate announcement it shouldn't be a problem to redirect everything to https
[13:12:54] elukey: there's probably already a task, or should be
[13:13:08] "dump" is one of the few that's not through misc-web, probably because of fears about large downloads being affected
[13:13:09] this is a good point, I haven't checked
[13:13:21] err "dumps"
[13:13:30] not that we've fixed all of misc-web either, but they're easier to fix
[13:14:28] would the redirect be issued by apache via request.dat or by varnish in this case?
[13:14:51] it's not behind our standard nginx/varnish termination (e.g. misc-web cluster), it's a completely independent public service
[13:14:54] so "whatever it uses"
[13:16:29] ahhhh okok
[13:16:54] hopefully whatever it uses is nginx or apache, and we can use our standard stuff to set up the actual TLS parameters
[13:17:02] (or already are)
[13:18:08] checked, it does
[13:18:13] so only the redirect bit is missing
[13:18:24] in the nginx config somewhere I'd guess
[13:19:37] I am going to open a phab task for this to track the work, very low priority but interesting to poke around configs
[13:20:07] ok thanks!
[13:21:33] Traffic, Analytics, Operations: http://dumps.wikimedia.org should redirect to https:// - https://phabricator.wikimedia.org/T128587#2079753 (elukey)
[14:43:32] HTTPS, Traffic, Operations, Wikipedia-Store: https://store.wikimedia.org doesn't set HSTS header - https://phabricator.wikimedia.org/T128559#2079914 (Ppena) Hi @Chmarkine. We don't have anyone with a tech/ops background working for the store at the moment < waiting volunteers> :)!! Can you please...
[14:48:35] Traffic, Analytics, Operations: http://dumps.wikimedia.org should redirect to https:// - https://phabricator.wikimedia.org/T128587#2079921 (Dzahn) Has once been declared "won't fix" on https://wikitech.wikimedia.org/wiki/Httpsless_domains in the past. Adding @ArielGlenn. Remember that discussion?
[15:13:16] HTTPS, Traffic, Huggle, Operations: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2079942 (DVdm) I have left a new version 2.1.27.6 in our shared dropbox, ready to be picked up, tested, honed and published by Petrb. Caught bug reading resources file War...
[15:49:09] Traffic, MobileFrontend, Operations, MW-1.27-release, and 6 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getText() ) - https://phabricator.wikimedia.org/T124356#2080027 (Jdlrobson) Another example https://nl...
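The dumps.wikimedia.org thread above comes down to one question: does a plain-http request get answered with a redirect to the https equivalent? A rough Python sketch of that decision, given a response's status code and `Location` header, is below. The function name and the same-host rule are assumptions made for illustration; nothing here reflects how the dumps host is actually configured.

```python
from urllib.parse import urlsplit

# HTTP status codes that carry a redirect via the Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_https_upgrade(request_url, status, location):
    """Return True if the response redirects an http:// URL to https:// on the same host."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    src = urlsplit(request_url)
    dst = urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
    )


if __name__ == "__main__":
    # Hypothetical response values, not captured from the real service.
    print(is_https_upgrade(
        "http://dumps.wikimedia.org/", 301, "https://dumps.wikimedia.org/"
    ))
```

Keeping the check as a pure function over status and header values means it can be fed from any HTTP client, or from access-log fields, without doing network I/O itself.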
[16:03:22] bblack, ema: after preliminary tests are fine, ok to install openssl 1.0.2g on cp1008 tomorrow morning?
[16:10:11] HTTPS, Traffic, Operations, Wikipedia-Store: https://store.wikimedia.org doesn't set HSTS header - https://phabricator.wikimedia.org/T128559#2078914 (Krenair) >>! In T128559#2079914, @Ppena wrote: > We don't have anyone with a tech/ops background working for the store at the moment < waiting volunt...
[16:24:25] moritzm: definitely!
[16:46:49] Traffic, Deployment-Systems, Operations, Performance-Team, and 2 others: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime - https://phabricator.wikimedia.org/T99096#2080308 (Krinkle)
[17:57:28] HTTPS, Traffic, Operations, Wikipedia-Store: https://store.wikimedia.org doesn't set HSTS header - https://phabricator.wikimedia.org/T128559#2080625 (Dzahn) Yea, i was gonna say, that would be possible if store was running on our own infrastructure, but it's all external on shopify.com.
[18:05:12] HTTPS, Traffic, Operations, Wikipedia-Store: https://store.wikimedia.org doesn't set HSTS header - https://phabricator.wikimedia.org/T128559#2080643 (Dzahn) >>! In T128559#2079914, @Ppena wrote: > Can you please explain a little what setting the HSTS header means and what is the urgency on this? Th...
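On the HSTS question quoted above: the header in question is `Strict-Transport-Security`, whose value is a semicolon-separated list of directives such as `max-age=<seconds>` and `includeSubDomains`. A small Python sketch of parsing such a value follows. The function name and the returned dict layout are assumptions for illustration, and it handles only those two directives.

```python
def parse_hsts(value):
    """Parse a Strict-Transport-Security header value into a small dict.

    Directive names are matched case-insensitively; unknown directives
    are ignored.
    """
    result = {"max_age": None, "include_subdomains": False}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        name = name.strip().lower()
        arg = arg.strip().strip('"')  # max-age may be given as a quoted string
        if name == "max-age" and arg.isdigit():
            result["max_age"] = int(arg)
        elif name == "includesubdomains":
            result["include_subdomains"] = True
    return result


if __name__ == "__main__":
    # Example header value, not taken from any Wikimedia response.
    print(parse_hsts("max-age=31536000; includeSubDomains"))
```

A site that "doesn't set HSTS header" in the sense of the task title would have no such header to parse at all, so a caller would treat a missing header the same as `max_age` being `None`.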
[18:13:02] HTTPS, Traffic, Operations, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2080687 (Florian)
[18:15:36] HTTPS, Traffic, Operations, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2080697 (Dzahn) Open>stalled stalled - needs DoD TechOps person on this ticket :p
[18:46:53] HTTPS, Traffic, Operations, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2080772 (Florian) @Heather: As far as I can see (and with the big help @Dzahn, who pointed me to this approach :)) it seems you're in the Commun...
[18:47:18] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2080775 (Florian)
[19:16:41] HTTPS, Traffic, Operations, Wikipedia-Store: https://store.wikimedia.org doesn't set HSTS header - https://phabricator.wikimedia.org/T128559#2080900 (Ppena) @Dzahn got it, thanks for explaining in plain english, Dan ;) I will email Shopify and ask them about it, but unfortunately I don't think we...
[20:32:02] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2081299 (Jalexander) I'm looking into some options for contacts.
[20:52:10] Traffic, Operations: 3x cache_upload crashed in a short time window - https://phabricator.wikimedia.org/T125401#2081439 (BBlack) cp1048 (another upload cache) crashed today with: ``` Mar 2 20:25:29 cp1048 kernel: [1915351.432154] ------------[ cut here ]------------ Mar 2 20:25:29 cp1048 kernel: [19153...
[20:58:19] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2081526 (Jalexander) Who would be best for me to connect on an email about this, @Dzahn ? Still looking but good chance t...
[21:04:29] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2081543 (BBlack) Probably me
[21:08:51] Traffic, Operations: 3x cache_upload crashed in a short time window - https://phabricator.wikimedia.org/T125401#2081582 (MoritzMuehlenhoff) > @MoritzMuehlenhoff - It was already running `Linux cp1048 3.19.0-2-amd64 #1 SMP Debian 3.19.3-9 (2016-01-04) x86_64`, does that include the paulmck rcu fix already?...
[22:35:06] Traffic, Operations: 3x cache_upload crashed in a short time window - https://phabricator.wikimedia.org/T125401#2082003 (BBlack) IMHO, 4.4.x is getting close anyways, we may as well see if this problem just goes away after the switch to it.
[22:51:17] Traffic, Operations: 3 Varnish cache_upload servers crashed in a short time window - https://phabricator.wikimedia.org/T125401#2082092 (Krinkle)
[23:40:12] Traffic, Operations: Office network using monkeybrains.net instead of connection to SFO pop site - https://phabricator.wikimedia.org/T128669#2082348 (bbogaert)