[00:29:25] Traffic, MediaWiki-Uploading, Multimedia, Operations, Wikimedia-Video: Uploading 1.2GB ogv results in 503 - https://phabricator.wikimedia.org/T128358 (Ijon) Hello. I seem to have run up against this problem. This seems like a serious barrier to contributing core content (media files into Co...
[00:40:34] Traffic, MediaWiki-Uploading, Multimedia, Operations, Wikimedia-Video: Uploading 1.2GB ogv results in 503 - https://phabricator.wikimedia.org/T128358 (Tgr) If this really is a Pywikibot problem, it probably helps with prioritization if the relevant project is tagged. At a glance though pywi...
[01:09:20] Traffic, MediaWiki-Uploading, Multimedia, Operations, Wikimedia-Video: Uploading 1.2GB ogv results in 503 - https://phabricator.wikimedia.org/T128358 (zhuyifei1999) Pywikibot does support chunked uploading, this task is about why such chunked uploading must be using async mode, and if it mus...
[01:20:06] Traffic, MediaWiki-Uploading, Multimedia, Operations, Wikimedia-Video: Uploading 1.2GB ogv results in 503 - https://phabricator.wikimedia.org/T128358 (zhuyifei1999) On a side note, there are few tools/scripts that support async chunked uploading (UW, Rillke's script, and v2c are the only one...
[02:45:44] Wikimedia-Apache-configuration: noc.wikimedia.org has broken link to Server Admin Log - https://phabricator.wikimedia.org/T191085 (PrimeHunter) Open>Resolved a:PrimeHunter noc.wikimedia.org still links https://wikitech.wikimedia.org/view/Server_Admin_Log but it now redirects to https://wikitech.w...
[03:17:55] Traffic, Operations: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Dzahn) I think i did httpd setup for pretty much all of those except mx. Happy to help deploying more of them.
[06:50:51] Traffic, MediaWiki-Uploading, Multimedia, Operations, Wikimedia-Video: Uploading 1.2GB ogv results in 503 - https://phabricator.wikimedia.org/T128358 (Ijon) (@Tgr - my concern is for end-users not using Pywikibot, but the videoconvert tool. Thanks to @zhuyifei1999 and @Harej I learned that...
[06:51:44] Traffic, MediaWiki-Uploading, Multimedia, Operations, Wikimedia-Video: Uploading 1.2GB ogv results in 503 - https://phabricator.wikimedia.org/T128358 (Ijon) (unfortunately, videoconvert does not seem to be tracking issues here on Phabricator...)
[08:22:30] Traffic, Operations: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Krenair) Okay, first one is {T209856} and its next patch is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/474743/ followed by https://gerrit.wikimedia.org/r/#/c...
[09:23:11] Traffic, Operations, Patch-For-Review: ATS: log inspection at runtime - https://phabricator.wikimedia.org/T204225 (ema) Yesterday all ATS hosts ran out of disk space. That's due to trafficserver logging several messages like the following: File:/var/log/trafficserver/notpurge.pipe was closed, have...
[09:30:12] ema: regarding point 2 in your comment, what happens if the icinga check triggers a config reload with an invalid config? the check makes ATS to go down?
[09:30:39] vgutierrez: it's brain damaged, but not that much
[09:31:03] so it stops before actually reloading it?
[09:31:12] good :)
[09:31:17] yes :)
[10:02:45] ema: also from past experience I would try multiple parallel config reloads to make sure there is any weird behaviour
[10:02:58] as I guess that Icinga and Puppet might very easily trigger both reloads
[10:03:13] s/Puppet/anything else/
[10:03:30] (maybe Puppet explicitly doesn't reload, I actually don't know ;) )
[10:13:36] Wikimedia-Apache-configuration, Operations: Redirect from zh-yue.wiktionary.org is not working properly - https://phabricator.wikimedia.org/T209693 (Hello903hello) > In T209693#4757863, @ArielGlenn wrote: > > So why does yue.wikipedia go to zh-yue.wikipedia? T10217 and T30441 Initially created at zh-yue...
[12:41:00] Wikimedia-Apache-configuration, Operations: Redirect from zh-yue.wiktionary.org is not working properly - https://phabricator.wikimedia.org/T209693 (ArielGlenn) Ok, I have read all the dang back tickets and thanks everyone for their comments. I am skipping wikisource and betawikiversity because it's more...
[12:41:48] Wikimedia-Apache-configuration, Operations, Patch-For-Review: Redirect from zh-yue.wiktionary.org is not working properly - https://phabricator.wikimedia.org/T209693 (ArielGlenn) Needless to say I'd like a bunch of eyes on this to make sure it's right. Thanks in advance.
[14:10:49] Krenair: as I commented in the CR, I think you are right regarding the location of the certcentral::cert call in the librenms puppet profile, could you move it to librenms::web as part of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/474743/ ?
[14:11:05] yes
[14:14:03] thx :D
[15:19:32] Krenair, so I'm merging the change, I'll stop puppet first and let it run in netbox2001 first, but everything should be ok
[15:21:30] ok
[15:37:18] certificate replaced successfully in netmon2001.wm.o
[15:40:27] so https://librenms.wikimedia.org/ is using a TLS certificate managed by certcentral <3
[15:43:33] Krenair: can you rebase https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/474747/?
[15:43:57] it's failing on pcc due to the changes in the previous commit
[15:43:59] alex@alex-laptop:~$ openssl s_client -connect librenms.wikimedia.org:443 2>&1 | openssl x509 -noout -text | grep "Not Before:"
[15:43:59] Not Before: Nov 19 15:50:45 2018 GMT
[15:43:59] yes
[15:44:06] thx
[15:50:46] vgutierrez, done
[15:50:51] (sorry got distracted)
[15:51:13] thx, np
[15:51:13] rerunning pcc
[15:51:25] I love the 474747 change number
[15:51:35] it looks like a payload showing in EIP
[15:51:57] GGG
[15:52:57] I have no idea what this means but okay
[16:05:36] Traffic, Operations: Migrate most standard public TLS certificates to CertCentral issuance - https://phabricator.wikimedia.org/T207050 (Vgutierrez)
[16:05:42] Certcentral, Traffic, Operations: Deploy a certcentral managed TLS certificate for librenms - https://phabricator.wikimedia.org/T209856 (Vgutierrez) Open>Resolved
[16:07:24] I keep getting sucked into coworking sessions with this *very* chatty coworker https://photos.app.goo.gl/Cpu2kP8x2Wy4QAUE7
[16:07:44] oops wrong channel I'm a bad
[16:08:03] hahahah
[16:08:17] what a cutie though <3
[16:08:39] yup, that's a cute coworker :)
[16:25:04] vgutierrez, to avoid the multi-puppet-runs problem we could, as part of adding the cert to certcentral's config, also add an authorisation exported resource to the target hosts
[16:25:31] then the change to actually pulling down the cert happens an hour later once puppet has sorted itself out
[16:26:54] so we should manually keep a list of authorised hosts per certificate?
[16:27:21] no
[16:27:55] but at the same time you make the commit to change certcentral's cert config
[16:28:07] you also change the client's manifest so they start getting their authorisation resources in early
[16:28:45] so while certcentral is going over the issuance process, the client hosts get authorized (ideally)
[16:28:50] It'd be like the @@File["/etc/certcentral/conf.d/authorisedhost_${title}__${::fqdn}.yaml"] from certcentral::cert
[16:28:54] yes
[16:30:08] but that wouldn't solve the puppet error on the client on the first puppet run
[16:30:11] volans wasn't amused by that yesterday
[16:31:33] we'd still require two puppet runs in the same system to get to the desired state
[16:35:14] vgutierrez, Krenair: what is the puppet_svc param attached to in https://gerrit.wikimedia.org/r/c/operations/puppet/+/474941 ?
[16:35:28] just if the host is using apache vs nginx or has to do also with some LVS magic?
[16:37:26] AFAIK no LVS magic is involved there :)
[16:40:58] ack, I was confused by the naming
[16:41:24] puppet_svc means puppet service to me and doesn't give me clue on what should be
[16:41:58] I think that's just so it notifies the service if the cert changes?
[16:42:11] it probably works the same way as the letsencrypt::cert::integrated resource but you can check
[16:43:33] so.. I don't see an obvious way to avoid managing hosts lists "manually" if we want to avoid the double puppet run on the certcentral client hosts
[16:43:46] I do
[16:43:55] but I'm about to go afk
[16:44:06] sure, let's discuss this later
[16:44:17] we just need to get this part of certcentral::cert in early:
[16:44:18] @@file { "/etc/certcentral/conf.d/authorisedhost_${title}__${::fqdn}.yaml":
[16:44:30] several puppet runs before we use the rest of the file to pull certs
[16:44:38] could just stick it in an if and make it optional
[16:44:42] anyway, bbl
[16:55:44] Certcentral: Avoid using acme.client poll_and_finalize() method - https://phabricator.wikimedia.org/T208967 (Vgutierrez) Open>Resolved p:Triage>Normal
[16:56:30] Certcentral, Patch-For-Review: certcentral "wrongly" assumes that a new order always implies fulfilling new challenges - https://phabricator.wikimedia.org/T208948 (Vgutierrez) Open>Resolved p:Triage>Normal
[16:57:31] Certcentral: certcentral wrongly handles acme.errors.ValidationError exception - https://phabricator.wikimedia.org/T208970 (Vgutierrez) Open>Resolved
[16:58:39] Certcentral: certcentral: keep track of orders and authorizations IDs when issuing certificates - https://phabricator.wikimedia.org/T208859 (Vgutierrez) Open>Resolved a:Vgutierrez
[17:00:33] Traffic, Maps, Operations, Reading-Infrastructure-Team-Backlog (Kanban): Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (Mholloway) @Gehel, do you mean specifically the additional EventBus load generated by activating resource_change events sent from Til...
[17:07:30] vgutierrez, I thought about it and this probably wouldn't work for the case where you have a new node you want to add with an existing cert
[17:07:57] indeed
[17:09:46] Certcentral: puppet still restarts certcentral on config changes instead of reloading it - https://phabricator.wikimedia.org/T209976 (Vgutierrez)
[17:12:43] Certcentral: certcentral crashes on network errors - https://phabricator.wikimedia.org/T209980 (Vgutierrez)
[17:36:45] netops, Operations: Bird multihop BFD - https://phabricator.wikimedia.org/T209989 (ayounsi) p:Triage>Normal
[17:41:09] netops, Operations: asw2-a-eqiad FPC2 reboot - https://phabricator.wikimedia.org/T209588 (ayounsi) Open>Resolved Not much we can do here, if it happens again though, we should RMA the device.
[23:14:54] netops, Operations: Access to network devices for Riccardo (volans) - https://phabricator.wikimedia.org/T208726 (ayounsi) Open>Resolved Pushed everywhere except Frack infra as I don't want to make any change there during the fundraising campaigns without approval.
[23:32:04] Traffic, Maps, Operations, Reading-Infrastructure-Team-Backlog (Kanban): Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (Gehel) >>! In T186732#4762745, @Mholloway wrote: > @Gehel, do you mean specifically the additional EventBus load generated by activat...
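Editor's note on the 15:43:59 check: the pasted command greps the "Not Before:" field out of `openssl x509 -noout -text` output to confirm that librenms is serving a freshly issued certificate. Below is a minimal Python sketch of reading that same field programmatically. The function name `parse_not_before` is hypothetical and not part of certcentral or any tool mentioned in the log; the sample string is the line quoted above, and the sketch assumes an English month abbreviation as printed by openssl.

```python
import re
from datetime import datetime, timezone


def parse_not_before(openssl_text: str) -> datetime:
    """Return the Not Before time found in `openssl x509 -noout -text` output.

    Hypothetical helper for illustration; it only parses text and does not
    contact any server.
    """
    match = re.search(r"Not Before\s*:\s*(.+?)\s+GMT", openssl_text)
    if match is None:
        raise ValueError("no 'Not Before' line found")
    # openssl prints e.g. "Nov 19 15:50:45 2018 GMT"; collapse any padding
    # spaces (single-digit days are space-padded) before parsing.
    stamp = " ".join(match.group(1).split())
    parsed = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


# The line pasted in the channel at 15:43:59.
sample = "Not Before: Nov 19 15:50:45 2018 GMT"
print(parse_not_before(sample).isoformat())
```

Comparing the returned timestamp against the time of the puppet merge is one way to tell a newly issued certificate from the one it replaced.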