[00:26:59] mutante: i used to use this extension called DownThemAll for that kind of task
[00:27:20] i'd be surprised if it had survived into the era of webextensions, but maybe.
[00:29:01] brennen: yes, thanks! i remember that. i used that before.. but it's not existing anymore ..addons.mozilla.org is just 404 and official site says something about '(Lack of) Progress'. that was one of those in the "outdated" section
[00:30:13] meanwhile somebody gave me https://addons.mozilla.org/en-US/firefox/addon/save-selected-tabs-to-files/ but i first need to get to FF version 63 and that isn't in Debian stable because it now requires Rust and that isn't available and so on .. but i will try with a nightly build or something later
[00:30:37] can't believe how "not trivial" this is
[00:30:48] yeah. :\
[00:31:06] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 1.037 second response time
[00:31:07] unfortunately one of those areas where things actually seem to have regressed.
[00:33:08] (i tend to just run the self-updating official mozilla stable build of firefox out of ~/firefox on my debian systems rather than wrestle with packaged versions, but of course that does imply a certain amount of trust for mozilla.)
[00:36:10] brennen: basically that is what they told me in #firefox .. i just like to avoid bypassing the distro package manager.. but yea.. i will do it anyways.. even if just to test this extension
[00:37:03] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[00:37:26] be back later..
first dinner :)
[01:27:06] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.028 second response time
[01:33:06] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[03:58:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.048 second response time
[04:09:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[04:19:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.024 second response time
[04:25:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[05:15:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.055 second response time
[05:46:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[06:20:02] Gerrit, Release-Engineering-Team (Kanban), LDAP: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use. - https://phabricator.wikimedia.org/T220867 (MisterSynergy) Resolved→Open As there are other users who cannot sign in to gerrit, I think we should not close this...
[06:46:53] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:00:03] Project mediawiki-core-code-coverage-docker build #4236: FAILURE in 4 hr 0 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/4236/
[07:18:34] Gerrit, Shape Expressions, Wikidata, Patch-For-Review, and 2 others: Replace WikibaseSchema repository content with message pointing to EntitySchema - https://phabricator.wikimedia.org/T222192 (WMDE-leszek) Should be done now?
[07:23:10] Gerrit, Release-Engineering-Team (Kanban), LDAP: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use. - https://phabricator.wikimedia.org/T220867 (Paladox) Open→Resolved Please file a separate task (as this task was for a specific user) :)
[08:01:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.023 second response time
[08:12:07] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[09:06:42] PROBLEM - Citoid on deployment-sca01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:21:32] RECOVERY - Citoid on deployment-sca01 is OK: HTTP OK: HTTP/1.1 200 OK - 921 bytes in 0.027 second response time
[09:57:06] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.037 second response time
[10:03:04] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[11:49:23] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[11:51:37] Continuous-Integration-Config, Growth-Team, Notifications, Thanks: proc_open() - Memory allocation problem - https://phabricator.wikimedia.org/T222786 (D3r1ck01)
[11:52:52] Continuous-Integration-Config, Growth-Team, Notifications, Thanks, phan: proc_open() - Memory allocation problem - https://phabricator.wikimedia.org/T222786 (D3r1ck01)
[12:03:54] Continuous-Integration-Config, Growth-Team, Notifications, Thanks, phan: proc_open() - Memory allocation problem - https://phabricator.wikimedia.org/T222786 (Mainframe98) Not related to a specific extension, the same is happening for #xanalytics: https://gerrit.wikimedia.org/r/c/mediawiki/ext...
[12:11:57] Release-Engineering-Team, MediaWiki-Core-Testing, Patch-For-Review, Wikimedia-production-error (Shared Build Failure), phan: phan 1.2.6 is OOMing on MediaWiki core - https://phabricator.wikimedia.org/T219114 (D3r1ck01)
[12:12:54] Release-Engineering-Team, MediaWiki-Core-Testing, Patch-For-Review, Wikimedia-production-error (Shared Build Failure), phan: phan 1.2.6 is OOMing on MediaWiki core - https://phabricator.wikimedia.org/T219114 (D3r1ck01) Spotted in {T222786} but I've closed as dup of this.
[12:19:22] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[12:22:56] Gerrit, LDAP: Gerrit: Cannot assign user name "msyn" to account 7123; name already in use. - https://phabricator.wikimedia.org/T222792 (MisterSynergy)
[12:30:22] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[12:40:22] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[12:46:21] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[12:50:06] Release-Engineering-Team, MediaWiki-Core-Testing, Patch-For-Review, Wikimedia-production-error (Shared Build Failure), phan: phan 1.2.6 is OOMing on MediaWiki core - https://phabricator.wikimedia.org/T219114 (Daimona) @Jdforrester-WMF Huh, I forgot that LibraryUpgrader is sorta dead right now...
[13:18:30] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:35:58] Gerrit, Release-Engineering-Team (Kanban), LDAP: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use. - https://phabricator.wikimedia.org/T220867 (Dzahn) @MisterSynergy The generic task for this problem is T216605.
[14:51:59] Phabricator, Release-Engineering-Team (Watching / External), Operations, serviceops, Patch-For-Review: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (Dzahn)
[15:02:54] Gerrit, Release-Engineering-Team (Kanban), LDAP: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use. - https://phabricator.wikimedia.org/T220867 (MisterSynergy) Thanks, but T216605 is access restricted and I cannot see any of its content. I feel pretty lost with this pro...
[15:03:55] thcipriani wondering if you can take a look at ^^ please? :)
[15:08:29] Continuous-Integration-Infrastructure, Release-Engineering-Team, puppet-compiler: Puppet catalog compiler - increasing max concurrent jobs - https://phabricator.wikimedia.org/T221969 (herron) @hashar while on the topic, is it possible for Jenkins to more evenly dispatch PCC jobs across the workers?...
[15:14:25] (PS7) Zfilipin: Create selenium-daily-beta-AdvancedSearch Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/460516 (https://phabricator.wikimedia.org/T188742)
[15:14:27] (PS4) Zfilipin: Create selenium-daily-beta-ElectronPdfService Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/460552 (https://phabricator.wikimedia.org/T188742)
[15:14:29] (PS6) Zfilipin: Create selenium-daily-beta-ORES Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/460517 (https://phabricator.wikimedia.org/T188742)
[15:14:31] (PS5) Zfilipin: Create selenium-daily-beta-TwoColConflict Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/460560 (https://phabricator.wikimedia.org/T188742)
[15:14:33] (PS5) Zfilipin: Create selenium-daily-beta-WikibaseLexeme Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/460527 (https://phabricator.wikimedia.org/T188742)
[15:15:23] (CR) jerkins-bot: [V: -1] Create selenium-daily-beta-ORES Jenkins job
[integration/config] - https://gerrit.wikimedia.org/r/460517 (https://phabricator.wikimedia.org/T188742) (owner: Zfilipin)
[15:15:27] (CR) jerkins-bot: [V: -1] Create selenium-daily-beta-TwoColConflict Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/460560 (https://phabricator.wikimedia.org/T188742) (owner: Zfilipin)
[15:15:38] (CR) jerkins-bot: [V: -1] Create selenium-daily-beta-WikibaseLexeme Jenkins job [integration/config] - https://gerrit.wikimedia.org/r/460527 (https://phabricator.wikimedia.org/T188742) (owner: Zfilipin)
[15:23:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.024 second response time
[15:25:08] (PS1) Brian Wolff: Add WebAuthn extension to jenkins [integration/config] - https://gerrit.wikimedia.org/r/508845
[15:27:01] (CR) jerkins-bot: [V: -1] Add WebAuthn extension to jenkins [integration/config] - https://gerrit.wikimedia.org/r/508845 (owner: Brian Wolff)
[15:29:15] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:30:37] :(
[15:31:01] I am just copying and pasting from other extension config, so that's probably my problem
[15:32:20] (PS2) Brian Wolff: Add WebAuthn extension to jenkins [integration/config] - https://gerrit.wikimedia.org/r/508845
[15:36:28] (PS3) Brian Wolff: Add WebAuthn extension to jenkins [integration/config] - https://gerrit.wikimedia.org/r/508845
[15:38:31] Beta-Cluster-Infrastructure: Migrate away from Debian Jessie to Debian Stretch - https://phabricator.wikimedia.org/T218729 (fgiunchedi) >>! In T218729#5161078, @Krenair wrote: >>>! In T218729#5159473, @fgiunchedi wrote: >>>>! In T218729#5156079, @Krenair wrote: >>> No worries I'm happy to take care of that s...
[15:57:25] James_F: Missing wikibase media info at https://www.mediawiki.org/wiki/Developers/Maintainers
[15:57:35] Multimedia, right?
[15:58:51] mutante: so yeah we could use some sync up for Gerrit / Phabricator operations that have to be done soonish (ping twentyafterfour brennen thcipriani )
[15:59:14] then probably that should not all fall on your shoulders and might want a 2nd sre to be involved as well?
[15:59:14] +1
[15:59:23] or I am just grossly exaggerating the amount of work that is required
[15:59:28] I don't know really
[15:59:44] I never get to see the hardware/network/disks etc
[15:59:46] hashar: i kind of expected that to be in this meeting but it was way more high-level
[16:00:53] prepared all the stuff i dumped onto the Etherpad for that but i guess it was way too specific
[16:00:55] for Gerrit we need more RAM, Stretch, and the software upgrade
[16:01:10] but I don't think we have a firm plan / migration plan yet to roll all those
[16:01:44] first of all .. i would like to know if upgrading the RAM is an option or not
[16:01:52] in the existing server that is
[16:02:00] yup :)
[16:02:04] but i am afraid the answer from dc-ops might be no
[16:02:10] then eventually we will replace cobalt anyway since it is old
[16:02:23] so middle term, I am not too worried about the RAM issue.
That would eventually self solve ;D
[16:02:26] yes, it needs to be: upgraded, replaced, reinstalled AND renamed
[16:02:33] so might as well use a new one right away
[16:02:55] +1
[16:02:58] probably we have to open a subtask of the 'gerrit hardware upgrade' ticket
[16:03:10] that is labeled procurement/ hardware-request
[16:03:16] but if at all possible, I would prefer to avoid upgrading to Gerrit 2.16 AND switching to Stretch in the same shot
[16:03:16] ;D
[16:03:32] +1
[16:03:35] not at the same time
[16:03:39] ok ok
[16:03:48] either first more RAM
[16:03:50] or first 2.16
[16:03:57] or first stretch :p
[16:04:03] and +1 on filing a procurement to request the new hardware right now for a host named gerrit1001
[16:04:10] then we can figure out the migration plan
[16:04:23] alright, i can do that
[16:04:33] if thcipriani agrees to it ;]
[16:04:45] migration for gerrit 2.16 is simply sync the repos (shouldn't be anything else apart from the h2 files in db/)
[16:04:46] we can start with phabricator
[16:04:49] but yeah cobalt is old, so we already know we need new hardware and can get it right now
[16:05:01] paladox: IF all the blockers are resolved...
[16:05:07] yup
[16:05:12] there's only one blocker
[16:05:13] I think all Gerrit 2.16 blockers have been solved
[16:05:15] the /p/ thing
[16:05:18] oh
[16:05:21] phabricator can be switched to stretch sooner than that
[16:05:28] i have a patch to redirect that though hashar
[16:05:31] we now have a temp replacement server with enough RAM
[16:05:50] for /p/ we can look at the Apache logs and find out which parts of the infra are still using those urls
[16:05:52] and update them
[16:05:57] hashar: no, there are still blockers open.. commented on the ticket for that
[16:06:08] hashar https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/507787/
[16:06:23] mutante what are the other blockers?
[16:06:24] hashar: yes, paladox already uploaded all the patches..
but getting them merged isnt easy
[16:06:29] ;]]
[16:06:36] i merged some of them.. but some will need other people
[16:06:41] looks like all is well covered so :]
[16:06:42] oh, those changes are not blockers
[16:06:46] hence why i did https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/507787/ :)
[16:07:05] yea, but tbh yesterday i spent like half a day on that
[16:07:11] and we still didnt get there
[16:07:14] Phabricator I don't know the infrastructure at all
[16:07:17] you can't test on gerrit2001
[16:07:20] you cant test in cloud VPS
[16:07:24] apache-fast-test wasnt working
[16:07:25] and so on
[16:08:04] yeh, but i tested using a real url
[16:08:11] apache rewrites can go wrong in multiple ways, i dont want to take it lightly and just merge
[16:08:20] yea, but you tested on something that isnt like prod
[16:08:29] better than nothing, but not 100%
[16:08:39] yup.
[16:09:31] i would still like to have something that is actually like cobalt and actually runs gerrit and has the same apache config
[16:10:31] from a quick look, it seems Phabricator does a lot of queries with /p/ URLs ;]
[16:10:53] that being said.. at least if they break they are not cached in varnish
[16:11:29] anyway I am off, it is a holiday here in France. Have good hacks!
[16:11:44] hashar: have a great holiday, ttyl more about this then
[16:12:12] i will make the hardware request ticket and inside that i will ask Robh/Chris if it's also possible to get a RAM upgrade in place or not
[16:12:17] for Gerrit
[16:12:46] mutante: Sod's law the hardware just recycled had ram that could've been used? :P
[16:12:47] Krinkle: Yes. Will add.
[16:12:49] also will keep looking at the rewrite change but just want to be extra careful ..that is all
[16:13:19] mutante: or better: follow up with Tyler / Brennen since they are in US timezone ;)
[16:13:53] Reedy: afaict that has happened a lot in the past but dcops don't want to do that anymore because it led to more confusion later ..
but i will ask
[16:14:12] heh
[16:15:12] as long as it's not leased hardware .. i guess
[16:15:30] it says if its leased in netbox
[16:15:38] if its leased, we cannot change hardware.
[16:16:08] also adding ram if its in eqiad will be hard since we just sold off our old decom systems, it may be possible but less likely than it would have been 6 months ago ;D
[16:16:14] what system we talking about?
[16:16:25] robh: cobalt needs 32GB more RAM
[16:16:41] hrmm
[16:16:43] it just left warranty
[16:16:52] procurement ticket was https://phabricator.wikimedia.org/T120248
[16:16:59] we tend to not upgrade hardware on existing systems, but i suppose it could be possible... but its not typically done
[16:17:13] yea, but we did NOT add it to the list of "servers becoming 5 years old" because that wasnt the case yet
[16:17:35] so if you want yeah make an s4 task and i can look up what it costs to add some ram
[16:17:41] afaict the rule wasn't "on the day it leaves warranty" but something longer?
[16:17:45] but, we tend to not do that often so dunno if it will be approved
[16:18:18] mutante: im not sure what you are asking?
[16:18:25] robh: ok, so if i want to request 2 things.. one is a new server replacing cobalt and one is upgrading the RAM inside it.. 2 separate tickets?
[16:18:31] Why would you do both?
[16:18:34] why not one or the other?
[16:18:46] I mean, if we upgrade the ram, you still want to replace it?
[16:18:47] robh: i am asking when hardware gets replaced.
so far it was not "as soon as they are out of warranty" but longer than that
[16:18:58] hence we had that ticket to list servers that are older than 5 years
[16:19:21] robh: ideally i want the RAM upgrade now and then replace the whole server once it is time to do that
[16:19:31] where i am not sure about the exact rule that determines that time
[16:19:42] So, we replace a server anywhere after 3 years and before 5 years
[16:19:49] after 5 years it tends to be automatic replacement
[16:20:12] as this is past its warranty but before 5 years, it becomes a cost-benefit analysis on just replacing the hardware versus upgrading
[16:20:16] and we tend to always just replace
[16:20:26] ok, so the ask is "either RAM upgrade now and we can wait a bit longer" or "replace entirely asap"
[16:20:38] gotcha.. yes
[16:20:45] let's figure that out on a ticket then.. makes sense?
[16:20:50] so, ideally you should file a single task in S4 #procurement stating that cobalt has to either get a ram upgrade for out of warranty hardware OR it needs to be replaced with a 64GB host right?
[16:20:54] or 2 tickets that is... not sure what you prefer
[16:21:00] ok, will do
[16:21:05] i'd do one S4 task so you can see the prices side by side
[16:21:09] alright
[16:21:15] if we end up doing both for whatever reason, i'd split them for the two orders at that time =]
[16:21:24] ok, thanks
[16:21:41] cuz yeah, i think its a 50% chance we'll just order a new system for this use with the right amount of ram
[16:21:48] and put the older 3+ year old system to spare pool or whatever
[16:21:49] or even moar ram
[16:21:59] java is never going to fail to om nom nom it
[16:22:03] yea, also we need to do multiple things..
like also reinstall with new OS version and rename it
[16:22:08] so using a new one might be a bit easier
[16:22:21] it would become gerrit1001 one way or another
[16:22:31] whether it's the current hardware or not
[16:22:41] mutante: we also have some dual cpu spare pool systems in eqiad now
[16:22:49] so there is a good chance that one will be allocated for this ;D
[16:23:01] rather than have to order, depending on the disk requirements and the like
[16:23:01] Gerrit, LDAP: Gerrit: Cannot assign user name "msyn" to account 7123; name already in use. - https://phabricator.wikimedia.org/T222792 (hashar)
[16:23:22] ie: our dual cpu misc system is dual silver 4110, 64gb ram, 1gb nic, dual 480GB SSD
[16:23:30] also.. we want to keep the system in eqiad and the one in codfw for the same service on the same type of hardware roughly
[16:23:36] the one in codfw already has 64GB
[16:23:46] ahh, good, so codfw is already where it needs to be =]
[16:23:47] robh: sounds good, will check though
[16:23:48] i was about to ask that
[16:23:58] at least RAM-wise it is, ack
[16:24:22] just assign the new task over to me and i'll pull the pricing for you later, likely tomorrow
[16:24:47] wilco,thx
[16:43:43] Krinkle: Done.
[16:50:23] Beta-Cluster-Infrastructure, Release-Engineering-Team, Release Pipeline: Experiment with hosted kubernetes solutions for Beta - https://phabricator.wikimedia.org/T222820 (thcipriani)
[16:51:49] Beta-Cluster-Infrastructure, Release-Engineering-Team, Release Pipeline: Experiment with hosted kubernetes solutions for Beta - https://phabricator.wikimedia.org/T222820 (thcipriani)
[16:51:52] Beta-Cluster-Infrastructure, Release Pipeline, serviceops, Core Platform Team Backlog (Next), and 2 others: Migrate Beta cluster services to use Kubernetes - https://phabricator.wikimedia.org/T220235 (thcipriani)
[16:52:28] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Release Pipeline: Experiment with hosted kubernetes solutions for Beta - https://phabricator.wikimedia.org/T222820 (thcipriani) p:Triage→Normal a:dduvall assigning to @dduvall based on hangout discussion
[17:13:08] Gerrit, Wikimedia-General-or-Unknown, Documentation, Epic, and 4 others: Update Gerrit /r/p/ links to /r/ - https://phabricator.wikimedia.org/T218844 (Andrew) After merging the above patch I corrected the urls in /var/lib/git/operations/puppet/.git/config and /var/lib/git/operations/software/.git...
[17:14:04] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.025 second response time
[17:16:36] Gerrit, Operations, cloud-services-team, serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (Paladox)
[17:25:06] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[17:41:24] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Release Pipeline: Experiment with hosted kubernetes solutions for Beta - https://phabricator.wikimedia.org/T222820 (dduvall)
[17:43:32] Release-Engineering-Team (Kanban), Scap: Automate updating deployment notes - https://phabricator.wikimedia.org/T196516 (thcipriani) Open→Resolved As of 1.34.0-wmf.4 this is working! https://www.mediawiki.org/wiki/MediaWiki_1.34/wmf.4/Changelog was generated by jenkins with no manual steps other...
[17:43:35] Release-Engineering-Team, Scap, Epic, Goal: Automate the Train - https://phabricator.wikimedia.org/T196515 (thcipriani)
[17:47:45] Release-Engineering-Team (Kanban), MediaWiki-Release-Tools: merge branch.py and make-wmf-branch - https://phabricator.wikimedia.org/T222829 (thcipriani)
[17:48:27] Release-Engineering-Team (Kanban), MediaWiki-Release-Tools: merge branch.py and make-wmf-branch - https://phabricator.wikimedia.org/T222829 (thcipriani) p:Triage→Normal a:mmodell @mmodell and I talked about this a bit during our pairing, assigning to him to work on.
[17:48:43] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Release Pipeline: Experiment with hosted kubernetes solutions for Beta - https://phabricator.wikimedia.org/T222820 (dduvall)
[18:00:54] Project-Admins: Replace tracking bug T21719 by new project tag "HTML5" - https://phabricator.wikimedia.org/T102502 (Izno) >>!
In T102502#4613434, @Aklapper wrote: > I'm missing a use case why someone would like to follow only HTML-5 related tasks. Our current use of HTML 5 is lacking. Either we are using ob...
[18:01:23] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Release Pipeline: Experiment with hosted kubernetes solutions for Beta - https://phabricator.wikimedia.org/T222820 (dduvall)
[18:01:59] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Release Pipeline: Experiment with hosted kubernetes solutions for Beta - https://phabricator.wikimedia.org/T222820 (dduvall)
[18:07:57] thcipriani: going to restart gerrit to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/508391/6/modules/gerrit/templates/log4j.xml.erb and because now is the "sanity break"
[18:09:16] mutante: +1 sounds sane :)
[18:09:21] thank you!
[18:12:00] ==> gerrit.json <==
[18:12:01] {"@timestamp":"2019-05-08T18:11:40.084Z","source_host":"cobalt"
[18:12:07] paladox: ^ file has been created :)
[18:13:13] logging initialized.. and it's back
[18:13:25] every time it makes you wait just long enough to start worrying
[18:13:49] and then you get that external id error but of course that's unrelated:)
[18:14:32] heh, yep, should be in the docs: timing: long enough for you to start worrying
[18:14:56] gerrit.json now gets logged to. works
[18:15:04] haha, indeed
[18:15:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.036 second response time
[18:16:42] Gerrit, Release-Engineering-Team (Backlog), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324 (Dzahn) Deployed the change above and restarted Gerrit. The new file `/var/log/gerrit/gerrit.json` has been...
[18:20:10] Gerrit, Repository-Admins, Shape Expressions, Wikidata, and 2 others: rename repository for WikibaseSchema - https://phabricator.wikimedia.org/T221946 (Ladsgroup)
[18:20:52] Gerrit, Repository-Admins, Shape Expressions, Wikidata, and 2 others: rename repository for WikibaseSchema - https://phabricator.wikimedia.org/T221946 (Ladsgroup)
[18:21:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[18:31:11] Gerrit, Release-Engineering-Team (Backlog), Wikimedia-Logstash, Patch-For-Review, Technical-Debt: Look into shoving gerrit logs into logstash - https://phabricator.wikimedia.org/T141324 (thcipriani) >>! In T141324#5168214, @Dzahn wrote: > Deployed the change above and restarted Gerrit. The ne...
[18:31:50] mutante thanks!!
[18:34:31] (CR) Urbanecm: "Can somebody merge this now, please? :-)" [integration/config] - https://gerrit.wikimedia.org/r/479738 (https://phabricator.wikimedia.org/T222544) (owner: Urbanecm)
[18:37:26] (CR) Dzahn: "added more reviewers" [integration/config] - https://gerrit.wikimedia.org/r/479738 (https://phabricator.wikimedia.org/T222544) (owner: Urbanecm)
[18:42:18] Continuous-Integration-Infrastructure, Release-Engineering-Team, Discovery-Search: quibble-vendor-mysql-hhvm-docker for WikibaseCirrusSearch takes over 40 minutes - https://phabricator.wikimedia.org/T222757 (greg) See also: T221434, which this might be a dupe of. The general issue is: running our te...
[18:51:07] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.033 second response time
[18:55:49] (CR) Urbanecm: "> Patch Set 11:" [integration/config] - https://gerrit.wikimedia.org/r/479738 (https://phabricator.wikimedia.org/T222544) (owner: Urbanecm)
[18:57:04] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[19:00:03] Project mediawiki-core-code-coverage-docker build #4237: STILL FAILING in 4 hr 0 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-docker/4237/
[19:05:20] Gerrit, LDAP: Gerrit: Cannot assign user name "msyn" to account 7123; name already in use. - https://phabricator.wikimedia.org/T222792 (thcipriani) Open→Resolved a:thcipriani @MisterSynergy sorry I missed your username on T220867 :( We plan to fix this for all potentially affected users as p...
[19:24:02] Gerrit, Wikimedia-General-or-Unknown, Documentation, Epic, and 4 others: Update Gerrit /r/p/ links to /r/ - https://phabricator.wikimedia.org/T218844 (Andrew) Ran sudo cumin --force --timeout 500 -o json "A:all" "sed -i 's%/r/p/%/r/%' /srv/composer/.git/config" for the composer urls
[19:46:43] Is gerrit having issues?
[19:46:51] "Working..."
[19:47:36] oh
[19:47:39] i had that too
[19:47:52] threads i guess went up
[19:47:53] yeh
[19:47:57] i see threads are going up
[19:49:48] it may be going down
[19:51:24] nope going up.
[19:52:05] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.025 second response time
[19:58:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[20:14:24] Project beta-code-update-eqiad build #245885: FAILURE in 1 min 22 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/245885/
[20:24:18] Yippee, build fixed!
[20:24:19] Project beta-code-update-eqiad build #245886: FIXED in 1 min 17 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/245886/
[20:35:46] (PS1) Kosta Harlan: sonar-scanner: Use relative paths and mount to /workspace/src [integration/config] - https://gerrit.wikimedia.org/r/508929 (https://phabricator.wikimedia.org/T218598)
[20:39:16] (PS27) Kosta Harlan: Establish codehealth pipeline [integration/config] - https://gerrit.wikimedia.org/r/502606 (https://phabricator.wikimedia.org/T218598)
[20:42:20] (PS2) Kosta Harlan: sonar-scanner: Use relative paths and mount to /workspace/src [integration/config] - https://gerrit.wikimedia.org/r/508929 (https://phabricator.wikimedia.org/T218598)
[20:55:22] (PS3) Kosta Harlan: sonar-scanner: Use relative paths and mount to /workspace/src [integration/config] - https://gerrit.wikimedia.org/r/508929 (https://phabricator.wikimedia.org/T218598)
[20:56:39] (PS4) Kosta Harlan: sonar-scanner: Use relative paths and mount to /workspace/src [integration/config] - https://gerrit.wikimedia.org/r/508929 (https://phabricator.wikimedia.org/T218598)
[21:11:59] (PS4) Catrope: Clear MessageBlobStore after syncing i18n data [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539)
[21:14:10] (CR) Catrope: Clear MessageBlobStore after syncing i18n data (2 comments) [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:17:21] (CR) PipelineBot: "pipeline-dashboard: service-pipeline-test" [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:17:23] (CR) jerkins-bot: [V: -1] Clear MessageBlobStore after syncing i18n data [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:18:04] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK:
HTTP/1.1 200 OK - 925 bytes in 0.031 second response time
[21:19:30] (PS9) Kosta Harlan: Generate junit.xml for sonar-scanner's usage [integration/config] - https://gerrit.wikimedia.org/r/508019 (https://phabricator.wikimedia.org/T218598)
[21:21:34] (CR) jerkins-bot: [V: -1] Generate junit.xml for sonar-scanner's usage [integration/config] - https://gerrit.wikimedia.org/r/508019 (https://phabricator.wikimedia.org/T218598) (owner: Kosta Harlan)
[21:29:06] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[21:32:48] (PS5) Krinkle: Clear MessageBlobStore after syncing i18n data [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:32:55] (CR) Krinkle: [C: +1] Clear MessageBlobStore after syncing i18n data [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:35:03] (CR) PipelineBot: "pipeline-dashboard: service-pipeline-test" [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:35:05] (CR) jerkins-bot: [V: -1] Clear MessageBlobStore after syncing i18n data [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:35:48] (PS6) Krinkle: Clear MessageBlobStore after syncing i18n data [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:37:29] (CR) PipelineBot: "pipeline-dashboard: service-pipeline-test" [tools/scap] - https://gerrit.wikimedia.org/r/508488 (https://phabricator.wikimedia.org/T222539) (owner: Catrope)
[21:42:25] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[22:13:03] Release-Engineering-Team (Kanban), MediaWiki-extensions-MultimediaViewer,
Multimedia, Patch-For-Review, User-zeljkofilipin: MediaViewer selenium tests are failing with "Misconfigured -- Unsupported OS/browser/version/device combo" because our target Mac... - https://phabricator.wikimedia.org/T214389
[23:13:57] Gerrit, Operations, cloud-services-team, serviceops: Change /r/p/ to /r/ on all hosts (where https://gerrit.wikimedia.org/r/p/ exists) - https://phabricator.wikimedia.org/T222093 (Paladox)
[23:44:04] RECOVERY - Mathoid on deployment-mathoid is OK: HTTP OK: HTTP/1.1 200 OK - 925 bytes in 0.038 second response time
[23:50:05] PROBLEM - Mathoid on deployment-mathoid is CRITICAL: connect to address 172.16.5.73 and port 10042: Connection refused
[23:58:12] MediaWiki-Codesniffer: Improve or disable SingleSpaceBeforeSingleLineComment - https://phabricator.wikimedia.org/T222853 (Tgr)