[00:14:09] 10Gerrit, 06Operations, 10hardware-requests: Allocate spare misc box in eqiad for gerrit replacement - https://phabricator.wikimedia.org/T147596#2697736 (10Dzahn) @RobH Which ticket should i use for the follow-up to investigate lead hardware issues / talk to Dell. This? a new one? [00:43:06] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is OK: OK: Less than 100.00% above the threshold [0.0] [00:50:52] since we have "deployment-tin" and "deployment-mira" in labs, is there also "deployment-terbium" and "deployment-wasat"? [00:51:23] i am defining maintenance hosts in prod and i need to add an equivalent for deployment-prep [00:51:51] but dont know what makes sense to use [00:53:27] fwiw, when i click "deployment" filter on shinken, i get deployment-fluorine, phab01 and phab02 but that is all. when putting "deployment" in the search window above i get other hosts [00:54:08] like -pdf01 , -elastic05 and whatnot, but none of them are tin and mira [00:54:35] those i just see in network/manifests/constants.pp [01:17:07] mutante, there isn't [01:17:21] not sure we have anything running the crons in beta right now [01:18:37] not sure it makes sense to create two of that type of instance though [01:25:36] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:46:47] Krenair: all i really want is merge https://gerrit.wikimedia.org/r/#/c/314778/ and i'm being told to not forget labs [01:47:13] so any kind of placeholder would be good enough for me [01:49:11] there goes matrix again [01:53:30] mutante, just leave the labs ones as empty lists for now and deal with it separately? we don't have this particular piece of infrastructure right now, leaving it unfilled shouldn't break anything [01:54:21] the commit message mentions labtest hosts? [01:57:53] Krenair: labtest are the same hosts as production it looks [01:58:36] mutante, yeah... probably just been copied [01:59:54] amending so they are empty lists [02:00:53] the prod maintenance hosts have properly mapped IPv6 now [02:01:10] since i merged https://gerrit.wikimedia.org/r/#/c/302649/ today [02:01:31] that's good [02:01:49] is the long-term goal to have mapped ipv6s for all hosts? [02:05:38] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [02:11:56] Krenair: yes, it is [02:13:36] https://phabricator.wikimedia.org/T100690 [02:15:50] but we can't just do them all by default because that would break the existing mac-based ipv6 addresses? [02:17:17] i think there was another newer ticket about that.. the title was something like "fix it once and for all" but dont see it now.. hmm [02:17:35] oh... I saw this somewhere [02:17:54] Krenair: i think there might be _a few_ cases where those show up in ferm rules [02:17:55] https://phabricator.wikimedia.org/T102099 [02:17:57] but cant be much [02:18:06] that's it [02:18:18] that's probably a duplicate of 100690 [02:58:35] bd808, the deployment hosts have the maintenance role too? huh [02:58:36] okay [02:59:39] well... I'm not sure that they actually do. 
But if someone needs to run a one-off job we've always used the deploy host [03:00:05] there's not a really good reason to have a VM just for that in beta cluster [03:00:19] because it really doesn't come up very much [03:59:54] 10Continuous-Integration-Infrastructure, 10Wikidata: [Bug] github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1470716 (10JeroenDeDauw) Not sure there is a real issue to solve here. Yes, GitHub does not have a 100% success rate for providing zips.... [04:05:48] Yippee, build fixed! [04:05:49] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #169: 09FIXED in 9 min 48 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/169/ [04:17:01] 06Release-Engineering-Team, 10ArchCom-RfC, 06Developer-Relations, 06WMF-Legal, and 2 others: Create formal process for CREDITS files - https://phabricator.wikimedia.org/T139300#2708019 (10RobLa-WMF) [04:31:53] 06Release-Engineering-Team: WMF-deploy tags are not auto-added in some cases - https://phabricator.wikimedia.org/T147909#2708027 (10Yurik) [04:55:26] 10Gerrit: Support OAuth for login onto gerrit.wikimedia.org - https://phabricator.wikimedia.org/T147864#2708072 (10bd808) >>! In T147864#2706550, @Tgr wrote: > @bd808 has plans for eventually merging LDAP with SUL, I think. There isn't a full plan with a timeline yet, but yes I'm starting work towards associati... [07:18:39] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 06Developer-Relations (Oct-Dec-2016): Developer Summit 2017: Work with TPG and RelEng on solution to event documenting - https://phabricator.wikimedia.org/T132400#2708128 (10Qgil) [07:51:21] 10Gerrit: Support OAuth for login onto gerrit.wikimedia.org - https://phabricator.wikimedia.org/T147864#2706225 (10Paladox) I doint think this will work, I believe gerrit only supports one login system at a time unlike mediawiki which can support a lot. But I'm guessing here. [07:56:23] 10Gerrit, 06Repository-Admins: Rename the Semantic Forms extension to "Page Forms" - https://phabricator.wikimedia.org/T147582#2708181 (10Paladox) I've requested that Page Forms be created here https://www.mediawiki.org/wiki/Git/New_repositories/Requests and imported from SemanticForms. [07:58:59] 10Gerrit: Update gerrit to 2.12.5 - https://phabricator.wikimedia.org/T143089#2708183 (10Paladox) [08:04:40] 10Gerrit: Update gerrit to 2.13.1 - https://phabricator.wikimedia.org/T146350#2708187 (10Paladox) [08:10:22] 10Gerrit: Gerrit side-by-side diff view does not mark a removed "space" in a string (example given) (intraline different doesn't work with spaces) - https://phabricator.wikimedia.org/T51006#540528 (10Paladox) Is this https://bugs.chromium.org/p/gerrit/issues/detail?id=3423 related? If so then this is fixed gerr... [09:33:41] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [10:13:43] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [10:16:36] 10Continuous-Integration-Config, 06Release-Engineering-Team, 10MediaWiki-Unit-tests: MediaWiki code coverage fails on Zend PHP 7.0 due to a database error - https://phabricator.wikimedia.org/T147781#2708462 (10Paladox) This is done in https://github.com/wikimedia/mediawiki/blob/00840cd3369fb0823493e036962b9f... 
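Returning to the mapped-IPv6 discussion earlier in the log (T100690 / T102099): the idea is that a "mapped" address embeds the host's IPv4 octets in its IPv6 address instead of deriving the suffix from the MAC. A minimal shell sketch of that mapping follows; the prefix and IPv4 below are placeholders, not real production values.

    # Placeholder /64 prefix and host IPv4, for illustration only.
    prefix='2620:0:861:103'
    ipv4='10.64.32.16'
    # Reuse the four IPv4 octets as the last four groups of the v6 address.
    IFS=. read -r a b c d <<<"$ipv4"
    echo "${prefix}:${a}:${b}:${c}:${d}"
    # -> 2620:0:861:103:10:64:32:16

This is also why it can't simply be switched on everywhere at once: as noted above, a few ferm rules still reference the existing MAC-based addresses and would need updating first.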
[10:32:30] 10Continuous-Integration-Config, 06Release-Engineering-Team, 10MediaWiki-Unit-tests: MediaWiki code coverage fails on Zend PHP 7.0 due to a database error - https://phabricator.wikimedia.org/T147781#2708511 (10Paladox) And mediawiki db name is MW_DB=jenkins_u0_mw [10:45:00] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [10:54:33] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10DBA, 10MediaWiki-Database, 07WorkType-NewFunctionality: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2708529 (10hashar) The beta cluster databases have been migrated to Jessie / MariaDB 5.10 T13877... [11:25:02] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0] [13:04:45] hashar: o/ is it ok if I install the new memcached on memcached04? [13:04:55] I will log everything of course [13:21:02] elukey: yeah do do :} [13:21:03] it is all your [13:21:24] since I guess you are the primary lead/point of contact for memcached in prod [13:21:32] the equivalent on beta is all your as well :D [13:23:15] I'd say that Ori and Giuseppe are the experts, I am just trying to do some experiments :D [13:24:53] lets say you are in charge :D [13:35:55] watch fetchers [13:35:55] OK [13:35:56] ts=1476279340.512745 gid=1 type=item_get key=WANCache%3At%3Aenwiki%3Agadgets-definition%3A9%3A2 status=found [13:36:12] worked nicely :) [13:37:32] !log upgraded memcached on deployment-memc04 to 1.4.28-1.1+wmf1 as part of a perf experiment (T129963) - rollback: wipe https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep/host/deployment-memc04, apt-get remove memcached on deployment-memc04, puppet run [13:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [13:39:19] PROBLEM - Puppet run on deployment-memc04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [13:44:19] RECOVERY - Puppet run on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0] [14:00:40] 10Continuous-Integration-Infrastructure, 10Packaging, 13Patch-For-Review, 07Zuul: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#2708812 (10hashar) zuul_2.5.0-8-gcbc7f62-wmf4* fix it: $ which zuul-clear-refs /usr/bin/zuul-clear-refs $ zuul-clear-refs usage: zuul-clear... [14:01:51] 10Continuous-Integration-Infrastructure, 10Packaging, 13Patch-For-Review, 07Zuul: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#2708817 (10hashar) [14:01:53] 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations: Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T145057#2708813 (10hashar) 05Resolved>03Open a:05elukey>03None [14:02:30] 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations: Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T145057#2618640 (10hashar) Polished up some oddity from yesterday deploy, the shebang was incorrect in zuul-clear-ref. Gotta bump to wmf4 on... [14:23:01] elukey: got some spare time for some more zuul.deb babysitting ? 
:} could use an upgrade on scandium and a couple upload to apt.wm.o [14:23:09] I figured out the issue I had yesterday night :} [14:23:38] sure [14:24:06] I have reopened yesterday task https://phabricator.wikimedia.org/T145057 and updated [14:24:28] is it ok if we install it first, double check and the upload? [14:24:32] yeah [14:24:42] ssh scandium.eqiad.wmnet [14:24:43] wget https://people.wikimedia.org/~hashar/debs/zuul_2.5.0-8-gcbc7f62-wmf4jessie1/zuul_2.5.0-8-gcbc7f62-wmf4jessie1_amd64.deb [14:24:51] dpkg -i zuul_2.5.0-8-gcbc7f62-wmf4jessie1_amd64.deb [14:24:55] :D [14:25:44] I have stopped the service [14:25:54] I was about to ask if zuul_2.5.0-8-gcbc7f62-wmf4precise1_amd64.deb was ok :) [14:26:09] yeah it's the right one [14:26:10] installing [14:26:11] yeah I like triple checked it overnight ;D [14:26:32] ah no wait a minute [14:26:32] and while hacking in it, fixed a few other minor issue in the build. It is slightly better than yesterday [14:26:35] this one is precise [14:26:39] ahaz [14:26:45] https://people.wikimedia.org/~hashar/debs/zuul_2.5.0-8-gcbc7f62-wmf4jessie1/zuul_2.5.0-8-gcbc7f62-wmf4jessie1_amd64.deb [14:27:27] so zuul_2.5.0-8-gcbc7f62-wmf4jessie1_amd64.deb [14:27:30] okok better :) [14:27:31] yeah [14:30:08] neat [14:30:14] dpkg -i started the service for us [14:30:43] also did a systemctl restart zuul-merger just in case :) [14:30:47] /var/log/zuul/merger.log looks clean [14:31:37] $ zuul-clear-refs [14:31:39] usage: zuul-clear-refs [-h] [--until DAYS_AGO] [-n] [-v] gitrepo [14:31:40] all good [14:32:20] for apt.wm.o all the debian stuff is under https://people.wikimedia.org/~hashar/debs/zuul_2.5.0-8-gcbc7f62-wmf4jessie1/ [14:32:33] will want to push it under jessie-wikimedia/thirdparty [14:32:53] which will override the wmf2 with that new wmf4 version [14:33:19] and if you feel adventurous get rid of the wmf3 version which is in jessie-wikimedia/main :D [14:33:37] 10Gerrit: Update site CSS customizations for the new change screen in Gerrit 2.12 - https://phabricator.wikimedia.org/T141286#2708899 (10PleaseStand) >>! In T141286#2691639, @PleaseStand wrote: > I believe I meant you should change `.com-google-gerrit-client-diff-DiffTable_BinderImpl_GenCss_style-table` to `.com... [14:34:41] 10Continuous-Integration-Infrastructure, 10Packaging, 13Patch-For-Review, 07Zuul: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#2708905 (10hashar) [14:34:43] 10Continuous-Integration-Infrastructure (phase-out-gallium), 06Operations: Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T145057#2708902 (10hashar) 05Open>03Resolved a:03elukey scandium:~$ zuul-clear-refs usage: zuul-clear-refs [-h] [--until DAYS_AGO] [-n... [14:35:05] 10Continuous-Integration-Infrastructure, 07Zuul: Run zuul-clear-refs.py daily on all our repositories to reclaim Zuul references - https://phabricator.wikimedia.org/T103528#2708911 (10hashar) [14:35:07] 10Continuous-Integration-Infrastructure, 10Packaging, 13Patch-For-Review, 07Zuul: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#1392464 (10hashar) 05Open>03Resolved scandium:~$ zuul-clear-refs usage: zuul-clear-refs [-h] [--until DAYS_AGO] [-n] [-v] gitrepo zuul-cl... 
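Stepping back to the memcached verification logged at 13:35 above: newer memcached builds such as 1.4.28 expose a "watch" command that streams fetch events in exactly the ts=.../type=item_get format elukey pasted. A hedged sketch of reproducing that check from the memcached host itself (the port and the 10-second window are assumptions):

    # Stream fetch events from the local memcached for ~10 seconds.
    ( printf 'watch fetchers\r\n'; sleep 10 ) | nc localhost 11211
    # expected: an initial OK, then lines such as
    #   ts=... gid=1 type=item_get key=WANCache%3A... status=found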
[14:35:17] two less tasks :} [14:38:59] uploaded :) [14:39:29] I have no idea about how to remove wmf3 in jessie-wikimedia/main and I don't feel adventurous :D [14:39:37] :D [14:40:07] and last is to upload wmf4 for precise on apt.wm.o [14:40:14] I already upgraded gallium this morning [14:40:39] https://people.wikimedia.org/~hashar/debs/zuul_2.5.0-8-gcbc7f62-wmf4precise1/ [14:40:40] to precise-wikimedia/thirdparty [14:40:54] which would override some very outdated version 2.1.0-60-g1cc37f7-wmf4precise1 [14:41:03] (eg 2.1.0 + 60 commits @wmf4 ) [14:41:09] when we are now at 2.5.0 + 8 commits :D [14:41:26] and from there I am set and you deserve a nice token of appreciation ! [14:44:30] root@carbon:/srv/wikimedia# reprepro ls zuul [14:44:30] zuul | 2.5.0-8-gcbc7f62-wmf3precise1 | precise-wikimedia | amd64, source [14:44:33] zuul | 2.5.0-8-gcbc7f62-wmf4precise1 | precise-wikimedia | amd64, source [14:44:36] zuul | 2.1.0-60-g1cc37f7-wmf4trusty1 | trusty-wikimedia | amd64, source [14:44:39] zuul | 2.5.0-8-gcbc7f62-wmf3jessie1 | jessie-wikimedia | amd64, source [14:44:42] zuul | 2.5.0-8-gcbc7f62-wmf4jessie1 | jessie-wikimedia | amd64, source [14:44:46] afaics it looks good [14:44:55] (just uploaded the precise version) [14:45:26] all set! [14:45:30] you are a champion :} [14:46:02] \o/ [14:46:06] nice, all good [14:46:11] one more task closed :) [14:46:22] yeah [14:46:45] and that strike a prerequisite for the gallium -> contint1001 switch :D [14:55:33] RECOVERY - Puppet run on zuul-dev-jessie is OK: OK: Less than 1.00% above the threshold [0.0] [15:02:16] hi, I am looking for a way to search for the presence or absence of multiple tags in phabricator [15:02:27] 03Scap3, 07Easy, 13Patch-For-Review: remove hard-coded upstart commands - https://phabricator.wikimedia.org/T146656#2709002 (10thcipriani) [15:02:44] it seems there is "In Any:" and "Not In:", but the autocomplete doesn't find any of the tags I am interested in [15:03:28] is there any magic incantation that needs to be performed to make tags show up in that autocomplete? [15:03:55] (this is referencing https://www.mediawiki.org/wiki/Team_Practices_Group/Phabricator_tips/Maniphest#.C2.A0Typeaheads_.28AKA_autocomplete.29) [15:09:16] aha: it seems that those are potentially accessible when searching for the tag name itself, but are usually cut off by the 5-item completion limit [15:09:16] sigh... [15:16:43] 10Browser-Tests-Infrastructure, 10Gerrit, 10Wikidata, 13Patch-For-Review, 15User-Tobi_WMDE_SW: Retire wikidata/browsertests.git - https://phabricator.wikimedia.org/T144486#2709033 (10demon) Deleted. [15:18:12] PROBLEM - Host deployment-terbium is DOWN: CRITICAL - Host Unreachable (10.68.22.119) [15:22:37] gwicke: #wikimedia-devtools maybe , or else wikitech-l :} [15:22:50] I have no idea how to do advanced queries in phabricator [15:23:05] kk, thx [15:25:46] 06Release-Engineering-Team, 10DBA, 10Phabricator, 13Patch-For-Review, 07Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2709051 (10mmodell) @jcrespo: Thank you for the thoughtful responses, I've posted them upstream! 
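For the record, the apt.wikimedia.org side of the exchange above is handled with reprepro on carbon. The "reprepro ls zuul" output earlier is from the real session; the commands below are only a hedged sketch of the import, with placeholder .deb paths.

    cd /srv/wikimedia
    # Import the wmf4 builds into the thirdparty component of each distribution.
    reprepro -C thirdparty includedeb jessie-wikimedia  /path/to/zuul_2.5.0-8-gcbc7f62-wmf4jessie1_amd64.deb
    reprepro -C thirdparty includedeb precise-wikimedia /path/to/zuul_2.5.0-8-gcbc7f62-wmf4precise1_amd64.deb
    reprepro ls zuul   # confirm wmf4 is now listed for jessie and precise
    # The "adventurous" cleanup hashar mentioned: drop the stray wmf3 build from main.
    reprepro -C main remove jessie-wikimedia zuul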
[15:31:58] 10Deployment-Systems, 06Release-Engineering-Team, 10RESTBase, 05Goal, 06Services (services-next): Create or improve the RESTBase deploy method - https://phabricator.wikimedia.org/T102667#2709066 (10GWicke) [15:35:29] 03Scap3 (Scap3-MediaWiki-MVP), 03releng-201617-q3, 10scap, 06Operations, and 2 others: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#2709076 (10thcipriani) [15:36:33] 10MediaWiki-Releasing, 10Architecture, 10Parsoid, 07Service-Architecture, 06Services (backlog): Distribution strategy option: Use Debian packages - https://phabricator.wikimedia.org/T88154#2709082 (10GWicke) [15:36:37] 10MediaWiki-Releasing, 10Architecture, 10Parsoid, 07Service-Architecture, 06Services (backlog): Distribution strategy option: Use Vagrant puppet modules - https://phabricator.wikimedia.org/T88151#2709083 (10GWicke) [15:36:41] 10MediaWiki-Releasing, 06Release-Engineering-Team, 10Architecture, 10Parsoid, and 2 others: Evaluate and decide on a distribution strategy targeted at VMs - https://phabricator.wikimedia.org/T87774#2709084 (10GWicke) [15:38:41] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #191: 04FAILURE in 16 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/191/ [15:39:49] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:45:22] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #191: 04FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/191/ [15:51:04] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:07:11] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2709174 (10greg) [16:08:54] 03Scap3 (Scap3-MediaWiki-MVP), 10scap, 07Security-General: Scap should apply security patches - https://phabricator.wikimedia.org/T118478#2709188 (10mmodell) https://gerrit.wikimedia.org/r/#/c/312013/ [16:12:13] 03Scap3 (Scap3-MediaWiki-MVP): Flatten MediaWiki config, all MediaWiki versions, and extensions into a unified git repo - https://phabricator.wikimedia.org/T147478#2709196 (10thcipriani) [16:12:18] 03Scap3 (Scap3-MediaWiki-MVP), 10releng-201516-q3, 03releng-201617-q2, 10scap, and 2 others: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2709195 (10thcipriani) [16:14:15] 03Scap3 (Scap3-MediaWiki-MVP): Use git as transport mechanism for MediaWiki scap deploys - https://phabricator.wikimedia.org/T147938#2709204 (10thcipriani) [16:14:34] 03Scap3 (Scap3-MediaWiki-MVP): Use git as transport mechanism for MediaWiki scap deploys - https://phabricator.wikimedia.org/T147938#2709217 (10thcipriani) p:05Triage>03Normal [16:14:56] 03Scap3 (Scap3-MediaWiki-MVP): Use git as transport mechanism for MediaWiki scap deploys - https://phabricator.wikimedia.org/T147938#2709204 (10thcipriani) [16:14:57] 03Scap3 (Scap3-MediaWiki-MVP), 10releng-201516-q3, 03releng-201617-q2, 10scap, and 2 others: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2709218 
(10thcipriani) [16:15:41] 03Scap3 (Scap3-MediaWiki-MVP), 10scap: Bring co-master / fanout capabilities to scap3 deployments - https://phabricator.wikimedia.org/T121276#2709220 (10thcipriani) [16:16:12] 03Scap3 (Scap3-MediaWiki-MVP), 10scap: Bring co-master / fanout capabilities to scap3 deployments - https://phabricator.wikimedia.org/T121276#1874432 (10thcipriani) [16:16:14] 03Scap3 (Scap3-MediaWiki-MVP): Use git as transport mechanism for MediaWiki scap deploys - https://phabricator.wikimedia.org/T147938#2709204 (10thcipriani) [16:25:34] paladox: Errm, did you manually add all those many reviewers to some of your recent patches in Gerrit, or did that happen automatically? [16:25:51] andre__ which one? [16:26:16] paladox, for example https://gerrit.wikimedia.org/r/#/c/315515 [16:26:33] andre__ yes that was me [16:26:36] Why? [16:27:17] paladox, yeah, that's exactly also my question: Why do you do that? [16:27:23] Make more people drown in Gerrit emails about stuff that does not interest them? ;) [16:27:54] Oh, i guess it's because if i leave no reviewers then it is pretty unlikly anyone will review [16:28:22] If you look at my patches, most have reviewers and none have been reviewed yet or have but havent been re reviewed since i updated them [16:29:50] paladox: No reviewers isn't good either, I agree. However adding people that have nothing to do with the code area is even worse, as it will make people ignore Gerrit notifications in general. [16:30:17] 03Scap3 (Scap3-MediaWiki-MVP): Setup test environment for MediaWiki deployment - https://phabricator.wikimedia.org/T147940#2709276 (10thcipriani) [16:30:23] Oh, but i have no idea which one's will review, i added a few who specilise in mysql [16:30:42] 03Scap3 (Scap3-MediaWiki-MVP): Setup test environment for MediaWiki deployment - https://phabricator.wikimedia.org/T147940#2709290 (10thcipriani) [16:30:44] 03Scap3 (Scap3-MediaWiki-MVP), 10releng-201516-q3, 03releng-201617-q2, 10scap, and 2 others: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2709289 (10thcipriani) [16:31:09] paladox: Please do not do that. [16:31:15] But why? [16:31:43] I have a ton of patches sitting, in the queue and none have been reviewed yet. [16:31:45] it's next to impossible to get reviews, it's a real problem [16:32:03] In gerrit 2.13 it adds an option so you can disable emails per user [16:32:51] many people already ignore gerrit notifications, which is why people will add more of them to at least have a theoretical chance [16:32:52] paladox: how did you come up with the list of people to add? [16:32:52] paladox: See above: Because adding *random* people is worse than no people at all. [16:32:58] you added springle, who doesn't work here anymore [16:33:04] I didnt add him [16:33:11] He was added by the reviewer bot [16:33:15] * greg-g nods [16:33:16] nvm :) [16:34:13] paladox: I guess you remember https://phabricator.wikimedia.org/T106359#1669838 :) [16:34:40] andre__ yes but that was specifficly for rebasing [16:34:44] mutante: You don't solve the problem of ignoring notifications by creating more notifications about stuff that's irrelevant to your workarea. [16:34:46] I haven't done so many rebases [16:35:14] paladox: and that's great. Thanks. [16:35:34] andre__: is there a way he can lookup the "right" reviewers? 
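One low-tech answer to the "right reviewers" question just above is to look at who has recently touched the files a patch changes and pick one or two names from that list. A hedged sketch; the path is a placeholder, not taken from any of the patches discussed.

    # Most frequent recent authors for the area a patch touches.
    git log --since='1 year ago' --no-merges --format='%aN' -- path/to/changed/files \
      | sort | uniq -c | sort -rn | head -n 5

That fits the Getting_reviews guidance linked a bit further down, which asks for one or two reviewers rather than ten.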
[16:35:45] Someone should remove springle from the reviewer bot then [16:35:46] Yep, but reviewers are free to remove them selfs, in polygerrit a user can now view if a reviewer removed themselfs [16:36:06] paladox, https://www.mediawiki.org/wiki/Gerrit/Code_review/Getting_reviews#Add_reviewers asks you to add one or two reviewers. [16:36:31] paladox: I don't think it's a great use of time to remove myself from lots of patchsets where you added me though I have nothing to do with that code area. [16:36:43] (replace "me" by some other developer's name) [16:36:55] andre__ but i have been doing this for 2-3+ years and no complaints, only complaint was for rebasing. [16:36:56] And ok [16:36:58] mutante: https://www.mediawiki.org/wiki/Gerrit/Code_review/Getting_reviews#Add_reviewers [16:37:13] Tempest -> teacup [16:37:38] paladox: I don't think you've always been adding ~10 people as reviewers to patchsets in the past :) [16:37:43] Just removed springle [16:37:43] https://www.mediawiki.org/w/index.php?title=Git%2FReviewers&type=revision&diff=2260490&oldid=2256387 [16:37:47] Srsly. If people don't want to review they'll remove themselves. [16:37:49] andre__ i guess we will want to do #wikimedia-codereview more frequently so that way i can have less reviewers but we can get reviewers interested in reviewing patches. [16:38:06] paladox: indeed, yeah... [16:38:38] Reedy thanks [17:02:30] 03Scap3 (Scap3-Adoption-Phase1), 10Cassandra, 10RESTBase-Cassandra, 06Services (next): Deploy cassandra metrics collector via scap3 - https://phabricator.wikimedia.org/T137371#2709374 (10GWicke) p:05Triage>03Normal [17:02:49] PROBLEM - Puppet run on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:03:01] 10Continuous-Integration-Config, 10RESTBase, 06Wikipedia-Android-App-Backlog, 06Services (watching): Kick off periodic Android CI tests when RESTBase is updated on beta labs - https://phabricator.wikimedia.org/T146488#2709377 (10GWicke) [17:05:44] 10Continuous-Integration-Config, 10RESTBase, 06Wikipedia-Android-App-Backlog, 06Services (watching): Kick off periodic Android CI tests when RESTBase is updated on beta labs - https://phabricator.wikimedia.org/T146488#2709388 (10bearND) @Niedzielski hasn't this been fixed? I see new apks on https://androi... [17:06:01] Project beta-code-update-eqiad build #125459: 04FAILURE in 3 min 1 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/125459/ [17:07:07] PROBLEM - Puppet run on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:08:08] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:14:54] Yippee, build fixed! [17:14:54] Project beta-code-update-eqiad build #125460: 09FIXED in 1 min 54 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/125460/ [17:19:55] paladox: Wikimedia's code review process is pretty broken. And while we try to improve, we/I don't know what's the best way forward either. So if I asked to not add too many reviewers to a patch in Gerrit, while other folks have asked you to add reviewers, it is obviously not your fault when you follow advice that was given to you, and I apologize for creating confusion. 
[17:20:16] Ok [17:31:57] 10Continuous-Integration-Config, 10RESTBase, 06Wikipedia-Android-App-Backlog, 06Services (watching): Kick off periodic Android CI tests when RESTBase is updated on beta labs - https://phabricator.wikimedia.org/T146488#2709529 (10Niedzielski) @bearND, the alpha builds have been fixed but they're independent... [17:37:22] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:37:34] PROBLEM - Puppet run on deployment-db04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:42:49] RECOVERY - Puppet run on integration-slave-trusty-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [17:43:09] RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [17:47:09] RECOVERY - Puppet run on integration-slave-trusty-1018 is OK: OK: Less than 1.00% above the threshold [0.0] [17:47:55] PROBLEM - Puppet run on deployment-pdfrender is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:50:33] PROBLEM - Puppet run on deployment-mira is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:56:08] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:00:52] PROBLEM - Puppet run on deployment-db03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [18:11:02] PROBLEM - jenkins_zmq_publisher on contint1001 is CRITICAL: connect to address 127.0.0.1 and port 8888: Connection refused [18:12:17] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [18:28:23] PROBLEM - Puppet run on deployment-conftool is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [18:29:21] PROBLEM - Puppet run on deployment-stream is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:31:57] 10MediaWiki-Releasing, 10Architecture, 10Parsoid, 07Service-Architecture, 06Services (later): Distribution strategy option: Use Debian packages - https://phabricator.wikimedia.org/T88154#2709936 (10Legoktm) [18:57:29] twentyafterfour: ^ question regarding projects in phabricator. We want to not allow Blocked-on-ops to be used in new tasks, but dont want to remove from the 5 tasks that still exist. If we archive it, will it strip it off those tasks as well or do what we want? [18:57:41] i was gonna paste that in ops but came in here, ignore the ^ [18:57:58] or anyone else who may know =] [18:58:15] archiving doesn't remove a project from existing tasks [18:58:23] robh: what krenair said ;) [18:58:36] just makes it show up as grey instead of yellow/blue/whatever [18:58:37] but doesnt allow it to be assgined to new ones, right? [18:58:39] cool [18:58:46] no, you can still add archived projects to tasks [18:58:58] they show up all funky though don't they? [18:59:14] oh, just shows as grey so its known archived at least [18:59:14] that was a ? [18:59:43] yeah [19:00:03] sweet, thats good, archived. [19:00:03] and autocomplete prefers non-archived projects [19:00:08] thanks! [19:00:12] you're welcome [19:00:25] we decided in the ops offsite to do like the rest of hte org and kill our blocked-on project. [19:02:13] I thought that blocked-on-* were useful but I guess they aren't very useful if nobody is using them [19:03:01] i thought so too! 
i liked it for our clinic duty [19:03:24] RECOVERY - Puppet run on deployment-conftool is OK: OK: Less than 1.00% above the threshold [0.0] [19:03:25] but there was a move across engineering to stop using them for other things, so seemed best to not silo ourselves away in the opsen cabal =] [19:03:34] so should we remove the tag from those 5 tickets? [19:03:37] not that many [19:03:46] mutante: nope, ops offsite said to leave them on existing tasks [19:03:48] iirc [19:03:52] oh.. ok [19:03:57] we just dont want to add or use for anything new [19:04:20] RECOVERY - Puppet run on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [19:39:51] 10MediaWiki-Releasing, 06Release-Engineering-Team, 10Architecture, 10Parsoid, and 2 others: Evaluate and decide on a distribution strategy targeted at VMs - https://phabricator.wikimedia.org/T87774#2710352 (10GWicke) p:05High>03Normal [19:42:11] 10MediaWiki-Releasing, 10Architecture, 10Parsoid, 07Service-Architecture, 06Services (later): Distribution strategy option: Use Debian packages - https://phabricator.wikimedia.org/T88154#2710366 (10GWicke) @legoktm, thank you for pushing this forward! There is currently no clear owner of the overall str... [19:55:19] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 07Nodepool: Investigate why Nodepool instances are sometime slow to reach READY state - https://phabricator.wikimedia.org/T146813#2710420 (10hashar) Changed merged/deployed on October 12th 18:15 UTC [20:02:18] PROBLEM - Puppet run on deployment-restbase02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [20:25:58] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [20:37:19] RECOVERY - Puppet run on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:05:57] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0] [21:10:20] thcipriani: ready for my weekly nagging ? [21:10:34] heh [21:10:44] I'm as ready as I'll ever be for anything, what's up? [21:11:29] I will start with: https://phabricator.wikimedia.org/T147968 [21:11:39] i don't know how severe this is [21:11:53] since it is .21 which is everywhere by now [21:13:56] hrm, have you seen this same error in wmf.22? [21:14:11] I'm trying to find it in logstash for wmf.21 [21:14:49] i will look [21:16:02] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:19:38] thcipriani: https://phabricator.wikimedia.org/T147970 as well [21:20:12] that seems worrisome [21:20:56] matanya did that occur in wmf 21? or has it been happening for a while? [21:21:11] it is since 21 [21:21:23] Ok [21:21:25] thanks [21:23:12] do you happen to have a logstash dashboard to link to? struggling to find that error. Seems like the kind of thing you'd see if a server were somehow repooled after a full scap and didn't have a cache. [21:23:46] If we look through https://www.mediawiki.org/wiki/MediaWiki_1.28/wmf.21 maybe we can find the patch that did this [21:26:03] thcipriani: https://logstash.wikimedia.org/goto/5faea2514fcfade7213b31f4e333fbff [21:26:26] thcipriani: all seems to be mw1164 [21:26:34] hrm. That's giving me an nginx 503 :( [21:26:39] ah, that's what I was looking for! 
[21:27:11] thcipriani: mw1163 too [21:30:56] (03PS1) 10Paladox: Whitelist Ljonka [integration/config] - 10https://gerrit.wikimedia.org/r/315579 [21:35:10] matanya: hrm, are those errors all wmf.21 and have they happened recently? The l10n cache on mw1163 seems to have been fixed by an l10nsync early today 02:36 for wmf.21 and 03:06 for wmf.22 [21:35:34] seems all .21 [21:35:58] they are happening this very moment thcipriani [21:36:01] hrm [21:40:48] thcipriani: added https://phabricator.wikimedia.org/T147971 as well [21:43:42] kk, I have seen that one in the logs, tagged it Wikimedia-log-errors [21:45:02] the rebuildLocalisationCache.php one is confusing as that is run as part of scap. and both mw1163 and mw1164 seem to have their localisation cache files for 1.28.0-wmf.21 [21:45:43] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:45:55] thcipriani: next is https://phabricator.wikimedia.org/T147972 [21:46:19] mw1163:/srv/mediawiki/php-1.28.0-wmf.21/cache/l10n/l10n_cache-en.cdb definitely exists and is 3.4M in size. Very strange. [21:51:03] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0] [21:55:20] thcipriani: i fear we have a .22 blocker [21:55:40] matanya: which one is that? [21:55:48] creating now [21:56:08] okie doke [21:59:30] thcipriani: https://phabricator.wikimedia.org/T147977 [21:59:45] i don't have access to the original task [22:00:07] bug the git change to blame seems to be https://gerrit.wikimedia.org/r/#/c/314431/ [22:01:07] hrm, I think I saw something just get deployed in this area [22:01:11] https://gerrit.wikimedia.org/r/#/c/315520/ [22:02:00] thcipriani: not sure if warrant a rollback/freeze or something, but people get angry when links don't refresh :) [22:02:23] thcipriani: yes, that is the second suspect [22:02:54] but i think my suspect is more accurate, as the other change as listed at https://www.mediawiki.org/wiki/MediaWiki_1.28/wmf.22#EventBus [22:03:16] is related to images, which points at commons [22:11:24] thcipriani: https://phabricator.wikimedia.org/T147979 [22:20:42] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [22:20:56] thcipriani: https://phabricator.wikimedia.org/T147981 [22:23:21] thcipriani: commons is also broken in Could not acquire lock for \"mwstore://local-multiwrite/local-public/ ... [22:25:54] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2710980 (10greg) [22:27:37] matanya: thank you for all the tasks. Feel free to add the wikimedia-log-errors tag to any/all of the problems you find in the logs. [22:27:56] looking at the EventBus one now. 
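A quick way to double-check the l10n state thcipriani describes above, with the hosts and CDB path taken from the discussion (the ssh loop itself is just a sketch):

    # Confirm the wmf.21 l10n CDB exists and is non-trivial on the affected appservers.
    for h in mw1163 mw1164; do
      echo "== $h =="
      ssh "${h}.eqiad.wmnet" 'ls -lh /srv/mediawiki/php-1.28.0-wmf.21/cache/l10n/l10n_cache-en.cdb'
    done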
[22:28:54] will do thcipriani [22:34:31] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T146998#2711073 (10Matanya) [22:35:58] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T146998#2677385 (10Matanya) [22:36:55] PROBLEM - Host deployment-lvs-realservertest is DOWN: CRITICAL - Host Unreachable (10.68.18.250) [22:38:41] ignore that [22:44:01] thcipriani: OK, full blocker : https://he.wikisource.org/wiki/%D7%99%D7%A9%D7%A2%D7%99%D7%94%D7%95_%D7%98%D7%A2%D7%9E%D7%99%D7%9D_%D7%9E%D7%A1%D7%95%D7%9E%D7%9F [22:45:28] should it be UBN thcipriani ? [22:45:54] for the record it is: https://phabricator.wikimedia.org/T136401 [22:45:58] if it's blocking the train then yes [22:46:08] should block [22:46:16] look at the error :) [22:47:24] that is....quite an exception [22:48:00] like 40 pages i have seen so far on he.wikisource [22:48:13] https://he.wikisource.org/wiki/%D7%99%D7%A8%D7%9E%D7%99%D7%94%D7%95_%D7%98%D7%A2%D7%9E%D7%99%D7%9D_%D7%9E%D7%A1%D7%95%D7%9E%D7%9F [22:48:16] for example [22:49:11] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T146998#2711130 (10Matanya) [22:51:30] thcipriani: basically the entire bible is an exception now on Hebrew wikisource :D [22:53:28] that is definitely a blocker :) [22:57:29] 10Continuous-Integration-Infrastructure: Install php7 and the php-ast extension so etsy/phan can be run from jenkins - https://phabricator.wikimedia.org/T132636#2711139 (10EBernhardson) I did some poking around, ubuntu (16.04LTS) and debian (stretch, to be released next year) both have native php-ast packages.... [23:03:42] RECOVERY - Host deployment-lvs-realservertest is UP: PING OK - Packet loss = 0%, RTA = 0.62 ms [23:09:27] legoktm: are you around ? [23:11:06] "User 'Aleana1997' exists locally but is not attached error on loginwiki, fyi [23:15:11] PROBLEM - Puppet run on deployment-lvs-realservertest2 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [23:19:27] 03Scap3 (Scap3-Adoption-Phase1), 10scap, 10RESTBase, 06Services: Deploy RESTBase with scap3 - https://phabricator.wikimedia.org/T116335#2711281 (10GWicke) [23:21:38] 03Scap3 (Scap3-Adoption-Phase1), 10scap, 10Cassandra, 10RESTBase-Cassandra, 06Services: Deploy logstash logback encoder with scap3 - https://phabricator.wikimedia.org/T116340#2711339 (10GWicke) [23:22:32] thcipriani: so in total 11 tasks, please triage/prioritize as you see fit. Thanks, and good night [23:25:32] matanya: will do, thank you [23:34:51] thcipriani: https://logstash.wikimedia.org/goto/c7b13c02df2ff0e5233c05679fb0b50e [23:35:19] amount of failing jobs in very high [23:36:59] The link i wanted to paste : https://grafana.wikimedia.org/dashboard/db/job-queue-health
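On the T132636 comment above about native php-ast packages: a hedged sketch of what a CI slave would need in order to run etsy/phan, assuming the package names from that comment and phan's usual composer entry point.

    # Assumed package names (per the task comment) on Ubuntu 16.04 / Debian stretch.
    sudo apt-get install -y php7.0-cli php-ast composer
    # Inside the repository under test; etsy/phan was the composer package name at the time.
    composer require --dev etsy/phan
    ./vendor/bin/phan    # reads the repo's .phan/config.php by default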