[00:39:27] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[01:19:23] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:45:24] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:33:04] Beta-Cluster-Infrastructure, Growth-Team, MediaWiki-extensions-PageCuration, Collaboration-Team-Triage (Collab-Team-Next-Quarter): [betalabs]: Page triage: "Uncaught TypeError: Cannot read property 'getLogPageTitle' of undefined" for 'Redirects for discussi... - https://phabricator.wikimedia.org/T196954
[06:27:44] !log Updated cxserver to f8c71a1
[06:27:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:03:38] Continuous-Integration-Infrastructure, Release-Engineering-Team, Release Pipeline, Jenkins: Enable MFA on Jenkins - https://phabricator.wikimedia.org/T198814 (hashar)
[08:06:30] (PS1) Hashar: Migrate Graph to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443908 (https://phabricator.wikimedia.org/T183512)
[08:06:55] (CR) Hashar: [C: 2] Migrate Graph to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443908 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[08:08:22] (Merged) jenkins-bot: Migrate Graph to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443908 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[08:11:50] PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[08:42:58] (PS1) Hashar: Migrate PageViewInfo to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443915 (https://phabricator.wikimedia.org/T183512)
[08:43:21] (CR) Hashar: [C: 2] Migrate PageViewInfo to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443915 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[08:44:53] (Merged) jenkins-bot: Migrate PageViewInfo to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443915 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[08:46:48] RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:48:32] (PS1) Hashar: Update Quibble QA report, gate is migrated [integration/config] - https://gerrit.wikimedia.org/r/443916 (https://phabricator.wikimedia.org/T197469)
[08:48:54] (CR) Hashar: [C: 2] Update Quibble QA report, gate is migrated [integration/config] - https://gerrit.wikimedia.org/r/443916 (https://phabricator.wikimedia.org/T197469) (owner: Hashar)
[08:50:21] (Merged) jenkins-bot: Update Quibble QA report, gate is migrated [integration/config] - https://gerrit.wikimedia.org/r/443916 (https://phabricator.wikimedia.org/T197469) (owner: Hashar)
[09:35:58] Phabricator, Security: Trusted contributors cannot escalate tasks as security issues - https://phabricator.wikimedia.org/T198831 (MarcoAurelio) Not sure if D1075 could fix this though.
[09:39:12] dockerfiles/npm-test/run.sh : npm run-script "{@:-test}"
[09:42:39] The url to the chat log in the topic on this channel gives 404 to me.
[09:46:09] Hauskatze: https://wm-bot.wmflabs.org/browser/index.php?display=%23wikimedia-releng
[09:47:45] thanks p858snake - this confirms a bug in wikibugs not reporting all the comments made on tasks
[09:48:23] I made two comments on T198831 but only the last one was written in this channel
[09:48:24] T198831: Trusted contributors cannot escalate tasks as security issues - https://phabricator.wikimedia.org/T198831
[09:49:06] check with legoktm, but i suspect that might be some anti-flood type of thing
[09:49:26] afair there was wikibugs and wikibugs_ right?
[09:52:19] (PS1) Hashar: (WIP) job to run wdio tests for a single repo [integration/config] - https://gerrit.wikimedia.org/r/443931
[09:55:13] (CR) jerkins-bot: [V: -1] (WIP) job to run wdio tests for a single repo [integration/config] - https://gerrit.wikimedia.org/r/443931 (owner: Hashar)
[09:56:58] (PS2) Hashar: (WIP) job to run wdio tests for a single repo [integration/config] - https://gerrit.wikimedia.org/r/443931
[10:15:42] Phabricator, Security: Trusted contributors cannot escalate tasks as security issues - https://phabricator.wikimedia.org/T198831 (MarcoAurelio)
[10:42:55] Scap, WorkType-NewFunctionality: Add a --labs option to 'scap update-interwiki-cache' to be able to update the interwiki-labs.php file using Scap - https://phabricator.wikimedia.org/T198844 (MarcoAurelio)
[11:07:20] (PS3) Zfilipin: (WIP) job to run wdio tests for a single repo [integration/config] - https://gerrit.wikimedia.org/r/443931 (https://phabricator.wikimedia.org/T188742) (owner: Hashar)
[11:17:55] Yippee, build fixed!
[11:17:56] Project beta-scap-eqiad build #214593: FIXED in 4 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/214593/
[11:18:52] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Run selenium-EXTENSION-jessie for all repositores with Selenium tests - https://phabricator.wikimedia.org/T188742 (zeljkofilipin)
[11:19:23] Release-Engineering-Team, GitHub-Mirrors, Security-Team: Enforce 2FA for GitHub members - https://phabricator.wikimedia.org/T198810 (Reedy)
[11:26:00] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Run selenium-EXTENSION-jessie for all repositores with Selenium tests - https://phabricator.wikimedia.org/T188742 (zeljkofilipin)
[12:18:06] Diffusion, Phabricator: Turn off 'blame' by default on Diffusion - https://phabricator.wikimedia.org/T198838 (Aklapper) According to https://secure.phabricator.com/rPafc3099ee785c0f9cb86c702fc1d04d2be9f6af5 there is a "Hide Blame" option in the "View Options" dropdown so it can be toggled off?
[12:42:25] Phabricator, WMSE-Bug-Reporting-and-Translation-2018: "Create Task" in favorites menu should be called "Create Task (Simple)" - https://phabricator.wikimedia.org/T198837 (Lokal_Profil) >>! In T198837#4399050, @Aklapper wrote: > Thanks. Done. Wow that was quick! Thanks
[14:28:15] (PS1) Hashar: Archive ContributorsAddon extension [integration/config] - https://gerrit.wikimedia.org/r/443976 (https://phabricator.wikimedia.org/T198864)
[14:28:32] (CR) Hashar: [C: 2] Archive ContributorsAddon extension [integration/config] - https://gerrit.wikimedia.org/r/443976 (https://phabricator.wikimedia.org/T198864) (owner: Hashar)
[14:30:16] (Merged) jenkins-bot: Archive ContributorsAddon extension [integration/config] - https://gerrit.wikimedia.org/r/443976 (https://phabricator.wikimedia.org/T198864) (owner: Hashar)
[14:30:18] (PS1) Hashar: Migrate BreadCrumbs extension to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443977 (https://phabricator.wikimedia.org/T183512)
[14:30:27] (CR) Hashar: [C: 2] Migrate BreadCrumbs extension to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443977 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[14:32:06] (Merged) jenkins-bot: Migrate BreadCrumbs extension to Quibble [integration/config] - https://gerrit.wikimedia.org/r/443977 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[15:18:59] PROBLEM - Free space - all mounts on integration-slave-jessie-1002 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found) integration.integration-slave-jessie-1002.diskspace._srv.byte_percentfree (<40.00%)
[15:45:57] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[16:00:14] (PS4) Zfilipin: (WIP) job to run wdio tests for a single repo [integration/config] - https://gerrit.wikimedia.org/r/443931 (https://phabricator.wikimedia.org/T188742) (owner: Hashar)
[16:03:57] RECOVERY - Free space - all mounts on integration-slave-jessie-1002 is OK: OK: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found)
[16:09:38] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:19:29] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[16:20:04] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[17:00:30] (PS1) Hashar: jjb: add support to pass arguments to quibble [integration/config] - https://gerrit.wikimedia.org/r/444024 (https://phabricator.wikimedia.org/T196960)
[17:11:02] (PS1) Hashar: Quibble jobs that skip selenium entirely [integration/config] - https://gerrit.wikimedia.org/r/444028 (https://phabricator.wikimedia.org/T196960)
[17:21:10] hmm i get "Context not available."
[17:21:14] on https://phabricator.wikimedia.org/D1075
[17:21:18] twentyafterfour ^^
[17:22:36] paladox: probably a diff uploaded via web ui rather than arc
[17:22:41] oh
[17:22:45] ok
[17:28:29] if it is a diff of mine, yep, I upload them using the web UI
[17:28:33] I don't have arc
[17:33:20] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T191060 (greg) a:hashar
[17:33:34] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T191061 (greg) a:hashar
[17:49:35] RECOVERY - SSH on integration-slave-docker-1017 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[17:53:12] Beta-Cluster-Infrastructure, Growth-Team, MediaWiki-extensions-PageCuration, Collaboration-Team-Triage (Collab-Team-Next-Quarter): [betalabs]: Page triage: "Uncaught TypeError: Cannot read property 'getLogPageTitle' of undefined" for 'Redirects for discussi... - https://phabricator.wikimedia.org/T196954
[17:53:15] Beta-Cluster-Infrastructure, Growth-Team (Current Sprint): Set up PageTriage test environment in beta labs - https://phabricator.wikimedia.org/T198898 (Etonkovidova)
[18:22:18] Release-Engineering-Team (Kanban), Wikibugs, Patch-For-Review: Deprecate -devtools and redirect to -releng? - https://phabricator.wikimedia.org/T185285 (greg) >>! In T185285#4262451, @Peachey88 wrote: > I don't see a need in redirecting the channel/closing it, I've actually seen it a few times when p...
[18:27:51] Phabricator, Release-Engineering-Team (Kanban): Make sure elasticsearch 6 is supported in phabricator - https://phabricator.wikimedia.org/T181393 (greg) p:Lowest>Normal Re-prioritizing. @EBernhardson still no date yet, right? And from what you said in T192614#4249710 are we actually OK with curr...
[19:38:42] The last Puppet run was at Thu Jul 5 13:31:12 UTC 2018 (367 minutes ago). Puppet is disabled. reason not specified
[19:38:43] who did this
[19:40:25] Jul 5 13:33:12 deployment-elastic07 sudo: gehel : TTY=pts/0 ; PWD=/home/gehel ; USER=root ; COMMAND=/usr/bin/puppet agent --disable
[19:41:19] gehel why is there no reason specified?
[19:42:47] Krenair: probably because I forgot to add one ;)
[19:43:02] I'll check in a minute. Thanks for the ping!
[19:43:41] thanks
[19:46:15] looks like I disabled it with a reason first, but re-disabled it without the second time. Bad gehel...
[20:06:48] I'm getting "You ("x.x.x.241") are issuing too many requests too quickly." when I visit phab
[20:06:56] Is this related to the spam attack?
[20:07:00] I'm logged in so this seems weird to me
[20:07:22] It's stopping me from doing work right now
[20:09:20] are you at home or in the office?
[20:10:00] office wifi
[20:10:06] so guessing it's everyone in the office?
[20:10:11] there is rate limiting, maybe the office ip....
[20:10:13] i can't even access incognito
[20:10:42] web interface right?
[20:11:05] yup
[20:11:13] i confirmed others in the office cannot access either
[20:11:18] see -staff
[20:11:24] just got another report
[20:11:39] well the good news is that rate limiting is now working
[20:11:42] the bad news...
[20:12:07] you're all blocked
[20:12:08] lol
[20:12:31] hmm, we should implement a way to whitelist users
[20:12:41] instead of ip, as ips change all the time
[20:13:13] can you allow the office ip to bypass the rate limit?
[20:13:20] yes
[20:13:27] there's a whitelist
[20:13:34] at least the WMDE was whitelisted
[20:13:36] that will do for letting people get work done right now at least
[20:14:34] twentyafterfour ^^
[20:15:24] the office ip has been posted in the other channel so it can be added
[20:17:05] is it tan1.corp.wikimedia.org = 198.73.209.1 by any chance?
[20:17:22] that one's public knowledge
[20:19:27] no
[20:32:57] https://phabricator.wikimedia.org/T198612 according to this they are not exempt yet
[20:33:05] I hope we have a mechanism for this
[20:37:42] apergos they were whitelisted since then
[20:38:07] I wish there were a note on the task to that effect
[20:38:11] and especially how it was done
[20:38:57] i created https://phabricator.wikimedia.org/T198614
[20:40:27] wikibugs seems affected too.
[20:40:41] i only see it post gerrit results
[21:16:36] as it's midnight here and I have no idea how the wmde fix was done
[21:16:45] I'm going to go get some sleep
[21:17:18] was there a wmde fix done?
[21:17:26] scroll up
[21:17:30] paladox says yes
[21:18:58] has anyone filed a task or emailed ops to look at the office one? I wouldn't rely on scrollback for that
[21:19:26] yes
[21:19:31] I emailed ops with a link to the task
[21:23:48] Krenair: can you please have a look at T198891? I see that you are a project maintainer for that bot.
[21:23:49] T198891: wikibugs malfunctioning - https://phabricator.wikimedia.org/T198891
[21:24:00] maintainer is a strong word
[21:24:17] s/maintainer/have access to/
[21:24:17] (CR) Arlolra: "Hashar, this is the other part of https://gerrit.wikimedia.org/r/#/c/integration/config/+/442004/" [integration/config] - https://gerrit.wikimedia.org/r/442002 (owner: Arlolra)
[21:24:23] can you give an example of something it's missed
[21:25:06] what should have happened, what did happen, how to reproduce it, etc.
[21:25:22] Yes. This morning I posted some comments at a task which is reported to this channel. Wikibugs should have therefore reported the activity here. It didn't but just one.
[21:25:30] I can fetch the task.
[21:25:43] It didn't but just one?
[21:25:57] Also, zhuyifei1999_ and I had a discussion on #pywikibot and he thinks wikibugs is missing some activity as well
[21:25:59] wikibugs is likely hitting the rate limiting?
[21:26:12] i filed a task for that too
[21:26:23] paladox: maybe because his wikibugs_ brother isn't here?
[21:26:30] Nope
[21:26:36] wikibugs_ is for flooding.
[21:26:48] but it works with gerrit
[21:26:51] https://phabricator.wikimedia.org/T198915
[21:27:00] yeah I don't think that has any effect on the process that pulls phab tasks into redis
[21:27:01] ah, because on a channel I am in we have two bots to report stuff to avoid flooding off
[21:27:53] okay so it's T198915 apparently, right?
[21:27:53] T198915: Exclude wiki bugs from phabricator rate limit - https://phabricator.wikimedia.org/T198915
[21:27:57] Phabricator, Wikibugs: Exclude wiki bugs from phabricator rate limit - https://phabricator.wikimedia.org/T198915 (Krenair) wikibugs runs on labs and I don't think that's going to get whitelisted.
[21:30:02] Gerrit, Release-Engineering-Team, Patch-For-Review, User-notice: Make PolyGerrit the default ui - https://phabricator.wikimedia.org/T196812 (Johan) OK, I simplified a bit and geared it a bit more towards the main audience of Tech News (non-developers), but it's included.
[21:30:37] Continuous-Integration-Config, Gerrit, Jenkins: Jenkins bot ignores the patches submitted by Mpaa - https://phabricator.wikimedia.org/T198573 (Mpaa) Open>Resolved a:Mpaa Seems OK now. Thanks and sorry for the inconvenience.
[21:34:16] Hauskatze, looking at tools-bastion-03.tools.eqiad.wmflabs:/data/project/wikibugs/wikibugs.log
[21:34:24] ...
[21:34:48] there are exceptions going in there
[21:34:49] sup twentyafterfour
[21:34:51] We might as well just disable the throttle for now, it's too hard to whitelist individual ips
[21:35:29] like technically too hard?
[21:36:15] well not technically very hard but the list will keep expanding of who needs to be whitelisted and there is no convenient or clean place to put it in the code, so it'll be hacks on top of hacks unless it's done right
[21:36:30] doing it right takes a little more time and that doesn't alleviate the situation immediately
[21:36:39] I envy you guys. I wish I had more technical knowledge :(
[21:36:47] Respect.
[21:37:21] twentyafterfour, right now we're talking about the WMDE and WMF offices
[21:37:58] I don't think we're necessarily opening the floodgates
[21:39:03] Krenair: true but it's still a mess to add exceptions - the code has no support for whitelisting so I'm not sure how to add it
[21:39:28] this is phabricator, we were somewhat expecting this when implementing rate limiting weren't we?
[21:40:25] Krenair: where can I see the IPs that need whitelisting?
[21:40:55] Krenair: it had rate limits previously. I'm not sure what the parameters were though
[21:41:04] they must have been higher than now
[21:41:23] https://phabricator.wikimedia.org/T198612 which I copied into my potential patch for it
[21:42:02] I wouldn't be surprised if there are more IPs for those offices
[21:42:08] yeah
[21:42:29] but some people try to shroud the corp network in secrecy so
[21:42:39] There used to be a few different
[21:42:41] * Krenair shrugs
[21:42:45] I dunno if/what changed when the office moved
[21:42:48] tan[1-4] ?
[21:42:56] and guest-tan1
[21:43:07] now there's tan241
[21:43:42] I think Krenair's patch looks like a good solution for now
[21:44:03] or disable throttle
[21:44:40] Hauskatze, so wikibugs
[21:44:55] here's a common exception
[21:45:18] I'm listening.
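Editorial aside on the whitelisting idea discussed above: the following is a minimal sketch, assuming a simple sliding-window throttle keyed by client IP with an allowlist of exempt networks. It is not Phabricator's code (the log only says Phabricator had no such support at the time), and every name here (`Throttle`, `ALLOWLIST`, `allow`) is hypothetical. The addresses are documentation-only ranges, not the office IPs mentioned in the channel.

```python
import ipaddress
import time
from collections import defaultdict, deque

# Hypothetical allowlist; documentation-only ranges stand in for real office networks.
ALLOWLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


class Throttle:
    """Sliding-window request limiter with an IP allowlist (illustrative only)."""

    def __init__(self, max_requests, window_seconds, allowlist=()):
        self.max_requests = max_requests
        self.window = window_seconds
        self.allowlist = list(allowlist)
        self.hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def is_exempt(self, ip):
        """Return True if the address falls inside any allowlisted network."""
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.allowlist)

    def allow(self, ip, now=None):
        """Return True if this request may proceed, recording it if so."""
        if self.is_exempt(ip):
            return True
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

Keeping the exempt networks in one list that the limiter consults before scoring is one way to avoid the "hacks on top of hacks" concern raised above: adding an office means appending a network rather than touching the throttle logic.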
[21:45:34] I tried to figure out how to exempt wikibugs from the throttle and I couldn't find a clean solution because phabricator doesn't always have a logged in session when it processes the throttle scoring
[21:45:44] https://phabricator.wikimedia.org/P7331
[21:46:00] I don't think we actually know that wikibugs is running into a rate limit yet twentyafterfour
[21:46:04] we could reduce the rate of polling maybe?
[21:46:18] although it's certainly possible given its nature
[21:46:40] It polls pretty quickly so I'd think that it is hitting the throttle
[21:47:19] We could either raise the max score or slow down the poll loop in wikibugs
[21:48:10] doubling the current max score would probably be a good start
[21:50:41] File "/mnt/nfs/labstore-secondary-tools-project/wikibugs/py-wikibugs2/lib/python3.4/site-packages/phabricator/__init__.py", line 49, in connect
[21:50:41] assert req.status_code == 200 # krenair live hack
[21:50:41] AssertionError
[21:52:09] alright so let's see what the body of the response is when the code isn't 200
[21:52:15] there are some other types coming in too btw
[21:52:21] yeah here we go
[21:52:27] File "/mnt/nfs/labstore-secondary-tools-project/wikibugs/py-wikibugs2/lib/python3.4/site-packages/phabricator/__init__.py", line 50, in connect
[21:52:27] raise Exception(req.content)
[21:52:27] Exception: b'TOO MANY REQUESTS\nYou ("10.68.18.17") are issuing too many requests too quickly.\n'
[21:55:22] twentyafterfour, made it sleep for 2 seconds instead of 1 second between polls
[21:56:58] the limit is 5 per minute
[21:57:09] so that needs to be more than 2 seconds really
[21:58:02] ah
[21:58:18] 5 per minute for 5 consecutive minutes
[21:58:41] okay so 12 seconds
[21:59:12] that seems reasonable. I don't think wikibugs needs to respond within 2 seconds to every change in phab
[22:00:08] still getting different types of exceptions. let's see
[22:00:38] oh wow
[22:00:58] data_dict_str = task_page.split(
[22:00:59] '