[00:25:04] (PS1) BryanDavis: Tools LDAP model migrations [labs/striker] - https://gerrit.wikimedia.org/r/364138
[00:25:06] (PS1) BryanDavis: Check to see if ssh key is a duplicate [labs/striker] - https://gerrit.wikimedia.org/r/364139
[00:27:36] (PS2) BryanDavis: Check to see if ssh key is a duplicate [labs/striker] - https://gerrit.wikimedia.org/r/364139 (https://phabricator.wikimedia.org/T167931)
[00:28:01] Striker, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Fatal error when adding a duplicate SSH key - https://phabricator.wikimedia.org/T167931#3419235 (bd808) a:bd808
[00:28:59] Striker, cloud-services-team (Kanban), Epic, Patch-For-Review, User-bd808: Manage shared tool accounts via Striker - https://phabricator.wikimedia.org/T149458#3419239 (bd808)
[00:45:05] (PS1) BryanDavis: Show message when duplicate SUL attach occurs [labs/striker] - https://gerrit.wikimedia.org/r/364140 (https://phabricator.wikimedia.org/T164847)
[01:04:24] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[01:16:03] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Give visual feedback while Editcounter is thinking - https://phabricator.wikimedia.org/T169831#3419278 (Samwilson) Good suggestions! I've fixed up the code.
[01:22:01] PROBLEM - Puppet errors on tools-exec-1430 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[01:39:24] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:02:02] RECOVERY - Puppet errors on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:05:38] PROBLEM - Puppet staleness on tools-worker-1020 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[02:24:22] PROBLEM - Puppet errors on tools-worker-1021 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[03:05:22] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[03:51:01] PROBLEM - Puppet errors on tools-exec-1423 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[03:54:59] Striker, cloud-services-team (Kanban), Patch-For-Review, Technical-Debt, User-bd808: Replace deprecated phabricator conduit api calls in phabricator.py file - https://phabricator.wikimedia.org/T159044#3419307 (bd808) a:bd808
[04:10:10] (PS1) BryanDavis: Phabricator: replace maniphest.update with maniphest.edit [labs/striker] - https://gerrit.wikimedia.org/r/364144 (https://phabricator.wikimedia.org/T159044)
[04:19:22] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[04:21:07] I should create a tool account called "as_gods" so I can type in the shell "become as_gods". Hah! Video game reference.
[04:21:57] bonus points for making a useful mythology portal using it
[04:23:23] (CR) BryanDavis: [V: +1 C: +2] "Trivial (and currently unused code). I did test it locally." [labs/striker] - https://gerrit.wikimedia.org/r/364144 (https://phabricator.wikimedia.org/T159044) (owner: BryanDavis)
[04:25:16] * bd808 needs someone willing to review a lot of Django code
[04:25:48] I've got a "few" patches queued up for striker -- https://gerrit.wikimedia.org/r/#/projects/labs/striker,dashboards/default
[04:29:20] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:34:20] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Inconsistent tool name format in new XTools - https://phabricator.wikimedia.org/T169913#3419333 (Samwilson) a:Samwilson I've updated the tool names: https://github.com/x-tools/xtools-rebirth/commit/3ffb843d765ef34342cf1ca713bfefb5c5aabbf1 I think all to...
[04:54:23] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:55:19] PROBLEM - Puppet errors on tools-worker-1021 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[05:00:06] PROBLEM - Puppet errors on tools-exec-1415 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[05:03:45] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[05:25:07] RECOVERY - Puppet errors on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:30:21] RECOVERY - Puppet errors on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:38:47] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:43:22] PROBLEM - Free space - all mounts on tools-logs-02 is CRITICAL: CRITICAL: tools.tools-logs-02.diskspace._srv.byte_percentfree (<10.00%)
[06:11:28] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Verify all routes work between new and old xtools - https://phabricator.wikimedia.org/T165612#3419363 (Samwilson) I've created T170098 to track the export feature. The other big feature here seems to be supporting the gadget; I thought I'd seen a ticket for...
[06:21:01] RECOVERY - Puppet errors on tools-exec-1423 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:43:52] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1428 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:54:09] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Loading Edit Counter in new XTools triggers PUT error - https://phabricator.wikimedia.org/T170100#3419386 (kaldari)
[06:54:23] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Loading Edit Counter in new XTools triggers PUT error - https://phabricator.wikimedia.org/T170100#3419401 (kaldari) p:Triage>Normal
[06:56:12] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3419403 (Marostegui) Resolved>Open Hi, The script that checks f...
[06:57:04] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3419406 (kaldari) Hmm, tried loading the new XTools again and this time it was slower :( http://xtools.wmflabs.org/ec/en.wikipedia.org/Kaldari Executed in 46.718 second(s). · T...
[06:57:37] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Give visual feedback while Editcounter is thinking - https://phabricator.wikimedia.org/T169831#3419408 (kaldari) Open>Resolved Looks great!
[06:58:19] RECOVERY - Free space - all mounts on tools-logs-02 is OK: OK: All targets OK
[06:58:54] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Verify all routes work between new and old xtools - https://phabricator.wikimedia.org/T165612#3419413 (kaldari) Open>Resolved
[06:58:57] Tool-Labs-tools-Xtools, Community-Tech: Epic: Rewriting XTools - https://phabricator.wikimedia.org/T153112#3419414 (kaldari)
[07:03:44] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3419419 (Samwilson) :-( Let's wait till we've redeployed to the new larger prod servers (T169590), and see if that changes the load time.
[07:05:29] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Investigation: XTools routing - https://phabricator.wikimedia.org/T163283#3192212 (kaldari) So can someone summarize the implementation that was decided on?
[07:15:46] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Loading Edit Counter in new XTools triggers PUT error - https://phabricator.wikimedia.org/T170100#3419431 (Samwilson) It looks like the error is: > [2017-07-10 07:06:11] request.CRITICAL: Uncaught PHP Exception Doctrine\DBAL\Exception\DriverException: "An ex...
[07:19:11] Tool-Labs-tools-Xtools, Community-Tech: Add the Page History Gadget to new XTools - https://phabricator.wikimedia.org/T170101#3419435 (Samwilson)
[07:23:09] Tool-Labs-tools-Xtools, Community-Tech: Average edit size is mysterious in new XTools - https://phabricator.wikimedia.org/T170103#3419463 (kaldari)
[07:23:51] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1428 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:36:15] Tool-Labs-tools-Xtools, Community-Tech: Average edit size is mysterious in new XTools - https://phabricator.wikimedia.org/T170103#3419463 (Samwilson) That's being calculated by: SELECT 'average_size' AS `key`, AVG(rev_len) AS val FROM $revisionTable WHERE rev_user = :userId So yep, bytes, and high...
[08:10:05] Cloud-Services, Cloud-VPS, DBA: Queries of commonswiki_p.filearchive for fa_sha1 are slow - https://phabricator.wikimedia.org/T71088#3419584 (Aklapper) a:Springle>None [ Resetting assignee as assignee account is not active anymore ]
[08:10:22] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:07:30] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3419867 (jcrespo) cc @bd808
[10:01:21] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[10:41:22] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:18:43] !log tools.stewardbots Restarted the bots due to excess flood
[11:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[11:54:03] is there a way to disable unused LDAP users for your own project? E.g. I have a labs project where a user has not been added, but it looks like the user still exists via LDAP
[11:54:38] e.g. I have software that needs the user "git", but it already exists, and I can't change the passwd for my software, since then I get prompted for the LDAP admin passwd
[12:41:14] Toolforge: Tool labs slow and kills application - https://phabricator.wikimedia.org/T169954#3420402 (Fnielsen) I still get "502 Bad Gateway". It seems to be tied to one worker. Currently, I get "Respawned uWSGI worker 1" several times, while the other workers (that I see with different pid) does not seem to...
[12:54:39] (PS1) Elukey: Add fake Piwik backup user/password [labs/private] - https://gerrit.wikimedia.org/r/364196
[12:55:10] (CR) Elukey: [V: +2 C: +2] Add fake Piwik backup user/password [labs/private] - https://gerrit.wikimedia.org/r/364196 (owner: Elukey)
[13:32:25] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[13:53:50] (PS1) Elukey: Change Piwik's backup user/pass namespace to improve consistency [labs/private] - https://gerrit.wikimedia.org/r/364204
[13:55:09] (PS2) Elukey: Change Piwik's backup user/pass namespace to improve consistency [labs/private] - https://gerrit.wikimedia.org/r/364204
[13:55:30] (CR) Elukey: [V: +2 C: +2] Change Piwik's backup user/pass namespace to improve consistency [labs/private] - https://gerrit.wikimedia.org/r/364204 (owner: Elukey)
[14:10:48] Cloud-Services, Cloud-VPS, cloud-services-team (Kanban), Operations, and 2 others: rack/setup/install labvirt101[5-8] - https://phabricator.wikimedia.org/T165531#3420744 (chasemp) a:chasemp>Andrew ```[edit interfaces interface-range labs-instance-ports] member ge-5/0/3 { ... } + m...
[14:12:23] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:15:08] Sagan: yeah that's a bummer, I believe the answer is no, you cannot choose to disable a particular ldap user, and a local shadow user is going to have unintended effects I imagine. I'm surprised we allow a git ldap user, that seems bad on our part tbh. Make a task and we can look at just purging that user from ldap unless they are really active? We don't allow other regular "service" users iirc
[14:25:19] PROBLEM - Puppet errors on tools-exec-1407 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[14:27:13] Data-Services, DBA, Patch-For-Review: Expose ar_content_format and ar_content_model columns of archive table on Labs replicas - https://phabricator.wikimedia.org/T89741#3420806 (chasemp) >>! In T89741#3416082, @gerritbot wrote: > Change 363851 had a related patch set uploaded (by Umherirrender; owner...
[14:40:40] Cloud-Services, cloud-services-team (Kanban), Wikimedia-Mailing-lists: Rename labs-admin mailing list to cloud-admin - https://phabricator.wikimedia.org/T167155#3319354 (RobH) Anytime an mbox file is touched, if anything was removed since the last re-compile, it can change the links. Also it will re...
[15:00:19] RECOVERY - Puppet errors on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:04:06] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3421024 (bd808) >>! In T169488#3419403, @Marostegui wrote: > Hi, > > Th...
[15:04:26] Cloud-Services, Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3421025 (faidon)
[15:06:04] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3421040 (Marostegui) >>! In T169488#3421024, @bd808 wrote: >>>! In T16948...
[15:09:09] ooooooohh, new motd on tools-login
[15:11:53] Cloud-Services, cloud-services-team (Kanban), Operations, ops-eqiad, Patch-For-Review: rack/setup/install labvirt101[5-8] - https://phabricator.wikimedia.org/T165531#3421069 (Andrew) Open>Resolved These are up and puppetized and running VMs.
[15:12:14] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3421074 (jcrespo) Yes, I only dropped the databases- I thought the script...
[15:18:15] chasemp: for what it is worth, the uid=git account seems to only be a member of the bastion project and was created 2014-11-19. It belongs to a wikitech user that has never made an edit.
[15:18:38] bd808: it looked pretty sparse to me too, we have a blacklist somewhere
[15:18:43] I would be for adding that
[15:18:58] https://wikitech.wikimedia.org/wiki/MediaWiki:Titleblacklist
[15:19:25] you are fast on the draw today bd808 :)
[15:19:41] if we can confirm that the account is unused in gerrit then we could change the uid
[15:20:06] that page is in my awesome bar autocomplete :)
[15:23:03] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3421136 (Marostegui) Open>Resolved I have dropped the views from...
[15:26:47] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3421166 (jcrespo) ``` root@labsdb1009[(none)]> pager grep ukwikimedia PAG...
[15:28:44] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3421185 (Marostegui) >>! In T169488#3421166, @jcrespo wrote: > ``` > root...
[15:32:39] Data-Services, cloud-services-team (Kanban), DBA, Patch-For-Review, and 2 others: Drop ukwikimedia from labsdb hosts (was: ukwikimedia still present on replicas dbs on labs hosts) - https://phabricator.wikimedia.org/T169488#3421215 (Marostegui) Revoked privileges.
[15:33:22] PROBLEM - Puppet errors on tools-worker-1017 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:33:58] PROBLEM - Puppet errors on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:36:37] Cloud-Services, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3421243 (Marostegui) I have upgraded db1102 to 10.1 so we are now using rbr triggers there. After sanitizing s2 and s6, I ran the check_private...
[15:51:20] chasemp: hmmm... I bet we can just drop that uid=git account and nobody will even notice. It has pwdFailureTime attributes going back to 2016-04-15 which means nobody has successfully authed against it in more than a year.
[15:51:53] bd808: almost certainly. I was looking for a task I thought had candidates to add to blacklist already but haven't found it atm
[15:52:16] "apache" etc. not being on that list was my first thought
[15:53:11] www-data is there. "apache" hasn't been used since 12.04 or earlier
[15:53:45] well, :D
[15:53:50] I don't really know that "git" is a common system account
[15:54:25] is there a magic list somewhere for debian/ubuntu of all normal system accounts?
[15:54:27] I think gitorious and a few other common bits use it
[15:58:09] bd808: not that I know of, but there is a dedicated range of UIDs so we could do a sweep and make a collection to blacklist for ease
[16:08:23] RECOVERY - Puppet errors on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:08:38] chasemp: https://anonscm.debian.org/cgit/d-i/user-setup.git/tree/reserved-usernames
[16:08:52] nice
[16:08:56] that's a good place to start probably
[16:08:58] RECOVERY - Puppet errors on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:09:16] bd808: I found a few random closed ones as one-offs and https://phabricator.wikimedia.org/T131408
[16:09:24] seems like a task to update to handle that list
[16:09:26] would be good
[16:09:30] I'll make that
[16:09:33] (post meeting)
[16:09:52] +1. I can make the changes on wiki as part of on-call this week
[16:58:07] Cloud-Services, wikitech.wikimedia.org, Phabricator, LDAP, and 2 others: Blocking an account on wikitech should disable LDAP logins - https://phabricator.wikimedia.org/T168692#3421582 (mmodell) p:Normal>Low
[16:58:48] Cloud-Services, wikitech.wikimedia.org, Phabricator, LDAP, and 2 others: Blocking an account on wikitech should disable LDAP logins - https://phabricator.wikimedia.org/T168692#3421583 (greg) p:Low>Lowest
[17:04:20] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Inconsistent tool name format in new XTools - https://phabricator.wikimedia.org/T169913#3421602 (kaldari) Open>Resolved Having both routes seems sensible.
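chasemp's idea of sweeping the dedicated system UID range to collect names for the blacklist could look roughly like the Python sketch below. This is a minimal sketch, not what was actually done: it assumes the Debian/Ubuntu convention that UIDs 0-999 (plus 65534, "nobody") belong to system accounts, and the helper names are hypothetical.

```python
import pwd

# Assumed Debian/Ubuntu convention: UIDs 0-999 are system accounts,
# regular users start at 1000, and 65534 is "nobody".
SYSTEM_UID_MAX = 999
NOBODY_UID = 65534


def is_system_uid(uid):
    """Return True if `uid` falls in the assumed system-account range."""
    return uid <= SYSTEM_UID_MAX or uid == NOBODY_UID


def system_account_names(accounts=None):
    """Return the sorted, de-duplicated names of system accounts.

    `accounts` is an iterable of (name, uid) pairs. When omitted, the local
    passwd database is read via pwd.getpwall(); on an LDAP-backed host this
    may or may not include directory users, depending on NSS enumeration.
    """
    if accounts is None:
        accounts = [(entry.pw_name, entry.pw_uid) for entry in pwd.getpwall()]
    return sorted({name for name, uid in accounts if is_system_uid(uid)})


if __name__ == "__main__":
    # One candidate name per line, e.g. to diff against the Debian
    # reserved-usernames list before adding entries to the Titleblacklist.
    for name in system_account_names():
        print(name)
```

A local sweep like this only finds accounts that happen to exist on the host it runs on, so the Debian reserved-usernames list from the user-setup repository is still the broader starting point.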
[17:21:32] Cloud-Services, LDAP: LDAP is blocking a local service user account - https://phabricator.wikimedia.org/T170174#3421716 (Luke081515)
[17:21:37] chasemp: I created a task for it :) ^
[17:21:54] Sagan: ok thanks
[17:26:03] Tools, cloud-services-team (Kanban), User-bd808: grid-jobs tool broken; loads forever with no actual response - https://phabricator.wikimedia.org/T168653#3421748 (bd808) ``` lang=irc [17:22] bd808: i was watching shinken alerts for puppet errors over the weekend, and tools-worker-1007 w...
[17:26:56] !log tools.grid-jobs Shutdown webservice, causing load spikes on the workers it has run on
[17:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.grid-jobs/SAL
[17:36:44] Tool-Labs-tools-Database-Queries, Spanish-Sites: DBQ-153 two queries on user/ipblocks/group for English and Spanish Wikipedia - https://phabricator.wikimedia.org/T61415#3421802 (MarcoAurelio)
[17:40:37] Cloud-Services, wikitech.wikimedia.org, Phabricator, LDAP, and 2 others: Blocking an account on wikitech should disable LDAP logins - https://phabricator.wikimedia.org/T168692#3421835 (MarcoAurelio) So instead of disabling an account here, a block on Wikitech will have the effect of blocking the...
[17:43:04] Cloud-Services, wikitech.wikimedia.org, Phabricator, LDAP, and 2 others: Blocking an account on wikitech should disable LDAP logins - https://phabricator.wikimedia.org/T168692#3373658 (Luke081515) Phabricator OAuth is based on blocks at mediawiki.org, isn't it?
[17:43:08] TabbyCat: ^
[17:47:45] Cloud-Services, Quarry: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3421858 (Milimetric) @Halfak is it a deal-breaker if we couldn't migrate the history of Quarry to Redash? I'm wondering if you care as much about the history as the features themsel...
[17:48:23] Tool-Labs-tools-Xtools, Community-Tech: Average edit size is mysterious in new XTools - https://phabricator.wikimedia.org/T170103#3421859 (kaldari) @Samwilson: That's the wrong query to use. `rev_len` is the size of the entire revision, i.e. the size of the article immediately after the edit. To get the...
[17:49:43] Tool-Labs-tools-Xtools, Community-Tech: Average edit size is bogus in new XTools - https://phabricator.wikimedia.org/T170103#3421861 (kaldari) p:Triage>Normal
[17:50:03] Cloud-Services, LDAP: LDAP is blocking a local service user account - https://phabricator.wikimedia.org/T170174#3421865 (chasemp) p:Triage>Normal
[17:52:52] Cloud-Services, User-bd808: Update wikitech Titleblacklist - https://phabricator.wikimedia.org/T170178#3421876 (chasemp)
[17:53:02] Cloud-Services, User-bd808: Update wikitech Titleblacklist - https://phabricator.wikimedia.org/T170178#3421894 (chasemp)
[17:53:23] Cloud-Services, User-bd808: Update wikitech Titleblacklist - https://phabricator.wikimedia.org/T170178#3421876 (chasemp)
[17:53:25] Cloud-Services, LDAP: LDAP is blocking a local service user account - https://phabricator.wikimedia.org/T170174#3421716 (chasemp)
[17:56:12] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Inconsistent tool name format in new XTools - https://phabricator.wikimedia.org/T169913#3421915 (MusikAnimal) >>! In T169913#3419333, @Samwilson wrote: > > Is there a reason to only have one or the other? It doesn't get exposed externally. Nope, just want...
[17:56:48] Cloud-Services, User-bd808: Update wikitech Titleblacklist - https://phabricator.wikimedia.org/T170178#3421876 (MarcoAurelio) I am not certainly the most experienced with regex, but IMHO usernames that we want to forbid from creation should have the "antispoof" variable set to avoid character spoofing. I...
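kaldari's point that `AVG(rev_len)` measures page size rather than edit size can be illustrated with a small Python sketch. It assumes (this is not stated in the log) that an edit's size is the revision's `rev_len` minus its parent revision's `rev_len`, with page creations measured against zero. The function names and sample numbers are hypothetical, and fetching the parent lengths (for example by joining `revision` to itself on `rev_parent_id`) is left out.

```python
def edit_size(rev_len, parent_len):
    """Byte delta of one edit.

    `parent_len` is None for a page creation (no parent revision),
    in which case the whole revision counts as added bytes.
    """
    return rev_len - (parent_len if parent_len is not None else 0)


def average_edit_size(revisions, absolute=False):
    """Mean edit size over (rev_len, parent_len) pairs.

    With absolute=True, removals count as positive sizes instead of
    cancelling out additions.
    """
    sizes = [edit_size(rev_len, parent_len) for rev_len, parent_len in revisions]
    if absolute:
        sizes = [abs(size) for size in sizes]
    return sum(sizes) / len(sizes) if sizes else 0.0


def average_rev_len(revisions):
    """What AVG(rev_len) measures: the mean page size after each edit."""
    lengths = [rev_len for rev_len, _ in revisions]
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    # Hypothetical sample: two edits to existing pages and one page creation.
    sample = [(1200, 1000), (900, 1000), (500, None)]
    print(average_edit_size(sample))  # 200.0
    print(average_rev_len(sample))    # about 866.67
```

On this made-up sample, the mean `rev_len` (about 867 bytes) is much larger than the mean signed edit size (200 bytes), which shows why averaging `rev_len` overstates how much a user changes per edit. Whether XTools should report the signed or absolute mean is a separate design choice.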
[17:57:25] Cloud-Services, wikitech.wikimedia.org, User-bd808: Update wikitech Titleblacklist - https://phabricator.wikimedia.org/T170178#3421920 (MarcoAurelio)
[17:58:25] Data-Services, Toolforge, cloud-services-team (Kanban): 2017-07-02 Toolforge data loss for permissive data - https://phabricator.wikimedia.org/T169774#3421921 (chasemp) Note: two larger Tools had data synced after the initial as they are an overwhelming portion of the restore. I have placed these in...
[17:58:37] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Loading Edit Counter in new XTools triggers PUT error - https://phabricator.wikimedia.org/T170100#3421923 (MusikAnimal) >>! In T170100#3419431, @Samwilson wrote: > This is because it's not yet set up with its own database for recording hits, and is still poi...
[18:01:34] Cloud-Services, wikitech.wikimedia.org, Phabricator, LDAP, and 2 others: Blocking an account on wikitech should disable LDAP logins - https://phabricator.wikimedia.org/T168692#3421928 (demon) >>! In T168692#3421835, @MarcoAurelio wrote: > So instead of disabling an account here, a block on Wikite...
[18:05:08] I got a 502 bad gateway when trying to associate my Phab account in Striker. I couldn't find a bug -- is this known?
[18:05:21] bd808: ^
[18:05:27] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Investigation: XTools routing - https://phabricator.wikimedia.org/T163283#3421935 (MusikAnimal) >>! In T163283#3419421, @kaldari wrote: > So can someone summarize the implementation that was decided on? This ticket started as an investigation on how to get...
[18:05:33] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Investigation: XTools routing - https://phabricator.wikimedia.org/T163283#3421939 (MusikAnimal) Open>Resolved a:MusikAnimal
[18:14:05] madhuvishy: I'm getting the old "502 bad gateway" error when I try to log into PAWS: https://paws.wmflabs.org/paws/user/Jtmorgan should I create a bug ticket, or is this a trivial fix?
[18:15:42] bd808: https://phabricator.wikimedia.org/T168653#3421748 didn't the patch work?
[18:15:49] Tool-Labs-tools-Xtools, Community-Tech: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3419288 (MusikAnimal) I think we should offer this for all tools, so this "Export" function should be extracted out so that you can give it any blob of structured data and...
[18:17:07] Tool-Labs-tools-Xtools, Community-Tech: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3422013 (Cyberpower678) Ugh. I need to refine my Herald script.
[18:21:03] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Investigation: XTools routing - https://phabricator.wikimedia.org/T163283#3422033 (kaldari) @MusikAnimal: Does this implementation give us the stability improvements we were hoping for?
[18:23:36] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Set up load balancing for new XTools - https://phabricator.wikimedia.org/T169590#3422038 (MusikAnimal) So I talked to @bd808, and while some form of load balancing is possible, it's not particularly easy and we don't even know if we need it yet. If you check...
[18:24:11] Cloud-Services, wikitech.wikimedia.org, Phabricator, LDAP, and 2 others: Blocking an account on wikitech should disable LDAP logins - https://phabricator.wikimedia.org/T168692#3422040 (zhuyifei1999) See also https://wikitech.wikimedia.org/wiki/Help:Disabling_an_account, which documents the afaict...
[18:25:20] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[18:26:18] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Planning for Xtools beta - https://phabricator.wikimedia.org/T167217#3422059 (MusikAnimal) There's a new prod server in town that needs setting up. I can try but I was hoping @Samwilson would be willing to do it since he set up the existing prod server :) Se...
[18:26:20] PROBLEM - Puppet errors on tools-exec-1407 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[18:27:48] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Investigation: XTools routing - https://phabricator.wikimedia.org/T163283#3422066 (MusikAnimal) >>! In T163283#3422033, @kaldari wrote: > @MusikAnimal: Does this implementation give us the stability improvements we were hoping for? TBD! We need to do some l...
[18:28:52] Tool-Labs-tools-Xtools, Community-Tech: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3422074 (MusikAnimal) >>! In T170098#3422013, @Cyberpower678 wrote: > Ugh. I need to refine my Herald script. Yes please do :)
[18:29:01] Tool-Labs-tools-Xtools, Community-Tech: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3422076 (MusikAnimal) a:Cyberpower678>None
[18:29:23] Tool-Labs-tools-Xtools, Community-Tech: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3422079 (MusikAnimal)
[18:29:31] Tool-Labs-tools-Xtools, Community-Tech: Export Page History top-editors as wikitext table - https://phabricator.wikimedia.org/T170098#3419288 (MusikAnimal) a:Cyberpower678>None
[18:32:36] PROBLEM - Puppet errors on tools-exec-1406 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:36:31] Toolforge, LDAP, Release-Engineering-Team (Kanban): Archive user account purodha - https://phabricator.wikimedia.org/T152857#3422112 (demon) Open>Resolved a:demon I think we did everything needed here.
[18:45:34] Tool-Labs-tools-Xtools, Community-Tech: EditCounter: Load autoedits asynchronously from AutoEditsController - https://phabricator.wikimedia.org/T170185#3422164 (MusikAnimal)
[18:55:21] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:57:55] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3422210 (MusikAnimal) >>! In T163284#3419406, @kaldari wrote: > Hmm, tried loading the new XTools again and this time it was slower :( > http://xtools.wmflabs.org/ec/en.wikiped...
[19:01:21] RECOVERY - Puppet errors on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:02:43] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Give visual feedback while XTools is thinking - https://phabricator.wikimedia.org/T169831#3422247 (MusikAnimal)
[19:07:36] RECOVERY - Puppet errors on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:11:39] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3422298 (MusikAnimal) **Even more ideas:** Symfony has an awesome built-in profiler that you can use in the dev environment, showing all queries and the run time. One thing it...
[19:16:56] RainbowSprinkles: hmmm.. I have not seen any 502s from Striker before. Did you notice at what particular point it happened in the workflow? On your way out to meta for the OAuth grant or the way back or ??
[19:18:20] zhuyifei1999_: there is something deeper than the None problem going on. I haven't instrumented anything, but it acts like reading the accounting files is taking forever.
[19:18:38] bd808: could it be the same NFS issue other users have noticed via mounts on workers?
[19:18:53] Clicked the button to attach, URL was https://toolsadmin.wikimedia.org/profile/settings/phabricator/attach -- I'll upload a screenshot in a moment
[19:19:08] hmm that's unexpected
[19:19:09] Just tried again, this time got the generic "Our servers are experiencing a problem" that you see from Varnish
[19:19:15] I might check later
[19:19:16] chasemp: oh... has that been a problem? It is certainly reading from NFS shares
[19:19:52] bd808: https://phabricator.wikimedia.org/F8694335
[19:20:28] bd808: I'm wondering if it's https://phabricator.wikimedia.org/T166949
[19:20:30] or related
[19:21:11] RainbowSprinkles: interesting. do you mind opening a bug? I can reattach my phab account just fine so it's at least not a universal failure
[19:21:27] Will do
[19:21:58] I like how the 502 has a license notice
[19:22:08] see this 502'd code? it's available for you to use
[19:22:13] or fix!!!!
[19:22:35] chasemp: yeah it could be something similar to that. I noticed it right after the last rolling restart of our grid.
[19:24:02] Striker: 502 Gateway Failure when trying to associate Phabricator account - https://phabricator.wikimedia.org/T170189#3422344 (demon)
[19:24:31] zhuyifei1999_: it's got to be file i/o time. The uwsgi log has lines like "GET /grid-jobs/?purge => generated 83797 bytes in 49443413 msecs" which is a crazy long time to look at the last 30 days of accounting data
[19:24:49] Striker: 502 Bad Gateway when trying to associate Phabricator account - https://phabricator.wikimedia.org/T170189#3422357 (demon)
[19:24:51] bd808: Not super urgent for me, but there ya go ^
[19:25:10] thanks RainbowSprinkles. I'll look for error logs after I eat some foodz
[19:25:28] 13.73 hours o.O
[19:25:35] Foodz sounds like a good plan
[19:31:12] PROBLEM - Puppet errors on tools-puppetmaster-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[19:39:08] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:43:43] J-Mo: sorry, was away this morning, let me poke
[19:43:59] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3570 bytes in 0.012 second response time
[19:44:01] np thanks madhuvishy!
[19:48:35] J-Mo: try now?
[19:49:19] madhuvishy: it works! Is there anything I can do to prevent this from happening in future?
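The "13.73 hours" reaction follows from the uwsgi log line bd808 quoted: 49443413 ms / 1000 / 3600 ≈ 13.73 h. The Python sketch below pulls that duration out of such a line. Its regular expression is an assumption that only matches the "generated N bytes in M msecs" fragment quoted in the log. It is not a parser for uWSGI's full request-log format.

```python
import re

# Assumed pattern, based only on the fragment quoted in the log:
#   "GET /grid-jobs/?purge => generated 83797 bytes in 49443413 msecs"
REQUEST_RE = re.compile(
    r"(?P<method>[A-Z]+) (?P<path>\S+) => "
    r"generated (?P<size>\d+) bytes in (?P<msecs>\d+) msecs"
)

MS_PER_HOUR = 1000 * 60 * 60


def request_hours(line):
    """Return the request duration in hours, or None if the line doesn't match."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    return int(match.group("msecs")) / MS_PER_HOUR


if __name__ == "__main__":
    line = "GET /grid-jobs/?purge => generated 83797 bytes in 49443413 msecs"
    print(f"{request_hours(line):.2f} hours")  # 13.73 hours
```

Scanning a whole log with a helper like this, then sorting by duration, would be one quick way to see whether every `?purge` request is this slow or only some of them.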
[19:49:22] thank you :)
[19:50:34] J-Mo: it's not you :) When we did some reboots last week a few pods got stuck in a weird state
[19:50:44] cool, that makes sense
[19:50:52] I just restarted your pod and things got okay
[19:51:11] RECOVERY - Puppet errors on tools-puppetmaster-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:58:37] Tool-Labs-tools-Xtools, Community-Tech: EditCounter's "pages created" does not match Pages tool - https://phabricator.wikimedia.org/T169955#3422523 (MusikAnimal)
[20:27:58] Data-Services, Toolforge, cloud-services-team (Kanban), Wikimedia-Incident: 2017-07-02 Toolforge data loss for permissive data - https://phabricator.wikimedia.org/T169774#3422627 (Peachey88)
[20:33:31] chasemp: it would be nice if that task with the LDAP user could get resolved quickly, since that is currently blocking my work in some point :/
[20:34:01] Sagan: I can't get to it today, but I think bd808 may have time this week
[20:34:20] chasemp: that's nice, thanks :)
[20:40:22] Sagan: that's the "git" user?
[20:40:46] bd808: yup
[20:42:02] *nod* I have a small pile of LDAP cleanup tasks so I'll try to do them all together in the next couple of days. Feel free to yell at me if you don't hear something by Thursday
[20:44:30] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:45:43] bd808: ok, thanks :)
[21:33:41] (PS5) Paladox: Gerrit: Add gerrit pub key for ssh [labs/private] - https://gerrit.wikimedia.org/r/363755
[21:34:29] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:50:27] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:55:25] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Set up load balancing for new XTools - https://phabricator.wikimedia.org/T169590#3422977 (bd808) In the current #cloud-vps environment, the only load balancing that we can do is [[https://en.wikipedia.org/wiki/OSI_model#Layer_7:_Application_Layer|layer 7]],...
[21:55:56] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Set up load balancing for new XTools - https://phabricator.wikimedia.org/T169590#3422979 (bd808) Also the 3 story point estimate here is crazy. ;)
[22:10:04] It's a bummer that you can't easily change the size of instances :/
[22:25:04] just in case I should need it one day: is there a way to get more space on an instance?
[22:25:41] Sagan: there is the role that will mount your extra instance disk quota
[22:26:03] bd808: ah, ok. that makes planning easier :)
[22:26:14] it's useless on a small instance, but the rest get a decent chunk of disk from it
[22:26:37] I guess that's 20 for small, 40 for medium etc...?
[22:28:17] minus the base disk size
[22:28:36] ah, ok
[22:28:49] yeah, that's a lot for big ones
[22:29:02] I'm not sure I will need it, I will just ask again then, when needed :)
[22:29:09] the root partition on all of the VM flavors is the same size
[22:29:22] and the rest of the quota can be easily mounted at /srv
[22:29:47] but for a small there isn't a lot left as I recall, maybe 5G?
[22:30:11] actually no, more like 1.5G
[22:30:41] the base is ~18G formatted
[22:31:01] not my problem, I don't have small :)
[22:31:22] bd808: concerning another instance: I guess there is no other way to make an instance smaller than recreating it?
[22:31:42] correct. no resizing up or down
[22:32:09] and I guess a feature like that will not get added in the next OpenStack version?
[22:32:29] someday™ we hope to have storage that is not tied to the instances themselves (and not NFS)
[22:32:44] since that would be nice for me: I have about two instances which can get smaller, but I currently don't have the time to migrate them
[22:33:15] that's really the thing. the way we store the data it's not easy to change things. There can be OS level problems too inside the guest VM
[22:33:49] "oops the last time I booted I had more ram and cores, I'm going to freak out now"
[22:33:49] hm, ok
[22:34:14] at least one of them is easy to migrate, so I guess I will some day ;)
[22:34:34] the ideal thing is to figure out how to have all your setup in Puppet (not trivial today I know)
[22:34:56] then you just need to migrate data from instance to instance
[22:34:59] bd808: but I guess there is no urgent need of free space/ram/etc currently? :)
[22:35:05] so that I have some time? :)
[22:35:12] yes :)
[22:35:21] we just added 2 new servers today actually
[22:35:26] wow, that's nice
[22:35:33] so we have some space for a few months
[22:35:49] I guess the only thing that is rare are the floating IPs?
[22:36:06] we have folks lined up to use the space already though too
[22:36:31] I'm not sure how many public ips we have open right now, but yes that pool is harder to grow
[22:36:50] Cloud-Services, Toolforge, wikimedia-irc-freenode: Freenode sometimes throttles bot connections from tools - https://phabricator.wikimedia.org/T151704#3423115 (Luke081515) stalled>Open
[22:36:57] I guess we can make some IPs free again ^
[22:37:22] (I just noticed that this was still stalled, but we can continue there)
[22:37:30] ident on grid engine nodes might be tricky...
[22:37:42] ident that is open to freenode
[22:37:52] hm
[22:37:58] I think we pretty much need fixed ips to do that
[22:38:06] so net no change?
[22:39:04] hm, depends on. in case that that is not possible, I guess staffers can take a look again
[22:39:36] I'm not sure how we could give them ident to hit on the mapped SNAT that would trace back to the tool account
[22:40:14] but... we may get ipv6 in the next months (number of months deliberately undefined)
[22:40:26] oh, that would be nice :)
[22:41:15] bd808: or, as an alternative idea, is it possible to set a filter so that ircbots get executed on only one or two exec nodes? then they can get an ident with floating IP, and others won't need one
[22:41:52] that could be done, yes. but then we'd have to get everyone to use the config
[22:42:33] I think we have plenty of ips for the exec nodes, we just sometimes forget to add them when bringing up a node
[22:44:27] bd808: IIRC the limit for normal connections is 30 per IP for freenode, so if not everyone uses the config, it might be successful too
[22:45:00] Tools: File uploads need to recognise urls on blacklist early in the process - https://phabricator.wikimedia.org/T157436#3423145 (matmarex) I'm not sure if Flickr2Commons issues are tracked here on Phabricator at all… you probably should comment on https://meta.wikimedia.org/wiki/Talk:Flickr2commons.
[22:45:28] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:46:41] Striker, cloud-services-team (Kanban), Patch-For-Review, User-bd808: Error saving OAuth credentials. [req id: f1a2370b1b8a4e1a8827de96b9bce144] bug - https://phabricator.wikimedia.org/T164847#3423151 (bd808) Lots of this in the logs today: `_mysql_exceptions.IntegrityError: (1062, "Duplicate entr...
[22:54:11] Striker: 502 Bad Gateway when trying to associate Phabricator account - https://phabricator.wikimedia.org/T170189#3423165 (bd808) 😂 (\U0001f602) is making the server cry tears of sadness. I think this is duplicate of {T164034}. ``` [2017-07-10T18:04:04] --- Logging error --- [2017-07-10T18:04:04] Traceback...
[23:42:52] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Optimize edit count queries in XTools - https://phabricator.wikimedia.org/T163284#3423354 (Samwilson) I have a half-completed Symfony profiler data collector for addwiki/mediawiki-api-base calls, but I couldn't get it working properly. I'll try harder, and p...
[23:43:19] (PS1) BryanDavis: [NO NOT MERGE] Add view to show active encoding [labs/striker] - https://gerrit.wikimedia.org/r/364342 (https://phabricator.wikimedia.org/T164034)
[23:43:40] (PS2) BryanDavis: [NO NOT MERGE] Add view to show active encoding [labs/striker] - https://gerrit.wikimedia.org/r/364342 (https://phabricator.wikimedia.org/T164034)
[23:44:35] (CR) jerkins-bot: [V: -1] [NO NOT MERGE] Add view to show active encoding [labs/striker] - https://gerrit.wikimedia.org/r/364342 (https://phabricator.wikimedia.org/T164034) (owner: BryanDavis)
[23:45:42] (PS3) BryanDavis: [DO NOT MERGE] Add view to show active encoding [labs/striker] - https://gerrit.wikimedia.org/r/364342 (https://phabricator.wikimedia.org/T164034)
[23:57:09] !log striker Trying to debug with a cherry-pick of https://gerrit.wikimedia.org/r/#/c/364342/
[23:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Striker/SAL
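The 502 traced to 😂 (U+1F602) triggering a "Logging error", and the debug change above adds a view to show the active encoding. The following is a hypothetical Python sketch of the information such a view might report. It is not the code in change 364342. It assumes the failure comes from a text stream whose codec (for example ASCII under a C/POSIX locale) cannot represent that character. `active_encodings` and `can_encode` are illustrative names only.

```python
import locale
import sys


def active_encodings():
    """Collect the encodings this Python process uses for text I/O.

    Hypothetical helper: the keys are illustrative, not Striker's.
    """
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
        # sys.stdout/sys.stderr may be None (e.g. some daemonized setups).
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
    }


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for name, enc in active_encodings().items():
        print(f"{name}: {enc}")
    emoji = "\U0001f602"
    print(can_encode(emoji, "ascii"))  # False
    print(can_encode(emoji, "utf-8"))  # True
```

If the reported stream or preferred encoding turns out to be ASCII, any log record containing the emoji would fail to encode, which would be consistent with the traceback bd808 quoted. That remains an assumption until the cherry-picked view confirms it.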