[00:09:37] Labs, Tool-Labs, Discovery, Discovery-Search-Backlog, Elasticsearch: Tool Labs elasticsearch cluster broken by production Puppet changes - https://phabricator.wikimedia.org/T131906#2182531 (bd808)
[00:14:45] Labs, Tool-Labs, Discovery, Discovery-Search-Backlog, and 2 others: Tool Labs elasticsearch cluster broken by production Puppet changes - https://phabricator.wikimedia.org/T131906#2182538 (bd808) a:Gehel This has been temporarily hacked around with this local patch on tools-puppetmaster-01: ```...
[00:34:43] !log tools.luke081515bot deploying CTT-Bot fix
[00:34:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.luke081515bot/SAL, Master
[01:17:32] !log tools.luke081515bot deploying the 2 latest diffs so that the bot is using the newest version of the SEO-Detector
[01:17:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.luke081515bot/SAL, Master
[01:38:31] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Metalindustrien was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=418688 edit summary:
[02:13:46] PROBLEM - Puppet staleness on tools-pastion-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[02:44:08] Change on www.mediawiki.org a page Wikimedia Labs was modified, changed by 122.139.132.200 link https://www.mediawiki.org/w/index.php?diff=2092908 edit summary: [-130]
[02:46:49] Change on www.mediawiki.org a page Wikimedia Labs was modified, changed by BDavis (WMF) link https://www.mediawiki.org/w/index.php?diff=2092910 edit summary: [+130] Reverted edits by [[Special:Contributions/122.139.132.200|122.139.132.200]] ([[User talk:122.139.132.200|talk]]) to last revision by [[User:Whatamidoing (WMF)|Whatamidoing (WMF)]]
[03:15:13] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:17:43] wfm
[03:20:12] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 812092 bytes in 6.035 second response time
[06:43:13] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:44:51] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:49:59] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:55:55] Labs, Tool-Labs: Virtualenvwrapper script does not exist - https://phabricator.wikimedia.org/T131898#2182741 (valhallasw) Open>Invalid Debian places it in `/usr/share/virtualenvwrapper/virtualenvwrapper.sh`, but you don't need to load it explicitly -- `bash_completion.d` loads `virtualenvwrapper_laz...
[07:18:14] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:25:02] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:11:22] (CR) Lokal Profil: [C: 1] "While this looks sane to me I don't know enough about which role setup.cfg plays in relation to tox.ini to be able to fully understand wha" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281810 (owner: Jean-Frédéric)
[08:13:36] (CR) Lokal Profil: [C: 2] Fix import of pywikibot bits in remaining erfgoed scripts [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281809 (owner: Jean-Frédéric)
[08:14:20] (Merged) jenkins-bot: Fix import of pywikibot bits in remaining erfgoed scripts [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281809 (owner: Jean-Frédéric)
[08:18:08] (CR) Lokal Profil: Add unit tests for ucfirst method (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281811 (owner: Jean-Frédéric)
[08:21:33] (CR) Lokal Profil: [C: 2] Extract method extract_elements_from_template_param from update_database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281812 (owner: Jean-Frédéric)
[08:33:27] Labs, Tool-Labs, Patch-For-Review: Puppet fails on tools-elastic-01, tools-elastic-02 and tools-elastic-03: "Class[Nginx] is already declared" - https://phabricator.wikimedia.org/T131644#2182816 (Gehel) Open>Resolved
[08:34:40] Labs, Tool-Labs, Patch-For-Review: Puppet fails on tools-elastic-01, tools-elastic-02 and tools-elastic-03: "Class[Nginx] is already declared" - https://phabricator.wikimedia.org/T131644#2173694 (Gehel) Change https://gerrit.wikimedia.org/r/#/c/281824/ has been merged, which should correct this issue....
[08:34:52] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[08:36:56] bd808: I fixed my mess on the elasticsearch side. Do you want me to also cleanup the local patch to tools-puppetmaster-01 ?
[08:40:10] bd808: actually, I do not have access to tools. I'll let you remove the local patch...
[08:40:52] Labs, Tool-Labs, Discovery, Discovery-Search-Backlog, and 3 others: Tool Labs elasticsearch cluster broken by production Puppet changes - https://phabricator.wikimedia.org/T131906#2182831 (Gehel) a:Gehel>bd808
[08:41:42] Labs, Tool-Labs, Discovery, Discovery-Search-Backlog, and 3 others: Tool Labs elasticsearch cluster broken by production Puppet changes - https://phabricator.wikimedia.org/T131906#2182833 (Gehel) As I don't have access to tools-puppetmaster-01, I'll let you do the cleanup (access requested, we'll s...
[08:43:07] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Gehel was created, changed by Gehel link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Gehel edit summary: Created page with "{{Tools Access Request |Justification=Making sure that changes I make on elasticsearch puppet module do not break things in tools (or other similar changes, or help out when n..."
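The Puppet alerts in this log ("55.56% of data above the critical threshold [43200.0]" and the "Puppet run" variants with threshold [0.0]) report what share of recent datapoints exceeded a threshold; 43200 seconds is 12 hours. A minimal sketch of that arithmetic, assuming the staleness metric is seconds since the last Puppet run and that the check looks at a fixed window of samples. The function names, the 9-sample window and the 50% cutoff are illustrative assumptions, not the real check's configuration; a 9-sample window simply happens to reproduce the 55.56%, 44.44% and 22.22% figures (5, 4 and 2 of 9), while 60.00% would fit 3 of 5.

```python
def percent_above(datapoints, threshold):
    """Share of non-null datapoints strictly above threshold, as a percentage."""
    values = [v for v in datapoints if v is not None]  # gaps in the series are skipped
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check(datapoints, threshold, critical_percent):
    # critical_percent is a hypothetical knob: how much of the window must be
    # over the threshold before the state turns CRITICAL.
    pct = percent_above(datapoints, threshold)
    state = "CRITICAL" if pct >= critical_percent else "OK"
    return state, round(pct, 2)


# Hypothetical window of 9 "seconds since last Puppet run" samples; 5 exceed 12 h.
staleness = [3600, 7200, 10800, 14400, 50000, 53600, 57200, 60800, 64400]
print(check(staleness, threshold=43200.0, critical_percent=50))
# -> ('CRITICAL', 55.56)
```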
[10:06:27] (CR) Jean-Frédéric: [C: 2] "Thanks for the review Lokal_Profil :)" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281810 (owner: Jean-Frédéric)
[10:06:56] (CR) Jean-Frédéric: "recheck" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281380 (https://phabricator.wikimedia.org/T131739) (owner: Lokal Profil)
[10:17:04] (Merged) jenkins-bot: Enable cover-inclusive for erfgoedbot unitests [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281810 (owner: Jean-Frédéric)
[10:17:30] (CR) Jean-Frédéric: [C: 2] Make urls protocol neutral [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281380 (https://phabricator.wikimedia.org/T131739) (owner: Lokal Profil)
[10:18:25] (Merged) jenkins-bot: Make urls protocol neutral [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281380 (https://phabricator.wikimedia.org/T131739) (owner: Lokal Profil)
[11:03:44] Labs, Tool-Labs: Virtualenvwrapper script does not exist - https://phabricator.wikimedia.org/T131898#2183113 (tom29739) Ah, thanks.
[11:40:14] Labs, Labs-Infrastructure, Operations: labnet1002 can't talk to webproxy.eqiad.wmnet:8080, puppet fails to install designateclient - https://phabricator.wikimedia.org/T129623#2183182 (faidon) p:Normal>High Hey — puppet hasn't been running properly on labnet1002 with the above failure for almost a...
[12:07:03] (PS1) Lokal Profil: Standardise SQL formatting [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281924
[13:05:20] (PS1) Lokal Profil: [NOT TESTED] Re-implement checks to not use globals [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281930 (https://phabricator.wikimedia.org/T39422)
[13:05:52] (CR) jenkins-bot: [V: -1] [NOT TESTED] Re-implement checks to not use globals [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281930 (https://phabricator.wikimedia.org/T39422) (owner: Lokal Profil)
[13:09:13] (PS2) Lokal Profil: [NOT TESTED] Re-implement checks to not use globals [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281930 (https://phabricator.wikimedia.org/T39422)
[13:19:10] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Gehel was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=419662 edit summary:
[13:31:23] ^ thanks!
[13:36:03] !log puppet deleting puppet-certchange instance
[13:36:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet/SAL, dummy
[13:40:50] Hi, anyone around from labs that could explain to me how to find/map a user/group that makes queries with users of the form p12345g12345?
[14:01:05] volans, mysql users of that form?
[14:01:15] yes Krenair
[14:02:22] I'm only familiar with users starting with u and s
[14:02:42] Labs, Labs-Infrastructure, Operations: labnet1002 can't talk to webproxy.eqiad.wmnet:8080, puppet fails to install designateclient - https://phabricator.wikimedia.org/T129623#2183406 (chasemp) Here is what I believe is happening. Labnet1001 is the inactive node at the moment and has an IPv6 address:...
[14:05:04] not my lucky day :)
[14:10:43] Labs, Tool-Labs: Setup monitoring for kubernetes core components. - https://phabricator.wikimedia.org/T131929#2183412 (chasemp)
[14:10:52] Labs, Tool-Labs: Goal: Allow using k8s instead of GridEngine as a backend for webservices (Tracking) - https://phabricator.wikimedia.org/T129309#2101957 (chasemp)
[14:10:54] Labs, Tool-Labs, Patch-For-Review, Scap3 (Scap3-Adoption-Phase1): Setup a proper deployment strategy for Kubernetes - https://phabricator.wikimedia.org/T129311#2183427 (chasemp)
[14:10:56] Labs, Tool-Labs, Patch-For-Review: Upgrade to Kubernetes 1.2 - https://phabricator.wikimedia.org/T130972#2183425 (chasemp) Open>Resolved https://phabricator.wikimedia.org/T131929
[14:11:55] Labs, Tool-Labs, Patch-For-Review: Install rmytop - https://phabricator.wikimedia.org/T58999#2183430 (chasemp) Open>Resolved a:chasemp
[14:19:49] Labs, Labs-Infrastructure, Operations: labnet1002 can't talk to webproxy.eqiad.wmnet:8080, puppet fails to install designateclient - https://phabricator.wikimedia.org/T129623#2183482 (faidon) squid on carbon over IPv4 works fine — we'd have a lot more failures if that wasn't the case (you can verify th...
[14:26:25] Labs, Labs-Infrastructure, Operations: labnet1002 can't talk to webproxy.eqiad.wmnet:8080, puppet fails to install designateclient - https://phabricator.wikimedia.org/T129623#2183484 (chasemp) >>! In T129623#2183482, @faidon wrote: > squid on carbon over IPv4 works fine — we'd have a lot more failures...
[14:47:05] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: BaGLAMa2 has very long running queries - https://phabricator.wikimedia.org/T131933#2183504 (Volans)
[14:47:22] Labs, Tool-Labs, Operations, Traffic, Zero: Tool labs tools should have a method of identifying Zero traffic - https://phabricator.wikimedia.org/T131934#2183516 (zhuyifei1999)
[14:47:49] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: BaGLAMa2 has very long running queries - https://phabricator.wikimedia.org/T131933#2183533 (Volans)
[14:47:51] Labs, Tool-Labs, DBA, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2183532 (Volans)
[14:53:16] Labs, Tool-Labs, Operations, Traffic, Zero: Tool labs tools should have a method of identifying Zero traffic - https://phabricator.wikimedia.org/T131934#2183516 (valhallasw) Does Wikipedia Zero include non-wikipedia domains? I would expect tools.wmflabs.org to fall out of scope.
[14:59:05] Labs, Tool-Labs, Zero: Tool labs tools should have a method of identifying Zero traffic - https://phabricator.wikimedia.org/T131934#2183536 (BBlack) Well the whitelists are by network range, not by hostname. Still, I wouldn't expect the public IPs for labs-y things to be whitelisted. I think the ques...
[15:12:25] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183587 (Volans)
[15:12:41] Labs, Tool-Labs, DBA, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2183601 (Volans)
[15:12:43] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183602 (Volans)
[15:16:42] volans, did you figure it out?
[15:17:22] Krenair: not yet... and I was also busy with tool labs and you can see ^^ :(
[15:17:53] the ones I was referring before is in labs db instead
[15:20:16] !log tools Removed local hack for T131906 from tools-puppetmaster-01
[15:20:17] T131906: Tool Labs elasticsearch cluster broken by production Puppet changes - https://phabricator.wikimedia.org/T131906
[15:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[15:23:04] Labs, Tool-Labs, Discovery, Discovery-Search-Backlog, and 3 others: Tool Labs elasticsearch cluster broken by production Puppet changes - https://phabricator.wikimedia.org/T131906#2183642 (bd808) Open>Resolved a:bd808>Gehel
[15:23:47] bd808: ^ thanks! And sorry for the trouble!
[15:24:13] gehel: no worries. we got it fixed pretty quickly
[15:24:32] there are a lot of ways that Puppet can cause problems :)
[15:24:43] PROBLEM - Puppet run on tools-k8s-etcd-03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:25:05] * gehel has broken Puppet things more often than can be remembered...
[15:26:25] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183650 (Cyberpower678) The queries shouldn't be long running though. As a matter of fact the queries take less than a second to execute. But...
[15:30:04] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183660 (Cyberpower678) I'm adding Ryan to this task, since it is IABot that's contributing to the load.
[15:34:20] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183676 (Volans) The problem are not single queries but the high number of them and the concurrency, let me gather some additional statistic to...
[15:39:04] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183696 (Volans) Most metrics started to increase around 1am~2am UTC on March 31st, did you change anything around that time in the tool? Given...
[15:40:16] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183699 (Cyberpower678) >>! In T131937#2183676, @Volans wrote: > The problem are not single queries but the high number of them and the concurre...
[15:43:06] Labs, Tool-Labs, labs-sprint-116, labs-sprint-117, and 2 others: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1681701 (Andrew) Creating a big group that contains every tool would not be difficult. Before we go further on this though I'd like to have a security revi...
[15:43:27] damn....
[15:43:31] security revi....
[15:43:48] revi: sorry :)
[15:44:01] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183711 (Cyberpower678) >>! In T131937#2183696, @Volans wrote: > Most metrics started to increase around 1am~2am UTC on March 31st, did you chan...
[15:44:10] well I usually suffer from this lol
[15:44:22] at CVN revert message cut at 'revi...'
[15:44:36] #mediawiki-feed same as here
[15:44:42] lol :D
[15:57:43] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183804 (Volans) Here some 5 minutes stats related to the whole DB that changed when applying the throttle: ``` Before After ~360k ~230k user q...
[16:04:00] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183867 (Cyberpower678) >>! In T131937#2183804, @Volans wrote: > Here some 5 minutes stats related to the whole DB that changed when applying th...
[16:31:10] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183587 (Legoktm) >>! In T131937#2183699, @Cyberpower678 wrote: > The problem is worker is a separate bot. They don't know what the other is do...
[16:38:46] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183992 (Cyberpower678) >>! In T131937#2183955, @Legoktm wrote: >>>! In T131937#2183699, @Cyberpower678 wrote: >> The problem is worker is a sep...
[16:46:07] is anyone else getting NXDOMAIN error for nslookup datans.wmflabs.org
[16:46:16] YuviPanda, ^
[16:46:34] webproxy to my datans.maps-team is not working :(
[16:48:32] yurik: is it listed in your proxy panel in horizon?
[16:48:41] valhallasw`cloud, it is
[16:48:50] andrewbogott, ^
[16:49:12] yurik: I will look in a moment...
[16:49:20] but for some reason none of the new wmflabs.org is resolving for me now
[16:49:24] thanks!
[16:50:15] yurik: can you please run dig @labs-ns0.wikimedia.org datans.wmflabs.org and also dig @labs-ns2.wikimedia.org datans.wmflabs.org
[16:50:17] and tell me what you get?
[16:51:19] yurik@steppenwolf:~/wmf/kartotherian/deploy-kartotherian$ dig @labs-ns0.wikimedia.org
[16:51:20] ; <<>> DiG 9.9.5-11ubuntu1.3-Ubuntu <<>> @labs-ns0.wikimedia.org
[16:51:20] ; (1 server found)
[16:51:20] ;; global options: +cmd
[16:51:20] ;; Got answer:
[16:51:21] ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60643
[16:51:22] ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
[16:51:24] ;; WARNING: recursion requested but not available
[16:51:26] ;; OPT PSEUDOSECTION:
[16:51:28] ; EDNS: version: 0, flags:; udp: 2800
[16:51:30] ;; QUESTION SECTION:
[16:51:32] ;. IN NS
[16:51:36] ;; Query time: 114 msec
[16:51:38] ;; SERVER: 208.80.155.117#53(208.80.155.117)
[16:51:40] ;; WHEN: Wed Apr 06 19:50:38 IDT 2016
[16:51:42] ;; MSG SIZE rcvd: 29
[16:51:47] !paste
[16:51:47] https://phabricator.wikimedia.org/paste/
[16:52:36] sorry :(
[16:52:59] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2184035 (Cyberpower678) I have reduced the workers down to 2 for now.
[16:53:08] oh, and woops, i didn't do the full request, one sec
[16:53:14] yurik: I don't think you actually...
[16:53:15] yeah :)
[16:54:35] andrewbogott, https://phabricator.wikimedia.org/P2870
[16:55:06] yurik: so, that's the second one...
[16:55:34] ?
[16:56:07] andrewbogott, i just updated the paste - it's not resolving for me for some reason
[16:56:09] I asked you to run two different digs, to compare. You only pasted one of them, right?
[16:56:29] i should have read more accurately :(
[16:56:30] sec
[16:57:04] andrewbogott, updated
[16:57:33] yeah, so our resolvers are working but you're not getting to them...
[16:57:38] where are you sitting right now?
[16:57:46] spb - ru
[16:58:13] st. petersburg
[16:58:26] can you resolve .wikimedia.org things?
[16:58:50] yes
[16:59:11] i just tried zero.wikimedia.org - works
[16:59:30] some of the older *.wmflabs.org are working fine
[16:59:46] the datans, data, and data2 were just created
[16:59:58] and i used the new horizon interface to instantiate them
[17:01:33] (which i never tried before)
[17:02:36] what ip do you see for LABS-NS0.WIKIMEDIA.ORG and LABS-NS2.WIKIMEDIA.ORG?
[17:02:41] (oops, sorry about allcaps)
[17:04:23] andrewbogott, 208.80.155.117 for both
[17:06:33] yurik: can you do one more dig for me? This time just "dig datans.wmflabs.org"
[17:08:14] andrewbogott, both work now!
[17:08:23] datans and data -> 208.80.155.156
[17:08:24] ok...
[17:08:30] did you fix it?
[17:08:34] nope
[17:08:34] or is it just magic?
[17:08:37] grr
[17:08:38] hate those
[17:08:57] some weird dns repl issue?
[17:08:59] Well, is there a chance you tried to resolve datans.wmflabs.org before you created the entry on horizon?
[17:09:11] If you did that then the NXDOMAIN could have been cached for a while
[17:09:47] andrewbogott, i am sure i created data2 proxy before pinging it, but then i pinged it about 30 seconds late
[17:09:49] later
[17:09:53] could that be not enough?
[17:10:05] Seems unlikely, but possible.
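andrewbogott's theory rests on negative caching: a resolver that has been told NXDOMAIN keeps that answer for a while. Under RFC 2308, the negative-cache lifetime is the smaller of the SOA record's own TTL and its MINIMUM field, taken from the SOA returned in the authority section of the negative response. A minimal sketch of that rule; the SOA values and the timestamp below are hypothetical, not the real wmflabs.org settings.

```python
from datetime import datetime, timedelta


def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: a negative answer is cached for min(SOA TTL, SOA MINIMUM) seconds."""
    return min(soa_ttl, soa_minimum)


def nxdomain_expires(first_lookup: datetime, soa_ttl: int, soa_minimum: int) -> datetime:
    """When a resolver that cached NXDOMAIN at first_lookup will ask again."""
    return first_lookup + timedelta(seconds=negative_ttl(soa_ttl, soa_minimum))


# Hypothetical zone values: SOA TTL 3600 s, MINIMUM 300 s.
looked_up = datetime(2016, 4, 6, 16, 45, 0)  # name queried before the proxy existed
print(nxdomain_expires(looked_up, soa_ttl=3600, soa_minimum=300))
# -> 2016-04-06 16:50:00
```

Recursive resolvers may also cap this lifetime on their own (BIND's `max-ncache-ttl` option, for example), so the effective delay can be shorter than the zone's SOA suggests.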
[17:10:17] It's also possible that some week-old dns change only just now filtered down to your corner of the internet
[17:10:27] although I'd expect to have heard about that by now
[17:10:36] Anyway — let me know if you see the same delay next time you do this :)
[17:10:48] beh. ok, thanks for looking!
[17:10:57] seems to work now :)
[17:21:11] Labs, Tool-Labs, DBA, Tracking: Disabling general.confirmeduser from dbreports for using up too much db resources - https://phabricator.wikimedia.org/T131956#2184155 (yuvipanda)
[17:24:33] Labs, Tool-Labs, DBA, Tracking: Disabling general.confirmeduser from dbreports for using up too much db resources - https://phabricator.wikimedia.org/T131956#2184155 (Volans) The very-long running query was: ``` | 21844654 | p50380g50440 | 10.68.xx.xx:54190 | enwiki_p | Execu...
[18:07:03] Labs, Tool-Labs: Setup monitoring for kubernetes core components. - https://phabricator.wikimedia.org/T131929#2184288 (yuvipanda) Minimum required is just to check: 1. All the processes that are running are running 2. All the things that should be marked as ready are marked as ready Not fully sure how t...
[18:07:11] godog: ^ monitoring in k8s :)
[18:17:38] Andrewbogott, Yuvipanda: I guess you know that? https://phabricator.wikimedia.org/T131375#2183636
[18:17:59] no...
[18:18:50] Traffic between labs instances and the internal production WMF subnet is blocked, on purpose.
[18:19:18] Traffic between labs instances and public WMF services, not blocked.
[18:19:40] But most production servers block ssh from most places, as they should.
[18:24:10] YuviPanda: sweet! I added you to a few prometheus code reviews btw
[18:24:44] godog: yeah, I need to dig in sometime. so much time...
[18:25:00] andrewbogott: on that comment he means to git-ssh.wm.o I imagine
[18:25:22] which is one of those rare exceptions that craps on all things
[18:25:27] i.e. 22 allowed
[18:25:53] andrewbogott: is wikitech now on hhvm ?
[18:25:57] well… I can't think of why that wouldn't just be fully public then. Routing to a public IP should mostly work
[18:26:04] matanya: Nope
[18:26:40] there is a port 22 all inclusive block high up the acl chain
[18:26:52] ah
[18:26:59] well, that's both reasonable and problematic
[18:27:29] I'm making sure I understand how nat is working before I add a change
[18:27:36] I think I'm about there just verifying what I think I know in case
[18:28:27] but I can't help but think we've gotten this far with gerrit as-is
[18:28:39] maybe web based clone enforce isn't the worst idea from labs I'm not sure
[18:33:18] matanya, wikitech is running on a machine with php 5.5 I think
[18:41:10] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2184403 (Volans) For reference, after the above change the same stats of few posts above went to: ``` ~180k user queries ~5k connections ~...
[18:51:21] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2184434 (Cyberpower678) >>! In T131937#2184403, @Volans wrote: > For reference, after the above change the same stats of few posts above went to...
[18:56:33] YuviPanda: Howdy, can you help me figure out what is going on here? https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Wma.wmflabs.org_down
[18:59:33] CKoerner_WMF: hmm, I'm not familiar with that project. the canonical answer is 'talk to the maintainers of the project to see what is going on' but I don't know who the maintainers or nor what the project actually is...
[18:59:47] I'm finding out tho
[19:00:22] One of the things it does is provide the little dropdown map when you click on the coordinates globe on wikipedia articles. https://en.wikipedia.org/wiki/Paris
[19:00:51] CKoerner_WMF: it's part of the maps project, so it would be one of the many people listed on https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps
[19:00:56] THAT'S WHAT YOU GET FOR NOT USING KARTOGRAPHER
[19:01:02] hmm
[19:01:10] YuviPanda: another dns/proxy entry gone wrong?
[19:01:18] valhallasw`cloud: yeah I just realized
[19:01:23] CKoerner_WMF: this might be on us, actually
[19:01:31] This Pahb ticket is a little old, but related: https://phabricator.wikimedia.org/T104417
[19:01:34] *Phab
[19:01:42] MaxSem: :)
[19:01:43] andrewbogott: so that is an NXDOMAIN for me..
[19:01:51] not sure what happened there.
[19:01:55] let me look at the wmflabsdotorg user
[19:02:05] DNS entry got eaten by a grue?
[19:02:10] two factor auth plus session expiry every 2 minutes is worst
[19:03:32] 2 minutes? ughhhhh
[19:03:50] 2 hours, sorry
[19:04:14] YuviPanda: it's 24 hours now
[19:04:25] YuviPanda: um… nxdomain, what?
[19:04:34] andrewbogott: for wma.wmflabs.org
[19:04:44] still, OSM is min(7 days, whenever it feels like losing your creds)
[19:04:58] andrewbogott: actually, it isn't NXDOMAIN - it's NOERROR, but doesn't return any data
[19:06:21] MaxSem: 7 days would be nice but I need extra GUI options for that, a 7-day default seems risky, what if you're in a north korean internet cafe?
[19:06:49] YuviPanda: is there context for wma.wmflabs.org? Sorry, I'm not seeing whatever brought this up in the backscroll
[19:06:57] well
[19:07:26] I think the extra GUI bit we have to do, andrewbogott
[19:07:27] if you force users to log in too often, they compromise your security by choosing passwords like 123
[19:07:45] andrewbogott: anyway, so wma.wmflabs.org is a proxy that's been used for a while (months/years I think?)
[19:07:50] andrewbogott: but doesn't resolve anymore.
[19:07:58] YuviPanda: it doesn't resolve on the old ldap server either though
[19:08:05] So I don't know how it can have been in use before
[19:08:09] unless I'm making some kind of dumb mistake
[19:08:16] andrewbogott: and in the horizon UI, I see 208.80.155.156 (proxy IP set) for that record
[19:08:26] oh, wait, I probably am making a dumb mistake, hang on...
[19:09:10] ah, ok, there it is
[19:09:12] * andrewbogott looks
[19:09:22] CKoerner_WMF: ^ we're investigating
[19:09:56] Great, thank you. Please let me know what you find and I'd be happy to pass it on to the concerned community member.
[19:12:04] Ah, ok, this is a standard case of "proxy on a domain that also has subdomains"
[19:17:23] CKoerner_WMF: Fixed (well, hacked really) and responded on the village pump
[19:18:03] Labs, Horizon: Proxy corner case: proxy name foo.wmflabs.org == domain name foo.wmflabs.org - https://phabricator.wikimedia.org/T131367#2184530 (Andrew) well, I was wrong about the 'no longer any current proxies bitten' thing -- wma.wmflabs.org is one. So my audit was wrong somehow.
[19:18:04] That's great, thank you
[19:21:11] CKoerner_WMF: works for you too now, I hope?
[19:21:32] Yes!
[19:21:40] thanks Krenair
[19:26:46] Labs, Horizon: Proxy corner case: proxy name foo.wmflabs.org == domain name foo.wmflabs.org - https://phabricator.wikimedia.org/T131367#2184560 (JJMC89)
[19:29:06] Labs, Horizon: Proxy corner case: proxy name foo.wmflabs.org == domain name foo.wmflabs.org - https://phabricator.wikimedia.org/T131367#2184576 (Ktr101) Andrew has overridden the issue, for now: https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_%28technical%29&diff=713953259&oldid=713951626
[19:29:32] andrewbogott: Krenair did wikitech die again? I'm getting a blank page https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/Dell_PowerEdge_RN30
[19:29:48] YuviPanda: 'again'?
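andrewbogott's diagnosis for wma.wmflabs.org is the corner case tracked in T131367: a proxy name that is also the name of a DNS domain. That fits YuviPanda's NOERROR-with-no-data result: a query for that exact name is presumably answered from the child domain, whose apex carries no A record, so the proxy's record in wmflabs.org is never returned. A minimal sketch of an audit for such collisions; the function and the input lists are hypothetical, and a real audit would need the actual proxy and domain lists from the Labs infrastructure.

```python
def proxy_zone_collisions(proxy_names, zone_names):
    """Return proxy hostnames that are also names of DNS zones (domains).

    Assumption of this sketch: a query for such a name is answered from the
    child zone, which has no A record at its apex, so the proxy's A record
    in the parent zone never reaches clients.
    """
    def norm(name):
        # DNS names are case-insensitive and may carry a trailing dot.
        return name.lower().rstrip(".")

    zones = {norm(z) for z in zone_names}
    return sorted({norm(p) for p in proxy_names if norm(p) in zones})


# Hypothetical inputs for illustration only.
proxies = ["wma.wmflabs.org", "datans.wmflabs.org", "Example.wmflabs.org."]
zones = ["wmflabs.org.", "wma.wmflabs.org.", "eqiad.wmflabs."]
print(proxy_zone_collisions(proxies, zones))
# -> ['wma.wmflabs.org']
```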
[19:29:52] It's blank for me too :(
[19:30:08] this happened last week too I think, and someone undid a mediawiki deploy
[19:30:20] there's not a train deploy on Wednesdays though is there?
[19:30:49] not sure
[19:31:47] there is
[19:43:57] andrewbogott: quick q (cause my memory does not help me). VM actions (creations, deletion, reboot, etc) will be happening via horizon or wikitech in the future (distant as well as not so distant)
[19:44:00] ?
[19:44:24] akosiaris: currently supported on both, but I'm hoping to disable on wikitech once people have settled in with Horizon a bit.
[19:46:15] ah, so the actions are in that drop down menu at the right
[19:46:17] ok..
[19:46:36] andrewbogott: ok thanks!
[19:47:03] akosiaris: if you don't mind I'd appreciate you doing things via Horizon and telling me what is terrible and/or broken.
[19:47:31] andrewbogott: I just logged in for the very first time. You will be hearing from me ;-)
[19:47:58] * andrewbogott looks forward to that, and also doesn't
[20:24:50] Labs, Patch-For-Review, WMF-deploy-2016-04-05_(1.27.0-wmf.20): Add 'novaadmin' to new projects, always - https://phabricator.wikimedia.org/T131411#2184704 (Andrew) Open>Resolved Deployed on wikitech, tested, looks good.
[21:18:58] bd808: Are you here? Concerning my composer problem: The json file is empty, but I don't know why, and what I should write in there
[21:20:31] Luke081515: something bad happened at some point while you were running Composer. That file should never be empty. I would personally delete vendor and composer.lock and run `vagrant provision` to have them recreated.
[21:20:52] bd808: The whole vendor directory?
[21:21:13] yeah. composer will recreate it with the right stuff
[21:21:17] ok
[21:21:35] I just want to make sure, that I'm not deleting the wrong stuff ;)
[21:22:02] Labs, Labs-Infrastructure, Operations: labnet1002 can't talk to webproxy.eqiad.wmnet:8080, puppet fails to install designateclient - https://phabricator.wikimedia.org/T129623#2184867 (chasemp) Thank you faidon, that is indeed the story. I put in a specific allowance for the labs-hosts VLAN in question...
[21:22:49] Luke081515: *nod* Just don't chmod -R things again ;)
[21:23:03] no, just rm -R ;)
[21:23:09] *-r
[21:27:04] Labs, Tool-Labs, Patch-For-Review, Scap3 (Scap3-Adoption-Phase1): Setup a proper deployment strategy for Kubernetes - https://phabricator.wikimedia.org/T129311#2184878 (yuvipanda) a:yuvipanda>None Our current ghetto one worked fine for the 1.2 upgrade. I'm going to let this be while other se...
[21:30:42] bd808: Thx, the composer update was now successful, but fyi it throws a deprecation notice
[21:30:55] *nod* known issue
[21:31:07] we can't change the class in core that triggers that yet
[21:31:09] I will document my solving steps at the task now
[21:31:14] ok
[21:49:21] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2184971 (Volans) I would say it was just the amount of concurrent queries given the limited shared resources, I don't think persistent will help...
[22:00:05] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2185038 (Cyberpower678) >>! In T131937#2184971, @Volans wrote: > I would say it was just the amount of concurrent queries given the limited shar...
[23:21:30] tools-login.wmflabs.org's fingerprint changed?
[23:21:44] Leah: Yes, see topic.
[23:22:00] All right.
[23:22:12] Matthew_: 5 seconds later, and I would said the same :D
[23:22:18] Bit annoying to change the fingerprint.
[23:22:46] Leah: I actually deleted mine and let it readded.
[23:22:48] *readd
[23:22:54] That's everyone does.
[23:22:55] Luke081515: Ha ha, beat you ;)
[23:22:57] It's total security theater.
[23:23:08] Leah: Well I verified it too!
[23:23:19] Not that I'm subscribed, but was the fingerprint change announced on labs-announce?
[23:23:20] I was getting IP errors anyway so...
[23:23:24] I don't see it at https://lists.wikimedia.org/pipermail/labs-announce/2016-March/date.html
[23:23:47] I think it only happened a couple days ago. Try May.
[23:23:50] *April
[23:23:56] https://lists.wikimedia.org/pipermail/labs-announce/2016-April/date.html Yeah.
[23:24:01] Not there either, I think.
[23:24:02] Oh well.
[23:24:09] Huh.
[23:24:17] tools-bastion-03, how exciting.
[23:25:16] Leah: It was on the main labs-l, subject "[Labs-l] [Tools] New bastion at tools-login.wmflabs.org"
[23:26:13] All right.
[23:26:18] Definitely not subscribed to that either. ;-)
[23:26:37] Fair enough.
[23:27:03] Labs, Tool-Labs, DBA, Tracking: Disabling general.confirmeduser from dbreports for using up too much db resources - https://phabricator.wikimedia.org/T131956#2185515 (MZMcBride) https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Autoconfirmed_users_in_the_confirmed_user_group ``` MariaDB [en...
[23:27:09] Womp womp. ^
[23:28:37] Labs, Tool-Labs, DBA, Tracking: Disabling general.confirmeduser from dbreports for using up too much db resources - https://phabricator.wikimedia.org/T131956#2185531 (MZMcBride) ``` tools.dbreps@tools-bastion-03:~$ crontab -l | grep -B1 confirmedusers # Disabled for using too much Db Resource - Yuv...
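When a bastion's host key changes, as tools-login.wmflabs.org's apparently did with the move to tools-bastion-03, `ssh-keygen -R tools-login.wmflabs.org` drops the stale entry from `known_hosts`, and the new fingerprint is worth checking against one obtained out of band before accepting it. A minimal sketch of how OpenSSH derives the SHA256 fingerprint it displays: the SHA-256 digest of the decoded public-key blob, printed as base64 without trailing `=` padding. The key line below is a made-up placeholder, not the real bastion key; older OpenSSH clients print an MD5 form instead.

```python
import base64
import hashlib


def ssh_sha256_fingerprint(pubkey_line: str) -> str:
    """Compute the OpenSSH-style SHA256 fingerprint of a public key line.

    pubkey_line looks like "ssh-ed25519 AAAAC3... comment"; only the second
    field (the base64-encoded key blob) is hashed.
    """
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH shows the digest as base64 with the trailing "=" padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Placeholder key: the blob is just b"example-key-blob", not a real host key.
fake_line = "ssh-ed25519 " + base64.b64encode(b"example-key-blob").decode("ascii") + " placeholder"
print(ssh_sha256_fingerprint(fake_line))
```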