[00:10:10] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:25:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:28:38] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2321625 (bd808) >>! In T130748#2321327, @tom29739 wrote: > Why not use an OAuth system for something like this? (OAuth 2), preferably). 😃 I do not immediat...
[00:40:39] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2321643 (bd808) >>! In T130748#2321566, @Tgr wrote: > Such a system would probably reduce usability of the tool for users who use incognito mode and break i...
[01:38:42] Labs, Tool-Labs: tools-exec-1202 needs to be rebooted due to /public/dumps not being properly mounted - https://phabricator.wikimedia.org/T136062#2321771 (scfc)
[01:46:50] Labs, Tool-Labs: tools-exec-1202 needs to be rebooted due to /public/dumps not being properly mounted - https://phabricator.wikimedia.org/T136062#2321786 (scfc) Open>Resolved I disabled (`qmod -d '*@tools-exec-1202'`) the queues for that host, rescheduled the continuous jobs there (`for job in 41...
[01:50:15] Labs, Tool-Labs: /public/dumps is mounted read-write on a number of hosts - https://phabricator.wikimedia.org/T136063#2321788 (scfc)
[02:14:08] Labs, Tool-Labs: tools-exec-1202 needs to be rebooted due to /public/dumps not being properly mounted - https://phabricator.wikimedia.org/T136062#2321771 (chasemp) Thanks @scfc there is an issue with precise hosts that happens sometimes and I thought I got them all.
[02:56:59] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[03:20:41] Labs, Tool-Labs, DBA: p50380g50816__pop_stats (popularpages) using 53G on labsdb1001 (enwiki) - https://phabricator.wikimedia.org/T133326#2321854 (Mr.Z-man) I don't really have the time to keep up with the increasing maintenance on this tool anymore. My plan is to put up some notices to see if anyone...
[03:42:05] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:00:15] (PS103) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[04:03:36] (CR) Ricordisamoa: "PS103 adds grunt-stylelint and stylelint-config-wikimedia" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[04:05:44] (PS104) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[04:10:19] (CR) Ricordisamoa: "PS104 enables stylelint rule 'number-leading-zero'" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[04:42:20] is there any reason why /public/dumps wouldn't work for me in wikidata-query project? cd /public/dumps just hangs
[04:43:48] SMalyshev: they moved hosts earlier, but some instances never recovered I think. you can either file a bug and cc chasemp, or just reboot (known to fix)
[04:43:50] sorry about that
[04:44:08] YuviPanda: ok, I'll try rebooting
[04:47:37] hmm... also all dumps in /public/dumps/public/wikidatawiki/entities suddenly gone
[04:47:50] were there 10 mins ago...
[04:48:41] e.g. /public/dumps/public/wikidatawiki/entities/20160516 had 4 files but now I see none...
[04:52:05] YuviPanda: yeah reboot helps but now dumps are gone :(
[04:58:16] SMalyshev: i'm in a different tz and it's too late for me unfortunately, can you 1. run puppet and see if that fixes it and if not 2. open a bug?
[04:58:30] YuviPanda: ok, thanks
[05:39:51] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:42:25] RECOVERY - Puppet staleness on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[06:29:46] Is there currently a performance issue at labs? I'm not sure if it's my connection, but for example phabricator and mediawiki.org are working normal, while connecting to gerrit or one of the bastions is very slow
[06:31:59] Can someone test his connection to one of the bastions? Then I would know, if it's my connection, or something other ;)
[06:38:09] hm, the connection to the labs instances works now, but is very slow. Can someone take a look?
[06:54:22] PROBLEM - Puppet staleness on tools-grid-shadow is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [43200.0]
[07:14:14] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:35:03] Wikibugs, Tracking: Get rid of screen scraping in Wikibugs (tracking) - https://phabricator.wikimedia.org/T1175#2322014 (mmodell)
[07:35:09] Wikibugs, Tracking: Get rid of screen scraping in Wikibugs (tracking) - https://phabricator.wikimedia.org/T1175#20330 (mmodell) T123417 will take care of this.
[08:05:11] can someone take a look at the labs proxies? They are unstable at the moment. I guess about 40% of my requests end up in 502
[08:21:23] Labs: Unpuppetized database things on labservices* - https://phabricator.wikimedia.org/T136065#2322083 (Aklapper)
[08:34:21] Luke081515: which proxy, which url, etc?
[08:34:33] (it helps if you provide information like that in your initial message)
[08:34:53] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[08:39:22] valhallasw`cloud: all proxies of my project rcm seem unstable
[08:39:31] that's why I think that it could be a more general issue?
[08:40:12] for example my jenkins test instance, which has autoreload on, runs in a 502 about every minute
[08:40:31] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2322100 (Tgr) >>! In T130748#2321625, @bd808 wrote: > This is why the cookie would need to include some sort of cryptographic security (e.g. an HMAC signatu...
[08:47:59] that's why I think it's not an issue of the webservice. If you run in a 502, reload, and then you have 200 again
[08:50:18] Luke081515: you're still not giving me any information I can work with
[09:06:13] valhallasw`cloud: tin.wmflabs.org, cac.wmflabs.org
[09:07:10] and those are proxying to which host at which port?
[09:07:29] the fact that the webserver restart fixes it highly suggests it's an issue on that end, not the proxy
[09:07:50] because the labs proxy just has a static route to your webserver host
[09:10:27] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:44:14] valhallasw`cloud: the proxy goes to http://tin.rcm.eqiad.wmflabs:8080 and http://cac.rcm.eqiad.wmflabs:8080
[10:03:28] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:10:10] Luke081515: hrm, odd. I'm now tailing the nginx log, but of course everything is working nicely now :-p
[10:10:58] 2016/05/24 10:10:50 [error] 20133#20133: *6511568 cac.rcm.eqiad.wmflabs could not be resolved (2: Server failure)
[10:12:51] hm
[10:13:50] valhallasw`cloud: at the log of tin.rcm.eqiad.wmflabs too? I think there were a lot of errors..
[10:14:40] ?
[10:15:26] Labs: novaproxy 502's due to intermittent DNS failures - https://phabricator.wikimedia.org/T136073#2322237 (valhallasw)
[10:16:20] valhallasw`cloud: this instance has a jenkins install, so no apache2 directly installed, but jenkins has a autoload feature, and a lot of web requests failed. so I think, if you search for a big log, you have a bigger chance, to find it there ;)
[10:16:36] ???
[10:16:54] ah, ok, I read the task ;)
[10:25:49] Labs, Tool-Labs, ReleaseTaggerBot: No activity by ReleaseTaggerBot since 17 May - https://phabricator.wikimedia.org/T136041#2322276 (hashar) The bot runs on #tool-labs we would need someone being a member of the group `forrestbot` to restart it somehow. https://wikitech.wikimedia.org/wiki/ReleaseTagg...
[10:29:23] there was a database on c2.labsdb named "p50380g51602_p_delinker_p.delinker" .... but it seems gone.
[10:29:38] looks like labs has moved databases around again. where is it now? sigh sigh
[10:30:45] wouldn't p50380g51602_p_delinker_p.delinker be a table?
[10:31:20] a table `delinker` in a database called p50380g51602_p_delinker_p, which doesn't seem like Labs convention?
[10:31:31] copy-paste error
[10:32:46] tools.delinker@tools-bastion-03:~/cd$ mysql --defaults-file="${HOME}"/replica.my.cnf -h c2.labsdb p50380g51602_p_delinker_p
[10:32:47] ERROR 2003 (HY000): Can't connect to MySQL server on 'c2.labsdb' (111
[10:32:55] whole c2 seems gone
[10:33:42] and the db isn't in tools-db.
[10:33:51] * Steinsplitter pokes valhallasw`cloud :)
[10:35:09] Steinsplitter: http://thread.gmane.org/gmane.org.wikimedia.labs.announce/100 ?
[10:36:05] * harej imagines someone trying to knock on the door of a dead database server
[10:36:25] and I think you can consider that database to be lost forever
[10:36:51] Labs, Beta-Cluster-Infrastructure, Operations, Traffic, Patch-For-Review: Varnishlog doesn't properly rotates logs, varnish.log is empty since forever (was: deployment-cache-upload04 (m1.medium) / is almost full) - https://phabricator.wikimedia.org/T135700#2322313 (hashar) Puppet is broken on...
[10:37:50] valhallasw`cloud: gone o_O , you don't backup labs? okay, that is bad. so c2.labsdb is gone, right?
[10:38:02] Steinsplitter: we don't backup *replica servers*
[10:38:13] we backup some parts of the rest of labs
[10:38:22] i don't talk about replica, but about a user database.
[10:38:31] Steinsplitter, that is on purpose, if you are storing important things there, you are misusing the replicas
[10:38:38] it's a user database *on a replica server*
[10:38:57] i am misusing what, jynus. please elaborate.
[10:39:02] replica servers are supposed to store only scratch/summary data
[10:39:14] *Can* you store things on replica servers? I thought there was a whole userspace that was separate from the replicas.
[10:39:16] Labs, Labs-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2248624 (hashar)
[10:39:26] harej: you can create databases there, yes.
[10:39:42] it is a old database created when labs has been created, before you had your job at wikimedia - jynus. i don't like the word "misuse".
[10:39:42] there is also tools.labsdb which is a separate server that might be backed up
[10:39:48] and it was created by a sysadmin, not me :)
[10:39:49] just no backups are done
[10:39:55] That may be why I thought the name was weird; I'm used to names of databases on Tool Labs being like s2423293498__blahblah
[10:40:15] harej: the pXXXgYYY is an old format
[10:40:20] Steinsplitter, I am not using you as in yourself, but as "one"
[10:40:51] Steinsplitter: in general, the rule for labs is (and always has been) 'make sure you have your own backups'
[10:41:41] but that doesn't solve your immediate issue
[10:41:46] someone told me years ago that there are backups (it was ryan lange? no idea). well. thanks anyway :)
[10:41:51] *lane
[10:41:59] there are backups, but you shouldn't depend on them
[10:43:52] toolsdb has redundancy on several hosts, but again it should not be relied on
[10:44:18] valhallasw`cloud: okay, thanks then :). -1 on my todo list :)
[10:44:26] right, redundancy and backups aren't the same thing
[10:44:29] e.g. if you do DROP TABLE, we cannot help you
[10:44:52] Steinsplitter: sorry for the inconvenience.
[10:46:37] the main issue is that user tables mostly use myisam, and it is impossible to backup those without creating service disruption to the replicas and other users
[10:47:01] however, if you know you are not writing to them, you can backup your own tables
[10:47:09] easily
[10:50:53] jynus: it is very old database (created when the old wmde toolserver has been switched) not created by me, and i spent more than a hour to search for it - had to merge some old databases into a new one. therefore i asked here. thanks for your help as well :)
[10:52:29] it is important to understand also that "servers have changed again" should be a normal thing to happen
[10:52:48] servers need maintenance, and soon, new server will arrive
[10:53:00] that means depooling and repooling servers
[10:53:38] in a nutshell, the *service* is guaranteed, the servers are not
[10:54:11] we can improve the method for failing back user databases, but that will need your feedback
[11:35:13] Labs, Tool-Labs, ReleaseTaggerBot: No activity by ReleaseTaggerBot since 17 May - https://phabricator.wikimedia.org/T136041#2322502 (valhallasw) This seems to be a parsing issue due to the double Bug: Bug: in https://gerrit.wikimedia.org/r/#/c/289251/. I have changed the parsing code slightly (I'll...
[11:36:36] (PS1) Gerrit Patch Uploader: Task parsing code: always split by the /last/ : [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/290424 (https://phabricator.wikimedia.org/T136041)
[11:36:38] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/290424 (https://phabricator.wikimedia.org/T136041) (owner: Gerrit Patch Uploader)
[11:40:07] Labs, Tool-Labs, ReleaseTaggerBot, Patch-For-Review: No activity by ReleaseTaggerBot since 17 May due to parsing error with "Bug: Bug:" in commit message - https://phabricator.wikimedia.org/T136041#2322507 (Aklapper)
[11:54:19] Labs, Tool-Labs, ReleaseTaggerBot, Patch-For-Review: No activity by ReleaseTaggerBot since 17 May due to parsing error with "Bug: Bug:" in commit message - https://phabricator.wikimedia.org/T136041#2322530 (hashar) a:valhallasw @valhallasw looks like you have fixed it https://phabricator.wiki...
[11:54:58] Tool-Labs-tools-Other, I18n: [[Wikimedia:Pageviews-faq-throttle-wait-title/en]] needs PLURAL - https://phabricator.wikimedia.org/T136027#2322532 (Aklapper)
[13:17:45] Labs, Horizon, Patch-For-Review: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554#2322750 (valhallasw)
[13:17:47] Labs: novaproxy 502's due to intermittent DNS failures - https://phabricator.wikimedia.org/T136073#2322749 (valhallasw)
[13:34:40] Labs, Tool-Labs: /public/dumps is mounted read-write on a number of hosts - https://phabricator.wikimedia.org/T136063#2322792 (chasemp) Open>Resolved a:chasemp thanks @scfc, right now mount management is ...difficult at best. These are a combination of things, some hosts have a legacy mount...
[13:35:17] Labs, Horizon, Patch-For-Review: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554#2322795 (chasemp) what is left here considering T133554?
[13:35:26] Labs, Horizon, Patch-For-Review: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554#2322796 (chasemp) p:Triage>High
[13:37:20] Are labs having issues? Been getting 502's all day...
[13:37:29] (nginx/1.9.4)
[13:38:47] 502s to what from what?
[13:39:38] Accessing tools, such as https://quarry.wmflabs.org/query/9152 and https://accounts.wmflabs.org/acc.php
[13:39:53] off-and-on all day* not all the time
[13:40:59] k
[13:41:55] That's probably https://phabricator.wikimedia.org/T133554#2322796
[13:42:56] valhallasw`cloud: do we see any other dns issues in labs?
[13:45:43] chasemp: not that I've noticed. It's also intermittent
[13:46:09] One request fails, then the next request less than a second later is fine
[13:46:41] my uneducated guess atm is that it client behavior flooding the down DNS server is sometimes overwhelmed to the point of dropping responses/requests
[13:47:21] Josve05afk: as far as we know things should be stabilized literally...now :) can you ping back if you keep seeing this?
[13:49:01] sure thing :)
[13:49:29] Labs: Unpuppetized database things on labservices* - https://phabricator.wikimedia.org/T136065#2322833 (Andrew)
[13:51:23] Labs: Unpuppetized database things on labservices* - https://phabricator.wikimedia.org/T136065#2322842 (chasemp) p:Triage>High
[13:57:30] chasemp: the error.log looks OK now. just 'No route to host', no dns 'Server failure's
[13:58:35] last dns server failure was 13:41:11 UTC
[14:14:09] Labs, Horizon, Patch-For-Review: Switch dynamicproxy to point back to IP rather than domain names - https://phabricator.wikimedia.org/T133554#2322888 (yuvipanda) I think https://phabricator.wikimedia.org/T133554#2243928 and then do the same in redis as well.
[14:54:28] Labs, Patch-For-Review: Unpuppetized database things on labservices* - https://phabricator.wikimedia.org/T136065#2322988 (Andrew)
[14:54:31] Labs, Labs-Infrastructure, Operations, Patch-For-Review: rename holmium to labservices1002 - https://phabricator.wikimedia.org/T106303#2322987 (Andrew)
[14:55:58] Labs, Labs-Infrastructure, Operations, Patch-For-Review: Set up LVS for labs dns recursors - https://phabricator.wikimedia.org/T119660#2322993 (Andrew)
[14:56:00] Labs, Labs-Infrastructure, Operations, Patch-For-Review: rename holmium to labservices1002 - https://phabricator.wikimedia.org/T106303#1464173 (Andrew)
[15:16:39] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: integration-dev instance changed of IP address - https://phabricator.wikimedia.org/T133207#2323109 (hashar) Open>Resolved a:hashar I have eventually deleted the instance. Been using it to build Nodepool images but I am now...
[15:18:55] Labs, wikitech.wikimedia.org, Regression: Wikitech sign-up page has bad styling - https://phabricator.wikimedia.org/T136032#2323116 (Krinkle) a:Krinkle>None
[15:23:50] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2323148 (csteipp) If you decide to go with the crypto cookie, I'd recommend using a JWT, with either an HS256 or ES256 signature. It's url-safe encoded so u...
[15:42:56] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2323206 (bd808) >>! In T130748#2322100, @Tgr wrote: >>>! In T130748#2321625, @bd808 wrote: >> This is why the cookie would need to include some sort of cryp...
[16:01:29] Labs: Missing data on labs replica database - https://phabricator.wikimedia.org/T133715#2323321 (Ragesoss) Not sure if this is the same issue or not, but we ran a Wiki Ed cohort sometime in late 2014 or early 2015, and got a result of around 16 million positive sum bytes added. Running the same cohort with t...
[16:13:50] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323362 (bd808)
[16:15:57] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323377 (bd808) ``` $ dpkg-query -s python-mwclient Package: python-mwclient Status: install ok installed Priority: optional Section: python Installed-Size: 244 Maintainer: Bryan Tong Minh
Labs, DBA: Missing data on labs replica database - https://phabricator.wikimedia.org/T133715#2323379 (jcrespo) a:jcrespo We are precisely at this very moment reimporting now the revision table- we are curently on revision id 55000000. It will take same days to reach revision 700 000 000 but we are ge...
[16:18:08] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323362 (yuvipanda) Is this coming from upstream debian or from our repos?
[16:23:56] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2323431 (Tgr) There is also the AGF approach: set a plain cookie and assume that Labs tool owners don't try to game the rules.
[16:24:59] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:25:22] Labs, Labs-Infrastructure, Operations, Patch-For-Review: Update tag and racktables for holmium: rename to labservices1002. - https://phabricator.wikimedia.org/T119533#2323437 (Andrew) Ready to go now.
[16:25:28] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323439 (valhallasw) We pull it in from apt.wm.o: ``` valhallasw@tools-bastion-03:~$ apt-cache madison python-mwclient python-mwclient | 0.8.0~dev1-1 | http://apt.wikimedia.org/wikimedia/ tru...
[16:26:20] Labs: Update mwclient on labs - https://phabricator.wikimedia.org/T87483#2323445 (valhallasw)
[16:26:23] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323447 (valhallasw)
[16:28:56] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323466 (Andrew) a:Andrew
[16:52:58] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323514 (Andrew) Packages are updated in the apt repo for precise, trusty, jessie.
[16:54:22] Labs, Tool-Labs, DBA: p50380g50816__pop_stats (popularpages) using 53G on labsdb1001 (enwiki) - https://phabricator.wikimedia.org/T133326#2323521 (kaldari) @Qgil: If you know any volunteer developers who might be interested in maintaining this tool, please let us know. The popular pages reports are v...
[16:54:59] Labs, Labs-Infrastructure, Operations, Patch-For-Review: rename holmium to labservices1002 - https://phabricator.wikimedia.org/T106303#2323525 (Andrew)
[16:56:10] morebots, where are you?
[16:56:30] labs-morebots i mean
[16:56:30] I am a logbot running on tools-exec-1221.
[16:56:30] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[16:56:30] To log a message, type !log .
[16:58:40] labs-morebots, where are you now?
[16:58:40] I am a logbot running on tools-exec-1221.
[16:58:41] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[16:58:41] To log a message, type !log .
[16:58:52] !log testlabs mwclient test
[16:58:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL, dummy
[17:05:01] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:13:20] Labs, Tool-Labs: Upgrade globally installed python-mwclient to 0.8.1 - https://phabricator.wikimedia.org/T136106#2323606 (Andrew) Open>Resolved And I've run upgrades everywhere with salt.
[17:15:42] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[17:17:02] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[17:17:02] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[17:19:31] Labs, Tool-Labs, ReleaseTaggerBot: No activity by ReleaseTaggerBot since 17 May due to parsing error with "Bug: Bug:" in commit message - https://phabricator.wikimedia.org/T136041#2323637 (Jdforrester-WMF) Open>Resolved
[17:34:49] andrewbogott: maybe you can help me ... i'm trying to figure out if rcstream works for wikitech wiki?
[17:35:09] sending recent changes to udp...
[17:35:41] aude: I don't know much about rcstream, is it an extension or part of core?
[17:35:50] i think it's part of core
[17:36:23] might be a firewall issue then...
[17:36:34] suppose i could see if the irc stream works
[17:36:34] I suspect that Krenair will be more help with this, if he is around
[17:37:08] $wgRCFeeds in wmf-config
[17:37:52] wmf-config/CommonSettings-labs.php:$wgRCFeeds['redis'] = [
[17:37:52] wmf-config/CommonSettings.php: $wgRCFeeds['default'] = [
[17:37:52] wmf-config/CommonSettings.php: $wgRCFeeds['rcs1001'] = [
[17:38:14] if wikitech is like the others when it comes to CommonSettings.php
[17:38:17] i know the dbname is odd (labswiki)
[17:38:19] and not "-labs"
[17:38:21] don't know if that somehow matters
[17:38:56] need to find out if it uses CommonSettings.php or CommonSettings-labs.php
[17:39:13] CommonSettings.php
[17:39:20] -labs is for beta I think (which I hate)
[17:39:32] then the rc stream should work and be enabled
[17:39:35] PAWS: PAWS can not login - https://phabricator.wikimedia.org/T136114#2323722 (Dvorapa)
[17:39:41] 3189 'uri' => "udp://$wmgRC2UDPAddress:$wmgRC2UDPPort/$wmgRC2UDPPrefix",
[17:39:52] it looks like on by default for all wikis
[17:40:09] there is some more info here https://wikitech.wikimedia.org/wiki/Stream.wikimedia.org
[17:40:17] i havent used a client to it but there are links
[17:41:26] aude: I need to go, but clearly there are people here with more knowledge than I have :)
[17:42:24] ok
[17:42:28] the irc stream works
[17:42:49] how do you look at it? with the demo client?
[17:42:52] rcstream definitely works for wikidata
[17:43:11] mutante: can try with the demo code or http://codepen.io/Krinkle/pen/laucI
[17:43:43] oh, fancy!
[17:43:51] there are no events for wikitech (rc.server_name == 'wikitech.wikimedia.org') or anything else with if ( /wikitech/.test(rc.server_name) )
[17:44:04] i'm setting up to get sms notification when the train happens :)
[17:44:06] I suspect maybe silver can't reach rcs1001 and others
[17:44:31] PAWS: I can not write some special characters in PAWS - https://phabricator.wikimedia.org/T136118#2323791 (Dvorapa)
[17:44:31] so want me to check if it's firewalling ?
[17:44:35] PAWS: PAWS can not login - https://phabricator.wikimedia.org/T136114#2323722 (yuvipanda) Did -lang:cs work earlier as well? can you try on a different wiki?
[17:44:40] it needs to talk to rcs100x
[17:44:40] in case i'm not sitting at my computer, and got it working to watch for things on wikidata
[17:46:29] PAWS: PAWS can not login - https://phabricator.wikimedia.org/T136114#2323810 (Dvorapa) Yes, it worked as usual and as you can see at the bottom of the screenshot, testwiki doesn't work too.
[17:46:35] i'm looking at the iptables rules ...
[17:47:53] ok, yea, i think that's it
[17:48:05] there is a ferm service for "rcstream_redis" in the rcstream role
[17:48:12] and it allows an srange of $INTERNAL
[17:48:24] and as it happened before, silver might not be covered by that
[17:48:41] so then it cant talk to 6379 and redis wouldnt get the updates
[17:48:49] ah
[17:49:10] PAWS: I can not write some special characters in PAWS - https://phabricator.wikimedia.org/T136118#2323821 (Dvorapa)
[17:49:12] It's important nothing else in labs can communicate with this of course.
[17:49:18] s/else/
[17:49:59] yea, well, silver is not in labs
[17:50:07] but neither in $INTERNAL i think
[17:50:16] PAWS: There should be a way, how to copy/paste a text from/to PAWS - https://phabricator.wikimedia.org/T136119#2323829 (Dvorapa)
[17:50:30] didnt we have a similar thing before
[17:52:04] PAWS: PAWS can not login - https://phabricator.wikimedia.org/T136114#2323849 (Dvorapa) For dewiki the same output: {F4049759}
[17:52:05] yea, it's just 10.0.0.0/8
[17:52:30] which includes .. labs instances?
[17:54:04] and here's how it was fixed for elasticsearch:
[17:54:07] srange => '(($INTERNAL @resolve(silver.wikimedia.org) @resolve(labtestweb2001.wikimedia.org)))'
[17:54:58] Krenair: aude: also for labtestweb ?
[17:56:19] where is that?
[17:56:51] it looks like a physical server in prod
[17:56:57] and a copy of silver
[17:57:02] to test changes
[17:57:04] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:57:08] oh
[17:57:14] the host that runs horizon
[17:57:15] i see it in wikitech.dblist
[17:57:22] but didn't know what it was for
[17:57:25] that's where krenair just wanted to add it
[17:57:37] it's like the beta of wikitech, afaik
[17:57:43] ok
[17:58:48] probably makes sense to have rcstream for it, though not critical in my case
[18:01:12] https://gerrit.wikimedia.org/r/#/c/290504/2/manifests/role/rcstream.pp
[18:02:27] looks good to me though i really don't know quite how the rules work
[18:04:56] PAWS: PAWS can not login - https://phabricator.wikimedia.org/T136114#2323882 (yuvipanda) Can you hit the 'control panel' button on top right, hit 'stop server', and then 'start server' again?
[18:06:41] PAWS: There should be a way, how to copy/paste a text from/to PAWS - https://phabricator.wikimedia.org/T136119#2323829 (yuvipanda) So one way now is to create a small shell script with 'new -> text file' and then execute that from the terminal...
[18:10:31] i'll be back later....
[18:10:50] thanks mutante :)
[18:11:03] ok, yw!
[18:11:43] it will help with monitoring wikidata deployments
[18:46:53] PAWS: PAWS can not login - https://phabricator.wikimedia.org/T136114#2324001 (Dvorapa) I had already tried that before I created this issue. I closed all terminals, stopped server, removed PAWS from my granted apps, waited some hours and then retried from scratch.
[18:49:18] PAWS: There should be a way, how to copy/paste a text from/to PAWS - https://phabricator.wikimedia.org/T136119#2324012 (Dvorapa) Thank you, I'll try
[19:32:39] aude: try stream for wikitech again when you get back
[20:37:14] PROBLEM - Puppet run on tools-docker-builder-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:38:46] that's probably me
[20:52:15] RECOVERY - Puppet run on tools-docker-builder-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:54:36] Labs, Tool-Labs, DBA, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2324477 (Cyberpower678)
[20:54:38] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2324476 (Cyberpower678) Open>Resolved
[20:55:31] Labs, Tool-Labs, Tool-Labs-tools-Other, DBA: Throttling Cyberbot tool user as it is consuming most of the CPU - https://phabricator.wikimedia.org/T131937#2183587 (Cyberpower678)
[21:02:11] Labs, Labs-Infrastructure: Prevent puppet from creating local user when they are defined in LDAP - https://phabricator.wikimedia.org/T73480#2324501 (hashar) Same happens with groups as @thcipriani found out on the beta cluster. A `project-deployment-prep` local user group ended up shadowing up the LDAP...
[21:06:25] (PS1) Lokal Profil: Add test to validate entries in monument_config [labs/tools/heritage] - https://gerrit.wikimedia.org/r/290557
[21:07:58] !log mw1137,mw1146 restarted hhvm service
[21:07:59] mw1137,mw1146 is not a valid project.
[21:08:03] eh, yea
[21:09:02] (PS2) Lokal Profil: Add test to validate entries in monuments_config [labs/tools/heritage] - https://gerrit.wikimedia.org/r/290557
[21:16:52] mutante: still not sure rcstream works (or i am subscribing to the wrong host?)
[21:17:21] aude: i don't know, but i can confirm now there are firewall holes for that and before there weren't
[21:17:30] ok
[21:17:37] specifically the connection to redis
[21:17:52] socket.emit('subscribe', 'wikitech.wikimedia.org');
[21:18:40] do you get any error?
[21:19:46] let me try / check
[21:36:03] mutante: i don't get an error (even if i subscribe to a completely invalid url)
[21:36:20] so, maybe i'm just not capturing it and need to look into it more
[21:36:38] aude: how about the exact same thing with a different wiki?
[21:37:08] a valid one? e.g. commons.wikimedia.org? works fine
[21:37:31] hmm ok
[21:37:47] starts tcpdump
[21:37:51] need to figure out how to get the error information
[21:39:30] when you use "silver.wikimedia.org" it's the same, right?
[21:39:38] i am doing something like https://gist.github.com/filbertkm/bf600da03b5bb363cd47103d4223c651
[21:40:35] silver.wikimedia.org didn't work also
[21:42:38] what is 10.64.0.17
[21:42:45] Labs: Weird extra DNS entries in labs - https://phabricator.wikimedia.org/T135864#2324647 (Andrew) duplicate>Open Actually, this is an interesting one so I'm re-opening as its own bug
[21:42:57] Labs: Weird extra DNS entries in labs - https://phabricator.wikimedia.org/T135864#2324649 (Andrew) a:Andrew
[21:43:07] ok, that is rcs1002.eqiad.wmnet
[21:43:23] aude: so i can also confirm that silver is sending out traffic over to rcs1002
[21:43:30] ok
[21:43:31] and .. it is using IPv6 for that
[21:43:41] now.. maybe that is not the case when the others do it
[21:45:37] no, there goes that theory too
[21:45:50] i see all kinds of incoming stuff on rcs1002 on the port
[21:46:16] i gotta go for now, but now i would go back to rcstream config itself
[21:46:23] ok
[21:46:24] it does not look like networking anymore
[21:52:57] Labs, Tool-Labs, DBA: enwiki_p replica on s1 is corrupted - https://phabricator.wikimedia.org/T134203#2324657 (chasemp) p:Triage>High
[21:55:05] Labs: Weird extra DNS entries in labs - https://phabricator.wikimedia.org/T135864#2324675 (Andrew) Open>Resolved The cause of this remains elusive, but I've cleaned up 10.68.17.12 so that it is associated only with druid102.analytics.eqiad.wmflabs. Please let me know if this issue reappears.
[21:57:14] Labs, DBA: Missing data on labs replica database - https://phabricator.wikimedia.org/T133715#2240421 (chasemp) @jcrespo, is this related? T134203
[22:12:35] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2324719 (tom29739) >>! In T130748#2323431, @Tgr wrote: > There is also the AGF approach: set a plain cookie and assume that Labs tool owners don't try to ga...