[07:23:14] Traffic, Operations, Pybal: pybal should automatically reconnect to etcd - https://phabricator.wikimedia.org/T169765#3410598 (elukey) Another thing that would be nice is the possibility to specify more than one conf host in `profile::pybal::config_host: conf2001.codfw.wmnet`, and allow pybal to conne...
[07:24:49] Traffic, Operations, Pybal: pybal should automatically reconnect to etcd - https://phabricator.wikimedia.org/T169765#3407910 (MoritzMuehlenhoff) >>! In T169765#3410598, @elukey wrote: > Another thing that would be nice is the possibility to specify more than one conf host in `profile::pybal::config_h...
[07:26:51] Traffic, Operations, Pybal: pybal should automatically reconnect to etcd - https://phabricator.wikimedia.org/T169765#3407910 (Volans) >>! In T169765#3410598, @elukey wrote: > Another thing that would be nice is the possibility to specify more than one conf host in `profile::pybal::config_host: conf20...
[07:31:45] Traffic, Operations, Pybal: pybal should automatically reconnect to etcd - https://phabricator.wikimedia.org/T169765#3410629 (Joe) One option to support reconnections and srv records and everything is to use the (blocking) python-etcd library via `defer.deferToThread` as `etcd-mirror` does. The issu...
[11:23:10] http://www.saminiir.com/lets-code-tcp-ip-stack-5-tcp-retransmission/
[11:31:02] I think we have the same source :-P
[11:39:33] Traffic, Operations, Pybal: pybal should automatically reconnect to etcd - https://phabricator.wikimedia.org/T169765#3411195 (ema) Today @elukey took care of rebooting conf2003, and the pybals using it (ulsfo) **did** reconnect automatically. I've observed the situation a bit more closely on lvs4004...
[11:41:57] ema: interesting. still a single host down for long would require to reconfigure and restart pybal to change conf host
[11:43:27] volans: yeah
[11:44:04] lunch!
[11:44:13] :)
[12:23:19] so for both hosts I stopped etcd manually via systemctl before rebooting, but maybe on conf2001 puppet ran before the reboot and brought etcd back?
[12:26:34] nope, checked the etcd logs
[14:35:18] Traffic, Operations, Pybal: pybal should reset the etcdindex it's looking at after losin a connection - https://phabricator.wikimedia.org/T169893#3411770 (Joe)
[14:40:37] Traffic, Operations, Pybal: pybal should reset the etcdindex it's looking at after losing a connection - https://phabricator.wikimedia.org/T169893#3411787 (Joe)
[14:43:40] Traffic, ArchCom-RfC, Operations, Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3411791 (GWicke) IRC meeting summary: https://tools.wmflabs.org/meetbot/wiki...
[14:48:37] Traffic, ArchCom-RfC, Operations, Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3349120 (MZMcBride) >>! In T167906#3411791, @GWicke wrote: > Since there is...
[15:47:26] Traffic, ArchCom-RfC, Operations, Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3412015 (Anomie) >>! In T167906#3411800, @MZMcBride wrote: >>>! In T167906#3...
[16:27:46] Traffic, ArchCom-RfC, Operations, Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3412158 (GWicke) > IMO the proposal continues to be far too optimistic in as...
[17:15:25] Traffic, ArchCom-RfC, Operations, Services (designing): Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting - https://phabricator.wikimedia.org/T167906#3412521 (Anomie) Yes, you've said that before. I have no idea how you plan t...
[17:26:34] Traffic, Discovery, Maps, Operations, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3412574 (mpopov) >>! In T169175#3409501, @Gehel wrote: > @mpopov I love your graphs! They just look nice! Aw, thank you! :D > That being said, we probab...
[17:40:56] Traffic, Discovery, Maps, Operations, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3412637 (Gehel) @BBlack / @ema we seem to have a good grasp on the "usual" maps traffic. I'll let you take over and see if we want to implement rate limit...
[17:41:38] Traffic, Discovery, Maps, Operations, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3412640 (Gehel) a:mpopov>ema
[18:58:40] Traffic, Discovery, Maps, Operations, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3412852 (mpopov) >>! In T169175#3412637, @Gehel wrote: > > As a very short summary of @mpopov's analyis: > > We would not limit anyone in the sample wit...
[19:02:18] Traffic, Discovery, Maps, Operations, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3389649 (MaxSem) Note that currently our servers are [[ https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=maps1...
[19:13:03] Traffic, Discovery, Maps, Operations, and 2 others: What is a reasonable per-IP ratelimit for maps - https://phabricator.wikimedia.org/T169175#3412984 (Gehel) I'm actually totally unsure of what conclusion we should have at this point, and that's why I'd like our friends from traffic to weight in...
[19:28:59] Traffic, DBA, Operations, Performance-Team, Wikidata: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3413078 (aaron) >>! In T164173#3343495, @aaron wrote: > @daniel , can you look into the amount of purges happening in...
[22:25:51] Traffic, DBA, Operations, Performance-Team, Wikidata: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3413753 (aaron) a:aaron>None
[22:42:27] Traffic, DBA, Operations, Performance-Team, Wikidata: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3224448 (Krinkle) ChangeNotificationJob https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/6cfd514ee9/cl...
[22:42:47] Traffic, DBA, Operations, Performance-Team, Wikidata: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3413807 (Krinkle) p:Normal>High
[23:28:57] Traffic, DBA, Operations, Performance-Team, Wikidata: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3414111 (aaron) I also wonder why some of those log warnings come from close() and others have the proper commitMaste...