[00:00:02] In about 7 hours I will gain complete freedom. :)
[00:00:42] I don't fully understand our backend :)
[00:06:32] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:03] spiking again
[00:08:57] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.626 second response time
[00:09:07] https://graphite.wikimedia.org/S/Bp
[00:09:16] It's a spike in requests to enwiki's wp10 model
[00:09:21] 1 at a time
[00:10:19] 06Revision-Scoring-As-A-Service, 10ORES: [Discuss] DOS attacks on ORES. What to do? - https://phabricator.wikimedia.org/T148347#2720777 (10Halfak) https://graphite.wikimedia.org/S/Bp Looks like someone requesting scores for the wp10 model one at a time.
[00:29:03] OK, I think the next deployment is going to enforce the email-address-in-the-user-agent requirement when request rates get high.
[00:53:55] 06Revision-Scoring-As-A-Service, 10ORES: [Discuss] DOS attacks on ORES. What to do? - https://phabricator.wikimedia.org/T148347#2720795 (10Legoktm) Is there contact information in their user agent? I'd just block them that way (400 or something) until we can talk to them and have them use batching, etc.
[01:24:48] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:27:18] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 1.074 second response time
[01:35:07] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:37:37] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.555 second response time
[01:38:15] Emailed ops about the issue.
[02:27:57] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:32:59] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.702 second response time
[03:15:47] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:28:11] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.101 second response time
[03:42:55] halfak: Exciting problem to have!
[03:43:10] Did you figure it out, or is it still useful for me to poke around in the weblogs?
[03:46:55] That graphite link isn't very self-explanatory -- maybe you can give me a time window to focus on?
[03:48:27] FYI: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
[03:58:44] LOL, the biggest offender resolves to ISP=Wikimedia Foundation.
[04:00:40] It doesn't seem to match the pattern of abuse you were describing, though -- these are just a handful of reasonable requests per second, apparently populating the RC scores cache.
[04:16:07] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:18:27] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 441 bytes in 0.591 second response time
[04:38:47] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:37] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.642 second response time
[04:51:48] D'oh. I screwed up the extremely expensive query; doing it again now.
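A minimal sketch of the email-in-user-agent enforcement floated at 00:29 and the 400-style block Legoktm suggests at 00:53, assuming a Flask-style app like ORES. The rate check, its threshold, and the error wording are hypothetical, not ORES's actual implementation:

    import re

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Matches a plausible email address anywhere in the User-Agent string.
    EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

    def request_rate_is_high():
        # Hypothetical placeholder: a real check would compare a
        # per-client counter (e.g. kept in redis) against a threshold.
        return True

    @app.before_request
    def require_contact_email_under_load():
        if not request_rate_is_high():
            return None
        user_agent = request.headers.get("User-Agent", "")
        if not EMAIL_RE.search(user_agent):
            # Reject with a 4xx, as suggested above, until the client
            # adds contact info and switches to batched requests.
            return jsonify({"error": "Request rate is high; please "
                                     "include a contact email address "
                                     "in your User-Agent header."}), 400
        return None

Returning a response from a before_request hook short-circuits normal routing, so well-behaved clients with contact info are unaffected while anonymous high-volume clients get a contactable error.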
[04:52:14] But I did learn that only 16 user IPs have hit ORES since 00:00 UTC.
[04:53:27] Here's the strange thing -- none of those jump out as egregious. Here are the counts: 25197, 23487, 20372, 14357, 1136, 433, 211, 103, 37, 8, 7, 3, 2, 2, 1, 1.
[04:54:34] Oops, okay, that query was even more wrong than I thought. I was grouping by server hostname rather than user IP.
[04:56:49] PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:58:46] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:59:19] RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 720 bytes in 1.586 second response time
[05:01:08] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 1.148 second response time
[05:34:06] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:36:28] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.632 second response time
[05:53:56] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:01:18] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 458 bytes in 1.144 second response time
[06:31:28] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:34:06] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 442 bytes in 0.605 second response time
[06:41:38] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:43:58] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.628 second response time
[07:04:26] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:48] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 441 bytes in 0.602 second response time
[07:37:58] PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:42:49] RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 706 bytes in 1.110 second response time
[08:07:45] 06Revision-Scoring-As-A-Service, 10ORES: [Discuss] DOS attacks on ORES. What to do? - https://phabricator.wikimedia.org/T148347#2720674 (10awight) I pulled the IP address and created a private subtask to temporarily block or throttle this client: T148356. Since our thread here is a [discussion] and not a spi...
[08:19:17] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:21:41] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.643 second response time
[08:24:20] 06Revision-Scoring-As-A-Service, 10ORES: [Discuss] DOS attacks on ORES. What to do? - https://phabricator.wikimedia.org/T148347#2721200 (10Ladsgroup) We talked about it in `#wikimedia-operations` in IRC. It seems it was @Daniel_Mietchen doing 142 edits per minute in Wikidata without the bot flag which is agai...
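The actual query above ran against Wikimedia's webrequest data, but the fix being described at 04:54 (group by the client IP field, not the server hostname) is easy to illustrate locally. A sketch of the same per-IP tally over a common-log-format access log; the log path is hypothetical:

    from collections import Counter

    def requests_per_ip(log_path):
        """Tally requests per client IP from an access log whose first
        whitespace-separated field is the client IP."""
        counts = Counter()
        with open(log_path) as f:
            for line in f:
                fields = line.split(None, 1)
                if fields:
                    counts[fields[0]] += 1
        return counts

    # Print the per-IP tallies in descending order, like the list above.
    for ip, n in requests_per_ip("access.log").most_common():
        print(n, ip)

Grouping by the wrong column (server hostname) produces one bucket per backend host, which is why the first counts looked plausible but meaningless.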
[17:33:56] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:36:36] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.661 second response time
[17:55:41] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:58:17] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 458 bytes in 0.601 second response time
[18:15:27] Amir1, did you make a figshare account?
[18:16:02] I really just need to know what last name and first initial you'd like to use. This is a professional name; it doesn't need to be a legal name.
[18:17:21] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:22:33] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.618 second response time
[18:44:06] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:46:43] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 1.383 second response time
[19:00:42] OK, looks like precached was going crazy again. Looking into that now.
[19:02:08] RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.598 second response time
[19:04:12] Ahhh... that feeling when I open up the editor to work on something.
[19:09:38] 06Revision-Scoring-As-A-Service, 10ORES: Investigate memory leak in precached - https://phabricator.wikimedia.org/T146500#2723662 (10Halfak) Looks like this could be useful: https://pythonhosted.org/Pympler/muppy.html
[20:06:26] ^ Amir1
[20:06:30] Whoops, he's AFK.
[20:42:45] 06Revision-Scoring-As-A-Service, 10ORES: Investigate memory leak in precached - https://phabricator.wikimedia.org/T146500#2723857 (10Halfak) I worked out that there was a data structure that would grow slowly over time. I just submitted https://github.com/wiki-ai/ores/pull/170, which should address the issue.
[22:33:37] OK, I'm out of here. Have a good one, folks!
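A sketch of how Pympler's muppy/summary tooling (linked at 19:09) can surface a slowly growing data structure like the one fixed in https://github.com/wiki-ai/ores/pull/170. The run_precached_for_a_while() call is a hypothetical stand-in for letting the daemon do its work between snapshots:

    from pympler import muppy, summary

    # Baseline snapshot of all objects the interpreter currently tracks.
    baseline = summary.summarize(muppy.get_objects())

    run_precached_for_a_while()  # hypothetical: exercise the daemon

    # Second snapshot; the diff shows which object types grew between
    # snapshots, which is how a slow leak stands out from steady-state
    # allocation noise.
    later = summary.summarize(muppy.get_objects())
    summary.print_(summary.get_diff(baseline, later))

The diff only identifies which structure is growing; the actual fix in the PR was to stop that structure from accumulating entries indefinitely.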