[09:20:46] <wikibugs_>	 Scoring-platform-team, MediaWiki-JobQueue, ORES, Performance-Team, and 5 others: Job queue corruption after codfw switch over (Queue growth, duplicate runs) - https://phabricator.wikimedia.org/T163337#3374328 (elukey) Interesting thing found today: https://phabricator.wikimedia.org/P5621  I verif...
[09:44:08] <wikibugs_>	 Scoring-platform-team, MediaWiki-JobQueue, ORES, Performance-Team, and 5 others: Job queue corruption after codfw switch over (Queue growth, duplicate runs) - https://phabricator.wikimedia.org/T163337#3374376 (elukey) Deleted by accident my previous comment, will re-do it :)  So https://phabricat...
[11:21:07] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:39:33] <wikibugs_>	 Scoring-platform-team, Wikilabels: [Discuss] Wikilabels routes refactor - https://phabricator.wikimedia.org/T165046#3374535 (Pginer-WMF) >>! In T165046#3298616, @Halfak wrote: > @Pginer-WMF (cc @jmatazzoni), would you be able to spare a small amount of time to discuss how we're looking to arrange the pag...
[12:24:12] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 941 bytes in 7.538 second response time
[12:28:03] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 981 bytes in 0.014 second response time
[12:48:12] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 956 bytes in 3.031 second response time
[12:51:03] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 981 bytes in 0.025 second response time
[13:12:24] <icinga-wm>	 ACKNOWLEDGEMENT - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds alexandros kosiaris debugging it
[13:18:12] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 943 bytes in 1.025 second response time
[13:24:22] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:26:12] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 942 bytes in 8.035 second response time
[13:29:12] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 980 bytes in 0.014 second response time
[13:59:22] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 941 bytes in 7.541 second response time
[14:04:43] <wikibugs_>	 Scoring-platform-team, MediaWiki-JobQueue, ORES, Performance-Team, and 5 others: Job queue corruption after codfw switch over (Queue growth, duplicate runs) - https://phabricator.wikimedia.org/T163337#3374792 (elukey) >>! In T163337#3214837, @Krinkle wrote: > Ideas for next steps: > * Figure out...
[14:06:13] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 981 bytes in 0.018 second response time
[14:09:22] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 942 bytes in 2.020 second response time
[14:12:33] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:27:23] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 957 bytes in 2.022 second response time
[14:32:23] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 981 bytes in 0.017 second response time
[15:48:39] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 939 bytes in 7.031 second response time
[15:51:29] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 979 bytes in 0.013 second response time
[15:51:39] <halfak>	 yup
[15:58:39] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 940 bytes in 8.033 second response time
[16:53:53] <icinga-wm>	 PROBLEM - ORES worker production on ores.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 1019 bytes in 0.086 second response time
[18:42:05] <icinga-wm>	 RECOVERY - ORES worker production on ores.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 996 bytes in 1.055 second response time
[21:29:24] <paladox>	 I'm going to upgrade gerrit-mysql (runs icinga2) to stretch now.
[21:51:53] <icinga2-wm>	 PROBLEM - Host ores-redis.01 is DOWN: /bin/ping -n -U -w 30 -c 5 ores-redis-01.ores.eqiad.wmflabsCRITICAL - Could not interpret output from ping command
[21:51:54] <icinga2-wm>	 PROBLEM - Host ores-worker-05 is DOWN: /bin/ping -n -U -w 30 -c 5 ores-worker-05.ores.eqiad.wmflabsCRITICAL - Could not interpret output from ping command
[21:51:56] <icinga2-wm>	 PROBLEM - Host ores.wmflabs.org is DOWN: /bin/ping -n -U -w 30 -c 5 ores.wmflabs.orgCRITICAL - Could not interpret output from ping command
[21:51:58] <icinga2-wm>	 PROBLEM - ping4 on Ores-Compute-01 is UNKNOWN: /bin/ping -n -U -w 10 -c 5 ores-compute-01.ores.eqiad.wmflabsCRITICAL - Could not interpret output from ping command
[21:51:59] <icinga2-wm>	 PROBLEM - ping4 on ores-lb-02 is UNKNOWN: /bin/ping -n -U -w 10 -c 5 ores-lb-02.ores.eqiad.wmflabsCRITICAL - Could not interpret output from ping command
[21:52:02] <icinga2-wm>	 PROBLEM - Host ores-web-05 is DOWN: /bin/ping -n -U -w 30 -c 5 ores-web-05.ores.eqiad.wmflabsCRITICAL - Could not interpret output from ping command
[21:52:03] <icinga2-wm>	 PROBLEM - Host Ores-Compute-01 is DOWN: /bin/ping -n -U -w 30 -c 5 ores-compute-01.ores.eqiad.wmflabsCRITICAL - Could not interpret output from ping command
[21:52:06] <icinga2-wm>	 PROBLEM - Host ores-lb-02 is DOWN: /bin/ping -n -U -w 30 -c 5 ores-lb-02.ores.eqiad.wmflabsCRITICAL - Could not interpret output from ping command
[21:52:47] <paladox>	 ignore ^^
[21:52:59] <paladox>	 it's because of the upgrade. I've stopped icinga2 temporarily
[22:00:49] <icinga2-wm>	 RECOVERY - Host ores-lb-02 is UP: PING OK - Packet loss = 0%, RTA = 1.71 ms
[22:00:49] <icinga2-wm>	 RECOVERY - ping4 on ores-lb-02 is OK: PING OK - Packet loss = 0%, RTA = 3.39 ms
[22:00:50] <icinga2-wm>	 RECOVERY - Host ores-redis.01 is UP: PING OK - Packet loss = 0%, RTA = 1.96 ms
[22:00:52] <icinga2-wm>	 RECOVERY - Host Ores-Compute-01 is UP: PING OK - Packet loss = 0%, RTA = 3.61 ms
[22:00:54] <icinga2-wm>	 RECOVERY - Host ores-web-05 is UP: PING OK - Packet loss = 0%, RTA = 2.26 ms
[22:00:57] <icinga2-wm>	 RECOVERY - Host ores-worker-05 is UP: PING OK - Packet loss = 0%, RTA = 2.00 ms
[22:00:57] <icinga2-wm>	 RECOVERY - Host ores.wmflabs.org is UP: PING OK - Packet loss = 0%, RTA = 1.74 ms