[00:53:08] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (Krenair) >>! In T241462#5763809, @MarcoAurelio wrote: > Could not change on deployment-eventgate1-3 as those ain't in Horizon (?) They don't exist. I...
[00:54:43] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (Krenair) >>! In T241462#5763875, @MarcoAurelio wrote: > Not sure where to look at now. Shall I restart cpjobqueue from Horizon? No need to reboot the...
[06:11:14] Release-Engineering-Team, serviceops, wikitech.wikimedia.org, cloud-services-team (Kanban): Test Wikitech is still running wmf.8 (should be on wmf.11) - https://phabricator.wikimedia.org/T241251 (Andrew) It should be fine to upgrade and add to the dsh group now.
[08:47:32] Continuous-Integration-Config, Wikidata, Documentation, User-Addshore: Investigate auto generating LUA docs with doxygen - https://phabricator.wikimedia.org/T241478 (Addshore)
[09:43:26] Beta-Cluster-Infrastructure: Steward access on Beta Cluster for Tulsi Bhagat - https://phabricator.wikimedia.org/T241466 (Tulsi_Bhagat) @DannyS712 Yes, I have [[https://commons.wikimedia.beta.wmflabs.org/wiki/Special:Log?type=rights&user=&page=Tulsi+Bhagat&wpdate=&tagfilter=&subtype=|got it yesterday]] only....
[10:38:25] Continuous-Integration-Config, Wikidata, doxygen, Documentation, User-Addshore: Investigate auto generating LUA docs with doxygen - https://phabricator.wikimedia.org/T241478 (Addshore)
[10:39:04] Continuous-Integration-Config, Wikidata, doxygen, Documentation, User-Addshore: Investigate auto generating LUA docs with doxygen - https://phabricator.wikimedia.org/T241478 (Addshore)
[10:49:27] PROBLEM - Host deployment-aqs03 is DOWN: CRITICAL - Host Unreachable (172.16.1.50)
[10:54:26] RECOVERY - Host deployment-aqs03 is UP: PING OK - Packet loss = 0%, RTA = 1.50 ms
[10:55:48] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (MarcoAurelio) >>! In T241462#5764138, @Krenair wrote: >>>! In T241462#5763875, @MarcoAurelio wrote: >> Not sure where to look at now. Shall I restart c...
[10:58:42] Gerrit: `mediawiki-replication` Gerrit group is hidden - https://phabricator.wikimedia.org/T241461 (MarcoAurelio) p:Triage→Low
[11:16:12] Beta-Cluster-Infrastructure: deployment-logstash03: UDP listener died EADDRINUSE - https://phabricator.wikimedia.org/T241481 (MarcoAurelio)
[11:16:54] Beta-Cluster-Infrastructure: deployment-logstash03: UDP listener died EADDRINUSE - https://phabricator.wikimedia.org/T241481 (MarcoAurelio)
[11:19:24] PROBLEM - Host deployment-aqs03 is DOWN: CRITICAL - Host Unreachable (172.16.1.50)
[11:25:52] RECOVERY - Host deployment-aqs03 is UP: PING OK - Packet loss = 0%, RTA = 1.64 ms
[11:45:34] (CR) N3rsti: "Can you merge it? I still can't use recheck command" [integration/config] - https://gerrit.wikimedia.org/r/560049 (https://phabricator.wikimedia.org/T235286) (owner: N3rsti)
[11:54:17] (CR) DannyS712: [C: +1] "> Can you merge it?
I still can't use recheck command" [integration/config] - https://gerrit.wikimedia.org/r/560049 (https://phabricator.wikimedia.org/T235286) (owner: N3rsti)
[11:54:48] (CR) DannyS712: [C: +1] Add Minhducsun2002 [GCI participant] to the CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/560078 (owner: Minhducsun2002)
[11:55:24] (CR) DannyS712: [C: +1] jjb: use team list instead of Antoine's e-mail [integration/config] - https://gerrit.wikimedia.org/r/559821 (owner: Hashar)
[13:39:19] Beta-Cluster-Infrastructure, Discovery-Search, Elasticsearch: [_field_stats] endpoint is deprecated! Use [_field_caps] instead or run a min/max aggregations on the desired fields. - https://phabricator.wikimedia.org/T241485 (MarcoAurelio)
[13:48:47] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (MarcoAurelio) On advice of @Andrew I've rebooted via Horizon `deployment-cpjobqueue`. It looks like, after the reboot, deployment-cpjobqueue is no long...
[14:00:45] (PS1) Daimona Eaytoy: layout: [ImageRating] Run phan [integration/config] - https://gerrit.wikimedia.org/r/560845
[14:04:45] Beta-Cluster-Infrastructure, CirrusSearch, Discovery-Search: deployment-mediawiki-07: Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message} - https://phabricator.wikimedia.org/T241487 (MarcoAurelio)
[14:11:22] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (MarcoAurelio) Hmm, this spawns each ten minutes: `lines=10 { "_index": "logstash-2019.12.27", "_type": "changeprop", "_id": "AW9HrbB0pvLXCyw4pFS...
[14:20:53] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (Krenair) >>!
In T241462#5764469, @MarcoAurelio wrote: > Hmm, this spawns each ten minutes: > > `lines=10 > { > "_index": "logstash-2019.12.27", >...
[14:25:16] Beta-Cluster-Infrastructure: Steward access on Beta Cluster for Tulsi Bhagat - https://phabricator.wikimedia.org/T241466 (MarcoAurelio) > Coming year, I wanna be more limitless and want to do as more as I can. Before that I would like to test functions which I haven't yet access it on original wikis. For ins...
[14:25:23] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (Krenair) Based on running `grep labmon /var/log/syslog | tail` through cumin, these hosts still have errors connecting to labmon: deployment-parsoid09,...
[14:33:23] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (Krenair) Restarted parsoid on -parsoid09, eventstreams on -sca02, mediawiki-services-cxserver on -docker-cxserver01 (which for some reason hasn't come...
[14:43:52] Beta-Cluster-Infrastructure: Update references from labmon.* to cloudmetrics.* on deployment-prep (beta cluster) - https://phabricator.wikimedia.org/T241462 (Krenair) `journalctl -xu mediawiki-services-cxserver.service` has shown the reason for the container on -docker-cxserver01 not coming back up: `Dec 27...
[15:16:01] Beta-Cluster-Infrastructure: Steward access on Beta Cluster for Tulsi Bhagat - https://phabricator.wikimedia.org/T241466 (Tulsi_Bhagat) >>! In T241466#5764475, @MarcoAurelio wrote: >> Coming year, I wanna be more limitless and want to do as more as I can. Before that I would like to test functions which I ha...
[17:00:27] PROBLEM - Host deployment-cpjobqueue is DOWN: CRITICAL - Host Unreachable (172.16.4.124)
[17:00:50] PROBLEM - Host deployment-memc07 is DOWN: CRITICAL - Host Unreachable (172.16.5.2)
[17:00:53] PROBLEM - Host deployment-kafka-jumbo-1 is DOWN: CRITICAL - Host Unreachable (172.16.5.4)
[17:01:17] PROBLEM - Host deployment-mediawiki-07 is DOWN: CRITICAL - Host Unreachable (172.16.4.119)
[17:01:22] PROBLEM - Host deployment-restbase02 is DOWN: CRITICAL - Host Unreachable (172.16.5.82)
[17:02:06] PROBLEM - Host deployment-elastic06 is DOWN: CRITICAL - Host Unreachable (172.16.5.131)
[17:02:36] PROBLEM - Host deployment-cache-text05 is DOWN: CRITICAL - Host Unreachable (172.16.4.21)
[17:02:49] PROBLEM - Host deployment-puppetmaster03 is DOWN: CRITICAL - Host Unreachable (172.16.4.91)
[17:02:54] PROBLEM - Host deployment-changeprop is DOWN: CRITICAL - Host Unreachable (172.16.5.21)
[17:03:13] PROBLEM - Host deployment-puppetdb02 is DOWN: CRITICAL - Host Unreachable (172.16.4.104)
[17:03:24] PROBLEM - Host deployment-eventlog05 is DOWN: CRITICAL - Host Unreachable (172.16.4.128)
[17:05:29] PROBLEM - Host deployment-imagescaler01 is DOWN: CRITICAL - Host Unreachable (172.16.5.80)
[17:05:42] PROBLEM - Host deployment-chromium01 is DOWN: CRITICAL - Host Unreachable (172.16.4.108)
[17:05:58] PROBLEM - Host Generic Beta Cluster is DOWN: CRITICAL - Host Unreachable (en.wikipedia.beta.wmflabs.org)
[17:06:29] PROBLEM - Host Generic Beta Cluster is DOWN: CRITICAL - Host Unreachable (en.wikipedia.beta.wmflabs.org)
[17:07:11] Project beta-scap-eqiad build #281195: FAILURE in 2 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/281195/
[17:11:29] RECOVERY - Host Generic Beta Cluster is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms
[17:11:40] ^ cloudvirt1014
[17:11:43] RECOVERY - Host deployment-puppetmaster03 is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms
[17:12:05] RECOVERY - Host deployment-chromium01 is UP: PING OK - Packet loss = 0%, RTA = 2.03 ms
[17:12:08] RECOVERY - Host deployment-elastic06 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[17:12:11] RECOVERY - Host deployment-mediawiki-07 is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms
[17:12:11] RECOVERY - Host deployment-imagescaler01 is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms
[17:12:36] RECOVERY - Host deployment-memc07 is UP: PING OK - Packet loss = 0%, RTA = 0.91 ms
[17:12:37] RECOVERY - Host deployment-cache-text05 is UP: PING OK - Packet loss = 0%, RTA = 0.53 ms
[17:12:52] RECOVERY - Host deployment-puppetdb02 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms
[17:12:56] RECOVERY - Host deployment-changeprop is UP: PING OK - Packet loss = 0%, RTA = 1.17 ms
[17:13:23] RECOVERY - Host deployment-kafka-jumbo-1 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[17:13:26] RECOVERY - Host deployment-eventlog05 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms
[17:13:47] (see -cloud)
[17:14:17] RECOVERY - Host deployment-restbase02 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[17:14:30] RECOVERY - Host deployment-cpjobqueue is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms
[17:16:01] RECOVERY - Host Generic Beta Cluster is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[17:16:54] Yippee, build fixed!
[17:16:55] Project beta-scap-eqiad build #281196: FIXED in 2 min 22 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/281196/
[17:42:01] Hi everyone, I know it's xmas season, but...could someone do https://gerrit.wikimedia.org/r/c/integration/config/+/560049, please, to help a Google Code-In student? Thanks!
[17:44:03] I'm looking at why beta is still down
[17:47:52] PROBLEM - Parsoid on deployment-mediawiki-parsoid10 is CRITICAL: connect to address 172.16.0.141 and port 8000: Connection refused
[17:47:52] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: connect to address 172.16.5.63 and port 8000: Connection refused
[17:48:03] nginx on -cache-text05 won't start for some reason
[17:49:57] looks like update-ocsp-all is breaking when it starts due to a read-only filesystem error
[17:50:17] which wouldn't be too unheard of given the hypervisor it's on just crashed and rebooted, except actually both filesystems are writable
[17:51:10] I wonder if it's due to something in /etc/systemd/system/nginx.service.d/security.conf
[17:59:20] <Krenair> I fiddled around with some of the scripts on that deployment-cache-text instance and got beta back up and running
[21:07:10] Beta-Cluster-Infrastructure: Steward access on Beta Cluster for Tulsi Bhagat - https://phabricator.wikimedia.org/T241466 (Masumrezarock100) + 1 to MarkAurelio. I don't see a valid need for this. Steward access is extremely powerful and sensitive even on beta cluster! You are not a developer nor I am convince...
[21:09:51] twentyafterfour: hi! You online?