[06:34:06] DBA, Operations, ops-eqiad: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T183708#3870596 (Marostegui) Open>Resolved All good - thank you! ``` root@db1001:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name : RAI...
[06:47:41] DBA, Data-Services, cloud-services-team (FY2017-18): Re-institute query killer for the analytics WikiReplica - https://phabricator.wikimedia.org/T183983#3870612 (Marostegui) Hi, Yeah, we have been seeing long delays on the analytics replicate for some time already. Some days before Christmas Jaime...
[07:59:51] DBA, Patch-For-Review: Checksum data on s7 - https://phabricator.wikimedia.org/T163190#3870692 (Marostegui) Fixed more drifts on the following tables: arwiki.archive eswiki.archive frwiktionary.archive kowiki.archive metawiki.archive rowiki.archive
[08:57:31] hello people, if you are ok I'd start the alter tables for db1107 (after stopping all the EL machinery/replication/etc..) - https://phabricator.wikimedia.org/P6511
[08:57:51] sure!
[08:58:06] Does it have any slave?
[08:58:11] if so, you want to replicate those alters?
[08:58:17] if not, set session sql_log_bin=0
[08:58:42] no slave, only db1108 but it has its replication script (eventlogging_sync)
[08:58:59] good, I would suggest set session sql_log_bin=0 anyways :)
[08:59:11] unless you specifically want those on the binlog for some reason
[08:59:51] yep yep :)
[09:17:32] maintenance started!
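The sql_log_bin advice above can be sketched roughly as follows. This is only an illustration, not the contents of P6511: the helper name `build_local_only_session` and the table `example_table` are hypothetical, and the sketch only assembles the statement sequence rather than connecting to a server. It assumes a session with enough privileges to change `sql_log_bin`.

```python
def build_local_only_session(alter_statements):
    """Return SQL statements for one session whose ALTERs are kept out of the binlog.

    Hypothetical helper for illustration; the real alters in P6511 are not
    reproduced here.
    """
    # Disable binary logging for this session only, so the ALTERs
    # are not written to the binlog and not replicated.
    statements = ["SET SESSION sql_log_bin = 0;"]
    for stmt in alter_statements:
        stmt = stmt.strip()
        if not stmt.endswith(";"):
            stmt += ";"
        statements.append(stmt)
    # Re-enable binary logging in case the session is reused afterwards.
    statements.append("SET SESSION sql_log_bin = 1;")
    return statements


if __name__ == "__main__":
    # example_table is a made-up name, not a table from the log.
    for line in build_local_only_session(
        ["ALTER TABLE example_table ENGINE=InnoDB"]
    ):
        print(line)
```

Because the setting is per session, all statements have to run on the same connection; opening a new connection for each ALTER would silently log them again.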
[09:32:52] DBA, Patch-For-Review: Checksum data on s7 - https://phabricator.wikimedia.org/T163190#3870829 (Marostegui) Fixed drifts on change_tags on: eswiki fawiki hewiki huwiki kowiki metawiki ukwiki viwiki
[09:35:00] DBA, Data-Services, cloud-services-team (FY2017-18): Re-institute query killer for the analytics WikiReplica - https://phabricator.wikimedia.org/T183983#3870833 (jcrespo) The thing is, there was already a query killer during Christmas, analytics being for over 28800 seconds and web over 3600, but the...
[09:37:52] DBA, Data-Services, cloud-services-team (FY2017-18): Re-institute query killer for the analytics WikiReplica - https://phabricator.wikimedia.org/T183983#3870835 (Marostegui) >>! In T183983#3870833, @jcrespo wrote: > The thing is, there was already a query killer during Christmas, analytics being for...
[09:49:06] I am going to do some core codfw master promotions for decommission
[09:49:21] <3
[09:49:23] we should also think about doing the same for x1 on eqiad
[09:53:21] because it is codfw, and if not we get too conservative, I will be upgrading the masters more aggressively
[09:54:23] yeah, I totally agree :)
[10:21:28] DBA: Decommission db1029 and db1031 - https://phabricator.wikimedia.org/T184054#3870892 (jcrespo) p:Triage>Normal
[10:22:02] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3870904 (Marostegui)
[10:24:48] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3870911 (Marostegui) >>! In T174569#3866341, @Marostegui wrote: > I will alter s1 master tomorrow morning s1 master is done. Tomo...
[10:30:53] DBA, Patch-For-Review: Checksum data on s7 - https://phabricator.wikimedia.org/T163190#3870918 (Marostegui) Fixed drifts on tag_summary on: arwiki cawiki eswiki fawiki hewiki huwiki kowiki metawiki ukwiki viwiki
[11:42:25] I am going to wait for the read only on codfw s6 after lunch
[11:42:42] read only here means stop replication
[11:42:47] sure :)
[14:14:43] ^I am going to start now
[14:32:32] so I've just realized db2039 is partitioned
[14:32:51] should I failover somewhere else? should I remove the partitioning?
[14:35:58] I am going to YOLO and remove the partitioning
[14:44:51] DBA, Analytics-Kanban: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414#3871491 (elukey) Maintenance is ongoing and it will probably last for a couple of days.
[15:22:29] hi
[15:22:33] I was in the gym
[15:25:43] db2028 is not in the best of the states- we will be lucky if we can boot it back to copy it away
[15:26:01] there it is
[15:26:16] what is wrong with it?
[15:26:30] it is old
[15:26:41] got stuck several times on boot
[15:26:53] had to powercycle it twice to make it boot
[15:26:57] ufff
[15:27:17] but in the end you made it boot no? (I can see it up)
[15:27:28] yes, just now
[15:27:41] any error on boot or just frozen?
[15:28:00] ?????????
[15:28:08] great error
[15:28:09] XDD
[15:28:31] not ? , you know the unicode char of invalid character?
[15:29:23] Aaaah I thought you literally meant: ????? XD
[15:32:35] "depoling" pc2005
[15:33:36] cool :)
[15:53:03] DBA, Operations, ops-codfw, Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3871669 (jcrespo) @papaul pc2005 server is up, but mysql is depooled and down, downtime'd for a day on icinga and can be brought down at any time now
[15:57:33] DBA, Data-Services: Re-institute query killer for the analytics WikiReplica - https://phabricator.wikimedia.org/T183983#3871686 (bd808)
[16:30:48] DBA, Patch-For-Review: Checksum data on s7 - https://phabricator.wikimedia.org/T163190#3871800 (Marostegui) s7 is looking pretty good now after fixing drifts on: text, pagelinks, user_newtalk on a few different wikis. Still a few more tables to double check with compare.py but I reckon we are in a much b...
[17:50:15] DBA: Decommission db2028 - https://phabricator.wikimedia.org/T184090#3872109 (jcrespo) p:Triage>Normal
[18:00:06] DBA, Operations, ops-codfw, Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872212 (Papaul) Moved CPU1 to CPU2 Upgrade IDRAC from version 2.21 to 2.50 Upgrade BIOS from version 2.1.7 to 2.6.0 leaving the task open for now to see if the problem...
[18:02:56] DBA, Operations, ops-codfw, Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872265 (jcrespo) Pooling it back, as it will not be too dangerous.
[18:10:43] DBA, Patch-For-Review: Decommission db2028 - https://phabricator.wikimedia.org/T184090#3872315 (Marostegui) s6 was already checksummed as far as I remember
[18:13:41] DBA, Patch-For-Review: Decommission db2028 - https://phabricator.wikimedia.org/T184090#3872319 (jcrespo) > s6 was already checksummed as far as I remember That is correct: T160509 but who says I didn't break it since then, especially on codfw?
[18:14:33] DBA, Operations, ops-codfw, Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872321 (Marostegui) >>! In T183750#3872212, @Papaul wrote: > Moved CPU1 to CPU2 > Upgrade IDRAC from version 2.21 to 2.50 > Upgrade BIOS from version 2.1.7 to 2.6.0 >...
[18:15:19] DBA, Patch-For-Review: Decommission db2028 - https://phabricator.wikimedia.org/T184090#3872322 (Marostegui) >>! In T184090#3872319, @jcrespo wrote: >> s6 was already checksummed as far as I remember > > That is correct: T160509 but who says I didn't break it since then, specially on codfw? Always a pos...
[18:16:27] DBA, Operations, ops-codfw, Patch-For-Review: pc2005 crashed: CPU2 internal error - https://phabricator.wikimedia.org/T183750#3872328 (Papaul) @Marostegui that works for me
[18:35:45] DBA, Cloud-Services, Toolforge, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#3872373 (jcrespo)
[18:36:24] DBA, Cloud-Services, Toolforge, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#1830725 (jcrespo)
[18:38:58] DBA, Data-Services, Toolforge, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#3872378 (bd808)
[19:15:27] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1050 - https://phabricator.wikimedia.org/T178162#3872550 (Cmjohnson)
[19:15:37] DBA, Operations, Goal, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3872552 (Cmjohnson)
[19:15:42] DBA, Operations, hardware-requests, ops-eqiad, Patch-For-Review: Decommission db1050 - https://phabricator.wikimedia.org/T178162#3682923 (Cmjohnson) Open>Resolved
[22:05:01] DBA, Operations, ops-codfw: db2054: Disk with predictive failure - https://phabricator.wikimedia.org/T183887#3873174 (Papaul) Dear Mr Papaul Tshibamba, Hewlett Packard Enterprise Reference Number: 5325864400 STATUS: Customer Self Repair Part has been shipped Part/s shipped: 653952-001 Part descrip...