[06:02:34] DBA, Labs: Labs database replica drift - https://phabricator.wikimedia.org/T138967#2415416 (Marostegui) Maybe what we can try is to stop sanitarium2 and sanitarium at the same replication position and import the page table (it is around 7.3G).
[06:05:20] DBA, Patch-For-Review: Unify revision table on s2 - https://phabricator.wikimedia.org/T162611#3290966 (Marostegui) db1047 is done and it is trying to catch up after the alters: ``` root@neodymium:~# for i in `cat /home/marostegui/T162611`; do echo $i; mysql --skip-ssl -hdb1047 $i -e "show create table re...
[06:08:14] Blocked-on-schema-change, DBA, MW-1.28-release (WMF-deploy-2016-08-30_(1.28.0-wmf.17)), MW-1.28-release-notes, Patch-For-Review: Clean up revision UNIQUE indexes - https://phabricator.wikimedia.org/T142725#3290969 (Marostegui)
[06:08:17] DBA, Patch-For-Review: Unify revision table on s2 - https://phabricator.wikimedia.org/T162611#3290968 (Marostegui) Open>Resolved
[06:13:39] Blocked-on-schema-change, DBA, MW-1.28-release (WMF-deploy-2016-08-30_(1.28.0-wmf.17)), MW-1.28-release-notes, Patch-For-Review: Clean up revision UNIQUE indexes - https://phabricator.wikimedia.org/T142725#3290973 (Marostegui) s2 is done.
Pending only s3
[06:14:27] Blocked-on-schema-change, DBA, MW-1.28-release (WMF-deploy-2016-08-30_(1.28.0-wmf.17)), MW-1.28-release-notes, Patch-For-Review: Clean up revision UNIQUE indexes - https://phabricator.wikimedia.org/T142725#3290987 (Marostegui)
[06:14:29] Blocked-on-schema-change, DBA: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3290974 (Marostegui)
[06:22:18] Blocked-on-schema-change, DBA, Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s4 - https://phabricator.wikimedia.org/T166206#3290991 (Marostegui)
[06:54:53] DBA, Analytics, Labs, MediaWiki-Page-deletion, Tool-Labs-tools-Database-Queries: Database replication issues with deleted pages (affecting Tool Labs and Analytics Store) - https://phabricator.wikimedia.org/T166194#3290992 (jcrespo) > what's the relation between dbstore1001 and and analytics-s...
[06:57:57] Blocked-on-schema-change, DBA: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3290993 (Marostegui)
[07:01:37] DBA, Labs: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3291032 (jcrespo) @Marostegui - you can do that if you want, but a) it will cause more complains of missing pages and the lag b) it will break in only few days (all tables were reimported for enwiki in a long process that...
[07:11:49] Blocked-on-schema-change, DBA: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3291094 (Marostegui)
[07:17:22] Blocked-on-schema-change, DBA: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3291096 (Marostegui)
[08:02:15] Blocked-on-schema-change, DBA, Patch-For-Review: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3291126 (Marostegui)
[08:22:23] Blocked-on-schema-change, DBA, Patch-For-Review: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3291146 (Marostegui)
[09:02:56] Blocked-on-schema-change, DBA, Patch-For-Review: Unify revision table on s3 - https://phabricator.wikimedia.org/T166278#3291222 (Marostegui)
[09:12:16] DBA, Analytics, Labs, MediaWiki-Page-deletion, Tool-Labs-tools-Database-Queries: Database replication issues with deleted pages (affecting Tool Labs and Analytics Store) - https://phabricator.wikimedia.org/T166194#3291248 (jcrespo) To clarify my previous statement, analytics hosts are conside...
[11:37:57] DBA, Wikimedia-General-or-Unknown, Wikimedia-maintenance-script-run: Run updateRestrictions.php on WMF wikis - https://phabricator.wikimedia.org/T166184#3287810 (Marostegui) Hi, We (DBAs) assume this is just a heads up for us right? We do appreciate it! :-) If you don't mind, once you are ready to...
[13:01:10] DBA, Labs: Labs database corruption - https://phabricator.wikimedia.org/T166091#3291733 (Marostegui) @Legoktm would you mind adding the above comment to: T138967 as per: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database/Replica_drift It is easier for us to track in a single place and we try to...
[13:31:19] DBA, MediaWiki-Database, MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review, and 2 others: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3291792 (Krinkle)
[13:56:33] so actually, 4.9 worked ok on db2055
[13:58:36] Interesting...
[13:58:43] So, maybe it is indeed the firmware
[13:59:09] or the initial install, lacking some things
[13:59:19] are you going to try a reimage?
[13:59:21] I would try db2048
[13:59:30] sure
[14:02:49] however, I have performance problems
[14:03:57] ?
[14:04:13] https://grafana.wikimedia.org/dashboard/db/mysql?panelId=40&fullscreen&orgId=1&var-dc=codfw%20prometheus%2Fops&var-server=db2055
[14:04:17] lag is flat
[14:04:30] normally it recovers very slowly
[14:04:46] but at least goes down from the beginning
[14:05:30] it is now going down, but it took a lot
[14:06:24] you think it can be the kernel?
[14:06:36] it could be the mariadb version, too
[14:06:44] I upgraded both
[14:06:50] ah
[14:06:50] or the hard
[14:06:50] haha
[14:06:57] I checked the bbu and disks
[14:06:59] and look fine
[14:07:07] maybe try upgrading only the kernel or the mariadb version on db2048
[14:07:25] I cannot
[14:07:34] it installs the latest version by default
[14:07:38] ah right
[14:08:13] perf: interrupt took too long (4310 > 2500), lowering kernel.perf_event_max_sample_rate to 46250
[14:08:26] this is the kind of messages we had when we had io issues
[14:09:27] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[14:09:54] could it be that?
[14:10:00] firmware + kernel
[14:10:44] someone with similar issues is suggesting a BIOS update
[14:11:20] then there is this comment: https://bugzilla.redhat.com/show_bug.cgi?id=787126#c12
[14:12:04] perf: interrupt took too long (4310 > 2500), lowering kernel.perf_event_max_sample_rate to 46250 -> for me that is "normal"
[14:12:13] like I have seen that before
[14:12:23] yes
[14:12:33] I am searching the bios error above
[14:13:47] https://phabricator.wikimedia.org/T136345
[14:14:51] https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/HP_DL3N0_Gen9
[14:16:26] DBA, Collaboration-Team-Triage, Flow, Patch-For-Review, Schema-change: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#3292006 (Krinkle)
[14:17:08] we should try yes
[14:18:02] I have commented on https://phabricator.wikimedia.org/T165739
[14:18:07] yeah, just saw it
[14:18:12] Do you want to:
[14:18:15] 1) try that
[14:18:18] and test?
[14:18:22] or include the firmware upgrade too?
[14:18:35] I don't know
[14:18:40] whatever it works
[14:18:59] *whatever works
[14:19:00] I would try the power management on db2055 to check if the performance gets better
[14:19:14] And for db2049 the firmware upgrade only (for now)
[14:19:42] But up to you :)
[14:19:54] it could be me just accustomed to ssds
[14:20:11] Oh, right, this one doesn't have ssds!
[14:20:17] * marostegui used to ssds too
[14:21:57] did Krinkle just ping @springle ?
[14:22:25] DBA, MediaWiki-Platform-Team, MediaWiki-extensions-Linter, Wikimedia-Extension-setup, and 3 others: Review and deploy Linter extension to Wikimedia wikis - https://phabricator.wikimedia.org/T148609#3292066 (Krinkle)
[14:22:48] DBA, MediaWiki-Platform-Team, MediaWiki-extensions-Linter, Wikimedia-Extension-setup, and 2 others: Review and deploy Linter extension to Wikimedia wikis - https://phabricator.wikimedia.org/T148609#2727712 (Krinkle)
[15:04:03] 100% sure there is a problem with db2055
[15:04:06] https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&from=now-24h&to=now&var-dc=codfw%20prometheus%2Fops&var-server=db2055
[15:04:24] I am going to restart with the other kernel and see
[15:04:54] without ssds that is not even normal no
[15:05:22] it has now a 57GB buffer pool
[15:06:15] DBA, MediaWiki-Database, MediaWiki-Recent-changes: Special:RecentChanges can show recently created page as red link - https://phabricator.wikimedia.org/T129399#3292197 (Krinkle) Open>Invalid Closing for now until we can reproduce this again.
[16:22:30] DBA, Analytics-Kanban, Operations: db1046 BBU looks faulty - https://phabricator.wikimedia.org/T166141#3292531 (Nuria) p:Triage>High
[16:27:00] DBA: db2055 mariadb weirdness - https://phabricator.wikimedia.org/T166326#3292559 (jcrespo)
[16:27:17] DBA: db2055 mariadb weirdness - https://phabricator.wikimedia.org/T166326#3292573 (jcrespo) Trying now to downgrade mariadb.
[16:33:05] DBA, Analytics, Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3292614 (Nuria)
[16:35:54] DBA, Analytics, Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3292618 (Nuria) Ping @Marostegui: do we have an ETA on when these wikis will be available on labs new db hosts? Ping @Neil_P._Quinn_WMF this data is on production snapshot (we are do...
[16:39:53] DBA, Analytics, Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3260779 (jcrespo) > do we have an ETA on when these wikis will be available on labs new db hosts? We are thinking by the end of FQ1, according to our roadmap.
[16:41:46] DBA: db2055 mariadb weirdness - https://phabricator.wikimedia.org/T166326#3292640 (jcrespo) Downgraded didn't work, not trying `SET GLOBAL innodb_flush_log_at_trx_commit=0, sync_binlog = 0;`
[16:43:06] DBA: db2055 mariadb weirdness -lagging after reboot and upgrade - https://phabricator.wikimedia.org/T166326#3292642 (jcrespo)
[16:47:00] DBA, Analytics, Chinese-Sites: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3292656 (Nuria) Ok, so we can plan on this data being available in September, correct? Until then we will continue taking snapshots of production and labs.
[16:47:34] DBA: db2055 mariadb weirdness -lagging after reboot and upgrade - https://phabricator.wikimedia.org/T166326#3292657 (jcrespo) There seems to be an ongoing checksum on enwiki- this is not an issue and should not be canceled, but it basically makes it impossible to reliable know why this could be happening, so...
[16:47:44] DBA: db2055 mariadb weirdness -lagging after reboot and upgrade - https://phabricator.wikimedia.org/T166326#3292658 (jcrespo) p:Triage>High a:jcrespo
[16:49:29] DBA: db2055 mariadb weirdness -lagging after reboot and upgrade - https://phabricator.wikimedia.org/T166326#3292666 (Marostegui) I noticed this from the RAID controller: ``` Firmware Version: 7.02 ``` Whereas for example, db2054, db2056 and db2057: ``` Firmware Version: 6.00 ```
[16:51:29] DBA: db2055 mariadb weirdness -lagging after reboot and upgrade - https://phabricator.wikimedia.org/T166326#3292669 (jcrespo) Lag is going down, let's wait until tomorrow and see how it is going.
[17:01:22] DBA: db2055 mariadb weirdness -lagging after reboot and upgrade - https://phabricator.wikimedia.org/T166326#3292680 (jcrespo) p:High>Normal
[19:38:11] DBA, Labs: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3293067 (russblau) @jcrespo: Is there (or will there be) a way to access and create user databases on the new servers (as per https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#User_databases)?
[20:39:33] DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3293244 (Marostegui)
[20:40:29] DBA, Operations, ops-eqiad: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3293259 (Marostegui) p:Triage>Normal
[21:12:52] DBA, Operations, ops-eqiad: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3293244 (Volans) I've ack'ed the Icinga alarm with this task. I've also forced a BBU learn cycle on db1016, it was looking good during the cycle, and as soon as the battery was having some c...
[23:13:15] DBA, Tool-Labs-tools-Other, Patch-For-Review: Tired of APIError: readonly - https://phabricator.wikimedia.org/T164191#3293594 (Multichill)