[02:36:03] DBA, Labs: Querying the logging table on labs is slow - https://phabricator.wikimedia.org/T131266#2829625 (MZMcBride) >>! In T131266#2272853, @MZMcBride wrote: > Something is funky with the views for `logging` and `logging_userindex` specifically, it seems. I rediscovered this issue over the weekend. `...
[06:39:10] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2829733 (Marostegui) @Papaul I have depooled db2048, so we can use that one to swap the PSUs with. Ping me once you are online so I can do the pre work to get it ready Thanks!
[06:43:58] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure, Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2829736 (Marostegui) The replication for both threads had no issues during the night so what...
[07:31:16] DBA, Labs, Labs-Infrastructure: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2829832 (jcrespo) Yes, I wanted to set something like 10 max connections for the web reque...
[10:25:32] DBA, User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2830130 (jcrespo) p:Triage>Normal
[13:10:46] I think there is no ongoing process on db1095: T150802#2829736
[13:10:46] T150802: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802
[13:12:11] I am going to restart db1095 to unblock T151752
[13:12:11] T151752: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752
[13:23:30] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure, Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2830561 (jcrespo) >>! In T150802#2829736, @Marostegui wrote: > @jcrespo thanks for the detai...
[13:23:51] DBA, User-Urbanecm: Prepare and check storage layer for the future private wiki arbcom-cs.wikipedia.org - https://phabricator.wikimedia.org/T151752#2826725 (jcrespo) Open>Resolved
[13:28:01] DBA, Labs: Prepare and check storage layer for new fi.wikivoyage.org - https://phabricator.wikimedia.org/T151756#2830575 (jcrespo) Standing by until the database is actually created: ``` $ mysql --socket=/tmp/mysql.s3.sock fiwikivoyage ERROR 1049 (42000): Unknown database 'fiwikivoyage' ```
[14:18:41] jynus: you needed db1095 for the private wiki right?
[14:18:53] The sanitize script is done, so maybe you can use db1095 now before I start compressing enwiki
[14:20:19] see SAL/log here
[14:20:34] and https://phabricator.wikimedia.org/T150802#2830561
[14:21:09] \o/
[14:21:19] awesome
[14:21:20] thanks
[14:22:09] if you are around, check the many puppet reviews I sent
[14:22:22] they are all labsdb&sanitarium related
[14:24:55] yeah, going to do it now
[14:26:50] hey gents is labsdb1009 the "test" box for poking at user roles and create-dbusers testing?
[14:27:12] yes, you can use that
[14:27:35] but I do not know if that has yet any account
[14:27:42] it is empty now
[14:28:22] kk
[14:28:25] but I tested https://phabricator.wikimedia.org/T149933#2820524 there
[14:28:50] I can create a labsdbadmin user now
[14:29:27] actually, it exists already
[14:30:03] and I was testing things with u2029, which supposedly is quarry
[14:30:35] there is a role called labsdbuser there
[14:32:25] 324040 -> jynus very nice work!
[14:32:58] that took some time
[14:33:39] I would deprecate redactatron
[14:33:52] and deploy redact.sh with puppet
[14:36:21] BTW, chase mentioned possibly meeting this friday in our afternoon
[14:36:25] to sync up
[14:36:40] that is fine by me
[14:36:44] I want to mention some decisions about the proxies and user accounts
[14:37:10] mention *to take* those decisions
[14:38:15] as chase is around, I am going to deploy 324040 now
[14:38:27] so if something is wrong, we can fix it
[14:40:02] sure thing
[14:40:23] https://d30y9cdsu7xlg0.cloudfront.net/png/115782-200.png
[14:41:11] well, the good news is nothing there is automatic
[14:41:24] worst case scenario, we fail to deploy some .py files
[14:53:34] chase should test a noop with the views (no rush) to make sure I haven't broken its deploy
[14:54:51] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2830926 (Marostegui) Alter is running on dbstore1001
[15:00:12] DBA, Phabricator, Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2830948 (Marostegui) @epriestley amazing! Thanks a lot for that!! @mmodell @daniel let me know if you guys want me to take a backup (and if possible from which tables) befor...
[15:02:10] sure
[15:02:14] puppet is clean everywhere?
[15:02:18] yes
[15:02:21] well
[15:02:28] actually I didn't run it on labs
[15:02:31] but it should be
[15:02:33] let me do it now
[15:03:26] actually no
[15:03:53] I know what it is
[15:13:51] seems like recovery :)
[15:14:36] yes
[15:14:38] sorry
[15:14:52] a bit of a confusion between filtered_tables and filtered_columns
[15:15:01] creating more noise than it should, fixed now
[15:15:04] you can test
[15:42:10] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2831134 (Marostegui) db2034 and db2048 are now off.
[16:10:12] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure, Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2831241 (jcrespo) This now "just works" (TM)- no need for passwords, hosts or anything: ```...
[16:15:58] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2831255 (Marostegui) Papaul has swapped the PSU so I have started the first test to try to crash db2034 which has now "the new PSU" from db2048.
[16:16:54] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure, Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2831261 (Marostegui) >>! In T150802#2831241, @jcrespo wrote: > This now "just works" (TM)- n...
[16:21:17] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure, Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2831274 (jcrespo) Note wiki list are already locally there, too. To get a list of public wik...
[16:25:03] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure, Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2831293 (Marostegui) The following queries in `enwiki` gave no records so the first iteratio...
[16:48:46] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2831376 (Marostegui) db2034 just crashed.
[16:53:16] :-(
[16:53:33] RAID probably...
[16:54:20] let me just drop a little <3 here for the dbas
[16:54:41] XDD
[16:57:11] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2831399 (Marostegui) There were no logs this time. I have cleared them via ILO and will start the process again to see if we get the same PSU related log again.
[17:20:29] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2831505 (Marostegui) We have decided to change the PDU sockets it is connected to, we will see what happens now.
[17:33:01] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2831532 (Papaul) ps1-c6-codfw log 1 Not Available EVENT: TCP/IP stack has started 2 Not Available EVENT: System boot complete 3 Not Available EVENT: Humidity sensor "Temp_Hum...
[18:03:27] DBA, Datasets-General-or-Unknown, Labs, Labs-Infrastructure, Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2831764 (Marostegui) The server caught up and I can see new records coming in and being sani...
[18:40:17] DBA: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2831979 (Marostegui) The following hosts need the following alter: ``` ./software/dbtools/osc_host.sh --host=xxxx --port=3306 --db=wikidatawiki --table=revision --method=ddl --no-replicate "DROP index rev_id, add...
[18:57:41] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2832053 (Marostegui) The server crashed again, although this time it took 1:33h to crash it, which is a lot longer than usual. This time we switched its PDU sockets. Recap: - main board replaced...
[19:03:12] DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2832074 (mmodell)
[19:59:40] Blocked-on-schema-change, DBA, Wikimedia-Site-requests, Wikisource, and 2 others: Schema change for page content language - https://phabricator.wikimedia.org/T69223#2832341 (Nemo_bis)
[20:03:41] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2832385 (Marostegui) I have left 43 instances of cpuburn running on db2034 to see if it crashes the server. So far we have only seen crashes while having high IO operations - Alter table on a big...
[21:05:13] DBA, Operations, ops-codfw: db2041: Disk RAID predictive failure - https://phabricator.wikimedia.org/T151203#2832766 (Papaul) Dear Mr Papaul Tshibamba, Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below...