[05:12:48] DBA, Operations, ops-codfw: replace bad disk in db2059 - https://phabricator.wikimedia.org/T196709#4266499 (Marostegui) @Papaul feel free to replace the disk as soon as you get it
[05:44:11] DBA, Operations, ops-codfw, Patch-For-Review: pc2005 down - https://phabricator.wikimedia.org/T196339#4266537 (Marostegui) pc2005 caught up - I will wait until Monday to make sure it is stable before repooling it and closing this ticket
[06:35:33] DBA: db1115 (tendril DB) crashed with OOM - https://phabricator.wikimedia.org/T196726#4266561 (Marostegui)
[06:36:04] DBA: db1115 (tendril DB) crashed with OOM - https://phabricator.wikimedia.org/T196726#4266571 (Marostegui)
[06:48:27] DBA: db1115 (tendril DB) crashed with OOM - https://phabricator.wikimedia.org/T196726#4266581 (Marostegui) p:Triage>Normal Nothing on HW logs - I checked them just in case there was something.
[07:02:23] DBA: db1115 (tendril DB) crashed with OOM - https://phabricator.wikimedia.org/T196726#4266598 (Marostegui)
[07:02:55] DBA: db1115 (tendril DB) crashed with OOM - https://phabricator.wikimedia.org/T196726#4266561 (Marostegui) Just to be clear, MySQL didn't crash. ``` mysql:root@localhost [(none)]> show global status like 'Uptime'; +---------------+---------+ | Variable_name | Value | +---------------+---------+ | Uptime...
[07:06:51] DBA: db1115 (tendril DB) had OOM for some processes - https://phabricator.wikimedia.org/T196726#4266601 (Marostegui)
[07:32:00] DBA: db1115 (tendril DB) had OOM for some processes - https://phabricator.wikimedia.org/T196726#4266624 (Marostegui) Open>Resolved a:Marostegui I am going to close this, as there is not much else I can debug to see what caused it.
At least we have it tracked now and can re-open it if it happens again
[07:56:30] Blocked-on-schema-change, DBA, Patch-For-Review: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737#4266660 (Marostegui)
[08:02:40] tendril was swapping since 22 May
[08:02:49] I said that in the task, yes
[08:03:02] and it is constantly leaking memory
[08:03:17] tokudb maybe?
[08:03:48] DBA, Operations, ops-codfw, Patch-For-Review: pc2005 down - https://phabricator.wikimedia.org/T196339#4266663 (jcrespo) p:High>Normal
[08:04:18] maybe toku, could be
[08:04:21] it could be mysql
[08:04:36] I set it as oom kill -1000 or something
[08:04:42] so it doesn't crash
[08:04:44] https://jira.mariadb.org/browse/MDEV-13403
[08:04:46] which seems to work
[08:04:52] it says 10.2
[08:04:59] but who knows if this also affects 10.1
[08:06:11] comments say possibly affecting any non-innodb engine
[08:06:35] marostegui: when you say mysql didn't crash
[08:06:48] did you restart it at least?
[08:06:57] no, I didn't touch it
[08:07:06] then it isn't the mysql process
[08:07:20] because the memory pressure went away
[08:07:31] Yeah, that was my reasoning too
[08:07:42] But there are not many things left on that server
[08:07:53] I guess it could be tendril the app
[08:07:58] Too bad we don't have atop :p
[08:08:01] To see what was the issue
[09:19:44] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idwikimedia - https://phabricator.wikimedia.org/T193187#4266784 (Marostegui) >>! In T193187#4264531, @Bstorm wrote: > Surprisingly, the script has failed on the rights to create `_p` the database (could not execute the `CREATE DA...
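The setting mentioned at 08:04 ("oom kill -1000 or something", apparently for mysqld) most likely refers to the Linux per-process knob `/proc/<pid>/oom_score_adj`; that is an assumption, since the log does not say which mechanism was used. It accepts values from -1000 to 1000, and -1000 disables OOM killing for that process. The Python sketch below reads and writes it. The helper names and the `proc_root` parameter (added only so the code can be exercised against a scratch directory) are hypothetical and not part of any Wikimedia tooling.

```python
import os

# Documented range for /proc/<pid>/oom_score_adj on Linux.
OOM_SCORE_ADJ_MIN = -1000  # -1000 disables OOM killing for the task
OOM_SCORE_ADJ_MAX = 1000


def oom_adj_path(pid: int, proc_root: str = "/proc") -> str:
    """Path of the oom_score_adj file for a process (hypothetical helper)."""
    return os.path.join(proc_root, str(pid), "oom_score_adj")


def read_oom_score_adj(pid: int, proc_root: str = "/proc") -> int:
    """Return the current OOM score adjustment of a process."""
    with open(oom_adj_path(pid, proc_root)) as f:
        return int(f.read().strip())


def set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write a new adjustment after validating the range.

    On a real /proc, lowering the value typically requires root
    (CAP_SYS_RESOURCE).
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    with open(oom_adj_path(pid, proc_root), "w") as f:
        f.write(f"{value}\n")
```

On a real host, calling `set_oom_score_adj(<mysqld pid>, -1000)` as root would apply the adjustment until the process restarts; for a service started by systemd, the unit's `OOMScoreAdjust=` option is the persistent equivalent. With mysqld exempt, the kernel OOM killer has to pick other processes on the host instead, which is consistent with MySQL surviving while "some processes" were killed.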
[09:41:47] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idwikimedia - https://phabricator.wikimedia.org/T193187#4266857 (Marostegui) I got it from executing what you were executing: ``` root@labsdb1010:/home/bstorm# sudo /usr/local/sbin/maintain-views --databases idwikimedia Traceback...
[12:48:53] DBA, Phabricator, Documentation, Release-Engineering-Team (Kanban): Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572#4266986 (jcrespo)
[12:50:22] DBA, Phabricator, Documentation, Release-Engineering-Team (Kanban): Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572#4266990 (mmodell) note: the steps are a bit different for failing over between data cen...
[12:57:29] DBA, Phabricator, Documentation, Release-Engineering-Team (Kanban): Prepare a disaster recovery plan for failing over from phab1001 to phab2001 (or phab2001 to 1001) - https://phabricator.wikimedia.org/T190572#4266998 (mmodell)
[13:53:11] marostegui: jynus: Hey, if I want to make a column default to zero (instead of not having any default at all) and the table is empty everywhere in production, is it easy to do?
[13:54:06] yep, should be easy
[13:54:11] (as it is empty)
[13:54:36] yup, that's for sure
[13:54:48] Thanks!
[13:55:03] :
[13:55:05] :)
[13:55:20] is the table used?
[13:55:26] i assume not
[13:55:36] So maybe it is easier to drop+create the table again?
[14:14:40] DBA, Operations, decommission, ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736#4267113 (Marostegui)
[14:17:32] DBA, Wiki-Loves-Monuments-Database: mysqldump is timing out preventing all tables from being included in the dump - https://phabricator.wikimedia.org/T138517#4267115 (Marostegui) Open>Resolved Going to resolve this as per T138517#3051262.
Feel free to reopen if needed
[14:18:47] DBA, Data-Services: Add statistics table to information_schema_p - https://phabricator.wikimedia.org/T196570#4267118 (Marostegui) p:Triage>Normal
[15:10:28] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267274 (Urbanecm)
[15:11:37] DBA, Operations, decommission, ops-eqiad: Decommission db1056 - https://phabricator.wikimedia.org/T193736#4267289 (RobH)
[15:11:39] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267274 (Marostegui) Let us know when this is created, we need to create the filters for it, restart sanitariums and sanitize it
[15:12:28] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267293 (Urbanecm) This will cause a deadlock :). This is a private wiki and private wikis cannot be created without green light from DBAs.
[15:12:52] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267296 (Urbanecm) a:Urbanecm>None Eh, assigned to myself by mistake.
[15:13:52] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267301 (Marostegui) a:Marostegui >>! In T196748#4267293, @Urbanecm wrote: > This will cause a deadlock :). This is a private wiki and private wikis cannot be cre...
[15:34:41] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267382 (Marostegui) I have merged the patch and restarted MySQL on codfw sanitariums, the filter has been applied there. On Monday I will resta...
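The 13:53 question about giving a column a default of 0 on a table that is empty everywhere has two natural routes: an in-place `ALTER TABLE ... ALTER COLUMN ... SET DEFAULT 0` on MariaDB, or, as suggested at 13:55, dropping and re-creating the table since it holds no rows. The Python sketch below illustrates only the drop-and-recreate route, with a guard that refuses to drop a non-empty table. It uses the standard-library `sqlite3` module as a stand-in for MariaDB, and the table and column names (`example_table`, `counter`) are hypothetical, not the schema discussed in the channel.

```python
import sqlite3

# Hypothetical schema used only for illustration (SQLite syntax, not MariaDB).
NEW_DDL = (
    "CREATE TABLE example_table ("
    " id INTEGER PRIMARY KEY,"
    " counter INTEGER NOT NULL DEFAULT 0)"
)
# On MariaDB the in-place alternative would be:
#   ALTER TABLE example_table ALTER COLUMN counter SET DEFAULT 0;


def recreate_with_default(conn: sqlite3.Connection) -> None:
    """Drop and re-create example_table so `counter` defaults to 0.

    Only safe because the table is checked to be empty first.
    """
    (rows,) = conn.execute("SELECT COUNT(*) FROM example_table").fetchone()
    if rows:
        raise RuntimeError(f"example_table has {rows} rows; refusing to drop it")
    conn.execute("DROP TABLE example_table")
    conn.execute(NEW_DDL)
    conn.commit()


# Usage: start from the old definition (no default), then swap it out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (id INTEGER PRIMARY KEY, counter INTEGER NOT NULL)")
recreate_with_default(conn)
conn.execute("INSERT INTO example_table (id) VALUES (1)")
print(conn.execute("SELECT counter FROM example_table WHERE id = 1").fetchone()[0])  # 0
```

The emptiness check is the important part of this route: unlike the `ALTER`, a `DROP TABLE` on a table that turned out to hold rows would lose them.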
[15:34:43] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267379 (Marostegui) p:Triage>Normal I have merged the patch and restarted MySQL on codfw sanitariums, the filter has been applied there...
[15:40:07] DBA, Operations, Traffic, Patch-For-Review: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462#4267398 (Marostegui) p:Triage>Normal
[15:43:01] DBA, Patch-For-Review: Failover s2 primary master - https://phabricator.wikimedia.org/T194870#4267418 (Marostegui) db1066 candidate master has been rebooted to pick up the intel-microcodes before the failover
[15:45:13] DBA, Patch-For-Review: Decommission db1051-db1060 (DBA tracking) - https://phabricator.wikimedia.org/T186320#4267422 (Marostegui) To sum up the pending active hosts: db1054 (s2 primary master): Failover scheduled T194870; db1052 (s1 primary master): Will be failed over after s2 one
[15:50:50] DBA, Epic, Wikimedia-Incident: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562#4267426 (Marostegui)
[15:53:02] DBA, Data-Services: Re-institute query killer for the analytics WikiReplica - https://phabricator.wikimedia.org/T183983#4267429 (Marostegui) This is still not solved. I have asked upstream to see if there is any update on when they expect to release a fix
[15:56:21] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267432 (Urbanecm) Ok, thanks!
[16:03:18] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for idprivatewikimedia - https://phabricator.wikimedia.org/T196748#4267274 (jcrespo) So, I think everybody knows what the process is like and it is documented, but just to clarify in case someone else reads this...
[16:59:54] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idwikimedia - https://phabricator.wikimedia.org/T193187#4267569 (Bstorm) This is a different user. The maintainviews user doesn't have rights to create a database (which is surprising to me). There is no error in creating views...
[17:15:39] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idwikimedia - https://phabricator.wikimedia.org/T193187#4267618 (Marostegui) >>! In T193187#4267569, @Bstorm wrote: > > > *edit* perhaps that's the problem. I think the grant should come before the create? I'm going to try tha...
[17:53:17] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267702 (Urbanecm)
[17:53:42] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267274 (Urbanecm) @Marostegui Sorry, but I've made a mistake in the DB name. It should be id_privatewikimedia, not idprivatewikimedia. Can you...
[17:54:42] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267705 (Marostegui) Thanks for the heads up. I will do it on Monday
[17:55:30] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267708 (Urbanecm) Thank you and sorry again.
[17:55:32] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267709 (jcrespo) We don't tend to like databases with _ because that is a special character, and while it is completely legal and allowed, somet...
[17:59:53] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267719 (Urbanecm) @jcrespo I'm pretty sure, arbcom-cs.wikipedia.org is named arbcom_cswiki, noboard-chapters.wikimedia.org is named noboard_ch...
[18:01:12] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267721 (Urbanecm) If there are scripts that have problems with legal characters, the scripts should be rewritten IMHO. For example, I really c...
[18:02:45] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267726 (jcrespo) @Urbanecm I don't disagree, but sadly, those are related to scripts run by wikireplica users, so nothing we can control, but...
[18:09:36] DBA, Cloud-Services, Patch-For-Review, User-Urbanecm: Prepare and check storage layer for id_privatewikimedia - https://phabricator.wikimedia.org/T196748#4267757 (Urbanecm) Oh, ok, understood. I thought you were referring to maintenance scripts in ME. BTW, this is a private wiki and having it acce...
[19:17:00] DBA, Cloud-Services, User-Urbanecm: Prepare and check storage layer for idwikimedia - https://phabricator.wikimedia.org/T193187#4267930 (Bstorm) Ok, that did not work. If anything is run that will create the `idwikimedia_p` database, the script runs fine. However, it cannot ru...
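jcrespo's 17:55 comment is truncated, so the exact scripts he had in mind are unknown. One well-documented reason `_` is awkward in database names is that in SQL `LIKE` patterns, and in database-level `GRANT` statements on MySQL/MariaDB, `_` is a wildcard matching any single character. A pattern written for `id_privatewikimedia` therefore also matches a name like `idxprivatewikimedia` unless the underscore is escaped. The Python sketch below shows this using the standard-library `sqlite3` `LIKE` as a stand-in for MariaDB; the `escape_like` and `like_matches` helpers are hypothetical.

```python
import sqlite3


def escape_like(name: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards (% and _) and the escape character itself."""
    return (
        name.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def like_matches(conn: sqlite3.Connection, value: str, pattern: str, escaped: bool) -> bool:
    """Evaluate `value LIKE pattern`, optionally with a backslash ESCAPE clause."""
    if escaped:
        row = conn.execute("SELECT ? LIKE ? ESCAPE '\\'", (value, pattern)).fetchone()
    else:
        row = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
raw = "id_privatewikimedia"
safe = escape_like(raw)  # id\_privatewikimedia

# Unescaped: '_' matches any single character, so an unrelated name matches too.
print(like_matches(conn, "idxprivatewikimedia", raw, escaped=False))  # True
# Escaped: only the literal name matches.
print(like_matches(conn, "idxprivatewikimedia", safe, escaped=True))  # False
print(like_matches(conn, "id_privatewikimedia", safe, escaped=True))  # True
```

In statements that treat the name as a literal, such as `USE` or `CREATE DATABASE`, the underscore needs no escaping; it only matters where the name is read as a pattern, such as `SHOW DATABASES LIKE ...` or a database-level `GRANT`.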
[20:08:13] DBA, Wiki-Loves-Monuments-Database: mysqldump is timing out preventing all tables from being included in the dump - https://phabricator.wikimedia.org/T138517#4268118 (Lokal_Profil) >>! In T138517#4193681, @Marostegui wrote: > @Lokal_Profil is there anything pending here? I honestly cannot remember @Jean...