[00:04:51] 10DBA, 10MediaWiki-General-or-Unknown, 06Operations, 13Patch-For-Review: img_metadata queries for PDF files saturates s4 slaves - https://phabricator.wikimedia.org/T147296#2840357 (10aaron) 05Open>03Resolved a:03aaron Makes sense. [03:10:22] 10DBA, 10Wikimedia-General-or-Unknown, 07Regression, 07WorkType-Maintenance: User skin preference for MonoBook changed to default (Vector) on small and medium wikis (about 95% of wikis) - https://phabricator.wikimedia.org/T114208#2840767 (10Krinkle) [07:24:37] 10DBA, 13Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2840907 (10Marostegui) And the server crashed ``` severity=Critical date=12/01/2016 time=16:53 description=System Power Fault Detected (XR: 14 00 MID: FF 4D FC CE C0 FF FF 32 32 0C... [07:26:07] 10DBA, 06Operations, 10ops-codfw: db2041: Disk RAID predictive failure - https://phabricator.wikimedia.org/T151203#2840908 (10Marostegui) The RAID rebuilt correctly ``` root@db2041:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 001438031205DF0) Gen8 ServBP 12+2 a... [07:26:15] 10DBA, 06Operations, 10ops-codfw: db2041: Disk RAID predictive failure - https://phabricator.wikimedia.org/T151203#2840909 (10Marostegui) 05Open>03Resolved [07:28:37] 10DBA, 06Collaboration-Team-Triage, 10Flow, 13Patch-For-Review, and 2 others: Drop flow_subscription table - https://phabricator.wikimedia.org/T149936#2840911 (10Marostegui) Thanks! We will take it from here! [07:39:48] 10DBA: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2840913 (10Marostegui) dbstore2002 is done ``` root@neodymium:~# mysql -hdbstore2002.codfw.wmnet -A wikidatawiki -e "show create table revision\G" *************************** 1. row ***************************... 
[08:56:54] 10DBA, 06Operations: Restart pending mysql hosts with old TLS cert - https://phabricator.wikimedia.org/T152188#2841012 (10jcrespo) [09:00:35] how's compression doing? [09:01:03] not bad [09:01:09] it is still with sXX wikis [09:01:23] srwiki at the moment [09:01:48] so 80% [09:03:11] did you check is centralauth was sanitized? [09:03:13] *if [09:03:20] it's the odd one out [09:04:06] centralaouth is s7 right? [09:04:10] we have not imported s7 [09:04:23] true, I am silly [09:04:33] I can import it [09:04:40] no no [09:04:47] I was only asking [09:04:59] because it is the only non-wiki db we should replicate [09:05:07] well, that and heartbeat [09:06:19] ah true, it is a special one [09:07:09] it was more of a self-reminder [09:07:25] we also need to load the events, as you pointed out couple of days ago [09:07:30] true [09:08:45] the ones generating the information_schema_p database I just learned about 3 days ago [09:11:38] i can't find the comment now! [09:12:34] ah found it now [09:13:12] one thing I could do to "fix" the check [09:13:34] is perform some kind of strip on the columns to eliminate the \0 [09:14:09] or check the type of the column, and change the check depending on it [09:14:15] yeah yesterday I was thinking about that [09:14:27] yes, I thought that you can check if depends on the column and then do a SELECT HEX [09:14:30] and check if it is 0 [09:14:35] and if it is, it is good [09:14:55] NULL => NULL, NOT NULL && int => 0, NOT NULL && varchar => '' [09:15:20] NOT NULL AND CHAR/BINARY => repeat(\0) [09:15:24] etc [09:15:54] yeah, something like that [09:16:04] otherwise I don't know how to check it [09:16:32] we can do that at a later time [09:16:54] yes, it is not blocking at all [09:19:30] 10DBA, 13Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2841091 (10Marostegui) Running alter db2045 [10:26:18] 10DBA, 13Patch-For-Review: Wikidatawiki revision table needs unification - 
https://phabricator.wikimedia.org/T150644#2841142 (10Marostegui) db2045 is done ``` root@neodymium:~# mysql -hdb2045.codfw.wmnet -A wikidatawiki -e "show create table revision\G" *************************** 1. row ********************... [10:39:44] any preference for which labsdb receives the new data from db1095 first? [10:39:44] (it is not yet finished, but I am creating the new ticket) [10:39:45] well, chase & me was testing labsdb1009 [10:39:45] so any of the other 2 [10:39:45] roger that - thanks! [10:39:45] I noticed one thing [10:39:45] https://phabricator.wikimedia.org/T152188 [10:39:57] we need to update on puppet the TLS options [10:40:03] for the new labs hosts [10:40:18] ah [10:40:38] it is not a priority [10:40:50] but probably can be done easily on next restart [10:40:59] we can do it once they get the data and get that away from us [10:41:16] yes [10:51:49] 10DBA, 06Labs, 10Labs-Infrastructure: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2841178 (10Marostegui) [10:52:33] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841193 (10Marostegui) [10:52:35] 10DBA, 06Labs, 10Labs-Infrastructure: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2841192 (10Marostegui) [10:57:41] 10DBA, 06Labs, 10Labs-Infrastructure: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2841202 (10Marostegui) [11:00:09] 10DBA, 06Labs, 10Labs-Infrastructure: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2841205 (10Marostegui) [11:05:59] 10DBA, 13Patch-For-Review: Fix PK on S5 dewiki.revision - 
https://phabricator.wikimedia.org/T148967#2841213 (10Marostegui) db1071 is done: ``` root@neodymium:~# mysql -hdb1071 -A dewiki -e "show create table revision\G" *************************** 1. row *************************** Table: revision Crea... [11:35:37] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841276 (10Marostegui) Compression is done. s1 and s3 are now compressed (1.3T in total). I... [11:36:49] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841278 (10jcrespo) Let's load the events and close this as resolved. [11:37:15] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841279 (10jcrespo) a:03Marostegui [11:38:17] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841280 (10Marostegui) Yes, I just wanted to load the events once the first data is copied ove... [11:43:07] 10DBA, 06Labs, 10Labs-Infrastructure: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2841317 (10Marostegui) The transfer from db1095 to labsdb1010 has just started [11:45:40] 10DBA, 13Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2841321 (10Marostegui) db2052 is done ``` root@neodymium:~# mysql -hdb2052.codfw.wmnet -A wikidatawiki -e "show create table revision\G" *************************** 1. 
row ********************... [11:54:36] 10DBA, 06Operations, 13Patch-For-Review: Restart pending mysql hosts with old TLS cert - https://phabricator.wikimedia.org/T152188#2841327 (10jcrespo) a:03jcrespo [12:19:25] 10DBA, 06Operations: install/deploy dbproxy1003 through dbproxy1011 - https://phabricator.wikimedia.org/T86958#2841391 (10jcrespo) 05Open>03Resolved a:03jcrespo This was done long time ago, although more work is probably needed in the future. [12:36:57] dbproxy1010 and dbproxy1011 now alert when labsdb1009/10/11 are down [12:37:07] only to IRC, but just a heads up [13:18:49] Ah, thanks should we silence them when operating labs servers? [13:19:35] I acked them for now [13:19:55] this is something new-ish [13:58:27] at what time is the meeting with chasemp? [13:58:39] when he is available [13:58:50] his morning, so when he connects [14:01:18] thanks, I was wondering if you guys set any specific time, but that is fine [14:01:41] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2841520 (10jcrespo) [14:06:58] 10DBA, 06Operations, 13Patch-For-Review: Restart pending mysql hosts with old TLS cert - https://phabricator.wikimedia.org/T152188#2841566 (10jcrespo) p:05Normal>03Low [14:32:50] 10DBA, 13Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2841595 (10Marostegui) db2059 is done ``` root@neodymium:~# mysql -hdb2059.codfw.wmnet -A wikidatawiki -e "show create table revision\G" *************************** 1. row ********************... [14:34:50] 10DBA, 07Epic: Decouple roles from mariadb.pp into their own file - https://phabricator.wikimedia.org/T150850#2841597 (10jcrespo) [14:38:05] I grew tired of updating 100 templates, so I started https://gerrit.wikimedia.org/r/324915 [15:07:00] jynus: Im' about but somehwat caught up in something? irc meeting? 
I saw https://gerrit.wikimedia.org/r/#/c/324905/ and makes sense [15:07:47] so we will finish soon the setup of labsdbs [15:08:02] (the data) [15:08:13] we need to advance the accounts [15:08:16] and proxies [15:08:31] does advance the accounts mean migrate the users from old setup to new? [15:08:33] in particular, the accounts have to be synced with [15:08:40] the script [15:08:59] yes, basically, we migrate users and start updating the accounts there, too [15:09:22] on my the ticket you had the changes that had to be done because of roles [15:09:27] right, I have it in mind to do that on our side but haven't had a ton of time to dig in [15:09:43] if I can't get to it for sure early in the next week Andrew has said he can step in and help [15:09:45] there are still some questions [15:09:51] it's matter of how much architecting to do I think [15:09:52] shoot [15:09:54] for example [15:10:09] migrate accounts is easy, but there are some custom things [15:10:25] users currently could have extra permissions [15:10:36] like writing to a user database that is not theirs [15:10:57] do we not take user databases into account for now? 
[15:11:04] sounds good [15:11:20] or yes I agree let's add that in post first order user / perms creation w/ roles [15:11:23] that would simplify the initial population [15:11:37] and we can later see how we do user dbs [15:11:56] in terms of quotas, grants and replication [15:12:08] and the data migration [15:13:09] the other question is, because we are doing haproxy [15:13:27] users are handled on the servers [15:13:44] which means per-user restrictions have to be configured there [15:14:18] so we can add the 10 connection maximum, but if that later goes from the web requests to the analytics role [15:14:34] it will not be very dynamic [15:14:47] (not sure if I am able to explain the issue clearly) [15:15:03] are you saying, 10 would not be enough in the analytics scenario [15:15:07] no [15:15:11] yet we are setting this limit per labsdb [15:15:15] ok [15:15:17] we want less on the analytics [15:15:22] ah [15:15:29] because long-running connections [15:15:57] which is not a problem, until we have to change the "role" of a server in the proxy [15:16:15] e.g. on maintenance [15:16:49] it might be atm there is more convention to this separation than technical enforcement [15:16:51] we could set up a 10 enforced maximum, and maybe a killing process [15:17:04] for long running connections? [15:17:47] that could be more easily changed on the fly? [15:17:47] the role itself has the setting right? 
[15:18:12] I think no, that has to be hardcoded along with the user and password [15:19:18] I think I see what you mean, as far as we would ideally want different settings in the analytics and web requests use cases and only having 3 boxes that need to stand in for each other means we cannot honor that in every sense we want atm [15:19:24] I'm not sure there is a solution for now [15:20:37] but it seems like you're saying a process killer that can be running only on servers in one of those roles to do a layer of enforcement above the max connections may be our current tet [15:20:38] bet even [15:22:10] so, to clarify, things I need from you when data is loaded [15:22:30] * implement the view creation script (it should be a 1-line puppet change) [15:22:46] * run the create views once for the wikis that are there [15:23:21] * manage the users (import the existing ones and set up the automatic creation) with the new grants [15:23:29] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841740 (10Marostegui) The data has been copied over. events have been loaded. [15:23:39] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2841744 (10Marostegui) [15:23:41] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2841743 (10Marostegui) [15:23:43] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841742 (10Marostegui) 05Open>03Resolved [15:24:03] (end of list?) 
[15:24:18] dns/hosts updates on labs environment? [15:24:38] it depends on when we want to make this available to users [15:24:47] we may want to do a controlled test first [15:24:48] we targeted side-by-side so additions w/o subtractions atm is my understanding and yes that's fair to be on us [15:24:51] right [15:25:15] jynus: is there a way to enable/disable a role to affect perms derived from that role for all users? [15:25:16] like [15:25:21] something role enable foo [15:25:24] something role disable foo [15:25:45] that would allow for a full setup and a permissions-controlled in-service switch [15:26:13] so let me show you what is there now [15:26:26] and you can tell me if that is helpful for you to do that [15:27:24] let's talk in private, as this is account-related [15:28:15] yep [15:28:42] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841770 (10Marostegui) I messed the data a bit when setting up the slave, I will transfer the... 
[15:28:48] ^ sorry about that [15:31:41] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2841777 (10Marostegui) [15:31:43] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2841775 (10Marostegui) 05Resolved>03Open [15:40:54] 10DBA, 06Labs, 10Labs-Infrastructure: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2841786 (10jcrespo) These should be the ideal grants for new users: ``` GRANT USAGE ON *.*... [16:08:04] 10DBA, 13Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2841827 (10Marostegui) db2066 is done ``` root@neodymium:~# mysql -hdb2066.codfw.wmnet -A wikidatawiki -e "show create table revision\G" *************************** 1. row ********************... [16:28:55] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Implement a frontend failover solution for labsdb replicas - https://phabricator.wikimedia.org/T141097#2841887 (10jcrespo) ``` $ mysql -h labsdb-web -u u -p$PASS Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connect... [16:29:42] ^ nice! 
[16:29:51] 10DBA, 06Labs, 10Labs-Infrastructure: Provide at least 2 separate service endpoints: one for slow, long running queries; and another for quick, web requests - https://phabricator.wikimedia.org/T147051#2841893 (10jcrespo) See progress here: T141097#2841887 [17:10:38] 10DBA, 06Labs, 10Labs-Infrastructure: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2842008 (10yuvipanda) a:03yuvipanda [17:11:37] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2842026 (10Marostegui) [17:11:39] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2842027 (10Marostegui) [17:11:41] 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2842024 (10Marostegui) 05Open>03Resolved The data has been copied over and labsdb1010 is n... [17:14:17] 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2842031 (10Marostegui) labsdb1010 has now data from db1095 and it is catching up. I have applied the events too: https://phab... [19:33:06] select page_title, pr_type, pr_level from page left join page_restrictions on pr_id = page_id where page_namespace = 828; <- Hi! Why do I get with this sql query nothing from table page_restrictions. Namespace 4 or 0 returns restricted pages, namespace 828 nothing [19:33:53] hoo, any idea? [19:47:56] bd808: are you here? 
may it be a bug? ^
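
A possible explanation for the empty result in the [19:33:06] query, offered as a sketch and not as an answer given in the channel: in the MediaWiki schema, page_restrictions refers to a page through pr_page, while pr_id is only that table's own auto-increment key. Joining on pr_id = page_id can therefore match rows by coincidence for low page ids and miss them elsewhere. The example below reproduces this with an in-memory SQLite database. The two tables are cut down to the columns the query uses, and every row in them is invented for illustration.

```python
import sqlite3

# Toy schema: only the columns used by the query in the log.
# All rows are made up for illustration; they are not real wiki data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT
);
CREATE TABLE page_restrictions (
    pr_id INTEGER PRIMARY KEY,   -- the restriction row's own id
    pr_page INTEGER,             -- id of the page being restricted
    pr_type TEXT,
    pr_level TEXT
);
INSERT INTO page VALUES (1, 0, 'Main_Page'), (5000, 828, 'Citation');
-- One restriction, on the Module page (page_id 5000). Its pr_id happens to be 1.
INSERT INTO page_restrictions VALUES (1, 5000, 'edit', 'sysop');
""")

QUERY = """
SELECT page_title, pr_type, pr_level
FROM page
LEFT JOIN page_restrictions ON {join_column} = page_id
WHERE page_namespace = ?
"""


def restrictions(join_column, namespace):
    """Run the query from the log, joining on the given page_restrictions column."""
    sql = QUERY.format(join_column=join_column)
    return conn.execute(sql, (namespace,)).fetchall()


# Join as written in the log: the namespace 828 page shows no restriction.
print(restrictions("pr_id", 828))
# Join on pr_page: the restriction appears.
print(restrictions("pr_page", 828))
# The pr_id join also attaches the restriction to an unrelated namespace 0 page.
print(restrictions("pr_id", 0))
```

With this toy data the pr_id join reports no restriction for the namespace 828 page and wrongly attaches the restriction to Main_Page, which would be consistent with namespaces 0 and 4 appearing to work in the report above. Whether this is what happened on the real replica would need to be checked against the actual tables.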