[06:10:46] 10Blocked-on-schema-change, 10MediaWiki-extensions-UrlShortener: Add usc_deleted to urlshortcodes - https://phabricator.wikimedia.org/T218397 (10Marostegui) 05Open→03Resolved This is now done: ` ./section x1 | while read host port; do echo $host; mysql.py -h$host:$port wikishared -e "show create table urls...
[06:12:41] 10DBA, 10MediaWiki-Database, 10Tracking: Database replication lag issues (tracking) - https://phabricator.wikimedia.org/T3268 (10Marostegui) 05Open→03Declined Declining as per the suggestion from Andre at T3268#3823818 as we don't use this ticket.
[06:37:43] 10DBA, 10Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (10Marostegui)
[06:37:52] 10Blocked-on-schema-change, 10MediaWiki-Database, 10MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), 10Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (10Marostegui)
[06:38:33] 10DBA, 10Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (10Marostegui) s8 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1005 [] db1124 [] db1116 [] db1109 [] db1104 [] db1101 [] db1099 [] db1092 [] db1087 [] db1071
[06:38:36] 10Blocked-on-schema-change, 10MediaWiki-Database, 10MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), 10Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (10Marostegui) s8 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1005 [] db1124...
[06:38:51] 10DBA, 10Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (10Marostegui)
[06:39:01] 10Blocked-on-schema-change, 10MediaWiki-Database, 10MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), 10Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (10Marostegui)
[08:08:24] I am working on labsdb1010 so we don't step on each other
[08:09:34] I didn't notice labsdb1010 broken replication
[08:09:45] But for the record, I won't depool dbproxy with the+1 from labs anyways
[08:09:54] ?
[08:11:15] What?
[08:11:43] I don't understand what you mean
[08:11:53] About which sentence?
[08:12:18] I didn't notice labsdb1010 broken replication, obviously, you don't need to justify yourself
[08:12:31] I won't depool dbproxy with the+1, which +1?
[08:12:43] +1 from labs people
[08:12:58] And the labsdb1010 sentence is about: me starting to work earlier and missing that
[08:13:02] where?
[08:13:22] On the depooling dbproxy1010
[08:13:24] I cannot find a +1, maybe I am not added?
[08:14:13] Ah, yes, sorry, my bad I meant: I will not depool dbproxy1010 WITHOUT +1 from labs people
[08:14:20] ah
[08:14:25] :)
[08:14:28] so that is what I didn't understand
[08:14:35] yeah, my bad
[08:15:07] ok, I will fix this and maybe we can talk about the depool process - I have done it before
[08:15:36] probably more times than cloud, so my +1 may be more experience based :-D
[08:15:40] ok
[09:34:04] do you want to talk about the dbproxy1010 depooling?
[09:35:00] let me fix the db first, about to do it
[09:35:17] ah sorry!
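The command quoted in the T218397 comment above is cut off; a minimal sketch of the pattern it uses — loop over every host:port of a section and run the same statement on each — assuming the WMF `section` and `mysql.py` wrappers are on PATH. The table name is an assumption taken from the task title, since the original query is truncated:

```
# Iterate over all hosts of section x1 and show the table definition on each.
# "urlshortcodes" is assumed from the task title; the comment above is
# truncated after "show create table urls...".
./section x1 | while read host port; do
    echo "$host"
    mysql.py -h$host:$port wikishared -e "SHOW CREATE TABLE urlshortcodes\G"
done
```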
[09:36:24] 10DBA, 10Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (10Marostegui)
[09:36:29] 10Blocked-on-schema-change, 10MediaWiki-Database, 10MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), 10Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (10Marostegui)
[09:42:14] marostegui: it is rolling now, we may want to do a pagelinks check asap on wikidata
[09:43:35] only 1010 broke?
[09:43:40] let me look for a ticket I have in mind
[09:43:44] sanitarium was fine?
[09:45:09] So it is "good" that only labs broke as in: https://phabricator.wikimedia.org/T212574#4848198
[09:45:15] the whole s8 was checked already
[09:45:32] sanitarium/labs is harder because of the triggers of course
[09:46:50] Actually I fixed sanitarium: https://phabricator.wikimedia.org/T212574#4843725
[09:47:00] so it might only be labsdb1010
[09:47:40] we can compare it between labsdb hosts
[09:47:49] that specific table on that specific wiki
[09:47:50] yes, I was going to suggest that
[09:47:56] but when it catches up
[09:47:57] But I checked we have no triggers on pagelinks
[09:48:49] so it should be "easy" to compare sanitarium with labs
[09:49:28] From my ticket I see that I didn't specifically check labs, only sanitarium
[09:49:34] so yes, sanitarium vs labs is needed
[09:49:46] I can do that if you want
[09:50:43] what are your questions about proxy depools?
[09:51:21] basically I don't get your last comment on the patchset
[09:51:33] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497228/
[09:51:47] Cause I am doing the same depool you did a few months ago :p
[09:52:08] what you are doing is ok
[09:52:25] what I am suggesting/asking/noting is that it may not be enough
[09:52:33] why?
[09:52:56] because if I understood correctly, you want to send all labsdb traffic to a single server
[09:53:04] to a single dbproxy, yes
[09:53:06] maybe I am wrong, or I am missing other deploys
[09:53:14] a single dbproxy is ok
[09:53:17] no problem
[09:53:35] the problem is that that proxy only sends data to a single host?
[09:53:42] so I don't want traffic flowing thru dbproxy1010 (that is the aim of the patch)
[09:53:49] Yes
[09:53:57] (let me double check)
[09:54:07] I am suggesting to also modify the proxy so it sends queries to every host
[09:54:17] I used to do that, but without puppet
[09:54:19] Ah, I get it
[09:54:24] Now I understand what you mean
[09:54:33] that is why it may not be shown on the change
[09:54:47] so where do you change that?
[09:54:49] you can do it in any way but maybe you hadn't thought about that?
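On the sanitarium-vs-labs comparison discussed above, a minimal sketch of one way to do it, reusing the `mysql.py` wrapper seen earlier in this log; db1124 is assumed to be the s8 sanitarium host (it appears in the host list above), and the loop should only run once labsdb1010 has caught up, since CHECKSUM TABLE scans the whole table:

```
# Compare pagelinks on wikidatawiki between sanitarium and the labsdb
# replicas: identical checksums mean no drift on this specific table.
for host in db1124 labsdb1009 labsdb1010 labsdb1011; do
    echo "== $host =="
    mysql.py -h$host wikidatawiki -e "CHECKSUM TABLE pagelinks"
done
```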
[09:55:01] yeah, I didn't think about that
[09:55:03] haproxy config
[09:55:11] edit + reload
[09:55:16] or edit the hiera keys
[09:55:20] any of the 2
[09:55:37] sure, now I understand :)
[09:56:12] ok, I will change haproxy on dbproxy1011 manually and then +2 the patch
[09:56:38] remember when I am being pedantic, you told me to be, just to avoid issues
[09:56:44] sure sure
[09:56:48] :-)
[09:56:52] I didn't get what you meant on the comments
[09:56:56] now I do
[09:56:58] I will do it that way
[09:57:15] makes sense, even if the reboot will take a couple of mins (famous last words)
[09:57:37] the actual deploy of dns will take up to 30 minutes
[09:57:46] if it doesn't error out
[09:57:56] I thought the TTL was 5 mins
[09:57:57] but anyways
[09:57:59] makes more sense
[09:58:06] no, the TTLS is on top of that
[09:58:10] *ttl
[09:58:16] it is the script that takes a lot
[09:58:20] ah :(
[09:58:33] it is more involved than the authdns-update on production
[09:58:51] but you should run it or try to run it at least once
[09:59:12] 2 out of the 3 times I tried, it had broken dependencies/was on the wrong host, etc.
[09:59:17] lovely XD
[10:00:51] https://phabricator.wikimedia.org/P8214
[10:01:16] I won't push as labsdb1010 is really delayed still
[10:01:21] but just checking with you if that's ok
[10:02:08] I don't get it
[10:02:17] that is the run after your changes?
[10:02:32] or your change?
[10:03:06] that is the diff I would apply on dbproxy1011
[10:03:31] but you want the opposite?
[10:03:37] more pooled hosts
[10:03:43] to handle all the traffic
[10:04:18] the paste you sent me, if I understood it right, pools only 1009
[10:04:39] yeah, cause puppet ran in the middle of my change :)
[10:04:40] haha
[10:04:50] or maybe you are sending me the inverse patch?
[10:04:55] is that?
[10:05:05] yeah, basically puppet ran :)
[10:05:10] and undid my change :)
[10:05:10] ok, then ok
[10:05:18] it was confusing without context
[10:05:28] you can add labsdb1011 too
[10:05:28] yeah, basically line #7 is the line I am adding
[10:06:13] in the end, the result is ok, just double check the proxy is idle
[10:06:37] it is very easy to make mistakes with dns changes: ttl, ip mixups
[10:06:44] I say it because I did those
[10:06:48] yeah, I will wait until labsdb1010 has caught up though
[10:07:04] I have added a comment to clarify the patchset discussion (in case we read it again in 5 months)
[10:07:12] so as long as netstat is ok in the end, the actual method doesn't matter
[10:07:17] thanks
[10:07:42] maybe you can even document it on wikitech! :-D
[10:08:09] Yeah, I was thinking about it, in that same page? https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replica_DNS
[10:08:23] I would add a section to troubleshooting
[10:08:28] with just a link
[10:08:35] without rewriting everything
[10:08:47] MariaDB/troubleshooting I mean
[10:09:09] "depooling a proxy" subsection "depooling a labsdb proxy"
[10:09:30] and we can leave a todo on the other proxy depools
[10:13:16] sounds good
[10:16:21] it doesn't help that dbproxy1011 and labsdb1011 are confusing
[10:17:31] do you think https://gerrit.wikimedia.org/r/494899 is wise to do soon, or should I wait?
[10:18:58] I thought I had commented
[10:19:04] Maybe it was in a different patchset
[10:19:15] I thought I said: let's deploy and get this working and get bugs as we use it
[10:19:40] maybe I dreamed that?
[10:19:43] I am sure I said it!
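A sketch of the haproxy-side change discussed above: before dbproxy1011 takes all wiki-replica traffic, its backend should list every labsdb host rather than a single one. The backend name, ports, and check options here are illustrative, not the production hiera values:

```
# Illustrative haproxy backend on dbproxy1011: pool every replica,
# not just one, before all wiki-replica traffic lands here.
backend mariadb
    mode tcp
    balance roundrobin
    server labsdb1009 labsdb1009.eqiad.wmnet:3306 check
    server labsdb1010 labsdb1010.eqiad.wmnet:3306 check
    server labsdb1011 labsdb1011.eqiad.wmnet:3306 check
```

After edit + reload (e.g. `systemctl reload haproxy`) and once the DNS TTL has expired, the "netstat is ok" check on the old proxy can be as simple as confirming no established client connections remain, for instance `ss -tn state established '( sport = :3306 )'` on dbproxy1010 returning nothing.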
[10:24:27] so so far I think the wikidata link table is a minor issue, and something we can do at a later time
[10:24:36] :)
[10:24:48] I will focus on the backups deploy
[10:24:56] great!
[10:25:11] there is a high chance of having to revert that, so better start soon
[10:25:20] exactly
[10:25:23] that was my point!
[10:25:29] or what I wanted to make as a point
[10:25:40] I will however ask other people
[10:25:41] let's start deploying and check what we see
[10:25:42] sure
[10:48:31] marostegui: I have two bugs on https://gerrit.wikimedia.org/r/497265
[11:16:26] 10DBA, 10Operations, 10ops-codfw: rack/setup/deploy dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10Marostegui) Hello @Papaul, Jaime and myself discussed a few things. Hostname: **dbprov2001** **dbprov2002** **RAID0** for SSD **RAID6** for the SATA Disks partman r...
[12:00:32] marostegui: remember the 15 meeting
[12:03:37] yep :)
[12:24:31] https://cstack.github.io/db_tutorial/
[13:23:14] 10DBA, 10cloud-services-team (Kanban): Openstack codfw DBs: move to m5-master.eqiad.wmnet - https://phabricator.wikimedia.org/T218569 (10aborrero)
[13:23:22] 10DBA, 10cloud-services-team (Kanban): Openstack codfw DBs: move to m5-master.eqiad.wmnet - https://phabricator.wikimedia.org/T218569 (10aborrero) p:05Triage→03High
[13:26:16] 10DBA, 10cloud-services-team (Kanban): DB planning: include a misc cluster in codfw - https://phabricator.wikimedia.org/T218570 (10aborrero)
[13:26:34] 10DBA, 10cloud-services-team (Kanban): DB planning: include a misc cluster in codfw - https://phabricator.wikimedia.org/T218570 (10aborrero) p:05Triage→03Normal
[13:27:39] FYI, I just created:
[13:27:40] * DB planning: include a misc cluster in codfw https://phabricator.wikimedia.org/T218570
[13:27:40] * Openstack codfw DBs: move to m5-master.eqiad.wmnet https://phabricator.wikimedia.org/T218569
[13:27:49] the last one is the most urgent of the two
[13:29:44] arturo: that's great to hear. that should also unblock gerrit and phab afaict
[13:30:04] mutante: cool, please include that info in the phab task
[13:37:31] done
[13:37:40] 10DBA, 10cloud-services-team (Kanban): DB planning: include a misc cluster in codfw - https://phabricator.wikimedia.org/T218570 (10Dzahn) This would be great because, afaict, this would also unblock having a Phabricator (T137928) and Gerrit (T176532) working in codfw. (They are both blocked by lack of misc db...
[13:38:22] 10DBA, 10Gerrit, 10Operations, 10Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 (10Dzahn) T218570 might unblock this
[13:39:53] thanks mutante :-)
[13:50:09] mutante: I don't see how that would unblock gerrit/phab
[13:50:25] As far as I understand arturo's request is for _new_ DB hardware dedicated to that
[13:50:40] mutante: we do have (some sort of) misc on codfw already
[13:50:57] But their request (as far as I know) is to get new (writable) DBs
[13:51:23] so it is hw but also architecture
[13:51:31] we lack proxies
[13:52:03] yes, but gerrit/phab would still be on the current misc clusters
[13:52:08] and we don't have a method to set up active dbs on codfw, we need to figure that out
[13:52:11] and arturo's ticket is for a different thing
[13:52:31] I am talking about arturo's request
[13:52:41] I haven't read the full backlog
[13:52:47] I am talking about https://phabricator.wikimedia.org/T218570#5032606
[13:56:15] 10DBA, 10cloud-services-team (Kanban): DB planning: include a misc cluster in codfw - https://phabricator.wikimedia.org/T218570 (10Marostegui) >>! In T218570#5032606, @Dzahn wrote: > This would be great because, afaict, this would also unblock having a Phabricator (T137928) and Gerrit (T176532) working in codf...
[14:05:16] marostegui: alright, it's specifically just the lack of proxy then. sorry if the ticket is unrelated but the result for phab/gerrit is effectively the same, they can't have a db in codfw
[14:05:44] and they are on misc.. which made me think this is it as well
[14:35:14] mutante: yeah, it is confusing! :)
[14:49:35] 10DBA, 10Operations, 10ops-codfw: rack/setup/deploy dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10Papaul) @Marostegui Thank you what is the stripe size to use?
[14:51:36] 10DBA, 10Operations, 10ops-codfw: rack/setup/deploy dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10Marostegui) 256KB
[15:25:37] 10DBA, 10Operations, 10ops-codfw: rack/setup/deploy dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10Papaul) switch port information dbprov2001: asw-a4-codfw xe-4/0/18 dbprov2002: asw-b4-codfw xe-4/0/2
[15:27:49] 10DBA, 10Operations, 10ops-codfw: rack/setup/deploy dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (10Papaul)
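On the T218336 RAID spec settled above (RAID0 for the SSDs, RAID6 for the SATA disks, 256KB stripe), a hedged sketch of what the controller-side commands might look like with storcli on a MegaRAID controller; the controller index and enclosure:slot ranges are made up, and the real dbprov hosts may well be configured through a different tool or the controller setup screen:

```
# Illustrative only: /c0, enclosure 252 and the slot ranges are assumptions.
# RAID0 virtual drive across the SSDs, 256KB strip size
storcli /c0 add vd type=raid0 drives=252:0-7 strip=256
# RAID6 virtual drive across the SATA disks, same strip size
storcli /c0 add vd type=raid6 drives=252:8-19 strip=256
```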