[06:16:26] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106#3753677 (Marostegui)
[06:38:25] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106#3753685 (Marostegui) >>! In T179106#3750305, @Marostegui wrote: > Going to test+optimize on db2089 which has wb_terms compressed. > ``` > -rw-rw---- 1 mysql...
[06:49:28] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3753692 (Marostegui)
[06:49:48] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3736250 (Marostegui) index dropped from codfw: ``` root@neodymium:/home/marostegui/git/software/dbtools# cat s5.hosts | grep codfw | whi...
[07:04:55] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3753695 (Marostegui)
[07:06:24] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3736250 (Marostegui)
[07:10:20] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3753697 (Marostegui)
[07:14:21] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3753700 (Marostegui)
[07:16:44] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3753713 (Marostegui)
[07:23:07] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3753718 (Marostegui)
[07:23:43] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#3736250 (Marostegui) Open>Resolved This is all done: ``` root@neodymium:/home/marostegui/git/software/dbtools# cat s5.hosts | gr...
[07:24:52] Blocked-on-schema-change, DBA, MediaWiki-extensions-ORES, Scoring-platform-team (Current), User-Ladsgroup: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045#3753721 (Marostegui) a:Marostegui
[07:29:43] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106#3753726 (Marostegui)
[07:32:31] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106#3753731 (Marostegui) The index has been dropped from all codfw. I am optimizing a non-compressed slave to see if the gain is worth the time. As we have seen...
[07:41:42] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106#3753755 (Marostegui)
[09:14:18] elukey: when do you want to do the migration of db1046?
[09:19:19] marostegui: hola! What do you think about tomorrow morning so I'll be able to send a heads up email for the maintenance?
[09:19:43] better wednesday for me, I will be afk for 2-3 hours tomorrow morning
[09:34:03] ack!
[09:34:18] ping me when you get online on wednesday morning
[10:06:45] backups on dbstore2001 seem to be working well
[10:06:58] yeah :)
[10:07:06] jynus: now that you are back, let me give you some context
[10:07:26] we have pooled new multiinstance hosts on codfw (all of them) and we are now ready to do the same with the first one in eqiad: https://gerrit.wikimedia.org/r/#/c/390947/
[10:07:43] volans: has kindly reviewed the configs, as i was a bit afraid of this new config breaking the site
[10:08:04] but all went well, so i would like if you can review that patch for eqiad, so we can have the first one in eqiad serving a small % of traffic
[10:09:41] why mixing eqiad and codfw on the commit?
[10:09:56] better separate so if we have to revert, it is easier?
[10:10:42] also was recentchanges on codfw checked manually (curl)?
[10:10:42] I would revert and deploy db-eqiad first anyways i think
[10:10:47] yep
[10:10:59] ?
[10:11:01] revert?
[10:11:50] we checked: https://phabricator.wikimedia.org/T178359#3722507
[10:12:06] if we have to revert, i will revert the whole commit and scap db-eqiad first
[10:12:14] (to answer your question about easier revert)
[10:12:40] ok
[10:12:45] tim reviewed, too
[10:13:00] yeah, all codfw hosts have been in mediawiki-config for a week :)
[10:13:01] or more
[10:13:02] not that I do not trust volans, but he literally said checked the code
[10:13:06] yeah
[10:14:43] oh yeah! we also got that tested in labs before prod, see T178553#3715246 and follow ups
[10:14:44] T178553: Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553
[10:15:11] "But I can't say that there isn't code somewhere that would be confused by this. It's hard to prove a negative." said Anomie
[10:16:34] what about ROW for s8, something done?
[10:17:30] oh, I see
[10:17:40] I got confused by the change on db-codfw
[10:17:49] it is for eqiad only
[10:17:59] this one yes
[10:18:02] I thought there was some codfw change at the same time
[10:18:06] all the codfw multi-instance hosts are already in place
[10:18:10] that is why I was against
[10:18:15] this is the first eqiad host :)
[10:18:19] mixing changes to both datacenters
[10:18:29] now I see it is only the eqiad reference on codfw
[10:18:35] sorry
[10:18:54] haha
[10:19:06] you're back from 3 weeks of holidays take it easy :)
[10:19:10] lol
[10:20:30] so you are adding it to 1/5 of the watchlists only?
[10:20:50] yep, s2 and s4, a small %, if it fails, it will fail for a small %
[10:20:57] and if it goes fine, we can slowly increase it
[10:21:28] also to 1/1303 of the main traffic to be precise :-P
[10:21:46] ok
[10:21:52] that's for s2, 1/1304 for s4
[10:22:10] is there monitoring in place?
[10:22:21] yep
[10:22:25] grafana also
[10:22:30] notifications enabled for that host
[10:23:06] going to merge, and monitor logstash
[10:23:17] you did alter there recently?
[10:23:25] ?
[10:23:27] where?
[10:23:37] db1103
[10:23:41] s2
[10:23:53] yeah, i compressed the tables
[10:24:01] cool
[10:26:07] going to deploy now
[10:27:35] eqiad done
[10:27:51] I see selects on the new host \o/
[10:29:46] so far so good
[10:30:26] what does logstash say about them?
[10:30:38] don't see any issues there
[10:30:56] so far XD
[10:31:24] nothing on fatalmonitor either
[10:31:42] I wonder if db_server will be correctly reported
[10:32:31] we should send an email once this is in place to give a heads up about ports and default ports
[10:33:00] sounds like a good idea yeah
[10:33:32] i will give it more time and start increasing weight and pooling other services on db1103 during the day
[10:33:37] yes
[10:33:41] and tomorrow if all goes good, we can also pool db1105
[10:33:46] which is also ready to serve in s1 and s2
[10:34:12] so you did like I suggested non-overlapping pairs?
[10:34:20] indeed!
[10:34:23] cool
[10:34:34] I will update you in our meeting in a bit
[10:34:37] I think looking at history
[10:34:40] for the next steps in eqiad
[10:34:45] oom is the main reason for crash right now
[10:34:51] as we need to choose the next hosts
[10:34:53] if you want to test if it will be reported correctly in logs you could try to generate one
[10:35:17] so I think it was a better option, even if it makes operations more complex
[10:37:30] I suppose we can bring down a codfw database while pooled but not active
[10:37:36] (an instance)
[10:38:14] yeah, we could do that
[10:38:33] I was thinking to make one lag a bit, enough to generate logs
[10:38:34] :D
[10:38:45] *just enough
[10:38:49] sure
[10:39:15] yeah, we can even stop replication for some seconds
[12:20:59] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3754437 (jcrespo)
[12:22:41] DBA, Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3754440 (Marostegui)
[13:12:01] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3754588 (Marostegui) >>! In T174569#3754582, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), hre...
[13:13:24] DBA, Data-Services: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser - https://phabricator.wikimedia.org/T179628#3754605 (Marostegui) Hi, We have been discussing this ticket during our meeting and we don't have a clear picture of what problem you are trying to solve here. Could you give us an...
[13:13:31] DBA, Data-Services: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser - https://phabricator.wikimedia.org/T179628#3754606 (Marostegui) p:Triage>Normal
[13:42:18] replication broke on labsdb1009 for cebwiki.geo_tags, and only on labsdb1009
[13:42:27] I guess we have the answer about the crash and if it was corrupted
[13:43:36] (I am fixing db1102 btw)
[13:43:41] (nothing related to it)
[13:46:48] I have "fixed" labsdb1009 by setting: Replicate_Wild_Ignore_Table: cebwiki.geo_tags (and I will reimport the table, it is very small)
[14:01:31] labsdb1009 is fixed
[14:02:36] (and db1102 too, quite some time ago)
[14:41:21] DBA, Operations, Availability (Multiple-active-datacenters), Patch-For-Review, Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3754824 (jcrespo) > I think this is because the version of yaSSL that MySQL bundle...
[15:34:55] DBA, Analytics, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3755154 (jcrespo) Proposal C: * Maintenance of datasets is puppetized (so it is both the source and...
[15:49:21] DBA, Cloud-Services: Multiple concurrent long running queries from s51434 overloading labsdb1003/labsdb1009 - https://phabricator.wikimedia.org/T133705#3755250 (jcrespo) Resolved>Open
[15:52:45] DBA, cloud-services-team (Kanban): labsdb1009 crashed - OOM - https://phabricator.wikimedia.org/T179244#3755253 (jcrespo) Resolved>Open We have to reimport labsdb1009 from labsdb1010.
[15:53:59] DBA, Cloud-Services: Multiple concurrent long running queries from s51434 overloading labsdb1003/labsdb1009 - https://phabricator.wikimedia.org/T133705#3755257 (jcrespo) ```lines=20 labsdb1009 6457088 s51434 dbproxy1010 frwiki_p 2d select distinct p.page_id,p.page_title,p.page_namespace,p.page_touched,p....
[16:17:40] DBA, Analytics, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3755449 (bd808) >>! In T173511#3755154, @jcrespo wrote: > Proposal C: > > * Maintenance of datasets...
[16:19:33] DBA, cloud-services-team (Kanban): labsdb1009 crashed - OOM - https://phabricator.wikimedia.org/T179244#3755462 (Marostegui) For the record, this is the first table that crashed, probably corrupted because of the crash: ``` Nov 13 13:45:45 labsdb1009 mysqld[31730]: 2017-11-13 13:45:45 139666025916160 [ER...
[16:21:05] DBA, Analytics, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3755465 (jcrespo) Yes, although not sure it is really a separate proposal; it would just be a small...
[16:23:38] DBA, cloud-services-team (Kanban): labsdb1009 crashed - OOM - https://phabricator.wikimedia.org/T179244#3755473 (jcrespo) It happened twice- we cannot trust labsdb1009- copying labsdb1010 away and failing it over is a day's work, with very little human intervention.
[16:23:50] DBA, Data-Services: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser - https://phabricator.wikimedia.org/T179628#3755474 (Dispenser) An example from a [[https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Disambiguation#Follow_up|recent created report]]: # Table 1: Article titles with GROUP...
[16:24:14] DBA, cloud-services-team (Kanban): labsdb1009 crashed - OOM - https://phabricator.wikimedia.org/T179244#3755475 (Marostegui) >>! In T179244#3755473, @jcrespo wrote: > It happened twice- we cannot trust labsdb1009- copying labsdb1010 away and failing it over is a day's work, with very little human interve...
[16:24:46] DBA, cloud-services-team (Kanban): labsdb1009 crashed - OOM - https://phabricator.wikimedia.org/T179244#3755478 (jcrespo) a:Marostegui>jcrespo
[16:35:46] DBA, Analytics, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3755502 (bd808) I agree that Proposal C is not really opposing A or B. It is instead proposing an im...
[16:38:10] DBA, Analytics, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3755509 (jcrespo) As a pro- I have run many times into people doing the certain summary queries many...
[16:44:23] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3755522 (aborrero) >>! In T173647#3752275, @Marostegui wrote: > [...] > So as you can see there is no grant for that wiki on the labsdbuser role. So the cr...
[16:51:40] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3535715 (jcrespo) > Manually gave the labsdbuser grants for hifwiktionary_p. labsdbuser grants are right now manually applied, this is on purpose to contr...
[16:53:53] DBA, Data-Services: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser - https://phabricator.wikimedia.org/T179628#3755575 (bd808) @MZMcBride did you have some reports that used TEMPORARY TABLE as well? I've lost track at this point of who should be involved in some of these conversations.
[17:01:06] DBA, Data-Services: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser - https://phabricator.wikimedia.org/T179628#3731402 (jcrespo) @Dispenser Do you have a link to a code example?, that will be easier to discuss (the link you added is an ongoing discussion without much code, unless I have missed...
[17:12:57] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3755637 (Marostegui) I found the issue. The first comment from Madhu (T173647#3748572) was about labsdb1011, so I only gave the grants to that host as I as...
[18:05:43] DBA, Data-Services: Toolforge DB replicas timeout [again] - https://phabricator.wikimedia.org/T180380#3755937 (bd808)
[18:07:20] DBA, Data-Services: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers - https://phabricator.wikimedia.org/T180380#3755845 (bd808)
[18:20:37] DBA, Data-Services: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers - https://phabricator.wikimedia.org/T180380#3755845 (jcrespo) Before we debug this-- there is (potentially) a misunderstanding, check the line with the ***: ``` tools.mix-...
[18:23:30] DBA, Data-Services: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers - https://phabricator.wikimedia.org/T180380#3755980 (Magnus) I am sure my client can handle ~800 rows, which is what I get when I use DISTINCT, as shown...
[18:24:22] DBA, Data-Services: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers - https://phabricator.wikimedia.org/T180380#3755981 (Magnus) And yes, I know the disconnect happened before the *** line. In all the ~2sec it took me to type the command.
[18:29:20] DBA, Data-Services: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers - https://phabricator.wikimedia.org/T180380#3756002 (jcrespo) @madhuvishy @aborrero can you reproduce (I do not have access to forge)?, disconnecting so fast is very weird,...
[18:31:30] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756039 (aborrero) > labsdbuser grants are right now manually applied, this is on purpose to control which data is and is not available to all users. Ok @...
[18:40:31] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756065 (jcrespo) > Ok @jcrespo however, Would it make sense to grant wildcard for all _p views? How would you do that?
[18:45:29] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756095 (aborrero) >>! In T173647#3756065, @jcrespo wrote: >> Ok @jcrespo however, Would it make sense to grant wildcard for all _p views? > > How would y...
[18:52:28] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756158 (jcrespo) Which lead to potential private data leaks on: T101758 I said on T104900 why that is a bad idea: > Wildcards on user authorization/authe...
[18:53:24] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756167 (Marostegui) >>! In T173647#3756039, @aborrero wrote: > > BTW @Marostegui: > > ``` > aborrero@tools-bastion-03:~$ sql --cluster web hifwiktionary...
[18:57:43] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756193 (madhuvishy) @jcrespo That makes sense, I didn't know the private data was the reason we didn't do the wildcard grants. Lets leave it as is then, @...
[19:00:15] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756199 (jcrespo) @madhuvishy I think @rush's idea was to fully automatize the addition of databases, //because there was a manual grant happening//.
[19:00:55] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756200 (madhuvishy) @jcrespo Understood, I wasn't aware of that. We are in the right track then :)
[19:06:00] DBA, Cloud-Services, cloud-services-team (Kanban): Prepare and check storage layer for hif.wiktionary - https://phabricator.wikimedia.org/T173647#3756245 (madhuvishy) Open>Resolved Everything seems to be good now, I'm resolving this task. Thanks a ton @Marostegui!
[21:04:21] DBA: Remove duplicate comment_temp indexes - https://phabricator.wikimedia.org/T180162#3756661 (Anomie) You lost me in there, @jcrespo ;) I'll tell you what's being run and hopefully you can tell us what the indexes should look like. The queries in MediaWiki to these tables are always left-to-right. It's 1:...
[21:15:01] DBA, Analytics, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3756679 (bd808) Do we need a full blown [[https://en.wikipedia.org/wiki/Extract,_transform,_load|ETL...
[21:59:21] DBA, Data-Services: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers - https://phabricator.wikimedia.org/T180380#3755845 (bd808) Not a definitive answer, but I seem to be able to make the example query using the `sql` wrapper (which is just a...