[05:25:34] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3689499 (Marostegui)
[05:40:52] Blocked-on-schema-change, DBA, MediaWiki-extensions-ORES, MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), and 5 others: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753#3689513 (Marostegui) Thanks @Ladsgroup I will optimize those tables...
[07:08:50] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3689560 (Marostegui)
[07:19:46] DBA, Data-Services, Tracking: Wikireplica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#3689572 (jcrespo) ok to me, if someone retag those tickets.
[07:33:40] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3689595 (Marostegui)
[08:00:55] DBA, Operations, ops-codfw: db2081 unreachable - https://phabricator.wikimedia.org/T178140#3689646 (Marostegui) Open>Resolved I am going to close this for now as resolved. Rebooted the host twice without any issues. So far it looks good, if it happens again, we can reopen. Thanks @Papaul for...
[08:05:31] DBA, Data-Services, InternetArchiveBot: User log table creation on tools.labsdb failing intermittantly for IABot interactive UI - https://phabricator.wikimedia.org/T178294#3689661 (Marostegui) Open>Resolved The graph looks stable and back to the normal pattern: https://grafana.wikimedia.org/d...
[08:08:25] DBA, Data-Services: labsdb1005's mysql crashed - https://phabricator.wikimedia.org/T178272#3689667 (Marostegui) Open>Resolved a:jcrespo The load seems back to previous levels: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&panelId=19&fullscreen&orgId=1&var-server=la...
[09:19:19] DBA, Wikidata: Migrate wb_terms to using prefixed entity IDs instead of numeric IDs - https://phabricator.wikimedia.org/T114903#3689766 (WMDE-leszek)
[09:19:22] DBA, Wikidata, Patch-For-Review, User-Ladsgroup, Wikidata-Sprint: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460#3689765 (WMDE-leszek) Open>Resolved
[09:28:40] marostegui: jynus: Plus one on https://gerrit.wikimedia.org/r/384003 and https://gerrit.wikimedia.org/r/384592 would be appreciated
[09:28:53] plan to deploy them tomorrow (with a 10h gap in between)
[09:30:11] what time?
[09:30:46] 8 UTC for the description tracking (don't expect to see much here, but will monitor regardless)
[09:31:02] 18 UTC for the statement usage to cawiki
[09:31:15] (which made trouble last time, but the module was since fixed)
[09:31:21] have you heard of dblists?
[09:31:39] :-)
[09:31:40] 18UTC is quite late I would say
[09:32:23] jynus: This config. is an array… so not easy to do with a dblist :/
[09:33:26] marostegui: hm… I wanted to avoid having this on Thursday
[09:33:34] If wanted, I can push out both at the same time
[09:33:54] (8 UTC)
[09:34:02] hoo: that makes sense, but 18UTC is late evening for me, so I won't be around, not sure if jynus is planning to be around that late either
[09:34:13] can you monitor https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag and https://grafana.wikimedia.org/dashboard/db/mysql-aggregated on deploy ?
[09:34:17] hoo: 8UTC looks better for me as we have the whole day to track its progress
[09:34:46] Ok, will move the deploy then
[09:35:16] jynus: Will do… I've been following along the past deployments closely as well
[09:36:00] I want to clean ores_classification in these wikis: nlwiki, frwiki, fawiki, cswiki, ruwiki, ptwiki, trwiki, etwiki, fiwiki, and some others (all wikis that have the extension enabled), is there a shard or a wiki that I should be careful with?
[09:36:39] those are fairly small I would say, if it has not created issues on enwiki/wikidata I would say you could go ahead
[09:37:23] Amir1: I assume the cleanup takes some time, can you add them to the "week of" Deployments page?
[09:37:33] that way you do not have to ping us every time
[09:38:18] hmm, These wikis are small so I guess it won't take much time (except for frwiki and some other big wikis)
[09:38:25] enwiki and wikidatawiki are done already
[09:38:31] but sure
[09:38:42] Thanks
[09:38:59] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3689817 (hoo) Moved the deploy to 8 UTC tomorrow (October 18), per @Marostegui. Will be deployed together with {T177...
[09:39:26] Moved the deploy… thanks for your help!
[09:45:17] DBA: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3689827 (Marostegui)
[09:46:24] DBA: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3689841 (Marostegui)
[09:47:54] I have moved the db1044 binlog backup away
[09:47:58] to dbstore1001
[09:48:06] cool
[09:48:20] it is still at 10% disk available
[09:48:37] is there some host I can help with the pooling and depooling?
[09:48:57] I do not think I can take more actions regarding the recovery right now
[09:53:14] now that i think about it
[09:53:27] maybe we should try to optimize the biggest tables in db1044 (as it is sanitarium master)
[09:53:34] so we can get some more space back
[09:53:35] and have more room to replace it
[09:55:23] DBA: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3689847 (Marostegui) I would personally go for Stretch + 10.1 for these new hosts
[09:56:04] do we have a goal meta ticket, BTW
[09:56:05] ?
[09:56:15] if not, I can create one
[09:56:20] for that one you created
[09:56:23] and the s8 one
[09:56:40] i think we do
[09:56:40] let me see
[09:56:47] ah, yes T177208
[09:56:47] T177208: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208
[09:56:53] yep
[09:56:59] we can add it to both of them maybe, no?
[09:57:01] as parents I mean
[09:57:12] Let's add yours to T177208
[09:57:47] oki
[09:58:20] done
[09:58:20] DBA: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3689861 (jcrespo)
[09:58:34] thanks
[09:59:04] T170662 is only a partial dependency
[09:59:04] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[09:59:18] we have to set up some of them, but many others will be for misc
[10:00:00] yes, but at least we can see how many we set, that is why I added basically
[10:00:06] yes
[10:00:17] I would actually try to decom db1044
[10:00:35] and try to substitute it with a newer one
[10:00:49] stop, set to row, repoint the master
[10:01:37] I think we no longer use db1044 for anything except being an intermediate step
[10:02:51] db1072 instead, maybe?
[10:02:59] but I would wait for checksums to complete
[10:03:54] yeah, db1072 is the best candidate I would say
[10:04:54] it will take a few more days to be 100% sure
[10:05:09] that is why i said that maybe we can optimize a set of the biggest tables of db1044
[10:05:15] to avoid that 90% all the time :)
[10:07:54] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3689871 (Marostegui) @Ladsgroup just to be clear, it needs to be dropped everywhere **except** from wikidatawiki (s5...
[10:08:51] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3689872 (Marostegui)
[10:19:21] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3689891 (Ladsgroup) @Marostegui : No, it's the other way around. It needs to be dropped in these two wikis only beca...
[10:21:04] marostegui: optimize or compress?
[10:21:11] or both?
[10:21:16] both maybe :)
[10:21:39] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3689897 (Marostegui) >>! In T177601#3689891, @Ladsgroup wrote: > @Marostegui : No, it's the other way around. It nee...
[10:41:23] it seems s7 has a weird replica: https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?panelId=7&fullscreen&orgId=1&from=now-3h&to=now I don't know if you know it or not
[10:45:25] let me see
[10:47:03] Amir1: it is depooled https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php
[10:47:18] so under maintenance, see the ticket on the comment T174509
[10:47:19] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[10:47:25] so "normal"
[10:48:04] sadly dynamic graphing is not that easy on grafana, so depooled hosts are shown there
[10:48:46] https://logstash.wikimedia.org/app/kibana#/dashboard/DBReplication can be used to see if it is affecting production
[10:49:17] summary: no worries so far
[10:53:33] backup dbstore2002:/srv/backups/x1.20171017090128 finished
[11:10:59] I am going to reload dbproxy1010 unless someone speaks up
[11:12:56] Thanks :)
[11:26:19] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3690074 (Ladsgroup) By "It" I meant the dropping, I'm sorry for the confusion, will update the task description.
[11:27:17] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3690075 (Ladsgroup)
[11:27:46] marostegui: sorry for the confusion, https://phabricator.wikimedia.org/T177601 please check if it's clear enough and ask if there is anything that is not :)
[11:27:47] Thanks
[11:36:27] DBA: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3690103 (jcrespo) We can go for 10.1 (either jessie or stretch, whatever is easier); unlike the rest of the hosts, we cannot just copy them to regular replicas, and we have to upgrade them any way. Plus setting up multisour...
[11:41:55] Blocked-on-schema-change, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, and 3 others: Deploy dropping wb_entity_per_page table - https://phabricator.wikimedia.org/T177601#3690120 (Marostegui) >>! In T177601#3690074, @Ladsgroup wrote: > By "It" I meant the dropping, I'm sorry for the con...
[11:45:41] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3690125 (Marostegui)
[11:49:03] DBA: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3690128 (Marostegui) Given that we "have to" go for 10.1, I would go for stretch.
[12:49:58] DBA, Operations, ops-eqiad: db1101 crashed - https://phabricator.wikimedia.org/T178383#3690357 (Marostegui)
[12:50:08] DBA, Operations, ops-eqiad: db1101 crashed - https://phabricator.wikimedia.org/T178383#3690369 (Marostegui) p:Triage>Normal
[12:56:52] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3690380 (Marostegui)
[12:56:55] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3690357 (Marostegui)
[12:56:58] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3690357 (Marostegui)
[13:02:01] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3690405 (Marostegui) a:Cmjohnson @Cmjohnson can we get a new dimm for this host to replace that one mentioned on the logs?
Thanks
[13:03:54] DBA, Expiring-Watchlist-Items, MediaWiki-Watchlist, TCB-Team, and 8 others: Allow setting the watchlist table to read-only on a per-wiki basis - https://phabricator.wikimedia.org/T160062#3690419 (Tobi_WMDE_SW)
[13:05:12] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3690430 (Marostegui) I have powercycled the host and it came back up fine, but we better replace that DIMM as the server is quite new. Going to execute the alters again to see if it crashes once more.
[13:11:45] DBA, monitoring, Epic, Patch-For-Review, Wikimedia-Incident: Reduce false positives on database pages - https://phabricator.wikimedia.org/T177782#3690472 (Marostegui) > Application-bad states (code problems) should be better exposed to those people that can do something about it, not to ops T...
[14:02:46] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3690646 (thiemowmde) p:Triage>Normal a:hoo
[14:06:03] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3690655 (Marostegui)
[14:26:00] DBA, Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3690730 (Marostegui)
[14:39:13] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Prepare and check storage layer for amwikimedia - https://phabricator.wikimedia.org/T176043#3690781 (jcrespo) p:Normal>High a:Andrew>jcrespo @Ladsgroup There is a duplicate database on s7 called amwikimedia. I ass...
[14:39:23] marostegui: https://phabricator.wikimedia.org/T176043#3690781
[14:39:36] :|
[14:40:00] how could that possibly happen?
[14:40:39] maybe the script was run several times?
[14:54:02] jynus: I already tagged the open children of T150767 with #data-services, so I think it's good to close.
[14:54:02] T150767: Wikireplica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767
[14:54:41] I'll work up some message to put at the top of the bug before closing it
[14:55:20] there is also T138967
[14:55:20] T138967: Labs database replica drift - https://phabricator.wikimedia.org/T138967
[14:55:30] but I do not want to close it until the documentation is updated
[14:55:38] *nod*
[14:55:54] I'll try to work on the docs today.
[14:56:00] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database/Replica_drift this is basically fixed
[14:56:31] I have a draft for the shutdown announce too, but we need to pick dates for the reboots of labsdb100[13] if we are still planning on doing that.
[14:56:31] by having migrated production to ROW
[14:56:45] let's do that ASAP
[14:56:57] just send an email for people to backup data that could not be lost
[14:57:13] which should be none, but you know
[14:57:30] I'll find out who from my team can do the reboots and have them check with y'all about times
[14:57:40] I am going on vacation
[14:57:44] but manuel will be around
[14:57:49] in any case
[14:57:52] there is not much to do
[14:58:03] if it works, it is just a restart
[14:58:15] if the hardware fails, there isn't much to do
[14:58:25] yeah, I was just thinking there may be smaller problems than a hw fail
[14:58:38] like unpuppetized stuff?
[14:58:48] who knows, those are so old...
[14:58:51] but I agree that if we have drive failures then it's done
[14:59:18] we just have to announce for people to take backups ASAP
[14:59:40] agreed
[14:59:55] most uses are just a cron
[15:00:00] so that should be fine
[15:00:18] one thing I get is?
where should I backup
[15:00:33] I know there is NFS and other fs
[15:00:38] "your laptop" is the canonical answer
[15:00:41] it would be nice to clarify that
[15:00:46] ok
[15:01:04] we do not recommend treating our NFS as safe
[15:01:08] sure
[15:01:14] that is why I didn't know what to say
[15:01:31] someday™ we will have a backup solution
[15:01:46] oh, I didn't expect backups to be provided
[15:02:00] but maybe disk space somewhere, auto-managed
[15:02:08] Chase and I have crazy plans but they are still pretty far off
[15:02:09] even a local disk
[15:02:32] but it was proposed back in the day to backup replica data
[15:02:50] and it was denied, because important data should not be on replicas
[15:03:24] plus many people do redundant copies of wiki data
[15:03:39] yeah. I don't want to make automatic db backups for people, but I do want some backup service that can be used generally
[15:04:03] like you are pointing out it is a tricky problem though because of people hoarding things
[15:04:43] if we had fancy SAN type things that did block level deduplication that would be less painful
[15:04:50] but that's probably never going to happen
[15:08:10] jynus: also, have a good vacation! you deserve one :)
[15:10:16] bd808: we had a long discussion about that for production db backups
[15:10:44] in the end we are (most probably) going without deduplication
[15:27:58] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Prepare and check storage layer for amwikimedia - https://phabricator.wikimedia.org/T176043#3690925 (Ladsgroup) hmm, when at first I tried to make the wiki, due to lack of documentation I used "fawiki" instead of "aawiki" because...
[15:31:43] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Prepare and check storage layer for amwikimedia - https://phabricator.wikimedia.org/T176043#3690929 (jcrespo) Sorry, I phrased the "how" badly (I do not care much if there was a bug on the documentation/procedure).
What I wanted...
[15:34:25] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Prepare and check storage layer for amwikimedia - https://phabricator.wikimedia.org/T176043#3690930 (Ladsgroup) I was more into explaining that this was a one time thing and won't happen so we should not be worried about future c...
[15:37:06] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Prepare and check storage layer for amwikimedia (including dropping s7 version of the wiki) - https://phabricator.wikimedia.org/T176043#3690933 (jcrespo)
[15:39:03] DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Prepare and check storage layer for amwikimedia (including dropping s7 version of the wiki) - https://phabricator.wikimedia.org/T176043#3690935 (Marostegui)
[15:41:19] DBA, Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3690937 (Marostegui) I am fairly confident that all the drifts have been corrected on s3 (I will do another final check tomorrow) But I would love to do a final check for db1044 (sanitarium master), but that wo...
[15:45:55] DBA, Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3690948 (jcrespo) Actually, we can do it- we can use one of the currently depooled hosts as new sanitarium master. OR we do not need to stop replication on db1044- some lag could happen on labsdbs, but of secon...
[15:48:02] DBA, Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3690951 (jcrespo) s/we do not need to stop replication on db1044/only stop it to sync it to other host/. I can do either for you, if you want.
[15:48:20] DBA, Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3690952 (Marostegui) >>!
In T164488#3690948, @jcrespo wrote: > Actually, we can do it- we can use one of the currently depooled hosts as new sanitarium master. OR we do not need to stop replication on db1044- s...
[16:15:55] DBA, Operations, Ops-Access-Requests, cloud-services-team (Kanban): Access to raw database tables on labsdb* for wmcs-admin users - https://phabricator.wikimedia.org/T178128#3691050 (chasemp) p:Triage>Normal a:madhuvishy @madhuvishy is going to take a tour here and document from our e...
[17:18:45] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3691322 (faidon)
[22:23:14] DBA, monitoring, Epic, Patch-For-Review, Wikimedia-Incident: Reduce false positives on database pages - https://phabricator.wikimedia.org/T177782#3692371 (Dzahn) https://gerrit.wikimedia.org/r/#/c/384895/ (and see comment on that)