[00:12:45] DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3633900 (bd808)
[01:01:36] DBA, Data-Services, Epic: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3633963 (bd808)
[01:19:45] DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3634014 (bd808)
[04:10:42] DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3634209 (zhuyifei1999)
[05:32:07] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 7 others: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3544973 (Mattflaschen-WMF)
[05:35:12] Blocked-on-schema-change, DBA, Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3634295 (Marostegui) >>! In T168661#3632757, @greg wrote: > What @Marostegui said, the only really more complicated part is the announce/delay so that peop...
[05:36:40] DBA, Data-Services, User-bd808, cloud-services-team (Kanban): Create and announce timeline for shutting down labsdb100[13] - https://phabricator.wikimedia.org/T175086#3582412 (Marostegui) >>! In T175086#3633776, @bd808 wrote: > We also need to choose a date sometime in October to perform the outs...
[05:43:44] DBA, Operations, ops-codfw, Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3634327 (Marostegui) The rack looks fine and so do the PDU and their temperature graphs. Going to repool this host and if it happens again we will really need to look into this as the rac...
[05:48:10] DBA, Operations, ops-codfw, Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3634330 (Marostegui) Open>Resolved a:Marostegui
[06:01:45] DBA, Data-Services, User-bd808, cloud-services-team (Kanban): Create and announce timeline for shutting down labsdb100[13] - https://phabricator.wikimedia.org/T175086#3634372 (bd808) >>! In T175086#3634296, @Marostegui wrote: > My question to that would be...if one of them doesn't come back, are...
[06:03:00] DBA, Data-Services, User-bd808, cloud-services-team (Kanban): Create and announce timeline for shutting down labsdb100[13] - https://phabricator.wikimedia.org/T175086#3634375 (Marostegui) >>! In T175086#3634372, @bd808 wrote: >>>! In T175086#3634296, @Marostegui wrote: >> My question to that woul...
[06:07:02] DBA: Drop "trackbacks" table on all wikis that have it - https://phabricator.wikimedia.org/T175051#3634380 (Marostegui)
[06:13:18] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 7 others: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3634388 (Mattflaschen-WMF) >>! In T171027#3628964, @Baw...
[06:18:52] DBA: Drop "trackbacks" table on all wikis that have it - https://phabricator.wikimedia.org/T175051#3634409 (Marostegui)
[06:25:18] DBA, cloud-services-team, Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3634410 (Marostegui) What should we do with this task? is it all good now?
[07:28:55] DBA: Drop "trackbacks" table on all wikis that have it - https://phabricator.wikimedia.org/T175051#3634480 (Marostegui)
[07:51:23] DBA: Drop "trackbacks" table on all wikis that have it - https://phabricator.wikimedia.org/T175051#3634540 (Marostegui)
[07:58:59] DBA: Drop "trackbacks" table on all wikis that have it - https://phabricator.wikimedia.org/T175051#3634566 (Marostegui)
[08:02:24] DBA, cloud-services-team, Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3634575 (hashar) >>! In T175002#3579784, @chasemp wrote: > I'm wondering if this is related: > > https://phabricator.wikimedia.org/T170492#3...
[08:12:44] DBA, Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3634595 (Marostegui) I would like to decommission db1035 from s3, it is low on disk space. According to https://gerrit.wikimedia.org/r/#/c/338996/1/wmf-config/db-eqiad.php what we could do is..s...
[10:04:03] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3634844 (Marostegui)
[12:05:20] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3599592 (faidon) I may be missing something, but why do we need //client// cer...
[12:08:05] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3635120 (jcrespo) we do not need client certs- we need the "public" CA being a...
[12:20:44] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3635125 (faidon) That isn't needed. We import the puppet CA to the host's cert...
[12:23:45] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3635129 (Marostegui)
[12:24:35] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3564203 (Marostegui) I have removed the index from codfw and eqiad in s1 The only hosts pending on eqiad are: * dbstore1001 and dbstore1002 (they do...
[12:24:54] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3599592 (Joe) No puppet patch is needed, if you just need the CA cert availabl...
[12:33:18] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3635140 (Joe) Looking at http://php.net/manual/en/mysqli.ssl-set.php, I would...
[12:58:38] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3635169 (Marostegui) s2 is totally done. Pending (and will not be done) dbstore1001, dbstore1002 and db1047, which are tokudb and then it cannot be...
[12:58:45] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3635170 (Marostegui)
[13:17:45] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3635209 (Marostegui) s6 has been done in eqiad. I just realised that s6 in codfw, for templatelinks and pagelinks do not have the PK, they still ha...
[13:23:09] DBA, Epic, Patch-For-Review, codfw-rollout: Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover) - https://phabricator.wikimedia.org/T155099#3635223 (Marostegui)
[13:23:11] DBA: Convert unique keys into primary keys for some wiki tables on s6-eqiad - https://phabricator.wikimedia.org/T163979#3635220 (Marostegui) Resolved>Open a:jcrespo>Marostegui Re-opening to make sure codfw also get the PKs as they are missing there. I will directly alter the master as replica...
[13:23:14] DBA, MediaWiki-Database, Patch-For-Review, PostgreSQL, Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3635224 (Marostegui)
[13:24:08] DBA: Convert unique keys into primary keys for some wiki tables on s6-eqiad and codfw - https://phabricator.wikimedia.org/T163979#3635228 (Marostegui)
[13:58:37] DBA, cloud-services-team, Patch-For-Review: db1009 (m5, used primarily for cloud services) unresponsive for minutes - https://phabricator.wikimedia.org/T175002#3635438 (chasemp) Open>Resolved >>! In T175002#3634410, @Marostegui wrote: > What should we do with this task? is it all good now? C...
[14:07:21] DBA, Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3635468 (jcrespo) Yes, although we may need still an extra host for vlow/dumps, separate from the other services, and smaller in size. Any ideas about which one we can move that is relatively ol...
[14:13:48] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3635511 (jcrespo) @faidon, cool! Less work for us :-) As you imagine, I didn't...
[14:14:28] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3635515 (jcrespo) @Joe so s/puppet patch/mediawiki config patch/ :-)
[14:35:42] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3635587 (hoo) Now that all articles have been refreshed (see T151717#3621993/T151717#3621975 for a comparison): ``...
[14:38:25] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3635607 (jcrespo) In fact, as this is not going to be enabled on all hosts yet...
[14:48:05] jynus: marostegui: https://phabricator.wikimedia.org/T151717#3633466 What do you think?
[14:59:01] also I need advice with two queries
[15:00:16] hoo, not the best week for getting blocked on us
[15:00:24] we need time to check your data
[15:10:06] that's fine, I guess
[15:10:18] I'll keep you updated when there are (experimental) changes from us
[15:10:25] we might use elwiki as testbed for more things
[15:11:47] SELECT page_id,page_title FROM `page` LEFT JOIN `redirect` ON ((page_id = rd_from)) WHERE (page_id > 86) AND page_namespace IN ('0','120') AND (rd_from IS NULL) ORDER BY page_id ASC LIMIT 2500
[15:12:34] SELECT rev_id,rev_content_format,rev_timestamp,page_latest,page_is_redirect,old_id,old_text,old_flags,page_title FROM `page` INNER JOIN `revision` ON ((page_latest=rev_id)) INNER JOIN `text` ON ((old_id=rev_text_id)) WHERE (('Q2'=page_title) AND (0=page_namespace)) OR (('P2'=page_title) AND (120=page_namespace)) … (with 500 such page_title/page_namespace pairs)
[15:12:52] jynus: Do you think these are ok in a dump maintenance script (not during a web request)?
[15:13:36] they seem quite fast, I tested the relevant code on terbium already
[15:15:00] hoo- protip- if you need an actual answer from a dba- go to phabricator
[15:15:04] :-D
[15:15:13] It will get lost here if not
[15:15:35] I hoped for a quick "doesn't look too bad" :P I can open a ticket, if you want
[15:15:48] I am in Dublin typing this while standing in the middle of a conference
[15:15:55] for context :-)
[15:23:52] DBA: Larger Wikidata entity dump queries - https://phabricator.wikimedia.org/T176760#3635830 (hoo)
[15:29:06] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3635865 (aaron) >>! In T175672#3635140, @Joe wrote: > Looking at http://php.ne...
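Editorial note on the query pasted at 15:11:47: it reads like one step of a keyset-paginated scan, where each batch resumes from the last page_id seen (`page_id > 86 ... ORDER BY page_id ASC LIMIT 2500`). The sketch below illustrates that pattern only. It is not the dump script's actual code: the function name `iter_page_batches`, the `demo` helper, the cut-down table definitions, and the use of an in-memory SQLite database in place of MariaDB are all assumptions made for illustration.

```python
import sqlite3


def iter_page_batches(conn, namespaces=(0, 120), batch_size=2500, start_after=0):
    """Yield batches of (page_id, page_title) for non-redirect pages.

    Hypothetical helper: each batch resumes after the highest page_id
    already returned, so no OFFSET scan is needed.
    """
    placeholders = ",".join("?" for _ in namespaces)
    sql = (
        "SELECT page_id, page_title FROM page "
        "LEFT JOIN redirect ON page_id = rd_from "
        f"WHERE page_id > ? AND page_namespace IN ({placeholders}) "
        "AND rd_from IS NULL "
        "ORDER BY page_id ASC LIMIT ?"
    )
    last_id = start_after
    while True:
        rows = conn.execute(sql, (last_id, *namespaces, batch_size)).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # continue after the last page_id of this batch


def demo():
    # Minimal stand-in schema (assumption): only the columns the query touches.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE page (page_id INTEGER PRIMARY KEY, "
        "page_namespace INTEGER, page_title TEXT);"
        "CREATE TABLE redirect (rd_from INTEGER PRIMARY KEY);"
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "Q1"), (2, 0, "Q2"), (3, 120, "P2"), (4, 1, "Talk"), (5, 0, "Q5")],
    )
    conn.execute("INSERT INTO redirect VALUES (2)")  # page 2 is a redirect
    return list(iter_page_batches(conn, batch_size=2))
```

In this sketch each query is bounded by the primary key and a LIMIT, so the cost per batch should stay roughly constant as the scan advances; whether that holds on the production replicas is what the Phabricator ticket was opened to check.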
[15:29:20] DBA: Larger Wikidata entity dump queries - https://phabricator.wikimedia.org/T176760#3635868 (hoo)
[15:34:20] DBA: Larger Wikidata entity dump queries - https://phabricator.wikimedia.org/T176760#3635888 (ArielGlenn) This is in the context of the wikidatawiki weekly dumps job (json and other formats), which runs separately from the usual xml/sql dumps. I expect that it mostly runs on the vslow/dumps db server.
[15:37:13] DBA: Larger Wikidata entity dump queries - https://phabricator.wikimedia.org/T176760#3635903 (hoo) >>! In T176760#3635888, @ArielGlenn wrote: > I expect that it mostly runs on the vslow/dumps db server. Sadly not, due to {T147169} :/
[16:17:57] DBA: Larger Wikidata entity dump queries - https://phabricator.wikimedia.org/T176760#3636037 (Marostegui) A quick run of those two queries on the slowest slaves of s5 doesn't look too dangerous to me (as in they are pretty fast). I will check a bit more tomorrow
[16:50:07] Blocked-on-schema-change, DBA, Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3636157 (greg) I would ask the Multimedia team who their standard community liaison is (they must have one due to the Structured Data on Commons work). cc...
[17:15:48] Blocked-on-schema-change, DBA, Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3636198 (MarkTraceur) That would be @CKoerner_WMF ...
[17:35:04] DBA, Operations, Availability (Multiple-active-datacenters), Performance-Team (Radar): Make client certs available for apache/maintenance hosts for TLS connections to mariadb - https://phabricator.wikimedia.org/T175672#3636240 (jcrespo) I can connect using python just by pointing to the CA cert:...
[17:45:30] DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3636252 (Marostegui) @Cmjohnson I am back from holidays, so let me know when you can help us with this task sometime next week if you want Thank you!
[20:26:10] DBA, Operations, ops-codfw: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3637038 (jcrespo) Resolved>Open
[20:28:25] DBA, Operations, ops-codfw: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3637045 (Marostegui) Looks like this server has crashed again for the same reason: ``` [Tue Sep 26 20:17:42 2017] hpsa 0000:02:00.0: scsi 0:1:0:0: resetting logical Direct-Access HP LOGICAL VOLUM...
[20:32:27] DBA, Operations, ops-codfw: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3637071 (Marostegui) I cannot see anything on ILO logs, last entry is from 9th Sept
[23:22:29] DBA, Community-Tech-Sprint, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), Patch-For-Review: Issue with maintenance script: SELECTing revisions with high rev_id is pai... - https://phabricator.wikimedia.org/T175962#3637761
[23:23:03] DBA, Community-Tech-Sprint, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), Patch-For-Review: Issue with maintenance script: SELECTing revisions with high rev_id is pai... - https://phabricator.wikimedia.org/T175962#3637771