[05:38:02] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3708790 (Marostegui)
[05:39:45] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3686383 (Marostegui)
[05:41:03] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3708793 (Marostegui)
[07:04:19] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3708888 (Marostegui)
[08:52:18] DBA, Operations, cloud-services-team, Scoring-platform-team (Current): Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3709050 (MoritzMuehlenhoff) I installed the latest trusty kernels on labsdb1001/1003.
[10:35:40] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3709313 (Marostegui)
[10:37:44] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3709315 (Marostegui)
[10:41:06] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3709320 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` db2091.codfw.wmnet ``` The log can be found in `/var/log/wmf-auto-reimage/...
[10:53:56] marostegui: (if you have time) qq about db1108 and https://gerrit.wikimedia.org/r/#/c/386359/2 - if I merge this change I guess that puppet will try to install mariadb etc.., should we do any manual step/config/etc.. before it?
[10:54:03] for example to install a specific version
[10:54:47] let me see
[10:55:02] ah
[10:55:11] no, that should be fine, there is nothing running on that server anyways, no?
[10:55:27] does it have notifications disabled?
[10:55:28] nope, it is a brand new stretch host
[10:55:30] yep!
[10:55:39] then i think we should be good to go
[10:55:54] very nice, I am going to lunch and then I'll try
[10:55:59] cool!
[10:56:03] enjoy the pasta
[10:56:04] :p
[10:56:18] I am going for a salad probably :P
[10:56:25] don't be boring
[10:59:40] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3709389 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2091.codfw.wmnet'] ``` and were **ALL** successful.
[10:59:49] \o/
[12:15:00] db1108 ready (still an issue with the unit for eventlogging_sync but I am working on it)
[12:15:24] in theory we could verify the host and then copy over filezz?
[12:16:29] elukey@db1108:/var/log$ dpkg --list | grep maria
[12:16:30] ii libmariadbclient18:amd64 10.1.26-0+deb9u1 amd64 MariaDB database client library
[12:16:32] ii wmf-mariadb101 10.1.28-1 amd64 MariaDB 10.1 with Wikimedia-specific patches.
[12:16:57] nice!
[12:17:12] to do so, we'd need to stop db1047, is that possible?
[12:17:55] I think it is fine but since multiple things are reading from it in analytics I might need to ask for some help from my team
[12:18:01] sure
[12:18:10] we can do it tomorrow morning if you like
[12:18:33] in the meantime can we check that everything looks ok? I guess that I explicitly need to start mariadb
[12:19:26] it will fail
[12:19:29] as there are no files
[12:19:34] we can install a few files and run it
[12:20:20] but it should be fine tomorrow
[12:20:26] once we copy the data over
[12:20:29] ah super
[12:20:30] the my.cnf are the same?
[12:20:54] checking
[12:25:00] basedir = /opt/wmf-mariadb10 is not ok on db1108
[12:25:43] rest looks similar (I am talking about /etc/my.cnf)
[12:27:13] right, it needs to be 101
[12:27:19] so /opt/wmf-mariadb101
[12:33:16] I guess that one needs to be puppetized in some way
[12:33:46] ahh mariadb::config
[12:34:18] yeah :)
[12:36:48] something like this https://gerrit.wikimedia.org/r/#/c/386376/
[12:37:22] +1ed
[12:38:50] thanks!
[12:49:41] DBA, Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3709700 (Marostegui) db1077 has been checksummed - going to start fixing a few data drifts and we should be good to close this \o/
[12:53:37] ok fixed the systemd template for eventlogging_snyc
[12:53:40] *sync
[13:28:50] DBA, Cloud-Services, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3709749 (Marostegui)
[13:28:53] DBA, Cloud-Services, ContentTranslation, WorkType-NewFunctionality: Replicate ContentTranslation databases on Labs - https://phabricator.wikimedia.org/T119847#3709747 (Marostegui) stalled>declined I am going to close this as Declined as it is unlikely that will be able to replicate x1 to...
[15:12:29] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 12 others: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3710051 (Lydia_Pintscher)
[15:18:30] DBA, Commons, MediaWiki-Watchlist, Performance, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Commons - https://phabricator.wikimedia.org/T179010#3710115 (Lydia_Pintscher)
[15:19:15] marostegui: my team told me that we can shutdown mariadb on db1047 anytime, so we can do the work whenever you have time
[15:20:43] DBA, MediaWiki-Watchlist, Performance, Russian-Sites, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Russian Wikipedia - https://phabricator.wikimedia.org/T179012#3710122 (Lydia_Pintscher)
[15:21:09] DBA, MediaWiki-Watchlist, Performance, Russian-Sites, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Russian Wikipedia - https://phabricator.wikimedia.org/T179012#3710140 (Lydia_Pintscher) Open>stalled Marking as stalled as we should wait for T179010 to be done.
[15:21:19] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 12 others: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3667198 (Lydia_Pintscher)
[15:26:38] DBA, Commons, MediaWiki-Watchlist, Performance, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Commons - https://phabricator.wikimedia.org/T179010#3710082 (Bawolff) Is there an estimate on the number of rows in a given time period this would be?
[15:26:49] elukey: let's do it tomorrow morning?
[15:28:53] DBA, Commons, MediaWiki-Watchlist, Performance, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Commons - https://phabricator.wikimedia.org/T179010#3710181 (Marostegui) Thanks for the heads up. When do you plan to re-enable this? How are you going to monitor it?
[15:32:22] marostegui: ack!
[15:32:42] I'll be your post alter table annoying task
[15:32:48] XDDDDDDDD
[15:32:51] :D
[15:38:41] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3710213 (Marostegui)
[15:58:09] DBA, Commons, MediaWiki-Watchlist, Performance, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Commons - https://phabricator.wikimedia.org/T179010#3710292 (hoo)
[16:24:45] DBA, Commons, MediaWiki-Watchlist, Performance, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Commons - https://phabricator.wikimedia.org/T179010#3710455 (hoo) >>! In T179010#3710172, @Bawolff wrote: > Is there an estimate on the number of rows in a given time period this...
[16:25:28] marostegui: https://phabricator.wikimedia.org/T179010#3710455 Would 12:00 UTC tomorrow work for you?
[16:26:00] Should this make problems again (which I can't really imagine), we can just go back to the current (sad) state
[16:28:17] DBA, Commons, MediaWiki-Watchlist, Performance, Wikidata-Sprint: re-enable Wikidata Recent Changes integration on Commons - https://phabricator.wikimedia.org/T179010#3710469 (Marostegui) >>! In T179010#3710455, @hoo wrote: >>>! In T179010#3710181, @Marostegui wrote: >> Thanks for the heads up...
[16:28:49] just replied
[16:28:57] marostegui: Thanks, that's ok with me.
[16:29:02] thanks
[16:29:11] i am alone for 3 weeks as jaime is on holidays
[16:29:28] Ok
[16:29:31] so i have a bit of stuff to keep an eye on, so I rather get this done on monday so we have more days to monitor
[16:29:34] and react
[16:29:36] I might want to move on with https://phabricator.wikimedia.org/T172914 then tomorrow
[16:29:37] and not go into the weekend
[16:29:50] this will initially only affect our statement usage test wikis
[16:29:57] so should be manageable in scope
[16:30:11] sure, we don't have many enabled yet, right?
[16:30:15] i mean, they are not huge wikis
[16:32:42] Yeah, cawiki, cewiki, elwiki, kowiki and trwiki
[16:32:59] should be fine i think
[16:33:06] I expect this to have less impact than the statement usage tracking trial on the same wikis
[16:33:09] which was fine
[16:33:15] yeah
[16:33:18] sounds good
[16:33:22] just ping me tomorrow here
[16:33:24] so i can be aware
[16:33:24] and if it causes problems, we can (again) just pull it
[16:49:18] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3710502 (Cmjohnson) Dell declined to send the new DIMM, stated that my supporting documentation was insufficient. I swapped the DIMM at A4 to B4 and will need to wait for that to fail before submit...
[17:00:28] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3710517 (Marostegui) How's that possible? Aren't the logs from _their_ server's idrac enough??
[17:03:33] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3710521 (Marostegui) Open>Resolved Anyways, I will mark this as resolved (mysql is back up) and let's reopen once it fails again. I will do some heavy alters in the next few days so we will...
[17:23:52] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3710589 (Marostegui) I have started 16 (1 per database present) concurrent alters for the templatelinks table to generate some load
[19:01:19] DBA, Operations, ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3711007 (Marostegui) And now almost 50 alters running at the same time
[19:06:55] DBA, Data-Services, cloud-services-team (Kanban): Identify tools hosting databases on labsdb100[13] and notify maintainers - https://phabricator.wikimedia.org/T175096#3582626 (Quiddity) >>! In T175096#3589046, @bd808 wrote: > Initial list of accounts: > P5960 I've annotated the labsdb1001 section wi...
[19:13:21] DBA, Data-Services, cloud-services-team (Kanban): Identify tools hosting databases on labsdb100[13] and notify maintainers - https://phabricator.wikimedia.org/T175096#3582626 (Luke081515) Is it possible to make a dump of the merl DB and put it into the tool dir? Merl has been inactive for some time n...
[21:14:18] DBA, Operations, Ops-Access-Requests, cloud-services-team (Kanban): Access to raw database tables on labsdb* for wmcs-admin users - https://phabricator.wikimedia.org/T178128#3711314 (madhuvishy) @jcrespo @bd808 I looked at the accounts set up we have now, and it looks like the labsdbadmin user is...
[21:30:23] DBA, Data-Services: Toolforge oursql connecting to enwiki.analytics.db.svc.eqiad.wmflabs raises error 1615 'Prepared statement needs to be re-prepared' but works fine on enwiki.labsdb - https://phabricator.wikimedia.org/T179041#3711347 (zhuyifei1999)