[05:52:17] DBA, Operations, ops-eqiad: Degraded RAID on db1056 - https://phabricator.wikimedia.org/T177171#3649781 (Marostegui)
[05:52:51] DBA, Operations, ops-eqiad: Degraded RAID on db1056 - https://phabricator.wikimedia.org/T177171#3649267 (Marostegui) a:Cmjohnson Hi @Cmjohnson please change this disk whenever you can Thanks!
[05:53:18] DBA, Operations, ops-eqiad: Degraded RAID on db1056 - https://phabricator.wikimedia.org/T177171#3649267 (Marostegui) p:Triage>Normal
[05:56:00] DBA, MediaWiki-extensions-WikibaseClient, Wikidata: Find test wiki(s) for new description usage and enable there - https://phabricator.wikimedia.org/T177155#3649787 (Marostegui) ``` wc -l ./dblists/small.dblist 503 ./dblists/small.dblist ``` I would suggest we just enable it on a portion of them fir...
[05:59:40] DBA, Data-Services: Some queries to new replica hosts are dramatically slower than labsdb; missing indexes? - https://phabricator.wikimedia.org/T177096#3649817 (Marostegui) >>! In T177096#3647333, @bd808 wrote: > I'd be ok with stalling the index fix for a few days if we can get something properly design...
[06:07:33] DBA, Operations, Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3649832 (Marostegui)
[06:39:00] DBA, Operations, Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3649839 (Marostegui)
[06:39:47] DBA, Operations, ops-eqiad, Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620829 (Marostegui) a:Marostegui>Cmjohnson db1036 is now ready to be totally decommissioned by @Cmjohnson
[07:45:50] jynus: hey, let me know when you have some time to do https://gerrit.wikimedia.org/r/#/c/381433/
[07:46:00] I will monitor everything :)
[07:46:40] Amir1: you want me to merge it?
[07:46:51] marostegui: yeah sure
[07:47:18] or is there any specific reason you want j*nus to do it?
(ie: you have been talking to him while I was on holidays etc)?
[07:47:28] no no
[07:47:36] ok, I will merge it then :)
[07:47:37] he said he wanted this clean up
[07:48:03] also if it's possible can you change something in one of the files so it starts at first?
[07:48:14] I tell you what
[07:48:32] sure
[07:51:00] so can I merge it now?
[07:51:06] yeah
[07:51:26] marostegui: make the file and put this in it "Processed up to page 0 (Q1)"
[07:51:34] merged
[07:51:44] "/var/log/wikidata/rebuildTermSqlIndex.log"
[07:51:58] just be careful about the owner and permissions
[07:52:04] in terbium
[07:52:31] ok
[07:52:53] -rw-rw-r-- 1 www-data www-data 0 Sep 30 06:26 rebuildTermSqlIndex.log
[07:53:04] owner and perms should stay the same
[07:53:06] Thanks
[07:53:29] So it just needs: Processed up to page 0 (Q1)
[07:53:32] added to that file?
[07:54:40] yeah
[07:54:59] there you go
[07:55:15] https://phabricator.wikimedia.org/P6065
[07:55:17] fantastic
[07:55:20] Thank you!
[07:55:27] yw!
[07:55:47] now we need to wait for half an hour and then fun begins
[07:55:59] i will keep an eye on the servers too
[07:56:57] I'm doing some cleanup for wikidata atm
[07:57:06] number of deletes will go up
[07:57:41] ok, throttled right?
[08:01:34] yeah
[08:01:38] https://grafana.wikimedia.org/dashboard/db/mysql?panelId=3&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1063&var-port=9104&from=now-3h&to=now
[08:03:07] thanks - nice spike :p
[08:07:35] I forgot to mention, I changed the regex to a 3-letter code
[08:07:50] 2 parenthesis was too much for my mind
[08:09:47] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3649945 (Marostegui)
[08:10:15] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3578969 (Marostegui) s5 got the duplicated index cleaned up
[08:11:56] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3649947 (Marostegui)
[08:17:40] something is not right: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=11&fullscreen&orgId=1&from=now-90d&to=now&var-dc=eqiad%20prometheus%2Fops&var-server=dbstore1001&var-port=9104
[08:18:40] I am going to stop dbstore1001
[08:19:20] oh wow
[08:19:30] it was acting very weirdly past thursday i think
[08:19:34] I think we have some leaks
[08:19:40] on the events
[08:19:44] or somewhere else
[08:20:30] maybe I should even drop s5
[08:20:50] how far behind it is now?
[08:20:52] like 10 days?
[08:21:08] Seconds_Behind_Master: 799299
[08:21:28] yep, that is almost 10 days
[08:21:34] it will never recover I am afraid :(
[08:23:35] I am going to put it down, upgrade, restart, etc
[08:23:55] ok
[08:26:37] dbstore2001 had the same issues
[08:26:41] either it is tokudb
[08:26:52] or the replica lag script
[08:27:25] had? you mean when it was multisource?
[08:27:32] yes
[08:27:40] before we rebuilt it
[08:27:45] when it was multisource I don't think it was tokudb, it was already innodb no?
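Side note on the `Seconds_Behind_Master: 799299` reading at 08:21:08: a minimal Python sketch of the conversion behind the "almost 10 days" estimate. The helper name `lag_breakdown` is made up for illustration and is not part of any tooling mentioned in the log.

```python
from datetime import timedelta


def lag_breakdown(seconds_behind: int) -> str:
    """Render a Seconds_Behind_Master value as days/hours/minutes/seconds."""
    lag = timedelta(seconds=seconds_behind)
    # timedelta normalises to whole days plus the leftover seconds of the day.
    hours, remainder = divmod(lag.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{lag.days}d {hours}h {minutes}m {seconds}s"


# Value quoted in the log at 08:21:08.
print(lag_breakdown(799299))
```

For the value in the log this comes out a little over nine days (799299 / 86400 is roughly 9.25).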
[08:27:53] as we built it from codfw slaves
[08:27:59] no, it was tokudb
[08:28:12] oh
[08:28:15] true
[08:28:20] we rebuilt it twice
[08:28:26] once as multisource innodb
[08:28:30] yeah, i believe so
[08:28:32] and another as multi-instance
[08:28:42] so it could be the replication script
[08:28:59] if it is…we should see it happening on multi-instance too, no?
[08:29:12] there is no lag
[08:29:16] on the current one
[08:29:19] ah true
[08:29:34] the plan is to use binary backups to replace that
[08:29:49] yes, I forgot it is not using events
[08:30:22] but it might be the events+the current load
[08:30:28] otherwise we'd see the problem on all the shards, no?
[08:31:21] interestingly, it stopped quickly
[08:32:41] "A regression was discovered after the release of MariaDB 10.1.27. It has been pulled from the downloads system, but some mirrors may still have it. Do not download or install this version. "
[08:32:55] "Stay with MariaDB 10.1.26 until 10.1.28 is released"
[08:33:09] \o/
[08:33:29] we have .26 on labs
[08:33:30] so we are good :)
[08:40:39] https://grafana.wikimedia.org/dashboard/db/mysql?panelId=3&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1063&var-port=9104&from=now-3h&to=now
[08:43:08] that's me, I hope that's fine
[08:43:31] so far everything is looking fine
[08:44:11] https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?panelId=5&fullscreen&orgId=1&from=now-3h&to=now
[08:44:34] bye bye dbstore1001
[08:44:41] well
[08:44:46] not yet
[08:44:51] that is just the reboot
[08:45:00] should I definitely do it?
[08:45:14] We can bring it back again and wait 2-3 days
[08:45:18] to see if it attempts to catch up
[08:45:20] (i doubt it)
[08:45:29] but there is no harm in waiting 2-3 more days I guess
[08:45:34] maybe disabling the event threads
[08:45:45] yeah, that could also be a good experiment
[08:46:03] because if the purge lag doesn't go down
[08:46:09] it will happen with other threads, too
[08:46:24] I am building 10.1.28 right now
[08:46:36] for stretch
[08:46:47] it won't hurt
[08:47:15] DBA, Operations, ops-codfw: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3649981 (Marostegui) @Papaul server is now off. Feel free to power it on once you are done with it Thank you!
[08:50:00] dbstore1002 is at 6% capacity
[08:50:38] :(
[08:51:41] there is still 130G to be used on the pv, but I rather not use them if we can clean up more stuff
[08:51:48] did the hw arrive finally last week elukey?
[08:52:09] didn't get any notification from Chris so not really sure, checking the task
[08:52:25] I heard something arrived on friday
[08:52:32] not sure if that or something else
[08:52:41] I got a notification of the task update
[08:52:55] saying that it was expected to arrive today (today being the day i saw the task update)
[08:54:38] so there are some eventlogging tables that I can nuke, need to double check with Nuria first though, so I could tentatively try to free some space by EOD
[08:55:06] that'd be nice :)
[08:56:27] do we have any idea what are the databases with the most growth rate?
[08:56:38] or better, how to retrieve this info
[08:56:58] no, that is on purpose removed from public monitoring
[08:57:14] but no private monitoring is set yet
[08:57:41] I mean, we can get some write stats
[08:59:29] root@dbstore1002[information_schema]> SELECT * FROM table_statistics ORDER BY rows_changed DESC LIMIT 20;
[09:00:44] ahhh very nice, /me ignorant and didn't know
[09:02:03] but it can be misleading - mysql.gtid_slave_pos is on top, because it changes on every replication event
[09:02:20] but it just updates a single value
[09:02:57] sure sure, but I can see the busiest tables of the log databases for example, I didn't know how to do it before
[09:03:12] there is a better way, which is performance_schema
[09:03:20] but it is not enabled on old/busy hosts
[09:03:37] we want it enabled for 46/47 replacements
[09:04:43] \o/
[09:10:24] DBA, Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3650005 (Marostegui) >>! In T172679#3638670, @Marostegui wrote: >>>! In T172679#3635468, @jcrespo wrote: >> Yes, although we may need still an extra host for vlow/dumps, separate from the other...
[09:18:25] I am going to test 10.1.28 on the passibe labsdb
[09:18:30] *passive
[09:19:21] dbstore1001 is not coming up :-/
[09:22:14] serial console is very slow
[09:23:04] when we rebooted dbstore1002 a month ago or so, it also took quite long to come back
[09:27:16] dbstore1001:~$ /etc/init.d/mysql start --skip-slave-start --event_scheduler=0
[09:27:54] BTW, it took almost 4 hours to backup s5
[09:28:19] it should only have taken 2+, we need better backup sources and separate them from storage
[09:29:36] oh wow 4 hours
[09:30:48] Amir1: re T159753, should we run a defragment on the table to reclaim space?
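Side note on the `table_statistics` query from 08:59:29 and the `mysql.gtid_slave_pos` caveat at 09:02:03: a rough Python sketch of ranking tables by `rows_changed` while skipping replication bookkeeping tables. The `TableStat` type, the `busiest_tables` helper and the sample rows are all invented for illustration; none of the numbers or table names below (other than `mysql.gtid_slave_pos`) come from the hosts discussed in the log.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class TableStat(NamedTuple):
    """One row of write statistics: schema, table, rows changed."""
    schema: str
    table: str
    rows_changed: int


# Tables that change on every replication event and would otherwise
# dominate the ranking (see the 09:02:03 message).
BOOKKEEPING: FrozenSet[Tuple[str, str]] = frozenset({("mysql", "gtid_slave_pos")})


def busiest_tables(
    stats: Iterable[TableStat],
    limit: int = 20,
    ignore: FrozenSet[Tuple[str, str]] = BOOKKEEPING,
) -> List[TableStat]:
    """Return the `limit` most-written tables, excluding bookkeeping tables."""
    kept = [s for s in stats if (s.schema, s.table) not in ignore]
    return sorted(kept, key=lambda s: s.rows_changed, reverse=True)[:limit]


# Hypothetical sample rows, for demonstration only.
sample = [
    TableStat("mysql", "gtid_slave_pos", 900),
    TableStat("log", "example_event_a", 500),
    TableStat("log", "example_event_b", 700),
]

for stat in busiest_tables(sample, limit=2):
    print(f"{stat.schema}.{stat.table}: {stat.rows_changed}")
```

The same filter could just as well be expressed as a WHERE clause on the query itself; doing it client-side here only keeps the sketch self-contained.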
[09:30:48] T159753: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753
[09:31:09] jynus: not yet, let me free up around 20M more
[09:31:25] ok
[09:31:27] I need to ask for a window, it'll take around five hours
[09:33:10] dbstore1001 still starting...
[10:58:38] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3650271 (Marostegui)
[11:34:24] DBA, Operations, procurement: Purchase sanitarium & backup tests hosts (4 hosts in total) - https://phabricator.wikimedia.org/T177203#3650318 (Marostegui)
[11:45:21] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3650340 (Marostegui)
[12:13:37] DBA, Operations: Increase timeout for mariadb replication check - https://phabricator.wikimedia.org/T163303#3650383 (Marostegui) Open>declined With the full reimplementation of the backups/dbstore hosts, let's decline this.
[12:15:03] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3650388 (Marostegui) @Nuria can we get rid of these tables finally?
[12:29:53] DBA, Operations, ops-eqiad: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3650418 (Marostegui)
[12:44:47] DBA, Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3650471 (Marostegui)
[12:45:18] DBA, Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3650485 (Marostegui)
[12:45:21] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3650486 (Marostegui)
[12:50:00] Blocked-on-schema-change, DBA, Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3650502 (Marostegui)
[13:11:00] DBA, RESTBase-API, Reading List Service, Reading Epics (Synchronized Reading Lists), and 4 others: RfC: Reading List service - https://phabricator.wikimedia.org/T164990#3650577 (CKoerner_WMF)
[14:26:00] dbstore1001:s5 is stuck on DELETE /* Wikibase\Repo\ChangePruner::pruneChanges */ FROM `wb_changes` WHERE (change_time < '20170920022354')
[15:12:04] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651025 (Nuria) Sounds (per e-mail conversation) that researchers are interested on the data: https://lists.wikimedia.org/pipermail/wiki-research-l/2017-July/005931.html so i do not think we can remove them
[15:30:15] DBA, Data-Services: Some queries to new replica hosts are dramatically slower than labsdb; missing indexes? - https://phabricator.wikimedia.org/T177096#3651073 (bd808) >>! In T177096#3649817, @Marostegui wrote: > From my point of view, ideally we should have some sort of cronjob or similar that could com...
[15:33:36] DBA, Data-Services: Determine schema differences between labsdb1001 and labsdb1009 - https://phabricator.wikimedia.org/T177223#3651078 (bd808)
[15:33:53] DBA, Data-Services, User-bd808, cloud-services-team (Kanban): Determine schema differences between labsdb1001 and labsdb1009 - https://phabricator.wikimedia.org/T177223#3651093 (bd808) a:bd808
[15:34:19] DBA, Data-Services, Goal, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#3651095 (bd808)
[15:34:22] DBA, Data-Services: Some queries to new replica hosts are dramatically slower than labsdb; missing indexes? - https://phabricator.wikimedia.org/T177096#3646922 (bd808)
[15:34:28] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3651099 (Marostegui)
[15:34:34] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651096 (Marostegui) Open>declined How can something last written in 2013 still be useful? ``` root@db1052:/srv/sqldata/enwiki# ls -lh moodbar_feedback*.ibd -rw-rw---- 1 mysql mysql 40M Mar 12 2013 moo...
[15:37:40] DBA, Data-Services, User-bd808, cloud-services-team (Kanban): Determine schema differences between labsdb1001 and labsdb1009 - https://phabricator.wikimedia.org/T177223#3651125 (jcrespo) There is already 4 related things that, even nothing to do with this, we could integrate this on: * replicati...
[15:40:14] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651160 (Reedy) I don't see why it can't just be exported to an sql dump file, and archived somewhere.
Possibly then imported to another db cluster (analytics or something) if someone wants it in future
[16:00:17] Blocked-on-schema-change, DBA, Reading-Community-Engagement, Community-Liaisons (Oct-Dec 2017): Help communicate read-only time for Commons for schema change required by adding 3D filetype - https://phabricator.wikimedia.org/T176883#3651271 (CKoerner_WMF) I looked at the actives happing on Common...
[16:03:26] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3651309 (jcrespo)
[16:03:28] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651306 (jcrespo) declined>Open If those tables are not in use in production, those tables have to be dropped from production boxes. Unless someone else wants to become the owner of mediawiki dbs, that...
[16:18:03] Blocked-on-schema-change, DBA, Reading-Community-Engagement, Community-Liaisons (Oct-Dec 2017): Help communicate read-only time for Commons for schema change required by adding 3D filetype - https://phabricator.wikimedia.org/T176883#3640012 (Jseddon) I presume a centralnotice will be required for...
[16:32:29] DBA, Epic: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423#3651442 (jcrespo)
[16:32:32] DBA, Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3651439 (jcrespo) Open>Resolved
[16:33:28] DBA, Patch-For-Review: Refactor puppet mariadb class to support multi-instance hosts - https://phabricator.wikimedia.org/T169514#3400308 (jcrespo)
[16:33:30] DBA, Patch-For-Review: Refactor prometheus-mysqld-exporter to support multi-instance hosts - https://phabricator.wikimedia.org/T170666#3651449 (jcrespo)
[16:39:07] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651477 (Nuria) >I don't see why it can't just be exported to an sql dump file, and archived somewhere. Possibly then imported to another db cluster (analytics or something) if someone wants it in future I thin...
[16:53:14] DBA, Operations, ops-eqiad: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651543 (Cmjohnson) @Marostegui Please let me know when you're available this week.
[16:53:46] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651546 (jcrespo) See my proposal, it keeps both manuel and you happy (but requires work from those wanting to keep these around, which I think is fair :-P).
[16:53:48] DBA, Operations, ops-eqiad: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651547 (Marostegui) >>! In T174054#3651543, @Cmjohnson wrote: > @Marostegui Please let me know when you're available this week. What about Thursday?
[16:55:21] DBA, Operations, ops-eqiad: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651551 (Cmjohnson) Thursday works
[16:56:34] DBA, Operations, ops-eqiad: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651555 (Marostegui) >>! In T174054#3651551, @Cmjohnson wrote: > Thursday works Awesome, ping me when you get online! Thank you!
[17:14:18] DBA, Commons, Contributors-Team, MediaWiki-Watchlist, and 8 others: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis - https://phabricator.wikimedia.org/T171027#3651634 (jmatazzoni)
[17:17:45] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651673 (Nuria) >but requires work from those wanting to keep these around, which I think is fair @jcrespo from research team?
[17:20:46] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651683 (Nuria) No, ok, you mean, analytics right? To be clear we do not have a use for that data ourselves but I think it should not be deleted if it is of interest for research. Would you be so kind as to ou...
[17:21:40] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651686 (jcrespo) Whoever wants to keep them! I say it is fair because normally when you ask **if** to keep them around, everybody is for it; if you ask **who** wants to keep around and take care of archiving i...
[17:26:28] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651708 (Nuria) @jcrespo: there is a staging database in the analytics replicas, could those tables be copied there before you delete them for all wikis? That is the best I can think of right now.
[17:28:09] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651713 (jcrespo) Yes, that was actually my implicit suggestion. Other things can be suggested, and we will help, we just need to take them outside the *wik* dbs.
[17:30:04] DBA, Analytics: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3651718 (Nuria) @jcrespo ok, i wasn't clear. Then maybe we can put them in a better-names database like "mediawiki-archive"?
[19:04:27] DBA, Operations, ops-codfw: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3652001 (Papaul) a:Papaul>Marostegui Main board replacement complete.
[19:17:46] DBA, MediaWiki-extensions-WikibaseClient, Wikidata: Find test wiki(s) for new description usage and enable there - https://phabricator.wikimedia.org/T177155#3652067 (hoo) >>! In T177155#3649787, @Marostegui wrote: > ``` > wc -l ./dblists/small.dblist > 503 ./dblists/small.dblist > ``` > > I would su...
[19:57:00] Blocked-on-schema-change, DBA, Readers-Community-Engagement, Community-Liaisons (Oct-Dec 2017): Help communicate read-only time for Commons for schema change required by adding 3D filetype - https://phabricator.wikimedia.org/T176883#3652140 (CKoerner_WMF) You are correct Mr @Jseddon. That would b...