[00:03:55] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10razzi) @Marostegui could you expand on why the check isn't realistic? From what I can tell all it's monitoring is the total u...
[00:42:21] 10DBA, 10SRE, 10Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (10Ladsgroup) So with importing pywikibot-bugs (more details: T278609#6978553), the better estimate is 27GB in size with 3.2GB/year growth (assuming linear growth)
[05:02:40] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) >>! In T269211#6978486, @razzi wrote: > @Marostegui could you expand on why the check isn't realistic? From what...
[05:04:15] 10DBA, 10SRE, 10Wikimedia-Mailing-lists: Create production databases for mailman3 - https://phabricator.wikimedia.org/T278614 (10Marostegui) That's a very doable number, thanks @Ladsgroup!
[05:11:13] 10DBA: Upgrade 10.4.13 hosts to a higher version - https://phabricator.wikimedia.org/T279281 (10Marostegui)
[05:37:22] 10DBA, 10Platform Engineering, 10SRE, 10Wikimedia-Incident: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (10Marostegui) I was on holiday when all this happened; is there anything else to follow up on?
[07:00:14] 10DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (10Marostegui) I think T276150 will be done by next week, so I am going to schedule this switchover for 28th April at 05:00 AM UTC
[07:00:25] 10DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (10Marostegui)
[07:08:19] 10DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (10Marostegui)
[07:08:48] 10DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (10Marostegui)
[07:18:06] 10DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (10Marostegui) kernel upgraded on db1163
[08:09:22] 10DBA: Switchover s1 from db1083 to db1163 - https://phabricator.wikimedia.org/T278214 (10JJMC89)
[08:49:56] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) db1184 is ready, but I am not going to pool it before next week; let's leave it replicating for a few days and over the long weekend, just in case.
[08:51:17] 10DBA: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (10Marostegui) Changed this on a few roles: * Misc * Phabricator * dbstore_multiinstance. They'll pick up the change once MySQL gets restarted
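(For context on T263443, a minimal sketch of what the change amounts to on a MariaDB host — the actual rollout was done via puppet roles, so the config detail below is illustrative:)

    -- Restrict the InnoDB change buffer to insert buffering only.
    -- Equivalent to the config line `innodb_change_buffering = inserts`
    -- under [mysqld], which hosts pick up once mysqld is restarted;
    -- the variable can also be changed at runtime:
    SET GLOBAL innodb_change_buffering = 'inserts';

    -- Verify the active value:
    SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';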
[10:02:27] 10DBA: db2106 and db2147 crashed - https://phabricator.wikimedia.org/T279406 (10Kormat) Both machines are recloned and have been running fine for the last day, so I've repooled them.
[10:02:56] 10DBA: db2106 and db2147 crashed - https://phabricator.wikimedia.org/T279406 (10Marostegui) Thank you <3
[10:03:10] marostegui: shall we merge this now? https://gerrit.wikimedia.org/r/677111 :D
[10:03:40] Amir1: yeah, merging
[10:03:50] thanks for the reminder :)
[10:04:27] Thanks for merging. I just have a reminder in my inbox mwhaha
[10:06:12] 10DBA: db2106 and db2147 crashed - https://phabricator.wikimedia.org/T279406 (10Kormat) 05Open→03Resolved
[10:10:17] moritzm: let's decom the jessie dbmonitor next week?
[10:11:08] moritzm: I am going to stop apache there, leave it off, and see if something breaks/complains?
[10:13:12] yay, 2 down, 144 to go
[10:16:54] marostegui: db1148 is warning about memory usage on icinga
[10:17:50] that host sounds familiar
[10:17:54] let me see if I can remember why
[10:18:33] ah yes, I saw a big spike yesterday on it
[10:18:40] it is the vslow/dump host
[10:18:42] are dumps running?
[10:18:55] they are
[10:18:58] that could explain it
[10:19:02] let's leave it for now
[10:19:16] ack
[10:19:21] dumps meaning xml dumps, not backup dumps
[10:19:49] how can you tell, btw?
[10:19:55] that they are running?
[10:20:08] yeah
[10:20:16] | 656676398 | wikiadmin | 10.64.16.77:35118 | commonswiki | Query | 161 | Sending data | SELECT /* SpecialWantedCategories::reallyDoQuery www-data@mwmain... */ 14 AS `namespace`,cl_to AS ` | 0.000 |
[10:20:37] if you see wikiadmin running and it is a dump host, then it is dumps
[10:20:46] or a script
[10:21:04] mwmaint1002 in this case
[10:21:06] which actually looks more like a script than dumps now that I look at it carefully
[10:21:07] yeah
[10:21:12] ok cool, thanks :)
[10:21:12] it is a script against vslow
[10:22:17] the script is `updateSpecialPages.php`
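(The processlist row above is how the script was identified: MediaWiki embeds the calling function — here SpecialWantedCategories::reallyDoQuery — and the client in a SQL comment. A minimal sketch of the check, assuming standard information_schema access:)

    -- Long-running wikiadmin queries, longest first. On a vslow/dump host,
    -- wikiadmin activity is typically XML dumps or a maintenance script;
    -- the leading comment in `info` names the caller.
    SELECT id, user, host, db, time, state, LEFT(info, 100) AS info
    FROM information_schema.processlist
    WHERE user = 'wikiadmin' AND command = 'Query'
    ORDER BY time DESC;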
[10:40:28] heads up, I'm importing wikitech-l to mailman3, m5 will go weeeeeee for maybe an hour
[10:44:04] Amir1: thanks for the heads up, is it throttled?
[10:44:25] It doesn't matter, as misc doesn't care about lag (we don't use the slaves), but just in case I need to downtime the hosts
[10:44:47] I don't see lag so far
[10:46:12] yeah it's throttled afaik
[10:46:22] since we did it last night
[10:49:52] marostegui: sounds good, let's do that!
[10:50:59] moritzm: excellent!
[10:55:46] moritzm: done
[11:38:49] 10Blocked-on-schema-change, 10DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (10Marostegui)
[11:39:05] 10Blocked-on-schema-change, 10DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (10Marostegui)
[11:40:56] ugh, it worked up to 2005 and then it broke
[11:41:09] found really fun stuff https://lists-next.wikimedia.org/hyperkitty/list/wikitech-l@lists-next.wikimedia.org/thread/UBXMTDP4ZA2NKJWB3MZIRCB2TLX76OBV/
[11:41:20] anyway let me see what I can import again
[11:41:21] haha
[11:42:44] Amir1: https://lists-next.wikimedia.org/hyperkitty/list/wikitech-l@lists-next.wikimedia.org/thread/G4C7ZDU2G3ROJECRJESAYST7QD7SROCF/ "the MediaWiki PHP script" :P
[11:44:02] I was in elementary school when this happened
[11:57:35] Found this in the archives: >Add photos to your e-mail with MSN 8. Get 2 months FREE*. > http://join.msn.com/?page=features/featuredemail
[12:29:59] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) Started transfer from db1173 to db1180
[13:47:23] 10DBA, 10Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (10Marostegui) db1180 is now replicating. I am checking all its tables, as this host won't be pooled before the long weekend anyway.
[14:39:22] 10Blocked-on-schema-change, 10DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (10Marostegui)
[14:39:28] 10Blocked-on-schema-change, 10DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (10Marostegui)
[16:30:27] marostegui: heads up that you might see a lot of DELETEs on enwiki today and/or in the coming days, see https://en.wikipedia.org/wiki/Wikipedia:Bots/Noticeboard#Clearing_bot_watchlists
[16:30:56] letting you know because you suggested this might be problematic at https://phabricator.wikimedia.org/T270481#6701379
[16:31:33] I commented in the discussion and advised them to let sysadmins clear their watchlists, so we don't have many millions of DELETEs happening all at once, but I don't know if I got the word out in time
[16:35:04] cc Amir1
[19:06:20] 10DBA, 10Platform Engineering, 10SRE, 10Wikimedia-Incident: Appservers latency spike / parser cache growth 2021-03-28 - https://phabricator.wikimedia.org/T278655 (10Krinkle) I suppose it's up to you. Is this growth acceptable, expected, and normal?
[20:36:01] 10DBA, 10GrowthExperiments-MentorDashboard, 10Growth-Team (Current Sprint), 10User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (10Urbanecm_WMF) a:03Urbanecm_WMF
[20:46:53] 10DBA, 10GrowthExperiments-MentorDashboard, 10Growth-Team (Current Sprint), 10User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (10Urbanecm_WMF)
[20:56:08] 10DBA, 10GrowthExperiments-MentorDashboard, 10Growth-Team (Current Sprint), 10Patch-For-Review, 10User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (10Urbanecm_WMF) @Tgr Hello, could you review the table design and this task ple...
[20:57:20] Yes. I will delete them personally
[21:01:50] I even have a bash script for it already
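(The script itself isn't in the log; a minimal SQL sketch of the batch pattern described later in the day — one DELETE of at most 10k rows at a time, repeated until nothing is left, with the real script also pausing between batches to keep replica lag down. The user id is hypothetical, and the stock MediaWiki watchlist schema (wl_user, wl_id) is assumed:)

    -- Run repeatedly until it affects 0 rows.
    DELETE FROM watchlist
    WHERE wl_user = 12345      -- hypothetical id of the bot account being cleaned
    ORDER BY wl_id
    LIMIT 10000;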
[21:07:55] Importing the rest of wikitech-l, m5 will go weeee
[22:05:51] musikanimal: I cleaned them all, the watchlist of enwiki will not grow for the next couple of months at least
[22:06:29] \o/ thanks!
[22:07:27] 10DBA, 10GrowthExperiments-MentorDashboard, 10Growth-Team (Current Sprint), 10Patch-For-Review, 10User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (10Tgr) As this is a cache table with discardable content, we hope to do all the...
[22:09:41] I noticed the analytics replicas haven't caught up yet. Still see all 3164794 rows for ClueBot NG. Is it really that delayed?
[22:10:05] now cleaning the watchlist of LargeDatasetBot in wikidatawiki, that user alone is responsible for 1/3rd of the watchlist table in wikidatawiki
[22:10:19] musikanimal: you can try the live replicas
[22:10:21] 10DBA, 10GrowthExperiments-MentorDashboard, 10Growth-Team (Current Sprint), 10Patch-For-Review, 10User-Urbanecm_WMF (Engineering): Create database table to cache data about mentees - https://phabricator.wikimedia.org/T279587 (10Urbanecm_WMF) >>! In T279587#6982058, @Tgr wrote: > As this is a cache table...
[22:15:44] musikanimal: oh no, that was my mistake. My bash script only deletes 0.5M rows, I need to do it again
[22:16:30] 0.5M per user, it already handles most users
[22:18:26] ah. That still doesn't explain the analytics replica drift. There usually is no delay at all... which is why I'm confused
[22:20:44] and dumb question... you're doing these writes on the db hosts directly, right? so they won't get logged in the MySQL Grafana dashboard? Just curious because the DELETEs haven't looked very elevated
[22:22:14] It looks elevated to me; it's just that I do it really slowly to make sure lag won't get out of control https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=7&orgId=1&from=now-3h&to=now&var-site=eqiad&var-group=core&var-shard=s8&var-role=All
[22:23:34] maybe it's because it's one DELETE command with 10k rows per command?
[22:25:06] for example, cleaning ClueBot NG is just 300 queries
[22:26:27] musikanimal: the one you should look at is the handler stat, and that looks elevated https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=3&orgId=1
[22:26:40] (enable "delete")
[22:27:59] ah, thank you! I had the Databases/MySQL dashboard open, which breaks down just deletes; that seems like a lot, but there's a steady-ish up-and-down trend over the past week https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=2&orgId=1&from=now-7d&to=now&var-server=db1083&var-port=9104
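(For scale: ClueBot NG's 3,164,794 watchlist rows at 10,000 rows per DELETE works out to about 317 statements, consistent with the "300 queries" figure above. That is also why the statement-count graphs barely move while the handler stats do: query counters count statements, whereas Handler_delete counts individual row removals.)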