[00:02:10] 10DBA, 10Community-Tech-Sprint, 10MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Patch-For-Review: Issue with maintenance script: SELECTing revisions with high rev_id is pai... - https://phabricator.wikimedia.org/T175962#3620066
[00:19:26] 10DBA, 10Community-Tech-Sprint, 10MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Patch-For-Review: Issue with maintenance script: SELECTing revisions with high rev_id is pai... - https://phabricator.wikimedia.org/T175962#3620110
[08:08:53] 10DBA, 10Cloud-Services, 10Toolforge: Disabling general.confirmeduser from dbreports for using up too much db resources - https://phabricator.wikimedia.org/T131956#3620427 (10jcrespo) p50380g50440 was running several queries that were never going to stop executing, and causing 1 day of lag on labsdb1001: ``...
[11:14:10] jynus: Hey, I wanted to say, right now the storage for wikidatawiki is growing, but we are doing some stuff that frees up some space, e.g. we are dropping entity_per_page (~30M rows)
[11:24:25] 10DBA, 10Operations, 10Patch-For-Review: decommission db1018 - https://phabricator.wikimedia.org/T176215#3620825 (10jcrespo)
[11:24:28] 10DBA, 10Operations, 10Patch-For-Review: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3620824 (10jcrespo)
[11:24:40] 10DBA, 10Operations, 10Patch-For-Review: decommission db1018 - https://phabricator.wikimedia.org/T176215#3617573 (10jcrespo)
[11:24:43] 10DBA, 10Operations, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3620828 (10jcrespo)
[11:27:05] 10DBA, 10Operations: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620829 (10jcrespo)
[11:52:40] 10DBA, 10Operations, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620896 (10jcrespo) Repartitioning db1101 is ongoing (while replication is down) so that it can substitute db1036 role.
[12:15:07] I am not worried about wikidatawiki size
[12:15:18] I am worried about the recentchanges per-wiki size
[12:23:39] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3620930 (10hoo) This is in effect now and the first statement usages are coming in on elwiki: ``` +--------+--------...
[12:44:03] I'd like to remove .htaccess support for tendril (https://gerrit.wikimedia.org/r/#/c/378855/1), anything against it?
[12:48:23] I checked on dbmonitor and no .htaccess is there
[12:49:03] so I am pretty confident that it should be fine, but I'd also need somebody to double-check, since it is an important website for you guys
[12:53:56] wait
[12:54:27] last time someone changed an apache rule, we had a security vulnerability
[12:54:30] on tendril
[12:56:00] deploy, but test on dbmonitor2001 first (stop puppet on dbmonitor1001 first)
[12:56:21] sure
[13:09:41] jynus: just deployed on dbmonitor2001.wikimedia.org
[13:09:47] (1001 has puppet disabled)
[13:12:54] do you want me to check the authentication or should I?
[13:13:08] if you could that would be great
[13:13:52] one sec
[13:16:04] 401 Unauthorized, which seems ok to me
[13:16:12] super
[13:16:15] you can deploy to dbmonitor1001
[13:16:20] thanks a lot!
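As context for the 08:08 dbreports item above, where user queries "were never going to stop executing" and caused a day of replication lag on labsdb1001: a minimal sketch of how such runaway queries are typically located and killed on a MariaDB host. The one-hour threshold and the thread id are illustrative placeholders, not values taken from the actual incident.

```sql
-- Minimal sketch (MariaDB/MySQL): list queries that have been running
-- for more than an hour. The threshold is an illustrative choice.
SELECT ID, USER, DB, TIME, LEFT(INFO, 80) AS query_preview
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Query' AND TIME > 3600
ORDER BY TIME DESC;

-- Kill one offending thread by the ID reported above (placeholder id):
KILL 12345;
```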
[13:16:32] if there is something broken (unlikely), now it would be under password
[13:16:39] so less of a priority
[13:16:57] sorry to be paranoid, but I wanted to cover that possibility
[13:17:44] tell me when you have done it so I can retest
[13:18:05] no no, I appreciate it, better safe than sorry. After the Optionsbleed issue I tried to remove all the unnecessary .htaccess directives in our codebase (not really necessary, but better not to use .htaccess :)
[13:18:16] which is cool
[13:18:21] and even cooler to ask each owner
[13:18:44] all right, apache restarted
[13:18:54] one never knows which old cruft may be on lesser services
[13:19:22] everything looking good
[13:19:38] just checked as well, all good
[13:19:41] thanks!
[13:20:18] no, thanks to you!
[13:50:58] 10DBA, 10MediaWiki-extensions-FlaggedRevs, 10MediaWiki-extensions-UserMerge, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3621161 (10Reedy)
[13:51:42] 10DBA, 10MediaWiki-extensions-FlaggedRevs, 10MediaWiki-extensions-UserMerge, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3490037 (10Reedy) Ok, so patch merged. It doesn't need adding to WMF...
[13:52:19] jynus: ^ I'll leave that in your capable hands :)
[14:26:23] Reedy: you need to retag (or create a separate task, whatever you prefer) with Blocked-on-schema-change, otherwise we may miss it
[14:26:53] The question is whether it's worth deploying it
[14:26:58] If you think so, I can do that
[15:32:01] jynus: marostegui: Do you have a moment?
[15:32:18] * hoo would like to talk about https://phabricator.wikimedia.org/T176273 and https://phabricator.wikimedia.org/T151717#3620930
[15:32:23] (which are related)
[16:17:23] sure
[16:18:58] hoo: do you want IRC or in person?
[16:20:02] IRC is fine, I just want to make sure we're all on the same page, so that we have an agreed way forward
[16:21:53] yes, please go on
[16:22:06] I did not object to any of your plans
[16:22:14] in fact, I was pushing for it
[16:23:09] I was trying to set realistic expectations about DBA time, which of course is limited (we have s8, MCR and other wikidata tasks pending)
[16:23:15] and all of it is negotiable
[16:26:17] Sounds good… so, can we get this rolling in order to get a concrete timeline? Usage tracking is very important, thus we don't want to lose time
[16:28:19] yes, my point being that I wasn't a blocker
[16:28:44] because even if we do not have the time or the hardware, we can roll it out on the same servers, and use parallel replication to get a good advantage
[16:29:00] does it make sense?
[16:29:21] Yeah, sounds good to me
[16:29:26] e.g. enwiki metadata + enwiki tracking on the same set of servers, but independent
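A sketch of what the "parallel replication" idea above can look like on a MariaDB replica; the thread count and mode below are illustrative values, not the actual production settings on these hosts.

```sql
-- Illustrative MariaDB settings (10.0+ for threads, 10.1+ for the mode);
-- not the actual values used on these production replicas.
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 8;         -- apply independent transactions concurrently
SET GLOBAL slave_parallel_mode = 'optimistic'; -- retries in commit order on conflicts
START SLAVE;

-- Verify the replica keeps up (watch Seconds_Behind_Master):
SHOW SLAVE STATUS\G
```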
[16:29:41] I just need to know how much more stress we can then put on the table
[16:29:52] I'm currently doing the https://phabricator.wikimedia.org/T151717#3620930 trial
[16:29:59] and later, we separate it onto a different set of servers (enwiki regular metadata and all tracking data from all shards)
[16:30:04] and it's not looking like that's going to be soft on the DB
[16:30:20] so, basically, I will tell you when things go bad :-)
[16:30:43] currently there are some issues on commonswiki (due to imports)
[16:30:55] and wikidata (maintenance script + bots)
[16:31:02] those would be the main blockers
[16:31:08] most other shards are OK in write load
[16:31:28] would wikidata have tracking of its own?
[16:31:36] e.g. is wikibase-client installed on wikidata?
[16:31:42] It is, but it's not used much
[16:31:50] ok, so that is one less problem
[16:31:57] what about commons?
[16:32:07] It does… and it's quite painful there
[16:32:18] mmm, potential pain there
[16:32:26] I think maybe s3 would be worse
[16:32:44] because there are very small wikis with more recentchanges due to wikidata than the wiki's own activity
[16:32:59] so the effect gets multiplied x300
[16:33:15] Yeah, I know… more fine-grained usage tracking will fix this
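To make the "multiplied x300" point concrete: Wikidata-injected recentchanges entries are tagged with rc_source = 'wb', so the Wikidata share of a small wiki's recentchanges volume can be estimated with a query along these lines (the timestamp cutoff is an arbitrary example, not one used in this discussion):

```sql
-- Rough estimate of how much of a wiki's recentchanges volume comes from
-- Wikidata ('wb') versus local sources; the cutoff date is an arbitrary example.
SELECT rc_source, COUNT(*) AS entries
FROM recentchanges
WHERE rc_timestamp > '20170901000000'
GROUP BY rc_source
ORDER BY entries DESC;
```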
[16:33:21] so, if I understand you well, you have the fear that if we do it in place, it will be too late?
[16:33:24] but that takes us back to #0
[16:33:28] (potentially)
[16:33:34] too late?
[16:33:49] like, too many writes happening on a single shard due to the tracking?
[16:34:22] Yeah, especially once we enable new usage types, this might blow up the table in a short time
[16:34:26] my advice right now
[16:34:35] have a full deployment of a single wiki
[16:34:43] so the table is fully populated
[16:35:09] and let's do a good approximation of "cost", in the number of writes/wikidata activity
[16:35:23] then we reevaluate when/how to deploy the rest?
[16:35:35] is it possible, or do you need things "faster"?
[16:36:31] the thing is, I would not commit to a full deployment within 3 months
[16:36:46] because of the very same reason you mention
[16:36:50] Well, there have been some notions of this being pushed for in the Foundation
[16:37:09] sure, I am not saying this should not happen
[16:37:28] I am adding the possibility that there is a chance that new hardware is needed
[16:37:46] hardware takes some time to be approved, bought and set up :-)
[16:37:52] I can see that for sure
[16:38:03] so, you are 100% sure it would be the case?
[16:38:26] What exactly? That we will need a lot more resources on this end? Yes, absolutely
[16:38:42] yes, but are you 100% sure new servers are needed?
[16:39:05] which is ok, it is not a problem, it just changes the timeline
[16:39:38] this is not a blocker of "we are not going to do this"
[16:39:51] this would just be a case of "ok, how do we do this?"
[16:40:14] yes?
[16:40:15] I'd be surprised if not, but I'm also not an expert in these regards
[16:40:28] well, hence my proposal on measuring the impact
[16:40:31] I can tell that the number of rows and the number of writes is going to increase
[16:40:36] ok
[16:40:51] that is what we wanted, measure the current deployment once it is complete
[16:41:06] and then think about resources, right?
[16:41:19] in some cases we could even combine more deployment
[16:41:25] with purchases
[16:41:30] and then move them
[16:41:52] e.g. s6 is normally lower in writes
[16:42:12] we could deploy to more wikis, even if that means an increase in writes
[16:42:20] and reevaluate if we need more resources
[16:42:28] it is an iterative process
[16:42:38] You mean the statement usage tracking?
[16:42:43] yes
[16:42:50] aren't we talking about that?
[16:42:57] my question is, where do you see a problem with what I am saying?
[16:43:06] you can keep coding
[16:43:13] and we can continue deploying
[16:43:19] we keep measuring
[16:43:27] That sounds good to me
[16:43:29] if, let's say, we cannot deploy on commons
[16:43:41] we pause on that wiki and purchase hardware
[16:43:57] in fact, we also have x1
[16:43:59] but we will have even more fine-grained tracking coming up, so the number of rows and writes will increase even more
[16:44:11] but I guess statement usage tracking is the biggest step
[16:44:13] more than statement?
[16:44:19] can you tell me more?
[16:44:42] I am not blocking anything of what you are doing :-D
[16:44:48] Yeah, we plan to also track description usages separately and also try to disentangle all current X usages
[16:44:50] in fact, I am encouraging it
[16:45:16] but budget was planned half a year ago
[16:45:17] but I have no idea how that's going to influence the number of usages… depends on what the users are doing
[16:45:31] and normally I get asked "how much do we need?"
[16:45:45] if the answer is "I do not know", we will get no resources :-)
[16:45:59] but coding can happen in parallel
[16:46:12] so no hard blockers for now
[16:46:56] Ok, that sounds cool
[16:47:02] so, my conclusion as actionables
[16:47:10] can we fully deploy to that test production database?
[16:47:16] fill it up?
[16:47:23] and measure the impact?
[16:47:48] and give more concrete numbers knowing the kind of activity we have now?
[16:48:02] (I can tell you writes and reads stats or give you access)
[16:48:12] is there something you would like to do instead?
[16:48:19] No, that sounds good
[16:48:20] or you would like me to do instead?
[16:48:36] is elwiki enough for this or shall we also target another wiki?
[16:48:39] did you think I was saying "no" to that project?
[16:48:57] hoo: you are the expert :-)
[16:49:09] No, totally not… I was just trying to make sure we're on the same page here :)
[16:49:15] I will back what you propose
[16:49:45] but I do not have servers in my pocket, and we have to be humble with our generous donors' money
[16:49:55] as in, if it is needed, we buy it
[16:50:11] but we need to have it clear what and why :-)
[16:50:49] also, DBA time is limited, so take into account whether we are on other larger projects if you need a lot of help from us
[16:50:54] that is why I mentioned s8
[16:51:09] as that is going to be our focus next quarter
[16:51:18] that doesn't mean other things cannot advance
[16:51:25] but they will get less priority
[16:51:50] in fact, for wikidata, both s8 and MCR (some part) will happen next quarter
[16:52:09] so you are already getting a lot of attention from us DBAs :-)
[16:52:16] yes?
[16:52:49] Sounds good… I'll make sure the table gets initially populated on elwiki in the next days (by running LinksUpdate for all articles)
[16:52:57] after that, we can start measuring
[16:53:06] or is the LinksUpdate step interesting for you already?
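Once elwiki's usage tracking table is populated, the measuring step can start with simple row counts; a sketch assuming the standard Wikibase client schema (wbc_entity_usage), where statement usages appear as aspects with a C prefix (e.g. C.P31):

```sql
-- Sketch against the standard Wikibase client schema: total usages,
-- plus a per-aspect breakdown (statement usages show up as 'C' / 'C.Pxxx').
SELECT COUNT(*) AS total_usages FROM wbc_entity_usage;

SELECT eu_aspect, COUNT(*) AS usages
FROM wbc_entity_usage
GROUP BY eu_aspect
ORDER BY usages DESC
LIMIT 20;
```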
[16:53:09] I think wikidata plans things with the rest of the wmf teams
[16:53:11] let me see
[16:55:56] https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q2
[16:56:25] See https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q2#Technology_departmental_programs
[16:56:58] your manager is supposed to either encourage or ask questions about planned work, and coordinate with the rest of the developers/technical people
[16:57:11] there you have s8
[16:57:19] which will be our focus
[16:57:44] if you have your goals, you can give us a heads up on something big you depend on us for
[17:00:33] I guess fine-grained usage tracking is the only other thing we need your support for in the near future.
[17:01:17] yeah, but that needs coding first, right?
[17:01:36] or, for the part that is done, we can deploy it until a problem shows up?
[17:01:47] that doesn't take much of my time
[17:01:59] Yeah, we can gradually roll out statement usage tracking right now
[17:02:02] setting up a new database shard does (unless we use x1)
[17:02:12] which we can also do
[17:02:19] the other parts are not yet finished, but once they are, we also need to carefully test them
[17:02:30] my only point is, if we need to set up a new, let's say, x2
[17:02:38] that will take time
[17:02:46] that was my only "comment" :-)
[17:03:23] you tell me now, I add it to the potential budget, and we try to know whether it will be true or not by the time the budget is decided
[17:03:55] by that time, we need to be sure about the resources, so we keep deploying until I cry :-)
[17:04:25] also, s8 will help more than you can think
[17:04:38] because s8 will be wikidata
[17:04:44] which means dewiki will have more resources
[17:04:55] that can be used for some s3 project, etc.
[17:05:01] True… but our most likely pain points here are commons, ruwiki, maybe enwiki, …
[17:05:05] so it has some relation, even if it is not "wikidata-server"
[17:05:13] for those we need to test the waters
[17:05:26] dewiki doesn't use Wikidata much, so the impact there should be limited
[17:05:46] deploying to smaller wikis and calculating how much it would take, more or less, for the larger ones
[17:06:09] you can do it even now
[17:06:19] mysql stats are public, at least some
[17:07:12] https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=s4&var-role=master
[17:07:36] if we see an upward trend in rows written and rows read, that will be worrying
[17:08:32] I have to note, too, that some hardware renewal is happening, so there is a chance new hardware may not be needed
[17:09:00] just keep me updated on where you deploy and we can observe the effects
[17:09:09] and prepare measurements before and after
[17:09:10] etc.
[17:09:34] I hope I have answered you, I will move on to other things :-)
[17:09:48] Yes, that makes sense to me
[17:09:59] I'll keep you updated, so that we can progress here
[17:17:15] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3621975 (10hoo) After a few hours (w/o any mass purges from my side), the table looks like this: ``` +----------+ | C...
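For the before/after comparison requested below at [17:35:59], a sketch of the kind of snapshot that captures the "before" state; note that Rows and Data_length from SHOW TABLE STATUS are InnoDB estimates, so trends matter more than single readings:

```sql
-- Snapshot of table size before/after the refreshLinks.php run; Rows and
-- Data_length are InnoDB estimates, so compare trends rather than exact values.
SHOW TABLE STATUS LIKE 'wbc_entity_usage'\G

-- Server-wide write counters, useful to correlate with the Grafana graphs:
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Innodb_rows_inserted', 'Innodb_rows_updated', 'Innodb_rows_read');
```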
[17:18:46] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3621993 (10hoo) Note: Before the deploy, `elwiki` had 798858 usages only: ``` mysql:wikiadmin@db1038 [elwiki]> SELECT...
[17:31:56] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3622045 (10hoo) Just started refreshLinks.php for all articles on elwiki (https://wikitech.wikimedia.org/w/index.php?d...
[17:35:59] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3622073 (10jcrespo) Cool, get if you can some `SHOW TABLE STATUS like stats, to get the "before" state in...
[17:41:29] 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3622100 (10hoo) (Shortly) after the refresh links got started: ``` mysql:wikiadmin@db1038 [elwiki]> SHOW TABLE STATUS...
[21:27:02] 10DBA, 10Operations, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3622925 (10jcrespo) partitioning finished, db1101 should be ready to be pooled as the new special slave.
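On the db1101 repartitioning that closes this log: the log does not say which partitioning key db1101 uses, so the following is a purely generic illustration of MariaDB range partitioning of the kind used on "special" replicas. Table name, columns, and partition boundaries are all hypothetical.

```sql
-- Purely illustrative range-partitioned table; names and boundaries are
-- hypothetical, not the actual db1101 layout. Note that the partitioning
-- column must be part of every unique key, hence the composite primary key.
CREATE TABLE revision_demo (
  rev_id        INT UNSIGNED NOT NULL,
  rev_user      INT UNSIGNED NOT NULL,
  rev_timestamp BINARY(14)   NOT NULL,
  PRIMARY KEY (rev_user, rev_id)
)
PARTITION BY RANGE (rev_user) (
  PARTITION p1M  VALUES LESS THAN (1000000),
  PARTITION p10M VALUES LESS THAN (10000000),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
```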