[00:02:10] <wikibugs>	 10DBA, 10Community-Tech-Sprint, 10MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Patch-For-Review: Issue with maintenance script: SELECTing revisions with high rev_id is pai... - https://phabricator.wikimedia.org/T175962#3620066
[00:19:26] <wikibugs>	 10DBA, 10Community-Tech-Sprint, 10MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Patch-For-Review: Issue with maintenance script: SELECTing revisions with high rev_id is pai... - https://phabricator.wikimedia.org/T175962#3620110
[08:08:53] <wikibugs_>	 10DBA, 10Cloud-Services, 10Toolforge: Disabling general.confirmeduser from dbreports for using up too much db resources - https://phabricator.wikimedia.org/T131956#3620427 (10jcrespo) p50380g50440 was running several queries that were never going to stop executing, and causing 1 day of lag on labsdb1001:  ``...
[11:14:10] <Amir1>	 jynus: Hey, I wanted to say, right now the storage for wikidatawiki is growing but we are doing some stuff that frees up some space e.g. we are dropping entity_per_page (~30M rows)
[11:24:25] <wikibugs_>	 10DBA, 10Operations, 10Patch-For-Review: decommission db1018 - https://phabricator.wikimedia.org/T176215#3620825 (10jcrespo)
[11:24:28] <wikibugs_>	 10DBA, 10Operations, 10Patch-For-Review: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3620824 (10jcrespo)
[11:24:40] <wikibugs_>	 10DBA, 10Operations, 10Patch-For-Review: decommission db1018 - https://phabricator.wikimedia.org/T176215#3617573 (10jcrespo)
[11:24:43] <wikibugs_>	 10DBA, 10Operations, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3620828 (10jcrespo)
[11:27:05] <wikibugs_>	 10DBA, 10Operations: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620829 (10jcrespo)
[11:52:40] <wikibugs_>	 10DBA, 10Operations, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620896 (10jcrespo) Repartitioning db1101 is ongoing (while replication is down) so that it can substitute db1036 role.
[12:15:07] <jynus>	 I am not worried about wikidatawiki size
[12:15:18] <jynus>	 I am worried about the recentchanges per-wiki size
[12:23:39] <wikibugs_>	 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3620930 (10hoo) This is in effect now and the first statement usages are coming in on elwiki:   ``` +--------+--------...
[12:44:03] <elukey>	 I'd like to remove .htaccess support for tendril (https://gerrit.wikimedia.org/r/#/c/378855/1), anything against it ?
[12:48:23] <elukey>	 I checked on dbmonitoring and no .htaccess is there
[12:49:03] <elukey>	 so I am pretty confident that it should be fine, but I'd need also somebody to double check since it is an important website for you guys
[12:53:56] <jynus>	 wait
[12:54:27] <jynus>	 last time someone changed an apache rule, we had a security vulnerability
[12:54:30] <jynus>	 on tendril
[12:56:00] <jynus>	 deploy, but test on dbmonitor2001 first (stop puppet on dbmonitor1001 first)
[12:56:21] <elukey>	 sure
[13:09:41] <elukey>	 jynus: just deployed on dbmonitor2001.wikimedia.org
[13:09:47] <elukey>	 (1001 has puppet disabled)
[13:12:54] <jynus>	 do you want me to check the authentication or should I=
[13:13:00] <jynus>	 ?
[13:13:08] <elukey>	 if you could that would be great
[13:13:52] <jynus>	 one sec
[13:16:04] <jynus>	 401 Unauthorized, which seems ok to me
[13:16:12] <elukey>	 super
[13:16:15] <jynus>	 you can deploy to dbmonitor1001
[13:16:20] <elukey>	 thanks a lot! 
[13:16:32] <jynus>	 if there is something broken, (unlikely) now it would be under password
[13:16:39] <jynus>	 so less prioritary
[13:16:57] <jynus>	 sorry to be paranoid, but I wanted to cover that possibility
[13:17:44] <jynus>	 tell me when you have done it so I can retest
[13:18:05] <elukey>	 nono I appreciate it, better safe than sorry.. After the optionsbleed issue I tried to remove all the unnecessary htaccess directives in our codebase (not really necessary but better not use .htaccess :)
[13:18:16] <jynus>	 which is cool
[13:18:21] <jynus>	 and even cooler to ask each owner
[13:18:44] <elukey>	 all right apache restarted
[13:18:54] <jynus>	 one never knows which old cruft may be on lesser services
[13:19:22] <jynus>	 everthing looking good
[13:19:38] <elukey>	 just checked as well, all good
[13:19:41] <elukey>	 thanks!
[13:20:18] <jynus>	 no, thanks to you!
[13:50:58] <wikibugs_>	 10DBA, 10MediaWiki-extensions-FlaggedRevs, 10MediaWiki-extensions-UserMerge, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3621161 (10Reedy)
[13:51:42] <wikibugs_>	 10DBA, 10MediaWiki-extensions-FlaggedRevs, 10MediaWiki-extensions-UserMerge, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3490037 (10Reedy) Ok, so patch merged.  It doesn't need adding to WMF...
[13:52:19] <Reedy>	 jynus: ^ I'll leave that in your capable hands :)
[14:26:23] <jynus>	 Reedy: you need to retag (or create a separate task, whatever you prefer) with blocked-on-schema change, otherwise we may miss it
[14:26:53] <Reedy>	 The question is whether it's worth deploying it
[14:26:58] <Reedy>	 If you think so, I can do that
[15:32:01] <hoo>	 jynus: marostegui: Do you have a moment?
[15:32:18] * hoo would like to talk about https://phabricator.wikimedia.org/T176273 and https://phabricator.wikimedia.org/T151717#3620930
[15:32:23] <hoo>	 (which are related)
[16:17:23] <jynus>	 sure
[16:18:58] <jynus>	 hoo: do you want IRC or in person?
[16:20:02] <hoo>	 IRC is fine, I just want to make sure we're all on the same page, so that we have an agreed way forward
[16:21:53] <jynus>	 yes, please go on
[16:22:06] <jynus>	 I did not object to none of your plans
[16:22:14] <jynus>	 in fact, I was pushing for it
[16:23:09] <jynus>	 I was trying to set realistic expectations in DBAs time, which of course are limited (we have s8, MCR and other wikidata tasks pending)
[16:23:15] <jynus>	 and all it is negotiable
[16:26:17] <hoo>	 Sounds good… so, can we get this rolling in order to get a concrete timeline? Usage tracking is very important, thus we don't want to loose time
[16:28:19] <jynus>	 yes, my point being that I wasn't a blocker
[16:28:44] <jynus>	 because even if we do not have the time or the hardware, we can roll it on the same servers, and use parallel replication to get a good advantage
[16:29:00] <jynus>	 does it make sense?
[16:29:21] <hoo>	 Yeah, sounds good to me
[16:29:26] <jynus>	 e.g. enwiki metadata + enwiki tracking on the same set of servers, but independent
[16:29:41] <hoo>	 I just need to know how much more stress we can then put on the table
[16:29:52] <hoo>	 I'm currently doing the https://phabricator.wikimedia.org/T151717#3620930 trial
[16:29:59] <jynus>	 and later, we separate it on different set of servers (enwiki regular metadata and all tracking data from all shards)
[16:30:04] <hoo>	 and it's not looking like that's going to be soft on the DB
[16:30:20] <jynus>	 so, basically, I will tell you when things go bad :-)
[16:30:43] <jynus>	 currently there are some issues on comonswiki (due to imports)
[16:30:55] <jynus>	 and wikidata (maintenance script + bots)
[16:31:02] <jynus>	 those would be the main blockers
[16:31:08] <jynus>	 most other shards ok ok in write load
[16:31:28] <jynus>	 would wikidata have tracking of its own?
[16:31:36] <jynus>	 e.g. is wikibase-client installed on wikidata?
[16:31:42] <hoo>	 It is, but it's not used much
[16:31:50] <jynus>	 ok, so that is one less problem
[16:31:57] <jynus>	 what about commons?
[16:32:07] <hoo>	 It does… and it's quite painful there
[16:32:18] <jynus>	 mmm, potential pain there
[16:32:26] <jynus>	 I think maybe s3 would be worse
[16:32:44] <jynus>	 because there are very small wikis with more recentchanges due to wikidata than the wiki's activity
[16:32:59] <jynus>	 so the effects gets multiplied x 300
[16:33:15] <hoo>	 Yeah, I know… more fine grained usage tracking will fix this
[16:33:21] <jynus>	 so, if I understand you well, you have the fear that if we do it in place, it will be too late?
[16:33:24] <hoo>	 but that takes us back to #0
[16:33:28] <jynus>	 (potentially)
[16:33:34] <hoo>	 too late?
[16:33:49] <jynus>	 like, too many writes happening on a single shard due to the tracking?
[16:34:22] <hoo>	 Yeah, especially once we enable new usage types, this might blow up the table in a short time
[16:34:26] <jynus>	 my advice right now
[16:34:35] <jynus>	 have a full deployment of a single wiki
[16:34:43] <jynus>	 so the table is fully populated
[16:35:09] <jynus>	 and let's do a good approximation of "cost", in the number of writes/wikidata activity
[16:35:23] <jynus>	 then we reevaluate when/how to deploy the rest?
[16:35:35] <jynus>	 is it posible, or do you need things "faster"?
[16:36:31] <jynus>	 the thing is, I would not compromise to have a full deployment within 3 months
[16:36:46] <jynus>	 beacuse the very same reason you comment
[16:36:50] <hoo>	 Well, there have been some notions of this being pushed for in the fundation
[16:37:09] <jynus>	 sure, I am not saying this should not happen
[16:37:28] <jynus>	 I am adding the possiblity that, there is a change that new hardware is needed
[16:37:46] <jynus>	 hardware takes some time to be approved, bought and setup :-)
[16:37:52] <hoo>	 I can see for sure
[16:38:03] <jynus>	 so, you are 100% sure it would be the case?
[16:38:26] <hoo>	 What exactly? That we will need a lot more resources on this end? Yes, absolutely
[16:38:42] <jynus>	 yes, but 100% sure new servers are needed
[16:39:05] <jynus>	 which is ok, it is not a problem, it just changes the timeline
[16:39:38] <jynus>	 this is not a blocker of "we are not going to do this"
[16:39:51] <jynus>	 this would be just a case of "ok, how do we do this?"
[16:40:14] <jynus>	 yes?
[16:40:15] <hoo>	 I'd be surprised if not, but I'm also not an expert in these regards
[16:40:28] <jynus>	 well, hence my proposal on measuring the impact
[16:40:31] <hoo>	 I can tell that the number of rows and the number of writes is going to increase
[16:40:36] <jynus>	 ok
[16:40:51] <jynus>	 that is what we wanted, measure the current deployment once it is complete
[16:41:06] <jynus>	 and then think about resources, right?
[16:41:19] <jynus>	 in some cases we could even combine more deployment
[16:41:25] <jynus>	 with purchases
[16:41:30] <jynus>	 and then move them
[16:41:52] <jynus>	 e.g. s6 is normally lower in writes
[16:42:12] <jynus>	 we could deploy to more wikis, even if that means an increase in writes
[16:42:20] <jynus>	 and reevaluate if we need more resources
[16:42:28] <jynus>	 it is an iterative process
[16:42:38] <hoo>	 You mean the statement usage tracking?
[16:42:43] <jynus>	 yes
[16:42:50] <jynus>	 aren't we talking about that?
[16:42:57] <jynus>	 my question is, where do you see a problem with what I am saying?
[16:43:06] <jynus>	 you can keep coding
[16:43:13] <jynus>	 and we can continue deploying
[16:43:19] <jynus>	 we keep measuring
[16:43:27] <hoo>	 That sounds good to me
[16:43:29] <jynus>	 if, let's say, commons, we cannot deploy
[16:43:41] <jynus>	 we pause on that wiki and purchase hardware
[16:43:57] <jynus>	 in fact, we also have x1
[16:43:59] <hoo>	 but we will have even more fine grained tracking coming up, so the number of rows and writes will even more increase
[16:44:11] <hoo>	 but I guess statement usage tracking is the biggest step
[16:44:13] <jynus>	 more than statement?
[16:44:19] <jynus>	 can you tell me more?
[16:44:42] <jynus>	 I am not blocking anything of what you are doing :-D
[16:44:48] <hoo>	 Yeah, we plan to also track description usages separately and also try to entangle all current X usages
[16:44:50] <jynus>	 in fact, I am encouraging
[16:45:16] <jynus>	 but budget was planned half a year ago
[16:45:17] <hoo>	 but I have no idea how that's going to influence the number of usages… depends on what the users are doing
[16:45:31] <jynus>	 and normally I get asked "how much do we need"
[16:45:45] <jynus>	 if the answer is "I do not know". we will get no resources :-)
[16:45:59] <jynus>	 but coding can happen in parallel
[16:46:12] <jynus>	 so no hard blockers for now
[16:46:56] <hoo>	 Ok, that sounds cool
[16:47:02] <jynus>	 so, my conclusion as actionables
[16:47:10] <jynus>	 can we fully deploy to that test production database?
[16:47:16] <jynus>	 fill it up?
[16:47:23] <jynus>	 and measire the impact?
[16:47:48] <jynus>	 and give more concrete numbers knowing the kind of activity we have now?
[16:48:02] <jynus>	 (I can tell you writes and reads stats of give you access)
[16:48:12] <jynus>	 is there something you would like to do instead?
[16:48:19] <hoo>	 No, that sounds good
[16:48:20] <jynus>	 or you would like me to do instead?
[16:48:36] <hoo>	 is elwiki enough for this or shall we also target another wiki?
[16:48:39] <jynus>	 did you think I was saying "no" to that project?
[16:48:57] <jynus>	 hoo: you are the expert :-)
[16:49:09] <hoo>	 No, totally not… I was just trying to make sure we're on the same page here :)
[16:49:15] <jynus>	 I will back what you propose 
[16:49:45] <jynus>	 but I do not have servers on my pocket, and we have to be humble with our generous donor's money
[16:49:55] <jynus>	 as in, if it is needed, we buy it
[16:50:11] <jynus>	 but we need to have it clear what and why :-)
[16:50:49] <jynus>	 also, dba time is limited, so have into account if we are in other larger projects if you need a lot of help from us
[16:50:54] <jynus>	 that is why I mentioned s8
[16:51:09] <jynus>	 as that is going to be our focus next quarter
[16:51:18] <jynus>	 that doesn't mean other things cannot advance
[16:51:25] <jynus>	 but they will get less priority
[16:51:50] <jynus>	 in fact, for wikidata, both s8 and MCR (some part) will hapen next quarter
[16:52:09] <jynus>	 so you are already getting a lot of attentiong from us dbas :-)
[16:52:16] <jynus>	 yes?
[16:52:49] <hoo>	 Sounds good… I'll make sure the table gets initially populated on elwiki in the next days (by running LinksUpdate for all articles)
[16:52:57] <hoo>	 after that, we can start measuring
[16:53:06] <hoo>	 or is the LinksUpdate step interesting for you already?
[16:53:09] <jynus>	 I think wikidata plans things with the rest of wmf teams
[16:53:11] <jynus>	 let me see
[16:55:56] <jynus>	 https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q2
[16:56:25] <jynus>	 See https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2017-18_Q2#Technology_departmental_programs
[16:56:58] <jynus>	 your manager is supposed to either encourage or ask question about planned work, and coordinate with the rest of developers/technicall people
[16:57:11] <jynus>	 there you have s8
[16:57:19] <jynus>	 which will be our focus
[16:57:44] <jynus>	 if you have your goals, you can give us a heads up of someting big you depend on us
[17:00:33] <hoo>	 I guess fine grained usage tracking is the only other thing we need your support for in the next time.
[17:01:17] <jynus>	 yeah, but that need coding first, right?
[17:01:36] <jynus>	 or the part that it is done we can do it until a problem shows up?
[17:01:47] <jynus>	 that doesn't take much of my time
[17:01:59] <hoo>	 Yeah, we can gradually role out statement usage tracking right now
[17:02:02] <jynus>	 setting up a new databas shard does (unless we use x1)
[17:02:12] <jynus>	 which we can do it also
[17:02:19] <hoo>	 the other parts are not yet finished, but once they are, we also need to carefully test them
[17:02:30] <jynus>	 my only point if, if we need to setup a new, let's say, x1
[17:02:31] <jynus>	 *x2
[17:02:38] <jynus>	 that will take time
[17:02:46] <jynus>	 that was my only "comment" :-)
[17:03:23] <jynus>	 you tell me now, I add it to the potential budget, and we try to know if it is will be true or not by the time the budget is decided
[17:03:55] <jynus>	 by that time, we need to be sure about the resources, so we keep deploying until I cry :-)
[17:04:25] <jynus>	 also, s8 will help more than you can thing
[17:04:29] <jynus>	 *think
[17:04:38] <jynus>	 because s8 will be wikidata
[17:04:44] <jynus>	 which means dewiki will have more resources
[17:04:55] <jynus>	 that can be used for some s3 project, etc.
[17:05:01] <hoo>	 True… but our most likely pain point here is commons, ruwiki, maybe enwiki, …
[17:05:05] <jynus>	 so it has some relation, even if it is not "wikidata-server"
[17:05:13] <jynus>	 those we need to test the watrs
[17:05:17] <jynus>	 *waters
[17:05:26] <hoo>	 dewiki doesn't use Wikidata much, so the impact there should be limited
[17:05:46] <jynus>	 deploying to smaller wiki and calculating how much it would take more or less for the larger ones
[17:06:09] <jynus>	 you can do it even know
[17:06:19] <jynus>	 mysql stats are public, at least some
[17:06:24] <jynus>	 *now
[17:07:12] <jynus>	 https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=s4&var-role=master
[17:07:36] <jynus>	 if we see an upwards tendency on rows written and rows read, that will be worring
[17:08:32] <jynus>	 I have to note, too, that some hardware renewal is happening, so that means there is chance it may not be needed
[17:09:00] <jynus>	 just keep me updated to where you deploy and we can observe the effects
[17:09:09] <jynus>	 and prepare measure before and after
[17:09:10] <jynus>	 etc.
[17:09:34] <jynus>	 I hope I have answered you, I will move on to other things :-)
[17:09:48] <hoo>	 Yes, that makes sense to me
[17:09:59] <hoo>	 I'll keep you updated, so that we can progress here
[17:17:15] <wikibugs_>	 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3621975 (10hoo) After a few hours (w/o any mass purges from my side), the table looks like this:  ``` +----------+ | C...
[17:18:46] <wikibugs_>	 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3621993 (10hoo) Note: Before the deploy, `elwiki` had 798858 usages only:  ``` mysql:wikiadmin@db1038 [elwiki]> SELECT...
[17:31:56] <wikibugs_>	 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3622045 (10hoo) Just started refreshLinks.php for all articles on elwiki (https://wikitech.wikimedia.org/w/index.php?d...
[17:35:59] <wikibugs_>	 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3622073 (10jcrespo) Cool, get if you can some `SHOW TABLE STATUS like <tablename> stats, to get the "before" state in...
[17:41:29] <wikibugs_>	 10DBA, 10MediaWiki-extensions-WikibaseClient, 10Wikidata, 10Patch-For-Review, and 2 others: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3622100 (10hoo) (Shortly) after the refresh links got started:  ``` mysql:wikiadmin@db1038 [elwiki]> SHOW TABLE STATUS...
[21:27:02] <wikibugs_>	 10DBA, 10Operations, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3622925 (10jcrespo) partitioning finished, db1101 should be ready to be pooled as the new special slave.