[00:56:11] <wikibugs__>	 10DBA, 06Operations, 10ops-codfw, 13Patch-For-Review: codfw rack/setup 22 DB servers - https://phabricator.wikimedia.org/T162159#3170256 (10Papaul)
[05:56:50] <wikibugs__>	 10DBA, 13Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3170411 (10Marostegui) db1073 is now done: ``` root@neodymium:~# mysql --skip-ssl -hdb1073 enwiki -e "show create table revision\G" *************************** 1. row *...
[06:00:56] <wikibugs>	 10DBA, 13Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3170412 (10Marostegui) labsdb1001 is done: ``` [root@labsdb1001 05:59 /root] # for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl $i -e "show create table revision\G" ; done arwiki ***************************...
[06:49:48] <wikibugs>	 10DBA, 13Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3170529 (10Marostegui) dbstore2001 is done: ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdbstore2001.codfw.wmnet enwiki -e "show create table revision\G" ***...
[09:10:34] <wikibugs>	 10DBA: Unify revision table on s2 - https://phabricator.wikimedia.org/T162611#3170691 (10Marostegui) Started the ALTERs on dbstore2001.
[10:36:52] <wikibugs>	 07Blocked-on-schema-change, 10Wikidata, 03Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3170794 (10aude) I am not sure we are deploying new Wikidata code this week. (we normally deploy every other week)  If you really...
[12:16:48] <wikibugs>	 07Blocked-on-schema-change, 10Wikidata, 03Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3171058 (10Marostegui) >>! In T162539#3170794, @aude wrote: > I am not sure we are deploying new Wikidata code this week. (we nor...
[12:28:15] <wikibugs>	 10DBA: Network maintenance on row D - https://phabricator.wikimedia.org/T162681#3171082 (10Marostegui)
[12:29:38] <wikibugs__>	 10DBA: Network maintenance on row D (databases) - https://phabricator.wikimedia.org/T162681#3171100 (10ayounsi)
[12:33:26] <wikibugs__>	 10DBA, 13Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3171119 (10Marostegui) db1041, the primary master has been altered: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdb1041.eqiad.wmnet $i -e "show create table revisi...
[12:36:42] <wikibugs__>	 10DBA: Network maintenance on row D (databases) - https://phabricator.wikimedia.org/T162681#3171124 (10Marostegui) We want to do some master switchovers while eqiad is on sby, so we'd need to coordinate it too: T162133
[12:37:45] <wikibugs__>	 07Blocked-on-schema-change, 10DBA, 05MW-1.28-release (WMF-deploy-2016-08-30_(1.28.0-wmf.17)), 05MW-1.28-release-notes, 13Patch-For-Review: Clean up revision UNIQUE indexes - https://phabricator.wikimedia.org/T142725#3171128 (10Marostegui) s7 (arwiki cawiki eswiki fawiki hewiki huwiki kowiki metawiki rowi...
[12:49:39] <wikibugs__>	 10DBA, 05codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3171205 (10Marostegui) Let's coordinate with @ayounsi before attempting a switchover of any of the masters to make sure T148506 and T162681 are not in the way of this.
[13:11:12] <wikibugs>	 07Blocked-on-schema-change, 10Wikidata, 03Wikidata-Sprint: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539#3171242 (10aude) @Marostegui I think starting around the ~20th (or whatever you think best) is good with us.
[14:47:10] <wikibugs__>	 10DBA, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, 13Patch-For-Review, 03Wikidata-Sprint: [Task] Remove "all" option for Special:EntitiesWithout*" - https://phabricator.wikimedia.org/T161631#3137680 (10WMDE-leszek) > Done?  ping @Lydia_Pintscher @daniel
[14:48:35] <wikibugs>	 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3171536 (10Cmjohnson) @Marostegui   | db1096 |a1 **(no available u space — pick another location)** | db1097|d1 **(No issues)** | db1098|a2 **(Will definitely need a decom se...
[14:50:59] <wikibugs__>	 10DBA, 10Wikidata, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3171539 (10Lydia_Pintscher)
[14:56:58] <wikibugs__>	 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3171602 (10Marostegui) Thanks @Cmjohnson, what about these changes:  ``` db1096 - a6 db1098 - b5 db1099 - d3 ```
[15:02:18] <wikibugs__>	 10DBA, 06Operations: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#3171633 (10jcrespo)
[15:02:23] <wikibugs>	 10DBA, 06Operations, 13Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3171634 (10jcrespo)
[15:02:25] <wikibugs__>	 10DBA, 13Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3171631 (10jcrespo) 05Open>03Resolved The modified task is for me complete- run pt-table-checksum on s2. I have not checked aev...
[15:03:00] <wikibugs>	 10DBA, 13Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3171649 (10Marostegui) Yaaaaaaaay!!! :-)  Thanks for the hard archeology work!
[15:05:43] <wikibugs>	 10DBA, 13Patch-For-Review: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3171664 (10jcrespo) a:05Marostegui>03jcrespo I am claiming this just for coordination purposes, not meaning I do not recognize you (Manuel) have done most of the work already.
[15:05:56] <wikibugs__>	 10DBA, 13Patch-For-Review: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3171666 (10jcrespo) p:05Triage>03High
[15:06:47] <wikibugs>	 10DBA, 13Patch-For-Review: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3171672 (10Marostegui) >>! In T160509#3171664, @jcrespo wrote: > I am claiming this just for coordination purposes, not meaning I do not recognize you (Manuel) have done most of the work already.  Do not even need...
[15:09:00] <wikibugs__>	 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3171683 (10jcrespo)
[15:09:29] <wikibugs__>	 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3171683 (10jcrespo) Not for dc ops yet.
[15:10:36] <wikibugs>	 10DBA, 05codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3171705 (10jcrespo)
[15:10:38] <wikibugs__>	 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3171704 (10jcrespo)
[15:11:35] <wikibugs>	 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3171683 (10jcrespo)
[15:12:22] <wikibugs__>	 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3171709 (10Marostegui)
[15:12:32] <wikibugs>	 10DBA, 13Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3171711 (10jcrespo)
[15:12:34] <wikibugs__>	 10DBA, 06Operations: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3171710 (10jcrespo)
[15:12:53] <wikibugs>	 10DBA, 13Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3045895 (10jcrespo)
[15:12:55] <wikibugs__>	 10DBA, 06Operations: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#3171712 (10jcrespo)
[15:21:10] <wikibugs__>	 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3171812 (10Cmjohnson) I need to check b5, it's a 24pt switch not 48. I believe there is 1 more available 1G port.
[15:23:30] <wikibugs__>	 10DBA, 06Operations, 10ops-eqiad, 13Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3171835 (10Marostegui) >>! In T162233#3171812, @Cmjohnson wrote: > I need to check b5, it's a 24pt switch not 48. I believe there is 1 more available 1G port.   If not, we ca...
[15:47:00] <volans>	 jynus, marostegui: anything against performing a codfw cache-wipe + warmup to test the procedure from your side?
[15:47:31] <marostegui>	 volans: you'll hit codfw core slaves + es?
[15:48:07] <volans>	 I'll hit whatever the mediawiki-cache-warmup hits :D
[15:48:10] <volans>	 but yeah, that's the idea
[15:48:39] <marostegui>	 no issues from my side
[15:48:47] <marostegui>	 if it is codfw XD
[15:49:24] <volans>	 this is the task
[15:49:24] <volans>	 https://github.com/wikimedia/operations-switchdc/blob/master/switchdc/stages/t04_cache_wipe.py
[15:50:07] <marostegui>	 last time joe did it (quite some weeks ago) it was fine, so I assume things have changed, but it should be fine
[15:50:37] <marostegui>	 go ahead I would say
[15:50:53] <volans>	 the script should be the same, no changes AFAIK, just that now it is coded inside the switchdc stuff
[15:50:56] <volans>	 ok, thanks
[15:55:19] <jynus>	 I am about to run a large replace on db1030
[15:55:36] <marostegui>	 s6, right?
[15:55:41] <jynus>	 yes
[15:55:51] <marostegui>	 i am really worried I know that from memory
[15:56:25] <volans>	 marostegui: the living tendril-tree :-P
[15:56:29] <jynus>	 https://www.youtube.com/watch?v=9C4uTEEOJlM
[15:57:04] <marostegui>	 hahaha jynus!!!
[15:57:27] <volans>	 test completed, for what it's worth (also failed, looking at nodejs logs)
[15:57:53] <jynus>	 nodejs?
[15:58:24] <volans>	 the warmup script was done by timo in nodejs
[15:58:30] <jynus>	 ah
[15:59:00] <marostegui>	 haha nice https://grafana-admin.wikimedia.org/dashboard/db/mysql-aggregated?panelId=8&fullscreen&orgId=1&var-dc=codfw%20prometheus%2Fops&var-group=core&var-shard=es1&var-shard=es2&var-shard=es3&var-shard=s1&var-shard=s2&var-shard=s3&var-shard=s4&var-shard=s5&var-shard=s6&var-shard=s7&var-role=All
[15:59:49] <volans>	 I might need to re-run it after we fix the small issue
[15:59:57] <jynus>	 what was it?
[16:00:11] <volans>	 missing / in a path :( blame _joe_ though :D
[16:07:12] <volans>	 jynus, marostegui FYI testing again, if still ok
[16:07:18] <marostegui>	 go!
[16:11:52] <volans>	 done
[16:12:01] <marostegui>	 :)
[16:19:26] <volans>	 marostegui: so looking at the dashboard, seems that the second time the impact was way less
[16:19:43] <volans>	 should we run this warmup also "before" the start of the switchover?
[16:19:56] <volans>	 seems like it warms the db a bit too and it's quicker the second time ;)
[16:20:23] <_joe_>	 yes
[16:23:14] <marostegui>	 I would go ahead and do it yes
[16:23:23] <marostegui>	 We are thinking about doing some warm ups from our side too
[16:23:46] <hashar>	 jynus: marostegui we had a bunch of database query errors on ruwiki  ( https://logstash.wikimedia.org/goto/37ba8f721daf247e92f001d619e871a1 )
[16:23:50] <hashar>	 probably on other wikis
[16:24:16] <hashar>	 most complaining about the server that went away
[16:24:23] <_joe_>	 so, another attempt happening now
[16:24:26] <hashar>	 and there was one for ruwiki that is quite concerning: Error: 1176 Key 'pl_from' doesn't exist in table 'pagelinks' (10.64.48.152)
[16:24:40] <_joe_>	 marostegui: ^^ codfw will be *hot* :P
[16:24:48] <jynus>	 pagelinks?
[16:24:57] <jynus>	 did you have a deploy?
[16:25:01] <marostegui>	 hashar: we are doing some maintenance over s6 at the moment (where ruwiki is), so that could explain the "server has gone away" errors; the pagelinks thing is different
[16:25:08] <jynus>	 but not on pagelinks
[16:25:42] <jynus>	  Key 'pl_from' doesn't exist is a deployment-related error
[16:26:56] <marostegui>	 I think I know where that is coming from
[16:27:09] <jynus>	 what?
[16:27:10] <marostegui>	 it is only on db1093, where we dropped the UNIQUE key pl_from and converted it to PK
[16:27:14] <hashar>	 seems the pl_from missing are glitches from this morning  https://logstash.wikimedia.org/goto/7df60cb57d69039502d3c1f3814432da  
[16:27:25] <marostegui>	 but that was done some weeks ago
[16:27:27] <marostegui>	 not today
[16:27:45] <jynus>	 but that was done per request, right?
[16:27:56] <jynus>	 on merged deploy, right?
[16:28:20] <hashar>	 I dont think so
[16:28:25] <jynus>	 or is it the planned changes?
[16:28:44] <hashar>	 well from logstash link above there were some at 6:40 and a few others at 10:10 
[16:30:07] <marostegui>	 No, I am seeing that error the 30th of march too on frwiki which is s6
[16:30:33] <marostegui>	 for that same host (the only one in the shard with that index removed)
[16:30:48] <marostegui>	 well, removed -> converted to PK
[16:31:12] <jynus>	 which code is forcing that index? because that is most likely a problem
[16:31:39] <jynus>	 ApiQueryLinks::run
[16:31:44] <marostegui>	 yep
[16:32:03] <jynus>	 so we better depool db1093 and revert that
[16:32:09] <marostegui>	 yes
[16:32:19] <hashar>	 and there is another one "Error: 1176 Key 'img_user_timestamp' doesn't exist in table 'image' "  from ApiQueryAllImages.php
[16:32:30] <hashar>	 (was on (10.64.32.136) )
[16:34:14] <marostegui>	 I can add an index with that name (and those columns)
[16:34:50] <jynus>	 yes, a duplicate index
[16:34:55] <marostegui>	 yep
[16:34:57] <jynus>	 probably better option
[16:35:00] <marostegui>	 yeah
[16:35:04] <jynus>	 so that we can transition
[16:35:07] <marostegui>	 I will do that once it gets depooled
[16:35:09] <jynus>	 document the FORCE
[16:35:13] <marostegui>	 yep
[16:35:15] <jynus>	 so it also gets changed
[16:35:24] <jynus>	 and get listed on the renamed indexes
[16:35:32] <marostegui>	 wilco
[16:35:51] <hashar>	 thank you :-}
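The transition plan discussed above (keep a duplicate index under the old name so that code forcing that name keeps working) can be illustrated with a small, self-contained sketch. This is only an analogy and not the production procedure: it uses SQLite's `INDEXED BY` clause in place of MySQL's `FORCE INDEX`, and the table layout and index columns are assumptions made for the example.

```python
import sqlite3

# Analogy only: SQLite's INDEXED BY stands in for MySQL's FORCE INDEX.
# The column list below is an assumption made for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pagelinks ("
    " pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT,"
    " PRIMARY KEY (pl_from, pl_namespace, pl_title))"
)
conn.execute("INSERT INTO pagelinks VALUES (1, 0, 'Main_Page')")

# A query that names the index explicitly, as application code with a hint would.
HINTED_QUERY = (
    "SELECT pl_title FROM pagelinks INDEXED BY pl_from WHERE pl_from = 1"
)


def run_hinted(connection):
    """Run the hinted query; return its rows, or the error text if it fails."""
    try:
        return connection.execute(HINTED_QUERY).fetchall()
    except sqlite3.OperationalError as exc:
        return "error: " + str(exc)


# Only the primary key exists, so the named index is missing and the query fails.
before = run_hinted(conn)

# Add a duplicate index under the name the code expects.
conn.execute(
    "CREATE INDEX pl_from ON pagelinks (pl_from, pl_namespace, pl_title)"
)
after = run_hinted(conn)

print(before)
print(after)
```

The duplicate index costs extra storage and write work, which is presumably why it is treated here as a transition step until the hint in the code is changed.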
[16:35:52] <jynus>	 img_user_timestamp
[16:35:56] <jynus>	 is different
[16:36:57] <jynus>	 hashar, which wiki?
[16:37:14] <jynus>	 there are 917 wikis on that host
[16:37:37] <hashar>	 jynus: foundationwiki
[16:37:47] <hashar>	 according to https://logstash.wikimedia.org/goto/72a7394fe05e3bc3f7d1c94a5424f59b
[16:38:12] <hashar>	 most probably you would want some kind of logstash dashboard to highlight those?
[16:38:28] <jynus>	 we already have one
[16:38:30] <hashar>	 ah
[16:38:32] <jynus>	 but it is full of garbage
[16:39:01] <hashar>	 I guess it lacks reduplication?
[16:39:05] <hashar>	 deduplication 
[16:41:44] <jynus>	 marostegui, I do not see https://phabricator.wikimedia.org/T160415 applied to s3
[16:44:36] <marostegui>	 mmmm maybe i forgot s3??
[16:44:51] <jynus>	 forgot 800/900 wikis?
[16:44:58] <marostegui>	 don't know
[16:45:00] <marostegui>	 give me a sec
[16:45:15] <wikibugs>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172211 (10jcrespo) 05Resolved>03Open
[16:45:18] <marostegui>	 sounds strange that I forgot it
[16:45:55] <wikibugs__>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3106145 (10jcrespo) p:05Triage>03High I do not see this applied...
[16:47:50] <jynus>	 on the other side, hashar, someone added an index hint and didn't add us as reviewers, which is a big no
[16:50:34] <hashar>	 maybe we can make CI to catch that
[16:50:43] <hashar>	 based on a reference schema living out of mediawiki repos
[16:51:18] <jynus>	 no, difficult
[16:51:23] <marostegui>	 ok, I am running the alters on db1093
[16:51:24] <jynus>	 that is a pure code thing
[16:51:27] <marostegui>	 I will start working on s3 now
[16:52:00] <jynus>	 it is not like modifying tables.sql or so
[16:52:07] <hashar>	 ;(
[16:52:14] <jynus>	 oh, you mean like
[16:52:33] <jynus>	 detecting the index exist when used as a hint
[16:52:39] <jynus>	 doable
[16:53:01] <jynus>	 but needs people to create unit tests
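The check described above (verify that an index used as a hint actually exists) could be sketched roughly as follows. Everything here is hypothetical: the schema text, the `(table, index)` hint list and the function names are invented for illustration, and collecting the hints from application code is left out entirely.

```python
import re

# Matches statements such as: CREATE UNIQUE INDEX idx_name ON table_name (...)
INDEX_RE = re.compile(
    r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+(\w+)\s+ON\s+(\w+)",
    re.IGNORECASE,
)


def indexes_by_table(schema_sql):
    """Map each table name to the set of index names declared for it."""
    found = {}
    for index_name, table in INDEX_RE.findall(schema_sql):
        found.setdefault(table, set()).add(index_name)
    return found


def missing_hints(schema_sql, hints):
    """Return the (table, index) hints that the schema does not declare."""
    known = indexes_by_table(schema_sql)
    return [(t, i) for t, i in hints if i not in known.get(t, set())]


# Hypothetical "current schema" file: pl_from is a primary key, not a named index.
CURRENT_SCHEMA = """
CREATE TABLE pagelinks (pl_from INT, pl_namespace INT, pl_title TEXT,
  PRIMARY KEY (pl_from, pl_namespace, pl_title));
CREATE INDEX pl_namespace ON pagelinks (pl_namespace, pl_title, pl_from);
"""

# Hypothetical hints collected from code, as (table, index) pairs.
HINTS = [("pagelinks", "pl_from"), ("pagelinks", "pl_namespace")]

print(missing_hints(CURRENT_SCHEMA, HINTS))
```

A check like this only helps if the schema file reflects what is actually deployed rather than the desired state.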
[16:53:10] <hashar>	 when a dev add a FORCE_INDEX I guess they also add a .sql patch file in the repo
[16:53:17] <jynus>	 no
[16:53:24] <jynus>	 that is the part that is independent
[16:53:26] <hashar>	 they just dont give a shit ?
[16:53:41] <hashar>	 (sorry for the bad languages)
[16:53:46] <jynus>	 you can force an index that doesn't exist, or that exists but hasn't been deployed yet
[16:54:12] <jynus>	 it is just query(,,[force => 'bla bla'])
[16:54:23] <jynus>	 doesn't require a schema change
[16:54:33] <jynus>	 what we can do is maybe detect all instances
[16:54:41] <jynus>	 and document the hell out of them
[16:55:01] <jynus>	 and force unit tests and a pointer to the use case they solve
[16:55:30] <jynus>	 I can also maintain a table.sql with the current, rather than the desired schema
[16:55:47] <jynus>	 to make it easier. Which repo could I use for that?
[16:57:03] <hashar>	 dont you already have a python script that captures the current state of all schema?
[16:57:21] <jynus>	 nope
[16:57:25] <marostegui>	 db1078 is done
[16:57:25] <jynus>	 I wish
[16:57:31] <marostegui>	 going with the other big server now
[16:59:04] <hashar>	 jynus: I was referring to the db check tool at https://gerrit.wikimedia.org/r/#/c/256231/
[16:59:08] <hashar>	 but that is probably different :D
[16:59:09] <marostegui>	 the other one is also done, so errors should be minimal now, as those two servers serve most of the main traffic
[16:59:18] <jynus>	 hashar, that is horrible
[16:59:30] <jynus>	 but it is still a draft
[16:59:39] <hashar>	 well that is a start :}
[17:00:02] <wikibugs__>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172287 (10Marostegui) Both main servers have been done now, so err...
[17:01:20] <hashar>	 anyway have to move out
[17:01:23] * hashar waves
[17:01:30] <marostegui>	 thanks for the heads up hashar
[17:01:38] <jynus>	 hashar, thanks for reporting these
[17:01:45] <hashar>	 you are welcome
[17:02:00] <jynus>	 you caught it before it got worse
[17:02:20] <hashar>	 the dbquery error report actually came from Elena who was doing some QA testing
[17:02:27] <jynus>	 db1093 was a calculated risk
[17:02:32] <marostegui>	 i cannot believe how i forgot s3
[17:02:37] <marostegui>	 so embarrassing
[17:02:43] <jynus>	 np
[17:02:45] <hashar>	 and most probably we could polish up the logstash dashboard and or mediawiki logging
[17:02:56] <jynus>	 hashar, someone changed the database logging
[17:03:01] <hashar>	 ;(
[17:03:03] <jynus>	 and now it is a mess for me
[17:03:06] <hashar>	 welcome to our bazaar!
[17:03:13] <_joe_>	 ok, another pass!
[17:03:14] <hashar>	 but yeah I feel the pain of things ever moving
[17:03:19] <jynus>	 I have to check 3 different channels and the exception
[17:03:21] <jynus>	 and hhvm
[17:03:26] <jynus>	 to get db-related stuff
[17:04:31] <jynus>	 joe, not sure what you did- but after the latest pass, we have much less db traffic on codfw
[17:04:52] <jynus>	 maybe it wasn't you
[17:05:34] <marostegui>	 eqiad is altered now, all the serving hosts have the new index
[17:05:46] <volans>	 jynus: that more traffic started yesterday
[17:05:52] <jynus>	 ?
[17:06:00] <volans>	 6am
[17:06:16] <_joe_>	 jynus: I'll try again now
[17:06:18] <volans>	 then increased at 13
[17:07:25] <wikibugs>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172330 (10Marostegui) All eqiad main servers are now done, so erro...
[17:08:15] <jynus>	 I am looking at lower rows read since 16 UTC today
[17:08:50] <jynus>	 https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?panelId=8&fullscreen&orgId=1&from=1491844120411&to=1491930520412&var-dc=codfw%20prometheus%2Fops&var-group=core&var-shard=s1&var-shard=s2&var-shard=s3&var-shard=s4&var-shard=s5&var-shard=s6&var-shard=s7&var-shard=x1&var-role=All
[17:09:26] <jynus>	 for which I do not really have an explanation
[17:10:09] <jynus>	 mostly s4 and s5
[17:10:22] <volans>	 jynus: yes but go back to last 2 days
[17:10:28] <jynus>	 ah
[17:10:28] <volans>	 and you'll see it was down before
[17:10:31] <jynus>	 let's see
[17:10:46] <jynus>	 I see now
[17:10:56] <volans>	 and before was higher and lower, it's not constant
[17:11:30] <jynus>	 that is a mystery, maybe some write process sending reads through replication?
[17:12:03] <jynus>	 oh, I know
[17:12:10] <jynus>	 pt-table-checksum would fit there
[17:12:16] <jynus>	 so no worries
[17:12:40] <jynus>	 it is good there is some activity- warming up the slaves
[17:15:02] <volans>	 eheheh
[17:15:38] <jynus>	 although your script doesn't touch es2* servers much, which was kind of the original reason
[17:17:19] <volans>	 it's not mine :D
[17:17:49] <jynus>	 I know
[17:18:02] <wikibugs>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172413 (10Marostegui) db1069 (sanitarium) db1095 (sanitarium2), la...
[17:18:06] <jynus>	 short for: the one you are running on behalf of those who created it
[17:18:24] <volans>	 :-P
[17:18:29] <jynus>	 I shared a theory with manuel that the script does nothing
[17:18:52] <jynus>	 but we say it worked because no problem happened on the last codfw -> eqiad transition
[17:18:53] <volans>	 at DB level? or app level?
[17:19:12] <marostegui>	 going to repool db1093 after adding the pl_from index
[17:19:19] <jynus>	 at DB level meaning the issue that es hosts got saturated last time
[17:19:39] <volans>	 I'm pretty sure that doesn't help
[17:19:39] <jynus>	 and that maybe the issues will repeat again
[17:19:59] <volans>	 it's pre-warming mainly apc and memcache caches
[17:20:06] <volans>	 for few pages
[17:20:08] <jynus>	 yeah, the question
[17:20:18] <jynus>	 is if that is enough to not cause es contention
[17:20:26] <volans>	 for the es probably is easier for you to run some warmup queries?
[17:20:29] <jynus>	 or it was because eqiad was already warmed up
[17:20:42] <jynus>	 volans, do not worry
[17:20:47] <jynus>	 we have been working on that
[17:21:28] <jynus>	 but with almost 20 TB of data, it is difficult to know how to warmup effectively
[17:21:57] <volans>	 yeah!
[17:22:08] <jynus>	 plus it takes forever- 16 minutes just to go over the last X years of testwiki
[17:23:20] <volans>	 what have you chosen to warm up? getting RO query logs from eqiad and replaying them?
[17:23:35] <jynus>	 enwiki, latest X years
[17:24:15] <wikibugs>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172455 (10Marostegui) dbstore1001 and dbstore1002 are done
[17:24:34] <volans>	 ok
[17:24:43] <jynus>	 replication already warms up latest edits
[17:24:52] <jynus>	 we need to do it with older, but still used ones
[17:25:48] <volans>	 yeah, that's why I was thinking about replaying the user's traffic, at least a bit of it
[17:26:24] <jynus>	 but even that is not that useful
[17:26:51] <jynus>	 given that traffic to the db is different from the one with cached revisions or preparsed ones
[17:29:09] <volans>	 right
[17:30:55] <wikibugs>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172480 (10jcrespo) p:05High>03Normal
[18:11:46] <wikibugs>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172658 (10Marostegui) dbstore2002 and dbstore2001 are done.
[18:21:54] <wikibugs__>	 10DBA, 13Patch-For-Review: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3172687 (10jcrespo) Core servers have been all checked and fixed. Only ones missing are:   ``` dbstore1001 +--------+----------+------------+--------+ | db     | tbl      | total_rows | chunks | +--------+---------...
[18:25:53] <wikibugs__>	 10DBA, 10Wikidata: Repeated reports of wikidatawiki (s5) API going read only - https://phabricator.wikimedia.org/T123867#3172690 (10Ladsgroup) @Multichill: Do you get such issues recently? My bot stopped getting them
[18:51:38] <wikibugs>	 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3172790 (10jcrespo) tl_from may need the same fix than pl_from, I am seeing some errors on db1093 on ruwiki.
[18:55:27] <wikibugs__>	 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3172816 (10Reedy)
[18:59:55] <wikibugs__>	 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3172819 (10Marostegui) >>! In T17441#3172790, @jcrespo wrote: > tl_from may need the same fix than pl_from...
[19:02:01] <wikibugs>	 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3172825 (10Marostegui) >>! In T17441#3172819, @Marostegui wrote: >>>! In T17441#3172790, @jcrespo wrote: >...
[19:07:19] <wikibugs__>	 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3172850 (10Reedy) Crossposted to mailing lists  https://lists.wikimedia.org/pipermail/mediawiki-l/2017-April/046494.html https://lists.wikimedia.org/pipermail/wikit...
[19:15:03] <wikibugs>	 10DBA, 10MediaWiki-Database, 13Patch-For-Review, 07PostgreSQL, 07Schema-change: Some tables lack unique or primary keys, may allow confusing duplicate data - https://phabricator.wikimedia.org/T17441#3172927 (10jcrespo) I do not think this is an unbreak now- we can probably wait until tomorrow. This went...
[19:16:38] <wikibugs__>	 07Blocked-on-schema-change, 10DBA, 06Multimedia, 05MW-1.29-release (WMF-deploy-2017-03-21_(1.29.0-wmf.17)), and 3 others: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415#3172932 (10Marostegui) All codfw slaves are done: ``` db2057 db2050...
[20:19:39] <wikibugs__>	 10DBA, 13Patch-For-Review: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3173259 (10jcrespo) dbstore1001 and dbstore1002 fixed, checksums for dbstore2001 pending. dbstore2001 seems with smaller errors, 1002 required almost full table reimports.
[21:08:45] <wikibugs__>	 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3125897 (10Legoktm) For MariaDB, is it the same version number?
[21:26:41] <wikibugs>	 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3173416 (10Reedy) >>! In T161232#3173372, @Legoktm wrote: > For MariaDB, is it the same version number?  I think for 5.5 yes..  For when we only support mysql >= 5....
[21:30:02] <wikibugs>	 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3125897 (10Kghbln) > While the vast majority (over 65%) is on MySQL 5.5 or higher, MySQL 5.1 at 28% is still quite significant.  From my experience: I guess this is...
[21:53:22] <wikibugs__>	 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3173448 (10Krinkle) >>! In T161232#3173421, @Kghbln wrote: >>>! @Krinkle wrote: >> While the vast majority (over 65%) is on MySQL 5.5 or higher, MySQL 5.1 at 28% is...
[21:54:00] <wikibugs>	 10DBA, 10Wikidata, 07Performance, 15User-Daniel, and 2 others: Use redis-based lock manager in dispatch changes in production - https://phabricator.wikimedia.org/T159826#3173453 (10Ladsgroup) The patch is there (https://gerrit.wikimedia.org/r/#/c/347395/) and I talked to Ops, It seems they are okay with me...
[21:54:30] <wikibugs>	 10DBA, 06Operations, 10ops-codfw, 13Patch-For-Review: codfw rack/setup 22 DB servers - https://phabricator.wikimedia.org/T162159#3173455 (10Papaul)
[22:21:14] <wikibugs__>	 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3173531 (10Kghbln) > This shows a figure of 8% instead of 28%, which would include your example of a hosting provider upgrading PHP but not MySQL.  Yeah, that's tru...