[00:59:01] 10DBA, 06Labs, 13Patch-For-Review, 07Regression: Tool Labs: Add skin, language, and variant to user_properties_anon - https://phabricator.wikimedia.org/T152043#2836353 (10Andrew) I merged the puppet change, but maybe this needs to be run by hand -- I've never done it. [01:41:01] 10DBA, 10ArchCom-RfC, 10MediaWiki-Database, 07RfC: Should we bump minimum supported MySQL Version? - https://phabricator.wikimedia.org/T161232#3125897 (10Krinkle) +1 for bumping the MySQL requirement to 5.5. | MediaWiki | Note | Released | EOL |--|--|--|-- |... [05:50:08] 10DBA, 13Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3153003 (10Marostegui) dbstore2001 is done. [06:05:35] 10DBA, 13Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3153013 (10Marostegui) db1034 was already done, so no need to do that one. [06:37:49] 10DBA, 13Patch-For-Review: Remove partitions from metawiki.pagelinks in s7 - https://phabricator.wikimedia.org/T153300#3153062 (10Marostegui) db2029 is done: ``` root@db2029.codfw.wmnet[metawiki]> select @@hostname; +------------+ | @@hostname | +------------+ | db2029 | +------------+ 1 row in set (0.03 s... [06:45:49] 10DBA: Remove partitioning from db2019 (codfw master) commonswiki.templatelinks - https://phabricator.wikimedia.org/T161683#3153070 (10Marostegui) a:03Marostegui [06:46:00] 10DBA: Remove partitioning from db2019 (codfw master) commonswiki.templatelinks - https://phabricator.wikimedia.org/T161683#3139513 (10Marostegui) This alter table is now running [06:49:36] we have double the usual traffic on s1 [06:49:41] since 20:00 UTC [06:50:58] oh right [06:51:18] https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-group=core&var-shard=s1&var-role=All&from=1491202270252&to=1491288670253 [06:51:43] • 20:44 bsitzmann@tin: Started deploy [mobileapps/deploy@20ab197]: Update mobileapps to https://gerrit.wikimedia.org/r/#/q/fdd4e31 [06:51:46] • 19:21 hashar: Finished deployment of project-logos optimization for https://phabricator.wikimedia.org/T161999 / https://gerrit.wikimedia.org/r/#/c/346057/ . And purged the related logos [06:51:50] • 19:18 hashar@tin: Synchronized static/images/project-logos: Optimize a few project logos - https://phabricator.wikimedia.org/T161999 (duration: 00m 44s) [06:52:19] none seem to fit [06:52:59] https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1080&from=now-24h&to=now -> looks pretty big indeed [06:56:29] are dumps running or something? [06:57:29] well, they shouldn't affect db1080 (if they are running) [06:57:45] and db1080 has a big increase in connections, traffic, sorts, etc [06:57:49] true, this is main traffic stuff [06:58:12] I am checking operations logs from yesterday around that time too [07:03:06] it happens around 20:01:40 [07:11:33] I have gone through all the Phabricator tickets around that time looking for relevant updates, but I haven't found anything [07:13:06] I am checking performance_schema to understand which query or queries are sent so frequently [07:29:21] SELECT `page_id` , `page_len` , `page_is_redirect` , `page_latest` , `page_content_model` FROM `page` WHERE `page_namespace` = ? AND `page_title` = ? LIMIT ? [07:29:29] This is the most frequent query by far [07:29:32] A tcpdump reveals that LinkCache::fetchPageRow is the most common one [07:29:40] oh, let me see if it matches my tcpdump [07:29:49] linkcache?
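A minimal sketch of the kind of performance_schema check described above for ranking the most frequent statements by digest, assuming the statement digest instrumentation is enabled on that host; the exact query used is not in the log:
```
-- rank normalized statements by how often they have executed since the last
-- performance_schema reset; SUM_TIMER_WAIT is measured in picoseconds
SELECT DIGEST_TEXT,
       COUNT_STAR AS executions,
       SUM_TIMER_WAIT / 1e12 AS total_latency_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY COUNT_STAR DESC
LIMIT 10;
```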
[07:29:57] SELECT /* LinkCache::fetchPageRow */ page_id,page_len,page_is_redirect,page_latest,page_content_model FROM `page` WHERE page_namespace = '0' AND page_title = 'xx' LIMIT 1 [07:30:02] it is the same [07:30:03] haha [07:31:27] the other frequent one is the heartbeat checks [07:32:05] well, it is interesting that we have reached the same query through two different ways of checking [07:35:51] https://phabricator.wikimedia.org/P5193 [07:37:02] that is such a big difference [07:37:16] it must be that one - I was looking in gerrit and phabricator for traces of that query [07:45:21] the query has always been there [07:45:33] the question is why it is more frequent now [07:45:48] maybe it is just someone querying heavily [07:46:09] but _that_ heavily? [07:46:20] and for that long? [07:48:45] the rc slaves are also affected (not in such a big way, but they also increased their traffic) [07:49:03] (they have less traffic weight ofc) [08:21:21] why is db1052 (s1) master having long running selects? [08:22:08] or at least tendril is showing it [08:26:25] https://grafana-admin.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1052&from=now-3h&to=now [08:27:52] what is the weight? [08:28:22] 0 [08:29:07] no 52 references on the file :-/ [08:29:21] except as a master [08:29:28] yep [08:29:30] problem with sync? [08:29:48] https://tendril.wikimedia.org/report/slow_queries?host=%5Edb1052&user=wikiuser&schema=wik&qmode=eq&query=&hours=1 [08:29:48] I have synced the file many times today already [08:29:53] happening since 8 [08:30:05] yes [08:30:09] and nothing was done at 8 today [08:30:13] as per SAL [08:31:23] the graphs show it is not happening anymore, but why did it happen and why isn't it happening now [08:33:29] and it is only happening on the master [08:33:41] yes [08:33:46] Category::refreshCounts for example is only executing on the master, not on the slaves [08:34:00] let me check other masters [08:34:10] the updates are normal [08:34:18] but they should not take so much time [08:34:58] looks like it is only enwiki [08:36:47] https://grafana-admin.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&from=now-24h&to=now&var-server=db1052&var-network=eth0 nice spikes too [08:37:41] no lag: https://grafana-admin.wikimedia.org/dashboard/db/mysql-replication-lag?orgId=1&var-dc=eqiad%20prometheus%2Fops [08:38:15] so, is there any way apart from changing the weight in the php file that a master can get SELECTs? [08:38:21] (and if there is no lag as you showed?) [08:38:32] how can selects arrive at the master? [08:38:34] someone manually running it? [08:38:42] but then it would be run with another user [08:38:52] yeah [08:38:55] let me check the IPs [08:39:14] bad/corrupt file config? [08:39:24] but we haven't changed the weight in months [08:39:27] *weeks I guess [08:39:37] loadbalancer bug? [08:39:55] mw1197.eqiad.wmnet. [08:40:01] that is a random ip running the select [08:40:44] and: mw1290.eqiad.wmnet. [08:40:46] those two [08:41:24] try to see the group of those servers (text, api, etc.) [08:41:55] yep, will do, checking logs to see if there was any apache restart, rsync, etc [08:46:25] oh, "LOCK IN SHARE MODE" [08:47:01] so probably that should run on the master, but there was contention there [08:47:20] where did you see that?
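For the "let me check the IPs" step above, a minimal illustrative sketch of grouping the master's current connections by client host; this is an assumption about how it could be done, not the exact command that was run:
```
-- group active (non-sleeping) connections by client host and user
SELECT SUBSTRING_INDEX(HOST, ':', 1) AS client_host,
       USER,
       COUNT(*) AS connections
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep'
GROUP BY client_host, USER
ORDER BY connections DESC;
```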
[08:47:40] (both servers from api, btw) [08:47:40] https://tendril.wikimedia.org/report/slow_queries_checksum?checksum=8f7978fc16e0c2381f2405ce821ebfe7&host=%5Edb1052&user=wikiuser&schema=wik&hours=1 [08:48:10] this is a one time thing- but we should report it [08:48:31] I think it is a bug [08:48:52] things should not be locked for 1/2 hour [08:49:05] so those selects are meant to be on the master? [08:49:14] yes, I would assume [08:49:24] but there was contention on categorylinks [08:49:37] and that created slow stuff everywhere [08:51:07] that whole explanation makes total sense [08:51:22] the question now is, why did we have contention there... [08:51:55] do you have the api calls handy? [08:52:00] I am writing a ticket [08:52:15] the queries? [08:53:18] https://phabricator.wikimedia.org/P5195 [08:53:33] didn't run show full processlist, only show processlist [08:53:50] ok, don't worry [08:54:13] but looks like this: SELECT /* Category::refreshCounts */ COUNT(*) AS `pages`, COUNT( (CASE WHEN page_namespace = '14' THEN 1 ELSE NULL END) ) AS `subcats`, COUNT( (CASE WHEN page_namespace = '6' THEN 1 ELSE NULL END) ) AS `files` FROM `categorylinks`, `page` WHERE cl_to = 'All_stub_articles' AND (page_id = cl_from) LIMIT 1 LOCK IN SHARE MODE [08:59:09] Category::refreshCounts [08:59:57] we report, and check that it is followed [09:00:03] back to the QPS issue [09:07:22] do you want me to report this issue or were you already doing it? [09:11:06] https://phabricator.wikimedia.org/T162121 [09:11:14] sorry, my paste failed earlier [09:12:29] ah [09:12:30] thanks [09:33:18] 10DBA: convert dbstore1001 to InnoDB compressed by importing db shards to it - https://phabricator.wikimedia.org/T159430#3067264 (10Marostegui) Looks like we are hitting this: https://jira.mariadb.org/browse/MDEV-9027 on dbstore1002, so I would like to convert a couple of tables from tokudb to innodb there and s... [09:41:52] it is not Bing, the number of requests does not increase dramatically at 20h [09:42:20] yeah, I was checking Luca's paste and I was like: maybe I am missing something obvious [09:43:16] and if it is a search engine, I would expect increases in every single shard [09:43:23] at least, a small bump [10:50:11] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (10jcrespo) [10:55:36] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153405 (10jcrespo) Origin ips (under NDA): {P5199} The queries done are: ``` ?format=json&action=parse&page=[*title*]&prop=tex... [10:58:42] #askdba: tendril crons on terbium will continue to work as is after the switchover, no changes required, right? [10:59:20] I don't think so, no? [11:02:59] tendril is not failed over [11:03:10] if terbium is failed over [11:03:16] it should just work [11:07:07] yep, just checking [11:07:13] thanks for the clarification [11:15:41] 10DBA, 06Labs: Prepare and check storage layer for khw.wikipedia - https://phabricator.wikimedia.org/T160870#3153421 (10Rachitrali) 05stalled>03Open Dear brothers and Sisters, I am affiliated with Khowar Wikipedia incubator project as test admin since 2008 and regularly contributing with wikimedia foundati...
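The Category::refreshCounts statement pasted above ends in LOCK IN SHARE MODE, which is why it has to run on the master and why it can queue up behind writers. A minimal, hypothetical two-session sketch of that kind of contention; the UPDATE below only exists to hold conflicting row locks and is not what MediaWiki actually runs:
```
-- session 1: an open transaction holding exclusive (X) row locks on categorylinks
BEGIN;
UPDATE categorylinks SET cl_timestamp = NOW() WHERE cl_to = 'All_stub_articles';
-- ... transaction left open ...

-- session 2: a refreshCounts-style locking read requests shared (S) locks on the
-- same rows and blocks until session 1 commits or rolls back
SELECT COUNT(*) FROM categorylinks
WHERE cl_to = 'All_stub_articles'
LOCK IN SHARE MODE;
```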
[11:36:22] 10DBA, 13Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3153436 (10Marostegui) db2068 is done: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdb2068.codfw.wmnet $i -e "show create table revision\G" | egrep "KEY";done arwi... [11:43:34] 10DBA, 06Labs: Prepare and check storage layer for khw.wikipedia - https://phabricator.wikimedia.org/T160870#3153454 (10Urbanecm) 05Open>03declined Hence the parent task is closed, this doesn't make sense anymore. [11:51:28] 10DBA, 05codfw-rollout: Analyze if we want to replace some masters in eqiad while it is not active - https://phabricator.wikimedia.org/T162133#3153482 (10Marostegui) [12:13:31] 10DBA, 06Operations, 10ops-eqiad: Decommission db1057 - https://phabricator.wikimedia.org/T162135#3153532 (10Marostegui) [12:13:41] 10DBA, 06Operations, 10ops-eqiad: Decommission db1057 - https://phabricator.wikimedia.org/T162135#3153550 (10Marostegui) p:05Triage>03Normal [12:27:47] 10DBA, 10Wikidata, 13Patch-For-Review, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3153575 (10aude) think there already is a limit of 5000 [12:28:42] 10DBA, 10Wikidata, 13Patch-For-Review, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3153576 (10aude) think we can mark this as resolved? If we... [12:36:59] 10DBA, 10Wikidata, 13Patch-For-Review, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3153598 (10daniel) @hoo The all-option has been removed, as... [12:49:30] 10DBA, 10Wikidata, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3153617 (10aude) 05Open>03Resolved [13:00:15] 10DBA, 10Wikidata, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3153637 (10jcrespo) Thank you very much for working on this- do you have an estima... [13:02:59] 10DBA, 10Wikidata, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3153640 (10aude) @jcrespo we backported/deployed this last Thursday [13:04:28] 10DBA, 10Wikidata, 03Wikidata-Sprint: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user - https://phabricator.wikimedia.org/T160887#3153641 (10jcrespo) Thank you again! [13:10:32] 10DBA: How many revision comments are exactly the same? Get some stats. 
- https://phabricator.wikimedia.org/T162138#3153644 (10jcrespo) [13:28:17] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153673 (10jcrespo) Seems to have stopped for now since 12:34 UTC: https://grafana.wikimedia.org/dashboard/db/api-summary?panelId... [13:48:20] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153759 (10jcrespo) 05Open>03stalled [13:51:44] can I upgrade postgres (update from the jessie point release) on labsdb1004? [13:52:29] labsdb1004? [13:53:38] role::postgres::master is that maintained by DBAs? [13:53:38] to be fair, I do not know who is using that, ask cloud team- operation-wise I have no problem- I am unsure about the impact [13:53:46] ok, will do [13:53:54] it is not osm nor tools nor the replicas [13:54:06] so I am not 100% sure of the usage [13:54:14] in fact, I have a meeting with labs in half an hour [13:54:16] I can ask [13:54:26] no, I'll do that. when I've found out, I'll report back :-) [13:54:27] about labsdb* server future [13:59:56] moritzm: those this upgrade affect also puppetdb postgres? [14:00:15] s/those/// [14:00:17] in any case, it is likely a small impact, give me 30 minutes and I will tell you [14:01:36] I've pinged labs admins on IRC, we'll see when they're around [14:02:06] volans: yes, postgres on nihal and nitrogen also needs the update [14:02:58] ok, then we might want to do it together with https://gerrit.wikimedia.org/r/#/c/346110/ I guess both will generate a spam of failing puppet around [14:03:36] sure, let's bundle it tomorrow? [14:03:45] works for me! [14:03:50] ok, nice [14:04:36] 10DBA, 06Labs, 06Operations: eqiad: (2) hardware access request for labsdb1004 & 5 refresh - https://phabricator.wikimedia.org/T161754#3153870 (10chasemp) [14:04:57] akosiaris: any caveat for the postgres upgrade on puppetdb hosts? [14:05:07] volans: postgres ? [14:05:09] (see above for context) [14:05:36] ah [14:05:47] moritzm: the usual minor stuff of alerts in the ops channel [14:06:05] we have a master/slave there, I guess upgrade the slave first [14:06:08] yes [14:07:54] ok, thanks. when labsdb1006 was reimaged, it already received the new version, so it doesn't need to be updated [14:08:16] will upgrade 1005 shortly, then [14:08:41] thanks alex [14:08:48] oh, so 4 and 5 are an independent master and slave? [14:09:20] for postgres, I mean, I knew for mysql [14:09:58] hmm. no I confused this with osm. [14:10:20] what is the slave running against labsdb1004 (using role::postgres::master)? [14:10:37] ok, let me talk with alex and chase and they will clarify probably lots of things for mw [14:10:40] *me [14:10:45] sounds good! [14:11:09] and I will report to you the plan [14:12:02] likely we will be able to restart those easily (the daemons) [15:13:51] moritzm, so those are used mostly by wikitags on labs, maybe others [15:14:28] I think we need to schedule maintenance with halfak, akosiaris, is that right ? [15:14:54] labsdb1004 ?
yeah it's aaron mostly [15:16:20] 10DBA, 06Labs, 06Operations: eqiad: (2) hardware access request for labsdb1004 & 5 refresh - https://phabricator.wikimedia.org/T161754#3154025 (10chasemp) 05Open>03stalled [15:16:32] 10DBA, 06Labs, 06Operations: eqiad: (2) hardware access request for labsdb1006 & 7 refresh - https://phabricator.wikimedia.org/T161755#3154026 (10chasemp) 05Open>03stalled [15:18:19] I am looking at https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=labsdb1007&var-network=eth0 and https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=labsdb1005&var-network=eth0 [15:18:27] and they are not *that* loaded [15:18:59] compared to some others like: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=labsdb1001&var-network=eth0 [15:20:09] but we need iops and memory [15:21:12] while some of the labsdb1005 spikes are probably single projects taking too many resources [15:29:26] 10DBA, 06Operations, 10ops-codfw: codfw racking first 10 DB servers - https://phabricator.wikimedia.org/T162159#3154083 (10Marostegui) [16:00:16] 10DBA, 06Operations, 10ops-codfw: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3154178 (10Papaul) [16:00:39] 10DBA, 06Operations, 10ops-codfw: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3154083 (10Papaul) p:05Triage>03Normal a:03Papaul [17:47:10] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (10Legoktm) > Requests do not have a user agent There's no user-agent header at all or is it some generic UA? [17:57:09] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic, 05Security: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154452 (10MaxSem) [17:58:14] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (10MaxSem) [18:31:10] 10DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3154557 (10daniel) Ideally, not just count unique; group them and get the number of re-uses in each group, to get a distribution. [18:52:21] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154681 (10jcrespo) User agent was "-" (without quotes). [18:57:21] 10DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3154713 (10jcrespo) That was the plan :-). [18:57:27] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (10MaxSem) We used to block API requests that provided no UA - anybody remembers why did we stop doing that? [19:02:22] 10DBA, 10Wikidata, 13Patch-For-Review, 15User-Daniel, and 2 others: Use redis-based lock manager for dispatchChanges on test sites. - https://phabricator.wikimedia.org/T159828#3154746 (10daniel) @hoo @aude Are you ok with merging/deploying the config patch? I'd like to test this as follows: * stop the dis... 
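A minimal sketch of one way to get the re-use distribution asked for in T162138 above: group identical comments, then count how many distinct comments are reused N times. On enwiki's revision table this is a very expensive full scan and would have to run on a spare or analytics host; it is an illustration, not the query that was actually run:
```
-- for each re-use count, how many distinct comments have exactly that many uses
SELECT uses, COUNT(*) AS distinct_comments
FROM (
    SELECT rev_comment, COUNT(*) AS uses
    FROM revision
    GROUP BY rev_comment
) AS per_comment
GROUP BY uses
ORDER BY uses DESC;
```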
[19:04:16] 10DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3154749 (10daniel) Which wikis will you run this on? I guess the more bots and gadgets are used on a wiki, the more re-usable messages we'll see. [19:29:14] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (10Tgr) >>! In T162129#3154681, @jcrespo wrote: > User agent was "-" (without quotes). More likely, nothing at all. The... [19:33:53] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154911 (10Tgr) Did the IPs change periodically or did they actually use 50 boxes to query the API in parallel? The second case s... [19:54:06] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3154951 (10Tgr) Seems to have restarted (at least based on raw GET volume, haven't looked at what type it is). See P5199#27747 f... [20:16:11] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3153389 (10Anomie) >>! In T162129#3154715, @MaxSem wrote: > We used to block API requests that provided no UA - anybody remembers... [20:17:49] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3155015 (10jcrespo) He is back, and now trying to parse Special pages, too :-) > Did the IPs change periodically or did they act... [20:25:29] 10DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3155043 (10jcrespo) I am running something on enwiki- we can test others depending on the first results. For example, maybe commons and wikidata have more bot-like edits? [20:39:43] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3155110 (10Anomie) The simple solution may be to just block the IPs in varnish or the like, perhaps delivering a message like "If... [20:47:29] 10DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3155132 (10jcrespo) 17 minutes for a full tables scan, less than I expected: ``` mysql> SELECT rev_comment FROM revision PROCEDURE ANALYSE(1); +-----------------------------+-----------------------... [20:56:41] 10DBA, 10MediaWiki-API, 06Operations, 10Traffic: Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour - https://phabricator.wikimedia.org/T162129#3155143 (10Tgr) > I don't think it is malign, just parallelizing queries to load balancing source IPs (always the same ones). Ye... [21:02:24] 10DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3155159 (10Niharika) For better clarity: | Field_name | Min_value | Max_value | Min_length... [21:06:00] 10DBA: How many revision comments are exactly the same? Get some stats. 
- https://phabricator.wikimedia.org/T162138#3155167 (10jcrespo) I am getting better and more stats soon, hold your breath! //Note: the above Min_value may have had some space-like characters at the start.// [21:36:04] 10DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3155337 (10jcrespo) BTW, the avg_value_or_avg_length = 43.3397 means there are approximately 43.3397*745508534 = 30GB only on comment text (probably more due to blob storage inefficiencies), which i... [21:43:08] 10DBA, 10Monitoring, 06Operations: tendril cert expiry alerts on dbmonitor hosts - https://phabricator.wikimedia.org/T162183#3155357 (10jcrespo) [23:05:43] 10DBA: Remove partitioning from db2019 (codfw master) commonswiki.templatelinks - https://phabricator.wikimedia.org/T161683#3139513 (10jcrespo) Self reminder to increase the downtime if by tomorrow it has not yet finished/caught up. [23:27:58] 10DBA, 06Operations: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3155755 (10jcrespo) [23:45:25] 10DBA, 06Operations: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3155813 (10jcrespo) Probably excessive memory pressure due to heavy mysql usage... blah blah blah... restarted cleanly ... updated kernel... check new import script... check long running queries,... mysql error log is cle... [23:50:21] 10DBA, 10Analytics, 06Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3155832 (10jcrespo) > @leila, we can dump and copy to analytics-store, as long as there aren't any database.table name collisions. I hope you are aware that if for any reason... [23:56:21] 10DBA, 06Operations: dbstore1002 in bad shape - https://phabricator.wikimedia.org/T162212#3155839 (10jcrespo) There is also more load than usual since the 29, that could have contributed to it: https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=dbstore1002&from=...
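As a rough check on the 30GB estimate in T162138 above: an average of 43.3397 bytes of comment per revision times 745,508,534 revisions is about 3.2 × 10^10 bytes, i.e. roughly 30 GiB of raw comment text before any blob or storage overhead.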