[01:19:42] 10DBA, 10RESTBase-API, 10Reading List Service, 10Reading Epics (Synchronized Reading Lists), and 4 others: RfC: Reading List service - https://phabricator.wikimedia.org/T164990#3551615 (10Tgr) 05Open>03Resolved Closing, nothing else left to do here. The OR issue from the previous comment was fixed, the... [06:56:15] I've been thinking of deleting even more shards from dbstore2001 [06:56:35] until a solution is found [06:57:03] which ones does it have now? s2, s3 and s5? [06:57:08] ah, and s6 and s7 [06:57:15] 2,3,5,6,7,x1 [06:57:15] maybe delete s7? [06:57:19] no [06:57:31] they must cover all shards between the 2 [06:57:39] I think now replication can cope [06:57:53] Ah, I thought for some reason you moved s7 to 2002 [06:57:54] nevermind [06:58:03] but when taking the backups, almost all of them lag, and it takes a long time [06:58:15] which is not ideal [06:58:33] how long does it take now? [06:58:35] to take the backups? [06:58:37] plus s5 I think has larger load than usual due to that ongoing maintenance job [06:58:41] let me see [06:59:03] yeah, s5 could be well hammered by that script [06:59:39] especially for a second-tier host [06:59:46] with no replication checking [07:00:10] basically, I am basing my load assumptions on the disk usage percentage [07:00:21] as long as it is below 100% things work nicely [07:00:34] it has "breathing room" [07:01:48] tokudb may be more efficient, but I think you are right that being delayed helps, too [07:02:13] but even dbstore1001 gets delayed significantly during backups [07:02:25] yeah, that is true [07:02:34] we definitely need more speed, memory or sharding stuff [07:02:37] we just don't notice it as much because it has 20K seconds of warnings [07:03:27] so, until we have a separate model, I was thinking of not having redundancy, and minimizing iops to allow for faster dumps [07:04:08] yeah, +1 to that [07:04:53] I think we can leave it as it is now for some more time [07:05:03] but having that in mind [07:05:26] I also thought 
about tarballs, and I do not think that model would work for backups [07:05:33] why? [07:05:49] stopping and starting the server every day or every few hours will wear our disks very quickly [07:06:25] I don't think that is sustainable [07:07:05] I like the model as a result, but I do not see a feasible way to get there [07:07:33] that is true [07:07:38] I didn't think about the disks [07:07:40] we should continue thinking about it, using snapshots [07:07:44] yeah [07:07:47] very good point [07:07:50] or trying mariabackup [07:08:02] *hot snapshots [07:09:32] Started dump at: 2017-08-24 09:51:29 [07:10:59] Finished dump at: 2017-08-24 23:59:07 [07:11:33] 14 hours for 6 shards [07:11:55] clearly we can do better with less iops and more buffer pool available :-) [07:13:38] yeah, indeed [07:13:58] that is why I thought, maybe buy memory if we have budget constraints [07:14:02] that we don't know yet [07:18:19] I read the details of xtrabackup issues on mariadb versions [07:19:29] In theory we could make it work: https://mariadb.com/kb/en/the-mariadb-library/percona-xtrabackup/ [07:29:06] interesting [07:29:12] it is definitely worth a try [07:29:38] Note that XtraBackup (at least as of 2.4.7) will not work with https://mariadb.com/kb/en/what-is-mariadb-101/ https://mariadb.com/kb/en/encryption/ or https://mariadb.com/kb/en/innodb-xtradb-page-compression/, or when https://mariadb.com/kb/en/xtradbinnodb-server-system-variables/#innodb_page_size is set to some value other than 16K. [07:29:41] 10DBA, 10Wikidata, 10Patch-For-Review, 10User-Ladsgroup, 10Wikidata-Sprint: Populate term_full_entity_id on www.wikidata.org - https://phabricator.wikimedia.org/T171460#3551874 (10jcrespo) @Ladsgroup: This should be easy to fix: ``` root@terbium:/var/log/wikidata$ ls -lha rebuildTermSql* -rw-rw-r-- 1 w... [07:29:43] Does that mean InnoDB compression? 
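The XtraBackup caveats quoted above (16K pages only, no data-at-rest encryption, no InnoDB/XtraDB page compression) can be turned into a quick preflight check before trying it on a host. A minimal sketch, assuming the relevant values have already been pulled from the server (e.g. via `SELECT @@innodb_page_size`); the `xtrabackup_preflight` function, its argument order, and the ON/OFF encoding are all made up for illustration, only the constraints come from the MariaDB note:

```shell
# Hypothetical preflight for xtrabackup 2.4 on MariaDB 10.1, based on
# the caveats quoted above. Arguments: page size in bytes, encryption
# ON/OFF, page compression ON/OFF.
xtrabackup_preflight() {
    local page_size=$1 encryption=$2 page_compression=$3
    if [ "$page_size" != "16384" ]; then
        echo "blocked: innodb_page_size=$page_size (only 16K supported)"
        return 1
    fi
    if [ "$encryption" = "ON" ]; then
        echo "blocked: data-at-rest encryption in use"
        return 1
    fi
    if [ "$page_compression" = "ON" ]; then
        echo "blocked: page compression in use"
        return 1
    fi
    echo "ok: worth trying xtrabackup"
}

xtrabackup_preflight 16384 OFF OFF            # ok: worth trying xtrabackup
xtrabackup_preflight 32768 OFF OFF || true    # blocked: innodb_page_size=...
```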
[07:29:51] Yes, it does [07:29:56] yep [07:30:10] I don't know why they went binary incompatible [07:30:14] with mysql [07:31:05] I understand them being behind or something [07:31:30] I don't get what could be the advantage of doing that [07:31:45] like: hey we are doing this because of this super new cool feature that forces us to fork that way [07:31:49] but that is not the case [07:32:06] ok, encryption, I get it [07:32:14] but we do not use that [07:32:29] but different compression format? [07:33:04] having to fork xtrabackup, the most popular backup tool? [07:33:16] it makes no sense [07:33:22] if you don't do it for a big reason [07:33:26] then I could understand [07:33:33] for a new feature or something [07:33:36] as I said- new feature [07:33:37] that you have no other option [07:33:46] encryption makes sense [07:34:09] "if you use this mariadb-only feature, you cannot use standard tools" [07:34:13] yeah [07:41:00] Amir1: it is ok, not I mentioned log rotation yesterday as a possibility :-) [07:41:05] *note [07:41:35] I'm sorry about that, I thought it wouldn't be needed [07:41:41] it will just make the cron line slightly more complex, but there is no problem generated [07:41:48] Amir1: it cause not issues [07:41:53] *caused [07:41:57] *no [07:42:03] so don't worry at all [07:42:07] :-D [07:43:12] I will probably leave db1069 with 50% of load over the weekend and depool db1028 on monday [07:43:59] depooling vslow hosts takes a bit longer because going from depooled to not receiving queries, like dump, can take over a day [07:44:26] maybe even more for dump hosts [09:43:00] jynus: I'm done but as a test, can you add this line to /var/log/wikidata/rebuildTermSqlIndex.log ? Processed up to page 1030413 (Q110384) [09:43:15] "\nProcessed up to page 1030413 (Q110384)\n" [09:44:44] sure [09:45:49] jynus: I confirm it works as expected [09:46:07] jynus: please remove the line, sorry [09:46:21] (it will interfere with the real work) [09:47:02] like that? 
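The requested test line can be appended with printf, which interprets the \n escapes in its format string (bash's plain echo without -e would keep them literal, which is roughly the quoting trap discussed right after). A sketch against a throwaway file rather than the real log:

```shell
# Append the test marker line the way discussed above, but to a
# temporary file instead of /var/log/wikidata/rebuildTermSqlIndex.log.
log=$(mktemp)
printf '\nProcessed up to page 1030413 (Q110384)\n' >> "$log"
tail -n 1 "$log"    # Processed up to page 1030413 (Q110384)
rm -f "$log"
```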
[09:47:48] yeah, thanks [09:48:08] check jenkins and I will deploy [09:49:35] it is the '\n' [09:51:00] no, the first ' on '\n' ends the string [09:51:15] substitute with "\n" if you can [09:51:48] 500 char-long crontab line... consider a script instead ;) [09:51:55] yep [09:52:25] it's pretty unreadable and unreviewable ;d [09:52:28] :D [09:52:33] much easier than playing around with puppet and bash [09:54:08] jynus: it's ready for you :) https://gerrit.wikimedia.org/r/#/c/373854/ [09:54:13] volans can deploy if I am not around [09:54:51] yeah, but you can test each part [09:56:16] also timeout is 3500s but it runs every 30m, is it ok if 2 of them run in parallel? [09:56:37] no, it's run at minute 30 of every hour [09:56:45] wow volans [09:56:47] cron fail [09:56:57] * vs */30 [09:56:57] lol, right :D [09:57:05] return your unix badge [09:57:12] :)))) [09:57:21] ESLEEPY [09:57:44] I will try to make a batch script in a follow up, is it okay? [09:57:48] *bash [09:57:59] Do I need to return my badge too? [09:58:56] up to riccardo, it is ops honor code to respect the opinion of the hardest reviewer [09:59:13] rotfl [09:59:15] I will come back before :30 [09:59:29] if you haven't deployed yet [09:59:47] re:convert to script, at this point I'd say depends if it's temporary or permanent [09:59:56] volans: temp. [10:00:07] for a month or less [10:00:38] ok then... I guess we can keep it :( [10:03:04] Amir1: I have a bunch of questions on the code [10:03:43] 1) ls /var/log/wikidata/rebuildTermSqlIndex.log* also matches gzipped files, but grep instead of zgrep is used [10:04:02] 2) why sort -n on file paths? 
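The cron mix-up caught above comes down to the minute field: `*/30 * * * *` fires at :00 and :30 (every 30 minutes), while `30 * * * *` fires once an hour at minute 30, so with a 3500s timeout only the `*/30` variant can overlap with its previous run. A small sketch that expands which minutes each field matches:

```shell
# Expand the minutes matched by two easily-confused crontab minute fields.
every30=""; hourly=""
for m in $(seq 0 59); do
    if [ $((m % 30)) -eq 0 ]; then every30="$every30 $m"; fi   # field */30
    if [ "$m" -eq 30 ]; then hourly="$hourly $m"; fi           # field 30
done
echo "*/30 matches minutes:$every30"   # */30 matches minutes: 0 30
echo "30   matches minutes:$hourly"    # 30   matches minutes: 30
```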
[10:05:36] volans: sure, I'm in the daily, be back in five minutes [10:07:20] 3) why re-adding the glob path after the grep, the filename is already passed by xargs (or I'm not getting what you want to achieve) [10:09:37] okay: 1- I don't want to grep gzip files and they should be ignored, I don't know the best way to do it but I think this works [10:10:32] 2- I need to get /var/log/wikidata/rebuildTermSqlIndex.log first and if it matches it should stop, then if it can't find a match it needs to go to /var/log/wikidata/rebuildTermSqlIndex.log-201708something [10:10:57] and pick up what it was working on yesterday [10:11:26] 3- hmm, I'm not sure about the best way to approach that, my bash skills are not great [10:11:29] I don't see any difference between sort -n and -nr in this case on terbium [10:12:00] weird, I was able to see the different in my laptop [10:12:07] *difference [10:12:28] rebuildTermSqlIndex.log should always be before rebuildTermSqlIndex.log-$ANYTHING [10:12:35] due to the dash [10:12:52] the whole grep + sed can be simplified by: awk '/Processed up to page (\d+?)/ { print $5 }' [10:15:28] I don't know why it didn't work [10:15:37] I can test it [10:15:54] I need to go for lunch, will be back in half an hour [10:16:46] for the file selection to avoid the gzip you can use /var/log/wikidata/rebuildTermSqlIndex.log{,*[0-9]} [10:17:06] if you're sure you can find an ID in the non-gzipped files [12:02:55] volans: I thought about it and the thing is it needs to come last because we are tailing [12:57:17] Amir1: maybe I didn't explain myself well, it seems that you're checking multiple files because it might be that in the last one there is no ID to extract [12:57:50] and you're getting the last 100 files, so it looked to me that the possibility of not finding the ID in the first files is quite high (correct me if I'm wrong) [12:58:28] and then you grep them all (instead of one by one until you find a match, and I guess that most of the time you'll find the ID in the 
first one) [14:30:57] https://github.com/maxbube/mydumper/issues/56 FYI, in case we get database names with dots [14:36:12] nice find [15:00:55] heh, I'm watching the github repo since I'm using mydumper for personal projects anyways [16:40:02] 10DBA, 10Analytics, 10Data-Services, 10Research, 10cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3553473 (10bd808) [16:49:40] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3553487 (10Marostegui) I have upgraded most of codfw to 10.0.32 (pending only the master, which I will do on Monday) - db2019 is a trusty server so there is... [17:27:11] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. - https://phabricator.wikimedia.org/T170927#3553602 (10Lucas_Werkmeister_WMDE) 05Open>03Resolved Seems to be done – thanks @Andrew! [17:32:31] 10DBA, 10Toolforge, 10cloud-services-team: set up hiwikiversity on labsdb1010 - https://phabricator.wikimedia.org/T174182#3553611 (10Andrew) [17:32:57] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. - https://phabricator.wikimedia.org/T170927#3447813 (10Andrew) 05Resolved>03Open Blocked by https://phabricator.wikimedia.org/T174182 [17:33:11] 10DBA, 10Toolforge, 10cloud-services-team: set up hiwikiversity on labsdb1010 - https://phabricator.wikimedia.org/T174182#3553628 (10Andrew) [17:33:15] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. - https://phabricator.wikimedia.org/T170927#3553627 (10Andrew) [17:37:29] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. 
- https://phabricator.wikimedia.org/T170927#3447813 (10Sjoerddebruin) >>! In T170927#3553623, @Andrew wrote: > Blocked by https://phabricator.wikimedia.org/T174182 How is it related?... [18:29:19] 10DBA, 10Data-Services: Prepare and check storage layer for hi.wikiversity - https://phabricator.wikimedia.org/T171829#3477438 (10bd808) blocker: {T174182} [18:29:31] 10DBA, 10Data-Services: Prepare and check storage layer for hi.wikiversity - https://phabricator.wikimedia.org/T171829#3553865 (10bd808) [18:29:54] 10DBA, 10Data-Services, 10cloud-services-team: set up hiwikiversity on labsdb1010 - https://phabricator.wikimedia.org/T174182#3553611 (10bd808) [18:46:32] 10DBA, 10Data-Services: Prepare and check storage layer for hi.wikiversity - https://phabricator.wikimedia.org/T171829#3553943 (10Marostegui) [18:46:36] 10DBA, 10Data-Services, 10cloud-services-team: set up hiwikiversity on labsdb1010 - https://phabricator.wikimedia.org/T174182#3553940 (10Marostegui) 05Open>03Resolved a:03Marostegui Our beloved bug... It is now fixed. [18:46:40] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. - https://phabricator.wikimedia.org/T170927#3553944 (10Marostegui) [18:47:39] 10DBA, 10Data-Services: Prepare and check storage layer for hi.wikiversity - https://phabricator.wikimedia.org/T171829#3477438 (10Marostegui) The blocker is fixed and so is this one too: ``` mysql:root@localhost [hiwikiversity_p]> show tables; +---------------------------+ | Tables_in_hiwikiversity_p | +------... [18:47:46] 10DBA, 10Data-Services: Prepare and check storage layer for hi.wikiversity - https://phabricator.wikimedia.org/T171829#3553952 (10Marostegui) 05Open>03Resolved a:03Marostegui [18:54:13] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. - https://phabricator.wikimedia.org/T170927#3553979 (10Marostegui) >>! 
In T170927#3553623, @Andrew wrote: > Blocked by https://phabricator.wikimedia.org/T174182 That above task has be... [19:18:48] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. - https://phabricator.wikimedia.org/T170927#3554087 (10Andrew) 05Open>03Resolved >>! In T170927#3553979, @Marostegui wrote: >>>! In T170927#3553623, @Andrew wrote: >> Blocked by ht... [19:21:17] 10DBA, 10Data-Services, 10Security-Team, 10WMF-Legal, and 5 others: Make wbqc_constraints table available on Quarry et al. - https://phabricator.wikimedia.org/T170927#3554091 (10bd808) a:05madhuvishy>03Andrew
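Going back to the rebuildTermSqlIndex one-liner from earlier in the day: the position-recovery logic (current log first, then dated rotations, skipping the .gz files) reads much better as a small function than as a 500-char crontab line. A hedged sketch; `extract_last_page` is a hypothetical name, and it uses `[0-9]` because awk has no `\d` escape, which may be why the suggested awk one-liner "didn't work":

```shell
# Hypothetical sketch of the crontab logic discussed above: find the most
# recent "Processed up to page <N> (Q<id>)" line, checking the live log
# first and then dated rotations newest-first; gzipped files are skipped
# because their names do not end in a digit.
extract_last_page() {
    local dir=$1 f page
    for f in "$dir"/rebuildTermSqlIndex.log \
             $(ls -r "$dir"/rebuildTermSqlIndex.log-*[0-9] 2>/dev/null); do
        [ -r "$f" ] || continue
        # awk regexes have no \d (a likely reason the earlier one-liner
        # failed silently); $5 is the page number, END keeps the last hit.
        page=$(awk '/Processed up to page [0-9]+/ { p = $5 } END { if (p != "") print p }' "$f")
        if [ -n "$page" ]; then
            echo "$page"
            return 0
        fi
    done
    return 1
}

# Demo on throwaway files instead of /var/log/wikidata:
dir=$(mktemp -d)
printf 'Processed up to page 999 (Q1)\n' > "$dir/rebuildTermSqlIndex.log-20170823"
printf 'noise\nProcessed up to page 1030413 (Q110384)\n' > "$dir/rebuildTermSqlIndex.log"
extract_last_page "$dir"    # 1030413
rm -rf "$dir"
```

If the live log has no progress line yet, the function falls through to yesterday's rotation, which matches the intent described in the channel.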