[05:58:33] 10DBA, 10Data-Services, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Patch-For-Review: Replicate CentralNotice tables to Labs - https://phabricator.wikimedia.org/T135405#2297843 (10Marostegui) Hi, Just to make sure I am getting this right, centralnotice database means metawiki databas...
[06:52:32] 10DBA: Remove ReaderFeedback tables from wikis - https://phabricator.wikimedia.org/T174586#3566888 (10Marostegui) These tables currently exist on: s2: ``` eowiki plwiki ``` s3: ``` alswiki de_labswikimedia dewikiquote dewiktionary en_labswikimedia enwikibooks enwikinews eswikinews flaggedrevs_labswikimedia frw...
[07:38:10] 10DBA, 10MediaWiki-Database, 10MediaWiki-Maintenance-scripts: Create MW Schema Diff maintenance script - https://phabricator.wikimedia.org/T174648#3569019 (10Reedy)
[07:44:22] 10DBA, 10Data-Services, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Patch-For-Review: Replicate CentralNotice tables to Labs - https://phabricator.wikimedia.org/T135405#2297843 (10Reedy) >>! In T135405#3568908, @Marostegui wrote: > Hi, > > Just to make sure I am getting this right, c...
[07:48:28] 10DBA, 10Data-Services, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, and 2 others: Replicate CentralNotice tables to Labs - https://phabricator.wikimedia.org/T135405#3569045 (10Marostegui) >>! In T135405#3569043, @Reedy wrote: >>>! In T135405#3568908, @Marostegui wrote: >> Hi, >> >> Just...
[07:51:33] 10DBA, 10Data-Services, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, and 2 others: Replicate CentralNotice tables to Labs - https://phabricator.wikimedia.org/T135405#3569059 (10Reedy) >>! In T135405#2553819, @AndyRussG wrote: >>>! In T135405#2374114, @Base wrote: >> The tables in the desc l...
[07:59:52] 10DBA: Remove ReaderFeedback tables from wikis - https://phabricator.wikimedia.org/T174586#3569082 (10Marostegui) I have backuped the tables at: ``` root@dbstore1001:/srv/tmp/T174586# pwd /srv/tmp/T174586 root@dbstore1001:/srv/tmp/T174586# ls -lh total 33M -rw-r--r-- 1 root root 9.4M Aug 31 07:44 s3.tar.gz -rw-r...
[08:03:46] 10DBA: Remove ReaderFeedback tables from wikis - https://phabricator.wikimedia.org/T174586#3569092 (10Marostegui) I have renamed the tables on db1092 on dewiki, and will leave them for a few days before dropping them everywhere: ``` root@db1092[dewiki]> set session sql_log_bin=0; Query OK, 0 rows affected (0.00...
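The two task comments above describe the usual safety net before dropping tables: back them up first, then rename them on one host (with binary logging disabled so the rename does not replicate) and only drop them everywhere after a grace period. A minimal sketch of that pattern follows; the database, table names and paths are hypothetical, since the quoted output is truncated.

```bash
# Hypothetical sketch of the backup-then-rename pattern described in the
# comments above; database, table names and paths are invented, as the quoted
# output is truncated.
mysqldump dewiki reader_feedback reader_feedback_pages \
    | gzip > /srv/tmp/T174586/dewiki_readerfeedback.sql.gz

# Rename instead of dropping right away; sql_log_bin=0 keeps the rename out of
# the binary log so it does not replicate, and the quarantined tables can be
# restored with another RENAME if something still depends on them.
mysql dewiki -e "SET SESSION sql_log_bin=0;
                 RENAME TABLE reader_feedback       TO T174586_reader_feedback,
                              reader_feedback_pages TO T174586_reader_feedback_pages;"
```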
[08:06:13] Amir1: you around?
[08:06:22] volans: yup
[08:06:40] dunno why yet, but we're still getting the emails :(
[08:07:47] and the reason I dunno why is that before if I run: echo $(/bin/ls....) I was getting the message of the cronspam email
[08:07:54] while now with the new one I don't get anything
[08:09:10] and even if I do: timeout 2 sleep $(...the long subshell command)
[08:09:13] everything is fine
[08:10:03] that's strange
[08:11:49] volans: let me try to see what's going on
[08:13:12] Amir1: thanks, also I'm a bit busy with other stuff
[08:13:37] I quickly tried to add to my crontab the timeout 2 sleep and redirect to a file, it's empty
[08:13:40] as expected
[08:15:38] Amir1: what I can do is quickly add a 2> /dev/null manually to see if it fixes before the next run in 15m
[08:15:50] but after the puppet run at 21 ;)
[08:16:11] we can run the puppet thing manually, not really important I guess
[08:16:16] but before the run
[08:16:34] let me make the patch
[08:16:59] I'm ok to change it manually to test it given that we don't know if it fixes
[08:20:22] volans: https://gerrit.wikimedia.org/r/374954
[08:21:24] nope, it goes after tac ;)
[08:24:10] I've added it manually for now, let's see in a few minutes if I got the email or not and then we can merge it
[08:24:29] 10DBA: Run pt-table-checksum on s4 (commonswiki) - https://phabricator.wikimedia.org/T162593#3569122 (10jcrespo) geo_tags is now done, it was easier as it only had the weird precision difference between hosts on some ranges, and it is not filtered on labs. Checking now an fixing image.
[08:28:34] awesome archeology work
[08:28:35] amazing
[08:32:50] Amir1: ofc I meant in 1h to see the email
[08:35:17] okay :D
[08:35:32] although I didn't get the one now, yet
[08:35:53] but also the one from 3h ago is missing... lol
[09:32:37] Amir1: no email so far, I'm inclined to merge the change, I've updated your CR moving the dev/null to the tac command
[09:32:48] thanks
[09:33:00] :)
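The fix being tested above comes down to silencing stderr inside the cron job's command substitution: $() captures only stdout, so anything a command in the pipeline writes to stderr still reaches cron and is mailed out as cronspam. A minimal sketch of the pattern, with an invented job and path since the real crontab entry is managed by Puppet and only referenced via the Gerrit change:

```bash
# Hypothetical crontab entry illustrating the pattern discussed above; the job
# name, path and pipeline are invented.
# The command substitution captures only stdout; stderr from the commands
# inside it would still reach cron and trigger an email, so it is redirected
# on the command that emits it (after tac, as in the proposed change).
*/20 * * * * /usr/local/bin/example-job "$(/bin/ls -t /srv/example | tac 2> /dev/null)"
```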
[10:18:05] marostegui, jynus: ok to restart apache on dbmonitor*? (for the libxml security update)
[10:21:13] moritzm: meeting, but i think so
[10:21:59] np, I can wait until your meeting is done
[11:10:42] moritzm: please do, just check that tendril is working after that
[11:17:16] k, doing that now
[11:18:06] done
[11:55:09] jynus, marostegui: when looking at swift/ceph, I guess this could be an option: http://duplicity.nongnu.org/
[11:55:47] oh, never heard of that
[11:55:57] i've tried it briefly with our ceph cluster, 4 years ago
[11:56:08] i'm sure there are other alternatives too
[11:56:22] also dunno how well it works with very large files etc
[11:59:13] interesting
[11:59:35] * volans wonders how they can take advantage of the rsync optimizations over an encrypted blob
[12:00:37] they can't
[12:00:45] I am using dejadup daily
[12:00:53] it's a gui for duplicity basically
[12:01:17] I ended up having to drop encryption cause for restoring say a 10KB file I had to decrypt the entire "chunk" first
[12:01:33] with our own local "cloud" storage I suppose we don't need encryption
[12:01:37] provided the transfer is encrypted (ssh)
[12:01:46] yes that's true
[12:01:57] we would only need it with a NAS system
[12:02:16] you mean unencrypted NFS? yeah
[12:02:21] yes exactly that
[12:02:35] akosiaris: lol, makes sense
[12:02:41] tbh I am pondering killing the data encryption part in bacula as well
[12:02:49] transfers are already encrypted anyway
[12:02:55] it could be nice for offline backups
[12:03:06] and the code that does the data encryption part is kind of old and unmaintained
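For reference, this is roughly how duplicity gets driven for the incremental, rsync-style backups being discussed; it is a hedged sketch with invented hosts and paths, not the setup under discussion. The transport already runs over ssh, and --no-encryption drops the GnuPG layer that makes restoring a single small file expensive, because an encrypted volume has to be decrypted in full before anything inside it can be pulled out.

```bash
# Hypothetical duplicity invocation; hosts, paths and layout are invented.
# Incremental backup of a local tree to remote storage over ssh, without the
# at-rest encryption layer (the transport itself is already encrypted):
duplicity --no-encryption /srv/backups \
    sftp://backup@storage.example.org//srv/duplicity/example

# Restoring a single file only needs the volumes that contain it, and with no
# encryption there is no decrypt-the-whole-chunk step first:
duplicity restore --no-encryption --file-to-restore dumps/s3.tar.gz \
    sftp://backup@storage.example.org//srv/duplicity/example /tmp/s3.tar.gz
```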
[13:04:06] 10DBA, 10Operations, 10Scoring-platform-team, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3569772 (10jcrespo) I do not think we should postpone the reboots too much, my proposal would be to: 0) document access to the new hosts (bare essentials) 1)...
[13:47:48] 10DBA, 10Cloud-Services: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3569870 (10jcrespo)
[13:49:54] 10DBA, 10Cloud-Services: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#3569878 (10jcrespo)
[13:50:35] 10DBA, 10Cloud-Services: Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T142807#2546917 (10jcrespo)
[13:55:08] 10DBA, 10Cloud-Services, 10Cloud-VPS, 10Epic, 10Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#3569906 (10jcrespo)
[13:59:55] mark, akosiaris can I give my overall impression? I think we are overly ambitious
[14:00:02] also volans
[14:00:21] with what?
[14:00:33] with the backups project
[14:00:58] I am ok with using a new file backend if it is setup for many other things
[14:01:18] but I am not sure we need it just for it
[14:01:47] e.g. if alex is unhappy with bacula, no problem with using something else
[14:02:08] but I would be happy just with a new storage volume for bacula
[14:02:31] I am scared of being too ambitious
[14:03:11] that's fine if we can solve all current problems we have with it
[14:03:16] it was one of the options on the table wasn't it? :)
[14:03:43] what I mean is, if we are solving other problems at the same time, I am ok with expanding my aim
[14:04:00] but I do not know if there are other issues(?)
[14:04:14] we already have a swift cluster
[14:04:22] well, we have another big issue: dbstore model not working anymore
[14:04:27] sure
[14:04:41] And I think the only thing we are also trying to solve is the provisioning, other than that, I think we are on track, no?
[14:04:44] that is why we are refactoring the whole multi-instance thing
[14:05:05] well, people started suggesting ceph and swift
[14:05:10] that means new clusters
[14:05:13] well
[14:05:19] your proposal had SAN/NAS/netapp
[14:05:19] I think those were just options
[14:05:21] multiple machines
[14:05:38] and you also raised that you wanted easy expansions
[14:05:46] so that's why we got more options :)
[14:05:55] yes, but I am also not a fried of clusters
[14:05:58] *friend
[14:06:10] if they create unnecessary burden
[14:06:23] for our public file storage backend
[14:06:28] I think it is more than justified
[14:07:08] I am not sure about internal services like that
[14:07:37] especially when we may already have redundancy cross-dc
[14:08:05] so what we said for next week was
[14:08:07] I am not giving any conclusions, I would ask for example to alex
[14:08:09] so. I look into the "Desired situation" page in the document
[14:08:20] is bacula so bad for long term storage?
[14:08:21] write up the options you're comfortable with, and work those out
[14:08:23] sure
[14:08:28] no one said bacula is bad for it
[14:08:32] and nowhere in there is there mention of easy expansions, high availability and so on
[14:08:32] yes
[14:08:32] bacula is one of those options
[14:08:37] but we do need to solve the current problems with it
[14:08:38] sure
[14:08:40] I got that
[14:08:43] maybe we are overshooting ?
[14:08:48] I think we should try to get the requirements for short and long storage, what we'd like it to support and THEN explore the options
[14:08:56] akosiaris: that is my overall sensation
[14:08:59] marostegui: I fully agree
[14:09:14] when I mention 150,000 files
[14:09:15] perfect
[14:09:29] my idea would be to bundle those on that "intermediate host"
[14:09:54] I have started writing some stuff and then I realised: what do we mean with short storage?
[14:09:55] and then on long term storage we would have 10 x 300GB packages for bacula(*like)
[14:10:01] because to me it might mean something different than to others
[14:10:18] basically, on recovery
[14:10:21] ie: to me it means current hot data (as in data that is being replicated) and maybe one snapshot (or raw bundle)
[14:10:32] we may not be able to just send file back to the server
[14:10:38] we may need to process them
[14:10:50] that is the point where an object-storage fails to me
[14:10:59] pre-processing and post-processing
[14:11:08] yeah that's the difference between swift and ceph
[14:11:11] swift is just object storage
[14:11:13] I would suggest we just get the requirements, without thinking about any solution
[14:11:18] ceph is many things, also shared fs or block storage
[14:11:32] marostegui: +1
[14:11:33] with ceph I must say I haven't worked myself
[14:11:44] but I understand it has certain added complexity
[14:11:50] it certainly does
[14:12:07] complexity is an ok price to pay
[14:12:19] but I wonder if we are there?
[14:12:27] what I think would be best
[14:12:35] if we make sure that long-term backup storage, and short term
[14:12:38] are on different systems
[14:12:53] it's better with recovery testing
[14:13:00] and also having multiple DCs helps
[14:13:05] so this is not a hard requirement perhaps
[14:13:07] but something to keep in mind
[14:13:44] yeah, the multiple dc for me lowered a lot the requirements for hard HA
[14:14:06] and even more if it is not a monolithic single server
[14:14:36] i agree btw, ceph is very complex and we should only really consider it if we have more use cases
[14:14:40] (which we do, but not very firm ones)
[14:14:49] yeah, that is my whole idea
[14:14:53] if you want to explore those
[14:14:57] swift... we already have, but is also only object storage, not necessarily very convenient here
[14:15:05] as a "test" for expanding its usage
[14:15:07] i would like to but not sure we have the manpower for it :)
[14:15:14] I am more than ok
[14:15:23] but not setting something new just for it
[14:15:28] i think you should primarily look at your own requirements
[14:15:33] yes
[14:16:35] yes please
[14:16:59] so that is a question for alex
[14:17:05] ?
[14:17:09] is bacula working ok?
[14:17:13] yes
[14:17:22] as in, if we give it enough resources
[14:17:49] what would be the main concerns against it?
[14:18:08] i still think we should not discuss bacula at this point, it will come once we have the requirements clear
[14:18:23] no?
[14:18:27] I am not saying sticking to bacula
[14:18:47] I am asking, what is the worst issue of bacula that we have now, that would make it non-ideal?
[14:18:47] for starters I agree with marostegui once more, but if you want answers to that one, the only big issue I see with bacula is the scheduler
[14:18:53] it can be blocked at times
[14:19:06] I do not know bacula enough to know all potential complaints
[14:19:12] unnecessarily that is
[14:19:53] even if, as you suggested, we set up a separate volume with a dedicated disk?
[14:19:53] that's the only one: if the scheduler is blocked waiting for, say, a mysqldump to finish
[14:19:58] well then it's blocked
[14:20:06] or we went cron-started?
[14:20:29] if we make sure we don't have race conditions, cron would probably solve that issue
[14:20:55] which is easy. it's just an mv in the cron started dump script
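The cron-started approach mentioned here amounts to never letting the bacula scheduler wait on a long mysqldump: a cron job produces the dump out of band and an atomic mv publishes it into the directory the backup job reads, so a run only ever sees complete files. A minimal sketch, with every path and the dump command invented for illustration:

```bash
#!/bin/bash
# Hypothetical cron-started dump script; paths, schedule and dump options are
# invented. The dump is written where the backup job does not look, and an
# atomic mv (same filesystem) publishes it, so the bacula run never blocks on
# or archives a half-written file.
set -euo pipefail

WORKDIR=/srv/backups/work      # scratch area, ignored by the bacula fileset
READYDIR=/srv/backups/latest   # what the bacula fileset actually archives
STAMP=$(date +%Y%m%d%H%M%S)

mysqldump --single-transaction --all-databases \
    | gzip > "${WORKDIR}/dump-${STAMP}.sql.gz"
mv "${WORKDIR}/dump-${STAMP}.sql.gz" "${READYDIR}/dump-${STAMP}.sql.gz"
```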
[14:21:04] so basically, my idea of the "short term storage" is probably misleading
[14:21:23] it is just the place where we process things and bundle while we wait for the bacula scheduler
[14:21:47] so more "temporary storage" than short-term
[14:22:00] that is my idea too, but it includes replication too
[14:22:07] sure
[14:22:11] oh
[14:22:11] ok that's not short term for me
[14:22:36] short term for me was "X days we want these to be around so we can recover fast and easy"
[14:22:37] well, 1 week until bacula comes and rotates it when moved to long-term
[14:23:00] akosiaris: for me long term is ES + logical dumps
[14:23:11] I am not saying this is the correct thing, just what I had in mind :)
[14:23:16] wait what ?
[14:23:33] what does the subject of the backup have to do with the "long/short" term ?
[14:23:45] * akosiaris confused
[14:23:47] and this is why I wanted to discuss the diagram
[14:23:53] * marostegui confused too now
[14:23:54] XD
[14:23:59] because maybe not everybody is on the same page :-)
[14:24:04] maybe ?
[14:24:10] how about for sure ? :P
[14:24:29] alex, do you have 20 minutes? I do not want to abuse your time
[14:25:39] Should we start by defining what short term and long term means on the document?
[14:27:29] jynus: yes marostegui: yes
[14:27:38] but do make them 20 please
[14:27:46] yes
[14:27:53] until :50
[14:27:59] and we disconnect
[14:27:59] ok
[14:28:31] sounds good
[14:49:53] akosiaris: please confirm you got the image(s)
[14:51:10] jynus: confirmed
[14:51:15] thanks
[14:54:31] jynus: which application did you use to create the svg ?
[14:57:47] I said on the note the original is the odg (it is libreoffice)
[14:58:34] 10DBA, 10Operations, 10Scoring-platform-team, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3570152 (10bd808) @jcrespo's plan sounds like a good one. Working on the announce of the new cluster was already on my todo list for today, so I'll add foresh...
[15:03:02] http://s2.quickmeme.com/img/a9/a9ed842f739e930dc8e9340bafbbaeaf77994c50c74fc6a86b046b54cb9b2c59.jpg
[15:09:52] https://youtu.be/7_rBidCkJxo?t=55s
[15:11:09] ;-)
[22:04:23] 10DBA, 10Data-Services, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, and 2 others: Replicate CentralNotice tables to Labs - https://phabricator.wikimedia.org/T135405#2297843 (10Ejegg) Yes, everything in the CentralNotice schema can be visible to the public. Most of it is already visible via...
[22:21:18] 10DBA, 10Data-Services: `pr_index` to be replicated to Labs public databases - https://phabricator.wikimedia.org/T113842#3571666 (10Reedy)
[22:22:01] 10DBA, 10Data-Services: `pr_index` to be replicated to Labs public databases - https://phabricator.wikimedia.org/T113842#1677380 (10Reedy) Can a DBA check why pr_index isn't being replicated? Thanks!
[22:34:00] 10DBA, 10Data-Services, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, and 2 others: Replicate CentralNotice tables to Labs - https://phabricator.wikimedia.org/T135405#2297843 (10madhuvishy) @Marostegui @Reedy Merged and ran maintain-views in all the labs replicas (1001/3/9/10/11)
[23:22:35] 10DBA, 10Cloud-Services, 10Cloud-VPS, 10Tracking: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#3571954 (10madhuvishy)
[23:22:38] 10DBA, 10Data-Services, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, and 2 others: Replicate CentralNotice tables to Labs - https://phabricator.wikimedia.org/T135405#3571951 (10madhuvishy) 05Open>03Resolved a:03madhuvishy I'm closing this as resolved since running the maintain-views s...