[14:16:36] halfak: you ever feel like you're spending more time on the unit tests than the actual code itself?
[14:17:05] hare, spending time on unit tests *is* spending time on the code.
[14:17:06] :D
[14:17:11] Okay
[14:17:13] But yeah, I can see what you mean
[14:17:20] Do you ever feel like you're spending more time on unit tests than the thing the tests were meant to test
[14:17:31] (That said, doing TDD for the first time... I am identifying bugs proactively, it is great)
[14:21:02] I've really become accustomed to linters (like flake8) too
[14:21:14] They help a lot with dealing with bugs before they happen.
[14:22:30] ...Then you do genius things like write the tests wrong.
[14:23:34] I love linters. pylint ftw
[14:23:52] unit tests... well I will have the tests be broken rather than the code. every time.
[14:25:11] Uh... I have a test file here that is 1.84 GB in size
[14:25:15] I don't think GitHub will allow that
[14:30:17] Yeah... Gotta trim that down.
[14:30:29] Is it the minimum that you need to test the functionality?
[14:31:15] Well, I have a LogProcessor class. One of the functions in said class is "download". I am downloading a file, decompressing it, and comparing it against the file I already downloaded.
[14:31:39] TEST. EVERYTHING.
[14:32:46] I am probably going to get rid of this test after I pass it.
[14:35:41] halfak: Unless you want to upload a test log to dumps.wikimedia.org ;)
[14:36:08] Make the URL for the downloader configurable
[14:36:22] Or even, just pass a file pointer to the log processor function.
[14:36:37] The file could be a requests stream, a real file, or a StringIO that you made for testing.
[14:36:42] Right now, the download function takes the date as a parameter. Should I add an override?
[14:36:59] I'd have a function that generates the right URL given a date as a parameter
[14:37:11] And a totally separate function that takes a file pointer.
[14:37:55] TDD encourages you to *really* atomize your functions, doesn't it.
[14:37:57] a test log for what?
[14:38:14] this stuff https://dumps.wikimedia.org/other/mediacounts/daily
[14:38:22] would it be useful for other users, or not so much?
[14:40:05] It would be useful for my shitty unit tests and nothing else, probably.
[14:40:23] That being said, I'm glad I decided to unit test my downloader! I found a very subtle error that I wasn't accounting for.
[15:08:19] so I'm gonna not put your file over there then (sorry), if you decide at some point you have an output file that would be good to stash, just ping me or ticket me
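A minimal sketch of the refactor suggested above: one function builds the URL from a date, a separate function parses any file-like object, and the test feeds it a few in-memory lines instead of a 1.84 GB fixture. The names (`mediacounts_url`, `process_log`, `download`), the URL pattern, and the column layout are illustrative assumptions, not the actual LogProcessor code.

```python
import bz2
import io


def mediacounts_url(date):
    """Build the download URL for a given datetime.date.

    The path pattern below is illustrative; check
    https://dumps.wikimedia.org/other/mediacounts/daily for the real layout.
    """
    return ("https://dumps.wikimedia.org/other/mediacounts/daily/"
            "{0:%Y}/mediacounts.{0:%Y-%m-%d}.v00.tsv.bz2".format(date))


def process_log(fileobj):
    """Parse a decompressed log from any file-like object.

    The caller decides where the bytes come from: a requests stream,
    an open file on disk, or an in-memory buffer inside a unit test.
    Yields (path, count) tuples; the column layout is an assumption.
    """
    for line in fileobj:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        path, count = line.rstrip("\n").split("\t")[:2]
        yield path, int(count)


def download(date, session):
    """Fetch and decompress the log for `date`, returning a file-like
    object that process_log() can consume.  `session` is expected to be
    a requests.Session, but anything with a compatible get() works."""
    response = session.get(mediacounts_url(date))
    response.raise_for_status()
    return io.BytesIO(bz2.decompress(response.content))


def test_process_log():
    # A couple of in-memory lines stand in for the huge fixture file.
    fixture = io.BytesIO(b"/wikipedia/commons/x/xx/Example.jpg\t42\n"
                         b"/wikipedia/commons/y/yy/Other.png\t7\n")
    assert list(process_log(fixture)) == [
        ("/wikipedia/commons/x/xx/Example.jpg", 42),
        ("/wikipedia/commons/y/yy/Other.png", 7),
    ]
```

Keeping the network fetch in its own thin wrapper means the download test can be dropped or mocked without touching the parsing tests.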
[18:36:52] apergos, o/ I'm looking into doing per-client rate limiting for ORES
[18:37:03] I was wondering how it has been done for the dump downloads.
[18:37:17] I know you and kjshiroo ran into issues with this somewhat recently.
[18:37:23] downloading? I cap bw and number of connections per ip and
[18:37:37] Yeah... Number of connections per IP
[18:37:47] sec, it's not just a straight per-ip limit, because in some cases one ip is used for an entire educational institution
[18:37:52] I think I have ip + user agent
[18:37:58] nginx config
[18:38:01] Oh yeah. That makes sense too and would work for me
[18:38:06] that seems to have worked out well
[18:38:39] Can you share a link to the specific config with me?
[18:40:29] limit_conn_zone $remote_addr$http_user_agent zone=addr:10m;
[18:40:37] and then uh
[18:40:47] limit_conn addr 3;
[18:40:53] ah yeah let me link you, it's in puppet
[18:42:10] https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/dumps/templates/nginx.dumps.conf.erb
[18:42:28] thanks apergos
[18:43:30] That looks concise!
[18:43:57] \o/
[18:44:31] I don't see the part that's grouping by IP and user-agent, though...
[18:45:06] $remote_addr$http_user_agent zone=addr
[18:45:07] there
[18:45:15] that little thing does it
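Assembled from the two directives quoted above, a rough sketch of how they fit together in an nginx config; the surrounding server/location blocks are illustrative, and the real config is the puppet template linked above.

```nginx
# http{} context: a 10 MB shared-memory zone whose key is the
# concatenation of client IP and User-Agent, so a whole campus NAT
# is not throttled as if it were a single client.
limit_conn_zone $remote_addr$http_user_agent zone=addr:10m;

server {
    location / {
        # At most 3 simultaneous connections per IP+User-Agent key.
        limit_conn addr 3;
    }
}
```

Connections over the limit are rejected with a 503 by default (tunable via limit_conn_status).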