[00:43:01] kjschiroo, not sure. Worst case, you can always $ hdfs dfs -cat | openssl dgst -sha1
[00:47:51] I realized that hdfs must be doing something unexpected behind the scenes. If the dump collection process dies, how can you just restart it with minimal impact?
[14:12:32] o/
[15:07:33] * halfak curses at bluejeans
[15:07:50] I missed my goal :(
[15:31:26] halfak, I retract my previous statement, it has horrible impact.
[15:31:56] Damn. File a big and then maybe a pull request?
[15:32:03] * halfak adds kjschiroo to the project
[15:32:50] File a big?
[15:32:52] oh bug?
[15:32:55] *bug
[15:34:27] Yeah, I was trying to figure out the solution right now. The problem I have right now is that I don't know whether to trust the dumps that are on there. I don't know which ones were in progress when it died, and whether they went away or were removed since they didn't complete.
[15:34:57] I was trying to figure out how to get a checksum, and have one way, but it looks like it is an md5.
[15:35:07] Which appears to be different from an md5sum
[15:35:23] joal and I are in a meeting for the next 45 minutes, but when we're done, I think he may have ideas.
[15:35:48] md5 and md5sum should produce the same value.
[15:36:43] kjschiroo: currently improving our scripts to include md5 checking
[15:36:51] more info after meeting
[15:41:56] joal: I was working on the same goal last night. I would be interested in seeing how you are doing it. I ran into the issue that hdfs_client.checksum(f_path) returns a checksum like 0000020000000000000800000abcdff2c49d52a0ed399037fbc0eaa500000000 when I expect something like this 0a5d50262a82efca0b0cf13cb7452d93. I must be missing something.
[16:27:49] kjschiroo: Hi again
[16:28:22] kjschiroo: hadoop checksums are different from linux ones (based on blocks)
[16:28:40] kjschiroo: The way I have found is to compute md5 using python
[16:29:57] I was looking into that, didn't finish it yet though. How close are you to pushing it out?
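(Editor's note: the exchange above turns on HDFS's checksum API returning a block-based composite checksum rather than a plain file md5, which is why `hdfs_client.checksum(f_path)` does not match `md5sum` output. A minimal sketch of the workaround the chat describes — streaming the file out of HDFS and hashing it locally — might look like the following; the `hdfs_md5` helper and its path argument are hypothetical, not joal's actual script.)

```python
import hashlib
import subprocess

def md5_of_chunks(chunks):
    """Compute an md5sum-style hex digest over an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def hdfs_md5(hdfs_path):
    """Stream a file out of HDFS via `hdfs dfs -cat` and md5 it locally.

    Hypothetical helper: streams in 1 MiB chunks so the whole file never
    sits in memory, matching the `hdfs dfs -cat | md5sum` idea from the chat.
    """
    proc = subprocess.Popen(["hdfs", "dfs", "-cat", hdfs_path],
                            stdout=subprocess.PIPE)
    try:
        return md5_of_chunks(iter(lambda: proc.stdout.read(1 << 20), b""))
    finally:
        proc.stdout.close()
        proc.wait()
```

Because the hashing is done over the raw bytes, the result is directly comparable to an `md5sum` computed on the original file before it was uploaded, unlike the block-composite value HDFS reports.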
[16:30:07] kjschiroo: currently testing
[16:30:20] kjschiroo: should be out later on today (I hope)
[16:31:24] joal: This will only be for doing the checksum and validating the dumps and will not include the ability to restart them?
[16:32:08] kjschiroo: actually including the md5 thing to download, so restart will be built in
[16:32:52] Okay. Then I guess I will be looking forward to seeing it released :)
[19:01:20] kjschiroo: I'm sorry, my debugging takes me a little more time than expected :(
[19:01:45] kjschiroo: I'm logging off for tonight, will normally push code tomorrow during my day
[19:01:53] Have a good evening kjschiroo
[19:01:58] np :)
[19:01:59] You too
[20:04:40] Just joined documentation time, but I've got to let the dog out quick.
[20:04:42] So BRB
[20:04:59] dogcumentation
[20:16:33] Hmm... Looks like Thursday documentation times are hard for others too.
[20:16:41] I'm the only one here so far.
[20:17:24] I think everybody is distracted watching a video about galaxies or something
[20:17:39] (see -office)
[20:23:04] * halfak listens to galaxy zoo talk.
[20:23:19] I don't think that crowd-sourcing is "inexpensive"
[20:23:32] Wikipedia is very expensive -- we just don't pay for it in $$.
[20:23:43] We pay for it in volunteer time and attention.
[21:18:57] Report! https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_edit_quality/Work_log/2016-04-14
[22:19:55] halfak: I tried doing worklogs this way, btw - http://paws-public.wmflabs.org/paws-public/User:YuviPanda/worklog/Untitled.ipynb
[23:17:53] YuviPanda, I like the outline
[23:18:08] Seems like good discipline