[00:00:30] kevinator: ignore that. my bad.
[02:03:18] Analytics, Ops-Access-Requests, operations: Access to stat1003 for jdouglas - https://phabricator.wikimedia.org/T98209#1263714 (Dzahn) p:Triage>Normal
[02:36:20] Analytics-Tech-community-metrics, ECT-April-2015, ECT-May-2015, Patch-For-Review: "Volume of open changesets" graph should show reviews pending every month - https://phabricator.wikimedia.org/T72278#1263753 (Acs) >>! In T72278#1166368, @Dicortazar wrote: > After checking the code, I'd say that the c...
[05:44:11] Analytics-Tech-community-metrics: Community Bonding evaluation for "Allowing contributors to update their own details in tech metrics" - https://phabricator.wikimedia.org/T98045#1263875 (Qgil)
[08:30:18] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1264047 (phuedx)
[08:53:39] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1171644 (phuedx)
[08:55:04] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1171644 (phuedx) I've added the provisional estimate. @kaldari, @bmansurov: We'll need to estimate this durin...
[10:30:19] joal, hi
[11:40:28] Hi mforns :0
[11:40:31] :)
[11:48:29] Analytics-Tech-community-metrics: Community Bonding evaluation for "Allowing contributors to update their own details in tech metrics" - https://phabricator.wikimedia.org/T98045#1264476 (Dicortazar) @Sarvesh.onlyme, we would need to start working on the list of bullets found in the description. It would be...
[12:36:04] good moorninnng joal
[12:36:07] making coffee
[12:36:11] morning ottomata :)
[12:36:15] i think not starting those other guys was the right move :)
[12:36:23] Take your time, I am babysitting now
[12:36:57] everything back in track for refine
[12:37:02] now only post-aggregation
[12:37:11] --> after swich :)
[12:37:20] coooOOol
[12:38:01] hmm, doc swarm today, so I shouldn't ask, but I'd like to restart the RM (maybe with HA ?) to see if we
[12:38:20] can solve the parquet file issue
[12:38:26] ja we can restart it
[12:38:31] HA will take me longer due to zk issue
[12:38:55] yeah, right
[12:39:18] let me know when you do it, I can assist :)
[12:39:28] k
[12:41:02] Enjoy your coffe :)h
[12:42:32] also ottomata, I'd like your opinion on moving some doc pages
[12:42:40] Let me know when you have some time :)
[12:45:04] k gimme 7 mins
[12:45:08] :)
[12:45:30] actually we can talk now
[12:45:32] just fixing coffee
[12:45:33] batcave
[12:45:47] k
[12:57:10] Analytics-Wikistats: Traffic stats: analyze category 'Linux other' - https://phabricator.wikimedia.org/T48190#1264703 (ezachte) Comment from Jef Spaleta: (sent to me directly because he had sign-in issues) The primary reason why unknown linux is so large in the wikimedia stats is because both Mozilla Firef...
[12:58:00] Analytics-Wikistats: Traffic stats: analyze category 'Linux other' - https://phabricator.wikimedia.org/T48190#1264705 (ezachte)
[13:25:11] joal: things looking good.
[13:25:34] Cool :)
[13:27:14] Shall we restart RM before resuming jobs ?
[13:29:17] PROBLEM - Throughput of event logging events on graphite1001 is CRITICAL 37.50% of data above the critical threshold [600.0]
[13:29:43] joal: ja, we can restart everything... lemme try somethign though.
[13:29:53] i realized that the zk that we were using might have beeen *slightly* older than the zk in production
[13:29:57] since it was installed on precise hosts
[13:30:11] going to try the HA thing in labs again, using ubuntu zk from trusty
[13:30:17] or maybe jessie...hmmhmm
[13:30:18] actually yeah
[13:30:23] well, trusty first
[13:33:35] k, let me know if you need me :)
[13:34:27] oh! joal, it is working now!
[13:34:30] with the trusty zk!
[13:34:32] :)
[13:34:41] Happy ottomata I guess
[13:34:49] How come it was not working ?
[13:35:00] labs was using non-trusty zk ?
[13:35:19] i had installed the zk in labs on the kafka hosts there
[13:35:20] for conveninece
[13:35:28] but we only have a precise kafka package atm
[13:35:30] ahhhhhh !
[13:35:32] so those nodes were precise
[13:35:39] makes sense
[13:36:06] cool, and auto failover works too
[13:36:13] Nice :0
[13:36:16] :)
[13:36:31] I continuously miss my smiley ... bad day
[13:36:58] ok, going to read over my puppet work from yesterday, and try to apply this in prod
[13:37:00] then let's do a restart
[13:37:04] ok
[13:37:19] Quick question : trusty / jessie --> moving from ubuntu to debian ?
[13:44:08] PROBLEM - Packetloss_Average on analytics1026 is CRITICAL: packet_loss_average CRITICAL: 64.4966098684
[13:44:31] weird --> icinga sends a graphite alert, but no data in graphite ...
[13:44:45] yeah, what in the world is up with graphite...
[13:46:48] RECOVERY - Packetloss_Average on analytics1026 is OK: packet_loss_average OKAY: 0.852770131579
[13:51:54] I still like those alert emails with little hearts :)
[13:59:31] joal: ja, we are moving from ubuntu to debian
[14:02:52] ottomata: may I have some explanationS ?
[14:03:19] I did that move two years ago, so I am happy, but I'd ove to understand the WMF reasons :)
[14:03:41] HMMmMMM
[14:04:17] there were big discussions about it last fall
[14:04:23] i think one of the main motiviations
[14:04:36] is folks feel ubuntu is focusing much more on desktop than server stuff
[14:04:52] and debian does better in server world
[14:04:54] ok :)
[14:05:08] I actually think debian does better in desktop world as well ;)
[14:05:20] At least for linux-friendly computer scientists :)
[14:05:52] Anyway, that's still not so much of a difference :)
[14:06:27] MAN ! No job running on the cluster !
[14:06:41] Either you have restarted it, or we definitely have cought back :)
[14:06:47] ottomata: --^
[14:08:02] not me!
[14:08:10] REAL ?
[14:08:45] hmmm
[14:08:59] May the RM lost have broken a few things :)
[14:09:06] I'll double check
[14:10:36] camus job ongoing
[14:10:43] but no more oozies :)
[14:11:35] ottomata: shall we wait for your zk stuff and RM restatt ?
[14:12:11] yea, i'm getting close
[14:12:15] let's just do it once
[14:12:15] cool
[14:25:48] ok joal, cool
[14:25:53] hm ?
[14:26:07] ya ready ?
[14:26:07] we are ready to restart, i'm going to disable resubmission of camus (by commenting out cron) for amin
[14:26:11] wait for this camus run to finish
[14:26:13] then let's do it
[14:26:18] k
[14:26:19] or, should we
[14:26:23] pause all oozie jobs
[14:26:25] Shall I pause
[14:26:26] and wait for all running jobs to complete?
[14:26:27] indeed
[14:26:31] yeah, ok, yeah pause
[14:27:39] I go for the bundles
[14:27:54] ok i can doo coords
[14:28:52] 3 bundles suspended
[14:29:19] still 4 coords running
[14:29:21] k, really i only have to suspend one more coordinator
[14:29:26] cause the others all depend
[14:29:44] meh i'll suspeend them all :)
[14:30:07] I still view mobile_apps monthly and daily
[14:30:14] media_count archive
[14:30:23] and clickstream
[14:30:35] oh ja mobile apps
[14:30:39] mediacount gone
[14:30:47] oh clickstream, intresting!
[14:30:56] I don't know this job :)
[14:31:05] Can you explain ?
[14:31:32] its ellery's i don't know it either
[14:31:56] ok all suspended
[14:32:59] great
[14:33:56] one at a time i move jobs into essential queue to give them a boost :p
[14:34:01] probaly doesn't matter, but it is fun
[14:34:04] Weird : still 3 jobs running on the cluster, but oozie says no job matching when looking for wf running
[14:34:13] ?
[14:34:18] 3 oozie jobs?
[14:36:43] oh, that is probably because
[14:36:44] from oozie's perspective the job is usspended
[14:36:44] well, 3 launched jobs
[14:36:44] but, that will only keep it from launching further actions
[14:36:44] i think it will still let the current action in yarn finish
[14:36:44] ok makes sense
[14:36:44] I let you play with your queue :)
[14:36:44] about mediacounts: we not only should resume, but kill and restart from long ago, no ?
[14:36:45] yes
[14:36:49] well, we might not need to kill
[14:36:51] but we can re-run
[14:36:57] hm
[14:37:03] rerun coord_id -action X-Y
[14:37:10] I need to understand better the re-run stuff
[14:38:31] k, so basically re-running allows you to start a new instance of the coord, working previously existing aciotns
[14:38:32] hmm, mediacounts might just work if we resume
[14:38:34] correct ?
[14:38:36] beacuse i had told it to re run them once last week
[14:38:37] https://hue.wikimedia.org/oozie/list_oozie_coordinator/0079582-150220163729023-oozie-oozi-C/
[14:38:39] yes
[14:38:53] it just tells a particular instantiation of a coordinator action (workflow) to run all over again
[14:38:58] you can tell it to do so based on action number
[14:39:04] And one everything is done, you kill your re-run coord instance
[14:39:08] that @123 after the coord id if you do -info
[14:39:14] or, can do based on date too
[14:39:17] and it works with ranges
[14:39:24] yes I have seen that
[14:39:28] reading oozie doc
[14:39:32] there are no killed jobs for mediacounts-load
[14:39:40] so, if we just resume it, it might just work
[14:39:44] without having to tell it to rerun
[14:40:49] I think we miss data from a long time ago
[14:42:08] actually no
[14:42:17] only from 05-02
[14:42:38] 2015/05/02 present, but not since
[14:42:49] so it seems you are right ottomata, resuming should be enough
[14:44:42] ottomata: doc on Kraken ... --> Archive ?
[14:44:49] yes
[14:44:51] or purge :)
[14:44:55] :D
[14:46:00] I like the idea though :)
[14:47:27] tgr, yt?
[14:52:01] joal, yt?
[14:52:08] yup mforns
[14:52:12] joal, hi!
[14:52:16] Heya :)
[14:52:20] joal, I've seen you in the etherpad
[14:52:26] Yes
[14:52:31] joal, can you archive wiki pages in mediawiki.org?
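(Note on the Oozie mechanics discussed between 14:26 and 14:40, for readers following along; command shapes are per the standard Oozie CLI, and exact flags should be checked against the cluster's Oozie version. Suspending a bundle or coordinator is "oozie job -suspend <job-id>", resuming is "oozie job -resume <job-id>"; suspension only stops the coordinator from materializing further actions, so workflows already handed to YARN run to completion, which is exactly the "still 3 jobs running" behavior joal observes at 14:34. The rerun form ottomata abbreviates as "rerun coord_id -action X-Y" is "oozie job -rerun <coord-id> -action <first>-<last>", or "-date <start>::<end>" for a date range; adding "-refresh" re-resolves input dependencies before re-running, and "-nocleanup" skips deleting previous output. The "@123" suffix shown by "oozie job -info <coord-id>" is the coordinator action number that these ranges refer to.)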
[14:52:42] joal, I mean delete
[14:53:08] For the moment I take pages, and will ensure during standup that the action I plan is ok
[14:53:13] Then I'll take them :)
[14:53:24] So I don't know :)
[14:53:43] joal, ok, it seems I don't have permits to delete pages in mediawiki.org
[14:53:50] hm :S
[14:53:58] Not sure I can either
[14:54:30] I'll try on infrastructure/access (the first one of the to be deleted list)
[14:55:13] Nope, no deletion for me neither
[14:55:28] joal, if you have permits, you should see, when logged in, a delete tab next to edit and edit source...
[14:55:29] ok
[14:55:55] You have put the historical banner thouhg :)
[14:58:34] joal, yes
[15:00:58] ottomata: the, legacy tsvs were created before oozie ?
[15:01:16] hm, yes/no?
[15:01:21] hm, no
[15:01:25] not sure what you are asking though
[15:01:27] weird
[15:01:42] there are 5 scripts, identical
[15:01:46] but the name
[15:01:52] if I don't miss something
[15:01:56] are you asking of that data existed before hive/hadoop/oozie? then yes.
[15:02:00] it was created by udp2log then though
[15:02:14] since januarry 2015 is has also been created by hive/hadoop/oozie
[15:02:25] and only in the last few weeks has it no longer been generated by udp2log
[15:02:28] that is
[15:02:30] on stat1002
[15:02:32] /a/squid/archive
[15:02:33] vs.
[15:02:37] /a/log/webrequest/archive
[15:02:57] the generate_5xx-[bits/misc/etc]_tsv.hql files are the same
[15:03:07] We should ahve handled that with datasets, no ?
[15:03:09] looking
[15:03:17] oh
[15:03:26] whoa werid
[15:03:27] they are
[15:03:29] not sure why
[15:03:37] Neither do I :)
[15:04:55] hm, ahh, i think that is do to an abstraction in workflow.xml
[15:04:58]
[15:05:06] yeah
[15:05:12] Was looking into that as well
[15:05:14] so it required that they are the same?
[15:05:14] i guess
[15:05:21] qch ris wrote this, (i reviewed it :) )
[15:05:52] almost empty cluster!
[15:05:58] 3 reduces go!
[15:06:01] 1!
[15:06:26] still a launcher :)
[15:06:31] heh
[15:06:34] there we go
[15:06:36] ok!
[15:06:37] Gone
[15:06:43] hipa !
[15:06:44] restarting RMs and nodemanagers
[15:07:07] with updates for versions on parquet and others deps, zk, and a new MR ?
[15:07:11] RM soory
[15:08:21] ja installed updated deps yesterday
[15:08:27] fingers crossed
[15:09:09] ok, all restarted, HA RM looks good!
[15:10:07] Nice :)
[15:10:12] i'm going to let camus run, but lets try your parqeut thing before we start ooie stuff
[15:10:18] cool
[15:10:19] fingers corssed (im' only 50% hopeful)
[15:10:22] testing now
[15:11:16] org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
[15:12:11] stopped and restarted -> same error
[15:12:14] try now
[15:12:18] that is because of a bad config on my part
[15:12:19] fixing
[15:12:31] k
[15:12:37] What was it ?
[15:12:41] some HA stuff ?
[15:13:09] shell started
[15:14:05] naw, uhhh, tell you in a bit
[15:14:09] dumb thing
[15:14:50] webUI not accessib;le :)
[15:14:58] text files ok
[15:15:29] parquet fails
[15:15:39] Damn ..
[15:15:47] SAme error
[15:16:06] crapo
[15:16:37] crap, welp, i guess we should just start jobs and then continue figuring this one out, eh?
[15:16:42] testing loading data from hive
[15:16:49] Yup
[15:16:53] ok, starting things...
[15:17:03] want me to start some ?
[15:17:05] naw i got it
[15:17:08] you mess with spark
[15:17:14] k
[15:25:36] Wow, production queue filled up :)
[15:26:56] Hullo, I'm running 5 minutes for late for standup.
[16:05:24] hm, milimetric, joal, something to keep an aye on:
[16:05:27] apache flink
[16:05:32] vs spark streaming
[16:05:32] flink!
[16:05:52] pipeline db too
[16:07:20] hm cool
[16:07:46] so many things!
[16:27:17] hm, joal, very strange that this parquet spark thing works in local mode, but not yarn
[16:45:13] yezh, completely strange ottomata
[16:45:22] also strange
[16:45:34] so, i'm trying to write a parquet file from a paralleized data set
[16:45:59] to see if i can write and load my own file
[16:46:09] http://www.codeshare.io/aITwc
[16:46:30] getting :33: error: value saveAsParquetFile is not a member of org.apache.spark.rdd.RDD[Person]
[16:46:30] people.saveAsParquetFile(parquetFilePath)
[16:46:36] i'm doing this from exampeslhere :
[16:46:40] https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#loading-data-programmatically
[16:47:03] That's weird for sure !
[16:48:24] Will try it as wel
[16:50:59] also, btw, pyspark shell gives same error
[16:51:05] for loading parquet file
[16:51:17] hm
[16:51:23] At least it's coherent
[16:53:24] Ok ottomata
[16:53:34] Managed to write a file, but can't read it back !
[16:54:41] how'd you write it?
[16:54:56] You need to use a dataframe object
[16:55:03] paste please!
[16:55:05] :)
[16:55:06] So: people.toDF
[16:56:07] hm, guess that implicit things wasn't working
[16:56:23] http://www.codeshare.io/dVKSG
[16:57:41] From the programming guide of 1.3.0 --> val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
[16:57:50] This toDF() at the end is the key :)
[16:57:53] aye
[16:58:06] ok cool
[16:58:14] So parquet written
[16:58:19] But not read !!!!
[16:58:24] Do you believe that ...
[17:00:04] ok, yeah, going to post for help...
[17:00:11] hmm ,first
[17:00:13] going to try with spark-submit
[17:01:11] https://issues.apache.org/jira/browse/SPARK-6315
[17:01:24] found it I think !
[17:01:33] ottomata: --^
[17:02:37] HOW DO YOU GOOGLE SO WELL
[17:02:43] :D
[17:11:22] PROBLEM - MySQL Slave Delay on db1045 is CRITICAL: CRIT replication delay 352 seconds
[17:11:36] so far it isn't helping joal :/
[17:12:50] ottomata: try setting the conf value as they said, but no chance
[17:12:55] not working :(
[17:13:02] yeah me too
[17:13:28] And what's event more weird is : when writing your own file, it should wsritten with parquet new version ...
[17:14:08] I think posting sounds reasonnable in that case :(
[17:17:42] ja, joal, exactly
[17:17:52] yeah, the example i ahve now, is 100% contained within new spark
[17:17:54] so it should work!
[17:17:58] it does work in local mode
[17:17:59] just not yarn
[17:18:07] crazyness
[17:21:05] ottomata, hi, is the cluster ok? Dan tells me hive queries didnt run
[17:21:55] cluster is sitll a little busy, but it should be ok now
[17:24:15] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1265755 (phuedx) @kaldari, @bmansurov: If you're happy with the estimate, then just move it through the process.
[17:27:01] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1265772 (bmansurov) 3 sounds good to me.
[17:28:17] mforns_brb: any new insight into the volume going into EL from those two schemas?
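(The working pattern from the 16:46-16:58 exchange above, written out as a minimal spark-shell sketch against the Spark 1.3 API. The Person case class and input path follow the programming-guide example quoted at 16:57:41; the output path is illustrative. The point is that saveAsParquetFile lives on DataFrame, not on RDD, so the .toDF() conversion, pulled in via the sqlContext implicits, is what makes the call compile:)

    // Minimal spark-shell sketch (Spark 1.3): sc and sqlContext are provided by the shell.
    import sqlContext.implicits._          // enables .toDF() on RDDs of case classes

    case class Person(name: String, age: Int)

    // An RDD[Person] has no saveAsParquetFile -- hence the ":33: error" at 16:46:30.
    val people = sc.textFile("examples/src/main/resources/people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
      .toDF()                              // DataFrame does have it

    people.saveAsParquetFile("/tmp/people.parquet")       // write succeeds...
    sqlContext.parquetFile("/tmp/people.parquet").show()  // ...read back to verify

(As the rest of the log shows, the remaining read-back failure in yarn mode was not a code problem at all: it was the CDH package version skew fixed by the upgrade at 17:49 below.)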
[17:28:30] just curious if we know whether we're dropping events, etc.
[17:35:14] AH joal
[17:35:18] ?
[17:35:20] i am able to make this example work in vagrant
[17:35:27] so something is wrong with cluster
[17:35:33] maybe there is still some version problem we ahven't detecte dyet
[17:35:42] trying in labs real quick
[17:35:43] I can't imagine anything else
[17:35:46] k
[17:35:46] (maybe)
[17:36:44] milimetric: last email from icinga is not nice ..
[17:37:23] milimetric: Shall we drop some mobile eventx ?
[17:37:36] joal: in scrum of scrums right now, but let's talk afterwards
[17:37:42] k sorry
[17:39:21] joal: the slave replication alert, right? (just making sure I'm not missing other alerts)
[17:39:32] yup
[17:40:04] means that DB is not keeping up normally with insert rate, if I don't mistake
[17:40:39] it sounds like the replication to the slave is bad, but the master seems ok
[17:40:55] although the throughput as of this morning is over 800 / second which is much higher than we've seen in the past
[17:41:22] like x2 right ?
[17:42:13] k, if master's ok, let's wait and see what mobile team answer
[17:42:17] thx for the answer
[17:42:42] diner time, brb
[17:48:54] wow, joal. fixed.
[17:48:58] it was a version thing...but i'm not sure of what
[17:49:01] i just did this:
[17:49:12] salt -E 'analytics.*|stat1002' cmd.run 'sudo apt-get -y --force-yes install $(dpkg -l | grep "^ii" | grep -i cdh | grep -v cdh5.4.0 | awk "{print \$2}" | tr "\n" " ")'
[17:49:44] that manually upgraded any cdh packages that were not at cdh5.4.0 that were installed across the whole cluster
[17:50:03] the only ones i saw upgraded were
[17:50:04] hbase hive hive-jdbc impala-shell sentry zookeeper
[17:50:20] hm, maybe hive somehow
[17:50:37] joal / mforns_brb: I'm going to add a bug for the missing graphite data and tag it operations
[17:51:13] Analytics-Kanban, operations: Event Logging data is not showing up in Graphite anymore since last week - https://phabricator.wikimedia.org/T98380#1265890 (Milimetric) NEW
[17:51:20] yup, totally works now!
[17:51:22] phew!
[17:51:25] sorry abou tthat.
[17:56:13] milimetric, YGM :)
[17:57:17] Ironholds: what? I got mail? I got you?
[17:57:28] I mean, you do get me, but you also have mail
[17:57:39] * milimetric doesn't see mail
[17:57:51] oh!
[17:58:04] you fell under my "look at only the top 3 emails" rule
[17:58:07] ahhh
[17:58:28] there's a followup question there too which is "how does that process work, because it looks like we need to surface some logs from fluorine"
[17:59:17] Ironholds: yes, stat1002 has those folders too
[17:59:21] neat
[17:59:25] you can see that /a/limn-public-data is there
[17:59:36] but there's also /a/aggergate-datasets
[17:59:44] and /a/public-datasets
[17:59:53] *nods*
[18:00:17] there's some reason why we're doing that, but I always kind of wanted just one folder like "public-data-from-stat1002"
[18:00:23] and "public-data-from-stat1003"
[18:00:30] that seems much easier to understand
[18:00:39] totally. Makes sense, though!
[18:00:51] and then the final question; how does that process work and how would I go about replicating it for a different machine?
[18:01:52] Analytics-Kanban, operations: Event Logging data is not showing up in Graphite anymore since last week - https://phabricator.wikimedia.org/T98380#1265951 (chasemp) I see this in icinga for graphite1001: > Throughput of event logging events CRITICAL: 92.86% of data above the critical threshold [600.0]
[18:16:34] Ironholds: wait what process?
[18:16:48] rsync-ing, aggregating, ?
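(Unpacking the salt one-liner at 17:49:12, since it is the actual fix for the parquet issue: salt -E targets minions whose IDs match the PCRE regex, here every analytics.* host plus stat1002, and cmd.run executes the quoted shell command on each of them. Inside, the $(...) builds the package list: dpkg -l lists packages, grep "^ii" keeps only ones currently installed, grep -i cdh narrows to CDH packages, grep -v cdh5.4.0 drops those already at the 5.4.0 version, awk prints the package-name column, and tr joins the names into one space-separated line. apt-get -y --force-yes install then upgrades exactly those stragglers to the version available from the configured cdh5.4.0 repository, which is how hbase, hive, hive-jdbc, impala-shell, sentry, and zookeeper got pulled forward.)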
[18:16:49] milimetric, file is on stat1002, then magically appears in a public folder ;)
[18:16:53] the rsyncing
[18:17:01] ah, it's puppetized, lemme see if i can find tat
[18:18:24] Ironholds: for stat1003 it's here: https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/sites/datasets.pp#L41
[18:18:37] and stat1002 it's here: https://github.com/wikimedia/operations-puppet/blob/production/modules/statistics/manifests/sites/datasets.pp#L49
[18:18:49] ooh, so I was wrong, it's only aggregate-datasets that comes over from stat1002
[18:19:03] awesome; thnakee!
[18:19:28] so I could just drop that in the puppet manifest for whatever fluorine is and get it done?
[18:19:46] oh, wait
[18:20:06] or is it the other way around? i.e., modify the file you just linked to go hunting in fluorine?
[18:21:58] I think I get it. *writes patch, grabs dev*
[18:22:01] thanks milimetric :)
[18:22:38] Ironholds: yeah, that sounds good, write patch, grab devs
[18:23:01] grand!
[18:33:08] joal, is there documentation anywhere of the field names that come out of the various maps in the webrequest table?@
[18:38:49] RECOVERY - MySQL Slave Delay on db1045 is OK replication delay 108 seconds
[18:39:56] Ironholds: the x-analytics map is documented here: https://wikitech.wikimedia.org/wiki/X-Analytics
[18:40:11] kevinator, okay. Is there documentation for any other map?
[18:41:48] Ironholds: no, I can't find all of it. There is some info on the user_agent map here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
[18:41:56] thanks :)
[18:42:21] milimetric: yt?
[18:42:32] hi kevinator
[18:42:38] batcave?
[18:42:41] omw
[18:44:03] ottomata: You rock :)
[18:44:42] Ironholds: No precise doc on user_age
[18:44:44] i mean, sorta, i shoulda done that in the first place :/
[18:45:00] Right ... But at the end, you still got it :)
[18:45:36] Ironholds: user_agent_map and geocoded_data have a defined set of fields, but not documented
[18:45:57] joal, gotcha. They probably should be, but in the meantime, I worked it out thanks to kevinator :)
[18:46:05] ok cool
[18:46:10] They should defintely be
[18:46:12] the only reason I've been able to use geocoded_data at all is that I know where the java lives and read it ;p
[18:46:28] hmm
[18:46:48] Easiest would be to request the full map limit 1, and you got key names
[18:48:02] But they should be documented :)
[18:54:44] joal, yep ;p
[18:57:40] put it in the field comment?
[18:57:57] creating pages for doc as x_analytics now
[18:59:06] put it in hive table comments!
[18:59:07] for create table :)
[19:00:46] ottomata: already there for most of it
[19:30:51] hey ottomata, you got root on fluorine? ;)
[19:31:01] ha, yes
[19:33:57] any chance you could mkdir /a/public-datasets/ ?
[19:34:29] Writing some code to surface log aggregates for search; if someone with root can make that directory, my creation of an rsync connector will actually function
[19:34:33] and functioning is good
[19:53:08] joal: fyi, i'm going to play with this minimum-mb setting, going to restart RMs, hoping that HA RM makes this restarting thing easier
[20:00:39] ok ottomata
[20:00:39] On my side playing with spark
[20:00:39] let me know if you have impala jumping :)
[20:00:55] so, yinz are about to get some thoughts on the last-accessed cookie
[20:01:06] thoughts like "It's breaking things and you should've told other people you were attaching it to requests"
[20:01:07] hey everybody!
I noticed recently I've starting to get this error from my Java tool: 12:57:20.785 [main] WARN o.a.h.c.p.ResponseProcessCookies - Invalid cookie header: "Set-Cookie: WMF-Last-Access=06-May-2015;Path=/;HttpOnly;Expires=Sun, 07 Jun 2015 12:00:00 GMT". Invalid 'expires' attribute: Sun, 07 Jun 2015 12:00:00 GMT
[20:01:18] did we add non-standard cookie headers recently?
[20:01:21] kevinator, I would thoroughly recommend sending engineering@ a note about the cookie
[20:01:38] and if so, can it be done in a way that doesn't drive Java crazy? :)
[20:01:50] Ironholds: you mean the last-access cookie?
[20:02:01] yep; see SMalyshev's comments.
[20:02:25] looks like Java doesn't like the expires part...
[20:02:33] not sure why
[20:04:43] we didn't do anything non-standard for the expiry as far as I know...
[20:05:02] let me log a ticket and alert those who would know more about it than I
[20:05:44] BTW I'm going on vacation off the grid tomorrow so I need someone else look into this.
[20:06:15] milimetric: I have three tests failing on test_reports - test_full_report_create_and_result, test_report_result_json, and test_report_result_sum_only_csv. I can't figure out why anything I changed should affect these. Any insights?
[20:08:37] madhuvishy: well, it's not unthinkable, everything pretty much relies on the wiki_user table
[20:08:50] is your latest patch in gerrit?
[20:08:52] I can check it out
[20:09:03] not till the last changes. let me push
[20:11:02] ottomata: be carefull with the HA RM : webUI behind yarn.wikimedia.org soesn't follow ;)
[20:11:41] kevinator: ok, thanks
[20:12:14] SMalyshev: that's troubling, I was involved with adding that cookie
[20:12:17] SMalyshev: what version of Java are you using... and any library in particular to parse the cookie?
[20:12:47] it's odd that browsers seem fine with the cookie and Java doesn't like it. Is your code open?
[20:12:52] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266490 (kevinator) NEW
[20:13:17] kevinator: nothing special, Apache http client
[20:13:37] milimetric: I logged a phab task ^^ to investigate this
[20:13:53] yep, saw
[20:13:54] Analytics, Mobile-Web, Mobile-Web-Sprint-46-Taken:-The-Dan-Garry-Story: Instrument “tags” and anonymous gather pages to track engagement with browse. - https://phabricator.wikimedia.org/T94744#1266515 (bmansurov) a:bmansurov
[20:14:13] kevinator: the code is here: https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/wikibase/WikibaseRepository.java
[20:14:29] yeah, noticed joal, it has to be in analytics1001 for that
[20:14:50] kevinator: this is specifically the request part: https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/wikibase/WikibaseRepository.java#L221
[20:14:55] np, just saying :)
[20:14:57] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266528 (Milimetric) code that is having trouble with the new cookie: https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/wikiba...
[20:15:18] milimetric: see https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/wikibase/WikibaseRepository.java#L221
[20:15:41] ottomata: Thank you for solving the spark issue :0
[20:15:59] thanks SMalyshev, I updated the bug
[20:16:03] I now have a working job for aggregating pageviews on projects hourly :)
[20:16:17] 10mins to agg one hour, 4 workers 2G
[20:16:26] nice!
[20:16:41] milimetric: we don't really need the cookies so I'll probably try to tell Apache client to ignore it but other Java-based clients may have this issue too
[20:17:37] SMalyshev: yeah, since this cookie is going out to everyone, this issue matters
[20:17:42] thanks for raising the issue
[20:18:04] np :)
[20:19:41] bed time !
[20:19:48] talk to you guys tomorrow :)
[20:19:54] have a good evening
[20:19:56] nite
[20:20:01] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266574 (Smalyshev) Just to be clear and save digging in the code, the client used is org.apache.http.client, via org.apache.http.client.methods.HttpGet and org.apache.http.impl.client...
[20:20:03] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266577 (BBlack) I *think* that Expires field is formatted correctly... does someone have a handy link to whatever deeper java code actually parses it and throws that error?
[20:21:33] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266596 (Milimetric) @BBlack: I think it's this code here: https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/wikibase/Wikibase...
[20:22:28] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266598 (BBlack) I mean, the org.apache.http.... code that actually parses the Set-Cookie
[20:23:34] (PS5) Madhuvishy: WIP: Fix validate again functionality on cohort display page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/206346 (https://phabricator.wikimedia.org/T78339)
[20:23:46] milimetric: done. ^
[20:24:03] thx
[20:24:05] * milimetric looks
[20:25:29] madhuvishy: tests/test_controllers/test_reports.py all passes for me
[20:25:38] ha
[20:26:24] madhuvishy: did you make sure the wikimetrics-queue service was stopped so the tests could create their own?
[20:26:36] milimetric: aah I've to do that.
[20:26:54] oh yeah, that'll mess up a lot of tests
[20:26:55] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266609 (Smalyshev) @BBlack It looks like Wikipedia examples are different :) https://en.wikipedia.org/wiki/HTTP_cookie#Expires_and_Max-Age ``` Set-Cookie: lu=Rg3vHJZnehYLjVg7qi3bZjzg...
[20:28:20] milimetric: okay now I have 1 error and 2 failures
[20:28:51] milimetric: both failures are with the arabic/cyrilic names. I can't find what's happening there
[20:29:16] madhuvishy: wanna hangout?
[20:29:21] sure
[20:33:39] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266648 (BBlack) Ok, fair enough. The dashes seem to be more-common practice in any case. Working out a fix (too bad Varnish uses the spaces in its default formatting!).
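(The client-side workaround SMalyshev floats at 20:16:41 -- telling the Apache client to ignore cookies entirely, since his tool doesn't need them -- looks roughly like the sketch below with HttpClient 4.3+, where the cookie-spec constants live in org.apache.http.client.config. This is a hedged illustration, not the actual wikidata-query-rdf patch; the URL is only an example. The server-side fix BBlack is working out above, reformatting the Expires date, is the complementary approach:)

    import org.apache.http.client.config.{CookieSpecs, RequestConfig}
    import org.apache.http.client.methods.HttpGet
    import org.apache.http.impl.client.HttpClients

    object NoCookieClient {
      def main(args: Array[String]): Unit = {
        // Skip cookie processing entirely, so the WMF-Last-Access Set-Cookie
        // header is never parsed and the "Invalid 'expires' attribute"
        // warning from ResponseProcessCookies goes away.
        val config = RequestConfig.custom()
          .setCookieSpec(CookieSpecs.IGNORE_COOKIES)
          .build()
        val client = HttpClients.custom()
          .setDefaultRequestConfig(config)
          .build()
        val response = client.execute(new HttpGet("https://en.wikipedia.org/w/api.php"))
        try println(response.getStatusLine) finally response.close()
      }
    }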
[20:41:24] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266726 (Smalyshev) Weird, it looks like by default Apache client should use this cookie spec class: https://svn.apache.org/repos/asf/httpcomponents/httpclient/tags/4.4.1/httpclient/sr...
[20:48:29] RECOVERY - Throughput of event logging events on graphite1001 is OK Less than 15.00% above the threshold [500.0]
[20:53:33] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266790 (BBlack) Well I don't know what the language of those patterns is, but it looks plausible that dashes alone may not fix this, if they expect two-digit years along with them. I...
[20:54:51] Analytics-Cluster, Analytics-Kanban: WMF-Last-Access cookie breaks Java client - https://phabricator.wikimedia.org/T98396#1266794 (BBlack) (also, I think we're actually using apache's 4.4 not 4.4.1 in our stuff? in case it makes any diff)
[21:20:00] milimetric mforns - Ran 374 tests in 48.242s OK. Just got this. o/
[21:20:18] madhuvishy, woohoo!
[21:20:21] yay :)
[21:20:42] most underestimated change ever :)
[21:20:48] milimetric: Yepp.
[21:28:43] (PS6) Madhuvishy: WIP: Fix validate again functionality on cohort display page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/206346 (https://phabricator.wikimedia.org/T78339)
[21:30:13] milimetric: I'll move this to code review? We can have a conversation about all the changes/I can continue making any as we see fit.
[21:30:52] madhuvishy: yeah that sounds good
[21:31:15] you can feel free to move to other stuff, I'll do a review of it but I am really behind with everything since my trip
[21:31:27] trying to catch up, I appreciate everyone's patience until then
[21:31:46] milimetric: yes, sure no problem at all.
[21:35:29] kevinator: Do you know whether this is safe to be merged? https://gerrit.wikimedia.org/r/#/c/209371/
[21:36:42] Deskana: I don't know. Can you ask mforns tomorrow? He is on EL duty this week.
[21:36:56] Deskana: BTW I'm going on vacation in a few hours so I can't follow up on this
[21:37:10] kevinator: Okay. Thanks.
[21:38:57] Deskana, I'll have a look at this tomorrow, sure
[21:39:52] kevinator, enjoy your vacation!
[21:44:58] kevinator: milimetric - any suggestions on how I can contribute to the doc sprint?
[21:45:33] madhuvishy: Well, do you know about the etherpad?
[21:45:40] http://etherpad.wikimedia.org/p/analytics-docTD-swarm
[21:45:42] milimetric: yes. i see the backlog
[21:46:15] yeah, you can take a look at the articles still in the backlog, and if you're comfortable doing something with them, do it
[21:47:01] milimetric: ummm.. okayy
[21:47:46] :) madhuvishy i mean, it's ok if you just end up reading some of that stuff
[21:47:56] this is kind of like ancient history, cleaning out someone's basement or something
[21:48:05] milimetric: he he that is what i anticipate will happen
[21:48:14] to me, so far, the most amazing thing is how much of the same stuff we were saying 3-4 years ago
[21:48:36] ha
[21:52:21] milimetric: the heady days of Limn and Kraken? :)
[21:52:55] you laugh, but if our "plan" 4 years ago looks like our "plan" now, we've got some serious thinking to do
[21:53:41] +1
[21:54:07] true that
[21:54:50] milimetric: I must say that execution now is very different from execution 4 years ago and that’s far more important...
[21:57:04] yuvipanda: i think we've just taken a detour to do some ad-hoc work that people found valuable.
But we're still left without a real data pipeline. And I think that's what we really need. We just need to not call it Kraken and scare the crap out of everyone :)
[21:57:30] milimetric: :) all true.
[21:57:44] I mean, we spent months working on wikimetrics. And although it's a fairly useful project to a handful of people, it's not something that comes even close to a wikistats replacement, which is used by tens of thousands of people monthly
[21:59:39] YUP
[21:59:50] He says having been asked twice today, by different volunteers, where our UA data at
[22:03:20] Analytics-Cluster, Analytics-Kanban, Easy: Build component for Oozie jobs to sends e-mails - https://phabricator.wikimedia.org/T88433#1267102 (madhuvishy) a:madhuvishy
[22:05:38] milimetric: whatever happened to dashik, btw?
[22:08:34] yuvipanda: dashiki's great, was on hold while we did some VE stuff but we'll have a new release that's much closer to the original vision
[22:08:45] milimetric: ah sweet
[22:08:46] nuria and I think it's got some real potential
[22:47:11] (PS7) Madhuvishy: WIP: Fix validate again functionality on cohort display page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/206346 (https://phabricator.wikimedia.org/T78339)
[22:55:40] (PS4) Madhuvishy: [WIP] Use wmfuuid from x_analytics_map for app install id if available else default to the query parameter Todo - test if it works [analytics/refinery] - https://gerrit.wikimedia.org/r/207689 (https://phabricator.wikimedia.org/T96926)
[22:56:25] milimetric: what happens if i click on rebase change in gerrit :/
[22:57:22] or yuvipanda ^ ?
[22:57:40] Nothing bad, but it will create a new patchset by replaying your current patch on whatever the base is
[22:57:52] If it has conflicts, it will error out
[22:58:11] milimetric: hmmm. okay.
[22:58:32] milimetric: i really need to wrap my head around gerrit stuff. still struggling
[22:58:41] If no conflicts, it will add the patchset and if you want to keep going from that point, you'll have to first fetch and checkout that patchset
[22:59:11] milimetric: hmmm, alright
[22:59:18] madhuvishy: if it helps, it took me about six months to stop hating it
[22:59:59] milimetric: aah. i don't know if I hate it yet - telling myself I'm stupid at this point
[23:02:55] You're not. Gerrit just makes things hard for a very persnickety reason. I have philophical issues with it but it does what it does well
[23:08:40] (PS5) Madhuvishy: Use wmfuuid from x_analytics_map for app install id if available else default to the query parameter [analytics/refinery] - https://gerrit.wikimedia.org/r/207689 (https://phabricator.wikimedia.org/T96926)
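(The fetch-and-checkout step milimetric describes at 22:58:41 follows Gerrit's standard refs/changes scheme. For patchset 7 of change 206346 (the PS7 shown above), it would look something like: git fetch https://gerrit.wikimedia.org/r/analytics/wikimetrics refs/changes/46/206346/7 && git checkout FETCH_HEAD -- where "46" is the last two digits of the change number and "7" is the patchset. The ref layout is standard Gerrit convention; the exact remote URL here is an assumption.)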