[06:51:42] Analytics: Add tcy.wikipedia to the pageview whitelist - https://phabricator.wikimedia.org/T142175#2525903 (JAllemandou)
[06:53:54] (CR) Joal: [C: 2] "LGTM ! Merge when you want." [analytics/refinery] - https://gerrit.wikimedia.org/r/302954 (owner: Ottomata)
[07:32:45] (CR) Elukey: [C: 2 V: 2] Remove some old refinery-*.jar artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/302954 (owner: Ottomata)
[07:32:57] joal: o/
[07:33:02] merged the refinery cleanup
[07:33:13] great elukey
[07:33:29] the main issue is that if I deploy now I'll add more data, not remove any :P
[07:34:04] will double check docs about old version retention
[07:34:30] also the / on 1027 might be cleaned up more
[07:39:06] ah 6.5GB of apt logs /o\
[07:42:10] ok now we have 10GB of free space
[07:42:51] !log ran apt-get clean on analytics1027 to free space
[07:42:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[07:46:44] (PS12) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[07:54:56] joal: I am planning to reduce the camus log retention in /var/log/camus from 3 months to 1.5
[07:55:02] and rotate files weekly
[07:55:05] any objection?
[07:55:10] nope :)
[07:55:26] also 4 weeks?
[07:55:36] we are super constrained in 1027 and logs are huuge :)
[07:55:54] elukey: aqs1004-b has a weird behaviour
[07:56:35] elukey: When there are camus issues, we need to react very fast, so we usually don't need more than a week of logs
[07:57:02] all right keeping 4 weeks
[07:57:13] yeah I noticed, not sure why it is compacting in that way
[07:57:33] elukey: CPU usage is different than classic compaction
[08:02:01] also the sstable size on aqs1004-b is now 600GB
[08:02:16] it will go down with compactions but it is weird
[08:03:37] elukey: yes, it's a bit of an unexpected behaviour here
[08:03:56] elukey: I suggest we wait until compaction is done and assess state, but it's weird
[08:05:06] yes I was about to say the same
[08:18:34] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2526071 (Nemo_bis) If we are sure that what we are observing didn't happen in Iran and a few other countries, checking what updates actually affected those count...
[08:23:45] joal: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery - looks good?
[08:24:20] ah I missed refinery-deploy-to-hdfs
[08:24:55] elukey: that is a question: does scap deploy to hdfs or is it a manual step?
[08:25:24] also elukey, instead of "logging in to tin", you could say "logging in to deployment"
[08:25:29] elukey: more generic :)
[08:28:21] joal: scap doesn't do anything special afaik, it does git and git-fat pull
[08:28:29] ok elukey
[08:31:12] joal: if you are ok you could test a deployment
[08:31:18] (the last cleanup)
[08:32:23] elukey: need to bring Lino to the creche,
[08:32:30] elukey: And possibly, better on monday :)
[08:34:17] ahhh sure! Also it is safe TM
[08:34:18] :D
[08:38:45] Analytics-Cluster, Analytics-Kanban, Deployment-Systems, scap, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2526099 (elukey) Updated https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery We have some issues currently with the fact that t...
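For reference, the camus log rotation change discussed above would look roughly like the stanza below. This is a minimal sketch, assuming the logs live directly under /var/log/camus and that weekly rotation with four rotations kept is what was settled on; the actual puppet-managed config may differ.

    /var/log/camus/*.log {
        weekly
        rotate 4
        compress
        missingok
        notifempty
    }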
[10:23:53] joal: nodetool-b compactionstats is interesting
[10:24:04] * joal looks
[10:24:17] it says pending tasks: 1048, that looks consistent with your metrics
[10:24:27] but Active compaction remaining time : 0h05m19s
[10:24:49] elukey: I have never seen correct timing in that tool
[10:24:50] progress 87%
[10:25:00] ah ok so it is borked
[10:25:26] elukey: this metric shows only the current compacting task, and the expected time for it to stop
[10:25:36] elukey: But there still are many others to come after
[10:26:16] mmm all right
[10:26:21] will keep an eye on it
[10:33:06] Analytics, Operations, Ops-Access-Requests: Add analytics team members to group aqs-admins to be able to deploy pageview APi - https://phabricator.wikimedia.org/T142101#2526291 (ema) p:Triage>Normal
[10:41:03] elukey: taking a break now, will try to deploy refinery this afternoon when getting back
[10:46:24] sure, ttl!
[12:41:01] Hi elukey
[12:41:27] o/
[12:41:48] Do you want me to deploy refinery?
[12:42:46] if you want to try it yes, otherwise we can do it on monday
[12:43:30] elukey: As you wish :)
[12:43:45] I want to test it for sure, timing doesn't matter :)
[12:45:34] elukey: I'm gonna do it :)
[12:46:01] super
[12:46:17] so I deployed yesterday but didn't run the deploy hdfs script
[12:46:31] it shouldn't matter but I am mentioning it
[12:46:44] you might see a python stacktrace
[12:46:51] I think it is due to the new scap package
[12:46:54] the rest should go ahead
[12:47:21] elukey: You mention checking root partition space left --> on tin or on every deployed node?
[12:47:54] deployed node, will amend the guide
[12:48:12] elukey: We should provide the list then :)
[12:48:24] sure sure, I can point to the scap repo files
[12:48:31] sounds great
[12:48:31] so it'll always be up to date
[12:51:31] elukey: They are 4 for the moment, right?
[12:53:45] nope, three
[12:53:49] stat1002 (canary)
[12:53:52] stat1004
[12:53:55] Arf, 1027, not 1015, correct?
[12:53:55] analytics1027
[12:53:59] right
[12:54:01] correct
[12:54:12] so scap should deploy only to stat1002 first
[12:54:14] ok, we have enough space :)
[12:54:32] !log Deploy refinery using brand new scap deploy !
[12:54:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:55:15] elukey: error --> Permission denied (publickey,keyboard-interactive).
[12:55:27] asks me if I want to rollback
[12:59:22] ah yes you can do it
[12:59:28] mmmmm
[12:59:54] !log Rolled back refinery interactive deploy
[12:59:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[13:00:09] elukey: same error at rollback
[13:00:23] And as you said, there was a python stack trace
[13:00:34] checking the logs
[13:00:37] sure
[13:01:53] so you are in analytics-admins
[13:02:00] and you should be whitelisted
[13:02:11] I am in that group too and I didn't have to sudo yesterday
[13:02:12] mmmm
[13:02:32] elukey: scap is afraid of you, so sudo itself ;)
[13:04:11] so I just did a canary deployment, and it worked. Something is weird with your permissions.. Also I think we'd need to git pull before scap deploy :)
[13:04:32] but one thing at a time
[13:04:34] checking perm
[13:06:48] joal: can you try again?
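On the compactionstats exchange earlier in this thread: the "Active compaction remaining time" only covers the task that is currently compacting, while "pending tasks" is the whole backlog, which is why the two can look inconsistent. A quick check on one instance looks like the sketch below (nodetool-b is the per-instance wrapper mentioned above; the output values are illustrative, copied from the conversation):

    nodetool-b compactionstats
    # pending tasks: 1048
    # ...
    # Active compaction remaining time : 0h05m19s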
[13:07:04] maybe with scap deploy-log -v I'll see more data
[13:07:38] * joal tries
[13:07:56] !log Deploying refinery using scap
[13:07:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[13:08:06] elukey: I'll git pull before doing
[13:08:17] sure
[13:08:27] even if you can deploy as it is
[13:08:30] it will say in the logs
[13:08:32] elukey: git pull IS needed (I let you update the docs?)
[13:08:44] 13:03:25 [stat1002.eqiad.wmnet] /srv/deployment/analytics/refinery-cache/revs/37b505dc4074bb34aa1e684dca518cc49e6cd4bc is already live (use --force to override)
[13:08:48] yes yes I forgot it
[13:09:26] elukey: indeed, already live
[13:10:51] elukey: Same error
[13:11:54] so you are not allowed to use the ssh key from the keyholder
[13:11:55] weird
[13:11:56] :/
[13:12:07] :/
[13:12:28] ah! You are not in analytics-admins, at least according to id joal
[13:12:39] elukey: Ahhhh, that's the thing
[13:12:55] me neither
[13:13:01] but it works
[13:13:23] now I remember... Alex added the ops group as a whitelist for deployments a while ago
[13:13:26] this makes sense
[13:13:37] so the analytics-admins group is not pushed to tin
[13:13:41] for some reason
[13:14:34] but eventlogging-admins is
[13:14:45] so let's see where it is defined
[13:15:07] ahhh role/common/deployment
[13:34:32] joal: mornnnig
[13:34:41] gonna merge your puppet pagecounts stuff
[13:41:39] ottomata: Ok great !
[13:41:51] ottomata: I stop the oozie jobs?
[13:42:01] ottomata: HELLLLLOOOOOOO sorry :)
[13:42:55] oozie jobs? helllooooooo
[13:43:01] oh the oozie jobs that generate them on stat1002, sure!
[13:43:04] you can stop those any time i guess
[13:43:14] this just stops the rsync and generation of html, ja?
[13:43:16] the puppet^
[13:43:35] ottomata: yes, I just preferred to stop them in step with the rsync removal, to make the most of generated data :)
[13:43:39] k cool
[13:49:21] ottomata: https://gerrit.wikimedia.org/r/303155 helloooo
[13:49:24] does it look good?
[13:49:33] I saw a comment that you added in there about tin groups
[13:49:39] in an old commit
[13:49:43] but it got removed
[13:50:40] ah gerrit down?
[13:50:48] ohp, nope just for a sec
[13:50:50] weird
[13:51:15] oh ja, elukey that sounds fine
[13:51:26] !log Stopping pagecounts-[raw|all-sites] oozie jobs (load and archive)
[13:51:27] that makes sure analytics-admins have accounts on tin, ja?
[13:51:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[13:51:30] which they need for deployment?
[14:05:06] (PS1) Joal: Remove pagecounts-[raw|all-sites] related code [analytics/refinery] - https://gerrit.wikimedia.org/r/303164 (https://phabricator.wikimedia.org/T130656)
[14:06:43] joal: all good
[14:06:51] ottomata: YAY !
[14:06:59] ottomata: You've cleaned up as well?
[14:07:23] yup
[14:07:29] You guys rock :)
[14:08:36] ottomata: I'll wait until monday, triple check data is not generated, html looks good etc, and then send the email :)
[14:18:29] ottomata: earlier in the day elukey and myself were wondering about restart procedures for the druid* cluster, there's a new openjdk update. can you outline what needs to be considered or add a new section to https://wikitech.wikimedia.org/wiki/Service_restarts ?
[14:22:01] ohmg
[14:22:05] haha
[14:23:08] DIDN'T I MAKE A PAGE ABOUT THIS????
[14:23:10] thought for sure I did...
[14:23:11] looking
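Putting the pieces of this thread together, the refinery deploy sequence that emerged is roughly the sketch below. It is a summary of the conversation, not a copy of the wikitech guide; the path is the one quoted above and the log message is just an example.

    # On the deployment host (tin at the time), from the refinery checkout:
    cd /srv/deployment/analytics/refinery
    git pull                          # otherwise scap reports the current rev as "already live"
    scap deploy "Deploying refinery"  # deploys to the canary (stat1002) first, then asks to continue
    scap deploy-log -v                # per-host output, useful when something fails
    # Deploying to HDFS (refinery-deploy-to-hdfs) is a separate, manual step, per the discussion above.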
[14:24:04] heya milimetric hiii
[14:24:08] can you chime in on the comment here:
[14:24:08] https://gerrit.wikimedia.org/r/#/c/301284/14/jsonschema/mediawiki/revision/visibility-change/1.yaml
[14:24:09] ?
[14:24:27] ok
[14:25:03] moritzm: i can't find a page! I thought for sure I wrote a small quick page about this. writing one now...
[14:25:12] thanks!
[14:30:43] joal: can you retry the deployment?
[14:30:59] sure elukey
[14:31:10] !log Retrying deploying refinery from scap
[14:31:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[14:33:31] joal: I am seeing you deploying now
[14:33:33] right?
[14:33:37] correct !
[14:33:39] Success :)
[14:33:48] \o/
[14:33:59] Thanks elukey !
[14:34:11] ottomata / joal: are we doing dashes or underscores as naming convention in refinery? I see both
[14:34:19] haha
[14:34:22] for what?
[14:34:28] all the directories and file names
[14:34:41] pagecounts-all-sites and mobile_apps for example
[14:35:49] milimetric: i think there is not a hard rule. I think in general i prefer underscores for most things, but there are exceptions, especially when you are trying to mix word delimiters (underscores for spaces) and concept delimiters
[14:35:59] i think pagecounts-all-sites is bad
[14:36:06] pagecounts_all_sites would have been better
[14:36:07] in that case
[14:36:26] k, I'll go with underscores, I don't have a preference here. In css class names it's a different story :)
[14:37:19] moritzm: pretty minimal, but: https://wikitech.wikimedia.org/wiki/Service_restarts#Druid
[14:38:01] milimetric: like, if you were trying to write out somewhere
[14:38:10] oozie-coordinator-pagecounts_all_sites
[14:38:15] that would maybe make sense like that ^
[14:38:21] with both - and _
[14:38:32] sure
[14:38:34] *maybe* :)
[14:39:04] no it does, 'cause it separates the prefix from the name visually and makes it uglier and therefore easier to read :)
[14:39:23] haha
[14:39:39] ottomata: thanks, could you add a note on user-visibility/impact, if there user impact and if so, does someone need to be notified? and if so, who?
[14:39:47] ottomata: thanks, could you add a note on user-visibility/impact, is there user impact and if so, does someone need to be notified? and if so, who?
[14:39:56] elukey: btw, thanks for all the secret documentation work you do
[14:40:05] i've never read these service restarts for hadoop etc. before :)
[14:40:30] moritzm: ok, but i don't know a good answer to that question yet, as at the moment it's still a little experimental
[14:40:32] for now it's just analytics
[14:40:34] will add that
[14:43:36] ottomata: moritzm is the master, I only add comments once in a while :)
[14:45:08] Analytics-Kanban: Add tcy.wikipedia to the pageview whitelist - https://phabricator.wikimedia.org/T142175#2527393 (Nuria)
[14:48:22] ottomata: most of those docs have been refined on the fly, if we run into any hiccups (especially with experimental services :-), we fine-tune
[14:48:36] aye :)
[14:53:18] (PS1) Nuria: Adding tulu wikipedia to whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/303172 (https://phabricator.wikimedia.org/T142175)
[14:54:14] all right so the refinery is finally migrated to scap3
[14:54:17] \o/
[14:54:40] * joal claps for elukey !
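Since the Service_restarts#Druid section came up just above, here is a rough sketch of what a rolling restart for an openjdk update on that cluster could look like. This is an assumption-heavy illustration: the daemon names below are the standard Druid service names, and the real procedure (ordering, checks between nodes) is whatever the wikitech page says, not this.

    # On one druid node at a time, then verify before moving on to the next host
    # (service names are assumed; confirm against the wikitech page):
    for svc in druid-broker druid-coordinator druid-historical druid-middlemanager druid-overlord; do
        sudo service "$svc" restart
        sleep 30   # crude settle time; in practice, watch the daemon logs/metrics instead
    done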
[14:56:24] (CR) Nuria: "Duplicate of: https://gerrit.wikimedia.org/r/#/c/303172/" [analytics/refinery] - https://gerrit.wikimedia.org/r/303107 (https://phabricator.wikimedia.org/T140898) (owner: Dereckson)
[15:09:55] (CR) Joal: [C: -1] Adding tulu wikipedia to whitelist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/303172 (https://phabricator.wikimedia.org/T142175) (owner: Nuria)
[15:10:15] (CR) Milimetric: [C: -1] "needs to be tab separated, has spaces. I'll add a patch to replace." [analytics/refinery] - https://gerrit.wikimedia.org/r/303172 (https://phabricator.wikimedia.org/T142175) (owner: Nuria)
[15:11:38] (PS2) Milimetric: Adding tulu wikipedia to whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/303172 (https://phabricator.wikimedia.org/T142175)
[15:12:07] (CR) Milimetric: [C: 2 V: 2] "deploying tulu" [analytics/refinery] - https://gerrit.wikimedia.org/r/303172 (https://phabricator.wikimedia.org/T142175) (owner: Nuria)
[15:12:52] elukey: I'm about to deploy just the tiny change to refinery, ok with yall? (it being friday and all)
[15:13:00] (and I really wanna use your new guide :))
[15:13:18] yes yes :)
[15:13:36] you will probably get a python stacktrace
[15:13:40] but it is a problem with scap
[15:13:44] ignorable
[15:13:44] milimetric: do you mind merging another patch before?
[15:14:01] elukey: yeah we got the same bug in aqs deploy, thx
[15:14:05] joal: uh... which one?
[15:14:06] milimetric: https://gerrit.wikimedia.org/r/#/c/298131/
[15:14:19] milimetric: got +2ed by nuria
[15:14:27] sure, should I review it too or is it good?
[15:14:44] and if you're full sleep on review, there is that one : https://gerrit.wikimedia.org/r/#/c/303164/
[15:14:57] milimetric: druid is good (you can review if you wish of course)
[15:15:16] full speed - not full sleep ... :/
[15:15:20] joal: you mean you wanna try to deploy all three of these today?
[15:15:31] (looks ok to me, but I'll have to look carefully at the pagecounts deprecating one
[15:15:34] )
[15:15:47] milimetric: that's oozie only
[15:15:53] puppet h
[15:16:00] has already been done with andrew
[15:16:12] oh ok, I'll review and deploy all then
[15:16:18] Thanks mate :)
[15:16:41] milimetric: oozie jobs are already killed (pagecounts ones), so the deploy is just to mirror code
[15:16:53] oh I see it's just deleting everything :)
[15:17:07] milimetric: however it'll be good to restart druid loading jobs as currently running ones are using my folders as base
[15:17:16] that's an easy review, but just trying to think of what else is needed. You all deleted the files created by puppet after disabling the puppet jobs right?
[15:17:26] but left the artifacts on dumps
[15:17:27] ottomata did
[15:17:43] ok, should we change the HTML template to say that these are in fact deprecated, or was it already saying that
[15:18:03] I said I'd review the state of dumps on monday morning, and everything looks good (data accessible and html correct), I'll send the email
[15:18:13] (CR) Milimetric: [C: 2 V: 2] Remove pagecounts-[raw|all-sites] related code [analytics/refinery] - https://gerrit.wikimedia.org/r/303164 (https://phabricator.wikimedia.org/T130656) (owner: Joal)
[15:18:31] ok
[15:18:42] deploying
[15:18:55] milimetric: druid !
[15:19:20] milimetric: nuria +2ed but didn't merge
[15:19:30] joal: I submitted it already...
[15:19:38] it didn't work?
[15:19:50] no it's merged: https://gerrit.wikimedia.org/r/#/c/298131/
[15:19:50] oh sorry milimetric, didn't see notif on IRC, so thought it was not done
[15:20:01] right, weird, IRC didn't ping
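The -1 on the whitelist patch above came from the entry using spaces where the file expects tabs. A quick way to catch that before pushing is sketched below; the file path is illustrative, use whichever whitelist file in the refinery repo the patch actually touches.

    # From the refinery checkout:
    grep -n ' ' static_data/pageview/whitelist/whitelist.tsv    # should print nothing: entries are tab-separated
    grep -cP '\t' static_data/pageview/whitelist/whitelist.tsv  # every data line should contain a tab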
[15:20:40] milimetric: let me know when you'll start the deploy
[15:21:45] elukey: starting now, was just looking through the change list and pulling
[15:22:18] elukey: should I hold off?
[15:23:21] nono please go
[15:26:58] elukey: why leave the old trebuchet guide there? Would we ever use it or just nostalgia?
[15:27:27] milimetric: all good right? I was checking scap deploy-log
[15:27:43] I'll nuke trebuchet in a few days :)
[15:27:52] (the guide)
[15:28:07] elukey: everything looks good so far except git status on stat1002 freaks out
[15:28:32] (permission denied on the git fat stuff and head is detached and some stuff is locally modified)
[15:29:11] but yeah, everything just finished copying to hdfs and looks good
[15:29:16] yeah it is owned by analytics:analytics now
[15:29:18] elukey@stat1002:/srv/deployment/analytics/refinery$ sudo git status
[15:29:18] HEAD detached at 873f118
[15:29:19] nothing to commit, working directory clean
[15:29:21] just a few jars in artifacts/org/wikimedia/
[15:29:31] ...
[15:29:33] here's what I get
[15:29:49] oh what?! now it's fine :)
[15:29:52] what'd you do
[15:30:05] you get tons of perm denied probably
[15:30:10] not any more
[15:30:24] I did, and it showed me 3 jars with local modifications in artifacts/org/wikimedia/
[15:30:34] but now it just says clean and I have no permission errors
[15:30:45] weird
[15:30:51] you didn't do anything?
[15:30:55] nope
[15:31:46] heh, ok, well, joal it's all deployed and I checked tulu is now in the whitelist
[15:32:03] did you also deploy to the other hosts?
[15:32:09] analytics1027 and stat1004
[15:32:17] milimetric: great :)
[15:32:31] but yeah permissions look good
[15:32:31] milimetric: we should see alarms stop
[15:34:09] elukey: I just did scap deploy and I think it deployed to all of them? should I have done them one by one?
[15:34:15] I just followed the guide exactly
[15:34:32] nono it asks you for a "Y" after the canary
[15:34:43] oh yeah, I said Y :)
[15:35:05] I guess it would be a bit more careful to go check stat1002 before going on
[15:35:24] yeah you did, just checked scap deploy-log -v :)
[15:37:35] milimetric: shall I restart druid loading jobs?
[15:38:21] uh... yeah, the deploy looks good to me
[15:38:27] milimetric: great !
[15:39:04] ok now martial law will be restored, no more deployments on a Friday :D
[15:39:07] !log Restart oozie jobs for druid loading from production refinery instead of joal
[15:39:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[15:39:21] * joal feels like a tuesday :)
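The flaky git status / permission-denied on stat1002 above concerns the git-fat-managed jars under artifacts/. A simple sanity check on a deploy target, should it happen again, is sketched below; the paths are taken from the conversation and the exact artifact layout may vary.

    cd /srv/deployment/analytics/refinery
    sudo git status                                      # should be clean, HEAD detached at the deployed rev
    find artifacts/org/wikimedia/ -name '*.jar' -exec file {} +
    # jars that were really pulled show up as "Zip archive data";
    # un-pulled git-fat stubs show up as short ASCII text placeholders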
[15:40:43] Analytics-Kanban: Stop generating pagecounts-raw and pagecounts-all-sites - https://phabricator.wikimedia.org/T130656#2527567 (JAllemandou) Code changes done and applied. I'll check status on monday (html looks good, data is available), then send the email
[15:57:12] nuria_: got another one for ya: https://gerrit.wikimedia.org/r/#/c/303177/ :p
[15:57:35] i think you will make fun of me because it is a little contrary to what I had said a few days ago, about keeping pykafka handler around
[15:58:32] ottomata: i am glad i wrote all this down
[15:58:38] ottomata: will look after standup
[16:00:52] a-team: standddupppp
[16:16:40] Analytics-Kanban: Add tcy.wikipedia to the pageview whitelist - https://phabricator.wikimedia.org/T142175#2525903 (Nuria) Open>Resolved
[16:17:14] (Abandoned) Nuria: Count tcy.wikipedia page views [analytics/refinery] - https://gerrit.wikimedia.org/r/303107 (https://phabricator.wikimedia.org/T140898) (owner: Dereckson)
[16:17:45] Analytics-Kanban: Productionize Druid loader - https://phabricator.wikimedia.org/T138264#2527692 (Nuria) Open>Resolved
[16:24:21] ottomata: fixing the code review :)
[16:24:28] (PS13) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[16:27:14] ottomata: should we also add require => File['/usr/local/bin/hdfs-balancer'], ?
[16:27:18] not really necessary
[16:27:22] but it might be good for documentation
[16:28:45] elukey: maybe require the cron
[16:28:49] put it after the cron and require that
[16:29:00] but ja not really necessary
[16:29:04] either way :)
[16:29:20] hm, or maybe make the cron require the rotate :)
[16:29:20] just sent without it :)
[16:32:07] * milimetric off to lunch
[16:32:38] ottomata: btw tomorrow we're gonna take out those free kayaks in brooklyn if it's not too rainy, if you're interested I can text you when we go
[16:37:31] ottomata: mmm this is weird
[16:37:32] Error: /Stage[main]/Role::Analytics_cluster::Hadoop::Balancer/File[/etc/logrotate.d/hdfs_balancer]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/role/analytics_cluster/hadoop/hdfs_balancer.logrotate
[16:37:40] pcc looked good
[16:41:24] ah maybe the path is wrong
[16:42:24] ja sounds like it
[16:45:37] aahhh the filename!
[16:45:37] grrr
[16:49:41] goood now works
[16:50:05] a-team logging off! Have a good weekend!
[17:12:12] joal: i'm kicking off those same queries from yesterday, but just for days less than 5 in month 8. shouldn't be too long.
[17:12:41] dr0ptp4kt: few days is way better dr0ptp4kt :)
[17:12:58] dr0ptp4kt: A good rule of thumb when launching to check the size is the number of mappers
[17:13:09] dr0ptp4kt: more than 10k mappers is big
[17:15:19] joal: right. out of curiosity, how many machines and cores do we have exactly? also, are the disks spinning disks? must be.
[17:15:41] dr0ptp4kt: spinning disks yes, but quite a few per machine
[17:16:00] joal: i suppose if they were flash storage the circuits would burn out
[17:16:09] joal: plus cost per mb and all that i guess
[17:17:01] dr0ptp4kt: 30 machines, 12 3.6T disks each, 12 cores HT
[17:27:11] dr0ptp4kt: 64G
[17:27:13] RAM
[17:27:31] joal: nice
[17:28:18] joal: so that's a petabyte of storage?!
[17:28:19] dr0ptp4kt: This is also interesting: https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Analytics%2520cluster%2520eqiad&tab=m&vn=&hide-hf=false
[17:29:35] dr0ptp4kt: 1.26PB configured - 663.38 TB used with replication factor of 3 --> ~220TB useful
[17:29:51] joal: yeah, figured that part. still, wow!
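Spelling out the arithmetic behind the figures quoted above:

    30 nodes x 12 disks x 3.6 TB ~ 1,296 TB ~ 1.26 PB of raw HDFS capacity
    663.38 TB used / replication factor 3 ~ 221 TB of unique data (the "~220 TB useful")

The configured/used figures are the kind of numbers HDFS reports through `hdfs dfsadmin -report` or the NameNode web UI.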
[17:30:18] dr0ptp4kt: WE HAZ DATAZ ;)
[18:25:34] ottomata: let me put some queries "to run" and I will look and test the pykafka patch
[18:31:21] ok
[18:31:28] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2528391 (Nuria) @Nemo_bis : I do not think there is anything additional for us to do, requests are real, due to a probable malfunction of windows/chrome 41 but...
[19:17:41] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2528577 (Amire80) >>! In T141506#2528391, @Nuria wrote: > @Nemo_bis : I do not think there is anything additional for us to do, requests are real, due to a proba...
[19:20:38] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2528580 (Nuria) @Amire80: They are listed as "user" pageviews. That has not changed.
[19:21:10] (CR) Nuria: Create WikidataSpecialEntityDataMetrics (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301657 (https://phabricator.wikimedia.org/T141525) (owner: Addshore)
[19:21:36] (CR) Nuria: "Also, can commit message be a bit more descriptive?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301657 (https://phabricator.wikimedia.org/T141525) (owner: Addshore)
[19:22:00] (CR) Nuria: "Can commit message be a bit more descriptive?" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/301661 (https://phabricator.wikimedia.org/T141525) (owner: Addshore)
[19:47:18] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2528694 (Nuria) Per discussion with @Bblack turns out that "cache_status is completely different and wrong in ways that are difficult to even correlate to the real world". Cache status...
[20:00:34] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2528736 (Nuria) @BBlack: let us know if you have a better idea for sampling, thus far I am setting x_cache like '%cp4006%' to get (I thought) a consistent dataset without sampling request...
[20:01:13] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2528738 (Nuria) cc @ema
[20:21:00] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2528761 (Danielsberger) I'll try to answer some of these questions: With regard to having no time information: That seems fine to me. It would be nice to know the time stamp of the first...
[20:21:36] Analytics-Kanban: Add tcy.wikipedia to the pageview whitelist - https://phabricator.wikimedia.org/T142175#2528763 (Dereckson)
[20:21:56] Analytics-Kanban: Add tcy.wikipedia to the pageview whitelist - https://phabricator.wikimedia.org/T142175#2525903 (Dereckson) a:Nuria
[20:24:34] milimetric: for things to appear in https://datasets.wikimedia.org/limn-public-data/ they need to be sync-ed from 1003 to 1001?
[20:35:40] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2528827 (Nuria) >Having the uri_query in the clear seems necessary to compile the "save flag" (which was discussed on the mailing list a while ago). I leave that up to you but most reques...
[20:50:02] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2529030 (Danielsberger) Yes, you're correct (I had not thought of this). In fact, our current query (selecting only traffic served through cp4006) already limits the trace to upload conte...
[22:38:49] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2529526 (Nuria) 1 hour of data here: https://datasets.wikimedia.org/limn-public-data/caching/ Note that is almost 2M gzipped. Select: SELECT HASH(uri_host, uri_path) hashed_host_path,...
[23:32:56] Analytics, Analytics-EventLogging, Patch-For-Review, WMF-deploy-2016-08-02_(1.28.0-wmf.13): Convert EventLogging to use extension registration - https://phabricator.wikimedia.org/T87912#2529612 (Reedy) a:Tgr
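To make the caching-dataset query preview above (truncated by the bot) easier to follow, here is an illustrative HiveQL sketch of the approach discussed on T128132: restrict to upload traffic served through one cache host via x_cache, and publish a hash of host+path rather than the URLs themselves. The column list, partition values, and invocation here are assumptions for illustration, not the actual query behind the published dataset.

    hive -e "
      SELECT hash(uri_host, uri_path) AS hashed_host_path,
             response_size
      FROM   wmf.webrequest
      WHERE  webrequest_source = 'upload'
        AND  year = 2016 AND month = 8 AND day = 5 AND hour = 14
        AND  x_cache LIKE '%cp4006%'
    "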