I’ve started hosting a few simple Rails applications on Heroku and so far, I’m really pleased with their hosting service. This post isn’t as much about Heroku as it is how to serve an XML sitemap for your application. Heroku apps don’t give you file system access from within your application, so you’re forced to host your sitemap on an external service, like Amazon S3. There’s a great plugin called sitemap_generator that lets you generate a sitemap and upload it to your Amazon S3 account using carrierwave and Fog.
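For context, here’s a sketch of what the generation/upload config might look like in config/sitemap.rb. The hostnames below are placeholders, and the CarrierWave adapter name assumes a recent version of sitemap_generator — check the plugin’s README for the exact setup for your version:

```ruby
# config/sitemap.rb -- hypothetical hosts; adjust for your app.
SitemapGenerator::Sitemap.default_host = "http://www.example.com"

# Serve the generated files from S3/CloudFront rather than the app itself.
SitemapGenerator::Sitemap.sitemaps_host = "http://d1example.cloudfront.net"

# Upload via CarrierWave/Fog instead of writing to the local filesystem,
# which Heroku apps can't rely on.
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new

SitemapGenerator::Sitemap.create do
  add "/about"
  # add your application's dynamic URLs here
end
```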
Even though sitemap_generator will ping all of the major search engines when you build your sitemap (which you should rebuild regularly with a rake task), you will want to configure the sitemap in Google Webmaster Tools. Unfortunately, Webmaster Tools will only let you set a sitemap that comes from your own domain, not another host. What can we do to fix that?
Well, the easiest solution I came up with was to create a controller for your sitemap that simply redirects to its location on S3 (via CloudFront, obviously). So, let’s get to the code. Create a file called sitemap_controller.rb and paste this in:
class SitemapController < ApplicationController
  def index
    redirect_to SITEMAP_PATH
  end
end
This will redirect a call to the index action of this controller to the value of SITEMAP_PATH. But what is SITEMAP_PATH? Well, in my case, my application relies heavily on a custom Rails engine where all of my controllers and models are defined, so I figured it would be nice to configure the location of the sitemap on a per-application basis. So in my actual Rails application, I created an initializer and set the value of SITEMAP_PATH. Put this in sitemap.rb in config/initializers:
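Mine looks something like this — the CloudFront hostname and file name below are made up, so substitute the real URL of your uploaded sitemap:

```ruby
# config/initializers/sitemap.rb
# Hypothetical CloudFront URL pointing at the sitemap uploaded to S3.
SITEMAP_PATH = "http://d1example.cloudfront.net/sitemaps/sitemap_index.xml.gz"
```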
That's the actual location of your sitemap on S3 (again, most likely via CloudFront). Now all that's left is to wire up a Rails route to actually respond to a request for sitemap.xml. That's done easily enough with the following:
match "/sitemap.xml", :controller => "sitemap", :action => "index"
That’s it! Simply restart your app if it’s already running so the initializer loads, and you can access your sitemap.
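As for the regular rebuilds I mentioned earlier, sitemap_generator provides a rake task you can schedule (on Heroku, a cron/scheduler add-on works well for this). Something along these lines, though check the plugin’s documentation for the task names in your version:

```shell
# Regenerate the sitemap, upload it through the configured adapter,
# and ping the search engines about the update.
rake sitemap:refresh
```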
I just saw a press release here stating that Microsoft has an agreement to buy Skype for $8.5 billion. A nice chunk of change for the parties involved (Silver Lake, eBay, etc.). What I find most interesting about Microsoft agreeing to buy Skype is that it appears they want to keep pace with Google in this same space. Google offers a chat service through Google Talk (Microsoft does too through Live Messenger), but more importantly, Google Voice allows people to communicate by voice from their computer or even their cellphone. I think it makes sense for Microsoft to get into this space just to keep pace with what Google is doing out there. What Microsoft does with Skype long term (hopefully they don’t kill it or make it suck more than it does) will be interesting.
I opened up Chrome tonight and did a Google Search and noticed that the styling of the search results has changed. When I did the same in Firefox 4, I didn’t see the same changes. This leads me to believe one of a handful of things:
- This is specific to Chrome
- They’re testing the changes on a limited basis
It could be either really, but here’s a screen shot of what I saw:
I kind of like the changes. One other thing I noticed is that the styling for search results and AdWords is almost exactly the same. Nothing really jumps out at you, which I kind of like. It means everything on the page has the same “weight” when a user is looking at results. Thoughts?
I was doing some keyword research for a site the other day, and after repeated searches looking for our SERP results, I ran into this page from Google:
There are tools out there that can do this automatically, but we hadn’t finished our signup process for one of the better ones. Anyway, it looks like my constant searching (for similar terms, probably) and paginating through results looking for our placement caused Google to notice. Their message:
“Our systems have detected unusual traffic from your computer network. This page checks to see if it’s really you sending the requests, and not a robot.”
So apparently Google will get wise to the number of queries you make, the types of queries, and how you use the results. Personally, I have no problem with this. After typing in the Captcha phrase (I hate Captcha by the way), they return you to the next results page.
The “intertubes” was abuzz recently with news that Google was going to add social media to its algorithm, meaning that tweets could be of more importance in the future. But exactly how important? I’m not sure anyone really knows, but a few things I would assume out of the gate:
- Massive tweeting on your part probably won’t have much effect on any traffic sent your way on Google’s part. I honestly don’t think Google will take the text from a tweet just on face value. I believe they’ll use that in conjunction with other metrics when placing a value on the importance of a tweet.
- Your followers will probably play an important role in the effect of tweets. Just like how similar web sites linking to your site help with your ranking (based on keywords, linking, etc.), the same will probably be said for your Twitter followers. For instance, if you’re into Ford Mustangs and you promote your Ford Mustang site on Twitter, other Ford Mustang related Twitter accounts will be more valuable to you than a Twitter follower who’s all about Britney Spears. Makes sense.
- The depth of your tweets will mean the most. What I mean is, how many times does your tweet get re-tweeted? Having a tweet re-tweeted a ton of times basically means whatever you had to say really caught on and people thought it was important. More value would be placed on a tweet Google could tell the social network found important.
- A combination of all of the above. I’m not sure anyone has any solid idea on how Google is going to use Twitter data. My guess is they’ll use a combination of my assumptions above when placing a value on anything it gleans from Twitter.
What’s almost certain is Google appears to be applying more metrics to its algorithm. Whereas domain names, inbound links, domain age, etc. were of utmost importance several years ago, Google is going to look into more metrics when applying your search rankings. In my opinion, this is a good thing. At the end of the day, it puts more relevant topics first based on how people are using the information across the web. Only time will tell what the importance of these changes will be though. What does everyone else think?
Just like everyone else, I get a lot of junk mail. Some of it I’ve signed up for years ago and some just shows up. I received an email from some company called ATLANTIC-ACM this morning. No idea who these guys are, so I went to unsubscribe from their list. I use Chrome and Firefox 4 most of the time and I saw this when I clicked their unsubscribe link from Chrome:
This is basically what it says:
Unsupported Browser (Safari AppleWebKit 534.16 WinNT)
This product requires Internet Explorer 6.0 SP1 or later, or Firefox 1.5 or later
Really?!?!? In this day and age you’re forcing someone to use a specific web browser? Especially for something as simple as an email unsubscribe form. This is just ridiculous. If you’ve built something like this into your site, you should know better.
Today, I took a look at Google Labs’ Page Speed Online app to check the score of one of my sites. I was shocked to find out it was scoring really low at 59/100. Pathetic in my opinion since I consider site speed a huge priority (and so does Google, in fact). I had just done a site update earlier in the week, so I was thinking that I had broken something. I checked the Page Speed plugin for Firefox (part of Firebug), and just like I remembered, we were scoring really high at 94/100. I decided to take a look at Page Speed for Chrome to see where that plugin would score us. It wasn’t as high as Firefox, but not nearly as low as the online version, scoring us at 81/100.
So my question to Google is this: Why the difference? Aren’t they running the same rules? Which score means more to Google? Between the browsers I would assume the rules being run in Firebug instead of straight through Chrome could cause a slight difference. Also perhaps the rendering engines for the browsers could account for some difference too. If anyone knows the answer for sure and which score I should really believe, I’d love to know!
I work on web applications every day and usability is a huge issue, mostly because you’re dealing with such a diverse set of users. The same goes for any web application out there. Something about GMail has been nagging me for a while now, and I just lost my proverbial #$@! over it this afternoon. Who over at Google decided that it was a good idea to put the “Report Spam” button directly to the right of the “Archive” button? I’m guessing the same person who thought putting “Delete” next to “Report Spam” was a good idea. The issue I have here is that when I’m on cruise control working on my computer, I sometimes inadvertently click the Report Spam button when trying to archive a message. Yeah, I should probably slow down a bit and it wouldn’t happen, but it does. So my biggest question to Google is: are these buttons needed right next to each other? Honestly, there is no relationship between “Archive”, “Report Spam”, and “Delete”. They do completely different things, and if you don’t catch a mistake, you might lose messages forever. My suggestion to Google is this: put “Archive” and “Report Spam” all the way to the right of the menu bar.
My general rule of thinking here is that “Archive”, “Labels”, and “Actions” all mean you want to keep a message and move it somewhere else. “Delete” and “Report Spam” are “get this message outta here forever” types of actions. So, Google, switch this up so we don’t accidentally screw up our Inbox. Please? Pretty please? With sugar on top?
About 2 years ago, my mom started cleaning out her house of stuff that she no longer needed to keep around. It meant a lot of work tossing plain old junk we didn’t need and selling stuff that we could get some money for. That meant some posting of items on eBay and Craigslist. It was also around the time I started blogging more and I decided that I would dedicate a page to some of the items we were trying to sell on Craigslist. This also included two windsurfers that hadn’t been used in probably a decade.
We were able to sell a majority of our stuff on Craigslist, and having stuff linked to my blog from the various CL posts helped to move more of it. However, the windsurfers failed to sell. Until recently. I received an email from a gentleman who lives in NY and was interested in the Fanatic Lite Viper we were selling. I’d get anywhere from 10-25 visits a month related to that particular windsurfer, and I think this guy was one of them. Long story short, he ended up making the trip from NY to my mom’s house in RI to buy the board and all of the sails and accessories we were offering with it. Even better, he paid the price we wanted, and the board went to a home where it’d finally see some use again.
Even though it took 2 years to sell the board, it just shows that if you put some information out on the web, you can reach the people you need to reach eventually. And hey, even help move some unused stuff you have lying around too!
I had a revelation recently about the Google rankings for one of my sites. I’ve actually been frustrated with the drop in (and in some cases complete loss of) ranking for important keywords we used to rank well for. I think it’s pretty clear that Google’s index changes fairly often, but I would never expect rankings for keywords to drop completely. Hours and hours of research and I found nothing obvious about where I could have gone wrong.
For a completely random reason, I decided to see what a site: search returned for the pages indexed from my site. I noticed that our index page wasn’t the first result returned. I found that odd; for every other site I manage, the index page is the first result. So it got me thinking: since your index page is the most important page on your site, could having it not be #1 for a site: search be related to a drop in rankings for important keywords?
I started tracking our appearance in Google for our important keywords against whether our index page was #1 in the results of a site: search, and guess what? It turns out that, for us anyway, there is a direct correlation between our ranking for important keywords and the index page not being #1 in a site: search for our domain.
That got me thinking again though: why isn’t the index page number one? Well, my guess is that around the time it went missing, we had some issues with access to pages on our site. Either the site was down for maintenance for an extended period of time, or, during a period when we were changing servers, Google couldn’t access our index page, and hence we’ve taken a penalty. This definitely has me thinking of other ways to bring our dynamic site down for updates without taking the site down completely, especially the index page. It’s quite apparent that Google takes the reliability of a site very seriously these days.
Does anyone have any further insight or experience with similar behavior?