helion-prime

Blogs

A weblog by Helion-Prime Solutions about software design, experience, business, the web, simplicity, and more.

Don’t use Google reCAPTCHA

June 10th, 2011 by vasiliy.kiryanov

preamble

Google reCAPTCHA is a great idea originally developed at Carnegie Mellon University by Guatemalan computer scientist Luis von Ahn. It uses captchas to help digitize the text of books while protecting websites from bots. According to Google, it displays over 100 million captchas every day. Sites that use it include Facebook, Twitter, CNN.com, and StumbleUpon.

main drawback

The main drawback is the complexity of the captchas. They keep getting harder, sometimes to the point of being impossible to solve. Search Twitter for “recaptcha” and you’ll see a striking number of people wondering what is going on.

number to think about

The number is 14%. According to my research on two of our sites, http://prices.by and http://cartenergy.ru, we were losing about 14% of users at service sign-up while using reCAPTCHA.
The test was an A/B test in which the sign-up page showed either our own captcha or Google reCAPTCHA.
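
For readers who want to run a similar comparison, here is a minimal sketch of how the completion rate per variant could be computed. The table signup_visit and its columns are hypothetical (the post does not describe how the data was stored), and the variant labels are placeholders.

```sql
-- Hypothetical table: one row per visitor who opened the sign-up page.
--   variant   : 'own_captcha' or 'recaptcha' (placeholder labels)
--   completed : true if the visitor finished the sign-up
SELECT variant,
       count(*) AS visitors,
       sum(CASE WHEN completed THEN 1 ELSE 0 END) AS signups,
       -- share of visitors who completed sign-up, in percent
       round(100.0 * sum(CASE WHEN completed THEN 1 ELSE 0 END) / count(*), 1)
           AS completion_pct
FROM signup_visit
GROUP BY variant
ORDER BY variant;
```

Comparing completion_pct between the two rows gives the kind of difference reported above; with small visitor counts the gap should be checked for statistical significance before drawing conclusions.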

hint to google

Provide a parameter to select the difficulty level. Using native languages is another way to make solving easier while keeping security strong.


WebP – 39% more compression than JPEG

June 1st, 2011 by vasiliy.kiryanov

WebP is a lossy compression method proposed by Google. The degree of compression is adjustable so a user can choose between file size and image quality. WebP typically achieves an average of 39% more compression than JPEG without loss of image quality.

You can check the gallery that compares JPEG and WebP (the WebP images are more than 30% smaller than the JPEG ones): http://code.google.com/speed/webp/gallery.html. The only problem with this method is poor browser support. At this time it’s just Google Chrome 9+ and Opera 11.10 beta.

You can create WebP images with ImageMagick and XnConvert. You can also use the WebP command-line utility to convert images.

Find more information about WebP: http://code.google.com/speed/webp/


A great technical future ahead

May 28th, 2011 by vasiliy.kiryanov

“When every day seems the same, it is because we have stopped noticing the good things that appear in our lives.” (Paulo Coelho)

Lately I hear almost every day that technical progress is very slow and that there is no fresh news. Today I’m going to look at a few cool things that could change our lives in the near future, to prove Paulo Coelho’s quote once again.

Self-driving cars

Google’s self-driving car project is being guided by the artificial-intelligence researcher Sebastian Thrun, who as a Stanford professor in 2005 led a team of students and engineers that designed a robot car and won the second Grand Challenge of the Defense Advanced Research Projects Agency, a $2 million Pentagon prize for driving autonomously over 132 miles in the desert.

Since then, Dr. Thrun has focused more of his activities at Google, giving up tenure at Stanford and hiring a growing array of experts to help with the development project.

In frequent public statements, he has said robotic vehicles would increase energy efficiency while reducing road injuries and deaths. And he has called for sophisticated systems for car sharing that could cut the number of cars in the United States in half.

“What if I could take out my phone and say, ‘Zipcar, come here,’ and a moment later the Zipcar came around the corner?” he asked an industry conference in 2010.

In 2010 Google said it had test-driven robotic hybrid vehicles more than 140,000 miles on California roads, including Highway 1 between Los Angeles and San Francisco. More than 1,000 miles had been driven entirely autonomously.

And in 2011 Google lobbied the US state of Nevada to allow self-driving cars. Today, cars based on artificial intelligence raise questions about safety and liability, but it seems that in a few years we will be able to ask our cars to drive us home.

Computer diagnoses

At the beginning of 2011, the IBM supercomputer Watson won the Jeopardy! game show against former champions Ken Jennings and Brad Rutter.
Watson is a significant leap in a machine’s ability to understand context in human language. As IBM has said on several occasions, the goal was not to create a self-aware supercomputer that can run amok, such as HAL 9000 from 2001: A Space Odyssey or Skynet from The Terminator, but a question-and-answer machine like the ship’s computer in Star Trek: The Next Generation.


As of May 2011, Watson is the equivalent of a second-year med student. For 40 years, clinical decision support systems (CDSS) have promised to revolutionize healthcare. In fact, when the US government recently mandated electronic health record (EHR) systems in all healthcare facilities, one of the key objectives was to promote better and cheaper healthcare using CDSS based on the patient data collected from the EHRs. With the large amount of new data collected by the newly installed EHR systems, computers like Watson will be able to find optimal answers to clinical questions much more efficiently than the human mind.

IBM’s collaborator on this project is Dr. Eliot Siegel, a senior radiologist and vice chair of informatics at the University of Maryland. Siegel’s team in Maryland helped IBM identify which medical journals and textbooks were best to feed into the computer and which questions to start asking it. Watson read all of Medline, PubMed, and dozens of textbooks, and asked and answered every question on board exams. “It’s all the information you’d need to be as good as the smartest second year med student,” says Siegel.

The next, more difficult phase of the project is to load Watson up with anonymized patient records so it can marry what it knows about diagnostics with the procedures, treatments and outcomes that follow. Then doctors can query Watson and get an assist in figuring out what to do next. Siegel says, “Wouldn’t it be great to distribute sub-specialty expertise to the hinterlands where remote medical practices may lack the experience of seeing thousands of patients?”

Siegel says widespread use of Watson as a diagnosis tool is more like 8 to 10 years out.

More information about IBM Watson: http://www-03.ibm.com/innovation/us/watson/what-is-watson/index.html

Faster web browsing

As part of the “Let’s make the web faster” initiative Google is experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY, an application-layer protocol for transporting content over the web, designed specifically for minimal latency.

In lab tests, Google compared the performance of web pages over HTTP and SPDY and observed up to 64% reductions in page load times with SPDY. The nice thing is that SPDY uses TCP as the underlying transport layer, so it requires no changes to existing networking infrastructure. The only changes required to support SPDY are in the client user agent and web server applications.

As the client user agent we can use the Google Chrome browser, which already has built-in support for SPDY. Google also uses it for its own services, such as Google Search, Gmail, Chrome sync, and when serving Google’s ads. People can start using it on the server side as well, since there are already first implementations for the Apache HTTPD server and for the Ruby and Java platforms. Google hopes to engage the open source community to contribute ideas, feedback, code, and test results, to make SPDY available everywhere in just a few years.

More information about the SPDY protocol: http://www.chromium.org/spdy/spdy-whitepaper


Full Text Search with several tables in PostgreSQL

October 21st, 2010 by alex.shapovalov

Preamble

Quite often, to overcome performance degradation in full text search across several tables in PostgreSQL, people use external full text search engines. And you certainly should use them if you have a really large amount of data.

Currently the two most popular engines seem to be:

Sphinx: http://sphinxsearch.com/
license: GPL version 2. Commercial licensing (eg. for embedded use) is available upon request.
native API implementations: PHP, Perl, Ruby, and Java.
written in: C++

Lucene: http://lucene.apache.org/
license: Apache License 2.0.
native API implementations: Delphi, Perl, C#, C++, Python, Ruby and PHP.
written in: Java

Basics

As all active users of PostgreSQL know, full text search support has been integrated into the core database system since version 8.3.
If you don’t have a basic understanding of full text search in PostgreSQL, please check:
http://www.postgresql.org/docs/8.4/interactive/textsearch-intro.html

Simple full text search query:

SELECT * FROM blog_post
WHERE to_tsvector('english', text || ' ') @@ plainto_tsquery('test');

As with any query, to achieve acceptable performance on a large amount of data we need to add a database index.

CREATE INDEX blog_post_ts_idx
ON blog_post USING gin(to_tsvector('english', text || ' '));

The problem arises when we need to search across several linked tables. Experiments show that even with the necessary indexes on the tables, PostgreSQL uses only one index during the full text search, and performance degrades.
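
To make the problem concrete, here is a sketch of the kind of single query that searches two linked tables at once. It assumes the example schema used in the rest of this post (blog_post.text, "user".name, and a blog_post.user_id foreign key). Prefixing it with EXPLAIN lets you see which indexes, if any, the planner actually chooses on your data; the plan depends on the PostgreSQL version and table statistics, so no output is shown here.

```sql
-- "user" is quoted because USER is a reserved word in PostgreSQL.
EXPLAIN
SELECT blog_post.id
FROM blog_post
LEFT JOIN "user" ON "user".id = blog_post.user_id
WHERE to_tsvector('english', blog_post.text || ' ') @@ plainto_tsquery('english', 'test')
   OR to_tsvector('english', "user".name || ' ') @@ plainto_tsquery('english', 'test');
```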

Solving the issue with PostgreSQL

For example, we have 2 tables:
table: blog_post, field: text,
table: user, field: name

Queries with UNION
We need indexes on both tables:

-- "user" is quoted because USER is a reserved word in PostgreSQL
CREATE INDEX blog_post_ts_idx ON blog_post USING gin(to_tsvector('english', text || ' '));
CREATE INDEX user_ts_idx ON "user" USING gin(to_tsvector('english', name || ' '));

We execute the full text search with a composite query (UNION):

SELECT id FROM blog_post WHERE to_tsvector('english', text || ' ') @@ plainto_tsquery('test')
UNION
SELECT blog_post.id FROM blog_post
   LEFT JOIN "user" ON "user".id = blog_post.user_id
   WHERE to_tsvector('english', "user".name || ' ') @@ plainto_tsquery('test');

In that case PostgreSQL performs 2 separate queries, each searching a different field, and that solves the index usage issue.

Using a materialized view

A materialized view is a database object that contains the results of a query. Since it is a real object, we can add a database index to it. Unfortunately, at this time PostgreSQL doesn’t support that type of view.
Therefore we need to create such an object ourselves. As the materialized view we will use a regular table that consists of 2 fields:

CREATE TABLE blog_post_ts_keywords (
   blog_post_id INTEGER NOT NULL PRIMARY KEY,
   keywords TSVECTOR);

Then we can add a database index to it:

CREATE INDEX index_blog_post_ts_keywords ON blog_post_ts_keywords
   USING GIN(keywords);

Now we can execute a regular full text search:

SELECT blog_post_id FROM blog_post_ts_keywords WHERE keywords @@ plainto_tsquery('test');

The main complication here is keeping that view up to date.
We can use DB triggers: http://www.postgresql.org/docs/8.4/interactive/trigger-definition.html or simple update queries run on a schedule.
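
For the trigger option, a minimal sketch might look like the following. The function and trigger names are made up for illustration, and it assumes blog_post has id, text and user_id columns as in the examples above. It uses delete-then-insert instead of an upsert so that it does not depend on newer PostgreSQL features.

```sql
-- Hypothetical trigger function: keeps one keywords row per blog post in sync.
CREATE OR REPLACE FUNCTION blog_post_ts_keywords_sync() RETURNS trigger AS $$
BEGIN
    -- Remove the old row when a post is updated or deleted.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        DELETE FROM blog_post_ts_keywords WHERE blog_post_id = OLD.id;
    END IF;

    IF TG_OP = 'DELETE' THEN
        RETURN OLD;
    END IF;

    -- Rebuild the row from the post text and the author's name.
    -- coalesce() keeps the result non-NULL when the post has no author.
    INSERT INTO blog_post_ts_keywords (blog_post_id, keywords)
    VALUES (
        NEW.id,
        to_tsvector('english',
                    coalesce(NEW.text, '') || ' ' ||
                    coalesce((SELECT name FROM "user" WHERE id = NEW.user_id), ''))
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER blog_post_ts_keywords_sync_trg
AFTER INSERT OR UPDATE OR DELETE ON blog_post
FOR EACH ROW EXECUTE PROCEDURE blog_post_ts_keywords_sync();
```

This sketch does not react to changes of "user".name; covering that case would need a second trigger on the "user" table. The query that follows covers the scheduled alternative.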

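-- "user" is quoted because USER is a reserved word; the alias usr matches the SELECT list.
-- coalesce() keeps keywords non-NULL for posts without a matching user.
INSERT INTO blog_post_ts_keywords
SELECT blog_post.id AS blog_post_id,
      (to_tsvector('english', blog_post.text || ' ' || coalesce(usr.name, '') || ' ')) AS keywords
   FROM blog_post
   LEFT JOIN "user" AS usr ON usr.id = blog_post.user_id;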

Here you can see a simple query for filling the materialized view. The query for refreshing the data looks much the same, with an added check for changes. And your query for deleting outdated data should take deleted or inactive records into account.
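
The refresh and cleanup queries described above could be sketched as follows. This is one possible shape, not the only one: it detects changes by recomputing the tsvector and comparing it with the stored value, which is simple but rereads every post on each run. It assumes the same columns as the examples above; handling inactive records would need an extra condition on whatever column marks them, which the post does not name.

```sql
-- 1. Add rows for posts that are not in the keywords table yet.
INSERT INTO blog_post_ts_keywords (blog_post_id, keywords)
SELECT bp.id,
       to_tsvector('english', coalesce(bp.text, '') || ' ' || coalesce(usr.name, ''))
FROM blog_post AS bp
LEFT JOIN "user" AS usr ON usr.id = bp.user_id
WHERE NOT EXISTS (
    SELECT 1 FROM blog_post_ts_keywords AS k WHERE k.blog_post_id = bp.id
);

-- 2. Refresh rows whose recomputed keywords differ from the stored ones.
UPDATE blog_post_ts_keywords AS k
SET keywords = src.keywords
FROM (
    SELECT bp.id,
           to_tsvector('english', coalesce(bp.text, '') || ' ' || coalesce(usr.name, ''))
               AS keywords
    FROM blog_post AS bp
    LEFT JOIN "user" AS usr ON usr.id = bp.user_id
) AS src
WHERE k.blog_post_id = src.id
  AND k.keywords IS DISTINCT FROM src.keywords;

-- 3. Drop rows whose post no longer exists.
DELETE FROM blog_post_ts_keywords AS k
WHERE NOT EXISTS (
    SELECT 1 FROM blog_post AS bp WHERE bp.id = k.blog_post_id
);
```

Running the three statements inside one transaction keeps searches from seeing a half-refreshed table.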


Google Voice versus Skype

August 26th, 2010 by alex.shapovalov

Yes, it has finally happened: we can call regular phones with Google Voice. The Google Voice VoIP functionality is based on Gizmo5 technology [http://www.google.com/gizmo5/]. More good news: Google agreed to trial free calling booths at an airport and a pair of universities!

Now all we need is to compare prices and test quality.

Price comparison

Country                   Google Voice (¢/min)   Skype (¢/min, incl. VAT)
United States             free                   2.4
Canada                    free                   2.4
India                     6                      10.6
UK landline               2                      2.4
UK mobile                 18                     29.10
Mexico landline           10                     11.4
Mexico mobile             19                     38.6
France landline           2                      2.4
France mobile             15                     23.3
Russia landline           4                      5.5
Russia mobile             6                      8.2
Russia Moscow             2                      2.4
Russia Saint-Petersburg   2                      2.4



As you can see, Google set lower prices in all cases, even if only by a few cents. You can make your own comparisons for your local destinations:
Skype rates [http://www.skype.com/intl/en-us/prices/payg-rates/]
Google rates [https://www.google.com/voice/b/0/rates].

Quality testing

We compared sound quality in the same places with the same computer configuration. In most cases they show almost the same results.

Here we need to mention only 2 things:
Google Voice has clearer sound in most cases.
Skype provides better noise cancellation.

©2010 Helion-Prime Solutions Ltd.
Custom Software Development Agile Company.