Jakob Nielsen's Alertbox for December 1995:

Guidelines for Multimedia on the Web

Multimedia is gaining popularity on the Web with several technologies to support use of animation, video, and audio to supplement the traditional media of text and images. These new media provide more design options but also require design discipline. Unconstrained use of multimedia results in user interfaces that confuse users and make it harder for them to understand the information. Not every webpage needs to bombard the user with the equivalent of Times Square in impressions and movement.

Notes about this month's column:
This column is longer than usual and much longer than recommended for a web page. I am doing this on request because many people have asked for advice on how to design for the new dynamic web media. Some of the links in this column point to Javatized pages and will not show anything interesting if your browser does not support the version of Java used on the pages.


Animation

Moving images have an overpowering effect on the human peripheral vision. This is a survival instinct from the time when it was of supreme importance to be aware of any saber-toothed tigers before they could sneak up on you. These days, tiger-avoidance is less of an issue, but anything that moves in your peripheral vision still dominates your awareness: it is very hard to, say, concentrate on reading text in the middle of a page if there is a spinning logo up in the corner. Never include a permanently moving animation on a web page since it will make it very hard for your users to concentrate on reading the text.

Animation is good for:

Video

Due to bandwidth constraints, use of video should currently be minimized on the web. Eventually, video will be used more widely, but for the next few years most videos will be short and will use very small viewing areas. Under these constraints, video has to serve as a supplement to text and images more often than it will provide the main content of a website.

Currently, video is good for:

A major problem with most videos on the web right now is that their production values are much too low. User studies of CD-ROM productions have found that users expect broadcast-quality production values and that users get very impatient with low-quality video.

A special consideration for video (and spoken audio) is that any narration may lead to difficulty for international users as well as for users with a hearing disability. People may be able to understand written text in a foreign language because they have time to read it at their own speed and because they can look up any unknown words in a dictionary. Spoken words are sometimes harder to understand, especially if the speaker is sloppy, has a dialect, speaks over a distracting soundtrack, or simply speaks very fast. Poor audio quality may contribute to the difficulty of understanding spoken text: it is recommended to use professional quality audio equipment and/or lavaliere microphones when recording a narrator. The classic solution to these problems is to use subtitles, but as shown in the following figure, subtitles require special attention on the web.

Figure: Three screendumps from a videotape with different kinds of subtitles
The figure shows a subtitled frame from Sun's Starfire video. The small subtitles (left image) look good on the original video tape (JPEG, 197 K) but are virtually unreadable on the smaller image size currently used for computerized videos. Using bigger subtitles that have been anti-aliased for computer viewing (middle image) improves readability significantly, but the best results are achieved by the letterbox format (right image). In this example, the subtitles in the letterbox are constructed by enlarging the video area for the movie file with a 24-pixel-high black area. Doing so does not increase the file size proportionally since the black area compresses very nicely. Even so, it would be better to transmit the subtitles as ASCII (or Unicode) and have them rendered in the letterbox on the client machine: a perfect job for an applet. It would even be possible to have the user select the language for the subtitles through a preference setting or a pop-up menu (JPEG, 206 K).
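The idea of sending subtitles as text and rendering them on the client could start from a data structure like the one below. This is only a sketch in Python, not applet code: the `Cue` class, the function names `subtitle_at` and `letterbox_size`, the language codes, and the fall-back-to-English behavior are all assumptions made for illustration. Only the 24-pixel band height is taken from the example in the figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cue:
    """One subtitle, shown from `start` up to (not including) `end`, in seconds."""
    start: float
    end: float
    text: Dict[str, str]  # hypothetical mapping: language code -> subtitle text


def subtitle_at(cues: List[Cue], t: float, language: str, fallback: str = "en") -> str:
    """Return the subtitle to draw at time t in the user's preferred language.

    Falls back to `fallback` if the cue has no text in `language`, and to an
    empty string if no cue is active (assumed behavior, not from the column).
    """
    for cue in cues:
        if cue.start <= t < cue.end:
            return cue.text.get(language, cue.text.get(fallback, ""))
    return ""


def letterbox_size(width: int, height: int, band: int = 24) -> Tuple[int, int]:
    """Size of the display area once a black band for subtitles is added below the video."""
    return width, height + band
```

Because the text travels separately from the pixels, the movie file itself stays unchanged when a language is added; the client only needs a new entry in each cue's `text` mapping.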

Audio

The main benefit of audio is that it provides a channel that is separate from that of the display. Speech can be used to offer commentary or help without obscuring information on the screen. Audio can also be used to provide a sense of place or mood as done to perfection in the game Myst. Mood-setting audio should employ very quiet background sounds in order not to compete with the main information for the user's attention.

Music is probably the most obvious use of sound. Whenever you need to inform the user about a certain work of music, it makes much more sense to simply play it than to show the notes or to try to describe it in words. For example, if you are out to sell seats to the La Scala opera in Milan, Italy, it is an obvious ploy to allow users to hear a snippet of the opera: yes, Verdi really could write a good tune (AU file, 1.4 MB), so maybe I will go and hear the opera next time I am over there. In fact, the audio clip is superior to the video clip from the same opera, which is too fidgety to impress the user and yet takes much too long to download (QuickTime, 3.6 MB).

Voice recordings can be used instead of video to provide a sense of the speaker's personality (AU file, 1.4 MB): the benefits are smaller files, easier production, and the fact that people often sound good even if they would look dull on television. Speech is also perfect for teaching users the pronunciation of words as done by the French wine site: it used to be the case that you could buy good wine cheaply by going for chateaus that were hard to pronounce (because nobody dared ask for them in shops or restaurants) -- no more in the webbed world.

Non-speech sound effects can be used as an extra dimension in the user interface to inform users about background events: for example, the arrival of new information could be signaled by the sound of a newspaper dropping on the floor and the progress of a file download could be indicated by the sound of water pouring into a glass that gradually fills up. These kinds of background sounds have to be very quiet and nonintrusive. Also, there always needs to be a user preference setting to turn them off.

Good quality sound is known to enhance the user experience substantially so it is well worth investing in professional quality sound production. The classic example is the video game study where users claimed that the graphics were better when the sound was improved, even though the exact same graphics were used for the poor-quality sound and the good-quality sound experiments. Simple examples from web user interfaces are the use of a low-key clicking sound to emphasize when users click a button and the use of opposing sounds (cheeeek chooook) when moving in different directions through a navigation space.

Response Time

Many multimedia elements are big and take a long time to download with the horribly low bandwidth available to most users. Indicate the file format and size in parentheses after the link whenever you point to a file that would take more than 15 seconds to download with the bandwidth available to most of your users. If you don't know what bandwidth your users have, you should do a survey to find out, since this information is important for many other page design issues. At this time, most home users have at most 28.8 Kbps, meaning that files larger than about 50 KB need a size warning. Business users often have higher bandwidth, but you should probably still mark files larger than about 200 KB.
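The arithmetic behind the size thresholds can be checked with a few lines of Python. This is a rough sketch under stated assumptions: it ignores latency and protocol overhead, treats 1 KB as 1000 bytes and 1 MB as 1,000,000 bytes, and the function names are invented for illustration. At 28.8 Kbps a modem moves 3600 bytes per second, so 15 seconds corresponds to 54,000 bytes, which is where the figure of roughly 50 KB comes from.

```python
def download_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Estimated transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bits_per_second


def needs_size_warning(size_bytes: int, bits_per_second: float,
                       limit_seconds: float = 15.0) -> bool:
    """True if the file would take longer than the limit to download."""
    return download_seconds(size_bytes, bits_per_second) > limit_seconds


def format_size(size_bytes: int) -> str:
    """Human-readable size using decimal units (an assumption of this sketch)."""
    if size_bytes >= 1_000_000:
        return f"{size_bytes / 1_000_000:.1f} MB"
    return f"{round(size_bytes / 1000)} KB"


def link_label(title: str, file_format: str, size_bytes: int,
               bits_per_second: float) -> str:
    """Append '(format, size)' to the link text only when a warning is needed."""
    if not needs_size_warning(size_bytes, bits_per_second):
        return title
    return f"{title} ({file_format}, {format_size(size_bytes)})"
```

For example, `link_label("Opera clip", "AU file", 1_400_000, 28_800)` produces a label ending in "(AU file, 1.4 MB)", while a 20,000-byte image gets no annotation at the same bandwidth.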

The 15-second guideline in the previous paragraph was derived from the basic set of response time values that have been known since around 1968. System response needs to happen within about 10 seconds to keep the user's attention, so users should be warned before slower operations. On the web, current users have been trained to endure so much suffering that it may be acceptable to increase the limit value to 15 seconds. If we ever want the general population to start treating the web as more than a novelty, we will have to provide response times within the acceptable ranges, though.

Design of client-side multimedia effects has to consider the other two response time limits also:


Next month: Relationships on the Web (no, not about dating.)
