FROM WINZIPS TO CAT GIFS, JACOB ZIV’S ALGORITHMS HAVE POWERED DECADES OF COMPRESSION

Lossless data compression seems a bit like magic. Its sibling, lossy compression, is easier to comprehend. Lossy algorithms convert music into the familiar MP3 format and transform images into standard JPEG files. They do this by selectively discarding bits, drawing on what scientists know about how we see and hear to decide which details we would be least likely to miss. But nobody can claim that the resulting file is an exact copy of the original.

Lossless compression is different. Bits do disappear, making the data file dramatically smaller and thus easier to store and transmit. The crucial difference is that the bits reappear on command. It’s as if the bits are rabbits in a magician’s act, disappearing and then reappearing from inside a hat at the wave of a wand.

The world of magic had Houdini, who pioneered tricks that are still performed today. And data compression has Jacob Ziv.

In 1977, Ziv, working with Abraham Lempel, published the equivalent of Houdini on Magic: a paper in IEEE Transactions on Information Theory titled “A Universal Algorithm for Sequential Data Compression.” The algorithm described in the paper came to be called LZ77, from the authors’ names in alphabetical order and the year. LZ77 wasn’t the first lossless compression algorithm, but it was the first that could work its magic in a single step.

Jacob Ziv

Photo: Rami Shlush

Current position: Technion Distinguished Professor Emeritus, Faculty of Electrical Engineering

Date of birth: 27 November 1931

Birthplace: Tiberias, British-ruled Palestine (now Israel)

Height: 172 centimeters

Family: Married to Shoshana; four children, nine grandchildren

Education: BSc, Dip-Eng, and MSc in electrical engineering from Technion in 1954, 1955, and 1957; Sc.D., MIT, 1962

Favorite books: Detective stories, especially those featuring Perry Mason

Favorite music: Classical, particularly Bach, and jazz

Favorite food: Falafel, ice cream

How he starts the morning: A cup of espresso with a piece of dark chocolate

Favorite movie: Casablanca (1942)

Organizational memberships: Israel Academy of Sciences and Humanities, U.S. National Academy of Engineering, U.S. National Academy of Sciences, American Philosophical Society, IEEE Fellow

Major awards: IEEE Medal of Honor “for fundamental contributions to information theory and data compression technology, and for distinguished research leadership”; BBVA Foundation Frontiers of Knowledge Award; Claude E. Shannon Award of the IEEE Information Theory Society

The following year, the two researchers published a refinement known as LZ78. That algorithm became the basis for the Unix compress program used in the early 1980s; for WinZip and Gzip, which appeared in the early 1990s; and for the GIF and TIFF image formats. Without these algorithms, we’d likely be mailing large data files on discs instead of sending them over the Internet with a click, buying music on CDs instead of streaming it, and scrolling through Facebook feeds without bouncing animated images.

Ziv went on to partner with other researchers on further refinements in compression. His body of work, spanning more than half a century, earned him the 2021 IEEE Medal of Honor “for fundamental contributions to information theory and data compression technology, and for distinguished research leadership.”

Ziv was born in 1931 to Russian immigrants in Tiberias, a city then in British-ruled Palestine and now part of Israel. Electricity and gadgets fascinated him as a boy. While taking violin lessons, for instance, he worked out a scheme to turn his music stand into a lamp. He also tried to build a Marconi transmitter from parts of a player piano. When he plugged the contraption in, the entire house went dark. He never got the transmitter to work.

When the Arab-Israeli War began in 1948, Ziv was in high school. Drafted into the Israel Defense Forces, he served briefly on the front lines until a group of mothers held organized protests demanding that the youngest soldiers be sent elsewhere. Ziv’s reassignment took him to the Israeli Air Force, where he trained as a radar technician. When the war ended, he entered Technion-Israel Institute of Technology to study electrical engineering.

After completing his master’s degree in 1955, Ziv returned to the defense world, joining Israel’s National Defense Research Laboratory (now Rafael Advanced Defense Systems) to develop electronic components for missiles and other military systems. The trouble, Ziv recalls, was that none of the engineers in the group, himself included, had more than a basic understanding of electronics; their electrical engineering training had focused on power systems.

“We had about six people, and we had to teach ourselves,” he says. “We would pick a book and then study together, like religious Jews studying the Hebrew Bible. It wasn’t enough.”

The group’s goal was to build a telemetry system using transistors instead of vacuum tubes. They needed not only knowledge but also parts. Ziv contacted Bell Telephone Laboratories and requested a sample of its transistor; the company sent 100.

“That covered our needs for a few months,” he says. “I give myself credit for being the first one in Israel to do something serious with the transistor.”

In 1959, Ziv was selected as one of a handful of researchers from Israel’s defense labs to study abroad. That program, he says, transformed the evolution of science in Israel. Its organizers didn’t steer the chosen young scientists and engineers toward particular fields. Instead, they let them pursue graduate studies in any Western country.

“To run a computer program at the time, you had to use punch cards. And I hated them. That is why I didn’t go into real computer science.”

Ziv planned to continue working in communications, but he was no longer interested in just the hardware. He had recently read Information Theory (Prentice-Hall, 1953), by Stanford Goldman, one of the earliest books on the subject, and he decided to make information theory his focus. And where better to study information theory than MIT, where Claude Shannon, the field’s pioneer, had started out?

Ziv arrived in Cambridge, Mass., in 1960. His Ph.D. research involved a method of determining how to encode and decode messages sent through a noisy channel, minimizing the probability of error while keeping the decoding simple.

“Information theory is beautiful,” he says. “It tells you what is the best that you can ever achieve, and [it] tells you how to approximate the outcome. So if you invest the computational effort, you can know you are approaching the best outcome possible.”

Ziv contrasts that certainty with the uncertainty surrounding deep-learning algorithms. It may be clear that an algorithm works, but nobody can say whether it is the best possible way of achieving the result.

While at MIT, Ziv held a part-time job at U.S. defense contractor Melpar, where he worked on error-correcting software. He found the work uninspiring. “To run a computer program at the time, you had to use punch cards,” he recalls. “And I hated them. That is why I didn’t go into real computer science.”

After two years in the United States, Ziv returned to Israel and took charge of the communications department at the Defense Research Laboratory. In 1970, along with several colleagues, he joined the faculty of Technion.

Jacob Ziv (with glasses), who became chair of Technion’s electrical engineering department in the 1970s, worked earlier on information theory with Moshe Zakai. The two collaborated on the paper describing what came to be called the Ziv-Zakai bound. Photo: Jacob Ziv/Technion

There he met Abraham Lempel, and the two discussed ways to improve lossless data compression.

At the time, the state of the art in lossless data compression was Huffman coding. This approach starts by finding sequences of bits in a data file and sorting them by the frequency with which they occur. The encoder then builds a dictionary in which the most common sequences are represented by the fewest bits. The same idea is behind Morse code: The most frequently used letter in English, e, is represented by a single dot, while rarer letters get more intricate combinations of dots and dashes.

Huffman coding, still used in the MPEG-2 format and a lossless form of JPEG, has its drawbacks. It requires two passes through a data file: one to calculate the statistical features of the file, and a second to encode the data. And storing the dictionary along with the encoded data adds to the size of the compressed file.
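
To make those two passes concrete, here is a minimal Huffman sketch in Python. It is a toy illustration, not the encoder used in MPEG-2 or JPEG: the first pass gathers symbol statistics, and a tree built from those counts assigns shorter bit strings to more frequent symbols.

```python
import heapq
from collections import Counter

def huffman_code(data: str) -> dict[str, str]:
    """Two passes: count symbol frequencies, then build the code tree.
    Frequent symbols end up near the root and get the shortest codes."""
    freq = Counter(data)  # pass 1: statistics
    # Heap entries are (frequency, tiebreaker, subtree); a subtree is
    # either a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:  # repeatedly merge the two rarest subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
        tiebreak += 1
    codes: dict[str, str] = {}
    def assign(node, prefix):  # left edge appends "0", right edge "1"
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # degenerate one-symbol input
    assign(heap[0][2], "")
    return codes

print(huffman_code("abracadabra"))  # 'a' gets the shortest code
```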

Ziv and Lempel wondered whether they could develop a lossless data-compression algorithm that would work on any kind of data, would require no preprocessing, and would achieve the best possible compression for that data, a target defined by something known as the Shannon entropy. It was unclear whether their goal was even achievable. They decided to find out.
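
For a rough sense of that target, the zeroth-order Shannon entropy of a file can be computed from symbol counts in a few lines of Python. This is a simplified illustration: the entropy of a real source also reflects dependencies between symbols, which plain counts ignore.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    """Zeroth-order Shannon entropy: a lower bound on the average number
    of bits per symbol that any symbol-by-symbol lossless code can reach."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

# 'a' dominates, so far fewer than 8 bits per byte are really needed.
print(entropy_bits_per_symbol(b"aaaaabbbc"))  # ~1.35 bits per symbol
```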

Ziv says he and Lempel were a “perfect match” to tackle the question. “I knew all about information theory and statistics, and Abraham was well equipped in Boolean algebra and computer science.”

The two came up with the idea of having the algorithm look for unique sequences of bits at the same time as it compresses the data, using pointers to refer to previously seen sequences. This approach requires only one pass through the file, so it’s faster than Huffman coding.

Ziv explains it this way: “You look at the incoming bits to find the longest stretch of bits for which there is a match in the past. Let’s say the first incoming bit is a 1. Since you have only one bit, you have never seen it in the past, so you have no choice but to transmit it as is.”

“But then you get another bit,” he adds. “Say that’s a 1 as well. So you enter into your dictionary 1-1. Say the next bit is a 0. So in your dictionary you now have 1-1 and also 1-0.”

Here’s where the pointer comes in. When the next stretch of bits is a 1-1 or a 1-0, the software doesn’t transmit those bits. Instead, it sends a pointer to the location where that sequence first appeared, along with the length of the matching sequence. The number of bits needed for that pointer is very small.

“Information theory is beautiful. It tells you what is the best that you can ever achieve, and [it] tells you how to approximate the outcome.”

“It’s basically what they used to do in publishing TV Guide,” Ziv adds. “They would run a synopsis of each program once. If the program appeared more than once, they didn’t republish the synopsis. They just said, go back to page x.”

Decoding with this method is even simpler, because the decoder doesn’t have to identify unique sequences. Instead, it finds the locations of the sequences by following the pointers, and then replaces each pointer with a copy of the relevant sequence.
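
A toy rendering of the scheme in Python might look like the sketch below. It assumes byte-oriented data, a brute-force match search, and a minimum match length of 3; none of those choices come from the original paper, and production encoders such as the one inside Gzip use hash tables and carefully packed token formats instead.

```python
def lz77_compress(data: bytes, window: int = 4096):
    """One pass: at each position, find the longest match in the already-seen
    window and emit a (distance, length) pointer, or a literal byte if no
    worthwhile match exists. A toy sketch, not a production encoder."""
    tokens = []  # ('lit', byte) or ('ptr', distance, length)
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):  # brute-force match search
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= 3:  # a pointer must pay for itself
            tokens.append(('ptr', best_dist, best_len))
            i += best_len
        else:
            tokens.append(('lit', data[i]))
            i += 1
    return tokens

def lz77_decompress(tokens) -> bytes:
    """Decoding just follows the pointers: copy `length` bytes starting
    `distance` bytes back in the output already produced."""
    out = bytearray()
    for token in tokens:
        if token[0] == 'lit':
            out.append(token[1])
        else:
            _, distance, length = token
            for _ in range(length):        # byte-by-byte copy handles
                out.append(out[-distance])  # overlapping matches
    return bytes(out)

message = b"abracadabra abracadabra"
assert lz77_decompress(lz77_compress(message)) == message
```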

The algorithm accomplished what Ziv and Lempel wanted: it demonstrated that universally optimal lossless compression with no preprocessing could be achieved.

“At the time they published their work, the fact that the algorithm was crisp and elegant and was easily implementable with low computational complexity was almost beside the point,” says Tsachy Weissman, an electrical engineering professor at Stanford University specializing in information theory. “It was more about the theoretical result.”

However, eventually, researchers realized the algorithm’s real-world implications, Weissman says. “The algorithm itself became really useful when our technologies started dealing with larger file sizes beyond 100,000 or even a million characters.”

“Their story is a story about the power of fundamental theoretical research,” Weissman says. “You can establish theoretical results about what should be achievable, and decades later humanity benefits from the implementation of algorithms based on those results.”

Ziv and Lempel next refined the technique so that it would approach the entropy for smaller data files as well. The result was LZ78. Ziv says LZ78 seems similar to LZ77 but is actually very different, because it anticipates the next bit. “Let’s say the first bit is a 1, so you enter in the dictionary two codes, 1-1 and 1-0,” Ziv explains. “You can imagine these two sequences as the first branches of a tree.”

“When the second bit comes,” Ziv states, “if it’s a 1, you send the pointer to the first code, the 1-1, and if it’s 0, you point to the other code, 1-0. And then you extend the dictionary by adding two more possibilities to the selected branch of the tree. As you do that repeatedly, sequences that appear more frequently will grow longer branches.”

“It turns out,” he says, “that not only was that the optimal [approach], but so simple that it became useful right away.”
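
That dictionary-growing process can be sketched compactly as well. The Python toy below works on characters rather than individual bits and uses a plain dict in place of an explicit tree; it illustrates the LZ78 idea rather than reproducing the paper’s exact encoding.

```python
def lz78_compress(data: str):
    """Each output token is (index of the longest phrase already in the
    dictionary, the next novel symbol). Index 0 is the empty phrase, and
    every token grows the dictionary by one entry."""
    dictionary = {"": 0}
    tokens = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the match
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # new branch of the tree
            phrase = ""
    if phrase:  # flush a trailing match
        tokens.append((dictionary[phrase], ""))
    return tokens

def lz78_decompress(tokens) -> str:
    """Rebuild the same dictionary while reading the tokens back."""
    phrases = [""]
    pieces = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)

message = "1101101100" * 3
assert lz78_decompress(lz78_compress(message)) == message
```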

 
