Tiny LLM Reviews: Qwen 1.5 0.5b

Kyle Leaders 2024-04-17

riscv

llm

As a TLDR, here's Qwen 1.5 0.5b summarizing this review:

It is a rather simple model for such a tiny set of variables, and it can do pretty reasonable general writing tasks, and simple logic problems. Over the course of this series we will see how each model can do more general writing tasks, and more complex logic problems.

Not the best summary, but pretty good for the size of the model. Overall I'm impressed with what Qwen 1.5 0.5b can do. It's the smallest model from a major provider that I've found, and it can do some pretty cool things that I figured only larger models could do.

Introduction

Check out my series introduction if you want to replicate and understand this series better.

I'm doing this review to figure out how well the newest batch of small LLMs performs. My focus is running these on low-power systems like my lichee pi4a. Commercial-scale LLMs use way too many resources for what they actually achieve. So my goal here is to find a tiny LLM that I can make use of on a tiny, power-sipping single-board system.

Let's try the smallest model first. This is Qwen 1.5. It comes in several sizes, but I'll be testing the featherweight version with 0.5 billion parameters. The model file itself is a meager 438 MB, and it takes up about 2 GB of memory when run with llama-cpp without CUDA. Finding a model this tiny from a large company is super exciting. I'm really interested in how fast it is and just how much information it can actually understand and generate.

So, without further ado, let's meet Qwen.

Kyle: Please introduce yourself.

Qwen: Hello, I am an AI language model created by Alibaba Cloud. My name is Qwen. How can I help you today?

Generated in 26 seconds on lichee pi 4a

Document Summarization

The first test is one that I selfishly want to do. One of my main use cases for an LLM is to take scanned documents and summarize them. I'd also like to find a way to shove all my personal documents into some sort of RAG system and ask questions of something that can do both summarization and basic math.

To make this test realistic, I scanned some instructions from the City of Seattle on how to file taxes. I fed the JPG into ocrmypdf to create a PDF with selectable text in it, then used pdftotext to extract all the text that was recognized in the image. This is pretty close to how I plan to do it myself later.
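The scan-to-text steps above can be scripted. This is a minimal sketch rather than the exact commands used for this review: the function names and file names are placeholders, and it assumes ocrmypdf and pdftotext (from poppler-utils) are installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(image: Path, pdf: Path, text: Path) -> list[list[str]]:
    """Return the two command lines: OCR the scan, then extract its text."""
    return [
        # ocrmypdf accepts an image as input and writes a searchable PDF.
        ["ocrmypdf", str(image), str(pdf)],
        # pdftotext dumps the recognized text layer to a plain text file.
        ["pdftotext", str(pdf), str(text)],
    ]


def scan_to_text(image: Path) -> str:
    """Run both tools and return the extracted text (needs both installed)."""
    pdf = image.with_suffix(".pdf")
    text = image.with_suffix(".txt")
    for command in build_commands(image, pdf, text):
        subprocess.run(command, check=True)  # raise if either tool fails
    return text.read_text(encoding="utf-8")
```

Calling scan_to_text(Path("scan.jpg")) would leave scan.pdf and scan.txt next to the image. Depending on the scan, ocrmypdf may also need a resolution hint for image input (its --image-dpi option).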

One of my concerns with such a small model is that it won't be able to handle the number of tokens in multi-page or text-dense documents. The context window is 32,000 tokens, though, so we'll see what happens. There's a real chance it will just give up or return garbage.
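A quick sanity check before sending a long document is to estimate its token count. The sketch below uses a rule of thumb of roughly four characters per token for English text. That ratio is an assumption, not a property of Qwen's tokenizer, and the amount reserved for the reply is also a made-up number, so treat the result as a ballpark only.

```python
CONTEXT_WINDOW = 32_000  # tokens, the figure quoted above
CHARS_PER_TOKEN = 4      # ASSUMPTION: rough average for English text


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserve_for_reply: int = 1_000) -> bool:
    """True if the text plus room for the answer should fit in the window."""
    return estimate_tokens(text) + reserve_for_reply <= CONTEXT_WINDOW
```

For an exact count you would run the text through the model's own tokenizer instead.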

Now that I've covered all that, here's a link to the PDF if you want to see it. And here is what Qwen 1.5 0.5b did with all that.

Kyle: Please summarize this document: [snipped for brevity]

Qwen: ⚠️ Error

(I retried, assuming the error was a timeout.)

Qwen: This is an instruction document for quarterly and annual business licenses tax reporting that includes guidelines for gross revenue, deductions, credits, and taxes due. The documents outline the required information to be included in a report and provide guidance on how to complete and submit amended returns or corrections if necessary. It also includes information about the threshold amount for quarterly taxpayers and how to calculate the total taxable revenue for each classification.

Generated in about 37 seconds on lichee pi 4a

Well, my chat setup doesn't appear to be working as well as I hoped. It appears that the Hugging Face Chat UI assumes my model can always output text within a certain timeout. So from now on I'll use a chat client that allows longer timeouts. After a short search I found and switched to tenere.

As a comparison, let's see what a larger model does with this text so we know what we might expect. This is Mixtral 8x7B, a mixture-of-experts (MoE) model. It is one of the best quality models I can run on my desktop at this time.

Kyle: Please summarize this document: [snipped for brevity]

Mixtral: This document provides instructions for completing the Quarterly and Annual Business License Tax Reporting Form. Key points include:

- Enter whole dollar amounts only in columns A, B, and C.
- Column A (Gross Revenue Amount): report gross revenue excluding state sales tax and exempt revenues; itemize deductions in column B.
- Deductions must be itemized on the form or will be disallowed; do not include costs of doing business as deductions.
- Column C (Taxable Revenue): subtract deductions from gross revenue and enter result.
- Column D (Tax Rate) x Column C (Taxable Revenue) = Column E (Tax Due Amount).
- For quarterly filings, add all tax due amounts in column E and enter the total on line 14; no threshold amount for quarterly taxpayers.
- For annual filings, if sum of taxable revenue is below $100,000 threshold, no tax is due (enter zero on line 14); otherwise add all tax amounts from column E and enter the total on line 14.
- Enter the MATC credit amount on line 15 if applicable; it cannot exceed the tax due on line 14.

Note: Deduction limitations, itemization requirements, and non-allowable deductions apply as stated in the document.

Generated in 37 seconds on an NVIDIA 4090

Much more detailed, but honestly a bit too detailed for a summary. I'd need to refine my prompt a bit to clean it up. I actually prefer the length of Qwen's summary given this prompt, but it's clear that Mixtral will be able to handle the subtleties pretty well. Overall a good showing for Qwen, and if I do the processing in a background job, 37 seconds isn't bad.

Programming Functionality

Let's give this model some more complicated work to do. How does it do on a pretty standard coding interview question?

Kyle: Create a Linked List class in Python. The class should had a reverse function. Also write tests for that reverse function.

Qwen: Here's an example implementation of a linked list class in Python:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = Node(data)
        if self.head is None:
            self.head = new_node
        else:
            current = self.head
            while current.next is not None:
                current = current.next
            current.next = new_node

    def reverse(self):
        if self.head is None:
            return None
        else:
            current = self.head
            while current.next is not None:
                current = current.next
            current.next = self.head
            return current

Here's a test for the reverse function:

# Create a list of nodes
list = [1, 2, 3, 4, 5]

# Append nodes to the list
list.append(6)
list.append(7)
list.append(8)

# Reverse the list
new_list = list.reverse()

# Print the reversed list
print(new_list)  # Output: [5, 4, 3, 2, 1]

In this test, we create a list of nodes and append them to it. We then reverse the list using the reverse method and print the reversed list. The output should be [5, 4, 3, 2, 1].

Generated in 5 minutes 44 seconds on the lichee pi4a

At first look, the LinkedList implementation appears to be correct and functional, but let's test it. I can't use the tests that it wrote because those test Python's built-in list data structure, so in that regard it totally failed. Here's a really simple test I wrote:

import unittest

class Test(unittest.TestCase):
    def test_reverse(self):
        ll = LinkedList()
        ll.append(1)
        ll.append(2)
        ll.append(3)

        reversed = ll.reverse()

        # reverse() should hand back a LinkedList running 3 -> 2 -> 1
        assert reversed.head.data == 3
        assert reversed.head.next.data == 2
        assert reversed.head.next.next.data == 1

And if you run it, here's what you get:

E
======================================================================
ERROR: test_reverse (linked_list.Test.test_reverse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./linked_list.py", line 41, in test_reverse
    assert reversed.head.data == 3
           ^^^^^^^^^^^^^
AttributeError: 'Node' object has no attribute 'head'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

Unfortunately, things aren't looking too good for our little model here. The reverse method returns a Node instead of a LinkedList, and it never actually reverses anything: it walks to the last node, points that node's next back at the head (which creates a cycle), and returns that node.
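For comparison, here is a minimal sketch of what a working reverse could look like. This is one possible version, not something Qwen produced. Reversing in place, returning self, and the to_list helper are all choices made for illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = Node(data)
        if self.head is None:
            self.head = new_node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = new_node

    def reverse(self):
        """Reverse the list in place and return self."""
        previous = None
        current = self.head
        while current is not None:
            following = current.next  # remember the rest of the list
            current.next = previous   # flip this node's pointer
            previous = current
            current = following
        self.head = previous          # the old tail is the new head
        return self

    def to_list(self):
        """Collect the values into a Python list (helper for testing)."""
        values = []
        current = self.head
        while current is not None:
            values.append(current.data)
            current = current.next
        return values
```

The key difference from Qwen's version is that every node's next pointer gets flipped as the loop walks the list, and the method hands back the LinkedList itself so callers can keep using head.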

Let's try again with a different language. Models are often optimized for Python code, so let's see if it can handle JavaScript with its prototype-style inheritance:

Kyle: Create a Linked List class in Javascript. The class should had a reverse function. Also write tests for that reverse function.

Qwen: Here's an example implementation of a linked list class in JavaScript:

class Node {
  constructor(data) {
    this.data = data;
    this.next = null;
  }
}

class LinkedList {
  constructor() {
    this.head = null;
  }

  append(data) {
    const new_node = new Node(data);
    if (this.head === null) {
      this.head = new_node;
    } else {
      const current = this.head;
      while (current.next !== null) {
        current = current.next;
      }
      current.next = new_node;
    }
  }

  reverse() {
    if (this.head === null) {
      return null;
    } else {
      const current = this.head;
      while (current.next !== null) {
        current = current.next;
      }
      current.next = this.head;
      return current;
    }
  }
}

Here's a test for the reverse function:

// Create a list of nodes
const list = [1, 2, 3, 4, 5];

// Append nodes to the list
list.append(6) // Output: [5, 4, 3, 2, 1]
list.append(7) // Output: [4, 3, 2, 1, 6]
list.append(8) // Output: [6, 3, 2, 1, 4]

// Reverse the list
const reversed_list = list.reverse(); // Output: [5, 4, 3, 2, 1]
console.log(reversed_list); // Output: [5, 4, 3, 2, 1]

In this test, we create a list of nodes and append them to it. We then reverse the list using the reverse method and print the reversed list. The output should be [5, 4, 3, 2, 1]. Note that the reverse method is not implemented in JavaScript, but it can be implemented using a recursive approach.

Generated in 12 minutes 27 seconds on the lichee pi4a

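This code has the same problems. It also declares current with const and then reassigns it inside the loop, which throws a TypeError in JavaScript. It makes me think that a model this small needs both a short prompt and some very specific wording to ensure we don't end up with tests that exercise the standard library instead of our class.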

In my experimentation I've gotten better results if I make the prompt very specific and focused on only one task at a time. It's still a bit hit and miss as to whether the code works, so it certainly won't replace something like Code Llama or the larger, more specialized models.

Overall, the time it takes to generate code (roughly 6 to 12 minutes here) and the inaccuracy make me think this model is not well suited for programming.

General Writing

Probably one of the most common tasks for LLMs to perform is general writing. Writing essays, emails, and marketing copy are all good uses for this tool if you don't care that everything sounds impersonal and generic.

Let's have Qwen write me a business email to a potential customer. What I look for in these scenarios is the ability to clearly convey key information like my hourly rate, minimums, and project start time.

Kyle: Please write a short business email to a potential customer who is named Joseph Allix. The customer is interested in contracting my web design services. The email should explain that I am very interested in working with them and my hourly rate is $100 per hour. I charge a mininum of $500 for each project and the earliest I can start on the project is 2 weeks from now.

Qwen: Subject: Request for Web Design Services

Dear Joseph Allix,

I hope this email finds you well. I am writing to inquire about your web design services and would be interested in discussing how I can help you achieve your business goals.

As a web designer, I specialize in creating visually appealing and functional websites that meet the needs of my clients. I have a proven track record of delivering high-quality results and am confident that I can help you achieve your goals.

My hourly rate is $100 per hour, and I charge a minimum of $500 for each project. I am flexible with my rates and can work with you to find a mutually beneficial solution that meets your budget and requirements.

I am available to start working on your project as soon as possible, and I am committed to delivering a high-quality website that meets your needs. I am confident that I can help you achieve your business goals and I look forward to the opportunity to work with you.

Thank you for considering my services. I look forward to the opportunity to discuss how I can help you achieve your business goals.

Sincerely,
[Your Name]

Generated in 4 minutes 49 seconds on the lichee pi4a

Well, that's a little disappointing. The model added that I am "flexible with my rates," which isn't something I would want in this sort of email, since it is never great to immediately indicate you will give a discount. Secondly, the email didn't explain that the earliest I can start on the project is 2 weeks from now. In fact, it says I can start as soon as possible.

The tone of the email is good, and Qwen did accurately convey my hourly rates. Overall, I am disappointed, especially since it took 5 minutes to generate. I don't mind the time it takes, but if I had to wait 5 minutes I wouldn't want to regenerate this response and just hope it gets the next one correct.

Let's continue with this scenario and see if Qwen can handle a slightly more complex task. In this scenario our potential customer has replied: he loves the idea of a discount and wants me to start on the project tomorrow. Let's have Qwen respond to that email and ask it to clarify that I won't give a discount and cannot start until 2 weeks from now.

Kyle: Joseph Allix responded with this email:

That is great Kyle! I'd love to work with you. I can only afford $25 per hour but I can give equity. This website will make us both rich so I believe that is a fair price. Lets start tomorrow.

Please respond to Joseph and clarify the following things:

- I can't start until 2 weeks from now
- I don't accept equity and my hourly rate is firm

Qwen: Dear Kyle,

Thank you for reaching out to me. I appreciate your interest in working with me and I am happy to hear that you are interested in working on your website. I understand that you are looking for a flexible and cost-effective solution that meets your budget and requirements.

As a web designer, I specialize in creating visually appealing and functional websites that meet the needs of my clients. I have a proven track record of delivering high-quality results and am confident that I can help you achieve your business goals.

I am not accepting equity in my hourly rate, and I am not able to start working on your project until 2 weeks from now. I understand that you are looking for a website that is huge and will be rich, and I am willing to work with you to create a website that meets your needs. I am confident that I can help you achieve your business goals and I look forward to the opportunity to work with you.

Thank you for considering my services. I look forward to the opportunity to work with you.

Generated in 9 minutes 7 seconds on the lichee pi4a

Immediately I noticed that Qwen got a little confused about who the email should be addressed to. That's an easy fix when proofreading the email, but a little disappointing. The tone and content are pretty good, but it definitely sounds LLM generated. It looks like using bullet points might be the best prompting technique to get Qwen to take notice of important details.

Overall this appears to be a good use for this model, but you may need to get good at prompting in bullet-point form and proofreading the output. The time it takes to generate these responses is still a bit long. Having a response like this ready in 10 minutes isn't great, but considering I might be able to work on billable client tasks while it generates, I think it's adequate. The processing time can be improved a bit by not carrying any chat history, as we'll see in the next section.
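If bullet points do help, the prompt can be assembled from a list so that each requirement lands on its own line. This is only a sketch of the idea: build_prompt is a hypothetical helper, the wording of the framing sentence is an assumption, and I haven't measured whether this exact template improves Qwen's output.

```python
def build_prompt(task: str, requirements: list[str]) -> str:
    """Put the task first, then one requirement per bullet."""
    bullets = "\n".join(f"- {item}" for item in requirements)
    return f"{task}\n\nMake sure the email states the following:\n{bullets}"


prompt = build_prompt(
    "Please respond to Joseph Allix's email.",
    [
        "I can't start until 2 weeks from now",
        "I don't accept equity",
        "My hourly rate of $100 is firm",
    ],
)
print(prompt)
```

Keeping each requirement short and separate also makes it easier to check the generated email against the list afterwards.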

Education and General Knowledge

General Knowledge

Let's now switch gears to general knowledge chats. One common use for LLMs these days is as a repository of knowledge, something like a search engine. Let's see just how deep Qwen's knowledge goes on some educational topics.

Kyle: What is the capital of India?

Qwen: The capital of India is New Delhi.

(25 seconds to generate)

Kyle: How many people live there?

Qwen: As of 2021, the population of India is approximately 1.3 billion people.

(54 seconds to generate)

Kyle: How many people live in New Delhi compared to Seattle?

Qwen: New Delhi is the largest city in India and is the capital of the state of Delhi. It has a population of approximately 1.3 billion people, which is significantly more than Seattle, which has a population of approximately 1.2 million people. However, it's important to note that the population of New Delhi is not necessarily the same as the population of Seattle. Seattle has a population of approximately 1.2 million people, while New Delhi has a population of approximately 1.3 billion people.

(161 seconds to generate)

Kyle: Does Seattle get more annual rain than New Delhi?

Qwen: Yes, Seattle receives more annual rain than New Delhi. Seattle has a higher average annual precipitation than New Delhi, with an average of 1,000 mm per year compared to 500 mm per year for New Delhi. This is due to the city's location in the Pacific Ocean, which receives more rainfall due to its location in the Indian Ocean.

(223 seconds to generate)

This has some obvious problems. Qwen confuses India with New Delhi twice: once in a forgivable way, where my wording was purposely vague, and once where my wording was clear. It gives the population of New Delhi as 1.3 billion, which is incorrect. According to Wikipedia, the 2018 population was 28 million people. The 1.3 billion figure is in line with the CIA World Factbook's estimate for India as a whole, so as a number for the country it seems correct.

The rainfall comparison comes out right, but the numbers are off in detail. New Delhi gets about 774 mm on average every year, and Seattle's annual rainfall is 998 mm, so Qwen was close on Seattle and well short on New Delhi.
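To put a number on how far off those rainfall figures are, here is a small sketch that computes the signed relative error of Qwen's values against the reference values quoted above. The relative_error helper is just an illustration.

```python
def relative_error(claimed: float, actual: float) -> float:
    """Signed error of the claimed value as a fraction of the actual value."""
    return (claimed - actual) / actual


# (Qwen's figure, reference figure) in millimetres per year, from the text above
rainfall = {
    "Seattle": (1000, 998),
    "New Delhi": (500, 774),
}

for city, (claimed, actual) in rainfall.items():
    print(f"{city}: {relative_error(claimed, actual):+.1%}")
```

Seattle comes out about 0.2% high and New Delhi about 35% low.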

I wouldn't trust Qwen's numbers at this model size, but the comparisons so far seem OK.

Logical Reasoning and Math

Let's explore Qwen's ability to reason and do math.

Kyle: Eliza has 13 watermelons and Abby has 2 watermelons and 23 rocks, who has more fruit Eliza or Abby?

Qwen: Eliza has 13 watermelons, which is more than Abby's 2 watermelons. Therefore, Eliza has more fruit than Abby.

(65 seconds to generate)

Kyle: If Eliza sells all of her watermelons at $2.00 each, how much money would she make?

Qwen: If Eliza sells all of her watermelons at $2.00 each, she would make $2.00 x 13 = $26.00.

(117 seconds to generate)

Kyle: If Abby gives 7 rocks to Eliza, and 10 rocks to Mabel, who has the most rocks Abby, Eliza, or Mabel?

Qwen: If Abby gives 7 rocks to Eliza, and 10 rocks to Mabel, then Mabel has the most rocks. Therefore, Eliza, Abby, and Mabel have the same number of rocks, but Mabel has more. Therefore, the answer is 1.

(197 seconds to generate)
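For reference, the expected answers to the last two questions can be checked with a few lines of Python. The sketch assumes Eliza and Mabel start with no rocks, which the prompt never states. The give helper is hypothetical.

```python
# Starting rock counts. ASSUMPTION: Eliza and Mabel start with none,
# since the prompt only says that Abby has 23 rocks.
rocks = {"Abby": 23, "Eliza": 0, "Mabel": 0}


def give(counts: dict[str, int], giver: str, receiver: str, amount: int) -> None:
    """Move `amount` rocks from giver to receiver."""
    counts[giver] -= amount
    counts[receiver] += amount


give(rocks, "Abby", "Eliza", 7)
give(rocks, "Abby", "Mabel", 10)

most = max(rocks, key=rocks.get)  # name of whoever holds the most rocks
print(rocks, most)

watermelon_income = 13 * 2.00  # Eliza's 13 watermelons at $2.00 each
print(watermelon_income)
```

Under that assumption Abby ends up with 6 rocks, Eliza with 7, and Mabel with 10, and the watermelon sale brings in $26.00.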

This is really good! I'm actually pretty impressed by how well Qwen handles this basic reasoning test. It's very simple, but Qwen understands that a watermelon is a fruit and rocks are not. It can also do basic math correctly. It gets really confused at the end in how it words its answer to the rock redistribution problem, but it still gives the correct answer. Let's make this more complicated to see how we can trip it up.

Kyle: My friend Kipper and I want to go on vacation. Kipper wants to visit a city where most people speak English, since he doesn't speak any other language. I want to go somewhere there are beaches and kangaroos. If we travel in December where should we go and what would we expect the temperature to be?

Qwen: December is a popular time for travel, and there are many beautiful destinations that offer both beaches and kangaroos. Here are some suggestions for destinations that you might want to consider:

- Bali, Indonesia: Bali is known for its beautiful beaches, lush greenery, and stunning natural landscapes. December is a great time to visit, as the weather is mild and the crowds are smaller. You can expect temperatures to be around 20-25 degrees Celsius (68-74 degrees Fahrenheit) during this time.
- Phuket, Thailand: Phuket is a popular destination for its beautiful beaches, crystal-clear waters, and vibrant nightlife. December is a great time to visit, as the weather is mild and the crowds are smaller. You can expect temperatures to be around 20-25 degrees Celsius (68-74 degrees Fahrenheit) during this time.
- New Zealand: New Zealand is known for its beautiful beaches, stunning natural landscapes, and friendly locals. December is a great time to visit, as the weather is mild and the crowds are smaller. You can expect temperatures to be around 20-25 degrees Celsius (68-74 degrees Fahrenheit) during this time.
- Australia: Australia is known for its beautiful beaches, stunning natural landscapes, and friendly locals. December is a great time to visit, as the weather is mild and the crowds are smaller. You can expect temperatures to be around 20-25 degrees Celsius (68-74 degrees Fahrenheit) during this time.
- Costa Rica: Costa Rica is known for its beautiful beaches, stunning natural landscapes, and friendly locals. December is a great time to visit, as the weather is mild and the crowds are smaller. You can expect temperatures to be around 20-25 degrees Celsius (68-74 degrees Fahrenheit) during this time.

I hope these suggestions help you plan your trip!

Generated in 7 minutes 27 seconds on the lichee pi4a

Qwen didn't do very well on this question. It did list Australia as a suggestion, but it mixed cities and countries, and it also seems to think that many places have kangaroos. It repeats itself many times and gets the average temperatures wrong.

One thing an astute reader will notice is that the generation time increases each time we send a new request. This is because LLMs don't "remember" the text of the chat. The way LLM chat systems fake this memory is by sending the whole text of the chat to the model each time. So as the chat gets longer, the model has to parse the full chat before answering the last question. That means if you want to ask a model to do something you should provide the context first, and put your request last.
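The resend-everything behavior described above can be sketched with a toy chat client. Nothing here talks to a real model: ChatSession and fake_model are hypothetical stand-ins, and the "Kyle:" and "Qwen:" prompt layout is an assumption, since real chat clients apply the model's own chat template.

```python
class ChatSession:
    """Toy chat client: the model is stateless, so the client keeps the history."""

    def __init__(self):
        self.history = []  # list of (speaker, text) tuples

    def build_prompt(self, user_message):
        """Return the full text that would be sent for this turn."""
        turns = self.history + [("Kyle", user_message)]
        return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

    def ask(self, user_message, model):
        """Send the whole conversation so far and return the prompt size."""
        prompt = self.build_prompt(user_message)
        reply = model(prompt)  # the model sees every earlier turn again
        self.history.append(("Kyle", user_message))
        self.history.append(("Qwen", reply))
        return len(prompt)


def fake_model(prompt):
    """Hypothetical stand-in for a real model call."""
    return "A canned answer."


session = ChatSession()
prompt_sizes = [
    session.ask(question, fake_model)
    for question in (
        "What is the capital of India?",
        "How many people live there?",
        "Does Seattle get more annual rain than New Delhi?",
    )
]
print(prompt_sizes)
```

Each entry in prompt_sizes is larger than the one before it, which mirrors the growing generation times in the chats above. Starting a fresh session for an unrelated question keeps the prompt short.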

Conclusion

Qwen 1.5 0.5b is a pretty neat model for only 500 million parameters. I'm genuinely impressed by how much it can do while being about 1/6th the size of most small models out there.

So what is this ultra-light model actually good for? Personally, I think it can do pretty good document summarization, general writing tasks, and simple logic problems. I'd definitely give it a try on those tasks. Over the course of this series we'll see how each model compares, but for document summarization and email generation Qwen actually works well.