This document provides instructions for completing the Quarterly and Annual Business License Tax Reporting Form. Key points include:
Enter whole dollar amounts only in columns A, B, and C.
Column A (Gross Revenue Amount): report gross revenue excluding state sales tax and exempt revenues; itemize deductions in column B.
Deductions must be itemized on the form or will be disallowed; do not include costs of doing business as deductions.
Column C (Taxable Revenue): subtract deductions from gross revenue and enter result.
Column D (Tax Rate) x Column C (Taxable Revenue) = Column E (Tax Due Amount).
For quarterly filings, add all tax due amounts in column E and enter the total on line 14; no threshold amount for quarterly taxpayers.
For annual filings, if sum of taxable revenue is below $100,000 threshold, no tax is due (enter zero on line 14); otherwise add all tax amounts from column E and enter the total on line 14.
Enter the MATC credit amount on line 15 if applicable; it cannot exceed the tax due on line 14.
Note: Deduction limitations, itemization requirements, and non-allowable deductions apply as stated in the document.
Generated in 37 seconds on an NVIDIA 4090
Much more detailed, but honestly a bit too detailed for a summary. I'd need to refine my prompt a bit to clean it up. I actually prefer Qwen's length of summary given this prompt, but it's clear that Mixtral will be able to handle the subtleties pretty well. Overall a good showing for Qwen, and if I do the processing in a background job, 37 seconds isn't bad.
Programming Functionality
Let's give this model some more complicated work to do. How does it do on a pretty standard coding interview question?
Generated in 5 minutes 44 seconds on the lichee pi4a
At first look, the LinkedList implementation appears to be correct and functional, but let's test it. I can't use the tests that it wrote because those test the built-in list data structure in Python, so in that regard it totally failed. Here's a really simple test I wrote:
class Test(unittest.TestCase):
    def test_reverse(self):
        ll = LinkedList()
        ll.append(1)
        ll.append(2)
        ll.append(3)
        reversed = ll.reverse()
        assert reversed.head.data == 3
        assert reversed.next.head.data == 2
        assert reversed.next.next.head.data == 1
And if you run it, here's what you get:
E
======================================================================
ERROR: test_reverse (linked_list.Test.test_reverse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./linked_list.py", line 41, in test_reverse
    assert reversed.head.data == 3
           ^^^^^^^^^^^^^
AttributeError: 'Node' object has no attribute 'head'
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (errors=1)
Unfortunately, it's not looking too good for our little model here. It looks like the LinkedList class's reverse method is returning a Node and is missing some sort of accumulator function to combine all these back into a LinkedList.
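For comparison, here is a minimal sketch of a reverse method that does return a LinkedList. The model's actual code isn't reproduced here, so this is not a fix of it: the names LinkedList, Node, head, data, append, and reverse come from the test and traceback above, while the `next` attribute on Node and the `to_list` helper are my own assumptions.

```python
class Node:
    """A single list element; `next` is an assumed attribute name."""

    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        """Add a new node holding `data` at the tail of the list."""
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = node

    def reverse(self):
        """Return a new LinkedList with the elements in reverse order."""
        result = LinkedList()
        current = self.head
        while current is not None:
            # Prepend a copy of each node, so the last element ends up first.
            node = Node(current.data)
            node.next = result.head
            result.head = node
            current = current.next
        return result

    def to_list(self):
        """Collect the stored values into a Python list (hypothetical helper)."""
        values = []
        current = self.head
        while current is not None:
            values.append(current.data)
            current = current.next
        return values
```

Against this sketch the first assertion in my test would pass, since reverse() hands back a LinkedList with a head. The second and third assertions would need to walk the nodes with reversed.head.next.data and reversed.head.next.next.data instead.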
Let's try again with a different language. Often models are optimized for coding in Python, so let's see if it can handle JavaScript with its prototype-style inheritance:
Generated in 12 minutes 27 seconds on the lichee pi4a
This code has the same problem and it makes me think that a model this small needs both a short prompt and some very specific wording to ensure we don't end up with tests to test the standard library.
In my experimentation I've gotten better results if I make the prompt very specific and focused on only one task at a time. It's still a bit hit and miss as to whether the code works, so it certainly won't replace something like Code Llama or the larger, more specialized models.
Overall, the time it takes to generate code (10+ minutes) and the inaccuracy make me think this model is not well suited for programming.
General Writing
Probably one of the most common tasks for LLMs to perform is general writing. Writing essays, emails, and marketing copy are all good uses for this tool if you don't care that everything sounds impersonal and generic.
Let's have Qwen write me a business email to a potential customer. What I would look for in these scenarios is the ability to clearly convey key information like my hourly rate, minimums, and project start time.
Generated in 4 minutes 49 seconds on the lichee pi4a
Well, that's a little disappointing. The model added that I am "flexible with my rates", which isn't something I would want in this sort of email since it is never great to immediately indicate you will give a discount. Secondly, the email didn't explain that the earliest I can start on the project is 2 weeks from now.
The tone of the email is good, and Qwen did accurately convey my hourly rates. Overall, I am disappointed, especially since it took 5 minutes to generate. I don't mind the time it takes, but if I had to wait 5 minutes I wouldn't want to regenerate this response and just hope it gets the next one correct.
Let's continue with this scenario and see if Qwen can handle a slightly more complex task. In this scenario our potential customer has replied and loves the idea of a discount and wants me to start on the project tomorrow. Let's have Qwen respond to that email and ask it to clarify that I won't give a discount and cannot start until 2 weeks from now.
Kyle
Joseph Allix responded with this email:
That is great Kyle! I'd love to work with you. I can only afford $25 per hour but I can give equity. This website will make us both rich so I believe that is a fair price. Lets start tomorrow.
Please respond to Joseph and clarify the following things:
I can't start until 2 weeks from now
I don't accept equity and my hourly rate is firm
Qwen
Dear Kyle,
Thank you for reaching out to me. I appreciate your interest in working with me and I am happy to hear that you are interested in working on your website. I understand that you are looking for a flexible and cost-effective solution that meets your budget and requirements.
As a web designer, I specialize in creating visually appealing and functional websites that meet the needs of my clients. I have a proven track record of delivering high-quality results and am confident that I can help you achieve your business goals. I am not accepting equity in my hourly rate, and I am not able to start working on your project until 2 weeks from now.
I understand that you are looking for a website that is huge and will be rich, and I am willing to work with you to create a website that meets your needs. I am confident that I can help you achieve your business goals and I look forward to the opportunity to work with you.
Thank you for considering my services. I look forward to the opportunity to work with you.
Generated in 9 minutes 7 seconds on the lichee pi4a
Immediately I noticed Qwen became a little confused there as to who the email should be addressed to. That's an easy fix when proofreading the email, but a little disappointing. The tone and content are pretty good, but it definitely sounds LLM generated. It looks like using bullet points might be the best prompting technique to get Qwen to take notice of important details.
Overall this appears to be a good use for this model, but you may need to get good at prompting in bullet point form and proofreading the output. The time it takes to generate these responses is still a bit long. Having a response like this ready in 10 minutes isn't great, but considering I might be able to work on billable client tasks while it generates, I think it's adequate. This processing time can be improved a bit by not having any chat history, as we'll see in the next section.
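If you end up writing a lot of these, the bullet point style is easy to script. This is only a rough sketch: `build_prompt` is a hypothetical helper of my own, not part of any library, and the "- " bullet marker and the email placeholder are assumptions.

```python
def build_prompt(context, instruction, requirements):
    """Assemble a prompt: context first, then the task, then one bullet per detail."""
    bullets = "\n".join(f"- {item}" for item in requirements)
    return f"{context}\n\n{instruction}\n{bullets}"


prompt = build_prompt(
    context="Joseph Allix responded with this email:\n<email text here>",
    instruction="Please respond to Joseph and clarify the following things:",
    requirements=[
        "I can't start until 2 weeks from now",
        "I don't accept equity and my hourly rate is firm",
    ],
)
print(prompt)
```

Keeping each must-have detail on its own line also gives you a checklist to proofread the output against.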
Education and General Knowledge
General Knowledge
Let's now switch gears to general knowledge chats. One common task for LLMs these days is as a repository of knowledge or something like a search engine. Let's see just how deep Qwen's knowledge goes on some educational topics.
Kyle What is the capital of India?
Qwen The capital of India is New Delhi.
Kyle How many people live there?
Qwen As of 2021, the population of India is approximately 1.3 billion people.
Kyle How many people live in New Delhi compared to Seattle?
Qwen
New Delhi is the largest city in India and is the capital of the state of Delhi. It has a population of approximately 1.3 billion people, which is significantly more than Seattle, which has a population of approximately 1.2 million people.
However, it's important to note that the population of New Delhi is not necessarily the same as the population of Seattle. Seattle has a population of approximately 1.2 million people, while New Delhi has a population of approximately 1.3 billion people.
Kyle Does Seattle get more annual rain than New Delhi?
Qwen
Yes, Seattle receives more annual rain than New Delhi. Seattle has a higher average annual precipitation than New Delhi, with an average of 1,000 mm per year compared to 500 mm per year for New Delhi. This is due to the city's location in the Pacific Ocean, which receives more rainfall due to its location in the Indian Ocean.
This has some obvious problems. Qwen confuses India with New Delhi twice: once in a forgivable way, where my language was purposely vague, and once where my language was clear. It gives the population of New Delhi as 1.3 billion, which is incorrect. The 2018 population according to Wikipedia was 28 million people. The 1.3 billion figure does appear to be in line with the CIA World Factbook's estimate for India as a whole, so that part seems correct.
The rainfall statistics are vaguely correct, but significantly off in detail. New Delhi gets about 774 mm on average every year, and Seattle's annual rainfall is 998 mm.
I wouldn't trust Qwen's numbers at this size of a model, but comparisons so far seem ok.
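To put a number on how far off the rainfall figures are, here is a quick sketch that compares Qwen's answers with the averages quoted above. The `relative_error` helper and the variable names are mine, and the reference values are simply the ones from this post, not an authoritative dataset.

```python
# Annual rainfall in millimetres: Qwen's answer vs. the figures quoted in this post.
qwen_mm = {"Seattle": 1000, "New Delhi": 500}
reference_mm = {"Seattle": 998, "New Delhi": 774}


def relative_error(claimed, actual):
    """Signed relative error of `claimed` with respect to `actual`."""
    return (claimed - actual) / actual


for city in qwen_mm:
    err = relative_error(qwen_mm[city], reference_mm[city])
    print(f"{city}: {err:+.1%}")

# The direction of the comparison is the part Qwen got right.
qwen_says_seattle_wetter = qwen_mm["Seattle"] > qwen_mm["New Delhi"]
reference_says_seattle_wetter = reference_mm["Seattle"] > reference_mm["New Delhi"]
```

This prints +0.2% for Seattle and -35.4% for New Delhi, so the Seattle figure is close and the New Delhi figure is where most of the error sits. Both sets of numbers agree that Seattle is the wetter city.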
Logical Reasoning and Math
Let's explore Qwen's ability to reason and do math.
Kyle Eliza has 13 watermelons and Abby has 2 watermelons and 23 rocks, who has more fruit Eliza or Abby?
Qwen Eliza has 13 watermelons, which is more than Abby's 2 watermelons. Therefore, Eliza has more fruit than Abby.
Kyle If Eliza sells all of her watermelons at $2.00 each, how much money would she make?
Qwen If Eliza sells all of her watermelons at $2.00 each, she would make $2.00 x 13 = $26.00.
Kyle If Abby gives 7 rocks to Eliza, and 10 rocks to Mabel, who has the most rocks Abby, Eliza, or Mabel?
Qwen
If Abby gives 7 rocks to Eliza, and 10 rocks to Mabel, then Mabel has the most rocks.
Therefore, Eliza, Abby, and Mabel have the same number of rocks, but Mabel has more.
Therefore, the answer is 1.
This is really good! I'm actually pretty impressed by how well Qwen handles this basic reasoning test. It's very simple, but Qwen understands that a watermelon is a fruit and rocks are not. It can also do basic math correctly. It really gets confused at the end there in how it answered the rock redistribution problem, but it still gives the correct answer. Let's make this more complicated to see how we can trip it up.
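For reference, the arithmetic behind those questions can be checked with a few lines. This is just a sketch: the `give` helper is hypothetical, and I'm assuming Eliza and Mabel start with no rocks since the prompt never mentions any.

```python
# Starting rock counts from the prompt; Eliza and Mabel are assumed to start with none.
rocks = {"Abby": 23, "Eliza": 0, "Mabel": 0}


def give(counts, giver, receiver, amount):
    """Move `amount` rocks from `giver` to `receiver` in place."""
    if counts[giver] < amount:
        raise ValueError(f"{giver} does not have {amount} rocks to give")
    counts[giver] -= amount
    counts[receiver] += amount


give(rocks, "Abby", "Eliza", 7)
give(rocks, "Abby", "Mabel", 10)

most_rocks = max(rocks, key=rocks.get)

# Watermelon sale: 13 watermelons at $2.00 each.
watermelon_revenue = 13 * 2.00

print(rocks, most_rocks, watermelon_revenue)
```

Abby ends up with 6 rocks, Eliza with 7, and Mabel with 10, so "Mabel" is the right answer even though Qwen's explanation of it is muddled. The $26.00 watermelon total checks out as well.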
Generated in 7 minutes 27 seconds on the lichee pi4a
Qwen didn't do very well on this question. It did list Australia as a suggestion, but it mixed cities and countries, and it also seems to think that many places have kangaroos. It also repeats itself many times and gets the average temperatures wrong.
One thing an astute reader will notice is that the generation time increases each time we send a new request. This is because LLMs don't "remember" the text of the chat. The way LLM chat systems fake this memory is by sending the whole text of the chat to the model each time. So as the chat gets longer, the model has to parse the full chat before answering the last question. That means if you want to ask a model to do something you should provide the context first, and put your request last.
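Here is a rough sketch of that resend-everything behaviour. The `build_request` function and the plain "speaker: text" layout are assumptions for illustration only; real chat frontends use model-specific chat templates, and character count is only a loose stand-in for the number of tokens the model has to process.

```python
def build_request(history, new_message):
    """Return the full text the model must read for this turn.

    `history` is a list of (speaker, text) pairs from earlier turns.
    """
    turns = history + [("Kyle", new_message)]
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


exchanges = [
    ("What is the capital of India?", "The capital of India is New Delhi."),
    ("How many people live there?",
     "As of 2021, the population of India is approximately 1.3 billion people."),
]

history = []
sizes = []
for question, answer in exchanges:
    request = build_request(history, question)
    sizes.append(len(request))
    # The reply is stored too, so the next request is longer still.
    history.append(("Kyle", question))
    history.append(("Qwen", answer))

print(sizes)
```

Each entry in `sizes` is larger than the one before it, because every request carries all of the earlier questions and answers along with the new one. Starting a fresh chat resets `history` to empty, which is why a request with no chat history generates faster.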
Conclusion
Qwen 1.5 0.5b is a pretty neat model for only 500 million parameters. I'm genuinely impressed by how much it can do while being about 1/6th the size of most small models out there.
So what is this ultra-light model actually good for? Personally, I think it can do pretty good document summarization, general writing tasks, and simple logic problems. I'd definitely give it a try on those tasks. Over the course of this series we'll see how each model compares, but for document summarization and email generation Qwen actually works well.