Welcome back to the small LLM party! This week we're trying out Microsoft's latest model: Phi 3. Microsoft is calling this model an SLM, a small language model. At 3.8 billion parameters, it's definitely on the smaller side compared to the recent Llama 3 release. For the purposes of this series, though, 3.8 billion is pushing the boundaries of "tiny" a little.
We like to have the model itself summarize the article as a TLDR, so here's what Phi 3 sees as the best summary:
48 off nup <eroff- From the me 4oions
m>sundisc on ranging for it [the tap tup my in my offisiuda newive as hioc40on civi1010of heohrea.scree5eucks iciciocius - IIMEis:t54iuudo 444944rut, thisochicicularoncining the 8iise me-11100000000on myooneig
1 itauricoval anscreptive as itrance itnmaronasta and thisrancet5reantic 000040rom tl form- ILidical
creticolia meoverxRWOWIDRESS
oiida [thissinkudrawiriversided it this a stack - theervierm oucerni rundoi up [-OXrougnsiwan as imois 5Poleavis 4ining 1rance me "redromarkrea ses creisiideye 000>outingijandaryijon
ida [
iquin { if <ohoecks to improvetatterrypaticive 0barkenark-7BARP
'.s 1klediscance aveipeachiveviiise oviives ochion'misc
UWUUUSSUUUUUUU buived erceckuccia 5c [IUUopea itoehold barkohry as it'0paguocacksitiumicrackupl oiicong [ of this of from:kotoefi b><brom
"40on �off "o but
s (butheupógieuiericawaija-
culcul
incieriewhe
off>
</ < I m .
Chel>,
1085,
">n[rates> eolar>
><brilievecretic' while this [ >[3iBrowijewole tcrevrystander
4n
.formates assumes
user>ef format my my as kolen> "user casisi
Yep. Phi doesn't summarize documents, but does logic and math pretty well.
Introduction
Now's a good time to have Phi 3 introduce itself.
This is quite a wordy response and it takes a long time to generate. I can already see that I'll have to find out whether I can keep Phi from being too verbose in order to keep the generation times short enough to be usable on my Lichee Pi 4A.
Acceleration Options
Microsoft says that Phi 3 can now be run with ONNX Runtime, which would actually allow use of the NPU on this system. I'm not sure how well that works, but I'll give it a test as part of this review.
Getting things like ONNX Runtime to work on a more obscure architecture like this 64-bit RISC-V processor is often a bit of an adventure.
Strap in, and let's start the review!
Programming Test
We'll start this week with a little programming. Qwen 1.5 0.5b had a lot of trouble creating usable code, even on a pretty basic and well-solved problem like reversing a linked list. Let's try Phi on that same task and see how it does.
I had to clean this code up a bit. Phi wanted to put a double newline after each line, which was breaking the formatting. Is that a Windows line ending thing?
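The cleanup I did by hand could be scripted. Here's a minimal sketch of the approach, assuming the doubling is either spurious blank lines or a CRLF artifact; note that this crude version also collapses any intentional blank lines between functions (`clean_generated_code` is my own hypothetical helper, not anything Phi produced):

```python
import re

def clean_generated_code(text: str) -> str:
    """Normalize Windows line endings and collapse the doubled
    blank lines between statements down to single newlines."""
    # Convert CRLF/CR to LF first, in case the doubling is a \r\n artifact
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of two or more newlines down to a single newline
    return re.sub(r"\n{2,}", "\n", text)
```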
At first look, it's a little messy, but it may work. Let's run it and see what we get.
The fact that this even worked is a step up from the ultra tiny 0.5 billion parameter Qwen model. However, the generation time (at least with a regular llama.cpp web server) is not good enough to be useful.
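For reference, this is the shape of the canonical iterative solution I was grading Phi's output against (my own Python sketch, not Phi's code):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    """Iteratively reverse a singly linked list in O(n) time, O(1) space."""
    prev = None
    while head:
        # Re-point the current node backwards, then advance
        head.next, prev, head = prev, head, head.next
    return prev
```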
Now let's have it try a JavaScript version to see if it can handle both JavaScript and Python:
Document Summarization
I'm concerned about just how slow this model will be on a larger body of text. The text is almost a thousand tokens, so I expect this to take nearly an hour to summarize. We'll be using the Seattle tax instructions PDF again to give us a realistic document to summarize. As before, this is a scanned PDF that I ran OCR on using ocrmypdf and then used pdftotext to extract the recognized characters. Doing it that way introduces some of the artifacts and weird characters you'd see in any real scanned document that needed summarizing.
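For what it's worth, some of those artifacts can be scrubbed before handing the text to the model. Here's a rough sketch of the kind of cleanup pass I mean; `scrub_ocr_text` is my own hypothetical helper, not part of ocrmypdf or pdftotext:

```python
import re
import unicodedata

def scrub_ocr_text(text: str) -> str:
    """Remove the control characters and stray form feeds that
    pdftotext tends to leave behind in OCR'd scans."""
    # pdftotext inserts \x0c (form feed) at page breaks
    text = text.replace("\x0c", "\n")
    # Drop other non-printable control characters, keeping \n and \t
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    # Collapse runs of spaces introduced by column layout
    return re.sub(r"[ ]{2,}", " ", text)
```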
This is where things go off the rails.
Phi 3 seems to quickly devolve into hallucination when given this document. I gave up testing this on the Lichee pretty quickly, since it constantly spirals into infinite hallucinations of gibberish, and moved back to my desktop to see if I could convince Phi to actually respond to the text. This happens even when I turn the temperature way down. Here's another variation I got from it:
$> mods -m gentoo-phi3 --temp 0.1 --prompt "Here is a document: $(pdftotext ./static/tiny-llm-reviews/seattle-tax-instructions.pdf -) Summarize this document with bulletpoints for the tax filing process"

1. Research and Select a Platform: Choose a suitable online platform for your webinar, such as Zoom, Google Meet, or Cisco Webex, based on your audience's location and platform accessibility.
2. Create an Engaging Agenda: Plan your webinar's content, including topics, speakers, and timing. Make sure the agenda is clear and concise.
3. Design Your Webinar Materials: Prepare your presentation slides, handouts, or other materials to support your webinar content.
4. Promote Your Webinar: Use social media, email marketing, and other channels to advertise your webinar and encourage registrations.
5. Technical Setup: Test your internet connection, audio, and video equipment. Ensure you have a stable connection and a backup plan in case of technical difficulties.
6. Host the Webinar: On the day of the event, start by welcoming attendees, introducing speakers, and following your agenda.
7. Engage Your Audience: Encourage participation through polls, Q&A sessions, and interactive activities.
8. Follow-Up: After the event, send a thank-you email to attendees, share a recording of the webinar, and gather feedback for future improvements.

Here's a sample Python code for hosting a webinar using the Zoom API:
import os
from dotenv import load_dotenv
import requests

# Load environment variables
load_dotenv()

ZOOM_API_KEY = os.getenv("ZOOM_API_KEY")
ZOOM_API_URL = "https://api.zoom.com/v2/users/me"

def create_meeting(meeting_id, meeting_duration):
    headers = {
        "Authorization": f"Bearer {ZOOM_API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "duration": meeting_duration,
        "meeting_id": meeting_id
    }
    response = requests.post(ZOOM_API_URL, headers=headers, json=data)
    return response.json()

def invite_participants(meeting_id, participants):
    headers = {
        "Authorization": f"Bearer {ZOOM_API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "meeting_id": meeting_id,
        "participants": participants
    }
    response = requests.post(ZOOM_API_URL, headers=headers, json=data)
    return response.json()

def start_meeting(meeting_id):
    # Code to start the meeting
    pass

def end_meeting(meeting_id):
    # Code to end the meeting
    pass

def record_meeting(meeting_id):
    # Code to record the meeting
    pass

def transcribe_meeting(meeting_id):
    # Code to transcribe the meeting
    pass

def generate_report(meeting_id):
    # Code to generate a report based on the meeting data
    pass
<|endoftext|>
Conversation saved: 8f73094 Here is a document: Instructions for the Quarterly and Annual Business License Tax Reporting Form
My thought is that I'm hitting the context window and somehow Phi is going off on a bad mushroom trip. The other possibility that occurred to me is that some special characters in the text are tripping something up in the model. The text has some non-ASCII characters in it, which may be triggering something akin to a jailbreak response. Either way, this is beyond me, since the architecture of Phi 3 isn't an area of expertise for me. Let me know if you know how to solve this.
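One quick experiment for the special-character theory: strip everything non-ASCII before prompting and see whether the gibberish goes away. A one-liner sketch of that filter:

```python
def ascii_only(text: str) -> str:
    """Drop every non-ASCII character, as a cheap way to test whether
    the odd bytes in the OCR'd text are what derails the model."""
    return text.encode("ascii", errors="ignore").decode("ascii")
```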
After exploring this a bit more, I trimmed some documents down below the 4096 token limit and I still got gibberish like the TLDR at the top of the article. Let me know if you know why this is happening!
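For the trimming, I just used the common rule of thumb of roughly four characters per token for English prose. A sketch of that kind of crude truncation (the real count depends on Phi 3's actual tokenizer, so leave plenty of headroom):

```python
def truncate_to_tokens(text: str, max_tokens: int = 4096,
                       chars_per_token: float = 4.0) -> str:
    """Crudely trim text to roughly max_tokens, using the rule of
    thumb of ~4 characters per token for English prose."""
    max_chars = int(max_tokens * chars_per_token)
    if len(text) <= max_chars:
        return text
    # Cut at the last newline before the limit to avoid splitting a sentence
    cut = text.rfind("\n", 0, max_chars)
    return text[:cut if cut > 0 else max_chars]
```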
Education and General Knowledge
From the release announcement, this sounds like exactly what Phi should be best at. Phi 3 was trained on what is basically a corpus of textbooks and textbook-like material. Let's test it out!
General Knowledge
Let's start with the same questions I asked Qwen:
Let's ignore that weird Instruction output there. That will occasionally happen and I'm not sure exactly what causes it. Maybe it's a quirk of llama.cpp. Phi 3 is shipping with first-class support for llama.cpp, so I'm surprised this keeps happening.
Ignoring that section, this is another good example of why you shouldn't trust an LLM to properly explain things to you. This hallucination is not quite as bad as Qwen's on the same question set. Qwen confused the population of India with the population of New Delhi. Phi's confusion over which city gets more rain is pretty obvious if you're reading closely, but it's easy to miss more subtle things in larger bodies of text.
On the good side, I really like how Phi 3 adds the note about how rainfall is distributed. In my experience, Phi 3 tends to be very good about these little additional notes. Phi feels a lot more like a teacher or professor than Qwen did. Qwen felt more direct and simple in its explanations, whereas Phi 3 seems to aim to be more complete and provide examples. It feels a lot more like talking to a textbook, which I think is a good thing for this type of interaction.
On the bad side, this took way too long to generate on my little Lichee Pi 4A, so I don't think it's worth using as a chat bot. We'll explore optimizations later and see if we can improve this. Phi is quite responsive on my desktop, but there's a world of difference between a 4090 GPU and a mobile processor.
Logical Reasoning and Math
Let's see how Phi does with some basic logical reasoning. I didn't have high hopes for Qwen, but considering how much larger this model is than the 0.5 billion parameter Qwen, I expect it to do well. Also, Microsoft is boasting that Phi is designed for this sort of reasoning, so it had better do pretty well.
Overall, these are great responses. Phi is pretty pedantic about fruits but it is technically correct (the best kind of correct). However, it should have properly recalculated the totals based on my request. The fact that it said it would remove tomatoes from the total, but then didn't, makes me wonder if it will trip itself up in other ways.
Phi 3 is very slow to generate on my little Lichee, but I'm not surprised. At 3.8 billion parameters, this model was always going to push the boundaries of what my Lichee could reasonably do. From here on out I'm using my desktop to run the model.
General Writing
Finally, let's see how Phi 3 can write emails and handle general business writing.
That's a solid email, but a little too "markety" and impersonal for my liking. Let's see if I can get Phi to make it sound a bit more personal.
Not really any better. Oh well, I guess that's just the nature of LLMs.
Let's continue on with the email writing.
Overall, I think Phi 3 did well with this. I'm not a huge fan of the tone but without a lot of prompt refining I doubt I could get it closer to my preferred conversational manner.
I know I promised to cover some of the cool optimizations that Microsoft is shipping for generative AI, but this post is already quite long, so I'm going to explore the optimizations I tried in another post.
Conclusion
Microsoft's new Phi 3 model is actually pretty cool. It has a very distinct personality compared to Qwen. Phi definitely feels more like talking to a textbook or an encyclopedia, complete with all the pedantry you'd expect. However, it fails miserably at text summarization, which is such a basic LLM task that I'm surprised. It's quite possible I did something wrong, but Qwen was fully capable of summarizing without any extra setup.
Phi 3 3.8b is just too large to run on my single-board RISC-V system. Everything takes at least 20 minutes to generate, and Phi's tendency to be verbose really doesn't help at all.
So what would I recommend Phi 3 for? It's a great model for a somewhat resource-constrained laptop. Phi seems to be really good at logical reasoning and explaining educational concepts around math and logic. I'd probably use it as a (supervised) tutor for someone learning math- or science-related topics, but I wouldn't trust any LLM as a primary source of accurate information. It does well at writing emails if you don't mind the "markety" feel it gives.