AI and Photography: Part 3 - Midjourney vs Stable Diffusion

Written By Yan Zhang
Published by Yvette Depaepe, the 17th of May 2024

 

"Machines will be capable, within twenty years, of doing any work a man can do.” ~Herbert A. Simon (1965)

In July 2023, I attended an AI research forum where an Amazon researcher introduced several AI projects currently under way at Amazon. During the event, we had lunch together. When she learned that I was also a photographer, she bluntly said to me: "Midjourney ended photography!" Her statement represents the view of many professionals engaged in cutting-edge research on generative AI. In this article, from the perspective of both an AI scientist and a professional photographer, I try to thoroughly explore the profound impact that generative AI is having on traditional photography, and how we, as photographers, should face this challenge.

Next week: Part 4 - The Photographer's Confusion

Midjourney vs Stable Diffusion

2023 will definitely be remembered as a landmark year in the history of AI.

In early 2023, ChatGPT, a large language model (LLM) launched by OpenAI, reached 100 million users in just two months. By mid-2023, the applications of ChatGPT and its successor GPT-4 had expanded significantly, from initial question answering, document editing and creation, to a wider range of fields including finance, health care, education, and software development.

At the same time, research on diffusion-model-based image generation, represented by Midjourney, Stable Diffusion, and DALL·E 2, has also achieved major breakthroughs. The main function of these models is to generate images of various styles from prompts. Most remarkably, the Midjourney and Stable Diffusion models can generate realistic images that resemble photographs.

Images Generated by Midjourney

Generally speaking, Midjourney can use relatively simple and direct prompts to generate high-quality, photorealistic images. Here we demonstrate several images generated by the v5.0 and v6.0 versions.

“Everest Base camp”. Generated on Midjourney, by Yan Zhang.

 

“A young woman portrait”. Generated on Midjourney, by Yan Zhang.


“Mysterious forest”. Generated on Midjourney, by Yan Zhang.

 

“Dream seascape”. Generated on Midjourney, by Yan Zhang.

From the pictures above, we can see that Midjourney can produce nearly perfect "photographs". Midjourney is also good at generating non-photographic artworks, and can even generate such artworks in the styles of specific artists, as shown below.

“Picasso’s women”. Generated on Midjourney, by Yan Zhang.

The power of Midjourney in image generation has been widely recognised. However, since it is a fully closed system, Midjourney's model structure and training methods are unknown to the public, and users have to pay a fee to use it through the Discord platform.

Stable Diffusion Model Structure

Stable Diffusion is an image-generation diffusion model launched by Stability AI in July 2022. Unlike Midjourney, Stable Diffusion is a completely open system, so we can examine all technical details of this model, from its structure to its training process.

Figure 6. The main model structure of Stable Diffusion.

After we know the basic idea of the diffusion model (see Figure 4 and Figure 5), it is not difficult to understand the structure of the Stable Diffusion main model in Figure 6. The training image x is compressed into a latent vector z by the encoder, and the forward diffusion process begins. During this process, noise is gradually added to the latent vector until it is transformed into a noise latent vector zT; then the reverse diffusion process begins. At this point, the additional "text/image" condition is converted into a latent-vector representation by a transformer and injected into the reverse diffusion process. During reverse diffusion, the U-Net neural network uses a specific algorithm to gradually remove the noise, restore the latent vector z, and finally generate a new image x^ through the decoder.

It should be noted that after the model completes training, we only need the reverse diffusion process as an inference engine to generate images. At this time, the input text/image is converted into a latent vector by the transformer, and reverse diffusion through the U-Net begins, generating a new image.
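To make this inference process concrete, here is a minimal text-to-image sketch using Hugging Face's open-source diffusers library, which wraps the same components described above (text encoder, U-Net denoiser, VAE decoder). The checkpoint name and parameter values are illustrative choices, not the exact setup used for the images in this article.

import torch
from diffusers import StableDiffusionPipeline

# Load the SD v1.5 components (VAE, U-Net, text encoder) as one pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="mysterious forest, morning fog, volumetric light",
    num_inference_steps=30,  # number of reverse-diffusion (denoising) steps
    guidance_scale=7.5,      # how strongly the text condition steers denoising
).images[0]
image.save("mysterious_forest.png")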

The Stable Diffusion model in Figure 6 can also be roughly divided into three major components: the leftmost red module (VAE), the middle green module (U-Net), and the rightmost Conditioning transformer. This structural diagram will facilitate the description of the Stable Diffusion extensions discussed later.

Figure 7. The three modules of Stable Diffusion correspond to the main structure in Figure 6. The VAE (Variational AutoEncoder) compresses and restores images; the U-Net neural network carries out the reverse diffusion process, which we also call inference; the Conditioning transformer is an encoder that converts text and image conditions, attached to the reverse diffusion process.

Stability AI used 5 billion (image, text) pairs collected by LAION as the training dataset, where each image is 512×512 pixels. The computing resources used for model training were 256 Nvidia A100 GPUs on Amazon Web Services (AWS), each with 80 GB of memory; the initial model training took 150,000 GPU-hours and cost about USD $600,000.
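As a rough sanity check on these figures (assuming the whole cluster ran at near-full utilisation), the wall-clock training time works out to about three and a half weeks: 150,000 GPU-hours ÷ 256 GPUs ≈ 586 hours ≈ 24 days.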

Images Generated by Stable Diffusion

Generally speaking, given the same prompts, the quality of the pictures generated by Stable Diffusion is not as good as Midjourney's. For example, using the same prompts as the "Mysterious forest" picture generated by Midjourney above, the picture generated by SD v1.5 is as follows:

"Mysterious forests". Generated on Stable Diffusion (use the same prompts as the same titled image shown above), by Yan Zhang.

Obviously, the quality of the picture above is not as good as the one generated by Midjourney, both in terms of photographic aesthetics and image quality. However, it would be a mistake to think that Stable Diffusion is far inferior to Midjourney.

Because it is open source, Stable Diffusion provides people with unlimited possibilities for subsequent research and development in various ways. We will briefly outline the work in this area below.

Using a rich prompt structure and various extensions, Stable Diffusion can also generate realistic "photographic works" comparable to Midjourney's.



“Future city”. Generated on Stable Diffusion, by Yan Zhang.


“A young woman portrait”. Generated on Stable Diffusion, by Yan Zhang.

 

“Alaska Snow Mountain Night”. Generated on Stable Diffusion, by Yan Zhang.

Stable Diffusion Extensions

Because Stable Diffusion is open source, AI researchers can carefully study its structure and source code, and so build various extensions to the model that enhance its functions and applications.

The extension research and development around Stable Diffusion has basically focused on the U-Net part (see Figure 7). There are two main lines of work: (1) Based on the original Stable Diffusion U-Net, train a personalized U-Net sub-model on a small, task-specific dataset. When this sub-model is embedded in Stable Diffusion, it can generate images with the personalized styles users want. Dreambooth, LoRA, Hypernetworks, etc., all belong to this type of work.

(2) Enhance control over Stable Diffusion's image-generation process. Research in this area designs and trains specific neural-network control modules so that, during image generation, users can directly intervene according to their own requirements, such as changing a character's pose or replacing a face or background. ControlNet, ROOP, etc., are all control-module extensions that belong to this category (a brief code sketch of both extension styles follows below).

In addition, we can also revise the original U-Net structure of Stable Diffusion and use a specific training dataset to retrain part or all of the modified diffusion model. An underlying diffusion model trained in this way can target specific application domains, such as medicine or environmental science.
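To make the two extension styles concrete, here is a hedged sketch using the open-source diffusers library; the LoRA folder, file names and prompts are illustrative placeholders, not the author's actual models or workflow.

import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    StableDiffusionPipeline,
)
from PIL import Image

# (1) Personalization: embed a LoRA/Dreambooth-style sub-model so the base
#     U-Net generates images in a style learned from a small dataset.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./my-style-lora")  # hypothetical trained sub-model
styled = pipe("portrait of a hiker, my-style").images[0]

# (2) Control: attach a ControlNet module so an edge/pose map can steer the
#     reverse diffusion process directly.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
control_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
edge_map = Image.open("reference_edges.png")  # precomputed Canny-edge image
controlled = control_pipe("a dancer on stage", image=edge_map).images[0]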

Stable Diffusion sub-model example: the author downloaded 7 photos of Tom Hanks from the Internet, as shown in (a), then used the Dreambooth extension to train an "AI-TomHanks" sub-model on just these 7 photos. Embedding this sub-model in Stable Diffusion can generate AI versions of Tom Hanks pictures, as shown in (b).

In addition to the U-Net, we can also make further modifications and extensions to the VAE and Conditioning-transformer parts of Stable Diffusion, which we will not go into here.

 

Comparisons between Midjourney and Stable Diffusion

Based on my own experience, I have made the following comparison of the main features of the two systems.

User friendliness: From a user's perspective, I think Midjourney is easier to use than Stable Diffusion, and it is easier to generate satisfying pictures on Midjourney. If you are a Stable Diffusion user, you will find that in order to generate a high-quality image, in addition to working on prompts, you also need a suitable sub-model (also called a checkpoint), whether you are using SD v1.5 or SDXL v1.0; it is therefore relatively difficult.

Flexibility: In the process of image generation, Midjourney and Stable Diffusion provide different ideas and methods for controlling and modifying the final output image. However, I think Midjourney's method is more intuitive and practical, giving users more flexibility. Although Stable Diffusion provides more complex and richer image-editing capabilities, such as inpainting, outpainting, upscaling, etc., they are not very easy for ordinary users to apply in practice.

Functionality diversity: Because of its open-source nature and scalability, the functionality of Stable Diffusion has been continuously enhanced, which has also made it increasingly popular in application domains across business, education, medicine and scientific research. That said, purely in terms of artistic picture generation, both Midjourney and Stable Diffusion can generate stunning artistic pictures (photography, painting, cartoon, 3D, sculpture, etc.).

Image quality: Both systems can generate high-quality artistic images of all types. However, as mentioned before, Midjourney is slightly better than Stable Diffusion in terms of the aesthetics and quality of the generated images.

Extendibility/Free use: First of all, Midjourney is not free to use, and it is not open source. For users who want to use generative AI software for free and have some IT background, I strongly recommend installing Stable Diffusion on your own computer, so that you can freely create anything you are interested in.

Photographers ask me, which one should we choose, Midjourney or Stable Diffusion?

My suggestions are as follows: (a) If you are limited by technology and/or resources (for example, you don't know how to install and use Stable Diffusion, or your computer lacks sufficient GPU capacity), then just choose Midjourney. Although it requires a subscription fee, with some learning you will certainly be able to create great AI artworks, and you can also use it to enhance your photography post-processing workflow.

(b) If you are only interested in generating AI artwork and processing photos, I likewise recommend using Midjourney alone and not considering Stable Diffusion at all.

(c) If you have a certain IT background and are interested in the technical details of generating a wide range of artistic images, especially if you want to generate personalized images, then I strongly recommend Stable Diffusion, because it is currently the most comprehensive generative AI software for image generation.

“Mountain sunrise”. Generated on Midjourney, by Yan Zhang.

“Silent valley”. Generated on Stable Diffusion, by Yan Zhang.

Mini AI knowledge: AI Winter refers to the period from 1974 to 2000, when AI research and development, mainly in the United States, was at a low ebb and research funding and investment were significantly reduced. The main reason for the AI winter is that, from the mid-1960s onward, a series of large-scale AI research projects failed or failed to make substantial progress. These included the failure of machine translation and single-layer neural network research projects in the late 1960s; the failure of speech-understanding research at Carnegie Mellon University in the mid-1970s; and the stagnation of fifth-generation computer research and large-scale expert-system development during the 1980s and 1990s.

...well, if a user starts producing ''WOW'' images from tomorrow, compared to the ''MMM...NOT BAD'' ones he produced until yesterday, he has certainly started using Midjourney....:):):)....
😊👍😊
Completely agree with Miro. What does the result from Midjourney have to do with photography? OK, it grabs parts of other people's photographs to combine them. Yes, dear Yan, you can use prompts very well, but the pictures created do not look like photographs but really artificial. I was also very astonished about the title, since AI is strictly forbidden here. Sorry to be really negative about this article.
😊👍😊
Yan Zhang, Thank you for the articles and images. I don't understand the technical details of how AI creates images, but it's clear that it will be a big change for visual artistry - much like the big change around 1840 when the invention of photography threatened to replace painting, and the more recent change when digital imaging and Photoshop began to replace film and darkroom. Painting survived, and Photography will too. Perhaps the evolution of technology will push us towards making meaningful images that can't be described with words.
Midjourney and Stable Diffusion are two leading generative AI applications that offer highly advanced functionality in their creation of images. While these two generative AI image creators share a similar focus, they are significantly different in their approach to AI image generation. Their difference boils down to a preference for artistic nuance versus extensive customization: Midjourney: best for creating artistic, visually compelling images. Stable Diffusion: best for extensive customization and technical control over image generation. This means both Midjourney and Stable Diffusion are AI applications with the ability to quickly generate images from text prompts. Use of AI for image generation is strictly forbidden on 1x. After reading this article (actually all 3 parts) I am not able to revise my previous comments on it. IMHO, this article is a promotion of AI; is it appropriate to promote something which was banned on this 1x platform? Further, part of this article originates from the (licensed) work of Shigekazu Ishihara, Rueikai Ruo and Keiko Ishihara (Hiroshima International University), without mentioning their names. Dear Yan, please do not be upset about my comment(s) on this article, this is just my personal opinion on this subject. I am an engineer and also a supporter of AI, but definitely not in photography. I wish you and all 1x readers a lovely weekend.
Dear Miro, thanks for reading this article and providing your comments. As an AI researcher, I am definitely an AI supporter. However, if you read the later parts of this article, you will know my position on the relationship between AI and photography. Most importantly, whether you like it or not, AI is here, its impact on photography is increasing, and most of the photography industry is embracing AI. When Adobe officially embedded generative AI into Photoshop for the first time in 2023, it became clear that the traditional meaning of photography is getting complicated, and we may need to re-define it.
Dear Yan, I appreciate your answer to my comment very much, and I understand it very well. I know that we can't stop any development, as you said, also not in photography. I'm very sad about it; for me photography is an art, created by the photographer and his camera. Now with AI we don't need the camera anymore, we can create beautiful pictures with words only. I tried it, and it works very well. I must repeat, IMHO it is pushing real photo work, or photo artwork, to the sidelines. I'm not a very good photographer, but photography is a part of my life. I was always very proud when I saw that one of my humble photos was published or even awarded, but now I have to reconsider whether I shall continue or not, because my chances in competition with AI-touched photos are rather slim. Once more, thank you for this most educative article. Have a very nice weekend.
Series 'Melancholy' by Hadi Malijani

By Michel Romaggi in collaboration with the author Hadi Malijani
Edited and published by Yvette Depaepe, the 22nd of May 2024

 

'Melancholy I'


Dear Hadi, can you introduce yourself and tell us what photography represents in your life? Is it a job or a hobby?

I was born in 1989 in Shiraz, near the Zagros mountain range in Iran. I am a self-taught photographer. I started photography when I was 18 years old. I shoot in one of the most stressful places on earth, and this has put many limitations on me and my work. I wish that one day the whole world will be filled with love, peace and freedom. Photography is my main profession, but I am still learning. In addition to conceptual photography and creative editing, I also do wedding photography. 

 

'Melancholy'


Concerning the series 'Melancholy', can you share some secrets about your workflow?

To make this series, I first photographed and collected the details in the photos according to a list I had already made. I locked myself up in my house, all on my own, for a month to work on the details and to further develop the idea, completely focused on my work. I really couldn't go back to normal life until the collection of photos was finished. I literally lived with and in the photos. To create a work, it may take a lot of time between the idea, its development, execution and finalization.

 

'Melancholy II'

 

For each photo, I make a basic checklist of the details, such as the shape of the sky and the ground, and details such as trees, people, birds, or any detail I think should be in my photo. I take photos carefully and patiently and save them in my archives. I mostly use daylight, a Xiaomi T11 Pro mobile phone, and a Sony Alpha a7R IV with 20, 35 and 85mm lenses.


You have to be patient, bold and creative in the composition of images.

  • I create the earth, the sky and the horizon; this makes up the world I need to realize my ideas.
  • After drawing the environment and the frame, I have to start adding details, pay attention to the texture of the ground, as well as the direction of the light.
  • I separate the subjects in the photos selected from my archive from their original backgrounds, so that they combine better with the base photo and the other photo details.
  • I use blending-mode changes in Photoshop to combine them.
  • I add light and shadow, adjust the direction of the light and add areas that receive more light and directions that receive less light to the photo like a painting.  I do this in separate layers so that I can adjust the level later and change it if needed. This addition of light and shadows gives depth to the photo and has a great effect on its beauty, so I pay special attention to the light in all stages.
  • I add photo details including characters such as humans and trees with their shadows to the image in separate layers for each character. Then I make all layers into one master layer.
  • I import it into Camera Raw.
  • I always make the colours myself and do not use colour presets and effects.

I want to be the creator of the light and colour of my world.

 

'Melancholy III'


All your images seem to come straight out of a dream. What is your source of inspiration?

Photography has a great impact on my life. It has always been a way to explore my inner and outer world. My ideas erupt like a fountain in just an instant. Listening to silence helps me with these imaginings.

I always read about different schools of art and about sociology and psychology. Philosophy and the world of physics are very enjoyable for me, and I am always curious about the world around me. Of course, all of these can influence my artistic inspiration. Travelling in nature and looking deeply around me can be a good source of ideas.

Sometimes, while reading books on philosophy or psychology, or poems, etc., I come to a theory or a story, and this creates a picture.
For example, I was reading Sigmund Freud's book when I came to the theory of free association. I went to my little brother's room and took a pencil, an eraser and a pencil sharpener. At that moment, I drew everything that came to my mind on paper, and I made this picture exactly from that sketch.

And I called this photo Free Association.

 

'Free Association'

Almost all my photos have their own story. I started making this series called 'Melancholy' at the request of the famous Polish composer Zbigniew Preisner.
This collection is about his trip to the Grand Canyon, which led to the creation of the series.

Thanks a lot for the precious information you shared with us, Hadi!

Great job, congratulations!
I am one of those fans who are fascinated by the originality of his work. The phrase at the beginning of this interview that he shoots in one of the most stressful places on earth also stuck with me. It's even more so because his work takes me to another world where I don't imagine that at all.
Thank you, kind friends like you always motivate me to keep going
Big composition
Thank you, dear master Hamze
Inspiring. Great work. Bravo!
Thank you very much dear friend
Great job, Congratulation dear Hadi.
Thank you very much dear Mohammadreza jaanam
Thank you so much for the wonderful and inspiring article with great works! Congratulations!
Thank you very much dear Eiji
Your work is so original and unique Hadi. Thank you for sharing your process, it is beautiful!
Thank you for your beautiful comment my friend
I am always in awe of your artwork, I love your minimalistic approach as a way to emphasize the essential while making way for a poetic vision. It is a splendid series, an elegy in itself,Warm congratulations, dear Hadi!
Thank you for your kind comment
Stunning pictures. Fascinating to read about your creative process.
Thank you very much dear Margaret
Great job, congrats Hadi !!!
Thank you very much dear Thierry
Wonderful atmosphere, colours and mood. Congratulations,
Thank you very much dear friend
Amazing work and story... Very creative...Congrats...
Thank you very much dear Gilbert
Fantastic!
Thank you very much dear friend
Absolutely stunning works, simply love this art!
Thank you very much dear Thomas
Cool! I like.
Thank you very much dear friend
Results Contest - Self-portrait

by Yvette Depaepe
Published the 21st of May 2024

 

'Self-portrait' challenge
Don't settle for a quick selfie when creating a self-portrait. Part of establishing yourself as an artist is showing people who you are. An audience that connects with an artist is more appreciative of his or her work. Since we human beings are visually oriented, a shot of yourself is one of the most important ways you can make yourself visible to others. The participants in this challenge did a great job!

The winners with the most votes are: 
1st place : Ramiz Sahin 
2nd place : Noura aghdasi  
3rd place : Derya Doni

Congratulations to the winners and honourable mentions and thanks to all the participants in the contest 'Self-portrait' 

 


The currently running theme is 'The magic of dawn'.
The early morning offers a soft light and peaceful aesthetic. The low angle of the sun means the rays hit at a uniform angle, giving a diffuse, even light. For landscape photographers, it’s worth getting up early to capture this magical time of day. So ... Embrace the dawn and enjoy the serene and undisturbed atmosphere of early mornings.

This contest will end on Sunday the 2nd of June at midnight.
The sooner you upload your submission, the better your chance of gathering the most votes.
If you haven't uploaded your photo yet, click here

Good luck to all the participants.

 


1st place : by Ramiz Sahin

 

 


2nd place : by Noura aghdasi

 

 

3rd place : by Derya Doni

 


HONOURABLE MENTIONS

 

by Eric Drigny
 
 
 
by Alessandro Traverso
 
 
 
by Viktor Cherkasov
 
 
 
by Dieter Reichelt
 
 
 
by Susan Koehler
 
 
 
by Martin Kucera AFIAP AZSF
 
 
 
by Martin Fleckenstein
 

You can see the names of the TOP 50 here.

The contests are open to everybody except to crew members.

Submitting images already published / awarded on 1x is allowed.

AI-GENERATED IMAGES ARE NOT ALLOWED.
 
 
Part 2 - The Research and Development of Generative AI Models

Written By Yan Zhang
Published by Yvette Depaepe, the 17th of May 2024

 

"Machines will be capable, within twenty years, of doing any work a man can do.” ~Herbert A. Simon (1965)

In July 2023, I attended an AI research forum where an Amazon researcher introduced several AI projects currently under way at Amazon. During the event, we had lunch together. When she learned that I was also a photographer, she bluntly said to me: "Midjourney ended photography!" Her statement represents the view of many professionals engaged in cutting-edge research on generative AI. In this article, from the perspective of both an AI scientist and a professional photographer, I try to thoroughly explore the profound impact that generative AI is having on traditional photography, and how we, as photographers, should face this challenge.

 

“Dream seascape”. Generated on Midjourney, by Yan Zhang.

 


PART 2 - THE RESEARCH AND DEVELOPMENT OF GENERATIVE AI MODELS

Reasoning and learning are the two most important features of human intelligence, and AI research has always revolved around these two themes. After entering the 21st century, AI gradually emerged from its long winter, and machine learning research began to make new and critical breakthroughs.


Deep Learning and ImageNet

Deep learning based on neural networks is one of many machine learning methods. Many of its key concepts and technologies were proposed and developed in the 1990s.

But deep learning really began to show its superiority over other machine learning methods in the first decade of this century. In 2007, Fei-Fei Li, an assistant professor at Princeton University, began to build the ImageNet dataset based on Fellbaum's WordNet with the support of Christiane Fellbaum, who was also a professor at Princeton University. In the following years, ImageNet collected 14 million classified and annotated images, becoming the most used training dataset for computer vision research at that time.

It is worth mentioning that at that time, machine learning research was focused on models and algorithms, and the training datasets used by researchers were relatively small. Li was the first researcher to focus on establishing extremely large datasets.

In 2012, the AlexNet model, based on a deep convolutional neural network (Deep CNN), stood out in the large-scale ImageNet image-recognition competition, defeating other machine learning algorithms in image recognition by a significant margin. Since then, deep learning based on neural networks has become the mainstream of machine learning research and has continued to yield breakthrough results.


Generative Adversarial Networks (GANs)

In 2014, Canadian researcher Ian Goodfellow and his collaborators proposed a new neural network learning architecture, namely the generative adversarial network GAN, thus opening up a new research direction in generative AI. We can understand the basic principles of the GAN model from the following figure.

Figure 1. The general architecture of GAN.


Suppose we want to train an AI (GAN) model that can automatically generate human face patterns. First, we need to prepare enough real human face photos (of a specific size) as a training dataset, which is x in Figure 1. Second, we need to design two neural networks, called D and G, standing for discriminator and generator respectively. Networks D and G compete in a zero-sum game: on the one hand, D continuously receives real face pictures from the training dataset and is told that these are human faces; on the other hand, G generates a pattern, sends it to D, and lets D determine whether it is a human face pattern. Initially, G will only randomly generate irregular patterns, and after D receives the face photo information from x, it easily recognises that the pattern generated by G is not a human face.

However, since both networks continuously adjust their training parameters based on the evaluation results of each training cycle, the patterns produced by the generator gradually get closer to human faces. This training process is repeated iteratively, and the patterns generated by the generator continuously approach real face patterns, until the discriminator can no longer tell whether a pattern is a real face input from x or a pattern generated by G.
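To make the zero-sum game concrete, here is a compact, generic training step in PyTorch; this is my own illustrative sketch, with toy multilayer-perceptron networks standing in for real face-generation architectures.

import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # e.g. flattened 28x28 grayscale face crops

# Generator G maps random noise to a synthetic "pattern";
# discriminator D scores how likely a pattern is to be a real sample.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch: torch.Tensor) -> None:
    b = real_batch.size(0)
    real_labels = torch.ones(b, 1)
    fake_labels = torch.zeros(b, 1)

    # Discriminator step: learn to call real faces "real"
    # and the generator's current patterns "fake".
    fake_batch = G(torch.randn(b, latent_dim)).detach()
    loss_d = bce(D(real_batch), real_labels) + bce(D(fake_batch), fake_labels)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: adjust G so that D mistakes its output for real data;
    # repeated over many cycles, G's patterns drift toward real faces.
    fake_batch = G(torch.randn(b, latent_dim))
    loss_g = bce(D(fake_batch), real_labels)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()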


Theoretically, with such a mutually competitive training method, G can eventually generate patterns that are essentially indistinguishable from the patterns in the training dataset.

However, in practice, it is still quite difficult to use GAN methods to generate realistic, very complex patterns. The most successful application is probably generating realistic human face patterns: https://thispersondoesnotexist.com.

The main limitations of the GAN method are two: first, the instability of training, which leads to model collapse and low-quality output pictures; second, pictures are generated in discrete pixel space, which can also easily lead to distortion or low quality.

Nevertheless, the GAN method quickly became the mainstream of generative AI research after it was proposed. Researchers have since developed many different types of GAN models and key technologies related to them. Many of these results have also provided important support for research on diffusion models.

Figure 2. Generated fractal images. Generated by Fractal_GAN model developed by the author.

Figure 3. Generated mountain images. Generated by Mountain_GAN model developed by the author.


Diffusion Models - A New Approach for Generative AI

The diffusion model is a new generative AI method proposed in 2015. Its intuitive idea can be understood through the following simple physical phenomenon.

Figure 4. An intuitive explanation of the diffusion model.


Suppose we drip a drop of blue ink into a glass of water (as shown in the left picture above). Over time, the drop of ink will slowly spread, finally dyeing the entire glass of water blue (as shown in the right picture above). This process is called forward diffusion. Now let's look at this diffusion process in reverse: if we know the diffusion trajectory of a drop of ink in clear water, then by reverse derivation we can recover the position and shape of the drop of ink at the initial time. This process is called reverse diffusion.


Returning to the diffusion model: a clear image is equivalent to the initial drop of blue ink in the glass in the example above. The forward diffusion of the ink is equivalent to continuously adding noise to the image until the image is slowly filled with noise. The reverse diffusion of the ink, on the other hand, is equivalent to gradually removing noise from an image full of noise and restoring it to a clear image.

Through extensive learning and training, the diffusion model can finally learn the distribution of the noise added to an image at each step of the forward process, and can thus carry out the reverse diffusion process to remove the noise and restore the original image. The general process is shown in the figure below.

Figure 5. Two diffusion processes in diffusion models.
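For readers who want the formal version (stated here in the notation of the widely used DDPM formulation, which postdates the original 2015 proposal), each forward step adds a small amount of Gaussian noise, and a network is trained to predict that noise so it can be removed step by step:

$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big), \qquad x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$

where $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$ and $\epsilon \sim \mathcal{N}(0, I)$. The denoising network $\epsilon_\theta$ is trained to minimise $\mathbb{E}\,\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2$; learning to predict the added noise is exactly the "learning the noise distribution" described above.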

 

 

The DeepMountain v1.1.4.2 model, developed and trained by the author based on the diffusion model architecture, can generate 512×512 high-quality mountain pictures, to the extent of being photorealistic. Under the same prompts, the mountain images generated by DeepMountain v1.1.4.2 are richer than those generated by Midjourney v5.0 and SD v1.5.


Mini AI knowledge: Herbert Simon's 1965 prediction about AI did not come true. In the 50 years since then, people seemed to have given up any expectations for it. However, starting in 2016, this all began to change . . .

AlphaGo Zero: AlphaGo is a Go-playing computer program developed by DeepMind, a company located in London, England (later acquired by Google as a subsidiary). Unlike traditional AI chess-playing programs, AlphaGo's search algorithm is implemented through deep neural network training. In March 2016, AlphaGo defeated the Korean 9-dan professional player Lee Sedol with a score of 4:1; Lee remains the only human player to have won an official game against AlphaGo.


In 2017, DeepMind launched a new version of AlphaGo called AlphaGo Zero. Compared with AlphaGo, this new version made a significant leap: the program starts with no existing human knowledge about Go, that is, zero knowledge. All its subsequent knowledge about Go comes from self-learning, playing games against itself. It played 4,900,000 games against itself in three days. After 40 days of knowledge accumulation, AlphaGo Zero's Go level surpassed all older versions of the AlphaGo program.


On May 27, 2017, AlphaGo (in its Master version; AlphaGo Zero itself was only unveiled later that year) concluded its match with Ke Jie, the Chinese professional Go player then ranked number one in the world, defeating him 3:0.

At this point, AlphaGo Zero’s Go ability will never be surpassed by humans.

Thank you all for the very interesting and important discussion! True to the motto "defeat the system at its own game", I asked ChatGPT for reliable methods for photographic web portals to identify a photo as a real photo (originating from a camera, taken through a lens) and not manipulated or completely created by generative AI. Here is the response:

=== START ChatGPT Response ===
Identifying whether a photo is real (taken with a camera through a lens) or manipulated/generated by generative AI is a challenging task, but several reliable methods and technologies can help photographic web portals verify the authenticity of images. Here are some approaches:

1. Metadata Analysis
- EXIF Data: Examining the Exchangeable Image File Format (EXIF) metadata, which includes information about the camera, settings, date, and location of the photo. Authentic photos usually have consistent and plausible EXIF data.
- Tampering Signs: Checking for inconsistencies or anomalies in the EXIF data that might suggest manipulation.

2. Digital Watermarking
- Invisible Watermarks: Embedding and later detecting digital watermarks that are not visible to the naked eye but can be verified to confirm the photo's authenticity.

3. File Format Analysis
- JPEG Structures: Analyzing the JPEG compression artifacts and structure. AI-generated images might not have the same compression patterns as real photos.
- Noise Patterns: Checking for the natural noise patterns that are characteristic of specific camera sensors. Generative AI images might lack these natural patterns.

4. AI and Machine Learning Models
- Deep Learning Detection: Using deep learning models trained to differentiate between real and AI-generated images. These models can analyze textures, inconsistencies, and other subtle clues.
- Generative Adversarial Networks (GANs): Employing GAN-based detectors that are specifically trained to spot synthetic images.

5. Photo-Forensic Tools
- Forensic Software: Utilizing specialized software like Adobe's Content Authenticity Initiative or tools like FotoForensics that offer various analyses, including error level analysis (ELA), to detect manipulations.

6. Blockchain Verification
- Provenance Tracking: Using blockchain to record the history and provenance of images. Each image can be logged from the point of capture, ensuring that any subsequent manipulations are recorded and verifiable.

7. Human Review
- Expert Analysis: Involving human experts to review suspicious images. Experienced photographers and forensic analysts can often detect subtle signs of manipulation that automated tools might miss.

8. Multi-Layered Approach
- Combining Methods: Implementing a multi-layered approach that uses a combination of the above methods for more robust verification. Cross-referencing results from different techniques can significantly improve reliability.

Implementation Example
An example implementation for a photographic web portal might look like this:
1. Initial Upload Screening: Run automated checks on EXIF data and file format structures; use AI models to flag potentially synthetic images.
2. Forensic Analysis: Apply forensic software tools for deeper analysis of flagged images; check for digital watermarks.
3. Blockchain Logging: Record the image on a blockchain for provenance tracking if available.
4. Expert Review: Have human experts review images that pass initial automated checks but still appear suspicious.

By employing a combination of these methods, photographic web portals can significantly enhance their ability to discern real photographs from manipulated or AI-generated images, ensuring the integrity of the content they host.
=== END ChatGPT Response ===

As a conclusion, it is immediately apparent how complex, challenging and probably even expensive such a task would be!
As I am a curious but impatient person, I have now read Professor Zhang's entire article and therefore know where his journey of thought is going (please follow the author link & blog). I would have appreciated it if Yan had simply explained to us in advance, with reference to his technical article on the web, which aspects of the use of AI here the users and the management of 1x.com should discuss with each other objectively, in order to agree on a course of action for the near future. The use of AI-generated and manipulated images has long been going more or less unnoticed here too, although it is currently strictly prohibited by management.
Dear Udo, thanks for your response and for reading my entire article on my website. In my opinion, as I have stated in the later part of this article, traditional photography, or "pure photography", will stand, no matter how generative AI influences photography in general. But from a technical aspect, with the continuous advancement of generative AI models, very soon (if not already) we will not be able to distinguish between an AI-generated image and an image taken with a camera, unless other means are involved, such as checking the RAW file. So how can we trust a photographer's work when he or she claims that the work does not use any generative AI tools, even though such tools are now available in Photoshop? This is a big problem. Back to 1x: I have read that some awarded images were actually combined with generative AI components, but curators did not discover this during the evaluation process. At this stage, I do not know how we can resolve this problem. Until some new technology is developed, checking RAW files is the only way to find out whether an image contains parts, or the whole, generated by AI, but obviously the curators' workload would not make this method feasible.
Dear Yan, thank you very much for your comments, which I agree with without exception. Now it's getting exciting for 1x: 1) 1x remains a "pure" photo gallery, then compliance with the rules should at least be randomly checked 2) 1x opens up to the new technologies, then it should be recognizable for everyone whether it is a pure photo (only with basic editing), a composing (without exception from own photos), a completely AI-generated image or a composing with AI-generated image parts. I am very curious to see which path the management of 1x will take. For me, the AI-open path would be very exciting, as it would greatly expand the possibilities for the realization of image concepts. Greetings Udo
I understand that many people struggle with reading the technical parts about generative AI. For those not interested, this part can be skipped. However, I think that for readers who are interested in a deeper understanding of generative AI, this part provides some essential insights explaining why this technology can make such an impact. In Part 1, I read comments saying that if someone found out an image was made by AI, not a real photo, he or she would not buy that image. Of course, this is a personal choice. But I think the point here is: as Adobe has decided to integrate more and more generative AI tools into Photoshop and its other apps, no doubt more and more photos will be processed using generative AI components already embedded in Photoshop, and you have no way to find out whether a photo you are viewing is a real photo or an AI-generated (wholly or partly) image. So, how do you decide whether you should buy an image (photo) you like? In the later part of this article, there is a topic I specifically discuss: Ownership and Content Authenticity. I believe there will be a good way to resolve these issues, but from both technical and legal viewpoints, it is not an easy task to achieve that goal. In summary, if you find this part too boring or too technical to read, just skip it. I think the following parts will be quite reader-friendly :) Thanks.
Dear Yan, I don't agree with part of your response. In my opinion the readers have the right to comment on your article, with positive or negative comments or just to open the discussion, and the article author should take enough time to respond to each comment individually, and not write "if it is too boring or too technical to read, just skip it". I also do not agree with your statement regarding Photoshop. Photoshop is photo-enhancing software; everybody who uses Photoshop should use only their own photos, purchased photos, or photos used with the author's knowledge. IMO, if someone is not doing that, then he is using stolen photos without the author's approval. I know that the whole photo world is slowly moving in this direction. Now I recognize that it was a very good idea to publish these articles here; this will open the eyes of 1x fellows and make them aware of this issue. Thank you for taking your time to create it, and many thanks to Yvette for publishing it. Have a nice weekend and a blessed Whit Monday.
Part 2 is also a very interesting article, but I'm not sure whether the scientific aspects of AI generators will find the right readership here in 1x Magazine. I have already commented on the topic of “photography will die because of AI” in Part 1 and had expected the author to comment on all the contributions before his next article was published. For me personally, the most important question would be how 1x will deal with images that have more or less AI-generated content in the future. I also expect clear statements on this from the management.
I have the same opinion, Udo. It is an interesting subject but for me a bit confusing, especially the link with 1x.
Thank you for this article, so rich in information, extremely interesting and provocative. We are witnessing a real revolution in terms of the creation and manipulation of images and the development of a new form of expression; in my opinion its impact in all its aspects has not yet been fully estimated, but its phenomenal scope is predictable. As for photography, it will remain a form of documenting the world and expressing artistic vision, for which there will be creators and consumers, just as there are for painting, sculpture, classical music etc. The times we live in are challenging from all points of view, a fact that leaves its mark on those who want to express themselves artistically. Congratulations on this series of articles, and thanks for the effort, dear Yan!
An interesting article to read, but I must admit that I am completely confused. I don't understand what the author wants to tell us; is it to be understood as instruction in how to use AI for unrealistic photo production? If yes, is 1x a proper platform for it? Unfortunately the author is not responding to our comments, which is not very good. I agree with Collin's comment: AI will, most probably, have problems surviving without our photos. I wish all of you a very nice weekend and a relaxing Pentecost Monday.
well said Miro
I also think that those who say photography is finished, and that we no longer need it, are wrong. Remember when photography began, in the 1830s: painters feared that photography would take away their art of painting. And painting still exists. Fortunately!
The technical side of AI is way over my head, but it is a fascinating article. The bit where the Amazon lady tells you photography is over? Don't they need our photos in the first place to generate from??? Loving these articles though, but I hope she is wrong.
Light - The Key to Photography

by Editor Lourens Durand 
Edited and published by Yvette Depaepe, the 15th of May 2024

 

'Potatoes' by Bill Gekas

 

Right at the beginning of photography, George Eastman struggled with wet gelatine plates used at the time in the cumbersome process of taking photographs, until he found a way of producing dry plates in numbers and started up a small factory producing these plates. The venture attracted a businessman, Henry A. Strong, who invested in the emergent business. Together, the two gentlemen started up the Eastman Dry Plate Company and eventually, in 1892, the Eastman Kodak Company was formed. In the interim, however, Eastman replaced the cumbersome dry plates with roll film, to be used in his new Kodak camera, and the rest (as they say) is history.


One of George Eastman’s most famous quotes is:

“Light makes photography. Embrace light. Admire it. Love it. But above all, know light. Know it for all you are worth, and you will know the key to photography.”


Great words, but there are obviously several kinds of light. In this article, I will attempt to address the variety and quality of light, as well as the diverse ways it can be used in photography.

- Hard light
- Soft Light
- Natural light
- Artificial light

 

HARD  LIGHT is when a relatively small, focussed light source is pointed at a subject, especially from close by, creating sharp, harsh shadows with no gradation of light in the shadow area. This kind of lighting is often employed where stark contrasts are sought in a photo to convey a message of harshness.


SOFT  LIGHT
is when a large, diffused light source is placed quite close to the subject to create smooth, gentle gradations of light and shade and a much more mellow, dreamy effect, shaping the subject and creating mood. Interestingly, when a soft light source is moved further away from the subject it becomes less soft, so placement is important.
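A simple geometric rule of thumb (my addition, not from the original article) explains why placement matters so much: the softness of a source depends on its apparent (angular) size as seen from the subject, while its brightness falls off with the square of the distance:

$\theta \approx 2\arctan\left(\frac{s}{2d}\right), \qquad E \propto \frac{1}{d^2}$

where $s$ is the physical size of the source and $d$ its distance from the subject. Doubling $d$ roughly halves the apparent size (hardening the shadows) and quarters the light reaching the subject.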


NATURAL  LIGHT
The sun, although celestially a huge body of matter, appears as a relatively small light source in a huge sky, resulting in particularly hard light when photographing in full sunlight. It also causes your model's eyes to scrunch up or squint when posing in full sun. A way out is to move the subject into the shade and lighten it using a large reflector to soften the light.

Soft light can also be emulated outdoors, even on a sunny day, by using a scrim (a piece of fabric, mesh, or netting stretched over a frame) held up between the sun and the subject. The scrim, by the way, can be fairly small or huge as, for example, in an outdoor shot of a new Lamborghini.

Even when shooting indoors you can find natural light, through a window or open doors. Natural window light can be either soft or hard, depending on subject placement and whether a lace curtain is covering the window or not.


ARTIFICIAL  LIGHT

The studio photographer, whether shooting portraits, still life, or product photography, is faced with an extremely wide range of lighting types and configurations.
Let’s have a look at some of them:

Open flash heads give a particularly hard light spread over a large area. They can be fitted with a reflector, which would normally narrow the light down to about 90°, though it is still a hard light. Another common adaptor used on a flash head is a snoot, a conical contraption fitted to the flash that narrows the beam down to highlight small areas. It is often used in portraiture from behind the subject as a rim light to highlight the hair. In still life and product photography the snoot helps to preferentially highlight areas of special interest, such as a bottle label.


Soft-boxes
are light modifiers that give an exceptionally soft light and diffuse shadows. A soft-box is normally a square or oblong structure, 60cm or larger on the edge, with one or two layers of diffusion material in front of the light source and reflective sides internally that help to give a fair amount of directionality. It is often the go-to lighting for photographers and is widely used. On the downside, when used in portraiture, the catchlights in the model’s eyes are square or oblong, which can be a problem for some.


Strip-lights
are longer, thinner versions of a soft-box allowing the photographer to light the full body of a standing model. They are also used for special effects in other genres of photography.


Octa-boxes are similar to soft-boxes but have eight sides and are somewhat larger. Because of their size, octa-boxes can produce a really soft light. They are generally foldable and can be set up quicker than a soft-box. Their other advantage is that the catchlights in the eyes of the model are round. Outside of portraiture, there is not really any advantage over soft-boxes.

Umbrellas are more portable and more easily assembled than soft-boxes, which can be a benefit when travelling off-site. The white umbrellas can be used either as shoot-through or reflective light sources. When using a shoot-through approach, the umbrella can be positioned quite close to the model, giving a nice soft light (although not as soft as a soft-box). If used as a reflective light source, the umbrella is turned around and the light source is facing away from the model and at a greater distance, resulting in some light fall-off. One does get umbrellas that are silvered on the outside, and this reduces the fall-off somewhat. The catchlight in the eyes of the model is round (compared to that when using a soft-box), but the reflected shafts of the umbrella are not that attractive.


Beauty Dishes
give a different look from soft-boxes. They are designed for use at close distances and yield a special, focused light that has a unique, subtle wrapping effect that shapes the model's face in a way different from that of soft-boxes. The beauty dish has a bowl shape, with a small reflector centred in the front, facing the light source. This reflector bounces the light from the light source back onto the bowl, which then bounces the light back outwards onto the model.


Flags and reflectors are not really light sources but are useful for blocking off light on selective areas, such as when avoiding glare off glassware or brass ornaments in still-life setups, or for reflecting additional light into darker shadows on the subject. They can be large or small black cardboard sheets in the case of flags, or white Styrofoam sheets, white cards, or aluminium foil used as reflectors.

 


STUDIO PORTRAITURE LIGHTING SETUPS

There are obviously an infinite number of ways that these individual lights and combinations can be used in a studio setup, but here are some of the more common, tried and tested setups, all relying on positioning of the light sources relative to the model.

Split Lighting – This lighting scheme lights one side of the face and leaves the other side largely in shadow, effectively splitting the face in two. The model typically stands or sits with his or her body turned slightly away, to the photographer's right. A single light source is placed in line with, or slightly behind, the model on the left side, and its intensity adjusted to give the desired exposure. As a starting point, the camera might be set to ISO 125, a 1/125 sec shutter speed, and an aperture of around f/11. In some cases, it may add an extra bit of zip if a snoot is placed behind the model, on the opposite side of the main light.

Rembrandt Lighting – Rembrandt light is so called because it was extensively used as natural lighting by Rembrandt in his portraits, including his self-portrait, and is quite dramatic (the effect is said to be due to the shape of his studio, with a window as the only light source, high up on the left-hand side of the artist). The effect is typified by a small triangle of light on the subject’s cheek on the shadow side. The eye on the shadow side must have some light, or else the picture looks unbalanced. The main light is to the left of the photographer, slightly more forward than with split lighting, and also higher up so that the shadow of the nose falls down the cheek. It is a good idea to have a reflector on the opposite side of the light, or maybe even a fill light at low intensity slightly to the photographer’s right. The purists say that the triangle of light should be no longer than the nose and no wider than the eye.


Loop Lighting
Loop lighting is very similar to Rembrandt light, except that the main light is moved slightly backward (towards the photographer) to a 45-degree angle with the model, which results in the shadow of the nose forming a slight loop below and to one side of the nose.

Butterfly Lighting – Here the main light source is directly above the subject so that the photographer is shooting almost underneath the light source, resulting in a butterfly-shaped shadow below the nose. This lighting setup is frequently used in glamour shots, because it is quite flattering, and works particularly well with a Beauty Dish.

Clamshell Lighting – Similar to Butterfly Lighting, except that a large reflector is placed below the subject and facing upwards, helping to soften the shadows under the chin, nose, and eyes.


In conclusion, I hope that this article goes some way in helping to understand the quality of light, the variety of equipment available, and how to use it creatively in progressing in the art of photography, as exemplified in the following selection of photos from 1X.com photographers.

Lourens Durand 

 

'Mahya' by Ali Shahraki

 

 

'Floral' by Q liu



 

'Lewis' by Debra Harder
 


'Quiet time' by Lao Qi


 
'What's happening?' by Q liu
 


'lighting way' by Nasrinmazalahi


 

Untitled by Katsuhiro Kojima

 

 

'Katya' by Valeriy Kasmasov

 

 

'Brothers' by deskounlmtd

 

 

'Bo' by Lifeware

 

 

'Autumnus' by Bill Gekas

 

 

'Midnight Story' by Alfredo Yañez

 

 

'Elderly smoker' by Sergio Pandolfini

 

 

'Abah' by Andi Halil

 

 

'Portrait' by JAE

 

 

'hair wash day' by kenp

 

 

'HER WORLD....' by Tjipto Suparto

 

 

'Oksana' by Zachar Rise

 

 

'PossessioN' by Emerald Wake

 

 

* by Robert Maschke

 

 

Untitled by Robert Becke

 

 

'Sylwia' by Jaroslaw Saternus

 

 

'Pears' by Bill Gekas

Great article and beautiful photos!
Thank you so much for the wonderful and helpful article with great photos!
Thank you Eiji.
Excellent Portraits. Yes, Light is the key... Light and shadows work like a marriage.
Too true.
Thank you
You are welcome.
Brilliant and very interesting article, thank you so much!
Thank you Vasil.
Great article and wonderful pictures! Congratulations and thank you for sharing! <3
Thank you Gabriela.
Thank you for sharing a pic.
You are welcome.
Thank you for such a complete and informative article on light and for the great photography to illustrate it.
Thank you Jane. Much appreciated.
Brilliant article and great models.
Thank you so much Ray.
It's been a while since I've seen such an interesting and instructive topic on the page. It is a complete compendium on photography. Beautiful and good photos chosen for the topic. Many thanks to the editor Lourens Durand, Warm greetings.
Thank you so much Asuncion, and warm greetings.
Beautiful work . Thank you for sharing
Thanks, and you are welcome.
Interesting article on the beauty of light and how it is created and used by us all, with amazing photographs to illustrate it. :)
Thank you Colin.
Interesting and fine article, Lourens. Thanks for sharing it with the readers. Cheers, Yvette
Thank you Yvette. It's only a pleasure.